Version: Legacy (v3.x)

Required configuration

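This section describes the best practices for optimizing how we crawl your site. Following the specification below is required for our crawler to build the best search experience from your website. You will need to update your website so that it follows these rules.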

Info

If your website is generated by one of our supported tools, you do not need to change it: it already complies with our requirements.

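The generic configuration example

The default DocSearch configuration template is shown below. You can adapt it using the examples in the complex extractors section.

If you are using one of our integrations, please refer to the templates page.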

docsearch-default.js

new Crawler({
  appId: 'YOUR_APP_ID',
  apiKey: 'YOUR_API_KEY',
  startUrls: ['https://YOUR_START_URL.io/'],
  sitemaps: ['https://YOUR_START_URL.io/sitemap.xml'],
  actions: [
    {
      indexName: 'YOUR_INDEX_NAME',
      pathsToMatch: ['https://YOUR_START_URL.io/**'],
      recordExtractor: ({ helpers }) => {
        return helpers.docsearch({
          recordProps: {
            lvl0: {
              selectors: '',
              defaultValue: 'Documentation',
            },
            lvl1: ['header h1', 'article h1', 'main h1', 'h1', 'head > title'],
            lvl2: ['article h2', 'main h2', 'h2'],
            lvl3: ['article h3', 'main h3', 'h3'],
            lvl4: ['article h4', 'main h4', 'h4'],
            lvl5: ['article h5', 'main h5', 'h5'],
            lvl6: ['article h6', 'main h6', 'h6'],
            content: ['article p, article li', 'main p, main li', 'p, li'],
          },
          aggregateContent: true,
          recordVersion: 'v3',
        });
      },
    },
  ],
  initialIndexSettings: {
    YOUR_INDEX_NAME: {
      attributesForFaceting: ['type', 'lang'],
      attributesToRetrieve: [
        'hierarchy',
        'content',
        'anchor',
        'url',
        'url_without_anchor',
        'type',
      ],
      attributesToHighlight: ['hierarchy', 'content'],
      attributesToSnippet: ['content:10'],
      camelCaseAttributes: ['hierarchy', 'content'],
      searchableAttributes: [
        'unordered(hierarchy.lvl0)',
        'unordered(hierarchy.lvl1)',
        'unordered(hierarchy.lvl2)',
        'unordered(hierarchy.lvl3)',
        'unordered(hierarchy.lvl4)',
        'unordered(hierarchy.lvl5)',
        'unordered(hierarchy.lvl6)',
        'content',
      ],
      distinct: true,
      attributeForDistinct: 'url',
      customRanking: [
        'desc(weight.pageRank)',
        'desc(weight.level)',
        'asc(weight.position)',
      ],
      ranking: [
        'words',
        'filters',
        'typo',
        'attribute',
        'proximity',
        'exact',
        'custom',
      ],
      highlightPreTag: '<span class="algolia-docsearch-suggestion--highlight">',
      highlightPostTag: '</span>',
      minWordSizefor1Typo: 3,
      minWordSizefor2Typos: 7,
      allowTyposOnNumericTokens: false,
      minProximity: 1,
      ignorePlurals: true,
      advancedSyntax: true,
      attributeCriteriaComputedByMinProximity: true,
      removeWordsIfNoResults: 'allOptional',
      separatorsToIndex: '_',
    },
  },
});

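Overview of a clear layout

A website that follows these best practices has a simple, clear layout, and might look like the following:

The main blue element is your .DocSearch-content container. The guidelines below give more details.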

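Use the right classes as `recordProps`

You can add specific static CSS classes to help us identify the role of your content. These classes must not introduce any style changes. They exist only to help us build a great learn-as-you-type experience from your documentation.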

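Add a static class DocSearch-content to the main container of your textual content. Most of the time, this container is a <main> or an <article> HTML element.
Every searchable lvl element outside this main documentation container (for instance, in a sidebar) must be a global selector. These elements are picked up globally and injected into every record built from your page. Be careful: the level value matters, and every matching element must have an increasing level along the HTML flow. A level X (for lvlX) should appear after a level Y whenever X > Y.
lvlX selectors should use the standard title tags such as h1, h2, h3, etc. You can also use static classes. Set a unique id or name attribute on these elements, as detailed below.
Every DOM element matching the lvlX selectors must have a unique id or name attribute. These attributes define the anchor to use, so that a result can scroll directly to the exact place of the matching element.
Every textual element (recordProps content) must be wrapped in a <p> or <li> tag. This content must be atomic and split into small entities. Never nest one matching element inside another, as this creates duplicate records.
Stay consistent: the same conventions must hold along the whole HTML flow.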
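
As a rough illustration of the first and fifth guidelines above, the sketch below builds a recordProps object whose selectors are all scoped to the main content container. The helper name buildScopedRecordProps is hypothetical and is not part of DocSearch. It assumes your container carries the DocSearch-content class and that your headings use the standard h1 to h6 tags.

```javascript
// Hypothetical helper (not a DocSearch API): builds recordProps whose
// selectors only match elements inside the main content container.
function buildScopedRecordProps(container = '.DocSearch-content') {
  const props = {
    // Same lvl0 fallback as in the default template above.
    lvl0: { selectors: '', defaultValue: 'Documentation' },
  };

  // lvl1..lvl6 map to the standard title tags h1..h6 inside the container.
  for (let level = 1; level <= 6; level += 1) {
    props[`lvl${level}`] = [`${container} h${level}`];
  }

  // Textual content is limited to <p> and <li> elements inside the container.
  props.content = [`${container} p, ${container} li`];

  return props;
}

const recordProps = buildScopedRecordProps();
```

The resulting object could be passed as the recordProps option of helpers.docsearch in place of the broader selectors of the default template. Whether that is appropriate depends on your markup.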

Introduce global information as meta tags

Our crawler automatically extracts information from DocSearch-specific meta tags:

<meta name="docsearch:language" content="en" />
<meta name="docsearch:version" content="1.0.0" />

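The crawler adds the content value of these meta tags to every record extracted from the page. The name of each meta tag must follow the docsearch:$NAME pattern, where $NAME is the name of the attribute set on all records.

The docsearch:version meta tag can be a set of comma-separated tokens, each of which represents a version relevant to the page. These tokens must either comply with the SemVer specification or contain only alphanumeric characters (e.g. latest, next, etc.). As facet filters, these version tokens are case-insensitive.

For example, consider all records extracted from a page with the following meta tag: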

<meta name="docsearch:version" content="2.0.0-alpha.62,latest" />

The version attribute of these records will be:

version:["2.0.0-alpha.62", "latest"]
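
To make the token rules concrete, here is a minimal sketch of how such a content value could be split and checked. It is not the crawler's implementation: the function names are hypothetical, the whitespace trimming is an assumption, and the SemVer pattern is a simplified approximation rather than the full SemVer grammar.

```javascript
// Simplified SemVer-like pattern (assumption: looser than the real SemVer
// grammar, and not necessarily the check the crawler performs).
const SEMVER_LIKE = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$/;
// Alphanumeric-only tokens such as "latest" or "next".
const ALPHANUMERIC = /^[0-9A-Za-z]+$/;

// Hypothetical helper: splits the meta tag content into version tokens.
function parseVersionMeta(content) {
  return content
    .split(',')
    .map((token) => token.trim()) // trimming is an assumption
    .filter((token) => token.length > 0);
}

// Hypothetical helper: checks one token against the two allowed shapes.
function isValidVersionToken(token) {
  return SEMVER_LIKE.test(token) || ALPHANUMERIC.test(token);
}

const tokens = parseVersionMeta('2.0.0-alpha.62,latest');
const allValid = tokens.every(isValidVersionToken);
```

A check like this could run in your build pipeline to catch malformed version tokens before the page is crawled.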

You can then turn these attributes into facetFilters to filter on them from the UI.
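
As a sketch of what that could look like, the snippet below builds a facetFilters value from the language and version attributes. The helper buildFacetFilters is hypothetical. It relies on the Algolia convention that top-level entries of facetFilters are combined with AND, while entries inside a nested array are combined with OR.

```javascript
// Hypothetical helper (not a DocSearch API): converts attribute values into
// Algolia facetFilters. Top-level entries are ANDed; a nested array is ORed.
function buildFacetFilters(filters) {
  return Object.entries(filters).map(([attribute, value]) =>
    Array.isArray(value)
      ? value.map((item) => `${attribute}:${item}`) // any of these values
      : `${attribute}:${value}` // exactly this value
  );
}

// Match English records tagged with either of the two version tokens.
const searchParameters = {
  facetFilters: buildFacetFilters({
    language: 'en',
    version: ['2.0.0-alpha.62', 'latest'],
  }),
};
```

An object of this shape is what you would pass as the searchParameters option of the DocSearch UI, assuming your index declares language and version in attributesForFaceting.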
