Version: Stable (v4.x)

Record Extractor

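Introduction

Info

This documentation only covers the helpers.docsearch method. For more information about the Algolia Crawler, see the Algolia Crawler documentation.

Pages are extracted by a recordExtractor. These extractors are assigned to actions through the recordExtractor parameter. This parameter points to a function that returns the data you want to index, organized as an array of JSON objects.

Helpers are a collection of functions that help you extract content and generate Algolia records.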
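
To make the "array of JSON objects" contract concrete, here is a minimal sketch of a hand-written extractor that does not use any helper. The record fields (objectID, title) and the `fake$` stand-in for the Cheerio instance are assumptions made for this demo only; in a real crawl the crawler supplies `url` and `$`.

```javascript
// Hypothetical hand-written extractor: it returns an array of plain,
// JSON-serializable objects, one per record to index.
const recordExtractor = ({ url, $ }) => {
  return [
    {
      objectID: url.href,
      title: $("head title").text(),
    },
  ];
};

// Minimal stand-in for the Cheerio instance, only so the sketch runs on its own.
const fake$ = (selector) => ({
  text: () => (selector === "head title" ? "Getting started" : ""),
});

const records = recordExtractor({
  url: new URL("https://example.com/docs/getting-started"),
  $: fake$,
});
console.log(JSON.stringify(records, null, 2));
```

helpers.docsearch produces an array of the same general kind, so you can return its result directly from recordExtractor.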

Useful links

Usage

The most common way to use the DocSearch helper is to return its result from the recordExtractor function.

recordExtractor: ({ helpers }) => {
  return helpers.docsearch({
    recordProps: {
      lvl0: {
        selectors: "header h1",
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      content: "main p, main li",
    },
  });
},

Manipulate the DOM with Cheerio

The Cheerio instance ($) lets you manipulate the DOM:

recordExtractor: ({ $, helpers }) => {
  // Removing DOM elements we don't want to crawl
  $(".my-warning-message").remove();

  return helpers.docsearch({
    recordProps: {
      lvl0: {
        selectors: "header h1",
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      content: "main p, main li",
    },
  });
},

Provide fallback selectors

Fallback selectors are useful when you want to retrieve content that might not exist on some pages:

recordExtractor: ({ $, helpers }) => {
  return helpers.docsearch({
    recordProps: {
      // `.exists h1` will be selected if `.exists-probably h1` does not exist.
      lvl0: {
        selectors: [".exists-probably h1", ".exists h1"],
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      // `.exists p, .exists li` will be selected.
      content: [
        ".does-not-exists p, .does-not-exists li",
        ".exists p, .exists li",
      ],
    },
  });
},
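
The fallback behavior described above can be pictured as "use the first selector that matches anything". The sketch below illustrates that reading only; it is not the crawler's implementation. `firstMatch`, `lookup`, and the `page` object are hypothetical names invented for this demo.

```javascript
// Hypothetical helper: try each selector in order and keep the first one
// that matches at least one element.
function firstMatch(selectors, lookup) {
  const list = Array.isArray(selectors) ? selectors : [selectors];
  for (const selector of list) {
    const matches = lookup(selector);
    if (matches.length > 0) {
      return { selector, matches };
    }
  }
  return null; // nothing matched
}

// Stand-in for a page: maps a selector to the text of the elements it matches.
const page = { ".exists h1": ["API Reference"] };
const lookup = (selector) => page[selector] ?? [];

const lvl0 = firstMatch([".exists-probably h1", ".exists h1"], lookup);
console.log(lvl0);
```

Here `.exists-probably h1` matches nothing on the fake page, so the second selector supplies the value.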

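Provide raw text (`defaultValue`)

This option is only supported for lvl0 and custom variable selectors.

You might want to structure your search results differently from your website, or provide a defaultValue for a selector that might not exist: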

recordExtractor: ({ $, helpers }) => {
  return helpers.docsearch({
    recordProps: {
      lvl0: {
        // It also supports the fallback DOM selectors syntax!
        selectors: ".exists-probably h1",
        defaultValue: "myRawTextIfDoesNotExists",
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      content: "main p, main li",
      // The variables below can be used to filter your search
      language: {
        // It also supports the fallback DOM selectors syntax!
        selectors: ".exists-probably .language",
        // Since custom variables are used for filtering, we allow sending
        // multiple raw values
        defaultValue: ["en", "en-US"],
      },
    },
  });
},
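
One way to think about defaultValue is as the last fallback after all selectors have been tried. The sketch below shows that reading under stated assumptions: `resolveProp` and `lookupOn` are hypothetical helpers written for this demo, and the crawler's actual resolution logic may differ in details.

```javascript
// Hypothetical resolver: selectors first, then defaultValue, then nothing.
function resolveProp(prop, lookup) {
  const selectors = [].concat(prop.selectors ?? []);
  for (const selector of selectors) {
    const matches = lookup(selector);
    if (matches.length > 0) {
      return matches;
    }
  }
  // No selector matched: fall back to the raw value(s), if any.
  return prop.defaultValue === undefined ? [] : [].concat(prop.defaultValue);
}

// Stand-in for a page: maps a selector to the text of the elements it matches.
const lookupOn = (page) => (selector) => page[selector] ?? [];

const languageProp = {
  selectors: ".exists-probably .language",
  defaultValue: ["en", "en-US"],
};

const fromDefault = resolveProp(languageProp, lookupOn({}));
const fromPage = resolveProp(
  languageProp,
  lookupOn({ ".exists-probably .language": ["fr"] })
);
console.log(fromDefault, fromPage);
```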

Index content for faceting

These selectors also support defaultValue and fallback selectors.

You might want to index content that will be used as filters in your frontend (for example, version or lang). You can add any custom variable to the recordProps object to include it in your Algolia records:

recordExtractor: ({ helpers }) => {
  return helpers.docsearch({
    recordProps: {
      lvl0: {
        selectors: "header h1",
      },
      lvl1: "article h2",
      lvl2: "article h3",
      lvl3: "article h4",
      lvl4: "article h5",
      lvl5: "article h6",
      content: "main p, main li",
      // The variables below can be used to filter your search
      foo: ".bar",
      language: {
        // It also supports the fallback DOM selectors syntax!
        selectors: ".does-not-exists",
        // Since custom variables are used for filtering, we allow sending
        // multiple raw values
        defaultValue: ["en", "en-US"],
      },
      version: {
        // You can send raw values without `selectors`
        defaultValue: ["latest", "stable"],
      },
    },
  });
},

The following foo, language, and version attributes will be available in your records:

foo: "valueFromBarSelector",
language: ["en", "en-US"],
version: ["latest", "stable"]

You can now use them to filter your search in the frontend.
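
As an illustration of the frontend side, Algolia facet filters are written as "attribute:value" strings. The sketch below builds such a list from the custom variables above. `toFacetFilters` is a hypothetical helper, not part of DocSearch. It assumes the attributes have been declared for faceting in your index settings, and that your DocSearch frontend accepts the result under searchParameters.facetFilters; check both against the version you use.

```javascript
// Hypothetical helper: turns { attribute: value } pairs into
// Algolia-style "attribute:value" facet filter strings.
function toFacetFilters(filters) {
  return Object.entries(filters).map(
    ([attribute, value]) => `${attribute}:${value}`
  );
}

const facetFilters = toFacetFilters({ language: "en", version: "latest" });
console.log(facetFilters);
```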

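Boost search results with `pageRank`

This parameter lets you boost records using a custom ranking attribute built from the current pathsToMatch. Pages with a higher pageRank are returned before pages with a lower pageRank. The default value is 0, and you can pass any numeric value as a string, including negative values.

Search results are sorted by weight (descending), so boosted and non-boosted results can both appear. The weight of each result is computed per query from several factors, such as match level and position, and the pageRank value is added to this final weight. Depending on your overall ranking configuration, pageRank alone may not influence the results enough. If changing pageRank values, even to large ones, does not affect your search results enough, move weight.pageRank higher on the Ranking and Sorting page of your index.

You can inspect the computed weight directly in the Algolia dashboard (dashboard.algolia.com → Search → run a search → hover over the "ranking criteria" icon at the bottom right of each record). This helps you judge what pageRank value is appropriate.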

{
  indexName: "YOUR_INDEX_NAME",
  pathsToMatch: ["https://YOUR_WEBSITE_URL/api/**"],
  recordExtractor: ({ $, helpers, url }) => {
    const isDocPage = /\/[\w-]+\/docs\//.test(url.pathname);
    const isBlogPage = /\/[\w-]+\/blog\//.test(url.pathname);
    return helpers.docsearch({
      recordProps: {
        lvl0: {
          selectors: "header h1",
        },
        lvl1: "article h2",
        lvl2: "article h3",
        lvl3: "article h4",
        lvl4: "article h5",
        lvl5: "article h6",
        content: "article p, article li",
        pageRank: isDocPage ? "-2000" : isBlogPage ? "-1000" : "0",
      },
    });
  },
},
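
The path checks in the example above can be tried in isolation before deploying a crawler configuration. The sketch below wraps the same two regular expressions in a hypothetical `pageRankFor` function; the sample paths are made up for illustration.

```javascript
// Same regular expressions as in the configuration above, wrapped in a
// hypothetical helper so they can be exercised without a crawler.
function pageRankFor(pathname) {
  const isDocPage = /\/[\w-]+\/docs\//.test(pathname);
  const isBlogPage = /\/[\w-]+\/blog\//.test(pathname);
  return isDocPage ? "-2000" : isBlogPage ? "-1000" : "0";
}

for (const path of ["/en/docs/intro", "/en/blog/release", "/docs/intro", "/api/search"]) {
  console.log(path, pageRankFor(path));
}
```

Note that both patterns require a path segment before docs/ or blog/, so a path such as /docs/intro falls through to the default of "0".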

Reduce the number of records

If you encounter the "Extractors returned too many records" error because your page outputs more than 750 records, the aggregateContent option helps you reduce the number of records at the content level of the extractor.

{
  indexName: "YOUR_INDEX_NAME",
  pathsToMatch: ["https://YOUR_WEBSITE_URL/api/**"],
  recordExtractor: ({ $, helpers }) => {
    return helpers.docsearch({
      recordProps: {
        lvl0: {
          selectors: "header h1",
        },
        lvl1: "article h2",
        lvl2: "article h3",
        lvl3: "article h4",
        lvl4: "article h5",
        lvl5: "article h6",
        content: "article p, article li",
      },
      aggregateContent: true,
    });
  },
},
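
To see why aggregation lowers the record count, consider the simplified model below. It is only a sketch: the record shape (heading, content) and the `aggregateByHeading` function are assumptions made for illustration, and the real records produced by helpers.docsearch carry more fields.

```javascript
// Hypothetical model: merge all content records that share a heading
// into a single record for that heading.
function aggregateByHeading(records) {
  const grouped = new Map();
  for (const { heading, content } of records) {
    if (!grouped.has(heading)) {
      grouped.set(heading, []);
    }
    grouped.get(heading).push(content);
  }
  return [...grouped].map(([heading, parts]) => ({
    heading,
    content: parts.join(" "),
  }));
}

const contentRecords = [
  { heading: "Install", content: "Run the installer." },
  { heading: "Install", content: "Restart your shell." },
  { heading: "Usage", content: "Call the CLI." },
];

const aggregated = aggregateByHeading(contentRecords);
console.log(aggregated);
```

In this model, three content-level records collapse into two, one per heading.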

Reduce the record size

If you encounter the "Records extracted are too big" error when crawling your website, it is usually because your records contain too much information or because your page is too large. The recordVersion option helps you reduce record size by removing information that is only used with DocSearch v2.

{
  indexName: "YOUR_INDEX_NAME",
  pathsToMatch: ["https://YOUR_WEBSITE_URL/api/**"],
  recordExtractor: ({ $, helpers }) => {
    return helpers.docsearch({
      recordProps: {
        lvl0: {
          selectors: "header h1",
        },
        lvl1: "article h2",
        lvl2: "article h3",
        lvl3: "article h4",
        lvl4: "article h5",
        lvl5: "article h6",
        content: "article p, article li",
      },
      recordVersion: "v3",
    });
  },
},

`recordProps` API Reference

`lvl0`

type: Lvl0 | required

type Lvl0 = {
  selectors: string | string[];
  defaultValue?: string;
};

`lvl1`, `content`

type: string | string[] | required

`lvl2`, `lvl3`, `lvl4`, `lvl5`, `lvl6`

type: string | string[] | optional

`pageRank`

type: number | optional

See the usage example under "Boost search results with `pageRank`".

Custom variables

type: string | string[] | CustomVariable | optional

type CustomVariable =
  | {
      defaultValue: string | string[];
    }
  | {
      selectors: string | string[];
      defaultValue?: string | string[];
    };

Custom variables are used to filter your search and can be defined in recordProps.

`helpers.docsearch` API Reference

`aggregateContent`

type: boolean | default: true | optional

This option groups the Algolia records created at the content level of the selector into a single record for their matching heading.

`recordVersion`

type: 'v3' | 'v2' | default: v2 | optional

This option removes content from the Algolia records that is only used by DocSearch v2. If you are using the latest version of DocSearch, you can set it to v3.

`indexHeadings`

type: boolean | { from: number, to: number } | default: true | optional

This option tells the crawler whether headings (lvlX) should be indexed.

When false, only records for the content level are created.
When from and to are provided, only records for lvlX to lvlY are created.
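
As a configuration sketch, the object form of indexHeadings sits next to recordProps in the options passed to helpers.docsearch. The values from: 1 and to: 3 below are illustrative only, chosen on the assumption that the numbers refer to lvlX levels as described above.

```javascript
// Options object as it would be passed to helpers.docsearch(...).
// The from/to values are illustrative; adjust them to the heading
// levels you actually want indexed.
const docsearchOptions = {
  recordProps: {
    lvl0: {
      selectors: "header h1",
    },
    lvl1: "article h2",
    lvl2: "article h3",
    lvl3: "article h4",
    content: "article p, article li",
  },
  indexHeadings: { from: 1, to: 3 },
};

console.log(docsearchOptions.indexHeadings);
```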
