Version: Stable (v4.x)

Migrating from the legacy scraper

Introduction

With the new version of the DocSearch UI, we want to go further by giving you better tooling to create and maintain your config file, along with additional Algolia features you have been requesting for a long time!

What's new?

Scraper

The DocSearch infrastructure now relies on the Algolia Crawler. We teamed up with the Crawler team to build a new DocSearch helper that extracts records the same way our beloved legacy DocSearch scraper did!

The best part is that you no longer need to install any tooling on your side if you want to maintain or update your index!

We now provide a web interface, available in a legacy and a new version, that lets you:

Start, schedule, and monitor your crawls
Edit your config file with our live editor
Test your results directly with DocSearch v3 or DocSearch v4

Algolia application and credentials

We have received many requests for:

Team member management
Browsing how Algolia records are indexed
Accessing and subscribing to other Algolia features

All of this is now available in your own Algolia application, for free :D

FAQ

You can find answers about the DocSearch migration on our Crawler FAQ page.

Useful links

Config file key mapping

Below are the keys that can be found in the legacy DocSearch configs and their translation to an Algolia Crawler config. For more detailed information on the Algolia Crawler, see the official documentation.

| Legacy key | Crawler key | Description |
| --- | --- | --- |
| `start_urls` | `startUrls` | Now accepts URLs only; see `helpers.docsearch` to handle custom variables |
| `page_rank` | `pageRank` | Can be added to the `recordProps` in `helpers.docsearch`; should be passed as a string |
| `js_render` | `renderJavaScript` | Unchanged |
| `js_wait` | `renderJavaScript.waitTime` | See the documentation of `renderJavaScript` |
| `index_name` | removed, see `actions` | Handled directly in the `actions` |
| `sitemap_urls` | `sitemaps` | Unchanged |
| `stop_urls` | `exclusionPatterns` | Supports `micromatch` |
| `selectors_exclude` | removed | Should be handled in the `recordExtractor` and `helpers.docsearch` |
| `custom_settings` | `initialIndexSettings` | Unchanged |
| `scrape_start_urls` | removed | Can be handled with `exclusionPatterns` |
| `strip_chars` | removed | `#` is removed automatically from anchor links; edge cases should be handled in the `recordExtractor` and `helpers.docsearch` |
| `conversation_id` | removed | No longer needed |
| `nb_hits` | removed | No longer needed |
| `sitemap_alternate_links` | removed | No longer needed |
| `stop_content` | removed | Should be handled in the `recordExtractor` and `helpers.docsearch` |
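
To make the mapping more concrete, here is a minimal sketch of how a legacy config object could be translated by hand. It is not an official migration tool: the function `migrateLegacyConfig`, the interfaces, and the example URLs are hypothetical, and only the key names come from the table above. It assumes each start URL ends with `/`, that `initialIndexSettings` is keyed by index name, and that the resulting `recordProps` would be passed to `helpers.docsearch` inside an action's `recordExtractor`. `js_wait` is left out on purpose; see the `renderJavaScript` documentation for `waitTime`.

```typescript
// Hypothetical shapes for illustration; only the key names come from the mapping table.
interface LegacyStartUrl {
  url: string;
  page_rank?: number;
}

interface LegacyConfig {
  index_name: string;
  start_urls: Array<string | LegacyStartUrl>;
  sitemap_urls?: string[];
  stop_urls?: string[];
  js_render?: boolean;
  custom_settings?: Record<string, unknown>;
  // Keys the table lists as removed; accepted here only so they can be dropped.
  nb_hits?: number;
  conversation_id?: string[];
  sitemap_alternate_links?: boolean;
}

interface ActionSketch {
  indexName: string;
  pathsToMatch: string[];
  // Meant to be passed to `helpers.docsearch` inside the action's `recordExtractor`.
  recordProps: { pageRank?: string };
}

interface CrawlerConfigSketch {
  startUrls: string[];
  sitemaps: string[];
  exclusionPatterns: string[];
  renderJavaScript: boolean;
  initialIndexSettings: Record<string, Record<string, unknown>>;
  actions: ActionSketch[];
}

function migrateLegacyConfig(legacy: LegacyConfig): CrawlerConfigSketch {
  // Normalize every entry to { url, page_rank } so both legacy forms are handled alike.
  const entries: LegacyStartUrl[] = legacy.start_urls.map((entry) =>
    typeof entry === "string" ? { url: entry } : entry
  );

  return {
    // `startUrls` accepts plain URLs only.
    startUrls: entries.map((entry) => entry.url),
    sitemaps: legacy.sitemap_urls ?? [],
    // Copied verbatim: legacy patterns must be reviewed by hand, since
    // `exclusionPatterns` expects micromatch patterns.
    exclusionPatterns: legacy.stop_urls ?? [],
    renderJavaScript: legacy.js_render ?? false,
    // Assumption: settings are keyed by index name.
    initialIndexSettings: { [legacy.index_name]: legacy.custom_settings ?? {} },
    // `index_name` moves into each action; `page_rank` becomes a string.
    actions: entries.map((entry) => ({
      indexName: legacy.index_name,
      pathsToMatch: [`${entry.url}**`], // assumes the URL ends with "/"
      recordProps:
        entry.page_rank === undefined ? {} : { pageRank: String(entry.page_rank) },
    })),
  };
}

// Usage with made-up values: `nb_hits` is silently dropped.
const migrated = migrateLegacyConfig({
  index_name: "example",
  start_urls: [{ url: "https://example.com/docs/", page_rank: 5 }, "https://example.com/"],
  stop_urls: ["/legacy/"],
  nb_hits: 1234,
});
console.log(JSON.stringify(migrated, null, 2));
```

The sketch produces one action per start URL so that each can carry its own `pageRank`. Keys marked as removed in the table, such as `selectors_exclude` or `stop_content`, have no direct equivalent and would need custom logic in the `recordExtractor`.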
