Version: Stable (v4.x)

Migrating from the legacy scraper

Unofficial Beta Translation

This page was translated by PageTurner AI (beta). It is not officially endorsed by the project.

Introduction

With the new version of the DocSearch UI, we wanted to go further and provide better tooling for creating and maintaining your config file, along with extra Algolia features that you have been requesting for a long time!

What's new?

Scraper

The DocSearch infrastructure now runs on the Algolia Crawler. We teamed up with our colleagues to create a new DocSearch helper that extracts records the same way our beloved DocSearch scraper used to.

The best part is that you no longer need to install any tooling on your side if you want to maintain or update your index!

We now provide a web interface, legacy or new, that lets you:

Start, schedule, and monitor your crawls
Edit your config file from our live editor
Test your results directly with DocSearch v3 or DocSearch v4

Algolia application and credentials

We have received a lot of requests asking for:

Ways to manage team members
Ways to browse and see how Algolia records are indexed
Ways to see and subscribe to other Algolia features

All of these are now available in your own Algolia application, completely free! :D

FAQ

You can find answers to questions about the DocSearch migration on our Crawler FAQ page.

Useful links

Config file key mapping

Below are the keys that can be found in the legacy DocSearch configs and their translation to an Algolia Crawler config. For more detailed information on the Algolia Crawler, see the official documentation.

Legacy key	Crawler key	Description
`start_urls`	`startUrls`	Now accepts URLs only; see `helpers.docsearch` to handle custom variables
`page_rank`	`pageRank`	Can be added to the `recordProps` in `helpers.docsearch`; should be passed as a string
`js_render`	`renderJavaScript`	Unchanged
`js_wait`	`renderJavaScript.waitTime`	See documentation of `renderJavaScript`
`index_name`	removed, see `actions`	Handled directly in the `actions`
`sitemap_urls`	`sitemaps`	Unchanged
`stop_urls`	`exclusionPatterns`	Supports `micromatch`
`selectors_exclude`	removed	Should be handled in the `recordExtractor` and `helpers.docsearch`
`custom_settings`	`initialIndexSettings`	Unchanged
`scrape_start_urls`	removed	Can be handled with `exclusionPatterns`
`strip_chars`	removed	`#` is removed automatically from anchor links; edge cases should be handled in the `recordExtractor` and `helpers.docsearch`
`conversation_id`	removed	Not needed anymore
`nb_hits`	removed	Not needed anymore
`sitemap_alternate_links`	removed	Not needed anymore
`stop_content`	removed	Should be handled in the `recordExtractor` and `helpers.docsearch`
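
To make the mapping above concrete, here is a minimal sketch of a helper that renames the one-to-one keys and lists every other key that needs manual follow-up. The function `migrateLegacyConfig` and its result shape are hypothetical, written for illustration only; they are not part of DocSearch or the Algolia Crawler. Values are copied verbatim, so they may still need adapting by hand (for example, `exclusionPatterns` entries should be checked against `micromatch` syntax).

```typescript
// Hypothetical helper, not part of DocSearch or the Algolia Crawler.
// It only applies the renames listed in the mapping table.

type LegacyConfig = Record<string, unknown>;

interface MigrationResult {
  config: Record<string, unknown>; // keys that map one-to-one
  manual: string[]; // keys that need manual follow-up
}

// Direct renames taken from the table.
const RENAMED = new Map<string, string>([
  ["start_urls", "startUrls"],
  ["js_render", "renderJavaScript"],
  ["sitemap_urls", "sitemaps"],
  ["stop_urls", "exclusionPatterns"],
  ["custom_settings", "initialIndexSettings"],
]);

// Keys the table marks as removed or as handled elsewhere.
const MANUAL = new Map<string, string>([
  ["index_name", "handled directly in `actions`"],
  ["page_rank", "add `pageRank` (as a string) to `recordProps` in `helpers.docsearch`"],
  ["js_wait", "set `renderJavaScript.waitTime`"],
  ["selectors_exclude", "handle in `recordExtractor` and `helpers.docsearch`"],
  ["scrape_start_urls", "can be handled with `exclusionPatterns`"],
  ["strip_chars", "handle edge cases in `recordExtractor` and `helpers.docsearch`"],
  ["stop_content", "handle in `recordExtractor` and `helpers.docsearch`"],
  ["conversation_id", "not needed anymore"],
  ["nb_hits", "not needed anymore"],
  ["sitemap_alternate_links", "not needed anymore"],
]);

function migrateLegacyConfig(legacy: LegacyConfig): MigrationResult {
  const config: Record<string, unknown> = {};
  const manual: string[] = [];

  for (const [key, value] of Object.entries(legacy)) {
    const renamed = RENAMED.get(key);
    if (renamed !== undefined) {
      // Copied verbatim; the value itself is not converted.
      config[renamed] = value;
    } else {
      const note = MANUAL.get(key) ?? "not covered by the mapping table";
      manual.push(`${key}: ${note}`);
    }
  }

  return { config, manual };
}

// Usage with a made-up legacy config (the URL and index name are placeholders).
const result = migrateLegacyConfig({
  index_name: "my-docs",
  start_urls: ["https://example.com/docs/"],
  sitemap_urls: ["https://example.com/sitemap.xml"],
  nb_hits: 1234,
});
console.log(result.config);
console.log(result.manual);
```

The sketch deliberately leaves `actions`, `recordExtractor`, and `helpers.docsearch` untouched, because the table routes several legacy keys there and those pieces depend on your site's markup.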
