by YoungerMax • Uncategorized
A horizontally scalable search engine crawler and indexing pipeline with a FastAPI search API.
Perform large-scale web crawling and indexing with horizontal scalability.
Integrated web and news search capabilities with offline ranking computations.
A deployable search API with support for distributed batch processing.
This project provides a production-style distributed crawler and indexing system built with Python and Postgres. It features fast, stateless crawl workers, offline global computations such as PageRank and BM25, and migration-first schema management with Alembic. The system serves both web and news search results, and it fetches and indexes RSS/Atom news feeds. It is designed for deployment with Docker Compose or Docker Swarm, and it scales out through distributed batch processing.
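To make the BM25 ranking mentioned above concrete, here is a minimal sketch of the standard Okapi BM25 formula in Python. It is not taken from the project's code: the function name `bm25_scores`, the whitespace tokenization, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for illustration, and a real pipeline would likely precompute the document statistics offline in Postgres rather than in memory.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    Hypothetical helper for illustration; tokenization is a plain
    lowercase whitespace split, and k1/b are common default values.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    query_terms = query.lower().split()
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with document length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


if __name__ == "__main__":
    sample_docs = [
        "distributed web crawler",
        "news feed indexing pipeline",
        "web search api for web pages",
    ]
    for doc, score in zip(sample_docs, bm25_scores("web crawler", sample_docs)):
        print(f"{score:.3f}  {doc}")
```

Because the document frequencies and average length depend on the whole corpus, they fit the offline batch step described above, while the per-query scoring loop is cheap enough to run at request time.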