MCP Webcrawl Server

MCP • Official • Open Source • NOASSERTION • 41.1

by pragmar • Analytics & Monitoring

Advanced search and retrieval for web crawler data, enabling LLMs and agents to filter, analyze, and extract information from archived crawls.

Example Use Cases

1. Search and filter large web-archive datasets with boolean and field-specific queries to locate pages, images, or other resource types.
2. Extract and condense HTML into Markdown or snippets (or request thumbnails) to reduce token usage for summarization, analysis, or QA tasks.
3. Run automated site audits (SEO, 404, performance) or scrape structured data via XPath/regex across archived crawls.
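The filtering and regex-extraction use cases above can be illustrated with a minimal, self-contained sketch. It does not call the server: the `PAGES` records, their field names (`url`, `status`, `type`, `content`), and the helpers `filter_pages` and `extract_titles` are all hypothetical stand-ins chosen for illustration, and the server's actual record schema and query syntax may differ.

```python
import re

# Hypothetical in-memory stand-in for archived crawl records. The real server
# reads these from crawl archives; this schema is an assumption.
PAGES = [
    {"url": "https://example.com/", "status": 200, "type": "html",
     "content": "<html><head><title>Home</title></head><body>Welcome</body></html>"},
    {"url": "https://example.com/old", "status": 404, "type": "html",
     "content": "<html><head><title>Not Found</title></head></html>"},
    {"url": "https://example.com/logo.png", "status": 200, "type": "img",
     "content": ""},
]

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def filter_pages(pages, **fields):
    """Return pages whose fields all equal the given values (an AND filter)."""
    return [p for p in pages if all(p.get(k) == v for k, v in fields.items())]


def extract_titles(pages):
    """Map each page URL to its first <title> match, skipping pages without one."""
    titles = {}
    for page in pages:
        match = TITLE_RE.search(page["content"])
        if match:
            titles[page["url"]] = match.group(1).strip()
    return titles


# A tiny "404 audit": which archived URLs returned 404?
broken = [p["url"] for p in filter_pages(PAGES, status=404)]

# Regex extraction restricted to successfully fetched HTML pages.
titles = extract_titles(filter_pages(PAGES, type="html", status=200))
```

Here `broken` collects the URLs a 404 audit would flag, and `titles` shows field filtering combined with regex extraction. Regex is adequate for a single well-formed tag such as `<title>`; nested or irregular markup is generally better served by XPath.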

Description

MCP Webcrawl Server provides a boolean-capable full-text search interface and resource filtering for web crawler archives, and works with multiple crawl formats (ArchiveBox, HTTrack, WARC, wget, Katana, etc.). It supports field-specific queries and optional extras (Markdown conversion, snippets, regex/XPath extraction, and thumbnails) that produce token-efficient results for LLMs. The package is ready for use with Claude Desktop, is installable via pip, and includes copy-and-paste prompt routines for automated audits (SEO, 404, performance, file audits). It also offers an interactive terminal mode for searching remote or local archives without downloads.
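To show why condensing HTML yields token-efficient results, here is a minimal sketch using only Python's standard-library `html.parser`. It is not the server's implementation: the `MarkdownCondenser` class and `condense` function are hypothetical names, and the conversion handles only headings and plain text.

```python
from html.parser import HTMLParser


class MarkdownCondenser(HTMLParser):
    """Keep headings and visible text; drop script/style content and attributes."""

    SKIP = {"script", "style"}
    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._skip_depth = 0   # > 0 while inside <script> or <style>
        self._prefix = ""      # Markdown heading prefix for the current element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.HEADINGS:
            self._prefix = self.HEADINGS[tag]

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.HEADINGS:
            self._prefix = ""

    def handle_data(self, data):
        # Collapse runs of whitespace and ignore text inside skipped elements.
        text = " ".join(data.split())
        if text and not self._skip_depth:
            self.lines.append(self._prefix + text)


def condense(html):
    """Return a compact, Markdown-like rendering of the visible text in html."""
    parser = MarkdownCondenser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


sample = (
    '<html><head><style>p{color:red}</style></head><body>'
    '<h1 class="t">Docs</h1><p>Install   with pip.</p>'
    '<script>track()</script></body></html>'
)
condensed = condense(sample)
```

For this sample, `condensed` contains only the heading and paragraph text, so it is considerably shorter than the raw markup. A fuller converter would also handle links, lists, and inline elements nested inside headings, which this sketch does not.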

Security

Scanned 4 months ago

Risk Level

MINIMAL

Read-only data retrieval, no side effects

Trust Score

D (44/100)

4/17 checks passed

Scores are informational only and provided “as is” without warranty. AgentHotspot assumes no liability for actions taken based on these ratings.

Quick Stats

Service Type: MCP
Pricing Model: Free
Capabilities: 0 Tools / 0 Prompts / 0 Resources
Owner: pragmar
Category: Analytics & Monitoring
Dependencies: Standalone