LocalLlama MCP Server

MCP • Open Source • 45.5

by Heratiki • Modeling & Simulation

An MCP server that reduces token costs by routing coding tasks between local LLMs and paid APIs and by reusing indexed code.

Example Use Cases

1. Minimize API token usage and cost by routing coding tasks to local models when appropriate.
2. Reuse existing code by performing semantic code search (Retriv First) before generating new code.
3. Benchmark local and cloud models, decompose complex coding tasks, and track job progress and costs.
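The search-before-generate idea in the second use case can be sketched as follows. This is a minimal illustration only: it uses plain keyword overlap (Jaccard similarity) as a stand-in for semantic search and does not use Retriv's API or the server's actual implementation. The names `tokenize`, `best_match`, `search_first`, and the `threshold` value are hypothetical.

```python
import re
from typing import Dict, Optional, Tuple


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_match(query: str, index: Dict[str, str]) -> Tuple[Optional[str], float]:
    """Return the indexed snippet name with the highest Jaccard overlap."""
    q = tokenize(query)
    best_name, best_score = None, 0.0
    for name, description in index.items():
        s = tokenize(description)
        union = q | s
        score = len(q & s) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


def search_first(query: str, index: Dict[str, str],
                 threshold: float = 0.3) -> Tuple[str, Optional[str]]:
    """Reuse indexed code when the match is good enough, else generate.

    The threshold is an assumed value chosen for illustration.
    """
    name, score = best_match(query, index)
    if name is not None and score >= threshold:
        return ("reuse", name)
    return ("generate", None)


# Hypothetical index mapping snippet names to short descriptions.
index = {
    "slugify": "convert a title string to a url slug",
    "retry": "retry a function call with exponential backoff",
}
```

With this index, a query that closely matches an existing description is answered from the index, and only unmatched queries fall through to code generation, which is where token costs would be incurred.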

Description

LocalLlama MCP Server analyzes incoming coding tasks and dynamically decides whether to offload them to local instruct models (e.g., via LM Studio or Ollama) or to paid APIs (via OpenRouter), balancing cost against quality. It includes task decomposition, dependency mapping, benchmark-driven model selection, a Retriv-first code search strategy that looks for existing code before generating new code, and a lock mechanism to prevent duplicate server instances. The server also provides job tracking, cost monitoring, and tools for benchmarking and for managing OpenRouter and free-model integrations.
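A rough sketch of what a local-versus-paid routing decision might look like is given below. It is not the server's actual logic: the `Task` and `RoutingConfig` types, the complexity score, the per-token price, and both thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt_tokens: int            # estimated input size in tokens
    expected_output_tokens: int   # estimated output size in tokens
    complexity: float             # hypothetical score, 0.0 (trivial) to 1.0 (hard)


@dataclass
class RoutingConfig:
    paid_cost_per_1k_tokens: float = 0.002  # assumed price in USD
    local_context_limit: int = 8192         # assumed local model context window
    complexity_threshold: float = 0.7       # above this, prefer the paid API


def estimate_paid_cost(task: Task, cfg: RoutingConfig) -> float:
    """Estimate what the task would cost on the paid API, in USD."""
    total_tokens = task.prompt_tokens + task.expected_output_tokens
    return total_tokens / 1000 * cfg.paid_cost_per_1k_tokens


def route(task: Task, cfg: RoutingConfig) -> str:
    """Return "local" or "paid" for a task.

    The task goes to the paid API only when it will not fit in the local
    model's context window or is judged too complex for the local model.
    """
    total_tokens = task.prompt_tokens + task.expected_output_tokens
    if total_tokens > cfg.local_context_limit:
        return "paid"
    if task.complexity > cfg.complexity_threshold:
        return "paid"
    return "local"


cfg = RoutingConfig()
small_task = Task(prompt_tokens=500, expected_output_tokens=200, complexity=0.2)
hard_task = Task(prompt_tokens=500, expected_output_tokens=200, complexity=0.9)
```

Here `route(small_task, cfg)` selects the local model, while `route(hard_task, cfg)` selects the paid API. A benchmark-driven variant would replace the fixed complexity threshold with measured quality and latency figures per model.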

Security

Scanned 4 month(s) ago

Risk Level

MINIMAL

Read-only data retrieval, no side effects

Trust Score

D (44/100)

7/17 checks passed

Scores are informational only and provided “as is” without warranty. AgentHotspot assumes no liability for actions taken based on these ratings.

Quick Stats

Service Type: MCP

Pricing Model: Free

Capabilities: 0 Tools / 0 Prompts / 0 Resources

Owner: Heratiki

Category: Modeling & Simulation

Dependencies: Standalone