🍲 The cheapest token is the one you never spend

A local AI assistant that knows when not to use AI.

Runs on your own computer. Does the cheap, deterministic work with scripts (for free) and saves the model for what actually needs thinking.

Self-hosted: your machine
Multi-LLM: Claude · GPT · Gemini · Groq · Ollama · any OpenAI-compatible provider
Windows · macOS · Linux
Reach it from: browser · terminal · Telegram · Discord
License: Apache-2.0
🔒 Privacy by design

Don't want an AI using your personal data?

TokenSoup redacts names, IDs, IBANs, cards and emails before anything is sent to a model.

The safest data is the data that never left your computer.

⚑0 tokens

Why pay a model to run git status?

Deterministic work runs as a local script, including the scripts you already trust. No round trip, no cost.

The cheapest token is the one you never spend.

🛡️ Human in the loop

Nothing destructive happens without your yes.

Reject a command and it stays rejected: the model can't bypass you by switching tools or rewording the path.

You stay in control of your own machine.

🧠Multi-model teams

One task, the right model for each part.

A coordinator splits the work across parallel agents: a frontier model to plan, cheap or local models for the legwork.

Orchestrate many models, not just pick one.

🔗 Over MCP · beta

Works with your other tools.

Any MCP client (Claude Code, Cursor, Cline) can call TokenSoup's tools, and TokenSoup can use other MCP servers too.

A good neighbor in your toolchain.

The point

Deterministic work and scripts you already trust shouldn't cost a single token.

Most cloud agents answer "git status" or "is this valid JSON?" by shipping your context to a model and paying for the round trip. TokenSoup recognizes requests like these, runs a local routine instead, and reaches for a model (the right-sized one) only when the task genuinely needs reasoning. It runs on your machine, redacts personal data before anything leaves it, and asks before doing anything destructive.

What it is

A finished, self-hosted assistant you launch with a double-click and talk to in your browser, terminal, or a chat app. One API key to start. Built for anyone comfortable with a computer, not just developers.

What it isn't

Not a developer library you import into your code, and not trying to compete with one. Libraries like that are toolkits for engineers building agent systems; TokenSoup is the finished assistant itself.

How it saves tokens

Three ideas, one principle.

Spend nothing where nothing needs spending; spend modestly where a small model suffices; spend fully only on the hard parts.

⚑

Indie Agents

0 tokens

Deterministic tasks (git status, file counts, JSON checks, format checks) run as built-in indie scripts, not model calls. The smart router recognizes them and runs the local routine automatically. Auto-Indie can even write a reusable script the first time it sees a repeatable task.
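A minimal sketch of the recognize-then-run-locally idea, in Python (the language TokenSoup itself is launched with, via soup.py). The handler names, regex patterns, and `route_locally` function are hypothetical, not TokenSoup's actual router; a git_status handler would follow the same shape, shelling out to git instead of parsing JSON.

```python
import json
import re
from pathlib import Path
from typing import Optional


def json_validate(text: str) -> str:
    """Check whether text parses as JSON, without any model call."""
    try:
        json.loads(text)
        return "valid JSON"
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"


def file_count(path: str) -> str:
    """Count regular files under a directory, recursively."""
    root = Path(path or ".")
    count = sum(1 for p in root.rglob("*") if p.is_file())
    return f"{count} files under {root}"


# Hypothetical routing table of (pattern, handler) pairs; the first match wins.
INDIE_ROUTES: list = [
    (re.compile(r"is this valid json\??\s*(?P<arg>.+)", re.I | re.S), json_validate),
    (re.compile(r"count files(?: in)?\s*(?P<arg>.*)", re.I), file_count),
]


def route_locally(message: str) -> Optional[str]:
    """Return a local answer (0 tokens), or None so the caller falls back to a model."""
    for pattern, handler in INDIE_ROUTES:
        match = pattern.fullmatch(message.strip())
        if match:
            return handler(match.group("arg").strip())
    return None
```

Anything that returns None here would continue on to the smart router's model tiers.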

🧩

Reuse what works

your scripts, 0 tokens

Already have battle-tested glue code in Bash, PowerShell, Python, or Node, or a compiled binary? Register it as an indie agent and TokenSoup calls it directly, instead of re-implementing the same logic in an LLM and paying for it again. Your legacy automation becomes part of the assistant.
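One way registration could work, as a sketch: remember which interpreter each file type needs and launch the script with `subprocess`. `register`, `run_indie`, and the in-memory `REGISTRY` are assumptions for illustration; the real `/indie register` command may store and launch agents differently.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Hypothetical in-memory registry: agent name -> command prefix.
REGISTRY: dict[str, list[str]] = {}

# Interpreter per file type; anything else (e.g. a compiled binary) runs directly.
INTERPRETERS: dict[str, list[str]] = {
    ".py": [sys.executable],
    ".sh": ["bash"],
    ".ps1": ["pwsh", "-File"],
    ".js": ["node"],
}


def register(name: str, script: str) -> None:
    """Remember how to launch an existing, already-trusted script."""
    path = Path(script).resolve()
    if not path.is_file():
        raise FileNotFoundError(path)
    prefix = INTERPRETERS.get(path.suffix.lower(), [])
    REGISTRY[name] = [*prefix, str(path)]


def run_indie(name: str, *args: str, timeout: float = 60.0) -> str:
    """Run a registered script locally and return its stdout: no model call."""
    result = subprocess.run(
        [*REGISTRY[name], *args],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # a non-zero exit raises CalledProcessError
    )
    return result.stdout.strip()
```

For example, `register("shout", "shout.py")` followed by `run_indie("shout", "hello")` runs the script with the current Python interpreter and returns whatever it printed.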

🎯

Smart Router

right-sized model

When a model is needed, the router classifies the task and routes it: simple text to a cheap, fast model; standard work to a mid-tier model; heavy work to a frontier model. A semantic cache answers repeated or near-identical queries without a new model call.
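A toy version of tier routing, assuming a keyword-and-length heuristic. The real classifier is not described here, and the tier and model names below are placeholders (in TokenSoup, tiers are set with `/router set <tier> <model>`).

```python
import re
from dataclasses import dataclass

# Placeholder model names, one per tier.
TIERS = {
    "cheap": "small-fast-model",
    "mid": "mid-size-model",
    "frontier": "frontier-model",
}

# Hypothetical keyword hints standing in for the real classifier.
HEAVY_HINTS = {"architecture", "refactor", "prove", "design", "debug"}
SIMPLE_HINTS = {"translate", "summarize", "rephrase", "spellcheck", "classify"}


@dataclass(frozen=True)
class Route:
    tier: str
    model: str


def classify(task: str) -> str:
    """Crude heuristic: whole-word keywords plus length decide the tier."""
    words = re.findall(r"[a-z]+", task.lower())
    if HEAVY_HINTS & set(words) or len(words) > 200:
        return "frontier"
    if SIMPLE_HINTS & set(words) and len(words) < 50:
        return "cheap"
    return "mid"


def route(task: str) -> Route:
    tier = classify(task)
    return Route(tier, TIERS[tier])
```

Matching whole words rather than substrings keeps "improve" from tripping the "prove" hint; a production classifier needs far more than keywords.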

Scenario           Typical baseline          With TokenSoup
Git operations     ~500 tok / query          0 tok · indie
Repeated queries   ~1000 tok                 0 tok · cache
Simple tasks       ~800 tok · large model    ~200 tok · small model
Email triage       ~2000 tok / email         ~200 tok · small model
⚠️ These are estimates, not measured benchmarks. A reproducible benchmark (the same task set run through TokenSoup and a baseline agent, with tokens counted) is in progress. Treat the figures as design targets, not proven results.
Security-first

An assistant on your machine shouldn't be able to quietly do harm.

Safety is layered, on the assumption that a tool which can run commands needs guardrails, not blind trust.

Layer 0 · PII Redaction

Emails, IDs, IBANs, phone numbers replaced with placeholders before anything is sent to a model.

Layer 1 · Injection Scan

User messages checked for prompt-injection patterns.

Layer 2 · External Content Guard

Web and file content passes the same guard before it can influence the model.

Layer 3 · Canary Tokens

Secret markers in context detect exfiltration attempts.

Layers 4 and 5 · Response Validation

The final response is scanned before it's shown to you.

🔐 Encrypted KeyVault

API keys encrypted at rest (AES-256-GCM); localhost-only by default.
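A sketch of two of these layers, assuming simple regexes for layer 0 and a random marker for layer 3. The patterns, placeholder format, and function names are illustrative assumptions, not TokenSoup's detectors, and regexes alone will miss plenty of real PII (names in particular).

```python
from __future__ import annotations

import itertools
import re
import secrets

# Illustrative patterns only; real detection needs checksums, per-country formats, etc.
PII_PATTERNS: dict[str, re.Pattern[str]] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IBAN": re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,4})?\b"),
    "PHONE": re.compile(r"\+\d{1,3}[ -]?\d(?:[ -]?\d){6,12}"),
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Layer 0: swap PII for numbered placeholders; the mapping stays local."""
    mapping: dict[str, str] = {}
    for label, pattern in PII_PATTERNS.items():
        counter = itertools.count(1)

        def substitute(match: re.Match[str], label: str = label, counter=counter) -> str:
            placeholder = f"[{label}_{next(counter)}]"
            mapping[placeholder] = match.group(0)
            return placeholder

        text = pattern.sub(substitute, text)
    return text, mapping


def add_canary(system_prompt: str) -> tuple[str, str]:
    """Layer 3: plant a random marker that should never appear in any output."""
    canary = f"CANARY-{secrets.token_hex(8)}"
    return f"{system_prompt}\n[internal marker, never repeat: {canary}]", canary


def leaks_canary(response: str, canary: str) -> bool:
    """If the marker shows up in a response, context is being exfiltrated."""
    return canary in response
```

Keeping the placeholder-to-value mapping in local memory means real values could be put back into the final answer without the model ever seeing them.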

HITL: Human-in-the-Loop Path Guard

Dangerous operations require your explicit approval. Reject a file operation and that path is protected against every tool (bash, write_file, edit_file, python -c, shell redirects) for the rest of the turn. The model can't bypass your rejection by switching tools or rewording the path, and an already-rejected command won't prompt you again until you request it yourself. Destructive writes that would overwrite an existing file also require confirmation.
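A minimal sketch of a per-turn path guard, assuming paths are compared after resolving `..`, symlinks, and case. `PathGuard` and its methods are hypothetical; TokenSoup's real detection (for example its alias-aware destructive-command checks) is more involved than scanning tokens.

```python
from __future__ import annotations

import os
import re
import shlex
from pathlib import Path

QUOTED = re.compile(r"""['"]([^'"]+)['"]""")


class PathGuard:
    """Hypothetical per-turn guard: once you reject a path, every tool is blocked from it."""

    def __init__(self) -> None:
        self.rejected: set[str] = set()

    @staticmethod
    def _key(path: str) -> str:
        # Resolve "a/../b", relative paths and symlinks, then normalize case on
        # case-insensitive platforms, so rewording the path can't dodge the guard.
        return os.path.normcase(str(Path(path).expanduser().resolve()))

    def reject(self, path: str) -> None:
        self.rejected.add(self._key(path))

    def new_turn(self) -> None:
        self.rejected.clear()

    def allows_path(self, path: str) -> bool:
        """Gate for file tools such as write_file or edit_file."""
        return self._key(path) not in self.rejected

    def allows_command(self, command: str) -> bool:
        """Gate for shell tools: check plain args, redirect targets, and quoted literals."""
        candidates: list[str] = []
        for token in shlex.split(command):
            candidates.append(token.lstrip(">"))      # "> f", ">f" and ">>f" all target f
            candidates.extend(QUOTED.findall(token))  # e.g. paths inside python -c "..."
        return all(self.allows_path(c) for c in candidates if c)
```

Resolving every candidate to one canonical key is what makes rewording useless: `sub/../secret.txt` and `secret.txt` in the same directory collapse to the same entry.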

Under the hood

How a request flows.

Unlike a gateway that fans messages out to channels, TokenSoup sends every request through a stack of layers: one engine applies redaction, caching, and routing, then branches to a free local script or a right-sized model, with your approval guarding anything destructive.

Web UI · CLI · Bots
→ Query Engine: the single point every request flows through
→ 1 · PII redaction: strips data before it leaves
→ 2 · Semantic cache: repeat answers, 0 tokens
→ 3 · Smart router: classifies the task, picks the cheapest path
→ Indie agents (deterministic, 0 tokens) or Model provider (right-sized tier, when needed)
→ HITL approval gate: you approve anything destructive
→ Response to you
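The same flow as a minimal Python sketch. Every stage is injected as a plain function so the ordering is easy to see; `QueryEngine`, its fields, and the choice to cache only model answers are assumptions, not TokenSoup's internals.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class QueryEngine:
    """Hypothetical single entry point; each stage is a pluggable function."""

    redact: Callable[[str], str]                 # 1. PII redaction
    cache_get: Callable[[str], Optional[str]]    # 2. semantic cache lookup
    cache_put: Callable[[str, str], None]
    indie: Callable[[str], Optional[str]]        # 3a. local script, 0 tokens
    model: Callable[[str], str]                  # 3b. right-sized model
    is_destructive: Callable[[str], bool]
    approve: Callable[[str], bool]               # HITL gate: asks you
    log: list[str] = field(default_factory=list)

    def handle(self, message: str) -> str:
        text = self.redact(message)  # nothing below ever sees raw PII
        answer = self.cache_get(text)
        if answer is not None:
            self.log.append("cache")
        else:
            answer = self.indie(text)
            if answer is not None:
                self.log.append("indie")
            else:
                answer = self.model(text)
                self.cache_put(text, answer)  # only model answers are cached
                self.log.append("model")
        if self.is_destructive(answer) and not self.approve(answer):
            return f"Not run without your approval: {answer}"
        return answer
```

Caching only model answers is a deliberate choice in this sketch: indie results such as git status change from minute to minute and cost nothing to recompute.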

Works with your other tools: over MCP (beta)

TokenSoup speaks the Model Context Protocol in both directions: any MCP client (Claude Code, Cursor, Cline, …) can call TokenSoup's tools, and TokenSoup can call out to other MCP servers.

MCP clients (Claude Code · Cursor · Cline · any MCP client) → TokenSoup (MCP server + MCP client) → Other MCP servers (their tools, used by TokenSoup)
Exposed tools: media gen · analyze image · run-with-model (other LLMs) · indie agents · flows
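MCP messages are JSON-RPC 2.0, and a client invokes a server's tool with a `tools/call` request carrying the tool `name` and its `arguments`. The sketch below builds such a request as a single line (the stdio transport is newline-delimited) and reads the text content out of a reply. The tool name `indie_run` is hypothetical, and a real session must first complete the MCP `initialize` handshake.

```python
import itertools
import json

_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def tools_call(name: str, arguments: dict) -> str:
    """Build one MCP tools/call request, serialized as one line for stdio."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request) + "\n"


def text_from_result(line: str) -> str:
    """Extract text content from a tools/call response, or raise on a JSON-RPC error."""
    message = json.loads(line)
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "MCP error"))
    parts = message["result"].get("content", [])
    return "\n".join(p["text"] for p in parts if p.get("type") == "text")
```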

Polyglot: port a slow function to a faster language (alpha)

A separate, experimental layer: hand it a slow Python function and it benchmarks the function, proposes a port to Rust, C, Go, or COBOL, waits for your approval, compiles the port, and verifies that its output matches the original. Registering the resulting artifact as an indie agent is a deliberate second step.

Slow Python (benchmark baseline) → Propose port (Rust · C · Go · COBOL) → You approve (HITL gate) → Compile · verify (same result?) → Register as indie (reuse it, 0 tokens; manual second step)
Experimental / alpha: proof of concept, not production-ready.
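The benchmark-and-verify step, sketched with the standard `timeit` module. The function names are hypothetical, and `ported_sum_squares` is a pure-Python stand-in for what would really be a compiled Rust, C, or Go artifact (loaded, for example, through `ctypes`).

```python
import timeit
from typing import Callable, Iterable


def benchmark(fn: Callable, args: tuple, repeat: int = 5, number: int = 100) -> float:
    """Best-of-N seconds per call; the minimum filters out noise from other processes."""
    times = timeit.repeat(lambda: fn(*args), repeat=repeat, number=number)
    return min(times) / number


def verify(original: Callable, port: Callable, cases: Iterable[tuple]) -> bool:
    """A port only counts if it returns the same result on every test case."""
    return all(original(*case) == port(*case) for case in cases)


def slow_sum_squares(n: int) -> int:
    total = 0
    for i in range(n):
        total += i * i
    return total


def ported_sum_squares(n: int) -> int:
    # Stand-in for the compiled port: closed form of 0^2 + 1^2 + ... + (n-1)^2.
    return (n - 1) * n * (2 * n - 1) // 6
```

A port that fails `verify` should be discarded however fast it is; only a verified port is worth timing against the baseline, for example `benchmark(slow_sum_squares, (10_000,)) / benchmark(ported_sum_squares, (10_000,))` for the speedup.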

Multi-model orchestration: teams of agents

A coordinator breaks a complex task into subtasks and runs workers in parallel, and each agent can run a different model, including a local one. Use a frontier model to plan, cheap or local models for the legwork, and a strong model to synthesize.

Complex task → Coordinator (plans & splits · any model) → Workers in parallel (cheap model · local model, 0 API cost · frontier model) → Synthesizer → answer
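A minimal sketch of the coordinator, worker, and synthesizer pattern using a thread pool, since model calls are I/O-bound. `run_team`, the plan format, and the model names are placeholders for illustration; `call_model` stands in for whichever provider (cloud or local) each worker uses.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_team(
    task: str,
    plan: Callable[[str], list[tuple[str, str]]],
    call_model: Callable[[str, str], str],
    synthesizer: str,
) -> str:
    """Coordinator plans, workers run in parallel, one model synthesizes."""
    subtasks = plan(task)  # [(model for this worker, subtask), ...]
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        # map() returns results in the same order as subtasks.
        results = list(pool.map(lambda job: call_model(*job), subtasks))
    notes = "\n".join(f"- {r}" for r in results)
    return call_model(synthesizer, f"Task: {task}\nWorker notes:\n{notes}")
```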

And there's more under the hood: durable multi-step flows that survive a session restart, an autonomous agent mode, scheduled cron tasks, batch processing, a semantic cache, and per-agent security profiles. TokenSoup is built to bend to how you work, not the other way around.

Get started

Install & run.

The first run installs dependencies, asks for one API key, and opens the web UI at http://localhost:7070.

👆 Easiest (Windows): open the tokensoup folder and double-click TokenSoup.bat (or TokenSoup-PS.cmd). That's it.
…or from a terminal:
Windows
# From the extracted folder
> cd path\to\tokensoup
> python soup.py
Linux
$ cd tokensoup
$ bash install.sh        # or: python3 soup.py
macOS (creates ~/Applications/TokenSoup.app)
$ cd tokensoup
$ bash install.sh        # or: python3 soup.py
Command / flag    What it does
python soup.py    Web UI at localhost:7070 (default)
--cli             Terminal chat instead of the web UI
--reset           Re-enter API keys
--check           Check installation state
--lang en | sr    Force English or Serbian
--port 8080       Custom port
Reference

Every command.

All commands are typed directly in chat. Arguments in [brackets] are optional. The full reference also lives in COMMANDS.md.


💬 Sessions

/new [name]       New session
/sessions         List all sessions
/status           Session status: tokens, model, cost
/context          Context window details
/clear            Clear conversation history
/compact          Compress context
/memory           Show MEMORY.md
/resume [name]    Resume previous session

🤖 Model

/model <name>     Switch model
/models           List available models
/lang <code>      Interface language

⚙️ Configuration

/config get <key>          Read setting
/config set <key> <val>    Set value
/config list               All settings
/security status           Security layer overview
/security pii on|off       PII redaction
/security profile <name>   Security profile (standard/strict/minimal)
/vault set <key> <val>     Store encrypted value
/vault get <key>           Get encrypted value

🔑 API Keys

/keys set <provider> <key>   Set API key
/keys list                   List keys (masked)
/keys remove <provider>      Delete key
/keys test <provider>        Test key

🚀 Agent Mode

/agent <task>                Run autonomous agent
/agent stop                  Stop agent
/agent status                Agent status
/agent sandbox on|off        Sandbox isolation
/agent capabilities          Allowed tools
/hitl on|off                 Require approval before each step
/approve                     Approve pending step
/observe on|off              Monitor agent activity
/router status               Smart router status
/router set <tier> <model>   Model for a tier
/si <task>                   Self-improve agent

⚑ Indie Agents · 0 tokens

Local scripts that run without API calls.

/indie list                     All registered agents
/indie run <name> [args]        Run an agent
/indie new <name>               Create template
/indie register <name> <path>   Register a script
/indie create "<desc>"          LLM generates + registers
/indie remove <name>            Remove agent
/indie info <name>              Agent details
Built-in agents
git_status · git_diff · git_log · git_commit · file_stats · file_find · json_validate · json_format · system_info · disk_usage · web_scrape · session_summary

📊 Semantic Cache

/cache on|off            Toggle semantic cache
/cache status            Hit rate and stats
/cache clear             Clear cache
/cache install           Install sentence-transformers (once)
/cache threshold <0-1>   Similarity threshold (default 0.85)
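How a similarity threshold behaves, sketched with word-count vectors and cosine similarity as a standard-library stand-in for the sentence-transformers embeddings the cache actually installs. `SemanticCache` and `embed` are hypothetical names; only the 0.85 default comes from the command reference.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Stand-in embedding: word counts (the real cache uses sentence-transformers)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.85) -> None:
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def get(self, query: str) -> Optional[str]:
        """Return the answer to the most similar past query, if it is similar enough."""
        vec = embed(query)
        scored = [(cosine(vec, v), answer) for v, answer in self.entries]
        score, answer = max(scored, default=(0.0, None))
        return answer if score >= self.threshold else None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))
```

With this crude embedding, "what is the capital of spain" scores 5/6 ≈ 0.83 against a cached "what is the capital of france": under the 0.85 default it misses, as it should, but at 0.8 it would wrongly be served "Paris". That trade-off is what `/cache threshold` tunes.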

📋 Batch Processing

The batch API costs 50% less, and results arrive within 24 hours.

/batch create <name>         New batch job
/batch add <name> <prompt>   Add item
/batch run <name>            Submit
/batch status <id>           Check status
/batch results <id>          Get results

👥 Teams (Multi-Agent)

/team create <name>        Create team
/team add <team> <agent>   Add agent
/team run <name> <task>    Run team task
/team status <name>        Team status
/team list                 All teams

💼 Flows (Durable Tasks)

/flow start <name>   Start flow
/flow list           Active flows
/flow resume <id>    Resume paused flow
/flow stop <id>      Stop flow

📧 Email Pipeline · optional

/email triage                 Triage inbox
/email draft <to> <subject>   Draft email
/email send <id>              Send drafted email
/email config                 Email settings

🗺️ CodeMap

/map build            Build AST knowledge graph
/map search <query>   Search codebase
/map stats            Codebase statistics
/map deps <file>      File dependencies

🔧 Development

/code run <file>   Execute file
/code test         Run tests
/code lint         Lint codebase
/git <command>     Git operations
/project init      Initialize project

🔗 MCP · beta

/mcp connect <url>     Connect MCP server
/mcp list              Connected servers
/mcp tools             Available MCP tools
/mcp disconnect <id>   Disconnect server

💡 Other

/help [command]   Help: general or specific
/doctor           Full diagnostic check
/cost             Token cost summary
/export           Export session to Markdown
/version          TokenSoup version

Who it's for

Made for people, not just terminals.

🧑‍💻 Computer-comfortable, not a coder

You want a capable assistant you launch with a double-click and talk to in a browser β€” not a CLI tool that assumes you live in a terminal.

🔒 Privacy-minded

You want the assistant on your own machine, with personal data redacted before anything leaves it, and a clear say over destructive actions.

🏒

Small teams Enterprise Β· in development

One self-hosted helper for everyday work. An Enterprise module (in development) will add team features, stronger controls, auditable oversight, and the option of a fully local model behind your firewall β€” all on the same core engine.

Project status & honesty

  • The HITL security path is well tested: path guard, repeated-rejection blocking, .NET/alias-aware destructive-command detection, and destructive-write detection all have dedicated test coverage.
  • The token-savings numbers are estimates, not yet a published reproducible benchmark.
  • Some areas are still being hardened: test-suite consolidation, unified logging, and a duplicated tool implementation scheduled for 0.15.1.
  • The security model has not yet had an independent third-party audit. For high-stakes deployments, commission one before relying on it.