Don't want an AI using your personal data?
TokenSoup redacts names, IDs, IBANs, cards and emails before anything is sent to a model.
The safest data is the data that never left your computer.
Runs on your own computer. Does the cheap, deterministic work with scripts, for free, and saves the model for what actually needs thinking.
git status? Deterministic work runs as a local script and reuses the scripts you already trust. No round trip, no cost.
The cheapest token is the one you never spend.
Reject a command and it stays rejected: the model can't bypass you by switching tools or rewording the path.
You stay in control of your own machine.
A coordinator splits the work across parallel agents: a frontier model to plan, cheap or local models for the legwork.
Orchestrate many models, not just pick one.
Any MCP client (Claude Code, Cursor, Cline) can call TokenSoup's tools; TokenSoup can use other MCP servers too.
A good neighbor in your toolchain.
Deterministic work and scripts you already trust shouldn't cost a single token.
Most cloud agents answer "git status" or "is this valid JSON" by shipping your context to a model and paying for the round trip. TokenSoup recognizes those, runs a local routine instead, and only reaches for a model (the right-sized one) when the task genuinely needs reasoning. It runs on your machine, redacts personal data before anything leaves it, and asks before doing anything destructive.
A finished, self-hosted assistant you launch with a double-click and talk to in your browser, terminal, or a chat app. One API key to start. Built for a computer-comfortable person, not just developers.
TokenSoup is not a developer library you import into your code, and it is not trying to compete with those. Those are toolkits for engineers building agent systems. TokenSoup is the finished assistant itself.
Spend nothing where nothing needs spending; spend modestly where a small model suffices; spend fully only on the hard parts.
Deterministic tasks (git status, file counts, JSON checks, format checks) run as built-in indie scripts, not model calls. The smart router recognizes them and runs the local routine automatically. Auto-Indie can even write a reusable script the first time it sees a repeatable task.
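As a rough illustration of the idea, the sketch below matches a request against a few deterministic patterns and runs a local routine instead of calling a model. The names `LOCAL_ROUTINES` and `route`, and the patterns themselves, are hypothetical; this is not TokenSoup's actual router.

```python
import json
import re
import subprocess
from typing import Optional


def git_status(_: str) -> str:
    """Run `git status --short` locally; no model is involved."""
    result = subprocess.run(
        ["git", "status", "--short"], capture_output=True, text=True, check=False
    )
    return result.stdout or result.stderr


def json_validate(request: str) -> str:
    """Check whether the text after the first colon parses as JSON."""
    _, _, payload = request.partition(":")
    try:
        json.loads(payload)
        return "valid JSON"
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"


# Hypothetical pattern table: (regex, local routine).
LOCAL_ROUTINES = [
    (re.compile(r"^\s*git status\s*$", re.IGNORECASE), git_status),
    (re.compile(r"^\s*is this valid json\s*:", re.IGNORECASE), json_validate),
]


def route(request: str) -> Optional[str]:
    """Return a local answer, or None when the request needs a model."""
    for pattern, routine in LOCAL_ROUTINES:
        if pattern.search(request):
            return routine(request)
    return None
```

Anything `route` returns costs no tokens; a `None` result is the signal to fall through to model routing.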
Already have battle-tested glue in Bash, PowerShell, Python, or Node, or a compiled binary? Register it as an indie agent and TokenSoup calls it directly instead of re-implementing, and re-paying for, the same logic in an LLM. Your legacy automation becomes part of the assistant.
When a model is needed, the router classifies the task and routes it: simple text to a cheap/fast model, standard work to a mid model, heavy work to a frontier model. A semantic cache returns repeated answers with no new call.
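The tiering described above might look roughly like this. The tier names, the placeholder model identifiers, and the keyword-and-length heuristic are all assumptions made for illustration; the real classifier is not documented here.

```python
from dataclasses import dataclass

# Hypothetical tier names mapped to placeholder model identifiers.
TIER_MODELS = {
    "simple": "cheap-fast-model",
    "standard": "mid-model",
    "heavy": "frontier-model",
}

# Toy signal for "heavy" work; a real classifier would be far richer.
HEAVY_HINTS = ("refactor", "architecture", "prove", "design")


@dataclass
class Route:
    tier: str
    model: str


def classify(task: str) -> str:
    """Toy heuristic: keyword hints first, then word count."""
    text = task.lower()
    if any(hint in text for hint in HEAVY_HINTS):
        return "heavy"
    if len(text.split()) <= 12:
        return "simple"
    return "standard"


def pick_model(task: str) -> Route:
    """Map a task to its tier and the model configured for that tier."""
    tier = classify(task)
    return Route(tier=tier, model=TIER_MODELS[tier])
```

Keeping the tier-to-model mapping in one table means a single tier can be repointed at a different model without touching the classifier.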
| Scenario | Typical baseline | With TokenSoup |
|---|---|---|
| Git operations | ~500 tok / query | 0 tok · indie |
| Repeated queries | ~1000 tok | 0 tok · cache |
| Simple tasks | ~800 tok · large model | ~200 tok · small model |
| Email triage | ~2000 tok / email | ~200 tok · small model |
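To put the table in relative terms, this small sketch computes the share of tokens avoided per scenario. The figures are copied from the table and are approximate; the function name is ours. Token counts are not the same as cost, since a small model is usually also cheaper per token.

```python
# Figures copied from the table: (baseline tokens, tokens with TokenSoup).
SCENARIOS = {
    "Git operations": (500, 0),
    "Repeated queries": (1000, 0),
    "Simple tasks": (800, 200),
    "Email triage": (2000, 200),
}


def reduction_percent(baseline: int, actual: int) -> float:
    """Share of baseline tokens avoided, as a percentage."""
    return 100.0 * (baseline - actual) / baseline


for name, (baseline, actual) in SCENARIOS.items():
    print(f"{name}: {reduction_percent(baseline, actual):.0f}% fewer tokens")
```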
Safety is layered, on the assumption that a tool which can run commands needs guardrails, not blind trust.
Emails, IDs, IBANs, phone numbers replaced with placeholders before anything is sent to a model.
User messages checked for prompt-injection patterns.
Web and file content passes the same guard before it can influence the model.
Secret markers in context detect exfiltration attempts.
The final response is scanned before it's shown to you.
API keys encrypted at rest (AES-256-GCM); localhost-only by default.
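The first layer in the list, placeholder redaction, can be sketched with a few regular expressions. The patterns and the placeholder format below are illustrative assumptions, not TokenSoup's actual rules; names and national IDs in particular need more than a regex.

```python
import re

# Illustrative patterns only; production-grade detection needs far more care.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("IBAN", re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")),
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,18}\b")),
]


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace matches with numbered placeholders; return text and mapping."""
    mapping: dict[str, str] = {}

    for label, pattern in PATTERNS:
        # Bind the current label so each substitution uses the right prefix.
        def substitute(match: re.Match, label: str = label) -> str:
            placeholder = f"[{label}_{len(mapping) + 1}]"
            mapping[placeholder] = match.group(0)
            return placeholder

        text = pattern.sub(substitute, text)
    return text, mapping
```

The mapping stays on the local machine, so placeholders in a model's reply can be swapped back to the original values before the response is shown.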
Dangerous operations require your explicit approval. Reject a file operation and that path is protected against every tool (bash, write_file, edit_file, python -c, shell redirects) for the rest of the turn. The model can't bypass your rejection by switching tools or rewording the path, and an already-rejected command won't re-prompt until you ask for it again yourself. Destructive writes that would clobber an existing file require confirmation.
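One way to make a rejection stick across tools is to key it on a canonical path rather than on the tool or the exact string the model used. The class below is a minimal sketch under that assumption; `RejectionGuard` and its methods are hypothetical names. It covers explicit path arguments only: catching paths buried in shell redirects or `python -c` strings would also require parsing the command.

```python
import os


class RejectionGuard:
    """Remembers rejected paths for one turn, regardless of which tool asks."""

    def __init__(self) -> None:
        self._rejected: set[str] = set()

    @staticmethod
    def _canonical(path: str) -> str:
        # Collapse "..", symlinks, "~", and case differences so that a
        # reworded path still maps to the same key.
        return os.path.normcase(os.path.realpath(os.path.expanduser(path)))

    def reject(self, path: str) -> None:
        self._rejected.add(self._canonical(path))

    def is_allowed(self, tool: str, path: str) -> bool:
        # The tool name is deliberately ignored: a rejection applies to all tools.
        return self._canonical(path) not in self._rejected

    def end_turn(self) -> None:
        self._rejected.clear()
```

For example, after `guard.reject("notes/secrets.txt")`, both `guard.is_allowed("bash", "./notes/../notes/secrets.txt")` and `guard.is_allowed("write_file", os.path.abspath("notes/secrets.txt"))` return `False` until `end_turn()` is called.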
Unlike a gateway that fans messages out to channels, TokenSoup sends every request down through layers: one engine applies redaction, caching, and routing, then branches to a free local script or a right-sized model, with your approval guarding anything destructive.
TokenSoup speaks the Model Context Protocol in both directions: any MCP client (Claude Code, Cursor, Cline, and others) can call TokenSoup's tools, and TokenSoup can call out to other MCP servers.
A separate, experimental layer: hand it a slow Python function and it benchmarks, proposes a port to Rust/C/Go/COBOL, waits for your approval, compiles, and verifies the result matches. Registering the artifact as an indie agent is a deliberate second step.
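The "benchmark, then verify the result matches" steps can be approximated in plain Python. In this sketch the compiled port is stood in for by a second Python function, and all function names are hypothetical; the real layer compiles to another language.

```python
import timeit


def slow_sum_squares(n: int) -> int:
    """Reference implementation: deliberately naive loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_sum_squares(n: int) -> int:
    """Stand-in for a compiled port; here simply the closed-form formula."""
    return (n - 1) * n * (2 * n - 1) // 6 if n > 0 else 0


def outputs_match(reference, candidate, inputs) -> bool:
    """Verify the candidate agrees with the reference on every sample input."""
    return all(reference(x) == candidate(x) for x in inputs)


def speedup(reference, candidate, arg, repeats: int = 5) -> float:
    """Ratio of best-of-N timings; values above 1 mean the candidate is faster."""
    ref = min(timeit.repeat(lambda: reference(arg), number=10, repeat=repeats))
    new = min(timeit.repeat(lambda: candidate(arg), number=10, repeat=repeats))
    return ref / new if new > 0 else float("inf")
```

A port would only be worth registering when `outputs_match` holds on representative inputs and `speedup` is comfortably above 1.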
A coordinator breaks a complex task into subtasks and runs workers in parallel, and each agent can run a different model, including a local one. Use a frontier model to plan, cheap or local models for the legwork, a strong model to synthesize.
And there's more under the hood: durable multi-step flows that survive a session restart, an autonomous agent mode, scheduled cron tasks, batch processing, a semantic cache, and per-agent security profiles. TokenSoup is built to bend to how you work, not the other way around.
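The plan, fan-out, synthesize shape can be sketched with `asyncio`. Here `call_model` is a stand-in that only echoes its inputs, the plan is hard-coded, and the model identifiers are placeholders; none of these names come from TokenSoup itself.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Subtask:
    description: str
    model: str  # placeholder model identifier


async def call_model(model: str, prompt: str) -> str:
    """Stand-in for a real model call; it only echoes its inputs."""
    await asyncio.sleep(0)  # yield control, as real network I/O would
    return f"[{model}] {prompt}"


async def coordinate(task: str) -> str:
    # 1. Planning step. A frontier model would produce this list; here the
    #    plan is hard-coded as three parts.
    subtasks = [
        Subtask(f"{task}: part {i}", model="cheap-or-local-model")
        for i in (1, 2, 3)
    ]
    # 2. Run the workers concurrently, each on its own model.
    results = await asyncio.gather(
        *(call_model(s.model, s.description) for s in subtasks)
    )
    # 3. Synthesize the partial results with a stronger model.
    return await call_model("strong-model", " | ".join(results))


if __name__ == "__main__":
    print(asyncio.run(coordinate("summarize repo")))
```

Because each `Subtask` carries its own model name, mixing a local model for some workers with a hosted one for others needs no change to the coordinator.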
First run installs dependencies, asks for one API key, and opens the web UI at http://localhost:7070.
    # From the extracted folder
    > cd path\to\tokensoup
    > python soup.py

    $ cd tokensoup
    $ bash install.sh   # or: python3 soup.py
| Flag | What it does |
|---|---|
| python soup.py | Web UI at localhost:7070 (default) |
| --cli | Terminal chat instead of the web UI |
| --reset | Re-enter API keys |
| --check | Check installation state |
| --lang en\|sr | Force English or Serbian |
| --port 8080 | Custom port |
All commands are typed directly in chat. Arguments in [brackets] are optional. Full reference also lives in COMMANDS.md.
| Command | What it does |
|---|---|
| /new [name] | New session |
| /sessions | List all sessions |
| /status | Session status: tokens, model, cost |
| /context | Context window details |
| /clear | Clear conversation history |
| /compact | Compress context |
| /memory | Show MEMORY.md |
| /resume [name] | Resume previous session |
| /model <name> | Switch model |
| /models | List available models |
| /lang <code> | Interface language |
| /config get <key> | Read setting |
| /config set <key> <val> | Set value |
| /config list | All settings |
| /security status | Security layer overview |
| /security pii on\|off | PII redaction |
| /security profile <name> | Security profile (standard/strict/minimal) |
| /vault set <key> <val> | Store encrypted value |
| /vault get <key> | Get encrypted value |
| /keys set <provider> <key> | Set API key |
| /keys list | List keys (masked) |
| /keys remove <provider> | Delete key |
| /keys test <provider> | Test key |
| /agent <task> | Run autonomous agent |
| /agent stop | Stop agent |
| /agent status | Agent status |
| /agent sandbox on\|off | Sandbox isolation |
| /agent capabilities | Allowed tools |
| /hitl on\|off | Require approval before each step |
| /approve | Approve pending step |
| /observe on\|off | Monitor agent activity |
| /router status | Smart router status |
| /router set <tier> <model> | Model for a tier |
| /si <task> | Self-improve agent |
Local scripts that run without API calls.
| Command | What it does |
|---|---|
| /indie list | All registered agents |
| /indie run <name> [args] | Run an agent |
| /indie new <name> | Create template |
| /indie register <name> <path> | Register a script |
| /indie create "<desc>" | LLM generates + registers |
| /indie remove <name> | Remove agent |
| /indie info <name> | Agent details |
Built-in indie agents: git_status, git_diff, git_log, git_commit, file_stats, file_find, json_validate, json_format, system_info, disk_usage, web_scrape, session_summary.

| Command | What it does |
|---|---|
| /cache on\|off | Toggle semantic cache |
| /cache status | Hit rate and stats |
| /cache clear | Clear cache |
| /cache install | Install sentence-transformers (once) |
| /cache threshold <0-1> | Similarity threshold (default 0.85) |
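To show what a similarity threshold does, here is a toy cache that compares queries by cosine similarity. The word-count `embed` function is a deliberate simplification standing in for a sentence encoder, and treating the threshold as inclusive (`>=`) is an assumption.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real cache would use a sentence encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.85) -> None:
        self.threshold = threshold
        self._entries: list[tuple[Counter, str]] = []

    def get(self, query: str) -> Optional[str]:
        """Return the best cached answer at or above the threshold, else None."""
        vector = embed(query)
        best_score, best_answer = 0.0, None
        for cached_vector, answer in self._entries:
            score = cosine(vector, cached_vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query: str, answer: str) -> None:
        self._entries.append((embed(query), answer))
```

With "what is the capital of France" cached, the same question in different casing is a hit, while "what is the capital of Spain" shares five of six words (similarity about 0.83) and falls just under the default 0.85, so it is a miss. Lowering the threshold trades more hits for a higher risk of returning an answer to a different question.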
Batch API costs 50% less; results within 24h.
| Command | What it does |
|---|---|
| /batch create <name> | New batch job |
| /batch add <name> <prompt> | Add item |
| /batch run <name> | Submit |
| /batch status <id> | Check status |
| /batch results <id> | Get results |

| Command | What it does |
|---|---|
| /team create <name> | Create team |
| /team add <team> <agent> | Add agent |
| /team run <name> <task> | Run team task |
| /team status <name> | Team status |
| /team list | All teams |
| /flow start <name> | Start flow |
| /flow list | Active flows |
| /flow resume <id> | Resume paused flow |
| /flow stop <id> | Stop flow |
| /email triage | Triage inbox |
| /email draft <to> <subject> | Draft email |
| /email send <id> | Send drafted email |
| /email config | Email settings |
| /map build | Build AST knowledge graph |
| /map search <query> | Search codebase |
| /map stats | Codebase statistics |
| /map deps <file> | File dependencies |
| /code run <file> | Execute file |
| /code test | Run tests |
| /code lint | Lint codebase |
| /git <command> | Git operations |
| /project init | Initialize project |
| /mcp connect <url> | Connect MCP server |
| /mcp list | Connected servers |
| /mcp tools | Available MCP tools |
| /mcp disconnect <id> | Disconnect server |
| /help [command] | Help, general or specific |
| /doctor | Full diagnostic check |
| /cost | Token cost summary |
| /export | Export session to Markdown |
| /version | TokenSoup version |
You want a capable assistant you launch with a double-click and talk to in a browser, not a CLI tool that assumes you live in a terminal.
You want the assistant on your own machine, with personal data redacted before anything leaves it, and a clear say over destructive actions.
One self-hosted helper for everyday work. An Enterprise module (in development) will add team features, stronger controls, auditable oversight, and the option of a fully local model behind your firewall, all on the same core engine.