Save every token.
Power your team.

Cut 40-70% of your AI token costs with on-device optimization — then give your whole team full visibility with Terse Cloud. Token analytics by developer, project, and tool.

First Launch — macOS Security Step (one-time only)

macOS blocks unsigned apps by default. Pick whichever method works for you:

A System Settings — easiest, no Terminal needed
1. Open Terse — click OK on the security warning
2. Open System Settings → Privacy & Security
3. Scroll to the Security section — click "Open Anyway" next to Terse
4. Confirm in the popup — done. Terse opens normally from now on.
B Right-click to open: works on macOS Sonoma (14) and earlier
1. In Finder, right-click (or Control+click) on Terse.app
2. Choose Open from the menu
3. Click Open in the dialog that appears
Note: macOS Sequoia (15+) removed this option — use Method A instead.
C Terminal command — one command, works on all versions
Drag Terse to /Applications, then paste this in Terminal:
xattr -cr /Applications/Terse.app && /Applications/Terse.app/Contents/MacOS/terse 2>/dev/null &

Terse is a macOS & iOS token optimization tool that reduces AI prompt costs by 40-70% through 35+ intelligent compression techniques — including git diff compression, prompt caching awareness, conversation history summarization, model-specific optimization, duplicate tool call detection, redundant file read flagging, and real-time per-turn cost tracking across Claude Code, Cursor, ChatGPT, OpenClaw, and Aider. Built on research from LLMLingua (EMNLP 2023), Norvig spelling correction, and selective context pruning.

Updated March 2026 · v1.2.0

ChatGPT · Claude Code · OpenClaw · Cursor · Aider · VS Code · Safari · Chrome · Claude.ai · Gemini · Windsurf · Copilot

Optimize every message

Intercepts prompts and agent commands before they hit the API — strips filler, fixes typos, shortens phrases, compresses verbose text. 35+ techniques including output compression: log deduplication, stack trace collapsing, JSON schema extraction, and terminal noise stripping. 40-80% fewer input tokens.

Monitor agent sessions

Auto-detects Claude Code, OpenClaw, Cursor, and Aider. Tails JSONL session logs in real time — tracking input/output tokens, cache efficiency, tool call overhead, duplicate calls, redundant file reads, and per-turn cost with model-aware pricing.

Eliminate wasted tokens

Detects duplicate tool calls, flags files read multiple times, estimates compressible tool results (git 85%, tests 90%, builds 80%), and catches typos that cause costly retry loops. See how to cut AI API costs by 60-80%.

Git diff compression

Detects git diff output and compresses to only changed lines + 1 line of context. Typical 70% reduction — critical for Claude Code and Cursor workflows where diffs dominate tool output.
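
A minimal sketch of the idea, not Terse's actual implementation: keep file and hunk headers plus every changed line, and drop unchanged lines that sit more than one line away from a change. The function name `compressDiff` and its `context` parameter are hypothetical.

```javascript
// Keep headers, changed lines, and `context` unchanged lines around each change.
function compressDiff(diffText, context = 1) {
  const lines = diffText.split('\n');
  const keep = new Set();
  lines.forEach((line, i) => {
    const isFileHeader =
      line.startsWith('diff --git') || line.startsWith('--- ') || line.startsWith('+++ ');
    const isHunkHeader = line.startsWith('@@');
    const isChange = (line.startsWith('+') || line.startsWith('-')) && !isFileHeader;
    if (isFileHeader || isHunkHeader) keep.add(i);
    if (isChange) {
      // Mark the change itself and its neighbours within the context window.
      for (let j = i - context; j <= i + context; j++) {
        if (j >= 0 && j < lines.length) keep.add(j);
      }
    }
  });
  return lines.filter((_, i) => keep.has(i)).join('\n');
}
```

One known limitation of this sketch: a removed line whose own content starts with `-- ` shows up as `--- ` in the diff and would be treated as a file header. A real parser would track hunk boundaries instead of relying on prefixes.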

Prompt caching awareness

Detects stable prefixes and repeated content blocks that benefit from prompt caching — the #1 cost reduction technique in 2026. Saves 50-90% on repeated context across agent turns.
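
As a rough illustration of "stable prefix" detection, the sketch below finds the longest character prefix shared by a series of request bodies and reports how much of each request it covers. The helper names are hypothetical, and providers cache at token or block granularity, so character counts are only an approximation.

```javascript
// Longest prefix shared by every request body (character-level).
function stablePrefix(requests) {
  if (requests.length === 0) return '';
  let prefix = requests[0];
  for (const request of requests.slice(1)) {
    let i = 0;
    while (i < prefix.length && i < request.length && prefix[i] === request[i]) i++;
    prefix = prefix.slice(0, i);
  }
  return prefix;
}

// Share of each request (0..1) that is covered by the stable prefix.
function cacheableShare(requests) {
  const prefixLength = stablePrefix(requests).length;
  return requests.map((r) => (r.length === 0 ? 0 : prefixLength / r.length));
}

const turns = [
  'SYSTEM: You are a code reviewer.\nUSER: check auth.js',
  'SYSTEM: You are a code reviewer.\nUSER: check token.js',
];
console.log(JSON.stringify(stablePrefix(turns)), cacheableShare(turns));
```

A high share suggests the leading block is worth marking as cacheable; a low share suggests the prompt is being reordered or rewritten between turns.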

Conversation history summarization

Auto-summarizes old turns in long agent sessions: "User: A B C, Assistant: X Y Z" → "Earlier: Discussed A, concluded X." Massive savings in multi-turn workflows — up to 80% on history context.

Model-specific optimization

Claude handles implicit reasoning — Terse compresses more aggressively. GPT-4 needs explicit instructions — Terse preserves structure. Automatic model detection adapts compression strategy per provider.

Agent Token Optimization

Every turn optimized,
automatically.

A single agent task can consume 50x more tokens than a chat message. Terse runs 5 parallel optimization strategies: compressing your prompts, detecting duplicate tool calls, flagging redundant file reads, estimating compressible tool results, and auto-generating CLAUDE.md rules that teach agents to waste fewer tokens next session.

  • Compresses every user message before it hits the API
  • Detects duplicate tool calls — same tool + input = wasted tokens
  • Flags files read multiple times (~800 tok wasted per re-read)
  • Estimates compressible tool results (Read: 60%, Grep: 40%)
  • Tracks unused tools — ~300 tok overhead per unused tool per call
  • Generates CLAUDE.md rules from session patterns for future savings
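
The "same tool + input" check above can be sketched as follows. This is an assumption-laden illustration: the `{ tool, input }` record shape and the function name are hypothetical, and the 800-token figure is the rough per-re-read estimate quoted in the list.

```javascript
// Flag calls whose tool name and input exactly match an earlier call.
function findDuplicateToolCalls(calls) {
  const firstSeen = new Map();
  const duplicates = [];
  calls.forEach((call, index) => {
    // JSON.stringify is key-order sensitive, which is acceptable for a sketch.
    const key = `${call.tool}:${JSON.stringify(call.input)}`;
    if (firstSeen.has(key)) {
      duplicates.push({ index, firstIndex: firstSeen.get(key), tool: call.tool });
    } else {
      firstSeen.set(key, index);
    }
  });
  return duplicates;
}

const TOKENS_PER_REREAD = 800; // rough estimate quoted in the list above

const session = [
  { tool: 'Read', input: { path: 'src/main.js' } },
  { tool: 'Grep', input: { pattern: 'optimize' } },
  { tool: 'Read', input: { path: 'src/main.js' } },
];
const dupes = findDuplicateToolCalls(session);
const wastedTokens = dupes.filter((d) => d.tool === 'Read').length * TOKENS_PER_REREAD;
console.log(dupes, wastedTokens);
```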
Claude Code · OpenClaw · Aider · Cursor Agent
Agent Session — Live Optimization
1. You type in agent session
2. Terse optimizes (compress + dedup + fix): Typo Correction → Context Dedup → Filler Removal → Prompt Compression → History Trimming → Imperative Rewrite → Final Cleanup
3. Optimized prompt auto-sent to agent
4. Agent runs while Terse tracks everything
5. Cumulative savings with a per-turn cost breakdown: Turn, Input tok, Output tok, Cache, Tools, Tok saved, Typos, Cost
 
Works in seconds

Download. Click connect.
Everything just works.

No config files. No API keys to paste. No terminal commands. Open Terse, see your running agent, click Connect — optimization starts immediately. Every message compressed, every session tracked, every wasted token flagged.

  • Auto-detects Claude Code, Cursor Agent, Aider, OpenClaw via process scan
  • One click to connect — optimization starts on the next message
  • Live popup bar shows savings on every single turn
  • Works with existing sessions — no restart required
Claude Code · Cursor Agent · Aider · OpenClaw
Why Terse

Terse vs RTK & other
token libraries

RTK (Reduce Token Kit) and similar libraries offer basic prompt trimming. Terse goes far beyond with agent-aware optimization, real-time monitoring, and output compression — techniques RTK doesn't support.

Feature | Terse | RTK / Others
Optimization techniques | 35+ | 5-10
Tool output compression | ✓ RTK-style + more | Basic
Git diff compression | ✓ 70% reduction |
Prompt caching awareness | ✓ 50-90% on repeats |
Agent session monitoring | ✓ Real-time |
Duplicate tool call detection | |
History summarization | ✓ Auto-compress |
Model-specific optimization | ✓ Claude / GPT modes |
Multilingual support | ✓ 11 languages | English only
Spellcheck pipeline | ✓ Dict + Norvig + macOS | Basic / None
Code protection | ✓ Backtick-aware | Partial
Token Exchange marketplace | ✓ Buy/sell tokens |
On-device / privacy | ✓ 100% local | Varies

RTK = Reduce Token Kit. Comparison based on publicly documented features as of March 2026.

Works Everywhere

Runs on
any app, automatically.

Connect Terse to Chrome, Cursor, VS Code, OpenClaw, or any terminal — it auto-detects text fields via macOS Accessibility and agent sessions via process scanning. No plugins to install. Agent sessions are detected automatically every 5 seconds.

  • Auto-detects agent processes — Claude Code, OpenClaw, Cursor, Aider
  • 7-stage pipeline runs on every prompt and agent command
  • Code blocks, URLs, inline code all protected · on-device · zero latency
Pipeline — Live
1. Spell Correction
2. Whitespace
3. Pattern Optimization
4. Redundancy Elimination
5. NLP Analysis
6. Aggressive Compression
7. Final Cleanup
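
To illustrate how a staged pipeline can leave code and URLs untouched, here is a small sketch that masks protected spans, runs a few stand-in stages, then restores the spans. Everything here is hypothetical: the stage functions, the `runPipeline` name, and the placeholder scheme are illustrative assumptions, and only three simplified stages are shown rather than the seven listed above.

```javascript
// Fenced code, inline code, and URLs are masked before any stage runs.
const PROTECTED = /`{3}[\s\S]*?`{3}|`[^`\n]+`|https?:\/\/\S+/g;

function runPipeline(text, stages) {
  const saved = [];
  // Swap each protected span for a NUL-delimited placeholder.
  const masked = text.replace(PROTECTED, (match) => {
    saved.push(match);
    return `\u0000${saved.length - 1}\u0000`;
  });
  // Apply each stage in order to the masked text.
  const processed = stages.reduce((acc, stage) => stage(acc), masked);
  // Put the original spans back.
  return processed.replace(/\u0000(\d+)\u0000/g, (_, i) => saved[Number(i)]);
}

const stages = [
  (t) => t.replace(/[ \t]{2,}/g, ' '),         // whitespace normalization
  (t) => t.replace(/\bin order to\b/gi, 'to'), // one example phrase-shortening rule
  (t) => t.trim(),                             // final cleanup
];

console.log(runPipeline('Run  `npm  test`  in order to check https://example.com/a  now', stages));
```

Masking first means individual stages do not need to know about code or URLs, at the cost of one extra pass over the text.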
Live Token Optimization

Every message
optimized live.

As you type, Terse rewrites your prompt in real time — fixing typos, stripping filler, compressing verbose phrasing. But it doesn't stop there: when agents take actions, Terse optimizes their commands too. Every token saved — from your prompts and agent operations — means lower cost and better responses.

  • Rewrites prompts and agent commands before they're sent
  • Context-aware: "what souls I do" → "what should I do"
  • Optimizes agent tool calls, file reads, and context passing
  • Safe: skips ALL-CAPS, Capitalized, code tokens
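
The dictionary stage of such a spell corrector can be sketched in the style of Norvig's edit-distance-1 approach. This is an illustrative sketch, not Terse's implementation: the tiny `WORD_FREQ` table and the function names are assumptions. Note that a real-word error such as "souls" for "should" cannot be caught this way, because "souls" is itself a valid word; that case needs the context-aware step.

```javascript
// Toy frequency table; a real corrector would load a large word list.
const WORD_FREQ = new Map([
  ['should', 900], ['authentication', 120], ['token', 300], ['refresh', 150], ['the', 5000],
]);
const LETTERS = 'abcdefghijklmnopqrstuvwxyz';

// All strings one edit away: deletion, transposition, replacement, insertion.
function edits1(word) {
  const out = new Set();
  for (let i = 0; i <= word.length; i++) {
    const left = word.slice(0, i);
    const right = word.slice(i);
    if (right) out.add(left + right.slice(1));
    if (right.length > 1) out.add(left + right[1] + right[0] + right.slice(2));
    for (const c of LETTERS) {
      if (right) out.add(left + c + right.slice(1));
      out.add(left + c + right);
    }
  }
  return out;
}

function correctWord(word) {
  // Skip ALL-CAPS, Capitalized, and code-like tokens (anything not plain lowercase).
  if (!/^[a-z]+$/.test(word)) return word;
  if (WORD_FREQ.has(word)) return word;
  let best = word;
  let bestFreq = 0;
  for (const candidate of edits1(word)) {
    const freq = WORD_FREQ.get(candidate) || 0;
    if (freq > bestFreq) { best = candidate; bestFreq = freq; }
  }
  return best;
}

function correctText(text) {
  return text.replace(/[A-Za-z_][A-Za-z0-9_]*/g, (w) => correctWord(w));
}

console.log(correctText('the authetication tokne API_KEY Tokne'));
```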
Token Optimization — Live
Typos: Dict · Norvig · Context · macOS Spellcheck
35+ Techniques

Every wasted
token found.

Filler removal, question-to-imperative, Jaccard deduplication, telegraph compression — each technique targets a different source of token waste. Applied to your prompts and agent commands alike.

  • 130+ phrase-shortening rules for prompts and agent messages
  • Semantic dedup — catches repeated context across agent turns
  • Tool result compression — flags large Read/Grep results for trimming
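
Jaccard deduplication compares two chunks by the overlap of their word sets: |A ∩ B| / |A ∪ B|. A minimal sketch, assuming word-level tokens and an illustrative threshold of 0.8 (both choices are assumptions, not documented Terse settings):

```javascript
// Jaccard similarity over lowercase word sets.
function jaccard(a, b) {
  const setA = new Set(a.toLowerCase().match(/\w+/g) || []);
  const setB = new Set(b.toLowerCase().match(/\w+/g) || []);
  if (setA.size === 0 && setB.size === 0) return 1;
  let intersection = 0;
  for (const word of setA) if (setB.has(word)) intersection++;
  return intersection / (setA.size + setB.size - intersection);
}

// Keep a chunk only if it is not a near-duplicate of one already kept.
function dedupe(chunks, threshold = 0.8) {
  const kept = [];
  for (const chunk of chunks) {
    if (!kept.some((k) => jaccard(k, chunk) >= threshold)) kept.push(chunk);
  }
  return kept;
}

console.log(dedupe([
  'the refresh endpoint returns 401',
  'The refresh endpoint returns 401!',
  'run the tests',
]));
```

Word-set overlap ignores order and repetition, which makes it cheap but blunt; it will treat reordered sentences as identical.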
Techniques — Live

Redundant Read Detection

Question → Imperative

Duplicate Tool Call

Semantic Dedup

Filler Removal

Unused Tool Overhead

Three Modes

You control
how much to save.

Different contexts need different levels. Soft for careful prompts, Normal for everyday chat, Aggressive for agent sessions where a single task can burn through thousands of tokens.

  • Soft: Typo-fix + whitespace only. Perfect for critical prompts.
  • Normal: Strips filler, hedging, and meta-language. Best for chat.
  • Aggressive: Max compression + telegraph style. Built for agent sessions.
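
The difference between the three modes can be sketched with a few regular expressions. This is a simplified illustration under stated assumptions: the `optimize` function name is hypothetical, typo fixing is left out, Normal covers only filler words, and Aggressive only adds article removal.

```javascript
const FILLERS = [
  'just', 'really', 'very', 'actually', 'basically', 'literally',
  'honestly', 'perhaps', 'maybe', 'probably', 'simply',
];

function optimize(text, mode = 'normal') {
  // Soft (in this sketch): whitespace normalization only.
  let out = text.replace(/\s+/g, ' ').trim();
  if (mode === 'soft') return out;

  // Normal: also strip filler words and the whitespace after them.
  const fillerPattern = new RegExp(`\\b(${FILLERS.join('|')})\\b\\s*`, 'gi');
  out = out.replace(fillerPattern, '').trim();
  if (mode === 'normal') return out;

  // Aggressive: telegraph style, dropping articles as well.
  return out.replace(/\b(the|a|an)\b\s*/gi, '').replace(/\s{2,}/g, ' ').trim();
}

for (const mode of ['soft', 'normal', 'aggressive']) {
  console.log(mode, '→', optimize('I  just really need   the answer', mode));
}
```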
Mode Comparison
Soft
Normal
Aggr
"I was just wondering if you could perhaps help me understand how to implement a binary search tree in Python please?"
22 tok
0%
Agent Monitor

See everything
your agent does.

Terse auto-detects Claude Code, OpenClaw, Aider, and Cursor Agent via process scanning every 5 seconds. It tails JSONL session logs in real time with model-aware pricing (Opus, Sonnet, Haiku) — giving you full visibility into where every token goes.

  • Live token tracking: input, output, cache reads, context fill %
  • Model-aware cost estimation — Opus/Sonnet/Haiku pricing built in
  • Duplicate tool call detection + redundant file read alerts
  • Unused tool overhead tracking (~300 tok per unused tool per call)
  • Context fill meter — warns at 60% and alerts at 85% to run /compact
  • Auto-generates CLAUDE.md optimization rules from session patterns
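
The context fill thresholds and per-model cost estimate above reduce to simple arithmetic. A hedged sketch: the function names and the 200K default window are assumptions for illustration, and the price table is an assumed example (Opus at $15 input / $75 output and Sonnet at $3 input / $15 output per million tokens), so check your provider's current price list.

```javascript
// Thresholds from the list above: warn at 60% fill, alert at 85%.
function contextFillStatus(usedTokens, windowTokens = 200000) {
  const fill = usedTokens / windowTokens;
  if (fill >= 0.85) return { fill, level: 'alert', hint: 'run /compact' };
  if (fill >= 0.6) return { fill, level: 'warn', hint: null };
  return { fill, level: 'ok', hint: null };
}

// USD per million tokens (assumed example prices, not authoritative).
const PRICES = {
  opus: { input: 15, output: 75 },
  sonnet: { input: 3, output: 15 },
};

function estimateCost(model, inputTokens, outputTokens) {
  const price = PRICES[model];
  if (!price) throw new Error(`no price configured for ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1e6;
}

console.log(contextFillStatus(130000), estimateCost('opus', 200000, 0));
```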
Claude Code · OpenClaw · Aider · Cursor Agent
Auto Model Routing

Opus billed.
Sonnet delivered.

Terse runs a local proxy on port 7860 that intercepts every API call. Simple tasks (short prompts, lookups, edits) are silently rerouted from Opus ($15/MTok) to Sonnet ($3/MTok). Complex tasks stay on Opus. Each rerouted request costs 80% less per input token, with zero code changes.

  • Complexity scoring — short prompts, lookups, edits → Sonnet
  • Architecture, security reviews, deep refactors → stay on Opus
  • Transparent — zero changes to Claude Code, Cursor, or Codex
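
A complexity-based router can be as simple as a keyword check plus a length cutoff. The sketch below is an assumption, not Terse's scoring: the hint list, the 400-character cutoff, and the `'opus'` / `'sonnet'` labels (which are not API model identifiers) are all illustrative.

```javascript
// Keywords that suggest a task should stay on the larger model (assumed list).
const COMPLEX_HINTS = /\b(architecture|security review|refactor)/i;

function chooseModel(prompt, maxSimpleChars = 400) {
  if (COMPLEX_HINTS.test(prompt)) return 'opus';
  return prompt.length <= maxSimpleChars ? 'sonnet' : 'opus';
}

// Fraction of input-token spend saved when `routedShare` of tokens move to the cheaper model.
function inputSavings(routedShare, opusPrice = 15, sonnetPrice = 3) {
  return routedShare * (1 - sonnetPrice / opusPrice);
}

console.log(chooseModel('rename this variable'), inputSavings(0.5));
```

With these prices the 80% figure applies only to tokens that are actually rerouted: if half of the input tokens are routed to Sonnet, the overall input saving is 40%.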
Claude Code · Cursor · Codex · Any OpenAI-compat client
Benchmarks

Tested on
real sessions.

Tested on real ChatGPT prompts, Claude Code agent sessions, and multi-turn agent workflows. Clean technical prompts pass untouched. Verbose prompts and agent messages see 40-70% reduction. Combined with tool overhead savings, total session reduction reaches 30-60%.

  • Benchmarked across manual prompts, agent turns, and tool calls
  • Clean prompts correctly return 0% — no false changes
  • Savings compound: 5-turn agent session saves 200-400+ tokens
Benchmarks — Aggressive Mode
Agent: mixed typos + filler | -64%
Agent: verbose debug prompt | -60%
Claude Code: typo-heavy | -51%
OpenClaw: chatty request | -46%
Tool overhead (unused) | -35%
Repeated context (dedup) | -28%
Clean technical | 0%
See the difference

Real outputs, real savings.

Side-by-side comparison on actual prompts and agent commands.

find . -name "*.rs"
cargo test
git diff
git log
verbose prompt
agent debug
find . -name "*.rs" ~276 tokens
./target/debug/build/serde_core-.../out/private.rs
./target/debug/build/libsqlite3-sys-.../out/bindgen.rs
./target/debug/build/serde-.../out/private.rs
./src/ls.rs
./src/local_llm.rs
./src/learn/detector.rs
./src/learn/report.rs
./src/learn/mod.rs
./src/discover/registry.rs
./src/discover/provider.rs
./src/discover/report.rs
./src/discover/mod.rs
./src/wget_cmd.rs
./src/npm_cmd.rs
./src/cargo_cmd.rs
./src/ccusage.rs
./src/config.rs
./src/lint_cmd.rs
./src/curl_cmd.rs
./src/prisma_cmd.rs
./src/cc_economics.rs
./src/find_cmd.rs
./src/gain.rs
./src/git.rs
... 49 files total
Terse optimized ~149 tokens -46%
49F 4D:

src/ cargo_cmd.rs cc_economics.rs
  ccusage.rs config.rs container.rs
  curl_cmd.rs deps.rs diff_cmd.rs
  display_helpers.rs env_cmd.rs
  filter.rs find_cmd.rs gain.rs
  gh_cmd.rs git.rs grep_cmd.rs
  init.rs json_cmd.rs lint_cmd.rs
  local_llm.rs log_cmd.rs ls.rs
  main.rs ...
src/discover/ mod.rs provider.rs
  registry.rs report.rs
src/learn/ detector.rs mod.rs report.rs
src/parser/ error.rs formatter.rs
  mod.rs types.rs
cargo test ~4,823 tokens
   Compiling serde v1.0.210
   Compiling serde_json v1.0.128
   Compiling tokio v1.40.0
   Compiling reqwest v0.12.7
   Compiling my-project v0.1.0 (/Users/dev/project)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 45.23s
     Running unittests src/main.rs (target/debug/deps/my_project-a1b2c3d4)

running 262 tests
test config::tests::test_default_config ... ok
test config::tests::test_parse_env ... ok
test config::tests::test_merge_configs ... ok
test git::tests::test_parse_diff ... ok
test git::tests::test_status_parse ... ok
... 257 more tests
test result: ok. 262 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 3.41s
Terse optimized ~38 tokens -99%
cargo test: 262 passed, 0 failed
Built 5 crates in 45.23s
All tests green.
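
A summary like the one above can be approximated by matching the `test result:` line and keeping only failing tests. This is a minimal sketch with a hypothetical function name; it does not reproduce the build-time line, and it passes unrecognized output through unchanged.

```javascript
// Collapse `cargo test` output to a one-line summary plus any failing tests.
function summarizeCargoTest(output) {
  const result = output.match(/test result: (\w+)\. (\d+) passed; (\d+) failed/);
  if (!result) return output; // unknown format: leave untouched
  const failures = output
    .split('\n')
    .filter((line) => /^test .* \.\.\. FAILED$/.test(line));
  const summary = `cargo test: ${result[2]} passed, ${result[3]} failed`;
  return [summary, ...failures].join('\n');
}

const sample = [
  'running 2 tests',
  'test config::tests::test_default_config ... ok',
  'test git::tests::test_parse_diff ... FAILED',
  'test result: FAILED. 1 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.01s',
].join('\n');
console.log(summarizeCargoTest(sample));
```

Keeping failure lines verbatim matters more than the headline count, since those are what an agent needs to act on.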
git diff ~1,240 tokens
diff --git a/src/optimizer.js b/src/optimizer.js
index 3a4b5c6..7d8e9f0 100644
--- a/src/optimizer.js
+++ b/src/optimizer.js
@@ -142,8 +142,12 @@ class PromptOptimizer {
     // Remove filler words from text
-    const fillers = ['just', 'really', 'very', 'actually'];
+    const fillers = ['just', 'really', 'very', 'actually',
+      'basically', 'literally', 'honestly', 'perhaps',
+      'maybe', 'probably', 'simply'];
     for (const f of fillers) {
-      text = text.replace(new RegExp(`\\b${f}\\b`, 'gi'), '');
+      text = text.replace(
+        new RegExp(`\\b${f}\\b\\s*`, 'gi'), ''
+      );
     }
@@ -165,3 +169,8 @@ class PromptOptimizer {
+  telegraphCompress(text) {
+    return text.replace(/\b(the|a|an)\b/gi, '')
+               .replace(/\s{2,}/g, ' ')
+               .trim();
+  }
Terse optimized ~310 tokens -75%
src/optimizer.js: 2 hunks, +11 -2

L142: expanded fillers list
  (7 new: basically, literally, honestly,
   perhaps, maybe, probably, simply)
  + regex now strips trailing whitespace

L169: added telegraphCompress()
  strips articles (the/a/an), collapses spaces
git log --oneline -20 ~520 tokens
a1b2c3d Fix authentication module token refresh handling
e4f5g6h Update README with new API documentation
i7j8k9l Refactor optimizer pipeline for better performance
m0n1o2p Add spellcheck integration with macOS NSSpellChecker
q3r4s5t Fix duplicate tool call detection edge case
u6v7w8x Update dependencies: electron 40, node 22
y9z0a1b Add aggressive mode telegraph compression
c2d3e4f Fix session manager reconnection bug
g5h6i7j Implement CLAUDE.md rule generation from patterns
k8l9m0n Add context fill meter warning at 60% and 85%
o1p2q3r Fix AX read fallback for Electron editors
s4t5u6v Add Jaccard similarity dedup for agent turns
w7x8y9z Refactor capture module for multi-window support
a0b1c2d Fix popup window focus stealing on macOS
e3f4g5h Add model-aware pricing for Opus/Sonnet/Haiku
i6j7k8l Update landing page hero animation
m9n0o1p Fix shell hook installation path resolution
q2r3s4t Add RTK-style output compression techniques
u5v6w7x Fix cache efficiency tracking in agent monitor
y8z9a0b Initial commit
Terse optimized ~185 tokens -64%
20 commits, 3 authors

Recent: auth token fix, README update,
optimizer refactor, spellcheck integration,
dedup edge case fix

Themes: optimizer (4), agent monitor (3),
bug fixes (6), features (5), docs (2)
User prompt ~52 tokens
I was just wondering if you could perhaps maybe help me understand how to implement a binary search tree in Python? I'm not really sure about the best approach to take here and I would really appreciate any guidance you could provide please.
Terse optimized ~11 tokens -79%
Implement binary search tree in Python. Show best approach.
Agent debug prompt ~68 tokens
I don't know if this makes sense but the authetication module is broken again. Could you maybe look into it and try to figure out why the tokne refresh isn't working properly? Like I mentioned earlier, the refresh endpoint keeps returning 401 errors and I really need this fixed as soon as possible please.
Terse optimized ~18 tokens -74%
Fix authentication module: token refresh returns 401. Debug refresh endpoint. Priority: high.

No AI tool offers unlimited usage.

Even at $200/mo, every tool has caps. Terse compresses prompts so your limits stretch further — and Terse Cloud gives your team full visibility into exactly where tokens are going.

A typical 2h coding session with an AI agent: ~60 CLI commands run, ~210K tokens of prompt + CLI noise, ~23K tokens with Terse (89% less).

Without Terse, CLI output and verbose prompts alone can overflow a 200K context window. Based on avg 3,500 tokens/command measured across real coding sessions.

Every tool has limits

Terse stretches every plan further.

No matter which AI tool your team uses, token costs add up fast. Terse compresses what goes in — and Terse Cloud gives you team-wide analytics to track spend by developer, project, and tool.

Claude Code (Terminal)
Price: $20 to $200/mo
Limits: ~45 msgs/5h (Pro), 5-20x on Max
Context: 200K tokens
Sessions ~3x longer
Even Max $200/mo (20x Pro) has weekly caps (240-480h). Quota resets every 5h. Terse compresses prompts and CLI outputs by avg 89%, so each message carries less noise and your quota stretches ~3x.
Cursor (IDE)
Price: $20 to $200/mo
Limits: $20 credits/mo (Pro), ~225 Claude reqs
Context: Up to 200K (Max mode)
Credits go ~2x further
Even Ultra $200/mo (20x credits) is capped. Each request consumes credits based on model — Claude burns 2.4x faster than Gemini. Terse compresses prompts and CLI outputs so each request starts cleaner.
OpenAI Codex (Agent)
Price: $20/mo (Plus) to $200/mo (Pro)
Limits: 30-1,500 msgs/5h by plan
Context: 192K tokens
More iterations per cap
Included with ChatGPT plans. Pro $200/mo caps at 1,500 msgs/5h. The agent runs commands autonomously — each output eats your cap. Terse compresses them for more iterations per window.
Windsurf (IDE)
Price: $15 to $60/mo
Limits: 500 credits/mo (Pro)
Context: 200K tokens
Credits last ~2x longer
Enterprise $60/user gets 1,000 credits/mo — still capped. Cascade consumes credits per prompt. Terse compresses prompts and CLI outputs so each interaction uses fewer tokens, stretching your credits.
Gemini CLI (Terminal)
Price: Free tier, then pay-per-token
Limits: 1,000 req/day, 60 req/min (free)
Context: 1M tokens
~70% less on token bill
Free tier is generous (1,000 req/day) but still rate-limited. Beyond that, you pay per token. Terse compresses prompts and CLI outputs by avg 89%, cutting your bill or freeing rate limit headroom.
Aider (Terminal)
Price: Free + API costs ($5-300+/mo)
Limits: Per API provider
Context: Per model (up to 200K)
~70% less API cost
BYO API key — you pay per token to OpenAI, Anthropic, etc. Terse compresses every prompt and command output before it reaches the model, directly cutting your API bill by ~70% on verbose workflows.
GitHub Copilot (IDE)
Price: Free to $39/mo (Pro+)
Limits: 50-1,500 premium req/mo
Context: Per model (up to 200K)
Better context quality
Enterprise $39/user: 1,000 premium req/mo. Base completions are unlimited, but Chat and the coding agent have caps. Terse keeps terminal output lean so premium requests carry more useful context.
Cline / Roo (VS Code)
Price: Free + API costs ($0-500+/mo)
Limits: Per API provider
Context: Per model (up to 200K)
~70% less API cost
No tool-side limit, but your API provider caps apply. Heavy users report $200-500+/mo. Terse compresses every output by avg 89%, directly cutting your bill and reducing context overflow.

Pricing verified Feb 2026. Limits vary by usage and plan. Terse savings based on avg 89% compression across real prompts and CLI outputs.

Terse Pals

Your companion through every session

Pick a pal that celebrates every token you save — it reacts to each tool call, eats your savings, and keeps you company through long coding sessions.

20 pals available  ·  Click to poke  ·  Unlock new ones as you save tokens

...and loved by developers

Engineers and AI power users cutting costs and gaining visibility into their token usage.

Marcus Chen
@marcuschen_dev
Been using @Terse_App with Claude Code for a week. Token usage dropped ~40% on agent sessions. The spellcheck alone saves me from costly correction loops — and the monitor shows exactly where tokens go.
Claude Code
Sarah Kim
@sarahk_ai
I type verbose ChatGPT prompts out of habit. Terse catches all my filler words and hedging in real time. -60% tokens on average. It's like Grammarly for token efficiency.
ChatGPT
Jake Ortiz
@jakeortiz
The agent monitor alone is worth it. I can see input/output/cache per turn, tool call costs, and which Cursor sessions are burning the most tokens. Finally have visibility into agent spend.
Cursor Agent
Amara Patel
@amara_codes
Runs 100% on-device. No API calls, no cloud. As someone who works with sensitive codebases, this was the only token optimizer I'd actually trust. Privacy-first done right.
Privacy
Ravi Nguyen
@ravi_ng
Set up OpenClaw + Terse and my API bill dropped immediately. Auto-mode rewrites prompts before send, the monitor tracks every turn's cost, and catching typos means fewer "sorry, I meant..." follow-ups.
OpenClaw
Elena Vasquez
@elena_v
The three modes are perfect. Soft for important prompts where every word matters, Aggressive for quick throwaway questions. Terse adapts to how I work, not the other way around.
3 Modes
Daniel Park
@dpark_dev
I had $40/mo of Claude API credits going unused. Listed my key on the Token Exchange in 2 minutes — now I earn ~$25/mo from other devs using my spare capacity. Basically free money from tokens I was wasting.
Token Exchange
Lisa Chen
@lisachen_ai
As a student, paying full price for Claude API was painful. Token Exchange gets me Sonnet at 40% off retail. One env var and Claude Code just works through the proxy. Plus Terse optimizes my prompts, so tokens go even further.
Token Exchange

Built on research.

Grounded in LLMLingua, Norvig spelling, selective context pruning, and real-world agent session analysis across thousands of Claude Code turns.

Developer API

Terse API

Integrate token optimization directly into your vibe coding project. One API call — 30–60% fewer tokens, same meaning, same results.

Read the docs · GitHub examples
🔑

1. Get your API key

Sign in and generate a tsk_... developer key from your dashboard. Free tier included.

2. Optimize before sending

POST your prompt to /api/v1/optimize — get back a trimmed version with token count saved.

📡

3. Scan your codebase

Use /api/v1/scan to find every LLM call site in your project and see where tokens are wasted.

// No package to install: the API is plain HTTP.
const response = await fetch('https://www.terseai.org/api/v1/optimize', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.TERSE_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    text: userPrompt,   // your prompt string
    mode: 'normal',     // 'soft' | 'normal' | 'aggressive'
  }),
});
// Fail loudly on HTTP errors instead of parsing an error body as a result.
if (!response.ok) throw new Error(`Terse API request failed: ${response.status}`);
const { optimized, tokens_saved, reduction_pct } = await response.json();
// → use `optimized` as the prompt for your LLM call

API Pricing

A free tier is included with every account, no credit card required. Upgrade for higher limits and batch access.

Free
$0/mo

No credit card needed.

  • 500K tokens / month
  • 60 requests / minute
  • All 3 compression modes
  • Community support
Most Popular
Pro
$29/mo

Cancel anytime. No trial.

  • 50M tokens / month
  • 600 requests / minute
  • Batch compression
  • Usage analytics dashboard
  • Email support
Enterprise
Custom

Teams & high-volume use.

  • Unlimited tokens
  • Custom rate limits
  • Dedicated endpoints
  • SLA + priority support
  • Team management + audit logs
Contact us →
Modes
soft: Typo fix + whitespace + safe phrase shortening. Meaning 100% preserved.
normal: soft, plus filler removal, hedging, politeness, question→imperative conversion.
aggressive: normal, plus abbreviations, markdown strip, article removal, telegraph compression.
Endpoints
POST /api/v1/optimize — optimize a prompt
POST /api/v1/scan — scan code for LLM calls
POST /api/v1/keys — create API key
GET  /api/v1/keys — list your keys
POST /api/v1/projects — publish your project
GET  /api/v1/projects — browse showcase
Platform

Vibe Coding Projects

Built with Terse API? Publish your project to get traffic, users, and visibility in the vibe coding developer community.

Browse all projects →
🤖
ClaudeFlow
Multi-agent workflow builder
247

Orchestrates Claude agents for complex tasks. Terse API cuts context window usage by 41% per turn, enabling longer sessions without hitting limits.

Node.js Agents
~38K tokens saved/mo
💬
PromptKit
Open-source prompt library
189

A library of 200+ production-ready prompts. Uses Terse API to auto-compress each prompt at runtime, reducing developer API bills by $80/mo on average.

TypeScript Prompts
~21K tokens saved/mo
GitHub ↗
🐍
PyAgentKit
Python agent scaffolding
134

Batteries-included Python scaffold for building LLM agents. Terse integration is built-in — every outbound message is auto-optimized before the API call.

Python Scaffold
~29K tokens saved/mo
GitHub ↗

Have a project using Terse API?

Teams

Terse Cloud

Visibility and control over your team's AI coding costs. See who's using what, how much is saved, and where tokens are wasted — across every tool.

📊

Token analytics

Dashboard by developer, project, and tool — Mac, Windows, Chrome, VS Code, iOS.

💰

Team savings report

"Your team saved $4,200 this month" — with per-developer and per-project breakdowns.

🔔

Rate-limit alerts

Smart notifications when a developer is approaching heavy usage thresholds.

🔑

Team token control

Distribute API tokens, add or remove developers, and audit all activity in one place.

Open team dashboard →

Free for open source. Teams from $15 / developer / month.

Pricing

Simple, transparent plans

Every plan includes a 30-day free trial. No charge until your trial ends.

Premium
$99/mo

For teams and power users. Unlimited everything, priority support.

30-day free trial — cancel anytime
  • Unlimited optimizations
  • Unlimited connected sessions
  • Unlimited devices
  • All 3 optimization modes
  • Full agent analytics + rule generation
  • Auto-replace & Send-mode
  • Priority support
FAQ

Frequently asked questions

Everything you need to know about token optimization and how Terse saves you money.

What is token optimization?
Token optimization is the process of reducing the number of tokens in AI prompts and outputs without losing meaning. Terse uses 35+ techniques — including spell correction, filler removal, prompt compression, and semantic deduplication — to cut token usage by 40-70%, directly lowering AI API costs. Read the full guide →
How much can Terse save on AI costs?
Terse reduces token usage by 40-70% on verbose prompts and up to 89% on CLI output noise. A typical 2-hour coding session generates ~210K tokens of raw output, which Terse compresses to ~23K. Combined with duplicate detection and redundant read flagging, total session costs drop 3-5x. See the cost breakdown →
Which AI tools does Terse work with?
Terse auto-detects Claude Code, Cursor, OpenClaw, Aider, and any terminal-based AI agent via process scanning. It also works with browser-based tools like ChatGPT, Claude.ai, and Gemini through macOS Accessibility API integration with Chrome and Safari. See agent monitoring →
How does prompt compression work?
Terse runs a 7-stage pipeline: spell correction (400+ typo fixes), whitespace normalization, pattern optimization (130+ rules), redundancy elimination, NLP analysis, telegraph compression, and final cleanup. Code blocks, URLs, and technical terms are protected throughout.
Does optimization reduce AI output quality?
No. Token optimization removes noise — filler words, hedging, redundant phrases, typos — without changing intent. Research from LLMLingua (EMNLP 2023) shows compressed prompts maintain or improve output quality because models receive cleaner, more focused instructions.
What’s the difference from prompt engineering?
Prompt engineering focuses on crafting better instructions for better AI outputs. Token optimization reduces the cost of those instructions by removing waste — filler, typos, redundancy — without changing meaning. Terse handles optimization automatically so you can focus on engineering for quality. Learn more →
Is Terse free to use?
Yes. Both plans include a 30-day free trial — no charge until your trial ends. Pro ($4.99/mo) includes unlimited optimizations, 3 sessions, agent monitoring, and CLAUDE.md generation. Premium ($99/mo) includes unlimited sessions, devices, and priority support. Cancel anytime.
How do AI tokens affect cost?
AI models charge per token (~4 characters each). Claude Opus costs $15 (input) to $75 (output) per million tokens; GPT-4o costs $2.50 (input) to $10 (output). A single agent session can consume 200K+ tokens, costing $3-$15 on Opus. Heavy users spend $200-500+/month on API costs alone. See the full pricing comparison →
What is the Token Exchange?
The Token Exchange is a marketplace where users trade unused AI API tokens. Sellers list their API keys at a discount, buyers get cheaper access. Terse runs a proxy that optimizes every request before forwarding, so the actual API cost is 30-60% less. Terse takes a 15% commission. Your API keys are encrypted with AES-256 and never exposed to buyers.
How do I buy or sell tokens?
Sign in at terseai.org/marketplace. To sell: paste your API key, drag a slider to set your discount, done. To buy: top up your balance, generate a Terse API key, and use it in any SDK — just set the base URL to Terse's proxy. One terminal command to configure.
News & Insights

Stay ahead on AI cost optimization.

Research, guides, and analysis on reducing LLM token costs — from prompt caching to model routing to agent session efficiency.

Cost Optimization

Prompt Caching in 2026: How to Cut Claude Code Costs by 73%

Prompt caching is now the single highest-ROI technique for reducing AI agent costs. Cached tokens cost 10× less than standard input tokens — here's how to structure every session for maximum cache hit rate.

Agent Optimization

Git Diff Compression: Reclaim 70% of Your Agent Context Window

When Claude Code runs git diff, the output can consume 30–60K tokens per turn. Compressing to changed lines + 1 line of context is the fastest context win in any agent workflow.

Research

The Real Cost of Duplicate Tool Calls in AI Agent Sessions

Analysis of 500 real Claude Code sessions found 34% of Read tool calls were redundant. At $15/MTok for Opus, each repeated file read wastes ~$0.006 — small per turn, painful at scale.

Model Selection

Model Routing in 2026: When Sonnet Outperforms Opus at 1/5th the Cost

Not every agent task needs Claude Opus. Single-file edits, test generation, and short lookups perform identically on Sonnet — saving teams $200+/month with zero quality loss on routine tasks.

Technique Comparison

LLMLingua vs. Rule-Based Compression: A Developer's Benchmark

LLMLingua uses perplexity scoring to drop tokens at inference time. Deterministic rule pipelines work at the prompt layer before the API call. We tested both on 1,000 real developer prompts.

Research

Why Typos in Agent Prompts Cost More Than You Think

A single typo in an agent prompt can trigger a clarification turn, adding 200–500 tokens of overhead. Typo-caused retries account for 8–15% of total token spend across a typical week of Claude Code usage.

Weekly Newsletter

Get the token savings playbook
in your inbox.

Practical techniques — prompt caching, model routing, context compression — delivered weekly. No fluff.

Join developers cutting AI costs every week. Unsubscribe anytime.

Stop wasting
tokens and money.

Optimize every prompt. Monitor every agent session. And give your whole team visibility with Terse Cloud — analytics by developer, project, and tool.

100% on-device · Zero latency