Does Terse work with Claude Code, Cursor, and ChatGPT?

Yes. Terse auto-detects Claude Code, Cursor, OpenClaw, Aider, and any terminal-based AI agent via process scanning every 5 seconds. It also works with browser-based tools like ChatGPT, Claude.ai, and Gemini through macOS Accessibility API integration with Chrome, Safari, and other browsers.

Does token optimization reduce AI output quality?

No. Token optimization removes noise (filler words, hedging, redundant phrases, typos) that adds cost without improving results. Research from LLMLingua (EMNLP 2023) shows that compressed prompts maintain or even improve LLM output quality because the model receives a cleaner, more focused instruction. Terse's Soft mode preserves 100% of meaning for critical prompts.

What is the difference between token optimization and prompt engineering?

Prompt engineering focuses on crafting better instructions to get better AI outputs. Token optimization focuses on reducing the cost of those instructions by removing waste — filler, typos, redundancy, verbose phrasing — without changing the intent. Terse handles token optimization automatically so developers can focus on prompt engineering for quality.

Save every token.
Power your team.

Cut 40-70% of your AI token costs with on-device optimization — then give your whole team full visibility with Terse Cloud. Token analytics by developer, project, and tool.

✓ 30-day free trial · $0 due today · no card for the demo · cancel anytime

First Launch — macOS Security Step (one-time only)

macOS blocks unsigned apps by default. Pick whichever method works for you:

A System Settings — easiest, no Terminal needed

1. Open Terse — click OK on the security warning
2. Open System Settings → Privacy & Security
3. Scroll to the Security section — click "Open Anyway" next to Terse
4. Confirm in the popup — done. Terse opens normally from now on.

B Right-click to open — works on macOS Sonoma (14) and earlier

1. In Finder, right-click (or Control+click) on Terse.app
2. Choose Open from the menu
3. Click Open in the dialog that appears
Note: macOS Sequoia (15+) removed this option — use Method A instead.

C Terminal command — one command, works on all versions

Drag Terse to /Applications, then paste this in Terminal:
xattr -cr /Applications/Terse.app && /Applications/Terse.app/Contents/MacOS/terse 2>/dev/null &

4.8 · loved by developers

40–70% average token savings

100% on-device · zero cloud

Works with

Claude Code · Cursor · ChatGPT · Copilot · Aider

Terse is a macOS & iOS token optimization tool that reduces AI prompt costs by 40-70% through 35+ intelligent compression techniques — including git diff compression, prompt caching awareness, conversation history summarization, model-specific optimization, duplicate tool call detection, redundant file read flagging, and real-time per-turn cost tracking across Claude Code, Cursor, ChatGPT, OpenClaw, and Aider. Built on research from LLMLingua (EMNLP 2023), Norvig spelling correction, and selective context pruning.

Updated March 2026 · v1.2.0

ChatGPT

Claude Code

OpenClaw

Cursor

Aider

VS Code

Safari

Chrome

Claude.ai

Gemini

Windsurf

Copilot

Optimize every message

Intercepts prompts and agent commands before they hit the API — strips filler, fixes typos, shortens phrases, compresses verbose text. 35+ techniques including output compression: log deduplication, stack trace collapsing, JSON schema extraction, and terminal noise stripping. 40-80% fewer input tokens.

Monitor agent sessions

Auto-detects Claude Code, OpenClaw, Cursor, and Aider. Tails JSONL session logs in real time — tracking input/output tokens, cache efficiency, tool call overhead, duplicate calls, redundant file reads, and per-turn cost with model-aware pricing.

Eliminate wasted tokens

Detects duplicate tool calls, flags files read multiple times, estimates compressible tool results (git 85%, tests 90%, builds 80%), and catches typos that cause costly retry loops. See how to cut AI API costs by 60-80%.

Git diff compression

Detects git diff output and compresses to only changed lines + 1 line of context. Typical 70% reduction — critical for Claude Code and Cursor workflows where diffs dominate tool output.
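
As a rough illustration of the idea, the sketch below trims a unified diff to its headers, changed lines, and one line of context on each side. It is not Terse's implementation; the function name `compress_diff` and its `context` parameter are hypothetical.

```python
def compress_diff(diff_text: str, context: int = 1) -> str:
    """Keep diff headers, changed lines, and `context` neighbours of each change."""
    lines = diff_text.splitlines()
    keep = set()
    for i, line in enumerate(lines):
        is_header = line.startswith(("diff --git", "--- ", "+++ ", "@@"))
        is_change = line.startswith(("+", "-")) and not is_header
        if is_header:
            keep.add(i)
        elif is_change:
            # Keep the change plus `context` lines before and after it.
            lo = max(0, i - context)
            hi = min(len(lines) - 1, i + context)
            keep.update(range(lo, hi + 1))
    return "\n".join(lines[i] for i in sorted(keep))
```

The trimmed output is meant for a model to read, not for `git apply`: the line counts in the `@@` hunk headers no longer match the body once context is dropped.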

Prompt caching awareness

Detects stable prefixes and repeated content blocks that benefit from prompt caching — the #1 cost reduction technique in 2026. Saves 50-90% on repeated context across agent turns.
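
One simple way to spot a stable prefix is to take the longest common prefix of the prompts sent across turns. The sketch below does that with characters as a stand-in for tokens; the helper names are hypothetical and this is only an approximation of what a cache-aware tool might check.

```python
import os


def stable_prefix(prompts: list[str]) -> str:
    """Longest prefix shared by every prompt (character-wise)."""
    return os.path.commonprefix(prompts)


def cacheable_fraction(prompts: list[str]) -> float:
    """Share of all prompt characters that sit inside the shared prefix."""
    total = sum(len(p) for p in prompts)
    if total == 0:
        return 0.0
    return len(stable_prefix(prompts)) * len(prompts) / total
```

A high fraction suggests the session keeps resending the same leading context, which is the content provider-side prompt caching can reuse.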

Conversation history summarization

Auto-summarizes old turns in long agent sessions: "User: A B C, Assistant: X Y Z" → "Earlier: Discussed A, concluded X." Massive savings in multi-turn workflows — up to 80% on history context.

Model-specific optimization

Claude handles implicit reasoning — Terse compresses more aggressively. GPT-4 needs explicit instructions — Terse preserves structure. Automatic model detection adapts compression strategy per provider.

One butler, every agent

Not just a compressor.
A butler for your AI agents.

Terse handles the whole lifecycle of every AI coding agent you run — watching it, capping its spend, keeping its tools clean, and trimming every token. All on-device.

Optimize

Compress every prompt 40–70% before it hits the API — 35+ on-device techniques, code always protected.

Monitor

Live tokens, cost, cache efficiency, burn rate and context fill across 8 coding agents.

Budget breaker

Set spend ceilings that pause or kill a runaway agent before its next API call.

MCP manager

Discover every MCP server across your configs, risk-score each, toggle without editing JSON.

Doctor

25 waste scans — cache thrash, duplicate tool calls, redundant reads, context burn — with one-click fixes.

Team

Share live agent sessions and team analytics — by developer, project, and tool.

Agent Token Optimization

Every turn optimized,
automatically.

A single agent task can consume 50x more tokens than a chat message. Terse runs 5 parallel optimization strategies: compressing your prompts, detecting duplicate tool calls, flagging redundant file reads, estimating compressible tool results, and auto-generating CLAUDE.md rules that teach agents to waste fewer tokens next session.

Compresses every user message before it hits the API
Detects duplicate tool calls — same tool + input = wasted tokens
Flags files read multiple times (~800 tok wasted per re-read)
Estimates compressible tool results (Read: 60%, Grep: 40%)
Tracks unused tools — ~300 tok overhead per unused tool per call
Generates CLAUDE.md rules from session patterns for future savings
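
The "same tool + input" rule for duplicate tool calls can be sketched by fingerprinting each call. The record shape used here (`tool` and `input` keys) is an assumption for illustration, not a documented Terse or agent log format.

```python
import json
from collections import Counter


def find_duplicate_calls(calls: list[dict]) -> dict[str, int]:
    """Map each repeated (tool, input) fingerprint to its number of extra runs."""
    counts = Counter(
        # sort_keys makes the fingerprint independent of dict key order
        json.dumps([call["tool"], call["input"]], sort_keys=True)
        for call in calls
    )
    return {key: n - 1 for key, n in counts.items() if n > 1}
```

Multiplying the extra runs of file reads by the ~800 tokens per re-read quoted above would give a rough estimate of the waste.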

Claude Code · OpenClaw · Aider · Cursor Agent

Agent Session — Live Optimization

1. You type in agent session
2. Terse optimizes (compress + dedup + fix): Typo Correction → Context Dedup → Filler Removal → Prompt Compression → History Trimming → Imperative Rewrite → Final Cleanup
3. Optimized prompt auto-sent to agent
4. Agent runs while Terse tracks everything
5. Cumulative savings with a per-turn cost breakdown: turn, input tokens, output tokens, cache, tools, tokens saved, typos, cost

Works in seconds

Download. Click connect.
Everything just works.

No config files. No API keys to paste. No terminal commands. Open Terse, see your running agent, click Connect — optimization starts immediately. Every message compressed, every session tracked, every wasted token flagged.

Auto-detects Claude Code, Cursor Agent, Aider, OpenClaw via process scan
One click to connect — optimization starts on the next message
Live popup bar shows savings on every single turn
Works with existing sessions — no restart required

Claude Code · Cursor Agent · Aider · OpenClaw

Terse — Connect Agent (live counters: turns, tokens saved, cost, cache hit rate)

Why Terse

Terse vs RTK & other
token libraries

RTK (Reduce Token Kit) and similar libraries offer basic prompt trimming. Terse goes far beyond with agent-aware optimization, real-time monitoring, and output compression — techniques RTK doesn't support.

| Feature | Terse | RTK / Others |
| --- | --- | --- |
| Optimization techniques | 35+ | 5-10 |
| Tool output compression | ✓ RTK-style + more | Basic |
| Git diff compression | ✓ 70% reduction | ✗ |
| Prompt caching awareness | ✓ 50-90% on repeats | ✗ |
| Agent session monitoring | ✓ Real-time | ✗ |
| Duplicate tool call detection | ✓ | ✗ |
| History summarization | ✓ Auto-compress | ✗ |
| Model-specific optimization | ✓ Claude / GPT modes | ✗ |
| Multilingual support | ✓ 11 languages | English only |
| Spellcheck pipeline | ✓ Dict + Norvig + macOS | Basic / None |
| Code protection | ✓ Backtick-aware | Partial |
| Token Exchange marketplace | ✓ Buy/sell tokens | ✗ |
| On-device / privacy | ✓ 100% local | Varies |

RTK = Reduce Token Kit. Comparison based on publicly documented features as of March 2026.

Works Everywhere

Runs on
any app, automatically.

Connect Terse to Chrome, Cursor, VS Code, OpenClaw, or any terminal — it auto-detects text fields via macOS Accessibility and agent sessions via process scanning. No plugins to install. Agent sessions are detected automatically every 5 seconds.

Auto-detects agent processes — Claude Code, OpenClaw, Cursor, Aider
7-stage pipeline runs on every prompt and agent command
Code blocks, URLs, inline code all protected · on-device · zero latency

Pipeline — Live

1. Spell Correction
2. Whitespace
3. Pattern Optimization
4. Redundancy Elimination
5. NLP Analysis
6. Aggressive Compression
7. Final Cleanup

Live Token Optimization

Every message
optimized live.

As you type, Terse rewrites your prompt in real time — fixing typos, stripping filler, compressing verbose phrasing. But it doesn't stop there: when agents take actions, Terse optimizes their commands too. Every token saved — from your prompts and agent operations — means lower cost and better responses.

Rewrites prompts and agent commands before they're sent
Context-aware: "what souls I do" → "what should I do"
Optimizes agent tool calls, file reads, and context passing
Safe: skips ALL-CAPS, Capitalized, code tokens
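
The "safe" rule can be pictured as a guard that runs before any spelling correction. The sketch below is one possible guard; what counts as a "code token" (backticks, underscores, digits, slashes, parentheses, camelCase, dotted names) is an assumption made for illustration.

```python
import re

# Hypothetical definition of "code-like": backticks, underscores, digits,
# slashes, parentheses, camelCase humps, or a dot between word characters.
_CODE_LIKE = re.compile(r"[`_\d/()]|[a-z][A-Z]|\w\.\w")


def is_protected(token: str) -> bool:
    """True if a spell-corrector should leave this token untouched."""
    if token.isupper():          # ALL-CAPS, e.g. acronyms
        return True
    if token[:1].isupper():      # Capitalized, e.g. proper nouns
        return True
    return bool(_CODE_LIKE.search(token))
```

With this guard, a lowercase typo such as `tokne` is still eligible for correction, while `API`, `Cursor`, and `main.js` pass through unchanged.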

Token Optimization — Live

Typos: Dict · Norvig · Context · macOS Spellcheck

35+ Techniques

Every wasted
token found.

Filler removal, question-to-imperative, Jaccard deduplication, telegraph compression — each technique targets a different source of token waste. Applied to your prompts and agent commands alike.

130+ phrase-shortening rules for prompts and agent messages
Semantic dedup — catches repeated context across agent turns
Tool result compression — flags large Read/Grep results for trimming
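
Jaccard deduplication compares two pieces of text by the overlap of their word sets: the size of the intersection divided by the size of the union. A minimal sketch, assuming whitespace tokenization and a hypothetical similarity threshold of 0.8:

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def dedup(sentences: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a sentence only if it is not too similar to one already kept."""
    kept: list[str] = []
    for sentence in sentences:
        if all(jaccard(sentence, earlier) < threshold for earlier in kept):
            kept.append(sentence)
    return kept
```

A lower threshold drops more near-duplicates but risks removing sentences that only share common words.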

Techniques — Live

Redundant Read Detection

Question → Imperative

Duplicate Tool Call

Semantic Dedup

Filler Removal

Unused Tool Overhead

Three Modes

You control
how much to save.

Different contexts need different levels. Soft for careful prompts, Normal for everyday chat, Aggressive for agent sessions where a single task can burn through thousands of tokens.

Soft: Typo-fix + whitespace only. Perfect for critical prompts.
Normal: Strips filler, hedging, and meta-language. Best for chat.
Aggressive: Max compression + telegraph style. Built for agent sessions.
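
A toy version of the three modes is sketched below. It borrows the filler list and the article stripping from the `git diff` sample shown elsewhere on this page, leaves out typo correction, and is an assumption about how the levels stack rather than Terse's actual rules.

```python
import re

# Filler words taken from the sample diff on this page.
FILLERS = ["just", "really", "very", "actually", "basically", "literally",
           "honestly", "perhaps", "maybe", "probably", "simply"]
_FILLER_RE = re.compile(r"\b(?:" + "|".join(FILLERS) + r")\b\s*", re.IGNORECASE)
_ARTICLE_RE = re.compile(r"\b(?:the|a|an)\b\s*", re.IGNORECASE)


def optimize(text: str, mode: str = "normal") -> str:
    """Apply whitespace cleanup, then filler and article stripping by mode."""
    out = re.sub(r"\s+", " ", text).strip()   # every mode: normalize whitespace
    if mode in ("normal", "aggressive"):
        out = _FILLER_RE.sub("", out)          # strip filler words
    if mode == "aggressive":
        out = _ARTICLE_RE.sub("", out)         # telegraph style: drop articles
    return out.strip()
```

Each mode includes everything the gentler mode does, so Soft output is always a superset of Normal output, and Normal of Aggressive.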

Mode Comparison

Soft · Normal · Aggressive

"I was just wondering if you could perhaps help me understand how to implement a binary search tree in Python please?" (22 tok)

Agent Monitor

See everything
your agent does.

Terse auto-detects Claude Code, OpenClaw, Aider, and Cursor Agent via process scanning every 5 seconds. It tails JSONL session logs in real time with model-aware pricing (Opus, Sonnet, Haiku) — giving you full visibility into where every token goes.

Live token tracking: input, output, cache reads, context fill %
Model-aware cost estimation — Opus/Sonnet/Haiku pricing built in
Duplicate tool call detection + redundant file read alerts
Unused tool overhead tracking (~300 tok per unused tool per call)
Context fill meter — warns at 60% and alerts at 85% to run /compact
Auto-generates CLAUDE.md optimization rules from session patterns
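
The context fill thresholds amount to a two-step check. A minimal sketch, assuming a 200K-token window (the Claude Code context size quoted on this page) and a hypothetical function name:

```python
def context_status(tokens_used: int, window: int = 200_000) -> str:
    """Classify context fill: 'ok', 'warn' at 60%, 'alert' at 85%."""
    fill = tokens_used / window
    if fill >= 0.85:
        return "alert"   # time to run /compact
    if fill >= 0.60:
        return "warn"
    return "ok"
```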

Claude Code · OpenClaw · Aider · Cursor Agent

Agent Monitor — Live (counters: turns, input, output, cost, cache, tools, typos, duration, prompt savings; model: claude-opus-4-6, streaming)

Auto Model Routing

Opus billed.
Sonnet delivered.

Terse runs a local proxy on port 7860 that intercepts every API call. Simple tasks (short prompts, lookups, edits) are silently rerouted from Opus ($15/MTok) to Sonnet ($3/MTok); complex tasks stay on Opus. Each rerouted request costs 80% less, with zero code changes.

Complexity scoring — short prompts, lookups, edits → Sonnet
Architecture, security reviews, deep refactors → stay on Opus
Transparent — zero changes to Claude Code, Cursor, or Codex
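
The overall saving depends on how much traffic is actually rerouted. The arithmetic below uses the two per-MTok prices quoted above; the routed fraction is an input you would measure, and the function names are hypothetical.

```python
OPUS_PER_MTOK = 15.0    # price quoted above
SONNET_PER_MTOK = 3.0   # price quoted above


def blended_cost(tokens: int, routed_fraction: float) -> float:
    """Cost in dollars when `routed_fraction` of tokens go to Sonnet."""
    rate = (routed_fraction * SONNET_PER_MTOK
            + (1 - routed_fraction) * OPUS_PER_MTOK)
    return tokens / 1_000_000 * rate


def savings_pct(routed_fraction: float) -> float:
    """Percentage saved versus sending everything to Opus."""
    return 100 * routed_fraction * (1 - SONNET_PER_MTOK / OPUS_PER_MTOK)
```

Routing every request gives the full 80% saving, while routing half of the token volume gives about 40%.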

Claude Code · Cursor · Codex · Any OpenAI-compat client

Terse Proxy — Live Routing, port 7860 (live counters: requests, routed, saved, route rate)

Benchmarks

Tested on
real sessions.

Tested on real ChatGPT prompts, Claude Code agent sessions, and multi-turn agent workflows. Clean technical prompts pass untouched. Verbose prompts and agent messages see 40-70% reduction. Combined with tool overhead savings, total session reduction reaches 30-60%.

Benchmarked across manual prompts, agent turns, and tool calls
Clean prompts correctly return 0% — no false changes
Savings compound: 5-turn agent session saves 200-400+ tokens

Benchmarks — Aggressive Mode

| Scenario | Reduction |
| --- | --- |
| Agent: mixed typos + filler | -64% |
| Agent: verbose debug prompt | -60% |
| Claude Code: typo-heavy | -51% |
| OpenClaw: chatty request | -46% |
| Tool overhead (unused) | -35% |
| Repeated context (dedup) | -28% |
| Clean technical | 0% |

See the difference

Real outputs, real savings.

Side-by-side comparison on actual prompts and agent commands.

find . -name "*.rs" ~276 tokens

./target/debug/build/serde_core-.../out/private.rs
./target/debug/build/libsqlite3-sys-.../out/bindgen.rs
./target/debug/build/serde-.../out/private.rs
./src/ls.rs
./src/local_llm.rs
./src/learn/detector.rs
./src/learn/report.rs
./src/learn/mod.rs
./src/discover/registry.rs
./src/discover/provider.rs
./src/discover/report.rs
./src/discover/mod.rs
./src/wget_cmd.rs
./src/npm_cmd.rs
./src/cargo_cmd.rs
./src/ccusage.rs
./src/config.rs
./src/lint_cmd.rs
./src/curl_cmd.rs
./src/prisma_cmd.rs
./src/cc_economics.rs
./src/find_cmd.rs
./src/gain.rs
./src/git.rs
... 49 files total

Terse optimized ~149 tokens -46%

49F 4D:

src/ cargo_cmd.rs cc_economics.rs
  ccusage.rs config.rs container.rs
  curl_cmd.rs deps.rs diff_cmd.rs
  display_helpers.rs env_cmd.rs
  filter.rs find_cmd.rs gain.rs
  gh_cmd.rs git.rs grep_cmd.rs
  init.rs json_cmd.rs lint_cmd.rs
  local_llm.rs log_cmd.rs ls.rs
  main.rs ...
src/discover/ mod.rs provider.rs
  registry.rs report.rs
src/learn/ detector.rs mod.rs report.rs
src/parser/ error.rs formatter.rs
  mod.rs types.rs

cargo test ~4,823 tokens

   Compiling serde v1.0.210
   Compiling serde_json v1.0.128
   Compiling tokio v1.40.0
   Compiling reqwest v0.12.7
   Compiling my-project v0.1.0 (/Users/dev/project)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 45.23s
     Running unittests src/main.rs (target/debug/deps/my_project-a1b2c3d4)

running 262 tests
test config::tests::test_default_config ... ok
test config::tests::test_parse_env ... ok
test config::tests::test_merge_configs ... ok
test git::tests::test_parse_diff ... ok
test git::tests::test_status_parse ... ok
... 257 more tests
test result: ok. 262 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 3.41s

Terse optimized ~38 tokens -99%

cargo test: 262 passed, 0 failed
Built 5 crates in 45.23s
All tests green.

git diff ~1,240 tokens

diff --git a/src/optimizer.js b/src/optimizer.js
index 3a4b5c6..7d8e9f0 100644
--- a/src/optimizer.js
+++ b/src/optimizer.js
@@ -142,8 +142,12 @@ class PromptOptimizer {
     // Remove filler words from text
-    const fillers = ['just', 'really', 'very', 'actually'];
+    const fillers = ['just', 'really', 'very', 'actually',
+      'basically', 'literally', 'honestly', 'perhaps',
+      'maybe', 'probably', 'simply'];
     for (const f of fillers) {
-      text = text.replace(new RegExp(`\\b${f}\\b`, 'gi'), '');
+      text = text.replace(
+        new RegExp(`\\b${f}\\b\\s*`, 'gi'), ''
+      );
     }
@@ -165,3 +169,8 @@ class PromptOptimizer {
+  telegraphCompress(text) {
+    return text.replace(/\b(the|a|an)\b/gi, '')
+               .replace(/\s{2,}/g, ' ')
+               .trim();
+  }

Terse optimized ~310 tokens -75%

src/optimizer.js: 2 hunks, +11 -2

L142: expanded fillers list
  (7 new: basically, literally, honestly,
   perhaps, maybe, probably, simply)
  + regex now strips trailing whitespace

L169: added telegraphCompress()
  strips articles (the/a/an), collapses spaces

git log --oneline -20 ~520 tokens

a1b2c3d Fix authentication module token refresh handling
e4f5g6h Update README with new API documentation
i7j8k9l Refactor optimizer pipeline for better performance
m0n1o2p Add spellcheck integration with macOS NSSpellChecker
q3r4s5t Fix duplicate tool call detection edge case
u6v7w8x Update dependencies: electron 40, node 22
y9z0a1b Add aggressive mode telegraph compression
c2d3e4f Fix session manager reconnection bug
g5h6i7j Implement CLAUDE.md rule generation from patterns
k8l9m0n Add context fill meter warning at 60% and 85%
o1p2q3r Fix AX read fallback for Electron editors
s4t5u6v Add Jaccard similarity dedup for agent turns
w7x8y9z Refactor capture module for multi-window support
a0b1c2d Fix popup window focus stealing on macOS
e3f4g5h Add model-aware pricing for Opus/Sonnet/Haiku
i6j7k8l Update landing page hero animation
m9n0o1p Fix shell hook installation path resolution
q2r3s4t Add RTK-style output compression techniques
u5v6w7x Fix cache efficiency tracking in agent monitor
y8z9a0b Initial commit

Terse optimized ~185 tokens -64%

20 commits, 3 authors

Recent: auth token fix, README update,
optimizer refactor, spellcheck integration,
dedup edge case fix

Themes: optimizer (4), agent monitor (3),
bug fixes (6), features (5), docs (2)

User prompt ~52 tokens

I was just wondering if you could perhaps maybe help me understand how to implement a binary search tree in Python? I'm not really sure about the best approach to take here and I would really appreciate any guidance you could provide please.

Terse optimized ~11 tokens -79%

Implement binary search tree in Python. Show best approach.

Agent debug prompt ~68 tokens

I don't know if this makes sense but the authetication module is broken again. Could you maybe look into it and try to figure out why the tokne refresh isn't working properly? Like I mentioned earlier, the refresh endpoint keeps returning 401 errors and I really need this fixed as soon as possible please.

Terse optimized ~18 tokens -74%

Fix authentication module: token refresh returns 401. Debug refresh endpoint. Priority: high.

No AI tool offers unlimited usage.

Even at $200/mo, every tool has caps. Terse compresses prompts so your limits stretch further — and Terse Cloud gives your team full visibility into exactly where tokens are going.

A typical 2h coding session with an AI agent produces ~210K tokens of prompt + CLI noise, or ~23K with Terse (89% less).

Without Terse, CLI output and verbose prompts alone can overflow a 200K context window. Based on avg 3,500 tokens/command measured across real coding sessions.
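
The session estimate is simple multiplication. The sketch below uses the 3,500 tokens/command average and the 89% reduction from the text; the figure of 60 commands is an assumption chosen because it reproduces the ~210K raw total quoted in the FAQ.

```python
AVG_TOKENS_PER_COMMAND = 3_500   # average quoted above
REDUCTION = 0.89                 # average compression quoted above


def session_tokens(commands: int) -> tuple[int, int]:
    """Return (raw tokens, tokens after compression) for a session."""
    raw = commands * AVG_TOKENS_PER_COMMAND
    compressed = round(raw * (1 - REDUCTION))
    return raw, compressed


raw, compressed = session_tokens(60)  # 60 commands is an assumed session size
print(f"raw={raw:,} compressed={compressed:,}")
```

At 60 commands the raw total already exceeds a 200K context window, while the compressed total stays near 23K.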

Every tool has limits

Terse stretches every plan further.

No matter which AI tool your team uses, token costs add up fast. Terse compresses what goes in — and Terse Cloud gives you team-wide analytics to track spend by developer, project, and tool.

Claude Code · Terminal

Price: $20 — $200/mo

Limits: ~45 msgs/5h (Pro), 5-20x on Max

Context: 200K tokens

Sessions ~3x longer

Even Max $200/mo (20x Pro) has weekly caps (240-480h). Quota resets every 5h. Terse compresses prompts and CLI outputs by avg 89%, so each message carries less noise and your quota stretches ~3x.

Cursor · IDE

Price: $20 — $200/mo

Limits: $20 credits/mo (Pro), ~225 Claude reqs

Context: Up to 200K (Max mode)

Credits go ~2x further

Even Ultra $200/mo (20x credits) is capped. Each request consumes credits based on model — Claude burns 2.4x faster than Gemini. Terse compresses prompts and CLI outputs so each request starts cleaner.

OpenAI Codex · Agent

Price: $20/mo (Plus) — $200/mo (Pro)

Limits: 30-1,500 msgs/5h by plan

Context: 192K tokens

More iterations per cap

Included with ChatGPT plans. Pro $200/mo caps at 1,500 msgs/5h. The agent runs commands autonomously — each output eats your cap. Terse compresses them for more iterations per window.

Windsurf · IDE

Price: $15 — $60/mo

Limits: 500 credits/mo (Pro)

Context: 200K tokens

Credits last ~2x longer

Enterprise $60/user gets 1,000 credits/mo — still capped. Cascade consumes credits per prompt. Terse compresses prompts and CLI outputs so each interaction uses fewer tokens, stretching your credits.

Gemini CLI · Terminal

Price: Free — pay-per-token

Limits: 1,000 req/day, 60 req/min (free)

Context: 1M tokens

~70% less on token bill

Free tier is generous (1,000 req/day) but still rate-limited. Beyond that, you pay per token. Terse compresses prompts and CLI outputs by avg 89%, cutting your bill or freeing rate limit headroom.

Aider · Terminal

Price: Free + API costs ($5-300+/mo)

Limits: Per API provider

Context: Per model (up to 200K)

~70% less API cost

BYO API key — you pay per token to OpenAI, Anthropic, etc. Terse compresses every prompt and command output before it reaches the model, directly cutting your API bill by ~70% on verbose workflows.

GitHub Copilot · IDE

Price: Free — $39/mo (Pro+)

Limits: 50-1,500 premium req/mo

Context: Per model (up to 200K)

Better context quality

Enterprise $39/user: 1,000 premium req/mo. Base completions are unlimited, but Chat and the coding agent have caps. Terse keeps terminal output lean so premium requests carry more useful context.

Cline / Roo · VS Code

Price: Free + API costs ($0-500+/mo)

Limits: Per API provider

Context: Per model (up to 200K)

~70% less API cost

No tool-side limit, but your API provider caps apply. Heavy users report $200-500+/mo. Terse compresses every output by avg 89%, directly cutting your bill and reducing context overflow.

Pricing verified Feb 2026. Limits vary by usage and plan. Terse savings based on avg 89% compression across real prompts and CLI outputs.

Terse Pals

Your companion through every session

Pick a pal that celebrates every token you save — it reacts to each tool call, eats your savings, and keeps you company through long coding sessions.

20 pals available · Click to poke · Unlock new ones as you save tokens

...and loved by developers

Engineers and AI power users cutting costs and gaining visibility into their token usage.

Marcus Chen

@marcuschen_dev

Been using @Terse_App with Claude Code for a week. Token usage dropped ~40% on agent sessions. The spellcheck alone saves me from costly correction loops — and the monitor shows exactly where tokens go.

Claude Code

Sarah Kim

@sarahk_ai

I type verbose ChatGPT prompts out of habit. Terse catches all my filler words and hedging in real time. -60% tokens on average. It's like Grammarly for token efficiency.

ChatGPT

Jake Ortiz

@jakeortiz

The agent monitor alone is worth it. I can see input/output/cache per turn, tool call costs, and which Cursor sessions are burning the most tokens. Finally have visibility into agent spend.

Cursor Agent

Amara Patel

@amara_codes

Runs 100% on-device. No API calls, no cloud. As someone who works with sensitive codebases, this was the only token optimizer I'd actually trust. Privacy-first done right.

Privacy

Ravi Nguyen

@ravi_ng

Set up OpenClaw + Terse and my API bill dropped immediately. Auto-mode rewrites prompts before send, the monitor tracks every turn's cost, and catching typos means fewer "sorry, I meant..." follow-ups.

OpenClaw

Elena Vasquez

@elena_v

The three modes are perfect. Soft for important prompts where every word matters, Aggressive for quick throwaway questions. Terse adapts to how I work, not the other way around.

3 Modes

Daniel Park

@dpark_dev

I had $40/mo of Claude API credits going unused. Listed my key on the Token Exchange in 2 minutes — now I earn ~$25/mo from other devs using my spare capacity. Basically free money from tokens I was wasting.

Token Exchange

Lisa Chen

@lisachen_ai

As a student, paying full price for Claude API was painful. Token Exchange gets me Sonnet at 40% off retail. One env var and Claude Code just works through the proxy. Plus Terse optimizes my prompts, so tokens go even further.

Token Exchange

Built on research.

Grounded in LLMLingua, Norvig spelling, selective context pruning, and real-world agent session analysis across thousands of Claude Code turns.

Developer API

Terse API

Integrate token optimization directly into your vibe coding project. One API call — 30–60% fewer tokens, same meaning, same results.

🔑

1. Get your API key

⚡

2. Optimize before sending

POST your prompt to /api/v1/optimize — get back a trimmed version with token count saved.

📡

3. Scan your codebase

Use /api/v1/scan to find every LLM call site in your project and see where tokens are wasted.

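// No package needed: plain HTTP via fetch (built into Node 18+ and browsers).
// Run as an ES module (e.g. a .mjs file) so top-level await is allowed.
const userPrompt = 'Can you please help me understand how the authentication flow works?';

const response = await fetch('https://www.terseai.org/api/v1/optimize', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.TERSE_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    text: userPrompt,   // your prompt string
    mode: 'normal',     // 'soft' | 'normal' | 'aggressive'
  }),
});
// Fail loudly on HTTP errors instead of parsing an error body as a result
if (!response.ok) throw new Error(`Terse API returned ${response.status}`);
const { optimized, tokens_saved, reduction_pct } = await response.json();
// → use `optimized` as the prompt for your LLM call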
        

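# pip install requests (or use httpx)
import os

import requests

user_prompt = "Can you please help me understand how the authentication flow works?"

resp = requests.post(
    "https://www.terseai.org/api/v1/optimize",
    headers={"Authorization": f"Bearer {os.environ['TERSE_API_KEY']}"},
    json={
        "text": user_prompt,
        "mode": "normal",  # "soft" | "normal" | "aggressive"
    },
    timeout=30,  # avoid hanging forever on a stalled connection
)
resp.raise_for_status()  # surface HTTP errors before parsing the body
data = resp.json()
optimized = data["optimized"]
tokens_saved = data["tokens_saved"]
# → pass `optimized` to your Anthropic/OpenAI call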
        

curl -X POST https://www.terseai.org/api/v1/optimize \
  -H "Authorization: Bearer $TERSE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Can you please help me understand how the authentication flow works in this application?",
    "mode": "normal"
  }'

# Response:
{
  "optimized": "Explain the authentication flow.",
  "tokens_original": 22,
  "tokens_optimized": 7,
  "tokens_saved": 15,
  "reduction_pct": 68,
  "mode": "normal"
}
        

// Scan your project code for LLM call sites.
// Run as an ES module (e.g. scan.mjs) so top-level await is allowed.
import { readFileSync } from 'node:fs';

const code = readFileSync('./src/agent.js', 'utf8');

const response = await fetch('https://www.terseai.org/api/v1/scan', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${process.env.TERSE_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ code, language: 'javascript' }),
});
// Fail loudly on HTTP errors instead of parsing an error body as a result
if (!response.ok) throw new Error(`Terse API returned ${response.status}`);
const result = await response.json();

// → result.findings: [{line, type, preview, recommendation}]
console.log(`Found ${result.total_findings} LLM calls to optimize`);
        

API Pricing

Free tier included with every account. No credit card required for the free tier. Upgrade for higher limits and batch access.

Free

$0/mo

No credit card needed.

500K tokens / month
60 requests / minute
All 3 compression modes
Community support

Vibe Coding Projects

Built with Terse API? Publish your project to get traffic, users, and visibility in the vibe coding developer community.

Browse all projects →

🤖

ClaudeFlow

Multi-agent workflow builder

247

Orchestrates Claude agents for complex tasks. Terse API cuts context window usage by 41% per turn, enabling longer sessions without hitting limits.

Node.js Agents

~38K tokens saved/mo

GitHub ↗

💬

PromptKit

Open-source prompt library

189

A library of 200+ production-ready prompts. Uses Terse API to auto-compress each prompt at runtime, reducing developer API bills by $80/mo on average.

TypeScript Prompts

~21K tokens saved/mo

GitHub ↗

🐍

PyAgentKit

Python agent scaffolding

134

Batteries-included Python scaffold for building LLM agents. Terse integration is built-in — every outbound message is auto-optimized before the API call.

Python Scaffold

~29K tokens saved/mo

GitHub ↗

Have a project using Terse API?

Teams

Terse Cloud

Visibility and control over your team's AI coding costs. See who's using what, how much is saved, and where tokens are wasted — across every tool.

📊

Token analytics

Dashboard by developer, project, and tool — Mac, Windows, Chrome, VS Code, iOS.

💰

Team savings report

"Your team saved $4,200 this month" — with per-developer and per-project breakdowns.

🔔

Rate-limit alerts

Smart notifications when a developer is approaching heavy usage thresholds.

🔑

Team token control

Distribute API tokens, add or remove developers, and audit all activity in one place.

Open team dashboard →

Free for open source. Teams from $15 / developer / month.

Pricing

Simple, transparent plans

Monthly plans include a 30-day free trial — $0 today. Prefer flexibility? Weekly and quarterly billing available.

Pro

$4.99/mo

or $1.99/week · $12/quarter (~$4/mo — save 20%)

For developers running agent sessions daily. Unlimited prompts, multi-session monitoring.

30-day free trial — cancel anytime

Unlimited optimizations
3 connected sessions
2 devices
All 3 optimization modes
Agent monitoring + duplicate detection
Auto-replace & Send-mode
CLAUDE.md rule generation

Premium

$99/mo

For teams and power users. Unlimited everything, priority support.

30-day free trial — cancel anytime

Unlimited optimizations
Unlimited connected sessions
Unlimited devices
All 3 optimization modes
Full agent analytics + rule generation
Auto-replace & Send-mode
Priority support

FAQ

Frequently asked questions

Everything you need to know about token optimization and how Terse saves you money.

What is token optimization?

Token optimization is the process of reducing the number of tokens in AI prompts and outputs without losing meaning. Terse uses 35+ techniques — including spell correction, filler removal, prompt compression, and semantic deduplication — to cut token usage by 40-70%, directly lowering AI API costs. Read the full guide →

How much can Terse save on AI costs?

Terse reduces token usage by 40-70% on verbose prompts and up to 89% on CLI output noise. A typical 2-hour coding session generates ~210K tokens of raw output, which Terse compresses to ~23K. Combined with duplicate detection and redundant read flagging, total session costs drop 3-5x. See the cost breakdown →

Which AI tools does Terse work with?

Terse auto-detects Claude Code, Cursor, OpenClaw, Aider, and any terminal-based AI agent via process scanning. It also works with browser-based tools like ChatGPT, Claude.ai, and Gemini through macOS Accessibility API integration with Chrome and Safari. See agent monitoring →

How does prompt compression work?

Terse runs a 7-stage pipeline: spell correction (400+ typo fixes), whitespace normalization, pattern optimization (130+ rules), redundancy elimination, NLP analysis, telegraph compression, and final cleanup. Code blocks, URLs, and technical terms are protected throughout.

Does optimization reduce AI output quality?

No. Token optimization removes noise — filler words, hedging, redundant phrases, typos — without changing intent. Research from LLMLingua (EMNLP 2023) shows compressed prompts maintain or improve output quality because models receive cleaner, more focused instructions.

What is the difference between token optimization and prompt engineering?

Prompt engineering focuses on crafting better instructions for better AI outputs. Token optimization reduces the cost of those instructions by removing waste — filler, typos, redundancy — without changing meaning. Terse handles optimization automatically so you can focus on engineering for quality. Learn more →

Is Terse free to use?

Terse is free to try: both plans include a 30-day free trial, with no charge until the trial ends. Pro ($4.99/mo) includes unlimited optimizations, 3 sessions, agent monitoring, and CLAUDE.md generation. Premium ($99/mo) includes unlimited sessions, unlimited devices, and priority support. Cancel anytime.

How do AI tokens affect cost?

AI providers charge per token (roughly 4 characters each). Claude Opus costs $15-$75 per million tokens and GPT-4o costs $2.50-$10 per million tokens, with the lower figure for input and the higher figure for output. A single agent session can consume 200K+ tokens, costing $3-$15 at Opus rates. Heavy users spend $200-$500+/month on API costs alone. See the full pricing comparison →
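The session cost above is plain multiplication. A minimal sketch, using the rates and token counts quoted in this answer; the helper names are illustrative, and the 4-characters-per-token rule is only a rough estimate, not a real tokenizer.

```python
def session_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of a given number of tokens at a per-million-token rate."""
    return tokens * usd_per_million / 1_000_000


def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough token estimate from character count (about 4 characters per token)."""
    return max(1, len(text) // chars_per_token)


# A 200K-token session at the low and high ends of the quoted Opus range.
for label, rate in [("low end ($15/MTok)", 15.0), ("high end ($75/MTok)", 75.0)]:
    print(f"{label}: ${session_cost_usd(200_000, rate):.2f}")
```

For exact counts, use the tokenizer or usage fields your provider reports rather than a character estimate.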

What is the Token Exchange?

The Token Exchange is a marketplace where users trade unused AI API tokens. Sellers list their API keys at a discount, and buyers get cheaper access. Terse runs a proxy that optimizes every request before forwarding it, so the actual API cost is 30-60% lower. Terse takes a 15% commission. Your API keys are encrypted with AES-256 and never exposed to buyers.

How do I buy or sell tokens?

Sign in at terseai.org/marketplace. To sell: paste your API key, drag a slider to set your discount, done. To buy: top up your balance, generate a Terse API key, and use it in any SDK — just set the base URL to Terse's proxy. One terminal command to configure.

News & Insights

Stay ahead on AI cost optimization.

Research, guides, and analysis on reducing LLM token costs — from prompt caching to model routing to agent session efficiency.

Cost Optimization May 1, 2026

Prompt Caching in 2026: How to Cut Claude Code Costs by 73%

Prompt caching is now the single highest-ROI technique for reducing AI agent costs. Cached tokens cost one-tenth as much as standard input tokens. Here's how to structure every session for maximum cache hit rate.
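To see why cache hit rate matters, here is a simplified sketch of blended input cost. It assumes cached tokens cost one-tenth of the base input price, as stated above, and it deliberately ignores any surcharge for writing to the cache and all output tokens, so real savings will differ. The function names and the $15/MTok base rate are illustrative.

```python
def blended_input_cost(base_usd_per_mtok: float, cache_hit_rate: float,
                       cached_discount: float = 10.0) -> float:
    """Average input price per million tokens for a given cache hit rate (0.0 to 1.0)."""
    cached_price = base_usd_per_mtok / cached_discount
    return cache_hit_rate * cached_price + (1.0 - cache_hit_rate) * base_usd_per_mtok


def input_savings_pct(cache_hit_rate: float, cached_discount: float = 10.0) -> float:
    """Percentage saved on input tokens compared with no caching at all."""
    base = 15.0  # any positive base price gives the same percentage
    blended = blended_input_cost(base, cache_hit_rate, cached_discount)
    return 100.0 * (1.0 - blended / base)


for hit_rate in (0.25, 0.5, 0.8):
    print(f"hit rate {hit_rate:.0%}: {input_savings_pct(hit_rate):.1f}% saved on input")
```

Under this model the saving is capped at 90% of input cost, and each point of hit rate is worth 0.9 points of savings, which is why keeping a stable prompt prefix pays off.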

Agent Optimization May 7, 2026

Git Diff Compression: Reclaim 70% of Your Agent Context Window

When Claude Code runs git diff, the output can consume 30–60K tokens per turn. Compressing to changed lines + 1 line of context is the fastest context win in any agent workflow.
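The effect of context size is easy to demonstrate with Python's standard `difflib`, whose `n` parameter plays the same role as git's `-U<n>` / `--unified=<n>` option (so `git diff -U1` asks git for one line of context). This is an illustrative sketch on a made-up 20-line file, not a description of how Terse compresses diffs.

```python
import difflib
from typing import List


def unified(old: List[str], new: List[str], context: int) -> List[str]:
    """Unified diff of two line lists with a chosen number of context lines."""
    return list(difflib.unified_diff(old, new, "a/file", "b/file",
                                     n=context, lineterm=""))


# A 20-line file with a single edited line.
old = [f"line {i}" for i in range(1, 21)]
new = old.copy()
new[9] = "line 10 (edited)"

default_diff = unified(old, new, context=3)  # git's default context size
tight_diff = unified(old, new, context=1)    # changed lines + 1 line of context

print(len(default_diff), "diff lines with 3 lines of context")
print(len(tight_diff), "diff lines with 1 line of context")
print("\n".join(tight_diff))
```

The changed lines are identical in both versions; only the unchanged context shrinks, which is where the token savings come from on large diffs.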

Research Apr 22, 2026

The Real Cost of Duplicate Tool Calls in AI Agent Sessions

Analysis of 500 real Claude Code sessions found 34% of Read tool calls were redundant. At $15/MTok for Opus, each repeated file read wastes ~$0.006 — small per turn, painful at scale.
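A redundant-read check can be sketched as a counter over a session's tool calls. The log format here (a list of `(tool, path)` pairs) is hypothetical, and the 400-token read size is simply what the quoted figures imply ($0.006 at $15/MTok), not a measured value. The sketch also treats every repeat as waste, although a re-read after an edit may be legitimate.

```python
from collections import Counter
from typing import Iterable, Tuple


def redundant_reads(calls: Iterable[Tuple[str, str]]) -> Counter:
    """Count repeat Read calls per path; the first read of each path is not counted."""
    seen: Counter = Counter()
    repeats: Counter = Counter()
    for tool, path in calls:
        if tool != "Read":
            continue
        if seen[path]:
            repeats[path] += 1
        seen[path] += 1
    return repeats


def wasted_usd(repeat_count: int, tokens_per_read: int = 400,
               usd_per_mtok: float = 15.0) -> float:
    """Estimated cost of repeated reads at a per-million-token rate."""
    return repeat_count * tokens_per_read * usd_per_mtok / 1_000_000


# Hypothetical session log.
session = [
    ("Read", "app.py"), ("Edit", "app.py"), ("Read", "app.py"),
    ("Read", "utils.py"), ("Read", "app.py"),
]
repeats = redundant_reads(session)
total = sum(repeats.values())
print(dict(repeats), f"wasted about ${wasted_usd(total):.3f}")
```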

Model Selection Apr 15, 2026

Model Routing in 2026: When Sonnet Outperforms Opus at 1/5th the Cost

Not every agent task needs Claude Opus. Single-file edits, test generation, and short lookups perform identically on Sonnet — saving teams $200+/month with zero quality loss on routine tasks.

Technique Comparison Mar 20, 2026

LLMLingua vs. Rule-Based Compression: A Developer's Benchmark

LLMLingua uses perplexity scoring to drop tokens at inference time. Deterministic rule pipelines work at the prompt layer before the API call. We tested both on 1,000 real developer prompts.

Research Mar 10, 2026

Why Typos in Agent Prompts Cost More Than You Think

A single typo in an agent prompt can trigger a clarification turn, adding 200–500 tokens of overhead. Typo-caused retries account for 8–15% of total token spend across a typical week of Claude Code usage.

Stop wasting
tokens and money.

Optimize every prompt. Monitor every agent session. And give your whole team visibility with Terse Cloud — analytics by developer, project, and tool.


