
Terse Documentation

Everything you need to install, configure, and use Terse — the on-device token optimizer and agent monitor for macOS.

Getting Started
How Terse Works
Connecting Apps
Agent Monitoring
Optimization Modes
Auto-Mode and Send-Mode
Keyboard Shortcuts
Privacy & Security
Pricing
Tech Stack

Getting Started

Download and Install

Terse is distributed as a DMG file from GitHub Releases. Download the latest .dmg file for macOS, open it, and drag the Terse app into your Applications folder.

First-Time Setup

Terse is not currently signed with an Apple Developer certificate. On first launch, macOS will block the app. To open it:

Open Finder and navigate to Applications
Right-click (or Control-click) on Terse
Select Open from the context menu
In the dialog that appears, click Open to confirm

You only need to do this once. After the first launch, macOS will remember your choice and Terse will open normally.

Grant Accessibility Permissions

Terse uses the macOS Accessibility API to read and write text in other applications. On first launch, macOS will prompt you to grant Accessibility access:

Open System Settings → Privacy & Security → Accessibility
Click the + button and add Terse
Ensure the toggle is enabled

Without Accessibility permissions, Terse cannot read text from connected apps or write optimized text back. The optimizer will still work in manual copy-paste mode, but automatic capture and replace will be disabled.

How Terse Works

Terse runs as a lightweight macOS application with two windows:

The Main Window

The main window is a session manager that lists all connected applications. You can open it at any time with Cmd+Shift+T. From here you can add or remove sessions, view optimization statistics, and access settings.

The Popup Window

The popup is a floating bar that appears at the top of your screen when you focus a connected app. It shows live optimization stats, mode controls, and action buttons (Capture, Replace, Copy). The popup stays above other windows (alwaysOnTop) and does not steal focus from your active application.

The 7-Stage Optimization Pipeline

Every prompt that Terse processes passes through a 7-stage pipeline:

Code block protection — extracts and preserves code blocks, URLs, and inline code so they are never modified
Spell correction — fixes typos using a hardcoded dictionary for common coding/prompt typos, then runs macOS NSSpellChecker for broader coverage
Whitespace normalization — collapses multiple spaces, removes trailing whitespace, normalizes line breaks
Pattern optimization — applies 20+ rule-based transformations to shorten verbose phrases
NLP analysis — identifies and removes filler words, hedging language, politeness markers, and meta-language
Telegraph compression — removes articles, shortens syntax, applies abbreviations (Aggressive mode only)
Code block restoration — reinserts all protected code blocks in their original positions

The pipeline runs entirely on-device, so no network calls add latency. Typical processing time is under 5 milliseconds for prompts of up to 4,000 tokens.
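
The first and last stages bracket everything in between, so a small sketch may help show why protected content survives the other stages. The Python below is an illustration only, not Terse's implementation: the placeholder format, the regular expression, and the function names (protect, restore, optimize) are assumptions made for this example.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable

# Assumed patterns for "protected" content; the real rules are not documented here.
PROTECTED = re.compile(
    FENCE + r".*?" + FENCE   # fenced code blocks
    + r"|`[^`\n]+`"          # inline code
    + r"|https?://\S+",      # URLs
    re.DOTALL,
)


def protect(text):
    """Swap protected spans for numbered placeholders; return (masked, spans)."""
    spans = []

    def stash(match):
        spans.append(match.group(0))
        # Hypothetical placeholder: NUL-delimited index, unlikely to appear in a prompt.
        return "\x00{}\x00".format(len(spans) - 1)

    return PROTECTED.sub(stash, text), spans


def restore(text, spans):
    """Put every protected span back where its placeholder sits."""
    return re.sub(r"\x00(\d+)\x00", lambda m: spans[int(m.group(1))], text)


def optimize(text, stages):
    """Run the middle stages on masked text, then restore protected spans."""
    masked, spans = protect(text)
    for stage in stages:
        masked = stage(masked)
    return restore(masked, spans)
```

Passing a whitespace-collapsing function as the only stage leaves the spacing inside inline code untouched while the surrounding prose is normalized.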

Connecting Apps

Terse automatically detects supported applications through a combination of macOS Accessibility API inspection and process scanning. No manual configuration is needed for supported apps.

ChatGPT (Chrome / Safari)

Terse detects ChatGPT via the Accessibility API when Chrome or Safari has a ChatGPT tab in focus. It reads the textarea content directly via AXTextArea / AXValue. Both the compose box and conversation history are accessible.

Claude Code

Claude Code runs as a terminal process. Terse detects it via process scanning (every 5 seconds) and tails the JSONL conversation log to extract token usage, tool calls, and costs. Prompt optimization is applied via the clipboard, since terminal emulators do not expose editable text fields through Accessibility.

Cursor

Cursor is detected through both the Accessibility API and process scanning. Terse reads the compose input via Accessibility when available and falls back to selection-based capture (Cmd+C) for the Electron canvas editor, as it does for VS Code.

OpenClaw

OpenClaw is detected via process scanning. Terse tails its conversation logs to extract token metrics and provides optimization for prompts via clipboard integration.

Aider

Aider is detected via process scanning. Because Aider is a terminal-based tool, Terse monitors its process output and provides prompt optimization through clipboard-based capture and replace.

VS Code

VS Code uses an Electron canvas renderer for its editor, which means the Accessibility API cannot read editor content directly. Terse detects VS Code via Accessibility and falls back to selection-based capture: it uses Cmd+C to copy selected text, optimizes it, and can write it back via Cmd+A + Cmd+V in the target field.

App Detection Notes

Terse prioritizes AXTextArea elements over AXTextField when scanning via Accessibility, since text areas are more likely to be prompt input fields. Results shorter than 5 characters are ignored as they are typically cursor indicators or UI fragments, not actual prompt content.
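
The selection rule described above can be sketched in a few lines of Python. This is a simplified model, not the code in terse-ax: the AXElement class and pick_prompt_field function are hypothetical, and measuring length after trimming whitespace is an assumption.

```python
from dataclasses import dataclass

MIN_PROMPT_LENGTH = 5  # from the text: shorter results are ignored

# Lower number means higher priority.
ROLE_PRIORITY = {"AXTextArea": 0, "AXTextField": 1}


@dataclass
class AXElement:
    """Hypothetical stand-in for an element found during an Accessibility scan."""
    role: str
    value: str


def pick_prompt_field(elements):
    """Return the most likely prompt input, or None if nothing qualifies."""
    candidates = [
        e for e in elements
        if e.role in ROLE_PRIORITY
        and len(e.value.strip()) >= MIN_PROMPT_LENGTH  # assumption: trimmed length
    ]
    if not candidates:
        return None
    # min() keeps the first element among equals, so scan order breaks ties.
    return min(candidates, key=lambda e: ROLE_PRIORITY[e.role])
```

With this rule, a text area holding only a cursor character is skipped, and a text field is chosen only when no qualifying text area exists.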

Agent Monitoring

The agent monitoring system provides real-time visibility into what your AI agents are doing, how much they cost, and where efficiency can be improved.

Session Detection

Terse scans running processes every 5 seconds to detect active agent sessions. When a supported agent process is found (Claude Code, Cursor, OpenClaw, Aider), Terse begins monitoring its activity automatically.
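
As a rough illustration of periodic scanning, the sketch below compares two process snapshots to work out which sessions started and which ended. It is written in Python for brevity, while the actual scanner lives in the Rust backend. The executable names in AGENT_PROCESS_NAMES are guesses for illustration, and both function names are hypothetical.

```python
SCAN_INTERVAL_SECONDS = 5  # from the text

# Hypothetical executable names; the real matching rules are not documented here.
AGENT_PROCESS_NAMES = {
    "claude": "Claude Code",
    "cursor": "Cursor",
    "openclaw": "OpenClaw",
    "aider": "Aider",
}


def detect_agents(process_names):
    """Return the set of agent labels present in one process snapshot."""
    found = set()
    for name in process_names:
        label = AGENT_PROCESS_NAMES.get(name.lower())
        if label:
            found.add(label)
    return found


def diff_sessions(previous, current):
    """Compare two scans and return (started, ended) sets of agent labels."""
    return current - previous, previous - current
```

A caller would take a snapshot every SCAN_INTERVAL_SECONDS, begin monitoring each label in started, and stop monitoring each label in ended.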

Metrics Tracked

Input tokens — total tokens sent to the model across all turns
Output tokens — total tokens generated by the model
Cache read/write — input tokens served from cache (read) vs. freshly processed and written to cache (write), for providers that support prompt caching
Tool calls — count and type of every tool invocation (Read, Edit, Bash, Grep, Glob, etc.)
Cost — per-turn and cumulative session cost, computed from provider pricing
Context fill — percentage of the model's context window currently in use
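
To make these metrics concrete, here is a minimal Python sketch that aggregates them from JSONL log lines. Everything about the record layout is an assumption: the field names (usage, tool_calls, input_tokens, and so on), the prices, and the context window size are placeholders, and cache tokens are counted but not priced. Treating the latest turn's input tokens as the current context size is also an assumption.

```python
import json
from collections import Counter

# Hypothetical prices in USD per million tokens; real provider pricing differs.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}
CONTEXT_WINDOW = 200_000  # assumed context window size in tokens


def summarize_session(jsonl_lines):
    """Aggregate token, tool, and cost metrics from JSONL log lines."""
    tokens = Counter()
    tools = Counter()
    last_turn_input = 0
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines in the log
        record = json.loads(line)
        usage = record.get("usage", {})
        tokens["input"] += usage.get("input_tokens", 0)
        tokens["output"] += usage.get("output_tokens", 0)
        tokens["cache_read"] += usage.get("cache_read_input_tokens", 0)
        tokens["cache_write"] += usage.get("cache_creation_input_tokens", 0)
        last_turn_input = usage.get("input_tokens", last_turn_input)
        for call in record.get("tool_calls", []):
            tools[call["name"]] += 1
    cost = sum(tokens[kind] * price for kind, price in PRICE_PER_MTOK.items()) / 1_000_000
    return {
        "tokens": dict(tokens),
        "tools": dict(tools),
        "cost_usd": cost,
        "context_fill_pct": 100 * last_turn_input / CONTEXT_WINDOW,
    }
```

Because the function accepts any iterable of lines, the same code could serve a one-off file read or a tail that yields new lines as they are appended.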

Insights and Alerts

Terse generates actionable insights based on session activity:

Context fill warnings — advisory at 60%, critical at 85%, limit warning at 95%
Duplicate tool calls — flags when the agent invokes the same tool with the same arguments multiple times
Redundant file reads — detects when a file is read again without intervening edits
Cost spikes — alerts when a single turn exceeds a configurable cost threshold
Cache efficiency — reports the ratio of cached vs. uncached input tokens
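
Three of these checks are simple enough to sketch directly. The Python below is illustrative only: the function names and input shapes are hypothetical, and treating each threshold as inclusive (a fill of exactly 60% counts as advisory) is an assumption.

```python
import json
from collections import Counter

# Thresholds from the text, checked from most to least severe.
FILL_LEVELS = [(95, "limit"), (85, "critical"), (60, "advisory")]


def context_fill_level(percent):
    """Map a context fill percentage to a warning level, or None below 60%."""
    for threshold, label in FILL_LEVELS:
        if percent >= threshold:
            return label
    return None


def duplicate_tool_calls(calls):
    """calls: list of (tool_name, args_dict). Return calls seen more than once."""
    counts = Counter(
        (name, json.dumps(args, sort_keys=True))  # canonical form of the arguments
        for name, args in calls
    )
    return {key: n for key, n in counts.items() if n > 1}


def redundant_reads(events):
    """events: ordered (action, path) pairs. Return paths re-read with no edit between."""
    read_since_edit = set()
    redundant = []
    for action, path in events:
        if action == "Edit":
            read_since_edit.discard(path)  # an edit makes the next read legitimate
        elif action == "Read":
            if path in read_since_edit:
                redundant.append(path)
            read_since_edit.add(path)
    return redundant
```

Serializing arguments with sorted keys means two calls with the same arguments in a different key order still count as duplicates.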

CLAUDE.md Rule Generation

Based on observed patterns in agent sessions, Terse can generate CLAUDE.md rules that teach agents to avoid wasteful behaviors. For example, if an agent repeatedly reads the same configuration files at the start of every session, Terse generates a rule instructing the agent to skip those reads unless the task specifically requires them. Learn more about how this extends selective context pruning across sessions.
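
One plausible way to derive such a rule is to count how many sessions open by reading the same file. The Python sketch below assumes that approach. The generate_rules function, the min_sessions threshold, and the rule wording are all hypothetical and are not taken from Terse's output.

```python
from collections import Counter


def generate_rules(sessions, min_sessions=3):
    """sessions: one list of file paths per session (files read at startup).

    Returns rule lines for files read at the start of at least min_sessions sessions.
    """
    # set() ensures a file read twice in one session counts once for that session.
    counts = Counter(path for reads in sessions for path in set(reads))
    repeated = sorted(path for path, n in counts.items() if n >= min_sessions)
    return [
        f"- Do not read `{path}` at session start unless the task requires it."
        for path in repeated
    ]
```

The returned lines could then be appended to a CLAUDE.md file after the user reviews them.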

Optimization Modes

Terse offers three optimization modes, each applying progressively more aggressive transformations. You can toggle between modes using the popup bar.

Soft Mode

The lightest optimization level. Soft mode applies only:

Spell correction — fixes typos via the hardcoded dictionary and macOS NSSpellChecker
Whitespace normalization — collapses extra spaces and blank lines

Soft mode preserves 100% of the original meaning and intent. It is ideal when you want clean, error-free prompts without any content changes. Typical token reduction: 2-5%.
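
The whitespace half of Soft mode is easy to approximate. The Python below is a sketch of the three behaviors named in the pipeline (collapsing spaces, removing trailing whitespace, normalizing line breaks), not Terse's own code. The normalize_whitespace name and the choice to keep at most one blank line between paragraphs are assumptions.

```python
import re


def normalize_whitespace(text):
    """Collapse spaces, drop trailing whitespace, and normalize line breaks."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")  # unify line endings
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r" +\n", "\n", text)      # remove trailing whitespace on each line
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line
    return text.strip()
```

Note that this also flattens leading indentation, which is acceptable here only because code blocks are protected before this stage runs.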

Normal Mode

The balanced default. Normal mode includes everything in Soft mode plus:

Filler word removal — strips "just", "actually", "basically", "really", and similar low-information words
Hedging removal — removes "I think", "maybe", "perhaps", "it seems like" and other uncertainty markers
Politeness pruning — strips "please", "could you kindly", "would you mind", and similar courtesy phrases that LLMs do not need
Meta-language removal — removes "I want you to", "what I need is", "the thing is" and other self-referential phrasing
Pattern optimization — applies 20+ phrase-shortening rules

Normal mode offers a good balance between compression and readability. The optimized text is still natural-sounding but more concise. Typical token reduction: 15-25%.
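
A naive version of these removals can be written as a single regular expression over a phrase list. The Python sketch below uses only the example phrases quoted above. The real rule set is larger and presumably more context-aware, and the strip_low_information name is hypothetical.

```python
import re

# Sample phrases quoted in the text; not the full rule set.
PHRASES = [
    "could you kindly", "would you mind", "i want you to", "what i need is",
    "it seems like", "the thing is", "i think", "please", "maybe", "perhaps",
    "just", "actually", "basically", "really",
]

# Longest phrases first so multi-word phrases win over shorter overlaps.
PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(p) for p in sorted(PHRASES, key=len, reverse=True))
    + r")\b,?\s*",
    re.IGNORECASE,
)


def strip_low_information(text):
    """Remove filler, hedging, politeness, and meta-language phrases."""
    out = PATTERN.sub("", text)
    return re.sub(r"\s{2,}", " ", out).strip()
```

A word-list approach like this cannot tell when a word carries meaning (for example "just" in "just-in-time"), which is one reason a sketch like this should not be mistaken for the NLP analysis stage itself.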

Aggressive Mode

Maximum compression. Aggressive mode includes everything in Normal mode plus:

Telegraph compression — removes articles ("the", "a", "an"), shortens sentence structure
Abbreviation — shortens common phrases to abbreviations where unambiguous
Markdown stripping — removes formatting markers that consume tokens without affecting LLM comprehension

Aggressive mode may sacrifice some nuance and readability for maximum token savings. The resulting text reads more like shorthand or technical notes. Typical token reduction: 25-40%.
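
The shorthand effect is easiest to see on a single sentence. The Python sketch below removes articles and applies a tiny abbreviation table. The table entries and the telegraph function name are hypothetical examples, not Terse's actual rules, and Markdown stripping is left out.

```python
import re

ARTICLES = re.compile(r"\b(?:the|an|a)\b\s*", re.IGNORECASE)

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {
    "for example": "e.g.",
    "configuration": "config",
    "documentation": "docs",
}


def telegraph(text):
    """Abbreviate known phrases, then drop articles and tidy spacing."""
    for phrase, short in ABBREVIATIONS.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", short, text, flags=re.IGNORECASE)
    text = ARTICLES.sub("", text)
    return re.sub(r"\s{2,}", " ", text).strip()
```

For instance, "Update the configuration file and add a section to the documentation" becomes "Update config file and add section to docs".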

Auto-Mode and Send-Mode

Auto-Mode

When Auto-Mode is enabled (toggled via the popup bar), Terse automatically monitors the active app's text field and writes the optimized version back whenever a change is detected. The workflow is fully hands-free:

You type or paste a prompt in the connected app
Terse detects the change via Accessibility polling
The optimization pipeline runs on the new text
The optimized text is written back to the app's input field via AXValue (or Cmd+A + Cmd+V fallback)

Auto-Mode works best with apps that expose editable text fields via Accessibility (ChatGPT in Chrome/Safari, Cursor's compose box). For terminal-based agents like Claude Code and Aider, manual Send-Mode is recommended.
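
One detail worth illustrating is that writing optimized text back is itself a change to the field, so a polling loop needs a guard against re-processing its own output. The Python sketch below shows one polling tick under that assumption. The auto_mode_tick function, its callbacks, and the state dictionary are hypothetical stand-ins for the Accessibility read and write calls.

```python
def auto_mode_tick(read_field, write_field, optimize, state):
    """Run one polling tick. Returns True if optimized text was written back.

    read_field / write_field are hypothetical callbacks wrapping the
    Accessibility read and write; state persists between ticks.
    """
    current = read_field()
    if current == state.get("last_seen"):
        return False  # nothing changed since the last tick
    optimized = optimize(current)
    if optimized != current:
        write_field(optimized)
    # Remember the text now in the field so our own write is not seen as a change.
    state["last_seen"] = optimized
    return optimized != current
```

Calling this function on a timer gives the hands-free behavior described above: the first tick after an edit rewrites the field, and later ticks do nothing until the user types again.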

Send-Mode (Manual)

Send-Mode is the default workflow. You control each step:

Capture — reads the current text from the active app (or use Cmd+Shift+C to capture selected text)
Replace — writes the optimized text back to the app's input field
Copy — copies the optimized text to your clipboard for manual pasting

Send-Mode gives you full control over when optimization is applied and lets you review the result before committing it.

Keyboard Shortcuts

Shortcut	Action	Context
Cmd+Shift+T	Toggle main window	Global — works from any application
Cmd+Shift+C	Capture selected text	Global — captures from the active app and sends to Terse for optimization

Privacy & Security

Terse is designed with a strict on-device architecture. Your prompts, text content, and agent session data never leave your machine.

100% on-device processing — all optimization runs locally. There are no API calls to external services for text processing.
No cloud, no telemetry — Terse does not phone home, does not collect usage analytics, and does not transmit any data over the network (except for license validation, which sends only your license key).
No data storage — Terse does not persist your prompts or conversation content to disk. Text is held in memory only during active optimization and discarded immediately after.
Code protection — code blocks, inline code, and URLs are extracted before optimization and restored after, ensuring they are never modified by the optimization pipeline.
Accessibility API only — Terse reads app content through the official macOS Accessibility API, the same system used by screen readers and assistive technology. It does not inject code, hook processes, or use private APIs.

Pricing

Free

$0/mo

50 optimizations per week
1 connected session
All three optimization modes
Spell correction
Basic popup interface

Pro

$7.99/mo

Unlimited optimizations
3 concurrent sessions
Agent monitoring dashboard
Context fill warnings
Duplicate tool call detection
CLAUDE.md rule generation

Premium

$99/mo

Unlimited everything
Unlimited concurrent sessions
Advanced agent insights
Priority support
Early access to new features

All plans include the full on-device optimization pipeline. Paid plans are billed monthly and can be cancelled at any time. After initial activation, licenses are validated locally and do not require a persistent internet connection.

Tech Stack

Terse is built on a modern, performant stack optimized for macOS:

Tauri — Rust-based application framework. Handles window management, process scanning, IPC, and the agent monitoring backend. Tauri produces smaller, faster binaries (~15 MB) than Electron alternatives.
Rust backend — the agent monitor, session manager, and JSONL log parser are implemented in Rust for performance and reliability. Token counting, cost computation, and context analysis run in Rust threads.
Web frontend — the popup and main window UIs are built with vanilla HTML, CSS, and JavaScript. With no framework overhead, the popup renders in under 16 ms.
Swift helper (terse-ax) — a compiled Swift binary that interfaces with the macOS Accessibility API. Handles reading text from apps (read-app), writing text back (write-pid), and spell checking via NSSpellChecker.
Norvig-style spelling — the spell correction engine uses a probabilistic approach inspired by Peter Norvig's spell corrector, combined with a hardcoded dictionary for common coding and prompt typos.
nspell / Hunspell — additional dictionary-based spell checking for broader language coverage beyond the hardcoded typo dictionary.

Need Help?

If you run into issues or have questions not covered here, you can:

File an issue on GitHub Issues
Check the Releases page for the latest version and changelog
Explore the technique deep-dives: Spell Correction, Pattern Optimization, NLP Analysis, Telegraph Compression
Read the research pages: LLMLingua, Norvig Spelling, Selective Context

Ready to Optimize?

Download Terse and start saving tokens in under two minutes. Free tier included.
