Telegraph Compression
Maximum token reduction for AI agent sessions. Inspired by telegraph-era word economy, Terse's aggressive mode strips prompts to their essential meaning and achieves 40-70% compression.
What Is Telegraph Compression?
In the 1850s, sending a telegram cost money per word. This economic pressure created a new dialect: telegraph English. Articles disappeared. Pronouns vanished. Every word that did not carry critical meaning was stripped away. "I am arriving on the morning train on Tuesday" became "ARRIVING MORNING TRAIN TUESDAY." The meaning survived. Nine words became four, and the cost dropped by more than half.
AI token pricing creates the same economic pressure. Every token in your prompt costs money and consumes context window space. Telegraph compression applies the same principle: strip everything that does not carry meaning, and let the model reconstruct the intent from the essential words that remain.
Terse's Aggressive mode implements telegraph compression as the final stage of its optimization pipeline. After spell correction, pattern optimization, and NLP analysis have already removed structural waste, telegraph compression goes further by removing entire categories of words that carry minimal information in the context of an AI prompt.
How Telegraph Compression Works in Terse
Article Removal
English articles ("the", "a", "an") are among the most frequent tokens in any prompt. They help human readers parse sentences but add almost zero information for language models. LLMs are trained on enough text that they can reconstruct the intended meaning without articles in most contexts.
Before: "Fix the bug in the authentication module where the
token is not refreshed after the session expires"
After: "Fix bug in authentication module where token not
refreshed after session expires"
Saved: 5 tokens
Before: "Create a new endpoint that returns a list of all the
active users in the database"
After: "Create new endpoint returning list of active users
in database"
Saved: 6 tokens
Terse does not blindly strip every article. Articles inside code blocks, quoted strings, and variable names are preserved. The phrase a = 5 keeps its "a" because it is a variable name, not an article. Similarly, articles in strings like "The user has been deleted" are preserved because they are part of output that end users will see.
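The behavior described above can be sketched in a few lines of Python. This is only an illustration, not Terse's actual implementation: the function name `strip_articles`, the regular expression, and the choice of protected spans (double-quoted strings, inline code, and an article followed by `=`) are all assumptions made for the example.

```python
import re

# Hypothetical pattern, not Terse's real rule set.
# Alternation order matters: protected spans are tried first and
# returned unchanged, so articles inside them are never examined.
_ARTICLE_OR_PROTECTED = re.compile(
    r'("[^"\n]*"|`[^`\n]*`)'           # group 1: quoted string or inline code
    r'|\b(?:the|a|an)\b(?!\s*=)\s*',    # article, unless followed by "=" (likely a variable)
    re.IGNORECASE,
)


def strip_articles(text: str) -> str:
    """Remove English articles outside quoted strings and inline code."""
    # For a protected span, group 1 is set and is put back as-is.
    # For an article, group 1 is None and the match is replaced with "".
    return _ARTICLE_OR_PROTECTED.sub(lambda m: m.group(1) or "", text)


print(strip_articles("Fix the bug in the authentication module"))
print(strip_articles('Set a = 5 and log "The user has been deleted"'))
```

A single pass with "protected span first" alternation keeps the rule order explicit, which is one simple way to make sure a quoted "The" or a variable named a is skipped rather than stripped.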
Pronoun Dropping
First-person and second-person pronouns are almost entirely redundant in AI prompts. When you send a message to Claude or GPT-4, the model already knows that "I" refers to the user and "you" refers to the model. These pronouns consume tokens without adding information.
Before: "I want you to refactor this function so that it uses
async/await instead of callbacks. I also need you to
add error handling."
After: "Refactor function to use async/await instead of
callbacks. Add error handling."
Saved: 11 tokens
Before: "Can you explain to me how I should structure my
database schema for a multi-tenant application?"
After: "Explain database schema structure for multi-tenant
application"
Saved: 9 tokens
Pronoun dropping combines with the implicit context removal from NLP analysis to eliminate the conversational framing that surrounds most prompts. The result reads more like a command than a conversation, but language models generally handle both styles well, and direct commands are often at least as effective as polite requests.
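A minimal sketch of pronoun dropping, assuming it is done with phrase-level patterns rather than by deleting individual pronouns. The patterns in `_FRAMING` and the function `drop_request_framing` are hypothetical and cover only the request framing ("I want you to", "Can you"), not the other rewrites shown in the examples above.

```python
import re

# Hypothetical framing patterns; Terse's real rules are not documented here.
_FRAMING = [
    r"\bI\s+(?:also\s+)?(?:want|need|would\s+like)\s+you\s+to\s+",
    r"\b(?:can|could|would)\s+you\s+(?:please\s+)?",
]
_FRAMING_RE = re.compile("|".join(_FRAMING), re.IGNORECASE)


def drop_request_framing(text: str) -> str:
    """Strip first/second-person request framing, then re-capitalize sentences."""
    stripped = _FRAMING_RE.sub("", text)
    # Capitalize the first letter of the text and of each following sentence.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        stripped,
    )


print(drop_request_framing(
    "I want you to refactor this function. I also need you to add error handling."
))
```

Matching whole phrases avoids removing "you" or "I" where they carry meaning, for example inside a sentence that describes what an end user does.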
Stopword Removal
Beyond articles and pronouns, English contains dozens of high-frequency, low-information words: "just", "really", "very", "basically", "actually", "simply", "quite", "rather", "somewhat", "perhaps". These words modulate tone but carry negligible semantic weight in technical prompts.
Before: "I'm just trying to basically understand why this
function is actually returning undefined instead of
the value I really expected"
After: "Why function returning undefined instead of expected
value"
Saved: 14 tokens
Before: "Could you perhaps help me to simply add some very
basic input validation to this form?"
After: "Add basic input validation to form"
Saved: 11 tokens
Terse maintains a curated stopword list tuned specifically for AI prompts. Unlike generic NLP stopword lists, Terse's list excludes words that carry important meaning in programming contexts. "Not", "no", "without", and "except" are never removed because negation is semantically critical. "If", "when", "while", and "until" are preserved because they define conditional logic. The list targets only words that serve as conversational filler in the specific context of human-to-AI communication.
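The idea of a prompt-specific stopword list can be illustrated as a generic list minus a protected set. The word lists below are taken from the examples in this section and are not Terse's full curated list; `strip_filler` is a hypothetical name.

```python
import re

# Illustrative lists only, built from the words named in this section.
GENERIC_STOPWORDS = {
    "just", "really", "very", "basically", "actually", "simply",
    "quite", "rather", "somewhat", "perhaps",
    "not", "no", "without", "except", "if", "when", "while", "until",
}
# Negation and conditional words are never removed.
PROTECTED = {"not", "no", "without", "except", "if", "when", "while", "until"}
PROMPT_STOPWORDS = GENERIC_STOPWORDS - PROTECTED


def strip_filler(text: str) -> str:
    """Drop filler words while leaving negation and conditional words in place."""
    kept = []
    for word in text.split():
        # Compare without surrounding punctuation and case.
        core = re.sub(r"^\W+|\W+$", "", word).lower()
        if core in PROMPT_STOPWORDS:
            continue  # note: punctuation attached to a dropped word is dropped too
        kept.append(word)
    return " ".join(kept)


print(strip_filler("This function is actually not returning the value I really expected"))
```

Deriving the working list by set subtraction makes the safety rule explicit: a word in the protected set cannot be removed even if it is later added to the generic list.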
Markdown Noise Stripping
Many users format their prompts with Markdown: headers, bold markers, bullet points, horizontal rules. When you are typing into a plain text input that gets sent as a prompt, these formatting characters consume tokens without producing any visual formatting. The model processes them as literal characters.
Before: "## Task\n\n**Please** review the following code:\n\n
- Check for bugs\n- Check for performance issues\n
- Suggest improvements\n\n---\n\n```python\ndef foo()..."
After: "Review following code. Check for bugs, performance
issues. Suggest improvements.\n\npython\ndef foo()..."
Saved: 12 tokens
Terse strips Markdown headers (##), bold/italic markers (**, *), horizontal rules (---), and converts bullet lists into comma-separated phrases when the items are short. Code fences are simplified but the code content inside them is never modified. This is a critical safety boundary: telegraph compression operates on natural language only and never touches code tokens.
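The safety boundary described above can be sketched by splitting the prompt on fenced code blocks and cleaning only the prose segments. This is an assumption about one possible approach, not Terse's code: `strip_markdown_noise` is a hypothetical name, the sketch leaves code fences fully intact rather than simplifying them, and it removes bullet markers without merging the items into comma-separated phrases.

```python
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
_FENCED_BLOCK = re.compile("(" + FENCE + ".*?" + FENCE + ")", re.DOTALL)


def strip_markdown_noise(prompt: str) -> str:
    """Remove Markdown markers from prose; leave fenced code untouched."""
    parts = _FENCED_BLOCK.split(prompt)
    cleaned = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd indices hold the captured fenced blocks: copy them verbatim.
            cleaned.append(part)
            continue
        part = re.sub(r"^#{1,6}[ \t]+", "", part, flags=re.MULTILINE)        # headers
        part = re.sub(r"^[ \t]*(?:-{3,}|\*{3,}|_{3,})[ \t]*$", "", part,
                      flags=re.MULTILINE)                                    # horizontal rules
        part = re.sub(r"^[ \t]*[-*+][ \t]+", "", part, flags=re.MULTILINE)   # bullet markers
        part = re.sub(r"\*\*(.+?)\*\*", r"\1", part)                         # bold
        part = re.sub(r"\*(\S(?:.*?\S)?)\*", r"\1", part)                    # italic
        part = re.sub(r"\n{3,}", "\n\n", part)                               # extra blank lines
        cleaned.append(part)
    return "".join(cleaned)
```

An unclosed fence is not matched by this pattern, so in that case the whole prompt would be treated as prose; a production version would need to handle that case explicitly.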
Low-Information Word Removal
The final layer of telegraph compression targets words and phrases with minimal information density. These are not stopwords in the traditional sense but modifiers that add emphasis without changing how the model interprets the prompt.
Before: "Write a comprehensive and detailed implementation of
a binary search tree data structure in TypeScript
with full type annotations"
After: "Implement binary search tree TypeScript typed"
Saved: 14 tokens
Before: "Please provide a thorough explanation of how the
JavaScript event loop works, including microtasks
and macrotasks"
After: "Explain JavaScript event loop, microtasks, macrotasks"
Saved: 9 tokens
Words like "comprehensive", "detailed", "thorough", and "full" are common in prompts but rarely change the model's output. Claude and GPT-4 default to comprehensive responses regardless of whether you ask for them. Stripping these modifiers saves tokens without measurably affecting response quality.
When to Use Telegraph Compression
Telegraph compression is not appropriate for every situation. It is designed for specific use cases where maximum token efficiency outweighs preserving the original style and tone of the prompt.
Ideal Use Cases
- Agent sessions: Multi-turn coding sessions with Claude Code, Cursor, or Copilot where you are sending dozens of prompts and context window pressure is real. Telegraph compression keeps you within limits longer.
- API batch processing: When you are sending thousands of prompts through an API, even small per-prompt savings multiply into significant cost reduction. A 50% reduction on 10,000 prompts at $3/million tokens adds up.
- Context-heavy workflows: When your system prompt and retrieved context already consume most of the context window, compressing the user prompt with telegraph mode gives the model more room to reason.
- Rapid iteration: During development when you are sending the same type of prompt repeatedly with minor variations, telegraph compression removes the ceremony and lets you focus on what changes between iterations.
When NOT to Use Telegraph Compression
- Critical prompts with nuance: If the exact phrasing matters, such as legal document analysis, medical queries, or safety-critical instructions, use Soft or Normal mode instead. Telegraph compression may strip qualifiers that carry important meaning.
- Creative writing prompts: Tone, style, and word choice are the payload in creative prompts. Stripping them defeats the purpose. A prompt asking for "a whimsical, dreamy, slightly melancholic poem" should not become "write poem."
- Prompts with deliberate emphasis: If you have intentionally repeated or emphasized certain words to steer the model's attention, telegraph compression may remove that emphasis. Use Normal mode to preserve intentional structure.
- User-facing outputs: If the prompt will be shown to end users (e.g., in a chatbot interface), the telegraphic style may appear robotic or unfriendly. Keep the human-readable version for display.
How Terse Protects Sensitive Tokens
The most important design constraint in telegraph compression is knowing what not to remove. Terse implements several protection mechanisms to ensure that compression never corrupts functional content.
Code Block Preservation
Everything inside code fences, inline code markers, and indented code blocks is treated as immutable. Telegraph compression operates only on the natural language surrounding code. This means you can safely optimize prompts that contain code snippets, configuration files, error messages, or terminal output without any risk of the code being modified.
ALL-CAPS Protection
Words in ALL CAPS are preserved regardless of whether they would normally be stripped. ALL-CAPS words typically indicate emphasis, acronyms, constants, or environment variables. Removing "NOT" from "DO NOT delete the database" would invert the meaning entirely. Terse recognizes that capitalization signals importance and exempts these tokens from all compression.
URL and Path Preservation
URLs, file paths, and import statements are detected and protected. The articles and prepositions inside a URL like https://api.example.com/the/users/a/new are part of the path structure, not English grammar. Terse's tokenizer identifies these patterns before compression begins and marks them as immutable regions.
Quoted String Preservation
Text inside single quotes, double quotes, and backticks is preserved verbatim. These are typically string literals, error messages, or specific values that the user wants the model to see exactly as written. Telegraph compression skips over quoted regions entirely.
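The protection mechanisms in this section share one idea: find immutable regions first, then compress only the text between them. A minimal sketch of that idea follows. The patterns and the name `apply_outside_protected` are assumptions for illustration; file paths, import statements, indented code blocks, and single-quoted strings are left out to keep the example short.

```python
import re
from typing import Callable

FENCE = "`" * 3  # built at runtime so this example contains no literal fence

# Hypothetical patterns, tried in this order at each position.
_PROTECTED = re.compile(
    "|".join([
        FENCE + r".*?" + FENCE,      # fenced code blocks
        r"`[^`\n]+`",                # inline code
        r"https?://\S+",             # URLs
        r'"[^"\n]*"',                # double-quoted strings
        r"\b[A-Z][A-Z0-9_]+\b",      # ALL-CAPS words, constants, env vars
    ]),
    re.DOTALL,
)


def apply_outside_protected(text: str, transform: Callable[[str], str]) -> str:
    """Run `transform` on unprotected text only; copy protected regions verbatim."""
    pieces = []
    last = 0
    for match in _PROTECTED.finditer(text):
        pieces.append(transform(text[last:match.start()]))  # compressible gap
        pieces.append(match.group(0))                        # immutable region
        last = match.end()
    pieces.append(transform(text[last:]))
    return "".join(pieces)
```

Because the compression step only ever receives the gaps between protected regions, a rule that strips "not" or "the" cannot reach "NOT" in "DO NOT delete the database" or the path segments of a URL.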
Real Examples: Before and After
Here are complete prompt transformations showing the full effect of telegraph compression combined with the earlier pipeline stages.
ORIGINAL (87 tokens): "Hey, I'm working on a React application and I'm having trouble with the useEffect hook. When I navigate to a new page, the previous page's useEffect cleanup function doesn't seem to be running. Could you help me understand why this might be happening and suggest a fix?"
TELEGRAPH (31 tokens): "React useEffect cleanup not running on page navigation. Why? Fix?"
REDUCTION: 64% (56 tokens saved)
ORIGINAL (104 tokens): "I have a Python script that processes a large CSV file (about 2GB) and it's running really slowly. Currently I'm reading the entire file into memory using pandas read_csv. I think the issue is that it's loading everything at once. Can you suggest a way to process it in chunks or stream it so that it uses less memory?"
TELEGRAPH (29 tokens): "Python CSV processing 2GB file slow. Using pandas read_csv loads all into memory. Suggest chunked/streaming approach, reduce memory."
REDUCTION: 72% (75 tokens saved)
ORIGINAL (68 tokens): "Please write a comprehensive unit test suite for the following TypeScript function. Make sure to cover edge cases including null inputs, empty arrays, and very large numbers. Use Jest as the testing framework."
TELEGRAPH (22 tokens): "Write Jest unit tests for TypeScript function. Cover: null inputs, empty arrays, large numbers."
REDUCTION: 68% (46 tokens saved)
Relationship to LLMLingua Research
Microsoft Research's LLMLingua system pioneered the idea of using a small language model to score each token's contribution to the prompt's meaning, then dropping low-scoring tokens. This perplexity-based approach achieves impressive compression ratios on long documents and retrieval-augmented generation contexts.
Telegraph compression in Terse shares the same goal but uses a fundamentally different mechanism. Instead of scoring tokens with a neural model, it uses categorical rules: articles are always low-information, pronouns are redundant in human-to-AI communication, stopwords modulate tone but not meaning. This rule-based approach has three advantages over perplexity scoring.
First, it is deterministic. The same input always produces the same output. There is no model variance, no temperature setting, no randomness. This predictability is essential for automated pipelines where you need to know exactly what the model will receive.
Second, it is fast. Telegraph compression runs in under 2 milliseconds on typical prompts. LLMLingua has to run a small language model over the prompt to score its tokens, which adds hundreds of milliseconds or more depending on the prompt length and hardware. In an interactive agent session where the user expects real-time optimization, this latency difference matters.
Third, it runs entirely on-device with zero external dependencies. No GPU, no API call, no model download. Terse's telegraph compression works offline, on any machine, with constant memory usage. This aligns with Terse's core principle that prompt optimization should be a lightweight local tool, not a cloud service.
The tradeoff is precision. LLMLingua can identify that a specific adjective is important in context even if it would normally be classified as low-information. Telegraph compression operates on categories rather than context, which means it occasionally strips a word that carries more meaning than its category suggests. This is why Terse offers three modes (Soft, Normal, and Aggressive): users who need precision use Soft mode or Normal mode with NLP analysis, and users who want maximum compression accept the slight meaning loss of Aggressive mode.
Compounding Savings in Agent Sessions
The real power of telegraph compression emerges over the course of a long agent session. Consider a typical 50-turn coding session with an AI assistant. Without optimization, each user turn averages 120 tokens. With telegraph compression, each turn drops to approximately 45 tokens.
Over 50 turns, that is 6,000 tokens without optimization versus 2,250 tokens with telegraph compression. The 3,750 tokens saved are not just a cost reduction. They are context window space reclaimed for the model's reasoning. In long sessions where context truncation becomes a factor, this extra headroom can mean the difference between the model remembering a critical detail from turn 5 and losing it.
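The session arithmetic above can be checked with a short calculation. The second function goes one step further under an explicit assumption that is not stated in this article: that every request resends all earlier user turns, as stateless chat APIs typically do, and that model responses are ignored. Both function names are hypothetical.

```python
def session_savings(turns: int, tokens_before: int, tokens_after: int) -> dict:
    """Total user-prompt tokens over a session, with and without compression."""
    total_before = turns * tokens_before
    total_after = turns * tokens_after
    saved = total_before - total_after
    return {
        "before": total_before,
        "after": total_after,
        "saved": saved,
        "reduction_pct": round(100 * saved / total_before, 1),
    }


def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Assumption: turn k resends all k user turns so far (responses ignored)."""
    # Sum over k = 1..turns of k * tokens_per_turn.
    return tokens_per_turn * turns * (turns + 1) // 2


print(session_savings(50, 120, 45))
# {'before': 6000, 'after': 2250, 'saved': 3750, 'reduction_pct': 62.5}
print(cumulative_input_tokens(50, 120) - cumulative_input_tokens(50, 45))
# 95625
```

Under the resend assumption, the billed input saved across the session (95,625 tokens) is much larger than the 3,750 tokens removed from the prompts themselves, because each shortened turn is counted again on every later request.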
Telegraph compression also reduces the noise in the conversation history that the model reviews on each turn. Cleaner, more direct prompts in the history make it easier for the model to identify what the user actually wants, reducing the likelihood of misinterpretation in later turns. Several Terse users have reported that their agent sessions produce noticeably better results in later turns when telegraph compression is enabled, likely because the model spends less attention budget parsing conversational fluff from earlier turns.
Maximize Your Token Efficiency
Terse's Aggressive mode with telegraph compression achieves 40-70% token reduction on real prompts. It runs on-device and works with any AI model, and no data leaves your machine.
Download Terse
Further Reading
- Pattern Optimization — How Terse replaces verbose phrases with concise alternatives
- NLP Analysis — Linguistic techniques for structural token reduction
- LLMLingua — Perplexity-based prompt compression from Microsoft Research
- Selective Context — Academic research on context-aware prompt compression
- Spell Correction — The first stage of Terse's optimization pipeline