Pattern Optimization

130+ deterministic rules that compress verbose AI prompts by 20–40%, saving tokens on every turn across ChatGPT, Claude Code, and agent sessions.

Why Verbose Prompts Cost More Than You Think

LLM APIs bill per token. A token is roughly 4 characters, or three-quarters of an English word. When you write "I was wondering if you could maybe help me write a function that" instead of "Write function that", you are paying for about 16 tokens instead of about 4. The model receives the same instruction either way.

This matters at three levels. First, direct cost: every extra token in your input is billed at the model's per-token rate. At Claude Sonnet pricing, 1,000 unnecessary tokens cost roughly $0.003 on input. That sounds trivial until you multiply by the number of prompts per day.

Second, context window consumption. Every model has a finite context window. GPT-4o supports 128K tokens. Claude supports 200K. When your prompts are 40% longer than necessary, the same conversation fills the window in about 71% of the turns (1 / 1.4), so you hit the context ceiling roughly 29% sooner. In long agent sessions, this means earlier context truncation and degraded response quality.

Third, and most significantly, agent session compounding. In a Claude Code or Cursor session, every prompt you send becomes part of the conversation context. A 40% overhead on turn 1 gets re-read by the model on turn 2, turn 3, turn 4, and every subsequent turn. Across a 30-turn session, that initial verbosity gets billed 30 times. A single wordy prompt that could have been 50 tokens shorter ends up costing 1,500 extra tokens over the session.
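The compounding arithmetic can be checked with a short sketch. It assumes the overhead stays in context on every turn and that no prompt caching discount applies; the function names are illustrative, and the $3 per million input tokens figure is simply the $0.003 per 1,000 tokens quoted above, rescaled.

```python
def session_overhead_tokens(extra_tokens_per_prompt: int, turns: int) -> int:
    """Extra input tokens billed when a prompt's overhead is re-read on every turn."""
    return extra_tokens_per_prompt * turns


def overhead_cost_usd(tokens: int, usd_per_million_input_tokens: float) -> float:
    """Dollar cost of those tokens at a given input price (price is an assumption)."""
    return tokens * usd_per_million_input_tokens / 1_000_000


extra = session_overhead_tokens(extra_tokens_per_prompt=50, turns=30)
print(extra)  # 1500
print(overhead_cost_usd(extra, usd_per_million_input_tokens=3.0))
```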

Pattern optimization addresses this by applying deterministic, rule-based transformations that shorten verbose phrases without changing their meaning. It runs on-device in under 2 milliseconds and requires no API calls.

How 130+ Rules Compress Your Prompts

Terse's pattern optimization engine contains over 130 phrase-shortening rules organized into six categories. Each rule is a regex or string match paired with a replacement. The rules are applied in a specific order to avoid conflicts, and each one has been tested against thousands of real AI prompts to verify that it preserves intent.
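As a rough illustration of what "a regex paired with a replacement, applied in a specific order" can look like, here is a minimal sketch. The `Rule` type, the rule names, and the three rules shown are hypothetical and are not Terse's actual rule set or internals.

```python
import re
from typing import NamedTuple


class Rule(NamedTuple):
    name: str
    pattern: re.Pattern
    replacement: str


# Hypothetical rules. Order matters when rules overlap; here the more
# specific multi-word phrases are listed before the single-word filler.
RULES = [
    Rule("phrase:due-to-the-fact-that", re.compile(r"\bdue to the fact that\b", re.I), "because"),
    Rule("phrase:in-order-to", re.compile(r"\bin order to\b", re.I), "to"),
    Rule("filler:basically", re.compile(r"\bbasically\s+", re.I), ""),
]


def apply_rules(text: str, rules: list[Rule] = RULES) -> str:
    """Apply each rule once, in list order, to the output of the previous rule."""
    for rule in rules:
        text = rule.pattern.sub(rule.replacement, text)
    return text


print(apply_rules(
    "We basically cache it in order to avoid retries due to the fact that the API is slow."
))
# We cache it to avoid retries because the API is slow.
```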

Filler Removal

Filler words and phrases add no information. They are verbal habits carried over from speech into writing. LLMs do not need them and process them at full token cost.

"I basically think that we should" → "we should"
"So essentially what I need is"   → "I need"
"I just wanted to quickly"        → ""
"Actually, I think"               → ""
"To be honest,"                   → ""
"At the end of the day,"          → ""

These patterns appear in roughly 35% of prompts written in conversational style. Removing them typically saves 3 to 8 tokens per prompt with zero loss of meaning. The model does not need to know that you "basically" think something. It just needs the instruction.
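Deleting a filler phrase usually leaves stray spacing or a lowercase sentence start behind. The sketch below shows one way to handle that cleanup. The phrase list is a small subset of the examples above, and the function is illustrative only; it restores the capital on the first character and does not attempt mid-text sentence boundaries.

```python
import re

# Hypothetical subset of filler phrases, taken from the examples above.
FILLERS = re.compile(
    r"\b(?:to be honest|at the end of the day|so essentially|basically|actually),?\s*",
    re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Delete filler phrases, then tidy spacing and restore the leading capital."""
    text = FILLERS.sub("", text)
    text = re.sub(r"\s{2,}", " ", text).strip()
    return text[:1].upper() + text[1:]


print(strip_fillers("To be honest, the parser basically ignores  trailing commas."))
# The parser ignores trailing commas.
```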

Phrase Shortening

English has many multi-word phrases that can be replaced with shorter equivalents. These are not filler; they carry meaning. But the same meaning can be expressed in fewer tokens.

"in order to"              → "to"
"due to the fact that"     → "because"
"at this point in time"    → "now"
"in the event that"        → "if"
"with regard to"           → "about"
"in the process of"        → "currently"
"a large number of"        → "many"
"on a daily basis"         → "daily"
"has the ability to"       → "can"
"it is important to note"  → "note:"
"take into consideration"  → "consider"
"make sure that"           → "ensure"
"the vast majority of"     → "most"
"in close proximity to"    → "near"

This is the highest-impact category. Replacing a single "due to the fact that" with "because" saves about 4 tokens (five words down to one). Across a prompt with three or four such phrases, the savings can reach 10 to 15 tokens.
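A phrase table like the one above can be compiled into a single alternation so the text is scanned once. This is a sketch under assumptions: the dictionary holds only a few of the listed phrases, the names are made up for illustration, and word counts stand in for real tokenizer counts. Replacements are inserted in lowercase, so a sentence-initial match would need separate re-capitalization.

```python
import re

# Subset of the phrase table above.
SHORTENINGS = {
    "due to the fact that": "because",
    "at this point in time": "now",
    "in the event that": "if",
    "in order to": "to",
    "a large number of": "many",
    "has the ability to": "can",
}

# One alternation, longest phrase first, so a single pass handles every phrase.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(p) for p in sorted(SHORTENINGS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def shorten_phrases(text: str) -> str:
    """Replace each known long phrase with its shorter equivalent."""
    return _PATTERN.sub(lambda m: SHORTENINGS[m.group(0).lower()], text)


before = ("Retry in the event that the call fails due to the fact that "
          "the server has the ability to recover.")
after = shorten_phrases(before)
print(after)  # Retry if the call fails because the server can recover.
print(len(before.split()) - len(after.split()), "words saved")
```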

Hedging Removal

Hedging language softens statements. In conversation, it is polite. In AI prompts, it is wasted tokens. The model does not have feelings to protect, and hedging often makes instructions ambiguous.

"I was wondering if you could maybe"  → ""
"Would it be possible to perhaps"     → ""
"I'm not sure but I think maybe"      → ""
"It might be worth considering"       → "consider"
"It seems like it could be"           → ""
"If it's not too much trouble"        → ""

Hedging removal is one of the more aggressive transformations. In Soft mode, Terse leaves hedging intact. In Normal mode, it removes clear hedging patterns. In Aggressive mode, it strips all hedging including partial hedges embedded in longer sentences.
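The three-mode behavior described above could be gated roughly as follows. The split into "clear" and "partial" hedges, the `Mode` enum, and the specific patterns are assumptions made for this sketch, not Terse's real classification. Capitalization cleanup is left out to keep the example short.

```python
import re
from enum import Enum


class Mode(Enum):
    SOFT = "soft"
    NORMAL = "normal"
    AGGRESSIVE = "aggressive"


# Hypothetical split: "clear" hedges are whole lead-in phrases; "partial"
# hedges are single softeners embedded in a longer sentence.
CLEAR_HEDGES = re.compile(
    r"\b(?:I was wondering if you could maybe"
    r"|would it be possible to perhaps"
    r"|if it's not too much trouble),?\s*",
    re.IGNORECASE,
)
PARTIAL_HEDGES = re.compile(r"\b(?:maybe|perhaps|possibly)\s+", re.IGNORECASE)


def remove_hedging(text: str, mode: Mode) -> str:
    """Soft: untouched. Normal: clear hedges only. Aggressive: partial hedges too."""
    if mode is Mode.SOFT:
        return text
    text = CLEAR_HEDGES.sub("", text)
    if mode is Mode.AGGRESSIVE:
        text = PARTIAL_HEDGES.sub("", text)
    return text


prompt = "I was wondering if you could maybe add a cache, and perhaps log the misses."
for mode in Mode:
    print(mode.value, "->", remove_hedging(prompt, mode))
```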

Politeness Stripping

Politeness prefixes and suffixes have little to no effect on the model's instruction-following. Extensive testing has shown that "Write a function" and "Could you please kindly help me write a function" produce functionally equivalent outputs from every major LLM.

"Could you please help me"    → ""  (imperative follows)
"Would you mind"              → ""
"I'd really appreciate if"    → ""
"Thank you in advance"        → ""
"Please and thank you"        → ""
"If you don't mind"           → ""

This category saves an average of 6 to 12 tokens per prompt. In Normal mode, Terse removes obvious politeness wrappers. In Aggressive mode, it converts full polite requests into direct imperatives: "Could you please write a Python function that sorts a list?" becomes "Write Python function: sort list".
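Politeness wrappers tend to sit at the very start or very end of a prompt, so anchored patterns are a natural fit. The following is a minimal sketch of the Normal-mode behavior (remove the wrapper, keep the rest of the wording). The patterns and the function name are hypothetical.

```python
import re

# Hypothetical wrappers: openers anchored at the start, sign-offs at the end.
POLITE_PREFIX = re.compile(
    r"^(?:could you please(?: kindly)?(?: help me)?|please)\s+",
    re.IGNORECASE,
)
POLITE_SUFFIX = re.compile(
    r"[\s,]*\b(?:thank you in advance|please and thank you|thanks)[\s.!]*$",
    re.IGNORECASE,
)


def strip_politeness(text: str) -> str:
    """Drop a polite opener and sign-off, then capitalize what remains."""
    text = POLITE_PREFIX.sub("", text.strip())
    text = POLITE_SUFFIX.sub("", text)
    return text[:1].upper() + text[1:]


print(strip_politeness(
    "Could you please help me write a function that sorts a list. Thank you in advance!"
))
# Write a function that sorts a list.
```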

Meta-Language Removal

Meta-language is text about the text. It describes what you are about to say rather than saying it. LLMs do not need narration of your thought process.

"Let me explain what I need"      → ""
"As I mentioned earlier"          → ""
"To give you some context"        → ""
"What I'm trying to say is"       → ""
"Here's what I'm looking for"     → ""
"The thing is"                    → ""
"The reason I'm asking is"        → ""
"To put it another way"           → ""

Meta-language is especially common in follow-up prompts within a conversation. Users naturally write "As I mentioned earlier, the function should..." when they could simply write the instruction. The model already has the conversation history. It does not need to be told that something was mentioned earlier.

Question-to-Imperative Conversion

Questions addressed to an LLM can almost always be rephrased as imperatives. The question form adds interrogative tokens that carry no additional instruction.

"Can you write a function that"     → "Write function:"
"Could you create a script to"     → "Create script:"
"Would you be able to explain"     → "Explain"
"Is it possible to implement"      → "Implement"
"Do you know how to"               → ""

This conversion is most aggressive in Aggressive mode, where it also strips articles and reduces the instruction to its minimal form. In Normal mode, it converts the question form but preserves the full noun phrases.
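One way to express the Normal-mode conversion, where the question form is dropped but the noun phrases are kept, is to capture the verb that follows the question opener and promote it to the start of the sentence. The opener list and function below are assumptions for illustration; article stripping, as in Aggressive mode, is not attempted.

```python
import re

# Hypothetical question openers; the captured verb becomes the imperative.
QUESTION_OPENER = re.compile(
    r"^(?:can you|could you|would you be able to|is it possible to)\s+(\w+)\s+",
    re.IGNORECASE,
)


def to_imperative(question: str) -> str:
    """Rewrite 'Can you <verb> ...?' as '<Verb> ...' ending in a period."""
    q = question.strip()
    match = QUESTION_OPENER.match(q)
    if not match:
        return question  # not a recognized question form; leave untouched
    rest = q[match.end():].rstrip("?").rstrip()
    return f"{match.group(1).capitalize()} {rest}."


print(to_imperative("Can you write a function that reverses a string?"))
# Write a function that reverses a string.
print(to_imperative("Is it possible to implement retries?"))
# Implement retries.
```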

Before and After: Real Prompt Examples

Before (87 tokens): Hi there! I was wondering if you could maybe help me out with something. I'm trying to write a Python function that takes in a list of numbers and returns the average, but I'm not sure about the best way to handle the edge case where the list might be empty. Could you please write that function for me? Thank you so much!

After (22 tokens): Write Python function: take list of numbers, return average. Handle empty list edge case.

That is a 75% reduction. In practice the model produces equivalent output for both prompts. In an agent session where this context gets re-read 20 times, the 65 tokens saved compound to 1,300 tokens.

Before (62 tokens): So basically what I need is for you to look at the following code and tell me if there are any bugs or issues with it. I think there might be a problem with the error handling, but I'm not entirely sure. Could you take a look and let me know what you think?

After (16 tokens): Review this code for bugs. Focus on error handling. List issues found.

A 74% reduction. The instruction is clearer in the compressed form. There is no ambiguity about what is being asked.

Before (44 tokens): I'd really appreciate it if you could help me refactor this function in order to make it more readable and also to improve the performance, due to the fact that it's currently running quite slowly.

After (14 tokens): Refactor this function for readability and performance. Currently slow.

A 68% reduction. Two phrase-shortening rules contributed: "in order to" was shortened and its clause condensed to "for readability and performance", and "due to the fact that" became "because" before that clause was condensed to "Currently slow". The politeness wrapper was stripped as well.
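The reduction percentages for the three examples follow directly from the stated token counts. A small helper (the name is illustrative) makes the calculation explicit:

```python
def reduction_percent(before_tokens: int, after_tokens: int) -> float:
    """Percentage of tokens removed by compression."""
    return 100 * (before_tokens - after_tokens) / before_tokens


# Token counts as stated in the three examples above.
examples = [(87, 22), (62, 16), (44, 14)]
for before, after in examples:
    print(f"{before} -> {after}: {reduction_percent(before, after):.1f}% smaller")
# 87 -> 22: 74.7% smaller
# 62 -> 16: 74.2% smaller
# 44 -> 14: 68.2% smaller
```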

Three Modes, Three Levels of Compression

Not every prompt needs maximum compression. Sometimes you want to preserve your voice. Terse offers three optimization modes that control which pattern categories are applied.

Soft Mode

Soft mode applies only spell correction and whitespace normalization. No patterns are applied. Your prompt text stays exactly as written, minus typos and extra spaces. This mode preserves 100% of your intended phrasing and is ideal when you are crafting careful system prompts or few-shot examples where exact wording matters.

Normal Mode

Normal mode applies filler removal, phrase shortening, and meta-language removal in full, and lighter versions of the other three categories: it removes clear hedging patterns and obvious politeness wrappers, and converts questions to imperatives while preserving full noun phrases. This is the default mode and provides a good balance between compression and naturalness. Typical savings: 20 to 30% token reduction.

Aggressive Mode

Aggressive mode applies all six pattern categories at full strength, plus telegraph compression and markdown stripping. Questions become imperatives. Politeness is fully stripped. Hedging is removed. Articles and prepositions are dropped where the meaning survives without them. Typical savings: 35 to 50% token reduction. This mode may alter nuance in complex instructions, so it is best suited for straightforward coding tasks and factual queries.

Why Rules Beat LLM-Based Compression

There are research approaches to prompt compression that use a secondary language model to decide what to cut. LLMLingua and Selective Context are notable examples: both score tokens or phrases with a smaller model and drop the ones that carry little information. These methods can achieve high compression ratios, but they have fundamental drawbacks for interactive use.

First, latency. Calling an LLM to compress a prompt takes 500 milliseconds to 3 seconds. Terse's pattern engine runs in under 2 milliseconds. When you are typing in a live chat interface, the compression needs to be instant.

Second, cost. Using an LLM to compress prompts means paying for the compression itself. If you spend 100 tokens to save 200 tokens, your net savings is only 100 tokens. For short prompts, the overhead can exceed the savings.
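The break-even point is simple subtraction. The sketch below uses a hypothetical helper and treats every token as equally priced, which is a simplification.

```python
def net_tokens_saved(tokens_removed: int, compressor_tokens_spent: int) -> int:
    """Tokens saved after paying for the compression call itself."""
    return tokens_removed - compressor_tokens_spent


print(net_tokens_saved(200, 100))  # 100
print(net_tokens_saved(20, 100))   # -80: compression costs more than it saves
```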

Third, determinism. Rule-based patterns produce identical output every time. You can predict exactly what will change. LLM-based compression is stochastic. The same input might produce different compressed versions on different runs, and occasionally the compressed version loses critical information.
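Determinism is easy to verify for a rule-based compressor because it is a pure function of its input. The `compress` function below is a one-rule stand-in written for this sketch, not Terse's engine. It also checks a related property: running the compressor on its own output changes nothing further.

```python
import re

_RULE = re.compile(r"\bin order to\b", re.IGNORECASE)


def compress(text: str) -> str:
    """Stand-in for a rule-based compressor: a pure function of its input."""
    return _RULE.sub("to", text)


prompt = "Cache results in order to avoid repeated calls."

# Same input, same output, every time.
outputs = {compress(prompt) for _ in range(1000)}
print(len(outputs))  # 1

# Re-compressing already-compressed text is a no-op for this rule.
print(compress(compress(prompt)) == compress(prompt))  # True
```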

Fourth, privacy. Rule-based compression runs entirely on-device. Your prompts are never sent anywhere. LLM-based compression requires sending your prompt to an API, which may not be acceptable for proprietary code or sensitive discussions.

Terse's approach is to use deterministic rules for the patterns that can be reliably identified and compressed, and to leave everything else untouched. This means the compression ratio is lower than what an LLM might achieve, but the output is fully predictable and the latency is negligible.

Pattern Optimization in the Pipeline

Pattern optimization is Stage 2 of the Terse optimization pipeline, running after spell correction and before NLP analysis. This ordering ensures that patterns match against correctly spelled text, maximizing hit rates.

After pattern optimization, the NLP analysis stage examines the remaining text for structural redundancies that require sentence-level understanding, and telegraph compression performs the final token-level compression in Aggressive mode.
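The stage ordering can be sketched as a list of text-to-text functions. Every stage body here is a placeholder invented for illustration (one hard-coded typo fix, one phrase rule, a no-op, and a crude article dropper); only the ordering and the Aggressive-only final stage come from the description above. The example shows why spelling is corrected first: the phrase rule would not match the misspelled input.

```python
from typing import Callable

Stage = Callable[[str], str]


def spell_correct(text: str) -> str:
    """Stage 1 placeholder: fixes one hard-coded typo for illustration."""
    return text.replace("in ordr to", "in order to")


def pattern_optimize(text: str) -> str:
    """Stage 2 placeholder: a single phrase-shortening rule."""
    return text.replace("in order to", "to")


def nlp_analyze(text: str) -> str:
    """Stage 3 placeholder: sentence-level analysis is out of scope here."""
    return text


def telegraph_compress(text: str) -> str:
    """Final-stage placeholder: drops articles as a crude stand-in."""
    return " ".join(w for w in text.split() if w.lower() not in {"a", "an", "the"})


def build_pipeline(aggressive: bool) -> list[Stage]:
    """Stages run in order; the telegraph stage is added only in Aggressive mode."""
    stages: list[Stage] = [spell_correct, pattern_optimize, nlp_analyze]
    if aggressive:
        stages.append(telegraph_compress)
    return stages


def run(text: str, aggressive: bool = False) -> str:
    for stage in build_pipeline(aggressive):
        text = stage(text)
    return text


print(run("Add a retry in ordr to handle the timeout."))
# Add a retry to handle the timeout.
print(run("Add a retry in ordr to handle the timeout.", aggressive=True))
# Add retry to handle timeout.
```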

All stages are visible in the Techniques panel of the Terse interface, where you can see exactly which rules fired and how many tokens each one saved. This transparency lets you tune your writing habits over time, learning which verbal patterns cost you the most tokens.
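A per-rule report of this kind can be produced by measuring the text before and after each rule fires. The sketch below is hypothetical: the rule table is a small sample, and token counts are estimated with the rough 4-characters-per-token figure rather than a real tokenizer, so the numbers are approximate.

```python
import math
import re

# Hypothetical rules, keyed by a display name.
RULES = {
    "in order to": (re.compile(r"\bin order to\b", re.I), "to"),
    "due to the fact that": (re.compile(r"\bdue to the fact that\b", re.I), "because"),
    "make sure that": (re.compile(r"\bmake sure that\b", re.I), "ensure"),
}


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token, rounded up."""
    return math.ceil(len(text) / 4)


def optimize_with_report(text: str) -> tuple[str, dict[str, int]]:
    """Return the rewritten text and estimated tokens saved per rule that fired."""
    report: dict[str, int] = {}
    for name, (pattern, replacement) in RULES.items():
        before = estimate_tokens(text)
        text, hits = pattern.subn(replacement, text)
        if hits:
            report[name] = before - estimate_tokens(text)
    return text, report


result, report = optimize_with_report("Make sure that the job retries in order to recover.")
print(result)
for rule_name, saved in report.items():
    print(f"{rule_name}: ~{saved} tokens saved")
```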

Compress Every Prompt Automatically

130+ rules. Under 2ms. Zero API calls. See your token savings in real time.
