
NLP Analysis for Token Optimization

How Terse uses linguistic analysis to compress AI prompts by 15-30% without losing meaning. Relative clause compression, modifier collapse, self-context removal, and more.

What NLP Analysis Means for Token Optimization

Natural Language Processing analysis goes far beyond simple find-and-replace. Where basic pattern optimization swaps known phrases for shorter equivalents, NLP analysis examines the grammatical structure of each sentence to identify redundancies that only become visible when you understand how language works at the clause level.

Every prompt you send to an AI model like Claude, GPT-4, or Gemini is broken into tokens. These tokens cost money. More importantly, they consume context window space that could be used for actual reasoning. When you are running agent sessions that span dozens of turns, every unnecessary token compounds. A 20% reduction per turn across a 50-turn session can mean the difference between staying within context limits and hitting the wall.

Terse applies NLP analysis as the third stage of its optimization pipeline, after spell correction and pattern optimization. By this point, the text has already been cleaned of typos and common verbose patterns. NLP analysis catches the structural inefficiencies that survive those earlier passes: redundant relative clauses, stacked modifiers that say the same thing twice, self-referential preambles, and sentences that bury their meaning under layers of implicit context the model does not need.
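The ordering of the three stages can be sketched as a chain of text-to-text functions. This is a minimal illustration, not Terse's implementation: the stage functions, the single typo, and the single rule inside each stage are hypothetical stand-ins chosen so the example stays short.

```python
import re
from typing import Callable, Iterable

Stage = Callable[[str], str]


# Hypothetical stand-ins: each stage handles exactly one case.
def spell_correct(text: str) -> str:
    return re.sub(r"\bfucntion\b", "function", text)


def pattern_optimize(text: str) -> str:
    # Fixed-string rewrite: "in order to" -> "to".
    return re.sub(r"\bin order to\b", "to", text)


def nlp_analyze(text: str) -> str:
    # Stand-in for structural analysis: drop "that is" before a participle.
    return re.sub(r"\b(function) that is (used)\b", r"\1 \2", text)


def run_pipeline(text: str, stages: Iterable[Stage]) -> str:
    # Each stage sees the output of the previous one.
    for stage in stages:
        text = stage(text)
    return text


PIPELINE = (spell_correct, pattern_optimize, nlp_analyze)

print(run_pipeline("Find the fucntion that is used in order to validate input", PIPELINE))
# Find the function used to validate input
```

In this sketch the last stage only matches because the typo was fixed first, which is one reason to run structural analysis after the cleanup passes.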

Core NLP Techniques in Terse

Relative Clause Compression

English is full of relative clauses that add words without adding meaning. The patterns "that is", "which is", and "who is", when followed by a participle or adjective, can almost always be compressed by dropping the relative pronoun and copula entirely.

Consider how Terse handles this in practice:

Before: "Find the function that is used to validate user input"
After:  "Find the function used to validate user input"
Saved:  2 tokens

Before: "The variable which is declared at the top of the file"
After:  "The variable declared at the top of the file"
Saved:  2 tokens

Before: "The API endpoint that is responsible for authentication"
After:  "The API endpoint responsible for authentication"
Saved:  2 tokens

Two tokens per clause may sound small, but relative clauses appear constantly in technical prompts. A typical code review prompt might contain 8-12 relative clauses. That is 16-24 tokens recovered from a single technique. Across an agent session with 30 turns of back-and-forth, you are looking at hundreds of tokens saved from this one optimization alone.

Terse's implementation is careful about edge cases. It does not compress relative clauses where the pronoun serves as the subject of a separate verb phrase, and it preserves clauses inside code blocks or quoted strings.
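A rough sketch of this kind of guarded compression is shown below. It is an approximation under stated assumptions, not Terse's parser: the adjective list is a hypothetical placeholder, the "-ed"/"-ing" suffix test is a crude stand-in for real participle detection, and the only edge cases handled are sentence-initial "That is ..." and text inside backticks or double quotes.

```python
import re

# Hypothetical, deliberately tiny adjective list for illustration only.
ADJECTIVES = {"responsible", "available", "visible", "specific"}

# Relative pronoun + copula, captured together with the word that follows.
REL_CLAUSE = re.compile(r"\b(that|which|who) (is|are) (\w+)", re.IGNORECASE)

# Inline code spans and double-quoted strings are left untouched.
PROTECTED = re.compile(r"(`[^`]*`|\"[^\"]*\")")


def looks_like_participle_or_adjective(word: str) -> bool:
    w = word.lower()
    return w in ADJECTIVES or w.endswith("ed") or w.endswith("ing")


def _compress_segment(text: str) -> str:
    def repl(m: re.Match) -> str:
        # Require a word directly before the pronoun, so a sentence-initial
        # demonstrative ("That is the file I meant") is never touched.
        before = text[:m.start()].rstrip()
        if before[-1:].isalnum() and looks_like_participle_or_adjective(m.group(3)):
            return m.group(3)
        return m.group(0)

    return REL_CLAUSE.sub(repl, text)


def compress_relative_clauses(prompt: str) -> str:
    parts = PROTECTED.split(prompt)
    # With one capturing group, odd indices hold the protected spans.
    return "".join(p if i % 2 else _compress_segment(p) for i, p in enumerate(parts))


print(compress_relative_clauses("Find the function that is used to validate user input"))
# Find the function used to validate user input
```

The suffix heuristic will misfire on some inputs (for example a demonstrative "that" in mid-sentence followed by an "-ed" word), which is exactly the kind of case a grammatical-role check is meant to rule out.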

Modifier Pair Collapse

People naturally reach for pairs of adjectives or adverbs when a single word would carry the same weight. This is especially common in prompts because users want to be precise, but end up being redundant instead.

Before: "Write a clear and concise summary"
After:  "Write a concise summary"
Saved:  2 tokens

Before: "Make it simple and straightforward"
After:  "Make it straightforward"
Saved:  2 tokens

Before: "I need a quick and fast solution"
After:  "I need a fast solution"
Saved:  2 tokens

Terse maintains a scored dictionary of modifier pairs where one word subsumes the meaning of the other. When both appear joined by "and" or "or", the less specific modifier is dropped. The scoring ensures that the surviving word is always the one that carries more information. "Concise" survives over "clear" because conciseness implies clarity but not the reverse. "Straightforward" survives over "simple" because it carries a stronger connotation of ease-of-use.
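The scored-dictionary idea can be sketched as follows. The scores and the pair list are invented for illustration (only their ordering matters), and the function name is hypothetical; Terse's actual dictionary and scoring are not shown in this article.

```python
import re

# Hypothetical scores: higher means the word carries more information.
MODIFIER_SCORES = {
    "clear": 1, "concise": 2,
    "simple": 1, "straightforward": 2,
    "quick": 1, "fast": 2,
}

# Pairs treated as redundant; frozenset makes the lookup order-independent.
REDUNDANT_PAIRS = {
    frozenset({"clear", "concise"}),
    frozenset({"simple", "straightforward"}),
    frozenset({"quick", "fast"}),
}

PAIR = re.compile(r"\b(\w+) (and|or) (\w+)\b", re.IGNORECASE)


def collapse_modifier_pairs(prompt: str) -> str:
    def repl(m: re.Match) -> str:
        a, b = m.group(1), m.group(3)
        if frozenset({a.lower(), b.lower()}) not in REDUNDANT_PAIRS:
            return m.group(0)  # unknown pair: leave the text alone
        # Keep whichever word has the higher score.
        return max((a, b), key=lambda w: MODIFIER_SCORES[w.lower()])

    return PAIR.sub(repl, prompt)


print(collapse_modifier_pairs("Write a clear and concise summary"))
# Write a concise summary
```

Restricting the rewrite to known pairs means a phrase like "read and write the file" passes through unchanged.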

Self-Referential Context Removal

This is one of the most impactful NLP techniques in Terse's arsenal. Users habitually open prompts with preambles about themselves, their role, their project, or their intentions. These preambles consume tokens and provide zero useful information to the model in most cases.

Before: "I'm a senior developer working on a React project and I
         need help with state management. Can you show me how to
         implement a global store?"
After:  "Show how to implement a global store in React."
Saved:  22 tokens

Before: "As someone who is new to Python, I was wondering if you
         could explain how decorators work."
After:  "Explain how Python decorators work."
Saved:  15 tokens

Before: "I'm currently trying to debug an issue where my API
         returns a 500 error. I've been looking at the logs and
         I think the problem might be in the middleware."
After:  "Debug API 500 error. Logs suggest middleware issue."
Saved:  20 tokens

Terse identifies self-referential patterns using a combination of pronoun detection, role-declaration patterns ("I'm a...", "As a..."), and hedging language ("I was wondering", "I think", "I believe"). In Normal mode, it strips the self-referential framing but preserves the core request. In Aggressive mode, it also strips the hedging and restructures the remaining content for maximum density.

This technique is particularly powerful in agent sessions where the model already has context about the user's project from previous turns. Restating "I'm working on a React project" in turn 15 when the model has been helping with that React project since turn 1 is pure waste.
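The two-level behavior described above (framing removed in Normal mode, hedging also removed in Aggressive mode) might look roughly like this. The regular expressions and the `aggressive` flag are assumptions made for the sketch; it does not attempt the restructuring step, and it drops everything in the preamble, including details such as a framework name that a real implementation would need to carry over.

```python
import re

# Role declarations at the start of a prompt: "As a ...," / "I'm a ...,"
ROLE_DECLARATION = re.compile(
    r"^\s*(?:as (?:a|an|someone)\b|i(?:'m| am) (?:a|an)\b)[^,.]*[,.]\s*",
    re.IGNORECASE,
)

# Hedging phrases named in the text above.
HEDGING = re.compile(
    r"\b(?:i was wondering if you could|i think|i believe)\s+",
    re.IGNORECASE,
)


def strip_self_reference(prompt: str, aggressive: bool = False) -> str:
    # Both levels remove a leading role declaration.
    text = ROLE_DECLARATION.sub("", prompt, count=1)
    if aggressive:
        # The aggressive level also removes hedging phrases.
        text = HEDGING.sub("", text)
    # Re-capitalize whatever now starts the prompt.
    return text[:1].upper() + text[1:]


prompt = "As a backend developer, I was wondering if you could explain how Python decorators work."
print(strip_self_reference(prompt))
# I was wondering if you could explain how Python decorators work.
print(strip_self_reference(prompt, aggressive=True))
# Explain how Python decorators work.
```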

Implicit Context Resolution

Many prompts wrap their actual request inside unnecessary framing that signals intent without adding information. Phrases like "I want to know", "Can you tell me", "I'd like to understand", and "Could you help me with" are all implicit context that the model can infer from the fact that you are asking a question.

Before: "I want to know how to implement binary search in Rust"
After:  "How to implement binary search in Rust"
Saved:  4 tokens

Before: "Can you help me understand why my Docker container
         keeps crashing?"
After:  "Why does my Docker container keep crashing?"
Saved:  4 tokens

Before: "I'd like you to review this code and tell me if there
         are any performance issues"
After:  "Review this code for performance issues"
Saved:  9 tokens

The savings here are modest per instance but the patterns are extremely common. Terse's analysis of real-world prompts shows that over 60% of prompts to coding assistants begin with some form of implicit context wrapper. Stripping these wrappers across an entire session adds up quickly.
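For the simplest wrappers, removal amounts to stripping a known opener and re-capitalizing the remainder. The sketch below covers only the four phrases listed above and uses a hypothetical function name; cases like the Docker example, where the verb has to be re-inflected after the wrapper is removed, need more than a prefix strip.

```python
import re

# The wrapper phrases named in the text, matched only at the start of a prompt.
WRAPPERS = re.compile(
    r"^\s*(?:i want to know|can you tell me|i'd like to understand|could you help me with)\s+",
    re.IGNORECASE,
)


def strip_wrapper(prompt: str) -> str:
    stripped = WRAPPERS.sub("", prompt, count=1)
    # Re-capitalize the first word of what remains.
    return stripped[:1].upper() + stripped[1:]


print(strip_wrapper("I want to know how to implement binary search in Rust"))
# How to implement binary search in Rust
```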

Sentence-Level Restructuring

The most advanced NLP technique in Terse examines entire sentences for structural inefficiency. This goes beyond individual patterns to look at how information is distributed across clauses and whether the same meaning can be achieved with fewer structural elements.

Before: "I have a list of objects and I need to filter them
         based on a property and then sort the result by date"
After:  "Filter object list by property, sort by date"
Saved:  14 tokens

Before: "There is a bug in the code where the function returns
         null instead of throwing an error when the input is invalid"
After:  "Bug: function returns null instead of throwing error
         on invalid input"
Saved:  8 tokens

Sentence restructuring identifies the core semantic payload of each sentence, strips the structural scaffolding ("There is", "I have", "I need to"), and reconstructs the meaning in the most token-efficient form. This technique is most aggressive in Terse's Aggressive mode but applies lighter transformations even in Normal mode.

How NLP Differs from Simple Pattern Matching

The critical difference between NLP analysis and pattern-based optimization is awareness of sentence structure. Pattern matching operates on fixed strings: it knows that "in order to" can become "to" regardless of context. NLP analysis operates on grammatical relationships: it knows that a relative clause can be compressed only when the relative pronoun is followed by a copula and then a participle or adjective.

This distinction matters because language is ambiguous. The word "that" can be a demonstrative pronoun, a relative pronoun, a conjunction, or a determiner. Pattern matching that blindly removes "that is" will break sentences where "that" is a demonstrative pointing to a specific object: "That is the file I meant" would collapse to "the file I meant". NLP analysis understands the grammatical role of each word in context and only applies transformations that preserve meaning.

Terse's NLP engine uses a lightweight rule-based parser rather than a full neural NLP pipeline. This is a deliberate design choice. Running a transformer model to optimize input to another transformer model would defeat the purpose. The rule-based approach runs in under 5 milliseconds on typical prompts, adds no perceptible latency to the user's workflow, and operates entirely on-device with no network calls. It handles the 80% of cases that account for the most token waste without the overhead of a full syntactic parser.

Impact on Agent Sessions

Single-turn prompts benefit from NLP analysis, but the real impact shows up in agent sessions. When you are running a multi-turn coding session with Claude Code, Cursor, or GitHub Copilot, every turn includes both your new prompt and the accumulated context from previous turns. NLP-optimized prompts are not just shorter at the point of entry; they also reduce the context that gets carried forward.

Consider a 40-turn agent session where each user turn averages 150 tokens before optimization. With NLP analysis achieving a conservative 20% reduction, each turn drops to 120 tokens. That is 30 tokens saved per turn, or 1,200 tokens across the full session. At a Claude input price of $3 per million tokens (rates vary by model), 1,200 tokens is worth well under one cent, so the direct cost savings are small. But the context window savings are significant: 1,200 fewer tokens means more room for the model's reasoning, fewer context window truncation events, and better response quality in later turns.
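The arithmetic above can be checked, and extended to the carried-forward effect, with a few lines of Python. The cumulative figure rests on assumptions that are not stated in this article: every request resends all earlier user turns in full, there is no prompt caching, and only user-authored tokens are counted. The function names are illustrative.

```python
def session_savings(turns: int, tokens_per_turn: int, reduction_percent: int):
    """Return (saved per turn, saved in final context, saved across all requests)."""
    saved_per_turn = tokens_per_turn * reduction_percent // 100
    final_context = saved_per_turn * turns
    # Assumption: the saving made in turn k is resent with every later request,
    # so it is counted (turns - k + 1) times.
    cumulative = sum(saved_per_turn * (turns - k + 1) for k in range(1, turns + 1))
    return saved_per_turn, final_context, cumulative


def dollars(tokens: int, price_per_million: float = 3.0) -> float:
    # Price is the $3 per million input tokens used in the text.
    return tokens * price_per_million / 1_000_000


per_turn, final_context, cumulative = session_savings(40, 150, 20)
print(per_turn, final_context, cumulative)  # 30 1200 24600
print(dollars(final_context), dollars(cumulative))
```

Under those assumptions the billed input saved over the session is about 24,600 tokens rather than 1,200, which is still only a few cents at this price. The cost conclusion does not change; the context-window argument is the one that matters.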

NLP techniques also compound with Terse's other optimization stages. A prompt that has already been cleaned by spell correction and pattern optimization is structurally simpler, which makes NLP analysis more effective. The full pipeline typically achieves 25-40% reduction in Normal mode, with NLP contributing 15-20 percentage points of that total.

Comparison with Academic Approaches

The academic research on prompt compression has produced sophisticated systems like LLMLingua and Selective Context that use perplexity-based scoring to identify which tokens can be removed with minimal impact on model output quality. These systems achieve impressive compression ratios but require running a secondary language model to score each token, which adds latency and computational cost.

Terse takes a different approach. Rather than using a model to compress input to another model, it uses deterministic linguistic rules that run in a few milliseconds on typical prompts. This means negligible latency overhead, no GPU requirements, and fully predictable behavior. The tradeoff is that Terse's NLP analysis is less aggressive than perplexity-based methods on long-form text, but it is more reliable on short technical prompts where every word tends to carry high information density.

In practice, the two approaches are complementary. LLMLingua excels at compressing long context passages and retrieved documents where many tokens are genuinely redundant. Terse excels at compressing the user-authored portions of prompts where the waste comes from natural language habits rather than information redundancy. For agent sessions where the user is writing fresh prompts each turn, Terse's rule-based approach is both faster and more appropriate.

Preserving Meaning Across Modes

Terse offers three optimization modes, and NLP analysis behaves differently in each. In Soft mode, NLP techniques are disabled entirely. Only spell correction and whitespace normalization run. This guarantees zero semantic change.

In Normal mode, NLP analysis applies conservative transformations: relative clause compression, modifier collapse, and implicit context removal, along with the lighter forms of self-referential framing removal and sentence restructuring described earlier. These transformations preserve the full meaning of the prompt while removing structural waste. The model receives the same information in fewer tokens.

In Aggressive mode, NLP analysis applies all transformations at full strength, including self-referential context removal with hedging stripped and full sentence restructuring. Combined with telegraph compression, this can achieve 40-70% token reduction. The meaning is preserved at the semantic level, but the tone and style of the original prompt may change significantly. This mode is designed for agent sessions and API calls where efficiency matters more than preserving the user's voice.

Real-World Results

Across Terse's user base, NLP analysis contributes an average of 18% token reduction on prompts that have already passed through spell correction and pattern optimization. On prompts with heavy self-referential preambles, the reduction can exceed 30%. On already-concise technical prompts, the reduction may be as low as 5%.

The distribution is bimodal. Experienced prompt engineers who already write concise prompts see modest gains from NLP analysis. Casual users who write prompts the way they write emails see dramatic improvements. That second group is exactly who Terse is designed for: you should not have to think about token efficiency while writing prompts. Write naturally, and let the optimizer handle the compression.

Start Optimizing Your Prompts

Terse runs on-device, analyzes your prompts in real time, and saves tokens automatically. No API keys, no cloud processing, and no data leaves your machine.


Further Reading