Caveman Mode
Caveman Mode makes the AI respond in terse, compressed language, saving up to 65% on output tokens. It pairs with RTK Token Saver to cut both input and output costs simultaneously.
Output savings: up to 65% · Works with all providers
How It Works
lina-router injects a system-level instruction before your request reaches the model. The AI is told to skip pleasantries, drop explanations, and return only the essential output. The model still understands the full request — it just answers shorter.
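The injection step can be pictured as prepending a system message to the request. This is an illustrative sketch, not lina-router's actual internals — the instruction wording and the `inject_caveman` helper are assumptions, using an OpenAI-style chat messages list:

```python
# Hypothetical terseness instruction; the real prompt lina-router
# injects is not documented here.
CAVEMAN_INSTRUCTION = (
    "Answer in the tersest form possible. Skip pleasantries, "
    "drop explanations, return only the essential output."
)

def inject_caveman(messages):
    """Prepend a system-level terseness instruction to an
    OpenAI-style chat messages list (illustrative helper)."""
    return [{"role": "system", "content": CAVEMAN_INSTRUCTION}] + list(messages)

request = [{"role": "user", "content": "Refactor this function."}]
patched = inject_caveman(request)
# The model still receives the full user request; only the system
# instruction changes how long its answer is.
print(patched[0]["role"])  # system
```

Because the instruction rides along as a system message rather than rewriting the user's prompt, the model sees the complete request and simply answers shorter.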
Without Caveman Mode

```
# AI response
Sure! I'd be happy to help you refactor that function. Here's what I suggest we do: First, we should extract the validation logic into a separate helper function to improve readability. Then we can... [12 more lines of explanation]
```

~340 output tokens

With Caveman Mode

```
# AI response
Extract validation → helper fn. Rename x → userId. Return early on null.
```

~22 output tokens (-94%)
Real-world savings depend on the task: code generation tends to see a 40–65% reduction, while explanation-heavy tasks can see up to 80%.
How to Enable
1. Open the dashboard at localhost:20128/dashboard
2. Go to Endpoint → Settings tab
3. Toggle "Caveman Mode" on
4. Choose an intensity level: Balanced or Full
Levels:
- Balanced: terse but readable. Keeps code comments.
- Full: maximum compression. Drops all prose, abbreviates heavily.
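The two levels can be thought of as selecting between two instruction strings. The wording below is a guess for illustration only — it is not the actual prompt text lina-router uses:

```python
# Illustrative mapping of intensity levels to injected instructions.
# Both strings are assumptions, not lina-router's real prompts.
LEVEL_PROMPTS = {
    "balanced": "Be terse but readable. Keep code comments. Skip pleasantries.",
    "full": "Maximum compression. No prose. Abbreviate aggressively.",
}

def instruction_for(level: str) -> str:
    """Return the terseness instruction for a given intensity level."""
    return LEVEL_PROMPTS[level.lower()]
```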
RTK vs Caveman Mode
| | RTK Token Saver | Caveman Mode |
|---|---|---|
| Affects | Input tokens | Output tokens |
| Savings | 20–40% | 40–65% |
| How | Compresses tool_result blocks | Injects terse-response prompt |
| Model sees | Shorter tool output | Instruction to be brief |
| Best for | Heavy CLI tool usage | Chat / explanation tasks |
Both can be enabled simultaneously for maximum savings.
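As a back-of-envelope check on the combined effect, assume the midpoints of the ranges above (RTK: 30% input reduction; Caveman Mode: 52.5% output reduction) and an illustrative request of 10,000 input tokens and 1,000 output tokens. These numbers are assumptions for arithmetic only:

```python
def combined_cost(input_toks: int, output_toks: int,
                  input_cut: float, output_cut: float) -> float:
    """Total billed tokens after applying each reduction to its side."""
    return input_toks * (1 - input_cut) + output_toks * (1 - output_cut)

baseline = 10_000 + 1_000                          # 11,000 tokens untouched
with_both = combined_cost(10_000, 1_000, 0.30, 0.525)
print(with_both)                                   # 7475.0
savings = 1 - with_both / baseline                 # ~32% fewer total tokens
```

Because most tokens in tool-heavy workloads are input, RTK dominates total savings there; for chat-heavy workloads the balance shifts toward Caveman Mode.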