Caveman Mode
Caveman Mode makes the AI respond in terse, compressed language, saving up to 65% on output tokens. It pairs with RTK Token Saver to cut both input and output costs simultaneously.
Output savings: up to 65% · Works with all providers
How It Works
lina-router injects a system-level instruction before your request reaches the model. The AI is told to skip pleasantries, drop explanations, and return only the essential output. The model still understands the full request — it just answers shorter.
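The injection step can be pictured as prepending a system message to the request. This is an illustrative sketch, not lina-router's actual internals — the instruction wording and the `inject_caveman` helper are assumptions, using an OpenAI-style chat messages list:

```python
# Hypothetical terseness instruction; the real prompt lina-router
# injects is not documented here.
CAVEMAN_INSTRUCTION = (
    "Answer in the tersest form possible. Skip pleasantries, "
    "drop explanations, return only the essential output."
)

def inject_caveman(messages):
    """Prepend a system-level terseness instruction to an
    OpenAI-style chat messages list (illustrative helper)."""
    return [{"role": "system", "content": CAVEMAN_INSTRUCTION}] + list(messages)

request = [{"role": "user", "content": "Refactor this function."}]
patched = inject_caveman(request)
# The model still receives the full user request; only the system
# instruction changes how long its answer is.
print(patched[0]["role"])  # system
```

Because the instruction rides along as a system message rather than rewriting the user's prompt, the model sees the complete request and simply answers shorter.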
Without Caveman Mode

```
# AI response
Sure! I'd be happy to help you refactor that function. Here's what I suggest we do: First, we should extract the validation logic into a separate helper function to improve readability. Then we can... [12 more lines of explanation]
```

~340 output tokens

With Caveman Mode

```
# AI response
Extract validation → helper fn. Rename x → userId. Return early on null.
```

~22 output tokens (-94%)
Real-world savings depend on the task: code generation tends to see a 40–65% reduction, while explanation-heavy tasks can see up to 80%.
How to Enable
1. Open the dashboard at localhost:20128/dashboard
2. Go to Endpoint → Settings tab
3. Toggle "Caveman Mode" on
4. Choose an intensity level: Balanced or Full
Levels:
- Balanced: terse but readable. Keeps code comments.
- Full: maximum compression. Drops all prose, abbreviates heavily.
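The two levels can be thought of as selecting between two instruction strings. The wording below is a guess for illustration only — it is not the actual prompt text lina-router uses:

```python
# Illustrative mapping of intensity levels to injected instructions.
# Both strings are assumptions, not lina-router's real prompts.
LEVEL_PROMPTS = {
    "balanced": "Be terse but readable. Keep code comments. Skip pleasantries.",
    "full": "Maximum compression. No prose. Abbreviate aggressively.",
}

def instruction_for(level: str) -> str:
    """Return the terseness instruction for a given intensity level."""
    return LEVEL_PROMPTS[level.lower()]
```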
RTK vs Caveman Mode
| | RTK Token Saver | Caveman Mode |
|---|---|---|
| Affects | Input tokens | Output tokens |
| Savings | 20–40% | 40–65% |
| How | Compresses tool_result blocks | Injects terse-response prompt |
| Model sees | Shorter tool output | Instruction to be brief |
| Best for | Heavy CLI tool usage | Chat / explanation tasks |
Both can be enabled simultaneously for maximum savings.
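As a back-of-envelope check on the combined effect, assume the midpoints of the ranges above (RTK: 30% input reduction; Caveman Mode: 52.5% output reduction) and an illustrative request of 10,000 input tokens and 1,000 output tokens. These numbers are assumptions for arithmetic only:

```python
def combined_cost(input_toks: int, output_toks: int,
                  input_cut: float, output_cut: float) -> float:
    """Total billed tokens after applying each reduction to its side."""
    return input_toks * (1 - input_cut) + output_toks * (1 - output_cut)

baseline = 10_000 + 1_000                          # 11,000 tokens untouched
with_both = combined_cost(10_000, 1_000, 0.30, 0.525)
print(with_both)                                   # 7475.0
savings = 1 - with_both / baseline                 # ~32% fewer total tokens
```

Because most tokens in tool-heavy workloads are input, RTK dominates total savings there; for chat-heavy workloads the balance shifts toward Caveman Mode.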