
Caveman Mode

Caveman Mode makes the AI respond in terse, compressed language — saving up to 65% on output tokens. It pairs with RTK to cut both input and output costs simultaneously.

Output savings: up to 65%. Works with all providers.

How It Works

lina-router injects a system-level instruction before your request reaches the model. The AI is told to skip pleasantries, drop explanations, and return only the essential output. The model still understands the full request — it just answers shorter.
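The injection step can be pictured as a small transform on the outgoing request. The sketch below is illustrative only; the instruction wording, the function name, and the message format (an OpenAI-style messages array) are assumptions, not lina-router's actual implementation.

```python
# Hypothetical terse-response instruction; the exact wording
# lina-router uses is not documented here.
CAVEMAN_INSTRUCTION = (
    "Answer in the fewest words possible. Skip pleasantries, "
    "drop explanations, return only the essential output."
)

def inject_caveman_prompt(request: dict, enabled: bool) -> dict:
    """Prepend a system-level brevity instruction before the request
    reaches the model. The user's messages are passed through untouched,
    so the model still sees the full request."""
    if not enabled:
        return request
    messages = [{"role": "system", "content": CAVEMAN_INSTRUCTION}]
    messages.extend(request.get("messages", []))
    return {**request, "messages": messages}
```

Because the instruction is prepended rather than substituted, the model answers the same question, just shorter.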

Without Caveman Mode

# AI response

Sure! I'd be happy to help you refactor that function. Here's what I suggest we do: First, we should extract the validation logic into a separate helper function to improve readability. Then we can... [12 more lines of explanation]

~340 output tokens

With Caveman Mode

# AI response

Extract validation → helper fn. Rename x → userId. Return early on null.

~22 output tokens (-94%)

Real-world savings depend on the task: code generation tends to see a 40–65% reduction, while explanations can see up to 80%.
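The -94% figure in the example above follows directly from the token counts. A minimal check, using the approximate counts shown (~340 tokens before, ~22 after):

```python
def reduction(before: int, after: int) -> float:
    """Percent reduction in output tokens."""
    return (before - after) / before * 100

# Approximate figures from the example above.
print(round(reduction(340, 22)))  # 94
```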

How to Enable

  1. Open the dashboard at localhost:20128/dashboard
  2. Go to Endpoint → Settings tab
  3. Toggle "Caveman Mode" on
  4. Choose intensity level: Balanced or Full
Levels:
  - Balanced — terse but readable. Keeps code comments.
  - Full — maximum compression. Drops all prose, abbreviates heavily.
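The two levels can be summarized as a settings object. Note that the field names below are assumptions chosen for illustration; they are not lina-router's actual settings schema, which is managed through the dashboard.

```python
# Illustrative mapping of the two documented intensity levels.
# Field names are hypothetical, not lina-router's real schema.
LEVELS = {
    "balanced": {"drop_prose": False, "keep_code_comments": True},
    "full": {"drop_prose": True, "keep_code_comments": False},
}

def caveman_settings(level: str = "balanced") -> dict:
    """Return a settings dict for the chosen intensity level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return {"caveman_mode": True, "intensity": level, **LEVELS[level]}
```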

RTK vs Caveman Mode

              RTK Token Saver                  Caveman Mode
Affects       Input tokens                     Output tokens
Savings       20–40%                           40–65%
How           Compresses tool_result blocks    Injects terse-response prompt
Model sees    Shorter tool output              Instruction to be brief
Best for      Heavy CLI tool usage             Chat / explanation tasks

Both can be enabled simultaneously for maximum savings.
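Because RTK acts on input tokens and Caveman Mode on output tokens, the combined saving is a weighted average of the two. A quick sketch, using mid-range figures from the table and an assumed 60/40 input/output split of spend (the split varies by workload):

```python
def combined_savings(input_cut: float, output_cut: float,
                     input_share: float) -> float:
    """Overall cost reduction when input and output savings stack.
    input_share is the fraction of spend attributable to input tokens."""
    return input_share * input_cut + (1 - input_share) * output_cut

# Mid-range figures from the table: RTK ~30%, Caveman ~50%.
# Assumed: input tokens account for 60% of spend.
print(round(combined_savings(0.30, 0.50, 0.60) * 100))  # 38
```

With those assumptions, enabling both cuts the overall bill by roughly 38%; output-heavy workloads save more, input-heavy ones less.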