Overview
Every time an AI job runs, it pays again for the same thing: the system prompt.
System prompts — the context, instructions, and role definitions that precede every AI task — are repeated on every run. A 1,000-token system prompt run 500 times a day consumes 500,000 tokens: the same 1,000 tokens, billed 500 times over. Yet most AI infrastructure charges per token, regardless of whether those tokens are new or repeated.
promptcache is the first tool that finds these repeated tokens and tells you exactly how much you’d save by caching them.
No direct competitor offers this audit-first analysis. promptcache is a genuine white-space product in the AI developer tooling market.
The Problem: Every AI Job Pays for the Same Prompt Over and Over
Here’s what AI infrastructure billing actually looks like in practice:
Job A: "You are a customer support agent for Acme Corp. [800 tokens]"
→ Runs 200x/day → pays 800 tokens × 200 = 160,000 tokens/day
Job B: "You are a customer support agent for Acme Corp. [800 tokens]"
→ Runs 150x/day → pays 800 tokens × 150 = 120,000 tokens/day
Job C: "You are a customer support agent for Acme Corp. [800 tokens]"
→ Runs 100x/day → pays 800 tokens × 100 = 80,000 tokens/day
All three jobs share the same 800-token system prompt. Without caching, each job pays for those 800 tokens on every single run. That’s 800 tokens × 450 runs = 360,000 tokens paid per day — for content that doesn’t change.
With caching: those 800 tokens are computed once, then retrieved on subsequent runs. You’re charged once, not 450 times.
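The before-and-after arithmetic reduces to two tiny functions. This is an illustrative model of the billing difference, not the promptcache implementation; the function names are made up for this sketch.

```typescript
// Illustrative model of the billing math above (hypothetical names, not
// the promptcache API). Without caching, every run re-pays the prefix;
// with caching, the prefix is computed once and retrieved thereafter.

function tokensWithoutCaching(prefixTokens: number, runs: number): number {
  return prefixTokens * runs; // every run pays for the full prefix
}

function tokensWithCaching(prefixTokens: number, runs: number): number {
  return runs > 0 ? prefixTokens : 0; // pay once, retrieve on later runs
}

// Jobs A, B, and C: one 800-token prefix, 450 combined daily runs.
console.log(tokensWithoutCaching(800, 450)); // 360000 tokens/day
console.log(tokensWithCaching(800, 450));    // 800 tokens/day
```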
How promptcache Works
Analysis
promptcache reads your job definitions, extracts the system-level prompt prefix (the boilerplate that precedes the task-specific input), and calculates:
- Shared prefix tokens — how many tokens are shared across all jobs
- Redundancy ratio — what percentage of total token consumption is pure repetition
- Annual cost of redundancy — at current volume, how much are you paying to re-process the same tokens
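The analysis step could be sketched along these lines. This is assumed logic, not promptcache's actual implementation, and the ~4-characters-per-token estimate stands in for a real tokenizer.

```typescript
// Sketch of the analysis metrics above (assumed logic, not promptcache's
// implementation). approxTokens is a rough heuristic, not a real tokenizer.

interface Job {
  name: string;
  prompt: string;
  dailyRuns: number;
}

// Longest common prefix shared by all prompts.
function sharedPrefix(prompts: string[]): string {
  return prompts.reduce((acc, p) => {
    let i = 0;
    while (i < acc.length && i < p.length && acc[i] === p[i]) i++;
    return acc.slice(0, i);
  });
}

function approxTokens(text: string): number {
  return Math.ceil(text.length / 4); // ~4 chars/token estimate
}

// Fraction of total daily token spend that is pure prefix repetition.
function redundancyRatio(jobs: Job[]): number {
  const prefixTok = approxTokens(sharedPrefix(jobs.map((j) => j.prompt)));
  const totalRuns = jobs.reduce((sum, j) => sum + j.dailyRuns, 0);
  const totalTok = jobs.reduce(
    (sum, j) => sum + approxTokens(j.prompt) * j.dailyRuns,
    0
  );
  return (prefixTok * totalRuns) / totalTok;
}
```

When every job uses an identical prompt, the ratio is 1.0 (all repetition); the ratio drops as task-specific input grows relative to the shared prefix.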
Output Example
promptcache audit
Analyzing 12 jobs from snapshot...
Found: 3 distinct system prefixes across 12 jobs
Prefix A: "You are a customer support agent..." (847 tokens)
Used by: Job A, Job B, Job C
Daily runs: 450
Daily redundancy: 847 × 450 = 381,150 tokens
Monthly redundancy cost: 381,150 × $2.50/1M × 30 = $28.59
Prefix B: "You are a medical coder..." (1,203 tokens)
Used by: Job D, Job E
Daily runs: 100
Monthly redundancy cost: $9.02
Total monthly waste: $37.61
With promptcache optimization: $2.84/month
Savings: $34.77/month ($417/year)
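The line items in the sample output follow from a single formula. A quick check of the Prefix A arithmetic (the formula is inferred from the output; the function name is hypothetical):

```typescript
// Reproducing the Prefix A arithmetic from the sample audit output.
// $2.50 per 1M tokens is the rate assumed in the example.
const PRICE_PER_MTOK = 2.5;

function monthlyRedundancyCost(
  prefixTokens: number,
  dailyRuns: number,
  days: number = 30
): number {
  const monthlyTokens = prefixTokens * dailyRuns * days;
  return (monthlyTokens * PRICE_PER_MTOK) / 1_000_000;
}

console.log(monthlyRedundancyCost(847, 450).toFixed(2)); // "28.59"
```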
The Math of Why This Compounds
SaaS Scenario: 30-Job Pipeline
A SaaS company runs 30 AI-assisted content generation jobs per day.
- Average system prompt: 600 tokens
- Daily runs: 30
- Without caching: 600 × 30 = 18,000 tokens/day paid for repetition
- Monthly: 540,000 tokens/month × $2.50/1M = $1.35/month in pure waste
That sounds small. But:
- At 300 jobs/day (mid-size platform): $13.50/month
- At 3,000 jobs/day (large platform): $135/month at GPT-4o pricing ($2.50/1M)
- The same 3,000 jobs/day at Opus 4.6 pricing ($15/1M): $810/month
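The scaling above is one linear formula in three variables. A sketch (the helper name and parameters are hypothetical, not a promptcache API):

```typescript
// Monthly cost of re-processing an unchanged prompt, per the scenario
// above (hypothetical helper, not a promptcache API).
function monthlyWaste(
  promptTokens: number,
  jobsPerDay: number,
  pricePerMTok: number,
  days: number = 30
): number {
  return (promptTokens * jobsPerDay * days * pricePerMTok) / 1_000_000;
}

console.log(monthlyWaste(600, 30, 2.5));   // 1.35  (starting scenario)
console.log(monthlyWaste(600, 3000, 2.5)); // 135   (large platform, GPT-4o)
console.log(monthlyWaste(600, 3000, 15));  // 810   (large platform, Opus 4.6)
```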
The waste scales linearly with volume. promptcache's savings scale with it.
The Agency Scenario
An agency has 8 clients, each with a custom AI workflow.
- Shared brand guidelines prefix: 400 tokens × 8 clients × 50 jobs/day × 30 days = 4.8M tokens/month
- Monthly waste at $2.50/1M: $12/month per shared prefix
- 3 shared prefixes across the agency’s workflows: $36/month in pure redundancy
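Worked out explicitly (constants taken from the scenario above; this is plain arithmetic, not a promptcache API):

```typescript
// Agency scenario arithmetic from the text above.
const prefixTokens = 400;
const clients = 8;
const jobsPerClientPerDay = 50;
const days = 30;
const pricePerMTok = 2.5;
const sharedPrefixes = 3;

const monthlyTokensPerPrefix =
  prefixTokens * clients * jobsPerClientPerDay * days;   // 4,800,000 tokens
const wastePerPrefix =
  (monthlyTokensPerPrefix * pricePerMTok) / 1_000_000;   // $12/month
const totalMonthlyWaste = wastePerPrefix * sharedPrefixes; // $36/month
```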
At $19/month for the Loom Partner plan, a single promptcache audit covers the tool's cost in the first month, then keeps saving money every month after.
Competitive Positioning
Why no one else has built this:
- It requires access to job definitions across multiple pipelines — not just one team’s jobs
- It requires a job-level snapshot of actual prompt content, not just aggregate token counts
- The optimization (prompt caching) requires framework or infrastructure support that most tools don’t provide
promptcache works because Signalloom's tools operate at the job-definition layer, not just the API call layer. We can see what your jobs actually contain, not just what they cost.
LangChain has prompt caching built in — but it’s inside LangChain, only works within LangChain-managed chains, and requires you to already know what to cache. promptcache shows you what you should cache before you’ve built anything.
The Compound Effect with contextbroker
When combined with contextbroker’s persistent memory:
- System prefix cached once → shared across all agent sessions
- Agent memory persists between sessions → no re-loading of user preferences or project state
- Together: a persistent, optimized context layer that eliminates the two biggest sources of token waste simultaneously
Integration
npm install -g @signalloom/promptcache
signalloom promptcache audit
Outputs: shared prefixes, redundancy ratios, cost projections, and an optimization guide.
Pricing
- Free: Analyze up to 10 jobs
- Loom Partner ($19/mo): Full analysis, unlimited jobs, monthly reports
- Loom Elite ($99/mo): Full access + priority support + API access
promptcache is part of the Signalloom Developer Toolbox.