Overview
Every time an AI job runs, it pays again for the same thing: the system prompt.
System prompts — the context, instructions, and role definitions that precede every AI task — are repeated on every run. A 1,000-token system prompt run 500 times a day consumes 500,000 tokens: the same 1,000 tokens, billed 500 times over. Yet most AI infrastructure charges per token, regardless of whether those tokens are new or repeated.
promptcache is the first tool that finds these repeated tokens and tells you exactly how much you’d save by caching them.
No direct competitor offers this audit-first analysis. promptcache is a genuine white-space product in the AI developer tooling market.
The Problem: Every AI Job Pays for the Same Prompt Over and Over
Here’s what AI infrastructure billing actually looks like in practice:
Job A: "You are a customer support agent for Acme Corp. [800 tokens]"
→ Runs 200x/day → pays 800 tokens × 200 = 160,000 tokens/day
Job B: "You are a customer support agent for Acme Corp. [800 tokens]"
→ Runs 150x/day → pays 800 tokens × 150 = 120,000 tokens/day
Job C: "You are a customer support agent for Acme Corp. [800 tokens]"
→ Runs 100x/day → pays 800 tokens × 100 = 80,000 tokens/day
All three jobs share the same 800-token system prompt. Without caching, each job pays for those 800 tokens on every single run. That’s 800 tokens × 450 runs = 360,000 tokens paid per day — for content that doesn’t change.
With caching: those 800 tokens are computed once, then retrieved on subsequent runs. You’re charged once, not 450 times.
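The before-and-after arithmetic reduces to two tiny functions. This is an illustrative model of the billing difference, not the promptcache implementation; the function names are made up for this sketch.

```typescript
// Illustrative model of the billing math above (hypothetical names, not
// the promptcache API). Without caching, every run re-pays the prefix;
// with caching, the prefix is computed once and retrieved thereafter.

function tokensWithoutCaching(prefixTokens: number, runs: number): number {
  return prefixTokens * runs; // every run pays for the full prefix
}

function tokensWithCaching(prefixTokens: number, runs: number): number {
  return runs > 0 ? prefixTokens : 0; // pay once, retrieve on later runs
}

// Jobs A, B, and C: one 800-token prefix, 450 combined daily runs.
console.log(tokensWithoutCaching(800, 450)); // 360000 tokens/day
console.log(tokensWithCaching(800, 450));    // 800 tokens/day
```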
How promptcache Works
Analysis
promptcache reads your job definitions, extracts the system-level prompt prefix (the boilerplate that precedes the task-specific input), and calculates:
- Shared prefix tokens — how many tokens are shared across all jobs
- Redundancy ratio — what percentage of total token consumption is pure repetition
- Annual cost of redundancy — at current volume, how much are you paying to re-process the same tokens
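The analysis step could be sketched along these lines. This is assumed logic, not promptcache's actual implementation, and the ~4-characters-per-token estimate stands in for a real tokenizer.

```typescript
// Sketch of the analysis metrics above (assumed logic, not promptcache's
// implementation). approxTokens is a rough heuristic, not a real tokenizer.

interface Job {
  name: string;
  prompt: string;
  dailyRuns: number;
}

// Longest common prefix shared by all prompts.
function sharedPrefix(prompts: string[]): string {
  return prompts.reduce((acc, p) => {
    let i = 0;
    while (i < acc.length && i < p.length && acc[i] === p[i]) i++;
    return acc.slice(0, i);
  });
}

function approxTokens(text: string): number {
  return Math.ceil(text.length / 4); // ~4 chars/token estimate
}

// Fraction of total daily token spend that is pure prefix repetition.
function redundancyRatio(jobs: Job[]): number {
  const prefixTok = approxTokens(sharedPrefix(jobs.map((j) => j.prompt)));
  const totalRuns = jobs.reduce((sum, j) => sum + j.dailyRuns, 0);
  const totalTok = jobs.reduce(
    (sum, j) => sum + approxTokens(j.prompt) * j.dailyRuns,
    0
  );
  return (prefixTok * totalRuns) / totalTok;
}
```

When every job uses an identical prompt, the ratio is 1.0 (all repetition); the ratio drops as task-specific input grows relative to the shared prefix.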
Output Example
promptcache audit
Analyzing 12 jobs from snapshot...
Found: 3 distinct system prefixes across 12 jobs
Prefix A: "You are a customer support agent..." (847 tokens)
Used by: Job A, Job B, Job C
Daily runs: 450
Daily redundancy: 847 × 450 = 381,150 tokens
Monthly redundancy cost: 381,150 × $2.50/1M × 30 = $28.59
Prefix B: "You are a medical coder..." (1,203 tokens)
Used by: Job D, Job E
Daily runs: 100
Monthly redundancy cost: $9.02
Total monthly waste: $37.61
With promptcache optimization: $2.84/month
Savings: $34.77/month ($417/year)
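The line items in the sample output follow from a single formula. A quick check of the Prefix A arithmetic (the formula is inferred from the output; the function name is hypothetical):

```typescript
// Reproducing the Prefix A arithmetic from the sample audit output.
// $2.50 per 1M tokens is the rate assumed in the example.
const PRICE_PER_MTOK = 2.5;

function monthlyRedundancyCost(
  prefixTokens: number,
  dailyRuns: number,
  days: number = 30
): number {
  const monthlyTokens = prefixTokens * dailyRuns * days;
  return (monthlyTokens * PRICE_PER_MTOK) / 1_000_000;
}

console.log(monthlyRedundancyCost(847, 450).toFixed(2)); // "28.59"
```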
The Math of Why This Compounds
SaaS Scenario: 30-Job Pipeline
A SaaS company runs 30 AI-assisted content generation jobs per day.
- Average system prompt: 600 tokens
- Daily runs: 30
- Without caching: 600 × 30 = 18,000 tokens/day paid for repetition
- Monthly: 540,000 tokens/month × $2.50/1M = $1.35/month in pure waste
That sounds small. But:
- At 300 jobs/day (mid-size platform): $13.50/month
- At 3,000 jobs/day (large platform): $135/month at GPT-4o pricing ($2.50/1M)
- The same 3,000 jobs/day at Opus 4.6 pricing ($15/1M): $810/month
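The scaling above is one linear formula in three variables. A sketch (the helper name and parameters are hypothetical, not a promptcache API):

```typescript
// Monthly cost of re-processing an unchanged prompt, per the scenario
// above (hypothetical helper, not a promptcache API).
function monthlyWaste(
  promptTokens: number,
  jobsPerDay: number,
  pricePerMTok: number,
  days: number = 30
): number {
  return (promptTokens * jobsPerDay * days * pricePerMTok) / 1_000_000;
}

console.log(monthlyWaste(600, 30, 2.5));   // 1.35  (starting scenario)
console.log(monthlyWaste(600, 3000, 2.5)); // 135   (large platform, GPT-4o)
console.log(monthlyWaste(600, 3000, 15));  // 810   (large platform, Opus 4.6)
```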
The waste scales linearly with volume. promptcache's savings scale with it.
The Agency Scenario
An agency has 8 clients, each with a custom AI workflow.
- Shared brand guidelines prefix: 400 tokens × 8 clients × 50 jobs/day × 30 days = 4.8M tokens/month
- Monthly waste at $2.50/1M: $12/month per shared prefix
- 3 shared prefixes across the agency’s workflows: $36/month in pure redundancy
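Worked out explicitly (constants taken from the scenario above; this is plain arithmetic, not a promptcache API):

```typescript
// Agency scenario arithmetic from the text above.
const prefixTokens = 400;
const clients = 8;
const jobsPerClientPerDay = 50;
const days = 30;
const pricePerMTok = 2.5;
const sharedPrefixes = 3;

const monthlyTokensPerPrefix =
  prefixTokens * clients * jobsPerClientPerDay * days;   // 4,800,000 tokens
const wastePerPrefix =
  (monthlyTokensPerPrefix * pricePerMTok) / 1_000_000;   // $12/month
const totalMonthlyWaste = wastePerPrefix * sharedPrefixes; // $36/month
```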
At $19/month for the Loom Partner plan, a single promptcache audit covers the tool's cost in the first month, then keeps saving money every month after.
Competitive Positioning
Why no one else has built this:
- It requires access to job definitions across multiple pipelines — not just one team’s jobs
- It requires a job-level snapshot of actual prompt content, not just aggregate token counts
- The optimization (prompt caching) requires framework or infrastructure support that most tools don’t provide
promptcache works because Signalloom's tools operate at the job-definition layer, not just the API call layer. We can see what your jobs actually contain, not just what they cost.
LangChain has prompt caching built in — but it’s inside LangChain, only works within LangChain-managed chains, and requires you to already know what to cache. promptcache shows you what you should cache before you’ve built anything.
The Compound Effect with contextbroker
When combined with contextbroker’s persistent memory:
- System prefix cached once → shared across all agent sessions
- Agent memory persists between sessions → no re-loading of user preferences or project state
- Together: a persistent, optimized context layer that eliminates the two biggest sources of token waste simultaneously
Integration
npm install -g @signalloom/promptcache
signalloom promptcache audit
Outputs: shared prefixes, redundancy ratios, cost projections, and an optimization guide.
Pricing
- Free: Analyze up to 10 jobs
- Loom Partner ($19/mo): Full analysis, unlimited jobs, monthly reports
- Loom Elite ($99/mo): Full access + priority support + API access
promptcache is part of the Signalloom Developer Toolbox.