The pattern
In March 2026, Andrej Karpathy open-sourced a 630-line script that ran AI experiments while he slept. The agent modified training code, ran a five-minute experiment, checked the result, and looped. He woke up to 700 changes and 20 improvements that weeks of manual tuning had missed.
Within ten days, builders had pointed the same loop at financial markets (+22% return), chess engines (expert to grandmaster), and rendering pipelines (53% faster). Nobody coordinated this. The structural conditions were right: a metric, a clock, and a loop.
The pattern has a name now — autoresearch. And it turns out call buying has the structural conditions to run it.
Why call buying has the right conditions
Autoresearch works when three things are true about your domain:
The metric is fast. You need to evaluate each experiment quickly enough to run dozens per day. In Karpathy's case, validation loss evaluated in five minutes. In call buying, CPA evaluates on every call — you know whether a bid change worked within hours, not weeks.
The metric is unambiguous. The number has to be clearly better or clearly worse. Validation loss goes down = better. CPA goes down = better. "Is this email good?" is ambiguous. "Did this bid modifier reduce cost per acquisition?" is not.
The metric is predictive. It has to correlate with what you actually care about. CPA directly measures the cost of acquiring a customer. It's not a proxy — it's the thing itself. This is rare. Most business metrics are proxies. CPA is honest.
Call buying has all three. That's why it compounds.
How Q Optimizer runs the loop
Q Optimizer is an AI agent that observes your campaign performance, proposes experiments, evaluates results against your target CPA, and applies winning changes — autonomously, every 15 minutes, 24/7.
The cycle
Observe. The system reads your campaign state in real time: which geos are converting, which sources are underperforming, which dayparts are overpriced, which bid modifiers are stale.
Propose. The agent proposes a specific change — bump the Florida geo modifier from +15% to +25%, cut the overnight daypart bid by 30%, increase source weighting for a high-converting publisher. Each proposal is a discrete, testable hypothesis.
Evaluate. The experiment runs on a deterministic traffic split. This is important: the evaluation isn't "did things get better overall" — it's "did this specific change improve CPA on the traffic it was applied to, compared to the traffic it wasn't." Same principle as an A/B test, running continuously.
Apply or revert. Winning changes are applied automatically. Losing changes are reverted. No human in the loop between cycles.
Repeat. Every 15 minutes. 96 cycles per day. 672 cycles per week. Each cycle builds on what the previous cycles learned.
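The cycle above can be sketched in Python. Everything here is illustrative, not Q Optimizer's actual API: `Proposal`, `cpa`, and the apply/revert callbacks are assumed names. The key detail from the evaluate step is the deterministic split, shown here as a hash of the call ID so the same call always lands in the same arm.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Proposal:
    knob: str            # e.g. "geo:FL" or "daypart:overnight" (illustrative)
    old_value: float
    new_value: float

def in_treatment(call_id: str, split: float = 0.5) -> bool:
    """Deterministic traffic split: hashing the call ID means the
    same call always lands in the same arm, with no randomness
    between evaluations."""
    h = int(hashlib.sha256(call_id.encode()).hexdigest(), 16)
    return (h % 10_000) / 10_000 < split

def cpa(calls: list) -> float:
    """Cost per acquisition: total spend divided by conversions."""
    spend = sum(c["cost"] for c in calls)
    conversions = sum(1 for c in calls if c["converted"])
    return spend / conversions if conversions else float("inf")

def run_cycle(calls: list, proposal: Proposal, apply_change, revert_change):
    """One cycle: score the change on the split it ran on, then
    keep the winner or roll back the loser. No human in between."""
    treatment = [c for c in calls if in_treatment(c["id"])]
    control = [c for c in calls if not in_treatment(c["id"])]
    if cpa(treatment) < cpa(control):
        apply_change(proposal)   # winner: promote to all traffic
    else:
        revert_change(proposal)  # loser: revert, try something else
```

The comparison is treatment against control within the same window, not "did overall CPA move" — which is what makes results from different cycles comparable.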
What compounds
The compounding is multiplicative, not additive. A 1% CPA improvement from a single winning change doesn't seem like much, but the changes that survive evaluation stack across 672 weekly cycles. TapQuality buyers typically see 10–20% CPA improvement in the first month of Q Optimizer, with continued gains as the system accumulates more conversion data and the propensity scoring models sharpen.
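To make the multiplicative point concrete, here is the arithmetic under assumed numbers — the 2% survival rate and 1% per-change improvement are illustrative inputs, not measured results:

```python
# Illustrative arithmetic, not measured results: assume 2% of weekly
# cycles produce a change that survives evaluation, each worth a 1%
# CPA reduction. Surviving changes multiply rather than add.
cycles_per_week = 4 * 24 * 7                 # one cycle per 15 minutes
surviving = round(cycles_per_week * 0.02)    # winning changes per week
improvement = 1 - 0.99 ** surviving          # multiplicative stacking
print(f"{cycles_per_week} cycles, {surviving} survivors, "
      f"{improvement:.0%} weekly CPA reduction")
# → 672 cycles, 13 survivors, 12% weekly CPA reduction
```

Thirteen surviving 1% changes compound to roughly a 12% reduction — which is why a handful of small wins per week lands in the double digits.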
You wake up to better campaigns than you went to sleep with. This is what the autoresearch pattern looks like when deployed against a domain with an honest fitness function.
The five-minute clock vs. the 15-minute clock
MMNTM's analysis of the autoresearch pattern identifies the clock — the fixed time constraint — as the mechanism that makes the loop work. Without a fixed clock, experiments are incomparable. A good 3-minute experiment and a good 3-hour experiment can't be ranked on the same leaderboard.
In Karpathy's case, the clock is five minutes (one training run). In Q Optimizer's case, the clock is 15 minutes (enough call volume to produce a statistically meaningful signal on the traffic split). The principle is identical: every experiment has the same time cost, so results are directly comparable.
The 15-minute interval isn't arbitrary. It's calibrated to the statistical power needed to distinguish a real CPA change from noise on typical campaign volumes. Too short and you're evaluating on too few calls. Too long and you're leaving optimization cycles on the table.
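The calibration tradeoff can be seen with a standard two-proportion sample-size approximation. This is a textbook formula, not Q Optimizer's actual calibration, and the base conversion rate and lift in the usage line are assumptions:

```python
from math import ceil, sqrt

def calls_needed(base_rate: float, lift: float,
                 z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Rough per-arm sample size to detect a relative `lift` in
    conversion rate at ~95% confidence and ~80% power, using the
    standard two-proportion approximation."""
    p1 = base_rate
    p2 = base_rate * (1 + lift)
    pbar = (p1 + p2) / 2
    n = ((z_alpha * sqrt(2 * pbar * (1 - pbar))
          + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p2 - p1) ** 2)
    return ceil(n)

# Assumed example: a 10% base conversion rate, looking for a doubling.
print(calls_needed(0.10, 1.0))   # roughly 200 calls per arm
```

Shrink the lift you want to detect and the required volume grows quickly — which is exactly the "too short and you're evaluating on too few calls" failure mode.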
Where the loop breaks (and where it doesn't)
The enterprise eval gap — the distance between "the loop exists" and "we have a metric the loop can optimize" — is the reason most agent deployments stall. Most knowledge work doesn't have a loss function. "Is this customer support response good?" can't be scored unambiguously.
Call buying doesn't have this problem. CPA is the loss function. It's fast (evaluates per call), unambiguous (lower is better), and predictive (it directly measures acquisition cost). The loop doesn't break because the score is honest.
Where Q Optimizer can't run the loop: campaign types where conversion data feeds back slowly (30+ day sales cycles), verticals with insufficient call volume (fewer than 50 calls/day per campaign), or situations where the target metric isn't CPA (brand awareness campaigns, for example). In those cases, the clock doesn't generate enough signal per cycle to evaluate experiments.
For everything else — insurance, home services, legal intake, financial services with same-day or same-week conversion attribution — the 15-minute loop runs and compounds.
The feedback loop is the moat
The Q Optimizer loop doesn't just improve your bids. It improves the data that improves your bids. Every call you buy feeds conversion data back into the propensity scoring model. The model gets sharper. Sharper scoring means better pre-auction filtering. Better filtering means higher-quality calls. Higher-quality calls mean more conversion data. The loop feeds itself.
This is the same structural advantage Karpathy identified: the companies running overnight loops aren't announcing it — they're shipping the results at 9am. The gap between buyers running a 15-minute optimization loop and buyers doing manual weekly bid reviews widens quietly. By the time you notice the delta, it's months of compounding.
Getting started
If you're buying calls on RTB exchanges with CPA as your primary metric, your campaigns have the structural conditions for the 15-minute loop. The requirements:
- Conversion data flowing: Your CRM or disposition system feeds back sale/no-sale outcomes within 24 hours
- Sufficient volume: At least 50 calls per day per campaign for statistical power
- Clear CPA target: A number that's unambiguously better when it goes down
- Patience for week 1: The loop needs 100–200 calls to calibrate before the compounding starts
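The first three requirements can be encoded as a simple readiness gate. This is a hypothetical sketch — the campaign dict shape and field names are made up for illustration:

```python
def loop_ready(campaign: dict) -> list[str]:
    """Return a list of blockers; an empty list means the campaign
    meets the structural conditions for the 15-minute loop.
    Thresholds mirror the checklist above; field names are assumed."""
    blockers = []
    if campaign.get("feedback_lag_hours", float("inf")) > 24:
        blockers.append("conversion feedback slower than 24 hours")
    if campaign.get("calls_per_day", 0) < 50:
        blockers.append("fewer than 50 calls/day")
    if campaign.get("target_cpa") is None:
        blockers.append("no CPA target set")
    return blockers

campaign = {"feedback_lag_hours": 4, "calls_per_day": 120, "target_cpa": 85.0}
print(loop_ready(campaign))   # → []
```

An empty list doesn't skip week 1 — the loop still needs its 100–200 calibration calls before the compounding starts.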
The full platform overview covers the broader system. Q Optimizer is the piece that makes the system compound.