Incrementality testing is a measurement method that isolates the installs, registrations, or purchases that paid advertising actually caused, by comparing a group exposed to ads against a statistically equivalent group that was held out. For mobile app teams in 2026, incrementality testing is the most reliable way to answer one question that attribution cannot: would this user have converted anyway? Admiral Media uses incrementality testing to separate true ad-driven growth from conversions that platforms claim but did not create. This matters because in a world of privacy-driven signal loss, last-click attribution systematically takes credit for organic demand, and budgets get misallocated as a result.
The Admiral Media team has run measurement programs across more than 150 mobile brands and over €500M in managed ad spend. The pattern is consistent: channels that look efficient in an attribution dashboard often look very different once a proper holdout reveals how many of those conversions were incremental. This guide lays out a practical framework for geo holdouts, PSA tests, ghost ads, and reading lift, so you can measure what your spend truly adds rather than what a pixel claims.
What is incrementality testing for mobile apps, in plain terms?
Incrementality testing for mobile apps is a controlled experiment that measures the difference in conversions between users who could see your ads and a matched control group that could not. The gap between the two groups is the incremental lift: the conversions that exist only because the campaign ran. Attribution counts every conversion that touched an ad. Incrementality counts only the conversions that would not have happened otherwise. Those are different numbers, and the gap between them is where wasted budget hides.
The mechanism is borrowed from clinical trials. A treatment group receives the intervention (the ad), and a control group receives nothing or a placebo. Because users are randomized or geographies are matched, the two groups should share the same underlying behavior on average. Any measured difference in install rate, trial starts, or revenue can then be attributed to the ads, within the confidence bounds the sample size allows. For app marketers, this is the cleanest available proxy for causation in an ecosystem where deterministic tracking has eroded.
Admiral Media treats incrementality as the complement to deterministic measurement, not a replacement for it. Attribution still tells you which creative and placement drove a click. Incrementality tells you whether that activity moved the business. The two answer different questions, and mature growth teams run both.
Why does attribution overstate paid performance?
Attribution overstates paid performance because it assigns full credit to the last touch before a conversion, even when the user was already going to install. This is the core flaw incrementality testing was designed to correct. A brand-search campaign, a retargeting pool, or a high-intent audience will always look efficient under last-click logic, because those users were close to converting before they ever saw the ad.
Three forces make this worse for apps specifically. First, signal loss from Apple’s App Tracking Transparency and the move to SKAdNetwork and AdAttributionKit means much of iOS attribution is now modeled rather than observed. Second, self-attributing networks mark conversions inside their own walled gardens, so two platforms can both claim the same install. Third, organic demand and paid demand overlap heavily for established apps, which inflates view-through and click-through credit. The result is a measured ROAS that can look strong while incremental ROAS, the number that reflects real profit, sits far lower.
In the Admiral Media team’s experience managing accounts through the post-IDFA transition, the channels most prone to overstatement are branded search, broad retargeting, and view-through-heavy social placements. These are exactly the line items that a holdout test reshapes. TIER’s performance marketing lead made the point directly after working with Admiral Media: beyond scale, the value was in finding answers to questions about tracking and incrementality.
The Admiral Media Incrementality Ladder
Admiral Media uses a four-rung sequence to decide how much measurement rigor a given budget justifies. The Admiral Media Incrementality Ladder moves from cheap and directional to expensive and definitive, so teams spend testing effort in proportion to the spend at risk.
- Baseline holdout: Pause spend or hold out a slice of users on a single channel and watch organic conversions. This is the fastest read on whether a channel adds anything at all, and it costs only the opportunity cost of the paused budget.
- Geo lift test: Split comparable regions into test and control, run ads only in test markets, and measure the difference in total conversions. This survives signal loss because it relies on aggregate business data, not device-level tracking.
- Platform lift study: Use the network’s randomized holdout tools, such as conversion lift studies, to get a controlled read inside a single platform’s inventory at user level.
- Always-on incrementality: Maintain a permanent small holdout and recalibrate bidding and budget allocation against incremental ROAS continuously, so measurement becomes an operating system rather than a one-off project.
The ladder is deliberately ordered. Most teams should not jump to rung four before they have proven they can run rung two cleanly. Each rung validates the assumptions of the next.
How do geo holdout tests work for apps?
Geo holdout tests work by turning entire regions into test and control groups, then comparing total conversions between them. Because the unit of measurement is a geography rather than a tracked device, geo testing is the most signal-loss-resistant method available to app marketers in 2026. You do not need an IDFA, a cookie, or a deterministic match. You need clean regional conversion data and comparable markets.
A geo lift test runs in four stages. You select matched markets with similar historical conversion trends, typically using a pre-period to confirm the markets move together. You then assign some to receive ads, the treatment, and some to go dark, the control. You hold the design fixed for long enough to clear the conversion delay, often two to four weeks for apps with trials. Finally, you compare the conversion volume in test markets against the counterfactual implied by control markets. Google describes the same logic in its conversion lift and geo experiment documentation.
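The final comparison step can be sketched in a few lines. This is a minimal illustration, assuming the simplest possible counterfactual: control-market conversions during the test, scaled by the pre-period ratio between treatment and control markets. The `MarketGroup` and `geo_lift` names and all figures are hypothetical; production geo tooling typically uses richer time-series models rather than a single ratio.

```python
from dataclasses import dataclass


@dataclass
class MarketGroup:
    """Aggregate conversions for a set of markets (hypothetical figures)."""
    pre_period: float   # conversions before the test starts
    test_period: float  # conversions while the test runs


def geo_lift(treatment: MarketGroup, control: MarketGroup) -> dict:
    # Pre-period ratio: how large treatment markets are relative to control
    # markets while both are still treated identically.
    scale = treatment.pre_period / control.pre_period
    # Counterfactual: what treatment markets would have done without ads,
    # implied by the control markets' test-period volume.
    counterfactual = control.test_period * scale
    incremental = treatment.test_period - counterfactual
    return {
        "counterfactual": counterfactual,
        "incremental": incremental,
        "lift": incremental / counterfactual,
    }


# Hypothetical example: control markets went dark during the test period.
treatment = MarketGroup(pre_period=10_000, test_period=12_600)
control = MarketGroup(pre_period=8_000, test_period=8_800)
result = geo_lift(treatment, control)
print(f"{result['incremental']:.0f} incremental, lift {result['lift']:.1%}")
```

Here the control markets imply 11,000 conversions for the treatment markets without ads, so 1,600 of the 12,600 observed conversions are incremental. A single ratio is only trustworthy if the pre-period confirms the markets actually move together.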
Miles Mobility shows why aggregate measurement matters. The car-sharing brand operates across Germany and Belgium, and accurate attribution of web-to-app conversions was one of its three core challenges. Admiral Media implemented a mobile measurement partner and restructured campaigns for Google Smart Bidding, then validated the gains against business outcomes rather than platform-claimed conversions. Admiral Media managed Miles Mobility’s Google Web-to-App campaigns and achieved a 260% increase in conversions and a 25% lower CPA, measured on the brand’s own conversion data.
What about PSA tests and ghost ads?
PSA tests and ghost ads are user-level incrementality methods that hold out a control group inside a live campaign rather than across geographies. A PSA test shows the control group an unrelated public-service ad in place of your ad, so both groups experience an ad impression but only one sees your message. A ghost ad records which control users would have been served your ad, without actually showing it, which removes the cost of buying placebo inventory.
Ghost ads are the more elegant design because they control for ad-serving selection bias: the algorithm picks the same high-value users for both groups, and you compare conversion rates between users who saw your ad and statistically identical users who did not. PSA tests are simpler to run on platforms that support them but can introduce noise, since the placebo impression itself can influence behavior and the control inventory costs real money. Both approaches depend on the platform supporting a randomized holdout, which is why they sit on rung three of the ladder rather than rung two.
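The ghost-ad comparison can be sketched as follows, assuming hypothetical user-level records in which the platform logs, for every randomized user, whether the ad server selected them (`would_serve`); in the control arm that selection is the logged "ghost" impression. Real platforms expose this only through their own lift tooling, so the `UserRecord` layout and `ghost_ad_lift` helper are illustrative, not any platform's API.

```python
from dataclasses import dataclass


@dataclass
class UserRecord:
    group: str          # "treatment" or "control", randomized before serving
    would_serve: bool   # ad server selected this user (ghost impression in control)
    converted: bool


def ghost_ad_lift(records: list[UserRecord]) -> float:
    """Relative lift among users the ad server selected in both arms."""
    def conversion_rate(group: str) -> float:
        # Restrict both arms to users the algorithm picked, so the comparison
        # is like-for-like and free of ad-serving selection bias.
        selected = [r for r in records if r.group == group and r.would_serve]
        return sum(r.converted for r in selected) / len(selected)

    exposed_rate = conversion_rate("treatment")
    ghost_rate = conversion_rate("control")
    return (exposed_rate - ghost_rate) / ghost_rate


# Hypothetical data: 1,000 selected users per arm, plus unselected control users.
records = (
    [UserRecord("treatment", True, i < 60) for i in range(1000)]
    + [UserRecord("control", True, i < 50) for i in range(1000)]
    + [UserRecord("control", False, i < 30) for i in range(500)]  # excluded
)
print(f"{ghost_ad_lift(records):.1%}")
```

The design choice that matters is the filter on `would_serve`: comparing exposed users against all control users would mix in people the algorithm never targeted, which is exactly the selection bias ghost ads are meant to remove.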
For apps facing heavy iOS signal loss, the Admiral Media team often prefers geo testing over user-level holdouts, because geo designs do not require device identifiers to remain intact. User-level lift studies remain valuable on Android and within platforms that still operate clean randomized holdouts. The right method depends on your platform mix, your conversion volume, and how much of your tracking has degraded.
| Method | What it measures | Signal-loss resilience | Relative cost | Best for |
|---|---|---|---|---|
| Baseline holdout (pause test) | Whether a channel adds any incremental conversions | High (uses aggregate data) | Low (opportunity cost only) | Quick directional reads on a single channel |
| Geo lift test | Incremental conversions across matched regions | Very high (no device IDs needed) | Medium | iOS-heavy apps and multi-market brands |
| PSA test | User-level lift vs a placebo-ad control | Medium (needs platform holdout) | Medium to high (buys control inventory) | Android and platforms with clean holdouts |
| Ghost ads | User-level lift with serving bias removed | Medium (needs platform support) | Low to medium | Precise single-platform causal reads |
| Marketing mix modeling | Channel-level contribution from aggregate spend and outcomes | Very high (privacy-safe by design) | High (data and modeling effort) | Portfolio-level budget allocation |
How do you read lift and calculate incremental ROAS?
You read lift by comparing the conversion rate of the test group against the control group, then expressing the difference as a percentage of the control. Incremental ROAS is incremental revenue divided by the spend that produced it, and it is almost always lower than platform-reported ROAS. This single calculation is the reason incrementality testing changes budgets.
The core formula is straightforward. Incremental conversions equal test conversions minus the conversions the control group produced on its own, scaled to the same population. Lift is incremental conversions divided by the scaled control conversions. Incremental ROAS is the revenue from those incremental conversions divided by the campaign cost. If a channel reports a 4.0 ROAS in its dashboard but the holdout shows that 40% of those conversions happened in the control group too, only 60% are incremental; assuming similar revenue per conversion, the incremental ROAS is about 4.0 × 0.6 = 2.4, and the budget decision changes accordingly.
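These formulas translate directly into code. This is a minimal sketch, assuming the control group is scaled to the test group by population ratio, that all spend belongs to the test group, and that revenue per conversion is a known constant; the `incrementality` helper and every figure are hypothetical, chosen to reproduce the 4.0 reported ROAS and 40% baseline share from the example above.

```python
def incrementality(test_conversions: float, test_users: int,
                   control_conversions: float, control_users: int,
                   revenue_per_conversion: float, spend: float) -> dict:
    # Scale control conversions to the test group's population size.
    baseline = control_conversions * (test_users / control_users)
    incremental = test_conversions - baseline
    return {
        # Lift: incremental conversions relative to the scaled control.
        "lift": incremental / baseline,
        # What a last-touch dashboard would show: every test conversion counts.
        "reported_roas": test_conversions * revenue_per_conversion / spend,
        # Only conversions the control group did not also produce count.
        "incremental_roas": incremental * revenue_per_conversion / spend,
    }


# 500,000 exposed users vs a 50,000-user holdout (hypothetical).
result = incrementality(test_conversions=1000, test_users=500_000,
                        control_conversions=40, control_users=50_000,
                        revenue_per_conversion=40.0, spend=10_000.0)
print(result)  # {'lift': 1.5, 'reported_roas': 4.0, 'incremental_roas': 2.4}
```

Note that lift and incremental share are different views of the same result: a 40% baseline share corresponds to a 150% lift over control, while incremental ROAS falls from 4.0 to 2.4.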
Two practitioner cautions matter here. First, confirm statistical significance before acting; a small lift with wide confidence intervals is not a result, it is noise. Measurement vendors such as AppsFlyer stress sufficient sample size and test duration in their incrementality guidance for this reason. Second, account for the conversion delay; ending a test before trials convert to subscriptions will understate lift for subscription apps. The Admiral Media team pairs incremental ROAS with predictive LTV signals so that bidding optimizes toward incremental value rather than front-loaded installs.
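One simple way to check significance for a user-level holdout is a confidence interval on the difference in conversion rates. The sketch below uses the textbook Wald (normal-approximation) interval, assuming independent, randomized users; it is a simplification, not a vendor's method, and geo tests with few markets need different techniques. The `lift_confidence_interval` name and the figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def lift_confidence_interval(test_conv: int, test_n: int,
                             control_conv: int, control_n: int,
                             confidence: float = 0.95) -> tuple[float, float]:
    """Wald interval for the difference in conversion rates (test minus control)."""
    p_t = test_conv / test_n
    p_c = control_conv / control_n
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_t * (1 - p_t) / test_n + p_c * (1 - p_c) / control_n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    diff = p_t - p_c
    return diff - z * se, diff + z * se


# Same conversion rates (1.10% vs 0.92%), two sample sizes (hypothetical).
large = lift_confidence_interval(660, 60_000, 550, 60_000)
small = lift_confidence_interval(66, 6_000, 55, 6_000)
print(f"large: {large[0]:+.4%} to {large[1]:+.4%}")  # excludes zero
print(f"small: {small[0]:+.4%} to {small[1]:+.4%}")  # spans zero: not a result
```

The point estimate is identical in both cases; only the larger test produces an interval that excludes zero, which is the practical meaning of "a small lift with wide confidence intervals is noise."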
TIER illustrates how validated measurement underwrites aggressive scaling. Once tracking and incrementality questions were answered, Admiral Media expanded TIER beyond a single channel and scaled budget with confidence. Admiral Media scaled TIER’s user acquisition across multiple markets, growing new customers by 297%, adding two new channels, and increasing the acquisition budget fivefold in under three months.
When should you run an incrementality test?
You should run an incrementality test before any major budget decision on a channel that attribution flags as a top performer, because those are the channels most likely to be overstated. The trigger is not a calendar; it is a decision with real money attached. If you are about to double spend on retargeting or branded search on the strength of a dashboard ROAS, that is the moment a holdout pays for itself.
Practical triggers the Admiral Media team uses include: a channel that looks suspiciously efficient relative to its scale, a planned step-change in budget, a shift in the platform’s attribution model, or a board-level question about whether paid spend is truly driving growth. Incrementality also belongs in the diligence stack whenever an app crosses into eight-figure annual spend, where a few points of misattribution translate into large absolute waste.
Incrementality testing is not a reason to stop using attribution day to day. Attribution remains the right tool for in-flight creative and bidding decisions, where speed matters more than causal purity. Incrementality is the periodic recalibration that keeps attribution honest. For a deeper view of why the industry is shifting from deterministic attribution toward causal measurement, see Admiral Media’s companion piece on embracing incrementality; for full-funnel measurement strategy, see the Admiral Media performance marketing approach.
Frequently Asked Questions
What is the difference between attribution and incrementality?
Attribution assigns credit for a conversion to the ad touchpoints that preceded it, usually the last click. Incrementality measures whether the conversion would have happened without any ad at all, by comparing an exposed group against a held-out control. Attribution answers which ad got the click; incrementality answers whether the spend caused the outcome. The two numbers differ, and the gap is the conversions a channel takes credit for but did not create.
Is geo lift testing better than user-level holdouts for iOS apps?
For iOS apps facing heavy signal loss, geo lift testing is usually more reliable than user-level holdouts. Geo tests compare whole regions using aggregate conversion data, so they do not depend on device identifiers that App Tracking Transparency has degraded. User-level holdouts such as ghost ads remain strong on Android and inside platforms with clean randomized holdouts. Many mature programs run both and triangulate the results.
How long should an app incrementality test run?
An app incrementality test should run long enough to reach statistical significance and to clear the conversion delay between install and the event you care about. For install or registration goals, two to four weeks is often enough. For subscription apps where trials convert later, the test must run past the trial window so that the measured lift includes paying users. Ending early understates lift and leads to wrong budget decisions.
Why is incremental ROAS lower than platform-reported ROAS?
Incremental ROAS is lower because platform-reported ROAS includes conversions that would have happened without the ad. A holdout reveals how many conversions the control group produced on its own, and those are subtracted out. The remaining incremental revenue, divided by spend, is the true return. Channels heavy in retargeting, branded search, and view-through credit typically show the largest gap between reported and incremental ROAS.
Can small apps run incrementality tests, or is it only for big budgets?
Small apps can run incrementality tests, but they need enough conversion volume to detect a difference between test and control. Low-volume apps should start with a simple baseline pause test on one channel rather than a complex geo or user-level design. As spend and conversion volume grow, more rigorous geo and platform lift studies become statistically viable. The method scales with the budget at risk.
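Whether an app has "enough conversion volume" can be estimated before a test with a standard power calculation for comparing two conversion rates. This sketch assumes user-level randomization, a two-sided test at 5% significance, and 80% power; those are conventional defaults, not thresholds from this guide, and the `users_per_group` helper is hypothetical. Geo tests have far fewer units and need different power methods.

```python
from math import ceil, sqrt
from statistics import NormalDist


def users_per_group(baseline_rate: float, relative_lift: float,
                    alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed in each arm of a two-sided two-proportion test."""
    p1 = baseline_rate                        # control conversion rate
    p2 = baseline_rate * (1 + relative_lift)  # test rate if the lift is real
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical: 1% baseline conversion rate, aiming to detect a 10% relative lift.
print(users_per_group(baseline_rate=0.01, relative_lift=0.10))
```

At a 1% baseline, detecting a 10% relative lift needs roughly 160,000 users in each group, and halving the detectable lift roughly quadruples that requirement. This is why low-volume apps are better served by a coarse pause test that only asks whether a channel adds anything at all.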
Does incrementality testing replace mobile measurement partners?
No. A mobile measurement partner still records and structures your conversion data, which incrementality testing depends on. Incrementality is a layer of causal analysis on top of that data, not a substitute for it. In the Miles Mobility program, Admiral Media implemented a mobile measurement partner first, then used clean conversion data to validate the gains. The two work together.
Who should own incrementality testing inside a growth team?
Incrementality testing should be owned jointly by the performance marketing lead, who controls budget and channel design, and the analytics function, which controls measurement integrity. Keeping it with one side alone tends to fail: marketers may design tests that flatter their channels, while analysts may design tests that ignore operational constraints. Admiral Media typically runs incrementality as a shared protocol so that the test design and the budget decision stay connected.