Lesson 5 of 5

Measuring Results and Scaling Winners

When Is a Test "Done"?

The most common experimentation mistake is calling a test too early. A 15% lift after one week is not yet a result -- at that point it cannot be distinguished from noise. Affiliate programs have high variance because a single large affiliate can swing your numbers in either direction. You need enough data to be confident the difference is real, not random.

Minimum run time: 4 weeks for commission tests, 2 weeks for creative tests
Minimum sample: 100+ conversions per group for CPA tests, 60+ for RevShare (where revenue accumulates more slowly)
Stability check: The directional winner should be the same for at least 2 consecutive weeks before you declare a result
Outlier audit: Remove or flag any affiliate contributing more than 25% of a group's total volume -- they should not determine the result alone
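
The four checks above can be combined into a single readiness report. The following is a minimal sketch, not a definitive implementation: the function names, the input shapes (a dict of per-affiliate conversions for each group, a list of weekly lifts), and the sample numbers are all assumptions made for illustration.

```python
# Thresholds taken from the checklist above.
MIN_WEEKS = {"commission": 4, "creative": 2}
MIN_CONVERSIONS = {"cpa": 100, "revshare": 60}
MAX_AFFILIATE_SHARE = 0.25


def dominant_affiliates(conversions_by_affiliate, max_share=MAX_AFFILIATE_SHARE):
    """Return affiliates contributing more than max_share of a group's volume."""
    total = sum(conversions_by_affiliate.values())
    if total == 0:
        return []
    return [a for a, c in conversions_by_affiliate.items() if c / total > max_share]


def stable_weeks(weekly_lifts):
    """Count the most recent consecutive weeks whose lift has the same sign."""
    def sign(x):
        return (x > 0) - (x < 0)

    if not weekly_lifts or sign(weekly_lifts[-1]) == 0:
        return 0
    direction = sign(weekly_lifts[-1])
    streak = 0
    for lift in reversed(weekly_lifts):
        if sign(lift) != direction:
            break
        streak += 1
    return streak


def readiness_report(test_type, payout_model, weeks_run, groups, weekly_lifts):
    """groups maps a group name to {affiliate_id: conversions} (hypothetical shape)."""
    flagged = {name: dominant_affiliates(convs) for name, convs in groups.items()}
    checks = {
        "min_run_time": weeks_run >= MIN_WEEKS[test_type],
        "min_sample": all(
            sum(convs.values()) >= MIN_CONVERSIONS[payout_model]
            for convs in groups.values()
        ),
        "stable_direction": stable_weeks(weekly_lifts) >= 2,
        "no_dominant_affiliate": not any(flagged.values()),
    }
    return {"ready": all(checks.values()), "checks": checks, "flagged": flagged}


# Illustrative numbers only, not real program data.
groups = {
    "control": {"a1": 24, "a2": 22, "a3": 20, "a4": 20, "a5": 19},
    "variant": {"b1": 60, "b2": 15, "b3": 15, "b4": 12, "b5": 10},
}
report = readiness_report("commission", "cpa", 4, groups, [0.15, -0.02, 0.08, 0.11])
print(report["ready"], report["flagged"])
```

In this example the test meets the run-time, sample, and stability checks, but affiliate b1 supplies more than 25% of the variant group's conversions, so the report is not ready until that affiliate has been reviewed.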

Metrics That Matter

Metric	What It Tells You	Watch For
Revenue per affiliate	Overall program efficiency	Skew from top performers
Conversion rate	Offer attractiveness	Quality degradation (high conversion + low deposit)
Traffic quality score	Whether the variant attracts real users	Score drops that lag behind conversion increases
Affiliate churn rate	Partner satisfaction with the new model	Delayed churn -- partners may leave 30-60 days after a test ends
Cost per acquisition	True cost including commissions + overhead	CPA alone misses RevShare long-tail costs
Player/customer LTV	Downstream value of acquired users	Requires 60-90 day lookback for meaningful data

A test can "win" on conversion rate but lose on profitability. Always measure at least one revenue metric and one quality metric alongside your primary conversion metric. A 20% conversion lift is worthless if the acquired users churn within a week.
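
One way to enforce this rule is to treat the revenue and quality metrics as guardrails on the conversion result. The sketch below is an assumption about how such a check might look: the metric keys, the `guardrail_tolerance` parameter, and the sample values are hypothetical, and it assumes all control metrics are non-zero.

```python
PRIMARY = "conversion_rate"
GUARDRAILS = ("revenue_per_affiliate", "quality_score")


def evaluate_variant(control, variant, guardrail_tolerance=0.0):
    """Compare variant to control on the primary metric plus guardrail metrics.

    control and variant are dicts keyed by metric name (hypothetical shape).
    guardrail_tolerance is the relative drop in a guardrail you will accept.
    """
    lifts = {
        metric: variant[metric] / control[metric] - 1.0
        for metric in (PRIMARY,) + GUARDRAILS
    }
    guardrails_ok = all(lifts[m] >= -guardrail_tolerance for m in GUARDRAILS)
    if lifts[PRIMARY] > 0 and guardrails_ok:
        verdict = "win"
    elif lifts[PRIMARY] > 0:
        verdict = "conversion-only win"
    else:
        verdict = "no win"
    return {"lifts": lifts, "verdict": verdict}


# Illustrative numbers only: conversion rises while revenue and quality fall.
control = {"conversion_rate": 0.050, "revenue_per_affiliate": 1000.0, "quality_score": 80.0}
variant = {"conversion_rate": 0.060, "revenue_per_affiliate": 950.0, "quality_score": 70.0}
result = evaluate_variant(control, variant)
print(result["verdict"])
```

Here the variant shows a 20% conversion lift but fails both guardrails, so it is labeled a conversion-only win, not a win.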

Statistical Significance for Affiliate Tests

Standard A/B testing tools assume large sample sizes and independent observations. Affiliate tests violate both assumptions -- you have dozens of partners, not thousands of users, and partner behavior is correlated (they read the same forums, attend the same conferences). Use a practical threshold: a 10% or greater lift that persists for 2+ weeks across multiple sub-segments is actionable. For commission changes that cost significant margin, require a 15%+ lift.
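
The practical threshold can be written down as a small rule. This is a sketch under stated assumptions: "multiple sub-segments" is read here as at least 2, "persists" is read as every one of the most recent weeks meeting the threshold, and the function name, parameters, and sample lifts are hypothetical.

```python
def is_actionable(weekly_lifts, segment_lifts, costs_significant_margin=False,
                  min_weeks=2, min_segments=2):
    """Apply the practical threshold: 10%+ lift (15%+ for costly commission
    changes) over the last min_weeks weeks and in at least min_segments
    sub-segments."""
    threshold = 0.15 if costs_significant_margin else 0.10
    recent = weekly_lifts[-min_weeks:]
    persists = len(recent) >= min_weeks and all(l >= threshold for l in recent)
    segments_passing = sum(1 for l in segment_lifts.values() if l >= threshold)
    return persists and segments_passing >= min_segments


# Illustrative lifts only.
weekly = [0.04, 0.12, 0.13]
segments = {"seo": 0.14, "social": 0.11, "email": 0.03}
print(is_actionable(weekly, segments))
print(is_actionable(weekly, segments, costs_significant_margin=True))
```

The same data passes the 10% bar but fails the 15% bar, which is the intended behavior for commission changes that cost significant margin.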

Scaling a Winning Variant

Phase 1 -- Expand: Roll the winning variant to 50% of the remaining affiliates in that segment. Monitor for 2 weeks.
Phase 2 -- Validate: If the lift holds at 50%, roll to 100% of the segment. The lift may compress 10-20% at full scale.
Phase 3 -- Document: Record the test hypothesis, result, lift magnitude, and any caveats in your experimentation log.
Phase 4 -- Cross-segment: Test whether the winning variant also works in adjacent segments before assuming it is universal.
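
For Phase 1, the 50% expansion group should be chosen at random so that it does not simply contain the partners your team likes best. A minimal sketch, assuming affiliates are identified by string IDs; the function name, the fixed seed, and the sample IDs are hypothetical.

```python
import random


def pick_expansion_group(segment_affiliates, already_on_variant, fraction=0.5, seed=0):
    """Randomly pick a fraction of the affiliates not yet on the winning variant.

    A fixed seed keeps the selection reproducible for the experimentation log.
    """
    remaining = sorted(set(segment_affiliates) - set(already_on_variant))
    rng = random.Random(seed)
    k = round(len(remaining) * fraction)
    return sorted(rng.sample(remaining, k))


# Illustrative IDs only: 10 affiliates in the segment, 2 were in the original test.
segment = [f"aff{i:02d}" for i in range(10)]
tested = ["aff00", "aff01"]
expansion = pick_expansion_group(segment, tested)
print(expansion)
```

The affiliates left out of the expansion group keep the old terms for the 2-week monitoring period, which gives you a comparison group for the Phase 2 check.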

Expect a "scale-down" effect. The variant that won with 30 affiliates may show a smaller lift when rolled to 300 because the original test group was not perfectly representative. Budget for a 10-20% compression in lift when scaling.
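
To budget for compression, discount the test lift before you forecast. The sketch below assumes the 10-20% figure is relative to the observed lift, not a number of percentage points; the function name and the 15% input are illustrative.

```python
def compressed_lift_range(observed_lift, compression=(0.10, 0.20)):
    """Return (pessimistic, optimistic) lift after relative compression."""
    low_compression, high_compression = compression
    pessimistic = observed_lift * (1 - high_compression)
    optimistic = observed_lift * (1 - low_compression)
    return pessimistic, optimistic


low, high = compressed_lift_range(0.15)
print(f"Plan for a lift between {low:.1%} and {high:.1%}")
```

Under that assumption, a 15% lift in the test group should be planned as roughly 12% to 13.5% at full scale.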

Building an Experimentation Backlog

Every test generates new hypotheses. A commission test that reveals tier-2 affiliates respond to RevShare should prompt a follow-up test on what RevShare percentage maximizes LTV. A creative test that shows game-specific landing pages outperform generic ones should spawn tests on which game categories convert for which traffic sources.

Maintain a prioritized backlog of test ideas ranked by expected impact and feasibility
Run one commission test and one creative test at the same time -- they change different levers and generally do not interfere, provided affiliates are assigned to each test independently
Review results monthly and feed learnings into your overall commission strategy and partner segmentation
Share sanitized results with your affiliate managers so they can advise partners based on data, not intuition
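
Ranking the backlog can be as simple as scoring each idea. This sketch assumes a 1-5 scale for both expected impact and feasibility and multiplies them into one score; the scale, the scoring rule, the class name, and the sample ideas are assumptions, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class BacklogIdea:
    name: str
    expected_impact: int  # 1 (low) to 5 (high), assumed scale
    feasibility: int      # 1 (hard) to 5 (easy), assumed scale

    @property
    def score(self):
        return self.expected_impact * self.feasibility


def prioritize(ideas):
    """Return ideas sorted from highest to lowest score."""
    return sorted(ideas, key=lambda idea: idea.score, reverse=True)


# Illustrative ideas only.
backlog = [
    BacklogIdea("RevShare percentage for tier-2 affiliates", 5, 3),
    BacklogIdea("Game-specific landing pages by traffic source", 4, 4),
    BacklogIdea("New banner set", 2, 5),
]
ranked = prioritize(backlog)
for idea in ranked:
    print(idea.score, idea.name)
```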

Create a simple experimentation log in your reporting dashboard: test name, hypothesis, start/end date, result, and next action. After 6 months, this log becomes your most valuable strategic asset -- a data-backed map of what actually drives your program.
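
The log fields listed above map directly onto a small record type. The following is one possible sketch that exports entries as CSV for a reporting dashboard; the class name, field names, and the sample entry are hypothetical.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date


@dataclass
class LogEntry:
    test_name: str
    hypothesis: str
    start_date: date
    end_date: date
    result: str
    next_action: str


def to_csv(entries):
    """Serialize log entries to CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(LogEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))  # dates are written in ISO format
    return buffer.getvalue()


# Illustrative entry only, not a real test result.
log = [
    LogEntry(
        test_name="Tier-2 RevShare vs CPA",
        hypothesis="Tier-2 affiliates send higher-LTV users on RevShare",
        start_date=date(2024, 1, 8),
        end_date=date(2024, 2, 5),
        result="Variant led for 3 consecutive weeks",
        next_action="Test which RevShare percentage maximizes LTV",
    )
]
csv_text = to_csv(log)
print(csv_text)
```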

Key Takeaways

Do not call tests early -- require a minimum of 4 weeks for commission tests and 100+ conversions per group for CPA tests
Always measure revenue, conversion rate, and quality together -- a conversion lift without profitability is meaningless
Use a practical significance threshold (10%+ lift for 2+ weeks) rather than strict statistical tests for small samples
Expect 10-20% lift compression when scaling a winning variant from test group to full program
Maintain an experimentation backlog and log -- after 6 months it becomes your most valuable strategic asset
