Lesson 5 of 5

Measuring Results and Scaling Winners

When Is a Test "Done"?

The most common experimentation mistake is calling a test too early. A 15% lift after one week is not a result -- it is most likely noise. Affiliate programs have high variance because a single large affiliate can swing your numbers in either direction. You need enough data to be confident the difference is real, not random.

  • Minimum run time: 4 weeks for commission tests, 2 weeks for creative tests
  • Minimum sample: 100+ conversions per group for CPA tests, 60+ for RevShare (RevShare revenue accumulates more slowly)
  • Stability check: The directional winner should stay the same for at least 2 consecutive weeks before you declare a result
  • Outlier audit: Remove or flag any affiliate contributing more than 25% of a group's total volume -- no single affiliate should determine the result alone
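
The checklist above can be sketched as a small readiness check. This is a minimal illustration, not a prescribed implementation: the thresholds come from the list, while GroupStats, readiness_issues, and the sample data are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass

# Thresholds copied from the checklist above; every name here is illustrative.
MIN_WEEKS = {"commission": 4, "creative": 2}
MIN_CONVERSIONS = {"cpa": 100, "revshare": 60}
MAX_AFFILIATE_SHARE = 0.25
MIN_STABLE_WEEKS = 2


@dataclass
class GroupStats:
    conversions: int
    volume_by_affiliate: dict  # affiliate id -> volume contributed to this group


def dominant_affiliates(group):
    """Affiliates contributing more than 25% of the group's total volume."""
    total = sum(group.volume_by_affiliate.values())
    if total == 0:
        return []
    return [aff for aff, vol in group.volume_by_affiliate.items()
            if vol / total > MAX_AFFILIATE_SHARE]


def stable_weeks(weekly_winners):
    """Number of most recent consecutive weeks with the same directional winner."""
    streak = 0
    for winner in reversed(weekly_winners):
        if winner != weekly_winners[-1]:
            break
        streak += 1
    return streak


def readiness_issues(test_type, payout_model, weeks_run, groups, weekly_winners):
    """Return the reasons a test is not ready to call (empty list = ready)."""
    issues = []
    if weeks_run < MIN_WEEKS[test_type]:
        issues.append(f"run time {weeks_run}w is below {MIN_WEEKS[test_type]}w")
    for name, group in groups.items():
        needed = MIN_CONVERSIONS[payout_model]
        if group.conversions < needed:
            issues.append(f"{name}: {group.conversions} conversions, need {needed}+")
        for aff in dominant_affiliates(group):
            issues.append(f"{name}: {aff} exceeds 25% of volume, flag or remove")
    if stable_weeks(weekly_winners) < MIN_STABLE_WEEKS:
        issues.append("winner not consistent for 2 consecutive weeks")
    return issues


# Made-up sample data: the variant group is dominated by affiliate b1.
groups = {
    "control": GroupStats(120, {"a1": 20, "a2": 20, "a3": 20, "a4": 20, "a5": 20, "a6": 20}),
    "variant": GroupStats(100, {"b1": 70, "b2": 10, "b3": 10, "b4": 10}),
}
issues = readiness_issues("commission", "cpa", 4, groups, ["control", "variant", "variant"])
for issue in issues:
    print(issue)
```

An empty list means every condition in the checklist is met. Here the run time, sample size, and stability checks pass, but the outlier audit flags b1, so the result should not be declared until that affiliate is reviewed.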

Metrics That Matter

Metric                | What It Tells You                          | Watch For
Revenue per affiliate | Overall program efficiency                 | Skew from top performers
Conversion rate       | Offer attractiveness                       | Quality degradation (high conversion + low deposit)
Traffic quality score | Whether the variant attracts real users    | Score drops that lag behind conversion increases
Affiliate churn rate  | Partner satisfaction with the new model    | Delayed churn -- partners may leave 30-60 days after a test ends
Cost per acquisition  | True cost including commissions + overhead | CPA alone misses RevShare long-tail costs
Player/customer LTV   | Downstream value of acquired users         | Requires 60-90 day lookback for meaningful data

A test can "win" on conversion rate but lose on profitability. Always measure at least one revenue metric and one quality metric alongside your primary conversion metric. A 20% conversion lift is worthless if the acquired users churn within a week.
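
One way to make this rule concrete is to treat the revenue and quality metrics as guardrails on the primary metric. The sketch below assumes all guardrails are "higher is better" metrics; evaluate_variant, the metric keys, and the numbers are hypothetical and only illustrate the idea.

```python
def relative_change(control, variant):
    """Fractional change of variant over control, e.g. 0.20 for a 20% lift."""
    return (variant - control) / control


def evaluate_variant(control, variant, primary, guardrails, max_drop=0.0):
    """Compare a variant on its primary metric and on guardrail metrics.

    A guardrail fails when it drops by more than max_drop (a fraction).
    All guardrails are assumed to be metrics where higher is better.
    """
    lifts = {m: relative_change(control[m], variant[m]) for m in (primary, *guardrails)}
    failed = [m for m in guardrails if lifts[m] < -max_drop]
    return {
        "lifts": lifts,
        "failed_guardrails": failed,
        "passes": lifts[primary] > 0 and not failed,
    }


# Made-up numbers for illustration only.
control = {"conversion_rate": 0.050, "revenue_per_affiliate": 1000.0, "traffic_quality_score": 80.0}
variant = {"conversion_rate": 0.060, "revenue_per_affiliate": 900.0, "traffic_quality_score": 70.0}

result = evaluate_variant(
    control,
    variant,
    primary="conversion_rate",
    guardrails=["revenue_per_affiliate", "traffic_quality_score"],
)
print(result["passes"], result["failed_guardrails"])
```

In this example the variant lifts conversion rate by about 20% but fails both guardrails, so it does not count as a win.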

Statistical Significance for Affiliate Tests

Standard A/B testing tools assume large sample sizes and independent observations. Affiliate tests violate both assumptions -- you have dozens of partners, not thousands of users, and partner behavior is correlated (they read the same forums, attend the same conferences). Use a practical threshold: a 10% or greater lift that persists for 2+ weeks across multiple sub-segments is actionable. For commission changes that cost significant margin, require a 15%+ lift.
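
The practical threshold can be written down as a simple rule. This sketch makes two assumptions that the text does not spell out: "multiple sub-segments" is read as at least two, and persistence is checked on the two most recent weekly lifts of each segment. The function and segment names are hypothetical.

```python
STANDARD_THRESHOLD = 0.10       # 10% lift
MARGIN_COSTLY_THRESHOLD = 0.15  # 15% lift for changes that cost significant margin
MIN_WEEKS_PERSISTING = 2
MIN_SEGMENTS = 2                # assumption: "multiple" read as at least two


def segment_holds(weekly_lifts, threshold, weeks=MIN_WEEKS_PERSISTING):
    """True if the most recent `weeks` weekly lifts all meet the threshold."""
    recent = weekly_lifts[-weeks:]
    return len(recent) == weeks and all(lift >= threshold for lift in recent)


def is_actionable(lifts_by_segment, margin_costly=False):
    """Return (actionable, segments where the lift persisted)."""
    threshold = MARGIN_COSTLY_THRESHOLD if margin_costly else STANDARD_THRESHOLD
    holding = [seg for seg, lifts in lifts_by_segment.items()
               if segment_holds(lifts, threshold)]
    return len(holding) >= MIN_SEGMENTS, holding


# Made-up weekly lifts (as fractions) per sub-segment, oldest week first.
lifts = {
    "tier1": [0.05, 0.12, 0.14],
    "tier2": [0.11, 0.13, 0.10],
    "tier3": [0.20, 0.02, 0.01],
}
print(is_actionable(lifts))
print(is_actionable(lifts, margin_costly=True))
```

With the standard threshold, tier1 and tier2 hold for two weeks and the result is actionable. Under the stricter 15% bar for margin-costly commission changes, no segment qualifies.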

Scaling a Winning Variant

  • Phase 1 -- Expand: Roll the winning variant to 50% of the remaining affiliates in that segment. Monitor for 2 weeks.
  • Phase 2 -- Validate: If the lift holds at 50%, roll to 100% of the segment. The lift may compress 10-20% at full scale.
  • Phase 3 -- Document: Record the test hypothesis, result, lift magnitude, and any caveats in your experimentation log.
  • Phase 4 -- Cross-segment: Test whether the winning variant also works in adjacent segments before assuming it is universal.

Expect a "scale-down" effect. The variant that won with 30 affiliates may show a smaller lift when rolled to 300 because the original test group was not perfectly representative. Budget for a 10-20% compression in lift when scaling.
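
The compression budget is simple arithmetic. The sketch below assumes the 10-20% figure is relative to the test lift (a 15% lift shrinks by a tenth to a fifth of its size), not a drop in percentage points; planned_lift_range and lift_holds are hypothetical helper names.

```python
def planned_lift_range(test_lift, compression=(0.10, 0.20)):
    """Range of lift to budget for at full scale.

    Assumes compression is relative to the test lift, not percentage points.
    Returns (worst case, best case) as fractions.
    """
    low_c, high_c = compression
    return test_lift * (1 - high_c), test_lift * (1 - low_c)


def lift_holds(scaled_lift, test_lift, max_compression=0.20):
    """True if the lift seen at scale is no worse than the budgeted worst case."""
    return scaled_lift >= test_lift * (1 - max_compression)


low, high = planned_lift_range(0.15)
print(f"budget for a lift between {low:.1%} and {high:.1%}")
print(lift_holds(0.13, 0.15))  # within the budgeted compression
print(lift_holds(0.10, 0.15))  # compressed more than budgeted
```

A 15% test lift therefore translates into a planning range of roughly 12% to 13.5% at full scale. A lift that falls below the worst case during Phase 1 or Phase 2 is a signal to pause the rollout and investigate.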

Building an Experimentation Backlog

Every test generates new hypotheses. A commission test that reveals tier-2 affiliates respond to RevShare should prompt a follow-up test on what RevShare percentage maximizes LTV. A creative test that shows game-specific landing pages outperform generic ones should spawn tests on which game categories convert for which traffic sources.

  • Maintain a prioritized backlog of test ideas ranked by expected impact and feasibility
  • Run one commission test and one creative test simultaneously -- they generally do not interfere with each other as long as affiliates are assigned to each test independently
  • Review results monthly and feed learnings into your overall commission strategy and partner segmentation
  • Share sanitized results with your affiliate managers so they can advise partners based on data, not intuition
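
A prioritized backlog can be as simple as a scored list. The sketch below assumes a 1-5 scale for impact and feasibility and ranks by their product; both the scale and the scoring rule are illustrative choices, and BacklogIdea and the example ideas are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BacklogIdea:
    name: str
    kind: str             # "commission" or "creative"
    expected_impact: int  # 1 (low) to 5 (high), hypothetical scale
    feasibility: int      # 1 (hard) to 5 (easy), hypothetical scale

    @property
    def score(self):
        # Assumed scoring rule: impact times feasibility.
        return self.expected_impact * self.feasibility


def prioritize(backlog):
    """Highest-scoring ideas first."""
    return sorted(backlog, key=lambda idea: idea.score, reverse=True)


def next_tests(backlog):
    """Pick the top-ranked idea of each kind: one commission, one creative."""
    picks = {}
    for idea in prioritize(backlog):
        picks.setdefault(idea.kind, idea)
    return picks


backlog = [
    BacklogIdea("RevShare percentage for tier-2 affiliates", "commission", 5, 3),
    BacklogIdea("Game-specific landing pages by traffic source", "creative", 4, 4),
    BacklogIdea("Hybrid CPA + RevShare for new partners", "commission", 3, 2),
    BacklogIdea("Banner copy refresh", "creative", 2, 5),
]

for kind, idea in next_tests(backlog).items():
    print(kind, "->", idea.name, f"(score {idea.score})")
```

Picking one idea per kind mirrors the guidance to run one commission test and one creative test at a time.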

Create a simple experimentation log in your reporting dashboard: test name, hypothesis, start/end date, result, and next action. After 6 months, this log becomes your most valuable strategic asset -- a data-backed map of what actually drives your program.
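
As a starting point, the log can be a flat record with exactly the fields listed above. This is a minimal sketch that exports to CSV for import into a dashboard; LogEntry, to_csv, and the sample entry (including its dates and result) are placeholders, not real data.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date


@dataclass
class LogEntry:
    test_name: str
    hypothesis: str
    start_date: date
    end_date: date
    result: str
    next_action: str


def to_csv(entries):
    """Serialize log entries to CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(LogEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))  # dates are written in ISO format
    return buffer.getvalue()


# Placeholder entry for illustration only.
log = [
    LogEntry(
        test_name="Tier-2 RevShare vs CPA",
        hypothesis="Tier-2 affiliates send higher-LTV users under RevShare",
        start_date=date(2024, 1, 8),
        end_date=date(2024, 2, 5),
        result="RevShare won on LTV; conversion flat",
        next_action="Test which RevShare percentage maximizes LTV",
    )
]
csv_text = to_csv(log)
print(csv_text)
```

Keeping the schema this small makes it easy to fill in after every test; extra columns such as lift magnitude or caveats can be added once the habit is established.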

Key Takeaways

  • Do not call tests early -- require a minimum of 4 weeks and 100+ conversions per group for commission tests
  • Always measure revenue, conversion rate, and quality together -- a conversion lift without profitability is meaningless
  • Use a practical significance threshold (10%+ lift for 2+ weeks) rather than strict statistical tests for small samples
  • Expect 10-20% lift compression when scaling a winning variant from test group to full program
  • Maintain an experimentation backlog and log -- after 6 months it becomes your most valuable strategic asset