Experiment Overview

At the time of this experiment, Udacity course overview pages had two options: “start free trial” and “access course materials”. If the student clicks “start free trial”, they will be asked to enter their credit card information, and then they will be enrolled in a free trial for the paid version of the course. After 14 days, they will automatically be charged unless they cancel first. If the student clicks “access course materials”, they will be able to view the videos and take the quizzes for free, but they will not receive coaching support or a verified certificate, nor will they be able to submit their final project for feedback.

In the experiment, Udacity tested a change where if the student clicked “start free trial”, they were asked how much time they had available to devote to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead. [Screenshot of the time-commitment screener omitted.]

Experiment Design

Hypothesis

The hypothesis was that this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn’t have enough time—without significantly reducing the number of students who continue past the free trial and eventually complete the course. If this hypothesis held true, Udacity could improve the overall student experience and free up coaches’ capacity to support students who are likely to complete the course.

H0: The treatment has no effect on the number of people who enroll in the free trial.
HA: The treatment reduces the number of people who enroll in the free trial.

H0: The treatment has no effect on the share of people who leave the free trial.
HA: The treatment reduces the share of people who leave the free trial.

H0: The treatment has no effect on the number of people who continue past the free trial.
HA: The treatment affects the number of people who continue past the free trial.

Unit of Diversion

The unit of diversion is a cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users who do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.

Metric Choice

Note: Any place “unique cookies” are mentioned, the uniqueness is determined by day. (That is, the same cookie visiting on different days would be counted twice.) User-ids are automatically unique since the site does not allow the same user-id to enroll twice.

Invariant Metrics: number of cookies, number of clicks, click-through-probability

Evaluation Metrics: gross conversion, retention, net conversion

Invariant Metrics

Number of cookies: That is, the number of unique cookies to view the course overview page, which is also the unit of diversion. We expect an even split of cookies between the experiment and control groups, so the two groups are comparable.

Number of clicks: That is, the number of unique cookies to click the “Start free trial” button. Users click this button before seeing the experimental change, so at this point in the funnel the experience is identical across experiment and control. We expect an even split.

Click-through probability: That is, the number of unique cookies to click the “Start free trial” button divided by the number of unique cookies to view the course overview page. For the same reason as above, the experience at this point in the funnel is identical across the two groups, so we expect an even split.

Evaluation Metrics

Retention: That is, the number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of user-ids to complete checkout (dmin = 0.01). We expect retention to increase in the experiment group.

Gross conversion: That is, the number of user-ids to complete checkout and enroll in the free trial divided by the number of unique cookies to click the “Start free trial” button (dmin = 0.01). We expect gross conversion to decrease in the experiment group, because the intervention (the 5-or-more-hours expectation) could discourage students from completing checkout.

Net conversion: That is, the number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the “Start free trial” button (dmin = 0.0075). We expect net conversion to increase in the experiment group.
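
In symbols, writing C for the unique cookies that click “Start free trial”, E for the user-ids that complete checkout, and P for the user-ids that make at least one payment:

Gross conversion = E / C
Retention = P / E
Net conversion = P / C = Gross conversion × Retention

so net conversion is the product of the other two evaluation metrics.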

Estimate of Standard Error

The standard errors are calculated analytically for a sample size of 5,000 unique cookies visiting the course overview page.

Baseline Values and Scaling

# Baseline values (given)
n_pageviews = 40000   # daily unique cookies viewing the course overview page
n_clicks = 3200       # daily unique cookies clicking "Start free trial"
n_enroll = 660        # daily enrollments

click_through_probability = 0.08   # clicks / pageviews
gross_conversion = 0.20625         # enrollments / clicks
retention = 0.53                   # payments / enrollments
net_conversion = 0.1093125         # payments / clicks

# Scale the counts down to a sample of 5000 pageviews
factor = 5000 / 40000
n_pageviews = n_pageviews * factor
n_clicks = n_clicks * factor
n_enroll = n_enroll * factor

Check N for Normal Distribution

check <- function(n, p, metric) {
  # The normal approximation is reasonable if p +/- 3*SE stays inside (0, 1),
  # which is equivalent to n > 9*(1-p)/p and n > 9*p/(1-p)
  if (n > 9 * ((1-p)/p) & n > 9 * (p/(1-p))){
    cat(metric, 'pass', '\n')
  }
  else{
    cat(metric, 'does not pass', '\n')
  }
}

check(n_clicks, gross_conversion, 'Gross conversion')
## Gross conversion pass
check(n_clicks, net_conversion, 'Net conversion')
## Net conversion pass
check(n_enroll, retention, 'Retention')
## Retention pass

Since the unit of diversion is the same as the unit of analysis for each evaluation metric (cookies for gross conversion and net conversion, user-ids for retention), and N is large enough to assume a normal distribution, we can calculate the standard errors analytically.
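
For a proportion $p$ estimated from $n$ units, the analytic standard error is

$$SE = \sqrt{\frac{p\,(1-p)}{n}}$$

which is what the helper below computes for each metric at the 5,000-pageview sample size.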

Compute Standard Errors Analytically

# Standard error of a proportion; named se to avoid masking base R's sd()
se <- function(n, p){
  return (sqrt(p*(1-p)/n))
}
se_GC = se(n_clicks, gross_conversion)
se_NC = se(n_clicks, net_conversion)
se_Rt = se(n_enroll, retention)

Sizing

We set the alpha level to 0.05 and statistical power to 0.8 (beta = 0.2). The sample sizes are calculated using an online sample-size calculator.
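
As a rough cross-check, base R’s power.prop.test gives per-group sizes in the same ballpark; it uses a slightly different normal approximation than the online calculator, so expect similar rather than identical numbers:

# per-group sample sizes at alpha = 0.05, power = 0.8
power.prop.test(p1 = 0.20625, p2 = 0.20625 - 0.01, sig.level = 0.05, power = 0.8)        # gross conversion
power.prop.test(p1 = 0.53, p2 = 0.53 - 0.01, sig.level = 0.05, power = 0.8)              # retention
power.prop.test(p1 = 0.1093125, p2 = 0.1093125 - 0.0075, sig.level = 0.05, power = 0.8)  # net conversion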

Bonferroni correction

Bonferroni correction is not needed here. The evaluation metrics are highly correlated, so a Bonferroni correction would be too conservative.

Gross Conversion

Baseline conversion rate: 0.20625
Minimum Detectable Effect: 0.01
Sample size needed: 25,835 clicks per group
Number of groups: 2
Click-through probability: 0.08
Total pageviews required: 25,835 * 2 / 0.08 = 645,875

Retention

Baseline conversion rate: 0.53
Minimum Detectable Effect: 0.01
Sample size needed: 39,115 enrollments per group
Number of groups: 2
Enrollments/pageview: 82.5 / 5000 = 0.0165
Total pageviews required: 39,115 * 2 / 0.0165 = 4,741,212

Net Conversion

Baseline conversion rate: 0.1093125
Minimum Detectable Effect: 0.0075
Sample size needed: 27,413 clicks per group
Number of groups: 2
Click-through probability: 0.08
Total pageviews required: 27,413 * 2 / 0.08 = 685,325

The maximum pageview requirement across these three metrics is 4,741,212, driven by retention.

Duration and Exposure

Given that no other experiment is running and the number of pageviews required is so large, I will divert 100% of traffic to this experiment. Diverting 100% of traffic is usually risky, but not so much in this case: this is not a very risky experiment, and we do not expect the intervention to have the potential to severely hurt enrollment.

At 100% traffic we collect 40,000 pageviews per day:
4,741,212 pageviews (all three metrics) would take 119 days
685,325 pageviews (gross conversion and net conversion only) would take 18 days
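
The day counts are simply the pageview requirements divided by daily traffic, rounded up:

ceiling(c(all_three_metrics = 4741212, gc_and_nc_only = 685325) / 40000)
# returns 119 and 18 days respectively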

An 18-day experiment is clearly more reasonable and preferable. Running the experiment for 119 days is also riskier: although it is unlikely, if the treatment harms the user experience, we would not notice until roughly four months later, and we could not run any other experiment in the meantime. I would therefore recommend not testing retention and discarding the second hypothesis; 119 days is simply not practical.

Data Analysis

# dplyr is needed later for the %>% pipe and mutate()
library(dplyr)

control <- read.csv("Final Project Results - Control.csv")
exp <- read.csv("Final Project Results - Experiment.csv")

Sanity Check

We should run sanity checks on all the invariant metrics defined earlier. However, the results files only contain daily pageviews, clicks, enrollments, and payments, so we cannot sanity check the number of cookies or the click-through probability directly (we do not have cookie-level or unique-visitor data). Since we cannot check cookies, a sanity check on pageviews is the second best option.
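
Under the null hypothesis that traffic is split evenly, each pageview (or click) lands in the control group with probability 0.5, so the observed control fraction should fall inside

$$0.5 \pm 1.96\sqrt{\frac{0.5 \times 0.5}{N}}$$

where $N$ is the combined total across both groups. This is exactly what the code below computes.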

Number of clicks

# column 3 is Clicks; avoid naming a variable after base R's sum()
total_clicks = sum(control[,3]) + sum(exp[,3])
sd_click = sqrt(0.5*0.5/total_clicks)
i_click = 1.96 * sd_click
ci_min_click = 0.5 - i_click
ci_max_click = 0.5 + i_click
observe_click = sum(control[,3])/total_clicks

Number of Pageviews

# column 2 is Pageviews
total_pageviews = sum(control[,2]) + sum(exp[,2])
sd_pageview = sqrt(0.5*0.5/total_pageviews)
i_pageview = 1.96 * sd_pageview
ci_min_pageview = 0.5 - i_pageview
ci_max_pageview = 0.5 + i_pageview
observe_pageview = sum(control[,2])/total_pageviews

Sanity Check Summary

df = data.frame(Metric = c('num_clicks', 'num_pageview'),
                Expected = c(0.5, 0.5),
                Observed = c(observe_click, observe_pageview),
                CI_Lower = c(ci_min_click, ci_min_pageview),
                CI_Upper = c(ci_max_click, ci_max_pageview)
                )
df
##         Metric Expected  Observed  CI_Lower  CI_Upper
## 1   num_clicks      0.5 0.5004673 0.4958845 0.5041155
## 2 num_pageview      0.5 0.5006397 0.4988204 0.5011796
# All passed

Effect Size Tests

Data Cleaning

# Drop the days with missing Enrollments/Payments (leaves 23 complete days)
control <- na.omit(control)
exp <- na.omit(exp)

Sum

# Daily totals summed across the experiment period
clicks_cont = sum(control[,3])
enroll_cont = sum(control[,4])
pay_cont = sum(control[,5])
clicks_exp = sum(exp[,3])
enroll_exp = sum(exp[,4])
pay_exp = sum(exp[,5])
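
Both effect-size tests below use the standard pooled two-proportion confidence interval:

$$\hat{d} = \hat{p}_{exp} - \hat{p}_{cont}, \qquad p_{pool} = \frac{X_{exp} + X_{cont}}{N_{exp} + N_{cont}}, \qquad SE = \sqrt{p_{pool}(1-p_{pool})\left(\frac{1}{N_{cont}} + \frac{1}{N_{exp}}\right)}$$

with a 95% interval of $\hat{d} \pm 1.96 \cdot SE$.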

Gross Conversion

d_hat=(enroll_exp/clicks_exp)-(enroll_cont/clicks_cont)
p_pool=(enroll_exp+enroll_cont)/(clicks_exp+clicks_cont)
SE = sqrt((p_pool) * (1-p_pool)*(1/clicks_cont+1/clicks_exp))
CI = SE * 1.96

cat('Observed:', d_hat, '\n')
## Observed: -0.02055487
cat('Confidence Interval:', d_hat-CI, d_hat+CI)
## Confidence Interval: -0.02912336 -0.01198639

The result is statistically significant because the confidence interval does not include 0. It is also practically significant: the entire interval lies below -dmin = -0.01, so the decrease exceeds the practical significance boundary.

Net Conversion

d_hat=(pay_exp/clicks_exp)-(pay_cont/clicks_cont)
p_pool=(pay_exp+pay_cont)/(clicks_exp+clicks_cont)
SE = sqrt((p_pool) * (1-p_pool)*(1/clicks_cont+1/clicks_exp))
CI = SE * 1.96

cat('Observed:', d_hat, '\n')
## Observed: -0.004873723
cat('Confidence Interval:', d_hat-CI, d_hat+CI)
## Confidence Interval: -0.01160462 0.001857179

The result is not statistically significant because the confidence interval contains 0, and it is not practically significant. Note, however, that the lower bound extends below -0.0075, so a practically significant decrease in net conversion cannot be ruled out.

Sign Test
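
As a non-parametric check, the sign test treats each day as an independent trial: under the null hypothesis of no effect, the experiment group beats the control group on any given day with probability 0.5, so the number of days the experiment group comes out ahead follows a Binomial(n_days, 0.5) distribution, which binom.test evaluates exactly.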

df <- merge(control, exp, by='Date', all.x=TRUE)
colnames(df)=c('Date', 'Pageview_cont', 'Clicks_cont', 'Enrollments_cont', 'Payment_cont', 'Pageview_exp', 'Clicks_exp', 'Enrollments_exp', 'Payment_exp')

Gross Conversion

df <- df %>% 
  mutate(GC_cont = Enrollments_cont/Clicks_cont) %>% 
  mutate(GC_exp = Enrollments_exp/Clicks_exp)

diff <- sum(ifelse(df$GC_exp > df$GC_cont, 1, 0))
binom.test(diff, nrow(df), 0.5)
## 
##  Exact binomial test
## 
## data:  diff and nrow(df)
## number of successes = 4, number of trials = 23, p-value = 0.002599
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.04950765 0.38781189
## sample estimates:
## probability of success 
##               0.173913
# Statistically Significant. P-value less than 0.05

Net Conversion

df <- df %>% 
  mutate(NC_cont = Payment_cont/Clicks_cont) %>% 
  mutate(NC_exp = Payment_exp/Clicks_exp)

diff <- sum(ifelse(df$NC_exp > df$NC_cont, 1, 0))
binom.test(diff, nrow(df), 0.5)
## 
##  Exact binomial test
## 
## data:  diff and nrow(df)
## number of successes = 13, number of trials = 23, p-value = 0.6776
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.3449466 0.7680858
## sample estimates:
## probability of success 
##              0.5652174
# Not Statistically Significant. P-value greater than 0.05

Summary and Recommendations

The goal of this experiment was to test whether setting clearer time expectations upfront could reduce the number of students who leave the free trial out of frustration, without significantly reducing the number of students who continue past the free trial and eventually complete the course.

The results indicate a statistically and practically significant decrease in gross conversion, and a statistically non-significant decrease in net conversion whose confidence interval extends beyond -dmin. In other words, the intervention causes fewer students to enroll in the free trial, and it does not improve the number of students who remain enrolled past the 14-day boundary (and thus make at least one payment). My recommendation is therefore not to launch.