A/B Testing

A class assignment exploring A/B testing. While design changes were made, the focus was on the analysis process rather than on the design itself. The design changes were decided on with a group of four other students: Robert Wang, Carmen Schweizer, Catherine Hong, and Min Ju Kim.

The scenario for this assignment was that we were investigating new ways of making advertisements more attractive for users to interact with. These advertisements are part of a directory page, where they have to compete with direct competitors selling the same service.

Personal role: I performed the statistical analysis individually and was in charge of the HTML implementation of the design changes.

Questions

  • How do you improve a product?
  • How can you determine if a design change has significantly impacted your users' experience?

Outcome

Statistical analysis of the data collected for the design change showed no significant difference in how users interacted with the design, for any of the metrics collected.

Measured Metrics

  • Time to click
  • Dwell time
  • Clickthrough rate
  • Return rate

Changes to Design

We made changes to attract attention to the advertisement. Our white background differs from the rest of the page, which consists of plain text links on a yellow background; this helps our listing stand out from all the others. The type is larger, to make the link easier to see and click. The description was rewritten to appeal to people’s thrill-seeking and celebrity-obsessed personalities. Because of these changes, I hypothesized that users would click on the advertisement faster and more often, and would spend more time on and be more likely to return to the linked page.

Data Collection

Because this assignment was focused on teaching the basics of A/B testing, no specific audience was targeted. Participants were sourced from Mechanical Turk, which was an easy and cheap way to obtain them. Each participant was sent to either test scenario A or test scenario B, with A being the webpage containing the original version and B the webpage containing the redesigned version.
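The routing of participants between the two scenarios was handled by the testing setup itself, but purely as an illustration of how a deterministic 50/50 split can work, here is a minimal Python sketch; the function name, URLs, and visitor id format are hypothetical:

    import hashlib

    # Hypothetical URLs for the two test scenarios.
    VARIANT_URLS = {
        "A": "https://example.com/directory/original",
        "B": "https://example.com/directory/redesign",
    }

    def assign_variant(visitor_id: str) -> str:
        # Hash the visitor id so a returning visitor always lands in the
        # same scenario, giving a roughly even 50/50 split overall.
        digest = hashlib.md5(visitor_id.encode("utf-8")).hexdigest()
        return "A" if int(digest, 16) % 2 == 0 else "B"

    print(VARIANT_URLS[assign_variant("visitor-0001")])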

A list of areas to improve data collection, excerpted from the data results, is not reproduced here.

Data collected was returned as a CSV. Each visitor was given a unique identifier, which allowed us to track when they visited, whether and when they clicked on our advertisement, how long they stayed on the linked page, and whether they returned in a later session.
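As a rough sketch of how such a CSV could be aggregated per scenario, the snippet below uses pandas; the file name and column names (visitor_id, scenario, event, timestamp) are assumptions rather than the actual export format:

    import pandas as pd

    # Assumed columns: visitor_id, scenario ("A"/"B"), event ("view"/"click"/"return"), timestamp
    events = pd.read_csv("ab_test_events.csv", parse_dates=["timestamp"])

    # Unique visitors and unique clickers per scenario.
    visitors = events.groupby("scenario")["visitor_id"].nunique()
    clickers = (events[events["event"] == "click"]
                .groupby("scenario")["visitor_id"].nunique())

    # Clickthrough rate: unique clickers divided by unique visitors.
    print((clickers / visitors).round(4))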

Data Evaluation

Time to click

Test Scenario A: 23, 37, 5, 3, 4, 4, 1, 12, 9, 11, 5 (seconds)
average of 10.364 seconds (3 d.p.)

Test Scenario B: 4, 5, 9, 5, 15, 6, 6, 3, 7, 7, 65, 4 (seconds)
average of 11.333 seconds (3 d.p.)

Increase of 0.969 seconds in the average time it takes to click on the advertisement

Evaluation method: t-test
Null hypothesis: There is no difference in time to click between the two interfaces (significance level 0.05).

p = 0.87, so there is no significant difference between the two versions (p > 0.05) and we fail to reject the null hypothesis. Our bigger title and click area did not actually make the advertisement quicker to click on, so this hypothesis was not met.
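A minimal sketch of this test with SciPy, using the time-to-click samples listed above; the default Student's t-test (equal variances assumed) should reproduce roughly the reported p = 0.87:

    from scipy import stats

    time_to_click_a = [23, 37, 5, 3, 4, 4, 1, 12, 9, 11, 5]
    time_to_click_b = [4, 5, 9, 5, 15, 6, 6, 3, 7, 7, 65, 4]

    # Two-sample t-test with equal variances assumed; Welch's variant
    # (equal_var=False) is a common alternative when variances differ.
    t_stat, p_value = stats.ttest_ind(time_to_click_a, time_to_click_b)
    print(f"t = {t_stat:.3f}, p = {p_value:.2f}")  # p is approximately 0.87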

Dwell Time

Test Scenario A: 95, 6 (seconds)
average of 50.5 seconds

Test Scenario B: 5 (seconds)
average of 5 seconds

Decrease of 45.5 seconds in average dwell time

Evaluation method: t-test
Null hypothesis: There is no difference in dwell time between the two interfaces (significance level 0.05).

A p-value could not be calculated because there is not enough data: scenario A has only two dwell-time measurements and scenario B only one.
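To illustrate why the test breaks down, the sample variance of a one-observation group is undefined (it divides by n - 1 = 0), so the t statistic cannot be formed; a short sketch using the dwell times above:

    import numpy as np

    dwell_a = np.array([95.0, 6.0])
    dwell_b = np.array([5.0])

    # Sample variance uses n - 1 degrees of freedom; for a single observation
    # this is 0/0, so NumPy warns and returns NaN.
    print(np.var(dwell_a, ddof=1))  # 3960.5
    print(np.var(dwell_b, ddof=1))  # nan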

Clickthrough Rate

The number of unique visitors that clicked on our advertisement, divided by the number of unique sessions.
Test Scenario A: 11/214 = 0.0514 (3 s.f.) Clickthrough rate is 5.14%
Test Scenario B: 12/236 = 0.0508 (3 s.f.) Clickthrough rate is 5.08%
Decrease of 0.06 percentage points in clickthrough rate

Evaluation method: chi-squared test
Null hypothesis: There is no difference in clickthrough rate between the two interfaces (significance level 0.05).

p = 0.98, so there is no significant difference between the two versions (p > 0.05) and we fail to reject the null hypothesis. Although there was a slight decrease in clickthrough rate, suggesting our changes were not effective, the difference was not significant. Our changes were probably not strong enough to increase the clickthrough rate, so this hypothesis was not met.
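A minimal sketch of this test with SciPy, building the 2x2 contingency table (clicked vs. did not click) from the counts above; without Yates' continuity correction it should reproduce roughly the reported p = 0.98:

    from scipy.stats import chi2_contingency

    # Rows: scenario A, scenario B; columns: clicked, did not click.
    contingency = [[11, 214 - 11],
                   [12, 236 - 12]]

    chi2, p_value, dof, expected = chi2_contingency(contingency, correction=False)
    print(f"chi2 = {chi2:.4f}, p = {p_value:.2f}")  # p is approximately 0.98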

Return Rate

The number of return sessions by visitors that have at least one click event, divided by the number of unique visitors that clicked on our advertisement.
Test Scenario A: 2/11 = 0.182 (3 s.f.) Return rate is 18.2%
Test Scenario B: 1/12 = 0.0833 (3 s.f.) Return rate is 8.33%
Decrease of 9.85 percentage points in return rate

Evaluation method: Fisher's exact test
Null hypothesis: There is no difference in return rate between the two interfaces (significance level 0.05).

p = 0.59, so there is no statistically significant difference between the two versions (p > 0.05) and we fail to reject the null hypothesis. The rewritten description did not seem to matter in this case: although its content was very different, it did not affect whether users returned, so this hypothesis was not met.
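A minimal sketch of this test with SciPy, using the counts of returning and non-returning clickers from each scenario; the two-sided test should reproduce roughly the reported p = 0.59:

    from scipy.stats import fisher_exact

    # Rows: scenario A, scenario B; columns: returned, did not return (of clickers).
    returns_table = [[2, 11 - 2],
                     [1, 12 - 1]]

    odds_ratio, p_value = fisher_exact(returns_table, alternative="two-sided")
    print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.2f}")  # p is approximately 0.59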