Introducing Sliced, Fiverr’s Awesome A/B Testing Framework

This is the first blog post in a series about Sliced. While this one will introduce you to some basics, the next posts will get deeper into the technical aspects of the tool, including its architecture, its special features, and the different Sliced clients: a JavaScript client for the front end, Ruby for the back end, and clients for Android and iOS.

If you’re not familiar with A/B Testing, here’s a simple primer.

Given the importance of A/B Testing, we decided to build our own testing framework rather than use third-party products like Optimizely. The main reasons for doing this were:

  • There were many features we wanted, but none of the existing products offered them all. Some of these are described later in this post.
  • Since we serve millions of users, using a 3rd party product would’ve been prohibitively expensive. Building Sliced’s core functionality took our small innovation team about six weeks to accomplish for a comparatively reasonable cost.
  • Other considerations – Here’s a nice blog post discussing whether to build or buy your A/B Testing framework.

We released Sliced internally in March 2014, and ever since then we’ve been running hundreds of experiments, testing just about every feature before we release it to the Fiverr website or mobile apps.

Now that we have vast experience with Sliced, we’ll soon be working on making it fully open source so that other companies can enjoy the magic.

So, what’s so special about Sliced?

Here are some key features:

1. Stunning, responsive design

Active Experiments Dashboard

Active experiments dashboard

Specific Experiment Dashboard

An example of a specific experiment dashboard. It begins with an index of all the KPIs (Key Performance Indicators) we measure across all experiments, followed by Custom Events, a feature that allows you to track specific events, such as a user clicking a particular button.

KPI

An example of a KPI. A/B Testing is done by comparing a control group, which sees the original version of the app, to an experiment group, which sees the new feature we want to test. Here, the experiment group performed better than the control group with statistical significance, though more participants are still needed to reach the required sample size.

2. Support for experiments in the front end (JavaScript), the back end (Ruby; easily extended to other languages), and mobile (both iOS and Android).

Most third-party products allow experimentation on the client side only (i.e., no back-end support). The ability to run an experiment on the server side lets us experiment deep within our system: back-end flows, algorithms, search, and more.

3. Easy to use and dynamic.

For FE experiments, no code is required outside of the Sliced editor; you can create an experiment in seconds. The following example shows all that’s required to change something on your site (in this case, changing the text color of a button to orange):

Custom CSS

Experiment

BE experiments require a single line of code to render an experiment.
For example, we integrated Sliced into Fiverr’s Blog (where this post resides), which is a WordPress-based website. The integration took a few hours and a couple of lines of code. Plug and play.
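Sliced’s actual API isn’t public yet, so here’s a minimal sketch of what a one-line back-end integration can look like. All names (`Sliced`, `variation`, the experiment name) are hypothetical; the one real requirement such a client has is that bucketing be deterministic, so the same user always sees the same variation:

```ruby
require 'digest'

# Hypothetical sketch of a Sliced-style back-end client (not the real
# API): a user is bucketed deterministically by hashing the experiment
# name together with the user id, so repeat visits land in the same group.
module Sliced
  def self.variation(experiment, user_id, groups: [:control, :test])
    bucket = Digest::MD5.hexdigest("#{experiment}:#{user_id}").to_i(16)
    groups[bucket % groups.size]
  end
end

# The "single line of code" in a controller or view:
color = Sliced.variation('orange_button', 42) == :test ? 'orange' : 'green'
```

Because the assignment is a pure function of experiment name and user id, no per-user state needs to be stored to keep the experience consistent.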

4. Track cross-system KPIs as well as Custom Events.

Cross-system KPIs (like orders, registrations, etc.) are always monitored, even if they don’t seem related to the current experiment. This way, if there’s some kind of unexpected effect of the experiment on a KPI, then it won’t be missed.

Specific events can be added to any experiment. No code required, simply specify the event we’re interested in, and Sliced will track it.

Events

An example of monitored events in an experiment.

Edit Event

An example of a specific monitored event. This event fires once certain DOM elements are inside the user’s viewport.

5. Flexible matching rules: match types decide whether the user should be exposed to an experiment on the current page.

Static match types use a fixed page identifier or the URL. The dynamic ‘page tag’ match type tags the current page as it is being rendered and allocates the relevant experiments based on that tag. For example, if we want to run experiments only on Fiverr pages that display Gigs of top rated sellers, we can tag these pages with a Sliced tag named ‘has_top_rated_seller_gig.’

For example, in the screenshot below we can see an experiment that uses the ‘page tag’ match type. The chosen tag value is ‘sitewide,’ and since we added the ‘sitewide’ tag to all the pages on Fiverr.com, the experiment applies to the entire site.

Experiment Settings
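The tag-matching idea above can be sketched in a few lines of Ruby. This is our own illustration, not Sliced’s implementation; the experiment names and the `experiments_for` helper are made up for the example:

```ruby
# Illustrative sketch of 'page tag' matching: each experiment declares
# the tag it targets, and a page is tagged while being rendered. Any
# experiment whose tag appears among the page's tags is allocated.
EXPERIMENTS = [
  { name: 'new_gig_page',  match: :page_tag, tag: 'has_top_rated_seller_gig' },
  { name: 'orange_button', match: :page_tag, tag: 'sitewide' }
]

def experiments_for(page_tags)
  EXPERIMENTS.select { |e| page_tags.include?(e[:tag]) }
end

# A page tagged only 'sitewide' gets only the sitewide experiment:
experiments_for(['sitewide']).map { |e| e[:name] }   # => ["orange_button"]
```

The appeal of dynamic tags is that the page decides at render time which experiments it qualifies for, so no central URL list has to be maintained.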

6. Conditional exposure based on events

Dynamic user activity on the page can be defined as a condition for exposing the user to an experiment.

Edit Conditions

An example of conditional exposure. Only once the element with class ‘gig-load-more’ enters the viewport will the user be marked as exposed to this experiment.

7. Smart traffic allocation options

Experiments can be confined to a single page, multiple pages, or the entire site.

Traffic allocation can be calculated automatically according to the available traffic on the resource (a resource being one or more pages). Traffic can also be fixed (e.g., exactly 10% of the traffic on the homepage).

Experiment Traffic
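One simple way automatic allocation could work (our own illustration, not Sliced’s actual formula) is to allocate just enough of the resource’s traffic to reach the required sample size, capped at 100%:

```ruby
# Illustrative automatic traffic allocation: given the sample size each
# group needs and the resource's expected traffic over the experiment
# window, allocate the smallest sufficient share of traffic.
def allocation_pct(required_per_group:, groups:, expected_visitors:)
  needed = required_per_group * groups
  [(needed.to_f / expected_visitors * 100).ceil, 100].min
end

# e.g. 2 groups of 10,000 users each, out of 200,000 homepage visitors:
allocation_pct(required_per_group: 10_000, groups: 2,
               expected_visitors: 200_000)   # => 10
```

Keeping the allocation as small as sufficient limits how many users see an unproven variation, while still finishing the experiment on time.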

8. Preview of the experiment, both in editor mode and on the live site.

Preview Experiment

9. Multi-variant testing (i.e., multiple test groups against a single control group)

Multivariant Testing

10. Dynamic, impartial statistical model

Both the statistical significance and the required sample size are calculated dynamically, and both must be reached before a winner can be declared.

Nice post by Airbnb that elaborates on this subject: http://nerds.airbnb.com/experiments-at-airbnb/
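Sliced’s exact statistical model isn’t public, so as a sketch of the two checks involved, here is a standard two-proportion z-test for significance and a textbook power-based sample-size formula (alpha = 0.05, power = 0.80). The function names are ours:

```ruby
# Two-sided p-value from a z score, via the complementary error function.
def z_to_p(z)
  Math.erfc(z.abs / Math.sqrt(2))
end

# Standard two-proportion z-test: conv_a conversions out of n_a users
# in control, conv_b out of n_b in the experiment group.
def significant?(conv_a, n_a, conv_b, n_b, alpha: 0.05)
  p_a, p_b = conv_a.to_f / n_a, conv_b.to_f / n_b
  pool = (conv_a + conv_b).to_f / (n_a + n_b)
  se = Math.sqrt(pool * (1 - pool) * (1.0 / n_a + 1.0 / n_b))
  z_to_p((p_b - p_a) / se) < alpha
end

# Required users per group to detect a relative lift over baseline rate p
# (z_alpha = 1.96 for alpha 0.05 two-sided, z_beta = 0.84 for 80% power).
def required_per_group(p, lift, z_alpha: 1.96, z_beta: 0.84)
  delta = p * lift
  ((z_alpha + z_beta)**2 * 2 * p * (1 - p) / delta**2).ceil
end
```

The second formula is why small lifts on low baseline rates need so many participants: detecting a 10% relative lift on a 1% conversion rate takes on the order of 150,000 users per group.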

11. Funnel drill-down

This feature gives us the ability to drill down into the funnels that led to a KPI in an experiment. (We currently focus mostly on funnels leading to the Orders KPI, but the feature is generic and works for any KPI.)

This allows us to notice differences that are specific to a certain funnel and could have been missed when looking only at the aggregated number of total orders.

In the example below we see an experiment that didn’t show a significant increase in total orders, but once we drilled down into the funnels that led to the orders, we saw that the funnel beginning on the search page did show a significant difference.

Significant Increase

Besides giving us better insight into the effect of an experiment on different funnels, funnel drill-down can shorten the time required to achieve statistical significance.
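The idea is easy to see with made-up numbers (the data and the `lift_by_funnel` helper below are illustrative, not from a real experiment): the same orders, grouped by the funnel that led to them, can expose a lift that is invisible in the aggregate.

```ruby
# Illustrative funnel drill-down: order counts per entry funnel and group.
orders = [
  { funnel: 'search',   group: :control, count: 480 },
  { funnel: 'search',   group: :test,    count: 560 },
  { funnel: 'homepage', group: :control, count: 520 },
  { funnel: 'homepage', group: :test,    count: 445 }
]

# Relative lift of the test group over control, per funnel, in percent.
def lift_by_funnel(orders)
  orders.group_by { |o| o[:funnel] }.map do |funnel, rows|
    by_group = rows.group_by { |r| r[:group] }
    control  = by_group[:control].sum { |r| r[:count] }
    test     = by_group[:test].sum    { |r| r[:count] }
    [funnel, ((test - control) * 100.0 / control).round(1)]
  end.to_h
end

# Totals are nearly flat (1,000 vs 1,005 orders), yet the search funnel
# shows a clear lift while the homepage funnel drops:
lift_by_funnel(orders)   # => {"search"=>16.7, "homepage"=>-14.4}
```

Because each funnel is a narrower, less noisy slice, a real effect confined to one funnel can reach significance there well before it moves the aggregate number.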

More real-life case studies:


1. Simple UX experiment.

Here we experimented with the ‘Delivery Time’ filter on our advanced search page. The contenders were a slider widget and a plain list.

  • We knew that Delivery Time is extremely important for buyers.
  • But what would work better: a cool draggable widget with high accuracy, or a familiar standard list with pre-selected options?
  • Well, in this experiment, simplicity beat coolness:
    Results

2. Mixed results in a heavy experiment.

The following is an example of mixed (statistically significant) results we got from Sliced when we ran an experiment on a new Gig page design. (What follows is the experiment summary written by the Product Manager who led this feature. I left it untouched.)

POSITIVE : New Order + 3.42%

Mixed Results

We are seeing very positive correlation between this measurement and supporting measurements: (Payment Token + 3.94, ‘Order Now’ click +4.4%, Account Activation + 4.27%, FTB +4.21%).

POSITIVE: Order with multiples + 11.66%

Positive Correlation

POSITIVE : Social Measurements
FB share + 79.19%, Twitter Share + 36.14%, gPlus + 34.43, LinkedIn +15.79%, Email Share +312% (!)

Social Results

NEGATIVE: Order with Extras -5.48%

Negative Results

This is something we will have to optimize going forward (change interface to checkboxes).
We also think that it might be related to the drop in “contact seller” (we know that sellers are pushing buyers to order with Extras).

NEGATIVE: Contact Seller -4.8%

Negative

We suspect that this drop might influence the amount of order with Extras. However, we are not seeing a decrease in “New Orders” so this might also have a positive meaning (buyers are more comfortable in placing an order without contacting the seller).

NEGATIVE: Collect Gig -14.15%

Negative Results: Collect Gig

This is probably a matter of visibility of the feature. Going forward, we will implement a more prominent design, to increase Gig added to collections.

Now, what does the future hold for Sliced?

Well, a lot of exciting stuff is ahead! First, we’ll open-source it and spread the love. Other than that, we have tons of awesome features in the pipeline, waiting to go live and take Fiverr to the next level of A/B Testing.

Stay tuned for some technical level posts showing how Sliced is actually built.

The post Introducing Sliced, Fiverr’s Awesome A/B Testing Framework appeared first on Official Fiverr Blog.
