From Infrastructure to Culture: A/B Testing (LinkedIn)

Transcription:

From Infrastructure to Culture: A/B Testing Challenges in Large Scale Social Networks

Ya Xu, Nanyu Chen, Adrian Fernandez, Omar Sinno, Anmol Bhasin
LinkedIn Corp., 2029 Stierlin Court, Mountain View, CA 94043
{yaxu, nchen, afernandez, osinno, abhasin}@linkedin.com

ABSTRACT
A/B testing, also known as bucket testing, split testing, or ...

1 INTRODUCTION

... experience to their profile. This small change turned out to be extremely successful: results from the A/B test showed a 14% increase in profile edits on volunteer experience.

Another experiment was an entire redesign of the Premium Subscription's payment flow (Figure 2). Apart from providing a cleaner look and feel, we reduced the number of payment checkout pages and added an FAQ. The experiment showed an increase of millions of dollars in annualized bookings, about a 30% reduction in refund orders, and over a 10% lift in free trial orders.

Figure 2: New Premium Subscription payment flow

In both cases, the XLNT platform made it feasible to quickly measure the impact of both small and large changes at low cost. This enables us to identify features with a high Return On Investment (ROI) and, moreover, to quantify the impact in a scientific and controlled manner. A screenshot of our analysis dashboard is shown in Figure 3.

Figure 3: XLNT analysis dashboard

Here is a summary of our contributions in this paper:

  • We share the details of how we built our experimentation platform, including the engineering challenges we faced and how we addressed them.
  • We discuss several challenging A/B testing scenarios we face at LinkedIn. Some of these challenges, particularly the ones that are specific to experimentation on social networks, are discussed and shared in a public paper for the first time.
  • We discuss several concepts and novel features we introduced at LinkedIn that have significantly shaped our experimentation culture.
  • Many real A/B test examples are shared for the first time in public. Even though the examples may be LinkedIn specific, most of the lessons and best practices we share are applicable to experimenting on social networks in general.

The paper is organized as follows. Section 2 introduces XLNT and how it is built to address several fundamental challenges. Section 3 discusses several more sophisticated A/B testing use cases. Section 4 discusses the XLNT features that help create a stronger experimentation culture, and Section 5 concludes.

2 THE XLNT PLATFORM

We realized early on that ad hoc A/B testing was not a scalable approach to sustain the high speed of innovation at LinkedIn. We required an A/B testing platform that would allow us to quickly quantify the impact of features, and this needed to be achieved in a scientific and controlled manner across the company. Thus we built XLNT.

The XLNT platform is aimed at encompassing every step of the testing process, from designing and deploying experiments to analyzing them. In particular, it was built to address the following concerns and challenges:

1. Scalability: We continue to see tremendous growth in both the number of concurrent experiments and the amount of data collected per experiment. The platform needs to scale to handle not only today's data volume but also tomorrow's.

2. Incorporating Existing Practices: Over time, LinkedIn has developed many A/B testing practices. For instance, we have a strong tradition of targeting (Section 2.1.2), as we believe each one of our members is special and unique. It is important to incorporate these practices as part of the platform.

3. Engineering Integration: The platform has to be well integrated into LinkedIn's engineering infrastructure. An experimentation platform architecture that works at other companies is unlikely to work for us due to different structure and tooling constraints.

4. Flexibility: Although the basic A/B testing requirements are similar across the organization, teams usually have their own special needs given the diversity of the products they work on. The platform needs to offer enough flexibility to accommodate such customization.

5. Usability: A/B testing is not limited to the R&D organizations. To make XLNT truly a platform for everyone, we needed to provide an intuitive User Interface together with APIs for designing, deploying, and analyzing experiments.

Taking these challenges into consideration, we share the details of the XLNT platform in this section, with the overall architecture outlined in Figure 4.

Figure 4: XLNT Platform overall architecture
2.1 Design

Experimental design is arguably the most important step in the testing workflow for getting good and meaningful results. As Sir R. A. Fisher put it [27], "To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of." To this end, we have built the platform to incorporate the standard practice at LinkedIn while providing capabilities to enable better designs and prevent common pitfalls. In this section, we first introduce a few key concepts that are fundamental to our experiment model, and then focus on targeting, a critical component used in designing experiments at LinkedIn.

2.1.1 Experiment Definitions

Most experimentation terminology used at LinkedIn is standard and can be found in any experimental design textbook [28]. We focus here on only a few definitions that are key to our platform.

To run an experiment, one starts by creating a testKey, which is a unique identifier that represents the concept or feature to be tested. An actual experiment is then created as an instantiation of the testKey. Such a hierarchical structure makes it easy to manage experiments at various stages of the testing process. For example, suppose we want to investigate the benefits of adding a background image. We begin by diverting only 1% of US users to the treatment, then increase the allocation to 50%, and eventually expand to users outside of the US market. Even though the feature being tested remains the same throughout the ramping process, it requires different experiment instances as the traffic allocations and targeting change. In other words, an experiment acts as a realization of the testKey, and only one experiment per testKey can be active at a time.

Every experiment is comprised of one or more segments, with each segment identifying a subpopulation to experiment on. A common practice is to set up an experiment with a whitelist segment containing only the team members developing the product, an internal segment consisting of all LinkedIn employees, and additional segments targeting external users. Because each segment defines its own traffic allocation, the treatment can be ramped to 100% in the whitelist segment while still running at 1% in the external segments. Note that segment ordering matters, because members are only considered as part of the first eligible segment. After the experimenters input their design through an intuitive User Interface, all the information is concisely stored in a DSL (Domain Specific Language). For example, the line below indicates a single-segment experiment targeting English-speaking users in the US, where 10% of them are in the treatment variant while the rest are in control:

    ab(locale = "en_US")[treatment = 10%, control = 90%]

It is important to mention that each experiment is associated with a hashID, which serves as input to an MD5-based algorithm used to randomize users into variants. By default, all experiments of the same testKey share the same hashID, and different testKeys have different hashIDs. This ensures that a user receives a consistent experience as we ramp up a treatment. More importantly, as we have hundreds of experiments running in parallel, different hashIDs imply that the randomizations between active experiments are orthogonal. The platform also allows manually overwriting the hashIDs; the applicable usage cases will be discussed in Section 3.1.
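To make the randomization concrete, the sketch below shows one way an MD5-based assignment like the one described above could work. The paper states only that an MD5-based algorithm keyed by the hashID buckets users into variants; the digest construction, class and method names, and the 100-bucket granularity are illustrative assumptions, matched to the 10%/90% DSL example.

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    // Hedged sketch of MD5-based variant assignment. Only the use of an
    // MD5-based algorithm with a per-experiment hashID comes from the paper;
    // everything else here is an illustrative assumption.
    public final class VariantAssigner {

        // Map a (hashID, memberID) pair deterministically to a bucket in [0, 100).
        static int bucket(String hashID, long memberID) {
            try {
                MessageDigest md5 = MessageDigest.getInstance("MD5");
                byte[] digest = md5.digest(
                    (hashID + ":" + memberID).getBytes(StandardCharsets.UTF_8));
                long h = 0;
                for (int i = 0; i < 8; i++) {          // fold the first 8 digest bytes
                    h = (h << 8) | (digest[i] & 0xFF); // into a single long
                }
                return (int) Math.floorMod(h, 100L);
            } catch (NoSuchAlgorithmException e) {
                throw new AssertionError("MD5 is a required JDK algorithm", e);
            }
        }

        // 10% treatment / 90% control, as in the DSL example above.
        static String variant(String hashID, long memberID) {
            return bucket(hashID, memberID) < 10 ? "treatment" : "control";
        }
    }

Under this scheme, the properties described above fall out directly: because all experiments of a testKey share a hashID, a member's bucket is stable across ramps, and because different testKeys hash differently, assignments across concurrent experiments are effectively independent.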
2.1.2 Targeting

We recognize that not only are our products diverse, but each one of our users is special and unique. With that in mind, many of the experiments we run at LinkedIn focus on how to provide the most improved user experience possible for specific user groups. This is achieved by creating different segments in an experiment targeting different subpopulations, as mentioned in Section 2.1.1. Deciding on the right population to target is the most important part of experiment design. There are three standard targeting capabilities provided by the platform.

Built-in Member Attributes: The platform provides more than 40 built-in member attributes for experimenters to leverage. They range from static attributes, such as a member's country, to dynamic attributes, such as a member's last login date. These attributes are computed daily as part of our data pipelines and pushed to Voldemort, a distributed key-value data storage system [22], for real-time targeting.

Customized Member Attributes: Frequently, experimenters need a targeting criterion beyond the default ones provided by XLNT. The platform provides a seamless onboarding process to include member attributes generated regularly from external data pipelines. It is even more straightforward if this is a static list generated from a one-off job, as one can simply upload it to the platform. These customized attributes are pushed to Voldemort on a daily basis and can be used the same way as any of the built-in ones.

Real-time Attributes: These attributes are only available at runtime, such as the browser type or mobile device. XLNT provides an integrated way to target using these attributes, or any parameters passed during a runtime request. For example, to target only requests coming from iPhones, one just needs to inform the platform that an attribute called osName is to be evaluated at runtime and target only those requests with the value equal to iPhone. This feature is used extensively for mobile experiments, as new mobile features are usually rolled out only for particular mobile app versions. It is also beneficial when experimenting on guest users, where no information is available prior to the request; Section 3.2.1 includes more discussion of this case.
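As a way to see how these attribute sources could feed a single eligibility check, here is a minimal sketch that resolves each targeting criterion from runtime request parameters first and falls back to a daily-computed attribute store, in the spirit of the Voldemort-backed lookups described above. The AttributeStore interface and every name in it are hypothetical; this is not XLNT's actual targeting API.

    import java.util.Map;

    // Hypothetical targeting evaluator: combines attributes known only at
    // request time (e.g. osName) with attributes pre-computed daily and
    // served from a key-value store. All names are illustrative.
    final class TargetingEvaluator {

        interface AttributeStore {
            // e.g. a Voldemort-backed lookup of daily-pushed member attributes
            String lookup(long memberID, String attribute);
        }

        private final AttributeStore store;

        TargetingEvaluator(AttributeStore store) {
            this.store = store;
        }

        // True when every criterion matches, preferring runtime parameters and
        // falling back to the pre-computed member attributes.
        boolean isEligible(long memberID,
                           Map<String, String> runtimeParams,
                           Map<String, String> criteria) {
            for (Map.Entry<String, String> criterion : criteria.entrySet()) {
                String actual = runtimeParams.containsKey(criterion.getKey())
                    ? runtimeParams.get(criterion.getKey())
                    : store.lookup(memberID, criterion.getKey());
                if (!criterion.getValue().equals(actual)) {
                    return false;
                }
            }
            return true;
        }
    }

For the iPhone example above, the criteria map would carry osName=iPhone, with the actual value supplied in runtimeParams by the requesting client.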
2.2 Deployment

The XLNT A/B testing platform is a key component of LinkedIn's Continuous Deployment framework [23]. It spans every fabric and stack of LinkedIn's engineering infrastructure, providing A/B testing capabilities universally. Once the design is completed, deploying an experiment involves the following two components:

1. Application Layer: This includes any production artifacts, e.g., web applications and offline libraries. Each application requires a thick client dependency in order to run experiments locally, track experiment result events, and interface with the service layer. The implementation in the application layer includes two parts: 1) making a simple one-line call to determine the variant, and 2) creating a code path to reflect the new variant behavior accordingly. For example, to decide the right color to show to a user in a buttonColor experiment, we just need to include the line below:

        String color = client.getTreatment(memberID, "buttonColor");

   The second step is then simply changing the color of the button depending on the value of color returned above (a fuller sketch of this two-step pattern follows this list). This is the same across all application stacks, including frontend, backend, mobile, or even email experiments.

2. Service Layer: This is a distributed cache and experiment definition provider that implements Rest.li endpoints [29]. It is capable of executing experiments remotely and of querying the built-in member attribute store described in Section 2.1.2. After the internal testing phase is passed, the experiment owner requests to activate the experiment. An SRE (Site Reliability Engineer) then reviews the specifications and, if no red flags are found, deploys the experiment to production. Experiment deployments are propagated via the Databus [24] relay and listeners. The new experiment definition is then distributed across LinkedIn's service stacks, with updates sent to application clients every 5 minutes. This makes A/B testing totally independent of application code releases, and experiments can easily be managed through a centralized configuration UI.
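Putting the two application-layer parts together, a minimal sketch: the client.getTreatment call and its signature come from the paper's own example, while renderButton, DEFAULT_COLOR, and the fallback behavior for members outside the experiment are assumptions added for illustration.

    // Part 1: one-line call to determine the variant (from the example above).
    String color = client.getTreatment(memberID, "buttonColor");

    // Part 2: a code path reflecting the variant's behavior. renderButton and
    // DEFAULT_COLOR are hypothetical placeholders; the fallback for members
    // who are not in the experiment is assumed, not documented.
    if (color == null || color.isEmpty()) {
        renderButton(DEFAULT_COLOR);
    } else {
        renderButton(color);
    }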
At runtime, a simple experiment that does not involve targeting on pre-defined attributes can be executed locally at the application layer, incurring no time delay at all. Experiments that require member attributes for targeting (see Section 2.1.2) are sent for execution at the service layer, and the results are communicated back to the application client with a total delay of 1 msec on average. Because these are high-throughput services handling about 20k to 30k QPS, we need to establish strict SLAs and enforce them.

2.3 Analysis

... experiment variants, but also the statistical significance information, such as p-values and confidence intervals.

Approximately 4 TB of metrics data and 6 TB of experiment assignment data are processed every day to produce over 150 million summary records. Much of this computation utilizes the large-scale joins and aggregations solution provided by the Cubert framework [34, 35]. All these data are stored in Pinot [26], our in-house distributed storage system, to be queried by the UI applications.

2.3.1 Metrics

LinkedIn has many diverse products. Even though there are a handful of company metrics that everyone optimizes towards, every product has several product-specific metrics that are most likely to be impacted by experiments in its area. As LinkedIn's products evolve and new products emerge, it is impossible for the experimentation team to create and maintain all metrics for all products (currently more than 1000 of them). Therefore, to maintain the metrics, we follow a hybrid of centralized and decentralized models.

Metrics are categorized into 3 tiers: 1) Company-wide, 2) Product Specific, and 3) Feature Specific. A central team maintains tier 1 metrics. Ownership of tier 2-3 metrics is decentralized: each ...
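To ground the significance outputs named in Section 2.3, p-values and confidence intervals, below is a self-contained sketch of a two-proportion z-test on an invented conversion metric. XLNT's production analysis pipeline is far more elaborate than this; the counts, the normal approximation, and every name here are purely illustrative.

    // Hedged, self-contained sketch: a two-proportion z-test producing the
    // kind of p-value and 95% confidence interval a scorecard reports.
    // All counts are invented for illustration.
    public final class TwoProportionZTest {

        public static void main(String[] args) {
            double nT = 10000, nC = 10000;        // exposures per variant
            double pT = 600 / nT, pC = 500 / nC;  // observed conversion rates
            double se = Math.sqrt(pT * (1 - pT) / nT + pC * (1 - pC) / nC);
            double delta = pT - pC;               // estimated treatment effect
            double z = delta / se;

            // Two-sided p-value under the normal approximation: erfc(|z|/sqrt(2)).
            double pValue = erfc(Math.abs(z) / Math.sqrt(2));

            // 95% confidence interval for the difference in proportions.
            double lo = delta - 1.96 * se, hi = delta + 1.96 * se;
            System.out.printf("delta=%.4f z=%.2f p=%.4f CI=[%.4f, %.4f]%n",
                              delta, z, pValue, lo, hi);
        }

        // Abramowitz-Stegun 7.1.26 approximation of erfc (x >= 0), adequate
        // for a demo.
        static double erfc(double x) {
            double t = 1 / (1 + 0.3275911 * x);
            double poly = t * (0.254829592 + t * (-0.284496736
                + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
            return poly * Math.exp(-x * x);
        }
    }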
