How To Run Many Tests At Once: Interaction Avoidance and Detection
Experiments allow us to test how changes to our products affect the behavior of our users. We make many such changes in many experiments running at the same time, and as we scale up our experimentation volume at Vista, the risk of interactions between experiments grows.
We say an interaction has occurred when the combined effect of two or more experiments on a metric differs from the sum of each experiment's effect in isolation. For example, two changes each help customers find what they need more easily on their own, while the two combined have the opposite effect.
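To make that definition concrete, here is a minimal Python sketch using a 2x2 view of two experiments A and B; all numbers are invented for illustration:

    # Hypothetical mean values of a metric (e.g., conversion rate) in each
    # cell of a 2x2 layout of experiments A and B. Numbers are invented.
    control = 0.100  # neither change
    a_only  = 0.110  # change A alone: +1.0pp lift
    b_only  = 0.108  # change B alone: +0.8pp lift
    a_and_b = 0.095  # both changes together: a 0.5pp drop

    lift_a  = a_only - control   # effect of A in isolation
    lift_b  = b_only - control   # effect of B in isolation
    lift_ab = a_and_b - control  # combined effect

    # Interaction: the deviation of the combined effect from additivity.
    # Here it is -0.023, so the combination badly underperforms the
    # sum of the individual lifts (+0.018).
    interaction = lift_ab - (lift_a + lift_b)
    print(f"interaction = {interaction:+.3f}")

If the interaction were zero, the two experiments would compose additively and could be analyzed independently.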
In some cases, an interaction results from a functional conflict between two changes: they are functionally incompatible, so their combination behaves differently from either change alone. In other cases, the changes are functionally compatible but still interact, shifting our measurements or user behavior in subtler ways.
Experience from other organizations suggests that these kinds of interactions tend to be rare in practice. However, since the consequences for the user experience and for our learning can be dire, we should still take precautions to avoid or detect them.
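One common avoidance precaution, sketched here generically in Python (an illustrative assumption, not a description of Vista's actual tooling), is to place potentially conflicting experiments in a mutually exclusive group, so that each user is deterministically routed to at most one of them:

    import hashlib

    def exclusive_assignment(user_id: str, group: str, experiments: list[str]) -> str:
        """Route a user to exactly one experiment in a mutually exclusive
        group, using a salted hash so the assignment is stable per user."""
        digest = hashlib.sha256(f"{group}:{user_id}".encode()).hexdigest()
        return experiments[int(digest, 16) % len(experiments)]

    # Hypothetical usage: two checkout experiments that must never overlap.
    arm = exclusive_assignment("user-42", "checkout-group", ["new_cart", "new_upsell"])

The trade-off is capacity: each experiment in the group sees only a fraction of the traffic, which is why mutual exclusion is usually reserved for changes with a plausible functional conflict.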
In this talk, we will define two kinds of interaction effects and their potential consequences, discuss possible strategies for avoiding these interactions, and explain how we can detect them. We will also share some of the tools and processes we have built at Vista to address this issue.
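As a preview of the detection side, one standard statistical approach is to regress the metric on both exposure indicators plus their product and test whether the interaction coefficient differs from zero. The sketch below uses statsmodels on synthetic data; the column names and effect sizes are invented:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 10_000
    a = rng.integers(0, 2, n)  # exposure to experiment A (0/1)
    b = rng.integers(0, 2, n)  # exposure to experiment B (0/1)
    # Synthetic metric with a built-in negative interaction to detect.
    metric = 0.10 + 0.01 * a + 0.008 * b - 0.02 * a * b + rng.normal(0, 0.05, n)
    df = pd.DataFrame({"a": a, "b": b, "metric": metric})

    # 'a * b' expands to a + b + a:b; the a:b coefficient estimates the
    # interaction, and its p-value tests whether it differs from zero.
    model = smf.ols("metric ~ a * b", data=df).fit()
    print(model.params["a:b"], model.pvalues["a:b"])

With many concurrent experiments, the same idea extends to testing all pairs, with a multiple-testing correction, since most pairs are expected not to interact.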