How do you prioritize experiments?

David Mannheim
11 min read · Apr 16, 2021

TL;DR: Prioritise down, not up. It’s not about prioritising experiments. It’s about prioritising everything that comes before the experiment.

Previously I spoke about why prioritisation is so damn difficult. I thought it was important to highlight that first so we can frame our thinking about what we need from our prioritisation models for experimentation: alignment, focus and objectivity.

I’m going to save you a tonne of reading and bullet-point how, I think, prioritisation should be done. I’m hoping you’re going to say “David, those bullet points are platitudinous. Who are you, Boris Johnson? They don’t mean anything. I need to read more”. They’re not intended to be click-bait-y; they just bookend this (hopefully) debate. And I encourage you to do just that: debate, please. Please read and please comment. I learn more from the thread of your reactions than from reading 20 different articles on a single topic that are usually copied and pasted from one another.

In my opinion…

  1. Prioritisation models should be contextualised, evaluated and iterated. Don’t forget.
  2. The objective of prioritisation should be alignment, focus and objectivity.
  3. The execution is different from the hypothesis. My hypothesis is that most people get bogged down in prioritisation because they are too narrow in their execution.
  4. Prioritising experiments is about prioritising everything before the experiment.

Alignment.

All the way back in 2007, a poll of global CEOs (Donald Sull, Closing the Gap Between Strategy and Execution) suggested the no. 1 obstacle between strategy and execution is a lack of alignment. That is still true today.

If current prioritisation models lack alignment (which I argue they do), then this is what we must first address. Having clear business objectives is vital; otherwise your experimentation program will become a centralised silo of activity. Which, too often, it already is. It doesn’t help that AB testing is a rather sexy activity that stakeholders want in their wheelhouse without truly understanding it. But that’s OK: I don’t expect them to understand it any more than I understand global merchandising strategies for China.

Mike Lee, CEO of MyFitnessPal, stated that

“alignment is about helping people understand what you want them to do”.

Thus, without connecting experiments to the wider business, they become inherently siloed. So let’s connect the dots.

Your next question should, therefore, be how to prioritise your business objectives, not how to prioritise your experiments.

Purpose of an experiment.

I remember getting a brief at User Conversion for a new client. A big name in the space, too. eCommerce. A £300m business with over 3,000 employees. They presented us with their experiment backlog and asked how we would tackle it. There were 148 items on that list.

So don’t worry. You’re not alone.

One of my theories as to why this type of thing exists is a misunderstanding of “what is an experiment?”.

An experiment is an executional hypothesis that can take many different forms. What is it that you’re testing? Are you testing the concept or the execution? A simple distinction between the two.

A hypothesis is just a statement you want to prove or disprove, with reasons why it exists. And we run experiments to validate whether our hypothesis is correct or incorrect.

The more nuanced the experiment, therefore (for example, an executional play rather than a solution or conceptual play), the higher the chance of inherent bias, of missing the big picture and of misalignment. Let me explain.

Let’s say I identified some research indicating that users didn’t know what STAR was within the Disney+ package. Perfectly conceivable; it’s an ambiguous piece of brand communication. Actually, come to think of it, I barely know what it is. Do you? I just know that one day my Disney+ experience had Bambi, and now it also has Die Hard.

My hypothesis might be “We believe that by adding clarity to what STAR is on the Disney sign-up page, we should see an increase in the % of users reaching the next stage of the funnel”. The observations should always follow this, i.e. the reason why the hypothesis exists.

How I execute that could go a million and one ways. It’s a conceptual hypothesis that has room to breathe and learn. Something I knocked up earlier, Carol Vorderman style…

But if my hypothesis were to be “we believe by placing an information icon next to the Star message on the Disney sign up page, we should see …”; do you see how it changes the framing of the purpose of the experiment? It’s executional.

As a result, the acid test, I feel, for a good hypothesis is “what do you want to learn from this?”. Ask yourself that about your last test. If it’s whether information icons added more clarity, then… 🤷‍♂️ But if it’s whether more clarity of the proposition impacts a user’s propensity to convert? Well then. 🎆

This approach also, in my experience, forces you to think in an MVP (minimum viable product) format which inherently reduces engineering effort.

This is where we run into a discussion on multi-variate testing. Personally, I’m not for those. Why? I revert us back to what a hypothesis should be: a statement that is proven or disproven. It has a binary metric assigned to it; a yes or a no. It is not a statement that sits alongside a series of executions which vary in performance, thereby assigning a non-binary metric or a range of figures against it. This is the personal philosophy I advocate with AB testing: prove or disprove a hypothesis, learn, and then iterate from those learnings. As a result, I can count on one hand the number of multi-variate tests I’ve ever run. Each to their own.
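
The “prove or disprove” framing maps neatly onto a simple significance test on a binary conversion metric. Purely as a sketch (not my tooling; the traffic numbers and the 5% threshold here are illustrative assumptions), a two-proportion z-test is enough to turn an experiment’s result into that yes-or-no answer:

```python
from math import sqrt

from scipy.stats import norm


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))                           # two-sided p-value
    return p_a, p_b, z, p_value


# Illustrative numbers only: control vs the "clarify what STAR is" variant
p_a, p_b, z, p_value = two_proportion_z_test(conv_a=412, n_a=10_000,
                                              conv_b=468, n_b=10_000)
print(f"A: {p_a:.2%}  B: {p_b:.2%}  z = {z:.2f}  p = {p_value:.3f}")
print("hypothesis supported" if p_value < 0.05 else "not supported at the 5% level")
```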

Outcomes vs Outputs.

To the point around experiments being executional hypotheses: if your experiment is embedded in the execution, not the hypothesis, and you prioritise on that, you will always lack iteration. And a roadmap needs to move and be iterated on.

Unfortunately, when it comes to experimentation roadmaps, few are. Why?

The issue is that anytime you put a list of ideas on a document entitled “roadmap”, no matter how many disclaimers you put on it, people across the company will interpret the items as a commitment

Marty Cagan

Too often, the experiment roadmaps that I see are set in stone. There are two reasons for this:

  1. The experiment is so embedded in the execution, not the concept, meaning that if the experiment ‘fails’, as it were, any analysis (or lack thereof) may not highlight that it was the execution that failed, not the concept. And you get a lot of hopping from one to the next…
  2. An over-structured process that lacks fluidity. Often as a result of having a ‘testing to win’ attitude rather than a ‘testing to learn’ attitude.

Either I’m going to disappoint you by giving you exactly what we thought six months ahead of time was the best solution when it’s not, or by changing course and having lied to you

David Cancel

If we appreciate an experimentation roadmap is there to help us learn, the roadmap should be fluid — no more than 3–6 months in advance — and aligned to outcomes, not output.

Output prioritisation is very much about what you’re going to deliver. When it comes to AB testing, that ‘thing’ is nuanced and therefore involves the execution. However fuzzy you make it, stakeholders will rarely remember any up-front buts or caveats. Using prioritisation frameworks before that causes… well…

Frameworks are output driven and assume you’ve already done discovery on those items.

Andrea Saez

Outcome prioritisation focuses on the metrics that you’re trying to move by always linking them back to “the problem to be solved”. Those metrics might be qualitative or quantitative, but either way it’s useful in breaking down the behemoth that is “conversion rate”. It gives the owner more freedom to move, learn and iterate. Bruce McCarthy wrote the book on outcomes-based roadmaps: Product Roadmaps Relaunched.

If we were to follow an experimentation roadmap based on outputs, we would create an immovable backlog of execution. There is no room to move, to learn, to iterate. And what you’ll find is that the items towards the bottom (I’d say even the middle) of the backlog are just never done and remain in the backlog ether.

I’d recommend two things when writing such roadmaps: using your own language (ideally influenced by the customer, not by marketing), and changing month 1, month 2 etc to “Now, Next, & Consider”.
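
To make the “Now, Next, & Consider” idea concrete, here is a minimal sketch of an outcome-based roadmap expressed as data rather than as a list of executions. The objectives and key results are invented purely for illustration, not taken from any real client roadmap:

```python
from dataclasses import dataclass, field


@dataclass
class Outcome:
    objective: str                  # the problem to be solved, in the customer's language
    key_results: list[str]          # the metrics we are trying to move
    hypotheses: list[str] = field(default_factory=list)  # kept loose so it can move and iterate


# Illustrative only: buckets replace "month 1, month 2" with Now / Next / Consider
roadmap = {
    "Now": [
        Outcome(
            objective="Shoppers don't understand what is included in the subscription",
            key_results=["+5pp progression from sign-up step 1 to step 2",
                         "-20% proposition-related support contacts"],
            hypotheses=["Clarifying the proposition increases funnel progression"],
        ),
    ],
    "Next": [
        Outcome(
            objective="Returning customers struggle to re-order quickly",
            key_results=["+10% repeat-purchase rate within 30 days"],
        ),
    ],
    "Consider": [],
}

for bucket, outcomes in roadmap.items():
    print(bucket, "->", [o.objective for o in outcomes])
```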

Here’s an example of how we have presented outcome roadmaps to clients previously. They focus on the overarching business objective, work a quarter at a time using OKRs, try to tell a story to make it more compelling, keep things loose to allow for movement, and focus on outcomes, not outputs (inherently so, through the use of key results).

Note: this is a fabricated example of an anonymous retailer otherwise I’ll get told off.

In my experience, it is the outcome prioritisation frameworks that work best within experimentation. As a result, I often champion OKRs, always linking back to the problem. On that note…

OKRs.

What we implemented at User Conversion is, I feel, well formed for a business that is external to the client, and it is used as a framework that can be contextualised to individual needs. That method is OKRs (objectives and key results).

As a consultancy and a POD-led business, we give autonomy to our consultants. And because businesses are contextual, the answer to “how do you prioritise?” is actually “differently each time”. But the concepts are generally the same: understanding the challenges of experiment prioritisation, and therefore utilising a methodology (or a pseudo-version thereof) of OKRs.

Objectives and key results, which Google famously popularised (they originated at Intel, but Google made them famous), were, I genuinely think, a turning point in our approach with clients. Why?

  • The experiment output is less about revenue attribution, and more about the learnings toward the objective (given the muddy waters of experiment financial attribution, which is a completely different post). It was a win-win-win.
  • OKRs give focus. A series of experiments focussed on improving a specific feature, experience or behaviour is 100x more impactful than a series of experiments arbitrarily ‘prioritised’ within a spreadsheet.
  • As well as alignment. Both client and agency know what we are working towards and how each experiment fits into that piece of the puzzle.
  • It breaks down “conversion rate optimisation” into manageable chunks. There’s always been debate about the “conversion rate” part of that phrase, and so removing the ambiguity and leaving just “optimisation” helps when looking at micro and macro metrics.

And from those OKRs we have a QBR (quarterly business review) that reviews not necessarily the success or failure of that OKR, but the learnings from it, often answering three questions (all three are within our hypothesis documentation):

  1. What did we learn?
  2. What does that mean?
  3. What next?

Our prioritisation is done differently at different levels with different businesses.

You’ll notice a theme across all of them that they are done collaboratively and visually.

Business and project objectives

OKR prioritisation can be done in many ways, and probably every time I’ve helped businesses with this (despite being a lonely external consultant) I’ve done it differently. Doing it visually, collaboratively and with an eye on value vs risk vs cost is important to me. I don’t profess to be a master of this, but I’ve read about Eric Ries’ validated learning and Dave McClure’s AARRR model, which I find useful.

User Problems

Anxiety. Frustration. Motivation. Delight. I’m a fan of Kano model prioritisation and/or dot voting and/or story mapping. Anything that’s collaborative, where we can plot user needs against how they’re being solved.

Hypothesis

I suppose this is what you’ve all been waiting for. Personally, I prioritise collaboratively and ask one single question: “Will this help me reach my desired outcome?”. Honestly, I really like Teresa Torres’ opportunity solution tree. But I’m a creative person, so I like the concepts of gut and intuition and discussing problems.

The concept behind PXL is also really strong, so long as the attributes are tailored to your business needs. I like Karl Gilis’ explanation (one so insightful it recently forced me to add to this article) of “direct impact of this page on revenue (or whatever the KPI of the test is)”: it forces the hypothesis creator to always be thinking about the commercial benefit of the experiment. The same applies to the example below with SCORE, where volume and value are considered within an audience prioritisation.

That is without saying anything about minimum detectable effect. This is an input that’s worth considering, not so much within prioritisation, but for “is your test worth running?”, which is talked about very well in this article by Bhavik Patel.
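
On the “is your test worth running?” question, the usual back-of-the-envelope check is a sample-size calculation against a minimum detectable effect. Here is a minimal sketch using the standard two-proportion formula; the baseline rate, relative MDE and traffic figure are all assumptions for illustration:

```python
from math import ceil

from scipy.stats import norm


def sample_size_per_variant(baseline_rate, mde_relative, alpha=0.05, power=0.8):
    """Visitors needed per variant to detect a relative lift of `mde_relative`
    on `baseline_rate`, with a two-sided test at the given alpha and power."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + mde_relative)
    z_alpha = norm.ppf(1 - alpha / 2)          # e.g. 1.96 for alpha = 0.05
    z_beta = norm.ppf(power)                   # e.g. 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Illustrative: 4% baseline conversion, hoping to detect a 10% relative lift
n = sample_size_per_variant(baseline_rate=0.04, mde_relative=0.10)
weekly_visitors_per_variant = 15_000           # assumed traffic per variant
print(f"{n:,} visitors per variant, roughly {n / weekly_visitors_per_variant:.1f} weeks")
```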

Execution

Despite the downfalls that I mentioned in my last post, this is where I do think PIE, ICE, PXL etc. are more useful and where the execution can be simplified. Although impact is like looking into a crystal ball, and high-effort tests are almost never completed, they are necessary evils.
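
At this layer, something as light as an ICE-style score is usually enough. A minimal sketch; the 1-10 scales, the simple averaging and the backlog items are my assumptions (a PXL-style framework would swap these for binary, business-tailored attributes):

```python
from dataclasses import dataclass


@dataclass
class Execution:
    name: str
    impact: int       # 1-10; admittedly a crystal-ball guess
    confidence: int   # 1-10; how sure we are this execution moves the metric
    ease: int         # 1-10; the inverse of engineering effort

    @property
    def ice(self) -> float:
        return (self.impact + self.confidence + self.ease) / 3


backlog = [
    Execution("Info icon next to the STAR message", impact=4, confidence=6, ease=9),
    Execution("Rewrite the sign-up page proposition copy", impact=7, confidence=7, ease=6),
    Execution("Redesign the whole sign-up funnel", impact=9, confidence=5, ease=2),
]

# Highest ICE score first; the ranking is an input to discussion, not a verdict
for item in sorted(backlog, key=lambda e: e.ice, reverse=True):
    print(f"{item.ice:4.1f}  {item.name}")
```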

As a small bonus and shout out…

I actually think the guys at Conversion.com do a cracking job of reflecting this way of thinking in a slightly different way. They recognise that the goal (the “O” in OKRs) trickles down to the metrics (the “KRs” in OKRs). They reflect on audiences to determine personalisation focus, and prioritise these on volume, value and, somewhat arbitrarily, potential. Then areas, although I admit I’m not the biggest fan of compartmentalising a complex journey into a series of five templates. Then ‘levers’: those concepts that motivate a user to make a decision, prioritised on confidence and win rate. Followed by, finally, the experiment.

Finally…

Hopefully this isn’t a running theme, but in true opinion-led-post-stylee, again, there’s no right or wrong. If there was, there’d be no debate and everyone would be using PIE. I’m pretty sure most do anyway. 🤷

The core messages I wanted to get across are:

  1. Prioritisation models should be contextualised, evaluated and iterated. Don’t forget.
  2. The objective of prioritisation should be alignment, focus and objectivity.
  3. The execution is different from the hypothesis. My hypothesis is that most people get bogged down in prioritisation because they are too narrow in their execution.
  4. Prioritising experiments is about prioritising everything before the experiment.

Because of the challenges of alignment, focus and a lack of objectivity, I’ve always found OKRs to be the best method of prioritising experiments.

One last thing … I’d love it if you could complete just ONE of the below. It really helps me to get feedback, speak to people, create relationships and learn from others.

  1. Please read, please comment, please share. I learn more from the thread of your reactions than from reading 20 different articles on a single topic.
  2. Ask questions and vote https://app.sli.do/event/orj9eqab/live/questions. I answer these on a weekly basis by taking the top voted topic and writing about it with my stories and experience.
  3. Subscribe to my newsletter at https://optimisation.substack.com/


David Mannheim

Stories and advice within the world of conversion rate optimisation. Founder @ User Conversion. Global VP of CRO @ Brainlabs. Experimenting with 2 x children