BIIB037: Analyzing Biogen's Alzheimer's drug data

Brian Naughton // Tue 24 March 2015 // Filed under data // Tags data analysis bayesian ipython alzheimer biogen drugs

Last week, Biogen released some Phase Ib data for their Alzheimer's drug, Aducanumab (BIIB037). I don't know too much about how these issuances work, but it's bizarre to me that you can show a chart at a conference, and not the actual data, and yet the information increases Biogen's value by billions of dollars. Maybe someone has access to the raw data, but I can't find it --- that sounds illegal for a public company anyway.

Since there is a lot of money at stake, there are people on Wall Street reanalyzing the data, though not publicly. I found an interesting example from Morgan Stanley here: http://www.investorvillage.com/smbd.asp?mb=2793&mn=414&pt=msg&mid=14782583

On Twitter, there was some skepticism about the data that piqued my interest. Someone said they didn't trust the data because just a few patients changing tack could change the results. It sounds plausible, but it's not obvious how to quantify that. Others said the experiment was underpowered; with about 20 patients per arm of the study, that also sounds plausible.

I decided to take a look at the data to see if there is anything more quantitative to say.

# This cell is just setup
from __future__ import division, print_function
from IPython.display import Image, SVG
import numpy as np
import pandas as pd
import random
from pprint import pprint
from matplotlib import pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
SVG_HEAD = """<?xml version="1.0" standalone="no"?><!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">"""
def uprint(astr): print("{}\n".format(astr) + "-"*len(astr))
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

Parsing data from slides

I found some slides from Biogen's talk here (http://www.biospace.com/News/biogen-idec-slides-for-alzheimers-drug-wow/369620/). They look like they were taken by someone in the audience with a cellphone.

I thought the data from a slide on reduction in SUVR (a measure of plaque volume in the brain) looked the meatiest. They also showed data from MMSE and CDR-SB (paper-based tests), and a couple of other slides on plaque volume. For my purposes, I think it's reasonable just to test the assumption that the drug reduces plaque volume, not that it improves cognition.

I used Photoshop's awesome perspective crop tool to correct for the perspective on these charts. Then I used SVG to manually determine the values in the chart. Not terribly sophisticated, but I think it works as well as anything else out there; it's flexible and only a few lines of code.

IWH = (805,596) # image dimensions after crop
origin = (162,242)
totalwh = (604,238)
totaly = -0.3
bw = 46
# week, dose, num, xy, wh, se
data = [(26, 0,  34, (35,0),  (bw,5),   15),
        (26, 1,  26, (81,0),  (bw,26),  15),
        (26, 3,  27, (127,0), (bw,69),  15),
        (26, 6,  23, (173,0), (bw,112), 17),
        (26, 10, 27, (219,0), (bw,162), 16),
        (54, 0,  21, (334,0), (bw,0),   17),
        (54, 1,  21, (380,0), (bw,44),  18),
        (54, 3,  26, (426,0), (bw,110), 16),
        (54, 6,   0, (472,0), (bw,0),    0),
        (54, 10, 21, (518,0), (bw,210), 17)
        ]

def bg(xy,wh):
    return '''<rect x="{}" y="{}" width="{}" height="{}" fill="#ffff00" fill-opacity="0.05" />
        '''.format(xy[0],xy[1],wh[0],wh[1])

def box(xy,wh,sd):
    return '''<rect x="{}" y="{}" width="{}" height="{}" stroke="#ff0000" fill="none" />
        <line x1="{}" y1="{}" x2="{}" y2="{}" stroke="#00ffff" />
        '''.format(origin[0] + xy[0], origin[1] + xy[1], wh[0], wh[1],
                   origin[0] + xy[0] + wh[0]/2, origin[1] + xy[1] + wh[1] - sd,
                   origin[0] + xy[0] + wh[0]/2, origin[1] + xy[1] + wh[1] + sd)

svg = SVG_HEAD + """<svg width="{:d}px" height="{:d}px" version="1.1"
     xmlns="http://www.w3.org/2000/svg" xmlns:xlink= "http://www.w3.org/1999/xlink">
    <image xlink:href="biospace-news-biogen-idec-2.png" x="0" y="0" width="{:d}px" height="{:d}px"/>
    {} {}</svg>""".format(IWH[0], IWH[1], IWH[0], IWH[1],
                          bg(origin, totalwh), ''.join([box(dat[3],dat[4],dat[5]) for dat in data]))


SVG(svg)

Data

Now I can map these hand-drawn rectangles onto numbers, using the transparent yellow box as a reference. Unfortunately, there is no data for Week 54 / 6 mg/kg.

Based on these means and standard deviations, I can generate example values, maximizing the entropy since I want these values to represent only what I know from the chart. Here, this just means assuming a normal distribution. In reality, it's unlikely to be normal, since there are obviously important covariates like APOE e4 status.

def format_data(data):
    return "Week\tDose\tNum\tMean\tSD\n" + \
        "\n".join('{}\t{}\t{}\t{:.3f}\t{:.3f}'.format(d[0], d[1], d[2], d[3] ,d[4]) for d in data)

def transform(data):
    tdata = []
    for dat in data:
        y = dat[4][1] * (totaly/totalwh[1])
        sd = dat[2]**.5 * abs(dat[5] * (totaly/totalwh[1]))
        tdata.append((dat[0], dat[1], dat[2], y, sd))
    return tdata

print(format_data(transform(data)))
Week        Dose    Num     Mean    SD
26  0       34      -0.006  0.110
26  1       26      -0.033  0.096
26  3       27      -0.087  0.098
26  6       23      -0.141  0.103
26  10      27      -0.204  0.105
54  0       21      -0.000  0.098
54  1       21      -0.055  0.104
54  3       26      -0.139  0.103
54  6       0       -0.000  0.000
54  10      21      -0.265  0.098
tdata = transform(data)

def _gen_random(loc, scale, size):
    while True: yield np.random.normal(loc=loc, scale=scale, size=size)

e_vals = {}
example_vals = {}
for d in tdata:
    if d[4] > 0: # sd != 0
        e_vals[(d[0],d[1])] = _gen_random(d[3], d[4], d[2])
        example_vals[(d[0],d[1])] = e_vals[(d[0],d[1])].next()

uprint("Example values generated from chart data")
pprint({k:v[:4] for k,v in example_vals.items()})
Example values generated from chart data
----------------------------------------
{(26, 0): array([-0.19925843, -0.17029101,  0.11315593, -0.01056504]),
 (26, 1): array([ 0.03167159, -0.07239358, -0.00892414,  0.07187854]),
 (26, 3): array([-0.10161264, -0.01548505, -0.02418133, -0.00441314]),
 (26, 6): array([-0.14738112, -0.29103252, -0.13790777,  0.11396582]),
 (26, 10): array([-0.11370946, -0.04119348, -0.25958697,  0.06817597]),
 (54, 0): array([ 0.12597653, -0.01024324,  0.08792991, -0.03493747]),
 (54, 1): array([ 0.01815586, -0.18449709, -0.11815812, -0.14284047]),
 (54, 3): array([-0.46736124, -0.15305449, -0.23322862, -0.20419422]),
 (54, 10): array([-0.34687603, -0.23872551, -0.33569562, -0.11804706])}

t-test

Now that I have some values, I can run t-tests and see if my p-values agree with the p-values in the chart. The chart only indicates whether or not their statistical test produced p-values of less than 0.05 (*), 0.01 (**), or 0.001 (***).

The results generally agree pretty well with the chart. In fact, the first time I ran this code, they agreed completely. Again, covariates are bound to matter here.

At the bottom of the slide it explains that the data were analyzed with an ANCOVA (like ANOVA with covariates). Each dose of the drug is also compared pair-wise to the placebo to get a p-value per dose. The statistical test they used to get their p-values is not mentioned, but if were from a two-group ANCOVA, it should produce similar results to a t-test.

The big caveats are that I don't have access to the listed covariates and I don't know the true distribution of effect sizes.

from scipy.stats import ttest_ind
from itertools import permutations

def _stars(pval):
    stars = [(.001,"***"), (.01,"**"), (.05,"*"), (np.inf,"")]
    return next(s[1] for s in stars if pval <= s[0])

uprint("Pairwise t-test p-values")
for wk in [26,54]:
    uprint("Week {}".format(wk))

    for dose1,dose2 in ((a,b) for a,b in permutations([0,1,3,6,10], 2) if a<b):
        if wk == 54 and 6 in (dose1,dose2): continue # no data for 54/6

        _pvals = []
        for _ in range(100):
            v1, v2 = e_vals[(wk,dose1)].next(), e_vals[(wk,dose2)].next()
            _pvals.append( ttest_ind(v1, v2)[1] )

        pval = np.mean(_pvals) # arithmetic mean
        print("{} vs {}\t{:.4f}\t{}".format(dose1, dose2, pval, _stars(pval)))
Pairwise t-test p-values
------------------------
Week 26
-------
0 vs 1      0.3751
0 vs 3      0.0497  *
0 vs 6      0.0022  **
0 vs 10     0.0000  ***
1 vs 3      0.1364
1 vs 6      0.0132  *
1 vs 10     0.0000  ***
3 vs 6      0.1619
3 vs 10     0.0032  **
6 vs 10     0.1066
Week 54
-------
0 vs 1      0.1441
0 vs 3      0.0013  **
0 vs 10     0.0000  ***
1 vs 3      0.0411  *
1 vs 10     0.0000  ***
3 vs 10     0.0087  **

Robustness

How sensitive are these results to changes in the data? It's not obvious to me what the best way to test this is, especially without the real data. One simple way to test robustness is by zeroing some fraction of the data. In other words, for some fraction of patients, make it so that the drug has zero effect.

Here, I artificially zero from 0 to 100% of the plaque volume values, and plot the p-values for each dose.

from itertools import product
zeros = np.arange(0,1.01,0.05)
wks, doses = [26,54], [1,3,6,10]

pvals = {}
for wk,dose in product(wks, doses):
    if wk == 54 and dose == 6: continue

    for zero in zeros:
        pvals[(wk,dose,zero)] = []
        for _ in range(100):
            v0, vd = e_vals[(wk,0)].next(), e_vals[(wk,dose)].next()
            vd = [v if random.random()>zero else 0 for v in vd]
            pvals[(wk,dose,zero)].append( ttest_ind(v0, vd)[1] )


df = pd.DataFrame.from_dict(pvals)
color_map = {1:"indianred", 3:"darkseagreen", 6:"steelblue", 10:"purple"}

for wk in wks:
    f, ax = plt.subplots(1, figsize=(12, 4))
    plt.title("Week {}".format(wk))
    for dose in doses:
        if wk == 54 and dose == 6: continue
        _ = sns.tsplot(np.log(df[wk][dose].values), time=df[wk][dose].columns, condition=dose, color=color_map)

    plt.ylim(-10)
    plt.xlabel("fraction zeroed")
    plt.ylabel("log(pvalue)")
    plt.axhline(y=np.log(.05), linestyle='--', color="gray", label="0.05")
    _ = plt.text(1.01,np.log(.05),"p=0.05",fontsize=14,color="gray")
    plt.axhline(y=np.log(.01), linestyle='--', color="gray", label="0.01")
    _ = plt.text(1.01,np.log(.01),"p=0.01",fontsize=14,color="gray")
    plt.axhline(y=np.log(.001), linestyle='--', color="gray", label="0.001")
    _ = plt.text(1.01,np.log(.001),"p=0.001",fontsize=14,color="gray")

Results

The results graph shows that you would need to zero out about 20-40% of the data for the results to move down a p-value category (e.g., from 0.01 to 0.05).

Another way to think about that is you would have to have 4-10 patients in an arm of the study who, instead of responding, did not respond at all.

While that doesn't seem implausible, if it happened, only one of the five significant p-values in the graph would be affected. Hence, I think the results as presented are pretty robust to patients switching to non-responders.

Power and priors

Is the study underpowered? Since it's a phase Ib, it's almost by definition underpowered to detect treatment effects. However, it also depends a lot on your prior for an Alzheimer's drug working. If these data were not for Alzheimer's, but something more innocuous, it might not be called underpowered.

The prior could include the general lack of success of Alzheimer's drugs, but also intangibles such as potential data-dredging, unconscious biases, generally bad science (see Ioannidis), the huge amount of money at stake, and maybe (as someone suggested) even the age of the CEO.

The prior could also include investor confidence. A JPM survey gave this drug about a 35% chance of approval (https://twitter.com/TylerHCanalyst/status/578219628054347776/photo/1), where the general probability of any drug being approved after phase I is about 15% (http://pharmamarketresearchconference.com/files/Clinical_Development_Success_Rates_for_Investigational_Drugs.pdf).

Should I buy?

Obviously, I have no idea. The stock is already up about 30% since January though (when preliminary data appeared) and about 10% this week, so maybe only if you think the probability of approval is well above 35%.

Image("biib037_survey.png")

Comments


Boolean Biotech © Brian Naughton Powered by Pelican and Twitter Bootstrap. Icons by Font Awesome and Font Awesome More