Let's say you want to create a therapeutic antibody against a protein.
Typically, you start by immunizing an animal with the protein (antigen) of interest,
and harvesting the antibodies that the animal creates in response.
Then you "humanize" the antibody (replace parts of animal IgG with human IgG)
and start testing in humans.
Although this method (and its many permutations) has worked well for many years,
there are still unpredictable effects, and an antibody developed this way
is far from assured to be safe and effective.
Recently, companies like X01
have shown the value of using "fully human" antibodies as therapies.
X01 started with an anti-thrombin antibody isolated from a patient with an unusual
clotting phenotype.
They turned this antibody into an anticoagulation therapy and sold it to J&J this year.
So to generalize, instead of immunizing an animal with your antigen,
you can try to find a human with the unusual phenotype you want and
develop an antibody that way.
The major putative advantage is that you start off with a free n-of-one human experiment
showing that your therapy is safe and effective.
The problem changes from one of biochemistry to one of bioprospecting.
PCSK9
Human genetics is playing an increasingly important role in drug development,
especially in helping determine the best drug targets.
The most famous example of this is the gene PCSK9, which, when homozygous null,
results in very low LDL, excellent cardiovascular health, and no apparent side-effects.
After its discovery, several pharma companies immediately began developing therapies
against this target.
This discovery also likely played a part in Amgen buying DeCode, Regeneron's huge
human genetics effort, and Robert Plenge
(@rplenge) joining Merck.
For a summary of the PCSK9 story,
see this excellent talk by Jonathan Cohen.
Natural knockdowns
So we know that fully human antibodies have very attractive properties,
and we know that human genetics can help us find useful targets. Can we combine the two?
George Church sometimes shows a slide listing many of the known protective mutations.
Most of these are full knockouts (nulls)
and, perhaps surprisingly, these broken genes are sometimes beneficial.
Knockdowns are similar to knockouts, except here the protein is made correctly but
is then neutralized by something else, like an antibody.
(Knockdowns normally refer to inhibition at the nucleotide level,
but the term seems to fit better than the alternatives.)
Natural knockdowns would present similarly to human knockouts, except that they
would not be present at birth, but would develop later in life.
In other words, these are unusual autoimmune diseases.
Myostatin
After going through various lists of beneficial knockouts, one stands out: myostatin.
Myostatin is an inhibitor of muscle growth, so the knockout results in a muscular phenotype.
There is at least one known
homozygous null human,
an extremely muscular German toddler.
In the animal kingdom, there are several examples, including the
Belgian Blue cow.
A natural knockdown of myostatin would theoretically result in unexpected muscle development,
perhaps late in life, and without a change in lifestyle.
A myostatin knockdown fulfills my criteria:
It is an extracellular protein
The protein must be accessible by antibodies.
It has an obvious therapeutic application
Myostatin antibodies are potentially a treatment for many muscle-wasting diseases.
There are already several in development by a number of companies
(e.g., Pfizer,
BMS).
It has a measurable phenotype
Greatly increased musculature is pretty easy to measure, perhaps even easier than LDL.
The phenotype is benign
This is not strictly necessary, but it is an attractive safety property
that the antibody does not cause another disease.
Of course, it is not trivial to find a person with this unusual phenotype
— if they even exist —
but if we could, then they might be carrying an extremely useful therapy in their blood.
If you know someone who has added a lot of muscle quickly and cannot explain why, let me know!
Aducanumab
Last week, Biogen released some Phase Ib data for their Alzheimer's
drug, Aducanumab (BIIB037). I don't know much about how these
disclosures work, but it's bizarre to me that you can show a chart at a
conference, and not the actual data, and yet the information increases
Biogen's value by billions of dollars. Maybe someone has access to the
raw data, but I can't find it; that sounds illegal for a public
company anyway.
On Twitter, there was some skepticism about the data that piqued my
interest. Someone said they didn't trust the data because a change in
just a few patients could change the results. It sounds plausible,
but it's not obvious how to quantify that. Others said the experiment
was underpowered; with about 20 patients per arm of the study, that also
sounds plausible.
I decided to take a look at the data to see if there is anything more
quantitative to say.
# This cell is just setup
from __future__ import division, print_function
from IPython.display import Image, SVG
import numpy as np
import pandas as pd
import random
from pprint import pprint
from matplotlib import pyplot as plt
import seaborn as sns

sns.set_style("whitegrid")

SVG_HEAD = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">"""

def uprint(astr):
    print("{}\n".format(astr) + "-" * len(astr))

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
I thought the data from a slide on reduction in SUVR (a measure of
plaque volume in the brain) looked the meatiest. They also showed data
from MMSE and CDR-SB (paper-based tests), and a couple of other slides
on plaque volume. For my purposes, I think it's reasonable just to test
the assumption that the drug reduces plaque volume, not that it improves
cognition.
I used Photoshop's awesome perspective crop tool to correct for the
perspective on these charts. Then I used SVG to manually determine the
values in the chart. Not terribly sophisticated, but I think it works as
well as anything else out there; it's flexible and only a few lines of
code.
Now I can map these hand-drawn rectangles onto numbers, using the
transparent yellow box as a reference. Unfortunately, there is no data
for Week 54 / 6 mg/kg.
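As a sketch of that mapping step: with the top and bottom of the reference
box at known pixel rows and known chart values, every rectangle height maps
linearly onto an SUVR change. All numbers below are invented for illustration.

# Hypothetical reference: the yellow box spans pixel rows 100-400 and
# represents SUVR changes from 0.0 down to -0.3.
ref_px = (100.0, 400.0)   # (top, bottom) pixel y-coordinates of the box
ref_val = (0.0, -0.3)     # chart values at those pixel rows

def px_to_val(y_px):
    # Linearly interpolate a pixel y-coordinate to a chart value.
    frac = (y_px - ref_px[0]) / (ref_px[1] - ref_px[0])
    return ref_val[0] + frac * (ref_val[1] - ref_val[0])

print(px_to_val(250.0))   # midpoint of the box -> -0.15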
Based on these means and standard deviations, I can generate example
values, maximizing the entropy since I want these values to represent
only what I know from the chart. Here, this just means assuming a normal
distribution. In reality, it's unlikely to be normal, since there are
obviously important covariates like APOE e4 status.
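To make that concrete, here is one way the per-arm sample generators (the
e_vals used in the t-test code below) could be built; every number in
chart_stats is a placeholder, not a value read off the chart.

def sample_gen(mean, sd, n=20):
    # Endlessly yield simulated arms of n patients, assuming normality.
    while True:
        yield np.random.normal(mean, sd, n)

# Placeholder (mean, sd) per (week, dose in mg/kg); no entry for 54/6.
chart_stats = {
    (26, 0): (-0.01, 0.15), (26, 1): (-0.05, 0.15), (26, 3): (-0.10, 0.15),
    (26, 6): (-0.14, 0.15), (26, 10): (-0.20, 0.15),
    (54, 0): (-0.02, 0.15), (54, 1): (-0.07, 0.15), (54, 3): (-0.14, 0.15),
    (54, 10): (-0.27, 0.15),
}
e_vals = {k: sample_gen(m, s) for k, (m, s) in chart_stats.items()}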
Now that I have some values, I can run t-tests and see if my p-values
agree with the p-values in the chart. The chart only indicates whether
or not their statistical test produced p-values of less than 0.05 (*),
0.01 (**), or 0.001 (***).
The results generally agree pretty well with the chart. In fact, the
first time I ran this code, they agreed completely. Again, covariates
are bound to matter here.
At the bottom of the slide it explains that the data were analyzed with
an ANCOVA (like ANOVA with covariates). Each dose of the drug is also
compared pair-wise to the placebo to get a p-value per dose. The
statistical test they used to get their p-values is not mentioned, but
if it came from a two-group ANCOVA, it should produce results similar to
a t-test.
The big caveats are that I don't have access to the listed covariates
and I don't know the true distribution of effect sizes.
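For intuition on that claim, here is a sketch of a two-group ANCOVA as an
ordinary least squares fit (via statsmodels, which isn't otherwise used in
this post); the baseline covariate and all simulated numbers are assumptions.

import statsmodels.formula.api as smf

n = 20
sim = pd.DataFrame({
    "dose": [0] * n + [10] * n,                     # placebo vs. 10 mg/kg
    "baseline": np.random.normal(1.4, 0.1, 2 * n),  # invented baseline SUVR
})
# Simulated change in SUVR: a dose effect plus noise, no covariate effect.
sim["delta"] = -0.02 * sim["dose"] + np.random.normal(0, 0.1, 2 * n)

fit = smf.ols("delta ~ C(dose) + baseline", data=sim).fit()
# With an uninformative covariate, this lands close to the t-test p-value.
print(fit.pvalues["C(dose)[T.10]"])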
from scipy.stats import ttest_ind
from itertools import permutations

def _stars(pval):
    stars = [(.001, "***"), (.01, "**"), (.05, "*"), (np.inf, "")]
    return next(s[1] for s in stars if pval <= s[0])

uprint("Pairwise t-test p-values")
for wk in [26, 54]:
    uprint("Week {}".format(wk))
    for dose1, dose2 in ((a, b) for a, b in permutations([0, 1, 3, 6, 10], 2) if a < b):
        if wk == 54 and 6 in (dose1, dose2):
            continue  # no data for 54/6
        _pvals = []
        for _ in range(100):
            v1, v2 = e_vals[(wk, dose1)].next(), e_vals[(wk, dose2)].next()
            _pvals.append(ttest_ind(v1, v2)[1])
        pval = np.mean(_pvals)  # arithmetic mean over repeated simulations
        print("{} vs {}\t{:.4f}\t{}".format(dose1, dose2, pval, _stars(pval)))
Pairwise t-test p-values
------------------------
Week 26
-------
0 vs 1 0.3751
0 vs 3 0.0497 *
0 vs 6 0.0022 **
0 vs 10 0.0000 ***
1 vs 3 0.1364
1 vs 6 0.0132 *
1 vs 10 0.0000 ***
3 vs 6 0.1619
3 vs 10 0.0032 **
6 vs 10 0.1066
Week 54
-------
0 vs 1 0.1441
0 vs 3 0.0013 **
0 vs 10 0.0000 ***
1 vs 3 0.0411 *
1 vs 10 0.0000 ***
3 vs 10 0.0087 **
Robustness
How sensitive are these results to changes in the data? It's not obvious
to me what the best way to test this is, especially without the real
data. One simple way to test robustness is by zeroing some fraction of
the data. In other words, for some fraction of patients, make it so that
the drug has zero effect.
Here, I artificially zero from 0 to 100% of the plaque volume values,
and plot the p-values for each dose.
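Here is a sketch of that zeroing procedure for a single dose arm, reusing
the hypothetical e_vals generators from above, so the numbers are
illustrative rather than Biogen's:

def zeroed_pval(wk, dose, frac, reps=100):
    # Mean t-test p-value vs. placebo after zeroing `frac` of the treated arm.
    pvals = []
    for _ in range(reps):
        v0 = e_vals[(wk, 0)].next()
        v1 = e_vals[(wk, dose)].next().copy()
        k = int(round(frac * len(v1)))
        v1[:k] = 0.0  # these patients show no drug effect at all
        pvals.append(ttest_ind(v0, v1)[1])
    return np.mean(pvals)

for frac in np.linspace(0, 1, 6):
    print("{:.0%}\t{:.4f}".format(frac, zeroed_pval(26, 10, frac)))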
The results graph shows that you would need to zero out about 20-40% of
the data for the results to move down a p-value category (e.g., from
0.01 to 0.05).
Another way to think about that is you would have to have 4-10 patients
in an arm of the study who, instead of responding, did not respond at
all.
While that doesn't seem implausible, if it happened, only one of the
five significant p-values in the graph would be affected. Hence, I think
the results as presented are pretty robust to patients switching to
non-responders.
Power and priors
Is the study underpowered? Since it's a phase Ib, it's almost by
definition underpowered to detect treatment effects. However, it also
depends a lot on your prior for an Alzheimer's drug working. If these
data were not for Alzheimer's, but something more innocuous, it might
not be called underpowered.
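For a rough sense of the numbers, statsmodels can solve for the power of a
two-sample t-test with about 20 patients per arm; the standardized effect
sizes below are assumptions, not estimates from the slide.

from statsmodels.stats.power import TTestIndPower

for es in (0.5, 0.8, 1.2):
    power = TTestIndPower().solve_power(effect_size=es, nobs1=20, alpha=0.05)
    print("d = {}\tpower = {:.2f}".format(es, power))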
The prior could include the general lack of success of Alzheimer's
drugs, but also intangibles such as potential data-dredging, unconscious
biases, generally bad science (see Ioannidis), the huge amount of money
at stake, and maybe (as someone suggested) even the age of the CEO.
Obviously, I have no idea. The stock is already up about 30% since
January (when preliminary data appeared) and about 10% this week,
so maybe it's only a buy if you think the probability of approval is
well above 35%.
This Twitter conversation is a nice microcosm of what's going on in drug discovery.
On the human genetics side, we have Robert Plenge, Daniel MacArthur, Amgen and Regeneron,
who believe sequencing humans will be the key to better drugs (a la PCSK9).
On the model organism side, we have Ethan Perlstein
(and certainly others, perhaps Roger Perlmutter from Merck?),
who think that the human sequencing stuff is overblown.
In the end, there's not too much disagreement:
humans are good for target discovery,
and model organisms can be useful for studying those targets.