
Some simple experiments in linear regression using scipy, Stan, and PyMC3.

I take some data from a Google spreadsheet that includes a response variable (y) and one or more predictors (x1, x2, ...). I assume the first column is the response variable and apply linear regression in several different ways.

from __future__ import print_function, division

import requests
import numpy as np
import seaborn as sns
import pandas as pd
import pymc3 as pm
import pystan
import matplotlib as mpl
import matplotlib.pyplot as plt

from StringIO import StringIO
from pprint import pprint
from IPython.display import display, HTML, SVG
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from scipy.stats import norm, probplot

def uprint(astr): print(astr + "\n" + "-"*len(astr))
def show_html(astr): return display(HTML('<center>{}</center>'.format(astr)))

def show_svg(astr, w=1000, h=1000):
    SVG_HEAD = '''<?xml version="1.0" standalone="no"?><!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">'''
    SVG_START = '''<svg width="{w:}px" height="{h:}px" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink= "http://www.w3.org/1999/xlink">'''
    return display(SVG(SVG_HEAD + SVG_START.format(w=w, h=h) + astr + '</svg>'))

# Plotting style
plt.rc("axes", titlesize=20, labelsize=15, linewidth=.25, edgecolor='#444444')
sns.set_context("notebook", font_scale=1.2, rc={})
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

Data

Here we load the data into a DataFrame and look at a summary.

# Read in data from google spreadsheet
gsheet_key = "1q2xrN2YQ4KrGedaupDX2aM9dWGSmz23v7UF2GxwlO0A"
gsheet_url = "https://docs.google.com/spreadsheets/d/1q2xrN2YQ4KrGedaupDX2aM9dWGSmz23v7UF2GxwlO0A/edit?usp=sharing"

csv = requests.get("https://docs.google.com/spreadsheets/d/{}/export?gid=0&format=csv".format(gsheet_key))
df = pd.read_csv(StringIO(csv.text))

show_html("<h1>Data summary</h1>")
show_html(df.describe(percentiles=[]).to_html())

TEST_LOG = False
if TEST_LOG:
    df["x3"] = np.log(df["x3"])

Data summary

            y         x1         x2         x3
count     100        100        100        100
mean    0.016842  -0.052540  -0.027275   0.011383
std     0.092049   0.969025   0.959984   0.375131
min    -0.204356  -2.257561  -2.447623  -0.859929
50%     0.012005   0.019029  -0.072433   0.040495
max     0.252106   2.304593   2.329134   0.934868

Standardizing data

By standardizing the data (subtracting the mean and dividing by the standard deviation, i.e., z = (x - mean(x)) / sd(x)), we can avoid some common problems. Specifically, when variables are on vastly different scales, fitting can be slow or unstable and the coefficients are hard to compare.

See http://andrewgelman.com/2009/07/11/when_to_standar/ for a discussion of standardization.

df_fits = {}
for col in df.columns:
    _df_fit = StandardScaler().fit(df[col])
    df[col] = _df_fit.transform(df[col])
    df_fits[col] = _df_fit.inverse_transform
show_html("<h1>Data overview, all-vs-all</h1>")
g = sns.PairGrid(df)
#g.fig.suptitle('Data overview, all-vs-all', y=1.05, fontsize=28)
g.map_upper(sns.regplot)
g.map_lower(sns.kdeplot, cmap="Blues_d")
g.map_diag(plt.hist)

[Figure: Data overview, all-vs-all. PairGrid with regression plots (upper), KDE contours (lower), and histograms (diagonal)]

Are the data normal?

Ordinary least squares assumes normally distributed errors, so regression generally performs better if the data are approximately normal.

It's not uncommon for some data types to be closer to log-normal than normal, i.e., the data look normal after taking the log. For example, adult blood pressure is approximately log-normal. If that is the case, we will use the logged version.

fig, ax = plt.subplots(figsize=(16,12))
#fig.suptitle('Compare fits for data vs logged data ', y=1.05, fontsize=28)
show_html("<h1>Compare fits for data vs logged data</h1>")
plt.subplots_adjust(hspace=1)
eps = 1e-4

for i, var in enumerate(df.columns):
    ax1 = plt.subplot(len(df.columns), 2, i*2+1)
    ax2 = plt.subplot(len(df.columns), 2, i*2+2)

    #
    # Original data (no log applied)
    #
    (_osm, _osr), (_slope, _intercept, _R) = probplot(df[var])
    ax1.scatter(_osm, _osr, color='steelblue')
    ax1.plot([_osm.min(), _osm.max()], [_intercept + _slope*_osm.min(), _intercept + _slope*_osm.max()], color='seagreen')
    ax1.set(title="{}  probability plot (no log) $R^2:{:.3f}$".format(var, _R**2))

    #
    # Data with log applied
    #
    log_X_var = np.log(eps - df[var].min() + df[var])
    (_osm, _osr), (_slope, _intercept, _Rlog) = probplot(log_X_var)

    ax2.scatter(_osm, _osr, color='steelblue')
    ax2.plot([_osm.min(), _osm.max()], [_intercept + _slope*_osm.min(), _intercept + _slope*_osm.max()], color='seagreen')
    ax2.set(title="{}  probability plot (log) $R^2:{:.3f}$".format(var, _Rlog**2))

    #
    # Decide whether to use original version or the logged version
    #
    if _Rlog > _R:
        _tmpfn1, _tmpval = df_fits[var], df[var].min() # necessary to avoid recursion problem
        df_fits[var] = lambda x: _tmpfn1(np.exp(x) + _tmpval - eps) # change inverse transform
        df[var] = np.log(eps - df[var].min() + df[var])
        annotate_ax = ax2
    else:
        annotate_ax = ax1

    annotate_ax.annotate("Higher $R^2$", xy=(0.7,0.15), xytext=(0.7,0.15), textcoords='axes fraction', color='indianred', fontsize=24)

plt.tight_layout()

[Figure: Compare fits for data vs logged data. Probability plots for each variable, original (left) and logged (right), annotated with the higher R^2]

Linear Regression

Apply ordinary least-squares linear regression.

X = df[df.columns[1:]].values
y = df[df.columns[0]].values

ols = LinearRegression(fit_intercept=True, normalize=False, n_jobs=-1)
ols.fit(X, y)

show_html('''
<h1>OLS linear regression results</h1>
<table>
<tr><td>Intercept</td><td>{intercept:.3f}</td></tr>
<tr><td>Coefficients</td><td>{coef:s}</td></tr>
<tr><td>Mean squared error</td><td>{mse:.3f}</td></tr>
<tr><td>$R^2$ of the prediction</td><td>{score:.3f}</td></tr>
</table>'''.format(intercept = ols.intercept_,
            coef = ', '.join("{:s}:{:.3f}".format(df.columns[1:][i], ols.coef_[i]) for i in range(len(ols.coef_))),
            mse = np.mean((ols.predict(X) - y) ** 2),
            score = ols.score(X, y))
)

OLS linear regression results

Intercept               0.000
Coefficients            x1:0.218, x2:0.163, x3:0.126
Mean squared error      0.911
R^2 of the prediction   0.089
Regression with Stan

Next, the same regression as a Bayesian model in Stan.

# http://www.stat.columbia.edu/~gelman/presentations/stantalk2014_handout.pdf
# Instead of a gamma prior on sigma, use a half-normal, as Gelman recommends (see Stan manual "Truncated Priors")
stan_regress = """
data {
  int N;
  int K;
  vector[N] y;
  matrix[N,K] X;
}
parameters {
  real intercept;
  vector[K] betas;
  real<lower=0.01> sigma;
}
model {
  sigma ~ normal(0,1);
  y ~ normal(intercept + X*betas, sigma);
}"""

stan_data = {'X': X, 'y': y, 'N': X.shape[0], 'K': X.shape[1]}

# Fit the model
fit = pystan.stan(model_code=stan_regress, data=stan_data, iter=1000, chains=4)

print(fit)
Inference for Stan model: anon_model_fa066f9745cd16eb2f6cf993e14d51c9.
4 chains, each with iter=1000; warmup=500; thin=1;
post-warmup draws per chain=500, total post-warmup draws=2000.

            mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
intercept 4.0e-3  4.0e-3    0.1  -0.18  -0.06 5.8e-3   0.07   0.19  564.0    1.0
betas[0]    0.22  4.2e-3    0.1 7.6e-3   0.15   0.22   0.29   0.41  587.0    1.0
betas[1]    0.17  4.0e-3    0.1  -0.03    0.1   0.17   0.23   0.37  615.0    1.0
betas[2]    0.13  4.1e-3    0.1  -0.07   0.06   0.13    0.2   0.32  634.0    1.0
sigma       0.98  2.9e-3   0.07   0.85   0.94   0.98   1.03   1.13  562.0    1.0
lp__      -48.43    0.08   1.58 -52.34 -49.28 -48.16 -47.24 -46.26  427.0    1.0

Samples were drawn using NUTS(diag_e) at Mon May  4 11:15:39 2015.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).

Bayesian Regression Models (MCMC)

Bayesian regression is more flexible than OLS because we can choose priors that best represent the data. For example, a normal prior on the coefficients gives ridge-like regression and a Laplace prior gives LASSO-like regression, as in the code below.

burnin, num_samples = 50, 200
np.random.seed(1)

# Add a column for the intercept (x_0)
X_1 = np.column_stack( (np.ones(X.shape[0]),  X) )
beta_names = ["intercept"] + list(df.columns[1:]) #['beta_{}'.format(i) for i in range(X_1.shape[1])]

def regress(regression="ridge", error="normal", beta_names=beta_names, nu=5):
    with pm.Model() as model:
        #
        # Priors
        #
        if regression == "ols":
            betas = [pm.Uniform(beta, -10, 10) for beta in beta_names] # >10 is really slow...
        elif regression == "ridge":
            betas = [pm.Normal(beta, mu=0, tau=1.0) for beta in beta_names]
        elif regression == "lasso":
            betas = [pm.Laplace(beta, mu=0, b=1.0) for beta in beta_names]
        else:
            raise ValueError("No regression of type {}".format(regression))

        # prior on Normal/T error variance. Gamma(1,1) has mean=1, variance=1
        # HalfCauchy is better...
        tau = pm.HalfCauchy('tau', 1, testval=1.0)

        #
        # Likelihood
        #
        # mu ~ beta0 + beta1 * x1 + beta2 * x2 + beta3 * x3
        mu = sum(betas[n]*X_1[:,n] for n in range(len(betas)))

        if error == "normal":
            y_obs = pm.Normal('y_obs', mu=mu, tau=tau, observed=y)
        elif error == "T":
            y_obs = pm.T('y_obs', nu=nu, mu=mu, lam=tau, observed=y)
        else:
            raise ValueError("No error model of type {}".format(error))

        #
        # Sample
        #
        start = pm.find_MAP()
        print("starting trace at MAP:", start)
        for k in start:
            if start[k] == np.array(0.0) or True: start[k] += .01 # avoids some corner cases?
        step = pm.NUTS(state=start, scaling=start)
        trace = pm.sample(num_samples, step, start, progressbar=True, njobs=1)[burnin:]
        #pm.traceplot(trace)
    return trace

trace = {}
show_html("<h1>Ordinary Least Squares</h1>")
trace['ols'] = regress(regression="ols", error="normal")
print(trace['ols'])
#pm.traceplot(trace['ols'])

Ordinary Least Squares

starting trace at MAP: {'x2': array(0.16259050929728255), 'x3': array(0.12564858828139785), 'x1': array(0.21839587006704464), 'intercept': array(-1.779530779589083e-08), 'tau': array(1.0740140634338355)}
 [-----------------100%-----------------] 200 of 200 complete in 0.5 sec
<MultiTrace: 1 chains, 150 iterations, 5 variables>

show_html("<h1>Ridge Regression</h1>")
trace['ridge/T'] = regress(regression="ridge", error="T")
pm.traceplot(trace['ridge/T'])

Ridge Regression

starting trace at MAP: {'x2': array(0.1778392943710258), 'x3': array(0.08305421591573277), 'x1': array(0.22367363162561957), 'intercept': array(-0.00609829689854981), 'tau': array(1.3774522554543267)}
 [-----------------100%-----------------] 200 of 200 complete in 0.6 sec
[Figures: trace plots for the ridge regression with T-distributed errors]

show_html("<h1>LASSO</h1>")
trace['lasso/T'] = regress(regression="lasso", error="T")
pm.traceplot(trace['lasso/T'])

LASSO

starting trace at MAP: {'x2': array(0.17055991944072316), 'x3': array(0.07282629907667493), 'x1': array(0.21454246999666238), 'intercept': array(-0.0002746999774107022), 'tau': array(1.3787986641723053)}
 [-----------------100%-----------------] 200 of 200 complete in 0.6 sec
[Figures: trace plots for the LASSO regression with T-distributed errors]

#--------------------------------------------
# Compare PyMC models
#
stats, quartiles = {}, {}
for regression in trace:
    stats[regression], quartiles[regression] = {}, {}

    for var in trace[regression].varnames:
        post_quar = pm.stats._calculate_posterior_quantiles(trace[regression][var], qlist=[5,25,50,75,95])
        _key, quartiles[regression][var] = post_quar.next(), post_quar.next()

        post_stats = pm.stats._calculate_stats(trace[regression][var], batches=100, alpha=0.05)
        _key, stats[regression][var] = post_stats.next(), post_stats.next()

# Graph all models
min_beta = min(min(quartiles[regression][var].values()) for regression in trace for var in beta_names)
max_beta = max(max(quartiles[regression][var].values()) for regression in trace for var in beta_names)

rh = 20
rh2 = 20*1.5*len(beta_names)
svgw = 1000
svgh = rh2 * len(beta_names)
def _norm(val): return svgw * (val - min_beta) / (max_beta - min_beta)
def _normw(val): return svgw * val / (max_beta - min_beta)

ssvg = ''
colors = ["#DD5C5C ", "#54CD7F", "#54BFFF", "#44dd99"]
for i, beta in enumerate(beta_names):
    for j, regression in enumerate(trace):
        color = colors[j%len(colors)]

        s, q = stats[regression][beta], quartiles[regression][beta]
        w = q['q75'] - q['q25']
        xj = q['q25']
        yj = rh2*i + rh*j
        m, lo, hi = s['mean'], q['lo'], q['hi']

        ssvg += '<line x1="{x1:f}" y1="{y1:f}" x2="{x2:f}" y2="{y2:f}" stroke="#777777" stroke-width=".5"/>'.format(
            x1=_norm(lo), y1=yj+rh/2, x2=_norm(hi), y2=yj+rh/2)
        ssvg += '<rect x="{x:f}" y="{y:f}" width="{w:f}" height="{h:f}" fill="{color:s}" />'.format(
            x=_norm(xj), w=_normw(w), y=yj, h=rh, color=color)
        ssvg += '<rect x="{x:f}" y="{y:f}" width="2px" height="{h:f}" fill="#ffffff" />'.format(
            x=_norm(m), w=2, y=yj, h=rh)
        ssvg += '<text x="{x:f}" y="{y:f}" text-anchor="middle" dy="0.3em">{text}</text>'.format(
            x=_norm(xj + w/2), y=yj+rh/2, text=beta)
        ssvg += '<text x="{x:f}" y="{y:f}" text-anchor="end" dy="0.3em" fill="#ffffff" font-size="10">{text}</text>'.format(
            x=_norm(xj + w) - 3, y=yj+rh/2, text=regression)

show_html("<h1>Analysis of the difference between different regressions</h1>")
show_svg(ssvg, w=svgw, h=svgh)

[Figure: Analysis of the difference between different regressions. 25-75% posterior intervals and means for each coefficient, compared across the OLS, ridge, and LASSO models]

This GitHub repository has all of the exercises from Kruschke's Doing Bayesian Data Analysis book, translated from R to Python/PyMC3.

I transferred the code into a giant IPython notebook.


Last week, Biogen released some Phase Ib data for their Alzheimer's drug, Aducanumab (BIIB037). I don't know too much about how these announcements work, but it's bizarre to me that you can show a chart at a conference, and not the actual data, and yet the information increases Biogen's value by billions of dollars. Maybe someone has access to the raw data, but I can't find it; that sounds illegal for a public company anyway.

Since there is a lot of money at stake, there are people on Wall Street reanalyzing the data, though not publicly. I found an interesting example from Morgan Stanley here: http://www.investorvillage.com/smbd.asp?mb=2793&mn=414&pt=msg&mid=14782583

On Twitter, there was some skepticism about the data that piqued my interest. Someone said they didn't trust the data because a change in just a few patients could change the results. That sounds plausible, but it's not obvious how to quantify it. Others said the experiment was underpowered; with about 20 patients per arm of the study, that also sounds plausible.

I decided to take a look at the data to see if there is anything more quantitative to say.

# This cell is just setup
from __future__ import division, print_function
from IPython.display import Image, SVG
import numpy as np
import pandas as pd
import random
from pprint import pprint
from matplotlib import pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
SVG_HEAD = """<?xml version="1.0" standalone="no"?><!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">"""
def uprint(astr): print("{}\n".format(astr) + "-"*len(astr))
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

Parsing data from slides

I found some slides from Biogen's talk here (http://www.biospace.com/News/biogen-idec-slides-for-alzheimers-drug-wow/369620/). They look like they were taken by someone in the audience with a cellphone.

I thought the data from a slide on reduction in SUVR (a measure of plaque volume in the brain) looked the meatiest. They also showed data from MMSE and CDR-SB (paper-based tests), and a couple of other slides on plaque volume. For my purposes, I think it's reasonable just to test the assumption that the drug reduces plaque volume, not that it improves cognition.

I used Photoshop's awesome perspective crop tool to correct for the perspective on these charts. Then I used SVG to manually determine the values in the chart. Not terribly sophisticated, but I think it works as well as anything else out there; it's flexible and only a few lines of code.

IWH = (805,596) # image dimensions after crop
origin = (162,242)
totalwh = (604,238)
totaly = -0.3
bw = 46
# week, dose, num, xy, wh, se
data = [(26, 0,  34, (35,0),  (bw,5),   15),
        (26, 1,  26, (81,0),  (bw,26),  15),
        (26, 3,  27, (127,0), (bw,69),  15),
        (26, 6,  23, (173,0), (bw,112), 17),
        (26, 10, 27, (219,0), (bw,162), 16),
        (54, 0,  21, (334,0), (bw,0),   17),
        (54, 1,  21, (380,0), (bw,44),  18),
        (54, 3,  26, (426,0), (bw,110), 16),
        (54, 6,   0, (472,0), (bw,0),    0),
        (54, 10, 21, (518,0), (bw,210), 17)
        ]

def bg(xy,wh):
    return '''<rect x="{}" y="{}" width="{}" height="{}" fill="#ffff00" fill-opacity="0.05" />
        '''.format(xy[0],xy[1],wh[0],wh[1])

def box(xy,wh,sd):
    return '''<rect x="{}" y="{}" width="{}" height="{}" stroke="#ff0000" fill="none" />
        <line x1="{}" y1="{}" x2="{}" y2="{}" stroke="#00ffff" />
        '''.format(origin[0] + xy[0], origin[1] + xy[1], wh[0], wh[1],
                   origin[0] + xy[0] + wh[0]/2, origin[1] + xy[1] + wh[1] - sd,
                   origin[0] + xy[0] + wh[0]/2, origin[1] + xy[1] + wh[1] + sd)

svg = SVG_HEAD + """<svg width="{:d}px" height="{:d}px" version="1.1"
     xmlns="http://www.w3.org/2000/svg" xmlns:xlink= "http://www.w3.org/1999/xlink">
    <image xlink:href="biospace-news-biogen-idec-2.png" x="0" y="0" width="{:d}px" height="{:d}px"/>
    {} {}</svg>""".format(IWH[0], IWH[1], IWH[0], IWH[1],
                          bg(origin, totalwh), ''.join([box(dat[3],dat[4],dat[5]) for dat in data]))


SVG(svg)

Data

Now I can map these hand-drawn rectangles onto numbers, using the transparent yellow box as a reference. Unfortunately, there is no data for Week 54 / 6 mg/kg.

Based on these means and standard deviations, I can generate example values, maximizing the entropy since I want these values to represent only what I know from the chart. Here, this just means assuming a normal distribution. In reality, it's unlikely to be normal, since there are obviously important covariates like APOE e4 status.

def format_data(data):
    return "Week\tDose\tNum\tMean\tSD\n" + \
        "\n".join('{}\t{}\t{}\t{:.3f}\t{:.3f}'.format(d[0], d[1], d[2], d[3] ,d[4]) for d in data)

def transform(data):
    # convert pixel measurements to SUVR units, using the reference box (totalwh -> totaly)
    tdata = []
    for dat in data:
        y = dat[4][1] * (totaly/totalwh[1])                   # bar height -> mean change in SUVR
        sd = dat[2]**.5 * abs(dat[5] * (totaly/totalwh[1]))   # error bars are SEs; SD = SE * sqrt(n)
        tdata.append((dat[0], dat[1], dat[2], y, sd))
    return tdata

print(format_data(transform(data)))
Week        Dose    Num     Mean    SD
26  0       34      -0.006  0.110
26  1       26      -0.033  0.096
26  3       27      -0.087  0.098
26  6       23      -0.141  0.103
26  10      27      -0.204  0.105
54  0       21      -0.000  0.098
54  1       21      -0.055  0.104
54  3       26      -0.139  0.103
54  6       0       -0.000  0.000
54  10      21      -0.265  0.098
tdata = transform(data)

def _gen_random(loc, scale, size):
    while True: yield np.random.normal(loc=loc, scale=scale, size=size)

e_vals = {}
example_vals = {}
for d in tdata:
    if d[4] > 0: # sd != 0
        e_vals[(d[0],d[1])] = _gen_random(d[3], d[4], d[2])
        example_vals[(d[0],d[1])] = e_vals[(d[0],d[1])].next()

uprint("Example values generated from chart data")
pprint({k:v[:4] for k,v in example_vals.items()})
Example values generated from chart data
----------------------------------------
{(26, 0): array([-0.19925843, -0.17029101,  0.11315593, -0.01056504]),
 (26, 1): array([ 0.03167159, -0.07239358, -0.00892414,  0.07187854]),
 (26, 3): array([-0.10161264, -0.01548505, -0.02418133, -0.00441314]),
 (26, 6): array([-0.14738112, -0.29103252, -0.13790777,  0.11396582]),
 (26, 10): array([-0.11370946, -0.04119348, -0.25958697,  0.06817597]),
 (54, 0): array([ 0.12597653, -0.01024324,  0.08792991, -0.03493747]),
 (54, 1): array([ 0.01815586, -0.18449709, -0.11815812, -0.14284047]),
 (54, 3): array([-0.46736124, -0.15305449, -0.23322862, -0.20419422]),
 (54, 10): array([-0.34687603, -0.23872551, -0.33569562, -0.11804706])}

t-test

Now that I have some values, I can run t-tests and see if my p-values agree with the p-values in the chart. The chart only indicates whether or not their statistical test produced p-values of less than 0.05 (*), 0.01 (**), or 0.001 (***).

The results generally agree pretty well with the chart. In fact, the first time I ran this code, they agreed completely. Again, covariates are bound to matter here.

At the bottom of the slide it explains that the data were analyzed with an ANCOVA (like an ANOVA with covariates), with each dose of the drug compared pair-wise to the placebo to get a p-value per dose. The exact statistical test behind their p-values is not mentioned, but if it were a two-group ANCOVA, it should produce similar results to a t-test.
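To illustrate that claim, here is a minimal sketch comparing a plain two-sample t-test with a two-group ANCOVA-style linear model, using statsmodels. The data and the covariate (age) are entirely simulated for illustration; this is not Biogen's analysis.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import ttest_ind

np.random.seed(0)
n_arm = 25  # roughly the arm sizes in the slide

# simulated placebo and drug arms, plus a hypothetical covariate (age)
age = np.random.normal(70, 5, 2*n_arm)
group = np.repeat([0, 1], n_arm)  # 0 = placebo, 1 = drug
suvr_change = -0.15*group + 0.002*(age - 70) + np.random.normal(0, 0.1, 2*n_arm)
df_sim = pd.DataFrame({"suvr_change": suvr_change, "group": group, "age": age})

# plain two-sample t-test (as used for the comparisons below)
_, p_t = ttest_ind(df_sim.suvr_change[df_sim.group == 1],
                   df_sim.suvr_change[df_sim.group == 0])

# two-group "ANCOVA": linear model with a treatment indicator plus the covariate
p_ancova = smf.ols("suvr_change ~ group + age", data=df_sim).fit().pvalues["group"]

print("t-test p = {:.4f}, ANCOVA p = {:.4f}".format(p_t, p_ancova))

When the covariate is balanced between arms, the two p-values are usually close; a covariate like APOE e4 status that differs between arms could move them further apart.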

The big caveats are that I don't have access to the listed covariates and I don't know the true distribution of effect sizes.

from scipy.stats import ttest_ind
from itertools import permutations

def _stars(pval):
    stars = [(.001,"***"), (.01,"**"), (.05,"*"), (np.inf,"")]
    return next(s[1] for s in stars if pval <= s[0])

uprint("Pairwise t-test p-values")
for wk in [26,54]:
    uprint("Week {}".format(wk))

    for dose1,dose2 in ((a,b) for a,b in permutations([0,1,3,6,10], 2) if a<b):
        if wk == 54 and 6 in (dose1,dose2): continue # no data for 54/6

        _pvals = []
        for _ in range(100):
            v1, v2 = e_vals[(wk,dose1)].next(), e_vals[(wk,dose2)].next()
            _pvals.append( ttest_ind(v1, v2)[1] )

        pval = np.mean(_pvals) # arithmetic mean
        print("{} vs {}\t{:.4f}\t{}".format(dose1, dose2, pval, _stars(pval)))
Pairwise t-test p-values
------------------------
Week 26
-------
0 vs 1      0.3751
0 vs 3      0.0497  *
0 vs 6      0.0022  **
0 vs 10     0.0000  ***
1 vs 3      0.1364
1 vs 6      0.0132  *
1 vs 10     0.0000  ***
3 vs 6      0.1619
3 vs 10     0.0032  **
6 vs 10     0.1066
Week 54
-------
0 vs 1      0.1441
0 vs 3      0.0013  **
0 vs 10     0.0000  ***
1 vs 3      0.0411  *
1 vs 10     0.0000  ***
3 vs 10     0.0087  **

Robustness

How sensitive are these results to changes in the data? It's not obvious to me what the best way to test this is, especially without the real data. One simple way to test robustness is by zeroing some fraction of the data. In other words, for some fraction of patients, make it so that the drug has zero effect.

Here, I artificially zero from 0 to 100% of the plaque volume values, and plot the p-values for each dose.

from itertools import product
zeros = np.arange(0,1.01,0.05)
wks, doses = [26,54], [1,3,6,10]

pvals = {}
for wk,dose in product(wks, doses):
    if wk == 54 and dose == 6: continue

    for zero in zeros:
        pvals[(wk,dose,zero)] = []
        for _ in range(100):
            v0, vd = e_vals[(wk,0)].next(), e_vals[(wk,dose)].next()
            vd = [v if random.random()>zero else 0 for v in vd]
            pvals[(wk,dose,zero)].append( ttest_ind(v0, vd)[1] )


df = pd.DataFrame.from_dict(pvals)
color_map = {1:"indianred", 3:"darkseagreen", 6:"steelblue", 10:"purple"}

for wk in wks:
    f, ax = plt.subplots(1, figsize=(12, 4))
    plt.title("Week {}".format(wk))
    for dose in doses:
        if wk == 54 and dose == 6: continue
        _ = sns.tsplot(np.log(df[wk][dose].values), time=df[wk][dose].columns, condition=dose, color=color_map)

    plt.ylim(-10)
    plt.xlabel("fraction zeroed")
    plt.ylabel("log(pvalue)")
    plt.axhline(y=np.log(.05), linestyle='--', color="gray", label="0.05")
    _ = plt.text(1.01,np.log(.05),"p=0.05",fontsize=14,color="gray")
    plt.axhline(y=np.log(.01), linestyle='--', color="gray", label="0.01")
    _ = plt.text(1.01,np.log(.01),"p=0.01",fontsize=14,color="gray")
    plt.axhline(y=np.log(.001), linestyle='--', color="gray", label="0.001")
    _ = plt.text(1.01,np.log(.001),"p=0.001",fontsize=14,color="gray")
[Figures: log(p-value) vs fraction of responses zeroed, by dose, for Week 26 and Week 54]

Results

The results graph shows that you would need to zero out about 20-40% of the data for the results to move down a p-value category (e.g., from 0.01 to 0.05).

Another way to think about that is you would have to have 4-10 patients in an arm of the study who, instead of responding, did not respond at all.
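As a rough arithmetic check against the arm sizes parsed from the slide (the dosed arms have 21-27 patients each):

# arm sizes (n) for the dosed arms, taken from the `data` list parsed above
arm_sizes = [n for (wk, dose, n, xy, wh, se) in data if dose > 0 and n > 0]
for frac in (0.2, 0.4):
    print("{:.0%} of an arm is {:.1f}-{:.1f} patients".format(
        frac, frac*min(arm_sizes), frac*max(arm_sizes)))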

While that doesn't seem implausible, if it happened, only one of the five significant p-values in the graph would be affected. Hence, I think the results as presented are pretty robust to patients switching to non-responders.

Power and priors

Is the study underpowered? Since it's a phase Ib, it's almost by definition underpowered to detect treatment effects. However, it also depends a lot on your prior for an Alzheimer's drug working. If these data were not for Alzheimer's, but something more innocuous, it might not be called underpowered.
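For a rough sense of the numbers, here is a minimal power sketch, assuming a simple two-sample t-test with about 25 patients per arm and a range of assumed effect sizes (the real analysis was an ANCOVA with covariates, so this is only indicative):

from statsmodels.stats.power import TTestIndPower

power_calc = TTestIndPower()
for d in (0.5, 1.0, 2.0):  # assumed standardized effect sizes (Cohen's d)
    print("d={:.1f}: power = {:.2f}".format(
        d, power_calc.power(effect_size=d, nobs1=25, ratio=1.0, alpha=0.05)))

# patients per arm needed for 80% power at a modest effect size
print("n per arm for 80% power at d=0.5: {:.0f}".format(
    power_calc.solve_power(effect_size=0.5, power=0.8, alpha=0.05, ratio=1.0)))

With about 25 patients per arm, power is low for modest effects but reasonable for effects as large as the top-dose SUVR changes appear to be, so how you read the result still comes back to your prior.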

The prior could include the general lack of success of Alzheimer's drugs, but also intangibles such as potential data-dredging, unconscious biases, generally bad science (see Ioannidis), the huge amount of money at stake, and maybe (as someone suggested) even the age of the CEO.

The prior could also include investor confidence. A JPM survey gave this drug about a 35% chance of approval (https://twitter.com/TylerHCanalyst/status/578219628054347776/photo/1), where the general probability of any drug being approved after phase I is about 15% (http://pharmamarketresearchconference.com/files/Clinical_Development_Success_Rates_for_Investigational_Drugs.pdf).

Should I buy?

Obviously, I have no idea. The stock is already up about 30% since January though (when preliminary data appeared) and about 10% this week, so maybe only if you think the probability of approval is well above 35%.

Image("biib037_survey.png")
[Figure: biib037_survey.png]