Brian Naughton // Thu 12 July 2018 // Filed under biotech // Tags biotech alzheimers antibody

There have been a lot of results coming out from Alzheimer's trials recently, and a lot of discussion about the "amyloid hypothesis" and its role in the disease. In this post I'll review some of the evidence, and see how it relates to data from recent AD trial results from Merck and Biogen/Eisai. I mainly reference three good reviews that cover most of the basic facts and arguments around the amyloid hypothesis. Much of my additional data is from AlzForum, a fantastic resource for Alzheimer's news.

The basics

A simplified model of the amyloid hypothesis is that the cell-surface protein APP (Amyloid Precursor Protein) gets cleaved by BACE1 and γ-secretase and released as a 42 amino acid peptide, Aβ42; Aβ42 forms oligomers, then extracellular plaques in the brain; these oligomers and/or plaques somehow lead to intracellular Tau tangles which cause neuronal death.

One big question here is whether it's the plaques or oligomers that are the main trigger:

Several similar studies suggest that Aβ — particularly soluble oligomers of Aβ42 (Shankar et al, 2008) — can trigger AD‐type tau alterations

This model is nicely summarized by a diagram from NRD:

amyloid primer

As the diagram shows, the obvious drug targets are γ-secretase and BACE1 (to stop Aβ42 production), Aβ42 monomers/oligomers/plaques (to reduce plaque formation), Tau (to prevent Tau tangles).

There have been drugs targeting all of these processes. None have been successful:

The only approved drugs for Alzheimer's are fairly ineffectual cholinesterase inhibitors (and an accompanying NMDA receptor inhibitor). These drugs are usually thought of more as symptom relief than treatment.

Aβ42 Antibodies

Why do drug companies keep making Aβ42 antibodies after so many failures? In fact, there is quite a bit of variability in what these antibodies actually do. Ryan Watts, now CEO of Denali Therapeutics, gave an interview with AlzForum back in 2012 where he explained the difference between Genentech's crezenumab and other Aβ42 antibodies.

Q: How is crenezumab different from the other Aβ antibodies that are currently in Phase 2 and 3 trials?

A: We have a manuscript under review that describes its properties. Basically, crenezumab binds to oligomeric and fibrillar forms of Aβ with high affinity, and to monomeric Aβ with lower affinity. By comparison, solanezumab binds monomeric Aβ, and gantenerumab binds aggregated Aβ, as does bapineuzumab. Crenezumab binds all forms of the peptide. Crenezumab is engineered on an IgG4 backbone, which allows it to activate microglia just enough to promote engulfment of Aβ, but not so strongly as to induce inflammatory signaling through the p38 pathway and release of cytokines such as tumor necrosis factor α. Crenezumab is the only IgG4 anti-Aβ antibody in clinical development that I am aware of. We have not seen vasogenic edema in our Phase 1 trials, which was the first main hurdle for us to overcome.

Biogen describes the MOA of their aducanumab antibody like this:

Aducanumab is thought to target aggregated forms of beta amyloid including soluble oligomers and insoluble fibrils which can form into amyloid plaque in the brain of Alzheimer’s disease patients.

Denali Therapeutics

As an aside, Denali is not working on an Aβ42 inhibitor (perhaps for IP reasons since Ryan Watts was heavily involved in the development of crezenumab). Apart from their novel RIPK1 program, they are still pursuing BACE1 and Tau.

Our lead RIPK1 product candidate, DNL747, is a potent, selective and brain penetrant small molecule inhibitor of RIPK1 for Alzheimer’s disease and ALS. Microglia are the resident immune cells of the brain and play a significant role in neurodegeneration. RIPK1 activation in microglia results in production of a number of pro-inflammatory cytokines that can cause tissue damage.

Our three antibody programs are against known targets including aSyn, TREM2 and a bi-specific therapeutic agent against both BACE1 and Tau. Our BACE1 and Tau program is an example of combination therapy, which we believe holds significant promise in developing effective therapies in neurodegenerative diseases.

How does amyloid cause disease?

By one definition, the amyloid hypothesis "posits that the deposition of the amyloid-β peptide in the brain is a central event in Alzheimer's disease pathology". There are several ways that amyloid could cause AD. This diagram from a 2011 NRD review shows three options:

amyloid hypothesis

  • Aβ trigger: Aβ triggers the disease once it reaches a threshold, and once it starts, reducing Aβ levels does not help
  • Aβ threshold: Aβ triggers the disease once it reaches a threshold, but reducing Aβ levels back below the threshold does help
  • Aβ driver: Aβ causes Alzheimer's, and reducing Aβ levels at any time should ameliorate disease

Simplifying, if the Aβ trigger model is correct, then we don't expect anti-Aβ42 antibodies to work, except perhaps preventatively. If the Aβ driver model is correct, then these antibodies should work, at least partially.

From the same review:

A strong case can be made that the deposition of amyloid-β in the brain parenchyma is crucial for initiating the disease process, but there are no compelling data to support the view that, once initiated, the disease process is continuously driven by or requires amyloid-β deposition.

For this reason, after Aβ42 antibody trials fail, the stock answer from pharma is that they need to begin treatment earlier. Of course, the earlier you treat, the longer the trial takes, and the more you need new amyloid detection technologies like Florbetavir/PET to see what's going on. So it's probably natural that there is a gradual transition to ever earlier interventions and longer trials, even though this can also seem like excuse-making.

Evidence for the amyloid hypothesis

Despite all the failed drugs and holes in our understanding, the amyloid hypothesis remains durable due to the weight of evidence in its corner.


Mutations in APP both cause and prevent Alzheimer's. Half of people with trisomy 21 (or any APP duplication, it seems) develop AD by the time they reach their fifties.

A protective variant found in APP also points to a causal relationship, and therapeutic potential (see Robert Plenge on allelic series).

We found a coding mutation (A673T) in the APP gene that protects against Alzheimer's disease and cognitive decline in the elderly without Alzheimer's disease. This substitution is adjacent to the aspartyl protease β-site in APP, and results in an approximately 40% reduction in the formation of amyloidogenic peptides in vitro. Carriers are about 7.5 times more likely than non-carriers to reach the age of 85 without suffering major cognitive decline

A cryoEM structure of Aβ42 fibril from 2017 gives us structural evidence for why APP mutations should be protective or damaging, suggesting that APP's effect on AD is via amyloid/Aβ42. amyloid


The APOE e4 allele strongly predisposes people to Alzheimer's. It's one of the strongest genetic associations known, besides Mendelian diseases. In 2018, Yadong Huang's team at the Gladstone Institiute used iPSCs to investigate the mechanism. Confusingly, they found that APOE is independently associated with both Aβ42 and Tau.

"ApoE4 in human neurons boosted production of Aβ40 and Aβ42"

"It does not do that in mouse neurons. Independent of its effect on Aβ, ApoE4 triggered phosphorylation and mislocalization of tau."

"Based on these data, we should lower ApoE4 to treat AD"

This research may also help explain why mouse models of Alzheimer's have often been misleading.

Other evidence

  • Mutations in PSEN1 and PSEN2 (components of gamma-secretase) cause Alzheimer's.
  • Other diseases are caused by mutations in amyloid-forming proteins. For example, there is a mutation that enables IAPP to form amyloid, which causes Familial British Dementia. In ALS, the aggregated form of SOD1 may be protective and the soluble form disease-causing.

    The formation of large aggregates is in competition with trimer formation, suggesting that aggregation may be a protective mechanism against formation of toxic oligomeric intermediates.

Criticism of the amyloid hypothesis

The main criticism of the antibody hypothesis is that we have been testing anti-amyloid drugs — especially antibodies against Aβ42 — for a long time now, and none of them have had any effect on disease progression.

Derek Lowe (and many of his commenters) has written especially skeptically on his blog:

Eli Lilly remains committed to plunging through this concrete wall headfirst. [...] our gamma-secretase inhibitor completely failed in 2010. Then we took our antibody, solanezumab into Phase III trials that failed in 2012. And found out in 2013 that our beta-secretase inhibitor failed.

Morgan Sheng, VP of Neuroscience at Genentech, is much more positive. In a recent interview in NRD he said:

Let me start by saying that I fully believe in the amyloid hypothesis, and I think it’s going to be vindicated completely within years. [...] phase III results from drugs like Eli Lilly’s solanezumab suggest these agents sort of work; they just don’t work very well

It seems like targeting Tau is an acceptable strategy to amyloid hypothesis skeptics because it's not targeting Aβ42, even though it's still part of the standard amyloid hypothesis model. Drugs that are based on the "amyloid hypothesis" and drugs that work by trying to reduce amyloid tend to get conflated in a confusing way.

Evidence against the amyloid hyothesis

Here I am mainly summarizing from a 2015 review. In this review, the author mainly disputes the "linear story" of the amyloid hypothesis and not the fact that Aβ plays some kind of role in AD.

  • Many people have plaque but no disease.

    The existence of this group of individuals (healthy, but amyloid positive) is a substantial challenge to the amyloid cascade hypothesis. It is clearly possible to have amyloid deposits without dementia; therefore amyloid is not sufficient to cause disease.

    Such individuals are not rare; rather, they account for a quarter to a third of all older individuals with normal or near-normal cognitive function.

  • Anti-Aβ42 antibodies can reduce plaque without alleviating the disease.

    The second test of the amyloid cascade hypothesis has also been done: amyloid has been removed from the brains of individuals with AD and from mice with engineered familial forms of the disease. Here the tests have been less definitive and the evidence is mixed.

  • Other drugs that should work (beta-secretase inhibitors, gamma-secretase inhibitors, BACE1 inhibitors, Tau inhibitors) don't appear to work.

  • Mutations in the Tau gene can cause dementia without plaques forming, so amyloid is not a necessary step in the process.

  • We do not understand AD pathology well. For example, what are the toxic species of Aβ and Tau? What is the connection between Aβ and tangle pathology? Do Tau tangles spread between neurons like prions?

  • There are other possible causes of AD. For example, certain infections could be causative.


Recent work showing an association between herpes virus and Alzheimer's could be thought of as supporting or disputing the amyloid hypothesis. In this model, the virus "seeds" amyloid plaque formation, which then sequesters the virus. The idea that amyloid plaques are protective is not entirely new, beginning with the "bioflocculant hypothesis" for Aβ, published in 2002.

[Robinson and Bishop] posited that Aβ’s aggregative 332 properties could make it ideal for surrounding and sequestering pathogenic agents in the brain

If herpes causes AD, then we'd expect to see evidence in epidemiological datasets. Both herpes infection and periodontitis appear to be associated with AD risk. Further, antiherpetic medications appear to reduce the risk of AD. A lot more could be done here with a large database of phenotypic information, like UK biobank...

Relatedly, a Bay Area company, Cortexyme, recently raised $76M to pursue an AD drug against a bacterial protease found in plaques.

Recent news

So what about the recent trial results? There were two major trials with new results this year: Merck's BACE1 inhibitor, verubecestat, and Biogen/Eisai's anti-Aβ42 antibody, BAN2401. Meanwhile the trial design for Biogen's aducanumab is being tweaked — not a good sign generally — and there should be new data on that later this year.


After failing a Phase III in 2017 verubecestart had more bad news last month:

Treatment with verubecestat reduced the concentration of Aβ-40 and Aβ-42 in cerebrospinal fluid by 63 to 81%, which confirms that the drug had the intended action of reducing Aβ production. In the PET amyloid substudy, treatment with verubecestat reduced total brain amyloid load by a modest amount; the mean standardized uptake value ratio was reduced from 0.87 at baseline to 0.83 at week 78 in the 40-mg group. These results suggest that lowering Aβ in the cerebrospinal fluid is associated with some reduction in brain amyloid.

Notably, despite the drug working as intended, the reduction in brain amyloid was minimal. Hence, some people claim that amyloid removal has not been tested: gc tweet

Merck's Aβ42 antibody, BAN2401

New Phase II results for Merck's soluble protofibril antibody, BAN2401, were just released in July 2018. The results were hotly disputed because while the Bayesian analysis failed to show an effect, an alternative p-value based analysis (ANCOVA) showed positive results. I don't know exactly what the differences between the analyses were, but generally you would hope for agreement between the two, unless the effect was pretty marginal or just not real. The data pulled out in the tweet below shows how strange this situation is. Merck tweet

Given the ambivalent nature of the result, naturally some saw it as positive news, since there was at least something, while skeptics saw the opposite.


Aducanumab often seems like the most promising anti-Aβ antibody, and maybe the last chance for anti-Aβ antibodies to prove themselves. Back in 2015, Aducanumab showed some promising Phase Ib results. (I even wrote about it).

“They’re the most striking data we have seen with anything, period,” says [an AD trialist]

However, since then the many related trial failures, plus Biogen changing the trial design due to "variability", have left many people pessimistic. Perhaps BAN2401's recent results, however unsatisfying, show that an Aβ inhibitor is not just doomed to show no effect.


There doesn't actually seem to be much controversy about whether amyloid has a role in Alzheimer's; the genetic evidence is especially hard to dispute. I think the disagreement is more whether reducing Aβ plaques (or oligomers) can treat or prevent Alzheimer's. If the plaque is protective, then it's possible that reducing plaque may even worsen the disease.

There are also still plenty of unanswered disease mechanism questions, like whether it's oligomers or plaques that are causative, how Tau tangles cause neuronal death, and how tangles spread from neuron to neuron. Also, a 2018 paper suggests that Tau's function is the opposite of what we thought: instead of stabilizing microtubules, it keeps them dynamic.

One obvious question is why are there not more Tau-based drugs? Tau pathology is not a new idea and Tau's causal relationship with dementia is one of the least controversial parts of the AD story. In fact, there are now at least five Phase I trials underway, so these drugs might just be lagging behind Aβ42 antibodies by a few years. Certainly, Tau tangles being intracellular and in the brain makes drug development more complicated.

"Anti-tau antibodies don’t enter neurons and they don’t bind intracellular tau. We’ve invested a lot of careful rigorous work to try and understand this and I hope that the field will agree that we can put to rest that question"

(Crossing the blood-brain barrier is a problem for almost all AD drugs and especially antibodies — an interesting rule of thumb is that about 0.1% of antibody gets into the brain.)

Despite all the failures, I think the story is coming together and I'm pretty optimistic. We haven't actually tried that many ways of attacking the disease. I think that reducing plaque and/or oligomers very early could still work — mainly because we have seen the "drug" APP A673T working — meanwhile, reducing Tau tangles is arguably the most promising avenue of intervention, and it is yet to be properly tested.

Brian Naughton // Tue 17 October 2017 // Filed under genomics // Tags bioinformatics genomics programming

This small project started when I was looking for an implementation of Needleman-Wunsch (pairwise global DNA alignment) in javascript. I just wanted to align two sequences on a website and in a google sheet (using Google Apps Script).

I looked around for a simple javascript implementation (preferably one self-contained javascript file) but I was surprised that I couldn't find one. Needleman-Wunsch is a pretty simple algorithm so I decided to implement it. I did get a little bit side-tracked...

The first step was to find someone else's implementation to copy, so I started with some numpy code from @brent_p. Based on his other work, I think it's a safe assumption it's implemented correctly. (There is also a complete Cython version in this repo, which implements gap_extend and other parameters, and is obviously much faster. I really just need a very basic alignment so the simpler numpy version is fine for me).

numpy and friends

There are lots of ways to tweak the numpy implementation of Needleman-Wunsch to try to make it faster. Here are the things I tried:

  1. orig: the original numpy implementation.
  2. orig3: the original numpy implementation run with python 3.
    This is just to test how much faster or slower Python 3.6 is than 2.7.
  3. numpy: my numpy implementation.
    This is like the original numpy code, but modified a bit to make it more like my code.
  4. numba: my numpy implementation, but with numba applied.
    Numba is a pretty amazing JIT compiler you can turn on by adding one line of code. It comes with anaconda, and it's always worth trying just in case.
  5. torch: my numpy implementation, but with numpy replaced with PyTorch.
    PyTorch seems like a friendly alternative to TensorFlow, especially with its numpy-like syntax. Without explicitly applying .cuda() to my arrays it just uses the CPU, so it should not be too different to regular numpy.
  6. torchcuda: my numpy implementation, but with numpy replaced with PyTorch, and .cuda() applied to each array.
    The same as torch except using the GPU.
  7. cupy: my numpy implementation, but with numpy replaced with CuPy.
    CuPy is a drop-in replacement for numpy and, like PyTorch, only requires changing a couple of lines.


Nim is an interesting language that can compile to C (nimc) or javascript (nimjs). I thought this was a pretty good use-case for nim since I need javascript but writing scientific code in javascript is not fun. I started with a numpy-like library called arraymancer, which worked well, but since it relies on BLAS it would not compile to javascript (I could have checked that earlier...) Luckily, changing the code to simple for loops was pretty easy. Nim's syntax is a lot like Python, with some unnecessary differences like using echo instead of print. As someone used to Python, I didn't find it to be as friendly as I expected. The dream of Python-with-static-types is still a dream...


Finally, I just programmed the alignment in javascript (js). All of the implementations are almost line-for-line identical, so this did not take long.

Speed comparison

I ran all the above implementations on some random DNA of various lengths and the results are plotted below.

import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
data = {'orig': {500: 2.23, 1000: 8.50, 1500: 18.96, 2000: 33.45, 2500: 52.70, 3000: 76.44, 5000: 209.90}, 
        'orig3': {500: 2.62, 1000: 10.34, 1500: 22.68, 2000: 40.22, 2500: 62.03, 3000: 90.15, 5000: 248.93}, 
        'numpy': {500: 1.54, 1000: 3.41, 1500: 6.27, 2000: 10.60, 2500: 16.11, 3000: 22.87, 5000: 67.45}, 
        'numba': {500: 5.54, 1000: 7.05, 1500: 9.28, 2000: 13.08, 2500: 17.40, 3000: 22.21, 5000: 56.69}, 
        'torch': {500: 1.67, 1000: 3.92, 1500: 7.61, 2000: 12.86, 2500: 19.55, 3000: 27.48, 5000: 82.90}, 
        'torchcuda': {500: 8.54, 1000: 22.47, 1500: 46.26, 2000: 80.12, 2500: 119.92, 3000: 169.95, 5000: 467.04}, 
        'cupy': {500: 35.71, 1000: 138.86, 1500: 951.97, 2000: 1713.57, 2500: 2660.11, 3000: 3798.51}, 
        'nimc': {500: 0.016, 1000: 0.041, 1500: 0.08, 2000: 0.13, 2500: 0.20, 3000: 0.31, 5000: 0.85}, 
        'nimjs': {500: 0.14, 1000: 0.28, 1500: 0.48, 2000: 0.75, 2500: 1.12, 3000: 1.53, 5000: 4.06}, 
        'js': {500: 0.09, 1000: 0.14, 1500: 0.20, 2000: 0.34, 2500: 0.41, 3000: 0.82, 5000: 1.64}}
vfast_ones = ["nimjs", "js", "nimc"]
fast_ones = ["torch", "numpy", "numba"] + vfast_ones
ok_ones = ["torchcuda", "orig3", "orig"] + fast_ones
df = pd.DataFrame(data)
cupy js nimc nimjs numba numpy orig orig3 torch torchcuda
500 35.71 0.09 0.01 0.13 5.54 1.54 2.23 2.62 1.67 8.54
1000 138.86 0.13 0.04 0.28 7.05 3.41 8.50 10.34 3.92 22.47
1500 951.97 0.20 0.08 0.48 9.28 6.27 18.96 22.68 7.61 46.26
2000 1713.57 0.34 0.13 0.75 13.08 10.60 33.45 40.22 12.86 80.12
2500 2660.11 0.41 0.20 1.12 17.49 16.11 52.70 62.03 19.55 119.92
3000 3798.51 0.82 0.31 1.53 22.21 22.87 76.44 90.15 27.48 169.95
5000 NaN 1.64 0.85 4.06 56.69 67.45 209.90 248.93 82.90 467.04

I'll skip cupy since it's much slower than everything else and throws the plots off. That doesn't imply anything negative about cupy and I'd use it again. It's extremely easy to replace numpy with cupy, and for properly vectorized code I'm sure it's much faster than numpy.

f,ax = plt.subplots(figsize=(16,12))
ax.set_title("fast: everything except cupy")
_ = df[ok_ones].plot(ax=ax)


f,ax = plt.subplots(figsize=(16,12))
ax.set_title("faster: numpy vs C vs js")
_ = df[fast_ones].plot(ax=ax)


f,ax = plt.subplots(figsize=(16,12))
ax.set_title("fastest: C vs js")
_ = df[vfast_ones].plot(ax=ax)



I learned some interesting things here...

  • numba is good. It didn't speed this code up very much, but it was a bit faster than numpy for large alignments and didn't cost anything. I expected this to be the fastest Python-based code because there are several Python for loops (i.e., unvectorized code), which is where numba can help a lot.

  • I'm not sure why my numpy is faster than the original numpy since my changes were minimal. The original version is not coded for speed anyway.

  • GPUs don't help unless your code is written for GPUs. That basically means one repetitive task handed off to the GPU along with the data (no back-and-forth). There are ways to implement Needleman-Wunsch in a GPU-friendly way, but it complicates the code a lot. On the one hand this is a very obvious result to anyone who has used GPUs for computation — on the other hand, maybe a really smart compiler could use the CPU where appropriate and GPU where appropriate...

  • Nim is a pretty interesting language. I have seen it described as either "an easier C" or a "statically typed Python". To me it's definitely more like the former. It's not all that friendly compared to Python, but I think I'd try it again as a C/Cython replacement. Don't forget to compile with -d:release.

  • Javascript is fast! If nim is not compiled with -d:release it's even faster than nim's C code. Sadly, Google Apps Scripts' javascript is extremely slow for some reason. That was an unfortunate surprise, especially since it times out after about five minutes, so long alignments just fail! I can't explain why it's so slow...

Finally, just to note that this implementation is good enough for my purposes, but I haven't really spent any time making sure it works in all situations (apart from affirming that its output is the same as the original numpy code), so I wouldn't trust it too much. The code is available in this repo.

Brian Naughton // Fri 22 September 2017 // Filed under stats // Tags stats probability bayesianism maxent
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation, rc
from IPython.display import HTML, Image

rc('animation', html='html5')'ggplot')
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

Like many scientists, I've always been confused by and uncomfortable with the menagerie of available statistical tests, how they relate, and when to use them ("use a Chi-square test, unless n<5..."). It's hard to recover the logic underlying diagrams like the one below. Many of these tests rely on the data "being normal", but why? And how normal do they have to be?

statistical tests flowchart

Normal-looking distributions

As we learn, the normal — or Gaussian — results from the central limit theorem, when you add together random samples from the same distribution. There are some conditions, like the samples must be independent and the distribution must have finite variance.

If we know the underlying process producing the samples meets these conditions, then we can use the normal distribution with confidence. We also learn that even if the above conditions are not met, it's probably ok to just use the normal anyway (e.g., multipying values instead of adding, samples not from the same distribution), or we can apply a transformation to make the distribution "look normal". Then once we have a normal-looking distribution, we can use a standard \(t\)-test, etc. To many of us, this is not a hugely satisfying way to apply statistics, and it definitely does not feel very scientific.

Additionally, the normal doesn't seem particularly differentiated from other distributions. The binomial, Poisson, normal and t-distributions all look very similar... In fact, they can look very very similar, as we can see below. (Of course, the first three are members of the exponential family, so they are not unrelated, but I'll leave that aside.)

fig, ax = plt.subplots(1, 4, figsize=(16,5))

x = np.linspace(0, 30, 100)
_ = ax[0].plot(x, stats.norm(10, 3).pdf(x), 'r-', lw=3, alpha=0.6)

x = np.arange(30)
_ = ax[1].plot(x, stats.poisson(10).pmf(x), 'r-', lw=3, alpha=0.6)

x = np.arange(30)
_ = ax[2].plot(x, stats.binom(30, 1/3).pmf(x), 'r-', lw=3, alpha=0.6)

x = np.arange(30)
ax[3].set_title("t (df=10)")
_ = ax[3].plot(x, stats.t(10, 10, 3).pdf(x), 'r-', lw=3, alpha=0.6)


It turns out that these distributions are indeed all closely related, and you can derive them and understand how they are related using relatively simple (high school-ish level) maths. The trick is that you have to use the principle of "maximum entropy".

What is entropy?

I won't attempt to define entropy rigorously, since I am not a statistician and definitions are fiddly things, but I think it suffices to think of a maximum entropy distribution as the smoothest, most even distribution that still fits with some constraints. In other words, the distribution should not prefer any value in particular unless forced to do so to meet its constraints.

In the simplest example, the maximum entropy distribution bounded by two finite values is just a flat (uniform) distribution (the "principle of indifference").

You can calculate the entropy of a distribution using \(H(X) = -\sum_{k\geq1}p_k log(p_k)\); or for continuous distributions: \(H(X) = -\int_{-\infty}^{\infty}p(x) \log p(x)dx\). The concept of entropy is fundamental across several fields including information theory and statistical mechanics, so these formulas may look familiar.

Because we can measure the entropy of any distribution, we can define some constraints (e.g., a distribution has bounds 0 to 1, the mean is \(\mu\), etc), and derive the maximum entropy distribution given those constraints, using differentiation to find the maximum and Lagrange multipliers to enforce the constraints.

For a proper description and derivation, see Data Analysis, A Bayesian Tutorial by Sivia & Skilling (Section 5.3). That is where I first learned about this, and it is still the best treatment I have seen. Statistical Rethinking by McElreath (Section 9.1) also discusses this topic; Information Theory by MacKay (Section 22.13) (a free book) discusses it in passing, and I'm sure a lot of other books do too.

I will just give the gist, but hopefully it's near enough correct.

Deriving the normal distribution

To derive the normal distribution, we start with this formula (Sivia & Skilling, Section 5.3.2):

Sivia & Skilling, 5.3.2

It looks complicated, but it's really just the entropy formula with two Lagrange multipliers (and three constraints): the sum of \(p\)'s is \(1\) (because it's a probability distribution), the mean is \(\mu\), and the variance is \(\sigma^2\). If we take the derivative to find the maximum entropy — by setting \({\partial}Q/{\partial}p = 0\) — out pops the normal distribution in just a couple of steps (see the book for details!)

So, the normal distribution is the "fewest-assumptions" distribution there is if you have only a mean and a variance, and that is partially why it's so broadly applicable! That also explains (I think) why it's generally ok to use the normal distribution even when its conditions (i.i.d. samples, finite variance, etc) have not been met. It's about as general a distribution as you can get.

Other distributions

Sivia derives a few other fundamental distributions using maximum entropy:

  • Exponential: this is derived identically to the normal, but with no variance constraint (just the mean).
  • Binomial: this is derived using similar methods, but starting with the formula for \({n \choose k}\).
  • Poisson: this is derived similarly to the binomial, though it also requires a Taylor expansion.

Sivia also draws some interesting connections between the distributions:

  • The binomial can be approximated by the Poisson, for large \(N\) and small \(p\).
  • The Poisson can be approximated by the normal, for large \(\mu\), using Stirling's approximation.

I think this sums up pretty nicely how these common distributions are related and why they all look so similar.

Which distributions correspond to which constraints?

There is a nice wikipedia article on maximum entropy probability distributions, with a table of twenty or so distributions and the associated constraints (below are just a subset):

maximum entropy distributions

Interestingly, sometimes apparently simple constraints (e.g., bounds are \(a\) to \(b\), mean is \(\mu\)) can produce complicated answers.

The fat-tailed \(t\)-distribution

The \(t\)-distribution, of \(t\)-test fame, is not covered in the above section of Sivia & Skilling, though it can be derived using maximum entropy too. One description of the \(t\)-distribution that shows its connection to the normal is that it represents samples from a normal distribution of uncertain variance (hence its common description as "like the normal distribution with fatter tails"). As the number of degrees of freedom grows, the \(t\)-distribution approaches the normal distribution. At \(\geq10\) degrees of freedom, the \(t\) and normal distributions are difficult to distinguish (see my plots at the start of this post).

Another interesting thing about the \(t\)-distribution versus the normal is that MacKay claims that the normal distribution is not really a good "real world" distribution, since its tails are too light (Information Theory, Section 23.2). For example, fatter tails are more forgiving to outliers, and the natural world has a lot of outliers: height is almost the canonical example of a normally distributed trait, but dwarfism and gigantism produce heights more standard deviations from the mean than a normal distribution expects.

For any natural data, it might actually be better overall to use a \(t\)-distribution or other fatter tailed distribution. Of course, that makes it more difficult to solve Ordinary Least Squares and so on, but we have powerful computers now that can solve these things numerically, so maybe it should not matter? Similarly, when choosing priors, experts will often recommend fatter tailed distributions, like the \(t\)-distribution or the related half-Cauchy over the normal, especially for scale parameters (e.g., standard deviation).


Many of the common distributions we come across in science can be understood from the perspective of deriving the maximum entropy distribution subject to some constraints. Relatedly, some of the most fundamental distributions can be understood as approximations of each other under different conditions. This perspective on probability distributions was not at all obvious to me for a long time, but I wish probability and statistics were taught more in this style instead of the "cookbook" style.

With the growth in probabilistic programming (Stan, PyMC3, Edward), sometimes replacing standard statistical tests, it is becoming more important to think about which distributions really represent your data and why. It's not always normal.


It's not so difficult to prove that the maximum entropy distribution is uniform. Nevertheless here's a simulation. The highest entropy distribution from this simulation looks uniform, and the lowest entropy distribution looks very skewed.

min_ent, max_ent = np.inf, -np.inf
N = 11

for _ in range(10000):
    vals = np.zeros(N)
    for _ in range(500):
        vals[np.random.randint(N)] += 1
    vals /= sum(vals)

    ent = (-vals*np.log(vals)).sum()

    if ent > max_ent: 
        max_vals = vals[:]
        max_ent = ent
    if ent < min_ent: 
        min_vals = vals[:]
        min_ent = ent

fig, ax = plt.subplots(1, 2, figsize=(16,5))
ax[0].set_ylim([0, .2])
ax[1].set_ylim([0, .2])
ax[0].set_title("highest entropy ({:.2f})".format(max_ent))
ax[1].set_title("lowest entropy ({:.2f})".format(min_ent))

x = np.linspace(0, 10, N)
_ = ax[0].plot(x, max_vals)
_ = ax[1].plot(x, min_vals)


The beta distribution is also a maximum entropy distribution bounded by 0 and 1. It also starts to look pretty normal after a few samples. (Thanks to the blog for instructions on how to animate it).

ps = np.linspace(0, 1, 101)
n_start, k_start = 0, 0

def animate(i):
    if i > 90:   j = 0
    elif i > 50: j = 90 - i
    elif i > 40: j = 40
    else:        j = i

    n = j*2 + n_start
    k = j + k_start

    y = (ps**k)*((1-ps)**(n-k))
    y /= max(y)
    line.set_data(ps, y)
    line.set_label("heads:{:2d} flips:{:2d}".format(j,n))
    legend = plt.legend(prop={'family': 'monospace'})
    return (line,)

def init():
    line.set_data([], [])
    return (line,)

fig, ax = plt.subplots(figsize=(8,4))
ax.set_xlim((0, 1))
ax.set_ylim((0, 1.15))
line, = ax.plot([], [], lw=2)
anim = animation.FuncAnimation(fig, animate, init_func=init, frames=100, interval=100, blit=True)
Brian Naughton // Mon 11 September 2017 // Filed under biotech // Tags biotech vc

A brief look at Y Combinator's biotech investments in 2017.

Read More
Brian Naughton // Tue 27 June 2017 // Filed under biotech // Tags biotech drug development

Some notes on drug development: classes of drug targets and therapies.

Read More
Brian Naughton // Mon 06 February 2017 // Filed under biotech // Tags biotech transcriptic snakemake synthetic biology

How to automate protein synthesis pipeline using transcriptic and snakemake

Read More
Brian Naughton // Thu 26 January 2017 // Filed under biotech // Tags biotech iolt nodemcu arduino

An internet-connected lab thermometer

Read More
Brian Naughton // Mon 10 October 2016 // Filed under genomics // Tags genomics nanopore

What's been going on with Oxford Nanopore in 2016

Read More
Brian Naughton // Sun 14 August 2016 // Filed under data // Tags data genomics statistics pymc3 nanopore

Estimating parameters for a toy nanopore simulator using a hierarchical Bayesian model in PyMC3

Read More
Brian Naughton // Sun 05 June 2016 // Filed under biotech // Tags data biotech genomics statistics

A review of interesting things in biotech, genomics, data analysis

Read More

Boolean Biotech © Brian Naughton Powered by Pelican and Twitter Bootstrap. Icons by Font Awesome and Font Awesome More