Brian Naughton // Mon 11 September 2017 // Filed under biotech // Tags biotech vc

Y Combinator recently announced that they want to do more biotech, specifically "health and synthetic biologies". This seems like a good thing in general, since there aren't too many incubators for biotech out there. IndieBio is the biggest, but they are completely focused on biotech, even providing lab space.

So what are YC actually investing in? Here are the biotech companies from the 2017 Winter and Summer batches (data from techcrunch: 1 2 3 4):

  • Darmiyan: Alzheimer's diagnostic
    We definitely need better Alzheimer's diagnostics, so this seems like a great thing. Usually these diagnostics are designed to help enroll prodromal patients in clinical trials (e.g., Avid), but this one seems to be for screening too, at least according to techcrunch. It's unclear to me what their technology is, though it appears to be MRI-based.
  • HelpWear: heart attack-detecting wearable
    This watch monitors heart palpitations, arrhythmias and heart attacks. I'd buy one if I were at risk... It will enter clinical trials in the "near future."
  • Oncobox: cancer genetic test
    This test appears to match cancer drugs to patients, so I guess it's similar to Foundation Medicine, though with "full DNA and RNA profiles". Usually, it's pretty expensive to do anything novel like this, since you have to convince doctors that it makes sense. So you need to fund a trial, or ten.
  • Forever Labs: autologous stem cells
    Pretty cool, but reminiscent of the whole cord blood thing. It also reminds me of the companies you can send your surplus teeth to (there are several!), which is an odd, but noninvasive, way to get stem cells from kids.
  • Cambridge Cancer Genomics: "next gen liquid biopsy, AI, smart genomics"
    This is quite a few buzzwords, especially for a British company. They say they are "applying our proprietary analysis to a tumour's genomic features", to help guide treatment. The team certainly looks solid, but similarly to Oncobox, they will likely need a bunch of money to prove this works.
  • Modern Fertility: at-home fertility test
    These guys appear to be packaging a panel of useful fertility tests for home use. I don't think they have to invent anything new here, which is probably a good thing. It's "physician-ordered", which avoids FDA involvement (and is also a good phrase to search for in "DTC" genetic tests...)
  • BillionToOne: NIPT for developing countries
    (Disclaimer: I know these guys). This one makes sense to me. NIPT is a great technology that should be brought to every country.
  • PreDxion: blood test for the ER
    This appears to be a POC blood test. I don't doubt the need for new tests like this, but they'll need to go through FDA, which is a long road.
  • Clear Genetics: automated genetic counseling
    Automating genetic counseling is necessary because there are only a couple of thousand genetic counselors in the US and the tests are getting more common and more complex. It's a bit hard to believe genetic counseling is a $5B market in the US though, since that implies $2M+ revenue per genetic counselor.
  • Delee: a circulating tumor cell diagnostic
    I thought CTCs were done after On-Q-Ity but it's been a while and there's probably something new and interesting to do here. They've completed a small trial already, which is great.
  • AlemHealth: radiology imaging diagnostic
    Like BillionToOne, AlemHealth is bringing a known-useful technology to emerging markets. Perhaps surprisingly, I believe this kind of global radiology outsourcing is already common in the US (one large healthcare system sends images to Brazil and India, apparently).
  • Indee: CRISPR research tool
    This is apparently for "developing and manufacturing" cell therapies by gene editing. It sounds novel, cell therapies are here for the long-term, and there are dozens of CAR-T companies around to pay for it, so it could be cool.
  • InnaMed: home blood-testing device
    This seems to be a cartridge-based blood diagnostic, kind of like Cepheid, except for home use. I don't know if doctors like to bill for these visits or not, which can kill adoption, but if InnaMed can pull it off it seems like a great thing... Like PreDxion, they'll need FDA clearance.
  • Volt Health: electrical stimulation medical device
    This seems to be a topical neurostimulation device to treat incontinence. It looks like one of those electrical muscle stimulation belts for your abs. I instinctively like this because incontinence is one of those massive problems nobody wants to work on. Maybe the trial will be inexpensive too, since the risks seem low.
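
As a quick sanity check on the Clear Genetics market-size comment above, here is the back-of-envelope arithmetic. The counselor count of 2,000 is my reading of "a couple of thousand", not a sourced figure.

```python
# Back-of-envelope check: what a $5B market implies per genetic counselor.
# ASSUMPTION: "a couple of thousand" counselors is taken as 2,000.
MARKET_USD = 5_000_000_000
ASSUMED_COUNSELORS = 2_000


def revenue_per_counselor(market_usd, counselors):
    """Implied annual revenue per counselor if counselors captured the whole market."""
    return market_usd / counselors


def counselors_needed(market_usd, revenue_each_usd):
    """How many counselors the market would support at a given revenue each."""
    return market_usd / revenue_each_usd


per_counselor = revenue_per_counselor(MARKET_USD, ASSUMED_COUNSELORS)
print(f"${per_counselor:,.0f} per counselor")
print(f"{counselors_needed(MARKET_USD, 2_000_000):,.0f} counselors at $2M each")
```

With 2,000 counselors the implied figure is $2.5M each, which is where the "$2M+" comes from.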

My sole criterion for a company to make the list was that it may involve FDA, CLIA/CAP, or similar. There may be omissions, since I just skimmed through the techcrunch articles.

In 2015, 18 out of 108 YC companies were labeled "biomedical". My criteria are stricter, and I count 14 out of 210 companies from 2017 as "biotech". This appears to be lower than — or at least not significantly higher than — the number in 2015, somewhat contradicting the YC quote.
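
To make the comparison concrete, here are the two shares computed from the counts above. Keep in mind the two years use different labels (YC's "biomedical" versus my stricter "biotech"), so this is only a rough comparison.

```python
# Share of each batch year that counts as biotech, using the counts quoted above.
# The 2015 and 2017 labels are not the same definition, so treat this as rough.
BATCHES = {
    2015: (18, 108),  # YC's "biomedical" label
    2017: (14, 210),  # my stricter "biotech" count
}


def share_percent(biotech, total):
    """Percentage of companies in a year that are biotech."""
    return 100 * biotech / total


for year, (biotech, total) in BATCHES.items():
    print(f"{year}: {biotech}/{total} = {share_percent(biotech, total):.1f}%")
```

So both the absolute count and the share are lower in 2017, by my count.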

Interestingly, the 2015 batch were also much more computation- and therapeutics-focused, including 20n, atomwise, Transcriptic and Notable Labs. Synbio leader Ginkgo was funded in 2014. 11 of the 14 biotechs from 2017 are diagnostics (including Clear Genetics), one is a therapy / medical device (Volt), one a research tool (Indee), and one a service I can't classify (Forever Labs).
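
For reference, this is the tally behind those numbers, written out as a small lookup table. The category assigned to each company is just my own rough classification from the list above.

```python
from collections import Counter

# My rough classification of the 14 companies listed above.
CATEGORY = {
    "Darmiyan": "diagnostic",
    "HelpWear": "diagnostic",
    "Oncobox": "diagnostic",
    "Forever Labs": "unclassified service",
    "Cambridge Cancer Genomics": "diagnostic",
    "Modern Fertility": "diagnostic",
    "BillionToOne": "diagnostic",
    "PreDxion": "diagnostic",
    "Clear Genetics": "diagnostic",
    "Delee": "diagnostic",
    "AlemHealth": "diagnostic",
    "Indee": "research tool",
    "InnaMed": "diagnostic",
    "Volt Health": "therapy / medical device",
}

tally = Counter(CATEGORY.values())
for category, count in tally.most_common():
    print(f"{count:2d}  {category}")
```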

It's a curious set of companies, surprisingly light on computation- or data-driven companies, which you'd think would be YC's strength. Also notably, I don't see any synthetic biology companies (arguably Indee?) and only Volt Health is therapeutic. By contrast, IndieBio has many (at least half?) synthetic biology companies, and several therapeutics.

Because the list is so diagnostics-focused, many of the companies on this list will need expensive trials to properly enter their markets. Perhaps the YC program is setting them up for a raise from traditional biotech VCs? I don't yet see what YC wants to do in biotech, but their statement about doing more is only a few months old, so it will be interesting to see what happens in 2018.

Brian Naughton // Tue 27 June 2017 // Filed under biotech // Tags biotech drug development

I looked around for a broad review of this area, discussing classes of drug targets and therapies, but didn't really find anything. These are some notes I made in an effort to help understand the landscape.

Drug target types

Every drug has a "target", a biomolecule that the drug affects in humans. There are only a few options for targets.

  • Underexpressed protein
    Sometimes, a protein is lacking and needs to be upregulated or replaced. The solution, for extracellular proteins, can be as simple as injecting the missing protein. This may sound like it's an obscure element of drug development, but it's basically the foundation of biotech. Genzyme started out replacing the missing enzymes in Gaucher and Fabry disease; Genentech got their start with recombinant insulin, an underexpressed peptide; Amgen's first big drug was recombinant G-CSF (neupogen).

  • Overexpressed protein
    Sometimes, you are making too much protein and you need to dial it down. Many cancer targets are like this (e.g., HER2 amplification), though some are mutated too. This is one of the easiest scenarios for drug development, because all you have to do is knock the protein down with high specificity. It's much easier to break something than fix it.

  • Malfunctioning protein
    If you have a genetic disease, there's a good chance it's due to an absent or malfunctioning protein. Cystic fibrosis has over a thousand known causative mutations and most of these mutations cause misfolding. How do you fix a misfolded protein? It's always difficult and sometimes impossible. Vertex's CF drugs represent one unusual success story. Their latest drug, Orkambi, is actually two drugs in one: one that increases CFTR's activity (Kalydeco, a drug originally for the G551D mutation only), and another that helps the mutated CFTR fold properly.

  • Non-proteins
    There are things in the body besides proteins, for example, hormones, lipids, DNA. These are much less common targets for therapies, though amazingly gene therapy is starting to become feasible. (There are now two human gene therapies approved in Europe, and several veterinary gene therapies). According to a 2016 review in NRDD, FDA-approved drugs targeted non-proteins in only 28/695 cases.

Choice of Therapy

There is a plethora of options these days.

  • Small molecules
    • Chemical
      This is what most people mean by a drug. You use chemistry to make them: perhaps you have a large library of randomly generated chemicals that you screen against a target, or perhaps you design your drug and simulate target binding computationally. Computer-aided drug design (CADD) works sometimes and is improving quickly, partially thanks to deep learning.
    • Biological
      Unlike chemically-derived small molecules, you don't create these, you find them in organisms in the soil, the ocean, etc. These molecules are generally more complex structurally than regular small molecules, since they've had millions of years to evolve that complexity. Many drugs come from natural sources, especially antibiotics; the main issue is how to find them. We have already found much of the low-hanging fruit, like penicillin, many times over.
    • DNA-encoded
      It's obvious that you can use DNA to encode and evolve proteins. Less obviously, you can encode small molecules too, by chemically fusing DNA barcodes to your chemicals. These libraries are interesting because just like proteins, you can evolve them, but you retain the flexible chemistry of small molecules. DiCE Molecules, which launched last year, is doing this.
  • Proteins
    • Recombinant
      You can make almost any protein recombinantly in a microbe, then use that to replace a missing (extracellular) enzyme. Proteins are big, and get degraded quickly in the gut by proteases, so you usually have to inject them. (If ingesting proteins had drug-like effects, then eating would be more hazardous!) They are also usually too large to enter cells efficiently, further limiting their use as drugs.
    • Antibodies
      Antibodies are proteins, but with a specific template for interfacing with the immune system. They are very large and cannot enter cells, so they only target cell-surface or extracellular targets. That works extremely well for the 10-20% of proteins that are accessible that way. Antibodies are our closest thing to a magic bullet, and are among the most successful drugs of all time. You can evolve antibodies in mice, use phage display, or identify them in humans. There are several varieties: antibody fragments, nanobodies, BiTEs, etc.
    • Peptides
      Peptides are basically just short proteins. They are interesting potential drugs because they exhibit some of the properties of small molecules (small size, sometimes able to enter cells) and some of the properties of proteins (safe, easy to synthesize). Peptides are still rapidly degraded in the gut, so most are injected. Some of the limitations of peptides being proteins can be overcome by using peptidomimetics like D-peptides.
    • Peptides (DNA-encoded)
      Peptides are generally encoded by DNA anyway, but there is a nice way to evolve a library of peptides, analogous to DNA-encoded small molecules: the decades-old-but-still-futuristic phage display. You can use phage display to select for antibody fragment binding too.
    • Protein + nucleotide
      CRISPR/Cas9 is a gene editing technology combining an enzyme that cuts DNA (Cas9) and a guide RNA that defines the target. Cas9 is a sophisticated enzyme, so it is quite large and cannot enter cells unaided. That means to make Cas9 into a real therapy you need a way to deliver it. Older gene editing technologies like TALENs and Zinc Fingers also work well if they can be delivered, though they are much harder to program than Cas9.
    • Protein + small molecule
      Antibody-drug conjugates promise to combine the advantages of antibodies (targeting cells with high specificity) with small molecules (ability to enter cells). The canonical example is an antibody designed to find a cancer cell, then deliver an attached "warhead" small molecule to kill it. Stemcentrx is one of many companies working on ADCs.
  • Nucleotides
    • mRNA
      Instead of delivering proteins to cells, why not just deliver the instructions to make the protein? Unfortunately, just like proteins get degraded by proteases, RNA gets degraded by nucleases, so again it comes down to delivery. Like proteins, the body tolerates nucleotides very well. Moderna is developing several mRNA-based therapies.
    • Antisense RNA
      Antisense RNA binds to mRNA and interferes with translation. There have been two successful antisense RNA therapies recently: one for Crohn's (mongersen, which can be orally administered, since Crohn's is a disease of the gut), and one for SMA (nusinersen, which is injected into the CSF). Despite not being the coolest technology around, antisense RNA has had some amazing successes.
    • Aptamers
      Like antibodies, aptamers are evolved to bind their targets with high specificity, but instead of amino acids, they are made from DNA or RNA. Aptamers can be evolved using a simple process called SELEX (unfortunately IP-encumbered). As nucleotides, aptamers are degraded quickly and require special delivery mechanisms. The only approved aptamer therapy, Macugen, is for AMD and it's injected directly into the eye.
    • RNAi
      RNAi encompasses a few related systems that are difficult for me to differentiate: siRNA, shRNA, miRNA, piRNA. RNA interference is an extremely effective tool for modifying cell lines, but so far has been less effective as a therapy. Alnylam has several RNAi therapies in development.
    • Raw DNA/RNA
      Raw DNA or RNA with the right sequence will do gene editing at low frequency, even without a proper vector. You can also inject complete plasmids that get transcribed and translated (assuming some accompanying stress like electroporation). This is called a DNA vaccine, which is conceptually quite similar to an mRNA therapy, except that DNA vaccines are injected with an adjuvant like Freund's adjuvant, to elicit an immune response to the resultant protein.
  • Cells
    Cell-based therapies are becoming increasingly important, especially for cancer treatment. In cancer, the idea is to train the immune system (usually T-cells) to attack cancer cells more efficiently. Since the vast majority of cancers are quashed by the immune system anyway, this makes a lot of sense. CAR-Ts — perhaps the best-known cell therapy for cancer — are T-cells that have been extracted and forced to express a cell-surface antibody fragment with specificity for a cancer, which forces the T-cell to engage the cancer. There's a lot more going on in cell-based therapy, like stem cell therapies, but I don't know too much about this.


The big advantage of small molecules over proteins and nucleotides — and a watchword for most of the novel therapies above — is delivery: small molecules can get across the gut and penetrate the cell membrane to bind with targets inside the cell. Proteins and nucleotides are degraded quickly, especially in the gut, and proteins are usually way too large to enter cells anyway.

There are exceptions to this rule, cases when cells are more accessible: for example, the eye is easier to access than most organs (e.g., the aptamer therapy, Macugen); if your target is in the gut then you may be able to deliver it orally (e.g., the antisense RNA therapy, mongersen); if you can apply your therapy ex vivo (especially retransplantable tissue like liver or bone marrow) then you no longer need a delivery vector (e.g., Provenge and other adoptive cell transfers, gene editing in embryos). Examining the pipelines of the new batch of gene editing and RNA therapy biotechs, we see that the diseases they tackle are very often in the bone marrow, liver or eye.

If your target is inside a cell, the cell cannot be extracted for treatment ex vivo, and you can't develop a good small molecule, then what can you do? Many of the modern protein or nucleotide-based therapeutics listed above need some additional help to protect them from degradation, localize them to target cells, and penetrate the cell membrane. It could be argued that delivery technology has lagged behind drug technology generally. There are still no great options for delivery.

  • Viruses
    Viruses, particularly AAV, are especially useful for nucleotide therapies like gene therapy. This makes sense when you consider that viruses are nanomachines, evolved over billions of years to target cells and inject nucleotides. However, since viruses are often immunogenic, there can be side-effects.
  • Nanoparticles / Liposomes
    In rare cases, liposomes have been used to deliver drugs, like cisplatin. We were recently surprised to learn that Moderna, the mRNA biotech, is using lipid-nanoparticles to deliver mRNA to cells. Virus-like particles (essentially noninfectious viruses) may combine the best of both worlds.
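
The delivery reasoning in this section can be boiled down to a rough decision function. This is only a sketch of my own notes: the function name, the argument names, and the short labels it returns are all made up for illustration, and real drug development does not reduce to five branches.

```python
# A rough paraphrase of the delivery reasoning above. All names are illustrative.
ACCESSIBLE_TISSUES = {"eye", "gut"}  # tissues the text calls easier to reach


def delivery_strategy(has_small_molecule, target_inside_cell, treatable_ex_vivo, tissue):
    """Return a short label for the most plausible delivery route."""
    if has_small_molecule:
        # Small molecules can cross the gut and the cell membrane unaided.
        return "small_molecule"
    if not target_inside_cell:
        # Extracellular and cell-surface targets are reachable by injected proteins.
        return "protein_or_antibody"
    if treatable_ex_vivo:
        # Cells treated outside the body need no delivery vector.
        return "ex_vivo"
    if tissue in ACCESSIBLE_TISSUES:
        # Local administration, e.g. injection into the eye or an oral drug for the gut.
        return "local_delivery"
    # Otherwise the therapy needs a vehicle such as a virus or a nanoparticle.
    return "delivery_vehicle"


print(delivery_strategy(False, True, False, "eye"))
print(delivery_strategy(False, True, False, "brain"))
```

The last branch is the hard case: an intracellular target, in tissue you can't extract or easily reach, with no good small molecule.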

For a more detailed discussion on delivery methods, there's a good recent review in NRDD.

Brian Naughton // Mon 06 February 2017 // Filed under biotech // Tags biotech transcriptic snakemake synthetic biology

In a previous blogpost I described a pipeline for synthesizing arbitrary proteins on the transcriptic robotic lab platform using only Python code. The ultimate goal of that project was to be able to run a program that takes a protein sequence as input, and "returns" a tube of bacteria expressing that protein. Here I'll describe some progress towards that goal.

pipeline diagram


The usual way to chain together different programs in bioinformatics is with a pipeline management system, for example, snakemake, nextflow, toil, WDL, and many many more. I've recently become a big fan of nextflow for computational pipelines, but its major advantages (e.g., containerization) don't help much here because so much of the work happens outside of the computer. For this project I've been using the slightly simpler snakemake, mainly for tracking which steps have been completed, and deciding which steps can be run in parallel based on their dependencies.

Each protocol has four associated steps in the pipeline:

  • generate protocol: create an autoprotocol file describing the protocol
  • submit protocol: submit the autoprotocol file to transcriptic
  • get results: download images, data, etc. from transcriptic
  • create report: create an HTML report from the downloaded data

snakemake pipeline for protein synthesis
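
To show the kind of bookkeeping snakemake is doing here, this is a plain-Python sketch that uses the standard library's graphlib (Python 3.9+) instead of snakemake itself. The protocol names match the reports later in this post, but the dependencies between protocols are an assumption made for illustration, not a copy of my Snakefile.

```python
from graphlib import TopologicalSorter

# The four steps every protocol goes through, in order.
STEPS = ["generate_protocol", "submit_protocol", "get_results", "create_report"]

# ASSUMED protocol-level dependencies (illustrative only).
PROTOCOL_DEPS = {
    "cut_plasmid": [],
    "synthesize_primers": [],
    "add_flanks": ["synthesize_primers"],
    "run_gibson_and_transform": ["cut_plasmid", "add_flanks"],
}


def build_graph(protocol_deps):
    """Map each 'protocol.step' node to the set of nodes it depends on."""
    graph = {}
    for protocol, upstream in protocol_deps.items():
        previous = None
        for step in STEPS:
            node = f"{protocol}.{step}"
            if previous is None:
                # A protocol's first step waits for upstream reports to exist.
                graph[node] = {f"{u}.{STEPS[-1]}" for u in upstream}
            else:
                graph[node] = {previous}
            previous = node
    return graph


def parallel_batches(graph):
    """Group nodes into batches whose members can all run at the same time."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for number, batch in enumerate(parallel_batches(build_graph(PROTOCOL_DEPS))):
    print(number, batch)
```

Under these assumed dependencies, cutting the plasmid and synthesizing primers can proceed in parallel, while the Gibson assembly has to wait for both branches.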


In my terminology, a "metaprotocol" defines the complete process, which is turned into a series of protocols. Ideally, the output of a single protocol will be a decision point: for example, whether or not a gel image includes the expected bands.
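
A decision point like the gel example could be as simple as the check below. This is a hypothetical sketch: it assumes some upstream step has already turned the gel image into a list of band sizes in bp, which is the hard part and is not shown. The function name and the 10% tolerance are arbitrary choices.

```python
def bands_match(expected_bp, observed_bp, tolerance=0.1):
    """True if every expected band has an observed band within the relative tolerance.

    expected_bp and observed_bp are lists of band sizes in base pairs.
    Extra observed bands are ignored in this sketch.
    """
    return all(
        any(abs(observed - expected) <= tolerance * expected for observed in observed_bp)
        for expected in expected_bp
    )


# Hypothetical observed bands for a protocol expecting a single 2686bp band.
if bands_match([2686], [2700, 150]):
    print("expected band found: continue to the next protocol")
else:
    print("expected band missing: stop and inspect the report")
```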

The metaprotocol is defined in yaml, which has its issues, but is more readable than json, and well supported. This code depends heavily on Pydna, a Python package for cloning and assembly. Given an insert and a vector, Pydna will design primers and a PCR program. The following is my metaprotocol yaml for expressing GFP:

- meta:
    assembly: |-
      Sequences........................: [2690] [786]
      Sequences with shared homologies.: [2690] [786]
      Homology limit (bp)..............: 25
      Number of overlaps...............: 2
      Nodes in graph(incl. 5' & 3')....: 4
      Only terminal overlaps...........: No
      Circular products................: [3412]
      Linear products..................: [3446] [3442] [34] [30]
    assembly_figure: |2-
      |            \/
      |            /\
      |            31|786bp_PCR_prod|30
      |                              \/
      |                              /\
      |                              30-
      |                                 |
    metaprotocol_id: 1k9ginus
    pcr_figure: |2-
                                                                ||||||||||||||||||||||||||||||| tm 59.8 (dbd) 70.6
                                     |||||||||||||||||||||||| tm 62.1 (dbd) 69.3
    pcr_program: |2

      Pfu-Sso7d (rate 15s/kb)
      Two-step|    30 cycles |      |786bp
      98.0°C  |98.0C         |      |Tm formula: Pydna tmbresluc
      _____ __|_____         |      |SaltC 50mM
      00min30s|10s  \        |      |Primer1C 1.0µM
              |      \ 72.0°C|72.0°C|Primer2C 1.0µM
              |       \______|______|GC 49%
              |       0min11s|10min |4-12°C
    project_name: pUC19_sfGFP_cloning_v1
- linearize:
    restriction_enzyme: EcoRI
    vector: pUC19
- oligosynthesize:
- thermocycle:
      extension_time: 11.0
      forward_primer_concentration: 0.001
      rate: 15.0
      reverse_primer_concentration: 0.001
      saltc: 50.0
      ta: 72.0
- assemble:
    insert: sfGFP
    vector: pUC19
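
Here is a minimal sketch of how a metaprotocol like this might be expanded into protocols. To stay self-contained it starts from a trimmed, already-parsed version of the yaml above (a list of one-key dicts, which is what a yaml loader would return) rather than reading the file. The mapping from step names to protocol names is my assumption for illustration.

```python
# Trimmed, already-parsed form of the metaprotocol above.
METAPROTOCOL = [
    {"meta": {"metaprotocol_id": "1k9ginus", "project_name": "pUC19_sfGFP_cloning_v1"}},
    {"linearize": {"restriction_enzyme": "EcoRI", "vector": "pUC19"}},
    {"oligosynthesize": None},  # an empty yaml value parses to None
    {"thermocycle": {"extension_time": 11.0, "rate": 15.0, "ta": 72.0}},
    {"assemble": {"insert": "sfGFP", "vector": "pUC19"}},
]

# ASSUMED mapping from metaprotocol steps to protocol names.
STEP_TO_PROTOCOL = {
    "linearize": "cut_plasmid",
    "oligosynthesize": "synthesize_primers",
    "thermocycle": "add_flanks",
    "assemble": "run_gibson_and_transform",
}


def expand(metaprotocol):
    """Turn the metaprotocol into an ordered list of protocol descriptions."""
    meta = {}
    protocols = []
    for entry in metaprotocol:
        ((step, params),) = entry.items()  # each entry has exactly one key
        if step == "meta":
            meta = params or {}
            continue
        protocols.append({
            "protocol": STEP_TO_PROTOCOL[step],
            "project": meta.get("project_name"),
            "params": params or {},
        })
    return protocols


for protocol in expand(METAPROTOCOL):
    print(protocol["protocol"], protocol["params"])
```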

DNA synthesis

Of course, before you can run this pipeline, you need to have the appropriate insert DNA in your transcriptic inventory. As far as I know, none of the major synthetic DNA suppliers has an API. However, you can order DNA from IDT by filling in an Excel file. I have automated filling in and emailing this file, so DNA synthesis can be included in the pipeline too! It should take about a week from ordering for DNA to appear at transcriptic.
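
The ordering step might look roughly like the sketch below. It is not the actual script: it writes a CSV instead of the Excel order form (to avoid a third-party dependency), the column names and addresses are placeholders, and it only builds the email without sending it.

```python
import csv
import io
from email.message import EmailMessage


def build_order_email(oligos, sender, recipient):
    """Build (but do not send) an email with an order sheet attached.

    oligos is a list of (name, sequence) tuples. The "name,sequence" layout is
    a placeholder, not the supplier's real order-form format.
    """
    sheet = io.StringIO()
    writer = csv.writer(sheet)
    writer.writerow(["name", "sequence"])
    for name, sequence in oligos:
        writer.writerow([name, sequence.upper()])

    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = f"DNA order: {len(oligos)} sequence(s)"
    message.set_content("Order sheet attached.")
    message.add_attachment(sheet.getvalue(), subtype="csv", filename="order.csv")
    return message


# Placeholder sequence and addresses, for illustration only.
order = build_order_email([("example_oligo", "atggtgagcaag")], "me@example.com", "orders@example.com")
print(order["Subject"])
```

Sending would then be a matter of handing the message to smtplib, with whatever credentials your mail provider needs.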


After each protocol finishes, an HTML report is generated. This allows the user to evaluate protocol results manually before initiating the next step. There are ways to automate this more, like using automated band mapping of gel images, but I think that kind of thing will work better once the transcriptic API settles down a bit. The HTML report also serves as a log of the experiment.

cut_plasmid FINISHED

 Submitted at UTC 2016-08-20 19:42:48
   Started at UTC 2016-08-20 22:52:06
 Completed at UTC 2016-08-21 01:28:49
Ran report at UTC 2016-10-26 22:15:03

Expected DNA bands of size: 2686bp

synthesize_primers FINISHED

 Submitted at UTC 2016-08-23 19:35:37
   Started at UTC 2016-08-23 20:00:11
 Completed at UTC 2016-08-24 20:01:03
Ran report at UTC 2016-10-26 22:15:44

Synthesized primers:


add_flanks FINISHED

 Submitted at UTC 2016-10-08 15:12:38
   Started at UTC 2016-10-10 22:22:16
 Completed at UTC 2016-10-11 01:04:12
Ran report at UTC 2016-11-17 15:36:02

PCR program

Pfu-Sso7d (rate 15s/kb)
Two-step|    30 cycles |      |786bp
98.0°C  |98.0C         |      |Tm formula: Pydna tmbresluc
_____ __|_____         |      |SaltC 50mM
00min30s|10s  \        |      |Primer1C 1.0µM
        |      \ 72.0°C|72.0°C|Primer2C 1.0µM
        |       \______|______|GC 49%
        |       0min11s|10min |4-12°C

run_gibson_and_transform FINISHED

 Submitted at UTC 2016-10-24 23:10:06
   Started at UTC 2016-10-28 17:45:56
 Completed at UTC 2016-10-29 17:01:26
Ran report at UTC 2016-10-30 15:12:18
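
Since the reports double as a log, it is easy to pull queue and run times out of them. A small sketch, using the cut_plasmid timestamps from above; the parsing assumes the exact "<label> at UTC <timestamp>" layout shown in these reports.

```python
from datetime import datetime

# Timestamps copied from the cut_plasmid report above.
REPORT = """\
 Submitted at UTC 2016-08-20 19:42:48
   Started at UTC 2016-08-20 22:52:06
 Completed at UTC 2016-08-21 01:28:49
Ran report at UTC 2016-10-26 22:15:03
"""


def parse_timestamps(report_text):
    """Return a dict like {'submitted': datetime, 'started': ..., 'completed': ...}."""
    stamps = {}
    for line in report_text.splitlines():
        label, separator, value = line.partition(" at UTC ")
        if separator:
            stamps[label.strip().lower()] = datetime.strptime(value.strip(), "%Y-%m-%d %H:%M:%S")
    return stamps


def queue_and_run_time(stamps):
    """Time spent waiting in the queue, and time spent actually running."""
    return stamps["started"] - stamps["submitted"], stamps["completed"] - stamps["started"]


queue_time, run_time = queue_and_run_time(parse_timestamps(REPORT))
print("queued for", queue_time)
print("ran for", run_time)
```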