Oxford Nanopore in 2016
Many genomics people, especially in the US, are still unfamiliar with Oxford Nanopore's MinION sequencer. I was lucky enough to join their early access program last year, so I've been using it for a while. In that time I've become more and more excited about its potential. In fact, I think it's the most exciting thing to happen in genomics in a long time. I'll try to explain why.
MinION vs Illumina
The MinION is a tiny little sequencer that has some serious advantages over Illumina sequencers:
- it's very portable (see the photos!) and doesn't require any special equipment to run
- it's simple to run: there's a 10 minute prep with just a couple of pipetting steps
- the sequencer itself is essentially free, with a cost of $500-900 per flow-cell (which can be reused several times)
- the reads are very long, about as long as the input DNA (100kb is not unusual)
- it's a single molecule sequencer, so you can detect per molecule variation, including base modifications (this is still low accuracy though)
- it can read RNA directly, giving you full-length transcripts
- the turnaround time is very quick: you can generate tens to hundreds of megabases of data in an afternoon
- data analysis is easier than for short-read sequencers, since alignment and assembly are simpler. You may not even really need any alignment if you are sequencing a plasmid or insert.
- the data arrives per read instead of per base: so in one hour you can have thousands of long reads (as opposed to Illumina, where you'd have millions of partial reads, each only a few bases)
- seeing reads appear in real-time is amazing and you can literally pull the USB plug when you have enough data
There are also two big disadvantages:
- its accuracy is at least an order of magnitude worse than Illumina's (~90% vs >99% per base, i.e. an error rate of ~10% vs <1%)
- its per base cost is at least an order of magnitude higher than an Illumina HiSeq ($0.5/Mbase vs <$0.02/Mbase) and 2–10X more expensive than a MiSeq. Of course, these numbers are rough and in flux. For example, a HiSeq or MiSeq will require a service contract that could be $20k/yr — the cost of an Illumina run is highly volume-dependent.
Something that's not often discussed is the error rate of short-read sequencers. On a per base level they are extremely accurate, but incorrectly determined structural variants are also errors. In a human genome a miscalled 3Mb inversion could by itself be considered a 0.1% error rate, and there are lots of structural variants in humans. Unlike incorrect base-calls, it is often impossible to overcome this issue with greater read-depth.
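To make the arithmetic behind that 0.1% figure explicit, here is a minimal sketch. It assumes a haploid human genome of roughly 3Gb (a round number I'm supplying, not a precise figure) and counts every base inside the miscalled inversion as wrong.

```python
# Rough arithmetic only; the 3Gb genome size is a round-number assumption.
HUMAN_GENOME_BASES = 3_000_000_000  # assumed haploid genome size (~3Gb)
INVERSION_BASES = 3_000_000         # the 3Mb inversion from the text


def structural_error_rate(miscalled_bases: int, genome_bases: int) -> float:
    """Fraction of the genome covered by a miscalled structural variant."""
    return miscalled_bases / genome_bases


rate = structural_error_rate(INVERSION_BASES, HUMAN_GENOME_BASES)
print(f"{rate:.1%} of the genome is affected")
```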
Despite these advantages, many scientists remain skeptical of the MinION. There are probably two things going on here: (a) Oxford has consistently overpromised since announcing in 2012; (b) the MinION only started to be really competitive in the past few months, so there is a lag.
About six months ago, you could expect to get about 500Mb of DNA from a flow-cell, with each pore reading at 70 bases/second and accuracy of 70-80% (at least in our novice hands).
Earlier this year, Oxford made two important changes that improved performance: they updated their pore from an unspecified pore ("R7", which was tangled up in a patent dispute with Illumina) to an E. coli pore ("R9"), which has both better throughput and better accuracy than R7. At the same time, they updated their base-calling algorithm to a deep learning-based method, further improving accuracy.
They are still incrementally improving R9, and are already on version R9.4. At the time of writing, this version is only in the hands of the inner circle of nanoporati, but luckily they are all on Twitter so we can get a pretty good sense of how well it works. People are reporting excellent results, with runs of over 5Gb at the new R9 speed of 240 bases/second (this should be 500 bases/second soon, apparently with no loss in accuracy). Accuracy is also up, with 1D reads perhaps even edging over 90% in experienced hands.
So, compared to six months ago, you are probably getting 5-10 times as much data with half the error rate.
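As a quick sanity check on that summary, the sketch below just plugs in the figures quoted above (about 500Mb vs over 5Gb per flow-cell, and 70-80% vs roughly 90% accuracy). The variable names are mine, and the inputs are rough reports rather than careful measurements, so treat the output as ballpark only.

```python
# Figures quoted in the text; all are rough and will vary from run to run.
old_yield_mb = 500           # ~500Mb per flow-cell six months ago
new_yield_mb = 5000          # runs of over 5Gb reported on R9.4
old_accuracy = (0.70, 0.80)  # 70-80% in our novice hands
new_accuracy = 0.90          # 1D reads perhaps edging over 90%

yield_gain = new_yield_mb / old_yield_mb

# Error rate is 1 - accuracy, so compare error rates rather than accuracies.
new_error = 1 - new_accuracy
error_ratios = [(1 - acc) / new_error for acc in old_accuracy]

print(f"yield: about {yield_gain:.0f}x more data")
print(f"error rate: about {min(error_ratios):.0f}x to {max(error_ratios):.0f}x lower")
```

The top of the yield range comes out at around 10x, and the error rate drops by a factor of two to three, which is roughly where the "5-10 times as much data with half the error rate" summary comes from.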
OK, what can I do with one of these gizmos?
The stats are definitely exciting, but I don't think they really capture why I think the MinION is so interesting. The MinION has several key areas where it can do some damage, and other areas where it opens up new possibilities.
sequencing microbial genomes de novo
This is very doable. I wouldn't say it's easy yet, but long reads negate a lot of the computational problems of de novo assembly: finding overlapping 10,000mers is a very different problem to finding overlapping 100mers.
infectious agent detection
Once you have prepped DNA, which takes from 10 minutes (with the "rapid" kit) to two hours, the actual process of detecting a pathogen could be under ten minutes. In practice I don't think anybody is going from blood sample to diagnosis this quickly, but the potential is there.
There is even software (Mykrobe) that detects drug-resistance genes in bacteria, and recommends appropriate antibiotics. When this is done cheaply and routinely it should help a lot with drug resistance and overprescription of antibiotics.
Since the data comes in one read at a time, as soon as you get one read from the infectious agent you are done.
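The "stop at the first hit" idea can be sketched as a loop over reads as they arrive. This is only an illustration: the function names, the exact k-mer matching, the min_hits threshold and the toy sequences are all assumptions of mine. A real pipeline would use a proper aligner or classifier that tolerates nanopore error rates.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def first_pathogen_read(
    reads: Iterable[str],
    pathogen_kmers: Set[str],
    k: int,
    min_hits: int = 3,
) -> Optional[Tuple[int, str]]:
    """Return (reads_seen, read) for the first read sharing at least
    min_hits k-mers with the pathogen reference, or None if no read does."""
    for n, read in enumerate(reads, start=1):
        hits = sum(1 for km in kmers(read, k) if km in pathogen_kmers)
        if hits >= min_hits:
            return n, read  # stop consuming reads: we have our answer
    return None


# Toy data, made up purely for illustration.
reference = "ACGTTGCAAGGCTTACGGATCCA"
k = 8
index = set(kmers(reference, k))
stream = iter([
    "TTTTTTTTTTTTTTTTTTTT",     # background read
    "GGGGCCCCGGGGCCCCGGGG",     # background read
    "AAACGTTGCAAGGCTTACGGAAA",  # contains a chunk of the reference
    "ACGTTGCAAGGCTTACGGATCCA",  # never examined
])
result = first_pathogen_read(stream, index, k)
print(result)
```

Because the reads are consumed lazily from an iterator, the loop returns on the third read and never touches the fourth, which is the software equivalent of pulling the USB plug.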
direct RNA sequencing
If you want to read full-length transcripts, and see base modifications too, then the MinION is the only option that I know of. This capability is new, and the base modification detection is not accurate, but there's still plenty of interesting research to do with this.
Sequencing often requires barcoding, which adds fiddly extra steps before and after sequencing. But, if your reads are long enough, then you may not need to barcode. For example, you can sequence 96 plasmids at the same time — simply throw away any reads that are not the full length of the plasmid.
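The "throw away anything that isn't full length" step could look something like the sketch below. The function name, the 5% tolerance and the toy reads are hypothetical. In practice you would pick a tolerance to suit your library, and how you then tell the plasmids apart is a separate question that this sketch does not address.

```python
from typing import Iterable, List


def full_length_reads(
    reads: Iterable[str],
    expected_length: int,
    tolerance: float = 0.05,
) -> List[str]:
    """Keep reads whose length is within +/- tolerance of expected_length."""
    low = expected_length * (1 - tolerance)
    high = expected_length * (1 + tolerance)
    return [r for r in reads if low <= len(r) <= high]


# Toy example: pretend the plasmid is 100 bases long.
reads = ["A" * 100, "A" * 40, "A" * 97, "A" * 180, "A" * 104]
kept = full_length_reads(reads, expected_length=100)
print([len(r) for r in kept])
```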
other long-read problems
There are a few classic long-read problems like HLA sequencing, VDJ sequencing and structural variant detection (especially for cancer). These are reasonably good applications for MinION, though VDJ sequencing probably needs more accuracy, and structural variant detection might need more throughput. (10X + Illumina makes the most sense for anything like this.)
MinION in the Field
Oxford is making an effort to eliminate the "cold chain" for the MinION. The flow-cell itself already seems to keep well at temperatures well above refrigeration, and they claim they can lyophilize the other reagents. Even before that happens, with basically just a cooler, a laptop, and a way to extract DNA, doctors, ecologists, and other scientists can go out into the field and do sequencing anywhere.
Earlier this year, as part of the Zibra project, scientists from the UK and Brazil drove a van through Brazil, sampling and sequencing Zika virus along the way.
Biology labs and Biotechs
The advantage of MinION for non-genomics–focused biology labs is not really widely discussed, but I think it's one of the most important.
Basically, if you want a few megabases sequenced and you have a MinION and a flow-cell, you can have the data in your hands today. When you're done you can put the flow-cell away and use it again tomorrow. Depending on your needs, you might get 4-10 uses out of the flow-cell, meaning each run costs $150-300 including sample prep.
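For a rough feel of how flow-cell reuse drives that per-run number, here is a small sketch that amortizes the flow-cell price over the number of uses. The function is hypothetical, and the sample prep cost is left as a parameter because I haven't given a separate figure for it above, so the printed range covers the flow-cell share only.

```python
def cost_per_run(flow_cell_price: float, uses: int, prep_cost: float = 0.0) -> float:
    """Flow-cell price amortized over its uses, plus a per-run prep cost."""
    if uses < 1:
        raise ValueError("uses must be at least 1")
    return flow_cell_price / uses + prep_cost


# Flow-cell share only: prep_cost stays at 0 since no prep figure is quoted.
best_case = cost_per_run(500, uses=10)  # cheapest flow-cell, most reuse
worst_case = cost_per_run(900, uses=4)  # priciest flow-cell, least reuse
print(f"flow-cell share per run: ${best_case:.0f} to ${worst_case:.0f}")
```

Whatever you pay for sample prep gets added on top of the flow-cell share for each run.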
In contrast, if you want to get some data from a MiSeq, you are probably signing up for a gigabase of sequence. That's overkill for most labs, and it produces many gigabytes of raw data to manage too... If you want reasonable length reads (2x150bp), then sequencing will take at least 24 hours. If you are lucky enough to have a core lab at your institute then that helps, but you may still have to wait your turn.
If you don't work at a university — perhaps you're at a small biotech — then the alternative is buying a MiSeq (or MiniSeq) at $50-150k plus service contract, or sending your samples out to a CRO for sequencing. A CRO will have a turnaround time of at least a week, and that's after you've explained to them what you need and agreed on the terms.
It's hard to imagine a one-off MiSeq run happening in under a week, so being able to just do it yourself is a huge increase in efficiency.
If you're sequencing a thousand of anything, then Illumina is much cheaper. But I wonder how many biology labs occasionally need megabase-scale sequencing and don't do it because of the current barriers to entry, including the computational burden of aligning and assembling short reads. There are cases where I would not have bothered with the hassle of getting something sequenced except that we could just do it ourselves with the MinION.
Genomics for Everyone
I think the most exciting thing going on here is just taking sequencing and genomics out of the lab and into the real world. Admittedly, this does require some improvements and inventions from Oxford, like easier DNA preparation, so there are caveats here, but nothing too crazy.
Oxford's Metrichor site spells out some of the use-cases too. I'll just give some scattered examples of things to sequence, some more realistic than others, but I think each plausibly represents something new that has real economic value:
- hospital surfaces and employees for MRSA
- food at factories (detecting E. coli etc)
- the environment at airports, workplaces, etc for flu (flu is expensive!)
- at crime scenes (also a big deal since the current methods of forensic DNA analysis are awful)
- at home, to see if you have a cold or flu, the same cold or a new one, and even figure out where you picked up the virus
- the air to detect mold in buildings
- farm animals' microbiomes to monitor gut health and improve growth
- at methane farms, wine fermenters, beer fermenters, to monitor and manage the process
- various kinds of labs for bacterial contamination
- the sewage system of a city to monitor the city's diet and health
- for educational purposes, and at competitions like iGEM
- fish and other foods to detect mislabeling (a surprisingly big problem)
- animals out in the wild for conservation purposes
- your own microbiome to monitor your gut health
- soil, plants, droppings, insects at farms to monitor pests etc.
- at the dentist's to detect decay-causing bacteria
- at the dermatologist's (cosmetologist?) to detect and treat acne-causing bacteria
These applications (apps?) can potentially be run by anyone. Stick some DNA in, wait a bit, processing happens on the cloud and the answer appears on your phone in a few minutes to a few hours. You don't need to know anything about genetics or molecular biology. You'll just see a readout that says "E. coli detected" in food or "DNA from new rhino detected" in droppings.
There's already a teaser of this with Oxford's What's in my Pot app. It figures out which microbes are in a sample, and draws a nice cladogram for you.
To realize this potential, the sequencer still needs to be cheaper, but the lower bound on that seems good, since the number of molecules involved is really tiny. (That's another advantage of single-molecule sequencers.)
Finally, coming back to present-day reality a little bit, Oxford will need to execute on their plans to make sequencing easier and cheaper (reagent lyophilization, Zumbador, SmidgION, VolTRAX, FPGAs, etc.; watch Oxford's latest tech update for more on that), but I think MinION is going to become a very big deal in the next few years.