Brian Naughton // Mon 15 February 2021 // Filed under genomics // Tags sequencing genomics nanopore protocol

In this post I'll describe how to sequence a human genome at home, something that's only recently become possible. The protocol described here is not necessarily the best way to do this, but it's what has worked best for me. It costs a few thousand dollars in equipment to get started, but the (low-coverage) sequencing itself requires only $150, a couple of hours of work, and almost no lab skills.

What does it mean to sequence a human genome?

First, it's useful to explain some terms: specifically, to differentiate a regular reference-based genome assembly from a de novo genome assembly.

Twenty years or so ago, the Human Genome Project resulted in one complete human genome sequence (actually combining data from several humans). Since all humans are 99.9%+ identical genetically, this reference genome can be used as a template for any human. The easiest way to sequence a human genome is to generate millions of short reads (100-300 base-pairs) and aligning them to this reference.

The alternative to this reference-based assembly is a de novo assembly, where you figure out the genome sequence without using the reference, by stitching together overlapping sequencing reads. This is much more difficult computationally (and actually impossible if your reads are too short), but the advantage is that you can potentially see large differences compared to the reference. For example, it's not uncommon to have some sequence in a genome that is not present in the reference genome, so-called structural variants.

There are also gaps in the reference genome, especially at the ends and middle of chromosomes, due to highly repetitive sequence. In fact, the first full end-to-end human chromosome was only sequenced last year, thanks to ultra-long nanopore reads.

For non-human species, genome assembly is usually de novo, either because the genomes are small and non-repetitive (bacteria), or there is no reference (newly sequenced species).

SNP chip

The cheapest way to get human genome sequence data is with a SNP chip, like the 23andMe chip. These chips work by measuring variation at specific, pre-determined positions in the genome. Since we know the positions that commonly vary in the human genome, we can just examine a few hundred thousand of those positions to see most of the variation. You can also accurately impute tons of additional variants not on the chip. The reason this is "genotyping" and not "sequencing" is that you don't get a contiguous sequence of As, Cs, Gs, and Ts. The major disadvantage of SNP chips is that you cannot directly measure variants not on the chip, so you miss things, especially rare and novel variants. On the other hand, the accuracy for a specific variant of interest (e.g., a recessive disease variant like cystic fibrosis ΔF508) is probably higher than from a sequenced genome.

Short-read sequencing

Short-read sequencing is almost always done with Illumina sequencers, though other short-read technologies are emerging. These machines output millions or billions of 100-300 base-pair reads that you can align to the reference human genome. Generally, people like to have on average 30X coverage of the human genome (~100 gigabases) to ensure high accuracy across the genome.

Although you can read variants not present on a SNP chip, this is still not a complete genome: coverage is not equal across the genome, so some regions will likely have too low coverage to call variants; the reference genome is incomplete; some structural variants (insertions, inversions, repetitive regions) cannot be detected with short reads.

Long-read sequencing

The past few years have seen single-molecule long-read sequencing develop into an essential complement and sometimes credible alternative to Illumina. The two players, Pacific Biosciences and Oxford Nanopore (ONT) are now mature technologies. The big advantage of these technologies is that you get reads much longer than 300bp — from low thousands up to megabases on ONT in extreme examples — so assembly is much easier. This enables de novo assembly, and is especially helpful with repetitive sequence. For this reason, long-read sequencing is almost essential for sequencing new species, especially highly repetitive plant genomes.

Sounds great! Why do people still use Illumina then? The per-base accuracy and per-base cost of Illumina is still considerably better than these competitors (though ONT's PromethION is getting close on price).

One huge advantage that ONT has over competitors is that the instrument is a fairly simple solid-state device that reads electrical signals from the nanopores. Since most of the technology is in the consumable "flow-cell" of pores, the instruments can be tiny and almost free to buy.

Instead of spending $50k-1M on a complex machine that requires a service contract, etc., you can get a stapler-sized MinION sequencer for almost nothing, and you can use it almost anywhere. ONT have also done a great job driving the cost per experiment down, especially by releasing a lower-output flow-cell adaptor called the flongle. Flongle flow-cells only cost $90 per flow-cell, and produce 100 megabases to >1 gigabase of sequence.

There is a great primer on how nanopore sequencing works at

Nanopore Sequencing Equipment

(Note, to make this article stand alone, I copied text from my previous home lab blogpost.)

Surprisingly, you don't actually need much expensive equipment to do nanopore sequencing.

In my home lab, I have the following:

Optional equipment:

  • A wireless fridge thermometer. This was only $25 and it works great! It's useful to be able to keep track of temperatures in your fridge or freezer. Some fridges can get cold enough to freeze, which is deadly for flow-cells.
  • A GeneQuant to check the quality of DNA extractions. A 20 year old machine cost me about $150 on ebay. It's a useful tool, but does require quite a lot of sample (I use 200 µl). I wrote a bit more about it here.

The lab during a sequencing run. MinION running and used flow-cell on the desk in front

(a) Eppendorf 5415C centrifuge. (b) Anova sous vide in a Costco coffee can

Protocol Part One: extracting DNA

The first step in sequencing is DNA extraction (i.e., isolating DNA from biological material). I use a Zymo Quick-DNA Microprep Plus Kit, costing $132. It's 50 preps, so a little under $3 per prep. There are other kits out there, like NEB's Monarch, but these are harder to buy (requiring a P.O., or business address).

The Zymo kit takes "20 minutes" (it takes me about 40 minutes including setting up). It is very versatile: it can work with "cell culture, solid tissue, saliva, and any biological fluid sample". This prep is pretty easy to do, and all the reagents except Proteinase k are just stored at room temperature. They claim it can recover >50kb fragments, and anecdotally, this is the maximum length I have seen. That is far from the megabase-long "whale" reads some labs can achieve, but those preps are much more complex and time-consuming. Generally speaking, 10kb fragments are more than long enough for most use-cases.

Protocol Part Two: library prep

Library prep is the process of preparing the DNA for sequencing, for example by attaching the "motor protein" that ratchets the DNA through the pore one base at a time. The rapid library prep (RAD-004) is the simplest and quickest library prep method available, at $600 for 12 preps ($50 per prep).

Library prep is about as difficult as DNA extraction, and takes around 30 minutes. There are some very low volumes involved (down to 0.5µl, which is as low as my pipettes go), and you need two to use two water bath temperatures, but overall it's pretty straightforward.

The total time from acquiring a sample to beginning sequencing could be as little as 60-90 minutes. You do pay for this convenience in lower read lengths and lower throughput though.

The Data

The amount of data you can get from ONT/nanopore varies quite a lot. There is a fundamental difference between Illumina and nanopore in that nanopore is single-molecule sequencing. With nanopore, each read represents a single DNA molecule traversing the pore. With Illumina, a read is an aggegated signal from many DNA molecules (which contributes to the accuracy).

So, nanopore is really working with the raw material you put in. If there are contaminants, then they can jam up the pores. If there are mostly short DNA fragments in the sample, you will get mostly short reads. Over time, the pores degrade, so you won't get as much data from a months-old flow-cell as a new one.

Using the protocol above, I have been able to get around 100-200 megabases of data from one flongle ($1 per megabase!). There are probably a few factors contributing to this relatively low throughput: the rapid kit does not work as well as the more complex ligation kit; I don't do a lot of sequencing, so the protocol is certainly executed imperfectly; my flow-cells are not always fresh.

For a human sample, 100 megabases is less than a 0.1X genome, which raises the fair question of why you would want to do that? Today, the answer is mainly just because you can. You could definitely do some interesting ancestry analyses, but it would be difficult to validate without a reference database. gencove also has several good population-level use-cases for low-pass sequencing.

The next step up from a flongle is a full-size MinION flow-cell, which runs on the same equipment and uses the same protocol, but costs around $900, and in theory can produce up to 42 gigabases of sequence. This would be a "thousand dollar genome", though the accuracy is probably below what you would want for diagnosic purposes. In a year or two, I may be able to generate a diagnostic-quality human genome at home for around $1000, perhaps even a decent de novo assembly.

Brian Naughton // Sat 27 July 2019 // Filed under biotech // Tags sequencing flongle nanopore homelab biotech

It's been a dream of mine for a long time to be able to do sequencing at home — just take whatever stuff I want: microbiome, viral/bacterial infections, insects, fungi, foods, sourdough, sauerkraut, and sequence it. Now at last, with the debut of Oxford Nanopore's flongle, it's possible!

So, a few months back, I bought some flongles (basically on launch day) and set up a home sequencing lab. In this post I'll describe what's in the lab and my first sequencing experiments.

What is a flongle?

As a refresher, Oxford Nanopore's MinION sequencer is a hand-held, single molecule nanopore sequencer. As DNA passes through a pore, the obstruction modulates the current across the pore in a pattern that can be mapped onto a sequence of nucleotides.

DNA going through a pore, from an ONT video.

The MinION device itself is a fairly simple container for the nanopore-containing flow-cells. The standard MinION flow-cell contains 512 channels (each of which has one active pore at a time), plus the ASIC chip that reads the changes in current. There's a great explainer on the Oxford Nanopore website.

Oxford Nanopore's newest flow-cell, the flongle ("flow-cell dongle"), is basically a cheaper, more disposable version of the standard MinION flow-cell. The flongle snaps into an adapter that includes the ASIC you usually find in a regular flow-cell.

(a) MinION sequencer with loaded flow-cell. (b) Flongle and adapter.

Whereas a 512 channel flow-cell costs $475-900, 126 channel flongles cost about $90-150 each. At the time of writing, you have to buy at least a starter pack of 12 for $1860, which includes the adapter.

Each pore can output several megabases of sequence, so the total amount of sequence can be in the hundreds of megabases (the biggest flongle run I know of is ~2Gb, in the hands of an experienced lab.)

For now, you have to spend quite a lot to get access to flongles, and supply is severely limited. I have received only four so far, out of the 48 I bought! (That was the minimum order size at launch.) So, there are definitely beta program issues here. Still, $100 NGS runs!

Home Lab Equipment

Surprisingly, you don't actually need that much expensive equipment to do nanopore sequencing. For my home lab, I bought the following:

The lab during a sequencing run. MinION running and Used flongle on the desk in front

(a) Eppendorf 5415C centrifuge. (b) Anova sous vide in a Costco coffee can

DNA Extraction

The first step in sequencing is DNA extraction. I bought a Zymo Quick-DNA Microprep Plus Kit for $132 (50 preps, so a little under $3 per prep).

This kit takes "20 minutes" (it takes me about 40 minutes including setting up). It can work with "cell culture, solid tissue, saliva, and any biological fluid sample". This prep is very easy to do, and almost all the reagents are just stored at room temperature. They claim it can recover >50kb fragments, which is very respectable. This is far from the megabase-long "whale" reads some labs work to achieve, but those preps are much more complex and time-consuming. Generally speaking, 10kb reads are more than long enough for most use-cases, and even 100bp-1kbp reads can still be used for species ID.

I am lucky to have access to a nanodrop spectrophotometer at work, so I can check my DNA quality. (Nanodrops cost thousands of dollars, even second-hand.) I think this wouldn't matter if I was was sequencing saliva repeatedly: that seems to work the same every time. However, it matters quite a bit for experimenting with sequencing different sample types.

Library prep

Library prep is the process of preparing the DNA for sequencing, for example by attaching the "motor protein" that ratchets the DNA through the pore one base at a time. Not being an experimentalist, I like to stick to the simplest possible protocols. That means rapid DNA extraction and ONT's rapid library prep (RAD-004), which costs $600 for 12 preps ($50 per prep).

Library prep is a little harder than DNA extraction, but still only takes around 40 minutes. There are some very low volumes involved (0.5µl, which is as low as my pipettes go!), and you need two water bath temperatures, but overall it's pretty straightforward.

The total time from acquiring a sample to beginning sequencing is maybe 1.5 hours. You definitely pay for this convenience in read length and throughput, but the tradeoff is not too bad.

Admittedly, the cost is more like $150 than $100 per run, but with the nuclease wash protocol now available to rejuvenate flow-cells, I think it's ok to round down...

Experiment one: saliva

This is probably the simplest possible experiment: extract human and bacterial DNA from saliva, and sequence it. Saliva has lots of human DNA — surprisingly, most of it is from white blood cells — and plenty of oral microbiome bacteria, and it's easy to get as much as you want. However, since bacterial genomes are about 0.001X the size of a human genome, you'd need 1000 bacterial genomes for every human genome if you want equal coverage of both.

Experiment one: (a) DNA quantification from saliva. (b) A decent read length distribution, topping out at 60kb.

This experiment generated a pretty respectable 100 megabases of sequence in 24 hours, which is basically what I was hoping for.

As soon as the DNA is loaded, reads start to get written to disk. After a minute, you have reads you can feed into BLAST to see if everything is working as expected. The instant access to data is a great reward for doing the boring prep work.

First sequencing run at home: pores sequencing, 34 minutes and 10 megabases in!

There are a few ways to analyze the data. There are several metagenome analysis tools, like Centrifuge and Kraken. I spent a couple of days(!) downloading the Centrifuge databases — which are massive since they need reference sequence data from bacteria, fungi, viruses etc. — only to have the build fail right afterward.

Luckily, Oxford Nanopore has some convenient tools online for analysis. It turns out that one of these, What's In My Pot (WIMP) is based on Centrifuge so it's convenient to just run that.

Experiment one: WIMP results for unfiltered saliva

As we can see, >99% of the reads are human or Escherichia. Upon closer inspection, the reads labeled "Escherichia coli" and "Escherichia virus Lambda" are all lambda DNA. As a QC step, I spiked lambda DNA (provided by ONT for QC purposes) into my DNA library at approximately 13% by volume. About 12% of my reads are lambda, so I know the molarity of my input sample is not too far off the reference lambda DNA.

After you get past the human and lambda DNA, the vast majority of reads map to known oral microbiome bacteria. Without anything to compare to, I can't point to any specific trends here yet.

Human DNA

What can you do with 80 megabases of human DNA? I know from just BLASTing reads that the accuracy is consistently 89-91%. Since a hundred megabases is only a 0.03X genome, it's not very useful for any human genetics tasks except maybe ancestry assignment, Gencove-style.

One thing I can do is intersect these reads with my 23andme data, and see how often it's concordant (the 23andme data is diploid and these are single molecule reads so it's not quite simple matching). Doing this intersection using bcftools and including only high quality reads resulted in only a few hundred SNPs. I did not find any variants that disagreed, which was surprising but nice to see.

Experiments two and three: failing to filter saliva

Obviously, it's a waste to generate so many human reads. Since I don't need my genome sequenced again (ok, I only have exome data), especially 0.03X at a time, I wanted to try to enrich for oral bacteria. There are host depletion kits that apparently work well, but that's kind of expensive, so I wanted to see what would happen if I just tried to physically filter saliva.

We know that human cells are usually >10µm and bacterial cells are usually <10µm so that's a pretty simple threshold to filter by. I bought a "10µm" paper filter on amazon, and just filtered saliva through it.

Experiments two and three produced almost identical results. The only differences were that after experiment two failed I tried to eliminate contamination from the paper filter by pre-washing it, and I quantified the DNA with a nanodrop, a step I skipped in experiment two. After multiple rounds of filtering–centrifugation–pouring off, I only managed to get 10ng/µl of DNA, which is very low. However, I knew that my first 32ng/µl run worked fine, so I convinced myself it must reach the recommended minimum of 5 femtomoles of DNA (that's only 3 billion molecules!), especially since the 260/280 was not that bad.

The experiment worked as planned, in the sense that instead of 99+% human DNA, I got 50% human DNA and 50% bacterial. However, instead of 100 megabases, I only got 2, and most were low quality!

Experiment three: (a) DNA quantification from filtered saliva. (b) WIMP on filtered saliva

My best guess here is that somehow the paper contaminated the DNA, since the pores apparently got clogged after just a couple of hours. I should at least have made sure I had a lot of DNA, though I don't have great ideas on how to do that beyond just spitting for an hour... It's likely I'll just need to use a proper microbiome prep kit next time.

Experiment four: wasp sequencing

Amazingly, despite having pretty small genomes (100s of megabases), most insects have never been sequenced! It's not clear to me that you can create a high quality genome assembly from only flongle reads, but if you can get 100 megabases of DNA, that's definitely a good start.

We have a wasp trap in our back yard. It caught a wasp but we were not sure what kind. It could be the most common type of wasp in the area, the western yellowjacket. It looks exactly like one, which is a bit of a clue.

Distribution of western yellowjacket vs common aerial yellowjacket, according to iNaturalist

But eyes can deceive. The only real way to figure out for sure if this is even a wasp is by sequencing its genome, or at least it would be if there were a reference genome. Surprisingly there is no genome for the western yellowjacket or the other likely species, the common aerial yellowjacket.

(a) Before mushing, with an aphid and other tiny insects in the second tube. (b) After mushing, which was pretty gross.

I took the wasp, plus an aphid that looked freshly caught in a spiderweb, and a few other tiny insects scurrying around nearby. Then I mushed them up and used the Zymo solid tissue protocol to extract DNA.

Experiment four: (a) DNA quantification and (b) read-length distribution

This time the DNA extraction was great quantity and quality. The total amount of sequence generated was 100 megabases again. However, the read length is extremely short on average. A general rule for nanopore sequencing is that you get out what you put in. In retrospect this problem should have been pretty obvious: although it looked ok, the wasp was not fresh enough so its DNA was very degraded.

Interestingly, there are quite a few long fragments (>5kb) in here too, and these map imperfectly to various aphid genomes (indicating that this particular aphid has also not been sequenced) and bacteria including possible wasp endosymbionts like Pantoea agglomerans. This is expected if the aphid and bacteria are fresh.

Experiment four: (a) WIMP of wasp and aphid reads. (b) BLASTing reads produces better results

I also ran WIMP but it turned out not to be useful, since this is not a "real" metagenomics run (i.e., it's not mainly a mixture of microbes and viruses). The closest matches are just misleading.

It would have been nice to be the first to assemble the western yellowjacket genome, or even a commensal bacterial genome, but I would have needed a lot more reads. Wasp genomes are around 200 megabases, so to get a decent quality genome I'd need at least 10 gigabases of sequence (50X coverage). That means a MinION run (or several), perhaps polished with some illumina data. The commensal bacteria are probably under 5 megabases, so it would be easier to create a reference genome, assuming any could be grown outside the wasp...

Next steps

Four flongles in, I am still pretty amazed that I can generate a hundred megabases of sequence, at home, for so little money and equipment.

I can almost run small genome projects at home, and submit the assembled genomes to ncbi. (I still need more sequencing throughput to do this in earnest.) Like the western yellowjacket, there are tons of genomes yet to be sequenced that should be sequenced. In general, plants and more complex eukaryotes will be too difficult, but bacteria, fungi, and insects should all be possible at home.

Preserving the DNA sequences of species could become an extremely important step in conservation and even de-extinction. The only group I know of doing work in this area is revive & restore. One of their projects is to try to bring back the woolly mammoth by bootstrapping from elephant to elephant–mammoth hybrid, and eventually to full mammoth. Of course this would not be possible without the mammoth genome sequence. The list of endangered species is very long, so there's a lot to do.

Brian Naughton // Mon 10 October 2016 // Filed under genomics // Tags genomics nanopore

Many genomics people, especially in the US, are still unfamiliar with Oxford Nanopore's MinION sequencer. I was lucky enough to join their early access program last year, so I've been using it for a while. In that time I've become more and more excited about its potential. In fact, I think it's the most exciting thing to happen in genomics in a long time. I'll try to explain why.

MinION vs Illumina

The MinION is a tiny little sequencer that has some serious advantages over Illumina sequencers:

  • it's very portable (see the photos!) and doesn't require any special equipment to run
  • it's simple to run: there's a 10 minute prep with just a couple of pipetting steps
  • the sequencer itself is essentially free, with a cost of $500-900 per flow-cell (which can be reused several times).
  • the reads are very long, about as long as the input DNA (100kb is not unusual)
  • it's a single molecule sequencer, so you can detect per molecule variation, including base modifications (this is still low accuracy though)
  • it can read RNA directly, giving you full-length transcripts
  • the turnaround time is very quick: you can generate tens to hundreds of megabases of data in an afternoon
  • data analysis is easier than for short-read sequencers, since alignment and assembly are simpler. You may not even really need any alignment if you are sequencing a plasmid or insert.
  • the data arrives per read instead of per base: so in one hour you can have thousands of long reads (as opposed to Illumina, where you'd have millions of partial reads, each only a few bases)
  • seeing reads appear in real-time is amazing and you can literally pull the USB plug when you have enough data

There are also two big disadvantages:

  • its accuracy is at least an order of magnitude worse than Illumina (~90% vs >99%)
  • its per base cost is at least an order of magnitude higher than an Illumina HiSeq ($0.5/Mbase vs <$0.02/Mbase) and 2–10X more expensive than a MiSeq. Of course, these numbers are rough and in flux. For example, a HiSeq or MiSeq will require a service contract that could be $20k/yr — the cost of an Illumina run is highly volume-dependent.

Something that's not often discussed is the error rate of short-read sequencers. On a per base level they are extremely accurate, but incorrectly determined structural variants are also errors. In a human genome a miscalled 3Mb inversion could by itself be considered a 0.1% error rate. and there are lots of structural variants in humans. Unlike incorrect base-calls, it is often impossible to overcome this issue with greater read-depth.

Despite these advantages, many scientists remain skeptical of the MinION. There are probably two things going on here: (a) Oxford has consistently overpromised since announcing in 2012; (b) the MinION only started to be really competitive in the past few months, so there is a lag.

What changed?

About six months ago, you could expect to get about 500Mb of DNA from a flow-cell, with each pore reading at 70 bases/second and accuracy of 70-80% (at least in our novice hands).

Earlier this year, Oxford made two important changes that improved performance: they updated their pore from an unspecified pore ("R7", which was tangled up in a patent dispute with Illumina) to an E. coli pore ("R9"), which has both better throughput and better accuracy than R7. At the same time, they updated their base-calling algorithm to a deep learning-based method, further improving accuracy.

They are still incrementally improving R9, and are already on version R9.4. At the time of writing, this version is currently only in the hands of the inner circle of nanoporati, but luckily they are all on Twitter so we can get a pretty good sense of how well it works. People are reporting excellent results, with runs of over 5Gb at the new R9 speed of 240 bases/second (this should be 500 bases/second soon, apparently with no loss in accuracy). Accuracy is also up, with 1D reads perhaps even edging over 90% in experienced hands.

So, compared to six months ago, you are probably getting 5-10 times as much data with half the error rate.

OK, what can I do with one of these gizmos?

The stats are definitely exciting, but I don't think they really capture why I think the MinION is so interesting. The MinION has several key areas where it can do some damage, and other areas where it opens up new possibilities.

sequencing microbial genomes de novo

This is very doable. I wouldn't say it's easy yet, but long reads negate a lot of the computational problems of de novo assembly: finding overlapping 10,000mers is a very different problem to finding overlapping 100mers.

infectious agent detection

Once you have prepped DNA, which takes from 10 minutes (with the "rapid" kit) to two hours, the actual process of detecting a pathogen could be under ten minutes. In practice I don't think anybody is going from blood sample to diagnosis this quickly, but the potential is there.

There is even software (Mykrobe) that detects drug-resistance genes in bacteria, and recommends appropriate antibiotics. When this is done cheaply and routinely it should help a lot with drug resistance and overprescription of antibiotics.

Since the data comes in one read at a time, as soon as you get one read from the infectious agent you are done.

direct RNA sequencing

If you want to read full-length transcripts, and see base modifications too, then the MinION is the only option that I know of. This capability is new, and the base modification detection is not accurate, but there's still plenty of interesting research to do with this.


Sequencing often requires barcoding, which adds fiddly extra steps before and after sequencing. But, if your reads are long enough, then you may not need to barcode. For example, you can sequence 96 plasmids at the same time — simply throw away any reads that are not the full length of the plasmid.

other long-read problems

There are a few classic long-read problems like HLA sequencing, VDJ sequencing and structural variant detection (especially for cancer). These are reasonably good applications for MinION, though VDJ sequencing probably needs more accuracy, and structural variant detection might need more throughput. (10X + Illumina makes the most sense for anything like this)

MinION in the Field

Oxford is making an effort to eliminate the "cold chain" for the MinION. The flow-cell itself already seems to keep well at temperatures well above refrigeration, and they claim they can lyophilize the other reagents. Even before that happens, with basically just a cooler, a laptop, and a way to extract DNA, doctors, ecologists, and other scientists can go out into the field and do sequencing anywhere.

Earlier this year, as part of the Zibra project, scientists from the UK and Brazil drove a van through Brazil, sampling and sequencing Zika virus along the way.

Biology labs and Biotechs

The advantage of MinION for non-genomics–focused biology labs is not really widely discussed, but I think it's one of the most important.

Basically, if you want a few megabases sequenced and you have a MinION and a flow-cell, you can have the data in your hands today. When you're done you can put the flow-cell away and use it again tomorrow. Depending on your needs, you might get 4-10 uses out of the flow-cell, meaning each run costs $150-300 including sample prep.

In contrast, if you want to get some data from a MiSeq, you are probably signing up for a gigabase of sequence. That's overkill for most labs, and it produces many gigabytes of raw data to manage too... If you want reasonable length reads (2x150bp), then sequencing will take at least 24 hours. If you are lucky enough to have a core lab at your institute then that helps, but you may still have to wait your turn.

If you don't work at a university — perhaps you're at a small biotech — then the alternative is buying a MiSeq (or MiniSeq) at $50-150k plus service contract, or sending your samples out to a CRO for sequencing. A CRO will have a turnaround time of at least a week, and that's after you've explained to them what you need and agreed on the terms.

It's hard to imagine a one-off MiSeq run happening in under a week, so being able to just do it yourself is a huge increase in efficiency.

If you're sequencing a thousand of anything, then Illumina is much cheaper, but I wonder how many biology labs need megabase-scale sequencing occasionally, but don't do it because of the current barriers to entry, including the computational burden of aligning and assembling short reads. There are cases where I would not have bothered with the hassle of getting something sequenced except that we could just do it ourselves with the MinION.

Genomics for Everyone

I think the most exciting thing going on here is just taking sequencing and genomics out of the lab and into the real world. Admittedly, this does require some improvements and inventions from Oxford, like easier DNA preparation, so there are caveats here, but nothing too crazy.

Oxford's metrichor site spells out some of the use-cases too. I'll just give some scattered examples of things to sequence, some more realistic than others, but I think each plausibly represent something new that has real economic value:

  • hospital surfaces and employees for MRSA
  • food at factories (detecting E. coli etc)
  • the environment at airports, workplaces, etc for flu (flu is expensive!)
  • at crime scenes (also a big deal since the current methods of forensic DNA analysis are awful)
  • at home, to see if you have a cold or flu, the same cold or a new one, and even figure out where you picked up the virus
  • the air to detect mold in buildings
  • farm animals' microbiomes to monitor gut health and improve growth
  • at methane farms, wine fermenters, beer fermenters, to monitor and manage the process
  • various kinds of labs for bacterial contamination
  • the sewage system of a city to monitor the city's diet and health
  • for educational purposes, and at competitions like iGEM
  • fish and other foods to detect mislabeling (a surprisingly big problem)
  • animals out in the wild for conservation purposes
  • your own microbiome to monitor your gut health
  • soil, plants, droppings, insects at farms to monitor pests etc.
  • at the dentist's to detect decay-causing bacteria
  • at the dermatologist's (cosmetologist?) to detect and treat acne-causing bacteria

These applications (apps?) can potentially be run by anyone. Stick some DNA in, wait a bit, processing happens on the cloud and the answer appears on your phone in a few minutes to a few hours. You don't need to know anything about genetics or molecular biology, you'll just see a readout that says "E. coli detected" in food or "DNA from new rhino detected" in droppings.

There's already a teaser of this with Oxford's What's in my Pot app. It figures out which microbes are in a sample, and draws a nice cladogram for you.

To realize this potential, the sequencer still needs to be cheaper, but the lower bound on that seems good, since the number of molecules involved is really tiny. (That's another advantage of single-molecule sequencers.)

Finally, coming back to present-day reality a little bit, Oxford will need to execute on their plans to make sequencing easier and cheaper (reagent lyophilization, Zumbador, SmidgION, Voltrax, FPGAs, etc. — watch Oxford's latest tech update for more on that), but I think MinION is going to become a very big deal in the next few years.

Brian Naughton // Sun 14 August 2016 // Filed under data // Tags data genomics statistics pymc3 nanopore

Estimating parameters for a toy nanopore simulator using a hierarchical Bayesian model in PyMC3

Read More
Brian Naughton // Thu 14 May 2015 // Filed under genomics // Tags sequencing nanopore minion

Some notes on the Oxford Nanopore Conference 2015

Read More

Boolean Biotech © Brian Naughton Powered by Pelican and Twitter Bootstrap. Icons by Font Awesome and Font Awesome More