
It's been a dream of mine for a long time to be able to do sequencing at home — just take whatever stuff I want: microbiome, viral/bacterial infections, insects, fungi, foods, sourdough, sauerkraut, and sequence it. Now at last, with the debut of Oxford Nanopore's flongle, it's possible!

So, a few months back, I bought some flongles (basically on launch day) and set up a home sequencing lab. In this post I'll describe what's in the lab and my first sequencing experiments.

What is a flongle?

As a refresher, Oxford Nanopore's MinION sequencer is a hand-held, single molecule nanopore sequencer. As DNA passes through a pore, the obstruction modulates the current across the pore in a pattern that can be mapped onto a sequence of nucleotides.

*DNA going through a pore, from an [ONT video](https://vimeo.com/337258910).*

The MinION device itself is a fairly simple container for the nanopore-containing flow-cells. The standard MinION flow-cell contains 512 channels (each of which has one active pore at a time), plus the ASIC chip that reads the changes in current. There's a great explainer on the Oxford Nanopore website.

Oxford Nanopore's newest flow-cell, the flongle ("flow-cell dongle"), is basically a cheaper, more disposable version of the standard MinION flow-cell. The flongle snaps into an adapter that includes the ASIC you usually find in a regular flow-cell.

*(a) MinION sequencer with loaded flow-cell. (b) Flongle and adapter.*

Whereas a 512 channel flow-cell costs $475-900, 126 channel flongles cost about $90-150 each. At the time of writing, you have to buy at least a starter pack of 12 for $1860, which includes the adapter.

Each pore can output several megabases of sequence, so the total amount of sequence can be in the hundreds of megabases. (The biggest flongle run I know of is ~2 gigabases, in the hands of an experienced lab.)

For now, you have to spend quite a lot to get access to flongles, and supply is severely limited. I have received only four so far, out of the 48 I bought! (That was the minimum order size at launch.) So, there are definitely beta program issues here. Still, $100 NGS runs!

Home Lab Equipment

Surprisingly, you don't actually need that much expensive equipment to do nanopore sequencing. For my home lab, I bought the following:

*The lab during a sequencing run: MinION running, with a used flongle on the desk in front.*

*(a) Eppendorf 5415C centrifuge. (b) Anova sous vide in a Costco coffee can*

DNA Extraction

The first step in sequencing is DNA extraction. I bought a Zymo Quick-DNA Microprep Plus Kit for $132 (50 preps, so a little under $3 per prep).

This kit takes "20 minutes" (it takes me about 40 minutes including setting up). It can work with "cell culture, solid tissue, saliva, and any biological fluid sample". This prep is very easy to do, and almost all the reagents are just stored at room temperature. They claim it can recover >50kb fragments, which is very respectable. This is far from the megabase-long "whale" reads some labs work to achieve, but those preps are much more complex and time-consuming. Generally speaking, 10kb reads are more than long enough for most use-cases, and even 100bp-1kbp reads can still be used for species ID.

I am lucky to have access to a nanodrop spectrophotometer at work, so I can check my DNA quality. (Nanodrops cost thousands of dollars, even second-hand.) I think this wouldn't matter if I was sequencing saliva repeatedly: that seems to work the same every time. However, it matters quite a bit for experimenting with sequencing different sample types.

Library prep

Library prep is the process of preparing the DNA for sequencing, for example by attaching the "motor protein" that ratchets the DNA through the pore one base at a time. Not being an experimentalist, I like to stick to the simplest possible protocols. That means rapid DNA extraction and ONT's rapid library prep (RAD-004), which costs $600 for 12 preps ($50 per prep).

Library prep is a little harder than DNA extraction, but still only takes around 40 minutes. There are some very low volumes involved (0.5µl, which is as low as my pipettes go!), and you need two water bath temperatures, but overall it's pretty straightforward.

The total time from acquiring a sample to beginning sequencing is maybe 1.5 hours. You definitely pay for this convenience in read length and throughput, but the tradeoff is not too bad.

Admittedly, the cost is more like $150 than $100 per run, but with the nuclease wash protocol now available to rejuvenate flow-cells, I think it's ok to round down...
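The per-run arithmetic can be sketched from the prices quoted in this post. The $100 flongle price is an assumed mid-range value from the $90-150 range, and equipment and consumables such as pipette tips are ignored.

```python
# Prices quoted in this post (USD). FLONGLE_USD is an assumed mid-range value.
FLONGLE_USD = 100.0
LIBRARY_KIT_USD, LIBRARY_PREPS = 600.0, 12        # ONT rapid library prep kit
EXTRACTION_KIT_USD, EXTRACTION_PREPS = 132.0, 50  # Zymo extraction kit


def cost_per_run(flongle_usd=FLONGLE_USD):
    """Consumables cost of one run: flongle + one library prep + one extraction."""
    library = LIBRARY_KIT_USD / LIBRARY_PREPS
    extraction = EXTRACTION_KIT_USD / EXTRACTION_PREPS
    return flongle_usd + library + extraction


print(f"${cost_per_run():.2f} per run")
```

With these numbers, library prep accounts for about a third of the cost of a run.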

Experiment one: saliva

This is probably the simplest possible experiment: extract human and bacterial DNA from saliva, and sequence it. Saliva has lots of human DNA — surprisingly, most of it is from white blood cells — and plenty of oral microbiome bacteria, and it's easy to get as much as you want. However, since bacterial genomes are about 0.001X the size of a human genome, you'd need 1000 bacterial genomes for every human genome if you want equal coverage of both.

*Experiment one: (a) DNA quantification from saliva. (b) A decent read length distribution, topping out at 60kb.*

This experiment generated a pretty respectable 100 megabases of sequence in 24 hours, which is basically what I was hoping for.

As soon as the DNA is loaded, reads start to get written to disk. After a minute, you have reads you can feed into BLAST to see if everything is working as expected. The instant access to data is a great reward for doing the boring prep work.

*First sequencing run at home: pores sequencing, 34 minutes and 10 megabases in!*

There are a few ways to analyze the data. There are several metagenome analysis tools, like Centrifuge and Kraken. I spent a couple of days(!) downloading the Centrifuge databases — which are massive since they need reference sequence data from bacteria, fungi, viruses etc. — only to have the build fail right afterward.

Luckily, Oxford Nanopore has some convenient tools online for analysis. It turns out that one of these, What's In My Pot (WIMP), is based on Centrifuge, so it's convenient to just run that.

*Experiment one: WIMP results for unfiltered saliva*

As we can see, >99% of the reads are human or Escherichia. Upon closer inspection, the reads labeled "Escherichia coli" and "Escherichia virus Lambda" are all lambda DNA. As a QC step, I spiked lambda DNA (provided by ONT for QC purposes) into my DNA library at approximately 13% by volume. About 12% of my reads are lambda, so I know the molarity of my input sample is not too far off the reference lambda DNA.

After you get past the human and lambda DNA, the vast majority of reads map to known oral microbiome bacteria. Without anything to compare to, I can't point to any specific trends here yet.

Human DNA

What can you do with 80 megabases of human DNA? I know from just BLASTing reads that the accuracy is consistently 89-91%. Since a hundred megabases is only a 0.03X genome, it's not very useful for any human genetics tasks except maybe ancestry assignment, Gencove-style.
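The coverage figure is just total sequenced bases divided by genome size. A minimal sketch, assuming a haploid human genome of roughly 3.1 gigabases (an approximation, not a number from this post):

```python
HUMAN_GENOME_BP = 3.1e9  # assumed approximate haploid human genome size


def coverage(total_bases, genome_size_bp):
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size_bp


print(f"{coverage(100e6, HUMAN_GENOME_BP):.3f}X")  # 100 megabases of reads
print(f"{coverage(80e6, HUMAN_GENOME_BP):.3f}X")   # 80 megabases of reads
```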

One thing I can do is intersect these reads with my 23andMe data and see how often they're concordant (the 23andMe data is diploid and these are single-molecule reads, so it's not quite simple matching). Doing this intersection using bcftools and including only high quality reads resulted in only a few hundred SNPs. I did not find any variants that disagreed, which was surprising but nice to see.
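To illustrate why this is "not quite simple matching", here is a rough sketch of the comparison logic. It is not the bcftools pipeline used above: the function names and the in-memory dictionaries are hypothetical, and it assumes the read bases have already been extracted at each SNP position. A single-molecule read carries one allele, so it counts as concordant if it matches either allele of the diploid call.

```python
def is_concordant(read_base, genotype):
    """True if the single base seen in a read is one of the two called alleles."""
    return read_base.upper() in genotype.upper()


def concordance(read_calls, array_genotypes):
    """Compare haploid read calls against diploid array genotypes.

    read_calls:      {(chrom, pos): "A"}   hypothetical per-site read bases
    array_genotypes: {(chrom, pos): "AG"}  hypothetical diploid genotype calls
    Returns (n_concordant, n_compared). Sites missing from the array data,
    and genotypes that are not plain A/C/G/T pairs (no-calls, indels), are skipped.
    """
    n_ok = n_total = 0
    for site, base in read_calls.items():
        genotype = array_genotypes.get(site)
        if genotype is None or not set(genotype.upper()) <= set("ACGT"):
            continue
        n_total += 1
        n_ok += is_concordant(base, genotype)
    return n_ok, n_total


reads = {("1", 1000): "A", ("1", 2000): "C", ("2", 500): "T", ("3", 42): "G"}
array = {("1", 1000): "AG", ("1", 2000): "TT", ("2", 500): "--"}
print(concordance(reads, array))
```

Note that at roughly 90% read accuracy, a real comparison would also need base-quality filtering, which this sketch leaves out.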

Experiments two and three: failing to filter saliva

Obviously, it's a waste to generate so many human reads. Since I don't need my genome sequenced again (ok, I only have exome data), especially 0.03X at a time, I wanted to try to enrich for oral bacteria. There are host depletion kits that apparently work well, but they're kind of expensive, so I wanted to see what would happen if I just tried to physically filter saliva.

We know that human cells are usually >10µm and bacterial cells are usually <10µm, so that's a pretty simple threshold to filter by. I bought a "10µm" paper filter on Amazon, and just filtered saliva through it.

Experiments two and three produced almost identical results. The only differences were that after experiment two failed I tried to eliminate contamination from the paper filter by pre-washing it, and I quantified the DNA with a nanodrop, a step I skipped in experiment two. After multiple rounds of filtering–centrifugation–pouring off, I only managed to get 10ng/µl of DNA, which is very low. However, I knew that my first 32ng/µl run worked fine, so I convinced myself it must reach the recommended minimum of 5 femtomoles of DNA (that's only 3 billion molecules!), especially since the 260/280 was not that bad.
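The femtomole figures can be checked with a little arithmetic. The sketch below converts femtomoles to molecule counts using Avogadro's number, and converts a DNA mass to femtomoles using the common approximation of about 660 g/mol per base pair of double-stranded DNA. The 10 kb mean fragment length in the example is purely an assumption for illustration, since the real fragment lengths here are unknown.

```python
AVOGADRO = 6.02214076e23        # molecules per mole
DSDNA_G_PER_MOL_PER_BP = 660.0  # approximate average mass of one dsDNA base pair


def fmol_to_molecules(fmol):
    """Number of molecules in a given number of femtomoles."""
    return fmol * 1e-15 * AVOGADRO


def ng_to_fmol(mass_ng, mean_fragment_bp):
    """Femtomoles of dsDNA fragments in a given mass, for an assumed mean length."""
    grams = mass_ng * 1e-9
    moles = grams / (mean_fragment_bp * DSDNA_G_PER_MOL_PER_BP)
    return moles * 1e15


print(f"{fmol_to_molecules(5):.2e} molecules in 5 fmol")
print(f"{ng_to_fmol(33, 10_000):.1f} fmol in 33 ng of 10 kb fragments")
```

Under these assumptions, molarity depends as much on fragment length as on concentration: the same mass of longer fragments contains fewer molecules.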

The experiment worked as planned, in the sense that instead of 99+% human DNA, I got 50% human DNA and 50% bacterial. However, instead of 100 megabases, I only got 2, and most were low quality!

*Experiment three: (a) DNA quantification from filtered saliva. (b) WIMP on filtered saliva*

My best guess here is that somehow the paper contaminated the DNA, since the pores apparently got clogged after just a couple of hours. I should at least have made sure I had a lot of DNA, though I don't have great ideas on how to do that beyond just spitting for an hour... It's likely I'll just need to use a proper microbiome prep kit next time.

Experiment four: wasp sequencing

Amazingly, despite having pretty small genomes (100s of megabases), most insects have never been sequenced! It's not clear to me that you can create a high quality genome assembly from only flongle reads, but if you can get 100 megabases of DNA, that's definitely a good start.

We have a wasp trap in our back yard. It caught a wasp but we were not sure what kind. It could be the most common type of wasp in the area, the western yellowjacket. It looks exactly like one, which is a bit of a clue.

*Distribution of western yellowjacket vs common aerial yellowjacket, according to [iNaturalist](https://www.inaturalist.org)*

But eyes can deceive. The only real way to figure out for sure if this is even a wasp is by sequencing its genome, or at least it would be if there were a reference genome. Surprisingly there is no genome for the western yellowjacket or the other likely species, the common aerial yellowjacket.

*(a) Before mushing, with an aphid and other tiny insects in the second tube. (b) After mushing, which was pretty gross.*

I took the wasp, plus an aphid that looked freshly caught in a spiderweb, and a few other tiny insects scurrying around nearby. Then I mushed them up and used the Zymo solid tissue protocol to extract DNA.

*Experiment four: (a) DNA quantification and (b) read-length distribution*

This time the DNA extraction gave great quantity and quality. The total amount of sequence generated was 100 megabases again. However, the read length is extremely short on average. A general rule for nanopore sequencing is that you get out what you put in. In retrospect this problem should have been pretty obvious: although it looked ok, the wasp was not fresh enough, so its DNA was very degraded.
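Read-length summaries like these are easy to compute directly from the FASTQ output. The following is a minimal sketch using only the Python standard library, not the tool that produced the plots in this post. It assumes plain four-line FASTQ records (unwrapped sequence lines), and the three tiny reads in the demo are made up.

```python
import io


def fastq_read_lengths(handle):
    """Yield sequence lengths from a four-line-per-record FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            yield len(line.strip())


def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no reads


# Made-up demo records; a real run would open the FASTQ file instead.
demo = io.StringIO(
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nAC\n+\nII\n"
)
lengths = list(fastq_read_lengths(demo))
print("mean:", sum(lengths) / len(lengths), "N50:", n50(lengths))
```

A run dominated by degraded DNA would show a low mean, while the N50 gives a sense of where most of the bases actually sit.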

Interestingly, there are quite a few long fragments (>5kb) in here too, and these map imperfectly to various aphid genomes (indicating that this particular aphid has also not been sequenced) and bacteria including possible wasp endosymbionts like Pantoea agglomerans. This is expected if the aphid and bacteria are fresh.

*Experiment four: (a) WIMP of wasp and aphid reads. (b) BLASTing reads produces better results*

I also ran WIMP but it turned out not to be useful, since this is not a "real" metagenomics run (i.e., it's not mainly a mixture of microbes and viruses). The closest matches are just misleading.

It would have been nice to be the first to assemble the western yellowjacket genome, or even a commensal bacterial genome, but I would have needed a lot more reads. Wasp genomes are around 200 megabases, so to get a decent quality genome I'd need at least 10 gigabases of sequence (50X coverage). That means a MinION run (or several), perhaps polished with some Illumina data. The commensal bacteria are probably under 5 megabases, so it would be easier to create a reference genome, assuming any could be grown outside the wasp...
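The sequencing budget follows from genome size times target coverage. As a rough sketch, assuming each flongle run yields about 100 megabases (the yield of the better runs in this post) and ignoring read length, which matters a lot for assembly:

```python
import math


def required_bases(genome_size_bp, target_coverage):
    """Total sequence needed to reach a target average coverage."""
    return genome_size_bp * target_coverage


def runs_needed(total_bases, yield_per_run_bases):
    """Whole number of runs needed to reach a total yield."""
    return math.ceil(total_bases / yield_per_run_bases)


wasp = required_bases(200e6, 50)    # ~200 megabase wasp genome at 50X
bacterium = required_bases(5e6, 50) # ~5 megabase bacterial genome at 50X
print(wasp / 1e9, "gigabases:", runs_needed(wasp, 100e6), "flongle runs")
print(bacterium / 1e6, "megabases:", runs_needed(bacterium, 100e6), "flongle runs")
```

Under these assumptions, the wasp genome is out of reach for flongles alone, while a small bacterial genome would fit in a few runs if the sample were mostly that one organism.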

Next steps

Four flongles in, I am still pretty amazed that I can generate a hundred megabases of sequence, at home, for so little money and equipment.

I can almost run small genome projects at home, and submit the assembled genomes to NCBI. (I still need more sequencing throughput to do this in earnest.) Like the western yellowjacket, there are tons of genomes yet to be sequenced that should be sequenced. In general, plants and more complex eukaryotes will be too difficult, but bacteria, fungi, and insects should all be possible at home.

Preserving the DNA sequences of species could become an extremely important step in conservation and even de-extinction. The only group I know of doing work in this area is Revive & Restore. One of their projects is to try to bring back the woolly mammoth by bootstrapping from elephant to elephant–mammoth hybrid, and eventually to full mammoth. Of course this would not be possible without the mammoth genome sequence. The list of endangered species is very long, so there's a lot to do.

Boolean Biotech © Brian Naughton