Cool Things Review

Brian Naughton // Sun 05 June 2016 // Filed under biotech // Tags data biotech genomics statistics

This is a short list of interesting projects, companies and technologies I've been following.

Biotech

Transcriptic

Transcriptic is still the only programmable cloud lab open for business. My previous blogpost has a lot more on this company. They're continuing to add capabilities, including magnetic bead operations, which enable a lot of new kinds of experiments.

Vium

After several years in stealth, Vium (formerly Mousera) just launched this week.

Essentially, Vium have "smart home" mouse cages that stream data directly to you. You can control the mouse's environment remotely and get video, humidity and temperature readings in real-time. Perhaps most importantly, you can start to analyze the data almost immediately, whereas with a regular CRO, you might have to wait months for the trial to finish. That just speeds everything up.

The advantages of this model are potentially enormous. I'm especially excited about the potential for adaptive trials: imagine administering 10 candidate drugs to 50 mice on day one, and discontinuing ineffectual drugs when your Bayes Factor drops below some threshold. The speed and cost advantage could be great.
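To make the adaptive-trial idea concrete, here is a minimal sketch of one possible stopping rule. Everything specific in it is my assumption, not anything Vium provides: each mouse is scored as a responder or non-responder, the placebo response rate p0 is taken as known, the "drug works" hypothesis puts a uniform prior on response rates above p0, and an arm is dropped once its Bayes Factor falls below 1/3.

```python
def bayes_factor(responders, n, p0, steps=2000):
    """Bayes Factor for H1 (response rate uniform on (p0, 1)) vs H0 (rate == p0).

    The H1 marginal likelihood is the average likelihood over the prior,
    approximated here with a midpoint rule. The binomial coefficient is the
    same under both hypotheses, so it cancels and is left out.
    """
    width = (1.0 - p0) / steps
    total = 0.0
    for i in range(steps):
        p = p0 + (i + 0.5) * width
        total += p ** responders * (1.0 - p) ** (n - responders)
    m1 = total / steps
    m0 = p0 ** responders * (1.0 - p0) ** (n - responders)
    return m1 / m0


def arms_to_continue(counts, p0=0.2, threshold=1 / 3):
    """Keep the drugs whose Bayes Factor is still at or above the threshold.

    counts maps a (hypothetical) drug name to (responders, mice observed).
    """
    return [
        drug
        for drug, (responders, n) in counts.items()
        if bayes_factor(responders, n, p0) >= threshold
    ]


# Interim look: 5 mice per drug, illustrative numbers only.
interim = {"drug_A": (4, 5), "drug_B": (0, 5)}
print(arms_to_continue(interim))
```

With these made-up counts, the arm with no responders falls below the threshold and would be discontinued, while the other continues. A real design would also need to account for repeated looks at the data and for a placebo rate that is itself estimated.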

IndieBio

IndieBio is an accelerator/VC fund that funds a lot of different kinds of lean biotech. A lot of it is hard-to-categorize stuff that neither typical biotech VCs nor tech VCs fund (probably most have no idea what you are talking about).

IndieBio especially focuses on sectors like food tech, agriculture and cosmetics (or perhaps more broadly, biotech that's not a drug or a diagnostic). These industries are enormous though, and the opportunities are huge.

Their demo day videos (2015, 2016) have an extremely varied group of pitches, from 3D printed rhino horns to neuron-based computers, and are definitely worth checking out.

Oxford Nanopore

There's really too much going on with Oxford Nanopore to effectively summarize here. I might have to write a whole post or something.

Oxford's CTO, Clive Brown, gave a talk at London Calling that describes their roadmap, and it includes incredible things like sequencers that attach to your iPhone, portable all-in-one sample prep, and DNA synthesis.

The next year or two is going to be very interesting for Oxford.

The MinION is very portable

Data Analysis

Continuum Analytics

Continuum Analytics is the new company from Travis Oliphant (of numpy fame), and it produces some of the most useful software for scientific Python.

The products I use actively are:

  • anaconda Python distribution: this saves so many headaches it's unbelievable, and it has MKL support now
  • conda: this has replaced virtualenv for me entirely, and pip most of the time
  • numba: this comes up less but replaces Cython or numpy sometimes, and can even automatically use the GPU for you via CUDA

Continuum also have plotting libraries (bokeh), big data libraries (dask) and more. I haven't used these much, but I'd presume they are all high quality.

Tensorflow

Tensorflow is a library that is hard to explain, but basically takes numerical expressions that you define in Python, and does matrix math for you, including things like calculating gradients. Because of its design, it can distribute the work across computers, and across CPUs and GPUs.
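The "calculating gradients" part is easier to see in code. The following is a toy sketch of reverse-mode automatic differentiation in plain Python. It is not Tensorflow's API or implementation (the Var class and backward function are names I made up for illustration); it only shows the general idea of recording an expression as a graph and then walking it backwards to get gradients.

```python
class Var:
    """A scalar value that remembers how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = _wrap(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = _wrap(other)
        return Var(
            self.value * other.value,
            ((self, other.value), (other, self.value)),
        )

    __radd__ = __add__
    __rmul__ = __mul__


def _wrap(x):
    """Promote plain numbers to Var so they can appear in expressions."""
    return x if isinstance(x, Var) else Var(float(x))


def backward(output):
    """Fill in .grad for every Var that output depends on (chain rule)."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Process nodes from the output back towards the inputs.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


# Define an expression in ordinary Python, then ask for its gradients.
x, w, b = Var(3.0), Var(2.0), Var(1.0)
y = w * x + b
loss = y * y
backward(loss)
print(loss.value, w.grad, x.grad, b.grad)
```

The real libraries do the same kind of bookkeeping on whole matrices, and then decide where to run each operation.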

Theano and Tensorflow are very similar: you can even use Tensorfuse as a common interface to both. Theano was first, but Tensorflow seems to be winning for the moment. Tensorflow has some advantages like integration with mobile devices (I don't think that really exists yet, but it's coming).

Although Theano and Tensorflow are best known as deep learning libraries, they do a lot more than that. One nice feature of building on Theano/Tensorflow is that you can write in pure Python, and let the library figure out the best way to compute everything.

Keras

Deep learning requires a lot of matrix multiplications and gradient calculations. Hence there are lots of deep learning libraries built on Theano/Tensorflow, and new ones pop up all the time. My favorite is keras, because it's in Python, high-level, well documented, and works with both Theano and Tensorflow.

PyMC3

PyMC3 is a "probabilistic programming" library similar to Stan (an MCMC workhorse from Andrew Gelman's lab), but in Python. Frankly, it's not nearly as polished or popular as Stan, but because it's built on Theano and scipy, the code is very short and readable Python, which is a big plus for me. It's extensible in ways Stan just can't be.

Like Stan, PyMC3 now supports faster-than-MCMC Variational Inference (ADVI). This blogpost, from @twiecki, one of PyMC3's core developers, shows the power of building on Theano. In just a few dozen lines of Python, he builds a Bayesian neural net and fits it with ADVI. It's not super-practical, but a very interesting result.

ADVI Bayesian neural net

Edward

Edward is a very interesting project that I won't claim to fully understand. It's a probabilistic modeling library built on Tensorflow, that somehow manages to include the ability to use models defined in PyMC3, Stan, or keras.

It's a pretty amazing project, and I like their explanation of Box's loop (due to George E. P. Box, after whom Edward is named) of modeling, inference, and criticism. It's also notable for how it shows the convergence of multiple related inference tools into simple, high-level code that sits on top of industrial-strength libraries like Tensorflow.

