archives –––––––– @btnaughton

This small project started when I was looking for an implementation of Needleman-Wunsch (pairwise global DNA alignment) in javascript. I just wanted to align two sequences on a website and in a google sheet (using Google Apps Script).

I looked around for a simple javascript implementation (preferably one self-contained javascript file) but I was surprised that I couldn't find one. Needleman-Wunsch is a pretty simple algorithm so I decided to implement it. I did get a little bit side-tracked...

The first step was to find someone else's implementation to copy, so I started with some numpy code from @brent_p. Based on his other work, I think it's a safe assumption it's implemented correctly. (There is also a complete Cython version in this repo, which implements gap_extend and other parameters, and is obviously much faster. I really just need a very basic alignment so the simpler numpy version is fine for me).

numpy and friends

There are lots of ways to tweak the numpy implementation of Needleman-Wunsch to try to make it faster. Here are the things I tried:

  1. orig: the original numpy implementation.
  2. orig3: the original numpy implementation run with python 3.
    This is just to test how much faster or slower Python 3.6 is than 2.7.
  3. numpy: my numpy implementation.
    This is like the original numpy code, but modified a bit to make it more like my code.
  4. numba: my numpy implementation, but with numba applied.
    Numba is a pretty amazing JIT compiler you can turn on by adding one line of code. It comes with anaconda, and it's always worth trying just in case.
  5. torch: my numpy implementation, but with numpy replaced with PyTorch.
    PyTorch seems like a friendly alternative to TensorFlow, especially with its numpy-like syntax. Without explicitly applying .cuda() to my arrays it just uses the CPU, so it should not be too different to regular numpy.
  6. torchcuda: my numpy implementation, but with numpy replaced with PyTorch, and .cuda() applied to each array.
    The same as torch except using the GPU.
  7. cupy: my numpy implementation, but with numpy replaced with CuPy.
    CuPy is a drop-in replacement for numpy and, like PyTorch, only requires changing a couple of lines.

Nim

Nim is an interesting language that can compile to C (nimc) or javascript (nimjs). I thought this was a pretty good use-case for nim since I need javascript but writing scientific code in javascript is not fun. I started with a numpy-like library called arraymancer, which worked well, but since it relies on BLAS it would not compile to javascript (I could have checked that earlier...) Luckily, changing the code to simple for loops was pretty easy. Nim's syntax is a lot like Python, with some unnecessary differences like using echo instead of print. As someone used to Python, I didn't find it to be as friendly as I expected. The dream of Python-with-static-types is still a dream...

Javascript

Finally, I just programmed the alignment in javascript (js). All of the implementations are almost line-for-line identical, so this did not take long.

Speed comparison

I ran all the above implementations on some random DNA of various lengths and the results are plotted below.

import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
data = {'orig': {500: 2.23, 1000: 8.50, 1500: 18.96, 2000: 33.45, 2500: 52.70, 3000: 76.44, 5000: 209.90}, 
        'orig3': {500: 2.62, 1000: 10.34, 1500: 22.68, 2000: 40.22, 2500: 62.03, 3000: 90.15, 5000: 248.93}, 
        'numpy': {500: 1.54, 1000: 3.41, 1500: 6.27, 2000: 10.60, 2500: 16.11, 3000: 22.87, 5000: 67.45}, 
        'numba': {500: 5.54, 1000: 7.05, 1500: 9.28, 2000: 13.08, 2500: 17.40, 3000: 22.21, 5000: 56.69}, 
        'torch': {500: 1.67, 1000: 3.92, 1500: 7.61, 2000: 12.86, 2500: 19.55, 3000: 27.48, 5000: 82.90}, 
        'torchcuda': {500: 8.54, 1000: 22.47, 1500: 46.26, 2000: 80.12, 2500: 119.92, 3000: 169.95, 5000: 467.04}, 
        'cupy': {500: 35.71, 1000: 138.86, 1500: 951.97, 2000: 1713.57, 2500: 2660.11, 3000: 3798.51}, 
        'nimc': {500: 0.016, 1000: 0.041, 1500: 0.08, 2000: 0.13, 2500: 0.20, 3000: 0.31, 5000: 0.85}, 
        'nimjs': {500: 0.14, 1000: 0.28, 1500: 0.48, 2000: 0.75, 2500: 1.12, 3000: 1.53, 5000: 4.06}, 
        'js': {500: 0.09, 1000: 0.14, 1500: 0.20, 2000: 0.34, 2500: 0.41, 3000: 0.82, 5000: 1.64}}
vfast_ones = ["nimjs", "js", "nimc"]
fast_ones = ["torch", "numpy", "numba"] + vfast_ones
ok_ones = ["torchcuda", "orig3", "orig"] + fast_ones
df = pd.DataFrame(data)
df
cupy js nimc nimjs numba numpy orig orig3 torch torchcuda
500 35.71 0.09 0.01 0.13 5.54 1.54 2.23 2.62 1.67 8.54
1000 138.86 0.13 0.04 0.28 7.05 3.41 8.50 10.34 3.92 22.47
1500 951.97 0.20 0.08 0.48 9.28 6.27 18.96 22.68 7.61 46.26
2000 1713.57 0.34 0.13 0.75 13.08 10.60 33.45 40.22 12.86 80.12
2500 2660.11 0.41 0.20 1.12 17.49 16.11 52.70 62.03 19.55 119.92
3000 3798.51 0.82 0.31 1.53 22.21 22.87 76.44 90.15 27.48 169.95
5000 NaN 1.64 0.85 4.06 56.69 67.45 209.90 248.93 82.90 467.04

I'll skip cupy since it's much slower than everything else and throws the plots off. That doesn't imply anything negative about cupy and I'd use it again. It's extremely easy to replace numpy with cupy, and for properly vectorized code I'm sure it's much faster than numpy.

f,ax = plt.subplots(figsize=(16,12))
ax.set_title("fast: everything except cupy")
_ = df[ok_ones].plot(ax=ax)

plot_fast

f,ax = plt.subplots(figsize=(16,12))
ax.set_title("faster: numpy vs C vs js")
_ = df[fast_ones].plot(ax=ax)

plot_faster

f,ax = plt.subplots(figsize=(16,12))
ax.set_title("fastest: C vs js")
_ = df[vfast_ones].plot(ax=ax)

plot_fastest

Conclusions

I learned some interesting things here...

  • numba is good. It didn't speed this code up very much, but it was a bit faster than numpy for large alignments and didn't cost anything. I expected this to be the fastest Python-based code because there are several Python for loops (i.e., unvectorized code), which is where numba can help a lot.

  • I'm not sure why my numpy is faster than the original numpy since my changes were minimal. The original version is not coded for speed anyway.

  • GPUs don't help unless your code is written for GPUs. That basically means one repetitive task handed off to the GPU along with the data (no back-and-forth). There are ways to implement Needleman-Wunsch in a GPU-friendly way, but it complicates the code a lot. On the one hand this is a very obvious result to anyone who has used GPUs for computation — on the other hand, maybe a really smart compiler could use the CPU where appropriate and GPU where appropriate...

  • Nim is a pretty interesting language. I have seen it described as either "an easier C" or a "statically typed Python". To me it's definitely more like the former. It's not all that friendly compared to Python, but I think I'd try it again as a C/Cython replacement. Don't forget to compile with -d:release.

  • Javascript is fast! If nim is not compiled with -d:release it's even faster than nim's C code. Sadly, Google Apps Scripts' javascript is extremely slow for some reason. That was an unfortunate surprise, especially since it times out after about five minutes, so long alignments just fail! I can't explain why it's so slow...

Finally, just to note that this implementation is good enough for my purposes, but I haven't really spent any time making sure it works in all situations (apart from affirming that its output is the same as the original numpy code), so I wouldn't trust it too much. The code is available in this repo.

Comment

Boolean Biotech © Brian Naughton Powered by Pelican and Twitter Bootstrap. Icons by Font Awesome and Font Awesome More