This small project started when I was looking for an implementation of NeedlemanWunsch (pairwise global DNA alignment) in javascript. I just wanted to align two sequences on a website and in a google sheet (using Google Apps Script).
I looked around for a simple javascript implementation (preferably one selfcontained javascript file) but I was surprised that I couldn't find one. NeedlemanWunsch is a pretty simple algorithm so I decided to implement it. I did get a little bit sidetracked...
The first step was to find someone else's implementation to copy, so I started with some numpy code from @brent_p. Based on his other work, I think it's a safe assumption it's implemented correctly. (There is also a complete Cython version in this repo, which implements gap_extend and other parameters, and is obviously much faster. I really just need a very basic alignment so the simpler numpy version is fine for me).
numpy and friends
There are lots of ways to tweak the numpy implementation of NeedlemanWunsch to try to make it faster. Here are the things I tried:
 orig: the original numpy implementation.
 orig3: the original numpy implementation run with python 3.
This is just to test how much faster or slower Python 3.6 is than 2.7.  numpy: my numpy implementation.
This is like the original numpy code, but modified a bit to make it more like my code.  numba: my numpy implementation, but with numba applied.
Numba is a pretty amazing JIT compiler you can turn on by adding one line of code. It comes with anaconda, and it's always worth trying just in case.  torch: my numpy implementation, but with numpy replaced with PyTorch.
PyTorch seems like a friendly alternative to TensorFlow, especially with its numpylike syntax. Without explicitly applying.cuda()
to my arrays it just uses the CPU, so it should not be too different to regular numpy.  torchcuda: my numpy implementation, but with numpy replaced with PyTorch, and
.cuda()
applied to each array.
The same as torch except using the GPU.  cupy: my numpy implementation, but with numpy replaced with CuPy.
CuPy is a dropin replacement for numpy and, like PyTorch, only requires changing a couple of lines.
Nim
Nim is an interesting language that can compile to C (nimc) or javascript (nimjs).
I thought this was a pretty good usecase for nim since I need javascript but writing scientific code in javascript is not fun.
I started with a numpylike library called arraymancer,
which worked well, but since it relies on BLAS it would not compile to javascript (I could have checked that earlier...)
Luckily, changing the code to simple for loops was pretty easy.
Nim's syntax is a lot like Python, with some unnecessary differences like using echo
instead of print
.
As someone used to Python, I didn't find it to be as friendly as I expected.
The dream of Pythonwithstatictypes is still a dream...
Javascript
Finally, I just programmed the alignment in javascript (js). All of the implementations are almost lineforline identical, so this did not take long.
Speed comparison
I ran all the above implementations on some random DNA of various lengths and the results are plotted below.
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
data = {'orig': {500: 2.23, 1000: 8.50, 1500: 18.96, 2000: 33.45, 2500: 52.70, 3000: 76.44, 5000: 209.90},
'orig3': {500: 2.62, 1000: 10.34, 1500: 22.68, 2000: 40.22, 2500: 62.03, 3000: 90.15, 5000: 248.93},
'numpy': {500: 1.54, 1000: 3.41, 1500: 6.27, 2000: 10.60, 2500: 16.11, 3000: 22.87, 5000: 67.45},
'numba': {500: 5.54, 1000: 7.05, 1500: 9.28, 2000: 13.08, 2500: 17.40, 3000: 22.21, 5000: 56.69},
'torch': {500: 1.67, 1000: 3.92, 1500: 7.61, 2000: 12.86, 2500: 19.55, 3000: 27.48, 5000: 82.90},
'torchcuda': {500: 8.54, 1000: 22.47, 1500: 46.26, 2000: 80.12, 2500: 119.92, 3000: 169.95, 5000: 467.04},
'cupy': {500: 35.71, 1000: 138.86, 1500: 951.97, 2000: 1713.57, 2500: 2660.11, 3000: 3798.51},
'nimc': {500: 0.016, 1000: 0.041, 1500: 0.08, 2000: 0.13, 2500: 0.20, 3000: 0.31, 5000: 0.85},
'nimjs': {500: 0.14, 1000: 0.28, 1500: 0.48, 2000: 0.75, 2500: 1.12, 3000: 1.53, 5000: 4.06},
'js': {500: 0.09, 1000: 0.14, 1500: 0.20, 2000: 0.34, 2500: 0.41, 3000: 0.82, 5000: 1.64}}
vfast_ones = ["nimjs", "js", "nimc"]
fast_ones = ["torch", "numpy", "numba"] + vfast_ones
ok_ones = ["torchcuda", "orig3", "orig"] + fast_ones
df = pd.DataFrame(data)
df
cupy  js  nimc  nimjs  numba  numpy  orig  orig3  torch  torchcuda  

500  35.71  0.09  0.01  0.13  5.54  1.54  2.23  2.62  1.67  8.54 
1000  138.86  0.13  0.04  0.28  7.05  3.41  8.50  10.34  3.92  22.47 
1500  951.97  0.20  0.08  0.48  9.28  6.27  18.96  22.68  7.61  46.26 
2000  1713.57  0.34  0.13  0.75  13.08  10.60  33.45  40.22  12.86  80.12 
2500  2660.11  0.41  0.20  1.12  17.49  16.11  52.70  62.03  19.55  119.92 
3000  3798.51  0.82  0.31  1.53  22.21  22.87  76.44  90.15  27.48  169.95 
5000  NaN  1.64  0.85  4.06  56.69  67.45  209.90  248.93  82.90  467.04 
I'll skip cupy since it's much slower than everything else and throws the plots off. That doesn't imply anything negative about cupy and I'd use it again. It's extremely easy to replace numpy with cupy, and for properly vectorized code I'm sure it's much faster than numpy.
f,ax = plt.subplots(figsize=(16,12))
ax.set_title("fast: everything except cupy")
_ = df[ok_ones].plot(ax=ax)
f,ax = plt.subplots(figsize=(16,12))
ax.set_title("faster: numpy vs C vs js")
_ = df[fast_ones].plot(ax=ax)
f,ax = plt.subplots(figsize=(16,12))
ax.set_title("fastest: C vs js")
_ = df[vfast_ones].plot(ax=ax)
Conclusions
I learned some interesting things here...

numba is good. It didn't speed this code up very much, but it was a bit faster than numpy for large alignments and didn't cost anything. I expected this to be the fastest Pythonbased code because there are several Python for loops (i.e., unvectorized code), which is where numba can help a lot.

I'm not sure why my numpy is faster than the original numpy since my changes were minimal. The original version is not coded for speed anyway.

GPUs don't help unless your code is written for GPUs. That basically means one repetitive task handed off to the GPU along with the data (no backandforth). There are ways to implement NeedlemanWunsch in a GPUfriendly way, but it complicates the code a lot. On the one hand this is a very obvious result to anyone who has used GPUs for computation — on the other hand, maybe a really smart compiler could use the CPU where appropriate and GPU where appropriate...

Nim is a pretty interesting language. I have seen it described as either "an easier C" or a "statically typed Python". To me it's definitely more like the former. It's not all that friendly compared to Python, but I think I'd try it again as a C/Cython replacement. Don't forget to compile with
d:release
. 
Javascript is fast! If nim is not compiled with
d:release
it's even faster than nim's C code. Sadly, Google Apps Scripts' javascript is extremely slow for some reason. That was an unfortunate surprise, especially since it times out after about five minutes, so long alignments just fail! I can't explain why it's so slow...
Finally, just to note that this implementation is good enough for my purposes, but I haven't really spent any time making sure it works in all situations (apart from affirming that its output is the same as the original numpy code), so I wouldn't trust it too much. The code is available in this repo.
Comment