I've been messing around in a Kaggle competition, and one of the frustrating aspects
is waiting for my laptop to finish running analyses.
It makes sense to me that I should be using someone else's cloud for this, since then
I could get a bigger processor or more processors when I need them.
There used to be a nice Python-focused platform called PiCloud that did this.
Essentially, PiCloud was selling a convenient interface to AWS and taking a cut.
I never really used it, but the execution was very nice.
Sadly, I don't think many people used PiCloud, and it has since been bought by Dropbox and
shut down.
The Python-based options for cloud data analysis that I know about right now are Wakari and
Domino.
Wakari
Wakari is from the awesome Continuum Analytics guys (makers of Anaconda), so it's
Python only.
It's simply an IPython notebook that runs in your browser, with an AWS instance on the
backend. They have an unlimited free tier, which is very nice for testing it out.
It's not too expensive, but there is no pure pay-as-you-go option, which is my preference.
To get more compute power than my laptop (which has good oomph at four 2.7GHz cores
and 16GB of RAM) would be pretty expensive: even the $100/month premium option only has
3GB of RAM.
Domino
Domino appears to be a new company, and is purely focused on "data science".
It supports Python, R, Julia and Matlab.
It is a great concept, and has some really nice features, such as the ability to
expose your model and results as API endpoints.
There's a lot of scope for Domino to add value on top of the output of your analyses
with UIs, APIs, etc.
Despite its great promise, I found the execution of Domino very off-putting:
- it doesn't have seaborn (a Python library) installed. Why not just install every
reasonably common library?
- the introductory/free tier has a two-hour limit and a clock counting down at you, so you
can't just try it out in peace. This should work more like a cellphone data cap, and
just slow down when you hit the limit. I don't want to start something if I have to
keep watching the clock.
- a minor point, but every running instance has an associated animated loading spinner,
which is just distracting.
- the lowest tier ("hobby"), where you just pay per minute of CPU, is limited to one(!)
project. To have five projects at a time I need to pay an additional $99 per month(!!)
Why would I want to start using something with such restrictive limits?
I would guess that the more projects I have going at the same time, the more I would end
up paying Domino, so capping the number of projects seems to work against them. Bizarre...
So unfortunately, until someone comes up with a good alternative, I am back to putting my
laptop's CPU and noisy fans to work.