Tools for working on the cloud
These days most everything is on the cloud. Still, probably the most common mode of working is to develop locally on a laptop, then deploy to the cloud when necessary. Instead, I like to try to run everything remotely on a cloud instance.
Why?
- All your files are together in one place.
- You can back up your cloud instance very easily (especially on GCP), and even spawn clone machines as necessary with more CPU etc.
- You can work with large datasets. Laptops usually max out at 16GB RAM, but on a cloud instance you can get 50GB+. You can also expand the disk size on your cloud instance as necessary.
- You can do a lot of computational work without making your laptop fans explode.
- Your laptop becomes more like a dumb terminal. When your laptop dies, or is being repaired, it's easy to continue work without interruption.
The big caveats here are that this requires having an always-on cloud instance, which is relatively expensive, and you cannot work without an internet connection. As someone who spends a lot of time in jupyter notebooks munging large dataframes, I find the trade-offs are worth it.
Here are some tools I use that help make working on the cloud easier.
Mosh
Mosh is the most important tool here for working on the cloud. Instead of sshing into a machine once or more a day, now my laptop is continuously connected to a cloud instance, essentially until the instance needs to reboot (yearly?). Mosh also works better than regular ssh on weak connections, so it's handy for working on the train, etc. It makes working on a remote machine feel like working locally. If you use ssh a lot, try mosh instead.
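In practice the connection is a single command; a sketch, where the key path and `$EXTERNAL_IP` are placeholders (the full command I actually use is generated by the appendix script below):

```shell
# connect once; mosh keeps the session alive across sleep/wake and network changes
mosh --ssh="ssh -i ~/.ssh/google_compute_engine" myuserid@$EXTERNAL_IP -- tmux a
```

The trailing `-- tmux a` attaches straight to the remote tmux session, so reconnecting drops you back exactly where you left off.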
Tmux
Since I just have one mosh connection at a time, I need to have some tabs. Most people probably already use screen or tmux anyway. I have a basic tmux setup, with ten or so tabs, each with a two-letter name just to keep things a bit neater.
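Setting up that layout is only a few commands; a sketch, where the session and two-letter window names are just examples:

```shell
# one detached session with a few short-named windows
tmux new-session -d -s main -n nb   # jupyter notebook
tmux new-window -t main -n sv       # python -m http.server
tmux new-window -t main -n st       # syncthing
tmux attach -t main
```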
tmux-resurrect is the only tmux add-on I use. It works ok: if your server needs to restart, tmux-resurrect will at least remember the names of your tabs.
Autossh
Mosh works amazingly well for a primary ssh connection, but to run everything on the cloud I also need a few ssh tunnels. Mosh cannot do this, so instead I need to use autossh. Like mosh, autossh tries to keep a connection open indefinitely. It seems to be slightly less reliable and fiddlier to set up than mosh, but has been working great for me recently.
Here's the command I use, based on this article and others. It took a while to get working via trial and error, so there may well be better ways. The ssh part of this autossh command comes from running `gcloud compute ssh --dry-run`.

```shell
autossh -M 0 -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -f -t -i $HOME/.ssh/google_compute_engine -o CheckHostIP=no -o HostKeyAlias=compute.1234567890 -o IdentitiesOnly=yes -o StrictHostKeyChecking=yes -o UserKnownHostsFile=/Users/briann/.ssh/google_compute_known_hosts [email protected] -N -L 2288:localhost:8888 -L 2280:localhost:8880 -L 8385:localhost:8384 -L 2222:localhost:22 -L 8443:localhost:8443
```
The tunnels I set up:
- `2288->8888`: to access jupyter running on my cloud instance (I keep `8888` for local jupyter)
- `2280->8880`: to access a remote webserver (e.g., if I run `python -m http.server` on my cloud instance)
- `8385->8384`: syncthing (see below)
- `2222->22`: sshfs (see below)
- `8443->8443`: coder (see below)
So to access jupyter, I just run `jupyter notebook` in a tmux tab on my cloud box, and go to `https://localhost:2288`. To view a file, I run `python -m http.server` on my cloud box and go to `https://localhost:2280`.
Syncthing
Syncthing is a dropbox-like tool that syncs files across a group of computers. Unlike dropbox, the connections between machines are direct (i.e., there is no centralized server). It's pretty simple: you run syncthing on your laptop and on your cloud instance, they find each other and start syncing. Since `8384` is the default syncthing port, I can see syncthing's local and remote dashboards on `https://localhost:8384` and `https://localhost:8385` respectively.

In my experience, syncthing works pretty well, but I recently stopped using it because I've found it unnecessary to have files synced to my laptop.
Sshfs
sshfs is a tool that lets you mount a filesystem over ssh. Like syncthing, it's something I don't use much any more, since it's pretty slow and can fail on occasion. It is handy if you want to browse PDFs or similar files stored on your cloud instance though.
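Through the `2222->22` tunnel above, the mount looks local; a sketch, with `~/gcp` as an example mountpoint (the full option set I use is in the appendix script):

```shell
mkdir -p ~/gcp
# mount the cloud box's home directory over the autossh tunnel
sshfs -p 2222 -o reconnect,compression=yes myuserid@localhost:. ~/gcp
# when done (macOS):
umount -f ~/gcp
```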
Coder
I recently started using Coder, which is Visual Studio Code (my preferred editor), but running in a browser. Amazingly, it's almost impossible to tell the difference between "native" VS Code (an Electron app) and the browser version, especially if it's running in full-screen mode.
It's very fast to get started. You run this on your cloud instance:

```shell
docker run -t -p 127.0.0.1:8443:8443 -v "${PWD}:/root/project" codercom/code-server code-server --allow-http --no-auth
```

then go to `http://localhost:8443` and that's it!
Coder is new and has had some glitches and limitations for me. For example, I don't know how you are supposed to install extensions without also updating the Docker image, which is less than ideal, and the documentation is minimal. Still, the VS Code team seems to execute very quickly, so I am sticking with it for now. I think it will improve and stabilize soon.
Annoyances
One annoyance with having everything on the cloud is viewing files. X11 is the typical way to solve this problem, but I've never had much success with X11. Even at its best, it's ugly and slow. Most of my graphing, etc. happens in jupyter, so this is usually not a big issue.
However, for infrequent file viewing, this python code has come in handy. Calling e.g. `view('plot.png')` serves that one file so it's viewable at `http://localhost:2280` through the tunnel above.

```python
def view(filename):
    from pathlib import Path
    from flask import Flask, send_file

    FLASK_PORT_LOCAL = 8880  # remote end of the 2280->8880 tunnel above

    app = Flask(__name__)

    def get_view_func(_filename):
        def fn():
            return send_file(str(Path(_filename).resolve()))
        return fn

    print('python -m webbrowser -t "http://localhost:2280"')
    app.add_url_rule(rule='/', view_func=get_view_func(filename))
    app.run("127.0.0.1", FLASK_PORT_LOCAL, debug=False)
```
Appendix: GCP activation script
This is the bash script I use to set up the above tools from my mac for my GCP instance. People using GCP might find something useful in here.
```shell
_gcp_activate () {
# example full command: _gcp_activate myuserid [email protected] my-instance my-gcp-project us-central1-c $HOME/gcp/
clear
printf "#\n# [[ gcp_activate script ]] \n#\n"
printf "# mosh: on mac, 'brew install mosh'\n"
printf "# autossh: on mac, 'brew install autossh'\n"
printf "# sshfs: on mac, download osxfuse and sshfs from https://osxfuse.github.io/\n"
printf "# https://www.everythingcli.org/ssh-tunnelling-for-fun-and-profit-autossh/\n"
printf "# sshfs may need a new entry in $HOME/.ssh/known_hosts if logging in to a new host\n"
printf "# The error is \"remote host has disconnected\"\n"
printf "# to achieve that, delete the localhost:2222 entry from $HOME/.ssh/known_hosts\n"
printf "#\n"
[ $# -eq 0 ] && printf "No arguments supplied\n" && return 1
user=$1
account=$2
instance=$3
gcpproject=$4
zone=$5
mountpoint=$6
printf "#\n# 1. set gcp project, if it's not set \n#\n"
# automatically set project
echo gcloud config set account ${account};
echo gcloud config set project ${gcpproject};
# unmount sshfs
printf "#\n# 2. unmount sshfs (this command fails if it's already unmounted, which is ok)\n#\n"
echo umount -f ${mountpoint};
# commands
ssh_cmd=$(gcloud compute ssh ${user}@${instance} --zone=${zone} --dry-run) && \
external_ip=$(printf "${ssh_cmd}" | sed -E 's/.+@([0-9\.]+)/\1/') && \
autossh_cmd=$(printf "${ssh_cmd}" | sed s/'\/usr\/bin\/ssh'/'autossh -M 0 -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -f'/) && \
fullssh_cmd=$(printf "${autossh_cmd} -N -L 2288:localhost:8888 -L 2222:localhost:22 -L 2280:localhost:8000 -L 8385:localhost:8384 -L 8443:localhost:8443") && \
printf "#\n# 3. run autossh to set up ssh tunnels for jupyter (2288), web (2280), and sshfs (2222)\n#\n" && \
echo "${fullssh_cmd}" && \
printf "#\n# 4. run sshfs to mount to ${mountpoint}\n#\n" && \
echo sshfs -p 2222 -o reconnect,compression=yes,transform_symlinks,defer_permissions,IdentityFile=$HOME/.ssh/google_compute_engine,ServerAliveInterval=30,ServerAliveCountMax=0 -f \
${user}@localhost:. ${mountpoint} && \
printf "#\n# 5. run mosh\n#\n" && \
echo mosh -p 60000 --ssh=\"ssh -i $HOME/.ssh/google_compute_engine\" ${user}@${external_ip} -- tmux a
printf "#\n# 6. if mosh fails run this\n#\n"
echo gcloud compute ssh ${user}@${instance} -- killall mosh-server
}
gcp_activate () {
_gcp_activate myuserid [email protected] my-instance my-project us-central1-c $HOME/gcp/
}
```