IPython Notebook Tips

Brian Naughton | Thu 16 July 2015 | data | python ipython notebook data

IPython notebook is awesome, but it's still pretty immature in many ways. These are a few functions that I've found help make it a more pleasant experience.

Use retina displays

It's great to be able to make use of a retina display, if you have one. The difference in figure quality is very noticeable. To use retina, set this option at the top of your notebook (you probably want inline plots too).

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

More prominent section breaks

IPython notebooks can look messy and disorienting. I like to include large section breaks using IPython.display.HTML.

from IPython.display import HTML

def new_section(title):
    style = "text-align:center;background:#66aa33;padding:120px;color:#ffffff;font-size:3em;"
    return HTML('<div style="{}">{}</div>'.format(style, title))

new_section("New Section")

New Section

Nicer image display

IPython.display.Image is very useful, but sometimes it's nice to use more of the horizontal space available. I wrote an Images function, modeled on Image, that lays several images out side by side in a table.

from IPython.display import display, HTML, Image

def Images(images, header=None, width="100%"): # to match Image syntax
    if isinstance(width, int): width = "{}px".format(width)  # bare integers are treated as pixels
    html = ["<table style='width:{}'><tr>".format(width)]
    if header is not None:
        html += ["<th>{}</th>".format(h) for h in header] + ["</tr><tr>"]

    for image in images:
        html.append("<td><img src='{}' /></td>".format(image))
    html.append("</tr></table>")
    display(HTML(''.join(html)))

Images(["images/hanahaus.jpg","images/drug_approval.png"],
       header=["Hanahaus", "Drug approvals"], width="60%")

Hanahaus	Drug approvals
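
If you want to check the generated markup without rendering it, the table-building step can be split out into a function that returns a string. The following is only a sketch, assuming Python 3; images_table_html is a hypothetical name that is not part of the code above, and it additionally escapes the header text and image paths.

```python
import html

def images_table_html(images, header=None, width="100%"):
    """Build the same kind of table markup as Images, returned as a string."""
    if isinstance(width, int):
        width = "{}px".format(width)  # bare integers are treated as pixels
    parts = ["<table style='width:{}'>".format(width)]
    if header is not None:
        cells = "".join("<th>{}</th>".format(html.escape(h)) for h in header)
        parts.append("<tr>{}</tr>".format(cells))
    # html.escape also escapes quotes, so a path cannot break out of src='...'
    cells = "".join("<td><img src='{}' /></td>".format(html.escape(src)) for src in images)
    parts.append("<tr>{}</tr></table>".format(cells))
    return "".join(parts)

markup = images_table_html(["a.png", "b.png"], header=["A & B", "C"], width=400)
```

The returned string can then be passed to display(HTML(markup)) inside a notebook.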

Running command-line tools

What do you do if you want to run a command-line tool from within IPython notebook? It can get very messy, since you may need to see stdout, but it may include a lot of unformatted text, which can make your notebook hard to read.

To address this, I created my own wrapper around subprocess. I first have to set some global CSS for the notebook. There is no really good way to do this yet, so I just display an HTML style block at the top of the notebook. I then create a do_call function that runs the command-line tool (subprocess) and captures the output. The output is hidden but can be expanded using a CSS trick. It also keeps track of how long each subprocess took to run, and it passes along your environment variables from os.environ, which is often important.

I also include a @contextmanager-based function that allows me to run command-line tools from within specific directories. This comes up a lot.

This system was very useful for me when I was making a "pipeline" notebook, chaining together different command-line tools.
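
The CSS trick mentioned above is a hidden checkbox whose label toggles the div that immediately follows it. A minimal sketch of just that piece, assuming Python 3, might look like this. The name collapsible is hypothetical and not part of the original code; unlike do_call, this version also HTML-escapes the text so that output containing < or & does not break the markup.

```python
import html
import uuid

def collapsible(label, text):
    """Return HTML for a block that stays hidden until its label is clicked.

    Relies on the .showhide and .showhide_label CSS rules from the style
    block shown in this section.
    """
    # A unique id ties each label to its own checkbox
    box_id = "showhide_{}".format(uuid.uuid4().hex)
    return (
        '<label class="showhide_label" for="{id}">▸{label}</label>'
        '<input type="checkbox" id="{id}" class="showhide"/>'
        '<div class="shown_or_hidden"><pre>{body}</pre></div>'
    ).format(id=box_id, label=html.escape(label), body=html.escape(text))

snippet = collapsible("stdout (1)", "a < b\n")
```

The div has to come directly after the input, because the stylesheet uses the adjacent-sibling selector (.showhide + div) to show or hide it.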

from IPython.display import HTML

HTML("""
<style>
    .showhide_label { display:block; cursor:pointer; }
    .showhide { position: absolute; left: -999em; }
    .showhide + div { display: none; }
    .showhide:checked + div { display: block; }
    .shown_or_hidden { font-size:85%; }
</style>
""")

import time, os, random
from IPython.display import display, HTML
from subprocess import Popen, PIPE
from contextlib import contextmanager

def do_call(cmd, stdin=None, stdout=None, stderr=None, env=None, base_dir=None):
    """Help call subprocess with some niceties. Output to html."""

    assert isinstance(cmd, list)
    MAX_OUT = 30000 # characters to output to >stdout and >stderr

    def with_div(text, style="none", label=None):
        random_id = ''.join(random.choice("01234567890ABCDEF") for _ in range(16))
        div = {
            "none":"<div>{}</div>".format(text),
            "time":"<div style='color:#953'>{}</div>".format(text),
            "title":"<div style='font-size:125%;padding:5px;color:#6a3;'>{}</div>".format(text),
            "main":"<div style='border:2px solid #6a3;padding:5px;color:#555;'>{}</div>".format(text),
            "mono":"<div style='font-family:monospace;padding:5px;'><pre>{}</pre></div>".format(text),
            "hide":"""<label class="showhide_label" for="showhide_{}">▸{}</label>
                      <input type="checkbox" id="showhide_{}" class="showhide"/>
                      <div class="shown_or_hidden"><pre>{}</pre></div>""".format(random_id, label, random_id, text),
        }

        return div[style]

    # Keep track of the amount of time spent in the process
    start_time = time.time()

    # Treat Nones as empty
    cmd = [c for c in cmd if c is not None] # ignore Nones, which otherwise would be ""

    # Optionally, make the command more readable by pretending base_dir is an env
    # To actually make an env requires shell=True and seems worse since this is only cosmetic
    cmdstr = ' '.join(cmd).replace(base_dir, "$BASE") if base_dir is not None else ' '.join(cmd)

    # Use custom environment variables. os.environ variables are overwritten if duplicated
    process_env = dict(os.environ, **env) if env is not None else os.environ

    # Use Popen instead of subprocess.call to get stdout, stderr
    # universal_newlines=True returns text rather than bytes on Python 3
    p = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE, env=process_env, universal_newlines=True)
    p_out, p_err = p.communicate(stdin)
    p_rc = p.returncode
    p_out = p_out if p_out != "" else "[No stdout]"
    p_err = p_err if p_err != "" else "[No stderr]"

    if stdout is not None: stdout.write(p_out)
    if stderr is not None: stderr.write(p_err)

    p_out_fmt = p_out if len(p_out) <= MAX_OUT else "{}\n{}".format(p_out[:MAX_OUT], "[TRUNCATED]")
    p_err_fmt = p_err if len(p_err) <= MAX_OUT else "{}\n{}".format(p_err[:MAX_OUT], "[TRUNCATED]")

    # Output a nicely formatted HTML block
    html = [with_div("Running subprocess", "title")]
    html += [with_div(cmdstr, "mono")]
    html += [with_div(p_out_fmt, "hide", "stdout ({})".format(p_out_fmt.count("\n")))]
    html += [with_div(p_err_fmt, "hide", "stderr ({})".format(p_err_fmt.count("\n")))]
    html += [with_div("subprocess time : {:.2f}s".format(time.time() - start_time), "time")]
    display(HTML(with_div(''.join(html), "main")))

@contextmanager
def using_dir(path):
    old_dir = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old_dir)

# The same thing, two ways
with using_dir("/Users/briann/anaconda"):
    do_call(["ls", "."])

do_call(["ls", "/Users/briann/anaconda"])

Running subprocess

ls .

▸stdout (20)

Examples
Launcher.app
bin
conda-meta
docs
envs
imports
include
lib
mkspecs
node-webkit
phrasebooks
pkgs
plugins
python.app
q3porting.xml
share
ssl
tests
translations

▸stderr (0)

[No stderr]

subprocess time : 0.01s

Running subprocess

ls /Users/briann/anaconda

▸stdout (20)

Examples
Launcher.app
bin
conda-meta
docs
envs
imports
include
lib
mkspecs
node-webkit
phrasebooks
pkgs
plugins
python.app
q3porting.xml
share
ssl
tests
translations

▸stderr (0)

[No stderr]

subprocess time : 0.01s
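
For readers on Python 3.5 or later, the capture-and-time core of this approach can be sketched with subprocess.run. This is an illustration under assumptions rather than a replacement for do_call: run_and_time is a hypothetical name, there is no HTML output, and it uses the cwd argument of subprocess instead of a using_dir-style context manager.

```python
import os
import subprocess
import sys
import time

def run_and_time(cmd, env=None, cwd=None):
    """Run cmd (a list of strings); return (stdout, stderr, returncode, seconds)."""
    # Extra variables are layered on top of a copy of os.environ
    process_env = dict(os.environ, **(env or {}))
    start = time.time()
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str instead of bytes
        env=process_env,
        cwd=cwd,  # run in this directory without changing the notebook's own cwd
    )
    return result.stdout, result.stderr, result.returncode, time.time() - start

# The current Python interpreter stands in for a command-line tool here
out, err, rc, seconds = run_and_time(
    [sys.executable, "-c", "import os; print(os.environ['DEMO_VAR'])"],
    env={"DEMO_VAR": "hello"},
)
```

Passing cwd only affects the child process, so nothing needs to be restored afterwards. A context manager is still handy when several calls in a row should share a directory.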

Nicer printing

Sometimes the standard print function doesn't cut it. I have a few helper functions that make printing text a bit more flexible.

I also generally use "from __future__ import print_function" everywhere, since print should be a function anyway. I use "from __future__ import division" since we'll all be on Python 3 soon enough.

from __future__ import print_function, division

from IPython.display import display, HTML
from itertools import count
import random, re  # random is used in the demo below, re in jprint
import yaml

def uprint(text):
    print("{}\n".format(text) + "-"*len(text))

def hprint(text):
    display(HTML(text))

def tprint(rows, header=True):
    html = ["<table>"]
    html_row = "</td><td>".join(k for k in rows[0])
    html.append("<tr style='font-weight:{}'><td>{}</td></tr>".format('bold' if header is True else 'normal', html_row))
    for row in rows[1:]:
        html_row = "</td><td>".join(r for r in row)
        html.append("<tr style='font-family:monospace;'><td>{:}</td></tr>".format(html_row))
    html.append("</table>")
    display(HTML(''.join(html)))

def jprint(dict_or_json, do_print=True, numbered=False):
    text = yaml.safe_dump(dict_or_json, indent=2, default_flow_style=False)
    if numbered:
        cnt = count(1)
        text = re.sub(r"^-", lambda x: str(next(cnt)), text, flags=re.MULTILINE)
    if do_print:
        print(text)
    else:
        return text

uprint("Some text")
print("Normal text")

hprint("HTML table")
tprint([[random.choice("ACGT")*3 for _ in range(4)] for _ in range(4)])

uprint("Some JSON or a dict")
jprint({"a":{"b":"c"}, "d":"e", "f":{"g":{"i":"k"}}})

Some text
---------
Normal text

HTML table

CCC	GGG	TTT	AAA
TTT	TTT	CCC	AAA
AAA	AAA	CCC	AAA
GGG	TTT	CCC	TTT

Some JSON or a dict
-------------------
a:
  b: c
d: e
f:
  g:
    i: k
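
The numbered option of jprint swaps the leading "-" of each top-level YAML list item for a running counter. That step can be tried on its own with only the standard library. This is a sketch, and number_list_items is a hypothetical name not used in the code above.

```python
import re
from itertools import count

def number_list_items(text):
    """Replace the leading '-' of each unindented list line with 1, 2, 3, ..."""
    counter = count(1)
    # re.MULTILINE makes ^ match at the start of every line, not just the string
    return re.sub(r"^-", lambda match: str(next(counter)), text, flags=re.MULTILINE)

numbered = number_list_items("- apples\n- pears\n  - nested\n- plums\n")
```

Because the pattern is anchored to the start of a line, indented (nested) items keep their hyphen and only top-level items are numbered.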

Use SVG for plotting

I really like using SVG for custom plots, since it is extremely flexible. For example, you can combine images/photos and data easily. Doing this does require learning some SVG, but it's not that complicated. In many ways matplotlib and similar libraries are more complex, since you often can't get them to do exactly what you want (at least I can't).

from IPython.display import display, SVG

def show_svg(svgs, width=1000, height=1000):
    SVG_HEAD = '''<?xml version="1.0" standalone="no"?><!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">'''
    SVG_START = '''<svg width="{w:}px" height="{h:}px" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink= "http://www.w3.org/1999/xlink">'''
    SVG_END = '</svg>'
    return display(SVG(SVG_HEAD + SVG_START.format(w=width, h=height) + svgs + SVG_END))

w, h = 500, 390

def box(xy,wh,rgba=(50,50,50,1)):
    return '''<rect x="{}" y="{}" width="{}" height="{}" fill="rgba({:d},{:d},{:d},{:f})" stroke="rgb(0,0,0)" />
        '''.format(xy[0],xy[1], wh[0],wh[1], rgba[0],rgba[1],rgba[2],rgba[3])

svgs = ['<image xlink:href="static/biospace-news-biogen-idec-2.png" x="0" y="0" width="{:d}px" height="{:d}px"/>'.format(w,h)]
svgs += [box((100,140),(200,180), rgba=(0,255,255,.35))]
svgs += ['''<text x="{x:d}" y="{y:d}" text-anchor="middle" font-size="28" fill="rgba(0,255,255,.95)">
         IMPORTANT!</text>'''.format(x=200, y=120)]

show_svg(''.join(svgs), width=w, height=h)
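
To show the "data" half of combining images and data, here is a sketch that turns a list of numbers into SVG rectangles. It assumes Python 3 (true division) and non-negative values; bar_rects and its parameters are hypothetical names, not part of the code above.

```python
def bar_rects(values, width=300, height=100, gap=4, fill="rgb(102,170,51)"):
    """Return SVG <rect> markup for a simple bar chart of non-negative values."""
    if not values:
        return ""
    top = max(values) or 1  # avoid dividing by zero when every value is 0
    bar_w = (width - gap * (len(values) - 1)) / len(values)
    rects = []
    for i, value in enumerate(values):
        bar_h = height * value / top
        # SVG's y axis points down, so a bar of height bar_h starts at height - bar_h
        rects.append(
            '<rect x="{:.1f}" y="{:.1f}" width="{:.1f}" height="{:.1f}" fill="{}" />'.format(
                i * (bar_w + gap), height - bar_h, bar_w, bar_h, fill))
    return "".join(rects)

bars = bar_rects([3, 1, 2])
```

The resulting string can be passed to show_svg on its own, for example show_svg(bars, width=300, height=100), or joined with other elements such as an image and text.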