Engineering Proteins in the Cloud with Python and Transcriptic, or, How to Make Any Protein You Want for $360

Brian Naughton // Mon 21 March 2016 // Filed under biotech // Tags virtual biotech cloud labs data transcriptic robots

What if you had an idea for a cool, useful protein, and you wanted to turn it into a reality? For example, what if you wanted to create a vaccine against H. pylori (like the 2008 Slovenian iGEM team) by creating a hybrid protein that combines the parts of E. coli flagellin that stimulate an immune response with regular H. pylori flagellin?

A design for a hybrid flagellin H. pylori vaccine, from the 2008 Slovenian iGEM team

Amazingly, we're pretty close to being able to create any protein we want from the comfort of our Jupyter notebooks, thanks to developments in genomics, synthetic biology, and most recently, cloud labs.

In this article I'll develop Python code that will take me from an idea for a protein all the way to expression of the protein in a bacterial cell, all without touching a pipette or talking to a human. The total cost will only be a few hundred dollars! In the terminology of A16Z's Vijay Pande, this is Bio 2.0.

To make this a bit more concrete, this article describes a cloud lab protocol in Python to do the following:

  • Synthesize a DNA sequence that encodes any protein I want.
  • Clone that synthetic DNA into a vector that can express it.
  • Transform bacteria with that vector and confirm that it is expressed.

Python Setup

First, some general Python setup that I need for any Jupyter notebook. I import some useful Python modules and make some utility functions, mainly for plotting and data presentation.

import re
import json
import logging
import requests
import itertools
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

from io import StringIO
from pprint import pprint
from Bio.Seq import Seq
from Bio.Alphabet import generic_dna
from IPython.display import display, Image, HTML, SVG

def uprint(astr): print(astr + "\n" + "-"*len(astr))
def show_html(astr): return display(HTML('{}'.format(astr)))
def show_svg(astr, w=1000, h=1000):
    SVG_HEAD = '''<?xml version="1.0" standalone="no"?><!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "">'''
    SVG_START = '''<svg viewBox="0 0 {w:} {h:}" version="1.1" xmlns="" xmlns:xlink= "">'''
    return display(SVG(SVG_HEAD + SVG_START.format(w=w, h=h) + astr + '</svg>'))

def table_print(rows, header=True):
    html = ["<table>"]
    html_row = "</td><td>".join(k for k in rows[0])
    html.append("<tr style='font-weight:{}'><td>{}</td></tr>".format('bold' if header is True else 'normal', html_row))
    for row in rows[1:]:
        html_row = "</td><td>".join(row)
        html.append("<tr style='font-family:monospace;'><td>{:}</td></tr>".format(html_row))
    html.append("</table>")
    show_html("".join(html))

def clean_seq(dna):
    dna = re.sub(r"\s","",dna)
    assert all(nt in "ACGTN" for nt in dna)
    return Seq(dna, generic_dna)

def clean_aas(aas):
    aas = re.sub(r"\s","",aas)
    assert all(aa in "ACDEFGHIKLMNPQRSTVWY*" for aa in aas)
    return aas

def Images(images, header=None, width="100%"): # to match Image syntax
    if type(width)==type(1): width = "{}px".format(width)
    html = ["<table style='width:{}'><tr>".format(width)]
    if header is not None:
        html += ["<th>{}</th>".format(h) for h in header] + ["</tr><tr>"]

    for image in images:
        html.append("<td><img src='{}' /></td>".format(image))
    html.append("</tr></table>")
    show_html("".join(html))

def new_section(title, color="#66aa33", padding="120px"):
    style = "text-align:center;background:{};padding:{} 10px {} 10px;".format(color,padding,padding)
    style += "color:#ffffff;font-size:2.55em;line-height:1.2em;"
    return HTML('<div style="{}">{}</div>'.format(style, title))

# Show or hide text (CSS for the notebook)
show_html("""<style>
    .section { display:flex;align-items:center;justify-content:center;width:100%; height:400px; background:#6a3;color:#eee;font-size:275%; }
    .showhide_label { display:block; cursor:pointer; }
    .showhide { position: absolute; left: -999em; }
    .showhide + div { display: none; }
    .showhide:checked + div { display: block; }
    .shown_or_hidden { font-size:85%; }
</style>""")

# Plotting style
plt.rc("axes", titlesize=20, labelsize=15, linewidth=.25, edgecolor='#444444')
sns.set_context("notebook", font_scale=1.2, rc={})
%matplotlib inline
%config InlineBackend.figure_format = 'retina' # or 'svg'

Cloud Labs

Just like AWS or any compute cloud, a cloud lab owns molecular biology equipment and robots, and will rent them out to you in small increments. You can issue instructions to their robots by clicking some buttons on their front-end, or write code to instruct the robots yourself. It's not necessarily better to write your own protocols as I'll do here — in general a lot of molecular biology is the same few routine tasks, so you generally want to rely on a robust protocol that someone else has shown performs well with robots.

There are a number of nascent cloud lab companies out there: Transcriptic, Autodesk Wet Lab Accelerator (beta, and built on top of Transcriptic), Arcturus BioCloud (beta), Emerald Cloud Lab (beta), Synthego (not yet live). There are even companies built on top of cloud labs, like Desktop Genetics, which specializes in CRISPR. Scientific papers, like this one from the Siegel lab, are just starting to appear that use cloud labs to do real science.

At the time of writing, only Transcriptic is available for general use so that's what I'll be using. As I understand it, most of Transcriptic's business comes from automating common protocols, and writing your own protocols in Python (as I'll be doing in this article) is less common.

A "work cell" at Transcriptic, with freezers visible at the bottom and various bits of lab equipment on the bench

I issue instructions to Transcriptic's robots using autoprotocol. Autoprotocol is a JSON-based language for writing protocols for lab robots (and humans, sort of). Autoprotocol is mainly written using this Python library. The language was originally developed by, and is still supported by Transcriptic, but as I understand it it's completely open. There is some good and improving documentation.

One fascinating thing to think about here is that you could also generate an autoprotocol protocol and submit it to a lab staffed by humans (say in China or India) and potentially get some of the advantages of using humans (their judgement) and robots (their lack of judgement). There is also a separate effort to standardize protocols for increased reproducibility that is aimed at humans instead of robots.

"instructions": [
    {
      "to": [
        {
          "well": "water/0",
          "volume": "500.0:microliter"
        }
      ],
      "op": "provision",
      "resource_id": "rs17gmh5wafm5p"
    }
]

Example snippet of autoprotocol
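
To make the structure of that snippet concrete, here is a minimal sketch that builds the same provision instruction as a plain Python dictionary and serializes it with the standard json module. It deliberately does not use the autoprotocol library; the helper name `provision_instruction` is hypothetical, and a real protocol also needs a refs section, which is left out here.

```python
import json

def provision_instruction(resource_id, well, microliters):
    """Build one autoprotocol-style 'provision' instruction as a plain dict.

    Hypothetical helper for illustration only; in practice the autoprotocol
    Python library generates these structures.
    """
    return {
        "op": "provision",
        "resource_id": resource_id,
        "to": [{"well": well,
                "volume": "{:.1f}:microliter".format(microliters)}],
    }

# Same values as the snippet: 500 µl of water into well 0 of the "water" ref
instruction = provision_instruction("rs17gmh5wafm5p", "water/0", 500)
protocol_fragment = {"instructions": [instruction]}
print(json.dumps(protocol_fragment, indent=2))
```

Volumes are written as strings of the form "value:unit", which is why the helper formats the number rather than storing it directly.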

Molecular Biology Python Setup

As well as standard Python imports, I'll need some molecular-biology–specific utilities. This code is primarily autoprotocol- and Transcriptic-centric.

One thing that comes up a lot in this code is the concept of dead volume. This means the last bit of liquid that Transcriptic's robots cannot consistently pipette out of tubes (because they can't see!). I have to spend quite a bit of time ensuring that there is enough volume left in my tubes.
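
As a rough illustration of that bookkeeping, the sketch below works out how much liquid can actually be drawn from a container once its dead volume is set aside. Plain floats (in microliters) stand in for autoprotocol Unit objects, the function names `usable_volume` and `can_transfer` are hypothetical, and the dead volumes are a subset of the ones in the setup code that follows.

```python
# Dead volumes in microliters, a subset of the table in the setup code
DEAD_VOLUME_UL = {"96-pcr": 3, "96-flat": 25, "micro-1.5": 15, "micro-2.0": 15}

def usable_volume(container_type, current_ul):
    """Return the volume (µl) that can be pipetted out, never less than zero."""
    return max(0.0, float(current_ul) - DEAD_VOLUME_UL[container_type])

def can_transfer(container_type, current_ul, requested_ul):
    """True if the request can be met without dipping into the dead volume."""
    return requested_ul <= usable_volume(container_type, current_ul)

print(usable_volume("micro-1.5", 75))    # 60.0
print(can_transfer("96-pcr", 25, 22))    # True
print(can_transfer("96-pcr", 25, 23))    # False
```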

import autoprotocol
from autoprotocol import Unit
from autoprotocol.container import Container
from autoprotocol.protocol import Protocol
from autoprotocol.protocol import Ref # "Link a ref name (string) to a Container instance."
import requests
import logging

# Transcriptic authorization
org_name = 'hgbrian'
tsc_headers = {k:v for k,v in json.load(open("auth.json")).items() if k in ["X_User_Email","X_User_Token"]}

# Transcriptic-specific dead volumes
_dead_volume = [("96-pcr",3), ("96-flat",25), ("96-flat-uv",25), ("96-deep",15),
                ("384-pcr",2), ("384-flat",5), ("384-echo",15),
                ("micro-1.5",15), ("micro-2.0",15)]
dead_volume = {k:Unit(v,"microliter") for k,v in _dead_volume}

def init_inventory_well(well, headers=tsc_headers, org_name=org_name):
    """Initialize well (set volume etc) for Transcriptic"""
    def _container_url(container_id):
        return '{}/samples/{}.json'.format(org_name, container_id)

    response = requests.get(_container_url(well.container.id), headers=headers)

    container = response.json()
    well_data = container['aliquots'][well.index]
    well.name = "{}/{}".format(container["label"], well_data['name']) if well_data['name'] is not None else container["label"]
    well.properties = well_data['properties']
    well.volume = Unit(well_data['volume_ul'], 'microliter')

    if 'ERROR' in well.properties:
        raise ValueError("Well {} has ERROR property: {}".format(well, well.properties["ERROR"]))
    if well.volume < Unit(20, "microliter"):
        logging.warning("Low volume for well {} : {}".format(well.name, well.volume))

    return True

def touchdown(fromC, toC, durations, stepsize=2, meltC=98, extC=72):
    """Touchdown PCR protocol generator"""
    assert 0 < stepsize < toC < fromC
    def td(temp, dur): return {"temperature":"{:2g}:celsius".format(temp), "duration":"{:d}:second".format(dur)}

    return [{"cycles": 1, "steps": [td(meltC, durations[0]), td(C, durations[1]), td(extC, durations[2])]}
            for C in np.arange(fromC, toC-stepsize, -stepsize)]

def convert_ug_to_pmol(ug_dsDNA, num_nts):
    """Convert ug dsDNA to pmol"""
    return float(ug_dsDNA)/num_nts * (1e6 / 660.0)

def expid(val):
    """Generate a unique ID per experiment"""
    return "{}_{}".format(experiment_name, val)

def µl(microliters):
    """Unicode function name for creating microliter volumes"""
    return Unit(microliters,"microliter")
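
To show what the `touchdown` helper produces, here is a dependency-free sketch of the same idea: one single-cycle group per annealing temperature, stepping down from the starting temperature to the final one. It uses a plain loop instead of `np.arange`; the name `touchdown_schedule` and the whole-step counting are my own choices for illustration, so its stopping point can differ from the original when the temperature range is not a multiple of the step size.

```python
def touchdown_schedule(from_c, to_c, durations, stepsize=2, melt_c=98, ext_c=72):
    """Return a list of one-cycle groups with a decreasing annealing temperature.

    durations is (melt, anneal, extend) in seconds. Counting whole steps
    avoids floating-point drift for step sizes like 0.5.
    """
    assert 0 < stepsize < to_c < from_c

    def step(temp, dur):
        return {"temperature": "{:g}:celsius".format(temp),
                "duration": "{:d}:second".format(dur)}

    n_steps = int(round((from_c - to_c) / stepsize))
    groups = []
    for i in range(n_steps + 1):          # +1 so the final temperature is included
        anneal_c = from_c - i * stepsize
        groups.append({"cycles": 1,
                       "steps": [step(melt_c, durations[0]),
                                 step(anneal_c, durations[1]),
                                 step(ext_c, durations[2])]})
    return groups

schedule = touchdown_schedule(70, 61, [8, 25, 9], stepsize=0.5)
print(len(schedule))                               # 19
print(schedule[0]["steps"][1]["temperature"])      # 70:celsius
print(schedule[-1]["steps"][1]["temperature"])     # 61:celsius
```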

DNA synthesis & synthetic biology

Despite its connection to modern synthetic biology, DNA synthesis is a fairly old technology. We've been able to make "oligos" (meaning DNA sequences of <~200bp) for decades. It's always been expensive though, and the chemistry has never allowed for long sequences of DNA. Recently, it's become feasible to synthesize entire genes (up to thousands of bases) at a reasonable price. This advance is really what is enabling the era of "synthetic biology".

Craig Venter's Synthetic Genomics has taken synthetic biology the furthest by synthesizing an entire organism — over a million bases in length. As the length of the DNA grows, the problem becomes more about assembly (i.e., stitching together synthesized DNA sequences) rather than synthesis. Each time you assemble you can double the length of your DNA (or more), so after a dozen or so iterations you can get pretty long! The distinction between synthesis and assembly should become transparent to the end-user fairly soon.

Moore's Law?

The price of DNA synthesis has been falling pretty quickly, from over 30c per base a couple of years ago to around 10c per base today, but it's developing more like batteries than CPUs. In contrast, DNA sequencing costs have been falling faster than Moore's law. A target of 2c/base has been mooted as an inflection point where you can replace a lot of money-saving-but-laborious DNA manipulation with simple synthesis. For example, at 2c/base you could synthesize an entire 3kb plasmid for $60, and skip over a ton of molecular biology chores. Hopefully that's what we'll all be doing in a couple of years.
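
The arithmetic behind these price points is simple enough to sketch. The function below assumes a flat per-base price with no minimum order, setup fee or length surcharge, which is a simplification; the name `synthesis_cost_usd` is hypothetical.

```python
def synthesis_cost_usd(num_bases, cents_per_base):
    """Cost in dollars of synthesizing num_bases at a flat per-base price."""
    return num_bases * cents_per_base / 100.0

# A 3 kb plasmid at the mooted 2c/base target, and at roughly today's 10c/base
print(synthesis_cost_usd(3000, 2))    # 60.0
print(synthesis_cost_usd(3000, 10))   # 300.0
```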

DNA synthesis vs DNA sequencing costs (Carlson, 2014)

DNA Synthesis Companies

There are a few big companies in the DNA synthesis space: IDT is the largest manufacturer of oligos, and can produce longer (up to 2kb) "gene fragments" (gBlocks) too; Gen9, Twist, DNA 2.0 generally focus on longer DNA sequences — these are the gene synthesis companies. There are also some exciting new companies like Cambrian Genomics and Genesis DNA that are working on next-generation synthesis methods.

Other companies, like Amyris, Zymergen and Ginkgo Bioworks, use the DNA synthesized by these companies to do organism-scale work. Synthetic Genomics also does that but synthesizes its own DNA.

Recently, Ginkgo did a deal with Twist for 100 million bases, a leap over anything public I've seen before. In a move that proves we're living in the future, Twist has even advertised a promo code on Twitter for a deal where if you buy 10 million bases of DNA (almost an entire yeast genome!), you get 10 million bases free.

A niche Twist deal on Twitter

Part One: Designing the Experiment
Green Fluorescent Protein

For this experiment I want to synthesize the DNA sequence for a simple protein, Green Fluorescent Protein (GFP). GFP is a protein, first found in a jellyfish, that fluoresces under UV light. It's an extremely useful protein since it's easy to tell where it is being expressed simply by measuring fluorescence. There are variants of GFP that produce yellow, red, orange and other colors.

It's interesting to see how various mutations affect the color of the protein, and potentially an interesting machine learning problem. Not long ago, this would have been a significant investment of time in the lab, but now, as I'll show, it is (almost) as simple as editing a text file!

Technically, my GFP is a "superfolder" variant (sfGFP), with some enhancing mutations.

Superfolder GFP (sfGFP) has mutations that give it some useful properties

The Structure of GFP (visualized with PV)

Synthesizing GFP with Twist

I was fortunate to be included in Twist's alpha program so I used them to synthesize my DNA (they graciously accommodated my tiny order; thanks, Twist!). Twist is a new company in the space, with a novel miniaturized process for synthesis. Although Twist's pricing is probably the best around at 10c a base or lower, they are still in beta only, and the alpha program I took part in has closed. Twist has raised about $150M so there is a lot of enthusiasm for their technology.

I sent my DNA sequence to Twist as an Excel spreadsheet (there is no API yet but I assume it'll come soon), and they sent the synthesized DNA directly to my inventory in Transcriptic's labs. (I also used IDT for synthesis, but since they did not ship the DNA straight to Transcriptic, it kind of ruins the fun.)

This process is clearly not a common use-case yet and required some hand-holding, but it worked, and it keeps the entire pipeline virtual. Without this, I would have likely needed access to a laboratory — many companies will not ship DNA or reagents to a home address.

GFP is harmless, and can make any species glow

The Plasmid Vector

To express my protein in bacteria, I need somewhere for the gene to live, otherwise the synthetic DNA encoding the gene will just be instantly degraded. Generally, in molecular biology we use a plasmid, a bit of circular DNA that lives outside the bacterial genome and expresses proteins. Plasmids are a convenient way for bacteria to share useful self-contained functional modules like antibiotic resistance. There can be hundreds of plasmids per cell.

The commonly used terminology is that the plasmid is the vector and the synthetic DNA is the insert. So here I'm trying to clone the insert into the vector, then transform bacteria with the vector.

A bacterial genome and plasmid (not to scale!) (Wikipedia)


I chose a fairly standard plasmid, pUC19. This plasmid is very commonly used and since it's available as part of the standard Transcriptic inventory, I do not need to ship anything to Transcriptic.

The structure of pUC19: the major components are an ampicillin resistance gene, lacZα, an MCS/polylinker, and an origin of replication (Wikipedia)

pUC19 has a nice feature where because it contains a lacZα gene, you can use it in a blue–white screen and see which colonies have had successful insertion events. You need two chemicals, IPTG and X-gal, and it works as follows:

  • lacZα expression is induced by IPTG
  • If lacZα is inactivated (by DNA inserted at the multiple cloning site (MCS/polylinker) within lacZα), then the cells carrying the plasmid cannot hydrolyze X-gal, and these colonies will be white instead of blue
  • Therefore a successful insertion produces white colonies and an unsuccessful insertion produces blue colonies

A blue–white screen shows where lacZα expression has been inactivated (Wikipedia)

A document on openwetware states:

DH5α E. coli does not require IPTG to induce expression from the lac promoter even though the strain expresses the Lac repressor. The copy number of most plasmids exceeds the repressor number in the cells. If you are concerned about obtaining maximal levels of expression, add IPTG to a final concentration of 1 mM.

Synthetic DNA Sequences

sfGFP DNA Sequence

It's straightforward to get a DNA sequence for sfGFP by taking the protein sequence and encoding it with codons suitable for the host organism (here, E. coli). It's a medium-sized protein at 236 amino acids, which means at 10c/base it costs me about $70 to synthesize the DNA.

Wolfram Alpha, calculating the cost of synthesis

The first 12 bases of my sfGFP are a Shine-Dalgarno sequence that I added myself, which should in theory increase expression (AGGAGGACAGCT, then an ATG (start codon) starts the protein). According to a computational tool developed by the Salis Lab (lecture slides), I should expect medium to high expression of my protein (a Translation Initiation Rate of 10,000 "arbitrary units").

sfGFP_plus_SD = clean_seq("""
print("Read in sfGFP plus Shine-Dalgarno: {} bases long".format(len(sfGFP_plus_SD)))

assert sfGFP_plus_SD[12:].translate() == sfGFP_aas
print("Translation matches protein with accession 532528641")
Read in sfGFP plus Shine-Dalgarno: 726 bases long
Translation matches protein with accession 532528641
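
The encoding step described in this section can be sketched in a few lines. The codon table here picks one valid codon per amino acid and is only an illustrative assumption: it is not the table used for the sequence in this article and it is not a codon-optimization method. The peptide is a short example, and only the Shine-Dalgarno prefix `AGGAGGACAGCT` is taken from the text.

```python
SHINE_DALGARNO = "AGGAGGACAGCT"   # 12-base prefix described in the text

# One codon per amino acid (illustrative choice, standard genetic code)
CODONS = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
    "*": "TAA",
}
DECODE = {codon: aa for aa, codon in CODONS.items()}

def back_translate(protein):
    """Encode a protein string (with '*' for stop) as DNA, one codon per residue."""
    return "".join(CODONS[aa] for aa in protein)

def translate(dna):
    """Decode DNA produced by back_translate (only knows the codons above)."""
    assert len(dna) % 3 == 0, "length must be a multiple of three"
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna), 3))

peptide = "MSKGEELF*"                       # short example peptide ending in a stop
orf = back_translate(peptide)
construct = SHINE_DALGARNO + orf

# Same style of check as in the article: skip the 12-base prefix, then translate
assert translate(construct[len(SHINE_DALGARNO):]) == peptide
print(len(construct), construct)
```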

pUC19 DNA Sequence

I first check that the pUC19 sequence I downloaded from NEB is the right length, and that it includes the multiple cloning site I expect.

pUC19_fasta = !cat puc19fsa.txt
pUC19_fwd = clean_seq(''.join(pUC19_fasta[1:]))
pUC19_rev = pUC19_fwd.reverse_complement()
assert all(nt in "ACGT" for nt in pUC19_fwd)
assert len(pUC19_fwd) == 2686

print("Read in pUC19: {} bases long".format(len(pUC19_fwd)))
assert pUC19_MCS in pUC19_fwd
print("Found MCS/polylinker")
Read in pUC19: 2686 bases long
Found MCS/polylinker

I do some basic QC to make sure that EcoRI and BamHI are only present in pUC19 once. (The following restriction enzymes are available in Transcriptic's default inventory: PstI, PvuII, EcoRI, BamHI, BbsI, BsmBI.)

REs = {"EcoRI":"GAATTC", "BamHI":"GGATCC"}
for rename, res in REs.items():
    assert (pUC19_fwd.find(res) == pUC19_fwd.rfind(res) and
            pUC19_rev.find(res) == pUC19_rev.rfind(res))
    assert (pUC19_fwd.find(res) == -1 or pUC19_rev.find(res) == -1 or
            pUC19_fwd.find(res) == len(pUC19_fwd) - pUC19_rev.find(res) - len(res))
print("Asserted restriction enzyme sites present only once: {}".format(list(REs.keys())))
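
The same kind of check can be written as a small counting function, which is sometimes easier to reuse than a chain of assertions. This is a minimal sketch on a made-up toy sequence, not on pUC19; `count_sites` is a hypothetical helper, and it treats the DNA as linear, so a site spanning the origin of a circular plasmid would be missed.

```python
def reverse_complement(dna):
    """Reverse complement of an upper- or lower-case DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[nt] for nt in reversed(dna.upper()))

def count_sites(dna, site):
    """Count a recognition site on both strands of a linear sequence."""
    dna, site = dna.upper(), site.upper()

    def count(seq, motif):
        # Overlapping matches are counted, unlike str.count
        return sum(1 for i in range(len(seq) - len(motif) + 1)
                   if seq[i:i + len(motif)] == motif)

    forward = count(dna, site)
    if site == reverse_complement(site):
        return forward                      # palindromic site: strands agree
    return forward + count(dna, reverse_complement(site))

toy = "TTGAATTCAAGGATCCTTGAATTCAA"          # made-up sequence, not pUC19
print(count_sites(toy, "GAATTC"))           # 2
print(count_sites(toy, "GGATCC"))           # 1
```

A site that is safe to cut with should return exactly 1 for the vector and 0 for the insert.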

Now I look at the lacZα sequence and make sure there is nothing unexpected. For example, it should start with a Met and end with a stop codon. It's also easy to confirm that this is the full 324bp lacZα ORF by loading the pUC19 sequence into the free SnapGene Viewer tool.

lacZ = pUC19_rev[2217:2541]
print("lacZα sequence:\t{}".format(lacZ))
print("r_MCS sequence:\t{}".format(pUC19_MCS.reverse_complement()))

lacZ_p = lacZ.translate()
assert lacZ_p[0] == "M" and not "*" in lacZ_p[:-1] and lacZ_p[-1] == "*"
assert pUC19_MCS.reverse_complement() in lacZ
assert pUC19_MCS.reverse_complement() == pUC19_rev[2234:2291]
print("Found MCS once in lacZ sequence")
Found MCS once in lacZ sequence

Gibson Assembly

DNA assembly simply means stitching DNA together. Usually, you assemble several pieces of DNA into a longer segment, and then clone that into a plasmid or genome. In this experiment I just want to clone one DNA segment into the pUC19 plasmid downstream of the lac promoter, so that it will be expressed in E. coli.

There are many different ways to do cloning (e.g., see NEB, openwetware, addgene). Here I will use Gibson assembly (developed by Daniel Gibson at Synthetic Genomics in 2009), which is not necessarily the cheapest method, but is straightforward and flexible. All you have to do is put all the DNA you want to assemble (with the appropriate overlaps) in a tube with the Gibson Assembly Master Mix, and it assembles itself!

Gibson assembly overview (NEB)

Starting material

I am starting with 100ng of synthetic DNA in 10µl of liquid. That equates to 0.21 picomoles of DNA or a concentration of 10ng/µl.

pmol_sfgfp = convert_ug_to_pmol(0.1, len(sfGFP_plus_SD))
print("Insert: 100ng of DNA of length {:4d} equals {:.2f} pmol".format(len(sfGFP_plus_SD), pmol_sfgfp))
Insert: 100ng of DNA of length  726 equals 0.21 pmol

According to NEB's Gibson assembly protocol, this is sufficient starting material for the protocol:

NEB recommends a total of 0.02–0.5 pmols of DNA fragments when 1 or 2 fragments are being assembled into a vector and 0.2–1.0 pmoles of DNA fragments when 4–6 fragments are being assembled

0.02–0.5 pmols* X µl
* Optimized cloning efficiency is 50–100 ng of vectors with 2–3 fold of excess inserts. Use 5 times more of inserts if size is less than 200 bps. Total volume of unpurified PCR fragments in Gibson Assembly reaction should not exceed 20%.
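
To see how these recommendations translate into amounts, here is a rough worked example using the same 660 g/mol-per-base-pair approximation as `convert_ug_to_pmol`. It assumes 100 ng of pUC19 (2686 bp) and uses the 726 bp insert length; the PCR product with its added flanks is slightly longer, so treat the numbers as approximate. The helper names are hypothetical.

```python
def ng_to_pmol(ng_dsdna, num_bp):
    """Convert ng of double-stranded DNA to pmol (~660 g/mol per base pair)."""
    return ng_dsdna * 1000.0 / (num_bp * 660.0)

def pmol_to_ng(pmol, num_bp):
    """Inverse of ng_to_pmol."""
    return pmol * num_bp * 660.0 / 1000.0

VECTOR_BP, INSERT_BP = 2686, 726
vector_ng = 100.0                                # upper end of the 50-100 ng range
vector_pmol = ng_to_pmol(vector_ng, VECTOR_BP)

for fold in (2, 3):                              # 2-3 fold molar excess of insert
    insert_pmol = fold * vector_pmol
    total_pmol = vector_pmol + insert_pmol
    in_range = 0.02 <= total_pmol <= 0.5
    print("{}x insert: {:.0f} ng insert, {:.2f} pmol total, in range: {}".format(
        fold, pmol_to_ng(insert_pmol, INSERT_BP), total_pmol, in_range))
```

Because the insert is much shorter than the vector, a 2 to 3 fold molar excess needs less insert by mass than vector.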

NEBuilder for Gibson Assembly

New England Biolabs' NEBuilder is a really excellent tool to help you design your Gibson assembly protocol. It even generates a comprehensive four-page PDF protocol for you. Using this tool, I design a protocol to cut pUC19 with EcoRI and then use PCR to add appropriately sized flanks to the insert.

Part Two: Running The Experiment

There are four steps in the experiment:

  1. PCR of the insert to add complementary flanks;
  2. Cutting the plasmid to accommodate the insert;
  3. Gibson assembly of insert and plasmid;
  4. Transforming the bacteria with the assembled plasmid.

Step 1. PCR of the Insert

Gibson assembly relies on the DNA sequences you are assembling having some overlapping sequence (see the NEB protocol above for detailed instructions). As well as simple amplification, PCR also enables you to add flanking DNA to a sequence by simply including the additional sequence in the primers. (You can also clone using only OE-PCR).

I synthesize primers according to the NEB protocol above. I used a Quickstart protocol on the Transcriptic website to try it out, but there's also an autoprotocol command. Transcriptic does not do oligo synthesis in-house, so after a day or two of waiting, these primers magically appear in my inventory. (Note, the gene-specific part of the primers below is upper-case but it's just cosmetic.)

insert_primers = ["aaacgacggccagtgTTTATACAGTTCATCCATTCCATG", "cgggtaccgagctcgAGGAGGACAGCTATGTCG"]

Primer analysis

I can analyze the properties of these primers using IDT OligoAnalyzer. It is useful to know the melting temperatures and propensity for dimer-forming when debugging a PCR experiment, though the NEB protocol almost certainly chooses primers with good properties.

Gene-specific portion of flank (uppercase)
  Melt temperature: 51C, 53.5C
Full sequence
  Melt temperature: 64.5C, 68.5C
  Hairpin: -.4dG, -5dG
  Self-dimer: -9dG, -16dG
  Heterodimer: -6dG
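
For a quick sanity check before going to a tool like OligoAnalyzer, the primers can be split into their two parts using the upper/lower-case convention from the `insert_primers` list. This sketch only reports lengths and GC content; it makes no attempt at melting temperatures or dimer energies, and the helper names are hypothetical.

```python
def split_primer(primer):
    """Split a primer into (lower-case flank, upper-case gene-specific part)."""
    flank = "".join(c for c in primer if c.islower())
    gene_specific = "".join(c for c in primer if c.isupper())
    assert primer == flank + gene_specific, "expected flank first, then gene-specific part"
    return flank, gene_specific

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

insert_primers = ["aaacgacggccagtgTTTATACAGTTCATCCATTCCATG",
                  "cgggtaccgagctcgAGGAGGACAGCTATGTCG"]

for primer in insert_primers:
    flank, gene = split_primer(primer)
    print("flank {} nt, gene-specific {} nt, GC of gene-specific part {:.2f}".format(
        len(flank), len(gene), gc_fraction(gene)))
```

Both flanks come out at 15 nt, which is the overlap each end of the PCR product will share with the cut vector.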

I went through many iterations of this PCR protocol before getting results I was satisfied with, including experimenting with several different brands of PCR mixes. Each of these iterations can take several days (depending on the length of the queue at the lab), so it is worth spending time debugging upfront to save time in the long run. As cloud lab capacity increases this issue should diminish. Still, I would not assume that your first protocol will succeed: there are many variables at work here.

""" PCR overlap extension of sfGFP according to NEB protocol.
v5: Use 3/10ths as much primer as the v4 protocol.
v6: more complex touchdown pcr procedure. The Q5 temperature was probably too hot
v7: more time at low temperature to allow gene-specific part to anneal
v8: correct dNTP concentration, real touchdown
"""

p = Protocol()

# ---------------------------------------------------
# Set up experiment
experiment_name = "sfgfp_pcroe_v8"
template_length = 740

_options = {'dilute_primers' : False, # if working stock has not been made
            'dilute_template': False, # if working stock has not been made
            'dilute_dNTP'    : False, # if working stock has not been made
            'run_gel'        : True,  # run a gel to see the plasmid size
            'run_absorbance' : False, # check absorbance at 260/280/320
            'run_sanger'     : False} # sanger sequence the new sequence
options = {k for k,v in _options.items() if v is True}

# ---------------------------------------------------
# Inventory and provisioning
# 'sfgfp2':              'ct17yx8h77tkme', # inventory; sfGFP tube #2, micro-1.5, cold_20
# 'sfgfp_puc19_primer1': 'ct17z9542mrcfv', # inventory; micro-2.0, cold_4
# 'sfgfp_puc19_primer2': 'ct17z9542m5ntb', # inventory; micro-2.0, cold_4
# 'sfgfp_idt_1ngul':     'ct184nnd3rbxfr', # inventory; micro-1.5, cold_4, (ERROR: no template)
inv = {
    'Q5 Polymerase':                     'rs16pcce8rdytv', # catalog; Q5 High-Fidelity DNA Polymerase
    'Q5 Buffer':                         'rs16pcce8rmke3', # catalog; Q5 Reaction Buffer
    'dNTP Mixture':                      'rs16pcb542c5rd', # catalog; dNTP Mixture (25mM?)
    'water':                             'rs17gmh5wafm5p', # catalog; Autoclaved MilliQ H2O
    'sfgfp_pcroe_v5_puc19_primer1_10uM': 'ct186cj5cqzjmr', # inventory; micro-1.5, cold_4
    'sfgfp_pcroe_v5_puc19_primer2_10uM': 'ct186cj5cq536x', # inventory; micro-1.5, cold_4
    'sfgfp1':                            'ct17yx8h759dk4', # inventory; sfGFP tube #1, micro-1.5, cold_20
}

# Existing inventory
template_tube = p.ref("sfgfp1", id=inv['sfgfp1'], cont_type="micro-1.5", storage="cold_4").well(0)
dilute_primer_tubes = [p.ref('sfgfp_pcroe_v5_puc19_primer1_10uM', id=inv['sfgfp_pcroe_v5_puc19_primer1_10uM'], cont_type="micro-1.5", storage="cold_4").well(0),
                       p.ref('sfgfp_pcroe_v5_puc19_primer2_10uM', id=inv['sfgfp_pcroe_v5_puc19_primer2_10uM'], cont_type="micro-1.5", storage="cold_4").well(0)]

# New inventory resulting from this experiment
dilute_template_tube = p.ref("sfgfp1_0.25ngul",  cont_type="micro-1.5", storage="cold_4").well(0)
dNTP_10uM_tube       = p.ref("dNTP_10uM",        cont_type="micro-1.5", storage="cold_4").well(0)
sfgfp_pcroe_out_tube = p.ref(expid("amplified"), cont_type="micro-1.5", storage="cold_4").well(0)

# Temporary tubes for use, then discarded
mastermix_tube = p.ref("mastermix", cont_type="micro-1.5", storage="cold_4",  discard=True).well(0)
water_tube =     p.ref("water",     cont_type="micro-1.5", storage="ambient", discard=True).well(0)
pcr_plate =      p.ref("pcr_plate", cont_type="96-pcr",    storage="cold_4",  discard=True)
if 'run_absorbance' in options:
    abs_plate = p.ref("abs_plate", cont_type="96-flat", storage="cold_4", discard=True)

# Initialize all existing inventory
all_inventory_wells = [template_tube] + dilute_primer_tubes
for well in all_inventory_wells:
    init_inventory_well(well)
    print(well.name, well.volume, well.properties)

# -----------------------------------------------------
# Provision water once, for general use
p.provision(inv["water"], water_tube, µl(500))

# -----------------------------------------------------
# Dilute primers 1/10 (100uM->10uM) and keep at 4C
if 'dilute_primers' in options:
    for primer_num in (0,1):
        p.transfer(water_tube, dilute_primer_tubes[primer_num], µl(90))
        p.transfer(primer_tubes[primer_num], dilute_primer_tubes[primer_num], µl(10), mix_before=True, mix_vol=µl(50))
        p.mix(dilute_primer_tubes[primer_num], volume=µl(50), repetitions=10)

# -----------------------------------------------------
# Dilute template 1/10 (10ng/ul->1ng/ul) and keep at 4C
# OR
# Dilute template 1/40 (10ng/ul->0.25ng/ul) and keep at 4C
if 'dilute_template' in options:
    p.transfer(water_tube, dilute_template_tube, µl(195))
    p.mix(dilute_template_tube, volume=µl(100), repetitions=10)

# Dilute dNTP to 10mM, assuming a 25mM stock (the tube is labeled dNTP_10uM)
if 'dilute_dNTP' in options:
    p.transfer(water_tube,           dNTP_10uM_tube, µl(6))
    p.provision(inv["dNTP Mixture"], dNTP_10uM_tube, µl(4))

# -----------------------------------------------------
# Q5 PCR protocol
# 25ul reaction
# -------------
# Q5 reaction buffer      5    µl
# Q5 polymerase           0.25 µl
# 10mM dNTP               0.5  µl -- 1µl = 4x12.5mM
# 10uM primer 1           1.25 µl
# 10uM primer 2           1.25 µl
# 1pg-1ng Template        1    µl -- 0.5 or 1ng/ul concentration
# -------------------------------
# Sum                     9.25 µl

# Mastermix tube will have 96ul of stuff, leaving space for 4x1ul aliquots of template
p.transfer(water_tube,             mastermix_tube, µl(64))
p.provision(inv["Q5 Buffer"],      mastermix_tube, µl(20))
p.provision(inv['Q5 Polymerase'],  mastermix_tube, µl(1))
p.transfer(dNTP_10uM_tube,         mastermix_tube, µl(1), mix_before=True, mix_vol=µl(2))
p.transfer(dilute_primer_tubes[0], mastermix_tube, µl(5), mix_before=True, mix_vol=µl(10))
p.transfer(dilute_primer_tubes[1], mastermix_tube, µl(5), mix_before=True, mix_vol=µl(10))
p.mix(mastermix_tube, volume="48:microliter", repetitions=10)

# Transfer mastermix to pcr_plate without template
p.transfer(mastermix_tube, pcr_plate.wells(["A1","B1","C1"]), µl(24))
p.transfer(mastermix_tube, pcr_plate.wells(["A2"]),           µl(24)) # acknowledged dead volume problems
p.mix(pcr_plate.wells(["A1","B1","C1","A2"]), volume=µl(12), repetitions=10)

# Finally add template
p.transfer(template_tube,  pcr_plate.wells(["A1","B1","C1"]), µl(1))
p.mix(pcr_plate.wells(["A1","B1","C1"]), volume=µl(12.5), repetitions=10)

# ---------------------------------------------------------
# Thermocycle with Q5 and hot start
# 61.1 annealing temperature is recommended by NEB protocol
# p.seal is enforced by transcriptic
extension_time = int(max(2, np.ceil(template_length * (11.0/1000))))
assert 0 < extension_time < 60, "extension time should be reasonable for PCR"

cycles = [{"cycles":  1, "steps": [{"temperature": "98:celsius", "duration": "30:second"}]}] + \
         touchdown(70, 61, [8, 25, extension_time], stepsize=0.5) + \
         [{"cycles": 16, "steps": [{"temperature": "98:celsius", "duration": "8:second"},
                                   {"temperature": "61.1:celsius", "duration": "25:second"},
                                   {"temperature": "72:celsius", "duration": "{:d}:second".format(extension_time)}]},
          {"cycles":  1, "steps": [{"temperature": "72:celsius", "duration": "2:minute"}]}]
p.thermocycle(pcr_plate, cycles, volume=µl(25))

# --------------------------------------------------------
# Run a gel to hopefully see a 740bp fragment
if 'run_gel' in options:
    p.mix(pcr_plate.wells(["A1","B1","C1","A2"]), volume=µl(12.5), repetitions=10)
    p.transfer(pcr_plate.wells(["A1","B1","C1","A2"]), pcr_plate.wells(["D1","E1","F1","D2"]),
               [µl(2), µl(4), µl(8), µl(8)])
    p.transfer(water_tube, pcr_plate.wells(["D1","E1","F1","D2"]),
               [µl(18),µl(16),µl(12),µl(12)], mix_after=True, mix_vol=µl(10))
    p.gel_separate(pcr_plate.wells(["D1","E1","F1","D2"]),
                   µl(20), "agarose(10,2%)", "ladder1", "10:minute", expid("gel"))

# Absorbance dilution series. Take 1ul out of the 25ul pcr plate wells
if 'run_absorbance' in options:
    abs_wells = ["A1","B1","C1","A2","B2","C2","A3","B3","C3"]

    p.transfer(water_tube, abs_plate.wells(abs_wells[0:6]), µl(10))
    p.transfer(water_tube, abs_plate.wells(abs_wells[6:9]), µl(9))

    p.transfer(pcr_plate.wells(["A1","B1","C1"]), abs_plate.wells(["A1","B1","C1"]), µl(1), mix_after=True, mix_vol=µl(5))
    p.transfer(abs_plate.wells(["A1","B1","C1"]), abs_plate.wells(["A2","B2","C2"]), µl(1), mix_after=True, mix_vol=µl(5))
    p.transfer(abs_plate.wells(["A2","B2","C2"]), abs_plate.wells(["A3","B3","C3"]), µl(1), mix_after=True, mix_vol=µl(5))
    for wavelength in [260, 280, 320]:
        p.absorbance(abs_plate, abs_plate.wells(abs_wells),
                     "{}:nanometer".format(wavelength), expid("abs_{}".format(wavelength)), flashes=25)

# -----------------------------------------------------------------------------
# Sanger sequencing:
# "Each reaction should have a total volume of 15 µl and we recommend the following composition of DNA and primer:
#  PCR product (40 ng), primer (1 µl of a 10 µM stock)"
#  By comparing to the gel ladder concentration (175ng/lane), it looks like 5ul of PCR product has approximately 30ng of DNA
if 'run_sanger' in options:
    seq_wells = ["G1","G2"]
    for primer_num, seq_well in [(0, seq_wells[0]),(1, seq_wells[1])]:
        p.transfer(dilute_primer_tubes[primer_num], pcr_plate.wells([seq_well]),
                   µl(1), mix_before=True, mix_vol=µl(50))
        p.transfer(pcr_plate.wells(["A1"]), pcr_plate.wells([seq_well]),
                   µl(5), mix_before=True, mix_vol=µl(10))
        p.transfer(water_tube, pcr_plate.wells([seq_well]), µl(9))

    p.mix(pcr_plate.wells(seq_wells), volume=µl(7.5), repetitions=10)
    p.sangerseq(pcr_plate, pcr_plate.wells(seq_wells[0]).indices(), expid("seq1"))
    p.sangerseq(pcr_plate, pcr_plate.wells(seq_wells[1]).indices(), expid("seq2"))

# -------------------------------------------------------------------------
# Then consolidate to one tube. Leave at least 3ul dead volume in each tube
remaining_volumes = [well.volume - dead_volume['96-pcr'] for well in pcr_plate.wells(["A1","B1","C1"])]
print("Consolidated volume", sum(remaining_volumes, µl(0)))
p.consolidate(pcr_plate.wells(["A1","B1","C1"]), sfgfp_pcroe_out_tube, remaining_volumes, allow_carryover=True)

uprint("\nProtocol 1. Amplify the insert (oligos previously synthesized)")
jprotocol = json.dumps(p.as_dict(), indent=2)
!echo '{jprotocol}' | transcriptic analyze
WARNING:root:Low volume for well sfGFP 1 /sfGFP 1  : 2.0:microliter
sfGFP 1 /sfGFP 1  2.0:microliter {'dilution': '0.25ng/ul'}
sfgfp_pcroe_v5_puc19_primer1_10uM 75.0:microliter {}
sfgfp_pcroe_v5_puc19_primer2_10uM 75.0:microliter {}
Consolidated volume 52.0:microliter

Protocol 1. Amplify the insert (oligos previously synthesized)

✓ Protocol analyzed
  11 instructions
  8 containers
  Total Cost: $32.18
  Workcell Time: $4.32
  Reagents & Consumables: $27.86
Results: PCR of the insert

Analysis of gel results

By running a gel I can see if the amplified product is the right size (position of the band in the gel), and the right quantity (darkness of the band). The gel has a ladder corresponding to different lengths and quantities of DNA that can be used for comparison.

In the gel photograph below, lanes D1, E1, F1 contain 2µl, 4µl, and 8µl of amplified product, respectively. I can estimate the amount of DNA in each lane by comparison to the DNA in the ladder (50ng of DNA per band in the ladder). I think the results look very clean.

I tried using GelEval to analyze the image and estimate concentrations, and it worked pretty well, though I'm not sure it would be much more accurate than a more naive method. However, small changes to the location and size of the bands led to large changes in the estimate of the amount of DNA. My best estimate for the amount of DNA in my amplified product using GelEval is 40ng/µl.

If I assume that I am limited by the amount of primer in the mixture, as opposed to the amount of dNTP or enzyme, then since I have 12.5pmol of each primer, that implies a theoretical maximum of 6µg of 740bp DNA in 25µl. Since my estimate for the total amount of DNA using GelEval is 40ng/µl x 25µl (1µg or 2pmol), these results are very reasonable and close to what I should expect under ideal conditions.
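The numbers above can be reproduced with a short calculation. This is a minimal sketch, not part of the protocol: it assumes an average mass of about 650 g/mol per base pair of double-stranded DNA (a common rule of thumb), and the helper names are made up for illustration.

```python
# Rough check of the primer-limited yield.
# Assumption: ~650 g/mol per base pair of double-stranded DNA.
AVG_BP_MASS_G_PER_MOL = 650.0

def pmol_to_ug(pmol, length_bp):
    """Mass in ug of `pmol` picomoles of dsDNA that is `length_bp` long."""
    grams = pmol * 1e-12 * length_bp * AVG_BP_MASS_G_PER_MOL
    return grams * 1e6

def ug_to_pmol(ug, length_bp):
    """Picomoles of dsDNA of `length_bp` contained in `ug` micrograms."""
    return ug * 1e-6 / (length_bp * AVG_BP_MASS_G_PER_MOL) * 1e12

# Ceiling: every primer pair (12.5pmol) ends up in one 740bp product
max_yield_ug = pmol_to_ug(12.5, 740)
# Gel estimate: 40ng/ul in a 25ul reaction
observed_ug = 40 * 25 / 1000
observed_pmol = ug_to_pmol(observed_ug, 740)

print("max {:.1f}ug, observed {:.1f}ug ({:.1f}pmol)".format(
    max_yield_ug, observed_ug, observed_pmol))
```

With those assumptions the primer-limited ceiling comes out at about 6µg and the gel estimate at about 2pmol, in line with the figures in the text.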

Gel electrophoresis of the amplified sfGFP insert at various concentrations (D1, E1, F1), plus a control (D2)

PCR results diagnostics

Recently, Transcriptic has started providing some interesting and useful diagnostic data, outputted by its robots. At the time of writing, the data were not available for download, so for now I just have an image of temperatures during thermocycling.

The data look good, with no unexpected peaks or troughs. The PCR cycles 35 times in total, but some of these cycles are spent at very high temperature as part of the touchdown PCR process. In my previous attempts to amplify this segment (of which there were a few!) I had issues with self-primer hybridization, so here I made the PCR spend quite a bit of time at high temperatures, which should increase the fidelity.

Thermocycling diagnostics for a touchdown PCR: temperatures of block, sample and lid over 35 cycles and 42 minutes

Step 2. Cutting the Plasmid

To insert my sfGFP DNA into pUC19, I first need to cut the plasmid open. Following the NEB protocol, I do this with the restriction enzyme EcoRI. Transcriptic has the reagents I need in its standard inventory: this NEB EcoRI and 10x CutSmart buffer and this NEB pUC19 plasmid.

Here are the prices from their inventory for reference. I only actually pay a fraction of the price below since Transcriptic sells by the aliquot:

   Item         ID       Amount      Concentration    Price
------------  ------ ------------- -----------------  ------
CutSmart 10x  B7204S       5 ml         10 X          $19.00
EcoRI         R3101L  50,000 units  20,000 units/ml  $225.00
pUC19         N3041L     250 µg      1,000 µg/ml     $268.00

I follow the NEB Protocol as closely as possible:

The buffer must be completely thawed before use. Dilute the 10X stock with dH2O to a final concentration of 1X. Add the water first, buffer next, the DNA solution and finally the enzyme. A typical 50 µl reaction should contain 5 µl of 10X NEBuffer with the rest of the volume coming from the DNA solution, enzyme and dH2O.

One unit is defined as the amount of enzyme required to digest 1 µg of λ DNA in 1 hour at 37°C in a total reaction volume of 50 µl.  In general, we recommend 5–10 units of enzyme per µg DNA, and 10–20 units for genomic DNA in a 1 hour digest.

A 50 µl reaction volume is recommended for digestion of 1 µg of substrate
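The NEB guidelines above translate into a simple recipe calculation. The following is a sketch under stated assumptions: the stock concentrations are taken from the inventory table (pUC19 at 1µg/µl, EcoRI at 20 units/µl), the default of 20 units per µg is one undiluted microliter of enzyme per µg (more than NEB's general 5–10 units), and the function name is hypothetical.

```python
# Sketch of a single-enzyme digest recipe following the NEB guidelines.
# Assumptions: 50ul of reaction per ug of DNA, 10X buffer, and the stock
# concentrations listed in the inventory table. Names are illustrative.
def digest_recipe(dna_ug, dna_ug_per_ul=1.0, enzyme_units_per_ul=20.0,
                  units_per_ug=20.0, ul_per_ug=50.0):
    total_ul = dna_ug * ul_per_ug
    recipe = {
        "buffer_10x_ul": total_ul / 10.0,                         # 1X final
        "dna_ul": dna_ug / dna_ug_per_ul,
        "enzyme_ul": dna_ug * units_per_ug / enzyme_units_per_ul,
    }
    # Water makes up whatever volume is left
    recipe["water_ul"] = total_ul - sum(recipe.values())
    recipe["total_ul"] = total_ul
    return recipe

print(digest_recipe(1.0))
```

For 1µg of pUC19 this gives 5µl of buffer, 1µl of DNA, 1µl of enzyme and 43µl of water.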
"""Protocol for cutting pUC19 with EcoRI."""
p = Protocol()

experiment_name = "puc19_ecori_v3"

options = {}

inv = {
    "water":    "rs17gmh5wafm5p",   # catalog; Autoclaved MilliQ H2O; ambient
    "pUC19":    "rs17tcqmncjfsh",   # catalog; pUC19; cold_20
    "EcoRI":    "rs17ta8xftpdk6",   # catalog; EcoRI-HF; cold_20
    "CutSmart": "rs17ta93g3y85t",   # catalog; CutSmart Buffer 10x; cold_20
    "ecori_p10x": "ct187v4ea85k2h", # inventory; EcoRI diluted 10x
}

# Tubes and plates I use then discard
re_tube    = p.ref("re_tube",    cont_type="micro-1.5", storage="cold_4", discard=True).well(0)
water_tube = p.ref("water_tube", cont_type="micro-1.5", storage="cold_4", discard=True).well(0)
pcr_plate  = p.ref("pcr_plate",  cont_type="96-pcr",    storage="cold_4", discard=True)

# The result of the experiment, a pUC19 cut by EcoRI, goes in this tube for storage
puc19_cut_tube  = p.ref(expid("puc19_cut"), cont_type="micro-1.5", storage="cold_20").well(0)

# -------------------------------------------------------------
# Provisioning and diluting.
# Diluted EcoRI can be used more than once
p.provision(inv["water"], water_tube, µl(500))

if 'dilute_ecori' in options:
    ecori_p10x_tube = p.ref("ecori_p10x", cont_type="micro-1.5", storage="cold_20").well(0)
    p.transfer(water_tube,    ecori_p10x_tube, µl(45))
    p.provision(inv["EcoRI"], ecori_p10x_tube, µl(5))
else:
    # All "inventory" (stuff I own at transcriptic) must be initialized
    ecori_p10x_tube = p.ref("ecori_p10x", id=inv["ecori_p10x"], cont_type="micro-1.5", storage="cold_20").well(0)

# -------------------------------------------------------------
# Restriction enzyme cutting pUC19
# 50ul total reaction volume for cutting 1ug of DNA:
# 5ul CutSmart 10x
# 1ul pUC19 (1ug of DNA)
# 1ul EcoRI (or 10ul diluted EcoRI, 20 units, >10 units per ug DNA)
p.transfer(water_tube,       re_tube, µl(117))
p.provision(inv["CutSmart"], re_tube, µl(15))
p.provision(inv["pUC19"],    re_tube, µl(3))
p.mix(re_tube, volume=µl(60), repetitions=10)
assert re_tube.volume == µl(120) + dead_volume["micro-1.5"]

print("Volumes: re_tube:{} water_tube:{} EcoRI:{}".format(re_tube.volume, water_tube.volume, ecori_p10x_tube.volume))

p.distribute(re_tube,         pcr_plate.wells(["A1","B1","A2"]), µl(40))
p.distribute(water_tube,      pcr_plate.wells(["A2"]),           µl(10))
p.distribute(ecori_p10x_tube, pcr_plate.wells(["A1","B1"]),      µl(10))
assert all(well.volume == µl(50) for well in pcr_plate.wells(["A1","B1","A2"]))

p.mix(pcr_plate.wells(["A1","B1","A2"]), volume=µl(25), repetitions=10)

# Incubation to induce cut, then heat inactivation of EcoRI
p.incubate(pcr_plate, "warm_37", "60:minute", shaking=False)
p.thermocycle(pcr_plate, [{"cycles":  1, "steps": [{"temperature": "65:celsius", "duration": "21:minute"}]}], volume=µl(50))

# --------------------------------------------------------------
# Gel electrophoresis, to ensure the cutting worked
p.mix(pcr_plate.wells(["A1","B1","A2"]), volume=µl(25), repetitions=5)
p.transfer(pcr_plate.wells(["A1","B1","A2"]), pcr_plate.wells(["D1","E1","D2"]), µl(8))
p.transfer(water_tube, pcr_plate.wells(["D1","E1","D2"]), µl(15), mix_after=True, mix_vol=µl(10))
assert all(well.volume == µl(20) + dead_volume["96-pcr"] for well in pcr_plate.wells(["D1","E1","D2"]))

p.gel_separate(pcr_plate.wells(["D1","E1","D2"]), µl(20), "agarose(10,2%)", "ladder2", "15:minute", expid("gel"))

# ----------------------------------------------------------------------------
# Then consolidate all cut plasmid to one tube (puc19_cut_tube).
remaining_volumes = [well.volume - dead_volume['96-pcr'] for well in pcr_plate.wells(["A1","B1"])]
print("Consolidated volume: {}".format(sum(remaining_volumes, µl(0))))
p.consolidate(pcr_plate.wells(["A1","B1"]), puc19_cut_tube, remaining_volumes, allow_carryover=True)

assert all(tube.volume >= dead_volume['micro-1.5'] for tube in [water_tube, re_tube, puc19_cut_tube, ecori_p10x_tube])

# ---------------------------------------------------------------
# Test protocol
jprotocol = json.dumps(p.as_dict(), indent=2)
!echo '{jprotocol}' | transcriptic analyze
#print("Protocol {}\n\n{}".format(experiment_name, jprotocol))
Volumes: re_tube:135.0:microliter water_tube:383.0:microliter EcoRI:30.0:microliter
Consolidated volume: 78.0:microliter

✓ Protocol analyzed
  12 instructions
  5 containers
  Total Cost: $30.72
  Workcell Time: $3.38
  Reagents & Consumables: $27.34
Results: Cutting the plasmid

I ended up doing this experiment twice under slightly different conditions and with different-sized gels, but the results are almost identical. Both gels look good to me.

Originally, I did not allocate enough space for dead volume (1.5ml tubes have 15µl of dead volume!), which I believe explains the difference between D1 and E1 (these two lanes should be identical). This dead volume problem would be easily solved by making a proper working stock of diluted EcoRI at the start of the protocol.
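One way to avoid this kind of shortfall is to size the working stock from the number of reactions plus the dead volume of the tube. The sketch below is hypothetical (the function name and the example numbers are not from the protocol); it only illustrates the bookkeeping, using the 15µl dead volume mentioned above.

```python
# Size a diluted working stock so that the tube's dead volume is covered.
# Hypothetical helper for illustration; all volumes are in microliters.
def working_stock_volumes(n_reactions, ul_per_reaction, dilution_factor, dead_volume_ul):
    total = n_reactions * ul_per_reaction + dead_volume_ul
    enzyme = total / dilution_factor          # undiluted enzyme needed
    return {"enzyme_ul": enzyme, "water_ul": total - enzyme, "total_ul": total}

# Two reactions of 10ul each from a 10x dilution, in a 1.5ml tube (15ul dead volume)
print(working_stock_volumes(2, 10, 10, 15))
```

For two 10µl draws this asks for 35µl of working stock (3.5µl enzyme plus 31.5µl water) rather than the 20µl that the reactions alone would suggest.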

Despite that error, in both gels, lanes D1 and E1 contain strong bands at the correct position of 2.6kb. Lane D2 contains uncut plasmid, so as expected, it is not visible in one gel and barely visible as a smear in the other.

The two gel photographs look pretty different, partially just because this is a step that Transcriptic has yet to automate.

Two gels showing a cut pUC19 (2.6kb) in lanes D1 and E1, and uncut pUC19 in D2

Step 3. Gibson Assembly

The simplest way to check if my Gibson assembly works is to assemble the insert and plasmid, then use standard M13 primers (which flank the insert) to amplify part of the plasmid and the inserted DNA, and run qPCR and a gel to see that the amplification worked. You could also run a sequencing reaction to confirm that everything inserted as expected, but I decided to leave this for later.

If the Gibson assembly fails, then the M13 amplification will fail, because the plasmid has been cut between the two M13 sequences.

"""Debugging transformation protocol: Gibson assembly followed by qPCR and a gel
v2: include v3 Gibson assembly"""

p = Protocol()
options = {}

experiment_name = "debug_sfgfp_puc19_gibson_seq_v2"

inv = {
    "water"                       : "rs17gmh5wafm5p", # catalog; Autoclaved MilliQ H2O; ambient
    "M13_F"                       : "rs17tcpqwqcaxe", # catalog; M13 Forward (-41); cold_20 (1ul = 100pmol)
    "M13_R"                       : "rs17tcph6e2qzh", # catalog; M13 Reverse (-48); cold_20 (1ul = 100pmol)
    "SensiFAST_SYBR_No-ROX"       : "rs17knkh7526ha", # catalog; SensiFAST SYBR for qPCR
    "sfgfp_puc19_gibson_v1_clone" : "ct187rzdq9kd7q", # inventory; assembled sfGFP; cold_4
    "sfgfp_puc19_gibson_v3_clone" : "ct188ejywa8jcv", # inventory; assembled sfGFP; cold_4
}

# ---------------------------------------------------------------
# First get my sfGFP pUC19 clones, assembled with Gibson assembly
clone_plate1 = p.ref("sfgfp_puc19_gibson_v1_clone", id=inv["sfgfp_puc19_gibson_v1_clone"],
                     cont_type="96-pcr", storage="cold_4", discard=False)
clone_plate2 = p.ref("sfgfp_puc19_gibson_v3_clone", id=inv["sfgfp_puc19_gibson_v3_clone"],
                     cont_type="96-pcr", storage="cold_4", discard=False)

water_tube = p.ref("water", cont_type="micro-1.5", storage="cold_4", discard=True).well(0)
master_tube = p.ref("master", cont_type="micro-1.5", storage="cold_4", discard=True).well(0)
primer_tube = p.ref("primer", cont_type="micro-1.5", storage="cold_4", discard=True).well(0)

pcr_plate = p.ref(expid("pcr_plate"), cont_type="96-pcr", storage="cold_4", discard=False)


seq_wells = ["B2","B4","B6", # clone_plate1
             "D2","D4","D6", # clone_plate2
             "F2","F4"] # control

# clone_plate2 was diluted 4X (20ul->80ul), according to NEB instructions
assert clone_plate1.well("A1").volume == µl(18), clone_plate1.well("A1").volume
assert clone_plate2.well("A1").volume == µl(78), clone_plate2.well("A1").volume

# --------------------------------------------------------------
# Provisioning
p.provision(inv["water"], water_tube, µl(500))

# primers, diluted 2X, discarded at the end
p.provision(inv["M13_F"], primer_tube, µl(13))
p.provision(inv["M13_R"], primer_tube, µl(13))
p.transfer(water_tube, primer_tube, µl(26), mix_after=True, mix_vol=µl(20), repetitions=10)

# -------------------------------------------------------------------
# PCR Master mix -- 10ul SYBR mix, plus 1ul each undiluted primer DNA (100pmol)
# Also add 15ul of dead volume
p.provision(inv['SensiFAST_SYBR_No-ROX'], master_tube, µl(11+len(seq_wells)*10))
p.transfer(primer_tube, master_tube, µl(4+len(seq_wells)*4))
p.mix(master_tube, volume=µl(63), repetitions=10)
assert master_tube.volume == µl(127) # 15ul dead volume

p.distribute(master_tube, pcr_plate.wells(seq_wells), µl(14), allow_carryover=True)
p.distribute(water_tube, pcr_plate.wells(seq_wells),
             [µl(ul) for ul in [5,4,2, 4,2,0, 6,6]])

# Template -- starting with some small, unknown amount of DNA produced by Gibson
p.transfer(clone_plate1.well("A1"), pcr_plate.wells(seq_wells[0:3]), [µl(1),µl(2),µl(4)], one_tip=True)
p.transfer(clone_plate2.well("A1"), pcr_plate.wells(seq_wells[3:6]), [µl(2),µl(4),µl(6)], one_tip=True)

assert all(pcr_plate.well(w).volume == µl(20) for w in seq_wells)
assert clone_plate1.well("A1").volume == µl(11)
assert clone_plate2.well("A1").volume == µl(66)

# --------------------------------------------------------------
# qPCR
# standard melting curve parameters
p.thermocycle(pcr_plate, [{"cycles":  1, "steps": [{"temperature": "95:celsius","duration": "2:minute"}]},
                          {"cycles": 40, "steps": [{"temperature": "95:celsius","duration": "5:second"},
                                                   {"temperature": "60:celsius","duration": "20:second"},
                                                   {"temperature": "72:celsius","duration": "15:second", "read": True}]}],
    volume=µl(20), # volume is optional
    dyes={"SYBR": seq_wells}, # dye must be specified (tells transcriptic what absorbance to use?)
    melting_start="65:celsius", melting_end="95:celsius", melting_increment="0.5:celsius", melting_rate="5:second")

# --------------------------------------------------------------
# Gel -- 20ul required
# Dilute such that I have 11ul for sequencing
p.distribute(water_tube, pcr_plate.wells(seq_wells), µl(11))
p.gel_separate(pcr_plate.wells(seq_wells), µl(20), "agarose(8,0.8%)", "ladder1", "10:minute", expid("gel"))

# This appears to be a bug in Transcriptic. The actual volume should be 11ul
# but it is not updating after running a gel with 20ul.
# Primer tube should be equal to dead volume, or it's a waste
assert all(pcr_plate.well(w).volume==µl(31) for w in seq_wells)
assert primer_tube.volume == µl(16) == dead_volume['micro-1.5'] + µl(1)
assert water_tube.volume > µl(25)

# ---------------------------------------------------------------
# Test and run protocol
jprotocol = json.dumps(p.as_dict(), indent=2)
!echo '{jprotocol}' | transcriptic analyze
WARNING:root:Low volume for well sfgfp_puc19_gibson_v1_clone/sfgfp_puc19_gibson_v1_clone : 11.0:microliter
✓ Protocol analyzed
  11 instructions
  6 containers
  Total Cost: $32.09
  Workcell Time: $6.98
  Reagents & Consumables: $25.11
Results: Gibson assembly qPCR

I can use Transcriptic's data API to access the raw qPCR data as json. This feature is not very well documented, but it can be extremely useful. It even gives you access to some diagnostic data from the robots, which could help with debugging.

First, I request data on the run:

project_id, run_id = "p16x6gna8f5e9", "r18mj3cz3fku7"
api_url = "{}/runs/{}/data.json".format(project_id, run_id)
data_response = requests.get(api_url, headers=tsc_headers)
data = data_response.json()

Then I use that id to get "postprocessed" data from the qPCR:

qpcr_id = data['debug_sfgfp_puc19_gibson_seq_v1_qpcr']['id']
pp_api_url = "{}.json?key=postprocessed_data".format(qpcr_id)
data_response = requests.get(pp_api_url, headers=tsc_headers)
pp_data = data_response.json()

Here are the Ct (cycle threshold) values for each well. The Ct is simply the point at which the fluorescence exceeds a certain value. It tells us approximately how much DNA is currently present (and hence approximately how much we started with).

# Simple util to convert wellnum to wellname
n_w = {str(wellnum):'ABCDEFGH'[wellnum//12]+str(1+wellnum%12) for wellnum in range(96)}
w_n = {v: k for k, v in n_w.items()}

ct_vals = {n_w[k]:v for k,v in pp_data["amp0"]["SYBR"]["cts"].items()}
ct_df = pd.DataFrame(ct_vals, index=["Ct"]).T
ct_df["well"] = ct_df.index
f, ax = plt.subplots(figsize=(16,6))
_ = sns.barplot(y="well", x="Ct", data=ct_df)

We can see that amplification happens earliest in wells D2/4/6 (which use DNA from my "v3" Gibson assembly), then B2/4/6 (my "v1" Gibson assembly). The main difference between v1 and v3 is that the v3 DNA was diluted 4X according to the NEB protocol, but both should work. There is some amplification after cycle 30 in the control wells (F2, F4) despite having no template DNA, but that's not unusual since they include lots of primer DNA.
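To get a feel for what a difference in Ct means: if each cycle doubled the product perfectly (an idealization, since real reactions are less efficient), a well that crosses the threshold n cycles earlier started with about 2^n times more template. A minimal sketch, with made-up Ct values rather than the values from this run:

```python
def fold_difference(ct_early, ct_late, efficiency=1.0):
    """Approximate ratio of starting template between two wells.

    efficiency=1.0 means perfect doubling per cycle (an assumption).
    """
    return (1.0 + efficiency) ** (ct_late - ct_early)

# Hypothetical Ct values, for illustration only
print(fold_difference(15.0, 18.0))   # 3 cycles earlier -> 8.0
print(fold_difference(15.0, 25.0))   # 10 cycles earlier -> 1024.0
```

So a control that only starts amplifying ten or more cycles after the experimental wells corresponds to a starting amount roughly a thousand times smaller, under this idealized model.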

I can also plot the qPCR amplification curve to see the dynamics of the amplification.

f, ax = plt.subplots(figsize=(16,6))
ax.set_prop_cycle(color=['#fb6a4a', '#de2d26', '#a50f15', '#74c476', '#31a354', '#006d2c', '#08519c', '#6baed6'])
amp0 = pp_data['amp0']['SYBR']['baseline_subtracted']
_ = [plt.plot(amp0[w_n[well]], label=well) for well in ['B2', 'B4', 'B6', 'D2', 'D4', 'D6', 'F2', 'F4']]
_ = ax.set_ylim(0,)
_ = plt.title("qPCR (reds=Gibson v1, greens=Gibson v3, blues=control)")
_ = plt.legend(bbox_to_anchor=(1, .75), bbox_transform=plt.gcf().transFigure)

Overall, the qPCR results look great, with good amplification for both versions of my Gibson assembly, and no real amplification in the control. Since the v3 assembly worked a bit better than v1, I will use that from here on.

Results: Gibson assembly gel

The gel is also very clean, showing strong bands at just below 1kb in lanes B2, B4, B6, D2, D4, D6, which is the size I expect (the insert is about 740bp, and the M13 primers are about 40bp upstream and downstream). The second band corresponds to primers. We can be pretty sure of this since lanes F2 and F4 have only primer DNA and no template DNA.

Gel electrophoresis: the "v3" Gibson assembly has stronger bands (D2, D4, D6), in line with the qPCR data above.

Step 4. Transformation

Transformation is the process of altering an organism by adding DNA. So in this experiment I am transforming E. coli with the sfGFP-expressing plasmid pUC19.

I am using an easy-to-work-with Zymo DH5α Mix&Go strain and the recommended Zymo protocol. This strain is part of the standard Transcriptic inventory. In general, transformations can be tricky since competent cells are quite fragile, so the simpler and more robust the protocol the better. In regular molecular biology labs, these competent cells would likely be too expensive for general use.

Zymo Mix & Go cells have a simple protocol

The trouble with robots

This protocol is a good example of how adapting human protocols for use with robots can be difficult, and can fail unexpectedly. Protocols can be surprisingly vague ("shake the tube from side to side"), relying on the shared context of molecular biologists, or they may ask for advanced image processing ("check that the pellet was resuspended"). Humans don't mind these tasks, but robots need more explicit instructions.

There are some interesting timing issues with this transformation. The transformation protocol advises that the cells not stay at room temperature for more than a few seconds, and that the plate should be pre-warmed to 37C. In theory, you would want to start the pre-warming so it ends at the same time as the transformation, but it's not clear how the Transcriptic robots would handle this situation — to my knowledge, there is no way to sync up the steps of the protocol exactly. A lack of fine control over timing seems like it will be a common issue with robotic protocols, due to the comparative inflexibility of the robotic arm, scheduling conflicts, etc. We will have to adjust our protocols accordingly.

There are usually reasonable solutions: sometimes you just have to use different reagents (e.g., hardier cells, like the Mix&Go cells above); sometimes you just try overkill (e.g., shake the thing ten times instead of three); sometimes you have to come up with tricks to make the process work better with robots (e.g., use a PCR machine for heat-shocking).

Of course, the big advantage is that once the protocol works once, you can mostly rely on it to work again and again. You may even be able to quantify how robust the protocol is, and improve it over time!

Test Transformation

Before I start transforming with my fully assembled plasmid, I run a simple experiment to make sure that a transformation using regular pUC19 (i.e., no Gibson assembly, and no sfGFP insert DNA) works. pUC19 contains an ampicillin-resistance gene, so a successful transformation should allow the bacteria to grow on plates that contain this antibiotic.

I transfer the bacteria straight onto plates ("6-flat" in Transcriptic's terminology) that either have ampicillin or no ampicillin. I expect that transformed bacteria contain an ampicillin-resistance gene, and hence will grow. Untransformed bacteria should not grow.

"""Simple transformation protocol: transformation with unaltered pUC19"""

p = Protocol()

experiment_name = "debug_sfgfp_puc19_gibson_v1"

inv = {
    "water"       : "rs17gmh5wafm5p", # catalog; Autoclaved MilliQ H2O; ambient
    "DH5a"        : "rs16pbj944fnny", # catalog; Zymo DH5α; cold_80
    "LB Miller"   : "rs17bafcbmyrmh", # catalog; LB Broth Miller; cold_4
    "Amp 100mgml" : "rs17msfk8ujkca", # catalog; Ampicillin 100mg/ml; cold_20
    "pUC19"       : "rs17tcqmncjfsh", # catalog; pUC19; cold_20
}

# Catalog
transform_plate = p.ref("transform_plate", cont_type="96-pcr", storage="ambient", discard=True)
transform_tube  = transform_plate.well(0)

# ------------------------------------------------------------------------------------
# Plating transformed bacteria according to Tali's protocol (requires different code!)
# Add 1-5ul plasmid and pre-warm culture plates to 37C before starting.

# Extra inventory for plating
inv["lb-broth-100ug-ml-amp_6-flat"] = "ki17sbb845ssx9" # (kit, not normal ref) from blogpost
inv["noAB-amp_6-flat"] = "ki17reefwqq3sq" # kit id
inv["LB Miller"] = "rs17bafcbmyrmh"

# Ampicillin and no ampicillin plates
amp_6_flat = Container(None, p.container_type('6-flat'))
p.refs["amp_6_flat"] = Ref('amp_6_flat',
                           {"reserve": inv['lb-broth-100ug-ml-amp_6-flat'], "store": {"where": 'cold_4'}}, amp_6_flat)
noAB_6_flat = Container(None, p.container_type('6-flat'))
p.refs["noAB_6_flat"] = Ref('noAB_6_flat',
                            {"reserve": inv['noAB-amp_6-flat'], "store": {"where": 'cold_4'}}, noAB_6_flat)

# Provision competent bacteria
p.provision(inv["DH5a"], transform_tube,  µl(50))
p.provision(inv["pUC19"], transform_tube, µl(2))

# Heatshock the bacteria to transform using a PCR machine
p.thermocycle(transform_plate,
    [{"cycles":  1, "steps": [{"temperature":  "4:celsius", "duration":  "5:minute"}]},
     {"cycles":  1, "steps": [{"temperature": "37:celsius", "duration": "30:minute"}]}])

# Then dilute bacteria and spread onto 6-flat plates
# Put more on ampicillin plates for more opportunities to get a colony
p.provision(inv["LB Miller"], transform_tube, µl(355))
p.mix(transform_tube, µl(150), repetitions=5)
for i in range(6):
    p.spread(transform_tube, amp_6_flat.well(i), µl(55))
    p.spread(transform_tube, noAB_6_flat.well(i), µl(10))

assert transform_tube.volume >= µl(15), transform_tube.volume

# Incubate and image 6-flat plates over 18 hours
for flat_name, flat in [("amp_6_flat", amp_6_flat), ("noAB_6_flat", noAB_6_flat)]:
    for timepoint in [6,12,18]:
        p.incubate(flat, "warm_37", "6:hour")
        p.image_plate(flat, mode="top", dataref=expid("{}_t{}".format(flat_name, timepoint)))

# ---------------------------------------------------------------
# Analyze protocol
jprotocol = json.dumps(p.as_dict(), indent=2)
!echo '{jprotocol}' | transcriptic analyze

#print("Protocol {}\n\n{}".format(experiment_name, protocol))
✓ Protocol analyzed
  43 instructions
  3 containers
Results: Test transformation

In the following plate photographs, we can see that with no antibiotic (left-hand side plates), there is growth on all six plates, though the amount of growth is quite variable, which is worrying. Transcriptic's robots do not seem to do a great job with spreading, a task that does require some dexterity.

In the presence of antibiotic (right-hand side plates), I also see growth, though again it's inconsistent. The first two antibiotic plates look odd, with lots of growth, which is likely the result of adding 55µl to these plates compared to the 10µl I added to the no-antibiotic plates. The third plate has some colonies and is essentially what I expected to see for all the plates. The last three plates should have some growth but do not. My only explanation for these odd results is that I did insufficient mixing of cells and media, so almost all the cells were dispensed into the first two plates.

(I really should have also done a negative control here with untransformed bacteria on ampicillin plates, but I had already done this in a previous experiment, so I know that the stocked ampicillin plates kill this strain of E. coli. Growth was much weaker in the ampicillin plates despite dispensing a greater volume, as expected.)

Overall, the transformation worked well enough to proceed, but there are some kinks to work out.

Plates of cells transformed with pUC19 after 18 hours: no antibiotic (left) and antibiotic (right)

Transformation with assembled product

Since the Gibson assembly and a simple pUC19 transformation seem to work, I can now attempt a transformation with a fully-assembled sfGFP-expressing plasmid.

Apart from the assembled insert, I will also add some IPTG and X-gal to the plates, so that I can see the successful transformation with a blue-white screen. This additional information is useful because a transformation with regular pUC19, which does not contain sfGFP, would still confer antibiotic resistance.

Absorbance and Fluorescence

sfGFP fluoresces best with 485nm excitation / 510nm emission wavelengths (according to this chart). I found that 485/535 worked better at Transcriptic, I assume because 485 and 510 are too similar. I measure the growth of the bacteria at 600nm (OD600).

A menagerie of GFP (biotek)

IPTG and X-gal

My IPTG is at a concentration of 1M and should be used at a 1:1000 dilution (1mM). My X-gal is at a concentration of 20mg/ml and should be used at a 1:1000 dilution (20µg/ml). Hence to 2000µl of LB broth, I add 2µl of each.
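The dilution arithmetic is simple enough to do by hand, but it is easy to slip on units, so here is a small sketch. The function names are illustrative only.

```python
def stock_volume_ul(final_volume_ul, dilution_factor):
    """Volume of stock to add for a 1:dilution_factor dilution."""
    return final_volume_ul / dilution_factor

def working_concentration(stock_concentration, dilution_factor):
    """Concentration after dilution, in the same units as the stock."""
    return stock_concentration / dilution_factor

print(stock_volume_ul(2000, 1000))        # 2.0 (ul of each stock into 2000ul)
print(working_concentration(1.0, 1000))   # IPTG: 1M -> 0.001M (1mM)
print(working_concentration(20.0, 1000))  # X-gal: 20mg/ml -> 0.02mg/ml (20ug/ml)
```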

According to one protocol you should first spread 40µl of X-gal at 20mg/ml and 40µl of IPTG at 0.1M (or 4µl of IPTG at 1M) and then dry it for 30 minutes. That procedure did not work for me, so instead I mix IPTG, X-gal and competent cells, and spread that mixture directly.

"""Full Gibson assembly and transformation protocol for sfGFP and pUC19
v1: Spread IPTG and X-gal onto plates, then spread cells
v2: Mix IPTG, X-gal and cells; spread the mixture
v3: exclude X-gal so I can do colony picking better
v4: repeat v3 to try other excitation/emission wavelengths"""

p = Protocol()

options = {
    "gibson"          : False,    # do a new gibson assembly
    "sanger"          : False,    # sanger sequence product
    "control_pUC19"   : True,     # unassembled pUC19
    "XGal"            : False     # excluding X-gal should make the colony picking easier
}
for k, v in list(options.items()):
    if v is False: del options[k]

experiment_name = "sfgfp_puc19_gibson_plates_v4"

# -----------------------------------------------------------------------
# Inventory
inv = {
    # catalog
    "water"       : "rs17gmh5wafm5p", # catalog; Autoclaved MilliQ H2O; ambient
    "DH5a"        : "rs16pbj944fnny", # catalog; Zymo DH5α; cold_80
    "Gibson Mix"  : "rs16pfatkggmk5", # catalog; Gibson Mix (2X); cold_20
    "LB Miller"   : "rs17bafcbmyrmh", # catalog; LB Broth Miller; cold_4
    "Amp 100mgml" : "rs17msfk8ujkca", # catalog; Ampicillin 100mg/ml; cold_20
    "pUC19"       : "rs17tcqmncjfsh", # catalog; pUC19; cold_20
    # my inventory
    "puc19_cut_v2": "ct187v4ea7vvca", # inventory; pUC19 cut with EcoRI; cold_20
    "IPTG"        : "ct18a2r5wn6tqz", # inventory; IPTG at 1M (conc semi-documented); cold_20
    "XGal"        : "ct18a2r5wp5hcv", # inventory; XGal at 0.1M (conc not documented); cold_20
    "sfgfp_pcroe_v8_amplified"    : "ct1874zqh22pab", # inventory; sfGFP amplified to 40ng/ul; cold_4
    "sfgfp_puc19_gibson_v3_clone" : "ct188ejywa8jcv", # inventory; assembled sfGFP; cold_4
    # kits (must be used differently)
    "lb-broth-100ug-ml-amp_6-flat" : "ki17sbb845ssx9", # catalog; ampicillin plates
    "noAB-amp_6-flat" : "ki17reefwqq3sq" # catalog; no antibiotic plates
}

# Catalog (all to be discarded afterward)
water_tube       = p.ref("water",     cont_type="micro-1.5", storage="ambient", discard=True).well(0)
transform_plate  = p.ref("trn_plate", cont_type="96-pcr",    storage="ambient", discard=True)
transform_tube   = transform_plate.well(39) # experiment
transform_tube_L = p.ref("trn_tubeL", cont_type="micro-1.5", storage="ambient", discard=True).well(0)
transctrl_tube   = transform_plate.well(56) # control
transctrl_tube_L = p.ref("trc_tubeL", cont_type="micro-1.5", storage="ambient", discard=True).well(0)

# Plating according to Tali's protocol
amp_6_flat = Container(None, p.container_type('6-flat'))
p.refs[expid("amp_6_flat")] = Ref(expid("amp_6_flat"),
                                  {"reserve": inv['lb-broth-100ug-ml-amp_6-flat'], "store": {"where": 'cold_4'}}, amp_6_flat)
noAB_6_flat = Container(None, p.container_type('6-flat'))
p.refs[expid("noAB_6_flat")] = Ref(expid("noAB_6_flat"),
                                   {"reserve": inv['noAB-amp_6-flat'], "store": {"where": 'cold_4'}}, noAB_6_flat)

# My inventory: EcoRI-cut pUC19, oePCR'd sfGFP, Gibson-assembled pUC19, IPTG and X-Gal
if "gibson" in options:
    puc19_cut_tube = p.ref("puc19_ecori_v2_puc19_cut", id=inv["puc19_cut_v2"],
                           cont_type="micro-1.5", storage="cold_20").well(0)
    sfgfp_pcroe_amp_tube = p.ref("sfgfp_pcroe_v8_amplified", id=inv["sfgfp_pcroe_v8_amplified"],
                                 cont_type="micro-1.5", storage="cold_4").well(0)
    clone_plate = p.ref(expid("clone"), cont_type="96-pcr", storage="cold_4", discard=False)
else:
    clone_plate = p.ref("sfgfp_puc19_gibson_v3_clone", id=inv["sfgfp_puc19_gibson_v3_clone"],
                        cont_type="96-pcr", storage="cold_4", discard=False)

IPTG_tube = p.ref("IPTG", id=inv["IPTG"], cont_type="micro-1.5", storage="cold_20").well(0)
if "XGal" in options: XGal_tube = p.ref("XGal", id=inv["XGal"], cont_type="micro-1.5", storage="cold_20").well(0)

# Initialize inventory
if "gibson" in options:
    all_inventory_wells = [puc19_cut_tube, sfgfp_pcroe_amp_tube, IPTG_tube]
    assert puc19_cut_tube.volume == µl(66), puc19_cut_tube.volume
    assert sfgfp_pcroe_amp_tube.volume == µl(36), sfgfp_pcroe_amp_tube.volume
else:
    all_inventory_wells = [IPTG_tube, clone_plate.well(0)]

if "XGal" in options: all_inventory_wells.append(XGal_tube)

for well in all_inventory_wells:
    print("Inventory: {} {} {}".format(well.name, well.volume, well.properties))

# Provisioning. Water is used all over the protocol. Provision an excess since it's cheap
p.provision(inv["water"], water_tube, µl(500))

# -----------------------------------------------------------------------------
# Cloning/assembly (see NEBuilder protocol above)
# "Optimized efficiency is 50–100 ng of vectors with 2 fold excess of inserts."
# pUC19 is 20ng/ul (78ul total).
# sfGFP is ~40ng/ul (48ul total)
# Therefore 4ul of each gives 80ng and 160ng of vector and insert respectively

def do_gibson_assembly():
    # Combine all the Gibson reagents in one tube and thermocycle
    p.provision(inv["Gibson Mix"],   clone_plate.well(0), µl(10))
    p.transfer(water_tube,           clone_plate.well(0), µl(2))
    p.transfer(puc19_cut_tube,       clone_plate.well(0), µl(4))
    p.transfer(sfgfp_pcroe_amp_tube, clone_plate.well(0), µl(4),
               mix_after=True, mix_vol=µl(10), repetitions=10)

    p.thermocycle(clone_plate,
                  [{"cycles":  1, "steps": [{"temperature": "50:celsius", "duration": "16:minute"}]}])

    # Dilute assembled plasmid 4X according to the NEB Gibson assembly protocol (20ul->80ul)
    p.transfer(water_tube, clone_plate.well(0), µl(60), mix_after=True, mix_vol=µl(40), repetitions=5)

# --------------------------------------------------------------------------------------------------
# Transformation
# "Transform NEB 5-alpha Competent E. coli cells with 2 μl of the
#  assembled product, following the appropriate transformation protocol."
# Mix & Go
# "[After mixing] Immediately place on ice and incubate for 2-5 minutes"
# "The highest transformation efficiencies can be obtained by incubating Mix & Go cells with DNA on
#  ice for 2-5 minutes (60 minutes maximum) prior to plating."
# "It is recommended that culture plates be pre-warmed to >20°C (preferably 37°C) prior to plating."
# "Avoid exposing the cells to room temperature for more than a few seconds at a time."
# "If competent cells are purchased from other manufacture, dilute assembled products 4-fold
#  with H2O prior transformation. This can be achieved by mixing 5 μl of assembled products with
#  15 μl of H2O. Add 2 μl of the diluted assembled product to competent cells."

def _do_transformation():
    # Combine plasmid and competent bacteria in a pcr_plate and shock
    p.provision(inv["DH5a"], transform_tube,  µl(50))
    p.transfer(clone_plate.well(0), transform_tube, µl(3), dispense_speed="10:microliter/second")
    assert clone_plate.well(0).volume == µl(54), clone_plate.well(0).volume

    if 'control_pUC19' in options:
        p.provision(inv["DH5a"], transctrl_tube,  µl(50))
        p.provision(inv["pUC19"], transctrl_tube, µl(1))

    # Heatshock the bacteria to transform using a PCR machine
    p.thermocycle(transform_plate,
        [{"cycles":  1, "steps": [{"temperature":  "4:celsius", "duration": "5:minute"}]},
         {"cycles":  1, "steps": [{"temperature": "37:celsius", "duration": "30:minute"}]}])

def _transfer_transformed_to_plates():
    assert transform_tube.volume == µl(53), transform_tube.volume

    num_ab_plates = 4 # antibiotic plates

    # Transfer bacteria to a bigger tube for diluting
    # Then spread onto 6-flat plates
    # Generally you would spread 50-100ul of diluted bacteria
    # Put more on ampicillin plates for more opportunities to get a colony
    # I use a dilution series since it's unclear how much to plate
    p.provision(inv["LB Miller"], transform_tube_L, µl(429))

    # Add all IPTG and XGal to the master tube
    # 4ul (1M) IPTG on each plate; 40ul XGal on each plate
    p.transfer(IPTG_tube, transform_tube_L, µl(4*num_ab_plates))
    if 'XGal' in options:
        p.transfer(XGal_tube, transform_tube_L, µl(40*num_ab_plates))

    # Add the transformed cells and mix (use new mix op in case of different pipette)
    p.transfer(transform_tube, transform_tube_L, µl(50))
    p.mix(transform_tube_L, volume=transform_tube_L.volume/2, repetitions=10)

    assert transform_tube.volume == dead_volume['96-pcr'] == µl(3), transform_tube.volume
    assert transform_tube_L.volume == µl(495), transform_tube_L.volume

    # Spread an average of 60ul on each plate == 480ul total
    for i in range(num_ab_plates):
        p.spread(transform_tube_L, amp_6_flat.well(i), µl(51+i*6))
        p.spread(transform_tube_L, noAB_6_flat.well(i), µl(51+i*6))

    assert transform_tube_L.volume == dead_volume["micro-1.5"], transform_tube_L.volume

    # Controls: include 2 ordinary pUC19-transformed plates as a control
    if 'control_pUC19' in options:
        num_ctrl = 2
        assert num_ab_plates + num_ctrl <= 6

        p.provision(inv["LB Miller"], transctrl_tube_L, µl(184)+dead_volume["micro-1.5"])
        p.transfer(IPTG_tube,         transctrl_tube_L, µl(4*num_ctrl))
        if "XGal" in options: p.transfer(XGal_tube, transctrl_tube_L, µl(40*num_ctrl))
        p.transfer(transctrl_tube,    transctrl_tube_L, µl(48))
        p.mix(transctrl_tube_L, volume=transctrl_tube_L.volume/2, repetitions=10)

        for i in range(num_ctrl):
            p.spread(transctrl_tube_L, amp_6_flat.well(num_ab_plates+i), µl(55+i*10))
            p.spread(transctrl_tube_L, noAB_6_flat.well(num_ab_plates+i), µl(55+i*10))

        assert transctrl_tube_L.volume == dead_volume["micro-1.5"], transctrl_tube_L.volume
        assert IPTG_tube.volume == µl(808), IPTG_tube.volume
        if "XGal" in options: assert XGal_tube.volume == µl(516), XGal_tube.volume


def do_transformation():
    _do_transformation()
    _transfer_transformed_to_plates()

# ------------------------------------------------------
# Measure growth in plates (photograph)

def measure_growth():
    # Incubate and photograph 6-flat plates over 18 hours
    # to see blue or white colonies
    for flat_name, flat in [(expid("amp_6_flat"), amp_6_flat), (expid("noAB_6_flat"), noAB_6_flat)]:
        for timepoint in [9,18]:
            p.incubate(flat, "warm_37", "9:hour")
            p.image_plate(flat, mode="top", dataref=expid("{}_t{}".format(flat_name, timepoint)))

# ---------------------------------------------------------------
# Sanger sequencing, TURNED OFF
# Sequence to make sure assembly worked
# 500ng plasmid, 1 µl of a 10 µM stock primer
# "M13_F"       : "rs17tcpqwqcaxe", # catalog; M13 Forward (-41); cold_20 (1ul = 100pmol)
# "M13_R"       : "rs17tcph6e2qzh", # catalog; M13 Reverse (-48); cold_20 (1ul = 100pmol)
def do_sanger_seq():
    seq_primers = [inv["M13_F"], inv["M13_R"]]
    seq_wells = ["G1","G2"]
    for primer_num, seq_well in [(0, seq_wells[0]),(1, seq_wells[1])]:
        p.provision(seq_primers[primer_num], pcr_plate.wells([seq_well]), µl(1))

    p.transfer(pcr_plate.wells(["A1"]), pcr_plate.wells(seq_wells),  µl(5), mix_before=True, mix_vol=µl(10))
    p.transfer(water_tube, pcr_plate.wells(seq_wells), µl(9))

    p.mix(pcr_plate.wells(seq_wells), volume=µl(7.5), repetitions=10)
    p.sangerseq(pcr_plate, pcr_plate.wells(seq_wells[0]).indices(), expid("seq1"))
    p.sangerseq(pcr_plate, pcr_plate.wells(seq_wells[1]).indices(), expid("seq2"))

# ---------------------------------------------------------------
# Generate protocol

# Skip Gibson since I already did it
if 'gibson' in options: do_gibson_assembly()
do_transformation()
measure_growth()
if 'sanger' in options: do_sanger_seq()

# ---------------------------------------------------------------
# Output protocol
jprotocol = json.dumps(p.as_dict(), indent=2)
!echo '{jprotocol}' | transcriptic analyze

#print("\nProtocol {}\n\n{}".format(experiment_name, jprotocol))
Inventory: IPTG/IPTG/IPTG/IPTG/IPTG/IPTG 832.0:microliter {}
Inventory: sfgfp_puc19_gibson_v3_clone/sfgfp_puc19_gibson_v3_clone/sfgfp_puc19_gibson_v3_clone/sfgfp_puc19_gibson_v3_clone/sfgfp_puc19_gibson_v3_clone 57.0:microliter {}

✓ Protocol analyzed
  40 instructions
  8 containers
  Total Cost: $53.20
  Workcell Time: $17.35
  Reagents & Consumables: $35.86
Colony picking

Once the colonies are growing on an ampicillin plate, I can "pick" individual colonies and inoculate wells in a 96-well plate with those colonies. There is an autoprotocol colony-picking command (autopick) for this purpose.

"""Pick colonies from plates and grow in amp media and check for fluorescence.
v2: try again with a new plate (no blue colonies)
v3: repeat with different emission and excitation wavelengths"""

p = Protocol()

options = {}
for k, v in list(options.items()):
    if v is False: del options[k]

experiment_name = "sfgfp_puc19_gibson_pick_v3"

def plate_expid(val):
    """refer to the previous plating experiment's outputs"""
    plate_exp = "sfgfp_puc19_gibson_plates_v4"
    return "{}_{}".format(plate_exp, val)

# -----------------------------------------------------------------------
# Inventory
inv = {
    # catalog
    "water"         : "rs17gmh5wafm5p", # catalog; Autoclaved MilliQ H2O; ambient
    "LB Miller"     : "rs17bafcbmyrmh", # catalog; LB Broth Miller; cold_4
    "Amp 100mgml"   : "rs17msfk8ujkca", # catalog; Ampicillin 100mg/ml; cold_20
    "IPTG"          : "ct18a2r5wn6tqz", # inventory; IPTG at 1M (conc semi-documented); cold_20
    # plates from previous experiment, must be changed every new experiment
    plate_expid("amp_6_flat")  : "ct18snmr9avvg9", # inventory; Ampicillin plates with blue-white screening of pUC19
    plate_expid("noAB_6_flat") : "ct18snmr9dxfw2", # inventory; no AB plates with blue-white screening of pUC19
}

# Tubes and plates
lb_amp_tubes = [p.ref("lb_amp_{}".format(i+1), cont_type="micro-2.0", storage="ambient", discard=True).well(0)
                for i in range(4)]
lb_xab_tube  = p.ref("lb_xab", cont_type="micro-2.0", storage="ambient", discard=True).well(0)
growth_plate = p.ref(expid("growth"), cont_type="96-flat",   storage="cold_4",  discard=False)

# My inventory
IPTG_tube = p.ref("IPTG", id=inv["IPTG"], cont_type="micro-1.5", storage="cold_20").well(0)
# ampicillin plate
amp_6_flat = Container(None, p.container_type('6-flat'))
p.refs[plate_expid("amp_6_flat")] = Ref(plate_expid("amp_6_flat"),
                                        {"id":inv[plate_expid("amp_6_flat")], "store": {"where": 'cold_4'}}, amp_6_flat)

# Use a total of 50 wells
abs_wells = ["{}{}".format(row,col) for row in "BCDEF" for col in range(1,11)]
abs_wells_T = ["{}{}".format(row,col) for col in range(1,11) for row in "BCDEF"]
assert abs_wells[:3] == ["B1","B2","B3"] and abs_wells_T[:3] == ["B1","C1","D1"]

def prepare_growth_wells():
    # To LB, add ampicillin at ~1/1000 concentration
    # Mix slowly in case of overflow
    p.provision(inv["LB Miller"], lb_xab_tube, µl(1913))
    for lb_amp_tube in lb_amp_tubes:
        p.provision(inv["Amp 100mgml"], lb_amp_tube, µl(2))
        p.provision(inv["LB Miller"],   lb_amp_tube, µl(1911))
        p.mix(lb_amp_tube, volume=µl(800), repetitions=10)

    # Add IPTG but save on X-Gal
    # "If you are concerned about obtaining maximal levels of expression, add IPTG to a final concentration of 1 mM."
    # 2ul of IPTG in 2000ul equals 1mM
    p.transfer(IPTG_tube, [lb_xab_tube] + lb_amp_tubes, µl(2), one_tip=True)

    # Distribute LB among wells, row D is control (no ampicillin)
    cols = range(1,11)
    row = "D" # control, no AB
    cwells = ["{}{}".format(row,col) for col in cols]
    assert set(cwells).issubset(set(abs_wells))
    p.distribute(lb_xab_tube,  growth_plate.wells(cwells), µl(190), allow_carryover=True)

    rows = "BCEF"
    for row, lb_amp_tube in zip(rows, lb_amp_tubes):
        cwells = ["{}{}".format(row,col) for col in cols]
        assert set(cwells).issubset(set(abs_wells))
        p.distribute(lb_amp_tube, growth_plate.wells(cwells), µl(190), allow_carryover=True)

    assert all(lb_amp_tube.volume == lb_xab_tube.volume == dead_volume['micro-2.0']
               for lb_amp_tube in lb_amp_tubes)

def measure_growth_wells():
    # Growth: absorbance and fluorescence over 24 hours
    # Absorbance at 600nm: cell growth
    # Absorbance at 615nm: X-gal, in theory
    # Fluorescence at 485nm/510nm: sfGFP
    # or 450nm/508nm
    hr = 4
    for t in range(0,24,hr):
        if t > 0:
            p.incubate(growth_plate, "warm_37", "{}:hour".format(hr), shaking=True)

        p.fluorescence(growth_plate, growth_plate.wells(abs_wells).indices(),
                       excitation="485:nanometer", emission="535:nanometer",
                       dataref=expid("fl2_{}".format(t)), flashes=25)
        p.fluorescence(growth_plate, growth_plate.wells(abs_wells).indices(),
                       excitation="450:nanometer", emission="508:nanometer",
                       dataref=expid("fl1_{}".format(t)), flashes=25)
        p.fluorescence(growth_plate, growth_plate.wells(abs_wells).indices(),
                       excitation="395:nanometer", emission="508:nanometer",
                       dataref=expid("fl0_{}".format(t)), flashes=25)
        p.absorbance(growth_plate, growth_plate.wells(abs_wells).indices(),
                     wavelength="600:nanometer",
                     dataref=expid("abs_{}".format(t)), flashes=25)

# ---------------------------------------------------------------
# Protocol steps
prepare_growth_wells()
batch = 10
for i in range(5):
    p.autopick(amp_6_flat.well(i), growth_plate.wells(abs_wells_T[i*batch:i*batch+batch]))
    p.image_plate(amp_6_flat, mode="top", dataref=expid("autopicked_{}".format(i)))
measure_growth_wells()

# ---------------------------------------------------------------
# Output protocol
jprotocol = json.dumps(p.as_dict(), indent=2)
!echo '{jprotocol}' | transcriptic analyze
✓ Protocol analyzed
  62 instructions
  8 containers
  Total Cost: $66.38
  Workcell Time: $57.59
  Reagents & Consumables: $8.78
Results: Colony picking

The blue–white screening worked beautifully, with mostly white colonies on the antibiotic plates (1-4) and blue only on the non-antibiotic plates (5-6). This is exactly what I expect, and I was relieved to see it, especially since I was using my own IPTG and X-gal that I shipped to Transcriptic.

Blue–white screening plates with ampicillin (1-4) and no antibiotic (5-6)

However, the colony-picking robot did not work well with these blue and white colonies. The image below was generated by subtracting successive plate photographs after each round of plate picking and increasing the contrast of the differences (using GraphicsMagick). This way, I can visualize which colonies were picked (albeit imperfectly since picked colonies are not completely removed).
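
The same differencing idea can be sketched in Python instead of GraphicsMagick. This is only a minimal illustration, assuming each photograph has already been loaded as a grayscale NumPy array of the same shape. The function name `highlight_picked` and the toy arrays are hypothetical, and this is not the command used to make the figure below.

```python
import numpy as np

def highlight_picked(before, after):
    """Return a contrast-stretched absolute difference of two grayscale images.

    Both inputs are 2-D arrays of equal shape. Pixels that changed between
    the two photographs (e.g. where a colony was picked) come out bright.
    """
    diff = np.abs(after.astype(float) - before.astype(float))
    span = diff.max()
    if span == 0:  # identical images: nothing changed
        return np.zeros(diff.shape, dtype=np.uint8)
    # Stretch the differences so the largest change maps to 255
    return np.round(255 * diff / span).astype(np.uint8)

# Toy example: a 4x4 "plate" where one pixel dims after a colony is picked
before = np.full((4, 4), 200, dtype=np.uint8)
after = before.copy()
after[1, 2] = 170
picked = highlight_picked(before, after)
```

Real plate photographs would also need to be aligned before subtracting, since even a small shift between images shows up as spurious differences.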

I also annotate the image with the number of colonies Transcriptic reported as picked. The robot is supposed to pick up to 10 colonies from each of the first five plates. However, few colonies were picked overall, and those that were picked often look blue. The robot only managed to find ten colonies on a control plate with only blue colonies. My working theory is that the colony-picking robot preferentially selected blue colonies since those have the highest contrast.

Blue–white screening plates with ampicillin (1-4) and no antibiotic (5-6), annotated with number of colonies picked

Blue–white screening did serve a purpose in that it showed me that most of the colonies were being correctly transformed, or at least that an insertion was happening. However, to get better colony picking, I repeat the experiment without X-gal.

Given only white colonies to pick, the colony-picking robot successfully picked 10 colonies from each of the first five plates. I have to assume most of the picked colonies have successful insertions.

Colonies growing on ampicillin plates (1-4) and no antibiotic plates (5-6)

Results: Transformation with assembled product

After growing 50 picked colonies in a 96-well plate for 20 hours, I measure fluorescence to see if sfGFP is being expressed. Transcriptic uses a Tecan Infinite plate-reader to measure fluorescence and absorbance (and luminescence if you want that).

In theory, any well that has growth has an assembled plasmid, since it needs antibiotic resistance to grow, and every assembled plasmid is expressing sfGFP. In reality, there are many reasons why that might not happen, not least of which is that you can lose the sfGFP gene from the plasmid without losing ampicillin resistance. A bacterium that loses the sfGFP gene has a selection advantage over its competitors because it no longer spends energy producing the protein, so given enough generations of growth this is certain to happen.

I collect absorbance (OD600) and fluorescence data every four hours for 20 hours (~60 generations).

for t in [0,4,8,12,16,20]:
    abs_data = pd.read_csv("glow/sfgfp_puc19_gibson_pick_v3_abs_{}.csv".format(t), index_col="Well")
    flr_data = pd.read_csv("glow/sfgfp_puc19_gibson_pick_v3_fl2_{}.csv".format(t), index_col="Well")

    if t == 0:
        new_data = abs_data.join(flr_data)
    else:
        new_data = new_data.join(abs_data, rsuffix='_{}'.format(t))
        new_data = new_data.join(flr_data, rsuffix='_{}'.format(t))
new_data.columns = ["OD 600:nanometer_0", "Fluorescence_0"] + list(new_data.columns[2:])

I plot the data at hour 20, and a contrail of previous timepoints. I only really care about the data at hour 20 since that's approximately when fluorescence should peak.

svg = []
W, H = 800, 500
min_x, max_x = 0, 0.8
min_y, max_y = 0, 50000

def _toxy(x, y):
    return W*(x-min_x)/(max_x-min_x), H-H*(y-min_y)/(max_y-min_y)
def _topt(x, y):
    return ','.join(map(str,_toxy(x,y)))

ab_fls = [[row[0]] + [list(row[1])] for row in new_data.iterrows()]
# axes
svg.append('<g fill="#888" font-size="18" transform="translate(20,0),scale(.95)">')
svg.append('<text x="0" y="{}">OD600 →</text>'.format(H+20))
svg.append('<text x="0" y="0" transform="rotate(-90),translate(-{},-8)">Fluorescence →</text>'.format(H))
svg.append('<line x1="0" y1="{}" x2="{}" y2="{}" style="stroke:#888;stroke-width:2" />'.format(H,W,H))
svg.append('<line x1="0" y1="0" x2="0" y2="{}" style="stroke:#888;stroke-width:2" />'.format(H))

# glow filter
svg.append("""<filter id="glow" x="-200%" y="-200%" height="400%" width="400%">
<feColorMatrix type="matrix" values="0 0 0 0   0 255 0 0 0   0 0 0 0 0   0 0 0 0 1 0"/>
<feGaussianBlur stdDeviation="10" result="coloredBlur"/>
<feMerge><feMergeNode in="coloredBlur"/><feMergeNode in="SourceGraphic"/></feMerge>
</filter>""")

for n, (well, vals) in enumerate(ab_fls):
    fill = "#444" if not well.startswith("D") else "#aaa"
    gfilter = 'filter="url(#glow)"' if well in ["C3", "D1", "D3"] else ""
    cx, cy = _toxy(*vals[-2:])
    svg.append('''<g id="point{n:d}"><circle {gfilter:s} r="12" cx="{cx:f}" cy="{cy:f}" fill="{fill:s}" />
                  <text x="{cx:f}" y="{cy:f}" font-size="10" text-anchor="middle" fill="#fff">{txt:s}</text></g>
               '''.format(n=n, cx=cx, cy=cy, fill=fill, txt=well, gfilter=gfilter))

    pathd = 'M{} '.format(_topt(*vals[:2]))
    pathd += ' '.join("L{}".format(_topt(*vals[i:i+2])) for i in range(2,len(vals),2))
    svg.append('''<path d="{pathd:}" stroke="#ccc" stroke-width=".2"
                   fill="none" id="path{n:d}"/>'''.format(pathd=pathd, n=n))

svg.append("</g>") # entire chart group
show_svg(''.join(svg), w=W, h=H)

Fluorescence vs OD600: wells with ampicillin are black, control wells with no ampicillin are grey. A green glow is applied to wells with plasmids where I have validated the sfGFP protein sequence is correct.

I run a miniprep to extract the plasmid DNA, then Sanger sequence using M13 primers. Unfortunately, for some reason, minipreps are currently only available via Transcriptic's web-based protocol launcher and not through autoprotocol. I sequence the three wells with the highest fluorescence readings (C1, D1, D3), and three others (B1, B3, E1) and align the (forward and reverse) sequences against sfGFP with muscle.

In wells C1, D1, and D3 there is a perfect match to my original sfGFP sequence, while in wells B1, B3, and E1, there are gross mutations or the alignment just fails.

Three glowing colonies

The results are good, though some aspects are surprising. For example, the fluorescence reader starts out at a very high reading at timepoint 0 (40,000 units), for no apparent reason. By hour 20, it has settled down to a more reasonable pattern, with a clear basal correlation between OD600 and fluorescence (I assume because of a minor overlap in spectra), plus some outliers with high fluorescence. Eyeballing, it looks like it could be one, three or perhaps 11-15 outliers.
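
Rather than eyeballing, one way to put a number on the outliers is to fit a baseline of fluorescence against OD600 and flag the wells sitting far above it. The sketch below is only an illustration under assumptions that do not come from the experiment: the baseline is taken to be linear, it is fitted with a Theil-Sen estimator (median of pairwise slopes) so that the outliers do not drag the line upward, and the cutoff of five median absolute deviations is arbitrary. The function names and the toy readings are made up.

```python
from itertools import combinations
from statistics import median

def theil_sen(xs, ys):
    """Robust line fit: median pairwise slope, then median intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept

def high_outliers(xs, ys, n_mads=5.0):
    """Indices of points lying more than n_mads MADs above the fitted line."""
    slope, intercept = theil_sen(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # The median residual is ~0 by construction, so the MAD is taken about 0
    mad = median(abs(r) for r in residuals)
    return [i for i, r in enumerate(residuals) if r > n_mads * mad]

# Toy readings: seven wells near a linear baseline, one well far above it
od    = [0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.45, 0.55]
fluor = [5050, 6970, 9020, 10960, 13010, 14980, 25000, 12000]
flagged = high_outliers(od, fluor)  # index 6 is the only well flagged
```

With the real data, `xs` and `ys` would be the hour-20 OD600 and fluorescence columns of `new_data`. The number of wells flagged will depend heavily on the cutoff, which is exactly the ambiguity between one, three, or a dozen outliers.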

Some of the wells showing high fluorescence readings are in control wells (i.e., no ampicillin, colored grey), which is surprising since in these wells there is no selection pressure so I expect the plasmid to be lost.

Based on the fluorescence data and sequencing results, it appears that only three out of 50 colonies produce sfGFP and fluoresce. That's not nearly as many as I expected. However, because there were three separate growth stages (on the plate, in the growth well, for miniprep), the cells have undergone about 200 generations of growth by this stage, so there were quite a lot of opportunities for mutations to occur.
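
The generation counts here are back-of-the-envelope estimates. A minimal sketch of the arithmetic, assuming a constant 20-minute doubling time (an assumption on my part, and an optimistic one, since cells stop dividing this fast once they leave exponential phase); the function names are hypothetical:

```python
def generations(hours, doubling_minutes=20.0):
    """Upper-bound number of doublings in `hours` of exponential growth."""
    return hours * 60.0 / doubling_minutes

def hours_for_generations(n_generations, doubling_minutes=20.0):
    """Hours of continuous exponential growth needed for n doublings."""
    return n_generations * doubling_minutes / 60.0

growth_well = generations(20)              # the 20-hour growth stage
total_hours = hours_for_generations(200)   # time implied by 200 generations
```

Under this assumption, the 20-hour growth stage gives 60 generations, and 200 generations corresponds to roughly 67 hours of uninterrupted growth across all three stages.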

There must be ways to make this process more efficient, especially since I am far from an expert on these protocols. Nevertheless, we have successfully produced transformed cells expressing an engineered GFP using only Python code!

Part Three: Conclusions


Depending on how you measure it, the cost of this experiment was around $360, not including the money I spent on debugging:

  • $70 to synthesize the DNA
  • $32 to PCR and add flanks to the insert
  • $31 to cut the plasmid
  • $32 for Gibson assembly
  • $53 for transformation
  • $67 for colony picking
  • $75 for 3 minipreps and sequencing
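
As a quick check on that total, here is the same list tallied in Python. The dictionary keys are just my shorthand labels for the steps above.

```python
# Per-step costs in US dollars, copied from the list above
step_costs = {
    "DNA synthesis": 70,
    "PCR and flanks": 32,
    "Plasmid digest": 31,
    "Gibson assembly": 32,
    "Transformation": 53,
    "Colony picking": 67,
    "Minipreps and sequencing": 75,
}

total = sum(step_costs.values())

# Print each step with its share of the total, most expensive first
for step, cost in sorted(step_costs.items(), key=lambda kv: -kv[1]):
    print("{:<26} ${:>3}  {:>4.0%}".format(step, cost, cost / total))
print("{:<26} ${:>3}".format("Total", total))
```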

I think the cost could probably be brought down to $250-300 with some tweaks. For example, getting a robot to pick 50 colonies is surprisingly expensive, and probably overkill.

In my experience, this price seems expensive to some (molecular biologists) and cheap to others (computational people). Since Transcriptic basically just charges for reagents at list price, the main cost difference is in labor. A robot is already pretty cheap per hour, and doesn't mind getting up in the middle of the night to take a photograph of a plate. Once the protocols are nailed down, it's hard to imagine that even a grad student will be cheaper, especially if you factor in opportunity costs.

To be clear, I am only talking about replacing routine protocols — cutting-edge protocol development will still be done by skilled molecular biologists — but a lot of exciting science uses only boring protocols. Until recently, many labs manufactured their own oligos, but now few would bother — it's just not worth anyone's time, even grad students, when IDT will ship them to you within a couple of days.

Robot labs: pros and cons

Obviously, I'm a big believer in robotic labs. There are some really fun and useful things about doing experiments with robots, especially if you're primarily a computational scientist and are allergic to latex gloves and manual labor:

  • Reproducibility! This is probably the biggest advantage. It includes the consistency of robots and the ability to publish your protocol in autoprotocol format, instead of awkward English prose (and the passive voice is not even minded by me...)
  • Scalability: You can repeat my experiment 100 times with different parameters, without too much marginal work.
  • Arbitrarily complicated protocols, for example PCR touchdown. This might seem minor or even counterproductive, but if a protocol is going to be run hundreds or thousands of times by different labs, why not optimize the protocol to a fraction of a degree? Or even use statistics / machine learning to improve the protocol over time? It drives me crazy to see a protocol that might be used tens of thousands of times recommend performing an operation for 2-3 minutes. Which is it?
  • Fine-tuning: You can repeat experiments after changing just one minor detail. It's really hard to ceteris paribus as a human.
  • Virtuality: Run experiments or monitor results while away from the lab, like in Vienna.
  • Expressiveness: You can use programming syntax to encode repetitive steps or branching logic. For example, dispensing x μl of reagent and (96-x) μl of water into each well of a 96-well plate, with x running from 1 to 96, can be written concisely.
  • Machine-readable data: Results data is almost always returned as csv, or something else you can compute on.
  • Abstraction: Ideally, you could run the entire protocol above while remaining agnostic to the reagents or style of cloning used, and drop in a replacement protocol if it worked better.
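
To make the expressiveness point concrete, here is a minimal sketch of the 96-well titration example in plain Python. It only builds the plan of volumes per well; the helper names `well_names` and `titration_plan` are hypothetical, and nothing here calls autoprotocol.

```python
import string

def well_names(rows=8, cols=12):
    """Row-major well names for a 96-well plate: A1, A2, ..., H12."""
    return ["{}{}".format(row, col)
            for row in string.ascii_uppercase[:rows]
            for col in range(1, cols + 1)]

def titration_plan(total_ul=96):
    """Map each well to (reagent_ul, water_ul), with reagent rising from 1 to 96."""
    return {well: (x, total_ul - x)
            for x, well in enumerate(well_names(), start=1)}

plan = titration_plan()
# plan["A1"] is (1, 95) and plan["H12"] is (96, 0)
```

In a real protocol, a loop over `plan.items()` would issue one transfer of reagent and one of water per well, in the same style as the `p.transfer` calls earlier, skipping the zero-volume water transfer for the last well.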

There are some catches too of course, especially since it's very early in the evolution of these tools. If it were the internet it would be around 1994:

  • Transporting samples back and forth to Transcriptic is a chore. I'm not sure how to solve this, though the more you can do at the cloud lab the less you need to transport. That is partially why synthetic biology is a good fit for cloud labs over, say, diagnostics with human samples.
  • Debugging protocols remotely is difficult and can be expensive — especially differentiating between your bugs and Transcriptic's bugs.
  • There are lots of experiments you just can't do yet. At the time of writing, Transcriptic only supports bacterial experiments (no yeast, no mammalian cells, though these are coming).
  • For many labs it may be more expensive to use a cloud lab than just getting a grad student (marginal cost per hour: ~$0) to do the work. This depends on how much the lab needs the grad student's hands compared to their brain.
  • Transcriptic doesn't run experiments on the weekend yet. Understandable, but it can be inconvenient, even when your project is not so time-sensitive.

Software is eating protein

Even though there's quite a lot of code here and quite a lot of debugging, I think it's feasible to produce some software that takes as input a protein sequence and as output creates bacteria that express that protein.

To make that work, a few things need to happen:

  • True integration of Twist/IDT/Gen9 with Transcriptic (this will probably be slow because of low demand currently).
  • Very robust versions of the protocols I have outlined above, to account for differences in protein sequence composition, length, secondary structure, etc.
  • Replacing various custom tools (NEB's Gibson protocol generator, IDT's codon optimizer) with open-source equivalents (e.g., primer3).
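
The very first step of such a tool, turning a protein sequence into DNA, is easy to sketch. The example below is a naive back-translation that uses a single fixed codon per amino acid. The codon picks are my own illustrative choices, and this is not how IDT's codon optimizer works: a real optimizer also weighs codon usage in the host, GC content, repeats, and restriction sites. The function name `back_translate` is hypothetical.

```python
# One codon per amino acid (standard genetic code). Illustrative picks only.
CODON_FOR = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}
STOP_CODON = "TAA"

def back_translate(protein, add_stop=True):
    """Return a DNA sequence encoding `protein`, one fixed codon per residue."""
    protein = protein.upper()
    unknown = sorted(set(protein) - set(CODON_FOR))
    if unknown:
        raise ValueError("Unsupported residues: {}".format(", ".join(unknown)))
    dna = "".join(CODON_FOR[aa] for aa in protein)
    return dna + STOP_CODON if add_stop else dna

# A short example peptide: 10 residues plus a stop codon
dna = back_translate("MSKGEELFTG")
```

Everything after this step (flanking sequences for Gibson assembly, primer design, synthesis constraints) is where the robust, open-source tooling would be needed.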

For many applications, you also want to purify your protein (using a tag and a column), or perhaps just get the bacteria to secrete it. Let's assume that we can soon do this in a cloud lab too, or that we can do experiments in vivo (i.e., within the bacterial cell).

There are also lots of opportunities to make the protocol actually work better than a human-run version, for example: design of promoters and RBSs to optimize expression specific to your sequence; statistics on the probability of success of the experiment based on comparable experiments; automated analysis of gels.

Why bother with all this?

After all that, it might not be totally clear why you would want to engineer a protein like this. Here are some ideas:

  • Make a protein sensor to detect something dangerous/unhealthy/delicious like gluten.
  • Make a vaccine by identifying the peptides unique to a pathogen and thinking really hard.
  • Evaluate protein binding in vivo by using a split GFP or similar approaches.
  • Make some scFvs as a sensor for salmonella. scFvs are like mini-antibodies that usually fold in bacteria.
  • Make a BiTE to treat a specific cancer you just sequenced. (This could be trickier than it sounds).
  • Make a topical vaccine that can enter the body via hair follicles (I don't recommend trying this at home).
  • Mutagenize your protein 100 different ways and characterize the changes. Then scale it up to 1,000, or 10,000? Maybe characterize the mutations of GFP?
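
The mutagenesis idea in particular is mostly a bookkeeping problem, which is where code helps. A minimal sketch that enumerates every single amino-acid substitution of a sequence, labeled in the usual original-position-replacement style; the function name and the toy peptide are hypothetical:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def single_mutants(protein):
    """Yield (label, sequence) for every single amino-acid substitution.

    Labels use 1-based positions, e.g. "S2A" replaces the serine at
    position 2 with alanine.
    """
    for pos, original in enumerate(protein):
        for replacement in AMINO_ACIDS:
            if replacement != original:
                label = "{}{}{}".format(original, pos + 1, replacement)
                yield label, protein[:pos] + replacement + protein[pos + 1:]

# A 5-residue toy peptide gives 5 * 19 = 95 variants
variants = dict(single_mutants("MSKGE"))
```

The count grows as 19 times the sequence length, so a full single-substitution scan of a protein a few hundred residues long already runs to thousands of variants.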

For more ideas on what is possible you only have to look at the hundreds of iGEM projects that are already out there.

Finally, thanks to Ben Miles at Transcriptic for helping me finish this project.

