An Automated Protein Synthesis Pipeline with Transcriptic and Snakemake

Brian Naughton // Mon 06 February 2017 // Filed under biotech // Tags biotech transcriptic snakemake synthetic biology

In a previous blog post, I described a pipeline for synthesizing arbitrary proteins on the transcriptic robotic lab platform using only Python code. The ultimate goal of that project was to be able to run a program that takes a protein sequence as input and "returns" a tube of bacteria expressing that protein. Here I'll describe some progress towards that goal.

pipeline diagram


The usual way to chain together different programs in bioinformatics is with a pipeline management system, for example, snakemake, nextflow, toil, WDL, and many many more. I've recently become a big fan of nextflow for computational pipelines, but its major advantages (e.g., containerization) don't help much here because so much of the work happens outside of the computer. For this project I've been using the slightly simpler snakemake, mainly for tracking which steps have been completed, and deciding which steps can be run in parallel based on their dependencies.
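To make the "which steps can be run in parallel" idea concrete, here is a minimal sketch using Python's standard `graphlib` module rather than snakemake itself. The protocol names come from the reports later in this post, but the dependency edges between them are assumptions for illustration, not taken from the actual Snakefile.

```python
from graphlib import TopologicalSorter

# Assumed dependencies: each protocol maps to the protocols it must wait for.
# The names appear in the reports below; the edges are illustrative guesses.
DEPENDS_ON = {
    "cut_plasmid": set(),
    "synthesize_primers": set(),
    "add_flanks": {"synthesize_primers"},
    "run_gibson_and_transform": {"cut_plasmid", "add_flanks"},
}


def parallel_batches(depends_on):
    """Group steps into batches; steps within one batch can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = parallel_batches(DEPENDS_ON)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

Under these assumed edges, the first batch holds the two protocols with no prerequisites, and each later batch holds whatever becomes runnable once the previous batch has finished.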

Each protocol has four associated steps in the pipeline:

  • generate protocol: create an autoprotocol file describing the protocol
  • submit protocol: submit the autoprotocol file to transcriptic
  • get results: download images, data, etc. from transcriptic
  • create report: create an HTML report from the downloaded data

snakemake pipeline for protein synthesis
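The four steps per protocol can be tracked in a make-like way: each step leaves a file behind, and the first missing file tells you what to run next. The sketch below is a simplified stand-in for what snakemake does (it ignores timestamps, for example), and the output file names are hypothetical since the post does not say how outputs are laid out on disk.

```python
from pathlib import Path

# The four steps, in order, each paired with a hypothetical output file.
STEPS = [
    ("generate_protocol", "autoprotocol.json"),
    ("submit_protocol", "submission.json"),
    ("get_results", "results.done"),
    ("create_report", "report.html"),
]


def next_step(protocol_dir):
    """Return the first step whose output file is missing, or None if all exist."""
    for step, filename in STEPS:
        if not (Path(protocol_dir) / filename).exists():
            return step
    return None
```

Calling `next_step("cut_plasmid")` on a directory that contains only `autoprotocol.json` would return `"submit_protocol"`.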


In my terminology, a "metaprotocol" defines the complete process, which is turned into a series of protocols. Ideally, the output of a single protocol will be a decision point: for example, whether or not a gel image includes the expected bands.
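A decision point like the gel check could, in principle, be reduced to a small predicate. This is only a sketch: it assumes band sizes have already been extracted from the gel image somehow, and the function name and the 10% tolerance are arbitrary choices for illustration.

```python
def bands_match(expected_bp, observed_bp, tolerance=0.1):
    """True if every expected band has an observed band within +/- tolerance.

    tolerance is fractional, so 0.1 accepts bands within 10% of the expected size.
    """
    return all(
        any(abs(observed - expected) <= tolerance * expected for observed in observed_bp)
        for expected in expected_bp
    )


# Hypothetical observed sizes, checked against an expected 2686bp band.
print(bands_match([2686], [2700]))
print(bands_match([2686], [786]))
```

The first call accepts a band 14bp away from the expected size; the second rejects a band that is far too small.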

The metaprotocol is defined in YAML, which has its issues, but is more readable than JSON and is well supported. This code depends heavily on Pydna, a Python package for cloning and assembly. Given an insert and a vector, Pydna will design primers and a PCR program. The following is my metaprotocol YAML for expressing GFP:

- meta:
    assembly: |-
      Sequences........................: [2690] [786]
      Sequences with shared homologies.: [2690] [786]
      Homology limit (bp)..............: 25
      Number of overlaps...............: 2
      Nodes in graph(incl. 5' & 3')....: 4
      Only terminal overlaps...........: No
      Circular products................: [3412]
      Linear products..................: [3446] [3442] [34] [30]
    assembly_figure: |2-
      |            \/
      |            /\
      |            31|786bp_PCR_prod|30
      |                              \/
      |                              /\
      |                              30-
      |                                 |
    metaprotocol_id: 1k9ginus
    pcr_figure: |2-
                                                                ||||||||||||||||||||||||||||||| tm 59.8 (dbd) 70.6
                                     |||||||||||||||||||||||| tm 62.1 (dbd) 69.3
    pcr_program: |2

      Pfu-Sso7d (rate 15s/kb)
      Two-step|    30 cycles |      |786bp
      98.0°C  |98.0C         |      |Tm formula: Pydna tmbresluc
      _____ __|_____         |      |SaltC 50mM
      00min30s|10s  \        |      |Primer1C 1.0µM
              |      \ 72.0°C|72.0°C|Primer2C 1.0µM
              |       \______|______|GC 49%
              |       0min11s|10min |4-12°C
    project_name: pUC19_sfGFP_cloning_v1
- linearize:
    restriction_enzyme: EcoRI
    vector: pUC19
- oligosynthesize:
- thermocycle:
      extension_time: 11.0
      forward_primer_concentration: 0.001
      rate: 15.0
      reverse_primer_concentration: 0.001
      saltc: 50.0
      ta: 72.0
- assemble:
    insert: sfGFP
    vector: pUC19
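Once parsed, a metaprotocol like this is a list of single-key mappings: one `meta` block followed by the protocol steps in order. The sketch below works on an abbreviated, hand-written version of that parsed structure (so it needs no YAML library), and `split_metaprotocol` is a hypothetical helper, not part of the pipeline.

```python
# Abbreviated parsed form of the metaprotocol above: a list of single-key
# mappings, as a YAML parser would return. An empty step parses to None.
metaprotocol = [
    {"meta": {"metaprotocol_id": "1k9ginus", "project_name": "pUC19_sfGFP_cloning_v1"}},
    {"linearize": {"restriction_enzyme": "EcoRI", "vector": "pUC19"}},
    {"oligosynthesize": None},
    {"thermocycle": {"extension_time": 11.0, "rate": 15.0, "ta": 72.0}},
    {"assemble": {"insert": "sfGFP", "vector": "pUC19"}},
]


def split_metaprotocol(entries):
    """Separate the 'meta' block from the ordered (name, params) protocol steps."""
    meta, steps = {}, []
    for entry in entries:
        ((name, params),) = entry.items()  # each entry has exactly one key
        if name == "meta":
            meta = params or {}
        else:
            steps.append((name, params or {}))
    return meta, steps


meta, steps = split_metaprotocol(metaprotocol)
step_names = [name for name, _ in steps]

# 786 bp at 15 s/kb is 11.79 s; the YAML records 11.0, which is consistent
# with rounding down (how Pydna actually rounds is an assumption here).
extension_s = int(786 / 1000 * 15.0)
```

Keeping the steps as an ordered list, rather than a mapping, preserves the order in which the protocols should run.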

DNA synthesis

Of course, before you can run this pipeline, you need to have the appropriate insert DNA in your transcriptic inventory. As far as I know, none of the major synthetic DNA suppliers has an API. However, you can order DNA from IDT by filling in an Excel file. I have automated filling in and emailing this file, so DNA synthesis can be included in the pipeline too! It should take about a week from ordering for DNA to appear at transcriptic.
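The emailing half of that automation can be sketched with Python's standard library. This is not the actual ordering code: the addresses, subject line, and file name are placeholders, and the sketch assumes the order form has already been filled in and read into bytes.

```python
from email.message import EmailMessage


def build_order_email(xlsx_bytes, filename, sender, recipient):
    """Build (but do not send) an email carrying a filled-in order form."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "DNA synthesis order"  # placeholder subject
    msg.set_content("Please find the completed order form attached.")
    msg.add_attachment(
        xlsx_bytes,
        maintype="application",
        subtype="vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        filename=filename,
    )
    return msg


# Placeholder bytes and example.com addresses, for illustration only.
msg = build_order_email(b"placeholder", "order.xlsx", "me@example.com", "orders@example.com")
```

The resulting message could then be handed to `smtplib.SMTP.send_message` with whatever mail server you use.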


After each protocol finishes, an HTML report is generated. This allows the user to evaluate protocol results manually before initiating the next step. There are ways to automate this more, like using automated band mapping of gel images, but I think that kind of thing will work better once the transcriptic API settles down a bit. The HTML report also serves as a log of the experiment.

cut_plasmid FINISHED

 Submitted at UTC 2016-08-20 19:42:48
   Started at UTC 2016-08-20 22:52:06
 Completed at UTC 2016-08-21 01:28:49
Ran report at UTC 2016-10-26 22:15:03

Expected DNA bands of size: 2686bp

synthesize_primers FINISHED

 Submitted at UTC 2016-08-23 19:35:37
   Started at UTC 2016-08-23 20:00:11
 Completed at UTC 2016-08-24 20:01:03
Ran report at UTC 2016-10-26 22:15:44

Synthesized primers:


add_flanks FINISHED

 Submitted at UTC 2016-10-08 15:12:38
   Started at UTC 2016-10-10 22:22:16
 Completed at UTC 2016-10-11 01:04:12
Ran report at UTC 2016-11-17 15:36:02

PCR program

Pfu-Sso7d (rate 15s/kb)
Two-step|    30 cycles |      |786bp
98.0°C  |98.0C         |      |Tm formula: Pydna tmbresluc
_____ __|_____         |      |SaltC 50mM
00min30s|10s  \        |      |Primer1C 1.0µM
        |      \ 72.0°C|72.0°C|Primer2C 1.0µM
        |       \______|______|GC 49%
        |       0min11s|10min |4-12°C

run_gibson_and_transform FINISHED

 Submitted at UTC 2016-10-24 23:10:06
   Started at UTC 2016-10-28 17:45:56
 Completed at UTC 2016-10-29 17:01:26
Ran report at UTC 2016-10-30 15:12:18
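
Reports like the ones above are simple enough to generate with a few lines of Python. The sketch below is an illustration rather than the pipeline's report code: `render_report` and `run_duration` are hypothetical helpers, and the HTML layout is an assumption. The timestamps are the cut_plasmid ones from above.

```python
from datetime import datetime
from html import escape


def render_report(name, status, timestamps):
    """Render a minimal HTML report; timestamps maps a label to a UTC time string."""
    rows = "\n".join(
        f"<tr><td>{escape(label)}</td><td>{escape(value)}</td></tr>"
        for label, value in timestamps.items()
    )
    return (
        f"<h2>{escape(name)} <strong>{escape(status)}</strong></h2>\n"
        f"<table>\n{rows}\n</table>\n"
    )


def run_duration(started, completed, fmt="%Y-%m-%d %H:%M:%S"):
    """Time between two timestamps, as a timedelta."""
    return datetime.strptime(completed, fmt) - datetime.strptime(started, fmt)


timestamps = {
    "Submitted at UTC": "2016-08-20 19:42:48",
    "Started at UTC": "2016-08-20 22:52:06",
    "Completed at UTC": "2016-08-21 01:28:49",
}
report_html = render_report("cut_plasmid", "FINISHED", timestamps)
duration = run_duration(timestamps["Started at UTC"], timestamps["Completed at UTC"])
print(duration)  # 2:36:43
```

Escaping every value with `html.escape` keeps protocol names or messages containing `<` or `&` from breaking the report.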