Over the past couple of months I've formally applied to or contacted several biobanks to inquire about access to data and samples.
Kaiser RPGEH biobank
Kaiser's biobank is large, at 200k samples, but only 20k are blood samples; the rest are saliva. Eventually, they plan to reach 500k, but I assume most of these will be saliva too. Kaiser has a well-structured application process designed for collaborations. The one downside is that it's pretty long and complex, as the flowchart below shows. I submitted a "pre-application", which is a three- or four-page form, including a few thousand words describing the project. I now have to wait for Kaiser to match me with a researcher on their side.
Vanderbilt BioVU
Vanderbilt's biobank is also large at 180k samples and, since it's tied to Vanderbilt's EMR, the phenotype data is rich. However, to do any research with BioVU, you need to first identify a collaborator at Vanderbilt. It seems like one of the major goals of the biobank is to generate collaborations. BioVU is part of the eMERGE network, a group of biobanks that include EMR data and genotype data.
Mount Sinai BioME
Mount Sinai has 30k samples, and like Vanderbilt, it's part of the eMERGE network. They have a portal called BioSERVE, but it is down at the time of writing. Like Vanderbilt, you really need a collaborator to help you navigate this one.
Estonian Biobank
The Estonian Biobank has 50k samples and the population is skewed quite old (perhaps 10% over 80). Unfortunately, only 15k of their samples are genotyped. The Estonians have a refreshingly straightforward application form and process for data access.
China Kadoorie Biobank
The China Kadoorie Biobank (CKB) is one of several international biobanks that are administered from Oxford. The biobank holds 500k samples, and they had BGI genotype 100k of them with a 384-SNP panel. The content of the panel is sadly unspecified. The CKB application process is pretty straightforward and they have a lot of projects underway. The recommended first step with CKB is an informal inquiry by email, so that's what I did.
UK Biobank
The UK Biobank is large at 500k samples, and ostensibly open to biotech collaborations. It's also possibly the best phenotyped, although I believe no EMR data is included. However, UK Biobank has yet to genotype its samples, and the oldest participants are only 69, an odd restriction on an otherwise amazing dataset and the major reason I have not applied here.
Other biobanks
There are a number of other biobanks out there, with varying degrees of obfuscation of what's actually in the bank and how to get access. I think this will improve over time, but meanwhile, I'd love to figure out which biobanks are really open and which are not.