Data & Methods

Datasets

See here for a table listing the datasets associated with the flagship UK10K paper that are available in the EGA. These datasets correspond to the data analysed for the main UK10K paper.

See here for a table listing all UK10K datasets available in the EGA. Over the course of the UK10K project, data were released periodically. Releases were generally cumulative, in that samples were added between releases; however, some samples were dropped between releases when stricter QC measures were applied. There were also some follow-up studies that were not included in the main analysis. These include high-coverage WGS for 20 RARE samples and a large exome sequencing follow-up for UK10K_RARE_FIND with 1151 samples. Additionally, a handful of exome samples came in after the final freeze. Data for these are included in their own datasets.

Sites and allele frequencies

A VCF and a tab-delimited file, both containing sites and allele frequencies for the final UK10K COHORT datasets, are available on the Sanger FTP site. Allele frequencies for the UK10K exome studies are only available by obtaining access to the individual exome studies. See the data access page for information about requesting access.

The VCF is annotated with rsIDs from dbSNP138, and the following INFO fields:

  • DP: Raw read depth
  • VQSLOD: Variant Recalibration Score from GATK
  • AC: Allele count in called genotypes in UK10K
  • AN: Total number of alleles in called genotypes in UK10K
  • AF: Allele frequency in called genotypes in UK10K
  • AC_TWINSUK: Allele count in TWINSUK cohort
  • AN_TWINSUK: Total number of alleles in TWINSUK cohort
  • AF_TWINSUK: Allele frequency in TWINSUK cohort
  • AC_TWINSUK_NODUP: Allele count in TWINSUK cohort excluding 67 samples where a monozygotic or dizygotic twin was included in the release
  • AN_TWINSUK_NODUP: Total number of alleles in TWINSUK cohort excluding 67 samples where a monozygotic or dizygotic twin was included in the release
  • AF_TWINSUK_NODUP: Allele frequency in TWINSUK cohort excluding 67 samples where a monozygotic or dizygotic twin was included in the release
  • AC_ALSPAC: Allele count in called genotypes in ALSPAC cohort
  • AN_ALSPAC: Total number of alleles in called genotypes in ALSPAC cohort
  • AF_ALSPAC: Allele frequency in called genotypes in ALSPAC cohort
  • AF_AFR: 1000 Genomes Phase 1 Allele Frequency in African populations
  • AF_AMR: 1000 Genomes Phase 1 Allele Frequency in American populations
  • AF_ASN: 1000 Genomes Phase 1 Allele Frequency in Asian populations
  • AF_EUR: 1000 Genomes Phase 1 Allele Frequency in European populations
  • AF_MAX: 1000 Genomes Phase 1 Maximum Allele Frequency
  • ESP_MAF: Minor Allele Frequency for European American, African American and All populations in the NHLBI Exome Sequencing Project (ESP)
  • CSQ: Functional annotation from Ensembl 75 Variant Effect Predictor
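
As a rough illustration of how the INFO fields listed above might be read, the following minimal Python sketch splits an INFO column into key/value pairs and recomputes an allele frequency as AC divided by AN for a chosen cohort suffix. The record shown is made up for illustration and is not taken from the UK10K release. The helper names `parse_info` and `allele_frequencies` are hypothetical, and the sketch assumes AC holds one comma-separated count per alternate allele, as is conventional in VCF.

```python
def parse_info(info_field):
    """Split a VCF INFO column into a dict; flags without '=' map to True."""
    info = {}
    for entry in info_field.split(";"):
        if not entry or entry == ".":
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def allele_frequencies(info, suffix=""):
    """Return AC/AN per alternate allele for a cohort suffix such as '_TWINSUK'.

    Assumes AC is a comma-separated list with one count per ALT allele.
    """
    an = int(info["AN" + suffix])
    counts = [int(c) for c in info["AC" + suffix].split(",")]
    return [c / an if an else None for c in counts]


# Made-up record for illustration only (values are not from the UK10K files).
line = "1\t12345\t.\tA\tG\t100\tPASS\tDP=5000;AC=10;AN=7000;AC_TWINSUK=4;AN_TWINSUK=3500"
info = parse_info(line.split("\t")[7])
print(allele_frequencies(info))              # whole-release frequency
print(allele_frequencies(info, "_TWINSUK"))  # TWINSUK-only frequency
```

For real analyses a dedicated VCF library is likely a better choice than hand-rolled parsing, since it also handles the header, escaping and multi-valued fields such as CSQ.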

Methods

Coming soon...

FAQ

  • Q: Why are there multiple BAM files per sample?
    A: Only raw sequence data was submitted to the EGA. Most of the sequencing for UK10K was multiplexed, so there were multiple runs per sample and the individual plexes were combined to make per-sample BAMs (see Alignment and BAM processing methods). Multiplexing reduced the likelihood that a single failed run would remove whole samples. The individual plex BAMs are the raw data that was submitted to the EGA.

  • Q: Which samples should be used for association analyses?
    A: Some samples in the main UK10K cohorts release were found to be of non-European background. The analysis download includes a file listing the samples that should be excluded from certain analyses, such as association.

  • Q: Is phenotype data available for the exome studies?
    A: Sample sex should be available in the EGA download. For other phenotype information, please get in touch with the individual study contacts listed on the studies page.
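
For the question above about which samples to use in association analyses, applying the exclusion file might look like the minimal Python sketch below. The layout of the exclusion file is an assumption here (one sample ID per line, in the first whitespace-delimited column, with optional `#` comment lines), and the sample IDs and function names are hypothetical. Check the file in the analysis download for its actual format.

```python
def load_exclusions(lines):
    """Collect sample IDs from lines, skipping blank lines and '#' comments.

    Assumes the ID is the first whitespace-delimited column of each line.
    """
    excluded = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            excluded.add(line.split()[0])
    return excluded


def keep_samples(samples, excluded):
    """Return the samples not in the exclusion set, preserving input order."""
    return [s for s in samples if s not in excluded]


# Hypothetical IDs for illustration; real IDs come from the release files.
exclusion_lines = ["# samples to exclude", "SAMPLE_B", "", "SAMPLE_D reason"]
all_samples = ["SAMPLE_A", "SAMPLE_B", "SAMPLE_C", "SAMPLE_D"]
print(keep_samples(all_samples, load_exclusions(exclusion_lines)))
```

In practice `load_exclusions` could be given an open file handle, since it only iterates over lines.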

This site is hosted by the Wellcome Trust Sanger Institute.