Spectrum Genetic Data
The human genome is divided into 23 pairs of chromosomes (22 pairs of autosomes and 1 pair of sex chromosomes) that are made up of more than 3 billion building blocks called nucleotide bases (A, T, C and G). Whole genome sequencing is the procedure that determines the order of the nucleotide bases across a person's entire genome in a single procedure. Typically, a generated sequence is compared to an accepted representation of the human genome (called the reference sequence) to detect any variant nucleotides. The reference sequence is a general framework; it is not the sequence of a single individual.
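As a rough illustration of what "comparing to a reference" means, the sketch below lists the positions where a sample sequence differs from a reference. It is a toy example only: the two sequences are made up, the function name find_variants is hypothetical, and it assumes the sequences are already aligned and of equal length. Real sequencing pipelines rely on read alignment and statistical variant calling instead.

```python
def find_variants(reference, sample):
    """Return (position, reference_base, sample_base) for every mismatch.

    Positions are 1-based. Assumes both sequences are already aligned
    and have the same length, which is a simplification.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Made-up sequences for illustration only.
reference = "ACGTTGCA"
sample = "ACGTCGCA"

variants = find_variants(reference, sample)
for position, ref_base, sample_base in variants:
    print(f"position {position}: reference {ref_base}, sample {sample_base}")
```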
The genetic data that a participant obtains from Spectrum is called low-pass whole genome sequencing data, and it is obtained by comparing the participant's sequence to the GRCh38 v6.0 human reference sequence. The standard format for storing sequence variations is called the variant call format (VCF). Because the file name suffix is .vcf, which is also the suffix used for contact card (vCard) files, phones and computers may mistake the data file for a contact card (https://www.lifewire.com/vcf-file-2622845). Keep this in mind when opening the file.
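To make the VCF layout more concrete, here is a minimal Python sketch that reads VCF text. It assumes the standard layout: header lines start with "#", and each data line is tab-separated with eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The function names and the example record are illustrative only and are not taken from a Spectrum file.

```python
VCF_FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_record(line):
    """Split one tab-separated VCF data line into its eight fixed fields."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_FIXED_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])        # 1-based position on the chromosome
    record["ALT"] = record["ALT"].split(",")  # one or more alternate alleles
    return record


def iter_vcf_records(lines):
    """Yield parsed records, skipping header lines (starting with '#') and blanks."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_record(line)


# Illustrative VCF content; the record is an example, not real participant data.
example_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\n",
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tDP=14\tGT\t0/1\n",
]

records = list(iter_vcf_records(example_lines))
for record in records:
    print(record["CHROM"], record["POS"], record["REF"], record["ALT"])
```

For a real file, the same functions could be applied to an open file handle, for example iter_vcf_records(open(path)), where path points to an uncompressed VCF file.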
Many paid and free DNA websites accept uploads of external data. Most of these sites seek to extract information on health and wellness, ancestry, or both. Each website requires that the data be in a particular format. Examples include:
• GEDmatch – provides ancestry tools and accepts data in 23andMe, AncestryDNA or FamilyTreeDNA formats.
• Genomelink – can provide an ancestry report, physical trait information or wellness tips. Genomelink accepts data in 23andMe, AncestryDNA or MyHeritage formats.
• Nebula Genomics – provides trait reports and other information about a user’s genome. The file formats accepted are 23andMe or AncestryDNA.
• Genetic Genie – allows for easy interpretation of a user’s genetic information. File formats accepted include 23andMe, AncestryDNA, FamilyTreeDNA, MyHeritage, LivingDNA, whole genome/exome sequencing VCF files, etc.
• Sequencing.com – provides ancestry and health reports. Accepted file formats include 23andMe, AncestryDNA, FamilyTreeDNA, MyHeritage, LivingDNA, whole genome/exome sequencing VCF files, etc.
• etc.
Spectrum participants have the right to download their data and upload it to any of the available DNA websites. However, we advise caution and hope that participants are fully aware of the risks of such uploads before proceeding (read all relevant Terms and Conditions). Because current Spectrum data are VCF files, participants can upload their data to only a smaller set of websites, for example Sequencing.com. In the future, we will provide 23andMe format data, which will allow participants to upload to a broader set of DNA websites. If you want to convert your Spectrum VCF file to 23andMe format, follow these steps:
• Download the bioinformatics software PLINK (https://www.cog-genomics.org/plink/).
• In your command line, run the following command, replacing [name of spectrum vcf file] with the name of your Spectrum VCF file:
plink --vcf [name of spectrum vcf file] --snps-only --recode 23 --out new23andMeFile
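For participants who prefer to script this step, the conversion command above could be wrapped in Python roughly as follows. This is a sketch under assumptions: it presumes PLINK is installed and reachable on your PATH as "plink", the function names are hypothetical, and my_spectrum_data.vcf is a placeholder file name. The flags are exactly those in the command above.

```python
import shutil
import subprocess
from pathlib import Path


def build_plink_command(vcf_path, out_prefix="new23andMeFile"):
    """Assemble the same PLINK command shown above as an argument list."""
    return [
        "plink",
        "--vcf", str(vcf_path),
        "--snps-only",
        "--recode", "23",
        "--out", out_prefix,
    ]


def convert_vcf_to_23andme(vcf_path, out_prefix="new23andMeFile"):
    """Run PLINK after checking that the program and the input file exist."""
    if shutil.which("plink") is None:
        raise FileNotFoundError("plink was not found on PATH; install it first")
    if not Path(vcf_path).is_file():
        raise FileNotFoundError(f"VCF file not found: {vcf_path}")
    # check=True raises CalledProcessError if PLINK exits with an error.
    subprocess.run(build_plink_command(vcf_path, out_prefix), check=True)


# Placeholder file name; replace it with the name of your own Spectrum VCF file.
command = build_plink_command("my_spectrum_data.vcf")
print(" ".join(command))
```

Calling convert_vcf_to_23andme("my_spectrum_data.vcf") would then run the conversion, stopping early with a clear message if PLINK or the input file is missing.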