File conversion
Microarray SNP analysis can be performed using either Affymetrix or Illumina platforms. However, each platform writes its output files in a different format, which can tie analysis software to a particular platform. To allow use of non-Affymetrix data with AutoSNPa, Sample, DominantMapper and IBDfinder, the conversion programs Illumina2Affy versions 1 to 3, deCode2Affy and PLink2Affy were developed. They can be used to convert certain file formats to the Affymetrix format.
These programs were produced in response to requests from other labs wanting to use AutoSNPa or IBDfinder with Illumina data. If they do not meet your needs, feel free to contact us and we will try to help.
Illumina file conversion
These programs can be downloaded here.
Illumina2Affy v.1
This program converts Illumina files containing ONE individual’s genotype data of the format shown below into an Affymetrix-style file. In these files, the column order must be as shown below:
Col 1 | Col 2 | Col 3 | Col 4 | Col 5 | Col 6 | Col 7 |
SNP | Sample | Chr | Position | Allele 1 | Allele 2 | GC content |
---|---|---|---|---|---|---|
RS10938 | 400101 | 1 | 2016609 | A | B | 0.887 |
RS54678 | 400101 | 1 | 2503078 | B | B | 0.912 |
Table 1
Illumina2Affy v.2
This program converts Illumina files containing MULTIPLE individuals’ data, of the format shown below, into Affymetrix-style files, one file per person. The order of the first three columns is invariant; however, the program identifies genotype data by the presence of the suffix “.GType” in a column header. Therefore, the column order after Col 3 is not important:
Col 1 | Col 2 | Col 3 | Col n | Col m |
SNP Name | Chr | Position | 0001.GType | 0002.GType |
---|---|---|---|---|
SNP 1 | 1 | 1000000 | AA | BB |
SNP 2 | 1 | 2000000 | AB | AA |
SNP 3 | 1 | 3000000 | BB | AA |
SNP 4 | 1 | 4000000 | BB | AB |
Table 2
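The column-detection rule described above can be sketched in a few lines of Python. This is only an illustration and not the code used by Illumina2Affy v.2: the function name `split_by_sample`, the tab delimiter and the extra `0001.Score` column are assumptions made for the example, and the exact layout of the Affymetrix-style output is not reproduced.

```python
import csv
import io

def split_by_sample(text, delimiter="\t"):
    """Return {sample_id: [(snp, chrom, position, genotype), ...]}."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    # The first three columns are fixed; genotype columns are recognised
    # by the ".GType" suffix, wherever they appear after column 3.
    gtype_cols = {
        i: name[: -len(".GType")]
        for i, name in enumerate(header)
        if i >= 3 and name.endswith(".GType")
    }
    per_sample = {sample: [] for sample in gtype_cols.values()}
    for row in reader:
        if not row:
            continue
        snp, chrom, position = row[0], row[1], int(row[2])
        for i, sample in gtype_cols.items():
            per_sample[sample].append((snp, chrom, position, row[i]))
    return per_sample

# Hypothetical input: "0001.Score" stands in for any non-genotype column.
example = (
    "SNP Name\tChr\tPosition\t0001.GType\t0001.Score\t0002.GType\n"
    "SNP 1\t1\t1000000\tAA\t0.9\tBB\n"
    "SNP 2\t1\t2000000\tAB\t0.8\tAA\n"
)
calls = split_by_sample(example)
```

Each key of `calls` corresponds to one person, which mirrors the one-file-per-person output described above.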
Illumina2Affy v.3
This program converts Illumina data files when the positional data is stored in a different file to the genotype data and the genotype file contains data for MULTIPLE individuals. The format of the genotype file is shown in Table 3 and the format of the SNP map file in Table 4.
Col 1 | Col 2 | Col 3 | Col 4 | Col 5 |
[Header] | | | | |
BSGT Version | 03/02/1932 | | | |
Processing Date | 6/16/2009 11:46 AM | | | |
Content | HumanHap550v3_A.bpm | | | |
Num SNPs | 561466 | | | |
Total SNPs | 561466 | | | |
Num Samples | 4 | | | |
Total Samples | 4 | | | |
[Data] | | | | |
 | HG_WUE_NRAA | HG_WUE_NRFN | HG_WUE_NROR | HG_WUE_NR-Fet |
MitoA10045G | AA | AA | AA | AA |
MitoA10551G | AA | AA | AA | AA |
MitoA11252G | BB | BB | AA | BB |
MitoA11468G | AA | AA | AA | AA |
MitoA11813G | -- | AA | AA | -- |
Table 3: Genotype data file
Col 1 | Col 2 | Col 3 | Col 4 | Col 5 | Col 6 | Col 7 | Col 8 | Col 9 |
Index | Name | Chromosome | Position | GenTrain Score | SNP | ILMN Strand | Customer Strand | NormID |
1 | MitoA10045G | M | 10045 | 0.7355 | [T/C] | Bot | Top | 0 |
2 | MitoA10551G | M | 10551 | 0.7128 | [A/G] | Top | Top | 0 |
3 | MitoA11252G | M | 11252 | 0.7452 | [T/C] | Bot | Top | 0 |
4 | MitoA11468G | M | 11468 | 0.7345 | [T/C] | Bot | Top | 0 |
Table 4: SNP map file
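Combining the two files amounts to looking up each SNP name from the genotype file in the SNP map file. The sketch below shows one way this could be done; it is not the code used by Illumina2Affy v.3. The function names, the tab delimiter and the decision to skip SNPs that are absent from the map file are all assumptions made for the example.

```python
import csv
import io

def read_snp_map(text, delimiter="\t"):
    """Map each SNP name to (chromosome, position) using the SNP map file."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {row["Name"]: (row["Chromosome"], int(row["Position"])) for row in reader}

def read_genotypes(text, delimiter="\t"):
    """Return (sample_names, [(snp_name, [call, ...]), ...])."""
    lines = text.splitlines()
    # Everything up to and including the "[Data]" line is header material.
    start = next(i for i, line in enumerate(lines) if line.strip().startswith("[Data]"))
    rows = [row for row in csv.reader(lines[start + 1:], delimiter=delimiter) if row]
    # The first row after "[Data]" holds the sample names; its first cell is empty.
    samples = [name for name in rows[0][1:] if name]
    calls = [(row[0], row[1:1 + len(samples)]) for row in rows[1:]]
    return samples, calls

def merge(genotype_text, map_text):
    """Return {sample: [(snp, chrom, position, call), ...]}."""
    positions = read_snp_map(map_text)
    samples, calls = read_genotypes(genotype_text)
    merged = {sample: [] for sample in samples}
    for snp, row_calls in calls:
        if snp not in positions:
            continue  # assumption: SNPs without positional data are skipped
        chrom, pos = positions[snp]
        for sample, call in zip(samples, row_calls):
            merged[sample].append((snp, chrom, pos, call))
    return merged

genotype_text = (
    "[Header]\nNum Samples\t2\n[Data]\n"
    "\tHG_WUE_NRAA\tHG_WUE_NROR\n"
    "MitoA10045G\tAA\tAA\n"
    "MitoA11252G\tBB\tAA\n"
)
map_text = (
    "Index\tName\tChromosome\tPosition\tGenTrain Score\tSNP\tILMN Strand\tCustomer Strand\tNormID\n"
    "1\tMitoA10045G\tM\t10045\t0.7355\t[T/C]\tBot\tTop\t0\n"
    "3\tMitoA11252G\tM\t11252\t0.7452\t[T/C]\tBot\tTop\t0\n"
)
merged = merge(genotype_text, map_text)
```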
Download here
Note: these programs require the .NET framework version 2.0 to be installed.
deCode file conversion
This program converts deCode files containing ONE individual’s genotype data, of the format shown below, into an Affymetrix-style file. Since deCode data files describe each genotype using the SNP's actual alleles rather than 'A' or 'B', the program must identify the different alleles and then designate them as 'A' or 'B'. Consequently, all the files used in an analysis must be converted in the same batch; otherwise, the genotype designation may vary between files converted in different batches. In these files, the column order must be as shown below:
Col 1 | Col 2 | Col 3 | Col 4 | Col 5 | Col 6 | Col 7 |
Name | Variation | Chromosome | Position | Strand | Your code | |
---|---|---|---|---|---|---|
RS10938 | A/G | 1 | 2016609 | + | A | G |
RS54678 | C/T | 1 | 2503078 | + | C | T |
Table 5
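To see why files must be converted in the same batch, consider a rule in which the first allele seen for a SNP becomes 'A' and the second becomes 'B'. This rule is an assumption chosen for illustration; the rule actually used by deCode2Affy is not documented here. The function name `designate` and the input layout are also hypothetical, and no-calls are not handled.

```python
def designate(calls_by_file):
    """calls_by_file: {file_name: [(snp, allele1, allele2), ...]}.

    Returns {file_name: [(snp, "AA" | "AB" | "BB"), ...]} using one
    allele-to-label table shared by every file in the batch.
    """
    labels = {}  # snp -> {allele: "A" or "B"}, shared across the whole batch
    converted = {}
    for name, calls in calls_by_file.items():
        out = []
        for snp, a1, a2 in calls:
            snp_labels = labels.setdefault(snp, {})
            for allele in (a1, a2):
                if allele not in snp_labels:
                    if len(snp_labels) >= 2:
                        raise ValueError(f"more than two alleles seen for {snp}")
                    # First allele seen becomes 'A', the second becomes 'B'.
                    snp_labels[allele] = "AB"[len(snp_labels)]
            # Sort so that a heterozygote is always written "AB", never "BA".
            out.append((snp, "".join(sorted(snp_labels[a1] + snp_labels[a2]))))
        converted[name] = out
    return converted

# Converted together: 'A' -> A and 'G' -> B for both people.
together = designate({
    "person1": [("RS10938", "A", "G")],
    "person2": [("RS10938", "G", "G")],
})
# Converted alone: person2's only allele 'G' is seen first and becomes A.
alone = designate({"person2": [("RS10938", "G", "G")]})
```

Under this rule, person2 is called BB when converted alongside person1 but AA when converted alone, which is the kind of inconsistency that batch conversion avoids.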
Download here
Note: this program requires the .NET framework version 2.0 to be installed.
PLink pedigree and map file conversion
This program converts PLink pedigree (*.ped) and map (*.map) files into Affymetrix-style files. While Affymetrix files label the alleles as 'A' or 'B', PLink pedigree files use either the numbers 1 and 2 or the actual alleles (A, C, G or T). If the pedigree file contains the actual alleles, PLink2Affy must first identify the different alleles and then designate them as 'A' or 'B'. Consequently, Affymetrix files originating from different pedigree files may have different genotype designations. The PLink *.ped and *.map files must be in the file format shown below.
More information about PLink files is available here.
Col 1 | Col 2 | Col 3 | Col 4 | Col 5 | Col 6 | Col 7 | Col 8 | Col 9 |
Family | Individual | Paternal ID | Maternal ID | Sex | Phenotype | SNP1 Allele 1 | SNP1 Allele 2 | etc... |
---|---|---|---|---|---|---|---|---|
FAM001 | 1 | 0 | 0 | 1 | 2 | A | A | ... |
FAM001 | 2 | 0 | 0 | 1 | 2 | A | A | ... |
Table 6: PLink pedigree file (*.ped)
Col 1 | Col 2 | Col 3 | Col 4 |
Chromosome | SNP ID | Genetic position | Physical position |
---|---|---|---|
1 | rs123456 | 1.23 | 1234555 |
1 | rs234567 | 1.27 | 1237793 |
1 | rs224534 | 1.32 | 1237697 |
1 | rs233556 | 1.35 | 1337456 |
Table 7: PLink map file (*.map)
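The pedigree and map files are linked only by order: the nth allele pair in a pedigree row belongs to the nth row of the map file. The sketch below shows that pairing and is not the code used by PLink2Affy. The function names are hypothetical, fields are assumed to be whitespace-delimited, and the alleles are left as they appear in the file rather than relabelled as 'A' or 'B'.

```python
def read_map(text):
    """Return [(chromosome, snp_id, physical_position), ...] in file order."""
    snps = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4:
            chrom, snp_id, _genetic, physical = fields
            snps.append((chrom, snp_id, int(physical)))
    return snps

def read_ped(text, snps):
    """Return {(family, individual): [(snp_id, chrom, position, genotype), ...]}."""
    people = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        family, individual = fields[0], fields[1]
        alleles = fields[6:]  # columns 7 onwards hold two alleles per SNP
        if len(alleles) != 2 * len(snps):
            raise ValueError("allele count does not match the map file")
        pairs = zip(alleles[0::2], alleles[1::2])
        people[(family, individual)] = [
            (snp_id, chrom, pos, a1 + a2)
            for (chrom, snp_id, pos), (a1, a2) in zip(snps, pairs)
        ]
    return people

map_text = "1 rs123456 1.23 1234555\n1 rs234567 1.27 1237793\n"
ped_text = "FAM001 1 0 0 1 2 A A G T\n"
people = read_ped(ped_text, read_map(map_text))
```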
Download here (This is version 4; updated on 13th March 2013)
Note: this program requires the .NET framework version 2.0 to be installed.
23andMe file conversion
This program converts 23andMe files containing ONE individual’s genotype data, of the format shown below, into an Affymetrix-style file. Since 23andMe data files describe each genotype using the SNP's actual alleles rather than 'A' or 'B', the program must identify the different alleles and then designate them as 'A' or 'B'. Consequently, all the files used in an analysis must be converted in the same batch; otherwise, the genotype designation may vary between files converted in different batches. In these files, the column order must be as shown below:
Col 1 | Col 2 | Col 3 | Col 4 |
Name | Chromosome | Position | Genotype |
---|---|---|---|
RS10938 | 1 | 2016609 | AG |
RS54678 | 1 | 2503078 | CT |
Table 8. The start of the file contains a data description in which each line begins with a '#' symbol.
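Reading such a file mainly involves skipping the '#' description lines before the four data columns. A minimal sketch, assuming tab-delimited fields; the function name `read_23andme` and the wording of the comment lines in the example are hypothetical, and the 'A'/'B' designation step is not shown.

```python
def read_23andme(text):
    """Return [(name, chromosome, position, genotype), ...], skipping '#' lines."""
    calls = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # blank line or part of the leading data description
        name, chrom, position, genotype = line.split("\t")
        calls.append((name, chrom, int(position), genotype))
    return calls

example = (
    "# Hypothetical description line\n"
    "# Name\tChromosome\tPosition\tGenotype\n"
    "RS10938\t1\t2016609\tAG\n"
    "RS54678\t1\t2503078\tCT\n"
)
calls_23andme = read_23andme(example)
```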
Download here (This is version 1; updated on 21st June 2013)
Note: this program requires the .NET framework version 2.0 to be installed.