File format description

Introduction

The programs in the Agile suite store variant and read depth data in simple tab-delimited plain text files. The data for each variant occupy at least two lines: the first line contains data on the sequence variant, and each subsequent line contains data on the effect the variant has on one of the gene's transcripts (one line per transcript). Single base changes and deletions use a common line format, whereas insertion variants use a different format. Lines describing a single base variant start with an 'S' and lines containing data on inserts begin with an 'S'; each format is described below:

Single base variants

When opened in a spreadsheet program, the data for a single base variant occupy a number of cells, labelled A to U in Figure 1. The data describing a single base sequence variant are identified by an 'S' in the first cell of the first data line of the variant (see A in Figure 1).

File format Screenshot 1

Figure 1: The file format for a single base sequence variant.

Insert variants

When opened in a spreadsheet program, the data for a DNA insert variant occupy a number of cells, labelled A to U in Figure 2. The data describing an insert sequence variant are identified by an 'S' in the first cell of the first data line of the variant (see A in Figure 2).

File format version2

Figure 2: The file format for an insert sequence variant.

Read depth file format

The read depth file format is shown in Table 1 below. The file is a tab-delimited plain text file, with each line containing the read depth information for a single exon. When opened in a spreadsheet application, the first column identifies the chromosome that contains the gene named in the second column. The third column identifies the exon, with the numbers starting at 0 and not 1. The exons are numbered from the p telomere end of the gene, so for genes encoded on the reverse strand of a chromosome the exons are numbered in the opposite direction to that expected. The remaining three columns contain the read depth values that 95%, 90% and 50% of the positions in each exon meet or exceed. For example, row one of Table 1 relates to the first exon (as judged by its closeness to the p telomere) of SAMD11: 95% of the coding positions have a read depth of 62 reads or more, 90% of the positions have a read depth of 66 reads or more and 50% of the positions have a read depth of 78 reads or more. The last value is equivalent to the median read depth of the coding sequence of the exon. If a gene has no reads mapped to its exons, the gene will not appear in this list and all exon read depth values will be set to 0.
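The meaning of the three read depth columns can be illustrated with a short Python sketch. It is not taken from the Agile programs: the function name depth_met_by, the use of per-position depths as input and the handling of an empty exon are all assumptions made for illustration, and the suite may calculate its values differently.

```python
def depth_met_by(depths, percent):
    """Return the largest read depth that at least `percent`% of positions meet or exceed.

    `depths` holds one read depth per position in an exon (hypothetical input).
    """
    if not depths:
        return 0  # assumption: an exon with no positions is reported as 0
    ranked = sorted(depths, reverse=True)
    # Smallest number of positions that makes up at least `percent`% of the exon.
    # Integer ceiling division avoids floating-point rounding problems.
    needed = max(1, (percent * len(ranked) + 99) // 100)
    return ranked[needed - 1]


# Ten positions with made-up depths, purely for illustration.
depths = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(depth_met_by(depths, 95), depth_met_by(depths, 90), depth_met_by(depths, 50))
# prints: 10 20 60
```

As in Table 1, the three values can only stay the same or increase from the 95% column to the 50% column, because a lower percentage of positions is required to reach each threshold.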

Chromosome  Gene name  Exon number  95% read depth  90% read depth  50% read depth
1           SAMD11     0            62              66              78
1           SAMD11     1            3               3               11
1           SAMD11     2            13              14              17
1           SAMD11     3            6               8               35
1           SAMD11     4            33              34              48
1           SAMD11     5            6               6               10
1           SAMD11     6            5               6               8
1           SAMD11     7            0               0               0
1           SAMD11     8            0               0               0
1           SAMD11     9            0               0               1
1           SAMD11     10           15              16              21
1           SAMD11     11           7               8               11
1           SAMD11     12           3               3               18
1           NOC2L      0            0               1               5
1           NOC2L      1            311             330             400
1           NOC2L      2            238             263             422
1           NOC2L      3            42              48              57
1           NOC2L      4            32              33              39
1           NOC2L      5            9               9               27
1           NOC2L      6            132             144             184
1           NOC2L      7            14              18              26
1           NOC2L      8            44              56              102
1           NOC2L      9            32              40              78
1           NOC2L      10           52              54              110
1           NOC2L      11           9               12              15
1           NOC2L      12           37              37              49
1           NOC2L      13           107             126             319
1           NOC2L      14           114             118             161
1           NOC2L      15           367             414             542
1           NOC2L      16           19              31              71
1           NOC2L      17           9               12              21
1           NOC2L      18           0               0               0

Table 1
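A file in this format could be read with a few lines of Python, as in the minimal sketch below. The names ExonDepth and read_exon_depths are hypothetical, and the sketch assumes that the file has no header row and that every line holds exactly the six columns described above; neither point is confirmed by this description.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class ExonDepth(NamedTuple):
    chromosome: str
    gene: str
    exon: int       # numbered from 0, counted from the p telomere end
    depth_95: int   # depth met or exceeded by 95% of positions
    depth_90: int   # depth met or exceeded by 90% of positions
    depth_50: int   # depth met or exceeded by 50% of positions


def read_exon_depths(handle: TextIO) -> Iterator[ExonDepth]:
    """Yield one ExonDepth per non-empty tab-delimited line."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        chromosome, gene, exon, d95, d90, d50 = row[:6]
        yield ExonDepth(chromosome, gene, int(exon), int(d95), int(d90), int(d50))


# The first two rows of Table 1, written as tab-delimited text.
sample = io.StringIO("1\tSAMD11\t0\t62\t66\t78\n1\tSAMD11\t1\t3\t3\t11\n")
records = list(read_exon_depths(sample))
print(records[0].gene, records[0].exon, records[0].depth_50)
# prints: SAMD11 0 78
```

The chromosome is kept as text rather than converted to a number so that names such as X or Y would not cause an error.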