SAMPLE - User Guide

Shadow Autozygosity MaPping by Linkage Exclusion

Requirements

This program is designed to run on Windows XP SP3 or Vista SP1 systems with the .NET 2.0 framework installed. The framework is freely available from Microsoft.

Genotyping should be performed using very high density SNP microarrays such as Affymetrix SNP5 or SNP6 chips. SNP6 data files must be annotated with chromosome and positional data, which can conveniently be done using SNP6Annotator.

Assumptions and DNA availability

The underlying algorithm of SAMPLE works on the following assumptions about the families and disease gene:

The disease is a recessive condition.
The affected individuals all have mutations in the same gene.
The families are inbred and the pathogenic mutation is inherited identical by descent (IBD) in affected individuals.
Only one mutant allele is present in each pedigree (although each pedigree may have a different mutation).

The program was specifically designed for instances where it is not possible to obtain DNA suitable for microarray SNP analysis from affected individuals, while it is still possible to analyse their parents and unaffected siblings.

Data entry

Adding parents

Figure 1

Each family is added sequentially and must include both parents. (Children are optional.) A parent is added by using the appropriate Select button (Figure 1, highlighted in blue for a father and red for a mother) to load the parent's SNP data file. The name of the file will then be displayed by the program (red and blue underlining in Figure 1).

Adding unaffected sibs of affected patients

Figure 2

A family may optionally also include one or more children, whose data are added using the Children button (highlighted in green in Figure 2). The name of each selected data file is added to the drop-down list (green underlining in Figure 2). To remove a child, select his or her file from this list box and press the Delete button. To remove all children, press the Clear button, which is also located under the drop-down list of children's filenames.

Adding a family

Once the parents and children have been added, the family can be stored by pressing the Add button (highlighted in red in Figure 2). This clears the display of the names of the files linked to the family, and instead adds a family name entry to the second drop-down list, below the Add button. This family name is created by combining the father's and mother's file names. Families can be removed from the analysis using the additional Clear and Delete buttons below the drop-down list of family names.

Adding families to pedigrees

Figure 3

Once all the families have been added, related nuclear families may be linked into an extended pedigree by pressing the Select button (Figure 3, highlighted in red). (Alternatively, if there is no known kinship between any of the nuclear families, this must be specified by clicking No pedigrees.) After clicking Select, the form shown in Figure 4 will be displayed, in which the families are listed in the upper drop-down list box and the pedigrees in the lower one.

Figure 4

Initially the pedigree list only contains one entry, but once a family has been linked to this pedigree, a new, empty pedigree will be added to the end of the list. To link a family to a pedigree, select the family from the upper drop-down list and the pedigree from the lower list, and then press Link. A family can only belong to one pedigree; thus, if the family is linked to a second pedigree, it will be automatically unlinked from its previous pedigree. To unlink a family manually, select it from the upper list and press Unlink. It is assumed that families that are not linked to any pedigree are not related to one another. It is therefore not necessary to create pedigrees for a single family.
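The linking rules described above (each family belongs to at most one pedigree, and an empty pedigree is always available at the end of the list) can be sketched as a small data structure. This is only an illustration in Python; the class name, the method names and the numbering of pedigrees from 1 are hypothetical and are not taken from SAMPLE itself.

```python
class PedigreeLinks:
    """Hypothetical model of the Link/Unlink form."""

    def __init__(self):
        self.pedigree_of = {}    # family name -> pedigree number
        self.pedigree_count = 1  # the list starts with one empty pedigree

    def link(self, family, pedigree):
        if not 1 <= pedigree <= self.pedigree_count:
            raise ValueError("unknown pedigree")
        # Storing one pedigree per family means that linking to a second
        # pedigree automatically replaces the earlier link.
        self.pedigree_of[family] = pedigree
        if pedigree == self.pedigree_count:
            # The last pedigree is no longer empty, so append a new one.
            self.pedigree_count += 1

    def unlink(self, family):
        # Unlinked families are treated as unrelated to all others.
        self.pedigree_of.pop(family, None)

    def members(self, pedigree):
        return sorted(f for f, p in self.pedigree_of.items() if p == pedigree)


links = PedigreeLinks()
links.link("Family1", 1)
links.link("Family2", 1)
links.link("Family1", 2)  # moves Family1 out of pedigree 1
```

Keeping a single family-to-pedigree mapping, rather than a list of families per pedigree, makes the "one pedigree per family" rule hold without any extra checks.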

Figure 5

When all pedigrees have been specified, press the View button to display an information window listing the families and the pedigree to which each is linked (Figure 5). If this is correct, press the Done button at the bottom of the previous form (Figure 4).

Viewing the analysis results

Analysing and viewing the data

Figure 6

To analyse the SNP genotype data, click Analyse at the bottom of the main form and choose the unit of distance you wish to use (Figure 6). Only distance units present in the data file of the father in the first family can be selected; e.g., in Figure 6 only physical distance is enabled, since the file contains no genetic map data.

The program will then load the data, first checking for SNPs that have either a Nocall genotype or show non-Mendelian inheritance; such SNPs are discarded from the database and any subsequent analysis. Pressing the View button next to the Analyse button now opens a new window displaying the results of the analysis (Figure 7).
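The filtering step can be illustrated with a minimal Python sketch. It assumes biallelic genotypes written as AA, AB or BB, with NoCall as the label for a failed call. These encodings and the function names are assumptions made for illustration and do not describe SAMPLE's internal data format.

```python
from itertools import product

NO_CALL = "NoCall"  # assumed label for a failed genotype call


def is_mendelian(father, mother, child):
    """Return True if the child's genotype can arise from the parents'."""
    # Every combination of one paternal and one maternal allele.
    possible = {"".join(sorted(f + m)) for f, m in product(father, mother)}
    return "".join(sorted(child)) in possible


def keep_snp(father, mother, children):
    """Keep a SNP only if all calls are present and inheritance is consistent."""
    if any(g == NO_CALL for g in (father, mother, *children)):
        return False
    return all(is_mendelian(father, mother, c) for c in children)
```

For example, `keep_snp("AA", "BB", ["AA"])` is false, because a child of AA and BB parents can only be AB.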

Figure 7

As with IBDfinder and AutoSNPa, the results are displayed visually and no mathematical or statistical analysis is performed. Rather, the program highlights the movement of large segments of the genome from parent to child, which have undergone relatively few recombination events. The sizes of the fragments that contain the mutant allele are very variable, and the length of a common region consequently does not predict whether or not it contains the disease gene.

The results window displays the data one chromosome at a time, with Chr. 1 initially selected (Figure 7). The window contains two display panels: the upper panel shows the positions of SNPs that exclude linkage to a disease gene, while the lower panel shows a graph of an empirically derived score, plotted against chromosomal position.

Features common to both panels

Both panels are drawn to the same scale, indicated by the rulers at the bottoms of the panels. The units are Mb or cM according to whether physical or genetic map position data were selected earlier. Just above the displayed map positions on the ruler is a discontinuous thick black line, within which the gaps represent regions of no SNP coverage. Each chromosome is drawn to the same scale, such that the longest chromosome spans the full panel width.
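The shared scale can be pictured as a single conversion from map position to horizontal pixel. The Python sketch below is an illustration only: the function name, the parameters and the rounding are assumptions, not SAMPLE's drawing code.

```python
def position_to_pixel(position, longest_chromosome_length, panel_width):
    """Map a position (Mb or cM) to a pixel column in the panel.

    The same divisor is used for every chromosome, so only the longest
    chromosome reaches the right-hand edge of the panel.
    """
    fraction = position / longest_chromosome_length
    return int(fraction * (panel_width - 1))
```

Because a chromosome carries far more SNPs than the panel has pixel columns, many neighbouring SNPs map to the same column under any scheme of this kind.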

Upper panel

The upper panel is composed of six strips of vertical markers. Each marker represents a SNP that can be excluded from linkage to the disease gene. Thus, the absence of markers suggests a region may be linked to the disease locus. The colour of each marker represents the reason the SNP was excluded, as follows:

Red:	SNPs for which the parents within one nuclear family are homozygous for different alleles.
Dark green:	SNPs for which unaffected children within a single nuclear family are homozygous for both alleles.
Black:	SNPs excluded because a child whose parents are heterozygous is homozygous for the same allele as another parent within the same pedigree.
Pale green:	SNPs excluded because unaffected children of heterozygous parents within a pedigree are homozygous for both alleles.
Pink:	SNPs excluded because the parents within one pedigree do not share a common allele.

Each strip displays SNPs excluded on one of these five criteria, except for the uppermost strip, which shows combined information for all the excluded SNPs.
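The two within-family criteria (red and dark green) can be sketched in Python as follows. The sketch assumes biallelic genotypes encoded as AA, AB or BB and uses a hypothetical function name. It is an illustration of the rules as stated above, not the program's own implementation, and it does not cover the three pedigree-level criteria.

```python
def within_family_exclusion(father, mother, unaffected_children):
    """Return 'red', 'dark green' or None for one SNP in one nuclear family."""

    def homozygous(genotype):
        return len(set(genotype)) == 1

    # Red: the parents are homozygous for different alleles, so every
    # child must be heterozygous at this SNP.
    if homozygous(father) and homozygous(mother) and father != mother:
        return "red"

    # Dark green: both homozygous genotypes are seen among the
    # unaffected children of the family.
    homozygous_calls = {g for g in unaffected_children if homozygous(g)}
    if len(homozygous_calls) >= 2:
        return "dark green"

    return None
```

For instance, `within_family_exclusion("AB", "AB", ["AA", "BB"])` returns `"dark green"`, while the same parents with children AA and AB give `None`.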

Lower panel

Figure 8

Because the screen resolution is limited relative to the large number of SNPs per chromosome, multiple SNPs are likely to occupy the same pixel on the screen. It is consequently difficult to discern whether a region has been excluded by just a few or by many SNPs. To give an indication of the number of SNPs that exclude a region, the lower panel shows a graph of the number of non-excluding SNPs in a sliding SNP window. The size of this window is set using the Window size drop-down list box (Figure 8, underlined in red). Since most SNPs are uninformative, the graph only shows regions that have 25 or fewer excluding SNPs. The horizontal gridlines indicate the number of excluding SNPs in the window and are labelled in Figure 8.
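As an illustration of the sliding-window idea, the Python sketch below counts the excluding SNPs in each window of consecutive SNPs and keeps only windows at or below a cut-off. The function name, the input layout and the use of the midpoint between the first and last SNP as the window centre are all assumptions, and the exact score plotted by SAMPLE may differ.

```python
from itertools import accumulate


def window_counts(positions, excluded, window_size, max_excluding=25):
    """Return (centre_position, excluding_count) for each retained window.

    positions: SNP map positions in chromosomal order.
    excluded:  one boolean per SNP, True if the SNP excludes linkage.
    """
    # prefix[i] holds the number of excluding SNPs among the first i SNPs,
    # so each window count is a single subtraction.
    prefix = [0, *accumulate(int(flag) for flag in excluded)]
    results = []
    for start in range(len(positions) - window_size + 1):
        end = start + window_size
        count = prefix[end] - prefix[start]
        if count <= max_excluding:
            centre = (positions[start] + positions[end - 1]) / 2
            results.append((centre, count))
    return results
```

Because the window is defined by a number of SNPs rather than a physical length, its extent on the map varies with local SNP density.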

Figure 9

Initially, this value is plotted as a line graph, with the points indicating the centre of the SNP window. However, since SNP density is not uniform along the chromosome, it is alternatively possible to view the graph as a series of tapes or bars that show the extents of the windows (Figures 9A,B). While the Bar view displays the width of a window, it is possible for regions to overlap, making them appear to be one wide region. To overcome this, the Tape plot highlights points where regions overlap. These different plots are selected using the Plot type drop-down menu (underlined in green in Figure 8).

View options

Below the lower results panel is a series of controls for changing the view options of the two panels. These include the Window size and Plot type controls described above, while the others are used respectively to select the chromosome displayed, select information from a single family and save the underlying data to a file.

Figure 10

Chromo list: This contains a list of the autosomes, used to select which chromosome is displayed. (The current chromosome is also indicated at the left-hand side of the title bar.) For example, to view Chr. 8, select 8 from the list (Figure 10, green underlining).

Figure 11

Family list: By default the All value is selected and data from all the families are shown in the panels. To view regions excluded by a single family, select from the list the number representing the order in which that family was added. For example, to view data for the first family added to the program, select 1 (Figure 11, underlined in red). Note that since only data from a single nuclear family are now shown, SNPs are no longer excluded by analysis across a pedigree. Therefore, only red and dark green marks appear in the upper panel.

Figure 12

Save data: A region's underlying data can be exported either to a colour-coded web page or to a tab-delimited text file. To select a region, place the mouse cursor at the start of the region (on either the upper or lower panel) and, while holding down the left mouse button, drag the cursor to the end of the region. The currently selected region will be delimited by two black vertical lines on the upper and lower panels (Figure 12). To save the data, press Save data (underlined in red in Figure 12) and enter the name of the output file and the desired file extension.

Figure 13

Window size: The lower panel represents a graph of the number of non-excluding SNPs in a sliding window. This value can be set to 100, 200, 300 or 400 SNPs (Figure 13A-D, respectively).

Interpreting the results

The SAMPLE algorithm works by tracking the movement of large segments of chromosomes from parents to offspring. Each family carries rather little information compared to standard autozygosity mapping; the discriminative power derives from collating the exclusion information across a number of families. The size of a region therefore depends on the unpredictable way in which the regions from each family overlap. As in classical autozygosity mapping, the disease gene may fall either in a very large region or in a very small one that does not stand out from the background noise.

In the data shown above, the disease gene is located at 94.7 Mb on Chr. 8. The peak in the graph corresponding to this position is the third largest at a window size of 300 SNPs. However, at 400 SNPs the peak almost disappears, ranking approximately 15th. At 200 SNPs, the peak is still one of the more prominent, but the difference between it and other similar peaks is not as clear as at 300 SNPs. Because the prominence of the true peak depends so strongly on window size, the SAMPLE program should ideally be used to screen a list of candidate genes or regions that have been identified by other means.