Software and Scripts

Ian Wilson

Software and scripts used in the paper "Titin founder mutation is a common cause of myofibrillar myopathy with early respiratory failure" (Pfeffer et al., 2014). These have all been tested on Ubuntu and Red Hat Linux. The tools tabix, vcftools and R are available through the Ubuntu package management system and can be installed using
sudo apt-get install tabix vcftools r-base

The following software is required:

tabix (available as part of the samtools package)
vcftools
PHASE
Haploview
The R statistical programming environment.

The custom-written scripts and R code used are:

1000_genomes_script (uses tabix to download the 1000 Genomes data and vcftools to convert it)
filter1000genomes.py
Transform_PHASE.R
haploview_phase.R
readphase.R

Data Files

phase1_samples_integrated_20101123.ped, downloaded from the 1000 Genomes data repository.

The 1000 Genomes data between positions 178404570 and 179734924 were downloaded using tabix and converted to the IMPUTE format for phased data using vcftools (vcftools.sourceforge.net). The data were then filtered to keep only the positions shared with our assays, using a custom-written Python script, filter1000genomes.py. Finally, R scripts prepared the data for PHASE and Haploview by selecting only those individuals from the CEU and GB populations who did not have parents in the data set, and by adding homozygous wild-type alleles at the disease position for the 1000 Genomes data.
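
The position filtering and the wild-type step described above might look roughly like the following Python sketch. It is not filter1000genomes.py or the R scripts used in the paper. It assumes the IMPUTE layout written by vcftools (a legend file with a header line and columns ID, pos, allele0, allele1, and a hap file with one row per site and one 0/1 column per haplotype), assumes the legend is sorted by position, and uses hypothetical function names, a hypothetical site ID "disease" and toy positions.

```python
"""Sketch: restrict IMPUTE-format haplotypes to assay positions and add a
wild-type site at the disease position (hypothetical helper names).

Assumed layout (vcftools IMPUTE style):
  legend: header line, then "ID pos allele0 allele1" for each site
  hap:    one line per site, one 0/1 column per haplotype
"""
from typing import List, Set, Tuple

Site = Tuple[str, int, str, str]  # (ID, position, allele0, allele1)


def read_impute(legend_lines: List[str], hap_lines: List[str]) -> Tuple[List[Site], List[List[str]]]:
    """Parse legend and hap lines into parallel per-site lists."""
    legend: List[Site] = []
    for line in legend_lines[1:]:  # skip the header line
        site_id, pos, a0, a1 = line.split()
        legend.append((site_id, int(pos), a0, a1))
    haps = [line.split() for line in hap_lines]
    if len(legend) != len(haps):
        raise ValueError("legend and hap describe different numbers of sites")
    return legend, haps


def filter_to_assay(legend: List[Site], haps: List[List[str]],
                    assay_positions: Set[int]) -> Tuple[List[Site], List[List[str]]]:
    """Keep only sites whose position is also typed by the assay."""
    keep = [i for i, site in enumerate(legend) if site[1] in assay_positions]
    return [legend[i] for i in keep], [haps[i] for i in keep]


def add_wildtype_site(legend: List[Site], haps: List[List[str]], disease_pos: int,
                      ref: str, alt: str) -> Tuple[List[Site], List[List[str]]]:
    """Insert a site where every haplotype carries the wild-type allele (0)."""
    n_haps = len(haps[0]) if haps else 0
    # Assumes legend is sorted by position: count sites before disease_pos.
    idx = sum(1 for site in legend if site[1] < disease_pos)
    new_site: Site = ("disease", disease_pos, ref, alt)  # hypothetical ID
    return (legend[:idx] + [new_site] + legend[idx:],
            haps[:idx] + [["0"] * n_haps] + haps[idx:])


if __name__ == "__main__":
    # Toy data; positions and alleles are illustrative only.
    legend, haps = read_impute(
        ["ID pos allele0 allele1", "rs1 100 A G", "rs2 200 C T", "rs3 300 G A"],
        ["0 1 1 0", "1 1 0 0", "0 0 0 1"],
    )
    legend, haps = filter_to_assay(legend, haps, {100, 300})
    legend, haps = add_wildtype_site(legend, haps, 250, "C", "T")
    for site, row in zip(legend, haps):
        print(site[0], site[1], " ".join(row))
```

Keeping the legend and hap rows as parallel lists means any filter or insertion must be applied to both at the same index, which is why each helper returns the pair together.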

Sample information was obtained from the file phase1_samples_integrated_20101123.ped, downloaded from the 1000 Genomes data repository.
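
A minimal Python sketch of selecting unrelated individuals from this PED file is given below. It assumes the file is tab-delimited with a header row containing columns named "Individual ID", "Paternal ID", "Maternal ID" and "Population" (check these against the downloaded file), and it interprets "no parents in the data set" as neither listed parent being among the samples present in the downloaded data. The population codes (CEU, GBR), the function name and the sample IDs are assumptions for illustration; this is not the R code used in the paper.

```python
"""Sketch: pick CEU/GBR individuals with no parent in the data set."""
import csv
import io
from typing import List, Set


def unrelated_samples(ped_text: str, populations: Set[str], in_dataset: Set[str]) -> List[str]:
    """Return IDs of samples in `populations` whose parents are not in `in_dataset`.

    Assumes a tab-delimited PED file with a header row naming the columns
    "Individual ID", "Paternal ID", "Maternal ID" and "Population".
    """
    reader = csv.DictReader(io.StringIO(ped_text), delimiter="\t")
    keep: List[str] = []
    for row in reader:
        sample = row["Individual ID"]
        if sample not in in_dataset or row["Population"] not in populations:
            continue
        parents = {row["Paternal ID"], row["Maternal ID"]}  # "0" means unknown
        if parents & in_dataset:
            continue  # a parent is also in the data, so drop the child
        keep.append(sample)
    return keep


if __name__ == "__main__":
    # Hypothetical toy pedigree: one CEU trio, one GBR singleton, one YRI singleton.
    header = "Family ID\tIndividual ID\tPaternal ID\tMaternal ID\tPopulation"
    rows = [
        "F1\tNA1\t0\t0\tCEU",
        "F1\tNA2\t0\t0\tCEU",
        "F1\tNA3\tNA1\tNA2\tCEU",
        "F2\tHG1\t0\t0\tGBR",
        "F3\tNA4\t0\t0\tYRI",
    ]
    ped = "\n".join([header] + rows)
    ids = {"NA1", "NA2", "NA3", "HG1", "NA4"}
    print(unrelated_samples(ped, {"CEU", "GBR"}, ids))
```

Dropping children rather than parents keeps founders, so each retained haplotype is transmitted independently rather than duplicated within a trio.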

PHASE was used to infer haplotypes, with its known-haplotypes option used to supply the 1000 Genomes data as known-phase data. The pairs output file was used to determine the relative posterior probability of the different haplotype reconstructions for these data; this post-processing was done in R. Haploview was used to investigate the haplotype structure of the population data and to produce plots.
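
The post-processing of the PHASE pairs output can be sketched in Python as follows. This assumes that each individual's block in the pairs file starts with a line "IND: <id>" followed by lines of the form "haplotype1 , haplotype2 , probability"; verify this against the output of your PHASE version. Probabilities are rescaled over the listed reconstructions so that they sum to one for each individual. The function names and toy data are hypothetical, and this is an illustration rather than readphase.R.

```python
"""Sketch: read a PHASE pairs file and rank each individual's reconstructions."""
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str, float]  # (haplotype 1, haplotype 2, posterior probability)


def read_pairs(lines: List[str]) -> Dict[str, List[Pair]]:
    """Group 'hap1 , hap2 , prob' lines under the preceding 'IND: <id>' line."""
    pairs: Dict[str, List[Pair]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("IND:"):
            current = line.split(":", 1)[1].strip()
            pairs[current] = []
            continue
        if current is None:
            raise ValueError("haplotype pair found before any 'IND:' line")
        h1, h2, prob = (field.strip() for field in line.split(","))
        pairs[current].append((h1, h2, float(prob)))
    return pairs


def relative_probabilities(pairs: List[Pair]) -> List[Pair]:
    """Rescale probabilities to sum to 1 and sort from most to least probable."""
    total = sum(p for _, _, p in pairs)
    ranked = sorted(pairs, key=lambda pair: pair[2], reverse=True)
    return [(h1, h2, p / total) for h1, h2, p in ranked]


if __name__ == "__main__":
    # Toy input in the assumed format; real files come from PHASE.
    toy = ["IND: P1", "0110 , 1001 , 0.60", "0100 , 1011 , 0.30",
           "IND: P2", "1111 , 0000 , 1.00"]
    for ind, ind_pairs in read_pairs(toy).items():
        best = relative_probabilities(ind_pairs)[0]
        print(ind, best[0], best[1], round(best[2], 3))
```

For a real file, pass its lines, for example read_pairs(open(path).read().splitlines()), where path is the pairs file written by PHASE.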

Workflow

Once all the tools are installed, the analysis in the paper can be repeated with the following commands.

Create a working directory and copy into it the files 1000_genomes_script, haploview_phase.R, Transform_PHASE.R, HMERFmatch1K.csv, HMERF_Haplotype.csv, filter1000genomes.py and readphase.R.
From that directory, run the 1000_genomes_script and then readphase.R:
bash 1000_genomes_script
R --vanilla < readphase.R

ian.wilson@newcastle.ac.uk