December | 2011 | Raony Guimaraes

Experimental Design

To rerun my analysis with the exome pipeline, I decided to try several experiments to evaluate which setup best fits my analysis. So far I have come up with these ideas:

Experiment 1

Use the Fastx toolkit to clean the reads, align with BWA, post-process with GATK, and evaluate the genotype of the individual.

Experiment 2

Use Trimreads to clean the reads, align with BWA, post-process with GATK, and evaluate the genotype of the individual.

Experiment 3

Skip trimming, align the raw reads with BWA, post-process with GATK, and evaluate the genotype of the individual.

Experiment 4

Use SOAP to align the reads and call SNPs.

I’m planning to use one individual from the 1000 Genomes Project as the reference and compare the genotype from each experiment against it.
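To make "compare the genotype" concrete, here is a minimal sketch of one possible metric: genotype concordance over the sites called in both sets. It assumes the calls have already been loaded into Python dictionaries keyed by (chromosome, position). That representation, the function names and the toy data are hypothetical, and parsing the VCF output of GATK, SOAP or 1000 Genomes is not shown.

```python
from typing import Dict, Tuple

# Hypothetical representation: (chromosome, position) -> genotype string,
# e.g. ("1", 12345) -> "A/G". Loading these from VCF files is not shown.
Site = Tuple[str, int]


def normalize(genotype: str) -> str:
    """Treat phased/unphased and allele order alike: 'G|A' -> 'A/G'."""
    return "/".join(sorted(genotype.replace("|", "/").split("/")))


def concordance(called: Dict[Site, str], truth: Dict[Site, str]) -> float:
    """Fraction of sites present in both sets where the genotypes agree."""
    shared = called.keys() & truth.keys()
    if not shared:
        return 0.0
    matches = sum(
        normalize(called[site]) == normalize(truth[site]) for site in shared
    )
    return matches / len(shared)


if __name__ == "__main__":
    # Toy data for illustration only, not real 1000 Genomes calls.
    truth = {("1", 100): "A/G", ("1", 200): "C/C", ("2", 50): "T/T"}
    experiment = {("1", 100): "G/A", ("1", 200): "C/T", ("3", 10): "A/A"}
    print(f"concordance = {concordance(experiment, truth):.2f}")
```

Only two sites are shared in the toy data, and one of them agrees, so this prints a concordance of 0.50. Sites called in only one set are ignored here; a fuller comparison would also report them separately.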

Fastq – Guessing between 4 formats

The FASTQ format description is a huge mess, and it is easy to get lost trying to work out which quality encoding your file uses. Because of that, I was glad to find this script, https://github.com/brentp/bio-playground/blob/master/reads-utils/guess-encoding.py, which tells you which encoding your files use when run with the command:

awk 'NR % 4 == 0' your.fastq | python guess-encoding.py [options]
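In a FASTQ file each read takes four lines (header, sequence, "+" separator, quality), so the awk filter NR % 4 == 0 keeps only the quality lines. The same filter in Python might look like the sketch below. Like the awk one-liner, it assumes no record is wrapped over extra lines. The function name is illustrative, not part of guess-encoding.py.

```python
import sys
from typing import Iterable, Iterator


def quality_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines 4, 8, 12, ... (1-based), i.e. the quality line of each
    4-line FASTQ record, with the trailing newline removed."""
    for line_number, line in enumerate(lines, start=1):
        if line_number % 4 == 0:
            yield line.rstrip("\r\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python this_script.py your.fastq
    with open(sys.argv[1]) as handle:
        for qual in quality_lines(handle):
            print(qual)
```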

So far it supports the following encodings, each given as the (lowest, highest) ASCII code expected for its quality characters:

Sanger: (33, 73)

Solexa: (59, 104)

Illumina-1.3: (64, 104)

Illumina-1.5: (67, 104)

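For my file the output was: Solexa Illumina-1.3 (66, 104). In other words, its quality characters span ASCII 66 to 104, which fits inside both the Solexa and the Illumina-1.3 ranges, so the script cannot tell those two apart.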
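The idea behind the guess can be sketched in a few lines: take the lowest and highest ASCII codes seen in the quality lines and report every encoding whose range contains both. This is a simplified illustration of that idea, not the code of guess-encoding.py itself; the ranges are copied from the list above and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# (lowest, highest) ASCII code of the quality characters for each encoding,
# copied from the list of ranges supported by guess-encoding.py.
RANGES: Dict[str, Tuple[int, int]] = {
    "Sanger": (33, 73),
    "Solexa": (59, 104),
    "Illumina-1.3": (64, 104),
    "Illumina-1.5": (67, 104),
}


def observed_range(quality_lines: Iterable[str]) -> Tuple[int, int]:
    """Lowest and highest ASCII code seen across all quality strings."""
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        raise ValueError("no quality characters found")
    return min(codes), max(codes)


def guess_encodings(lo: int, hi: int) -> List[str]:
    """Every encoding whose range fully contains the observed [lo, hi]."""
    return [
        name
        for name, (start, end) in RANGES.items()
        if start <= lo and hi <= end
    ]


if __name__ == "__main__":
    # The range reported for my file.
    print(guess_encodings(66, 104))
```

With the (66, 104) range from my file it returns ['Solexa', 'Illumina-1.3']: Sanger is excluded because 104 exceeds its maximum of 73, and Illumina-1.5 because 66 is below its minimum of 67.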

Another option is the SolexaQA script (http://solexaqa.sourceforge.net/):

./SolexaQA.pl reads1.fastq

