Report 21/09/12

Indel Detection

Atlas, Dindel

ruby1.9.1 /lgc/programs/Atlas2_v1.4.1/Atlas-Indel2/Atlas-Indel2.rb -b ../../input/Exome_1_RP.realigned-recalibrated.bam -r /lgc/datasets/gatk_data/hg19/ucsc.hg19.fasta -o rms_indel -S

Varid Indel

varid_exec -a ../../../../input/Exome_1_RP.realigned-recalibrated.sam -r /lgc/datasets/gatk_data/hg19/ucsc.hg19.fasta -o varid_rms_detection --threads 4 --format vcf

Phasing Beagle

Now we're talking!

Well, after a long time thinking about the best way to record the activities of my PhD, I decided that the best way to keep everything organized would be through this blog. With some caveats, though: although the blog is called Openscience, I intend to keep the activity records private and post more about the topics I have been working on. Previously I was using a Google Docs document that became extremely large, and that was the main reason I migrated here! To that end I prepared a list of some topics that I consider common in Bioinformatics:

Markov

Burrows-Wheeler

Monte Carlo

Bayes

SVD

PCA

SVM

 

Experimental Design

Before rerunning my analysis with the exome pipeline, I decided to try several experiments to evaluate which setup best fits my analysis. So far I have come up with these ideas:

Experiment 1

Use Fastx toolkit to clean the reads, align with BWA, postprocess with GATK and evaluate the genotype of the individual

Experiment 2

Use Trimreads to clean the reads, align with BWA, postprocess with GATK and evaluate the genotype of the individual

Experiment 3

Do no trimming, align with BWA, postprocess with GATK and evaluate the genotype of the individual

Experiment 4

Use SOAP to align and call SNPs

I'm planning to use one individual from the 1000 Genomes Project as a reference to compare the genotypes against.
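The comparison between an experiment's calls and the reference individual could be sketched as follows. This is only a minimal illustration under assumptions of mine: the function name genotype_concordance, the (chrom, pos) keys, and the "A/G" genotype strings are hypothetical, and the post does not say which metric will actually be used.

```python
def genotype_concordance(calls, truth):
    """Compare two genotype call sets on the sites they share.

    Both arguments map (chrom, pos) -> genotype string such as "A/G".
    Returns (fraction of shared sites with the same genotype, number of shared sites).
    """
    def normalise(genotype):
        # Treat "A/G", "G/A" and "A|G" as the same unordered genotype.
        return tuple(sorted(genotype.replace("|", "/").split("/")))

    shared = set(calls) & set(truth)
    if not shared:
        return 0.0, 0
    matches = sum(1 for site in shared
                  if normalise(calls[site]) == normalise(truth[site]))
    return matches / len(shared), len(shared)


# Toy data, invented for illustration only.
experiment_calls = {("chr1", 100): "A/G", ("chr1", 200): "C/C", ("chr2", 50): "T/T"}
reference_calls = {("chr1", 100): "G/A", ("chr1", 200): "C/T", ("chr3", 10): "A/A"}

print(genotype_concordance(experiment_calls, reference_calls))
```

Running the same function on the calls from each experiment would give one number per experiment, which makes the four setups directly comparable on the sites they share with the reference.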

Fastq – Guessing between 4 formats

The FASTQ format description is a mess, and it is easy to get lost trying to work out which quality encoding your file uses. Because of that, I found a script at https://github.com/brentp/bio-playground/blob/master/reads-utils/guess-encoding.py that you can use to find out which encoding your files use, with the command:

awk 'NR % 4 == 0' your.fastq | python guess-encoding.py [options]

So far it supports the following encodings, each given as the (minimum, maximum) ASCII code of the quality characters:

'Sanger': (33, 73),
'Solexa': (59, 104),
'Illumina-1.3': (64, 104),
'Illumina-1.5': (67, 104)
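For my file the script printed: Solexa  Illumina-1.3  (66, 104). In other words, the observed quality range (66, 104) fits both the Solexa and the Illumina-1.3 encodings.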
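The idea behind this kind of guess can be sketched in a few lines. This is not the linked script itself, just a minimal illustration assuming the four ranges listed above; the function names observed_range and compatible_encodings are my own.

```python
# Quality-character ASCII ranges, copied from the list above.
RANGES = {
    "Sanger": (33, 73),
    "Solexa": (59, 104),
    "Illumina-1.3": (64, 104),
    "Illumina-1.5": (67, 104),
}


def observed_range(quality_lines):
    """Return the (min, max) ASCII code over all quality characters seen."""
    codes = [ord(ch) for line in quality_lines for ch in line.rstrip("\n")]
    if not codes:
        raise ValueError("no quality characters found")
    return min(codes), max(codes)


def compatible_encodings(lo, hi, ranges=RANGES):
    """List every encoding whose range fully contains the observed [lo, hi]."""
    return [name for name, (rmin, rmax) in ranges.items()
            if rmin <= lo and hi <= rmax]


# Toy quality lines (every 4th line of a FASTQ file), invented for illustration.
toy_quality_lines = ["BCDEFGh\n", "hhgfedB\n"]
lo, hi = observed_range(toy_quality_lines)
print(compatible_encodings(lo, hi), (lo, hi))
# prints ['Solexa', 'Illumina-1.3'] (66, 104)
```

An observed range can only rule encodings out, so more than one candidate may remain, as happened with my file. Reading more quality lines narrows the range and usually the list of candidates as well.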
There is also this script:
./SolexaQA.pl reads1.fastq