September 7, 2012 by Raony Guimaraes on Uncategorized

Videos and podcasts for traveling

http://www.openhelix.com/ENCODE2
http://www.openhelix.com/ENCODE

June 11, 2012 by Raony Guimaraes on Uncategorized

3D slide transitions in OpenOffice

Today I installed:
sudo apt-get install openoffice.org-ogltrans

Now I can use 3D slide transitions in OpenOffice.

June 4, 2012 by Raony Guimaraes on LGC Activity Report, Uncategorized

Replication studies: Bad copy

This week I read an article about how hard it is to replicate published studies.
http://www.nature.com/news/replication-studies-bad-copy-1.10634

May 30, 2012 by Raony Guimaraes on Uncategorized

Finally!

Well, after a long time thinking about the best way to record the activities of my PhD, I decided that the best way to keep everything organized would be this blog. There are some caveats, though: although the blog is called Openscience, I intend to keep the activity records private and to post mostly about topics I have been working on. Previously I was using a Google Docs document that became extremely large, and that was the main reason I moved here! To get started, I prepared a list of some topics I consider common in Bioinformatics:

Markov

Burrows-Wheeler

Monte Carlo

Bayes

SVD

PCA

SVM

March 14, 2012 by Raony Guimaraes on LGC Activity Report, Uncategorized

Libraries for generating plots in Bioinformatics

http://biostar.stackexchange.com/questions/18390/best-graphics-gallery-or-blogs-for-bioinformatics-use

February 13, 2012 by Raony Guimaraes on LGC Activity Report, Uncategorized

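less without word wrap

To view large files in the terminal without line wrapping, use the command "less -S", which chops long lines instead of wrapping them (use the left and right arrow keys to scroll sideways).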

January 9, 2012 by Raony Guimaraes on Courses, LGC Activity Report

Courses

R course

http://manuals.bioinformatics.ucr.edu/home/ht-seq#TOC-Running-SOAP

Python course

http://code.google.com/edu/languages/google-python-class/

December 8, 2011 by Raony Guimaraes on LGC Activity Report

Experimental Design

To rerun the analysis from my exome pipeline, I decided to try several experiments and evaluate which setup best fits my analysis. So far I have come up with these ideas:

Experiment 1

Use the FASTX-Toolkit to clean the reads, align with BWA, post-process with GATK, and evaluate the individual's genotype

Experiment 2

Use Trimreads to clean the reads, align with BWA, post-process with GATK, and evaluate the individual's genotype

Experiment 3

Skip trimming, align with BWA, post-process with GATK, and evaluate the individual's genotype

Experiment 4

Use SOAP to align and call SNPs

I plan to use one individual from the 1000 Genomes Project as the reference for comparing genotypes.
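One way to score each experiment against the 1000 Genomes individual is genotype concordance: among the sites genotyped both by the experiment and in the reference individual, the fraction where the two genotypes agree. The sketch below is only an illustration and assumes the genotypes have already been parsed (for example from the VCF files) into dictionaries keyed by (chromosome, position) with VCF-style GT strings such as "0/1"; the function names are hypothetical and not part of BWA, GATK or SOAP.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (chromosome, position) -> VCF-style GT string
Site = Tuple[str, int]
Genotypes = Dict[Site, str]


def normalize(gt: str) -> Tuple[str, ...]:
    """Ignore phasing and allele order, so '1|0', '0|1' and '0/1' compare equal."""
    return tuple(sorted(gt.replace("|", "/").split("/")))


def genotype_concordance(called: Genotypes, truth: Genotypes) -> float:
    """Fraction of sites present in both sets whose genotypes agree."""
    shared = called.keys() & truth.keys()
    if not shared:
        return 0.0
    matches = sum(normalize(called[site]) == normalize(truth[site]) for site in shared)
    return matches / len(shared)


def rank_experiments(experiments: Dict[str, Genotypes], truth: Genotypes) -> List[str]:
    """Experiment names sorted from highest to lowest concordance."""
    return sorted(
        experiments,
        key=lambda name: genotype_concordance(experiments[name], truth),
        reverse=True,
    )
```

Concordance over shared sites ignores the sites an experiment failed to call, so it is probably worth also reporting how many of the reference sites each experiment covers.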

December 1, 2011 by Raony Guimaraes on LGC Activity Report, Utils

Fastq – Guessing between 4 formats

The FASTQ format description is a huge mess, and it is easy to get lost trying to work out which quality encoding your file uses. I found a script at https://github.com/brentp/bio-playground/blob/master/reads-utils/guess-encoding.py that guesses the encoding. Feed it only the quality lines (the fourth line of each record) with the command:

awk 'NR % 4 == 0' your.fastq | python guess-encoding.py [options]

So far it supports the following encodings, each given as the (minimum, maximum) ASCII code of its quality characters:

Sanger: (33, 73)

Solexa: (59, 104)

Illumina-1.3: (64, 104)

Illumina-1.5: (67, 104)

For my file, the script reported Solexa and Illumina-1.3, with an observed quality range of (66, 104).
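The idea behind this kind of guess is a range check: take the lowest and highest ASCII code among the quality characters and report every encoding whose range contains both. Here is a minimal Python sketch of that check using the four ranges listed above; it is my own illustration, not the code in guess-encoding.py, and it reads the whole file. Because the ranges overlap, more than one encoding can match, which is why a (66, 104) file matches both Solexa and Illumina-1.3.

```python
import sys
from typing import Iterable, List, Tuple

# (min, max) ASCII codes of the quality characters, as listed above
RANGES = {
    "Sanger": (33, 73),
    "Solexa": (59, 104),
    "Illumina-1.3": (64, 104),
    "Illumina-1.5": (67, 104),
}


def quality_range(fastq_lines: Iterable[str]) -> Tuple[int, int]:
    """Lowest and highest ASCII code seen on the quality lines."""
    lo, hi = 256, -1
    for number, line in enumerate(fastq_lines, start=1):
        if number % 4 == 0:  # same selection as awk 'NR % 4 == 0'
            codes = [ord(c) for c in line.rstrip("\r\n")]
            if codes:
                lo = min(lo, min(codes))
                hi = max(hi, max(codes))
    return lo, hi


def guess_encodings(lo: int, hi: int) -> List[str]:
    """Every encoding whose range contains all observed quality characters."""
    if lo > hi:  # no quality characters were seen
        return []
    return [name for name, (rmin, rmax) in RANGES.items() if rmin <= lo and hi <= rmax]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        lo, hi = quality_range(handle)
    print(", ".join(guess_encodings(lo, hi)), (lo, hi))
```

Saved under a hypothetical name such as guess_fastq.py, it would run as "python guess_fastq.py your.fastq".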

There is also this script:

./SolexaQA.pl reads1.fastq

http://solexaqa.sourceforge.net/

October 3, 2011 by Raony Guimaraes on LGC Activity Report, Uncategorized

Notes

Reference datasets for SIFT, PolyPhen and ANNOVAR:

OMIM variants extracted by Omicia and provided as a track (OMICIA_auto) in the next release of the UCSC tables (http://genome-preview.ucsc.edu/…)

COSMIC rev54 (rev55 has been out for a couple of days), downloaded as a text table that I had to convert to BED with some Perl magic (ftp://ftp.sanger.ac.uk/pub/CGP/cosmic)

dbSNP was harder to get, and I am still struggling to get the full information from its cumbersome batch download system (so far only feasible through Ensembl BioMart; tip: the hg18 BioMart is at http://may2009.archive.ensembl.org/biomart/martview/). For dbSNP, I searched for records with a phenotype annotation (thanks to another colleague). This is the only available annotation for picking disease variants, but it includes many association results that are far from causative.
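For the COSMIC conversion, the main detail is that BED uses 0-based, half-open intervals, so a 1-based start has to be decreased by one while the end stays the same. The sketch below does this in Python rather than Perl and is not the script I actually used. It assumes the table stores 1-based, fully closed coordinates and that UCSC-style "chr" names are wanted; since COSMIC's real column names are not given here, they are passed in as parameters.

```python
import csv
from typing import Iterable, Iterator, Mapping


def table_to_bed(rows: Iterable[Mapping[str, str]], chrom_col: str, start_col: str,
                 end_col: str, name_col: str) -> Iterator[str]:
    """Yield BED lines from rows that use 1-based, fully closed coordinates."""
    for row in rows:
        if not row[start_col] or not row[end_col]:
            continue  # skip entries without genomic coordinates
        chrom = row[chrom_col]
        if not chrom.startswith("chr"):
            chrom = "chr" + chrom  # assumption: UCSC-style chromosome names
        start = int(row[start_col]) - 1  # BED starts are 0-based
        end = int(row[end_col])          # a closed 1-based end equals the half-open 0-based end
        yield f"{chrom}\t{start}\t{end}\t{row[name_col]}"


def convert_file(in_path: str, out_path: str, **columns: str) -> None:
    """Read a tab-separated table with a header line and write a BED file."""
    with open(in_path, newline="") as src, open(out_path, "w") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        for bed_line in table_to_bed(reader, **columns):
            dst.write(bed_line + "\n")
```

A call would look like convert_file("cosmic.tsv", "cosmic.bed", chrom_col="chrom", start_col="start", end_col="end", name_col="mutation_id"), where the file and column names are hypothetical placeholders for whatever the downloaded table actually uses.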

Cancer Datasets

http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi

Breast Cancer Datasets

http://bioinformatics.nki.nl/data.php

Raony Guimaraes

Science, Bioinformatics, Linux and Fun!
