GCCBOSC18 – 26/06/2018 Daily Notes

CWL Tutorial

https://docs.google.com/presentation/d/1Kf82kImNLucLvRsoafRWr_J1GEBzYosWuWmg6vRj7dg/edit

https://github.com/rabix/composer/pull/259

GATK Training

https://drive.google.com/drive/folders/1U6Zm_tYn_3yeEgrD1bdxye4SXf5OseIt

In order to make our time together as effective as possible, you’ll need to do a bit of homework before coming to the workshop session: download a data bundle and get GATK4 installed on your laptop. To be clear, you will not have time to do this at the start of the session so it’s imperative that you do this ahead of time.

1) Download the “gatk_bundle.zip” data bundle containing data that we will use in the hands-on exercises:
Direct link: https://drive.google.com/open?id=1ixEFNgWQBf79eRVqhKv9OGpXzGzcrbn2
Enclosing folder (containing additional material for further reading): https://drive.google.com/drive/folders/1U6Zm_tYn_3yeEgrD1bdxye4SXf5OseIt?usp=sharing

2) Install Docker on your laptop and download the GATK4 container image, which contains all the system dependencies needed to run GATK4. Please follow the instructions provided here:
https://gatkforums.broadinstitute.org/gatk/discussion/11090/how-to-run-gatk-in-a-docker-container/p1?new=1
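Once Docker itself is installed, step 2 roughly comes down to the sketch below: pull the image, then confirm it is available locally. The tag broadinstitute/gatk:4.0.5.1 is the one used in the docker run command later in these notes. The DRY_RUN variable and the run helper are illustrative assumptions, not part of the official instructions. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
# Image tag taken from the docker run command later in these notes.
GATK_IMAGE="broadinstitute/gatk:4.0.5.1"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Download the GATK4 container image.
run docker pull "$GATK_IMAGE"
# List local copies of the image to confirm the download worked.
run docker images broadinstitute/gatk
```

Setting DRY_RUN=0 before running the snippet executes the two docker commands for real.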

If for whatever reason you are unable to follow the docker installation instructions, the recommended alternative is to use the Conda environment that we provide to manage dependencies, as described in the github repository README:
https://github.com/broadinstitute/gatk/blob/master/README.md

And if that doesn’t work for you, for the purposes of this workshop you can just get the GATK package, as long as you make sure you have Java 8 installed on your laptop: https://github.com/broadinstitute/gatk/releases/download/4.0.5.1/gatk-4.0.5.1.zip
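For this last fallback, a quick way to check the Java requirement is sketched below. It assumes the first line of the `java -version` output contains a string like version "1.8.0_171", which is the usual form for Java 8. The is_java8 helper is a hypothetical name introduced here for illustration.

```shell
# Succeeds if the given `java -version` banner line reports Java 8 (1.8.x).
is_java8() {
  case "$1" in
    *'version "1.8.'*) return 0 ;;
    *) return 1 ;;
  esac
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes its banner to stderr, so redirect it to stdout.
  banner="$(java -version 2>&1 | head -n 1)"
  if is_java8 "$banner"; then
    echo "Java 8 found: $banner"
  else
    echo "Java is installed but does not look like version 8: $banner"
  fi
else
  echo "java not found on PATH"
fi
```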

Thank you and see you in Portland!

GATK HaplotypeCaller
GermlineCNVCaller
SVDiscovery
Mutect
ModelSegments

CNN, gCNV Germline CNV Calling
Probabilistic Graphical Models

THIS IS THE GATK WORKSHOP BUNDLE FOR MARCH 2018

The materials are now ready for download. The gatk_bundle.zip package contains the data that is used in the hands-on exercises. The “worksheets” directory contains the exercise instructions. The “dayX” directories contain all the presentation slide decks from the workshop.

== LINKS FOR SHARING ==

PDFs and gatk_bundle: http://broad.io/gatk-1803
Installation prep: https://broad.io/gatk-w-prep

PairHMM performance depends on the machine you are running on.

15k genomes in 2 weeks.
76k genomes WGS processing
GenomicsDB handles about 100k genomes, but it still needs some work to go beyond that.

https://software.broadinstitute.org/gatk/documentation/article?id=11090

docker run -v /path/gatk_data

https://software.broadinstitute.org/gatk/blog?id=11398

Somatic Variant Analysis
Call Variants per Sample
HaplotypeCaller in GVCF mode

Alibaba Cloud
AWS
Azure
Google Cloud Platform
IBM Cloud

Not only for the cloud
BIGStack* 2.0

docker run -v /home/raony/gccbosc/gatk/gatk_bundle/2-germline/:/gatk/gatk_data -it broadinstitute/gatk:4.0.5.1
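The host path in the command above is specific to one laptop. Below is a minimal sketch of the same command with the bundle location factored out. BUNDLE_DIR is a hypothetical variable that should point at wherever gatk_bundle.zip was unpacked, and the default value is only a placeholder. The command is printed rather than executed so it can be inspected first.

```shell
# Hypothetical location of the unpacked bundle; adjust to your machine.
BUNDLE_DIR="${BUNDLE_DIR:-$HOME/gatk_bundle}"
GATK_IMAGE="broadinstitute/gatk:4.0.5.1"

# Mount the germline exercise data at /gatk/gatk_data inside the container.
DOCKER_CMD="docker run -v $BUNDLE_DIR/2-germline/:/gatk/gatk_data -it $GATK_IMAGE"

# Print the command for inspection; paste it into a terminal to start the container.
echo "$DOCKER_CMD"
```

This simple form assumes the bundle path contains no spaces.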

gatk HaplotypeCaller -R /gatk/gatk_data/ref/ref.fasta -I /gatk/gatk_data/bams/mother.bam -O /gatk/gatk_data/sandbox/variants.vcf
Using GATK jar /gatk/build/libs/gatk-package-4.0.5.1-local.jar

gatk ValidateSamFile -I bams/mother.bam -MODE SUMMARY

gatk --java-options "-Xmx4G" MarkDuplicatesSpark -R ref/ref.fasta -I bams/mother.bam -O sandbox/mother_dedup.bam -M sandbox/metrics.txt -- --spark-master local[*]
Using GATK jar /gatk/build/libs/gatk-package-4.0.5.1-local.jar

gatk --java-options "-Xmx4G" HaplotypeCaller -R /gatk/gatk_data/ref/ref.fasta -I /gatk/gatk_data/bams/mother.bam -O /gatk/gatk_data/sandbox/mother.g.vcf -ERC GVCF

gatk --java-options "-Xmx4G" HaplotypeCaller -R /gatk/gatk_data/ref/ref.fasta -I /gatk/gatk_data/bams/father.bam -O /gatk/gatk_data/sandbox/father.g.vcf -ERC GVCF
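Since the two HaplotypeCaller commands differ only in the sample name, the per-sample step can be sketched as a loop. This assumes it runs inside the container with the bundle mounted at /gatk/gatk_data, as in the docker run command earlier. The call_gvcf function and the DRY_RUN switch are illustrative additions so the commands can be previewed on a machine without GATK.

```shell
DATA=/gatk/gatk_data
DRY_RUN="${DRY_RUN:-1}"

# Build and (optionally) run HaplotypeCaller in GVCF mode for one sample.
call_gvcf() {
  sample="$1"
  # Reuse the positional parameters to hold the full command line.
  set -- gatk --java-options "-Xmx4G" HaplotypeCaller \
    -R "$DATA/ref/ref.fasta" \
    -I "$DATA/bams/$sample.bam" \
    -O "$DATA/sandbox/$sample.g.vcf" \
    -ERC GVCF
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"      # preview only
  else
    "$@"           # actually run GATK
  fi
}

for sample in mother father; do
  call_gvcf "$sample"
done
```

With DRY_RUN=0 inside the container, the loop writes mother.g.vcf and father.g.vcf to the sandbox directory.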

There is a difference of 10 reads between the results of MarkDuplicates and MarkDuplicatesSpark; the GATK team is still trying to explain it.

7 different levels of certification
Stringent Options Available

export GATK_GCS_STAGING=gs://gatk-jar-cache/
gatk MarkDuplicatesSpark -R gs://gatk-workshops/GCCBOSC2018/ref/ref.fasta -I gs://gatk-workshops/GCCBOSC2018/ref/ref.fasta -O mother_dedup.bam -M metrics.txt -- --spark-runner GCS --cluster aardvark-01

https://gccbosc2018.sched.com/overview/type/B.+Conference/Birds-of-a-Feather

 

Raony Guimaraes