June 26, 2016 by Raony Guimaraes on Uncategorized

The Feynman Technique: The Most Efficient Way to Learn Anything


There are four simple steps to the Feynman Technique, which I’ll explain below:

Choose a Concept
Teach it to a Toddler
Identify Gaps and Go Back to The Source Material
Review and Simplify

***

If you’re not learning, you’re standing still. So what’s the best way to learn new subjects and identify gaps in our existing knowledge?

Two Types of Knowledge

There are two types of knowledge, and most of us focus on the wrong one. The first type of knowledge focuses on knowing the name of something. The second focuses on knowing something. These are not the same thing. The famous Nobel Prize-winning physicist Richard Feynman understood the difference between knowing something and knowing the name of something, and it’s one of the most important reasons for his success. In fact, he created a formula for learning that ensured he understood something better than everyone else.

It’s called the Feynman Technique and it will help you learn anything faster and with greater understanding. Best of all, it’s incredibly easy to implement.

Source: The Feynman Technique: The Most Efficient Way to Learn Anything

June 24, 2016 by Raony Guimaraes on Uncategorized

Genome Differential Compressor


GDC—What is it?

Genome Differential Compressor is a utility designed for compression of genome collections from the same species. Such collections can be huge, e.g., a few (or tens of) gigabytes, so the need for a robust data compression tool is clear. Universal compression programs like gzip or bzip2 might be used for this purpose, but a specialized tool can work much better, since a universal compressor does not exploit the properties of such data sets, e.g., long approximate repetitions at long distances.
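To make the idea of exploiting long repetitions concrete, here is a minimal sketch of encoding one sequence as copies from a reference sequence plus a few literal characters. It is not GDC's actual algorithm; the function names (`build_index`, `encode`, `decode`), the seed length `k`, and the toy sequences are all assumptions made for illustration.

```python
def build_index(reference, k):
    """Map every k-mer of the reference to the positions where it occurs."""
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], []).append(i)
    return index


def encode(target, reference, k=4):
    """Greedily describe target as ("copy", pos, length) and ("lit", char) ops."""
    index = build_index(reference, k)
    ops = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        # Try every reference position that shares the next k-mer and extend it.
        for pos in index.get(target[i:i + k], []):
            length = 0
            while (i + length < len(target)
                   and pos + length < len(reference)
                   and target[i + length] == reference[pos + length]):
                length += 1
            if length > best_len:
                best_pos, best_len = pos, length
        if best_len >= k:
            ops.append(("copy", best_pos, best_len))
            i += best_len
        else:
            # No usable match: store the character itself.
            ops.append(("lit", target[i]))
            i += 1
    return ops


def decode(ops, reference):
    """Rebuild the target from the ops and the same reference."""
    out = []
    for op in ops:
        if op[0] == "copy":
            _, pos, length = op
            out.append(reference[pos:pos + length])
        else:
            out.append(op[1])
    return "".join(out)


# Hypothetical toy data: the target differs from the reference by one base.
reference = "ACGTTAGCCGATAGGCTAACGGTC"
target = reference[:10] + "T" + reference[11:]
ops = encode(target, reference)
print(ops)
print(decode(ops, reference) == target)
```

Two genomes of the same species differ in only a small fraction of positions, so most of the second genome collapses into a handful of long copy operations. A universal compressor with a limited window cannot see repetitions that far apart.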

GDC 2

The architecture of GDC 2

GDC 2 is designed as a C++ application. The key features of the software are:

compression of collections of genomes in FASTA format,
decompression of the whole collection,
decompression of only a single genome without decompressing the complete collection,

How good is GDC?

Compression factor

In terms of compression factor (the ability to reduce the file size), GDC is usually much better than universal compressors and other specialized genome compressors. Its compression factors for some test data sets are:

∼9500 — on a collection of 1092 diploid genomes of H. sapiens from the 1000 Genomes Project,
∼590 — on a collection of 785 genomes of A. thaliana from the 1001 Genomes Project.
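For a sense of what these factors mean in practice, here is a small arithmetic sketch. It assumes the usual definition of compression factor as original size divided by compressed size, which the text implies but does not state. The 100 GB input size is a hypothetical round number, not a figure from the GDC benchmarks.

```python
def compression_factor(original_bytes, compressed_bytes):
    """Assumed definition: how many times smaller the output is."""
    return original_bytes / compressed_bytes


def compressed_size(original_bytes, factor):
    """Size left after compressing at the given factor."""
    return original_bytes / factor


GB = 10**9
hypothetical_input = 100 * GB  # illustrative only, not a measured collection size

for name, factor in (("H. sapiens", 9500), ("A. thaliana", 590)):
    size_mb = compressed_size(hypothetical_input, factor) / 10**6
    print(f"{name}: factor {factor} leaves about {size_mb:.1f} MB per 100 GB")
```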

Compression and decompression speed

The compression speed of GDC varies depending on the data, but for the data sets mentioned above it ranges from 95 to 200 MB/s on a computer equipped with a 6-core Intel i7 4930K 3.4 GHz processor. Decompression speed depends on disk speed and reaches up to 1000 MB/s on the same system.

June 24, 2016 by Raony Guimaraes on Uncategorized

All about the base | The Economist

New businesses eye the opportunities in managing genome data

THE project to understand the human genome has long promised to revolutionise the way that diseases are diagnosed, drugs are designed and even the way that medicine is practised. An ability to interpret human genetic information holds the promise of doing everything from predicting which drugs will work on a particular patient to identifying a person’s predisposition to develop diseases.

Genomic information is already transforming some medical practices. Sequencing has changed the way that fetuses are screened for Down’s syndrome, from a risky invasive test to one where abnormalities in fetal DNA can be picked up from blood drawn from the mother. In time this sort of method will extend to other genetic disorders and other medical applications. One area of promise is treating some types of cancer. Using blood tests to detect genetic changes in tumours could allow doctors to discover more quickly when drugs are no longer effective. This is so promising that there is already speculation that performing such “liquid” biopsies could be an $11 billion business by 2022.

Source: All about the base | The Economist

June 24, 2016 by Raony Guimaraes on Uncategorized

The wisdom of (smaller) crowds | Santa Fe Institute

When guessing the weight of an ox or estimating how many marbles fill a jar, the many have been shown to be smarter than the few. These collective displays of intelligence have been dubbed ‘the wisdom of crowds,’ but exactly how many people make a crowd wise?

New research by SFI Professor Mirta Galesic and her colleagues from the Max Planck Institute for Human Development in Berlin suggests that larger crowds do not always produce wiser decisions. In fact, when it comes to qualitative decisions such as “Which candidate will win the election” or “which diagnosis fits the patient’s symptoms,” moderately-sized ‘crowds,’ around five to seven members, are likely to outperform larger ones. In the real world, these moderately-sized crowds manifest as physician teams making medical diagnoses; top bank officials forecasting unemployment, economic growth, or inflation; and panels of election forecasters predicting political wins.

“When we ask ‘how many people should we have in this group?’ the impulse might be to create as big a group as possible because everyone’s heard of the wisdom of crowds,” Galesic says. “But in many real world situations, it’s actually better to have a group of moderate size.”

Where previous research on collective intelligence deals mainly with decisions of ‘how much’ or ‘how many,’ the current study applies to ‘this or that’ decisions under a majority vote. The researchers mathematically modeled group accuracy under different group sizes and combinations of task difficulties. They found that in situations similar to a real world expert panel, where group members encounter a combination of mostly easy tasks peppered with more difficult ones, small groups proved more accurate than larger ones.
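The effect described above can be reproduced with a very small model. The sketch below is not the paper's model. It assumes independent voters, a simple majority rule with odd group sizes, and made-up accuracies: 0.8 on easy tasks, 0.4 on hard tasks, with 75% of tasks being easy.

```python
from math import comb


def majority_accuracy(n, p):
    """Probability that a strict majority of n independent voters is right.

    Each voter is right with probability p; n is assumed to be odd.
    """
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))


def expected_accuracy(n, p_easy, p_hard, share_easy):
    """Average majority accuracy over a mix of easy and hard tasks."""
    return (share_easy * majority_accuracy(n, p_easy)
            + (1 - share_easy) * majority_accuracy(n, p_hard))


# Hypothetical parameters chosen for illustration.
for n in (1, 5, 11, 51, 101):
    print(n, round(expected_accuracy(n, 0.8, 0.4, 0.75), 3))
```

On easy tasks a handful of voters is already almost always right, so adding members gains little. On hard tasks, where each voter is more likely wrong than right, every extra member makes the majority more reliably wrong. With these assumed numbers, the group of 11 scores higher than both a single individual and the group of 101.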

“In the real world we often don’t know whether a group will always encounter only easy or only difficult tasks,” Galesic says. “And in many real world situations, an expert group will encounter a combination of mostly (for them) easy tasks and a few difficult tasks. In these circumstances, moderately-sized crowds will perform better than larger groups or individuals. Organizations might take this research to heart when designing groups to solve a series of problems.”

Read the paper on ResearchGate (May 30, 2016)

Source: The wisdom of (smaller) crowds | Santa Fe Institute

June 24, 2016 by Raony Guimaraes on Plus

PHD Comics: The Beardening, Pt. 2

Link to Piled Higher and Deeper

June 24, 2016 by Raony Guimaraes on Uncategorized

AI, Apple and Google

(Note – for a good introduction to the history and current state of AI, see my colleague Frank Chen’s presentation here.)

In the last couple of years, magic started happening in AI. Techniques started working, or started working much better, and new techniques have appeared, especially around machine learning (‘ML’), and when those were applied to some long-standing and important use cases we started getting dramatically better results. For example, the error rates for image recognition, speech recognition and natural language processing have collapsed to close to human rates, at least on some measurements.

So you can say to your phone: ‘show me pictures of my dog at the beach’ and a speech recognition system turns the audio into text, natural language processing takes the text, works out that this is a photo query and hands it off to your photo app, and your photo app, which has used ML systems to tag your photos with ‘dog’ and ‘beach’, runs a database query and shows you the tagged images. Magic.

There are really two things going on here – you’re using voice to fill in a dialogue box for a query, and that dialogue box can run queries that might not have been possible before. Both of these are enabled by machine learning, but they’re built quite separately, and indeed the most interesting part is not the voice but the query. In fact, the important structural change behind being able to ask for ‘Pictures with dogs at the beach’ is not that the computer can find it but that the computer has worked out, itself, how to find it. You give it a million pictures labelled ‘this has a dog in it’ and a million labelled ‘this doesn’t have a dog’ and it works out how to work out what a dog looks like. Now, try that with ‘customers in this data set who were about to churn’, or ‘this network had a security breach’, or ‘stories that people read and shared a lot’. Then try it without labels (‘unsupervised’ rather than ‘supervised’ learning).
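As a toy illustration of "working out how to work out what a dog looks like" from labelled examples, here is a nearest-centroid classifier in plain Python. It is a deliberately simple stand-in and not how production image recognition works. The two numeric features per picture and all the values are hypothetical.

```python
def train_centroids(examples):
    """Average the feature vectors of each label (the 'learning' step)."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical labelled data: two made-up features per picture.
labelled = [
    ([0.9, 0.8], "dog"),
    ([0.8, 0.9], "dog"),
    ([0.1, 0.2], "no dog"),
    ([0.2, 0.1], "no dog"),
]
model = train_centroids(labelled)
print(predict(model, [0.85, 0.7]))
print(predict(model, [0.15, 0.3]))
```

Nobody wrote a rule for what separates the two classes. The criteria came from the labelled examples, which is the structural change the paragraph above describes.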

Today you would spend hours or weeks in data analysis tools looking for the right criteria to find these, and you’d need people doing that work – sorting and resorting that Excel table and eyeballing for the weird result, metaphorically speaking, but with a million rows and a thousand columns. Machine learning offers the promise that a lot of very large and very boring analyses of data can be automated – not just running the search, but working out what the search should be to find the result you want.

That is, the eye-catching demos of speech interfaces or image recognition are just the most visible demos of the underlying techniques, but those have much broader applications – you can also apply them to a keyboard, a music recommendation system, a network security model or a self-driving car. Maybe.

Source: AI, Apple and Google — Benedict Evans

June 23, 2016 by Raony Guimaraes on Uncategorized

How Google is Remaking Itself as a “Machine Learning First” Company

If you want to build artificial intelligence into every product, you better retrain your army of coders. Check.

“The tagline is, Do you want to be a machine learning ninja?” says Christine Robson, a product manager for Google’s internal machine learning efforts, who helps administer the program. “So we invite folks from around Google to come and spend six months embedded with the machine learning team, sitting right next to a mentor, working on machine learning for six months, doing some project, getting it launched and learning a lot.”

For Holgate, who came to Google almost four years ago with a degree in computer science and math, it’s a chance to master the hottest paradigm of the software world: using learning algorithms (“learners”) and tons of data to “teach” software to accomplish its tasks. For many years, machine learning was considered a specialty, limited to an elite few. That era is over, as recent results indicate that machine learning, powered by “neural nets” that emulate the way a biological brain operates, is the true path towards imbuing computers with the powers of humans, and in some cases, super humans. Google is committed to expanding that elite within its walls, with the hope of making it the norm. For engineers like Holgate, the ninja program is a chance to leap to the forefront of the effort, learning from the best of the best. “These people are building ridiculous models and have PhD’s,” she says, unable to mask the awe in her voice. She’s even gotten over the fact that she is actually in a program that calls its students “ninjas.” “At first, I cringed, but I learned to accept it,” she says.

Source: How Google is Remaking Itself as a “Machine Learning First” Company — Backchannel

June 22, 2016 by Raony Guimaraes on Uncategorized

Design and synthesis of a minimal bacterial genome | Science


Designing and building a minimal genome

A goal in biology is to understand the molecular and biological function of every gene in a cell. One way to approach this is to build a minimal genome that includes only the genes essential for life. In 2010, a 1079-kb genome based on the genome of Mycoplasma mycoides (JCV-syn1.0) was chemically synthesized and supported cell growth when transplanted into cytoplasm. Hutchison III et al. used a design, build, and test cycle to reduce this genome to 531 kb (473 genes). The resulting JCV-syn3.0 retains genes involved in key processes such as transcription and translation, but also contains 149 genes of unknown function.

Source: Design and synthesis of a minimal bacterial genome | Science

June 21, 2016 by Raony Guimaraes on Uncategorized

Web Service Efficiency at Instagram with Python — Instagram Engineering

Instagram currently features the world’s largest deployment of the Django web framework, which is written entirely in Python. We initially…

Source: Web Service Efficiency at Instagram with Python — Instagram Engineering

June 20, 2016 by Raony Guimaraes on Plus

Docker 1.12: Now with Built-in Orchestration!

Three years ago, Docker made an esoteric Linux kernel technology called containerization simple and accessible to everyone. Today, we are doing the same for container orchestration. Container orch…

Raony Guimaraes

Science, Bioinformatics, Linux and Fun!

Month: June 2016
