Start here to learn R!

Ready, set, go!

On R-exercises, you will find hundreds of exercises that will help you to learn R. We’ve bundled them into exercise sets, where each set covers a specific concept or function. An exercise set typically contains about 10 exercises, progressing from easy to somewhat more difficult. In order to give you a full picture of all the amazing content on this website, we’ve categorized all sets into broader topics below.

If you’re completely new to R, we suggest you simply start with the first topic, “Vectors”. Once you’ve worked through all the exercise sets, from top to bottom, you should have a fair amount of knowledge of, and practical experience with, R. Of course, those of you who are already familiar with R can jump straight to any of the topics below.

Source: Start here to learn R!

 

Approaching (Almost) Any Machine Learning Problem

An average data scientist deals with loads of data daily. Some say over 60-70% of their time is spent on data cleaning, munging, and bringing the data into a format suitable for applying machine learning models. This post focuses on the second part, i.e., applying machine learning models, including the preprocessing steps. The pipelines discussed in this post come as a result of over a hundred machine learning competitions that I’ve taken part in. It must be noted that the discussion here is quite general yet very useful; far more complicated methods also exist and are practised by professionals.

We will be using Python!
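
As a rough sketch of what such a pipeline can look like, here is a minimal example that chains a preprocessing step with a model and evaluates it by cross-validation. It assumes scikit-learn; the synthetic data, the scaler, and the classifier are placeholders chosen for illustration, not the author’s exact setup.

# A minimal preprocessing + model pipeline, assuming scikit-learn is installed.
# The synthetic data and chosen estimator are illustrative placeholders only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data standing in for a cleaned feature matrix X and label vector y.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Chain preprocessing and the model so both are fit only on the training folds.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=200, random_state=0)),
])

# Evaluate with cross-validation rather than a single train/test split.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

Keeping the preprocessing inside the pipeline means it is refit on each training fold, which avoids leaking information from the held-out data.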

Source: Approaching (Almost) Any Machine Learning Problem | Abhishek Thakur | No Free Hunch

 

Scientists pave the way for large-scale storage at the atomic level

Individual atoms offer ultra-dense information storage

What if “we can arrange the atoms the way we want; the very atoms, all the way down”? So asked the physicist Richard Feynman in an influential 1959 lecture called “There’s Plenty of Room at the Bottom.” This manipulation would mean that information, like text, could be written using atoms themselves. In his speech, Feynman predicted that the entire Encyclopædia Britannica could be written on the head of a pin.

Just over three decades later, a group of scientists at IBM managed exactly that. They were able to write their company’s name using 35 xenon atoms resting on a sheet of nickel—the first demonstration of precise atomic placement. Individual atoms, though, tend to jiggle around. They jiggle less as the temperature drops, so to keep the atoms in place, the researchers had to cool them to -269°C, just 4°C above absolute zero, the coldest temperature physically possible. That was so expensive that writing more information than three letters did not make sense.

Now, a team of researchers led by Sander Otte at Delft University of Technology, in the Netherlands, has devised a better way to keep the atoms in place, paving the way for large-scale storage at the atomic level. As they report in Nature Nanotechnology, the group was able to store an entire paragraph of text (1 kilobyte of data) at the relatively balmy temperature of -196°C. That may not sound like a big change, but such temperatures can be achieved with liquid-nitrogen cooling, which is much cheaper than the liquid helium used in the IBM experiment.

Source: Scientists pave the way for large-scale storage at the atomic level

 

The 7 biggest problems facing science, according to 270 scientists – Vox

These are dark times for science, so we asked hundreds of researchers how to fix it.

“Science, I had come to learn, is as political, competitive, and fierce a career as you can find, full of the temptation to find easy paths.” — Paul Kalanithi, neurosurgeon and writer (1977–2015)

Science is in big trouble. Or so we’re told.

In the past several years, many scientists have become afflicted with a serious case of doubt — doubt in the very institution of science.

As reporters covering medicine, psychology, climate change, and other areas of research, we wanted to understand this epidemic of doubt. So we sent scientists a survey asking this simple question: If you could change one thing about how science works today, what would it be and why?

We heard back from 270 scientists all over the world, including graduate students, senior professors, laboratory heads, and Fields Medalists. They told us that, in a variety of ways, their careers are being hijacked by perverse incentives. The result is bad science.

The scientific process, in its ideal form, is elegant: Ask a question, set up an objective test, and get an answer. Repeat. Science is rarely practiced to that ideal. But Copernicus believed in that ideal. So did the rocket scientists behind the moon landing.

But nowadays, our respondents told us, the process is riddled with conflict. Scientists say they’re forced to prioritize self-preservation over pursuing the best questions and uncovering meaningful truths.

“I feel torn between asking questions that I know will lead to statistical significance and asking questions that matter,” says Kathryn Bradshaw, a 27-year-old graduate student of counseling at the University of North Dakota.

Today, scientists’ success often isn’t measured by the quality of their questions or the rigor of their methods. It’s instead measured by how much grant money they win, the number of studies they publish, and how they spin their findings to appeal to the public.

Scientists often learn more from studies that fail. But failed studies can mean career death. So instead, they’re incentivized to generate positive results they can publish. And the phrase “publish or perish” hangs over nearly every decision. It’s a nagging whisper, like a Jedi’s path to the dark side.

“Over time the most successful people will be those who can best exploit the system,” Paul Smaldino, a cognitive science professor at the University of California, Merced, says.

Source: The 7 biggest problems facing science, according to 270 scientists – Vox

 

Don’t Replace People. Augment Them. — What’s The Future of Work?

If we let machines put us out of work, it will be because of a failure of imagination and the will to make a better future!

“Could a machine do your job?” ask Michael Chui, James Manyika, and Mehdi Miremadi in a recent McKinsey Quarterly article, Where Machines Could Replace Humans and Where They Can’t Yet. The authors try to put the current worries about this question in perspective:

“As automation technologies such as machine learning and robotics play an increasingly great role in everyday life, their potential effect on the workplace has, unsurprisingly, become a major focus of research and public concern. The discussion tends toward a Manichean guessing game: which jobs will or won’t be replaced by machines?

In fact, as our research has begun to show, the story is more nuanced. While automation will eliminate very few occupations entirely in the next decade, it will affect portions of almost all jobs to a greater or lesser degree, depending on the type of work they entail.”

Source: Don’t Replace People. Augment Them. — What’s The Future of Work? — Medium

 

A Course in Machine Learning

Machine learning is the study of algorithms that learn from data and experience. It is applied in a vast variety of application areas, from medicine to advertising, from military to pedestrian. Any area in which you need to make sense of data is a potential consumer of machine learning.

CIML is a set of introductory materials that covers most major aspects of modern machine learning (supervised learning, unsupervised learning, large margin methods, probabilistic modeling, learning theory, etc.). Its focus is on broad applications with a rigorous backbone. A subset can be used for an undergraduate course; a graduate course could probably cover the entire material and then some.

Individual Chapters:
  1. Front Matter
  2. Decision Trees
  3. Geometry and Nearest Neighbors
  4. The Perceptron
  5. Machine Learning in Practice
  6. Beyond Binary Classification
  7. Linear Models
  8. Probabilistic Modeling
  9. Neural Networks
  10. Kernel Methods
  11. Learning Theory
  12. Ensemble Methods
  13. Efficient Learning
  14. Unsupervised Learning
  15. Expectation Maximization
  16. Semi-Supervised Learning
  17. Graphical Models
  18. Online Learning
  19. Structured Learning
  20. Bayesian Learning
  21. Back Matter
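
As a small taste of the early chapters, the sketch below implements the classic perceptron update rule (the topic of chapter 4) in Python. The toy data and the fixed number of epochs are illustrative choices, not code taken from CIML itself.

# A minimal perceptron training loop; the data and epoch count are illustrative.
import numpy as np

def train_perceptron(X, y, epochs=10):
    """Learn weights w and bias b so that sign(X @ w + b) matches labels y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # misclassified (or on the boundary)
                w += yi * xi                   # nudge the hyperplane toward this point
                b += yi
    return w, b

# Tiny linearly separable toy problem.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # should reproduce y: [ 1.  1. -1. -1.]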

Source: A Course in Machine Learning

 

JupyterLab: the next generation of the Jupyter Notebook

Learning the lessons of the Jupyter Notebook

It’s been a long time in the making, but today we want to start engaging our community with an early (pre-alpha) release of the next generation of the Jupyter Notebook application, which we are calling JupyterLab.

At the SciPy 2016 conference, Brian Granger and Jason Grout presented the overall vision of the system (a PDF of the talk slides and a video are available) and gave a demo of its current capabilities, which are rapidly evolving and improving.

Source: JupyterLab: the next generation of the Jupyter Notebook