Unlearning descriptive statistics

If you’ve ever used an arithmetic mean, a Pearson correlation or a standard deviation to describe a dataset, I’m writing this for you. Better numbers exist to summarize location, association and spread: numbers that are easier to interpret and that don’t act up with wonky data and outliers.

Statistics professors tend to gloss over basic descriptive statistics because they want to spend as much time as possible on margins of error and t-tests and regression. Fair enough, but the result is that it’s easier to find a machine learning expert than someone who can talk about numbers. Forget what you think you know about descriptives and let me give you a whirlwind tour of the real stuff.

The average

The arithmetic mean is one of many measures of central tendency. One particularly useful feature of the mean is that, whenever we lack outside information like a scientific theory, it is our best possible guess for what to expect in the future. Count up all of the rainy days in your area for as many years as you have data for, divide by the number of years, and that’s your best bet for how many rainy days to expect this year. Multiply by 10, and that’s how many rainy days you can expect in a decade.
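To make the rainy-day arithmetic above concrete, here is a minimal sketch in Python. The yearly counts are made up purely for illustration, and the helper name `expected_rainy_days` is my own invention, not something from a library.

```python
from statistics import mean

# Hypothetical counts of rainy days per year (made-up numbers, for illustration only).
rainy_days_per_year = [112, 98, 105, 120, 101]


def expected_rainy_days(history, years=1):
    """Forecast rainy days over `years` years as the historical mean times `years`."""
    return mean(history) * years


# Best guess for a single year: total rainy days divided by the number of years.
per_year = expected_rainy_days(rainy_days_per_year)

# Best guess for a decade: the same yearly mean, multiplied by 10.
per_decade = expected_rainy_days(rainy_days_per_year, years=10)

print(f"Expected rainy days this year: {per_year:.1f}")
print(f"Expected rainy days this decade: {per_decade:.1f}")
```

The forecast uses nothing but the historical record, which is the point: with no theory to lean on, the mean of what has already happened stands in for what comes next.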
