A Large Scale Study of Programming Languages and Code Quality in Github

What is the effect of programming languages on software quality?
This question has long been a topic of debate.
In this study, we gather a very large data set from GitHub (729
projects, 80 million SLOC, 29,000 authors, 1.5 million commits,
in 17 languages) in an attempt to shed some empirical light on
this question. This reasonably large sample size allows us to use a
mixed-methods approach, combining multiple regression modeling
with visualization and text analytics, to study the effect of language
features such as static vs. dynamic typing and strong vs. weak typing on
software quality. By triangulating findings from different methods,
and controlling for confounding effects such as team size, project
size, and project history, we report that language design does have a
significant but modest effect on software quality. Most notably,
strong typing appears to be modestly better than weak typing,
and among functional languages, static typing is also somewhat better
than dynamic typing. We also find that functional languages are
somewhat better than procedural languages. It is worth noting that
these modest effects arising from language design are overwhelmingly
dominated by process factors such as project size, team
size, and commit size. However, we hasten to caution the reader
that even these modest effects might well be due to other,
intangible process factors, e.g., the preference of certain personality
types for functional, statically typed, and strongly typed languages.
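To make "controlling for confounding effects" concrete, here is a toy sketch in Python. It uses synthetic, hypothetical data that does not come from the study. In this data, defect counts depend only on team size, while a hypothetical language flag happens to be more common in larger teams. A plain least-squares fit without the control attributes a spurious effect to the language flag. Adding team size as a covariate makes that effect disappear. This is a simplified ordinary-least-squares illustration, not the study's actual model, variables, or data; the helper `ols` and all variable names are illustrative assumptions.

```python
# Toy illustration of "controlling for a confounder" with ordinary least squares.
# All data below is synthetic and hypothetical; it is NOT from the study.

def ols(rows, y):
    """Return least-squares coefficients b minimizing ||X b - y||^2 by
    solving the normal equations (X^T X) b = X^T y with Gaussian elimination."""
    k = len(rows[0])
    # Normal-equation matrix X^T X and right-hand side X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Augmented matrix [X^T X | X^T y], reduced to upper-triangular form
    # using partial pivoting (largest absolute pivot in each column).
    a = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution on the upper-triangular system.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


# Hypothetical projects: team size, plus a language flag that tends to be 1
# for larger teams (correlated with team size, but not a linear function of it).
team = [1, 2, 3, 4, 5, 6, 7, 8]
lang = [0, 0, 0, 1, 0, 1, 1, 1]
# By construction, defects depend ONLY on team size, not on the language flag.
defects = [3 + 2 * t for t in team]

# Model 1: defects ~ intercept + lang (no control for team size).
naive = ols([[1, flag] for flag in lang], defects)
# Model 2: defects ~ intercept + lang + team (team size as a control).
controlled = ols([[1, flag, t] for flag, t in zip(lang, team)], defects)

print("naive lang coefficient:", naive[1])  # 7.0: a spurious "language effect"
print("controlled coefficients:", controlled)
```

Without the control, the language flag gets a coefficient of 7.0. With team size in the model, the fitted coefficients come out close to 3, 0, and 2, so the flag's apparent effect vanishes. The normal-equations solver keeps the sketch dependency-free. For real analyses, a numerically stabler QR-based solver or an established statistics package would be preferable.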

Source: lang_study.pdf

Raony Guimaraes