October 3, 2016 by Raony Guimaraes on Uncategorized

Bubble sheet multiple choice scanner and test grader using OMR, Python and OpenCV – PyImageSearch

Discover how to build a bubble sheet multiple choice scanner and test grader using Optical Mark Recognition (OMR) along with Python, OpenCV, and computer vision.

Source: Bubble sheet multiple choice scanner and test grader using OMR, Python and OpenCV – PyImageSearch

October 3, 2016 by Raony Guimaraes on Uncategorized

Why’s that company so big? I could do that in a weekend

I can’t think of a single large software company that doesn’t regularly draw internet comments of the form “What do all the employees do? I could build their product myself.” Benjamin Pollack and Jeff Atwood called out people who do that with Stack Overflow. But Stack Overflow is relatively obviously lean, so the general response is something like “oh, sure maybe Stack Overflow is lean, but FooCorp must really be bloated”. And since most people have relatively little visibility into FooCorp, for any given value of FooCorp, that sounds like a plausible statement. After all, what product could possibly require hundreds, or even thousands, of engineers?

A few years ago, in the wake of the rapgenius SEO controversy, a number of folks called for someone to write a better Google. Alex Clemmer responded that maybe building a better Google is a non-trivial problem. Considering how much of Google’s $500B market cap comes from search, and how much money has been spent by tens (hundreds?) of competitors in an attempt to capture some of that value, it seems plausible to me that search isn’t a trivial problem. But in the comments on Alex’s posts, multiple people respond and say that Lucene basically does the same thing Google does and that Lucene is poised to surpass Google’s capabilities in the next few years.

What would Lucene at Google’s size look like? If we do a naive back of the envelope calculation on what it would take to index a significant fraction of the internet (often estimated to be 1 trillion (T) or 10T documents), we might expect a 1T document index to cost something like $10B. That’s not a feasible startup, so let’s say that instead of trying to index 1T documents, we want to maintain an artisanal search index of 1B documents. Then our cost comes down to $12M/yr. That’s not so bad – plenty of startups burn through more than that every year. While we’re in the VC-funded hypergrowth mode, that’s fine, but once we have a real business, we’ll want to consider trying to save money. At $12M/yr for the index, a performance improvement that lets us trim our costs by 3% is worth $360k/yr. With those kinds of costs, it’s surely worth it to have at least one engineer working full-time on optimization, if not more.
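The savings figure is easy to check: $360k/yr is 3% of the $12M/yr index cost quoted above. Here is a minimal sketch of that arithmetic; the function name and the other percentages in the loop are illustrative assumptions, not figures from the post.

```python
# Back-of-the-envelope check of the savings quoted in the text.
# ANNUAL_INDEX_COST_USD comes from the post; everything else is illustrative.
ANNUAL_INDEX_COST_USD = 12_000_000  # $12M/yr for a 1B-document index


def annual_savings(annual_cost_usd: float, percent_saved: float) -> float:
    """Dollars saved per year if costs fall by `percent_saved` percent."""
    return annual_cost_usd * percent_saved / 100


if __name__ == "__main__":
    # 1% and 5% are hypothetical comparison points; 3% is the case in the text.
    for pct in (1, 3, 5):
        saved = annual_savings(ANNUAL_INDEX_COST_USD, pct)
        print(f"{pct}% cost reduction saves ${saved:,.0f}/yr")
```

Whether that justifies a full-time optimization engineer depends on what the engineer costs, which the excerpt does not state.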

Source: Why’s that company so big? I could do that in a weekend

October 3, 2016 by Raony Guimaraes on Uncategorized

Keynote Session: Dr. Edward Tufte – The Future of Data Analysis

Keynote

Source: Keynote Session: Dr. Edward Tufte – The Future of Data Analysis | Microsoft Machine Learning & Data Science Summit 2016 | Channel 9

October 3, 2016 by Raony Guimaraes on Uncategorized

Sam Altman’s Manifest Destiny

Is the head of Y Combinator fixing the world, or trying to take over Silicon Valley?

Source: Sam Altman’s Manifest Destiny – The New Yorker

October 2, 2016 by Raony Guimaraes on Uncategorized

Why bad science persists: Incentive malus

IN 1962 Jacob Cohen, a psychologist at New York University, reported an alarming finding. He had analysed 70 articles published in the Journal of Abnormal and Social Psychology and calculated their statistical “power” (a mathematical estimate of the probability that an experiment would detect a real effect).
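To make "power" concrete, here is a minimal sketch that computes it for a deliberately simple case: a one-sided, one-sample z-test with known unit variance. This is not Cohen's procedure or the calculation from the article. The effect size, sample sizes, and significance level are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-sided, one-sample z-test with known unit variance.

    effect_size is the true mean shift in standard-deviation units
    (an assumed quantity; in practice it is unknown and must be guessed).
    """
    std_normal = NormalDist()
    # Rejection threshold for the test statistic under the null hypothesis.
    critical_z = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the statistic is centred at effect_size * sqrt(n),
    # so power is the probability of landing above the threshold.
    return 1 - std_normal.cdf(critical_z - effect_size * sqrt(n))


if __name__ == "__main__":
    # Hypothetical medium-sized effect at a few sample sizes.
    for n in (10, 30, 100):
        print(f"n={n:>3}  power={z_test_power(0.5, n):.3f}")
```

Under these assumptions power rises with sample size, which is why small studies so often miss effects that are real.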

Source: Why bad science persists: Incentive malus | The Economist

October 2, 2016 by Raony Guimaraes on Uncategorized

Purposes, Concepts, Misfits, and a Redesign of Git

Santiago Perez De Rosso and Daniel Jackson: “Purposes, Concepts, Misfits, and a Redesign of Git”, SPLASH 2016.

Git is a widely used version control system that is powerful but complicated. Its complexity may not be an inevitable consequence of its power but rather evidence of flaws in its design. To explore this hypothesis, we analyzed the design of Git using a theory that identifies concepts, purposes, and misfits. Some well-known difficulties with Git are described, and explained as misfits in which underlying concepts fail to meet their intended purpose. Based on this analysis, we designed a reworking of Git (called Gitless) that attempts to remedy these flaws.

To correlate misfits with issues reported by users, we conducted a study of Stack Overflow questions. And to determine whether users experienced fewer complications using Gitless in place of Git, we conducted a small user study. Results suggest our approach can be profitable in identifying, analyzing, and fixing design problems.

This paper presents a detailed, well-founded critique of one of the most powerful, but frustrating, tools in widespread use today. A follow-up to earlier work published in 2013, it is distinguished from most other discussion of software design by three things:

1. It clearly describes its design paradigm, which comprises concepts (the major elements of the user’s mental model of the system), purposes (which motivate the concepts), and misfits (which are instances where concepts do not satisfy purposes, or contradict one another).
2. It lays out Git’s concepts and purposes, analyzes its main features in terms of them, and uses that analysis to identify mismatches.
3. Crucially, it then analyzes independent discussion of Git (on Stack Overflow) to see if users are stumbling over the misfits identified in step 2.

Source: Purposes, Concepts, Misfits, and a Redesign of Git

October 2, 2016 by Raony Guimaraes on Uncategorized

Gitless

Gitless: a version control system built on top of Git

Gitless is an experimental version control system built on top of Git. Many people complain that Git is hard to use. We think the problem lies deeper than the user interface, in the concepts underlying Git. Gitless is an experiment to see what happens if you put a simple veneer on an app that changes the underlying concepts. Because Gitless is implemented on top of Git (it could be considered what Git pros call a “porcelain” of Git), you can always fall back on Git. And of course the coworkers you share a repo with need never know that you’re not a Git aficionado.

Check out the documentation to get started. If you are new to version control, the documentation should be enough to get you started. If you are a Git pro looking to see what’s different from your beloved Git, you’ll be able to spot the differences by glancing through the Gitless vs. Git section.

Source: Gitless

Raony Guimaraes

Science, Bioinformatics, Linux and Fun!

Month: October 2016
