NYC subway math · Erik Bernhardsson

Apparently the MTA (the company running the NYC subway) has a real-time API. My fascination with the subway takes autistic proportions, so obviously I had to analyze some of the data. The documentation is somewhat terrible, but here’s some relevant code for how to use the API:

import os
from urllib.request import urlopen

from google.transit import gtfs_realtime_pb2

# The API key is read from the MTA_KEY environment variable.
for feed_id in [1, 2, 11]:
    feed = gtfs_realtime_pb2.FeedMessage()
    url = 'http://datamine.mta.info/mta_esi.php?key=%s&feed_id=%d' % (os.environ['MTA_KEY'], feed_id)
    response = urlopen(url)
    feed.ParseFromString(response.read())  # the response body is a serialized FeedMessage
    print(feed)

I started tracking all subway trains one day and completely forgot about it. Several weeks later I had a 3 GB data dump of all the arrivals for the L, 4, 5, 6, SI and GC lines (the latter two being the Staten Island Railway and the Grand Central Shuttle).
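The tracking script itself isn't shown here. As a rough sketch of what logging arrivals could look like (not necessarily how the dump was actually produced), the snippet below pulls predicted arrival times out of a parsed feed and appends them to a CSV file. It assumes `feed` is a `FeedMessage` parsed as in the API snippet, and relies only on the standard GTFS-realtime fields (`entity`, `trip_update`, `stop_time_update`, `arrival.time`). The function names `extract_arrivals` and `append_arrivals` are made up for illustration.

```python
import csv


def extract_arrivals(feed):
    """Yield (route_id, trip_id, stop_id, arrival_time) for each arrival in the feed.

    `feed` is assumed to be a parsed GTFS-realtime FeedMessage.
    """
    for entity in feed.entity:
        # For entities that are not trip updates, trip_update is an empty
        # default message, so the inner loop simply does nothing.
        update = entity.trip_update
        for stop in update.stop_time_update:
            # An unset arrival time defaults to 0; skip those.
            if stop.arrival.time:
                yield (update.trip.route_id, update.trip.trip_id,
                       stop.stop_id, stop.arrival.time)


def append_arrivals(feed, out):
    """Write one CSV row per arrival to the open file `out`; return the row count."""
    writer = csv.writer(out)
    count = 0
    for row in extract_arrivals(feed):
        writer.writerow(row)
        count += 1
    return count
```

Calling `append_arrivals(feed, open('arrivals.csv', 'a', newline=''))` once per poll would grow a file like the dump described above. The arrival times are POSIX timestamps, and the same trip and stop will show up in many consecutive polls, so any analysis would need to deduplicate on (trip, stop).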

Let’s do some cool stuff with this data!
