Note: This post is part of a project I completed for a Data Mining class, which used demographic data to predict fare type usage on the New York City Subway System. I’m publishing the methods I used to analyze the data, along with the ultimate results, in a series of blog posts. For background on this project, please see this post.
Background on the MTA
The Metropolitan Transportation Authority was chartered in 1968 by the State of New York as a public benefit corporation to take over and operate the failing Long Island Rail Road. The MTA also became the parent agency of the New York City Transit Authority, responsible for the New York City subway and bus systems, and took control of New York City’s bridges and tunnels from the Triborough Bridge and Tunnel Authority. By 1983, the MTA was in control of the upstate and Connecticut commuter railroad lines that have become the Metro-North Railroad.
The MTA runs the largest public transit system in the United States. New York City Transit (as the NYCTA was renamed) carries over 7 million passengers each weekday and has an operating budget of $9.1 billion. The NYCT is divided into three segments: the Subway System, the Bus System, and the Staten Island Railway. The focus of this project is fare data for the New York City Transit Subway System.
Subway Transit Fares
The NYCT has a variety of fare products to serve its large ridership. For general riders, there are single ride tickets, Pay-Per-Ride MetroCards, and Unlimited Ride MetroCards that come in either 7-day or 30-day durations. Unlimited Ride MetroCards allow riders to take unlimited trips on subways and buses for a fixed price and are targeted towards frequent users of the transit system. Paying per ride costs $2.25 with a MetroCard or $2.50 with a single ride ticket, with a free transfer from bus to subway or vice versa within a two-hour period. This is the more economical option for riders who don’t use the subway or bus often enough to recover the fixed price of an Unlimited Ride MetroCard.
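The trade-off between the two MetroCard types comes down to a break-even ride count. The sketch below is only an illustration: the $2.25 fare comes from the text above, while the pass price is a made-up placeholder rather than an actual MTA price, and free transfers are ignored.

```python
import math

PAY_PER_RIDE_FARE = 2.25  # per-ride MetroCard fare quoted in this post


def break_even_rides(unlimited_price, per_ride_fare=PAY_PER_RIDE_FARE):
    """Smallest number of rides at which a fixed-price pass costs
    no more than paying for each ride individually."""
    return math.ceil(unlimited_price / per_ride_fare)


# Hypothetical pass price, chosen only for illustration (not an MTA price).
example_pass_price = 90.00
rides_needed = break_even_rides(example_pass_price)
print(f"A ${example_pass_price:.2f} pass pays for itself at {rides_needed} rides")
```

A rider who expects to take fewer rides than this during the life of the pass is better off paying per ride.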
The NYCT offers special fare products aimed at specific demographics, including student fare MetroCards for school-age children under the age of 18, senior fares, and fares for medically disabled riders. There are also fares for special segments of the system, such as the AirTrain connecting the subway system to JFK airport and the PATH trains connecting New Jersey commuters to lower Manhattan. Because these special fare products are already demographically targeted, they were included in this analysis collectively as “Other” fare types.
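Collapsing the special products into a single “Other” bucket might look something like the sketch below. The fare-type labels and the choice of which labels count as general fares are assumptions for illustration; the labels in the MTA dataset may differ.

```python
from collections import Counter

# Hypothetical labels for the general fare products kept as their own
# categories; adjust to match the labels in the actual dataset.
GENERAL_FARES = {"pay_per_ride", "unlimited_7_day", "unlimited_30_day"}


def collapse_fares(counts):
    """Keep general fare types as-is and sum everything else into 'other'."""
    collapsed = Counter()
    for fare_type, swipes in counts.items():
        key = fare_type if fare_type in GENERAL_FARES else "other"
        collapsed[key] += swipes
    return dict(collapsed)


# Made-up counts, only to show the shape of the input and output.
sample = {"pay_per_ride": 500, "unlimited_30_day": 300, "student": 40, "senior": 25}
print(collapse_fares(sample))
```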
MTA Data Collection
The MTA collects fare data at the level of the remote station. A remote station is a collection of turnstiles serving a particular station and is uniquely identified by a remote code in the format “R###”, where each “#” is a digit from 0 to 9. One station may have multiple remote codes associated with its service. Several stations connected by passageways may share a single remote code covering all the turnstile groups at their entrances. Likewise, connected stations may have multiple remote codes, each associated with a particular portion of the station complex. This variety is likely the result of changes in station configurations over time.
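The remote code format described above is easy to check with a regular expression. This is a minimal sketch; the function name is my own and not part of any MTA tooling.

```python
import re

# "R" followed by exactly three ASCII digits, e.g. "R001".
REMOTE_CODE_PATTERN = re.compile(r"R[0-9]{3}")


def is_remote_code(value):
    """Return True if value matches the R### remote code format."""
    return REMOTE_CODE_PATTERN.fullmatch(value) is not None


for candidate in ["R001", "R45", "r123", "R1234"]:
    print(candidate, is_remote_code(candidate))
```

A check like this is useful for catching malformed rows before joining the fare data to a station list.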
Remote Code – Turnstile – Station Dependency Chart
Because one station could have many remote codes, or several stations could share one remote code, the level of analysis was moved from the remote code level to the station level. Each station was associated with its remote codes, and the counts for those remote codes were summed to reflect fare type usage at the station as a whole. This helped normalize the fare type data, particularly given the difficulty of knowing which turnstile groups were assigned to a particular remote code, which would have been necessary for matching remote codes to census tract data.
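The roll-up from remote codes to stations can be sketched as a lookup followed by a sum. This is not the exact code used in the project; the remote codes, station names, and counts below are invented for illustration, and the lookup table is assumed to have been built by hand.

```python
from collections import defaultdict


def aggregate_to_stations(remote_counts, remote_to_station):
    """Sum fare-type counts across all remote codes mapped to the same station.

    remote_counts:     {remote_code: {fare_type: count}}
    remote_to_station: {remote_code: station_name}
    """
    station_counts = defaultdict(lambda: defaultdict(int))
    for remote_code, fares in remote_counts.items():
        station = remote_to_station[remote_code]
        for fare_type, count in fares.items():
            station_counts[station][fare_type] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {station: dict(fares) for station, fares in station_counts.items()}


# Invented example: two remote codes serve "Station A", one serves "Station B".
remote_to_station = {"R901": "Station A", "R902": "Station A", "R903": "Station B"}
remote_counts = {
    "R901": {"pay_per_ride": 100, "other": 5},
    "R902": {"pay_per_ride": 50, "unlimited_30_day": 70},
    "R903": {"pay_per_ride": 20},
}
print(aggregate_to_stations(remote_counts, remote_to_station))
```

A remote code shared by several connected stations fits the same approach: the lookup simply maps it to one combined station entry.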
Excluding the PATH and AirTrain stations, there are 450 remote codes in the NYCT Subway System as reported in the MTA fare type dataset. These correspond to 421 NYCT stations. Of these 421 stations, one had four remote codes (42nd Street – Grand Central), one had three remote codes (Canal Street), and 17 had two remote codes; the remaining 402 stations had only one remote code associated with the fare type collection.
Two subway stations are excluded from the analysis. Smith St. – 9th St. was closed for the duration of the period under consideration. Aqueduct Racetrack Station on the A route is open only when there is a race and has no exit to the surrounding area, so its fare usage is unrelated to the demographics of the surrounding neighborhood.
In the next post, I’ll be looking at the spatial analysis of station locations and Census tracts.