Note: This is part of an infrequent series of posts I’m doing on Census Bureau Demographic Data for Developers based on my experience creating a database of Census Data as part of my summer internship at Patch.com.
Tomorrow, I’ll be presenting my Demographics Around 7-11 Stores in Manhattan for LocationTech NYC in a 5-minute Ignite-style talk. I’ve been wanting to update my original post with some new thoughts (not to mention some better graphs), and this was a good excuse to round off the edges with a more nuanced approach to the technique.
To briefly summarize the approach I took, if you assume an even distribution of demographics across a census tract, then you can take a weighted sample of that census tract based on the area of overlap as a percentage of the whole census tract.
For those that like formulas:
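A sketch of that weighting in symbols (the notation is introduced here for illustration and is an assumption, not the original figure): let $B$ be the area of interest around a store, $T_i$ the $i$-th census tract that overlaps it, $A(\cdot)$ the area of a shape, and $D_i$ the tract's count for one demographic variable.

$$
w_i = \frac{A(T_i \cap B)}{A(T_i)}, \qquad \hat{D}_B = \sum_i w_i \, D_i
$$

Here $w_i$ is the share of tract $i$ that falls inside the area of interest, and $\hat{D}_B$ is the estimated count for that area under the even-distribution assumption.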
The 7-11 at 14th Street and 6th Ave is a perfect example of where this is likely to work:
I take a percentage of the demographics from each census tract based on the percentage of area overlap, sum them together, and then normalize the raw numbers (rounded to the nearest whole person) as percentages over the total in the universe of analysis (usually total population or total households for the Demographic Profile data).
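Those steps can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the analysis: the function name, the dictionary keys, and the choice to round after summing (rather than per tract) are all assumptions, and the overlap areas are taken as already computed by whatever GIS tool you use.

```python
def area_weighted_profile(tracts, universe="total_population"):
    """Estimate demographics for an area of interest from overlapping tracts.

    Each tract is a dict with hypothetical keys:
      'tract_area'   - area of the whole census tract
      'overlap_area' - area of the tract inside the area of interest
      'counts'       - raw counts per variable, including the universe
    Returns (rounded counts, percentages of the universe).
    """
    totals = {}
    for tract in tracts:
        # Share of the tract that falls inside the area of interest.
        weight = tract["overlap_area"] / tract["tract_area"]
        for name, count in tract["counts"].items():
            totals[name] = totals.get(name, 0.0) + weight * count

    # Assumption: round to whole people once, after summing across tracts.
    # Note that Python's round() sends exact .5 values to the nearest even number.
    counts = {name: round(value) for name, value in totals.items()}

    base = counts.get(universe, 0)
    if base == 0:
        return counts, {}
    shares = {name: 100.0 * value / base for name, value in counts.items()}
    return counts, shares
```

With two made-up tracts, one a quarter inside the area of interest (4,000 people, 400 of them 65 or older) and one half inside (2,000 people, 500 of them 65 or older), the estimate comes to 2,000 people, 350 of them 65 or older, or 17.5% of the universe.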
Since publishing my original blog post, I’ve become much more proficient with Python and with IPython Notebook in particular. I thought I’d replace my hacked R barplots with some more sophisticated plots (courtesy of matplotlib) of the demographics of 7-11s in relation to Manhattan-wide totals:
This is taking the area around the 17 Manhattan 7-11s, performing the analysis outlined above for each demographic dimension, and comparing the results to the aggregate total for all of Manhattan. I’m plotting the difference between the Manhattan total and the number for the individual 7-11 store. The baselines are indicated in the label.
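The quantity being plotted can be sketched as a percentage-point difference per demographic dimension. The function name and the sign convention (store minus Manhattan baseline) are assumptions made for illustration; flip the subtraction if you prefer the opposite direction.

```python
def difference_from_baseline(store_shares, baseline_shares):
    """Percentage-point difference (store minus baseline) per dimension.

    Both arguments map a dimension name to a percentage. Only dimensions
    present in both dicts are compared.
    """
    return {
        name: store_shares[name] - baseline_shares[name]
        for name in baseline_shares
        if name in store_shares
    }
```

A positive value means the area around the store has a larger share of that group than Manhattan as a whole; a negative value means a smaller share.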
But not all census tracts are created equal. Since they follow the political boundaries, you have census tracts that extend over water (where people usually don’t live). Such is the case in Inwood near the 7-11 store on Dyckman Street:
The Census Bureau provides a land area field (called ALAND10) in the Demographic Profile Table (DP1) for each census tract, and I could easily use that; however, by adding a few other data sources, I can model the population distribution with even more precision.
The above shows an overlay of the NYC Parks. You can see virtually all the population in census tract 36061028700 is concentrated to the west of Broadway and between Riverside Drive and Dyckman Street, east of Staff Street. Adding the MapPluto Data gives us even more granularity on the distribution of residences in the census tract (which is what the Census Bureau is actually tracking):
Knowing the residential area in a particular census tract (from the Pluto data) gives us a better way to model the demographic distribution. If I assume an even distribution of demographics across the total residential area of a census tract, then once I calculate the proportion of that residential area falling within the area of interest, I can apply that proportion as the weighting for the tract’s demographics instead of using the raw area overlap (I haven’t implemented this yet, but it doesn’t seem hard to do).
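A minimal sketch of how that residential weighting might look, since it isn't implemented yet. Everything here is an assumption for illustration: the function name, the tuple layout, and the idea that each tax lot has already been tagged with its census tract and with whether it falls inside the area of interest (a spatial join that this sketch does not perform).

```python
from collections import defaultdict


def residential_weights(lots):
    """Per-tract share of residential area inside the area of interest.

    `lots` is an iterable of (tract_id, res_area, inside) tuples, where
    res_area is the lot's residential area and `inside` is True when the
    lot falls within the area of interest. Tracts with no residential
    area at all are left out of the result.
    """
    inside_area = defaultdict(float)
    total_area = defaultdict(float)
    for tract_id, res_area, inside in lots:
        total_area[tract_id] += res_area
        if inside:
            inside_area[tract_id] += res_area

    return {
        tract_id: inside_area[tract_id] / total
        for tract_id, total in total_area.items()
        if total > 0
    }
```

The resulting weights would then replace the area-overlap percentages when taking a share of each tract's demographics.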
Obviously populations are not evenly distributed, and there are likely concentrations of particular populations that would distort the measure. An assisted living facility outside the area of interest but within the census tract would give the area of interest a greater weighting of senior citizens than the “true” population of that area.
Using the MapPluto data, I can also look at the density of the residential area. Below is a map of ResArea (relative to that area of Manhattan), which shows the amount of residential area within each tax lot.
You can see the residential area inside the buffer is some of the least dense relative to the other residential area in the 36061028700 census tract.
I’ve never claimed this was science. It’s a hack, more precisely, a data hack. It suggests a more sophisticated way of mapping the population distribution within the census tract, allowing a better way to disaggregate the data and recombine it in meaningful ways.
I’d like to write this up as some kind of research paper, assuming it has any scientific value whatsoever. Feel free to let me know what you think in the comments section.