*This is part of a series of posts I’m doing on Census Bureau Demographic Data for Developers based on my experience creating a database of Census Data as part of my summer internship at Patch.com.*

In a previous post, I discussed the idea behind what I’m calling “spatial weighting”, which is basically taking a percentage of the demographics from a particular census tract based on the area covered by the other shape. This post explores the mechanics of doing this in PostGIS.

The basic formula for weighting is:

In PostGIS, you would do this with the following code:

To break that down a little, I’m first creating a polygon that’s the intersection of the two geometries, in this case the census tract and my other table of geometries. I’m then taking the area of that polygon. I’m then dividing that area over the total area of the census tract to get the percentage of the census tract that falls within my shape.

That percentage will be applied to all the demographics for the census tract and then summed together with all the demographics (in my **GROUP BY** statement) to provide the total demographics for the objects in my Other Table.

As an example, I’m going to generate the census demographics for the area surrounding all 17 7-Eleven stores in Manhattan, using a 100 meter buffer around each point, and then compare those demographics to those of Manhattan as a whole. You can see the full code on my Github.

I created a table with Common Table Expressions, which are basically temporary tables that are created by a query to allow successive commands to execute. I’m going to break down the key parts of the code below. You could create these each as separate tables, but by using CTEs, you can reduce the number of tables and general clutter in your database.

Here’s the code for the buffer:

The “gid” attribute is just a serial number for each of the 17 sites. I’m casting the geometry into a geography in order to input the radius in meters. For more information, check out the PostGIS documentation on ST_Buffer. Also check out my blog post on Geometries vs. Geographies or (better yet) go straight to the source at OpenGeo for their explanation.

I’m recasting the geography into a geometry here:

At this point, I’m creating the weights based on the intersection area taken over the total area of the census tract:

From there, I’m creating the aggregates in another temporary CTE table:

This is just a snippet, but it gives you the basic idea. I’m taking the particular demographic attribute, multiplying it times the intersection area divided by the total area of the tract (“weight”), summing the function because I’m grouping multiple tracts together, then rounding the result (so I don’t end up with a fractional person in my count. I’m recasting this to “numeric” because PostgreSQL will cast this into a double precision type. I have a function for calculating the median age that I’ll detail in a future post that uses values in numeric format. Since these are whole number integers anyway, I’m recasting into numeric from double precision.

At the end, here is my join statement between the tables and the GROUP BY syntax:

Here’s the payoff:

This creates my table of demographic data by 7-Eleven location, normalized over the total population for that area covered by the 7-Eleven to have a meaningful comparison between them.

But what about the “find_median_age” function? I’ll address that function in a future post.

Pingback: Spatial Demographic Data – Weighting Demographics Overview | Datapolitan