
Building a Geospatial Transit API for Mexico City

The problem

Mexico City has one of the world’s largest public transit systems — Metro, Metrobús, Cablebús, Tren Ligero — but its official data is scattered across PDFs, shapefiles, and inconsistent web portals. If you want to answer a spatial question like “which Metro stations are within 800m of a Metrobús corridor?”, you’d normally spend hours just wrangling the raw data.

For my Master’s research in Urban Planning at UNAM, I needed a clean, queryable representation of this network. So I built Apimetro.

Architecture overview

The stack is intentionally simple:

Flask (REST API)
  ├── GeoPandas (spatial dataframes)
  ├── PostGIS (persistent spatial storage)
  └── GeoJSON (wire format)

All transit geometries (routes as LineStrings, stations as Points) are stored in PostGIS with SRID 4326, that is, WGS 84 longitude/latitude coordinates. This is also the coordinate system GeoJSON expects (RFC 7946), so the Flask endpoints can expose the data as GeoJSON and any GIS client (QGIS, Leaflet, Mapbox, deck.gl) can consume it without transformation.

Spatial SQL in action

One of the most useful queries is finding transit interchanges — places where two different lines are within walking distance:

SELECT 
  a.station_name,
  a.line,
  b.station_name AS nearby_station,
  b.line AS nearby_line,
  ST_Distance(
    a.geom::geography,
    b.geom::geography
  ) AS distance_m
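FROM metro_stations a
JOIN metrobus_stations b
  ON ST_DWithin(a.geom::geography, b.geom::geography, 400)  -- 400 m radius
-- No a.line != b.line filter: the tables hold different systems, and Metro and
-- Metrobús line numbers overlap, so the filter would drop valid interchanges.
ORDER BY distance_m;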

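The ::geography cast is what makes this query correct. With it, ST_DWithin and ST_Distance compute geodesic distances in meters on the WGS 84 spheroid. Without it, both functions would work in the units of SRID 4326, which are degrees, and the 400 would mean 400 degrees rather than 400 m. Elevation plays no part in the calculation.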
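
For readers without a PostGIS instance at hand, the same join can be approximated in plain Python. This is a minimal sketch, not the project's implementation: it swaps the spheroidal calculation for the haversine formula on a sphere, loops over every pair instead of using an index, and the `find_interchanges` helper, station names, and coordinates are all illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; a sphere, not the WGS 84 spheroid


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def find_interchanges(metro, metrobus, max_m=400):
    """Brute-force counterpart of the ST_DWithin join, sorted by distance."""
    pairs = []
    for a in metro:
        for b in metrobus:
            d = haversine_m(a["lon"], a["lat"], b["lon"], b["lat"])
            if d <= max_m:
                pairs.append((a["name"], b["name"], round(d, 1)))
    return sorted(pairs, key=lambda p: p[2])


# Illustrative coordinates only, not real station positions
metro = [
    {"name": "Metro A", "lon": -99.1400, "lat": 19.4300},
    {"name": "Metro B", "lon": -99.1000, "lat": 19.4300},
]
metrobus = [
    {"name": "Metrobús X", "lon": -99.1400, "lat": 19.4320},  # about 220 m north of Metro A
    {"name": "Metrobús Y", "lon": -99.1200, "lat": 19.4500},  # more than 2 km from both
]

interchanges = find_interchanges(metro, metrobus)
for metro_name, bus_name, dist in interchanges:
    print(f"{metro_name} <-> {bus_name}: {dist} m")
```

The spherical result differs from the spheroidal one by a fraction of a percent at most, which is negligible at walking distances, but the nested loop is only practical for small inputs.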

What GeoPandas brings to the table

Before pushing to PostGIS, I process raw shapefiles with GeoPandas:

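import geopandas as gpd
from sqlalchemy import create_engine

# Placeholder connection string; substitute your own credentials
engine = create_engine("postgresql://user:password@localhost:5432/apimetro")

metro = gpd.read_file("metro_stations.shp")
# Buffer in a metric CRS (EPSG:6372, meters); 0.004 degrees is not a true 400 m circle
buffers = metro.to_crs(epsg=6372).buffer(400).to_crs(epsg=4326)
metro = metro.to_crs(epsg=4326).rename_geometry("geom")  # normalize CRS; match the SQL column name
metro.to_postgis("metro_stations", engine, if_exists="replace")
# Buffers go to their own table (hypothetical name), one geometry column per table
gpd.GeoDataFrame(metro.drop(columns="geom"), geometry=buffers).to_postgis(
    "metro_station_buffers", engine, if_exists="replace")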

The CRS normalization step is critical — the source data often comes in EPSG:6372 (Mexico’s national projection) and needs to be converted before spatial joins with OSM or GTFS data.
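
Units are the other reason the CRS matters: in EPSG:4326 every length is expressed in degrees, and a degree does not span the same ground distance in every direction. The sketch below estimates this for Mexico City. It assumes a spherical Earth and an approximate latitude of 19.43° for the city center; `meters_per_degree` is a hypothetical helper, not part of GeoPandas.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # spherical approximation of the Earth


def meters_per_degree(lat_deg):
    """Return (north-south, east-west) meters spanned by one degree at a latitude."""
    per_deg_lat = math.pi / 180 * EARTH_RADIUS_M
    per_deg_lon = per_deg_lat * math.cos(math.radians(lat_deg))
    return per_deg_lat, per_deg_lon


CDMX_LAT = 19.43  # approximate latitude of central Mexico City (assumption)
ns, ew = meters_per_degree(CDMX_LAT)

# Ground extent of a 0.004-degree offset in each direction
offset_ns_m = 0.004 * ns  # roughly 445 m
offset_ew_m = 0.004 * ew  # roughly 419 m
print(f"0.004 deg is about {offset_ns_m:.0f} m north-south and {offset_ew_m:.0f} m east-west")
```

A buffer of 0.004° is therefore an ellipse of roughly 445 m by 419 m rather than a 400 m circle, which is why distance-based operations are better done in a projected CRS with meter units or with the geography type.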

Lessons learned

1. Always validate your geometry before inserting

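PostGIS accepts invalid geometries on insert, but spatial functions may later raise errors or quietly return wrong results. Check validity on both sides of the load: GeoPandas exposes is_valid before you write, and ST_IsValid() confirms it after each import. ST_IsValidReason() explains what is wrong, and ST_MakeValid() repairs most cases.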
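
A cheap pre-insert check can catch the most common import mistakes before the data reaches the database. The sketch below is an assumption-laden illustration, not a replacement for ST_IsValid(): it only handles Point and LineString geometries given as GeoJSON-style dicts, and the `check_geometry` helper is hypothetical. It flags non-finite coordinates, values outside the EPSG:4326 range (which also catches many swapped latitude/longitude pairs), and degenerate lines.

```python
import math


def check_geometry(geom):
    """Return a list of problems found in a GeoJSON-style Point or LineString dict."""
    gtype = geom.get("type")
    coords = geom.get("coordinates")
    if gtype == "Point":
        points = [coords]
    elif gtype == "LineString":
        points = list(coords)
    else:
        return [f"unsupported geometry type: {gtype!r}"]

    problems = []
    for point in points:
        lon, lat = point[0], point[1]
        if not (math.isfinite(lon) and math.isfinite(lat)):
            problems.append("non-finite coordinate")
        elif not (-180 <= lon <= 180 and -90 <= lat <= 90):
            problems.append(f"out of range for EPSG:4326: ({lon}, {lat})")
    if gtype == "LineString" and len({tuple(p) for p in points}) < 2:
        problems.append("LineString needs at least two distinct vertices")
    return problems


ok = check_geometry({"type": "Point", "coordinates": [-99.13, 19.43]})
swapped = check_geometry({"type": "Point", "coordinates": [19.43, -99.13]})  # lat/lon reversed
degenerate = check_geometry(
    {"type": "LineString", "coordinates": [[-99.13, 19.43], [-99.13, 19.43]]}
)
print(ok, swapped, degenerate, sep="\n")
```

For Mexico City the swap is easy to detect because a longitude near -99 is not a legal latitude; in regions where both values fall within ±90 a bounding-box check would be needed instead.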

2. GeoJSON is your best friend for APIs

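Don’t serialize geometries as WKT strings inside JSON; use GeoJSON natively. With Flask-SQLAlchemy and GeoAlchemy2, PostGIS can do the conversion inside the query through ST_AsGeoJSON (available as func.ST_AsGeoJSON), so the API only has to wrap the result in Features.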
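
The wrapping step is small enough to sketch with the standard library alone. The example assumes each row arrives as a dict whose geometry is already a GeoJSON string, such as the output of ST_AsGeoJSON(geom); the `rows_to_feature_collection` helper, the `geojson` key, and the sample row are hypothetical.

```python
import json


def rows_to_feature_collection(rows, geometry_key="geojson"):
    """Build a GeoJSON FeatureCollection from row dicts.

    Each row is assumed to carry its geometry as a GeoJSON string under
    `geometry_key`; every other key becomes a Feature property.
    """
    features = []
    for row in rows:
        properties = {k: v for k, v in row.items() if k != geometry_key}
        features.append({
            "type": "Feature",
            "geometry": json.loads(row[geometry_key]),
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical rows, shaped like the result of
#   SELECT station_name, line, ST_AsGeoJSON(geom) AS geojson FROM metro_stations
rows = [
    {"station_name": "Example station", "line": "1",
     "geojson": '{"type": "Point", "coordinates": [-99.14, 19.43]}'},
]

collection = rows_to_feature_collection(rows)
body = json.dumps(collection, ensure_ascii=False)
print(body)
```

A Flask view can return the resulting dict directly, since Flask 1.1 and later serialize dict return values to JSON. Note that GeoJSON coordinates are ordered longitude first, then latitude.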

3. Spatial indexes are not optional

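A GiST index on the geometry column turns a 30-second ST_DWithin scan into a 200ms lookup on a table with 150,000 transit stops. One caveat: a query that casts with geom::geography can only use an index built on that same expression, so index the cast as well (or store a geography column).

CREATE INDEX idx_metro_geom ON metro_stations USING GIST (geom);
CREATE INDEX idx_metro_geog ON metro_stations USING GIST ((geom::geography));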
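
The reason an index helps is that it lets the database discard most rows with a cheap bounding test before running the exact distance calculation. The sketch below illustrates that two-step idea with a uniform grid, which is a deliberately simpler structure than the R-tree that GiST builds. The cell size, helper names, and sample points are assumptions, and the exact check uses an equirectangular approximation rather than a geodesic distance.

```python
import math
from collections import defaultdict

CELL_DEG = 0.01      # grid cell size in degrees (about 1 km here); an arbitrary choice
M_PER_DEG = 111_195  # meters per degree of latitude on a spherical Earth


def cell_of(lon, lat):
    """Map a lon/lat pair to its integer grid cell."""
    return (math.floor(lon / CELL_DEG), math.floor(lat / CELL_DEG))


def build_grid(points):
    """Bucket (id, lon, lat) tuples by grid cell."""
    grid = defaultdict(list)
    for pid, lon, lat in points:
        grid[cell_of(lon, lat)].append((pid, lon, lat))
    return grid


def candidates(grid, lon, lat):
    """Return the points stored in the 3x3 block of cells around (lon, lat)."""
    cx, cy = cell_of(lon, lat)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            found.extend(grid.get((cx + dx, cy + dy), []))
    return found


def approx_dist_m(lon1, lat1, lon2, lat2):
    """Equirectangular approximation; adequate for a few hundred meters."""
    dx = (lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2)) * M_PER_DEG
    dy = (lat2 - lat1) * M_PER_DEG
    return math.hypot(dx, dy)


def within(grid, lon, lat, radius_m=400):
    """Cheap cell lookup first, exact check second.

    Only valid while radius_m is smaller than the ground size of one cell.
    """
    return sorted(pid for pid, plon, plat in candidates(grid, lon, lat)
                  if approx_dist_m(lon, lat, plon, plat) <= radius_m)


# Illustrative points, not real stops
stops = [("near", -99.1410, 19.4310), ("far", -99.2000, 19.5000), ("edge", -99.1500, 19.4300)]
grid = build_grid(stops)
hits = within(grid, -99.1400, 19.4300)
print(hits)
```

With 150,000 stops, each lookup touches only the handful of points in nine cells instead of the whole table, which is the same effect the GiST index has on ST_DWithin.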

What’s next

The next phase is integrating GTFS feed data for real-time frequency analysis — answering not just where stations are but how often each line runs and what the effective coverage area is at different time windows.
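
As a rough idea of what the frequency side could look like, the sketch below computes the mean headway per route inside a time window from a list of departures. Nothing here comes from the project: the `mean_headways` helper, the input shape (route ID plus a GTFS-style HH:MM:SS time, for one stop and one direction), and the sample data are assumptions.

```python
from collections import defaultdict


def to_seconds(hhmmss):
    """Parse a GTFS time such as '25:10:00' (hours may exceed 24) into seconds."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 3600 + m * 60 + s


def mean_headways(departures, start="07:00:00", end="09:00:00"):
    """Mean gap in minutes between consecutive departures per route in [start, end)."""
    lo, hi = to_seconds(start), to_seconds(end)
    by_route = defaultdict(list)
    for route_id, time_str in departures:
        t = to_seconds(time_str)
        if lo <= t < hi:
            by_route[route_id].append(t)

    result = {}
    for route_id, times in by_route.items():
        times.sort()
        if len(times) >= 2:  # a single departure has no headway
            gaps = [b - a for a, b in zip(times, times[1:])]
            result[route_id] = sum(gaps) / len(gaps) / 60
    return result


# Made-up departures at a single stop
sample = [
    ("L1", "07:00:00"), ("L1", "07:05:00"), ("L1", "07:10:00"),
    ("L2", "07:00:00"), ("L2", "07:20:00"), ("L2", "09:30:00"),  # last one is outside the window
]
headways = mean_headways(sample)
print(headways)
```

Running the same function over several windows (morning peak, midday, late evening) would give the per-window frequencies that a coverage analysis could then combine with the station buffers.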

If you’re working on urban mobility data or transit APIs, check out the project:
github.com/galigaribaldi/Apimetro

