<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://galigaribaldi.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://galigaribaldi.github.io/" rel="alternate" type="text/html" /><updated>2026-04-23T06:27:26+00:00</updated><id>https://galigaribaldi.github.io/feed.xml</id><title type="html">Hernán Galileo</title><subtitle>Portfolio and blog of Hernán Galileo Cabrera Garibaldi — Backend, Data Engineering, and Geospatial Systems professional based in Mexico City.</subtitle><author><name>Hernán Galileo Cabrera Garibaldi</name><email>galigaribaldi0@gmail.com</email></author><entry><title type="html">Building a Geospatial Transit API for Mexico City</title><link href="https://galigaribaldi.github.io/blog/2026/04/22/building-geospatial-transit-api/" rel="alternate" type="text/html" title="Building a Geospatial Transit API for Mexico City" /><published>2026-04-22T00:00:00+00:00</published><updated>2026-04-22T00:00:00+00:00</updated><id>https://galigaribaldi.github.io/blog/2026/04/22/building-geospatial-transit-api</id><content type="html" xml:base="https://galigaribaldi.github.io/blog/2026/04/22/building-geospatial-transit-api/"><![CDATA[<h2 id="the-problem">The problem</h2>

<p>Mexico City has one of the world’s largest public transit systems — Metro, Metrobús, Cablebús, Tren Ligero — but its official data is scattered across PDFs, shapefiles, and inconsistent web portals. If you want to answer a spatial question like <em>“which Metro stations are within 800m of a Metrobús corridor?”</em>, you’d normally spend hours just wrangling the raw data.</p>

<p>For my Master’s research in Urban Planning at UNAM, I needed a clean, queryable representation of this network. So I built <strong>Apimetro</strong>.</p>

<h2 id="architecture-overview">Architecture overview</h2>

<p>The stack is intentionally simple:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Flask (REST API)
  └── GeoPandas (spatial dataframes)
  └── PostGIS (persistent spatial storage)
  └── GeoJSON (wire format)
</code></pre></div></div>

<p>All transit geometries (routes as LineStrings, stations as Points) are stored in PostGIS with SRID 4326 (WGS 84 longitude/latitude). Flask endpoints expose them as GeoJSON, so any GIS client (QGIS, Leaflet, Mapbox, deck.gl) can consume them without transformation.</p>

<h2 id="spatial-sql-in-action">Spatial SQL in action</h2>

<p>One of the most useful queries finds transit interchanges: Metro and Metrobús stations that sit within walking distance (400 m) of each other:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> 
  <span class="n">a</span><span class="p">.</span><span class="n">station_name</span><span class="p">,</span>
  <span class="n">a</span><span class="p">.</span><span class="n">line</span><span class="p">,</span>
  <span class="n">b</span><span class="p">.</span><span class="n">station_name</span> <span class="k">AS</span> <span class="n">nearby_station</span><span class="p">,</span>
  <span class="n">b</span><span class="p">.</span><span class="n">line</span> <span class="k">AS</span> <span class="n">nearby_line</span><span class="p">,</span>
  <span class="n">ST_Distance</span><span class="p">(</span>
    <span class="n">a</span><span class="p">.</span><span class="n">geom</span><span class="p">::</span><span class="n">geography</span><span class="p">,</span>
    <span class="n">b</span><span class="p">.</span><span class="n">geom</span><span class="p">::</span><span class="n">geography</span>
  <span class="p">)</span> <span class="k">AS</span> <span class="n">distance_m</span>
<span class="k">FROM</span> <span class="n">metro_stations</span> <span class="n">a</span>
<span class="k">JOIN</span> <span class="n">metrobus_stations</span> <span class="n">b</span>
  <span class="k">ON</span> <span class="n">ST_DWithin</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">geom</span><span class="p">::</span><span class="n">geography</span><span class="p">,</span> <span class="n">b</span><span class="p">.</span><span class="n">geom</span><span class="p">::</span><span class="n">geography</span><span class="p">,</span> <span class="mi">400</span><span class="p">)</span>
<span class="c1">-- No a.line != b.line filter: the two tables hold different systems, and matching labels (line 1 in both) would hide real transfers</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">distance_m</span><span class="p">;</span>
</code></pre></div></div>

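<p>PostGIS’s <code class="language-plaintext highlighter-rouge">ST_DWithin</code> with a <code class="language-plaintext highlighter-rouge">::geography</code> cast measures distance in meters along the spheroid rather than in degrees, so the <code class="language-plaintext highlighter-rouge">400</code> threshold really means 400 m. Without the cast, SRID 4326 geometries are compared in degrees, which do not correspond to a constant length on the ground.</p>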
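
<p>To see why the cast matters, here is a minimal standalone sketch (plain Python, not part of Apimetro) that measures what 0.004 degrees covers on the ground near Mexico City. It uses the haversine formula on a sphere, which only approximates the spheroidal calculation PostGIS performs; the coordinates and the <code class="language-plaintext highlighter-rouge">haversine_m</code> helper are illustrative assumptions.</p>

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; a spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Approximate coordinates of central Mexico City (illustrative values)
lon, lat = -99.13, 19.43

north_m = haversine_m(lon, lat, lon, lat + 0.004)  # 0.004 degrees of latitude
east_m = haversine_m(lon, lat, lon + 0.004, lat)   # 0.004 degrees of longitude

print(f"0.004 deg north: {north_m:.0f} m")
print(f"0.004 deg east:  {east_m:.0f} m")
```

<p>The same 0.004 degrees comes out at roughly 445 m north-south but only about 420 m east-west at this latitude, which is why distance thresholds are better expressed in meters.</p>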

<h2 id="what-geopandas-brings-to-the-table">What GeoPandas brings to the table</h2>

<p>Before pushing to PostGIS, I process raw shapefiles with GeoPandas:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">geopandas</span> <span class="k">as</span> <span class="n">gpd</span>
<span class="kn">from</span> <span class="nn">sqlalchemy</span> <span class="kn">import</span> <span class="n">create_engine</span>

<span class="c1"># Placeholder connection URL (an assumption); replace with your own database
</span><span class="n">engine</span> <span class="o">=</span> <span class="n">create_engine</span><span class="p">(</span><span class="s">"postgresql://user:password@localhost:5432/apimetro"</span><span class="p">)</span>

<span class="n">metro</span> <span class="o">=</span> <span class="n">gpd</span><span class="p">.</span><span class="n">read_file</span><span class="p">(</span><span class="s">"metro_stations.shp"</span><span class="p">)</span>
<span class="n">metro</span> <span class="o">=</span> <span class="n">metro</span><span class="p">.</span><span class="n">to_crs</span><span class="p">(</span><span class="n">epsg</span><span class="o">=</span><span class="mi">4326</span><span class="p">)</span>          <span class="c1"># normalize to WGS 84 (lon/lat)
</span><span class="c1"># Buffer in a metric CRS (EPSG:6372, meters), then convert back to degrees
</span><span class="n">metro</span><span class="p">[</span><span class="s">"buffer"</span><span class="p">]</span> <span class="o">=</span> <span class="n">metro</span><span class="p">.</span><span class="n">to_crs</span><span class="p">(</span><span class="n">epsg</span><span class="o">=</span><span class="mi">6372</span><span class="p">).</span><span class="nb">buffer</span><span class="p">(</span><span class="mi">400</span><span class="p">).</span><span class="n">to_crs</span><span class="p">(</span><span class="n">epsg</span><span class="o">=</span><span class="mi">4326</span><span class="p">)</span>  <span class="c1"># 400 m buffer
</span><span class="n">metro</span><span class="p">.</span><span class="n">to_postgis</span><span class="p">(</span><span class="s">"metro_stations"</span><span class="p">,</span> <span class="n">engine</span><span class="p">,</span> <span class="n">if_exists</span><span class="o">=</span><span class="s">"replace"</span><span class="p">)</span>
</code></pre></div></div>

<p>The CRS normalization step is critical: the source data often comes in EPSG:6372 (Mexico ITRF2008 / LCC, the national projection, in meters) and must be converted before spatial joins with OSM or GTFS data, which use WGS 84 longitude/latitude (EPSG:4326).</p>

<h2 id="lessons-learned">Lessons learned</h2>

<p><strong>1. Always validate your geometry before inserting</strong></p>

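<p>PostGIS will accept invalid geometries, but spatial functions can then fail or silently return wrong results. Run <code class="language-plaintext highlighter-rouge">ST_IsValid()</code> after every import, and repair whatever it flags (for example with <code class="language-plaintext highlighter-rouge">ST_MakeValid()</code>) before querying.</p>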
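
<p>A cheap pre-insert check can catch the most common problems before the data reaches the database. The sketch below is plain Python and only an illustration: the <code class="language-plaintext highlighter-rouge">coordinate_problems</code> helper and the sample records are hypothetical, and it complements a database validity check rather than replacing it.</p>

```python
import math


def coordinate_problems(geom_type, coords):
    """Return a list of problems in a Point or LineString given as (lon, lat) tuples."""
    points = [coords] if geom_type == "Point" else list(coords)
    problems = []
    if geom_type == "LineString" and len(set(points)) < 2:
        problems.append("LineString needs at least two distinct points")
    for lon, lat in points:
        if not (math.isfinite(lon) and math.isfinite(lat)):
            problems.append(f"non-finite coordinate: {(lon, lat)}")
        elif not (-180 <= lon <= 180 and -90 <= lat <= 90):
            problems.append(f"outside lon/lat range (swapped axes?): {(lon, lat)}")
    return problems


# Illustrative records; the coordinates are approximate, not taken from Apimetro
ok_station = ("Point", (-99.13, 19.43))
swapped_station = ("Point", (19.43, -99.13))  # latitude and longitude in the wrong order
degenerate_route = ("LineString", [(-99.13, 19.43), (-99.13, 19.43)])

for geom_type, coords in (ok_station, swapped_station, degenerate_route):
    print(geom_type, coordinate_problems(geom_type, coords))
```

<p>For Mexico City the range check is a convenient tripwire: a longitude near -99 that lands in the latitude slot falls outside the valid range and is flagged immediately.</p>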

<p><strong>2. GeoJSON is your best friend for APIs</strong></p>

<p>Don’t serialize geometries as WKT inside JSON; use GeoJSON natively. With Flask-SQLAlchemy and GeoAlchemy2 you can call PostGIS’s <code class="language-plaintext highlighter-rouge">ST_AsGeoJSON</code> from the ORM, so the database does the serialization for you.</p>
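
<p>For reference, the GeoJSON wire format is small enough to build by hand. This sketch uses only the standard library; the row layout <code class="language-plaintext highlighter-rouge">(name, line, lon, lat)</code> and the <code class="language-plaintext highlighter-rouge">to_feature_collection</code> helper are assumptions for illustration, not Apimetro’s actual code.</p>

```python
import json


def to_feature_collection(rows):
    """Wrap (name, line, lon, lat) station rows in a GeoJSON FeatureCollection."""
    features = []
    for name, line, lon, lat in rows:
        features.append({
            "type": "Feature",
            # GeoJSON coordinate order is longitude first, then latitude
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"station_name": name, "line": line},
        })
    return {"type": "FeatureCollection", "features": features}


# Illustrative row with approximate coordinates
rows = [("Pantitlán", "1", -99.072, 19.415)]
payload = json.dumps(to_feature_collection(rows), ensure_ascii=False)
print(payload)
```

<p>A Flask view could return a payload like this with the <code class="language-plaintext highlighter-rouge">application/geo+json</code> media type, and clients such as Leaflet or QGIS would read it directly.</p>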

<p><strong>3. Spatial indexes are not optional</strong></p>

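<p>A <code class="language-plaintext highlighter-rouge">GIST</code> index on geometry columns turns a 30-second <code class="language-plaintext highlighter-rouge">ST_DWithin</code> scan into a 200ms lookup on a table with 150,000 transit stops. Queries that cast to <code class="language-plaintext highlighter-rouge">::geography</code> only benefit from an index built on that same expression (or on a native geography column), so index the cast as well:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">idx_metro_geom</span> <span class="k">ON</span> <span class="n">metro_stations</span> <span class="k">USING</span> <span class="n">GIST</span><span class="p">(</span><span class="n">geom</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">INDEX</span> <span class="n">idx_metro_geog</span> <span class="k">ON</span> <span class="n">metro_stations</span> <span class="k">USING</span> <span class="n">GIST</span><span class="p">((</span><span class="n">geom</span><span class="p">::</span><span class="n">geography</span><span class="p">));</span>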
</code></pre></div></div>

<h2 id="whats-next">What’s next</h2>

<p>The next phase is integrating GTFS feed data for real-time frequency analysis — answering not just <em>where</em> stations are but <em>how often</em> each line runs and what the effective coverage area is at different time windows.</p>

<p>If you’re working on urban mobility data or transit APIs, check out the project:<br />
<a href="https://github.com/galigaribaldi/Apimetro">github.com/galigaribaldi/Apimetro</a></p>]]></content><author><name>Hernán Galileo</name></author><category term="gis" /><summary type="html"><![CDATA[How I built Apimetro — a REST API that models Mexico City's public transit network as spatial data, and what I learned combining GeoPandas, PostGIS, and Flask.]]></summary></entry><entry><title type="html">VFT Model — Report 00: Building a Topological Transit Graph</title><link href="https://galigaribaldi.github.io/blog/2026/04/22/vftmodel-reporte-00-grafo-topologico/" rel="alternate" type="text/html" title="VFT Model — Report 00: Building a Topological Transit Graph" /><published>2026-04-22T00:00:00+00:00</published><updated>2026-04-22T00:00:00+00:00</updated><id>https://galigaribaldi.github.io/blog/2026/04/22/vftmodel-reporte-00-grafo-topologico</id><content type="html" xml:base="https://galigaribaldi.github.io/blog/2026/04/22/vftmodel-reporte-00-grafo-topologico/"><![CDATA[<div class="post-lang" id="post-en">

  <h2 id="report-00-building-a-topological-transit-graph-for-mexico-city">Report 00: Building a Topological Transit Graph for Mexico City</h2>

  <p>The foundational challenge in transit network analysis is deceptively simple: how do you turn a collection of GPS coordinates, stops, and route shapes into a computable graph? The Vanishing Fig-Tree Model (VFT Model) addresses this in its first report by constructing a directed topological representation of Mexico City’s multimodal transit network.</p>

  <p><strong>Full technical results:</strong> <a href="https://galigaribaldi.github.io/VFTModel/notebooks/intro.html">Reporte 00 — VFT Model Notebooks</a></p>

  <hr />

  <h3 id="the-problem-from-gtfs-to-a-graph">The Problem: From GTFS to a Graph</h3>

  <p>GTFS (General Transit Feed Specification) feeds describe transit systems as sequences of stops and route geometries — not as connected graph topologies. Two stations on different lines may be physically a few meters apart, but appear as entirely separate nodes in the raw data. This is the <strong>phantom node problem</strong>: without resolving it, any graph-based analysis produces broken or disconnected paths.</p>

  <p>Consider the Pantitlán interchange, where Metro Lines 1, 5, 9, and A converge. In raw GTFS data, each line registers its own stop coordinates independently. Without spatial preprocessing, Pantitlán appears as four separate nodes with no edges between them — analytically invisible as a transfer hub.</p>

  <h3 id="logical-snapping">Logical Snapping</h3>

  <p>The solution implemented in Report 00 is <strong>logical snapping</strong>: a spatial preprocessing step that merges nodes within a configurable distance threshold (ε). Unlike exact coordinate matching, this algorithm handles GPS noise and inconsistent data entry gracefully:</p>

  <ol>
    <li>Build a spatial index (R-tree) over all stop coordinates</li>
    <li>Identify all node pairs within ε meters of each other</li>
    <li>Collapse each cluster into a single representative node, preserving all incoming and outgoing edge connections</li>
  </ol>

  <p>The threshold ε is tuned per transport mode. Metro stations use a tighter ε than surface-level RTP stops, which have higher GPS variance.</p>
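
  <p>The three steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the VFT Model code: a uniform grid stands in for the R-tree, union-find does the cluster collapse, and the stop ids, coordinates, and the 150 m threshold are all hypothetical. Rewriting edges through the returned mapping is what keeps incoming and outgoing connections attached to the representative node.</p>

```python
import math
from collections import defaultdict


def haversine_m(a, b):
    """Great-circle distance in meters between two (lon, lat) points."""
    (lon1, lat1), (lon2, lat2) = a, b
    p1, p2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_008.8 * math.asin(math.sqrt(h))


def snap(stops, eps_m):
    """Map every stop id to the representative id of its eps_m cluster."""
    parent = {sid: sid for sid in stops}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Step 1: spatial index. Grid cells are at least eps_m wide on both axes,
    # so any pair within eps_m falls in the same or a neighbouring cell.
    max_lat = max(abs(lat) for _, lat in stops.values())
    cell_lat = eps_m / 111_000  # slightly under the meters in one degree of latitude
    cell_lon = cell_lat / math.cos(math.radians(max_lat))
    grid = defaultdict(list)
    for sid, (lon, lat) in stops.items():
        grid[(math.floor(lon / cell_lon), math.floor(lat / cell_lat))].append(sid)

    # Step 2: find pairs within eps_m, checking only neighbouring cells
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for other in grid.get((cx + dx, cy + dy), ()):
                    for sid in members:
                        if sid < other and haversine_m(stops[sid], stops[other]) <= eps_m:
                            parent[find(other)] = find(sid)

    # Step 3: collapse each cluster onto one representative id
    return {sid: find(sid) for sid in stops}


# Hypothetical stop ids with approximate (lon, lat) coordinates
stops = {
    "L1_pantitlan": (-99.0722, 19.4154),
    "L5_pantitlan": (-99.0725, 19.4156),
    "L9_pantitlan": (-99.0719, 19.4151),
    "LA_pantitlan": (-99.0723, 19.4149),
    "L1_zaragoza": (-99.0824, 19.4120),
}
mapping = snap(stops, eps_m=150)
print(mapping)
```

  <p>With these sample values the four Pantitlán ids share one representative, while the Zaragoza stop, about a kilometre away, stays on its own. Because clusters are built by chaining pairs, a threshold that is too generous can merge stops that should stay separate, which is one reason to tune ε per mode.</p>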

  <h3 id="building-the-directed-graph">Building the Directed Graph</h3>

  <p>With phantom nodes resolved, each transit line becomes a sequence of directed edges in a <code class="language-plaintext highlighter-rouge">networkx.DiGraph</code>:</p>

  <ul>
    <li><strong>Nodes:</strong> transit stops — attributes include coordinates, system (Metro, Metrobús, Cablebús…), and line identifier</li>
    <li><strong>Edges:</strong> service segments between consecutive stops — weighted by scheduled travel time in seconds</li>
  </ul>

  <p>The resulting graph covers the full CDMX multimodal network: Metro, Metrobús, Cablebús, Tren Ligero, Trolebús, Mexicable, and Interurbano.</p>
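
  <p>As a rough illustration of this structure, the sketch below builds a small weighted directed graph and queries it. It deliberately uses a plain dictionary and a hand-written Dijkstra search instead of <code class="language-plaintext highlighter-rouge">networkx.DiGraph</code> so that it runs without dependencies; the segment list and travel times are made up, and treating both directions as equally fast is an assumption.</p>

```python
import heapq
from collections import defaultdict

# Hypothetical segments: (from_stop, to_stop, scheduled seconds); times are invented
segments = [
    ("Pantitlán", "Zaragoza", 120),
    ("Zaragoza", "Gómez Farías", 90),
    ("Pantitlán", "Hangares", 150),
]


def build_digraph(segments):
    """Return {stop: {next_stop: seconds}} with one directed edge per direction."""
    graph = defaultdict(dict)
    for u, v, seconds in segments:
        graph[u][v] = seconds  # outbound direction
        graph[v][u] = seconds  # return direction, assumed to take the same time
    return graph


def travel_time(graph, source, target):
    """Dijkstra search: minimum total seconds from source to target, or None."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, seconds in graph.get(node, {}).items():
            new_cost = cost + seconds
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt))
    return None


graph = build_digraph(segments)
print(travel_time(graph, "Hangares", "Gómez Farías"))
```

  <p>The query crosses from one line to the other through the shared Pantitlán node, which is only possible once the stops of both lines have been snapped to the same node.</p>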

  <h3 id="why-this-matters">Why This Matters</h3>

  <p>A correctly built topological graph is the prerequisite for every subsequent analysis in the VFT Model: computing betweenness centrality, measuring direct-route indices (DI), and simulating ring-corridor scenarios. A phantom node left unresolved corrupts path-finding across the entire network.</p>
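
  <p>The effect of a single unresolved phantom node is easy to reproduce. In this minimal sketch (plain Python with hypothetical stop ids, not the project’s code), a breadth-first search cannot cross between two lines until their separate Pantitlán ids are rewritten to one representative node.</p>

```python
from collections import deque


def reachable(edges, start):
    """Breadth-first search over directed edges; returns the set of reachable nodes."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adjacency.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical raw feed: each line carries its own Pantitlán node
raw_edges = [
    ("zaragoza", "pantitlan_L1"),
    ("pantitlan_L5", "hangares"),
]

# After snapping, both ids map to one representative node
snapped = {"pantitlan_L1": "pantitlan", "pantitlan_L5": "pantitlan"}
merged_edges = [(snapped.get(u, u), snapped.get(v, v)) for u, v in raw_edges]

print("hangares" in reachable(raw_edges, "zaragoza"))     # False: no transfer exists
print("hangares" in reachable(merged_edges, "zaragoza"))  # True: the shared node links the lines
```

  <p>Any metric that depends on shortest paths, betweenness centrality included, inherits the same gap when the transfer is missing.</p>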

  <p>Report 00 establishes the foundation that Reports 01–05 build on.</p>

  <hr />

  <p><em>This post summarizes findings from the VFT Model research project, part of the TAICMAM thesis at UNAM. For the full notebook with code, visualizations, and methodology detail, see the <a href="https://galigaribaldi.github.io/VFTModel/notebooks/intro.html">VFT Model GitHub Pages</a>.</em></p>

</div>

<div class="post-lang" id="post-es">

  <h2 id="reporte-00-construccin-del-grafo-topolgico-de-la-red-de-transporte-de-la-cdmx">Reporte 00: Construcción del Grafo Topológico de la Red de Transporte de la CDMX</h2>

  <p>El desafío fundamental en el análisis de redes de transporte es engañosamente simple: ¿cómo se transforma una colección de coordenadas GPS, paradas y trazas de rutas en un grafo computable? El Modelo VFT (Modelo del Punto de Higuera) aborda esta pregunta en su primer reporte construyendo una representación topológica dirigida de la red multimodal de transporte de la Ciudad de México.</p>

  <p><strong>Resultados técnicos completos:</strong> <a href="https://galigaribaldi.github.io/VFTModel/notebooks/intro.html">Reporte 00 — Notebooks del Modelo VFT</a></p>

  <hr />

  <h3 id="el-problema-de-gtfs-a-un-grafo">El Problema: De GTFS a un Grafo</h3>

  <p>Los feeds GTFS (General Transit Feed Specification) describen sistemas de transporte como secuencias de paradas y geometrías de rutas — no como topologías de grafo conectadas. Dos estaciones de líneas diferentes pueden estar físicamente a pocos metros de distancia, pero aparecer como nodos completamente separados en los datos crudos. Este es el <strong>problema de nodos fantasma</strong>: sin resolverlo, cualquier análisis basado en grafos produce caminos rotos o desconectados.</p>

  <p>Tomemos el caso del Pantitlán, donde convergen las Líneas 1, 5, 9 y A del Metro. En los datos GTFS crudos, cada línea registra sus propias coordenadas de parada de forma independiente. Sin preprocesamiento espacial, Pantitlán aparece como cuatro nodos separados sin aristas entre ellos — analíticamente invisible como hub de transferencia.</p>

  <h3 id="snapping-lgico">Snapping Lógico</h3>

  <p>La solución implementada en el Reporte 00 es el <strong>snapping lógico</strong>: un paso de preprocesamiento espacial que fusiona nodos dentro de un umbral de distancia configurable (ε). A diferencia de la coincidencia exacta de coordenadas, este algoritmo maneja de forma robusta el ruido GPS y la inconsistencia en la captura de datos:</p>

  <ol>
    <li>Construir un índice espacial (R-tree) sobre todas las coordenadas de paradas</li>
    <li>Identificar todos los pares de nodos dentro de ε metros entre sí</li>
    <li>Colapsar cada cluster en un único nodo representativo, preservando todas las conexiones de aristas entrantes y salientes</li>
  </ol>

  <p>El umbral ε se calibra por modo de transporte. Las estaciones de Metro usan un ε más ajustado que las paradas de superficie del RTP, que tienen mayor varianza GPS.</p>

  <h3 id="construccin-del-grafo-dirigido">Construcción del Grafo Dirigido</h3>

  <p>Con los nodos fantasma resueltos, cada línea de transporte se convierte en una secuencia de aristas dirigidas en un <code class="language-plaintext highlighter-rouge">networkx.DiGraph</code>:</p>

  <ul>
    <li><strong>Nodos:</strong> paradas de transporte — atributos: coordenadas, sistema (Metro, Metrobús, Cablebús…) e identificador de línea</li>
    <li><strong>Aristas:</strong> segmentos de servicio entre paradas consecutivas — ponderados por tiempo de viaje programado en segundos</li>
  </ul>

  <p>El grafo resultante cubre la red multimodal completa de la CDMX: Metro, Metrobús, Cablebús, Tren Ligero, Trolebús, Mexicable e Interurbano.</p>

  <h3 id="por-qu-importa">Por Qué Importa</h3>

  <p>Un grafo topológico correctamente construido es el prerrequisito para todo análisis posterior en el Modelo VFT: calcular la centralidad de intermediación, medir el Índice de Ruta Directa (DI) y simular escenarios de corredores anillares. Un nodo fantasma sin resolver corrompe la búsqueda de caminos en toda la red.</p>

  <p>El Reporte 00 establece la base sobre la que se construyen los Reportes 01 al 05.</p>

  <hr />

  <p><em>Este post resume los hallazgos del proyecto de investigación Modelo VFT, parte de la tesis TAICMAM en la UNAM. Para el notebook completo con código, visualizaciones y detalle metodológico, consulta las <a href="https://galigaribaldi.github.io/VFTModel/notebooks/intro.html">GitHub Pages del Modelo VFT</a>.</em></p>

</div>]]></content><author><name>Hernán Galileo Cabrera Garibaldi</name><email>galigaribaldi0@gmail.com</email></author><category term="gis" /><summary type="html"><![CDATA[How do you turn raw GTFS data into a computable graph? The VFT Model's first report constructs a directed, weighted topology of Mexico City's multimodal transit network using NetworkX and logical snapping.]]></summary></entry></feed>