GTFS Data Quality: What it is, why it matters

Think back to a time when someone gave you good directions. Why were they so easy to understand? They likely anticipated your small decisions and offered hints to reassure you that you were on the right track.

How did it feel to follow those directions? Were you confident and calm? Were your needs anticipated? Did it make you want to repeat the process?

Or maybe those directions were so clear, so intuitive, and so easy to hold in your mind that you forgot you were being led altogether. These are the best kinds of instructions, and form the kind of experience that California's transit riders deserve.

What is transit data?

Transit providers invest significant resources into describing their networks with maps, schedules, and signs. They use numbers, names, colors, and symbols to make the network easier to understand. These are all intended to ensure that riders spend as little time as possible having to think about how to navigate their system.

Providers have wide control over the information they display to riders in their own brochures, signage, and digital real estate. They have less control when they must distill their local conventions and the nuances of their schedules into standardized General Transit Feed Specification (GTFS) data, which is then displayed alongside data from other providers in formats optimized for competing purposes.

What makes transit data good?

How should transit providers evaluate the quality of their GTFS data? Consider both completeness and accuracy.

Completeness. If a region has several transit providers, and all but one produce GTFS data, that one provider's operations will be invisible to people planning trips with online tools. This frustrates riders ("why can't I get there easily?"), the provider ("why aren't people riding our bus?"), and the region ("why are so many people driving when we have a perfectly good bus network?").

Missing details matter, too. For example, transit riders with disabilities rely on accessibility information when planning trips. The GTFS-Pathways extension provides essential information about the accessibility of multi-level transit stations. Without this data, transit providers cannot equitably serve all riders.
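As a rough illustration of what a completeness check for accessibility details might look like, here is a minimal Python sketch. It assumes the feed is a zip archive with its files at the top level, and it looks only for pathways.txt, levels.txt, and the wheelchair_boarding field in stops.txt. The function name accessibility_gaps is hypothetical, and the check is a simplification: it does not account for stops that inherit wheelchair_boarding from a parent station, and it is not the assessment Caltrans performs.

```python
import csv
import io
import zipfile

# GTFS files that describe station pathways and levels.
ACCESSIBILITY_FILES = ("pathways.txt", "levels.txt")


def accessibility_gaps(feed):
    """Return a list of plain-language accessibility gaps in a GTFS zip.

    `feed` is a path or file-like object accepted by zipfile.ZipFile.
    Assumes (for illustration) that feed files sit at the archive root.
    """
    gaps = []
    with zipfile.ZipFile(feed) as archive:
        names = set(archive.namelist())
        for filename in ACCESSIBILITY_FILES:
            if filename not in names:
                gaps.append(f"missing {filename}")
        if "stops.txt" not in names:
            gaps.append("missing stops.txt")
            return gaps
        with archive.open("stops.txt") as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            rows = list(csv.DictReader(text))
    # Simplification: treat an empty value or 0 as "no accessibility
    # information", ignoring inheritance from a parent station.
    unknown = sum(
        1
        for row in rows
        if (row.get("wheelchair_boarding") or "0").strip() in ("", "0")
    )
    if unknown:
        gaps.append(f"{unknown} of {len(rows)} stops lack wheelchair_boarding")
    return gaps
```

Calling accessibility_gaps("feed.zip") on a local copy of a feed (the file name here is a placeholder) returns an empty list when none of these gaps are found.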

Accuracy. It's not enough to simply publish information – it has to match what the rider sees and experiences. Data becomes inaccurate very easily as schedules change, detours happen, and drivers call out sick. There are few transit experiences more frustrating than waiting for a bus that isn't going to arrive.

Even subtle variations can make a big difference for a transit rider. If a trip planning app recommends taking bus route "3N" but the route indicator on the bus says "3 Northbound" or "3 to City Center," the rider will question whether this is the correct bus. Before they know it, the bus has left.

Optimizing your data

Since 2018, on behalf of DRMT, Cal-ITP has worked closely with service providers, trip planning applications, and transit riders to understand the many reasons data quality can degrade and why it matters. This extensive knowledge base informed Caltrans' publication of California's Minimum GTFS Guidelines in the summer of 2020. These Guidelines include a ten-point Data Process Checklist that any transit provider can use to identify gaps in the completeness and accuracy of their data.

These Guidelines are part of DRMT's ongoing development of a Transit Data Quality Toolkit. The Toolkit includes technical support, grants, demonstration projects, and more to address the specific challenges transit providers face regarding GTFS completeness and accuracy.

DRMT invites any California transit provider to begin understanding and improving their GTFS data quality by emailing gtfsrt@dot.ca.gov to request a GTFS data assessment. While discussing the results, providers can work with Caltrans to define a Transit Data Improvement Strategy that will close these gaps and improve both the completeness and accuracy of their transit data.

Accessible and equitable transit requires data that is complete and accurate, and DRMT looks forward to supporting California transit providers to deliver the highest-quality experience to their riders.