California Transit Data Guidelines
Version 3.0 – Finalized December 13, 2022
Table of Contents
- Purpose
- Applicability
- Principles
- Background
- New Concepts in Version 3
- Reaching GTFS Compliance
- Guidelines Checklist
- Tools and Resources
- Guideline Development Process
- Frequently Asked Questions
Purpose
The purpose of the California Transit Data Guidelines (“Guidelines”) is to communicate the specific criteria by which the California Department of Transportation (Caltrans) determines whether the General Transit Feed Specification (“GTFS”) Schedule and Realtime data produced by transit providers and vendors meets the qualitative needs of riders (and therefore of the trip-planning applications that deliver this data to riders), transit providers, and regulators. See the FAQs for a brief background on GTFS data. The Guidelines provide producers with a reference document for the characteristics of high-quality GTFS feeds. All transit providers and GTFS-producing vendors within the State of California are expected to work toward meeting the Guidelines in full.
Applicability
The Guidelines pertain to public transit operations. GTFS effectively supports all modes of scheduled fixed-route public transit: rail, bus, ferries, streetcars, subways, cable cars, trolleybuses, gondolas, and more. Currently, GTFS best represents public transit operations with fixed routes and fixed schedules. Demand-responsive transit services are expected to be incorporated into the GTFS standard through the upcoming GTFS-Flex extension, which some transit providers are already publishing.
Supporting an Interoperable Transit Technology Market
Beyond establishing best practices for constructing GTFS feeds, the Guidelines also specify other key open standards in which transit data should be conveyed. The use of open standards supports interoperability as defined by the Mobility Data Interoperability Principles (MDIP). MDIP is an international, coalition-led effort to support seamless integration of the hardware and software that transit providers rely on through the development and implementation of open standards.
Principles
These Guidelines were shaped by the principles of adopted transportation policy in California, as well as the desire to realize a transportation network that:
- Provides the public with complete, correct, and up-to-date information (including standardized schedule, service, fare, access, and geographic data) to help riders of any ability plan journeys, as is nearly ubiquitous in the maps and apps on mobile phones
- Provides riders with real-time vehicle arrival and departure information for making efficient decisions along their journey
- Makes information easy for any application developer to use by providing Schedule and Realtime GTFS data feeds that are both standardized and public
- Pursues industry standardization in order to provide better coordination, service delivery, and clear information for the transit industry (providers as well as their vendors) in the future
Background
As part of the department’s desire to achieve accurate and robust information sharing within the transit industry, Caltrans supports the use of the globally recognized GTFS standard for planning trips on transit.
Comprehensive, error-free GTFS is the foundation that enables transit providers to distribute critical information to their riders, whenever and wherever they need it. This includes information that can impact rider satisfaction and even help non-riders become riders. Universal, high-quality transit data transforms the experience of riding public transportation in California, empowering riders through dynamic, timely, and accurate information about transit networks. Achieving this goal of universal, high-quality transit data requires a focus on making sure that the GTFS data available from transit providers meets the needs of all riders.
The Guidelines, which were originally known as the California Minimum GTFS Guidelines, establish the information that all transit riders should have reliable access to (see the Guidelines Development Process for more information). This includes, but is not limited to, information about:
- Discovering, planning, and navigating transit;
- Understanding transit options available for travel between two points at a specific date and time;
- Understanding when the next transit vehicle on their route is arriving;
- Understanding any travel disruptions that are expected to occur or are currently occurring;
- Understanding how much their trip will cost and how to pay their fare; and
- Understanding whether (and how) they can complete their trip via wheelchair.
Within the State of California, Caltrans expects that transit providers will work toward achieving compatibility with these Guidelines. Meeting the Guidelines will likely be an ongoing process for transit providers and transit technology vendors. Many of the changes required for these data improvements may take months (such as incorporating established specs into a feed). In limited cases, changes may even take years to realize (e.g., when new technologies, vendors, and/or procedures are needed). Providers may follow different pathways toward achieving the data quality standards: they will begin from different baselines and may have the bandwidth to work on improving different aspects of their data at different times.
Caltrans staff are available to assess a transit provider’s current GTFS Schedule and GTFS Realtime data. Staff will provide free technical assistance to help the provider reach GTFS Compliance and will also work with them on other aspects of the Guidelines. Learn about the Transit Data Check-In process.
New Concepts in Version 3
The Guidelines contain a large number of individual recommendations for transit providers to check, and each of these checks may apply to different aspects of the composition or distribution of a GTFS feed. In Version 3 of the Guidelines, Caltrans is introducing new organizational concepts to help transit providers understand the intended impact of each check and to indicate the relative priority of different checks. In addition, Version 3 leverages GTFS-Flex to represent demand-responsive transit service. Apart from these changes, the intent of most of the Guidelines’ content remains the same as, or similar to, Version 2.
Features
Within the Guidelines, recommendations are now organized into groupings called “Features.” Each Feature represents a desirable characteristic of a high-quality GTFS feed. For transit providers, a GTFS feed will be considered to have a given Feature when it successfully meets the Guidelines for that Feature in the Guidelines Checklist below.
Guidelines for GTFS Schedule data are organized into the following Features:
- Compliance
- Accurate Accessibility Data
- Best Practices Alignment
- Demand-Responsive Completeness
- Fare Completeness
- Fixed-Route Completeness
- Service Accuracy
- Up-to-Dateness
Guidelines for Realtime data are organized into the following Features:
- Compliance
- Best Practices Alignment
- Fixed-Route Completeness
- Service Accuracy
Guidelines for Data Availability are organized into the following Features:
- Feed Aggregator Availability
- Technical Contact
- Website Availability
Reaching GTFS Compliance
GTFS Compliance is a minimum threshold for your customers to see your service “on the map” in their trip-planning applications. Compliance is divided between GTFS Schedule data and GTFS Realtime data and is predicated on the core part of a transit provider’s service being covered.
GTFS Schedule Compliance
A provider is deemed to have achieved “Compliance” for their GTFS Schedule feed if it meets the recommendations for its respective Feature. Schedule feeds that meet the requirements to be considered “compliant” have all of the following characteristics:
- The feed is publicly available at a stable URL;
- The feed regularly passes the canonical validator with no errors;
- The feed is explicitly provided under an open data license; and
- The feed is accepted by major trip planners (i.e., Google Maps and Apple Maps).
Transit providers should prioritize Compliance before working on other GTFS Schedule Features. For transit providers that have already achieved Compliance, recommendations for the remaining GTFS Schedule Features can be found in Beyond GTFS Schedule Compliance below.
GTFS Realtime Compliance
A provider is deemed to have achieved “Compliance” for their GTFS Realtime feeds if the feeds meet the recommendations for their respective Feature. Feeds that meet the requirements to be considered “compliant” have all of the following characteristics:
- The feed is publicly available at a stable URL;
- The feed regularly passes the canonical validators with no errors;
- The feed is explicitly provided under an open data license; and
- The feed is accepted by major trip planners (i.e., Google Maps and Apple Maps).
Transit providers should prioritize Compliance before working on other GTFS Realtime Features. For transit providers that have already achieved Compliance, recommendations for the remaining GTFS Realtime Features can be found in Beyond GTFS Realtime Compliance below.
Guidelines Checklist
The Guidelines Checklist provides all of the Transit Data Guidelines currently adopted by Caltrans. The development of each Guideline was informed by one or more Findings about data quality, and the Findings are listed next to each Guideline in the Checklist. Each Guideline belongs to exactly one Feature of a high-quality feed; a feed can be said to meet the requirements of a given Feature when it meets all of the Guidelines for that Feature.
The Guidelines Checklist is organized into priority levels, with the requirements for GTFS Compliance listed first, for both Schedule and Realtime feeds, followed by the other Features listed in alphabetical order for each feed type.
GTFS Compliance
Feature | Finding(s) | Guideline
Compliance (GTFS Schedule Data) | Access to GTFS Schedule data will be inadequate if that data is not made publicly accessible. | Valid GTFS Schedule data is regularly and publicly published.
Compliance (GTFS Schedule Data) | Data consumers and their applications rely on the ability to find updated information in a single, regular location. | GTFS Schedule data is published at a stable URL (permalink) from which it can be “fetched” automatically by trip-planning applications such as Google Maps, Apple Maps, and Transit App.
Compliance (GTFS Schedule Data) | There is a public, free-to-use, and open-source GTFS Schedule feed validator, which is available from MobilityData. GTFS feed errors may diminish the ability of transit riders to successfully plan their trip and may reduce their overall confidence in taking transit. | The published feed regularly yields no errors as reported by the MobilityData GTFS Validator.
Compliance (GTFS Schedule Data) | To the greatest extent practicable, customers should be able to plan transit trips in the application of their choice. If the GTFS Schedule feed is not known to trip-planning applications, riders will not be receiving the information they need to plan their trip. | The public datasets are being ingested* by trip-planning applications that serve the majority of their customers. This should include:
Compliance (GTFS Schedule Data) | GTFS data should be retrievable without unreasonable legal requirements. | The transit provider’s website includes an open license that allows commercial use of GTFS data. See Caltrans’ model language for examples, or read more about what makes a license open.
*In order for datasets to be ingested, transit providers or their vendor must provide this data to trip-planning applications. See “How do I publish my data to trip-planning applications?” for more information.
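As a rough illustration of the Schedule Compliance checks above, the sketch below runs a quick structural pre-check on a downloaded feed before it is handed to the MobilityData GTFS Validator. It is not a substitute for the validator. The function name `precheck_schedule_zip` is hypothetical, and the file list is a deliberately small subset of what the GTFS Schedule reference requires.

```python
import io
import zipfile

# Files the GTFS Schedule reference marks as required in every feed.
# Assumption: this small subset is enough for a pre-check; conditionally
# required files (for example stops.txt) are left to the full validator.
REQUIRED_FILES = {"agency.txt", "routes.txt", "trips.txt", "stop_times.txt"}
# At least one of these must define the dates of service.
CALENDAR_FILES = {"calendar.txt", "calendar_dates.txt"}


def precheck_schedule_zip(feed_bytes: bytes) -> list[str]:
    """Return a list of problems found; an empty list means the pre-check passed."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(feed_bytes))
    except zipfile.BadZipFile:
        return ["feed is not a readable zip archive"]

    with archive:
        # Files nested in a subfolder appear as "folder/agency.txt" here,
        # so they are reported as missing from the archive root.
        names = set(archive.namelist())

    problems = [f"missing required file: {name}" for name in sorted(REQUIRED_FILES - names)]
    if not names & CALENDAR_FILES:
        problems.append("missing both calendar.txt and calendar_dates.txt")
    return problems


if __name__ == "__main__":
    # Build a tiny in-memory feed that lacks stop_times.txt.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as sample:
        for name in ("agency.txt", "routes.txt", "trips.txt", "calendar.txt"):
            sample.writestr(name, "placeholder\n")
    print(precheck_schedule_zip(buffer.getvalue()))
    # ['missing required file: stop_times.txt']
```

A pre-check like this only catches gross packaging problems early; the "no errors" Guideline itself is defined by the validator's report.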
Feature | Finding(s) | Guideline
Compliance (GTFS Realtime Data) | Data consumers and their applications rely on the ability to find updated information in a single, regular location. | GTFS Realtime datasets are published at stable URLs (permalinks) from which they can be “fetched” automatically by trip-planning applications. To be complete, this includes all three standard types of Realtime feeds: Trip Updates, Vehicle Positions, and Service Alerts.
Compliance (GTFS Realtime Data) | Access to GTFS Realtime data will be inadequate if that data is not made publicly accessible. | GTFS Realtime datasets are publicly published. To be complete, this includes all three standard types of Realtime feeds: Trip Updates, Vehicle Positions, and Service Alerts.
Compliance (GTFS Realtime Data) | There is a public, free-to-use, and open-source GTFS feed validator, which is available from MobilityData. GTFS feed errors may diminish the ability of transit riders to successfully plan their trip and may reduce their overall confidence in taking transit. | The public datasets regularly produce no errors as reported by the MobilityData GTFS Realtime Validator. This applies to all three standard types of Realtime feeds: Trip Updates, Vehicle Positions, and Service Alerts.
Compliance (GTFS Realtime Data) | To the greatest extent practicable, customers should be able to plan transit trips in the application of their choice. If GTFS Realtime feeds are not known to trip-planning applications, riders will not be receiving the information they need to plan their trip. | The public datasets are being ingested* by trip-planning applications that serve the majority of their customers. This should include:
Compliance (GTFS Realtime Data) | GTFS data should be retrievable without unreasonable legal requirements. | The transit provider’s website includes an open license that allows commercial use of GTFS data. See Caltrans’ model language for examples, or read more about what makes a license open.
*In order for datasets to be ingested, transit providers or their vendor must provide this data to trip-planning applications. See “How do I publish my data to trip-planning applications?” for more information.
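The Realtime Compliance Guidelines above apply to all three standard GTFS Realtime feed types (Trip Updates, Vehicle Positions, and Service Alerts). A minimal sketch of a completeness check over a provider's published links might look like the following. The dictionary keys, the function name, and the example URLs are all hypothetical.

```python
# The three standard GTFS Realtime feed types.
# Assumption: these snake_case keys are just labels chosen for this sketch.
REALTIME_FEED_TYPES = ("trip_updates", "vehicle_positions", "service_alerts")


def missing_realtime_feeds(published_urls: dict[str, str]) -> list[str]:
    """Return the feed types that have no (non-blank) published URL."""
    return [
        feed_type
        for feed_type in REALTIME_FEED_TYPES
        if not published_urls.get(feed_type, "").strip()
    ]


if __name__ == "__main__":
    # Hypothetical provider that has not yet published Service Alerts.
    urls = {
        "trip_updates": "https://example.org/gtfs-rt/trip-updates",
        "vehicle_positions": "https://example.org/gtfs-rt/vehicle-positions",
    }
    print(missing_realtime_feeds(urls))  # ['service_alerts']
```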
Beyond GTFS Compliance
Beyond GTFS Compliance, transit providers should choose improvements from among the remaining Features based on their own interest, priorities, funding, and capacity. The Features below are presented alphabetically. Features are grouped based on whether they apply to GTFS Schedule data, GTFS Realtime data, or Data Availability.
GTFS Schedule Data
Feature | Finding(s) | Guideline
Accurate Accessibility Data | Transit riders with wheelchairs and other mobility aids encounter distinct challenges in accessing transit, including uncertainty as to whether they can board and alight at particular locations using their devices. Transit providers should support the ability of these riders to plan and take trips on transit by publishing information about the locations where wheelchair users can and cannot access the system in trip-planning applications. | The wheelchair_boarding field has a valid, non-empty, and non-null value for every entry in the stops.txt file.
Accurate Accessibility Data | Transit riders with wheelchairs and other mobility aids encounter distinct challenges in accessing transit, including uncertainty as to whether their devices can be used on specific scheduled trips. Transit providers should support the ability of these riders to plan and take trips on transit by publishing information about the trips on which wheelchair users may or may not be able to travel in trip-planning applications. | The wheelchair_accessible field has a valid, non-empty, and non-null value for every entry in the trips.txt file.
Accurate Accessibility Data | Audio annunciation of stop names is an important wayfinding tool for transit riders with visual impairments. Transit providers should support the ability of these riders to conveniently and accurately plan and take trips on transit by ensuring that stop names will be pronounced correctly in trip-planning applications. | The tts_stop_name field should include the correct pronunciation for all stop names in stops.txt that are commonly mispronounced in trip-planning applications.
Accurate Accessibility Data | Transit riders with wheelchairs and other mobility aids encounter distinct challenges in accessing transit, including uncertainty about navigating between boarding zones and street level at stops. Transit providers should support the ability of these riders to plan and take trips on transit by providing sufficient information for them to find accessible paths on and off transit using mobile applications. | Sufficient data is included within stops.txt, pathways.txt, and levels.txt for riders with varying physical abilities to navigate between any boarding zone and street level, including pathway_mode and stair_count where applicable. This includes but is not limited to any stops that use parent_station in stops.txt as well as all significant or named transit facilities where an infrequent visitor may be concerned about accessibility.
Best Practices Alignment | MobilityData maintains a set of recommended practices developed by GTFS practitioners known as the GTFS Schedule Best Practices. The Best Practices align with the Principles of the California Transit Data Guidelines. | Follow all GTFS Schedule Best Practices (version 1.0 or later) and “Should” statements in the specification.
Demand-Responsive Completeness | Transit riders should know what all their publicly available demand-responsive transit options are. (This does not include similar services provided by social service programs.) | The feed represents all demand-responsive transit services available to the general public under the transit provider’s purview. This includes: See GTFS-Flex for more information.
Fare Completeness | Transit riders should have access to complete and accurate fare and payment information without leaving their trip-planning application of choice. | Fares data is published using the Fares v2 format and the following applicable files: fare_leg_rules, rider_categories, fare_containers, fare_products, and fare_transfer_rules.
Fixed-Route Completeness | Transit riders should know what all their fixed-route transit options are. | The feed represents all fixed-route transit services under the transit provider’s purview.
Service Accuracy | Service reliability is a major determining factor in rider satisfaction. Transit riders should be able to rely on the service information they receive from transit providers. | Published GTFS Schedule datasets achieve a score of “passing” or “perfect” in all categories of the GTFS Grading Scheme v1.
Service Accuracy | Accurate route mapping is an important tool for riders to plan and take trips on transit. | shapes.txt is included within the feed, with valid route shapes for each trip. Shapes should be precise enough to show the right-of-way that the vehicle uses and not inaccurately exit the right-of-way.
Up-to-Dateness | Transit riders should be able to rely on the service information they see in their trip planner of choice using the transit provider’s GTFS feed. Major service changes are likely the most important time to have up-to-date schedule information available in trip-planning applications, because it will be accessed by regular as well as infrequent customers. Trip-planning applications often require a week to process new GTFS data, run it through their QA/QC processes, and resolve outstanding issues. | Changes to the base schedule are published at least one week in advance of every planned service change. Note: Unplanned or short-notice service changes should be represented in GTFS Realtime unless you have coordinated with the major trip-planning applications to have them consume an updated GTFS Schedule.
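Several of the Schedule Guidelines above reduce to "every row has a usable value in a given field": wheelchair_boarding in stops.txt, wheelchair_accessible in trips.txt, and a shape for each trip. The sketch below shows one way such a check could be written with the standard library. The helper name `rows_missing_valid_value` and the sample data are hypothetical, and the check assumes the file contents are already available as text.

```python
import csv
import io

# Values the GTFS Schedule reference defines for both wheelchair_boarding
# and wheelchair_accessible. The Guidelines additionally ask that the
# field not be left empty.
ACCESSIBILITY_VALUES = {"0", "1", "2"}


def rows_missing_valid_value(csv_text, key_field, field, valid_values=None):
    """Return the key_field values of rows whose `field` is absent, empty, or invalid.

    If valid_values is None, any non-empty value is accepted.
    """
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(field) or "").strip()
        if not value or (valid_values is not None and value not in valid_values):
            flagged.append(row.get(key_field, ""))
    return flagged


if __name__ == "__main__":
    stops_txt = (
        "stop_id,stop_name,wheelchair_boarding\n"
        "S1,Main St,1\n"
        "S2,Oak Ave,\n"
        "S3,Pine Rd,7\n"
    )
    trips_txt = (
        "trip_id,route_id,wheelchair_accessible,shape_id\n"
        "T1,R1,1,SH1\n"
        "T2,R1,2,\n"
    )
    print(rows_missing_valid_value(stops_txt, "stop_id", "wheelchair_boarding", ACCESSIBILITY_VALUES))
    # ['S2', 'S3']
    print(rows_missing_valid_value(trips_txt, "trip_id", "wheelchair_accessible", ACCESSIBILITY_VALUES))
    # []
    print(rows_missing_valid_value(trips_txt, "trip_id", "shape_id"))
    # ['T2']
```

An empty result for each call is a necessary condition for the corresponding Guideline, not a sufficient one: the values still have to describe the real stops, vehicles, and alignments accurately.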
Realtime Data
Feature | Finding(s) | Guideline
Best Practices Alignment | While the GTFS specification allows for a great level of flexibility, providers should leverage the Features that are expected by data consumers and customers and follow a consistent set of community best practices. | Feeds are consistent with the GTFS Realtime Best Practices and “Should” statements in the GTFS Realtime specification.
Fixed-Route Completeness | Transit riders should know the Realtime status for all fixed-route transit service. | Realtime feeds* represent all fixed-route transit service operated under the transit provider’s purview. *Includes:
Fixed-Route Completeness | Realtime trip data must be provided for the whole system so the customer is confident in making decisions along their journey. | 100% of trip_ids for planned trips are consistent between the GTFS Schedule and all GTFS Realtime data.
Fixed-Route Completeness | Accurate and confident trip-planning also depends on receiving reliable information about any scheduled trips that get canceled. Information about any canceled trips should be communicated to riders. | 100% of planned and/or operated trips are represented within the Trip Updates feed. This includes:
Fixed-Route Completeness | Accurate and confident trip-planning depends on reliably receiving information about all in-service options. Information about all planned trips should be communicated to riders. | 100% of trips marked as “Scheduled,” “Canceled,” or “Added” within the Trip Updates feed are represented within the Vehicle Positions feed.
Service Accuracy | Frequent status updates are needed to provide accurate trip-planning information to customers that they can rely on. | Updates are published to Trip Updates and Vehicle Positions feeds at least once every 20 seconds, including updated timestamps and data for each trip and vehicle in service.
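Two of the Realtime Guidelines above can be expressed as simple comparisons once a feed has been decoded: trip_id consistency with the Schedule feed, and the 20-second update interval. The sketch below assumes the trip_ids and the feed header timestamp (POSIX seconds, as defined by GTFS Realtime) have already been extracted by some other means. Both function names are hypothetical, and checking the header age at fetch time is only a proxy for the publishing interval.

```python
def unmatched_trip_ids(schedule_trip_ids, realtime_trip_ids):
    """Return Realtime trip_ids that do not appear in the Schedule feed, sorted.

    Assumption: the caller has already left out trips that were intentionally
    added in Realtime and therefore have no Schedule counterpart.
    """
    return sorted(set(realtime_trip_ids) - set(schedule_trip_ids))


def is_fresh(feed_timestamp, now, max_age_seconds=20):
    """True if the feed header timestamp is at most max_age_seconds older than `now`.

    Both arguments are POSIX times in seconds.
    """
    return (now - feed_timestamp) <= max_age_seconds


if __name__ == "__main__":
    schedule_ids = ["T1", "T2", "T3"]
    realtime_ids = ["T2", "T9"]
    print(unmatched_trip_ids(schedule_ids, realtime_ids))  # ['T9']

    fetched_at = 1_700_000_015
    print(is_fresh(1_700_000_000, fetched_at))  # True  (15 seconds old)
    print(is_fresh(1_699_999_970, fetched_at))  # False (45 seconds old)
```

In practice the `now` value would come from `time.time()` at the moment the feed is fetched, and the check would be repeated over many fetches before drawing conclusions about update frequency.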
Data Availability
Feature | Finding(s) | Guideline
Feed Aggregator Availability | Many data consumers turn to popular global feed aggregators to look for GTFS Schedule data. Therefore, including the GTFS Schedule feed on these sites allows for a transit provider’s GTFS data to be more discoverable. | The GTFS Schedule dataset is published to global GTFS aggregators, including: transit.land.
Feed Aggregator Availability | Many data consumers turn to popular global feed aggregators to look for GTFS Realtime data. Therefore, including GTFS Realtime feeds on these sites allows for a transit provider’s GTFS data to be more discoverable. | The links to the GTFS Realtime datasets are published to global GTFS aggregators, including: transit.land. To be complete, this includes:
Feed Aggregator Availability | Many data consumers turn to popular global feed aggregators to look for GTFS Schedule data. Therefore, including the GTFS Schedule feed on these sites allows for a transit provider’s GTFS data to be more discoverable. | The GTFS Schedule dataset is included within the Mobility Database.
Feed Aggregator Availability | Many data consumers turn to popular global feed aggregators to look for GTFS Realtime data. Therefore, including GTFS Realtime feeds on these sites allows for a transit provider’s GTFS data to be more discoverable. | The links to the GTFS Realtime datasets are included within the Mobility Database. To be complete, this includes:
Technical Contact | A transit provider’s website should include a way for any data consumer to report errors they observe or ask questions so that all riders can benefit from these reports. | Either the transit provider’s website, or their regional partner’s website, identifies a specific technical contact who knows how to triage inquiries about GTFS. This contact could be, for example, staff at the transit provider (whether a group email inbox such as “gtfs@agency.org”, a link to the main customer service workflow, or an individual) or a vendor contact. This contact is provided alongside the GTFS feeds on the provider’s (or their regional partner’s) website.
Technical Contact | A GTFS Schedule feed should ensure that app developers can report inconsistencies or errors they observe, so that all riders can benefit from these reports. | A technical contact is provided within the published GTFS Schedule dataset in the feed_contact_email field within the feed_info.txt file.
Website Availability | A transit provider’s website should be seen as the source of truth for GTFS Schedule data. Many consumers look to websites for feeds, and a provider publishing this data on its own website gives confidence that the consumer has the correct information. | A link to the GTFS Schedule dataset is published on the transit provider’s website. Transit providers can either publish the links to GTFS data directly or refer to a regional partner where the data can be found. These local and regional websites must be in agreement on the canonical version of each provider’s GTFS data.
Website Availability | A transit provider’s website should be seen as the source of truth for GTFS Realtime data. Many consumers look to websites for feeds, and a provider publishing this data on its own website gives confidence that the consumer has the correct information. | The links to the GTFS Realtime datasets are included on the transit provider’s website. To be complete, this includes: Transit providers can either publish the links to GTFS data directly on their websites or refer to a regional partner where the data can be found. These local and regional websites must be in agreement on the canonical version of each provider’s GTFS data.
Website Availability | It is reasonable for a provider to require authentication to access GTFS Realtime feeds by having users register for a unique API key that can be used during the process of making HTTP requests for GTFS Realtime data. This can include allowing the API key to be accepted as a request parameter or request header. | If an API key is required to access any or all of the GTFS Realtime feeds, the registration process must be: Other means of authentication that require extra effort are discouraged. These include requiring users to frequently renew their API keys, requiring requests be made from specific IP addresses, or using authenticated request protocols other than the HTTP protocol.
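The Website Availability finding above mentions two ways an API key can accompany an HTTP request for GTFS Realtime data: as a request header or as a request parameter. The sketch below builds both kinds of request with the Python standard library without sending them. The header name `x-api-key`, the parameter name `api_key`, and the example URLs are assumptions for illustration only; each provider documents its own names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request


def request_with_key_header(feed_url, api_key, header_name="x-api-key"):
    """Build a GET request that carries the API key in a request header."""
    return Request(feed_url, headers={header_name: api_key})


def request_with_key_param(feed_url, api_key, param_name="api_key"):
    """Build a GET request that carries the API key as a query parameter.

    Any query parameters already present in feed_url are preserved.
    """
    parts = urlsplit(feed_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param_name, api_key))
    return Request(urlunsplit(parts._replace(query=urlencode(query))))


if __name__ == "__main__":
    by_header = request_with_key_header("https://example.org/rt/vehiclepositions", "abc123")
    print(by_header.full_url)
    # https://example.org/rt/vehiclepositions

    by_param = request_with_key_param("https://example.org/rt/tripupdates?format=pb", "abc123")
    print(by_param.full_url)
    # https://example.org/rt/tripupdates?format=pb&api_key=abc123
```

Either request object can then be passed to `urllib.request.urlopen` to fetch the feed. Both approaches stay within plain HTTP, which is the kind of low-effort authentication the Guidelines consider reasonable.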
Tools and Resources
Caltrans maintains several resources to support transit providers in their efforts to meet these Guidelines.
Frequently Asked Questions page
The Guidelines have their own FAQ page that addresses the most common questions raised and links out to further resources. The FAQ page is updated as needed, and we encourage GTFS data producers and consumers to check this page to see if their question is answered there.
Transit Data Check-Ins
How well does a transit provider meet the outcomes listed in the Data Process Checklist? Caltrans is available to formally assess this on request by California transit providers. Caltrans staff review every California transit provider’s consistency with the checklist once a year and reach out to transit providers to share and discuss the findings. To proactively request a check-in, contact hello@calitp.org.
If transit providers would like support in resolving identified gaps, Caltrans provides technical assistance to resolve issues within the data or with creating a plan to meet the Guidelines. These plans provide information about where a provider’s feed is not meeting the Guidelines, suggested fixes and resources, and highlighted priorities. After a provider is “Compliant,” the plans for meeting the other Guidelines are driven by provider preference, time, and resources.
Products and services
For California transit providers that do not currently have GTFS Schedule data, Caltrans can meet with you to strategize how to produce GTFS Schedule data and provide direct assistance. Contact hello@calitp.org for assistance.
For California transit providers that do not currently have the hardware or software to create GTFS Realtime, Caltrans’ Cal-ITP offers GTFS Realtime as a Service (GRaaS), which produces GTFS Realtime data with low hardware costs and no software costs. The Cal-ITP team is available to support software implementation at no cost to transit providers. For more information, contact hello@calitp.org.
For other hardware and software that providers need in order to produce and maintain GTFS Schedule or GTFS Realtime, Cal-ITP invites providers to acquire these services through pre-negotiated, competitively bid contracts at camobilitymarketplace.org as they become available.
Helpdesk
Have a specific or exploratory question about transit data or strategy development? Caltrans’ team of technical experts is available at no cost to advise and support California transit providers. Contact hello@calitp.org to get started.