Critical GTFS Validation Errors
A GTFS validator is a software tool that identifies common issues with a transit provider’s data. Caltrans recommends two validators, one for GTFS Schedule and one for GTFS Realtime.
-
- The Center for Urban Transportation Research’s (CUTR) GTFS Realtime Validator
Each validator can identify dozens of issues, and each issue has a different impact on the transit rider. The lists below indicate which issues Caltrans has deemed critical.
GTFS Schedule
The following errors are deemed critical by Caltrans because they impede the ability of data-consuming applications to properly represent a transit provider’s scheduled services:
GTFS Error | Description |
---|---|
Trips within the same block have overlapping stop times, which is not allowed. |
|
The Validator was unable to parse a field. This is typically caused by a cell containing more than 4096 characters. |
|
Two consecutive points in shapes.txt should have increasing values for shape_dist_traveled. Equal values are also considered an error. |
|
Two consecutive stop times in a trip should have increasing distance. Equal values are also considered an error. |
|
The input file CSV header has the same column name repeated. |
|
Each of the following fields should be unique in fare_rules.txt: fare_rules.route_id, fare_rules.origin_id, fare_rules.contains_id, and fare_rules.destination_id |
|
The values of the given key and rows are duplicates. |
|
Empty CSV file found in the archive: the file does not have any headers, or it is a required file and does not have any data. The GTFS specification requires the first line of each file to contain field names, and required files must have data. |
|
The values of the given key and rows of one table cannot be found with values of the given key in another table. |
|
Agencies from GTFS agency.txt have been found to have different timezones. |
|
A color is coded incorrectly. A color must be encoded as a six-digit hexadecimal number. The leading "#" should not be included. |
|
Value of field with type currency is not valid. Currency code must follow ISO 4217. |
|
A date is coded incorrectly. Dates must have the YYYYMMDD format. |
|
A field contains a malformed email address. |
|
A field cannot be parsed as a floating point number. |
|
A field cannot be parsed as an integer. |
|
A field contains an incorrect language code. Language codes must follow IETF BCP 47. |
|
A row in the input file has a different number of values than specified by the CSV header. |
|
Value of a field with type time is not valid. Time must be in the H:MM:SS, HH:MM:SS or HHH:MM:SS format. |
|
Value of field with type timezone is not valid. Timezones are defined at www.iana.org. Timezone names never contain the space character but may contain an underscore. |
|
A field contains a malformed URL. |
|
The following location types must have a parent station: entrance/exit, generic node, and boarding area. |
|
Both files calendar_dates.txt and calendar.txt are missing from the GTFS dataset. At least one of the files must be provided. |
|
A required column is missing. |
|
A required field is blank. |
|
A required file is missing. |
|
First and last stop of a trip must define both arrival_time and departure_time fields. |
|
A value in a CSV file contains a new line or carriage return. |
|
The values in the given column of the input rows are out of range. |
|
Trip frequencies must not overlap in time. |
|
Both short_name and long_name are missing for a route. At least one is required. |
|
The GTFS spec requires that a route_description is distinct from both the route short and long names. |
|
Date or time fields have been found equal. |
|
Date or time fields have been found out of order. |
|
Field parent_station must be empty when location_type is 1 (station). |
|
The departure_time must not precede the arrival_time in stop_times.txt if both are given. |
|
Value of field location_type of parent found in field parent_station is invalid. Stops, Platforms, Entrance/exit, and generic nodes can only have a Station as a parent. Boarding Area can only have Platform as a parent. Stations cannot have a parent. All other combinations are prohibited. |
|
All routes of the same route_type with the same agency_id should have unique combinations of route_short_name and route_long_name. |
|
A column name has not been provided. Such columns are skipped by the Validator. |
|
A row in the input file has only spaces. |
|
At any time, the published GTFS dataset should be valid for at least the next 7 days, and ideally for as long as the operator is confident that the schedule will continue to be operated. If possible, the GTFS dataset should cover at least the next 30 days of service. |
|
A file that is expected to have only one row has more than one row. |
|
A value of a field with type id contains non-ASCII or non-printable characters. This is not recommended. |
|
A platform has no parent_station field set. |
|
Short and long names should not be identical. |
|
Short name of a route is too long (more than 12 characters). Note that major trip planning applications start truncating short names after seven characters. |
|
The start_time and end_time in frequencies.txt should not match, or block IDs will not be able to represent linked trips. |
|
Any record with stop_times.timepoint set to 1 should define a value for stop_times.arrival_time and stop_times.departure_time fields. |
|
Per GTFS Best Practices, route alignments (in shapes.txt) should be within 100 meters of stop locations which a trip serves. |
|
As implemented in the original Google Python GTFS Validator, the calculated speed between stops should not be greater than 150 km/h (42 m/s SI or 93 mph). |
|
An enum has an unexpected value. |
|
A trip must visit more than one stop in stop_times.txt to be usable by passengers for boarding and alighting. |
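
Several of the format rules in the table above are mechanical enough to spot-check before running a full validator. The following is a minimal Python sketch, not the validator's own implementation: the function names are hypothetical, and the patterns are assumptions derived only from the descriptions above (six-digit hexadecimal colors without a leading "#", YYYYMMDD dates, H:MM:SS, HH:MM:SS, or HHH:MM:SS times, and strictly increasing distance values).

```python
import re
from datetime import datetime
from typing import Sequence

# Patterns derived from the rule descriptions above. These are assumptions,
# not copies of the validator's source code.
COLOR_RE = re.compile(r"[0-9A-Fa-f]{6}")
DATE_RE = re.compile(r"[0-9]{8}")
TIME_RE = re.compile(r"[0-9]{1,3}:[0-5][0-9]:[0-5][0-9]")


def is_valid_color(value: str) -> bool:
    """Six hexadecimal digits, with no leading '#'."""
    return COLOR_RE.fullmatch(value) is not None


def is_valid_date(value: str) -> bool:
    """Eight digits in YYYYMMDD order that also form a real calendar date."""
    if DATE_RE.fullmatch(value) is None:
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True


def is_valid_time(value: str) -> bool:
    """H:MM:SS, HH:MM:SS, or HHH:MM:SS; the hour part may exceed 23."""
    return TIME_RE.fullmatch(value) is not None


def is_strictly_increasing(distances: Sequence[float]) -> bool:
    """True when every value is greater than the one before it.

    Equal neighbors fail, matching the shape_dist_traveled rule above.
    """
    return all(a < b for a, b in zip(distances, distances[1:]))
```

A check such as `is_valid_color("#FF00AA")` returns `False` because of the leading "#", and `is_strictly_increasing([0.0, 1.5, 1.5])` returns `False` because two consecutive values are equal.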
GTFS Realtime
The errors and warnings generated by the GTFS Realtime Validator below are considered critical by Caltrans. The complete list of errors is available on GitHub.
GTFS Error | Description |
---|---|
E001 | Not in POSIX time |
E002 | stop_time_updates not strictly sorted |
E003/E004 | GTFS-rt trip_id or trip specified by (route_id,direction_id,start_time) does not exist in GTFS data |
E009 | GTFS-rt stop_sequence isn't provided for trip that visits same stop_id more than once |
E011 | GTFS-rt stop_id does not exist in GTFS data |
E020/E021 | Invalid start_time or start_date format |
E022 | Sequential stop_time_update times are not increasing |
E026 | Invalid vehicle position |
E029 | Vehicle position far from trip shape |
E040 | stop_time_update doesn't contain stop_id or stop_sequence |
E043 | stop_time_update doesn't have arrival or departure |
E044 | stop_time_update arrival/departure doesn't have delay or time |
W007 | Refresh interval is more than 35 seconds |
W008 | Header timestamp is older than 65 seconds |
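
Some of the Realtime checks above reduce to simple comparisons. The sketch below illustrates three of them in Python on plain values rather than on parsed protocol buffer messages. The function names are hypothetical, the thresholds are taken from the table above, and the sort key for E002 is assumed to be stop_sequence; the CUTR validator's actual logic may differ in detail.

```python
import time
from typing import Optional, Sequence

MAX_REFRESH_INTERVAL_SECONDS = 35  # W007 threshold from the table above
MAX_HEADER_AGE_SECONDS = 65        # W008 threshold from the table above


def refresh_too_slow(previous_timestamp: int, current_timestamp: int) -> bool:
    """W007: more than 35 seconds elapsed between two feed updates."""
    return current_timestamp - previous_timestamp > MAX_REFRESH_INTERVAL_SECONDS


def header_is_stale(header_timestamp: int, now: Optional[float] = None) -> bool:
    """W008: the header timestamp is more than 65 seconds old.

    `now` defaults to the current POSIX time and can be overridden in tests.
    """
    current = time.time() if now is None else now
    return current - header_timestamp > MAX_HEADER_AGE_SECONDS


def sequences_strictly_sorted(stop_sequences: Sequence[int]) -> bool:
    """E002 (assumed key: stop_sequence): values must strictly increase."""
    return all(a < b for a, b in zip(stop_sequences, stop_sequences[1:]))
```

All timestamps here are POSIX times in seconds, which is also what E001 requires of the feed itself.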