

CAV Data

Michigan Safety Pilot Model Deployment Data (SPMD)

Contributor(s): Sandeep Mudigonda

Connected vehicle (CV) technology is taking shape, with the government mandating the inclusion of vehicle-to-vehicle (V2V) equipment, such as dedicated short-range communications (DSRC) technology, in all new light-weight vehicles starting from 2018. Equipped vehicles communicate using, broadly, two kinds of messages: the Probe Data Message (PDM) and the Basic Safety Message (BSM). A PDM includes location, speed, and direction, and is transmitted at 1 Hz. A BSM is much more comprehensive, including location, speed, acceleration, brake status, steering angle, windshield wiper status, etc., and is collected at 10 Hz.
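To make the difference between the two message rates concrete, the following is a minimal sketch, not the standardized message layout. The class name, field names, and the `thin_to_1hz` helper are all hypothetical; the fields simply mirror the BSM contents listed above, and the helper keeps one message per vehicle per second to mimic a 1 Hz stream.

```python
from dataclasses import dataclass


@dataclass
class BasicSafetyMessage:
    """Illustrative subset of the BSM fields named above (hypothetical names)."""
    vehicle_id: str
    time_s: float              # timestamp in seconds
    latitude: float
    longitude: float
    speed_mps: float
    acceleration_mps2: float
    brake_applied: bool
    steering_angle_deg: float
    wipers_on: bool


def thin_to_1hz(messages):
    """Keep the first message in each whole second, per vehicle."""
    seen = set()
    kept = []
    for msg in sorted(messages, key=lambda m: (m.vehicle_id, m.time_s)):
        key = (msg.vehicle_id, int(msg.time_s))
        if key not in seen:
            seen.add(key)
            kept.append(msg)
    return kept


# Two seconds of made-up 10 Hz messages from a single vehicle.
stream = [
    BasicSafetyMessage("veh-1", i / 10, 42.28, -83.74, 12.0, 0.0, False, 0.0, False)
    for i in range(20)
]
thinned = thin_to_1hz(stream)
```

Here 20 input messages reduce to 2, one for each second covered by the stream.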

The Safety Pilot Model Deployment (SPMD) was conducted in Ann Arbor, Michigan, and data were collected between August 2012 and August 2013 from vehicles equipped with DSRC devices. These data include BSM messages generated by vehicles, vehicular data, BSM messages received by road-side equipment (RSE), warning messages generated by RSE, and weather and traffic count data.

The data can be accessed from the FHWA’s Research Data Exchange (RDE) at . In addition to the real-world CV data, namely the SPMD data, other CV data generated from a few well-calibrated microscopic simulation networks are also available on the RDE.

Commonly Used Tools

BSM Emulator

Analytical assessment of vehicle trajectories from BSM and PDM data is critical for providing data to Dynamic Mobility Applications (DMA) and to safety and environmental applications. An open-source platform, the BSM Data Emulator, was developed to enable these analyses. The BSM Data Emulator provides the capability for cross-cutting analyses of messages and communications methods to support research in mobile messaging for various connected vehicle applications. The tool can be used to simulate mobile wireless protocols for Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) messages using DSRC and non-DSRC (cellular) communication, and to emulate alternative strategies for in-vehicle data capture, storage, and transmission timing. It is also useful for simulating, by messaging variant and communications medium, the latency and loss of data available for system manager use in an operational data environment. Additionally, the emulator can be used to conduct a side-by-side assessment and comparison of the characteristics and attributes of Japanese, European, and US messaging protocols and approaches.

Trajectory data for the emulator are derived from a variety of sources, both observed and simulated. Vehicle data such as vehicle ID, time, speed, acceleration, lane, vehicle length, link_x, X coordinate, Y coordinate, and communication mode are used. The algorithms in the Trajectory Conversion Analysis (TCA) tool within the emulator help to examine the effectiveness of BSM, variants of BSM, and PDM transmitted via DSRC and non-DSRC technologies in estimating:

  1. Queues at known bottlenecks
  2. Cycle failures
  3. Shockwaves
  4. Queues at variable locations
  5. Travel time
  6. Delay
  7. Space Mean Speed
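As a rough illustration of the last three measures in the list, the sketch below computes mean travel time, mean delay, and space mean speed for a single segment using their textbook definitions. It is not the TCA tool's algorithm; the function name, the input layout (entry and exit times per vehicle), and the sample values are assumptions made for this example.

```python
def segment_measures(trajectories, segment_length_m, free_flow_speed_mps):
    """Return (mean travel time, mean delay, space mean speed) for one segment.

    trajectories maps a vehicle ID to (entry_time_s, exit_time_s).
    """
    travel_times = [exit_t - entry_t for entry_t, exit_t in trajectories.values()]
    n = len(travel_times)
    free_flow_time = segment_length_m / free_flow_speed_mps

    mean_travel_time = sum(travel_times) / n
    # Delay is travel time in excess of the free-flow travel time.
    mean_delay = sum(max(0.0, t - free_flow_time) for t in travel_times) / n
    # Space mean speed: total distance travelled divided by total time spent.
    space_mean_speed = segment_length_m * n / sum(travel_times)
    return mean_travel_time, mean_delay, space_mean_speed


# Made-up entry/exit times (seconds) for three vehicles on a 500 m segment.
sample = {"a": (0.0, 25.0), "b": (5.0, 55.0), "c": (10.0, 35.0)}
mean_tt, mean_delay, sms = segment_measures(sample, 500.0, 25.0)
```

Space mean speed is computed from total distance over total time rather than by averaging individual vehicle speeds, which would give the (higher) time mean speed instead.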

The BSM Data Emulator is available for free on the FHWA’s RDE.

CV Pilot Deployment Performance Evaluation

Contributor(s): Sisinnio Concas

The CV Pilot Deployment Program is a national effort to deploy, test, and operationalize cutting-edge mobile and roadside technologies and to enable multiple CV applications. The objective of the deployment is to assess and measure the benefits of CV technologies using multiple performance measures. This will help inform policy makers about the extent to which these technologies help save lives, improve personal mobility, enhance economic productivity, reduce environmental impacts, and transform public agency operations.

Section 1: Data Sources

The project has entered Phase II of deployment. Over the course of the Tampa CV Pilot, CV data will be collected from a sample of 1600 participants traveling through the Tampa CBD. Primary data will be collected under a quasi-experimental approach in which the sample of participants is split into a treatment group (those receiving warnings) and a control group (those not receiving warnings). The data include BSM messages generated by vehicles, vehicular data, BSM messages received by road-side equipment (RSE), warning messages generated by RSE, and weather and traffic count data. Sample data will become available to researchers through the Research Data Exchange (RDE):

Section 2: Commonly Used Tools

A set of ad hoc data-fusion tools to assess and measure system performance is under development. Non-proprietary tools will become available at the end of the deployment.

Section 3: Emerging Trends for Topic

The biggest challenge in the deployment will be related to data generation and transmission. An emerging trend is the development of standardized performance measures to evaluate the benefits of CV technologies for different stakeholders, from the perspective of both users and public agencies. This calls for new methods to scale performance measures fitted to high-granularity data up to aggregated, system-wide performance measures.

Data Utilized for District Mobility Project

(contact: Ryan Westrom, DDOT)

Full technical summary:

Table 1: Performance Measures and Associated Data

| Mobility Story | Performance Measure | Mode | Data Source | Reporting Period |
|---|---|---|---|---|
| Time spent commuting | Commute Time | Multiple | American Community Survey (ACS), TIGER/Line Census Tract Shapefiles | 2010-2014 (five year average) |
| How we are commuting | Commute Mode Split | Multiple | American Community Survey (ACS), TIGER/Line Census Tract Shapefiles | 2014 |
| Travel during the week | Travel Time Index | Auto | INRIX | 2015 |
| How many people ride the bus | Bus Ridership | Transit | Automatic Passenger Count (APC) | October 2015 |
| Which bus stops serve the most riders | Bus Ridership | Transit | Automatic Passenger Count (APC) | October 2015 |
| Which bus routes are most crowded | Bus Overcrowding | Transit | Automatic Passenger Count (APC) | October 2015 |
| How reliable are our roads | Travel Time Reliability (Planning Time Index) | Auto | INRIX | 2015 |
| Bus reliability | Bus On-Time Performance | Transit | Automatic Passenger Count (APC) | October 2015 |
| Transit coverage area | Transit Coverage Area | Transit | The General Transit Feed Specification (GTFS), District Geographic Information System (GIS) | October 2015 – April 2016 |
| Bicycle comfort network | Bicycle Level of Traffic Stress | Bicycle | District Bike Layer, District Centerline Layer, Business District Layer | 2010-2015* |
| Pedestrian environment | Pedestrian Friendliness Index | Pedestrian | District Centerline Layer, District Census Block Layer, District Sidewalk Layer, District Census Block Centroids Layer | 2010-2015* |
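The two auto measures in Table 1 are commonly defined as ratios to free-flow travel time: the Travel Time Index as mean travel time over free-flow travel time, and the Planning Time Index as the 95th percentile travel time over free-flow travel time. The sketch below applies those common definitions to made-up travel times. It does not claim to reproduce the District Mobility Project's exact procedure, and the function names and the nearest-rank percentile rule are choices made for this example.

```python
import math


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def reliability_indices(travel_times_min, free_flow_min):
    """Return (Travel Time Index, Planning Time Index) for one road segment."""
    mean_time = sum(travel_times_min) / len(travel_times_min)
    tti = mean_time / free_flow_min
    pti = nearest_rank_percentile(travel_times_min, 95) / free_flow_min
    return tti, pti


# Made-up peak-period travel times (minutes) on a segment with a
# 10-minute free-flow travel time.
observed = [10, 11, 12, 12, 13, 14, 15, 16, 18, 20]
tti, pti = reliability_indices(observed, free_flow_min=10)
```

With these sample values the average trip takes about 1.4 times the free-flow time, while a traveler who wants to be on time for 95 percent of trips should budget twice the free-flow time.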

Urban Safety Data

Motor Vehicle Collision Records for NYC

Contributor(s): Sandeep Mudigonda

As part of its Vision Zero program for traffic safety, NYC periodically uploads crash data. These data have been available since April 2014. The data include the following fields:

| Fields | Fields |
|---|---|
| Date | Number of motorist injured |
| Time | Number of motorist killed |
| Borough | Contributing factor vehicle 1 |
| Zip code | Contributing factor vehicle 2 |
| Latitude | Contributing factor vehicle 3 |
| Longitude | Contributing factor vehicle 4 |
| Location | Contributing factor vehicle 5 |
| On street name | Unique key |
| Cross street name | Vehicle type code 1 |
| Off street name | Vehicle type code 2 |
| Number of persons injured | Vehicle type code 3 |
| Number of persons killed | Vehicle type code 4 |
| Number of pedestrians injured | Vehicle type code 5 |
| Number of pedestrians killed | |
| Number of cyclist injured | |
| Number of cyclist killed | |
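A downloaded extract with these fields can be summarized with a few lines of standard-library code. The sketch below totals injuries by borough. The sample rows are invented, and the exact column headers are an assumption based on the field list above; they should be checked against the header row of the actual file.

```python
import csv
import io
from collections import Counter

# Invented excerpt; column names are assumed from the field list above.
SAMPLE_CSV = """DATE,TIME,BOROUGH,NUMBER OF PERSONS INJURED,NUMBER OF PERSONS KILLED
04/01/2014,08:15,BROOKLYN,2,0
04/01/2014,09:40,QUEENS,0,0
04/02/2014,17:05,BROOKLYN,1,1
"""


def injuries_by_borough(csv_text):
    """Sum the persons-injured column for each borough."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["BOROUGH"]] += int(row["NUMBER OF PERSONS INJURED"])
    return dict(totals)


totals = injuries_by_borough(SAMPLE_CSV)
```

For a real file, the in-memory string can be replaced by an open file handle passed to `csv.DictReader`; rows with a blank borough would need to be handled explicitly.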

Basic querying tools are available on the standardized feeds available on

GPS Data


Contributor(s): Sandeep Mudigonda

The New York City Taxi & Limousine Commission (NYCTLC) periodically releases detailed historical trip datasets going back to 2009. These include yellow and green taxi trips. Each trip record contains the latitude/longitude of the pick-up and drop-off locations, timestamps for when the trip started and ended, and other variables including fare amount, payment method, and distance traveled.
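Because each record carries both endpoints and both timestamps, simple derived measures are easy to compute. The sketch below derives the trip duration and the straight-line (great-circle) distance between pick-up and drop-off for one made-up record. The dictionary keys and the timestamp format are assumptions for illustration, not the published column names.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def trip_duration_min(pickup, dropoff, fmt="%Y-%m-%d %H:%M:%S"):
    """Trip duration in minutes from two timestamp strings."""
    delta = datetime.strptime(dropoff, fmt) - datetime.strptime(pickup, fmt)
    return delta.total_seconds() / 60


# Made-up trip record with hypothetical key names.
trip = {
    "pickup_time": "2015-01-01 08:00:00",
    "dropoff_time": "2015-01-01 08:12:30",
    "pickup": (40.7580, -73.9855),
    "dropoff": (40.7128, -74.0060),
}
duration_min = trip_duration_min(trip["pickup_time"], trip["dropoff_time"])
straight_km = haversine_km(*trip["pickup"], *trip["dropoff"])
```

The straight-line distance is a lower bound on the distance actually driven, so comparing it with the recorded trip distance is one quick plausibility check on a record.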

On one occasion, after a Freedom of Information request was filed by FiveThirtyEight, Uber released trip data for a few months. Approximately 19 million Uber rides are available from FiveThirtyEight’s GitHub repository, covering April-September 2014 (4.5 million) and January-June 2015 (14.3 million). These data are less detailed than the NYCTLC taxi data: times and coordinates are available only for Uber pick-ups in April-September 2014, while only an aggregate location is available for January-June 2015.

NREL Data: Second-by-second GPS readings for several miles of travel, along with vehicle characteristics and survey participant demographics. To get access to the full data, the proposed use of the data has to be discussed with NREL and must go through an application and approval process. Upon approval, users may remotely connect to a secure environment, which prohibits removal of sensitive data. Within that environment, the provided software tools and reference data allow users to create specialized database queries, perform detailed calculations, and conduct statistical and geographic information system analyses. Users may also request to have custom files or programs loaded for them, and to have aggregated results or reports sent directly to them.

Vehicle Trajectory Data

Contributor(s): Sandeep Mudigonda

Very high-resolution trajectory data are essential for understanding driver behavior. Models of car-following, lane-change decision making, gap acceptance, etc. can be developed and calibrated using such data. The Next Generation Simulation (NGSIM) data include high-resolution vehicle trajectories extracted from aerial videos at four sites in the US: US-101 (Los Angeles, CA), I-80 (Emeryville, CA), Lankershim Boulevard (Los Angeles, CA), and Peachtree Street NE (Atlanta, GA). Each section of freeway or arterial is about 0.5-1 mile in length. The dataset includes vehicle positions at 10 Hz. Additionally, sensor data from the sections are available. The data can be accessed from the FHWA’s Research Data Exchange (RDE) at . More information on the dataset, collection process and sites can be found here:
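As a separate illustration of how 10 Hz positions feed car-following calibration, the sketch below derives speed and acceleration from a short position series by first-order finite differences. The helper name and the sample positions are made up for this example. Real extracted trajectories typically contain measurement noise, so some smoothing is usually advisable before differencing.

```python
def finite_differences(values, dt):
    """First-order forward differences of an evenly sampled series."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


DT = 0.1  # sampling interval in seconds for 10 Hz data

# Made-up longitudinal positions (metres) of one vehicle at 10 Hz.
positions_m = [0.0, 2.0, 4.02, 6.06, 8.12]

speeds_mps = finite_differences(positions_m, DT)    # one value per interval
accels_mps2 = finite_differences(speeds_mps, DT)    # one fewer value again
```

Each differencing step shortens the series by one sample and amplifies noise by a factor of 1/DT, which is why unsmoothed accelerations derived from video-based positions can look implausibly large.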

(Older) ITS Data

Contributor(s): Kristin Tufte

Link to Detailed Description of ITS Data sources – a bit dated, but good info: