Urban Big Data
# CAV Data
# Michigan Safety Pilot Model Deployment Data (SPMD)
Contributor(s): Sandeep Mudigonda, email@example.com
Connected vehicle (CV) technology is taking shape, with the government mandating the inclusion of vehicle-to-vehicle (V2V) equipment, such as DSRC technology, in all new light vehicles starting in 2018. Equipped vehicles communicate using, broadly, two kinds of messages: the Probe Data Message (PDM) and the Basic Safety Message (BSM). The PDM includes location, speed, and direction, and is transmitted at 1 Hz. The BSM is much more comprehensive, including location, speed, acceleration, brake status, steering angle, windshield wiper status, etc., and is collected at 10 Hz.
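As a rough illustration of the difference between the two message types, the sketch below models a BSM-like and a PDM-like record and thins a 10 Hz stream to 1 Hz. The class names, field names, and the simple "keep every tenth message" rule are assumptions made for illustration; they do not reproduce the actual message layouts or the rules by which real PDM snapshots are generated.

```python
from dataclasses import dataclass


@dataclass
class BasicSafetyMessage:
    """Simplified BSM-like record (illustrative fields, not the real layout)."""
    vehicle_id: str
    t: float            # seconds since start of trip
    lat: float
    lon: float
    speed_mps: float
    heading_deg: float
    accel_mps2: float
    brake_on: bool


@dataclass
class ProbeRecord:
    """Reduced PDM-like record: location, speed, and direction only."""
    vehicle_id: str
    t: float
    lat: float
    lon: float
    speed_mps: float
    heading_deg: float


def downsample_to_probe(bsms, bsm_rate_hz=10, probe_rate_hz=1):
    """Keep every (bsm_rate_hz // probe_rate_hz)-th message and drop extra fields."""
    step = bsm_rate_hz // probe_rate_hz
    return [
        ProbeRecord(m.vehicle_id, m.t, m.lat, m.lon, m.speed_mps, m.heading_deg)
        for m in bsms[::step]
    ]


# Three seconds of synthetic 10 Hz data for one hypothetical vehicle.
bsms = [
    BasicSafetyMessage("veh-1", i / 10, 42.28 + i * 1e-6, -83.74, 12.0, 90.0, 0.0, False)
    for i in range(30)
]
probes = downsample_to_probe(bsms)
```

Thirty BSM-like records collapse to three probe records here, which gives a feel for the tenfold difference in data volume between the two rates.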
The Safety Pilot Model Deployment was conducted in Ann Arbor, Michigan, and data were collected between August 2012 and August 2013 from vehicles equipped with DSRC devices. These data include BSMs generated by vehicles, vehicular data, BSMs received by roadside equipment (RSE), warning messages generated by RSE, and weather and traffic count data.
The data can be accessed from the FHWA’s Research Data Exchange (RDE) at https://www.its-rde.net/. In addition to the real-world SPMD data, CV data generated from a few well-calibrated microscopic simulation networks are also available on the RDE.
Commonly Used Tools
Analytical assessment of vehicle trajectories from BSM and PDM data is critical for supplying data to Dynamic Mobility Applications (DMA) and to safety and environmental applications. An open-source platform, the BSM Data Emulator, was developed to enable these analyses. The emulator provides the capability for cross-cutting analyses of messages and communication methods to support research in mobile messaging for various connected vehicle applications. The tool can be used to simulate mobile wireless protocols for Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) messages using DSRC and non-DSRC (cellular) communication, and to emulate alternative strategies for in-vehicle data capture, storage, and transmission timing. It can also simulate the latency and data loss, by messaging variant and communications medium, of the data available to system managers in an operational data environment. Additionally, the emulator can be used to conduct a side-by-side assessment and comparison of the characteristics and attributes of Japanese, European, and US messaging protocols and approaches.
Trajectory data for the emulator are derived from a variety of sources, both observed and simulated. The inputs are vehicle data such as vehicle ID, time, speed, acceleration, lane, vehicle length, link_x, X coordinate, Y coordinate, and communication mode. The algorithms in the emulator's Trajectory Conversion Analysis (TCA) tool help examine how effective BSMs, BSM variants, and PDMs transmitted via DSRC and non-DSRC technologies are in estimating:
- Queues at known bottlenecks
- Cycle failures
- Queues at variable locations
- Travel time
- Space Mean Speed
The BSM Data Emulator is available for free on the FHWA’s RDE.
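To make two of the listed measures concrete, the sketch below computes space mean speed and the corresponding travel time from per-vehicle speeds reported over one segment. It uses the textbook definition (the harmonic mean of speeds) and is not the TCA tool's own algorithm; the function names and sample values are hypothetical.

```python
def space_mean_speed(speeds_mps):
    """Harmonic mean of per-vehicle speeds measured over the same segment."""
    if not speeds_mps or any(v <= 0 for v in speeds_mps):
        raise ValueError("speeds must be a non-empty list of positive values")
    return len(speeds_mps) / sum(1.0 / v for v in speeds_mps)


def mean_travel_time(segment_length_m, speeds_mps):
    """Average time to traverse the segment, implied by the space mean speed."""
    return segment_length_m / space_mean_speed(speeds_mps)


# Hypothetical speeds (m/s) of three vehicles over a 300 m segment.
speeds = [10.0, 20.0, 20.0]
sms = space_mean_speed(speeds)         # about 15 m/s
tt = mean_travel_time(300.0, speeds)   # about 20 s
```

The harmonic mean weights slow vehicles more heavily than the arithmetic mean (about 16.7 m/s for the same sample), which is why the two should not be used interchangeably when estimating travel time.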
# CV Pilot Deployment Performance Evaluation
Contributor(s): Sisinnio Concas, firstname.lastname@example.org
The CV Pilot Deployment Program is a national effort to deploy, test, and operationalize cutting-edge mobile and roadside technologies and enable multiple CV applications. The objective of the deployment is to assess and measure the benefits of CV technologies using multiple performance measures. This will help inform policy makers about the extent to which these technologies help save lives, improve personal mobility, enhance economic productivity, reduce environmental impacts, and transform public agency operations. More information: https://www.its.dot.gov/pilots/index.htm
Section 1: Data Sources
The project has entered Phase II of deployment. Over the course of the Tampa CV Pilot, CV data will be collected from a sample of 1,600 participants traveling through the Tampa central business district (CBD). Data collection will be primary and will follow a quasi-experimental approach in which the sample of participants is split into treatment (those receiving warnings) and control (those not receiving warnings) groups. The data include BSMs generated by vehicles, vehicular data, BSMs received by roadside equipment (RSE), warning messages generated by RSE, and weather and traffic count data. Sample data will become available to researchers through the Research Data Exchange (RDE): https://www.its-rde.net/
Section 2: Commonly Used Tools
A set of ad hoc data-fusion tools to assess and measure system performance is under development. Non-proprietary tools will become available at the end of the deployment.
Section 3: Emerging Trends for Topic
The biggest challenge in the deployment will be related to data generation and transmission. An emerging trend is the development of standardized performance measures to evaluate the benefits of CV technologies for different stakeholders, from the perspective of both users and public agencies. This calls for new methods to scale performance measures fitted to high-granularity data up to aggregated, system-wide performance measures.
# Data Utilized for District Mobility Project
(contact: Ryan Westrom, DDOT, email@example.com)
Table 1: Performance Measures and Associated Data
|Mobility Story||Performance Measure||Mode||Data Source||Reporting Period|
|Time spent commuting||Commute Time||Multiple||American Community Survey (ACS), TIGER/Line Census Tract Shapefiles||2010-2014 (five year average)|
|How we are commuting||Commute Mode Split||Multiple||American Community Survey (ACS), TIGER/Line Census Tract Shapefiles||2014|
|Travel during the week||Travel Time Index||Auto||INRIX||2015|
|How many people ride the bus||Bus Ridership||Transit||Automatic Passenger Count (APC)||October 2015|
|Which bus stops serve the most riders||Bus Ridership||Transit||Automatic Passenger Count (APC)||October 2015|
|Which bus routes are most crowded||Bus Overcrowding||Transit||Automatic Passenger Count (APC)||October 2015|
|How reliable are our roads||Travel Time Reliability (Planning Time Index)||Auto||INRIX||2015|
|Bus reliability||Bus On-Time Performance||Transit||Automatic Passenger Count (APC)||October 2015|
|Transit coverage area||Transit Coverage Area||Transit||General Transit Feed Specification (GTFS), District Geographic Information System (GIS)||October 2015 – April 2016|
|Bicycle comfort network||Bicycle Level of Traffic Stress||Bicycle||District Bike Layer, District Centerline Layer, Business District Layer||2010-2015*|
|Pedestrian environment||Pedestrian Friendliness Index||Pedestrian||District Centerline Layer, District Census Block Layer, District Sidewalk Layer, District Census Block Centroids Layer||2010-2015*|
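The two auto measures in Table 1 can be illustrated with a short calculation. The sketch below assumes the commonly used definitions (Travel Time Index as mean travel time divided by free-flow travel time, Planning Time Index as 95th-percentile travel time divided by free-flow travel time); whether the District Mobility Project computed them exactly this way is an assumption, and the travel times are made up.

```python
from statistics import mean


def travel_time_index(travel_times_min, free_flow_min):
    """Mean observed travel time relative to free-flow travel time."""
    return mean(travel_times_min) / free_flow_min


def planning_time_index(travel_times_min, free_flow_min):
    """95th-percentile travel time (nearest-rank method) relative to free flow."""
    ordered = sorted(travel_times_min)
    rank = (95 * len(ordered) + 99) // 100   # integer ceiling of 0.95 * n
    return ordered[rank - 1] / free_flow_min


# Twenty hypothetical travel times (minutes) on a corridor with a 10-minute free-flow time.
observed = [10] * 5 + [12] * 10 + [15] * 4 + [20]
tti = travel_time_index(observed, 10)     # about 1.25
pti = planning_time_index(observed, 10)   # about 1.5
```

A TTI of 1.25 means a typical trip takes 25% longer than at free flow, while a PTI of 1.5 means a traveler should budget 50% more than the free-flow time to arrive on time on most days.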
# Urban Safety Data
Motor Vehicle Collision Records for NYC
Contributor(s): Sandeep Mudigonda, firstname.lastname@example.org
As part of its Vision Zero program for traffic safety, NYC periodically uploads crash data. These data have been available since April 2014. The data contain the following fields:
|Date||Number of motorist injured|
|Time||Number of motorist killed|
|Borough||Contributing factor vehicle 1|
|Zip code||Contributing factor vehicle 2|
|Latitude||Contributing factor vehicle 3|
|Longitude||Contributing factor vehicle 4|
|Location||Contributing factor vehicle 5|
|On street name||Unique key|
|Cross street name||Vehicle type code 1|
|Off street name||Vehicle type code 2|
|Number of persons injured||Vehicle type code 3|
|Number of persons killed||Vehicle type code 4|
|Number of pedestrians injured||Vehicle type code 5|
|Number of pedestrians killed|
|Number of cyclist injured|
|Number of cyclist killed|
Basic querying tools are provided with the standardized feeds at https://data.cityofnewyork.us/Public-Safety/NYPD-Motor-Vehicle-Collisions/h9gi-nx95.
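Beyond the online query tools, a downloaded export can be summarized with a few lines of code. The sketch below totals injuries by borough from CSV rows; the column headers are assumptions modeled on the field list above and should be checked against the actual export, and the sample rows are invented for illustration.

```python
import csv
import io
from collections import Counter

# Invented sample rows; headers are assumed to mirror the field list above.
SAMPLE_CSV = """DATE,TIME,BOROUGH,NUMBER OF PERSONS INJURED,NUMBER OF PERSONS KILLED
04/01/2014,08:15,BROOKLYN,1,0
04/01/2014,09:40,QUEENS,0,0
04/02/2014,17:05,BROOKLYN,2,0
04/02/2014,23:30,,0,0
"""


def injuries_by_borough(csv_file):
    """Sum the persons-injured column per borough; blank boroughs become UNKNOWN."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        borough = row["BOROUGH"] or "UNKNOWN"
        totals[borough] += int(row["NUMBER OF PERSONS INJURED"] or 0)
    return totals


totals = injuries_by_borough(io.StringIO(SAMPLE_CSV))
```

To run this on a real export, pass an open file object instead of the in-memory sample, for example `injuries_by_borough(open("collisions.csv", newline=""))`, where the file name is hypothetical.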
# GPS Data
Contributor(s): Sandeep Mudigonda, email@example.com
The New York City Taxi & Limousine Commission (NYCTLC) periodically releases detailed historical datasets covering trips since 2009. These include yellow and green taxi trips. Each trip record contains the latitude/longitude of the pick-up and drop-off locations, timestamps for when the trip started and ended, and other variables including fare amount, payment method, and distance traveled.
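A common first step with such trip records is to compute the straight-line (great-circle) distance between pick-up and drop-off, for example to compare it with the reported distance traveled. The sketch below uses the haversine formula; the dictionary keys and coordinates are hypothetical and do not necessarily match the column names in the released files.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical trip record; the key names are assumptions for illustration.
trip = {
    "pickup_latitude": 40.7580,
    "pickup_longitude": -73.9855,
    "dropoff_latitude": 40.7484,
    "dropoff_longitude": -73.9857,
}
straight_line_km = haversine_km(
    trip["pickup_latitude"], trip["pickup_longitude"],
    trip["dropoff_latitude"], trip["dropoff_longitude"],
)
```

Because taxis follow the street grid, the reported trip distance will normally exceed this straight-line value; a reported distance far below it is a useful flag for bad coordinates.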
On one occasion, after a Freedom of Information request was filed by FiveThirtyEight, Uber released trip data for a few months. Approximately 19 million Uber rides are available from FiveThirtyEight’s GitHub repository, covering April-September 2014 (4.5 million) and January-June 2015 (14.3 million). These data are less detailed than the NYCTLC taxi data: for April-September 2014, times and coordinates are available for pick-ups only, while for January-June 2015 only an aggregate location is available.
NREL Data
NREL provides second-by-second GPS readings for several miles of travel, along with vehicle characteristics and survey participant demographics. To get access to the full data, however, the proposed use of the data has to be discussed with NREL and go through an application and approval process. Upon approval, users may connect remotely to an environment that prohibits removal of sensitive data. The provided software tools and reference data nonetheless allow users to create specialized database queries, perform detailed calculations, and conduct statistical and geographic information system analyses. Users may also request to have custom files or programs loaded for them, and to have aggregated results or reports sent directly to them.
Vehicle Trajectory Data
Contributor(s): Sandeep Mudigonda, firstname.lastname@example.org
Very high-resolution trajectory data are essential for understanding driver behavior. Behaviors such as car following, lane-change decision making, and gap acceptance can be modeled and calibrated using such data. Next Generation Simulation (NGSIM) data include high-resolution vehicle trajectories extracted from aerial videos of four sites in the US: US-101 (Los Angeles, CA), I-80 (Emeryville, CA), Lankershim Boulevard (Los Angeles, CA), and Peachtree Street NE (Atlanta, GA). Each section of freeway or arterial is about 0.5-1 mile in length. The dataset includes vehicle positions at 10 Hz. Sensor data from the sections are also available. The data can be accessed from the FHWA’s Research Data Exchange (RDE) at https://www.its-rde.net/. More information on the dataset, the collection process, and the sites can be found here: https://ops.fhwa.dot.gov/trafficanalysistools/ngsim.htm
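With positions sampled at 10 Hz, speed and acceleration can be approximated by finite differences, which is often the starting point for calibrating car-following models. The sketch below is a minimal illustration on made-up positions along a lane; it is not the processing used to produce the published dataset, and real trajectories usually need smoothing first because differencing amplifies measurement noise.

```python
DT = 0.1  # seconds between samples at 10 Hz


def finite_difference(values, dt=DT):
    """Forward differences: rate of change between consecutive samples."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


# Hypothetical longitudinal positions (meters) of one vehicle at 10 Hz.
positions_m = [0.0, 1.50, 3.01, 4.53, 6.06]
speeds_mps = finite_difference(positions_m)    # roughly 15.0, 15.1, 15.2, 15.3 m/s
accels_mps2 = finite_difference(speeds_mps)    # roughly 1.0 m/s^2 throughout
```

Each differencing step shortens the series by one sample, so five positions yield four speeds and three accelerations.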
(Older) ITS Data
Contributor(s): Kristin Tufte
Link to a detailed description of ITS data sources (a bit dated, but good information): https://portal.its.pdx.edu/fhwa