Sample frames and annotations. ROAD’s annotated frames cover multiple agents and actions, recorded under different weather conditions (overcast, sun, rain) and at different times of day (morning, afternoon and night). Ground-truth bounding boxes and labels are also shown.

ROAD is the first benchmark of its kind, designed to allow the autonomous vehicles community to investigate the use of semantically meaningful representations of dynamic road scenes to facilitate situation awareness and decision making for autonomous driving.

ROAD is a multilabel dataset containing 22 long-duration videos (about 8 minutes each) comprising 122K frames annotated in terms of *road events*, defined as triplets E = (Agent, Action, Location) and represented as a series of frame-wise bounding box detections.
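
As a rough illustration of the road event definition above, the following minimal Python sketch models an event as an (Agent, Action, Location) triplet paired with per-frame bounding boxes. The class name `RoadEvent`, the box format, and the label strings are all assumptions made for this example; they are not ROAD's actual annotation schema or label vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class RoadEvent:
    """Hypothetical container for one road event E = (Agent, Action, Location)."""
    agent: str
    action: str
    location: str
    # Frame-wise detections: frame index -> bounding box.
    boxes: Dict[int, Box] = field(default_factory=dict)

    def add_detection(self, frame: int, box: Box) -> None:
        """Attach the bounding box observed in a given frame."""
        self.boxes[frame] = box

    @property
    def triplet(self) -> Tuple[str, str, str]:
        """The semantic label of the event."""
        return (self.agent, self.action, self.location)

    def frame_span(self) -> Tuple[int, int]:
        """First and last annotated frame of the event."""
        frames = sorted(self.boxes)
        return frames[0], frames[-1]


# Illustrative labels only, not taken from the dataset.
event = RoadEvent("pedestrian", "crossing", "vehicle lane")
event.add_detection(10, (100.0, 200.0, 150.0, 320.0))
event.add_detection(11, (104.0, 200.0, 154.0, 320.0))
event.add_detection(12, (108.0, 201.0, 158.0, 321.0))

print(event.triplet)
print(event.frame_span())
```

Keying the boxes by frame index keeps the triplet label separate from its spatial extent over time, which mirrors the description of an event as a series of frame-wise detections.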

ROAD aims to become the reference benchmark for agent and event detection, intention and trajectory prediction, future event anticipation, modelling of complex road activities, instance- and class-incremental continual learning, machine theory of mind and automated decision making.

YouTube Videos

Sample video with ground truth and annotations for our ROad event Awareness Dataset for autonomous driving (ROAD)

Footage of Prof Fabio Cuzzolin’s invited talk at the Machine Learning series of seminars of the Legato group, University of Luxembourg

Footage of Prof Fabio Cuzzolin’s invited talk at the DeepView’21 workshop of AVSS

The complete recording of the ROAD @ ICCV 2021 Workshop