Sample frames and annotation. ROAD’s annotated frames cover multiple agents and actions, recorded under different weather conditions (overcast, sun, rain) and at different times of day (morning, afternoon and night). Ground-truth bounding boxes and labels are also shown.
ROAD is the first benchmark of its kind, designed to allow the autonomous vehicles community to investigate the use of semantically meaningful representations of dynamic road scenes to facilitate situation awareness and decision making for autonomous driving.
ROAD is a multi-label dataset containing 22 long-duration videos (ca. 8 minutes each) comprising 122K frames annotated in terms of *road events*, defined as triplets E = (Agent, Action, Location) and represented as a series of frame-wise bounding box detections.
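To make the road event definition above concrete, here is a minimal sketch of how one event could be held in memory: a triplet of labels plus one bounding box per frame. The class name `RoadEvent`, its methods, the box convention `(x_min, y_min, x_max, y_max)` and the label strings are all assumptions made for illustration; they are not ROAD's actual annotation schema or label set.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Assumed box convention for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


@dataclass
class RoadEvent:
    """Hypothetical container for one road event E = (Agent, Action, Location)."""

    agent: str
    action: str
    location: str
    # Frame index -> bounding box of the agent in that frame.
    boxes: Dict[int, Box] = field(default_factory=dict)

    @property
    def triplet(self) -> Tuple[str, str, str]:
        """Return the (Agent, Action, Location) labels of the event."""
        return (self.agent, self.action, self.location)

    def add_detection(self, frame: int, box: Box) -> None:
        """Attach the frame-wise detection for a given frame index."""
        self.boxes[frame] = box

    def span(self) -> Optional[Tuple[int, int]]:
        """Return (first_frame, last_frame), or None if there are no detections."""
        if not self.boxes:
            return None
        frames = sorted(self.boxes)
        return (frames[0], frames[-1])


# Illustrative labels only; they are not taken from the dataset's label set.
event = RoadEvent(agent="pedestrian", action="crossing", location="right_pavement")
event.add_detection(10, (0.10, 0.20, 0.30, 0.60))
event.add_detection(11, (0.12, 0.20, 0.32, 0.60))

print(event.triplet)
print(event.span())
```

Keying the boxes by frame index keeps the event as a series of frame-wise detections, matching the description above, while the triplet stays fixed for the whole event.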
ROAD aims to become the reference benchmark for agent and event detection, intention and trajectory prediction, future event anticipation, modelling of complex road activities, instance- and class-incremental continual learning, machine theory of mind and automated decision making.