Track 1

2D object detection

For 2D object detection, we provide a real-world training dataset of 10 million images, of which 5K are labeled, together with 5K/10K labeled validation/testing images for evaluation. The dataset was collected across diverse scenarios in Chinese cities and covers a wide variety of locations (highways, city streets, country roads), objects, and weather conditions (including rain), as well as different cameras and camera setups.

  • Evaluation: Leaderboard ranking for this track is by Mean Average Precision (mAP) over all categories, that is, the mean over the APs of pedestrian, cyclist, car, truck, tram and tricycle. The IoU overlap threshold for pedestrian, cyclist and tricycle is set to 0.5, and for car, truck and tram to 0.7. Only camera images of SODA10M are allowed to be used.
  • Dataset: Please refer to SODA-2d for detailed dataset introduction and dataset downloads.
  • Submission: TBD.
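As a minimal sketch of the matching rule described above (not the official evaluation code), a predicted box counts as a true positive for its class only if its IoU with a ground-truth box reaches that class's threshold; the threshold values are taken from the track description:

```python
def iou_2d(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Per-class IoU thresholds from the track description.
IOU_THRESHOLDS = {
    "pedestrian": 0.5, "cyclist": 0.5, "tricycle": 0.5,
    "car": 0.7, "truck": 0.7, "tram": 0.7,
}

def is_true_positive(pred_box, gt_box, category):
    """Whether a prediction matches a ground-truth box at its class threshold."""
    return iou_2d(pred_box, gt_box) >= IOU_THRESHOLDS[category]
```

Note how the same overlap can be a true positive for one class but not another: an IoU of 0.6 passes the 0.5 threshold for pedestrian but fails the 0.7 threshold for car.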

Track 2

3D object detection

For 3D object detection, we provide a large-scale dataset with 1 million point clouds and 7 million images. We annotated 5K, 3K and 8K scenes for the training, validation and testing sets, respectively, and leave the remaining scenes unlabeled. We provide 3D bounding boxes for car, cyclist, pedestrian, truck and bus.

  • Evaluation: Leaderboard ranking for this track is by Mean Average Precision with Heading (mAPH) / L2 over "ALL_NS" (all object types except signs), that is, the mean over the APHs of car, cyclist, pedestrian, truck and bus. All sensors are allowed to be used.
  • Dataset: Please refer to ONCE for detailed dataset introduction and dataset downloads.
  • Submission: TBD.
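The mAPH metric follows the Waymo Open Dataset convention: APH weights each detection's contribution by how accurately its heading matches the ground truth, and mAPH averages the per-class APH values. The sketch below is illustrative only (the function names are ours, not from the official toolkit): a heading weight of 1 means a perfect heading and 0 means the heading is off by π radians.

```python
import math

def heading_accuracy(pred_heading, gt_heading):
    """Heading weight in [0, 1]: 1 for a perfect heading, 0 when off by pi."""
    delta = abs(pred_heading - gt_heading) % (2 * math.pi)
    delta = min(delta, 2 * math.pi - delta)  # wrap the difference to [0, pi]
    return 1.0 - delta / math.pi

def maph(aph_per_class):
    """mAPH over the ALL_NS classes: the mean of per-class APH values."""
    classes = ["car", "cyclist", "pedestrian", "truck", "bus"]
    return sum(aph_per_class[c] for c in classes) / len(classes)
```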

Track 3

Corner Case Detection

Deep learning has achieved prominent success in detecting common traffic participants (e.g., cars, pedestrians, and cyclists). Such detectors, however, are generally incapable of detecting novel objects that are unseen or rarely seen during training, commonly called (object-level) corner cases. These fall into two categories: 1) instances of a novel class (e.g., a runaway tire) and 2) novel instances of a common class (e.g., an overturned truck). Properly handling corner cases is key to building reliable autonomous driving perception systems.

This track uses the same SODA10M dataset as Track 1: a real-world training set of 10 million images, of which 5K are labeled, together with 5K/10K labeled validation/testing images for evaluation, collected across diverse scenarios, locations and weather conditions in Chinese cities.

  • Evaluation: We utilize the COCO-style average recall as the evaluation metric.
  • Dataset: Please refer to SODA-2d for detailed dataset introduction and dataset downloads.
  • Submission: TBD.
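COCO-style average recall (AR) is typically computed by averaging recall over the ten IoU thresholds 0.50:0.05:0.95. A minimal sketch, assuming each ground-truth corner case is summarized by the best IoU any detection achieves on it (this simplified interface is ours, not the official COCO API):

```python
def average_recall(best_iou_per_gt, thresholds=None):
    """COCO-style AR: recall averaged over IoU thresholds 0.50:0.05:0.95.

    best_iou_per_gt holds, for each ground-truth object, the best IoU
    achieved by any detection (0.0 if nothing overlaps it).
    """
    if thresholds is None:
        thresholds = [0.50 + 0.05 * i for i in range(10)]
    n = len(best_iou_per_gt)
    # Recall at one threshold: fraction of ground truths recovered at that IoU.
    recalls = [sum(iou >= t for iou in best_iou_per_gt) / n for t in thresholds]
    return sum(recalls) / len(recalls)
```

Because corner cases are open-set by nature, a recall-based metric avoids penalizing detectors for proposing objects outside the known categories.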

Track 4

Multiple object tracking and segmentation

For multiple object tracking (MOT), we extract video frames from the BDD100K dataset, which contains 100K unlabeled videos and 1,400/200/400 labeled videos for train/val/test. The labeled part contains 160K instances and 4M objects. BDD100K was collected across diverse scenarios, covering New York, the San Francisco Bay Area, and other regions in the US. It contains scenes in a wide variety of locations, weather conditions and times of day, such as highways, city streets, residential areas, and rainy/snowy weather. Each video is 40 seconds long at 30 fps. For segmentation tracking (MOTS), we use 100K unlabeled videos and 154/32/37 labeled videos for train/val/test. The labeled parts, which are the segmentation tracking videos of BDD100K, contain 25K instances and 480K masks. We hope that utilizing large-scale unlabeled video data in self-driving can further boost the performance of MOT & MOTS.

  • Evaluation: TBD
  • Dataset: TBD.
  • Submission: TBD.

Track 5

Unified model for multi-task benchmark

Awaiting details...

  • Evaluation: TBD
  • Dataset: TBD.
  • Submission: TBD.

CHALLENGE PRIZES: (TOTAL 100,000 USD)

Challenge participants with the most successful and innovative entries will be invited to present at this workshop and will receive awards. Each track carries a 20,000 USD cash prize pool: 10,000 USD will be awarded to the top performer, and the 2nd- and 3rd-place entries will each receive 5,000 USD.


References

[1] Han J, Liang X, Xu H, et al. SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving[J]. arXiv preprint arXiv:2106.11118, 2021.

[2] Mao J, Niu M, Jiang C, et al. One Million Scenes for Autonomous Driving: ONCE Dataset[J]. arXiv preprint arXiv:2106.11037, 2021.