VSPW Challenge Dataset and Rules

Dataset

This challenge uses the large-scale Video Scene Parsing in the Wild (VSPW) dataset. VSPW provides a total of 3,536 annotated videos comprising 251,633 frames across 124 pre-defined categories. VSPW is densely annotated at a labeled frame rate of 15 fps. For more details about VSPW, please refer to this link.

Videos in VSPW have high resolution (96% of videos are 1080P). To reduce the computational burden, this challenge resizes VSPW to 480P. You can download VSPW_480P via the following links:

Google Drive: https://drive.google.com/file/d/1-Z_mwp_mGOlX842kerGVfA7g7a38ueKA/view?usp=sharing

Baidu YunPan: Link: https://pan.baidu.com/s/1p3HNj6_-DtnTt-aHAsSVlA Password: akga

The labeled training/validation data and unlabeled test data are provided. The VSPW dataset may be used for this competition and academic purposes only.

If the VSPW dataset helps your research, please cite the following paper.

@inproceedings{miao2021vspw,
  title={VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild},
  author={Miao, Jiaxu and Wei, Yunchao and Wu, Yu and Liang, Chen and Li, Guangrui and Yang, Yi},
  booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

Evaluation and Rules

For video scene parsing, we use Mean IoU to evaluate segmentation performance and Video Consistency (VC) to evaluate the stability of predictions.

Mean IoU (mIoU) is the intersection-over-union between the predicted and ground-truth pixels, averaged over all classes.
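As a rough illustration of this definition, the following sketch computes per-class IoU over a pair of label maps and averages the classes that occur. This is not the official challenge scorer; details such as ignore labels and the handling of absent classes are assumptions here.

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Illustrative mIoU: per-class IoU averaged over classes
    present in either map. pred and gt are integer label maps
    of identical shape. Classes absent from both are skipped
    (an assumption; the official scorer may handle this differently)."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        gt_c = gt == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:
            continue  # class appears in neither map
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```

In practice the score is accumulated over all test frames before averaging, rather than per frame.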

Video Consistency (VC) measures the category consistency of predictions among long-range adjacent frames.
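One plausible reading of this metric, sketched below, scores each window of n consecutive frames by the fraction of pixels with a stable ground-truth label that are also predicted correctly and consistently across the window. The window size n and the averaging scheme are assumptions; consult the VSPW paper and starting kit for the exact definition used in scoring.

```python
import numpy as np

def video_consistency(preds, gts, n):
    """Sketch of an n-frame Video Consistency (VC_n) score.
    preds, gts: lists of integer label maps (one per frame).
    For each length-n window, find pixels whose ground-truth
    label is constant, then measure the fraction of those pixels
    predicted correctly in every frame of the window."""
    scores = []
    for start in range(len(gts) - n + 1):
        gt_win = np.stack(gts[start:start + n])
        pr_win = np.stack(preds[start:start + n])
        # pixels with a stable ground-truth label over the window
        stable = np.all(gt_win == gt_win[0], axis=0)
        if stable.sum() == 0:
            continue  # no stable pixels in this window
        # pixels predicted correctly in every frame of the window
        correct = np.all(pr_win == gt_win, axis=0)
        scores.append((stable & correct).sum() / stable.sum())
    return float(np.mean(scores))
```

A perfectly stable and correct prediction yields a score of 1.0; flicker between frames lowers it even when per-frame mIoU is unchanged.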

We provide the labeled training data and validation data. The provided test data is unlabeled.

There are 2 phases:

  • Phase 1: development phase. We provide labeled training/validation data and unlabeled test data. Since the labeled validation data is already available to you, in this phase we split the test data into two parts and give feedback on your performance on only one part. You will not be told which videos belong to that part, so you must submit results for the entire test set. The performance of your best submission will be displayed on the leaderboard.

  • Phase 2: final phase. Scores from the development phase will not be automatically carried over, so you must re-submit your solution in the final phase to be considered for the final leaderboard.

You only need to submit the prediction results of the test set (no code). For the 1st Video Scene Parsing in the Wild challenge, the ranking is evaluated according to mIoU.

The submission file is a zip file named result_submission.zip. The structure of the folder is:

|----result_submission/
|    |----video1/
|    |    |----image1.png
|    |    |----image2.png
|    |    ......

You must submit predicted results for all the test data. An example of the submission file is shown at Participate - Files - Starting Kit. Baseline code for VSPW is available here.
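To produce a zip file with the required layout, a helper along the following lines can be used. This is an illustrative sketch, not official tooling; the folder and file names are placeholders, and the actual video/frame names should come from the test set as described in the starting kit.

```python
import os
import zipfile

def pack_submission(result_dir, out_zip="result_submission.zip"):
    """Zip a result_submission/ folder into the required layout.
    result_dir is expected to contain one sub-folder per test
    video, each holding the predicted PNG masks (one per frame)."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(result_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                # store paths relative to the parent of result_dir
                # so the archive unpacks as result_submission/video*/...
                arcname = os.path.relpath(path, os.path.dirname(result_dir))
                zf.write(path, arcname)
    return out_zip
```

Before uploading, it is worth listing the archive contents to confirm every test video folder is present.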


Rules:

1. For the 1st Video Scene Parsing in the Wild challenge, the ranking is evaluated according to mIoU.

2. The validation data is NOT allowed to be used for training your model.

3. Other datasets are allowed for training, and participants must state which extra datasets they used.

Important Dates

May 20, 2021 Phase 1: Development Phase starts.

Aug. 5, 2021 Phase 2: Final Phase starts. Phase 1 is automatically closed.

Aug. 8, 2021 Phase 2 ends. Deadline for submitting the final predictions over the test data.

Aug. 9, 2021 Release of final results.

Aug. 12, 2021 Paper submission deadline. We encourage participants to submit a paper to the associated workshop, independently of their rank position.

Submission via CMT: TBD