A Large-scale Dataset for Video Scene Parsing in the Wild

Introducing VSPW

Large Scale

3,536 videos

251,632 pixel-level labeled frames

124 categories

Dense Annotation

Pixel-level annotations are provided at 15 frames per second (fps)

High Resolution

Over 96% of the videos are high resolution, ranging from 720P to 4K

Long-temporal Clips

Each clip is a complete shot lasting 5 seconds on average
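The headline figures above can be cross-checked against each other with a little arithmetic. The sketch below is only a rough sanity check, not part of any official VSPW tooling: it assumes every labeled frame was sampled at the stated 15 fps and that each video corresponds to one shot. The function name `average_clip_stats` is a hypothetical helper introduced for illustration.

```python
# Rough consistency check of the dataset statistics quoted above.
# Assumptions (not confirmed by the dataset page): all labeled frames are
# sampled at 15 fps, and one video equals one shot.

NUM_VIDEOS = 3536             # "3,536 videos"
NUM_LABELED_FRAMES = 251632   # "251,632 pixel-level labeled frames"
ANNOTATION_FPS = 15           # "annotations are provided at 15 f/s"


def average_clip_stats(num_videos: int, num_frames: int, fps: float) -> tuple[float, float]:
    """Return (average labeled frames per video, average duration in seconds)."""
    frames_per_video = num_frames / num_videos
    seconds_per_video = frames_per_video / fps
    return frames_per_video, seconds_per_video


frames_per_video, seconds_per_video = average_clip_stats(
    NUM_VIDEOS, NUM_LABELED_FRAMES, ANNOTATION_FPS
)

print(f"Average labeled frames per video: {frames_per_video:.1f}")
print(f"Implied average duration: {seconds_per_video:.2f} s")
```

Under these assumptions the implied duration comes out slightly below 5 seconds, which is in line with the stated average shot length.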

Citing VSPW


title={VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild},

author={Miao, Jiaxu and Wei, Yunchao and Wu, Yu and Liang, Chen and Li, Guangrui and Yang, Yi},

booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},




The annotations in this dataset belong to the organizers of the challenge and are licensed under a Creative Commons Attribution 4.0 License.

The data is released for non-commercial research purposes only.

The organizers of the dataset as well as their employers make no representations or warranties regarding the Database, including but not limited to warranties of non-infringement or fitness for a particular purpose. Researcher accepts full responsibility for his or her use of the Database and shall defend and indemnify the organizers, against any and all claims arising from Researcher’s use of the Database, including but not limited to Researcher’s use of any copies of copyrighted videos that he or she may create from the Database. Researcher may provide research associates and colleagues with access to the Database provided that they first agree to be bound by these terms and conditions. The organizers reserve the right to terminate Researcher’s access to the Database at any time. If Researcher is employed by a for-profit, commercial entity, Researcher’s employer shall also be bound by these terms and conditions, and Researcher hereby represents that he or she is fully authorized to enter into this agreement on behalf of such employer.