VSPW

A Large-scale Dataset for Video Scene Parsing in the Wild

Introducing VSPW


Large Scale

3,536 videos

251,632 pixel-level labeled frames

124 categories

Dense Annotation

Pixel-level annotations are provided at 15 frames per second (fps)


High Resolution

Over 96% of the videos are high resolution, ranging from 720p to 4K


Long-temporal Clips

Each video is a complete shot lasting 5 seconds on average
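
As a rough consistency check, the headline figures above can be related to one another. The sketch below is illustrative only: it assumes that every labeled frame was sampled at the stated annotation rate of 15 frames per second and that frames are spread across all 3,536 videos. The constant and function names are hypothetical and are not part of any VSPW toolkit.

```python
# Figures quoted on this page.
NUM_VIDEOS = 3536
NUM_LABELED_FRAMES = 251632
ANNOTATION_FPS = 15  # assumed to apply uniformly to every labeled frame


def average_clip_stats(num_videos, num_frames, fps):
    """Return (mean labeled frames per video, mean annotated seconds per video)."""
    frames_per_video = num_frames / num_videos
    seconds_per_video = frames_per_video / fps
    return frames_per_video, seconds_per_video


frames_per_video, seconds_per_video = average_clip_stats(
    NUM_VIDEOS, NUM_LABELED_FRAMES, ANNOTATION_FPS
)
print(f"{frames_per_video:.1f} labeled frames per video on average")
print(f"{seconds_per_video:.2f} seconds of annotated video per clip on average")
```

Under these assumptions the result is roughly 71 labeled frames and about 4.7 seconds per video, in line with the stated average shot length of 5 seconds.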


Citing VSPW



@inproceedings{miao2021vspw,
  title={VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild},
  author={Miao, Jiaxu and Wei, Yunchao and Wu, Yu and Liang, Chen and Li, Guangrui and Yang, Yi},
  booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},
  year={2021}
}