The 1st Video Scene Parsing in the Wild Challenge Workshop



Scene parsing is one of the core problems in computer vision: it aims to recognize the semantic category of every pixel in a given image. Several image-based datasets have been collected to evaluate the effectiveness of scene parsing approaches. However, the real world is dynamic rather than static, so learning to parse video scenes is more natural and practical for realistic applications. Although remarkable progress has been made in image-based scene parsing, few works have addressed video scene parsing, mainly because suitable benchmarks have been lacking. To advance scene parsing from images to videos, we present a new dataset (the VSPW dataset) and a competition in this workshop, targeting the challenging yet practical task of Video Scene Parsing in the Wild (VSPW).

Video scene parsing aims to assign pre-defined semantic labels to the pixels of every frame in a given video, which raises new challenges beyond image semantic segmentation. One main challenge of the video scene parsing task is how to leverage temporal information for high predictive accuracy. We expect participants to achieve accuracy beyond that of image-based semantic segmentation methods.
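To make the accuracy comparison concrete, the standard segmentation score is mean intersection-over-union (mIoU), accumulated over all frames of a video. The sketch below is illustrative only — the function name and signature are our own, not the official challenge evaluator.

```python
import numpy as np

def video_miou(preds, gts, num_classes):
    """Mean IoU over all frames of a video.

    preds, gts: lists of (H, W) integer label maps, one per frame.
    Per-class intersections and unions are accumulated across frames
    before the ratio is taken, as is standard for segmentation.
    """
    inter = np.zeros(num_classes, dtype=np.int64)
    union = np.zeros(num_classes, dtype=np.int64)
    for p, g in zip(preds, gts):
        for c in range(num_classes):
            pc, gc = (p == c), (g == c)
            inter[c] += np.logical_and(pc, gc).sum()
            union[c] += np.logical_or(pc, gc).sum()
    valid = union > 0  # skip classes absent from both prediction and GT
    return float((inter[valid] / union[valid]).mean())
```

An image-based baseline is scored the same way, frame by frame, so the two settings are directly comparable under this metric.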

Invited Speakers

Liang-Chieh (Jay) Chen

Research Scientist at Google AI

Raquel Urtasun

Professor at University of Toronto

Hengshuang Zhao

Postdoctoral Researcher at University of Oxford

Federico Perazzi

Research Scientist at Facebook

Call for Papers

We are soliciting high-quality papers on the topics listed below. Submissions should follow the standard ICCV formatting instructions, be 4-8 pages long (excluding references) in the ICCV template, and be anonymized. Accepted papers will appear in the IEEE/CVF proceedings. This workshop accepts both challenge papers and regular papers on any of the topics below, so you may submit a paper without participating in the challenge.

Topics of Interest

The topics of interest include (but are not limited to):

  • Semantic segmentation for images/videos

  • Video object/instance segmentation

  • Efficient computation for video scene parsing

  • Object tracking

  • Semi-supervised recognition in videos

  • New metrics to evaluate the quality of video scene parsing results

  • Real-world video applications, including autonomous driving, indoor robotics, visual navigation, etc.
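On the metrics topic above: beyond per-frame accuracy, video scene parsing is often judged on temporal stability. A minimal (and deliberately crude) proxy is the fraction of pixels whose predicted label is unchanged between consecutive frames; this sketch is our own illustration — established protocols typically warp frames with optical flow before comparing.

```python
import numpy as np

def temporal_consistency(preds):
    """Fraction of pixels whose label agrees between consecutive frames.

    preds: list of (H, W) integer label maps, one per frame.
    A crude stability proxy: it ignores scene motion, which real
    consistency metrics account for via optical-flow warping.
    """
    agree, total = 0, 0
    for prev, curr in zip(preds, preds[1:]):
        agree += (prev == curr).sum()
        total += prev.size
    return agree / total
```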

Submission Deadline: July 25 23:59 PST

Author Notification: August 6 23:59 PST

Camera ready due: August 13 23:59 PST

Submission via CMT: TBD

Schedule


8:30 AM Chairs’ opening remarks

8:45 AM Raquel Urtasun, University of Toronto

9:15 AM Liang-Chieh (Jay) Chen, Google AI

9:45 AM Hengshuang Zhao, University of Oxford

10:15 AM Break

10:30 AM Federico Perazzi, Facebook

11:00 AM Challenge 1st place Winners’ Oral Presentation

11:15 AM Challenge 2nd place Winners’ Oral Presentation

11:30 AM Challenge 3rd place Winners’ Oral Presentation

11:45 AM Award ceremony and concluding remarks

Organizers


Yunchao Wei

University of Technology Sydney

Jiaxu Miao

University of Technology Sydney

Yu Wu

University of Technology Sydney

Yi Yang

University of Technology Sydney

Si Liu

Beihang University

Zhu Yi


Elisa Ricci

University of Trento

Cees Snoek

University of Amsterdam