The 3rd Pixel-level Video Understanding in the Wild Challenge Workshop

17 June, 2024

CVPR 2024, Seattle

Introduction

Pixel-level scene understanding is one of the fundamental problems in computer vision, aiming to recognize the object class, mask, and semantics of each pixel in a given image. Since the real world is dynamic rather than static, learning to perform video segmentation is more reasonable and practical for realistic applications. To advance segmentation from images to videos, this workshop presents new datasets and competitions for the challenging yet practical task of Pixel-level Video Understanding in the Wild (PVUW). The workshop also features a workshop paper track.

This workshop will cover, but is not limited to, the following topics:

● Semantic/panoptic segmentation for images/videos 

● Video object/instance segmentation

● Efficient computation for video scene parsing 

● Object tracking 

● Semi-supervised recognition in videos 

● New metrics to evaluate the quality of video scene parsing results 

● Real-world video applications, including autonomous driving, indoor robotics, visual navigation, etc.

Challenges

The Pixel-level Video Understanding in the Wild (PVUW) challenge includes four tracks. This year, we add two new tracks: the Complex Video Object Segmentation track, based on MOSE [1], and the Motion Expression guided Video Segmentation track, based on MeViS [2]. In these two new tracks, we provide additional videos and annotations featuring challenging elements, such as the disappearance and reappearance of objects, inconspicuous small objects, heavy occlusions, and crowded environments in MOSE. Moreover, we provide MeViS, a new motion-expression-guided video segmentation dataset, to study natural-language-guided video understanding in complex environments. These new videos, sentences, and annotations foster the development of more comprehensive and robust pixel-level understanding of video scenes in complex environments and realistic scenarios.


[1] MOSE: A New Dataset for Video Object Segmentation in Complex Scenes. ICCV 2023

[2] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions. ICCV 2023


Track 1: Video Semantic Segmentation (VSS) Track 

The video semantic segmentation task aims to recognize the semantics of all frames in a given video. To participate in Track 1, please visit this link.
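As an illustration, video semantic segmentation results are commonly scored with mean Intersection-over-Union (mIoU) accumulated over all frames. The sketch below assumes integer label maps and the standard mIoU definition; the track's exact evaluation protocol may differ, so treat this as illustrative only:

```python
import numpy as np

def miou(preds, gts, num_classes):
    """Mean Intersection-over-Union accumulated over all video frames.

    preds, gts: iterables of HxW integer label maps (one per frame).
    num_classes: number of semantic classes.
    """
    inter = np.zeros(num_classes, dtype=np.int64)
    union = np.zeros(num_classes, dtype=np.int64)
    for p, g in zip(preds, gts):
        for c in range(num_classes):
            pc, gc = (p == c), (g == c)
            inter[c] += np.logical_and(pc, gc).sum()
            union[c] += np.logical_or(pc, gc).sum()
    valid = union > 0  # skip classes absent from both prediction and ground truth
    return float((inter[valid] / union[valid]).mean())
```

Accumulating intersections and unions per class across frames (rather than averaging per-frame IoUs) keeps small objects that appear in only a few frames from being over- or under-weighted.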


Track 2: Video Panoptic Segmentation (VPS) Track

The video panoptic segmentation task aims to jointly predict object classes, bounding boxes, masks, instance ID tracking, and semantic segmentation in video frames. To participate in Track 2, please visit this link.


Track 3: Complex Video Object Segmentation Track

The complex video object segmentation task aims to track and segment objects in complex environments. To participate in Track 3, please visit this link.
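Video object segmentation benchmarks in this line of work typically report the region similarity J (Jaccard index of the predicted and ground-truth object masks) alongside a boundary measure F. The minimal sketch below covers only the J part and assumes binary per-object masks; whether the track uses exactly this protocol is an assumption:

```python
import numpy as np

def jaccard(pred_masks, gt_masks):
    """Region similarity J: mean IoU of binary object masks over frames.

    pred_masks, gt_masks: iterables of HxW boolean arrays, one per frame.
    Frames where the object is absent from both masks score 1.0.
    """
    scores = []
    for p, g in zip(pred_masks, gt_masks):
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        scores.append(1.0 if union == 0 else inter / union)
    return float(np.mean(scores))
```

Averaging per-frame IoUs penalizes a tracker that loses an object in even a few frames, which matters for the disappearance/reappearance cases this track emphasizes.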


Track 4: Motion Expression guided Video Segmentation Track

The motion expression guided video segmentation track focuses on segmenting objects in video content based on a sentence describing the motion of the objects. To participate in Track 4, please visit this link.


Important Dates: 


Call for Papers

Submission: We invite authors to submit unpublished papers (8-page CVPR format) to our workshop, to be presented at a poster session upon acceptance. All submissions will go through a double-blind review process. All contributions must be submitted (along with supplementary materials, if any) at this link.

Accepted papers will be published in the official CVPR Workshops proceedings and the Computer Vision Foundation (CVF) Open Access archive.

Important Dates: 


Invited Speakers

Xiaojuan Qi

Assistant Professor

The University of Hong Kong

Martin Danelljan

Lecturer 

ETH Zurich

Chen Change Loy

Professor

Nanyang Technological University

Yun Liu

Senior Scientist

A*STAR

Workshop Schedule

13:30 Chairs’ opening remarks

13:45 Invited talk 1, Prof. Xiaojuan Qi, The University of Hong Kong

14:15 Invited talk 2, Dr. Martin Danelljan, ETH Zurich

14:45 Invited talk 3, Prof. Chen Change Loy, Nanyang Technological University

15:15 Break

15:30 Invited talk 4, Dr. Yun Liu, Institute for Infocomm Research, A*STAR

16:00 Challenge Track 1 first-place winners’ oral presentation

16:10 Challenge Track 2 first-place winners’ oral presentation

16:20 Challenge Track 3 first-place winners’ oral presentation

16:30 Challenge Track 4 first-place winners’ oral presentation

16:40 Award ceremony and concluding remarks

Organizers

Henghui Ding

Fudan University

Jiaxu Miao

Zhejiang University

Yunchao Wei

Beijing Jiaotong University 

Zongxin Yang

Zhejiang University

Nikhila Ravi

Meta AI

Yi Yang

Zhejiang University

Si Liu

Beihang University

Yi Zhu

Amazon

Elisa Ricci

University of Trento

Cees Snoek

University of Amsterdam

Song Bai

ByteDance

Philip Torr

University of Oxford