This grand challenge aims to advance large-scale human-centric video analysis in complex events using multimedia techniques. We propose the largest existing dataset (named Human-in-Events, or HiEve) for understanding human motion, pose, and action in a variety of realistic events, especially crowded and complex events. Four challenging tasks are established on our dataset, encouraging researchers to address very challenging and realistic problems in human-centric analysis. Our challenge will benefit research in a wide range of multimedia and computer vision areas, including multimedia content analysis.
At the end of the challenge, all teams will be ranked based on objective evaluation. The top-3 performing teams in each track will receive award certificates and prizes. In addition, high-performing teams will be invited to submit challenge papers (4 pages) and present their solutions during the conference.
In this grand challenge, we focus on very challenging and realistic tasks of human-centric analysis in various crowded and complex events, including getting on/off the subway, collision, fighting, and earthquake escape (cf. Figure 1). To the best of our knowledge, few existing human analysis approaches report their performance under such complex events. With this consideration, we further propose a dataset (named Human-in-Events, or HiEve) with large-scale and densely-annotated labels covering a wide range of tasks in human-centric analysis (1M+ poses and 56k+ action labels, cf. Table 1).
Our HiEve dataset covers a wide range of human-centric understanding tasks including motion, pose, and action, whereas previous workshops focus on only a subset of these tasks. Compared with related workshops and challenges, our challenge has the following unique characteristics:
• Our challenge covers a wide range of human-centric understanding tasks including motion, pose, and action, whereas previous workshops focus on only a subset of these tasks (cf. Table 1).
• Our challenge has a substantially larger data scale, including the currently largest number of poses (>1M), the largest number of complex-event action labels (>56k), and one of the largest collections of long-term trajectories (average trajectory length >480).
• Our challenge focuses on challenging scenes under various crowded and complex events (such as dining, earthquake escape, getting off the subway, and collision, cf. Figure 1), whereas related workshops mostly address normal or relatively simple scenes.
[Table 1: dataset comparison. Columns: Dataset | # pose | # box | # traj. (avg) | # action | pose track | surveillance | complex events. Table body not recovered.]