We will host the Moments in Time Multimodal Multi-Label Action Detection Challenge at ICCV'19 as part of the Workshop on Multi-modal Video Analysis. The goal of this challenge is to detect the action labels depicted in a 3-second video. The video data comes from the Multi-Moments in Time dataset, which is available for download.
This is a new challenge on multi-label action detection in video. For this, we expanded the Moments in Time dataset to include multiple action labels per video. Labels in the dataset can pertain to actions recognizable using one or more of the audio/visual streams (i.e. audio actions, visual actions, or audio-visual actions).
Moments is an ongoing research collaboration with the MIT-IBM Watson AI Lab. The project is dedicated to building a large-scale dataset to help AI systems recognize and understand actions and events in videos. The dataset includes more than one million three-second videos and two million action labels.
To participate in the challenge, submit results for the track described below.
The challenge includes a single track:
A detection task on the entire Multi-Moments in Time dataset:
We will use mAP (mean average precision) on the testing set as the official metric for this task.
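As a reference for how mAP is commonly computed in a multi-label setting (a sketch, not the official evaluation code): average precision is computed independently per class by ranking all test videos by their confidence for that class, and the per-class values are then averaged. A minimal pure-Python version:

```python
def average_precision(scores, labels):
    """AP for one class: rank videos by confidence, average the
    precision at each position where a true positive occurs."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    # If the class never occurs, define its AP as 0.
    return sum(precisions) / hits if hits else 0.0

def mean_ap(per_class_scores, per_class_labels):
    """mAP: mean of the per-class average precisions."""
    aps = [average_precision(s, l)
           for s, l in zip(per_class_scores, per_class_labels)]
    return sum(aps) / len(aps)

# Toy example with 2 classes over 2 videos:
# class 0 ranks its positive first (AP = 1.0),
# class 1 ranks its positive second (AP = 0.5).
print(mean_ap([[0.9, 0.1], [0.1, 0.9]],
              [[1, 0], [1, 0]]))  # 0.75
```

The exact tie-breaking and handling of absent classes may differ in the official evaluation script; this sketch only illustrates the metric.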
When submitting your results for the challenge, please provide a plain text file with one line per video in the test set. Each line should contain the video filename followed by every class label, sorted by confidence in descending order:
<filename> <pred(1)> <pred(2)> <pred(3)> ... <pred(313)>
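The submission format above can be generated as follows. This is a sketch: the filenames, class names, and scores are hypothetical placeholders, and a real submission would list all 313 class labels per line rather than three.

```python
# Hypothetical per-video confidence scores (class name -> confidence).
predictions = {
    "video_001.mp4": {"running": 0.91, "clapping": 0.40, "singing": 0.07},
    "video_002.mp4": {"singing": 0.85, "running": 0.12, "clapping": 0.03},
}

lines = []
for filename, scores in predictions.items():
    # Sort class labels by confidence, highest first.
    ranked = sorted(scores, key=scores.get, reverse=True)
    lines.append(" ".join([filename] + ranked))

# Write one line per test video: "<filename> <pred(1)> <pred(2)> ..."
with open("submission.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
```

Each line thus contains the filename followed by the full ranking of class labels; only the ordering (not the confidence values) appears in the file.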