Multi-Moments in Time Challenge 2019

We will host the Moments in Time Multimodal Multi-Label Action Detection Challenge at ICCV'19 as part of the Workshop on Multi-modal Video Analysis. The goal of this challenge is to detect the event labels depicted in a 3-second video. The video data comes from the Multi-Moments in Time dataset, which can be downloaded here.

This is a new challenge on multi-label action detection in video. To support it, we expanded the Moments in Time dataset to include multiple action labels per video. Labels in the dataset can pertain to actions recognizable from one or both of the audio and visual streams (i.e., audio actions, visual actions, or audio-visual actions).

Moments is an ongoing research collaboration with the MIT-IBM Watson AI Lab. The project is dedicated to building a large-scale dataset to help AI systems recognize and understand actions and events in videos. The dataset includes more than one million three-second videos and two million action labels.

Challenge Results Released

The challenge includes a single track:

A detection task on the entire Multi-Moments in Time dataset:

  • 313 classes
  • 2M training labels for 1M videos
  • 10K validation videos
  • 10K testing videos

Important Dates:

  • July 5, 2019: Training data made available.
  • August 1, 2019: Testing videos released and submission site opened.
  • October 19, 2019: Submission deadline.
  • October 21, 2019: Challenge results announced.
  • November 2, 2019: Winner(s) invited to present at the Workshop.

Evaluation Metric

We will use mean average precision (mAP) on the testing set as the official metric for this task.
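For reference, the sketch below shows one common way to compute mAP for a multi-label task: average precision is computed per class from the ranked confidence scores and then averaged across classes. The array layout, the use of scikit-learn, and the skipping of classes with no positive examples are illustrative assumptions, not the official scoring code.

    # Illustrative mAP computation (not the official scorer).
    # y_true: binary matrix of shape (num_videos, 313) marking ground-truth labels.
    # y_score: matrix of the same shape holding per-class confidence scores.
    import numpy as np
    from sklearn.metrics import average_precision_score

    def mean_average_precision(y_true, y_score):
        aps = []
        for c in range(y_true.shape[1]):
            if y_true[:, c].sum() == 0:  # AP is undefined for a class with no positives
                continue
            aps.append(average_precision_score(y_true[:, c], y_score[:, c]))
        return float(np.mean(aps))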

Submission Format

When submitting your results for the challenge, please provide a plain-text file containing a ranked list of class predictions for each video in the test set. Each line should contain the video filename followed by all 313 class labels sorted by confidence in descending order:

<filename> <pred(1)> <pred(2)> <pred(3)> ... <pred(313)>
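A minimal sketch for producing a file in this format is shown below. It assumes predictions are held in a dict mapping each test filename to a length-313 array of per-class confidences, and it writes class indices as the labels; both choices are hypothetical and should be adapted to the official label naming.

    # Hypothetical helper for writing the submission file; the `scores` dict,
    # its array layout, and the use of class indices as labels are assumptions.
    import numpy as np

    def write_submission(scores, out_path="submission.txt"):
        # scores: {filename: np.ndarray of shape (313,)} with per-class confidences
        with open(out_path, "w") as f:
            for filename, class_scores in scores.items():
                ranked = np.argsort(class_scores)[::-1]  # descending confidence
                f.write(filename + " " + " ".join(str(c) for c in ranked) + "\n")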