Moments in Time Dataset

A large-scale dataset for recognizing and understanding action in videos

Moments is a research project in development by the MIT-IBM Watson AI Lab. The project is dedicated to building a very large-scale dataset to help AI systems recognize and understand actions and events in videos.

Today, the dataset includes a collection of one million labeled 3-second videos, involving people, animals, objects, or natural phenomena, that capture the gist of a dynamic scene.

News (February 14, 2018): The Moments in Time Challenge will be hosted at CVPR'18. Come compete!


Three-second events capture an ecosystem of changes in the world: three seconds convey meaningful information for understanding how agents (human, animal, artificial, or natural) transform from one state to another.


The dataset is designed to have large inter-class and intra-class variation, representing dynamic events at different levels of abstraction (e.g., "opening" doors, drawers, curtains, presents, eyes, mouths, and even flower petals).


A large-scale, human-annotated video dataset capturing visual and/or audible actions, produced by humans, animals, objects, or nature, that together allow for the creation of compound activities occurring at longer time scales.


Supervised tasks with broad coverage of the visual and auditory ecosystem help construct powerful yet flexible feature detectors, allowing models to quickly transfer learned representations to novel domains.






Video Understanding Demo

Can we understand what models attend to during a prediction?

Here, we show the areas of the video frames that our neural network focuses on in order to recognize the event in the video. These visualizations show the network's ability to locate the most important areas of each video clip so that it can identify each moment.
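One common technique for producing this kind of heatmap is class activation mapping (CAM), which works for networks whose final convolutional feature maps feed a global-average-pooled linear classifier. A minimal numpy sketch (the function and array layout here are illustrative, not the project's released code):

```python
import numpy as np

def class_activation_map(feature_maps, class_weights, class_idx):
    """Weight each conv feature map by the classifier weight for the
    target class and sum them, yielding a coarse spatial heatmap of
    the regions most responsible for that class score.

    feature_maps : (C, H, W) activations from the last conv layer
    class_weights: (num_classes, C) weights of the final linear layer
    class_idx    : index of the class to visualize
    """
    # Weighted sum over the channel axis -> (H, W) heatmap
    cam = np.tensordot(class_weights[class_idx], feature_maps, axes=1)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()  # normalize to [0, 1] for display
    return cam
```

The resulting map is typically upsampled to the frame resolution and overlaid on the video frame as a heatmap.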


Moments in Time Dataset: one million videos for event understanding

Mathew Monfort, Bolei Zhou, Sarah Adel Bargal,
Alex Andonian, Tom Yan, Kandan Ramakrishnan, Lisa Brown,
Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva

Moments in Time Challenge 2018

We will host the Moments in Time Recognition Challenge at CVPR'18, jointly held with the ActivityNet Challenge 2018. The goal of this challenge is to identify the event labels depicted in a 3-second video. The video data comes from the Moments in Time dataset. The challenge has two tracks:

Full track

The Full track benchmarks classification algorithms on the entire Moments in Time dataset:

  • 339 classes
  • 802,264 training videos
  • 33,900 validation videos
  • 67,800 testing videos

The commonly used top-5 accuracy on the test set will be used to measure the performance of different algorithms. The Full track is open to the public.
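Top-5 accuracy counts a prediction as correct if the ground-truth label is among a model's five highest-scoring classes. A minimal sketch of the metric (the function name and array layout are illustrative, not the official evaluation script):

```python
import numpy as np

def top5_accuracy(scores, labels):
    """scores: (N, num_classes) class scores, one row per video
    labels: (N,) ground-truth class indices
    Returns the fraction of videos whose true label is in the top 5.
    """
    # Indices of the 5 highest-scoring classes for each video
    top5 = np.argsort(scores, axis=1)[:, -5:]
    hits = (top5 == labels[:, None]).any(axis=1)
    return hits.mean()
```

For example, with class scores 0..9 per video, the top-5 set is {5, 6, 7, 8, 9}, so a true label of 9 counts as a hit while a true label of 0 does not.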

Mini track

The Mini track benchmarks classification algorithms on a subset of the Moments in Time dataset:

  • 200 classes
  • 100,000 training videos
  • 10,000 validation videos
  • 20,000 testing videos

Top-5 accuracy on the validation set will be used to measure performance. The Mini track is open only to students: all team members must be registered students, though the team advisor/coach may be a faculty member associated with a university/college. Baseline data for the Mini track will be made available soon.

Important Dates:

  • March 1, 2018: Training data and development kit with evaluation scripts made available.
  • April 1, 2018: Testing videos are released.
  • June 1, 2018: Submission deadline.
  • June 7, 2018: Challenge results released.
  • June 22, 2018: Winners are invited to present at the workshop.


Moments in Time Dataset

To obtain the dataset, please fill out this form.

Code and pretrained models

Convolutional neural networks (CNNs) trained on the Moments in Time Dataset can be downloaded and used for action and event recognition in videos.
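A common way to apply such models to a clip (how many baselines work, though the released code may differ) is to run the CNN on sampled frames and average the per-frame class probabilities into a single video-level prediction. A numpy sketch of this aggregation step, with the frame logits standing in for the CNN's per-frame outputs:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over class scores
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def video_prediction(frame_logits):
    """frame_logits: (T, num_classes) raw scores for T sampled frames.
    Averages per-frame probabilities into one video-level distribution
    and returns (predicted class index, class probabilities).
    """
    probs = softmax(frame_logits, axis=1).mean(axis=0)
    return int(probs.argmax()), probs
```

Averaging probabilities rather than raw logits keeps each frame's contribution bounded, so a single high-magnitude outlier frame cannot dominate the video-level prediction.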


