Moments in Time

Moments is a research project dedicated to building a very large-scale dataset to help AI systems recognize and understand actions and events in videos.

Today, the dataset includes one million labeled 3-second videos involving people, animals, objects, or natural phenomena, each capturing the gist of a dynamic scene.

MOMENTS

Three-second events capture an ecosystem of changes in the world: 3 seconds convey enough meaningful information to understand how agents (human, animal, artificial, or natural) transform from one state to another.

Diversity

Designed to have large inter-class and intra-class variation that represents dynamical events at different levels of abstraction (e.g., "opening" doors, drawers, curtains, presents, eyes, mouths, and even flower petals).

Generalization

A large-scale, human-annotated video dataset capturing visual and/or audible actions produced by humans, animals, objects, or nature, which together allow for the creation of compound activities occurring at longer time scales.

Transferability

Supervised tasks on a large coverage of the visual and auditory ecosystem help construct powerful but flexible feature detectors, allowing models to quickly transfer learned representations to novel domains.

Video Understanding Demo

Can we understand what models attend to during a prediction?

Here, we show the areas of the video frames that our neural network focuses on in order to recognize the event in the video. These visualizations show the network's ability to locate the most important areas in each video clip so that it can identify each moment.

Using the Class Activation Mapping (CAM) method by Zhou et al.
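As a rough illustration of the idea behind CAM, the sketch below computes a class activation map as a weighted sum of the final convolutional feature maps, using the weights that connect each globally pooled map to one class score. It is a minimal sketch and not the demo's actual code: the function name, the toy feature maps, and the weights are hypothetical, and applying the map to a single frame at a time is an assumption. Upsampling the map to the frame size is omitted.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of the final conv feature maps for one class.

    feature_maps: K maps, each an H x W grid (list of rows) of activations.
    class_weights: K weights linking each pooled map to the class score.
    Returns an H x W grid scaled to [0, 1].
    """
    if len(feature_maps) != len(class_weights):
        raise ValueError("need exactly one weight per feature map")
    height, width = len(feature_maps[0]), len(feature_maps[0][0])

    # M_c(x, y) = sum_k w_k^c * f_k(x, y)
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]

    # Min-max normalise so the map can be overlaid on the frame as a heatmap.
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo
    if span == 0:
        return [[0.0] * width for _ in range(height)]
    return [[(value - lo) / span for value in row] for row in cam]


# Hypothetical toy input: two 2x2 feature maps from a single frame.
toy_maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
toy_weights = [0.5, 1.0]
heatmap = class_activation_map(toy_maps, toy_weights)
```

In this toy case the bottom-right cell receives the largest weighted activation, so it is the region the heatmap would highlight most strongly for that class.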

Paper

Moments in Time Dataset: one million videos for event understanding

Mathew Monfort, Alex Andonian, Bolei Zhou,
Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown,
Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019


Multi-Moments in Time: Learning and Interpreting Models for Multi-Action Video Understanding

Mathew Monfort, Bowen Pan, Kandan Ramakrishnan, Alex Andonian, Barry A McNamara, Alex Lascelles,
Quanfu Fan, Dan Gutfreund, Rogerio Feris, Aude Oliva

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021


Spoken Moments: Learning Joint Audio-Visual Representations From Video Descriptions

Mathew Monfort*, SouYoung Jin*, Alexander Liu, David Harwath,
Rogerio Feris, James Glass, Aude Oliva

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021


Challenges

CVPR 2018 Moments in Time Challenge

ICCV 2019 Multi-Moments in Time Challenge

Download

Moments in Time Dataset

Multi-Moments in Time Dataset

Spoken Moments in Time Dataset

Code and pretrained models

Also check out these related video datasets:

Kinetics

ActivityNet

SLAC

UCF101

HMDB

AVA

Charades

Something-Something
