Meet VATEX!

A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research.

Why VATEX?

MULTILINGUAL

Both English and Chinese captions.

LARGE-SCALE

826K captions for 41.3K video clips.

VIDEO COVERAGE

Comprehensive and representative video content from 600 fine-grained human activities.

LEXICAL DIVERSITY

Unique and lexically richer annotations to empower more natural and diverse caption generation.

Comparison

Paper

If you use the VATEX dataset, please cite our paper as follows.


@InProceedings{Wang_2019_ICCV,
  author = {Wang, Xin and Wu, Jiawei and Chen, Junkun and Li, Lei and Wang, Yuan-Fang and Wang, William Yang},
  title = {VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  month = {October},
  year = {2019}
}

Contact

Have any questions or suggestions? Feel free to contact us via the team email vatex.org@gmail.com!

Copyright © UCSB NLP Group
3530 Phelps Hall
University of California, Santa Barbara
Santa Barbara, CA 93106-5110

The dataset is licensed under a Creative Commons Attribution 4.0 International License.