Meet VATEX!

A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research.

Why VATEX?

MULTILINGUAL
Both English and Chinese captions.

LARGE-SCALE
826K captions for 41.3K video clips.

VIDEO COVERAGE
Comprehensive and representative video content from 600 fine-grained human activities.

LEXICAL DIVERSITY
Unique and lexically richer annotations that enable more natural and diverse caption generation.

Comparison

[Figure: comparison of VATEX with other datasets]

Paper

If you use the VATEX dataset, please cite our paper:

@InProceedings{Wang_2019_ICCV,
  author    = {Wang, Xin and Wu, Jiawei and Chen, Junkun and Li, Lei and Wang, Yuan-Fang and Wang, William Yang},
  title     = {{VaTeX}: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2019}
}

Contact

Have questions or suggestions? Feel free to contact us at the team email, vatex.org@gmail.com.