Video-guided Machine Translation (VMT) Challenge


We are pleased to announce the first Video-guided Machine Translation (VMT) Challenge! The challenge will be hosted at the Workshop on Advances in Language and Vision Research (ALVR), ACL 2020.

Please stay tuned for more information!


  • Data & Code Available: April 13th, 2020.
  • Challenge Launch: April 13th, 2020.
  • Results Submission Deadline: June 15th, 2020.
  • Challenge Paper Submission Deadline: June 22nd, 2020.
  • The winners will be announced at the ALVR workshop, ACL 2020 on July 9th, 2020.

Challenge Paper Requirements

To be eligible for result archiving and award consideration, please send the following information to the organizers from your main contact email:

  • Team name.
  • Team members.
  • The username used in CodaLab submissions.
  • An arXiv link to a 2-4 page paper, which describes your systems (including data processing, methods, experimental results, etc.) using the ACL 2020 paper template.

Note that the challenge paper will be accepted to our ACL 2020 ALVR workshop as a non-archival paper.


The VATEX dataset is a new large-scale multilingual video description dataset, which contains over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions, there are over 206,000 English-Chinese parallel translation pairs. Compared to the widely used MSRVTT dataset, VATEX is multilingual, larger, more linguistically complex, and more diverse in terms of both video and natural language descriptions. Please refer to our ICCV paper for more details. This Video-guided Machine Translation Challenge aims to benchmark progress towards models that translate a source-language sentence into the target language, using video information as additional spatiotemporal context.

Starter Code

The starter code for video-guided machine translation is released here, including the baseline VMT model, the preparation of the data and features, and the submission file generation.


The challenge is hosted on CodaLab. Please go to the Challenge page to submit your results.

Challenge Phase
  • English-to-Chinese Translation: To evaluate submissions fairly, we provide only English-to-Chinese translation on the public test set, with the English descriptions released and the Chinese translations held out. A valid submission must contain results for the entire public test set. Participants are expected to validate their models locally with our starter code; this phase is intended only for the final evaluation of a model. Creating multiple submissions through multiple teams is not allowed.
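Since a submission is only valid if it covers the entire public test set, a quick local coverage check before uploading may save a wasted submission. The sketch below is illustrative only: the function name, the ID strings, and the assumption that predictions are held as a mapping from test ID to Chinese translation are all hypothetical. The actual submission file format is the one produced by the starter code.

```python
def find_coverage_problems(test_ids, predictions):
    """Compare predictions against the expected test IDs.

    test_ids:    iterable of IDs in the public test set (hypothetical format).
    predictions: dict mapping ID -> translated sentence (hypothetical format).
    Returns three sorted lists: (missing, unexpected, empty).
    """
    expected = set(test_ids)
    predicted = set(predictions)

    # IDs in the test set that have no translation at all.
    missing = sorted(expected - predicted)
    # IDs in the predictions that are not part of the test set.
    unexpected = sorted(predicted - expected)
    # Expected IDs whose translation is blank or whitespace only.
    empty = sorted(
        key for key in expected & predicted if not predictions[key].strip()
    )
    return missing, unexpected, empty


if __name__ == "__main__":
    # Toy data; real IDs and sentences come from the released test set.
    ids = ["vid_001", "vid_002", "vid_003"]
    preds = {"vid_001": "一个人在弹吉他", "vid_003": "  ", "vid_999": "多余"}
    missing, unexpected, empty = find_coverage_problems(ids, preds)
    print("missing:", missing)
    print("unexpected:", unexpected)
    print("empty:", empty)
```

A submission would be considered complete under this check only when all three lists are empty.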


Xin (Eric) Wang
UC Santa Cruz

An Yan
UC San Diego

Lei Li
ByteDance AI Lab

William Yang Wang
UC Santa Barbara