VATEX

Welcome to the VATEX Captioning Challenge 2019!

We are pleased to announce the VATEX Captioning Challenge 2019! The challenge will be hosted at the 3rd Workshop on Closing the Loop Between Vision and Language, ICCV 2019.

Challenge Launch: ~~Aug 6th, 2019.~~
CodaLab Submission Deadline: ~~Oct 1st, 2019.~~
Report Submission Deadline: ~~Oct 15th, 2019 23:59 UTC~~.
~~The winners will be announced at the 3rd CLVL workshop, ICCV 2019 on Oct 28th, 2019~~.

Best System Award (first place in both English and Chinese tracks): Team Forence-CASIA (Ziqi Zhang, Yaya Shi, Jiutong Wei, Chunfeng Yuan, Bing Li, Weiming Hu), Report.
Outstanding Method Award: Team RUC_AIM3 + Adelaide (Shizhe Chen, Yida Zhao, Yuqing Song, Qin Jin, Qi Wu), Report.
Outstanding Method Award: Team pp3 (Xinxin Zhu, Longteng Guo, Peng Yao, Shichen Lu, Wei Liu, Zheng Yu, Jing Liu), Report.

The 1st VATEX Captioning Challenge has ended! We plan to archive the competition results from CodaLab on the official VATEX website as a further contribution to the vision-and-language research community. We also have several rewards for the winning teams, which will be announced at the 3rd CLVL workshop on Oct 28th, 2019. To be eligible for the result archive and for award consideration, please send the following information to vatex.org@gmail.com from your main contact email:

Team name.
Team members.
The username used in CodaLab submissions.
An arXiv link to a 2-4 page report, which describes your systems (including data processing, methods, experimental results, etc.) using the ICCV 2019 paper template. Code release is also encouraged to facilitate future research.

The deadline for the above steps is October 15th, 2019 23:59 UTC.

The VATEX dataset is a new large-scale multilingual video description dataset, which contains over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions, there are over 206,000 English-Chinese parallel translation pairs. Compared to the widely-used MSRVTT dataset, VATEX is multilingual, larger, linguistically complex, and more diverse in terms of both video and natural language descriptions. Please refer to our ICCV paper for more details. This VATEX Captioning Challenge aims to benchmark progress towards models that can describe the videos in various languages such as English and Chinese.

Dataset Download

Please refer to the details at the Download page. You can download English/Chinese captions and video features from the page.

Submission

The challenge is hosted on CodaLab. Please go to the challenge page to submit your models.

Challenge Phases

Dev (English/Chinese): This phase evaluates algorithms on the VATEX validation set. We recommend using the Dev phase for algorithm validation. It should not be used for reporting results in papers. A submission must contain results for the entire validation set to be considered valid.
Test (English/Chinese): This phase evaluates algorithms on the VATEX public test set. We recommend using this phase for reporting comparison numbers in academic papers. A submission must contain results for the entire test set to be considered valid. This phase is intended for the final evaluation of a model, and participants may not create multiple submissions through multiple teams.
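Because both phases require results for the entire split, it can help to check coverage locally before uploading. The following is a minimal sketch, not the official validator. It assumes a submission is represented as a mapping from video ID to a single caption string, and the function names and example IDs are hypothetical. Please check the CodaLab challenge page for the actual submission format.

```python
def check_submission_coverage(predictions, expected_ids):
    """Compare predicted video IDs against the IDs of the evaluated split.

    `predictions` is assumed to be a dict {video_id: caption}; this layout
    is an assumption for illustration, not the documented CodaLab format.
    Returns three sorted lists: (missing, unexpected, empty).
    """
    predicted = set(predictions)
    expected = set(expected_ids)
    # IDs in the split that have no prediction at all.
    missing = sorted(expected - predicted)
    # IDs in the submission that do not belong to the split.
    unexpected = sorted(predicted - expected)
    # IDs that are present but whose caption is blank.
    empty = sorted(
        vid for vid in predicted & expected
        if not str(predictions[vid]).strip()
    )
    return missing, unexpected, empty


def is_complete(predictions, expected_ids):
    """True if every expected video has a non-empty caption."""
    missing, _unexpected, empty = check_submission_coverage(
        predictions, expected_ids
    )
    return not missing and not empty


if __name__ == "__main__":
    # Hypothetical IDs and captions, for demonstration only.
    split_ids = ["vid_a", "vid_b", "vid_c"]
    preds = {"vid_a": "a man plays a guitar", "vid_b": "  ", "vid_x": "extra"}
    missing, unexpected, empty = check_submission_coverage(preds, split_ids)
    print("missing:", missing)
    print("unexpected:", unexpected)
    print("empty:", empty)
    print("complete:", is_complete(preds, split_ids))
```

Whether extra IDs make a submission invalid is not stated here, so the sketch only reports them and leaves the decision to the caller.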

Official Leaderboard

Note: This is the leaderboard for tracking results of published papers and concluded challenges. Please email us if you would like to report your published results here. For real-time result tracking of ongoing challenges, please check the real-time leaderboard hosted on CodaLab.

♦ English Captioning

Rank	Model / Team	BLEU-4	Meteor	Rouge-L	CIDEr
1	Forence-CASIA (Zhang et al., 2019)	40.9	26.4	54.2	82.4
2	RUC_AIM3 + Adelaide (Chen et al., 2019)	39.1	25.8	53.3	73.4
3	pp3 (Zhu et al., 2019)	38.4	24.5	52.1	70.0
4	anyeshine (Jin and Wang, 2019)	31.2	22.7	48.6	49.2
5	Imperial College London (ICL) (Caglayan et al., 2019)	29.0	21.1	46.7	45.2
6	Baseline Shared Encoder (Wang et al., ICCV 2019)	28.4	21.7	47.0	45.1
7	Naive Baseline (Wang et al., ICCV 2019)	28.1	21.6	46.9	44.3
8	Baseline Shared Encoder-Decoder (Wang et al., ICCV 2019)	27.9	21.6	46.8	44.2
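This page does not state which metric determines the rank. As a quick sanity check, the sketch below copies the English rows from the table above and tests which metric columns, sorted in descending order, reproduce the listed ranking. The `Entry` type and the function names are hypothetical helpers introduced only for this illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    rank: int
    team: str
    bleu4: float
    meteor: float
    rouge_l: float
    cider: float


# Rows copied from the English Captioning table (citations omitted).
ENGLISH = [
    Entry(1, "Forence-CASIA", 40.9, 26.4, 54.2, 82.4),
    Entry(2, "RUC_AIM3 + Adelaide", 39.1, 25.8, 53.3, 73.4),
    Entry(3, "pp3", 38.4, 24.5, 52.1, 70.0),
    Entry(4, "anyeshine", 31.2, 22.7, 48.6, 49.2),
    Entry(5, "Imperial College London (ICL)", 29.0, 21.1, 46.7, 45.2),
    Entry(6, "Baseline Shared Encoder", 28.4, 21.7, 47.0, 45.1),
    Entry(7, "Naive Baseline", 28.1, 21.6, 46.9, 44.3),
    Entry(8, "Baseline Shared Encoder-Decoder", 27.9, 21.6, 46.8, 44.2),
]

METRICS = ("bleu4", "meteor", "rouge_l", "cider")


def order_by(entries, metric):
    """Team names sorted by one metric, highest score first."""
    ranked = sorted(entries, key=lambda e: getattr(e, metric), reverse=True)
    return [e.team for e in ranked]


def metrics_matching_listed_order(entries):
    """Metrics whose descending sort reproduces the listed rank order."""
    listed = [e.team for e in sorted(entries, key=lambda e: e.rank)]
    return [m for m in METRICS if order_by(entries, m) == listed]


if __name__ == "__main__":
    print(metrics_matching_listed_order(ENGLISH))  # ['bleu4', 'cider']
```

For the English track, both BLEU-4 and CIDEr reproduce the listed order, while METEOR and ROUGE-L do not, so the table alone does not identify a single ranking metric.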

♦ Chinese Captioning

Rank	Model / Team	BLEU-4	Meteor	Rouge-L	CIDEr
1	Forence-CASIA (Zhang et al., 2019)	32.6	32.5	56.7	64.4
2	pp3 (Zhu et al., 2019)	32.2	32.1	56.2	56.8
3	RUC_AIM3 + Adelaide (Chen et al., 2019)	31.7	30.2	49.4	51.9
4	anyeshine (Jin and Wang, 2019)	26.1	30.4	52.4	37.7
5	Baseline Shared Encoder-Decoder (Wang et al., ICCV 2019)	24.9	29.8	51.7	35.0
6	Baseline Shared Encoder (Wang et al., ICCV 2019)	24.9	29.7	51.6	34.9
7	Naive Baseline (Wang et al., ICCV 2019)	24.9	29.7	51.5	34.7
8	Imperial College London (ICL) (Caglayan et al., 2019)	23.3	29.5	50.8	33.1