Xin (Eric) Wang, Assistant Professor, Department of Computer Science and Engineering, Jack Baskin School of Engineering, University of California, Santa Cruz. Email: xwang366 [at] ucsc [dot] edu |
Winter 2021 | CSE 142: Machine Learning |
Spring 2021 | CSE 290C: Multimodal Deep Learning |
Fall 2021 | CSE 142: Machine Learning |
[NEW!] | Our paper on Visual Question Rewriting was accepted to SIGIR 2021! |
[NEW!] | I am serving as Area Chair for CoNLL 2021 and NLPCC 2021. |
[NEW!] | Co-organizing the 4th Workshop on Closing the Loop Between Vision and Language (CLVL) at ICCV 2021! |
[NEW!] | Co-organizing the Tutorial on "From VQA to VLN: Recent Advances in Vision-and-Language Research" at CVPR 2021! |
[NEW!] | Co-organizing the Second Workshop on Advances in Language and Vision Research (ALVR) at NAACL 2021! |
[NEW!] | Invited talk at Arizona State University. |
[NEW!] | Two papers, on multimodal text style transfer for VLN and on visual comparison, were accepted to EACL 2021! |
[NEW!] | I am serving as Area Chair for NAACL 2021. |
[NEW!] | I am serving as Senior Program Committee (SPC) for IJCAI 2021. |
[NEW!] | Three papers were accepted to EMNLP 2020 (two conference papers and one Findings paper)! |
[NEW!] | Two papers were accepted to ECCV 2020 (the adversarial path sampling paper was selected as Spotlight)! |
[NEW!] | I successfully defended my Ph.D. dissertation, "Closing the Loop Between Language and Vision for Embodied Agents." Thanks to the committee and everyone who has helped me along the Ph.D. journey! |
[NEW!] | I am serving as Area Chair and Session Chair for EMNLP 2020. |
[NEW!] | Co-organizing the workshop on Advances in Language and Vision Research (ALVR) at ACL 2020! |
[NEW!] | Two papers were accepted to CVPR 2020 (the REVERIE paper was selected as Oral)! |
[03/2020] | Invited panelist at the GPU Technology Conference (GTC) 2020. |
[11/2019] | Organizer of the workshop on Language & Vision with applications to Video Understanding at CVPR 2020. |
[11/2019] | Organizer of the tutorial on Self-Supervised Deep Learning for NLP at AACL-IJCNLP 2020. |
[10/2019] | Invited speaker at the ICCV 2019 Workshop on Person In Context. |
[06/2019] | Recipient of the CVPR 2019 Best Student Paper Award. |
[06/2019] | Co-Organizer of the workshop on Closing the Loop Between Vision and Language at ICCV 2019. |
[06/2019] | Invited talk at Facebook AI. |
[01/2019] | Session Chair for AAAI 2019 (natural language processing). |
Language-based Video Editing via Multi-Modal Multi-Level Transformer |
Diagnosing Vision-and-Language Navigation: What Really Matters |
Visual Question Rewriting for Increasing Response Rate |
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation |
L2C: Describing Visual Differences Needs Semantic Understanding of Individuals |
Closing the Loop Between Language and Vision for Embodied Agents |
SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning |
Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations |
Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation |
Environment-agnostic Multitask Learning for Natural Language Grounded Navigation |
Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling |
Relational Graph Learning for Grounded Video Description Generation |
Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation |
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments |
Vision-Language Navigation Policy Learning and Adaptation |
Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs |
TIGEr: Text-to-Image Grounding for Image Caption Evaluation |
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research |
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation |
MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment |
Self-Supervised Dialogue Learning |
Self-Supervised Learning for Contextualized Extractive Summarization |
Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models |
Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation |
Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning |
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation |
XL-NBT: A Cross-lingual Neural Belief Tracking Framework |
No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling |
S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Network |
Video Captioning via Hierarchical Reinforcement Learning |
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning |
Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer |
Deep Reinforcement Learning for Visual Object Tracking in Videos |
ProjectCloak: Remove Unwanted Objects in Video |
ArtisticEye: A Real-time Application for High-resolution Artistic Style Transfer |
Google AI, Mountain View, US | Research Intern, Summer 2019 | Mentors: Sujith Ravi, Zornitsa Kozareva |
Facebook AI Research (FAIR), Menlo Park, US | Graduate Researcher, Spring 2019 | Mentors: Xinlei Chen, Marcus Rohrbach, Dhruv Batra |
Microsoft Research AI, Redmond, US | Research Intern, Summer 2018 | Mentors: Lei Zhang, Asli Celikyilmaz, Jianfeng Gao |
Adobe Research, San Francisco, US | Research Intern, Summer 2017 | Mentors: Geoffrey Oxholm, Oliver Wang, Eli Shechtman, Mike Lukac |
Adobe Research, San Francisco, US | Research Intern, Summer 2016 | Mentor: Geoffrey Oxholm |
Exacloud Inc., Hangzhou, China | Software Engineer Intern, 12/2014 - 03/2015 |
HCI, Graphics and Computer Vision Group, HKU | Research Assistant, Summer 2014 | Advisor: Yizhou Yu |