Xin (Eric) Wang
Assistant Professor, Computer Science and Engineering, UC Santa Cruz
Head of Research, Simular
Email: xwang366 [at] ucsc [dot] edu
Winter 2021 | CSE 142: Machine Learning |
Spring 2021 | CSE 290C: Multimodal Deep Learning |
Fall 2021 | CSE 142: Machine Learning |
Winter 2022 | CSE 244B: Machine Learning for Natural Language Processing |
Spring 2022 | CSE 142: Machine Learning |
Fall 2022 | CSE 142: Machine Learning |
Winter 2023 | CSE 244B: Machine Learning for Natural Language Processing |
Summer 2023 | California State Summer School for Mathematics & Science: AI Cluster |
Spring 2024 | CSE 142: Machine Learning |
[NEW!] | Three papers accepted to EMNLP 2024! |
[NEW!] | Our Discffusion paper is accepted to TMLR 2024! |
[NEW!] | Two papers accepted to ECCV 2024! |
[NEW!] | Two papers accepted to ACL 2024! |
[NEW!] | Two papers accepted to NAACL 2024! |
[NEW!] | Serving as Area Chair for ICLR 2025, NeurIPS 2024, and COLM 2024. |
[NEW!] | Our lab received a research grant from Microsoft. Thanks Microsoft! |
[NEW!] | Our lab received a gift award from Adobe. Thanks Adobe! |
[NEW!] | Our lab received multiple gift awards from eBay and Snap. Thanks eBay and Snap! |
[NEW!] | Two workshops are accepted to ACL 2024! I will be co-organizing the 3rd Workshop on Advances in Language and Vision Research (ALVR 2024) and the Fourth International Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics (SpLU-RoboNLP 2024) in Bangkok, Thailand. |
[NEW!] | Invited talk at Yale University (10/2023). |
[NEW!] | Three papers accepted to NeurIPS 2023! Congratulations to all authors! |
[NEW!] | Three papers accepted to EMNLP 2023! Congratulations to all authors! |
[NEW!] | Our Athena team was awarded Second Place (Science Innovation Winner, $50,000) in the Alexa Prize SocialBot Grand Challenge 5! |
[NEW!] | Serving as Area Chair for ICLR 2024. |
[NEW!] | Our SlugJARVIS team won Third Place ($50,000) in the inaugural Alexa Prize SimBot Challenge! |
[NEW!] | Two papers on (1) Aerial Vision-and-Dialog Navigation and (2) Text-to-Image Association Test are accepted to ACL 2023! |
[NEW!] | Our ESC paper is accepted to ICML 2023! |
[NEW!] | Our SlugJARVIS team advances to the finals of the inaugural Alexa Prize SimBot Challenge! Check out Amazon News for more information about this and UCSC News for our three teams across all three Alexa Prize Challenges! |
[NEW!] | Co-organizing the 5th Workshop on Closing the Loop Between Vision and Language (CLVL) at ICCV 2023. |
[NEW!] | Serving as Area Chair for NeurIPS 2023. |
[NEW!] | Invited talk at the CVPR 2023 VizWiz Grand Challenge Workshop (06/2023). |
[NEW!] | Invited talk at Google Research and UCI (03/2023). |
[NEW!] | Invited talk at KAUST and USC (2/2023). |
[NEW!] | Two papers on (1) Training-Free Structured Diffusion Guidance and (2) Neuro-Symbolic Procedural Planning with Commonsense Prompting (Spotlight) are accepted to ICLR 2023! |
[NEW!] | Three papers on (1) Multimodal Graph Transformer, (2) Imagination-Based Automatic Evaluation, and (3) Imagination-Guided Open-Ended Text Generation are accepted to EACL 2023! |
[NEW!] | Co-organizing the Workshop on Spatial Language Understanding and Grounded Communication for Robotics (SpLU-RoboNLP) at EMNLP 2023. |
[NEW!] | Serving as Area Chair for ACL 2023, ICLR 2023, and EMNLP 2022. |
[NEW!] | Our Sage team received an Amazon Alexa Prize Award to work on Alexa Prize TaskBot Challenge 2. Thanks Amazon! |
[NEW!] | Our paper "Parameter-Effcient Model Adaptation for Vision Transformers" is accepted to AAAI 2023! |
[NEW!] | Our Athena team received an Amazon Alexa Prize Award to work on Alexa Prize SocialBot Grand Challenge 5. Thanks Amazon! |
[NEW!] | Our paper "CPL: Counterfactual Prompt Learning for Vision and Language Models" is accepted to EMNLP 2022! |
[NEW!] | Our VLMBench paper is accepted to NeurIPS 2022 (Datasets and Benchmarks)! Check out the new compositional benchmark for vision-and-language robotic manipulation HERE! |
[NEW!] | Invited talk at Adobe Research (08/2022). |
[NEW!] | Our papers on (1) Privacy-preserving Federated Vision-and-Language Navigation and (2) Language-guided Artistic Style Transfer are accepted to ECCV 2022! |
[NEW!] | Our SlugJARVIS team won the Alexa Prize SimBot Public Benchmark Challenge! |
[NEW!] | Our paper on Understanding Instance-Level Impact of Fairness Constraints accepted to ICML 2022! |
[NEW!] | Two papers accepted to NAACL 2022 as Oral presentations! Topics include (1) Imagination-Augmented Natural Language Understanding and (2) Diagnosing Vision-and-Language Navigation. |
[NEW!] | Invited talk at Fudan University (03/2022). |
[NEW!] | Two papers accepted to CVPR 2022! Topics include (1) Compositional Temporal Grounding and (2) Language-based Video Editing. |
[NEW!] | Three papers accepted to ACL 2022! Topics include (1) Vision-and-Language Navigation Survey, (2) Multilingual Fairness, and (3) Interpretable Research Replication Prediction. |
[NEW!] | We have received a Google Faculty Research Award. Thanks Google! |
[NEW!] | Invited speaker at the CVPR 2022 Workshop on Open-Domain Retrieval Under a Multi-Modal Setting. |
[NEW!] | Invited talk at USC ISI (02/2022). |
[NEW!] | Our SlugJARVIS team received an Amazon Alexa Prize Award to work on Alexa Prize SimBot Challenge. Thanks Amazon! |
[NEW!] | Serving as Area Chair for ACL 2022 and NAACL 2022. |
[NEW!] | We have received an AAII Interdisciplinary Research Award. |
[NEW!] | Invited talk at Microsoft Research (11/2021). |
[NEW!] | Our paper on Mitigating Gender Bias in Image Search is accepted to EMNLP 2021 as an Oral paper. Congratulations Jialu! |
[NEW!] | Invited talk at Stanford Vision Lab (10/2021). |
[NEW!] | Received Google Cloud Research Credits. |
[NEW!] | Our VALUE paper is accepted to NeurIPS 2021 (Datasets and Benchmarks). Congratulations to all the authors! |
[NEW!] | Serving as Senior Program Committee (SPC) for AAAI 2022 and IJCAI-ECAI 2022. |
[NEW!] | Co-organizing the 4th Workshop on Closing the Loop Between Vision and Language (CLVL) at ICCV 2021! |
[NEW!] | I am giving a Tutorial on "From VQA to VLN: Recent Advances in Vision-and-Language Research" at CVPR 2021! |
[NEW!] | Co-organizing the Second Workshop on Advances in Language and Vision Research (ALVR) at NAACL 2021! |
[NEW!] | I am giving a keynote talk at the Third Workshop on Multimodal Artificial Intelligence at NAACL 2021 on June 6th! |
[NEW!] | Our paper on Visual Question Rewriting was accepted to SIGIR 2021! |
[NEW!] | I am serving as Area Chair for CoNLL 2021 and NLPCC 2021. |
[NEW!] | Invited talk at Arizona State University. |
[NEW!] | Two papers on multimodal style transfer learning for VLN and visual comparison were accepted to EACL 2021! |
[NEW!] | I am serving as Area Chair for NAACL 2021. |
[NEW!] | I am serving as Senior Program Committee (SPC) for IJCAI 2021. |
[NEW!] | Three papers were accepted to EMNLP 2020 (two conference papers and one Findings paper)! |
[NEW!] | Two papers were accepted to ECCV 2020 (the adversarial path sampling paper was selected as Spotlight)! |
[NEW!] | I successfully defended my Ph.D. dissertation, "Closing the Loop Between Language and Vision for Embodied Agents". Thanks to the committee and everyone who has helped me along the Ph.D. journey! |
[NEW!] | I am serving as Area Chair and Session Chair for EMNLP 2020. |
[NEW!] | Co-organizing the workshop on Advances in Language and Vision Research (ALVR) at ACL 2020! |
[NEW!] | Two papers were accepted to CVPR 2020 (the REVERIE paper was selected as Oral)! |
[03/2020] | Invited panelist at the GPU Technology Conference (GTC) 2020. |
[11/2019] | Organizer of the workshop on Language & Vision with applications to Video Understanding at CVPR 2020. |
[11/2019] | Organizer of the tutorial on Self-Supervised Deep Learning for NLP at AACL-IJCNLP 2020. |
[10/2019] | Invited speaker at the ICCV 2019 Workshop on Person In Context. |
[06/2019] | Recipient of the CVPR 2019 Best Student Paper Award. |
[06/2019] | Co-Organizer of the workshop on Closing the Loop Between Vision and Language at ICCV 2019. |
[06/2019] | Invited talk at Facebook AI. |
[01/2019] | Session Chair for AAAI 2019 (natural language processing). |
Agent S: An Open Agentic Framework that Uses Computers Like a Human |
Multimodal Situational Safety |
VIA: Unified Spatiotemporal Video Adaptation for Global and Local Video Editing |
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos |
Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA |
FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation |
LLM-Coordination: Evaluating and Analyzing Multi-Agent Coordination Abilities in Large Language Models |
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens |
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding |
Active Listening: Personalized Question Generation in Open-Domain Social Conversation with User Model Based Prompting |
Multimodal Procedural Planning via Dual Text-Image Prompting |
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners |
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing |
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models |
Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA |
ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models |
Navigation as Attackers Wish? Towards Building Byzantine-Robust Embodied Agents under Federated Learning |
ComCLIP: Training-Free Compositional Image and Text Matching |
Photoswap: Personalized Subject Swapping in Images |
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models |
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation |
R2H: Building Multimodal Navigation Helpers that Respond to Help Requests |
Collaborative Generative AI: Integrating GPT-k for Efficient Editing in Text-to-Image Generation |
Parameter-Efficient Cross-lingual Transfer of Vision and Language Models via Translation-based Alignment |
ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation |
Aerial Vision-and-Dialog Navigation |
T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation |
Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis |
Neuro-Symbolic Procedural Planning with Commonsense Prompting |
Multimodal Graph Transformer for Multimodal Question Answering |
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation |
ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation |
Parameter-efficient Model Adaptation for Vision Transformers |
Athena 3.0: Personalized Multimodal Chatbot with Neuro-symbolic Dialogue Generators |
Sage: A Multimodal Knowledge Graph-based Conversational Agent for Complex Task Guidance |
SlugJARVIS: Multimodal Commonsense Knowledge-based Embodied AI for SimBot Challenge |
CPL: Counterfactual Prompt Learning for Vision and Language Models |
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation |
FedVLN: Privacy-preserving Federated Vision-and-Language Navigation |
Language-Driven Artistic Style Transfer |
Understanding Instance-Level Impact of Fairness Constraints |
Imagination-Augmented Natural Language Understanding |
Diagnosing Vision-and-Language Navigation: What Really Matters |
JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents |
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning |
M3L: Language-based Video Editing via Multi-Modal Multi-Level Transformer |
Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions |
Assessing Multilingual Fairness in Pretrained Multimodal Representations |
Interpretable Research Replication Prediction via Variational Contextual Consistency Sentence Masking |
Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search |
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation |
Visual Question Rewriting for Increasing Response Rate |
Multimodal Text Style Transfer for Outdoor Vision-and-Language Navigation |
L2C: Describing Visual Differences Needs Semantic Understanding of Individuals |
Closing the Loop Between Language and Vision for Embodied Agents |
SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning |
Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations |
Learning to Stop: A Simple yet Effective Approach to Urban Vision-Language Navigation |
Environment-agnostic Multitask Learning for Natural Language Grounded Navigation |
Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling |
Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation |
REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments |
Vision-Language Navigation Policy Learning and Adaptation |
Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs |
TIGEr: Text-to-Image Grounding for Image Caption Evaluation |
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research |
Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation |
MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment |
Self-Supervised Dialogue Learning |
Self-Supervised Learning for Contextualized Extractive Summarization |
Towards Generating Long and Coherent Text with Multi-Level Latent Variable Models |
Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation |
Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning |
Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation |
XL-NBT: A Cross-lingual Neural Belief Tracking Framework |
No Metrics Are Perfect: Adversarial Reward Learning for Visual Storytelling |
S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Network |
Video Captioning via Hierarchical Reinforcement Learning |
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning |
Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer |
Deep Reinforcement Learning for Visual Object Tracking in Videos |
ProjectCloak: Remove Unwanted Objects in Video |
ArtisticEye: A Real-time Application for High-resolution Artistic Style Transfer |
Google AI, Mountain View, US | Research Intern, Summer 2019 | Mentors: Sujith Ravi, Zornitsa Kozareva |
Facebook AI Research (FAIR), Menlo Park, US | Graduate Researcher, Spring 2019 | Mentors: Xinlei Chen, Marcus Rohrbach, Dhruv Batra |
Microsoft Research AI, Redmond, US | Research Intern, Summer 2018 | Mentors: Lei Zhang, Asli Celikyilmaz, Jianfeng Gao |
Adobe Research, San Francisco, US | Research Intern, Summer 2017 | Mentors: Geoffrey Oxholm, Oliver Wang, Eli Shechtman, Mike Lukac |
Adobe Research, San Francisco, US | Research Intern, Summer 2016 | Mentor: Geoffrey Oxholm |
Exacloud Inc., Hangzhou, China | Software Engineer Intern, 12/2014 - 03/2015 |
HCI, Graphics and Computer Vision Group, HKU | Research Assistant, Summer 2014 | Advisor: Yizhou Yu |