Honglei Liu's Homepage

About me

Builders gonna build! I'm building Tofu, with a mission to revolutionize B2B Marketing with Generative AI. Previously, I worked as a Sr. Manager at Twitter and as Head of Machine Learning and Data Platform at Fast. Before that, I was a Research Scientist Manager at Facebook (now Meta), supporting a team focused on building a multimodal AI assistant.

I received my Ph.D. degree from the Department of Computer Science, University of California, Santa Barbara, under the supervision of Prof. Xifeng Yan. During my Ph.D. studies, I worked on various research projects on sequence mining, information extraction, active learning and deep learning. In general, my research focused on developing better knowledge extraction tools for sequence data (e.g., biological sequences, event streams, text corpora). Before I started my Ph.D. studies at UCSB, I obtained a B.S. and an M.S. in Computer Science from Northeastern University, China.

What's New

03/2023: Tofu is officially incorporated! We are putting top-of-funnel (TOFU) on autopilot for B2B marketing teams!
08/2021: Paper "NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions" accepted to EMNLP 2021.
08/2021: Open-sourced NUANCED: a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns.
03/2021: Paper "Adding Chit-Chats to Enhance Task-Oriented Dialogues" accepted to NAACL 2021.
09/2020: Paper "User Memory Reasoning for Conversational Recommendation" accepted to COLING 2020.
10/2019: Open-sourced ReAgent: a modular, end-to-end platform for building reasoning systems. It closes the loop of turning actions into feedback, and feedback into training data for RL and online learning. ReAgent is used at Facebook to drive tens of billions of decisions per day.
12/2018: Open-sourced PyText: a deep-learning based NLP modeling framework now being used as the go-to platform at Facebook and in the open-source community.
07/2017: We developed a new motif discovery tool, DeepMotif, that achieves even better performance than ASC+MEME (already 10,000 times faster than MEME). It's 10-100 times faster and doesn't rely on MEME. Learn more about them here. For more information or licensing, please contact me.
06/2017: Our motif discovery solution, ASC+MEME (paper), is licensed to SerImmune Inc. (funded by NIH, Illumina, Merck, etc.) to find motifs from massive protein sequences generated by modern sequencing techniques.

Publications

NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions
Zhiyu Chen, Honglei Liu, Hu Xu, Seungwhan Moon, Hao Zhou, Bing Liu. Proc. of the Conf. on Empirical Methods in Natural Language Processing (EMNLP 2021). [paper][data]
Adding Chit-Chats to Enhance Task-Oriented Dialogues
Kai Sun, Seungwhan Moon, Paul Crook, Stephen Roller, Becka Silvert, Bing Liu, Zhiguang Wang, Honglei Liu, Eunjoon Cho, Claire Cardie. Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021). [paper]
User Memory Reasoning for Conversational Recommendation
Hu Xu, Seungwhan Moon, Honglei Liu, Bing Liu, Pararth Shah, Bing Liu, Philip S. Yu. Proc. of Int. Conf. on Computational Linguistics (COLING 2020). [paper]
Federated User Representation Learning
Duc Bui, Kshitiz Malik, Jack Goetz, Honglei Liu, Seungwhan Moon, Anuj Kumar, Kang G. Shin. arXiv preprint arXiv:1909.12535 (2019). [paper]
Active Federated Learning
Jack Goetz, Kshitiz Malik, Duc Bui, Seungwhan Moon, Honglei Liu, Anuj Kumar. Workshop on Federated Learning for Data Privacy and Confidentiality at Neural Information Processing Systems (NeurIPS 2019). [paper]
Global Textual Relation Embedding for Relational Understanding
Zhiyu Chen, Hanwen Zha, Honglei Liu, Wenhu Chen, Xifeng Yan, Yu Su. Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL 2019). (Short Paper) [paper]
Explore-Exploit: A Framework for Interactive and Online Learning
Honglei Liu, Anuj Kumar, Wenhai Yang, Benoit Dumoulin. Systems for Machine Learning Workshop at Neural Information Processing Systems (NeurIPS 2018). [paper]
Open-sourced as ReAgent, a modular, end-to-end platform for building reasoning systems
PoQaa: Text Mining and Knowledge Sharing for Scientific Publications
Keqian Li, Ping Zhang, Honglei Liu, Hanwen Zha, Xifeng Yan. Proc. of Int. Conf. on Knowledge Discovery and Data Mining (KDD 2018). (demo) [paper][video]
In Vitro Validation of in Silico Identified Inhibitory Interactions
Honglei Liu, Daniel Bridges, Connor Randall, Sara A. Solla, Bian Wu, Paul Hansma, Xifeng Yan, Kenneth S. Kosik, Kristofer Bouchard. Journal of Neuroscience Methods 321 (2019): 39-48. [paper]
Global Relation Embedding for Relation Extraction
Yu Su^*, Honglei Liu^*, Semih Yavuz, Izzeddin Gur, Huan Sun, Xifeng Yan. Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018). (*: Equal Contribution) [paper][source code]
Active Learning of Functional Networks from Spike Trains
Honglei Liu, Bian Wu. SIAM Int. Conf. on Data Mining (SDM 2017). [paper][supplementary materials][source code]
Fast Motif Discovery in Short Sequences
Honglei Liu, Fangqiu Han, Hongjun Zhou, Xifeng Yan, Kenneth S. Kosik. Proc. of Int. Conf. on Data Engineering (ICDE 2016). [paper] [slides] [poster] [software]
Software licensed to SerImmune Inc. to produce real world value
ALAE: Accelerating Local Alignment with Affine Gap Exactly in Biosequence Databases
Xiaochun Yang, Honglei Liu, Bin Wang. Proc. of Int. Conf. on Very Large Data Bases (VLDB 2012). [paper][source code]
Approximate Substring Query Algorithms Supporting Local Optimal Matching
Honglei Liu, Xiaochun Yang, Bin Wang, Rong Jin. Journal of Frontiers of Computer Science and Technology, 2011. [source code]

Open Source

NUANCED
NUANCED is a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns. The dataset focuses on realistic settings where user preferences are extracted from the real-world Yelp Open Dataset and paraphrased into natural user responses.

ReAgent
A modular, end-to-end platform for building reasoning systems. It closes the loop of turning actions into feedback, and feedback into training data for reinforcement learning and online learning. ReAgent is used at Facebook to drive tens of billions of decisions per day.

PyText
A deep-learning based NLP modeling framework built on PyTorch. PyText is used at Facebook to iterate quickly on new modeling ideas and then seamlessly ship them at scale.

ASC
A fast motif discovery tool that is 10,000 times faster than MEME while preserving the same accuracy. ASC+MEME reduces the running time of MEME from weeks to a few minutes with even better accuracy. ASC was licensed to SerImmune to find motifs from massive protein sequences generated by modern sequencing techniques.

Fun Stuff

Facebook Lunar New Year Gala

I was the organizer/director of the 2020 Facebook Lunar New Year Gala, which involved 30+ volunteers, 150+ performers and 900+ guests, with FB VP-level speakers in attendance. We spent 4+ months preparing for it. Every year, we organize this event so that our co-workers and their families have a place to get together and celebrate Lunar New Year.

Shanghui Life

An iOS app that allows users to post, find and join events. Users can also invite friends to events, search nearby businesses, follow others, post texts and pictures, etc. I did all the coding for the backend.

Intelligent car

An intelligent car that can navigate by itself and follow a road track. I was in charge of the software part. We won the first prize in a national competition. Check more photos here.

Acoustic positioning car

A car that can locate its position by sending / receiving sound wave signals and perform a series of tasks. I was in charge of the software part. We won the second prize in a national competition with our design. Check more photos here.

I also did some other fun stuff when I was an undergraduate student, including an animal behavior analysis system and visualization software for Space Ad-hoc Networks.