I'm currently building something exciting in stealth mode. Previously, I worked as a Sr. Manager at Twitter after leading the Machine Learning and Data Platform efforts at Fast. Before that, I was a Research Scientist Manager at Facebook (now Meta), supporting a team focused on building a multimodal AI assistant. I received my Ph.D. from the Department of Computer Science, University of California, Santa Barbara, under the supervision of Prof. Xifeng Yan. During my Ph.D. studies, I worked on various research projects in sequence mining, information extraction, active learning, and deep learning. In general, my research focused on developing better knowledge extraction tools for sequence data (e.g., biological sequences, event streams, text corpora). In addition, I have some experience with bioinformatics research, such as DNA sequence assembly, SNP calling, and gene expression analysis.
Before I started my Ph.D. studies at UCSB, I obtained B.S. and M.S. degrees in Computer Science from Northeastern University, China. Find more details about me in my CV.
- 08/2021: Paper "NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions" accepted to EMNLP 2021.
- 08/2021: Open-sourced NUANCED: a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns.
- 03/2021: Paper "Adding Chit-Chats to Enhance Task-Oriented Dialogues" accepted to NAACL 2021.
- 09/2020: Paper "User Memory Reasoning for Conversational Recommendation" accepted to COLING 2020.
- 10/2019: Open-sourced ReAgent: a modular, end-to-end platform for building reasoning systems. It closes the loop of turning actions into feedback, and feedback into training data for RL and online learning. ReAgent is used at Facebook to drive tens of billions of decisions per day.
- 12/2018: Open-sourced PyText: a deep-learning-based NLP modeling framework now used as the go-to platform at Facebook and in the open-source community.
- 07/2017: We developed a new motif discovery tool, DeepMotif, that achieves even better performance than ASC+MEME (which is already 10,000 times faster than MEME). It is 10-100 times faster and doesn't rely on MEME. Learn more about them here. For more information or licensing, please contact me.
- 06/2017: Our motif discovery solution, ASC+MEME (paper), is licensed to SerImmune Inc. (funded by NIH, Illumina, Merck, etc.) to find motifs from massive protein sequences generated by modern sequencing techniques.
NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions
Zhiyu Chen, Honglei Liu, Hu Xu, Seungwhan Moon, Hao Zhou, Bing Liu. Proc. of the Conf. on Empirical Methods in Natural Language Processing (EMNLP 2021). [paper][data]
Adding Chit-Chats to Enhance Task-Oriented Dialogues
Kai Sun, Seungwhan Moon, Paul Crook, Stephen Roller, Becka Silvert, Bing Liu, Zhiguang Wang, Honglei Liu, Eunjoon Cho, Claire Cardie. Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021). [paper]
User Memory Reasoning for Conversational Recommendation
Hu Xu, Seungwhan Moon, Honglei Liu, Bing Liu, Pararth Shah, Bing Liu, Philip S. Yu. Proc. of Int. Conf. on Computational Linguistics (COLING 2020). [paper]
Federated User Representation Learning
Duc Bui, Kshitiz Malik, Jack Goetz, Honglei Liu, Seungwhan Moon, Anuj Kumar, Kang G. Shin. arXiv preprint arXiv:1909.12535 (2019). [paper]
Active Federated Learning
Jack Goetz, Kshitiz Malik, Duc Bui, Seungwhan Moon, Honglei Liu, Anuj Kumar. Workshop on Federated Learning for Data Privacy and Confidentiality at Neural Information Processing Systems (NeurIPS 2019). [paper]
Global Textual Relation Embedding for Relational
Zhiyu Chen, Hanwen Zha, Honglei Liu, Wenhu Chen, Xifeng Yan, Yu Su. Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL 2019). (Short Paper) [paper]
Explore-Exploit: A Framework for Interactive and Online
Honglei Liu, Anuj Kumar, Wenhai Yang, Benoit Dumoulin. Systems for Machine Learning Workshop at Neural Information Processing Systems (NeurIPS 2018). [paper]
Open-sourced as ReAgent, a modular, end-to-end platform for building reasoning systems
Text Mining and Knowledge Sharing for Scientific Publications
Keqian Li, Ping Zhang, Honglei Liu, Hanwen Zha, Xifeng Yan. Proc. of Int. Conf. on Knowledge Discovery and Data Mining (KDD 2018). (demo) [paper][video]
In Vitro Validation of in Silico Identified Inhibitory
Honglei Liu, Daniel Bridges, Connor Randall, Sara A. Solla, Bian Wu, Paul Hansma, Xifeng Yan, Kenneth S. Kosik, Kristofer Bouchard. Journal of Neuroscience Methods 321 (2019): 39-48. [paper]
Global Relation Embedding for Relation Extraction
Yu Su*, Honglei Liu*, Semih Yavuz, Izzeddin Gur, Huan Sun, Xifeng Yan. Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018). (*: Equal Contribution) [paper][source code]
Active Learning of Functional Networks from Spike Trains
Honglei Liu, Bian Wu. SIAM Int. Conf. on Data Mining (SDM 2017). [paper][supplementary materials][source code]
Fast Motif Discovery in Short Sequences
Honglei Liu, Fangqiu Han, Hongjun Zhou, Xifeng Yan, Kenneth S. Kosik. Proc. of Int. Conf. on Data Engineering (ICDE 2016). [paper] [slides] [poster] [software]
Software licensed to SerImmune Inc. to produce real-world value
ALAE: Accelerating Local Alignment with Affine Gap Exactly in
Xiaochun Yang, Honglei Liu, Bin Wang. Proc. of Int. Conf. on Very Large Data Bases (VLDB 2012). [paper][source code]
Approximate Substring Query Algorithms Supporting Local Optimal
Honglei Liu, Xiaochun Yang, Bin Wang, Rong Jin. Journal of Frontiers of Computer Science and Technology, 2011. [source code]
NUANCED is a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns. The dataset focuses on realistic settings where user preferences are extracted from the real-world Yelp Open Dataset and paraphrased into natural user responses.
A modular, end-to-end platform for building reasoning systems. It closes the loop of turning actions into feedback, and feedback into training data for reinforcement learning and online learning. ReAgent is used at Facebook to drive tens of billions of decisions per day.
A deep-learning-based NLP modeling framework built on PyTorch. PyText is used at Facebook to iterate quickly on new modeling ideas and then seamlessly ship them at scale.
A fast motif discovery tool that is 10,000 times faster than MEME while preserving the same accuracy. ASC+MEME reduces the running time of MEME from weeks to a few minutes with even better accuracy. ASC was licensed to SerImmune to find motifs from massive protein sequences generated by modern sequencing techniques.
- 07/2022 - 01/2023: Senior Engineering Manager - Machine Learning, Twitter
- 04/2022 - 07/2022: Senior Engineering Manager - Machine Learning, Affirm
- 09/2021 - 04/2022: Head of Machine Learning and Data Platform, Fast
- 04/2020 - 09/2021: Research Scientist Manager, Facebook
- 02/2020 - 04/2020: Staff Research Scientist, Facebook
- 01/2019 - 02/2020: Senior Research Scientist, Facebook
- 07/2017 - 12/2018: Research Scientist, Facebook
- 06/2016 - 09/2016: Intern, Facebook
Topic: Indexing and Mining Billions of Time Series
- 07/2015 - 09/2015: Bioinformatics Intern, Illumina
Topic: Fast Specificity Checking for Multiplex PCR Primer Design