About me

Builders gonna build! I'm a co-founder of Tofu, where we're rethinking B2B marketing from the ground up with generative AI — one platform grounded in a company's own knowledge, instead of thirty single-use tools. Before Tofu, I led machine learning teams at Twitter and Affirm, was Head of Machine Learning and Data Platform at Fast, and was a Research Scientist Manager at Facebook (now Meta), where my team did both research and engineering on multimodal AI assistants.

I received my Ph.D. in Computer Science from UC Santa Barbara, advised by Prof. Xifeng Yan. My research focused on knowledge extraction from sequence data — biological sequences, event streams, text corpora — spanning sequence mining, information extraction, active learning, and deep learning. Before UCSB, I earned my B.S. and M.S. in Computer Science from Northeastern University in China.

What's New

  • 02/2025: Tofu raised a $12M Series A led by SignalFire after 12x revenue growth in our first year.
  • 10/2023: Tofu raised a $5M seed led by Index Ventures to build our core engine: knowledge-graph-grounded LLM generation that keeps content on-brand at scale.
  • 03/2023: Tofu is officially incorporated! We are putting top-of-funnel (TOFU) on autopilot for B2B marketing teams!
  • 08/2021: Paper "NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions" accepted to EMNLP 2021.
  • 08/2021: Open-sourced NUANCED: a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns.
  • 03/2021: Paper "Adding Chit-Chats to Enhance Task-Oriented Dialogues" accepted to NAACL 2021.
  • 09/2020: Paper "User Memory Reasoning for Conversational Recommendation" accepted to COLING 2020.
  • 10/2019: Open-sourced ReAgent: a modular, end-to-end platform for building reasoning systems. It closes the loop of turning actions into feedback, and feedback into training data for RL and online learning. ReAgent is used at Facebook to drive tens of billions of decisions per day.
  • 12/2018: Open-sourced PyText: a deep-learning-based NLP modeling framework, now used as the go-to platform at Facebook and in the open-source community.
  • 07/2017: We developed a new motif discovery tool, DeepMotif, that achieves even better performance than ASC+MEME, which is already 10,000 times faster than MEME. DeepMotif is 10-100 times faster still and doesn't rely on MEME. Learn more about them here. For more information or licensing, please contact me.
  • 06/2017: Our motif discovery solution, ASC+MEME (paper), is licensed to SerImmune Inc., which is funded by NIH, Illumina, Merck, and others, to find motifs in massive protein sequence data generated by modern sequencing techniques.

Publications

  • NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions
    Zhiyu Chen, Honglei Liu, Hu Xu, Seungwhan Moon, Hao Zhou, Bing Liu. Proc. of the Conf. on Empirical Methods in Natural Language Processing (EMNLP 2021). [paper][data]
  • Adding Chit-Chats to Enhance Task-Oriented Dialogues
    Kai Sun, Seungwhan Moon, Paul Crook, Stephen Roller, Becka Silvert, Bing Liu, Zhiguang Wang, Honglei Liu, Eunjoon Cho, Claire Cardie. Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2021). [paper]
  • User Memory Reasoning for Conversational Recommendation
    Hu Xu, Seungwhan Moon, Honglei Liu, Bing Liu, Pararth Shah, Bing Liu, Philip S. Yu. Proc. of Int. Conf. on Computational Linguistics (COLING 2020). [paper]
  • Federated User Representation Learning
    Duc Bui, Kshitiz Malik, Jack Goetz, Honglei Liu, Seungwhan Moon, Anuj Kumar, Kang G. Shin. arXiv preprint arXiv:1909.12535 (2019). [paper]
  • Active Federated Learning
    Jack Goetz, Kshitiz Malik, Duc Bui, Seungwhan Moon, Honglei Liu, Anuj Kumar. Workshop on Federated Learning for Data Privacy and Confidentiality at Neural Information Processing Systems (NeurIPS 2019). [paper]
  • Global Textual Relation Embedding for Relational Understanding
    Zhiyu Chen, Hanwen Zha, Honglei Liu, Wenhu Chen, Xifeng Yan, Yu Su. Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL 2019). (Short Paper) [paper]
  • Explore-Exploit: A Framework for Interactive and Online Learning
    Honglei Liu, Anuj Kumar, Wenhai Yang, Benoit Dumoulin. Systems for Machine Learning Workshop at Neural Information Processing Systems (NeurIPS 2018). [paper]
    Open-sourced as ReAgent, a modular, end-to-end platform for building reasoning systems
  • PoQaa: Text Mining and Knowledge Sharing for Scientific Publications
    Keqian Li, Ping Zhang, Honglei Liu, Hanwen Zha, Xifeng Yan. Proc. of Int. Conf. on Knowledge Discovery and Data Mining (KDD 2018). (demo) [paper][video]
  • In Vitro Validation of in Silico Identified Inhibitory Interactions
    Honglei Liu, Daniel Bridges, Connor Randall, Sara A. Solla, Bian Wu, Paul Hansma, Xifeng Yan, Kenneth S. Kosik, Kristofer Bouchard. Journal of Neuroscience Methods 321 (2019): 39-48. [paper]
  • Global Relation Embedding for Relation Extraction
    Yu Su*, Honglei Liu*, Semih Yavuz, Izzeddin Gur, Huan Sun, Xifeng Yan. Proc. of the Annual Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018). (*: Equal Contribution) [paper][source code]
  • Active Learning of Functional Networks from Spike Trains
    Honglei Liu, Bian Wu. SIAM Int. Conf. on Data Mining (SDM 2017). [paper][supplementary materials][source code]
  • Fast Motif Discovery in Short Sequences
    Honglei Liu, Fangqiu Han, Hongjun Zhou, Xifeng Yan, Kenneth S. Kosik. Proc. of Int. Conf. on Data Engineering (ICDE 2016). [paper] [slides] [poster] [software]
    Software licensed to SerImmune Inc. for motif discovery in massive protein sequence data
  • ALAE: Accelerating Local Alignment with Affine Gap Exactly in Biosequence Databases
    Xiaochun Yang, Honglei Liu, Bin Wang. Proc. of Int. Conf. on Very Large Data Bases (VLDB 2012). [paper][source code]
  • Approximate Substring Query Algorithms Supporting Local Optimal Matching
    Honglei Liu, Xiaochun Yang, Bin Wang, Rong Jin. Journal of Frontiers of Computer Science and Technology, 2011. [source code]

Open Source

NUANCED
NUANCED is a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns. The dataset focuses on realistic settings where user preferences are extracted from the real-world Yelp Open Dataset and paraphrased into natural user responses.
ReAgent
A modular, end-to-end platform for building reasoning systems. It closes the loop of turning actions into feedback, and feedback into training data for reinforcement learning and online learning. ReAgent is used at Facebook to drive tens of billions of decisions per day.
PyText
A deep-learning-based NLP modeling framework built on PyTorch. PyText is used at Facebook to iterate quickly on new modeling ideas and then seamlessly ship them at scale.
ASC
A fast motif discovery tool that is 10,000 times faster than MEME while preserving the same accuracy. ASC+MEME reduces the running time of MEME from weeks to a few minutes with even better accuracy. ASC was licensed to SerImmune to find motifs from massive protein sequences generated by modern sequencing techniques.