Multimodal and Multilingual Knowledge Graphs

4 May, 2020 by NExT

Introduction & Motivation

Knowledge graphs (KGs) have attracted much attention and have been applied successfully in many applications. This project aims to develop technologies to construct and populate Multilingual Multimodal Knowledge Graphs (M3KG), in which entities and relations can take multilingual and multimodal forms. In particular, we are interested in developing an automated framework for constructing M3KG from the Web and from data in a domain of interest, such as the wellness domain.

Take the wellness domain as an example: it involves knowledge of foods, nutrition, lifestyle and diseases, all of which are closely related to a person’s well-being. Constructing a Wellness KG poses four challenges. First, annotations are hard to obtain because they require background knowledge and often expert knowledge. Thus, we aim to maximize the use of existing resources (e.g., structured and application data) for KG construction. Second, the contents (texts and images) are difficult to understand because of their specialized vocabulary. Third, the correctness of the knowledge is crucial for meeting professional demands. Fourth, the content may come in multiple languages, each with a slightly different scope and interpretation of knowledge.

Current Research

The goal of our current research is to tackle the above challenges. In particular, we consider the integration of English and Chinese entities in the wellness domain so as to include complementary information from two different cultures. The three key research focuses are: (a) knowledge-guided deep learning, as fundamental research that gives models the capability to deeply understand heterogeneous data for knowledge extraction systems; (b) domain-specific information extraction (IE) in low-resource environments; and (c) application-driven KG completion.

This research proceeds in four stages.

First, we utilize KGs to enhance text understanding by jointly training BERT with KG-related tasks [1], which provides effective text representations for various IE models, similar to our previous research on joint representation learning of words and entities [2].

Second, we utilize knowledge models to reduce the annotation requirements of current models, focusing on six subtasks of IE: (1) we merge existing structured data via a Multi-channel Graph Neural Network (MuGNN) [3], which learns alignment-oriented entity embeddings through joint KG inference and alignment, and we further improve it by introducing various attributes and values; (2) we make the best use of weakly labelled data for named entity recognition, using partial CRFs with non-entity sampling [4]; (3) we integrate both local contextual features and global coherence information for entity linking [5]; (4) we improve long-tail relation extraction by transferring knowledge, based on relation prototypes, from relations with sufficient training data; (5) we incorporate open-domain trigger knowledge for event extraction via a teacher-student model [6]; and (6) we extend the above IE techniques to handle multiple modalities, including text and images [7].
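To make subtask (1) more concrete, the final matching step of embedding-based entity alignment can be sketched as a nearest-neighbour search between two KGs. This is a minimal illustration under stated assumptions, not MuGNN itself: the embeddings are hand-written toy vectors rather than learned ones, and the entity names, the `align_entities` helper, and the `threshold` parameter are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def align_entities(source, target, threshold=0.8):
    """Map each source entity to its most similar target entity.

    A pair is kept only if its similarity reaches `threshold`
    (a hypothetical cut-off chosen for this toy example).
    """
    alignment = {}
    for s_name, s_vec in source.items():
        best_name, best_score = None, -1.0
        for t_name, t_vec in target.items():
            score = cosine(s_vec, t_vec)
            if score > best_score:
                best_name, best_score = t_name, score
        if best_score >= threshold:
            alignment[s_name] = best_name
    return alignment


# Toy, hand-made embeddings; a real system would learn them from both KGs.
english_kg = {
    "green tea": [0.9, 0.1, 0.0],
    "diabetes": [0.0, 0.2, 0.95],
}
# Pinyin stand-ins for Chinese entities (assumed names for illustration).
chinese_kg = {
    "lv cha": [0.85, 0.15, 0.05],         # green tea
    "tang niao bing": [0.05, 0.25, 0.9],  # diabetes
    "gao xue ya": [0.1, 0.9, 0.3],        # hypertension
}

alignment = align_entities(english_kg, chinese_kg)
print(alignment)
```

In this toy setting each English entity is paired with its Chinese counterpart, while the unmatched Chinese entity stays unaligned. The quality of such a matching depends entirely on how well the embeddings were trained to be alignment-oriented, which is the part the sketch leaves out.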

Third, we explore user-item interactions to predict missing links among entities, and utilize the extracted structured knowledge to improve the accuracy and explainability of recommendation [8].
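Predicting a missing link can be illustrated with a translation-based scoring function in the style of TransE, a common KG completion baseline. The sketch below is an assumption made for illustration and is not necessarily the model used in [8]: the vectors are toy values, and the `prefers` relation, the item names, and the helper functions are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """TransE-style plausibility: negative L2 distance of head + relation from tail.

    Scores closer to zero mean the triple (head, relation, tail) is more plausible.
    """
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, candidates):
    """Rank candidate tail entities for the query (head, relation, ?), best first."""
    scored = [(name, transe_score(head, relation, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy two-dimensional embeddings (hypothetical values, not learned).
user = [0.2, 0.1]
prefers = [0.5, 0.3]
items = {
    "oatmeal": [0.7, 0.4],
    "soda": [-0.4, 0.9],
    "salad": [0.6, 0.5],
}

ranking = rank_tails(user, prefers, items)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because a user-item interaction is treated here as just another relation, the same scoring function serves both KG completion and recommendation, which is one way to read the idea of unifying the two tasks.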

Fourth, we develop an annotation platform to ensure the integrity and correctness of knowledge with the help of annotators as well as machine-generated candidates. The resulting system can continuously maintain the domain-specific KG by checking the newly extracted knowledge.

Plans for Future Research

First, we are interested in further investigating knowledge-guided text understanding and generation. Here we plan to apply it to an important real-world application in order to explore better communication between laymen and experts [9]. Second, we plan to enhance current research by exploring other ways of acquiring application data for KG completion. One example is the use of conversation, also known as interactive information extraction, which keeps humans in the loop to find missing knowledge while improving conversation performance via the expanded KG. Third, we plan to incorporate multimodal information, which is prevalent on the Web and contains more comprehensive knowledge than any single modality on its own. Fourth, we will address one specific challenge of domain-specific IE: the need to deal with fine-grained knowledge despite the high cost of domain experts. Here we will explore issues in fine-grained taxonomy.


  1. Yixin Cao, Lifu Huang, Zhiyuan Liu, Xiangnan He, Heng Ji, Tat-Seng Chua. TK-Bert: Bridging Text and Knowledge Graph with Contextual Entity Embeddings. Internal Report, 2020.
  2. Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu, Chengjiang Li, Xu Chen, Tiansi Dong. Joint Representation Learning of Cross-lingual Words and Entities via Attentive Distant Supervision. EMNLP 2018: 227-237.
  3. Yixin Cao, Zhiyuan Liu, Chengjiang Li, Zhiyuan Liu, Juanzi Li, Tat-Seng Chua. Multi-Channel Graph Neural Network for Entity Alignment. ACL 2019: 1452-1461.
  4. Yixin Cao, Zikun Hu, Tat-Seng Chua, Zhiyuan Liu, Heng Ji. Low-Resource Name Tagging Learned with Weakly Labeled Data. EMNLP 2019.
  5. Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu. Neural Collective Entity Linking. COLING 2018.
  6. Meihan Tong, Shuai Wang, Yixin Cao, Bin Xu, Lei Hou, Juanzi Li, Jun Xie. Improving Event Detection via Open-domain Event Trigger Knowledge. ACL 2020.
  7. Meihan Tong, Shuai Wang, Yixin Cao, Bin Xu, Juanzi Li, Lei Hou, Tat-Seng Chua. Image Enhanced Event Detection in News Articles. AAAI 2020.
  8. Yixin Cao, Xiang Wang, Xiangnan He, Zikun Hu, Tat-Seng Chua. Unifying Knowledge Graph Learning and Recommendation: Towards a Better Understanding of User Preference. WWW 2019: 151-161.
  9. Yixin Cao, Zhiyuan Liu, Liangming Pan, Min-Yen Kan, Zhiyuan Liu, Tat-Seng Chua. Expertise Style Transfer: A New Task Towards Better Communication between Experts and Laymen. ACL 2020.

