Heterogeneous Data Mining for Fintech
Introduction and Motivation
Over the last decade, digitalization and automation have accelerated technology development in finance, enabling applications such as peer-to-peer lending, online insurance, robo-advisors, and algorithmic trading. As a result, large volumes of alternative financial data are generated across financial scenarios, including banking, insurance, and investment. The continuously growing volume of such data pushes financial practitioners to embrace intelligent technology for managing and analyzing it, so as to improve the efficiency of financial workflows and the quality of financial services. For instance, institutions such as JPMorgan are developing AI and machine learning technology in investment banking and hedge funds to help shape trading strategies by analyzing alternative financial data 1.
Analyzing financial data can be viewed as a heterogeneous data mining task, since the data come in different formats, including time-series (e.g., historical prices), text (e.g., analyst reports), images (e.g., satellite images), and graphs (e.g., transaction graphs). Research on heterogeneous data mining largely focuses on two tasks: heterogeneous data understanding and predictive modeling. Heterogeneous data understanding aims to encode the contents of the data, e.g., by projecting the data into an embedding. Predictive modeling aims to make predictions of interest to financial practitioners, so as to facilitate managing the data (e.g., filtering) and making decisions from it (e.g., trading). Owing to their strong representation ability, deep neural networks (DNNs) have become a promising solution for heterogeneous data mining in some scenarios, such as e-commerce. However, most existing works overlook key properties of financial data, such as the stochasticity of historical prices and the co-movement between assets, and thus fail to satisfy the requirements of finance.
NExT++ focuses on developing DNNs that can capture the key properties of financial data, especially time-series, text, and graph data. Figure 1 shows an example of these three types of financial data in an investment scenario, where our current research considers five specific properties of these heterogeneous data.
To capture the stochasticity of historical prices, which are the most important data for price prediction, we devise a new DNN, named Adv-ALSTM, that is equipped with adversarial training. To incorporate relational domain knowledge, we devise a DNN, named Relational Stock Ranking, which incorporates the relations between stocks to improve prediction performance. Moreover, to improve the utilization of such relational data, i.e., graphs, we propose Graph Adversarial Training and Cross-GCN, which enhance generalization ability and representation ability, respectively.
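The idea of adversarial training mentioned above can be illustrated with a small sketch. This is not the Adv-ALSTM implementation: it assumes a plain linear classifier with logistic loss in place of the recurrent model, and the function names, `eps`, and `beta` are hypothetical choices made for illustration only. It shows the general recipe of perturbing the input features in the direction that most increases the loss, then training on the clean and perturbed examples together.

```python
import numpy as np

def logistic_loss(w, b, x, y):
    """Logistic loss for one example; y is +1 (price up) or -1 (price down)."""
    return float(np.log1p(np.exp(-y * (w @ x + b))))

def input_gradient(w, b, x, y):
    """Gradient of the logistic loss with respect to the features x."""
    margin = y * (w @ x + b)
    return -y * w / (1.0 + np.exp(margin))

def adversarial_example(w, b, x, y, eps=0.1):
    """Shift x by a step of L2 length eps along the normalized loss gradient."""
    g = input_gradient(w, b, x, y)
    norm = np.linalg.norm(g)
    if norm == 0.0:
        return x.copy()  # no direction increases the loss; leave x unchanged
    return x + eps * g / norm

def adversarial_objective(w, b, x, y, eps=0.1, beta=1.0):
    """Clean loss plus beta times the loss on the perturbed example."""
    x_adv = adversarial_example(w, b, x, y, eps)
    return logistic_loss(w, b, x, y) + beta * logistic_loss(w, b, x_adv, y)

# Toy usage with random features standing in for a price-derived representation.
rng = np.random.default_rng(0)
w, b = rng.normal(size=8), 0.0
x, y = rng.normal(size=8), 1
x_adv = adversarial_example(w, b, x, y, eps=0.1)
print(logistic_loss(w, b, x, y), logistic_loss(w, b, x_adv, y))
```

Minimizing `adversarial_objective` instead of the clean loss alone encourages predictions that stay stable under small perturbations of the features, which is one way to account for stochastic inputs such as prices.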
For financial texts, we devise new pre-trained language models endowed with numeracy, i.e., the ability to encode and understand the numerical information within an input document. Moreover, for the quantification of textual data, we devise a new DNN that can infer the time horizon of financial texts relevant to a stock.
Plan for Future Research
First, for conventional time-series data, we will develop a self-supervised representation learning model that learns asset representations across different markets and facilitates asset price prediction in markets without sufficient data, such as the commodity market. Second, for graph data, we will explore modeling prediction uncertainty from the perspective of graph structure, as well as graph neural networks that eliminate noisy edges. We will also explore how to utilize financial knowledge graphs to enhance trading strategies. Third, for textual data, we will further investigate pre-training language models, in different languages, that can capture the specific properties of financial documents. We will also release benchmark datasets for representative financial text analysis applications of great practical value. Fourth, we are interested in studying fundamental issues of DNN-based asset price prediction, such as aggregating different models and updating models over time. We would like to tackle these issues by developing deep reinforcement learning and meta-learning methods that are suitable for financial markets.
- Fuli Feng, et al. “Enhancing Stock Movement Prediction with Adversarial Training.” IJCAI 2019.
- Fuli Feng, et al. “Temporal Relational Ranking for Stock Prediction.” ACM Transactions on Information Systems (TOIS) 37.2, 2019: 27.
- Fuli Feng, et al. “Graph Adversarial Training: Dynamically Regularizing Based on Graph Structure.” IEEE Transactions on Knowledge and Data Engineering (TKDE), 2019.
- Fuli Feng, et al. “Cross-GCN: Enhancing Graph Convolutional Network with k-Order Feature Interactions.” Submitted to TKDE, 2020.
- Fuli Feng, et al. “Numerical Text Understanding: Pre-training Language Model and Benchmark Evaluation.” Submitted to SIGKDD Applied Data Science Track, 2020.