Date

Nov. 21, 2019

Location

1F (class 62115), Department of Statistics, National Cheng Kung University

Introduction

Because AI and Data Science are closely related to Statistics, we have invited 7 speakers from Statistics, Applied Mathematics, and Computer Science to focus on topics such as “deep learning”, “complex network analysis”, and “fintech”. We hope to share research experiences from these different fields and identify possible future work in them.

Organizers

Main organizers for this workshop

Ray-Bing Chen

Department of Statistics,
National Cheng Kung University

Ying Chen

Department of Mathematics,
National University of Singapore

Invited list

in alphabetical order of surname

Ying Chen

National University of Singapore

Shih-Feng Huang

National University of Kaohsiung

Daniel Jacob

Humboldt-Universität zu Berlin

Thorsten Koch

Zuse Institute Berlin

Chia-Yen Lee

National Cheng Kung University

Chuan-Ju Wang

Academia Sinica

Weichung Wang

National Taiwan University

Programs

The schedule of this day

Time                 Event

10:00 - 10:20        Registration

10:20 - 10:30        Opening ceremony

Session I -- Chaired by Mei-Hui Guo

10:30 - 11:10

11:10 - 11:50

11:50 - 12:30

12:30 - 14:00

Session II -- Chaired by Kuo-Jung Lee

14:00 - 14:40

14:40 - 15:20

15:20 - 15:50

Session III -- Chaired by I-Chen Lee

Abstract

in speaking order

Shih-Feng Huang
National University of Kaohsiung

Modeling Financial Time Series with Soft Information

A hysteretic autoregressive model with GARCH effects and soft information, denoted by SHAR-GARCH, is proposed to model financial time series. The soft information contained in the daily news is extracted using support vector machines and principal component analysis. A Markov chain Monte Carlo algorithm is proposed for estimating the model parameters. A corresponding risk-neutral SHAR-GARCH model is derived via the Esscher transform for option pricing. The returns and options of the S&P 500 index and the daily news posted on the Reuters website are used in our empirical study. The numerical results indicate that the proposed model performs satisfactorily in depicting the dynamics of financial time series and in pricing deep-in-the-money options.

Ying Chen
National University of Singapore

Regularized partially functional autoregressive model

We propose a partially functional autoregressive model (pFAR) to describe the dynamic evolution of serially correlated functional data. This model provides a unified framework to depict both the serial dependence on multiple lagged functional covariates and the associated relation with ultrahigh-dimensional exogenous scalar covariates. Estimation is conducted under a two-layer sparsity assumption, where only a small number of groups and elements are supposed to be active, yet their number and location are unknown in advance. We establish the asymptotic properties of the estimator and perform simulation studies to investigate its finite-sample performance. We demonstrate the application of the pFAR model using daily natural gas flow curve data from the high-pressure pipelines of the German gas transmission network. Gas demand and supply are influenced by their historical values and by 85 scalar covariates ranging from price to temperature. The model provides insightful interpretation and good out-of-sample forecast accuracy compared to several popular alternative models. This is joint work with Thorsten Koch and Xiaofei Xu.

Chuan-Ju Wang
Academia Sinica

Textual Data Analytics in Finance

The growing amount of public financial data makes it increasingly important to learn how to discover valuable information for financial decision-making. This talk presents our recent studies on exploring and mining soft information in financial reports. It will cover several machine learning techniques, such as learning to rank and word embedding, applied to financial reports to study financial risk among companies and to discover new finance keywords. A brief demonstration of our web-based information system, Fin10K, will be given to show how it facilitates the analysis of textual information in finance.

Weichung Wang
National Taiwan University

Toward AI-based Medical Data Analytics and Clinical Workflows

In this talk, we lay out our plan to build a platform called Artificial Intelligence for Medical Image Analysis (AIMIA). The AIMIA platform consists of Artificial Intelligent Engines (AI Engines) and Augmented Intelligence Workflows (AI Workflows). The AI Engines include high-performance algorithms and software modules that aim to extract insightful information from large volumes of medical image data accurately, efficiently, and robustly. In particular, the AI Engines include Image Processing, Quantitative Analytics, Deep Learning, Machine Learning, and High Dimensional Data Analysis Toolboxes for analyzing medical images. Taking these algorithms and software modules as building blocks, we further build innovative AI Workflows for various clinical applications. Examples of AI Workflows include precision cancer treatments for lung, hypopharyngeal, and hepatocellular carcinoma, digital pathology whole slide image analysis for prostate cancers, pancreatic mass classification and detection, radiotherapy treatment planning in lung cancer, and psychiatric disorder phenotyping. These examples illustrate how we apply the AI Engines to configure AI Workflows in clinical medical care and biomedical research. AIMIA is also a platform that allows interdisciplinary experts from academia and industry in medical, mathematical, statistical, computational, and information sciences to work together so that research and development efforts can benefit society broadly.

Thorsten Koch
Zuse Institute Berlin

AI and Data Science at the Department for Mathematical Optimization of ZIB

A few years ago, the world started to collect big data and move into the clouds. After it became clear that there was too much data for humans to draw any interesting conclusions, progress in hardware allowed successes in machine learning, notably regarding neural networks. Now, whenever a particular method starts to show impressive results, people immediately start to project these successes onto the whole area to which the method belongs. Moreover, as ML is a subset of AI, some people have started to conclude that every problem can now be solved with AI. However, while neural networks give impressive results on classification, it is not apparent whether they are necessarily the method of choice for making decisions. The biggest successes of AI in decision making, such as AlphaGo Zero, are based on tree search employing ML to direct the search. There is vast potential in combining machine learning with mathematically advanced tree search as used in planning and optimization. In this talk, we will try to give a bit of an overview, some insights, and some challenges, as experienced in the projects of our department.

Daniel Jacob
Humboldt-Universität zu Berlin

Heterogeneous Treatment Effects through Tenure on Job Satisfaction: A Machine Learning Evaluation

In this paper, we estimate the heterogeneous treatment effects of having tenure on the perceived degree of five satisfaction variables. The target group is PhD holders who work in academia, and we use data from the National Science Foundation (NSF). To provide as much information about the causal effects as possible and also to control for potential selection bias, we extend the idea of making inference on key features of heterogeneous effects sorted by impact groups (GATES) to non-randomised experiments. This is achieved through the implementation of a doubly-robust estimator. Cross-splitting with multiple iterations is a further extension to avoid biases introduced through sample splitting. The advantage of the proposed method is a robust estimation of heterogeneous treatment effects, under mild assumptions, which is comparable with other models and thus keeps its flexibility in the choice of machine learning methods and at the same time its ability to deliver interpretable results. The empirical findings support the hypothesis that tenure has a causal effect on all satisfaction variables related to the job. The difference in satisfaction with opportunities for advancement (scaled from 1 for very dissatisfied to 4 for very satisfied) is about 1.3 between the most and least affected scientists, and between 0.4 and 0.6 for all other variables. We also find that there are significant differences in the results based on the chosen model. We conduct a classification analysis to provide insight into the average values of key features, again for the most and least affected. Finally, we estimate the conditional average treatment effect for each individual, which allows predictions of the treatment effect for new scientists. We find that there is heterogeneity in the treatment effect, which is positive for the majority and negative for some PhD holders.

Chia-Yen Lee
National Cheng Kung University

Beyond the Prediction of Data Science: A Practical Aspect

The practical application of machine learning and data science (ML/DS) techniques presents a range of procedural issues to be examined and resolved, including those relating to data, methodologies, and assumptions. In fact, prediction is not the purpose of data science; decision making is. This talk aims to fill the gap between prediction and decision. The decision-making process involves risk identification and quantification, tradeoffs among alternatives, resource planning, applicable conditions, problem-specific requests, and interpretation. These guide the practical application of ML/DS methodologies from predictive analytics to prescriptive analytics.