Data Science for Smart Manufacturing and Healthcare Workshop

DS2-MH Workshop at SDM23, April 27, 2023, in Minneapolis

Workshop Description

In the era of the Internet of Things (IoT), with the rapid development of advanced sensing, data storage, and high-performance computing technologies, both manufacturing industries and healthcare systems are experiencing a data-driven revolution. However, the unique characteristics of manufacturing and healthcare systems prevent the direct application of existing data-driven methods. These characteristics include (1) systematic physical principles; (2) a high demand for interpretability, robustness, and trustworthiness; and (3) limited computational resources and the need for instant decision-making. They create a pressing need for domain-aware machine learning for critical tasks in manufacturing and healthcare systems, such as smart diagnosis, automatic control, design optimization, and customized analytics.

This workshop aims to showcase recent research progress in data science that addresses the unique challenges in manufacturing and healthcare systems, such as gaps in data quality assurance, domain-aware data analytics, and the improvement of trustworthiness. We cordially invite submissions on recent advances in data science research and development that are motivated by real-world problems in manufacturing and healthcare. Papers and/or posters that focus on either theoretical foundations or applications are welcome from areas including but not limited to:

Topics of Interest



Keynote Speakers

Dr. Xiaoyu Chen, Assistant Professor, University of Louisville

Abstract: Machine learning (ML) has been playing an increasingly important role in cyber-manufacturing systems (CMSs) to enable intelligent decision support. In recent years, the increasing popularity of customized manufacturing has started to challenge existing ML methods with frequently changing contexts (e.g., customized specifications). Specifically, most existing studies in CMSs investigate ML techniques for a few specific applications, which typically requires months or years of investigation and may not be sustainable when contexts change frequently. There is now a strong need to rapidly adapt ML methods to changing contexts in order to provide responsive and accurate decision support. Such adaptation leads directly to two fundamental questions: (1) which ML methods should be selected to achieve satisfactory accuracy, and (2) how can ML methods be selected efficiently, without scalability issues, for large-scale problems? These questions motivated my research on ML for adaptive computation services. The major contributions of the adaptive computation service are (1) a recommender system that effectively ranks different ML methods in a pipeline format for small-scale problems; and (2) a tree-based recommender system that addresses the scalability issues in large-scale problems. Both methodologies were validated in manufacturing applications to demonstrate their generalizability.

Dr. Jan Gertheiss, Professor, Helmut Schmidt University

Abstract: The use of second-order difference penalties for ordinal data will be discussed, with the main focus being on non-linear principal component analysis (PCA). In general, non-linear PCA for categorical data, also called optimal scoring/scaling, constructs new variables by assigning numerical values to categories such that the proportion of variance in those new variables that is explained by a predefined number of principal components is maximized. In the talk, a penalized version of non-linear PCA for ordinal variables is presented that is an intermediate between standard, linear PCA on category labels, and non-linear PCA as used so far. In addition, it will be discussed how second-order difference penalties can be employed for statistical inference with ordinal predictors in generalized additive models.

Dr. Yifeng Gao, Assistant Professor, University of Texas Rio Grande Valley

Abstract: Thanks to recent advances in sensor techniques, time series are among the most commonly encountered types of data in both manufacturing and healthcare and are ubiquitous across applications. Many advanced interpretable time series data mining tools have been introduced in recent years. In this talk, I will introduce recent developments in interpretable time series data mining tools, with examples from manufacturing and healthcare. I will first give a brief overview of the most distinctive characteristics of such approaches. Then, I will cover three topics: 1) motif discovery, 2) time series chain discovery, and 3) privacy-aware matrix profile. The talk will emphasize how to apply such techniques in manufacturing and healthcare.

Dr. Ming Huang, Assistant Professor, Mayo Clinic

Abstract: Natural language processing (NLP) is playing a crucial role in health informatics for improving health care. ChatGPT, a recent large language model developed by OpenAI, is one of the cutting-edge technologies that could revolutionize healthcare NLP. With its ability to understand human language and generate human-like responses, ChatGPT is transforming how healthcare professionals interact with patients and how medical information is processed. In this talk, we will give an overview of the development and application of NLP in healthcare, and explore the potential of ChatGPT to change health NLP and lead to more efficient healthcare systems and better patient outcomes. We will also discuss the challenges and limitations that need to be addressed to fully realize the potential of this innovative technology in healthcare.

Dr. Honghan Ye, Data Scientist, 3M

Abstract: The rapid advancement of sensor technology has created a data-rich environment and shown great promise for increasing detection capabilities in complex systems. Such massive data, involving heterogeneous high-dimensional data streams with high sampling frequency, incur high costs for data collection, transmission, and analysis in practice. Thus, resource constraints often restrict data observability to only a subset of the data streams at each data acquisition time, posing significant challenges in many online monitoring applications. In this talk, the speaker will discuss some recent developments in intelligent sampling strategies for effective online process monitoring and quality improvement.

Workshop Agenda (April 27th, 2023, Central Daylight Time)

Full contact information of the organizers:

Chenang Liu, Assistant Professor, Oklahoma State University

Yao Ma, Assistant Professor, New Jersey Institute of Technology

Diane Oyen, Scientist, Los Alamos National Laboratory

Yinan Wang, Assistant Professor, Rensselaer Polytechnic Institute

Web Chair of this workshop:

Yue Zhao, Ph.D. Student, Rensselaer Polytechnic Institute


This workshop will be held in conjunction with the SIAM International Conference on Data Mining (SDM23), April 27-29, 2023, in Minneapolis, MN, USA. The detailed schedule of this workshop will be released soon. More information about the conference and workshop can be found here.

Fig. 1 Group Photos. Upper: Morning session; Lower: Afternoon session.

Fig. 2 Snapshots of speakers in morning sessions.

Fig. 3 Snapshots of speakers in afternoon sessions.