Data Science for Smart Manufacturing and Healthcare Workshop

DS2-MH Workshop at SDM24 on April 18, 2024, in Houston

Workshop Description

In the era of the Internet of Things (IoT), with the rapid development of advanced sensing, data storage, data analytics, and high-performance computing technologies, both manufacturing industries and healthcare systems are experiencing a data-driven revolution. However, the unique characteristics of manufacturing and healthcare systems prevent the direct application of existing data-driven methods. These characteristics include (1) systematic physical principles, (2) high demand for interpretability, robustness, and trustworthiness, and (3) limited computational resources and the need for instant decision-making. They create a pressing need to develop domain-aware data-driven approaches for critical tasks in manufacturing and healthcare systems, such as smart diagnosis, automatic control, design optimization, and customized analytics.

This workshop aims to showcase recent research progress in data science that addresses the unique challenges in manufacturing and healthcare systems, such as gaps in data quality/security assurance, domain-aware data analytics, and improvement of trustworthiness. We cordially invite submissions on recent advances in data science research and development that are motivated by real-world problems in manufacturing and healthcare. Papers and/or posters that focus on both theoretical foundations and applications are welcome from areas including, but not limited to:

Topics of Interest



Keynote Speakers

Abstract: Automating the detection of defects in CT scans of manufactured parts is perhaps best thought of as an example of anomaly detection. The number of potential defects is effectively infinite; almost anything that can go wrong in the fabrication process probably will go wrong at some point, and we require an algorithm capable of detecting that “something isn’t right” regardless of what that something is. An additional complication is that at very high resolution, CT scan volumes become enormous, often hundreds of gigabytes, making this a supercomputing problem. We have been approaching the problem of automatic anomaly detection at supercomputing scale using a technique derived from sparse coding. We start by training a set of convolutional 3-dimensional features that are optimized for the sparse reconstruction of “defect-free” manufactured parts. One may think of these 3-dimensional features as puzzle pieces that are optimized for reassembling representative CT scan volumes using as few puzzle pieces as possible. Because the algorithm struggles to reconstruct manufacturing defects using 3-dimensional features optimized for the sparse reconstruction of “defect-free” parts, defects show up as large residual errors. We are using PetaVision, a high-performance neural network simulator, to train sparse coding models on large CT scan volumes. PetaVision employs halos to split large CT scan volumes into smaller components, with each component distributed to a separate node and MPI used to pass halos to nearest neighbors, thereby coordinating the operations performed on individual nodes. PetaVision utilizes a hybrid approach to parallelism, employing MPI, OpenMP and GPU acceleration where available. Our preliminary studies indicate that Trinity KNL nodes are approximately a factor of 10 slower than Power 9 GPU nodes and that Nvidia A1000 GPU nodes are a factor of 10 faster still.

Biography: Dr. Garrett T. Kenyon received his Bachelor’s degree in Physics from the University of California, Santa Cruz and his Ph.D. in Physics from the University of Washington, Seattle and is currently a research scientist at the Los Alamos National Laboratory. He has worked for nearly forty years on problems involving various aspects of neural computation and neural computing. Dr. Kenyon has contributed to over 75 peer-refereed publications and has served as a reviewer for the NSF, NIH, DOE and other government agencies. Dr. Kenyon has led a number of research projects funded by the NSF, DARPA and DoD as well as projects funded by the DOE’s LDRD program.

Abstract: Deep reinforcement learning (RL) has recently achieved success in prediction and control tasks such as gaming, robotics, and language models, igniting curiosity about its applicability in real-world scenarios. This presentation focuses on deep RL for real-world decision-making, commencing with an exploration of potential applications and the challenges posed by uncertainty inherent in data and models. We will then discuss the implications of these uncertainties on the feasibility of prediction and control in real-world contexts. We will also share preliminary efforts aimed at addressing these challenges in applications such as sensor placement and policy evaluation.

Biography: Hua Wei is an assistant professor at the School of Computing and Augmented Intelligence (SCAI) at Arizona State University (ASU). He received his Ph.D. from Pennsylvania State University in 2020. His recent work includes developing uncertainty quantification and large language models for spatio-temporal prediction and decision making. He specializes in data mining, artificial intelligence, and machine learning. He received the Best Paper Award at ECML-PKDD 2020, and his first-author work and that of his students have been published in top conferences and journals in the fields of machine learning, data mining, and control (KDD, NeurIPS, AAAI, ICLR, IJCAI, CDC, ECML-PKDD, WWW, CIKM). His research has been funded by the National Science Foundation and the Department of Energy.

Abstract: Graph-structured data is ubiquitous in real-world applications (e.g., social networks, infrastructure, and biomedicine), and Graph Machine Learning (GML) has become a prominent method for handling graph-based data. Despite GML’s remarkable achievements, its reliance on node features and graph topology makes it susceptible to data quality challenges. Therefore, my research has focused on data-quality-aware graph machine learning, systematically examining issues related to topology, imbalance, and bias in graph data and proposing data/model-centric solutions to handle them. Furthermore, I have developed dedicated GML for many real-world application domains, such as computer-aided drug discovery, information retrieval on e-commerce platforms, and enhancing the reading experience for educational purposes. In this presentation, I will provide an overview of my research contributions while more thoroughly introducing my work on improved multi-document question-answering via knowledge graph with an LLM agent. To conclude, I will highlight multiple future directions.

Biography: Yu Wang is a final-year Ph.D. candidate in Computer Science at Vanderbilt University and will join the University of Oregon as a tenure-track Assistant Professor this coming fall. His main research focuses on Data-centric Graph Machine Learning and Data-quality Aware Graph Neural Networks with applications in Information Retrieval, Infrastructure Networks and Chemistry. He has published 14+ papers in top conferences (e.g., ICLR, AAAI, WWW, KDD) and regularly serves as a PC member/reviewer for international conferences and journals in machine learning and data mining, such as KDD, ICML, AAAI, WWW, and WSDM. In addition, he has received numerous awards, including two Best Paper Awards at the Frontiers of Graph Learning Workshop at NeurIPS 2023 and the 2020 Smokey Mountain Data Challenge Competition by ORNL, and Vanderbilt’s C. F. Chen Best Paper Award in 2022 as first author (and Runner-Up Award in 2023 as co-author). His work was selected among the top-10 Most Influential Papers at WWW 2023 and CIKM 2022 by Paper Digest, and he was the sole recipient of Vanderbilt’s Graduate Leadership Anchor Award for Research in 2023.

Tentative Workshop Agenda

Morning Session
Lunch Break (12:00 – 13:20)
Afternoon Session

*The schedule is subject to change according to the SDM conference schedule.

Important Dates (Central Time)

Biographies of the organizers:

Previous workshop

Web Chair of this workshop

Liangliang Zhang, PhD student, Rensselaer Polytechnic Institute.


This workshop will be held in conjunction with the SIAM International Conference on Data Mining (SDM24) on April 18-20, 2024, in Houston, TX, USA. The detailed schedule of this workshop will be released soon. More information about the conference and workshop can be found here.

For questions regarding this workshop, please contact us at:

Fig. 1 Snapshots of speakers in the morning sessions.

Fig. 2 Snapshots of speakers in the afternoon sessions.

Fig. 3 Data Science for Smart Manufacturing and Healthcare Workshop Group Picture