The Internet of Things (IoT) describes the interconnection of everyday objects and devices through the internet, allowing them to collect and exchange data in order to make intelligent decisions and automate tasks. A central aspect of IoT is the transformation of data into actionable insights, which has far-reaching implications for business and manufacturing. This transformation is driven mainly by the ability to generate and share vast volumes of data in real time, known as big data streams. In a new study published in the journal Technometrics, Ph.D. candidate Xin Zan and Professor Xiaochen Xian of the Department of Industrial and Systems Engineering at the University of Florida, in collaboration with Dr. Di Wang of Shanghai Jiao Tong University, addressed the challenges that big data streams pose for statistical process control (SPC) in online monitoring.
Big data streams present unique challenges for SPC in online monitoring, stemming from the high velocity, high dimensionality, and complexity of the data. In practical IoT applications, several resource constraints mean that only part of the data streams can be observed. First, the number of sensors is limited by power consumption concerns. Second, communication resources are restricted, making it difficult to transmit large volumes of data in real time. Third, storage and processing capacity may be limited, hindering real-time analysis even when full observations are available offline. These constraints call for integrating sampling into the online monitoring process, so that the monitor adaptively selects which data streams to observe at each data acquisition time.
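The adaptive selection step can be pictured with a generic top-q sampler, sketched here under explicit assumptions: each of p streams carries a scalar score, only q streams can be observed per acquisition time, and every stream that goes unobserved has its score raised by a fixed compensation coefficient `delta` so that it is eventually revisited. This follows a common idea in the adaptive-sampling SPC literature rather than the SRAS procedure itself, whose compensation coefficients are dynamic; the function names and parameters are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple


def sampling_round(
    scores: Sequence[float],
    observed: Set[int],
    new_local: Dict[int, float],
    q: int,
    delta: float,
) -> Tuple[List[float], Set[int]]:
    """One data-acquisition step of a generic (hypothetical) top-q sampler.

    Observed streams take their freshly computed local statistic; every
    unobserved stream keeps its previous score plus the fixed compensation
    coefficient `delta`, so a stream skipped for a while is sampled again.
    Returns the updated scores and the q streams to observe next time.
    """
    updated = [new_local[k] if k in observed else s + delta
               for k, s in enumerate(scores)]
    # Rank streams by score (descending); ties go to the lower index.
    order = sorted(range(len(updated)), key=lambda k: (-updated[k], k))
    return updated, set(order[:q])


def simulate_in_control(p: int, q: int, delta: float, steps: int) -> List[Set[int]]:
    """Run the sampler with every local statistic equal to 0 (no shift)."""
    scores = [0.0] * p
    observed = set(range(q))  # hypothetical initial choice of streams
    history = [observed]
    for _ in range(steps):
        new_local = {k: 0.0 for k in observed}
        scores, observed = sampling_round(scores, observed, new_local, q, delta)
        history.append(observed)
    return history


if __name__ == "__main__":
    print(simulate_in_control(p=4, q=1, delta=1.0, steps=4))
    # → [{0}, {1}, {2}, {3}, {0}]
```

With all local statistics at zero, the compensation alone makes the sampler cycle through the four streams in turn. A stream whose fresh local statistic is large would instead keep outranking the compensated scores of the others and stay under observation until their compensation catches up.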
Another significant challenge arises from the complex interrelationships and statistical distributions of big data streams. These streams often exhibit arbitrary, correlated patterns, which make traditional SPC methods that rely on simplified or parametric models unsuitable. Some nonparametric SPC methods have been proposed, but they still make specific distributional assumptions, which limits their applicability to general big data streams with partial observations.
To address these challenges, the authors developed a novel nonparametric strategy for online monitoring and adaptive sampling. It is designed to intelligently sample the most informative data streams so that mean shifts can be detected effectively, even when the streams are heterogeneous, unexchangeable, and correlated.
The core of the methodology is the Spatial Rank-based Adaptive Sampling (SRAS) algorithm. The algorithm uses historical in-control (IC) data to inform monitoring and sampling decisions that are based solely on the data streams actually observed. By exploiting correlations among data streams and employing dynamic compensation coefficients, SRAS promotes equitable and rational sampling across streams, even under resource constraints.
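To make the "spatial rank" ingredient concrete, the following is a minimal sketch of the standard multivariate spatial rank, R(x) = (1/n) Σ_i S(x − X_i) with spatial sign S(v) = v/‖v‖, computed against a historical IC reference sample and restricted to the streams observed at the current time. It is a textbook building block, not the authors' SRAS statistic; the function names, the toy reference sample, and the choice to simply drop unobserved coordinates are assumptions for illustration.

```python
import math
from typing import List, Sequence


def spatial_sign(v: Sequence[float]) -> List[float]:
    """Return v / ||v||, or the zero vector when v is zero."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        return [0.0] * len(v)
    return [c / norm for c in v]


def spatial_rank(
    x: Sequence[float],
    reference: Sequence[Sequence[float]],
    observed: Sequence[int],
) -> List[float]:
    """Spatial rank of x within an IC reference sample, restricted to the
    observed coordinates: R(x) = (1/n) * sum_i S(x_obs - X_i,obs)."""
    total = [0.0] * len(observed)
    for row in reference:
        diff = [x[j] - row[j] for j in observed]
        for d, s in enumerate(spatial_sign(diff)):
            total[d] += s
    return [t / len(reference) for t in total]


def rank_statistic(rank: Sequence[float]) -> float:
    """Squared Euclidean norm of the rank vector (always between 0 and 1)."""
    return sum(r * r for r in rank)


if __name__ == "__main__":
    # Toy IC reference sample for p = 3 streams (hypothetical values).
    reference = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
    observed = [0, 1]  # stream 2 is not sampled at this time
    centred = spatial_rank([0.0, 0.0, 5.0], reference, observed)
    shifted = spatial_rank([10.0, 0.0, 0.0], reference, observed)
    print(centred, rank_statistic(centred))
    print(shifted, rank_statistic(shifted))
```

The first observation has a large value only on the unsampled stream 2, so its rank on the observed streams is exactly zero, while the second, shifted along stream 0, has a rank statistic close to 1. Because the rank depends only on directions relative to the IC sample, it makes no parametric assumption about the stream distributions.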
According to the authors, SRAS has several properties that give it an advantage over existing methods. It accommodates a wide range of distributions, including heterogeneous, unexchangeable, and correlated data streams. It preserves the descriptive power of big data streams under partial observation without resorting to simplifying assumptions. Moreover, it guarantees equitable and rational sampling among data streams, so that valuable information is captured efficiently.
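Using correlations to retain information about streams that are not sampled can be illustrated, in its simplest form, by predicting an unobserved stream from an observed one with a least-squares line fitted to paired IC data. This is a generic bivariate toy under a linear-dependence assumption, not the augmentation step of SRAS, which the article does not spell out; the function and variable names are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def linear_fill_in(ic_observed: Sequence[float],
                   ic_unobserved: Sequence[float],
                   x_observed: float) -> float:
    """Predict an unobserved stream from an observed one using the
    least-squares line fitted to paired IC data:
    mu_u + (S_ou / S_oo) * (x_o - mu_o)."""
    mu_o, mu_u = fmean(ic_observed), fmean(ic_unobserved)
    # Cross and own sums of squared deviations from the IC means.
    s_ou = sum((a - mu_o) * (b - mu_u) for a, b in zip(ic_observed, ic_unobserved))
    s_oo = sum((a - mu_o) ** 2 for a in ic_observed)
    return mu_u + (s_ou / s_oo) * (x_observed - mu_o)


if __name__ == "__main__":
    ic_o = [1.0, 2.0, 3.0, 4.0]  # hypothetical IC history of the observed stream
    ic_u = [2.0, 4.0, 6.0, 8.0]  # hypothetical IC history of the unobserved stream
    print(linear_fill_in(ic_o, ic_u, 10.0))  # → 20.0
```

When the observed value sits at its IC mean, the prediction falls back to the IC mean of the unobserved stream; a departure on the observed stream is passed on in proportion to the IC covariance.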
To validate SRAS, the authors conducted extensive simulation studies under various scenarios, comparing its performance with that of benchmark methods. They also presented two case studies, one in semiconductor manufacturing and the other in COVID-19 pandemic surveillance, to demonstrate the robustness and efficacy of the algorithm in real-world applications.
Several directions remain open for future work. The augmentation framework could be improved by incorporating online screening of the observed data streams, which would improve convergence rates and address the computational challenges of extremely high-dimensional settings. Developing a diagnostic procedure to identify which data streams have undergone a mean shift is another intriguing prospect. A self-starting procedure could also be explored, as could extending SRAS to monitor shifts in both the process mean and the covariance matrix.
In conclusion, the SRAS algorithm developed by Professor Xiaochen Xian and colleagues, a nonparametric online monitoring and adaptive sampling strategy, offers a promising solution for efficiently detecting mean shifts and handling big data streams with partial observations. As data continue to grow exponentially in our interconnected world, approaches like SRAS are likely to play a vital role in ensuring the reliability and quality of critical systems.
Xin Zan, Di Wang, and Xiaochen Xian. "Spatial Rank-Based Augmentation for Nonparametric Online Monitoring and Adaptive Sampling of Big Data Streams." Technometrics, 2023, Vol. 65, No. 2, pp. 243–256.