Abstract
A sensor network is a distributed system consisting of many embedded devices, or sensors, that can be installed across a large-scale physical environment. These intelligent sensors collect enormous amounts of data, which require non-conventional techniques to store, process, and analyze in real time, particularly for detecting anomalies. Applying analytical methods to massive data is a new challenge, but the concept of big data may provide the opportunity to meet it. The emergence of big data and complex systems in sensor networks has revolutionized fault detection, since many different types of anomalies can occur in sensor networks for various reasons. However, despite the considerable investment in intelligent devices and sensors, many network systems lack an efficient, customized real-time data analytics framework for anomaly detection. This thesis focuses on developing real-time algorithms for this purpose. It proposes a new data fusion and network analytics framework based on the topology of large-scale networks and the stochastic dependencies among nodes, edges, and sensor data. This framework transforms real-time sensor data collected from disparate sources in a network in order to detect the location of anomalies and the nodes they affect. By intelligently fusing multidimensional sensor data based on the topology of a large-scale network, this work also contributes to big data analytics for network systems. The proposed framework not only brings computational benefits but also yields better anomaly estimates, leading to lower false alarm rates and higher detection rates.
This thesis demonstrates the validity and practicability of the proposed framework in three phases. The first phase of the validation process consists of a set of experiments for detecting a single anomaly and its location from sensors installed at the lowest level of a network. The experiments demonstrate how the proposed model works on two real-time datasets containing balanced and unbalanced anomaly data. In addition, the framework's sensitivity to several important inputs and setup parameters (i.e., the number of binary sensor outputs, missing points, partial sensor information, and the size of the network) is demonstrated. To further illustrate the benefits of the proposed framework, the experimental results are compared with those of the other models proposed in this thesis, as well as with the most commonly used machine learning techniques. The results show that the proposed model outperforms the other models and machine learning techniques (i.e., logistic regression, self-organizing maps (SOM), extreme gradient boosting (EGB), bagged trees, linear discriminant analysis (LDA), adaptive boosting (AdaBoost), and quadratic discriminant analysis (QDA)).
The second and third phases show that the proposed framework can be extended to detect multiple simultaneous anomalies and their locations in a network using subsystem-level and system-level sensor data, respectively.