Date of Award


Document Type


Degree Name

Master of Science (MS)


Electrical Engineering

First Advisor

Prakash P. Ranganathan


Rapid digitization of modern vehicles through electronic control units (ECUs) has enabled the modern automobile to perform autonomous operations. ECUs handle multiple functions within the vehicle, including vehicular control, the infotainment system, and electronic control of mirrors, wipers, and seats. The associated data are relayed through various communication buses, which allow wired communication between ECUs inside the vehicle (in-vehicle communication) or wireless communication between a vehicle's ECUs and other vehicles and/or roadside infrastructure. Although network traffic on these buses can be intercepted, identifying the ECU responsible for a particular function remains an open problem because that mapping is proprietary to auto manufacturers.

Present-day automobiles are equipped with a myriad of functionalities that allow for automation and sensing through on-board sensors and their respective sub-systems. Accurate and timely relay of sensor data under ideal ambient conditions, such as adequate light, absence of fog or haze, and absence of objects that interfere with sensor inputs, is critical to the safety of passengers and passersby.

While on-board sensors are known for having fairly long lifespans, especially on higher-end vehicles, the data they relay over communication buses can be intercepted and analyzed to identify usage patterns and make recommendations based on the observations. To obtain this data from an in-vehicle communication bus, this research uses open-source Linux tools and cost-effective commercial off-the-shelf (COTS) hardware to read streaming data from the Controller Area Network (CAN). Analyzing multiple vehicle types from various makes yields more robust training data and makes it possible to identify patterns shared between automobile manufacturers.
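
As an illustration of what reading CAN traffic with open-source Linux tools can look like, the sketch below parses one line of a CAN log and scales a raw field to a physical value. It assumes the log format written by `candump -l` from can-utils; the abstract does not name the tools actually used. The CAN ID `123`, the byte offset, and the scale factor `0.25` are hypothetical and do not come from any OEM manual.

```python
import re
from typing import NamedTuple


class CanFrame(NamedTuple):
    timestamp: float
    interface: str
    can_id: int
    data: bytes


# Assumed log format (candump -l): "(1609459200.123456) can0 123#0FA0000000000000"
# Remote frames ("123#R") and CAN FD frames ("123##...") are not handled here.
_LOG_LINE = re.compile(
    r"^\((\d+\.\d+)\)\s+(\S+)\s+([0-9A-Fa-f]+)#([0-9A-Fa-f]*)$"
)


def parse_candump_line(line: str) -> CanFrame:
    """Split one log line into timestamp, interface, arbitration ID, and payload."""
    match = _LOG_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a candump log line: {line!r}")
    ts, iface, can_id, payload = match.groups()
    return CanFrame(float(ts), iface, int(can_id, 16), bytes.fromhex(payload))


def decode_unsigned(frame: CanFrame, start_byte: int, length: int, scale: float) -> float:
    """Read a big-endian unsigned field and apply a linear scale factor.

    The byte position and scale are hypothetical; real values are make-specific.
    """
    raw = int.from_bytes(frame.data[start_byte:start_byte + length], "big")
    return raw * scale


# Hypothetical frame: ID 0x123, first two bytes 0x0FA0 = 4000 raw counts.
frame = parse_candump_line("(1609459200.123456) can0 123#0FA0000000000000")
rpm = decode_unsigned(frame, start_byte=0, length=2, scale=0.25)
print(hex(frame.can_id), rpm)
```

In practice the byte layout and scaling differ per manufacturer, which is exactly why mapping signals to ECUs requires the kind of analysis described in this work.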

The collected CAN data therefore come from three Nissan sedans, one Honda sport utility vehicle (SUV), and one Toyota sedan. ECUs responsible for functions related to the powertrain system of the vehicle, namely speed, tachometer, and steering, were identified and mapped to actual values in miles per hour (mph), revolutions per minute (RPM), and newton-metres (Nm), respectively, based on available original equipment manufacturer (OEM) technical manuals. Using waveform patterns for the speed ECU signature from Nissan, threshold equations were designed for Honda and Nissan makes using two unsupervised learning methods, the k-means++ and mean shift algorithms. The two clustering methods agree with each other on descriptive statistical parameters, and the lowest errors were obtained for the speed and steering ECU signatures on the 10-minute and 5-minute driving datasets, respectively.
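
To make the clustering step concrete, here is a minimal sketch of k-means++ seeding followed by Lloyd iterations on one-dimensional signal values. It is not the thesis's implementation: the sample values are invented, and taking the midpoint between two cluster centres as a threshold is an assumption made purely for illustration.

```python
import random


def kmeans_pp_init(values, k, rng):
    """k-means++ seeding: later centres are drawn with probability
    proportional to the squared distance to the nearest existing centre."""
    centers = [rng.choice(values)]
    while len(centers) < k:
        d2 = [min((v - c) ** 2 for c in centers) for v in values]
        if sum(d2) == 0:
            # Every point coincides with an existing centre; fall back to uniform.
            centers.append(rng.choice(values))
        else:
            centers.append(rng.choices(values, weights=d2, k=1)[0])
    return centers


def kmeans_1d(values, k, iters=100, seed=0):
    """Lloyd's algorithm on scalar values; returns the sorted cluster centres."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(values, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: (v - centers[i]) ** 2)
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centre.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


# Invented decoded values: a low-speed group and a high-speed group.
samples = [0.0, 1.0, 2.0, 60.0, 61.0, 62.0]
low, high = kmeans_1d(samples, k=2)
threshold = (low + high) / 2  # hypothetical threshold between the two groups
print(low, high, threshold)
```

Mean shift would replace the fixed `k` with a bandwidth parameter and let the number of clusters emerge from the data, which is one reason comparing the two methods is informative.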

Manually identifying ECUs and their signatures is feasible for a small number of test vehicles but does not scale. To automate this process, three supervised machine learning algorithms were selected and compared on their performance on this classification problem. The comparison used three evaluation metrics, namely accuracy, F1-score, and computation time, for two cases. The first two metrics are essential for assessing the performance of classification algorithms, while the third plays a significant role if such algorithms are to be deployed in a real-time streaming environment. The results show that the distance-based k-nearest neighbor (kNN) algorithm gives the highest performance, followed by the Decision Tree algorithm and lastly Gaussian Naive Bayes. Cross-validation was used to check whether the two higher-performing algorithms, kNN and Decision Tree, are prone to underfitting or overfitting; with 5-fold cross-validation, both models generalize reasonably well, producing accuracy and F1-scores higher than 70% and 0.65, respectively. These findings indicate that non-linear models such as kNN and Decision Tree show potential for identifying ECU signatures at scale.
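
The evaluation procedure described above can be sketched as follows: a small kNN classifier scored with 5-fold cross-validation on accuracy and macro-averaged F1. This is an illustrative sketch, not the thesis's code. The two-feature toy data and the labels `"speed"` and `"rpm"` are invented, only kNN is shown, and the use of macro averaging for the F1-score is an assumption.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cross_validate_knn(X, y, folds=5, k=3):
    """Return one (accuracy, macro F1) pair per fold, using interleaved folds."""
    results = []
    for f in range(folds):
        test_idx = set(range(f, len(X), folds))
        train_X = [X[i] for i in range(len(X)) if i not in test_idx]
        train_y = [y[i] for i in range(len(X)) if i not in test_idx]
        test_X = [X[i] for i in sorted(test_idx)]
        test_y = [y[i] for i in sorted(test_idx)]
        preds = [knn_predict(train_X, train_y, x, k) for x in test_X]
        accuracy = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
        results.append((accuracy, macro_f1(test_y, preds)))
    return results


# Invented, well-separated toy features for two hypothetical signal classes.
X = [(i * 0.1, i * 0.2) for i in range(10)] + \
    [(10 + i * 0.1, 10 + i * 0.2) for i in range(10)]
y = ["speed"] * 10 + ["rpm"] * 10

for fold, (acc, f1) in enumerate(cross_validate_knn(X, y), start=1):
    print(f"fold {fold}: accuracy={acc:.2f} macro-F1={f1:.2f}")
```

A large gap between training and held-out scores across folds would point to overfitting, while uniformly low scores would point to underfitting. Timing each fold, for example with `time.perf_counter`, would add the computation-time metric mentioned above.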