Date of Award


Document Type


Degree Name

Master of Science (MS)


Energy Systems Engineering

First Advisor

Michael Mann


Due to their exceptional properties, rare earth elements (REEs) are critical to technological innovation in renewable energy production, electronics, health care, and national defense, where they form key components of many applications. Many countries rely on REE imports, and high demand has led to the development of alternative methods for exploration and capture. Coal has been identified as a potentially viable source of rare earth elements and yttrium (REY). Statistical evaluation of REY concentrations and of the properties of various coal samples is critical for successful characterization.

The USGS COALQUAL database Version 3.0 is an industry-standard database for coal research that contains 7658 non-weathered, full-bed coal samples from the United States. Of these, 5485 samples contain a full spectrum of REY concentrations. The quality of the COALQUAL data is analyzed to ensure that the data are reliable, and sample characteristics are analyzed using conventional statistical methodology. This methodology includes accounting for samples with REY concentrations below the lower limits of detection. Mean concentrations for each REY are adjusted to fit a distribution of mean REY concentrations from the National Coal Resources Data System (NCRDS), normalized by the Upper Continental Crust standard dataset of REY mean concentrations. All samples are classified as unpromising or promising using two criteria: the total rare earth oxide concentration and the outlook coefficient, which is the ratio of critical REYs to excess REYs.
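
The two-criterion classification described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation. The element groupings follow the commonly cited Seredin-Dai convention, and the default cutoffs (1000 ppm total rare earth oxides, outlook coefficient of 0.7) are assumptions taken from that convention; the abstract does not state which groupings or thresholds were actually used. The names `outlook_coefficient` and `classify` are hypothetical.

```python
# Element groups in the Seredin-Dai convention (an assumption; the abstract
# does not list which elements count as critical or excess).
CRITICAL = ("Nd", "Eu", "Tb", "Dy", "Er", "Y")
EXCESSIVE = ("Ce", "Ho", "Tm", "Yb", "Lu")


def outlook_coefficient(ppm):
    """Ratio of summed critical REY to summed excess REY concentrations.

    `ppm` maps element symbols to concentrations in a common unit.
    """
    critical = sum(ppm[e] for e in CRITICAL)
    excessive = sum(ppm[e] for e in EXCESSIVE)
    if excessive == 0:
        raise ValueError("excess REY sum is zero; coefficient is undefined")
    return critical / excessive


def classify(total_reo_ppm, c_outl, reo_cutoff=1000.0, c_outl_cutoff=0.7):
    """Label a sample as promising only if it passes both criteria.

    Both cutoffs are illustrative defaults, not values from the thesis.
    """
    if total_reo_ppm >= reo_cutoff and c_outl >= c_outl_cutoff:
        return "promising"
    return "unpromising"


# Toy sample with invented concentrations, for illustration only.
sample = {e: 1.0 for e in CRITICAL + EXCESSIVE}
label = classify(total_reo_ppm=1200.0, c_outl=outlook_coefficient(sample))
print(label)
```

Because both element sums share the same total, the ratio of sums equals the ratio of the two groups' percentage shares of total REY, so either form gives the same coefficient.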

Machine learning can use existing data to classify new data points added to a database based on their attributes. A machine learning model was developed that uses existing data from the COALQUAL database to train and test algorithms that classify coal samples as unpromising or promising based on each sample's ASTM ash percentage. The 5485 adjusted coal samples from the COALQUAL database were subjected to the synthetic minority over-sampling technique (SMOTE) to eliminate label bias, and imputation methods were used to format the data for computation. Several machine learning algorithms were then compared on the adjusted samples to find the best performer. Accuracy and the number of false positives were the key performance indicators used to evaluate each algorithm. The k-nearest neighbors (KNN) algorithm emerged as the best performer, with 92% accuracy and 2% false positives. A brief economic analysis is included to justify using the model to save costs associated with obtaining trace element concentrations from laboratory analysis. Recommendations are given on how to use this research in future work.
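
The oversampling and classification steps described above can be illustrated with a small, dependency-free sketch. It is not the thesis's code: the function names are hypothetical, the toy ash percentages are invented and carry no geological meaning, and the false-positive figure is computed as a share of all test samples, which is only one possible reading of the reported "2% false positives". A production workflow would more likely rely on an established library than on hand-written routines like these.

```python
import math
import random
from collections import Counter


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (needs at least 2 inputs).

    Each new point lies on the segment between a randomly chosen minority
    point and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda t: math.dist(t[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy_and_false_positive_share(y_true, y_pred, positive="promising"):
    """Return (accuracy, false positives as a share of all test samples)."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    false_pos = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    return correct / n, false_pos / n


# Invented one-feature data (ash percentage), for illustration only.
unpromising = [(5.0,), (6.0,), (7.0,), (8.0,), (9.0,), (10.0,)]
promising = [(20.0,), (22.0,), (25.0,)]

# Oversample the minority class until both classes have the same size.
promising += smote(promising, n_new=len(unpromising) - len(promising), k=2)
train_x = unpromising + promising
train_y = ["unpromising"] * len(unpromising) + ["promising"] * len(promising)

print(knn_predict(train_x, train_y, (21.0,)))
```

Tracking false positives alongside accuracy matters here because a sample wrongly flagged as promising would trigger laboratory analysis that the model is meant to avoid.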