Date of Award
December 2024
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Jeremiah Neubert
Abstract
Semantic segmentation is a computer vision task that frequently employs convolutional neural networks (CNNs) to learn dense, pixel-wise object classification. Preserving low-level features and deploying robust networks in real time are regarded as two of its biggest challenges. The first arises because downsampling operations, which are integral to learning object semantics, discard low-level features, posing a seemingly paradoxical trade-off. Many approaches address this problem, such as skip connections and multi-scale feature extraction, but none is quite as explicit as modeling edge prior information. The second challenge is often addressed with lightweight CNN architectures, whose smaller parameter counts and low-latency convolution operations trade accuracy for real-time inference.

This study presents DCBNet, a novel multi-branch architecture in which detailed, contextual, and boundary information are fused. DCBNet addresses low-level feature preservation by integrating semantic edge detection (SED) as an auxiliary task for semantic segmentation, where edge prior knowledge is leveraged to improve boundary contouring in dense prediction. To accommodate real-time inference, the knowledge of the high-parameter multi-task model, denoted as the teacher, is distilled to a lower-parameter multi-task student model. Extensive experiments tested the effectiveness of distinct architecture modules, multi-level feature configurations, and loss functions to find the optimal model, and the results validated the multi-task, knowledge-distilled approach. A forward-connected path and a lower-level feature subset in the boundary branch of DCBNet, coupled with softmax-activated boundary fusion weights, improved overall performance, yielding an mIoU of 77.1% for the teacher model. Knowledge distillation improved the deployable student model’s mIoU from 72.3% to 72.8%. Real-time inference was measured at 52.6 FPS on a single NVIDIA A100 SXM 40GB GPU. The optimal teacher model is compared with state-of-the-art and other relevant real-time models for semantic segmentation.
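
The abstract reports accuracy as mean intersection over union (mIoU). As a reading aid, the following is a minimal sketch of how that metric is commonly computed, assuming flattened lists of integer class labels. The function name `mean_iou` and the toy labels are illustrative assumptions and are not taken from the thesis; benchmark evaluations usually accumulate counts over an entire dataset, typically through a confusion matrix, rather than scoring a single image.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: equal-length sequences of integer class labels
    (a flattened segmentation map). Classes absent from both are skipped.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where prediction and ground truth both equal class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth equals class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example (hypothetical labels): six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5.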
Recommended Citation
Dos Santos, Alessandro, "Multi-Task Learning For Deployable, Boundary-Refined Semantic Segmentation" (2024). Theses and Dissertations. 6527.
https://commons.und.edu/theses/6527