Abstract:
Depth estimation is a crucial task across many domains, but labeled depth data is expensive to collect, which has driven growing interest in self-supervised monocular depth estimation. In this paper, we introduce SwiftDepth++, a lightweight depth estimation model that delivers competitive results on a low computational budget. Its core innovation is a novel depth decoder that improves efficiency by rapidly compressing features while preserving essential information. We also incorporate a teacher-student knowledge distillation framework in which the teacher guides the student model in refining its predictions. We evaluate SwiftDepth++ on the KITTI and NYU datasets: with approximately 6 million parameters, it achieves an absolute relative error (Abs-rel) of 10.2% on KITTI and 22% on NYU without fine-tuning. These results indicate that SwiftDepth++ combines competitive accuracy with significantly reduced computational complexity, making it a practical choice for real-world applications.
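For readers unfamiliar with the metric, the absolute relative error is commonly defined as the mean of |d_pred - d_gt| / d_gt over pixels with valid ground truth, so a reported 10.2% corresponds to a value of 0.102. The following is a minimal sketch assuming that standard definition. The function name `abs_rel`, the convention that a non-positive ground-truth value marks an invalid pixel, and the toy depth values are illustrative assumptions, not taken from the paper or its evaluation code.

```python
from typing import Iterable


def abs_rel(pred: Iterable[float], gt: Iterable[float]) -> float:
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    # Assumption: ground-truth values <= 0 mark pixels without a measurement.
    errors = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    return sum(errors) / len(errors)


# Toy example with made-up depths in metres (not data from the paper).
predicted = [9.0, 22.0, 5.0, 3.0]
ground_truth = [10.0, 20.0, 5.0, 0.0]  # 0.0 marks a pixel with no ground truth
score = abs_rel(predicted, ground_truth)
print(f"Abs-rel: {score:.3f} ({score * 100:.1f}%)")
```

Because each error is divided by the true depth, a fixed error in metres counts for more on a nearby pixel than on a distant one.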