PUKYONG

MoTUNet: A MobileNetV2-Transformer U-Net for Water Body Segmentation

Abstract
Efficient real-time water body segmentation is crucial for applications such as water resource monitoring and flood detection, but balancing accuracy and inference efficiency remains challenging. In this paper, we propose MoTUNet (MobileNetV2-Transformer U-Net), designed to optimize both accuracy and inference speed for water body segmentation. Its performance is evaluated against several popular segmentation models: U-Net, DeepLabV3+, PSPNet, PAN, and LinkNet. All models use MobileNetV2 as the encoder to reduce computational complexity while preserving feature extraction capability, and the Kaggle RIWA dataset is used for training and evaluation. The evaluation metrics are Intersection over Union (IoU), precision, recall, F1-score, frames per second (FPS), and average inference latency. Our results show that U-Net and DeepLabV3+ achieve the highest accuracy, while PSPNet is the most efficient in terms of FPS. MoTUNet provides an optimal balance: it is 97.20% and 64.49% faster than U-Net and DeepLabV3+, respectively, at a 512×512 input size, and 81.71% and 58.18% faster at a 256×256 input size, while maintaining competitive segmentation accuracy.
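For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how IoU, precision, recall, and F1-score are conventionally computed for a binary water/non-water mask, and how FPS relates to average latency when one image is processed at a time. It is not taken from the dissertation: the function names, the flat 0/1 list representation of a mask, and the zero-denominator handling are all assumptions made for illustration.

```python
def binary_segmentation_metrics(pred, target):
    """Compute IoU, precision, recall, and F1 for two flat binary masks.

    `pred` and `target` are equal-length sequences of 0/1 labels
    (1 = water pixel). Hypothetical helper, not the thesis code.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)

    def safe_div(num, den):
        # Assumption: report 0.0 when a metric is undefined (zero denominator).
        return num / den if den else 0.0

    return {
        "iou": safe_div(tp, tp + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }


def fps_from_latency(latency_ms):
    """Convert average per-image latency (milliseconds) to frames per second.

    Assumes images are processed one at a time (batch size 1).
    """
    return 1000.0 / latency_ms


# Tiny worked example on a 6-pixel mask: tp = 2, fp = 1, fn = 1.
example_pred = [1, 1, 0, 0, 1, 0]
example_target = [1, 0, 0, 1, 1, 0]
example_metrics = binary_segmentation_metrics(example_pred, example_target)
```

In the example, two pixels are correctly predicted as water, one is a false alarm, and one is missed, so IoU is 2/4 and precision and recall are both 2/3. The measurement protocol actually used in the dissertation (batch size, warm-up, averaging) is not stated in this record.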
Author(s)
POV KIMSAY
Issued Date
2025
Awarded Date
2025-08
Type
Dissertation
Keyword
Water body segmentation, Flood monitoring, Deep learning, Convolutional neural network, Transformer-based decoder
Publisher
Graduate School, Pukyong National University
URI
https://repository.pknu.ac.kr:8443/handle/2021.oak/34353
http://pknu.dcollection.net/common/orgView/200000904486
Affiliation
Graduate School, Pukyong National University
Department
Graduate School, Department of Artificial Intelligence Convergence
Advisor
Youngsun Han
Table Of Contents
I. Introduction
1. Motivation
2. Contribution
3. Thesis Organization
II. Background
1. Convolutional Neural Network
1.1. Convolutional Layer
1.2. Batch Normalization
1.3. Activation Function
1.4. Pooling Layer
1.5. Fully Connected Layer
1.6. CNNs in Vision Tasks
2. Semantic Segmentation
III. Related Works
1. MobileNetV2
2. U-Net
3. DeepLabV3+
4. PSPNet
5. Pyramid Attention Network (PAN)
6. LinkNet
IV. Proposed Method
1. Transformer Overview
2. MoTUNet Architecture
V. Performance Evaluation
1. Environment Setup
2. Dataset
3. Hardware Configuration
4. Model Configuration
5. Evaluation Criteria
6. Performance Results
VI. Discussion and Limitation
VII. Conclusion
Bibliography
Acknowledgement
Publication
Degree
Master
Appears in Collections:
Graduate School > Department of Artificial Intelligence Convergence
Authorize & License
  • Authorize: Open
  • Embargo: 2025-04-30
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.