
Robust 6D Pose Estimation of Occluded Objects Using RGB Images

Abstract
The 6D pose estimation problem involves detecting objects and determining their translations and rotations. In three-dimensional space, each of these properties has three degrees of freedom, and together they are referred to as the 6D pose: the translation of a rigid body along the x, y, and z axes of a Cartesian coordinate system, and its rotation about those axes, i.e., the pitch, yaw, and roll angles.

Estimating the pose of objects is crucial for enabling machines to interact with or manipulate them effectively, with applications in augmented reality, virtual reality, autonomous driving, robotics, and more. The task is non-trivial, however: cluttered backgrounds, occlusions, texture-less objects, and varying imaging conditions all interfere, and even minor changes in rotation, translation, or scale can make accurate estimation difficult. In industrial settings, robots use 6D pose estimation to locate and manipulate objects accurately. In augmented reality, it measures the poses of real-world objects so that virtual objects can be inserted into the scene with correct spatial positioning, enhancing the functionality and immersiveness of the application. Concretely, 6D pose estimation infers an object's 3D rotation and 3D translation relative to the camera from discernible details in a 2D reference image, given the object's known shape.
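To make the definition above concrete, the following sketch composes the six degrees of freedom (roll, pitch, yaw rotations plus x, y, z translations) into a single 4x4 rigid-body transform. The function name and the Rz·Ry·Rx rotation order are illustrative choices, not taken from the thesis; other conventions are equally valid.

```python
import numpy as np

def pose_matrix(roll, pitch, yaw, tx, ty, tz):
    """Compose a 6D pose (3 rotation angles + 3 translations) into a
    4x4 homogeneous transform. Angles are in radians; Rz @ Ry @ Rx is
    one common rotation order among several."""
    cr, sr = np.cos(roll), np.sin(roll)    # rotation about the x axis
    cp, sp = np.cos(pitch), np.sin(pitch)  # rotation about the y axis
    cy, sy = np.cos(yaw), np.sin(yaw)      # rotation about the z axis
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx   # 3 rotational degrees of freedom
    T[:3, 3] = [tx, ty, tz]    # 3 translational degrees of freedom
    return T
```

Applying `T` to a homogeneous 3D point rotates it and then translates it, which is exactly the rigid-body motion the 6D pose describes.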
Typical cameras for capturing scenes include RGB cameras and RGBD (color and depth) cameras. However, depth cameras are not always feasible, especially outdoors, where they are prone to failure under difficult lighting conditions. There is therefore a preference for relying solely on color images for 6D pose estimation, even though this poses additional challenges. Chief among them is occlusion, the partial visibility of an object due to obstructions, which significantly impacts pose estimation: inferring an object's position is harder when only a portion of it is visible. Overcoming occlusion thus remains an important consideration in developing accurate pose estimation systems.

This thesis introduces a method for estimating the 6D pose of an occluded object using only RGB images. The approach employs a neural network that localizes keypoints by predicting, for each pixel of the RGB image, a vector pointing toward each keypoint; these vectors are predicted jointly with a semantic segmentation of the object's pixels. The accuracy of keypoint localization therefore depends critically on the quality of the per-pixel predictions. The network automatically reweights pixels based on its prediction results, strengthening learning especially in occluded regions. Consequently, the approach excels at extracting features from occluded regions and performs robustly even when objects are occluded. Comparative analysis against existing approaches demonstrates that the method achieves higher accuracy in 6D pose estimation.

ACKNOWLEDGMENTS

I wish to take this moment to convey my sincere appreciation to all those who played a pivotal role in my successful completion of the master's program. Foremost, my heartfelt gratitude goes to Prof. Hanhoon Park, whose guidance and unwavering support defined my academic journey. His encouragement and mentorship were instrumental in shaping my goals and achievements. I also thank Prof. Seonhan Choi for recommending me to Prof. Park, which made the completion of my master's studies possible. My gratitude extends to the members of the IVC Lab, whose invaluable assistance, camaraderie, and shared experiences enriched both my academic and personal life. I am also deeply thankful to my family for their steadfast support and understanding throughout my overseas study, particularly during challenging moments; their encouragement and endorsement of my decision to pursue studies abroad have been invaluable. Lastly, I thank Prof. YoonTae Kim and Prof. Jaehyo Jung for their special care during my graduate studies. Their guidance has left an indelible mark on my academic journey.
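As a technical note, the per-pixel vector voting described in the abstract can be sketched as follows. This is a simplified least-squares stand-in for the RANSAC-based voting used in PVNet-style methods, with illustrative names; the recovered 2D keypoints would then be passed to a PnP solver (e.g., OpenCV's solvePnP) together with the object's 3D model points to obtain the 6D pose.

```python
import numpy as np

def vote_keypoint(pixels, vectors):
    """Estimate a 2D keypoint from per-pixel direction vectors.

    pixels  : (N, 2) pixel coordinates inside the object's segmentation mask
    vectors : (N, 2) vectors, each pointing from its pixel toward the
              (unknown) keypoint, as predicted by the network

    Each pixel p with unit direction d defines a ray p + t*d; the keypoint
    k minimizes the summed squared distance to all rays, which gives the
    normal equations sum((I - d d^T)) k = sum((I - d d^T) p).
    """
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, d in zip(pixels, vectors):
        d = d / np.linalg.norm(d)          # ensure unit direction
        M = np.eye(2) - np.outer(d, d)     # projector orthogonal to d
        A += M
        b += M @ p
    return np.linalg.solve(A, b)
```

With noise-free vectors every ray passes through the keypoint exactly, so the least-squares solution recovers it; with noisy network predictions, robust (RANSAC-style) voting over pixel pairs is preferred, which is the design choice PVNet makes.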
Author(s)
YE YUNING
Issued Date
2024
Awarded Date
2024-02
Type
Dissertation
Publisher
Graduate School, Pukyong National University
URI
https://repository.pknu.ac.kr:8443/handle/2021.oak/33621
http://pknu.dcollection.net/common/orgView/200000742338
Alternative Author(s)
YE YUNING
Affiliation
Graduate School, Pukyong National University
Department
Department of Artificial Intelligence Convergence, Graduate School
Advisor
Hanhoon Park
Table Of Contents
CHAPTER I. INTRODUCTION 1
CHAPTER II. OVERVIEW OF ROBUST 6D POSE ESTIMATION OF OCCLUDED OBJECTS USING KEYPOINT FROM RGB IMAGE 5
II.1. Background 5
II.2. Challenges of 6D pose estimation 6
II.2.1. Texture-less objects 6
II.2.2. Occluded objects 8
II.2.3. Background clutter 9
II.3. Algorithms for estimating 6D pose from RGB image 10
II.3.1. Overview 10
II.3.2. Template-based methods 11
II.3.3. Regression-based methods 13
II.3.4. Classification-based methods 15
II.3.5. Keypoint-based methods 17
II.4. PVNet 6D pose estimation approach 20
II.4.1. Overview 20
II.4.2. Weakness 22
II.5. Perspective-n-Point (PnP) algorithm 24
II.6. Focal loss 26
CHAPTER III. THE PROPOSED 6D POSE ESTIMATION APPROACH OF OCCLUDED OBJECTS USING KEYPOINT FROM RGB IMAGE 28
III.1. Introduction 28
III.2. Overview of proposed approach 29
III.3. Approach 31
III.4. Result and discussion 33
III.4.1. Setup 33
III.4.2. Datasets 33
III.4.3. Evaluation metrics 34
III.4.4. Results and discussion 35
III.4.5. Comparison of segmentation performance with PVNet 38
III.4.6. Visual comparison of pose estimation accuracy with PVNet 41
III.4.7. Limitations 44
III.5. Summary 46
CHAPTER IV. CONCLUSION 48
IV.1. Conclusion 48
IV.2. Challenges and limitations 49
IV.3. Future works 50
REFERENCES 52
Degree
Master
Appears in Collections:
Graduate School > Department of Artificial Intelligence Convergence
Authorize & License
  • Authorize: Open access
  • Embargo: 2024-02-16
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.