Deep Learning-based Approach to Software Changes Recommendation with Consideration for Propagation Effects in Large Software System
- Alternative Title
- Deep Learning-based Software Change Recommendation Methodology Considering Change Propagation Effects in Large-scale Software Systems
- Abstract
Ahmed Hamdi Abdurhman
Department of Industrial and Data Engineering, The Graduate School,
Pukyong National University
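Summary
As software systems continuously evolve and grow in size and complexity, predicting software changes has become increasingly important for maintaining system stability and functionality. Previous studies have emphasized methods that extract co-change patterns from the change log data of software collaboration tools. However, these methods have limited accuracy and are difficult to apply to large-scale systems. This dissertation introduces three distinct but interrelated studies that address these research gaps.
The first study is a novel file-level change propagation prediction methodology called File-level Change Propagation to Vector (FCP2Vec). Using Word2Vec, which is widely used in natural language processing, FCP2Vec recommends the files to which a change may propagate, based on the files currently being modified. To validate the methodology, it was trained on the historical change logs of three publicly available software projects (Vuze, Spring Framework, Elasticsearch); the results show that it predicts change propagation effectively and improves package-level accuracy by 21% compared with a previous study. The second method, named FCP2BERT, is a deep learning-based software change recommendation technique that uses the Transformer architecture. Unlike existing methods, FCP2BERT can address the scalability problems that arise when developers deal with many files and their interdependencies. It also provides finer-grained recommendations based on file-level abstraction. The methodology improved file-level accuracy by 60%, demonstrating the effectiveness of FCP2BERT. The final study is FCP2DGNN, a software change recommendation methodology that uses a dynamic graph neural network. By modeling the approach as a graph-based problem, FCP2DGNN captures the evolving relationships among system components and effectively predicts future change propagation. Compared with previous studies, this approach shows a 63% improvement in file-level accuracy.
Overall, this research provides tools for improving software change management and software quality by understanding and predicting software change propagation. The author's findings are expected to support more effective software management by offering more efficient software maintenance methods.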
Abstract
Software systems evolve continually, growing in size and complexity, which makes it increasingly difficult to predict how changes propagate while maintaining system stability and functionality. Existing studies have mined co-change patterns from changelog data with data-driven methods such as dependency networks. However, these methods suffer from scalability problems and lack a comprehensive treatment of higher-level abstraction. This dissertation presents three distinct yet interrelated studies that address these research gaps, each using different techniques and datasets to improve the prediction of software change propagation.
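To make the dependency-network idea concrete, the following Python sketch counts how often files change together in the same commit and ranks candidate partners by confidence, i.e. the fraction of commits touching the modified file that also touch the partner. It is a generic co-change illustration under stated assumptions, not the dependency network used in the prior work; the function names and the `.java` file names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cochange_network(transactions):
    """Count single-file and file-pair occurrences across commits (transactions)."""
    file_counts = Counter()
    pair_counts = Counter()
    for files in transactions:
        unique = sorted(set(files))  # ignore duplicates; sorting fixes pair order
        file_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return file_counts, pair_counts


def recommend_by_confidence(changed_file, file_counts, pair_counts, k=3):
    """Rank co-changing files by confidence = count(changed, other) / count(changed)."""
    base = file_counts[changed_file]
    if base == 0:
        return []
    scores = {}
    for (a, b), n in pair_counts.items():
        if a == changed_file:
            scores[b] = n / base
        elif b == changed_file:
            scores[a] = n / base
    # Highest confidence first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical changelog: each inner list is the set of files changed in one commit.
transactions = [
    ["A.java", "B.java"],
    ["A.java", "B.java", "C.java"],
    ["A.java", "D.java"],
    ["B.java", "C.java"],
]
file_counts, pair_counts = build_cochange_network(transactions)
print(recommend_by_confidence("A.java", file_counts, pair_counts))
```

Pairwise counting like this grows quadratically with the number of files per commit and only captures direct co-occurrence, which is one way to read the scalability concern raised above.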
In the first study, a novel file-level change propagation approach, termed File-level Change Propagation to Vector (FCP2Vec), is proposed. Designed as a recommendation system, FCP2Vec suggests files that are likely to be affected by change propagation, given the file currently being modified. On three publicly available datasets (Vuze, Spring Framework, Elasticsearch), FCP2Vec effectively predicted subsequent change propagation from historical changelog data, achieving a 21% improvement in package-level accuracy compared with a previous study.
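FCP2Vec builds on Word2Vec, and the dissertation's outline lists a skip-gram section. As a minimal sketch, assuming that each commit's list of changed files plays the role of a sentence, the Python code below trains skip-gram embeddings with negative sampling in plain Python and recommends the files whose vectors are most similar (cosine) to the file currently being modified. The hyperparameters (`dim`, `epochs`, `lr`, `negatives`), the uniform negative sampler, the whole-commit context window, and all function and file names are illustrative assumptions, not the dissertation's implementation.

```python
import math
import random


def train_skipgram(transactions, dim=16, epochs=300, lr=0.05, negatives=1, seed=0):
    """Minimal skip-gram with negative sampling (SGNS) over commit transactions.

    Each commit is treated as one "sentence" whose window spans the whole commit,
    so every other file in the commit is a context of the target file.
    Negatives are drawn uniformly (a simplification of word2vec's unigram^0.75 noise).
    """
    rng = random.Random(seed)
    vocab = sorted({f for t in transactions for f in t})
    idx = {f: i for i, f in enumerate(vocab)}
    # Target (input) vectors start small and random; context (output) vectors start at zero.
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    def sigmoid(x):
        x = max(-30.0, min(30.0, x))  # clamp to keep math.exp in range
        return 1.0 / (1.0 + math.exp(-x))

    for _epoch in range(epochs):
        for files in transactions:
            ids = [idx[f] for f in sorted(set(files))]  # sorted for reproducibility
            for t in ids:
                for c in ids:
                    if c == t:
                        continue
                    samples = [(c, 1.0)]  # the observed co-changed file
                    for _neg in range(negatives):
                        o = rng.randrange(len(vocab))
                        if o != c:  # as in word2vec, skip a negative equal to the positive
                            samples.append((o, 0.0))
                    grad_in = [0.0] * dim
                    for o, label in samples:
                        score = sum(a * b for a, b in zip(w_in[t], w_out[o]))
                        g = lr * (label - sigmoid(score))
                        for d in range(dim):
                            grad_in[d] += g * w_out[o][d]  # uses w_out before its update
                            w_out[o][d] += g * w_in[t][d]
                    for d in range(dim):
                        w_in[t][d] += grad_in[d]
    return {f: w_in[idx[f]] for f in vocab}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_by_similarity(changed_file, emb, k=2):
    """Rank all other files by cosine similarity to the file being modified."""
    if changed_file not in emb:
        return []
    target = emb[changed_file]
    scored = [(f, cosine(target, v)) for f, v in emb.items() if f != changed_file]
    return sorted(scored, key=lambda kv: -kv[1])[:k]


# Hypothetical commits: two groups of files that tend to change together.
commits = [
    ["A.java", "B.java"], ["B.java", "C.java"], ["A.java", "C.java"], ["A.java", "B.java", "C.java"],
    ["X.java", "Y.java"], ["Y.java", "Z.java"], ["X.java", "Z.java"], ["X.java", "Y.java", "Z.java"],
]
emb = train_skipgram(commits)
print(recommend_by_similarity("A.java", emb))
```

In practice a library implementation such as gensim's `Word2Vec` (with `sg=1` to select skip-gram) would replace the hand-written loop; it is spelled out here only to show what the skip-gram objective optimizes.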
The second study extends the focus to a file-level change propagation approach based on Bidirectional Encoder Representations from Transformers (FCP2BERT). Unlike existing methods, FCP2BERT can handle the scalability issues that developers encounter when dealing with numerous files and their interdependencies. Moreover, it offers more fine-grained recommendations based on file-level abstraction. A case study on the Vuze dataset demonstrated a 60% increase in file-level accuracy, substantiating the efficacy of FCP2BERT.
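The dissertation's chapter title describes FCP2BERT as a BERT-based sequential recommendation system. A common way to train such a model, used for example by BERT4Rec, is the cloze task: random positions in a developer's sequence of changed files are replaced by a mask token and the model learns to recover them, while at prediction time a mask is appended to the history so the model fills in the next file. The Python sketch below covers only this data-preparation step (no Transformer is included) and assumes FCP2BERT is trained this way, which the abstract does not state; the `[MASK]` token name, `mask_prob`, `max_len`, and the file names are hypothetical.

```python
import random

MASK = "[MASK]"  # hypothetical name for the special mask token


def cloze_mask(sequence, mask_prob=0.15, rng=None):
    """BERT-style cloze masking of a sequence of changed files.

    Each position is replaced by MASK with probability mask_prob (simplified:
    no 80/10/10 random/keep replacement). Returns (masked, labels), where
    labels[i] holds the original file at masked positions and None elsewhere,
    so a training loss would only be computed where labels[i] is not None.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for item in sequence:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(item)
        else:
            masked.append(item)
            labels.append(None)
    # Guarantee at least one prediction target per non-empty training sequence.
    if sequence and all(lab is None for lab in labels):
        i = rng.randrange(len(sequence))
        masked[i], labels[i] = MASK, sequence[i]
    return masked, labels


def inference_input(history, max_len=50):
    """At prediction time, append MASK so the model fills in the next file to change."""
    return (list(history) + [MASK])[-max_len:]


# Hypothetical developer change sequence (files in the order they were modified).
seq = ["Core.java", "Config.java", "Parser.java", "Core.java", "Util.java"]
print(cloze_mask(seq, mask_prob=0.3, rng=random.Random(7)))
print(inference_input(seq, max_len=4))
```

Because the encoder is bidirectional, masking inside the sequence lets it use both earlier and later changes as context during training, while the appended mask turns the same model into a next-change recommender.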
The final study introduces a novel approach, File-level Change Propagation with Dynamic Graph Neural Network (FCP2DGNN), which addresses the evolving nature of software and its temporal dynamics. By modeling change propagation as a graph-based problem, FCP2DGNN captures the evolving relationships between system components and effectively predicts future change propagation. Compared with prior studies, the approach yielded a 63% increase in file-level accuracy.
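One common way to represent an evolving system for a dynamic GNN is a discrete-time dynamic graph: commits are bucketed into time windows, each window yields a snapshot of co-change edges, and node representations are updated snapshot by snapshot. The Python sketch below shows that pipeline with a single parameter-free mean-aggregation step per snapshot, assuming one-hot initial features, a fixed window size, and a fixed blending weight `alpha`; these choices and the function names are illustrative assumptions. It is not FCP2DGNN itself, which would learn its aggregation and update functions.

```python
from collections import defaultdict
from itertools import combinations


def build_snapshots(commits, window):
    """Bucket (timestamp, files) commits into fixed time windows and build one
    undirected co-change edge set per window (a discrete-time dynamic graph)."""
    buckets = defaultdict(set)
    for ts, files in commits:
        for a, b in combinations(sorted(set(files)), 2):
            buckets[ts // window].add((a, b))
    return [buckets[k] for k in sorted(buckets)]


def propagate(snapshots, features, alpha=0.5):
    """Apply one parameter-free mean-aggregation message-passing step per snapshot.

    Node states carry over from one snapshot to the next, so earlier co-changes
    keep influencing later representations. `features` must cover every file.
    A learned DGNN would replace the fixed mean and blend with trainable functions.
    """
    state = {n: list(v) for n, v in features.items()}
    for edges in snapshots:
        neighbours = defaultdict(list)
        for a, b in edges:
            neighbours[a].append(b)
            neighbours[b].append(a)
        new_state = {}
        for n, h in state.items():
            nbrs = neighbours.get(n, [])
            if not nbrs:
                new_state[n] = h  # no co-change in this window: keep the old state
                continue
            mean = [sum(state[m][d] for m in nbrs) / len(nbrs) for d in range(len(h))]
            new_state[n] = [alpha * h[d] + (1 - alpha) * mean[d] for d in range(len(h))]
        state = new_state
    return state


# Hypothetical commits as (timestamp, changed files), bucketed into windows of 10 time units.
commits = [(0, ["A", "B"]), (5, ["B", "C"]), (12, ["C", "D"])]
snaps = build_snapshots(commits, window=10)
one_hot = {"A": [1, 0, 0, 0], "B": [0, 1, 0, 0], "C": [0, 0, 1, 0], "D": [0, 0, 0, 1]}
print(propagate(snaps, one_hot))
```

In this toy run, D never co-changes with B directly, yet after the second snapshot its state carries part of B's signal through C, which illustrates how a temporal graph can expose indirect, multi-step propagation paths.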
Overall, these studies contribute significantly to our understanding of software change propagation and provide software developers with tools for managing software changes, enhancing software quality, and predicting future modifications more accurately. Our findings may pave the way for more efficient software maintenance and evolution, ultimately fostering more effective decision-making processes.
Keywords: Change coupling recommendation, Change management, Word2vec, BERT, Software Changelog data, Dependency networks, Software change propagation, Dynamic graph neural network, Recommendation system, File level abstraction, Software engineering
- Author(s)
- AHMED HAMDI ABDURHMAN
- Issued Date
- 2023
- Awarded Date
- 2023-08
- Type
- Dissertation
- Keyword
- Change coupling recommendation, Change management, Word2vec, BERT, Software Changelog data, Dependency networks, Software change propagation, Dynamic graph neural network, Recommendation system, File level abstraction, Software engineering
- Publisher
- Pukyong National University
- URI
- https://repository.pknu.ac.kr:8443/handle/2021.oak/33480
- http://pknu.dcollection.net/common/orgView/200000695526
- Affiliation
- Pukyong National University, Graduate School
- Department
- Department of Industrial and Data Engineering (Interdisciplinary Major in Industrial Data Engineering)
- Advisor
- Jihwan Lee
- Table Of Contents
- I. INTRODUCTION
1.1 INTRODUCTION
1.2 MOTIVATION
1.3 BACKGROUND OF SOFTWARE EVOLUTION AND CHANGE PROPAGATION
1.3.1 SOFTWARE EVOLUTION
1.3.2 CHANGE PROPAGATION
1.3.3 CHANGE COUPLING
1.3.4 CODE CHANGE
1.4 RECOMMENDATION SYSTEM
1.5 RESEARCH QUESTIONS
1.6 RESEARCH OBJECTIVES
1.7 RESEARCH CONTRIBUTIONS
1.7.1 FCP2VEC: FILE-LEVEL CHANGE PROPAGATION TO VECTOR
1.7.2 FCP2BERT: FILE-LEVEL CHANGE PROPAGATION TO BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMER
1.7.3 FCP2DGNN: FILE-LEVEL CHANGE PROPAGATION TO DYNAMIC GRAPH NEURAL NETWORK
1.8 DATA GATHERING REQUIRED METHODS
1.9 ORGANIZATION OF THE DISSERTATION
II. FCP2VEC: DEEP LEARNING-BASED APPROACH TO SOFTWARE CHANGE PREDICTION BY LEARNING CO-CHANGING PATTERNS FROM CHANGE LOGS
2.1 INTRODUCTION
2.2 MATERIALS AND METHODS
2.2.1 LEVERAGING CHANGELOG DATA IN CHANGE PROPAGATION
2.2.2 NEURAL LANGUAGE MODEL
2.3 BACKGROUND
2.3.1 PROBLEM DEFINITION
2.3.2 WORD2VEC
2.3.2.1 SKIP-GRAM
2.4 PROPOSED APPROACH
2.4.1 DATA PREPARATION
2.4.2 DATA PROCESSING
2.4.3 LEARNING D-DIMENSIONAL ELEMENT REPRESENTATION
2.4.4 EVALUATION METRICS
2.4.5 HYPERPARAMETERS
2.5 EMPIRICAL STUDIES
2.5.1 TRANSACTION DATA
2.5.2 SYSTEM ENVIRONMENTS
2.6 RESULT AND DISCUSSION
2.6.1 COMPARISON BETWEEN FCP2VEC AND DN
2.7 IMPLICATIONS FOR PRACTICAL USE
III. FCP2BERT: A BERT-BASED SEQUENTIAL RECOMMENDATION SYSTEM FOR EFFECTIVE CHANGE PROPAGATION PREDICTION IN LARGE SOFTWARE SYSTEMS
3.1 INTRODUCTION
3.2 LITERATURE REVIEW
3.2.2 DISTRIBUTIONAL REPRESENTATION
3.3 BACKGROUND
3.3.1 PROBLEM DEFINITION
3.3.2 BIDIRECTIONAL ENCODER REPRESENTATION FROM TRANSFORMER (BERT)
3.4 PROPOSED METHODS
3.4.1 DATA PREPROCESSING
3.4.2 LEARNING ELEMENT REPRESENTATION
3.4.3 EVALUATION METRICS
3.4.4 HYPERPARAMETERS
3.5 EMPIRICAL STUDIES
3.5.1 TRANSACTION DATA
3.5.2 SYSTEM ENVIRONMENT
3.5.3 RESULTS AND DISCUSSION
3.6 IMPLICATIONS FOR PRACTICAL USE
IV. DYNAMIC GRAPH NEURAL NETWORK-BASED SEQUENTIAL RECOMMENDATION SYSTEM FOR EFFECTIVE CHANGE PROPAGATION PREDICTION IN LARGE SOFTWARE SYSTEMS
4.1 INTRODUCTION
4.2 LITERATURE REVIEW
4.2.1 GNN AND SEQUENTIAL RECOMMENDATION SYSTEM
4.3 BACKGROUND
4.3.1 PROBLEM DEFINITION
4.3.2 GRAPH NEURAL NETWORK
4.3.2.1 DYNAMIC GRAPH NEURAL NETWORK
4.4 PROPOSED METHOD
4.4.1 DATA PREPARATION
4.4.2 DEVELOPER SEQUENCE GENERATION
4.4.3 DYNAMIC GRAPH AND SUB-GRAPH SAMPLING
4.4.4 DATA OPERATION
4.4.5 LEARNING GRAPH DATA REPRESENTATION
4.4.6 EVALUATION METRICS
4.4.7 EXPERIMENT SETUP
4.5 EMPIRICAL STUDIES
4.5.1 TRANSACTION DATA
4.5.2 RESULTS
4.6 IMPLICATIONS FOR PRACTICAL USE
V. COMPARISON RESULTS
5.1 COMPARISON BETWEEN FCP2BERT AND FCP2VEC
5.1.1 CHANGE PREDICTION PERFORMANCE
5.1.2 COMPUTATION EFFICIENCY
5.2 COMPARISON BETWEEN FCP2BERT AND FCP2DGNN
5.2.1 CHANGE PREDICTION PERFORMANCE
5.2.2 COMPUTATION EFFICIENCY
5.3 GENERAL DISCUSSION
VI. CONCLUSION
6.1 CONCLUSIONS
6.2 SUMMARY AND CONTRIBUTIONS
6.3 RESEARCH DIRECTIONS
REFERENCE
ACKNOWLEDGEMENTS
- Degree
- Doctor
Appears in Collections:
- Graduate School > Department of Industrial and Data Engineering
- Authorize & License
- Authorize: Public
- Embargo: 2023-08-07
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.