PUKYONG

Research on Computational Methods for Organic Molecule Synthesis and Drug Development Integrating Deep Graph Learning and Intelligent Screening Technologies

Abstract
This study investigates how artificial intelligence algorithms can be applied effectively to the synthesis and development of novel organic macromolecular drugs. It has two goals: improving the accuracy of organic macromolecule prediction and optimizing drug-protein screening pathways. To address key challenges in current drug development systematically, it introduces Graph Attention Networks (GAT), Active Sampling Gated Graph Convolutional Networks (ASGGCN), and an AdaBoost-SVM algorithm for imbalanced data mining.
For predicting the feasibility of organic molecule synthesis, the GAT model captures complex dependencies within molecular graphs. It substantially improves prediction accuracy and outperforms traditional GCN models, particularly on complex molecular structures, although its computational efficiency on large-scale datasets still requires optimization. To improve predictive performance further, the proposed ASGGCN model augments GCNs with a global attention mechanism, which raises the accuracy of reaction center prediction. In experiments, ASGGCN outperformed existing models in both reaction product prediction accuracy and pathway recommendation feasibility, reaching reported prediction accuracies of 87.7%, 91.6%, 93.1%, and 94.8%. This gain comes at the cost of higher model complexity, longer training times, and greater computational resource demands. For drug-protein virtual screening, we designed an algorithm that combines AdaBoost and SVM to counter the inaccurate predictions caused by data imbalance. By iteratively boosting weak classifiers, the method improves overall model performance and raises both the accuracy and the recall of virtual screening, providing a more reliable screening tool for new drug development. The AdaBoost-SVM method handles highly imbalanced data well, but its model complexity and intricate parameter tuning call for careful optimization. Taken together, the GAT model excels at processing complex molecular structures, the ASGGCN model delivers superior prediction accuracy and pathway recommendation, and the AdaBoost-SVM algorithm effectively resolves data imbalance.
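The abstract states that the GAT model captures dependencies within molecular graphs. As a rough illustration of that mechanism only, the NumPy sketch below implements one single-head attention layer in the style of the original GAT formulation (Veličković et al.): each atom scores its bonded neighbours with LeakyReLU(a^T [W h_i || W h_j]), normalises the scores with a softmax restricted to its neighbourhood, and aggregates the projected neighbour features. The function names, feature sizes, random weights, and the ethanol toy graph are hypothetical; this record does not describe the dissertation's actual architecture (number of heads, layers, or atom features).

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def elu(x):
    # exp is evaluated only on non-positive values to avoid overflow
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)

def gat_layer(h, adj, W, a, slope=0.2):
    """One single-head graph attention layer (illustrative sketch).

    h   : (N, F)   atom feature matrix
    adj : (N, N)   0/1 bond adjacency matrix (self-loops are added here)
    W   : (F, Fo)  shared linear projection
    a   : (2*Fo,)  attention vector
    Returns the (N, Fo) updated atom features and the (N, N) attention weights.
    """
    z = h @ W                                   # project every atom
    fo = z.shape[1]
    # a^T [z_i || z_j] splits into a term for atom i plus a term for neighbour j
    e = leaky_relu((z @ a[:fo])[:, None] + (z @ a[fo:])[None, :], slope)
    # restrict attention to bonded neighbours plus the atom itself
    mask = (adj + np.eye(adj.shape[0])) > 0
    e = np.where(mask, e, -np.inf)
    # softmax over each atom's neighbourhood (row max subtracted for stability)
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    return elu(alpha @ z), alpha

# Hypothetical toy input: ethanol heavy atoms C-C-O, one-hot element features [C, O]
rng = np.random.default_rng(0)
h = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
out, alpha = gat_layer(h, adj, rng.normal(size=(2, 4)), rng.normal(size=8))
print(out.shape)  # (3, 4)
```

Masking non-bonded pairs with negative infinity before the softmax is what makes the attention graph-structured: the terminal carbon assigns exactly zero weight to the oxygen it is not bonded to.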
The synergistic application of these three approaches demonstrates the potential and practical value of deep graph learning and intelligent screening technologies in organic chemistry and pharmaceutical research. This study offers new methods and perspectives for future drug development, with broad potential applications, and provides systematic solutions to key problems in organic molecule synthesis and drug screening.
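The AdaBoost-SVM idea summarized in the abstract, boosting SVM weak learners so that misclassified (often minority-class) compounds gain weight in later rounds, can be sketched in NumPy as discrete AdaBoost over linear SVMs. This is a simplified stand-in, not the dissertation's algorithm: the weak learner here is a linear SVM fitted by subgradient descent on a sample-weighted hinge loss, the class-balanced initial weights are an assumption added to reflect the imbalance setting, and the density-clustering and boundary-sampling preprocessing listed under 5.3.1 in the table of contents is omitted. All function names, hyperparameters, and the toy data are hypothetical.

```python
import numpy as np

def fit_weighted_linear_svm(X, y, w, lam=0.01, lr=0.1, epochs=50):
    """Weak learner: subgradient descent on
    lam/2*||theta||^2 + sum_i w_i * max(0, 1 - y_i*(x_i.theta + b))."""
    theta, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ theta + b) < 1          # samples inside the margin
        wy = w[active] * y[active]
        theta -= lr * (lam * theta - wy @ X[active])
        b -= lr * (-wy.sum())
    return theta, b

def svm_predict(X, theta, b):
    return np.where(X @ theta + b >= 0, 1, -1)

def adaboost_svm(X, y, rounds=10):
    """Discrete AdaBoost with linear-SVM weak learners; labels must be +1/-1.
    Initial weights are class-balanced: each class carries half the total mass."""
    w = np.where(y == 1, 0.5 / np.sum(y == 1), 0.5 / np.sum(y == -1))
    ensemble = []
    for _ in range(rounds):
        theta, b = fit_weighted_linear_svm(X, y, w)
        pred = svm_predict(X, theta, b)
        err = max(np.sum(w[pred != y]), 1e-10)    # weighted training error
        if err >= 0.5:                            # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / err)     # vote weight of this learner
        ensemble.append((alpha, theta, b))
        w = w * np.exp(-alpha * y * pred)         # up-weight misclassified samples
        w /= w.sum()
        if err <= 1e-10:                          # perfect fit: nothing left to boost
            break
    return ensemble

def adaboost_predict(X, ensemble):
    score = np.zeros(X.shape[0])
    for alpha, theta, b in ensemble:
        score += alpha * svm_predict(X, theta, b)
    return np.where(score >= 0, 1, -1)

# Hypothetical toy set: 5 "actives" (+1) versus 95 "decoys" (-1) on a fixed grid
pos = np.array([[2.0, 2.0], [2.5, 2.0], [2.0, 2.5], [3.0, 3.0], [2.5, 2.5]])
neg = np.array([[i, j] for i in np.linspace(-3, -1, 5) for j in np.linspace(-3, -1, 19)])
X = np.vstack([pos, neg])
y = np.array([1] * 5 + [-1] * 95)
model = adaboost_svm(X, y)
pred = adaboost_predict(X, model)
recall = np.mean(pred[y == 1] == 1)   # fraction of actives recovered
print(len(model), recall)
```

Starting from class-balanced weights keeps the first weak learner from simply predicting the majority class; on harder, overlapping data the reweighting step is what shifts later learners toward the minority examples that earlier rounds missed.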
Author(s)
XU PEILONG
Issued Date
2025
Awarded Date
2025-02
Type
Dissertation
Keyword
Organic molecule synthesis, Virtual screening, Graph convolutional networks (GCN), Attention mechanism, Support vector machine (SVM)
Publisher
Pukyong National University Graduate School (국립부경대학교 대학원)
URI
https://repository.pknu.ac.kr:8443/handle/2021.oak/34016
http://pknu.dcollection.net/common/orgView/200000849311
Alternative Author(s)
서패용
Affiliation
Pukyong National University Graduate School (국립부경대학교 대학원)
Department
Department of Computer Engineering, Graduate School (대학원 컴퓨터공학과)
Advisor
Incheol Shin
Table Of Contents
Chapter 1 Introduction
1.1 Research Background and Significance
1.1.1 Research Significance
1.1.2 Recent Research Achievements
1.2 Research Objectives and Content
1.3 Structure of the Thesis
Chapter 2 Background
2.1 Graph Convolutional Networks (GCN) and Drug Development
2.1.1 Basic Principles of GCN
2.1.2 Applications of GCN in Drug Development
2.2 Application of Attention Mechanisms in Deep Learning
2.2.1 Basic Principles
2.2.2 Variants of the Attention Mechanism
2.2.3 Future Research Directions
2.3 Support Vector Machine (SVM) and its Application in Virtual Screening
2.3.1 Basic Principles of SVM
2.3.2 Applications of SVM in Virtual Screening
2.3.3 Case Study
2.3.4 Future Research Directions
2.4 Literature Review and Current Research Status
2.4.1 Literature Review
2.4.2 Current Research Status
Chapter 3 Graph Attention Model for Predicting Organic Molecule Making
3.1 Overview of Graph Attention Mechanism
3.1.1 Basic Principles of the Attention Mechanism
3.1.2 Architecture of Graph Attention Networks (GAT)
3.2 Representation of Organic Molecules and Construction of GCN Model
3.3 Feasibility Prediction of Organic Molecule Synthesis Based on GAT
3.4 Experimental Results and Analysis
3.4.1 Performance Testing of the Organic Molecule Synthesis Feasibility Prediction Model Algorithm
3.4.2 Verification of the Organic Molecule Synthesis Feasibility Prediction Model
3.5 Chapter Summary
Chapter 4 Algorithmic Improvements and Predictions for Organic Reaction Pathways and Products
4.1 Background of Algorithmic Improvements for Organic Reaction Pathway Predictions
4.2 Prediction Model for Organic Chemical Reaction Products
4.2.1 Single-Step Retrosynthetic Reaction Prediction Based on the Improved GGCN Algorithm
4.2.2 Reaction Pathway Recommendation Based on Graph Logic Network Model
4.3 Experimental Results and Analysis
4.3.1 Analysis of Organic Reaction Synthesis Prediction Results Based on the Improved ASGGCN Algorithm
4.3.2 Analysis of GLN Pathway Recommendation Results Based on Retro*
4.4 Summary of this Chapter
Chapter 5 Optimization of Compound Screening Algorithm Based on Data Balance Uncertainty
5.1 Research Background of Imbalanced Data Mining and AdaBoost-SVM
5.2 Imbalanced Data Issues and Solutions
5.3 Design of AdaBoost-SVM Algorithm
5.3.1 Virtual Screening Data Preprocessing Based on Density Clustering and Boundary Sampling
5.3.2 Drug-Protein Virtual Screening Based on AdaBoost-SVM
5.4 Data Preprocessing for Virtual Screening
5.5 Experimental Results and Analysis
5.6 Summary of this Chapter
Chapter 6 Comprehensive Discussion
6.1 Comparison and Analysis of Various Research Methods
6.1.1 Feasibility Prediction Model for Organic Molecule Synthesis Based on Graph Attention Mechanisms
6.1.2 Improved ASGGCN Algorithm
6.1.3 Optimization of Compound Screening Algorithm
6.1.4 Comprehensive Analysis
6.2 Inter-Model Interactions and Data Transmission Mechanisms
6.2.1 Logical Relationships Among the Models
6.2.2 Data Transmission and Format
6.2.3 High-Performance Computing and Parallelization
6.2.4 Advantages and Challenges
6.3 Application of Multimodal Data Fusion in Drug Screening
6.4 Role of High-Performance Computing in Drug Discovery
6.5 Prospects for Future Research
Chapter 7 Conclusion
7.1 Research Summary
7.2 Main Contributions
7.3 Directions for Future Work
Acknowledgements
References
Degree
Doctor
Appears in Collections:
Graduate School > Department of Computer Engineering
Authorize & License
  • Authorize: Public
  • Embargo: 2099-12-31
Files in This Item:
  • There are no files associated with this item.
