PUKYONG

Construction Industry Accident Analysis Using Text Analysis and Machine Learning Approaches (텍스트 분석과 기계학습 접근법을 활용한 건설업 재해분석)

Alternative Title
Text mining and machine learning approach to analyzing industrial incidents in construction sites
Abstract
When an occupational accident is investigated, information on the accident must be recorded in an industrial accident survey table and reported to government agencies. Since 2001, more than 80,000 industrial accident survey tables have been collected every year. They are analyzed to identify the current status of domestic accidents and to prepare industrial accident statistics aimed at preventing accidents of the same type and in similar industries. In particular, the disaster overview included in the industrial accident survey table describes the overall process of each accident, and because a large amount of this text has accumulated, it calls for keyword analysis and machine learning.
The purpose of this study is to derive keyword-level risk factors related to actual disasters by applying text analysis and machine learning to the disaster overview, to identify the relationships among those risk factors, and to assess the risk associated with each of them. To this end, unsupervised learning is first performed on high-risk fatal accidents to search for the risk factors that affect fatal accidents and to group similar risk factors. Next, supervised learning is performed on disaster documents to identify the risk of each risk factor by finding the keywords that distinguish non-fatal injuries from fatal injuries.
This study was conducted under two main themes:
The first theme is a clustering study based on unsupervised learning of construction industry disaster overviews. We collected 2,448 fatal accident cases from the construction industry, the industry in which fatal accidents occur most often, and converted them through text preprocessing into structured data in the form of a document-term matrix suitable for machine learning. Using a self-organizing map (SOM), an unsupervised learning method, disaster documents with similar characteristics were clustered and visualized as a risk factor map. Keyword analysis was performed on the disaster documents assigned to each cell of the risk factor map, and adjacent cells were then clustered further, yielding five clusters with similar disaster characteristics: 1. material-oriented disasters, 2. high-place movement-oriented disasters, 3. excavator- and collapse-oriented disasters, 4. scaffold-oriented disasters, and 5. crane-oriented disasters. Keywords for the risk factors in each cluster were extracted and analyzed to identify highly related risk factors. In addition, a dynamic analysis of risk factors was performed through year-by-year keyword analysis for each cluster.
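The clustering step described above can be illustrated with a minimal, standard-library-only sketch. It is not the dissertation's analysis code (that is listed in Appendix B and is not reproduced on this page). The toy English documents, the whitespace tokenizer, the raw-count weighting, the 2x2 grid, and the learning-rate and neighborhood schedules are all assumptions chosen only to keep the example small; the function names are hypothetical.

```python
import math
import random
from collections import Counter


def document_term_matrix(docs):
    """Build a raw-count document-term matrix (assumes whitespace tokens)."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([float(counts[term]) for term in vocab])
    return vocab, rows


def best_matching_unit(weights, x):
    """Return the grid cell whose weight vector is closest to x."""
    def dist2(cell):
        return sum((w - v) ** 2 for w, v in zip(weights[cell], x))
    return min(weights, key=dist2)


def train_som(rows, grid=(2, 2), epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small SOM; grid size and decay schedules are assumptions."""
    rng = random.Random(seed)
    dim = len(rows[0])
    weights = {(i, j): [rng.random() for _ in range(dim)]
               for i in range(grid[0]) for j in range(grid[1])}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # linearly decaying step size
        sigma = sigma0 * (1.0 - frac) + 0.01    # shrinking neighborhood radius
        for x in rng.sample(rows, len(rows)):   # shuffled pass over documents
            bmu = best_matching_unit(weights, x)
            for cell, w in weights.items():
                d2 = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Toy documents invented for illustration only (not from the dataset).
docs = [
    "worker fell scaffold",
    "scaffold collapse worker fell",
    "crane load struck worker",
    "crane load dropped",
]
vocab, rows = document_term_matrix(docs)
weights = train_som(rows)
cells = [best_matching_unit(weights, row) for row in rows]
for doc, cell in zip(docs, cells):
    print(cell, doc)
```

Each document is assigned to the map cell whose weight vector it most resembles. Inspecting the most frequent terms of the documents in each cell is the kind of per-cell keyword analysis the risk factor map relies on.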
The second theme is a classification study of disaster documents based on supervised learning of construction industry disaster overviews. Among the collected documents, 2,853 fatal and 24,133 non-fatal disaster documents from the construction industry were analyzed. For supervised learning, the text data was preprocessed using TF-IDF, and dimensionality reduction was performed through PCA. The resulting PC-document matrix was divided into training and test sets through k-fold cross-validation, and models for classifying disaster documents were built using four classification methods: logistic regression, decision tree, neural network, and support vector machine. A confusion matrix was prepared for each classification model, and the performance of each model was evaluated using precision, recall, and accuracy. Finally, keywords representing risk were derived by analyzing the keywords associated with each classification model, and risk factors affecting classification were derived by analyzing the disaster documents misclassified by the neural network model, which had the highest accuracy. Misclassifications fall into two types: Type I errors and Type II errors. A Type I error corresponds to a high-risk injury accident: a document that is actually a non-fatal injury accident is incorrectly predicted as a fatal accident. These errors yielded documents containing risk factors highly related to fatal accidents. A Type II error corresponds to an incidental fatal accident: a document that is actually a fatal accident is misclassified as a non-fatal accident. These errors yielded documents containing risk factors highly related to non-fatal accidents.
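The evaluation and error analysis described above can be sketched as follows. This is a minimal illustration rather than the dissertation's code: the label encoding (1 = fatal, 0 = non-fatal), the function names, and the toy label vectors are assumptions. Treating the fatal class as positive matches the abstract's definitions, so a Type I error is a false positive and a Type II error is a false negative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN with the fatal class as the positive class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            counts["tp"] += 1
        elif predicted == positive:
            counts["fp"] += 1      # Type I: non-fatal predicted as fatal
        elif actual == positive:
            counts["fn"] += 1      # Type II: fatal predicted as non-fatal
        else:
            counts["tn"] += 1
    return counts


def scores(counts):
    """Compute precision, recall, and accuracy from confusion counts."""
    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


def misclassified(y_true, y_pred, positive=1):
    """Return document indices for Type I and Type II errors."""
    pairs = list(enumerate(zip(y_true, y_pred)))
    type1 = [i for i, (a, p) in pairs if a != positive and p == positive]
    type2 = [i for i, (a, p) in pairs if a == positive and p != positive]
    return type1, type2


# Toy labels invented for illustration: 1 = fatal, 0 = non-fatal.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
metrics = scores(counts)
type1, type2 = misclassified(y_true, y_pred)
print(counts)   # {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 4}
print(metrics)
print(type1)    # [3]
print(type2)    # [2]
```

The indices in `type1` and `type2` point back to the misclassified documents, whose keywords can then be examined separately for each error type.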
The academic contributions are as follows. First, the study demonstrates the usefulness of the text-based disaster overview: unsupervised learning of the disaster overview makes it possible to analyze more effectively the correlations among risk factors that appear as a disaster unfolds, and the classification of disaster documents makes it possible to understand the risk of the derived risk factors. Second, in contrast to previous studies, which focused on frequency analysis of industrial accident statistics, the study proposes a new machine-learning-based method that can be used for safety management.
The practical contributions are as follows. First, the risk factor map created through unsupervised learning extracts and analyzes risk factors from text documents that are difficult for humans to interpret one by one, and visualizes them as an easy-to-understand map. Second, by identifying the risk factors that affect document classification in supervised learning, disasters occurring in the field can be analyzed in detail and prevented. By presenting detailed types and keywords for the major factors of disasters, and by deriving factors related to the risk factors that appear at actual sites, the study is expected to help safety managers and supervisors manage safety effectively.
Author(s)
강성식
Issued Date
2021
Awarded Date
2021. 8
Type
Dissertation
Keyword
accident analysis, construction industry, text mining, machine learning, unsupervised learning, SOM, supervised learning, regression, decision tree, neural network, SVM
Publisher
Pukyong National University (부경대학교)
URI
https://repository.pknu.ac.kr:8443/handle/2021.oak/1278
http://pknu.dcollection.net/jsp/common/DcLoOrgPer.jsp?sItemId=200000508553
Alternative Author(s)
Sungsik Kang
Affiliation
Graduate School, Pukyong National University (부경대학교 대학원)
Department
Department of Safety Engineering, Graduate School (대학원 안전공학과)
Advisor
장성록
Table Of Contents
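Chapter 1 Introduction
1.1 Research background
1.2 Research necessity and purpose
1.3 Research content and scope
Chapter 2 Theoretical background
2.1 Industrial accident statistics
2.1.1 Overall accident status
2.1.2 Construction industry accident status
2.2 Industrial accident survey table
2.3 Text data preprocessing
2.3.1 Text analysis
2.3.2 Dimensionality reduction
2.3.3 Data splitting
2.4 Machine learning methodologies
2.4.1 Unsupervised learning methodologies
2.4.2 Supervised learning methodologies
Chapter 3 Unsupervised learning of construction industry disaster documents
3.1 SOM analysis method for analyzing associations among risk factors
3.1.1 Text data preprocessing of disaster documents
3.1.2 SOM analysis method
3.2 Clustering results for disaster documents
3.2.1 Text analysis results
3.2.2 SOM analysis results
3.2.3 Interpretation of the risk factor map
3.2.4 Year-by-year analysis of the derived clusters
3.3 Summary
Chapter 4 Supervised learning of construction industry disaster documents
4.1 Disaster document classification analysis method for deriving risk keywords
4.1.1 Text data preprocessing method for deriving risk keywords
4.1.2 Analysis method by classification model
4.2 Classification results for disaster documents
4.2.1 Text analysis results
4.2.2 Feature selection results for disaster documents using PCA
4.2.3 Data splitting results using k-fold cross validation
4.2.4 Classification results by model
4.2.5 Classification keyword analysis results
4.2.6 Keyword analysis of misclassified disaster documents
4.3 Summary
Chapter 5 Conclusion
5.1 Research contributions
5.2 Limitations and future research
References
Appendix
A. Keyword conversion table
B. Analysis code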
Degree
Doctor
Appears in Collections:
Graduate School > Department of Safety Engineering
Authorize & License
  • Authorize: Public
Items in Repository are protected by copyright, with all rights reserved, unless otherwise indicated.