A ‘Cluster-then-Estimate’ Natural Language Processing (NLP) Approach for Classifying Maritime Incident Severity Based on Textual Descriptions
Published in Accident Analysis & Prevention, 2026
Textual incident descriptions are a vital source for understanding the severity of maritime incidents. In the maritime industry, relevant authorities and companies typically rely on manual methods to estimate incident severity from textual descriptions. However, manual estimation is inefficient for assessing vessels’ operational risk or managing historical incident archives, where a large volume of incidents is involved. Therefore, this study proposes a ‘cluster-then-estimate’ approach that uses Natural Language Processing (NLP) techniques to automatically estimate the severity level of incidents from their textual descriptions. In the proposed approach, Latent Dirichlet Allocation (LDA) is used to group the preprocessed textual descriptions into multiple clusters, with each cluster representing an incident type. Then, a Bidirectional Encoder Representations from Transformers (BERT) model is fine-tuned for each …
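The two-stage idea described in the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper. It uses scikit-learn's LDA to assign each description to its dominant topic, and it swaps in a TF-IDF plus logistic regression classifier per cluster as a lightweight stand-in for the fine-tuned BERT models. The class name `ClusterThenEstimate`, the argmax cluster assignment, the fallback rules, and the toy descriptions and labels are all hypothetical and exist only to make the example runnable.

```python
from collections import Counter

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy data for illustration only; not drawn from the paper.
TOY_TEXTS = [
    "fire broke out in the engine room and crew evacuated",
    "small fire in galley extinguished quickly by crew",
    "engine room fire caused heavy damage to the vessel",
    "vessel collided with pier while berthing minor hull scratch",
    "collision with fishing boat caused hull breach and flooding",
    "vessel touched pier during berthing no damage reported",
]
TOY_LABELS = ["serious", "minor", "serious", "minor", "serious", "minor"]


class ClusterThenEstimate:
    """Hypothetical sketch: LDA clustering, then one classifier per cluster."""

    def __init__(self, n_clusters=2, seed=0):
        self.vectorizer = CountVectorizer(stop_words="english")
        self.lda = LatentDirichletAllocation(
            n_components=n_clusters, random_state=seed
        )
        self.models = {}      # cluster id -> fitted pipeline or constant label
        self.fallback = None  # global majority label for unseen clusters

    def _assign(self, texts, fit=False):
        # Assumption: a document belongs to its highest-probability LDA topic.
        if fit:
            counts = self.vectorizer.fit_transform(texts)
            topics = self.lda.fit_transform(counts)
        else:
            counts = self.vectorizer.transform(texts)
            topics = self.lda.transform(counts)
        return topics.argmax(axis=1).tolist()

    def fit(self, texts, labels):
        texts, labels = list(texts), list(labels)
        clusters = self._assign(texts, fit=True)
        self.fallback = Counter(labels).most_common(1)[0][0]
        for c in set(clusters):
            c_texts = [t for t, k in zip(texts, clusters) if k == c]
            c_labels = [y for y, k in zip(labels, clusters) if k == c]
            if len(set(c_labels)) < 2:
                # A single-class cluster cannot train a classifier;
                # store the constant label instead.
                self.models[c] = c_labels[0]
            else:
                # Stand-in for the per-cluster fine-tuned BERT model.
                clf = make_pipeline(
                    TfidfVectorizer(), LogisticRegression(max_iter=1000)
                )
                self.models[c] = clf.fit(c_texts, c_labels)
        return self

    def predict(self, texts):
        texts = list(texts)
        clusters = self._assign(texts)
        preds = []
        for text, c in zip(texts, clusters):
            model = self.models.get(c, self.fallback)
            if hasattr(model, "predict"):
                preds.append(str(model.predict([text])[0]))
            else:
                preds.append(model)
        return preds
```

A call such as `ClusterThenEstimate(n_clusters=2).fit(TOY_TEXTS, TOY_LABELS).predict(["engine room fire during cargo loading"])` routes the new description to a cluster first and then applies that cluster's estimator. With a real corpus, the per-cluster classifier would be replaced by a fine-tuned transformer model, and the number of clusters would need to be chosen from the data.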
Recommended citation: T Chen, M Liang, WS Lee, Y Cai, Q Meng. (2026). A 'Cluster-then-Estimate' Natural Language Processing (NLP) Approach for Classifying Maritime Incident Severity Based on Textual Descriptions. Accident Analysis & Prevention, 228, 108413. https://www.sciencedirect.com/science/article/pii/S0001457526000229
