J. Korean Soc. Hazard Mitig. > Volume 25(2); 2025 > Article
Feng and Kim: Comparative Analysis of Risk Factor Weighting GBDT Methods for Enhancing the Accuracy of Flood Risk Assessment

Abstract

In recent years, the increasing frequency of flood disasters has emerged as a significant challenge to sustainable urban development. To improve the precision of urban flood risk assessment and effectively identify flood-prone cities, this study proposes an enhanced evaluation framework that integrates the traditional entropy weight method with a genetic algorithm (GA) and gradient boosting decision tree (GBDT) for weighted estimation of risk factors and indicators in flood risk assessments. Utilizing historical data from 16 prefecture-level cities in Shandong Province, China, the model assigns optimized weights to four key dimensions of flood risk (hazard, exposure, vulnerability, and tolerance) and constructs a comprehensive urban flood risk assessment model. The resulting flood risk indices provide a quantitative basis for ranking the flood susceptibility of these cities, offering critical insights into targeted disaster risk reduction strategies. Subsequently, the assessment results were validated against urban flood loss data from 2017 to 2019. The GA, as a robust global optimization technique, has been extensively applied to identify optimal solutions for parameterized functions. The GBDT algorithm operates by minimizing the root mean squared error during iterative learning processes, incrementally refining the regression outcomes of the loss function to approach the optimal prediction. Comparative analysis demonstrates that, relative to the entropy-based baseline, the GBDT model achieves substantially higher accuracy in identifying urban flood susceptibility, with error reductions of 8.94%, 24.23%, and 26.05% for 2017, 2018, and 2019, respectively. These findings indicate that the proposed flood risk index serves as a reliable and effective decision-support tool, enabling policymakers to rapidly identify cities with inadequate disaster preparedness.

요지

In recent years, the growing frequency of flood disasters has emerged as a major challenge to sustainable urban development. Conducting rigorous flood risk assessments in advance can therefore not only improve a city's disaster prevention and mitigation capacity, but also promote the integration of risk management and economic development goals, helping to achieve both safety and development benefits. To improve the accuracy of urban flood risk assessment and effectively identify flood-prone cities, this study proposes an enhanced evaluation framework that integrates the traditional entropy weight method with a genetic algorithm (GA) and a gradient boosting decision tree (GBDT) for estimating the weights of the risk factors and indicators used in flood risk assessment. Specifically, using historical data collected from 16 prefecture-level cities in Shandong Province, China, optimized weights are assigned to the four key dimensions (factors) of flood risk, namely hazard, exposure, vulnerability, and tolerance, and a comprehensive urban flood risk assessment model is constructed. The resulting flood risk index provides a quantitative basis for ranking the flood susceptibility of these cities and offers important insights for disaster risk reduction strategies. The assessment results were validated by comparison with actual urban flood loss data from 2017 to 2019. The GA, a powerful global optimization technique, has been widely applied to identify optimal solutions for parameterized functions. The GBDT algorithm minimizes the root mean squared error (RMSE) during iterative learning, progressively refining the regression results of the loss function so that they approach the optimal prediction. Comparative analysis shows that, compared with the traditional approach and the GA-based method, the GBDT model achieves much higher accuracy in identifying urban flood susceptibility, with error reduction rates of 8.94%, 24.23%, and 26.05%, respectively. The proposed weighting methodology thus yields a reliable flood risk index that, used as an effective decision-support tool, can help policymakers rapidly identify cities with insufficient disaster resilience. This facilitates more efficient resource allocation under budget constraints and supports the development of scientifically grounded flood risk mitigation strategies.

1. Introduction

Amid the ongoing challenges posed by global climate change, the frequency and intensity of extreme precipitation events are on the rise, leading to an increased incidence of urban waterlogging and flood disasters. In many regions, the scale and recurrence of extreme flood events—as well as the associated economic and non-economic losses—are projected to escalate due to the compounding effects of climate change, rapid urbanization, and population growth (Vyas et al., 2024; Nobile et al., 2025). According to the IPCC AR6 Working Group I report, urbanization has been shown to intensify average and extreme precipitation in areas downwind of cities, thereby increasing surface runoff and exacerbating flood risk (Xiao et al., 2023). To mitigate such flood-induced losses, it is imperative to conduct comprehensive flood risk assessments and to formulate targeted, evidence-based countermeasures. Both domestic and international studies have widely adopted a qualitative flood risk assessment framework consisting of four key dimensions: hazard, exposure, vulnerability, and capacity. Each of these dimensions is further represented by a set of specific indicators. The flood risk index is typically derived by assigning appropriate weights to these indicators and aggregating them into a composite score that reflects the overall risk level (Baky et al., 2020; W. Wang et al., 2024).
Currently, commonly used weighting methods for flood risk assessment indicators include the entropy weight method, Euclidean distance method, analytic hierarchy process (AHP), and the technique for order preference by similarity to ideal solution (TOPSIS). However, flood risk indices derived from these traditional weighting approaches often exhibit limited accuracy when evaluated against the cumulative impacts of long-term rainstorms and typhoon events.
Among these methods, the entropy weight method—based on statistical characteristics—offers the advantage of producing objective, data-driven results. Nonetheless, because the calculation of entropy weights relies on the mean and standard deviation of each indicator’s original data, the resulting weights may be biased toward indicators with higher variability, potentially distorting the overall risk assessment (Feng et al., 2024). Secondly, the Euclidean distance method, often employed as an equal-weighting approach, assigns uniform weights to all indicators regardless of their relative importance. This can result in the underrepresentation of certain indicators that are strongly correlated with flood risk, thereby limiting the explanatory power of the resulting risk index. The analytic hierarchy process (AHP) and TOPSIS methods rely on expert judgment to determine weights (Li et al., 2022). While these methods can incorporate expert knowledge, their outcomes are susceptible to subjective biases and inconsistencies among experts (Kim, 2024). Moreover, these methods are often time-consuming and resource intensive. When new indicators are introduced or expert responses are flawed, the entire weighting process must be repeated, further reducing efficiency and reliability. Given these limitations, it is essential to explore alternative methods that can overcome the shortcomings of traditional weighting techniques. Optimization-based approaches offer the potential to improve the accuracy and robustness of flood risk assessment, especially when validated against historical data on average annual heavy rainfall and typhoon-related disasters (Kiavarz and Jelokhani-Niaraki, 2017; Yang et al., 2020).
In recent years, the accuracy of flood susceptibility maps (FSMs) has significantly improved through the integration of Geographic Information Systems (GIS) and remote sensing (RS) technologies (Chen et al., 2015; Kang et al., 2024). Building on these advancements, a growing body of research has incorporated optimization algorithms to refine the weighting of evaluation factors and enhance the reliability of flood risk assessments (Thi Thuy Linh et al., 2022). Among these, genetic algorithms (GA) have been widely employed due to their robust global search capabilities (Razavi-Termeh et al., 2023; Debbarma et al., 2024), along with other metaheuristic techniques such as particle swarm optimization (PSO; Chen et al., 2025; Mosalla Tabari et al., 2025). Simultaneously, the application of machine learning (ML) techniques has gained momentum in the domain of flood susceptibility modeling (Choi et al., 2025). Prominent ML methods include artificial neural networks (ANN; Abdellatif et al., 2015; Pham et al., 2021; Bera et al., 2022), support vector machines (SVM; Huang et al., 2010; Majid et al., 2024; X. Wang et al., 2024), random forests (RF; Wang et al., 2015; Zhu and Zhang, 2022; C. Wang et al., 2024), and decision trees (DT; Tehrany et al., 2013; Asiri et al., 2024; Debnath et al., 2024; Sarwar et al., 2025). These algorithms have demonstrated strong classification and generalization performance, making them increasingly prevalent in flood risk mapping and assessment research.
However, due to the complex and multifactorial nature of flood events, existing methods—though capable of simulating flood risk assessments—often face challenges in ensuring the accuracy and robustness of their results. To address this limitation, the present study integrates the traditional entropy weight method with an optimization algorithm (genetic algorithm, GA) and a machine learning approach (gradient boosting decision tree, GBDT) to develop a comprehensive urban flood risk assessment model.
The specific objectives of this study are as follows:
Utilizing historical statistical data, 13 evaluation factors are selected to construct a flood risk assessment model for 16 prefecture-level cities in Shandong Province, China;
The entropy weight method, GA, and GBDT are employed to calculate the weights of the selected evaluation factors, where the GA and GBDT methods use information gain (IG) as the basis for designing their respective fitness functions;
The risk assessment outcomes derived from the different weighting methods are compared against actual flood disaster rankings obtained from official government statistics, with the goal of identifying the weighting scheme that best aligns with historical data;
Flood risk indices are computed for each city to evaluate the relative susceptibility of different urban areas;
Based on the calculated indices, a final ranking of flood risk levels across the study area is produced.
Fig. 1 presents a flowchart outlining the overall calculation and methodological framework of this study.
Fig. 1
Flow Chart

2. Study Area and Data Resources

2.1 Research Area

Shandong Province is situated between 114°48′-122°42′E longitude and 34°23′-38°17′N latitude, spanning approximately 721.03 km from east to west and 437.28 km from north to south, with a total land area of 155,800 km². As of November 2020, the province had a male population of 51,432,931 (50.66%) and a female population of 50,094,522 (49.34%), resulting in a sex ratio (males per 100 females) of 102.67, an increase of 0.34 compared to the sixth national census. By the end of 2023, the population aged 60 and above reached 23.91 million, constituting 23.62% of the total population. Shandong Province has the largest elderly population in China, characterized by a high absolute number, rapid growth rate, and an accelerating aging trend, presenting significant demographic and socio-economic challenges. Land use types are shown in Fig. 2(b).
Fig. 2
Study Area of Shandong Province

2.2 Data Resources

Urban flood risk assessment is commonly conducted using indicator-based methods within a multi-objective decision-making framework. Accordingly, this study constructs a comprehensive indicator system by integrating raster datasets and flood risk maps to quantify 13 indicators spanning four key dimensions: hazard, exposure, vulnerability, and capacity. These indicators serve to systematically evaluate the flood risk level at the city scale. Drawing upon city-level raster data and statistical yearbook records, this research applies three distinct methods for weighting and assessment: the entropy weight method, the genetic algorithm (GA) grounded in optimization principles, and the gradient boosting decision tree (GBDT) based on machine learning. Each method is used to calculate composite scores for the four dimensions, thereby generating an overall flood risk index for 16 prefecture-level cities in Shandong Province, China. A higher score in any dimension indicates a higher level of flood risk associated with that factor. In this study, the hazard component refers primarily to urban flooding, which is typically driven by the intensity and frequency of extreme precipitation events. It reflects both natural environmental characteristics and urban planning-related factors. The exposure dimension encompasses human populations and material assets that are directly at risk during flood events; this is a broad conceptual category reflecting the scale of potential impact. Vulnerability, as a refinement of exposure, captures the degree of susceptibility and is represented by more granular indicators. Finally, capacity (or tolerance) evaluates a city’s ability to absorb, respond to, and recover from flood disasters, considering both human and financial resource allocations that reflect the local government’s disaster management efforts.
A detailed breakdown of the 13 indicators corresponding to each dimension is provided in Table 1.
Table 1
Classification of Disaster Risk Factors and Indicators
| Risk factor | Indicator | Data sources |
|---|---|---|
| Hazard | Heavy rainfall frequency | European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalyses (ERA5) |
| Hazard | Maximum precipitation | ECMWF ERA5 |
| Exposure | Portion of agricultural land | Resource and Environment Science and Data Center (RESDC) |
| Exposure | GDP | Shandong Province Statistical Yearbook (SPSY) |
| Exposure | Population density | SPSY |
| Exposure | Fixed assets investment | SPSY |
| Vulnerability | Portion of cultivated land | Resource and Environment Science and Data Center (RESDC) |
| Vulnerability | Elderly population density | Shandong Province Statistical Yearbook (SPSY) |
| Vulnerability | Disposable income | SPSY |
| Vulnerability | Number of self-employed individuals | SPSY |
| Capacity | Disaster prevention facility density | AutoNavi |
| Capacity | Drainage network density | Shandong Province Statistical Yearbook (SPSY) |
| Capacity | Expenditure for public security | SPSY |
The hazard component comprises two indicators: the number of heavy rainfall events (24-hour duration, mm) and the maximum 24-hour rainfall (mm). According to the definitions provided by the China Meteorological Administration, rainfall qualifies as “heavy rain” if it meets any of the following thresholds: over 16 mm per hour, over 30 mm within 12 consecutive hours, or 50 mm or more within 24 hours (Citation). In this study, the threshold of pr ≥ 50 mm of rainfall within 24 hours is adopted to identify heavy rainfall events.
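For concreteness, the two hazard indicators can be derived from a daily precipitation series in a few lines of code. This is a minimal sketch: the series below is hypothetical, and the 50 mm threshold follows the CMA definition cited above.

```python
import numpy as np

def heavy_rain_stats(daily_pr_mm):
    """Count heavy-rainfall events (pr >= 50 mm per 24 h, per the CMA
    threshold adopted in this study) and the maximum 24-hour rainfall."""
    pr = np.asarray(daily_pr_mm, dtype=float)
    n_events = int((pr >= 50.0).sum())   # heavy rainfall frequency
    max_24h = float(pr.max())            # maximum precipitation indicator
    return n_events, max_24h

# Hypothetical daily series (mm per 24 h):
events, max_pr = heavy_rain_stats([3.2, 55.0, 12.1, 81.4, 49.9])
# events == 2, max_pr == 81.4
```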
The exposure component comprises: proportion of agricultural land (%), Gross Domestic Product (GDP) per unit area (RMB/km²), population density (persons/km²), and fixed assets investment per unit area (RMB 100 million/km²). The vulnerability component comprises: proportion of cultivated land (%), elderly population density (persons aged 65 and above per km²), per capita disposable income (RMB/person), and number of self-employed individuals per unit area (persons/km²). The capacity component comprises: density of disaster prevention facilities (units/km²), drainage network density (km/km²), and public security expenditure per unit area (RMB/km²).

3. Methodology

3.1 Genetic Algorithm

The genetic algorithm (GA) is a population-based optimization technique that simulates the principles of natural selection and genetic evolution. It has been extensively applied to identify optimal or near-optimal solutions within large, complex search spaces. The conceptual foundation of GA is inspired by the process of biological evolution, wherein successive generations adapt and evolve through selection, crossover, and mutation. In a genetic algorithm, each potential solution is encoded as an “individual” within a population, typically represented by a string of binary digits, referred to as the genetic code or chromosome. Each individual corresponds to a candidate solution in the problem space and is evaluated using a fitness function that quantifies its performance or suitability for the given optimization objective. The algorithm begins by generating an initial population of individuals, each with a randomly assigned genetic code. Through simulated evolutionary operations, namely selection (based on fitness), crossover (recombination of genetic material), and mutation (random alterations of genes), new generations of individuals are produced. These operations enable the exploration of the solution space and promote the convergence of the population toward high-quality solutions. Through iterative evolution over multiple generations, the genetic algorithm refines the population and progressively approaches the global optimum. Owing to its robustness and adaptability, GA is particularly well-suited for addressing complex, high-dimensional, and nonlinear optimization problems where traditional methods may struggle to converge or become trapped in local optima.

3.2 Construction of Genetic Algorithm Fitness Function

The calculation of the fitness value is a critical step in the implementation of the genetic algorithm, as it serves to evaluate the quality of each individual (chromosome) within the solution space. A higher fitness value indicates a more optimal solution, and the outcome of this evaluation directly influences key genetic operations such as selection, crossover, and mutation. Consequently, the design of the fitness function significantly impacts the evolutionary trajectory and convergence efficiency of the algorithm.
In this study, the fitness function is designed based on the principle of minimizing conditional information entropy, under the assumption that information entropy reflects the degree of uncertainty within the system. Our objective is to identify a set of optimal weights w = (w1, w2, …, wn) that effectively integrate the contributions of individual evaluation factors and jointly minimize the conditional entropy of the target variable.
Accordingly, the core logic of the fitness function we propose can be summarized as follows:
Step 1: Population initialization:
The genetic algorithm begins by initializing the population, with the population size set to P = 100 individuals. Each individual represents a candidate solution in the form of a weight vector of length n, corresponding to the number of evaluation indicators. A total of P weight vectors are randomly generated, each satisfying the predefined constraints: every weight must be strictly positive ($w_i > 0$) and the weights must sum to one ($\sum_{i=1}^{n} w_i = 1$). The maximum number of generations is defined as T = 200, after which the algorithm terminates if no earlier convergence is achieved:
(1)
$$W_j = (w_1, w_2, \ldots, w_n)$$
Step 2: Given that the optimization objective is to maximize the target output Y, the fitness function is designed to maximize a scoring function $\max F(W)$. Because the fitness is defined as the negative of the weighted conditional entropy, a smaller weighted conditional entropy corresponds to a larger information gain, which aligns with the goal of enhancing model performance. The fitness function is therefore constructed such that minimizing the conditional entropy effectively maximizes the fitness value. This ensures that the genetic algorithm favors weight vectors that yield greater information gain and, consequently, more accurate flood risk assessments.
Step 3: Selection and retention: Roulette-wheel selection is used, normalizing the fitness values of all individuals into a selection probability $P_j$:
(2)
$$P_j = \frac{F(W_j) - F_{\min}}{\sum_{i=1}^{P}\left(F(W_i) - F_{\min}\right)}$$
Following an elite-retention strategy, the top 5% of individuals in the parent generation are copied directly into the next generation, bypassing crossover and mutation; this accelerates convergence and preserves the best solutions found so far.
Step 4: Crossover and mutation: For the remaining non-elite individuals in the parent generation, crossover is performed with a probability of 0.8, blending the chromosomes of two parents $W_A$ and $W_B$ with a coefficient $\alpha$; after crossover, the offspring chromosomes are normalized:
(3)
$$W_{\mathrm{child}} = \alpha W_A + (1 - \alpha) W_B$$
At the same time, with a probability of 0.1, a perturbation value is added to each offspring chromosome, which is then normalized again.
Steps 1-4 are iterated for T = 200 generations, and the optimal solution is finally output.
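The steps above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the fitness function here is a stand-in (the paper uses the negated weighted conditional entropy of the target), and the mutation scale is an assumed value.

```python
import numpy as np

rng = np.random.default_rng(0)
P, n, T = 100, 13, 200          # population size, #indicators, generations
ELITE, P_CROSS, P_MUT = 0.05, 0.8, 0.1

def normalize(w):
    """Project onto the constraint set: w_i > 0 and sum(w) = 1."""
    w = np.clip(w, 1e-9, None)
    return w / w.sum()

def fitness(w):
    """Stand-in fitness to maximize. In the paper this is the negated
    weighted conditional entropy of the target variable."""
    return -np.sum(w * np.log(w))

# Step 1: random initial population of P weight vectors
pop = np.array([normalize(rng.random(n)) for _ in range(P)])
for _ in range(T):
    f = np.array([fitness(w) for w in pop])
    # Step 3: roulette-wheel selection probabilities, Eq. (2)
    p = (f - f.min() + 1e-12) / (f - f.min() + 1e-12).sum()
    elite = pop[np.argsort(f)[-int(ELITE * P):]]     # top 5% copied directly
    children = []
    while len(children) < P - len(elite):
        a, b = pop[rng.choice(P, 2, p=p)]
        if rng.random() < P_CROSS:                   # Step 4: blend crossover, Eq. (3)
            alpha = rng.random()
            a = alpha * a + (1 - alpha) * b
        if rng.random() < P_MUT:                     # small random perturbation
            a = a + rng.normal(0.0, 0.05, n)
        children.append(normalize(a))
    pop = np.vstack([elite, children])

best = pop[np.argmax([fitness(w) for w in pop])]     # optimal weight vector
```

Note that the offspring are re-normalized after every crossover and mutation so that the constraints $w_i > 0$ and $\sum w_i = 1$ hold throughout.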

3.3 GBDT (Gradient Boosting Decision Tree)

GBDT (gradient boosting decision tree) is an ensemble learning method based on decision trees. Its core idea is to build a strong predictive model by iteratively training a series of weak learners (typically regression trees), where each new tree reduces the residual error, i.e., the difference between the true and predicted values, left by the current ensemble. Unlike traditional AdaBoost, GBDT employs the gradient descent method to optimize the objective function: in each iteration, it minimizes the loss function of the current model (such as mean squared error or logarithmic loss) and incrementally improves the model by fitting the residuals.
GBDT consists of multiple trees, each of which performs multiple node splits. The total contribution of each feature is calculated, and the sum of the information gain of a feature Xi at all split points in all trees reflects its importance:
(4)
$$\mathrm{Feature\ Importance}(X_i) = \sum_{\mathrm{splits}} IG(X_i)$$
Unlike the GA method, GBDT performs its optimization internally as a black box, but its underlying logic is likewise based on information gain. The procedures for calculating weights using GBDT and GA are therefore essentially similar; to avoid redundant description, only the differences between GBDT and GA are presented:
Build T = 200 regression trees; for each tree t, the feature that best reduces the loss is selected at each split,
(5)
$$X_t = \arg\max_{X_i} IG(X_i)$$
Each tree is fitted to the gradient-based pseudo-residuals:
(6)
$$r_i^{T} = -\left[\frac{\partial L\left(y_i, F(W_i)\right)}{\partial F(W_i)}\right]_{F(W_i) = F_{T-1}(W_i)}$$
Using the pseudo-residual as the new target value, train a regression tree hm(x); the output of this tree is a fit to the residual. Then calculate the optimal weight of the regression tree by finding the step coefficient γm that minimizes the objective function:
(7)
$$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\left(y_i, F_{m-1}(x_i) + \gamma h_m(x_i)\right)$$
Add the new tree to the current model and update the prediction value:
(8)
$$F_m(x) = F_{m-1}(x) + \gamma_m h_m(x)$$
The procedure is iterated T = 200 times, and the optimal solution is output.
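One way to realize this weighting step is via scikit-learn's GradientBoostingRegressor, whose feature importances sum each feature's impurity reduction (the regression-tree analogue of information gain) over all splits. The sketch below uses synthetic stand-in data rather than the study's 13-indicator dataset; the hyperparameters other than the tree count are illustrative defaults, not the authors' settings.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in: 16 cities x 13 indicators, target = flood-loss proxy
# that depends mainly on indicators 0 and 3.
X = rng.random((16, 13))
y = 2.0 * X[:, 0] + 1.0 * X[:, 3] + 0.1 * rng.random(16)

# T = 200 regression trees, matching the paper's setting.
gbdt = GradientBoostingRegressor(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ accumulates each feature's loss reduction over all
# splits in all trees and is normalized to sum to 1; these values serve as
# the indicator weights.
weights = gbdt.feature_importances_
```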

3.4 Entropy Method

The entropy weight method determines objective weights from the degree of variation of each indicator: the greater an indicator's variation, the more information it provides, the larger its role in the comprehensive evaluation, and hence the greater the weight it is assigned. The weight calculation process is as follows:
(9)
$$f_{ij} = \frac{b_{ij} + 1}{\sum_{i=1}^{k}\left(b_{ij} + 1\right)}, \qquad b_{ij} = \frac{x_{ij} - x_{\min}}{x_{\max} - x_{\min}}$$
(10)
$$H_j = -\frac{1}{\ln k} \sum_{i=1}^{k} f_{ij} \ln f_{ij}, \qquad 0 \le H_j \le 1$$
(11)
$$w_j^{*} = \frac{1 - H_j}{\sum_{j=1}^{n}\left(1 - H_j\right)}, \qquad \sum_{j=1}^{n} w_j^{*} = 1, \quad 0 \le w_j^{*} \le 1$$
where xij is the membership degree, fij is the normalized entropy information of the overall sample, Hj is the entropy value, and wj* is the entropy weight.
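Eqs. (9)-(11) translate directly into a few lines of NumPy. The input matrix below is random stand-in data, not the study's indicator dataset.

```python
import numpy as np

def entropy_weights(X):
    """Entropy weight method, Eqs. (9)-(11).
    X is a (k samples x n indicators) matrix of raw indicator values."""
    k, n = X.shape
    # Eq. (9): min-max normalize each column; shift by +1 so logs are defined
    b = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    f = (b + 1) / (b + 1).sum(axis=0)
    # Eq. (10): entropy of each indicator column, scaled into [0, 1]
    H = -(f * np.log(f)).sum(axis=0) / np.log(k)
    # Eq. (11): higher variation (lower entropy) yields a larger weight
    return (1 - H) / (1 - H).sum()

X = np.random.default_rng(1).random((16, 4))   # illustrative data
w = entropy_weights(X)
```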

3.5 Information Gain

Information Gain (IG) is a widely adopted metric in feature selection and decision tree learning, used to quantify the effectiveness of a feature in reducing uncertainty within a dataset. Specifically, IG measures the decrease in entropy—a statistical measure of randomness or impurity—when a dataset is partitioned based on a particular feature. A higher information gain indicates that the feature contributes more significantly to distinguishing between different data categories, thereby playing a more important role in the modeling process. Formally, the Information Gain of a feature with respect to a dataset is defined as:
(12)
$$IG(X_i) = H(Y) - H(Y \mid X_i)$$
(13)
$$H(Y) = -\sum_{j=1}^{n} P(Y_j) \log P(Y_j)$$
(14)
$$H(Y \mid X_i) = -\sum_{j=1}^{n} P(Y_j \mid X_i) \log P(Y_j \mid X_i)$$
where H(Y) is the information entropy of the target variable, which measures the uncertainty of the original data, and H(Y|Xi) is the conditional entropy, which measures the residual uncertainty of Y once factor Xi is known.
To assign higher weights to evaluation factors that contribute more substantially to the prediction of the target variable, it is necessary to compute the minimum weighted conditional entropy of the target. In this context, if a given evaluation factor Xi is highly informative in determining the outcome Y, the conditional entropy H(Y|Xi) will be relatively low. This indicates that the uncertainty in predicting Y is significantly reduced when Xi is known, thereby implying a higher information gain IG(Xi). Consequently, factors that yield smaller conditional entropy values are considered to have stronger explanatory power and should be assigned higher weights in the flood risk assessment model.
Based on the theory of information gain (IG), a constraint function for the genetic algorithm (GA) is constructed to guide the optimization process. Since Gradient Boosting Decision Trees (GBDT) inherently rely on the principle of information gain for feature selection and split decisions, the use of information gain as a criterion in parameter tuning is both theoretically consistent and practically effective.
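Eqs. (12)-(14) can be sketched as follows for a discretized evaluation factor. This is a toy illustration (base-2 logarithm, so entropy is in bits); the data are hypothetical.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a discrete label sequence, Eq. (13)."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(y, x):
    """IG(X_i) = H(Y) - H(Y|X_i), Eqs. (12) and (14), where x is a
    discretized evaluation factor and y is the target."""
    h_y_given_x = 0.0
    for v in set(x):
        sub = [yi for yi, xi in zip(y, x) if xi == v]
        h_y_given_x += (len(sub) / len(y)) * entropy(sub)
    return entropy(y) - h_y_given_x

# A factor that perfectly determines the target has IG = H(Y):
y = ['high', 'high', 'low', 'low']
x = [1, 1, 0, 0]
print(information_gain(y, x))   # -> 1.0 (bits)
```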

4. Results and Discussion

4.1 Results

Based on the statistical dataset of each evaluation factor up to the year 2016, a historical data model is constructed to calculate the weights of urban flood risk assessment factors for 16 prefecture-level cities in Shandong Province. These calculated weights are then applied to the statistical data from 2017 to 2019 to examine whether the historical model can effectively reproduce the actual flood loss rankings reported in government statistics during that period. The weights of the evaluation factors derived from the entropy method, genetic algorithm (GA), and gradient boosting decision tree (GBDT), based on the historical dataset, are presented in Table 2.
Table 2
Weights of Each Indicator Factor
| Risk factor | Indicator | Entropy | GA | GBDT |
|---|---|---|---|---|
| Hazard | Heavy rainfall frequency | 0.613291 | 0.2 | 0.734427 |
| Hazard | Maximum precipitation | 0.386709 | 0.8 | 0.265573 |
| Exposure | Proportion of agricultural land | 0.061129 | 0.288626 | 0.038596 |
| Exposure | GDP | 0.459245 | 0.162228 | 0.545837 |
| Exposure | Population density | 0.089896 | 0.34445 | 0.281302 |
| Exposure | Fixed assets investment | 0.389729 | 0.204697 | 0.134265 |
| Vulnerability | Portion of cultivated land | 0.359459 | 0.138547 | 0.034748 |
| Vulnerability | Elderly population density | 0.108812 | 0.19404 | 0.566454 |
| Vulnerability | Disposable income | 0.298862 | 0.350129 | 0.06621 |
| Vulnerability | Number of self-employed individuals | 0.232866 | 0.317285 | 0.332588 |
| Capacity | Disaster prevention facility density | 0.245054 | 0.279404 | 0.38657 |
| Capacity | Drainage network density | 0.452247 | 0.416951 | 0.241705 |
| Capacity | Expenditure for public security | 0.302698 | 0.303645 | 0.399639 |
Based on the obtained weights, urban flood disaster risk scores are calculated using different methods, and corresponding rankings are generated. The weights derived from historical data are applied to the statistical data from 2017 to 2019, and the resulting rankings are compared with the official flood loss rankings reported in government statistics for the same period (Tables 3~5). To evaluate the performance of each method, the root mean square error (RMSE) values (Table 6) between the model-generated rankings and the actual rankings are calculated and compared.
Table 3
Damage Ranking 2017
| Cities | Entropy | GA | GBDT | Observed damage ranking |
|---|---|---|---|---|
| Binzhou | 12 | 15 | 11 | 9 |
| Dezhou | 6 | 14 | 16 | 16 |
| Dongying | 11 | 16 | 15 | 14 |
| Heze | 15 | 11 | 9 | 13 |
| Jinan | 3 | 2 | 3 | 10 |
| Jining | 14 | 8 | 8 | 1 |
| Liaocheng | 7 | 9 | 12 | 15 |
| Linyi | 8 | 13 | 14 | 5 |
| Qingdao | 16 | 1 | 1 | 12 |
| Rizhao | 5 | 12 | 13 | 7 |
| Taian | 9 | 7 | 10 | 11 |
| Weifang | 2 | 10 | 7 | 2 |
| Weihai | 1 | 4 | 6 | 4 |
| Yantai | 13 | 6 | 5 | 6 |
| Zaozhuang | 10 | 5 | 4 | 8 |
| Zibo | 4 | 3 | 2 | 3 |
Table 4
Damage Ranking 2018
| Cities | Entropy | GA | GBDT | Observed damage ranking |
|---|---|---|---|---|
| Binzhou | 8 | 15 | 15 | 11 |
| Dezhou | 16 | 4 | 4 | 14 |
| Dongying | 4 | 16 | 16 | 16 |
| Heze | 9 | 12 | 13 | 1 |
| Jinan | 11 | 2 | 2 | 10 |
| Jining | 7 | 9 | 10 | 6 |
| Liaocheng | 12 | 10 | 8 | 13 |
| Linyi | 15 | 5 | 5 | 2 |
| Qingdao | 13 | 1 | 1 | 3 |
| Rizhao | 2 | 14 | 14 | 12 |
| Taian | 14 | 7 | 7 | 8 |
| Weifang | 10 | 13 | 12 | 4 |
| Weihai | 1 | 6 | 9 | 15 |
| Yantai | 3 | 11 | 11 | 7 |
| Zaozhuang | 5 | 8 | 6 | 9 |
| Zibo | 6 | 3 | 3 | 5 |
Table 5
Damage Ranking 2019
| Cities | Entropy | GA | GBDT | Observed damage ranking |
|---|---|---|---|---|
| Binzhou | 6 | 15 | 12 | 13 |
| Dezhou | 11 | 11 | 15 | 15 |
| Dongying | 7 | 16 | 14 | 10 |
| Heze | 13 | 7 | 16 | 11 |
| Jinan | 5 | 2 | 2 | 5 |
| Jining | 14 | 5 | 11 | 8 |
| Liaocheng | 8 | 12 | 13 | 14 |
| Linyi | 9 | 13 | 10 | 7 |
| Qingdao | 16 | 1 | 1 | 4 |
| Rizhao | 2 | 14 | 7 | 9 |
| Taian | 12 | 6 | 9 | 1 |
| Weifang | 15 | 10 | 5 | 12 |
| Weihai | 1 | 8 | 6 | 2 |
| Yantai | 4 | 9 | 4 | 3 |
| Zaozhuang | 10 | 4 | 8 | 16 |
| Zibo | 3 | 3 | 3 | 6 |
Table 6
RMSE Results
| Method | Metric | 2017 | 2018 | 2019 |
|---|---|---|---|---|
| GA | RMSE | 5.57 | 5.7 | 5.11 |
| GA | Enhancement vs. entropy | 0.36% | 23.69% | 10.66% |
| GBDT | RMSE | 5.09 | 5.66 | 4.23 |
| GBDT | Enhancement vs. entropy | 8.94% | 24.23% | 26.05% |
In this study, differences in the calculated rankings were evaluated using the root mean square error (RMSE). Using the ranking results derived from the entropy method as a baseline, the outcomes obtained through GA and GBDT were compared against those of the entropy method to examine the similarities and differences in ranking performance under a shared theoretical framework, as well as their respective proximities to the actual observed values. All disaster damage data are based on the annual government work report of each city’s mayor, and also refer to the annual statistics of local competent authorities.
The RMSE results indicate that both the GA and GBDT methods achieve varying degrees of improvement in ranking accuracy compared to the entropy-based method. Under the same information gain framework, the GBDT approach demonstrates a more substantial enhancement in the accuracy of weight assignment. This improvement may be attributed to the distinct algorithmic characteristics of the two methods. The GA method, as a global optimization algorithm based on evolutionary principles, is sensitive to the design of the objective function and initial parameters. During the processes of chromosome crossover and mutation, portions of the optimal solution may be lost, potentially affecting the final result. In contrast, GBDT is a black-box optimization algorithm that operates as an ensemble learning method based on decision trees. It iteratively reduces residual errors from previous models, enabling the capture of complex nonlinear relationships and the integration of multiple weak learners into a strong predictive model. GBDT exhibits strong capabilities in modeling nonlinear interactions among features, achieves rapid error convergence, and is particularly well-suited for small to medium-sized datasets.
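The RMSE comparison can be reproduced directly from the published tables. For example, evaluating the 2017 GBDT ranking against the observed damage ranking (both from Table 3) recovers the 5.09 reported in Table 6.

```python
import numpy as np

def ranking_rmse(model_rank, observed_rank):
    """RMSE between a model-generated city ranking and the observed
    flood-loss ranking."""
    m = np.asarray(model_rank, dtype=float)
    o = np.asarray(observed_rank, dtype=float)
    return float(np.sqrt(np.mean((m - o) ** 2)))

# 2017 GBDT and observed damage rankings, cities in Table 3 order:
gbdt_2017 = [11, 16, 15, 9, 3, 8, 12, 14, 1, 13, 10, 7, 6, 5, 4, 2]
observed  = [9, 16, 14, 13, 10, 1, 15, 5, 12, 7, 11, 2, 4, 6, 8, 3]
print(round(ranking_rmse(gbdt_2017, observed), 2))   # -> 5.09, as in Table 6
```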

4.2 Discussion

The assessment results of GA and GBDT are shown in Figs. 6 and 7. In the GA modeling process, general default values were used for the initialization parameters, which may have influenced the overall optimization performance. As a global optimization algorithm, GA is susceptible to the loss of optimal solutions during the evolutionary process, particularly in the stages of chromosome crossover and mutation. In contrast, GBDT operates as a built-in algorithm with an implicit computation process. Once its parameters are defined, the algorithm proceeds automatically according to its internal mechanism. It remains uncertain whether the task performed by GBDT is equivalent to the fitness function explicitly defined in the GA framework. However, since GBDT continuously optimizes the residuals at each iteration, it effectively incorporates an additional refinement step compared to GA. This may explain the superior performance observed in the results generated by the GBDT model.
Fig. 6
The Assessment Results of GBDT
Fig. 7
The Assessment Results of GA

5. Conclusion

Based on the theory of information gain and historical statistical data, this study applied the GA and GBDT methods to calculate the weights of urban flood risk evaluation factors for 16 prefecture-level cities in Shandong Province, followed by model construction and risk assessment. The entropy weight method was employed as the baseline for comparison, and the results were validated using actual disaster data from 2017 to 2019. The findings indicate that both the GA and GBDT methods improved the accuracy of urban flood risk ranking to varying degrees, with the weights derived from the GBDT method exhibiting a more pronounced enhancement over those obtained through the entropy method.

Acknowledgement

This work was supported by Korea Environmental Industry & Technology Institute (KEITI) through R&D Program for Innovative Flood Protection Technologies against Climate Crisis Project, funded by Korea Ministry of Environment (MOE) (2022003460002).
