J. Korean Soc. Hazard Mitig. > Volume 17(2); 2017 > Article
Yu, Necesito, Kim, Cheong, and Jeong: Development of Multivariate Flood Damage Function for Flood Damage Assessment in Gunsan City, Korea

Abstract

Despite Korea's economic growth and improved disaster prevention technology, natural disasters such as floods still threaten the public; in particular, flood damage caused by typhoons has been a cause of enormous property loss and casualties. Estimating the cost of flood damage is very important for preparing flood countermeasures and reducing flood damage. In this regard, the Ministry of Public Safety and Security, the government agency designated to evaluate and analyze damages and losses and to assess regional disaster risk in accordance with disaster risk management plans, is developing a new loss estimation method. The purpose of this study is to develop flood damage loss functions that estimate flood damage in Gunsan City by building type (residential, commercial and agricultural), using Ordinary Least Squares regression and Geographically Weighted Regression. The models are built with flood depth, flood duration, inundated area, family income and land price as parameters. Both methods were evaluated for flood damage estimation, and Geographically Weighted Regression was found to be the more suitable.

Summary

Despite Korea's growing economy and improving disaster prevention techniques, natural disasters such as floods, typhoons and droughts still threaten people. Flood disasters, including those induced by typhoons, have caused significant damage to property and loss of human life. Estimating flood damage is essential for devising countermeasures that mitigate flood disasters. In this regard, the Ministry of Public Safety and Security (MPSS), the government institution designated to assess and analyze damages and losses and to evaluate the disaster risks of affected areas in accordance with their disaster risk management plans, is now developing a new method for estimating damages and losses. This study aims to develop flood damage functions that estimate the flood damage in Gunsan City by building type (residential, commercial and agricultural facilities), using Ordinary Least Squares (OLS) regression and then Geographically Weighted Regression (GWR). The models use flood depth, flood duration, inundated area, family income and land price as the parameter variables. Both OLS and GWR were evaluated in this study, and GWR was found to give the better fit.

1. Introduction

Korea has become one of the leading countries in the Northwestern Pacific region since the 1960s owing to its rapid economic growth (Kim et al., 2007). However, this growth has been offset by the damages and losses caused by weather-related disasters, most notably those brought by typhoons. Typhoons Rusa (2002), Maemi (2003) and Sanba (2012) caused significant damage to property and loss of human life in the country. The losses, both economic and insured, were estimated at billions of US dollars.
In response to these catastrophic events, Korea’s National Disaster Management Institute (NDMI) is taking steps to improve the prevention and mitigation of the increasing damages and losses during flood events. The Multi-Dimensional Flood Damage Analysis (MDFDA), adopted decades ago for estimating flood damage, has been found to over- or underestimate flood damages. Because the method originated in Japan and was not derived from actual conditions in Korea, the factors it uses may not be accurately applicable.
Korea is exposed to a range of natural hazards, including typhoons, floods, droughts, landslides, snowstorms, tsunamis and earthquakes, at both small and large scales. In addition, the country has a growing population of 51,202,130 and a population density of about 513 people per sq. km (Ministry of Security and Public Administration, 2014). With the population having increased by approximately 4% since 2009, and vulnerability to hazards growing in proportion, better disaster management must be prioritized to ensure the safety of the people.
Flood damage refers to all the varieties of harm caused by flooding (Messner and Meyer, 2005). Tangible damages are those that can be evaluated quantitatively in economic terms, such as damage to lifelines and buildings, and can therefore be expressed monetarily. Intangible damages, such as loss of life, cannot be measured in monetary terms. Flood damage is further divided into direct and indirect damage. Direct damages are caused by physical contact with floodwater, whereas indirect damages are caused by the interruption and disruption of economic and social activities as a consequence of direct flood damage, as shown in Fig. 1. Examples of indirect damage include traffic disruptions, reduced productivity and reduced competitiveness of economic sectors due to affected public services (Smith and Ward, 1998). Indirect damage is the most challenging part of the recovery phase.
Fig. 1
Concept Diagram of Direct and Indirect Damages
The flood depth-damage curve (Smith, 1994), also known as the loss function (Smith, 1994; White, 1945, 1964), is the most frequently used way to estimate damage. There are two ways to obtain a depth-damage curve. One is statistical analysis of damage data collected after a flood; the other is hypothetical analysis, in which the flood condition is simulated and a synthetic depth-damage curve is generated (Smith, 1994; Dutta et al., 2003). The U.S. Army Corps of Engineers, for example, used data collected after the 1983, 1986, 1995 and 1996 flood events in California's Central Valley. Penning-Rowsell et al. (1977) divided buildings into 21 categories and determined a total of 168 depth-damage curves, covering each building type for two durations and four types of societal condition. Rainfall, topography, and meteorological, physical and human factors such as flood prevention measures also influence the resulting damage (Yang et al., 2005). Besides building type (Smith, 1994; FEMA, 1977; McBean et al., 1988; Chang et al., 2008), other parameters were also found to affect the damage: income per household (Lekuthai et al., 2001; McBean et al., 1988), flood forecast and alarm systems (Wind et al., 1999; David, 2001; Du Plessis, 2002), how far in advance the flood is recognized (Penning-Rowsell et al., 1977; Thieken et al., 2005), flood experience (McBean et al., 1988; Wind et al., 1999; McPherson et al., 1977), disaster prevention (Penning-Rowsell et al., 1977), flood frequency (Lekuthai et al., 2001; McBean et al., 1988; Thieken et al., 2005), flood velocity (Smith, 1994; CH2M HILL, 1974; Black, 1975; Beck et al., 2002), number of family members per household (McBean, 1988; Shaw et al., 2005) and building location (Chang et al., 2008; Shaw et al., 2005). Because flood damage is caused by various factors, Shaw et al. (2005) suggested the use of a multiple regression analysis model.
This study develops multivariate flood damage functions for the residential, commercial and agricultural sectors based on data collected on flood impact, building characteristics, socio-economic status and damage after the flood event of August 2012 in Gunsan City.

2. Study Area

Gunsan City (see Fig. 2) is a city in North Jeolla Province (Jeollabuk-do), located south of the Geum River. It sits on the fertile western Honam plain, where much rice is harvested, and its economy currently thrives on fishing and agriculture. The city has a total area of 680.11 sq. km and a population of 278,495 in 111,275 households, giving a population density of about 410/km2. Jeollabuk-do comprises 14 districts and ranks 8th among the country’s most flood-vulnerable provinces, which motivated the authors to carry out a statistical analysis of the data available for Gunsan City. The area has been subject to several flood events in the past. The city has also experienced significant population growth, especially during the last decade, and its vulnerability to flood damage is increasing in proportion to its level of urbanization.
Fig. 2
Gunsan City, Korea

2.1 Flood Impact Parameters

Certain factors have been shown to affect the impacts of flooding incidents. Knowing these factors helps to improve disaster risk management in terms of awareness and preparedness. Water depth, flood duration, inundated area, family income and land price are the factors influencing flood damage that are used in this study.
Stage-damage curves are the usual representation of flood damage versus flood depth, with depth as the independent variable. They are one way to predict the most probable amount of damage for a given flood depth. Flood depth is also among the most abundant data recorded during flooding incidents, because it is easy to observe and therefore easy to quantify. Flood duration is another factor in determining the impacts of floods. Run-up time and the time between the first warning and the actual flood define flood duration. In cases such as flash floods, where the run-up time is short, both the threat and the damage are severe. Inundated area is also regarded as a factor in this study: the larger the flooded area, the more people, buildings, infrastructure and other property are affected, and the larger the damage.
Family income is a further factor. The number of people living in a building, whether residential, commercial or of another type, is taken into account. For example, for a family of three living in a house in which two members (the father and the mother) are working, the expected income might be about five million won (about 5,000 USD). In that case, the property value can be estimated from the income using a capitalization rate method. Since flooded area is one of the considerations for monetary flood impacts, land price should also be taken into account. In Korea, commercial and agricultural land carry great weight in the economy. If such land is flooded, the resulting damage is included in the total reported damage. For agricultural land, for example, the damage also accounts for the fruits and vegetables that would otherwise have been harvested.
All of the factors above contribute to flood impacts. The impacts may vary in degree, especially across the quantified direct, indirect, tangible and intangible damages, a breakdown that is outside the scope of this study.

3. Methodology

3.1 Ordinary Least Squares

Ordinary Least Squares (OLS) regression is a standard way to describe the relationship between a dependent variable and a set of parameter variables. One of its conditions is that the error terms εi are independent and identically distributed (IID) random variables with mean zero and constant variance σ2. The general equation of the OLS model is:
(1)
\[ y_i = \beta_0 + \sum_{j=1}^{p} X_{ij}\beta_j + \varepsilon_i \]
Where:
β0 = Intercept coefficient
βj = Slope coefficient for the jth independent variable Xij
εi = Random error term
I = n × n identity matrix, so that the error covariance matrix is σ2I
In matrix form, the model is:
(2)
\[ Y = X\beta + \varepsilon \]
Under the assumptions of independence and constant variance, the estimate of β is:
(3)
\[ \hat{\beta} = (X^{T}X)^{-1}X^{T}Y \]
Where:
(X^T X)^{-1} = inverse of the matrix X^T X
T = matrix transpose
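As a minimal sketch of Eq. (3), the following pure-Python example forms X^T X and X^T Y and solves the resulting normal equations. The depth and damage values are hypothetical and chosen so that the true coefficients are known; the helper names (`transpose`, `matmul`, `solve`, `ols`) are illustrative and are not taken from the study.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def ols(X, y):
    """beta_hat = (X^T X)^{-1} X^T y; X must already contain a column of ones."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * v for x, v in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)

# Hypothetical data: damage = 100 + 2000 * depth, with no noise
depth = [0.1, 0.5, 1.0, 1.5, 2.0]
X = [[1.0, d] for d in depth]
y = [100 + 2000 * d for d in depth]
beta = ols(X, y)
print(beta)  # intercept and slope
```

Solving the normal equations directly, rather than inverting X^T X explicitly, gives the same estimate and is the more common numerical choice.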
Analysis of Variance (ANOVA) is one method to check whether the regression model as a whole is statistically significant, using the F-value and its p-value. The t-test (with its p-value) is used to check the statistical significance of each individual parameter variable. The t-statistic is:
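(4)
\[ T = \frac{\hat{\beta}}{s / \sqrt{s_{xx}}} \]
where s is the residual standard error and the remaining quantities are given by the following sub-equations:
(5)
\[ \hat{\beta} = \frac{s_{xy}}{s_{xx}} \]
(6)
\[ s_{xy} = \sum (x - \bar{x})(y - \bar{y}) \]
(7)
\[ s_{xx} = \sum (x - \bar{x})^2 \]
Lastly, the value of R-squared is calculated as
(8)
\[ R^2 = \frac{s_{xy}^2}{s_{xx}\, s_{yy}} \]
Where:
(9)
\[ s_{yy} = \sum (y - \bar{y})^2 \]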
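The quantities in Eqs. (4) to (9) can be worked through for a single explanatory variable as follows. This is only a sketch on made-up numbers; it assumes that s in Eq. (4) is the residual standard error with n − 2 degrees of freedom, which the text does not state explicitly.

```python
import math

def simple_regression(x, y):
    """Return slope, t-statistic and R^2 for y regressed on a single x."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxy = sum((a - xb) * (b - yb) for a, b in zip(x, y))   # Eq. (6)
    sxx = sum((a - xb) ** 2 for a in x)                    # Eq. (7)
    syy = sum((b - yb) ** 2 for b in y)                    # Eq. (9)
    beta = sxy / sxx                                       # Eq. (5)
    alpha = yb - beta * xb
    sse = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
    s = math.sqrt(sse / (n - 2))        # assumed: residual standard error
    t = beta / (s / math.sqrt(sxx))                        # Eq. (4)
    r2 = sxy ** 2 / (sxx * syy)                            # Eq. (8)
    return beta, t, r2

# Hypothetical sample
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
beta, t, r2 = simple_regression(x, y)
print(beta, t, r2)
```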
Redundancy among model variables shows up as collinearity. In this research, the multicollinearity condition number was used to detect this problem in the model. A condition number above 30 is associated with large variances and covariances, wide confidence intervals and insignificant coefficients. A value exceeding this threshold therefore implies that the model is not reliable.
The normality of the datasets can be checked with the Shapiro-Wilk test, whose statistic is:
(10)
\[ W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]
Where:
xi = sample value, with x(i) denoting the ith smallest sample value
ai = constants generated from the means, variances and covariances of the order statistics of a sample from the normal distribution
If the Shapiro-Wilk statistic W is too small, normality is rejected. The Jarque-Bera test offers another way to test normality:
(11)
\[ JB = \frac{n}{6}\left(S^2 + \frac{K^2}{4}\right) \]
where S and K are given by the following sub-equations:
(12)
\[ S = \frac{\sum_{i=1}^{n} (Y_i - \mu)^3}{n s^3} \]
(13)
\[ K = \frac{\sum_{i=1}^{n} (Y_i - \mu)^4}{n s^4} - 3 \]
Where:
n = number of observations
Y = sample data
μ = mean
s = standard deviation
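Eqs. (11) to (13) translate directly into a few lines of code. The sketch below assumes that s is the standard deviation computed with divisor n (the usual convention for this test, though the text does not specify it) and uses a small hypothetical sample.

```python
import math

def jarque_bera(y):
    """Return skewness S, excess kurtosis K and the JB statistic."""
    n = len(y)
    mu = sum(y) / n
    s = math.sqrt(sum((v - mu) ** 2 for v in y) / n)  # assumed: divisor n
    S = sum((v - mu) ** 3 for v in y) / (n * s ** 3)          # Eq. (12)
    K = sum((v - mu) ** 4 for v in y) / (n * s ** 4) - 3      # Eq. (13)
    jb = n / 6 * (S ** 2 + K ** 2 / 4)                        # Eq. (11)
    return S, K, jb

# Hypothetical symmetric sample: skewness is zero, kurtosis is below normal
S, K, jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(S, K, jb)
```

Under the null hypothesis of normality, JB is compared against a chi-squared distribution with two degrees of freedom.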
If the Jarque-Bera statistic (JB) is statistically significant, the normality assumption is rejected. In addition, the Breusch-Pagan test for random coefficients and the White test for specification robustness were both performed to check for the presence of spatial heteroscedasticity.
To illustrate, if
(14)
\[ Y_i = \alpha + \beta X_i + \varepsilon_i \]
Where:
i = 1, …, N
E(εi) = 0
Then, the auxiliary regression is given by
(15)
\[ Z_i^2 = \delta_0 + \delta X_i + \upsilon_i \]
where the dependent variable is defined by the following sub-equations:
(16)
\[ s^2 = \frac{\sum_{i} \hat{u}_i^2}{N} \]
(17)
\[ Z_i^2 = \frac{\hat{u}_i^2}{s^2} \]
Here ûi is the estimated residual of Eq. (14); the symbol “^” indicates an estimated value. If the coefficient of Xi in Eq. (15) is 0, the error variance is homoscedastic; otherwise, it is heteroscedastic. The same logic applies to the White test. If the regression model is,
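(18)
\[ Y_i = \alpha + \beta_1 X_i + \beta_2 W_i + \varepsilon_i \]
Then, the auxiliary regression model is,
(19)
\[ \hat{u}_i^2 = \delta_0 + \delta_1 X_i + \delta_2 W_i + \delta_3 X_i^2 + \delta_4 W_i^2 + \delta_5 X_i W_i + \upsilon_i \]
As in the Breusch-Pagan test, if the coefficients δ1 to δ5 are all 0, the error variance is homoscedastic; if not, it is heteroscedastic.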
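The Breusch-Pagan steps in Eqs. (14) to (17) can be sketched for one explanatory variable as below. The data are hypothetical, with an error size that grows with x. The returned statistic is half the explained sum of squares of the auxiliary regression, which is the classical form of the Breusch-Pagan statistic and is an assumption here, since the text does not give the formula of the test statistic itself.

```python
def fit_line(x, y):
    """Least-squares intercept and slope of y on a single x."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b = (sum((a - xb) * (c - yb) for a, c in zip(x, y))
         / sum((a - xb) ** 2 for a in x))
    return yb - b * xb, b

def breusch_pagan(x, y):
    """Return the auxiliary slope and the LM statistic (ESS / 2)."""
    a, b = fit_line(x, y)                                   # Eq. (14)
    u2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]   # squared residuals
    s2 = sum(u2) / len(u2)                                  # Eq. (16)
    z2 = [v / s2 for v in u2]                               # Eq. (17)
    d0, d1 = fit_line(x, z2)                                # Eq. (15)
    zbar = sum(z2) / len(z2)
    ess = sum((d0 + d1 * xi - zbar) ** 2 for xi in x)
    return d1, ess / 2

# Hypothetical heteroscedastic data: the error magnitude equals x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
e = [1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, -8.0]
y = [2 * xi + ei for xi, ei in zip(x, e)]
d1, lm = breusch_pagan(x, y)
print(d1, lm)
```

A positive auxiliary slope, as obtained here, means the squared residuals grow with x; with one explanatory variable the statistic would be compared against a chi-squared distribution with one degree of freedom.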

3.2 Geographically Weighted Regression

Geographically Weighted Regression (GWR) is designed to reveal non-stationarity, in which locally weighted regression coefficients move away from their global values (Bivand, 2014). GWR allows for the possibility that the coefficients of a global model are not accurate enough compared with those of a local model. Local variation in the coefficients can be taken as an indication of non-stationarity. In some studies, GWR provided a better specification than global models such as OLS (Yrigoyen et al., 2008).
To take the spatially varying characteristics of flood damage into account, the damage function can be modified using the GWR method:
(20)
\[ y_i = \beta_0(u_i, v_i) + \beta_1(u_i, v_i)\, x_i + \beta_2(u_i, v_i)\, x_i^2 + \varepsilon_i \]
where β0(ui, vi), β1(ui, vi) and β2(ui, vi) are realizations of continuous functions at point i, whose coordinates are (ui, vi), and εi is the residual at that point. Since GWR recognizes the possibility of spatial variation (Chang et al., 2008), the GWR estimate is:
(21)
\[ \hat{\beta} = (X^{T} W X)^{-1} X^{T} W Y \]
where W is an n × n matrix whose off-diagonal elements are zero and whose diagonal elements denote the geographical weights of the observed data for point i. The weight of each observation is given by:
(22)
\[ w_{ij}(u_i, v_i) = \left(1 - \left(\frac{d_{ij}(u_i, v_i)}{h}\right)^3\right)^3 \]
where dij is the Euclidean distance between observations i and j, and h is the bandwidth.
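To make the local estimation concrete, the sketch below applies Eqs. (21) and (22) at a single location. It is a simplification under stated assumptions: the local model has only an intercept and one linear term (not the full form of Eq. (20)), weights are set to zero at or beyond the bandwidth (the usual tri-cube convention, not stated in the text), and the coordinates and values are hypothetical, built so that the slope is 2 in the west and 5 in the east.

```python
import math

def tricube(d, h):
    """Tri-cube weight of Eq. (22); zero at or beyond the bandwidth (assumed)."""
    return (1.0 - (d / h) ** 3) ** 3 if d < h else 0.0

def gwr_fit_at(point, coords, x, y, h):
    """Weighted least squares at one location for y = b0 + b1 * x.

    Equivalent to Eq. (21) with design matrix columns [1, x].
    """
    w = [tricube(math.dist(point, c), h) for c in coords]
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return yw - b1 * xw, b1

# Hypothetical observations on a line from u = 0 to u = 10
coords = [(float(u), 0.0) for u in range(11)]
x = [0.2, 0.5, 0.9, 1.4, 0.3, 0.8, 1.1, 0.4, 1.6, 0.7, 1.2]
y = [1 + (2 if u < 5 else 5) * xi for (u, _), xi in zip(coords, x)]

west = gwr_fit_at((1.0, 0.0), coords, x, y, h=3.0)
east = gwr_fit_at((9.0, 0.0), coords, x, y, h=3.0)
print(west, east)
```

A single global OLS fit on these data would return one compromise slope, whereas the local fits recover the two regimes; this is the kind of non-stationarity GWR is meant to expose.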

3.3 Box-Cox Method

The Box-Cox method is a useful way to normalize non-normal datasets. The transformation is defined as:
(23)
\[ y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0 \\ \log(y), & \text{otherwise} \end{cases} \]
λ is treated as a parameter of the likelihood function, and the profile likelihood is evaluated to find the optimal value of λ.
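The transformation and the search for λ can be sketched as follows. The damage values are hypothetical, the grid of candidate λ values from -2 to 2 is an arbitrary choice, and the profile log-likelihood is the standard one for a sample with a constant mean (constants dropped); none of these details are specified in the text.

```python
import math

def box_cox(y, lam):
    """Eq. (23): (y**lam - 1) / lam if lam != 0, else log(y); requires y > 0."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def profile_loglik(data, lam):
    """Profile log-likelihood of lambda, up to an additive constant."""
    n = len(data)
    z = [box_cox(v, lam) for v in data]
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(v) for v in data)

# Hypothetical right-skewed damage amounts
damage = [100.0, 150.0, 220.0, 400.0, 900.0, 2500.0, 8000.0, 60000.0]
grid = [i / 10 for i in range(-20, 21)]
best_lam = max(grid, key=lambda lam: profile_loglik(damage, lam))
transformed = [box_cox(v, best_lam) for v in damage]
print(best_lam)
```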

4. Results and Discussion

4.1 Descriptive Statistics

Descriptive statistics of the model variables are provided as a general overview of the datasets. The mean values differ widely between parameters because of the different units in which the parameters are measured. The standard deviations likewise show that dispersion varies from variable to variable.
The statistics also show that the standard error of the mean is quite high for several variables, so the sample means are imprecise estimates of the true population means. The standard deviations of most variables are large as well, which shows that the data points are spread over a wide range of values.
Skewness is positive for all variables except the flood duration (dur) of commercial facilities (see Tables 1 to 3). Positive values indicate distributions skewed to the right; the negative value indicates a distribution skewed to the left. The Shapiro-Wilk test consistently gave p-values below 0.01 (p-value = 0.00000), so normality is rejected. To allow a clear comparison, the author therefore normalized the datasets through the Box-Cox transformation; the coefficients for the untransformed and transformed datasets are shown in Tables 4 and 5.
Table 1
Descriptive Statistics of the Model Variables for Residential Facilities

| Statistic | dam | dep | dur | far | inc | lp |
| --- | --- | --- | --- | --- | --- | --- |
| N | 496 | 496 | 496 | 496 | 496 | 496 |
| Min | 100.000 | 0.010 | 1.000 | 12.000 | 1.000 | 19.459 |
| Max | 60,000.000 | 2.000 | 965.000 | 37,008.000 | 5.000 | 2,367.230 |
| Mean | 2,656.661 | 0.262 | 185.718 | 706.450 | 2.050 | 337.611 |
| SE Mean | 265.929 | 0.019 | 5.006 | 175.255 | 0.026 | 8.884 |
| Std Dev | 5,922.519 | 0.425 | 111.498 | 3,903.124 | 0.572 | 197.867 |
| Skewness | 4.728 | 2.492 | 2.144 | 8.572 | 0.917 | 3.865 |
| Kurtosis | 28.638 | 5.555 | 12.501 | 74.228 | 3.408 | 27.660 |

dam: damage amount; dep: flood depth; dur: flood duration; far: inundated area; inc: family income; lp: land price

Table 2
Descriptive Statistics of the Model Variables for Commercial Facilities

| Statistic | dam | dep | dur | far | inc | lp |
| --- | --- | --- | --- | --- | --- | --- |
| N | 752 | 752 | 752 | 752 | 752 | 752 |
| Min | 100.000 | 0.010 | 4.000 | 2.000 | 1.000 | 19.459 |
| Max | 2.15 | 2.250 | 299.000 | 15 | 9.000 | 2.5963 |
| Mean | 6.543 | 0.392 | 183.019 | 13 | 2.407 | 550.160 |
| SE Mean | 597.108 | 0.020 | 2.792 | 187.346 | 0.034 | 15.120 |
| Std Dev | 1.643 | 0.548 | 76.566 | 5.14 | 0.926 | 414.641 |
| Skewness | 6.971 | 2.070 | -0.511 | 12.195 | 2.723 | 2.003 |
| Kurtosis | 62.597 | 3.398 | -0.904 | 199.362 | 14.661 | 4.972 |

dam: damage amount; dep: flood depth; dur: flood duration; far: inundated area; inc: family income; lp: land price

Table 3
Descriptive Statistics of the Model Variables for Agricultural Facilities

| Statistic | dam | dep | dur | far | inc | lp |
| --- | --- | --- | --- | --- | --- | --- |
| N | 30 | 30 | 30 | 30 | 30 | 30 |
| Min | 100.000 | 0.020 | 14.000 | 123.000 | 1.000 | 8.760 |
| Max | 21,008.000 | 1.960 | 973.000 | 4,409.000 | 4.000 | 526.244 |
| Mean | 2,718.433 | 0.722 | 207.700 | 959.967 | 1.933 | 96.172 |
| SE Mean | 964.748 | 0.109 | 31.118 | 173.444 | 0.172 | 27.340 |
| Std Dev | 5,284.144 | 0.598 | 170.440 | 949.994 | 0.944 | 149.749 |
| Skewness | 2.515 | 0.632 | 3.123 | 2.102 | 0.929 | 2.150 |
| Kurtosis | 5.933 | -0.974 | 14.222 | 5.119 | 0.233 | 3.750 |

dam: damage amount; dep: flood depth; dur: flood duration; far: inundated area; inc: family income; lp: land price

Table 4
OLS and GWR Results for Untransformed Datasets

| Parameter variable | Residential (OLS) | Residential (GWR) | Commercial (OLS) | Commercial (GWR) | Agricultural (OLS) | Agricultural (GWR) |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | -149.248 | 1203.646 | 2692.710 | 754.041 | -4790.713 | -7643.386 |
| dep | 7339.759 | 5742.459 | 11892.550 | 17247.551 | 3509.237 | 3001.392 |
| dur | -0.041 | 1.142 | -3.954 | -0.135 | 6.721 | 22.325 |
| far | -0.014 | 1.233 | 0.203 | -5.403 | 0.936 | 3.370 |
| inc | 491.837 | 94.574 | 134.682 | 435.851 | 1222.567 | -5.295 |
| lp | -0.322 | -4.341 | -1.155 | -2.476 | 3.301 | -11.530 |

dam: damage amount; dep: flood depth; dur: flood duration; far: inundated area; inc: family income; lp: land price

Table 5
OLS and GWR Results for Transformed Datasets

| Parameter variable | Residential (OLS) | Residential (GWR) | Commercial (OLS) | Commercial (GWR) | Agricultural (OLS) | Agricultural (GWR) |
| --- | --- | --- | --- | --- | --- | --- |
| Intercept | -4.0×10^-6 | -1.842×10^-3 | 0.230 | -1.842×10^-3 | 0.073 | -0.035 |
| BCDEP | 9.59×10^-4 | 6.140×10^-4 | -0.150 | 6.140×10^-4 | -0.025 | -0.021 |
| BCDUR | 2.0×10^-6 | 1.3×10^-5 | 2.7×10^-5 | 1.3×10^-5 | -7.11×10^-4 | -0.003 |
| BCFAR | 1.0×10^-3 | 1.453×10^-3 | 0.025 | 1.453×10^-3 | 0.452 | 0.346 |
| BCINC | 5.5×10^-5 | 8.0×10^-5 | 1.0×10^-3 | 8.0×10^-5 | -0.020 | 0.014 |
| BCLP | -8.0×10^-5 | 2.85×10^-4 | 2.522×10^-3 | 2.85×10^-4 | 2.57×10^-4 | 0.428 |

BCDAM: transformed damage amount; BCDEP: transformed flood depth; BCDUR: transformed flood duration; BCFAR: transformed inundated area; BCINC: transformed family income; BCLP: transformed land price

4.2 Analysis of OLS and GWR Results

In the untransformed OLS model for residential facilities, all coefficients are negative except those for flood depth (7339.759) and family income (491.837); flood duration has -0.041, inundated area -0.014 and land price -0.322. Thus the estimated damage increases with flood depth and with the income of the affected families, and decreases with flood duration, inundated area and land price. The constant of the regression is -149.248, which differs somewhat from the other untransformed OLS models. This could signify that variables are missing from the model (the constant is the predicted damage when all five parameters are zero). For the transformed dataset, the intercept (-4.0×10^-6) and the land price coefficient (BCLP = -8.0×10^-5) are negative, whereas inundated area (BCFAR = 1.0×10^-3), flood depth (BCDEP = 9.59×10^-4), family income (BCINC = 5.5×10^-5) and flood duration (BCDUR = 2.0×10^-6) are positive, with BCFAR the largest. For the transformed data, therefore, inundated area has the greatest bearing on flood damage, followed by flood depth. In summary, flood depth has the largest coefficient in the untransformed dataset, while inundated area has the largest in the transformed dataset.
In the untransformed OLS model for commercial facilities, the coefficients are positive except for flood duration and land price. Flood depth has 11892.550, flood duration -3.954, inundated area 0.203, family income 134.682 and land price -1.155. Thus the estimated damage increases with flood depth, inundated area and the income of the affected families, and decreases with flood duration and land price. The constant of the regression is positive (+2692.710). For the transformed dataset, the intercept (0.230) is also positive, as are inundated area (BCFAR = 0.025), land price (BCLP = 2.522×10^-3) and flood duration (BCDUR = 2.7×10^-5), with BCFAR again the largest. The remaining coefficients are negative (BCDEP = -0.150 and BCINC = -0.010). For the transformed data, inundated area again has the greatest bearing on flood damage, followed by flood depth.
In the untransformed OLS model for agricultural facilities, all coefficients are positive. Flood depth has 3509.237, flood duration 6.721, inundated area 0.936, family income 1222.567 and land price 3.301. Thus the estimated damage rises with flood depth, flood duration, inundated area, the income of the affected families and the land price of the affected region. The constant of the regression, however, is negative (-4790.713). For the transformed dataset, the intercept (0.073) is positive, as are inundated area (BCFAR = 0.456) and land price (BCLP = 2.57×10^-4), with the former the largest. All the other parameter variables have negative coefficients (BCDEP = -0.025, BCDUR = -7.11×10^-4 and BCINC = -0.020). For the transformed data, inundated area has the greatest bearing on flood damage, followed by family income.
Some coefficients that are positive in OLS become negative in GWR and vice versa, in both the untransformed and transformed datasets. This indicates that the factors have different effects under global and local conditions. Several evaluation measures, namely the coefficient of determination (R2), the log-likelihood and the AIC, were used to compare the OLS and GWR models (see Tables 6 and 7).
Table 6
OLS and GWR Evaluation for Untransformed Datasets

| Parameter | Residential (OLS) | Residential (GWR) | Commercial (OLS) | Commercial (GWR) | Agricultural (OLS) | Agricultural (GWR) |
| --- | --- | --- | --- | --- | --- | --- |
| R2 | 0.284 | 0.614 | 0.175 | 0.566 | 0.270 | 0.979 |
| Log-likelihood | 4928.850 | 4775.675 | 8291.250 | 8049.553 | 294.519 | 241.335 |
| AIC | 9871.699 | 9752.683 | 16596.500 | 16435.690 | 603.038 | 534.615 |

Table 7
OLS and GWR Evaluation for Transformed Datasets

| Parameter | Residential (OLS) | Residential (GWR) | Commercial (OLS) | Commercial (GWR) | Agricultural (OLS) | Agricultural (GWR) |
| --- | --- | --- | --- | --- | --- | --- |
| R2 | 0.138 | 0.481 | 0.190 | 0.320 | 0.305 | 0.873 |
| Log-likelihood | 2904.051 | 3030.034 | 1446.733 | 1512.389 | 68.421 | 93.960 |
| AIC | -5794.102 | -5845.397 | -2879.465 | -2671.310 | -122.843 | -134.956 |

The R-squared value (coefficient of determination) improved from OLS to GWR for all of the above models. For the untransformed datasets, commercial and residential facilities reached only 0.57 and 0.61, while agricultural facilities achieved the highest R-squared value, 0.98. However, the coefficient of determination is not the sole criterion for judging whether GWR significantly improves on OLS. The log-likelihood and the AIC are two further measures needed to evaluate model performance. The log-likelihood of the GWR models for the untransformed datasets was lower than that of OLS. As for the AIC, a rule of thumb is that the absolute difference in AIC should be at least 3 for the change to count as an improvement in performance. All of the generated models were found to satisfy this condition.
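The AIC comparison can be reproduced from the values in Tables 6 and 7 with simple arithmetic. The sketch below computes the OLS minus GWR difference for each dataset; the sign is kept, so that a positive value means GWR has the lower AIC, and the threshold of 3 is the rule of thumb quoted above. The dictionary layout and variable names are illustrative only.

```python
# AIC values copied from Tables 6 and 7 as (OLS, GWR) pairs
aic = {
    ("untransformed", "residential"): (9871.699, 9752.683),
    ("untransformed", "commercial"): (16596.500, 16435.690),
    ("untransformed", "agricultural"): (603.038, 534.615),
    ("transformed", "residential"): (-5794.102, -5845.397),
    ("transformed", "commercial"): (-2879.465, -2671.310),
    ("transformed", "agricultural"): (-122.843, -134.956),
}

THRESHOLD = 3.0  # rule-of-thumb minimum difference quoted in the text

# Positive difference: GWR has the lower AIC
delta = {key: round(ols - gwr, 3) for key, (ols, gwr) in aic.items()}

for (dataset, facility), d in delta.items():
    exceeds = abs(d) >= THRESHOLD
    print(f"{dataset:13s} {facility:12s} AIC(OLS) - AIC(GWR) = {d:9.3f}  "
          f"|difference| >= 3: {exceeds}")
```

Every absolute difference exceeds 3, as the text states; keeping the sign additionally shows which of the two models has the lower AIC in each case.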
Several tests were also performed to analyze the datasets. T-statistic test for the untransformed data showed that ‘flood depth’ (dep) is the only statistically significant parameter variable at 1% of significance level (p-value = 0.00000). For the transformed data, BCDEP (flood depth) and BCFAR (inundated area) were the statistically significant on 1% and 5% level, respectively.
The Jarque-Bera Test in this case have also failed the normality of the residual distribution for the untransformed data (JB =8812.902, p-value = 0.00000). Moreover, the Breusch-Pagan Test (BP = 1152.643, p-value = 0.00000), Koenker-Basset Test (KB = 106.607, p-value = 0.582) and the White Test on specification of robust test (WT = 132.148, p-value = 0.00000) confirmed the presence of spatial heteroscedasticity. This leads to the model being spatially non-stationary. However, the transformed data showed that JB = 53443.481, p-value = 0.00000, which means that it is not under normal distribution. Therefore, the Breusch-Pagan Test (BP = 98.723, p-value = 0.00000), Koenker-Basset Test KB = 3.776, p-value = 0.582) and the White Test on specification of robust test (WT = 7.106, p-value = 0.996) all failed the prediction of the data being spatially non-stationary.
For the datasets of residential facilities, the multi-co-linearity condition number was found to be 10.950 for the untransformed and 42.476 for the transformed. The untransformed value is less than 30, while that of the other is greater than the said standard value. Thus, the latter has an issue with multi-co-linearity. In case of commercial facilities of untransformed datasets, only 17.49% of the variation in the dependent variable is explained. Thus, this model tells only an approximately 17.49% of the flood damage in the 2012 flood event in Gunsan City. For the transformed data, it increased to 19.00%.
The results of the t-statistic test for the untransformed data showed that ‘flood depth’ (dep) is the only statistically significant parameter variable at 1% of significance level (p-value = 0.00000). For the transformed data, BCDEP (flood depth) is statistically significant on 1% level together with BCINC (family income). In this case, flood depth is indeed the highest influencing factor among the other four.
The Jarque-Bera Test in this case have failed the normality of the residual distribution for the untransformed data (JB = 96078.419, p-value = 0.00000). Additionally, the Breusch-Pagan Test (BP = 2656.296, p-value = 0.00000), Koenker-Basset Test (KB = 94.588, p-value = 0.00000) and the White Test on specification of robust test (WT = 117.630, p-value = 0.00000) confirmed the presence of spatial heteroscedasticity. This leads to the model being spatially non-stationary. However, the transformed data showed that JB = 8.797, p-value = 0.012, which means it is not under normal distribution. The following tests: Breusch-Pagan Test (BP = 6.184, p-value = 0.289), Koenker-Basset Test (KB = 5.205, p-value = 0.391) and the White Test on specification of robust test (WT = 17.955, p-value = 0.590) all failed the prediction of the data being spatially non-stationary.
The test for multi-co-linearity was also performed in commercial facilities datasets. The multi-co-linearity condition number was found to be 10.254 for untransformed and 38.679 for transformed. The untransformed value is less than 30, while that of the other is greater than the said standard value. Therefore, the latter has an issue with multi-co-linearity.
As shown in the results of the untransformed datasets, only 26.97% of the variation in the dependent variable is explained. Thus, this model tells only an approximately 26.97% of the flood damage in the 2012 flood event in Gunsan City. In the transformed data, it then increased to 30.48%.
All the resulting coefficients of the parameter variables are given in the same units as their associated explanatory variables. The coefficient reflects the expected change in the dependent variable for every 1 unit change in the associated explanatory variable, holding all other variables constant.
However, the results of the t-statistic test for the untransformed data showed that ‘flood depth’ (dep) is the only statistically significant parameter variable at 5% of significance level (p-value = 0.04). The t-test is used to assess whether or not an explanatory variable is statistically significant. The null hypothesis is that the coefficient is, for all intents and purposes, equal to zero (and consequently is NOT helping the model). When the probability is very small, the chance of the coefficient being essentially zero is also small. This again proves that flood depth is indeed the highest influencing factor among the other four.
The Jarque-Bera Test also reject the normality of the residual distribution for the untransformed data at 1% level (JB = 6.705, p-value = 0.04). Breusch-Pagan Test (BP = 26.905, p-value = 0.00006), Koenker-Basset Test (KB = 16.123, p-value = 0.00065) and the White Test on specification of robust test (WT = 29.427, p-value = 0.080) confirmed the absence of spatial heteroscedasticity. This leads to the model being stationary. However, the transformed data showed that JB = 1.649, p-value = 0.438, which means it is under normal distribution. In addition, the Breusch-Pagan Test (BP = 1.552, p-value = 0.907), Koenker-Basset Test (KB = 2.795, p-value = 0.732) and the White Test on specification of robust test (WT = 23.924, p-value = 0.246) all failed the prediction of the data being spatial non-stationary.
A test for multicollinearity was also performed. The multicollinearity condition number is 8.283 for the untransformed data and 13.574 for the transformed data. Both values are below 30, which indicates that multicollinearity is not a problem in the final models.

5. Conclusions

This study did not merely address the difficulties of handling thousands of records. The datasets from Gunsan City’s August 12 flood event were explored statistically in order to arrive at flood damage functions that estimate the amount of flood damage in the city. These functions are expressed in terms of flood depth, flood duration, inundated area, family income and land price. Flood depth has long been known to influence flood damage; this paper aims to identify the other possible factors that could contribute to a flood damage estimation model. The candidate models were obtained using Ordinary Least Squares (OLS) regression and Geographically Weighted Regression (GWR).
Both OLS and GWR were used to generate the functions. GWR, however, proved to be the more suitable fit for the three sets of facilities. The coefficients of determination for residential, commercial and agricultural facilities are 0.0614, 0.566 and 0.979, respectively. The corresponding log-likelihood values are 4775.675, 8049.553 and 241.335, and the corresponding AIC values are 9752.683, 16435.690 and 534.615.
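The link between the log-likelihood and AIC values can be illustrated with the common definition AIC = -2 ln L + 2k, where k is the (effective) number of parameters. The sketch below solves this relation for k using the reported figures. It rests on two assumptions that the text does not confirm: that the reported log-likelihoods are magnitudes of negative values, and that the plain AIC rather than the small-sample corrected AICc was reported. If either assumption fails, the implied counts would differ.

```python
def implied_parameter_count(log_likelihood, aic):
    """Solve AIC = -2 * lnL + 2 * k for k."""
    return (aic + 2.0 * log_likelihood) / 2.0


# (log-likelihood, AIC) per facility type. The negative signs are an
# assumption; the text reports the log-likelihoods without a sign.
reported = {
    "residential": (-4775.675, 9752.683),
    "commercial": (-8049.553, 16435.690),
    "agricultural": (-241.335, 534.615),
}

implied_k = {
    name: implied_parameter_count(log_lik, aic)
    for name, (log_lik, aic) in reported.items()
}

for name, k in implied_k.items():
    print(f"{name}: implied effective parameters = {k:.1f}")
```

Because AIC penalises model complexity, a lower value is preferred only when comparing models fitted to the same data, so the three values here should not be compared across facility types.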
The authors sought to resolve the normality issue without reducing the reliability of the models. To do so, the Box-Cox method was applied (see the transformed datasets). After the data were transformed toward a normal distribution, all the other tests were passed. This suggests that the normality issue stems from extreme values, which produce a skewed distribution, and from the sorting of the data and the classification of facilities.
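For reference, the Box-Cox transformation maps a strictly positive value y to (y^λ - 1)/λ for λ ≠ 0 and to ln y for λ = 0. The sketch below applies it to a small hypothetical sample with one extreme value and compares the skewness before and after. The data and the choice λ = 0 are assumptions made for illustration; in practice λ is usually estimated from the data, and the value used in this study is not stated here.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def skewness(values):
    """Population skewness: third central moment over variance^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical damage values with one extreme observation.
damages = [1.0, 2.0, 3.0, 5.0, 8.0, 40.0]
transformed = box_cox(damages, lam=0.0)

print(f"skewness before: {skewness(damages):.2f}")
print(f"skewness after:  {skewness(transformed):.2f}")
```

The extreme observation is kept rather than deleted; the transformation only compresses its distance from the rest of the sample, which reduces the skewness.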
Regarding the normality issue in the untransformed model, one cause is the presence of extreme values, which produce a skewed distribution. Because only one flood event was considered, extreme values are inevitable, and no other observations are available to support them. The authors therefore recommend including more flood events to improve the models. The extreme values are nevertheless important, because damages of this kind are precisely what the models are meant to estimate. Deleting them without a valid reason is therefore unacceptable.
The data used in the study were sorted and classified into three groups: residential, commercial and agricultural facilities. Because the data were sorted into these three groups before analysis, some records were removed (1,278 out of 3,111 facilities), which affected the lower, middle and upper ranges of the datasets. In addition, the fact that facilities of a given type, e.g. commercial facilities, exist in more than one zone also affects the normality issue. It is therefore recommended to replace this grouping with a zonal classification, which would likely improve both the normality and the models.

Acknowledgement

This research was supported by a grant [MPSS-NH-2013-62] through the Disaster and Safety Management Institute, funded by the Ministry of Public Safety and Security of the Korean government.

References

Beck, J, Metzger, R, Hingray, B, and Must, A (2002) Flood risk assessment based on security deficit analysis. 27th General Assembly of the European Geophysical Society Geophys. Res 21-26 April 2002, Nice, France.
crossref
Bivand, R (2014) Geographically Weighted Regression, pp. 1-4.
crossref
Black, R.D (1975). Flood Proofing Rural Structures: A ‘Project Agnes’ Report, Pennsylvania. National Technical Information Service, Springfield, VA, USA.
crossref
CH2M, HILL (1974). Potential Flood Damages. Willamette River System. Department of the Army Portland District, Corps of Engineers, Portland, OR, USA.
crossref
Chang, L.F, Lin, C.H, and Su, M.D (2008) Application of geographic weighted regression to establish flood-damage functions reflecting spatial variation. Water SA, Vol. 34, pp. 209-216.
crossref
David, T.F (2001) Flood-warning decision-support system for Sacramento, California. Water Resour. Plann. Manage, Vol. 127, pp. 254-260. 10.1061/(ASCE)0733-9496(2001)127:4(254).
crossref
Du Plessis, L.A (2002) A review of effective flood forecasting, warning and response system for application in South Africa. Water SA, Vol. 28, pp. 129-137. 10.4314/wsa.v28i2.4878.
crossref
Dutta, D, Herath, S, and Musiake, K (2003) A mathematical model for flood loss estimation. J. Hydrol, Vol. 277, pp. 24-49. 10.1016/S0022-1694(03)00084-2.
crossref
FEMA (1977). Reducing Flood Damage through Building Design: A Guide Manual - Elevated Residential Structures. Structure. Edited by FEM Agency.
crossref
Grigg, N.S, and Helweg, O.J (1975) State-of-the-art of estimating flood damage in urban areas. Water Resour. Bull, Vol. 11, pp. 379-390. 10.1111/j.1752-1688.1975.tb00689.x.
crossref
Kim, S.M, Tachikawa, Y, and Takara, K (2007) Recent Flood Disasters and Progress of Disaster Management System in Korea. Annuals of Disaster Prevention Research Institute, Kyoto University, No. 50B, pp. 15-31.
crossref
Lekuthai, A, and Vongvisessomjai, S (2001) Intangible Flood Damage Quantification. Water Resour. Manag, Vol. 15, pp. 323-362. 10.1023/A:1014489329348.
crossref
McBean, E.A, Gprrie, J, Fortin, M, Ding, J, and Moulton, R (1988) Adjustment factors for flood damage curves. J. Water Resour. Plann. Manage, Vol. 114, pp. 635-646. 10.1061/(ASCE)0733-9496(1988)114:6(635).
crossref
McPherson, H.J, and Saarinen, T.F (1977) Flood plain dwellers perception of flood hazard in Tucson. Arizona. Ann. Reg. Sci, Vol. 11, pp. 25-40. 10.1007/BF01287852.
crossref
Messner, F, and Meyer, V (2005). UFZ Discussion Papers. UFZ - Umweltforschungszentrum Leipzig-Halle. Department Okonomie. Permosersttr.15, Leipzig, Germany.
crossref
Ministry of Security and Public Administration (2014). http://english.visitkorea.or.kr/enu/AK/AK_EN_1_4_3.jsp. October 30, 2014.
crossref
Penning-Rowsell, E.C, and Chatterton, J.B (1977). The benefits of flood alleviation: A manual of assessment techniques. Gower Technical Press, Aldershot.
crossref
Shaw, D.G, Huang, H.H, and Ho, M.C (2005) Modeling flood loss and risk perception: the case of typhoon Nari in Taipei. Proc. 5th Annu. IIASA-DPRI Meeting on Integrated Disaster Risk Management: Innovat.
crossref
Smith, D.I (1994) Flood damage estimation - A review of urban stagedamage curves and loss functions. Water SA, Vol. 20, pp. 231-238.
crossref
Smith, K, and Ward, R (1998). Floods - Physical processes and human impacts. Earth Surface Processes and Landforms. Vol. 24: No. 13, Chichester: Wiley.
crossref
Thieken, A.H, Muller, M, Kreibich, H, and Merz, B (2005) Flood damage and influencing factors: New insights from the August 2002 flood in Germany. Water. Resour. Res, Vol. 41, pp. 1-16. 10.1029/2005wr004177.
crossref
White, G.F (1945). Human Adjustment to Floods. University of Chicago, Dept., of Geography, Research Paper No. 29.
crossref
White, G.F (1964). Choice of Adjustment to Floods. University of Chicago, Dept., of Geography, Research Paper No. 93.
crossref
Wind, H.G, Nierop, T.M, De Blois, C.J, and De Kok, J.L (1999) Analysis of flood damages from the 1993 and 1995 Meuse floods. Water Resour. Res, Vol. 35, pp. 3459-3465. 10.1029/1999WR900192.
crossref
Yang, L, Zuo, C, and Wang, Y.G (2005) An effective two-stage neu\-ral network model and its application on flood loss prediction. Proc. Advances in Neural Networks - Isnn 2005 Pt 3, Vol. 3498, pp. 1010-1016.
crossref

