Daily Flow Forecasting based on Deterministic and Stochastic Features
Abstract
Hydrological time series have been forecast using various models. A previous study employed nonlinear dynamical methods, namely the correlation dimension and the deterministic versus stochastic (DVS) algorithm, to analyze the characteristics of daily flow data. That analysis revealed that the daily streamflow observed at the St. Johns River near Cocoa, Florida, USA, exhibited chaotic characteristics, whereas the daily inflow to the Soyanggang Dam reservoir showed stochastic properties. However, the nonlinearity of the flows was not investigated, and no stochastic models were fitted for flow modeling and forecasting. Therefore, the present study tests the nonlinearity of the flow data and fits stochastic models to them. In addition, the forecasting results obtained from the DVS algorithm and neural networks are compared with those from the fitted stochastic models. The DVS algorithm and neural networks forecast the daily streamflow more accurately than the daily inflow. Likewise, when the AR (1) model was applied to the daily streamflow and the ARMA (3, 1) model to the daily inflow, the chaotic daily streamflow was again forecast more accurately. These findings suggest that the dynamic structure inherent in hydrological time series may influence forecasting performance. Notably, both flows exhibited nonlinearity according to the BDS statistic, indicating that nonlinear time-series models may be more appropriate for their analysis.
Trans Abstract
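Hydrological time series have been forecast using various models, and a previous study analyzed the characteristics of daily flow data using chaos analysis techniques, namely the correlation dimension and the deterministic versus stochastic (DVS) algorithm. The daily flow series observed at the St. Johns River near Cocoa, Florida, USA, showed nonlinear dynamical characteristics, whereas the daily inflow to the Soyanggang Dam reservoir showed stochastic characteristics. However, that study did not examine the nonlinearity of the data or stochastic models for flow forecasting and modeling. Therefore, this study tests the nonlinearity of the flow data and compares the forecasting results of selected stochastic models with those of the DVS algorithm and artificial neural networks. As a result, the forecasts derived from the DVS algorithm and artificial neural networks were more accurate for the daily flow data than for the stochastic daily inflow. Likewise, when the daily flow was forecast with the AR (1) model and the daily inflow with the ARMA (3, 1) model, the daily flow was again forecast more accurately than the daily inflow. This suggests that the chaotic dynamic structure inherent in hydrological time series may affect forecasting performance; however, the BDS-based nonlinearity test showed that both series are nonlinear, indicating that nonlinear models may be more suitable than linear models for analysis and forecasting.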
1. Introduction
Hydrological time series encompass the temporal variations of various hydrological variables such as precipitation, runoff, and reservoir inflow, and have long attracted attention due to their inherently complex dynamic characteristics. Traditionally, linear time series models such as ARMA (autoregressive moving average) and ARIMA (autoregressive integrated moving average) have been widely used for the analysis and forecasting of hydrological phenomena. These models offer computational simplicity and interpretability, but often fall short in capturing the nonlinear, nonstationary, and chaotic properties found in natural systems (Butts et al., 2014; Sang et al., 2015; Di et al., 2019; Ombadi et al., 2021).
However, natural hydrological processes are governed by inherently nonlinear dynamics arising from interactions among rainfall-runoff responses, land surface conditions, and human interventions. Such complexity suggests that hydrological time series may possess deterministic chaos, which exhibits irregular yet structured patterns and is highly sensitive to initial conditions. This perspective has led many researchers to analyze hydrological series using tools from chaos theory to uncover hidden deterministic structures and enhance short-term prediction accuracy (Sivakumar and Singh, 2012; Ehret et al., 2014; Kedra, 2014; Bancheri et al., 2019).
Over the past decades, chaos-based methods such as correlation dimension analysis and phase space reconstruction have been applied to hydrological series to distinguish chaotic behavior from stochastic noise (Grassberger and Procaccia, 1983; Holzfuss and Mayer-Kress, 1986; Tsonis and Elsner, 1988; Rodriguez-Iturbe et al., 1989; Graf and Elbert, 1990; Sharifi et al., 1990; Barnett, 1993; Lall et al., 1996; Puente and Obregon, 1996; Sangoyomi et al., 1996; Porporato and Ridolfi, 1997; Kim et al., 2001; Kim et al., 2003; Paik et al., 2005; Salas et al., 2005; Kim and Kim, 2008; Kim et al., 2009; Kyoung et al., 2011; Kim et al., 2014; H.S. Kim et al., 2015; S. Kim et al., 2015). More recently, nonlinear time series analysis techniques, including the deterministic versus stochastic (DVS) algorithm (Casdagli, 1992), have emerged as effective tools for forecasting chaotic systems by exploiting local dynamic structure.
Yu et al. (2025) demonstrated that quantitative tools can effectively analyze the nonlinear dynamics of hydrological time series and provided a theoretical foundation for chaos-based forecasting models. Similarly, Li et al. (2014) applied six nonlinear analytical methods to runoff series and identified both the presence of low-dimensional chaos and the varying intensity of nonlinearity across multiple time scales.
In parallel, rapid advances in machine learning have enabled the use of artificial neural networks (ANNs) and deep learning models such as LSTM and GRU for hydrological prediction. These models are capable of learning complex temporal patterns from data, and often outperform traditional models in terms of predictive accuracy (Gao et al., 2020; Swagatika et al., 2024; Waqas and Humphries, 2024; Widiasari and Efendi, 2024). Nevertheless, their black-box nature makes it difficult to interpret physical dynamics, which can limit their utility in practice-oriented hydrological applications (Jung et al., 2021; Kim et al., 2022; Kwak et al., 2022; Liu et al., 2022; Cambria et al., 2023; Han et al., 2023; Wang et al., 2024; Zhang et al., 2025).
To provide a comprehensive understanding of prediction strategies under different data characteristics, this study applies and compares three forecasting methodologies: (1) a chaos-based approach using the DVS algorithm, (2) a neural network-based learning approach, and (3) a traditional statistical model, ARMA. By including ARMA models as a baseline, the study aims to assess the relative strengths and limitations of linear versus nonlinear and data-driven models in forecasting hydrological series.
This research focuses on two distinct time series: the daily streamflow of the St. Johns River near Cocoa, Florida (characterized by low-dimensional deterministic chaos), and the daily inflow to Soyang Reservoir in South Korea (showing stochastic properties). The study evaluates the predictability of these data sets with each modeling approach at multiple lead times.
Ultimately, the goal is to provide practical guidance on selecting appropriate forecasting techniques based on the underlying dynamic nature of the hydrological series—whether deterministic, stochastic, or nonlinear. Through this comparison, the study demonstrates how a model’s forecasting capability can vary significantly depending on the internal structure of the data.
This study draws on the analytical foundation established by Wang et al. (2019), who first applied chaos theory to distinguish the deterministic characteristics of the St. Johns River and Soyang inflow series using DVS and ANN models. Our research extends that work by broadening the methodological comparison and evaluating forecast performance across lead times and model classes. Specifically, this study (1) systematically compares chaos-based, machine learning-based, and statistical models on identical data sets, (2) incorporates additional performance indicators to show how model accuracy degrades with lead time, and (3) provides practical insights into model suitability based on the underlying system dynamics (deterministic chaos versus stochastic processes). This allows a more nuanced understanding of when and how nonlinear models offer meaningful forecasting advantages over traditional methods.
2. Study Area and Application Data
The data sets used in this study are the daily streamflow of the St. Johns River near Cocoa, Florida, USA (case-1; USGS-02232400), and the daily inflow series of Soyang Reservoir in Korea (case-2; https://www.water.or.kr). Both data sets were obtained from Wang et al. (2019), who investigated the chaotic behavior of the case-1 series and found deterministic chaos. The case-1 series consists of 12,784 daily values from January 1, 1954 to December 31, 1988. The St. Johns River is the longest river in Florida, stretching approximately 310 miles (about 500 km). Unlike most rivers in North America, it flows northward, from central Florida to the Atlantic Ocean. The river basin covers around 8,840 square miles (about 22,900 km²), accounting for approximately 23% of Florida's total area.
The second data set consists of 8,776 daily values from January 1, 1974 to December 31, 1997, corresponding to the inflow series of Soyang Reservoir. Soyang Reservoir, located in Gangwon Province, South Korea, was formed by the construction of Soyang Dam, which was completed in 1973. It is the largest multipurpose dam in Korea, serving flood control, water supply, and hydroelectric power generation. The dam is 123 m high and 530 m long, with a total storage capacity of approximately 2.9 billion m³. The two time series are plotted in Figs. 1 and 2.
3. Chaos Characterization and BDS Statistic
3.1 Chaotic Behavior of Flow Series
Wang et al. (2019) reconstructed the two flow series in phase space using the delay method (Packard et al., 1980; Takens, 1981). A single record of an observable x_t, t = 1, 2, …, N, where N is the data size, can be embedded in an m-dimensional phase space to reconstruct the attractor. This reconstruction takes the form shown in Eq. (1):
$$\mathbf{Y}_t = \left( x_t,\ x_{t+\tau},\ \ldots,\ x_{t+(m-1)\tau} \right), \qquad t = 1, 2, \ldots, N-(m-1)\tau \qquad (1)$$
where τ is the delay time. Methods for estimating τ include the C-C method (Kim et al., 1999), the autocorrelation function, and mutual information; for convenience and simplicity, this study uses the autocorrelation function. Wang et al. (2019) investigated the chaotic behavior of case-1 and case-2 by estimating their correlation dimensions. The streamflow series at the St. Johns River near Cocoa, USA, has a low, finite correlation dimension of 3.305, which suggests that the series has chaotic characteristics. In contrast, for the inflow series at Soyang Reservoir, the calculated correlation dimension keeps increasing as the embedding dimension increases, so it is difficult to conclude that the inflow series is chaotic; it is more likely to have stochastic properties.
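The delay reconstruction and correlation dimension steps can be sketched in pure Python as below. This is an illustration under assumptions, not the code used by Wang et al. (2019): the delay τ is taken as the first lag at which the autocorrelation falls below 1/e (one common rule; the text does not say which rule was applied), distances use the maximum norm, and the dimension is approximated by the log-log slope of the correlation sum between two hypothetical radii rather than by a fitted scaling region. All function names are hypothetical.

```python
import math
import random


def delay_embed(x, m, tau):
    """Delay vectors (x_t, x_{t+tau}, ..., x_{t+(m-1)tau}) as in Eq. (1)."""
    n_vectors = len(x) - (m - 1) * tau
    return [tuple(x[t + j * tau] for j in range(m)) for t in range(n_vectors)]


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


def delay_from_acf(x, max_lag=100, threshold=1 / math.e):
    """First lag at which the ACF falls below `threshold` (1/e is an assumed criterion)."""
    for lag in range(1, max_lag + 1):
        if autocorrelation(x, lag) < threshold:
            return lag
    return max_lag


def correlation_sum(vectors, r):
    """Fraction of distinct pairs of vectors within distance r (maximum norm)."""
    n = len(vectors)
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if max(abs(a - b) for a, b in zip(vectors[i], vectors[j])) <= r
    )
    return 2.0 * close / (n * (n - 1))


def local_slope(x, m, tau, r1, r2):
    """Slope of log C(r) versus log r between radii r1 < r2 (a crude dimension estimate)."""
    vectors = delay_embed(x, m, tau)
    c1, c2 = correlation_sum(vectors, r1), correlation_sum(vectors, r2)
    return math.log(c2 / c1) / math.log(r2 / r1)


# Usage: uniform white noise has no low-dimensional attractor, so the slope grows with m.
random.seed(0)
noise = [random.random() for _ in range(400)]
slopes = {m: local_slope(noise, m, 1, 0.05, 0.1) for m in (1, 2)}
```

Repeating `local_slope` for m = 1, 2, 3, … is the idea behind the saturation check described above: for a low-dimensional attractor the slope levels off (3.305 for the St. Johns River series), while for a stochastic series such as the Soyang inflow it keeps rising with m.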
3.2 BDS Statistic and Nonlinearity Test
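The residuals of linear and nonlinear models can be tested with conventional nonparametric test statistics as well as with a newer test statistic, the BDS statistic. Brock et al. (1991, 1996) studied the BDS statistic, which is based on the correlation integral, to test the null hypothesis that the data are independently and identically distributed (iid). The correlation integral is defined in Eq. (2):
$$C(m, N, r) = \frac{2}{M(M-1)} \sum_{1 \le i < j \le M} \Theta\left( r - \left\| \mathbf{x}_i^m - \mathbf{x}_j^m \right\| \right) \qquad (2)$$
where Θ(a) = 0 if a < 0 and Θ(a) = 1 if a ≥ 0, x_i^m = (x_i, x_{i+1}, …, x_{i+m-1}) is the i-th embedded vector, r is the distance threshold, and ‖·‖ is the distance between embedded vectors (the maximum norm in the standard BDS formulation).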
N is the size of the data set, and M = N - (m - 1) is the number of embedded points in m-dimensional space. This test has been particularly useful for chaotic systems and nonlinear stochastic systems. Under the iid hypothesis, the BDS statistic is defined in Eq. (3) for m > 1:
$$W(m, N, r) = \sqrt{N}\, \frac{C(m, N, r) - C(1, N, r)^{m}}{\sigma(m, N, r)} \qquad (3)$$
As N → ∞, this statistic converges to a standard normal distribution. The asymptotic variance σ²(m, N, r), which depends on the quantity K(m, M, r), is estimated as described in Eq. (4).
BDS statistic values that are large in magnitude reject the iid hypothesis, distinguishing random time series from those generated by chaotic or nonlinear stochastic processes. Although the BDS statistic cannot distinguish a nonlinear deterministic system from a nonlinear stochastic system, Tables 1 and 2 show that both flow series have nonlinear characteristics. Combined with the correlation dimension results, this suggests that case-1 may have nonlinear deterministic properties and case-2 nonlinear stochastic properties.
4. Flow Forecasting and Results Analysis
4.1 DVS Algorithm and Forecasting
For a scalar time series x_1, x_2, …, x_N, the DVS algorithm attempts to fit models of the form shown in Eq. (5):
$$x_{i+T} \approx f(\mathbf{x}_i) \qquad (5)$$
A least-squares method is used to find the function f that gives the best prediction of x_{i+T}, in the sense that f minimizes the squared error within the model class. The integers T and m are defined as follows.
T: lead time or prediction horizon (prediction time into the future)
m: embedding dimension or dimension of the reconstructed phase space (number of taps of the tapped delay line)
The m delayed values are combined in the delay vector x_i. Assuming equally spaced taps on the delay line, x_{i+T} ≈ f(x_i, x_{i-τ}, …, x_{i-(m-1)τ}), where τ is the lag time, or spacing, between taps. With these definitions, the DVS algorithm proceeds as follows:
(1) Normalize the time series to zero mean and unit variance.
(2) Divide the time series into two parts: 1) a training (fitting) set x_1, …, x_{N_f}, used to fit the model, and 2) a test set x_{N_f+1}, …, x_{N_f+N_t}, used to evaluate it, where N_f and N_t denote the numbers of points in the fitting and test sets, respectively.
(3) Choose T and m
(4) Choose a test delay vector xi for a T-step-ahead forecasting task (i>Nf).
(5) Compute the distances dij of the test vector xi from the training vectors xj (for all j such that (m-1)τ<j<i-T)
(6) Order the distances dij
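(7) Find the k nearest neighbors x_{j(1)}, …, x_{j(k)} of x_i, and fit an affine model with coefficients α_0, …, α_m of the form shown in Eq. (6):
$$x_{j(l)+T} \approx \alpha_0 + \sum_{s=1}^{m} \alpha_s\, x_{j(l)-(s-1)\tau}, \qquad l = 1, \ldots, k \qquad (6)$$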
(8) Use the fitted model from step (7) to estimate a T-step-ahead forecast
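(9) Repeat steps (4) through (8) as i + T runs through the test set, and compute the mean absolute forecasting error as in Eq. (7):
$$E_m(k) = \frac{1}{N_t} \sum_{i} \left| \hat{x}_{i+T} - x_{i+T} \right| \qquad (7)$$
where x̂_{i+T} is the forecast from step (8) and the sum runs over the test set.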
Finally, vary the embedding dimension m and plot the curves E_m(k) as functions of the number of nearest neighbors k. The resulting family of curves is called a DVS plot.
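The steps above can be sketched in pure Python as follows. This is an illustrative implementation under stated assumptions, not the code used by Wang et al. (2019): Euclidean distance is used for d_ij, the affine model of Eq. (6) is fitted through the normal equations with a tiny ridge term (an assumption added so that nearly collinear neighbors do not make the system singular), and `dvs_errors` returns E_m(k) of Eq. (7) for each k, giving one curve of a DVS plot. All helper names are hypothetical.

```python
import math


def solve_least_squares(rows, targets, ridge=1e-8):
    """Least-squares coefficients from the normal equations (A^T A + ridge*I) a = A^T y."""
    p = len(rows[0])
    aug = [
        [sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(p)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(p)
    ]
    for col in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(col, p), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(p):
            if i != col:
                f = aug[i][col] / aug[col][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def delay_vector(x, i, m, tau):
    """(x_i, x_{i-tau}, ..., x_{i-(m-1)tau}) as used in Eqs. (5) and (6)."""
    return [x[i - s * tau] for s in range(m)]


def dvs_errors(series, m, tau, T, n_fit, ks):
    """E_m(k): mean absolute T-step forecast error of local affine models, per k."""
    n = len(series)
    mu = sum(series) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in series) / n)
    x = [(v - mu) / sd for v in series]            # step (1): zero mean, unit variance
    first = (m - 1) * tau                          # earliest index with a full delay vector
    errors = {}
    for k in ks:
        abs_errors = []
        for i in range(max(n_fit, first), n - T):  # steps (4) and (9): test vectors
            xi = delay_vector(x, i, m, tau)
            # steps (5)-(6): order candidates x_j, first <= j < i - T, by distance to x_i
            ranked = sorted(range(first, i - T),
                            key=lambda j: math.dist(xi, delay_vector(x, j, m, tau)))
            nbrs = ranked[:k]
            # step (7): affine fit x_{j+T} ~ a0 + sum_s a_s x_{j-(s-1)tau} over the neighbors
            coef = solve_least_squares([[1.0] + delay_vector(x, j, m, tau) for j in nbrs],
                                       [x[j + T] for j in nbrs])
            forecast = coef[0] + sum(a * v for a, v in zip(coef[1:], xi))  # step (8)
            abs_errors.append(abs(forecast - x[i + T]))
        errors[k] = sum(abs_errors) / len(abs_errors)  # Eq. (7)
    return errors


# Usage on the chaotic logistic map: local fits (small k) beat a near-global linear fit.
logistic = [0.3]
for _ in range(299):
    logistic.append(4.0 * logistic[-1] * (1.0 - logistic[-1]))
curve = dvs_errors(logistic, m=1, tau=1, T=1, n_fit=250, ks=[5, 240])
```

Letting k cover all candidate vectors turns the local affine fit into a global linear autoregression, which is why the large-k end of a DVS curve behaves like a linear model.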
The algorithm's name derives from the fact that the shapes of the resulting plots can provide evidence either of low-dimensional deterministic chaos or of high-dimensional or stochastic dynamics. When k equals the size of the training set, the affine model in Eq. (6) becomes a single global linear autoregressive model, whereas a small k yields a local, nonlinear approximation. Low-dimensional chaos is therefore typically characterized by U-shaped or monotonically increasing plots whose minimum E_m(k) values are small and occur at low values of k, while high-dimensional or stochastic behavior is often indicated by relatively large minimum E_m(k) values occurring at high k values (Casdagli, 1992).
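As a hypothetical helper for reading a DVS plot, the function below picks the (m, k) pair with the smallest E_m(k) and reports whether that optimum lies at the small-k end of the tested range. The "lower half of the tested k values" rule is an illustrative assumption; the criteria described above are qualitative readings of curve shapes.

```python
def summarize_dvs(curves):
    """Return the (m, k) minimizing E_m(k) for curves given as {m: {k: error}}.

    `optimum_at_small_k` is a hypothetical heuristic: True when the best k lies
    in the lower half of the k values tested for that m.
    """
    best_m, best_k = min(
        ((m, k) for m, row in curves.items() for k in row),
        key=lambda mk: curves[mk[0]][mk[1]],
    )
    ks = sorted(curves[best_m])
    return {
        "m": best_m,
        "k": best_k,
        "E_min": curves[best_m][best_k],
        "optimum_at_small_k": ks.index(best_k) < len(ks) / 2,
    }


# Usage with made-up numbers (not results from this study):
example = {1: {5: 0.30, 50: 0.20, 500: 0.25}, 2: {5: 0.05, 50: 0.10, 500: 0.30}}
summary = summarize_dvs(example)
```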
Wang et al. (2019) applied the DVS algorithm to characterize and forecast both flow series and, from the relation between E_m(k) and k, reached the same conclusions as the correlation dimension analysis. Figs. 3 and 4, taken from Wang et al. (2019), show that the daily streamflow series at the St. Johns River near Cocoa has chaotic characteristics, whereas the daily inflow series at Soyang Reservoir does not.
The DVS plots indicate the best k and m, with which forecasting is performed using the local linear approximation method (Farmer and Sidorowich, 1987). For both daily flow series, the test set covers 301 days, and the remaining data form the training set. Because the DVS algorithm forecasts each state from its nearest neighbors in phase space, peak flows are predicted mainly from other peak-flow states and low flows from other low-flow states, so the magnitude of the flows may have little effect on the forecast error. Figs. 5 and 6 show the relationship between the observed and forecasted values for lead times T = 1, 10, and 20 days.
For a 1-day lead time, the correlation coefficient between the forecasted and observed values is 0.9995 for the chaotic streamflow series and 0.6311 for the inflow series (Figs. 5 and 6). Forecast accuracy decreases as the lead time increases. The chaotic streamflow at the St. Johns River near Cocoa is forecast more accurately than the inflow, as measured by the correlation coefficient, and its forecasts at 10- and 20-day lead times are also relatively satisfactory.
4.2 Neural Network Forecasting
To improve prediction accuracy, this study applies a neural network, a widely adopted method in artificial intelligence. A feedforward neural network with three layers of neurons (an input layer, a hidden layer, and an output layer) was employed and trained using the backpropagation algorithm. In such a network, information flows in one direction, from the input layer through the hidden layer to the output layer, and during training the errors are propagated backward to update the weights.
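A minimal sketch of a three-layer feedforward network (input, one tanh hidden layer, linear output) trained by backpropagation with per-sample gradient descent, in pure Python. The number of lagged inputs, hidden units, learning rate, epochs, and the tanh activation are assumptions for illustration, and the class and function names are hypothetical; the configuration used by Wang et al. (2019) is not reproduced here.

```python
import math
import random


def make_lagged(series, n_lags, T):
    """Inputs (x_{t-n_lags+1}, ..., x_t) and targets x_{t+T} for a T-step-ahead mapping."""
    X, y = [], []
    for t in range(n_lags - 1, len(series) - T):
        X.append(series[t - n_lags + 1:t + 1])
        y.append(series[t + T])
    return X, y


class TinyMLP:
    """Input layer -> one tanh hidden layer -> linear output, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return hidden, sum(w * h for w, h in zip(self.w2, hidden)) + self.b2

    def predict(self, x):
        return self._forward(x)[1]

    def train(self, X, y, epochs=200, lr=0.05):
        """Per-sample gradient descent on the squared error 0.5 * (output - target)^2."""
        for _ in range(epochs):
            for x, target in zip(X, y):
                hidden, out = self._forward(x)
                err = out - target
                # Backpropagate: hidden deltas use the output weights before they are updated
                deltas = [err * w * (1.0 - h * h) for w, h in zip(self.w2, hidden)]
                self.w2 = [w - lr * err * h for w, h in zip(self.w2, hidden)]
                self.b2 -= lr * err
                for j, d in enumerate(deltas):
                    self.w1[j] = [w - lr * d * v for w, v in zip(self.w1[j], x)]
                    self.b1[j] -= lr * d


def mean_squared_error(net, X, y):
    return sum((net.predict(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


# Usage: learn a one-step-ahead map for a sine wave from three lagged values.
wave = [math.sin(0.3 * t) for t in range(200)]
X, y = make_lagged(wave, n_lags=3, T=1)
net = TinyMLP(n_in=3, n_hidden=8, seed=0)
mse_before = mean_squared_error(net, X[150:], y[150:])
net.train(X[:150], y[:150])
mse_after = mean_squared_error(net, X[150:], y[150:])
```

For flow data, the inputs would typically be normalized windows of past flows, and T would be varied to reproduce a lead-time comparison; the hidden-layer size and training settings would then need tuning.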
In this study, streamflow forecasting is formulated as a nonlinear mapping problem where previous observations are used to estimate future values. The model can be expressed mathematically as Eq. (8):
In this equation,
In the results of Wang et al. (2019), for a 1-day lead time the neural network yields correlation coefficients of 0.9994 for the chaotic streamflow series and 0.6286 for the non-chaotic inflow series. The neural network thus also gives accurate forecasts with low error for the chaotic streamflow series, and its results for 10- and 20-day lead times are relatively satisfactory, although the DVS algorithm performs slightly better. For the daily inflow series at Soyang Reservoir, however, the forecasts perform relatively poorly, as shown in Tables 3 and 4 and Figs. 7 and 8.
To evaluate the prediction performance, several statistical error metrics were used: AMB (Absolute Mean Bias), which indicates the average size of bias; RMSE (Root Mean Squared Error), which emphasizes larger errors; RRMSE (Relative RMSE), which normalizes RMSE by the mean of observations for comparability; MRE (Mean Relative Error), which reflects the average proportional error; and the correlation coefficient (R), which shows the strength of the linear relationship between observed and predicted values.
4.3 ARMA Model Forecasting
This study selects the appropriate time series models for the two flow series using Akaike Information Criterion (AIC) and Schwarz Bayesian Criterion (SBC). As shown in Table 5, the AR (1) model was selected for the streamflow series at the St. Johns River near Cocoa (case-1) based on the lowest AIC and SBC values among candidate models. For the inflow series at Soyang Reservoir (case-2), the ARMA (3, 1) model was selected according to Table 6.
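A minimal sketch of order selection for pure autoregressive candidates, assuming Yule-Walker estimates obtained by the Levinson-Durbin recursion and the common Gaussian forms AIC(p) = N ln σ̂²_p + 2p and SBC(p) = N ln σ̂²_p + p ln N, with constants shared by all orders dropped. Mixed candidates such as ARMA (3, 1) need a likelihood-based fit, for example `statsmodels.tsa.arima.model.ARIMA`, whose fitted results expose `aic` and `bic`; that step is not shown, and the function names here are hypothetical.

```python
import math
import random


def sample_autocovariance(x, max_lag):
    """Biased sample autocovariances gamma(0), ..., gamma(max_lag)."""
    n = len(x)
    mu = sum(x) / n
    d = [v - mu for v in x]
    return [sum(d[t] * d[t + h] for t in range(n - h)) / n for h in range(max_lag + 1)]


def select_ar_order(x, max_order=5):
    """AR order minimizing AIC and SBC, using Levinson-Durbin (Yule-Walker) fits.

    Returns (best_aic_order, best_sbc_order, table) with table rows (p, AIC, SBC).
    """
    n = len(x)
    acov = sample_autocovariance(x, max_order)
    phi, sigma2 = [], acov[0]          # order 0: innovation variance = gamma(0)
    table = [(0, n * math.log(sigma2), n * math.log(sigma2))]
    for p in range(1, max_order + 1):
        # Reflection coefficient phi_pp, then update phi_p1, ..., phi_p,p-1
        k = (acov[p] - sum(phi[j] * acov[p - 1 - j] for j in range(p - 1))) / sigma2
        phi = [phi[j] - k * phi[p - 2 - j] for j in range(p - 1)] + [k]
        sigma2 *= 1.0 - k * k
        table.append((p, n * math.log(sigma2) + 2 * p, n * math.log(sigma2) + p * math.log(n)))
    best_aic = min(table, key=lambda row: row[1])[0]
    best_sbc = min(table, key=lambda row: row[2])[0]
    return best_aic, best_sbc, table


# Usage: a simulated AR(1) series with phi = 0.8 (synthetic data, not the study's flows).
rng = random.Random(42)
sim = [0.0]
for _ in range(4999):
    sim.append(0.8 * sim[-1] + rng.gauss(0.0, 1.0))
best_aic, best_sbc, table = select_ar_order(sim, max_order=3)
```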
The forecasting results using AR (1) and ARMA (3, 1) models were compared with the results from the DVS algorithm and neural networks for each lead time (T = 1, 10, and 20 days). For case-1, the AR (1) model showed high accuracy with a correlation coefficient of R = 0.9994 at T = 1 day, comparable to the results of the DVS algorithm (R = 0.9995) and neural network model (R = 0.9994). However, the accuracy decreased as the lead time increased, with R = 0.8103 at T = 20 days.
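For reference, the standard T-step-ahead forecast of a stationary AR (1) model shows why accuracy decays with lead time: the forecast relaxes geometrically toward the mean μ, and the forecast-error variance grows toward the unconditional variance of the series. This assumes forecasts are iterated from the last observation x_t; the text does not state how the multi-step ARMA forecasts were produced.

```latex
\begin{aligned}
x_{t+1} - \mu &= \phi\,(x_t - \mu) + \varepsilon_{t+1}, \qquad |\phi| < 1,\ \varepsilon_t \sim \mathrm{WN}(0, \sigma_\varepsilon^2) \\
\hat{x}_{t+T} &= \mu + \phi^{T}\,(x_t - \mu) \\
x_{t+T} - \hat{x}_{t+T} &= \sum_{j=0}^{T-1} \phi^{j}\,\varepsilon_{t+T-j} \\
\operatorname{Var}\!\left(x_{t+T} - \hat{x}_{t+T}\right) &= \sigma_\varepsilon^{2}\,\frac{1 - \phi^{2T}}{1 - \phi^{2}}
\;\longrightarrow\; \frac{\sigma_\varepsilon^{2}}{1 - \phi^{2}} = \operatorname{Var}(x_t) \quad (T \to \infty)
\end{aligned}
```

When φ is close to 1, φ^T decays slowly, so iterated AR (1) forecasts stay informative for several steps; this is one way to read the high short-lead accuracy reported for case-1.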
In contrast, the performance of the ARMA (3, 1) model for case-2 (Soyang Reservoir) was relatively poor. At T = 1 day, the correlation coefficient was R = 0.6139, and the forecasting accuracy dropped sharply to R = 0.1328 at T = 10 days and R = 0.0695 at T = 20 days. This trend was consistent with the results obtained from the DVS and neural network models, which also showed lower performance for the stochastic inflow series compared to the chaotic streamflow series.
In the forecasting results using ARMA models, the AR (1) model provided highly accurate results for the chaotic streamflow series, while the ARMA (3, 1) model showed limited forecasting capability for the stochastic inflow series. The forecasting performance of both models across different lead times is summarized in Tables 7 and 8, and the relationship between observed and forecasted values is illustrated in Figs. 9 and 10. As shown, the AR (1) model captures the trend of the observed values well, especially at shorter lead times, whereas the ARMA (3, 1) model struggles to predict the inflow dynamics at the Soyang Reservoir, with forecasting performance deteriorating rapidly as the lead time increases.
5. Conclusions
This study analyzed the dynamic characteristics of two daily flow time series, the streamflow of the St. Johns River and the inflow to the Soyang Reservoir, using nonlinear dynamic techniques, including the correlation dimension and the DVS algorithm, as well as neural network and ARMA-based forecasting models. The comparative results revealed distinct behaviors in the two time series and highlighted the impact of these characteristics on forecasting performance.
The St. Johns River data exhibited low-dimensional deterministic chaos, as indicated by both the correlation dimension analysis and the DVS plots. Forecasting with the DVS algorithm and neural networks produced high accuracy, particularly at short lead times, with correlation coefficients exceeding 0.99 for a 1-day lead time. The AR (1) model showed comparable performance, suggesting that linear models may still be effective when the underlying dynamics are well captured.
Additionally, a detailed examination of the prediction accuracy metrics across lead times reveals notable trends. For the St. Johns River dataset, all three models—DVS, Neural Network, and AR (1)—maintained strong performance at T = 1 day with RMSE values around 47-52 and correlation coefficients above 0.999. However, as the lead time increased to T = 20 days, RMSE values rose significantly (e.g., 828.96 for AR (1)), and correlation coefficients declined to approximately 0.81. The RRMSE also doubled or tripled across lead times, reflecting growing relative errors over time. This illustrates that even in chaotic systems, predictive accuracy gradually deteriorates with longer horizons, though performance remains reasonably acceptable.
In contrast, the inflow series at the Soyang Reservoir displayed stochastic and nonlinear characteristics, as indicated by gradually increasing correlation dimension values and strong deviations in the BDS statistic. Forecasting accuracy for this dataset was relatively lower across all methods, including DVS, neural networks, and ARMA (3, 1), with correlation coefficients decreasing significantly as the lead time increased. In particular, RMSEs exceeded 100 across all lead times, and correlation coefficients dropped below 0.1 by T = 20 days, while RRMSE values surpassed 0.9. These results highlight that the stochastic nature of the inflow series imposes fundamental limitations on predictive accuracy, and such sensitivity is exacerbated with increasing lead times. This underlines the critical need to consider the underlying system dynamics when interpreting performance metrics and selecting appropriate forecasting methods.
Overall, the findings underscore the importance of identifying the intrinsic dynamic structure of hydrological time series when selecting forecasting approaches. While chaos-based methods and shallow neural networks offer strong predictive capabilities for systems with deterministic nonlinearity, their performance declines for highly stochastic series. Conversely, conventional models such as ARMA may provide stable, though limited, accuracy for short-term forecasts in stochastic systems.
Future work should explore hybrid modeling approaches that combine stochastic frameworks with deep learning architectures or chaos-informed structures to improve long-term forecasting accuracy. Additionally, integrating external hydrological drivers, such as precipitation and land-use change data, could enhance model interpretability and robustness in operational forecasting.