Evaluating Forecasts from State-Dependent Autoregressive Models for US GDP Growth Rate. Comparison with Alternative Approaches

The aim of the paper is to compare the forecasting performance of a class of state-dependent autoregressive (SDAR) models for univariate time series with that of two alternative families of nonlinear models, namely SETAR and GARCH models. The study is conducted on the US GDP growth rate using quarterly data. Two methods of forecast comparison are employed. The first evaluates average performance using two measures, the root mean square error (RMSE) and the mean absolute error (MAE), over different forecast horizons, while the second uses one of the most widely used statistical tests for comparing the accuracy of two forecast methods, the Diebold-Mariano test.


Introduction
In this paper we propose a class of state-dependent autoregressive (SDAR) models to study nonlinearities in economic time series such as the quarterly US GDP growth rate. The aim is to compare the predictive ability of SDAR models with that of linear autoregressive (AR) time series models and of two leading classes of nonlinear models, the self-exciting threshold autoregressive (SETAR) model and the generalized autoregressive conditional heteroskedasticity (GARCH) model, which have already been proposed for US GDP. The question we address is whether SDAR models offer a much improved forecast performance.
The class of SDAR models is a generalized version of a first-order autoregressive process in which the autoregressive coefficient depends on the first lag of the state variable. Its equation is yt = α + ψ(yt−1; γ)yt−1 + ξt, (1.1) where ψ(·; ·) is a specified function satisfying some assumptions and depending on a set of parameters γ. The error term ξt is independent of yt−1 with zero mean and volatility σ. SDAR models are closely related to the functional-coefficient autoregressive (FAR) models introduced by Chen and Tsay (1993), where p autoregressive coefficients are given by measurable functions depending on k < p lagged values of yt. Within this framework, Cai, Fan and Yao (2000) adopt local linear regression techniques to estimate functional coefficient regression models for time series data, while Chen and Liu (2001) study nonparametric estimation and hypothesis testing procedures for the same model. In Cherubini and Gobbi (2013) SDAR models are derived as a special case of a more general class of convolution-based autoregressive processes in which the error term is not independent of the lagged value of the state variable (see also Cherubini, Gobbi and Mulinacci, 2016). More recently, Gobbi and Mulinacci (2020) define the class of SDAR models more rigorously, establish their main statistical properties, such as stationarity and ergodicity, and determine the asymptotic behaviour of the quasi-maximum likelihood (QML) estimator of the parameters. In the same paper, the authors compare the forecast performance of two specifications of SDAR models with that of SETAR models for time series of weekly realized volatilities extracted from three different European financial indexes, showing that SDAR models ensure a gain in accuracy in two cases out of three, at least for short and medium forecast horizons.
Furthermore, Gobbi (2020) documents, through a Monte Carlo experiment, that nonlinearity in time series generated from an SDAR model strictly depends on the functional form of the persistence function ψ and on the values of the parameters.
A class of alternative nonlinear models we consider in this paper is that of self-exciting threshold autoregressive (SETAR) models, which were first proposed and studied by Tong (1978, 1986 and 1995) and Tong and Lim (1980). In SETAR models the variable yt follows a linear autoregression within a regime but may move among regimes depending on the value taken by a lag of yt itself. A number of authors have estimated SETAR models of US GDP. Tiao and Tsay (1994) consider a two-regime SETAR model, while Potter (1995) estimates a SETAR(2,5,5) with the coefficients on the third and fourth lags restricted to zero in both regimes. Both papers use time series from 1947 to 1990. A key feature of SETAR models for US GDP over this period is a large and negative coefficient on the second lag in the lower regime, indicating that the US economy moves rapidly out of recession periods. Moreover, Tiao and Tsay (1994) find that the forecast performance of the SETAR model relative to a linear AR model improves when the comparison is restricted to periods in which the economy is in recession (i.e., the lower regime is activated).
Clements and Smith (1997) implement a Monte Carlo simulation to show that the regimes have a significant effect on forecast accuracy. In particular, the authors find that the gain in the lower regime needs to be sufficiently large for the SETAR to perform well on average.
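The regime-switching mechanism described above can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions: the function name `simulate_setar`, the threshold, the coefficients and the error volatility are hypothetical values chosen for illustration, not estimates taken from this paper or from the cited literature.

```python
import random

def simulate_setar(n, v, lower, upper, sigma=1.0, seed=0):
    """Simulate a two-regime SETAR process whose regime is chosen by y[t-1].

    lower, upper: (intercept, [phi_1, ..., phi_p]) for each regime.
    Every numerical value passed in below is an illustrative assumption.
    """
    rng = random.Random(seed)
    p = max(len(lower[1]), len(upper[1]))
    y = [0.0] * p                       # zero initial values
    regimes = []
    for t in range(p, n + p):
        in_lower = y[t - 1] <= v        # the first lag selects the regime
        alpha, phi = lower if in_lower else upper
        mean = alpha + sum(c * y[t - 1 - i] for i, c in enumerate(phi))
        y.append(mean + rng.gauss(0.0, sigma))
        regimes.append(1 if in_lower else 2)
    return y[p:], regimes

# Hypothetical parameters: note the negative second-lag coefficient
# in the lower regime, mimicking a rapid exit from low-growth states.
y, regimes = simulate_setar(
    n=500, v=0.0,
    lower=(-0.5, [0.3, -0.8]),
    upper=(0.4, [0.3, 0.2]),
)
share_lower = regimes.count(1) / len(regimes)
print(f"share of observations in the lower regime: {share_lower:.2f}")
```

Counting how often the lower regime is active, as in the last lines, is one simple way to see how much of a sample can contribute to any forecasting gain that arises only in that regime.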
The second alternative class of nonlinear models we use is that of generalized autoregressive conditional heteroskedasticity (GARCH) models, developed by Bollerslev (1986) as an extension of the ARCH models introduced by Engle (1982). GARCH models are nonlinear in variance, since their crucial feature is heteroskedasticity, that is, volatility is not assumed to be constant over time. Since the US GDP growth rate involves long-run phenomena, structural changes in volatility can occur with high probability. Kim
Our aim is to measure the forecasting accuracy for the US GDP growth rate of the three classes of nonlinear models mentioned above, SDAR, SETAR and AR-GARCH, using the linear AR as a benchmark. The forecast accuracy of the different models is evaluated according to two criteria. We first evaluate the average performance using the root mean square error (RMSE) and the mean absolute error (MAE) over different forecast horizons, from 1 to 8 quarters ahead. The second criterion is provided by the Diebold-Mariano (DM) test, introduced by Diebold and Mariano (1995) to compare the accuracy of two forecast methods. We use the modified version of the test proposed by Harvey, Leybourne and Newbold (1997), which is better suited to small samples. We will show that, whereas the first criterion highlights a higher performance of SDAR models with respect to the alternatives analyzed, this conclusion is not completely confirmed by the second criterion.
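To make the second criterion concrete, the sketch below computes the Diebold-Mariano statistic and its small-sample modification by Harvey, Leybourne and Newbold for two series of h-step forecast errors. It is a minimal illustration under stated assumptions: the function name `dm_hln`, the choice of loss function through the `power` argument and the example errors are hypothetical and are not taken from the paper.

```python
import math

def dm_hln(e1, e2, h=1, power=2):
    """Diebold-Mariano statistic and its Harvey-Leybourne-Newbold correction.

    e1, e2: h-step-ahead forecast errors of two competing methods.
    power:  2 for squared-error loss, 1 for absolute-error loss.
    Returns (DM, modified DM); the modified statistic is compared with a
    Student t distribution with n - 1 degrees of freedom.
    """
    n = len(e1)
    d = [abs(a) ** power - abs(b) ** power for a, b in zip(e1, e2)]
    d_bar = sum(d) / n

    def autocov(k):
        # Sample autocovariance of the loss differential at lag k
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # Variance of d_bar using autocovariances up to lag h - 1
    var_d_bar = (autocov(0) + 2 * sum(autocov(k) for k in range(1, h))) / n
    if var_d_bar <= 0:
        raise ValueError("non-positive variance estimate")
    dm = d_bar / math.sqrt(var_d_bar)
    correction = math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return dm, correction * dm

# Made-up one-step forecast errors, purely for illustration
e_a = [0.9, -1.1, 0.7, -0.4, 1.3, -0.8, 0.6, -1.0]
e_b = [0.3, -0.5, 0.4, -0.1, 0.6, -0.2, 0.5, -0.7]
dm, mdm = dm_hln(e_a, e_b, h=1)
print(f"DM = {dm:.3f}, modified DM = {mdm:.3f}")
```

A positive statistic indicates that the first method has the larger average loss. The correction factor shrinks the statistic towards zero, which matters most when the number of forecasts is small relative to the horizon.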
The paper is organised as follows. Section 2 describes the data set used in the empirical analysis. Section 3 briefly introduces the models adopted. Section 4 reports and discusses the estimation results. In section 5 we present the forecast accuracy comparison among the models. Section 6 concludes.

Preliminary data analysis
The empirical data analysis has been carried out on the quarterly US GDP growth rate.
The observation period runs from 1950.Q2 to 2017.Q3 (270 observations) and is depicted in figure 1. The series appears mean-stationary, while the variance exhibits volatility clustering, with periods of high volatility followed by periods of low volatility.
Furthermore, volatility is higher in the first part of the time series (indicatively until the 1980s).
Chan (1990). The null hypothesis is that the model fitted to the time series is an AR model with a specified lag structure, and the alternative is that the fitted model is a threshold autoregressive model with the same lag structure in each regime. Finally, the Tsay test, introduced in Tsay (1986), is a test for quadratic nonlinearity in a time series in which the null hypothesis is a normal AR process.

Linearity tests
The results show that there is no strong evidence of nonlinearity in the full series, since in a number of cases the tests fail to reject linearity. However, for at least one lag all tests reject the null. In particular, the TNN test yields low p-values (below 10%) regardless of the lag structure assumed. On the other hand, the Tlrt test and the Tsay test reject the null of linearity only for a lag structure equal to 3, indicating weak support for quadratic and threshold autoregressive nonlinearities. To assess whether the nonlinearity strengthens in the more recent period, we conduct the same linearity tests on the portion of the sample covering the last 10 years of observations. Unfortunately, table 2

The models
We now briefly present the models proposed in the introduction, referring the reader to the cited literature for more details.
The benchmark model is the standard linear autoregressive model of order p (AR(p)), which has the following equation
$$
y_t = \alpha + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \xi_t,
$$
where ξt is independent of the lagged variables yt−1, ..., yt−p. The vector of parameters is θ = (α, φ1, ..., φp, σ). The reader interested in linear autoregressive models can consult, among others, Hamilton (1994) and Brockwell and Davis (1991).
To compare forecasting accuracy for the US GDP growth rate, we specify and estimate five alternative nonlinear models within three different classes of models: SETAR, GARCH and SDAR. Below we briefly outline their representation.
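For a linear AR(p) with intercept α and coefficients φ1, ..., φp, forecasts several quarters ahead can be obtained by iterating the model forward and replacing future errors with their zero mean. The sketch below illustrates this recursion; the function name `ar_forecast`, the coefficient values and the recent observations are hypothetical and are not the estimates reported in this paper.

```python
def ar_forecast(history, alpha, phi, horizon=8):
    """Iterated point forecasts 1..horizon steps ahead from an AR(p) model."""
    p = len(phi)
    path = list(history[-p:])           # last p observations, oldest first
    forecasts = []
    for _ in range(horizon):
        # phi[0] multiplies the most recent value, phi[1] the one before, ...
        y_hat = alpha + sum(phi[i] * path[-1 - i] for i in range(p))
        forecasts.append(y_hat)
        path.append(y_hat)              # future shocks are set to their zero mean
    return forecasts

# Hypothetical AR(2) coefficients and recent observations
alpha, phi = 0.5, [0.3, 0.1]
forecasts = ar_forecast([0.9, 0.4, 0.7], alpha, phi)
long_run_mean = alpha / (1 - sum(phi))  # unconditional mean of a stationary AR(p)
print([round(f, 4) for f in forecasts], round(long_run_mean, 4))
```

With stationary coefficients the forecasts approach the unconditional mean as the horizon grows, which is one reason why differences between competing models tend to be most visible at short horizons.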
• Self-exciting threshold autoregressive (SETAR) models were first proposed in Tong (1978, 1983) and Tong and Lim (1980), and are discussed in detail in Tong (1995). The SETAR models considered in this paper assume that the variable yt follows a linear autoregression within a regime, but may move between regimes depending on the value taken by the first lag yt−1. We estimate two SETAR models, the first with two regimes and the second with three regimes. We denote by SETAR(2, p1, p2) the model with two regimes, whose equation is
$$
y_t = \begin{cases}
\alpha_1 + \sum_{i=1}^{p_1} \phi_{1,i}\, y_{t-i} + \xi_{1,t}, & y_{t-1} \le v, \\
\alpha_2 + \sum_{i=1}^{p_2} \phi_{2,i}\, y_{t-i} + \xi_{2,t}, & y_{t-1} > v,
\end{cases}
$$
where v is the threshold, p1 and p2 are the orders of the linear AR within each regime, and ξj,t ∼ IID N(0, σj), j = 1, 2. Furthermore, ξ1,t and ξ2,t are independent for all t. The vector of parameters is θ = (α1, α2, φ1,1, ..., φ1,p1, φ2,1, ..., φ2,p2, σ1, σ2). The SETAR model with three regimes, denoted by SETAR(3, p1, p2, p3), is defined as
$$
y_t = \begin{cases}
\alpha_1 + \sum_{i=1}^{p_1} \phi_{1,i}\, y_{t-i} + \xi_{1,t}, & y_{t-1} \le v_1, \\
\alpha_2 + \sum_{i=1}^{p_2} \phi_{2,i}\, y_{t-i} + \xi_{2,t}, & v_1 < y_{t-1} \le v_2, \\
\alpha_3 + \sum_{i=1}^{p_3} \phi_{3,i}\, y_{t-i} + \xi_{3,t}, & y_{t-1} > v_2,
\end{cases}
$$
where v1 and v2 are the two thresholds, p1, p2 and p3 are the orders of the linear AR within each regime, and ξj,t ∼ IID N(0, σj), j = 1, 2, 3. The vector of parameters is, analogously, θ = (α1, α2, α3, φ1,1, ..., φ1,p1, φ2,1, ..., φ2,p2, φ3,1, ..., φ3,p3, σ1, σ2, σ3).
• GARCH models were proposed in Bollerslev (1986) as a generalization of the ARCH model introduced in Engle (1982). In this paper we consider an AR(p) component in place of a constant mean in the equation of the variable yt, in light of the preliminary analysis carried out in the previous section on the time series of the US GDP growth rate. Therefore, our specification of the model is the following
$$
y_t = \alpha + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \xi_t, \qquad
\operatorname{Var}(\xi_t \mid \mathcal{F}_{t-1}) = \sigma_t^2 = \omega_0 + \omega_1\, \xi_{t-1}^2 + \omega_2\, \sigma_{t-1}^2,
$$
where Ft−1 is the information set, which includes the lagged values yt−1, yt−2, ... of the variable, and the conditional variance has a GARCH(1,1) specification. The vector of parameters is θ = (α, φ1, ..., φp, ω0, ω1, ω2).
• SDAR models are defined by equation (1.1),
$$
y_t = \alpha + \psi(y_{t-1}; \gamma)\, y_{t-1} + \xi_t,
$$
where the persistence function ψ depends on a set of parameters γ and must satisfy a number of assumptions in order to guarantee that the resulting process (yt)t≥1 is stationary and ergodic, as shown in Gobbi and Mulinacci (2020). The choice of the function ψ completely determines the SDAR model.
In this paper we consider two specifications of the model, denoted by SDAR1 and SDAR2. Both satisfy the required assumptions, as shown in Gobbi and Mulinacci (2020). The first model, SDAR1, is defined as
$$
y_t = \alpha + e^{-(\gamma_0 + \gamma_1 y_{t-1}^{2r})}\, y_{t-1} + \xi_t,
$$
where γ0, γ1 > 0. The error term ξt is independent of yt−1 for all t. The vector of parameters is θ = (α, γ0, γ1, r, σ). Note that this specification is a generalization of the EXPAR models introduced by Haggan and Ozaki (1981). Some insights about the persistence function $e^{-(\gamma_0 + \gamma_1 y_{t-1}^{2r})}$ are needed. We can notice that it is decreasing with
The second model, SDAR2, is given by (3.7), where γ0 > 1 and γ1 > 0, whereas the statistical properties of ξt and the vector of parameters are the same as for the SDAR1 model. For this specification the same considerations about the persistence function apply. We can only observe that the maximum is 1/γ0.
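The behaviour of the SDAR1 persistence function can be explored numerically. The sketch below evaluates e^{−(γ0 + γ1 y^{2r})} and simulates a path from the corresponding model with Gaussian errors. It is only an illustration under stated assumptions: the names `psi_sdar1` and `simulate_sdar1`, the parameter values, the Gaussian error distribution and the restriction of r to a positive integer are assumptions made here, not results or estimates from the paper.

```python
import math
import random

def psi_sdar1(y_lag, gamma0, gamma1, r):
    """SDAR1 persistence function exp(-(gamma0 + gamma1 * y_lag**(2r)))."""
    return math.exp(-(gamma0 + gamma1 * y_lag ** (2 * r)))

def simulate_sdar1(n, alpha, gamma0, gamma1, r, sigma, y0=0.0, seed=0):
    """Simulate n observations of y_t = alpha + psi(y_{t-1}) y_{t-1} + xi_t.

    The errors are drawn as N(0, sigma^2); this distribution is an assumption.
    """
    rng = random.Random(seed)
    y = [y0]
    for _ in range(n):
        persistence = psi_sdar1(y[-1], gamma0, gamma1, r)
        y.append(alpha + persistence * y[-1] + rng.gauss(0.0, sigma))
    return y[1:]                         # drop the initial value

# Hypothetical parameter values (r assumed to be a positive integer)
gamma0, gamma1, r = 0.5, 0.2, 1
values = [psi_sdar1(x, gamma0, gamma1, r) for x in (0.0, 1.0, 2.0)]
path = simulate_sdar1(200, alpha=0.3, gamma0=gamma0, gamma1=gamma1, r=r, sigma=1.0)
print([round(v, 4) for v in values])
```

Under these assumptions the function takes its largest value, e^{−γ0}, at yt−1 = 0 and is symmetric in the sign of yt−1, so the autoregressive coefficient is strongest when the lagged growth rate is close to zero.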

Estimation
The empirical results for the parameter estimates of the models presented above are re-
To evaluate whether the proposed models are well specified, we consider the residual diagnostics in

Forecasting
We assess the forecast performance of each estimated model relative to the linear AR(2)
we deduce that, if the RE is measured in terms of the RMSE, only the SDAR models offer a better performance than the linear AR(2), at least for the first 4 quarters. After this horizon the accuracy seems to be equivalent, although that of the linear AR(2) is slightly higher. On the other hand, the remaining alternative models tend to perform worse, but the SETAR(3,3,3,1) model is the only one to improve significantly as the horizon lengthens, until it becomes superior to the linear AR(2) for the last two forecast horizons. As regards SETAR models, Tiao and Tsay (1994) find that the forecasts obtained with this class of models are markedly superior to those obtained with linear AR models if we only consider forecasts made when the economy is in the lower regime, reflecting the ability of SETAR models to capture the movements out of recession. In our observed time series the percentage of data belonging to the lower regime is 36%, and this partly explains why, on average, the forecasts obtained by the SETAR models perform worse.
The same considerations are strengthened if we consider the RE in terms of the MAE, as in table 5 and figure 6. In this case, both SDAR models provide predictions that are more accurate than the benchmark at each forecast horizon, and in the first four quarters (basically over the course of a year) the gain in accuracy is considerable. Based on this measure we can conclude that there is evidence that SDAR models have superior predictive ability compared to the alternative models analyzed in this paper. These findings are not surprising if we consider the preliminary results on the sample. As shown in table 2, the evidence of nonlinearity is not strong, and this is confirmed and strengthened in the last ten years. The Tlrt test seems to exclude the presence of a threshold autoregressive structure in the last portion of the sample regardless of the lag considered. This can explain the relatively worse performance of SETAR models compared with the alternatives. SDAR models appear less conditioned by the kind and strength of the nonlinearity in the data.
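The horizon-by-horizon comparison against the benchmark can be sketched as follows. The example assumes that the relative measure is the ratio of a model's RMSE (or MAE) to that of the benchmark at each horizon, so that values below 1 favour the model; this reading of the RE, the function names and the forecast errors are assumptions made for illustration only.

```python
import math

def rmse(errors):
    """Root mean square error of a list of forecast errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mae(errors):
    """Mean absolute error of a list of forecast errors."""
    return sum(abs(e) for e in errors) / len(errors)

def relative_measures(model_errors, benchmark_errors):
    """Ratios of RMSE and MAE, by horizon, of a model to a benchmark.

    Both arguments map a horizon h to the list of h-step forecast errors.
    A ratio below 1 means the model is more accurate than the benchmark.
    """
    return {
        h: (rmse(model_errors[h]) / rmse(benchmark_errors[h]),
            mae(model_errors[h]) / mae(benchmark_errors[h]))
        for h in sorted(model_errors)
    }

# Made-up forecast errors for two horizons, purely for illustration
model = {1: [0.2, -0.1, 0.3, -0.2], 2: [0.5, -0.4, 0.6, -0.3]}
bench = {1: [0.4, -0.2, 0.6, -0.4], 2: [0.5, -0.4, 0.6, -0.3]}
ratios = relative_measures(model, bench)
for h, (rel_rmse, rel_mae) in ratios.items():
    print(f"h={h}: relative RMSE={rel_rmse:.2f}, relative MAE={rel_mae:.2f}")
```

Because the RMSE penalizes large errors more heavily than the MAE, the two ratios can rank models differently when a few forecasts miss by a wide margin, which is why both measures are worth reporting.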
third direction is to address the problem of alternative (asymmetric?) distributions of the error term.