Fit extended SARIMA models

Fit extended SARIMA models, which can include lagged exogeneous variables, general unit root non-stationary factors, multiple periodicities, and multiplicative terms in the SARIMA specification. The models are specified with a flexible formula syntax and contain as special cases many models with specialised names, such as ARMAX and reg-ARIMA.

Usage

sarima(model, data = NULL, ss.method = "sarima", use.symmetry = FALSE, 
       SSinit = "Rossignol2011")

Arguments

model: a model formula specifying the model.
data: a list or data frame, usually can be omitted.
ss.method: state space engine to use, defaults to "sarima". (Note: this argument will probably be renamed.)
use.symmetry: a logical argument indicating whether symmetry should be used to estimate the unit polynomial.
SSinit: method to use for computation of the stationary part of the initial covariance matrix, one of "Rossignol2011", "gnb", "Gardner1980".

Details

sarima fits extended SARIMA models, which can include exogeneous variables, general unit root non-stationary factors and multiplicative terms in the SARIMA specification.

Let $\{Y_t\}$ be a time series and $f(t)$ and $g(t)$ be functions of time and/or (possibly lagged) exogeneous variables.

An extended pure SARIMA model for $Y_t$ can be written with the help of the backward shift operator as $$U(B)\Phi(B)Y_t = \Theta(B)\varepsilon_t,$$ where $\{\varepsilon_t\}$ is white noise, and $U(z)$, $\Phi(z)$, and $\Theta(z)$ are polynomials, such that all roots of $U(z)$ are on the unit circle, while the roots of $\Phi(z)$ and $\Theta(z)$ are outside the unit circle. If unit roots are missing, ie $U(z)\equiv 1$, the model is stationary with mean zero.

A reg-SARIMA or X-SARIMA model can be defined as a regression with SARIMA residuals: $$Y_t = f(t) + Y^c_t$$ $$U(B)\Phi(B)Y^c_t = \Theta(B)\varepsilon_t,$$ where $Y^c_t = Y_t - f(t)$ is the centred $Y_t$. This can be written equivalently as a single equation: $$U(B)\Phi(B)(Y_t - f(t)) = \Theta(B)\varepsilon_t.$$ The regression function $f(t)$ can depend on time and/or (possibly lagged) exogeneous variables. We call it centering function. If $Y^c_t$ is stationary with mean zero, f(t) is the mean of $Y_t$. If f(t) is constant, say mu, $Y_t$ is stationary with mean mu. Note that the two-equation form above shows that in that case mu is the intercept in the first equation, so it is perfectly reasonable to refer to it also as intercept but to avoid confusion we reserve the term intercept for g(t) below.

If the SARIMA part is stationary, then $EY_t = f(t)$, so $f(t)$ can be interpreted as trend. In this case the above specification is often referred to as mean corrected form of the model.

An alternative way to specify the regression part is to add the regression function, say $\{g(t)\}$, to the right-hand side of the SARIMA equation: $$U(B)\Phi(B)Y_t = g(t) + \Theta(B)\varepsilon_t.$$ In the stationary case this is the classical ARMAX specification. This can be written in two-stage form in various ways, eg $$U(B)\Phi(B)Y_t = (1 - \Theta(B))\varepsilon_t + u_t,$$ $$u_t = g(t) + \varepsilon_t .$$ So, in a sense, g(t) is a trend associated with the residuals from the SARIMA modelling. We refer to this form as intercept form of the model (as opposed to the mean-corrected form discussed previously).

In general, if there are no exogeneous variables the mean-corrected model is equivalent to some intercept model, which gives some justification of the terminology, as well. If there are exogeneous variables equivalence may be achievable but at the expense of introducing more lags in the model, whish is not desirable in general.

Some examples of equivalence. Let Y be a stationary SARIMA process ($U(z)=1$) with mean $\mu$. Then the mean-corrected form of the SARIMA model is $$\Phi(B)(Y_t - \mu) = \Theta(B)\varepsilon_t,$$ while the intercept form is $$\Phi(B)Y_t = c + \Theta(B)\varepsilon_t,$$ where $c = \Phi(B)\mu$. So, in this case the mean-corrected model X-SARIMA model with $f(t) = \mu$ is equivalent to the intercept model with $g(t) = \Phi(B)\mu$.

As another example, with $f(t) = bt$, the mean-corrected model is $(1-B)(Y_t - bt) = \varepsilon_t$. Expanding the left-hand side we obtain the intercept form $(1-B)Y_t = b + \varepsilon_t$, which demonstrates that $Y_t$ is a random walk with drift $g(t) = b$.

Model specification

Argument model specifies the model with a syntax similar to other model fitting functions in R. A formula can be given for each of the components discussed above as y ~ f | SARIMA | g, where f, SARIMA and g are model formulas giving the specifications for the centering function f, the SARIMA specification, and the intercept function g. In normal use only one of f or g will be different from zero. f should always be given (use 0 to specify that it is identical to zero), but g can be omitted altogether. Sometimes we refer to the terms specified by f and g by xreg and regx, respectively.

Model formulas for trends and exogeneous regressions

The formulas for the centering and intercept (ie f and g) use the same syntax as in linear models with some additional functions for trigonometric trends, polynomial trends and lagged variables.

Here are the available specialised terms:

.p(d)

Orthogonal polynomials over 1:length(y) of degree d (starting from degree 1, no constant).

t

Stands for 1:length(y). Note that powers need to be protected by I(), e.g. y ~ 1 + .t + I(.t^2).

.cs(s, k)

cos/sin pair for the k-th harmonic of 2pi/s. Use vector k to specify several harmonics.

.B(x, lags)

Include lagged terms of x, $B^{lags}(x[t]) = x[t - lags]$. lags can be a vector. If x is a matrix, the specified lags are taken from each column.

Model formulas for SARIMA models

A flexible syntax is provided for the specification of the SARIMA part of the model. It is formed using a number of primitives for stationary and unit root components, which have non-seasonal and seasonal variants. Arbitrary number of multiplicative factors and multiple seasonalities can be specified.

The SARIMA part of the model can contain any of the following terms. They can be repeated as needed. The first argument for all seasonal operators is the number of seasons.

ar(p)

autoregression term of order p

ma(q)

moving average term of order q

sar(s,p)

seasonal autoregression term (s seasons, order p)

sma(s,q)

seasonal moving average term (s seasons, order q)

i(d)

$(1-B)^d$

s(seas)

summation operator, $(1 + B + \cdots + B^{seas -1})$

u(x)

quadratic unit root term, corresponding to a complex pair on the unit circle. If $x$ is real, it specifies the argument of one of the roots as a fraction of $2\pi$. If $z$ is complex, it is the root itself.

The real roots of modulus one (1 and $-1$) should be specified using i(1) and s(2), which correspond to $1-B$ and $1+B$, respectively.

su(s, h)

quadratic unit root terms corresponding to seasonal differencing factors. h specifies the desired harmonic which should be one of 1,2, ..., [s/2]. Several harmonics can be specified by setting h to a vector.

ss(s, p)

seasonal summation operator, $(1 + B^s + \cdots + B^{(s-1)p})$

Terms with parameters can contain additional arguments specifying initial values, fixed parameters, and transforms. For ar, ma, sar, sma, values of the coefficients can be specified by an unnamed argument after the parameters given in the descriptions above. In estimation these values will be taken as initial values for optimisation. By default, all coefficients are taken to be non-fixed.

Argument fixed can be used to fix some of them. If it is a logical vector it should be of length one or have the same length as the coefficients. If fixed is of length one and TRUE, all coefficients are fixed. If FALSE, all are non-fixed. Otherwise, the TRUE/FALSE values in fixed determine the fixedness of the corresponding coefficients.

fixed can also be a vector of positive integer numbers specifying the indices of fixed coefficients, the rest are non-fixed.

Sometimes it may be easier to declare more (e.g. all) coefficients as fixed and then `unfix' selectively. Argument nonfixed can be used to mark some coefficients as non-fixed after they have been declared fixed. Its syntax is the same as for fixed.

TODO: streamline "atanh.tr"

TODO: describe SSinit

Value

an object from S3 class Sarima

(Note: the format of the object is still under development and may change; use accessor functions, such as coef(), where provided.)

References

Halliday J, Boshnakov GN (2022). “Partial autocorrelation parameterisation of models with unit roots on the unit circle.” doi:10.48550/ARXIV.2208.05055 , https://arxiv.org/abs/2208.05055.

Author

Georgi N. Boshnakov

Note

Currently the implementation of the intercept form (ie the third part of the model formula) is incomplete.

Examples

## AirPassengers example
## fit the classic airline model using arima()
ap.arima <- arima(log(AirPassengers), order = c(0,1,1), seasonal = c(0,1,1))

## same model using two equivalent ways to specify it
ap.baseA <- sarima(log(AirPassengers) ~ 
                   0 | ma(1, c(-0.3)) + sma(12,1, c(-0.1)) + i(1) + si(12,1), 
                   ss.method = "base")
ap.baseB <- sarima(log(AirPassengers) ~ 
                   0 | ma(1, c(-0.3)) + sma(12,1, c(-0.1)) + i(2) + s(12), 
                   ss.method = "base")

ap.baseA
#> *Sarima model*
#> Call:
#> sarima(model = log(AirPassengers) ~ 0 | ma(1, c(-0.3)) + sma(12, 1, c(-0.1)) + 
#>     i(1) + si(12, 1), ss.method = "base")
#> 
#> Unit root terms:
#>      (1 - B)(1 - B^12)
#> 
#> Coefficients:
#>           ma1     sma1
#>       -0.4015  -0.5615
#> s.e.   0.1072   0.1068
#> 
#> sigma^2 estimated as 0.001347:  log likelihood = 169.3,  aic = -332.61
summary(ap.baseA)
#> 
#> Call:
#> sarima(model = log(AirPassengers) ~ 0 | ma(1, c(-0.3)) + sma(12, 
#>     1, c(-0.1)) + i(1) + si(12, 1), ss.method = "base")
#> 
#> Model:  Y_t - xreg_t is SarimaX
#>   xreg:    log(AirPassengers) ~ 0 
#>   sarima:  ~ma(1, c(-0.3)) + sma(12, 1, c(-0.1)) + i(1) + si(12, 1) 
#>   regx:    0 
#> 
#> Unit root terms:
#>      (1 - B)(1 - B^12)
#> 
#> Coefficients:
#>      Estimate Std. Error Z value  Pr(>|z|)    
#> ma1  -0.40153    0.10716 -3.7472 0.0001788 ***
#> sma1 -0.56154    0.10680 -5.2581 1.456e-07 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> estimated sigma^2 = 0.00135,  log-likelihood = 169.3,  aic = -332.61
ap.baseB
#> *Sarima model*
#> Call:
#> sarima(model = log(AirPassengers) ~ 0 | ma(1, c(-0.3)) + sma(12, 1, c(-0.1)) + 
#>     i(2) + s(12), ss.method = "base")
#> 
#> Unit root terms:
#>      (1 - B)^2(1 + B + ... + B^11)
#> 
#> Coefficients:
#>           ma1     sma1
#>       -0.4015  -0.5615
#> s.e.   0.1072   0.1068
#> 
#> sigma^2 estimated as 0.001347:  log likelihood = 169.3,  aic = -332.61
summary(ap.baseB)
#> 
#> Call:
#> sarima(model = log(AirPassengers) ~ 0 | ma(1, c(-0.3)) + sma(12, 
#>     1, c(-0.1)) + i(2) + s(12), ss.method = "base")
#> 
#> Model:  Y_t - xreg_t is SarimaX
#>   xreg:    log(AirPassengers) ~ 0 
#>   sarima:  ~ma(1, c(-0.3)) + sma(12, 1, c(-0.1)) + i(2) + s(12) 
#>   regx:    0 
#> 
#> Unit root terms:
#>      (1 - B)^2(1 + B + ... + B^11)
#> 
#> Coefficients:
#>      Estimate Std. Error Z value  Pr(>|z|)    
#> ma1  -0.40153    0.10716 -3.7472 0.0001788 ***
#> sma1 -0.56154    0.10680 -5.2581 1.456e-07 ***
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> estimated sigma^2 = 0.00135,  log-likelihood = 169.3,  aic = -332.61

## as above, but drop 1-B from the model:
ap2.arima <- arima(log(AirPassengers), order = c(0,0,1), seasonal = c(0,1,1))
ap2.baseA <- sarima(log(AirPassengers) ~ 
                    0 | ma(1, c(-0.3)) + sma(12,1, c(-0.1)) +     si(12,1), 
                    ss.method = "base")
ap2.baseB <- sarima(log(AirPassengers) ~ 
                    0 | ma(1, c(-0.3)) + sma(12,1, c(-0.1)) + i(1) + s(12), 
                    ss.method = "base")

## for illustration, here the non-stationary part is 
##     (1-B)^2(1+B+...+B^5) = (1-B)(1-B^6)
##     (  compare to (1-B)(1-B^{12}) = (1-B)(1-B^6)(1+B^6) ) 
ap3.base <- sarima(log(AirPassengers) ~ 
                   0 | ma(1, c(-0.3)) + sma(12,1, c(-0.1)) + i(2) + s(6), 
                   ss.method = "base")

## further unit roots, equivalent specifications for the airline model
tmp.su <- sarima(log(AirPassengers) ~ 
                 0 | ma(1, c(-0.3)) + sma(12,1, c(-0.1)) + i(2) + s(2) + su(12,1:5), 
                 ss.method = "base")
tmp.su$interna$delta_poly
#> List of polynomials:
#> [[1]]
#> 1 - x 
#> 
#> [[2]]
#> 1 + x 
#> 
#> [[3]]
#> 1 - 1.732051*x + x^2 
#> 
#> [[4]]
#> 1 - x + x^2 
#> 
#> [[5]]
#> 1 + x^2 
#> 
#> [[6]]
#> 1 + x + x^2 
#> 
#> [[7]]
#> 1 + 1.732051*x + x^2 
#> 
prod(tmp.su$interna$delta_poly)
#> 1 - 8.881784e-16*x + 8.881784e-16*x^2 - 1.110223e-15*x^3 + 4.440892e-16*x^4 -  
#> 6.661338e-16*x^5 + 8.881784e-16*x^7 - 4.440892e-16*x^8 + 8.881784e-16*x^9 -  
#> 8.881784e-16*x^10 + 8.881784e-16*x^11 - x^12 
zapsmall(coef(prod(tmp.su$interna$delta_poly)))
#>  [1]  1  0  0  0  0  0  0  0  0  0  0  0 -1
tmp.su
#> *Sarima model*
#> Call:
#> sarima(model = log(AirPassengers) ~ 0 | ma(1, c(-0.3)) + sma(12, 1, c(-0.1)) + 
#>     i(2) + s(2) + su(12, 1:5), ss.method = "base")
#> 
#> Unit root terms:
#>      (1 - B)^2(1 + B)(1 - 1.73205080756888*B + B^2)(1 - B + B^2)(1 + B^2)(1 + B + B^2)(1 + 1.73205080756888*B + B^2)
#> 
#> Coefficients:
#>           ma1     sma1
#>       -0.4015  -0.5615
#> s.e.   0.1072   0.1068
#> 
#> sigma^2 estimated as 0.001347:  log likelihood = 169.3,  aic = -332.61

tmp.u <- sarima(log(AirPassengers) ~ 
                0 | ma(1, c(-0.3)) + sma(12,1, c(-0.1)) + i(2) + s(2) + u((1:5)/12), 
                ss.method = "base")
tmp.u
#> *Sarima model*
#> Call:
#> sarima(model = log(AirPassengers) ~ 0 | ma(1, c(-0.3)) + sma(12, 1, c(-0.1)) + 
#>     i(2) + s(2) + u((1:5)/12), ss.method = "base")
#> 
#> Unit root terms:
#>      (1 - B)^2(1 + B)(1 - 1.73205080756888*B + B^2)(1 - B + B^2)(1 + B^2)(1 + B + B^2)(1 + 1.73205080756888*B + B^2)
#> 
#> Coefficients:
#>           ma1     sma1
#>       -0.4015  -0.5615
#> s.e.   0.1072   0.1068
#> 
#> sigma^2 estimated as 0.001347:  log likelihood = 169.3,  aic = -332.61