Handle missing values in 'timeSeries' objects
stats-na.omit.Rd
Functions for handling missing values in "timeSeries" objects.
Arguments
- object
an object of class "timeSeries".
- method
for na.omit, the method of handling NAs; for interpNA, how to interpolate the matrix column by column, see Section ‘Details’.
- interp, type
Three alternative methods are provided to remove NAs from the data: type="zeros" replaces the missing values with zeros, type="mean" replaces the missing values with the column mean, type="median" replaces the missing values with the column median.
- FUN
a function or a name of a function, such as "mean" or median. FUN is applied to the non-NA values in each column to determine the replacement value. The call looks like FUN(coli, na.rm = TRUE), so FUN should have argument na.rm. All arguments except object are ignored if FUN is specified.
- x
a numeric matrix, or any other object which can be transformed into a matrix through x = as.matrix(x, ...). If x is a vector, it will be transformed into a one-dimensional matrix.
- ...
arguments to be passed to the function as.matrix.
Details
Functions for handling missing values in "timeSeries" objects and in objects which can be transformed into a vector or a two-dimensional matrix.
For na.omit, argument method specifies how to handle NAs. It can be one of the following strings:
- method="s"
na.rm = FALSE, skip, i.e. do nothing,
- method="r"
remove NAs,
- method="z"
substitute NAs by zeros,
- method="ir"
interpolate NAs and remove NAs at the beginning and end of the series,
- method="iz"
interpolate NAs and substitute NAs at the beginning and end of the series,
- method="ie"
interpolate NAs and extrapolate NAs at the beginning and end of the series.
For interpNA, argument method specifies how to interpolate the matrix column by column. It is one of the following character strings: "linear", "before", "after". For interpolation the function approx is used.
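The column-by-column interpolation described above can be sketched in base R with approx. This is only an illustration under assumptions, not the package's implementation: the helper name interp_columns is hypothetical, and the mapping of "before" and "after" to approx's method = "constant" with f = 0 and f = 1 is an assumption about how such behaviour can be obtained.

```r
# Hypothetical helper (not part of the timeSeries package): interpolate the
# NAs of each column on the row-index scale using stats::approx().
interp_columns <- function(x, method = c("linear", "before", "after")) {
  method <- match.arg(method)
  x <- as.matrix(x)
  idx <- seq_len(nrow(x))
  for (j in seq_len(ncol(x))) {
    ok <- !is.na(x[, j])
    if (sum(ok) < 2L) next  # too few known points to interpolate between
    x[, j] <- approx(
      x = idx[ok], y = x[ok, j], xout = idx,
      method = if (method == "linear") "linear" else "constant",
      f = if (method == "after") 1 else 0  # 0: previous value, 1: following value
    )$y
  }
  x
}

m <- cbind(a = c(1, NA, 3, NA), b = c(NA, 10, NA, 30))
interp_columns(m, "linear")  # a: 1 2 3 NA; b: NA 10 20 30
interp_columns(m, "before")  # a: 1 1 3 NA; b: NA 10 10 30
interp_columns(m, "after")   # a: 1 3 3 NA; b: NA 10 30 30
```

With approx's default rule = 1, NAs before the first or after the last known value of a column are left as NA. In this sketch they would still need to be removed, substituted or extrapolated in a separate step.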
The functions are listed by topic.
na.omit | handles NAs, |
removeNA | removes NAs from a matrix object, |
substituteNA | substitutes NAs by zero, the column mean or median, |
interpNA | interpolates NAs using R's "approx" function. |
Missing Values in Price and Index Series:
Applied to timeSeries objects, the function removeNA just removes rows with NAs from the series. To interpolate time series points, one can use the function interpNA. Three different methods of interpolation are offered: "linear" does a linear interpolation, "before" uses the previous value, and "after" uses the following value. Note that the interpolation is done on the index scale and not on the time scale.
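To make the distinction between the index scale and the time scale concrete, the following base R sketch interpolates one missing price both ways. The dates and prices are made up for illustration, and the code calls approx directly rather than any function of the package.

```r
# Illustrative data: three observations with an irregular gap in time.
times  <- as.numeric(as.Date(c("2024-01-01", "2024-01-02", "2024-01-10")))
values <- c(100, NA, 110)

ok  <- !is.na(values)
idx <- seq_along(values)

# Index scale: observations are treated as equally spaced (positions 1, 2, 3).
by_index <- approx(idx[ok], values[ok], xout = idx)$y

# Time scale: the spacing between the actual dates is taken into account.
by_time <- approx(times[ok], values[ok], xout = times)$y

by_index[2]  # 105, halfway between the two neighbours
by_time[2]   # 100 + 10 / 9, since one of the nine days has elapsed
```

For regularly spaced series the two results coincide. They differ only when the gaps between time stamps are uneven.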
Missing Values in Return Series:
For return series the function substituteNA may be useful. The function fills missing values with zeros (type="zeros"), or with the mean (type="mean") or median (type="median") of the respective column.
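The column-wise substitution can be sketched in base R as follows. The helper fill_columns is hypothetical and only mirrors the documented calling convention FUN(coli, na.rm = TRUE). It is not the package's code, and the return values are made up.

```r
# Hypothetical helper (not part of the timeSeries package): replace the NAs in
# each column by FUN applied to the non-NA values of that column.
fill_columns <- function(x, FUN = "mean") {
  FUN <- match.fun(FUN)  # accepts a function or the name of a function
  x <- as.matrix(x)
  for (j in seq_len(ncol(x))) {
    miss <- is.na(x[, j])
    if (any(miss)) x[miss, j] <- FUN(x[, j], na.rm = TRUE)
  }
  x
}

r <- cbind(a = c(0.01, NA, 0.03), b = c(NA, -0.02, 0.04))
fill_columns(r, "mean")                     # NAs become 0.02 (a) and 0.01 (b)
fill_columns(r, median)                     # same here: two known values per column
fill_columns(r, function(x, na.rm) 0)       # substitute zeros
```

Any function with an na.rm argument can be supplied, for example a trimmed mean.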
Note
When dealing with daily data sets, there exists another function, alignDailySeries, which can handle missing data in un-aligned calendrical "timeSeries" objects.
The functions removeNA, substituteNA and interpNA are older implementations. Where possible, use the new function na.omit instead.
Additional remarks by GNB:
removeNA(x) is equivalent to na.omit(x) or na.omit(x, method = "r").
interpNA can be replaced by a call to na.omit with argument method equal to "ir", "iz", or "ie", and argument interp equal to the method argument for interpNA (note that the defaults are not the same).
substituteNA(x, type = "zeros") is equivalent to na.omit(x, method = "z"). For other values of type one can use argument FUN, as in na.omit(x, FUN = "mean").
A final remark: the three deprecated functions are non-generic. removeNA(x) is completely redundant as it simply calls na.omit. The other two, however, may be useful for matrix-like objects. Please inform the maintainer of the package if you use them on objects other than those of class "timeSeries" and wish them kept in the future.
References
Troyanskaya O., Cantor M., Sherlock G., Brown P., Hastie T., Tibshirani R., Botstein D., Altman R.B. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics 17, 520-525.
Examples
X <- matrix(rnorm(100), ncol = 5) # Create a Matrix X
X[3, 5] <- NA # Replace a Single NA Inside
X[17, 2:4] <- c(NA, NA, NA) # Replace Three in a Row Inside
X[13:15, 4] <- c(NA, NA, NA) # Replace Three in a Column Inside
X[11:12, 5] <- c(NA, NA) # Replace Two at the Right Border
X[20, 1] <- NA # Replace One in the Lower Left Corner
X
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1.2188515 -1.51319841 -0.18022159 -0.58244338 1.1745123
#> [2,] -1.2900417 -0.43898655 -0.66582924 -0.74490561 -1.1151520
#> [3,] 0.4222867 -1.32640195 1.31724791 -1.50875006 NA
#> [4,] -0.1030881 -1.19271949 0.13421979 -0.95380354 -0.3747833
#> [5,] 0.5258783 -1.13269775 0.33373548 0.13156962 -1.2167619
#> [6,] 0.4992021 -0.71374675 1.42513695 -0.10488850 -1.6879300
#> [7,] 1.2296179 0.97129225 -0.66687363 -1.29914179 -0.8430539
#> [8,] 0.4359482 0.11081679 -0.15419999 -1.81072734 1.3052824
#> [9,] -0.7221102 1.14315208 0.39575880 0.34617192 0.2354969
#> [10,] 0.8468178 -0.79418025 -0.28903724 0.30310787 0.7766441
#> [11,] 1.9957190 -0.09344266 -1.03946394 0.62199075 NA
#> [12,] 0.8602705 -0.04080370 0.91980757 -0.07572521 NA
#> [13,] -0.1299832 -0.76676070 1.18297818 NA -0.4331491
#> [14,] -0.2294296 2.04819505 -0.06661768 NA 0.7639455
#> [15,] 0.1092238 -0.70144007 0.69121516 NA 0.1272981
#> [16,] 0.7522100 0.65586854 -1.28738212 1.42095840 -2.3297949
#> [17,] 0.9254184 NA NA NA -0.9856772
#> [18,] -0.2917223 1.94466287 -0.68995732 -0.37824285 0.4704501
#> [19,] -0.4709085 0.66014159 -0.53506265 0.07207710 2.2224636
#> [20,] NA 0.37914099 -1.39575652 1.05144407 -0.1124329
Xts <- timeSeries(X) # convert X to timeSeries Xts
## Remove Rows with NAs
na.omit(Xts)
#>
#> SS.1 SS.2 SS.3 SS.4 SS.5
#> [1,] -1.2188515 -1.5131984 -0.1802216 -0.5824434 1.1745123
#> [2,] -1.2900417 -0.4389865 -0.6658292 -0.7449056 -1.1151520
#> [3,] -0.1030881 -1.1927195 0.1342198 -0.9538035 -0.3747833
#> [4,] 0.5258783 -1.1326977 0.3337355 0.1315696 -1.2167619
#> [5,] 0.4992021 -0.7137467 1.4251369 -0.1048885 -1.6879300
#> [6,] 1.2296179 0.9712922 -0.6668736 -1.2991418 -0.8430539
#> [7,] 0.4359482 0.1108168 -0.1542000 -1.8107273 1.3052824
#> [8,] -0.7221102 1.1431521 0.3957588 0.3461719 0.2354969
#> [9,] 0.8468178 -0.7941803 -0.2890372 0.3031079 0.7766441
#> [10,] 0.7522100 0.6558685 -1.2873821 1.4209584 -2.3297949
#> [11,] -0.2917223 1.9446629 -0.6899573 -0.3782428 0.4704501
#> [12,] -0.4709085 0.6601416 -0.5350626 0.0720771 2.2224636
## Substitute NA's with zeros or column means (formerly substituteNA())
na.omit(Xts, method = "z")
#>
#> SS.1 SS.2 SS.3 SS.4 SS.5
#> [1,] -1.2188515 -1.51319841 -0.18022159 -0.58244338 1.1745123
#> [2,] -1.2900417 -0.43898655 -0.66582924 -0.74490561 -1.1151520
#> [3,] 0.4222867 -1.32640195 1.31724791 -1.50875006 0.0000000
#> [4,] -0.1030881 -1.19271949 0.13421979 -0.95380354 -0.3747833
#> [5,] 0.5258783 -1.13269775 0.33373548 0.13156962 -1.2167619
#> [6,] 0.4992021 -0.71374675 1.42513695 -0.10488850 -1.6879300
#> [7,] 1.2296179 0.97129225 -0.66687363 -1.29914179 -0.8430539
#> [8,] 0.4359482 0.11081679 -0.15419999 -1.81072734 1.3052824
#> [9,] -0.7221102 1.14315208 0.39575880 0.34617192 0.2354969
#> [10,] 0.8468178 -0.79418025 -0.28903724 0.30310787 0.7766441
#> [11,] 1.9957190 -0.09344266 -1.03946394 0.62199075 0.0000000
#> [12,] 0.8602705 -0.04080370 0.91980757 -0.07572521 0.0000000
#> [13,] -0.1299832 -0.76676070 1.18297818 0.00000000 -0.4331491
#> [14,] -0.2294296 2.04819505 -0.06661768 0.00000000 0.7639455
#> [15,] 0.1092238 -0.70144007 0.69121516 0.00000000 0.1272981
#> [16,] 0.7522100 0.65586854 -1.28738212 1.42095840 -2.3297949
#> [17,] 0.9254184 0.00000000 0.00000000 0.00000000 -0.9856772
#> [18,] -0.2917223 1.94466287 -0.68995732 -0.37824285 0.4704501
#> [19,] -0.4709085 0.66014159 -0.53506265 0.07207710 2.2224636
#> [20,] 0.0000000 0.37914099 -1.39575652 1.05144407 -0.1124329
na.omit(Xts, FUN = "mean")
#>
#> SS.1 SS.2 SS.3 SS.4 SS.5
#> [1,] -1.2188515 -1.51319841 -0.18022159 -0.58244338 1.1745123
#> [2,] -1.2900417 -0.43898655 -0.66582924 -0.74490561 -1.1151520
#> [3,] 0.4222867 -1.32640195 1.31724791 -1.50875006 -0.1189790
#> [4,] -0.1030881 -1.19271949 0.13421979 -0.95380354 -0.3747833
#> [5,] 0.5258783 -1.13269775 0.33373548 0.13156962 -1.2167619
#> [6,] 0.4992021 -0.71374675 1.42513695 -0.10488850 -1.6879300
#> [7,] 1.2296179 0.97129225 -0.66687363 -1.29914179 -0.8430539
#> [8,] 0.4359482 0.11081679 -0.15419999 -1.81072734 1.3052824
#> [9,] -0.7221102 1.14315208 0.39575880 0.34617192 0.2354969
#> [10,] 0.8468178 -0.79418025 -0.28903724 0.30310787 0.7766441
#> [11,] 1.9957190 -0.09344266 -1.03946394 0.62199075 -0.1189790
#> [12,] 0.8602705 -0.04080370 0.91980757 -0.07572521 -0.1189790
#> [13,] -0.1299832 -0.76676070 1.18297818 -0.21945678 -0.4331491
#> [14,] -0.2294296 2.04819505 -0.06661768 -0.21945678 0.7639455
#> [15,] 0.1092238 -0.70144007 0.69121516 -0.21945678 0.1272981
#> [16,] 0.7522100 0.65586854 -1.28738212 1.42095840 -2.3297949
#> [17,] 0.9254184 -0.04216359 -0.03001590 -0.21945678 -0.9856772
#> [18,] -0.2917223 1.94466287 -0.68995732 -0.37824285 0.4704501
#> [19,] -0.4709085 0.66014159 -0.53506265 0.07207710 2.2224636
#> [20,] 0.2182346 0.37914099 -1.39575652 1.05144407 -0.1124329
na.omit(Xts, FUN = "median")
#>
#> SS.1 SS.2 SS.3 SS.4 SS.5
#> [1,] -1.2188515 -1.51319841 -0.18022159 -0.58244338 1.1745123
#> [2,] -1.2900417 -0.43898655 -0.66582924 -0.74490561 -1.1151520
#> [3,] 0.4222867 -1.32640195 1.31724791 -1.50875006 -0.1124329
#> [4,] -0.1030881 -1.19271949 0.13421979 -0.95380354 -0.3747833
#> [5,] 0.5258783 -1.13269775 0.33373548 0.13156962 -1.2167619
#> [6,] 0.4992021 -0.71374675 1.42513695 -0.10488850 -1.6879300
#> [7,] 1.2296179 0.97129225 -0.66687363 -1.29914179 -0.8430539
#> [8,] 0.4359482 0.11081679 -0.15419999 -1.81072734 1.3052824
#> [9,] -0.7221102 1.14315208 0.39575880 0.34617192 0.2354969
#> [10,] 0.8468178 -0.79418025 -0.28903724 0.30310787 0.7766441
#> [11,] 1.9957190 -0.09344266 -1.03946394 0.62199075 -0.1124329
#> [12,] 0.8602705 -0.04080370 0.91980757 -0.07572521 -0.1124329
#> [13,] -0.1299832 -0.76676070 1.18297818 -0.09030686 -0.4331491
#> [14,] -0.2294296 2.04819505 -0.06661768 -0.09030686 0.7639455
#> [15,] 0.1092238 -0.70144007 0.69121516 -0.09030686 0.1272981
#> [16,] 0.7522100 0.65586854 -1.28738212 1.42095840 -2.3297949
#> [17,] 0.9254184 -0.09344266 -0.15419999 -0.09030686 -0.9856772
#> [18,] -0.2917223 1.94466287 -0.68995732 -0.37824285 0.4704501
#> [19,] -0.4709085 0.66014159 -0.53506265 0.07207710 2.2224636
#> [20,] 0.4222867 0.37914099 -1.39575652 1.05144407 -0.1124329
## Substitute NA's with a trimmed mean
na.omit(Xts, FUN = function(x, na.rm) mean(x, trim = 0.10, na.rm = na.rm))
#>
#> SS.1 SS.2 SS.3 SS.4 SS.5
#> [1,] -1.2188515 -1.51319841 -0.18022159 -0.58244338 1.1745123
#> [2,] -1.2900417 -0.43898655 -0.66582924 -0.74490561 -1.1151520
#> [3,] 0.4222867 -1.32640195 1.31724791 -1.50875006 -0.1276874
#> [4,] -0.1030881 -1.19271949 0.13421979 -0.95380354 -0.3747833
#> [5,] 0.5258783 -1.13269775 0.33373548 0.13156962 -1.2167619
#> [6,] 0.4992021 -0.71374675 1.42513695 -0.10488850 -1.6879300
#> [7,] 1.2296179 0.97129225 -0.66687363 -1.29914179 -0.8430539
#> [8,] 0.4359482 0.11081679 -0.15419999 -1.81072734 1.3052824
#> [9,] -0.7221102 1.14315208 0.39575880 0.34617192 0.2354969
#> [10,] 0.8468178 -0.79418025 -0.28903724 0.30310787 0.7766441
#> [11,] 1.9957190 -0.09344266 -1.03946394 0.62199075 -0.1276874
#> [12,] 0.8602705 -0.04080370 0.91980757 -0.07572521 -0.1276874
#> [13,] -0.1299832 -0.76676070 1.18297818 -0.22296711 -0.4331491
#> [14,] -0.2294296 2.04819505 -0.06661768 -0.22296711 0.7639455
#> [15,] 0.1092238 -0.70144007 0.69121516 -0.22296711 0.1272981
#> [16,] 0.7522100 0.65586854 -1.28738212 1.42095840 -2.3297949
#> [17,] 0.9254184 -0.07859440 -0.03527544 -0.22296711 -0.9856772
#> [18,] -0.2917223 1.94466287 -0.68995732 -0.37824285 0.4704501
#> [19,] -0.4709085 0.66014159 -0.53506265 0.07207710 2.2224636
#> [20,] 0.2023988 0.37914099 -1.39575652 1.05144407 -0.1124329
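## Interpolate NA's Linearly (formerly interpNA())
## Note: X is a plain matrix, so the calls below dispatch to the default
## na.omit method, which ignores 'method' and 'interp' and only drops the
## rows containing NAs (see the "na.action" attribute in the output).
## Pass the "timeSeries" object Xts to apply the interpolation methods.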
na.omit(X, method = "ir", interp = "linear")
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1.2188515 -1.5131984 -0.1802216 -0.5824434 1.1745123
#> [2,] -1.2900417 -0.4389865 -0.6658292 -0.7449056 -1.1151520
#> [3,] -0.1030881 -1.1927195 0.1342198 -0.9538035 -0.3747833
#> [4,] 0.5258783 -1.1326977 0.3337355 0.1315696 -1.2167619
#> [5,] 0.4992021 -0.7137467 1.4251369 -0.1048885 -1.6879300
#> [6,] 1.2296179 0.9712922 -0.6668736 -1.2991418 -0.8430539
#> [7,] 0.4359482 0.1108168 -0.1542000 -1.8107273 1.3052824
#> [8,] -0.7221102 1.1431521 0.3957588 0.3461719 0.2354969
#> [9,] 0.8468178 -0.7941803 -0.2890372 0.3031079 0.7766441
#> [10,] 0.7522100 0.6558685 -1.2873821 1.4209584 -2.3297949
#> [11,] -0.2917223 1.9446629 -0.6899573 -0.3782428 0.4704501
#> [12,] -0.4709085 0.6601416 -0.5350626 0.0720771 2.2224636
#> attr(,"na.action")
#> [1] 20 17 13 14 15 3 11 12
#> attr(,"class")
#> [1] "omit"
na.omit(X, method = "iz", interp = "linear")
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1.2188515 -1.5131984 -0.1802216 -0.5824434 1.1745123
#> [2,] -1.2900417 -0.4389865 -0.6658292 -0.7449056 -1.1151520
#> [3,] -0.1030881 -1.1927195 0.1342198 -0.9538035 -0.3747833
#> [4,] 0.5258783 -1.1326977 0.3337355 0.1315696 -1.2167619
#> [5,] 0.4992021 -0.7137467 1.4251369 -0.1048885 -1.6879300
#> [6,] 1.2296179 0.9712922 -0.6668736 -1.2991418 -0.8430539
#> [7,] 0.4359482 0.1108168 -0.1542000 -1.8107273 1.3052824
#> [8,] -0.7221102 1.1431521 0.3957588 0.3461719 0.2354969
#> [9,] 0.8468178 -0.7941803 -0.2890372 0.3031079 0.7766441
#> [10,] 0.7522100 0.6558685 -1.2873821 1.4209584 -2.3297949
#> [11,] -0.2917223 1.9446629 -0.6899573 -0.3782428 0.4704501
#> [12,] -0.4709085 0.6601416 -0.5350626 0.0720771 2.2224636
#> attr(,"na.action")
#> [1] 20 17 13 14 15 3 11 12
#> attr(,"class")
#> [1] "omit"
na.omit(X, method = "ie", interp = "linear")
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1.2188515 -1.5131984 -0.1802216 -0.5824434 1.1745123
#> [2,] -1.2900417 -0.4389865 -0.6658292 -0.7449056 -1.1151520
#> [3,] -0.1030881 -1.1927195 0.1342198 -0.9538035 -0.3747833
#> [4,] 0.5258783 -1.1326977 0.3337355 0.1315696 -1.2167619
#> [5,] 0.4992021 -0.7137467 1.4251369 -0.1048885 -1.6879300
#> [6,] 1.2296179 0.9712922 -0.6668736 -1.2991418 -0.8430539
#> [7,] 0.4359482 0.1108168 -0.1542000 -1.8107273 1.3052824
#> [8,] -0.7221102 1.1431521 0.3957588 0.3461719 0.2354969
#> [9,] 0.8468178 -0.7941803 -0.2890372 0.3031079 0.7766441
#> [10,] 0.7522100 0.6558685 -1.2873821 1.4209584 -2.3297949
#> [11,] -0.2917223 1.9446629 -0.6899573 -0.3782428 0.4704501
#> [12,] -0.4709085 0.6601416 -0.5350626 0.0720771 2.2224636
#> attr(,"na.action")
#> [1] 20 17 13 14 15 3 11 12
#> attr(,"class")
#> [1] "omit"
## Take Previous Values in a Column
na.omit(X, method = "ir", interp = "before")
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1.2188515 -1.5131984 -0.1802216 -0.5824434 1.1745123
#> [2,] -1.2900417 -0.4389865 -0.6658292 -0.7449056 -1.1151520
#> [3,] -0.1030881 -1.1927195 0.1342198 -0.9538035 -0.3747833
#> [4,] 0.5258783 -1.1326977 0.3337355 0.1315696 -1.2167619
#> [5,] 0.4992021 -0.7137467 1.4251369 -0.1048885 -1.6879300
#> [6,] 1.2296179 0.9712922 -0.6668736 -1.2991418 -0.8430539
#> [7,] 0.4359482 0.1108168 -0.1542000 -1.8107273 1.3052824
#> [8,] -0.7221102 1.1431521 0.3957588 0.3461719 0.2354969
#> [9,] 0.8468178 -0.7941803 -0.2890372 0.3031079 0.7766441
#> [10,] 0.7522100 0.6558685 -1.2873821 1.4209584 -2.3297949
#> [11,] -0.2917223 1.9446629 -0.6899573 -0.3782428 0.4704501
#> [12,] -0.4709085 0.6601416 -0.5350626 0.0720771 2.2224636
#> attr(,"na.action")
#> [1] 20 17 13 14 15 3 11 12
#> attr(,"class")
#> [1] "omit"
na.omit(X, method = "iz", interp = "before")
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1.2188515 -1.5131984 -0.1802216 -0.5824434 1.1745123
#> [2,] -1.2900417 -0.4389865 -0.6658292 -0.7449056 -1.1151520
#> [3,] -0.1030881 -1.1927195 0.1342198 -0.9538035 -0.3747833
#> [4,] 0.5258783 -1.1326977 0.3337355 0.1315696 -1.2167619
#> [5,] 0.4992021 -0.7137467 1.4251369 -0.1048885 -1.6879300
#> [6,] 1.2296179 0.9712922 -0.6668736 -1.2991418 -0.8430539
#> [7,] 0.4359482 0.1108168 -0.1542000 -1.8107273 1.3052824
#> [8,] -0.7221102 1.1431521 0.3957588 0.3461719 0.2354969
#> [9,] 0.8468178 -0.7941803 -0.2890372 0.3031079 0.7766441
#> [10,] 0.7522100 0.6558685 -1.2873821 1.4209584 -2.3297949
#> [11,] -0.2917223 1.9446629 -0.6899573 -0.3782428 0.4704501
#> [12,] -0.4709085 0.6601416 -0.5350626 0.0720771 2.2224636
#> attr(,"na.action")
#> [1] 20 17 13 14 15 3 11 12
#> attr(,"class")
#> [1] "omit"
na.omit(X, method = "ie", interp = "before")
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] -1.2188515 -1.5131984 -0.1802216 -0.5824434 1.1745123
#> [2,] -1.2900417 -0.4389865 -0.6658292 -0.7449056 -1.1151520
#> [3,] -0.1030881 -1.1927195 0.1342198 -0.9538035 -0.3747833
#> [4,] 0.5258783 -1.1326977 0.3337355 0.1315696 -1.2167619
#> [5,] 0.4992021 -0.7137467 1.4251369 -0.1048885 -1.6879300
#> [6,] 1.2296179 0.9712922 -0.6668736 -1.2991418 -0.8430539
#> [7,] 0.4359482 0.1108168 -0.1542000 -1.8107273 1.3052824
#> [8,] -0.7221102 1.1431521 0.3957588 0.3461719 0.2354969
#> [9,] 0.8468178 -0.7941803 -0.2890372 0.3031079 0.7766441
#> [10,] 0.7522100 0.6558685 -1.2873821 1.4209584 -2.3297949
#> [11,] -0.2917223 1.9446629 -0.6899573 -0.3782428 0.4704501
#> [12,] -0.4709085 0.6601416 -0.5350626 0.0720771 2.2224636
#> attr(,"na.action")
#> [1] 20 17 13 14 15 3 11 12
#> attr(,"class")
#> [1] "omit"
## examples with X (which is a matrix, not "timeSeries")
## (these examples are not run automatically as these functions are
## deprecated.)
if(FALSE){
## Remove Rows with NAs -
removeNA(X)
## Substitute NA's by Zeros or Column Means -
substituteNA(X, type = "zeros")
substituteNA(X, type = "mean")
## Interpolate NA's Linearly -
interpNA(X, method = "linear")
# Note the corner missing value cannot be interpolated!
## Take Previous Values in a Column -
interpNA(X, method = "before")
# Also here, the corner value is excluded
}