Interpolating Multivariate time series

Andrew on 29 Apr 2011
Hi all,
I'm trying to test a multivariate time series dataset that has 2536 instances and 73 attributes, with missing values (represented by ?) in some rows. I looked for ways to interpolate the time series, but all the examples I can find handle only 2-3 attributes.
Can someone help me interpolate this dataset? The dataset is in .data format.
Andrew

Answers (3)

Andrew on 29 Apr 2011
To be clear, the dataset looks something like this:
1/1/1998,0.8,1.8,2.4,2.1,10330,-55,0,0.
1/2/1998,2.8,3.2,3.3,2.7,10275,-55,0,0.
. . .
1/5/1998,2.6,2.1,1.6,1.4,?,?,?,0.58,0.
. . .
1/22/1998,2.8,3.6,?,?,4.6,10090,-40,0,0.
  4 Comments
Andrew on 29 Apr 2011
@Oleg
Not really... all the rows have the same number of columns, with 73 attributes.
This is the dataset I'm talking about:
http://archive.ics.uci.edu/ml/machine-learning-databases/ozone/onehr.data
It has 75 columns in total: 1 date + 73 attributes + 1 result column that says whether it's an ozone day or not.
Andrew on 29 Apr 2011
@andrei
I'm not sure how to use TriScatteredInterp. Would you mind helping with code that does the interpolation and saves the filled-in values to the .data file? I need to use that data to test the algorithm.
Thanks



Richard Willey on 29 Apr 2011
Handling missing data is a very complicated topic.
There are a number of different approaches you can use, including listwise deletion, substitution models, multiple imputation, and so on. Each approach has its own advantages and disadvantages.
For example, an approach based on substitution (regression substitution, interpolation, what have you) will give you a complete data set to work with; however, this new data set is going to be biased. (As a simple example, suppose that you use a regression substitution model to estimate plausible values for your missing data points. Later on, you fit a regression model to your [complete] data set and report an R^2...)
Alternatively, an approach based on listwise deletion won't [necessarily] run into the same problems with bias; however, you will have issues with loss of statistical power.
I took a quick look at the data set in question. Two observations:
1. You are missing large blocks of data. This is going to cause some real problems for interpolation-based techniques.
2. Your data doesn't appear to be Missing Completely at Random, or even Missing at Random.
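One rough way to check the first observation on your own data is to measure, per column, how much is missing and how long the longest run of consecutive missing values is. The sketch below assumes the data is already in a numeric matrix with NaN marking missing entries; the matrix X and its values are made up for illustration.

```matlab
% Made-up example matrix: rows are time steps, columns are attributes,
% NaN marks a missing value.
X = [1   NaN 5;
     NaN NaN 6;
     NaN 3   7;
     4   NaN 8];

% Fraction of missing values in each column.
fracMissing = mean(isnan(X), 1);

% Longest run of consecutive NaNs in each column.
longestGap = zeros(1, size(X, 2));
for j = 1:size(X, 2)
    m = [0; isnan(X(:, j)); 0];   % pad so runs at the edges are detected
    d = diff(m);
    starts = find(d == 1);        % positions where a NaN run begins
    stops  = find(d == -1);       % positions just past where it ends
    if ~isempty(starts)
        longestGap(j) = max(stops - starts);
    end
end

disp(fracMissing)
disp(longestGap)
```

Columns with long gaps are the ones where interpolated values are least trustworthy.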
Personally, I would start with listwise deletion...
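As a minimal sketch of listwise deletion, assuming the data is held in a numeric matrix with NaN for missing entries (the matrix X below is a made-up example, not the ozone data):

```matlab
% Made-up example: observations in rows, attributes in columns,
% NaN marks a missing value.
X = [0.8 1.8 10330;
     2.8 NaN 10275;
     2.6 2.1 NaN;
     2.8 3.6 10090];

completeRows = ~any(isnan(X), 2);    % true where a row has no NaN
Xcomplete    = X(completeRows, :);   % keep only fully observed rows

fprintf('Kept %d of %d rows\n', size(Xcomplete, 1), size(X, 1));
```

The ratio of kept rows to total rows gives a quick sense of how much statistical power is being given up.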

Andrew on 30 Apr 2011
I guess I can't delete the missing values.
How do I interpolate them with interp1? Can I use it to interpolate the dataset above?
I read in the MATLAB documentation that yi = interp1(Y,xi) assumes that x = 1:N, where N is the length of Y for vector Y, or size(Y,1) for matrix Y, and that
yi = interp1(x,Y,xi,method) interpolates using alternative methods.
But then, how does it know which dataset to use? When I load the dataset using "load onehr.data", it says unknown value '?'...
Can someone help me?
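A possible way around the load error is to read the file with textscan and tell it to treat '?' as an empty numeric field, which becomes NaN; interp1 can then be applied one column at a time. The sketch below is not a tested solution for onehr.data. It uses a small inline sample with made-up values in the same layout, and it assumes that the rows are equally spaced in time and that the last column is the class label.

```matlab
% Small inline sample in the same layout as the thread's data: date first,
% then numeric fields, '?' for missing. The values are made up.
raw = sprintf('%s\n', ...
    '1/1/1998,0.8,10330,0', ...
    '1/2/1998,?,10275,0', ...
    '1/3/1998,2.6,?,0', ...
    '1/4/1998,3.0,?,0', ...
    '1/5/1998,3.4,10090,1');

% Build the format from the first line: %s for the date, %f for the rest.
firstLine = strtok(raw, sprintf('\n'));
nCols = numel(strfind(firstLine, ',')) + 1;
fmt = ['%s' repmat('%f', 1, nCols - 1)];

% 'TreatAsEmpty' makes textscan read '?' as an empty numeric field (NaN).
% For the real file, pass a file identifier from fopen instead of raw.
C = textscan(raw, fmt, 'Delimiter', ',', 'TreatAsEmpty', '?', ...
             'CollectOutput', true);
X = C{2}(:, 1:end-1);        % numeric attributes; last column is the label

% Assumption: rows are equally spaced in time, so the row index is the x-axis.
t  = (1:size(X, 1))';
Xi = X;
for j = 1:size(X, 2)
    known = ~isnan(X(:, j));
    if nnz(known) >= 2       % interp1 needs at least two known points
        Xi(~known, j) = interp1(t(known), X(known, j), t(~known), 'linear');
    end
end

disp(Xi)
```

With the 'linear' method, interp1 does not extrapolate by default, so missing values before the first or after the last known sample in a column stay NaN. Long gaps are filled with a straight line between their endpoints, which may not be meaningful for this data.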
