Using a time-stamp to find the median for every hour

Hello,
I am working on a project that is examining the consumption of oxygen within the brain of neonatal children with congenital heart disease. We pull data from a variety of sources that sample at different frequencies. Every entry from each of these sources comes with a time-stamp. There are between 3 and 8 .asc files for each patient containing around 20,000 rows of data. Currently I have multiple .m files that find the median based on the number of rows that would exist within each hour (i.e., if the sampling occurs every 10 seconds, it counts 360 rows). However, this is not perfect, since during more frequent sampling (such as every second) there are missing data points. Additionally, between files there are missing time points when the monitor was not connected for various reasons.
Ideally, I would like a script that can combine multiple files into one large data-set for each patient. It would then use the time-stamps to return the median for every hour for 3 separate variables (VO2, VCO2 and SvO2) within this data. Ideally the returned values would be displayed in rows, one per hour, each with its corresponding medians.
If it is not clear from this post, I have practically no experience at all with MATLAB. I am in contact with some people who understand it much better than I do, but they are all currently very busy and I need this to move forward. If anybody has an idea of how I could make this work, please let me know. I would greatly appreciate it! You will be directly helping in research that may save children's lives in the future!

1 Comment

I think that you increase your chances here if you
  • supply a couple of sample data files (attached via the paper-clip button)
  • make a rather detailed outline of the result file (I assume some kind of text file)
  • describe how meta-data should be found and presented
  • describe how the data files can be found (avoid mixing of files from different children)
  • etc.

Answers (2)

If you could show a snippet of one of the data files I might be able to give some more specific guidance, but I'd suggest:
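  • f = dir('*.asc'); to get the files to read (cellstr(ls('*.asc')) only splits the names correctly on Windows, where ls returns a character array with one row per file)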
  • loop over the files in f
  • readtable to read in each file, concatenate to the current data (could be a bit slow, but no way around that unless you know how many rows you have in each file a priori)
  • sortrows to sort by the time stamp
  • calculate an hour variable -- something like this: hour = floor(timestamp/3600) (assuming that the time stamp is just a linear time in seconds)
  • use accumarray or grpstats (if you have Statistics TB) to calculate the median, using the hour as the index/grouping variable: grpstats(data,hour,@median)
Depending on how the files are split, you may be able to skip some of the initial steps. Or do different things. Are the different files different times? Different things being measured (VO2 vs VCO2)?
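The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested solution for the real files: it assumes each .asc file is delimited text with a header row, and the column names Time (seconds), VO2, VCO2 and SvO2 are hypothetical. It writes two tiny dummy files first so that it runs on its own, and it uses accumarray so the Statistics Toolbox is not required.

```matlab
% --- Dummy files so the sketch runs on its own (replace with real data) ---
names  = {'Time','VO2','VCO2','SvO2'};   % hypothetical column names
folder = tempname;  mkdir(folder);
writetable(table([0;10;3600], [5;7;9], [4;4;6], [60;62;70], ...
    'VariableNames', names), fullfile(folder,'part1.asc'), 'FileType','text');
writetable(table([3610;7200], [11;20], [8;9], [72;65], ...
    'VariableNames', names), fullfile(folder,'part2.asc'), 'FileType','text');

% --- Read every .asc file in the folder and stack the rows ---
files = dir(fullfile(folder, '*.asc'));
parts = cell(numel(files), 1);
for k = 1:numel(files)
    % 'FileType','text' is needed because .asc is not a known extension
    t = readtable(fullfile(folder, files(k).name), 'FileType', 'text');
    parts{k} = t(:, names);
end
data = sortrows(vertcat(parts{:}), 'Time');

% --- Group by hour (assumes Time is a linear time in seconds) ---
hourIdx = floor(data.Time / 3600);
[hours, ~, g] = unique(hourIdx);         % g maps each row to its hour group

% --- One median per hour and per variable ---
hourly = table(hours, ...
    accumarray(g, data.VO2,  [], @median), ...
    accumarray(g, data.VCO2, [], @median), ...
    accumarray(g, data.SvO2, [], @median), ...
    'VariableNames', {'Hour','VO2','VCO2','SvO2'});
disp(hourly)
```

Reading all parts into a cell array and concatenating once at the end avoids growing the table inside the loop. Hours with no rows at all simply do not appear in the result.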

5 Comments

Hi Matt, Thanks for your help so far! The files are in chronological order for each individual patient already. They have different start and end times based on their course within the hospital. The time is currently stored as hh:mm:ss. Each of the files has all 3 variables (some time points are marked as -378219 as there was no recording made at that time). How can I securely share a data file? Please let me know if you have any other questions!
"securely share a data file": I think that only some short sample files with dummy data are needed. It's the format that is important. In the end you must test with real data.
A couple of important details: are the times duration since some reference point or actual time of day? And, depending on that, how do you want the grouping by hours done (again, as an elapsed time or time of day)? Also, how does the time split from one file to another? (This depends on what the time represents.) This is important because it determines whether you can process the files one at a time or whether they have to be taken together. If it's time of day, for example, then multiple files can have the same time, so they need to be taken together.
One last thing: if you can't actually share the data, can you show even just a line or two to show the formatting? (You can scramble the actual values.) But it's important to know how the data is stored -- delimited text? What's the delimiter? Are there header lines? etc.
The time stamps represent the actual time of day, spanning over multiple days. So there would be multiple data-points with the same time-stamp. We would want a median for every hour over the duration of the monitoring (which as I mentioned before changes based on each patient).
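Because the stamps are time of day and repeat on later days, the hour index has to include a day counter. One possible sketch, assuming the rows are already in chronological order (as stated for these files) and that a backwards jump in the clock means a new day has started; the variable names and values are dummy data:

```matlab
% Dummy hh:mm:ss stamps that cross midnight once, with dummy readings.
stamps = {'22:59:50'; '23:00:10'; '23:59:59'; '00:00:05'; '01:30:00'};
vo2    = [10; 12; 14; 20; 30];

% Convert each hh:mm:ss string to seconds since midnight.
parts    = cellfun(@(s) sscanf(s, '%d:%d:%d'), stamps, 'UniformOutput', false);
hms      = [parts{:}]';                 % N-by-3 matrix: hours, minutes, seconds
secOfDay = hms * [3600; 60; 1];

% ASSUMPTION: rows are chronological, so a drop in time of day = next day.
dayNum = cumsum([0; diff(secOfDay) < 0]);

% Clock hour counted from midnight of the first day of monitoring.
hourIdx = dayNum * 24 + floor(secOfDay / 3600);

% Median per hour.
[hours, ~, g] = unique(hourIdx);
vo2Median = accumarray(g, vo2, [], @median);
```

This cannot detect a gap in which the monitor resumes on a later day at a later clock time than it stopped (for example off at 10:00, back on at 11:00 the next day), or any gap longer than 24 hours. If the files carry a date anywhere, using it would be more reliable.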
I do not know how to edit the .asc file so I copied and changed the data into an excel file, however, it appears the exact same way on each file for the patient. The headers will be at the top for each one of the files that belongs to each individual patient.
The variables of interest are located in columns: BL, ED and EE. The negative numbers that are seen represent a missing value. In this example that does not happen for the variables that we are interested in, but for other patients that is a possibility.
One additional factor that it would be incredible if you could account for: if the median value for VO2 occurs at a time point where the RQ value is less than 0.8 or greater than 1.3, then that value is invalid.
Let me know if a certain part of this did not make sense. Thanks again for all of the help!
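A possible way to handle the missing values and the RQ condition is sketched below. It rests on one interpretation that should be checked against the study protocol: VO2 readings taken while RQ is below 0.8 or above 1.3 are dropped before the median is computed. The variable names and numbers are dummy data, and rq is assumed to be available as its own column.

```matlab
% Dummy data: hour index per row, VO2 readings, and the RQ at the same row.
hourIdx = [0; 0; 0; 1; 1; 1];
vo2     = [100; 110; -378219; 90; 95; 200];
rq      = [0.9; 1.0; 0.9;     0.7; 1.1; 1.2];

% Negative numbers (e.g. -378219) mark "no recording": turn them into NaN.
vo2(vo2 < 0) = NaN;

% ASSUMED reading of the rule: a VO2 value is invalid when RQ is out of range.
% A missing (negative) RQ also falls below 0.8 and so invalidates the row.
vo2(rq < 0.8 | rq > 1.3) = NaN;

% Median per hour over the remaining valid values only.
[hours, ~, g] = unique(hourIdx);
vo2Median = accumarray(g, vo2, [], @(x) median(x(~isnan(x))));
```

An hour in which every value is missing or invalid comes back as NaN, which keeps the row in the output while flagging that no valid median exists.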
Two questions regarding the names of the files
  • is it possible to retrieve an ID of the patient from the name of the file or the name of the folder?
  • is it possible to retrieve the order in time of the files from their names?
PS. A blank line is needed to separate paragraphs

The hourly median of ~10 second data can be found easily if you have time in MATLAB's datenum format. Use downsample_ts (a File Exchange submission, not a built-in function):
If V02_10 and t_10 are your VO2 readings and the corresponding time vector,
V02_hourly_medians = downsample_ts(V02_10,t_10,'median','hour');
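If downsample_ts is not installed, a comparable hourly median can be sketched with built-in functions only. This is not the implementation of downsample_ts, just an illustration of the same idea; the readings below are dummy data and hourStart is a name made up for the example.

```matlab
% Dummy data: two hours of 10-second samples in datenum format (days).
t_10   = datenum(2014, 11, 5) + (0:719)' * 10 / 86400;
V02_10 = [(1:360)'; 1000 + (1:360)'];

% Round to whole seconds first so that floating-point error in the datenum
% cannot push a sample across an hour boundary.
secs    = round(t_10 * 86400);
hourIdx = floor(secs / 3600);

% Median of all samples that fall in the same clock hour.
[hours, ~, g] = unique(hourIdx);
V02_hourly_medians = accumarray(g, V02_10, [], @median);
hourStart = hours / 24;            % start of each hour, back in datenum units
```

datestr(hourStart) turns the hour starts back into readable dates for a results table.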

Asked: 5 Nov 2014
Edited: 5 Nov 2014
