Why is the CSV date column imported as NaN?

Respected colleagues,
I have imported data from a CSV file (Johns Hopkins).
The date column headers (from the 5th column to the end) are imported as NaN. Why?
The date format is MM/DD/YY.
My code is below (MATLAB R2020b):
==============================================================
url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv';
fileinf= 'time_series_covid19_confirmed_global.csv';
websave(fileinf,url);
dataset = readtable('time_series_covid19_confirmed_global.csv');
=============================================================
Your assistance in this regard is appreciated. Regards

Accepted Answer

Walter Roberson on 28 Oct 2020
dataset = readtable('time_series_covid19_confirmed_global.csv', 'ReadVariableNames', true, 'PreserveVariableNames', true);
  2 Comments
Clement on 28 Oct 2020
Edited: Clement on 28 Oct 2020
Thank you for replying.
Would it be possible to get the first row into the table as data, instead of having it treated as a header (as below)?
Province/State, Country/Region, Lat, Long, 1/22/20, ... etc
Once more, thank you for your assistance.
Walter Roberson on 28 Oct 2020
If that row is to be considered data, then you have a few choices:
  • do two different readtable operations, one saying to read the first row and the other to read the others
  • use readtable for all the rows, but force columns after the 2nd to be numeric, with the date entries on the first row coming out as NaN because they are not numeric
  • use readtable for all the rows, but force columns after the 2nd to be datetime, with the numeric entries on the remaining rows coming out as NaT (Not A Time) because they are not dates
  • use readcell() so that you can get a mix of data types, with everything coming out as cell and it being necessary to convert the cells in the second and following rows into numeric form for practical use
  • use textscan() with a format for the first row and a count of 1, then textscan() with a format for the remaining rows and no count
  • use fscanf() with %s formats and a count to read the first line, and then use fscanf() with mixed format and patch together the outputs because fscanf does strange things when you ask to scan numeric and string in the same call
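As a minimal sketch of the readcell() option from the list above: the example below writes a tiny stand-in file with the same column layout as the Johns Hopkins CSV (the file contents and the names tmpfile, raw, header, body and counts are made up for illustration), reads everything as cells, and then converts the count columns back to numeric. The 'DatetimeType','text' and 'NumHeaderLines',0 settings are assumptions made here so that the date labels stay as text and the first line is returned as data.

```matlab
% Hypothetical miniature file with the same layout as the real CSV
tmpfile = [tempname '.csv'];
fid = fopen(tmpfile, 'w');
fprintf(fid, 'Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n');
fprintf(fid, ',Afghanistan,33.93911,67.709953,0,0\n');
fprintf(fid, 'Hubei,China,30.9756,112.2707,444,444\n');
fclose(fid);

% Read every line, including the first, into a cell array of mixed types.
% Dates are kept as text so the header labels are not converted.
raw = readcell(tmpfile, 'NumHeaderLines', 0, 'DatetimeType', 'text');

header = raw(1, :);       % first line of the file, as a row of cells
body   = raw(2:end, :);   % remaining lines, still as cells

% The count columns (5th onward) hold only numbers, so they can be
% converted to an ordinary numeric matrix for practical use.
counts = cell2mat(body(:, 5:end));

delete(tmpfile);
```

Empty fields (such as a blank Province/State) come back as missing values inside the cell array, so text columns may need their own clean-up before use.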
I would suggest to you that by far the easiest way to proceed would be to do the readtable I showed above and then
first_row = dataset.Properties.VariableNames;
If the goal is to create a table object in which you have that line as a table row and the other rows as well, then you will need to use the cell approach, as the only time table variables can have different datatypes on different rows is if the variable is set up for cell.
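If the date labels are wanted as actual dates rather than text, one possible follow-up to the VariableNames approach is sketched below. It assumes that the first four columns are not dates and that the labels follow the M/d/yy pattern; the stand-in file and the names tmpfile and dates are made up for illustration.

```matlab
% Hypothetical miniature file with the same layout as the real CSV
tmpfile = [tempname '.csv'];
fid = fopen(tmpfile, 'w');
fprintf(fid, 'Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n');
fprintf(fid, ',Afghanistan,33.93911,67.709953,0,0\n');
fprintf(fid, 'Hubei,China,30.9756,112.2707,444,444\n');
fclose(fid);

% Keep the header text exactly as written in the file
dataset = readtable(tmpfile, 'ReadVariableNames', true, ...
    'PreserveVariableNames', true);
delete(tmpfile);

% The header line is available as the table's variable names
first_row = dataset.Properties.VariableNames;

% Convert the date labels (5th column onward) from text to datetime
dates = datetime(first_row(5:end), 'InputFormat', 'M/d/yy');
```

The numeric counts stay in the table as ordinary numeric variables, and dates lines up with columns 5 to end, which avoids mixing data types within a single table variable.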

Release

R2020b
