Why does Readtable return NaN for values greater than 1000?

Question

0 votes

I was working with a dataset that contains the annual GDP of all countries in billion USD. However, when importing the dataset with readtable, the corresponding table omits all data points that are greater than 1000.

fileName2 = 'GDP by Country 1999-2022.csv';
GDP = readtable(fileName2);

The following is the original CSV data (see Canada)

Whereas the table GDP that readtable produces is as below

Is there a reason for this happening?

4 Comments
Show 2 older comments Hide 2 older comments

Stephen23 on 27 Nov 2023

"the authors used double-quotes for any number greater than 1000"

Ugh.

"Any suggestions on how to convert these strings to double while importing?"

Upload a sample data file by clicking the paperclip button.

Kevin on 27 Nov 2023

GDP by Country 1999-2022.csv

Here you go!

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Stephen23 on 27 Nov 2023

Edited: Stephen23 on 27 Nov 2023

Open in MATLAB Online

1 vote

GDP by Country 1999-2022.csv

Ugh, double-quoting number values really is an abuse of CSV norms. Note that is not possible to simply remove/ignore the double quotes globally (i.e. throughout the file), because

they are also significant in the first column of text, e.g. "Bahamas, The"
the thousands separator is the same as the field delimiter character.

So we need a way of removing/ignoring them only from numeric values... and to know if they are numeric requires removing them first before testing if they are numeric so that the quotes can be removed... catch 22.

Ugh! The only reason someone would go out of their way to make that file more complex to parse (i.e. by adding superfluous quotes) is if they want to make everyone else's lives harder. Avoid the person who designed that file.

One approach to importing that very unfortunate datafile would be to leverage STR2DOUBLE, which by default will accept the "," thousands delimiter. We just need to merge its output with the (already) numeric data:

C = readcell('GDP by Country 1999-2022.csv');
M = str2double(C);
X = isnan(M);
X(:,1) = false;
M(X) = [C{X}] % all data are now numeric (just ignore the first column)
M = 181×24
1.0e+04 *

       NaN    0.1999    0.2000    0.2001    0.2002    0.2003    0.2004    0.2005    0.2006    0.2007    0.2008    0.2009    0.2010    0.2012    0.2013    0.2014    0.2015    0.2016    0.2017    0.2018    0.2019    0.2020    0.2021    0.2022
       NaN         0         0         0    0.0004    0.0005    0.0006    0.0007    0.0008    0.0010    0.0012    0.0013    0.0015    0.0019    0.0022    0.0024         0         0         0         0         0         0         0         0
       NaN    0.0003    0.0004    0.0004    0.0004    0.0006    0.0007    0.0008    0.0009    0.0010    0.0011    0.0012    0.0012    0.0014    0.0015    0.0016    0.0012    0.0012    0.0013    0.0014    0.0016    0.0017    0.0017    0.0018
       NaN    0.0049    0.0055    0.0055    0.0057    0.0068    0.0085    0.0102    0.0114    0.0116    0.0127    0.0135    0.0155    0.0178    0.0190    0.0203    0.0175    0.0182    0.0192    0.0202    0.0211    0.0219    0.0164    0.0168
       NaN    0.0006    0.0009    0.0009    0.0011    0.0014    0.0020    0.0031    0.0044    0.0055    0.0068    0.0070    0.0088    0.0118    0.0136    0.0151    0.0102    0.0099    0.0105    0.0113    0.0119    0.0127    0.0070    0.0075
       NaN    0.0001    0.0001    0.0001    0.0001    0.0001    0.0001    0.0001    0.0001    0.0001    0.0001    0.0001    0.0001    0.0001    0.0001    0.0001    0.0001    0.0001    0.0001    0.0001    0.0002    0.0002    0.0001    0.0002
       NaN    0.0284    0.0284    0.0269    0.0098    0.0128    0.0152    0.0182    0.0213    0.0247    0.0278    0.0301    0.0296    0.0316    0.0329    0.0348    0.0579    0.0579    0.0590    0.0603    0.0617    0.0634    0.0455    0.0484
       NaN    0.0002    0.0002    0.0002    0.0002    0.0003    0.0004    0.0005    0.0006    0.0008    0.0009    0.0009    0.0008    0.0009    0.0010    0.0011    0.0011    0.0011    0.0011    0.0012    0.0012    0.0013    0.0014    0.0015
       NaN    0.0402    0.0390    0.0368    0.0413    0.0528    0.0639    0.0712    0.0755    0.0822    0.0854    0.0920    0.0983    0.1040    0.1079    0.1126    0.1241    0.1253    0.1317    0.1382    0.1442    0.1516    0.1611    0.1677
       NaN    0.0213    0.0194    0.0193    0.0209    0.0256    0.0293    0.0305    0.0322    0.0349    0.0365    0.0374    0.0389    0.0407    0.0418    0.0431    0.0373    0.0386    0.0403    0.0418    0.0434    0.0453    0.0481    0.0520
M(24,2:end) % check the first row with double-quoted numbers
ans = 1×23
1.0e+03 *

    0.5869    0.6443    0.5544    0.5057    0.5522    0.6636    0.8820    1.0677    1.1777    1.2516    1.4815    1.7243    1.9243    2.0362    2.1557    1.7996    1.6729    1.7593    1.8547    1.9537    2.0544    1.6458    1.8106

And if you really want it in a table:

T = array2table(M(2:end,2:end), 'VariableNames',string(M(1,2:end)), 'RowNames',C(2:end,1))
T = 180×23 table
                            1999      2000      2001      2002      2003      2004      2005      2006      2007      2008      2009      2010      2012      2013      2014      2015      2016      2017      2018      2019      2020      2021      2022 
                           ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______    ______

    Afghanistan                 0         0         0     4.084     4.585     5.971     7.309     8.399     9.892    11.513    13.318    15.406    19.248    21.555    24.304         0         0         0         0         0         0         0         0
    Albania                 3.444     3.695     4.096     4.456       5.6     7.452     8.376     9.133    10.163    11.131    11.726    11.797    13.808     14.91    16.053    11.591    12.204    13.214    14.341    15.553    16.996     16.77    18.012
    Algeria                48.845    54.749    55.181    57.053    68.013    85.016    102.38    114.32    116.16    126.89     134.8    154.84    177.83    190.43    203.45    175.08    181.71    192.26    202.18    210.91    219.16    163.81    168.19
    Angola                  6.153     9.135     8.936    11.386    13.956      19.8    30.632    43.759     55.37    67.608    69.708    87.718    118.43    136.41    151.09    102.01    98.815    105.37    112.53     119.4    127.15    70.339    74.953
    Antigua and Barbuda     0.652     0.678      0.71     0.718     0.754     0.818     0.875     0.962     1.026     1.074      1.18     1.196     1.322     1.404     1.494     1.285     1.328     1.386     1.458     1.536     1.617     1.405     1.534
    Argentina              283.52     284.2     268.7    97.732    127.64    151.96    181.55     212.7     247.1    277.91    301.33    296.11    315.83    329.33    347.73    578.71    578.69    589.59    602.87    617.39    633.89    455.17    483.76
    Armenia                 1.845     1.912     2.118     2.376     2.807     3.573     4.903      6.41     7.666     8.579     8.683     8.238     9.086     9.844    10.821    10.607    10.672    11.074     11.63    12.273    12.939    13.612     15.06
    Australia              401.78    390.02    368.19     412.9    527.59    639.36    712.44    754.82    822.09    854.07    920.01    982.83    1039.7    1078.9    1125.8    1240.8      1253    1317.2    1381.5    1442.5    1515.7    1610.6    1677.5
    Austria                213.39    194.41    193.34    208.57    255.84    293.19    305.34    321.93    348.66    364.58    374.42       389    406.52    417.77    430.61    372.61    386.23    402.79     418.4    433.75    452.66    481.21    520.34
    Azerbaijan              4.581     5.273     5.708     6.236     7.276     8.682    12.561    19.817    28.699    40.229    42.505    51.916    58.795    62.867    65.776    63.983    63.602    70.927    74.266    83.996    87.276    52.645    54.725
    Bahamas, The            4.704     5.004     5.131     5.389     5.503     5.661      5.87     6.223     6.624     7.026     7.403     7.409     7.786      8.02     8.287         0         0         0         0         0         0         0         0
    Bahrain                 6.617     7.966     7.927     8.446     9.734     11.18    13.378    16.065    17.505    19.673    19.361    21.567    24.231    25.814    27.165    30.914    32.335    34.318    36.334    38.446    40.522    39.104    41.057
    Bangladesh             46.529    47.048    47.194     49.56    54.476     59.12     61.28    65.216    71.443    77.943    92.121    99.088    114.72    123.45    133.18    202.33    218.99    238.72    260.87    285.33    310.86    355.69    390.61
    Barbados                2.478     2.559     2.554     2.476     2.695     2.817     3.061     3.386     3.751     3.922     3.595      3.63      4.03      4.23     4.445     4.451     4.537     4.652     4.805     4.999     5.222     4.648     5.142
    Belarus                12.104    12.758    12.421    14.654    17.823    23.141    30.131    36.944    38.185    37.323    49.043    53.137    69.146    79.416    91.501     62.02     66.37    70.997    76.023    82.437    89.315    65.754    70.632
    Belgium                253.89    232.93    231.93    252.66    310.51    359.15    372.73    393.59    423.54    441.12    461.49    480.15       506    520.03    535.08    458.65    473.52    492.06    511.49    531.19     556.2    581.85    619.16
T{'Brazil',:} % check the first row with double-quoted numbers
ans = 1×23
1.0e+03 *

    0.5869    0.6443    0.5544    0.5057    0.5522    0.6636    0.8820    1.0677    1.1777    1.2516    1.4815    1.7243    1.9243    2.0362    2.1557    1.7996    1.6729    1.7593    1.8547    1.9537    2.0544    1.6458    1.8106

1 Comment
Show -1 older comments Hide -1 older comments

Kevin on 27 Nov 2023

Thank you so much Stephen!

Sign in to comment.

Answer 2

Walter Roberson on 27 Nov 2023

Open in MATLAB Online

0 votes

GDP = readtable(fileName2, "ThousandsSeparator",",");

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Why does Readtable return NaN for values greater than 1000?

4 Comments
Show 2 older comments Hide 2 older comments

Accepted Answer

1 Comment
Show -1 older comments Hide -1 older comments

More Answers (1)

0 Comments
Show -2 older comments Hide -2 older comments

Categories

Tags

Community Treasure Hunt

Why does Readtable return NaN for values greater than 1000?

4 Comments Show 2 older comments Hide 2 older comments

Accepted Answer

1 Comment Show -1 older comments Hide -1 older comments

More Answers (1)

0 Comments Show -2 older comments Hide -2 older comments

Categories

Tags

See Also

Community Treasure Hunt

4 Comments
Show 2 older comments Hide 2 older comments

1 Comment
Show -1 older comments Hide -1 older comments

0 Comments
Show -2 older comments Hide -2 older comments