Handling Undesirable Characters in Numeric Columns When Reading a CSV File

Dear all,
I need to read a .csv file which has some numeric columns that contain undesirable characters, so when I import it using readtable, the lines with these characters ( [u.]) are showed as NAN. Is it possible to remove it before? Or maybe read it as char, then remove and convert to double?
Here some lines of the file:

3 Comments

Please share the csv file you are working with. Use the paperclip button to attach.
Perhaps you could define square brackets as a comment.
In any case, please upload a sample data file by clicking the paperclip button.

Sign in to comment.

 Accepted Answer

You should NOT make changes to the original data file.
fnm = 'p1.csv';
opt = detectImportOptions(fnm, 'FileType','delimitedtext',...
'Delimiter',';', 'DecimalSeparator',',', 'VariableNamingRule','preserve');
opt = setvartype(opt,'double');
opt = setvartype(opt,'MoveStop','datetime');
opt = setvartype(opt,'Operador','string');
idx = strcmp('double',opt.VariableTypes);
opt = setvaropts(opt,idx,'TrimNonNumeric',true);
tbl = readtable(fnm,opt)
tbl = 3389×19 table
MoveStop Duração [s] Modo Direção Tamanho Peso [t] Peso Esquerdo 1 [t] Peso Direito 1 [t] Peso Esquerdo 2 [t] Peso Direito 2 [t] Transl Pos Inicial [mm] Carro Pos Inicial [mm] Elev Pos Inicial [mm] Transl Pos Final [mm] Carro Pos Final [mm] Elev Pos Final [mm] Número_OS Operador Move Status ____________________ ___________ ____ _______ _______ ________ ___________________ __________________ ___________________ __________________ _______________________ ______________________ _____________________ _____________________ ____________________ ___________________ _________ _________ ___________ 01-Oct-2023 05:25:57 328 1 32 32 37 18 19 0 0 73496 3286.9 4866.2 1.6556e+05 5418.2 10321 0 "lcosta" 328 01-Oct-2023 05:30:57 262 1 0 40 3 -2 5 0 0 1.6556e+05 5420.9 94.5 1.5267e+05 10489 8663.6 0 "lcosta" 262 01-Oct-2023 05:37:58 291 1 0 40 2 1 1 0 0 1.5267e+05 10748 93.88 73397 2735.3 6752.4 0 "Jborges" 291 01-Oct-2023 05:43:41 191 1 32 40 45 21 24 0 0 73398 2569.3 6604 1.5291e+05 10936 8771.6 0 "Jborges" 191 01-Oct-2023 05:47:40 206 1 0 40 2 1 2 0 0 1.5291e+05 10936 94.88 73833 11400 6766.8 0 "Jborges" 206 01-Oct-2023 05:52:57 197 1 32 40 45 22 24 0 0 73833 11358 6605.4 1.5276e+05 11357 8774.2 0 "Jborges" 197 01-Oct-2023 06:00:46 440 1 0 32 2 1 2 0 0 1.5276e+05 11356 94.13 32782 3252.8 4939.9 0 "Jborges" 440 01-Oct-2023 06:06:00 302 1 32 32 37 18 19 0 0 32781 3253.7 4749.9 1.6562e+05 7309 10294 0 "Jborges" 302 01-Oct-2023 06:16:12 556 1 0 40 3 -3 6 0 0 1.6562e+05 7305.4 95.5 74087 20369 7048.1 0 "Jborges" 556 01-Oct-2023 06:26:37 590 1 32 40 45 23 22 0 0 74087 20367 6639.9 1.5332e+05 10992 8760.6 0 "Jborges" 590 01-Oct-2023 06:49:39 860 1 0 40 2 1 1 0 0 1.5332e+05 10992 96.13 63814 19000 2607 0 "Giba" 860 01-Oct-2023 06:56:38 396 1 32 40 45 22 24 0 0 63814 19000 2445.6 1.5356e+05 11426 8684.8 0 "Giba" 396 01-Oct-2023 07:00:33 211 1 0 40 2 1 1 0 0 1.5356e+05 11425 94.5 63662 3824.2 4518.4 0 "Giba" 211 01-Oct-2023 07:04:27 212 1 32 40 45 21 24 0 0 63662 3825.1 4356.5 1.5315e+05 11151 8623.2 0 "Giba" 212 01-Oct-2023 07:08:34 212 1 0 40 2 1 1 0 0 1.5315e+05 11150 99.88 63873 19470 4531.9 0 "Giba" 212 01-Oct-2023 07:22:08 693 1 32 40 45 23 23 0 0 63873 19719 4344.2 1.6813e+05 2547.8 10187 0 "Giba" 693
tbl = convertvars(tbl,@isstring,@(s)regexprep(s,'\s+\[.*','')) % optional
tbl = 3389×19 table
MoveStop Duração [s] Modo Direção Tamanho Peso [t] Peso Esquerdo 1 [t] Peso Direito 1 [t] Peso Esquerdo 2 [t] Peso Direito 2 [t] Transl Pos Inicial [mm] Carro Pos Inicial [mm] Elev Pos Inicial [mm] Transl Pos Final [mm] Carro Pos Final [mm] Elev Pos Final [mm] Número_OS Operador Move Status ____________________ ___________ ____ _______ _______ ________ ___________________ __________________ ___________________ __________________ _______________________ ______________________ _____________________ _____________________ ____________________ ___________________ _________ _________ ___________ 01-Oct-2023 05:25:57 328 1 32 32 37 18 19 0 0 73496 3286.9 4866.2 1.6556e+05 5418.2 10321 0 "lcosta" 328 01-Oct-2023 05:30:57 262 1 0 40 3 -2 5 0 0 1.6556e+05 5420.9 94.5 1.5267e+05 10489 8663.6 0 "lcosta" 262 01-Oct-2023 05:37:58 291 1 0 40 2 1 1 0 0 1.5267e+05 10748 93.88 73397 2735.3 6752.4 0 "Jborges" 291 01-Oct-2023 05:43:41 191 1 32 40 45 21 24 0 0 73398 2569.3 6604 1.5291e+05 10936 8771.6 0 "Jborges" 191 01-Oct-2023 05:47:40 206 1 0 40 2 1 2 0 0 1.5291e+05 10936 94.88 73833 11400 6766.8 0 "Jborges" 206 01-Oct-2023 05:52:57 197 1 32 40 45 22 24 0 0 73833 11358 6605.4 1.5276e+05 11357 8774.2 0 "Jborges" 197 01-Oct-2023 06:00:46 440 1 0 32 2 1 2 0 0 1.5276e+05 11356 94.13 32782 3252.8 4939.9 0 "Jborges" 440 01-Oct-2023 06:06:00 302 1 32 32 37 18 19 0 0 32781 3253.7 4749.9 1.6562e+05 7309 10294 0 "Jborges" 302 01-Oct-2023 06:16:12 556 1 0 40 3 -3 6 0 0 1.6562e+05 7305.4 95.5 74087 20369 7048.1 0 "Jborges" 556 01-Oct-2023 06:26:37 590 1 32 40 45 23 22 0 0 74087 20367 6639.9 1.5332e+05 10992 8760.6 0 "Jborges" 590 01-Oct-2023 06:49:39 860 1 0 40 2 1 1 0 0 1.5332e+05 10992 96.13 63814 19000 2607 0 "Giba" 860 01-Oct-2023 06:56:38 396 1 32 40 45 22 24 0 0 63814 19000 2445.6 1.5356e+05 11426 8684.8 0 "Giba" 396 01-Oct-2023 07:00:33 211 1 0 40 2 1 1 0 0 1.5356e+05 11425 94.5 63662 3824.2 4518.4 0 "Giba" 211 01-Oct-2023 07:04:27 212 1 32 40 45 21 24 0 0 63662 3825.1 4356.5 1.5315e+05 11151 8623.2 0 "Giba" 212 01-Oct-2023 07:08:34 212 1 0 40 2 1 1 0 0 1.5315e+05 11150 99.88 63873 19470 4531.9 0 "Giba" 212 01-Oct-2023 07:22:08 693 1 32 40 45 23 23 0 0 63873 19719 4344.2 1.6813e+05 2547.8 10187 0 "Giba" 693
tbl(180:190,:) % checking the rows where [u.] first appears:
ans = 11×19 table
MoveStop Duração [s] Modo Direção Tamanho Peso [t] Peso Esquerdo 1 [t] Peso Direito 1 [t] Peso Esquerdo 2 [t] Peso Direito 2 [t] Transl Pos Inicial [mm] Carro Pos Inicial [mm] Elev Pos Inicial [mm] Transl Pos Final [mm] Carro Pos Final [mm] Elev Pos Final [mm] Número_OS Operador Move Status ____________________ ___________ ____ _______ _______ ________ ___________________ __________________ ___________________ __________________ _______________________ ______________________ _____________________ _____________________ ____________________ ___________________ __________ _________ ___________ 02-Oct-2023 02:04:27 3854 2 0 40 2 1 1 0 0 63764 11557 93.63 24777 11689 8665.8 2.1574e+05 "Jborges" 3854 02-Oct-2023 02:08:16 199 2 16 40 45 23 22 0 0 24777 11694 8377.2 63538 3200.8 10340 2.1574e+05 "Jborges" 199 02-Oct-2023 02:11:45 165 2 0 40 2 1 1 0 0 63537 3203.5 94.38 45901 11403 6518.2 0 "Jborges" 165 02-Oct-2023 02:26:48 0 2 16 40 2 1 1 0 0 1.4162e+05 11402 32.5 24373 2823.2 8547.4 2.1576e+05 "Jborges" 2.8323e-317 02-Oct-2023 02:31:22 204 2 16 40 45 21 25 0 0 24374 2819.6 8440.5 64000 20097 8528.5 2.1576e+05 "Jborges" 204 02-Oct-2023 02:36:50 0 2 0 40 2 1 1 0 0 64000 20095 94.63 25073 20399 10339 2.1577e+05 "Jborges" 2.8323e-317 02-Oct-2023 02:40:31 150 2 16 40 45 23 22 0 0 25073 20398 10172 63747 11602 8764.4 2.1577e+05 "Jborges" 150 02-Oct-2023 02:44:49 0 2 0 40 2 1 1 0 0 63747 11603 37.75 24722 11622 10342 2.1577e+05 "Jborges" 2.8323e-317 02-Oct-2023 02:48:56 164 2 16 40 45 23 22 0 0 24722 11625 10238 63458 3208.9 8733.1 2.1577e+05 "Jborges" 164 02-Oct-2023 02:53:25 0 2 0 40 2 1 1 0 0 63795 3208 27.75 24403 2856.4 10354 2.1578e+05 "Jborges" 2.8325e-317 02-Oct-2023 02:58:08 0 2 16 40 46 22 23 0 0 24403 2856.4 10200 64062 20212 6817.1 2.1578e+05 "Jborges" 2.8323e-317

More Answers (1)

If you have Microsoft Excel (or Google Sheets) you can use the Replace Tool to get rid of the [u.] in all of the columns. Simply open your CSV file in Excel, choose the "Replace" tool (I think the shortcut in Windows is Ctrl+H). A dialog box should appear. In the "Find what:" box type [u.] and leave the "Replace with:" box blank. Then click the "Replace All" button. This should get rid of all instances of [u.] in your spreadsheet. Just save the CSV and then you can load it into MATLAB as normal.
Alternatively, you could probably also do this in MATLAB by converting the numeric columns into strings and then using the strrep function:
% Read table data
T = readtable('table_name.csv')
% Extract the numeric columns in the table
numeric_data = T{:,2:10};
% Convert to strings
numeric_strings = string(numeric_data);
% Delete [u.]
numeric_strings = strrep(numeric_strings, '[u.]','');
% Convert back to numeric data
numeric_data = str2num(numeric_strings);
% Insert back into the table
T{:,2:10} = numeric_data;
Try either of these and let me know if they work.

5 Comments

It works using excel, but by using your matlab code didn't
Removing the characters from the csv file itself might result in unwanted/undesired changes in the format of the file.
The handling from readtable() needs refinement -
For some columns, the data (with '[u.]' appended) is read as NaN, whereas for others, it is read as a cell array of a character vector (slide right to see).
y = readtable('p1.csv');
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
%data corresponding to the 4th row in the image attached by OP
dt = datetime(2023,10,2,2,26,48);
dt = datetime
02-Oct-2023 02:26:48
y(y.MoveStop==dt,:)
ans = 1×19 table
MoveStop Dura__o_s_ Modo Dire__o Tamanho Peso_t_ PesoEsquerdo1_t_ PesoDireito1_t_ PesoEsquerdo2_t_ PesoDireito2_t_ TranslPosInicial_mm_ CarroPosInicial_mm_ ElevPosInicial_mm_ TranslPosFinal_mm_ CarroPosFinal_mm_ ElevPosFinal_mm_ N_mero_OS Operador MoveStatus ____________________ __________ ____ _______ _______ _______ ________________ _______________ ________________ _______________ ____________________ ___________________ __________________ __________________ _________________ ________________ _________ ________________ ______________________________ 02-Oct-2023 02:26:48 NaN NaN NaN NaN NaN NaN NaN NaN NaN {'141623,14 [u.]'} {'11401,99 [u.]'} {'32,50 [u.]'} {'24373,29 [u.]'} {'2823,17 [u.]'} {'8547,38 [u.]'} NaN {'Jborges [u.]'} {'2,83234989053995e-317 [u.]'}
Exactly.
I think I'll have to remove it using another tool before import.
Anyway thanks for your time guys.
@Geovane Gomes, I did not see that you had shared your CSV file before I submitted my original answer. But I just I did the Excel procedure with the file and it seems to work perfectly. When I finished, I saved the file as a tab-delimited text file rather than a CSV so that the commas in your numbers aren't confused as delimiters. You can load the file in MATLAB by typing:
T = readtable('p1_tab_delimited.txt','Delimiter','tab');
"I think I'll have to remove it using another tool before import."
Not required, you can use the import options.

Sign in to comment.

Categories

Products

Release

R2023b

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!