textscan doesn't stop at blank space in txt file

Question

Josh Tome on 20 Oct 2022


https://in.mathworks.com/matlabcentral/answers/1831643-textscan-doesn-t-stop-at-blank-space-in-txt-file

Edited: dpb on 22 Oct 2022

walking_01.txt

Hi, I'm trying to import data from a txt file using the textscan function. I thought it was supposed to stop at the first blank space it sees, but it seems to be grabbing data beyond the blank space. My Group1 should stop at the first blank space before "Events", but it also includes "Events", "100", and "Subject".

I'm using the following code so far:

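[file_list, path_n] = uigetfile('*.txt','Select the Files to Process','MultiSelect','on');
if iscell(file_list), file_list = file_list{1}; end   % several files selected: use the first one here
fidi = fopen(fullfile(path_n, file_list));            % open the file from the folder it was picked in
Group1 = textscan(fidi, '%s %s %s %f %s %s','HeaderLines',3, 'Delimiter','\t');
fclose(fidi);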

The txt data file (walking_01.txt) is attached.

4 Comments

dpb on 21 Oct 2022

Edited: dpb on 21 Oct 2022


"...the blank space doesn't match the "%s" specifier (or so I believe),"

Well, that isn't a correct assumption either; a blank is a valid character like any other. However, unless told otherwise with the optional 'Whitespace' name-value parameter, blanks are considered whitespace and are either ignored or treated as delimiters, except in quoted strings, where they are significant.

Again, the Algorithms section of the textscan documentation states:

"When matching data to a text conversion specifier, textscan reads until it finds a delimiter or an end-of-line character."

But the format spec was '%s %s %s %f %s %s', which gets reapplied over and over until it either fails or reaches the end of the file. In this case it found the %s fields and a numeric value it could convert, but then the following records fail.
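The reapplication is easy to see on a small scale. The sketch below assumes a made-up, in-memory excerpt laid out roughly like walking_01.txt (not the real file contents) and a shortened three-field format; it relies on textscan accepting a character vector in place of a file identifier.

```matlab
% Made-up excerpt: two tab-delimited data rows, a blank line, then the start
% of the next section (assumed layout, not the real walking_01.txt contents).
txt = sprintf(['PluginGait\tLeft\t116.39\n' ...
               'PluginGait\tRight\t1.3038\n' ...
               '\n' ...
               'Events\n' ...
               '100\n' ...
               'Subject\tContext\tName\n']);

% The format '%s %s %f' is applied record after record. Nothing in it says
% "stop at an empty line", so scanning continues into the next section and
% only ends when %f meets text it cannot convert to a number.
C = textscan(txt, '%s %s %f', 'Delimiter', '\t');

% Show what was captured: 'Events' from the next section is among the text fields.
celldisp(C)
```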

Another alternative, when such sections are known to be in the file, is to keep parsing with textscan: just accept the error, resynch the file pointer to the next expected record, and then carry on with the format string for the next section. This can be tricky if the file doesn't have fixed-length records, as in this example; fgetl will get to the next end-of-line, but depending on the file content that may not include all of the next record to be scanned, and backing up to the previous end of record isn't easily supported in stream files. In this particular file, however, the failure occurs in the header line, so that approach works, and you can then read the second group from the same open file with textscan as

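fidi = fopen(fullfile(path_n, file_list));             % same file as in the question
fmt=[repmat('%s',1,3) '%f' repmat('%s',1,2)];          % first group: 3 text, 1 numeric, 2 text fields
G1=textscan(fidi,fmt,'HeaderLines',3,'Delimiter','\t','collectoutput',1);
fmt=[repmat('%s',1,3) '%f' repmat('%s',1,1)];          % Events group: 3 text, 1 numeric, 1 text field
fgetl(fidi);         % resynch to BOL (beginning of line) after the line where G1 stopped
G2=textscan(fidi,fmt,'Delimiter','\t','collectoutput',1);
fclose(fidi);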

Personally, I'd still opt for higher-level parsing tools rather than then having to turn the above into something useful.

Walter Roberson on 21 Oct 2022

All textscan formats other than %c and %[] skip leading whitespace as defined by the Whitespace option (or the default list of whitespace characters if no option was passed). And %c is perfectly happy to read a space.

If you need a space to be rejected, then you have two possibilities:

- pass a Whitespace option that does not include space; or
- use %[^ ], keeping in mind that it will happily gobble a number and return it as a character vector.
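Both possibilities can be tried on a single in-memory line. This is a sketch with made-up sample text; it contrasts the default Whitespace setting with an empty one, and shows %[^ ] returning digits as a character vector.

```matlab
% Made-up line: leading blanks, a name containing a blank, a tab, then a number.
str = sprintf('  Walking Speed\t1.3038');

% Default Whitespace: the leading blanks are skipped. Because the delimiter is
% a tab, the embedded blank does not end the %s field.
a = textscan(str, '%s%f', 'Delimiter', '\t');

% Empty Whitespace option: blanks are no longer skipped, so they stay in the text.
b = textscan(str, '%s%f', 'Delimiter', '\t', 'Whitespace', '');

% %[^ ] reads up to the first blank without skipping leading whitespace; note that
% it returns the number as a character vector, not as a double. N = 1 reads once.
d = textscan('12.5 rest', '%[^ ]', 1);

fprintf('[%s] [%s] [%s]\n', a{1}{1}, b{1}{1}, d{1}{1});
```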


Answer 1

dpb on 20 Oct 2022


https://in.mathworks.com/matlabcentral/answers/1831643-textscan-doesn-t-stop-at-blank-space-in-txt-file#answer_1080038


fn=websave('walking_01.txt','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1163318/walking_01.txt');  % download the attachment once
opt=detectImportOptions(fn, ...
            'numheaderlines',2, ...
            'readvariablenames',1, ...
            'delimiter','\t', ...
            'expectednumvariables',6, ...
            'missingrule','fill');
opt.VariableTypes(1)={'char'};               % read Subject as text so the section marker can be found
tG=readtable(fn,opt);
ix=find(contains(tG.Subject,'Events'),1);    % first row belonging to the Events section
tG=tG(1:ix-1,:);                             % keep only the Gait Cycle rows
[head(tG);tail(tG)]
ans = 16×6 table
       Subject         Context               Name                Value         Units              Description       
    ______________    _________    _________________________    _______    _____________    ________________________

    {'PluginGait'}    {'Left' }    {'Cadence'              }     116.39    {'steps/min'}    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Walking Speed'        }     1.3038    {'m/s'      }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Stride Time'          }      1.031    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Step Time'            }      0.551    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Opposite Foot Off'    }     12.609    {'%'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Opposite Foot Contact'}     46.557    {'%'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Foot Off'             }     62.076    {'%'        }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'Single Support'       }       0.35    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Single Support'       }      0.391    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Double Support'       }      0.309    {'s'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Stride Length'        }     1.3652    {'m'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Step Length'          }      0.651    {'m'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Step Width'           }    0.19855    {'m'        }    {0×0 char              }
    {'PluginGait'}    {'Right'}    {'Limp Index'           }     1.0249    {0×0 char   }    {0×0 char              }
    {'PluginGait'}    {'Left' }    {'GDI'                  }     75.701    {0×0 char   }    {'Gait Deviation Index'}
    {'PluginGait'}    {'Right'}    {'GDI'                  }     72.639    {0×0 char   }    {'Gait Deviation Index'}

Got to thinking: each of the first two sections would make a great table, and each can be imported directly on its own. Unfortunately, readtable can't read from memory, but I thought it worth showing an import options object and what it can do.

"In anger" (as my old Scottish power plant testing engineer friend used to say), I'd still probably first read the file in toto, use that to find the sections, and then parse them.

The first two sections are pretty easy; I'm not so sure about the "Devices" section. The "Moment" section also looks OK, although it appears empty in this dataset.
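Reading the file in toto and then splitting it into sections could look like the sketch below. It assumes a hypothetical, shortened two-section sample written to a temporary file (substitute the real path), treats blank lines as the section boundaries the question expected textscan to honor, and then parses one section from memory with textscan, since readtable needs a file. The column layout and the header row being line 3 of the block are assumptions about this sample, not guarantees for the real file.

```matlab
% Hypothetical two-section sample written to a temporary file so the sketch runs
% on its own; in practice fn would be the path to walking_01.txt.
fn = [tempname '.txt'];
fid = fopen(fn, 'w');
fprintf(fid, 'Gait Cycle Parameters\n100\nSubject\tContext\tName\tValue\tUnits\n');
fprintf(fid, 'PluginGait\tLeft\tCadence\t116.39\tsteps/min\n');
fprintf(fid, 'PluginGait\tLeft\tWalking Speed\t1.3038\tm/s\n\n');
fprintf(fid, 'Events\n100\nSubject\tContext\tName\tTime (s)\tDescription\n');
fclose(fid);

% Read the whole file in toto, one string element per line.
L = splitlines(string(fileread(fn)));
delete(fn);

% Blank lines separate the sections: collect each run of non-blank lines as one block.
isBlank = strlength(strtrim(L)) == 0;
edges = [0; find(isBlank); numel(L) + 1];
blocks = {};
for k = 1:numel(edges) - 1
    b = L(edges(k) + 1 : edges(k + 1) - 1);
    if ~isempty(b)
        blocks{end + 1} = b; %#ok<AGROW>
    end
end

% Parse the first block from memory: line 3 holds the column names (in this
% sample) and the data rows follow. textscan accepts a character vector.
g = blocks{1};
names = cellstr(split(g(3), sprintf('\t')))';
body = strjoin(cellstr(g(4:end))', newline);
C = textscan(body, '%s%s%s%f%s', 'Delimiter', '\t');
T = table(C{:}, 'VariableNames', names);
disp(T)
```

The other blocks can be handled the same way, each with the format that matches its own columns.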

1 Comment

dpb on 20 Oct 2022

Edited: dpb on 22 Oct 2022


SECTIONS={'Gait Cycle','Events','Devices'};
F=readlines(websave('walking_01.txt','https://www.mathworks.com/matlabcentral/answers/uploaded_files/1163318/walking_01.txt'));
ix=find(startsWith(F,SECTIONS))
ix = 3×1
     1
    33
    51

This gives the starting line of each section, which can be used either to parse the sections in memory or to limit the ranges that readtable reads from the file itself.
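Once the section starts are known, each section runs from its start to the line before the next start (the last one to the end of the file). A sketch, assuming a hypothetical string array F in place of the readlines output, with placeholder rows rather than the real data:

```matlab
% Hypothetical stand-in for F = readlines(...): section titles follow the SECTIONS
% list above, but the rows are placeholders rather than the real data.
SECTIONS = {'Gait Cycle','Events','Devices'};
F = ["Gait Cycle Parameters"; "100"; "row 1"; "row 2"; ""; ...
     "Events"; "100"; "row 3"; ""; ...
     "Devices"; "100"; "row 4"];

ix = find(startsWith(F, SECTIONS));      % first line of each section
stops = [ix(2:end) - 1; numel(F)];       % last line of each section

secs = cell(numel(ix), 1);
for k = 1:numel(ix)
    s = F(ix(k):stops(k));
    secs{k} = s(strlength(s) > 0);       % drop the blank separator line(s)
    fprintf('%s: %d lines\n', secs{k}(1), numel(secs{k}));
end
```

The same start/stop pairs could instead bound the DataLines property of an import options object when reading the file itself with readtable.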
