Ignoring header/footer in textfile question
Hello,
For the past week I've been trying to open multiple text files at once that have different headers/footers, ignoring all headers/footers and extracting just the data, without knowing in advance what the headers/footers are. The only thing I know is that the headers/footers always start with a char and form a string.
All headers/footers start with a char, examples:
File 1:
Line 1 of file - Samplerate : 100000
Line 2 of file - Bitspersample: 12
Rest of lines - data(2000 samples,floats)
File 2:
Line 1 of file - Bitspersample: 32
Line 2 of file - Normalized: FALSE
Lines 3-2500 - data(2500 samples,floats)
Line 2501 of file - Channel: A
Is there a way to ignore all lines of a text file that start with a char/string?
Answers (1)
Walter Roberson
on 29 Jan 2020
fileread() the file.
regexprep() with pattern '^\s*[^0-9+.-].*$' and replacement '' (the empty string), using the 'lineanchors' and 'dotexceptnewline' options. This will zap the content of lines whose first non-whitespace character is not a digit or + or - or period. If your data never has a leading + on the numbers then do not include the + in the pattern. If your data never has numbers that start with a period without a leading 0 then do not include the period in the pattern. This is the question of whether a number like .5 can occur or whether it would always be written 0.5.
In the case where your data never has a leading + or - or period, then instead of the pattern I showed you can use '^\s*\D.*$'
After the regexprep, textscan() the string.
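The three steps above could be sketched roughly as follows for several files. This is only a sketch under assumptions: the file names are hypothetical, each file is assumed to hold one column of floats between its header and footer lines, and 'dotexceptnewline' is included so that .* stops at the end of each line.

```matlab
% Hypothetical file names; replace with your own list (e.g. built from dir()).
files = {'file1.txt', 'file2.txt'};
data  = cell(size(files));

for k = 1:numel(files)
    % Step 1: read the whole file into one character vector.
    str = fileread(files{k});

    % Step 2: empty out every line whose first non-whitespace character
    % is not a digit, +, - or period (so it cannot start a number).
    cleaned = regexprep(str, '^\s*[^0-9+.-].*$', '', ...
        'lineanchors', 'dotexceptnewline');

    % Step 3: parse what is left as floating-point values.
    % Assumes a single column of numbers per file.
    c = textscan(cleaned, '%f');
    data{k} = c{1};
end
```

Each data{k} should then hold the samples of the k-th file as a column vector. If your files have more than one number per line, the '%f' format would need to be adjusted.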
2 Comments
Walter Roberson
on 29 Jan 2020
regexprep(str, '^\s*[^0-9+-].*$', '', 'lineanchors', 'dotexceptnewline')
[] means any one character chosen from the list inside the [], except when the first thing inside the [] is ^, in which case it means any one character that is NOT one of the listed ones. So the construct matches any one character that is NOT 0123456789 or + or - . In short, you are looking for lines in which the first nonblank character is something that cannot possibly be forming a number.
The .* after that, with the dotexceptnewline option, matches to the end of the same line. When you find such a line you replace it with emptiness (but without removing the newline character itself), so you get an empty line in place of any line that starts with a non-number.
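To see the effect without touching a file, you could try the call on a small made-up string. The sample text below is invented for illustration and only loosely follows the File 2 layout from the question.

```matlab
% Made-up sample: two header lines, two data lines, one footer line.
str = sprintf('Bitspersample: 32\nNormalized: FALSE\n0.25\n-1.5\nChannel: A\n');

% Header and footer lines are replaced by '' while their newlines stay,
% so they turn into empty lines; the numeric lines are not matched.
out = regexprep(str, '^\s*[^0-9+-].*$', '', 'lineanchors', 'dotexceptnewline');

% Show the result; the numeric lines should be the only non-empty ones.
disp(out)
```

Note that this variant of the pattern has no period inside the [], so a data line written like .5 would also be blanked. Add the period back if your data can start that way.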