Loading standard format .dat files into Matlab for further processing
Dear MathWorks community,
I have a big collection of .dat files that I wish to process in Matlab and read specific information from, and potentially build a GUI around later on.
All the .dat files are in the same format, as they have been generated by the same device. A copy of an example is attached below. We can observe that the .dat files contain multiple sets of data - all these individual chunks have a constant, specific form (X or Y number of columns, the corresponding columns always represent the same data, a certain item number always relates to the same description, etc.).
Column-wise and title-wise the data is always the same; the only thing that changes is the number of rows, i.e. the amount of data itself (items on the list).
The import tool does not seem to be the best option, as the data set is large and I would rather use a script. I have tried readtable, but also without success. One other option I tried was to convert the .dat file to .csv first with specified delimiters and only then use readtable, but I could not get the file to convert properly. I assume this is because the data is not delimited with a single delimiter.
The goal is to load the .dat file into Matlab so that the titles and column data are placed into appropriate cells within a Matlab spreadsheet. Then I could search that document for a specific item number and read the description or any other value from the same row.
I do not know what else to try and would kindly like to ask for help.
Example of the .dat file content:
****************
* TITLE *
****************
Age: 55
Name: John Doe
Date: Fri Dec 12 1995 11:48:19 GMT+0000 (GMT)
Directory: /people/names.count/results
*** Note no. 1 ***
Item # Description Ag(Z) Ax(Z) Ay(Z) Bx(W) By(W)
-------- ---------------------------- --------- ---------- --------- --------- ----------
0000001 Left Front 2.539E+03 -3.835E+01 2.512E+03 3.649E+02 -5.248E-02
0020000 Right Front 2.514E+03 9.694E+01 2.512E+03 -2.778E+01 -5.722E-02
3000000 Left Back 2.496E+03 -5.440E+01 2.469E+03 3.591E+02 -5.263E-02
0000004 Right Back 2.476E+03 1.144E+02 2.469E+03 -1.308E+02 -5.846E-02
5000000 Head 2.631E+02 -3.062E+00 -8.223E+00 -2.630E+02 1.346E+02
0006000 Torso 2.138E+02 9.390E+00 -8.390E+00 2.134E+02 2.117E+02
*** Note no. 2 ***
Item # Description Dg(mm) Dx(mm) Dy(mm) Dz(mm)
-------- ---------------------------- --------- ---------- --------- ---------
7000000 Pen 3.00 -0.00 -6.60 2.00
0008000 Box 4.00 -8.00 -9.00 4.00
9999999 Scissors 8.00 -0.00 -4.00 0.00
5000000 Bottle 5.00 -6.00 -0.00 1.00
7 Comments
dpb
on 15 Jun 2020
Attach an actual file for folks to play with -- I just {CODE} formatted yours to make it somewhat legible, but we can't know that's accurate and not a figment of the display...
It's a pretty simple file format -- I don't have a link handy to the specific thread, but there are at least 3 or 4 Answers I've posted (one not terribly long ago) that illustrate parsing a file by looking for key phrases to locate file sections -- that's the ticket here -- the "Item #" header is unique to the two sections and it's very simple after locating them. You could then call readtable twice, skipping the header lines of each section, but that's extra overhead and the format to read is pretty easy -- cells will come in easily for the Descriptions and the others will convert numerically. The one hassle to deal with will be the first section's Description field, which contains blank characters in string data without, it appears, any delimiter other than blanks. It could be worthwhile to read the whole file as char() and then insert tabs in the proper places to have a delimiter...otherwise, selecting the right records and then using a fixed-width import object would be the alternative.
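A minimal sketch of the key-phrase approach described above, assuming the file looks like the posted example once whitespace is collapsed. The sample lines are copied from the question, the variable names are only illustrative, and the split between Description and the numeric columns relies on an assumption: the number of numeric columns equals the number of names after "Description" on the header line.

```matlab
% Sample lines copied from the question. For a real file, something like
%   lines = string(splitlines(fileread('example.dat')));  % placeholder name
% should give the same kind of string array.
lines = [
    "*** Note no. 1 ***"
    "Item # Description Ag(Z) Ax(Z) Ay(Z) Bx(W) By(W)"
    "-------- ---------------------------- --------- ---------- --------- --------- ----------"
    "0000001 Left Front 2.539E+03 -3.835E+01 2.512E+03 3.649E+02 -5.248E-02"
    "0020000 Right Front 2.514E+03 9.694E+01 2.512E+03 -2.778E+01 -5.722E-02"
    "5000000 Head 2.631E+02 -3.062E+00 -8.223E+00 -2.630E+02 1.346E+02"
    "*** Note no. 2 ***"
    "Item # Description Dg(mm) Dx(mm) Dy(mm) Dz(mm)"
    "-------- ---------------------------- --------- ---------- --------- ---------"
    "7000000 Pen 3.00 -0.00 -6.60 2.00"
    "0008000 Box 4.00 -8.00 -9.00 4.00"
    ];
lines = strtrim(lines);

hdrIdx  = find(startsWith(lines, "Item #"));    % one header line per section
noteIdx = find(startsWith(lines, "*** Note"));  % banner that opens a section
sections = cell(numel(hdrIdx), 1);

for k = 1:numel(hdrIdx)
    % Data rows start two lines below the header (skipping the dashes) and
    % stop before the next "*** Note" banner, or at the end of the file.
    first = hdrIdx(k) + 2;
    nextNote = noteIdx(noteIdx > hdrIdx(k));
    if isempty(nextNote)
        last = numel(lines);
    else
        last = nextNote(1) - 1;
    end
    rows = lines(first:last);
    rows = rows(strlength(rows) > 0);

    % Numeric column names are whatever follows "Description" on the header.
    names = split(strtrim(extractAfter(lines(hdrIdx(k)), "Description")));
    names = names(strlength(names) > 0);
    nNum  = numel(names);

    item = strings(numel(rows), 1);
    desc = strings(numel(rows), 1);
    vals = zeros(numel(rows), nNum);
    for r = 1:numel(rows)
        tok = split(rows(r));
        tok = tok(strlength(tok) > 0);
        item(r) = tok(1);                          % kept as text (leading zeros)
        desc(r) = join(tok(2:end-nNum), " ");      % may contain blanks
        vals(r, :) = double(tok(end-nNum+1:end)).';
    end

    % "Ag(Z)" is not a valid identifier, so it becomes e.g. "Ag_Z_".
    varNames = matlab.lang.makeValidName(cellstr(names)).';
    sections{k} = [table(item, desc, 'VariableNames', {'Item', 'Description'}), ...
                   array2table(vals, 'VariableNames', varNames)];
end
```

Counting tokens from the right end of each row is what gets around the blanks inside Description; it would break if a numeric field were ever missing, in which case the fixed-width route is the safer one.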
Jure Vrhunc
on 16 Jun 2020
Edited: Jure Vrhunc
on 16 Jun 2020
per isakson
on 17 Jun 2020
"appropriate cells within a Matlab spreadsheet" there is no Matlab spreadsheet. table is the closest there is.
Describe how you want the data in the .dat file to be assigned to MATLAB variables. Be specific.
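As one illustration of what a specific answer might look like (not necessarily what the poster wants), the "Key: value" lines at the top of the file could go into a scalar struct, with each "Note" section stored separately as a table. The struct name `meta` is hypothetical, and the header lines are copied from the posted example.

```matlab
% Header lines copied from the example file in the question.
hdr = [
    "Age: 55"
    "Name: John Doe"
    "Date: Fri Dec 12 1995 11:48:19 GMT+0000 (GMT)"
    "Directory: /people/names.count/results"
    ];

meta = struct();   % hypothetical container for the header fields
for k = 1:numel(hdr)
    % Split at the first colon only, so the colons in the time stay intact.
    tok = regexp(char(hdr(k)), '^(\w+):\s*(.*)$', 'tokens', 'once');
    if ~isempty(tok)
        meta.(tok{1}) = tok{2};
    end
end

% Age is the only header field that is obviously numeric.
meta.Age = str2double(meta.Age);
```

The Date field is left as text here; converting it to a datetime would need an input format that has not been confirmed against a real file.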
Jure Vrhunc
on 17 Jun 2020
dpb
on 17 Jun 2020
Polymorphic data like you show in Column 1 is not a good idea in a MATLAB table -- it means every variable has to be treated as a cell or the column as a cellstr(). That's not conducive to using the power of MATLAB on vectors/arrays of a given type.
Your description above makes Item a natural for using categorical.
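A short sketch of the categorical idea, using the first three columns of the "Note no. 1" rows from the question. It assumes the item numbers were read as text, which also keeps their leading zeros; the variable name `Ag` is a shortened stand-in for the "Ag(Z)" column.

```matlab
% Item numbers kept as text and then converted to categorical.
Item = categorical(["0000001"; "0020000"; "3000000"; ...
                    "0000004"; "5000000"; "0006000"]);
Description = ["Left Front"; "Right Front"; "Left Back"; ...
               "Right Back"; "Head"; "Torso"];
Ag = [2.539E+03; 2.514E+03; 2.496E+03; 2.476E+03; 2.631E+02; 2.138E+02];

T = table(Item, Description, Ag);

% Look up one item number and read other values from the same row.
row = T(T.Item == '3000000', :);
disp(row.Description)
disp(row.Ag)
```

With this layout every column holds a single type, so the numeric columns remain ordinary vectors that can be used directly in calculations.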
per isakson
on 18 Jun 2020
"There are thousands of lines of data in my .dat files." and "I believe I could take it from there"
I don't think the table you propose is a good idea. I believe that it will cause you problems downstream and that the resulting code will be inefficient and hard to work with.
A little bit of upfront design is needed.
Answers (0)