Parsing a Binary File (NOT using fread)

I am trying to put together a (generic) binary parsing tool over the weekend that I can modify with specifics for a particular binary file (if someone could explain why there are different "types" of binary files, I would appreciate it...) at work on Monday. It is my understanding that parsing a binary file is faster than parsing plaintext, so to speed up my program, I would like to parse the binary file which has previously been converted to plaintext for parsing. I know about fread, but that seems to be something to read binary by converting it to plaintext - this is not what I want to do.
Any help with parsing a binary file (i.e. looking for specific values or phrases, which I will likely have to convert from plaintext to binary...) will be much appreciated.
  2 Comments
Walter Roberson on 9 Nov 2012
What do you mean by "plaintext" in this situation?
Image Analyst on 9 Nov 2012
Edited: Image Analyst on 9 Nov 2012
What have you got against fread?
And, do you know about endian? http://en.wikipedia.org/wiki/Endian
An example of what you're starting with and what you want to end up with would help. Just one line or something.


Accepted Answer

Jan on 10 Nov 2012
The term "conversion from plaintext to binary" is unclear. fread() does not convert binary to plaintext.
The main difference is that "text" files store numbers as characters in ASCII encoding, e.g.
3.14159265358979
These are 16 characters to store pi. Parsing them and converting them to a double requires about 15 multiplications by 10, and an expensive power operation might also be needed for strings like '3.14e-15'.
In contrast, storing pi in binary format uses 8 bytes:
-DTû! @
This looks strange, but it is simply the byte sequence [24, 45, 68, 84, 251, 33, 9, 64] (little-endian byte order) displayed as characters. These bytes can be copied directly into memory, and no further arithmetic is needed.
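As a minimal sketch of this point, the 8 bytes can be reinterpreted as a double with typecast, which only relabels the memory and performs no arithmetic. It assumes a little-endian machine; the variable names are chosen only for illustration.

```matlab
% The 8 bytes quoted above, in little-endian order (assumed platform byte order).
bytes = uint8([24, 45, 68, 84, 251, 33, 9, 64]);

% typecast reinterprets the same memory as a double; nothing is computed.
value = typecast(bytes, 'double');

% For comparison, the text form needs 16 characters and must be parsed.
textForm = '3.14159265358979';
parsed = str2double(textForm);

fprintf('binary: %d bytes, text: %d characters\n', numel(bytes), numel(textForm));
fprintf('typecast result equals pi: %d\n', value == pi);
fprintf('parsed text differs from pi by %g\n', abs(parsed - pi));
```

The parsed text value is only as accurate as the digits that were written, while the binary form reproduces the stored double bit for bit.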
Binary files come in different "types" because the bytes alone do not say how they have to be interpreted. E.g. the byte sequence [24, 45, 68, 84, 251, 33, 9, 64] from above can be one double value, but it could just as well be 2 single values: [3.370281e+012, 2.142699], because a single uses 4 bytes per element. Therefore the software has to know the type of each variable stored in a binary file.
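The following sketch, again assuming a little-endian machine, reinterprets the same 8 bytes under three different type assumptions. The uint16 case is an extra illustration that is not part of the answer above.

```matlab
% Same 8 bytes, three different interpretations.
bytes = uint8([24, 45, 68, 84, 251, 33, 9, 64]);

asDouble = typecast(bytes, 'double');   % 1 value of 8 bytes
asSingle = typecast(bytes, 'single');   % 2 values of 4 bytes each
asUint16 = typecast(bytes, 'uint16');   % 4 values of 2 bytes each (extra example)

fprintf('as double: %.15g\n', asDouble);
fprintf('as single: %g  %g\n', asSingle(1), asSingle(2));
fprintf('as uint16: %d values\n', numel(asUint16));
```

None of these readings is more "correct" than the others as far as the bytes are concerned. Only the file format specification tells the reader which one was intended.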
I suggest copying a JPEG file, which is an example of a binary format. Then change the extension of the copy to ".txt" and open it in the editor. This does not convert anything, but the byte sequences stored in the file are now interpreted as characters, while the same contents are handled differently when the computer assumes that this is a JPEG-encoded picture. Finally you can change the extension to ".mp3", another example of a binary format. Of course your player will fail, because the contents are not valid MP3 data.
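The same idea can be tried with fread on a small test file. This is only a sketch: the file layout (a 4-character tag followed by one double) and all variable names are hypothetical and chosen for illustration. It reads the file once as raw bytes, searches those bytes for a phrase, and then reads the file again with the type of each field stated explicitly.

```matlab
% Hypothetical layout for illustration: a 4-character tag, then one double.
fileName = [tempname, '.bin'];
fid = fopen(fileName, 'w', 'ieee-le');    % fix the byte order to little-endian
fwrite(fid, 'DATA', 'uint8');
fwrite(fid, pi, 'double');
fclose(fid);

% Read the raw bytes unchanged: 'uint8=>uint8' keeps them as uint8.
fid = fopen(fileName, 'r', 'ieee-le');
raw = fread(fid, Inf, 'uint8=>uint8').';
fclose(fid);

% Search for a phrase by viewing the same bytes as characters.
tagPos = strfind(char(raw), 'DATA');

% Read again, this time telling fread the type of each field.
fid = fopen(fileName, 'r', 'ieee-le');
tag = fread(fid, 4, 'uint8=>char').';
value = fread(fid, 1, 'double');
fclose(fid);
delete(fileName);

fprintf('%d bytes in file, tag "%s" found at byte %d\n', numel(raw), tag, tagPos);
```

In neither read does fread convert anything to text. The precision argument only states how many bytes belong to one element and how they are to be interpreted.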
Is the difference between "binary" and "text" clearer now?
