Determining number of bytes in one line of binary file
Hello,
I am working on translating some old Fortran code into MATLAB and have run across the following issue.
In the Fortran file I am working with, the following check is performed:
CHARACTER*72 PICTFILE
CHARACTER*8192 INLINE
INTEGER NPX
INTEGER NLN
INTEGER BYTES
c This is read from another file but I'll just hard code it for now
NPX = 1024
NLN = 1024
bytes=2
open(unit=10, file=pictfile, access='direct', recl=2*npx, status='old')
read(10,rec=nln, err=20) inline(1:2*npx)
go to 21
20 bytes=1
21 continue
close(unit=10)
where nln is the number of lines in the file being read, and npx is the number of integers contained in each line. This check determines whether each of those integers is 1 byte or 2 bytes: it tries to read the last line (record nln) assuming 2 bytes per integer, and falls back to 1 byte if the read fails. I understand the Fortran code well enough to figure that out, but now I need to perform the same check in MATLAB. I have tried using fgetl on the file and then taking the length of the returned characters, but the length never seems to be more than 4 or 5 characters, whereas even at 1 byte per integer it should be somewhere around 1000.
Can someone help me with devising some way to perform this check in MATLAB? I've thought about maybe a try-catch statement but I'm not very familiar with them and don't really know where to start.
Thanks! Andrew
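For readers with the same problem, here is a minimal sketch of one way the Fortran check could be mirrored in MATLAB. It is not taken from the thread. fgetl reads up to the next newline character, so on binary data it returns arbitrary short fragments; fseek and fread work on byte counts instead. The sketch assumes the file is raw pixel data with no per-record markers (which dpb questions in the comments below), and it writes its own small demo file so that it runs as-is. The demo file and the variable names recl, count and nbytes are illustrative assumptions.

```matlab
% Values from the original post
npx = 1024;                          % integers per line
nln = 1024;                          % lines in the file

% Demo setup (assumption): write a raw 2-byte-per-pixel file to a temp path
pictfile = [tempname '.img'];
fid = fopen(pictfile, 'w');
fwrite(fid, zeros(npx*nln, 1), 'uint16');
fclose(fid);

% Mirror of the Fortran check: try to read record NLN of length 2*NPX bytes
recl = 2*npx;                        % record length in bytes, as in the OPEN
fid = fopen(pictfile, 'r');
count = 0;
status = fseek(fid, (nln - 1)*recl, 'bof');   % move to the start of record NLN
if status == 0
    [data, count] = fread(fid, recl, 'uint8'); % count = bytes actually read
end
fclose(fid);

if count == recl
    nbytes = 2;                      % a full record was available
else
    nbytes = 1;                      % seek failed or short read, like ERR=20
end

% Cross-check against the file size reported by the operating system
info = dir(pictfile);
fprintf('bytes per pixel: %d (file size %d bytes)\n', nbytes, info.bytes);

delete(pictfile);                    % remove the demo file
```

No try-catch is needed here, because fseek and fread report failure through their return values rather than by throwing an error. If the file does contain record markers, info.bytes will not equal npx*nln or 2*npx*nln exactly, which is itself a useful diagnostic.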
7 Comments
James Tursa
on 9 Jun 2014
Edited: James Tursa
on 9 Jun 2014
Are you trying to read a direct access file (i.e., a binary file) in MATLAB that was originally written using Fortran? If so, what system was the file written on and what system are you trying to read it on using MATLAB? Is inline an integer*1 array?
Andrew
on 9 Jun 2014
dpb
on 9 Jun 2014
Several issues here...first, in Fortran direct-access there's a hidden record length written to each record (most implementations include it both in front of and behind each record, but the details of the implementation are not specified by the Standard so the only guarantee made is that a file written by a "compatible compiler" will be readable by that same "compatible compiler").
In most common implementations prior to 64-bit, this is a default 32-bit integer.
Also the interpretation of RECL is another detail not specified but left as implementation-dependent. It may be bytes or words; which is dependent upon the compiler. The compiler of my choice says--
If the file is connected for formatted data transfer, the value must be expressed in bytes (characters). Otherwise, the value is expressed in 4-byte units (longwords).
If the file is connected for unformatted data transfer, the value can be expressed in bytes if compiler option /assume:byterecl is specified.
So, need some more context--what is the declaration for INLINE and what happens with BYTES?
Where is NPX defined?
But, first thing to just try is to read a 4-byte integer and see what you get--I'm betting it'll turn out to be a record length in bytes you can use directly.
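The experiment suggested above could be sketched in MATLAB roughly as follows. This is a sketch under stated assumptions, not a definitive test: the demo file written here deliberately contains a leading and trailing 4-byte length around one record so there is something to find, and little-endian byte order ('ieee-le') is assumed. A real file may have no such markers, in which case the first word is simply the first pixels of the image. The names firstWord and looksLikeMarker are illustrative.

```matlab
npx = 1024;                                   % integers per line, from the post

% Demo setup (assumption): one record wrapped in 4-byte length markers
pictfile = [tempname '.img'];
fid = fopen(pictfile, 'w', 'ieee-le');
fwrite(fid, 2*npx, 'int32');                  % leading record length
fwrite(fid, zeros(2*npx, 1), 'uint8');        % record payload
fwrite(fid, 2*npx, 'int32');                  % trailing record length
fclose(fid);

% The experiment: read the first 4 bytes as a 32-bit integer
fid = fopen(pictfile, 'r', 'ieee-le');
firstWord = fread(fid, 1, 'int32');
fclose(fid);
delete(pictfile);                             % remove the demo file

% A value equal to npx or 2*npx would be consistent with a record length
looksLikeMarker = (firstWord == npx) || (firstWord == 2*npx);
fprintf('first 32-bit word: %d, plausible record length: %d\n', ...
        firstWord, looksLikeMarker);
```

If the value looks implausible, repeating the read with 'ieee-be' in the fopen call checks the other byte order.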
If you are indicating that there is some kind of header that will tell how many bytes there are in each line there isn't. The entire file is strictly data which makes this much more difficult than it needs to be.
It's not a header record, no; but if it was written by an unformatted Fortran WRITE then there WILL be a record marker in each record; it will NOT be just a stream file.
Did you try the experiment suggested?
...npx is the number of integers contained in each line
Not quite, there are 2*NPX values/record.
As for what happens with bytes, it simply is used to determine whether each integer that is to be read is 1 byte or 2 bytes of unsigned data...
If the above suggestion doesn't solve the problem, I'd like to see where/how that's done for certain. Do you have a Fortran compiler handy to simply read the file? Or use a hex dump utility to poke around. Or, can you link the file to your post and I'll try to take a gander.
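If no hex dump utility is at hand, MATLAB itself can print the first bytes of the file. The following is a minimal sketch that assumes nothing about the file layout; it writes a 32-byte demo file so it runs as-is, and the names nshow, hexText and rowText are illustrative.

```matlab
% Demo setup (assumption): a small file containing the bytes 0..31
pictfile = [tempname '.img'];
fid = fopen(pictfile, 'w');
fwrite(fid, 0:31, 'uint8');
fclose(fid);

% Read the first nshow bytes of the file as unsigned 8-bit values
nshow = 32;
fid = fopen(pictfile, 'r');
bytes = fread(fid, nshow, 'uint8');
fclose(fid);
delete(pictfile);                             % remove the demo file

% Format 16 bytes per row, prefixed by the offset of the row in hex
hexText = '';
for k = 1:16:numel(bytes)
    row = bytes(k:min(k + 15, numel(bytes)));
    rowText = sprintf('%02X ', row);          % format is reused for each byte
    hexText = [hexText sprintf('%08X  %s\n', k - 1, rowText)];
end
fprintf('%s', hexText);
```

Pointing pictfile at the real image file and comparing the first and last few rows against the expected pixel values should show whether anything other than pixel data is present.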
Oh, the other verification would be to look at the OPEN and the WRITE statements that created the file if you have that code handy, but I'm betting I'm right. Until F2003 Standard there was no Fortran-standard way to write a stream (aka 'binary') file and the use of direct access in the read wouldn't handle it correctly if it were written that way.
ADDENDUM
My primary source of confusion is why, if the OPEN fails, they didn't just adjust the RECL parameter and reOPEN there. That is, why isn't it
... , RECL=bytes*npx, ... instead of ..., RECL=2*npx, ... ???
I keep thinking there must be more to the story...but I'm still nearly 100% certain there's an embedded record length in the file in each record that you'll have to read to read the file in Matlab, anyway.
Andrew
on 9 Jun 2014