Determining the number of bytes in one line of a binary file

Hello,
I am working on translating some old Fortran code into MATLAB and have run across the following issue.
The Fortran file I am working with performs the following check:
      CHARACTER*72 PICTFILE
      CHARACTER*8192 INLINE
      INTEGER NPX
      INTEGER NLN
      INTEGER BYTES
c     This is read from another file but I'll just hard code it for now
      NPX = 1024
      NLN = 1024
      bytes=2
      open(unit=10, file=pictfile, access='direct', recl=2*npx,
     &     status='old')
      read(10,rec=nln, err=20) inline(1:2*npx)
      go to 21
 20   bytes=1
 21   continue
      close(unit=10)
where nln is the number of lines in the file being read, and npx is the number of integers contained in each line. This check determines whether each of those integers is 1 byte or 2 bytes: it tries to read a line assuming 2 bytes per integer and falls back to 1 byte if the read fails. I understand the Fortran code well enough to figure that out, but now I need to perform the same check in MATLAB. I have tried using fgetl on the file and then checking the length of the returned character array, but the length never seems to be more than 4 or 5 characters, whereas even with 1-byte integers it should be somewhere around 1000.
Can someone help me with devising some way to perform this check in MATLAB? I've thought about maybe a try-catch statement but I'm not very familiar with them and don't really know where to start.
Thanks! Andrew

7 Comments

Are you trying to read a direct access file (i.e., a binary file) in MATLAB that was originally written using Fortran? If so, what system was the file written on and what system are you trying to read it on using MATLAB? Is inline an integer*1 array?
Hey, yes, it is a direct access file, though I don't know whether it was originally written using Fortran or not; my guess is that it probably was. I am also not sure what system it was written on, but I could probably find out a little later today if that is crucial to know. I would hazard a guess that it was written on Windows or some Linux distro. I am personally working on a Mac.
If you are asking because of endianness, don't worry about that because I can figure out the endianness for myself; however, if you are asking because you want to figure out what the newline character will be, then I will try to get that information to you.
Also, inline is a character array, but in MATLAB I will be going directly to an integer array thanks to the wonder of fread and its ability to interpret data. (inline is a character array so that the Fortran code can handle the endianness of the file and build the subsequent integers itself.)
Several issues here...first, in Fortran direct-access there's a hidden record length written to each record (most implementations include it both in front of and behind each record, but the details of the implementation are not specified by the Standard so the only guarantee made is that a file written by a "compatible compiler" will be readable by that same "compatible compiler").
In most common implementations prior to 64-bit, this is a default 32-bit integer.
Also the interpretation of RECL is another detail not specified but left as implementation-dependent. It may be bytes or words; which is dependent upon the compiler. The compiler of my choice says--
If the file is connected for formatted data transfer, the value must be expressed in bytes (characters). Otherwise, the value is expressed in 4-byte units (longwords).
If the file is connected for unformatted data transfer, the value can be expressed in bytes if compiler option /assume:byterecl is specified.
So, need some more context--what is the declaration for INLINE and what happens with BYTES?
Where is NPX defined?
But, first thing to just try is to read a 4-byte integer and see what you get--I'm betting it'll turn out to be a record length in bytes you can use directly.
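The experiment suggested above could be sketched in MATLAB roughly as follows. This is only an illustration: the function name peek_first_uint32 is made up for this example, and it simply interprets the first four bytes of the file in both byte orders so the result can be compared with a plausible record length such as NPX or 2*NPX.

```matlab
function [asBig, asLittle] = peek_first_uint32(file_name)
% Hypothetical helper (not part of the original code): interpret the first
% 4 bytes of a file as an unsigned 32-bit integer in both byte orders.
fid = fopen(file_name, 'r');
if fid < 0
    error('Could not open %s', file_name);
end
raw = fread(fid, 4, '*uint8');          % first four bytes, unmodified
fclose(fid);
if numel(raw) < 4
    error('File is shorter than 4 bytes');
end
raw = double(raw(:))';                  % row vector of byte values
weights = 256 .^ (3:-1:0);              % most significant byte first
asBig    = sum(raw .* weights);         % big-endian interpretation
asLittle = sum(fliplr(raw) .* weights); % little-endian interpretation
end
```

If neither value equals NPX or 2*NPX (1024 or 2048 here), that is at least consistent with the file starting with raw data rather than a record-length marker, although it does not prove it.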
@dpb - I added in the declarations for inline and the rest of the variables. NPX and NLN are defined by reading another file for the information but I've hard coded them directly above just for scope.
As for what happens with bytes, it simply is used to determine whether each integer that is to be read is 1 byte or 2 bytes of unsigned data so that the characters read from the file can be interpreted properly.
I am not sure what you mean by your final statement. If you are indicating that there is some kind of header that will tell how many bytes there are in each line there isn't. The entire file is strictly data which makes this much more difficult than it needs to be.
Thanks for your response!
"If you are indicating that there is some kind of header that will tell how many bytes there are in each line there isn't. The entire file is strictly data which makes this much more difficult than it needs to be."
It's not a header record, no; but if it was written by an unformatted Fortran WRITE then there WILL be a record marker in each record; it will NOT be just a stream file.
Did you try the experiment suggested?
"...npx is the number of integers contained in each line"
Not quite, there are 2*NPX values/record.
"As for what happens with bytes, it simply is used to determine whether each integer that is to be read is 1 byte or 2 bytes of unsigned data..."
If the above suggestion doesn't solve the problem, I'd like to see where/how that's done for certain. Do you have a Fortran compiler handy to simply read the file? Or use a hex dump utility to poke around. Or, can you link the file to your post and I'll try to take a gander.
Oh, the other verification would be to look at the OPEN and the WRITE statements that created the file if you have that code handy, but I'm betting I'm right. Until F2003 Standard there was no Fortran-standard way to write a stream (aka 'binary') file and the use of direct access in the read wouldn't handle it correctly if it were written that way.
ADDENDUM
My primary source of confusion is why, if the OPEN fails, they didn't just adjust the RECL parameter and reOPEN there. That is, why isn't it
... , RECL=bytes*npx, ... instead of ..., RECL=2*npx, ... ???
I keep thinking there must be more to the story...but I'm still nearly 100% certain there's an embedded record length in the file in each record that you'll have to read to read the file in Matlab, anyway.
Hey, from what you asked I input the following:
fid=fopen('FC21A0004707.DAT');
fread(fid,1,'*uint32','b')
fclose(fid);
fid=fopen('FC21A0004707.DAT');
fread(fid,1,'*uint32','l')
fclose(fid);
and got the following outputs from the file respectively:
ans = 2551174414
ans = 248844184
(I haven't had time to figure out the endianness.) This file may not have been made with Fortran. It is simply supposed to be image data. I have attached both the file I am trying to read and the Fortran function for reading it to this comment. I do have access to gfortran and have been using it to recompile the Fortran source I attached with different write statements so that I could check my work.
Also, as for npx, there may be 2*NPX values for each line, but only if bytes is equal to 2, which is what the above chunk of code checks; they then fold down to simply NPX 16-bit integers. Essentially this is a raw image file that I am trying to read, so regardless of whether the integers are 8 bit or 16 bit there will always be exactly 1024 of them per line.
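To illustrate what "folding down" means here, the following is a small sketch of combining consecutive byte pairs into unsigned 16-bit values. The function name fold_byte_pairs and the 'b'/'l' byte-order argument are assumptions made for this example; the attached Fortran may do this differently.

```matlab
function pix = fold_byte_pairs(raw, byteOrder)
% Hypothetical helper: combine consecutive byte pairs into unsigned 16-bit
% values. raw is a vector of byte values with even length; byteOrder is
% 'b' if the most significant byte comes first, anything else means the
% least significant byte comes first.
raw = double(raw(:))';
if mod(numel(raw), 2) ~= 0
    error('Expected an even number of bytes');
end
first  = raw(1:2:end);                  % 1st, 3rd, 5th, ... byte
second = raw(2:2:end);                  % 2nd, 4th, 6th, ... byte
if strcmp(byteOrder, 'b')
    pix = uint16(256 * first + second); % big-endian pairs
else
    pix = uint16(first + 256 * second); % little-endian pairs
end
end
```

For example, the bytes [1 2] become 258 when read as big-endian and 513 when read as little-endian. In practice fread with a '*uint16' precision and a byte-order argument does the same job directly.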
Also, I apologize that this code is not commented well. I wasn't the one who wrote it I just have to try and interpret it despite never having used fortran before :p...
Hey @dpb. I think I figured it out in my answer below. All that the above test is doing is trying to read the last record in the file. If it reaches the end of the file while still trying to read, it throws an error. However, because of the err=20 option in the read command, it doesn't report the error and instead jumps to the line labeled 20, recording that there can't be 2 bytes per integer because there isn't enough data in the file, and then moves on. If the read command doesn't hit an error, the go to skips over line 20 and bytes stays set to 2. Certainly not the most robust way to handle this, but I suppose that it works.
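A literal MATLAB counterpart of that Fortran test might look like the sketch below. It assumes that RECL is counted in bytes and that the file contains no record markers; the function name probe_bytes_per_pixel is made up for this example.

```matlab
function bytes = probe_bytes_per_pixel(file_name, NPX, NLN)
% Hypothetical helper mirroring the Fortran test: try to read record NLN
% using a record length of 2*NPX bytes, and fall back to 1 byte per pixel
% if the file ends before that record is complete.
recl = 2 * NPX;                         % record length in bytes for 16-bit pixels
fid = fopen(file_name, 'r');
if fid < 0
    error('Could not open %s', file_name);
end
bytes = 2;                              % same starting assumption as the Fortran
status = fseek(fid, (NLN - 1) * recl, 'bof');   % start of the last record
if status ~= 0
    bytes = 1;                          % the last record lies beyond the end of the file
else
    [~, count] = fread(fid, recl, '*uint8');
    if count < recl
        bytes = 1;                      % short read, like the err=20 branch
    end
end
fclose(fid);
end
```

Like the original, this only distinguishes "enough data for 2 bytes per pixel" from "not enough"; it does not confirm that the file size is exactly what is expected.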
Thanks for all your help!!
Andrew


 Accepted Answer

For anyone else who may have this problem, I believe I figured out a solution. It ended up not needing a try-catch statement; comparing the file size against the expected size is enough.
I realized that all the above was doing was trying to read the data from the last record in the file. It was set up that if there was an error (it ran out of data in the file) it would kick out and set the number of bytes to 1 instead of 2. In order to do this in Matlab use the following:
fullpath = which(file_name);    % full file path (the file must be on the MATLAB path)
s = dir(fullpath);              % information about the file, including its size in bytes
fid = fopen(file_name,'r');     % opening image file
if s.bytes/NLN == 2*NPX         % if the file is NLN*NPX*2 bytes
    dn = zeros(NLN,NPX,'uint16');   % preallocate the image
    for n = 1:NLN               % for each line
        dn(n,:) = fread(fid, NPX, '*uint16', 'b')';   % reading one line into dn
    end
elseif s.bytes/NLN == NPX       % else if the file is NLN*NPX bytes
    dn = zeros(NLN,NPX,'uint8');    % preallocate the image
    for n = 1:NLN               % for each line
        dn(n,:) = fread(fid, NPX, '*uint8')';         % byte order does not matter for uint8
    end
else                            % if the file is neither, something went wrong
    fclose(fid);
    error('Invalid file. The file is not the correct size specified by the SUM file')
end
fclose(fid);                    % closing image file
Hope this helps, but be careful! This only works if all of the data in the file is the same type (i.e. uint8 or uint16) and you know beforehand how many entries the file should contain. Andrew
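As a possible variation on the same size check, the whole image can be read with a single fread call instead of a loop. This is only a sketch: the function name read_raw_image and the byteOrder argument are assumptions made for this example, and it still relies on the file being pure pixel data of one type.

```matlab
function dn = read_raw_image(file_name, NPX, NLN, byteOrder)
% Hypothetical helper: infer the sample width from the file size and read
% the whole image with one fread call.
% byteOrder is 'b' (big-endian) or 'l' (little-endian); it only matters
% for 16-bit data.
info = dir(file_name);
if isempty(info)
    error('File not found: %s', file_name);
end
bytesPerPixel = info(1).bytes / (NLN * NPX);
switch bytesPerPixel
    case 1
        precision = '*uint8';
    case 2
        precision = '*uint16';
    otherwise
        error('File size %d does not match %d x %d pixels of 1 or 2 bytes', ...
              info(1).bytes, NLN, NPX);
end
fid = fopen(file_name, 'r');
if fid < 0
    error('Could not open %s', file_name);
end
% fread fills the result column by column, so read NPX-by-NLN and
% transpose to get one image line per row.
dn = fread(fid, [NPX, NLN], precision, 0, byteOrder)';
fclose(fid);
end
```

A call such as dn = read_raw_image(file_name, NPX, NLN, 'b') would then return an NLN-by-NPX array of uint8 or uint16, matching the layout of dn in the loop version.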

1 Comment

"... if all of the data in the file is the same type (ie uint8 or uint16) and you know the number of entries that should be contained in the file beforehand."
If that were the case you should be able to just find the SIZE in bytes and divide, if it is stream ('binary') data only.
It still doesn't make sense unless there was a compiler switch that made the Fortran unformatted record look like a stream file, though, altho apparently it was...does GFortran actually read it correctly w/o any special switch settings? I've not used it enough to know its handling otomh but it's plenty weird if it does.
But, I suppose "all's well that ends well"... :)



Asked:

on 9 Jun 2014

Commented:

dpb
on 9 Jun 2014
