Encoding problem reading data using fread

Hi,
I'm using the following code to read in data from a file which contains text as well as binary data (European Data Format, to be more specific):
fid = fopen('test.edf', 'r', 'l');
fileType = fread(fid, 1, 'uint8');
id = char(fread(fid, [1 7], 'char'));
fclose(fid);
On my machine (Windows 10, MATLAB R2020a Update 6) this code runs fine and the values returned (i.e. fileType and id) are correct.
However, when this code is run on a different machine (one of our customers'; also running Windows 10, but using MATLAB R2020a Update 1) with the same input file, the value of id seems to be read incorrectly (the encoding used seems to be UTF-16BE). In fact, I get the same incorrect results on my machine if I specify UTF-16BE as the file encoding in the fopen call.
More interestingly, if I open the file on my machine without specifying an encoding and determine the used encoding using
[filename, permission, machineformat, encoding] = fopen(fid);
then the encoding UTF-16BE is returned.
And the default encoding in Windows is the same across the machines compared.
So, to me it seems like MATLAB on my machine detects an incorrect encoding because the file contains the BOM somewhere in the data, but nevertheless returns the correct values. On the customer's machine, however, it seems like the detected encoding is actually used, yielding different results.
My question is now: how is it possible that MATLAB obviously detects a wrong encoding but reads in the data correctly on my machine? And why do I get incorrect data if I explicitly specify the incorrect encoding (which is detected by MATLAB)? And why does the customer get different results although the same input file is used and although MATLAB detects the same (incorrect) encoding?
Is it possible that something has changed between Update 1 and Update 6 of MATLAB R2020a which causes MATLAB to behave differently? Unfortunately, I did not find any hint in the release notes of the updates with respect to the behavior of fopen.
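One way to narrow this down on both machines is to compare what fread returns with the 'char' precision against the raw byte values. As far as I understand the fread documentation, 'char' is decoded using the encoding associated with the file, while 'uint8' is not decoded at all. The following is only a sketch: the temporary file and its contents ('0' followed by 'ABCDEFG') are made up for illustration and are not the actual EDF file.

```matlab
% Hypothetical test file: one byte followed by 7 ASCII characters.
tmpFile = [tempname '.bin'];
fid = fopen(tmpFile, 'w');
fwrite(fid, [uint8(48), uint8('ABCDEFG')], 'uint8');
fclose(fid);

% Open without an explicit encoding, as in the original code,
% and ask MATLAB which encoding it associates with the file.
fid = fopen(tmpFile, 'r', 'l');
[~, ~, ~, reportedEncoding] = fopen(fid);

% Read the same 7 bytes twice: once as 'char', once as raw bytes.
fseek(fid, 1, 'bof');
idChar = fread(fid, [1 7], 'char');    % decoded using the file's encoding
fseek(fid, 1, 'bof');
idBytes = fread(fid, [1 7], 'uint8');  % raw byte values, no decoding
fclose(fid);
delete(tmpFile);

fprintf('Reported encoding: %s\n', reportedEncoding);
fprintf('char read matches raw bytes: %d\n', isequal(idChar, idBytes));
```

Running the same comparison against the real file on Update 1 and Update 6 should show whether the two installations differ in the reported encoding, in how 'char' is decoded, or in both.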
Best,
Michael
  2 Comments
Mathieu NOE on 25 May 2023
you may want to contact TMW support for that
Michael Liedlgruber on 25 May 2023
Thank you. Yes, if nobody in the community has an idea what may cause these inconsistencies, I will contact TMW support.
Fortunately, a fix is quite easy: by specifying UTF-8 encoding explicitly, everything works as expected on all machines.
But I'm still curious what's going on here.
Best,
Michael
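A minimal sketch of that workaround, assuming a made-up test file rather than the real test.edf: the encoding is passed explicitly as the fourth argument to fopen. In addition, the ID is read with the 'uint8=>char' precision, which converts raw bytes to characters directly; this is an extra precaution not mentioned in the thread, and it only suits header fields that are plain ASCII.

```matlab
% Hypothetical test file: one byte followed by 7 ASCII characters.
tmpFile2 = [tempname '.bin'];
fid = fopen(tmpFile2, 'w');
fwrite(fid, [uint8(48), uint8('ABCDEFG')], 'uint8');
fclose(fid);

% Pass the encoding explicitly so that no automatic detection is involved.
fid = fopen(tmpFile2, 'r', 'l', 'UTF-8');
fileType = fread(fid, 1, 'uint8');

% 'uint8=>char' reads raw bytes and converts each one to a character.
id = fread(fid, [1 7], 'uint8=>char');
fclose(fid);
delete(tmpFile2);

disp(fileType);
disp(id);
```

Since the ID is then built from the raw byte values, this read should no longer depend on which encoding fopen reports for the file.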


Answers (1)

Ayush on 4 Sep 2023
  1 Comment
Michael Liedlgruber on 6 Sep 2023
Thank you. But this does not really answer my question. And, funnily, the page you linked to says "For more information, see ."
So, I already know that MATLAB defaults to UTF-8. But as you can see in my original post, the behavior is inconsistent between Update 1 and Update 6. And I have no explanation for why, on my machine, fopen(fid) returns the incorrect encoding while the correct encoding is used when reading the data.


Release

R2020a
