Encoding problem reading data using fread
Hi,
I'm using the following code to read in data from a file which contains text as well as binary data (European Data Format, to be more specific):
fid = fopen('test.edf', 'r', 'l');
fileType = fread(fid, 1, 'uint8');
id = char(fread(fid, [1 7], 'char'));
fclose(fid);
On my machine (Windows 10, MATLAB R2020a Update 6) this code runs fine and the values returned (i.e. fileType and id) are correct.
However, when this code is run on a different machine (one of our customers'; also running Windows 10, but with MATLAB R2020a Update 1) using the same input file, the value of id is read incorrectly: the encoding used seems to be UTF-16BE. In fact, I get the same incorrect results on my machine if I specify UTF-16BE as the file encoding in the fopen call.
More interestingly, if I open the file on my machine without specifying an encoding and determine the used encoding using
[filename, permission, machineformat, encoding] = fopen(fid);
then the encoding UTF-16BE is returned.
And the default encoding in Windows is the same across the machines compared.
So, to me it seems that MATLAB on my machine detects an incorrect encoding, because the file contains a BOM somewhere in the data, but nevertheless returns the correct values. On the customer's machine, however, it seems that the detected encoding is actually used, yielding different results.
My questions are: How is it possible that MATLAB detects a wrong encoding but still reads the data correctly on my machine? Why do I get incorrect data if I explicitly specify the same (incorrect) encoding that MATLAB detects? And why does the customer get different results, although the same input file is used and MATLAB detects the same (incorrect) encoding?
Is it possible that something has changed between Update 1 and Update 6 of MATLAB R2020a which causes MATLAB to behave differently? Unfortunately, I did not find any hint in the release notes of the updates with respect to the behavior of fopen.
Best,
Michael
Answers (1)
Ayush
on 4 Sep 2023
As of MATLAB R2020a, the default encoding used by fopen is UTF-8 on all platforms and in all locales.
For details, see the following documentation: https://www.mathworks.com/help/releases/R2020a/matlab/ref/fopen.html?s_tid=doc_ta#:~:text=fopen%20defaults%20to%20using%20UTF%2D8%20in%20order%20to%20provide%20interoperability%20between%20all%20platforms%20and%20locales%20without%20data%20loss%20or%20corruption.
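Since the result of reading with the 'char' precision depends on the encoding associated with the file, one way to avoid relying on a default or detected encoding is to read the text fields as raw bytes, or to pass the encoding explicitly to fopen. The following is only a minimal sketch, not a confirmed fix for the Update 1 vs. Update 6 difference. It assumes the text fields in the header are plain ASCII; the temporary file and its sample bytes are hypothetical stand-ins for test.edf.

```matlab
% Hypothetical sample header: one marker byte followed by 7 ASCII bytes.
% (Stand-in for the real file; replace tmpFile with 'test.edf' in practice.)
sampleBytes = uint8([255, double('ABCDEFG')]);
tmpFile = [tempname, '.edf'];
fid = fopen(tmpFile, 'w');
fwrite(fid, sampleBytes, 'uint8');
fclose(fid);

% Option 1: read raw bytes and map each byte to one char.
% 'uint8=>char' does not involve the file's character encoding.
fid = fopen(tmpFile, 'r', 'l');
fileType = fread(fid, 1, 'uint8');
id = fread(fid, [1 7], 'uint8=>char');
fclose(fid);

% Option 2: keep the 'char' precision, but state the encoding explicitly
% (assumption: the header text is US-ASCII).
fid = fopen(tmpFile, 'r', 'l', 'US-ASCII');
fileType2 = fread(fid, 1, 'uint8');
id2 = char(fread(fid, [1 7], 'char'));
fclose(fid);

delete(tmpFile);  % remove the temporary sample file
```

Option 1 is the more defensive choice for mixed text/binary formats, because the bytes are never decoded as multi-byte characters.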
Thank you