
Files saved in v7.3 format are becoming 100 times larger than those saved in v7. Solution?

My saved *.mat files are becoming LARGER than the memory they take up in Matlab when saving with the '-v7.3' switch and almost 100 times larger than files saved with the default -v7. Something is very wrong here.
EXAMPLE: A mixed data structure with numbers and cells shows as 3.8MB in Matlab R2015b:
>> whos data2
Name Size Bytes Class Attributes
data2 1x1 3991020 struct
>> save ('data2_ver7_3.mat','data2','-v7.3')
>> save ('data2_ver7.mat','data2')
>> z=dir('data2_*.mat');
>> z(1)
ans =
name: 'data2_ver7.mat'
date: '02-Apr-2018 15:02:43'
bytes: 126485
isdir: 0
datenum: 737152.626886574
>> z(2)
ans =
name: 'data2_ver7_3.mat'
date: '02-Apr-2018 15:02:33'
bytes: 12449281
isdir: 0
datenum: 737152.626770833
So how can a 3.8MB data set become 11.9MB when saved as -v7.3 and yet compresses beautifully to 0.12MB when saved as default v7?
Am I missing something obvious to enable compression with -v7.3? Why is the saved -v7.3 file becoming three times larger than the same dataset loaded in Matlab 64 bit memory? Shouldn't it become the same size or smaller upon saving?

Answers (2)

Martin Dunda on 3 Feb 2020
I do have the same problem using R2017b.
MAT files saved without the '-v7.3' flag are around 100 MB in size; with the flag they grow to around 1.1 GB.

Walter Roberson on 3 Feb 2020
V7.3 files are stored in a completely different format that is compatible with HDF5 where possible.
Unfortunately, HDF5 is not really designed for cell arrays or for structure arrays, so the v7.3 format more or less has to create a complete HDF5 variable for each entry in a cell array, and for each field of each member of a structure array. That includes defining index information, type information, and so on.
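One way to see this per-entry overhead is to save the same numbers twice, once as a single numeric array and once split into a cell array, and compare the file sizes in both formats. The following is only a rough sketch: the variable names (`asMatrix`, `asCell`), the entry count, and the scratch folder are made up for illustration, and the exact sizes will depend on the MATLAB release and the data.

```matlab
% Hypothetical test data: the same 10,000-by-3 values stored two ways.
n = 10000;
asMatrix = rand(n, 3);                  % one numeric array
asCell   = num2cell(asMatrix, 2);       % n-by-1 cell, each entry a 1-by-3 row

outDir = tempname;                      % scratch folder for the test files
mkdir(outDir);

names = {'asMatrix', 'asCell'};
flags = {'-v7', '-v7.3'};
sizes = zeros(numel(names), numel(flags));   % file sizes in bytes

for i = 1:numel(names)
    for j = 1:numel(flags)
        f = fullfile(outDir, sprintf('%s_%d.mat', names{i}, j));
        save(f, names{i}, flags{j});    % save one variable in one format
        info = dir(f);
        sizes(i, j) = info.bytes;
    end
end

% Rows: variable, columns: format.
fprintf('%-10s %12s %12s\n', 'variable', flags{1}, flags{2});
for i = 1:numel(names)
    fprintf('%-10s %12d %12d\n', names{i}, sizes(i, 1), sizes(i, 2));
end

rmdir(outDir, 's');                     % clean up the scratch folder
```

If the explanation above holds, the `asCell` row should show a much larger gap between the two formats than the `asMatrix` row, because only the cell version forces one internal variable per entry.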
I do not know if any optimization is done to compare entries to entries already generated so as to be able to refer to an already created internal variable. Hypothetically if a cell has (among the entries) two entries that were each 5 x 7 zeros then the internal list generated could possibly refer to the same one each time.
If I were implementing such a thing I would possibly skip duplicate checking for smaller arrays (potentially too expensive because there could be a lot of them), and for larger arrays I would probably generate a hash for each entry: sorted hashes can be searched in O(log n) time at worst, or in close to constant time with a hash table.
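The duplicate check described above could look roughly like the sketch below. It is purely illustrative and says nothing about what `save` actually does internally. The assumptions are all mine: the 100-byte threshold (`minBytes`), the use of MD5 through Java's `MessageDigest` (which needs MATLAB to be running with the JVM), the restriction to real, full numeric arrays, and every variable name.

```matlab
% Hypothetical input: a cell array in which some entries are identical.
c = {zeros(5, 7), magic(4), zeros(5, 7), 'text', magic(4)};

minBytes = 100;                          % assumed cutoff: smaller arrays are not hashed
md     = java.security.MessageDigest.getInstance('MD5');   % requires the JVM
seen   = containers.Map('KeyType', 'char', 'ValueType', 'double');
stored = {};                             % unique payloads, in first-seen order
refIdx = zeros(size(c));                 % refIdx(k) is the index of c{k} in 'stored'

for k = 1:numel(c)
    x = c{k};
    % Only real, full numeric arrays are hashed in this sketch.
    hashable = isnumeric(x) && isreal(x) && ~issparse(x);
    if hashable
        bytes = typecast(x(:), 'uint8');
        hashable = numel(bytes) >= minBytes;
    end
    if hashable
        md.reset();
        md.update(bytes);
        digest = double(typecast(md.digest(), 'uint8'));
        % Key combines class, size, and the digest as a hex string.
        key = sprintf('%s|%s|%s', class(x), mat2str(size(x)), ...
                      reshape(dec2hex(digest, 2).', 1, []));
        % Confirm with isequal so a hash collision cannot merge different data.
        if isKey(seen, key) && isequal(stored{seen(key)}, x)
            refIdx(k) = seen(key);       % reuse the entry stored earlier
            continue
        end
    end
    stored{end+1} = x;                   %#ok<SAGROW> new unique payload
    refIdx(k) = numel(stored);
    if hashable
        seen(key) = refIdx(k);
    end
end
```

Here `stored(refIdx)` rebuilds the original cell array, so a writer using this scheme would only need to store the payloads in `stored` plus the index list. `containers.Map` gives the hash-table lookup mentioned above; a sorted list of keys searched by bisection would be the logarithmic alternative.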
Anyhow: struct and cell arrays have a much lower cost in MATLAB than in HDF5.
