Is there a way to reduce a very large table by removing the cells (positions) that change least frequently?

Salem on 11 Sep 2015
Commented: Salem on 11 Sep 2015
I have a very large table (saved as a .csv file) with 10000 rows and 10000 columns, i.e. a 10000 x 10000 matrix. The data in the table changes at every iteration (typically between 1000 and 100000 iterations).
I would like to compare the newly arriving data (the new table) with the previous one and record which cells changed. For example, if the previous table holds {1,2,3,..} and the new table holds {4,5,3,..}, I want to record that the first cell (position) changed, the second changed, the third did not, and so on.
In the end, I would like to remove the cells (positions) that changed least often over all the iterations, so that the table shrinks; in other words, make the table sparser. There is no need to keep the old data once the changes have been recorded: only the newest data must be kept, for comparison with the next incoming data. The number of changes at each position is, of course, very important.
Could you please suggest or explain a method that can do this?

Accepted Answer

Guillaume on 11 Sep 2015
Tracking the number of changes for each element of the matrix is easy, but of course, you're going to temporarily at least double your memory requirement:
data = randi([0 255], 100, 100); %demo data
numchanges = zeros(size(data)); %matrix to track the number of changes in each cell
for iter = 1:100
   newdata = data; newdata(randperm(numel(data), 50)) = randi([0 255], 1, 50); %for demo, change some data randomly
   changedcells = newdata ~= data; %a logical matrix indicating which cells have changed
   numchanges = numchanges + changedcells; %add the logical matrix to the change count
   data = newdata; %keep only the latest data for the next comparison
end
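On the memory point: the counter does not have to be a double matrix. The following is a minimal sketch, assuming the change count never exceeds the range of uint32 (which comfortably covers the 100000 iterations mentioned in the question), that keeps the counter in an integer type so it needs 4 bytes per cell instead of 8. The demo sizes are the same made-up ones as in the loop above.

```matlab
data = randi([0 255], 100, 100);             % demo data, same as above
numchanges = zeros(size(data), 'uint32');    % integer counter: 4 bytes per cell instead of 8
for iter = 1:100
    newdata = data;
    newdata(randperm(numel(data), 50)) = randi([0 255], 1, 50);  % demo: overwrite 50 random cells
    numchanges = numchanges + uint32(newdata ~= data);           % cast the logical mask before adding
    data = newdata;                                              % keep only the latest table
end
```

uint16 would halve the counter again, but it saturates at 65535, so it is only safe if the number of iterations stays below that.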
From there, it's easy to create a sparse matrix with only the cells that have changed:
sparsedata = data;
sparsedata(numchanges == 0) = 0; %replace unchanged data by 0
sparsedata = sparse(sparsedata); %and convert to sparse matrix, effectively removing all 0 (unchanged data)
Note that you're trading a smaller final memory/disk footprint for increased processing time, not only to generate the data, but most likely every time you use your sparse matrix.
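The code above only drops cells that never changed. If the goal is to also drop the cells that changed least often, one option is to threshold on the change count. This is only a sketch: keepfraction and its value of 0.25 are assumptions for illustration, and the small data and numchanges matrices are hypothetical stand-ins for the ones produced by the loop.

```matlab
% Hypothetical demo inputs; in practice use data and numchanges from the loop above.
data = magic(4);
numchanges = [0 3 1 5; 2 0 4 1; 6 1 0 2; 1 7 3 0];

keepfraction = 0.25;                                   % assumption: keep the 25% most frequently changed cells
counts = sort(numchanges(:), 'descend');               % change counts, largest first
nkeep = max(1, round(keepfraction * numel(counts)));   % number of cells to keep
threshold = counts(nkeep);                             % smallest change count that is still kept

keepmask = numchanges >= threshold & numchanges > 0;   % ties at the threshold are all kept
[rows, cols] = find(keepmask);                         % positions of the kept cells
sparsedata = sparse(rows, cols, double(data(keepmask)), size(data, 1), size(data, 2));
```

Because ties at the threshold are kept, slightly more than keepfraction of the cells may survive. keepmask also records which positions were retained, which matters if a retained cell legitimately holds the value 0, since the sparse matrix does not store zeros.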
