How to amend the names of all variables in all tables in the current workspace

13 views (last 30 days)
Dear MATLAB community,
1. I have 300+ tables in my base workspace, and each of them will contain anywhere from approximately 5 to several hundred variables. Many of these variables are identically named. My eventual goal is to generate a gargantuan table in which all 300+ tables have been joined appropriately on a combination of key variables. However, this would mean that many variables will be identically named. I understand from reading MATLAB's documentation on the 'join' function that MATLAB copes with this problem by automatically appending the name of the original tables to the names of the affected variables, during the process of executing the 'join' function - e.g. if two tables named 'table1' and 'table2' both contain a variable named 'var1', in the final joined table, the resulting variables would be 'var1_table1' and 'var1_table2'.
However, I would prefer a different solution. I would like to amend the names of all variables in all tables first, before carrying out the joining. Specifically, I would like to append the name of each table as a prefix to the name of the variables in that table. Using the same example as above, this would generate the variables named 'table1_var1' and 'table2_var1'.
2. I have found a solution that works, but for only one table at a time:
N = length(table1.Properties.VariableNames);
varNames = cell(1,N);
for i=1:N
varNames{i} = ['table1_' table1.Properties.VariableNames{i}];
end
table1.Properties.VariableNames = varNames;
clear N
clear varNames
clear i
Re-creating the above chunk of code for all 300+ tables in the workspace, substituting 'tableX' for 'table1', will lead to 3000+ lines of code, and will be really tedious.. I did consider using 'mail merge' or some similar function to automate the creation, but the end result would still be long and inelegant.
3. My question boils down to this: is there a more elegant solution?
I have tried the following:
workspace = whos; % save all the variables in the current workspace to a new variable called 'workspace'.
% This 'workspace' is of the data type structure
N = length(workspace);
for i=1:N
if workspace(i).class == "table"
x = workspace(i).name; % I am trying to extract out the name of each table here
M = length([x '.Properties.VariableNames']); % And then use that name as a dynamic variable name,
% as inputs into arguments for example - but it doesn't work. Rather than returning
% M = <number of variables in the table named 'x'>, it just treats [x '.Properties.VariableNames']
% as a character instead and tells me how many characters there are in it
varNames = cell(1,M);
for j=1:M
varNames{j} = [x '_' x.Properties.VariableNames{j}];
end
x.Properties.VariableNames = varNames;
end
end
clear x
clear M
clear varNames
I think the key problem here is that whenever I type 'x' in the Command Window, it returns the character 'table1', rather than the table 'table1' itself.
I'm aware that eval(x) will solve it nicely, e.g. eval(x).Properties.VariableNames will indeed return all the names of the variables in the table named 'x'. But I've dutifully read what our experts have said against using eval when it comes to creating dynamic variable names (e.g. https://www.mathworks.com/matlabcentral/answers/304528-tutorial-why-variables-should-not-be-named-dynamically-eval#answer_236126), and am keen to avoid it if at all possible.
I've looked through all the solutions referenced by Mr Stephen Cobeldick in the link above, and unfortunately still couldn't find a way out. In particular, I tried assigning the variable names 'table1', 'table2', ..., 'tableX', to a cell, and then retrieving those table names by indexing into the cell array, but it just returns a character value rather than an actual variable name that would then pull up the desired table. (Admittedly, I'm rather new to MATLAB and thus might not have understood all of the suggested solutions enough to have tested them out properly.)
===
Apologies for the rather long post! Just wanted to explain my problem in enough detail that I hope one of the experts here can show a way out of this.
Thank you very much!
  9 Comments
Sindar
Sindar on 1 Aug 2020
If the import procedure for each table truly is unique, you could still transition to cells (or structs) which handle iteration better:
go from
opts_1 = struct(...);
table1 = readtable([rootdir 'table1.csv'],opts_1);
table1 = postprocess(table1);
opts_2 = struct(...);
table2 = readtable([rootdir 'table2.csv']);
table2 = postprocess(table2);
to
opts_1 = struct(...);
all_tables{1} = readtable([rootdir 'table1.csv'],opts_1);
all_tables{1} = postprocess(all_tables{1});
opts_2 = struct(...);
all_tables{2} = readtable([rootdir 'table2.csv']);
all_tables{1} = postprocess(all_tables{1});
still not elegant and tedious to change (find-replace and maybe SHIFT-ENTER variable renaming), but allows the more stable solutions above
Yao Feng Chong
Yao Feng Chong on 4 Aug 2020
Thank you for your comment. I will probably restructure my code along the lines you suggested, if I ever need to rewrite my code.

Sign in to comment.

Answers (1)

J. Alex Lee
J. Alex Lee on 1 Aug 2020
You can still "bandaid" your situation up by reading the named tables in your workspace into the cell array that could have been created in the first place. I guess you cannot avoid the usage of eval, but maybe that sin can be forgiven because you are using it to undo the 300+ sequential naming sins:
for i = numTables:-1:1
tablename = sprintf("table%d",i);
AllTables{i} = eval(tablename);
end
Then, I would step back even further and ask why do you want to join/merge the 300+ tables?
If the table name and variable name are semantically important to you, why not index into your variables using dot notation:
AllTables{128}.Var1
This perserves the information the names carried in the first place (Var1 from the 128the table), but without unnecessary burden of renaming Var1 within that table to Table128_Var1.
If your tables are just 1-row tables, maybe the better data type is the nested structure.
Data = struct()
for i = numTables:-1:1
tablename = sprintf('table%d',i)
tmptable = eval(tablename);
varnames = tmptable.Properties.VariableNames;
Data.(tablename) = struct();
for j = 1:length(varnames)
Data.(tablename).(varnames{j}) = tmptable.(varnames{j});
end
end
And you can index your variables as
Data.table128.var34
And to pontificate further on Adam Danz's wisdom: if the cost is sunk, then the further cost of learning how to do things better and faster is but a drop in the sea, and starts to look like a pretty good investment when you consider that usually your stakeholders care only about how well you have accomplished your task, and not necessarily how hard you worked on it.
Plus, it's easier to learn new methods on familiar problems than have to learn them on new problems. Now that you ahve invested time and effort to really understand this task/problem, why not capitalize further on that by learning better ways to solve it and better prepare yourself for the next challenging task?
  5 Comments
Stephen23
Stephen23 on 9 Aug 2020
Edited: Stephen23 on 9 Aug 2020
'not sure if dynamic referencing of structure elements suffers similar performance issues from "eval"-like "strategies"'
Accessing structure elements or fields can be optimised by the JIT compiler, whereas evaluated strings cannot be.
Structure elements are accessed exactly like any other class array elements: using very efficient indexing.
Using indexing is usually more efficient than generating dynamic fieldnames, because commonly those fieldnames are created by converting data from numeric or concatenating other strings, both of which take time.
J. Alex Lee
J. Alex Lee on 9 Aug 2020
"The reason why all the datasets (aka tables) had to be stored within the same workspace and then subsequently joined is because I'm merging individual subjects' data which had been recorded across 300+ tables."
I understand the need for having the data all in one place and accessible by some sane and simple "interface". I just challenged the additional and conceptually separate reqirement of merging the tables into a single table, or the requirement of the table data type. It's just a comment in the same vein as other advisers here: solve the actual problem, not the problems you may have created for yourself by making "bad" choices earlier in the process.
@Steven, by your last sentence I take it you just mean that accessing array elements by numeric indexes is faster than accessing accessing structure fields by dynamically resolving the fieldname because of the cost of resolving the fieldname? Otherwise that structure access by dynamic fieldnames is indeed much better than eval-ing?

Sign in to comment.

Categories

Find more on Tables in Help Center and File Exchange

Products


Release

R2020a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!