Filter excel files by sheetnames

Hi, I am trying to filter out excel files by checking whether it has a sheet named "Cooking_is_fun" within the excel.
This is the code I have been working on.
function hassheet = CheckForWorksheet(filepath, filelist, sheetname)
hassheet = false(size(filelist));
excel = actxserver('Excel.Application');
cleanupobj = onCleanup(@() excel.Quit);
for fileidx = 1:numel(filelist)
workbook = excel.Workbooks.Open(filepath{fileidx}, false, true);
sheetnames = arrayfun(@(i) workbook.Sheets.Item(i).Name, 1:workbook.Sheets.Count, 'UniformOutput', false);
if ismember(sheetname, sheetnames)
hassheet(fileidx) = true;
end
end
end
%now call the function, note that data is a pre-existing structure array that I have.
filepath = {data.AllExcelPath};
filelist = {data.AllExcel};
hassheet = CheckForWorksheet(SubFolder, filelist, 'Cooking_is_fun');
wantedExcel = filelist(hassheet);
However, this kept on giving me errors such as:
Cell contents reference from a non-cell array object.
Error in Open_Excel_Files_All/CheckForWorksheet
workbook = excel.Workbooks.Open(filepath{fileidx}, false, true);
Error in Open_Excel_Files_All
hassheet = CheckForWorksheet(SubFolder, filelist,'Cooking_is_fun');
My goal is to:
be able to write the function and call the function in the same script, so that when I send the code to someone they don't have to run a function and a m-file separately.
be able to extract wanted excel files based on whether it contain a sheet named "Cooking_is_fun".
I am fairly new to matlab and am still confused with many basic concepts. Please advice me with any tips/guide.
Thank you very much for reading and any help will be greatly appreciated :)

2 Comments

Note that I've just realised that there's a minor bug (I never close the workbooks I open) in the original code I wrote for CheckForWorksheet. Fixed in the original discussion.
chlor thanks
chlor thanks on 13 Jul 2016
Edited: chlor thanks on 13 Jul 2016
Ok! Thank you so much for your generous help and reaching out to me again!

Sign in to comment.

 Accepted Answer

Assuming that filepath is just a list of folders (no filename) and filelist is the corresponding list of files (just filenames), then the simplest thing might be to concatenate the two into just one list (with fullfile) and just pass that list to the check function. Therefore,
in the calling code:
fullpaths = fullfile(filepath, filelist); %concatenate paths with filenames
hassheet = CheckForWorksheet(fullpaths, 'Cooking_is_fun');
%fullpaths(hassheet) is what you want to keep
the check function:
function hassheet = CheckExcelFilesForWorksheet(filelist, sheetname)
%HASSHEET Check whether or not the given excel files have a sheet with the given name
%filelist: full paths of excel files (cell array of 1D char arrays / string array)
%sheetname: name of sheet to find in excel files (1D char array / string)
%hassheet: array the same size as filelist, indicating whether or not the excel file has a sheet with sheetname (logical)
hassheet = false(size(filelist)); %output
excel = actxserver('Excel.Application'); %start microsoft excel
cleanupobj = onCleanup(@() excel.Quit); %close excel when function returns or error occurs
for fileidx = 1:numel(filelist)
%open without updating link and as read only. read only ensure the file can be opened even if it's already in use:
workbook = excel.Workbooks.Open(filelist{fileidx}, false, true);
%get the list of worksheet name by iterating over the Sheets collection:
sheetnames = arrayfun(@(i) workbook.Sheets.Item(i).Name, 1:workbook.Sheets.Count, 'UniformOutput', false);
workbook.Close(false); %close workbook, without saving
if ismember(sheetname, sheetnames)
hassheet(fileidx) = true;
end
end
end
Note that you should really learn what a cell array is. Yes, your filelist and filepath are cell arrays. Cell arrays are containers like matrices except that matrices can only contain numbers while cell arrays can contain anything. Therefore, if you want to put a bunch of strings (char arrays) together, you put them in a cell array.

2 Comments

Following on from Image Analyst's point that new versions of xlsfinfo keep a persistent excel instance. The above code could be simplified to:
function hassheet = CheckExcelFilesForWorksheet(filelist, sheetname)
%HASSHEET Check whether or not the given excel files have a sheet with the given name
%filelist: full paths of excel files (cell array of 1D char arrays / string array)
%sheetname: name of sheet to find in excel files (1D char array / string)
%hassheet: array the same size as filelist, indicating whether or not the excel file has a sheet with sheetname (logical)
hassheet = false(size(filelist)); %output
for fileidx = 1:numel(filelist)
[~, sheetnames] = xlsfinfo(filelist{fileidx});
if ismember(sheetname, sheetnames)
hassheet(fileidx) = true;
end
end
end
chlor thanks
chlor thanks on 19 Jul 2016
Edited: chlor thanks on 19 Jul 2016
Thank you Guillaume, I believe this will work perfectly except that I keep on getting some error saying that the object invoked has disconnected from its clients.
Alternatively I am trying xlsfinfo() now, it turns out this works very slow and not nearly as good as yours... Hope that I can figure out how to fix the disconnection error soon. After all I appreciate all your help very much!! Thank you!

Sign in to comment.

More Answers (2)

Walter Roberson
Walter Roberson on 12 Jul 2016
You construct a variable named filepath as a cell array, but what you pass as the first parameter in your call is SubFolder which we do not know the contents of.

8 Comments

@chlor, It would have been better to continue the conversion in that question.
1. You've removed the documentation (the comments) that I wrote for the function. It clearly stated what the purpose of each input was. In particular:
%folder: folder where the excel files are located (1D char array / string)
%filelist: names of excel files (cell array of 1D char arrays / string array)
2. Then you went on to change the code for opening the excel file. My original code constructed a full path for the file using the folder name and one of the file name from the filelist.
Your new code doesn't make much sense. You're iterating over the number of elements in filelist, yet never use filelist anywhere and use that iterator to extract elements in a cell array filepath instead. Is there a guarantee that the two cell arrays have the same size? Probably not. Is filepath supposed to be a cell array in the first place? probably not.
What was the purpose of the code change? If you're already passing the full path of the files, then change the signature of the function to:
function hassheet = CheckExcelFilesForWorksheet(filepaths, sheetname)
%HASSHEET Check whether or not the given excel files have a sheet with the given name
%filepaths: full path of excel files (cell array of 1D char arrays / string array)
%sheetname: name of sheet to find in excel files (1D char array / string)
%hassheet: array the same size as filelist, indicating whether or not the excel file has a sheet with sheetname (logical)
and change the loop to:
for fileidx = 1:numel(filepaths)
%open without updating link and as read only. read only ensure the file can be opened even if it's already in use:
workbook = excel.Workbooks.Open(filepaths{fileidx}), false, true);
Note that comments in code are a good thing. Also note that the comments immediately after the function signature show when you type help hasssheet or doc hassheet so are even more essential.
Guillaume, I apologize for not mentioning the old thread, I really appreciated the fact that you had a back and forth conversation with me with some guidance in my older thread, and I felt bad kept on asking your the simplest questions, especially after I already accepted your answer. I tried to figure out the bug on my own but have not able to, thus I opened a new topic for my issue, which is mainly for why my code did not work after I did some change based on your code. I know it has been my own bug as you stated in your previous answers in the older thread. To answer your question, I changed the code because I thought the purpose of "folder" as function input is to work together with "filelist" to get the full path of each excel file, which I have already managed to obtain and put in a pre-existing structure array called "data" and I tried to call it by filepath = {data.AllExcelPath}. Thank you for continuing with my question, I really appreciate it!
Walter, thank you for commenting, it is one of the bugs, I fixed the input and now my problem narrow down to
Error using hassheet = false(size(filelist));
Not enough input arguments.
Guillaume, so I run your code from the older thread without any modification, and it gives me the following error...
>> CheckExcelFilesForWorksheet Undefined function or variable 's'.
Error in CheckExcelFilesForWorksheet (line 35)
hassheet = false(size(s));
Then I changed the line hassheet = false(size(s)); to hassheet = false(size(filelist)); but it generates error again:
Error using CheckForWorksheet (line 35)
Not enough input arguments.
You were correct that it should have been size(filelist). I've edited the post to fix the error.
If you get a 'not enough input argument' errors that is probably because you haven't supplied enough arguments when you called the function hassheet.
Can you clarify what exactly are the inputs that you'd like to pass to the function, what they contain and what type (e.g. cell arrays) they are.
Note that for testing if a sheet belongs to an excel file, the code is simply:
%inputs are fullpathoffile and sheetname, both strings
%output is hassheet, logical
excel = actxserver('Excel.Application');
workbook = excel.Workbooks.Open(fullpathoffile, false, true);
sheetnames = arrayfun(@(i) workbook.Sheets.Item(i).Name, 1:workbook.Sheets.Count, 'UniformOutput', false);
hassheet = ismember(sheetname, sheetnames);
excel.Quit;
Because the above starts and close excel each time, it is better to wrap the check in a loop as I did with the function and only start and close excel once.
chlor thanks
chlor thanks on 12 Jul 2016
Edited: chlor thanks on 12 Jul 2016
Please correct me if I am wrong, I thought we can run a function without input argument to check whether it is functionable? so in the previous comment I run your original code without any input argument while it generated the "not enough input argument" error... (note that the matlab version I am using is 2011b, I don't know if that has anything to do with the error)
And thank you for the detailed explanation on how to test if a sheet belongs to an excel file, I will try that out also :)
In general, you cannot call a function without input arguments just to test it, regardless of the version of matlab. If you try to call the built-in sin function without an input, you're going to get the same error.
Some functions may work without any input (e.g. spy) but they're in the minority and their behaviour is most likely different from when inputs are supplied.
I must have been confused with when I was able to run a function from fileexchange to add it to path... so to answer your previous question: from my very limited understanding of the different inputs that I have, "data" is a structure array. "filepath" is a cell(? or char? I don't really know the different between the two..) array that contains all the full excel files' path, which I generated from
filepath = {data.AllExcelPath};
"filelist" is another cell(?) array that contains all the excel file names, which I generated from
filelist = {data.AllExcel};
For a better explanation than what I can do, I attached a picture of what they look like in my workspace, thanks!

Sign in to comment.

Image Analyst
Image Analyst on 12 Jul 2016
Rather than using your own CheckForWorksheet() function, you might use the built in function for doing that. From the help:
xlsfinfo
Determine if file contains Microsoft Excel spreadsheet
[status,sheets] = xlsfinfo(filename) additionally returns the name of each spreadsheet in the file.

6 Comments

He! I wasn't aware of xlsfinfo. Yes, it can be used as a replacement for part of the code I've written. Internally, it is more or less the same code anyway (if you have excel installed).
However, i find Mathwork's description, "Determine if file contains Microsoft Excel spreadsheet", misleading. It simply returns all the spreadsheet, it does not check if the file contains a particular spreadsheet. You still need an ismember call for that.
Guillaume
Guillaume on 13 Jul 2016
Edited: Guillaume on 13 Jul 2016
Actually, since xlsfinfo starts and stops excel for each call and since chlor thanks wants to check a lot of files, my hand written function is still probably better as it only starts and stops excel once.
chlor thanks
chlor thanks on 13 Jul 2016
Edited: chlor thanks on 19 Jul 2016
Thank you for the tips from both of you!! Now I learn something new again :D
As of R2015b, Excel doesn't launch and shutdown each time, at least not with xlsread() and xlswrite() so I imagine it would be the same with xlsfinfo().
chlor thanks
chlor thanks on 14 Jul 2016
Edited: chlor thanks on 14 Jul 2016
My excel is 2011, so it should not have the same problem? But you are right, if this code is send to a computer that has updated excel, it will have some trouble.
It's the version of MATLAB you have that matters, not the version of Excel.

Sign in to comment.

Asked:

on 11 Jul 2016

Edited:

on 19 Jul 2016

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!