How do I remove unwanted strings from a table imported from Excel?

I read an Excel table that contains 5 columns of numeric values and 1 column of text. I want to remove certain text from the text column so that, if that text stands alone on a line, the entire row is deleted. Please see the attached file.
In the worksheet, I want to remove "<Thread>", "<ThreadID>289123_4_3</ThreadID>", "<InitPost>", etc.
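One way to pin down "stands alone on a line" is a single regular expression. The sketch below assumes a line counts as markup if it is either one lone tag (<Thread>, </InitPost>) or one element wrapped in a matching tag pair (<ThreadID>289123_4_3</ThreadID>). The names tagOnlyPattern and isTagOnly are hypothetical, and the rule would also match something like <Post>some sentence</Post>, so adjust the pattern if such rows must survive.

```matlab
% Hypothetical rule: a line is "markup only" if it is a lone opening/closing
% tag, or a single element wrapped in a matching tag pair (\1 reuses the
% captured tag name, so <ThreadID>...</ThreadID> matches but <A>...</B> does not).
tagOnlyPattern = '^\s*(?:</?\w+>|<(\w+)>.*</\1>)\s*$';

% Non-text cells (e.g. NaN from an empty Excel cell) are never markup.
isTagOnly = @(s) ischar(s) && ~isempty(regexp(s, tagOnlyPattern, 'once'));

isTagOnly('<Thread>')                          % true
isTagOnly('<ThreadID>289123_4_3</ThreadID>')   % true
isTagOnly('</InitPost>')                       % true
isTagOnly('I liked <b>this</b> one.')          % false: text outside the tags
```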

Answers (1)

Attach the workbook and your call to readtable(). If you don't have the latest MATLAB and have to use xlsread() instead, show us your code, or else upgrade, since this is a lot easier with readtable().

4 Comments

clear
[header, y, x] = xlsread('sent_training_sample.xlsx', 1, 'A2:F1806');
for i = 1:size(x,1)      % number of rows
    s1 = x(i,:);         % assign a complete row to s1
    c1 = s1(:,6);        % assign column 6 of s1 to c1
    e = ['<Thread>', '<ThreadID>', '</ThreadID>', '<InitPost>', '<UserID>', ...
         '</UserID>', '<Date>', '</Date>', '</InitPost>', '<Post>', '</Post>'];
    % e is the list of all tags that are to be removed from the text.
    b = regexp(c1, '(<)\w*(>)|(</)\w*(>)$', 'match') % regex to extract XML tags
    a = unique(ismember(b,e))
    if a == 0
        xlswrite('tag_free', s1, 'free');
    end
end
b =
{1x1 cell}
Error using cell/ismember>cellismemberlegacy (line 131) Input A of class cell and input B of class char must be cell arrays of strings, unless one is a string.
Error in cell/ismember (line 76) [varargout{1:nlhs}] = cellismemberlegacy(varargin{:});
The problem with the code is that the value of b returned by regexp cannot be passed to a=unique(ismember(b,e)); there seems to be a type mismatch. Please help!
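The {1x1 cell} printed for b points at the cause. Because s1(:,6) is indexed with parentheses, c1 is a 1x1 cell rather than the string itself, so regexp returns a cell whose only element is another cell of matches. And because e is built with square brackets, it is one long char row with all the tags glued together, not a cell array of strings. ismember cannot compare that nested cell with a char row, which is what the error reports. A minimal corrected sketch, assuming column 6 holds the text: braces for both c1 and e, a simplified tag pattern '</?\w+>' swapped in for the original expression, and any() instead of unique(...)==0. The original test also fails silently for rows with no tags at all, because ismember then returns an empty array and an if on an empty array is false. The sample rows below are hypothetical stand-ins for the workbook.

```matlab
% Hypothetical stand-in for x: five numbers plus the text in column 6.
x = {1, 2, 3, 4, 5, '<Thread>';
     1, 2, 3, 4, 5, 'I liked this movie.';
     1, 2, 3, 4, 5, '<ThreadID>289123_4_3</ThreadID>'};

% Braces build a cell array of strings; square brackets would concatenate
% every tag into one char row.
e = {'<Thread>', '<ThreadID>', '</ThreadID>', '<InitPost>', '<UserID>', ...
     '</UserID>', '<Date>', '</Date>', '</InitPost>', '<Post>', '</Post>'};

keep = false(size(x, 1), 1);
for i = 1:size(x, 1)
    c1 = x{i, 6};                         % braces: the string itself, not a 1x1 cell
    if ~ischar(c1)
        c1 = '';                          % empty Excel cells arrive as NaN
    end
    b = regexp(c1, '</?\w+>', 'match');   % every opening or closing tag in the text
    keep(i) = ~any(ismember(b, e));       % keep the row only if none of its tags is in e
end
cleanRows = x(keep, :);                   % rows whose text carries none of the tags in e
```

Collecting the rows first and writing them once avoids another issue in the loop: xlswrite('tag_free', s1, 'free') starts at cell A1 on every call, so each written row would overwrite the previous one.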
I don't know what cellismemberlegacy() is. Your output names look misleading, though: the first output of xlsread is not header material, the second is not y, and the third is not x. It's really
[numbersOnly, textOnly, everything] = xlsread(.....
You should probably use only the everything cell array, because it is the only output in which the row and column indices of the numbers and the text line up; in the first two outputs they don't. That's why it's so much better to use readtable. Do you have R2013b?
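Following the advice to work only from the third output, here is a sketch that filters the rows of that cell array directly, so each row's five numbers stay attached to its text. Assumptions: the xlsread call is copied from the question and left commented out so the block runs without the workbook; the stand-in raw array is hypothetical; and the "starts with <" rule is a deliberately simple heuristic that would also drop rows such as <Post>some text</Post>.

```matlab
% With the workbook present (works in R2012b):
% [~, ~, raw] = xlsread('sent_training_sample.xlsx', 1, 'A2:F1806');

% Hypothetical stand-in for the third output: one cell per spreadsheet cell.
raw = {0.1, 0.2, 0.3, 0.4, 0.5, '<Thread>';
       0.6, 0.7, 0.8, 0.9, 1.0, 'Great phone, terrible battery.';
       1.1, 1.2, 1.3, 1.4, 1.5, NaN};          % empty Excel cells come back as NaN

sentences = raw(:, 6);
sentences(~cellfun(@ischar, sentences)) = {''}; % make column 6 all text

% Assumed rule: a line that begins with a tag is markup.
isMarkup = cellfun(@(s) ~isempty(regexp(s, '^\s*<', 'once')), sentences);

raw(isMarkup, :) = [];              % delete whole rows, so numbers and text stay aligned
values = cell2mat(raw(:, 1:5));     % numeric block for the surviving rows

% xlswrite('tag_free.xlsx', raw, 'free');   % write the filtered rows once
```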
No, I don't have R2013b. I'm using R2012b.
Well, in general, to remove something, set that element to [].
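For a cell array like the everything output, the brackets you use with [] matter: parentheses over whole rows delete those rows, while braces only empty a cell's contents and leave the array the same size. The small array below is hypothetical.

```matlab
c = {'<Thread>', 1; 'keep me', 2; '<InitPost>', 3};   % hypothetical 3x2 cell array

emptied = c;
emptied{1, 1} = [];       % braces: cell (1,1) still exists, its contents are now []

deleted = c;
deleted([1 3], :) = [];   % parentheses over whole rows: rows 1 and 3 are removed
```

A logical mask works in place of [1 3]. On a 2-D array, deleting a single element such as c(1,1) = [] is an error; only whole rows or columns can be removed this way.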


Asked: 25 Feb 2014

Commented: 28 Feb 2014
