Matrix index is out of range for deletion

Question

oliver on 10 Apr 2023


https://in.mathworks.com/matlabcentral/answers/1944759-matrix-index-is-out-of-range-for-deletion

Commented: Walter Roberson on 10 Apr 2023

Accepted Answer: Walter Roberson

IMBD_reviews_smol.csv


My project is sentiment analysis, and I am trying to follow the tutorial "Create Simple Text Model for Classification".

My dataset is a list of reviews, each labelled with a sentiment (either 'positive' or 'negative').

I am trying to remove any documents containing no words from the bag-of-words model, and to remove the corresponding entries in the labels.

My code is:

filename = "IMBD_reviews_smol.csv"; 
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
 
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
YTrain = dataTrain.sentiment;
YTest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
Ytrain(idx) = []; %produces an error 
Deletion requires an existing variable.
Xtrain = bag.Counts;
mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end
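
A note on the error shown in the code above: MATLAB variable names are case-sensitive, so Ytrain and YTrain are different variables. The labels were stored in YTrain, and Ytrain was never created, which would explain the message "Deletion requires an existing variable." The sketch below reproduces this with made-up labels; the data and the value of idx are hypothetical and are not taken from the CSV file.

```matlab
% Hypothetical labels standing in for dataTrain.sentiment.
YTrain = categorical(["positive"; "negative"; "positive"]);
idx = 2;   % hypothetical index of an empty document

try
    % Lowercase "t": this variable does not exist, so deletion fails.
    Ytrain(idx) = [];
catch err
    msg = err.message;
    disp(msg)
end

% Using the same capitalisation as the assignment works.
YTrain(idx) = [];
disp(numel(YTrain))
```

Using one spelling throughout (either YTrain or Ytrain) avoids the problem.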


oliver on 10 Apr 2023


With this code I receive the following error message at line 25, "mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");":

Error using classreg.learning.classif.FullClassificationModel.prepareData
No class names are found in input labels.

filename = "IMBD_reviews_smol.csv"; 
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
YTrain = dataTrain.sentiment;
YTest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
YTrain = [];
XTrain = bag.Counts;
mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");
documentsTest = preprocessText(textDataTest);
XTest = encode(bag,documentsTest);
YPred = predict(mdl,XTest);
acc = sum(YPred == YTest)/numel(YTest);
str = [
    "i hated this movie."
    "this was really good"
    "sometimes slow movies work out in the way you want and thats how this movie went"];
documentsNew = preprocessText(str);
XNew = encode(bag,documentsNew);
labelsNew = predict(mdl,XNew);
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end

Walter Roberson on 10 Apr 2023

Yes, as I indicated, you are removing all documents from the bag, so your training information becomes empty.


Answer 1

Walter Roberson on 10 Apr 2023


https://in.mathworks.com/matlabcentral/answers/1944759-matrix-index-is-out-of-range-for-deletion#answer_1213124

Moved: Walter Roberson on 10 Apr 2023


filename = "IMBD_reviews_smol.csv"; 
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
 
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
Ytrain = dataTrain.sentiment;
Ytest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
whos Ytrain idx
  Name          Size             Bytes  Class          Attributes

  Ytrain      181x1                423  categorical              
  idx           1x181             1448  double                   
Ytrain(idx) = []; %produces an error 
Xtrain = bag.Counts;
whos
  Name                 Size              Bytes  Class                Attributes

  Xtrain               0x0                  24  double               sparse    
  Ytest               20x1                 262  categorical                    
  Ytrain               0x1                 242  categorical                    
  ans                  1x46                 92  char                           
  bag                  1x1                 640  bagOfWords                     
  cmdout               1x33                 66  char                           
  cvp                  1x1                3278  cvpartition                    
  data               201x2              543470  table                          
  dataTest            20x2               66077  table                          
  dataTrain          181x2              478944  table                          
  documents          181x1               43321  tokenizedDocument              
  filename             1x1                 178  string                         
  idx                  1x181              1448  double                         
  textDataTest        20x1               64602  string                         
  textDataTrain      181x1              477308  string                         
mdl = fitcecoc(Xtrain, Ytrain, "Learners", "linear");
Error using classreg.learning.classif.FullClassificationModel.prepareData
No class names are found in input labels.

Error in ClassificationECOC.prepareData (line 128)
                classreg.learning.classif.FullClassificationModel.prepareData(X,Y,varargin{:});

Error in classreg.learning.FitTemplate/fit (line 246)
                    this.PrepareData(X,Y,this.BaseFitObjectArgs{:});

Error in ClassificationECOC.fit (line 119)
            this = fit(temp,X,Y);

Error in fitcecoc (line 357)
    obj = ClassificationECOC.fit(X,Y,varargin{:});
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end

You are removing all of the documents. The bag is left empty.
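
The whos output above shows idx with one entry per training document (1x181 for 181 documents), so the deletion removes every label. A minimal sketch of that effect, using made-up labels and a made-up idx rather than the real data, with a simple check that could be placed before the call to fitcecoc:

```matlab
% Hypothetical stand-in data: five labels, and an idx that lists every row,
% mirroring the case where every document was reported as empty.
Ytrain = categorical(["positive"; "negative"; "positive"; "negative"; "positive"]);
idx = 1:5;

numBefore = numel(Ytrain);
Ytrain(idx) = [];              % deletes every label
numAfter = numel(Ytrain);

fprintf('%d of %d documents were empty\n', numel(idx), numBefore);
if numAfter == 0
    warning('No training labels remain; check the preprocessing steps.');
end
```

If the check fires, the cause is upstream of the deletion, in the preprocessing or in removeInfrequentWords.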


oliver on 10 Apr 2023

Edited: Walter Roberson on 10 Apr 2023

I am trying to follow this MATLAB tutorial, https://uk.mathworks.com/help/textanalytics/ug/create-simple-text-model-for-classification.html, but using my own dataset. Can you help with what I need to change?

Walter Roberson on 10 Apr 2023


You were calling removeShortWords twice, so all words of 15 characters or fewer were being removed; the second call should have been removeLongWords, as in the code below. The few remaining "words" all happened to be unique, so removing infrequent words resulted in an empty bag.
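
To see the difference between the two pairs of calls without the toolbox, here is a rough sketch that emulates the length filters with strlength on a made-up word list. It assumes that removeShortWords(documents,len) drops words with len or fewer characters and that removeLongWords(documents,len) drops words with len or more characters; the word list and the variable names keptBuggy and keptFixed are hypothetical.

```matlab
% Hypothetical tokens, for illustration only.
words = ["a" "bad" "movie" "wonderful" "cinematographically" "characterization"];
len = strlength(words);

% Emulates removeShortWords(...,2) followed by removeShortWords(...,15):
% both calls drop words at or below the given length.
keptBuggy = words(len > 2 & len > 15);

% Emulates removeShortWords(...,2) followed by removeLongWords(...,15):
% drop words of 2 or fewer characters, then words of 15 or more.
keptFixed = words(len > 2 & len < 15);

disp(keptBuggy)
disp(keptFixed)
```

With the two removeShortWords calls only the unusually long tokens survive, which is why so little was left for removeInfrequentWords to keep.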

filename = "IMBD_reviews_smol.csv";
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);

textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
Ytrain = dataTrain.sentiment;
Ytest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
Ytrain(idx) = [];
Xtrain = bag.Counts;
mdl = fitcecoc(Xtrain, Ytrain, "Learners", "linear");
mdl

mdl =
  CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: [negative    positive]
    ScoreTransform: 'none'
    BinaryLearners: {[1×1 ClassificationLinear]}
      CodingMatrix: [2×1 double]

  Properties, Methods

function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeLongWords(documents,15);
end
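
As an optional safeguard, the predictors and labels can be checked for agreement before training. The sketch below does not use the toolbox: it builds a small made-up count matrix as a stand-in for bag.Counts, finds the all-zero rows by hand (roughly what the idx output of removeEmptyDocuments represents), and removes the same rows from both arrays. All data values here are hypothetical.

```matlab
% Hypothetical stand-ins for bag.Counts (documents x vocabulary) and labels.
Xtrain = sparse([1 0 2; 0 0 0; 0 3 1; 0 0 0]);
Ytrain = categorical(["positive"; "negative"; "negative"; "positive"]);

% Rows with no counts correspond to empty documents.
idx = find(sum(Xtrain, 2) == 0)';   % row vector of empty-document indices

% Remove the same rows from both the predictors and the labels.
Xtrain(idx, :) = [];
Ytrain(idx) = [];

% Checks worth running before calling a fitting function.
assert(size(Xtrain, 1) == numel(Ytrain), 'Predictors and labels are out of sync.');
assert(~isempty(Ytrain), 'No training documents remain.');
```

In the real workflow removeEmptyDocuments already returns a bag without the empty rows, so only the label deletion and the two checks would be needed.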
