Remove words with low counts from bag-of-words model
Remove Infrequent Words
Remove the words that appear two times or fewer from a bag-of-words model.
Create a bag-of-words model from an array of tokenized documents.
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "another example"
    "a short example"]);
bag = bagOfWords(documents)

bag =
  bagOfWords with properties:

          Counts: [4x8 double]
      Vocabulary: ["an" "example" "of" "a" "short" ... ]
        NumWords: 8
    NumDocuments: 4
Remove the words that appear two times or fewer from the bag-of-words model.
count = 2;
newBag = removeInfrequentWords(bag,count)

newBag =
  bagOfWords with properties:

          Counts: [4x3 double]
      Vocabulary: ["example" "a" "short"]
        NumWords: 3
    NumDocuments: 4
bag — Input bag-of-words model
Input bag-of-words model, specified as a bagOfWords object.
count — Count threshold to remove words
Count threshold to remove words, specified as a positive integer. The function removes the words that appear count times or fewer in total, summed over all documents.
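The threshold rule can be illustrated with a minimal sketch in base MATLAB. This is not the toolbox's implementation: the counts matrix and the variable names (vocabulary, counts, threshold, keep) are assumptions written out by hand for the four example documents, with one row per document and one column per word.

```matlab
% Hand-built word counts for the four example documents (illustrative only).
% Rows are documents, columns follow the order of the vocabulary below.
vocabulary = ["an" "example" "of" "a" "short" "sentence" "second" "another"];
counts = [1 1 1 1 1 1 0 0    % "an example of a short sentence"
          0 0 0 1 1 1 1 0    % "a second short sentence"
          0 1 0 0 0 0 0 1    % "another example"
          0 1 0 1 1 0 0 0];  % "a short example"

threshold = 2;                    % plays the role of the count argument

% Total number of occurrences of each word across all documents.
totalCounts = sum(counts,1);

% Keep only the words that appear more than threshold times in total.
keep = totalCounts > threshold;
keptVocabulary = vocabulary(keep);
keptCounts = counts(:,keep);

disp(keptVocabulary)
```

Here "sentence" appears twice in total, so a threshold of 2 removes it, while "example", "a", and "short" each appear three times and are kept. The comparison is on the total across documents, not on the count within any single document.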
Introduced in R2017b