Text Analytics Toolbox supports the languages English, Japanese, German, and Korean. Most Text Analytics Toolbox functions also work with text in other languages. For more information, see Language Considerations.


tokenizedDocument - Array of tokenized documents for text analysis
removeStopWords - Remove stop words from documents
normalizeWords - Stem or lemmatize words
stopWords - List of stop words
mecabOptions - Options for MeCab tokenization (Since R2019b)
tokenDetails - Details of tokens in tokenized document array
addSentenceDetails - Add sentence numbers to documents
addPartOfSpeechDetails - Add part-of-speech tags to documents
addEntityDetails - Add entity tags to documents
addLemmaDetails - Add lemma forms of tokens to documents
addLanguageDetails - Add language identifiers to documents
corpusLanguage - Detect language of text
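
As a rough illustration of how several of these functions might be combined, the following minimal sketch tokenizes two strings, tags parts of speech, removes stop words, lemmatizes the remaining words, and lists the token details. The sample text and the variable names (str, documents, tdetails) are made up for this example, and the order of the steps is one reasonable choice rather than a required workflow.

```matlab
% Sample text, made up for illustration only.
str = [
    "The quick brown fox jumps over the lazy dog."
    "Text analysis starts with splitting text into tokens."];

% Split each string into tokens.
documents = tokenizedDocument(str);

% Add part-of-speech tags while the sentences are still intact,
% because tagging uses the surrounding words as context.
documents = addPartOfSpeechDetails(documents);

% Remove stop words such as "the" and "with".
documents = removeStopWords(documents);

% Reduce the remaining words to their lemma forms.
documents = normalizeWords(documents,'Style','lemma');

% Return a table with one row per token.
tdetails = tokenDetails(documents);
head(tdetails)
```

Tagging before removing stop words is a deliberate choice in this sketch: once words have been removed, the remaining tokens no longer form complete sentences, which can make the tags less reliable.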


