containsNgrams

Check if n-gram is member of documents

Since R2022a

Syntax

tf = containsNgrams(documents,ngrams)

tf = containsNgrams(documents,ngrams,IgnoreCase=flag)

Description

tf = containsNgrams(documents,ngrams) returns a logical array with one element per document. An element is 1 (true) where the corresponding document contains any of the n-grams in ngrams, and 0 (false) otherwise.

tf = containsNgrams(documents,ngrams,IgnoreCase=flag) also specifies whether to ignore letter case when checking n-grams.

Examples

Check if N-Gram Is Member of Document

Create an array of tokenized documents.

documents = tokenizedDocument([
    "an example of a short sentence" 
    "a second short sentence"]);

Check for documents containing the n-gram ["a" "short"].

tf = containsNgrams(documents,["a" "short"])

tf = 2×1 logical array

   1
   0

Input Arguments

`documents` — Input documents
`tokenizedDocument` array

Input documents, specified as a tokenizedDocument array.

`ngrams` — N-grams to check
string array | character vector | cell array of character vectors | `pattern` array

N-grams to check, specified as one of these values:

String array
Character vector
Cell array of character vectors
pattern array

If ngrams is a string array, cell array, or pattern array, then it has size numNgrams-by-maxN, where numNgrams is the number of n-grams and maxN is the number of words in the longest n-gram. If ngrams is a character vector, then it represents a single word (unigram).

The value of ngrams(i,j) corresponds to the jth word of the ith n-gram. If the number of words in the ith n-gram is less than maxN, then the remaining entries of the ith row of ngrams must be empty.

If ngrams contains multiple n-grams or patterns, then the function returns 1 where any of the n-grams appear in the corresponding document.

Example: ["An" ""; "An" "example"; "example" ""]

Data Types: string | char | cell
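
As an illustration of the numNgrams-by-maxN layout described above, the following sketch checks for a bigram and a unigram in one call. The third document and the choice of n-grams are made up for this example. The unigram row is padded with an empty string so that both rows have maxN = 2 columns.

```matlab
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "no match here"]);

% Each row is one n-gram. Row 1 is the bigram "a short".
% Row 2 is the unigram "second", padded with "" to width maxN = 2.
ngrams = [
    "a"      "short"
    "second" ""];

% An element is true where the document contains ANY of the n-grams.
tf = containsNgrams(documents,ngrams)
```

Under the behavior described above, the first document matches the bigram, the second matches the unigram, and the third matches neither, so tf is expected to be [1; 1; 0].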

`flag` — Option to ignore case
`0` (`false`) (default) | `1` (`true`)

Option to ignore case, specified as one of these values:

0 (false) – Treat candidate matches that differ only by letter case as nonmatching.
1 (true) – Treat candidate matches that differ only by letter case as matching.
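
The effect of the two settings can be sketched by running the same query both ways. The capitalized documents and the variable names tfDefault and tfIgnore are assumptions made for this illustration. The sketch relies on tokenizedDocument preserving letter case, which is its default behavior.

```matlab
% Documents that start with capitalized words.
documents = tokenizedDocument([
    "An example of a short sentence"
    "A second short sentence"]);

% Default (IgnoreCase=false): "an" does not match the token "An".
tfDefault = containsNgrams(documents,["an" "example"])

% IgnoreCase=true: "an example" matches "An example" in the first document.
tfIgnore = containsNgrams(documents,["an" "example"],IgnoreCase=true)
```

With the default setting neither document is expected to match. With IgnoreCase=true the first document is expected to match and the second is not.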

Version History

Introduced in R2022a
