Unicode compatibility composed normalized form (NFKC)
Normalize String to Unicode Compatibility Canonical Composition Form
Strings that look identical can have different underlying representations. The Unicode compatibility canonical composition form (NFKC) ensures that equivalent strings have a unique binary representation.
Consider the string "eﬃcient", where the character "ﬃ" is represented by the code unit "\xFB03". The string has length 7.
str = compose("e\xFB03") + "cient"
str = "eﬃcient"
ans = 7
Normalize the string using the textanalytics.unicode.nfkc function.
newStr = textanalytics.unicode.nfkc(str)
newStr = "efficient"
View the length of the normalized string. The normalized representation is two code units longer because the function replaces the single "ﬃ" character with the three-character string "ffi".
ans = 9
Extract the second to fourth code units of the normalized string.
ans = "ffi"
Check whether the strings str and newStr are equal using the == operator. The operator returns 0 because the strings have different underlying representations.
tf = str == newStr
tf = logical 0
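The steps above can be collected into a single script, sketched below. The listing above shows only the results of the length and extraction steps, so the use of strlength and extractBetween here is an assumption about the calls behind those ans values, and the variable names lenBefore, lenAfter, and middle are illustrative.

```matlab
% Build the example string; \xFB03 is the single-character "ffi" ligature.
str = compose("e\xFB03") + "cient";

% Length before normalization (assumed call: strlength).
lenBefore = strlength(str);               % 7

% Normalize to NFKC.
newStr = textanalytics.unicode.nfkc(str);

% Length after normalization: the ligature expands to three characters.
lenAfter = strlength(newStr);             % 9

% Second to fourth code units (assumed call: extractBetween).
middle = extractBetween(newStr,2,4);      % "ffi"

% The original and normalized strings differ in representation.
tf = str == newStr;                       % logical 0
```

The tail "cient" is concatenated separately, presumably so that its leading "c" is not parsed as another hexadecimal digit of the \x escape.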
str — Input text
string array | character vector | cell array of character vectors
Input text, specified as a string array, character vector, or cell array of character vectors.
["An example of a short sentence."; "A second short
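As a rough sketch of passing a string array, the following normalizes a two-element array whose elements look alike but differ in representation. The variable names are illustrative, and the sketch assumes the function returns one normalized element per input element.

```matlab
% Two strings that render alike: one uses the ligature U+FB03,
% the other uses the plain letters "ffi".
ligature = compose("e\xFB03") + "cient";
plain = "efficient";
strs = [ligature; plain];

% Before normalization, the two elements compare as different.
sameBefore = strs(1) == strs(2);

% Assumption: nfkc returns one normalized element per input element.
normalized = textanalytics.unicode.nfkc(strs);

% After normalization, the two elements are expected to compare as equal.
sameAfter = normalized(1) == normalized(2);
```

Normalizing both sides before comparing is the usual way to make visually identical text match, which is the motivation given at the start of this page.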
Unicode Normalization Forms
For more information about Unicode normalization forms, see Unicode Standard Annex #15 Unicode Normalization Forms.
 Whistler, Ken, ed. "Unicode Standard Annex #15: Unicode Normalization Forms." Unicode Technical Reports, August 27, 2021. https://unicode.org/reports/tr15/.
Introduced in R2022b