
extractBetween

Extract substrings between start and end points

Description

newStr = extractBetween(str,startPat,endPat) extracts the substring from str that occurs between the substrings startPat and endPat. The extracted substring does not include startPat and endPat.

newStr is a string array if str is a string array. Otherwise, newStr is a cell array of character vectors.

If str is a string array or a cell array of character vectors, then extractBetween extracts substrings from each element of str.
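The output type follows the input type. The sketch below uses made-up input text to contrast a string scalar with a character vector. It relies only on the behavior described above.

```matlab
% String input (hypothetical text): the result is a string array, here a string scalar
s = extractBetween("key=[value]","[","]")

% Character vector input: the result is a cell array of character vectors
c = extractBetween('key=[value]','[',']')

% Confirm the classes of the two outputs
class(s)   % 'string'
class(c)   % 'cell'
```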


newStr = extractBetween(str,startPos,endPos) extracts the substring from str that occurs between the positions startPos and endPos, including the characters at those positions. extractBetween returns the substring as newStr.
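Because the characters at both positions are included, the extracted text is endPos - startPos + 1 characters long. The following sketch uses an arbitrary example string to check this.

```matlab
str = "abcdefgh";   % hypothetical input text
startPos = 3;
endPos = 5;

% Both positions are included, so the result has endPos - startPos + 1 characters
sub = extractBetween(str,startPos,endPos)   % "cde"
strlength(sub)                              % 3
```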


newStr = extractBetween(___,'Boundaries',bounds) forces the starts and ends specified in any of the previous syntaxes to be either inclusive or exclusive. They are inclusive when bounds is 'inclusive', and exclusive when bounds is 'exclusive'. For example, extractBetween(str,startPat,endPat,'Boundaries','inclusive') returns startPat, endPat, and all the text between them as newStr.
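The following sketch applies 'Boundaries' to both the text syntax and the position syntax, using a made-up string. Text boundaries are exclusive unless you request 'inclusive'. Position boundaries are inclusive unless you request 'exclusive'.

```matlab
str = "total=[42]";   % hypothetical input text

% Text boundaries: excluded by default, kept with 'inclusive'
a = extractBetween(str,"[","]")                          % "42"
b = extractBetween(str,"[","]",'Boundaries','inclusive') % "[42]"

% Position boundaries (characters 7 and 10 are "[" and "]"): included by default, dropped with 'exclusive'
c = extractBetween(str,7,10)                             % "[42]"
d = extractBetween(str,7,10,'Boundaries','exclusive')    % "42"
```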


Examples


Create string arrays and select text that occurs between substrings.

str = "The quick brown fox"
str = 
"The quick brown fox"

Select the text that occurs between the substrings "quick " and " fox". The extractBetween function selects the text but does not include "quick " or " fox" in the output.

newStr = extractBetween(str,"quick "," fox")
newStr = 
"brown"

Select substrings from each element of a string array. When you specify different substrings as start and end indicators, they must be contained in a string array or a cell array that is the same size as str.

str = ["The quick brown fox jumps";"over the lazy dog"]
str = 2x1 string
    "The quick brown fox jumps"
    "over the lazy dog"

newStr = extractBetween(str,["quick ";"the "],[" fox";" dog"])
newStr = 2x1 string
    "brown"
    "lazy"

Since R2020b

Create a string array of text enclosed by tags.

str = ["<courseName>Calculus I</courseName>";
       "<semester>Fall 2020</semester>";
       "<schedule>MWF 8:00-8:50</schedule>"]
str = 3x1 string
    "<courseName>Calculus I</courseName>"
    "<semester>Fall 2020</semester>"
    "<schedule>MWF 8:00-8:50</schedule>"

Extract the text enclosed by tags. First create patterns that match any start tag and end tag by using the wildcardPattern function.

startPat = "<" + wildcardPattern + ">"
startPat = pattern
  Matching:

    "<" + wildcardPattern + ">"

endPat = "</" + wildcardPattern + ">"
endPat = pattern
  Matching:

    "</" + wildcardPattern + ">"

Then call the extractBetween function.

newStr = extractBetween(str,startPat,endPat)
newStr = 3x1 string
    "Calculus I"
    "Fall 2020"
    "MWF 8:00-8:50"

For a list of functions that create pattern objects, see pattern.
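You can build other patterns the same way. The sketch below uses lettersPattern, which matches one or more letters, to mark the start of a value that follows any name and an equals sign. The input strings are hypothetical.

```matlab
str = ["width=10;"; "height=250;"];   % hypothetical input text

% Start boundary: one or more letters followed by "="
startPat = lettersPattern + "=";

% Extract the value between the name and the terminating semicolon
values = extractBetween(str,startPat,";")
```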

Create string arrays and select substrings between start and end positions that are specified as numbers.

str = "Edgar Allen Poe"
str = 
"Edgar Allen Poe"

Select the middle name. Specify the seventh and 11th positions in the string.

newStr = extractBetween(str,7,11)
newStr = 
"Allen"

Select substrings from each element of a string array. When you specify different start and end positions with numeric arrays, they must be the same size as the input string array.

str = ["Edgar Allen Poe";"Louisa May Alcott"]
str = 2x1 string
    "Edgar Allen Poe"
    "Louisa May Alcott"

newStr = extractBetween(str,[7;8],[11;10])
newStr = 2x1 string
    "Allen"
    "May"

Select text from string arrays with boundaries that are forced to be inclusive or exclusive. extractBetween includes the boundaries with the selected text when the boundaries are inclusive. extractBetween does not include the boundaries with the selected text when the boundaries are exclusive.

str1 = "small|medium|large"
str1 = 
"small|medium|large"

Select the text between the sixth and 13th positions, but do not include the characters at those positions.

newStr = extractBetween(str1,6,13,'Boundaries','exclusive')
newStr = 
"medium"

Select the text between two substrings, including the substrings themselves.

str2 = "The quick brown fox jumps over the lazy dog"
str2 = 
"The quick brown fox jumps over the lazy dog"
newStr = extractBetween(str2," brown","jumps",'Boundaries','inclusive')
newStr = 
" brown fox jumps"

Create a character vector and select text between start and end positions.

chr = 'mushrooms, peppers, and onions'
chr = 
'mushrooms, peppers, and onions'
newChr = extractBetween(chr,12,18)
newChr = 1x1 cell array
    {'peppers'}

Select text between substrings.

newChr = extractBetween(chr,'mushrooms, ',', and')
newChr = 1x1 cell array
    {'peppers'}

Input Arguments


str

Input text, specified as a string array, character vector, or cell array of character vectors.

startPat

Text or pattern that marks the start position of the text to extract, specified as one of the following:

  • String array

  • Character vector

  • Cell array of character vectors

  • pattern array (since R2020b)

If str is a string array or cell array of character vectors, then you can extract substrings from every element of str. You can specify that the substrings either all have the same start or have different starts in each element of str.

  • To specify the same start, specify startPat as a character vector, string scalar, or pattern object.

  • To specify different starts, specify startPat as a string array, cell array of character vectors, or pattern array.

Example: extractBetween(str,"AB","YZ") extracts the substrings between AB and YZ in each element of str.

Example: If str is a 2-by-1 string array, then extractBetween(str,["AB";"FG"],["YZ";"ST"]) extracts the substrings between AB and YZ in str(1), and between FG and ST in str(2).
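You can also mix the two forms. The following sketch passes one start marker shared by every element together with a different end marker for each element. The input strings are hypothetical.

```matlab
str = ["a=1;"; "b=22,"];   % hypothetical input text

% Same start ("=") for every element, but a different end for each element
out = extractBetween(str,"=",[";";","])
```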

endPat

Text or pattern that marks the end position of the text to extract, specified as one of the following:

  • String array

  • Character vector

  • Cell array of character vectors

  • pattern array (since R2020b)

If str is a string array or cell array of character vectors, then you can extract substrings from every element of str. You can specify that the substrings either all have the same end or have different ends in each element of str.

  • To specify the same end, specify endPat as a character vector, string scalar, or pattern object.

  • To specify different ends, specify endPat as a string array, cell array of character vectors, or pattern array.

Example: extractBetween(str,"AB","YZ") extracts the substrings between AB and YZ in each element of str.

Example: If str is a 2-by-1 string array, then extractBetween(str,["AB";"FG"],["YZ";"ST"]) extracts the substrings between AB and YZ in str(1), and between FG and ST in str(2).

startPos

Start position, specified as a numeric array.

If str is an array with multiple pieces of text, then startPos can be a numeric scalar or numeric array of the same size as str.

Example: extractBetween(str,5,9) extracts the substrings from the fifth through the ninth positions in each element of str.

Example: If str is a 2-by-1 string array, then extractBetween(str,[5;10],[9;21]) extracts the substring from the fifth through the ninth positions in str(1), and from the 10th through the 21st positions in str(2).
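Because a scalar start position applies to every element, you can combine it with a per-element array of end positions. The sketch below reuses the names from the earlier example to extract each first name.

```matlab
str = ["Edgar Allen Poe"; "Louisa May Alcott"];

% Same start position (1) for every element, different end positions
first = extractBetween(str,1,[5;6])
```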

Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

endPos

End position, specified as a numeric array.

If str is an array with multiple pieces of text, then endPos can be a numeric scalar or numeric array of the same size as str.

Example: extractBetween(str,5,9) extracts the substrings from the fifth through the ninth positions in each element of str.

Example: If str is a 2-by-1 string array, then extractBetween(str,[5;10],[9;21]) extracts the substrings from the fifth through the ninth positions in str(1), and from the 10th through the 21st positions in str(2).

Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
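Positions do not have to be double. The following sketch, using hypothetical input strings, passes int32 positions: a per-element start array and a shared scalar end.

```matlab
str = ["abcdef"; "uvwxyz"];   % hypothetical input text

% Integer-typed positions: start at 1 and 3, end at 4 in both elements
out = extractBetween(str,int32([1;3]),int32(4))
```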

bounds

Boundary behavior, specified as 'inclusive' or 'exclusive'. When the boundary behavior is inclusive, the start and end specified by the previous arguments are included in the extracted text. When it is exclusive, they are not included. By default, starts and ends specified as text or patterns are exclusive, and starts and ends specified as positions are inclusive.

Output Arguments


Output text, returned as a string array or cell array of character vectors.

Extended Capabilities

Version History

Introduced in R2016b