Reference Information

5 minute read Last updated on October 16, 2024

Regular Expressions

Regular Expressions (RegEx) are specially written strings that are used to search text. Regular Expressions have a formal syntax that defines how the search is performed. They provide flexibility where a simple string match would not suffice.

Supported Operators - The following table lists some of the common Regular Expression operators supported by Dispatcher Stratus. This is not a complete list of all operators.

Match Modifiers - These characters will affect how the search is performed.

Operator Description
\ General escape character
. (period) Match any character
* Match 0 or more of the previous character
+ Match 1 or more of the previous character
? Match 0 or 1 of the previous character
[] Defines a set of characters to match (e.g., [0-9] matches the digits 0 to 9; [a-z] matches the lowercase letters a to z; [AEIOU] matches the uppercase vowels)
[^] Defines a set of characters that will not match (e.g., [^0-9] matches any character that is not a digit from 0 to 9)
^ Match the start of a line
$ Match the end of a line
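
The operators above can be tried outside the product with a short script. The following is a minimal sketch that uses Python's re module, on the assumption that its behavior for these basic operators is close to that of Dispatcher Stratus; the helper name find_all and the sample strings are made up for illustration.

```python
import re

def find_all(pattern, text):
    """Return every match of pattern in text, treating ^ and $ as line anchors."""
    return re.findall(pattern, text, re.MULTILINE)

lines = "Test one\nTest two"

any_char = find_all(r"a.c", "abc a-c ac")        # . stands for any single character
zero_or_more = find_all(r"ab*", "a ab abbb")     # * allows zero or more b characters
one_or_more = find_all(r"ab+", "a ab abbb")      # + requires at least one b
optional = find_all(r"colou?r", "color colour")  # ? makes the u optional
digit_set = find_all(r"[0-9]", "a1b22")          # a set of characters to match
negated_set = find_all(r"[^0-9]", "a1b22")       # a set of characters to exclude
line_start = find_all(r"^Test", lines)           # ^ anchors to the start of each line
line_end = find_all(r"two$", lines)              # $ anchors to the end of a line

print(any_char, zero_or_more, one_or_more, optional)
print(digit_set, negated_set, line_start, line_end)
```

Comparing the results for ab* and ab+ shows the practical difference between "0 or more" and "1 or more".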

Character Classes

These characters will match against single characters, words or special characters:

Operator Description
\d Match any decimal digit (short form of [0-9])
\D Match any character that is not a decimal digit (short form of [^0-9])
\s Match any whitespace character (tab, newline, formfeed, carriage return, or space)
\S Match any character that is not a whitespace character
\w Match any “word” character (short form of [a-zA-Z0-9_])
\W Match any character that is not a “word” character (short form of [^a-zA-Z0-9_])
\b Match a word boundary
\B Match a position that is not a word boundary
\cx Match the control-x character, where x is any character (e.g., \cs matches the control-s character)
\e Match the escape character (hex 1B)
\f Match the formfeed character (hex 0C)
\n Match the newline character (hex 0A)
\r Match the carriage return character (hex 0D)
\t Match the tab character (hex 09)
\ddd Match the character with the octal code ddd
\xhh Match the character with the hex code hh
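
A few of the character classes above can be exercised with the sketch below. It uses Python's re module as a stand-in, which is an assumption about compatibility rather than a statement about the Dispatcher Stratus engine. Python's re module does not recognize \cx or \e, so those two operators are left out, and the sample text is invented.

```python
import re

# Invented sample text containing digits, an underscore, and a tab character
text = "Order 66 shipped\tto Dock_7 on 2024-10-16"

digits = re.findall(r"\d+", text)    # runs of decimal digits
words = re.findall(r"\w+", text)     # runs of letters, digits, and underscores
fields = re.split(r"\s+", text)      # split on any run of whitespace

# \b keeps "to" from matching inside "tomorrow" or "auto"
whole_word = re.findall(r"\bto\b", "to tomorrow auto to")

has_tab = re.search(r"\t", text) is not None         # \t matches the tab character
hex_a = re.fullmatch(r"\x41", "A") is not None       # hex code 41 is the letter A
octal_a = re.fullmatch(r"\101", "A") is not None     # octal code 101 is the letter A

print(digits)
print(words)
print(fields)
print(whole_word, has_tab, hex_a, octal_a)
```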

For more information on regular expressions, please visit: Regular Expressions Wiki

For a tutorial to learn more about how to use regular expressions, please visit: Regular Expressions Tutorial

The syntax supported by Dispatcher Stratus is defined on the Metadata Browsing page.

Using Special Characters In Regular Expressions

If you want to use the following special characters as a literal in a regular expression, you must use a backslash (\) to suppress their special meaning:

Operator Description
[ opening square bracket
\ backslash
^ caret
$ dollar sign
. period
| vertical bar (pipe)
? question mark
* asterisk
+ plus sign
( opening round bracket
) closing round bracket

For example, if you want to match 1+1=2, the correct regular expression is: 1\+1=2; otherwise the plus sign will have a special meaning.
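
The effect of the backslash in the 1+1=2 example can be checked with a short script. This is a sketch in Python, assuming its re module treats the plus sign the same way; re.escape is a Python convenience for escaping a literal string and is not a Dispatcher Stratus feature.

```python
import re

literal = "1+1=2"

# Without the backslash, + means "one or more of the preceding 1",
# so the pattern describes text such as "11=2" and misses the literal.
unescaped_hit = re.search(r"1+1=2", literal)
unescaped_other = re.search(r"1+1=2", "11=2")

# With the backslash, the plus sign is matched as an ordinary character.
escaped_hit = re.search(r"1\+1=2", literal)

# Python can build the escaped pattern from a literal string.
auto_pattern = re.escape(literal)
auto_hit = re.search(auto_pattern, literal)

print(unescaped_hit is None, unescaped_other is not None)
print(escaped_hit.group(0), auto_hit.group(0))
```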

The following examples of using regular expressions in Dispatcher Stratus assume that the following text file is being searched.

[Image: Content Search]

To search for the string “test”, enter “test” into the search text field. Depending on the match case option, the search results would be as follows:

[Image: Content Search]

To search for the whole word “test”, delimit it with the word boundary operator \b. A search using the string “\btest\b” returns the following results:

[Image: Content Search]

To locate multiple strings that contain a numeric value, use the “\d” operator. The search string “Test #\d” produces the following result:

[Image: Content Search]
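
The three searches described above can be reproduced with a short script. The sketch below uses Python's re module and an invented sample text; it is not the file used in this guide, and the re.IGNORECASE flag only approximates what the match case option might do.

```python
import re

# Invented sample text, used only to illustrate the three searches
sample = (
    "This is a test file.\n"
    "Test #1: testing the parser.\n"
    "Test #2: a contest entry.\n"
    "TEST #3: final test.\n"
)

# Plain string search for "test", without and with case matching
ignore_case = re.findall(r"test", sample, re.IGNORECASE)
match_case = re.findall(r"test", sample)

# Whole-word search: \b rejects "testing" and "contest"
whole_word = re.findall(r"\btest\b", sample)

# "Test #" followed by a single digit
numbered = re.findall(r"Test #\d", sample)

print(len(ignore_case), len(match_case))
print(whole_word)
print(numbered)
```

With case matching on, the plain string still matches inside “testing” and “contest”, which is why the word boundary operator is needed for a whole-word search.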

The Parser Node can search for file names using regular expressions. Unlike the other parser nodes, which operate on the contents of files, the Parser Node operates on file names.

For example, with the following list of files:

testfile1.txt

testfile2.xls

testfile3.docx

testfile4.doc

testfile5.psd

testfile6.pdf

testfile7.jpg

testfile8.tiff

Using a search string of “testfile\d” would match all of the files in the preceding list.
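
The file name match can be sketched as follows. This Python example assumes the pattern may match anywhere in the file name, which may differ from how the node actually applies it; the second pattern, which narrows the match to Word documents, is an added illustration and does not come from the product.

```python
import re

file_names = [
    "testfile1.txt", "testfile2.xls", "testfile3.docx", "testfile4.doc",
    "testfile5.psd", "testfile6.pdf", "testfile7.jpg", "testfile8.tiff",
]

# "testfile" followed by one digit matches every name in the list
pattern = re.compile(r"testfile\d")
matched = [name for name in file_names if pattern.search(name)]

# Illustrative narrower pattern: an escaped period, then "doc" with an optional "x"
word_pattern = re.compile(r"testfile\d\.docx?$")
word_files = [name for name in file_names if word_pattern.search(name)]

print(matched)
print(word_files)
```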

Specifying Page-Level Metadata

The following are examples of specifying page-level metadata.

  • The metadata for a specific page can be specified by adding a page number between two square brackets ([ ]). This will return the text of the first value for the subgroup found on the specified page. For example, the following would return the value of bar1:Address for page 5:
	{bar1:Address[5]}
  • To specify document-level metadata, add a 0 between two square brackets or omit the brackets entirely. For example:
	{bar1:Address[0]}

	{bar1:Address}
  • If you are processing in a page-per-page manner (e.g., applying a Bates stamp or annotation), you can use ‘current’ between two square brackets to retrieve the data from the page being processed. For example:
	{annotate:text[current]}
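
To make the three page selectors concrete, the sketch below reads a reference and reports which level it addresses. It is a hypothetical illustration in Python: the pattern REF and the function page_selector are invented names, and the code does not reflect how Dispatcher Stratus parses references internally.

```python
import re

# Hypothetical pattern: {name} or {name[selector]}, where selector is a page number or "current"
REF = re.compile(r"^\{(?P<name>[^\[\]{}|]+)(?:\[(?P<page>\d+|current)\])?\}$")

def page_selector(reference):
    """Return (name, level), where level is "document", "current", or a page number."""
    match = REF.match(reference)
    if match is None:
        raise ValueError(f"not a page-level reference: {reference!r}")
    name, page = match.group("name"), match.group("page")
    if page is None or page == "0":
        return name, "document"   # no brackets, or [0]
    if page == "current":
        return name, "current"    # the page being processed
    return name, int(page)        # a specific page number

for ref in ["{bar1:Address[5]}", "{bar1:Address[0]}",
            "{bar1:Address}", "{annotate:text[current]}"]:
    print(ref, "->", page_selector(ref))
```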

Specifying Metadata Occurrence Number

The following are examples of specifying a metadata occurrence number.

  • You can specify an occurrence number using another pair of square brackets following the page-level brackets. For example, the following would return the second occurrence of “value:bar” on page 3:
	{value:bar[3][2]}
  • In another example, the following would return the first document-level occurrence of “parser:Value”:
	{parser:Value[0][1]}
  • To specify the first value found, use []. For example, the following would return the first barcode found in the “Address” zone, regardless of page:
	{bar1:zone.Address[]}
  • To indicate that multiple values should be returned as a joined string, use | followed by the separator. For example, the following would return all values of bar1:zone.part number, separated by “-”:
	{bar1:zone.part number[]|\-}
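
The pieces of a full reference (name, page, occurrence, and separator) can be pulled apart as in the sketch below. Everything here is a hypothetical Python illustration: the names REFERENCE, Reference, parse, and select are invented, the rule that a backslash in the separator escapes the next character is an assumption, and the value lists are made up.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical grammar: {name[page][occurrence]|separator}
REFERENCE = re.compile(
    r"^\{(?P<name>[^\[\]{}|]+)"
    r"(?:\[(?P<page>\d*|current)\])?"
    r"(?:\[(?P<occurrence>\d+)\])?"
    r"(?:\|(?P<separator>.*))?\}$"
)

@dataclass
class Reference:
    name: str
    page: Optional[str]       # None = no brackets, "" = any page, else "0", "3", "current"
    occurrence: int           # 1-based; defaults to the first value
    separator: Optional[str]  # set only when values should be joined

def parse(text: str) -> Reference:
    match = REFERENCE.match(text)
    if match is None:
        raise ValueError(f"not a metadata reference: {text!r}")
    separator = match.group("separator")
    if separator is not None:
        # Assumption: a backslash escapes the character that follows it
        separator = re.sub(r"\\(.)", r"\1", separator)
    return Reference(match.group("name"), match.group("page"),
                     int(match.group("occurrence") or 1), separator)

def select(values: List[str], ref: Reference) -> str:
    """Pick the requested occurrence, or join all values when a separator is given."""
    if ref.separator is not None:
        return ref.separator.join(values)
    return values[ref.occurrence - 1]

second = parse("{value:bar[3][2]}")
joined = parse(r"{bar1:zone.part number[]|\-}")
print(second)
print(select(["first", "second"], second))
print(select(["A1", "B2", "C3"], joined))
```

The select function expects a list that has already been narrowed to the requested page, so the sketch makes no assumption about how page-level values are stored.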