Tuesday, July 31, 2012

Regular Expressions for Integration Server

Introduction

A regular expression is a text string that describes some set of strings. Regular expressions are a pattern-matching technique supported in a variety of tools and software but are most commonly known from UNIX tools such as grep, sed, vi and emacs. Using regular expressions (regex for short) you can determine if a string matches a specific pattern.
webMethods Integration Server provides regex support. The syntax and usage are described in the webMethods Developer User's Guide (Appendix D in version 4.6, Appendix B in 6.0).
This webMethods Ezine article addresses using regex patterns in:
  • The label property of a child step within a BRANCH
  • webMethods Query Language (WQL) statements
  • Built-in String Services

Regex in Labels

Using regular expressions can greatly simplify your BRANCH constructs. The regular expression in a label must be surrounded with slashes. As an example, consider the common task of needing to determine if a pipeline variable contains at least one non-space character. Without using regex the FLOW steps to accomplish this might look like this:
[Screen shot: webMethods regex Flow]
Using a regular expression can greatly simplify the steps needed. The following FLOW steps accomplish this easily:
[Screen shot: webMethods regex Flow]
The following table provides a few regular expression samples for common tasks.

Regex statement | Is "True" if...
/.+/ | BRANCH switch variable has one or more characters, including just spaces. Strings that are $null or empty will not be selected by this label (i.e. BRANCH won't take this path). Strings that contain only spaces will take this branch.
/^ISA/ | BRANCH switch variable starts with the characters "ISA".
/[^ ]/ | BRANCH switch variable is not null, is not empty, and contains at least one non-space character.
/webMethods$/ | BRANCH switch variable ends with the string "webMethods".
/[^ \t\r\n]/ | BRANCH switch variable is not null, is not empty, and contains at least one non-whitespace character.
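
To experiment with these patterns outside of Developer, the sketch below checks each one against a few sample strings using Python's re module. It is only an approximation: it assumes a label matches when the pattern is found anywhere in the switch variable, and Python's regex engine is not the one Integration Server uses. The helper name label_matches and the sample values are hypothetical.

```python
import re

def label_matches(pattern, value):
    """Approximate a regex label test: True if pattern is found in value."""
    if value is None:
        # Assumption: a $null switch variable never selects a regex label.
        return False
    return re.search(pattern, value) is not None

# Patterns from the table, written without the surrounding slashes.
print(label_matches(r".+", "   "))                        # spaces only still match
print(label_matches(r".+", ""))                           # empty string does not
print(label_matches(r"^ISA", "ISA*00*"))                  # starts with ISA
print(label_matches(r"[^ ]", "   "))                      # no non-space character
print(label_matches(r"webMethods$", "I use webMethods"))  # ends with webMethods
print(label_matches(r"[^ \t\r\n]", " \t\r\n"))            # whitespace only
```

Running the same strings through a BRANCH in Developer is still the real test, since the two engines may differ on edge cases.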


Regex and WQL

WQL is the webMethods precursor to XQL. Both languages are supported by the query service (pub.xml:queryXMLNode in v6.0, pub.web:queryDocument in v4.6 and earlier). WQL supports using regex to specify a pattern-matching string to select members of an element array.
This facility is described in the Working with XML Documents document shipped with v6.0 and in the Developer User's Guide with v4.6 and earlier. From an example described in the webMethods documentation, the following query:
    doc.bitterrootboards['*Pro Series*'].text
returns every bitterrootboards element that contains the string "Pro Series" anywhere within it. WQL coupled with regex provides significant power and flexibility in processing XML.
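
For readers without a server at hand, the selection this query performs can be imitated in Python. The sketch below is not WQL and does not call the query service. It assumes the * in the pattern stands for any run of characters, and the XML sample and the select_text helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Hypothetical document; only the element name comes from the query above.
SAMPLE_XML = """
<catalog>
  <bitterrootboards>Alpine Pro Series 160</bitterrootboards>
  <bitterrootboards>Freestyle Classic 155</bitterrootboards>
  <bitterrootboards>Pro Series Carver 158</bitterrootboards>
</catalog>
""".strip()

def select_text(xml_text, tag, pattern):
    """Return the text of every <tag> element whose text matches pattern."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(tag)
            if fnmatchcase(el.text or "", pattern)]

matches = select_text(SAMPLE_XML, "bitterrootboards", "*Pro Series*")
print(matches)
```

Only the two elements whose text contains "Pro Series" are returned. The middle element is skipped.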

Regex and Built-In String Services

There are two built-in services that support regex:
  1. pub.string:lookupTable
  2. pub.string:replace
The pub.string:lookupTable service can use a regular expression to perform a lookup on the table. It returns the first value in the table whose key matches the specified regex pattern.
The pub.string:replace service supports using a regular expression as the search string. Erik Versteijnen provided an example in a webMethods Discussion Forums post that strips leading zeroes from a number. The screen shot below shows the results tab after running pub.string:replace.
[Screen shot: webMethods regex Flow]
In this example, "+0000" in the inString is replaced by "+" as shown by the variable named value. The search string can be described as "look at the beginning of the string for a + followed by one or more zeroes." The search string uses these regex features:

  • The ^ specifies that the pattern must be at the start of the inString.
  • The + character has a special meaning in regex. For this pattern we want to use the + character literally. The \ escapes the + telling the regex engine to use + as a literal rather than a regex operator.
  • The 0 is used literally. The second + character is a regex operator indicating that the preceding element (in this case the character 0) can occur 1 or more times.
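
The same replacement can be tried in Python. The search string itself appears only in the screen shot, so the pattern ^\+0+ used here is reconstructed from the three bullets above and should be treated as an assumption, as should the sample input "+0000123". The function name is hypothetical, and Python's re module stands in for pub.string:replace.

```python
import re

def strip_zeroes_after_plus(in_string):
    """Replace a leading '+' followed by one or more zeroes with a single '+'."""
    # ^   anchors the match at the start of the string
    # \+  is a literal plus sign
    # 0+  is one or more zero characters
    return re.sub(r"^\+0+", "+", in_string)

print(strip_zeroes_after_plus("+0000123"))  # +123
print(strip_zeroes_after_plus("0000123"))   # unchanged: no leading +
```

A string with no leading + is left alone, which is the limitation the next example addresses.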
We can add some flexibility to Erik's regex example to support an optional leading + or - and keep it in the resulting value. The screen shot below again shows the results tab after running pub.string:replace with the indicated parameters. Note the differences in the searchString parameter in this example as compared to the first example.
[Screen shot: webMethods regex Flow]
The search string in this second example can be described as "look at the beginning of the string for an optional + or - followed by one or more zeroes." This search string uses these regex features:
  • The ^ specifies that the pattern must be at the start of the inString.
  • The parentheses indicate a subexpression that can be used as a substitution variable in the replacement string. In this case, any leading + or - will be placed where $1 appears, so the resulting value will have the same leading sign as the input string. If there is no leading sign, then the resulting value will not have a leading sign.
  • The brackets indicate a character set. The set here is the + and the - characters. The \ escapes the + (since it is a regex operator character and we want to use it literally).
  • The ? is a regex operator which indicates that the preceding item can occur zero or one times. The preceding item in this case is a character set, so the pattern matches whether or not one of the characters from the set appears at the start of inString.
  • The 0 is used literally. The second + is a regex operator indicating that the preceding element (in this case the character 0) can occur 1 or more times.
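
A Python version of this second example is sketched below. The pattern ^([\+-]?)0+ is again reconstructed from the bullets rather than copied from the screen shot, so it is an assumption. Python's re module also writes the substitution variable as \1 where the replacement string described above uses $1. The function name and inputs are hypothetical.

```python
import re

def strip_leading_zeroes(in_string):
    """Drop leading zeroes while keeping an optional leading + or - sign."""
    # ([\+-]?) captures an optional sign as group 1
    # 0+       consumes the one or more zeroes that follow it
    # \1 in the replacement puts the captured sign (possibly empty) back
    return re.sub(r"^([\+-]?)0+", r"\1", in_string)

for sample in ["+000123", "-000123", "000123", "123"]:
    print(sample, "->", strip_leading_zeroes(sample))
```

One thing to watch for: an input made up entirely of zeroes, such as "000", comes back as an empty string with this pattern.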
As you can see, just a few characters in a regular expression can do quite a bit of work. Any regular expression can be used to search for and replace substrings.
The best way to learn to use regular expressions is to play around with them. Run pub.string:replace directly in Developer to learn the basics and expand your regex abilities.

Summary

Regular expressions are a powerful way to easily process strings and control process flow. While they can't do everything (e.g. using regex to try to trim whitespace from all elements in an XML string would be ill-advised) they can make many tasks a breeze.