Email Parser

Capturing text with Regular Expressions

See also
Capturing text with “Filtering and Replacing”
Capturing text with “Starts with… Continues until…”
Example – Basic regular expression use

Highly recommended link
Regex101, an online regex tester and builder

 

Regular expressions, like wildcard expressions, are a way to tell Email Parser what a text looks like. In other words, they are a specification of the text format. Unlike wildcard expressions, they are more complex to use but also much more powerful.

Because they are widely used in other contexts and very well documented, this help topic gives only a brief explanation of what regular expressions are and how they work in Email Parser. Entire books and websites are devoted to this topic alone.

 

The very basics of Regular expressions

A regular expression is a text string that uses tokens to match a given text. For example, “\d” matches any single digit from 0 to 9:

Regular expression | Input text | Captured text
\d\d\d | Hello John, please call me to 788-383-134 | 788
-\d\d\d- | Hello John, please call me to 788-383-134 | -383-
\d\d\d-\d\d\d-\d | Hello John, please call me to 788-383-134 | 788-383-1
\d\d\d\d\d | Hello John, please call me to 788-383-134 | (no match)

 

As you can see, a regular expression applied to an input text may or may not produce a match. Sometimes there is more than one match: in the first example of the table, 788, 383 and 134 all match. By default, Email Parser takes the first match only.
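
The difference between the first match and all matches can be tried outside Email Parser. The following is a minimal sketch in Python using its standard "re" module, which is an assumption made for illustration and not the engine Email Parser itself uses. The variable names are illustrative only.

```python
import re

text = "Hello John, please call me to 788-383-134"

# First match only, which mirrors the default behaviour described above
first = re.search(r"\d\d\d", text)
first_match = first.group(0) if first else None

# Every non-overlapping match, scanning from left to right
all_matches = re.findall(r"\d\d\d", text)

# Five digits in a row never occur in this text, so there is no match
no_match = re.search(r"\d\d\d\d\d", text)

print(first_match)  # 788
print(all_matches)  # ['788', '383', '134']
print(no_match)     # None
```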

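Within Email Parser, regular expressions are used this way:

[Screenshot: regular expression field settings]

Clicking on the testing tab produces the following:

[Screenshot: testing tab result]

If we activate the “This field can appear multiple times” option, the result is:

[Screenshot: testing tab result with the option activated]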

There are many other types of tokens. The most commonly used ones are:

Token | Meaning
. | Matches any character except the line break (yes, the line break is a character)
\s | Matches any whitespace character, such as a space, a tab or a line break
\w | Matches any word character: a letter, a digit or the underscore
[aeiou] | Matches any lowercase vowel. You can replace “aeiou” with any set of characters; for example, [abc] matches a, b or c
\n | Matches a new line character
[a-zA-Z] | Matches any character in the ranges a…z and A…Z

 

You can combine tokens to build more complex text captures. For example:

 

Regular expression | Input text | Result
\w\d\d\d-\d\d\d | The order id is A233-531 | A233-531
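
To experiment with combined tokens, the same pattern can be run in Python's standard "re" module. This is only a sketch under the assumption that Python's handling of these basic tokens is close enough to Email Parser's for illustration. The sample word "order" and the variable names are not from the original.

```python
import re

text = "The order id is A233-531"

# \w matches the leading letter, each \d matches exactly one digit
found = re.search(r"\w\d\d\d-\d\d\d", text)
order_id = found.group(0) if found else None
print(order_id)  # A233-531

# A character set matches one character out of the listed ones
vowels = re.findall(r"[aeiou]", "order")
print(vowels)  # ['o', 'e']

# A range set: only letters a-z and A-Z are matched, digits and "-" are skipped
letters = re.findall(r"[a-zA-Z]", "A233-531")
print(letters)  # ['A']
```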

 

 

Quantifiers are combined with the tokens shown above to specify how many times a token may repeat. The most common ones are:

Quantifier | Meaning
* | 0 or more of the previous expression
+ | 1 or more of the previous expression
? | 0 or 1 of the previous expression. Placed after another quantifier (as in .*?), it forces minimal matching: the expression matches as little text as possible

 

For example:

Regular expression | Input text | Captured text
\d+ | Hello John, please call me to 788-383-134 | 788
-\d+-? | Hello John, please call me to 788-383-134 | -383-
J\w* | Hello John, please call me to 788-383-134 | John
.* | Hello John, please call me to 788-383-134 | Hello John, please call me to 788-383-134
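
The quantifier examples, plus the minimal-matching use of “?”, can be checked with a short Python sketch. It assumes Python's standard "re" module as a stand-in for Email Parser's engine. The helper name first_match and the patterns \d.*\d and \d.*?\d are illustrative additions, not part of the original examples.

```python
import re

text = "Hello John, please call me to 788-383-134"

def first_match(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

print(first_match(r"\d+", text))     # 788
print(first_match(r"-\d+-?", text))  # -383-
print(first_match(r"J\w*", text))    # John

# Greedy: .* takes as much as possible before the final digit
greedy = first_match(r"\d.*\d", text)
# Minimal: .*? takes as little as possible, here nothing at all
lazy = first_match(r"\d.*?\d", text)

print(greedy)  # 788-383-134
print(lazy)    # 78
```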

 

 

 

Capturing text with a capture group

A capture group is a label within a regular expression that names a part of the matching text. For example, a phone number has a part that we can label “prefix”, and a date has “month”, “year” and “day”. Capture groups are helpful if you want to capture only part of the regular expression match rather than the full match.

To create a capture group, enclose part of the regular expression between (?'yourlabelhere' and )

If Email Parser finds a capture group with the same name as the field, it takes that part as the captured text. Otherwise it takes the full match. For example:

Email Parser field name | Regular expression | Input text | Captured text
prefix | (?'prefix'\d+)-\d+-\d+ | Hello John, please call me to 788-383-134 | 788
month | (?'year'\d+)/(?'month'\d+)/(?'day'\d+) | The date is 2017/6/8. Blah blah | 6
year | (?'year'\d+)/(?'month'\d+)/(?'day'\d+) | The date is 2017/6/8. Blah blah | 2017
address | (?'year'\d+)/(?'month'\d+)/(?'day'\d+) | Hello Carl, some text here 2017/6/8 etc et | 2017/6/8
address | (?'year'\d+)/(?'month'\d+)/(?'day'\d+) | Hello Carl, some text here etc etc | (no match)
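
The rule above (use the group named like the field, otherwise the full match) can be sketched in Python. Two assumptions apply. First, Python's "re" module writes a named group as (?P<name>...) rather than with the apostrophe form used in Email Parser, so the patterns are adapted. Second, the function capture is a hypothetical helper that imitates the described behaviour and is not Email Parser's actual implementation.

```python
import re

def capture(field_name, pattern, text):
    """Hypothetical helper: return the group named like the field, else the full match."""
    m = re.search(pattern, text)
    if m is None:
        return None  # no match, so nothing is captured
    if field_name in m.groupdict():
        return m.group(field_name)  # a group shares the field's name
    return m.group(0)  # no such group: fall back to the full match

# Python spelling of the date pattern from the table
DATE = r"(?P<year>\d+)/(?P<month>\d+)/(?P<day>\d+)"
PHONE = r"(?P<prefix>\d+)-\d+-\d+"

print(capture("prefix", PHONE, "Hello John, please call me to 788-383-134"))  # 788
print(capture("month", DATE, "The date is 2017/6/8. Blah blah"))              # 6
print(capture("year", DATE, "The date is 2017/6/8. Blah blah"))               # 2017
print(capture("address", DATE, "Hello Carl, some text here 2017/6/8 etc et")) # 2017/6/8
print(capture("address", DATE, "Hello Carl, some text here etc etc"))         # None
```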