Removing line breaks (or other unwanted characters) from captured text
June 12, 2014
You can master regular expressions or C# scripting and build a parser that captures exactly what you want from the email, but sometimes unwanted characters such as line breaks, tabs or weird characters (for instance ıġħť p̀ł) end up in the captured text.
To solve this, we apply one more parsing step to the captured text, called “text filtering and replacing”.
The regular expression used for the filter is:
[^\d\t\s()]
It matches any character except a digit (\d), a tab (\t), a whitespace character (\s, which already includes tabs) and the left and right parentheses. This means that anything not commonly used in a phone number will match and be removed. For example, if we have captured the following phone number in the field “mobile_phone”:
758956-786s.aw
The field “mobile_phone_filtered” will be:
758956786
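The same filtering can be tried outside the parser to check a pattern before using it. The snippet below is only a minimal sketch in Python, not the parser's own implementation: the function names `filter_phone` and `collapse_whitespace` are made up for illustration, and the second function assumes you want runs of line breaks and tabs replaced by a single space.

```python
import re

# Same pattern as above: any character that is NOT a digit, a tab,
# a whitespace character or a parenthesis.
NOT_PHONE_CHARS = re.compile(r"[^\d\t\s()]")

# Assumption for illustration: any run of whitespace (spaces, tabs,
# line breaks) should collapse into one space.
WHITESPACE_RUN = re.compile(r"\s+")


def filter_phone(captured: str) -> str:
    """Remove every character matched by the phone-number filter."""
    return NOT_PHONE_CHARS.sub("", captured)


def collapse_whitespace(captured: str) -> str:
    """Replace line breaks and tabs with single spaces and trim the ends."""
    return WHITESPACE_RUN.sub(" ", captured).strip()


mobile_phone = "758956-786s.aw"
mobile_phone_filtered = filter_phone(mobile_phone)
print(mobile_phone_filtered)  # 758956786

print(collapse_whitespace("  Hello\r\n\tworld "))  # Hello world
```

Note that the pattern keeps spaces and parentheses, so an input such as "(758) 956-786" would come out as "(758) 956786" rather than digits only.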