Email Parser

Extract data from incoming emails and automate your workflow

MENUMENU

How to parse a HTML table and export to Excel

 

See also:
Capturing an HTML tag
Exporting email contents to Excel

 

In this example we will explain how to capture the data from a table. We can get the cells from the plain text version of the email but in this case we will get them directly from the HTML code using the text capture method Capturing HTML tag. The email we receive looks as following:

From: b.lastrange@example.com
To: johnjohnes@example.com
Subject: Supplier defects. Nov17-Jan18Hi John,

Here you can find the list of the defects found in the received orders from our suppliers. It is from the last 3 months, only.

Regards,
Bob

ER48423 Protoveo Electronic Devices 5/1/2018 Arrived a week later than expected
TH84280 Wikimba packagings 8/12/2017
AU32842 Aurend 7/12/2017 Some minor scratches in the surface. Call QA?
LT31325 Electrova 6/12/2017 Reference 4587459 is missing

 

 

We have set up a Gmail email source, a parser and an excel action:

 

export email to excel

 

 

As you can see, there is no email filter. Instead, we are getting the incoming emails directly from a label. Gmail will do the email filtering for us:

 

 

The interesting part for this example is how to get the data contained in the HTML table with the parser. The rest of the items of the left panel are very common and easy to understand.

The parser capture data from table contains 5 fields: order_defects_row, ID, supplier, date and notes. Order_defects_row is an intermediate field we use to store the rows of the table.

The default field Body_HTML of the incoming email looks like this:

 

html code of email table

 

Well, that is difficult to understand. The HTML code of an email is not meant to be human-readable but if we tell Email Parser where to dig we can get something more human-friendly.

HTML tables, like the one we are receiving in the emails are a series of <tr> tags together. If we use the capture method “Capture HTML tag” we can separate them:

 

 

export emails to excel

 

It is still not readable, but it is a step forward. Let’s convert that ugly HTML we are getting to plain text we can read. We click on the checkbox Convert output HTML to plain text

 

converting to email plain text before Excel export

 

We are getting closer. Now we have something readable but unfortunately all the columns of a given row are together. In order to separate them into cells (ID,supplier,date and notes) we use order_defects_row as input:

 

taking one column of the table

 

As each column is placed in the same line, we use the Starts after…. continues until pattern. We know that the column ID always starts on line 2 character number 1 and goes until the line ends. The same criteria apply to the rest of the columns (supplier,date and notes). We can see that column 2 goes in the line 3, the column 3 in line 4 etc.

We have not set up the Excel part yet but we can test run Email Parser to see what it actually gets:

outlook email to saved to an excel file

 

Looks good, we just need to add the items to export the fields we have captured to Excel:

 

grouping fields into rows

how to export outlook email to an excel spreadsheet

 

The full output of Email Parser looks like this:

 

 

Processing email 1 of 1

Show email  Process again

The email contains the following fields:

Subject Supplier defects. Nov17-Jan18
From “Emaill Parser tech support” <——–@gmail.com>
To “Emaill Parser tech support” <——–@gmail.com>
Body Here you can find the list of the defects found in the rec  ……
Body_HTML <div dir=”ltr”>Here you can find the list of the defect  ……
DateTimeSent 2018/04/26 10:59:18
DateTimeReceived 2018/04/26 10:59:39
CC empty
BCC empty
DateTimeProcessed 2018/04/26 12:03:08

 

········ Running parser capture data from table

The parser produced the following results:

order_defects_row ER48423Protoveo Electronic Devices5/1/2018Arrived a  ……
order_defects_row TH84280Wikimba packagings8/12/2017
order_defects_row AU32842Aurend7/12/2017Some minor scratches in the s  ……
order_defects_row LT31325Electrova6/12/2017Reference 4587459 is missi  ……
ID ER48423
ID TH84280
ID AU32842
ID LT31325
supplier Protoveo Electronic Devices
supplier Wikimba packagings
supplier Aurend
supplier Electrova
date 5/1/2018
date 8/12/2017
date 7/12/2017
date 6/12/2017
notes Arrived a week later than expected
notes
notes Some minor scratches in the surface. Call QA?
notes Reference 4587459 is missing

 

········ Running action For each id-supplier-date-notes set

Found 4 sets of ID, supplier, date, notes

 

Set 1 of 4

ID ER48423
supplier Protoveo Electronic Devices
date 5/1/2018
notes Arrived a week later than expected

 

········ Running action Add row to Excel file

Opening Excel file C:\Users\Carlos\Desktop\order_defeccts.xlsx

Adding cell ID: ER48423

Adding cell supplier: Protoveo Electronic Devices

Adding cell date: 5/1/2018

Adding cell notes: Arrived a week later than expected

 

Set 2 of 4

ID TH84280
supplier Wikimba packagings
date 8/12/2017
notes

 

········ Running action Add row to Excel file

Opening Excel file C:\Users\Carlos\Desktop\order_defeccts.xlsx

Adding cell ID: TH84280

Adding cell supplier: Wikimba packagings

Adding cell date: 8/12/2017

Adding cell notes:

 

Set 3 of 4

ID AU32842
supplier Aurend
date 7/12/2017
notes Some minor scratches in the surface. Call QA?

 

········ Running action Add row to Excel file

Opening Excel file C:\Users\Carlos\Desktop\order_defeccts.xlsx

Adding cell ID: AU32842

Adding cell supplier: Aurend

Adding cell date: 7/12/2017

Adding cell notes: Some minor scratches in the surface. Call QA?

 

Set 4 of 4

ID LT31325
supplier Electrova
date 6/12/2017
notes Reference 4587459 is missing

········ Running action Add row to Excel file

Opening Excel file C:\Users\Carlos\Desktop\order_defeccts.xlsx

Adding cell ID: LT31325

Adding cell supplier: Electrova

Adding cell date: 6/12/2017

Adding cell notes: Reference 4587459 is missing

 

········ Finished