Email Parser

Extract data from emails and automate your workflow

I had a script running on my notebook and it worked okay. The Email source was configured to use IMAP, but at another location the IMAP ports are blocked, so I changed the email source setting from IMAP to POP3.

After I did that, email could be fetched again, but now the parser seems to receive the HTML version of the body: I see a lot of HTML markup tags, so the parser code no longer understands it.

The emails I get are multipart and contain HTML and plain text.

How can I force the Email source to always use either plain text or HTML?

Maurice
Hello Maurice. Please try the latest version available in the download section. I have just released a new version in which this problem is fixed.
Hi,

I have downloaded and installed the new version. Does it read the incoming emails as HTML or as plain text? Can you explain how emails are 'read' before they are sent to the parser?

Thanks for the quick solution!

Maurice
Hi Maurice,

Usually (but not always) email clients and web-based email services create two versions of the same email body, one in HTML and the other in plain text. Email&Parser sends the plain-text version to the parser but, if no plain-text version is found, it tries to convert the HTML body to plain text.

The previous version of Email&Parser (2.0.1) was sending the HTML to the parser as is, without converting it to plain text, when no plain-text version of the body was found. This is the part that I have fixed (among others) in the latest release (2.0.2).
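
For readers who want to see the body selection described above in code, here is a minimal sketch using Python's standard `email` and `html.parser` modules. It is not Email&Parser's actual implementation; the function names `body_for_parser` and `html_to_text` are hypothetical, and the HTML-to-text step is deliberately naive (it simply drops the tags).

```python
from email import message_from_string, policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text between tags and discards the markup itself."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Skip whitespace-only runs; note that <script>/<style> text is NOT filtered here.
        if data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Naive HTML-to-text conversion (hypothetical helper)."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)


def body_for_parser(raw_email):
    """Prefer the plain-text body; fall back to a converted HTML body."""
    msg = message_from_string(raw_email, policy=policy.default)
    plain = msg.get_body(preferencelist=("plain",))
    if plain is not None:
        return plain.get_content()
    html = msg.get_body(preferencelist=("html",))
    if html is not None:
        return html_to_text(html.get_content())
    return ""
```

With a multipart message that has both versions, `body_for_parser` returns the plain-text part; with an HTML-only message it returns the text stripped of tags.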
Aha, that makes sense! Thank you very much for this explanation!!!
I downloaded the regular version from your website and tried the program out using Outlook folders. It works great, but I have two questions:

1) Is there any way to process the original source of the email? I care more about the HTML markup (I need to extract <href>, <a> and <table>) than about the text version of the email, because I will be parsing URLs/hrefs from the messages.

I know the current version converts emails that are HTML only (not multipart) to plain text. But the data I am looking to parse is really in the original HTML. I believe you mentioned that a previous version did not alter HTML emails. I would actually prefer just the HTML source, as it is what I need most, rather than the text version of multipart emails.

Would you happen to have an older version that just picks up the HTML and ignores the text part, so that it shows and parses the raw HTML source of the message?

2) I also want to retrieve additional header fields, not just the 7 basic ones (From, To, Subject, etc.).

For instance, I would like to extract headers such as:
Received: from 127.0.0.1 (EHLO mail-pz0-f45.google.com) (209.85.210.45)
Message-ID:
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=000e0cd242f806a22004ab96e5ef
Content-Length: 20920

Thanks, any help would be greatly appreciated.
Let me know if either of these is possible.
Regarding question 1, you can work directly with the HTML body starting with version 3.2 Beta. You can get it from here:

http://www.automatedemailparser.com/Ema ... _setup.msi
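
Once the raw HTML body is available, pulling the link targets out of it could look roughly like the sketch below. It uses Python's standard `html.parser` module and runs outside Email Parser; `LinkCollector` and `extract_links` are hypothetical names chosen for illustration, not part of the program.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

The same pattern (overriding `handle_starttag` and `handle_data`) can be extended to walk `<table>`, `<tr>` and `<td>` tags if table cells are needed as well.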

Regarding the second question, this is not possible with the program at the moment. Email headers are only handled internally and are not available within the parsing process. But if more people ask for it, we will consider including it in new releases.
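
Until the program exposes headers, one possible workaround is to read them from the raw message source with a separate script. The sketch below assumes you can save or export the raw message text yourself; it uses Python's standard `email` module, and `extract_headers` is a hypothetical helper, not an Email Parser feature.

```python
from email import message_from_string, policy


def extract_headers(raw_email, names):
    """Return a dict mapping each requested header name to a list of its values.

    A header such as Received can occur several times, so every value is
    returned as a list; headers that are absent map to an empty list.
    """
    msg = message_from_string(raw_email, policy=policy.default)
    return {name: [str(v) for v in msg.get_all(name, [])] for name in names}
```

For example, `extract_headers(raw, ["Received", "Message-ID", "Content-Length"])` returns one list per header, with header-name lookups being case-insensitive.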