Extracting attachments from Outlook .msg files using ColdFusion

I am building a system where intranet users are allowed to drag and drop files into a div on our ColdFusion site, which after some validation will then automatically upload them to a file server. One of my requirements is: when the file which was uploaded is a .msg file (Outlook Email), extract any files which are attachments to that email and upload them individually. This is possible using the org.apache.poi.hsmf.MAPIMessage Java object.
With the following code I am able to see each attachment object listed out. I can then get their filenames and extensions and save each one to the local file system.
However, this does not work if the attachment is another .msg file. When I call getEmbeddedAttachmentObject() on an attached .msg file, it returns an object which contains only "undefined". Non .msg files return a binary object which I can then pass into the FileWrite() ColdFusion function. Further examination of the MAPIMessage object shows that it has a write() method, but upon calling it I get an error stating:
Note - writing is not yet supported for this file format, sorry.
This is backed up by the documentation on http://poi.apache.org as well.
To summarize, I can write each email message attachment to the file system without a problem, unless the attachment is another email message. Am I out of luck or is there another way to accomplish this?
<cfscript>
    // Load test .msg into MAPIMessage object
    MAPIMessage = createObject("java", "org.apache.poi.hsmf.MAPIMessage");
    message = MAPIMessage.init('C:\Test\Test Email 1 Attachment.msg');
    // Get array of attached files
    attachments = message.getAttachmentFiles();
    // If attachments were found
    if (arrayLen(attachments) > 0) {
        // Loop over each attachment
        for (i = 1; i LTE arrayLen(attachments); i++) {
            // Dump the current attachment object
            writeDump( attachments[i] );
            // Get current attachment's binary data
            local.data = attachments[i].getEmbeddedAttachmentObject();
            // Dump binary data
            writeDump( local.data );
            // Get attachment's filename and extension
            attachmentFileName = attachments[i].attachLongFileName.toString();
            attachmentExtension = attachments[i].attachExtension.toString();
            // Dump filename and extension
            writeDump( attachmentFileName );
            writeDump( attachmentExtension );
            // Write attachment to local file system
            FileWrite("#expandPath('/')##attachments[i].attachLongFileName.toString()#", local.data);
        }
    }
</cfscript>

After much research I found a solution to my problem. I was not able to save an embedded .msg file using the org.apache.poi.hsmf.MAPIMessage Java object that ships with ColdFusion, because its write() method is not yet implemented. Instead, I used a third-party tool called Aspose.Email for Java.
Aspose is a paid product, and it was the only way I found to accomplish what I needed.
Here is my implementation. It does everything I need it to.
local.msgStruct.attachments = [];
// Create MapiMessage from the passed in .msg file
MapiMessage = createObject("java", "com.aspose.email.MapiMessage");
message = MapiMessage.fromFile(ARGUMENTS.msgFile);
// Get attachments
attachments = message.getAttachments();
numberOfAttachments = attachments.size();
// If attachments exist
if (numberOfAttachments > 0) {
    // Loop over attachments
    for (i = 0; i LT numberOfAttachments; i++) {
        // Get current Attachment
        currentAttachment = attachments.get_Item(i);
        // Create struct of attachment info
        local.attachmentInfo = {};
        local.attachmentInfo.fileName = currentAttachment.getLongFileName();
        local.attachmentInfo.fileExtension = currentAttachment.getExtension();
        // If an attachmentDestination was specified
        if (ARGUMENTS.attachmentDestination NEQ '') {
            // Ignore inline image attachments (mostly email signature images)
            if ( NOT (left(local.attachmentInfo.fileName, 6) EQ 'image0' AND local.attachmentInfo.fileExtension EQ '.jpg') ) {
                // Get attachment object data (only defined for Outlook Messages, will return undefined object for other attachment types)
                attachmentObjectData = currentAttachment.getObjectData();
                // Check if attachment is an outlook message
                if ( isDefined('attachmentObjectData') AND attachmentObjectData.isOutlookMessage() ) {
                    isAttachmentOutlookMessage = 'YES';
                } else {
                    isAttachmentOutlookMessage = 'NO';
                }
                ////////////////////////////
                // ATTACHMENT IS AN EMAIL //
                ////////////////////////////
                if ( isAttachmentOutlookMessage ) {
                    // Get attachment as a MapiMessage
                    messageAttachment = currentAttachment.getObjectData().toMapiMessage();
                    // If an attachmentDestination was specified
                    if (ARGUMENTS.attachmentDestination NEQ '') {
                        // Set file path
                        local.attachmentInfo.filePath = ARGUMENTS.attachmentDestination;
                        // Set file path and file name
                        local.attachmentInfo.filePathAndFileName = ARGUMENTS.attachmentDestination & local.attachmentInfo.fileName;
                        // Save attachment to filesystem
                        messageAttachment.save(local.attachmentInfo.filePathAndFileName);
                    }
                ////////////////////////////////
                // ATTACHMENT IS NOT AN EMAIL //
                ////////////////////////////////
                } else {
                    // If an attachment destination was specified
                    if (ARGUMENTS.attachmentDestination NEQ '') {
                        // Set file path
                        local.attachmentInfo.filePath = ARGUMENTS.attachmentDestination;
                        // Set file path and file name
                        local.attachmentInfo.filePathAndFileName = ARGUMENTS.attachmentDestination & local.attachmentInfo.fileName;
                        // Save attachment to filesystem
                        currentAttachment.save(local.attachmentInfo.filePathAndFileName);
                    }
                }
                // Verify that the file was saved to the file system
                local.attachmentInfo.savedToFileSystem = fileExists(ARGUMENTS.attachmentDestination & local.attachmentInfo.fileName);
                // Add attachment info struct to array
                arrayAppend(local.msgStruct.attachments, local.attachmentInfo);
            } // End ignore inline image attachments
        } // End if an attachmentDestination was specified
    } // End loop over attachments
} // End if attachments exist


parse outlook emails using outlook-message-parser library

I am trying to load emails from the INBOX of a remote mailbox and parse them to extract the attachments and the body converted to HTML.
I use the code snippet below to parse them with the outlook-message-parser jar:
ResultSuccess insertMessage(Message currentMsg) {
    final OutlookMessageParser msgp = new OutlookMessageParser();
    final OutlookMessage msg = msgp.parseMsg(currentMsg.getInputStream());
}
Here currentMsg is of type javax.mail.Message.
The code that fetches the emails from the server is as follows:
Properties props = new Properties();
Message currentMessage;
Session session = Session.getInstance(props, null);
session.setDebug(debug);
store = session.getStore(PROTOCOL);
store.connect(host, username, password);
Message message[] = inboxfolder.getMessages();
Message copyMessage[] = new Message[1];
int n = message.length;
for (int j = 0; j < n; j++) {
    currentMessage = message[j];
    ResultSuccess result = insertMessage(currentMessage);
}
Exception details are as follows
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
org.apache.poi.poifs.filesystem.NotOLE2FileException: Invalid header signature; read 0x615F3430305F2D2D, expected 0xE11AB1A1E011CFD0 - Your file appears not to be a valid OLE2 document
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:151)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:117)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:285)
at org.simplejavamail.outlookmessageparser.OutlookMessageParser.parseMsg(OutlookMessageParser.java:133)
at com.email.Email_Parse.loadMessages(Email_Parse.java:38)
at com.email.Email_Parse.getMessages(Email_Parse.java:116)
at com.email.Email_Parse.main(Email_Parse.java:26)
However the issue doesn't occur when I try to load emails from local disk and parse them.
Any idea on how to resolve the issue?
I suppose you're using outlook-message-parser to parse the emails stored on disk.
Messages retrieved from the mail server are not in the Outlook file format (even if the remote server is a Microsoft Exchange server or Microsoft's Outlook email service), so outlook-message-parser won't be able to parse them.
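The two signatures in the exception can be decoded to see this directly. Below is a minimal sketch using only the JDK; the class and method names are made up for illustration. The signature is reported as a little-endian 8-byte value, so turning the reported number back into bytes shows what the stream actually started with.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of POI or outlook-message-parser.
class HeaderSignature {
    // OLE2 magic bytes: the "expected 0xE11AB1A1E011CFD0" value in little-endian byte order.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Converts a signature as printed in the exception back into the 8 bytes that were read.
    static byte[] toBytes(long signature) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(signature).array();
    }

    // True if the given leading bytes start with the OLE2 magic number.
    static boolean looksLikeOle2(byte[] firstBytes) {
        return firstBytes.length >= 8
                && Arrays.equals(Arrays.copyOf(firstBytes, 8), OLE2_MAGIC);
    }

    public static void main(String[] args) {
        byte[] read = toBytes(0x615F3430305F2D2DL);
        System.out.println(new String(read, StandardCharsets.US_ASCII)); // prints --_004_a
        System.out.println(looksLikeOle2(read));                         // prints false
        System.out.println(looksLikeOle2(toBytes(0xE11AB1A1E011CFD0L))); // prints true
    }
}
```

The bytes that were read spell --_004_a, which looks like the start of a MIME boundary line rather than an OLE2 header. That is consistent with the server handing back MIME content instead of a .msg file.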
You should use the JavaMail API to retrieve the body of the message and its attachments.
The JavaMail FAQ describes (with a few examples) the steps needed to read a message with attachments. Here is an excerpt:
Q: How do I read a message with an attachment and save the
attachment?
A: As described above, a message with an attachment is
represented in MIME as a multipart message. In the simple case, the
results of the Message object's getContent method will be a
MimeMultipart object. The first body part of the multipart object will
be the main text of the message. The other body parts will be
attachments. The msgshow.java demo program shows how to traverse all
the multipart objects in a message and extract the data of each of the
body parts. The getDisposition method will give you a hint as to
whether the body part should be displayed inline or should be
considered an attachment (but note that not all mailers provide this
information). So to save the contents of a body part in a file, use
the saveFile method of MimeBodyPart.
To save the data in a body part into a file (for example), use the
getInputStream method to access the attachment content and copy the
data to a FileOutputStream. Note that when copying the data you can
not use the available method to determine how much data is in the
attachment. Instead, you must read the data until EOF. The saveFile
method of MimeBodyPart will do this for you. However, you should not
use the results of the getFileName method directly to name the file to
be saved; doing so could cause you to overwrite files unintentionally,
including system files.
Note that there are also more complicated cases to be handled as well.
For example, some mailers send the main body as both plain text and
html. This will typically appear as a multipart/alternative content
(and a MimeMultipart object) in place of a simple text body part.
Also, messages that are digitally signed or encrypted are even more
complex. Handling all these cases can be challenging. Please refer to
the various MIME specifications and other resources listed on our main
page.
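The FAQ's warning about not using getFileName directly can be addressed with a small helper. This is only a sketch with the plain JDK: the class name, the fallback name attachment.bin and the numbering scheme for duplicates are all assumptions made for illustration. The InputStream would come from the body part's getInputStream method.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for saving attachment content under a safe file name.
class SafeAttachmentSaver {
    private static final String FALLBACK = "attachment.bin"; // assumed default name

    // Reduces a sender-supplied name to a single harmless path component.
    static String sanitize(String suggestedName) {
        if (suggestedName == null) {
            return FALLBACK;
        }
        // Drop any directory part, whichever separator the sender used.
        String name = suggestedName.replace('\\', '/');
        name = name.substring(name.lastIndexOf('/') + 1);
        // Keep letters, digits, dot, underscore and hyphen; replace the rest.
        name = name.replaceAll("[^\\p{L}\\p{N}._-]", "_");
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            return FALLBACK;
        }
        return name;
    }

    // Copies the stream until EOF into destDir without overwriting existing files.
    static Path save(InputStream content, Path destDir, String suggestedName) throws IOException {
        String name = sanitize(suggestedName);
        Path target = destDir.resolve(name);
        for (int n = 1; Files.exists(target); n++) {
            target = destDir.resolve(n + "_" + name);
        }
        // Files.copy throws FileAlreadyExistsException instead of replacing a file.
        Files.copy(content, target);
        return target;
    }
}
```

With this, a name such as ../../etc/passwd is reduced to passwd inside the destination directory, and a second attachment with the same name is stored as 1_passwd rather than replacing the first.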
Emails are not always in HTML; sometimes they are just plain text. Most of the time they are "multipart". For example, an email can have an HTML part that will be displayed by email clients that support HTML (Gmail, Thunderbird, ...) and a plain-text part for email clients that can't display HTML (think text-based email clients).
So before dumping the content of an email you have to check its content type (or, if it has multiple parts, the content type of each part).
For the HTML parts, dumping the content verbatim can give you the desired result, depending on how images are referenced.
If an image is referenced using an http URL (like <img src="https://example.com/a.png"/>), no further work is necessary to display the result in a browser.
If an image is referenced using a Content-Id URL (like <img src="cid:image002.gif@01D44EB0.904DB790"/>), then you have to do extra work to display the result correctly in a browser.
You have to look for the correct image in the email parts and decide how to include it in the final result.
For example, save it to disk and replace the reference in the HTML with its path on disk, so that <img src="cid:image002.gif@01D44EB0.904DB790"/> becomes something like <img src="/path/to/saved/images/imagexyz.png"/>
Or convert it to base64 and replace the reference in the HTML with a data URI, so that <img src="cid:image002.gif@01D44EB0.904DB790"/> becomes something like <img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="/>.
I don't know if there is a java library that can do this automatically.
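For the data URI variant, a hand-rolled version is short. The following is a rough sketch using only the JDK; CidInliner and its method names are invented for illustration, and it assumes you have already collected each image's bytes and MIME type from the message parts, keyed by Content-Id without the surrounding angle brackets.

```java
import java.util.Base64;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that rewrites cid: image references as data URIs.
class CidInliner {
    private static final Pattern CID_SRC = Pattern.compile("src=\"cid:([^\"]+)\"");

    // Builds a data URI from a MIME type and the raw image bytes.
    static String toDataUri(String mimeType, byte[] data) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(data);
    }

    // Replaces every src="cid:..." whose Content-Id is known; unknown ones are left untouched.
    static String inlineImages(String html, Map<String, String> dataUriByCid) {
        Matcher m = CID_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String dataUri = dataUriByCid.get(m.group(1));
            String replacement = (dataUri == null) ? m.group() : "src=\"" + dataUri + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<img src=\"cid:logo@example\"/>";
        Map<String, String> images =
                Map.of("logo@example", toDataUri("image/png", new byte[] {1, 2, 3}));
        System.out.println(inlineImages(html, images));
        // prints <img src="data:image/png;base64,AQID"/>
    }
}
```

A regular expression is a shortcut here; real-world HTML (single quotes, unusual attribute spacing) may need an HTML parser instead.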
The JavaMail API website provides samples that you can read to learn how to use it. You can check msgshow.java from the samples to see how to use the API to retrieve the content of a message.
Here is a simple example program that downloads the last message from a Gmail inbox to a local directory (it may have bugs; don't forget to put in your own account and password, and replace "/tmp/messages" with a valid directory on your computer).
import javax.mail.*;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Properties;

public class MessageDownloader {
    private File destDir;

    public MessageDownloader(File destDir) {
        this.destDir = destDir;
    }

    // Dispatches on the MIME type of the part and recurses into multiparts
    public void download(Part message, String basename) throws MessagingException, IOException {
        System.out.println("Type : " + message.getContentType());
        if (message.isMimeType("text/plain")) {
            downloadTextPart((String) message.getContent(), basename + ".txt");
        } else if (message.isMimeType("text/html")) {
            downloadTextPart((String) message.getContent(), basename + ".html");
        } else if (message.isMimeType("image/*") || Part.ATTACHMENT.equalsIgnoreCase(message.getDisposition())) {
            downloadDataPart(message, basename);
        } else if (message.isMimeType("multipart/*")) {
            downloadMultiPart((Multipart) message.getContent(), basename);
        } else {
            System.out.println("Unrecognized type");
        }
    }

    private void downloadDataPart(Part dataPart, String basename) throws IOException, MessagingException {
        File dataFile = new File(destDir, basename + "_" + dataPart.getFileName());
        Files.copy(dataPart.getInputStream(), dataFile.toPath());
    }

    private void downloadTextPart(String textContent, String filename) throws MessagingException, IOException {
        File textFile = new File(destDir, filename);
        Files.writeString(textFile.toPath(), textContent);
    }

    private void downloadMultiPart(Multipart multiPartMessage, String basename) throws MessagingException, IOException {
        for (int partIdx = 0; partIdx < multiPartMessage.getCount(); partIdx++) {
            BodyPart part = multiPartMessage.getBodyPart(partIdx);
            download(part, String.format("%s_%d_", basename, partIdx));
        }
    }

    public static void main(String[] args) throws MessagingException, IOException {
        Store store = getStore();
        Folder folder = store.getFolder("Inbox");
        folder.open(Folder.READ_ONLY);
        MessageDownloader msgDownloader = new MessageDownloader(new File("/tmp/messages"));
        // Message numbers are 1-based, so the last message is number getMessageCount()
        Message lastMessage = folder.getMessage(folder.getMessageCount());
        msgDownloader.download(lastMessage, "last_message");
        folder.close(false); // false: do not expunge deleted messages
        store.close();
    }

    private static Store getStore() throws MessagingException {
        Properties props = new Properties();
        // Properties for the "imaps" store use the mail.imaps.* prefix, not mail.smtp.*
        props.setProperty("mail.imaps.ssl.enable", "true");
        Session session = Session.getInstance(props, null);
        Store store = session.getStore("imaps");
        store.connect("imap.gmail.com", "account@gmail.com", "password");
        return store;
    }
}

How to get the content length (aka file size) from the MediaHttpDownloader? Drive file.getSize() always returns null

I want to download a file from Google Drive, so I'm using the Drive API v3 (Java).
Here is the code snippet:
Drive.Files.Get request = driveService.files().get(fileId);
// Get the file metadata
File downloadFile = request.execute();
Long fileSize = downloadFile.getSize(); // always returns null.
OutputStream out = new FileOutputStream(fileName);
request.getMediaHttpDownloader().setProgressListener(
        new DriveDownloadProgressListener(messageQueue));
request.executeMediaAndDownloadTo(out);
Here downloadFile.getSize() always returns null.
Is there any way to get the file size from the MediaHttpDownloader?
I had to add the required fields to the request myself,
like this:
Drive.Files.Get request = driveService.files().get(fileId).setFields("size");
File file = request.execute(); // contains only the size field; other fields will be empty
Then execute the request to get the expected response (it only returns the fields you set yourself).
For example, if you need the name and size of the file:
Drive.Files.Get request = driveService.files().get(fileId).setFields("name,size"); // name and size
File file = request.execute(); // contains only the name and size fields; the getters for all other fields return null

Domino app - how to access the source document in java during custom file attachment routine

This is a non-xpages application.
I have inherited some code that I need to tweak. This code is used in a drag-and-drop file attachment subform. Normally, it creates a document in a separate, dedicated .nsf that stores only attachments, and uses the main document's UniversalID as a reference to link the two. I need to change the reference to the value of a field that is already on the main document (where the subform is).
Java is challenging to me, but all I need to do is GET the value of the field from the main document (which has not necessarily been saved yet) and write that string value onto the attachment doc in the storage database, so I think I just need help with one line of code.
I will paste the relevant function here, and hopefully someone can tell me how to get that value, or what else they need to see to understand what is going on.
You can see my commented-out attempt to write the field 'parentRef' in this code:
...
private void storeUploadedFile(UploadedFile uploadedFile, Database dbTarget) {
    File correctedFile = null;
    RichTextItem rtFiles = null;
    Document doc = null;
    String ITEM_NAME_FILES = "file";
    try {
        if (uploadedFile == null) {
            return;
        }
        doc = dbTarget.createDocument();
        doc.replaceItemValue("form", "frmFileUpload");
        doc.replaceItemValue("uploadedBy", dbTarget.getParent().getEffectiveUserName());
        Utils.setDate(doc, "uploadedAt", new Date());
        doc.replaceItemValue("parentUnid", parentUnid);
        //doc.replaceItemValue("parentRef", ((Document) dbTarget.getParent()).getItemValue("attachmentDocKey"));
        //get uploaded file and attach it to the document
        fileName = uploadedFile.getClientFileName();
        File tempFile = uploadedFile.getServerFile(); //the uploaded file with a cryptic name
        fileSize = tempFile.length();
        targetUnid = doc.getUniversalID();
        correctedFile = new java.io.File(tempFile.getParentFile().getAbsolutePath() + java.io.File.separator + fileName);
        //rename the file on the OS so we can embed it with the correct (original) name
        boolean success = tempFile.renameTo(correctedFile);
        if (success) {
            //embed original file in target document
            rtFiles = doc.createRichTextItem(ITEM_NAME_FILES);
            rtFiles.embedObject(lotus.domino.EmbeddedObject.EMBED_ATTACHMENT, "", correctedFile.getAbsolutePath(), null);
            success = doc.save();
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        com.gadjj.Utils.recycle(rtFiles, doc);
        try {
            if (correctedFile != null) {
                //rename the temporary file back to its original name so it's automatically
                //removed from the os' file system.
                correctedFile.renameTo(uploadedFile.getServerFile());
            }
        } catch (Exception ee) { ee.printStackTrace(); }
    }
}
}
...
dbTarget.getParent() does not do what you think it does. It returns a Session object, the parent session containing all your objects. Casting it to (Document) won't give you your main document.
I don't see the declaration for it, but you appear to have a variable available called parentUnid. You can use it to get a handle on the main document.
You need to pass the parentUnid value to getDocumentByUNID() in order to retrieve the Document object representing your main document. But in order to do that, you need the Database object for the NSF file containing the main document, and if I understand you correctly, that is a different database than dbTarget.
I'm going to have to assume that you already have that Database object in a variable called parentDb, or that you know the path to the NSF and can open it. In either case, your code would look like this (without error handling):
Document parentDoc = parentDb.getDocumentByUNID(parentUnid);
doc.replaceItemValue("parentRef", parentDoc.getItemValue("attachmentDocKey"));

How to retrieve the Content-Type and the Content-Disposition from an outlook msg file using Apache POI-HSMF?

I need to write a Java program to extract all attachments from messages saved by Outlook 2016 in the native msg format. The program should skip inline images. Also some of the mails have multipart/alternative parts where the program should retrieve the "best" content-type, e.g. text/html over text/plain.
In order to do that, I need to find out the content-type and content-disposition of all parts and attachments of the message.
I tried the following:
public static void main(String[] args) throws IOException {
    String mfile = "test/test2.msg";
    MAPIMessage msg = new MAPIMessage(mfile);
    AttachmentChunks[] attachments = msg.getAttachmentFiles();
    if (attachments.length > 0) {
        for (AttachmentChunks attachment : attachments) {
            System.out.println("long file name = " + attachment.getAttachLongFileName());
            System.out.println("content id = " + attachment.getAttachContentId());
            System.out.println("mime tag = " + attachment.getAttachMimeTag());
            System.out.println("embedded = " + attachment.isEmbeddedMessage());
        }
    }
    msg.close();
}
The problem is, that the "mime tag" (i.e. the content-type) is returned only for some attachments and returns null for all others. The content-disposition seems to be totally missing.
For example, I get the following output on a mail saved by OL2016 (the mail contains a PDF attachment and an inline logo image):
long file name = Vertretungsvollmacht Übersiedlung.pdf
content id = null
mime tag = null
embedded = false
long file name = image001.jpg
content id = image001.jpg@01D2E697.12EC9370
mime tag = image/jpeg
embedded = false
Is there a way to get these attributes out of the msg files or is there a more complete & convenient way to achieve what I want in Java with some other library than Apache POI-HSMF?
To get the content-disposition (inline or attachment), I did the following:
String disposition = "attachment";
// contentId is null when the attachment has no Content-ID
if (contentId != null && body.contains(contentId.toString())) {
    disposition = "inline";
}
To obtain the content-type, I derived it from the file extension of the attachment, e.g.:
String ext = fileNameOri.substring(fileNameOri.lastIndexOf(".") + 1);
switch (ext.toLowerCase()) {
    case "xlsx":
        ct = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
        break;
}
A list of MIME types can be obtained from e.g. https://wiki.selfhtml.org/wiki/MIME-Type/%C3%9Cbersicht
Of course, this should only be done when AttachmentChunks.getAttachMimeTag() returns null or an empty string.
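The extension fallback can be packaged as a lookup table instead of a growing switch. This is a sketch only: MimeTypes, resolve and the handful of table entries are illustrative choices, and application/octet-stream is assumed as the default for unknown extensions. The mimeTag argument stands for whatever string you got from the attachment's MIME tag, possibly null.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: prefer the stored MIME tag, fall back to the file extension.
class MimeTypes {
    private static final String DEFAULT_TYPE = "application/octet-stream"; // assumed default

    // A deliberately small table; extend it as needed.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "jpg", "image/jpeg",
            "png", "image/png",
            "txt", "text/plain",
            "xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");

    static String resolve(String mimeTag, String fileName) {
        // Use the tag stored in the message whenever there is one.
        if (mimeTag != null && !mimeTag.isEmpty()) {
            return mimeTag;
        }
        if (fileName == null) {
            return DEFAULT_TYPE;
        }
        int dot = fileName.lastIndexOf('.');
        String ext = (dot < 0) ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, DEFAULT_TYPE);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "Vertretungsvollmacht Übersiedlung.pdf")); // prints application/pdf
        System.out.println(resolve("image/jpeg", "image001.jpg"));                  // prints image/jpeg
    }
}
```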
The fact that an attachment has a content-id tag does not mean it is an embedded image - Lotus Notes adds content-id to all attachments. The only valid check is to load the HTML body and figure out what the <img> tags refer to.
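That check could be sketched roughly as follows, using only the JDK. InlineImageDetector and its methods are hypothetical names, and the HTML body and each attachment's content-id are assumed to be available as plain strings, however you obtained them. An attachment is treated as inline only if an <img> tag in the body actually references its content-id.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decides inline vs. attachment from the <img> tags in the HTML body.
class InlineImageDetector {
    // Matches <img ... src="cid:SOME-ID"> with single or double quotes, case-insensitively.
    private static final Pattern IMG_CID = Pattern.compile(
            "<img\\b[^>]*\\bsrc\\s*=\\s*[\"']cid:([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Collects every content-id referenced by an <img> tag in the body.
    static Set<String> referencedContentIds(String htmlBody) {
        Set<String> ids = new HashSet<>();
        Matcher m = IMG_CID.matcher(htmlBody);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    // "inline" only if the attachment's content-id is actually used by the body.
    static String disposition(String contentId, Set<String> referenced) {
        return (contentId != null && referenced.contains(contentId)) ? "inline" : "attachment";
    }

    public static void main(String[] args) {
        String body = "<p>Hello</p><img width=\"10\" src=\"cid:image001.jpg@01D2E697.12EC9370\">";
        Set<String> referenced = referencedContentIds(body);
        System.out.println(disposition("image001.jpg@01D2E697.12EC9370", referenced)); // prints inline
        System.out.println(disposition("unused@example", referenced));                 // prints attachment
        System.out.println(disposition(null, referenced));                             // prints attachment
    }
}
```

A regular expression is enough for a sketch; for messy real-world HTML an actual HTML parser is the safer choice.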

read a zip file without saving it first

I have an API call returning binary compressed data in .zip format. I don't want to save the file to disk first and then unzip it; I want to unzip it on the fly, in memory, and then parse the XML content.
Is there a way to do this with cfzip or via Java directly? Basically by editing this code I found on cflib:
<cffunction name="ungzip"
    returntype="any"
    displayname="ungzip"
    hint="decompresses a binary|(base64|hex|uu) using the gzip algorithm; returns string"
    output="no">
    <cfscript>
        var bufferSize=8192;
        var byteArray = createObject("java","java.lang.reflect.Array")
            .newInstance(createObject("java","java.lang.Byte").TYPE,bufferSize);
        var decompressOutputStream = createObject("java","java.io.ByteArrayOutputStream").init();
        var input=0;
        var decompressInputStream=0;
        var l=0;
        if(not isBinary(arguments[1]) and arrayLen(arguments) is 1) return;
        if(arrayLen(arguments) gt 1){
            input=binaryDecode(arguments[1],arguments[2]);
        }else{
            input=arguments[1];
        }
        decompressInputStream = createObject("java","java.util.zip.GZIPInputStream")
            .init(createObject("java","java.io.ByteArrayInputStream")
            .init(input));
        l=decompressInputStream.read(byteArray,0,bufferSize);
        while (l gt -1){
            decompressOutputStream.write(byteArray,0,l);
            l=decompressInputStream.read(byteArray,0,bufferSize);
        }
        decompressInputStream.close();
        decompressOutputStream.close();
        return decompressOutputStream.toString();
    </cfscript>
</cffunction>
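The function above handles gzip streams, not .zip archives. On the Java side, one possible approach is to read the archive from a byte array with java.util.zip.ZipInputStream, which never touches the disk. The sketch below is plain Java, not a tested ColdFusion port; InMemoryUnzip and the data.xml entry name are made up for illustration. The same JDK classes should be reachable from ColdFusion via createObject("java", ...), as the gzip version does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: unpacks a .zip held in memory into a map of entry name -> bytes.
class InMemoryUnzip {
    static Map<String, byte[]> unzip(byte[] zipBytes) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        byte[] buffer = new byte[8192];
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zin.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                entries.put(entry.getName(), out.toByteArray());
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny archive in memory to stand in for the API response.
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(raw)) {
            zout.putNextEntry(new ZipEntry("data.xml"));
            zout.write("<root/>".getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        }
        Map<String, byte[]> entries = unzip(raw.toByteArray());
        System.out.println(new String(entries.get("data.xml"), StandardCharsets.UTF_8)); // prints <root/>
    }
}
```

Each entry's bytes can then be turned into a string and handed to an XML parser. Holding every entry in memory is fine for small responses, but a very large archive would need a streaming approach instead.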
