I was looking at the JavaMail FAQ and found this snippet, which is supposed to extract the body of an email:
private boolean textIsHtml = false;
/**
* Return the primary text content of the message.
*/
private String getText(Part p) throws
MessagingException, IOException {
if (p.isMimeType("text/*")) {
String s = (String)p.getContent();
textIsHtml = p.isMimeType("text/html");
return s;
}
if (p.isMimeType("multipart/alternative")) {
// prefer html text over plain text
Multipart mp = (Multipart)p.getContent();
String text = null;
for (int i = 0; i < mp.getCount(); i++) {
Part bp = mp.getBodyPart(i);
if (bp.isMimeType("text/plain")) {
if (text == null)
text = getText(bp);
continue;
} else if (bp.isMimeType("text/html")) {
String s = getText(bp);
if (s != null)
return s;
} else {
return getText(bp);
}
}
return text;
} else if (p.isMimeType("multipart/*")) {
Multipart mp = (Multipart)p.getContent();
for (int i = 0; i < mp.getCount(); i++) {
String s = getText(mp.getBodyPart(i));
if (s != null)
return s;
}
}
return null;
}
Now the code can be refactored to the following version, which is basically fewer lines of code:
private static String getText(Part message) {
String text = null;
try {
if (message.isMimeType("text/*")) {
text = (String) message.getContent();
}
if (message.isMimeType("multipart/alternative") || message.isMimeType("multipart/*")) {
Multipart multiPart = (Multipart) message.getContent();
Part bodyPart = multiPart.getBodyPart(multiPart.getCount() - 1);
text = getText(bodyPart);
}
} catch (Exception e) {
logger.error(e.getMessage());
}
return text;
}
My question is: why does the old code loop through the parts for both multipart/alternative and multipart/* messages? Am I missing something here?
Update:
I just saw Jon's comment and have a further question: is there any scenario where my version of the code will break?
Basically, there are many multipart types and each needs to be handled differently:
Mixed Subtype
The "mixed" subtype of "multipart" is intended for use when the body
parts are independent and need to be bundled in a particular order.
Any "multipart" subtypes that an implementation does not recognize
must be treated as being of subtype "mixed".
Alternative Subtype
The "multipart/alternative" type is syntactically identical to
"multipart/mixed", but the semantics are different. In particular,
each of the body parts is an "alternative" version of the same
information.
Systems should recognize that the content of the various parts are interchangeable. Systems should choose the "best" type based on the local environment and references, in some cases even through user interaction. As with "multipart/mixed", the order of body parts is significant. In this case, the alternatives appear in an order of increasing faithfulness to the original content.
In general, the best choice is the LAST part of a type supported by the recipient system's local environment.
"Multipart/alternative" may be used, for example, to send a message
in a fancy text format in such a way that it can easily be displayed
anywhere:
From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Date: Mon, 22 Mar 1993 09:41:09 -0800 (PST)
Subject: Formatted text mail
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42
--boundary42
Content-Type: text/plain; charset=us-ascii
... plain text version of message goes here ...
--boundary42
Content-Type: text/enriched
... RFC 1896 text/enriched version of same message
goes here ...
--boundary42
Content-Type: application/x-whatever
... fanciest version of same message goes here ...
--boundary42--
In this example, users whose mail systems understood the
"application/x-whatever" format would see only the fancy version,
while other users would see only the enriched or plain text version,
depending on the capabilities of their system.
Your code won't "work" (whatever that means to you) with a multipart/mixed message where the last attachment is of type text/*. Yes, attachments can be of type text/*.
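To illustrate the RFC's rule that the best choice is the last part of a type the recipient supports, here is a minimal sketch. It does not use the JavaMail API; parts are modelled only by their content-type strings, and the class name AlternativeChooser, the method chooseBest and the set of supported types are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AlternativeChooser {

    // Returns the last content type in the list that the recipient supports,
    // or null if none is supported. The supported set must hold lower-case types.
    static String chooseBest(List<String> partTypes, Set<String> supported) {
        for (int i = partTypes.size() - 1; i >= 0; i--) {
            String type = partTypes.get(i).toLowerCase(Locale.ROOT);
            if (supported.contains(type)) {
                return type;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Part order taken from the RFC example above
        List<String> parts = Arrays.asList(
                "text/plain", "text/enriched", "application/x-whatever");
        // Hypothetical recipient that can only render plain text and HTML
        Set<String> supported = new HashSet<>(Arrays.asList("text/plain", "text/html"));

        System.out.println(chooseBest(parts, supported)); // text/plain
    }
}
```

The refactored getText from the question only looks at the last part, so for this message it would recurse into application/x-whatever and return null, whereas walking backwards until a supported type is found still yields the plain text.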
I need to write a Java program to extract all attachments from messages saved by Outlook 2016 in the native msg format. The program should skip inline images. Also some of the mails have multipart/alternative parts where the program should retrieve the "best" content-type, e.g. text/html over text/plain.
In order to do that, I need to find out the content-type and content-disposition of all parts and attachments of the message.
I tried the following:
public static void main(String[] args) throws IOException {
String mfile = "test/test2.msg";
MAPIMessage msg = new MAPIMessage(mfile);
AttachmentChunks[] attachments = msg.getAttachmentFiles();
if (attachments.length > 0) {
for (AttachmentChunks attachment : attachments) {
System.out.println("long file name = " + attachment.getAttachLongFileName());
System.out.println("content id = " + attachment.getAttachContentId());
System.out.println("mime tag = " + attachment.getAttachMimeTag());
System.out.println("embedded = " + attachment.isEmbeddedMessage());
}
}
msg.close();
}
The problem is that the "mime tag" (i.e. the content-type) is returned only for some attachments and is null for all others. The content-disposition seems to be missing entirely.
For example, I get the following output on a mail saved by OL2016 (the mail contains a PDF attachment and an inline logo image):
long file name = Vertretungsvollmacht Übersiedlung.pdf
content id = null
mime tag = null
embedded = false
long file name = image001.jpg
content id = image001.jpg@01D2E697.12EC9370
mime tag = image/jpeg
embedded = false
Is there a way to get these attributes out of the msg files or is there a more complete & convenient way to achieve what I want in Java with some other library than Apache POI-HSMF?
In order to get the content-disposition (inline or attachment), I did the following:
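String disposition = "attachment";
// Compare by value: != on strings only compares object references
if (contentId != null && !contentId.toString().isEmpty())
if (body.contains(contentId.toString()))
disposition = "inline";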
To obtain the content-type, I have derived it from the file extension of the attachment, e.g.:
String ext = fileNameOri.substring(fileNameOri.lastIndexOf(".") + 1);
switch (ext.toLowerCase()) {
case "xlsx":
ct = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
break;
}
A list of mime types can be obtained from e.g. https://wiki.selfhtml.org/wiki/MIME-Type/%C3%9Cbersicht
Of course, this should only be done when AttachmentChunks.getAttachMimeTag() returns null or an empty string.
The fact that an attachment has a content-id tag does not mean it is an embedded image - Lotus Notes adds content-id to all attachments. The only valid check is to load the HTML body and figure out what the <img> tags refer to.
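The check described above, looking at what the <img> tags in the HTML body actually refer to, could be sketched roughly as follows. This is only an illustration: the regular expression is a crude stand-in for a real HTML parser, the class and method names are made up, and it assumes the content id is given without surrounding angle brackets.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineImageCheck {

    // Matches <img ... src="cid:SOMETHING"> and captures SOMETHING.
    private static final Pattern IMG_CID = Pattern.compile(
            "<img\\b[^>]*\\bsrc\\s*=\\s*[\"']?cid:([^\"'\\s>]+)",
            Pattern.CASE_INSENSITIVE);

    // Collects every content id referenced by an <img> tag in the HTML body.
    static Set<String> referencedContentIds(String html) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = IMG_CID.matcher(html);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    // "inline" only if the HTML body really references the content id.
    static String disposition(String contentId, String html) {
        if (contentId == null || contentId.isEmpty() || html == null) {
            return "attachment";
        }
        return referencedContentIds(html).contains(contentId) ? "inline" : "attachment";
    }

    public static void main(String[] args) {
        String html = "<p>Logo:</p><img width=\"80\" src=\"cid:image001.jpg@01D2E697.12EC9370\">";
        System.out.println(disposition("image001.jpg@01D2E697.12EC9370", html)); // inline
        System.out.println(disposition(null, html));                             // attachment
    }
}
```

Unlike a plain body.contains(contentId), this does not treat a content id as inline just because the same text happens to appear somewhere else in the body.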
Please help me read the Unicode escapes from a properties file in Java exactly as they are written. For example, if I pass the key "Account.label.register", it should return "\u5BC4\u5B58\u5668" and not the character representation "寄存器". Here is my sample properties file:
file_ch.properties
Account.label.register = \u5BC4\u5B58\u5668
Account.label.login = \u767B\u5F55
Account.label.username = \u7528\u6237\u540D
Account.label.password = \u5BC6\u7801
Thank you.
Hi, I am reading the properties file using the following Java code:
@Override
public ResourceBundle getTexts(String bundleName) {
ResourceBundle myResources = null;
try {
myResources = ResourceBundle.getBundle(bundleName, getLocale());
} catch (Exception e) {
myResources = ResourceBundle.getBundle(getDefaultBundleKey(), getLocale());
}
return myResources;
}
The above approach works fine and I get the Chinese characters. But for some of the Ajax requests in my application I need to pass the Chinese text in the X-JSON header. Sample code is given below:
HashMap<String, List<String>> map = new HashMap<String, List<String>>();
List<String> errors = new ArrayList<String>();
errors.add(str); /*ex: str = "无效的代码" , value taken from properties file through resource bundle*/
map.put("ERROR", errors);
JSONObject json = JSONObject.fromObject(map);
response.setCharacterEncoding("UTF-8");
response.setHeader("X-JSON", json.toString());
response.setStatus(500);
When I pass English text, for example str = "Invalid Code", the X-JSON header carries it unchanged. But if str = "无效的代码" (Chinese or any other non-Latin text), the header value arrives empty. This is the response I get for the English text:
response :
connection:close
Content-Encoding:gzip
Content-Type:text/html;charset=UTF-8
Date:Wed, 08 Jun 2016 10:17:43 GMT
Server:Apache-Coyote/1.1
Transfer-Encoding:chunked
Vary:Accept-Encoding
X-JSON:{"ERROR":["Invalid Code"]}
However, if the error contains Chinese text, for example "无效的代码":
response :
connection:close
Content-Encoding:gzip
Content-Type:text/html;charset=UTF-8
Date:Wed, 08 Jun 2016 10:17:43 GMT
Server:Apache-Coyote/1.1
Transfer-Encoding:chunked
Vary:Accept-Encoding
**X-JSON:{"ERROR":[" "]}** /*expecting the response X-JSON:{"ERROR":["无效的代码"]}*/
Since the Chinese text arrives empty, I thought of sending Unicode escapes through the X-JSON header, like this:
{"ERROR":["\u65E0\u6548\u7684\u4EE3\u7801"]}
After that, I want to decode the Unicode escapes in JavaScript after evaluating the X-JSON header, like this:
var json;
try {
json = xhr.getResponseHeader('X-Json');
} catch (e) {
alert(e);
}
if (json) {
var data = eval('(' + json + ')');
decodeMsg(data);
}
function decodeMsg(message) {
var mssg = message;
var r = /\\u([\d\w]{4})/gi;
mssg = mssg.replace(r, function (match, grp) {
return String.fromCharCode(parseInt(grp, 16)); } );
mssg = unescape(mssg);
return mssg;
}
Please give suggestions. Thank you.
Update of answer:
Originally .properties files were encoded in Latin-1 (ISO-8859-1), which covers characters such as é and ö.
Anything outside that range needed u-escaping to reach the full Unicode range of characters.
However, newer Java versions (Java 9 and later, for property resource bundles) try UTF-8 first. So you can keep the .properties file in UTF-8, which is a tremendous improvement.
Original answer (.properties in ISO-8859-1, as it has been since Java 1):
The error is that in HTTP the header lines are in ISO-8859-1, basic Latin-1.
One solution is %XX conversion (percent-encoding) of the UTF-8 bytes.
However, in the case of JSON you are better served by simply doing what you intended.
So you want to send u-escaped Unicode, using \uXXXX. Since not only Java but also JavaScript/JSON knows this convention, you only need to do the u-escaping in Java on the server.
static String uescape(String s) {
StringBuilder sb = new StringBuilder(s.length() * 6);
for (int i = 0; i < s.length(); ++i) {
char ch = s.charAt(i);
if (ch < 128) {
sb.append(ch);
} else {
sb.append(String.format("\\u%04X", (int) ch));
}
}
return sb.toString();
}
errors.add(uescape(str));
This writes every non-ASCII char (>= 128) as zero-padded 4-digit hex, which is exactly the \uXXXX format.
Alternatively, use Apache Commons StringEscapeUtils.escapeJava, which also escapes quotes, \n and the like, and is much safer.
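For completeness, the %XX conversion of the UTF-8 bytes mentioned above could look roughly like this. It is a sketch using only java.net.URLEncoder; the class name HeaderValueEncoder and the method percentEncode are made up for illustration, and the client is assumed to reverse it with decodeURIComponent.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class HeaderValueEncoder {

    // Percent-encodes the UTF-8 bytes of the value so that only ASCII remains.
    static String percentEncode(String value) {
        try {
            // URLEncoder produces form encoding, so spaces become '+';
            // turn them into %20 so that decodeURIComponent restores them.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The Chinese sample text from the question, written with escapes
        String str = "\u65E0\u6548\u7684\u4EE3\u7801";
        System.out.println(percentEncode(str));
        // prints %E6%97%A0%E6%95%88%E7%9A%84%E4%BB%A3%E7%A0%81
    }
}
```

The result is pure ASCII and therefore survives the Latin-1 header, but unlike the u-escaping it has to be decoded explicitly on the client before the JSON is evaluated.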
Escape the backslashes in your properties file by doubling them:
Account.label.register = \\u5BC4\\u5B58\\u5668
Account.label.login = \\u767B\\u5F55
Account.label.username = \\u7528\\u6237\\u540D
Account.label.password = \\u5BC6\\u7801
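The effect of doubling the backslashes can be checked with java.util.Properties alone. The following is a small sketch; the class name, the helper loadValue and the property name "key" are made up for the demonstration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesEscapeDemo {

    // Parses a single "key = value" line the way a .properties file is parsed.
    static String loadValue(String propertiesLine) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesLine));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("key");
    }

    public static void main(String[] args) {
        // Java string literals need their own escaping: two backslashes here
        // are one backslash in the file, four here are two in the file.
        String single = loadValue("key = \\u767B");
        String doubled = loadValue("key = \\\\u767B");

        System.out.println(single.length());  // 1, the escape became one character
        System.out.println(doubled.length()); // 6, backslash, 'u' and four hex digits
    }
}
```

With a single backslash the loader converts the escape into the character itself; with a doubled backslash the value keeps the literal escape text, which is what the question asks for.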
I've created a custom command to retrieve multiple objects in the same request (in order to solve some performance issues), instead of using the folder method .getMessage(..), which in my case retrieved an IMAPMessage object:
Argument args = new Argument();
args.writeString(Integer.toString(start) + ":" + Integer.toString(end));
args.writeString("BODY[]");
FetchResponse fetch;
BODY body;
MimeMessage mm;
ByteArrayInputStream is = null;
Response[] r = protocol.command("FETCH", args);
Response status = r[r.length-1];
if(status.isOK()) {
for (int i = 0; i < r.length - 1; i++) {
...
}
}
Currently I'm checking whether the object is an IMAPResponse like this:
if (r[i] instanceof IMAPResponse) {
IMAPResponse imr = (IMAPResponse)r[i];
My question is: how can I turn this response into an IMAPMessage?
Thank you.
Are you trying to download the entire message content for multiple messages at once? Have you tried using IMAPFolder.FetchProfileItem.MESSAGE? That will cause Folder.fetch to download the entire message content, which you can then access using the Message objects.
I haven't yet succeeded in converting it into an IMAPMessage, but I'm now able to transform it into a MimeMessage. It isn't perfect, but I guess it will have to do for now:
FetchResponse fetch = (FetchResponse) r[i];
BODY body = (BODY) fetch.getItem(0);
ByteArrayInputStream is = body.getByteArrayInputStream();
MimeMessage mm = new MimeMessage(session, is);
Then, it can be used to get information like this:
String contentType = mm.getContentType();
Object contentObject = mm.getContent();
There are also other methods to get information like the sender, date, etc.
I am currently working with the JavaMail API. I need to list attachment details, and I also want to remove the attachments from some emails and forward them to others. So I'm trying to find out the attachment ID. How can I do it? Any suggestion will be appreciated!
Does this help?
private void getAttachments(Part p, File inputFolder, List<String> fileNames) throws Exception {
    if (p.isMimeType("multipart/*")) {
        // Recurse into every body part of the multipart
        Multipart mp = (Multipart) p.getContent();
        int count = mp.getCount();
        for (int i = 0; i < count; i++) {
            getAttachments(mp.getBodyPart(i), inputFolder, fileNames);
        }
    } else {
        String disp = p.getDisposition();
        if (disp == null || disp.equalsIgnoreCase(Part.ATTACHMENT) || disp.equalsIgnoreCase(Part.INLINE)) {
            String fileName = p.getFileName();
            // Parts without a file name (e.g. the plain body text) cannot be saved under a name
            if (fileName != null && p instanceof MimeBodyPart) {
                File opFile = new File(inputFolder, fileName);
                ((MimeBodyPart) p).saveFile(opFile);
                fileNames.add(fileName);
            }
        }
    }
}
There is no such thing as an attachment ID. What your mail client displays as a message with attached contents is really a MIME multipart and looks like this (sample source):
From: John Doe <example@example.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XXXXboundary text"
This is a multipart message in MIME format.
--XXXXboundary text
Content-Type: text/plain
this is the body text
--XXXXboundary text
Content-Type: text/plain;
Content-Disposition: attachment; filename="test.txt"
this is the attachment text
--XXXXboundary text--
Important things to note:
Every part in a multipart has a Content-Type
Optionally, there can be a Content-Disposition header
Single parts can be themselves multipart
Note that there is indeed a Content-ID header, but I don't think it's what you are looking for: for example, it is used in multipart/related messages to embed image/*s and text from a text/html in the same email message. You have to understand how it works and if it's used in your input.
I think your best option is to examine the Content-Disposition and the Content-Type header. The rest is guesswork, and without actual requirement one can't help with the code.
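As a rough illustration of examining those two headers, the sketch below classifies a part from its raw Content-Type and Content-Disposition values. It is not JavaMail code; the class PartClassifier, the method isAttachment and especially the fallback rule for parts without a disposition are assumptions, i.e. exactly the kind of guesswork mentioned above.

```java
import java.util.Locale;

public class PartClassifier {

    // Decides whether a part should be treated as an attachment,
    // given its raw header values (either may be null if the header is absent).
    static boolean isAttachment(String contentType, String contentDisposition) {
        if (contentDisposition != null) {
            // An explicit disposition wins: "attachment; filename=..." vs "inline"
            String disposition = contentDisposition.trim().toLowerCase(Locale.ROOT);
            return disposition.startsWith("attachment");
        }
        // Assumed heuristic: without a disposition, text and multipart parts
        // are body content and everything else is an attachment.
        // A missing Content-Type defaults to text/plain in MIME.
        String type = contentType == null
                ? "text/plain"
                : contentType.trim().toLowerCase(Locale.ROOT);
        return !(type.startsWith("text/") || type.startsWith("multipart/"));
    }

    public static void main(String[] args) {
        // The two parts of the sample message above
        System.out.println(isAttachment("text/plain", null)); // false
        System.out.println(isAttachment("text/plain;",
                "attachment; filename=\"test.txt\""));        // true
    }
}
```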
Try using the Apache Commons Email package which has a MimeMessageParser class. With the parser you can get the content id (which could be used to identify the attachment) and attachments from the email message like so:
Session session = Session.getInstance(new Properties());
ByteArrayInputStream is = new ByteArrayInputStream(rawEmail.getBytes());
MimeMessage message = new MimeMessage(session, is);
// parse() has to be called before the parser's getters return anything
MimeMessageParser parser = new MimeMessageParser(message).parse();
// Once you have the parser, get the content ids and attachments:
List<DataSource> attachments = parser.getContentIds().stream()
.map(id -> parser.findAttachmentByCid(id))
.filter(att -> att != null)
.collect(Collectors.toList());
I have created a list here for the sake of brevity, but instead, you could create a map with the contentId as the key and the DataSource as the value.
Take a look at some more examples for using the parser in java here, or some code I wrote for a scala project here.
I have a Grails app that includes a Java-based Spring Integration driven email adapter. The email adapter processes emails from a single source and, based on business rules, reports certain communications back to the user by updating some internal tables, including adding the HTML body of the email to an Oracle CLOB for reference.
About half the time, the links in the HTML are corrupted when they are added to the CLOB. For example, "=df" is interpreted as Unicode U+00DF and converted to "ß" (LATIN SMALL LETTER SHARP S), and "=20" is converted to a space. Both of these unexpected mappings break the links. Here is a link before and after the corruption:
http://www.mycompany.com/MyProject/MyApp.xxx?field1=dfa1.0&field2=2.0&field3=20012345&field4=N
http://www.mycompany.com/MyProject/MyApp.xxx?field1ßa1.0&field2=2.0&field3 012345&field4=N
This corruption doesn't happen all the time and I haven't been able to identify a pattern to when it happens.
This is the only code that "touches" the content of the HTML from the email...
public void processMessage(Message<?> message) {
if (message.getPayload() instanceof MimeMessage) {
MimeMessage mimeMessage = (MimeMessage) message.getPayload();
try {
String subject = mimeMessage.getSubject();
logger.info("Subject : " + subject);
// Get the main body of the message -- Assumes the email is in HTML format and
// uses that to isolate the interesting bits of the email to analyze
String content = convertStreamToString(MimeUtility.decode(mimeMessage.getDataHandler().getDataSource().getInputStream(), "quoted-printable"));
logger.info("Content Length (bytes) : " + content.length());
int htmlStart = content.indexOf(HTML_START);
int htmlEnd = content.lastIndexOf(HTML_END);
String html;
try {
html = content.substring(htmlStart, htmlEnd + HTML_END.length());
} catch (IndexOutOfBoundsException e) {
// Don't try and prune the string
html = content;
}
// Do the major processing of the actual HTML contents. This is where the magic happens.
processHtmlMessageContent(html);
} catch (MessagingException e) {
logger.error("Error in processing message:", e);
} catch (IOException e) {
logger.error("Error in processing message:", e);
}
} else {
logger.error("DON'T KNOW HOW TO PROCESS [" + message.getPayload().getClass() + "] MESSAGE");
}
logger.info("Done.");
}
I suspect the issue is in convertStreamToString or MimeUtility.decode, but I haven't been able to isolate it. I'm also not ruling out some strangeness when this is stored in a CLOB, but I find this less likely.
For reference, my convertStreamToString() method is...
protected String convertStreamToString(java.io.InputStream is) {
try {
return new java.util.Scanner(is).useDelimiter("\\A").next();
} catch (java.util.NoSuchElementException e) {
return "";
}
}
I tried changing...
String content = convertStreamToString(MimeUtility.decode(mimeMessage.getDataHandler().getDataSource().getInputStream(), "quoted-printable"));
to...
String content = convertStreamToString(mimeMessage.getDataHandler().getDataSource().getInputStream());
But now I've lost basic mime decoding.
I also tried using MimeUtility to get the encoding
String encoding = MimeUtility.getEncoding(mimeMessage.getDataHandler().getDataSource());
This returns 7bit and I've tried using that, but then I wind up with things like =3D for equals signs.
In the decoded content, I get the following, which indicates quoted-printable
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
I've looked through the javadocs, source and online examples, but this really isn't clicking for me.
The conversions you're seeing are exactly quoted-printable decoding, so I suspect you're trying to decode data that wasn't QP encoded in the first place. You should probably check the headers of the mimeMessage to decide what decoding you need to do, rather than just doing QP unconditionally.
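The effect can be reproduced without JavaMail. The sketch below contains a deliberately minimal quoted-printable decoder (no soft line breaks) and a helper that decodes only when the Content-Transfer-Encoding says so; the class and method names are made up, and base64 is left out to keep it short.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class TransferDecodingDemo {

    // Minimal quoted-printable decoder: "=" followed by two hex digits becomes one byte.
    // Soft line breaks ("=" at the end of a line) are not handled in this sketch.
    static String decodeQuotedPrintable(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '=' && i + 2 < text.length()) {
                int hi = Character.digit(text.charAt(i + 1), 16);
                int lo = Character.digit(text.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write(hi * 16 + lo);
                    i += 2; // skip the two hex digits
                    continue;
                }
            }
            out.write(c); // input is assumed to be Latin-1, one byte per char
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Decodes only if the header asks for it; other encodings pass through unchanged.
    static String decodeBody(String transferEncoding, String body) {
        if ("quoted-printable".equalsIgnoreCase(transferEncoding)) {
            return decodeQuotedPrintable(body);
        }
        return body;
    }

    public static void main(String[] args) {
        String link = "http://www.mycompany.com/MyProject/MyApp.xxx"
                + "?field1=dfa1.0&field2=2.0&field3=20012345&field4=N";
        // Decoding text that is not QP encoded corrupts "=df" and "=20"
        System.out.println(decodeQuotedPrintable(link));
        // Honouring the header leaves the already-decoded link intact
        System.out.println(decodeBody("7bit", link));
    }
}
```

The first call produces exactly the broken link from the question, while genuinely encoded input such as "=3D" is still turned back into "=" when the header is quoted-printable.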