I have a Grails app that includes a Java-based Spring Integration driven email adapter. The email adapter processes emails from a single source and, based on business rules, reports certain communications back to the user by updating some internal tables, including adding the HTML body of the email to an Oracle CLOB for reference.
About half the time, the links in the HTML are corrupted when they are added to the CLOB. For example, "=df" is interpreted as Unicode U+00DF and converted to "ß" (LATIN SMALL LETTER SHARP S), and "=20" is converted to a space. Both of these unexpected mappings break the links.
http://www.mycompany.com/MyProject/MyApp.xxx?field1=dfa1.0&field2=2.0&field3=20012345&field4=N
http://www.mycompany.com/MyProject/MyApp.xxx?field1ßa1.0&field2=2.0&field3 012345&field4=N
This corruption doesn't happen all the time and I haven't been able to identify a pattern to when it happens.
This is the only code that "touches" the content of the HTML from the email...
public void processMessage(Message<?> message) {
    if (message.getPayload() instanceof MimeMessage) {
        MimeMessage mimeMessage = (MimeMessage) message.getPayload();
        try {
            String subject = mimeMessage.getSubject();
            logger.info("Subject : " + subject);
            // Get the main body of the message -- Assumes the email is in HTML format and
            // uses that to isolate the interesting bits of the email to analyze
            String content = convertStreamToString(MimeUtility.decode(mimeMessage.getDataHandler().getDataSource().getInputStream(), "quoted-printable"));
            logger.info("Content Length (bytes) : " + content.length());
            int htmlStart = content.indexOf(HTML_START);
            int htmlEnd = content.lastIndexOf(HTML_END);
            String html;
            try {
                html = content.substring(htmlStart, htmlEnd + HTML_END.length());
            } catch (IndexOutOfBoundsException e) {
                // Don't try and prune the string
                html = content;
            }
            // Do the major processing of the actual HTML contents. This is where the magic happens.
            processHtmlMessageContent(html);
        } catch (MessagingException e) {
            logger.error("Error in processing message:", e);
        } catch (IOException e) {
            logger.error("Error in processing message:", e);
        }
    } else {
        logger.error("DON'T KNOW HOW TO PROCESS [" + message.getPayload().getClass() + "] MESSAGE");
    }
    logger.info("Done.");
}
I suspect the issue is in convertStreamToString or MimeUtility.decode, but I haven't been able to isolate it. I'm also not ruling out some strangeness when this is stored in a CLOB, but I find this less likely.
For reference, my convertStreamToString() method is...
protected String convertStreamToString(java.io.InputStream is) {
    try {
        return new java.util.Scanner(is).useDelimiter("\\A").next();
    } catch (java.util.NoSuchElementException e) {
        return "";
    }
}
I tried changing...
String content = convertStreamToString(MimeUtility.decode(mimeMessage.getDataHandler().getDataSource().getInputStream(), "quoted-printable"));
to...
String content = convertStreamToString(mimeMessage.getDataHandler().getDataSource().getInputStream());
But now I've lost basic MIME decoding.
I also tried using MimeUtility to get the encoding
String encoding = MimeUtility.getEncoding(mimeMessage.getDataHandler().getDataSource());
This returns 7bit and I've tried using that, but then I wind up with things like =3D for equals signs.
In the decoded content, I get the following, which indicates quoted-printable
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
I've looked through the javadocs, source and online examples, but this really isn't clicking for me.
The conversions you're seeing are exactly quoted-printable decoding, so I suspect you're trying to decode data that wasn't QP encoded in the first place. You should probably check the headers of the mimeMessage to decide what decoding you need to do, rather than just doing QP unconditionally.
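For example, something along these lines might be safer (a sketch only; getEncoding() returns the message's declared Content-Transfer-Encoding, and the helper mirrors your convertStreamToString but pins the charset to the ISO-8859-1 declared in the Content-Type header you posted):

import java.io.IOException;
import java.io.InputStream;
import javax.mail.MessagingException;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeUtility;

public class EncodingAwareReader {

    // Decode according to the declared transfer encoding instead of assuming quoted-printable.
    public static String readContent(MimeMessage mimeMessage) throws MessagingException, IOException {
        String encoding = mimeMessage.getEncoding(); // e.g. "quoted-printable", "base64", "7bit"
        InputStream raw = mimeMessage.getDataHandler().getDataSource().getInputStream();
        InputStream decoded = (encoding != null) ? MimeUtility.decode(raw, encoding) : raw;
        return convertStreamToString(decoded);
    }

    private static String convertStreamToString(InputStream is) {
        java.util.Scanner s = new java.util.Scanner(is, "ISO-8859-1").useDelimiter("\\A");
        return s.hasNext() ? s.next() : "";
    }
}

Note that with a multipart message each body part carries its own Content-Transfer-Encoding header, so the top-level getEncoding() may well report 7bit while the inner text/html part is quoted-printable; in that case you have to inspect the individual part rather than the whole message.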
I was looking at the JavaMail FAQ, specifically at this snippet which is supposed to extract the body of the email:
private boolean textIsHtml = false;

/**
 * Return the primary text content of the message.
 */
private String getText(Part p) throws MessagingException, IOException {
    if (p.isMimeType("text/*")) {
        String s = (String) p.getContent();
        textIsHtml = p.isMimeType("text/html");
        return s;
    }

    if (p.isMimeType("multipart/alternative")) {
        // prefer html text over plain text
        Multipart mp = (Multipart) p.getContent();
        String text = null;
        for (int i = 0; i < mp.getCount(); i++) {
            Part bp = mp.getBodyPart(i);
            if (bp.isMimeType("text/plain")) {
                if (text == null)
                    text = getText(bp);
                continue;
            } else if (bp.isMimeType("text/html")) {
                String s = getText(bp);
                if (s != null)
                    return s;
            } else {
                return getText(bp);
            }
        }
        return text;
    } else if (p.isMimeType("multipart/*")) {
        Multipart mp = (Multipart) p.getContent();
        for (int i = 0; i < mp.getCount(); i++) {
            String s = getText(mp.getBodyPart(i));
            if (s != null)
                return s;
        }
    }

    return null;
}
Now the code can be refactored to the following version, which is basically fewer lines of code:
private static String getText(Part message) {
    String text = null;
    try {
        if (message.isMimeType("text/*")) {
            text = (String) message.getContent();
        }
        if (message.isMimeType("multipart/alternative") || message.isMimeType("multipart/*")) {
            Multipart multiPart = (Multipart) message.getContent();
            Part bodyPart = multiPart.getBodyPart(multiPart.getCount() - 1);
            text = getText(bodyPart);
        }
    } catch (Exception e) {
        logger.error(e.getMessage());
    }
    return text;
}
My question is, why is the old code looping through the parts for both multipart/alternative and multipart/* messages? Am I missing something here?
Update:
Just saw Jon's comment. I have a further question: is there any scenario where my version of the code will break?
Basically, there are many multipart subtypes and they all need to be handled uniquely:
Mixed Subtype
The "mixed" subtype of "multipart" is intended for use when the body
parts are independent and need to be bundled in a particular order.
Any "multipart" subtypes that an implementation does not recognize
must be treated as being of subtype "mixed".
Alternative Subtype
The "multipart/alternative" type is syntactically identical to
"multipart/mixed", but the semantics are different. In particular,
each of the body parts is an "alternative" version of the same
information.
Systems should recognize that the content of the various parts are interchangeable. Systems should choose the "best" type based on the local environment and preferences, in some cases even through user interaction. As with "multipart/mixed", the order of body parts is significant. In this case, the alternatives appear in an order of increasing faithfulness to the original content.
In general, the best choice is the LAST part of a type supported by the recipient system's local environment.
"Multipart/alternative" may be used, for example, to send a message
in a fancy text format in such a way that it can easily be displayed
anywhere:
From: Nathaniel Borenstein <nsb@bellcore.com>
To: Ned Freed <ned@innosoft.com>
Date: Mon, 22 Mar 1993 09:41:09 -0800 (PST)
Subject: Formatted text mail
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=boundary42
--boundary42
Content-Type: text/plain; charset=us-ascii
... plain text version of message goes here ...
--boundary42
Content-Type: text/enriched
... RFC 1896 text/enriched version of same message
goes here ...
--boundary42
Content-Type: application/x-whatever
... fanciest version of same message goes here ...
--boundary42--
In this example, users whose mail systems understood the
"application/x-whatever" format would see only the fancy version,
while other users would see only the enriched or plain text version,
depending on the capabilities of their system.
Your code won't "work" (whatever that means to you) with a multipart/mixed message where the last attachment is of type text/*. Yes, attachments can be of type text/*.
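To make that concrete, here is a small hypothetical sketch with JavaMail: a multipart/mixed message whose last part is a text/plain attachment. The refactored version that only looks at getBodyPart(getCount() - 1) would return the contents of notes.txt, while the FAQ version walks the parts and returns the HTML body.

import java.util.Properties;
import javax.mail.Part;
import javax.mail.Session;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeMessage;
import javax.mail.internet.MimeMultipart;

public class MixedWithTextAttachment {
    public static void main(String[] args) throws Exception {
        Session session = Session.getDefaultInstance(new Properties());
        MimeMessage message = new MimeMessage(session);

        // Part 0: the actual message body
        MimeBodyPart body = new MimeBodyPart();
        body.setContent("<html><body>The real message</body></html>", "text/html");

        // Part 1 (the LAST part): a plain-text attachment
        MimeBodyPart attachment = new MimeBodyPart();
        attachment.setText("just some notes");
        attachment.setFileName("notes.txt");
        attachment.setDisposition(Part.ATTACHMENT);

        MimeMultipart mixed = new MimeMultipart("mixed");
        mixed.addBodyPart(body);
        mixed.addBodyPart(attachment);
        message.setContent(mixed);
        message.saveChanges();

        // The refactored getText(message) grabs the last body part,
        // i.e. "just some notes", instead of the HTML body.
    }
}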
I have a base64 String which I want to convert back to an image, regardless of the image format, on the server side. I tried it using the following code; the image is getting created, but when I try to preview it, I get an error saying the image could not be loaded.
public void convertStringToImage(String base64) {
    try {
        byte[] imageByteArray = decodeImage(base64);
        FileOutputStream imageOutFile = new FileOutputStream("./src/main/resources/demo.jpg");
        imageOutFile.write(imageByteArray);
        imageOutFile.close();
    } catch (Exception e) {
        logger.log(Level.SEVERE, "ImageStoreManager::convertStringToImage()" + e);
    }
}

public static byte[] decodeImage(String imageDataString) {
    return Base64.decodeBase64(imageDataString);
}
What should I do so that my image will display properly?
Your code looks fine. I can, however, suggest some more debugging steps for you.
Encode your file manually using, for example, this webpage
Check whether the base64 String contains exactly the same content as you see on that page. // if something is wrong here, your request is corrupted; maybe some encoding issue on the frontend side?
Inspect the file created under ./src/main/resources/demo.jpg and compare its content (size, binary comparison). // if something is wrong here, you will know that the save operation itself is broken
Remarks:
Did you try calling .flush() before close()?
Your code in its current form might cause a resource leak; have a look at try-with-resources.
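For instance, your method could be rewritten like this (same path, logger and decodeImage helper as in your snippet; the stream is closed automatically even if write() throws):

public void convertStringToImage(String base64) {
    byte[] imageByteArray = decodeImage(base64);
    // try-with-resources closes the stream automatically, even if write() throws
    try (FileOutputStream imageOutFile = new FileOutputStream("./src/main/resources/demo.jpg")) {
        imageOutFile.write(imageByteArray);
        imageOutFile.flush();
    } catch (Exception e) {
        logger.log(Level.SEVERE, "ImageStoreManager::convertStringToImage()", e);
    }
}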
Try this:
public static byte[] decodeImage(String imageDataString) {
    return org.apache.commons.codec.binary.Base64.decodeBase64(imageDataString.getBytes());
}
I am using PDFBox to determine whether a PDF file is password protected or not.
This is my code:
boolean isProtected = pdfDocument.isEncrypted();
My file properties are shown in the screenshot.
Here I am getting isProtected = true even though I can open the file without a password.
Note: this file has Document Open Password: No and Permissions Password: Yes.
Your PDF has an empty user password and a non-empty owner password. And yes, it is encrypted. This is done to prevent people from doing certain things, e.g. copying content.
It isn't real security; it is the responsibility of the viewer software to take care that the "forbidden" operations aren't allowed.
You can find a longer (and a bit amusing) explanation here.
To see the document access permissions, use PDDocument.getCurrentAccessPermission().
In 2.0.*, a user will be able to view a file if this call succeeds:
PDDocument doc = PDDocument.load(file);
If an InvalidPasswordException is thrown, it means a non-empty password is required.
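Putting those two together, a small sketch for 2.0.x (the file name is just an example):

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException;

public class PermissionCheck {
    public static void main(String[] args) throws IOException {
        try (PDDocument doc = PDDocument.load(new File("sample.pdf"))) {
            // Loads fine with an empty user password, even though isEncrypted() is true
            System.out.println("Encrypted: " + doc.isEncrypted());

            AccessPermission ap = doc.getCurrentAccessPermission();
            System.out.println("Can extract content: " + ap.canExtractContent());
            System.out.println("Can print: " + ap.canPrint());
            System.out.println("Can modify: " + ap.canModify());
        } catch (InvalidPasswordException e) {
            // A non-empty user password is required to open this document
            System.out.println("Document requires a password to open");
        }
    }
}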
I am posting this answer because elsewhere on Stack Overflow and the web you might see the suggested way to check for a password-protected PDF in PDFBox is to use PDDocument#isEncrypted(). The problem we found with this is that certain PDFs which did not prompt for a password were still being flagged as encrypted. See the accepted answer for one explanation of why this is happening, but in any case we used the following pattern as a workaround:
boolean isPDFReadable(byte[] fileContent) {
    PDDocument doc = null;
    try {
        doc = PDDocument.load(fileContent);
        doc.getPages(); // perhaps not necessary
        return true;
    }
    catch (InvalidPasswordException invalidPasswordException) {
        LOGGER.error("Unable to read password protected PDF.", invalidPasswordException);
    }
    catch (IOException io) {
        LOGGER.error("An error occurred while reading a PDF attachment during account submission.", io);
    }
    finally {
        // Only close here; returning from a finally block would swallow the failure cases
        if (!Objects.isNull(doc)) {
            try {
                doc.close();
            }
            catch (IOException io) {
                LOGGER.error("An error occurred while closing a PDF attachment ", io);
            }
        }
    }
    return false;
}
If the call to PDDocument#getPages() succeeds, then it also should mean that opening the PDF via double click or browser, without a password, should be possible.
I am new to the Stack Overflow forum. I have a question about remediating Fortify scan issues.
The HP Fortify scan is reporting a Resource Injection issue for the following code:
String testUrl = "http://google.com";
URL url = null;
try {
    url = new URL(testUrl);
} catch (MalformedURLException mue) {
    log.error("MalformedUrlException URL " + testUrl + " Exception : " + mue);
}
In the above code, Fortify flags Resource Injection on the line url = new URL(testUrl);
I have made the following code changes to validate the URL using ESAPI in order to remediate this issue:
String testUrl = "http://google.com";
URL url = null;
try {
    String canonURL = ESAPI.encoder().canonicalize(testUrl, false, false);
    if (ESAPI.validator().isValidInput("URLContext", canonURL, "URL", canonURL.length(), false)) {
        url = new URL(canonURL);
    } else {
        log.error("Invalid URL passed: " + canonURL);
    }
} catch (MalformedURLException mue) {
    log.error("MalformedUrlException URL " + testUrl + " Exception : " + mue);
}
However, the Fortify scan still reports this as an error; it is not remediating the issue. Am I doing anything wrong?
Any solution will help a lot.
Thanks,
Marimuthu.M
I think that the real issue here is not that the URL may be somehow malformed, but that the URL may not reference a valid site. More specifically, if I, the bad guy, am able to cause your URL to point to my web site, then you obtain data from my location that has not been vetted, and I can return data that may be used to compromise your system. I might use that to, say, return a record for "Bob the bad guy" that makes Bob look like a good guy.
I suspect that in your real code you do not use a hard-coded value in the string, since this issue is usually described with words such as:
When an application permits a user input to define a resource, like a
file name or port number, this data can be manipulated to execute or
access different resources.
(see https://www.owasp.org/index.php/Resource_Injection)
I think that the proper response will be some combination of:
Do not get the value from the user; instead, use the input to choose from your own internal list (see the sketch after this list).
Argue that the value came from a trusted source. For example, read from a strictly controlled database or configuration file.
You do not need to remove the warnings, you need to demonstrate that you understand the risk and indicate why it is OK to use the value in your case.
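As a rough illustration of the first option (the key names and URLs here are made up), the user's input only selects a key, and the value that reaches new URL() is always one of your own hard-coded strings:

import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

public class UrlWhitelist {

    // The only URLs the application will ever open; user input never reaches new URL() directly
    private static final Map<String, String> ALLOWED_URLS = new HashMap<>();
    static {
        ALLOWED_URLS.put("search", "http://google.com");
        ALLOWED_URLS.put("home", "http://www.mycompany.com");
    }

    public static URL resolve(String userChoice) throws MalformedURLException {
        String target = ALLOWED_URLS.get(userChoice);
        if (target == null) {
            throw new IllegalArgumentException("Unknown URL key: " + userChoice);
        }
        return new URL(target); // the argument is a hard-coded, trusted value
    }
}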
boolean isValidInput(java.lang.String context,
                     java.lang.String input,
                     java.lang.String type,
                     int maxLength,
                     boolean allowNull)
              throws IntrusionException
The type field in the isValidInput function names a regular expression or pattern to match against your testUrl.
Like:
try {
    ESAPI.validator().getValidInput("URI_VALIDATION", requestUri, "URL", 80, false);
} catch (ValidationException e) {
    System.out.println("Validation exception");
    e.printStackTrace();
} catch (IntrusionException e) {
    System.out.println("Intrusion exception");
    e.printStackTrace();
}
It will pass if requestUri matches the pattern defined in validation.properties under Validator.URL and its length is less than 80.
Validator.URL=^(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\:\'\/\\\+=&%\$#_]*)?$
This is piggybacking on Andrew's answer, but the problem Fortify is warning you of is user control of a URL. If your application later decides to make connections to that website, and it is untrusted, this is an issue.
If this is an application where you care more about sharing public URIs, then you'll have to accept the risk, make sure users are properly trained on the inherent danger, and make sure that if you redisplay those URLs, nobody can embed malicious data in them.
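For example, when you redisplay such a URL, encode it for the output context first. A sketch only, using ESAPI since it is already on the classpath in the question (the class and method names below are made up for illustration):

import org.owasp.esapi.ESAPI;

public class UrlRedisplay {
    // Encode a user-supplied URL before echoing it back into a page so that
    // embedded markup or script is rendered inert.
    public static String safeForHtmlBody(String userSuppliedUrl) {
        return ESAPI.encoder().encodeForHTML(userSuppliedUrl);
    }

    public static String safeForHtmlAttribute(String userSuppliedUrl) {
        return ESAPI.encoder().encodeForHTMLAttribute(userSuppliedUrl);
    }
}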
How to check whether a PDF file is password protected or not in Java?
I know of several tools/libraries that can do this, but I want to know if this is possible with just a program in Java.
Update
As per mkl's comment below this answer, it seems that there are two types of PDF structures permitted by the specs: (1) cross-reference tables and (2) cross-reference streams. The following solution only addresses the first type of structure. This answer needs to be updated to address the second type.
====
All of the answers provided above refer to third-party libraries, which the OP is already aware of. The OP is asking for a native Java approach. My answer is yes, you can do it, but it will require a lot of work.
It will require a two step process:
Step 1: Figure out if the PDF is encrypted
As per Adobe's PDF 1.7 specs (page numbers 97 and 115), if the trailer dictionary contains the key "/Encrypt", the PDF is encrypted (the encryption could be simple password protection, RC4, AES, or some custom encryption). Here's some sample code:
Boolean isEncrypted = Boolean.FALSE;
try {
    byte[] byteArray = Files.readAllBytes(Paths.get("Resources/1.pdf"));
    // Convert the binary bytes to String. Caution, it can result in loss of data. But for our
    // purposes, we are simply interested in the String portion of the binary pdf data. So we
    // should be fine.
    String pdfContent = new String(byteArray);
    int lastTrailerIndex = pdfContent.lastIndexOf("trailer");
    if (lastTrailerIndex >= 0 && lastTrailerIndex < pdfContent.length()) {
        String newString = pdfContent.substring(lastTrailerIndex, pdfContent.length());
        int firstEOFIndex = newString.indexOf("%%EOF");
        String trailer = newString.substring(0, firstEOFIndex);
        if (trailer.contains("/Encrypt"))
            isEncrypted = Boolean.TRUE;
    }
}
catch (Exception e) {
    System.out.println(e);
    // Do nothing
}
Step 2: Figure out the encryption type
This step is more complex. Here is the algorithm, with a rough sketch after the steps:
Read the value of the key "/Encrypt" from the trailer as read in the step 1 above. E.g. the value is 288 0 R.
Look for the bytes "288 0 obj". This is the location of the "encryption dictionary" object in the document. This object boundary ends at the string "endobj".
Look for the key "/Filter" in this object. The "Filter" is the one that identifies the document's security handler. If the value of the "/Filter" is "/Standard", the document uses the built-in password-based security handler.
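A rough sketch of those steps (simplistic string matching only, no handling of cross-reference streams; pdfContent and trailer are the strings built in Step 1, and the object number is just an example):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncryptionTypeSketch {
    // Rough sketch of Step 2. pdfContent and trailer come from Step 1.
    static String securityHandler(String pdfContent, String trailer) {
        // 1. Read the value of /Encrypt from the trailer, e.g. "288 0 R"
        Matcher m = Pattern.compile("/Encrypt\\s+(\\d+)\\s+(\\d+)\\s+R").matcher(trailer);
        if (!m.find()) {
            return null; // no encryption dictionary referenced
        }
        // 2. Locate the encryption dictionary object, e.g. "288 0 obj ... endobj"
        int objStart = pdfContent.indexOf(m.group(1) + " " + m.group(2) + " obj");
        if (objStart < 0) {
            return null;
        }
        String dictionary = pdfContent.substring(objStart, pdfContent.indexOf("endobj", objStart));
        // 3. /Filter /Standard means the built-in password-based security handler
        return dictionary.contains("/Standard") ? "Standard (password-based)" : "Custom or other handler";
    }
}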
If you just want to know whether the PDF is encrypted, without worrying about whether the encryption is in the form of an owner/user password or some more advanced algorithm, you don't need Step 2 above.
Hope this helps.
you can use PDFBox:
http://pdfbox.apache.org/
code example :
try
{
    document = PDDocument.load( yourPDFfile );
    if( document.isEncrypted() )
    {
        //ITS ENCRYPTED!
    }
}
Using Maven?
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.0</version>
</dependency>
Using the iText PDF API, we can identify a password-protected PDF.
Example :
try {
    new PdfReader("C:\\Password_protected.pdf");
} catch (BadPasswordException e) {
    System.out.println("PDF is password protected..");
} catch (Exception e) {
    e.printStackTrace();
}
You can also validate a PDF, i.e. check whether it can be opened with full (read/write) permissions, by using iText.
Following is the code snippet:
boolean isValidPdf = false;
try {
    InputStream tempStream = new FileInputStream(new File("path/to/pdffile.pdf"));
    PdfReader reader = new PdfReader(tempStream);
    isValidPdf = reader.isOpenedWithFullPermissions();
} catch (Exception e) {
    isValidPdf = false;
}
The correct "how to do it in Java" answer is per @vhs.
However, in any application, by far the simplest approach is to use the very lightweight pdfinfo tool to filter the encryption status. Here, using the Windows cmd shell, I can instantly get a report showing that two different copies of the same file are encrypted:
>forfiles /m *.pdf /C "cmd /c echo #file &pdfinfo #file|find /i \"Encrypted\""
"Certificate (9).pdf"
Encrypted: no
"ds872 source form.pdf"
Encrypted: AES 128-bit
"ds872 filled form.pdf"
Encrypted: AES 128-bit
"How to extract data from a particular area in a PDF file - Stack Overflow.pdf"
Encrypted: no
"Test.pdf"
Encrypted: no
>
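If you need the same check from inside a Java program rather than a batch one-liner, you can shell out to pdfinfo in the usual way (a sketch only; it assumes pdfinfo is installed and on the PATH):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class PdfInfoCheck {
    // Returns the value of the "Encrypted:" line reported by pdfinfo, e.g. "no" or "AES 128-bit"
    static String encryptionStatus(String pdfPath) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("pdfinfo", pdfPath)
                .redirectErrorStream(true)
                .start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("Encrypted:")) {
                    return line.substring("Encrypted:".length()).trim();
                }
            }
        } finally {
            process.waitFor();
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encryptionStatus("Test.pdf"));
    }
}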
The solution:
1) Install PDF Parser http://www.pdfparser.org/
2) Edit Parser.php in this section:
if (isset($xref['trailer']['encrypt'])) {
    echo('Your Alert message');
    exit();
}
3) In your .php form post handler (e.g. upload.php), insert this:
First, require '...yourdir.../vendor/autoload.php';
Then write this function:
function pdftest_is_encrypted($form) {
    $parser = new \Smalot\PdfParser\Parser();
    $pdf = $parser->parseFile($form);
}
and then call the function
pdftest_is_encrypted($_FILES["upfile"]["tmp_name"]);
That's all. If you try to load a password-protected PDF, the system will return the error "Your Alert message".