Chinese letters not read properly using Java Mail API - java

I have an email listener which reads mail from Gmail. When I send a mail from an Outlook client that contains Chinese characters, the encoding is set to gb2312, which causes an improper result from part.getContent() in the JavaMail API.
If the encoding from the client is set to Chinese Big5 the program works properly, but we can't change the encoding in the Outlook client. Is there a way to read the mail with the JavaMail API while setting the content type, or any alternate approach to get the proper content?

https://community.oracle.com/message/5440489#5440489
Use the GBK charset to read the content for all GB2312 files, since GB2312 is a subset of GBK.

The following should then work with a bit of luck:
String content = mail. ...
// The bytes as sent, and then interpreted as gb2312:
byte[] bytes = content.getBytes("gb2312");
// Now correctly interpret the bytes as Big5:
content = new String(bytes, "Big5");
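Alternatively, a minimal sketch of the GBK suggestion above, assuming you have access to the JavaMail Part (the helper name is illustrative; Part.getInputStream() and InputStream.readAllBytes() are standard API, the latter requiring Java 9+):
import java.io.InputStream;
import java.nio.charset.Charset;
import javax.mail.Part;

// Illustrative helper: force GBK decoding of a text part whose header
// declares charset=gb2312 (GB2312 is a subset of GBK, so GBK can decode it).
public class GbkPartReader {
    public static String readAsGbk(Part part) throws Exception {
        try (InputStream raw = part.getInputStream()) {
            byte[] bytes = raw.readAllBytes();
            return new String(bytes, Charset.forName("GBK"));
        }
    }
}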

Related

HL7V2 HAPI parser exception while receiving data via TCP/IP

I'm using the HAPI hapi-structures-v25 library, version 2.3, to parse HL7v2 messages and convert them into FHIR resources. I'm facing a strange issue while receiving and parsing an HL7v2 message with HAPI via a TCP listener. The error is:
Determine encoding for message. The following is the first 50 chars of the message for reference, although this may not be where the issue is: MSH|^~\&|test|DrJhonDoe|TEST|UNKNOWN|20210216190432||ADT^A01^ADT_A01|60b647d4-b5a5-4fae-a928-d4a3849de3c8|T|2.5
Strangely, I'm not getting this error when I send this message as a string from the main function. I get it only when I receive the data over TCP/IP in my Java function. I tried sending the HL7 message to my receiving TCP port using Mirth as well as an external tool, and the result is the same.
Here is a sample of the HL7v2 message I'm trying to process:
MSH|^~\\&|test|Dr.JhonDoe|TEST|UNKNOWN|20210216190432.7||ADT^A01^ADT_A01|60b647d4b5a54faea928d4a3849de3c8|T|2.5
EVN||20210216|20210216|
While receiving the data over TCP/IP, I'm converting the bytes to a string using the UTF-8 charset:
InputStream in = connection.getInputStream();
OutputStream out = connection.getOutputStream();
receivedMessageSize = in.read(receivedByteBuffer);
String incomingHl7Message = new String(receivedByteBuffer, 0, receivedMessageSize, StandardCharsets.UTF_8);
I'm getting the message properly, but I'm not sure why the error occurs.
As mentioned in the answer by Amit, these characters need to be escaped in Java. When HL7v2 is transmitted via MLLP, <VT> and <CR> control characters are added to the text. The point to understand is that these are not junk characters: the MLLP protocol uses them to mark the start and end of a frame.
The HAPI HL7 parser cannot parse these special (non-printable) characters. Happily, I found a solution on the same forum to handle this cleanly in Java: How to remove control characters from java string?
A simple regex will do the trick as shown below:
.replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "");
Also make sure your escape characters are handled properly in Java. Java does not treat the backslash literally in string literals, so escape it with .replace("\\", "\\\\").
This will do the trick.
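For reference, a minimal sketch combining the two steps, assuming HAPI's standard ca.uhn.hl7v2.parser.PipeParser (the helper class itself is illustrative):
import ca.uhn.hl7v2.HL7Exception;
import ca.uhn.hl7v2.model.Message;
import ca.uhn.hl7v2.parser.PipeParser;

// Illustrative helper: strip MLLP framing / control characters
// (keeping CR, LF and TAB, since HL7v2 uses CR as the segment separator)
// before handing the message to the HAPI parser.
public class Hl7Cleaner {
    private static final PipeParser PARSER = new PipeParser();

    public static Message parseCleaned(String raw) throws HL7Exception {
        String cleaned = raw.replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "");
        return PARSER.parse(cleaned);
    }
}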

How to convert a mail body from iso-8859-2 into UTF-8

I'm using JavaMail to handle mail from my mailbox, and I have the following problem.
The content type of a mail is: Content-Type: text/plain; charset=iso-8859-2 (that's only in this case; I have to handle various mails with cp-1250 and many others).
To get the body of this mail I'm using:
mimeMessage.getContent().toString()
The content of the mail is returned, but it is not converted into UTF-8; it's still in iso-8859-2, which produces text like:
w ďż˝omďż˝y. instead of w Łomży.
How can I handle bodies with encodings other than UTF-8 and convert them to UTF-8? Is there any way to do this in JavaMail, i.e. can JavaMail convert a mail body from a given charset into UTF-8?
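One way to approach this, as a minimal sketch (the helper name is illustrative; the JavaMail calls are standard): a Java String is internally Unicode, so the real task is to decode the raw bytes with the charset declared in the Content-Type header; once decoded correctly, the text can be written out as UTF-8 wherever needed.
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.mail.internet.ContentType;
import javax.mail.internet.MimeMessage;

// Illustrative helper: decode the body with the charset declared in the
// Content-Type header (iso-8859-2, cp-1250, ...), yielding a proper Unicode String.
public class MailBodyDecoder {
    public static String decodeBody(MimeMessage msg) throws Exception {
        ContentType ct = new ContentType(msg.getContentType());
        String declared = ct.getParameter("charset");
        Charset cs = (declared != null) ? Charset.forName(declared) : StandardCharsets.UTF_8;
        try (InputStream raw = msg.getInputStream()) {
            return new String(raw.readAllBytes(), cs); // readAllBytes requires Java 9+
        }
    }
}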

JavaMail - Attachment filename not displaying UTF-8 characters correctly

I am trying to send mails that may contain UTF-8 characters in the subject, the message body and the attachment file name.
I am able to send UTF-8 characters in the subject as well as the message body. However, when I send an attachment whose file name contains UTF-8 characters, the name is not displayed correctly.
So my question is: how can I set the attachment filename as UTF-8?
Here is part of my code:
MimeBodyPart pdfPart = new MimeBodyPart();
pdfPart.setDataHandler(new DataHandler(ds));
pdfPart.setFileName(filename);
mimeMultipart.addBodyPart(pdfPart);
Later edit:
I replaced
pdfPart.setFileName(filename);
with
pdfPart.setFileName(MimeUtility.encodeText(filename, "UTF-8", null));
and it is working perfectly.
Thanks all.
MIME headers (like Subject or Content-Disposition) must be MIME-encoded if they contain non-ASCII characters.
The encoding is either "quoted-printable" or "base64"; I recommend quoted-printable.
See here: Java: Encode String in quoted-printable
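As a concrete sketch of that recommendation using JavaMail itself (the "Q" argument to MimeUtility.encodeText selects the quoted-printable form of RFC 2047 encoding; the DataSource and filename are assumed to come from the code above):
import javax.activation.DataHandler;
import javax.activation.DataSource;
import javax.mail.internet.MimeBodyPart;
import javax.mail.internet.MimeUtility;

// Illustrative sketch: attach a file whose name contains non-ASCII characters,
// RFC 2047 "Q" (quoted-printable) encoding the filename so clients decode it.
public class Utf8Attachment {
    public static MimeBodyPart build(DataSource ds, String filename) throws Exception {
        MimeBodyPart pdfPart = new MimeBodyPart();
        pdfPart.setDataHandler(new DataHandler(ds));
        pdfPart.setFileName(MimeUtility.encodeText(filename, "UTF-8", "Q"));
        return pdfPart;
    }
}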
I don't know how you upload the attachments. If the upload goes through a Tomcat server, the issue could be caused by the URIEncoding value in conf/server.xml.

Why can't I send the message to my Java server via Obj-C?

Here is the question: I open a socket on my Java server and send a UTF string from Objective-C, but the server can't read it. What is going wrong?
First, here is the Java code:
DataInputStream in = new DataInputStream(server.getInputStream());
System.out.println(in.readUTF()); // Just hangs here, nothing is printed
Full source code is here:
http://www.tutorialspoint.com/java/java_networking.htm
Then, here is my iOS client source code:
NSString *requestStrFrmt = @"HEAD / HTTP/1.0\r\nHost: %@\r\n\r\n";
NSString *requestStr = [NSString stringWithFormat:requestStrFrmt, HOST];
NSData *requestData = [requestStr dataUsingEncoding:NSUTF8StringEncoding];
[asyncSocket writeData:requestData withTimeout:-1.0 tag:0];
Full source code for iOS client:
https://github.com/robbiehanson/CocoaAsyncSocket
The file path I quote:
CocoaAsyncSocket / GCD / Xcode / SimpleHTTPClient / Mobile / SimpleHTTPClient / SimpleHTTPClientAppDelegate.m
I didn't do anything fancy, just changed the IP and port. Here is the output:
Waiting for client on port 7780...
Just connected to /192.168.1.31:55207
Yes, they can connect to each other, but the server can't read the string I send.
The readUTF method expects to read the length of the string as a 16-bit big-endian number before the string data. You'll have to either modify the client so that it sends the length, or use another method to read the data on the server.
You seem to be building an HTTP client and server, so you should use the second option and make the Java server read the request with the read method.
Besides, HTTP request headers should be encoded in ISO-8859-1, preferably ASCII, so using the UTF-8 encoding is a mistake in the first place.
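A minimal server-side sketch of that second option (plain java.net; the class name and port are illustrative): read whatever the client sent as raw bytes and decode them, instead of relying on readUTF()'s length-prefixed format.
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: accept one connection and print the raw request decoded
// as ISO-8859-1, rather than using DataInputStream.readUTF(), which expects a
// 2-byte big-endian length prefix that this client never sends.
public class RawSocketServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket listener = new ServerSocket(7780);
             Socket client = listener.accept();
             InputStream in = client.getInputStream()) {
            byte[] buffer = new byte[4096];
            int n = in.read(buffer); // blocks until data arrives
            if (n > 0) {
                System.out.println(new String(buffer, 0, n, StandardCharsets.ISO_8859_1));
            }
        }
    }
}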

International characters in filename in multipart formdata

I am using Apache HttpComponents (4.1-alpha2) to upload files to Dropbox. This is done using multipart form data. What is the correct way to encode filenames in a multipart form that contain international (non-ASCII) characters?
If I use their standard API, the server returns an HTTP status Forbidden. If I modify the upload code so that the file name is URL-encoded:
MultipartEntity entity = new MultipartEntity(HttpMultipartMode.BROWSER_COMPATIBLE);
FileBody bin = new FileBody(file_obj, URLEncoder.encode(file_obj.getName(), HTTP.UTF_8), HTTP.UTF_8, HTTP.OCTET_STREAM_TYPE );
entity.addPart("file", bin);
req.setEntity(entity);
The file is uploaded, but I end up with a filename that is still URL-encoded, e.g. %D1%82%D0%B5%D1%81%D1%82.txt
To solve this issue specifically for the Dropbox server, I had to encode the filename in UTF-8. To do this I had to declare my multipart entity as follows:
MultipartEntity entity = new MultipartEntity(HttpMultipartMode.BROWSER_COMPATIBLE, null, Charset.forName(HTTP.UTF_8));
I was getting the Forbidden response because the OAuth-signed entity did not match the actual entity sent (it was being URL-encoded).
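Put together, a minimal sketch of the working setup (file_obj and req are the objects from the snippets above; the one-argument FileBody constructor takes the filename from the File itself, with no URL-encoding):
import java.nio.charset.Charset;
import org.apache.http.entity.mime.HttpMultipartMode;
import org.apache.http.entity.mime.MultipartEntity;
import org.apache.http.entity.mime.content.FileBody;
import org.apache.http.protocol.HTTP;

// Illustrative sketch for HttpClient 4.1: declare the multipart entity as UTF-8
// and pass the raw (un-encoded) filename, so it reaches the server as UTF-8 text.
MultipartEntity entity = new MultipartEntity(
        HttpMultipartMode.BROWSER_COMPATIBLE, null, Charset.forName(HTTP.UTF_8));
FileBody bin = new FileBody(file_obj); // filename comes from the File, no URL-encoding
entity.addPart("file", bin);
req.setEntity(entity);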
For those interested in what the standards have to say on this, I did some reading of the RFCs.
If the standard is strictly adhered to, then all headers should be 7-bit encoded, which would make UTF-8 encoding of the filename illegal. However, RFC 2388 states:
The original local file name may be supplied as well, either as a "filename" parameter either of the "content-disposition: form-data" header or, in the case of multiple files, in a "content-disposition: file" header of the subpart. The sending application MAY supply a file name; if the file name of the sender's operating system is not in US-ASCII, the file name might be approximated, or encoded using the method of RFC 2231.
Many posts mention using either RFC 2231 or RFC 2047 for encoding non-US-ASCII headers in 7 bits. However, RFC 2047 explicitly states in section 5.3 that encoded words MUST NOT be used in a Content-Disposition field. This leaves only RFC 2231, which is an extension and cannot be relied upon to be implemented on all servers. In reality, most major browsers send non-US-ASCII characters in UTF-8 (hence the HttpMultipartMode.BROWSER_COMPATIBLE mode in the Apache HTTP client), and because of this most web servers support it. Another thing to note is that if you use HttpMultipartMode.STRICT on the multipart entity, the library will substitute question marks (?) for non-ASCII characters in the filename.
I would have thought that the implementation of the FileBody would take responsibility for applying the appropriate rules from RFC 2047 itself. The filename would then be encoded as =?UTF-8?Q?=D1=82=D0=B5=D1=81=D1=82.txt?= or something very similar.
Quick fix:
new String(multipartFile.getOriginalFilename().getBytes("iso-8859-1"), "UTF-8");
