Upload File containing latin characters - java

I'm using the latest Apache Commons Net for its FTP functionality.
My goal is to upload semicolon-separated CSV files, which might contain Latin characters such as ñ, á or Ú. The problem is that when I upload them to the FTP server, those characters are transformed into others.
The following line:
12345678A;IÑIGO;PÉREZ;JIMÉNEZ;X
gets transformed into this:
12345678A;IÑIGO;PÉREZ;JIMÉNEZ;X
My code looks something like this:
// pFile is passed as parameter to the current method
InputStream is = new FileInputStream(pFile);
ftp.setFileType(FTP.BINARY_FILE_TYPE);
ftp.setControlEncoding("UTF-8");
if (ftp.storeFile("some\\path", is)) {
    is.close();
    ...
}
I've spent some hours digging for a solution (I thought setFileType() and/or setControlEncoding() would work), but no luck.
I've tried printing to the standard output (screen, with logger and System.out), and I've realised that the raw InputStream doesn't interpret those characters by itself. Wrapping it in a reader, as in the following code, printed the characters correctly:
InputStreamReader isr = new InputStreamReader(is, StandardCharsets.UTF_8);
BufferedReader in = new BufferedReader(isr);
String line = null;
while((line = in.readLine()) != null){
    System.out.print(line);
    logger.debug(line);
}
in.close();
isr.close();
But how do I tell the FTP client or storeFile() to use UTF-8?
Thank you all.

Sorry, but I've found the answer myself.
When I said that some characters were transformed:
12345678A;IÑIGO;PÉREZ;JIMÉNEZ;X
I meant that this is how they appeared in an FTP client application (I use WinSCP). The issue was that the client's default character encoding was selected, and it wasn't UTF-8.
Now that I've realised this, I select the proper encoding (UTF-8) and the text is displayed correctly.
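To illustrate why the file itself can be fine while the viewer shows garbage, here is a minimal sketch (the class and method names are made up for this example). It encodes a string as UTF-8 and then decodes the same bytes with a different charset, which is roughly what a viewer does when its encoding setting does not match the file. A binary-mode transfer copies bytes unchanged, so the upload does not enter into it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    // Encodes text as UTF-8, then decodes the same bytes with the viewer's charset.
    // If the two charsets differ, multi-byte characters come out as several wrong ones.
    static String misread(String text, Charset viewerCharset) {
        return new String(text.getBytes(StandardCharsets.UTF_8), viewerCharset);
    }

    public static void main(String[] args) {
        // "I(N-tilde)IGO", written with a Unicode escape so the source file encoding does not matter.
        String original = "I\u00D1IGO";
        String garbled = misread(original, StandardCharsets.ISO_8859_1);
        String intact = misread(original, StandardCharsets.UTF_8);
        System.out.println(original.length() + " chars become " + garbled.length() + " when misread");
        System.out.println("read back as UTF-8 is unchanged: " + intact.equals(original));
    }
}
```

The single character Ñ occupies two bytes in UTF-8 (0xC3 0x91), so a single-byte viewer charset shows two characters in its place; selecting UTF-8 in the viewer makes them collapse back into one.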
Thanks for your help.

Related

What is the correct way to handle incorrect charsets when reading strings from a URL resource?

I am reading a file which is a classpath resource:
URL dictionary = Main.class.getResource("/british-english.txt");
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(dictionary.openStream(), StandardCharsets.UTF_8));
List<String> lines = bufferedReader.lines().collect(Collectors.toList());
How should I handle the case where the file is encoded with a different character set, say UTF_16? Is there a way to detect this, except by looking at the list of strings, to see whether they are English words?
Try the Apache Tika API for charset detection on the supplied input:
https://tika.apache.org/0.8/api/org/apache/tika/parser/txt/CharsetDetector.html
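As a complement to a statistical detector, a JDK-only check can at least tell whether the bytes are valid in the charset you expected. The sketch below is not the Tika API; it is a swapped-in technique using a strict CharsetDecoder, and the class and method names are hypothetical. It tries each candidate charset in order and accepts the first one that decodes without errors.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.Optional;

final class StrictDecode {
    // Returns the decoded text, or empty if the bytes are not valid in this charset.
    static Optional<String> tryDecode(byte[] data, Charset charset) {
        try {
            return Optional.of(charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data))
                    .toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    // Tries the candidates in order and returns the first clean decoding.
    static String decodeFirstValid(byte[] data, Charset... candidates) {
        for (Charset candidate : candidates) {
            Optional<String> text = tryDecode(data, candidate);
            if (text.isPresent()) {
                return text.get();
            }
        }
        throw new IllegalArgumentException("no candidate charset decodes the input cleanly");
    }
}
```

This only rejects byte sequences that are invalid in a charset; it cannot distinguish between charsets that both accept the input (ISO-8859-1, for instance, accepts any byte sequence), so candidate order matters and a real detector is still the better tool for unknown input.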

Handling Norwegian and Icelandic letters in Java

In Java, I am receiving text input that contains Norwegian and Icelandic characters.
I get a stream, parse it into a String, assign it to some variables and then create output again.
When I produce the output, the Norwegian and Icelandic characters are distorted and show up as ? or ¶ etc. The output files show the same characters when opened.
I am building a web project (.war) using Maven. What basic settings are required for handling Icelandic/Norwegian text in code?
I found a way of setting the Locale, but I am unable to produce correct output with it: Locale.setDefault(new Locale("is_IS", "Iceland"));
Please suggest how to do it.
Actual characters: HÝS048
Distorted characters: HÃ?S048 (when printed directly with System.out) or H??S048 (when I get the bytes from the string and put them into a String object using UTF-8)
Update (11:13)
I have used
CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
encoder.onMalformedInput(CodingErrorAction.REPORT);
encoder.onUnmappableCharacter(CodingErrorAction.REPORT);
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("d:\\try1.csv"),encoder));
out.write(sb.toString());
out.flush();
out.close();
Output: H�S048
Update (12:41):
While reading the stream from the HTTP source I have used the following:
`BufferedReader in = new BufferedReader(new InputStreamReader(apiURL.openStream(), "UTF-8"));`
It shows the output perfectly on the console.
I fetch the values from the CSV and, after some logic, put them into a bean.
Now I need to create a CSV file, but when I get the values from the bean the text is distorted again. I am using a StringBuilder to append the bean values and write them to the file. :( Hoping for the best and looking for ideas.
The solution to this problem is to read the data as UTF-8, print it as UTF-8 and create the file as UTF-8.
Read the data from the URL as below:
BufferedReader in = new BufferedReader(new InputStreamReader(apiURL.openStream(), "UTF-8"));
Then set it on the beans or do whatever you want. When printing:
System.out.println(new String(sb.toString().getBytes("UTF-8"),"UTF-8"));
Then, when creating the file:
FileWriter writer = new FileWriter("d:\\try2.csv");
writer.append(new String(sb.toString().getBytes("UTF-8"),"UTF-8"));
writer.flush();
writer.close();
This is how my problem got resolved.
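One caveat worth checking: encoding a String to UTF-8 bytes and immediately decoding them as UTF-8 yields an equal String for ordinary text, and a FileWriter created from just a file name uses the platform default charset (which is UTF-8 by default only from Java 18 on). A more explicit variant, sketched here with a hypothetical helper name, passes the charset to the writer so the output file is UTF-8 regardless of platform settings.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class Utf8Csv {
    // Writes the content as UTF-8, independent of the platform default charset.
    // try-with-resources flushes and closes the writer even if write() fails.
    static void write(Path target, String content) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write(content);
        }
    }
}
```

With the StringBuilder from the question this would be called as Utf8Csv.write(Paths.get("d:\\try2.csv"), sb.toString()). Whatever opens the file afterwards (an editor, a spreadsheet) still has to be told to read it as UTF-8.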

PDF generation issue in java code

I am getting a PDF attachment in a SOAP response message. I need to generate a PDF back out of it. However, the generated PDF is of the following form:
%PDF-1.4
%
2 0 obj
<</Type/XObject/ColorSpace/DeviceRGB/Subtype/Image/BitsPerComponent 8/Width
278/Length 7735/Height 62/Filter/DCTDecode>>stream
How can I solve this issue?
Here is the code showing how I am embedding a PDF as an attachment:
message = messageFactory.createMessage();
SOAPBody body = message.getSOAPBody();
header.detachNode();
AttachmentPart attachment1 = message.createAttachmentPart();
fr = new FileReader(new File(pathName));
br = new BufferedReader(fr);
String stringContent = "";
line = br.readLine();
while (line != null) {
    stringContent = stringContent.concat(line);
    stringContent = stringContent.concat("\n");
    line = br.readLine();
}
fr.close();
br.close();
attachment1.setMimeHeader("Content-Type", "application/pdf");
attachment1.setContent(stringContent, "application/pdf");
The code below shows how I am getting the PDF back from the SOAP message:
Object content = attachment1.getContent();
writePdf(content);
private void writePdf(Object content) throws IOException, PrintException,
        DocumentException {
    String str = content.toString();
    //byte[] b = Base64.decode(str);
    //byteArrayToFile(b);
    OutputStream file = new FileOutputStream(new File
            (AppConfig.getInstance().getConfigValue("webapp.root") +
            File.separator + "temp" + File.separator + "hede.pdf"));
    //String s2 = new String(bytes, "UTF-8");
    //System.out.println("S2::::::::::"+s2);
    Document document = new Document();
    PdfWriter.getInstance(document, file);
    document.open();
    document.add(new Paragraph(str));
    document.close();
    file.close();
}
Can anyone help me out?
There are several faults in the supplied code:
In the code showing how you are embedding the PDF as an attachment, you are using a Reader (a FileReader wrapped in a BufferedReader) to read the file line by line, concatenate these lines using \n as separator, and send the result of the concatenation as attachment content of type "application/pdf".
This is a procedure you may consider for text files (even though it isn't a good choice there either), but binary files read like this most likely get broken beyond repair (and PDFs are binary files, in spite of a phase early in their history where handling them as text was quite harmless):
When reading a file, a Reader interprets the bytes in it according to some character encoding (as none is given explicitly here, most likely the platform default encoding is used) to transform them into Unicode characters collected in a String. The binary data is most likely damaged already at this point.
When using readLine you read these Unicode characters until the Reader recognizes a line break. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed (quoted from the Java API JavaDocs). When you then concatenate these lines uniformly using \n as separator, you essentially replace all single carriage return characters and all carriage return / line feed pairs with single line feed characters, damaging the binary data even further.
When the attachment API you use encodes this string as the content of an attachment part, it transforms your Unicode characters back into bytes. If by chance the same character encoding is assumed as was used by the Reader before, this might heal some of the damage done back then, but surely not all, and the line break interpretation of the step in between certainly isn't healed either. If a different encoding is used, the data is damaged once again.
Thus, check what other arguments your AttachmentPart.setContent methods accept, choose something which does not damage binaries (e.g. InputStreams, ByteBuffers, byte[], ...) and use that, e.g. a FileInputStream.
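The damage described above can be reproduced with a few bytes, without any SOAP library involved. The following sketch (hypothetical class and method names, and UTF-8 chosen as the decoding charset purely for the demonstration) pushes binary data through the Reader/readLine route from the question and, for comparison, through a plain byte stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class BinaryVsText {
    // Mimics the question's approach: decode bytes as text, read line by line,
    // rejoin with '\n', and encode back to bytes.
    static byte[] viaReader(byte[] data) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Byte-safe alternative: no charset and no line handling is involved.
    static byte[] viaStream(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return in.readAllBytes(); // available since Java 9
        }
    }
}
```

For a real file, Files.readAllBytes(Paths.get(pathName)) gives the same byte-safe result. How the bytes are then handed to the attachment depends on the SOAP library; SAAJ 1.3, for example, has AttachmentPart.setRawContentBytes, but that should be verified against the version actually in use.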
The code which describes how you are getting the PDF back from the SOAP message is even weirder... You assume that toString of the attachment content returns some meaningful string representation (very unlikely here), and then go on to create a new PDF containing that string representation as the text content of the first and only paragraph of the PDF. Thus, while your attachment creation code discussed above at least 'merely' damaged the PDF, your attachment retrieval code completely ignores the nature of the attachment and destroys it beyond recognition.
You should instead check the actual type of the content object, retrieve the binary data it holds according to its type, and store that content using a FileOutputStream (not a Writer, not using Strings in between, and not copying 'line' by 'line').
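A minimal sketch of that retrieval step, assuming the content object turns out to be either a byte[] or an InputStream (which types actually occur depends on the SOAP implementation and the registered content handlers; the class name is hypothetical):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class AttachmentSaver {
    // Stores binary attachment content byte for byte, without any String in between.
    static void save(Object content, Path target) throws IOException {
        if (content instanceof byte[]) {
            Files.write(target, (byte[]) content);
        } else if (content instanceof InputStream) {
            try (InputStream in = (InputStream) content) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } else {
            // Fail loudly rather than falling back to toString(), which would corrupt the data.
            throw new IllegalArgumentException("Unsupported attachment content type: "
                    + (content == null ? "null" : content.getClass().getName()));
        }
    }
}
```

No PDF library is needed here: the attachment already is a PDF, so it only has to be written out unchanged.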
And whatever source gave you the impression that your code was appropriate for the task... well, either you completely misunderstood it or you should shun it from now on.

Base64 InputStream to String

I have been trying to read a file through an input stream and write it into a String. The file is plain text with some images and other files embedded in base64. I want to keep the encoding, I mean, I want to have in the String something like:
/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAoHBwgHBgoICAgLCgoLDhgQDg0NDh0VFhEYIx8lJCIf
IiEmKzcvJik0KSEiMEExNDk7Pj4+JS5ESUM8SDc9Pjv/2wBDAQoLCw4NDhwQEBw7KCIoOzs7Ozs7
I have been trying with the class Base64InputStream and others from packages such as org.apache.commons.codec, but I just cannot figure it out. Any kind of help would be really appreciated. Thanks in advance!
Edit
Piece of code using a reader:
BufferedReader br= new BufferedReader(new InputStreamReader(bodyPart.getInputStream()));
StringBuilder sb = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
    sb.append(line);
}
br.close();
Getting as a result something like: .DIC;ÿÛC;("(;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;ÿÀ##"ÿÄ
Have you tried doing this:
final byte[] bytes64bytes = Base64.encodeBase64(IOUtils.toByteArray(is));
final String content = new String(bytes64bytes);
A text file containing some base64 data can be read with the charset of the rest of the file.
Base64 encoding is a means of encoding bytes in a limited set of characters that are unchanged in almost all character encodings, for example ASCII or UTF-8.
Base64 isn't a charset encoding, so you don't have to specify that you have some base64-encoded data when reading a file into a string.
So if your text file is generally UTF-8 (that's probable), you can read it without problem even if it contains a base64 encoded stream. Simply use a basic reader and don't use a Base64InputStream if you don't want to decode it.
When opening a file with a reader, you should specify the encoding; otherwise the platform default is used. If you don't know it, I suggest you test with the probable ones, like UTF-8, US-ASCII or ISO-8859-1.
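As a small sketch of this point (hypothetical class name, UTF-8 assumed as the file's charset), reading with an explicitly chosen charset leaves embedded base64 lines untouched, because base64 uses only ASCII characters:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

final class TextWithBase64 {
    // Reads the whole stream as text in the given charset, keeping line breaks.
    // Base64 sections pass through unchanged; nothing is decoded.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

This only helps when the stream really delivers the base64 text. If the source of the stream has already decoded it to raw bytes, as the garbled result in the question suggests, the bytes have to be encoded again instead.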
If you have a plain InputStream object, then you can get a Base64-encoded stream directly from it using the Base64InputStream constructor from the Apache Commons Codec library.
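If pulling in Commons Codec is not desired, the JDK's own java.util.Base64 (Java 8+) can do the same re-encoding. This is a sketch under the assumption that the stream delivers raw binary data and that MIME-style output (76-character lines) is wanted, as in the sample from the question; the class name is made up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Base64;

final class Base64Text {
    // Reads all raw bytes and returns them as MIME-style base64 text:
    // lines of at most 76 characters, separated by CRLF.
    static String encode(InputStream raw) throws IOException {
        return Base64.getMimeEncoder().encodeToString(raw.readAllBytes()); // readAllBytes: Java 9+
    }
}
```

Base64.getEncoder() could be used instead when a single unbroken line is preferred. For large attachments, reading everything into memory may not be appropriate; Base64.getMimeEncoder().wrap(OutputStream) offers a streaming alternative.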
I found the solution, inspired by this post: getting base64 content string of an image from a mimepart in Java
I think it is kind of stupid to decode and re-encode the base64 content, but it is the only way I found to handle this issue. If someone could offer a better solution, it would be really appreciated.
Thanks

How to read the last line of an online file, from the end of file

Hi all,
I wonder how to quickly read the last line of an online file, such as "http://www.17500.cn/getData/ssq.TXT".
I know the RandomAccessFile class, but it seems that it can only read local files. Any suggestions? Thanks in advance.
You'll have to read through the whole reader, and only keep the last line:
String line;
String lastLine = null;
while ((line = reader.readLine()) != null) {
    lastLine = line;
}
EDIT: as Joachim says in his comment, if you know that the last line will never be longer than (for example) 500 bytes, you could set the Range header in your HTTP request to bytes=-500, and thus only download the last 500 bytes. The same algorithm as above could be used.
I don't know, however, whether it would deal correctly with a stream starting in the middle of a multi-byte encoded character if the encoding is multi-byte (like UTF-8). With ASCII or ISO-8859-1, you won't have any problem.
Also note that the server is not forced to honor the range request, and could return the whole file.
httpConnection.setRequestProperty("Range", "bytes=-500");
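Putting the pieces together, a sketch might look like the following. The class and method names are hypothetical, the caller has to supply the file's charset, and error handling (for example HTTP error statuses) is left out. The same loop works whether the server answers with only the requested tail or with the whole file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;

final class RemoteLastLine {
    // Builds a suffix byte range, e.g. "bytes=-500" for the last 500 bytes.
    static String suffixRange(int tailBytes) {
        return "bytes=-" + tailBytes;
    }

    // Reads to the end and keeps only the last line; returns null for empty input.
    static String lastLine(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        String line;
        String last = null;
        while ((line = reader.readLine()) != null) {
            last = line;
        }
        return last;
    }

    // Asks for the tail of the resource; a server that ignores the Range header
    // simply sends the whole file, and the result is still the last line.
    static String fetch(URL url, int tailBytes, Charset charset) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestProperty("Range", suffixRange(tailBytes));
        try (Reader reader = new InputStreamReader(connection.getInputStream(), charset)) {
            return lastLine(reader);
        } finally {
            connection.disconnect();
        }
    }
}
```

If the last line is longer than the requested tail, only its end is returned. On the multi-byte question: in UTF-8, bytes below 0x80 (including '\n') never occur inside a multi-byte sequence, so a tail that starts mid-character can garble at most the first character of the fragment and does not affect where the line breaks are found.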
