How to have line breaks in text/plain servlet response - java

I have a servlet that generates some text. The purpose of this is to have a simple status page for an application that can easily be parsed, therefore I'd like to avoid HTML.
Everything works fine, except that line breaks get swallowed somewhere in the process.
I tried \r, \r\n and \n, but the result in the browser always looks the same: everything that might look like a line break just disappears and everything ends up on one looooong line. Which makes it next to impossible to read (both for machines and humans).
Firefox does confirm that the result is of type text/plain
So how do I get line breaks in a text/plain document, that is supposed to be displayed in a browser and generated by a servlet?
update:
Other things that do not work:
println
System.getProperty("line.separator")

In text/plain, rendering is totally dependent on the browser's word-wrap processing.
For example, in Firefox, open
about:config
and set this:
plain_text.wrap_long_lines;true
You can't handle it from the server. You might want to convert the output to HTML containing the plain text with explicit line breaks, or use some CSS magic.
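If you do go the HTML route, a minimal sketch of a helper could look like this (the method name asPreformattedHtml is made up for illustration): it wraps the already-generated status text in a <pre> block, escaping HTML metacharacters, so a browser preserves the line breaks while the content stays readable.

```java
// Hypothetical helper: wrap generated plain text in a <pre> block,
// escaping HTML metacharacters, so the browser renders line breaks.
static String asPreformattedHtml(String plainText) {
    String escaped = plainText
            .replace("&", "&amp;")   // must be escaped first
            .replace("<", "&lt;")
            .replace(">", "&gt;");
    return "<pre>" + escaped + "</pre>";
}
```

The servlet would then send this with a content type of text/html instead of text/plain.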

As @gnicker said, I would go for a content type of text/plain, set like this in your servlet:
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    response.setContentType("text/plain");
    // rest of your code...
}
To output a new line, you just use the println function of your ServletOutputStream:
ServletOutputStream out = response.getOutputStream();
out.println("my text in the first line");
out.println("my text in the second line");
out.close();
The new line character could be \n or \r\n, though you can just stick with the println function.

Sorry, once again it was my mistake.
We have a compressing servlet Filter configured that removed the line breaks.
Without that filter, \n works just fine.

Related

Issue when converting a buffer to a string: the character code of LF appears in the output

I am trying to download a web page with all its resources. First I download the HTML, and to be sure the file stays formatted I use the function below.
There is an issue: I find 10 in the final file, which turns out to be the character code of LF (line feed), and this causes trouble for my JavaScript functions.
Example of the final result:
<!DOCTYPE html>10<html lang="fr">10 <head>10 <meta http-equiv="content-type" content="text/html; charset=UTF-8" />10
Can someone help me find the real issue?
public static String scanfile(File file) {
    StringBuilder sb = new StringBuilder();
    try {
        BufferedReader bufferedReader = new BufferedReader(new FileReader(file));
        while (true) {
            String readLine = bufferedReader.readLine();
            if (readLine != null) {
                sb.append(readLine);
                sb.append(System.lineSeparator());
                Log.i(TAG, sb.toString());
            } else {
                bufferedReader.close();
                return sb.toString();
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
        return null;
    }
}
There are multiple problems with your code.
Charset error
BufferedReader bufferedReader = new BufferedReader(new FileReader(file));
This is going to fail in tricky ways.
Files (and, for that matter, data given to you by webservers) come in bytes. A stream of numbers, each number being between 0 and 255.
So, if you are a webserver and you want to send the character ö, what byte(s) do you send?
The answer is complicated. The mapping that explains how some character is rendered in byte(s)-form is called a character set encoding (shortened to 'charset').
Anytime bytes are turned into characters or vice versa, there is always a charset involved. Always.
So, you're reading a file (that'd be bytes), and turning it into a Reader (which is chars). Thus, charset is involved.
Which charset? The API of new FileReader(path) explains which one: "The system default". You do not want that.
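A quick demonstration of why this matters (the helper name is made up): the two bytes that encode 'ö' in UTF-8, decoded back with ISO-8859-1, read as the classic mojibake "Ã¶".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same bytes mean different text under
// different charsets. "ö" as UTF-8 is the two bytes 0xC3 0xB6;
// ISO-8859-1 reads those bytes as two separate characters.
static String decodeWithWrongCharset(String original) {
    byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
    return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
}
```

Calling decodeWithWrongCharset("ö") yields "Ã¶", which is exactly the kind of corruption a mismatched default charset produces.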
Thus, this code is broken. You want one of two things:
Option 1 - write the data as is
When doing the job of querying the webserver for the data and relaying this information onto disk, you'd want to just store the bytes (after all, webserver gives bytes, and disks store bytes, that's easy), but the webserver also sends the encoding, in a header, and you need to save this separately. Because to read that 'sack of bytes', you need to know the charset to turn it into characters.
How would you do this? Well, up to you. You could for example decree that the data file starts with the name of a charset encoding (as sent via that header), then a 0 byte, and then the data, unmodified. I think you should go with option 2, however
Option 2
Another, better option for text-based documents (which HTML is), is this: When reading the data, convert it to characters, using the encoding as that header tells you. Then, to save it to disk, turn the chars back to bytes, using UTF-8, which is a great encoding and an industry standard. That way, when reading, you just know it's UTF-8, period.
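As a rough sketch of option 2 (the helper names parseCharset and toUtf8 are made up for illustration): pull the charset out of the server's Content-Type header, decode with it, then re-encode as UTF-8 for storage.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: extract the charset from a Content-Type
// header such as "text/html; charset=ISO-8859-1"; fall back to
// UTF-8 when the server did not declare one.
static Charset parseCharset(String contentType) {
    if (contentType != null && contentType.contains("charset=")) {
        String name = contentType.substring(contentType.indexOf("charset=") + 8).trim();
        return Charset.forName(name);
    }
    return StandardCharsets.UTF_8;
}

// Decode the raw bytes with the server's charset, then re-encode
// as UTF-8 so later reads can simply assume UTF-8.
static byte[] toUtf8(byte[] raw, Charset serverCharset) {
    return new String(raw, serverCharset).getBytes(StandardCharsets.UTF_8);
}
```

The download code would then write the result of toUtf8 to disk unmodified.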
To read a UTF-8 text file, you do:
Files.newBufferedReader(file.toPath());
The reason this works, is that the Files API, unlike most other APIs (and unlike FileReader, which you should never ever use), defaults to UTF_8 and not to platform-default. If you want, you can make it more readable:
Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8);
same thing - but now in the code it is clear what's happening.
Broken exception handling
} catch (IOException e) {
e.printStackTrace();
return null;
}
This is not okay - if you catch an exception, either [A] throw something else, or [B] handle the problem. And 'log it and keep going' is definitely not 'handling' it. Your strategy of exception handling results in 1 error resulting in a thousand things going wrong with a thousand stack traces, and all of them except the first are undesired and irrelevant, hence why this is horrible code and you should never write it this way.
The easy solution is to just put throws IOException on your scanFile method. The method inherently interacts with files; it SHOULD be throwing that. Note that your public static void main(String[] args) method can, and usually should, be declared with throws Exception.
It also makes your code simpler and shorter, yay!
Resource Management failure
A FileReader is a resource. You MUST close it, no matter what happens. You are not doing that: if .readLine() throws an exception, your code jumps to the catch handler and bufferedReader.close() is never executed.
The solution is to use the ARM (Automatic Resource Management) construct:
try (var br = Files.newBufferedReader(file.toPath(), StandardCharsets.UTF_8)) {
    // code goes here
}
This construct ensures that close() is invoked, regardless of how the 'code goes here' block exits. Even if it 'exits' via an exception or a return statement.
The problem
Your 'read a file and print it' code is, apart from the above three items, mostly fine. The problem is that the HTML file on disk is corrupted; the error lies in your code that reads the data from the web server and saves it to disk. You did not paste that code.
Specifically, System.lineSeparator() returns the actual separator characters, not the literal text "10". Thus, assuming the code you pasted really is the code you are running, if you are seeing a literal '10' show up, that means the HTML file on disk already has it in there. It's not the read code.
Closing thoughts
More generally the job of 'just print a file on disk with a known encoding' can be done in far fewer lines of code:
public static String scanFile(String path) throws IOException {
    return Files.readString(Paths.get(path));
}
You should just use the above code instead. It's simple, short, doesn't have any bugs, cannot leak resources, has proper exception handling, and will use UTF-8.
Actually, there was no problem in this function. I was mistakenly adding the 10 in another function in my code.

Java for Web - Multipart/form-data file with wrong encoding

I am developing a web application with Java and Tomcat 8. This application has a page for uploading a file with the content that will be shown in a different page. Plain simple.
However, these files might contain not-so-common characters as part of their text. Right now, I am working with a file that contains Vietnamese text, for example.
The file is encoded in UTF-8 and can be opened in any text editor. However, I couldn't find any way to upload it and keep the content in the correct encoding, despite searching a lot and trying many different things.
My page which uploads the file contains the following form:
<form method="POST" action="upload" enctype="multipart/form-data" accept-charset="UTF-8" >
File: <input type="file" name="file" id="file" multiple/><br/>
Param1: <input type="text" name="param1"/> <br/>
Param2: <input type="text" name="param2"/> <br/>
<input type="submit" value="Upload" name="upload" id="upload" />
</form>
It also contains:
<%@page contentType="text/html" pageEncoding="UTF-8"%>
...
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
My servlet looks like this:
protected void processRequest(HttpServletRequest request, HttpServletResponse response)
        throws ServletException, IOException {
    try {
        response.setContentType("text/html;charset=UTF-8");
        request.setCharacterEncoding("UTF-8");
        String param1 = request.getParameter("param1");
        String param2 = request.getParameter("param2");
        Collection<Part> parts = request.getParts();
        Iterator<Part> iterator = parts.iterator();
        while (iterator.hasNext()) {
            Part filePart = iterator.next();
            InputStream filecontent = filePart.getInputStream();
            String content = convertStreamToString(filecontent, "UTF-8");
            // Save the content and the parameters in the database
            if (filecontent != null) {
                filecontent.close();
            }
        }
    } catch (ParseException ex) {
    }
}
static String convertStreamToString(java.io.InputStream is, String encoding) {
    java.util.Scanner s = new java.util.Scanner(is, encoding).useDelimiter("\\A");
    return s.hasNext() ? s.next() : "";
}
Despite all my efforts, I have never been able to get that "content" string with the correct characters preserved. I either get something like "K?n" or "Káº¡n" (which seems to be the ISO-8859-1 interpretation of the UTF-8 bytes), when the correct result should be "Kạn".
To add to the problem, if I write Vietnamese characters in the other form parameters (param1 or param2), which also needs to be possible, I can only read them correctly if I set both the form's accept-charset and the servlet scanner encoding to ISO-8859-1, which I definitely don't understand. In that case, if I print the received parameter I get something like "K&#7841;n", which contains the HTML entity representation of the correct character. So it seems to be possible to read the Vietnamese characters from the form using ISO-8859-1, as long as the form itself uses that charset. However, it never works on the content of the uploaded files. I even tried to encode the file in ISO-8859-1, to use that charset for everything, but it does not work at all.
I am sure this type of situation is not that rare, so I would like to ask some help from the people who might have been there before. I am probably missing something, so any help is appreciated.
Thank you in advance.
Edit 1: Although this question is yet to receive a reply, I will keep posting my findings, in case someone is interested or following it.
After trying many different things, I seem to have narrowed down the causes of problem. I created a class which reads a file from a specific folder in the disk and prints its content. The code goes:
public static void openFile() {
    System.out.println(String.format("file.encoding: %s", System.getProperty("file.encoding")));
    System.out.println(String.format("defaultCharset: %s", Charset.defaultCharset().name()));
    File file = new File(myFilePath);
    byte[] buffer = new byte[(int) file.length()];
    BufferedInputStream f = null;
    String content = null;
    try {
        f = new BufferedInputStream(new FileInputStream(file));
    } catch (FileNotFoundException ex) {
    }
    try {
        f.read(buffer);
        content = new String(buffer, "UTF-8");
        System.out.println("UTF-8 File: " + content);
        f.close();
    } catch (IOException ex) {
    }
}
Then I added a main function to this class, making it executable. When I run it standalone, I get the following output:
file.encoding: UTF-8
defaultCharset: UTF-8
UTF-8 File: {"...Kạn..."}
However, if I run the project as a webapp, as it is supposed to be run, and call the same function from that class, I get:
file.encoding: Cp1252
defaultCharset: windows-1252
UTF-8 File: {"...K?n..."}
Of course, this was clearly showing that the default encoding used by the webapp to read the file was not UTF-8. So I did some research on the subject and found the classical answer of creating a setenv.bat for Tomcat and having it execute:
set "JAVA_OPTS=%JAVA_OPTS% -Dfile.encoding=UTF-8"
The result, however, is still not right:
file.encoding: UTF-8
defaultCharset: UTF-8
UTF-8 File: {"...Káº¡n..."}
I can see now that the default encoding became UTF-8. The content read from the file, however, is still wrong. The content shown above is the same I would get if I opened the file in Microsoft Word, but chose to read it using ISO-Latin-1 instead of UTF-8. For some odd reason, reading the file is still working with ISO-Latin-1 somewhere, although everything points out to the use of UTF-8.
Again, if anyone might have suggestions or directions for this, it will be highly appreciated.
I don't seem to be able to close the question, so let me contribute with the answer I found.
The problem is that investigating this type of issue is very tricky, since there are many points in the code where the encoding might be changed (the page, the form encoding, the request encoding, file reading, file writing, console output, database writing, database reading...).
In my case, after doing everything that I posted in the question, I lost a lot of time trying to solve an issue that didn't exist any longer, just because the console output in my IDE (NetBeans, for that project) didn't use the desired character encoding. So I was doing everything right to a certain point, but when I tried to print anything I would get it wrong. After I started writing my logs to files, instead of the console, and thus controlling the writing encoding, I started to understand the issue clearly.
What was missing in my solution, after everything I had already described in my question (before the edit), was to configure the encoding for the database connection. To my surprise, even though my database and all of my tables were using UTF-8, the communication between the application and MySQL was still in ISO-Latin-1. The last thing that was missing was adding "useUnicode=true&characterEncoding=utf-8" to the connection, just like this:
con = DriverManager.getConnection("jdbc:mysql:///dbname?useUnicode=true&characterEncoding=utf-8", "user", "pass");
Thanks to this answer, amongst many others: https://stackoverflow.com/a/3275661/843668

Setting a string in a body of httpResponse

I need help. In my current development one of the requirements says:
The server will return 200-OK as a response(httpresponse).
If the panelist is verified then as a result, the server must also
return the panelist id of this panelist.
The server will place the panelist id inside the body of the 200-OK
response in the following way:
<tdcp>
<cmd>
<ack cmd=”Init”>
<panelistid>3849303</panelistid>
</ack>
</cmd>
</tdcp>
Now I am able to put the httpresponse as
httpServletResponse.setStatus(HttpServletResponse.SC_OK);
And I can put
String responseToClient= "<tdcp><cmd><ack cmd=”Init”><panelistid>3849303</panelistid></ack></cmd></tdcp>";
Now what does putting the above xml inside the body of 200-OK response mean and how can it be achieved?
You can write the XML directly to the response as follows:
This example uses ServletResponse.getWriter(), which is a PrintWriter, to write a String to the response.
String responseToClient= "<tdcp><cmd><ack cmd=”Init”><panelistid>3849303</panelistid></ack></cmd></tdcp>";
httpServletResponse.setStatus(HttpServletResponse.SC_OK);
httpServletResponse.getWriter().write(responseToClient);
httpServletResponse.getWriter().flush();
You simply need to get the output stream (or output writer) of the servlet response, and write to that. See ServletResponse.getOutputStream() and ServletResponse.getWriter() for more details.
(Or simply read any servlet tutorial - without the ability to include data in response bodies, servlets would be pretty useless :)
If that's meant to be XML, Word has already spoiled things for you by changing the attribute quote symbol to ” instead of ".
It is worth having a look at JAXP if you want to generate XML using Java. Writing strings with < etc. in them won't scale and you'll run into problems with encodings of non-ASCII characters.
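For instance, a sketch of building that response with the JDK's built-in JAXP APIs (the method name buildAckXml is made up for illustration): the serializer then takes care of attribute quoting and character escaping for you.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: build the <tdcp> response as a DOM tree and
// serialize it, instead of concatenating strings by hand.
static String buildAckXml(String panelistId) {
    try {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element tdcp = doc.createElement("tdcp");
        Element cmd = doc.createElement("cmd");
        Element ack = doc.createElement("ack");
        ack.setAttribute("cmd", "Init");  // serializer emits proper quotes
        Element id = doc.createElement("panelistid");
        id.setTextContent(panelistId);
        ack.appendChild(id);
        cmd.appendChild(ack);
        tdcp.appendChild(cmd);
        doc.appendChild(tdcp);

        StringWriter out = new StringWriter();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
```

The resulting string can then be written to the response exactly as shown above.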

response.sendredirect with url with foreign chars - how to encode?

I have a JSF app that has international users, so form inputs can contain non-Western strings like kanji and Chinese. If I hit my URL with ..?q=東日本大, the output on the page is correct and the q input in my form gets populated fine. But if I enter that same string into my form and submit, my app constructs a URL with the populated parameters and redirects back to itself (this seems redundant, but it is due to a third-party integration), and the redirect does not encode the string properly. I have
url = new String(url.getBytes("ISO-8859-1"), "UTF-8");
response.sendRedirect(url);
But the redirect URL ends up containing q=????. I've played around with various encoding names in the String constructor (switched ISO and UTF-8 around and just got a bunch of gibberish in the URL), but none seem to work to where I get q=東日本大. Any ideas as to what I need to do to get q=東日本大 into the redirect properly? Thanks.
How are you making your url? URIs can't directly have non-ASCII characters in; they have to be turned into bytes (using a particular encoding) and then %-encoded.
URLEncoder.encode should be given an encoding argument, to ensure this is the right encoding. Otherwise you get the default encoding, which is probably wrong and always to be avoided.
String q= "\u6771\u65e5\u672c\u5927"; // 東日本大
String url= "http://example.com/query?q="+URLEncoder.encode(q, "utf-8");
// http://example.com/query?q=%E6%9D%B1%E6%97%A5%E6%9C%AC%E5%A4%A7
response.sendRedirect(url);
This URI will display as the IRI ‘http://example.com/query?q=東日本大’ in the browser address bar.
Make sure you're serving your pages as UTF-8 (using Content-Type header/meta) and interpreting query string input as UTF-8 (server-specific; see this faq for Tomcat.)
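For Tomcat, that query-string setting is the URIEncoding attribute on the HTTP connector in server.xml. Note this only became the default in Tomcat 8; earlier versions decode query strings as ISO-8859-1 unless you set it explicitly:

```xml
<!-- server.xml: decode URIs and query strings as UTF-8
     (default from Tomcat 8 onward; must be set on older versions) -->
<Connector port="8080" protocol="HTTP/1.1"
           URIEncoding="UTF-8" />
```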
Try
response.setContentType("text/html; charset=UTF-16");
response.setCharacterEncoding("utf-16");

Is there any way to make a JSP print carriage returns (CR)?

I'm currently generating some vCards using JSP. I found out some platforms don't recognize these generated vCards unless their lines are separated by Carriage Returns (CR), and JSP seems to use just Line Feed (LF) by default to separate lines.
Do you guys know any way to tell the JSP to include a CR between each line?
I hope someone has a clue, cause I haven't found much out there...
Thanks in advance!
If you need to emit a non-HTML format, then you should be using a servlet instead of a JSP. That way you're not dependent on the JspServlet and/or appserver specifics of how the output is generated. More often than not you simply cannot control this.
Using a servlet is relatively simple. Create a class which extends HttpServlet and implement the doGet() method like follows:
response.setContentType("text/x-vcard");
response.setCharacterEncoding("UTF-8");
PrintWriter writer = response.getWriter();
writer.write("BEGIN:VCARD\r\n");
// ...
Map this in web.xml on a url-pattern of /vcard/* or *.vcf or whatever, and use the request servlet path / path info / parameters to generate output dynamically based on the URL.
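Applied to the vCard case, a small sketch (the helper name minimalVCard is made up) that builds lines with an explicit CRLF, which the vCard format requires, rather than relying on println's platform-dependent separator:

```java
// Hypothetical helper: vCards require CRLF ("\r\n") line endings,
// so build each line with an explicit "\r\n" instead of println.
static String minimalVCard(String fullName) {
    String CRLF = "\r\n";
    return "BEGIN:VCARD" + CRLF
         + "VERSION:3.0" + CRLF
         + "FN:" + fullName + CRLF
         + "END:VCARD" + CRLF;
}
```

The servlet's doGet would then write this string to the response writer in one call.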
