I am using commons FileUpload to get client headers. I am trying to use the header Strings to write txt content like:
StringA; StringB; "ciryllic word"
The server localization is ru_RU.utf-8 the client is cp1251... The result is I always get on my client (desktop app)
StringA; StringB; ?????
instead of cyrillic characters in my server txt files lines. If I open txt with my IE 8 and watch the encoding the cyrillic content can be readable with utf-8 encoding only :( So my question is what should me do to make my servlet write ru_RU.utf-8 acceptable strings. I am testing it on windows so I need to know it for sure
Please help me to understand
Any useful comment is appreciated
ServletOutputStream out =response.getOutputStream();
out.write((YOUrfileasstring).getBytes("UTF-8"));
I might be misunderstanding the question, but to answer this part -
So my question is what should me do to make my servlet write ru_RU.utf-8 acceptable strings.
You should use either of two methods provided by the HttpServletResponse class. Those are:
setContentType(), e.g.
response.setContentType("text/html; charset=UTF-8");
or,
setCharacterEncoding(), e.g.
response.setCharacterEncoding("utf-8");
There are other examples for those methods and information on when to use them all over the place.
Related
I need help. In my current development one of the requirements says:
The server will return 200-OK as a response(httpresponse).
If the panelist is verified then as a result, the server must also
return the panelist id of this panelist.
The server will place the panelist id inside the body of the 200-OK
response in the following way:
<tdcp>
<cmd>
<ack cmd=”Init”>
<panelistid>3849303</panelistid>
</ack>
</cmd>
Now I am able to put the httpresponse as
httpServletResponse.setStatus(HttpServletResponse.SC_OK);
And I can put
String responseToClient= "<tdcp><cmd><ack cmd=”Init”><panelistid>3849303</panelistid></ack></cmd></tdcp>";
Now what does putting the above xml inside the body of 200-OK response mean and how can it be achieved?
You can write the XML directly to the response as follows:
This example uses a ServletResponse.getWriter(), which is a PrintWriter to write a String to the response.
String responseToClient= "<tdcp><cmd><ack cmd=”Init”><panelistid>3849303</panelistid></ack></cmd></tdcp>";
httpServletResponse.setStatus(HttpServletResponse.SC_OK);
httpServletResponse.getWriter().write(responseToClient);
httpServletResponse.getWriter().flush();
You simply need to get the output stream (or output writer) of the servlet response, and write to that. See ServletResponse.getOutputStream() and ServletResponse.getWriter() for more details.
(Or simply read any servlet tutorial - without the ability to include data in response bodies, servlets would be pretty useless :)
If that's meant to be XML, Word has already spoiled things for you by changing the attribute quote symbol to ” instead of ".
It is worth having a look at JAXP if you want to generate XML using Java. Writing strings with < etc. in them won't scale and you'll run into problems with encodings of non-ASCII characters.
I am trying to send some HTTP POST parameters to some web server and one of parameters contains cyrillic characters. So the problem is that if I use this code:
wc.getPage(requestSettings);
requestSettings.setHttpMethod(HttpMethod.POST);
requestSettings.setRequestParameters(new ArrayList());
requestSettings.getRequestParameters().add(new NameValuePair("username", "Друже бобер"));
wc.getPage(requestSettings);
Server will recieve the next urlencoded parameter:
And this is wrong decoded string "Друже бобер".
So I think that HtmlUnit encode url in core with using ASCII not Unicode. How to disable url encoding or how to fix this bug? If I'll encode this string and set to NameValuePair so all percent characters will be encoded by HtmlUnit to.
I think you need to set the charset using the setCharset method.
I have a javascript file that lots of people have embedded to their pages. Since I am hosting the file, I have control over that javascript file; I cannot control the way it is embedded because lots of people is using it already.
This javascript file sends GET requests to my servlets, and the parameters passed with the request are recorded to DB. For example, javascript sends a request to http://myserver.com/servlet?p1=123&p2=aString and then servlet records 123 and aString to DB somehow.
Before sending strings I use encodeURIComponent() to encode it. But what I figured out is every client sends the same string with different encodings depending on either their browser or the site they are visiting. As a result, same strings are represented with different characters when it reaches servlet (so they are different strings).
What I am trying to do is to convert the strings to one kind of encoding from javascript so when they reach the client same words are represented with same characters.
How is this possible?
PS. If there is a way to convert the encoding from Java it is also applicable.
Edit: To be more precise, I select some words from the page and send it to the server. That is where encoding causes problems.
Edit 2: I am NOT sending (and can't send) GET requests via XMLHttpRequest, because domains are different. I am using adding script tag to head method that #streetpc mentioned.
Edit 3: At the moment I am sanitizing the strings by replacing non-ASCII characters at javascript side, but I have a feeling that this is not the way to go:
function sanitize(word) {
/*
ğ : \u011f
ü : \u00fc
ş : \u015f
ö : \u00f6
ç : \u00e7
ı : \u0131
û : \u00fb
*/
return encodeURIComponent(
word.replace(/\u011f/g, '_g')
.replace(/\u00fc/g, '_u')
.replace(/\u00fb/g, '_u')
.replace(/\u015f/g, '_s')
.replace(/\u00f6/g, '_o')
.replace(/\u00e7/g, '_c')
.replace(/\u0131/g, '_i'));
}
what I figured out is every client sends the same string with different encodings
Whilst that would be normal for <form> submissions, it should not happen for XMLHttpRequest work. The encodeURIComponent function explicitly always writes URL-encoded UTF-8 bytes, regardless of the encoding of the page from which it was used. Of course persuading your servlet container to allow you to read those UTF-8 bytes without messing them up is another story, but that shouldn't depend on the client.
What might be a problem is if you are using raw non-ASCII characters inside your script file itself. In that case the interpretation of those characters will vary according to the charset the browser is using to load the script. This may be affected by:
any charset declared in the Content-Type: text/javascript;charset= header.
any charset attribute declared on the <script src="..." charset="..."> element.
the charset of the page that included the script.
(1) and (2) are not supported in all browsers. Normally you can rely on (3), but as a third-party script author that is out of your control. Therefore you should use only ASCII characters in your script. (Use \u1234 escapes to include non-ASCII characters in string literals in your script to get around this limitation.)
Do you specify the encoding of the JavaScript file in the HTTP headers? Like Content-type: text/javascript; charset=utf-8 with the .js file beign saved in UTF-8 of course. With Apache, you can configure
AddCharset utf-8 .js
Or you can make the hosted javascript file create another script tag with a charset='utf-8' parameter and add-it to the head element (like most bookmarklets do).
I think the javascript being interpreted as UTF-8 code should then get/manipulate UTF-8 strings.
Then, in your Java Servlet, you can specify the input encoding to use:
request.setCharacterEncoding("UTF-8");
Edit: check this page about Character Encoding in JavaScript, especially the part named "Setting the Character Encoding".
I have a webpage that is encoded (through its header) as WIN-1255.
A Java program creates text string that are automatically embedded in the page. The problem is that the original strings are encoded in UTF-8, thus creating a Gibberish text field in the page.
Unfortunately, I can not change the page encoding - it's required by a customer propriety system.
Any ideas?
UPDATE:
The page I'm creating is an RSS feed that needs to be set to WIN-1255, showing information taken from another feed that is encoded in UTF-8.
SECOND UPDATE:
Thanks for all the responses. I've managed to convert th string, and yet, Gibberish. Problem was that XML encoding should be set in addition to the header encoding.
Adam
To the point, you need to set the encoding of the response writer. With only a response header you're basically only instructing the client application which encoding to use to interpret/display the page. This ain't going to work if the response itself is written with a different encoding.
The context where you have this problem is entirely unclear (please elaborate about it as well in future problems like this), so here are several solutions:
If it is JSP, you need to set the following in top of JSP to set the response encoding:
<%# page pageEncoding="WIN-1255" %>
If it is Servlet, you need to set the following before any first flush to set the response encoding:
response.setCharacterEncoding("WIN-1255");
Both by the way automagically implicitly set the Content-Type response header with a charset parameter to instruct the client to use the same encoding to interpret/display the page. Also see this article for more information.
If it is a homegrown application which relies on the basic java.net and/or java.io API's, then you need to write the characters through an OutputStreamWriter which is constructed using the constructor taking 2 arguments wherein you can specify the encoding:
Writer writer = new OutputStreamWriter(someOutputStream, "WIN-1255");
Assuming you have control of the original (properly represented) strings, and simply need to output them in win-1255:
import java.nio.charset.*;
import java.nio.*;
Charset win1255 = Charset.forName("windows-1255");
ByteBuffer bb = win1255.encode(someString);
byte[] ba = new byte[bb.limit()];
Then, simply write the contents of ba at the appropriate place.
EDIT: What you do with ba depends on your environment. For instance, if you're using servlets, you might do:
ServletOutputStream os = ...
os.write(ba);
We also should not overlook the possible approach of calling setContentType("text/html; charset=windows-1255") (setContentType), then using getWriter normally. You did not make completely clear if windows-1255 was being set in a meta tag or in the HTTP response header.
You clarified that you have a UTF-8 file that you need to decode. If you're not already decoding the UTF-8 strings properly, this should no big deal. Just look at InputStreamReader(someInputStream, Charset.forName("utf-8"))
What's embedding the data in the page? Either it should read it as text (in UTF-8) and then write it out again in the web page's encoding (Win-1255) or you should change the Java program to create the files (or whatever) in Win-1255 to start with.
If you can give more details about how the system works (what's generating the web page? How does it interact with the Java program?) then it will make things a lot clearer.
The page I'm creating is an RSS feed that needs to be set to WIN-1255, showing information taken from another feed that is encoded in UTF-8.
In this case, use a parser to load the UTF-8 XML. This should correctly decode the data to UTF-16 character data (Java Strings are always UTF-16). Your output mechanism should encode from UTF-16 to Windows-1255.
byte[] originalUtf8;//Here input
//utf-8 to java String:
String internal = new String(originalUtf8,Charset.forName("utf-8");
//java string to w1255 String
byte[] win1255 = internal.getBytes(Charset.forName("cp1255"));
//Here output
Currently, I have a servlet act as web service.
When I pass in parameters using POST, it will return me an executable binary file (application/octet-stream). However, beside binary file, I would also like to get additional information (in text format) about this binary file.
Is it possible to achieve this by using only single POST request? But, how is it possible, to switch from application/octet-stream to text/plain within single POST response?
It is not possible to change the MIME type within a single response.
However, i think t is possible to put your additional information into the response header using the HttpServletResponse.addHeader method.
You could return a multipart MIME response (multipart/mixed; boundary=XXX instead of application/octet-stream) with the binary part encoded in Base64.
I'm not sure if the JavaMail API can be used to construct the content, but it's worth a look.
Within a single POST it isn't possible. But two ideas:
Show your text file as another web page that starts the download of your executable
Bundle both files into a zip archive