Java: how to decode a GET URL parameter received through BeanParam - java

I receive a GET request on this web service:
@GET
@Path("/nnnnnn")
public Response pfpfpfpf(@BeanParam NNNNNN n)
The class NNNNNN has:
@QueryParam("parameter")
private String parameter;
There is a getter and a setter for that parameter.
I send a GET request with a query parameter and it is bound automatically to my NNNNNN object; everything works.
But now I am sending Japanese strings in the query URL. I encode the parameter as UTF-8 before sending, and I have to decode it as UTF-8.
My question is: where should I call URLDecoder? I tried calling it in the getter of that parameter, but it didn't work; I kept getting something like C3%98%C2%B4%C3%98%C2 instead of the Japanese characters.

The solution that works for me is:
In the servlet, I do this:
request.setCharacterEncoding("UTF-8");
and then in the HTML page I had to add this:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

This is a good question which has the potential to clear up many doubts about how information is processed (encoded and decoded) between systems.
Before I proceed, I should say that you need a fair understanding of charsets, encodings, etc. You may want to read this answer for a quick overview.
This has to be looked at from two perspectives: browser and server.
Browser perspective of Encoding
Each browser renders information/text, and to render it correctly it has to know how to interpret the underlying bits/bytes (see the 3rd bullet of my answer on how the same bits can represent different characters in different encoding schemes).
Browser page encoding
Each browser has a default encoding associated with it. Check this on how to see the default encoding of a browser.
If you do not specify any encoding in your HTML page, the browser's default encoding takes effect and the page is rendered according to that encoding's rules. So if the default encoding is ASCII and you are using Japanese, Chinese, or characters from the Unicode supplementary planes, you will see garbage.
You can tell the browser not to use its default encoding but to use a specific one to render your website, using <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">.
This is exactly what you did/found, and it worked because this meta tag essentially overrode the browser's default encoding.
Another way to achieve the same effect is to skip the meta tag and change the browser's default encoding instead. This is not recommended, however; using the Content-Type meta tag in your JSP is the recommended approach.
Try playing around with the browser's default encoding and the meta tag using the simple HTML below.
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
の, は, でした <br>
昨夜, 最高
</body>
</html>
Server perspective of Encoding
The server also needs to know how to interpret the incoming stream of data, that is, which encoding scheme to use (the server part is tricky because there are several possibilities). Read the following excerpt:
When data that has been entered into HTML forms is submitted, the form
field names and values are encoded and sent to the server in an HTTP
request message using method GET or POST, or, historically, via email.
The encoding used by default is based on a very early version of the
general URI percent-encoding rules, with a number of modifications
such as newline normalization and replacing spaces with "+" instead of
"%20". The MIME type of data encoded this way is
application/x-www-form-urlencoded, and it is currently defined (still
in a very outdated manner) in the HTML and XForms specifications. In
addition, the CGI specification contains rules for how web servers
decode data of this type and make it available to applications.
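To make the quoted rules concrete, here is a minimal sketch using the JDK's java.net.URLEncoder and java.net.URLDecoder, which produce and read application/x-www-form-urlencoded data. The class name and sample strings are made up for illustration, and the Charset overloads assume Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original question or answer.
class FormEncodingDemo {

    // Turn the value into UTF-8 bytes, then percent-escape the unsafe ones.
    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Reverse the process: percent-escapes back to bytes, bytes read as UTF-8.
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A space becomes "+", not "%20", as the excerpt describes.
        System.out.println(encode("hello there"));

        // Two Japanese characters (U+6628 U+591C), written as escapes so the
        // source file itself stays ASCII-only.
        String japanese = "\u6628\u591C";
        String encoded = encode(japanese);
        System.out.println(encoded);
        System.out.println(decode(encoded).equals(japanese));
    }
}
```

Both sides have to agree on the charset: the sender picks one to turn characters into bytes, and the receiver must use the same one to turn the bytes back into characters.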
This again has two parts: how the server should decode the incoming request stream and how it should encode the outgoing response stream.
There are several ways to do this depending upon the use case, for example:
There are methods like setCharacterEncoding, setContentType etc. in the HTTP request and response objects, which can be used to set the encoding.
This is exactly what you did in your case: you told the server to use the UTF-8 encoding scheme for decoding the request data because you are expecting non-ASCII characters (Japanese, in your case). But this is not all; please read on.
Set the encoding at the server or JVM level, using JVM options like -Dfile.encoding=utf8. Read this article on how to set the server encoding.
In your case you were reading the Japanese characters from the query string of the URL, and the query string is part of the HTTP request object, so calling request.setCharacterEncoding("UTF-8"); gave you the desired result.
But the same will not work for URL encoding, which is different from request encoding (your case). Consider the example below: neither System.out.println call will show the desired result even after calling request.setCharacterEncoding("UTF-8");, because here you need URL encoding. The URL will be something like http://localhost:7001/springapp/forms/executorTest/encodingTest/hellothere 昨夜, 最高 and it contains no query string.
@RequestMapping(value="/encodingTest/{quertStringValue}", method=RequestMethod.GET)
public ModelAndView encodingTest(@PathVariable("quertStringValue") String quertStringValue, ModelMap model, HttpServletRequest request) throws UnsupportedEncodingException {
    System.out.println("############### quertStringValue " + quertStringValue);
    request.setCharacterEncoding("UTF-8");
    System.out.println("############### quertStringValue " + quertStringValue);
    return new ModelAndView("ThreadInfo", "ThreadInfo", "####### This is my encoded output " + quertStringValue);
}
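What goes wrong when a URL is decoded with the wrong charset can be sketched with plain JDK calls. This is only an illustration of the effect; it does not reproduce what any particular container or Spring version does internally, and the class and method names are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; illustrates charset mismatch when decoding a URL.
class PathDecodingDemo {

    // The characters U+6628 U+591C percent-encoded from their UTF-8 bytes,
    // which is how a browser would typically send them in a URL.
    static final String ENCODED = "%E6%98%A8%E5%A4%9C";

    static String decodeAs(Charset charset) {
        return URLDecoder.decode(ENCODED, charset);
    }

    // Undo a wrong ISO-8859-1 decode: recover the raw bytes, re-read as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String right = decodeAs(StandardCharsets.UTF_8);
        String wrong = decodeAs(StandardCharsets.ISO_8859_1);

        // Correct charset: 2 characters. Wrong charset: one character per byte.
        System.out.println(right.length());
        System.out.println(wrong.length());
        System.out.println(repair(wrong).equals(right));
    }
}
```

The repair step works only because ISO-8859-1 maps every byte to exactly one character, so no information is lost; with other charsets the damage may not be reversible, which is why configuring the decoding charset up front is the safer route.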
Depending on the framework you are using, you may need additional configuration to specify a character encoding for requests or URLs, so that you can either apply your own encoding when the request does not already specify one, or enforce the encoding in any case. This is useful because current browsers typically do not set a character encoding even if one is specified in the HTML page or form.
In Spring, there is org.springframework.web.filter.CharacterEncodingFilter for configuring request encoding. Read this similar, interesting question, which is based on this fact.
In a nutshell
Every computer program, whether an application server, web server, browser, IDE, etc., understands only bits, so it needs to know how to interpret those bits to make sense of them, because depending on the encoding used, the same bits can represent different characters. That is where encoding comes into the picture: it gives each character a unique representation so that all programs, across different operating systems, know exactly how to interpret it.

Related

Serve html page for video streaming in embedded Jetty

I have an embedded jetty application that has various endpoints, one of them is /stream/fileId and another one is /streamWithHTML/fileId.
The first one gets a file with the specified id from filesystem and starts writing its contents to the response, making use of the range headers to skip to different parts of the video.
The second endpoint is intended to do the same thing, except that instead of the default browser player, I want my own custom player built with HTML.
This is the code I have put together so far. It returns a web page, but I don't know how to handle the video stream:
response.setContentType("text/html");
try {
    PrintWriter writer = response.getWriter();
    String filePath1 = "path/to/file";
    String encodedFilename = URLEncoder.encode(filePath1, StandardCharsets.UTF_8);
    // Get an html file and then print its contents after replacing a couple of parameters
    InputStream stream = getClass().getClassLoader().getResourceAsStream("HTML/VideoPlayer/index_template.html");
    String htmlPage = IOUtils.toString(stream, StandardCharsets.UTF_8);
    htmlPage = htmlPage.replace("$title", fileEntity1.fileName);
    htmlPage = htmlPage.replace("$video_source", encodedFilename);
    writer.println(htmlPage);
    writer.flush();
    writer.close();
} catch (IOException | NullPointerException e) {
    e.printStackTrace();
}
And this is the contents of the html file:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>$title</title>
</head>
<body style="background-color: #303030">
<video style="max-width: 100%; max-height: 100%;" src="$video_source"></video>
</body>
</html>
Note that $title and $video_source are just placeholder strings I use to insert values when I serve the page.
Every solution I have tried or seen so far serves a static video, which is not possible for me, since the video is stored in a database.
Additional note: the video must be accessible from outside through only one endpoint, the one with the HTML page, so putting the video on one endpoint and then referencing it in the HTML of another request is not an option for me.
The $video_source is just an HTTP request to a specific resource on your webserver. (it could be /videos/67328148734529)
It will be up to you to implement that specific resource endpoint on your webserver. Typically as an HttpServlet. (in the above example, your endpoint would be on url-pattern /videos/*, and you would use the request.getPathInfo() on your servlet to obtain the video id that is being requested by the browser)
Some things you'll be on the hook to handle.
Honor the limitations put on your response by the various Accept headers on the request.
Provide the correct Content-Type response.
Honor the various HTTP range request headers to know where in the video content you should serve.
Produce the correct HTTP range response headers to indicate to the browser where the content being responded to is coming from.
Stream this video content from your database to your response to the request.
You should consider not using a general database for the video content itself. It makes for a terribly inefficient experience, as most database blob requests don't support range requests efficiently (especially with the nuances of HTTP range requests and multipart/byteranges responses).
A common technique is to put the video in its raw form on a storage device (hdd, ssd, cdn, etc.) with a UUID-style name, and have the database reference the UUID name and other metadata. That way you serve from storage to network efficiently.
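The range handling listed above can be sketched as follows. This is a simplified illustration, assuming a single "bytes=" range (multi-range requests, which would need a multipart/byteranges response, are rejected); the class and method names are hypothetical and no servlet API is involved. A servlet would typically answer with 206 Partial Content plus the Content-Range value when a range is returned, and with 416 Range Not Satisfiable when a Range header was present but could not be satisfied.

```java
// Hypothetical helper; handles only a single byte range.
class RangeHeaderDemo {

    // Returns {start, end} (both inclusive), or null when the header is
    // missing, malformed, multi-range, or not satisfiable for this length.
    static long[] parseRange(String header, long fileLength) {
        if (header == null || !header.startsWith("bytes=") || header.contains(",")) {
            return null;
        }
        String spec = header.substring("bytes=".length()).trim();
        int dash = spec.indexOf('-');
        if (dash < 0) {
            return null;
        }
        String first = spec.substring(0, dash).trim();
        String last = spec.substring(dash + 1).trim();
        try {
            long start;
            long end;
            if (first.isEmpty()) {
                // Suffix form "bytes=-N": the last N bytes of the file.
                if (last.isEmpty()) {
                    return null;
                }
                long suffix = Long.parseLong(last);
                if (suffix <= 0) {
                    return null;
                }
                start = Math.max(0, fileLength - suffix);
                end = fileLength - 1;
            } else {
                start = Long.parseLong(first);
                // Open-ended "bytes=N-" runs to the end; clamp oversized ends.
                end = last.isEmpty()
                        ? fileLength - 1
                        : Math.min(Long.parseLong(last), fileLength - 1);
            }
            if (start < 0 || start > end) {
                return null; // also covers an empty file
            }
            return new long[] {start, end};
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Value for the Content-Range response header.
    static String contentRange(long[] range, long fileLength) {
        return "bytes " + range[0] + "-" + range[1] + "/" + fileLength;
    }

    public static void main(String[] args) {
        long length = 1000;
        long[] range = parseRange("bytes=500-", length);
        System.out.println(contentRange(range, length));
    }
}
```

With the start and end offsets known, the servlet can seek to the start position in the file and copy end - start + 1 bytes to the response output stream.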

Internet Explorer doesn't handle html encoding in URL (GWT)

Using GWT, I've got a webapp, and on a certain page it pulls a parameter from the URL that has the pipe character (|) encoded. So, for example, the full URL would be (in dev mode):
http://127.0.0.1:8888/Home.html?gwt.codesvr=127.0.0.1:9997#DynamicPromo:pk=3%257C1000
and when I pull the parameter "pk" I should get "3|1000". (%257C is the encoded pipe char)
Well, this works just fine in Firefox and Chrome.
In IE (I'm using 11), I get "3%7C1000" when I pull the parameter. For whatever reason, IE drops the 25 in the encoded character, meaning it's no longer a pipe char and my app breaks.
I've read around and found that encoding issues are common on IE. In particular, I found this page: http://support.microsoft.com/kb/928847
Its suggested solutions include:
Disable the Auto-Select setting in Internet Explorer.
Provide the character set in the HTTP headers.
Move the META tag to within the first kilobyte of data that is parsed
by MSHTML.
I've tried those 3 and it didn't help. Here is the beginning of my Home.html:
<!doctype html>
<html>
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta http-equiv="content-type" content="text/html;charset=utf-8" />
The other two suggestions:
Increase the size of the server's initial HTTP response. The initial
size should be at least 1 KB.
Make sure that the System Locale setting matches the character set of
the META tag that is specified in the HTML page.
I don't feel will do anything. My system locale settings are correct. And since my meta tags are at the beginning of the document, they are within the first kilobyte of data, so they would be read first. So I don't see why I'd need to increase the HTTP response size.
So, I need IE to properly read this encoded character for the web application to work properly. Does anyone have any other suggestions I could try?
UPDATE:
How the URL is encoded:
URL.encodePathSegment(place.getValue())
Where URL is from the package com.google.gwt.http.client
getValue() is set from this:
public static String encodePk(PrimaryKey pk)
{
    if(pk != null)
    {
        return String.valueOf(pk.getPk()).concat("|").concat(String.valueOf(pk.getCpk()));
    }
    else{
        return "";
    }
}
The final result is the url I posted at the top:
http://127.0.0.1:8888/Home.html?gwt.codesvr=127.0.0.1:9997#DynamicPromo:pk=3%257C1000
Where the part after "pk=" is the encoded string.
To make sure IE kept the encoding intact, I had to decode the URL as soon as I set it:
public void setValue(String value)
{
this.value = unescape(value);
}
private static native String decodeURI( String s )
/*-{
return decodeURI(s);
}-*/;
Thanks a lot for the help!
Try the JavaScript encodeURIComponent() function to encode a string. This function makes a string portable, so it can be transmitted across any network to any computer that supports ASCII characters.
This function encodes special characters.
In addition, it encodes the following characters: , / ? : @ & = + $ #
For more info click HERE.
Here is a sample code using JSNI:
public static final native String encodeURIComponent(String uri) /*-{
return encodeURIComponent(uri);
}-*/;
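The value in the question's URL, 3%257C1000, is the pipe character percent-encoded twice: "|" becomes "%7C", and encoding that again turns the "%" into "%25". A consumer that decodes once therefore sees "3%7C1000", which matches what IE reported. The sketch below shows the effect on a plain JVM, using java.net.URLEncoder only as a stand-in (an assumption; it is not GWT's URL.encodePathSegment and is not available in GWT client code). The class name is hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; shows single versus double percent-encoding.
class DoubleEncodingDemo {

    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    static String decode(String value) {
        return URLDecoder.decode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String pk = "3|1000";
        String once = encode(pk);    // the pipe is escaped
        String twice = encode(once); // the "%" of the escape is escaped again

        System.out.println(once);
        System.out.println(twice);
        // Each decode strips exactly one layer.
        System.out.println(decode(twice));
        System.out.println(decode(decode(twice)));
    }
}
```

Whichever side adds a second layer of encoding, the reading side has to remove the same number of layers, otherwise browsers that normalize the URL differently will hand back different strings.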

Storing text on GAE, non-standard unicode characters being changed

I have a servlet on Google App Engine that takes text from the page, stores it as an entity, and later sends it back to the client. When I store the word "You're", I get it showing up in the GAE localstore as "You're" as normal. When I return it to the client, however, I get "Youâre" and the debug code at times reads "Youâ??re". I am using the Java Text class to store this text.
How can I ensure that any Unicode characters can be stored correctly? It looks like client -> server is fine by the fact that the text does not change, but server -> client is definitely screwing up. Thanks!
The majority of times I've seen this problem, either the page doesn't declare that it's using UTF-8, via something like
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
or accept-charset isn't set in the form.
Could either of those be the case here?
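The "Youâ..." symptom is what typically appears when UTF-8 bytes are read back as a single-byte charset. The sketch below assumes the stored word contained a typographic apostrophe (U+2019), which the question does not state explicitly; class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; reproduces the garbling under stated assumptions.
class MojibakeDemo {

    // Encode as UTF-8 (what the server sends), then read the bytes as
    // ISO-8859-1 (what a client does if no charset was declared).
    static String misread(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8),
                          StandardCharsets.ISO_8859_1);
    }

    // Reverse the mistake: recover the bytes and read them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "You" + U+2019 (right single quotation mark) + "re"
        String original = "You\u2019re";
        String garbled = misread(original);

        // The one apostrophe became three characters: U+00E2 (a with
        // circumflex) followed by two invisible control characters.
        System.out.println(garbled.length());
        System.out.println(repair(garbled).equals(original));
    }
}
```

Declaring the charset on the response, as suggested above, removes the need for any repair step because the client then reads the bytes as UTF-8 in the first place.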

How to handle non-ASCII Characters in Java while using PDPageContentStream/PDDocument

I am using PDFBox to create PDF from my web application. The web application is built in Java and uses JSF. It takes the content from a web based form and puts the contents into a PDF document.
Example: A user fills in an inputTextArea (JSF tag) in the form and that is converted to a PDF. I am unable to handle non-ASCII characters.
How should I handle the non-ASCII characters, or at least strip them out before putting them in the PDF? Please help me with any suggestions or point me to any resources. Thanks!
Since you're using JSF on JSP instead of Facelets (which is implicitly already using UTF-8), do the following steps to avoid the platform default charset being used (which is often ISO-8859-1, which is the wrong choice for handling of the majority of "non-ASCII" characters):
Add the following line to top of all JSPs:
<%@ page pageEncoding="UTF-8" %>
This sets the response encoding to UTF-8 and sets the charset of the HTTP response content type header to UTF-8. The latter instructs the client (web browser) to display the page, and submit its form, using UTF-8.
Create a Filter which does the following in doFilter() method:
request.setCharacterEncoding("UTF-8");
Map this on the FacesServlet as follows:
<filter-mapping>
<filter-name>nameOfYourCharacterEncodingFilter</filter-name>
<servlet-name>nameOfYourFacesServlet</servlet-name>
</filter-mapping>
This sets the request encoding of all JSF POST requests to UTF-8.
This should fix the Unicode problem on the JSF side. I have never used PDFBox, so I cannot vouch for that part; note that PDFBox is an Apache library of its own and is not built on iText, so its Unicode handling should be checked separately. Let me know if it still doesn't work after applying the above fixes.
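The question also asks about stripping non-ASCII characters as a fallback. A minimal sketch of that, using only the JDK, might look like the following. The class and method names are hypothetical, and dropping characters loses information, so this is a last resort rather than a fix.

```java
import java.text.Normalizer;

// Hypothetical helper; a lossy fallback, not a real Unicode solution.
class AsciiOnlyDemo {

    static String stripNonAscii(String text) {
        // Decompose accented letters into base letter plus combining mark
        // (NFD), so the plain base letter survives when the mark is removed.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // Remove every character outside the 7-bit ASCII range.
        return decomposed.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // "caf" + U+00E9 (e with acute accent)
        System.out.println(stripNonAscii("caf\u00E9"));
        // Characters with no ASCII base letter are removed entirely.
        System.out.println(stripNonAscii("\u6628\u591C abc"));
    }
}
```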
See also:
Unicode - How to get the characters right?

JSPs and trademark symbol

On the web pages in our app, the trademark symbol (TM) is appearing as a question mark. The registered trademark (R) works, though. We are displaying the value using the c:out tag in the JSP standard library. If I put &trade; or &#8482; on the page to test this, those show up as they are supposed to.
<td><c:out value="${item.description}"/></td> <!-- does not work -->
<td>yada yada yada Spiros&trade; yada yada yada</td> <!-- works -->
To add to this, we're also using YUI, and before we display these pages, they show up in a YUI data table as the results of a query (the user clicks on a row to go to the page described above). The (TM) shows up properly in that table. That tells me that we are properly fetching the value from our database, and as well the server code generating the XML to send back to the YUI data table also works.
So why is the same String displayed properly in the YUI data table, but not in a normal JSP, unless we hardcode the symbol onto the page?
You probably have an encoding issue. If you do not have an explicit encoding in your JSP:
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
then it's time to add one. Try UTF-8 and if that doesn't work try ISO-8859-1 ... or if you know the correct encoding, use that.
When a character appears as ? in a browser (usually Firefox), it means that the page encoding, as detected by the browser, does not recognize the character. A good test is to select View -> Character Encoding -> UTF-8 in Firefox. If the character then appears correctly, the (TM) character is encoded as UTF-8, and you have to make your page set the response encoding header to UTF-8. That should fix it for you.
If that does not work, first find out how the character is encoded (for example, check which encoding is used when reading from the database) and set the page encoding header to that encoding.
The second format works because the (TM) character is written as a known HTML entity, which the browser interprets regardless of the page encoding.
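The same idea can be applied programmatically: replacing each non-ASCII character with a numeric character reference yields markup that survives any ASCII-compatible page encoding. The sketch below is an illustration only, with hypothetical names; it does not escape characters such as < or &, so it is not a substitute for the XML escaping that c:out performs.

```java
// Hypothetical helper; converts non-ASCII characters to numeric references.
class HtmlEntityDemo {

    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        // Iterate by code point so characters outside the BMP stay intact.
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // "Spiros" followed by U+2122 (trade mark sign)
        System.out.println(escapeNonAscii("Spiros\u2122"));
    }
}
```

Setting the page encoding correctly remains the cleaner fix; this kind of conversion mainly helps when the page encoding is outside your control.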
