URLConnection does not get the charset

I'm using URL.openConnection() to download something from a server. The server says
Content-Type: text/plain; charset=utf-8
But connection.getContentEncoding() returns null. What's going on?

URLConnection.getContentEncoding() returns the value of the Content-Encoding header, not the charset from the Content-Type header.
Here is the code of URLConnection.getContentEncoding():
/**
* Returns the value of the <code>content-encoding</code> header field.
*
* @return the content encoding of the resource that the URL references,
* or <code>null</code> if not known.
* @see java.net.URLConnection#getHeaderField(java.lang.String)
*/
public String getContentEncoding() {
return getHeaderField("content-encoding");
}
Instead, call connection.getContentType() and extract the charset from the Content-Type value. Here is sample code that does this:
String contentType = connection.getContentType();
String[] values = contentType.split(";"); // values.length should be 2
String charset = "";
for (String value : values) {
value = value.trim();
if (value.toLowerCase().startsWith("charset=")) {
charset = value.substring("charset=".length());
}
}
if ("".equals(charset)) {
charset = "UTF-8"; //Assumption
}
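A slightly more defensive variant of the loop above, as a minimal sketch only. The names ContentTypeCharset and charsetOf are made up for illustration, and falling back to a caller-supplied default when the header is missing or names an unknown charset is an assumption on my part, not something the server guarantees.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class ContentTypeCharset {

    // Extracts the charset parameter from a Content-Type header value.
    // Returns the fallback when the header is null, has no charset
    // parameter, or names a charset this JVM does not know.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // the value may be quoted, e.g. charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // covers both illegal and unsupported charset names
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // hypothetical header values, with ISO-8859-1 as the assumed fallback
        System.out.println(charsetOf("text/plain; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/plain", StandardCharsets.ISO_8859_1));
    }
}
```

With a live connection you would pass connection.getContentType() as the first argument.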

This is documented behaviour as the getContentEncoding() method is specified to return the contents of the Content-Encoding HTTP header, which is not set in your example. You could use the getContentType() method and parse the resulting String on your own, or possibly go for a more advanced HTTP client library like the one from Apache.

Just as an addition to the answer from @Buhake Sindi: if you are using Guava, you can replace the manual parsing with:
MediaType mediaType = MediaType.parse(httpConnection.getContentType());
Optional<Charset> typeCharset = mediaType.charset();

Related

Spring Java URL query parameter date encoded like ISO 8601 with RestTemplate [duplicate]

Say I have a URL
http://example.com/query?q=
and I have a query entered by the user such as:
random word £500 bank $
I want the result to be a properly encoded URL:
http://example.com/query?q=random%20word%20%A3500%20bank%20%24
What's the best way to achieve this? I tried URLEncoder and creating URI/URL objects but none of them come out quite right.

URLEncoder is the way to go. You only need to keep in mind to encode only the individual query string parameter name and/or value, not the entire URL, for sure not the query string parameter separator character & nor the parameter name-value separator character =.
String q = "random word £500 bank $";
String url = "https://example.com?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
If you're not yet on Java 10 or newer, use StandardCharsets.UTF_8.toString() as the charset argument; if you're not yet on Java 7 or newer, use "UTF-8".
Note that spaces in query parameters are represented by +, not %20, which is perfectly valid. The %20 is usually used to represent spaces in the URI itself (the part before the URI-query string separator character ?), not in the query string (the part after ?).
Also note that there are three encode() methods: one without a charset argument, one with a String as the second argument (which throws a checked exception), and one with a Charset as the second argument. The one without a charset argument is deprecated. Never use it; always specify the charset. The javadoc even explicitly recommends UTF-8, as mandated by RFC 3986 and W3C:
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
See also:
What every web developer must know about URL encoding
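To make the + versus %20 point concrete, here is a minimal sketch that encodes only the parameter value. The class and method names (QueryValueEncoding, encodeFormValue, encodePercentValue) are made up for illustration, and the String overload is used so that it also compiles before Java 10.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class QueryValueEncoding {

    // Form-style encoding: spaces become '+', as used in query strings.
    static String encodeFormValue(String value) throws UnsupportedEncodingException {
        return URLEncoder.encode(value, "UTF-8");
    }

    // Same encoding, but with spaces written as %20. This is safe because
    // URLEncoder already turns a literal '+' in the input into %2B, so any
    // '+' left in the output can only stand for a space.
    static String encodePercentValue(String value) throws UnsupportedEncodingException {
        return encodeFormValue(value).replace("+", "%20");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String q = "random word \u00A3500 bank $"; // \u00A3 is the pound sign
        System.out.println("https://example.com/query?q=" + encodeFormValue(q));
        // prints https://example.com/query?q=random+word+%C2%A3500+bank+%24
        System.out.println("https://example.com/query?q=" + encodePercentValue(q));
        // prints https://example.com/query?q=random%20word%20%C2%A3500%20bank%20%24
    }
}
```

Only the value goes through the encoder; the ?, = and & that structure the query string are concatenated as-is.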

I would not use URLEncoder. Besides being incorrectly named (URLEncoder has nothing to do with URLs) and inefficient (it uses a StringBuffer instead of a StringBuilder and does a couple of other things that are slow), it's also way too easy to get wrong.
Instead I would use URIBuilder or Spring's org.springframework.web.util.UriUtils.encodeQuery or Commons Apache HttpClient.
The reason is that the query parameter name (q in BalusC's answer) has to be escaped differently from the parameter value.
The only downside to the above (that I found out painfully) is that URLs are not a true subset of URIs.
Sample code:
import org.apache.http.client.utils.URIBuilder;
URIBuilder ub = new URIBuilder("http://example.com/query");
ub.addParameter("q", "random word £500 bank $");
String url = ub.toString();
// Result: http://example.com/query?q=random+word+%C2%A3500+bank+%24

You need to first create a URI like:
String urlStr = "http://www.example.com/CEREC® Materials & Accessories/IPS Empress® CAD.pdf";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
Then convert that URI to an ASCII string:
urlStr = uri.toASCIIString();
Now your URL string is completely encoded. First we did simple URL encoding and then we converted it to an ASCII string to make sure no character outside US-ASCII remained in the string. This is exactly how browsers do it.
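One caveat with this approach, shown as a minimal sketch: the multi-argument URI constructor only quotes characters that are illegal in the given component, so separators such as & and = inside a query value are left untouched. The class and method names (UriConstructorEncoding, toAscii) and the sample inputs are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriConstructorEncoding {

    // Builds an http URI from already-separated parts and returns its
    // ASCII-only form. The constructor quotes characters that are illegal
    // in each component; toASCIIString() then percent-encodes non-ASCII
    // characters as UTF-8 bytes.
    static String toAscii(String host, String path, String query) throws URISyntaxException {
        URI uri = new URI("http", null, host, -1, path, query, null);
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(toAscii("example.com", "/first book.pdf", null));
        // prints http://example.com/first%20book.pdf

        // '&' and '=' are legal in a query, so they are left alone: a value
        // that contains them must be encoded before it is put in the query.
        System.out.println(toAscii("example.com", "/query", "q=fish & chips"));
        // prints http://example.com/query?q=fish%20&%20chips
    }
}
```

In the second call the server would see two parameters, not one value containing an ampersand.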

Guava 15 has now added a set of straightforward URL escapers.

The code
URL url = new URL("http://example.com/query?q=random word £500 bank $");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String correctEncodedURL = uri.toASCIIString();
System.out.println(correctEncodedURL);
Prints
http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$
What is happening here?
1. Split URL into structural parts. Use java.net.URL for it.
2. Encode each structural part properly!
3. Use IDN.toASCII(putDomainNameHere) to Punycode encode the hostname!
4. Use java.net.URI.toASCIIString() to percent-encode, NFC encoded Unicode - (better would be NFKC!). For more information, see: How to encode properly this URL
In some cases it is advisable to check if the URL is already encoded. Also replace '+' encoded spaces with '%20' encoded spaces.
Here are some examples that will also work properly
{
"in" : "http://نامه‌ای.com/",
"out" : "http://xn--mgba3gch31f.com/"
},{
"in" : "http://www.example.com/‥/foo",
"out" : "http://www.example.com/%E2%80%A5/foo"
},{
"in" : "http://search.barnesandnoble.com/booksearch/first book.pdf",
"out" : "http://search.barnesandnoble.com/booksearch/first%20book.pdf"
}, {
"in" : "http://example.com/query?q=random word £500 bank $",
"out" : "http://example.com/query?q=random%20word%20%C2%A3500%20bank%20$"
}
The solution passes around 100 of the test cases provided by Web Platform Tests.

Using Spring's UriComponentsBuilder:
UriComponentsBuilder
.fromUriString(url)
.build()
.encode()
.toUri()

The Apache HttpComponents library provides a neat option for building and encoding query parameters.
With HttpComponents 4.x use:
URLEncodedUtils
For HttpClient 3.x use:
EncodingUtil

Here's a method you can use in your code to convert a URL string and map of parameters to a valid encoded URL string containing the query parameters.
String addQueryStringToUrlString(String url, final Map<Object, Object> parameters) throws UnsupportedEncodingException {
if (parameters == null) {
return url;
}
for (Map.Entry<Object, Object> parameter : parameters.entrySet()) {
final String encodedKey = URLEncoder.encode(parameter.getKey().toString(), "UTF-8");
final String encodedValue = URLEncoder.encode(parameter.getValue().toString(), "UTF-8");
if (!url.contains("?")) {
url += "?" + encodedKey + "=" + encodedValue;
} else {
url += "&" + encodedKey + "=" + encodedValue;
}
}
return url;
}

In Android, I would use this code:
Uri myUI = Uri.parse("http://example.com/query").buildUpon().appendQueryParameter("q", "random word A3500 bank 24").build();
Where Uri is an android.net.Uri

In my case I just needed to pass the whole URL and encode only the value of each parameter.
I didn't find any common code to do that, so I created this small method to do the job:
public static String encodeUrl(String url) throws Exception {
if (url == null || !url.contains("?")) {
return url;
}
List<String> list = new ArrayList<>();
String rootUrl = url.split("\\?")[0] + "?";
String paramsUrl = url.replace(rootUrl, "");
List<String> paramsUrlList = Arrays.asList(paramsUrl.split("&"));
for (String param : paramsUrlList) {
if (param.contains("=")) {
String key = param.split("=")[0];
String value = param.replace(key + "=", "");
list.add(key + "=" + URLEncoder.encode(value, "UTF-8"));
}
else {
list.add(param);
}
}
return rootUrl + StringUtils.join(list, "&");
}
public static String decodeUrl(String url) throws Exception {
return URLDecoder.decode(url, "UTF-8");
}
It uses Apache Commons' org.apache.commons.lang3.StringUtils.

Use this:
URLEncoder.encode(query, StandardCharsets.UTF_8.displayName());
or this:
URLEncoder.encode(query, "UTF-8");
You can use the following code.
String encodedUrl1 = UriUtils.encodeQuery(query, "UTF-8"); // No change
String encodedUrl2 = URLEncoder.encode(query, "UTF-8"); // Changed
String encodedUrl3 = URLEncoder.encode(query, StandardCharsets.UTF_8.displayName()); // Changed
System.out.println("url1 " + encodedUrl1 + "\n" + "url2=" + encodedUrl2 + "\n" + "url3=" + encodedUrl3);

parse multipart/form-data response in spring

I need to receive a multipart/form-data response, but I have no idea how to parse this kind of response.
For example:
--mf8sckatxs4PpMnOLF6ltSv26ZJc5qxy9qq
Content-Disposition: form-data; name="arguments"
Content-Type: text/plain;charset=UTF-8
Content-Length: 311
[{"code":200,"message":"123"}]
Content-Disposition: form-data; name="_0"; filename="0_BODY_feature"
Content-Type: application/octet-stream
Content-Length: 407
binarydata

Since you're using Spring you already have the Apache fileupload and http classes available. MultipartStream is designed to be attached to an incoming byte stream and can provide progress updates as data arrives.
This simple example illustrates a non-streaming scenario where you've already buffered the whole incoming body.
byte[] yourResponse = ... // the whole response as a byte array
String yourContentType = ... // the Content-Type header string
ContentType contentType = ContentType.parse(yourContentType);
MultipartStream multipartStream = new MultipartStream(
new ByteArrayInputStream(yourResponse),
contentType.getParameter("boundary").getBytes(),
1024, // internal buffer size (you choose)
null); // progress indicator (none)
boolean nextPart = multipartStream.skipPreamble();
while (nextPart) {
ByteArrayOutputStream output = new ByteArrayOutputStream();
String partHeaders = multipartStream.readHeaders();
multipartStream.readBodyData(output);
// do something with the multi-line part headers
// do something with the part 'output' byte array
nextPart = multipartStream.readBoundary();
}
Add exception handling as required.

You can store and separate those values in an Array of Strings:
String[] array = "allyourinputtext".split(";");
This will separate the values after a semicolon. Then, you can access each value by doing this:
String content = array[0];
String name = array[1];
...
This doesn't solve the WHOLE problem (since not all values are separated by semicolons), but you can play with the arguments you pass to split() to separate your values.
Note: If you want to parse a String to int (the length value for example) you can use:
int length = Integer.parseInt(array[index]);

Http response through service sends content (mime) type in text/html but not application/json

My application uses OAuth with the Basecamp API. I am trying to get the HTTP response as JSON, but it comes back as plain HTML (text/html) instead, and I have no method to parse HTML content and extract the token from Basecamp. This is not homework, just a small R&D project to get started with the OAuth protocol, as I am new to OAuth.
//HERE -> final String JSON_CONTENT = "application/json"
String contentType = OAuthConstants.JSON_CONTENT;
if (response.getEntity().getContentType() != null) {
contentType = response.getEntity().getContentType().getValue();
//BELOW -> the contentType here is "text/html; charset=utf-8"
System.out.println(response.getEntity().getContentType().getValue()); //text/html; charset=utf-8
}
if (contentType.contains(OAuthConstants.JSON_CONTENT)) {
return handleJsonResponse(response);
} else
if (contentType.contains(OAuthConstants.URL_ENCODED_CONTENT)) {
return handleURLEncodedResponse(response);
} else
if (contentType.contains(OAuthConstants.XML_CONTENT)) {
return handleXMLResponse(response);
}
else {
// Unsupported Content type
throw new RuntimeException(
"Cannot handle "
+ contentType
+ " content type. Supported content types include JSON, XML and URLEncoded");
}
So the lines above show that control never reaches the JSON, XML, or URL-encoded branches. Either I need to parse the text/html response into JSON or XML, or I have to create another method named handleHtmlResponse(). Which would be the more convenient way to handle this content type?

After the response has been populated with all its data (headers, body, ...), commit it by calling ServletResponse#flushBuffer.

Sending XML http requests in java

Does anyone know what I can use instead of StringRequestEntity() as it has been deprecated?
PostMethod post = new PostMethod("my endpoint url bla bla bla");
post.setRequestHeader("Content-Type", "text/xml;charset=UTF-8");
String xmlRequest = new String(sw_reg.toString());
log.info("Setting request body to [" + xmlRequest + "]");
//Send the request
post.setRequestEntity(new StringRequestEntity(xmlRequest));
httpclient.executeMethod(post);

According to the documentation, you should use the following constructor instead:
public StringRequestEntity(String content,
String contentType,
String charset)
throws UnsupportedEncodingException
Creates a new entity with the given content, content type, and charset.
So you just need to add the content type and charset as extra parameters to the constructor call. Parameter description below...
content - The content to set.
contentType - The type of the content, or null. The value returned by getContentType(). If this content type contains a charset and the charset parameter is null, the content's type charset will be used.
charset - The charset of the content, or null. Used to convert the content to bytes. If the content type does not contain a charset and charset is not null, then the charset will be appended to the content type.

Best way to parse HTTP headers from HTTP request String using no 3rd party libs (Core Java)

Given an HTTP request header, does anyone have suggestions or know of existing code to properly parse the header? I am trying to do this with Core Java only, no third party libs
Edit:
Trying to find key fields from this String for example:
GET / HTTP/1.1User-Agent: curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15Host: localhost:9000Accept: */*
Want to parse out the Method and method

I wrote a library, RawHTTP, whose only purpose is to parse HTTP messages (requests and responses).
If you don't want to use a library, you could copy the source into your own code base, starting from this: https://github.com/renatoathaydes/rawhttp/blob/a6588b116a4008e5b5840d4eb66374c0357b726d/rawhttp-core/src/main/java/com/athaydes/rawhttp/core/RawHttp.java#L52
This will split the lines of the HTTP message all the way to the end of the metadata sections (start-line + headers).
With the list of metadata lines at hand, you can then call the parseHeaders method, which will create the headers for you. You can easily adapt that to just return a Map<String, List<String>> to avoid having to also import the header classes.
That said... RawHTTP has no dependencies, so I would just use it instead :) but up to you.

Start by reading and understanding the HTTP specification.
The request line and headers are separated by CR LF sequences (bytes with decimal value 13 and 10), so you can read the stream and separate out each line. I believe that the headers must be encoded in US-ASCII, so you can simply convert bytes to characters and append to a StringBuilder (but check the spec: it may allow ISO-8859-1 or another encoding).
The end of the headers is signified by CR LF CR LF.
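A minimal Core Java sketch of that reading step. The class and method names (HttpHeadReader, readHead) are made up for illustration, and decoding with ISO-8859-1 is an assumption chosen so that no byte is lost; check the spec for what your case actually needs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class HttpHeadReader {

    // Reads bytes up to and including the first CR LF CR LF and returns
    // them as text, without that terminating sequence. Bytes after the
    // terminator (the body) are left unread in the stream.
    static String readHead(InputStream in) throws IOException {
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        int matched = 0; // how many bytes of CR LF CR LF have been seen in a row
        int b;
        while (matched < 4 && (b = in.read()) != -1) {
            head.write(b);
            boolean expectCr = (matched % 2 == 0);
            if (b == (expectCr ? '\r' : '\n')) {
                matched++;
            } else {
                // a stray CR may itself start a new terminator
                matched = (b == '\r') ? 1 : 0;
            }
        }
        if (matched < 4) {
            throw new IOException("Stream ended before the end of the headers");
        }
        byte[] bytes = head.toByteArray();
        // ISO-8859-1 maps every byte to exactly one char (assumption, see above)
        return new String(bytes, 0, bytes.length - 4, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "GET / HTTP/1.1\r\nHost: localhost:9000\r\n\r\nbody"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(readHead(new ByteArrayInputStream(raw)));
    }
}
```

Reading one byte at a time is slow on a raw socket stream, so in practice you would wrap the stream in a BufferedInputStream first.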

Your concatenated one-line string is not an HTTP header.
A proper HTTP request message should look like this (not always):
GET / HTTP/1.1 CRLF
Host: localhost:9000 CRLF
User-Agent: curl/7.19.7 blar blar CRLF
Accept: */* CRLF
Content-Length: ?? CRLF
...: ... CRLF
CRLF
octets
See here http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html
If you want to implement an HTTP server without the help of Servlets or Java EE containers, you should use Sockets.
Read the first line [Request-Line = Method SP Request-URI SP HTTP-Version CRLF]
Read the request headers line by line until you reach the blank line
For each header line, parse [fieldName: fieldValue]
Read the entity body.
This is NOT the only case for HTTP message contracts.
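The line-by-line steps above can be sketched in Core Java roughly as follows. This is a minimal sketch, not a complete parser: the class name ParsedRequestHead is made up for illustration, the input is assumed to be the already-read text of the request line and headers separated by CR LF, and obsolete multi-line (folded) headers are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class ParsedRequestHead {

    final String method;
    final String target;
    final String version;
    // Header names are stored lower-cased because they are case-insensitive.
    final Map<String, List<String>> headers = new LinkedHashMap<>();

    ParsedRequestHead(String head) {
        String[] lines = head.split("\r\n");
        // Request-Line = Method SP Request-URI SP HTTP-Version
        String[] requestLine = lines[0].split(" ");
        if (requestLine.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + lines[0]);
        }
        method = requestLine[0];
        target = requestLine[1];
        version = requestLine[2];

        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                break; // blank line: end of the headers
            }
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("Malformed header line: " + lines[i]);
            }
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = lines[i].substring(colon + 1).trim();
            // the same header may appear more than once, so keep a list
            headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
    }

    public static void main(String[] args) {
        ParsedRequestHead parsed = new ParsedRequestHead(
                "GET / HTTP/1.1\r\nHost: localhost:9000\r\nAccept: */*\r\n\r\n");
        System.out.println(parsed.method + " " + parsed.target + " " + parsed.version);
        System.out.println(parsed.headers);
    }
}
```

Splitting each header at the first colon only is what keeps a value like localhost:9000 intact.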

I'm using the guava library to include preconditions for my methods. You can remove them in favor of null checks.
/**
* @return a string consisting of the HTTP headers, with the names and values delimited by
* the platform line separator, suitable for serialization to the database.
*/
public static final String httpHeadersToString(final HttpResponse httpResponse) {
Preconditions.checkNotNull(httpResponse);
Preconditions.checkNotNull(httpResponse.getAllHeaders());
final Header[] allHeaders = httpResponse.getAllHeaders();
StringBuffer sb = new StringBuffer();
int index = 0;
while(index < allHeaders.length) {
Header header = allHeaders[index];
sb.append(header.getName())
.append(System.getProperty("line.separator"))
.append(header.getValue());
if (++index < allHeaders.length) {
sb.append(System.getProperty("line.separator"));
}
}
return sb.toString();
}
/**
* @return HTTP headers reconstructed from a string of names and values delimited by the platform line separator.
*/
public final HttpHeaders stringToHttpHeaders(final String headerContents) {
HttpHeaders httpHeaders = new HttpHeaders();
final String[] tempHeaderArray = headerContents.split(System.getProperty("line.separator"));
int i = 0;
while (i + 1 < tempHeaderArray.length) { // stop before a trailing name that has no value
httpHeaders.add(tempHeaderArray[i++], tempHeaderArray[i++]);
}
return httpHeaders;
}
