Issue with soap call and content length - java

I have written a java swing app that is sending SOAP requests based on this code here
Overall it is working great, however I have just started testing it when parsing Chinese characters in the soap:BODY and this causes an error where I get a 400 response from the Web Server:
s.AddParameter("xml", "班");
Using wireshark i eventually tracked it down to the Content-Length value that was constructed being incorrect when parsing these Chinese characters (and i am assuming any other multibyte(?) character).
I have proven this by overriding the content length generation by simply changing the code to this:
out.println("Content-Length: " + String.valueOf(postData.length()+2));
Obviously this is not a solution as it only proved my very isolated test case of sending a single character, but i believe the issue is that the postData.length() is calculated first and then on posting the data my 班 character is then converted to \347\217\255, throwing the content-length out and causing the request to fail.
So I am asking for advice on how to resolve this issue?
Is it possible for me to encode the value first, obtain the content length and suppress the encoding on the post? I am unsure what is actually encoding it; the PrintWriter i am assuming?
Regards.

Related

Java changes Cyrillic to unicode like \uXXXX

I am making an application in Java, that will log into my school diary using web api, so I will be able to make my own UI. As the title says, Java at some moment changes the cyrillic to unicode like \uXXXX symbolds. Here is the code on the Russian Stackoverflow: https://ru.stackoverflow.com/questions/1452959/%d0%a1%d0%b5%d1%80%d0%b2%d0%b5%d1%80-%d0%be%d1%82%d0%b2%d0%b5%d1%80%d0%b3%d0%b0%d0%b5%d1%82-%d0%b7%d0%b0%d0%bf%d1%80%d0%be%d1%81. Try to translate it, to understand more. When I am sending my request to https://httpbin.org/post instead of my LOGIN_URL with cyrillic symbols it returns them transformed, if I send request with ascii symbols, I get them back, and, in the linked post I mentioned the python project, which does exactly the same thing I want. And when I modify it to make it send request to httpbin, the cyrillic symbols are returned back! What do I do to fix my java code? P.S. Currently I am switched to okhttp3 from apache http client (same problem), but, I can go back.
Well, I solved my problem. It consisted not in the character encoding, but in the absence of two http headers, namely
httpPost.setHeader("X-Requested-With", "XMLHttpRequest");
httpPost.setHeader("Referer", Constants.BASE_URL);
(added to login request)

Encoding goes wrong in the transport of a SOAP message

Context
I have a SOAP web service which is served by a JBOSS EAP instance and is called via a SOAP UI client.
In the result returned by this web service there may be an XML string returned like this by the web service:
The same string will be rendered as follows in the SOAP UI client:
As you can observe, during the transport of this message some characters (specifically <) have been encoded to <: this is normal, as the encoder wants to avoid that the string gets interpreted as markup when it's just an output to be returned as is.
Problem
What we have observed is that when the string is too long, the encoding goes just wrong. I've tried to analyze and understand and this is all I can get:
Towards the end of the string, some < characters are left as such and are not converted into <
Very weirdly, an XML tag that is normally formed on server side:
<calculationPeriod>
...some stuff
</calculationPeriod>
... has its second c converted into < and that clearly breaks completely the XML:
<cal<ulationPeriod>
...some stuff
</calculationPeriod>
My question
Honestly, I have no idea how to debug this issue furtherly. All I can notice is that:
When inside the web service (stack that I control), the response is normally formed and encoded in XML using the open tag <.
Once out in the SOAP UI client (all across the stack there are generic JBOSS calls and RMI invocations), the message gets corrupted like this.
It is important to remark that this only happens when the string is particularly long. I have one output with length 8192 characters (before encoding) that goes fine, while the other output having length 9567 characters (before encoding) goes wrong and is the subject of this question.
Apologises :)
I'm sorry not to be able to provide a reproductible use case, as well as to use a title which means nothing and everything in the question.
I'm open to provide any additional information for those who may help and to rephrase the question once I get a clearer picture of what the problem is.
I've of course looked a lot on the web but I can't find anything similar, probably I don't search with the right keywords.

Java Jersey 2.4.1: content-length header requirement when using fixed length sreaming

Jersey 2.4.1 gives us the ability to enable fixed length streaming. This is very useful when uploading large files. The new client property for enabling this is: HTTP_URL_CONNECTOR_FIX_LENGTH_STREAMING.
By default, when doing uploads, the whole entity content is buffered by the connector before the bytes are sent to their destination. This means that the client will likely run out of memory when uploading large files. Enabling fixed length streaming solves this problem.
Unfortunately this property is not honored when the content-length header is not specified (or is set to 0) in the request. My question is why? What problem are the Jersey runtimes trying to prevent by putting this restriction? Is the content length information necessary to stream the data?
Thanks,
Habib
Whether fixed length streaming is actived or not, the client should set the header anyway. With fixed length you know the size without the need of buffering the content but that only makes sense if you actually set the header. The server doesn't care if the client buffered the content to determine the length or not.
In HTTP, [the Content-Length field] SHOULD be sent whenever the message's length can be determined prior to being transferred, unless this is prohibited by the rules in section 4.4.
RFC 2616, section 14.13 Content-Length
Without setting the length header, the client could start streaming indefinitely, without a buffer. I guess this it what Jersey tries to prevent, because then the server wouldn't know when the content ends (exept some cases listed in
RFC 2616, section 4.4 Message Length).
I forward upload requests I receive from clients to an another endpoint. I do not control the presence of the content length header in the requests I receive, and therefore may not always have a content length header to send to the end point.
That said, I can see that we need to protect against the malicious case you mention above, although I initially thought this would be the backend's responsibility.
Thanks for the clarification.

HTTP Request (POST) field size limit & Request.BinaryRead in JSPs

First off my Java is beyond rusty and I've never done JSPs or servlets, but I'm trying to help someone else solve a problem.
A form rendered by JavaScript is posting back to a JSP.
Some of the fields in this form are over 100KB in size.
However when the form field is being retrieved on the JSP side the value of the field is being truncated to 100KB.
Now I know that there is a similar problem in ASP Request.Form which can be gotten around by using Request.BinaryRead.
Is there an equivalent in Java?
Or alternatively is there a setting in Websphere/Apache/IBM HTTP Server that gets around the same problem?
Since the posted request must be kept in-memory by the servlet container to provide the functionality required by the ServletRequest API, most servlet containers have a configurable size limit to prevent DoS attacks, since otherwise a small number of bogus clients could provoke the server to run out of memory.
It's a little bit strange if WebSphere is silently truncating the request instead of failing properly, but if this is the cause of your problem, you may find the configuration options here in the WebSphere documentation.
We have resolved the issue.
Nothing to do with web server settings as it turned out and nothing was being truncated in the post.
The form field prior to posting was being split into 102399 bytes sized chunks by JavaScript and each chunk was added to the form field as a value so it was ending up with an array of values.
Request.Form() appears to automatically concatenate these values to reproduce the single giant string but Java getParameter() does not.
Using getParameterValues() and rebuilding the string from the returned values however did the trick.
You can use getInputStream (raw bytes) or getReader (decoded character data) to read data from the request. Note how this interacts with reading the parameters. If you don't want to use a servlet, have a look at using a Filter to wrap the request.
I would expect WebSphere to reject the request rather than arbitrarily truncate data. I suspect a bug elsewhere.

Soap body is utf-8 encoded twice

We use a web service which expects UTF-8. The framework we use on the client is Apache Axis2. We call the web service and the soap body contains strings in UTF-8. The problem is that it seems like the body is "double encoded". I.e we have the character 'å'. The utf-8 representation of 'å' in utf-8 is C3 A5 however we see in our logs that the (double) encoded value sent is C3 83 C2 A5.
Has anyone experienced similiar problems?
It's not entirely clear how you're calling the web service. Does the method in the web service just take a string? If so, what does your string look like in Java? All strings in Java are UTF-16 encoded - if you're converting the UTF-8 binary representation into a string by taking each byte and turning it into a character, then that's the problem.
If you could show what the method you're calling looks like, and how you're calling it, that would help a lot.
For what it's worth, I've used Axis with non-ASCII strings with no problem in the past. I strongly suspect this is a problem with how you're using it rather than with Axis itself, although I'm willing to be proved wrong :)
EDIT: Based on your comment, it sounds like you've got problems receiving the HTML form data, before you hit the web service. If the user has typed "å" into the form, then that's what you should see when you debug in Eclipse. If you're putting bad data into your web service, it's no wonder you're getting bad data out at the other end. I suggest you run WireShark to see exactly what the browser is sending you, both in terms of the raw bytes and also what content encoding it's specifying. My guess is that your web server is treating it as ISO-8859-1 but it's actually UTF-8.
Once you've got the string correctly from the form, I suspect you'll find there are no problems at all in passing it on to the web service.

Categories