Encoding goes wrong in the transport of a SOAP message - java

Context
I have a SOAP web service which is served by a JBOSS EAP instance and is called via a SOAP UI client.
In the result returned by this web service there may be an XML string returned like this by the web service:
The same string will be rendered as follows in the SOAP UI client:
As you can observe, during the transport of this message some characters (specifically <) have been encoded to &lt;. This is normal: the encoder wants to prevent the string from being interpreted as markup when it is just output to be returned as is.
Problem
What we have observed is that when the string is too long, the encoding simply goes wrong. I've tried to analyze and understand it, and this is all I can get:
Towards the end of the string, some < characters are left as such and are not converted into &lt;
Very weirdly, an XML tag that is normally formed on server side:
<calculationPeriod>
...some stuff
</calculationPeriod>
... has its second c converted into &lt;, which completely breaks the XML:
<cal&lt;ulationPeriod>
...some stuff
</calculationPeriod>
My question
Honestly, I have no idea how to debug this issue further. All I can notice is that:
When inside the web service (stack that I control), the response is normally formed and encoded in XML using the open tag <.
Once out in the SOAP UI client (all across the stack there are generic JBOSS calls and RMI invocations), the message gets corrupted like this.
It is important to remark that this only happens when the string is particularly long. I have one output with length 8192 characters (before encoding) that goes fine, while the other output having length 9567 characters (before encoding) goes wrong and is the subject of this question.
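One way to narrow this down is to compare the string as it leaves the web service with the raw payload captured on the client side and find the exact offset where they first diverge. The following is only a diagnostic sketch: the class and method names (EscapeDiff, escapeMinimal, firstMismatch) are hypothetical, and it assumes the serializer escapes only & and <, which may not match what the JBOSS stack actually does. If the first mismatch falls close to a round offset such as 8192, that would hint at a buffer boundary problem, but this is a hypothesis to test, not a diagnosis.

```java
final class EscapeDiff {

    // Escapes only '&' and '<', the minimum required for XML text content.
    // Assumption: some serializers also escape '>', adjust if yours does.
    static String escapeMinimal(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '&') {
                sb.append("&amp;");
            } else if (c == '<') {
                sb.append("&lt;");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns the index of the first differing character, or -1 if both
    // strings are identical. If one string is a prefix of the other, the
    // length of the shorter one is returned.
    static int firstMismatch(String expected, String actual) {
        int n = Math.min(expected.length(), actual.length());
        for (int i = 0; i < n; i++) {
            if (expected.charAt(i) != actual.charAt(i)) {
                return i;
            }
        }
        return expected.length() == actual.length() ? -1 : n;
    }
}
```

Calling EscapeDiff.firstMismatch(EscapeDiff.escapeMinimal(serverSideString), rawClientPayload) for both the 8192-character output and the 9567-character output would show whether the corruption always starts at the same position.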
Apologies :)
I'm sorry that I can't provide a reproducible use case, and that the title of the question means nothing and everything at once.
I'm open to provide any additional information for those who may help and to rephrase the question once I get a clearer picture of what the problem is.
I've of course searched the web a lot but I can't find anything similar; I'm probably not searching with the right keywords.

Related

Java changes Cyrillic to unicode like \uXXXX

I am making an application in Java that will log into my school diary using a web API, so that I can make my own UI. As the title says, Java at some point changes the Cyrillic to Unicode escapes like \uXXXX. Here is the code on the Russian Stackoverflow: https://ru.stackoverflow.com/questions/1452959/%d0%a1%d0%b5%d1%80%d0%b2%d0%b5%d1%80-%d0%be%d1%82%d0%b2%d0%b5%d1%80%d0%b3%d0%b0%d0%b5%d1%82-%d0%b7%d0%b0%d0%bf%d1%80%d0%be%d1%81. Try translating it to understand more. When I send my request with Cyrillic symbols to https://httpbin.org/post instead of my LOGIN_URL, it returns them transformed; if I send the request with ASCII symbols, I get them back unchanged. In the linked post I mentioned a Python project which does exactly what I want. When I modify it to send its request to httpbin, the Cyrillic symbols are returned intact! What do I do to fix my Java code? P.S. I have currently switched to okhttp3 from Apache HTTP client (same problem), but I can go back.
Well, I solved my problem. It consisted not in the character encoding, but in the absence of two http headers, namely
httpPost.setHeader("X-Requested-With", "XMLHttpRequest");
httpPost.setHeader("Referer", Constants.BASE_URL);
(added to login request)

Issue with soap call and content length

I have written a java swing app that is sending SOAP requests based on this code here
Overall it is working great, however I have just started testing it when parsing Chinese characters in the soap:BODY and this causes an error where I get a 400 response from the Web Server:
s.AddParameter("xml", "班");
Using Wireshark I eventually tracked it down to the constructed Content-Length value being incorrect when parsing these Chinese characters (and, I am assuming, any other multibyte(?) character).
I have proven this by overriding the content length generation by simply changing the code to this:
out.println("Content-Length: " + String.valueOf(postData.length()+2));
Obviously this is not a solution, as it only proved my very isolated test case of sending a single character, but I believe the issue is that postData.length() is calculated first, and then on posting the data my 班 character is converted to \347\217\255, throwing the Content-Length out and causing the request to fail.
So I am asking for advice on how to resolve this issue.
Is it possible for me to encode the value first, obtain the content length, and suppress the encoding on the post? I am unsure what is actually encoding it; the PrintWriter, I am assuming?
Regards.
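The mismatch described above comes from counting characters instead of bytes: String.length() returns the number of UTF-16 code units, while Content-Length must be the number of bytes actually sent. A minimal sketch, assuming the body is sent as UTF-8 (the class and method names are hypothetical):

```java
import java.nio.charset.StandardCharsets;

final class ContentLengthDemo {

    // Content-Length must count encoded bytes, not Java chars.
    // Assumption: the request body is transmitted as UTF-8.
    static int contentLength(String body) {
        return body.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String postData = "\u73ED"; // the character 班
        System.out.println(postData.length());       // prints 1
        System.out.println(contentLength(postData)); // prints 3 (E7 8F AD)
    }
}
```

To keep the header and the body consistent, it is probably safest to encode the body once with getBytes, use the array length for the header, and write that same byte array to the socket's OutputStream rather than passing the string through a PrintWriter that applies its own encoding.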

Request.getParameter java to vb.net

I have a question about retrieving data from a client who is using Java, while I am using VB.NET.
I am expecting a form to be posted to me so that I can read the data.
My issue is that when I do Request.Form("DATA") I get nothing from the client.
Now if I create an HTML form and post it to my URL with the field "DATA", I can read everything fine. I can also loop through my form and see the field and the button if I write them out to the screen or to a text file. The code is below:
response.write(Request.Form("DATA"))
OR
Dim entryName As String
For Each entryName In Request.Form
response.write("Entity Name: " & entryName)
Next
Either method above works fine for me but not for the client. When the client hits my page I see nothing at all: no buttons, no fields, nothing.
I am concerned he is not posting properly to me. I spoke with the developer and he said he would retrieve the data on his end by doing something like "Request.getparameter".
I do not know Java at all, but from what I read it sounds like "Request.getparameter" will grab any field out of a form or query string that has the name specified, i.e. the "DATA" field that I am looking for.
Can anyone explain to me what request.getparameter means in Java and what the equivalent code would be in VB.NET?
Again, I do not know Java at all and have searched on this for a while, but I can't quite find a definitive answer.
Thanks in advance.
It is correct that in Java, request.getParameter("DATA") will look in both the query string and posted form data, while in .NET, Request.Form("DATA") only looks at posted form data. Therefore, it seems likely that your client is sending the data in the query string, since you are not seeing it.
You have a few options. You could use Request.QueryString("DATA") to check only the query string, or either Request.Item("DATA") / Request("DATA") or Request.Params("DATA") to check both the query string and posted form data, plus cookies and server variables. I think Item and Params may differ a little in what they return, e.g. for multiple values. They are probably the closest equivalent to the Java request.getParameter function.

HTTP Request (POST) field size limit & Request.BinaryRead in JSPs

First off my Java is beyond rusty and I've never done JSPs or servlets, but I'm trying to help someone else solve a problem.
A form rendered by JavaScript is posting back to a JSP.
Some of the fields in this form are over 100KB in size.
However when the form field is being retrieved on the JSP side the value of the field is being truncated to 100KB.
Now I know that there is a similar problem in ASP Request.Form which can be gotten around by using Request.BinaryRead.
Is there an equivalent in Java?
Or alternatively is there a setting in Websphere/Apache/IBM HTTP Server that gets around the same problem?
Since the posted request must be kept in-memory by the servlet container to provide the functionality required by the ServletRequest API, most servlet containers have a configurable size limit to prevent DoS attacks, since otherwise a small number of bogus clients could provoke the server to run out of memory.
It's a little bit strange if WebSphere is silently truncating the request instead of failing properly, but if this is the cause of your problem, you may find the configuration options here in the WebSphere documentation.
We have resolved the issue.
Nothing to do with web server settings as it turned out and nothing was being truncated in the post.
Prior to posting, the form field was being split by JavaScript into chunks of 102399 bytes, and each chunk was added to the form field as a value, so it ended up with an array of values.
Request.Form() appears to automatically concatenate these values to reproduce the single giant string but Java getParameter() does not.
Using getParameterValues() and rebuilding the string from the returned values however did the trick.
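The rebuilding step can be sketched roughly as follows. This is an illustration rather than the code that was actually used: the class name, the method name and the field name "bigField" are hypothetical, and the servlet call appears only in a comment so that the snippet compiles without the servlet API on the classpath. It assumes the container returns the values in the order they were submitted.

```java
final class ChunkedParameter {

    // Concatenates all values posted under one parameter name.
    // In a servlet or JSP the array would come from something like:
    //   String[] values = request.getParameterValues("bigField");
    // getParameterValues returns null when the parameter is absent,
    // so that case is passed through as null here.
    static String rebuild(String[] values) {
        if (values == null) {
            return null;
        }
        return String.join("", values);
    }
}
```

For comparison, getParameter on the same field would return only the first value of the array, which explains why the field looked truncated.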
You can use getInputStream (raw bytes) or getReader (decoded character data) to read data from the request. Note how this interacts with reading the parameters. If you don't want to use a servlet, have a look at using a Filter to wrap the request.
I would expect WebSphere to reject the request rather than arbitrarily truncate data. I suspect a bug elsewhere.

Soap body is utf-8 encoded twice

We use a web service which expects UTF-8. The framework we use on the client is Apache Axis2. We call the web service and the SOAP body contains strings in UTF-8. The problem is that the body seems to be "double encoded". For example, we have the character 'å'. The UTF-8 representation of 'å' is C3 A5, but we see in our logs that the (double) encoded value sent is C3 83 C2 A5.
Has anyone experienced similar problems?
It's not entirely clear how you're calling the web service. Does the method in the web service just take a string? If so, what does your string look like in Java? All strings in Java are UTF-16 encoded - if you're converting the UTF-8 binary representation into a string by taking each byte and turning it into a character, then that's the problem.
If you could show what the method you're calling looks like, and how you're calling it, that would help a lot.
For what it's worth, I've used Axis with non-ASCII strings with no problem in the past. I strongly suspect this is a problem with how you're using it rather than with Axis itself, although I'm willing to be proved wrong :)
EDIT: Based on your comment, it sounds like you've got problems receiving the HTML form data, before you hit the web service. If the user has typed "å" into the form, then that's what you should see when you debug in Eclipse. If you're putting bad data into your web service, it's no wonder you're getting bad data out at the other end. I suggest you run Wireshark to see exactly what the browser is sending you, both in terms of the raw bytes and also what content encoding it's specifying. My guess is that your web server is treating it as ISO-8859-1 but it's actually UTF-8.
Once you've got the string correctly from the form, I suspect you'll find there are no problems at all in passing it on to the web service.
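The byte pattern from the question can be reproduced with a short sketch: UTF-8 bytes that are decoded as ISO-8859-1 and then encoded to UTF-8 again yield exactly C3 83 C2 A5 for 'å'. The class and method names below are hypothetical, and this only demonstrates the mechanism; it does not show where in the Axis2 client or the form handling the wrong decoding actually happens.

```java
import java.nio.charset.StandardCharsets;

final class DoubleEncodingDemo {

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Simulates the bug: UTF-8 bytes are misread as ISO-8859-1,
    // then the resulting (wrong) string is encoded as UTF-8 again.
    static byte[] doubleEncode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u00E5"; // the character å
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8))); // prints C3 A5
        System.out.println(hex(doubleEncode(s)));                    // prints C3 83 C2 A5
    }
}
```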
