Some Chinese characters cannot be interpreted when REST call is triggered

Hopefully some of you have already run into this problem and found a fix for it.
I have a problem with some Chinese characters, such as '𨍭' and '𠀪': when the REST call is triggered, they are transformed into 'ࠀª', and our backend system cannot interpret this.
I've created a simple Spring Boot program to check; see below. The character encoding is set to UTF-8.
@RequestMapping(value = "/chinese", method = RequestMethod.POST, produces = "application/json", consumes = "application/json;charset=UTF-8")
public String interpret(@RequestBody SampleRESTBean bean) {
    String value = bean.getChineseName();
    return value;
}
The bean object is below:
public class SampleRESTBean {
    private String chineseName;

    public String getChineseName() {
        return chineseName;
    }

    public void setChineseName(String chineseName) {
        this.chineseName = chineseName;
    }
}
Now, to test in Postman, my JSON object is set as below:
{"chineseName":"𨍭獲𠀪保意外,另危疾加1/1000" }
When the object is inside the method, it looks like this
਍獲ࠀª保意外,另危疾加1/1000
The other Chinese characters are interpreted correctly, but some specific characters are not.
Is there a solution for this kind of problem? I've searched many references but still cannot fix this issue. If you also have a solution for JAX-RS, please share it, because it produces the same issue.
I'd appreciate your assistance. Thanks!

A little bit of research shows that:
The Unicode code point for 𨍭 is U+2836D.
The Unicode code point for 𠀪 is U+2002A.
Both are code points in plane 2.
I suspect that the problem is in the way that your client-side or server-side software stack is handling characters that are not in Unicode code-plane 0 (aka the BMP). These are represented as 2 char values when the data is converted to an array of char (or a String, StringBuffer, etcetera.) Some (old) libraries and a lot of custom code don't handle this properly.
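To make the surrogate-pair point concrete, here is a minimal, self-contained sketch (the class and method names are made up for illustration and are not from the question). It builds 𨍭 from its code point and shows how it appears as UTF-16 char values and as UTF-8 bytes. Code that walks a String one char at a time, or that narrows each char to a byte, would split or mangle such a character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the question's code.
public class SupplementaryDemo {

    // Number of UTF-16 code units (Java char values) in the string.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Space-separated hex dump of the UTF-8 encoding of the string.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x2836D)); // U+2836D
        System.out.println(utf16Units(s)); // 2 (one surrogate pair)
        System.out.println(codePoints(s)); // 1
        System.out.printf("%04X %04X%n", (int) s.charAt(0), (int) s.charAt(1)); // D860 DF6D
        System.out.println(utf8Hex(s));    // F0 A8 8D AD
    }
}
```

When debugging, comparing the char values or bytes seen at each hop against these reference values is one way to spot where the data first goes wrong.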
I suggest that you do the following:
1. Brush up on your understanding of Unicode, UTF-16, UTF-8 and "surrogate pairs".
2. Brush up on your skills in using a Java debugger.
3. Use the debugger to find out what is happening on the client and server side [1].
4. Once you have found what is causing the problem:
   - If it is your code, fix it.
   - If it is someone else's code, submit a bug report and/or look for a newer version that has already fixed the problem.
[1] Set breakpoints at key points and look at the char values in the data structures that are passing the data around. Look for where the values seem to go "bad".

Related

JNA call with String behaves differently from one with byte[]

I have a JNA Java interface for a C function mpv_set_option_string defined as:
public interface MPV extends StdCallLibrary {
    MPV INSTANCE = Native.loadLibrary("lib/mpv-1.dll", MPV.class, W32APIOptions.DEFAULT_OPTIONS);

    long mpv_create();
    int mpv_initialize(long handle);
    int mpv_set_option_string(long handle, String name, String data);
}
When I call this like this:
System.setProperty("jna.encoding", "UTF8");
long handle = MPV.INSTANCE.mpv_create();
int error = MPV.INSTANCE.mpv_initialize(handle);
error = MPV.INSTANCE.mpv_set_option_string(handle, "keep-open", "always");
I get an error back (-5) from the last call, indicating the option (keep-open) is not found.
However, if I change the JNA function signature to:
int mpv_set_option_string(long handle, byte[] name, byte[] data);
...and then call it like this:
error = MPV.INSTANCE.mpv_set_option_string(
    handle,
    "keep-open\0".getBytes(StandardCharsets.UTF_8),
    "always\0".getBytes(StandardCharsets.UTF_8)
);
...it returns no error (0) and works correctly (or so it seems).
What I don't get is, JNA is supposed to encode String by default as char * with UTF-8 encoding and NUL terminated (exactly what I do manually), yet I get different results.
Anyone able to shed some light on this?

You shouldn't be passing W32APIOptions to a library that isn't a Win32 API.
By default, JNA maps String to char*, so removing the options should fix the issue for you.
You should also be using an explicit native type for your handle instead of Java long. Pointer is probably correct in this case.

Looks like I found the issue, although I'm not 100% sure what is happening.
It seems that using W32APIOptions.DEFAULT_OPTIONS means JNA will use the UNICODE settings (because the w32.ascii property is false). This looked okay to me, as mpv-1.dll works with UTF-8 strings only, and UTF-8 is Unicode.
However, I'm now guessing that in this case it means JNA will call a wide-char version of the library function (and, if that doesn't exist, still call the original function), and that it probably encodes Strings with two bytes per character. This is because most Win32 libraries have an ASCII and a WIDE version of functions accepting strings, but nothing for UTF-8.
Since mpv-1.dll only accepts UTF-8 (and isn't really Win32), strings should simply be encoded as bytes in UTF-8 format (basically, just left alone). To let JNA know this, either do not pass a W32APIOptions map at all, or select ASCII_OPTIONS manually.
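The guess above can be illustrated without JNA or mpv. The sketch below is only a byte-layout demo under a stated assumption: that the UNICODE options cause strings to be passed as wide characters, which on Windows means UTF-16 with the low byte first. The class and helper names are hypothetical. It reads each buffer the way a C function taking char* would, stopping at the first zero byte.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo; it does not call JNA or mpv, it only inspects byte layouts.
public class WideVsUtf8 {

    // Reads bytes the way a C function taking char* would: stop at the first 0 byte.
    static String readAsCString(byte[] bytes) {
        int end = 0;
        while (end < bytes.length && bytes[end] != 0) {
            end++;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed wide-char layout: UTF-16, little-endian, two bytes per character.
        byte[] wide = "keep-open\0".getBytes(StandardCharsets.UTF_16LE);
        // Layout that the byte[] variant in the question passes explicitly.
        byte[] utf8 = "keep-open\0".getBytes(StandardCharsets.UTF_8);

        System.out.println(readAsCString(wide)); // k
        System.out.println(readAsCString(utf8)); // keep-open
    }
}
```

If that assumption holds, the library would see an option named "k" rather than "keep-open", which would be consistent with the "option not found" error.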

thrift character encoding, perl to java

I have a complex situation that I'm trying to deal with involving character encoding.
I have a Perl program which communicates with a Java endpoint via Thrift; the Java code then uses the data to make a request to a legacy PHP service. It's ugly, but it is part of a migration plan, so it needs to work for a short while.
In Perl, a Thrift object is created where some of the fields are JSON-encoded strings.
The problem is that when Perl makes the request to Java, one of the strings is as follows (this is from Data::Dumper; it is subsequently JSON-encoded and added to the Thrift object):
'offer_message' => "<<>>
&&
\x{c3}\x{82}\x{c2}\x{a9}©
<script>alert(\"XSS\");</script>
https://url.com/imghp?hl=uk",
However, when this data is received on the java side the sequence \x{c3}\x{82}\x{c2}\x{a9} has been converted so in java we receive the following:
<<>>\\n&&\\nÃ�Â�Ã�Â©©\\n<script>alert(\"XSS\");</script>\\nhttps://www.google.com.ua/imghp?hl=uk
The problem is that if I pass the second string to the legacy PHP program, it fails; if I pass the string taken from the dump of the Perl hash, it succeeds. So my assumption is that I need to convert the received string to another encoding (correct me if I'm wrong; I'm not sure that this is the right solution).
I've tried taking the parameters received in java and converting them to every encoding I can think of, however it doesn't work. So for example:
byte[] utf8 = templateParams.getBytes("UTF8");
normallisedTemplateParams = new String(utf8, "UTF8");
I've been varying the encoding schemes in the hope I find something that works.
What is the correct way to solve this? For a short time this messy solution is my only option while other re-engineering is happening.

The problem was, in the end, difficult to diagnose but simple to resolve. It turned out that the package I was using to convert in Java was using Java's default encoding of UTF-16. I had to modify the package and force it to use UTF-8. After that, everything worked.
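As a side note on the attempt in the question: encoding a String to bytes and decoding those bytes with the same charset returns the original String, so it cannot repair anything. The sketch below shows that, plus how one layer of "UTF-8 bytes misread as single-byte characters" could be peeled off. It assumes the misreading used ISO-8859-1, which the question does not state, and the class and method names are hypothetical. Whether the legacy service actually wants the peeled or the unpeeled form is not something the question settles.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
public class MojibakeDemo {

    // Same charset in both directions: always returns an equal string.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    // Assumption: each char is really one raw byte of a UTF-8 sequence.
    // Map chars back to bytes with ISO-8859-1, then decode those bytes as UTF-8.
    static String undoOneLayer(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The four characters \x{c3}\x{82}\x{c2}\x{a9} from the Perl dump.
        String dumped = "\u00C3\u0082\u00C2\u00A9";

        System.out.println(roundTrip(dumped).equals(dumped)); // true: nothing changed

        String once = undoOneLayer(dumped);  // U+00C2 U+00A9
        String twice = undoOneLayer(once);   // U+00A9, the copyright sign
        System.out.println(once.length());   // 2
        System.out.println(twice.equals("\u00A9")); // true
    }
}
```

Note that getBytes with ISO-8859-1 replaces any character above U+00FF with '?', so this only works while every char in the input is in the range U+0000 to U+00FF.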

Find out programmatically which code page is being used for stderr

I've made some progress on my previous question, and discovered that the message being displayed is redirected from stderr and is encoded with Cp850.
So, how can I discover this from the program in Java? Reading the Wikipedia page on code page 850, and looking at the characters that can be encoded, it's obvious that this won't be used on all Windows systems because it doesn't include the characters needed for, say, Japanese. (And I assume that it won't be used on non-windows systems.)
If I print out the results of Charset.defaultCharset(), I get windows-1252. But using Cp1252 to decode the output from stderr doesn't give the correct encoding -- what should be è prints as Š.
This has been asked before in a slightly different way, but the answers given are just wrong or don't apply to what I'm trying to do. Is this possible to do in Java? If not, what are people doing to get a stderr message to the user in a format they can read?

Here's what worked for me, for the benefit of future Stack Overflowers.
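import com.sun.jna.Library;
import com.sun.jna.Native;

// Wrapper class added so the snippet compiles; the class name is arbitrary.
public class CodePage {
    // Minimal JNA mapping of the one kernel32 function that is needed.
    public interface Kernel32 extends Library {
        int GetOEMCP();
    }

    // Returns the OEM code page identifier, for example 850.
    public static int getCodePage() {
        Kernel32 lib = (Kernel32) Native.loadLibrary("kernel32", Kernel32.class);
        return lib.GetOEMCP();
    }
}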
The code page returned will be an int corresponding to the list at Code Page Identifiers. These can be correlated to the list of supported encodings for Java. (I made a static hashmap. I'm sure there are other, nicer ways to do it.)
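As one possible alternative to a hand-written hashmap, the sketch below tries a few charset names derived from the code page number and falls back to the platform default. This is an assumption-laden shortcut rather than a complete mapping: the name patterns ("Cp", "windows-", "IBM") cover common code pages only, the class and method names are made up, and which charsets exist depends on the JDK in use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not the answerer's actual hashmap.
public class CodePageCharsets {

    // Best-effort lookup of a Java Charset for a Windows code page identifier.
    static Charset forCodePage(int codePage) {
        if (codePage == 65001) {
            return StandardCharsets.UTF_8; // Windows identifier for UTF-8
        }
        String[] candidates = {"Cp" + codePage, "windows-" + codePage, "IBM" + codePage};
        for (String name : candidates) {
            if (Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        }
        return Charset.defaultCharset(); // nothing matched; fall back
    }

    // Decodes raw bytes (for example, captured stderr) using the code page.
    static String decode(byte[] raw, int codePage) {
        return new String(raw, forCodePage(codePage));
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0x8A};
        // Byte 0x8A is the case from the question: è in code page 850, Š in windows-1252.
        System.out.println(decode(raw, 850));  // è, provided this JDK supports Cp850
        System.out.println(decode(raw, 1252)); // Š
    }
}
```

The value returned by GetOEMCP() would be passed as the codePage argument.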

Preventing "Null Byte Attacks" | Java

My initial understanding of this topic is that I need to filter out certain junk characters in requests to avoid these attacks.
I have decided to solve this by pattern matching every request parameter before using it. Most of the posts available on the internet talk about the null byte, and the examples given show how file I/O is the main victim of this attack. So here are my questions:
Is file I/O the only thing that a null byte can affect, or are other operations also victims of this attack?
Which chars/strings/patterns do I need to take care of if I want to filter my request parameters to be safe from null byte attacks? I have a list, and I am sure it is not complete: %00, \0, 0x00 in hex.
The articles that I am referring to are:
http://projects.webappsec.org/w/page/13246949/Null%20Byte%20Injection
http://www.perlmonks.org/index.pl?node_id=38548
http://hakipedia.com/index.php/Poison_Null_Byte
Thanks in advance
To make it clearer:
The first article points out the vulnerability in Java that I am talking about. The string serverlogs.txt%00.db is allowed in Java, but in C/C++ it becomes serverlogs.txt, because %00 is replaced by a null byte, which terminates the string after serverlogs.txt. So we should avoid such characters. What I am trying to figure out is which characters I should not allow.
String fn = request.getParameter("fn");
if (fn.endsWith(".db"))
{
    File f = new File(fn);
    // read the contents of "f" file
    …
}

Have you tried it? I wrote this quick unit test:
@Test
public void test() throws Exception {
    // Create a real file that the truncated path would point at.
    FileOutputStream out = new FileOutputStream("test.txt");
    out.write("hello!".getBytes("utf-8"));
    out.close();
    // Path with an embedded null character.
    String badPath = "test.txt\0foo";
    File file = new File(badPath);
    FileInputStream in = new FileInputStream(file);
    System.out.println(StreamUtils.copyToString(in, Charset.forName("utf-8")));
}
Now, if the null character broke the string, I would expect to have the contents of my file printed to the console. Instead, I get a FileNotFoundException. For the record, this was using Java 1.7.0_40 on Ubuntu 13.04.
Update
Further investigation reveals this code in File#isInvalid:
final boolean isInvalid() {
    if (status == null) {
        status = (this.path.indexOf('\u0000') < 0) ? PathStatus.CHECKED
                                                   : PathStatus.INVALID;
    }
    return status == PathStatus.INVALID;
}
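The same behaviour can be checked with the standard library alone. The sketch below is a minimal check, assuming a JDK that contains the validity test quoted above (older releases may behave differently); the class and method names are made up for illustration. It asks java.io.File whether a path with an embedded null character exists, and whether java.nio.file rejects the same path.

```java
import java.io.File;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

// Hypothetical check class; names are illustrative only.
public class NulPathCheck {

    // java.io.File: on JDKs with the validity check, an invalid path never "exists".
    static boolean fileExists(String path) {
        return new File(path).exists();
    }

    // java.nio.file: reports whether the path string is refused outright.
    static boolean nioRejects(String path) {
        try {
            Paths.get(path);
            return false;
        } catch (InvalidPathException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String badPath = "test.txt\0foo";
        System.out.println("File.exists: " + fileExists(badPath));
        System.out.println("NIO rejects: " + nioRejects(badPath));
    }
}
```

On such a JDK, the first call reports false even when test.txt is present, because the path is not truncated at the null character.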

Not a bad question. I'm doubtful that this is a valid vulnerability on all platforms (for example, I believe Windows uses Pascal-style strings, not null-terminated strings, in its kernel), but I would not at all be surprised if some platforms and JVMs were in fact vulnerable to this kind of attack.
The key point to consider is where your strings are coming from, and what you're doing to those bytes before you interact with them as strings. Any bytes coming from a remote machine should always be assumed to be malicious until proven otherwise. And you should never take strings that come from over the Internet and try to turn them into paths on your local machine. Yes, web servers like Apache do this, but that's also the most vulnerable code they have. The correct solution is: don't try to blacklist bad data (like null bytes); only whitelist good data.
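A minimal sketch of the allow-list idea, applied to the fn parameter from the question. The pattern here (letters, digits, underscore and hyphen, followed by .db, with a length cap) is an assumption chosen for illustration; the real set of acceptable names depends on the application. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical validator; the allowed character set is an assumption.
public class FileNameAllowList {

    // Accept only a plain base name of 1 to 64 safe characters plus ".db".
    private static final Pattern SAFE_DB_NAME = Pattern.compile("[A-Za-z0-9_-]{1,64}\\.db");

    // True only if the whole name matches the allow-list pattern.
    static boolean isSafeDbName(String name) {
        return name != null && SAFE_DB_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSafeDbName("serverlogs.db"));        // true
        System.out.println(isSafeDbName("serverlogs.txt\0.db"));  // false: '.' and NUL in the base name
        System.out.println(isSafeDbName("../serverlogs.db"));     // false: path separators not allowed
    }
}
```

Because everything outside the pattern is refused, there is no need to enumerate %00, \0 and other encodings of the null byte one by one.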

You might also tackle the null byte issue from the other angle!
In May 2013 Oracle fixed the problem: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=8014846
So, upgrade to Java 8 or Java 7u40 and you are protected.
(Yes, I tested it, and it works!)
If a link to my personal blog is not considered spam, I'll drop it here:
http://crocode.blogspot.ru/2015/03/java-null-byte-injections.html

If I'm reading your question correctly, you want to prevent executable code from being injected into memory after the terminating null byte of a string.
Java ain't C.
Java doesn't use terminating null bytes for its strings, so you don't need to protect against this.

Which kind of representation can be '\r\x00\x00\x00' (if usually I have hexadecimal code:'\x0\x00\x00\x03')

I'm using a program (KLEE) that gives me tests for C code.
I need to use the results in my program.
The output is not human-readable, but some of the solutions are hexadecimal data in the following format:
'\x0e\x00\x00\x00'
I have already asked how to convert this into an integer, and I found the solution.
I will have to put this kind of result into structs too. I will know the size, but nothing about the fields or anything else about the struct.
I think I can solve that, but now the problem is that sometimes you get things like:
'\n\x00\x00\x00' = 10
or
'\r\x00\x00\x00' = 13
And I couldn't find which kind of representation is used, so that I can convert it into readable information.
Apparently I could solve this in python with:
import struct
selection = struct.unpack('
I don't know Python at all, and I would like to find a solution in Java or C.
Thanks very much.

The sequence \r\n is used by Windows systems to indicate a newline: the \r moves the write pointer to the start of the line, and the \n moves to the new line. I'm thinking that you might have had some character data containing a newline where each character was converted into a 32-bit integer value in little-endian format (least significant byte first).
Hope this helps!
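Since the question asks for Java, here is a minimal sketch of decoding such a 4-byte value. It assumes the bytes are a 32-bit integer stored least significant byte first, as in the '\x0e\x00\x00\x00' example; the class and method names are made up. In the printed form, \n and \r are simply the bytes 0x0A and 0x0D shown as escape sequences.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper; assumes 4-byte little-endian integers.
public class LittleEndianInt {

    // Interprets the first four bytes as a little-endian 32-bit integer.
    static int decode(byte[] raw) {
        return ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {0x0e, 0, 0, 0})); // 14
        System.out.println(decode(new byte[] {'\n', 0, 0, 0})); // 10
        System.out.println(decode(new byte[] {'\r', 0, 0, 0})); // 13
    }
}
```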
