JNA call with String behaves differently from one with byte[] - java

I have a JNA Java interface for a C function mpv_set_option_string defined as:
public interface MPV extends StdCallLibrary {
    MPV INSTANCE = Native.loadLibrary("lib/mpv-1.dll", MPV.class, W32APIOptions.DEFAULT_OPTIONS);

    long mpv_create();
    int mpv_initialize(long handle);
    int mpv_set_option_string(long handle, String name, String data);
}
When I call this like this:
System.setProperty("jna.encoding", "UTF8");
long handle = MPV.INSTANCE.mpv_create();
int error = MPV.INSTANCE.mpv_initialize(handle);
error = MPV.INSTANCE.mpv_set_option_string(handle, "keep-open", "always");
I get an error back (-5) from the last call, indicating the option (keep-open) is not found.
However, if I change the JNA function signature to:
int mpv_set_option_string(long handle, byte[] name, byte[] data);
...and then call it like this:
error = MPV.INSTANCE.mpv_set_option_string(
    handle,
    "keep-open\0".getBytes(StandardCharsets.UTF_8),
    "always\0".getBytes(StandardCharsets.UTF_8)
);
...it returns no error (0) and works correctly (or so it seems).
What I don't get is this: JNA is supposed to encode a String by default as a NUL-terminated char * using UTF-8 (exactly what I do manually), yet I get different results.
Anyone able to shed some light on this?

You shouldn't be passing W32APIOptions to a library that isn't a Win32 API.
By default, JNA maps String to char*, so removing the options should fix the issue for you.
You should also be using an explicit native type for your handle instead of Java long. Pointer is probably correct in this case.

Looks like I found the issue, although I'm not 100% sure what is happening.
It seems that using W32APIOptions.DEFAULT_OPTIONS means JNA will use the UNICODE settings (because the w32.ascii property is false). This looked okay to me, as mpv-1.dll works with UTF-8 strings only, which is Unicode.
However, it appears that in this case JNA will try to call a wide-char version of the library function (and, if that doesn't exist, still call the original function), and it encodes Strings as two bytes per character. This is because most Win32 libraries have an ANSI and a wide version of each method that accepts strings, but nothing for UTF-8.
Since mpv-1.dll only accepts UTF-8 (and isn't really Win32), strings should just be encoded as UTF-8 bytes (basically, leave them alone). To let JNA know this, either do not pass a W32APIOptions map at all, or select ASCII_OPTIONS manually.
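Putting the two answers together, a minimal sketch of a corrected mapping (no W32APIOptions map, Pointer instead of a raw long for the handle; the library path and function names are taken from the question) might look like this:
import com.sun.jna.Native;
import com.sun.jna.Pointer;
import com.sun.jna.win32.StdCallLibrary;

public interface MPV extends StdCallLibrary {
    // no options map, so Strings are encoded per jna.encoding (UTF-8 below)
    MPV INSTANCE = Native.loadLibrary("lib/mpv-1.dll", MPV.class);

    Pointer mpv_create();
    int mpv_initialize(Pointer handle);
    int mpv_set_option_string(Pointer handle, String name, String data);
}

// usage, assuming jna.encoding is set before the first call:
// System.setProperty("jna.encoding", "UTF8");
// Pointer handle = MPV.INSTANCE.mpv_create();
// MPV.INSTANCE.mpv_initialize(handle);
// int error = MPV.INSTANCE.mpv_set_option_string(handle, "keep-open", "always");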

Related

Some Chinese characters cannot be interpreted when REST call is triggered

Hopefully some of you have already encountered this problem and have a fix for it.
I've been encountering a problem with some Chinese characters like ('𨍭','𠀪'): when the REST call is triggered, they are transformed into 'ࠀª' and our backend system cannot interpret this.
I've created a simple Spring Boot program to check; see below. The character encoding is set to UTF-8.
@RequestMapping(value="/chinese", method=RequestMethod.POST, produces="application/json", consumes="application/json;charset=UTF-8")
public String interpret(@RequestBody SampleRESTBean bean) {
    String value = bean.getChineseName();
    return value;
}
The bean object is below:
public class SampleRESTBean {
    private String chineseName;

    public String getChineseName() {
        return chineseName;
    }

    public void setChineseName(String chineseName) {
        this.chineseName = chineseName;
    }
}
Now, to test (in Postman), my JSON object is set as below:
{"chineseName":"𨍭獲𠀪保意外,另危疾加1/1000" }
When the object is inside the method, it looks like this
਍­獲ࠀª保意外,另危疾加1/1000
The other Chinese characters are interpreted but some specific characters are not.
Is there a solution for this kind of problem? I've searched many references but still cannot fix this issue. A solution for when JAX-RS is used would also help, because it produces the same issue there.
Need your assistance please. Thanks!
A little bit of research shows that:
The Unicode codepoint for 𨍭 is U+2836D.
The Unicode codepoint for 𠀪 is U+2002A.
Both are codepoints in code plane 2.
I suspect that the problem is in the way that your client-side or server-side software stack is handling characters that are not in Unicode code-plane 0 (aka the BMP). These are represented as 2 char values when the data is converted to an array of char (or a String, StringBuffer, etcetera.) Some (old) libraries and a lot of custom code don't handle this properly.
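To make the 2-char representation concrete, here is a small sketch (using one of the characters from the question, nothing beyond the standard JDK) showing that a plane-2 character occupies two char values, i.e. a surrogate pair:
public class SurrogateDemo {
    public static void main(String[] args) {
        String s = "𨍭"; // U+2836D, outside the BMP

        System.out.println(s.length());                       // 2 -> stored as two chars
        System.out.println(s.codePointCount(0, s.length()));  // 1 -> but a single code point

        // the two chars form a surrogate pair
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true

        System.out.printf("U+%X%n", s.codePointAt(0));         // U+2836D
    }
}
Any code that assumes one char per character (substring arithmetic, manual byte conversions, and so on) will corrupt such values.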
I suggest that you do the following:
Brush up on your understanding of Unicode, UTF-16, UTF-8 and "surrogate pairs".
Brush up on your skills in using a Java debugger.
Use the debugger to find out what is happening on the client and server-side1.
Once you have found what is causing the problem:
If it is your code, fix it.
If it is someone else's code, submit a bug report and/or look for a newer version that has already fixed the problem.
1 - Set breakpoints at key points and look at the char values in the data structures that are passing the data around. Look for where the values seem to go "bad".

Java String.getBytes( charsetName ) vs String.getBytes ( Charset object )

I need to encode a String to a byte array using UTF-8. I am using Google Guava, which has a Charsets class that already defines a Charset instance for UTF-8. I have two ways to do it:
String.getBytes( charsetName )
try {
    byte[] bytes = my_input.getBytes("UTF-8");
} catch (UnsupportedEncodingException ex) {
}
String.getBytes( Charset object )
// Charsets.UTF_8 is an instance of Charset
byte[] bytes = my_input.getBytes(Charsets.UTF_8);
My question is: which one should I use? They return the same result. For way 2 I don't have to put a try/catch! I took a look at the Java source code and saw that way 1 and way 2 are implemented differently.
Does anyone have any ideas?
If you are going to use a string literal (e.g. "UTF-8") ... you shouldn't. Instead use the second version and supply the constant value from StandardCharsets (specifically, StandardCharsets.UTF_8, in this case).
The first version is used when the charset is dynamic. This is going to be the case when you don't know what the charset is at compile time; it's being supplied by an end user, read from a config file or system property, etc.
Internally, both methods are calling a version of StringCoding.encode(). The first version of encode() is simply looking up the Charset by the supplied name first, and throwing an exception if that charset is unknown / not available.
The first API is for situations when you do not know the charset at compile time; the second one is for situations when you do. Since it appears that your code needs UTF-8 specifically, you should prefer the second API:
byte[] bytes = my_input.getBytes ( Charsets.UTF_8 ); // <<== UTF-8 is known at compile time
The first API is for situations when the charset comes from outside your program - for example, from the configuration file, from user input, as part of a client request to the server, and so on. That is why there is a checked exception thrown from it - for situations when the charset specified in the configuration or through some other means is not available.
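As a small illustration of that split (the property name app.charset below is made up for this example), the Charset overload needs no exception handling, while the name-based overload does:
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class GetBytesExample {
    public static void main(String[] args) {
        String input = "héllo";

        // charset known at compile time: no checked exception
        byte[] fixed = input.getBytes(StandardCharsets.UTF_8);
        System.out.println(fixed.length);

        // charset name only known at runtime (e.g. read from configuration)
        String name = System.getProperty("app.charset", "UTF-8"); // hypothetical property
        try {
            byte[] dynamic = input.getBytes(name);
            System.out.println(dynamic.length);
        } catch (UnsupportedEncodingException e) {
            // the configured name is not a valid or available charset
        }
    }
}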
Since they return the same result, you should use method 2: it is generally safer and more efficient to avoid asking the library to parse, and possibly break on, a user-supplied string. Also, avoiding the try/catch makes your own code cleaner.
Charsets.UTF_8 can be checked at compile time, which is most likely the reason you do not need a try/catch.
If you already have the Charset, then use the 2nd version as it's less error prone.

String to Byte[] and Byte to String

Given the following example:
String f="FF00000000000000";
byte[] bytes = DatatypeConverter.parseHexBinary(f);
String f2= new String (bytes);
I want the output to be FF00000000000000 but it's not working with this method.
You're currently trying to interpret the bytes as if they were text encoded using the platform default encoding (UTF-8, ISO-8859-1 or whatever). That's not what you actually want to do at all - you want to convert it back to hex.
For that, just look at the converter you're using for the parsing step, and look for similar methods which work in the opposite direction. In this case, you want printHexBinary:
String f2 = DatatypeConverter.printHexBinary(bytes);
The approach of "look for reverse operations near the original operation" is a useful one in general... but be aware that sometimes you need to look at a parallel type, e.g. DataInputStream / DataOutputStream. When you find yourself using completely different types for inverse operations, that's usually a bit of a warning sign. (It's not always wrong, it's just worth investigating other options.)

jvm crashes when trying to send TCHAR type to java function that accepts string as argument

While trying to send the name of the window in which a key is currently being pressed from JNI C code to a Java method, the JVM crashes. I think it is due to passing an invalid argument. Please explain why the call fails and how I can send the argument.
The prototype of the Java method looks like:
public void storeNameOfForegroundWindow(String windowName) {
    // store in the list
}
JNI C snippet :
jmethodID callBackToStoreWindowName = (*env)->GetMethodID(env,cls,"storeNameOfForegroundWindow","(Ljava/lang/String;)V");
TCHAR title[500];
GetWindowText(GetForegroundWindow(), title, 500);
jvalue windowName,*warr;
windowName.l = title;
warr = &title;
(*Env)->CallVoidMethodA(Env,object,callBackToStoreWindowName,warr);
The JVM crashes as it encounters the above snippet. I know that the JVM crashes due to passing an invalid argument to the Java function (via C code). If that is so, please explain how I should send the argument. I need to send the title of the current window to the Java function.
Since your method takes a String as its argument, you should give it a jstring instance. The JVM cannot understand what a TCHAR is, so you need to convert your chars to a Java string using:
(*env)->NewStringUTF(env, title);
EDIT: if TCHAR is wchar_t, i.e. is 16 bit and can be cast to a jchar, then you need to use NewString instead of NewStringUTF. You can read more here.
When I first saw TCHAR, I thought: "Oh, it is magnificent. You can write one code base that works on both Win9x and WinNT, and it calls the best platform functions, with just one define: _UNICODE." But over time I have seen that this confuses many developers. There is no standalone TCHAR type: it is a typedef of char when _UNICODE is not defined and a typedef of wchar_t otherwise, so it changes depending on the project's definitions. On the other hand, the Java side expects only one of them (either char or wchar_t, but I don't know which). So if you define _UNICODE in your project (new IDEs do this automatically) while the function expects a char*, you are passing a wchar_t*, which is the wrong type, and the length of the string will be counted as one (since wchar_t is 2 bytes, most single-byte chars become the char plus an extra '\0'). And if you pass a char* where a wchar_t* is expected, it may produce an error (for example an access violation), because:
TCHAR title[500]; // compiled as char title[500];
// now fill it; it may contain 'abcde\0 some junk data'
// Java wants to interpret this as wchar_t* and expects '\0\0' as the end of the string;
// since wchar_t is 2 bytes, it may never see that in your string and will read beyond
// the end of title, which is obviously an error!

JNA mapping LPCSTR on windows platform

I am working on calling a DLL API for C/C++ with JNA.
The function in the DLL is short DKT_init(LPCSTR name). I made the corresponding Java method public short DKT_init(String name); but when I call it, the DLL API returns a parameter error. How should I map LPCSTR in JNA? LPCSTR is const char *, whereas String maps to char *.
String is the appropriate mapping for LPCSTR. JNA will convert the String's UTF-16 characters into a NUL-terminated buffer of bytes using the default platform encoding.
You might try passing in an explicit byte array instead (using the suggested alternate method mapping above), which would eliminate the potential of an incorrect encoding issue, e.g.
byte[] arg = { (byte)'f', (byte)'o', (byte)'o', (byte)0 };
You can alter the encoding used by setting the system property "jna.encoding".
You should also eliminate the possibility that "LPCSTR" is actually an incorrect type; if the function expects a buffer it can write to, String will not work, and if it is actually LPCTSTR and you are using UNICODE, then you need to pass a WString instead.
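For the byte-array alternative suggested above, a hedged sketch (the library name "dkt" and the argument value are invented for this example; only DKT_init comes from the question) might look like this:
import com.sun.jna.Library;
import com.sun.jna.Native;

import java.nio.charset.StandardCharsets;

public class DktDemo {
    // "dkt" is a placeholder library name for this sketch
    public interface DKT extends Library {
        DKT INSTANCE = Native.loadLibrary("dkt", DKT.class);

        short DKT_init(byte[] name); // raw bytes: the caller controls encoding and NUL termination
    }

    public static void main(String[] args) {
        // terminate the string yourself and choose the encoding explicitly
        byte[] name = "example\0".getBytes(StandardCharsets.US_ASCII); // hypothetical argument value
        short rc = DKT.INSTANCE.DKT_init(name);
        System.out.println(rc);
    }
}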
Have you tried mapping it to a byte array, like this:
short DKT_init(byte[] nameAsByteArray);
// now you should be able to obtain it like this:
System.out.println(new String(nameAsByteArray).trim());
