What is the best way to make a JNI call as fast as possible?
I need to call a .dll to communicate with an external measurement box.
At the moment I read the values from the box via JNI with a statically loaded library.
public class myJni{
static {
System.loadLibrary("myJniDll");
}
public native double Get4(String para);
}
Very simple, as you can see.
On the C side I use:
HINSTANCE hInstLibrary = LoadLibrary("my_64.dll");
typedef void(*FunctionFunc)();
JNIEXPORT jdouble JNICALL my_Get4
(JNIEnv * penv, jclass clazz, jstring Para)
{
typedef double(__stdcall *Get4)(char FAR *lpszPara);
Get4 _Get4;
FunctionFunc _FunctionFunc2;
_Get4 = (Get4)GetProcAddress(hInstLibrary, "my_Get4");
_FunctionFunc2 = (FunctionFunc)GetProcAddress(hInstLibrary, "Function");
const char *nativeString = penv->GetStringUTFChars(Para, 0);
const char* parameter = nativeString;
double ret = _Get4((char*)parameter);
penv->ReleaseStringUTFChars(Para, nativeString);
return ret;
}
The code needs about 20 ms to get the value from the unit on the COM port. The lag when the value changes doesn't "feel" good: when I change the value, it is noticeable that it takes time to pass through the JNI.
Does anyone have some tweaks to get it down to about 10 ms?
Edit: Gil's suggestion to skip the pointer lookup had a huge impact. It's now less "laggy". Still not as fast as I want, but OK.
The unit on the COM port is a measurement device that reports values to six decimal places (0.000000). The lag shows up as the last four digits not changing smoothly but skipping much of the scale when the value changes.
You can skip loading the function pointer for each call:
static Get4 _Get4 = NULL;
static FunctionFunc _FunctionFunc2 = NULL;
if(!_Get4)
_Get4 = (Get4)GetProcAddress(hInstLibrary, "my_Get4");
if(!_FunctionFunc2)
_FunctionFunc2 = (FunctionFunc)GetProcAddress(hInstLibrary, "Function");
This will save a lot of time.
Other Answers offer some useful optimizations (viewed in isolation), but I'm pessimistic that they will give you the amount of speed-up that you desire.
If this method really takes 20 milliseconds per call, amortized over a number of calls, then I can confidently predict that the vast majority of that time is spent in either the call to Get4, or in the call to GetStringUTFChars. Neither of those can be optimized, so the chances of getting a 50% speedup are (IMO) non-existent.
You don't state which of these functions does anything resembling 'get the value of the COM port unit', but you don't need to look up the native function addresses every time you call this method. They won't change. Store them in static variables the first time. As a matter of fact, you don't need to dynamically load 'my_64.dll' at all. Statically link to it.
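To check whether a change actually moves the per-call time, it helps to measure the average over many calls rather than a single one. The following is only a minimal sketch: `CallTimer` and `averageMillis` are hypothetical names, and the `DoubleSupplier` is a stand-in for the real native call (for example `() -> jni.Get4("...")`).

```java
import java.util.function.DoubleSupplier;

class CallTimer {
    // Returns the average wall-clock time per call in milliseconds.
    static double averageMillis(DoubleSupplier call, int warmup, int iterations) {
        // Warm-up calls are not timed; they give the JIT a chance to compile the path.
        for (int i = 0; i < warmup; i++) {
            call.getAsDouble();
        }
        double sink = 0.0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += call.getAsDouble();
        }
        long elapsedNanos = System.nanoTime() - start;
        // Use the accumulated result so the calls cannot be optimized away.
        if (Double.isNaN(sink)) {
            System.out.println("call returned NaN");
        }
        return elapsedNanos / 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption); replace with the real JNI call.
        double ms = averageMillis(() -> Math.sqrt(2.0), 1_000, 10_000);
        System.out.println("average ms per call: " + ms);
    }
}
```

If the average stays near 20 ms after the function pointers are cached, the time is most likely spent inside the vendor DLL or the device itself, not in the JNI layer.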
I am trying to free the memory of t_data, which is assigned from the dummy variable (the code is below). As soon as I free t_data, the program throws a heap corruption error; if I instead copy everything from body into newly allocated memory for t_data, everything works fine. The delete is called somewhere down the line in another class method (not shown here); it just uses the t_data pointer to delete the memory.
jshortArray val = (jshortArray)(m_pJVMInstance->m_pEnv->CallStaticObjectMethod(m_imageJ_cls, method_id, arr, (jint)t, (jint)c));
jsize len = m_pJVMInstance->m_pEnv->GetArrayLength(val);
jshort* body = m_pJVMInstance->m_pEnv->GetShortArrayElements(val, 0);
unsigned short int* dummy = reinterpret_cast<unsigned short int*>(body);
//t_data = dummy; //NOTE: Once you free t_data later exception is thrown.
t_data = new unsigned short int[len];
for (int i = 0; i < len; i++) {
unsigned short int test = *(body + i);
*((unsigned short int*)t_data + i) = test;
}
I am trying to figure out a way where I don't have to run the for loop to copy the body data to t_data and can still free the memory. (The for loop takes too much time for big images.)
What Michael said was correct and that indeed solved the problem. Referring to his comment:
Yes, definitely don't call free or delete on the pointer returned by GetShortArrayElements, because you don't know what GetShortArrayElements did internally. It might not have allocated any memory at all. Some implementations just pin the Java array to avoid having it moved by the GC, and then returns a pointer to the actual Java array contents. Just call ReleaseShortArrayElements when you're done with the pointer. – Michael
I came across a problem when I read the code of sun.misc.Unsafe.java.
Is CAS a loop, like a spin?
At first I thought CAS was just an atomic operation at a low level. However, when I tried to find the source code of the function compareAndSwapInt, I found C++ code like this:
jbyte Atomic::cmpxchg(jbyte exchange_value, volatile jbyte* dest, jbyte compare_value) {
assert(sizeof(jbyte) == 1, "assumption.");
uintptr_t dest_addr = (uintptr_t)dest;
uintptr_t offset = dest_addr % sizeof(jint);
volatile jint* dest_int = (volatile jint*)(dest_addr - offset);
jint cur = *dest_int;
jbyte* cur_as_bytes = (jbyte*)(&cur);
jint new_val = cur;
jbyte* new_val_as_bytes = (jbyte*)(&new_val);
new_val_as_bytes[offset] = exchange_value;
while (cur_as_bytes[offset] == compare_value) {
jint res = cmpxchg(new_val, dest_int, cur);
if (res == cur) break;
cur = res;
new_val = cur;
new_val_as_bytes[offset] = exchange_value;
}
return cur_as_bytes[offset];
}
I saw "while" and "break" in this atomic function.
Is it a kind of spin?
Related code links:
http://hg.openjdk.java.net/jdk8u/jdk8u20/hotspot/file/190899198332/src/share/vm/prims/unsafe.cpp
http://hg.openjdk.java.net/jdk8u/jdk8u/jdk/file/07011844584f/src/share/classes/sun/misc/Unsafe.java
http://hg.openjdk.java.net/jdk8u/jdk8u20/hotspot/file/55fb97c4c58d/src/share/vm/runtime/atomic.cpp
CAS is a single operation that reports whether it succeeded or not (for example by returning 1 or 0). Since you are doing a compareAndSwapInt, you want the operation to succeed, so it is repeated until it works.
I think you are also confusing this with a spin lock, which basically means: do something while this value is "1" (for example); all other threads wait until this value is zero (via compareAndSwap), which in effect means that some thread is done with the work and has released the lock (this is referred to as release/acquire semantics).
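To make the distinction concrete, here is a minimal sketch of a spin lock built on top of CAS. The class names `SpinLockDemo` and `SpinLock` are made up for illustration; the only library calls are `AtomicBoolean.compareAndSet` and `set`. The CAS itself is one atomic attempt; the spinning is the loop the lock puts around it.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SpinLockDemo {
    static final class SpinLock {
        private final AtomicBoolean locked = new AtomicBoolean(false);

        void lock() {
            // Each compareAndSet is a single atomic attempt; the loop is the "spin".
            while (!locked.compareAndSet(false, true)) {
                Thread.yield();
            }
        }

        void unlock() {
            locked.set(false);
        }
    }

    // Increments a plain int from several threads, guarded by the spin lock.
    static int run(int threads, int perThread) {
        SpinLock lock = new SpinLock();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000));
    }
}
```

This is for illustration only; in real code `java.util.concurrent.locks.ReentrantLock` or the atomic classes themselves are usually the better choice.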
The CAS operation is not a spin; it's an atomic operation at the hardware level. On x86 and SPARC processors CAS is a single instruction, and it supports int and long operands.
Indeed the Atomic::cmpxchg int / long overloads are generated on x86 using a single cmpxchgl/cmpxchgq instruction.
What you're looking at is an Atomic::cmpxchg single-byte overload, which works around the CAS instruction's limitation to simulate CAS at byte level. It does so by performing a CAS for an int located at the same address as the byte, then checking just one byte out of it and repeating if CAS fails because of a change in the other 3 bytes. The compare-and-swap is still atomic, it just needs to be re-tried sometimes because it covers more bytes than is necessary.
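The same idea can be sketched in Java on top of `AtomicInteger`. This is not the HotSpot implementation, just an illustration under the assumption that byte 0 is the least significant byte of the int; `ByteCas` and `compareAndExchangeByte` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ByteCas {
    // Tries to replace byte `index` (0 = least significant) of `word` with
    // `newByte` if that byte currently equals `expected`.
    // Returns the byte value that was observed at that position.
    static byte compareAndExchangeByte(AtomicInteger word, int index,
                                       byte expected, byte newByte) {
        int shift = index * 8;
        int mask = 0xFF << shift;
        int cur = word.get();
        // Keep trying only while the byte we care about still matches.
        while ((byte) (cur >>> shift) == expected) {
            int next = (cur & ~mask) | ((newByte & 0xFF) << shift);
            if (word.compareAndSet(cur, next)) {
                break; // the whole int matched, so the swap happened
            }
            // One of the other three bytes changed in the meantime: re-read and retry.
            cur = word.get();
        }
        return (byte) (cur >>> shift);
    }

    public static void main(String[] args) {
        AtomicInteger word = new AtomicInteger(0x11223344);
        compareAndExchangeByte(word, 1, (byte) 0x33, (byte) 0x7F);
        System.out.println(Integer.toHexString(word.get()));
    }
}
```

The loop exits as soon as the target byte no longer matches, so it is a retry of a failed atomic attempt, not a wait for some other thread to release something.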
CAS is typically a hardware instruction just like integer addition or comparison, for example (only slower). The instruction itself may be broken down into several steps of so-called microcode, and might indeed contain a low-level loop or a blocking wait for another processor component. However, these are implementation details of the processor architecture. Remember the saying that any problem in CS can be solved by adding another layer of indirection? This also applies here. An atomic operation in Java may actually involve the following layers:
The Java method signature.
A C(++) JNI method to implement it.
A C(++) "compiler intrinsic" such as GCC's __atomic_compare_exchange
The actual processor instruction.
The microcode that implements this instruction.
Additional layers to be used by said microcode, such as cache coherency protocols and the like.
My recommendation is not to worry about how all of this works unless one of these cases applies:
For some reason, it doesn't work. This is likely due to a platform bug.
It is too slow.
Unit tests can help you identify the former case. Benchmarking can help you identify the latter case. But it should be pointed out that if the CAS provided to you by Java is slow, chances are that you will not be able to write a faster one yourself. Therefore, your best bet in this case would be to change your data structures or data flows such as to further reduce the amount of thread synchronization required.
I am learning Java JNI and trying to understand GetStringUTFChars and ReleaseStringUTFChars. I still can't understand ReleaseStringUTFChars.
From what I understand from some articles, in most cases GetStringUTFChars returns a reference to the original string data and not a copy. So does ReleaseStringUTFChars actually release the jstring, the const char* (if copied), or both?
I would understand this better with an answer to the question below.
In the code below, do I need to call ReleaseStringUTFChars in a for loop, or only once (with any one of the const char* values)?
#define array_size 10
const char* chr[array_size];
jboolean blnIsCopy;
for (int i = 0; i < array_size; i++) {
chr[i] = env->GetStringUTFChars(myjstring, &blnIsCopy);
printf((bool)blnIsCopy ? "true\n" : "false\n"); //displays always true
printf("Address = %p\n\n",chr[i]); //displays different address
}
//ReleaseStringUTFChars with in a for loop or single statement is enough
for (int i = 0; i < array_size; i++) {
env->ReleaseStringUTFChars(myjstring, chr[i]);
}
Thanks in advance.
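Get/ReleaseStringUTFChars must always be called in pairs, regardless of whether a copy is returned. In your example that means calling ReleaseStringUTFChars once for every pointer you obtained, i.e. in the loop.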
In practice, you pretty much always get a copy (at least with the JVM implementations I checked: OpenJDK and Dalvik) so that the GC is free to move the original array. It obviously can't collect it because you've got a reference to the string but it'll still move objects around.
There is also a GetStringCritical/ReleaseStringCritical call pair, which will always attempt to return a pointer to the original array (though in theory it may still return a copy). This makes it faster but it comes at a cost: the GC must not move the array until you release it. Again, in practice this is usually implemented by establishing a mutex with the GC, and incrementing a lock count for Get and decrementing it for Release. This means these must be called in pairs too, otherwise the lock count will never get back to zero and GC will probably never run. Please note: Get/ReleaseStringCritical also comes with other limitations which are less relevant to this question but are no less important.
I am looking for a way to pause a thread for an exact number of milliseconds in Java or C (I can use JNI to access the C method).
So far, I was using the following in java code.
LinkedBlockingQueue<String> SLEEPER = new LinkedBlockingQueue<String>();
SLEEPER.poll(msTime, TimeUnit.MILLISECONDS);
This was suggested in one of the threads on this forum and worked great on most of our Windows 7 machines.
But it is not giving me accurate results on a new set of hardware. So I decided to use JNI to access C. But even this does not pause for an accurate number of milliseconds on the new hardware (Dell and HP on Windows 7).
JNIEXPORT void JNICALL Java_JniTimer_jniWait(JNIEnv *env, jobject obj , jint waitTime ) {
HANDLE hWaitEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
if (hWaitEvent)
{
WaitForSingleObject(hWaitEvent,waitTime);
CloseHandle (hWaitEvent);
}
}
Does anyone have a reliable option for an accurate sleep on a thread?
Thanks.
You can't do that on a non-real-time OS.
Not so much because of the resolution of timers, but because the per-thread 'time slice' (sometimes referred to as the quantum) is usually set to a few ms (15 on my antique XP machine here), and is not something you can just play with.
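A common workaround, which does not remove the limitation described above, is to sleep for most of the interval and busy-wait for the rest. The sketch below is an assumption-laden illustration: `PreciseWait`, `waitMillis` and the `spinMarginNanos` parameter are made-up names, and the thread can still be preempted during the spin, so this only reduces jitter at the cost of burning CPU.

```java
import java.util.concurrent.locks.LockSupport;

class PreciseWait {
    // Waits at least `millis` milliseconds. The last `spinMarginNanos`
    // nanoseconds are spent busy-waiting instead of sleeping.
    static void waitMillis(long millis, long spinMarginNanos) {
        long deadline = System.nanoTime() + millis * 1_000_000L;

        // Coarse phase: let the scheduler park the thread, but wake up early.
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > spinMarginNanos) {
            LockSupport.parkNanos(remaining - spinMarginNanos);
        }

        // Fine phase: spin on the high-resolution clock until the deadline.
        while (System.nanoTime() - deadline < 0) {
            // intentionally empty
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        waitMillis(10, 2_000_000L); // 2 ms spin margin is an arbitrary choice
        System.out.println((System.nanoTime() - start) / 1_000_000.0 + " ms");
    }
}
```

The method never returns early, but it can still return late if the OS schedules another thread at the wrong moment, which is exactly the point about non-real-time systems.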
I have a C structure that is sent over some intermediate networks and received over a serial link by Java code. The Java code gives me a byte array that I now want to repackage as the original structure. If the receiving code were in C, this would be simple. Is there any simple way to repackage a byte[] in Java into a C struct? I have minimal experience in Java, but this doesn't appear to be a common problem or one solved in any FAQ that I could find.
FYI the C struct is
struct data {
uint8_t moteID;
uint8_t status; //block or not
uint16_t tc_1;
uint16_t tc_2;
uint16_t panelTemp; //board temp
uint16_t epoch#;
uint16_t count; //pkt seq since the start of epoch
uint16_t TEG_v;
int16_t TEG_c;
}data;
I would recommend that you send the numbers across the wire in network byte order all the time. This eliminates the problems of:
Compiler specific word boundary generation for your structure.
Byte order specific to your hardware (both sending and receiving).
Also, Java's own binary I/O classes (DataInputStream, DataOutputStream, and ByteBuffer by default) always use network byte order, no matter which platform the JVM runs on.
A very good class for extracting values from a stream of bytes is java.nio.ByteBuffer, which can wrap arbitrary byte arrays, not just those coming from an I/O class in java.nio. You really should not hand-code your own extraction of primitive values (i.e. bit shifting and so forth) if you can avoid it: it is easy to get wrong, the code is the same for every instance of the same type, and there are plenty of standard classes that provide this for you.
For example:
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class Data {
    private byte moteId;
    private byte status;
    private short tc_1;
    private short tc_2;
    //...etc...
    private int tc_2_as_int;

    private Data() {
        // empty
    }

    public static Data createFromBytes(byte[] bytes) {
        final Data data = new Data();
        // wrap() uses big-endian (network) byte order by default
        final ByteBuffer buf = ByteBuffer.wrap(bytes);
        // If needed...
        //buf.order(ByteOrder.LITTLE_ENDIAN);
        data.moteId = buf.get();
        data.status = buf.get();
        data.tc_1 = buf.getShort();
        data.tc_2 = buf.getShort();
        // Example to convert unsigned short to a positive int
        data.tc_2_as_int = data.tc_2 & 0xffff;
        // ...extract other fields here
        return data;
    }
}
Now, to create one, just call Data.createFromBytes(byteArray).
Note that Java does not have unsigned integer variables, but these will be retrieved with the exact same bit pattern. So anything where the high-order bit is not set will be exactly the same when used. You will need to deal with the high-order bit if you expected that in your unsigned numbers. Sometimes this means storing the value in the next larger integer type (byte -> short; short -> int; int -> long).
Edit: Updated the example to show how to convert a short (16-bit signed) to an int (32-bit signed) with the unsigned value with tc_2_as_int.
Note also that if you cannot change the byte-order and it is not in network order, then java.nio.ByteBuffer can still serve you here with buf.order(ByteOrder.LITTLE_ENDIAN); before retrieving the values.
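Putting the pieces together for the struct in the question, a fuller sketch might look like the following. It rests on assumptions that are not confirmed by the question: that the sender uses a packed 16-byte layout with no padding, that the byte order is known, and that `epoch` is the name of the field written as `epoch#`. The class name `MotePacket` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class MotePacket {
    // 2 x uint8_t + 7 x 16-bit fields, assuming the sender adds no padding.
    static final int SIZE = 16;

    int moteId, status, tc1, tc2, panelTemp, epoch, count, tegV;
    short tegC; // int16_t is signed, so a Java short fits as-is

    static MotePacket parse(byte[] bytes, ByteOrder order) {
        if (bytes.length < SIZE) {
            throw new IllegalArgumentException(
                    "need " + SIZE + " bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(order);
        MotePacket p = new MotePacket();
        // Unsigned C fields are widened to int so values above the signed
        // maximum stay positive.
        p.moteId = Byte.toUnsignedInt(buf.get());
        p.status = Byte.toUnsignedInt(buf.get());
        p.tc1 = Short.toUnsignedInt(buf.getShort());
        p.tc2 = Short.toUnsignedInt(buf.getShort());
        p.panelTemp = Short.toUnsignedInt(buf.getShort());
        p.epoch = Short.toUnsignedInt(buf.getShort());
        p.count = Short.toUnsignedInt(buf.getShort());
        p.tegV = Short.toUnsignedInt(buf.getShort());
        p.tegC = buf.getShort();
        return p;
    }
}
```

A caller would pass `ByteOrder.BIG_ENDIAN` if the sender converts to network byte order, or `ByteOrder.LITTLE_ENDIAN` if the struct is sent as raw memory from a little-endian device.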
This can be difficult to do even when sending from C to C.
If you have a data struct, cast it to an array of bytes/chars and then blindly send it, you can sometimes end up with big problems decoding it on the other end.
This is because the compiler may insert padding to align the fields of the struct, so the raw bytes may not look exactly the way you would expect from the declaration.
It really depends on the compiler!
There are compiler pragmas you can use to turn this padding off. See C/C++ Preprocessor Reference - pack
The other problem is the 32/64-bit problem if you just use "int" and "long" without specifying the number of bytes... but you have done that :-)
Unfortunately, Java doesn't really have structs... but it represents the same information in classes.
What I recommend is that you make a class that consists of your variables, and write a custom unpacking function that pulls the bytes out of the received packet (after you have checked that it was transferred correctly) and loads them into the class.
e.g. You have a data class like
class Data
{
public int moteID;
public int status; //block or not
public int tc_1;
public int tc_2;
}
Then when you receive a byte array, you can do something like this
Data convertBytesToData(byte[] dataToConvert)
{
    Data d = new Data();
    // Mask with 0xff so that bytes above 0x7f are not sign-extended
    d.moteID = dataToConvert[0] & 0xff;
    d.status = dataToConvert[1] & 0xff;
    d.tc_1 = ((dataToConvert[2] & 0xff) << 8) | (dataToConvert[3] & 0xff); // unpacking 16 bits, high byte first
    d.tc_2 = ((dataToConvert[4] & 0xff) << 8) | (dataToConvert[5] & 0xff); // unpacking 16 bits, high byte first
    return d;
}
I might have the 16-bit unpacking the wrong way around; it depends on the endianness of your C system, but you'll be able to play around and see if it's right or not.
I haven't played with Java for some time, but hopefully there are byte[] to int functions built in these days.
I know there are for C# anyway.
With all this in mind, if you are not doing high data rate transfers, definitely look at JSON and Protocol Buffers!
Assuming you have control over both ends of the link, rather than sending raw data you might be better off going for an encoding that C and Java can both use. Look at either JSON or Protocol Buffers.
What you are trying to do is problematic for a couple of reasons:
Different C implementations will represent uint16_t (and int16_t) values in different ways. In some cases, the most significant byte will be first when the struct is laid out in memory. In other cases, the least significant byte will.
Different C compilers may pack the fields of the struct differently. So it is possible (for example) that padding has been added between fields.
So what this all means is that you have to figure out exactly how the struct is laid out ... and just hope that this doesn't change when / if you change C compilers or C target platform.
Having said that, I could not find a Java library for decoding arbitrary binary data streams that allows you to select "endian-ness". The DataInputStream and DataOutputStream classes may be the answer, but they are explicitly defined to send/expect the high order byte first. If your data comes the other way around you will need to do some Java bit bashing to fix it.
EDIT: actually (as @Kevin Brock points out) java.nio.ByteBuffer allows you to specify the endianness when fetching various data types from a binary buffer.
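If you are stuck with DataInputStream and the data arrives low byte first, the "bit bashing" can be kept small with Short.reverseBytes. This is just a sketch; `LittleEndianRead` and its methods are hypothetical helper names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class LittleEndianRead {
    // DataInputStream.readShort() assumes the high byte comes first, so swap
    // the two bytes and widen to int to get the unsigned little-endian value.
    static int readUnsignedShortLE(DataInputStream in) throws IOException {
        return Short.toUnsignedInt(Short.reverseBytes(in.readShort()));
    }

    // Convenience wrapper for data that is already in a byte array.
    static int firstUnsignedShortLE(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return readUnsignedShortLE(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Bytes 0x34 0x12 in little-endian order encode 0x1234.
        System.out.println(Integer.toHexString(firstUnsignedShortLE(new byte[] {0x34, 0x12})));
    }
}
```

For anything beyond a field or two, ByteBuffer with an explicit order is less error-prone than swapping bytes field by field.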