NDK vs JAVA performance [closed]


Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
Does anybody have an estimate of how much faster C code running through the NDK would be than Java code doing the same calculations (if at all)?
Let's say I am doing X calculations (all the same) in Y seconds in Java code.
How many times X calculations can I do in the same Y seconds with C code in the NDK? 1.2? 2.7? Any rough guess?
Let's say that the calculation is B = L/A + C/D (the same one for all X calculations).
EDIT:
Why am I asking this?
Because I am considering moving my Java processing of camera frames to C code, to make bigger resolutions possible.

Since no one else wants to touch this topic, because trying to answer it is not considered serious, I will have a go:
Java compiles to bytecode, and the bytecode is compiled to native code by the JIT.
C compiles directly to native code.
The difference is really the extra compile step, and in theory Java should do a better job than your C compiler, and here's why:
Java can insert statistics gathering into the generated native code, and then after a while regenerate it to optimize it for the current runtime paths in your code!
That last point sounds awesome. Java does, however, come with some tradeoffs:
It needs GC runs to clean out memory
It may not JIT code at all
GC copies live objects and discards all dead ones. Since the GC does no work for dead objects, only for live ones, GC is in theory faster than the usual malloc/free cycle for objects.
However, one thing is forgotten by most Java advocates: nothing says that you have to malloc/free every object instance when coding C. You can reuse memory, and you can malloc memory blocks and free blocks containing thousands of temporary objects in one go.
With big heaps in Java, GC time increases, adding stall time. In some software stall times during a GC cycle are totally OK; in others they cause fatal errors. Try keeping your software responding within a defined number of milliseconds when a GC happens, and you will see what I'm talking about.
In some extreme cases, the JIT may also choose not to JIT the code at all. This happens when a JITed method would be too big, 8K if I remember correctly. A non-JITed method has a runtime penalty in the range of 20000% (200 times slower, that is; at least it was at our customer). The JIT is also turned off when the JVM's CodeCache starts to get full (this can happen if you keep loading new classes into the JVM over and over again; it also happened at a customer site).
At one point JIT statistics also reduced concurrency on one 128-core machine to basically single-core performance.
In Java the JIT has a limited amount of time to compile the bytecode to native code. It is not OK for the JIT to spend all CPU resources, since it runs in parallel with the code doing the actual work of your program. In C the compiler can run as long as it needs to produce what it thinks is the most optimized code it can. That has no impact on execution time, whereas in Java it does.
What I'm saying is really this:
Java gives you more, but it's not always up to you how it performs.
C gives you less, but it's up to you how it performs.
So to answer your question:
No, selecting C over Java will not make your program faster.
If you stick to simple math over a preallocated buffer, both the Java and C compilers should emit about the same code.
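As a rough illustration of "simple math over a preallocated buffer", here is a minimal Java sketch of the B = L/A + C/D calculation from the question. The class name, method name, and the use of float arrays are assumptions made for the example; the question does not say how the frame data is stored.

```java
import java.util.Arrays;

public class FrameMath {
    // Computes out[i] = l[i] / a[i] + c[i] / d[i] into a caller-supplied buffer,
    // so the loop itself allocates nothing and creates no garbage.
    static void compute(float[] l, float[] a, float[] c, float[] d, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = l[i] / a[i] + c[i] / d[i];
        }
    }

    public static void main(String[] args) {
        float[] l = {8f, 9f, 10f, 12f};
        float[] a = {2f, 3f, 5f, 4f};
        float[] c = {6f, 8f, 9f, 10f};
        float[] d = {3f, 4f, 3f, 5f};
        float[] out = new float[l.length]; // allocated once, reusable for every frame
        compute(l, a, c, d, out);
        System.out.println(Arrays.toString(out));
    }
}
```

A loop of this shape is the case where a JIT and a C compiler have the least room to differ; whether they actually produce equivalent code on a given device is something only a measurement can tell.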

You will probably not get a clear answer from anyone. The question is far more complex than it looks.
It is no problem to push the same number of polygons through OpenGL with either the NDK or the SDK. After all, it's just the same OpenGL calls. The time to render the polygons (in a batch) exceeds the function call overhead by orders of magnitude, so the overhead is usually completely negligible.
But as soon as an application gets more complex and performs some serious calculations (AI, scene graph management, culling, image processing, number crunching, etc.) the native version will usually be much faster.
And there is another thing, besides the fundamental problem that there is currently no JIT compilation:
The current dalvikvm with its compiler seems to be very basic and does not perform any optimizations, not even the most basic ones!
There is this (very good) video: Google I/O 2009 - Writing Real-Time Games for Android
After I had seen it, it was clear to me that I would definitely use C++ with the NDK.
For example: he talks about the overhead of function calls: "Don't use function calls".
... So yes, we are back before 1970, talking about the cost of structured programming and the performance advantage of using only global variables and gotos.
The garbage collection is a real problem for games. So you will spend a lot of your time thinking how you can avoid it. Even formatting a string will create new objects. So there are tips like: don't show the FPS!
Seriously, if you know C++ it is probably easier to manage your memory with new and delete than to tweak your architecture to reduce or avoid garbage collections.
It seems that if you want to program a non-trivial real-time game, you lose all the advantages of Java. Don't use getters and setters, don't use function calls, avoid any abstraction, etc. SERIOUSLY?
But back to your question: The performance advantage of NDK vs SDK can be anything from 0-10000%. It all depends.
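To make the string-formatting point concrete, here is a hedged Java sketch of the kind of workaround the answer is complaining about: one reused StringBuilder instead of a new String per frame. The class FpsLabel and its update method are invented for this example and are not from the video. It assumes the runtime's append(int) writes digits into the existing buffer, which is how OpenJDK implements it.

```java
public class FpsLabel {
    // One builder for the lifetime of the label, reused on every frame.
    private final StringBuilder text = new StringBuilder(16);

    // Rebuilds "FPS: <n>" in place. setLength(0) keeps the backing array,
    // so steady-state updates should not allocate (assumption: the runtime's
    // append(int) does not create a temporary String).
    CharSequence update(int fps) {
        text.setLength(0);
        text.append("FPS: ").append(fps);
        return text;
    }

    public static void main(String[] args) {
        FpsLabel label = new FpsLabel();
        System.out.println(label.update(60));
        System.out.println(label.update(58));
    }
}
```

The result has to be consumed as a CharSequence; calling toString() on it would create exactly the per-frame garbage this is meant to avoid.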

Related

Define StringBuilder capacity when number of characters is unknown

I'm aware that, for good practice, StringBuilder should be initialised with a capacity value of the expected content. Otherwise, increasing the size after compilation is going to be an expensive operation.
My question is, if we don't know the expected size, how should one go about it? Is there a standard value/way to avoid expensive operations under the hood?
If not, is there potentially a way of alarming/logging in the code if the capacity is bigger than the value given upon initialisation?

I'm aware that, for good practice, StringBuilder should be initialised with a capacity value of the expected content. Otherwise, increasing the size after compilation is going to be an expensive operation.
This is a wildly incorrect statement. It is very bad practice to do this. Even if you know exactly how large it'll be.
If I see this code:
StringBuilder sb = new StringBuilder(in1.length() + in2.length() * 3 + (loaded ? suffixLen : 0));
Then this is an additional thing to worry about, test, and keep up to date. I would assume if all this is present that for whatever reason somebody did some performance testing and actually figured out that this saves a worthwhile chunk of cycles, and somehow, in a fit of idiocy, neglected to write an enlightening comment and link to the JMH or profiler result analysis to verify this conclusion.
So, I'd either painstakingly attempt to manually analyse precisely if the calculation is still correct after an update to this code, or, I'd fix the problem and add the tests (and then be utterly befuddled, when, of course, the profile review shows this code is utterly inconsequential), or, I'd go through the considerable trouble of writing an assert based test case that will run the entire operation and then verify at the end that the size calculation done at the top is, in fact, correct.
I don't think you fully grasp why the hyperbolic statement "premature optimization is the root of all evil" is so popular.
Here's the problem. 99% of the system's resources are spent on 1% of the code. That's not an exaggeration; in fact, that is likely understating the issue.
Developer time is not infinite, and even if it was, the programmer's ability to comprehend code and focus on the relevant parts, is limited because, in the end, they are humans. Spending additional code that needs to be parsed and understood by human eyeballs and brains is therefore bad if the code does something irrelevant. We're literally talking about the same order of magnitude as you throwing a glass of water into the ocean down by the seashore in europe and then watching the water levels rise in manhattan. Beyond any and all ability to measure, and utterly incomprehensive. Bold does not do sufficient justice to how little it matters. Even if this code runs 100 million times a day for 15 years, it amounts to perhaps 5 cents in IAAS deployment costs total over that entire decade and a half, and that's if there is even a performance impact, which often there isn't because modern VMs, GCs, OSes, and CPU architectures get up to some crazy shenanigans.
Furthermore, the system optimizes. Optimizers, such as JVM hotspot engines, are in the end pattern matching machines. They find commonly used patterns and recognize how to run them as efficient as possible. By writing code in ways that nobody else does, it is highly unlikely that code is going to actually outperform the common (idiomatic) case. Most likely because it just doesn't matter, and even if it does, because the idiomatic case gets optimized much more readily.
Here is a trivial example:
List<String> someList = new ArrayList<String>();
for (int i = 0; i < 10000; i++) someList.add(someRandomString);
String[] arrayForm = someList.toArray(new String[0]);
Here you may go: Huh, well, we can optimize this code a little bit and pass new String[10000] instead; this saves the system from having to allocate an admittedly small object (a 0-size string array).
You would be wrong. The above code, with the new String[0], is in fact faster. How can that be? Optimizers, with pattern matching. They recognize the pattern and realize that the system can create a new array of the requisite size, not zero it out, and then run the code that fills it. Whereas the optimization patterns do not include the new String[reqSize] variant where the system could in theory also realize it can allocate the array and then omit zeroing out the array (which the JVM spec guarantees, which merely means the spec guarantees you can never observe that it wasn't zeroed out; it doesn't actually mean the JVM must zero it out, that's where the pattern optimization of not doing so is coming from). However, it doesn't do that - not common enough, and somewhat more complicated.
I'm not saying that new StringBuilder() is necessarily faster than new StringBuilder(knownSize). I'm saying:
99.9% of the time it literally does not make one iota of difference. Not a single nanosecond - the speedup is entirely theoretical: No performance test of any stripe can detect the difference. If a tree falls in the forest, and all that.
You have no idea when that 0.1% of the time even is, or whether it is actually straight-up never (0%). Between CPUs that cache and pipeline (did you know modern CPUs cannot access memory directly? At all? I bet you didn't. The basic von Neumann model of how CPUs work will totally mislead you if you use it to analyse the performance of machine code), VMs, and garbage collectors (did you know that garbage is free but live objects are expensive? Re-using an object can in fact be more expensive than creating a ton of short-lived garbage, depending on many factors; of course this too is an oversimplification. That's the real point: this is an intractable thing; you cannot just look at code and jump to conclusions about performance), you stand no chance of knowing what's 'faster'.
The only right move is to write code as simple and as clean as you can ('clean' defined as: When you look at it, you jump to conclusions, and these conclusions are correct. It is easy to adjust in the face of changing requirements, and flexible in how it connects to the rest of the codebase). IF (big if!) real life situations result in a performance issue, you first run a profiler so you know the 1% of the code that is in any way relevant, and then you go ham on that, with JMH benchmarks and all sorts of performance experiments to optimize the heck out of it. If your code is clean, that's great, because almost always this requires adjusting how the code that calls into the 'hot path' or where the code flows to out of the 'hot path' - and the cleaner your code the easier that will be.
Needless performance optimization almost invariably reduces flexibility, and makes code harder to understand.
Hence, objectively, micro-optimizing like this just makes your code slower and buggier for literally no benefit. Not even a tiny, almost immeasurable one.
Hence, the advice is silly. The only correct call is new StringBuilder() - no pre-configured size. The one and only excuse you have to write new StringBuilder(presetCapacity) is if there's a lengthy comment that immediately precedes it that lays out in a lot of detail, or links to a ticket, the exact performance study done to indicate this indeed fixes a real performance issue and how to recreate that study, and on what schedule it should be revisited.
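On the logging part of the question: StringBuilder exposes its current capacity through capacity(), so a diagnostic check is possible. The sketch below is only an illustration of that one method; the class CapacityCheck and the helper grewBeyond are made up for the example, and in line with the answer above such a check only earns its place while investigating a measured performance problem.

```java
public class CapacityCheck {
    // Returns true if the builder's backing array has grown past the capacity
    // it was created with. capacity() is a standard StringBuilder method.
    static boolean grewBeyond(StringBuilder sb, int initialCapacity) {
        return sb.capacity() > initialCapacity;
    }

    public static void main(String[] args) {
        int initial = 16;
        StringBuilder sb = new StringBuilder(initial);
        sb.append("a string that is clearly longer than sixteen characters");
        if (grewBeyond(sb, initial)) {
            // Swap in your own logger; System.err keeps the sketch self-contained.
            System.err.println("StringBuilder grew from " + initial + " to " + sb.capacity());
        }
    }
}
```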

Code optimization to avoid branching

I just came across this article: Compute the minimum or maximum of two integers without branching
It starts with "[o]n some rare machines where branching is expensive...".
I used to think that branching is always expensive as it often forces the processor to clear and restart its execution pipeline (e.g. see Why is it faster to process a sorted array than an unsorted array?).
This leaves me with a couple of questions:
Did the writer of the article get that part wrong? Or was this article maybe written in a time before branching was an issue (I can't find a date on it).
Do modern processors have a way to complete minimal branches like the one in (x < y) ? x : y without performance degradation?
Or do all modern compilers simply implement this hack automatically? Specifically, what does Java do? Especially since its Math.min(...) function is just that ternary statement...

Did the writer of the article get that part wrong? Or was this article maybe written in a time before branching was an issue (I can't find a date on it).
The oldest comment is 5 years old, so it's not hot news. However, unpredictable branching has always been expensive, and it was 5 years ago too. In the meantime it has only gotten worse, as modern CPUs can do much more per cycle and a mispredicted branch therefore costs more work.
But in a sense, the writer is right. The majority of CPUs are not found in our PCs and servers but in embedded devices, where the situation differs.
Do modern processors have a way to complete minimal branches like the one in (x < y) ? x : y without performance degradation?
Yes and no. AFAIK Math.max always gets translated to a conditional move, which means no branching. Your own max may or may not use one, depending on the statistics the JVM collected.
There's no silver bullet. With predictable outcomes, branching is faster. Finding out exactly what pattern the CPU recognizes is hard. The JVM simply looks at how often a branch gets taken and uses a magic threshold of about 18%. See my own question and answer for details.
Or do all modern compilers simply implement this hack automatically? Specifically, what does Java do? Especially since its Math.min(...) function is just that ternary statement...
It's actually a compiler intrinsic. Whenever the JIT compiler sees this very method called, it handles it specially. When you copy the method, your copy gets no special treatment.
In this case the intrinsic is not very useful, as this is something that gets heavily optimized anyway. For methods like Long#numberOfLeadingZeros the intrinsic is essential, as the Java code is rather long and slow while modern CPUs can do it in a single cycle.
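For reference, this is roughly what the branch-free trick looks like in Java next to the plain Math.min call. It is a sketch of one variant of the bit hack, valid only when x - y does not overflow an int; the class and method names are made up for the example. It shows that the two agree on results and says nothing about which one is faster on a given JVM.

```java
public class BranchlessMin {
    // Branch-free minimum. Assumes x - y does not overflow int.
    // (diff >> 31) is all ones when x < y and zero otherwise, so the
    // masked diff is added to y only when x is the smaller value.
    static int minNoBranch(int x, int y) {
        int diff = x - y;
        return y + (diff & (diff >> 31));
    }

    public static void main(String[] args) {
        int[][] pairs = {{3, 7}, {7, 3}, {-5, 2}, {0, 0}};
        for (int[] p : pairs) {
            System.out.println(minNoBranch(p[0], p[1]) + " vs Math.min: " + Math.min(p[0], p[1]));
        }
    }
}
```

In ordinary Java code Math.min remains the idiomatic choice; the hand-written version is mainly useful for understanding what the article describes.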

How can I measure memory usage of java method [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
sounds strange, but still.
Imagine I have a java method
boolean myMethod(Object param1, Object param2) {
    // here can be some simple calculations, some service invocations,
    // or calls to the DB. It doesn't matter.
    return someValue;
}
How can I measure how much memory the JVM uses to execute this method? How much memory does the method itself actually use during its execution?
Are there any tools, or can I add some extra code to my method, to see the memory used?
I am using JConsole to monitor the JVM, and I know that java.lang.Runtime can report freeMemory, but I can't figure out how much memory my method uses with these tools.

Measuring the true size of each invocation of a method is not as simple as it sounds, and the JVM does not support it. The complexity comes about because a) the JVM was designed to hide this low-level detail and b) the JVM may have more than one variation of the same method in flight at any point in time, which sounds nuts but is true. Sometimes the method will have been inlined, optimised more heavily, or even recompiled in a special way to support hot swapping of the method while it is running (!), which can radically change its runtime size.
Measuring the amount of heap used by a method call
Given that method invocations do not use the heap, it is tempting to say 0 bytes. However one may want to know the size of any new object that was allocated by a method call. This can be answered, however objects are shared across method calls and threads. So one has to be very clear about what they want to measure, and take care to not count it twice. There are libraries that will give the size of an Object in Java, they usually work either by reflection and a heuristic or they wrap sun.misc.Unsafe to inspect pointers to the object and its data. For example http://code.google.com/p/javabi-sizeof/.
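If the goal is "how many bytes did this call allocate", one option on HotSpot-based JVMs is the per-thread allocation counter exposed by com.sun.management.ThreadMXBean. The sketch below is a minimal illustration under that assumption: the class AllocationProbe and its method are invented for the example, the API is JVM-specific rather than part of the Java SE specification, and the figure includes everything the thread allocates while the task runs, not only objects that survive.

```java
import java.lang.management.ManagementFactory;

public class AllocationProbe {
    // Storing results here stops the JIT from optimizing the allocation away.
    static volatile Object sink;

    // Returns the bytes allocated by the current thread while task runs,
    // or -1 if this JVM does not expose the HotSpot-specific counter.
    static long allocatedBytes(Runnable task) {
        java.lang.management.ThreadMXBean base = ManagementFactory.getThreadMXBean();
        if (!(base instanceof com.sun.management.ThreadMXBean)) {
            return -1;
        }
        com.sun.management.ThreadMXBean bean = (com.sun.management.ThreadMXBean) base;
        if (!bean.isThreadAllocatedMemorySupported() || !bean.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        task.run();
        return bean.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        long bytes = allocatedBytes(() -> { sink = new byte[1_000_000]; });
        System.out.println("allocated (approx.): " + bytes + " bytes");
    }
}
```

The number is approximate and counts allocation, not retained size, so it complements rather than replaces the object-size libraries mentioned above.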
Measuring the allocation rate of a method
YourKit has an excellent tool described here for tracking every object allocation at its source point and reporting back their total size and rates. Very useful for finding out who is causing GC churn.
Measuring the amount of stack used per method call
The JVM does not publish this information, and as mentioned before it will vary depending on the optimizations that are currently active and which runtime compiler is being used.
The alternative is to use a heuristic. Something like counting up how many variables are used within the method and multiply the count by 4 to give an answer in bytes (which assumes that every variable is 4 bytes in size, ie an int). Obviously this heuristic is flawed and could be improved, but it is simple enough to give a quick and fairly representative answer.

I was using the following in the past, and I found it to be useful:
http://www.javamex.com/classmexer/

How to memory profile in Java?

I'm still learning the ropes of Java so sorry if there's a obvious answer to this. I have a program that is taking a ton of memory and I want to figure a way to reduce its usage, but after reading many SO questions I have the idea that I need to prove where the problem is before I start optimizing it.
So here's what I did, I added a break point to the start of my program and ran it, then I started visualVM and had it profile the memory(I also did the same thing in netbeans just to compare the results and they are the same). My problem is I don't know how to read them, I got the highest area just saying char[] and I can't see any code or anything(which makes sense because visualvm is connecting to the jvm and can't see my source, but netbeans also does not show me the source as it does when doing cpu profiling).
Basically what I want to know is which variable (and hopefully more details, such as which method) is using all the memory, so I can focus on working there. Is there an easy way to do this? Right now I am using Eclipse and Java to develop (and installed VisualVM and NetBeans specifically for profiling, but am willing to install anything else that you feel gets this job done).
EDIT: Ideally, I'm looking for something that will take all my objects and sort them by size (so I can see which one is hogging memory). Currently it returns generic information such as String[] or int[], but I want to know which object it's referring to so I can work on optimizing its size.

Strings are problematic
Basically, in Java, String references (things that use char[] behind the scenes) will dominate most business applications memory-wise. How they are created determines how much memory they consume in the JVM.
This is simply because they are so fundamental to most business applications as a data type, and they are among the most memory hungry as well. This isn't just a Java thing: String data types take up lots of memory in pretty much every language and runtime library, because at best they are arrays of 1 byte per character and at worst (Unicode) they are arrays of multiple bytes per character.
Once when profiling CPU usage on a web app that also had an Oracle JDBC dependency I discovered that StringBuffer.append() dominated the CPU cycles by many orders of magnitude over all other method calls combined, much less any other single method call. The JDBC driver did lots and lots of String manipulation, kind of the trade off of using PreparedStatements for everything.
What you are concerned about you can't control, not directly anyway
What you should focus on is what is in your control, which is making sure you don't hold on to references longer than you need to, and that you are not duplicating things unnecessarily. The garbage collection routines in Java are highly optimized, and if you learn how their algorithms work, you can make sure your program behaves in the optimal way for those algorithms to work.
Java Heap Memory isn't like manually managed memory in other languages, those rules don't apply
What are considered memory leaks in other languages aren't the same thing/root cause as in Java with its garbage collection system.
Most likely in Java memory isn't consumed by one single uber-object that is leaking ( dangling reference in other environments ).
It is most likely lots of smaller allocations caused by StringBuffer/StringBuilder objects that were not sized appropriately at first instantiation and then had to grow their char[] arrays automatically to hold subsequent append() calls.
These intermediate objects may be held around longer than expected by the garbage collector because of the scope they are in and lots of other things that can vary at run time.
EXAMPLE: the garbage collector may decide that there are candidates, but because it considers that there is plenty of memory still to be had that it might be too expensive time wise to flush them out at that point in time, and it will wait until memory pressure gets higher.
The garbage collector is really good now, but it isn't magic, if you are doing degenerate things, it will cause it to not work optimally. There is lots of documentation on the internet about the garbage collector settings for all the versions of the JVMs.
These un-referenced objects may just have not reached the time that the garbage collector thinks it needs them to for them to be expunged from memory, or there could be references to them held by some other object ( List ) for example that you don't realize still points to that object. This is what is most commonly referred to as a leak in Java, which is a reference leak more specifically.
EXAMPLE: If you know you need to build a 4K String using a StringBuilder, create it with new StringBuilder(4096), not the default, which is 16 characters and will immediately start creating garbage that can amount to many times what you think the object's size should be.
You can discover how many of what types of objects are instantiated with VisualVM, this will tell you what you need to know. There isn't going to be one big flashing light that points at a single instance of a single class that says, "This is the big memory consumer!", that is unless there is only one instance of some char[] that you are reading some massive file into, and this is not possible either, because lots of other classes use char[] internally; and then you pretty much knew that already.
I don't see any mention of OutOfMemoryError
You probably don't have a problem in your code, the garbage collection system just might not be getting put under enough pressure to kick in and deallocate objects that you think it should be cleaning up. What you think is a problem probably isn't, not unless your program is crashing with OutOfMemoryError. This isn't C, C++, Objective-C, or any other manual memory management language / runtime. You don't get to decide what is in memory or not at the detail level you are expecting you should be able to.

In JProfiler, you can go to the heap walker and activate the biggest objects view. You will see the objects that retain the most memory. "Retained" memory is the memory that would be freed by the garbage collector if you removed the object.
You can then open the object nodes to see the reference tree of the retained objects. Here's a screen shot of the biggest object view:
Disclaimer: My company develops JProfiler

I would recommend capturing heap dumps and using a tool like Eclipse MAT that lets you analyze them. There are many tutorials available. It provides a view of the dominator tree to provide insight into the relationships between the objects on the heap. Specifically for what you mentioned, the "path to GC roots" feature of MAT will tell you where the majority of those char[], String[] and int[] objects are being referenced. JVisualVM can also be useful in identifying leaks and allocations, particularly by using snapshots with allocation stack traces. There are quite a few walk-throughs of the process of getting the snapshots and comparing them to find the allocation point.

The JDK ships with JVisualVM in its bin folder. Once your application (for example an application server) is running, you can start VisualVM and connect it to the process on localhost; it will show you memory allocation and let you take a heap dump.

If you use visualVM to check your memory usage, it focuses on the data, not the methods. Maybe your big char[] data is caused by many String values? Unless you are using recursion, the data will not be from local variables. So you can focus on the methods that insert elements into large data structures. To find out what precise statements cause your "memory leakage", I suggest you additionally
read Josh Bloch's Effective Java Item 6: (Eliminate obsolete object references)
use a logging framework and log instance creations at the highest verbosity level.

There are generally two distinct approaches to analyse Java code to gain an understanding of its memory allocation profile. If you're trying to measure the impact of a specific, small section of code – say you want to compare two alternative implementations in order to decide which one gives better runtime performance – you would use a microbenchmarking tool such as JMH.
While you can pause the running program, the JVM is a sophisticated runtime that performs a variety of housekeeping tasks and it's really hard to get a "point in time" snapshot and an accurate reading of the "level of memory usage". It might allocate/free memory at a rate that does not directly reflect the behaviour of the running Java program. Similarly, performing a Java object heap dump does not fully capture the low-level machine specific memory layout that dictates the actual memory footprint, as this could depend on the machine architecture, JVM version, and other runtime factors.
Tools like JMH get around this by repeatedly running a small section of code, and observing a long-running average of memory allocations across a number of invocations. E.g. in the GC profiling sample JMH benchmark the derived *·gc.alloc.rate.norm metric gives a reasonably accurate per-invocation normalised memory cost.
In the more general case, you can attach a profiler to a running application and get JVM-level metrics, or perform a heap dump for offline analysis. Some commonly used tools for profiling full applications are Async Profiler and the newly open-sourced Java Flight Recorder in conjunction with Java Mission Control to visualise results.

Should I inline long code in a loop, or move it in a separate method?

Assume I have a loop (any while or for) like this:
loop{
A long code.
}
From the point of time complexity, should I divide this code in parts, write a function outside the loop, and call that function repeatedly?
I read something about functions very long ago, that calling a function repeatedly takes more time or memory or like something, I don't remember it exactly. Can you also provide some good reference about things like this (time complexity, coding style)?
Can you also provide some reference book or tutorial about heap memory, overheads etc. which affects the performance of program?

The performance difference is probably very minimal in this case. I would concentrate on clarity rather than performance until you identify this portion of your code to be a serious bottleneck.
It really does depend on what kind of code you're running in the loop, however. If you're just doing a tiny mathematical operation that isn't going to take any CPU time, but you're doing it a few hundred thousand times, then inlining the calculation might make sense. Anything more expensive than that, though, and performance shouldn't be an issue.

There is an overhead of calling a function.
So if the "long code" is fast compared to this overhead (and your application cares about performance), then you should definitely avoid the overhead.
However, if the performance is not noticeably worse, it's better to make the code more readable by using a function (or, better, multiple functions).

Rule one of performance optimisation: Measure it.
Personally, I go for readable code first and then optimise it IF NECESSARY. Usually, it isn't necessary :-)
See the first line in CHAPTER 3 - Measurement Is Everything
"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil." - Donald Knuth
In this case, the difference in performance will probably be minimal between the two solutions, so writing clearer code is the way to do it.

There really isn't a simple "tutorial" on performance; it is a very complex subject and one that even seasoned veterans often don't fully understand. Anyway, to give you more of an idea of what the overhead of "calling" a function is: basically, you are "freezing" the state of your function (in Java there are no "functions" per se, they are all called methods), calling the method, and then "unfreezing" back to where your method was before.
The "freezing" essentially consists of pushing state information (where you were in the method, what the values of the variables were, etc.) onto the stack; "unfreezing" consists of popping the saved state off the stack and restoring the control structures to where they were before you called the function. Naturally memory operations are far from free, but the VM is pretty good at keeping the performance impact to an absolute minimum.
Now keep in mind that Java is almost entirely heap based. The only things that really have to get pushed onto the stack are the values of pointers (small), your place in the program (again small), whatever primitives you have local to your method, and a tiny bit of control information, nothing else. Furthermore, although you cannot explicitly inline in Java (though I'm sure there are bytecode editors out there that essentially let you do that), most VMs, including the most popular one, HotSpot, will do this automatically for you. http://java.sun.com/developer/technicalArticles/Networking/HotSpot/inlining.html
So the bottom line is pretty much 0 performance impact, if you want to verify for yourself you can always run benchmarking and profiling tools, they should be able to confirm it for you.

From an execution speed point of view it shouldn't matter, and if you still believe this is a bottleneck it is easy to measure.
From a development performance perspective, it is a good idea to keep the code short. I would vote for turning the loop contents into one (or more) properly named methods.

Forget it! You can't gain any performance by doing the job of the JIT. Let JIT inline it for you. Keep the methods short for readability and also for performance, as JIT works better with short methods.
There are micro-optimizations which may help you gain some performance, but don't even think about them. I suggest the following rules:
Write clean code using appropriate objects and algorithms for readability and for performance.
In case the program is too slow, profile and identify the critical parts.
Think about improving them using better objects and algorithms.
As a last resort, you may also consider microoptimizations.
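As a small illustration of the "let the JIT inline it" advice, the sketch below shows the same loop body written inline and extracted into a short, named method. The names and the arithmetic are placeholders standing in for the "long code" in the question. Both versions compute the same result, and a short method like step is the kind of candidate a JIT can typically inline, though whether it does in a particular run is up to the VM.

```java
public class LoopBody {
    // Loop body written inline.
    static long inlineVersion(int[] data) {
        long sum = 0;
        for (int v : data) {
            sum += (long) v * v + 1;
        }
        return sum;
    }

    // The same work extracted into a short, descriptively named method.
    static long step(int v) {
        return (long) v * v + 1;
    }

    static long extractedVersion(int[] data) {
        long sum = 0;
        for (int v : data) {
            sum += step(v);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(inlineVersion(data) + " " + extractedVersion(data));
    }
}
```

If the loop ever shows up in a profile, the extracted version is also the easier one to benchmark in isolation.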
