I do not have a teacher I can ask questions about efficiency, so I will ask here.
If I only care about fast code, not about RAM use, only CPU:
I assume that checking an 'if' once is faster than writing a variable once. But what is the ratio? When is it worth always checking whether the variable already holds the value I am about to set it to?
For example:
//ex. 1
int a = 5;
while (true) {
a = 5;
}
//ex. 2
int a = 5;
while (true) {
if (a != 5) a = 5;
}
//ex. 3
int a = 6;
while (true) {
if (a != 5) a = 5;
a = 6;
}
I guess ex. 2 will run faster than ex. 1 because 'a' always stays at '5'. In this case the 'if' speeds things up by not writing a new value to 'a' every time. But if 'a' changes often, like in ex. 3, then checking if (a != 5) is unnecessary and slows things down. So the check is worth it if the variable stays the same most of the time, and not worth it if it changes most of the time. But where is the break-even point? Maybe writing a variable takes 1000 times longer than just checking it? Or maybe writing takes almost the same time as checking? I'm not asking for an exact answer, I just always wonder what is best for my code.
Short answer: it doesn't matter.
Long answer: It really doesn't matter at that low level. Even if you were to compare the executed machine code, there are so many things in between (the JIT compiler, for one, and all sorts of CPU caches, for another).
Gone are the times when you needed to micro-optimize things like this. What you need to make sure is that you're using effective algorithms. And as always, premature optimization is the root of all evil.
I noted that you wrote "I just always wonder what is best for my code". The best way is to write clear code, so that other people can understand what you're doing (if they saw code like in your examples, they would think you're insane). Another old adage is that, for the JVM to optimize your code as well as possible, you should write "dumb code": the JIT optimizer can then recognize the code and convert it to a more efficient form.
Related
Let's imagine I have a lib which contains the following simple method:
private static final String CONSTANT = "Constant";
public static String concatStringWithCondition(String condition) {
return "Some phrase" + condition + CONSTANT;
}
What if someone wants to use my method in a loop? As I understand it, the string optimisation (where + gets replaced with StringBuilder or whatever is more optimal) does not work in that case? Or is it only valid for strings initialised outside of the loop?
I'm using java 11 (Dropwizard).
Thanks.
No, this is fine.
The only case that string concatenation can be problematic is when you're using a loop to build one single string. Your method by itself is fine. Callers of your method can, of course, mess things up, but not in a way that's related to your method.
The code as written should be as efficient as making a StringBuilder and appending these 3 parts to it. There is no difference at all between a literal ("Some phrase") and an expression that the compiler can treat as a compile-time constant (which CONSTANT, here, clearly is, given that it is static, final, not null, and of a type that allows compile-time constants: all primitives and String).
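To see the compile-time constant folding in action, here is a minimal sketch (the class name ConstantFolding is hypothetical). Per the Java Language Specification, a constant expression of type String is computed by javac and treated like a literal, so it is interned: comparing the folded result with == against the equivalent literal yields true. That use of == is only for the demonstration, not a way to compare strings. The second method shows that once a run-time value (condition) is involved, real concatenation happens.

```java
class ConstantFolding {
    private static final String CONSTANT = "Constant";

    // Both operands are compile-time constants, so javac folds this
    // into the single literal "Some phraseConstant" in the class file.
    static String folded() {
        return "Some phrase" + CONSTANT;
    }

    // 'condition' is only known at run time, so an actual
    // concatenation is performed on every call.
    static String notFolded(String condition) {
        return "Some phrase" + condition + CONSTANT;
    }
}
```

For example, ConstantFolding.folded() == "Some phraseConstant" evaluates to true, while ConstantFolding.notFolded("X") produces a new string equal to "Some phraseXConstant".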
However, is that 'efficient'? I doubt it: making a StringBuilder is not particularly cheap either. It's orders of magnitude cheaper than continually making new strings, sure, but there's always a bigger fish:
It doesn't matter
Computers are fast. Really, really fast. It is highly likely that you can write this incredibly badly (performance wise) and it still won't be measurable. You won't even notice. Less than a millisecond slower.
In general, anybody who worries about performance at this level simply lacks perspective and knowledge: if you apply that level of fretting to your Java code, and you know what could in theory be less than perfectly performant, you'll be sweating over every 3rd character you ever type. That's no way to program. So, gain that perspective (or take it from me: "just git gud" is not something you can do in a week, so take it on faith for now and verify it as you learn) and don't worry about it. Only when you actually run into a situation where the code is slower than it feels like it could be, or slower than it needs to be, should you toss profilers and microbenchmark testing frameworks at it, and THEN, armed with all that information (and not before!), consider optimizing. The reports tell you what to optimize, because typically less than 1% of the code is responsible for 99% of the performance loss, so spending any time on code that isn't in that 1% is an utter waste of time. That is why you must get those reports first, or not start at all.
... or perhaps it does
But if it does matter, and it's really that 1% of the code that is responsible for 99% of the loss, then usually you need to go a little further than just 'optimize the method'. Optimize the entire pipeline.
What is happening with this string? Take that into consideration.
For example, let's say that this string is itself being appended to a much bigger StringBuilder. In that case, making a tiny StringBuilder here is incredibly inefficient compared to rewriting the method to:
public static void concatStringWithCondition(StringBuilder sb, String condition) {
sb.append("Some phrase").append(condition).append(CONSTANT);
}
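As a sketch of that pipeline idea, a hypothetical caller (buildReport) can append many results into one shared StringBuilder instead of creating a throwaway string per call. The class name PipelineSketch and the newline separator between entries are assumptions for illustration only.

```java
import java.util.List;

class PipelineSketch {
    private static final String CONSTANT = "Constant";

    // Same shape as the rewritten method: append into the caller's builder
    static void concatStringWithCondition(StringBuilder sb, String condition) {
        sb.append("Some phrase").append(condition).append(CONSTANT);
    }

    // Hypothetical caller: one builder for the whole loop, one final toString()
    static String buildReport(List<String> conditions) {
        StringBuilder sb = new StringBuilder();
        for (String condition : conditions) {
            concatStringWithCondition(sb, condition);
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Here buildReport(List.of("A", "B")) yields "Some phraseAConstant\nSome phraseBConstant\n" with no intermediate strings per entry.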
Or, perhaps this data is being turned into bytes using UTF-8 and then tossed onto a web socket. In that case:
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
private static final byte[] PREFIX = "Some phrase".getBytes(StandardCharsets.UTF_8);
private static final byte[] SUFFIX = CONSTANT.getBytes(StandardCharsets.UTF_8);
// OutputStream.write throws IOException, so the method must declare it
public static void concatStringWithCondition(OutputStream out, String condition) throws IOException {
out.write(PREFIX);
out.write(condition.getBytes(StandardCharsets.UTF_8));
out.write(SUFFIX);
}
Then check whether that OutputStream is buffered. If not, make it buffered: that will help a ton, and the gain will completely dwarf any difference between string concatenation and the alternatives. If the 'condition' string can get quite large, the above is no good either: you want a CharsetEncoder that encodes straight to the OutputStream, and you may even want to replace all of that with a ByteBuffer-based approach.
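A minimal sketch of the "make it buffered" advice, assuming the caller hands us a raw OutputStream (the class BufferedWriteSketch and the method writeAll are hypothetical). The instanceof check is only a heuristic: other stream types can also be buffered internally.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class BufferedWriteSketch {
    private static final byte[] PREFIX = "Some phrase".getBytes(StandardCharsets.UTF_8);
    private static final byte[] SUFFIX = "Constant".getBytes(StandardCharsets.UTF_8);

    static void concatStringWithCondition(OutputStream out, String condition) throws IOException {
        out.write(PREFIX);
        out.write(condition.getBytes(StandardCharsets.UTF_8));
        out.write(SUFFIX);
    }

    // Wraps an unbuffered stream so that many small writes become few large ones
    static void writeAll(OutputStream raw, Iterable<String> conditions) throws IOException {
        OutputStream out = raw instanceof BufferedOutputStream ? raw : new BufferedOutputStream(raw);
        for (String condition : conditions) {
            concatStringWithCondition(out, condition);
        }
        out.flush(); // push any buffered bytes through to the underlying stream
    }
}
```

The stream is flushed but deliberately not closed, since it belongs to the caller.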
Conclusion
1. Assume performance is never relevant until it is.
2. IF performance truly must be tackled, strap in: it'll take ages to do it right. Doing it 'wrong' (applying dumb rules of thumb that do not work) isn't useful. Either do it right, or don't do it.
3. IF you're still on board, always start with profiler reports and use JMH to gather information.
4. Be prepared to rewrite the pipeline, changing the method signatures, in order to optimize.
5. That means that micro-optimizing, which usually sacrifices nice abstracted APIs, is actively bad for performance, because changing pipelines is considerably more difficult if all code is micro-optimized, given that this usually comes at the cost of abstraction.
And now the circle is complete: Point 5 shows why worrying about performance the way you are doing in this question is in fact detrimental. It is far too likely that this worry results in you 'optimizing' some code in a way that doesn't actually run faster (because the JVM is a complex beast). Even if it did run faster, it would be irrelevant, because the code path this code is on accounts for 0.01% or less of the total runtime, and in the meantime you've made your APIs worse and less abstract, which makes any actually useful optimization much harder than it needs to be.
But I really want rules of thumb!
All right, fine. Here are 2 easy rules of thumb that will lead to better performance:
When in Rome...
The JVM is an optimising marvel and will run the craziest code quite quickly anyway. However, it does this primarily by being a giant pattern-matching machine: it finds recognizable code snippets and rewrites them into the fastest machine code it can, carefully tuned to juuust your combination of hardware. But this pattern machine isn't voodoo magic: it has a limited set of patterns. Which patterns do JVM makers 'ship' with their JVMs? The common patterns, of course. Why include a pattern for exotic code virtually nobody ever writes? Waste of space.
So, write code the way java programmers tend to write it. Which very much means: Do not write crazy code just because you think it might be faster. It'll likely be slower. Just follow the crowd.
Trivial example:
Which one is faster:
List<String> list = new ArrayList<>();
// someRandomName() stands in for any method that returns a String
for (int i = 0; i < 10000; i++) list.add(someRandomName());
// option 1:
String[] arr1 = list.toArray(new String[list.size()]);
// option 2:
String[] arr2 = list.toArray(new String[0]);
You might think, obviously, option 1, right? Option 2 'wastes' a string array, making a 0-length array just to toss it in the garbage right after. But you'd be wrong: option 2 is in fact faster. (If you want an explanation: the JVM recognizes it and does a hacky move: it makes a new string array that does not need to be initialized with all zeroes first. Normal Java code cannot do this, because arrays are necessarily initialized blank to prevent memory corruption issues, but for specifically .toArray(new X[0]), those pattern-matching machines I told you about detect it and replace it with code that just blits the refs straight into a patch of memory without wasting time writing zeroes to it first.)
It's a subtle difference that is highly unlikely to matter - it just highlights: Your instincts? They will mislead you every time.
Fortunately, .toArray(new X[0]) is common Java code, and it's easier and shorter. So if you just write nice, convenient code that looks like how other folks write it, you'd have gotten the right answer here, without having to know such crazy esoterica as how the JVM needs to waste time zeroing out that array and how HotSpot's pattern matching might eliminate this, thus making it faster. That's just one of 5 million things you'd have to know, and nobody can do that. Thus: just write Java code in simple, common styles.
Algorithmic complexity is a thing HotSpot can't fix for you
Given an O(n^3) algorithm fighting an O(log(n) * n^2) algorithm, make n large enough and the second algorithm has to win, that's what big O notation means. The JVM can do a lot of magic but it can pretty much never optimize an algorithm into a faster 'class' of algorithmic complexity. You might be surprised at the size n has to be before algorithmic complexity dominates, but it is acceptable to realize that your algorithm can be fundamentally faster and do the work on rewriting it to this more efficient algorithm even without profiler reports and benchmark harnesses and the like.
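To make the point concrete, here is an illustrative sketch (the class DuplicateCheck and both methods are hypothetical, and the example task is chosen only for illustration): checking a list for duplicates by comparing every pair is O(n^2), while one pass with a HashSet is expected O(n). The JIT can speed up either loop, but it will not turn the first into the second. Both versions assume the list has no null elements.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateCheck {
    // O(n^2): compares every pair of elements
    static boolean hasDuplicateQuadratic(List<String> items) {
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                if (items.get(i).equals(items.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }

    // Expected O(n): a single pass, remembering what has been seen
    static boolean hasDuplicateLinear(List<String> items) {
        Set<String> seen = new HashSet<>();
        for (String item : items) {
            if (!seen.add(item)) { // add() returns false if the item was already present
                return true;
            }
        }
        return false;
    }
}
```

For small lists the quadratic version may even win; the difference only shows once n gets large, which is exactly what big O describes.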
I'm trying to make a game and I have a Selection class that holds a string named str in it. I apply the following code to my selection objects every 17 milliseconds.
if(s.Str == "Upgrade") {
}else if(s.Str == "Siege") {
}else if(s.Str == "Recruit") {
}
In other words, these selection objects do different jobs according to their types (upgrade, siege, etc.). I am using the str variable elsewhere. My question is:
Would it be more optimized if I assign the types to an integer when I first create the objects?
if(s.type == 1) {
}else if(s.type == 2) {
}else if(s.type == 3) {
}
This would make me write extra lines of code(Since I have to separate objects by type when I first create) and make the code more difficult to understand, but would there be a difference between comparing integers rather than comparing strings?
If you compare strings >that< way, there is probably no performance difference.
However, that is the WRONG WAY to compare strings. The correct way is to use the equals(Object) method. For example:
if (s.Str.equals("Upgrade")) {
Read this:
How do I compare strings in Java?
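A small sketch of the difference, mirroring the question's if/else chain with equals() (the class SelectionDispatch and its return values are hypothetical placeholders). A string built at run time, for example read from input, is generally a different object from the literal, so == can be false even when the contents match. Putting the literal first also avoids a NullPointerException when str is null.

```java
class SelectionDispatch {
    // Same chain as in the question, but comparing contents with equals()
    static String dispatch(String str) {
        if ("Upgrade".equals(str)) {
            return "upgrade";
        } else if ("Siege".equals(str)) {
            return "siege";
        } else if ("Recruit".equals(str)) {
            return "recruit";
        }
        return "unknown";
    }
}
```

For example, with String fromInput = new String("Siege"), the check fromInput == "Siege" is false, yet SelectionDispatch.dispatch(fromInput) still returns "siege".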
I apply the following code to my selection objects every 17 milliseconds.
The time that it takes to test two strings for equality is probably on the order of tens of NANOseconds. So, basically, the difference between comparing strings and comparing integers is irrelevant here.
This illustrates why premature optimization is a bad thing. You should only optimize code when you know that it is going to be worthwhile to spend your time on it; i.e. when you know there is going to be a pay-off.
So should I optimize after I write and finish all the code? Is that what 'not doing premature optimization' means?
No, it doesn't exactly mean that. (Well... not to me, anyway.) What it means to me is that you shouldn't optimize until:
1. you have a working program whose performance you can measure,
2. you have determined specific (quantifiable) performance criteria,
3. you have a means of measuring the performance, e.g. appropriate benchmarks involving real or realistic use-cases, and
4. you have a good means of identifying the actual performance hotspots.
If you try to optimize before you have the above, you are likely to optimize the wrong parts of the code for the wrong reasons, and your effort (programmer time) is likely to be spent inefficiently.
In your specific case, my gut feeling is that if you followed the recommended process you would discover [1] that this String vs int (vs enum) question is irrelevant to your game's observable performance [2].
But if you want to be more scientific than "gut feeling", you should wait until you have 1 through 4 settled, and then measure to see if the actual performance meets your criteria. Only then should you decide whether or not to optimize.
[1] My prediction assumes that your characterization of the problem is close enough to reality. That is always a risk when people try to identify performance issues "by eye" rather than by measuring.
[2] It is relevant to other things, e.g. code readability and maintainability, but I'm not going to address those in this answer.
The Answer by Stephen C is correct and wise. But your example code is ripe for a different solution entirely.
Enum
If you want performance, type-safety, easier-to-read code, and want to ensure valid values, use enum objects rather than mere strings or integers.
public enum Action { UPGRADE , SIEGE , RECRUIT }
You can use a switch on the various possible enum objects.
Action action = Action.SIEGE ;
…
switch ( action )
{
case UPGRADE:
doUpgradeStuff() ;
break;
case SIEGE:
doSiegeStuff() ;
break;
case RECRUIT:
doRecruitStuff() ;
break;
default:
doDefaultStuff() ;
break;
}
Using enums this way will get even better in the future. See JEP 406: Pattern Matching for switch (Preview).
See Java Tutorials by Oracle on enums. And for an example, see their tutorial using enums for month, day-of-week, and text style.
See also this Question, linked to others.
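Since the question says str is used elsewhere, one option is to keep it and parse it into the enum once, when the object is created, so the per-frame code only switches on the enum. This is a sketch under assumptions: the fields str and action, the constructor, and the tick() method are hypothetical, and Action.valueOf throws IllegalArgumentException for any name that is not a constant.

```java
import java.util.Locale;

enum Action { UPGRADE, SIEGE, RECRUIT }

class Selection {
    final String str;     // kept, because the question uses it elsewhere
    final Action action;  // parsed once, at construction time

    Selection(String str) {
        this.str = str;
        // "Siege" -> "SIEGE" -> Action.SIEGE
        this.action = Action.valueOf(str.toUpperCase(Locale.ROOT));
    }

    // Called every frame: a switch on the enum, no string comparison
    String tick() {
        switch (action) {
            case UPGRADE: return "upgrading";
            case SIEGE:   return "sieging";
            case RECRUIT: return "recruiting";
            default: throw new AssertionError(action);
        }
    }
}
```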
Comparing primitive numbers like int will definitely be faster than comparing Strings in Java. It will give you better performance if you are executing it every 17 milliseconds.
Yes, there is a difference. String is an object and int is a primitive type. When you write object == "string", it compares references (addresses), not contents. You need to use the equals method to check for an exact match.
I have a question about for and while loops, for when we have to iterate over values until a condition is met. I wonder which is more efficient at a low level, and why?
That is, these two pieces of code give the same result:
FOR:
for (int i = 0; i<10 ; i++)
{
if (i==4)
{
return;
}
}
WHILE:
int i=0;
while (i<10 && i!=4)
{
i++;
}
This is a small example of a possible loop, and we could be looking at a record of thousands.
Which code is more efficient? I've always been told that I should use a while in this case, but I wonder whether, at a low level, while is still better or whether for is.
Thank you very much.
The answer is: it doesn't matter.
You will not see any difference in performance between the two unless you try really hard to construct code that shows one. What really matters is the readability of your code (this is where you'll save time and money in the future), so use whichever one is more understandable.
In your case, I'll suggest the while approach...
I'll also suggest reading this article by Eric Lippert: How Bad Is Good Enough?, just in case you're not sold on the readability vs. silly optimizations :)
They should compile very similarly. At a low level you are looking at executing the commands within the loop, plus two comparisons, each followed by a jump out of the loop when the condition calls for exiting it.
As mentioned above, while should lead to better readability and thus is the better choice.
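Here are both loops from the question written as runnable methods (the class LoopEquivalence is hypothetical; the for version uses break instead of return so that both can report where they stopped). If you want to compare what the compiler produces, compiling this and running javap -c LoopEquivalence prints the bytecode of each method; expect conditional-jump instructions in both, though the exact sequence may differ slightly.

```java
class LoopEquivalence {
    // for loop that stops early, as in the question
    static int withFor() {
        int i;
        for (i = 0; i < 10; i++) {
            if (i == 4) {
                break;
            }
        }
        return i;
    }

    // while loop with the combined condition (Java uses &&, not 'and')
    static int withWhile() {
        int i = 0;
        while (i < 10 && i != 4) {
            i++;
        }
        return i;
    }
}
```

Both methods stop with i equal to 4.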
Both for and while are the same.
The only difference is where you place the condition.
At a low level, both compile to the same kind of compare-and-jump instructions.
In your scenario, while is the best option **if you don't know the upper limit**.
You can use
while (i != 4)
{
i++;
}
Use a for loop if you know the upper limit; otherwise while is your best friend.
I saw the following code in this commit for MongoDB's Java Connection driver, and it appears at first to be a joke of some sort. What does the following code do?
if (!((_ok) ? true : (Math.random() > 0.1))) {
return res;
}
(EDIT: the code has been updated since posting this question)
After inspecting the history of that line, my main conclusion is that there has been some incompetent programming at work.
That line is gratuitously convoluted. The general form
a? true : b
for boolean a, b is equivalent to the simple
a || b
The surrounding negation and excessive parentheses convolute things further. Keeping in mind De Morgan's laws it is a trivial observation that this piece of code amounts to
if (!_ok && Math.random() <= 0.1)
return res;
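The equivalence is easy to check mechanically if the random draw is passed in as a parameter instead of calling Math.random() (the class DeMorganCheck and its method names are hypothetical). The first method is the committed expression; the second is the simplified form obtained from a ? true : b == a || b and De Morgan's laws.

```java
class DeMorganCheck {
    // The committed expression, with the random value r passed in
    static boolean originalReturns(boolean ok, double r) {
        return !((ok) ? true : (r > 0.1));
    }

    // !(ok || r > 0.1)  ==  !ok && !(r > 0.1)  ==  !ok && r <= 0.1
    static boolean simplifiedReturns(boolean ok, double r) {
        return !ok && r <= 0.1;
    }
}
```

Evaluating both for ok in {true, false} and values of r on either side of 0.1 (including exactly 0.1) gives identical results.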
The commit that originally introduced this logic had
if (_ok == true) {
_logger.log( Level.WARNING , "Server seen down: " + _addr, e );
} else if (Math.random() < 0.1) {
_logger.log( Level.WARNING , "Server seen down: " + _addr );
}
This is another example of incompetent coding, but notice the reversed logic: here the event is logged if _ok is true, or in 10% of the other cases, whereas the simplified code above returns early (skipping the log) in 10% of the !_ok cases and logs in the other 90%. So the later commit ruined not only clarity, but correctness itself.
I think in the code you have posted we can actually see how the author intended to transform the original if-then somehow literally into its negation required for the early return condition. But then he messed up and inserted an effective "double negative" by reversing the inequality sign.
Coding style issues aside, stochastic logging is quite a dubious practice all by itself, especially since the log entry does not document its own peculiar behavior. The intention is, obviously, to reduce restatements of the same fact: that the server is currently down. The appropriate solution is to log only changes of the server state, not every observation of it, let alone a random selection of 10% of such observations. Yes, that takes just a little bit more effort, so let's see some.
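A minimal sketch of logging only state changes (the class ServerStateLogger and its method are hypothetical, not part of the MongoDB driver). The caller reports each observation, and shouldLog returns true only when the state differs from the previous observation. Logging the very first observation is a design choice here; you could instead start from an assumed "up" state. The method is synchronized because observations may come from several threads.

```java
class ServerStateLogger {
    private Boolean lastKnownUp = null; // null: nothing observed yet

    // Returns true when this observation should be written to the log
    synchronized boolean shouldLog(boolean up) {
        boolean changed = lastKnownUp == null || lastKnownUp != up;
        lastKnownUp = up;
        return changed;
    }
}
```

A sequence of observations up, up, down, down, up produces log entries only for the first, third, and fifth.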
I can only hope that all this evidence of incompetence, accumulated from inspecting just three lines of code, does not speak fairly of the project as a whole, and that this piece of work will be cleaned up ASAP.
https://github.com/mongodb/mongo-java-driver/commit/d51b3648a8e1bf1a7b7886b7ceb343064c9e2225#commitcomment-3315694
Comment on that commit by gareth-rees:
Presumably the idea is to log only about 1/10 of the server failures (and so avoid massively spamming the log), without incurring the cost of maintaining a counter or timer. (But surely maintaining a timer would be affordable?)
Add a class member initialized to negative 1:
private int logit = -1;
In the try block, make the test:
if( !ok && (logit = (logit + 1 ) % 10) == 0 ) { //log error
This always logs the first error, then every tenth subsequent error. Logical operators "short-circuit", so logit only gets incremented on an actual error.
If you want the first and every tenth error across all connections, make logit static instead of an instance member.
As has been noted, this should be made thread-safe:
private synchronized int getLogit() {
return (logit = (logit + 1 ) % 10);
}
In the try block, make the test:
if( !ok && getLogit() == 0 ) { //log error
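An alternative to the synchronized getter, sketched with java.util.concurrent.atomic.AtomicInteger (the class EveryTenthLogger is hypothetical). updateAndGet applies the same (logit + 1) % 10 update atomically, retrying on contention, so no lock is needed. Because of && short-circuiting, the counter still only advances on an actual error, and starting at -1 means the first error is logged.

```java
import java.util.concurrent.atomic.AtomicInteger;

class EveryTenthLogger {
    // Starts at -1 so that the very first error is logged
    private final AtomicInteger logit = new AtomicInteger(-1);

    // True for the 1st, 11th, 21st, ... error; never true when ok
    boolean shouldLog(boolean ok) {
        return !ok && logit.updateAndGet(x -> (x + 1) % 10) == 0;
    }
}
```

Calling shouldLog(false) twenty times in a row returns true exactly twice.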
Note: I don't think throwing out 90% of the errors is a good idea.
I have seen this kind of thing before.
There was a piece of code that could answer certain 'questions' that came from another 'black box' piece of code. In the case it could not answer them, it would forward them to another piece of 'black box' code that was really slow.
So sometimes previously unseen new 'questions' would show up, and they would show up in a batch, like 100 of them in a row.
The programmer was happy with how the program was working, but he wanted some way of maybe improving the software in the future, if possible new questions were discovered.
So, the solution was to log unknown questions, but as it turned out, there were thousands of different ones. The logs got too big, and there was no benefit in speeding these up, since they had no obvious answers. But every once in a while, a batch of questions would show up that could be answered.
Since the logs were getting too big, and the logging was getting in the way of logging the real important things he got to this solution:
Only log a random 5%, this will clean up the logs, whilst in the long run still showing what questions/answers could be added.
So, if an unknown event occurred, in a random amount of these cases, it would be logged.
I think this is similar to what you are seeing here.
I did not like this way of working, so I removed this piece of code and just logged these messages to a different file, so they were all present, but not clobbering the general log file.
I recently profiled some code using JVisualVM, and found that one particular method was taking up a lot of execution time, both from being called often and from having a slow execution time. The method is made up of a large block of if statements, like so: (in the actual method there are about 30 of these)
EcState c = candidate;
if (waypoints.size() > 0)
{
EcState state = defaultDestination();
for (EcState s : waypoints)
{
state.union(s);
}
state.union(this);
return state.isSatisfied(candidate);
}
if (c.var1 < var1)
return false;
if (c.var2 < var2)
return false;
if (c.var3 < var3)
return false;
if (c.var4 < var4)
return false;
if ((!c.var5) & var5)
return false;
if ((!c.var6) & var6)
return false;
if ((!c.var7) & var7)
return false;
if ((!c.var8) & var8)
return false;
if ((!c.var9) & var9)
return false;
return true;
Is there a better way to write these if statements, or should I look elsewhere to improve efficiency?
EDIT: The program uses evolutionary science to develop paths to a given outcome. Specifically, build orders for Starcraft II. This method checks to see if a particular evolution satisfies the conditions of the given outcome.
First, you are using & instead of &&, so you're not taking advantage of short-circuit evaluation. That is, the & operator requires that both sides of the & be evaluated. If you genuinely want a bitwise AND, then this wouldn't apply, but if not, see below.
Since you return true only when none of the conditions is met, you could rewrite it like this (I changed & to &&):
return
!(c.var1 < var1 ||
c.var2 < var2 ||
c.var3 < var3 ||
c.var4 < var4 ||
((!c.var5) && var5) ||
((!c.var6) && var6) ||
((!c.var7) && var7) ||
((!c.var8) && var8) ||
((!c.var9) && var9));
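A small sketch of what short-circuiting saves, using a hypothetical counter to record when the right-hand side is evaluated (the class ShortCircuitDemo and its methods are illustrative only). In the question's code both operands are cheap field reads, so the gain there is likely tiny; it matters more when the right-hand side is costly.

```java
class ShortCircuitDemo {
    static int evaluations = 0;

    // Stand-in for an operand that costs something to evaluate
    static boolean expensive(boolean value) {
        evaluations++;
        return value;
    }

    static boolean nonShortCircuit(boolean left) {
        return left & expensive(true);   // right side is always evaluated
    }

    static boolean shortCircuit(boolean left) {
        return left && expensive(true);  // right side is skipped when left is false
    }
}
```

With left == false, nonShortCircuit still calls expensive once, while shortCircuit never calls it.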
Second, you want to move the conditions most likely to be true (that is, most likely to make the method return false) to the front of the expression chain; that way, short-circuiting saves evaluating the remaining expressions. For example, if c.var4 < var4 is true 99% of the time, you could move that to the front.
Short of that, it seems a bit odd that you'd be getting a significant amount of time spent in this method unless these conditions hit a database or something like that.
First, try rewriting the sequence of if statements into one statement (per dcp's answer).
If that doesn't make much difference, then the bottleneck might be the waypoints code. Some possibilities are:
You are using some collection type for which waypoints.size() is expensive.
waypoints.size() is a large number
defaultDestination() is expensive
state.union(...) is expensive
state.isSatisfied(...) is expensive
One quick-and-dirty way to investigate this is to move all of that code into a separate method and see if the profiler tells you it is a bottleneck.
If that's not the problem then your problem is intractable, and the only way around it would be to find some clever way to avoid having to do so many tests.
Rearranging the test order might help, if there is an order that is likely to return false more quickly.
If there is a significant chance that this and c are the same object, then an initial test of this == c may help.
If all of your EcState objects are compared repeatedly and they are immutable, then you could potentially implement hashCode to cache its return value, and use hashCode to speed up the equality testing. (This is a long shot ... lots of things have to be "right" for this to help.)
Maybe you could use hashCode equality as a proxy for equality ...
As always, the best thing to do is measure it yourself. You can instrument this code with calls to System.nanoTime() to get very fine-grained durations. Get the starting time, and then compute how long various big chunks of your method actually take. Take the slowest chunk and put more nanoTime() calls in it. Let us know what you find, too; that will be helpful to other folks reading your question.
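Since the method is called very often, a single measurement is noisy; accumulating totals per chunk tends to be more informative. Here is a minimal sketch (the class ChunkTimer is hypothetical, and it is not thread-safe). Keep in mind that early calls include JIT warm-up and that the wrapper itself adds a little overhead, so treat the numbers as rough.

```java
class ChunkTimer {
    private long totalNanos = 0;
    private long calls = 0;

    // Times one execution of the chunk and adds it to the running total
    void time(Runnable chunk) {
        long start = System.nanoTime();
        chunk.run();
        totalNanos += System.nanoTime() - start;
        calls++;
    }

    long totalNanos() { return totalNanos; }

    long calls() { return calls; }

    double averageNanos() {
        return calls == 0 ? 0.0 : (double) totalNanos / calls;
    }
}
```

For example, a hypothetical unionTimer could wrap just the union loop: unionTimer.time(() -> { for (EcState s : waypoints) state.union(s); }); and its total could then be compared with the timers around defaultDestination() and isSatisfied().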
So here's my seat of the pants guess ...
Optimizing the if statements will have nearly no measurable effect: these comparisons are all quite fast.
So let's assume the problem is in here:
if (waypoints.size() > 0)
{
EcState state = defaultDestination();
for (EcState s : waypoints)
{
state.union(s);
}
state.union(this);
return state.isSatisfied(candidate);
}
I'm guessing waypoints is a List and that you haven't overridden the size() method. In this case, List.size() is just accessing an instance variable. So don't worry about the if statement.
The for statement iterates over your List's elements quite quickly, so the for itself isn't the problem, though the code it executes could well be. Assignments and returns take negligible time.
This leaves the following potential hot spots:
The one call to defaultDestination().
All the calls to EcState.union().
The one call to EcState.isSatisfied().
I'd be willing to bet your hotspot is in union(), especially since it's building up some sort of larger and larger collection of waypoints.
Measure with nanoTime() first, though.
You aren't going to find too many ways to actually speed that up. The two main ones would be taking advantage of short-circuit evaluation, as has already been said, by switching & to &&, and also making sure that the order of the conditions is efficient. For example, if there's one condition that throws away 90% of the possibilities, put that one condition first in the method.