I have the most curious index problem I could imagine. I have the following innocent-looking code:
int lastIndex = givenOrder.size() - 1;
if (lastIndex >= 0 && givenOrder.get(lastIndex).equals(otherOrder)) {
    givenOrder.remove(lastIndex);
}
Looks like a proper pre-check to me. (The list here is declared as List, so there is no direct access to the last element, but that is immaterial for the question anyway.) I get the following stack trace:
java.lang.IndexOutOfBoundsException: Index: 0, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:604) ~[na:1.7.0_17]
at java.util.ArrayList.remove(ArrayList.java:445) ~[na:1.7.0_17]
at my.code.Here:48) ~[Here.class:na]
At runtime, it’s a simple ArrayList. Now, index 0 should be quite inside the bounds!
Edit
Many people have suggested that introducing synchronization would solve the problem. I do not doubt it. But the core of my (admittedly unexpressed) question is different: How is that behaviour even possible?
Let me elaborate. We have 5 steps here:
1. I check the size and compute lastIndex (size is 1 here)
2. I even access that last element
3. I request removal
4. ArrayList checks the bounds, finding them inadequate
5. ArrayList constructs the exception message and throws
Strictly speaking, granularity could be even finer. Now, 50,000 times it works as expected, no concurrency issues. (Frankly, I haven’t even found any other place where that list could be modified, but the code is too large to rule that out.)
Then, one time, it breaks. That's normal for concurrency issues. However, it breaks in an entirely unexpected way. Somewhere after step 2 and before step 4, the list is emptied. I would expect an exception saying IndexOutOfBoundsException: Index: 0, Size: 0, which is bad enough. But I have never seen an exception like that in the last few months!
Instead, I see IndexOutOfBoundsException: Index: 0, Size: 1, which means that after step 4 but before step 5 the list gains one element. While this is possible, it seems about as unlikely as the phenomenon above. Yet it happens every time the error occurs! As a mathematician, I say that this is just very improbable. But my common sense tells me that there is another issue.
Moreover, looking at the code in ArrayList, you see very short methods there that are run hundreds of times, and no volatile variable anywhere. That means I would very much expect the HotSpot compiler to have inlined the calls, making the critical section much smaller, and to have collapsed the double access to the size variable into a single read, making the observed behaviour impossible. Clearly, this isn't happening.
So, my question is why this can happen at all and why it happens in this weird way. Suggesting synchronization is not an answer to the question (it may be a solution to the problem, but that is a different matter).
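To make the race concrete, here is a minimal, hypothetical stress test of the same pattern (not from the original code base). Being a data race, it may take many iterations to fire, and since unsynchronized ArrayList access has no guaranteed failure mode, it can also fail in other ways (get may throw first, or equals may hit a null):

import java.util.ArrayList;
import java.util.List;

public class MisleadingSizeRace {
    public static void main(String[] args) {
        final List<String> givenOrder = new ArrayList<String>();
        final String otherOrder = "otherOrder";
        new Thread(new Runnable() {
            public void run() {
                while (true) {               // mutator: grow and empty the list forever
                    givenOrder.add(otherOrder);
                    givenOrder.clear();
                }
            }
        }).start();
        while (true) {                       // the check-then-remove pattern from the question
            int lastIndex = givenOrder.size() - 1;
            if (lastIndex >= 0 && givenOrder.get(lastIndex).equals(otherOrder)) {
                givenOrder.remove(lastIndex); // eventually throws, sometimes with a "Size" bigger than the index
            }
        }
    }
}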
So I have checked the source code of the ArrayList implementation, specifically rangeCheck, the method that throws the exception, and this is what I found:
private void rangeCheck(int paramInt) // paramInt is the given index
{
    if (paramInt < this.size) // compare the index with the list size
        return;
    throw new IndexOutOfBoundsException(outOfBoundsMsg(paramInt)); // here we have the exception
}
and relevant outOfBoundsMsg method
private String outOfBoundsMsg(int paramInt)
{
    return "Index: " + paramInt + ", Size: " + this.size; // OOOoo we are reading size again!
}
So as you can see, the size of the list (this.size) is read twice. It is read the first time to check the condition; when the condition is not fulfilled, the message for the exception is built. While the message is being built, only paramInt is carried over from the check, and the size of the list is read a second time. And here we have our culprit.
In reality you would expect the message Index: 0, Size: 0, but the size value used for the check is not stored in a local variable (a micro-optimization). Between these two reads of this.size, the list was changed by another thread.
That is why the message is misleading.
Conclusion:
Such a situation is possible in a highly concurrent environment and can be very hard to reproduce. To solve the problem, use a synchronized version of the list (as #JordiCastillia suggested). This solution can have an impact on performance, as every operation (add/remove and probably get) will be synchronized. Another option is to put your code into a synchronized block, but that only synchronizes the calls in this piece of code; the problem can still occur in the future, as different parts of the system can still access the whole object asynchronously.
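A minimal sketch of that second option, assuming every other piece of code that mutates givenOrder synchronizes on the same monitor:

synchronized (givenOrder) {
    // the whole check-then-act sequence is now atomic
    int lastIndex = givenOrder.size() - 1;
    if (lastIndex >= 0 && givenOrder.get(lastIndex).equals(otherOrder)) {
        givenOrder.remove(lastIndex);
    }
}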
This is most likely a concurrency issue.
The size somehow gets modified between your check and your access of the index.
Use Collections.synchronizedList().
Tested in a simple main, it works:
List<String> givenOrder = new ArrayList<>();
String otherOrder = "null";
givenOrder.add(otherOrder);

int lastIndex = givenOrder.size() - 1;
if (lastIndex >= 0 && givenOrder.get(lastIndex).equals(otherOrder)) {
    System.out.println("remove");
    givenOrder.remove(lastIndex);
}
Are you running in a multi-threaded environment? Your List is being modified by some other thread or process.
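One caveat worth noting: Collections.synchronizedList() makes each individual call atomic, but the size()/get()/remove() sequence above is still a compound check-then-act, so you must hold the list's lock around the whole thing. A sketch (the wrapper returned by java.util.Collections locks on itself, so synchronizing on the list guards the compound action):

List<String> givenOrder = Collections.synchronizedList(new ArrayList<String>());
// ...
synchronized (givenOrder) { // required for the compound check-then-act
    int lastIndex = givenOrder.size() - 1;
    if (lastIndex >= 0 && givenOrder.get(lastIndex).equals(otherOrder)) {
        givenOrder.remove(lastIndex);
    }
}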
Related
Suppose the following situation in a loop:
LinkedList<String> myList = someMethodReturnsList();
int start = 0, end = 0;
while (end < myList.size() && someOtherCondition)
    ++end;
List<String> subList = myList.subList(start, end);
// ... (do stuff and possibly alter the list)
start = end;
I'm having a situation where only sometimes, that call to subList will throw an IndexOutOfBoundsException. Given my first test of end < myList.size(), this troubled me, so I wrote some debug code. My debug code told me that somewhere between that while loop and calling subList, my end value ended up being 34 while myList.size() was returning 33.
How is that even possible? There are no other threads that can be operating on this list in my program, so how did my loop check pass and increment end to 34?
EDIT: this consistently happens at a particular point in my code's execution, so it's not a fluke error, but it doesn't happen with every input that this function has, which makes this even stranger.
What you are describing is impossible.
Actually, the only way for this to happen is your own debugging: There are IDE features like code elements inspection that execute parts of the code while the application is paused on a breakpoint.
You have to be changing the value of end with the debugger tools.
My guess is along the lines of what #Jalitha said, off by one: the subList call should be:
List<String> subList = myList.subList(start, end - 1);
The reason that you are sometimes getting the correct number probably has to do with your someOtherCondition.
Remember that size() is 1 more than the last index and subList works on indexes.
I would have put this in a comment but unable due to reputation :(
Aha! I've figured it out through careful debugging.
Turns out, on the loop iteration where this exception occurs, the end value already starts at 34. The while condition therefore fails immediately, end is never corrected, and the subList call throws.
The reason is that in the previous iteration, myList.size() was 36, but over the course of that loop, 3 elements were removed.
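A hypothetical fix along those lines is to re-clamp the carried-over indices at the top of each iteration, since the list may have shrunk since the previous pass:

start = Math.min(start, myList.size());
end = start;
while (end < myList.size() && someOtherCondition)
    ++end;
List<String> subList = myList.subList(start, end);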
Looks to me like a classic 'off-by-one' array error. You're likely having end equal to myList.size(), and then referencing one more ahead than you mean to.
From your while loop for e.g. if array had 5 elements:
while (end < 5 && conditions) {
    ++end; // results in end being 5
}
then later on it goes
List<String> subList = myList.subList(0, 5);
and 5 is outside the bounds as the array only has indices 0,1,2,3,4
I think you should either make it while (end + 1 < 5 && conditions) or use myList.subList(0, end - 1);
EDIT: Props to #Justen, for pointing out my mistake.
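(For reference, the mistake concerns subList's bounds semantics: toIndex is exclusive and may equal size(). A quick self-contained check:)

import java.util.Arrays;
import java.util.List;

public class SubListBounds {
    public static void main(String[] args) {
        List<String> five = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(five.subList(0, 5)); // fine: prints [a, b, c, d, e]
        System.out.println(five.subList(0, 6)); // throws IndexOutOfBoundsException
    }
}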
This is a strange case that recently came up while profiling a specialised collection I've been working on.
The collection is pretty much just two arrays, one an int[] array of keys, and one an Object[] array of values, with a hash function providing rapid lookup. It's all working nicely, but I've come to profiling the code and am getting some weird results; for profiling I've decided to do it the old fashioned way, by grabbing System.currentTimeMillis(), running a test over and over and then checking how much time has elapsed, like so:
long sTime = System.currentTimeMillis();
for (int index : indices)
    foo.remove(index);
long took = System.currentTimeMillis() - sTime;
In my test I have foo prepared with 200,000 entries and a pre-generated list of the indices that I will remove. I reset and run the test in a loop for a thousand repetitions and add took to a running total.
Now, for most operations I get extremely good results compared to other data types, except with my remove(int) method. However, I've been struggling to figure out why, as my removal method is identical to my get(int) method (other than the removal, obviously), as shown:
public Object get(int key) {
    int i = getIndex(key); // hashes key and locates it
    return (i >= 0) ? this.values[i] : null;
}

public Object remove(int key) {
    int i = getIndex(key); // does exactly the same as above
    if (i >= 0) {
        --this.size;
        ++this.modifications; // for concurrent access behaviour
        this.keys[i] = 0;     // zero indicates a null entry
        Object old = this.values[i];
        this.values[i] = null;
        return old;
    }
    return null;
}
While I would expect the removal to take a bit longer, they're taking more than 5 times as long to execute as get(int). However, if I comment out the line this.keys[i] = 0 then performance becomes nearly identical to get(int).
Am I correct in observing that this is an issue with assigning a value to my int[] array? I've tried commenting out all the this.values operations and experienced the same slow times, but leaving this.values in place while commenting out this.keys[i] = 0 consistently solves the problem. I'm at a total loss as to what's going on; is there anything to be done about it?
The performance is still good considering that removals are relatively rare, but it seems strange that setting a value in an int[] is seemingly having such a big impact, so I'm curious to know why.
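As an aside on the harness itself: System.nanoTime() is generally preferred over currentTimeMillis() for measuring elapsed time, since the millisecond clock can have coarse granularity on some platforms. A sketch of the same harness:

long sTime = System.nanoTime();
for (int index : indices)
    foo.remove(index);
long tookMs = (System.nanoTime() - sTime) / 1000000; // convert nanoseconds to milliseconds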
The code as written doesn't run concurrently. If there's other concurrency code not shown, that could well be the source of the timing differences. Other than that, the most likely cause is that merely accessing the keys[] array in addition to the values[] array changes the memory access patterns: for instance, shifting work from registers to memory locations, or from L1 cache to L2 cache, L3 cache, or main memory. 'False sharing' is an example of such a degradation pattern. 'Mechanical sympathy' is a name used for tuning code to current hardware architectures.
I saw the following code in this commit for MongoDB's Java Connection driver, and it appears at first to be a joke of some sort. What does the following code do?
if (!((_ok) ? true : (Math.random() > 0.1))) {
    return res;
}
(EDIT: the code has been updated since posting this question)
After inspecting the history of that line, my main conclusion is that there has been some incompetent programming at work.
That line is gratuitously convoluted. The general form
a? true : b
for boolean a, b is equivalent to the simple
a || b
The surrounding negation and excessive parentheses convolute things further. Keeping in mind De Morgan's laws it is a trivial observation that this piece of code amounts to
if (!_ok && Math.random() <= 0.1)
    return res;
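A quick, self-contained sanity check of that transformation over all four boolean cases (here b stands for Math.random() > 0.1, so !b is Math.random() <= 0.1; run with assertions enabled, java -ea):

public class DeMorganCheck {
    public static void main(String[] args) {
        for (boolean a : new boolean[] { false, true })
            for (boolean b : new boolean[] { false, true })
                assert (!(a ? true : b)) == (!a && !b); // a ? true : b  is  a || b
    }
}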
The commit that originally introduced this logic had
if (_ok == true) {
    _logger.log(Level.WARNING, "Server seen down: " + _addr, e);
} else if (Math.random() < 0.1) {
    _logger.log(Level.WARNING, "Server seen down: " + _addr);
}
—another example of incompetent coding, but notice the reversed logic: here the event is logged either if _ok is true, or in 10% of the remaining cases, whereas the later code returns 10% of the time and logs 90% of the time. So the later commit ruined not only clarity, but correctness itself.
I think that in the code you have posted we can actually see how the author intended to transform the original if-then more or less literally into its negation, as required for the early-return condition. But then he messed up and introduced an effective double negative by reversing the inequality sign.
Coding style issues aside, stochastic logging is quite a dubious practice all by itself, especially since the log entry does not document its own peculiar behavior. The intention is, obviously, to reduce restatements of the same fact: that the server is currently down. The appropriate solution is to log only changes of the server state, rather than every observation of it, let alone a random 10% sample of those observations. Yes, that takes just a little more effort, so let's see some.
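A minimal sketch of that idea (the field names are hypothetical; an AtomicBoolean keeps the transition check safe across threads):

import java.util.concurrent.atomic.AtomicBoolean;

private final AtomicBoolean seenDown = new AtomicBoolean(false);

private void recordObservation(boolean ok) {
    // log only state transitions, not every observation
    if (!ok && seenDown.compareAndSet(false, true)) {
        _logger.log(Level.WARNING, "Server seen down: " + _addr);
    } else if (ok && seenDown.compareAndSet(true, false)) {
        _logger.log(Level.INFO, "Server back up: " + _addr);
    }
}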
I can only hope that all this evidence of incompetence, accumulated from inspecting just three lines of code, does not speak fairly of the project as a whole, and that this piece of work will be cleaned up ASAP.
https://github.com/mongodb/mongo-java-driver/commit/d51b3648a8e1bf1a7b7886b7ceb343064c9e2225#commitcomment-3315694
11 hours ago by gareth-rees:
Presumably the idea is to log only about 1/10 of the server failures (and so avoid massively spamming the log), without incurring the cost of maintaining a counter or timer. (But surely maintaining a timer would be affordable?)
Add a class member initialized to negative 1:
private int logit = -1;
In the try block, make the test:
if (!ok && (logit = (logit + 1) % 10) == 0) { // log error
This always logs the first error, then every tenth subsequent error. Logical operators "short-circuit", so logit only gets incremented on an actual error.
If you want the first and every tenth of all errors, regardless of the connection, make logit static instead of an instance member.
As has been noted, this should be thread-safe:
private synchronized int getLogit() {
    return (logit = (logit + 1) % 10);
}
In the try block, make the test:
if (!ok && getLogit() == 0) { // log error
Note: I don't think throwing out 90% of the errors is a good idea.
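As a further note, a lock-free alternative to the synchronized getter above is an atomic counter (a sketch, assuming Java 8+ for updateAndGet):

private final java.util.concurrent.atomic.AtomicInteger logit =
        new java.util.concurrent.atomic.AtomicInteger(-1);

// In the try block, make the test:
if (!ok && logit.updateAndGet(i -> (i + 1) % 10) == 0) { // log error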
I have seen this kind of thing before.
There was a piece of code that could answer certain 'questions' coming from another 'black box' piece of code. When it could not answer them, it would forward them to yet another piece of 'black box' code that was really slow.
So sometimes previously unseen new 'questions' would show up, and they would show up in a batch, like 100 of them in a row.
The programmer was happy with how the program was working, but he wanted some way of maybe improving the software in the future, if possible new questions were discovered.
So, the solution was to log unknown questions, but as it turned out, there were thousands of different ones. The logs got too big, and there was no benefit in speeding these up, since they had no obvious answers. But every once in a while, a batch of questions would show up that could be answered.
Since the logs were getting too big, and the logging was getting in the way of logging the really important things, he arrived at this solution:
Only log a random 5%, this will clean up the logs, whilst in the long run still showing what questions/answers could be added.
So, if an unknown event occurred, in a random amount of these cases, it would be logged.
I think this is similar to what you are seeing here.
I did not like this way of working, so I removed this piece of code and logged these messages to a different file instead, so they were all present but not clobbering the general log file.
I'm trying to solve a problem that calls for recursive backtracking, and my solution produces a StackOverflowError. I understand that this error often indicates a bad termination condition, but my termination condition appears correct. Is there anything other than a bad termination condition that would be likely to cause a StackOverflowError? How can I figure out what the problem is?
EDIT: sorry, I tried to post the code but it's too ugly..
As #irreputable says, even if your code has a correct termination condition, it could be that the problem is simply too big for the stack (so that the stack is exhausted before the condition is reached). There is also a third possibility: that your recursion has entered into a loop. For example, in a depth-first search through a graph, if you forget to mark nodes as visited, you'll end up going in circles, revisiting nodes that you have already seen.
How can you determine which of these three situations you are in? Try to make a way to describe the "location" of each recursive call (this will typically involve the function parameters). For instance, if you are writing a graph algorithm where a function calls itself on neighbouring nodes, then the node name or node index is a good description of where the recursive function is. In the top of the recursive function, you can print the description, and then you'll see what the function does, and perhaps you can tell whether it does the right thing or not, or whether it goes in circles. You can also store the descriptions in a HashMap in order to detect whether you have entered a circle.
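A minimal sketch of that visited-set idea for a graph walk (neighbours() is a hypothetical accessor for whatever graph representation is at hand):

import java.util.HashSet;
import java.util.Set;

private final Set<Integer> visited = new HashSet<Integer>();

void dfs(int node) {
    if (!visited.add(node)) {               // already seen: we are going in circles
        System.out.println("revisiting " + node);
        return;
    }
    System.out.println("visiting " + node); // the "location" of this recursive call
    for (int neighbour : neighbours(node)) {
        dfs(neighbour);
    }
}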
Instead of using recursion, you could always have a loop which uses a stack. E.g. instead of (pseudo-code):
function sum(n){
    if n == 0, return 0
    return n + sum(n-1)
}
Use:
function sum(n){
    Stack stack
    while(n > 0){
        stack.push(n)
        n--
    }
    localSum = 0
    while(stack not empty){
        localSum += stack.pop()
    }
    return localSum
}
In a nutshell, simulate recursion by saving the state in a local stack.
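In Java, the same sketch might use an ArrayDeque, the usual stack replacement:

import java.util.ArrayDeque;
import java.util.Deque;

static long sum(int n) {
    Deque<Integer> stack = new ArrayDeque<Integer>();
    while (n > 0) {
        stack.push(n); // save the state we would otherwise keep on the call stack
        n--;
    }
    long localSum = 0;
    while (!stack.isEmpty()) {
        localSum += stack.pop();
    }
    return localSum;
}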
You can use the -Xss option (for example, java -Xss8m gives each thread an 8 MB stack) to give your stack more memory if your problem is too large to fit within the default stack size limit.
As the other fellas already mentioned, there might be a few reasons for that:
Your code has a problem in its nature or in the logic of the recursion. Every recursive function needs a stopping condition, base case, or termination point.
Your memory is too small to hold the number of recursive calls on the stack. Big Fibonacci numbers might be a good example here. Just FYI, the Fibonacci sequence is as follows (sometimes it starts at zero):
1, 1, 2, 3, 5, 8, 13, ...
F(n) = F(n-1) + F(n-2) for n >= 2, with F(0) = 1, F(1) = 1
If your code is correct, then the stack is simply too small for your problem. We don't have real Turing machines.
There are two common coding errors that could cause your program to get into an infinite loop (and therefore cause a stack overflow):
Bad termination condition
Bad recursion call
Example:
public static int factorial(int n) {
    if (n < n)                       // bad termination condition: never true, so the recursion never stops
        return 1;
    else
        return n * factorial(n + 1); // bad recursive call: moves away from any base case
}
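For contrast, a corrected version terminates and recurses toward the base case:

public static int factorial(int n) {
    if (n <= 1)                      // good termination condition
        return 1;
    else
        return n * factorial(n - 1); // good recursive call: approaches the base case
}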
Otherwise, your program could just be functioning properly and the stack is too small.
I have encountered a somewhat baffling problem with the simple task of filling an Array dynamically in Java. The following is a snapshot from where the problem originates:
entries = new Object[ (n = _entries.length + 1) ];
for (i = 0; i < n; i++) {
    entry = (i == (n - 1)) ? addition : _entries[i];
    entries[i] = entry;
    // ...
}
Where _entries is a source Array (field of the class); entries is initialized as an Array of Objects
Object[] entries = null ;
and addition is the Object to be added (passed as an Argument to the method this code is in).
The code passes the compiler but results in a memory-leak when called. I was able to narrow down the cause to the line where the code attempts to fill the new Array
entries[i] = entry ;
however, I cannot think of any reason why this would cause a memory-leak. I'm guessing the root of the issue must be either an extremely stupid fault on my part or an extremely arcane problem with Java. :-)
If you need more background let me know.
Edit:
Tomcat's log tells me:
A web application appears to have started a thread named ... but has failed to stop it.
This is very likely to create a memory leak.
Other than that obviously the page loading the class does not finish loading or loads very slowly.
Edit:
The problem might be somewhere else (at a more expected location) after all. Apparently Tomcat wasn't reloading the class files every time I tried to pin down the faulty code, and this misled me a bit. I now suspect an infinite for-each loop caused by a defective Iterator implementation further up the call stack to be at fault.
In any case, thanks for your input! Always much appreciated!
I will use a Collection (probably a Vector) instead of an Array as a work-around; still, I'd like to know what the problem here is.
TIA,
FK82
So, about your Tomcat log message:
A web application appears to have started a thread named ... but has failed to stop it. This is very likely to create a memory leak.
This says that your servlet or something similar started a new thread, and this thread is still running when your servlet finished its operation. It doesn't relate at all to your example code (if this code isn't the one starting the thread).
Superfluous threads, especially when each HTTP request starts a new one (which does not finish soon), can create a memory leak, since each thread needs quite some space for its stack and may also inhibit garbage collection by referencing objects that are no longer needed. Make sure that your thread is really needed, and think about using a thread pool instead (preferably container-managed, if possible).
I cannot see a memory leak, but your code is more complicated than it needs to be. How about this:
int newLength = _entries.length + 1;
entries = new Object[newLength];
for (int i = 0; i < newLength - 1; i++) {
    entries[i] = _entries[i];
    // ...
}
entries[newLength - 1] = addition;
No need to check whether you are at the last entry every time around the loop, and you could use an array copy method as suggested by Alison (see the sketch below).
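For instance, java.util.Arrays.copyOf does the copy and the resize in one call; a sketch using the question's field names:

Object[] entries = Arrays.copyOf(_entries, _entries.length + 1);
entries[_entries.length] = addition; // the extra slot at the end holds the new element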
Think of this post as a comment. I just posted it as an answer because I don't know how code is formatted in comments...
It is working for me; please find the sample code below and change it accordingly:
class test {
    public static void main(String[] args) {
        String[] str = new String[] { "1", "2", "3", "4", "5", "6" };
        int n = 0;
        Object[] entries = new Object[ (n = 5 + 1) ];
        for (int i = 0; i < n; i++) {
            Object entry = (i == (n - 1)) ? new Object() : str[i];
            entries[i] = entry;
        }
        System.out.println(entries[3]);
    }
}
Perhaps by memory leak you mean an OutOfMemoryError? Sometimes you get that in Java if you do not have the minimum heap size set high enough (and a well-defined max heap size too) when you start up. If there is not enough heap created at startup, then you can sometimes use it up faster than the JVM has time to allocate more memory to the heap or to garbage-collect. Unfortunately, there is no "right answer" here. You just have to play with different settings to get the right result (this is known as "tuning the JVM"). In other words, this is more of an art than a science.
And in case you didn't know, you pass the arguments to the JVM on the command line when firing up your program; -Xms250m -Xmx1024m is an example (the m suffix means megabytes). The first sets the initial heap (at startup) to 250 megabytes; the second caps the heap at 1024 megabytes, i.e. one gigabyte.
Just another thought to go by as I too am puzzled by how you could trace a memory leak to one line of code.