The Java code is as follows:
Random r = new Random(1234697890);
HashMap<Integer, List<Integer>> map = new HashMap<Integer, List<Integer>>();
List<Integer> list = new ArrayList<Integer>();
for (int i = 0; i < 100000; i++) {
    for (int j = 0; j < 1000; j++) {
        list.add(r.nextInt(100000));
    }
    map.put(i, list);
    map.remove(i);
}
When i reaches 37553, java.lang.OutOfMemoryError: Java heap space is thrown.
It seems that garbage collection never happens inside the loop.
How can I fix this?
Try rewriting the code as follows and you should not get OOMEs:
Random r = new Random(1234697890);
HashMap<Integer, List<Integer>> map = new HashMap<Integer, List<Integer>>();
for (int i = 0; i < 100000; i++) {
    List<Integer> list = new ArrayList<Integer>();
    for (int j = 0; j < 1000; j++) {
        list.add(r.nextInt(100000));
    }
    map.put(i, list);
    map.remove(i);
}
The problem with your original code is that:
you only create one list,
you keep adding more and more elements to it, and
that list only becomes garbage when the code completes ... because it is "in scope" the whole time.
Moving the list declaration inside the loop means that a new ArrayList is created and filled in each loop iteration, and becomes garbage when you start the next iteration.
Someone suggested calling System.gc(). It won't help at all in your case because there is minimal¹ garbage to be collected. And in general it is a bad idea because:
the GC is guaranteed to have run immediately before an OOME is thrown,
the JVM can figure out better than you can when is the best (i.e. most efficient) time to run the GC,
your call to System.gc() may be totally ignored anyway. The JVM can be configured so that calls to System.gc() are ignored.
¹ - The pedant in me would like to point out that map.put(i, list); map.remove(i); is most likely generating an Integer object that most likely becomes garbage. However, this is "chicken feed" compared to your indefinitely growing ArrayList object.
You use the same List all the time, which contains 100000 * 1000 items when the loop exits. To enable GC to get rid of your list, you need to reduce its scope to within the for(i) loop.
In other words, both map and list are reachable at all time in that piece of code and are therefore not eligible for collection.
In your case you keep filling the same list (even though you remove it from that HashMap, it still exists as a local variable).
The JVM promises to do a full garbage collect before throwing an OutOfMemoryError. So you can be sure there is nothing left to clean up.
Your code
List<Integer> list = new ArrayList<Integer>();
for (int i = 0; i < 100000; i++) {
    for (int j = 0; j < 1000; j++) {
        list.add(r.nextInt(100000));
    }
    map.put(i, list);
    map.remove(i);
}
is the same as
List<Integer> list = new ArrayList<Integer>();
for (int i = 0; i < 100000 * 1000; i++) {
    list.add(r.nextInt(100000));
}
As you can see, it's the List, not the map, which is retaining all the Integers. BTW, try this instead and see what happens ;)
list.add(r.nextInt(128));
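The hint above leans on the Integer autobox cache: by default, Integer.valueOf caches the values from -128 to 127, so boxing r.nextInt(128) hands back shared objects instead of allocating new ones (the list's internal reference array still grows, though). A minimal sketch of the cache behaviour:

```java
public class IntegerCacheDemo {
    public static void main(String[] args) {
        // Values in -128..127 come from a shared cache, so == compares equal
        Integer a = Integer.valueOf(100);
        Integer b = Integer.valueOf(100);
        System.out.println(a == b);   // true: same cached object

        // Values outside the cache range are freshly allocated
        Integer c = Integer.valueOf(1000);
        Integer d = Integer.valueOf(1000);
        System.out.println(c == d);   // false: two distinct objects
    }
}
```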
There are two test cases which use parallelStream():
List<Integer> src = new ArrayList<>();
for (int i = 0; i < 20000; i++) {
    src.add(i);
}
List<String> strings = new ArrayList<>();
src.parallelStream().filter(integer -> (integer % 2) == 0).forEach(integer -> strings.add(integer + ""));
System.out.println("=size=>" + strings.size());
=size=>9332
List<Integer> src = new ArrayList<>();
for (int i = 0; i < 20000; i++) {
    src.add(i);
}
List<String> strings = new ArrayList<>();
src.parallelStream().forEach(integer -> strings.add(integer + ""));
System.out.println("=size=>" + strings.size());
=size=>17908
Why do I always lose data when using parallelStream?
What did I do wrong?
ArrayList isn't thread safe. You need to do
List<String> strings = Collections.synchronizedList(new ArrayList<>());
or
List<String> strings = new Vector<>();
to ensure all updates are synchronized, or switch to
List<String> strings = src.parallelStream()
.filter(integer -> (integer % 2) == 0)
.map(integer -> integer + "")
.collect(Collectors.toList());
and leave the list building to the Streams framework. Note that it's undefined whether the list returned by collect is modifiable, so if that is a requirement, you may need to modify your approach.
In terms of performance, Stream.collect is likely to be much faster than using Stream.forEach to add to a synchronized collection, since the Streams framework can handle collection of values in each thread separately without synchronization and combine the results at the end in a thread safe fashion.
ArrayList isn't thread-safe. While one thread sees a list with 30 elements, another might still see 29 and overwrite the 30th position (losing one element).
Another issue can arise when the array backing the list needs to be resized: a new array (with double the size) is created and the elements of the original array are copied into it. Other threads may have added elements that the resizing thread has not seen, or several threads may resize at once and only one of them will win.
When using multiple threads you need to either synchronize access to the list or use a thread-safe list (by wrapping it with Collections.synchronizedList or by using a CopyOnWriteArrayList, to mention two possible solutions). Even better is to use the stream's collect method to put everything into a list.
ParallelStream with forEach is a deadly combo if not used carefully.
Please take a look at below points to avoid any bugs:
If you have a pre-existing list to which you want to add objects from a parallelStream loop, wrap it with Collections.synchronizedList before looping through the parallelStream.
If you have to create a new list, you can use a Vector initialized outside the loop.
or
If you have to create a new list, then simply use parallelStream and collect the output at the end.
You lose the benefits of using stream (and parallel stream) when you try to do mutation. As a general rule, avoid mutation when using streams. Venkat Subramaniam explains why. Instead, use collectors. Also try to get a lot accomplished within the stream chain. For example:
System.out.println(
    IntStream.range(0, 200000)
        .filter(i -> i % 2 == 0)
        .mapToObj(String::valueOf)
        .collect(Collectors.toList())
        .size()
);
You can run that as a parallel stream by adding .parallel()
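A sketch of the chain above with .parallel() added; because collect() gives each worker thread its own accumulator and merges the pieces afterwards, the count comes out right on every run, unlike forEach into a plain ArrayList:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelCollectDemo {
    public static void main(String[] args) {
        // Each worker collects into its own container; results are merged safely.
        List<String> result = IntStream.range(0, 200000)
                .parallel()
                .filter(i -> i % 2 == 0)
                .mapToObj(String::valueOf)
                .collect(Collectors.toList());
        System.out.println(result.size()); // 100000 on every run
    }
}
```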
When I print the List, the objects are different, but the HashMap values are all the same as the last one inserted into the List.
During creation (the loop), I printed the HashMap values and they were different. I use the map to generate a graph and make some calculations. However, once I add the objects to the list, all the values become the same; every time I create a new object, it changes the HashMap values of all the objects already in the list.
Here is my code: https://github.com/Willtl/2/tree/master/GAScheduling/src/main/java/uni/lu
In the Main.java, I am creating a new population:
Population pop = new Population(jobs, machines, genesPlate, populationSize);
In the Population constructor:
for (int i = 0; i < popsize; i++) {
    individuals.add(new Individual(i + 1, jobs, machines, genesPlate));
}
Inside each Individual I am shuffling each ArrayList of the HashMap genesPlate.
this.id = id;
this.genesPlate = plate;
this.jobs = jobs;
this.machines = machines;
ArrayList<Job> l1 = null;
// shuffle list of jobs of each machine
for (int i = 0; i < machines.length; i++) {
    l1 = genesPlate.get(machines[i].getId());
    long seed = System.nanoTime();
    Collections.shuffle(l1, new Random(seed));
    genesPlate.put(machines[i].getId(), l1);
}
System.out.println("Reshuffled: " + genesPlate);
computeFitness();
Up to this point everything is OK; I printed the lists and they are indeed shuffled. I use jobs and this genesPlate (random machine sequences) to generate a graph and compute the fitness of each individual.
However, when I commented this code out, nothing changed (the problem remains), so I think this part is OK.
I have been stuck on this for days and have tried everything. I hope someone can help me.
Java only has primitive and reference variables.
When you add a reference to a List into a Map, only the reference to the List is copied, not the List it references. If you only create one List and add its reference to a Map, there is still only one List object.
To confirm this, step through your code in your debugger and you will see that your Map most likely has all of its values pointing to the same List.
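A small sketch of that aliasing (hypothetical names, not from the asker's project): one List referenced from two map entries means a mutation through either reference is visible through both:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedReferenceDemo {
    public static void main(String[] args) {
        List<Integer> shared = new ArrayList<>();
        Map<Integer, List<Integer>> map = new HashMap<>();
        map.put(1, shared);              // both entries store a reference
        map.put(2, shared);              // to the SAME underlying List

        shared.add(42);                  // mutate through one reference...
        System.out.println(map.get(1)); // [42]
        System.out.println(map.get(2)); // [42] ...and every entry sees it
        System.out.println(map.get(1) == map.get(2)); // true
    }
}
```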
I am looping through a list A to find X. Then, if X has been found, it is stored into list B. After this, I want to delete X from list A. As speed is an important issue for my application, I want to delete X from A without looping through A. This should be possible as I already know the location of X in A (I found its position in the first line). How can I do this?
for (int i = 0; i < n; i++) {
    Object X = methodToGetObjectXFromA();
    B.add(X);
    A.remove(X); // But this part is time consuming, as I unnecessarily loop through A
}
Thanks!
Instead of returning the object from the method, you can return its index and then remove by index:
int idx = methodToGetObjectIndexFromA();
Object X = A.remove(idx); // no scan through A is needed to locate X
B.add(X);
However, note that the remove method may be still slow due to potential move of the array elements.
You can use an iterator, and if performance is an issue, it is better to use a LinkedList for the list you want to remove from:
public static void main(String[] args) {
    List<Integer> aList = new LinkedList<>();
    List<Integer> bList = new ArrayList<>();
    aList.add(1);
    aList.add(2);
    aList.add(3);
    int value;
    Iterator<Integer> iter = aList.iterator();
    while (iter.hasNext()) {
        value = iter.next().intValue();
        if (value == 3) {
            bList.add(value);
            iter.remove();
        }
    }
    System.out.println(aList.toString()); // [1, 2]
    System.out.println(bList.toString()); // [3]
}
If you stored all the objects to remove in a second collection, you may use ArrayList#removeAll(Collection)
Removes from this list all of its elements that are contained in the
specified collection.
Parameters:
c collection containing elements to be removed from this list
In this case, just do
A.removeAll(B);
when exiting your loop.
Addition
It calls ArrayList#batchRemove, which uses a loop internally to remove the objects, but you do not have to write that loop yourself.
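A minimal sketch of that pattern: B collects the found items during the search, and a single removeAll sweep at the end takes them out of A:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveAllDemo {
    public static void main(String[] args) {
        List<Integer> A = new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5));
        List<Integer> B = new ArrayList<>(Arrays.asList(2, 4)); // items found earlier

        A.removeAll(B); // one internal pass (batchRemove), not one scan per removal
        System.out.println(A); // [1, 3, 5]
    }
}
```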
I need to iterate over a collection of items and sometimes add to that collection at the same time. When I add during iteration, I restart the iteration from scratch by breaking out of the loop and starting again from the beginning. However, this leads to a ConcurrentModificationException. [code below]
List<Integer> collection = new ArrayList<>();
for (Integer lobId : collection) {
    ..
    if (someCondition) {
        collection.add(something);
        break;
    }
}
How could I possibly do something like above avoiding ConcurrentModificationException?
Would it be correct to simply use an Array instead of ArrayList to avoid this exception ?
Is there any type of specialized collection for this ?
--
Edit:
I don't want to create a new copy of this ArrayList, because I repeat the entire iteration process multiple times until some requirement is met. Creating a new copy each time would add overhead that I would like to avoid if possible.
Also, if possible, I would like to maintain sorted order and unique values in the collection. Is there anything ready to use in a library? Otherwise I can sort at the end of the iteration process and remove duplicates; that would also be fine for me.
Use another collection for the additions and combine them at the end.
List<Integer> collection = new ArrayList<>();
collection.add(...)
...
List<Integer> tempCollection = new ArrayList<>();
for (Integer lobId : collection) {
    ..
    if (someCondition) {
        tempCollection.add(something);
        break;
    }
}
collection.addAll(tempCollection);
This code cannot lead to a ConcurrentModificationException, because after you add an element you break out of the loop and don't use the iterator any more.
If I understand you correctly, you want to iterate over the list and, on some condition, break the iteration, add an item, and start fresh.
In that case, do this:
List<Integer> collection = new ArrayList<>();
boolean flag = false;
Integer item = null;
for (Integer lobId : collection) {
    ..
    if (someCondition) {
        flag = true;
        item = something;
        break;
    }
}
if (flag) {
    collection.add(item);
}
If someone else is going to change the list outside your loop, you will need to synchronize that access; read up on iterator thread safety and use the other answers here, like copying the list or some other copy-on-write approach.
ConcurrentModificationException basically means that you're iterating over a Collection with one iterator (albeit one implicitly defined by your enhanced for loop) and invalidating it on the fly by changing the Collection itself.
You can avoid this by doing the modifications via the same iterator:
List<Integer> collection = new ArrayList<>();
ListIterator<Integer> iter = collection.listIterator();
while (iter.hasNext()) {
    Integer currVal = iter.next();
    if (someCondition) {
        iter.add(something); // Note the addition is done on iter
        break;
    }
}
Don't use for-each; use the good old
for (int i = 0; i < collection.size(); i++)
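An indexed loop never touches an Iterator, so appending during the loop cannot trigger a ConcurrentModificationException. A sketch with a concrete stand-in for someCondition:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexedLoopAdd {
    public static void main(String[] args) {
        List<Integer> collection = new ArrayList<>(Arrays.asList(1, 2, 3));
        for (int i = 0; i < collection.size(); i++) {
            if (collection.get(i) == 2) { // stand-in for someCondition
                collection.add(99);       // safe: no iterator to invalidate
                break;
            }
        }
        System.out.println(collection);   // [1, 2, 3, 99]
    }
}
```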
I have an ArrayList, and I need to filter it (only to remove some elements).
I can't modify the original list.
What is my best option regarding performance:
Recreate another list from the original one, and remove items from it :
code :
List<Foo> newList = new ArrayList<Foo>(initialList);
for (Foo item : initialList) {
    if (...) {
        newList.remove(item);
    }
}
Create an empty list, and add items :
code :
List<Foo> newList = new ArrayList<Foo>(initialList.size());
for (Foo item : initialList) {
    if (...) {
        newList.add(item);
    }
}
Which of these options is the best ? Should I use anything else than ArrayList ? (I can't change the type of the original list though)
As a side note, approximately 80% of the items will be kept in the list. The list contains from 1 to around 20 elements.
Best option is to go with what is easiest to write and maintain.
If performance is a problem, you should profile the application afterwards and not optimize prematurely.
In addition, I'd use filtering from library like google-collections or commons collections to make the code more readable:
Collection<Foo> newCollection = Collections2.filter(initialList, new Predicate<Foo>() {
    public boolean apply(Foo item) {
        return (...); // apply your test here
    }
});
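On Java 8+ the same filtering is available in the standard library without extra dependencies. A sketch using String elements, since the original Foo type isn't shown:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamFilterDemo {
    public static void main(String[] args) {
        List<String> initialList = Arrays.asList("ab", "abcde", "xy");

        // Keep only the elements passing the predicate; the source list is untouched.
        List<String> newList = initialList.stream()
                .filter(s -> s.length() <= 4)
                .collect(Collectors.toList());

        System.out.println(newList); // [ab, xy]
    }
}
```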
Anyway, as it seems you are optimizing for performance, I'd go with System.arraycopy if you indeed want to keep most of the original items:
String[] src = initialList.toArray(new String[initialList.size()]);
String[] arr = new String[src.length];
int dstIndex = 0, blockStartIdx = 0, blockSize = 0;
for (int currIdx = 0; currIdx < src.length; currIdx++) {
    String item = src[currIdx];
    if (item.length() <= 4) {          // the removal condition
        if (blockSize > 0)
            System.arraycopy(src, blockStartIdx, arr, dstIndex, blockSize);
        dstIndex += blockSize;
        blockSize = 0;
    } else {
        if (blockSize == 0)
            blockStartIdx = currIdx;
        blockSize++;
    }
}
if (blockSize > 0) {                   // flush the final kept block
    System.arraycopy(src, blockStartIdx, arr, dstIndex, blockSize);
    dstIndex += blockSize;
}
List<String> newList = new ArrayList<>(Arrays.asList(arr).subList(0, dstIndex));
It seems to be about 20% faster than your option 3. Even more so (40%) if you can skip the new ArrayList creation at the end.
See: http://pastebin.com/sDhV8BUL
You might want to go with creating a new list from the initial one and removing. There would be fewer method calls that way, since you're keeping ~80% of the original items.
Other than that, I don't know of any way to filter the items.
Edit: Apparently Google Collections has something that might interest you?
As #Sanjay says, "when in doubt, measure". But creating an empty ArrayList and then adding items to it is the most natural implementation and your first goal should be to write clear, understandable code. And I'm 99.9% sure it will be the faster one too.
Update: By copying the old List to a new one and then striking out the elements you don't want, you incur the cost of element removal. The ArrayList.remove() method needs to iterate up to the end of the array on each removal, copying each reference down a position in the list. This almost certainly will be more expensive than simply creating a new ArrayList and adding elements to it.
Note: Be sure to allocate the new ArrayList to an initial capacity set to the size of the old List to avoid reallocation costs.
The second is faster (iterate and add to the second list as needed). The first also pays for each ArrayList.remove() call, which has to search the copy for the item and then shift the remaining references down.
What the result type should be depends on what you are going to need the filtered list for.
I'd first follow the age old advice; when in doubt, measure.
Should I use anything else than
ArrayList ?
That depends on what kind of operations would you be performing on the filtered list but ArrayList is usually is a good bet unless you are doing something which really shouldn't be backed by a contiguous list of elements (i.e. arrays).
List newList = new ArrayList(initialList.size());
I don't mean to nitpick, but if your new list won't exceed 80% of the initial size, why not fine tune the initial capacity to ((int)(initialList.size() * .8) + 1)?
Since I'm only getting suggestions here, I decided to run my own benchmark to be sure.
Here are the conclusions (with an ArrayList of Strings):
Solution 1, remove items from the copy: 2400 ms.
Solution 2, create an empty list and fill it: 1600 ms. newList = new ArrayList<Foo>();
Solution 3, same as 2, except you set the initial size of the List: 1530 ms. newList = new ArrayList<Foo>(initialList.size());
Solution 4, same as 2, except you set the initial size of the List + 1: 1500 ms. newList = new ArrayList<Foo>(initialList.size() + 1); (as explained by @Soronthar)
Source : http://pastebin.com/c2C5c9Ha