Parallel iteration on multiple collections


I'm reading J. Bloch's "Effective Java" and I'm now at the section about the for-each loop vs. the traditional for loop. He mentions three cases where we can't use the for-each loop, and one of them is the following:
Parallel iteration— If you need to traverse multiple collections in
parallel, then you need explicit control over the iterator or index
variable, so that all iterators or index variables can be advanced in
lockstep (as demonstrated unintentionally in the buggy card and dice
examples above).
The case is not quite clear to me, I can't imagine an example.
The first thought that popped into my head was that it was just about iterating the same collection in multiple threads but it's probably not what he meant. I don't see any restrictions preventing us from doing so (read-only). Actually:
public class MyRunnable implements Runnable {
    private Collection<String> col;
    // constructor omitted
    public void run() {
        for (String s : col) {
            // print s, do not modify
        }
    }
}
Then we just start some threads with the same instance. So, we're not afraid of getting ConcurrentModificationException (JavaDocs) because we perform read-only access, even by multiple threads simultaneously.
What's wrong?

I don't think he meant "in parallel" as in concurrently.
It is much simpler. Suppose you have two Collections and you want the same loop (not a nested loop) to iterate over both of them, taking the i'th element of each one in each iteration. You can't do that with the enhanced for loop, since it hides the indices and the iterator.
You must use the standard for loop (for ordered Collections) :
private List<String> one;
private List<String> two;
public void run(){
for(int i = 0; i<one.size() && i<two.size();i++){
// do something with one.get(i) and two.get(i)
}
}
Or explicit Iterators (for un-ordered Collections) :
private Set<String> one;
private Set<String> two;
public void run(){
for (Iterator<String> iterOne = one.iterator(), iterTwo = two.iterator(); iterOne.hasNext() && iterTwo.hasNext();) {
// do something with iterOne.next() and iterTwo.next()
}
}
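To make the lockstep idea concrete, here is a minimal runnable sketch. The class name LockstepDemo, the method zip and the sample data are made up for illustration; the loop stops as soon as the shorter collection runs out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class, not part of the original answer.
class LockstepDemo {
    // Pairs the i-th element of 'first' with the i-th element of 'second',
    // advancing both iterators together and stopping at the shorter input.
    static List<String> zip(Iterable<String> first, Iterable<String> second) {
        List<String> pairs = new ArrayList<>();
        Iterator<String> a = first.iterator();
        Iterator<String> b = second.iterator();
        while (a.hasNext() && b.hasNext()) {
            pairs.add(a.next() + "=" + b.next());
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("alice", "bob", "carol");
        List<String> roles = Arrays.asList("admin", "user");
        System.out.println(zip(names, roles)); // prints [alice=admin, bob=user]
    }
}
```

Because zip only asks for an Iterable, the same code works for lists, sets or any other collection, which is the point of using explicit iterators here.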

Parallel iteration— If you need to traverse multiple collections in
parallel, then you need explicit control over the iterator or index
variable, so that all iterators or index variables can be advanced in
lockstep (as demonstrated unintentionally in the buggy card and dice
examples above).
In plain English, "in lockstep" means advancing together, one step at a time. It means you cannot iterate over more than one collection at the same time using for-each. You will have to use separate iterators (or an indexed for loop, as shown by Eran), like below:
Iterator<Item> iterator1 = list1.iterator();
Iterator<Item> iterator2 = list2.iterator();
Iterator<Item> iterator3 = list3.iterator();
while (iterator1.hasNext() && iterator2.hasNext() && iterator3.hasNext()) {
    Item i1 = iterator1.next();
    Item i2 = iterator2.next();
    Item i3 = iterator3.next();
    // rest of your code.
}

Related

ListIterator allocation in Java

I have a real-time program that runs a continuous while loop...
example:
while(true)
{
}
Inside that loop I have a for(MyObject o: myobjects) loop. When I run my code I see that every iteration of the while loop a new iterator is created to loop over my LinkedList called myobjects
What is a better way of iterating over a LinkedList without having Java create a ListIterator every time?

A new iterator is handed out by design. Every call to the iterator() method gives you a fresh iterator, and the list itself does not keep any iteration state.
Code for the iterator :
public ListIterator<E> listIterator(final int index) {
rangeCheckForAdd(index);
return new ListItr(index);
}

What is a better way of iterating over a LinkedList without having Java create a ListIterator every time?
"Better" is very situational. You should consider whether the behavior you have now is actually a problem, because what you describe is about the simplest Java code for the job.
If you do need to iterate specifically over a LinkedList (as opposed, say, to an ArrayList), and you want to do so without creating a new ListIterator every time, then your best bet is probably to create a single ListIterator up front, and reuse it at every iteration:
ListIterator<MyObject> iterator = myobjects.listIterator();
while (true) {
// Return to the beginning of the list:
while (iterator.hasPrevious()) {
iterator.previous();
}
// The desired iteration:
while (iterator.hasNext()) {
MyObject o = iterator.next();
// do something with o
}
}
Do be aware, however, that this opens you up to trouble if the underlying list is modified. Any such modification will invalidate the ListIterator (its methods should start throwing ConcurrentModificationExceptions). In your original code, that will affect just one iteration of the outer loop, but if you reuse the iterator then you may need different handling of that situation. If the list is modified elsewhere in the outer loop, then re-using the same iterator is right out.
On the other hand, if you could be sure that you have a RandomAccess list, such as an ArrayList, then you could reasonably avoid iterators altogether, and just iterate by index:
while (true) {
// The desired iteration:
for (int i = 0; i < myobjects.size(); i += 1) {
MyObject o = myobjects.get(i);
// do something with o
}
}
Do not do that if you have or may have a LinkedList, however, because it will increase the cost of the iteration from O(n) to O(n^2) for LinkedLists and similar sequential-access lists.
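If the concrete list type is not known in advance, one option is to test for the RandomAccess marker interface at run time and pick the traversal accordingly. The sketch below is only an illustration; the class TraversalChoice and the method totalLength are invented names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical helper, assumed for illustration only.
class TraversalChoice {
    // Sums the lengths of all strings, using indexed access only when
    // the list advertises constant-time get(i) via RandomAccess.
    static int totalLength(List<String> items) {
        int total = 0;
        if (items instanceof RandomAccess) {
            // Indexed loop: no iterator object is allocated.
            for (int i = 0, n = items.size(); i < n; i++) {
                total += items.get(i).length();
            }
        } else {
            // Sequential-access list (e.g. LinkedList): the iterator keeps this O(n).
            for (String s : items) {
                total += s.length();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("ab", "cde");
        System.out.println(totalLength(new ArrayList<>(words)));  // prints 5
        System.out.println(totalLength(new LinkedList<>(words))); // prints 5
    }
}
```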

The source code of LinkedList says a new iterator is being created. If your list is dynamic, you will need a new iterator as it will become stale.
But, I believe if your list is a constant, you are better off running a normal for loop.
int length = myobjects.size();
for (int i=0; i < length; i++)
{
//access here
}

Modifying each item of a List in java

I'm just starting to work with lists in java. I'm wondering what the recommended method to modify each element of a list would be?
I've been able to get it done with both of the following methods, but they both seem fairly inelegant. Is there a better way to get this done in Java? And is either of the methods below recommended over the other, or are they on the same level?
//Modifying with foreach
for (String each : list)
{
list.set(list.indexOf(each), each+ " blah");
}
//Modifying with for
for (ListIterator<String> i = list.listIterator(); i.hasNext(); i.next())
{
i.next();
list.set(i.nextIndex()-1, i.previous() + " blah yadda");
}

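The second version would be better. Internally they end up the same, but the first one looks up each element again with indexOf, which is slow and picks the wrong position when the list contains duplicates. (Calling set does not structurally modify the list, so it will not by itself throw a ConcurrentModificationException; adding or removing elements inside the for-each loop would.)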
But then you are using the Iterator in a wrong way. Here is how you do it correctly:
for (final ListIterator<String> i = list.listIterator(); i.hasNext();) {
final String element = i.next();
i.set(element + "yaddayadda");
}
The iterator is the one that needs to modify the list as it is the only one that knows how to do that properly without getting confused about the list elements and order.
Edit: Because I see this in all comments and the other answers:
Why you should not use list.get, list.set and list.size in a loop
There are many collections in the Java collections framework, each one optimized for specific needs. Many people use the ArrayList, which internally uses an array. This is fine as long as the number of elements does not change much over time, and it has the special benefit that get, set and size are constant-time operations on this specific type of list.
There are however other list types, where this is not true. For example if you have a list that constantly grows and/or shrinks, it is much better to use a LinkedList, because in contrast to the ArrayList add(element) is a constant time operation, but add(index, element), get(index) and remove(index) are not!
To get the position of the specific index, the list needs to be traversed from the first/last till the specific element is found. So if you do that in a loop, this is equal to the following pseudo-code:
for (int index = 0; index < list.size(); ++index) {
Element e = get( (for(int i = 0; i < size; ++i) { if (i == index) return element; else element = nextElement(); }) );
}
The Iterator is an abstract way to traverse a list and therefore it can ensure that the traversal is done in an optimal way for each list. Tests show that there is little time difference between using an iterator and get(i) for an ArrayList, but a huge time difference (in favor of the iterator) on a LinkedList.
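Not mentioned in the answer, but worth knowing: since Java 8 the List interface has a replaceAll method, whose default implementation does the same ListIterator.set walk shown above. A small sketch, with ReplaceAllDemo and withSuffix as made-up names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Hypothetical demo class; assumes Java 8 or later.
class ReplaceAllDemo {
    // Replaces every element in place with element + suffix and returns the same list.
    static List<String> withSuffix(List<String> list, String suffix) {
        list.replaceAll(element -> element + suffix);
        return list;
    }

    public static void main(String[] args) {
        List<String> linked = new LinkedList<>(Arrays.asList("a", "b"));
        System.out.println(withSuffix(linked, " blah")); // prints [a blah, b blah]
        List<String> array = new ArrayList<>(Arrays.asList("x"));
        System.out.println(withSuffix(array, "!"));      // prints [x!]
    }
}
```

This keeps the traversal strategy inside the list implementation, so it stays linear for both ArrayList and LinkedList.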

EDIT: If you know that size(), get(index) and set(index, value) are all constant time operations for the list implementation you're using (e.g. for ArrayList), I would personally just skip the iterators in this case:
for (int i = 0; i < list.size(); i++) {
list.set(i, list.get(i) + " blah");
}
Your first approach is inefficient and potentially incorrect (as indexOf may return the wrong value - it will return the first match). Your second approach is very confusing - the fact that you call next() twice and previous once makes it hard to understand in my view.
Any approach using List.set(index, value) will be inefficient for a list which doesn't have constant time indexed write access, of course. As TwoThe noted, using ListIterator.set(value) is much better. TwoThe's approach of using a ListIterator is a better general purpose approach.
That said, another alternative in many cases would be to change your design to project one list to another instead - either as a view or materially. When you're not changing the list, you don't need to worry about it.
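As a rough illustration of projecting one list to another "materially", the sketch below builds a new list with the Stream API and leaves the original untouched. It assumes Java 8 or later; ProjectionDemo and project are invented names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class, not taken from the answer.
class ProjectionDemo {
    // Returns a new list whose elements are the originals with a suffix appended.
    static List<String> project(List<String> source, String suffix) {
        return source.stream()
                .map(element -> element + suffix)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("a", "b");
        List<String> projected = project(original, " blah");
        System.out.println(original);  // prints [a, b]
        System.out.println(projected); // prints [a blah, b blah]
    }
}
```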

Internally, the for-each loop is implemented with an Iterator, so there is no difference between these two cases. But if you try to add or remove elements through the list inside the for-each loop, it will throw a ConcurrentModificationException.

I got mine working this way
String desiredInvoice="abc-123";
long desiredAmount=1500;
for (ListIterator<MyPurchase> it = input.getMyPurchaseList().listIterator(); it.hasNext();) {
MyPurchase item = it.next();
if (item.getInvoiceNo().equalsIgnoreCase(desiredInvoice)) {
item.setPaymentAmount(desiredAmount);
it.set(item);
break;
}
}

What is the difference in behavior between these two usages of synchronized on a list

List<String> list = new ArrayList<String>();
list.add("a");
...
list.add("z");
synchronized(list) {
Iterator<String> i = list.iterator();
while(i.hasNext()) {
...
}
}
and
List<String> list = new ArrayList<String>();
list.add("a");
...
list.add("z");
List<String> synchronizedList = Collections.synchronizedList(list);
synchronized(synchronizedList) {
Iterator<String> i = synchronizedList.iterator();
while(i.hasNext()) {
...
}
}
Specifically, I'm not clear as to why synchronized is required in the second instance when a synchronized list provides thread-safe access to the list.

If you don't lock around the iteration, you will get a ConcurrentModificationException if another thread modifies it during the loop.
Synchronizing all of the methods doesn't prevent that in the slightest.
This (and many other things) is why Collections.synchronized* is completely useless.
You should use the classes in java.util.concurrent. (and you should think carefully about how you will guarantee you will be safe)
As a general rule of thumb:
Slapping locks around every method is not enough to make something thread-safe.
For much more information, see my blog

synchronizedList only makes each call atomic. In your case, the loop makes multiple calls, so between each call/iteration another thread can modify the list. If you use one of the concurrent collections, you don't have this problem.
The following shows how such a collection differs from ArrayList:
List<String> list = new CopyOnWriteArrayList<String>();
list.addAll(Arrays.asList("a,b,c,d,e,f,g,h,z".split(",")));
for(String s: list) {
System.out.print(s+" ");
// would trigger a ConcurrentModificationException with ArrayList
list.clear();
}
Even though the list is cleared repeatedly, it prints the following, because that was the content when the iterator was created.
a b c d e f g h z

The second code needs to be synchronized because of the way synchronized lists are implemented. This is explained in the javadoc:
It is imperative that the user manually synchronize on the returned list when iterating over it
The main difference between the two code snippets is the effect of the add operations:
with the synchronized list, you have a visibility guarantee: other threads will see the newly added items if they call synchronizedList.get(..) for example.
with the ArrayList, other threads might not see the newly added items immediately - they might actually not ever see them.
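The pattern the javadoc asks for can be sketched as follows: one thread appends through the synchronized wrapper while another iterates inside a synchronized block on the same wrapper. This is a simplified illustration under assumed names (SynchronizedIterationDemo, run), not a benchmark or a recommendation over java.util.concurrent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, assumed for illustration.
class SynchronizedIterationDemo {
    // Starts a writer thread, iterates concurrently under the list's lock,
    // and returns the final size once the writer has finished.
    static int run() {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) {
                shared.add(i); // each add is atomic thanks to the wrapper
            }
        });
        writer.start();

        long checksum = 0;
        for (int round = 0; round < 100; round++) {
            // Holding the wrapper's lock for the whole traversal keeps the
            // writer out, so the iterator never sees a concurrent change.
            synchronized (shared) {
                for (int value : shared) {
                    checksum += value;
                }
            }
        }

        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 10000
    }
}
```

Removing the synchronized block around the for-each loop would make a ConcurrentModificationException possible, which is exactly the case the question asks about.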

Get size of an Iterable in Java

I need to figure out the number of elements in an Iterable in Java.
I know I can do this:
Iterable values = ...
it = values.iterator();
while (it.hasNext()) {
it.next();
sum++;
}
I could also do something like this, because I do not need the objects in the Iterable any further:
it = values.iterator();
while (it.hasNext()) {
it.remove();
sum++;
}
A small scale benchmark did not show much performance difference, any comments or other ideas for this problem?

TL;DR: Use the utility method Iterables.size(Iterable) of the great Guava library.
Of your two code snippets, you should use the first one, because the second one will remove all elements from values, so it is empty afterwards. Changing a data structure for a simple query like its size is very unexpected.
For performance, this depends on your data structure. If it is for example in fact an ArrayList, removing elements from the beginning (what your second method is doing) is very slow (calculating the size becomes O(n*n) instead of O(n) as it should be).
In general, if there is a chance that values is actually a Collection and not only an Iterable, check this and call size() in that case:
if (values instanceof Collection<?>) {
return ((Collection<?>)values).size();
}
// use Iterator here...
The call to size() will usually be much faster than counting the number of elements, and this trick is exactly what Iterables.size(Iterable) of Guava does for you.

If you are working with Java 8, you may use:
Iterable values = ...
long size = values.spliterator().getExactSizeIfKnown();
It will only work if the iterable's source has a known size; otherwise getExactSizeIfKnown() returns -1. Most Spliterators for Collections will report a size, but you may have issues if it comes from a HashSet or ResultSet, for instance.
You can check the javadoc here.
If Java 8 is not an option, or if you don't know where the iterable comes from, you can use the same approach as guava:
if (iterable instanceof Collection) {
return ((Collection<?>) iterable).size();
} else {
int count = 0;
Iterator iterator = iterable.iterator();
while(iterator.hasNext()) {
iterator.next();
count++;
}
return count;
}
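To see the spliterator-based approach succeed and fail, the sketch below compares an ArrayList with a plain Iterable that only exposes an iterator. SpliteratorSizeDemo and exactSize are hypothetical names; the calls themselves are standard Java 8 API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; assumes Java 8 or later.
class SpliteratorSizeDemo {
    // Returns the exact size if the spliterator reports one, otherwise -1.
    static long exactSize(Iterable<?> values) {
        return values.spliterator().getExactSizeIfKnown();
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        // A bare Iterable that hides the fact that it is backed by a list.
        Iterable<String> opaque = list::iterator;

        System.out.println(exactSize(list));   // prints 3
        System.out.println(exactSize(opaque)); // prints -1
    }
}
```

A result of -1 is the signal to fall back to counting with an iterator.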

This is perhaps a bit late, but may help someone. I came across a similar issue with Iterable in my codebase, and the solution was to use a for-each loop without explicitly calling values.iterator().
int size = 0;
for(T value : values) {
size++;
}

You can copy your iterable into a list and then call .size() on it.
Lists.newArrayList(iterable).size();
For the sake of clarity, the above method will require the following import:
import com.google.common.collect.Lists;

Strictly speaking, an Iterable does not have a size. Think of a data structure like a cycle.
Or consider the following Iterable instance, which has no size:
new Iterable() {
    @Override
    public Iterator iterator() {
        return new Iterator() {
            @Override
            public boolean hasNext() {
                return isExternalSystemAvailble();
            }
            @Override
            public Object next() {
                return fetchDataFromExternalSystem();
            }
        };
    }
};

Java 8 and above:
StreamSupport.stream(data.spliterator(), false).count();

I would go for it.next() for the simple reason that next() is guaranteed to be implemented, while remove() is an optional operation.
E next()
Returns the next element in the iteration.
void remove()
Removes from the underlying collection the last element returned by the iterator (optional operation).

As for me, these are just different methods. The first one leaves the object you're iterating on unchanged, while the second leaves it empty.
The question is what you want to do.
The complexity of removing depends on the implementation of your iterable object.
If you're using Collections, just obtain the size as proposed by Kazekage Gaara; it's usually the best approach performance-wise.

Why don't you simply use the size() method on your Collection to get the number of elements?
Iterator is just meant to iterate, nothing else.

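Instead of using loops and counting each element, or using a third-party library, we can simply cast the iterable to ArrayList and get its size. Note that this only works if the iterable really is an ArrayList; otherwise the cast throws a ClassCastException.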
((ArrayList) iterable).size();

How to compare two Arraylist values in java?

I have two ArrayLists, RunningProcessList and AllProcessList, which contain the following values:
RunningProcessList:
Receiver.jar
AllProcessList:
Receiver.jar
Sender.jar
Timeout.jar
TimeourServer.jar
The AllProcessList ArrayList contains all the Java processes, and the RunningProcessList ArrayList contains the currently running processes. I want to compare these two lists and display the processes that are not running. For example, comparing the two lists above should display the following processes as not running.
Result:
Sender.jar
Timeout.jar
TimeourServer.jar
I used the following code, but it's not working.
Object Result = null;
for (int i = 0; i <AllProcessList.size(); i++) {
for (int j = 0; j < RunningProcessList.size(); j++) {
if( AllProcessList.get(i) != ( RunningProcessList.get(j))) {
System.out.println( RunningProcessList.get(j)));
Result =RunningProcessList.get(j);
}
if(AllProcessList.get(i) != ( RunningProcessList.get(j))) {
list3.add(Result);
}
}
}

Take a look at the documentation for List, especially the removeAll() method.
List result = new ArrayList(AllProcessList);
result.removeAll(RunningProcessList);
You could then iterate over that list and call System.out.println if you wanted, as you've done above... but is that what you want to do?
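Applied to the data from the question, the removeAll approach might look like the sketch below. The class NotRunning and its method are invented for illustration, and the process names are assumed to be plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class built around the question's sample data.
class NotRunning {
    // Returns the entries of 'all' that do not appear in 'running',
    // without modifying either input list.
    static List<String> notRunning(List<String> all, List<String> running) {
        List<String> result = new ArrayList<>(all); // copy so 'all' stays intact
        result.removeAll(running);
        return result;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList(
                "Receiver.jar", "Sender.jar", "Timeout.jar", "TimeourServer.jar");
        List<String> running = Arrays.asList("Receiver.jar");
        // prints [Sender.jar, Timeout.jar, TimeourServer.jar]
        System.out.println(notRunning(all, running));
    }
}
```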

Assuming your lists are not too long, you can just collect all elements of AllProcessList that are not in the RunningProcessList:
for (Object process : AllProcessList) {
if (!RunningProcessList.contains(process)) {
list3.add(process);
}
}
It's important that the RunningProcessList contains the same instances as the AllProcessList (or that the objects implement a working equals method).
It would be better if your lists contained instances of Process (or some other dedicated class).
List<Process> AllProcessList = new ArrayList<Process>();
List<Process> RunningProcessList = new ArrayList<Process>();
List<Process> list3 = new ArrayList<Process>();
...
for (Process process : AllProcessList) {
if (!RunningProcessList.contains(process)) {
list3.add(process);
}
}
English is not my first (neither second) language, any correction is welcome

Hi lakshmi,
I upvoted noelmarkham's answer as I think it's the best code wise and suits Your needs. So I'm not going to add another code snippet to this already long list, I just wanted to point You towards two things:
If Your processes are unique (their name/id whatever), You might consider to use (Hash)Sets in order to store them for better performance of Your desired operations. This should only be a concern when Your lists are large.
What about using ActiveProcesses and InactiveProccesses instead of Your current two lists? If a process changes its state You just have to remove it from one list and insert it into the other. This would lead to an overall cleaner design and You could access the not-running processes immediately.
Greetings
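A rough sketch of the HashSet suggestion: putting the running processes into a set makes each membership test constant time on average, which only matters for large lists. The names SetDifferenceDemo and notRunning are hypothetical, and process names are again assumed to be strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, assumed for illustration.
class SetDifferenceDemo {
    // Keeps the order of 'all' and drops everything found in 'running'.
    static List<String> notRunning(List<String> all, List<String> running) {
        Set<String> runningSet = new HashSet<>(running); // fast contains()
        List<String> result = new ArrayList<>();
        for (String process : all) {
            if (!runningSet.contains(process)) {
                result.add(process);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("Receiver.jar", "Sender.jar", "Timeout.jar");
        List<String> running = Arrays.asList("Receiver.jar");
        System.out.println(notRunning(all, running)); // prints [Sender.jar, Timeout.jar]
    }
}
```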

Depending on the element type of AllProcessList and RunningProcessList (which should be named allProcessList and runningProcessList to follow the Java naming conventions), the following will not work:
if ( AllProcessList.get(i) != ( RunningProcessList.get(j))) {
you should replace it with
if (!(AllProcessList.get(i).equals(RunningProcessList.get(j)))) {
!= compares physical equality: are the two things the exact same "new"ed object?
.equals(Object) compares logical equality: are the two things the "same"?
To do that you will need to override the equals and hashCode methods. Here is an article on that.
If the class is a built in Java library one then odds are equals and hashCode are done.
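The difference is easy to see with two distinct String objects that hold the same characters. The class EqualityDemo and its two helper methods are made up for this sketch.

```java
// Hypothetical demo class, not part of the original answer.
class EqualityDemo {
    // Reference comparison: true only if both variables point to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // Logical comparison: true if the two strings contain the same characters.
    static boolean sameValue(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        // 'new String' forces two separate objects with identical content.
        String first = new String("Receiver.jar");
        String second = new String("Receiver.jar");
        System.out.println(sameObject(first, second)); // prints false
        System.out.println(sameValue(first, second));  // prints true
    }
}
```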

For sorted lists, the following is O(n). If a sort is needed, this method becomes O(nlogn).
public <T> void compareLists(final List<T> allProcesses, final List<T> runningProcesses) {
// Assume lists are sorted, if not call Collections.sort() on each list (making this O(nlogn))
final Iterator<T> allIter = allProcesses.iterator();
final Iterator<T> runningIter = runningProcesses.iterator();
T allEntry;
T runningEntry;
while (allIter.hasNext() && runningIter.hasNext()) {
allEntry = allIter.next();
runningEntry = runningIter.next();
while (!allEntry.equals(runningEntry) && allIter.hasNext()) {
System.out.println(allEntry);
allEntry = allIter.next();
}
// Now we know allEntry == runningEntry, so we can go through to the next iteration
}
// No more running processes, so just print the remaining entries in the all processes list
while (allIter.hasNext()) {
System.out.println(allIter.next());
}
}
