Thread with lists - java

I have an app that is a little bit slow. I thought it could be faster using threads.
So, here is my plan: my program has a list of objects of type X, and each object X has a very big list of Integers (let's consider Integer for the sake of simplicity).
I have a static method (called getSubsetOfX) that receives an object X from the list of X's and returns a list of Integers; the returned list is a subset of all the Integers contained in X.
This method is called for every X contained in the list. Then I insert the returned List into a List of Integer Lists.
This is the code I explained in a compact version:
// Class of object X
public class X{
public List<Integer> listX;
...
}
// Utility class
public class Util{
// Return a sub-set of Integer contained in X
public static List<Integer> getSubsetOfX(X x){...}
}
public class exec{
public static void main(String args[]){
// Let's suppose that lx is already filled with data!
List<X> lx = new ArrayList<X>();
// List of the subsets of integer
List<List<Integer>> li = new ArrayList<List<Integer>>();
for(X x : lx){
// I want this step to run in threads
li.add(Util.getSubsetOfX(x));
}
}
}
I don't know if a List allows concurrent insertions. I don't know how to apply threads to it either. I read a bit about Threads, but since the run() method doesn't return anything, how can I make the method getSubsetOfX(X x) run in parallel?
Can you help me doing this?

Just to be clear, getSubsetOfX() is the call that takes a long time, right?
For this sort of task, I'd suggest you look at Java's Executors. The first step would be to create a Callable that runs getSubsetOfX(x) on a given instance of X. Something like this:
public class SubsetCallable implements Callable<List<Integer>> {
X x;
public SubsetCallable(X x) {
this.x = x;
}
public List<Integer> call() {
return Util.getSubsetOfX(x);
}
}
Then you can create an ExecutorService using one of the methods in Executors. Which method to use depends on your available resources and your desired execution model - they're all described in the documentation. Once you create the ExecutorService, just create a SubsetCallable for each instance of X that you have and pass it off to the service to run it. I think it could go something like this:
ExecutorService exec = ...;
List<SubsetCallable> callables = new LinkedList<SubsetCallable>();
for (X x : lx) {
callables.add(new SubsetCallable(x));
}
// invokeAll blocks until every task has finished
List<Future<List<Integer>>> futures = exec.invokeAll(callables);
for (Future<List<Integer>> f : futures) {
// get() can throw InterruptedException or ExecutionException
li.add(f.get());
}
This way you can delegate the intense computation to other threads, but you still only access the list of results in one thread, so you don't have to worry about synchronization. (As winsharp93 pointed out, ArrayList, like most of Java's standard collections, is unsynchronized and thus not safe for concurrent access.)
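To make the fragments above concrete, here is a minimal, self-contained sketch of the same invokeAll pattern. It is only an illustration under stated assumptions: ParallelSubsets, subsetsOf and evens are hypothetical names, evens is a stand-in for Util.getSubsetOfX, plain List<Integer> inputs stand in for X, and a fixed thread pool is just one of the options that Executors offers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSubsets {
    // Hypothetical stand-in for Util.getSubsetOfX: keeps the even numbers.
    static List<Integer> evens(List<Integer> source) {
        List<Integer> out = new ArrayList<>();
        for (int n : source) {
            if (n % 2 == 0) out.add(n);
        }
        return out;
    }

    static List<List<Integer>> subsetsOf(List<List<Integer>> inputs, int threads) {
        ExecutorService exec = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<Integer>>> callables = new ArrayList<>();
            for (List<Integer> input : inputs) {
                callables.add(() -> evens(input));
            }
            // Only the calling thread touches 'results', so no synchronization is needed.
            List<List<Integer>> results = new ArrayList<>();
            for (Future<List<Integer>> f : exec.invokeAll(callables)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            exec.shutdown(); // otherwise the pool threads keep the JVM alive
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> inputs = Arrays.asList(
                Arrays.asList(1, 2, 3, 4), Arrays.asList(5, 6));
        System.out.println(subsetsOf(inputs, 2));
    }
}
```

invokeAll returns the futures in the same order as the submitted tasks, so the results line up with the inputs.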

I don't know if a List allow concurrent insertions.
See Class ArrayList:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method. This is best done at creation time, to prevent accidental unsynchronized access to the list:
List list = Collections.synchronizedList(new ArrayList(...));
But be careful: synchronization comes with a significant performance cost. This could offset the performance gain you get from using multiple threads (especially when the calculations are quite fast to do).
Thus, avoid accessing those synchronized collections wherever possible. Prefer thread-local lists instead, which you can then merge into your shared list using addAll.
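The "local list, then addAll" advice can be sketched roughly as follows. This is a hedged illustration, not part of the original answer: LocalListMerge and collectSquares are hypothetical names, and squaring numbers stands in for the real per-thread work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LocalListMerge {
    // Each worker fills its own private list and touches the shared list only once.
    static List<Integer> collectSquares(int workers, int perWorker) {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            final int offset = w * perWorker;
            Thread t = new Thread(() -> {
                List<Integer> local = new ArrayList<>(); // visible to this thread only
                for (int i = 0; i < perWorker; i++) {
                    int n = offset + i;
                    local.add(n * n);
                }
                shared.addAll(local); // one synchronized call per worker
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(collectSquares(4, 1000).size());
    }
}
```

The lock on the shared list is taken once per worker instead of once per element, which is where the saving comes from. The order of the merged chunks depends on which worker finishes first.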

Related

Is it possible to iterate through an array in lambda expressions? (java)

I need to make a new ForkJoinTask for every division, and don't know how to get to the next position in the array. Can someone help me?
protected static double parManyTaskArraySum(final double[] input, final int numTasks) {
double sum = 0;
// ToDo: Start Calculation with help of ForkJoinPool
ForkJoinPool fjp = new ForkJoinPool(numTasks);
fjp.execute(() -> {
sum+=( 1 / input[???] ); //the problem is here
});
return sum;
}
Exception: local variables referenced from a lambda expression must be final or effectively final
You are feeding a Runnable lambda to your ForkJoinPool.
As such, you cannot parametrize it with the desired array chunk.
You should actually define a class extending RecursiveTask<Double> whose constructor takes the array chunk as parameter, and decides whether to operate on the whole of it or fork if it's too large.
Then use the invoke method of your ForkJoinPool to get the result of the final calculation, by passing it a new instance of that RecursiveTask<Double> taking the whole array (the task will then decide, based on your criteria, whether to do everything in one go, or to fork, say, half of the array's elements to another task and join later).
A note, as there is some confusion here.
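A minimal sketch of the RecursiveTask approach described above might look like this. The names ReciprocalSumTask and parArraySum, the index-range constructor (instead of copying array chunks), and the THRESHOLD value are assumptions made for illustration; the question's numTasks is used here as the pool's parallelism level.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ReciprocalSumTask extends RecursiveTask<Double> {
    private static final int THRESHOLD = 1_000; // hypothetical cutoff for splitting
    private final double[] input;
    private final int from; // inclusive
    private final int to;   // exclusive

    ReciprocalSumTask(double[] input, int from, int to) {
        this.input = input;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Double compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: sum the reciprocals directly.
            double sum = 0;
            for (int i = from; i < to; i++) {
                sum += 1 / input[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ReciprocalSumTask left = new ReciprocalSumTask(input, from, mid);
        ReciprocalSumTask right = new ReciprocalSumTask(input, mid, to);
        left.fork();                       // run the left half asynchronously
        double rightSum = right.compute(); // do the right half in this thread
        return left.join() + rightSum;
    }

    static double parArraySum(double[] input, int numTasks) {
        ForkJoinPool pool = new ForkJoinPool(numTasks);
        try {
            return pool.invoke(new ReciprocalSumTask(input, 0, input.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        double[] data = new double[4000];
        java.util.Arrays.fill(data, 2.0);
        System.out.println(parArraySum(data, 4));
    }
}
```

Each task carries its own index range, so nothing has to be captured and mutated from a lambda, which is what the compiler error in the question was about.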
If in fact, you don't need to leverage the fork/join framework and only want to perform your operation asynchronously, there are many ways to do so without a ForkJoinPool.
For instance:
Callable<Double> call = () -> {return Arrays.stream(input).sum();};
Future<Double> future = Executors.newSingleThreadExecutor().submit(call);
// when you're ready
Double sum = future.get();

Thread Safety in Java Using Atomic Variables

I have a Java class, here's its code:
public class MyClass {
private AtomicInteger currentIndex;
private List<String> list;
MyClass(List<String> list) {
this.list = list; // list is initialized only one time in this constructor and is not modified anywhere in the class
this.currentIndex = new AtomicInteger(0);
}
public String select() {
return list.get(currentIndex.getAndIncrement() % list.size());
}
}
Now my question:
Is this class really thread-safe thanks to the AtomicInteger alone, or does it need an additional thread-safety mechanism (for example, locks)?
The use of currentIndex.getAndIncrement() is perfectly thread-safe. However, you need a change to your code to make it thread-safe in all circumstances.
The fields currentIndex and list need to be made final to achieve full thread-safety, even on unsafe publication of the reference to your MyClass object.
private final AtomicInteger currentIndex;
private final List<String> list;
In practice, if you always ensure that your MyClass object itself is safely published, for example if you create it on the main thread, before any of the threads that use it are started, then you don't need the fields to be final.
Safe publication means that the reference to the MyClass object itself is shared in a way that has a guaranteed multi-threaded ordering in the Java Memory Model.
It could be that:
All threads that use the reference get it from a field that was initialized by the thread that started them, before their thread was started
All threads that use the reference get it from a method that was synchronized on the same object as the code that set the reference (you have a synchronized getter and setter for the field)
You make the field that contains the reference volatile
It was in a final field if that final field was initialized as described in section 17.5 of the JLS.
There are a few more cases that are not easily used to publish references.
I think your code contains two bugs.
First, normally when you receive an object from some unknown source like your constructor does, you make a defensive copy to be certain it is not modified outside of the class.
MyClass(List<String> list) {
this.list = new ArrayList<String>( list );
So if you do this, do you now need to mutate that list anywhere inside the class? If so, the method:
public String select() {
return list.get(currentIndex.getAndIncrement() % list.size());
isn't atomic. What could happen here is that a thread calls getAndIncrement() and then performs the modulus (%). If at that point it's swapped out with another thread that removes an item from the list, the old limit of list.size() will no longer be valid.
I think there's nothing for it but to add synchronized to the whole method:
public synchronized String select() {
return list.get(currentIndex.getAndIncrement() % list.size());
And the same with any other mutator.
(final as the other poster mentions is still required on the instance fields.)
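Putting the suggestions from both answers together for the case where the list is never mutated after construction, a sketch could look like the following. RoundRobinSelector is a hypothetical name. The use of Math.floorMod is an addition not mentioned in the answers: getAndIncrement() wraps around to negative values after Integer.MAX_VALUE, and the % operator would then produce a negative index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinSelector {
    // final fields: safe even if the selector reference is published unsafely
    private final AtomicInteger currentIndex = new AtomicInteger(0);
    private final List<String> list;

    RoundRobinSelector(List<String> list) {
        if (list.isEmpty()) {
            throw new IllegalArgumentException("list must not be empty");
        }
        // Defensive copy: later changes to the caller's list cannot affect us.
        this.list = Collections.unmodifiableList(new ArrayList<>(list));
    }

    String select() {
        // floorMod keeps the index non-negative once the counter overflows.
        int index = Math.floorMod(currentIndex.getAndIncrement(), list.size());
        return list.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSelector selector = new RoundRobinSelector(Arrays.asList("a", "b", "c"));
        for (int i = 0; i < 4; i++) {
            System.out.println(selector.select());
        }
    }
}
```

Because the private copy never changes, list.size() is a constant and select() needs no lock; the synchronized variant is only needed if the class also mutates the list.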

How to make this function thread safe?

public class Sol {
static Map<Integer, List<String>> emap;
static List<Integer> sortSalaries(List<List<String>> workers) {
List<Integer> res = new ArrayList<Integer>();
emap = new HashMap<>();
for (List<String> e: workers)
emap.put(Integer.parseInt(e.get(0)), e);
for(List<String> worker: workers )
{
//accessing workers
.....
}
Collections.sort(res);
return res;
}
public static int dfs(int eid) {
List<String> employee = emap.get(eid);
int salary=0;
String ans = employee.get(3);
for (int i=0;i<ans.length();i=i+2)
{
// accesing emap
......
}
return salary;
}
}
Do I have to use the synchronized keyword to make it thread-safe? Do I have to use Vector and Hashtable if the method is synchronized?
Alternatively, what if I use Vector and Hashtable, move the emap variable into sortSalaries() and pass it to dfs()? Is it okay not to use the synchronized keyword in that case?
In a comment I asked whether you understand why these methods are not thread-safe when called from multiple threads. You pointed me to a link without saying whether you really understood it, or why you think your class is not thread-safe, so I am providing a bit of background instead of answering the question directly.
A Bit of Short Discussion
A class or its methods can stop being thread-safe as soon as you start sharing data among the calling threads. A class is thread-safe by default if no data is shared among threads, so the easiest way to make your class thread-safe is to stop sharing data. In your case that means removing emap (it is class state used in the methods) and possibly List<List<String>> workers (I am not sure about this one, since it is a reference passed in by the caller, and different calls may work on the same instance or on different instances), and replacing them with method-local variables.
Method-local variables are thread-safe by default, since new instances are created and destroyed for each call.
If you can't do that, or it is not feasible, follow oleg.cherednik's answer and synchronize access to emap, either at block level or at method level. Remember that there are various ways to synchronize in Java, with the synchronized keyword being the easiest.
Now for the method parameters, List<List<String>> workers and int eid: synchronization for eid is not needed, since you only read it and never update it, and since it is a primitive, each call gets its own copy of the value.
Synchronization for access to List<List<String>> workers is needed if you pass the same list instance to calls of this method from different threads. Refer to Gray's answer here; this point is missed in oleg.cherednik's answer. You are the better judge of whether synchronization is needed for this reference.
It's easy to assume that List iteration is thread-safe (since you are not updating the list), but that might not always be true. Refer to this question and all its answers for a detailed discussion.
So the summary is this: you start implementing thread-safety for your class by first analyzing whether any objects are shared among threads. If objects are shared, reads and writes to those objects need to be synchronized (to make them atomic, and provided those objects are not already thread-safe). If no objects are shared, the class is already thread-safe. Also, try to build your classes from data structures that are already thread-safe; that way you will have less work to do.
The java.lang.NullPointerException (NPE) point of oleg.cherednik's answer stands too.
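The "stop sharing data" option can be sketched as below. This is a simplified illustration, not the asker's real logic: LocalStateSol and salaryOf are hypothetical names, the record layout (index 0 = id, index 1 = salary) is an assumption, and the elided loops of the original are replaced by a trivial lookup. The point is only that emap becomes a local variable that is passed to the helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LocalStateSol {
    // Assumed record layout: index 0 = employee id, index 1 = salary.
    static List<Integer> sortSalaries(List<List<String>> workers) {
        // Local map: every call builds its own, so calls never share it.
        Map<Integer, List<String>> emap = new HashMap<>();
        for (List<String> e : workers) {
            emap.put(Integer.parseInt(e.get(0)), e);
        }
        List<Integer> res = new ArrayList<>();
        for (List<String> worker : workers) {
            res.add(salaryOf(emap, Integer.parseInt(worker.get(0))));
        }
        Collections.sort(res);
        return res;
    }

    // Stand-in for dfs(): receives the map as a parameter instead of reading a static field.
    static int salaryOf(Map<Integer, List<String>> emap, int eid) {
        return Integer.parseInt(emap.get(eid).get(1));
    }

    public static void main(String[] args) {
        List<List<String>> workers = Arrays.asList(
                Arrays.asList("1", "300"),
                Arrays.asList("2", "100"),
                Arrays.asList("3", "200"));
        System.out.println(sortSalaries(workers));
    }
}
```

With no static state left, neither synchronized nor Vector/Hashtable is needed inside the class. The caller still has to make sure the workers list is not modified by another thread while a call is running.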
Protect emap from outer access
Init emap to exclude NPE
Example:
public final class Sol {
private static final Map<Integer, List<String>> emap = new HashMap<>();
static List<Integer> sortSalaries(List<List<String>> workers) {
synchronized (Sol.class) {
for (List<String> e : workers)
emap.put(Integer.parseInt(e.get(0)), e);
}
// do smth, not access emap
}
public static synchronized int dfs(int eid) {
// do smth with accessing emap
}
}
In sortSalaries you can limit the synchronized block to the for loop. In dfs you access emap in different places of the method, and therefore you have to synchronize the entire method.
Using either ConcurrentHashMap or Vector does not help here, because between get/set calls on the collection its elements could be changed, which is not OK for the dfs method: emap should be frozen while dfs is running.

Synchronization on Set element in java

I have a set of unique elements. Each element has a set of operations that it can perform, and each element's operations are independent of the other elements' operations. For instance, there are three operations, O1, O2 and O3, that each element can perform. Each element can perform O1, O2 and O3 without conflicting with any other element, but a single element has to perform O1, O2 and O3 exclusively (one at a time).
Is locking on the element a good approach in that case? Will it work? Is there a better way to do this, for instance with ReentrantLock or with Java 8?
For example:
for(Element element : elements) {
synchronized(element) {
//Perform O1,O2,O3 but one at a time
}
}
Assume that the for loop above can be called from multiple places, and that it is written in multiple places in the code to perform different operations on an element.
As markspace suggests, the safest option is likely to be making your operations synchronized, rather than relying on all callers to do so properly, e.g.:
public class Element {
public synchronized void O1() { ... }
public synchronized void O2() { ... }
public synchronized void O3() { ... }
}
This will ensure that, for a given instance of Element, only one operation is being invoked at a time. Your callers then don't need to worry about explicitly synchronizing anything:
for(Element element : elements) {
element.O1();
element.O2();
element.O3();
}
However this won't be sufficient if you need to ensure these separate operations (O1, O2, and O3) all occur together, without interleaving with other possible callers. In that case the safest option would be to introduce new methods on Element that do what you need, rather than trying to compose them externally:
public class Element {
// ...
public synchronized void doAllOperations() { O1(); O2(); O3(); }
}
This ensures that everything in doAllOperations() will occur atomically with respect to any other invocation on this instance.
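Since the question also asks about ReentrantLock, here is a rough equivalent of the synchronized version using one private lock per element. It is a sketch under assumptions: LockedElement, step, completed and hammer are hypothetical names, and each operation just bumps a counter in place of real work.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockedElement {
    private final ReentrantLock lock = new ReentrantLock(); // one lock per element
    private int completed; // guarded by lock

    void o1() { step(); }
    void o2() { step(); }
    void o3() { step(); }

    private void step() {
        lock.lock();
        try {
            completed++;
        } finally {
            lock.unlock();
        }
    }

    // Holds the lock across all three calls; the lock is reentrant,
    // so the nested lock() calls inside o1..o3 do not deadlock.
    void doAllOperations() {
        lock.lock();
        try {
            o1();
            o2();
            o3();
        } finally {
            lock.unlock();
        }
    }

    int completed() {
        lock.lock();
        try {
            return completed;
        } finally {
            lock.unlock();
        }
    }

    // Runs doAllOperations() from several threads on one element and returns the counter.
    static int hammer(int threads, int callsPerThread) {
        LockedElement element = new LockedElement();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    element.doAllOperations();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return element.completed();
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 500));
    }
}
```

Compared with synchronized methods, a private lock cannot be acquired by outside code, and ReentrantLock additionally offers tryLock() and timed acquisition if those are ever needed.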

synchronized in java - Proper use

I'm building a simple program to be used from multiple threads.
My question is more about understanding: when do I have to use the reserved word synchronized?
Do I need to use it on every method that affects the object's variables?
I know I can put it on any method that is not static, but I want to understand more.
Thank you!
Here is the code:
public class Container {
// *** data members ***
public static final int INIT_SIZE=10; // the first (init) size of the set.
public static final int RESCALE=10; // the re-scale factor of this set.
private int _sp=0;
public Object[] _data;
/************ Constructors ************/
public Container(){
_sp=0;
_data = new Object[INIT_SIZE];
}
public Container(Container other) { // copy constructor
this();
for(int i=0;i<other.size();i++) this.add(other.at(i));
}
/** return true if this collection is empty, else return false. */
public synchronized boolean isEmpty() {return _sp==0;}
/** add an Object to this set */
public synchronized void add (Object p){
if (_sp==_data.length) rescale(RESCALE);
_data[_sp] = p; // shallow copy semantics.
_sp++;
}
/** returns the actual amount of Objects contained in this collection */
public synchronized int size() {return _sp;}
/** returns true if this container contains an element which is equals to ob */
public synchronized boolean isMember(Object ob) {
return get(ob)!=-1;
}
/** return the index of the first object which equals ob, if none returns -1 */
public synchronized int get(Object ob) {
int ans=-1;
for(int i=0;i<size();i=i+1)
if(at(i).equals(ob)) return i;
return ans;
}
/** returns the element located at the ind place in this container (null if out of range) */
public synchronized Object at(int p){
if (p>=0 && p<size()) return _data[p];
else return null;
}
Making a class safe for multi-threaded access is a complex subject. If you are not doing it in order to learn about threading, you should try to find a library that does it for you.
Having said that, a place to start is by imagining two separate threads executing a method line by line, in an alternating fashion, and seeing what would go wrong. For example, without synchronized the add() method above would be vulnerable to data corruption. Imagine thread1 and thread2 calling add() more or less at the same time. If thread1 runs line 2 (_data[_sp] = p) and, before it gets to line 3 (_sp++), thread2 runs line 2, then thread2 will overwrite thread1's value. Thus you need some way to prevent the threads from interleaving like that. On the other hand, the isEmpty() method has no such interleaving problem, since it only reads a single value and compares it to 0; note, though, that without synchronized (or volatile) a thread is not guaranteed to see the latest value of _sp. Again, it is hard to get this stuff right.
You can check the following documentation about synchronized methods: http://docs.oracle.com/javase/tutorial/essential/concurrency/syncmeth.html
By adding the synchronized keyword two things are guaranteed to happen:
First, it is not possible for two invocations of synchronized methods on the same object to interleave. When one thread is executing a synchronized method for an object, all other threads that invoke synchronized methods for the same object block (suspend execution) until the first thread is done with the object.
Second, when a synchronized method exits, it automatically establishes a happens-before relationship with any subsequent invocation of a synchronized method for the same object. This guarantees that changes to the state of the object are visible to all threads.
So whenever you need to guarantee that only one thread accesses your variable at a time to read/write it to avoid consistency issues, one way is to make your method synchronized.
My advice to you is to first read Oracle's concurrency tutorial.
A few comments:
Having all your methods synchronized causes bottlenecks
Making the _data variable public is bad practice and makes concurrent programming harder.
It seems that you are reimplementing a collection; it is better to use Java's existing concurrent collections.
Variable names would better not begin with _
Avoid adding comments to your code and try to have declarative method names.
+1 for everybody who said read a tutorial, but here's a summary anyway.
You need mutual exclusion (i.e., synchronized blocks) whenever it is possible for one thread to create a temporary situation that other threads must not be allowed to see. Suppose you have objects stored in a search tree. A method that adds a new object to the tree probably will have to reassign several object references, and until it finishes its work, the tree will be in an invalid state. If one thread is allowed to search the tree while another thread is in the add() method, then the search() function may return an incorrect result, or worse (maybe crash the program.)
One solution is to synchronize the add() method, and the search() method, and any other method that depends on the tree structure. All must be synchronized on the same object (the root node of the tree would be an obvious choice).
Java guarantees that no more than one thread can be synchronized on the same object at any given time. Therefore, no more than one thread will be able to see or change the internals of the tree at the same time, and the temporary invalid state created inside the add() method will be harmless.
My example above explains the principle of mutual exclusion, but it is a simplistic and inefficient solution to protecting a search tree. A more practical approach would use reader/writer locks, and synchronize only on interesting parts of the tree rather than on the whole thing. Practical synchronization of complex data structures is a hard problem, and whenever possible, you should let somebody else solve it for you. E.g., If you use the container classes in java.util.concurrent instead of creating your own data structures, you'll probably save yourself a lot of work (and maybe a whole lot of debugging).
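As a rough illustration of the reader/writer lock idea, the sketch below guards a whole tree with a single ReentrantReadWriteLock. It is deliberately coarse (it does not lock "interesting parts" of the tree), GuardedTree is a hypothetical name, and a TreeSet stands in for the hand-written search tree.

```java
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GuardedTree {
    private final TreeSet<Integer> tree = new TreeSet<>(); // not thread-safe on its own
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();

    // Writers are exclusive: nobody can observe the tree mid-update.
    boolean add(int value) {
        rw.writeLock().lock();
        try {
            return tree.add(value);
        } finally {
            rw.writeLock().unlock();
        }
    }

    // Any number of readers may search at the same time, but never during an add.
    boolean search(int value) {
        rw.readLock().lock();
        try {
            return tree.contains(value);
        } finally {
            rw.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        GuardedTree t = new GuardedTree();
        t.add(42);
        System.out.println(t.search(42));
    }
}
```

This keeps the mutual exclusion between add() and search() while letting concurrent searches proceed in parallel, which a plain synchronized method would not allow.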
You need to protect the variables that form the object's state. If these variables are used in a static method, you have to protect them there as well. But be careful, the following example is wrong:
private static int stateVariable = 0;
//wrong!!!!
public static synchronized void increment() {
stateVariable++;
}
public synchronized int getValue() {
return stateVariable;
}
The code above looks safe, but these methods operate on different locks. It more or less corresponds to the following:
private static int stateVariable = 0;
//wrong!!!!
public static void increment() {
synchronized (YourClassName.class) {
stateVariable++;
}
}
public int getValue() {
synchronized (this) {
return stateVariable;
}
}
Notice that different locks are used when mixing static and object methods.
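One way to repair the example is to make both methods use the same lock, for instance by making the getter static as well so that both synchronize on the class object. The sketch below assumes that is acceptable for the design; SharedCounter and incrementFrom are hypothetical names.

```java
class SharedCounter {
    private static int stateVariable = 0;

    // Both methods are static synchronized, so both lock on SharedCounter.class.
    static synchronized void increment() {
        stateVariable++;
    }

    static synchronized int getValue() {
        return stateVariable;
    }

    // Increments from several threads and returns how much the counter grew.
    static int incrementFrom(int threads, int perThread) {
        int before = getValue();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return getValue() - before;
    }

    public static void main(String[] args) {
        System.out.println(incrementFrom(4, 1000));
    }
}
```

If the getter has to stay an instance method, it can instead wrap its body in synchronized (SharedCounter.class) { ... } to reach the same lock.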
