I have a huge table with about 1 million records, and I want to do some processing on all of them. The single-threaded way would be to fetch, say, 1000 records, process them, fetch another 1000 records, and so on.
But what if I want to use multithreading? That is, two threads, each fetching 1000 records and processing them in parallel. How can I make sure that each thread fetches a different 1000 records?
Note: I am using Hibernate.
Something like this:
public void run() {
partList=getKParts(10);
operateOnList(partList);
}
Sure, you can synchronize the code.
public class MyClass {
private final HibernateFetcher hibernateFetcher = new HibernateFetcher();
private class Worker implements Runnable {
public void run() {
List partList = hibernateFetcher.fetchRecords();
operateOnList(partList);
}
}
public void myBatchProcessor() {
while(!hibernateFetcher.isFinished()) {
// create *n* workers and go!
}
}
}
class HibernateFetcher {
private int count = 0;
private final Object lock = new Object();
private volatile boolean isFinished = false;
public List fetchRecords() {
Criteria criteria = ...;
synchronized(lock) {
criteria.setFirstResult(count) // offset
.setMaxResults(1000);
count=count+1000;
}
List result = criteria.list();
isFinished = result.isEmpty(); // an empty page means there is nothing left to fetch
return result;
}
public synchronized boolean isFinished(){
return isFinished;
}
}
If I understood correctly, you don't want all 1 million records fetched up front; you want to fetch them in batches of 1000 and process the batches in parallel in 2 threads.
First you have to implement some kind of paging in your database query, using a row count or something similar. From Java you can pass fromRowCount and toRowCount, fetch the records in batches of 1000, and process them in parallel in threads. I am adding sample code here, but you will have to adapt the logic and variables to your own needs.
int totalRecordCount = 100000;
int batchSize =1000;
ExecutorService executor = Executors.newFixedThreadPool(totalRecordCount/batchSize);
for(int x=0; x < totalRecordCount;){
int toRowCount = x+batchSize;
// final so that the anonymous Runnable below can capture it
final List partList = getKParts(10, x, toRowCount);
x = toRowCount + 1;
executor.submit(new Runnable() {
@Override
public void run() {
operateOnList(partList);
}
});
}
Hope this helps. Let me know if further clarification is required.
If your records in the database do have a primary key of type int or long, add a restriction to each thread to fetch only records from ranges:
Thread1: 0000 - 0999, 2000 - 2999, etc
Thread2: 1000 - 1999, 3000 - 3999, etc
This way you need only an offset, a counter and an increment for each thread. For example Thread1 would have an offset of 0 while Thread2 would have an offset of 1000. Because of two threads in this example, you have an increment of 2000. For each round increment the counter (starting at 0) of each thread and calculate the next ranges as:
from = offset + (count * 2000)
to = from + 999
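The range arithmetic above can be sketched as follows. This is only an illustration, assuming dense integer primary keys starting at 0 and a batch size of 1000; the names RangePlanner and rangeFor are made up for the example and do not come from the answer.

```java
public class RangePlanner {
    static final int BATCH_SIZE = 1000;

    /** Returns {from, to} (both inclusive) for a given thread and round. */
    static long[] rangeFor(int threadIndex, int threadCount, int round) {
        long offset = (long) threadIndex * BATCH_SIZE;    // 0 for Thread1, 1000 for Thread2
        long increment = (long) threadCount * BATCH_SIZE; // 2000 when there are two threads
        long from = offset + round * increment;
        long to = from + BATCH_SIZE - 1;
        return new long[] {from, to};
    }

    public static void main(String[] args) {
        int threadCount = 2;
        for (int round = 0; round < 2; round++) {
            for (int t = 0; t < threadCount; t++) {
                long[] r = rangeFor(t, threadCount, round);
                System.out.println("Thread" + (t + 1) + ": " + r[0] + " - " + r[1]);
            }
        }
    }
}
```

Each thread would then restrict its query to primary keys between from and to, so no shared counter or lock is needed.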
import com.se.sas.persistance.utils.HibernateUtils;
public class FinderWorker implements Runnable {
@Override
public void run() {
operateOnList(getNParts(IndexLocker.getAllowedListSize()));
}
public List<Parts> getNParts(int listSize) {
try {
criteria = .....
// *********** SYNCHRONIZATION OCCURS HERE ********************//
criteria.setFirstResult(IndexLocker.getAvailableIndex());
criteria.setMaxResults(listSize);
partList = criteria.list();
} catch (Exception e) {
e.printStackTrace();
} finally {
session.close();
}
return partList;
}
public void operateOnList(List<Parts> partList) {
....
}
}
The locker class:
public class IndexLocker {
private static AtomicInteger index = new AtomicInteger(0);
private final static int batchSize = 1000;
public IndexLocker() {
}
public static int getAllowedListSize() {
return batchSize;
}
public static synchronized void incrmntIndex(int hop) {
index.getAndAdd(hop);
}
public static int getAvailableIndex() {
// getAndAdd is atomic: it returns the old value and advances the index in one step
return index.getAndAdd(batchSize);
}
}
This is my first question. Thanks for your help!
I'm trying to print the numbers 1~100, odd and even alternately, using two threads.
Expected results:
pool-1-thread-1=> 1
pool-1-thread-2=> 2
pool-1-thread-1=> 3
pool-1-thread-2=> 4
......
pool-1-thread-1=> 99
pool-1-thread-2=> 100
I thought I could use a fair lock (FairSync), but it only guarantees that most of the output is correct. For example:
pool-1-thread-1=> 55
pool-1-thread-2=> 56
pool-1-thread-1=> 57
pool-1-thread-2=> 58
pool-1-thread-2=> 59 //※error print※
pool-1-thread-1=> 60
pool-1-thread-2=> 61
pool-1-thread-1=> 62
I don't know why the order is lost in a few cases.
Feel free to criticize my code and my English.
Here is my code:
private static final int COUNT = 100;
private static final int THREAD_COUNT = 2;
private static int curr = 1;
static ReentrantLock lock = new ReentrantLock(true);
static ExecutorService executorService = Executors.newCachedThreadPool();
public static void main(String[] args) {
Runnable task = () -> {
for (; ; ) {
try {
lock.lock();
if (curr <= COUNT) {
System.out.println(Thread.currentThread().getName() + "=> " + curr++);
} else {
System.exit(0);
}
} catch (Exception e) {
e.printStackTrace();
} finally {
lock.unlock();
}
}
};
for (int i = 0; i < THREAD_COUNT; i++) {
executorService.execute(task);
}
}
No, your implementation is not correct. Which thread gets the opportunity to run is decided by the OS, so there is no guarantee that threads 1 and 2 will execute one after another.
You can fix your code by checking the value of the variable curr: if the value is not the one this thread expects, don't print and don't increment.
For example:
if (Thread.currentThread().getName().equals("Thread 2") && (curr % 2 == 0))
{
// Print
// Increment
}
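A fuller version of that check might look like the sketch below. It is an assumption-laden illustration rather than the answerer's code: the class name AlternatingPrinter, the log list, and the thread names are invented here, and a worker whose turn it is not simply releases the lock and retries (a busy spin, acceptable only for a demo).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class AlternatingPrinter {
    private static final int COUNT = 100;
    private final ReentrantLock lock = new ReentrantLock();
    private int curr = 1; // guarded by lock
    // Records every printed line in order, so the result can be inspected.
    final List<String> log = Collections.synchronizedList(new ArrayList<>());

    /** Prints only the numbers whose parity matches (1 = odd, 0 = even). */
    void runWorker(int parity) {
        while (true) {
            lock.lock();
            try {
                if (curr > COUNT) {
                    return; // all numbers printed
                }
                if (curr % 2 == parity) {
                    String line = Thread.currentThread().getName() + "=> " + curr++;
                    log.add(line);
                    System.out.println(line);
                }
                // Otherwise it is the other thread's turn: release and retry.
            } finally {
                lock.unlock();
            }
        }
    }

    static List<String> runDemo() {
        AlternatingPrinter p = new AlternatingPrinter();
        Thread odd = new Thread(() -> p.runWorker(1), "thread-1");
        Thread even = new Thread(() -> p.runWorker(0), "thread-2");
        odd.start();
        even.start();
        try {
            odd.join();
            even.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return p.log;
    }

    public static void main(String[] args) {
        runDemo();
    }
}
```

Because a number is printed and incremented only by the thread that owns its parity, the order no longer depends on which thread the scheduler happens to run next.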
You can't achieve this with a single lock. ReentrantLock can provide fairness, but it can't control thread scheduling.
We can achieve this through inter-thread communication, for example with a Semaphore, which controls when each thread may proceed.
We create two threads, an odd thread and an even thread. The odd thread prints the odd numbers starting from 1, and the even thread prints the even numbers starting from 2.
Create two semaphores, semOdd and semEven, which start with 1 and 0 permits respectively. This ensures that the odd number gets printed first.
class SharedPrinter {
private Semaphore semEven = new Semaphore(0);
private Semaphore semOdd = new Semaphore(1);
void printEvenNum(int num) {
try {
semEven.acquire();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
System.out.println(Thread.currentThread().getName() + num);
semOdd.release();
}
void printOddNum(int num) {
try {
semOdd.acquire();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
System.out.println(Thread.currentThread().getName() + num);
semEven.release();
}
}
class Even implements Runnable {
private SharedPrinter sp;
private int max;
// standard constructor
@Override
public void run() {
for (int i = 2; i <= max; i = i + 2) {
sp.printEvenNum(i);
}
}
}
class Odd implements Runnable {
private SharedPrinter sp;
private int max;
// standard constructors
@Override
public void run() {
for (int i = 1; i <= max; i = i + 2) {
sp.printOddNum(i);
}
}
}
public static void main(String[] args) {
SharedPrinter sp = new SharedPrinter();
Thread odd = new Thread(new Odd(sp, 10),"Odd");
Thread even = new Thread(new Even(sp, 10),"Even");
odd.start();
even.start();
}
Refer : here
Let's say I have n threads concurrently taking values from a shared queue:
public class WorkerThread implements Runnable{
private BlockingQueue queue;
private ArrayList<Integer> counts = new ArrayList<>();
private int count=0;
public void run(){
while(true) {
queue.pop();
count++;
}
}
}
Then, for each thread, I want to count every 5 seconds how many items it has dequeued and store that number in its own list (counts).
I've seen in "Print "hello world" every X seconds" how you can run some code every x seconds:
Timer t = new Timer();
t.scheduleAtFixedRate(new TimerTask(){
@Override
public void run(){
counts.add(count);
count = 0;
}
}, 0, 5000);
The problem with this is that I can't access count variable and the list of counts unless they are static. But I don't want them to be static because I don't want the different threads to share those variables.
Any ideas of how to handle this?
I don't think it's possible to use scheduled execution in your case (neither Timer nor ScheduledExecutorService), because each scheduled invocation would start a new task with its own while loop, so the number of tasks would keep increasing.
If you don't need to access the list of counts while the workers are running, I would suggest something like this:
static class Task implements Runnable {
private final ThreadLocal<List<Integer>> counts = ThreadLocal.withInitial(ArrayList::new);
private volatile List<Integer> result = new ArrayList<>();
private BlockingQueue<Object> queue;
public Task(BlockingQueue<Object> queue) {
this.queue = queue;
}
@Override
public void run() {
int count = 0;
long start = System.nanoTime();
try {
while (!Thread.currentThread().isInterrupted()) {
queue.take();
count++;
long end = System.nanoTime();
if ((end - start) >= TimeUnit.SECONDS.toNanos(1)) {
counts.get().add(count);
count = 0;
start = end;
}
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
// the last value
counts.get().add(count);
// copy the result cause it's not possible
// to access thread local variable outside of this thread
result = counts.get();
}
public List<Integer> getCounts() {
return result;
}
}
public static void main(String[] args) throws Exception {
ExecutorService executorService = Executors.newFixedThreadPool(3);
BlockingQueue<Object> blockingQueue = new LinkedBlockingQueue<>();
Task t1 = new Task(blockingQueue);
Task t2 = new Task(blockingQueue);
Task t3 = new Task(blockingQueue);
executorService.submit(t1);
executorService.submit(t2);
executorService.submit(t3);
for (int i = 0; i < 50; i++) {
blockingQueue.add(new Object());
Thread.sleep(100);
}
// unlike shutdown() interrupts running threads
executorService.shutdownNow();
executorService.awaitTermination(1, TimeUnit.SECONDS);
System.out.println("t1 " + t1.getCounts());
System.out.println("t2 " + t2.getCounts());
System.out.println("t3 " + t3.getCounts());
int total = Stream.concat(Stream.concat(t1.getCounts().stream(), t2.getCounts().stream()), t3.getCounts().stream())
.reduce(0, (a, b) -> a + b);
// 50 as expected
System.out.println(total);
}
Why not a static AtomicLong?
Or the WorkerThread(s) could publish the fact that they popped an item to the TimerTask or somewhere else, and the TimerTask could read that info?
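One way to read that suggestion without making anything static is to give each worker its own atomic counter and let a separate timer thread snapshot it. The sketch below is an assumption, not code from the thread: CountingWorker and snapshot() are invented names, and the demo uses a 200 ms period so that it finishes quickly (the question would use 5 seconds).

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class CountingWorker implements Runnable {
    private final BlockingQueue<Object> queue;
    // Instance fields: every worker has its own counter and its own history.
    private final AtomicInteger count = new AtomicInteger();
    private final List<Integer> counts = new CopyOnWriteArrayList<>();

    public CountingWorker(BlockingQueue<Object> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                queue.take();
                count.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop when interrupted
        }
    }

    /** Called from the timer thread: stores the current count and resets it atomically. */
    public void snapshot() {
        counts.add(count.getAndSet(0));
    }

    public List<Integer> getCounts() {
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        CountingWorker worker = new CountingWorker(queue);
        Thread thread = new Thread(worker);
        thread.start();

        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // The scheduled task only reads the counter; it does not start a new worker loop.
        timer.scheduleAtFixedRate(worker::snapshot, 200, 200, TimeUnit.MILLISECONDS);

        for (int i = 0; i < 10; i++) {
            queue.add(new Object());
            Thread.sleep(50);
        }
        timer.shutdownNow();
        thread.interrupt();
        thread.join();
        worker.snapshot(); // capture whatever was counted since the last tick
        System.out.println(worker.getCounts());
    }
}
```

The scheduled task never touches the queue, so the number of running loops stays fixed at one per worker.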
I'm trying to implement a concurrent cache in Java for learning purposes.
This code is responsible for guaranteeing thread-safe operations. Whenever a thread tries to fetch a value that is not already cached, the algorithm should calculate it from the last cached one.
My problem is that I'm getting null for values that are supposed to be already cached. I'm using a semaphore (though I've tried ReentrantLock too, so I don't think that's the problem) to ensure thread-safe access to a HashMap.
Note that I would like to restrict the locked area to the smallest possible. So I would rather not synchronize the entire method or use an already thread-safe ConcurrentMap.
Here is a complete simple code:
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Semaphore;
public class ConcurrentCache {
private final Semaphore semaphore = new Semaphore(1);
private final Map<Integer, Integer> cache;
private int lastCachedNumber;
public ConcurrentCache() {
cache = new HashMap<Integer, Integer>();
cache.put(0, 0);
lastCachedNumber = 0;
}
public Integer fetchAndCache(int n) {
//if it's already cached, supposedly i can access it in an unlocked way
if (n <= lastCachedNumber)
return cache.get(n);
lock();
Integer number;
if (n < lastCachedNumber) { // check it again. it may be updated by another thread
number = cache.get(n);
} else {
//fetch a previous calculated number.
number = cache.get(lastCachedNumber);
if (number == null)
throw new IllegalStateException(String.format(
"this should be cached. n=%d, lastCachedNumber=%d", n,
lastCachedNumber));
for (int i = lastCachedNumber + 1; i <= n; i++) {
number = number + 1;
cache.put(i, number);
lastCachedNumber = i;
}
}
unlock();
return number;
}
private void lock() {
try {
semaphore.acquire();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
private void unlock() {
semaphore.release();
}
public static void main(String[] args) {
ConcurrentCache cachedObject = new ConcurrentCache();
for (int nThreads = 0; nThreads < 5; nThreads++) {
new Thread(new Runnable() {
@Override
public void run() {
for (int cacheValue = 0; cacheValue < 1000; cacheValue++) {
if (cachedObject.fetchAndCache(cacheValue) == null) {
throw new IllegalStateException(String.format(
"the number %d should be cached",
cacheValue));
}
}
}
}).start();
}
}
}
Thank you for your help.
A few pointers/ideas:
1) Pre-size your Map when you create it to accommodate all/many of your future cached values; Map resizing is very thread-unsafe and time-consuming.
2) You can simplify your whole algorithm to:
YourClass.get(int i) {
if (!entryExists(i)) {
lockEntry(i);
entry = createEntry(i);
putEntryInCache(i, entry);
unlockEntry(i);
}
return entry;
}
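As a conservative reading of that outline, the sketch below performs both the existence check and the insert while holding the lock, because reading a plain HashMap while another thread writes to it is not safe. It is an assumption rather than the answerer's code: LockedCache is an invented name, createEntry is a placeholder, and the lock is held for reads as well, which is broader than the question wanted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

public class LockedCache {
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<Integer, Integer> cache = new HashMap<>();

    public Integer get(int i) {
        lock.lock();
        try {
            Integer entry = cache.get(i); // check while holding the lock
            if (entry == null) {
                entry = createEntry(i);
                cache.put(i, entry);
            }
            return entry;
        } finally {
            lock.unlock(); // released even if createEntry throws
        }
    }

    // Placeholder computation: in the question the cached value equals its key.
    private Integer createEntry(int i) {
        return i;
    }
}
```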
Edit
Another point:
3) Your approach to caching is very bad: imagine what will happen if the first request is for the value at position 1,000,000.
Pre-populating in a separate thread would be a lot better...
I have an array with the size of n which is filled with the numbers 1..n.
I need to sum this array using m threads by repeatedly taking two elements, summing them, and inserting the sum back into the array.
Here is what I tried to do.
The synchronized part first
public class MultiThreadedSum {
private ArrayBuffer ArrayBufferInst;
private int Sum;
private boolean Flag, StopFlag;
public MultiThreadedSum(ArrayBuffer ArrayBufferInst) {
this.ArrayBufferInst = ArrayBufferInst;
Sum = 0;
Flag = false;
StopFlag = false;
}
public synchronized void Sum2Elements() {
while(Flag){
try {wait();}
catch (InterruptedException e){}
}
Flag = true;
if (StopFlag) {
notifyAll();
return;
}
System.out.println("Removing and adding 2 elements.");
Sum = ArrayBufferInst.Sum2Elements();
notifyAll();
}
public synchronized void InsertElement() {
while(!Flag){
try {wait();}
catch (InterruptedException e){}
}
Flag = false;
if (StopFlag) {
notifyAll();
return;
}
System.out.println("Inserting the sum.");
ArrayBufferInst.InsertElement(Sum);
if (ArrayBufferInst.RetunrSize() == 1) {
StopFlag = true;
}
System.out.println(ArrayBufferInst);
notifyAll();
}
public boolean ReturnStopFlag(){
return StopFlag;
}
@Override
public String toString(){
return ArrayBufferInst.toString();
}
}
I've split the m threads into 2 groups: half of them do the summing and half do the inserting, coordinated using wait and notify.
public class Sum2ElementsThread implements Runnable{
private MultiThreadedSum MultiThreadedSumInst;
public Sum2ElementsThread( MultiThreadedSum MultiThreadedSumInst){
this.MultiThreadedSumInst = MultiThreadedSumInst;
}
@Override
public void run() {
while(!MultiThreadedSumInst.ReturnStopFlag())
MultiThreadedSumInst.Sum2Elements();
}
}
public class InsertThread implements Runnable{
private MultiThreadedSum MultiThreadedSumInst;
public InsertThread( MultiThreadedSum MultiThreadedSumInst) {
this.MultiThreadedSumInst = MultiThreadedSumInst;
}
@Override
public void run() {
while(!MultiThreadedSumInst.ReturnStopFlag()) {
MultiThreadedSumInst.InsertElement();
}
}
}
Here is part of the main:
ArrayBufferInst = new ArrayBuffer(n);
System.out.println("The Array");
System.out.println(ArrayBufferInst);
MultiThreadedSumInst = new MultiThreadedSum(ArrayBufferInst);
ExecutorService Threads = Executors.newCachedThreadPool();
for (i = 0; i < m/2; i++)
Threads.execute( new Sum2ElementsThread(MultiThreadedSumInst) );
for (; i < m; i++)
Threads.execute( new InsertThread(MultiThreadedSumInst) );
Threads.shutdown();
while(!MultiThreadedSumInst.ReturnStopFlag()){}
System.out.println("The sum of the array is " + MultiThreadedSumInst);
And the buffer
public class ArrayBuffer {
private ArrayList<Integer> ArrayBufferInst;
public ArrayBuffer(int SizeOfBuffer){
int i;
ArrayBufferInst = new ArrayList<>(SizeOfBuffer);
for (i = 0; i < SizeOfBuffer; i++){
ArrayBufferInst.add(i, i+1);
}
}
public int Sum2Elements(){
if (ArrayBufferInst.size() < 2){
return -1;
}
return ArrayBufferInst.remove(0) + ArrayBufferInst.remove(0);
}
public void InsertElement(int Elem) {
ArrayBufferInst.add(Elem);
}
public int RetunrSize(){
return ArrayBufferInst.size();
}
@Override
public String toString() {
return ArrayBufferInst.toString();
}
}
My question is about the end of main: sometimes the program stops and sometimes it doesn't. I know all the threads are exiting the run method because I checked that.
Sometimes I see the "The sum of the array is" message, sometimes I don't.
Your problem lies here:
public synchronized void Sum2Elements() {
while(Flag){
try {wait();}
catch (InterruptedException e){}
}
Flag = true;
// rest of method omitted here
}
When this part of the program is executed for the first time Flag is false and the loop is ignored. All subsequent executions of this method will result in a deadlock since this is the only place where you set Flag to false.
Not even interrupting will work, since you have no break in your loop and after the interruption you just go on to the next cycle and wait() forever.
Oh and read this - Java is not c#
That is really a lot of code for your task.
Maybe I can propose a different solution.
You can just split the array into m parts (where m is the number of threads) and have each thread sum its own part. When every thread has finished, just add up the partial results.
Or maybe I didn't understand your task correctly. Please give more details (the full task).
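The split-and-combine idea could be sketched like this. It is a minimal illustration that assumes a plain int[] instead of the question's ArrayBuffer; ChunkedSum and its sum method are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedSum {
    /** Sums data using m threads, each responsible for one contiguous part. */
    static long sum(int[] data, int m) {
        ExecutorService pool = Executors.newFixedThreadPool(m);
        try {
            int chunk = (data.length + m - 1) / m; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int t = 0; t < m; t++) {
                final int start = Math.min(t * chunk, data.length);
                final int end = Math.min(start + chunk, data.length);
                parts.add(pool.submit(() -> {
                    long partial = 0;
                    for (int i = start; i < end; i++) {
                        partial += data[i];
                    }
                    return partial;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for that part to finish
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int n = 100;
        int[] data = new int[n];
        for (int i = 0; i < n; i++) {
            data[i] = i + 1; // the array holds 1..n, as in the question
        }
        System.out.println("The sum of the array is " + sum(data, 4)); // prints 5050 for n = 100
    }
}
```

Since the threads never touch each other's part, no wait/notify coordination is needed; the only synchronization point is Future.get().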
I have some thread-related questions, assuming the following code. Please ignore the possible inefficiency of the code, I'm only interested in the thread part.
//code without thread use
public static int getNextPrime(int from) {
int nextPrime = from+1;
boolean superPrime = false;
while(!superPrime) {
boolean prime = true;
for(int i = 2;i < nextPrime;i++) {
if(nextPrime % i == 0) {
prime = false;
}
}
if(prime) {
superPrime = true;
} else {
nextPrime++;
}
}
return nextPrime;
}
public static void main(String[] args) {
int primeStart = 5;
ArrayList list = new ArrayList();
for(int i = 0;i < 10000;i++) {
list.add(primeStart);
primeStart = getNextPrime(primeStart);
}
}
If I run the code like this, it takes about 56 seconds. If, however, I use the following code (as an alternative):
public class PrimeRunnable implements Runnable {
private int from;
private int lastPrime;
public PrimeRunnable(int from) {
this.from = from;
}
public boolean isPrime(int number) {
for(int i = 2;i < from;i++) {
if((number % i) == 0) {
return false;
}
}
lastPrime = number;
return true;
}
public int getLastPrime() {
return lastPrime;
}
public void run() {
while(!isPrime(++from))
;
}
}
public static void main(String[] args) {
int primeStart = 5;
ArrayList list = new ArrayList();
for(int i = 0;i < 10000;i++) {
PrimeRunnable pr = new PrimeRunnable(primeStart);
Thread t = new Thread(pr);
t.start();
t.join();
primeStart = pr.getLastPrime();
list.add(primeStart);
}
}
The whole operation takes about 7 seconds. I am almost certain that even though I only create one thread at a time, a thread doesn't always finish when another is created. Is that right? I am also curious: why is the operation ending so fast?
When I'm joining a thread, do other threads keep running in the background, or is the joined thread the only one that's running?
By putting the join() in the loop, you're starting a thread, then waiting for that thread to stop before running the next one. I think you probably want something more like this:
public static void main(String[] args) throws InterruptedException {
int primeStart = 5;
// Make thread-safe list for adding results to
List list = Collections.synchronizedList(new ArrayList());
// Pull thread pool count out into a value so you can easily change it
int threadCount = 10000;
Thread[] threads = new Thread[threadCount];
// Start all threads
for(int i = 0;i < threadCount;i++) {
// Pass list to each Runnable here
// Also, I added +i here as I think the intention is
// to test 10000 possible numbers>5 for primeness -
// was testing 5 in all loops
PrimeRunnable pr = new PrimeRunnable(primeStart+i, list);
threads[i] = new Thread(pr);
threads[i].start(); // thread is now running in parallel
}
// All threads now running in parallel
// Then wait for all threads to complete
for(int i=0; i<threadCount; i++) {
threads[i].join();
}
}
By the way pr.getLastPrime() will return 0 in the case of no prime, so you might want to filter that out before adding it to your list. The PrimeRunnable has to absorb the work of adding to the final results list. Also, I think PrimeRunnable was actually broken by still having incrementing code in it. I think this is fixed, but I'm not actually compiling this.
public class PrimeRunnable implements Runnable {
private int from;
private List results; // shared but thread-safe
public PrimeRunnable(int from, List results) {
this.from = from;
this.results = results;
}
public void isPrime(int number) {
for(int i = 2;i < from;i++) {
if((number % i) == 0) {
return;
}
}
// found prime, add to shared results
this.results.add(number);
}
public void run() {
isPrime(from); // don't increment, just check one number
}
}
Running 10000 threads in parallel is not a good idea. It's a much better idea to create a reasonably sized fixed thread pool and have them pull work from a shared queue. Basically every worker pulls tasks from the same queue, works on them and saves the results somewhere. The closest port of this with Java 5+ is to use an ExecutorService backed by a thread pool. You could also use a CompletionService which combines an ExecutorService with a result queue.
An ExecutorService version would look like:
public static void main(String[] args) throws InterruptedException {
int primeStart = 5;
// Make thread-safe list for adding results to
List list = Collections.synchronizedList(new ArrayList());
int threadCount = 16; // Experiment with this to find best on your machine
ExecutorService exec = Executors.newFixedThreadPool(threadCount);
int workCount = 10000; // See how # of work is now separate from # of threads?
for(int i = 0;i < workCount;i++) {
// submit work to the svc for execution across the thread pool
exec.execute(new PrimeRunnable(primeStart+i, list));
}
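// Stop accepting new tasks; without shutdown(), awaitTermination() would wait for the full timeout
exec.shutdown();
// Wait for all tasks to be done or timeout to go off
exec.awaitTermination(1, TimeUnit.DAYS);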
}
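For the CompletionService variant mentioned above, a sketch might look like the following. The class and method names are invented for illustration, each task returns 0 for a non-prime (an arbitrary convention chosen here), and the primality test stops at the square root, which differs from the loops in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PrimeCompletionDemo {
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int i = 2; (long) i * i <= n; i++) {
            if (n % i == 0) {
                return false;
            }
        }
        return true;
    }

    /** Returns the primes in [from, from + workCount), in completion order. */
    static List<Integer> findPrimes(int from, int workCount, int threadCount) {
        ExecutorService exec = Executors.newFixedThreadPool(threadCount);
        CompletionService<Integer> service = new ExecutorCompletionService<>(exec);
        try {
            for (int i = 0; i < workCount; i++) {
                final int candidate = from + i;
                // Each task yields the candidate if it is prime, otherwise 0.
                service.submit(() -> isPrime(candidate) ? candidate : 0);
            }
            List<Integer> primes = new ArrayList<>();
            for (int i = 0; i < workCount; i++) {
                int result = service.take().get(); // blocks until some task completes
                if (result != 0) {
                    primes.add(result);
                }
            }
            return primes;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(findPrimes(5, 10000, 16).size() + " primes found");
    }
}
```

Here the results come back through the service's internal queue, so no shared synchronized list is needed.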
Hope that gave you some ideas. And I hope the last example seemed a lot better than the first.
You can test this better by making the exact code in your first example run with threads. Sub your main method with this:
private static int currentPrime;
public static void main(String[] args) throws InterruptedException {
for (currentPrime = 0; currentPrime < 10000; currentPrime++) {
Thread t = new Thread(new Runnable() {
public void run() {
getNextPrime(currentPrime);
}});
t.start(); // start() runs the Runnable on the new thread; run() would execute it on the calling thread
t.join();
}
}
This will run in the same time as the original.
To answer your "join" question: yes, other threads can be running in the background when you use "join", but in this particular case you will only have one active thread at a time, because you are blocking the creation of new threads until the last thread is done executing.
JesperE is right, but I don't believe in only giving hints (at least outside a classroom):
Note this loop in the non-threaded version:
for(int i = 2;i < nextPrime;i++) {
if(nextPrime % i == 0) {
prime = false;
}
}
As opposed to this in the threaded version:
for(int i = 2;i < from;i++) {
if((number % i) == 0) {
return false;
}
}
The first loop will always run completely through, while the second will exit early if it finds a divisor.
You could make the first loop also exit early by adding a break statement like this:
for(int i = 2;i < nextPrime;i++) {
if(nextPrime % i == 0) {
prime = false;
break;
}
}
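To see how much work the missing break costs, the two loops can be instrumented to count iterations. This is a small illustrative sketch; DivisorLoopDemo and its method names are not from the original posts.

```java
public class DivisorLoopDemo {
    /** Iterations of the full scan, as in the non-threaded version. */
    static int iterationsWithoutBreak(int n) {
        int iterations = 0;
        boolean prime = true;
        for (int i = 2; i < n; i++) {
            iterations++;
            if (n % i == 0) {
                prime = false; // keeps scanning even though the answer is known
            }
        }
        return iterations;
    }

    /** Iterations when the loop stops at the first divisor. */
    static int iterationsWithBreak(int n) {
        int iterations = 0;
        for (int i = 2; i < n; i++) {
            iterations++;
            if (n % i == 0) {
                break;
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println(iterationsWithoutBreak(n)); // 999998
        System.out.println(iterationsWithBreak(n));    // 1
    }
}
```

For a prime n both loops run the same number of iterations; the difference appears only for composite numbers, which are the majority of the candidates tested.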
Read your code carefully. The two cases aren't doing the same thing, and it has nothing to do with threads.
When you join a thread, other threads will run in the background, yes.
Running a test, the second one doesn't seem to take 9 seconds. In fact, it takes at least as long as the first, which is to be expected: threading can't help the way it's implemented in your example.
Thread.join will only return when the joined thread terminates; then the current thread continues, and the one you called join on will be dead.
As a quick rule of thumb: think of threading when starting one iteration does not depend on the result of the previous one.