In my code I send batch requests to a custom database.
The response time of each batch is in milliseconds.
However, I am limited to at most one batch per second. Any additional batch sent within the same second is dropped, which is not what I want.
I could call Thread.sleep() for a second between sends so that I never hit the database with more than one batch per second.
The pseudocode looks like:
createBatch()
sendBatch()
What I am trying to do is limit the number of times sendBatch() is called in a second.
Can I achieve this using any throttling library rather than using Thread.sleep()?
You can use RateLimiter from Guava.
See: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/RateLimiter.html
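With Guava on the classpath, the usual pattern is RateLimiter.create(1.0) once and rateLimiter.acquire() before each sendBatch(). For readers who cannot add the dependency, the underlying idea can be sketched with the standard library alone. This is not Guava's implementation: SimpleThrottle and its methods are hypothetical names, and the sketch simply spaces calls at least a fixed interval apart.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Guava: spaces calls at least minInterval apart.
final class SimpleThrottle {
    private final long minIntervalNanos;
    private long nextAllowedNanos; // earliest time the next call may proceed

    SimpleThrottle(long minInterval, TimeUnit unit) {
        this.minIntervalNanos = unit.toNanos(minInterval);
        this.nextAllowedNanos = System.nanoTime();
    }

    // Blocks until the caller may proceed, then reserves the following slot.
    synchronized void acquire() throws InterruptedException {
        long now = System.nanoTime();
        long waitNanos = nextAllowedNanos - now;
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
        nextAllowedNanos = Math.max(now, nextAllowedNanos) + minIntervalNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleThrottle throttle = new SimpleThrottle(1, TimeUnit.SECONDS);
        for (int i = 0; i < 3; i++) {
            throttle.acquire(); // waits so that sends start at least 1 s apart
            System.out.println("sendBatch() #" + i); // stand-in for createBatch()/sendBatch()
        }
    }
}
```

Unlike a bare Thread.sleep(1000) after every send, this only waits for whatever part of the second has not already elapsed while the batch was being created.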
I think this is a problem of limiting the number of resources that can be used at a time. Try a pooling technique. In Java you can use ExecutorService for this. See http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html
Here is sample code:
class Worker implements Callable<String> {
private int id;
public Worker(int id) {
this.id = id;
System.out.println("im worker = " + id);
}
public String call() throws Exception {
System.out.println("Started some long operation - " + id);
Thread.sleep(1000); // only to simulate long running operation
System.out.println("Finished long operation - " + id);
return null;
}
}
// main method
public class Main {
private static final Logger logger = LoggerFactory.getLogger(Main.class);
public static void main(String[] args) throws ExecutionException, InterruptedException {
final int poolSize = 100;
final int workerSize = 1000;
ExecutorService executor = Executors.newFixedThreadPool(poolSize);
Future[] futures = new Future[workerSize];
Worker[] workers = new Worker[workerSize];
for(int i = 0; i < workerSize; i++){
workers[i] = new Worker(i);
}
System.out.println("finished creating workers================");
for(int i = 0; i < workerSize; i++){
futures[i] = executor.submit(workers[i]);
}
for (int i = 0; i < workerSize; i++){
futures[i].get();
}
System.out.println("Finished executing all");
executor.shutdown();
}
}
While testing concurrency, I found something unexpected.
Concurrency was controlled using ConcurrentHashMap and AtomicLong.
public class HumanRepository {
private final static Map<Long, Human> STORE = new ConcurrentHashMap<>();
private AtomicLong sequence = new AtomicLong();
public void save(Human human) {
STORE.put(sequence.incrementAndGet(), human);
}
public int size() {
return STORE.size();
}
public Long getSeq() {
return sequence.get();
}
}
I tested saving in multiple threads.
@Test
void name() throws NoSuchMethodException, InterruptedException {
final int threads = 3_500;
final ExecutorService es = Executors.newFixedThreadPool(threads);
final CountDownLatch count = new CountDownLatch(threads);
final HumanRepository repository = new HumanRepository();
for (int i = 0; i < threads; i++) {
try {
es.execute(() -> repository.save(new Human("aa")));
} finally {
count.countDown();
}
}
count.await();
System.out.println("seq = " + repository.getSeq());
System.out.println("size = " + repository.size());
}
I ran it with 3500 threads simultaneously. I expected both seq and size to be 3500.
But sometimes I get seq=3499, size=3500.
That's weird: seq is not 3500, and it makes no sense to me that seq is 3499 while size is 3500.
I don't know why seq and the number of entries in the map differ, or why seq is not 3500.
** If I add Thread.sleep(400L); after count.await();, surprisingly, seq is 3500.
You are not actually waiting for all tasks to complete, which means that if you get the 3500/3500 output, it's by chance.
Specifically, you decrement the countdown latch on the main thread right after scheduling the job, instead of inside the job once it's done. That makes your CountDownLatch just a glorified loop variable that doesn't do any inter-thread communication. Try something like this instead:
for (int i = 0; i < threads; i++) {
es.execute(() -> {
repository.save(new Human("aa"));
count.countDown();
});
}
You are calling count.countDown() outside the thread that executes HumanRepository.save(), so the main thread is not synchronized with the completion of the worker threads.
You may therefore see the result of repository.getSeq() while a thread is still running. Can you try the following code?
final int threads = 3_500;
final ExecutorService es = Executors.newFixedThreadPool(threads);
final CountDownLatch count = new CountDownLatch(threads);
final HumanRepository repository = new HumanRepository();
for (int i = 0; i < threads; i++) {
try {
es.execute(() -> {
repository.save(new Human("aa"));
count.countDown();
});
} finally {
}
}
count.await();
System.out.println("seq = " + repository.getSeq());
System.out.println("size = " + repository.size());
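A self-contained variant of the same fix, as a sketch: the latch is counted down inside the task, in a finally block so that a failing save cannot leave the main thread waiting forever. LatchSketch and runOnce are hypothetical names, a plain String stands in for Human, and the pool size of 8 is an arbitrary choice (the latch, not the pool size, is what makes the result deterministic).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

class LatchSketch {
    // Runs `tasks` saves on a small pool and returns {seq, size} read after the latch opens.
    static long[] runOnce(int tasks) throws InterruptedException {
        Map<Long, String> store = new ConcurrentHashMap<>();
        AtomicLong sequence = new AtomicLong();
        ExecutorService es = Executors.newFixedThreadPool(8); // arbitrary pool size
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            es.execute(() -> {
                try {
                    store.put(sequence.incrementAndGet(), "aa");
                } finally {
                    done.countDown(); // counted down by the worker, even if the save throws
                }
            });
        }
        done.await(); // returns only after every task has finished its save
        es.shutdown();
        return new long[] { sequence.get(), store.size() };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] r = runOnce(3_500);
        System.out.println("seq = " + r[0] + ", size = " + r[1]);
    }
}
```

Because await() returns only once all 3500 countDown() calls have happened, both values are read after the last save and are equal on every run.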
I'm currently trying to increase the performance of my software by implementing the producer-consumer pattern. In my particular case I have a producer that sequentially creates Rows and multiple consumers that perform some task for a given batch of rows.
The problem I'm facing now is that when I measure the performance of my Producer-Consumer pattern, I can see that the producer's running time massively increases and I don't understand why this is the case.
So far I mainly profiled my code and did micro-benchmarking yet the results did not lead me to the actual problem.
public class ProdCons {
static class Row {
String[] _cols;
Row() {
_cols = Stream.generate(() -> "Row-Entry").limit(5).toArray(String[]::new);
}
}
static class Producer {
private static final int N_ITER = 8000000;
final ExecutorService _execService;
final int _batchSize;
final Function<Row[], Consumer> _f;
Producer(final int batchSize, final int nThreads, Function<Row[], Consumer> f) throws InterruptedException {
_execService = Executors.newFixedThreadPool(nThreads);
_batchSize = batchSize;
_f = f;
// init all threads to exclude their generation time
startThreads();
}
private void startThreads() throws InterruptedException {
List<Callable<Void>> l = Stream.generate(() -> new Callable<Void>() {
@Override
public Void call() throws Exception {
Thread.sleep(10);
return null;
}
}).limit(4).collect(Collectors.toList());
_execService.invokeAll(l);
}
long run() throws InterruptedException {
final long start = System.nanoTime();
int idx = 0;
Row[] batch = new Row[_batchSize];
for (int i = 0; i < N_ITER; i++) {
batch[idx++] = new Row();
if (idx == _batchSize) {
_execService.submit(_f.apply(batch));
batch = new Row[_batchSize];
idx = 0;
}
}
final long time = System.nanoTime() - start;
_execService.shutdownNow();
_execService.awaitTermination(100, TimeUnit.MILLISECONDS);
return time;
}
}
static abstract class Consumer implements Callable<String> {
final Row[] _rowBatch;
Consumer(final Row[] data) {
_rowBatch = data;
}
}
static class NoOpConsumer extends Consumer {
NoOpConsumer(Row[] data) {
super(data);
}
@Override
public String call() throws Exception {
return null;
}
}
static class SomeConsumer extends Consumer {
SomeConsumer(Row[] data) {
super(data);
}
@Override
public String call() throws Exception {
String res = null;
for (int i = 0; i < 1000; i++) {
res = "";
for (final Row r : _rowBatch) {
for (final String s : r._cols) {
res += s;
}
}
}
return res;
}
}
public static void main(String[] args) throws InterruptedException {
final int nRuns = 10;
long totTime = 0;
for (int i = 0; i < nRuns; i++) {
totTime += new Producer(100, 1, (data) -> new NoOpConsumer(data)).run();
}
System.out.println("Avg time with NoOpConsumer:\t" + (totTime / 1000000000d) / nRuns + "s");
totTime = 0;
for (int i = 0; i < nRuns; i++) {
totTime += new Producer(100, 1, (data) -> new SomeConsumer(data)).run();
}
System.out.println("Avg time with SomeConsumer:\t" + (totTime / 1000000000d) / nRuns + "s");
}
}
Since the consumers run in different threads than the producer, I would expect the running time of the producer not to be affected by the consumers' workload. However, running the program I get the following output:
#1 Thread, #100 batch size
Avg time with NoOpConsumer: 0.7507254368s
Avg time with SomeConsumer: 1.5334749871s
Note that the time measurement covers only the production time, not the consumer time, and that not submitting any jobs at all takes ~0.6 s on average.
Even more surprising is that when I increase the number of threads from 1 to 4, I get the following results (4-cores with hyperthreading).
#4 Threads, #100 batch size
Avg time with NoOpConsumer: 0.7741189636s
Avg time with SomeConsumer: 2.5561667638s
Am I doing something wrong? What am I missing? Currently I have to believe that the running time differences are due to context switches or anything related to my system.
Threads are not completely isolated from one another.
It looks like your SomeConsumer class allocates a lot of memory, and this produces garbage collection work that is shared between all threads, including your producer thread.
It also accesses a lot of memory, which can knock the memory used by the producer out of L1 or L2 cache. Accessing real memory takes a lot longer than accessing cache, so this can make your producer take longer as well.
Note also that I didn't actually verify that you're measuring the producer time properly, and it's easy to make mistakes there.
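One likely source of that allocation is the res += s inside SomeConsumer's nested loops: each concatenation creates a new String that copies everything accumulated so far. As a hedged illustration (ConcatSketch and both method names are hypothetical, and whether this removes the producer slowdown on the asker's machine has not been measured), the same result can be built in a single growable buffer:

```java
import java.util.Arrays;

class ConcatSketch {
    // Mirrors the inner loops of SomeConsumer: every += allocates a new String.
    static String concatNaive(String[][] rows) {
        String res = "";
        for (String[] row : rows) {
            for (String s : row) {
                res += s;
            }
        }
        return res;
    }

    // Same result, but appends into one growable buffer.
    static String concatBuilder(String[][] rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            for (String s : row) {
                sb.append(s);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[][] rows = new String[100][5]; // 100 rows of 5 columns, as in the question
        for (String[] row : rows) {
            Arrays.fill(row, "Row-Entry");
        }
        System.out.println(concatNaive(rows).equals(concatBuilder(rows))); // prints "true"
    }
}
```

Less garbage per consumer task means less collector work competing with the producer's own Row allocations.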
Let's say I have n threads concurrently taking values from a shared queue:
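public class WorkerThread implements Runnable {
    private final BlockingQueue<Object> queue;
    private final ArrayList<Integer> counts = new ArrayList<>();
    private int count = 0;

    public WorkerThread(BlockingQueue<Object> queue) {
        this.queue = queue;
    }

    public void run() {
        try {
            while (true) {
                queue.take(); // BlockingQueue has take(), not pop()
                count++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and exit
        }
    }
}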
Then, for each thread, I want to record every 5 seconds how many items it has dequeued and store that number in the thread's own list (counts).
I've seen in "Print "hello world" every X seconds" how you can run some code every x seconds:
Timer t = new Timer();
t.scheduleAtFixedRate(new TimerTask(){
@Override
public void run(){
counts.add(count);
count = 0;
}
}, 0, 5000);
The problem with this is that I can't access count variable and the list of counts unless they are static. But I don't want them to be static because I don't want the different threads to share those variables.
Any ideas of how to handle this?
I don't think it's possible to use scheduled execution in your case (neither Timer nor ScheduledExecutorService), because each new scheduled invocation would create a new task with its own while loop, so the number of tasks would keep growing.
If you don't need to access the list of counts while the tasks are running, I would suggest something like this:
static class Task implements Runnable {
private final ThreadLocal<List<Integer>> counts = ThreadLocal.withInitial(ArrayList::new);
private volatile List<Integer> result = new ArrayList<>();
private BlockingQueue<Object> queue;
public Task(BlockingQueue<Object> queue) {
this.queue = queue;
}
@Override
public void run() {
int count = 0;
long start = System.nanoTime();
try {
while (!Thread.currentThread().isInterrupted()) {
queue.take();
count++;
long end = System.nanoTime();
if ((end - start) >= TimeUnit.SECONDS.toNanos(1)) {
counts.get().add(count);
count = 0;
start = end;
}
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
// the last value
counts.get().add(count);
// copy the result cause it's not possible
// to access thread local variable outside of this thread
result = counts.get();
}
public List<Integer> getCounts() {
return result;
}
}
public static void main(String[] args) throws Exception {
ExecutorService executorService = Executors.newFixedThreadPool(3);
BlockingQueue<Object> blockingQueue = new LinkedBlockingQueue<>();
Task t1 = new Task(blockingQueue);
Task t2 = new Task(blockingQueue);
Task t3 = new Task(blockingQueue);
executorService.submit(t1);
executorService.submit(t2);
executorService.submit(t3);
for (int i = 0; i < 50; i++) {
blockingQueue.add(new Object());
Thread.sleep(100);
}
// unlike shutdown() interrupts running threads
executorService.shutdownNow();
executorService.awaitTermination(1, TimeUnit.SECONDS);
System.out.println("t1 " + t1.getCounts());
System.out.println("t2 " + t2.getCounts());
System.out.println("t3 " + t3.getCounts());
int total = Stream.concat(Stream.concat(t1.getCounts().stream(), t2.getCounts().stream()), t3.getCounts().stream())
.reduce(0, (a, b) -> a + b);
// 50 as expected
System.out.println(total);
}
Why not a static AtomicLong?
Or the WorkerThread(s) could publish how many items they popped to the TimerTask or somewhere else, and the TimerTask reads that info?
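The "publish to the timer" idea can be sketched without any static state: each worker owns an AtomicInteger, and a separate scheduled task periodically swaps it to zero and records the old value. CountingWorker, snapshot() and getCounts() are hypothetical names, and the 200 ms period in main is shortened from the question's 5 seconds only to keep the demo quick.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CountingWorker implements Runnable {
    private final BlockingQueue<Object> queue;
    private final AtomicInteger count = new AtomicInteger();           // incremented by the worker thread
    private final List<Integer> counts = new CopyOnWriteArrayList<>(); // appended by the scheduler thread

    CountingWorker(BlockingQueue<Object> queue) {
        this.queue = queue;
    }

    @Override
    public void run() {
        try {
            while (true) {
                queue.take();
                count.incrementAndGet();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop when interrupted
        }
    }

    // Atomically reads the counter, resets it to zero, and records the value read.
    void snapshot() {
        counts.add(count.getAndSet(0));
    }

    List<Integer> getCounts() {
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
        CountingWorker worker = new CountingWorker(queue);
        Thread thread = new Thread(worker);
        thread.start();

        // One scheduled task per worker; it never runs the worker's loop itself.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(worker::snapshot, 200, 200, TimeUnit.MILLISECONDS);

        for (int i = 0; i < 10; i++) {
            queue.add(new Object());
            Thread.sleep(50);
        }
        scheduler.shutdownNow();
        thread.interrupt();
        System.out.println(worker.getCounts());
    }
}
```

Because getAndSet(0) is a single atomic step, an increment that races with a snapshot lands either in the interval being recorded or in the next one, never in neither.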
Though there are similar questions, I couldn't find any example like mine. I would really appreciate help understanding where my implementation goes wrong.
What I'm trying to do:
I have a main class, Driver, which can instantiate an unknown number of threads. Each thread calls a singleton class that simulates a 'fake' file transfer.
The issue is that I need to limit the number of concurrent transfers to 2, regardless of the number of concurrent requests.
I tried to solve this by adding each new thread to a ConcurrentLinkedQueue and managing it with Executors.newFixedThreadPool(POOL_SIZE) to limit the concurrent threads to 2. On every iteration, I submit a new thread to the pool using pool.submit.
The problem is that my output looks like this:
[Thread1], [Thread1, Thread2], [Thread1, Thread2, Thread3]...
While it should be:
[Thread1, Thread2], [Thread3, Thread4]
Why doesn't the limit work here?
My implementation:
Copier - this is my singleton class.
public class Copier {
private final int POOL_SIZE = 2;
private static volatile Copier instance = null;
private Queue<Reportable> threadQuere = new ConcurrentLinkedQueue();
private static FileCopier fileCopier = new FileCopier();
private Copier() {
}
public static Copier getInstance() {
if (instance == null) {
synchronized (Copier.class) {
if (instance == null) {
instance = new Copier();
}
}
}
return instance;
}
public void fileTransfer(Reportable reportable) {
threadQuere.add(reportable);
ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
for (int i=0; i < threadQuere.size(); i++) {
System.out.println("This is the " + (i+1) + " thread");
pool.submit(new CopyThread());
}
pool.shutdown();
try {
pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
CopyThread - represents a thread class
public class CopyThread implements Reportable, Runnable {
private static FileCopier fileCopier = new FileCopier();
@Override
public void report(String bitrate) {
System.out.println(bitrate);
}
@Override
public void run() {
synchronized(fileCopier) {
long startTime = System.nanoTime();
long bytes = fileCopier.copyFile();
long endTime = System.nanoTime();
double duration = (double)(endTime - startTime) / 1000000000; // get in seconds
double bytesInMegas = (double) bytes / 1000000;
report(bytesInMegas + "MB were transferred in " + duration + " seconds");
}
}
}
Driver - my main class, where I create all the threads
public class Driver {
public static void main(String[] args) {
Copier copier = Copier.getInstance();
CopyThread copyThread1 = new CopyThread();
CopyThread copyThread2 = new CopyThread();
CopyThread copyThread3 = new CopyThread();
CopyThread copyThread4 = new CopyThread();
copier.fileTransfer(copyThread1);
copier.fileTransfer(copyThread2);
copier.fileTransfer(copyThread3);
copier.fileTransfer(copyThread4);
int q = 0;
}
}
A simpler solution would be a Semaphore with 2 permits.
This makes sure that "outside" threads can't bypass the limit either, since your solution expects that the simultaneous tasks are limited by the size of the threadpool.
Your solution uses several concurrency tools when a single one would suffice. Your DCL singleton is a bit outdated too.
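A minimal sketch of the semaphore approach, under stated assumptions: SemaphoreSketch, transfer() and peakConcurrency() are hypothetical names, Thread.sleep stands in for the file copy, and the running/maxRunning counters exist only so the demo can report the peak.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreSketch {
    private static final Semaphore PERMITS = new Semaphore(2); // at most 2 transfers at once
    private static final AtomicInteger running = new AtomicInteger();
    private static final AtomicInteger maxRunning = new AtomicInteger();

    // Stand-in for the real copy; any thread may call it, the semaphore enforces the limit.
    static void transfer() throws InterruptedException {
        PERMITS.acquire();
        int now = running.incrementAndGet();
        maxRunning.accumulateAndGet(now, Math::max); // record the peak for the demo
        try {
            Thread.sleep(20); // simulated file transfer
        } finally {
            running.decrementAndGet();
            PERMITS.release(); // always give the permit back
        }
    }

    // Starts `callers` concurrent callers and returns the peak number of simultaneous transfers.
    static int peakConcurrency(int callers) throws InterruptedException {
        maxRunning.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(callers);
        for (int i = 0; i < callers; i++) {
            pool.execute(() -> {
                try {
                    transfer();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return maxRunning.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak concurrent transfers: " + peakConcurrency(8));
    }
}
```

The pool here deliberately has as many threads as callers, so the only thing holding the peak at 2 is the semaphore.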
Everything is probably fine here (although a bit weird). You are printing the thread numbers before submitting the tasks. Put the print in the run method instead, and you will see that the limit works. The prints in fileTransfer all fire immediately because that code runs on the calling thread and has nothing to do with the executor. There are more problems with your code, but I assume you wrote it this way just for testing/learning.
In that case, like I said, put the prints in the run method (you can use a static variable in the CopyThread class to count threads). Your output will probably be two prints with thread numbers (1 and 2), two prints about how long each transfer took, and then the prints for threads 3 and 4 (I say probably because, with threads, you can't be sure of the ordering). All of this happens at step 4, of course, when fileTransfer submits 4 runnables.
Your singleton is outdated: it uses double-checked locking, which was broken before Java 5 and is only correct here because the field is volatile. That's not ruining your program, so worry about it later.
About everything else (weird queue usage, fileTransfer creating new thread pools, etc.): as I said, it's probably for learning. If it's not, the queue may as well be deleted, since you only use it for counting, which a plain counter variable could do. And fileTransfer should just submit one new runnable to a pool held in an instance variable, instead of creating a pool and submitting several runnables each time, which is counterintuitive.
Edit: check this. I put everything in Cat.java for simplicity and changed a few things I had to change (I don't have the FileCopier class, etc.), but the answer to your problem is here:
import java.util.*;
import java.util.concurrent.*;
class Copier {
private final int POOL_SIZE = 2;
private static volatile Copier instance = null;
private Copier() {
}
public static Copier getInstance() {
if (instance == null) {
synchronized (Copier.class) {
if (instance == null) {
instance = new Copier();
}
}
}
return instance;
}
public void fileTransfer() {
ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE);
for (int i=0; i < 4; i++) {
pool.submit(new CopyThread());
}
pool.shutdown();
try {
pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
class CopyThread implements Runnable {
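// AtomicInteger because several pool threads increment the counter concurrently
private static final java.util.concurrent.atomic.AtomicInteger counter = new java.util.concurrent.atomic.AtomicInteger();
public void report(String bitrate) {
System.out.println(bitrate);
}
Object obj = new Object();
@Override
public void run() {
synchronized(obj) {
System.out.println("This is the " + counter.incrementAndGet() + " thread");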
long startTime = System.nanoTime();
long bytes = 0;
for(int i=0; i<100000; i++)
bytes+=1;
long endTime = System.nanoTime();
double duration = (double)(endTime - startTime) / 1000000000; // get in seconds
double bytesInMegas = (double) bytes / 1000000;
report(bytesInMegas + "MB were transferred in " + duration + " seconds");
}
}
}
public class Cat {
public static void main(String[] args) {
Copier copier = Copier.getInstance();
copier.fileTransfer();
}
}
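The two suggestions from the answer text (a simpler singleton, and a single pool kept in an instance variable) can be combined in one sketch. SharedPoolCopier is a hypothetical name; the singleton uses the initialization-on-demand holder idiom, and the pool threads are made daemon threads, which is an assumption of this sketch so that a pool that is never shut down does not keep the JVM alive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SharedPoolCopier {
    private static final int POOL_SIZE = 2;

    // Holder idiom: the JVM initializes Holder lazily and thread-safely on first use.
    private static final class Holder {
        static final SharedPoolCopier INSTANCE = new SharedPoolCopier();
    }

    // One pool for the lifetime of the singleton; it caps concurrent transfers at POOL_SIZE.
    private final ExecutorService pool = Executors.newFixedThreadPool(POOL_SIZE, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // assumption of this sketch: idle workers must not block JVM exit
        return t;
    });

    private SharedPoolCopier() {
    }

    static SharedPoolCopier getInstance() {
        return Holder.INSTANCE;
    }

    // Queues one transfer and returns immediately; the Future lets the caller wait if needed.
    Future<?> fileTransfer(Runnable transfer) {
        return pool.submit(transfer);
    }

    public static void main(String[] args) throws Exception {
        SharedPoolCopier copier = SharedPoolCopier.getInstance();
        List<Future<?>> pending = new ArrayList<>();
        for (int i = 1; i <= 4; i++) {
            final int id = i;
            pending.add(copier.fileTransfer(() ->
                    System.out.println("transfer " + id + " on " + Thread.currentThread().getName())));
        }
        for (Future<?> f : pending) {
            f.get(); // wait for each transfer to finish
        }
    }
}
```

Each call to fileTransfer now adds exactly one task to the shared queue, so four calls from Driver result in four transfers, at most two of them running at a time.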
I am working on a classification problem and have implemented a grid search to find the best accuracy. The program takes about 2 hours to run, and I tried to reduce that by using threads. I am obviously doing something wrong, since the execution time stayed the same after introducing the threads. The algorithm is below.
I should mention that this is the first time I am using threads. I have read good things about Executors, but I can't figure out how to use them.
public static void gridSearch(Dataset ds)
{
double bestAcc = 0;
for (int i = -5; i < 15; i++) {
double param1 = Math.pow(2, i);
for (int j = -15; j < 3; j++) {
double param2 = Math.pow(2, j);
int size = 10;
CrossValidation[] works = new CrossValidation[size];
Thread[] threads = new Thread[size];
for (int k=1;k<=size;k++) {
CrossValidation po = new CrossValidation(param1, param2, ds);
works[k-1] = po;
Thread t = new Thread(po);
threads[k-1] = t;
t.start();
}
for (int k = 0; k < size; k++) {
try { threads[k].join(); } catch (InterruptedException ex) {}
double accuracy = works[k].getAccuracy();
accuracy /= 106;
if (accuracy > bestAcc)
bestAcc = accuracy;
}
}
}
System.out.println("Best accuracy: " + bestAcc);
}
The CrossValidation class implements Runnable and has a getAccuracy method that returns the accuracy.
Please help me figure out what I am doing wrong so that I can improve the execution time.
Your problem seems to be that you start 10 threads for each parameter setting instead of one thread per parameter setting. Look closely at what you're doing here: you generate param1 and param2 and then start 10 threads that all work with those same parameters, redundantly. Then you wait for those threads to finish before you start over.
But no worries, I have prepared something for you ...
I want to show you how a thread pool can do what you actually want to achieve here. It will be easier to understand once you get it running.
You can download the whole example here.
First you need a WorkerThread and something like CVResult to return the results. The worker is where you perform the CrossValidation algorithm:
public static class CVResult {
public double param1;
public double param2;
public double accuracy;
}
public static class WorkerThread implements Runnable {
private double param1;
private double param2;
private double accuracy;
public WorkerThread(double param1, double param2){
this.param1 = param1;
this.param2 = param2;
}
@Override
public void run() {
System.out.println(Thread.currentThread().getName() +
" [parameter1] " + param1 + " [parameter2]: " + param2);
processCommand();
}
private void processCommand() {
try {
Thread.sleep(500);
/*
* ### PERFORM YOUR CROSSVALIDATION ALGORITHM HERE ###
*/
this.accuracy = this.param1 + this.param2;
// Give back result:
CVResult result = new CVResult();
result.accuracy = this.accuracy;
result.param1 = this.param1;
result.param2 = this.param2;
Main.addResult(result);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
You also need access to an ExecutorService and a List<Future>. The ExecutorService takes care of your threads; we set the pool size to the number of cores your CPU has available. This ensures that no more threads run than there are cores, yet no task gets lost, because each task is queued and starts after another has finished. You'll see that soon. The List<Future> lets us wait for all tasks to finish before continuing on the main thread. The List<CVResult> holds the results added by the worker threads (note that it is synchronized, since multiple threads access it).
private static ExecutorService executor = null;
private static List<Future> futures = new ArrayList<>();
private static List<CVResult> resultList = Collections.synchronizedList(new ArrayList<CVResult>());
This is what your gridSearch() would look like. You don't have to initialize the executor here; you can do that wherever you want, of course:
public static void gridSearch(/*Dataset ds*/)
{
double bestAcc = 0;
int cores = Runtime.getRuntime().availableProcessors();
executor = Executors.newFixedThreadPool(cores);
for (int i = -5; i < 15; i++) {
double param1 = Math.pow(2, i);
for (int j = -15; j < 3; j++) {
double param2 = Math.pow(2, j);
Runnable worker = new WorkerThread(param1, param2);
futures.add(executor.submit(worker));
}
}
System.out.println("Waiting for all threads to terminate ..");
// Joining all threads in order to wait for all to finish
// before returning from gridSearch()
for (Future future: futures) {
try {
future.get(100, TimeUnit.SECONDS);
} catch (Throwable cause) {
// process cause
}
}
System.out.println("Printing results ..");
for(CVResult result : resultList) {
System.out.println("Acc: " + result.accuracy +
" for param1: " + result.param1 +
" | param2: " + result.param2);
}
}
Last but not least here is a synchronized method to add your results to the list:
public static void addResult(CVResult accuracy) {
synchronized( resultList ) {
resultList.add(accuracy);
}
}
If you call this in your main e.g. like this:
public static void main(String[] args) {
gridSearch(/* params */);
System.out.println("All done.");
}
You'll get an output like this:
...
pool-1-thread-5 [parameter1] 0.0625 [parameter2]: 3.0517578125E-5
param1 0.03125
param2 1.0
pool-1-thread-4 [parameter1] 0.0625 [parameter2]: 0.25
param1 0.0625
param2 0.03125
...
Printing results ..
...
Acc: 16384.5 for param1: 16384.0 | param2: 0.5
Acc: 16386.0 for param1: 16384.0 | param2: 2.0
...
All done.
Possibly thread creation/teardown overhead is eating up the time saved by running in parallel; fix this by using Executors, which will help you get started. As commented already, your processor may also not have enough hardware threads or physical cores to execute your threads concurrently.
More importantly, you wait between each of the inner-loop iterations (j from -15 to 3). To fix this, move the waiting and result processing to after the loops, once everything has been submitted. That way, the last 10 threads do not need to complete before the next batch starts. Additionally, I recommend using a CountDownLatch to await full completion before processing the results.
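"Submit everything first, wait once at the end" might look like the sketch below. It uses ExecutorService.invokeAll in place of the CountDownLatch suggested above, since invokeAll already blocks until every task is done. GridSearchSketch and bestScore are hypothetical names, and the score function is a made-up placeholder for the real cross-validation run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.DoubleBinaryOperator;

class GridSearchSketch {
    // Evaluates every (2^i, 2^j) pair once on a fixed pool and returns the best score.
    // `score` is a placeholder for the real cross-validation run.
    static double bestScore(DoubleBinaryOperator score) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        try {
            List<Callable<Double>> tasks = new ArrayList<>();
            for (int i = -5; i < 15; i++) {
                final double param1 = Math.pow(2, i);
                for (int j = -15; j < 3; j++) {
                    final double param2 = Math.pow(2, j);
                    tasks.add(() -> score.applyAsDouble(param1, param2));
                }
            }
            double best = Double.NEGATIVE_INFINITY;
            // invokeAll blocks until all tasks are done, so the waiting happens once, at the end.
            for (Future<Double> f : pool.invokeAll(tasks)) {
                best = Math.max(best, f.get());
            }
            return best;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Toy score that peaks at param1 = 8, param2 = 0.25 (made up for the demo).
        double best = bestScore((p1, p2) -> 1.0 / (1.0 + Math.abs(p1 - 8) + Math.abs(p2 - 0.25)));
        System.out.println("Best score: " + best);
    }
}
```

Returning the score from a Callable also removes the need for a shared result list: each Future carries its own result back to the thread that picks the maximum.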