Java unexpected concurrent result

Java unexpected concurrent result - java

While testing concurrency, I found something unexpected.
Concurrency was controlled using concurrentHashMap and AtomicLong.
public class HumanRepository {
private final static Map<Long, Human> STORE = new ConcurrentHashMap<>();
private AtomicLong sequence = new AtomicLong();
public void save(Human human) {
STORE.put(sequence.incrementAndGet(), human);
}
public int size() {
return STORE.size();
}
public Long getSeq() {
return sequence.get();
}
}
I tested saving in multiple threads.
#Test
void name() throws NoSuchMethodException, InterruptedException {
final int threads = 3_500;
final ExecutorService es = Executors.newFixedThreadPool(threads);
final CountDownLatch count = new CountDownLatch(threads);
final HumanRepository repository = new HumanRepository();
for (int i = 0; i < threads; i++) {
try {
es.execute(() -> repository.save(new Human("aa")));
} finally {
count.countDown();
}
}
count.await();
System.out.println("seq = " + repository.getSeq());
System.out.println("size = " + repository.size());
}
I tested it with 3500 threads simultaneously. The result I expected is 3500 for both seq and size.
But sometimes I get seq=3499, size=3500.
That's weird. It is strange that seq does not come out as 3500, and even though the size is 3500, it does not make sense that seq is 3499.
I don't know why the data number and seq in the map are not the same and 3500 is not coming out.
** If you do Thread.sleep(400L); after count.await();, surprisingly, the value of seq is 3500

You are not actually waiting for all tasks to complete. Which means that if you get the 3500/3500 output, it's by chance.
Specifically, you decrease the countdown latch on the main thread after scheduling the job, instead of inside of the job, once it's done. That means your countdownlatch is basically just another glorified loop variable that doesn't do any inter-thread communication. Try something like this instead:
for (int i = 0; i < threads; i++) {
es.execute(() -> {
repository.save(new Human("aa"));
count.countDown();
});
}

You are calling count.countDown() outside the thread executing the HumanRepository.save(). So its possible that the main thread is not synchronized for the completion of the threads.
So you may see the results of repository.getSeq() while one thread is running. Can you try with the following code?
final int threads = 3_500;
final ExecutorService es = Executors.newFixedThreadPool(threads);
final CountDownLatch count = new CountDownLatch(threads);
final HumanRepository repository = new HumanRepository();
for (int i = 0; i < threads; i++) {
try {
es.execute(() -> {
repository.save(new Human("aa"));
count.countDown();
});
} finally {
}
}
count.await();
System.out.println("seq = " + repository.getSeq());
System.out.println("size = " + repository.size());

Related

Why do consumers decrease the producer's performance

I'm currently trying to increase the performance of my software by implementing the producer-consumer pattern. In my particular case I have a producer that sequentially creates Rows and multiple consumers that perform some task for a given batch of rows.
The problem I'm facing now is that when I measure the performance of my Producer-Consumer pattern, I can see that the producer's running time massively increases and I don't understand why this is the case.
So far I mainly profiled my code and did micro-benchmarking yet the results did not lead me to the actual problem.
public class ProdCons {
static class Row {
String[] _cols;
Row() {
_cols = Stream.generate(() -> "Row-Entry").limit(5).toArray(String[]::new);
}
}
static class Producer {
private static final int N_ITER = 8000000;
final ExecutorService _execService;
final int _batchSize;
final Function<Row[], Consumer> _f;
Producer(final int batchSize, final int nThreads, Function<Row[], Consumer> f) throws InterruptedException {
_execService = Executors.newFixedThreadPool(nThreads);
_batchSize = batchSize;
_f = f;
// init all threads to exclude their generaration time
startThreads();
}
private void startThreads() throws InterruptedException {
List<Callable<Void>> l = Stream.generate(() -> new Callable<Void>() {
#Override
public Void call() throws Exception {
Thread.sleep(10);
return null;
}
}).limit(4).collect(Collectors.toList());
_execService.invokeAll(l);
}
long run() throws InterruptedException {
final long start = System.nanoTime();
int idx = 0;
Row[] batch = new Row[_batchSize];
for (int i = 0; i < N_ITER; i++) {
batch[idx++] = new Row();
if (idx == _batchSize) {
_execService.submit(_f.apply(batch));
batch = new Row[_batchSize];
idx = 0;
}
}
final long time = System.nanoTime() - start;
_execService.shutdownNow();
_execService.awaitTermination(100, TimeUnit.MILLISECONDS);
return time;
}
}
static abstract class Consumer implements Callable<String> {
final Row[] _rowBatch;
Consumer(final Row[] data) {
_rowBatch = data;
}
}
static class NoOpConsumer extends Consumer {
NoOpConsumer(Row[] data) {
super(data);
}
#Override
public String call() throws Exception {
return null;
}
}
static class SomeConsumer extends Consumer {
SomeConsumer(Row[] data) {
super(data);
}
#Override
public String call() throws Exception {
String res = null;
for (int i = 0; i < 1000; i++) {
res = "";
for (final Row r : _rowBatch) {
for (final String s : r._cols) {
res += s;
}
}
}
return res;
}
}
public static void main(String[] args) throws InterruptedException {
final int nRuns = 10;
long totTime = 0;
for (int i = 0; i < nRuns; i++) {
totTime += new Producer(100, 1, (data) -> new NoOpConsumer(data)).run();
}
System.out.println("Avg time with NoOpConsumer:\t" + (totTime / 1000000000d) / nRuns + "s");
totTime = 0;
for (int i = 0; i < nRuns; i++) {
totTime += new Producer(100, 1, (data) -> new SomeConsumer(data)).run();
}
System.out.println("Avg time with SomeConsumer:\t" + (totTime / 1000000000d) / nRuns + "s");
}
Actually, since the consumers run in different threads than the producer, I would expect that the running time of the producer is not effected by the Consumer's workload. However, running the program I get the following output
#1 Thread, #100 batch size
Avg time with NoOpConsumer: 0.7507254368s
Avg time with SomeConsumer: 1.5334749871s
Note that the time measurement does only measure the production time and not the consumer time and that not submitting any jobs requires on avg. ~0.6 secs.
Even more surprising is that when I increase the number of threads from 1 to 4, I get the following results (4-cores with hyperthreading).
#4 Threads, #100 batch size
Avg time with NoOpConsumer: 0.7741189636s
Avg time with SomeConsumer: 2.5561667638s
Am I doing something wrong? What am I missing? Currently I have to believe that the running time differences are due to context switches or anything related to my system.

Threads are not completely isolated from one another.
It looks like your SomeConsumer class allocates a lot of memory, and this produces garbage collection work that is shared between all threads, including your producer thread.
It also accesses a lot of memory, which can knock the memory used by the producer out of L1 or L2 cache. Accessing real memory takes a lot longer than accessing cache, so this can make your producer take longer as well.
Note also that I didn't actually verify that you're measuring the producer time properly, and it's easy to make mistakes there.

Keeping a counter with ExecutorService?

I'd like to keep a counter of executed threads, to use in the same threads that I am executing.
The problem here is that although the counter increases, it increases unevenly and from the console output I got this (I have a for loop that executes 5 threads with ExecutorService):
This is a test. N:3
This is a test. N:4
This is a test. N:4
This is a test. N:4
This is a test. N:4
As you can see instead of getting 1,2,3,4,5 I got 3,4,4,4,4.
I assume this is because the for loop is running fast enough to execute the threads, and the threads are fast enough to execute the code requesting for the counter faster than the counter can update itself (does that even make sense?).
Here is the code (it is smaller and there is no meaningful use for the counter):
for (int i = 0; i < 5; i++)
{
Thread thread;
thread = new Thread()
{
public void run()
{
System.out.println("This is test. N: "+aldo );
//In here there is much more stuff, saying it because it might slow down the execution (if that is the culprit?)
return;
}
};
threadList.add(thread);
}
//later
for (int i = 0; i < threadList.size(); i++)
{
executor.execute(threadList.get(i));
aldo = aldo + 1;
}
executor.shutdown();
try
{
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
}
catch (InterruptedException e)
{
}
Yes, aldo the counter ( with a few other lists, I think) are missing from the code (they are very simple).

The best way I know of doing this is by creating a custom thread class with a constructor that passes in a number. The variable holding the number can then be used later for any needed logging. Here is the code I came up with.
public static void main(String[] args) {
class NumberedThread implements Runnable {
private final int number;
public NumberedThread(int number) {
this.number = number;
}
#Override
public void run() {
System.out.println("This is test. N: " + number);
}
}
List<Thread> threadList = new ArrayList<>();
for (int i = 1; i < 6; i++) threadList.add(new Thread(new NumberedThread(i)));
ExecutorService executor = Executors.newFixedThreadPool(10);;
for (Thread thread : threadList) executor.execute(thread);
executor.shutdown();
try {
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
}
catch (InterruptedException ignored) { }
}
You could also use a string object instead if you wanted to name the threads.

aldo is not modified by the tasks in the thread, but instead is modified in the main thread, here:
for (int i = 0; i < threadList.size(); i++) {
executor.execute(threadList.get(i));
//here...
aldo = aldo + 1;
}
Also, since you want a counter that can increase its value in several threads, then you may use an AtomicInteger rather than int.
Your code should look like this:
AtomicInteger aldo = new AtomicInteger(1);
for (int i = 0; i < 5; i++) {
executor.execute( () -> {
System.out.println("This is test. N: " + aldo.getAndIncrement());
});
}

How can I schedule some work in n threads separately

Lets say I have n threads concurrently taking values from a shared queue:
public class WorkerThread implements Runnable{
private BlockingQueue queue;
private ArrayList<Integer> counts = new ArrayList<>();
private int count=0;
public void run(){
while(true) {
queue.pop();
count++;
}
}
}
Then for each thread, I want to count every 5 seconds how many items it has dequeued, and then store it in its own list (counts)
I've seen here Print "hello world" every X seconds how you can run some code every x seconds:
Timer t = new Timer();
t.scheduleAtFixedRate(new TimerTask(){
#Override
public void run(){
counts.add(count);
count = 0
}
}, 0, 5000);
The problem with this is that I can't access count variable and the list of counts unless they are static. But I don't want them to be static because I don't want the different threads to share those variables.
Any ideas of how to handle this?

I don't think it's possible to use scheduled execution for you case(neither Timer nor ScheduledExecutorService), because each new scheduled invocation will create a new tasks with while loop. So number of tasks will increase constantly.
If you don't need to access this list of counts in runtime i would suggest something like this one:
static class Task implements Runnable {
private final ThreadLocal<List<Integer>> counts = ThreadLocal.withInitial(ArrayList::new);
private volatile List<Integer> result = new ArrayList<>();
private BlockingQueue<Object> queue;
public Task(BlockingQueue<Object> queue) {
this.queue = queue;
}
#Override
public void run() {
int count = 0;
long start = System.nanoTime();
try {
while (!Thread.currentThread().isInterrupted()) {
queue.take();
count++;
long end = System.nanoTime();
if ((end - start) >= TimeUnit.SECONDS.toNanos(1)) {
counts.get().add(count);
count = 0;
start = end;
}
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
// the last value
counts.get().add(count);
// copy the result cause it's not possible
// to access thread local variable outside of this thread
result = counts.get();
}
public List<Integer> getCounts() {
return result;
}
}
public static void main(String[] args) throws Exception {
ExecutorService executorService = Executors.newFixedThreadPool(3);
BlockingQueue<Object> blockingQueue = new LinkedBlockingQueue<>();
Task t1 = new Task(blockingQueue);
Task t2 = new Task(blockingQueue);
Task t3 = new Task(blockingQueue);
executorService.submit(t1);
executorService.submit(t2);
executorService.submit(t3);
for (int i = 0; i < 50; i++) {
blockingQueue.add(new Object());
Thread.sleep(100);
}
// unlike shutdown() interrupts running threads
executorService.shutdownNow();
executorService.awaitTermination(1, TimeUnit.SECONDS);
System.out.println("t1 " + t1.getCounts());
System.out.println("t2 " + t2.getCounts());
System.out.println("t3 " + t3.getCounts());
int total = Stream.concat(Stream.concat(t1.getCounts().stream(), t2.getCounts().stream()), t3.getCounts().stream())
.reduce(0, (a, b) -> a + b);
// 50 as expected
System.out.println(total);
}

Why not a static AtomicLong?
Or the WorkerThread(s) can publish that they poped to the TimerTask or somewhere else? And the TimerTask reads that info?

start multiple threads at the same time

For our assignment for class, we have to count the amount of words in a txt file by splitting it into n segments, which we are supposed to be able to set before launching the programm. Each segment should then get its own thread, which counts the words and then stops. At the end, the main thread should collect all the individual word counts and add them together.
This is (part of) what I wrote so far
for (int i = 0; i < segments; i++){
Thread thread = new Thread();
thread.start();
int words = counting(stringarray[i]);
totalwords += words;
long nanos = ManagementFactory.getThreadMXBean().getThreadCpuTime(Thread.currentThread().getId());
System.out.println("This Thread read " + words + " words. The total word count now is " + totalwords +
". The time it took to finish for this thread is " + nanos +".");
System.out.println("Number of active threads from the given thread: " + Thread.activeCount());
}
Now, while this gets the primary job done (counting the words in different threads and adding them to the total), I dont know how to just "leave the thread be" and then add the individual wordcounts together after every thread has done its job.
Additionally, while this is definitely starting multiple threads, it only ever prints out that I have 2, or maybe 3 threads running at a time, even if I split the txt into 100 segments. Is there a way to have them all run at the same time?

The wording of the question suggest that each thread has its own counter, so I would declare a thread class:
public class WordCounter extends Thread {
private String text;
private int count;
public WordCounter(String text) {
this.text = text;
}
public int getCount() {
return count;
}
#Override
public void run() {
count = counting(text);
}
}
and use it as follows:
WordCounter[] threads = new WordCounter[segments];
for (int i = 0; i < segments; ++i) {
threads[i] = new WordCounter(stringarray[i]);
threads[i].start();
}
int total = 0;
for (int i = 0; i < segments; ++i) {
threads[i].join();
total += threads[i].getCount();
}

You may use next code snippet as a basis.
Note, that in case you increment common variable in different threads, this operation has to be thread-safe. That's why AtomicInteger variable is used as a counter
final List<String> segments = new ArrayList<>();
//TODO:Fill segments ... this is up to you
//In case threads will increment same variable it has to be thread-safe
final AtomicInteger worldCount = new AtomicInteger();
//Create Thread for each segment (this is definitely not optimal)
List<Thread> workers = new ArrayList<>(segments.size());
for (int i = 0; i < segments.size(); i++) {
final String segment = segments.get(i);
Thread worker = new Thread(new Runnable() {
#Override
public void run() {
//increment worldCount
worldCount.addAndGet(counting(segment));
}
});
workers.add(worker);
worker.start();
}
//Wait until all Threads are finished
for (Thread worker : workers) {
worker.join();
}
int result = worldCount.get();

Same solutions, but with Executors:
final List<String> segments = new ArrayList<>();
segments.add("seg1");
segments.add("seg2");
segments.add("seg 3");
final AtomicInteger worldCount = new AtomicInteger();
List<Future> workers = new ArrayList<>(segments.size());
ExecutorService executor = Executors.newFixedThreadPool(segments.size());
for (String segment : segments) {
Future<Integer> worker = executor.submit(() -> worldCount.addAndGet(counting(segment)));
workers.add(worker);
}
executor.shutdown();
if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
System.out.println("Still waiting...");
System.exit(0);
}
int result = worldCount.get();
System.out.println("result = " + result);

Throttle the number of calls to a method

In my code I send batch requests to a custom Database.
The response time of the each batch is in milliseconds.
However I have a limitation on the number of batches I can send per second. Max of one batch. In case of additional batches, the batch would be dropped which is not what is desired.
I can use Thread.sleep() for a second so that I would never hit the database with more than one batch per second.
The pseudo code looks like :
createBatch()
sendBatch()
What I am trying to do is limit the number of times sendBatch() is called in a second.
Can I achieve this using any throttling library rather than using Thread.sleep()?

You can use RateLimiter from guava.
see: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/RateLimiter.html

i think this is problem of limiting number resources that can be utilized at a time. Try using pooling technique. in java you can use ExecutorService to do the same. Refer - http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html
here is sample code
class Worker implements Callable<String> {
private int id;
public Worker(int id) {
this.id = id;
System.out.println("im worker = " + id);
}
public String call() throws Exception {
System.out.println("Started some long operation - " + id);
Thread.sleep(1000); // only to simulate long running operation
System.out.println("Fiished long operation - " + id);
return null;
}
}
// main mehtod
public class Main {
private static final Logger logger = LoggerFactory.getLogger(Main.class);
public static void main(String[] args) throws ExecutionException, InterruptedException {
final int poolSize = 100;
final int workerSize = 1000;
ExecutorService executor = Executors.newFixedThreadPool(poolSize);
Future[] futures = new Future[workerSize];
Worker[] workers = new Worker[workerSize];
for(int i = 0; i < workerSize; i++){
workers[i] = new Worker(i);
}
System.out.println("finished creating workers================");
for(int i = 0; i < workerSize; i++){
futures[i] = executor.submit(workers[i]);
}
for (int i = 0; i < workerSize; i++){
futures[i].get();
}
System.out.println("Finished executing all");
executor.shutdown();
}
}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Java unexpected concurrent result - java

Related

Why do consumers decrease the producer's performance

Keeping a counter with ExecutorService?

How can I schedule some work in n threads separately

start multiple threads at the same time

Throttle the number of calls to a method

Categories

Resources