Lock-free solution while updating the map and reading it - java

I am trying to measure how much time each thread takes to insert into a database. I have captured all those performance numbers in a ConcurrentHashMap named histogram, which records how much time each thread takes for its insert.
Below is the code in which I measure how much time each thread takes and store it in the ConcurrentHashMap:
class Task implements Runnable {
    public static ConcurrentHashMap<Long, AtomicLong> histogram = new ConcurrentHashMap<Long, AtomicLong>();

    @Override
    public void run() {
        try {
            long start = System.nanoTime();
            preparedStatement.executeUpdate(); // flush the records.
            long end = System.nanoTime() - start;
            final AtomicLong before = histogram.putIfAbsent(end / 1000000, new AtomicLong(1L));
            if (before != null) {
                before.incrementAndGet();
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
So my question is: will the way I am measuring how much time each thread takes, and storing all those numbers in a ConcurrentHashMap, be thread safe or not?
I think my whole update operation is atomic. If it is not, I would like to see whether there is a better approach; I am looking for a mostly lock-free solution.
Then, after every thread has finished executing its tasks, I print this histogram map from the main method, since I have made the map static. Is this way right or not?
public class LoadTest {
public static void main(String[] args) {
//executing all the threads using ExecutorService
//And then I am printing out the histogram that got created in Task class
System.out.println(Task.histogram);
}
}

Your code is correct; there is also a (more complex) idiom that avoids instantiating an AtomicLong on every call. Note, however, that a "naïve" lock-based solution would probably be just as good, given the very short duration of the critical section.
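One version of that idiom, sketched below under the assumption of a free-standing `record` method: try a plain `get()` first and only allocate a new AtomicLong on a miss, so the common path allocates nothing. (On Java 8+, `histogram.computeIfAbsent(bucket, k -> new AtomicLong()).incrementAndGet()` achieves the same in one line.)

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class HistogramIdiom {
    static final ConcurrentHashMap<Long, AtomicLong> histogram =
            new ConcurrentHashMap<Long, AtomicLong>();

    // Lock-free update: allocate an AtomicLong only when the bucket is new.
    static void record(long bucket) {
        AtomicLong counter = histogram.get(bucket);
        if (counter == null) {
            AtomicLong fresh = new AtomicLong(0L);
            AtomicLong raced = histogram.putIfAbsent(bucket, fresh);
            // If another thread won the race to create the bucket, use its counter.
            counter = (raced != null) ? raced : fresh;
        }
        counter.incrementAndGet();
    }

    public static void main(String[] args) {
        record(5); record(5); record(7);
        System.out.println(histogram.get(5L).get()); // prints 2
        System.out.println(histogram.get(7L).get()); // prints 1
    }
}
```

Both branches end at `incrementAndGet()` on whichever AtomicLong actually made it into the map, so no update is ever lost.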

Related

Thread safety with Java static inner class field

I have a class something like this:
public class Outer {
    public static final TaskUpdater TASK_UPDATER = new TaskUpdater() {
        public void doSomething(Task task) {
            // uses and modifies task and some other logic
        }
    };

    public void taskRelatedMethod() {
        // some logic
        TASK_UPDATER.doSomething(new Task());
        // some other logic
    }
}
I've noticed some strange behaviour when running this in a multi-threaded environment that I can't reproduce locally, and I suspect it's a threading issue. Is it possible for two instances of Outer to somehow interfere with each other by both calling doSomething on TASK_UPDATER? Each will be passing a different instance of Task into the doSomething method.
Is it possible for two instances of Outer to somehow interfere with each other by both calling doSomething on TASK_UPDATER?
The answer is "it depends". Any time you have multiple threads sharing the same object instances, you may have concurrency issues. In your case, you have multiple instances of Outer sharing the same static TaskUpdater instance. This in itself is not a problem; however, if TaskUpdater has any fields, they will be shared by the threads. If the threads modify those fields in any way, then data synchronization, and possibly blocking around critical code sections, needs to happen. If the TaskUpdater only reads and operates on the Task argument, which seems to be per Outer instance, then there is no problem.
For example, you could have a task updater like:
public static final TaskUpdater TASK_UPDATER = new TaskUpdater() {
    public void doSomething(Task task) {
        int total = 0;
        for (Job job : task.getJobs()) {
            total += job.getSize();
        }
        task.setTotalSize(total);
    }
};
In this case, the updater is only changing the Task instance passed in. It can use local variables without a problem because those are on the stack and not shared between threads. This is thread safe.
However consider this updater:
public static final TaskUpdater TASK_UPDATER = new TaskUpdater() {
    private long total = 0;

    public void doSomething(Task task) {
        for (Job job : task.getJobs()) {
            // race condition and memory synchronization issues here
            total += job.getSize();
        }
    }

    public long getTotal() {
        return total;
    }
};
In this case, both threads will be updating the same total field on the shared TaskUpdater. This is not thread safe, since you have race conditions around the += (which is really 3 operations: get, add, set) as well as memory synchronization issues. One thread may have a cached version of total which is 5 and increment it to 6, while another thread has already incremented its cached version of total to 10.
When threads share common fields you need to protect those operations and worry about synchronization in terms of mutex access and memory publishing. In this case, making total an AtomicLong would be in order:
private AtomicLong total = new AtomicLong(0);
...
total.addAndGet(job.getSize());
AtomicLong wraps a volatile long, so the memory is published appropriately to all threads, and it performs atomic compare-and-set operations internally, which removes the race conditions.
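Putting that fix together, a minimal runnable sketch (Job here is a hypothetical stand-in for the real class; records require Java 16+):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class SafeUpdater {
    // Hypothetical minimal stand-in for the real Job class.
    record Job(long size) {}

    static final AtomicLong total = new AtomicLong(0);

    static void doSomething(List<Job> jobs) {
        for (Job job : jobs) {
            total.addAndGet(job.size()); // atomic read-modify-write: no lost updates
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Two threads each add 2 + 3 = 5 to the shared counter.
        Runnable work = () -> doSomething(List.of(new Job(2), new Job(3)));
        Thread t1 = new Thread(work), t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(total.get()); // prints 10: no updates are lost
    }
}
```

Either way, prefer keeping the updater stateless when you can; the AtomicLong is only needed because the field is shared.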

10 threads write to a single hash simultaneously

Sorry for the question, I am just stuck at the end of the day.
I need to test 10 threads writing to the same hash (it is really not a hash but a very similar thing; I need to prove its synchronization for writes).
Is this the right code?
Random rn = new Random();
Map<Integer, Integer> hash = new MyHashMap<Integer, Integer>();
for (int i = 0; i < 10; i++) {
    Thread th = new MyAddingThread();
    th.start();
}

public class MyAddingThread extends Thread {
    public void run() {
        hash.put(rn.nextInt(), rn.nextInt());
    }
}
Maybe it would be better to change 10 to 100, but I have no idea how to test that hash for synchronization.
HashMap is not thread-safe. Use a ConcurrentHashMap instead.
EDIT
If your real question is whether your code will allow you to test whether any given data structure is thread-safe or not, there really is no reliable way to do so. Multi-thread development can introduce any number of bugs that can be extremely difficult to detect. A data structure is either designed to be thread-safe, or it is not.
That won't work, as all threads should try to add the item to the hash at the same time.
To do that, you need to use a CountDownLatch
In main:
CountDownLatch startSignal = new CountDownLatch(1);
for (int i = 0; i < 10; i++) {
    Thread th = new MyAddingThread();
    th.start();
}
startSignal.countDown();
In the thread:
public class MyAddingThread extends Thread {
    public void run() {
        try {
            startSignal.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        hash.put(rn.nextInt(), rn.nextInt());
    }
}
The javadoc of this class has a similar (but more complex) example.
Also, if you want to test this properly, create the same number of threads as cores in your computer, and each thread should loop and insert at least a few hundred items. If you only insert 10 elements, there is very little chance that you'll hit a concurrency problem.
A HashMap is not synchronized, so you have to handle that on your own!
You may synchronize the access:
public class MyAddingThread extends Thread {
    public void run() {
        synchronized (hash) {
            hash.put(rn.nextInt(), rn.nextInt());
        }
    }
}
but then you lose most of the concurrent execution.
Another idea is to use a ConcurrentHashMap:
ConcurrentMap<Integer,Integer> hash = new ConcurrentHashMap<>();
public class MyAddingThread extends Thread {
    public void run() {
        hash.put(rn.nextInt(), rn.nextInt());
    }
}
This would give more performance due to less blocking code.
The other problem is the use of Random here, which is thread safe but shared between threads, so it can become a contention point and hurt performance.
Which solution is best for your problem I cannot tell from your little pseudocode sample.
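Combining the suggestions above (a CountDownLatch for a simultaneous start, a ConcurrentHashMap as the shared map, one thread per core, and many inserts per thread), a runnable sketch might look like the following; the key ranges are disjoint per thread, so every put must survive if the map is thread-safe:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MapStressTest {
    public static void main(String[] args) throws InterruptedException {
        final int threads = Runtime.getRuntime().availableProcessors();
        final int perThread = 10_000;
        final ConcurrentMap<Integer, Integer> map = new ConcurrentHashMap<>();
        final CountDownLatch startSignal = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // disjoint key range per thread
            pool.submit(() -> {
                try {
                    startSignal.await(); // all threads begin inserting at once
                    for (int i = 0; i < perThread; i++) {
                        map.put(base + i, i);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        startSignal.countDown();
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        // With a plain HashMap this check would often fail (lost puts or worse).
        System.out.println(map.size() == threads * perThread);
    }
}
```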

How to use multithreading effectively

I want to do a task that I've already completed except this time using multithreading. I have to read a lot of data from a file (line by line), grab some information from each line, and then add it to a Map. The file is over a million lines long so I thought it may benefit from multithreading.
I'm not sure about my approach here since I have never used multithreading in Java before.
I want the main method to do the reading, then hand each line that has been read to another thread that will format a String, which in turn gives it to another thread to put into a map.
public static void main(String[] args) {
    //Some information read from file
    BufferedReader br = null;
    String line = "";
    try {
        br = new BufferedReader(new FileReader("somefile.txt"));
        while ((line = br.readLine()) != null) {
            // Pass line to another task
        }
        // Here I want to get a total from B, but I'm not sure how to go about doing that
    } catch (IOException e) {
        e.printStackTrace();
    }
}
public class Parser extends Thread {
    private Mapper m1;
    // Some reference to B

    public Parser(Mapper m) {
        m1 = m;
    }

    public void parse(String s, int i) {
        // Do some work on s
        String key = DoSomethingWithString(s);
        m1.add(key, i);
    }
}
public class Mapper extends Thread {
    private SortedMap<String, Integer> sm;
    private String key;
    private int value;
    boolean hasNewItem;

    public Mapper() {
        sm = new TreeMap<String, Integer>();
        hasNewItem = false;
    }

    public void add(String s, int i) {
        hasNewItem = true;
        key = s;
        value = i;
    }

    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            if (hasNewItem) {
                // Find if street name exists in map
                sm.put(key, value);
                hasNewItem = false;
            }
        }
        // I'm not sure how to give the Map back to main.
    }
}
I'm not sure if I am taking the right approach. I also do not know how to terminate the Mapper thread and retrieve the map in the main. I will have multiple Mapper threads but I have only instantiated one in the code above.
I also just realized that my Parser class is not really a thread, but only another class, since it does not override the run() method, so I am thinking that the Parser class should feed some sort of queue.
Any ideas? Thanks.
EDIT:
Thanks for all of the replies. It seems that since I/O will be the major bottleneck, there would be little efficiency benefit from parallelizing this. However, for demonstration purposes, am I on the right track? I'm still a bit bothered by not knowing how to use multithreading.
Why do you need multiple threads? You only have one disk and it can only go so fast. Multithreading almost certainly won't help in this case, and if it does, the improvement will be minimal from a user's perspective. Multithreading isn't your problem; reading from a huge file is your bottleneck.
Frequently I/O will take much longer than the in-memory tasks. We refer to such work as I/O-bound. Parallelism may have a marginal improvement at best, and can actually make things worse.
You certainly don't need a different thread to put something into a map. Unless your parsing is unusually expensive, you don't need a different thread for it either.
If you had other threads for these tasks, they might spend most of their time sitting around waiting for the next line to be read.
Even parallelizing the I/O won't necessarily help, and may hurt. Even if your CPUs support parallel threads, your hard drive might not support parallel reads.
EDIT:
All of us who commented on this assumed the task was probably I/O-bound -- because that's frequently true. However, from the comments below, this case turned out to be an exception. A better answer would have included the fourth comment below:
Measure the time it takes to read all the lines in the file without processing them. Compare that to the time it takes to both read and process them. That gives you a loose upper bound on how much time you could save, and the saving may be reduced further by the new cost of thread synchronization.
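That measurement might be sketched as follows; `process` is a hypothetical stand-in for the real per-line work, and the temp file is only generated so the sketch is self-contained:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

public class IoBaseline {
    static volatile int sink; // keeps the processing from being optimized away

    // Hypothetical stand-in for the real parsing work.
    static int process(String line) { return line.length(); }

    static long timeRead(Path file, boolean alsoProcess) throws IOException {
        long start = System.nanoTime();
        try (BufferedReader br = Files.newBufferedReader(file)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (alsoProcess) sink += process(line);
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        // Use the given file, or generate a throwaway one for demonstration.
        Path file = args.length > 0 ? Path.of(args[0])
                : Files.write(Files.createTempFile("lines", ".txt"),
                              Collections.nCopies(100_000, "some,input,line"));
        long readOnly = timeRead(file, false);
        long readAndProcess = timeRead(file, true);
        // The difference is a loose upper bound on what threading could save.
        System.out.printf("read: %dms, read+process: %dms%n",
                readOnly / 1_000_000, readAndProcess / 1_000_000);
    }
}
```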
You may wish to read about Amdahl's Law. Since the majority of your work is strictly serial (the I/O), you will get negligible improvement by multi-threading the remainder. Certainly not worth the cost of writing watertight multi-threaded code.
Perhaps you should look for a new toy-example to parallelise.

Thread safety in multithreaded access to LinkedList

My application needs to keep an access log of requests to a certain resource and multiple threads will be recording log entries. The only pertinent piece of information is the timestamp of the request and the stats being retrieved will be how many requests occurred in the last X seconds. The method that returns the stats for a given number of seconds also needs to support multiple threads.
I was thinking of approaching the concurrency handling using the Locks framework, with which I am not the most familiar, hence this question. Here is my code:
import java.util.LinkedList;
import java.util.concurrent.locks.ReentrantLock;
public class ConcurrentRecordStats
{
private LinkedList<Long> recLog;
private final ReentrantLock lock = new ReentrantLock();
public ConcurrentRecordStats()
{
this.recLog = new LinkedList<Long>();
}
//this method will be utilized by multiple clients concurrently
public void addRecord(int wrkrID)
{
long crntTS = System.currentTimeMillis();
this.lock.lock();
this.recLog.addFirst(crntTS);
this.lock.unlock();
}
//this method will be utilized by multiple clients concurrently
public int getTrailingStats(int lastSecs)
{
long endTS = System.currentTimeMillis();
long bgnTS = endTS - (lastSecs * 1000L);
int rslt = 0;
//acquire the lock only until we have read
//the first (latest) element in the list
this.lock.lock();
for(long crntRec : this.recLog)
{
//release the lock upon fetching the first element in the list
if(this.lock.isLocked())
{
this.lock.unlock();
}
if(crntRec > bgnTS)
{
rslt++;
}
else
{
break;
}
}
return rslt;
}
}
My questions are:
Will this use of ReentrantLock ensure thread safety?
Is it needed to use a lock in getTrailingStats?
Can I do all this using synchronized blocks? The reason I went with locks is that I wanted the same lock in both the read and write sections, so that both writes and reads of the first element in the list (the most recently added entry) are done by a single thread at a time, and I couldn't do that with just synchronized.
Should I use the ReentrantReadWriteLock instead?
The locks can present a major performance bottleneck. An alternative is to use a ConcurrentLinkedDeque: use offerFirst to add a new element, and use the (weakly consistent) iterator (that won't throw a ConcurrentModificationException) in place of your for-each loop. The advantage is that this will perform much better than your implementation or than the synchronizedList implementation, but the disadvantage is that the iterator is weakly consistent - thread1 might add elements to the list while thread2 is iterating through it, which means that thread2 won't count those new elements. However, this is functionally equivalent to having thread2 lock the list so that thread1 can't add to it - either way thread2 isn't counting the new elements.
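A sketch of that alternative, keeping the original method names but swapping the LinkedList and ReentrantLock for a ConcurrentLinkedDeque:

```java
import java.util.concurrent.ConcurrentLinkedDeque;

public class ConcurrentRecordStats {
    private final ConcurrentLinkedDeque<Long> recLog = new ConcurrentLinkedDeque<>();

    public void addRecord() {
        recLog.offerFirst(System.currentTimeMillis()); // lock-free insert at head
    }

    public int getTrailingStats(int lastSecs) {
        long bgnTS = System.currentTimeMillis() - lastSecs * 1000L;
        int rslt = 0;
        // Weakly consistent iterator: never throws ConcurrentModificationException,
        // but may miss entries added after iteration began.
        for (long crntRec : recLog) {
            if (crntRec > bgnTS) rslt++;
            else break; // entries are newest-first, so we can stop early
        }
        return rslt;
    }

    public static void main(String[] args) {
        ConcurrentRecordStats stats = new ConcurrentRecordStats();
        stats.addRecord();
        stats.addRecord();
        System.out.println(stats.getTrailingStats(10)); // prints 2
    }
}
```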

Database insertion is taking zero seconds to insert

I am trying to measure the performance of Database Insert. So for that I have written a StopWatch class which will reset the counter before executeUpdate method and calculate the time after executeUpdate method is done.
And I am trying to see how much time each thread is taking, so I am keeping those numbers in a ConcurrentHashMap.
Below is my main class-
public static void main(String[] args) {
    final int noOfThreads = 4;
    final int noOfTasks = 100;
    final AtomicInteger id = new AtomicInteger(1);
    ExecutorService service = Executors.newFixedThreadPool(noOfThreads);
    for (int i = 0; i < noOfTasks * noOfThreads; i++) {
        service.submit(new Task(id));
    }
    service.shutdown(); // without this, isTerminated() never becomes true
    while (!service.isTerminated()) {
    }
    //printing the histogram
    System.out.println(Task.histogram);
}
Below is the class that implements Runnable, in which I am trying to measure each thread's performance in inserting into the database, meaning how much time each thread takes to insert:
class Task implements Runnable {
    private final AtomicInteger id;
    private StopWatch totalExecTimer = new StopWatch(Task.class.getSimpleName() + ".totalExec");
    public static ConcurrentHashMap<Long, AtomicLong> histogram = new ConcurrentHashMap<Long, AtomicLong>();

    public Task(AtomicInteger id) {
        this.id = id;
    }

    @Override
    public void run() {
        try {
            dbConnection = getDBConnection();
            preparedStatement = dbConnection.prepareStatement(Constants.INSERT_ORACLE_SQL);
            //other preparedStatement setup
            totalExecTimer.resetLap();
            preparedStatement.executeUpdate();
            totalExecTimer.accumulateLap();
            final AtomicLong before = histogram.putIfAbsent(totalExecTimer.getCumulativeTime() / 1000, new AtomicLong(1L));
            if (before != null) {
                before.incrementAndGet();
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
Below is the StopWatch class
/**
* A simple stop watch.
*/
protected static class StopWatch {
private final String name;
private long lapStart;
private long cumulativeTime;
public StopWatch(String _name) {
name = _name;
}
/**
* Resets lap start time.
*/
public void resetLap() {
lapStart = System.currentTimeMillis();
}
/**
* Accumulates the lap time and return the current lap time.
*
* #return the current lap time.
*/
public long accumulateLap() {
long lapTime = System.currentTimeMillis() - lapStart;
cumulativeTime += lapTime;
return lapTime;
}
/**
* Gets the current cumulative lap time.
*
* #return
*/
public long getCumulativeTime() {
return cumulativeTime;
}
public String getName() {
return name;
}
#Override
public String toString() {
StringBuilder sb = new StringBuilder();
sb.append(name);
sb.append("=");
sb.append((cumulativeTime / 1000));
sb.append("s");
return sb.toString();
}
}
After running the above program, I can see that 400 rows got inserted. But when it prints the histogram, I only see this:
{0=400}
which would mean all 400 calls came back in 0 seconds? That is surely not possible.
I am just trying to see how much time each thread takes to insert a record, store those numbers in a Map, and print that map from the main thread.
I assume the problem is happening because of a thread-safety issue, and that is why zero is being stored in the Map whenever resetLap runs, I guess.
If yes, how can I avoid this problem? Also, is it required to pass the histogram map from the main thread to the constructor of Task? I need to print that Map after all the threads are finished to see what numbers are there.
Update:
If I remove the divide-by-1000 and store the numbers as milliseconds, I can see values other than zero, so that looks good.
But one more thing I found is that the numbers are not consistent: if I sum up each thread's time, I get one number, and I am also printing the total time in which the whole program finishes. When I compare these two numbers, they differ by a big margin.
To avoid concurrency issues with your stopwatch you're probably better off creating a new one as a local variable within the run method of your Runnable. That way each thread has its own stopwatch.
As for the timing you're seeing, I would absolutely hope that a simple record insert would happen in well under a second. Seeing 400 inserts that all happen in less than a second each doesn't surprise me at all. You may get better results by using the millisecond value from your stopwatch as your HashMap key.
Update
For the stopwatch concurrency problem I'm suggesting something like this:
class Task implements Runnable {
private final AtomicInteger id;
// Remove the stopwatch from here
//private StopWatch totalExecTimer = new StopWatch(Task.class.getSimpleName() + ".totalExec");
public static ConcurrentHashMap<Long, AtomicLong> histogram = new ConcurrentHashMap<Long, AtomicLong>();
public Task(AtomicInteger id) {
this.id = id;
}
@Override
public void run() {
// And add it here
StopWatch totalExecTimer = new StopWatch(Task.class.getSimpleName() + ".totalExec");
dbConnection = getDBConnection();
In this way each thread, indeed each Task, gets its own copy, and you don't have to worry about concurrency. Making the StopWatch thread-safe as-is is probably more trouble than it's worth.
Update 2
Having said that then the approach you mentioned in your comment would probably give better results, as there's less overhead in the timing mechanism.
To answer your question about the difference between the cumulative thread time and the total running time of the program, I would glibly say, "What did you expect?".
There are two issues here. One is that you're not measuring the total running time of each thread, just the bit where you're doing the DB insert.
The other is that measuring the running time of the whole application does not take into account any overlap in the execution times of the threads. Even if you were measuring the total time of each task, and assuming you're running on a multi-core machine, I would expect the cumulative time to be more than the elapsed time of the program's execution. That's the benefit of parallel programming.
As an additional note, System.currentTimeMillis() is coarse-grained and has a level of inaccuracy. Using System.nanoTime() is a more accurate approach:
long start = System.nanoTime();
// ... the work being timed ...
long end = System.nanoTime();
long timeInSeconds = TimeUnit.NANOSECONDS.toSeconds(end - start);
For a number of reasons, currentTimeMillis is apt to not "refresh" its value on every call. You should use nanoTime for high-resolution measurements.
And your code is throwing away fractions of a second. Your toString method should use sb.append((cumulativeTime / 1000.0)); so that you get fractional seconds.
But the overhead of your timing mechanism is substantial, and if you ever do measure something, a big chunk of the measured time will just be timing overhead. It's much better to time a batch of operations rather than just one.
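Pulling that advice together, a sketch of the timing loop using nanoTime and millisecond buckets (`insert` here is a hypothetical stand-in for the real executeUpdate call):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class NanoHistogram {
    static final ConcurrentHashMap<Long, AtomicLong> histogram =
            new ConcurrentHashMap<Long, AtomicLong>();

    // Hypothetical stand-in for the real database insert.
    static void insert() { /* ... */ }

    static void timedInsert() {
        long start = System.nanoTime();
        insert();
        long elapsedNanos = System.nanoTime() - start;
        // Bucket by milliseconds, not seconds, so sub-second inserts are visible.
        long bucket = TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
        AtomicLong before = histogram.putIfAbsent(bucket, new AtomicLong(1L));
        if (before != null) before.incrementAndGet();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 400; i++) timedInsert();
        // Every sample lands in exactly one bucket, so the counts sum to 400.
        long total = histogram.values().stream().mapToLong(AtomicLong::get).sum();
        System.out.println(total);
    }
}
```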
