I know that java.util.BitSet operations are not thread-safe. Does reading and writing to a BitSet from parallel threads cause a permanent (for the current run of the application) loss of information? Or do write operations execute correctly, so that only a concurrent read may return wrong information while later reads return the correct state? In other words: if I synchronize only the write operations and allow writes to run in parallel with reads, can information still be lost permanently?
The only combination that is thread-safe on its own is read vs. read: nothing is written to memory, so it can be accessed from any thread without any problem.
BUT with read vs. write you can get surprises. For example, a read that overlaps a write may return half of the previous value and half of the new value, since the bit field is not updated atomically.
In your question, you accept that a concurrent read/write may return incorrect results to the reader. In that case, how do you know whether the data returned by a read is correct? Read it many times and take an average?
So you have to synchronize your read operations with your write operations too.
EDIT: if you really want to go down the "I don't care if the data is corrupt when reading" road, I suggest you add a CRC to the emitted data, so you can reject the data when the check fails.
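If you do synchronize both sides, a minimal sketch (the class and method names are just illustrative) of a wrapper that guards every read and write with the same lock could look like this:

import java.util.BitSet;

// Illustrative wrapper: every read and write takes the same intrinsic lock,
// so a reader can never observe a half-applied write.
public class SynchronizedBitSet {
    private final BitSet bits = new BitSet();

    public synchronized void set(int index) {
        bits.set(index);
    }

    public synchronized void clear(int index) {
        bits.clear(index);
    }

    public synchronized boolean get(int index) {
        return bits.get(index);
    }
}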
When you're talking about a standard Java library class, then you should go by what the javadoc says.
There are users out there running different JRE versions, from different vendors, on different operating system versions. You can't rely on the behavior of BitSet or any other library class being exactly the same in every environment, but you can rely on it doing whatever the Javadoc says it will do.
if I only synchronize the write operations and allow write operations to run in parallel with read operations, will some information still be lost permanently?
It's highly unlikely that overlapping read operations, or reads overlapping a single write operation, could leave a BitSet (or any other object) in some invalid state. If you think that your application can cope with the incorrect results a read might return, then that might be a reasonable risk to take,
BUT
Are you certain that synchronizing reads causes a performance problem? If you haven't actually measured the performance, and you haven't found that the difference between unsynchronized and synchronized access is the difference between acceptable and unacceptable performance, then why not just synchronize all access?
To do otherwise is called "premature optimization", and more often than not, it's a waste of your own time.
A BitSet is not safe for multithreaded use without external synchronization.
Please read this article for a basic understanding of the readers-writers problem.
https://dzone.com/articles/java-concurrency-read-write-lo
READ vs. READ alone will not cause any issues under concurrency.
READ/WRITE or WRITE/WRITE causes inconsistencies when you access the information concurrently.
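As a sketch of the pattern from the linked article (the wrapper class below is only an illustration, not part of the JDK), a ReentrantReadWriteLock lets many readers proceed in parallel while a writer gets exclusive access:

import java.util.BitSet;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Readers share the read lock; a writer takes the write lock exclusively.
public class ReadWriteBitSet {
    private final BitSet bits = new BitSet();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public void set(int index) {
        lock.writeLock().lock();
        try {
            bits.set(index);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public boolean get(int index) {
        lock.readLock().lock();
        try {
            return bits.get(index);
        } finally {
            lock.readLock().unlock();
        }
    }
}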
Can anyone please explain to me the consequences of mutating a collection in Java that is not thread-safe and is being used by multiple threads?
The results are undefined and somewhat random.
With JDK collections that are designed to fail fast, you might receive a ConcurrentModificationException. This is really the only consequence that is specific to thread safety with collections, as opposed to any other class.
Problems that occur generally with thread-unsafe classes may occur:
The internal state of the collection might be corrupted.
The mutation may appear to be successful, but the changes may not, in fact, be visible to other threads at any given time. They might be invisible at first and become visible later.
The changes might actually be successful under light load, but fail randomly under heavy load with lots of threads in contention.
Race conditions might occur, as was mentioned in a comment above.
There are lots of other possibilities, none of them pleasant. Worst of all, these things tend to most commonly reveal themselves in production, when the system is stressed.
In short, you probably don't want to do that.
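As a small, hedged illustration of the fail-fast behaviour mentioned above, iterating an ArrayList while another thread mutates it will usually (but is not guaranteed to) blow up with a ConcurrentModificationException:

import java.util.ArrayList;
import java.util.List;

public class FailFastDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            list.add(i);
        }

        // One thread keeps mutating the list...
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                list.add(i);
            }
        });
        writer.start();

        // ...while the main thread iterates it.
        try {
            long sum = 0;
            for (int n : list) {
                sum += n;
            }
            System.out.println("Iteration finished, sum = " + sum);
        } catch (RuntimeException e) {
            // Usually a ConcurrentModificationException from the fail-fast iterator,
            // but the behaviour is best-effort and not guaranteed.
            System.out.println("Iteration failed: " + e);
        }

        writer.join();
    }
}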
The most common outcome is it looks like it works, but doesn't work all the time.
This can mean you have a problem which
works on one machine but doesn't on another.
works for a while but something apparently unrelated changes and your program breaks.
If you are not using thread-safe data structures, then whenever you have a bug you don't know whether it's a multi-threading issue or not.
What can happen is:
you rarely/randomly get an error and strange behaviour
your code goes into an infinite loop and stops working (HashMap used to do this)
The only option is to:
limit the amount of state which is shared between threads, ideally none at all.
be very careful about how data is updated.
don't rely on unit tests; you have to understand what the code is doing and be confident it will behave correctly in all possible situations.
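As one concrete illustration of limiting shared state (and of the HashMap remark above), replacing a shared HashMap with a ConcurrentHashMap removes a whole class of these failures, although it does not make arbitrary compound actions atomic. The class below is only a sketch:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedCounts {
    // A plain HashMap here could corrupt itself (or, historically, loop forever)
    // under concurrent writes; ConcurrentHashMap is safe for concurrent use.
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();

    public void record(String key) {
        // merge() performs the read-modify-write atomically,
        // which a separate get() followed by put() would not.
        counts.merge(key, 1, Integer::sum);
    }

    public int count(String key) {
        return counts.getOrDefault(key, 0);
    }
}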
The invariants of the data structure will not be guaranteed.
For example:
If thread 2 does a read while thread 1 is adding to the data structure, thread 1 may consider the element added while thread 2 doesn't yet see that it has been added.
There are plenty of data structures that aren't thread-safe but will still appear to function (i.e. not throw) in a multi-threaded environment, and they might even behave correctly under certain circumstances (for example, if you aren't doing any writes to the data structure).
To fully understand this topic, it is worth exploring the different classes of bugs that occur in concurrent systems; this short document seems like a good start:
http://pages.cs.wisc.edu/~remzi/OSTEP/threads-bugs.pdf
I am learning multithreading, and I have a little question.
When I am sharing some variable between threads (an ArrayList, or something else like a double or float), should it be locked by the same object for both reads and writes? I mean, when one thread is setting the variable's value, can another read it at the same time without any problems? Or should it be locked by the same object, forcing the reader to wait until the value has been changed by the other thread?
All access to shared state must be guarded by the same lock, both reads and writes. A read operation must wait for the write operation to release the lock.
As a special case, if all you would do inside your synchronized blocks amounts to exactly one read or write operation, then you may dispense with the synchronized block and mark the variable as volatile.
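For example (just a sketch, with illustrative names), a stop flag that is only ever read or written in a single step can be volatile instead of being accessed inside synchronized blocks:

public class Worker implements Runnable {
    // Each access is a single read or a single write, so volatile is enough:
    // the write in stop() is guaranteed to become visible to run().
    private volatile boolean running = true;

    public void stop() {
        running = false;          // single write
    }

    @Override
    public void run() {
        while (running) {         // single read per iteration
            // do some work
        }
    }
}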
Short: It depends.
Longer:
There are many "correct answers" for different scenarios (and that makes programming fun).
Does the value being read have to be the latest?
Does a written value have to become visible to all readers?
Do I have to take care of race conditions if two threads write?
Will there be any issue if an old/previous value is read?
What is the correct behaviour?
Does it really need to be correct? (Yes, sometimes you genuinely don't care.)
tl;dr
For example, not all threaded programming needs to be "always correct":
sometimes you trade off correctness for performance (e.g. a log or a progress counter)
sometimes reading an old value is just fine
sometimes you only need eventual correctness (e.g. in map-reduce, no single value is right until everything is done)
in some cases, correctness is mandatory at every moment (e.g. your bank account balance)
in a write-once, read-only setup it doesn't matter
sometimes threads work in groups, which makes the cases more complex
sometimes many small, independent locks run faster, but sometimes one flat global lock is faster
and many, many other possible cases
Here is my suggestion: if you are learning, you should think about "why do I need a lock?", "why does a lock help in DIFFERENT cases?" (not just the samples given in the textbook), and "will it fail, or what could happen, if a lock is missing?"
If all threads are reading, you do not need to synchronize.
If one or more threads are reading and one or more are writing, you will need to synchronize somehow. If the collection is small you can use synchronized: either add a synchronized block around the accesses to the collection, synchronize the methods that access the collection, or use a thread-safe collection (for example, Vector).
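For the small-collection case, a minimal sketch of the synchronized-block approach might look like this (the class and field names are just illustrative):

import java.util.ArrayList;
import java.util.List;

public class Registry {
    private final List<String> names = new ArrayList<>();

    public void add(String name) {
        synchronized (names) {      // writers and readers use the same lock
            names.add(name);
        }
    }

    public boolean contains(String name) {
        synchronized (names) {
            return names.contains(name);
        }
    }
}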
If you have a large collection and you want to allow shared reading but exclusive writing, you need to use a ReadWriteLock. See here for the JavaDoc and an exact description of what you want, with examples:
ReentrantReadWriteLock
Note that this question is pretty common and there are plenty of similar examples on this site.
I'm wondering if there could be any problems while accessing one array with multiple threads but either only reading or only writing.
When the threads write to the array, it wouldn't matter in which order they write; even if they write to the same entry, all threads would write the same value.
For example, if I want to find prime numbers via the Sieve of Eratosthenes:
I create an array of consecutive numbers and set all multiples of prime numbers to 0 using multiple threads.
It wouldn't matter if the thread which strikes off the multiples of two and the thread which strikes off the multiples of 5 set the entry of the number 20 to 0 at the same time or one before or after the other.
So it's not a question of the quality or consistency of the data, but of the technical possibility of doing this without running into any Java errors.
I'm assuming you mean 'without synchronization controls'. The short answer is no.
Synchronization is used for 2 reasons:
Mutual exclusion of data
Communication between threads
Your setup indicates that the first reason isn't really a problem in your case. The algorithm effectively separates the data out so that multiple worker threads won't be using the same data.
However, in order for changes done in one thread to become visible to another thread, you must use synchronization. Without synchronization, the JVM makes no guarantee as to the ordering of writes. Updates that one thread makes may be visible in another thread at any time later, or even never. See Effective Java Item #66, and maybe look at the Java Concurrency in Practice book.
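One way to make the results safely visible without per-element locking (a sketch only, assuming a shared boolean[] and plain worker threads) is to join all the workers before reading, because Thread.join() establishes a happens-before edge between the workers' writes and the reads that follow:

public class SieveSketch {
    public static void main(String[] args) throws InterruptedException {
        int limit = 1_000;
        boolean[] composite = new boolean[limit + 1];

        // One worker per small prime; each worker only ever writes 'true'.
        int[] primes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31};
        Thread[] workers = new Thread[primes.length];
        for (int i = 0; i < primes.length; i++) {
            final int p = primes[i];
            workers[i] = new Thread(() -> {
                for (int m = p * p; m <= limit; m += p) {
                    composite[m] = true;   // overlapping writes all store the same value
                }
            });
            workers[i].start();
        }

        // join() guarantees the workers' writes are visible after this point.
        for (Thread t : workers) {
            t.join();
        }

        int count = 0;
        for (int n = 2; n <= limit; n++) {
            if (!composite[n]) {
                count++;
            }
        }
        System.out.println("Primes up to " + limit + ": " + count);
    }
}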
I don't think it would work since eventually you need to read the variables (to output them, save to disk, etc.). And the read has to be synchronized in order to guarantee correct interthread operation ordering. Remember that without synchronization java only guarantees intrathread operation ordering.
Now, you could say that you don't want to read them at all, but in that case the JVM is free to optimize the whole computation away.
All,
What should be the approach to writing a thread safe program. Given a problem statement, my perspective is:
1 > Start off by writing the code for a single-threaded environment.
2 > Underline the fields that need atomicity and replace them with suitable concurrent classes.
3 > Underline the critical sections and enclose them in synchronized blocks.
4 > Test for deadlocks.
Does anyone have any suggestions on other approaches or improvements to mine? So far I can see myself enclosing most of the code in synchronized blocks, and I am sure this is not correct.
Writing correct multi-threaded code is hard, and there is not a magic formula or set of steps that will get you there. But, there are some guidelines you can follow.
Personally I wouldn't start with writing code for a single threaded environment and then converting it to multi-threaded. Good multi-threaded code is designed with multi-threading in mind from the start. Atomicity of fields is just one element of concurrent code.
You should decide which areas of the code need to be multi-threaded (in a multi-threaded app, typically not everything needs to be thread-safe). Then you need to design how those sections will be thread-safe. The techniques for making one area of the code thread-safe may differ from those used in another area. For example, understanding whether there will be a high volume of reads vs. writes is important and might affect the types of locks you use to protect the data.
Immutability is also a key element of threadsafe code. When elements are immutable (i.e. cannot be changed), you don't need to worry about multiple threads modifying them since they cannot be changed. This can greatly simplify thread safety issues and allow you to focus on where you will have multiple data readers and writers.
Understanding details of concurrency in Java (and details of the Java memory model) is very important. If you're not already familiar with these concepts, I recommend reading Java Concurrency In Practice http://www.javaconcurrencyinpractice.com/.
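As a small example of the immutability point above (the class is purely illustrative), a value object whose fields are all final can be shared freely between threads without any locking:

// Immutable: all fields are final and assigned once in the constructor,
// so instances can be shared between threads without synchronization.
public final class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }

    // "Mutation" returns a new instance instead of changing this one.
    public Point translate(int dx, int dy) {
        return new Point(x + dx, y + dy);
    }
}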
You should use final and immutable fields wherever possible; any other data that you want to change should be updated inside:
synchronized (this) {
// update
}
And remember, sometimes stuff breaks, and when that happens you don't want to prolong the program's execution by trying every possible way to counter it - instead, "fail fast".
As you have asked about "thread-safety" and not concurrent performance, your approach is essentially sound. However, a thread-safe program that relies on synchronisation probably won't scale much in a multi-CPU environment with any level of contention on your structure/program.
Personally I like to try to identify the highest-level state changes and think about how to make them atomic, with each state change moving from one immutable state to another - copy-on-write if you like. The actual write can then be a compare-and-set operation on an atomic variable, a synchronised update, or whatever strategy works/performs best (as long as it safely publishes the new state).
This can be a bit difficult to structure if the new state is quite different (for instance, it requires updates to several fields), but I have seen it very successfully solve the concurrent performance issues of synchronised access.
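A hedged sketch of that copy-on-write idea, using an immutable state object published through an AtomicReference compare-and-set loop (all names here are illustrative):

import java.util.concurrent.atomic.AtomicReference;

public class CopyOnWriteCounter {
    // Immutable snapshot of the whole state.
    private static final class State {
        final long count;
        final long lastUpdatedNanos;

        State(long count, long lastUpdatedNanos) {
            this.count = count;
            this.lastUpdatedNanos = lastUpdatedNanos;
        }
    }

    private final AtomicReference<State> state =
            new AtomicReference<>(new State(0, System.nanoTime()));

    public void increment() {
        while (true) {
            State current = state.get();
            State next = new State(current.count + 1, System.nanoTime());
            // Publish the new immutable state atomically; retry if another
            // thread won the race.
            if (state.compareAndSet(current, next)) {
                return;
            }
        }
    }

    public long count() {
        return state.get().count;
    }
}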
Buy and read Brian Goetz's "Java Concurrency in Practice".
Any variables (memory) accessible by multiple threads potentially at the same time, need to be protected by a synchronisation mechanism.
I read in the App Engine wiki about datastore contention when an entity is written too frequently (more than 5 times in 1 second). The wiki introduces a "shard" approach as a workaround. May I know: if we use Spring @Transactional for this, can it prevent the datastore contention timeout, since the writes are done concurrently?
No, you can't do that. Whether or not you use @Transactional, it will not make the problem go away: you still have one object that you need to keep writing to. The contention limit will remain whatever approach you use.
The answer to this problem is actually deciding what it is you want to do, and how important accuracy is to you. Take the case of a simple counter, which is a common example of this problem. If accuracy is very important to you, then you will have to keep a list of counters and choose one, either sequentially or at random, to write into. If you have ten counters in this list, that gives you ten times more writes per second, even with transactional writes. You do need to write the code that chooses which counter to write to, though.
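The sketch below only illustrates the shard-selection idea in plain Java (it is not App Engine datastore code): each write goes to a randomly chosen counter, and a read sums all of them.

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

public class ShardedCounter {
    private final AtomicLong[] shards;

    public ShardedCounter(int shardCount) {
        shards = new AtomicLong[shardCount];
        for (int i = 0; i < shardCount; i++) {
            shards[i] = new AtomicLong();
        }
    }

    public void increment() {
        // Each write lands on a random shard, spreading the contention out.
        int i = ThreadLocalRandom.current().nextInt(shards.length);
        shards[i].incrementAndGet();
    }

    public long total() {
        // A read sums every shard; the result may lag slightly behind
        // increments that are still in flight.
        long sum = 0;
        for (AtomicLong shard : shards) {
            sum += shard.get();
        }
        return sum;
    }
}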
On the other hand, if you don't require too much precision, you could try writing to memcache very often. The write contention limits are much higher when writing to memcache or incrementing a counter there. You can then write out and reset the counter at a set interval.
When I was on a project that needed to store a lot of individual records in the DB, I found the system could not handle all the concurrent transactions. Instead, I built the object up in memory and then saved it to the DB all at once.