How to make writing method thread safe? - java

I have multiple threads calling one method that writes the contents of an object to a file, as shown below.
When I test this method with one thread, the output in my file is as expected. With multiple threads, however, the output is garbled. How can I make this thread safe?
void (Document doc, BufferedWriter writer){
Map<Sentence, Set<Matrix>> matrix = doc.getMatrix();
for(Sentence sentence : matrix.keySet()){
Set<Matrix> set = doc.getMatrix(sentence);
for(Matrix matrix : set){
List<Result> results = ResultGenerator.getResult();
writer.write(matrix, matrix.frequency());
writer.write(results.toString());
writer.write("\n");
}
}
}
Edit:
I added this line List<Result> results = ResultGenerator.getResult(). What I really want is to use multiple threads to process this method call, since this part is expensive and takes a lot of time. The writing part is very quick, I don't really need multiple threads.
Given this change, is there a way to make this method call safe in concurrent environment?

Essentially, you are limited by the single file at the end. There are no global variables and the method publishes nothing, so the method is thread safe.
But if processing does take a lot of time, you can use parallel streams and publish the results to a ConcurrentHashMap or a blocking queue. You would, however, still have a single consumer writing to the file.
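A minimal sketch of the blocking-queue variant described above, assuming a fixed pool of four workers. The class name, the POISON marker, and expensiveResult (a stand-in for ResultGenerator.getResult()) are hypothetical names introduced for illustration only.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class SingleWriterPipeline {
    // Hypothetical end-of-input marker, compared by identity.
    private static final String POISON = new String("<<end>>");

    // Stand-in for the expensive ResultGenerator.getResult() call.
    static String expensiveResult(int input) {
        return "result-" + (input * 10);
    }

    static void run(List<Integer> inputs, Writer out) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // The single consumer: the only thread that ever touches the writer.
        Thread consumer = new Thread(() -> {
            try {
                for (String line = queue.take(); line != POISON; line = queue.take()) {
                    out.write(line + "\n");
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Producers: run the expensive part in parallel and publish to the queue.
        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (int input : inputs) {
            workers.execute(() -> queue.add(expensiveResult(input)));
        }
        workers.shutdown();
        try {
            workers.awaitTermination(1, TimeUnit.MINUTES);
            queue.add(POISON); // all producers are done, tell the consumer to stop
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter out = new StringWriter();
        run(Arrays.asList(1, 2, 3), out);
        System.out.print(out); // one line per input, in whatever order the workers finished
    }
}
```

Because only the consumer thread writes, each line reaches the file intact, but the order of lines follows completion order rather than input order.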

I am not well versed in Java so I am going to provide a language-agnostic answer.
What you want to do is to transform matrices into results, then format them as string and finally write them all into the stream.
Currently you are writing into the stream as soon as you process each result, so when you add multiple threads to your logic you end up with race conditions on your stream.
You already figured out that only the calls to ResultGenerator.getResult() should be done in parallel, whilst the stream still needs to be accessed sequentially.
Now you only need to put this in practice. Do it in order:
Build a list where each item is what you need to generate a result
Process this list in parallel thus generating all results (this is a map operation). Your list of items will become a list of results.
Now you already have your results so you can iterate over them sequentially to format and write them into the stream.
I suspect Java 8 provides some tools to do all of this in a functional way, but as I said I am not a Java guy, so I cannot provide code samples. I hope this explanation will suffice.
#edit
This sample code in F# explains what I meant.
open System
// This is a pretty long and nasty operation!
let getResult doc =
    Threading.Thread.Sleep(1000)
    doc * 10
// This is writing into stdout, but it could be a stream...
let formatAndPrint =
    printfn "Got result: %O"
[<EntryPoint>]
let main argv =
    printfn "Starting..."
    [| 1 .. 10 |] // A list with some docs to be processed
    |> Array.Parallel.map getResult // Now that's doing the trick
    |> Array.iter formatAndPrint
    0
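For readers who want the same three steps in Java, here is a rough equivalent using Java 8 parallel streams. It is only a sketch: the class name is made up, getResult is a stand-in for the slow operation, and the shorter sleep is arbitrary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ParallelMapThenWrite {
    // Stand-in for the slow operation (getResult in the F# sample).
    static int getResult(int doc) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return doc * 10;
    }

    // Steps 1 and 2: map the inputs to results in parallel.
    // Collecting to a list keeps the encounter order of the source list.
    static List<Integer> processAll(List<Integer> docs) {
        return docs.parallelStream()
                   .map(ParallelMapThenWrite::getResult)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> results = processAll(Arrays.asList(1, 2, 3, 4, 5));
        // Step 3: format and write sequentially, from a single thread.
        for (int r : results) {
            System.out.println("Got result: " + r);
        }
    }
}
```

The writing loop runs only after all results exist, so the stream is never touched by more than one thread.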

If you need the final file in a predetermined sequential order, do not multithread, or you will not get what you expect.
If you think that multithreading will make your program's I/O output faster, you are likely mistaken: because of locking and synchronisation overhead, you will actually get worse performance than with a single thread.
If you are trying to write a very big file, the ordering of Document instances is not relevant, and you think your writer method will hit a CPU bottleneck instead (the only possible cause I can see in your code is the frequency() method call), you can have each thread hold its own BufferedWriter that writes to a temporary file, and then add an additional thread that waits for all of them and generates the final file by concatenation.

If your code is using distinct doc and writer objects, then your method is already thread-safe as it does not access and use instance variables.
If you are passing the same writer object to the method from several threads, you could use one of these approaches, depending on your needs:
void (Document doc, BufferedWriter writer){
Map<Sentence, Set<Matrix>> matrix = doc.getMatrix();
for(Sentence sentence : matrix.keySet()){
Set<Matrix> set = doc.getMatrix(sentence);
for(Matrix matrix : set){
List<Result> results = ResultGenerator.getResult();
// ensure that no other thread interferes while the following
// three .write() statements are executed.
synchronized(writer) {
writer.write(matrix, matrix.frequency()); // from your example, but I doubt it compiles
writer.write(results.toString());
writer.write("\n");
}
}
}
}
or lock-free with using a temporary StringBuilder object:
void (Document doc, BufferedWriter writer){
Map<Sentence, Set<Matrix>> matrix = doc.getMatrix();
StringBuilder sb = new StringBuilder();
for(Sentence sentence : matrix.keySet()){
Set<Matrix> set = doc.getMatrix(sentence);
for(Matrix matrix : set){
List<Result> results = ResultGenerator.getResult();
sb.append(matrix).append(matrix.frequency());
sb.append(results.toString());
sb.append("\n");
}
}
// write everything at once
writer.write(sb.toString());
}

I'd make it synchronized. In that case, only one thread in your application is allowed to call this method at the same time => No messy output. If you have multiple applications running, you should consider something like file locking.
Example for a synchronized method:
public synchronized void myMethod() {
// ...
}
With this, only one thread at a time can execute the method on a given object.

You could lock down a method and then unlock it when you are finished with it. By putting synchronized before a method, you make sure only one thread at a time can execute it. Synchronizing slows down Java, so it should only be used when necessary.
private final ReentrantLock lock = new ReentrantLock();
/* synchronized */
public void run() {
    lock.lock();
    try {
        System.out.print("Hello!");
    } finally {
        // always release the lock, even if the body throws
        lock.unlock();
    }
}
This locks down the method just like synchronized. You can use it instead of synchronized, that's why synchronized is commented out above.

Related

NoSuchElementException occurs when Iterating through Java ArrayList concurrently

I have a method similar to the one below:
public void addSubjectsToCategory() {
final List<Subject> subjectsList = new ArrayList<>(getSubjectList());
for (final Iterator<Subject> subjectIterator =
subjectsList.iterator(); subjectIterator.hasNext();) {
addToCategory(subjectIterator.next().getId());
}
}
When this runs concurrently for the same user (another instance), it sometimes throws NoSuchElementException. As I understand it, subjectIterator.next() sometimes gets executed when there are no elements left in the list. This occurs only when the list is being read. Will method synchronization solve this issue?
The stack trace is:
java.util.NoSuchElementException: null
at java.util.ArrayList$Itr.next(Unknown Source)
at org.cmos.student.subject.category.CategoryManager.addSubjectsToCategory(CategoryManager.java:221)
This stack trace fails at the addToCategory(subjectIterator.next().getId()); line.
The basic rule of iterators is that underlying collection must not be modified while the iterator is being used.
If you have a single thread, there seems to be nothing wrong with this code, as long as getSubjectList() does not return null and neither addToCategory() nor getId() has some strange side effect that modifies subjectsList. Note, however, that you could write the for loop more neatly (for (Subject subject : subjectsList) ...).
Judging by your code, my best guess is that you have another thread which is modifying subjectsList somewhere else. If this is the case, using a SynchronizedList will probably not solve your problem. As far as I know, synchronization only applies to List methods such as add(), remove() etc., and does not lock a collection during iteration.
In this case, adding synchronized to the method will not help either, because the other thread is doing its nasty stuff elsewhere. If these assumptions are true, your easiest and safest way is to make a separate synchronization object (i.e. Object lock = new Object()) and then put synchronized (lock) { ... } around this for loop as well as any other place in your program that modifies the collection. This will prevent the other thread from doing any modifications while this thread is iterating, and vice versa.
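A small sketch of that idea, assuming the list is owned by one class so that every access can go through the same lock object. GuardedSubjects and its methods are hypothetical names, and String stands in for Subject.

```java
import java.util.ArrayList;
import java.util.List;

class GuardedSubjects {
    // One dedicated lock object shared by every reader and writer of the list.
    private final Object lock = new Object();
    private final List<String> subjects = new ArrayList<>();

    // Every modification takes the lock...
    void add(String subject) {
        synchronized (lock) {
            subjects.add(subject);
        }
    }

    // ...and so does the whole iteration, so no writer can interleave with it.
    int totalLength() {
        synchronized (lock) {
            int total = 0;
            for (String s : subjects) {
                total += s.length();
            }
            return total;
        }
    }
}
```

The important part is that the loop and the modifications use the same lock; synchronizing only one side gives no protection.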
subjectIterator.hasNext();) {
--- Imagine a thread switch occurs here, at this point, between the call to hasNext() and next() methods.
addToCategory(subjectIterator.next().getId());
What could happen is the following, assuming you are at the last element in the list:
thread A calls hasNext(), the result is true;
thread switch occurs to thread B;
thread B calls hasNext(), the result is also true;
thread B calls next() and gets the next element from the list; now the list is empty because it was the last one;
thread switch occurs back to thread A;
thread A is already inside the body of the for loop, because this is where it was interrupted; it already called hasNext() earlier, which was true;
so thread A calls next(), which fails now with an exception, because there are no more elements in the list.
So what you have to do in such situations, is to make the operations hasNext and next behave in an atomic way, without thread switches occurring in between.
A simple synchronization on the list solves, indeed, the problem:
public void addSubjectsToCategory() {
final List<Subject> subjectsList = new ArrayList<>(getSubjectList());
synchronized (subjectsList) {
for (final Iterator<Subject> subjectIterator =
subjectsList.iterator(); subjectIterator.hasNext();) {
addToCategory(subjectIterator.next().getId());
}
}
}
Note, however, that there may be performance implications with this approach. No other thread will be able to read or write from/to the same list until the iteration is over (but this is what you want). To solve this, you may want to move the synchronization inside the loop, just around hasNext and next. Or you may want to use more sophisticated synchronization mechanisms, such as read-write locks.
It sounds like another thread is calling the method and grabbing the last element while another thread is about to get the next. So when the other thread finishes and comes back to the paused thread there is nothing left. I suggest using an ArrayBlockingQueue instead of a list. This will block threads when one is already iterating.
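public void addSubjectsToCategory() {
    final List<Subject> source = getSubjectList();
    // ArrayBlockingQueue has no copy-only constructor; it needs a capacity of at least 1
    final ArrayBlockingQueue<Subject> subjectsList =
            new ArrayBlockingQueue<>(Math.max(1, source.size()), false, source);
    for (final Iterator<Subject> subjectIterator =
            subjectsList.iterator(); subjectIterator.hasNext();) {
        addToCategory(subjectIterator.next().getId());
    }
}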
There is a bit of a wrinkle that you may have to sort out. The ArrayBlockingQueue will block if it is empty or full and wait for a thread to either insert something or take something out, respectively, before it will unblock and allow other threads to access.
You can use Collections.synchronizedList(list) if all you need is simple per-call synchronization. But do note that any iteration over the list must happen inside a synchronized block on that list.
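As an illustration, a minimal sketch of iterating a Collections.synchronizedList wrapper while holding the list's own monitor, which is what the Collections Javadoc asks for. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedListIteration {
    // Sums the list while holding the list's own monitor, so add()/remove()
    // calls from other threads have to wait until the loop is finished.
    static int sum(List<Integer> syncList) {
        int total = 0;
        synchronized (syncList) {
            for (int id : syncList) {
                total += id;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> ids = Collections.synchronizedList(new ArrayList<>());
        ids.add(1);
        ids.add(2);
        ids.add(3);
        System.out.println(sum(ids));
    }
}
```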
As I understand it, you are adding elements to a list that may be in the middle of being read.
Imagine the list is empty while your other thread is reading it. Situations like this can lead to your problem: with this approach you can never be sure that an element you are trying to read has already been written to the list.
I was surprised not to see an answer involving the use of a CopyOnWriteArrayList or Guava's ImmutableList so I thought that I would add such an answer here.
Firstly, if your use case is such that you only have a few additions relative to many reads, consider using the CopyOnWriteArrayList to solve the concurrent list traversal problem. Method synchronization could solve your issue, but CopyOnWriteArrayList will likely have better performance if the number of concurrent accesses "vastly" exceeds the number of writes, as per that class's Javadoc.
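A short sketch of why CopyOnWriteArrayList helps with traversal: its iterator works on a snapshot, so modifying the list during the loop does not disturb it. The class name and sample values are made up for the example.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIteration {
    static int demo() {
        List<String> subjects = new CopyOnWriteArrayList<>();
        subjects.add("math");
        subjects.add("physics");

        int visited = 0;
        for (String s : subjects) {
            // With a plain ArrayList this add would make the next iteration step
            // throw ConcurrentModificationException; here each write copies the array.
            subjects.add(s + "-copy");
            visited++;
        }
        // The iterator only saw the two elements present when the loop started.
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each write copies the backing array, which is why this class only pays off when reads greatly outnumber writes.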
Secondly, if your use case is such that you can add everything to your list upfront in a single-threaded manner and only then need to iterate across it concurrently, then consider Guava's ImmutableList class. You accomplish this by first using a standard ArrayList, a LinkedList, or an ImmutableList.Builder. Once your single-threaded data entry is complete, you instantiate your ImmutableList using either ImmutableList.copyOf() or the builder's build() method. If your use case allows for this write/read pattern, this will probably be your most performant option.
Hope that helps.
I would like to make a suggestion that would probably solve your problem, considering that this is a concurrency issue.
If making the method addSubjectsToCategory() synchronized solves your problem, then you have located where your concurrency issue is. It is important to locate where the problem occurs, otherwise the information you provided is useless to us, we can't help you.
IF using synchronized in your method solves your problem, then consider this answer as educational or as a more elegant solution. Otherwise, share the code where you implement your threading environment, so we can have a look.
public synchronized void addSubjectsToCategory(List<Subject> subjectsList) {
    Iterator<Subject> iterator = subjectsList.iterator();
    while (iterator.hasNext())
        addToCategory(iterator.next().getId());
}
or
// This semaphore should be used by all threads. Be careful not to create a
// different semaphore each time.
public static final Semaphore mutex = new Semaphore(1);
public void addSubjectsToCategory(List<Subject> subjectsList) throws InterruptedException {
    mutex.acquire(); // throws InterruptedException if interrupted while waiting
    try {
        Iterator<Subject> iterator = subjectsList.iterator();
        while (iterator.hasNext())
            addToCategory(iterator.next().getId());
    } finally {
        mutex.release(); // give the permit back even if the loop throws
    }
}
Synchronized is clean, tidy and elegant. You have a really small method and creating locks, imho is unnecessary.
Synchronized means that only 1 thread will be able to enter the method at a time. Which means, you should use it only if you want 1 thread active each time.
If you actually need parallel execution, then your problem is not thread-related, but has something to do with the rest of your code, which we can not see.

Is this the correct way of using synchronized in Java?

The for loop shown here is run within a thread. Inside the synchronized block, a thread writes to some file. There are several different files, so the writers are kept in an array. What I want to make sure here is that no two different threads are writing to the same file at the same time. However, they can write to different files. Am I using the correct parameter with the synchronized block?
for(Element e: elements)
{
int i = getWriterIndex(e)
writeri = writers(i)
synchronized(writeri)
{
// Write to corresponding segment
writers(i).write(e)
recordsWritten(i) += 1
}
}
While I think that this would work, I strongly suggest that you avoid using synchronized. The reason is that it is quite common to end up being too strict in your synchronization policy. As others have mentioned, this seems to be a perfect use case for queues.
If you do not want to use queues, in most scenarios (including this one) I would suggest using locks to maintain thread safety (typically ReentrantReadWriteLock). You can find an example here
In your case I would create a single lock per writer and require that the current thread hold the write lock in order to use the writer. (If you are only writing, you might use simple locks instead of ReentrantReadWriteLock.)
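A rough sketch of the one-lock-per-writer idea, using plain ReentrantLock because the example only writes. LockedWriters is a hypothetical class, and java.io.Writer stands in for whatever writer type the question actually uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class LockedWriters {
    private final Writer[] writers;
    private final Lock[] locks; // locks[i] guards writers[i]

    LockedWriters(Writer[] writers) {
        this.writers = writers;
        this.locks = new Lock[writers.length];
        for (int i = 0; i < locks.length; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    // Threads writing to the same index take turns;
    // threads writing to different indexes do not block each other.
    void write(int index, String record) {
        locks[index].lock();
        try {
            writers[index].write(record);
            writers[index].write("\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            locks[index].unlock();
        }
    }
}
```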
Yes, your synchronisation code will work as expected, as long as the synchronised block accesses only the data structure that is used as the lock. In your code, writeri cannot be accessed by multiple threads concurrently, so it is thread-safe.
However, you have to make sure that you are not accessing the variable recordsWritten somewhere else, because then you will have race conditions. Ideally you would lock on that variable as well (in every place you access it), or you can use a java.util.concurrent.atomic class such as AtomicInteger.
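For the counter part, one possible sketch uses AtomicIntegerArray, the array counterpart of AtomicInteger, so that the per-writer counts can be updated and read from any thread without extra locking. RecordCounters is a hypothetical wrapper for the question's recordsWritten array.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

class RecordCounters {
    // One atomic counter per writer, all starting at zero.
    private final AtomicIntegerArray recordsWritten;

    RecordCounters(int writerCount) {
        recordsWritten = new AtomicIntegerArray(writerCount);
    }

    // Atomic replacement for "recordsWritten(i) += 1"; returns the new count.
    int recordWrite(int i) {
        return recordsWritten.incrementAndGet(i);
    }

    int get(int i) {
        return recordsWritten.get(i);
    }
}
```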

ParallelStreams in java

I'm trying to use parallel streams to call an API endpoint to get some data back. I am using an ArrayList<String> and sending each String to a method that uses it in making a call to my API. I have setup parallel streams to call a method that will call the endpoint and marshall the data that comes back. The problem for me is that when viewing this in htop I see ALL the cores on the db server light up the second I hit this method ... then as the first group finish I see 1 or 2 cores light up. My issue here is that I think I am truly getting the result I want ... for the first set of calls only and then from monitoring it looks like the rest of the calls get made one at a time.
I think it may have something to do with the recursion but I'm not 100% sure.
private void generateObjectMap(Integer count){
ArrayList<String> myList = getMyList();
myList.parallelStream().forEach(f -> performApiRequest(f,count));
}
private void performApiRequest(String myString,Integer count){
if(count < 10) {
TreeMap<Integer,TreeMap<Date,MyObj>> tempMap = new TreeMap();
try {
tempMap = myJson.getTempMap(myRestClient.executeGet(myString));
} catch(SocketTimeoutException e) {
count += 1;
performApiRequest(myString,count);
}
...
else {
System.exit(1);
}
}
This seems an unusual use for parallel streams. In general the idea is that you are informing the JVM that the operations on the stream are truly independent and can run in any order, in one thread or in several. The results will subsequently be reduced or collected as part of the stream. The important point to remember here is that side effects are undefined (which is why variables changed in streams need to be final or effectively final) and you shouldn't rely on how the JVM organises execution of the operations.
I can imagine the following being a reasonable usage:
list.parallelStream().map(item -> getDataUsingApi(item))
.collect(Collectors.toList());
Where the api returns data which is then handed to downstream operations with no side effects.
So in conclusion if you want tight control over how the api calls are executed I would recommend you not use parallel streams for this. Traditional Thread instances, possibly with a ThreadPoolExecutor will serve you much better for this.
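A sketch of what that could look like with a fixed-size pool from Executors, which gives explicit control over how many calls run at once. Everything here is an assumption for illustration: the class name, the Function parameter standing in for the REST call, and the choice to retry on any RuntimeException up to ten attempts in a loop rather than by recursion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class PooledApiCalls {
    static final int MAX_ATTEMPTS = 10;

    // Retries in a loop instead of recursing; gives up after MAX_ATTEMPTS failures.
    static String callWithRetry(String key, Function<String, String> api) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            try {
                return api.apply(key);
            } catch (RuntimeException e) {
                last = e;
            }
        }
        throw new IllegalStateException("giving up on " + key, last);
    }

    // Runs at most 'threads' calls at a time and returns results in input order.
    static List<String> fetchAll(List<String> keys, Function<String, String> api, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String key : keys) {
                Callable<String> task = () -> callWithRetry(key, api);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // waits for that call to finish
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the pool size is explicit, the number of concurrent requests stays constant for the whole batch instead of depending on how the stream framework splits the work.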

How should I maintain a cache of values read from a file?

Setup
There is a program running that is performing arbitrary computations and writing a status (an integer value, representing progress) to a file. The integer values can only be incremented.
Now I am developing another application that can (among other things) perform arithmetic operations, e.g., comparisons, on those integer values. The files are constantly being deleted and rewritten by a different program. As such, there is no guarantee that a file exists at any time.
Basically, the application needs to execute something arbitrary, but has a constraint on the other program's progress, i.e., it may only execute something if the other program has done enough work.
Problem
When performing the arithmetic operations, the application should not care about where the integer values come from. Especially, accessing those integer values must not throw an exception. How should I separate all the bad things that can happen when performing io access?
Note that I do not want the execution thread to block until a value can be read from the file. E.g., say the file system dies somehow, then the integer values will not be updated, but the main thread should still continue to work. This desire is driven by the definition of the arithmetic comparison as a predicate, which has exactly two outcomes, true and false, but no third "error"-outcome. That's why I think that the values that are read from the file would need to be cached somehow.
Limitation
Java 1.7, Scala 2.11
Current Approach
I have a solution that looks as if it would work, but I am not sure whether something could go wrong.
The solution is to maintain a cache of those integer values for each file. The core functionality is provided by the getters of the cache, while a separate "updater" thread constantly reads the files and updates the caches.
If an error occurs the producer should take notice (i.e., log the error), but continue to run, because an incomplete computation should not affect subsequent computations.
A minimal example of what I am currently doing would look something like this:
object Application {
def main(args: Array[String]) {
val caches = args.map(filename => new Cache(Paths.get(filename)))
val producer = new Thread(new Updater(caches))
producer.start()
execute(caches)
producer.interrupt()
}
def execute(values: Array[AccessValue]) {
while (values.head.getValue < 5) {/* This should never throw an exception */}
}
class Updater(caches: Array[Cache]) {
def run() {
var interrupted = false
while(!interrupted) {
caches.foreach{cache =>
try {
val input = Files.newInputStream(cache.file)
cache.updateValue(parse(input))
} catch {
case _: InterruptedException =>
interrupted = true
case t: Throwable =>
log.error(t)
/* continue as if nothing happened */
}
}
}
}
def parse(input: InputStream): Int = input.read() /* In reality, some xml parsing */
}
trait AccessValue{
def getValue: Int // should not throw an exception
}
class Cache(val file: Path) extends AccessValue{
private var value = 0
def getValue = value
def updateValue(newValue: Int) { value = newValue }
}
Doing it like this works on a synthetic test setup, but I am wondering whether something bad can happen. Also, if anyone would approach the problem differently, I would be glad to hear how.
Could there be a throwable that could cause other threads to go wild? I am thinking of something like OutOfMemoryError or StackOverflowError. Would I need to handle them differently, or does it not matter because, e.g., the whole application would die anyway?
What would happen if the InterruptedException is thrown outside the try block, or even in the catch block? Is there a better way to terminate a thread?
Must the member value of class Cache be declared volatile? I do not care much about the ordering of reads and write, but the compiler must not "optimize" reading the value away just because it deduces that the value is constant.
There are a lot of different concurrency-related libraries. Would you suggest that I use something other than new Thread(...).start()? If yes, what facility do you suggest? I know of Scala's ExecutionContext and Futures, and Java's Executors class, which provides various static factory methods for thread pools. However, I have never used any of these before and I do not know their advantages and disadvantages. I also stumbled upon the name "Akka", but my guess is that using Akka is overkill for what I want to achieve.
Thank you
I would recommend reading through Oracle's documentation on concurrency.
When one thread writes a value and a different thread reads it, you should always use a synchronized block or declare that value as volatile. Otherwise there is no guarantee that the value written by one thread is visible to the other thread (see Oracle's documentation on establishing a happens-before relationship).
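To make the volatile option concrete, here is a minimal Java sketch of a single cached value (the question's code is Scala, where the field would carry the @volatile annotation instead). CachedValue is a hypothetical stand-in for the Cache class.

```java
class CachedValue {
    // volatile: a write by the updater thread is guaranteed to become visible
    // to any thread that reads the field afterwards.
    private volatile int value = 0;

    // Never blocks and never throws; it just returns the last value that was stored.
    int getValue() {
        return value;
    }

    void updateValue(int newValue) {
        value = newValue;
    }

    public static void main(String[] args) throws InterruptedException {
        CachedValue cache = new CachedValue();
        Thread updater = new Thread(() -> {
            for (int i = 1; i <= 5; i++) {
                cache.updateValue(i);
            }
        });
        updater.start();
        while (cache.getValue() < 5) {
            // busy-wait, like the loop in the question's execute()
        }
        updater.join();
        System.out.println("reached " + cache.getValue());
    }
}
```

Without volatile (or synchronization), the Java memory model would not guarantee that the waiting loop ever observes the updated value.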
An OutOfMemoryError can affect the other threads, because the heap space it refers to is shared among threads. A StackOverflowError would kill only the thread in which it occurs, because each thread has its own stack.
If you do not need some sort of synchronization between the two threads then you probably do not need any Futures or Executors.

what to use in multithreaded environment; Vector or ArrayList

I have this situation:
A web application with about 200 concurrent requests (threads) needs to log something to the local filesystem. I have one class that all threads call, and that class internally stores messages in one list (Vector or ArrayList), which in turn is written to the filesystem.
The idea is to return from the thread's call ASAP so the thread can do its job as fast as possible; what the thread wanted to log can be written to the filesystem later, it is not so crucial.
So that class removes the first element from the list and writes it to the filesystem, while at the same time there are 10 or 20 threads appending new logs to the end of that list.
I would like to use ArrayList since it is not synchronized and therefore thread's calls will last less, question is:
am I risking deadlocks / data loss? Is it better to use Vector since it is thread safe? Is it slower to use Vector?
Actually both ArrayList and Vector are very bad choices here, not because of synchronization (which you would definitely need), but because removing the first element is O(n).
The perfect data structure for your purpose is the ConcurrentLinkedQueue: it offers both thread safety (without using synchronization) and O(1) adding and removing.
Are you limited to a particular (old) Java version? If not, please consider using java.util.concurrent.LinkedBlockingQueue for this kind of thing. It's really worth looking at the java.util.concurrent.* package when dealing with concurrency.
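A minimal sketch of the queue-based logger these answers point to, assuming an unbounded LinkedBlockingQueue and a single writer thread that periodically drains it. AsyncLog and its method names are invented for the example, and the actual file writing is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class AsyncLog {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    // Called by request threads: enqueues in O(1) and returns immediately.
    void log(String message) {
        pending.offer(message);
    }

    // Called by the single writer thread: removes everything queued so far,
    // in FIFO order, ready to be written to the filesystem.
    List<String> drain() {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch);
        return batch;
    }
}
```

A writer thread could also call take() on the queue to block until the next message arrives instead of polling with drain().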
Vector is worse than useless. Don't use it even when using multithreading. A trivial example of why it's bad is to consider two threads simultaneously iterating and removing elements on the list at the same time. The methods size(), get(), remove() might all be synchronized but the iteration loop is not atomic so - kaboom. One thread is bound to try removing something which is not there, or skip elements because the size() changes.
Instead use synchronized() blocks where you expect two threads to access the same data.
private ArrayList myList;
void removeElement(Object e)
{
synchronized (myList) {
myList.remove(e);
}
}
Java 5 provides explicit Lock objects which allow more fine-grained control, such as being able to time out if a resource does not become available within some time period.
private final Lock lock = new ReentrantLock();
private ArrayList myList;
void removeElement(Object e) throws InterruptedException {
    // tryLock with a timeout can be interrupted while waiting
    if (!lock.tryLock(1, TimeUnit.SECONDS)) {
        // Timeout
        throw new SomeException();
    }
    try {
        myList.remove(e);
    }
    finally {
        lock.unlock();
    }
}
There actually is a marginal difference in performance between a synchronized list (Collections.synchronizedList) and a Vector. (http://www.javacodegeeks.com/2010/08/java-best-practices-vector-arraylist.html)
