Multi-threads writing same thing to the same file?


I always thought that concurrent threads writing to the same file needed synchronization.
What happens when multiple threads write the same thing to the same file without synchronization? I imagined the output file would be incomplete or corrupted.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
public class Test
{
    public Runnable createLayoutRunnable() {
        return new Runnable() {
            public void run() {
                // try-with-resources closes both streams when the copy is finished
                try (FileInputStream inputStream = new FileInputStream("mov.mp4");
                     FileOutputStream outputStream = new FileOutputStream("mov_co.mp4")) {
                    //IOUtils.copy(inputStream, outputStream);
                    //synchronized ("lock"){
                    int read = 0;
                    byte[] bytes = new byte[1024];
                    while ((read = inputStream.read(bytes)) != -1) {
                        outputStream.write(bytes, 0, read);
                    }
                    //}
                    System.out.println(Thread.currentThread().getName() + " is done");
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        };
    }
    public static void main(String[] args) {
        Test test = new Test();
        //Create Thread Pool for parallel layout
        ExecutorService executor = Executors.newFixedThreadPool(9);
        //Submit 9 identical copy tasks, then wait for them in the current thread
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < 9; i++) {
            futures.add(executor.submit(test.createLayoutRunnable()));
        }
        try {
            for (Future<?> f : futures) {
                f.get();
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        executor.shutdown();
        System.out.println("all done");
    }
}
Surprise! The output movie plays just fine!
How come? Please help!
Edit: First of all, I'm terribly sorry about the confusion. Yes, the code I first posted was synchronized, contrary to what I said. I have commented the synchronization out now. I had been experimenting with the code, found that it doesn't matter whether it is synchronized or not, and was wondering why.

In this particular case, you're writing identical content from an input file to identical locations in an output file. This is what's called an idempotent operation, and it doesn't matter whether you synchronize or not.
If each thread copied a different source file (and you eliminated synchronization), you'd see that either (1) one thread would win, or (2, more likely) you'd get interleaved (corrupted) content.
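A single-threaded sketch can reproduce both outcomes deterministically by interleaving two independent FileOutputStreams on the same file by hand. The class and method names are hypothetical, and a temporary file stands in for the real output. Each stream keeps its own file position, so identical content ends up intact, while different content ends up mixed.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SeparateStreamPositions {
    // Opens two independent FileOutputStreams on the same (temporary) file and
    // interleaves their writes by hand. Each stream has its own file position,
    // so both start writing at offset 0.
    static String interleave(String first, String second) {
        try {
            Path file = Files.createTempFile("interleave", ".bin");
            byte[] a = first.getBytes(StandardCharsets.US_ASCII);
            byte[] b = second.getBytes(StandardCharsets.US_ASCII);
            int half = a.length / 2;
            try (FileOutputStream s1 = new FileOutputStream(file.toFile());
                 FileOutputStream s2 = new FileOutputStream(file.toFile())) {
                s1.write(a, 0, half);               // s1 writes a[0..half) at offset 0
                s2.write(b);                        // s2 writes all of b, also from offset 0
                s1.write(a, half, a.length - half); // s1 continues at its own offset, half
            }
            String result = new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
            Files.delete(file);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(interleave("hello", "hello")); // prints hello
        System.out.println(interleave("AAAA", "BBBB"));   // prints BBAA
    }
}
```

Real threads interleave unpredictably rather than in this fixed order, but the position bookkeeping is the same: the last write to each offset wins, which is harmless only when every writer puts the same byte there.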

In your example, even if you took out the synchronisation, each thread is writing the same content to the same file. Because each thread is using a separate OutputStream (and InputStream) the threads do not interfere with each other's file position. Thus the output is a copy of the input file.
It's analogous to this:
public static int a;
public static int b;
public static int c;
With the threaded code being:
a = 1;
b = 2;
c = 3;
Imagine you have two threads, A and B. The sequence of execution might run as follows, for example:
A sets a = 1;
A sets b = 2;
B sets a = 1;
A sets c = 3;
B sets b = 2;
B sets c = 3;
It doesn't matter how many threads run that sequence nor whether they are synchronised, once they are finished the contents of {a,b,c} will always be {1,2,3} (with some caveats that don't apply when writing to an external file). It's the same with your example copying a file - the contents of the output file are always the same; the exact sequence of execution in the threads doesn't matter.
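A runnable version of the analogy follows (class and method names are hypothetical). One of the caveats the answer mentions is memory visibility between threads; here the main thread calls Thread.join() on every worker, which guarantees that it sees their writes before it reads a, b and c.

```java
public class IdempotentWrites {
    static int a, b, c;

    // Every thread performs the same three assignments.
    static void assign() {
        a = 1;
        b = 2;
        c = 3;
    }

    // Starts n threads that all run assign(), waits for them, and returns
    // the final values. join() makes the workers' writes visible here.
    static int[] runThreads(int n) {
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(IdempotentWrites::assign);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] {a, b, c};
    }

    public static void main(String[] args) {
        int[] r = runThreads(2);
        System.out.println(r[0] + "," + r[1] + "," + r[2]); // prints 1,2,3
    }
}
```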

Multithreaded access does not mean that you will get garbage. It means that the result is unpredictable. Some systems may synchronize certain operations themselves, so the result may look as if the file were accessed under a mutex.
Secondly, your code is synchronized, contrary to what you say. Do you see that synchronized ("lock") section? At first you say that it is not synchronized. Then we look at the code and see that it is synchronized. Then we might think that the "lock" visible to the first thread is a different object from the one another thread sees. But it is the same object, because Java interns string literals: two identical literals such as "lock" and "lock" refer to the same String instance. So one thread makes the full copy under the lock, then the next one comes and makes another full copy. Naturally, playback is uninterrupted.
The only file operations that happen outside the synchronized block are opening and (possibly) closing the files. Try it on Linux; the result may be quite different there.
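A small sketch of the interning point (class name hypothetical): while holding the monitor of the literal "lock", a thread also holds the monitor of any other "lock" literal, because both are the same object, but not that of new String("lock"). This sharing is also why locking on string literals is generally discouraged: unrelated code elsewhere in the JVM that synchronizes on the same literal shares the same monitor.

```java
public class LiteralLock {
    // Synchronizes on the literal "lock", then checks which monitors the
    // current thread holds.
    static boolean[] checkMonitors() {
        synchronized ("lock") {
            boolean sameLiteral = Thread.holdsLock("lock");             // same interned object
            boolean freshString = Thread.holdsLock(new String("lock")); // a distinct object
            return new boolean[] {sameLiteral, freshString};
        }
    }

    public static void main(String[] args) {
        boolean[] r = checkMonitors();
        System.out.println("literal: " + r[0] + ", new String: " + r[1]); // prints literal: true, new String: false
    }
}
```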

Related

Java Concurrency - Read Write Lock Performance

I am trying to understand ReadWriteLock. [ This code will just work in your IDE. Copy & paste. Try to do this yourself ]
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
class ReadWrite {
    private static final ReadWriteLock LOCK = new ReentrantReadWriteLock();
    private static final Lock READ_LOCK = LOCK.readLock();
    private static final Lock WRITE_LOCK = LOCK.writeLock();
    private static final int[] ARR = new int[1];
    int i = 0;
    Integer read() {
        READ_LOCK.lock(); // acquire before try, so unlock() only runs if lock() succeeded
        try {
            return ARR[0];
        } finally {
            READ_LOCK.unlock();
        }
    }
    void write() {
        WRITE_LOCK.lock();
        try {
            ARR[0] = i++;
        } finally {
            WRITE_LOCK.unlock();
        }
    }
}
I was trying to do a performance test.
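import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
public class ReadWriteTest {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger atomicInteger = new AtomicInteger(0);
        ReadWrite rw = new ReadWrite();
        // read 10 million times in each thread
        Runnable r1 = () -> IntStream.rangeClosed(1, 10_000_000).forEach(i -> {
            if (rw.read() > 0)
                atomicInteger.incrementAndGet();
        });
        Runnable r2 = rw::write;
        ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(1);
        Thread[] threads = new Thread[10];
        long before = System.currentTimeMillis();
        scheduledExecutorService.scheduleAtFixedRate(r2, 1, 1, TimeUnit.MICROSECONDS);
        for (int i = 0; i < 10; i++) {
            threads[i] = new Thread(r1);
            threads[i].start();
        }
        for (int i = 0; i < 10; i++) {
            threads[i].join();
        }
        System.out.println("Time Taken :: " + (System.currentTimeMillis() - before));
        System.out.println("No of reads :: " + atomicInteger.get());
        scheduledExecutorService.shutdown(); // otherwise the scheduler thread keeps the JVM alive
    }
}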
I ran this test a few times.
Case 1:
When I use READ_LOCK for reading, the test takes 12 seconds to complete. The number of reads is 100000000.
Case 2:
When I use WRITE_LOCK for both reading and writing (READ_LOCK is not used in this case), the test takes only 2.5 seconds.
The number of reads is 100000000.
I thought having separate locks would improve performance.
What is going on here? What mistake am I making?

You are running read() 10 million times in each of 10 threads, while write() runs from only a single scheduled task.
The write took 2.5 sec because it was able to take the write lock only when no thread held the read lock.
Also, as @Burak mentioned, you did not measure the right thing here.
You should run the same method once with the read lock and once with the write lock.
Run this method with 10 threads, for example.
The method would iterate from 1 to 10 million, for example.
In addition, you are including the time it takes to create the threads in your measurement (which is not part of the locking mechanism; you should create the threads beforehand).
Then you will see that the write lock method is slower than the read lock one.
Why? Because when a thread takes the write lock, only that thread can execute the method's code.
With the read lock, all 10 threads can run the method in parallel.
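A minimal sketch of the measurement suggested above: the same read loop is timed once under the read lock and once under the write lock, with all threads created before the clock starts and released together by a CountDownLatch. The class name LockBenchmark and the shared field are hypothetical. Timings depend heavily on the JVM, the hardware and how short the critical section is, so the sketch does not assume which lock wins; for serious numbers, a harness such as JMH, which handles warm-up, is a better fit.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockBenchmark {
    private static int shared = 1; // the value every reader reads

    // Reads `shared` `iterations` times in each of `threads` threads while holding
    // `lock`, and returns the elapsed wall-clock time in milliseconds.
    static long timeReads(Lock lock, int threads, int iterations, LongAdder reads) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    start.await(); // wait until every worker exists
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < iterations; i++) {
                    lock.lock();
                    try {
                        if (shared > 0) {
                            reads.increment();
                        }
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[t].start();
        }
        long before = System.nanoTime(); // thread creation is excluded from timing
        start.countDown();
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return (System.nanoTime() - before) / 1_000_000;
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        LongAdder readReads = new LongAdder();
        LongAdder writeReads = new LongAdder();
        long readMs = timeReads(rw.readLock(), 10, 10_000_000, readReads);
        long writeMs = timeReads(rw.writeLock(), 10, 10_000_000, writeReads);
        System.out.println("read lock:  " + readMs + " ms, " + readReads.sum() + " reads");
        System.out.println("write lock: " + writeMs + " ms, " + writeReads.sum() + " reads");
    }
}
```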

The documentation of ReadWriteLock mentions this:
Further, if the read operations are too short the overhead of the read-write lock implementation (which is inherently more complex than a mutual exclusion lock) can dominate the execution cost, particularly as many read-write lock implementations still serialize all threads through a small section of code. Ultimately, only profiling and measurement will establish whether the use of a read-write lock is suitable for your application.
Your reads are indeed very fast, so you're observing the overhead that a read-write lock has over a simple lock.
What's involved in the implementation of a read-write lock? To start, there are actually two locks. The read lock may be taken by multiple threads, making it different from a simple reentrant lock, and it must check that the write lock is not locked when trying to lock. The writer lock must check that there are no locked readers when trying to lock, but it's otherwise similar to a single-threaded reentrant lock.
For fine-grained accesses such as in your example, a read-write lock is not worth it. The overhead might become negligible when accessing a bunch of data, such as a "page" of data, e.g. hundreds or thousands of cached database rows.
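To make the bookkeeping described above concrete, here is a deliberately simple read-write lock built on synchronized, wait() and notifyAll(). It is a hypothetical teaching sketch: it is not reentrant, it favors readers (a steady stream of readers can starve a writer), and it is not how ReentrantReadWriteLock is implemented (that class builds on AbstractQueuedSynchronizer). Every lockRead()/unlockRead() call passes through the same monitor, which is one concrete form of the "small section of code" that the quoted documentation mentions.

```java
// Hypothetical, minimal read-write lock for illustration only.
public class SimpleReadWriteLock {
    private int readers = 0;        // threads currently holding the read lock
    private boolean writer = false; // whether a thread holds the write lock

    // Readers may share the lock, but must wait while a writer holds it.
    public synchronized void lockRead() throws InterruptedException {
        while (writer) {
            wait();
        }
        readers++;
    }

    public synchronized void unlockRead() {
        readers--;
        if (readers == 0) {
            notifyAll(); // the last reader out lets a waiting writer in
        }
    }

    // A writer needs exclusive access: no other writer and no readers.
    public synchronized void lockWrite() throws InterruptedException {
        while (writer || readers > 0) {
            wait();
        }
        writer = true;
    }

    public synchronized void unlockWrite() {
        writer = false;
        notifyAll(); // wake both waiting readers and waiting writers
    }

    public synchronized int readerCount() {
        return readers;
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleReadWriteLock lock = new SimpleReadWriteLock();
        lock.lockRead();
        lock.lockRead();                        // a second reader is admitted at once
        System.out.println(lock.readerCount()); // prints 2
        lock.unlockRead();
        lock.unlockRead();
        lock.lockWrite();                       // succeeds: no readers remain
        lock.unlockWrite();
    }
}
```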

Questions about Threads and Callbacks in Java

I am reading Java Network Programming by Elliotte Rusty Harold, and in the chapter on threads he gives this piece of code as an example of a computation that can be run in a different thread:
import java.io.*;
import java.security.*;
public class ReturnDigest extends Thread {
private String filename;
private byte[] digest;
public ReturnDigest(String filename) {
this.filename = filename;
}
@Override
public void run() {
try {
FileInputStream in = new FileInputStream(filename);
MessageDigest sha = MessageDigest.getInstance("SHA-256");
DigestInputStream din = new DigestInputStream(in, sha);
while (din.read() != -1) ; // read entire file
din.close();
digest = sha.digest();
} catch (IOException ex) {
System.err.println(ex);
} catch (NoSuchAlgorithmException ex) {
System.err.println(ex);
}
}
public byte[] getDigest() {
return digest;
}
}
To use this thread, he gave an approach which he referred to as the solution novices might use.
The solution most novices adopt is to make the getter method return a
flag value (or perhaps throw an exception) until the result field is
set.
And the solution he is referring to is:
public static void main(String[] args) {
ReturnDigest[] digests = new ReturnDigest[args.length];
for (int i = 0; i < args.length; i++) {
// Calculate the digest
digests[i] = new ReturnDigest(args[i]);
digests[i].start();
}
for (int i = 0; i < args.length; i++) {
while (true) {
// Now print the result
byte[] digest = digests[i].getDigest();
if (digest != null) {
StringBuilder result = new StringBuilder(args[i]);
result.append(": ");
result.append(DatatypeConverter.printHexBinary(digest));
System.out.println(result);
break;
}
}
}
}
He then went on to propose a better approach using callbacks, which he described as:
In fact, there’s a much simpler, more efficient way to handle the
problem. The infinite loop that repeatedly polls each ReturnDigest
object to see whether it’s finished can be eliminated. The trick is
that rather than having the main program repeatedly ask each
ReturnDigest thread whether it’s finished (like a five-year-old
repeatedly asking, “Are we there yet?” on a long car trip, and almost
as annoying), you let the thread tell the main program when it’s
finished. It does this by invoking a method in the main class that
started it. This is called a callback because the thread calls its
creator back when it’s done
And the code for the callback approach he gave is below:
import java.io.*;
import java.security.*;
public class CallbackDigest implements Runnable {
private String filename;
public CallbackDigest(String filename) {
this.filename = filename;
}
@Override
public void run() {
try {
FileInputStream in = new FileInputStream(filename);
MessageDigest sha = MessageDigest.getInstance("SHA-256");
DigestInputStream din = new DigestInputStream( in , sha);
while (din.read() != -1); // read entire file
din.close();
byte[] digest = sha.digest();
CallbackDigestUserInterface.receiveDigest(digest, filename); // this is the callback
} catch (IOException ex) {
System.err.println(ex);
} catch (NoSuchAlgorithmException ex) {
System.err.println(ex);
}
}
}
And the implementation of CallbackDigestUserInterface and its usage were given as:
public class CallbackDigestUserInterface {
public static void receiveDigest(byte[] digest, String name) {
StringBuilder result = new StringBuilder(name);
result.append(": ");
result.append(DatatypeConverter.printHexBinary(digest));
System.out.println(result);
}
public static void main(String[] args) {
for (String filename: args) {
// Calculate the digest
CallbackDigest cb = new CallbackDigest(filename);
Thread t = new Thread(cb);
t.start();
}
}
}
But my question (or request for clarification) concerns what he said about this method. He mentioned:
The trick is
that rather than having the main program repeatedly ask each
ReturnDigest thread whether it’s finished, you let the thread
tell the main program when it’s finished
Looking at the code, the thread that was created to run a separate computation is actually the one that continues executing the original program. It is not as if it passed the result back to the main thread. It seems it becomes the MAIN thread!
So it is not that the main thread gets notified when the task is done (instead of polling). It is that the main thread does not care about the result: it runs to its end and finishes. The new thread simply runs another computation when it is done.
Do I understand this correctly?
How does this play with debugging? Does the thread now become the main thread, and would the debugger treat it as such?
Is there another means to actually pass the result back to the main thread?
I would appreciate any help, that helps in understanding this better :)

It is a common misunderstanding to think that the "main" thread, the one that public static void main runs on, should be considered the main thread of the application. If you write a GUI app, for instance, the starting thread will likely finish and die well before the program ends.
Also, callbacks are normally called by the thread they are handed off to. This is true in Swing and in many other places (including DataFetcher, for example).

None of the other threads becomes the "main thread". Your main thread is the thread that starts with the main() method. Its job is to start the other threads... then it dies.
At this point, you never return to the main thread, but the child threads have callbacks... and that means that when they are done, they know where to redirect the flow of the program.
That is your receiveDigest() method. Its job is to display the results of the child threads once they complete. Is this method being called from the main thread, or the child threads? What do you think?
It is possible to pass the result back to the main thread. To do this, you need to keep the main thread from terminating, so it needs a loop that keeps it going, and to keep that loop from eating up processor time, it needs to sleep while the other threads work.
You can read an example of fork and join architecture here:
https://www.tutorialspoint.com/java_concurrency/concurrency_fork_join.htm
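A sketch of passing results back to the main thread. Instead of the loop-and-sleep approach described above, it uses a java.util.concurrent.BlockingQueue, whose take() method blocks the calling thread until an element is available, so the main thread waits without polling. The class name and the use of in-memory strings in place of files are assumptions made to keep the example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class DigestToMain {
    // SHA-256 of the given bytes, as lowercase hex.
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every Java platform
        }
    }

    // Starts one worker thread per input. Each worker puts (name, hex digest) on the
    // queue; the calling thread blocks in take() until each result arrives, so it
    // receives the results itself, without a busy polling loop.
    static Map<String, String> digestAll(List<String> inputs) {
        BlockingQueue<Map.Entry<String, String>> results = new LinkedBlockingQueue<>();
        for (String input : inputs) {
            new Thread(() -> results.add(
                    Map.entry(input, sha256Hex(input.getBytes(StandardCharsets.UTF_8))))).start();
        }
        Map<String, String> byName = new HashMap<>();
        try {
            for (int i = 0; i < inputs.size(); i++) {
                Map.Entry<String, String> r = results.take();
                byName.put(r.getKey(), r.getValue());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return byName;
    }

    public static void main(String[] args) {
        // Strings stand in for file contents to keep the sketch self-contained.
        digestAll(List.of("abc", "hello")).forEach((name, hex) -> System.out.println(name + ": " + hex));
    }
}
```

An ExecutorService returning Future objects, or CompletableFuture, are other standard ways to hand results back. Unlike this sketch, they also propagate exceptions thrown by the worker; here a failing worker would leave take() waiting forever.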

The book is misleading you.
First of all, there is no callback in the example. There is only one function calling another function by name. A true callback is a means of communication between different software modules. It is a pointer or reference to a function (or to an object with methods) that module A provides to module B, so that module B can call it when something interesting happens. It has nothing at all to do with threads.
Second of all, the alleged callback communicates nothing between threads. The function call happens entirely in the new thread, after the main() thread has already died.
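A sketch of the kind of callback this answer describes, where the caller supplies the function to call instead of the task hard-coding CallbackDigestUserInterface.receiveDigest. It uses the standard java.util.function.BiConsumer as the callback type; the class name ListenerDigest and hashing an in-memory byte array instead of a file are assumptions. Note that the callback still runs on the worker thread, not on the thread that created the task.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.BiConsumer;

public class ListenerDigest implements Runnable {
    private final String name;
    private final byte[] data;
    // Supplied by whoever creates the task (module A hands it to module B).
    private final BiConsumer<String, byte[]> onDigest;

    public ListenerDigest(String name, byte[] data, BiConsumer<String, byte[]> onDigest) {
        this.name = name;
        this.data = data;
        this.onDigest = onDigest;
    }

    @Override
    public void run() {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            onDigest.accept(name, digest); // invoked on the thread running this task
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(new ListenerDigest("abc", "abc".getBytes(StandardCharsets.UTF_8),
                (n, d) -> System.out.println(n + ": " + d.length + " bytes, on "
                        + Thread.currentThread().getName())));
        t.start();
        t.join();
    }
}
```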

Is the Files.write() method thread-safe? [duplicate]

I have a Java program that uses 20 threads. Each of them writes its results to a file called output.txt.
I always get a different number of lines in output.txt.
Can it be a problem with the synchronization of threads? Is there a way to handle this?

can it be a problem of synchronization of threads?
Yes.
There's a way to handle this?
Yes, ensure that writes are serialized by synchronizing on a relevant mutex. Or alternately, have only one thread that actually outputs to the file, and have all of the other threads simply queue text to be written to a queue that the one writing thread draws from. (That way the 20 main threads don't block on I/O.)
Re the mutex: For instance, if they're all using the same FileWriter instance (or whatever), which I'll refer to as fw, then they could use it as a mutex:
synchronized (fw) {
fw.write(...);
}
If they're each using their own FileWriter or whatever, find something else they all share to be the mutex.
But again, having a thread doing the I/O on behalf of the others is probably also a good way to go.
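A minimal sketch of the single-writer design: worker threads only put lines on a LinkedBlockingQueue, and one dedicated thread takes them off and writes them, so it is the only thread that touches the file. The class name, the sentinel ("poison pill") used to stop the writer, and the temporary file are assumptions for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleWriter {
    // Unique sentinel object, compared by identity, so no real line can match it.
    private static final String POISON = new String("<end>");

    // `producers` worker threads queue `linesEach` lines each; a single writer
    // thread drains the queue and is the only thread that touches the file.
    static void writeAll(Path file, int producers, int linesEach) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread writer = new Thread(() -> {
            try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                String line;
                while ((line = queue.take()) != POISON) {
                    out.write(line);
                    out.newLine();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();
        Thread[] workers = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            int id = p;
            workers[p] = new Thread(() -> {
                for (int i = 0; i < linesEach; i++) {
                    queue.add("worker " + id + " line " + i); // never blocks on disk I/O
                }
            });
            workers[p].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        queue.add(POISON); // every producer is done: tell the writer to finish
        writer.join();
    }

    // Writes to a temporary file and returns how many lines ended up in it.
    static long countLines(int producers, int linesEach) {
        try {
            Path file = Files.createTempFile("single-writer", ".txt");
            writeAll(file, producers, linesEach);
            long lines = Files.readAllLines(file, StandardCharsets.UTF_8).size();
            Files.delete(file);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countLines(20, 50) + " lines"); // prints 1000 lines
    }
}
```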

I'd suggest organizing it this way: one consumer thread consumes all the data and writes it to the file, and all worker threads hand their data to the consumer thread in a synchronized way. Alternatively, with multiple threads writing to the file, you can use a mutex or one of the lock implementations.

If you want any semblance of performance and ease of management, go with the producer-consumer queue and just one file writer, as suggested by Alex and others. Letting all the threads at the file with a mutex is just messy: every disk delay is transferred directly into your main app functionality (with added contention). This is especially unfunny with slow network drives that tend to go away without warning.

If you can hold your file as a FileOutputStream you can lock it like this:
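import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.util.logging.Level;
import java.util.logging.Logger;
class LockedFileWriter {
    private static final Logger log = Logger.getLogger(LockedFileWriter.class.getName());
    private final FileOutputStream file;
    private final String fileName;
    LockedFileWriter(FileOutputStream file, String fileName) {
        this.file = file;
        this.fileName = fileName;
    }
    // Thread safe version.
    void write(byte[] bytes) {
        try {
            boolean written = false;
            do {
                try {
                    // Lock it! If another thread of this JVM already holds (or is
                    // waiting for) the lock, lock() throws OverlappingFileLockException.
                    FileLock lock = file.getChannel().lock();
                    try {
                        // Write the bytes.
                        file.write(bytes);
                        written = true;
                    } finally {
                        // Release the lock.
                        lock.release();
                    }
                } catch (OverlappingFileLockException ofle) {
                    try {
                        // Wait a bit
                        Thread.sleep(0);
                    } catch (InterruptedException ex) {
                        Thread.currentThread().interrupt(); // preserve the interrupt status
                        throw new InterruptedIOException("Interrupted waiting for a file lock.");
                    }
                }
            } while (!written);
        } catch (IOException ex) {
            log.log(Level.WARNING, "Failed to lock " + fileName, ex);
        }
    }
}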

You should use synchronization in this case. Imagine that two threads (t1 and t2) open the file at the same time and start writing to it. The changes performed by the first thread are overwritten by the second thread, because the second thread is the last to save its changes to the file. While thread t1 is writing to the file, t2 must wait until t1 finishes its task before it can open the file.

Well, without any implementation details it is hard to know, but as my test case shows, I always get 220 lines of output, i.e., a constant number of lines, with FileWriter. Notice that no synchronized is used here.
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
/**
* Working example of synchronous, competitive writing to the same file.
* @author WesternGun
*
*/
public class ThreadCompete implements Runnable {
private FileWriter writer;
private int status;
private int counter;
private boolean stop;
private String name;
public ThreadCompete(String name) {
this.name = name;
status = 0;
stop = false;
// open the file in append mode (second argument true), which does not clear existing content
try {
writer = new FileWriter(new File("test.txt"), true);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static void main(String[] args) {
for (int i=0; i<20; i++) {
new Thread(new ThreadCompete("Thread" + i)).start();
}
}
private int generateRandom(int range) {
return (int) (Math.random() * range);
}
@Override
public void run() {
while (!stop) {
try {
writer = new FileWriter(new File("test.txt"), true);
if (status == 0) {
writer.write(this.name + ": Begin: " + counter);
writer.write(System.lineSeparator());
status ++;
} else if (status == 1) {
writer.write(this.name + ": Now we have " + counter + " books!");
writer.write(System.lineSeparator());
counter++;
if (counter > 8) {
status = 2;
}
} else if (status == 2) {
writer.write(this.name + ": End. " + counter);
writer.write(System.lineSeparator());
stop = true;
}
writer.flush();
writer.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
}
As I understand it (and as my tests suggest), the process goes through these phases:
all threads are created and started, ready to grab the file;
one of them grabs it and, I guess, internally locks it, preventing other threads from getting access, because I never see a line that combines content from two threads. So while one thread is writing, the others wait until it completes the line and, very likely, releases the file. So no race condition happens;
the quickest of the others grabs the file and begins writing.
Well, it is just like a crowd waiting outside a bathroom, without queuing.....
So, if your implementation is different, show the code and we can help to break it down.

Writing buffers to a Java channel: Thread-safe or not?

Consider the following code snippet, which simply writes the contents of someByteBuffer to the standard output:
// returns an instance of "java.nio.channels.Channels$WritableByteChannelImpl"
WritableByteChannel w = Channels.newChannel(System.out);
w.write(someByteBuffer);
Java specifies that channels are, in general, intended to be safe for multithreaded access, while buffers are not safe for use by multiple concurrent threads.
So, I was wondering whether the above snippet requires synchronization, as it is invoking the write method of a channel (which is supposed to be thread-safe) on some buffer (which is not thread-safe).
I took a look at the implementation of the write method:
public int write(ByteBuffer src) throws IOException {
int len = src.remaining();
int totalWritten = 0;
synchronized (writeLock) {
while (totalWritten < len) {
int bytesToWrite = Math.min((len - totalWritten),
TRANSFER_SIZE);
if (buf.length < bytesToWrite)
buf = new byte[bytesToWrite];
src.get(buf, 0, bytesToWrite);
try {
begin();
out.write(buf, 0, bytesToWrite);
} finally {
end(bytesToWrite > 0);
}
totalWritten += bytesToWrite;
}
return totalWritten;
}
}
Notice that everything is synchronized by the writeLock, except the first two lines in the method. Now, as the ByteBuffer src is not thread-safe, calling src.remaining() without proper synchronization is risky, as another thread might change it.
Should I synchronize the line w.write(someByteBuffer) in the above snippet, or am I missing something and the Java implementation of the write() method has already taken care of that?
Edit: Here's a sample code which often throws a BufferUnderflowException, since I commented out the synchronized block at the very end. Removing those comments will make the code exception free.
import java.nio.*;
import java.nio.channels.*;
public class Test {
public static void main(String[] args) throws Exception {
ByteBuffer b = ByteBuffer.allocate(10);
b.put(new byte[]{'A', 'B', 'C', 'D', 'E', 'F', 'G', '\n'});
// returns an instance of "java.nio.channels.Channels$WritableByteChannelImpl"
WritableByteChannel w = Channels.newChannel(System.out);
int c = 10;
Thread[] r = new Thread[c];
for (int i = 0; i < c; i++) {
r[i] = new Thread(new MyRunnable(b, w));
r[i].start();
}
}
}
class MyRunnable implements Runnable {
private final ByteBuffer b;
private final WritableByteChannel w;
MyRunnable(ByteBuffer b, WritableByteChannel w) {
this.b = b;
this.w = w;
}
@Override
public void run() {
try {
// synchronized (b) {
b.flip();
w.write(b);
// }
} catch (Exception e) {
e.printStackTrace();
}
}
}

The point is: if your setup allows more than one thread to tamper with that buffer object, then you are subject to threading issues. It is that simple!
The question is not if channel.write() is thread-safe. That is good to know, but not at the core of the problem!
The real question is: what is your code doing with that buffer?
How does it help that this channel implementation locks internally on something, when the data it operates on comes from the outside?!
You know, all kinds of things could happen on that src object coming into this method - while that channel is busy writing the buffer!
In other words: the question whether this code is "safe" fully depends on what your code is doing with that very src buffer object in parallel.
Given the OP's comment: the core point is: you have to ensure that any activity that makes use of that byte buffer is thread-safe. In the example given, we have two operations:
b.flip();
w.write(b);
Those are the only operations each thread performs; thus, when you make sure that only one thread at a time can make those two calls (as shown, by locking on the shared buffer object), you are good.
It is really simple: if you have "shared data"; then you have to ensure that the threads reading/writing that "shared data" are synchronized somehow to avoid race conditions.
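One way to avoid sharing the mutable buffer state, as an alternative to the synchronized block, is to give each thread its own ByteBuffer.duplicate(). A duplicate shares the underlying bytes but has an independent position, limit and mark, so each thread can flip() its own view. This stays safe only as long as no thread modifies the shared bytes. The sketch reuses the question's setup but writes to a ByteArrayOutputStream instead of System.out so the result can be checked; the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;

public class DuplicateBuffers {
    // Same setup as the question, but every thread gets its own duplicate()
    // of the buffer, so one thread's flip() cannot disturb another's view.
    static String writeConcurrently(int threadCount) {
        ByteBuffer b = ByteBuffer.allocate(10);
        b.put(new byte[] {'A', 'B', 'C', 'D', 'E', 'F', 'G', '\n'});
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for System.out
        WritableByteChannel w = Channels.newChannel(sink);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            ByteBuffer view = b.duplicate(); // created on this thread, before start()
            threads[i] = new Thread(() -> {
                view.flip(); // position 0, limit 8: affects only this view
                try {
                    while (view.hasRemaining()) {
                        w.write(view);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.print(writeConcurrently(10)); // ten lines of ABCDEFG
    }
}
```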

You only need to lock the Channel if you are making multiple writes from multiple threads and want to ensure the atomicity of those writes, in which case you need to lock the System.out object.
If you shared a mutated data structure which is not thread safe across threads, you need to add locking. I would avoid using a ByteBuffer in multiple threads if you can.

Does a volatile write to a variable prevent out-of-order writes?

Does a volatile write guarantee that all writes (non-volatile or volatile) that happen before it in one thread will be visible to another thread?
Will the following given code always produce 90,80 as output?
public class MyClass
{
private boolean flag = false;
private volatile int volatileInt = 0;
private int nonVolatileInt = 0;
public void initVariables()
{
nonVolatileInt = 90; // non-volatile write
volatileInt = 80; // volatile write
flag = true; // non-volatile write
}
public void readVariables()
{
while (flag == false)
{}
System.out.println(nonVolatileInt + ","+ volatileInt);
}
public static void main(String st[])
{
final MyClass myClass = new MyClass();
Thread writer = new Thread( new Runnable()
{
public void run()
{
myClass.initVariables();
}
});
Thread reader = new Thread ( new Runnable()
{
public void run()
{
myClass.readVariables();
}
});
reader.start();writer.start();
}
}
My concern is the method initVariables(). Doesn't the JVM have the freedom to reorder the statements in the following way?
flag = true;
nonVolatileInt = 90 ;
volatileInt = 80;
And consequently, we get the output by the reader thread as : 0,0
Or, they can be reordered in the following way:
nonVolatileInt = 90;
flag = true;
volatileInt = 80;
And consequently, we get the output by the reader thread as : 90,0

A volatile write ensures that writes already performed do not appear after this write. However to ensure you see this you need to perform a volatile read first.
And consequently, we get the output by the reader thread as : 90,0
Correct. However if you perform your reads correctly you cannot get 0, 80
0, 0 - ok
90, 0 - ok
90, 80 - ok
0, 80 - breaks happens before.
However, your reads do not ensure happens before behaviour as it doesn't perform the volatile read first.
System.out.println(nonVolatileInt + ","+ volatileInt);
This reads the non-volatile fields first, so you could see an old version of the non-volatile field and a new version of the volatile field.
Note: in reality, you are highly unlikely to see a problem. This is because caches invalidate a whole cache line at a time and if these fields are in the same 64-byte block, you shouldn't see an inconsistency.
What is more likely to be a problem is this loop.
while (flag == false)
{}
The problem is that the JIT can see that your thread never writes to flag, so it can inline the value, i.e. it never needs to read the value again. This can result in an infinite loop.
http://vanillajava.blogspot.co.uk/2012/01/demonstrating-when-volatile-is-required.html
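A sketch of the question's class with the fix this answer implies: flag itself is made volatile, written last in initVariables() and read first in readVariables(). The volatile read that observes flag == true comes after the volatile write, so the earlier writes to nonVolatileInt and volatileInt are guaranteed to be visible, and the loop can no longer be compiled into an infinite one. The class name and the runOnce() helper are additions for illustration.

```java
public class VolatileFlag {
    private int nonVolatileInt = 0;
    private volatile int volatileInt = 0;
    private volatile boolean flag = false; // changed: flag is now volatile

    public void initVariables() {
        nonVolatileInt = 90;
        volatileInt = 80;
        flag = true; // volatile write, performed last
    }

    public String readVariables() {
        while (!flag) {          // volatile read on every iteration
            Thread.onSpinWait(); // hint that this is a busy-wait
        }
        // These reads follow the volatile read that saw flag == true,
        // so they see 90 and 80 (happens-before through flag).
        return nonVolatileInt + "," + volatileInt;
    }

    static String runOnce() {
        VolatileFlag v = new VolatileFlag();
        String[] out = new String[1];
        Thread reader = new Thread(() -> out[0] = v.readVariables());
        Thread writer = new Thread(v::initVariables);
        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out[0]; // join() makes the reader's write to out[0] visible here
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 90,80
    }
}
```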

