I know that thread safety of Java sockets has been discussed in several threads here on Stack Overflow, but I haven't been able to find a clear answer to this question - is it, in practice, safe to have multiple threads concurrently write to the same SocketOutputStream, or is there a risk that the data sent from one thread gets mixed up with the data from another thread? (For example, the receiver on the other end first receives the first half of one thread's message, then some data from another thread's message, and then the rest of the first thread's message.)
The reason I said "in practice" is that I know the Socket class isn't documented as thread-safe, but if it actually is safe in current implementations, then that's good enough for me. The specific implementation I'm most curious about is HotSpot running on Linux.
When looking at the Java layer of HotSpot's implementation, more specifically the implementation of socketWrite() in SocketOutputStream, it looks like it should be thread safe as long as the native implementation of socketWrite0() is safe. However, when looking at the implementation of that method (j2se/src/solaris/native/java/net/SocketOutputStream.c), it seems to split the data to be sent into chunks of 64 or 128 KB (depending on whether it's a 64-bit JVM) and then sends the chunks in separate writes.
So - to me, it looks like sending more than 64 KB from different threads is not safe, but if it's less than 64 KB it should be safe... but I could very well be missing something important here. Has anyone else here looked at this and come to a different conclusion?
I think it's a really bad idea to depend so heavily on the implementation details of something that can change beyond your control. If you do something like this, you will have to very carefully control the versions of everything you use to make sure they're what you expect, and that's very difficult to do. You will also need a very robust test suite to verify that the multithreaded operation functions correctly, since you are depending on code inspection and rumors from randoms on Stack Overflow for your solution.
Why can't you just wrap the SocketOutputStream into another passthrough OutputStream and then add the necessary synchronization at that level? It's much safer to do it that way and you are far less likely to have unexpected problems down the road.
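For illustration, a minimal sketch of what that wrapper might look like (the class name is mine, and it assumes each thread sends a complete message with a single write call):

import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Pass-through stream that serializes writes so that messages written by
// different threads cannot interleave on the underlying socket stream.
public class SynchronizedOutputStream extends FilterOutputStream {

    public SynchronizedOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public synchronized void write(int b) throws IOException {
        out.write(b);
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) throws IOException {
        // Delegate the whole buffer in one locked call so the message stays contiguous
        // (FilterOutputStream's default would loop over single-byte writes).
        out.write(b, off, len);
    }

    @Override
    public synchronized void flush() throws IOException {
        out.flush();
    }
}

Each thread would then share new SynchronizedOutputStream(socket.getOutputStream()) instead of the raw socket stream.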
According to this documentation http://www.docjar.com/docs/api/java/net/SocketOutputStream.html, the class does not claim to be thread safe, so you should assume it is not. It inherits from FileOutputStream, and file I/O is not normally inherently thread safe.
My advice is that if a class is related to hardware or communications, it is either not thread safe or it is "blocking". The reason is that thread-safe operations consume more time, which you may not like. My background is not in Java, but other libraries are similar in philosophy.
I notice you tested the class extensively, but you could test it all day for many days and it still might not prove anything. My 2 cents.
Good luck & have fun with it.
Tommy Kwee
Can anyone please explain to me the consequences of mutating a collection in Java that is not thread-safe and is being used by multiple threads?
The results are undefined and somewhat random.
With JDK collections that are designed to fail fast, you might receive a ConcurrentModificationException. This is really the only consequence that is specific to thread safety with collections, as opposed to any other class.
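For instance, here is a quick and deliberately broken sketch of that fail-fast behaviour. Whether it actually throws is timing-dependent, so treat it as an illustration rather than a guarantee:

import java.util.ArrayList;
import java.util.List;

public class FailFastDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            list.add(i);
        }

        // One thread keeps mutating the list...
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 100_000; i++) {
                list.add(i);
            }
        });
        writer.start();

        // ...while the main thread iterates over it. The iterator is fail-fast,
        // so this will often throw ConcurrentModificationException - but that is
        // best-effort only; it may also fail in other ways, or appear to work.
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        System.out.println(sum);
        writer.join();
    }
}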
The problems that generally occur with thread-unsafe classes may also show up:
The internal state of the collection might be corrupted.
The mutation may appear to be successful, but the changes may not, in fact, be visible to other threads at any given time. They might be invisible at first and become visible later.
The changes might actually be successful under light load, but fail randomly under heavy load with lots of threads in contention.
Race conditions might occur, as was mentioned in a comment above.
There are lots of other possibilities, none of them pleasant. Worst of all, these things tend to most commonly reveal themselves in production, when the system is stressed.
In short, you probably don't want to do that.
The most common outcome is that it looks like it works, but doesn't work all the time.
This can mean you have a problem which:
works on one machine but doesn't on another.
works for a while, then something apparently unrelated changes and your program breaks.
And whenever you have a bug, you don't know whether it's a multi-threading issue or not, because you are not using thread-safe data structures.
What can happen is:
you rarely/randomly get an error and strange behaviour
your code goes into an infinite loop and stops working (HashMap used to do this)
The only options are to:
limit the amount of state which is shared between threads, ideally none at all.
be very careful about how data is updated.
not rely on unit tests - you have to understand what the code is doing and be confident it will behave correctly in all possible situations.
The invariants of the data structure will not be guaranteed.
For example:
If thread 2 does a read while thread 1 is adding to the data structure, thread 1 may consider the element added while thread 2 doesn't yet see that it has been added.
There are plenty of data structures that aren't thread-safe but will still appear to function (i.e. not throw) in a multi-threaded environment, and they might even perform correctly under certain circumstances (like if you aren't doing any writes to the data structure).
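To make that concrete, here is a minimal sketch of the visibility side of the problem - a plain flag rather than a collection, but the same rules apply. Without synchronization (or volatile) the reading thread may or may not ever observe the write:

public class VisibilityDemo {
    // Deliberately NOT volatile, to illustrate the problem.
    static boolean done = false;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            // With no synchronization there is no guarantee this thread
            // will ever observe the write below; it may spin forever.
            while (!done) {
                // busy-wait
            }
            System.out.println("Saw the update");
        });
        reader.start();

        Thread.sleep(1000);
        done = true; // the "mutation" - it may or may not become visible
        reader.join(5000);
        System.out.println(reader.isAlive()
                ? "Reader still hasn't seen the update"
                : "Reader finished");
    }
}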
To fully understand this topic, I recommend exploring the different classes of bugs that occur in concurrent systems; this short document seems like a good start:
http://pages.cs.wisc.edu/~remzi/OSTEP/threads-bugs.pdf
I would like opinions on this to settle a small dispute. Any help would be greatly appreciated.
I have written my own file handler that is attached to the logger. Since this is a file handler accessed by multiple threads, I am using synchronization to ensure there are no collisions during the writing process. Additionally, it is a rolling log, so I also close and open files, and I don't want any problems there either.
His response to it was (as pasted from the email):
I strongly believe that Synchronization is very bad in the Handler. It
is too complex for such easy task. So, I would say why do not use one
instance per Thread?
What would you say is better from a performance and memory-management perspective?
Thank you very much for any response. Whenever writing and reading are involved in multithreaded applications, I have used synchronization in my Java applications all my life and have not heard of any severe performance issues.
So please, I would like to know whether there are any issues and whether I really should switch to one instance per thread.
And in general, what would be the downside of using synchronization?
EDIT: the reason why I wrote a custom file handler (yes, I do love slf4j) is that my custom handler deals with two files at once, and additionally I perform a few other functions on top of writing to files.
Another solution would be to use a separate thread to do the (costly on its own) writing, and use a concurrent queue to pass the log messages from the domain threads.
The key part here is that pushing to a queue is much less costly than writing to a file, which means there is less interference between concurrent log calls.
The call to log would then look like:
private static BlockingQueue<String> logQueue = new LinkedBlockingQueue<>(); // any java.util.concurrent BlockingQueue implementation

public static void log(String message) {
    // construct & filter the message
    logQueue.add(message);
}
Then the logger thread would look like:

try {
    while (true) {
        String message = logQueue.take(); // blocks while the queue is empty, using no CPU
        logFile.println(message); // or whatever you are doing
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // shut the logger thread down
}
As with all I/O, you have little choice but mutual exclusion. You could theoretically build a complex scheme with a lock-free queue which accumulates logging entries, but its utility, and especially its reliability, would be very questionable: without careful design you could get a logging-caused OOME, have the application hang because of threads you didn't clean up, etc.
Keep in mind that, assuming you are using buffered I/O, you already have an equivalent of a queue, minimizing the time spent occupying the lock.
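For reference, a minimal sketch of the kind of handler being discussed (the names are mine): the synchronized publish method provides the mutual exclusion, and the BufferedWriter plays the role of the queue by absorbing most writes in memory:

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SimpleFileLogHandler {
    private final BufferedWriter writer;

    public SimpleFileLogHandler(String fileName) throws IOException {
        // Buffered, so most publish() calls just copy characters into memory while holding the lock.
        this.writer = new BufferedWriter(new FileWriter(fileName, true));
    }

    public synchronized void publish(String message) {
        try {
            writer.write(message);
            writer.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public synchronized void close() throws IOException {
        writer.close();
    }
}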
The downside to synchronisation is the fact that only one thread can access that part of the code at any one time, meaning your code will see little benefit from multithreading, i.e. the synchronised part of your application will only be as fast as a single thread. (There is a small overhead for managing the synchronised state too, so it may be a little slower still.)
However, in situations where you don't want the threads to interfere with one another, such as writing to files, the safety gained from the synchronisation is paramount, and the performance loss should just be accepted.
I've been caught by yet another deadlock in our Java application and started thinking about how to detect potential deadlocks in the future. I had an idea of how to do this, but it seems almost too simple.
I'd like to hear people's views on it.
I plan to run our application for several hours in our test environment, using a typical data set.
I think it would be possible to perform bytecode manipulation on our application such that, whenever it takes a lock (e.g. entering a synchronized block), details of the lock are added to a ThreadLocal list.
I could write an algorithm that, at some later point, compares the lists for all threads and checks if any contain the same pair of locks in opposite order - this would be reported as a deadlock possibility. Again, I would use bytecode manipulation to add this periodic check to my application.
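To make the idea concrete, here is a rough sketch of one way to record and check the lock order (the names are just illustrative; rather than comparing whole per-thread lists afterwards, it records held-while-acquired pairs as they happen, only catches two-lock inversions, and ignores reentrant acquisition):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Records the order in which each thread acquires monitors and flags any
// pair of locks that has been taken in opposite orders. The instrumented
// code would call acquired()/released() around each monitorenter/monitorexit.
public class LockOrderRecorder {
    // Directed edges: "lock A was held while lock B was acquired".
    private static final Set<String> edges = ConcurrentHashMap.newKeySet();
    private static final ThreadLocal<Deque<Object>> held =
            ThreadLocal.withInitial(ArrayDeque::new);

    public static void acquired(Object lock) {
        for (Object alreadyHeld : held.get()) {
            edges.add(id(alreadyHeld) + " -> " + id(lock));
        }
        held.get().push(lock);
    }

    public static void released(Object lock) {
        held.get().remove(lock);
    }

    // Periodic check: two edges in opposite directions mean a potential deadlock.
    public static void reportInversions() {
        for (String edge : edges) {
            String[] ends = edge.split(" -> ");
            if (edges.contains(ends[1] + " -> " + ends[0])) {
                System.err.println("Potential deadlock between " + ends[0] + " and " + ends[1]);
            }
        }
    }

    private static String id(Object lock) {
        return lock.getClass().getName() + "@" + System.identityHashCode(lock);
    }
}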
So my question is this: is this idea (a) original and (b) viable?
This is something that we talked about when I took a course in concurrency. I'm not sure if your implementation is original, but the concept of analysis to determine potential deadlock is not unique. There are dynamic analysis tools for Java, such as JCarder. There is also research into some analysis that can be done statically.
Admittedly, it's been a couple of years since I've looked around. I don't think JCarder was the specific tool we talked about (at least, the name doesn't sound familiar, but I couldn't find anything else). But the point is that analysis to detect deadlock isn't an original concept, and I'd start by looking at the research that has produced usable tools - I would suspect that the algorithms, if not the implementations, are generally available.
I have done something similar to this with Lock by supplying my own implementation.
These days I use the actor model, so there is little need to lock the data (as I have almost no shared mutable data).
In case you didn't know, you can use the Java MX bean to detect deadlocked threads programmatically. This doesn't help you in testing but it will help you at least better detect and recover in production.
ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
long[] deadLockedThreadIds = threadMxBean.findMonitorDeadlockedThreads();
// log the condition or even interrupt threads if necessary
...
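If you need more than the bare thread IDs, the same bean can resolve them to java.lang.management.ThreadInfo objects (a small follow-on sketch; note that findMonitorDeadlockedThreads() returns null when there is no deadlock, and the related findDeadlockedThreads() also covers java.util.concurrent locks):

if (deadLockedThreadIds != null) {
    ThreadInfo[] infos = threadMxBean.getThreadInfo(deadLockedThreadIds);
    for (ThreadInfo info : infos) {
        if (info == null) continue; // thread already terminated
        System.err.println("Deadlocked: " + info.getThreadName()
                + " is waiting on " + info.getLockName());
    }
}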
That way you can find some deadlocks, but never prove their absence. I'd rather develop a static checking tool, a kind of bytecode analyzer, fed with annotations for each synchronized method. The annotations would show the place of the annotated method in the resource graph. The task is then to find loops in the graph; each loop means a possible deadlock.
I have multiple threads, each with its own private concurrent queue, and all they do is run an infinite loop retrieving messages from it. It could happen that one of the queues doesn't receive messages for a period of time (maybe a couple of seconds), and they could also come in big bursts where fast processing is necessary.
I would like to know what would be the most appropriate thing to do in the first case: use a blocking queue and block the thread until I have more input, or do a Thread.yield()?
I want to have as much CPU resource available as possible at a given time, as the number of concurrent threads may increase with time, but I also don't want the message processing to fall behind, as there is no guarantee of when the thread will be rescheduled for execution after a yield(). I know that hardware, operating system and other factors play an important role here, but setting that aside and looking at it from a Java (JVM?) point of view, what would be optimal?
Always just block on the queues. Java yields in the queues internally.
In other words: You cannot get any performance benefit in the other threads if you yield in one of them rather than just block.
You certainly want to use a blocking queue - they are designed for exactly this purpose (you want your threads to not use CPU time when there is no work to do).
Thread.yield() is an extremely temperamental beast - the scheduler plays a large role in exactly what it does; and one simple but valid implementation is to simply do nothing.
Alternatively, consider converting your implementation to use one of the managed ExecutorService implementations - probably ThreadPoolExecutor.
This may not be appropriate for your use case, but if it is, it removes the whole burden of worrying about thread management from your own code - and these questions about yielding or not simply vanish.
In addition, if better thread management algorithms emerge in future - for example, something akin to Apple's Grand Central Dispatch - you may be able to convert your application to use it with almost no effort.
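A rough sketch of what that conversion might look like (handleMessage stands in for whatever your per-message processing is):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MessageDispatcher {
    // A fixed pool sized to the hardware; the executor's internal queue does the
    // blocking, so idle worker threads consume no CPU.
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    public void dispatch(String message) {
        pool.submit(() -> handleMessage(message));
    }

    private void handleMessage(String message) {
        // whatever per-message processing your threads currently do
    }

    public void shutdown() {
        pool.shutdown();
    }
}

Note that a shared pool like this does not preserve per-queue ordering, so it only fits if your messages are independent of each other.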
Another thing that you could do is use a ConcurrentHashMap for your queue. When you do a read, it gives you a reference to the object you were looking for, so it is possible you may miss a message that was just put into the queue. But if all this is doing is listening for a message, you will catch it on the next iteration. It would be different if the messages could be updated by other threads. But there doesn't really seem to be a reason to block that I can see.
I got assigned to work on some performance and random crashing issues of a multi-threaded Java server. Even though threads and thread-safety are not really new topics for me, I found out that designing a new multi-threaded application is probably half as difficult as trying to tweak some legacy code. I skimmed through some well-known books in search of answers, but the weird thing is, as long as I read about it and analyze the examples provided, everything seems clear. However, the second I look at the code I'm supposed to work on, I'm no longer sure about anything! Must be too much theoretical knowledge and too little real-world experience, or something.
Anyway, getting back on topic, as I was doing some on-line research, I came across this piece of code. The question which keeps bothering me is: Is it really safe to invoke getInputStream() and getOutputStream() on the socket from two separate threads without synchronization? Or am I now getting a bit too paranoid about the whole thread-safety issue? Guess that's what happens when like the 5th book in a row tells you how many things can possibly go wrong with concurrency.
PS. Sorry if the question is a bit lengthy or maybe too 'noobie', please be easy on me - that's my first post here.
Edit: Just to be clear, I know sockets work in full-duplex mode and it's safe to concurrently use their input and output streams. Seems fine to me when you acquire those references in the main thread and then initialize thread objects with those, but is it also safe to get those streams in two different threads?
#rsp:
So I've checked Sun's code and PlainSocketImpl does synchronize on those two methods, just as you said. Socket, however, doesn't. getInputStream() and getOutputStream() are pretty much just wrappers for SocketImpl, so probably concurrency issues wouldn't cause the whole server to explode. Still, with a bit of unlucky timing, seems like things could go wrong (e.g. when some other thread closes the socket when the method already checked for error conditions).
As you pointed out, from a code structure standpoint, it would be a good idea to supply each thread with a stream reference instead of a whole socket. I would've probably already restructured the code I'm working on if not for the fact that each thread also uses the socket's close() method (e.g. when the socket receives a "shutdown" command). As far as I can tell, the main purpose of those threads is to queue messages for sending or for processing, so maybe it's a Single Responsibility Principle violation and those threads shouldn't be able to close the socket (compare with the Separated Modem Interface)? But then, if I keep analysing the code for too long, it appears the design is generally flawed and the whole thing requires rewriting. Even if management were willing to pay the price, seriously refactoring legacy code, with no unit tests whatsoever, while dealing with hard-to-debug concurrency issues, would probably do more harm than good. Wouldn't it?
The input stream and output stream of the socket represent two separate data streams or channels. It is perfectly safe to use both streams from threads that are not synchronised with each other. The socket streams themselves will block reading and writing on empty or full buffers.
Edit: the socket implementation classes from Sun do synchronize the getInputStream() and getOutputStream() methods, so calling them from different threads should be OK. I agree with you, however, that passing the streams to the threads using them might make more sense from a code structure standpoint (dependency injection helps testing, for instance).
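For illustration, a minimal sketch of that structure (the names are mine): the streams are obtained once, and each worker thread is handed only the stream it needs rather than the Socket itself:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.net.Socket;

public class ConnectionHandler {
    public static void handle(Socket socket) throws IOException {
        // Acquire both streams up front, on the thread that owns the socket.
        InputStream in = socket.getInputStream();
        OutputStream out = socket.getOutputStream();

        Thread reader = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = r.readLine()) != null) {
                    System.out.println("received: " + line);
                }
            } catch (IOException e) {
                // connection closed or broken
            }
        });

        Thread writer = new Thread(() -> {
            PrintWriter w = new PrintWriter(out, true);
            w.println("hello"); // whatever the protocol requires
        });

        reader.start();
        writer.start();
    }
}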