Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I'm midway through programming a Java program, and I'm at the stage where I'm debugging far more concurrency issues than I'd like to be dealing with.
I have to ask: how do you deal with concurrency issues when setting out your program mentally? In my case, it's for a relatively simple game, yet issues with threads keep popping up - any quick-fix almost certainly leads to a new issue.
Speaking in very general terms, what techniques should I use when deciding how my application should 'flow' without all my threads getting in a knot?
Concurrency boils down to managing shared state.
"All concurrency issues boil down to
coordinating access to mutable state.
The less mutable state, the easier it
is to ensure thread safety."
-- Java Concurrency in Practice
So the questions you must ask yourself are:
What is the inherent shared data that my application will need?
When can a thread work on a snapshot of the data, that is, momentarily work on a clone of the shared data?
Can I identify known patterns and use higher-level abstractions (queues, executors, etc.) rather than low-level locks and thread coordination?
Can I define a global locking scheme, with locks always acquired in a consistent order, so as to avoid deadlock?
The simplest approach to managing shared state is to serialize every action. This coarse-grained approach, however, results in high lock contention and poor performance. Managing concurrency can be seen as an optimization exercise where you try to reduce the contention. So subsequent questions are:
What would the simplest approach be?
What simple choices can I make to reduce contention (possibly with fine-grained locking) and improve performance without making the solution overly complicated?
When am I going too fine-grained, that is, when is the complexity introduced not worth the performance gain?
Many approaches to reducing contention rely on some form of trade-off between what would be necessary to enforce correct behavior and what is feasible to reduce contention.
Where can I relax a few constraints and accept that sometimes things won't be 100% correct (e.g. a counter)?
Can I be optimistic and deal with conflicts only when concurrent modifications happen (e.g. using timestamps and retry logic, which is what transactional memory does)?
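To make the optimistic idea a bit more concrete, here is a minimal sketch of a read, compute, compare-and-set retry loop. The class name OptimisticCounter and the "cap" rule are made up for illustration; only AtomicInteger and its compareAndSet method come from the JDK.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example: a counter that must never exceed a cap.
class OptimisticCounter {
    private final AtomicInteger value = new AtomicInteger();
    private final int max;

    OptimisticCounter(int max) {
        this.max = max;
    }

    /** Adds one unless the cap is reached; returns true if the update was applied. */
    boolean incrementIfBelowMax() {
        while (true) {
            int current = value.get();      // read a snapshot, no lock taken
            if (current >= max) {
                return false;               // cap reached, nothing to write
            }
            // Write only if nobody changed the value since we read it;
            // on a conflict, loop and retry with a fresh snapshot.
            if (value.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    int get() {
        return value.get();
    }

    public static void main(String[] args) {
        OptimisticCounter counter = new OptimisticCounter(2);
        System.out.println(counter.incrementIfBelowMax()); // true
        System.out.println(counter.incrementIfBelowMax()); // true
        System.out.println(counter.incrementIfBelowMax()); // false
    }
}
```

The conflict handling only costs something when two threads actually collide, which is the whole point of being optimistic. Under heavy contention the retries can add up, so a plain lock may still win there.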
Note that I never worked on a game, only on server-side part of enterprise apps. I can imagine that it can be quite different.
I use immutable data structures as much as possible. About the only time I do use mutable structures is when I have to, such as with a library that will save a boatload of work. Even then I try to encapsulate that library in an immutable structure. If things can't change then there's less to worry about.
I should add that some things to keep in mind on your future endeavors are STM and Actor models. Both of these approaches to concurrency are showing very good progress. While there is some overhead for each, depending on the nature of your program that might not be an issue.
Edit:
Here are a few links to some libraries you could use in your next project. There's Deuce STM which as the name implies is an STM implementation for Java. Then there's the ActorFoundry which as the name implies is an Actor model for Java. However, I can't help but make the plug for Scala with its built in Actor model.
The fewer threads you have, the smaller state they share, and the simpler their interaction pattern on this shared state, the simpler your life will be.
You say Lists are throwing ConcurrentModificationException. I take it that your lists are accessed by separate threads. So the first thing you should ask yourself is whether this is necessary. Is it not possible for the second thread to operate on a copy of the list?
If it is indeed necessary for the threads to access the list concurrently, locking the list during the entire traversal might be an option (Iterators are invalidated if the list is modified by any other means than that iterator). Of course, if you do other things while traversing the list, this traversal might take long, and locking out other threads might threaten the liveness of the system.
Also keep in mind that if the list is shared state, so are its contents, so if you intend to circumvent locking by copying the list, be sure to perform a deep copy, or prove that the objects contained in the list are themselves thread safe.
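A rough sketch of the two options, assuming the shared list is wrapped with Collections.synchronizedList. The class and method names are invented for the example. The elements are Strings, which are immutable, so a shallow copy is enough here; with mutable elements you would need the deep copy mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SharedListTraversal {
    // synchronizedList guards individual calls; iteration still needs a manual lock.
    private final List<String> names = Collections.synchronizedList(new ArrayList<>());

    void add(String name) {
        names.add(name);
    }

    /** Option 1: hold the list's lock for the whole traversal. */
    int totalLengthLocked() {
        int total = 0;
        synchronized (names) {
            for (String name : names) {
                total += name.length();
            }
        }
        return total;
    }

    /** Option 2: copy under the lock, then let the caller traverse the copy lock-free. */
    List<String> snapshot() {
        synchronized (names) {
            return new ArrayList<>(names);
        }
    }

    public static void main(String[] args) {
        SharedListTraversal shared = new SharedListTraversal();
        shared.add("ab");
        shared.add("cde");
        System.out.println(shared.totalLengthLocked()); // 5
        System.out.println(shared.snapshot());          // [ab, cde]
    }
}
```

Option 1 blocks writers for as long as the loop runs; option 2 blocks them only for the copy, at the price of working on slightly stale data.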
It's possible that the multi-threaded nature of your application might be a red herring, with respect to the ConcurrentModificationExceptions you mentioned: there are other ways that you can get a ConcurrentModificationException that don't necessarily involve multiple threads.
Consider the following:
List<Item> items = new ArrayList<Item>();
//... some code adding items to the list
for (Item item : items) {
    if (item.isTheOneIWantToRemove()) {
        // Modifies the list behind the iterator's back; the next iteration
        // step will typically throw a ConcurrentModificationException
        items.remove(item);
    }
}
Changing the for-each loop to a loop with an explicit iterator, or to a loop over an increasing index value, solves the problem:
for (Iterator<Item> it = items.iterator(); it.hasNext();) {
    Item item = it.next();
    if (item.isTheOneIWantToRemove()) {
        it.remove(); // No exception thrown
    }
}
or
for (int i = 0; i < items.size(); i++) {
    Item item = items.get(i);
    if (item.isTheOneIWantToRemove()) {
        items.remove(i); // No exception thrown
        i--; // stay on this index: the next element has shifted into it
    }
}
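If you are on Java 8 or later, Collection.removeIf does the same job in one call and hides the iterator bookkeeping. A small sketch; the Item class here is only a stand-in for the Item type used above, since the original isn't shown.

```java
import java.util.ArrayList;
import java.util.List;

class RemoveIfExample {
    // Hypothetical stand-in for the Item type used in the snippets above.
    static class Item {
        private final boolean unwanted;

        Item(boolean unwanted) {
            this.unwanted = unwanted;
        }

        boolean isTheOneIWantToRemove() {
            return unwanted;
        }
    }

    /** Removes every unwanted item in place and returns the remaining size. */
    static int removeUnwanted(List<Item> items) {
        items.removeIf(Item::isTheOneIWantToRemove);
        return items.size();
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        items.add(new Item(true));
        items.add(new Item(false));
        items.add(new Item(true));
        System.out.println(removeUnwanted(items)); // 1
    }
}
```

Note this still does nothing for real multi-threaded access; it only avoids the single-threaded trap described above.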
From the design perspective, I've found it useful to draw sequence diagrams where each thread's actions are color coded (that is, each thread has its own color). Using color in this way may be a non-standard use of a sequence diagram, but it's good for giving an overview of how and where threads interact.
As others have mentioned though, reducing the amount of threading in your design to the absolute minimum it needs to work properly will help a lot as well.
It depends on what your threads do. Typically programs have a main thread that does the thinking and worker threads that do parallel tasks (timers, handling long computations behind a GUI, etc.). But your app may be different - it depends on your design. What do you use threads for? What locks do you have to protect shared data structures? If you use multiple locks, do you have a single order in which you lock to prevent deadlocks?
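On that last question, here is a minimal sketch of what "a single order in which you lock" can look like. Everything in it (the Account class, the id field, transfer) is invented for illustration, and it assumes each object has a unique id.

```java
class LockOrderingExample {
    // Hypothetical shared object; the id exists only to give locks a global order.
    static class Account {
        private final int id;
        private long balance;

        Account(int id, long balance) {
            this.id = id;
            this.balance = balance;
        }

        synchronized long balance() {
            return balance;
        }
    }

    /** Moves an amount between two accounts without risking a lock-order deadlock. */
    static void transfer(Account from, Account to, long amount) {
        // Always lock the account with the smaller id first, whichever way
        // the money flows. Two opposite transfers then contend for the same
        // first lock instead of each holding one lock and waiting for the other.
        Account first = from.id < to.id ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    public static void main(String[] args) {
        Account a = new Account(1, 100);
        Account b = new Account(2, 50);
        transfer(a, b, 30);
        transfer(b, a, 10);
        System.out.println(a.balance() + " " + b.balance()); // 80 70
    }
}
```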
Try to use collections from java.util.concurrent package or even better immutable collections from Google Collections.
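As one example from java.util.concurrent, CopyOnWriteArrayList lets you modify a list while iterating over it, because each iterator works on a snapshot of the backing array. The sketch below is only an illustration (the entity names are made up), and it suits lists that are read far more often than written, since every write copies the array.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CopyOnWriteExample {
    /** Removes an element during iteration and returns the resulting size. */
    static int removeWhileIterating() {
        List<String> entities =
                new CopyOnWriteArrayList<>(Arrays.asList("player", "enemy", "bullet"));
        for (String entity : entities) {
            if (entity.equals("enemy")) {
                // Safe: the loop iterates over the snapshot taken when it started,
                // so no ConcurrentModificationException is thrown.
                entities.remove(entity);
            }
        }
        return entities.size();
    }

    public static void main(String[] args) {
        System.out.println(removeWhileIterating()); // 2
    }
}
```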
Read about using synchronized blocks
I am a beginner in Multi threading and have this one doubt:
Is there any alternative to traditional synchronization (which uses the synchronized keyword) in Java, since it affects the performance of the application?
As others have indicated, it depends on what you're trying to avoid, as well as what you're trying to achieve with multithreading.
If you mean "is there a zero-overhead way to do multithreading with shared resources," the answer is almost certainly "no." If two cars going in different directions approach an intersection at the same time, one of them will have to wait for the other one - there's no way that the cars can occupy the same space at the same time. That's why we have stop signs and traffic lights. (Alternatively, there are things like traffic circles, but even those have some overhead - you really can't just go through them at full speed as if they weren't there).
There are lots of ways of doing asynchronous and parallel operations other than using that type of synchronization:
Non-blocking I/O. The argument here is that, when you're interacting with a server or slow I/O device or something, most of the time is spent waiting for a response from the device or server, so you really don't need multiple threads to handle that - you just need to allow the original thread to do other work while it's waiting for a response. My usual analogy here is: suppose you go out to eat with a group of 10 people. When the waiter comes to take orders, the first person he asks to order isn't ready yet. The sensible thing to do, of course, is for the waiter to take other people's orders first, and then to come back to the first guy. There's no need to bring in separate waiters for each person's orders, bring in another waiter to wait for the first guy, or anything like that.
Promise/futures based async
Event-driven async
Using immutable data structures to minimize the amount of shared resources.
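For the promise/future style in particular, here is a small sketch using CompletableFuture from the JDK (Java 8+). The slowLookup method is a placeholder for whatever slow call you actually have.

```java
import java.util.concurrent.CompletableFuture;

class FutureExample {
    // Placeholder for a slow operation such as a remote call or a disk read.
    static int slowLookup(int value) {
        return value;
    }

    /** Starts two lookups asynchronously and combines their results. */
    static int loadAndCombine() {
        CompletableFuture<Integer> first = CompletableFuture.supplyAsync(() -> slowLookup(20));
        CompletableFuture<Integer> second = CompletableFuture.supplyAsync(() -> slowLookup(22));
        // No shared mutable state: each task returns a value, and the
        // combination step runs only once both values are available.
        return first.thenCombine(second, Integer::sum).join();
    }

    public static void main(String[] args) {
        System.out.println(loadAndCombine()); // 42
    }
}
```

There is still synchronization going on inside the future machinery, so this is not free either; it just moves the coordination out of your own code.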
There are, of course, a lot of types of locking and synchronization mechanisms available other than just the synchronized keywords, such as counting semaphores, reader-writer locks, etc.
There are a lot of other types of concurrency as well, such as the actor model.
When used properly, these can help minimize your overhead and possibly reduce the amount of explicit locking and synchronization required. They all have overhead, though.
TL;DR You have overhead no matter what you do - just select the design and primitives that result in the smallest overhead for your particular use case.
You can look for ReentrantLock and ReentrantReadWriteLock.
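A minimal sketch of ReentrantReadWriteLock guarding a plain HashMap; the ScoreBoard class is invented for the example. Many readers can hold the read lock at once, while a writer gets exclusive access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class ScoreBoard {
    private final Map<String, Integer> scores = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Readers share the read lock, so lookups do not block each other. */
    Integer get(String player) {
        lock.readLock().lock();
        try {
            return scores.get(player);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** A writer takes the write lock and excludes readers and other writers. */
    void put(String player, int score) {
        lock.writeLock().lock();
        try {
            scores.put(player, score);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        ScoreBoard board = new ScoreBoard();
        board.put("alice", 7);
        System.out.println(board.get("alice")); // 7
    }
}
```

Whether this beats plain synchronized depends on the read/write ratio; with frequent writes the extra bookkeeping can make it slower.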
In order to avoid race conditions, we can synchronize the write and access methods on the shared variables, locking these variables against other threads.
My question is whether there are other (better) ways to avoid race conditions. Locks make the program slow.
What I found are:
using Atomic classes, if there is only one shared variable.
using an immutable container for multiple shared variables and declaring this container object volatile. (I found this method in the book "Java Concurrency in Practice".)
I'm not sure if they perform faster than the synchronized way. Are there any other, better methods?
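The second technique can be sketched roughly as follows. This is not the book's code, just a minimal illustration with made-up names (RangeHolder, Range): the related values live in one immutable object, and a single volatile reference to that object is swapped as a whole.

```java
class RangeHolder {
    // Immutable pair: both fields are final and set once, in the constructor.
    static final class Range {
        final int lower;
        final int upper;

        Range(int lower, int upper) {
            if (lower > upper) {
                throw new IllegalArgumentException("lower must not exceed upper");
            }
            this.lower = lower;
            this.upper = upper;
        }
    }

    // volatile makes each newly assigned Range visible to all threads.
    private volatile Range range = new Range(0, 10);

    /** Replaces both bounds in one step; readers never see a half-updated pair. */
    void set(int lower, int upper) {
        range = new Range(lower, upper);
    }

    boolean contains(int x) {
        Range current = range; // read the reference once, then use that snapshot
        return current.lower <= x && x <= current.upper;
    }

    public static void main(String[] args) {
        RangeHolder holder = new RangeHolder();
        System.out.println(holder.contains(5));  // true
        holder.set(20, 30);
        System.out.println(holder.contains(5));  // false
    }
}
```

This is enough when writers simply overwrite the state. If the new value depends on the old one (read-modify-write), volatile alone does not prevent lost updates and you need a lock or a compare-and-set.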
thanks
Avoid state.
Make your application as stateless as it is possible.
Each thread (sequence of actions) should take a context in the beginning and use this context passing it from method to method as a parameter.
When this technique does not solve all your problems, use the Event-Driven mechanism (+Messaging Queue).
When your code has to share something with other components it throws event (message) to some kind of bus (topic, queue, whatever).
Components can register listeners to listen for events and react appropriately.
In this case there are no race conditions (except inserting events to the queue). If you are using ready-to-use queue and not coding it yourself it should be efficient enough.
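A bare-bones sketch of that idea, using a ready-made BlockingQueue as the "bus". The event names and the STOP sentinel are invented for the example. The only object both threads touch is the queue itself; the result list belongs to the consumer until it has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class EventQueueExample {
    static final String STOP = "STOP"; // made-up sentinel that ends the consumer

    /** Publishes a few events and returns what the consumer thread made of them. */
    static List<String> run() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> handled = new ArrayList<>(); // written only by the consumer thread

        Thread consumer = new Thread(() -> {
            try {
                // take() blocks until an event is available.
                for (String e = queue.take(); !e.equals(STOP); e = queue.take()) {
                    handled.add(e.toUpperCase());
                }
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // The producer side: throw events at the queue and move on.
        queue.add("jump");
        queue.add("fire");
        queue.add(STOP);

        try {
            consumer.join(); // after join() the consumer's writes are visible here
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [JUMP, FIRE]
    }
}
```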
Also, take a look at the Actors model.
Atomics are indeed more efficient than classic locks due to their non-blocking behavior i.e. a thread waiting to access the memory location will not be context switched, which saves a lot of time.
Probably the best guideline when synchronization is needed is to see how you can reduce the critical section size as much as possible. General ideas include:
Use read-write locks instead of full locks when only a part of the threads need to write.
Find ways to restructure code in order to reduce the size of critical sections.
Use atomics when updating a single variable.
Note that some algorithms and data structures that traditionally need locks have lock-free versions (they are more complicated however).
Well, first off, the Atomic classes are not free of synchronization either: they are built on volatile fields and hardware compare-and-swap instructions rather than the synchronized keyword, so the coordination is still there, just at a lower level.
Second, immutability works great for multi-threading; you no longer need monitor locks and such, but that's because you can only read your immutables, you can't modify them.
You can't get rid of synchronized/volatile if you want to avoid race conditions in a multithreaded Java program (i.e. if multiple threads can read AND WRITE the same data). Your best bet, if you want better performance, is to avoid at least some of the built-in thread-safe classes, which do a more generic kind of locking, and make your own implementation that is more closely tied to your context and thus might allow you to use more granular synchronization and lock acquisition.
Check out this implementation of BlockingCache done by the Ehcache guys;
http://www.massapi.com/source/ehcache-2.4.3/src/net/sf/ehcache/constructs/blocking/BlockingCache.java.html
One of the alternatives is to make shared objects immutable. Check out this post for more details.
You can perform up to 50 million lock/unlocks per second. If you want this to be more efficient I suggest using more coarse-grained locking, i.e. don't lock every little thing, but have locks for larger objects. Once you have many more locks than threads, you are less likely to have contention, and having more locks may just add overhead.
I am kicking off my final year project right now. I am going to be investigating the concurrency approaches from Java and Scala perspectives. Having come out of a Java concurrency module, I can see why people say that the shared-state threading approach is difficult to reason about. You have critical sections to worry about, and you run the risk of race conditions, deadlocks, etc. due to the non-deterministic way in which Java threads operate. With 1.5 this reasoning was given some clarity, but it is still far from crystal clear.
At first view, Scala appears to remove this complex reasoning through the actors class. This gives the programmer the ability to develop concurrent systems from a more sequential viewpoint that is easier to conceptualize. But, for this positive, am I right in saying that there are some drawbacks? For instance, say we want to sort a large list in both scenarios - with Java you create two threads, split the list in two, worry about the critical sections, atomic actions, etc. and go code. With Scala, because it is "share nothing", you actually have to pass the list/2 to two actors to perform the sort operation, right?
I guess my question is that the price you pay for simpler reasoning is performance overhead of having to pass the collection to your actors, in scala?
I was thinking of doing some benchmark tests to this effect (selection sort, quick sort etc;) but because one is functional and one is imperative - I will not be comparing apples with apples from an algorithm viewpoint.
I would really appreciate any views you guys have on the above to give me some ideas to get me started.
Many thanks.
The nice thing about Scala is that you can do concurrency the Java way if you want. All the Java classes are available.
So it really boils down to the difference between a model where you have threads with concurrent access to mutable variables, and a model where you have stateful actors which send messages to each other but do not peek into each others' internals. And you're absolutely right that in some scenarios you have to trade off performance against ease of getting the code correct.
I generally find as a rough rule of thumb that if you're going to have a pile of threads spending a significant amount of time waiting for a lock to open up, using a Java model, and there is no clean way to separate the work to avoid having everyone waiting for that resource, and if the execution switches between threads quickly, then the Java model is far superior to an actor model where the actor sends an "I'm done" message back to a supervisor, which then sends out a "Here's new work!" message to an existing non-busy actor. Sorting algorithms, depending on how you envision them, can very much fall into this category.
For most everything else, the performance penalty associated with actors doesn't amount to much as far as I've seen. If you can conceive of your problem as lots and lots of reactive elements (i.e. they only need time when they've received a message), then actors can scale particularly well (millions available, though only a handful are working at any given instant); with threads, you'd need to have some sort of extra internal state to keep track of who should be doing what work, since you couldn't handle that many active threads.
I'm just going to point out here that Scala does not copy arguments passed to actors, so actors can share whatever it is passed to them.
As opposed to Erlang, it is the programmer's responsibility to avoid sharing mutable stuff. However, there is no penalty in sharing immutable stuff, since there's no need to lock it, as all accesses to it are read-only. And Scala has strong support for immutable data structures.
All,
What should be the approach to writing a thread-safe program? Given a problem statement, my perspective is:
1 > Start off with writing the code for a single threaded environment.
2 > Underline the fields which would need atomicity and replace with possible concurrent classes
3 > Underline the critical section and enclose them in synchronized
4 > Perform test for deadlocks
Does anyone have any suggestions on the other approaches or improvements to my approach. So far, I can see myself enclosing most of the code in synchronized blocks and I am sure this is not correct.
Programming in Java
Writing correct multi-threaded code is hard, and there is not a magic formula or set of steps that will get you there. But, there are some guidelines you can follow.
Personally I wouldn't start with writing code for a single threaded environment and then converting it to multi-threaded. Good multi-threaded code is designed with multi-threading in mind from the start. Atomicity of fields is just one element of concurrent code.
You should decide on what areas of the code need to be multi-threaded (in a multi-threaded app, typically not everything needs to be threadsafe). Then you need to design how those sections will be threadsafe. The methods for making one area of the code threadsafe may be different from those for other areas. For example, understanding whether there will be a high volume of reading vs writing is important and might affect the types of locks you use to protect the data.
Immutability is also a key element of threadsafe code. When elements are immutable (i.e. cannot be changed), you don't need to worry about multiple threads modifying them since they cannot be changed. This can greatly simplify thread safety issues and allow you to focus on where you will have multiple data readers and writers.
Understanding details of concurrency in Java (and details of the Java memory model) is very important. If you're not already familiar with these concepts, I recommend reading Java Concurrency In Practice http://www.javaconcurrencyinpractice.com/.
You should use final and immutable fields wherever possible, any other data that you want to change add inside:
synchronized (this) {
// update
}
And remember, sometimes stuff breaks, and if that happens, you don't want to prolong the program execution by taking every possible way to counter it - instead "fail fast".
As you have asked about "thread-safety" and not concurrent performance, then your approach is essentially sound. However, a thread-safe program that uses synchronisation probably does not scale much in a multi cpu environment with any level of contention on your structure/program.
Personally I like to try and identify the highest level state changes and try and think about how to make them atomic, and have the state changes move from one immutable state to another – copy-on-write if you like. Then the actual write can be either a compare-and-set operation on an atomic variable or a synchronised update or whatever strategy works/performs best (as long as it safely publishes the new state).
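Roughly what I mean, as a sketch with invented names (GameStateHolder, State, hit): the whole state is one immutable object, and each change builds a new object and swaps it in with a compare-and-set. AtomicReference.updateAndGet (Java 8+) runs the retry loop for you.

```java
import java.util.concurrent.atomic.AtomicReference;

class GameStateHolder {
    // Immutable snapshot of everything that must change together.
    static final class State {
        final int score;
        final int lives;

        State(int score, int lives) {
            this.score = score;
            this.lives = lives;
        }
    }

    private final AtomicReference<State> state = new AtomicReference<>(new State(0, 3));

    /** Moves from one immutable state to the next in a single atomic step. */
    State hit(int points) {
        // The function may be re-run if another thread got in first,
        // so it must be free of side effects.
        return state.updateAndGet(s -> new State(s.score + points, s.lives - 1));
    }

    State current() {
        return state.get();
    }

    public static void main(String[] args) {
        GameStateHolder holder = new GameStateHolder();
        State after = holder.hit(10);
        System.out.println(after.score + " " + after.lives); // 10 2
    }
}
```

Readers call current() and get a consistent score/lives pair without taking any lock.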
This can be a bit difficult to structure if your new state is quite different (requires updates to several fields for instance), but I have seen it very successfully solve concurrent performance issues with synchronised access.
Buy and read Brian Goetz's "Java Concurrency in Practice".
Any variables (memory) accessible by multiple threads potentially at the same time, need to be protected by a synchronisation mechanism.
It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 11 years ago.
I have been considering adding threaded procedures to my application to speed up execution, but the problem is that I honestly have no idea how to use threads, or what is considered "thread safe". For example, how does a game engine utilize threads in its rendering processes, or in what contexts would threads only be considered nothing but a hindrance? Can someone point the way to some resources to help me learn more or explain here?
This is a very broad topic. But here are the things I would want to know if I knew nothing about threads:
They are units of execution within a single process that happen "in parallel" - what this means is that the current unit of execution in the processor switches rapidly. This can be achieved via different means. Switching is called "context switching", and there is some overhead associated with this.
They can share memory! This is where problems can occur. I talk about this more in depth in a later bullet point.
The benefit of parallelizing your application is that logic that uses different parts of the machine can happen simultaneously. That is, if part of your process is I/O-bound and part of it is CPU-bound, the I/O intensive operation doesn't have to wait until the CPU-intensive operation is done. Some languages also allow you to run threads at the same time if you have a multicore processor (and thus parallelize CPU-intensive operations as well), though this is not always the case.
Thread-safe means that there are no race conditions, which is the term used for problems that occur when the execution of your process depends on timing (something you don't want to rely on). For example, if you have threads A and B both incrementing a shared counter C, you could see the case where A reads the value of C, then B reads the value of C, then A overwrites C with C+1, then B overwrites C with C+1. Notice that C only actually increments once!
A couple of common ways to avoid race conditions include synchronization, which enforces mutually exclusive access to shared state, or just not having any shared state at all. But this is just the tip of the iceberg - thread-safety is quite a broad topic.
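Here is the counter example from above as a small Java sketch (Java only because several answers here use it; the class name is made up). One counter is a plain field, the other an AtomicInteger. The plain one can lose increments exactly as described; how many it loses, if any, varies from run to run.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CounterRace {
    private int unsafe;                                      // plain field: updates can be lost
    private final AtomicInteger safe = new AtomicInteger();  // atomic read-modify-write

    /** Runs the given number of threads; returns {unsafe total, safe total}. */
    int[] run(int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    unsafe++;               // read, add, write: three separate steps
                    safe.incrementAndGet(); // one indivisible step
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait for every worker before reading the totals
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] { unsafe, safe.get() };
    }

    public static void main(String[] args) {
        int[] totals = new CounterRace().run(4, 100_000);
        System.out.println("unsafe=" + totals[0] + " safe=" + totals[1]);
    }
}
```

The safe total is always threads times perThread; the unsafe total can come out lower.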
I hope that helps! Understand that this was a very quick introduction to something that requires a good bit of learning. I would recommend finding a resource about multithreading in your preferred language, whatever that happens to be, and giving it a thorough read.
There are four things you should know about threads.
Threads are like processes, but they share memory.
Threads often have hardware, OS, and language support, which might make them better than processes.
There are lots of fussy little things that threads need to support (like locks and semaphores) so they don't get the memory they share into an inconsistent state. This makes them a little difficult to use.
Locking isn't automatic (in the languages I know), so you have to be very careful with the memory they (implicitly) share.
Threads don't speed up applications. Algorithms speed up applications. Threads can be used in algorithms, if appropriate.
Well, someone will probably answer this better, but threads are for the purpose of having background processing that won't freeze the user interface. You don't want to stop accepting keyboard input or mouse input, and tell the user, "just a moment, I want to finish this computation, it will only be a few more seconds." (And yet it's amazing how many times commercial programs do this.)
As far as thread safe, it means a function that does not have some internal saved state. If it did you couldn't have multiple threads using it simultaneously.
As far as thread programming, you just have to start doing it, and then you'll start encountering various issues unique to thread programming, for example simultaneous access to data, in which case you have to decide to use some synchronization method such as critical sections or mutexes or something else, each one having slightly different nuances in their behavior.
As far as the differences between processes and threads (which you didn't ask) processes are an OS level entity, whereas threads are associated with a program. In certain instances your program may want to create a process rather than a thread.
Threads are simply a way of executing multiple things simultaneously (assuming that the platform on which they are being run is capable of parallel execution). Thread safety is simply (well, nothing with threads is truly simple) making sure that the threads don't affect each other in harmful ways.
In general, you are unlikely to see systems use multiple threads for rendering graphics on the screen due to the multiple performance implications and complexity issues that may arise from that. Other tasks related to state management (or AI) can potentially be moved to separate threads however.
First rule of threading: don't thread. Second rule of threading: if you have to violate rule one...don't. Third rule: OK, fine you have to use threads, so before proceeding get your head into the pitfalls, understand locking and the common thread problems such as deadlock and livelocking.
Understand that threading does not speed up anything; it is only useful for moving long-running processes into the background so that the user can do something else with the application. If you have to allow the user to interact with the application while the app does something else in the background, like polling a socket or waiting for asynchronous input from elsewhere in the application, then you may indeed require threading.
The thread sections in both Effective Java and Clean Code are good introductions to threads and their pitfalls.
Since the question is tagged with 'Java', I assume you are familiar with Java, in which case this is a great introductory tutorial
http://java.sun.com/docs/books/tutorial/essential/concurrency/
Orm, great question to ask. I think all serious programmers should learn about threads, cause eventually you will at least consider using them and you really want to be prepared when it happens. Concurrency bugs can be incredibly subtle and the best way to avoid them is to know what idioms are safe(-ish).
I highly recommend you take the time to read the book Concurrent Programming in Java: Design Principles and Patterns by Doug Lea:
http://gee.cs.oswego.edu/dl/cpj/
Lea takes the time not only to teach you the concepts, but also to show you the correct and incorrect ways to use the concurrent programming primitives (in Java but also helpful for any other environment that uses shared-memory locking/signaling style concurrency). Most of all he teaches respect for the difficulty of concurrent programming.
I should add that this style of concurrent programming is the most common but not the only approach. There's also message passing, which is safer but forces you to structure your algorithm differently.
Since the original post is very broad, and also tagged with C++, I think the following pointers are relevant:
Anthony Williams, maintainer of the Boost Thread Library, has been working on a book called "C++ Concurrency in Action", a description of which you can find here. The first (introductory) chapter is available for free in pdf form here.
Also, Herb Sutter (known, among other things, for his "Exceptional C++" series) has been writing a book to be called "Effective Concurrency", many articles of which are available in draft form here.
There's a nice book, Java Concurrency in Practice, http://www.javaconcurrencyinpractice.com/ .