I am a beginner in Multi threading and have this one doubt:
Is there any other alternative for traditional Synchronisation(which uses synchronised keywords) in java,since it affects the performance of the application?
As others have indicated, it depends on what you're trying to avoid, as well as what you're trying to achieve with multithreading.
If you mean "is there a zero-overhead way to do multithreading with shared resources," the answer is almost certainly "no." If two cars going in different directions approach an intersection at the same time, one of them will have to wait for the other one - there's no way that the cars can occupy the same space at the same time. That's why we have stop signs and traffic lights. (Alternatively, there are things like traffic circles, but even those have some overhead - you really can't just go through them at full speed as if they weren't there).
There are lots of ways of doing asynchronous and parallel operations other than using that type of synchronization:
Non-blocking I/O. The argument here is that, when you're interacting with a server or slow I/O device or something, most of the time is spent waiting for a response from the device or server, so you really don't need multiple threads to handle that - you just need to allow the original thread to do other work while it's waiting for a response. My usual analogy here is: suppose you go out to eat with a group of 10 people. When the waiter comes to take orders, the first person he asks to order isn't ready yet. The sensible thing to do, of course, is for the waiter to take other people's orders first, and then to come back to the first guy. There's no need to bring in separate waiters for each person's orders, bring in another waiter to wait for the first guy, or anything like that.
Promise/futures based async
Event-driven async
Using immutable data structures to minimize the amount of shared resources.
There are, of course, a lot of types of locking and synchronization mechanisms available other than just the synchronized keywords, such as counting semaphores, reader-writer locks, etc.
There are a lot of other types of concurrency as well, such as the actor model.
When used properly, these can help minimize your overhead and possibly reduce the amount of explicit locking and synchronization required. They all have overhead, though.
TL;DR You have overhead no matter what you do - just select the design and primitives that result in the smallest overhead for your particular use case.
You can look for ReentrantLock and ReentrantReadWriteLock.
Related
There is a bus booking site and around 100k threads are acting on it but total seats are only 20. How will you control the performance and what will be your approach in Java multithreading?
I replied that i will go for synchronize methods or block as it will control the concurrent threads and lock will prevent asynchronous execution.
But the interviewer interrupted me and said that syncronize is a bogus idea and performance will degrade and its not helpful.
Please let me know if i am missing any other useful multithreading concept that can fix this issue
You have to use Semaphore of permits 20, it means at a time 20 passengers can occupy seats if passengers are considered as thread. This may be answer only in case of core java multi threading.
This question is not about thread concept. Interviewer actually wants you not to use locking because that will make the application very slow if 100K threads are accessing the same object. It can be achieved with the optimistic locking (in jpa or hibernate).
This is accomplished by a version column in the database table, with a corresponding version attribute (#Version annotation) in the entity class. When a row is modified, the version value is incremented. The original transaction checks the version attribute, and if the data has been modified by another transaction, a javax.persistence.OptimisticLockException will be thrown, and the original transaction will be rolled back. We can use Date and timeStamp as well.
Please have a look here
javaEE Entity locking docs.oracle.com
An interviewer who asks a question like this is not looking for some specific solution or idea. They're looking to hear how you reason about synchronization. A sensible answer might have been:
With so many threads, clearly nobody cares about performance. If you care about performance, reduce the number of threads. They're just fighting over the same resource.
But if you must have so many threads, using locks makes a lot of sense as it will negate some of the harm from having so many threads by keeping most of the threads out of the ready-to-run state for as long as possible. While some method other than locks might keep more threads running at the same time, it would likely decrease performance as the threads would contend heavily causing extensive cache ping-ponging and other issues. With so many threads, blocking most of them is the best strategy to avoid contention among the executing threads and allow this poorly-designed application to actually at least get some work done.
Again, if you really want this to work sensibly, you have to rearchitect it out of the insane thread count issue.
Can anybody suggest me how can can I show Statistically difference between Normal
Multithreading and Executors with multithreading in-terms of as e.g CPU time,Total thread
user time,memory usage, & so on
Any suggestions will be helpful.
I am not sure I understand the term "Statistically difference". I believe that you are asking about using of executors and plain thread API and what is the difference among them.
First, executors a based on threads; it is just yet another layer on top of them. No magic. Plain threading API allows you creation and managing of multithreaded applications but requires dealing with gory details of thread synchronization, pooling, transfering data between threads etc.
Executors framework solves some of these problems. You can define thread pool policy, choose queue type according to your needs and just put new tasks to the incoming queue. The thread pool will execute the tasks according to it configuration.
The problem is that what your question is asking something that makes little sense.
Before you can meaningfully talk about the "statistical difference" between things, you have to have some way of quantifying and measuring them. And before that can happen, you have a clear statement of what you are trying to quantify / measure.
What you are asking satisfies none of these criteria.
Assuming that you have a meaningful question ...
At a practical level, the normal way that people try to quantify the effect of something like this (using thread pools versus creating new threads) is to develop a benchmark application with variants corresponding to the two strategies. Then measure the relative performance. But this has many problems.
The most fundamental problem that what you are actually measuring is effect of the two strategies for that benchmark, and that benchmark only. Generalizing from the benchmark to other applications is very difficult. The problem is that there are "hidden parameters" embedded in the design of any benchmark. For instance, the number of processors, the number of threads, the length and complexity of the tasks, and so on. Without having a good intuition as to what the parameters are, it is difficult to design a benchmark to take them into account. And even if you succeed in figuring out what the hidden parameters are and quantifying their effect, you have the problem that you can't figure out what those parameters will be in a real (more complex) application. At the end of the day, you'll end up with a model that can't give you quantitative answers for real problems. (Computing has nothing like Newton's Law of Gravity.)
I am developing a text-based game, MUD. I have the base functions of the program ready, and now I would like to allow to connect more than one client at a time. I plan to use threads to accomplish that.
In my game I need to store information such as current position or health points for each player. I could hold it in the database, but as it will change very quick, sometimes every second, the use of database would be inefficient (am I right?).
My question is: can threads behave as "sessions", ie hold some data unique to each user?
If yes, could you direct me to some resources that I could use to help me understand how it works?
If no, what do you suggest? Is database a good option or would you recommend something else?
Cheers,
Eleeist
Yes, they can, but this is a mind-bogglingly stupid way to do things. For one thing, it permanently locks you into a "one thread per client" model. For another thing, it makes it difficult (maybe even impossible) to implement interactions between users, which I'm sure your MUD has.
Instead, have a collection of some kind that stores your users, with data on each user. Save persistent data to the database, but you don't need to update ephemeral data on every change.
One way to handle this is to have a "changed" boolean in each user. When you make a critical change to a user, write them to the database immediately. But if it's a routine, non-critical change, just set the "changed" flag. Then have a thread come along every once in a while and write out changed users to the database (and clear the "changed" flag).
Use appropriate synchronization, of course!
A Thread per connection / user session won't scale. You can only have N number of threads active where N is equal to the number of physical cores / processors your machine has. You are also limited by the amount of memory in your machine for how many threads you can create a time, some operating systems just put arbitrary limits as well.
There is nothing magical about Threads in handling multiple clients. They will just make your code more complicated and less deterministic and thus harder to reason about what is actually happening when you start hunting logic errors.
A Thread per connection / user session would be an anti-pattern!
Threads should be stateless workers that pull things off concurrent queues and process the data.
Look at concurrent maps for caching ( or use some appropriate caching solution ) and process them and then do something else. See java.util.concurrent for all the primitive classes you need to implement something correctly.
Instead of worrying about threads and thread-safety, I'd use an in-memory SQL database like HSQLDB to store session information. Among other benefits, if your MUD turns out to be the next Angry Birds, you could more easily scale the thing up.
Definitely you can use threads as sessions. But it's a bit off the mark.
The main point of threads is the ability of concurrent, asynchronous execution. Most probably, you don't want events received from your MUD clients to happen in an parallel, uncontrolled order.
To ensure consistency of the world I'd use an in-memory database to store the game world. I'd serialize updates to it, or at least some updates to it. Imagine two players in parallel hitting a monster with HP 100. Each deals 100 damage. If you don't serialize the updates, you could end up giving credit for 100 damage to both players. Imagine two players simultaneously taking loot from the monster. Without proper serialization they could end up each with their own copy of the loot.
Threads, on the other hand, are good for asynchronous communication with clients. Use threads for that, unless something else (like a web server) does that for you already.
ThreadLocal is your friend! :)
http://docs.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html
ThreadLocal provides storage on the Thread itself. So the exact same call from 2 different threads will return/store different data.
The biggest danger is having a leak between Threads. You would have to be absolutely sure that if a different user used a Thread that someone else used, you would reset/clear the data.
In order to avoid race condition, we can synchronize the write and access methods on the shared variables, to lock these variables to other threads.
My question is if there are other (better) ways to avoid race condition? Lock make the program slow.
What I found are:
using Atomic classes, if there is only one shared variable.
using a immutable container for multi shared variables and declare this container object with volatile. (I found this method from book "Java Concurrency in Practice")
I'm not sure if they perform faster than syncnronized way, is there any other better methods?
thanks
Avoid state.
Make your application as stateless as it is possible.
Each thread (sequence of actions) should take a context in the beginning and use this context passing it from method to method as a parameter.
When this technique does not solve all your problems, use the Event-Driven mechanism (+Messaging Queue).
When your code has to share something with other components it throws event (message) to some kind of bus (topic, queue, whatever).
Components can register listeners to listen for events and react appropriately.
In this case there are no race conditions (except inserting events to the queue). If you are using ready-to-use queue and not coding it yourself it should be efficient enough.
Also, take a look at the Actors model.
Atomics are indeed more efficient than classic locks due to their non-blocking behavior i.e. a thread waiting to access the memory location will not be context switched, which saves a lot of time.
Probably the best guideline when synchronization is needed is to see how you can reduce the critical section size as much as possible. General ideas include:
Use read-write locks instead of full locks when only a part of the threads need to write.
Find ways to restructure code in order to reduce the size of critical sections.
Use atomics when updating a single variable.
Note that some algorithms and data structures that traditionally need locks have lock-free versions (they are more complicated however).
Well, first off Atomic classes uses locking (via synchronized and volatile keywords) just as you'd do if you did it yourself by hand.
Second, immutability works great for multi-threading, you no longer need monitor locks and such, but that's because you can only read your immutables, you cand modify them.
You can't get rid of synchronized/volatile if you want to avoid race conditions in a multithreaded Java program (i.e. if the multiple threads cand read AND WRITE the same data). Your best bet is, if you want better performance, to avoid at least some of the built in thread safe classes which do sort of a more generic locking, and make your own implementation which is more tied to your context and thus might allow you to use more granullar synchronization & lock aquisition.
Check out this implementation of BlockingCache done by the Ehcache guys;
http://www.massapi.com/source/ehcache-2.4.3/src/net/sf/ehcache/constructs/blocking/BlockingCache.java.html
One of the alternatives is to make shared objects immutable. Check out this post for more details.
You can perform up to 50 million lock/unlocks per second. If you want this to be more efficient I suggest using more course grain locking. i.e. don't lock every little thing, but have locks for larger objects. Once you have much more locks than threads, you are less likely to have contention and having more locks may just add overhead.
I am kicking off my final year project right now. I am going to be investigating the concurrency approaches from java and scala perspectives. Having come out of a java concurrency module, I can see why people say that the shared state threading approach is difficult to reason about. You have critical sections to worry about, run the risk of race conditions and deadlocks etc due to the non deterministic way in which java threads operate. With 1.5 this reasoning was given some clarity ,but still, far from crystal clear.
At first view, scala appears to remove this complex reasoning through the actors class. This has given the programmer the ability to develop concurrent systems from a more sequential viewpoint and easier to conceptualize. But, for this positive, am I right in saying that there are some drawbacks? For instance, say we want to sort a large list in both scenarios - with java you create two threads split the list in two, worry about the critical sections, atomic actions etc and go code. With scala, because it is "share nothing" you actually have to pass the list/2 to two actors to peform the sort operation, right?
I guess my question is that the price you pay for simpler reasoning is performance overhead of having to pass the collection to your actors, in scala?
I was thinking of doing some benchmark tests to this effect (selection sort, quick sort etc;) but because one is functional and one is imperative - I will not be comparing apples with apples from an algorithm viewpoint.
I would really appreciate any views you guys have on the above to give me some ideas to get me started.
Many thanks.
The nice thing about Scala is that you can do concurrency the Java way if you want. All the Java classes are available.
So it really boils down to the difference between a model where you have threads with concurrent access to mutable variables, and a model where you have stateful actors which send messages to each other but do not peek into each others' internals. And you're absolutely right that in some scenarios you have to trade off performance against ease of getting the code correct.
I generally find as a rough rule of thumb that if you're going to have a pile of threads spending a significant amount of time waiting for a lock to open up, using a Java model, and there is no clean way to separate the work to avoid having everyone waiting for that resource, and if the execution switches between threads quickly, then the Java model is far superior to an actor model where the actor sends an "I'm done" message back to a supervisor, which then sends out a "Here's new work!" message to an existing non-busy actor. Sorting algorithms, depending on how you envision them, can very much fall into this category.
For most everything else, the performance penalty associated with actors doesn't amount to much as far as I've seen. If you can conceive of your problem as lots and lots of reactive elements (i.e. they only need time when they've received a message), then actors can scale particularly well (millions available, though only a handful are working at any given instant); with threads, you'd need to have some sort of extra internal state to keep track of who should be doing what work, since you couldn't handle that many active threads.
I'm just going to point out here that Scala does not copy arguments passed to actors, so actors can share whatever it is passed to them.
As opposed to Erlang, it is the programmer's responsibility to avoid sharing mutable stuff. However, there is no penalty in sharing immutable stuff, since there's no need to lock it, as all accesses to it are read-only. And Scala has strong support for immutable data structures.