I am developing a text-based game, MUD. I have the base functions of the program ready, and now I would like to allow to connect more than one client at a time. I plan to use threads to accomplish that.
In my game I need to store information such as current position or health points for each player. I could hold it in the database, but as it will change very quick, sometimes every second, the use of database would be inefficient (am I right?).
My question is: can threads behave as "sessions", ie hold some data unique to each user?
If yes, could you direct me to some resources that I could use to help me understand how it works?
If no, what do you suggest? Is database a good option or would you recommend something else?
Cheers,
Eleeist
Yes, they can, but this is a mind-bogglingly stupid way to do things. For one thing, it permanently locks you into a "one thread per client" model. For another thing, it makes it difficult (maybe even impossible) to implement interactions between users, which I'm sure your MUD has.
Instead, have a collection of some kind that stores your users, with data on each user. Save persistent data to the database, but you don't need to update ephemeral data on every change.
One way to handle this is to have a "changed" boolean in each user. When you make a critical change to a user, write them to the database immediately. But if it's a routine, non-critical change, just set the "changed" flag. Then have a thread come along every once in a while and write out changed users to the database (and clear the "changed" flag).
Use appropriate synchronization, of course!
A Thread per connection / user session won't scale. You can only have N number of threads active where N is equal to the number of physical cores / processors your machine has. You are also limited by the amount of memory in your machine for how many threads you can create a time, some operating systems just put arbitrary limits as well.
There is nothing magical about Threads in handling multiple clients. They will just make your code more complicated and less deterministic and thus harder to reason about what is actually happening when you start hunting logic errors.
A Thread per connection / user session would be an anti-pattern!
Threads should be stateless workers that pull things off concurrent queues and process the data.
Look at concurrent maps for caching ( or use some appropriate caching solution ) and process them and then do something else. See java.util.concurrent for all the primitive classes you need to implement something correctly.
Instead of worrying about threads and thread-safety, I'd use an in-memory SQL database like HSQLDB to store session information. Among other benefits, if your MUD turns out to be the next Angry Birds, you could more easily scale the thing up.
Definitely you can use threads as sessions. But it's a bit off the mark.
The main point of threads is the ability of concurrent, asynchronous execution. Most probably, you don't want events received from your MUD clients to happen in an parallel, uncontrolled order.
To ensure consistency of the world I'd use an in-memory database to store the game world. I'd serialize updates to it, or at least some updates to it. Imagine two players in parallel hitting a monster with HP 100. Each deals 100 damage. If you don't serialize the updates, you could end up giving credit for 100 damage to both players. Imagine two players simultaneously taking loot from the monster. Without proper serialization they could end up each with their own copy of the loot.
Threads, on the other hand, are good for asynchronous communication with clients. Use threads for that, unless something else (like a web server) does that for you already.
ThreadLocal is your friend! :)
http://docs.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html
ThreadLocal provides storage on the Thread itself. So the exact same call from 2 different threads will return/store different data.
The biggest danger is having a leak between Threads. You would have to be absolutely sure that if a different user used a Thread that someone else used, you would reset/clear the data.
Related
I am writing a video game in my spare time and have a question about data consistency when introducing mult-threading.
At the moment my game is single threaded and has a simple game loop as it is taught in many tutorials:
while game window is not closed
{
poll user input
react to user input
update game state
render game objects
flip buffers
}
I now want to add a new feature to my game where the player can automate certain tasks that are long and tedious, like walking long distances (fast travel). I may chose to simply "teleport" the player character to their destination but I would prefer not to. Instead, the game will be sped up and the player character will actually walk as if the player was doing it manually. The benefit of this is that the game world will interact with the player character as usual and any special events that might happen will still happen and immediately stop the fast travel.
To implement this feature I was thinking about something like this:
Start a new thread (worker thread) and have that thread update the game state continuously until the player character reaches its destination
Have the main thread no longer update the game state and render the games objects as usual and instead display the travel progress in a more simplistic manner
Use a synchronized message queue to have the main thread and the worker thread communicate
When the fast travel is finished or canceled (by player interaction or other reasons) have the worker thread die and resume the standard game loop with the main thread
In pseudo code it may look like this:
[main thread]
while game window is not closed
{
poll user input
if user wants to cancel fast travel
{
write to message queue player input "cancel"
}
poll message queue about fast travel status
if fast travel finished or canceled
{
resume regular game loop
} else {
render travel status
flip buffers
}
}
[worker thread]
while (travel ongoing)
{
poll message queue
if user wants to cancel fast travel
{
write to message queue fast travel status "canceled"
return
}
update game state
if fast travel is interrupted by internal game event
{
write to message queue fast travel status "canceled"
return
}
write to message queue fast travel status "ongoing"
}
if travel was finished
{
write to message queue fast travel status "finished"
}
The message queue will be some kind of two-channeled synchronized data structure. Maybe two ArrayDeque's with a Lock for each. I am fairly certain this will not be too much trouble.
What I am more concerned is caching problems with the game data:
1.a) Could it be that the worker thread, after being started, may see old game data because the main thread may run on a different core which has cached some of its results?
1.b) If the above is true: Would I need to declare every single field in the game data as volatile to protect myself with absolute guarantee against inconsistent data?
2) Am I right to assume that performance would take a non trivial hit if all fields are volatile?
3) Since I only need to pass the data between threads at few and well controlled points in time, would it be possible to force all caches to write back to main memory instead of using volatile fields?
4) Is there a better approach? Is my concept perhaps ill conceived?
Thanks for any help and sorry for the big chunk of text. I thought it would be easier to answer the question if you know the intended use.
Since I only need to pass the data between threads at few and well controlled points in time, would it be possible to force all caches to write back to main memory instead of using volatile fields?
No. That's not how any of this works. Let me give you very short answers to explain why you are thinking about this the wrong way:
1.a) Could it be that the worker thread, after being started, may see old game data because the main thread may run on a different core which has cached some of its results?
Sure. Or it might for some other reason. Memory visibility is not guaranteed, so you can't rely on it unless you use something guaranteed to provide memory visilbity.
1.b) If the above is true: Would I need to declare every single field in the game data as volatile to protect myself with absolute guarantee against inconsistent data?
No. Any method of assuring memory visibility will work. You don't have to do it any particular way.
2) Am I right to assume that performance would take a non trivial hit if all fields are volatile?
Probably. This would probably be the worst possible way to do it.
3) Since I only need to pass the data between threads at few and well controlled points in time, would it be possible to force all caches to write back to main memory instead of using volatile fields?
No. Since there is no "write cache back to memory" operation that assures memory visibility. Your platform may not even have such caches and the issue might be something else entirely. You're writing Java code, you don't have to think about how your particular CPU works, what cores or caches it has, or anything like that. That's one of the big advantages of using a language with semantics that are guaranteed and don't talk about cores, caches, or anything like this.
4) Is there a better approach? Is my concept perhaps ill conceived?
Absolutely. You are writing Java code. Use the various Java synchronization classes and functions and rely on them to prove the semantics they're documented to provide. Don't even think about cores, caches, flushing to memory, or anything like that. Those are hardware details that, as a Java programmer, you don't even have to ever think about.
Any Java documentation you see that talks about cores, caches, or flushes to memory is not actually talking about real cores, caches, or flushes to memory. It's just giving you some ways to think about hypothetical hardware so you can wrap your brain around why memory visibility and total ordering don't always work perfectly just by themselves. Your real CPU or platform may have completely different issues that bear no resemblance to this hypothetical hardware. (And real-world CPUs and systems have cache coherency guaranteed by hardware and their visibility/ordering issues in fact are completely different!)
I am in the process of designing a system where there's a main stream of objects and there are multiple workers which produces some result from that object. Finally, there is some special/unique worker (sort of a "sink", in terms of graph theory) which takes all the results, and process them to some final object which is written to some DB.
It is possible for a worker to be dependent on the result of some other workers (hence, waiting for their results)
Now, I'm facing several problems:
It could be that one worker is much slower than another. How do you deal with that? Adding more workers (= scaling) of the slower type? (maybe dynamically)
Suppose W_B is dependent on W_A. If W_B is down for some reason then the flow will stop and the system will stop working. So I'd like the system to bypass this worker, somehow.
Moreover, how do the final worker decide when to operate on the set of results? Suppose it has the results of A and B but lacking the result of C. It may be that C is down or it's just very slow at the moment. How can it make a decision?
It is worth mentioning that it's not a realtime application but rather an offline processing system (i.e. you may access the DB and alter a record), but at the same time, it has to deal with relatively large amount of objects in an "high pace".
Regarding technologies,
I'm developing the system with Java but I'm not bounded to a specific technology.
I'd be glad if you could help me with the general design of the system.
Thanks a lot!
As Peter said, it really depends on the use case. Some general remarks though:
If a worker is slower than the other, maybe create more instances of that type; eg Kubernetes allows dynamic Node creation, and Kafka allows to partition a topic so more than one instance can read off and process it.
If B depends on A and A is down, B can't work and that's it. Maybe restart A? Maybe you can do a regular health check on it.
If the final worker needs the results of A, B and C, how would it process without C being available? If it can, it can store the results of A and B, install a timer, and if that goes off without C having arrived, continue.
Some additional thoughts:
If you mean to say that some subtasks of the overall application are quicker to execute than others, then it can be a good idea to slice up the application so that each worker is doing a bit of everything -- in other words, a share of the quick work and a share of the slow work. But if you mean to say that some machines are slower than others, then you could run fewer workers on the slow machines, and more on the faster ones, so as to balance things so that each worker has roughly the same resources.
You might want to decouple your architecture with some sort of durable queueing between the workers.
It's common to use heartbeats with timeouts and restarts.
Distributed stream processing quickly becomes very complex. Your life will be much easier if you build on top a stream processing framework that provides high availability and exactly-once semantics out of the box.
I am a beginner in Multi threading and have this one doubt:
Is there any other alternative for traditional Synchronisation(which uses synchronised keywords) in java,since it affects the performance of the application?
As others have indicated, it depends on what you're trying to avoid, as well as what you're trying to achieve with multithreading.
If you mean "is there a zero-overhead way to do multithreading with shared resources," the answer is almost certainly "no." If two cars going in different directions approach an intersection at the same time, one of them will have to wait for the other one - there's no way that the cars can occupy the same space at the same time. That's why we have stop signs and traffic lights. (Alternatively, there are things like traffic circles, but even those have some overhead - you really can't just go through them at full speed as if they weren't there).
There are lots of ways of doing asynchronous and parallel operations other than using that type of synchronization:
Non-blocking I/O. The argument here is that, when you're interacting with a server or slow I/O device or something, most of the time is spent waiting for a response from the device or server, so you really don't need multiple threads to handle that - you just need to allow the original thread to do other work while it's waiting for a response. My usual analogy here is: suppose you go out to eat with a group of 10 people. When the waiter comes to take orders, the first person he asks to order isn't ready yet. The sensible thing to do, of course, is for the waiter to take other people's orders first, and then to come back to the first guy. There's no need to bring in separate waiters for each person's orders, bring in another waiter to wait for the first guy, or anything like that.
Promise/futures based async
Event-driven async
Using immutable data structures to minimize the amount of shared resources.
There are, of course, a lot of types of locking and synchronization mechanisms available other than just the synchronized keywords, such as counting semaphores, reader-writer locks, etc.
There are a lot of other types of concurrency as well, such as the actor model.
When used properly, these can help minimize your overhead and possibly reduce the amount of explicit locking and synchronization required. They all have overhead, though.
TL;DR You have overhead no matter what you do - just select the design and primitives that result in the smallest overhead for your particular use case.
You can look for ReentrantLock and ReentrantReadWriteLock.
I am wondering how I would go about doing this. Say I load a list of 1,000 words and for each word a thread is created and say it does a google search on each word. The problem here is obvious. I can't have 1k threads, can I. Keep in mind I am extremely new to threads and synchronization. So basically I am wondering how I would go about using less threads. I assume I have to set thread amount to a fixed number and synchronize the threads. Was wondering how to do this with Apache HttpClient using GetThread and then run it. In run I'm getting the data from webpage and turning it into a String and then checking if it contains a certain word.
Surely you can have as many threads as you want. But in general it is not recommended to use more threads than there are processing cores on your computer.
And don't forget that creating 1000 internet sessions at once affects your networking. A size of one single google page is nearly 0.3 megabytes. Are you really going to download 300 megabytes of data at once?
By the way,
There is a funny thing about concurrency.
Some people say: "synchronization is like concurrency". It is not true.
Synchronization is the opposite of concurrency.
Concurrency is when lots of things happen in parallel.
Synchronization is when I am blocking you.
(Joshua Bloch)
Maybe you can look at this problem this way.
You have 1000 words and for each word you are going to carry out a search.
In other words there are 1000 tasks to be executed and they are not related
to each other, so there is no need for synchronization in the case of this
problem as per the following definition from Wiki.
"In computer science, synchronization refers to one of two distinct but related concepts: synchronization of processes, and synchronization of data. Process synchronization refers to the idea that multiple processes are to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of action. Data Synchronization refers to the idea of keeping multiple copies of a dataset in coherence with one another, or to maintain data integrity"
So in this problem you do not have to synchronize the 1000 processes which
execute the word searches since they can run independently and dont need
to join forces. So it is not a Process synchronization.
It is not a Data synchronization either since the data of each search is
independent of the other 999 searches.
Hence when Joshua says Synchronization is when I am blocking you, there is no need of blocking in this case.
Yes all tasks can concurrently get executed in different threads.
Of course your system may not have the resources to run 1000 threads
concurrently ( read same time ).
So you need concepts like pools where a pool has a certain no of
threads...say if it has 10 threads...then those 10 will start
10 independent searches on 10 words from your list.
If any of them is done with its task then it will take up the next
word search task available and the process goes on....
I read app engine wiki, on datastore contention if too frequent write
more than 5 times in 1 seconds. The wiki introduced use "shard"
approach" as workaround. May i know if we use spring #transactional
on this, this can prevent datastore contention timeout right since
writing in done concurrently ?
No, you can't do that. Whether or not you use #transactional, it will not make the problem go away - the fact that you have one object that you need to keep on writing to. The contention limit will continue to remain whatever approach you use.
The answer to this problem is actually deciding what it is you want to do, and how important accuracy is to you. Take the case of a simple counter, which is a common example of this problem. If you think accuracy is very important, then you will have to have a list of counters that you choose either sequentially, or at random, and write into. If you have ten counters in this list, then that gives you then times more writes per second, even transactional writes. You need to write code to choose which counters you want to write to, though.
On the other had, if you don't require too much precision, you could try writing to memcache very often. The write contention limits are much higher when writing to memcache or incrementing a counter there. You can then write out and reset the counter at a set interval.
When I was on a project that needed to store a lot of individual records to DB I found the system could not handle all the concurrent transactions. I instead build the object in memory and then all at once saved it to the DB.