I have a class HostServer which contains my instances of an other kind of servers (actually Minecraft servers) and this host class may be created more than a 100 times (depending on the user). So I wanted to know, what would be a good way to handle concurrency for a considerable amount of instances (for the host class)? I have thought that I could use locks (ReentrantReadWriteLock), but it may be quite heavy if there is a lot of instances.
Thank you for your answers
EDIT (answers to the comments):
I actually need to share the resources because there is an updater thread and the others just read the data.
My needs are to have most recent data, so I need to handle the fact that if a thread reads and an other ones writes, the write has to come first. Although, I don't want the code to be too heavy because I may have a lot of instances (for the host class, from 10 to a 1000 and for the Minecraft server class, from 20 to 10000).
The actual code: https://github.com/devcreart/GameStack/blob/develop/server/src/main/java/fr/creart/gamestack/server/server/HostServer.java
Thanks again
Premature optimization is a root of evil.
If you don't know exactly that you have a performance problem just do the simple solution.
If you have some requirements or some assumptions then try to create load test and do a conclusion after it.
But if you have a performance problem right now try to move out a shared state to some non-blocking data structure.
Related
I have been looking at an issue where a main workflow table with a relevant dao (which is hit lots by multiple threads) is struggling to keep up with requests in a fair way - almost like resource starvation.
The threads all are responsible for pulling data (around 5 of them) from various external systems.
The issue here is that when a thread gets so much information at once - it hammers requests to the table which leaves the others competing for access / resource. As such, typically they time out and need to be restarted.
Are there any mechanisms or strategies to manage this kind of thing. I was thinking off the top of my head (this is my first initial thought) to create some form of blocking list which all the threads can add too (on a first come first served basis maybe) and then filter through the SimpleJdbcOperations that way.
I would be open to any theories for solving such a problem that are considered standard for this kind of problem.
Thanks
EDIT: This question might be appropriate for other languages as well - the overall theory behind it seems mostly language agnostic. However, as this will run in a JVM, I'm sure there's differences between JVM overheads/threading and those of other environments.
EDIT 2: To clarify a little better, I guess the main question is which is better for scalability: to have smaller threads that can return quicker to enable processing other chunks of work for other workloads, or try to get a single workload through as quickly as possible? The workloads are sequential and multithreading won't help speed up a single unit of work in this case; it's more in hopes of increasing the throughput of the system overall (thanks to Uri for leading me towards the clarification).
I'm working on a system that's replacing an existing system; the current system has a pretty heavy load, so we already know the replacement needs to be highly scalable. It communicates with several outside processes, such as email, other services, databases, etc., and I'm already planning on making it multithreaded to help with scaling. I've worked on multithreaded apps before, just nothing with this high of a performance/scalability requirement, so I don't have much experience when it comes to getting the absolute most out of concurrency.
The question I have is what's the best way to divide the work up between threads? I'm looking at two different versions, one that creates a single thread for the full workflow, and another that creates a thread for each of the individual steps, continuing on to the next step (in a new/different thread) when the previous step completes - probably with a NodeJS-style callback system, but not terribly concerned about the direct implementation details.
I don't know much about the nitty-gritty details of multithreading - things like context switching, for example - so I don't know if the overhead of multiple threads would swamp the execution time in each of the threads. On one hand, the single thread model seems like it would be fastest for an individual work flow compared to the multiple threads; however, it would also tie up a single thread for the entire workflow, whereas the multiple threads would be shorter lived and would return to the pool quicker (I imagine, at least).
Hopefully the underlying concept is easy enough to understand; here's a contrived pseudo-code example though:
// Single-thread approach
foo();
bar();
baz();
Or:
// Multiple Thread approach
Thread.run(foo);
when foo.isDone()
Thread.run(bar);
when bar.isDone()
Thread.run(baz);
UPDATE: Completely forgot. The reason I'm considering the multithreaded approach is the (possibly mistaken) belief that, since the threads will have smaller execution times, they'll be available for other instances of the overall workload. If each operation takes, say 5 seconds, then the single-thread version locks up a thread for 15 seconds; the multiple thread version would lock up a single thread for 5 seconds, and then it can be used for another process.
Any ideas? If there's anything similar out there in the interwebs, I'd love even a link - I couldn't think of how to search for this (I blame Monday for that, but it would probably be the same tomorrow).
Multithreading is not a silver bullet. It's means to an end.
Before making any changes, you need to ask yourself where your bottlenecks are, and what you're really trying to parallelize. I'm not sure that without more information that we can give good advice here.
If foo, bar, and baz are part of a pipeline, you're not necessarily going to improve the overall latency of a single sequence by using multiple threads.
What you might be able to do is to increase your throughput by letting multiple executions of the pipeline over different input pieces work in parallel, by letting later items to travel through the pipeline while earlier items are blocked on something (e.g., I/O). For instance, if bar() for a particular input is blocked and waiting on a notification, it's possible that you could do computationally heavy operations on another input, or have CPU resources to devote to foo(). A particularly important question is whether any of the external dependencies act as a limited shared resource. e.g., if one thread is accessing system X, is another thread going to be affected?
Threads are also very effective if you want to divide and conquer your problem - splitting your input into smaller parts, running each part through the pipeline, and then waiting on all the pieces to be ready. Is that possible with the kind of workflow you're looking at?
If you need to first do foo, then do bar, and then do baz, you should have one thread do each of these steps in sequence. This is simple and makes obvious sense.
The most common case where you're better off with the assembly line approach is when keeping the code in cache is more important than keeping the data in cache. In this case, having one thread that does foo over and over can keep the code for this step in cache, keep branch prediction information around, and so on. However, you will have data cache misses when you hand the results of foo to the thread that does bar.
This is more complex and should only be attempted if you have good reason to think it will work better.
Use a single thread for the full workflow.
Dividing up the workflow can't improve the completion time for one piece of work: since the parts of the workflow have to be done sequentially anyway, only one thread can work on the piece of work at a time. However, breaking up the stages can delay the completion time for one piece of work, because a processor which could have picked up the last part of one piece of work might instead pick up the first part of another piece of work.
Breaking up the stages into multiple threads is also unlikely to improve the time to completion of all your work, relative to executing all the stages in one thread, since ultimately you still have to execute all the stages for all the pieces of work.
Here's an example. If you have 200 of these pieces of work, each requiring three 5 second stages, and say a thread pool of two threads running on two processors, keeping the entire workflow in a single thread results in your first two results after 15 seconds. It will take 1500 seconds to get all your results, but you only need the working memory for two of the pieces of work at a time. If you break up the stages, then it may take a lot longer than 15 seconds to get your first results, and you potentially may need memory for all 200 pieces of work proceeding in parallel if you still want to get all the results in 1500 seconds.
In most cases, there are no efficiency advantages to breaking up sequential stages into different threads, and there may be substantial disadvantages. Threads are generally only useful when you can use them to do work in parallel, which does not seem to be the case for your work stages.
However, there is a huge disadvantage to breaking up the stages into separate threads. That disadvantage is that you now need to write multithreaded code that manages the stages. It's extremely easy to write bugs in such code, and such bugs can be very difficult to catch prior to production deployment.
The way to avoid such bugs is to keep the threading code as simple as possible given your requirements. In the case of your work stages, the simplest possible threading code is none at all.
I know that thread safety of java sockets has been discussed in several threads here on stackoverflow, but I haven't been able to find a clear answer to this question - Is it, in practice, safe to have multiple threads concurrently write to the same SocketOutputStream, or is there a risk that the data sent from one thread gets mixed up with the data from another tread? (For example the receiver on the other end first receives the first half of one thread's message and then some data from another thread's message and then the rest of the first thread's message)
The reason I said "in practice" is that I know the Socket class isn't documented as thread-safe, but if it actually is safe in current implementations, then that's good enough for me. The specific implementation I'm most curious about is Hotspot running on Linux.
When looking at the Java layer of hotspot's implementation, more specifically the implementation of socketWrite() in SocketOutputStream, it looks like it should be thread safe as long as the native implementation of socketWrite0() is safe. However, when looking at the implemention of that method (j2se/src/solaris/native/java/net/SocketOutputStream.c), it seems to split the data to be sent into chunks of 64 or 128kb (depending on whether it's a 64bit JVM) and then sends the chunks in seperate writes.
So - to me, it looks like sending more than 64kb from different threads is not safe, but if it's less than 64kb it should be safe... but I could very well be missing something important here. Has anyone else here looked at this and come to a different conclusion?
I think it's a really bad idea to so heavily depend on the implementation details of something that can change beyond your control. If you do something like this you will have to very carefully control the versions of everything you use to make sure it's what you expect, and that's very difficult to do. And you will also have to have a very robust test suite to verify that the multithreaded operatio functions correctly since you are depending on code inspection and rumors from randoms on StackOverflow for your solution.
Why can't you just wrap the SocketOutputStream into another passthrough OutputStream and then add the necessary synchronization at that level? It's much safer to do it that way and you are far less likely to have unexpected problems down the road.
According to this documentation http://www.docjar.com/docs/api/java/net/SocketOutputStream.html, the class does not claim to be thread safe, and thus assume it is not. It inherits from FileOutputStream, which normally file I/O is not inherently thread safe.
My advice is that if the class is related to hardware or communications, it is not thread safe or "blocking". The reason is thread safe operations consume more time, which you may not like. My background is not in Java but other libraries are similar in philosophy.
I notice you tested the class extensively, but you may test it all day for many days, and it may not prove anything, my 2-cents.
Good luck & have fun with it.
Tommy Kwee
I am developing a text-based game, MUD. I have the base functions of the program ready, and now I would like to allow to connect more than one client at a time. I plan to use threads to accomplish that.
In my game I need to store information such as current position or health points for each player. I could hold it in the database, but as it will change very quick, sometimes every second, the use of database would be inefficient (am I right?).
My question is: can threads behave as "sessions", ie hold some data unique to each user?
If yes, could you direct me to some resources that I could use to help me understand how it works?
If no, what do you suggest? Is database a good option or would you recommend something else?
Cheers,
Eleeist
Yes, they can, but this is a mind-bogglingly stupid way to do things. For one thing, it permanently locks you into a "one thread per client" model. For another thing, it makes it difficult (maybe even impossible) to implement interactions between users, which I'm sure your MUD has.
Instead, have a collection of some kind that stores your users, with data on each user. Save persistent data to the database, but you don't need to update ephemeral data on every change.
One way to handle this is to have a "changed" boolean in each user. When you make a critical change to a user, write them to the database immediately. But if it's a routine, non-critical change, just set the "changed" flag. Then have a thread come along every once in a while and write out changed users to the database (and clear the "changed" flag).
Use appropriate synchronization, of course!
A Thread per connection / user session won't scale. You can only have N number of threads active where N is equal to the number of physical cores / processors your machine has. You are also limited by the amount of memory in your machine for how many threads you can create a time, some operating systems just put arbitrary limits as well.
There is nothing magical about Threads in handling multiple clients. They will just make your code more complicated and less deterministic and thus harder to reason about what is actually happening when you start hunting logic errors.
A Thread per connection / user session would be an anti-pattern!
Threads should be stateless workers that pull things off concurrent queues and process the data.
Look at concurrent maps for caching ( or use some appropriate caching solution ) and process them and then do something else. See java.util.concurrent for all the primitive classes you need to implement something correctly.
Instead of worrying about threads and thread-safety, I'd use an in-memory SQL database like HSQLDB to store session information. Among other benefits, if your MUD turns out to be the next Angry Birds, you could more easily scale the thing up.
Definitely you can use threads as sessions. But it's a bit off the mark.
The main point of threads is the ability of concurrent, asynchronous execution. Most probably, you don't want events received from your MUD clients to happen in an parallel, uncontrolled order.
To ensure consistency of the world I'd use an in-memory database to store the game world. I'd serialize updates to it, or at least some updates to it. Imagine two players in parallel hitting a monster with HP 100. Each deals 100 damage. If you don't serialize the updates, you could end up giving credit for 100 damage to both players. Imagine two players simultaneously taking loot from the monster. Without proper serialization they could end up each with their own copy of the loot.
Threads, on the other hand, are good for asynchronous communication with clients. Use threads for that, unless something else (like a web server) does that for you already.
ThreadLocal is your friend! :)
http://docs.oracle.com/javase/6/docs/api/java/lang/ThreadLocal.html
ThreadLocal provides storage on the Thread itself. So the exact same call from 2 different threads will return/store different data.
The biggest danger is having a leak between Threads. You would have to be absolutely sure that if a different user used a Thread that someone else used, you would reset/clear the data.
Our company has a Batch Application which runs every day, It does some database related jobs mostly, import data into database table from file for example.
There are 20+ tasks defined in that application, each one may depends on other ones or not.
The application execute tasks one by one, the whole application runs in a single thread.
It takes 3~7 hours to finish all the tasks. I think it's too long, so I think maybe I can improve performance by multi-threading.
I think as there is dependency between tasks, it not good (or it's not easy) to make tasks run in parallel, but maybe I can use multi-threading to improve performance inside a task.
for example : we have a task defined as "ImportBizData", which copy data into a database table from a data file(usually contains 100,0000+ rows). I wonder is that worth to use multi-threading?
As I know a little about multi-threading, I hope some one provide some tutorial links on this topic.
Multi-threading will improve your performance but there are a couple of things you need to know:
Each thread needs its own JDBC connection. Connections can't be shared between threads because each connection is also a transaction.
Upload the data in chunks and commit once in a while to avoid accumulating huge rollback/undo tables.
Cut tasks into several work units where each unit does one job.
To elaborate the last point: Currently, you have a task that reads a file, parses it, opens a JDBC connection, does some calculations, sends the data to the database, etc.
What you should do:
One (!) thread to read the file and create "jobs" out of it. Each job should contains a small, but not too small "unit of work". Push those into a queue
The next thread(s) wait(s) for jobs in the queue and do the calculations. This can happen while the threads in step #1 wait for the slow hard disk to return the new lines of data. The result of this conversion step goes into the next queue
One or more threads to upload the data via JDBC.
The first and the last threads are pretty slow because they are I/O bound (hard disks are slow and network connections are even worse). Plus inserting data in a database is a very complex task (allocating space, updating indexes, checking foreign keys)
Using different worker threads gives you lots of advantages:
It's easy to test each thread separately. Since they don't share data, you need no synchronization. The queues will do that for you
You can quickly change the number of threads for each step to tweak performance
Multi threading may be of help, if the lines are uncorrelated, you may start off two processes one reading even lines, another uneven lines, and get your db connection from a connection pool (dbcp) and analyze performance. But first I would investigate whether jdbc is the best approach normally databases have optimized solutions for imports like this. These solutions may also temporarily switch of constraint checking of your table, and turn that back on later, which is also great for performance. As always depending on your requirements.
Also you may want to checkout springbatch which is designed for batch processing.
As far as I know,the JDBC Bridge uses synchronized methods to serialize all calls to ODBC so using mutliple threads won't give you any performance boost unless it boosts your application itself.
I am not all that familiar with JDBC but regarding the multithreading bit of your question, what you should keep in mind is that parallel processing relies on effectively dividing your problem into bits that are independent of one another and in some way putting them back together (their output that is). If you dont know the underlying dependencies between tasks you might end up having really odd errors/exceptions in your code. Even worse, it might all execute without any problems, but the results might be off from true values. Multi-threading is tricky business, in a way fun to learn (at least I think so) but pain in the neck when things go south.
Here are a couple of links that might provide useful:
Oracle's java trail: best place to start
A good tutorial for java concurrency
an interesting article on concurrency
If you are serious about putting effort to getting into multi-threading I can recommend GOETZ, BRIAN: JAVA CONCURRENCY, amazing book really..
Good luck
I had a similar task. But in my case, all the tables were unrelated to each other.
STEP1:
Using SQL Loader(Oracle) for uploading data into database(very fast) OR any similar bulk update tools for your database.
STEP2:
Running each uploading process in a different thread(for unrelated tasks) and in a single thread for related tasks.
P.S. You could identify different inter-related jobs in your application and categorize them in groups; and running each group in different threads.
Links to run you up:
JAVA Threading
follow the last example in the above link(Example: Partitioning a large task with multiple threads)
SQL Loader can dramatically improve performance
The fastest way I've found to insert large numbers of records into Oracle is with array operations. See the "setExecuteBatch" method, which is specific to OraclePreparedStatement. It's described in one of the examples here:
http://betteratoracle.com/posts/25-array-batch-inserts-with-jdbc
If Multi threading would complicate your work, you could go with Async messaging. I'm not fully aware of what your needs are, so, the following is from what I am seeing currently.
Create a file reader java whose purpose is to read the biz file and put messages into the JMS queue on the server. This could be plain Java with static void main()
Consume the JMS messages in the Message driven beans(You can set the limit on the number of beans to be created in the pool, 50 or 100 depending on the need) if you have mutliple servers, well and good, your job is now split into multiple servers.
Each row of data is asynchronously split between 2 servers and 50 beans on each server.
You do not have to deal with threads in the whole process, JMS is ideal because your data is within a transaction, if something fails before you send an ack to the server, the message will be resent to the consumer, the load will be split between the servers without you doing anything special like multi threading.
Also, spring is providing spring-batch which can help you. http://docs.spring.io/spring-batch/reference/html/spring-batch-intro.html#springBatchUsageScenarios