Assign a few connections from a connection pool to each specific task

I have a connection pool with 50 connections.
I want to dedicate 10 of these to task A and 20 to task B.
Is this a good practice? Is this possible in Java when creating connection pools or thread pools?

Sharing resources between different participants is a good practice.
It is also a well-established practice not to do that blindly, but in the context of priorities, goals and policies. Very often that happens at a higher level (think of load balancing), but of course you could also build it into your application directly.
But to my knowledge, there is no simple mechanism in the (standard) Java libraries to do that.
Long story short: if you want something like this, you might have to step back and implement your own solution. In other words, you create your own connection pool that knows about the "different" tasks and lets you assign priorities to them; the pool then decides, based on those policies, who will be served next.
On the downside, implementing something like this can get complicated very quickly. Thus my first advice: go for two independent pools, experiment, and see how things work out for you. Only when you find that this solution is too inefficient should you start building your own "load balancing".
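A minimal sketch of the "two independent pools" advice, assuming a simple fixed-size pool is enough. FixedPool, borrow and giveBack are hypothetical names, and the generic T stands in for java.sql.Connection so the example runs without a database; in practice you would more likely configure two instances of an existing pool library.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical fixed-size pool: every resource is created up front and
// handed out from a blocking queue. T stands in for java.sql.Connection.
final class FixedPool<T> {
    private final BlockingQueue<T> idle;

    FixedPool(int size, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Waits up to the timeout for a free resource; returns null if none became free.
    T borrow(long timeout, TimeUnit unit) {
        try {
            return idle.poll(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    void giveBack(T resource) {
        if (resource != null) {
            idle.offer(resource);
        }
    }

    int available() {
        return idle.size();
    }
}
```

Task A would then only ever borrow from a pool of 10 (new FixedPool<>(10, ...)) and task B from a pool of 20, so one task can exhaust its own pool without starving the other. The price is that idle connections in one pool cannot help the other task, which is exactly the inefficiency worth measuring before building anything smarter.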

Related

Lightweight way to handle concurrency for a lot of instances

I have a class HostServer which contains my instances of another kind of server (actually Minecraft servers), and this host class may be instantiated more than 100 times (depending on the user). So I wanted to know: what would be a good way to handle concurrency for a considerable number of instances (of the host class)? I have thought about using locks (ReentrantReadWriteLock), but that may be quite heavy if there are a lot of instances.
Thank you for your answers
EDIT (answers to the comments):
I actually need to share the resources because there is an updater thread and the others just read the data.
My need is to have the most recent data, so I have to handle the case where one thread reads while another writes: the write has to come first. However, I don't want the code to be too heavy, because I may have a lot of instances (from 10 to 1000 for the host class, and from 20 to 10000 for the Minecraft server class).
The actual code: https://github.com/devcreart/GameStack/blob/develop/server/src/main/java/fr/creart/gamestack/server/server/HostServer.java
Thanks again

Premature optimization is the root of all evil.
If you don't know for sure that you have a performance problem, just use the simple solution.
If you have specific requirements or assumptions, create a load test and draw your conclusions from it.
But if you have a performance problem right now, try moving the shared state into a non-blocking data structure.
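One way to apply the non-blocking suggestion to the updater-thread scenario in the question is a copy-on-write snapshot: the updater publishes a new immutable map through an AtomicReference, and readers simply read the current reference without locking. This is a sketch under assumptions; ServerStatusBoard and the name-to-player-count map are hypothetical, not taken from the linked HostServer code.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical shared state: server name -> player count.
// One updater thread writes, any number of threads read, and nobody locks.
final class ServerStatusBoard {
    private final AtomicReference<Map<String, Integer>> snapshot =
            new AtomicReference<>(Collections.<String, Integer>emptyMap());

    // Updater: copy the current map, change the copy, publish it atomically.
    // updateAndGet retries on contention, so this stays correct even with several writers.
    void update(String server, int players) {
        snapshot.updateAndGet(current -> {
            Map<String, Integer> next = new HashMap<>(current);
            next.put(server, players);
            return Collections.unmodifiableMap(next);
        });
    }

    // Readers: never block and always see the most recently published map.
    Map<String, Integer> read() {
        return snapshot.get();
    }
}
```

Each update copies the map, so this fits when reads vastly outnumber writes; with frequent writes, a ConcurrentHashMap is usually the simpler non-blocking choice.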

Java connection thread pool AND connectionfactory?

I think somebody is confusing their patterns. I've got one guy telling me to use thread pooling and another telling me to use a ConnectionFactory (granted the second guy is not a software engineer, but a very knowledgeable system architect). I'm going to use thread pooling, so we can keep the number of connections to a reasonable number of threads. I've looked all over the internet and I cannot see anywhere where anyone is using both together. I'm thinking about dumping the ConnectionFactory, because it seems redundant at the very least and I just cannot see why or how to use both.
Just curious to see if somebody more knowledgeable than me has ever seen the two used together and can enlighten me as to why.
Also, each connection has to have its own instance of several other classes and we are using a pub-sub architecture. I need to make sure that the subscribers are NOT getting a published message that belongs to another connection. Can I manage that with a ConnectionFactory or do I absolutely need to use a new thread to ensure separation between connection processes?
Just looking for some direction here.
Thank you.

In general, the Factory pattern is about how an object is created. So the ConnectionFactory pattern abstracts the way a Connection is created.
A thread pool abstracts the way threads are managed, i.e. mainly when they are started, how many threads are runnable, how they are scheduled and how they are stopped, not how they are created!
You can use both of these patterns together. Your pool can use a factory to properly create thread or connection objects.
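The JDK itself combines both patterns: Executors.newFixedThreadPool(int, ThreadFactory) takes a factory that decides how each thread is created, while the pool decides when and how many threads run. A small sketch; NamedThreadFactory, PoolWithFactory and the "conn-worker-" prefix are illustrative names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// The factory decides HOW each thread is created (name, daemon flag).
final class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger(1);

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread t = new Thread(task, prefix + counter.getAndIncrement());
        t.setDaemon(true); // illustrative choice: don't keep the JVM alive
        return t;
    }
}

// The pool decides WHEN threads run and HOW MANY exist; it asks the factory to create them.
final class PoolWithFactory {
    static ExecutorService create(int threads) {
        return Executors.newFixedThreadPool(threads, new NamedThreadFactory("conn-worker-"));
    }
}
```

A connection pool can be built the same way: the pool handles borrowing, limits and reuse, and delegates the actual creation of each Connection to a factory object.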

Is it bad practise to utilize many threads? (through SwingWorkers)

My Java (Swing) application creates a new SwingWorker object when it needs to (e.g) download data from the Internet and do something at the same time (think display a loader). However, monitoring the threads created, this can quickly reach ~100 threads.
Is this bad practice? If so, what's the proper way to do it? Doesn't the GC automatically clean up unused threads?

Yes it is a bad practice when you put no upper bound on the number of threads (or generally resources).
In this case you better use a thread pool which contains at most a specific number of threads (say for example 25). You can either create them all at startup, or create them lazily on demand.
Implement a simple request manager for the pool, which hands resources to requesters (or, when resources run out, queues the requests or simply denies them).
In this way, cleaning them in the end will also be easy and obvious.
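A sketch of such a bounded pool using ThreadPoolExecutor, assuming the upper bound of 25 threads from the answer and an illustrative queue capacity. Threads are created lazily as tasks arrive, extra tasks wait in the queue, and once the queue is full new requests are denied with RejectedExecutionException (AbortPolicy). BoundedWorkers is a hypothetical name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedWorkers {
    // At most maxThreads threads, started lazily as tasks arrive; at most
    // queueCapacity waiting tasks. Beyond that, AbortPolicy denies the request
    // by throwing RejectedExecutionException.
    static ThreadPoolExecutor create(int maxThreads, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,
                30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
        // Let idle threads time out so the pool shrinks when the app is quiet.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }
}
```

Since SwingWorker implements RunnableFuture, a worker can be handed to pool.execute(worker) instead of calling worker.execute(), so all background downloads share the same bounded pool.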

Tutorial about Using multi-threading in jdbc

Our company has a batch application which runs every day. It mostly does database-related jobs, for example importing data from a file into a database table.
There are 20+ tasks defined in that application; some of them depend on others and some don't.
The application executes the tasks one by one; the whole application runs in a single thread.
It takes 3~7 hours to finish all the tasks. I think that's too long, so maybe I can improve performance with multi-threading.
Since there are dependencies between tasks, I think it's not a good idea (or not easy) to run tasks in parallel, but maybe I can use multi-threading to improve performance inside a task.
For example, we have a task called "ImportBizData", which copies data into a database table from a data file (usually containing 100,0000+ rows). Is it worth using multi-threading there?
Since I know only a little about multi-threading, I hope someone can provide some tutorial links on this topic.

Multi-threading will improve your performance but there are a couple of things you need to know:
Each thread needs its own JDBC connection. Connections can't be shared between threads because each connection is also a transaction.
Upload the data in chunks and commit once in a while to avoid accumulating huge rollback/undo tables.
Cut tasks into several work units where each unit does one job.
To elaborate the last point: Currently, you have a task that reads a file, parses it, opens a JDBC connection, does some calculations, sends the data to the database, etc.
What you should do:
One (!) thread to read the file and create "jobs" out of it. Each job should contain a small, but not too small, "unit of work". Push those into a queue.
The next thread(s) wait for jobs in the queue and do the calculations. This can happen while the thread from step #1 waits for the slow hard disk to return new lines of data. The result of this conversion step goes into the next queue.
One or more threads to upload the data via JDBC.
The first and the last threads are pretty slow because they are I/O bound (hard disks are slow and network connections are even worse). Plus, inserting data into a database is a very complex task (allocating space, updating indexes, checking foreign keys).
Using different worker threads gives you lots of advantages:
It's easy to test each thread separately. Since they don't share data, you need no synchronization; the queues will do that for you.
You can quickly change the number of threads for each step to tweak performance.
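The three-step design above can be sketched with BlockingQueues and a "poison pill" end marker. This is a minimal illustration under assumptions: the input file is replaced by a list of lines, the "calculation" is a placeholder trim/uppercase, and the JDBC upload is replaced by adding rows to a list. ImportPipeline and the end marker are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class ImportPipeline {
    // Hypothetical end-of-input marker ("poison pill") passed through the queues.
    private static final String DONE = "\u0000DONE";

    static List<String> run(List<String> fileLines) {
        // Bounded queues keep a fast stage from racing too far ahead of a slow one.
        BlockingQueue<String> rawLines = new LinkedBlockingQueue<>(1000);
        BlockingQueue<String> converted = new LinkedBlockingQueue<>(1000);
        List<String> uploaded = new ArrayList<>(); // stands in for the JDBC insert

        Thread reader = new Thread(() -> {
            try {
                for (String line : fileLines) {
                    rawLines.put(line); // real code: BufferedReader.readLine() loop
                }
                rawLines.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread converter = new Thread(() -> {
            try {
                for (String line = rawLines.take(); !line.equals(DONE); line = rawLines.take()) {
                    converted.put(line.trim().toUpperCase()); // placeholder calculation
                }
                converted.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread uploader = new Thread(() -> {
            try {
                for (String row = converted.take(); !row.equals(DONE); row = converted.take()) {
                    uploaded.add(row); // real code: addBatch(), then executeBatch()/commit() per chunk
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        converter.start();
        uploader.start();
        try {
            reader.join();
            converter.join();
            uploader.join(); // join() also makes the uploader's list safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pipeline", e);
        }
        return uploaded;
    }
}
```

With N converter threads, the reader would put N end markers, each converter would forward one, and the uploader would stop after seeing N of them.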

Multi-threading may help. If the lines are uncorrelated, you could start two processes, one reading the even lines and the other the odd lines, get your DB connections from a connection pool (DBCP), and analyze the performance. But first I would investigate whether JDBC is the best approach: databases normally have optimized solutions for imports like this. These solutions may also temporarily switch off constraint checking on your table and turn it back on later, which is also great for performance. As always, it depends on your requirements.
Also, you may want to check out Spring Batch, which is designed for batch processing.
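The even/odd split suggested above amounts to static partitioning: each worker takes every second line, and the partial results are combined at the end. A sketch assuming the lines are already in memory and independent of each other; ParityPartition and the non-empty-line count are placeholders for the real parse-and-insert work, and in real code each worker would take its own connection from the pool.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParityPartition {
    // Two workers: one takes even-indexed lines, the other odd-indexed lines.
    // Only valid when lines are independent of each other.
    static long countNonEmpty(List<String> lines) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Long> even = pool.submit(() -> countSlice(lines, 0));
            Future<Long> odd = pool.submit(() -> countSlice(lines, 1));
            return even.get() + odd.get(); // combine the partial results
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder for "parse line i and insert it"; here it just counts non-empty lines.
    private static long countSlice(List<String> lines, int parity) {
        long n = 0;
        for (int i = parity; i < lines.size(); i += 2) {
            if (!lines.get(i).isEmpty()) {
                n++;
            }
        }
        return n;
    }
}
```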

As far as I know, the JDBC-ODBC bridge uses synchronized methods to serialize all calls to ODBC, so using multiple threads won't give you any performance boost unless it speeds up the work your application does itself.

I am not all that familiar with JDBC, but regarding the multithreading part of your question, keep in mind that parallel processing relies on effectively dividing your problem into pieces that are independent of one another and then, in some way, putting them (that is, their output) back together. If you don't know the underlying dependencies between tasks, you might end up with really odd errors or exceptions in your code. Even worse, everything might execute without any problems, but the results might differ from the true values. Multi-threading is tricky business: fun to learn in a way (at least I think so), but a pain in the neck when things go south.
Here are a couple of links that might prove useful:
Oracle's java trail: best place to start
A good tutorial for java concurrency
an interesting article on concurrency
If you are serious about putting effort into learning multi-threading, I can recommend Brian Goetz's Java Concurrency in Practice. It's really an amazing book.
Good luck

I had a similar task. But in my case, all the tables were unrelated to each other.
STEP1:
Use SQL*Loader (Oracle) to upload data into the database (very fast), or any similar bulk-load tool for your database.
STEP2:
Run each upload process in a different thread for unrelated tasks, and in a single thread for related tasks.
P.S. You could identify the inter-related jobs in your application, categorize them into groups, and run each group in a different thread.
Links to get you started:
JAVA Threading
Follow the last example in the link above (Example: Partitioning a large task with multiple threads).
SQL*Loader can dramatically improve performance
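The grouping idea from this answer (related jobs run sequentially, unrelated groups run in parallel) can be sketched with a fixed thread pool that receives one task per group. GroupRunner is a hypothetical name, and the one-hour timeout is an arbitrary assumption.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class GroupRunner {
    // Tasks inside one group depend on each other, so they run in order on one
    // thread; independent groups run in parallel on separate threads.
    static void runGroups(List<List<Runnable>> groups) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, groups.size()));
        for (List<Runnable> group : groups) {
            pool.execute(() -> group.forEach(Runnable::run));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("groups did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

If one task in a group throws, the rest of that group is skipped and the exception only reaches the pool thread's uncaught-exception handler; a real batch job would want to record such failures explicitly.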

The fastest way I've found to insert large numbers of records into Oracle is with array operations. See the "setExecuteBatch" method, which is specific to OraclePreparedStatement. It's described in one of the examples here:
http://betteratoracle.com/posts/25-array-batch-inserts-with-jdbc
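setExecuteBatch is specific to Oracle's driver; the standard JDBC equivalent is addBatch()/executeBatch() on any PreparedStatement. A sketch that also commits every batchSize rows, as recommended earlier in this thread, so rollback data does not pile up. The table biz_data, its columns and the BatchInsert name are assumptions, and the actual speed-up depends on the driver.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsert {
    // Hypothetical table and columns; each row is {id, payload}.
    private static final String SQL = "INSERT INTO biz_data (id, payload) VALUES (?, ?)";

    // Sends rows in JDBC batches and commits after every batchSize rows.
    static int insert(Connection conn, List<String[]> rows, int batchSize) throws SQLException {
        conn.setAutoCommit(false);
        int pending = 0;
        int total = 0;
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch(); // one round trip for the whole chunk
                    conn.commit();
                    total += pending;
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the last, partial chunk
                ps.executeBatch();
                conn.commit();
                total += pending;
            }
        } catch (SQLException e) {
            conn.rollback(); // only the uncommitted chunk is lost
            throw e;
        }
        return total;
    }
}
```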

If multi-threading would complicate your work, you could go with asynchronous messaging. I'm not fully aware of your needs, so the following is based on what I can see so far.
Create a Java file reader whose purpose is to read the biz file and put messages into a JMS queue on the server. This could be plain Java with a static void main().
Consume the JMS messages in message-driven beans (you can limit the number of beans created in the pool to 50 or 100, depending on the need). If you have multiple servers, even better: your job is now split across them.
For example, with 2 servers and 50 beans on each, the rows of data are distributed asynchronously across all of those beans.
You do not have to deal with threads anywhere in the process. JMS is a good fit because your data is handled within a transaction: if something fails before you send an ack to the server, the message will be redelivered to the consumer, and the load is split between the servers without you doing anything special like multi-threading.
Spring also provides Spring Batch, which can help you. http://docs.spring.io/spring-batch/reference/html/spring-batch-intro.html#springBatchUsageScenarios

How many JDBC connections in Java?

I have a Java program consisting of about 15 methods, and these methods get invoked very frequently during the execution of the program. At the moment, I am creating a new connection in every method and invoking statements on it (the database is set up on another machine on the network).
What I would like to know is: should I create only one connection in the main method and pass it as an argument to all the methods that need a connection object? That would significantly reduce the number of connection objects in the program, instead of creating and closing connections very frequently in every method.
I suspect I am not using the resources very efficiently with the current design, and there is a lot of scope for improvement, considering that this program might grow a lot in the future.

Yes, you should consider re-using connections rather than creating a new one each time. The usual procedure is:
make a guess as to how many simultaneous connections your database can sensibly handle (e.g. start with 2 or 3 per CPU on the database machine until you find out that this is too few or too many; it will tend to depend on how disk-bound your queries are)
create a pool of this many connections: essentially a class that you can ask for "the next free connection" at the beginning of each method and then "pass back" to the pool at the end of each method
your getFreeConnection() method needs to return a free connection if one is available, else either (1) create a new one, up to the maximum number of connections you've decided to permit, or (2) if the maximum are already created, wait for one to become free
I'd recommend the Semaphore class to manage the connections; I actually have a short article on my web site on managing a resource pool with a Semaphore with an example I think you could adapt to your purpose
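A minimal sketch of a Semaphore-managed pool that follows the three steps above: reuse a free connection if there is one, otherwise create one lazily up to the maximum, and otherwise wait. This is not the code from the answerer's article; SemaphorePool, its method names and the Supplier-based factory are assumptions, and T stands in for java.sql.Connection so the example runs without a database.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Pool capped at maxSize: reuses an idle resource if there is one, creates a
// new one lazily otherwise, and blocks while maxSize resources are in use.
final class SemaphorePool<T> {
    private final Semaphore permits;
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory; // e.g. wraps DriverManager.getConnection(url) (assumption)

    SemaphorePool(int maxSize, Supplier<T> factory) {
        this.permits = new Semaphore(maxSize, true); // fair: waiting callers are served in order
        this.factory = factory;
    }

    T getFreeConnection() throws InterruptedException {
        permits.acquire(); // waits while all maxSize resources are checked out
        T resource = idle.poll();
        if (resource != null) {
            return resource; // reuse a free one
        }
        try {
            return factory.get(); // none free, but still under the maximum: create one
        } catch (RuntimeException e) {
            permits.release(); // creation failed: give the permit back
            throw e;
        }
    }

    void returnConnection(T resource) {
        idle.offer(resource); // make it visible to others before releasing the permit
        permits.release();
    }
}
```

Callers would pair the two calls with try/finally (get the connection, use it in the try block, call returnConnection in finally), so a failing query never leaks a permit.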
A couple of practical considerations:
For optimum performance, you need to be careful not to "hog" a connection while you're not actually using it to run a query. If you take a connection from the pool once and then pass it to various methods, you need to make sure you're not accidentally doing this.
Don't forget to return your connections to the pool! (try/finally is your friend here...)
On many systems, you can't keep connections open 'forever': the O/S will close them after some maximum time. So in your 'return a connection to the pool' method, you'll need to think about 'retiring' connections that have been around for a long time (build in some mechanism for remembering, e.g. by having a wrapper object around an actual JDBC Connection object that you can use to store metrics such as this)
You may want to consider using prepared statements.
Over time, you'll probably need to tweak the connection pool size
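The "retiring" idea from the list above can be sketched as a wrapper that records its creation time, plus a return step that closes over-age connections instead of putting them back. AgedResource, Retirement and the injectable clock are hypothetical; in real use the clock would be System::nanoTime and the closer would call Connection.close().

```java
import java.util.Queue;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical wrapper that remembers when the pooled resource was created.
final class AgedResource<T> {
    final T resource;
    private final LongSupplier clock; // System::nanoTime in real use; injectable for tests
    private final long createdAt;

    AgedResource(T resource, LongSupplier clock) {
        this.resource = resource;
        this.clock = clock;
        this.createdAt = clock.getAsLong();
    }

    boolean isExpired(long maxAge) {
        return clock.getAsLong() - createdAt > maxAge;
    }
}

final class Retirement {
    // Sketch of the pool's "return a connection" step: over-age resources are
    // closed (the pool creates a fresh one on demand), young ones go back to idle.
    static <T> void giveBack(Queue<AgedResource<T>> idle, AgedResource<T> item,
                             long maxAge, Consumer<T> closer) {
        if (item.isExpired(maxAge)) {
            closer.accept(item.resource);
        } else {
            idle.offer(item);
        }
    }
}
```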

You can either pass in the connection or better yet use something like Jakarta Database Connection Pooling.
http://commons.apache.org/dbcp/

You should use a connection pool for that.
That way you can ask for a connection and, when you are finished with it, release it back to the pool.
If another thread wants a connection and that one is in use, a new one could be created. If no other thread is using a connection, the same one could be reused.
This way you can leave your app mostly the way it is (without passing the connection all around) and still use the resources properly.
Unfortunately, first-class connection pools are not very easy to use in standalone applications (they are the default in application servers). Probably a microcontainer (such as Spring) or a good framework (such as Hibernate) could let you use one.
It is not too hard to code one from scratch, though.
:)
This Google search will help you find out more about how to use one.
Skim through

Many JDBC drivers do connection pooling for you, so there is little advantage in doing additional pooling in that case. I suggest you check the documentation for your JDBC driver.
Alternatives to a connection pool are to:
Have one connection for all database access with synchronised access. This doesn't allow concurrency but is very simple.
Store the connections in a ThreadLocal variable (override initialValue()) This works well if there is a small fixed number of threads.
Otherwise, I would suggest using a connection pool.
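The ThreadLocal option mentioned above can be sketched as follows, assuming a small, fixed set of long-lived threads. ThreadLocal.withInitial(...) is the lambda-friendly equivalent of overriding initialValue(); PerThreadConnection and the Supplier-based factory are hypothetical names.

```java
import java.util.function.Supplier;

// One connection per thread: each thread lazily gets its own instance the
// first time it asks, and keeps reusing it afterwards.
final class PerThreadConnection<T> {
    private final ThreadLocal<T> local;

    PerThreadConnection(Supplier<T> factory) {
        // Same effect as overriding initialValue() in an anonymous ThreadLocal subclass.
        this.local = ThreadLocal.withInitial(factory);
    }

    T get() {
        return local.get();
    }
}
```

Nothing here closes the per-thread connections, which is one reason this approach only suits a small number of long-lived threads.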

If your application is single-threaded, or does all its database operations from a single thread, it's ok to use a single connection. Assuming you don't need multiple connections for any other reason, this would be by far the simplest implementation.
Depending on your driver, it may also be feasible to share a connection between threads - this would be ok too, if you trust your driver not to lie about its thread-safety. See your driver documentation for more info.
Typically the objects below "Connection" cannot safely be used from multiple threads, so it's generally not advisable to share ResultSet, Statement, etc. objects between threads. By far the best policy is to use them in the same thread that created them; this is normally easy because those objects are not usually kept around for long.
