I have to create a MySQL database to be used by several applications in parallel for the first time. Until now, my only experience with MySQL databases has been single programs (for example, web servers) querying the database.
Now I am moving into a scenario where I will have several CXF Java servlet-type programs, as well as a background server, editing and reading the same schemas.
I am using the Connector/J JDBC driver to connect to the database in all instances.
My question is this: what do I need to do to make sure that the parallel access does not become a problem? I realize that I need to use transactions where appropriate, but where I am truly lost is in the management.
For example:
Do I need to close the connection every time a servlet is done with a job?
Do I need a unique user for each program accessing the database?
Do I have to do something with my Connector/J objects?
Do I have to declare my tables in a different way?
Did I miss anything, or is there something I failed to think about?
I have a pretty good idea of how to handle transactions and the SQL itself, but I am pretty lost when it comes to what I need to do when setting up my database.
You should maintain a pool of connections. Connections are really expensive to create, think on the order of several hundred milliseconds, so for high-volume apps it makes sense to cache and reuse them.
For your servlets it depends on what container you are using. Something like JBoss will provide pooling as part of the container; it can be defined through the datasource definition and accessed through JNDI. Other containers, like Tomcat, may rely on a library such as c3p0.
Most of these frameworks return custom implementations of JDBC connections whose close() method contains logic that returns the connection to the pool. You should familiarize yourself with the details of your concrete implementation to make sure you are doing things in a way that is supported.
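As a minimal sketch, assuming a container-managed pool exposed through JNDI (the name jdbc/MyAppDS is a placeholder for whatever your datasource definition uses):

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class Database {

    // Look the pool up via JNDI; the name is an assumption.
    private static DataSource lookupDataSource() throws NamingException {
        return (DataSource) new InitialContext().lookup("java:comp/env/jdbc/MyAppDS");
    }

    public void doWork() throws NamingException, SQLException {
        DataSource ds = lookupDataSource();
        try (Connection con = ds.getConnection()) {
            // ... run your statements here ...
        } // with a pooled DataSource, close() returns the connection to the pool
    }
}
```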
As for the concurrency considerations, you should familiarize yourself with the concepts of optimistic/pessimistic locking and transaction isolation levels. These involve trade-offs where the correct answer can only be determined given the operational context of your application.
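To make those concepts concrete, here is a hedged sketch: setting the isolation level on a JDBC connection and taking a pessimistic row lock with SELECT ... FOR UPDATE (supported by MySQL/InnoDB). The table and column names are illustrative assumptions:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class LockingExample {

    static void readForUpdate(Connection con, long accountId) throws SQLException {
        con.setAutoCommit(false);
        con.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
        try (PreparedStatement ps = con.prepareStatement(
                "SELECT balance FROM account WHERE id = ? FOR UPDATE")) {
            ps.setLong(1, accountId);
            try (ResultSet rs = ps.executeQuery()) {
                // the matched row stays locked (pessimistic) until commit/rollback;
                // other transactions requesting the same lock will block
            }
        }
        con.commit();
    }
}
```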
Regarding the user: most applications have one user that represents the application, called the read/write user. This user should only have privileges to read and write the records in the tables, indices, sequences, etc. that are associated with your application. All instances of the application will specify this user in their connection string.
If you familiarize yourself with the concepts above, you'll be about 95% of the way there.
One more thing: as pointed out in the comments, on the administration side your database engine is a huge consideration. You should familiarize yourself with the differences and the tuning/configuration options.
Referring to a similar question:
Pattern for connecting to different databases using JDBC
I am using different connection strings/drivers for each database. This is what I am doing; I'm not very sure it's the most efficient way to do it:
Create separate classes for each database's connection, each with a getConnection(String url, String userId, String password) method
In the main class, get connection objects for DB1, DB2, and DB3, and open the connections
Fetch data from DB1 and write it to a flat file; repeat for DB2 and DB3
Close all three connections.
NOTE: I have read about using Spring/Hibernate/DataSources/connection pooling but don't know what the best option would be.
The way I understand it, you want your application to run some (SELECT?) queries on different databases and dump the results. I presume this is part of a larger application, since otherwise you would probably get results quicker by simply writing a command-line script that automates the client tools for the specific databases.
Hibernate, data sources (in the sense of the Java DataSource object) and connection pooling won't solve your problem; I guess the same goes for Spring, but I don't know which part of Spring you're referring to. The reason is that they are all designed to abstract over a single connection (or a pool of connections) to a single database. Connection pooling simply lets you keep a pool of ready-to-use (TCP) connections to a given database in order to improve performance, for example by avoiding connection and authentication overhead. Hibernate does the same in the sense that it abstracts a connection to a single database (and can use connection pooling on top of that for performance reasons).
I would suggest taking a different approach to thinking about your problem:
Since you want to run some queries on some data source and write the results to some destination, why not start your design this way: come up with an interface/class DataExtractionTask that requires a database connection, a set of queries to run, and some output stream. Instead of using java.sql.Connection directly, you could choose a framework to make your life easier; there are heavyweights like Hibernate and lightweights like jdbi. Then write the code that establishes your database connection, decides which queries to run, and chooses the outputs to write to, and feed all of that into your DataExtractionTask, which runs the processing logic (orchestrating the individual parts).
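A rough sketch of what such an interface could look like (all names are illustrative assumptions, not a definitive design):

```java
import java.io.OutputStream;
import java.sql.Connection;
import java.util.List;

public interface DataExtractionTask {

    /**
     * Runs the given queries over the connection and streams the results to
     * the output. The caller owns (opens and closes) both resources.
     */
    void extract(Connection connection, List<String> queries, OutputStream output)
            throws Exception;
}
```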
Once you have the basic pieces in place you can add other features on top: you could make it configurable, you could run multiple DataExtractionTasks in parallel instead of sequentially, et cetera.
This way you can generalize the processing logic and then focus on getting everything (database connections, query definitions, etc.) ready for processing. I realize this is very broad-picture, but maybe it makes things a bit easier.
Regarding efficiency: if you mean high performance (a relative term!), the best way would be what @Elliott Frisch wrote, keeping it all in a single database that you connect to using a single connection pool.
You don't need separate classes just for connecting; build a util class that holds all the JDBC URLs and obtain connections from it.
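For example, a minimal sketch of such a util class (the URLs are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public final class ConnectionUtil {

    // Placeholder URLs; substitute your real hosts and schemas.
    public static final String DB1_URL = "jdbc:mysql://host1:3306/db1";
    public static final String DB2_URL = "jdbc:oracle:thin:@host2:1521:db2";

    private ConnectionUtil() {
    }

    public static Connection getConnection(String url, String user, String password)
            throws SQLException {
        // DriverManager picks the registered driver matching the URL prefix.
        return DriverManager.getConnection(url, user, password);
    }
}
```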
Besides that, you should consider using JPA instead, which works in Java SE as well as Java EE. With it, you can abstract away the low-level connection and define a named data source. See, for example, this Oracle tutorial.
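A minimal JPA sketch in Java SE; the persistence unit name "extractorPU" and the Customer entity are assumptions that would be defined in META-INF/persistence.xml and your own code:

```java
import java.util.List;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Id;
import javax.persistence.Persistence;

@Entity
class Customer {
    @Id
    Long id;
    String name;
}

public class JpaExample {
    public static void main(String[] args) {
        // Connection details live in persistence.xml, not in the code.
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("extractorPU");
        EntityManager em = emf.createEntityManager();
        try {
            List<Customer> customers = em
                    .createQuery("SELECT c FROM Customer c", Customer.class)
                    .getResultList();
            System.out.println(customers.size() + " rows fetched");
        } finally {
            em.close();
            emf.close();
        }
    }
}
```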
My requirement is to share a java object across a cluster.
I am confused about whether to write an EJB and share the Java objects across the cluster, or to use a third party such as Infinispan, Memcached, or Terracotta, or perhaps JCache, with the following constraints:
I can't change any of my source code specifically for any application server (such as implementing WebLogic's singleton services).
I can't offer two builds for clustered and non-clustered environments.
Performance should not be degraded.
I am only looking at open-source third parties, if I need to use one.
It needs to work in WebLogic, WebSphere, JBoss, and Tomcat.
Can anyone come up with the best option with these constraints in mind?
It can depend on the use case of the objects you want to share in the cluster.
I think it comes down to the following options, ordered from most complex to least complex:
Distributed caching
http://www.ehcache.org
Distributed caching is good if you need to ensure that an object is accessible from a cache on every node. I have used Ehcache to distribute quite successfully; there is no need to set up a Terracotta server unless you need the scale, as you can just point instances at each other via RMI. It also works synchronously or asynchronously depending on requirements. Cache replication is handy if nodes go down, since the cache is redundant and you don't lose anything. This is good if you need to make sure that the object has been updated across all the nodes.
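A minimal Ehcache 2.x sketch; the cache name is an assumption, and the RMI replication itself is configured in ehcache.xml (peer provider/listener factories), not in code:

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class ReplicatedCacheExample {
    public static void main(String[] args) {
        // Reads ehcache.xml from the classpath, where replication is defined.
        CacheManager manager = CacheManager.create();
        Cache cache = manager.getCache("sharedObjects");

        cache.put(new Element("user:42", "some shared value"));
        Element hit = cache.get("user:42");
        System.out.println(hit == null ? "miss" : hit.getObjectValue());

        manager.shutdown();
    }
}
```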
Clustered execution/data distribution
http://www.hazelcast.com/
Hazelcast is also a nice option, as it provides a way of executing Java classes across a cluster. This is more useful if you have an object that represents a unit of work that needs to be performed and you don't care so much where it gets executed.
It is also useful for distributed collections, i.e. a distributed map or queue, as in the sketch below.
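A minimal Hazelcast sketch: every node that runs this joins the cluster and sees the same map (the map name is an assumption):

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import java.util.Map;

public class SharedMapExample {
    public static void main(String[] args) {
        // Starts (or joins) a cluster member in this JVM.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        Map<String, String> shared = hz.getMap("shared-objects");

        shared.put("config:featureX", "enabled");
        // Any other cluster member reading "config:featureX" sees this value.
    }
}
```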
Roll your own with RMI/JGroups
You can write your own client/server, but I think you will start to run into the issues that the bigger frameworks solve once the requirements of the objects you're dealing with get complex. Realistically, Hazelcast is really simple and should eliminate the need to roll your own.
It's not open source, but Oracle Coherence would easily solve this problem.
If you need an implementation of JCache, the only one that I'm aware of being available today is Oracle Coherence; see: http://docs.oracle.com/middleware/1213/coherence/develop-applications/jcache_part.htm
For the sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.
This is just an idea; you might want to check the exact implementation.
It will degrade performance, but I don't see how it is possible to avoid that.
It is not an easy one to implement; perhaps you should consider load balancing instead of clustering.
You might consider RMI and/or a dynamic proxy:
Extract an interface from your objects.
Use RMI to access the real object (from all cluster nodes, even the one that actually holds the object).
To create RMI access for existing code you might use a dynamic proxy (again, I'm not sure about the implementation).
A dynamic proxy can wrap any object and perform some pre- and post-invocation work on each method call; in this case it might use the original object for the RMI invocation (see the sketch below).
You will need connectivity between cluster nodes in order to propagate the RMI object.
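A rough sketch of the dynamic-proxy part of this idea: wrap an existing object behind its interface so every call can be intercepted, e.g. to route it over RMI. The interface and names are assumptions for illustration:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

interface SharedService {
    String lookup(String key);
}

public class ProxyExample {
    public static void main(String[] args) {
        SharedService real = key -> "value-for-" + key; // the actual object

        InvocationHandler handler = (proxy, method, methodArgs) -> {
            // Pre-invocation hook: this is where a call could be forwarded
            // to the node that really holds the object, e.g. via RMI.
            System.out.println("invoking " + method.getName());
            return method.invoke(real, methodArgs);
        };

        SharedService proxied = (SharedService) Proxy.newProxyInstance(
                SharedService.class.getClassLoader(),
                new Class<?>[] {SharedService.class},
                handler);

        System.out.println(proxied.lookup("answer")); // goes through the handler
    }
}
```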
I am writing a moderately complex Java desktop app that includes an embedded database. I do not see any reason why, after the app establishes a connection to the database, it should close the connection until the app is about to shut down.
Practically everything one does with the database requires a connection; transactions can be started and completed serially on the connection, and the app is not doing anything fantastically complicated with the database.
Is there any reason why I should not create the connection and put a reference to it in a static variable in a class known and used by the database-specific classes? It would save the connection from having to be passed around among all kinds of methods without ever changing value.
Is there a design-level consideration I'm missing somewhere?
I would suggest using a library such as c3p0 or DBCP, which handles connection pooling for you. It gives you the flexibility to scale up your application later if necessary.
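A minimal c3p0 sketch; the driver class, URL, and pool sizes are placeholders to adapt to your embedded database:

```java
import com.mchange.v2.c3p0.ComboPooledDataSource;
import java.sql.Connection;

public class PoolSetup {
    public static void main(String[] args) throws Exception {
        ComboPooledDataSource pool = new ComboPooledDataSource();
        pool.setDriverClass("org.h2.Driver");     // e.g. an embedded H2 database
        pool.setJdbcUrl("jdbc:h2:./data/appdb");
        pool.setMinPoolSize(1);
        pool.setMaxPoolSize(5);

        try (Connection con = pool.getConnection()) {
            // use the connection; close() returns it to the pool, not the OS
        }

        pool.close();
    }
}
```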
Anything static usually makes it harder to write proper test cases, since you never know whether the static resource has been altered or not.
Three months down the road, you're going to want to be able to connect to two databases at the same time: maybe you're doing some import/export work, an upgrade job, or merging two customers together. And then you're going to want two connections, and suddenly that static field everyone uses is a nightmare.
You could look into an IoC container like Guice or Spring to ensure that you can keep track of singleton objects without abusing static fields to enforce their singleton-ness.
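A minimal Guice sketch of that idea; ConnectionManager is a hypothetical holder for your connection or DataSource:

```java
import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.Singleton;

class ConnectionManager {
    // would wrap the embedded database connection/DataSource
}

class DatabaseModule extends AbstractModule {
    @Override
    protected void configure() {
        // One shared instance per injector, without any static field.
        bind(ConnectionManager.class).in(Singleton.class);
    }
}

public class Bootstrap {
    public static void main(String[] args) {
        Injector injector = Guice.createInjector(new DatabaseModule());
        ConnectionManager manager = injector.getInstance(ConnectionManager.class);
        // Every class that @Injects ConnectionManager receives this same instance.
    }
}
```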
Avoid statics. Think about the concurrency and multithreading issues that come with this kind of variable. A good approach is to handle your connections with a database pool; Spring is your friend for reaching a simple and clean configuration.
"I do not see any reason why, after the app establishes a connection to the database, it should close the connection until the app is about to shut down."
That seems completely fine to me. It's an embedded database; it is at the service of your application. Create the connection when you start, use it as long as you need, and shut it down when your application closes.
I have to make a web application multi-tenant enabled using the shared database, separate schema approach. The application is built using Java/J2EE and Oracle 10g.
I need to have a single app server using a shared database with multiple schemas, one schema per client.
What is the best implementation approach to achieve this?
What needs to be done at the middle tier (app-server) level?
Do I need to have multiple host headers, one per client?
How can I connect to the correct schema dynamically based on the client who is accessing the application?
At a high level, here are some things to consider:
You probably want to hide the tenancy considerations from day-to-day development. Thus, you will probably want to push them into your infrastructure as much as possible and keep them separate from your business logic. You don't want to be constantly checking which tenant's context you are in... you just want to be in that context.
If you are using a unit-of-work pattern, you will want to make sure that any unit of work (except one operating in a purely infrastructure context, not a business context) executes in the context of exactly one tenant. If you are not using the unit-of-work pattern... maybe you should be. I am not sure how else you are going to follow the advice in the point above (though maybe you will figure out a way).
You probably want to put a tenant ID into the header of every message or HTTP request. It is probably better to keep it out of the body, on the principle of keeping it away from business logic. You can scrape it off behind the scenes and make sure that, behind the scenes, it gets put on any outgoing messages/requests (see the sketch after this list).
I am not familiar with Oracle, but in SQL Server, and I believe in Postgres, you can use impersonation as a way of switching tenants. That is to say, rather than parameterizing the schema in every SQL command and query, you can have one SQL user (without an associated login) per tenant that has the associated tenant's schema as its default schema, and then leave the schema out of your day-to-day SQL. You will have to intercept calls to the database and wrap them in an impersonation call. As I say, I'm not exactly sure how this works out in Oracle, but that's the general idea for SQL Server.
Authentication and security are a big concern here. That is far beyond the scope of what I can discuss in this answer but make sure you get that right.
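To make the header-scraping and schema-switching points above concrete, here is a hedged sketch assuming a servlet stack and Oracle. The X-Tenant-ID header, TenantContext, and safeSchemaFor are all hypothetical names, and real code would validate tenants against a whitelist:

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.sql.DataSource;

final class TenantContext {
    private static final ThreadLocal<String> TENANT = new ThreadLocal<>();

    static void set(String tenantId) { TENANT.set(tenantId); }
    static String get() { return TENANT.get(); }
    static void clear() { TENANT.remove(); }
}

public class TenantFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig) { }

    @Override
    public void destroy() { }

    @Override
    public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
        // Scrape the tenant off the request before any business logic runs.
        TenantContext.set(((HttpServletRequest) req).getHeader("X-Tenant-ID"));
        try {
            chain.doFilter(req, resp);
        } finally {
            TenantContext.clear(); // never leak a tenant to the next request
        }
    }

    // Wherever connections are checked out of the pool:
    static Connection tenantConnection(DataSource ds) throws SQLException {
        Connection con = ds.getConnection();
        try (Statement st = con.createStatement()) {
            // ALTER SESSION cannot take bind parameters, so safeSchemaFor must
            // map the tenant ID onto a whitelisted schema name, never raw input.
            st.execute("ALTER SESSION SET CURRENT_SCHEMA = "
                    + safeSchemaFor(TenantContext.get()));
        }
        return con;
    }

    private static String safeSchemaFor(String tenantId) {
        // hypothetical lookup into a fixed tenant -> schema mapping
        return "TENANT_" + tenantId.toUpperCase();
    }
}
```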
I am trying to upload a file to the server and store the file's information in an Access database. Is there any need to handle threads for database connectivity with multiple users? If yes, how do I do it?
Your web server is inherently multithreaded, which saves you from implementing your own threads to handle the uploads.
Do, however, make sure that multiple requests don't use the same resources (don't write all uploaded files to the same temp file, and so on).
To avoid problems saving the data to your db, use a Connection Pool.
So yes, you need threads, but if your design is good, then all the threading will be handled by your frameworks.
Exactly. Each HTTP request is already a thread of its own. Keep in mind that the web container will create only one servlet instance during the application's lifetime and that the servlet code is shared among all requests. This implies that any class-level or static variables are shared among all requests; if you have such a variable, it is not thread-safe. You need to declare request-specific variables locally at method level, since method-local variables are effectively thread-local.
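A minimal sketch of the difference (hypothetical servlet and parameter names):

```java
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class UploadServlet extends HttpServlet {

    // NOT thread-safe: one servlet instance serves all requests, so this
    // field is shared (and clobbered) across concurrent request threads.
    private String sharedFileName;

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response) {
        // Thread-safe: method-local variables live on each request thread's
        // own stack and are never shared.
        String fileName = request.getParameter("fileName");
        // ... process fileName ...
    }
}
```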
As for JDBC: just write solid code and everything should go well. Using a connection pool is only useful to improve connection performance (which is really worth the effort, believe me: connecting to the DB is a fairly expensive task that can take 200ms or more, while reusing a connection from the pool costs almost nothing). It doesn't change anything about the thread safety of the code you write; that is still in your control/hands. To get a clear picture of how to do basic JDBC coding the right way, you may find this article useful.
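And a hedged sketch of "solid" JDBC code along those lines: everything method-local and closed by try-with-resources, with a pooled DataSource injected. The table and column names are illustrative assumptions:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class UploadDao {

    private final DataSource dataSource; // the pool itself is thread-safe

    public UploadDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public void saveMetadata(String fileName, long size) throws SQLException {
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO upload (name, size) VALUES (?, ?)")) {
            ps.setString(1, fileName);
            ps.setLong(2, size);
            ps.executeUpdate();
        } // both resources are closed here, even when an exception is thrown
    }
}
```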