How to implement the "Shared database, separate schema" multi-tenant strategy - java

I have to make a web application multi-tenant enabled using the shared database, separate schema approach. The application is built using Java/J2EE and Oracle 10g.
I need to have one single app server using a shared database with multiple schemas, one schema per client.
What is the best implementation approach to achieve this?
What needs to be done at the middle tier (app-server) level?
Do I need to have multiple host headers, one per client?
How can I connect to the correct schema dynamically based on the client who is accessing the application?

At a high level, here are some things to consider:
You probably want to hide the tenancy considerations from day-to-day development. Thus, you will probably want to hide it away in your infrastructure as much as possible and keep it separate from your business logic. You don't want to be constantly checking which tenant's context you are in... you just want to be in that context.
If you are using a unit of work pattern, you will want to make sure that any unit of work (except one that is operating in a purely infrastructure context, not in a business context) executes in the context of exactly one tenant. If you are not using the unit of work pattern... maybe you should be. Not sure how else you are going to follow the advice in the point above (though maybe you will be able to figure out a way).
You probably want to put a tenant ID into the header of every messaging or HTTP request. It is probably better to keep this out of the body, on the principle of keeping it away from business logic. You can scrape it off behind the scenes and make sure it also gets put on any outgoing messages/requests.
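A minimal sketch of that idea with a servlet filter, assuming a header name of X-Tenant-ID and a ThreadLocal holder (both names are invented for illustration):

```java
import java.io.IOException;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;

// A ThreadLocal holder the rest of the stack can read without ever seeing HTTP.
final class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    static void set(String tenantId) { CURRENT.set(tenantId); }
    static String get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }
}

// Scrapes the tenant ID off every request behind the scenes.
public class TenantFilter implements Filter {
    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        String tenantId = ((HttpServletRequest) req).getHeader("X-Tenant-ID"); // assumed header
        TenantContext.set(tenantId);
        try {
            chain.doFilter(req, res);      // business code just "is" in the tenant context
        } finally {
            TenantContext.clear();         // don't leak the tenant across pooled threads
        }
    }

    @Override public void init(FilterConfig cfg) {}
    @Override public void destroy() {}
}
```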
I am not familiar with Oracle, but in SQL Server (and I believe in Postgres) you can use impersonation as a way of switching tenants. That is to say, rather than parameterizing the schema in every SQL command and query, you can just have one SQL user (without an associated login) that has the associated tenant's schema as its default schema, and then leave the schema out of your day-to-day SQL. You will have to intercept calls to the database and wrap them in an impersonation call. Like I say, I'm not exactly sure how this works out in Oracle, but that's the general idea for SQL Server.
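In Oracle, the closest analogue I know of is ALTER SESSION SET CURRENT_SCHEMA, which changes the schema used to resolve unqualified object names for the rest of the session. A minimal sketch, assuming the tenant maps one-to-one to a schema name (validate that name rather than trusting client input):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class SchemaSwitcher {

    // Point every unqualified table reference at the tenant's schema for the
    // lifetime of this connection (or until the next ALTER SESSION).
    public static void switchToTenantSchema(Connection conn, String schema)
            throws SQLException {
        if (!schema.matches("[A-Z][A-Z0-9_]*")) {      // crude whitelist check
            throw new IllegalArgumentException("Suspicious schema name: " + schema);
        }
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("ALTER SESSION SET CURRENT_SCHEMA = " + schema);
        }
    }
}
```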
Authentication and security are a big concern here. That is far beyond the scope of what I can discuss in this answer but make sure you get that right.

What is a good practice to deploy webservices?

Is it good practice to deploy web services separately, or should they be part of the web application? For instance, I am developing a Spring REST-based web service whose function is, let's say, to get user data.
Each web application that queries this web service has its user data in a different schema. So the web service needs to know who is calling it: is it Application A or Application B? If it's AppA, it should get data from Schema A; if it's AppB, from another schema. Note that AppA and AppB are the same code packed into two different wars, and the schema they are supposed to query is supplied from a properties file.
In a situation like this, does it make sense to pack the web service with the webapp code and deploy it under different contexts, so it becomes a duplicate service running in a different context? Or should it be deployed separately, with AppA and AppB somehow identifying themselves to this web service?
I prefer the approach below, which is in use for 50K concurrent users.
Make sure that each web service encapsulates both the UI and the schema independently by executing the required business use case. Each web service will have all three layers (Model, View, and Controller) for that business service. That means your App-A is one web service and App-B is another.
All web services register and unregister with a Master web service. The Master web service is responsible for redirecting user requests to the appropriate web service, App-A or App-B.
You should have a cluster of the Master web service and clusters of the individual web services, App-A and App-B.
In this approach, each schema can reside in a different database instead of a single database.
Advantages of this approach:
Each web service can scale horizontally. Just add additional VM nodes if you want to increase the scale.
If you have different schemas on different databases in different locations, you avoid network performance bottlenecks in OLTP (online transaction processing) queries.
Disadvantages:
I see only one disadvantage: the Master web service acts like a Facade, so it has to know the internals of the individual web services. But given the advantages on offer, that is an acceptable trade-off.
I have no idea about your business requirement for maintaining different schemas for user data and going with a web service.
But instead of maintaining multiple wars with the same code, I would suggest configuring multiple datasources within the application and switching to the appropriate datasource per request; a sketch follows below.
This link may help you to configure multiple DS.
If you follow the aforementioned logic, you may end up with a single deployable context.
If you still want to stick with multiple wars as web services, I would suggest having a look at Spring Boot: simple, containerless, deployable, and scalable.
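A minimal sketch of the multiple-datasource idea, assuming each calling application identifies itself (for example with a request parameter or header); the registry class and its names are invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;

// Registry of per-application datasources, populated once at startup from
// configuration (e.g. app.A.jdbcUrl, app.B.jdbcUrl -- property keys are
// hypothetical). Business code asks for a DataSource by application ID
// instead of the schema being baked into a per-war properties file.
public class DataSourceRegistry {

    private final Map<String, DataSource> byAppId = new HashMap<>();

    public void register(String appId, DataSource ds) {
        byAppId.put(appId, ds);
    }

    public DataSource lookup(String appId) {
        DataSource ds = byAppId.get(appId);
        if (ds == null) {
            throw new IllegalArgumentException("Unknown application: " + appId);
        }
        return ds;
    }
}
```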
It is a matter of opinion; both choices are okay. You should take into account the usage of the service, scaling concerns, etc.
You could look at microservices as an idea, but it has to make sense from your standpoint.
About the two different apps: if the differences are only in configuration, try externalizing it (23. Externalized Configuration). This way you can have a single artifact deployed twice.
Given that scenario, it is good practice to have only one web service; this improves the maintainability of the system because you don't have the same code twice. If you have another similar app in the future, you don't have to implement a new service.
Approach 1 (Preferred):
You should have a single web application which holds the entire code for the application UI and repository/data interaction.
Based on the type of request, dynamically switch the datasource as needed. You can have a look at Spring dynamic datasource routing here.
Approach 2:
In case your UI has completely different types of interactions managed by different teams, it makes sense to have separate UI components and keep the backend web services maintained in the same location.
Again, based on the type of request, you can dynamically route to the right datasource, as in the sketch below.
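A minimal sketch of the dynamic routing mentioned above, using Spring's AbstractRoutingDataSource; the ThreadLocal key holder is an invented detail, and the per-schema DataSources are registered via setTargetDataSources(...) during configuration:

```java
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class TenantRoutingDataSource extends AbstractRoutingDataSource {

    // Invented detail: a filter/interceptor sets this per request.
    private static final ThreadLocal<String> CURRENT_KEY = new ThreadLocal<>();

    public static void setCurrentKey(String key) { CURRENT_KEY.set(key); }
    public static void clear() { CURRENT_KEY.remove(); }

    @Override
    protected Object determineCurrentLookupKey() {
        // Spring calls this on every getConnection(); the returned key selects
        // one of the DataSources registered via setTargetDataSources(...).
        return CURRENT_KEY.get();
    }
}
```

You register one DataSource per schema via setTargetDataSources(...), and a filter sets the key before any repository code runs, so the routing stays invisible to business logic.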
Hope this helps :)
My inputs:
1) Any specific reasons to build two different wars for the same code? Is it only because you have two different data sources, one for each?
Why can't you have a single application deployment with some parameterized mechanism in each request to identify which schema to get data from?
2) Why do you need a web service in the first place? Why not have the application hook directly into the database it needs?
3) Is the underlying database a transactional DB or some historical data? How about merging both schemas into one as a one-time effort, OR using some sort of virtualized views which pick data from the two schemas based on input parameters?
***** Edited after Jay's inputs:
My suggestion would be to have the web service deployed separately from the two web apps, because that gives you a single place to manage the code in the long run. I have the following additional suggestions:
Define your own headers in the SOAP XML schema which can carry both an appContext (the application making the call) and a userContext (the user). Give this aspect good thought, keeping a long-term view; a handler sketch follows below.
Keep the SOAP request-response stateless, which will give you scalability. Don't maintain any state for SOAP requests on the server side.
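A minimal JAX-WS handler sketch for reading such a custom header on the service side; the namespace, element name, and property key are invented for illustration:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.soap.SOAPElement;
import javax.xml.soap.SOAPException;
import javax.xml.soap.SOAPHeader;
import javax.xml.ws.handler.MessageContext;
import javax.xml.ws.handler.soap.SOAPHandler;
import javax.xml.ws.handler.soap.SOAPMessageContext;

public class AppContextHandler implements SOAPHandler<SOAPMessageContext> {

    // Invented namespace/element; define whatever suits your schema.
    private static final QName APP_CONTEXT =
            new QName("http://example.com/headers", "appContext");

    @Override
    public boolean handleMessage(SOAPMessageContext ctx) {
        boolean outbound = (Boolean) ctx.get(MessageContext.MESSAGE_OUTBOUND_PROPERTY);
        if (!outbound) {
            try {
                SOAPHeader header = ctx.getMessage().getSOAPHeader();
                if (header != null) {
                    Iterator<?> it = header.getChildElements(APP_CONTEXT);
                    if (it.hasNext()) {
                        String appId = ((SOAPElement) it.next()).getValue();
                        // Make the calling application's ID visible to the endpoint.
                        ctx.put("appContext", appId);
                        ctx.setScope("appContext", MessageContext.Scope.APPLICATION);
                    }
                }
            } catch (SOAPException e) {
                throw new RuntimeException("Cannot read SOAP header", e);
            }
        }
        return true;  // continue the handler chain
    }

    @Override public boolean handleFault(SOAPMessageContext ctx) { return true; }
    @Override public void close(MessageContext ctx) {}
    @Override public Set<QName> getHeaders() { return Collections.emptySet(); }
}
```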
I have in the past used a data virtualization solution (Cisco Composite). The benefit it provides: if there are two (or more) data sources containing similar types of data (entities), it can join, cleanse, and merge them virtually and expose the result as a REST/SOAP-based web service. Try evaluating this option as well.
It can further help if, in the future, you have other consumers that access your information using plain SQL/JDBC calls; they will be able to do so. Data virtualization solutions also support many other consumer interfaces such as Hadoop, OData, etc. Again, it depends on the budget and other constraints of the project. I am not sure whether there is an effective open-source data virtualization solution available.
Personally, in my experience, it's a lot better to have them separated; it usually depends on how big and how critical your main project is.
But even if at the beginning your project isn't that big and there's only one person working on it, later on, as it continues to grow, having microservices for all the things your main project does will make it a lot easier to maintain. Rather than many people working on the same code and handling many versions of a single project, handling many small projects is less confusing, and errors are easier to find.
Plus, if something fails, you can have one microservice down while your main application still runs without interruption; users are only denied that one service, instead of everything being down while you fix it.
High availability is very important in production, and having them separated helps with this.
Given your situation, I'd advise going with ONE webapp (one "project") with some caveats, and then consider one of these two solutions:
1) Given you are using Spring, I'll assume (hope) you are using Maven as well.
Make a different compilation goal and arrange it so that, based on the goal invoked to produce the war, a different properties file is used.
This way you have ONE webapp, and based on the compilation (or rather, based on the properties file tied to that specific compilation) you obtain a war tied to a specific environment and schema. You deploy an individual war for each web service with a clean separation, though the root code is the very same and it's only one application. [CLEANER SOLUTION]
2) Arrange it so that you don't only get the JSON request but also the HTTPS certificate of the sender (thus you identify a specific "webapp" based on the HTTPS certificate presented), and based on the certificate AND the source of the request, you verify the source as "qualified" to receive data from schema X rather than schema Y. You deploy ONE war only that will, at its own discretion, apply logic to reroute your "user data fetch query" to one database or the other. [I DISCOURAGE THIS PRACTICE]
Of course there are other approaches as well, but I think these two are the most feasible.
It really depends on what you want to achieve.
If you want to encapsulate the database/schema/table, then it should really be one service for each application. The main advantage of doing this is that you could swap the database later on if there is some problem with the current one; it also simplifies caching and invalidation, etc.
If the database/schema/table is not encapsulated anyway, then a single service is much easier and better. Each web application just has to identify itself, and each of them will get exactly what it needs. This could be achieved by putting the query/schema information in a property file, or by creating DB views with the same name as the client, etc.
If we were to go with this approach, a question pops up: why bother having this layer at all? Couldn't each web application just query the DB directly? If the answer is yes, then just remove the whole layer.
You are trying to implement a Data Provider, or DAO as a service.
To make it -
Simple
Scalable
Maintenance-friendly
Adaptable
You can simply have a single web service, deployed outside the WebApp(s) and driven off configuration. The configuration itself can be stored in a property file or in a DB. The identifier for the client should be passed in the web service request.
This is actually a pretty standard approach, implemented to enable optimizations at the data tier outside of the DB, like caching (again driven off configuration), expiry, pooling, etc.
The other option, including it as a shared jar within the webapp, does have the advantage of code reuse (which you get with an externally deployed service as well), but the following disadvantages outweigh it:
Coupling
Employing optimizations is difficult
Release management (this also depends upon how your code is organized)
Versioning.
Hope it helps.
I would deploy to one instance, no matter what. Of course, there are circumstances where it may be necessary to deploy separately. From a best "coding" practice standpoint, one instance should be used to allow for "write once, use many".
Then...
Define different XSDs for each of AppA, AppB, etc. Marshall accordingly.
Or use Groovy to marshall the appropriate objects as JSON or XML.

Providing database access via a Java web service

Our company is currently implementing a couple of tools for employee use. As I'm the only programmer within the company, it's fallen to me to develop these tools.
However, I have little to no experience with web services or Java, so I'm a little stumped on some of the logic here and am hoping someone can give me some guidance.
We have a MySQL database hosted in the UK; this will provide the data for the tools that will be used both within the UK and outside of the UK by our other offices. I'm looking to provide access to the database via web services.
However, having looked into this, I get the feeling I have missed something key. Right now I'm looking to create methods for every database table, so each table will need a select, update, and delete method. Since there are 20-odd tables, that means the web service would have 60 methods exposed! Is this normal?
It seems to me that there would be an easier way to do this, but having little experience with Java I'm at a loss, and my google-fu has failed me thus far.
Could anyone give me some pointers on what the "usual" way of doing this is, and whether there is some approach I've simply overlooked?
Web services should be written for each entity, not for each table. An entity should be a logical one, not something overly abstract. There can be multiple tables in your database storing the data for one entity. For example: you have an entity called 'Person', but the person's details are stored in multiple tables such as 'PersonDetail', 'PersonContactDetails', 'PersonDependentDetails', etc. You can manipulate the data in all of these tables through the web services created for 'Person'.
Web service operations can be mapped to database CRUD (CREATE, READ, UPDATE, DELETE) operations. If you are writing RESTful web services, the CRUD operations map to the HTTP methods POST, GET, PUT, and DELETE.
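A minimal JAX-RS sketch of this entity-per-service mapping; the Person class and PersonService helper are hypothetical, and internally the service may touch PersonDetail, PersonContactDetails, etc., while callers only ever see 'Person':

```java
import javax.ws.rs.*;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;

@Path("/persons")
@Produces(MediaType.APPLICATION_JSON)
@Consumes(MediaType.APPLICATION_JSON)
public class PersonResource {

    private final PersonService service = new PersonService(); // assumed helper

    @GET
    @Path("/{id}")
    public Person read(@PathParam("id") long id) {             // READ   -> GET
        return service.findById(id);
    }

    @POST
    public Response create(Person p) {                         // CREATE -> POST
        long id = service.save(p);
        return Response.status(Response.Status.CREATED).entity(id).build();
    }

    @PUT
    @Path("/{id}")
    public void update(@PathParam("id") long id, Person p) {   // UPDATE -> PUT
        service.update(id, p);
    }

    @DELETE
    @Path("/{id}")
    public void delete(@PathParam("id") long id) {             // DELETE -> DELETE
        service.delete(id);
    }
}
```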
Here's one typical approach, although it's a pretty big learning curve:
Create Data Access Objects (DAOs) to query the DB and convert from your relational data model to a java object model. If extreme performance isn't a consideration (it isn't a consideration for most applications), consider ORM mapping frameworks like Hibernate or JPA. You probably don't need one method per table. Many times multiple tables make up one domain object. For instance, in a banking app you might have a table called customer, and a related table called customer_balance. If you just want to present a balance to a customer, you could have one domain object called "Customer", with a field called "balance". Your Customer DAO would join customer and customer_balance to create a single Customer object.
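If you go the plain-JDBC route instead of an ORM, a DAO for the banking example above might look like this sketch; the table layout and the Customer class are assumptions taken from the example:

```java
import java.sql.*;
import javax.sql.DataSource;

public class CustomerDao {

    private final DataSource dataSource;

    public CustomerDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Joins customer and customer_balance into a single domain object.
    public Customer findById(long id) throws SQLException {
        String sql = "SELECT c.id, c.name, b.balance "
                   + "FROM customer c JOIN customer_balance b ON b.customer_id = c.id "
                   + "WHERE c.id = ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;                   // or throw a not-found exception
                }
                // Customer is a hypothetical domain class with id/name/balance.
                return new Customer(rs.getLong("id"), rs.getString("name"),
                                    rs.getBigDecimal("balance"));
            }
        }
    }
}
```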
Create services to wrap the DAOs and apply your business rules in them. Keep business rules in the service as much as possible, because it improves testability. An example of a simple banking service method would be "withdrawMoney(amount)". The service would pull the Customer from the DB via a DAO, first check that the customer has at least "amount" in their current balance, and then subtract "amount" from the current balance and save it to the database via the DAO.
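A sketch of that service method; Customer's accessors and the DAO's save method are assumed for illustration, and in a real application the read-check-write sequence would run inside one transaction:

```java
import java.math.BigDecimal;
import java.sql.SQLException;

public class BankingService {

    private final CustomerDao dao;

    public BankingService(CustomerDao dao) {
        this.dao = dao;
    }

    public void withdrawMoney(long customerId, BigDecimal amount) throws SQLException {
        Customer customer = dao.findById(customerId);   // Customer is hypothetical
        if (customer == null) {
            throw new IllegalArgumentException("No such customer: " + customerId);
        }
        // Business rule lives in the service: never let the balance go negative.
        if (customer.getBalance().compareTo(amount) < 0) {
            throw new IllegalStateException("Insufficient funds");
        }
        customer.setBalance(customer.getBalance().subtract(amount));
        dao.save(customer);                             // assumed DAO method
    }
}
```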
Your web layer will call the services layer and present the data to the user and allow them to operate on it. At some point, you may want your web layer to communicate with the services layer via a web service API, although that is probably overkill for early implementations.
As others have cited, the Java Petstore application is a good example of this approach. Oracle doesn't maintain the Petstore app any longer, but volunteers have copied it to GitHub and are keeping it up to date with the latest J2ee versions. Here's a link to the GitHub site: https://github.com/agoncal/agoncal-application-petstore-ee6
Yes, if every one of your 20 tables requires selection (HTTP GET), update (HTTP PUT), and delete (HTTP DELETE), you will probably need 20×3 = 60 methods.
You'll probably want to start off by having a read of this part of the Java EE 7 tutorial, which will give you an overview of web service development. What you are suggesting, though, seems strange and is perhaps not really what you want. If you want to expose every table to updates/deletes/etc., then you'd perhaps be better off just opening the port to the database server, but this is generally considered a bad idea.
I think you probably want to work at a higher level and pass around objects rather than database updates. Let's say, for example, you have a Person object in your application. You can pass that to and from your web application and client application, and let the web application worry about putting it in the database, deleting it, etc. Although there is nothing technically wrong with performing updates the way you are suggesting, I've not seen it done for many years.

Considerations when calling a MySQL database in parallel

I have to create a MySQL database to be used by several applications in parallel for the first time. Up until this point, my only experience with MySQL databases has been single programs (for example, webservers) querying the database.
Now I am moving into a scenario where I will have several CXF Java servlet-type programs, as well as a background server, editing and reading the same schemas.
I am using the Connector/J JDBC driver to connect to the database in all instances.
My question is this: what do I need to do in order to make sure that the parallel access does not become a problem? I realize that I need to use transactions where appropriate, but where I am truly lost is the management.
For example:
Do I need to close the connection every time a servlet is done with a job?
Do I need a unique user for each program accessing the database?
Do I have to do something with my Connector/J objects?
Do I have to declare my tables in a different way?
Did I miss anything, or is there something I failed to think about?
I have a pretty good idea about how to handle transactions and the SQL itself, but I am pretty lost when it comes to what I need to do when setting up my database.
You should maintain a pool of connections. Connections are really expensive to create: think on the order of several hundred milliseconds. So for high-volume apps it makes sense to cache and reuse them.
For your servlet, it depends on what container you are using. Something like JBoss will provide pooling as part of the container; it can be defined through the datasource definition and accessed through JNDI. Other containers like Tomcat may rely on something like C3P0.
Most of these frameworks return custom implementations of JDBC connections whose close() methods contain logic that returns the connection to the pool. You should familiarize yourself with the details of your concrete implementation to make sure you are doing things in a way that is supported.
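A minimal sketch of what that looks like from servlet code; the JNDI name and table are placeholders for whatever your container's datasource definition declares:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class PooledQueryExample {

    public int countRows() throws Exception {
        DataSource ds = (DataSource)
                new InitialContext().lookup("java:comp/env/jdbc/MyDB"); // placeholder name
        // try-with-resources calls close(), which for a pooled connection
        // returns it to the pool rather than physically closing it.
        try (Connection conn = ds.getConnection();
             PreparedStatement ps = conn.prepareStatement("SELECT COUNT(*) FROM my_table");
             ResultSet rs = ps.executeQuery()) {
            rs.next();
            return rs.getInt(1);
        }
    }
}
```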
As for the concurrency considerations, you should familiarize yourself with the concepts of optimistic/pessimistic locking and transaction isolation levels. These have trade-offs, where the correct answer can only be determined given the operational context of your application.
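For example, a common optimistic-locking sketch uses a version column; the 'account' table layout here is an assumption for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OptimisticUpdate {

    // Returns false if someone else modified the row since we read it.
    public boolean updateBalance(Connection conn, long id,
                                 long newBalance, long versionWeRead) throws SQLException {
        String sql = "UPDATE account SET balance = ?, version = version + 1 "
                   + "WHERE id = ? AND version = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, newBalance);
            ps.setLong(2, id);
            ps.setLong(3, versionWeRead);
            return ps.executeUpdate() == 1;   // 0 rows -> concurrent modification
        }
    }
}
```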
Considering the user: most applications have one user that represents the application, called the read/write user. This user should only have the privileges to read and write records in the tables, indices, sequences, etc. that are associated with your application. All instances of the application will specify this user in their connection string.
If you familiarize yourself with the concepts above, you'll be about 95% of the way there.
One more thing: as pointed out in the comments, on the administration side your database engine is a huge consideration. You should familiarize yourself with the differences and the tuning/configuration options.

Multitenancy with Spring JPA

I am looking around for a multitenancy solution for my web application.
I would like to implement an application with the separate schema model. I am thinking of having a datasource per session. In order to do that, I put the datasource and entity manager in session scope, but that's not working. I am now thinking of loading the data-access-context.xml file (which includes the datasource and other repository beans) when the user enters a username, password, and tenant ID. I would like to know whether that is a good solution.
Multitenancy is a bit of a tricky subject and it has to be handled on the JPA provider side, so that from the client code's perspective nothing or almost nothing changes. EclipseLink has support for multitenancy (see: EclipseLink/Development/Indigo/Multi-Tenancy); Hibernate added it recently.
Another approach is to use AbstractRoutingDataSource, see: Multi tenancy in Hibernate.
Using session scope is way too risky (you will also end up with thousands of database connections, a few for every session/user). Finally, EntityManager and the underlying database connections are not serializable, so you cannot migrate your session and scale your app properly.
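To give a flavor of the provider-side approach, here is a minimal sketch of Hibernate's tenant resolver for schema-per-tenant setups (you would also configure hibernate.multiTenancy=SCHEMA and a MultiTenantConnectionProvider); the ThreadLocal is an invented detail that a web filter would populate per request:

```java
import org.hibernate.context.spi.CurrentTenantIdentifierResolver;

public class SchemaTenantResolver implements CurrentTenantIdentifierResolver {

    // Populated per request by a web filter (an invented detail, not Hibernate API).
    public static final ThreadLocal<String> CURRENT_TENANT = new ThreadLocal<>();

    @Override
    public String resolveCurrentTenantIdentifier() {
        String tenant = CURRENT_TENANT.get();
        return tenant != null ? tenant : "DEFAULT";   // fallback for infrastructure work
    }

    @Override
    public boolean validateExistingCurrentSessions() {
        // Refuse to reuse an already-open session under a different tenant.
        return true;
    }
}
```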
I have worked with a number of multi-tenancy systems. The challenge here is how you
keep an open architecture, and
provide a solution that evolves with your business.
Let's look at the second challenge first. Multi-tenancy systems have a tendency to evolve to where you need to support use cases in which the same data (record) can be accessed by multiple tenants in different capacities (e.g. https://bugs.eclipse.org/bugs/show_bug.cgi?id=355458). So the system ultimately needs an Access Control List.
To keep the architecture open, you can code to a standard (like JPA). Coding directly to EclipseLink or Hibernate makes me uncomfortable.
Spring Security ACL provides a very flexible, community-supported solution to both of these challenges. Give that a try. I did, and I have been happy with its performance. However, I must caution you: it took me some digging to get my head around it.

Strategy for Offline/Online data synchronization

My requirement is: I have a server J2EE web application and a client J2EE web application. Sometimes the client can go offline. When the client comes back online, it should be able to synchronize changes to and fro. I should also be able to control which rows/tables need to be synchronized based on some filters/rules. Are there any existing Java frameworks for doing this? If I need to implement it on my own, what are the different strategies you can suggest?
One solution in my mind is maintaining SQL logs and executing the same statements on the other side during synchronization. Do you see any problems with this strategy?
There are a number of Java libraries for data synchronization/replication. Two that I'm aware of are Daffodil and SymmetricDS. In a previous life I foolishly implemented (in Java) my own data replication process. It seems like the sort of thing that should be fairly straightforward, but if the data can be updated in multiple places simultaneously, it's hellishly complicated. I strongly recommend you use one of the aforementioned projects to avoid dealing with this complexity yourself.
The biggest issue with synchronization is when the user edits something offline and it is edited online at the same time. You need to merge the two changed pieces of data, or deal with the UI to allow the user to say which version is correct. If you eliminate the possibility of both being edited at the same time, then you don't have to solve this sticky problem.
The usual method is to add a 'modified' field to every table, and compare the client's modified field for a given record in a given row against the server's modified date. If they don't match, you replace the server's data.
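A minimal sketch of those comparisons; tracking a lastSyncTime in addition to the per-row timestamps is an assumption that lets you tell a plain update apart from a true conflict:

```java
import java.sql.Timestamp;

public class SyncRule {

    // Client changed the row and the server did not: safe to push client -> server.
    public static boolean shouldPushToServer(Timestamp clientModified,
                                             Timestamp serverModified,
                                             Timestamp lastSyncTime) {
        return clientModified.after(lastSyncTime) && !serverModified.after(lastSyncTime);
    }

    // Both sides changed since the last successful sync: a human (or an explicit
    // merge rule) has to decide which version wins.
    public static boolean isConflict(Timestamp clientModified,
                                     Timestamp serverModified,
                                     Timestamp lastSyncTime) {
        return clientModified.after(lastSyncTime) && serverModified.after(lastSyncTime);
    }
}
```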
Be careful with autogenerated keys - you need to make sure your data integrity is maintained when you copy from the client to the server. Strictly running the SQL statements again on the server could put you in a situation where the autogenerated key has changed, and suddenly your foreign keys are pointing to different records than you intended.
Often when importing data from another source, you keep track of the primary key from the foreign source as well as your own personal primary key. This makes determining the changes and differences between the data sets easier for difficult synchronization situations.
Your synchronizer needs to identify when data can just be updated and when a human being needs to mediate a potential conflict. I have written a paper that explains how to do this using logging and algebraic laws.
What is best suited as the client-side data store in your application? You can choose from an embedded database like SQLite, a message queue, some object store, or (if none of these can be used, since it is a web application) files/documents saved on the client using Web SQL/IndexedDB or HTML5's localStorage API.
Check the paper "Gold Rush: Mobile Transaction Middleware with Java-Object Replication". Microsoft's documentation on occasionally connected systems describes two approaches: service-oriented or message-oriented, and data-oriented. Gold Rush takes the former approach. The latter approach uses database merge replication.
