What are my options in distributing a J2EE app on multiple servers? - java

I'm using JSP+Struts2+Tomcat6+Hibernate+MySQL as my J2EE developing environment. The first phase of the project has finished and it's up and running on a single server. due to growing scale of the website it's predicted that we're gonna face some performance issues in the future.
So we wanna distribute the application on several servers, What are my options around here?

Before optimize anything you should detect where your bottleneck is (Services, Database,...). If you do not do this, the optimization will be a waste of time and money.
And then the optimization is for example depending on you use case.
For example, if you have a read only application, add the bottleneck is both, Java Server and Database, then you can setup two database servers and two java servers.
Hardware is very important too. May the easiest way to to update the hardware. But this will only work if the hardware is the bottleneck.

You can use any J2EE application server that supports clustering (e.g. WebLogic, WebSphere, JBoss, Tomcat). You are already using Tomcat so you may want use their clustering solution. Note that each offering provides different levels of clustering support so you should do some research before picking a particular app server (make sure it is the right clustering solution for your needs).
Also porting code from a standalone to a cluster environment often requires a non-negligible amount of development work. Among many other things you'll need to make sure that your application doesn't rely on any local files on the file system (this is a bad J2EE practice anyway), that state (HTTP sessions or stateful EJB - if any) gets properly propagated to all nodes in your cluster, etc. As a general rule, the more stateless, the smoother the transition to a cluster environment.

As you are using Tomcat, I'd recommend to take a look at mod_cluster. But I suggest you to consider a real application server, like JBoss AS. Also, make sure to run some performance tests and understand where is the bottleneck of your application. Throwing more application servers is ineffective if, for instance, the bottleneck is at the database.

Related

Any suggestions for an automated way to capture your application's external connections?

I am trying to replace a requirement our dev teams have where they manually have to fill out a form that includes a list of their app's external connections (for example any database connections, calls to other services/applications, backing services, etc...). This is required in order to get approval to deploy to production. Mgmt/Security and our last mile folks use this information to determine risk level and to make sure that any scheduled dependencies are looked at (e.g., make sure the deployment is not scheduled for a time when one of the backing services is down so all the integration tests don't fail). Any suggestions to capture this automatically by scanning the code in Git? Or can Dynatrace provide this information if we have it monitoring in the lower environments pre-prod? Some other tool?
A little background in case you need it - we are using Jenkins with OpenShift to deploy docker containers to AWS PaaS. Code is stored in Git, we use Bitbucket. In the pipeline we have SonarQube scanning and a tool that scans third party libraries the app is using (e.g., struts, cucumber, etc..). We have dynatrace to monitor the app in production (but we can also use it in dev if we want). Mostly Java apps but we also have Node and Python and .NET.
I can't suggest a way to automate this. I suspect there isn't one.
I would have thought it was advisable that the dev teams did this by hand anyway. Surely they should have a handle on what external connections the apps ought to be making. Expecting the production / security team to take care of it all means that they need to develop a deeper understanding of the app's functionality and architecture so that they can make a reasoned decision on whether particular access is necessary,
I do have one suggestion though. You could conceivably do your testing on machines with firewalls that block out-going connections for all but a set of white-listed hosts and ports. That white-list could be the starting point for the forms you need to fill in.
Have you looked into tagging? Manual or environment based variable set up looks painful (which is why I have avoided), but might be worthwhile? https://www.dynatrace.com/support/help/how-to-use-dynatrace/tags-and-metadata/

Google Cloud Platform: are my architectural solutions correct?

I'm trying to make simple application and deploy it on Google Cloud Platform Flexible App Engine, which will contain two main parts:
Front end application (simple Web UI based on Java 8 (Spring + Thymeleaf) with OAuth authorization from different external sites)
Back end application (monitoring several resources in separate threads, based on logged in users and reacting to their input in a certain way (behavioral changes))
Initially I was planning to make them as one app, but I think that potentially heavy background processing may cause failures in my front end application part + App Engine docs says that deployed services behave similar to microservice architecture.
My questions are:
Do I really need to separate front end from back end, if I need to react to user input as fast as possible? (but delays up to 2 seconds aren't that critical)
If I do need to separate them (and I strongly believe that I do) - how to I set up interaction between applications?
Each resource must be tracked exactly by one thread on back end - what are the best practices about this? I thought about having a SQL table with a list of acquired resources, but the flaw I see there is if an instance will fail I will need to make some kind of clean up on that table and redetermine which resources are actually acquired.
Your proposed architecture sounds like the right approach in separating the two into different services for the following reasons:
Can deploy code for each separately, rollback versions separately, and split traffic separately for experiments or phased rollouts.
Can adjust machine types and memory allocations for each service to better suit its needs. If you're doing work that is memory intensive on the backend, you can adjust that service's settings to allocate more memory per instance.
Allow each type of service to scale independently based on demands, which should result in better utilization of the services and less waste. This should also lower your overall spending than if you tried to go for a one-sized fits all approach in a single monolithic service.
You can mix different runtime environments across services. For example, you can mix language runtimes within a project OR you could even mix between standard and flexible environments. Say your front-end code is more cost efficient in standard, designate that service as a standard environment service and your backend as a flexible environment service. Or say you need a customer docker file with Perl in it, you could do that as a flexible environment custom runtime and have your front-end in Java 8.
You can still share common services like Cloud SQL, PubSub, Cloud Tasks (currently in alpha) or Redis for in memory caching. Your works don't need t reside in App Engine, they could reside in a different product if that better suits your needs.
Overall, you get much better control over your application to split it apart. The biggest benefit likely comes down to optimizing your application for spending only on what you need.
I think that you are likely to be able to deploy everything as an appengine app except if you use some exotic Java libraries that are not whitelisted. It could still be good to deploy it with compute engine for increased configurability and versatility.
You can create one front-end instance and one back-end instance in compute engine and divide the resources between them like that. Google's documentation has an example where you can do that.

How to build a distributed java application?

First of all, I have a conceptual question, Does the word "distributed" only mean that the application is run on multiple machines? or there are other ways where an application can be considered distributed (for example if there are many independent modules interacting togehter but on the same machine, is this distributed?).
Second, I want to build a system which executes four types of tasks, there will be multiple customers and each one will have many tasks of each type to be run periodically. For example: customer1 will have task_type1 today , task_type2 after two days and so on, there might be customer2 who has task_type1 to be executed at the same time like customer1's task_type1. i.e. there is a need for concurrency. Configuration for executing the tasks will be stored in DB and the outcomes of these tasks are going to be stored in DB as well. the customers will use the system from a web browser (html pages) to interact with system (basically, configure tasks and see the outcomes).
I thought about using a rest webservice (using JAX-RS) where the html pages would communicate with and on the backend use threads for concurrent execution.
Questions:
This sounds simple, But am I going in the right direction? or i should be using other technologies or concepts like Java Beans for example?
2.If my approach is fine, do i need to use a scripting language like JSP or i can submit html forms directly to the rest urls and get the result (using JSON for example)?
If I want to make the application distributed, is it possible with my idea? If not what would i need to use?
Sorry for having many questions , but I am really confused about this.
I just want to add one point to the already posted answers. Please take my remarks with a grain of salt, since all the web applications I have ever built have run on one server only (aside from applications deployed to Heroku, which may "distribute" your application for you).
If you feel that you may need to distribute your application for scalability, the first thing you should think about is not web services and multithreading and message queues and Enterprise JavaBeans and...
The first thing to think about is your application domain itself and what the application will be doing. Where will the CPU-intensive parts be? What dependencies are there between those parts? Do the parts of the system naturally break down into parallel processes? If not, can you redesign the system to make it so? IMPORTANT: what data needs to be shared between threads/processes (whether they are running on the same or different machines)?
The ideal situation is where each parallel thread/process/server can get its own chunk of data and work on it without any need for sharing. Even better is if certain parts of the system can be made stateless -- stateless code is infinitely parallelizable (easily and naturally). The more frequent and fine-grained data sharing between parallel processes is, the less scalable the application will be. In extreme cases, you may not even get any performance increase from distributing the application. (You can see this with multithreaded code -- if your threads constantly contend for the same lock(s), your program may even be slower with multiple threads+CPUs than with one thread+CPU.)
The conceptual breakdown of the work to be done is more important than what tools or techniques you actually use to distribute the application. If your conceptual breakdown is good, it will be much easier to distribute the application later if you start with just one server.
The term "distributed application" means that parts of the application system will execute on different computational nodes (which may be different CPU/cores on different machines or among multiple CPU/cores on the same machine).
There are many different technological solutions to the question of how the system could be constructed. Since you were asking about Java technologies, you could, for example, build the web application using Google's Web Toolkit, which will give you a rich browser based client user experience. For the server deployed parts of your system, you could start out using simple servlets running in a servlet container such as Tomcat. Your servlets will be called from the browser using HTTP based remote procedure calls.
Later if you run into scalability problems you can start to migrate parts of the business logic to EJB3 components that themselves can ultimately deployed on many computational nodes within the context of an application server, like Glassfish, for example. I don think you don't need to tackle this problem until you run it to it. It is hard to say whether you will without know more about the nature of the tasks the customer will be performing.
To answer your first question - you could get the form to submit directly to the rest urls. Obviously it depends exactly on your requirements.
As #AlexD mentioned in the comments above, you don't always need to distribute an application, however if you wish to do so, you should probably consider looking at JMS, which is a messaging API, which can allow you to run almost any number of worker application machines, readying messages from the message queue and processing them.
If you wanted to produce a dynamically distributed application, to run on say, multiple low-resourced VMs (such as Amazon EC2 Micro instances) or physical hardware, that can be added and removed at will to cope with demand, then you might wish to consider integrating it with Project Shoal, which is a Java framework that allows for clustering of application nodes, and having them appear/disappear at any time. Project Shoal uses JXTA and JGroups as the underlying communication protocol.
Another route could be to distribute your application using EJBs running on an application server.

Java Web Application for 5000~ Users

For the first time (hopefully not the last) in my life I will be developing an application that will have to handle a high number of users (around 5000) and manage lots of data. I have developed an application that manages lots of data (around 100~ GB of data, not so much by many of your standards), but the user count was pretty low (around 50).
Here is the list of tools / frameworks I think I will be using:
Vaadin UI framework
Hibernate
PostgreSQL
Apache Tomcat
Memcached (for session handling)
The application will mainly be run inside a company network. It might be run on a cluster of servers or not, depends on how much money the company wants to spend to make its life easier.
So what do you think of my choices and what should I take caution of?
Cheers
The answer, as with all performance/scaling related issues is: it depends.
There is nothing in your frameworks of choice that would lead me to think it wouldn't be able to handle a large amount of users. But without knowing what exactly you want to do or what your budget is, it's impossible to pick a technology.
To ensure that your application will scale/perform, I would consider the following:
Keep the memory footprint of each session low. For example, caching stuff in the HttpSession may work when you have 50, but not a good idea when you have 5000 sessions.
Do as much work as you can in the database to reduce the amount of data that is being moved around (e.g. when looking at tables with lots of rows, ensure that you've got paging that is done in the database (rather than getting 10,000 rows back to Tomcat and then picking the first 10...)
Try to minimise the state that has to be kept inside the HttpSession, this makes it easier to cluster.
Probably the most important recommendations:
Use load testing tools to simulate your peak load and beyond and test. JMeter is the tool I use for performance/load-testing.
When load testing, ensure:
That you actually use 5000 users (so that 5000 HttpSessions are created) and use a wide range of data (to avoid always hitting the cache).
EDIT:
I don't think 5000 users is not that much and you may find that (performance-wise) you only need a single server (depends on the size of the server and the results of the load testing, of course, and you may consider a clustered solution for failovers anyway...) for handling the load (i.e. not every one of your 5000 users will be hitting a button concurrently, you'll find the load going up in the morning (i.e. everyone logs in).
You might want to consider an Apache HTTP server in front of your Tomcat servers. Apache will provide: compression, static caching, load-balancing and SSL.
Any reason for not using Spring? It has really became an de-facto standard in the enterprise java applications.
Spring provides an incredibly powerful and flexible collection of technologies to improve your enterprise Java application development that is used by millions of developers.
Spring is lightweight and can stay as a middle layer, connecting the vaadin and hibernate, there by creating a clean separation of layers. The spring transaction management is also superior to the one on hibernate. I will suggest you go for it until you have a strong reason stopping you.
Since you asked people to weigh in, I won't hold back my opinion. ORMs in general, and Hibernate in particular, are an anti-pattern. I know, I've worked in shops that use Hibernate over the past 9 years. Knowing what I know now, I will never use it again.
I highly recommend this blog post, as it puts it more succinctly than I can:
ORM is an anti-pattern
But forgive me if I quote the bit from that blog about ORMs and anti-patterns:
The reason I call ORM an anti-pattern is because it matches the two
criteria the author of AntiPatterns used to distinguish anti-patterns
from mere bad habits, specifically:
It initially appears to be beneficial, but in the long term has more
bad consequences than good ones
An alternative solution exists that is proven and repeatable
Your other technology choices seem fine. Personally, I lean more toward Jetty than Tomcat. There's a reason that Google embeds it in a lot of their projects (think GWT and PlayN); it's a younger codebase and I think more actively developed now that Eclipse has taken it over. Just my humble opinion.
[UPDATE] One more link, very long read but when making architectural decisions, reading is good.
Object/Relational Mapping: The Vietnam of Computer Science
I recommend Glassfish for application server because Apache Tomcat can serve simple content. And Glassfish has full implementation of the Java EE specification.
Depending on you specification and future goals I would perhaps leave the normal version of tomcat and go for Apache TomEE or my personal preference of jBoss. As I understand it EJB's are not very well supported in the normal tomcat version and that is probalby something sweet to have when you want to create a couple of services, some clustered singleton service and other stuff. But this is just my personal pref of course and if your specification will not allow a more advanced EE server then you should stick with the slick tomcat.

SOLR, independent on a Jetty server or as a webapp in my existing Tomcat?

I have an eCommerce site and I wanted to implement search in it. After reading a lot about Lucene and SOLR, I finally choose SOLR as it adds functionality like JSON API facets, and a lot more.
SOLR comes with a builtin Jetty server, running in background, and my webapp is running on Tomcat server. I wanted to know what would be better for me in long run, performance wise and ease of customization and use, whether to leave SOLR as standalone on a different Jetty server or to integrate SOLR within Tomcat, by listing in JAVA_OPTS in catalina.bat?
I personally feel putting SOLR in Tomcat will reduce my performance as it will take more time to load, and I don't want SOLR to restart every time I redeploy my webapp, but then it being all together at one place maybe a plus point (not sure). I am looking forward for some opinion from guys who have been using SOLR, as to what would be the best for me, the data set is huge and also attract thousands of users every day.
Having it all on a single Tomcat instance will make administration easier. You can easily redeploy your webapp independently of Solr, Tomcat is designed to host multiple applications. If you have to put both Solr and your web app in a single box, I'd avoid having two web servers unless you have a real, measurable, compelling reason.
Mauricio's answer is a good one.
The counter-point to the operational simplicity of sharing the same Tomcat is the potential safety from isolating Solr into its own entire JVM and heap. That way you don't have to worry about the case of some memory leak or query of death or gigantic GC pause in Solr taking down your main application.
At Websolr, we run Solr in Tomcat, because it's what we know. However, if we implemented a design where we were to run more than one Java service on the same instance type, we would give serious consideration to switching to multiple Jettys for the increased isolation.
Weigh the pros and cons: Operational simplicity versus total isolation. Your mileage may vary. If this kind of isolation is not compelling, stick to hosting within Tomcat. If anything about that makes you nervous, set up monit and munin to keep an eye on everything.
And don't restart all of Tomcat just to reload one of its web applications.
Considering security, you may want your solr server to be isolated and to be accessed only from your webapp.
In this case, it would be better to host to a different instance not binded on a public address.

Categories