scalability and performance in java web applications

scalability and performance in java web applications - java

Let's say you want to build a web application with high scalability (over 10,000 simultanious users). How do you guarantee good and steady performance? What design patterns are recommendable? What are most frequent mistakes?
Are there frameworks that force yourself to write scalable code? Would you maybe consider php as frontend and Java as backend technology? Or is let's say JSF reasonable as well and it's all about your architecture? And how good is developing with Grails in that context?
Hope this thread is not too subjective but I like to gather some experiences of you :-)

If you want to build a highly scalable application then it should be stateless and use shared nothing architecture as much as possible. If you share nothing between nodes and a node dont have a state then synchronization is minimum. There are several good web frameworks suitable for your requirements (Play Framework and Lift for Java, Django for Python, Ruby on Rails for Ruby).
As for JSF and related technologies, I dont think it would be wise to use them in your case. A good old request-responce is better.

If you want your application to scale nicely and perform well then you need to have a Distributed Cache. Distributed cache can incredibly boost up application performance and for this purpose you can use any third party distributed cache like NCache.

With so many simultaneous users (a situation I confess I've never encountered myself), what I think is the most important is to be able to load-balance your charge across many web servers.
If you want failover (which is probably a must-have), this means that you must be very careful about state : the more state you have, the more memory you need, and the more difficult it is to handle failover between servers : either you need to persist the session state in a location that is common to all the servers, or you need to replicate the state across servers.
So, I would choose an architecture where you don't need too much state on the server. IMHO, an action based framework is more suited to this kind of architecture than a component-based one, unless the state is handled at client side, with rich JavaScript components.

Related

Java Web Application for 5000~ Users

For the first time (hopefully not the last) in my life I will be developing an application that will have to handle a high number of users (around 5000) and manage lots of data. I have developed an application that manages lots of data (around 100~ GB of data, not so much by many of your standards), but the user count was pretty low (around 50).
Here is the list of tools / frameworks I think I will be using:
Vaadin UI framework
Hibernate
PostgreSQL
Apache Tomcat
Memcached (for session handling)
The application will mainly be run inside a company network. It might be run on a cluster of servers or not, depends on how much money the company wants to spend to make its life easier.
So what do you think of my choices and what should I take caution of?
Cheers

The answer, as with all performance/scaling related issues is: it depends.
There is nothing in your frameworks of choice that would lead me to think it wouldn't be able to handle a large amount of users. But without knowing what exactly you want to do or what your budget is, it's impossible to pick a technology.
To ensure that your application will scale/perform, I would consider the following:
Keep the memory footprint of each session low. For example, caching stuff in the HttpSession may work when you have 50, but not a good idea when you have 5000 sessions.
Do as much work as you can in the database to reduce the amount of data that is being moved around (e.g. when looking at tables with lots of rows, ensure that you've got paging that is done in the database (rather than getting 10,000 rows back to Tomcat and then picking the first 10...)
Try to minimise the state that has to be kept inside the HttpSession, this makes it easier to cluster.
Probably the most important recommendations:
Use load testing tools to simulate your peak load and beyond and test. JMeter is the tool I use for performance/load-testing.
When load testing, ensure:
That you actually use 5000 users (so that 5000 HttpSessions are created) and use a wide range of data (to avoid always hitting the cache).
EDIT:
I don't think 5000 users is not that much and you may find that (performance-wise) you only need a single server (depends on the size of the server and the results of the load testing, of course, and you may consider a clustered solution for failovers anyway...) for handling the load (i.e. not every one of your 5000 users will be hitting a button concurrently, you'll find the load going up in the morning (i.e. everyone logs in).

You might want to consider an Apache HTTP server in front of your Tomcat servers. Apache will provide: compression, static caching, load-balancing and SSL.

Any reason for not using Spring? It has really became an de-facto standard in the enterprise java applications.
Spring provides an incredibly powerful and flexible collection of technologies to improve your enterprise Java application development that is used by millions of developers.
Spring is lightweight and can stay as a middle layer, connecting the vaadin and hibernate, there by creating a clean separation of layers. The spring transaction management is also superior to the one on hibernate. I will suggest you go for it until you have a strong reason stopping you.

Since you asked people to weigh in, I won't hold back my opinion. ORMs in general, and Hibernate in particular, are an anti-pattern. I know, I've worked in shops that use Hibernate over the past 9 years. Knowing what I know now, I will never use it again.
I highly recommend this blog post, as it puts it more succinctly than I can:
ORM is an anti-pattern
But forgive me if I quote the bit from that blog about ORMs and anti-patterns:
The reason I call ORM an anti-pattern is because it matches the two
criteria the author of AntiPatterns used to distinguish anti-patterns
from mere bad habits, specifically:
It initially appears to be beneficial, but in the long term has more
bad consequences than good ones
An alternative solution exists that is proven and repeatable
Your other technology choices seem fine. Personally, I lean more toward Jetty than Tomcat. There's a reason that Google embeds it in a lot of their projects (think GWT and PlayN); it's a younger codebase and I think more actively developed now that Eclipse has taken it over. Just my humble opinion.
[UPDATE] One more link, very long read but when making architectural decisions, reading is good.
Object/Relational Mapping: The Vietnam of Computer Science

I recommend Glassfish for application server because Apache Tomcat can serve simple content. And Glassfish has full implementation of the Java EE specification.

Depending on you specification and future goals I would perhaps leave the normal version of tomcat and go for Apache TomEE or my personal preference of jBoss. As I understand it EJB's are not very well supported in the normal tomcat version and that is probalby something sweet to have when you want to create a couple of services, some clustered singleton service and other stuff. But this is just my personal pref of course and if your specification will not allow a more advanced EE server then you should stick with the slick tomcat.

Designing a Scalable Application

Starting a new project an e-commerce, that basically consists of two, main separated projects: a core application and a web client.
Core app would act as a service provider, the back-end for the web client (or other clients), exposing all its functionality in REST-ful web service/JSON.
Web client would act as a front-end and a service consumer for the core app.
Both project are mainly based on: Spring, Apache CXF, Maven, and either Tomcat or Jetty.
Git as VCS. CouchDB as the DB. Also a distributed caching system like Memcached.
The principle is to design the project (both core and web) in a way to be scalable and distributable on several nodes on the internet.
Perhaps the subject is too big and complex to discuss in one topic; the idea is to put the main headlines that would assist on making the right decisions before going on with the implementation.
The questions:
Based on the technology stack above, what could possibly be missing or worth adding?
Any books, materials or case studies recommendation that touch on the subject?

On the server side, you should structure your URLs and application state in such a way that they can either be generated statically and served via a web server like apache, or dynamically generated and served by the app server. Static content generation can be a pretty effective, reasonably straightforward way to improve performance. Having your API points to be implementation agnostic is generally a good practice anyways.
High Performance Websites has some great stuff. Also, check out Squid for HTTP caching.

Take a look at CQRS principles. Even though great scalability is only side effect of applying CQRS, it ideally fits for cloud computing which main goal is to provide elastic scalability. Here is a awesome video from Greg Young's class.
PS. Despite most of materials are based on .NET stack, these principles are cross-platform.

#Ellead:Go through Tomcat clustering (session replication) :http://tomcat.apache.org/tomcat-6.0-doc/cluster-howto.html
Just take care of singleton objects in spring (keep in mind singleton is per JVM)

You did not mentioned performance and load testing - you should measure performance of your application and scalability too:
How is a single node coping wih high load ?
What performance gainyou get by adding a single node ?
Do no expect or guess anything and start as soon as possible - that alone will save you fortune.
For REST-based solution look at Apache httpd and mod_jk for load-balancing.

Which is scalable? Simple CRUD Webapp vs Webapp talking to a REST service

I think the title says it clearly. I am no scalability guru. I am on the verge of creating a web application which needs to scale to large data sets and possibly many (wont exaggerate here, lets say thousands of) concurrent users.
MongoDB is the data repository and i am torn between writing a simple Play! webapp talking to MongoDB versus Play! app talking to a REST service app (in Scala) which does the heavy lifting of all business logic and persistence.
Part of me thinks that having the business logic wrapped as a service is future proof and allows deploying just the webapp in multiple nodes (scaling). I come from Java EE stack and Play! is a rebel in java web frameworks. This approach assures me that i can move away from Play! if needed.
Part of me also thinks that Play! app + Scala service app is additional complexity and mayn't be fruitful in the long run.
Any suggestions are appreciated.
NOTE: I am a newbie to Scala, MongoDB and Play!. Pardon me if my question was silly.

Scalability is an engineering art. Which means that you have lots of parameters and apply your experience to specific values of these parameters to come to a solution. So general advice, without more specific data about your problem, is hard.
Having said that, from experience, some general advice:
Keep your application as clean and simple as possible. This allows you to keep your options open. In your case, start with a simple Play app. Concentrate on clean code so you can easily rework what you have into a different architectural model (with clean code, that's simpler than you'd think :-))
Measure, not guess, where the bottlenecks are. It's simple enough to flood a server with requests. Use profiling, memory dumps, whatever, to pinpoint your bottleneck in scalability.
Only then, with a working app in hand (which you could launch early with) and data on where your scaling bottlenecks are, you can make decisions on what to split off in (horizontally scalable) services.
On the outset, services look nice and scalable, but they often get you in an early mess - services need to communicate with each other, so you start introducing messaging, etcetera. Keep it simple, measure, optimize.

The question does not appear to be silly . For me encapsulating your data access behind the rest layer does not directly improve the scalability of the application.(significantly, ofcourse, there is the server that can perform http caching and handle request queues etc.., but from your description, your application looks small enough). You can achieve similar scalability without the Rest layer. But having said that, the Service layer could have a indirect impact.
First it makes your application cleaner. (UI Talking to db is messy.). It helps make the application maintainable. (Multi folds). Rest layer could provide you with middle tier that you may need in your application. Also a correctly designed Rest Layer will have to be Resource Driven . In my experience a Resource Driven Architecture is a good middle ground between ease of implementation and highly scalable design.
So I strongly suggest that you use the Service layer (Rest is the way to go :) ), but scalability in itself cannot justify the decision.

Putting the service between the UI and data source encapsulates the data source, so the UI need not know the details of how data is persisted. It also prevents the UI from reaching directly into the data source. This allows the service to authenticate, authorize, validate, bind, and perform biz logic as needed.
The downside is a slight speed bump for the app.
I'd say that adding the service has a small cost and a big upside. I'd vote for that.

The answer, as usual, is. It depends.
If there is some heavy-lifting involved and some business logic: Yup, that is best put into its own layer and if you add a RESTful interface to it, you can serve that up to whatever front-end technology you want.
Nowadays, people are often not bothering with having a separate web app layer, but serve the data via AJAX directly to the client.
You might consider adding a layer, if you either need to maintain a lot of user session state or have an opportunity to cache data on the presentation layer. There are more reasons why you would want a presentation layer, for example, serving out different presentations to different devices/clients.
Don't just add layers for complexities sake, though.
I might add that you should try to employ the HATEOAS principle. That will ease things significantly when scaling out the solution.

JavaEE Application Server or Lightweight Container?

Let me preface this by saying this is not an actual situation of mine but I'm asking this question more for my own knowledge and to get other people's inputs here.
I've used both Spring and EJB3/JBoss, and for the smaller types of applications I've built, Spring (+Tomcat when needed) has been much simpler to use. However, when scaling up to larger applications that require things like load balancing and clustering, is Spring still a viable solution? Or is it time to turn to a solution like EJB3/JBoss when you start to get big enough to need that? I'm not sure if I've scoped the problem well enough to get a good answer, so please let me know.
Thanks,
Jeff

Tomcat can be clustered.
Load balancing is usually a hardware solution (e.g., a BigIP or Cisco ACE) that's independent of app server.
Spring can be enterprise, just like EJB. There's no dividing line that says Spring can't handle it.

I could say that in our project which quite large (~500K LOC) we've got rid of JBoss in favour of Spring/Tomcat for a performance sake.
One of J2EE Application container (and JBoss, as an implementation) key features is a possibility of transparent distributed transactions among different kind of transactional resources. That is great idea and it simplifies coordination of, let's say, JMS messenging and database operations a lot. But when it comes to a necessity of high throughout it becomes a problem. Unfortunately, distributed transactions are notable for its slow speed.
It isn't an easy task to migrate from JBoss to Spring, however it's possible and Spring/Tomcat could be considered, with quite rare exceptions, as fully functional replacement for JBoss.

How do I decide between a using a Swing GUI or a light-weight web client for the user front end of my Java application?

I always seem to have this internal struggle when it comes to user interface. I build up an application "engine" and tend to defer user interface to after I get my algorithms working. Then I go back and forth trying to decide how to let a user interact with my program. Personally, I'm a fan of the command line, but I can't expect that of my users generally.
I really like what's possible in the browser in the age of web 2.0 and ajax. On the other hand it's not so hard to make a Swing front-end either, and you can generally count on a more consistent presentation to the user (though using a good javascript framework like YUI or jQuery goes a long way toward normalizing browsers).
Clearly both approaches have their merits and drawbacks. So, what criteria / parameters / situations should lead me to use a lightweight (e.g. web-based) GUI? What criteria / parameters / situations should lead me to use a heavier (e.g. Swing-based) GUI?
It is not my intent to start a flame war, merely interested in the community's constructive/objective opinions.
Edit #1
In light of the first few responses, I would like to clarify that I would like to deploy my application regardless, not host it on some internet server necessarily. So I would have to deploy with a light-weight web-server infrastructure a la Jetty/Tomcat or similar.

It depends on the application and this is essentially a usability driven question (though there are considerations like data storage and platform requirements). Think of the pros and cons.
Pros of a lightweight Web UI:
Ease of distribution
Platform independent
Ease of maintenance
Cons of a lightweight Web UI:
Less environmental control
Markup standards vary between browsers
Requires a web server and everything that goes with it
Pros of an executable UI
More environmental control (i.e.: full screen applications, etc)
Not necessarily subject to latency and outages
Cons of an executable UI
Pushing updates may be more difficult
Requires installation
Potential platform requirements (frameworks, packages, etc)
Potentially requires knowledge of advanced networking topics (web services, etc)

One small factor you may want to consider is that the user will have go through some type of installation (albeit minimal) if you distribute a swing application.
Also a web application will allow you to accurately track the usage of your application (via google analytics or something similar). Not sure if that's a concern but it may be useful to you in the future.

If it is a client-server application I would normally go for a web frontend for the application.
You will save yourself of countless problems with things like installed JRE versions, distributing upgrades, application permissions, disappeared shortcuts...

You need to break the requirements of the application down to decide this...
Do the users have Java of sufficient version installed? It will need to be, to run a Swing GUI.
Do you have a web server?
Do you need the flexibility of a Swing GUI or the accessibility of the web interface?
Is Java Webstart and option, if so, you can distribute a Swing GUI via the web.
Does your application perform extensive calculations or processing? If so, a client app may be the answer.
There are a million questions such as these. I would suggest a brain storming session and keeping track of all the pros and cons of each, adding a point score, than throwing it all away and going with your gut feeling :)

If you anticipate there being frequent updates to the app then web based may be better since the user would not have to update the client or install a new client containing the updates.
If you think that the user may need the ability to use the app while not conencted to the internet then swing would be better.
Just two things off the top of my head.

Think about the users and use cases of your project.
Do users expect to have access to it when they're disconnected from the Internet (for example, on an airplane or in a coffee shop with no Internet access)? Use Swing.
Do you want users to be able to access the same tool from different computers (for example, both at work and at home)? Use a web UI.
Also consider whether the user needs to save and load data, and whether the tool produces data files that some might consider sensitive (if so, storage on the web might be an issue).

Do make a quick guess I often try to ask myself/customers if the application has a high "write" demand.
For a mostly read-only application a thin-client solution is perfectly well suited.
But if a lot write actions are needed then a swing desktop application has more flexibility.
Personally I always prever a swing desktop application. It can easily deployed using Java Webstart.

Not knowing anything about your application I can not give the best recommendation possible. However I can state from personal/professional experience that installing an application on clients machines is a LOT more of a pain in the ass than it seems.
With AJAX/web you really only have to worry about supporting like three browsers. Installation messes/updates are only felt once when you deploy the product to the web server.
With like a stand-along Swing app, you get to deal with the really really big mess that is installing the application onto unknown systems. This mess was so bad that things like AJAX were really pushed along to make web apps behave/feel like a real native app.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.