Java Web Application for 5000~ Users - java

For the first time (hopefully not the last) in my life I will be developing an application that will have to handle a high number of users (around 5000) and manage lots of data. I have developed an application that manages lots of data (around 100~ GB of data, not so much by many of your standards), but the user count was pretty low (around 50).
Here is the list of tools / frameworks I think I will be using:
Vaadin UI framework
Hibernate
PostgreSQL
Apache Tomcat
Memcached (for session handling)
The application will mainly be run inside a company network. It might be run on a cluster of servers or not, depends on how much money the company wants to spend to make its life easier.
So what do you think of my choices and what should I take caution of?
Cheers

The answer, as with all performance/scaling related issues is: it depends.
There is nothing in your frameworks of choice that would lead me to think it wouldn't be able to handle a large amount of users. But without knowing what exactly you want to do or what your budget is, it's impossible to pick a technology.
To ensure that your application will scale/perform, I would consider the following:
Keep the memory footprint of each session low. For example, caching stuff in the HttpSession may work when you have 50, but not a good idea when you have 5000 sessions.
Do as much work as you can in the database to reduce the amount of data that is being moved around (e.g. when looking at tables with lots of rows, ensure that you've got paging that is done in the database (rather than getting 10,000 rows back to Tomcat and then picking the first 10...)
Try to minimise the state that has to be kept inside the HttpSession, this makes it easier to cluster.
Probably the most important recommendations:
Use load testing tools to simulate your peak load and beyond and test. JMeter is the tool I use for performance/load-testing.
When load testing, ensure:
That you actually use 5000 users (so that 5000 HttpSessions are created) and use a wide range of data (to avoid always hitting the cache).
EDIT:
I don't think 5000 users is not that much and you may find that (performance-wise) you only need a single server (depends on the size of the server and the results of the load testing, of course, and you may consider a clustered solution for failovers anyway...) for handling the load (i.e. not every one of your 5000 users will be hitting a button concurrently, you'll find the load going up in the morning (i.e. everyone logs in).

You might want to consider an Apache HTTP server in front of your Tomcat servers. Apache will provide: compression, static caching, load-balancing and SSL.

Any reason for not using Spring? It has really became an de-facto standard in the enterprise java applications.
Spring provides an incredibly powerful and flexible collection of technologies to improve your enterprise Java application development that is used by millions of developers.
Spring is lightweight and can stay as a middle layer, connecting the vaadin and hibernate, there by creating a clean separation of layers. The spring transaction management is also superior to the one on hibernate. I will suggest you go for it until you have a strong reason stopping you.

Since you asked people to weigh in, I won't hold back my opinion. ORMs in general, and Hibernate in particular, are an anti-pattern. I know, I've worked in shops that use Hibernate over the past 9 years. Knowing what I know now, I will never use it again.
I highly recommend this blog post, as it puts it more succinctly than I can:
ORM is an anti-pattern
But forgive me if I quote the bit from that blog about ORMs and anti-patterns:
The reason I call ORM an anti-pattern is because it matches the two
criteria the author of AntiPatterns used to distinguish anti-patterns
from mere bad habits, specifically:
It initially appears to be beneficial, but in the long term has more
bad consequences than good ones
An alternative solution exists that is proven and repeatable
Your other technology choices seem fine. Personally, I lean more toward Jetty than Tomcat. There's a reason that Google embeds it in a lot of their projects (think GWT and PlayN); it's a younger codebase and I think more actively developed now that Eclipse has taken it over. Just my humble opinion.
[UPDATE] One more link, very long read but when making architectural decisions, reading is good.
Object/Relational Mapping: The Vietnam of Computer Science

I recommend Glassfish for application server because Apache Tomcat can serve simple content. And Glassfish has full implementation of the Java EE specification.

Depending on you specification and future goals I would perhaps leave the normal version of tomcat and go for Apache TomEE or my personal preference of jBoss. As I understand it EJB's are not very well supported in the normal tomcat version and that is probalby something sweet to have when you want to create a couple of services, some clustered singleton service and other stuff. But this is just my personal pref of course and if your specification will not allow a more advanced EE server then you should stick with the slick tomcat.

Related

Is it worth changing from java/spring/hibernate to rails for a program that is undergoing massive changes?

I have a project whose core domain is dramatically changing. It's possible to use 50% of the core functionality from this site and just add the 50% new functionality, but I am starting to consider that maybe it might be faster to simply redo the product in Rails. Development speed is very important.
There are some things I really like about java - the performance and scalability are very good. I am not a crappy Java developer, so my apps tend to run very well - better than the Rails sites I've seen. I've always accepted the idea that people probably just throw a little more money at the problem when it comes to using Rails, which probably works itself out in the end because of the insane productivity benefits.
I am actually quite agile with Java. I know it will still take me longer to add a basic entity to the system, but I am quick at it and I don't mind it that much. At least it's easy and straight-forward to do.
What I do mind is:
having to start/stop the server just to fix a route, lazy load exception, controller is going to wrong view, etc.
putting up with the fact that unit/integration tests sometimes have different results than the production environment (because annotations on controllers can't be tested, or lazy-loading exceptions occur during asynchronous service calls, or things like that). Knowing if your Jackson is marshaling your data properly is another Tomcat-only thing because it's handled by Spring. There are lots of things that go wrong after you have tested all that you can, and this frankly annoys the crap out of me.
putting up with the occasional maven/classloader problem that doesn't rear its ugly head until you deploy into tomcat. It gives the false impression that everything is "a-okay" when you are in your IDE.
having to put more effort to do database migrations than the ruby people ever have to.
putting up with framework bugs in Spring that block (it's happened about 5 times on this project since 2009) or Hibernate. I also don't like upgrading Spring Security and having them constantly change the configuration, apis and tag libraries over and over again. This is annoying.
wasting so much time uploading 58 MB war files to the server! These take me 12 minutes to upload whenever I need to deploy changes. If I forgot to do 'mvn clean' before I upload, Spring might complain that 2 beans exist with the same name because I moved one to a new package... and then I have to re-upload the whole stupid war file again. Why isn't "clean" run by default whenever you do 'mvn package' for?!?! Sometimes these frameworks and tools use the stupidest default settings. This is just so common in the Java world.
Having to spend hour(s) to figure out where a framework wants to plug-in your own custom implementation for something. This is very annoying. You can spend 2 hours sifting through Google and crappy documentation trying to figure out how to override Spring Security's authentication mechanism for example... and then spend only 5 minutes writing the actual implementation. Of course, they wrote paragraphs upon paragraphs explaining the architecture and how awesome it is, but nobody cares. For something so common, why not just give example source code and be done with it?
Waiting 10-15 seconds for Spring to start up whenever you want to run your integration tests. This is a drag.
There are a few things I like about Java though. Role-based access is very easy to do with Spring Security. Authentication is never that big of a gain, but I like the implementation inside of Spring.
I also like Spring's form-backing objects and #ModelAttribute. These are huge wins when it comes to controllers, and I don't know if Rails can do these things. I honestly never liked passing request parameters around in every action - Spring MVC is actually a lot easier to use when it comes to this common bloat.
Being able to cache really massive structures in memory and have them stay in memory when you start the application is also highly desirable, especially for this application actually. I have an in-memory thesaurus and grammar checker that needs to get called hundreds of times per request, so in memory is pretty much the fastest option for me.
Even still, I think I could rebuild what I have in 2-3 weeks, and then add all of the new features in a few weeks using rails.
On the bright side, all of the really well-designed css, html and javascript could be ported over with very little problems.
I'd appreciate some advice on the subject before I continue.
PS: I could also go to Spring-ROO... but that would also be a considerable rework. I was never using JPA - I was using Hibernate directly. I am also not using JSP's - I am using Freemarker.
It takes more time to get good at Ruby, and Rails. I worked as an independent contractor as Spring and Hibernate expert myself, but I felt strangled by java and it's web frameworks so I decided to learn Ruby on Rails.
I would advice you to learn Ruby, from what I read you would probably master it, although get pretty frustrated with the very different way the use the ORM. I had issues with it, used to working on aggregate roots in Hibernate to the ActiveRecord one class one table kind of pattern. But hey, you could easily try out MongoDB to have some real fun.
Ruby is
less code
it's fast and scalable (slower than java on the specific tasks, but you get rid of stacks of layers.)
the problems are more often; which gem should I use. Luxorious!
a unique, big, sharing and caring open source community
nice frameworks, as Rails and Sinatra
powerful.
fun!
Would I advice you to do the project you describe in Ruby.
NO.
Not if speed of development matters. You will be slower, trust me. There's a lot to learn, it's conventions are not familiar to a java programmer and when you get stuck, lots of hours fly by.
The best option would be to hire a senior ruby developer to pair up with you and teach you. Be a good apprentice and you'll learn fast. Faster than me, I had to learn most by myself, which is really inefficient.
Good luck!
Check out Playframework. Its fun to develop, and you can use your Java experience to develop features way quicker (given than you have 2 weeks) than any other Java-based frameworks out there.
You do not have to start/stop a server. You fix the code in Eclipse and hit refresh on the browser. No dealing with WAR files till you have to actually deploy in production. Do everything from within Eclipse. Easily perform TDD process if thats what you want as you develop code. From an architecture standpoint, it is a fully stateless, RESTful framework from the get-go. Fully JPA compliant (even for NoSQL like Mongo), so you will not have to write complex JDBC code. On the front-end, it has a full featured templating engine, using Groovy as a templating language.
I can go on and on, but I'd recommend going through the site and take a look.
You should take a look at Grails.
You can continue to leverage a lot of your Java code but use a scripting language (Groovy) and many of the paradigms of Rails. E.g. lots of time saved by using convention rather than configuration.
Grails is used by some pretty big web sites E.g. BSkyB the UK satellite broadcaster.
It doesn't really help with some of the startup speed aspects. If you really prize development speed that much - get a faster machine or buy an SSD and fit in your machine. If you work for a big company - sell it to your manager as the cheaper option (E.g. buy a $2000 machine rather than spend 3 weeks rewriting something to save 10 minutes a day).
Java will scale better in the long run than Rails. The Hotspot technology in the JVM is one of the wonders of modern technology.
Also worth checking out is Tapestry5. It allows you to make code changes on the fly (no server restart required) and is easily the fastest & leanest framework to develop with in Java I've used.
I would still give Spring Roo a shot, it will the same rework as with Ruby on Rails or Grails or even less, but you will still stay with something that you are familiar with, which is often the biggest consideration
It has the scaffolding concepts of Ruby on Rails and Grails, but it gives you zero lock in code, just simple, well written (massive use of AOP is matter of taste though) of Spring + Hibernate / JPA (I think you can use Freemarker for the views, Roo has a miriad of plugins, but I'm not 100% sure)

scalability and performance in java web applications

Let's say you want to build a web application with high scalability (over 10,000 simultanious users). How do you guarantee good and steady performance? What design patterns are recommendable? What are most frequent mistakes?
Are there frameworks that force yourself to write scalable code? Would you maybe consider php as frontend and Java as backend technology? Or is let's say JSF reasonable as well and it's all about your architecture? And how good is developing with Grails in that context?
Hope this thread is not too subjective but I like to gather some experiences of you :-)
If you want to build a highly scalable application then it should be stateless and use shared nothing architecture as much as possible. If you share nothing between nodes and a node dont have a state then synchronization is minimum. There are several good web frameworks suitable for your requirements (Play Framework and Lift for Java, Django for Python, Ruby on Rails for Ruby).
As for JSF and related technologies, I dont think it would be wise to use them in your case. A good old request-responce is better.
If you want your application to scale nicely and perform well then you need to have a Distributed Cache. Distributed cache can incredibly boost up application performance and for this purpose you can use any third party distributed cache like NCache.
With so many simultaneous users (a situation I confess I've never encountered myself), what I think is the most important is to be able to load-balance your charge across many web servers.
If you want failover (which is probably a must-have), this means that you must be very careful about state : the more state you have, the more memory you need, and the more difficult it is to handle failover between servers : either you need to persist the session state in a location that is common to all the servers, or you need to replicate the state across servers.
So, I would choose an architecture where you don't need too much state on the server. IMHO, an action based framework is more suited to this kind of architecture than a component-based one, unless the state is handled at client side, with rich JavaScript components.

JavaEE Application Server or Lightweight Container?

Let me preface this by saying this is not an actual situation of mine but I'm asking this question more for my own knowledge and to get other people's inputs here.
I've used both Spring and EJB3/JBoss, and for the smaller types of applications I've built, Spring (+Tomcat when needed) has been much simpler to use. However, when scaling up to larger applications that require things like load balancing and clustering, is Spring still a viable solution? Or is it time to turn to a solution like EJB3/JBoss when you start to get big enough to need that? I'm not sure if I've scoped the problem well enough to get a good answer, so please let me know.
Thanks,
Jeff
Tomcat can be clustered.
Load balancing is usually a hardware solution (e.g., a BigIP or Cisco ACE) that's independent of app server.
Spring can be enterprise, just like EJB. There's no dividing line that says Spring can't handle it.
I could say that in our project which quite large (~500K LOC) we've got rid of JBoss in favour of Spring/Tomcat for a performance sake.
One of J2EE Application container (and JBoss, as an implementation) key features is a possibility of transparent distributed transactions among different kind of transactional resources. That is great idea and it simplifies coordination of, let's say, JMS messenging and database operations a lot. But when it comes to a necessity of high throughout it becomes a problem. Unfortunately, distributed transactions are notable for its slow speed.
It isn't an easy task to migrate from JBoss to Spring, however it's possible and Spring/Tomcat could be considered, with quite rare exceptions, as fully functional replacement for JBoss.

Any experience using Terracotta open source?

Does anybody have experience using the open source offering from Terracotta as opposed to their enterprise offering? Specifically, I'm interested if it is worth the effort to use terracotta without the enterprise tools to manage your cluster?
Over-simplified usage summary: we're a small startup with limited budget that needs to process millions of records and scale for hundreds-of-thousands of page views per day.
I am in a process of integrating Terracotta with my project (a sensor node network simulator). About three weeks ago I found out about Terracotta from one of my colleagues. And now my application takes advantage of grid computing using Terracotta. Below I summarized some essential points of my experience with Terracotta.
The Terracotta site contains pretty detailed documentation. This article probably a good starting point for a developer Concept and Architecture Guide
When you are stuck with a problem and found no answer in the documentation, the Terracotta community forum is a good place to ask questions. It seems that Terracotta developers check it on a regular basis and pretty responsive.
Even though Terracotta is running under JVM and it is advertised that it is only a matter of configuration to make you application running in a cluster, you should be ready that it may require to introduce some serious changes in you application to make it perform reasonably well. E.g. I had to completely rewrite synchronization logic of my application.
Good integration with Eclipse.
Admin Console is a great tool and it helped me a lot in tweaking my application to perform decently under Terracotta. It collects all performance metrics from servers and clients you can only think of. It certainly has some GUI related issues, but who does not :-)
Prefer standard Java Synchronization primitives (synchronized/wait/notify) over java.util.concurrent.* citizens. I found that standard primitives provide higher flexibility (can be configured to be a read or write cluster lock or even not-a-lock at all), easier to track in the Admin Console (you see the class name of the object being locked rather then e.g. some ReentrantLock).
Hope that helps.
At the moment, the Terracotta enterprise tools provide only a few features beyond the open source version around things like visualization and management (like the ability to kick a client out of the cluster). That will continue to diverge and the enterprise tools are likely to boast more operator-level functionality around things like managing and monitoring, but you can certainly manage and tune an app even with the open source tools.
The enterprise license also gives you things like support, indemnification, etc which may or may not be as important to you as the tooling.
I would urge you to try it for yourself. If you'd like to see an example of a real app using Terracotta, you should check out this reference web app that was just released:
The Examinator
You may want to take a look at JBossCache/PojoCache which is an in-memory distributed caching solution. The difference is it uses a simple API to propagate objects across your 'cluster' of caches, where as Terracotta works at the classloading/jvm level.
(They don't actually have their own JVM, but they modify classes as they are loaded to allow them to be 'clusterable')
Our company had a lot of luck with JBossCache, I'd recommend checking it out.
Update
What I see in the OP message is "well, I don't really know what we need (thus the lack of detailed requirements), but may be some enterprizey tool will magically solve all our problems, known and unforeseen? That would be awesome!"
With an architectural approach like this it's not gonna fly. No success stories from Teracotta would change that.
OSS is beneficial when the community around it can replace the commercial support. Suppose the guy have a problem in production. Community cannot help -- it's too small for the obscure product like this. Servers are down, business is in danger. You see? You need a commercial license up-front. No money? Well, then you're not a business, and probably not gonna become one (if nobody's willing to invest into it).
Sorry for interrupting your day-dreaming.
IMHO:
Terracotta is a clustering solution. Clustering is required for large, enterprise-grade applications. Large applications mean big budgets. Big budgets mean you can afford commercial license from Terracotta.
To put it in another way: if you don't have budget to buy it, it's probably not beneficial for your project.

MVC framework for huge Java EE application

Which MVC-framework is the best option (performance/ease of development) for a web application, that will have + 2 million visits per week.
Basically the site is a search engine,but also there will be large amounts of XML parsing, and high db traffic.
We are using Java, over Jboss 4.2.3x, with PG as DB, and Solr for the searches.
We were thinking on code JSPs with taglibs, and Servlets, but we were feeling like there would be a better alternative, which don't know yet, as we are starting on the Java Web applications world.
Any opinions, and shares of your experience will be appreciated!
Thanks in advance!
I think you really need to sit down with the options, and assess each one (or combination thereof).
Some possible framewords that you might use (that come to mind, beyond plain old JSPs with Servlets) are:
Struts and Tiles
Spring
Hibernate
Roll your own framework (often worthwhile for large projects, but only if you know what you need which is unlikely if you haven't done web apps before)
Grails (Groovy on Rails, but it runs on the JVM and can use Java libs)
and many more I'm sure...
Do you want to reinvent the wheel?
What client-side frameworks will you also want to use?
Spring MVC may be your best choice. While it is easy to use and integrate with the rest of the Java EE stack, it allows a huge level of customization, and last but not the least, it is really fast because there's little overhead. I highly recommend it.

Categories