I currently have an application running with Jetty (version 8.1.3). I would like to create an additional version for a different client environment on the same server.
Is there a risk of memory overhead on the server, or of other issues? The two applications would use the same database.
"Is there a risk of memory overhead on the server?"
From the Jetty standpoint, this is unlikely to be a risk; Jetty generally occupies a very small footprint compared to the applications deployed into it.
From your application standpoint, only you can determine that. You must work out your application's memory needs and what they may scale to in order to make this determination. Establish a high-water mark for your application's memory usage, double it and round up a bit, then decide whether you have both the processing power and the memory to run two instances. Remember your thread requirements as well: double the connection pooling (or are you sharing a server-wide JNDI pool?), consider whether your database will be fine with that, the number of open files allowed on the server, and so on.
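As a minimal sketch of how you might read that heap high-water mark from inside the JVM (plain java.lang.management API; the class name is only illustrative):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

// Sketch: report the peak heap usage the JVM has observed so far.
// Call this (e.g. from a maintenance endpoint or a JMX client) after the
// application has run under representative load for a while.
public class HeapHighWaterMark {
    public static long peakHeapBytes() {
        long peak = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                peak += pool.getPeakUsage().getUsed();
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("Peak heap used so far: "
                + peakHeapBytes() / (1024 * 1024) + " MB");
    }
}
```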
So, long story short, there is no definitive yes-or-no answer available from a site like Stack Overflow; it depends too much on your specific application and the amount of traffic you have. Knowing that information, however, will give you confidence about whether you can do this or not.
Related
I am trying to determine which part of my app code needs a large amount of memory when handling client requests.
For this I am using VisualVM and JConsole on the local development server, but the requests have become pretty complex and it is very hard to track down their memory consumption; I have no idea how to proceed.
One request, from start to finish, usually uses: the Search API, the low-level entity API (datastore access), Java reflection for entity conversion (low-level entities to plain Java objects), and GWT RPC. So there are at least tens or hundreds of classes (of my own code) to look through.
I would like to know:
is it OK to run tests on the local dev server, since the environment is very different from the one in production? I believe it shouldn't be a problem if I know specifically where/how to look for memory.
what tools or patterns can I use to track down the memory used by one request (and then use that to estimate for N clients running simultaneous requests)?
I believe the memory needed has indeed become very large, but I need to know how much of my code I can optimize (or where I have code problems, garbage, libraries, etc.) and at what point I should increase the instance type (to F4) or even switch from the standard to the flexible environment.
Also, if there are Java tools/APIs to programmatically determine memory consumption, please advise!
Thank you.
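(For reference, one JVM-level option for per-request measurement is com.sun.management.ThreadMXBean#getThreadAllocatedBytes. It assumes a HotSpot/OpenJDK runtime and may not be usable inside App Engine standard's sandbox, so treat the sketch below only as an illustration of the idea.)

```java
import java.lang.management.ManagementFactory;
import com.sun.management.ThreadMXBean;

// Sketch: bytes allocated by the current thread while it handles one request.
// Requires a HotSpot/OpenJDK JVM (com.sun.management.ThreadMXBean is not part
// of the standard API) and may be blocked in sandboxed environments.
public class RequestAllocationProbe {
    private static final ThreadMXBean THREADS =
            (ThreadMXBean) ManagementFactory.getThreadMXBean();

    public static long allocatedBytes(Runnable requestHandler) {
        long threadId = Thread.currentThread().getId();
        long before = THREADS.getThreadAllocatedBytes(threadId);
        requestHandler.run();
        return THREADS.getThreadAllocatedBytes(threadId) - before;
    }
}
```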
I am creating a (semi) big data analysis app using Apache Mahout. I am concerned about the fact that with Java I am limited to 4 GB of memory. This 4 GB limitation seems somewhat wasteful of the memory modern computers have at their disposal. As a solution, I am considering something like RMI or some form of MapReduce (I have, as of yet, no experience with either).
First off: is it plausible to have multiple JVMs running on one machine and have them talk to each other? And if so, am I heading in the right direction with the two ideas alluded to above?
Furthermore,
In an attempt to keep this an objective question, I will avoid asking "which is better" and will instead ask:
1) What are the key differences (not necessarily in how they work internally, but in how they would be implemented by me, the user)?
2) Are there drawbacks or benefits to one or the other and are there certain situations where one or the other is used?
3) Is there another alternative that is more specific to my needs?
Thanks in advance
First, regarding the 4 GB limit, check out Understanding max JVM heap size - 32bit vs 64bit. On a 32-bit system, 4 GB is the maximum, but on a 64-bit system the limit is much higher.
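A quick way to confirm what the running JVM will actually allow (the -Xmx value in the comment is only an example):

```java
// Launched with, for example: java -Xmx8g -jar analysis-app.jar on a 64-bit JVM
public class HeapCheck {
    public static void main(String[] args) {
        long maxBytes = Runtime.getRuntime().maxMemory(); // ceiling set by -Xmx (or the platform default)
        System.out.println("Max heap available to this JVM: "
                + maxBytes / (1024 * 1024) + " MB");
    }
}
```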
It is a common configuration to have multiple JVMs running and communicating on the same machine. Two good examples are IBM WebSphere and Oracle's WebLogic application servers. They run the administrative console in one JVM, and it is not unusual to have three or more "working" JVMs under its control.
This allows each JVM to fail independently without impacting overall system responsiveness. Recovery is transparent to the end users because some of the "working" JVMs are still doing their thing while the support team is frantically trying to fix things.
You mentioned both RMI and MapReduce, but in a manner that implies that they fill the same slot in the architecture (communication). I think that it is necessary to point out that they fill different slots - RMI is a communications mechanism, but MapReduce is a workload management strategy. The MapReduce environment as a whole typically depends on having a (any) communication mechanism, but is not one itself.
For the communications layer, some of your choices are RMI, web services, bare sockets, MQ, shared files, and the infamous "sneaker net". To a large extent I recommend shying away from RMI because it is relatively brittle. It works as long as nothing unexpected happens, but in a busy production environment it can present challenges at unexpected times. With that said, there are many stable and performant large-scale systems built around RMI.
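For illustration, the minimal RMI shape for two JVMs on the same machine looks roughly like this (the interface and registry name are hypothetical):

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

// Hypothetical remote interface exposed by a "worker" JVM.
interface AnalysisService extends Remote {
    double[] runJob(double[] input) throws RemoteException;
}

// Client side: another JVM on the same machine looks the service up and calls it.
class AnalysisClient {
    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.getRegistry("localhost", 1099);
        AnalysisService service = (AnalysisService) registry.lookup("analysis");
        double[] result = service.runJob(new double[] {1.0, 2.0, 3.0});
        System.out.println("Received " + result.length + " values back");
    }
}
```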
The direction the world is going this week for cross-tier communication is SOA on top of something like Spring Integration or Fuse. SOA abstracts the mechanics of communication out of the equation, allowing you to hook things up on the fly (more or less).
MapReduce (MR) is a way of organizing batched work. The MR algorithm itself is essentially: turn the input data into a bunch of maps on input, then reduce them to the minimum amount necessary to produce an output. The MR environment is typically governed by a workload manager which receives jobs and parcels out the work in those jobs to its "worker bees" scattered around the network. The communications mechanism may be defined by the MR library, or by the container(s) it runs in.
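To make the map-then-reduce shape concrete, here is the classic word-count example in a single JVM; frameworks like Hadoop run the same two phases but split them across worker processes (this is only a local sketch, not Mahout or Hadoop code):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

// Map phase: split the input into individual words.
// Reduce phase: collapse the words into a count per distinct word.
public class WordCountSketch {
    public static Map<String, Long> count(String text) {
        return Arrays.stream(text.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count("to be or not to be"));
        // e.g. {be=2, not=1, or=1, to=2} (map ordering unspecified)
    }
}
```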
Does this help?
If an application already runs well on a laptop with a local web server and DB, how does that affect hardware sizing for when it is deployed to production?
We're piloting this application for the first time, and up until now the application runs fine off a mid-tier laptop.
I assume any server will be more powerful than a laptop. How should one scale the requirements appropriately?
The main impacts I can see are:
Locality of the DB (it may be installed on a separate server or data centre, causing network issues - no idea if this even impacts CPU or memory specs)
Overhead of an enterprise web container (currently using Jetty, expected to move to Tomcat for support reasons)
We're currently using Windows; the server will most likely run Unix.
Not sure what application details are relevant, but:
- Single-threaded application
- Main function is to host a REST service which computes an algorithm of average complexity. Expecting around 16 requests a second max
- Using Java and PostgreSQL currently
Thanks!
There's no substitute for a few things, including testing on comparable server hardware and knowing the performance model of your stack. Keep in mind your requirements are likely to change over time, and often it is more important to have a flexible approach to hardware (in terms of being able to move pieces of it onto other servers) than a rigid formula for what you need.
You should understand, however, that different parts of your stack have different needs. PostgreSQL usually (not always, but usually) needs fast disk I/O and more processor cores (processor speed is usually less of a factor), while your Java app is likely to benefit from faster cores.
Here are my general rules:
Do some general performance profiling, come to an understanding of where CPU power is being spent, and what is waiting on disk I/O.
Pay close attention to server specs. Just because the server may be more powerful in some respects does not mean your application will perform better on it.
Once you have this done, select your hardware with your load in mind. And then test on that hardware.
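As a crude illustration of "test on that hardware": a throwaway client like the one below can replay roughly the peak load mentioned in the question (~16 requests/second) while you watch CPU, memory and latency. The URL and rate are placeholders; a proper test would use JMeter, Gatling or similar.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Fires about 16 requests/second at a placeholder endpoint and prints latency.
public class SimpleLoadProbe {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
        scheduler.scheduleAtFixedRate(() -> {
            try {
                long start = System.nanoTime();
                HttpURLConnection conn = (HttpURLConnection)
                        new URL("http://test-host:8080/api/compute").openConnection();
                int status = conn.getResponseCode();
                conn.disconnect();
                System.out.printf("HTTP %d in %.1f ms%n",
                        status, (System.nanoTime() - start) / 1e6);
            } catch (Exception e) {
                System.out.println("request failed: " + e);
            }
        }, 0, 1000 / 16, TimeUnit.MILLISECONDS);  // ~62 ms between requests
    }
}
```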
Edit: What we pay attention to at Efficito is the following for CPU:
Cache size
Cores
GFLOPS/core
Other performance metrics
For hard drives, we pay attention to the following:
Speed
Type
RAID setup
General burst and sustained throughput
Keep in mind your database portions are more likely to be I/O-bound, meaning you get more from paying attention to hard drive specs than to CPU specs, while your application code will more likely be CPU-bound, meaning better CPU specs give you better performance. Keep in mind we are doing virtualized hosting of ERP software on PostgreSQL and Apache, so we usually get to balance both sides on the same hardware.
I'm using JSP + Struts2 + Tomcat 6 + Hibernate + MySQL as my J2EE development environment. The first phase of the project has finished and it's up and running on a single server. Due to the growing scale of the website, it's predicted that we're going to face some performance issues in the future.
So we want to distribute the application across several servers. What are my options here?
Before optimizing anything you should detect where your bottleneck is (services, database, ...). If you do not do this, the optimization will be a waste of time and money.
The right optimization then depends on your use case.
For example, if you have a read-only application and the bottleneck is both the Java server and the database, then you can set up two database servers and two Java servers.
Hardware is very important too. Perhaps the easiest option is to upgrade the hardware, but this will only work if the hardware is the bottleneck.
You can use any J2EE application server that supports clustering (e.g. WebLogic, WebSphere, JBoss, Tomcat). You are already using Tomcat, so you may want to use its clustering solution. Note that each offering provides a different level of clustering support, so you should do some research before picking a particular app server (make sure it is the right clustering solution for your needs).
Also, porting code from a standalone to a clustered environment often requires a non-negligible amount of development work. Among many other things, you'll need to make sure that your application doesn't rely on any local files on the file system (this is bad J2EE practice anyway), that state (HTTP sessions or stateful EJBs, if any) gets properly propagated to all nodes in your cluster, etc. As a general rule, the more stateless, the smoother the transition to a clustered environment.
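For instance, session replication (Tomcat's standard mechanism, and most others) requires that everything you put in the HttpSession is Serializable, in addition to marking the webapp <distributable/> in web.xml. A hypothetical example:

```java
import java.io.Serializable;
import javax.servlet.http.HttpSession;

// Hypothetical session attribute: it must implement Serializable so the
// container can replicate it to the other nodes in the cluster.
public class CartItem implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String sku;
    private final int quantity;

    public CartItem(String sku, int quantity) {
        this.sku = sku;
        this.quantity = quantity;
    }

    public static void addToSession(HttpSession session, CartItem item) {
        session.setAttribute("cartItem", item); // replicated across the cluster
    }
}
```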
As you are using Tomcat, I'd recommend taking a look at mod_cluster. But I suggest you consider a full application server, like JBoss AS. Also, make sure to run some performance tests and understand where the bottleneck of your application is. Throwing more application servers at the problem is ineffective if, for instance, the bottleneck is at the database.
Is there any way to determine how many threads a specific JVM + application server combination can handle? The question is not only about web application request-serving threads, but also about background threads.
For all practical purposes this is situational; it really does "depend".
what are the threads doing?
how much memory do they each need?
how much garbage collection is going on?
how much memory have you got?
how many CPUs have you got?
how fast are they?
Java EE app server applications tend not to create threads themselves. Rather, you configure thread pools. I've never yet been in a situation where the ability to create 10 more threads would solve a problem and some app server limitation prevented me from doing so.
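For example, with Jetty (8-era API; other containers expose the same knob through server.xml or an admin console), the request thread pool is configured and bounded explicitly:

```java
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

// Sketch: bound the container's request-serving threads explicitly
// instead of relying on defaults (Jetty 8-style API).
public class BoundedServer {
    public static void main(String[] args) throws Exception {
        QueuedThreadPool pool = new QueuedThreadPool();
        pool.setMinThreads(8);
        pool.setMaxThreads(200); // hard cap on request-serving threads

        Server server = new Server(8080);
        server.setThreadPool(pool); // in Jetty 8 the pool is set on the Server
        server.start();
        server.join();
    }
}
```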
Making performance comparisons between different app servers is very non-trivial, and the answers tend to be brittle - i.e. small changes in the type of work can produce different answers.
Why are you asking the question?
This is really dependent on the particular hardware you are running with (number of CPUs, amount of memory, etc.) and also dependent on the OS (Solaris vs. Windows) since the underlying threading is dependent on the OS-provided thread management. It also depends on the application and app server itself, since the amount of resources each thread consumes is application-dependent.
This really isn't something that you can find out purely from the Java VM. It is more of a hardware/OS limitation than anything specific to the VM. The best way to find the answer is to test with a large number of threads and see where performance starts to drop off. See also this DevX discussion.
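If you just want to find the hard ceiling (as opposed to the point where performance degrades), a throwaway probe like this, run on a test machine only, shows where thread creation fails; the result depends on -Xss, OS ulimits and available memory:

```java
// Keeps starting parked threads until the JVM/OS refuses, then reports the count.
// Typical failure is OutOfMemoryError: unable to create new native thread.
public class ThreadLimitProbe {
    public static void main(String[] args) {
        int count = 0;
        try {
            while (true) {
                Thread t = new Thread(() -> {
                    try {
                        Thread.sleep(Long.MAX_VALUE);
                    } catch (InterruptedException ignored) {
                        // just parking the thread
                    }
                });
                t.setDaemon(true);
                t.start();
                count++;
            }
        } catch (Throwable t) {
            System.out.println("Created " + count + " threads before failure: " + t);
        }
    }
}
```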