In Vaadin Flow web apps, the state of the entire user-interface is maintained in the session on the web server, with automatic dynamic generation of the HTML/CSS/JavaScript needed to represent that UI remotely on the web browser client. Depending on the particular app, and the number of users, this can result in a significant amount of memory used on the web container.
Is it possible to limit the amount of memory a session and requests related to it can use?
For example, I would like to limit each user session to one megabyte. This limit should apply to any objects created when handling requests. Is that possible?
It is theoretically possible, but it is not practical.
As far as I am aware, no JVM keeps track of the amount of memory that (say) a thread allocates. So if you wanted to do this, you would need to build a lot of infrastructure yourself. Here are a couple of theoretical ideas.
You could use bytecode engineering to inject some code before each new to measure and record the size of the object allocated. You would need to run this across your entire codebase ... including any Java SE classes and 3rd-party classes that your app uses.
You could modify the JVM to record the information itself. For example, you might modify the memory allocator that new uses.
However, both of these are liable to be a lot of work to implement, debug and maintain. And both are liable to have significant performance impact.
It is not clear to me why you would need this ... as a general thing. If you have a problem with the memory usage of particular types of requests, then it would be simpler for the request code itself to keep tabs on how big the request data structures are getting. When the data structures get too large, the request could "abort" itself.
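A minimal sketch of that self-policing idea in plain Java; the 1 MB budget and the per-entry size estimate are illustrative assumptions, since the JVM gives you no exact per-request accounting:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a request-scoped accumulator that tracks a rough estimate of
// the bytes it holds and aborts once a configured budget is exceeded.
// The budget and the sizing formula are illustrative, not exact.
public class BudgetedRequestBuffer {
    private static final long BUDGET_BYTES = 1_000_000; // ~1 MB, illustrative
    private final List<String> results = new ArrayList<>();
    private long estimatedBytes = 0;

    public void add(String row) {
        // Very rough estimate: 2 bytes per char plus fixed object overhead.
        estimatedBytes += 2L * row.length() + 48;
        if (estimatedBytes > BUDGET_BYTES) {
            throw new IllegalStateException(
                "Request aborted: estimated memory use exceeded "
                + BUDGET_BYTES + " bytes");
        }
        results.add(row);
    }

    public long estimatedBytes() { return estimatedBytes; }
    public int size() { return results.size(); }
}
```

The request handler would route its significant allocations through such an accumulator and translate the exception into an error response.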
As the correct Answer by Stephen C explains, there is no simple automatic approach to limiting or managing the memory used in Java.
Given the nature of Vaadin Flow web apps, a large amount of memory may be consumed on the server for user sessions, each containing all the state of that user’s user-interface.
Reduce memory usage of your codebase
The first step is to examine your codebase.
Do you have data replicated across users that could instead be shared across users in a thread-safe manner? Do you have cached data not often used that could instead be retrieved again from its source (database, web services call)? Do you cache parts of the UI not currently onscreen that could instead be instantiated again later when needed?
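For the first of those questions, a sketch of the sharing idea: keep one immutable copy in an application-scoped, thread-safe map that all sessions read, rather than a copy per session. The names and the inline loader here are illustrative stand-ins:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: instead of each session holding its own copy of reference
// data, all sessions share one immutable copy held in an app-scoped,
// thread-safe map. The key and the loader lambda are illustrative.
public class SharedReferenceData {
    private static final Map<String, List<String>> CACHE = new ConcurrentHashMap<>();

    public static List<String> countryCodes() {
        // computeIfAbsent loads once; every later caller gets the same
        // immutable list rather than a per-session duplicate.
        return CACHE.computeIfAbsent("countryCodes",
                key -> List.of("US", "CA", "MX")); // stand-in for a DB call
    }
}
```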
More RAM
The next step is simply to add more memory to your web server.

Buying RAM is much cheaper than paying for the time of programmers and sysadmins. And it is so simple to just drop in more memory modules.
Multiple web servers
The next step after that is horizontal scaling: Use multiple web servers.
With load balancers you can spread the user load across servers fairly. And “sticky” sessions can be used to direct further user interactions to the same server to continue a session.
Of course, this horizontal scaling approach is more complicated. But it is commonly used in the industry and well understood.
Vaadin Fusion
Another programming step could involve refactoring your app to build parts of it using Vaadin Fusion.
Instead of your app being driven from the server as with Vaadin Flow, Fusion is focused on web components running in the browser. Instead of writing in pure Java, you write in TypeScript, a superset of JavaScript. Fusion can make calls into the Vaadin Flow server as needed to access data and services there.
Consulting
The Vaadin Ltd company sells consulting services, as do others, to assist with any of these steps.
Session serialization
Be aware that without taking these steps, when running low on memory, some web containers such as Apache Tomcat will serialize sessions to disk to purge them from memory temporarily.
This can result in poor performance if users are still actively engaged with those sessions. But the more serious problem is that all the objects in your sessions must be serializable. And you must code for reconnecting database connections, etc. If supporting such serialization is not feasible, you can likely turn off this serialize-sessions-on-low-memory feature of the web server. But then your web server will suffer when running out of memory, with no such recourse available.
Related
I'm trying to make simple application and deploy it on Google Cloud Platform Flexible App Engine, which will contain two main parts:
Front end application (simple Web UI based on Java 8 (Spring + Thymeleaf) with OAuth authorization from different external sites)
Back end application (monitoring several resources in separate threads, based on logged in users and reacting to their input in a certain way (behavioral changes))
Initially I was planning to make them as one app, but I think that potentially heavy background processing may cause failures in the front-end part, and the App Engine docs say that deployed services behave similarly to a microservice architecture.
My questions are:
Do I really need to separate front end from back end, if I need to react to user input as fast as possible? (but delays up to 2 seconds aren't that critical)
If I do need to separate them (and I strongly believe that I do), how do I set up interaction between the applications?
Each resource must be tracked by exactly one thread on the back end; what are the best practices for this? I thought about having a SQL table with a list of acquired resources, but the flaw I see is that if an instance fails, I will need to do some kind of cleanup on that table and redetermine which resources are actually acquired.
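The cleanup flaw might be avoidable with a lease: ownership expires unless renewed, so a failed instance simply loses its resources to the next claimant. Here's an in-memory sketch of what I mean; in the real system the map would be the SQL table (resource, owner, lease expiry) updated with a conditional UPDATE, and all the names are illustrative:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the lease idea: an owner acquires a resource for a limited
// time; if its instance dies, the lease expires on its own and another
// instance can take over -- no cleanup pass over the table is needed.
// A live owner would call acquire() again periodically to renew.
public class ResourceLeases {
    record Lease(String owner, Instant expiry) {}

    private final Map<String, Lease> leases = new ConcurrentHashMap<>();
    private final Duration ttl;

    public ResourceLeases(Duration ttl) { this.ttl = ttl; }

    /** Returns true if {@code owner} now holds the resource at time {@code now}. */
    public boolean acquire(String resource, String owner, Instant now) {
        Lease winner = leases.merge(resource, new Lease(owner, now.plus(ttl)),
            // Keep the current lease unless it has already expired.
            (current, offered) ->
                current.expiry().isBefore(now) ? offered : current);
        return winner.owner().equals(owner);
    }
}
```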
Your proposed architecture sounds like the right approach; separating the two into different services makes sense for the following reasons:
Can deploy code for each separately, roll back versions separately, and split traffic separately for experiments or phased rollouts.
Can adjust machine types and memory allocations for each service to better suit its needs. If you're doing work that is memory intensive on the backend, you can adjust that service's settings to allocate more memory per instance.
Allows each type of service to scale independently based on demand, which should result in better utilization of the services and less waste. This should also lower your overall spending compared to a one-size-fits-all approach in a single monolithic service.
You can mix different runtime environments across services. For example, you can mix language runtimes within a project, or you could even mix standard and flexible environments. Say your front-end code is more cost-efficient in standard: designate that service as a standard environment service and your backend as a flexible environment service. Or say you need a custom Dockerfile with Perl in it: you could do that as a flexible environment custom runtime and have your front end in Java 8.
You can still share common services like Cloud SQL, PubSub, Cloud Tasks (currently in alpha) or Redis for in-memory caching. Your workers don't need to reside in App Engine; they could reside in a different product if that better suits your needs.
Overall, you get much better control over your application by splitting it apart. The biggest benefit likely comes down to optimizing your application to spend only on what you need.
I think that you are likely to be able to deploy everything as an App Engine app, unless you use some exotic Java libraries that are not whitelisted. It could still be worthwhile to deploy it with Compute Engine for increased configurability and versatility.
You can create one front-end instance and one back-end instance in Compute Engine and divide the resources between them that way. Google's documentation has an example of how to do that.
I am trying to determine what part of my app code needs a large amount of memory when handling client requests.

For this I am using VisualVM and JConsole against the local development server, but the requests have become pretty complex, it is very hard to track down their memory consumption, and I have no idea how to proceed.
One request, from start to finish, usually uses the Search API, the low-level Entity API (datastore access), Java reflection for entity conversion (low-level entities to plain Java objects), and GWT RPC. So there are tens or hundreds of classes to look through, in my code alone.
I would like to know:
Is it OK to run these tests in the local dev server, given that the environment is very different from the one in production? I believe it shouldn't be a problem if I know specifically "where/how to look" for memory.
What tools or patterns can I use to track down the memory used by one request (and then use that to estimate for N clients running simultaneous requests)?
I believe that the memory needed has indeed become very large, but I need to know how much of my code I can optimize (or where I have code problems, garbage, libs, etc.) and at what point I should increase the instance type (to F4) or even switch from the standard to the flexible environment.
Also, if there are java tools/APIs to programmatically determine memory consumption, please advise!
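For reference, the closest thing I have found so far is the HotSpot-specific per-thread allocation counter (com.sun.management.ThreadMXBean, a JDK extension, not part of standard Java SE). It gives a rough per-request total when a request runs on a single thread, but doesn't tell me which classes are responsible:

```java
import java.lang.management.ManagementFactory;

// Sketch: on HotSpot-based JVMs, the com.sun.management extension of
// ThreadMXBean reports bytes allocated by a single thread, so you can
// bracket a request handler and estimate its allocations. This is a
// JDK-specific API and may not exist on other JVMs.
public class PerThreadAllocation {
    private static final com.sun.management.ThreadMXBean THREADS =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Total bytes allocated by the current thread since it started. */
    public static long allocatedSoFar() {
        return THREADS.getThreadAllocatedBytes(Thread.currentThread().getId());
    }

    public static void main(String[] args) {
        long before = allocatedSoFar();
        byte[] work = new byte[1024 * 1024]; // stand-in for request work
        long delta = allocatedSoFar() - before;
        System.out.println("This thread allocated roughly " + delta
                + " bytes (payload " + work.length + ")");
    }
}
```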
Thank you.
What are the different ways to cache web application data, developed using Java and a NoSQL database? Databases also provide caching; are they the only, and always the best, option for caching?

How else can I cache user data in the application? The application contains very user-specific data, as in a social network. Are there some simple rules of thumb for what type of things should be cached?

Can I also cache my data on the application server using Java?
If you want a rule of thumb, here's what Michael Jackson (not that Michael Jackson) said:
The First Rule of Program Optimization: Don't do it.
The Second Rule of Program Optimization (for experts only!): Don't do it yet.
The ancient tradition is that you don't optimise until you've profiled - that is, until you have hard evidence as to what actually needs to be optimised. Cacheing is a kind of optimisation; it is very likely to be important for your app, but until you are able to put your app under load and look at what objects are taking a long time to obtain (loading from the database or whatever), you won't know what needs cacheing. It really doesn't matter how smart you are, or what advice you get here - until you do that, you will not know what needs to be cached.
As for things you can cache: it could be anything, but I suppose you can classify them into three groups:
Things that have come fresh from the database. These are easy to cache, because at the point at which you go to the database, you have the identifying information you'd need for a cache key (primary key, query parameters, etc). By cacheing them, you save the time taken to get them from the database - this involves IO, so it is likely to be quite large.
Things that have been produced by computation in the domain model (news feeds in a social app, perhaps). These may be trickier to cache, because more contextual information goes into producing them; you might have to refactor your code to create a single point where the required information is all to hand, so you can apply cacheing to it. Or you might find that this exists already. Cacheing these will save all the database access needed to obtain the information that goes into making them, as well as all the computation; the time taken for computation may or may not be a significant addition to the time taken for IO. Invalidating cached things of this kind is likely to be much harder than pure database objects.
Things that are being sent to the browser - pages, or fragments of pages. These can be quite easy to cache, because in a properly-designed application, they're uniquely identified by either the URL, or the combination of URL and user. Cacheing these will save all the computation in your app; it can even avoid servicing requests, because it can be done by a reverse proxy sitting in front of your app server. Two problems. Firstly, it uses a huge amount of memory: the page rendered from a few kilobytes of objects could be tens or hundreds of kilobytes in size (my Facebook homepage is 50 kB). That means you have to save a vast amount of computation to make it a better deal than cacheing at the database or domain model layers, and there just isn't that much computation between the domain model and the HTML in a sensibly-designed application. Secondly, invalidation is even harder than in the domain model, and is likely to happen prohibitively often - anything which changes the page or the fragment needs to invalidate the cache.
Finally, the actual mechanism: start with something simple and in-process, like a map with limited size and a least-recently-used eviction policy. That's simple but effective. Something out-of-process like EHCache is more complicated, but has two advantages: you can share caches between multiple processes (helpful if you have a cluster, which you probably will at some point), and you can store data where the garbage collector won't see it, which might save some CPU time (might - this is too big a subject to get into here).
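A minimal version of that simple in-process map, using LinkedHashMap's access-order mode; the capacity is illustrative, and a real deployment would also need thread safety (e.g. Collections.synchronizedMap or a dedicated library):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A size-limited map with least-recently-used eviction, built on
// LinkedHashMap's access-order mode. Not thread-safe as written.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true = iterate entries in access order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put: evict the least-recently-used entry
        // once the cache grows past its capacity.
        return size() > capacity;
    }
}
```

With capacity 2, putting a third entry evicts whichever of the first two was least recently accessed.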
But I reiterate my first point: don't cache until you know what needs to be cached, and once you do, be mindful of the limitations on the benefits of cacheing, and try to keep your cacheing strategy as simple as possible (but no simpler, of course).
I'll assume you're building a relatively typical web application that:
has a single server used for persistence
has multiple web servers
ties authenticated users to a single server via sticky sessions through a load balancer
Now, with that stated, to answer some of your questions: most persistence layers, database or NoSQL, likely have some sort of caching built in, such that if you execute the same simple query repeatedly (e.g. retrieval by primary key) they can cache the result. However, the more complex the query, the less likely the persistence layer can cache it. In addition, if there's only one server for persistence (i.e. no sharding, or write master/read slaves), it quickly becomes the bottleneck. So the application-level caching you want to do should usually occur on the web servers to reduce load on the database.
As far as what should be cached, the heuristic is items frequently accessed and/or expensive to generate (in terms of database/web server processing/memory). Typical candidates are the home page and any other landing page of a site - often the best approach for these is generating a static file and serving that. The next pieces depend on your application, but typically the most effective strategy is caching as close to the final result as possible - often the HTML being served. For your social network this might be a list of featured updates or some such.
As far as user sessions are concerned, these are definitely a good candidate for caching. In this case you can probably get a lot of mileage out of judicious use of the web server's session scope (assuming a JSP server). This data lives in memory and is a good place to keep user-specific information that is shown on every page once a user authenticates (e.g. first and last name).
Now the final thing to consider is cache invalidation, which is really the hard part of all this (naming things is the other hard problem in computer science). In this case, using something like memcached or Ehcache, as others have mentioned, is the right approach. Ehcache can easily run in-process with your Java application and does a good job of expiring things, with policies for least recently used and least frequently used, and allows you to use both memory and disk for caching. What you'll need to think about are the situations where you must expire something from the cache ahead of that schedule because the data has changed. In that case you need to work through those dependencies in your application's architecture so that it reads/writes to the cache as appropriate.
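A bare-bones sketch of that read-through-plus-invalidate-on-write idea; the maps and names here are illustrative stand-ins for real persistence and a real cache library:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: reads go through the cache and fall back to persistence;
// writes update persistence and explicitly invalidate the cached copy,
// ahead of any TTL/LRU schedule. The "db" map stands in for a real
// database and the class/field names are illustrative.
public class UserProfileCache {
    private final Map<Long, String> db = new ConcurrentHashMap<>();    // stand-in for persistence
    private final Map<Long, String> cache = new ConcurrentHashMap<>();

    public String read(long userId) {
        // Read-through: consult the cache, fall back to the database.
        return cache.computeIfAbsent(userId, db::get);
    }

    public void write(long userId, String profile) {
        db.put(userId, profile);
        cache.remove(userId); // invalidate so the next read sees fresh data
    }
}
```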
I have a question about my brand-new Java web application. On a web server, what is the biggest problem for a site whose visits increase every day? Will memory be a problem in the future? My application uses simple J2EE: Tomcat, JPA and Hibernate.
I was a PHP developer, and there, each visitor of my site used a little bit more memory. How does it work in Java?
Like PHP, a Java web application uses a bit of memory for each concurrent request. Thus the more concurrently running requests, the bigger the memory footprint becomes. The total memory required under a given load depends on how fast each request is processed, because faster processing means fewer concurrent requests.
I also assume that a PHP web application will use very little initial memory at startup, but will use more memory per request compared to a Java web application. The cause is that a Java web application typically keeps more objects preloaded, and APIs like Hibernate are often configured to use database connection pooling and object caches.
It depends on how many objects you are using... in Java it is usually a memory issue, caused for example by building a DOM model of documents.
But if it is a simple web application, then the issue should be the fact that there is always one servlet instance handling requests, so you wouldn't run out of memory, but it would get very slow. The Tomcat threads would have to wait until one request is processed before another can be executed.
There are simply limits on the number of requests per second... But as I said, it is more likely that you would run out of memory.
I have an application that's a mix of Java and C++ on Solaris. The Java aspects of the code run the web UI and establish state on the devices that we're talking to, and the C++ code does the real-time crunching of data coming back from the devices. Shared memory is used to pass device state and context information from the Java code through to the C++ code. The Java code uses a PostgreSQL database to persist its state.
We're running into some pretty severe performance bottlenecks, and right now the only way we can scale is to increase memory and CPU counts. We're stuck on the one physical box due to the shared memory design.
The really big hit here is being taken by the C++ code. The web interface is fairly lightly used to configure the devices; where we're really struggling is to handle the data volumes that the devices deliver once configured.
Every piece of data we get back from the device has an identifier in it which points back to the device context, and we need to look that up. Right now there's a series of shared memory objects that are maintained by the Java/UI code and referred to by the C++ code, and that's the bottleneck. Because of that architecture we cannot move the C++ data handling off to another machine. We need to be able to scale out so that various subsets of devices can be handled by different machines, but then we lose the ability to do that context lookup, and that's the problem I'm trying to resolve: how to offload the real-time data processing to other boxes while still being able to refer to the device context.
I should note we have no control over the protocol used by the devices themselves, and there is no possible chance that situation will change.
We know we need to move away from this to be able to scale out by adding more machines to the cluster, and I'm in the early stages of working out exactly how we'll do this.
Right now I'm looking at Terracotta as a way of scaling out the Java code, but I haven't got as far as working out how to scale out the C++ to match.
As well as scaling for performance we need to consider high availability as well. The application needs to be available pretty much the whole time -- not absolutely 100%, which isn't cost effective, but we need to do a reasonable job of surviving a machine outage.
If you had to undertake the task I've been given, what would you do?
EDIT: Based on the data provided by @john channing, I'm looking at both GigaSpaces and Gemstone. Oracle Coherence and IBM ObjectGrid appear to be Java-only.
The first thing I would do is construct a model of the system to map the data flow and try to understand precisely where the bottleneck lies. If you can model your system as a pipeline, then you should be able to use the theory of constraints (most of the literature is about optimising business processes but it applies equally to software) to continuously improve performance and eliminate the bottleneck.
Next I would collect some hard empirical data that accurately characterises the performance of your system. It is something of a cliché that you cannot manage what you cannot measure, but I have seen many people attempt to optimise a software system based on hunches and fail miserably.
Then I would use the Pareto Principle (80/20 rule) to choose the small number of things that will produce the biggest gains and focus only on those.
To scale a Java application horizontally, I have used Oracle Coherence extensively. Although some dismiss it as a very expensive distributed hashtable, the functionality is much richer than that and you can, for example, directly access data in the cache from C++ code.
Other alternatives for horizontally scaling your Java code would be GigaSpaces, IBM ObjectGrid or GemStone GemFire.
If your C++ code is stateless and is used purely for number crunching, you could look at distributing the process using ICE Grid which has bindings for all of the languages you are using.
You need to scale sideways, i.e. out. Something like a message queue could sit between the frontend and the crunching code.
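An in-process sketch of that decoupling, using a BlockingQueue as a stand-in for a real broker; with JMS/ActiveMQ in its place, the consumers could live on other machines:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the queue-in-the-middle idea: the frontend enqueues device
// readings and returns immediately, while crunching threads drain the
// queue at their own pace. The class and method names are illustrative;
// a real deployment would replace the BlockingQueue with a message
// broker so producers and consumers can run on different machines.
public class WorkPipeline {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);

    public void submit(String deviceReading) throws InterruptedException {
        queue.put(deviceReading); // frontend: hand off and move on
    }

    public String takeNext() throws InterruptedException {
        return queue.take();      // cruncher: blocks until work arrives
    }
}
```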
Andrew (in addition to modeling it as a pipeline etc.), measuring things is important. Have you run a profiler over the code and gotten metrics on where most of the time is spent?
For the database code, how often does it change? Are you looking at caching at the moment? I assume you have looked at indexes etc. over the data to speed up the DB?
What levels of traffic do you have on the front end? Are you caching web pages? (It isn't too hard to use a JMS-type API to communicate between components. You could then put the web page component on one machine (or more) and the integration code (C++) on another, and many JMS products have native C++ APIs; ActiveMQ comes to mind.) But it really helps to know how much of the time is spent in the web tier (JSP?), C++, and database operations.
Is the database storing business data, or is it also being used to pass data between Java and C++? You say you are using shared memory, not JNI? What level of multi-threading currently exists in the app? Would you describe the code as synchronous or asynchronous in nature?
Is there a physical relationship between the Solaris code and the devices that must be maintained (i.e. do all the devices register with the C++ code, or can that be specified)? For instance, if you were to put a web load balancer on the frontend and stand up two machines today, is the relationship of which devices are managed by which box fixed up front, or can it be determined dynamically?
What are the HA requirements? I.e. just state info? Can HA be handled in the web tier alone by clustering session data?
Is the DB running on another machine?
How big is the DB? Have you optimized your queries? E.g. using explicit inner/outer joins sometimes helps versus nested subqueries. (Again, look at the SQL stats.)