I have an application that's a mix of Java and C++ on Solaris. The Java aspects of the code run the web UI and establish state on the devices that we're talking to, and the C++ code does the real-time crunching of data coming back from the devices. Shared memory is used to pass device state and context information from the Java code through to the C++ code. The Java code uses a PostgreSQL database to persist its state.
We're running into some pretty severe performance bottlenecks, and right now the only way we can scale is to increase memory and CPU counts. We're stuck on the one physical box due to the shared memory design.
The really big hit here is being taken by the C++ code. The web interface is fairly lightly used to configure the devices; where we're really struggling is to handle the data volumes that the devices deliver once configured.
Every piece of data we get back from the device has an identifier in it which points back to the device context, and we need to look that up. Right now there's a series of shared memory objects that are maintained by the Java/UI code and referred to by the C++ code, and that's the bottleneck. Because of that architecture we cannot move the C++ data handling off to another machine. We need to be able to scale out so that various subsets of devices can be handled by different machines, but then we lose the ability to do that context lookup, and that's the problem I'm trying to resolve: how to offload the real-time data processing to other boxes while still being able to refer to the device context.
I should note we have no control over the protocol used by the devices themselves, and there is no possible chance that situation will change.
We know we need to move away from this to be able to scale out by adding more machines to the cluster, and I'm in the early stages of working out exactly how we'll do this.
Right now I'm looking at Terracotta as a way of scaling out the Java code, but I haven't got as far as working out how to scale out the C++ to match.
As well as scaling for performance we need to consider high availability as well. The application needs to be available pretty much the whole time -- not absolutely 100%, which isn't cost effective, but we need to do a reasonable job of surviving a machine outage.
If you had to undertake the task I've been given, what would you do?
EDIT: Based on the data provided by #john channing, i'm looking at both GigaSpaces and Gemstone. Oracle Coherence and IBM ObjectGrid appear to be java-only.
The first thing I would do is construct a model of the system to map the data flow and try to understand precisely where the bottleneck lies. If you can model your system as a pipeline, then you should be able to use the theory of constraints (most of the literature is about optimising business processes but it applies equally to software) to continuously improve performance and eliminate the bottleneck.
Next I would collect some hard empirical data that accurately characterises the performance of your system. It is something of a cliché that you cannot manage what you cannot measure, but I have seen many people attempt to optimise a software system based on hunches and fail miserably.
Then I would use the Pareto Principle (80/20 rule) to choose the small number of things that will produce the biggest gains and focus only on those.
To scale a Java application horizontally, I have used Oracle Coherence extensively. Although some dismiss it as a very expensive distributed hashtable, the functionality is much richer than that and you can, for example, directly access data in the cache from C++ code .
Other alternatives for horizontally scaling your Java code would be Giga Spaces, IBM Object Grid or Gemstone Gemfire.
If your C++ code is stateless and is used purely for number crunching, you could look at distributing the process using ICE Grid which has bindings for all of the languages you are using.
You need to scale sideways and out. Maybe something like a message queue could be the backend between the frontend and the crunching.
Andrew, (in addition to modeling as a pipeline etc), measuring things is important. Have you ran a profiler over the code and got metrics of where most of the time is spent?
For the database code, how often does it change ? Are you looking at caching at the moment ? I assume you have looked at indexes etc over the data to speed up the Db ?
What levels of traffic do you have on the front end ? Are you caching web pages ? (It isn't too hard to say use a JMS type api to communicate between components. You can then put Web Page component on one machine (or more), and then put the integration code (c++) on another, and for many JMS products there are usually native C++ api's ie. ActiveMQ comes to mind), but it really helps to know how much of the time is in Web (JSP ?) , C++, Database ops.
Is the database storing business data, or is it being also used to pass data between Java and C++ ? You say you are using shared mem not JNI ? What level of multi-threading currently exists in the APP? Would you describe the code as being synchronous in nature or async?
Is there a physical relationship between the Solaris code and the devices that must be maintained (ie. do all the devices register with the c++ code, or can that be specified). ie. if you were to put a web load balancer on the frontend, and just put 2 machines up today is the relationhip of which devices are managed by a box initialized up front or in advance?
What are the HA requirements ? ie. just state info ? Can the HA be done just in the web tier by clustering Session data ?
Is the DB running on another machine ?
How big is the DB ? Have you optimized your queries ie. tried using explicit inner/outer joins sometimes helps versus nested sub queries (sometmes). (again look at the sql stats).
Related
In Vaadin Flow web apps, the state of the entire user-interface is maintained in the session on the web server, with automatic dynamic generation of the HTML/CSS/JavaScript needed to represent that UI remotely on the web browser client. Depending on the particular app, and the number of users, this can result in a significant amount of memory used on the web container.
Is it possible to limit the amount of memory a session and requests related to it can use?
For example, I would like to limit each user session to one megabyte. This limit should apply to any objects created when handling requests. Is that possible?
It is theoretically possible, but it is not practical.
As far as I am aware, no JVM keeps track of the amount of memory that (say) a thread allocates. So if you wanted to do this, you would build a lot of infrastructure to do that. Here are a couple of theoretical ideas.
You could use bytecode engineering to inject some code before each new to measure and record the size of the object allocated. You would need to run this across your entire codebase ... including any Java SE classes and 3rd-party classes that you app uses.
You could modify the JVM to record the information itself. For example, you might modify the memory allocator that new uses.
However, both of these are liable be a lot of work to implement, debug and maintain. And both are liable to have significant performance impact.
It is not clear to me why you would need this ... as a general thing. If you have a problem with the memory usage of particular types of requests, then it would be simpler for the request code itself to keep tabs on how big the request data structures are getting. When the data structures get too large, the request could "abort" itself.
As the correct Answer by Stephen C explains, there is no simple automatic approach to limiting or managing the memory used in Java.
Given the nature of Vaadin Flow web apps, a large amounts of memory may be consumed on the server for user sessions containing all the state of each user’s user-interface.
Reduce memory usage of your codebase
The first step is to examine your code base.
Do you have data replicated across users that could instead be shared across users in a thread-safe manner? Do you have cached data not often used that could instead be retrieved again from its source (database, web services call)? Do you cache parts of the UI not currently onscreen that could instead be instantiated again later when needed?
More RAM
Next step is to simply add more memory to your web server.
Buying RAM is much cheaper than paying for the time of programmers and sysadmins. And so simple to just drop in more stocks of memory.
Multiple web servers
The next step after that is horizontal scaling: Use multiple web servers.
With load balancers you can spread the user load across servers fairly. And “sticky” sessions can be used to direct further user interactions to the same server to continue a session.
Of course, this horizontal scaling approach is more complicated. But this approach is commonly done in the industry, and well-understood.
Vaadin Fusion
Another programming step could involve refactoring app to build parts of your app using Vaadin Fusion.
Instead of your app being driven from the server as with Vaadin Flow, Fusion is focused on web components running in the browser. Instead of writing in pure Java, you write in TypeScript, a superset of JavaScript. Fusion can make calls into Vaadin Flow server as needed to access data and services there.
Consulting
The Vaadin Ltd company sells consulting services, as do others, to assist with any of these steps.
Session serialization
Be aware that without taking these steps, when running low on memory, some web containers such as Apache Tomcat will serialize sessions to disk to purge them from memory temporarily.
This can result in poor performance if the human users are actively still engaged with those sessions. But the more serious problem is that all the objects in your entire sessions must be serializable. And you must code for reconnecting database connections, etc. If supporting such serialization is not feasible, you likely can turn off this serialize-sessions-on-low-memory feature of the web server. But then your web server will suffer when running out of memory with no such recourse available.
I am trying to determine what part of my app code is needing a large amount of memory when handling client requests.
For this I am using VisualVM and JConsole, in the local development server, but the requests have become pretty complex and it is very hard to track down the memory consumption of the requests, and I have no idea how to proceed.
One request, from start to finish, usually uses: Search API, Entity Low level API (datastore access), Java reflection for entity conversion (low level to java plain objects), GWT RPC. so there are tens or hundreds of classes to look for (my code) at least.
I would like to know:
is it ok to make tests in local dev server, since the environment is very different from the one in production mode? I believe it should't be a problem if I know specifically "where /how to look" for memory.
what tools or patterns can I use to track down the memory used by one request (and then use it to estimate for N clients running simulateous requests).
I believe that indeed the memory needed has become very large, but I need to know how much of my code can I optimize (or where do I have code problems, garbage, libs etc) and at what point should I increase the instance types (to F4) or event switch from standard to flexible environment.
Also, if there are java tools/APIs to programmatically determine memory consumption, please advise!
Thank you.
I suppose this is not possible. But I am looking at best way to separate different layers of my service yet be able to access layers quickly or without overhead of IPC/RMI.
The main programming language I am using is java, but can use C++ if required.
What we have right now is a server that host database and access control. And we use RMI for consumers to request data. This slow and doesn't scale very well.
We need performance and scalability which we dont have at the moment.
What we are thinking of is using a layered architecture with database at base, access control ontop of it along with a notification bus to notify clients of changes in database.
The main problem is the overhead of communication that we want to avoid/or minimize.
Is there any magic thread that can run in two context (switch context) and share information that way. I know the short answer would be no, but what are the options?
Update
We are currently using Java RMI.
Our base layer will provide an API that can be used to create plugins that will run on top. So its not a fixed collectors/consumer we have. We can have 5-6 collectors running and same amount of consumers.
We can have upto 1000 consumers.
My first suggestion is that you should buy a book (or find an online tutorial) on building scalable applications, because you seem to be pretty lost.
Sharing a thread between processes doesn't make sense at any level - it is meaningless, but you can share the data that the thread accesses, which is probably what you want.
The fastest method will be C based IPC (e.g., shared memory, semasphores, etc: Shmget). You say you want to avoid the overhead of IPC, but really, it isn't going to get any faster than that.
But why do you want multiple processes? If you are worried about the overhead of communicating between processes, just have your threads in one process? There is no reason your different layers have to be in different processes.
But anyway, I am not convinced that your original statement that RMI is slow and doesn't scale is completely correct. If it is not scaling, you are probably not using the right framework. Maybe you have an issue that you only have one RMI end point on the server. Have you considered an J2EE system with stateless session beans?
Without knowing about your requirements, it is hard to say.
It is not possible in general to share thread between two processes due to OS design. The problem of sharing data between two or more processes is usually solved by sharing files, sharing database or sharing messages (which in turn can be synchronous or asynchronous), having processes communicate via pipes, say in Linux, or even sharing memory. You scenario description is not very precise, you need to describe all processes and how information is supposed to flow, what triggers information flow, etc.
Most likely you need high performance messaging library, https://github.com/real-logic/Aeron/ is one. But to get precise answer you would need to describe better what overhead exactly you want to minimize.
If your goal is to notify users, you should consider publish/subscribe messaging (pub/sub). There are many middleware vendors out there that provide this architecture though most are expensive in production scenarios. For open source, check out http://redis.io/topics/pubsub. (No affiliation.)
I am creating a (semi) big data analysis app. I am utilizing apache-mahout. I am concerned about the fact that with java, I am limited to 4gb of memory. This 4gb limitation seems somewhat wasteful of the memory modern computers have at their disposal. As a solution, I am considering using something like RMI or some form of MapReduce. (I, as of yet, have no experience with either)
First off: is it plausible to have multiple JVM's running on one machine and have them talk? and if so, am I heading in the right direction with the two ideas alluded to above?
Furthermore,
In attempt to keep this an objective question, I will avoid asking "Which is better" and instead will ask:
1) What are key differences (not necessarily in how they work internally, but in how they would be implemented by me, the user)
2) Are there drawbacks or benefits to one or the other and are there certain situations where one or the other is used?
3) Is there another alternative that is more specific to my needs?
Thanks in advance
First, re the 4GB limit, check out Understanding max JVM heap size - 32bit vs 64bit . On a 32 bit system, 4GB is the maximum, but on a 64 bit system the limit is much higher.
It is a common configuration to have multiple jvm's running and communicating on the same machine. Two good examples would be IBM Websphere and Oracle's Weblogic application servers. They run the administrative console in one jvm, and it is not unusual to have three or more "working" jvm's under its control.
This allows each JVM to fail independently without impacting the overall system reactiveness. Recovery is transparent to the end users because some fo the "working" jvm's are still doing their thing while the support team is frantically trying to fix things.
You mentioned both RMI and MapReduce, but in a manner that implies that they fill the same slot in the architecture (communication). I think that it is necessary to point out that they fill different slots - RMI is a communications mechanism, but MapReduce is a workload management strategy. The MapReduce environment as a whole typically depends on having a (any) communication mechanism, but is not one itself.
For the communications layer, some of your choices are RMI, Webservices, bare sockets, MQ, shared files, and the infamous "sneaker net". To a large extent I recommend shying away from RMI because it is relatively brittle. It works as long as nothing unexpected happens, but in a busy production environment it can present challenges at unexpected times. With that said, there are many stable and performant large scale systems built around RMI.
The direction the world is going this week for cross-tier communication is SOA on top of something like spring integration or fuse. SOA abstracts the mechanics of communication out of the equation, allowing you to hook things up on the fly (more or less).
MapReduce (MR) is a way of organizing batched work. The MR algorithm itself is essentially turn the input data into a bunch of maps on input, then reduce it to the minimum amount necessary to produce an output. The MR environment is typically governed by a workload manager which receives jobs and parcels out the work in the jobs to its "worker bees" splattered around the network. The communications mechanism may be defined by the MR library, or by the container(s) it runs in.
Does this help?
What's a good method for assigning work to a set of remote machines? Consider an example where the task is very CPU and RAM intensive, but doesn't actually process a large dataset. The language of choice would be Java. I was thinking Hadoop would be a good option, but the dataset passed between remote machines is fairly small, and Hadoop seems to focus mainly on the distribution of data rather than distribution of work.
What are some good technologies that can help?
EDIT: I'm mainly interested in load balancing. There will be a series of jobs with a small (< 3MB) dataset, but significant processing and memory needs.
MPI would probably be a good choice, there's even a JAVA implementation.
MPI may be part of your answer, but looking at the question, I'm not sure if it addresses the portion of the problem you care about.
MPI provides a communication layer between processing components. It is low level requiring you to do a fair amount of work, but from what I saw in an introduction presentation, it also comes with some common matrix data manipulation functions.
In your question, you seem to be more interested in the load balancing/job processing aspects of the problem. If that really is your focus, maybe a small program hosted in a Servlet or an RMI server might be sufficient. Let each program go to the server for their next unit of work and then submit the results back (you might even be able to use a database/file share, but pay attention to locking issues). In other words, a pull mechanism versus a push mechanism.
This approach is fairly simple to implement and gives you the advantage of scaling up by just running more distributed clients. Load balancing isn't too important if you intend to allow your process to take full control of the machine. You can experiment with running multiple clients on a machine that has multiple cores to see if you can improve overall through-put for the node. A multi-threaded client would be more efficient, but can increase complexity depending on the structure of the code you are using to solve the problem.