Ehcache Monitor: installation/configuration - Java

The documentation at http://ehcache.org/documentation/user-guide/monitor contains this phrase:
It is recommended that you install the Monitor on an Operations server separate to production.
Why is that? What would happen if I installed it on the production server?
And a second question, to which I did not find an answer there: is it true that this monitor does not affect the application's performance?

I'll try to explain what I think they mean.
First of all, I don't think the intention is that you not use the Monitor in production. Rather, I think they mean that the Monitor should be installed on a separate server in a production environment. There are at least three good reasons to do this.
The first is one of security. The clients that your production server is servicing shouldn't be able to reach the Monitor's services. By putting it on a separate server (perhaps behind a firewall) you prevent this.
The second is one of landscape simplicity. The Monitor can monitor several servers. By putting it on a separate server, you prevent one application server from being "special" - all the application servers are identical as far as this is concerned. Easier for scaling and maintenance of your landscape.
The third reason is one of performance. Calls to the Monitor won't impact the application server/s. This is as it should be.
As for the second part of your question: obviously, adding Ehcache monitoring will affect performance to some extent. It is presumably designed to incur only minimal overhead, but nothing is completely free. If it helps you optimize the caches, though, it will probably be worth it.
I found this paragraph detailing how often the Monitor samples:
Memory is estimated by sampling. The first 15 puts or updates are measured and then every 100th put or update
(this is from the statistics section of the Monitor page)


"In a distributed environment, one does not use multithreding" - Why?

I am working on a platform that hosts small Java applications, all of which currently use a single thread, each living inside a Docker container, consuming data from a Kafka server and logging to a central DB.
Now I need to add another Java application to this platform. This app uses multithreading relatively heavily. I have already tested it inside a Docker container and it works perfectly there, so I am ready to deploy it on the platform, where it would be scaled manually: that is, a human would define the number of containers to start, each containing an instance of this app.
My architect has an objection, saying that "in a distributed environment we never use multithreading". So now I have to refactor my application, eliminating any thread-related logic and making it single-threaded. I requested more detailed reasoning from him, but he yells, "If you are not aware of this principle, you have no place near Java."
Is it really a mistake to use a multithreaded Java application in a distributed system - a simple cluster with ten or twenty physical machines, each hosting a number of virtual machines, which in turn run Docker containers with Java applications inside them?
Honestly, I don't see the problem with multithreading inside a container.
Is it really a mistake or somehow "forbidden"?
Thanks.
When you write for example a web application that will run in a Java EE application server, then normally you should not start up your own threads in your web application. The application server will manage threads, and will allocate threads to process incoming requests on the server.
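If you do need background work inside a Java EE container, a hedged sketch of the container-friendly route (assuming a Java EE 7+ server that provides the Concurrency Utilities, and with hypothetical class names) is to ask the server for a managed executor instead of creating threads yourself:

    import javax.annotation.Resource;
    import javax.ejb.Stateless;
    import javax.enterprise.concurrent.ManagedExecutorService;

    @Stateless
    public class ReportService {

        // Thread pool owned and monitored by the application server (JSR 236)
        @Resource
        private ManagedExecutorService executor;

        public void generateReportAsync(String reportId) {
            // Hand the work to container-managed threads instead of calling "new Thread(...)"
            executor.submit(() -> generateReport(reportId));
        }

        private void generateReport(String reportId) {
            // hypothetical long-running work
        }
    }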
However, there is no hard rule or reason why multithreading would never be a good idea in a distributed environment.
There are advantages to making applications single-threaded: the code will be simpler and you won't have to deal with difficult concurrency issues.
But "in a distributed environment we never use multithreading" is not necessarily always true and "if you are not aware of this principle, you have no place near Java" sounds arrogant and condescending.
I guess he tells you this only because using a single thread eliminates multithreading and data-ordering issues.
There is nothing wrong with multithreading, though.
Distributed systems usually have tasks that are heavily I/O bound.
If I/O calls are blocking in your system
Then the only way to achieve concurrency within the process is to spawn new threads to do other useful work (multithreading).
The caveat with this approach is that if there are too many threads in flight, the operating system will spend too much time context switching between them, which is wasteful work.
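To make that concrete, here is a minimal sketch of the blocking-I/O-plus-thread-pool model (the URLs, pool size, and helper methods are made up for illustration):

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class BlockingIoWorker {

        public static void main(String[] args) {
            // A bounded pool: enough threads to hide I/O latency, but not so many
            // that the OS drowns in context switches
            ExecutorService pool = Executors.newFixedThreadPool(16);

            for (String url : List.of("http://service-a/data", "http://service-b/data")) {
                pool.submit(() -> {
                    String body = fetchBlocking(url); // the thread parks here while waiting on I/O
                    process(body);
                });
            }
            pool.shutdown();
        }

        static String fetchBlocking(String url) { return ""; } // placeholder for a blocking HTTP call
        static void process(String body) { }                   // placeholder for useful work
    }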
If I/O calls are non-blocking in your system
Then you can avoid the multithreading approach and use a single thread to service all your requests (read about event loops, Java's Netty framework, or Node.js).
The upsides of the single-threaded approach are that
The OS does not perform any wasteful thread context switches.
You will NOT run into concurrency problems like deadlocks or race conditions.
The downsides are that
It is often harder to code/think in a non-blocking fashion.
You typically end up using more memory in the form of blocking queues.
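For contrast, a rough sketch of the non-blocking style using the JDK's built-in asynchronous HTTP client (Java 11+); the URL is made up, and the point is only that one thread can keep many requests in flight:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class NonBlockingClient {

        public static void main(String[] args) throws InterruptedException {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create("http://service-a/data")).build();

            // sendAsync returns immediately; the response is handled by callbacks,
            // so a single thread can keep many requests in flight
            client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                  .thenApply(HttpResponse::body)
                  .thenAccept(System.out::println);

            Thread.sleep(2000); // keep the demo JVM alive long enough for the callback to run
        }
    }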
What? We use RxJava and Spring Reactor pretty heavily in our application and it works just fine. You can't share threads across two JVMs anyway, so just make sure your logic works as you expect within a single JVM.

Communication between databases with Java EE

We have a customer that has 3 stores with different databases. Each store runs a WildFly with some web services that communicate with one another. Each JSON request with 10-30 rows takes about 1 second on average. Each WildFly instance uses 1.5 GB of RAM. I know that memory is always a concern in Java, but could I be more economical using a microframework like Javalin, or microservices, rather than a Java EE app server? And would Node.js be an option for better performance?
Before you start looking into a different architecture, which would probably mean a major rewrite, find out where all that time is going. Set up profiling on the WildFly servers. Start by doing that on one, then have some calls come in. Check how much time is spent in various parts of the stack. Is one call to the web service handled rather slowly? Then see where that time goes; it might be the database access. Is one such call handled pretty quickly on the server itself once it comes in? Then your best bet is that you're losing time on the network layer.
Check the network traffic. You can use Wireshark or a similar tracing tool for this. See how much time actually passes between a request coming in and the response going out. Is that slow even though the processing on WildFly itself seems fast enough? Maybe there's some overhead going on (like security). Is the time between request and response very fast? Then you're definitely looking at the network as the culprit.
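As a cheap first measurement on the server side, before setting up a full profiler, a simple servlet filter can log how long WildFly spends on each request, which helps separate server time from network time. The class below is only a sketch with made-up names:

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.annotation.WebFilter;

    @WebFilter("/*")
    public class RequestTimingFilter implements Filter {

        @Override
        public void init(FilterConfig config) { }

        @Override
        public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
                throws IOException, ServletException {
            long start = System.nanoTime();
            try {
                chain.doFilter(req, resp); // the actual web service / database work happens here
            } finally {
                long millis = (System.nanoTime() - start) / 1_000_000;
                System.out.println("Handled request in " + millis + " ms");
            }
        }

        @Override
        public void destroy() { }
    }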
Eventually you may need to have profiling and network tracing active on all three servers simultaneously to see what's going on, or for each combination of two servers. It may turn out only one of them is the bottleneck. And if you have servers A, B and C, from the sound of it your setup might cause a call from A to B to also require a call from B to C before some result can be returned to A. If that is the case, it's little wonder you may see some serious latency.
But measure and find the root of the problem before you decide to change the entire framework and switch to a different programming language. Otherwise you may put a lot of time into something for no improvement at all. If the architecture is fundamentally flawed you need to think of a different approach; if this is still in the prototyping phase, that will be substantially easier.
Well, first you may prune your WildFly installation or try Quarkus :)

Sharing a thread between processes

I suppose this is not possible, but I am looking for the best way to separate the different layers of my service while still being able to access the layers quickly, without the overhead of IPC/RMI.
The main programming language I am using is Java, but I can use C++ if required.
What we have right now is a server that hosts the database and access control, and consumers request data via RMI. This is slow and doesn't scale very well.
We need performance and scalability, which we don't have at the moment.
What we are thinking of is a layered architecture with the database at the base and access control on top of it, along with a notification bus to notify clients of changes in the database.
The main problem is the overhead of communication, which we want to avoid or at least minimize.
Is there any magic thread that can run in two contexts (switching between them) and share information that way? I know the short answer is no, but what are the options?
Update
We are currently using Java RMI.
Our base layer will provide an API that can be used to create plugins that will run on top, so it's not a fixed set of collectors/consumers that we have. We can have 5-6 collectors running and the same number of consumers.
We can have up to 1000 consumers.
My first suggestion is that you should buy a book (or find an online tutorial) on building scalable applications, because you seem to be pretty lost.
Sharing a thread between processes doesn't make sense at any level - it is meaningless, but you can share the data that the thread accesses, which is probably what you want.
The fastest method will be C-based IPC (e.g., shared memory, semaphores, etc.; see shmget). You say you want to avoid the overhead of IPC, but really, it isn't going to get any faster than that.
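Java has no direct shmget equivalent in the JDK, but memory-mapped files give a comparable shared-memory mechanism between two JVMs on the same machine. The sketch below is illustrative only (file path is made up) and leaves out all synchronization between the processes:

    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public class SharedRegionDemo {

        public static void main(String[] args) throws Exception {
            // Both processes open and map the same file; the OS backs it with shared pages
            try (FileChannel channel = FileChannel.open(Path.of("/tmp/shared-region"),
                    StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MappedByteBuffer region = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
                region.putInt(0, 42);                 // what the "writer" process would do
                System.out.println(region.getInt(0)); // what the "reader" process would see
            }
        }
    }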
But why do you want multiple processes? If you are worried about the overhead of communicating between processes, just keep all your threads in one process. There is no reason your different layers have to live in separate processes.
But anyway, I am not convinced that your original statement that RMI is slow and doesn't scale is completely correct. If it is not scaling, you are probably not using the right framework. Maybe your issue is that you only have one RMI endpoint on the server. Have you considered a J2EE system with stateless session beans?
Without knowing about your requirements, it is hard to say.
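To illustrate the stateless session bean suggestion above, here is a minimal sketch (interface and bean names are hypothetical, and the two types would normally live in separate files); the container pools the bean instances and spreads concurrent remote calls across them, which is usually easier to scale than a single hand-rolled RMI endpoint:

    import javax.ejb.Remote;
    import javax.ejb.Stateless;

    @Remote
    public interface DataAccess {
        String fetch(String key);
    }

    @Stateless
    public class DataAccessBean implements DataAccess {
        @Override
        public String fetch(String key) {
            // The container pools instances of this bean and dispatches
            // concurrent client calls across the pool
            return "value-for-" + key;
        }
    }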
It is not possible in general to share a thread between two processes, due to OS design. The problem of sharing data between two or more processes is usually solved by sharing files, sharing a database, or sharing messages (which in turn can be synchronous or asynchronous), by having processes communicate via pipes (say, in Linux), or even by sharing memory. Your scenario description is not very precise; you need to describe all the processes, how information is supposed to flow, what triggers the flow of information, etc.
Most likely you need a high-performance messaging library; https://github.com/real-logic/Aeron/ is one. But to get a precise answer you would need to describe better exactly what overhead you want to minimize.
If your goal is to notify users, you should consider publish/subscribe messaging (pub/sub). There are many middleware vendors out there that provide this architecture, though most are expensive in production scenarios. For open source, check out http://redis.io/topics/pubsub. (No affiliation.)
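As a concrete illustration of pub/sub for the change-notification bus, here is a rough sketch using Redis with the Jedis client; the channel name and the choice of client library are assumptions for the example, not part of the original answer:

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.JedisPubSub;

    public class ChangeNotifications {

        public static void main(String[] args) {
            // Consumer side: subscribe() blocks and receives every message on the channel
            new Thread(() -> {
                try (Jedis subscriber = new Jedis("localhost", 6379)) {
                    subscriber.subscribe(new JedisPubSub() {
                        @Override
                        public void onMessage(String channel, String message) {
                            System.out.println("DB change: " + message);
                        }
                    }, "db-changes");
                }
            }).start();

            // Collector side: publish a notification whenever the database changes
            try (Jedis publisher = new Jedis("localhost", 6379)) {
                publisher.publish("db-changes", "row 42 updated");
            }
        }
    }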

Java distributed processing on a single machine (ironic, I know)

I am creating a (semi) big data analysis app using Apache Mahout. I am concerned that with Java I am limited to 4 GB of memory, which seems somewhat wasteful given the memory modern computers have at their disposal. As a solution, I am considering using something like RMI or some form of MapReduce. (As of yet, I have no experience with either.)
First off: is it plausible to have multiple JVMs running on one machine and have them talk to each other? And if so, am I heading in the right direction with the two ideas alluded to above?
Furthermore,
In an attempt to keep this an objective question, I will avoid asking "which is better" and instead ask:
1) What are the key differences (not necessarily in how they work internally, but in how they would be implemented by me, the user)?
2) Are there drawbacks or benefits to one or the other, and are there certain situations where one or the other is used?
3) Is there another alternative that is more specific to my needs?
Thanks in advance
First, regarding the 4 GB limit, check out "Understanding max JVM heap size - 32bit vs 64bit". On a 32-bit system, 4 GB is the maximum, but on a 64-bit system the limit is much higher.
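On a 64-bit JVM you raise the limit simply by passing a larger -Xmx value; a tiny, hedged way to verify the setting took effect (the 16g figure is just an example):

    public class HeapCheck {
        public static void main(String[] args) {
            // Run with e.g.:  java -Xmx16g HeapCheck
            long maxBytes = Runtime.getRuntime().maxMemory();
            System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
        }
    }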
It is a common configuration to have multiple JVMs running and communicating on the same machine. Two good examples are IBM WebSphere and Oracle's WebLogic application servers. They run the administrative console in one JVM, and it is not unusual to have three or more "working" JVMs under its control.
This allows each JVM to fail independently without impacting the overall responsiveness of the system. Recovery is transparent to the end users because some of the "working" JVMs are still doing their thing while the support team is frantically trying to fix things.
You mentioned both RMI and MapReduce, but in a manner that implies that they fill the same slot in the architecture (communication). I think that it is necessary to point out that they fill different slots - RMI is a communications mechanism, but MapReduce is a workload management strategy. The MapReduce environment as a whole typically depends on having a (any) communication mechanism, but is not one itself.
For the communications layer, some of your choices are RMI, web services, bare sockets, MQ, shared files, and the infamous "sneaker net". To a large extent I recommend shying away from RMI because it is relatively brittle. It works as long as nothing unexpected happens, but in a busy production environment it can present challenges at unexpected times. With that said, there are many stable and performant large-scale systems built around RMI.
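For orientation, here is a bare-bones sketch of two JVMs on one machine talking over RMI (all class and service names are made up, and the interface and classes would normally live in separate files); this is the kind of plumbing the communication-layer options above refer to:

    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.registry.LocateRegistry;
    import java.rmi.registry.Registry;
    import java.rmi.server.UnicastRemoteObject;

    // Shared interface, on the classpath of both JVMs
    interface AnalysisService extends Remote {
        String analyze(String input) throws RemoteException;
    }

    // JVM 1: export the service and register it
    class AnalysisServer implements AnalysisService {
        public String analyze(String input) { return "result for " + input; }

        public static void main(String[] args) throws Exception {
            AnalysisService stub =
                    (AnalysisService) UnicastRemoteObject.exportObject(new AnalysisServer(), 0);
            Registry registry = LocateRegistry.createRegistry(1099);
            registry.rebind("analysis", stub);
        }
    }

    // JVM 2: look up the stub and call it as if it were a local object
    class AnalysisClient {
        public static void main(String[] args) throws Exception {
            Registry registry = LocateRegistry.getRegistry("localhost", 1099);
            AnalysisService service = (AnalysisService) registry.lookup("analysis");
            System.out.println(service.analyze("dataset-1"));
        }
    }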
The direction the world is going this week for cross-tier communication is SOA on top of something like Spring Integration or Fuse. SOA abstracts the mechanics of communication out of the equation, allowing you to hook things up on the fly (more or less).
MapReduce (MR) is a way of organizing batched work. The MR algorithm itself is essentially: turn the input data into a bunch of maps on input, then reduce them to the minimum amount necessary to produce an output. The MR environment is typically governed by a workload manager, which receives jobs and parcels out the work in those jobs to its "worker bees" scattered around the network. The communications mechanism may be defined by the MR library, or by the container(s) it runs in.
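To make the map-then-reduce idea concrete without any framework, here is a toy single-JVM word count; a real MapReduce environment distributes exactly these two phases across worker nodes (this is only an illustration, not the Hadoop or Mahout API):

    import java.util.Arrays;
    import java.util.Map;
    import java.util.function.Function;
    import java.util.stream.Collectors;

    public class WordCount {
        public static void main(String[] args) {
            String input = "to be or not to be";

            Map<String, Long> counts = Arrays.stream(input.split("\\s+"))
                    // "map" phase: emit one record per word
                    .collect(Collectors.groupingBy(Function.identity(),
                            // "reduce" phase: aggregate all records sharing a key
                            Collectors.counting()));

            System.out.println(counts); // e.g. {be=2, not=1, or=1, to=2}
        }
    }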
Does this help?

Re-architect application to run on multiple servers (JVMs) to enhance performance

I have a legacy in-house business application that runs in one JVM, and it has many performance issues, specifically around heap usage and running concurrent threads. At its core it is a scheduling application: the user schedules a task from the front end, and when the time arrives the task gets fired. All the code is home-grown and we are not using any third-party scheduler. My goal now is to improve the application's performance, and there are some options I can try, such as using a scheduling mechanism like Quartz, or distributing the application across different JVMs. The challenge is that I have never been exposed to this kind of re-architecting before, so I am not sure where to start. I know SO is not the right place to ask this type of question, but I am not sure how to approach it, and any help/suggestions would be highly appreciated.
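For reference, a minimal sketch of what the Quartz option mentioned above could look like (job and trigger names are hypothetical); Quartz would take over the firing logic that the home-grown scheduler currently implements:

    import org.quartz.Job;
    import org.quartz.JobBuilder;
    import org.quartz.JobDetail;
    import org.quartz.JobExecutionContext;
    import org.quartz.Scheduler;
    import org.quartz.SimpleScheduleBuilder;
    import org.quartz.Trigger;
    import org.quartz.TriggerBuilder;
    import org.quartz.impl.StdSchedulerFactory;

    public class SchedulingSketch {

        public static class UserTask implements Job {
            @Override
            public void execute(JobExecutionContext context) {
                // the work that currently "gets fired" by the home-grown scheduler
                System.out.println("Task fired at " + context.getFireTime());
            }
        }

        public static void main(String[] args) throws Exception {
            Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
            scheduler.start();

            JobDetail job = JobBuilder.newJob(UserTask.class)
                    .withIdentity("userTask")
                    .build();

            Trigger trigger = TriggerBuilder.newTrigger()
                    .withIdentity("userTaskTrigger")
                    .startNow()
                    .withSchedule(SimpleScheduleBuilder.simpleSchedule()
                            .withIntervalInMinutes(10)
                            .repeatForever())
                    .build();

            scheduler.scheduleJob(job, trigger);
        }
    }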
From reading your post I don't get the impression that you've really grasped what the underlying cause of your performance problems is. The first step in addressing any such problem should be to identify the cause before proposing a solution. I'd begin by asking some fairly high-level questions:
How many concurrent tasks/threads do you currently execute?
Are the jobs CPU or IO bound?
What software stack is the app running on?
What hardware is the app running on?
By distributing the application across multiple JVMs you will invariably add complexity, which is fine, provided it's a valid and required solution.
I suggest you exercise the application with a realistic workload so that the server is busy, and profile it to find CPU, memory, and resource bottlenecks.
IMHO: separating into multiple JVMs might be an option if you are using more than 1-8 GB of heap AND full GC times are an issue. If you are using much less than that, it's unlikely to help.
DON'T jump to any conclusions about what the solution should be until you have a very good understanding of the problem, or you can end up spending a lot of time optimising the wrong things and possibly making them worse.
