In our application remote Procedure call is solved with an own netty based command dispatcher system. We have a lot of modules (about 20) and I want to run all modules in separate jvm-s. My problem is, that RMI spawns about 17 threads for each JVM. I do not need RMI at all (as far as I know).
Can I completely disable RMI for a jvm? Or at least configure it in a way that it does not use this many threads?
You are most likely looking at your JVMs with a monitoring application, right? Well: these monitoring applications use RMI. So you will see always RMI threads within your monitoring application, profiler, etc. And you will always see them using some amount of CPU time. Just as gathering profiling information does not work without transporting these information (via RMI) to your tool. You can implement your own transportation protocol and direct the management beans to use it but I doubt that you can save enough resources to return your development costs.
If you don’t use any RMI, RMI won’t start any threads. But if you are using it, even if you aren’t aware of this, disabling RMI implies that your software will not work any more.
StuartMarks is correct. RMI doesn't start any threads until you use it.
Possibly you are using it in some way of which you are unaware, e.g. JMX?
Related
I am working on a platfor that hosts small Java applications, all of which currently uses a single thread, living inside a Docker engine, consuming data from a Kafka server and logging to a central DB.
Now, I need to put another Java application to this platform. This app at hand uses multithreading relatively heavily, I already tested it inside a Docker container and it works perfectly there, so I'm ready to deploy it on the platform where it would be scaled manually, that is, some human would define the number of containers that would be started, each of them containing an instance of this app.
My Architect has an objection, saying that "In a distributed environment we never use multithreading". So now, I have to refactor my application eliminating any thread related logic from it, making it single threaded. I requested a more detailed reasoning from him, but he yells "If you are not aware of this principle, you have no place near Java".
Is it really a mistake to use a multithreaded Java application in a distributed system - a simple cluster with ten or twenty physical machines, each hosting a number of virtual machines, which then runs Docker containers, with Java applications inside them.
Honestly, I don't see the problem of multithreading inside a container.
Is it really a mistake or somehow "forbidden"?
Thanks.
When you write for example a web application that will run in a Java EE application server, then normally you should not start up your own threads in your web application. The application server will manage threads, and will allocate threads to process incoming requests on the server.
However, there is no hard rule or reason why it is never a good idea to use multi-threading in a distributed environment.
There are advantages to making applications single-threaded: the code will be simpler and you won't have to deal with difficult concurrency issues.
But "in a distributed environment we never use multithreading" is not necessarily always true and "if you are not aware of this principle, you have no place near Java" sounds arrogant and condescending.
I guess he only tells you this as using a single thread eliminates multi threading and data ordering issues.
There is nothing wrong with multithreading though.
Distributed systems usually have tasks that are heavily I/O bound.
If I/O calls are blocking in your system
The only way to achieve concurrency within the process is spawning new threads to do other useful work. (Multi-threading).
The caveat with this approach is that, if they are too many threads
in flight, the operating system will spend too much time context
switching between threads, which is wasteful work.
If I/O calls are Non-Blocking in your system
Then you can avoid the Multi-threading approach and use a single thread to service all your requests. (read about event-loops or Java's Netty Framework or NodeJS)
The upside for single thread approach
The OS does not any wasteful thread context switches.
You will NOT run into any concurrency problems like dead locks or race conditions.
The downside is that
It is often harder to code/think in a non-blocking fashion
You typically end up using more memory in the form of blocking queues.
What? We use RxJava and Spring Reactor pretty heavily in our application and it works pretty fine. You can't work with threads across two JVMs anyway. So just make sure that your logic is working as you expect on a single JVM.
I'm making a desktop application in java and am doing some memory optimisations. That made me come across two threads running in the JVM, both named:
RMI TCP connection
And they're both contributing to heap growth quite considerably (at my perspective)
Now I don't know much, but TCP sounds to me like it's some internet thing. From what I've managed to find on google, it has something to do with serialization/deserialization over the internet.
But my application doesn't need the internet, so I would like to know two things:
what are they and what are they doing in my JVM?
Can I get rid of them somehow?
My tool has been "Java visualVM". A though has crossed my mind that the two threads are spawned as a result of using this tool, in which case I'll feel a bit stupid.
The threads are used to feed a remote JMX client (in your case Java VisualVM) with data from your JVM.
Once you disconnect the threads should not allocate so much data anymore.
To verify this you can go to the Threads tab and look at the thread dump of a RMI TCP Connection thread. You should see that the RMI operations trigger JMX beans.
RMI is a Java API, which allows you to divide the implementation of parts of the same application on multiple computers.
Do you use java.rmi library in your project?
I'm working on a Ruby on Rails app, currently hosted on Heroku.
We have about 5 web dynos and about 2 worker process running on average. But because we're using adeptscale these can change a lot, and the cost is increasing from month to month.
We're thinking about changing the process and the infrastructure (using our own, off of amazon/google etc). And also because of the performance, access to java libraries and other gains we're planning to go with jRuby.
I haven't got much experience with jRuby at all, but I do have Java experience. So I have a few questions:
Question intro: Since rails philosophy/approach differs from Javas, i.e ruby webserver uses far less memory but can only process one request at a time, and so having multiple servers sort of compensates the inability to process multiple requests.
If we go with jRuby (and have our rails project packaged as a war file and deployed to any servlet container i.e Tomcat or Jboss(more than just container)), will we be able to process multiple requests then?
Question intro: Currently we got some application logic running in the workers(instead of blocking the webserver, and not being able to serve other clients/browser clients). i.e when users submit some form and then our app needs to contact the 3rd party service to return the response, we simply let the worker do the workload of getting back from the 3rd party service and updating the ui (which reports waiting status) via websockets that the 3rd party service returned x/y or whatever status.
If we switch to jRuby, how will we achieve the similar logic? I mean do we go with the java code which has some kind of thread pool of workers and then free workers do the workload of contacting the 3rd party service etc? How would we go about this if we decide to go with jRuby?
1) You can serve multiple requests at a time in jruby with nearly any container, but you can also serve multiple requests at a time with mri-ruby. You only have to have a threadsafe app (config.threadsafe! is default in rails4). Different rack servers have different approaches to serve multiple requests at a time. For example unicorn uses multiple processes while passenger or puma go for a multi-threaded approach.
In my experience jruby containers like jboss or tomcat are more complicated to configure properly. But there are things like tourquebox, trinidad that help you with this. But you can even still go for some of the ruby servers (e.g. puma) that dont use c extensions.
2) If I understand you correctly you are looking for some background-processing library? You can use sidekiq or resque with ruby or jruby (while jruby will be faster in general, and its easier to debug memory leaks). You can even use ruby for your rack servers and jruby for your workers (can even be run in parallel with things like rvm/rbenv)
In general I would only go for the jruby option if you know what you are doing and need better performance for your app servers or if you want to speed up your worker servers. If I was you I would probably stay in the ruby world and use puma for your app and sidekiq as a background service. Both are very elegant and need not so much configuration.
Yes, JRuby uses Java threads and is really multithreaded. And I can say that it's really good in integration with Java, even using classes for JNI.
I can recommend next servers (some have already been mentioned):
puma (https://github.com/puma/puma)
any servlet container (even IBM WebSphere Application Server!) - just use warbler (https://github.com/jruby/warbler)
The 'simplest' way to run application on servlet container is make .war with warbler. Usually resulting .war file includes all dependencies and JRuby interpreter, so resulting file usually is 30 Mb. But I think that it is not so easy to setup warbler, then I wouldn't recommend this way if you don't really need to run Rails in enterprise Java environment.
And I would just remind that Rails opens DB connection for any request, then default size of DB connection pool of 5 isn't enough - don't forget to increase it before load testing :) (e.g. default thread pool for puma is 16, IBM WAS is 50, Tomcat - 200 threads).
I agree with smallbutton.com that puma is good choice. Finally, with puma you can switch between JRuby and other interpreter almost easy (in my experience there is one difference - gem's names)
My question: What approach could/should I take to communicate between two or more JVM instances that are running locally?
Some description of the problem:
I am developing a system for a project that requires separate JVM instances to isolate certain tasks from each other entirely.
In it's running, the 'parent' JVM will create 'child' JVMs that it will expect to execute and then return results to it (in the format of relatively simple POJO classes, or perhaps structured XML data). These results should not be transferred using the SysErr/SysOut/SysIn pipes as the child may already use these as part of its running.
If a child JVM does not respond with results within a certain time, the parent JVM should be able to signal to the child to cease processing, or to kill the child process. Otherwise, the child JVM should exit normally at the end of completing its task.
Research so far:
I am aware there are a number of technologies that may be of use e.g....
Using Java's RMI library
Using sockets to transfer objects
Using distribution libraries such as Cajo, Hessian
...but am interested in hearing what approaches others may consider before pursuing one of these options, or any others.
Thanks for any help or advice on this!
Edits:
Quantity of data to transfer- relatively small, it will mostly be just a handful of POJOs containing strings that will represent the result of the child executing. If any solution would be inefficient on larger amounts of information, this is unlikely to be a problem in my system. The amount being transferred should be pretty static and so this does not have to be scalable.
Latency of transfer- not a critical concern in this case, although if any 'polling' of results is needed this should be able to be fairly frequent without significant overheads, so I can maintain a responsive GUI on top of this at a later time (e.g. progress bar)
Not directly an answer to your question, but a suggestion of an alternative.
Have you considered OSGI?
It lets you run java projects in complete isolation from each other, within the SAME jvm.
The beauty of it is that communication between projects is very easy with services (see Core Specifications PDF page 123). This way there is not "serialization" of any sort being done as the data and calls are all in the same jvm.
Furthermore all your requirements of quality of service (response time etc...) go away - you only have to worry about whether the service is UP or DOWN at the time you want to use it. And for that you have a really nice specification that does that for you called Declarative Services (See Enterprise Spec PDF page 141)
Sorry for the off-topic answer, but I thought some other people might consider this as an alternative.
Update
To answer your question about security, I have never considered such a scenario. I don't believe there is a way to enforce "memory" usage within OSGI.
However there is a way of communicating outside of JVM between different OSGI runtimes. It is called Remote Services (see Enterprise Spec PDF, page 7). They also have nice discussion there of the factors to take into consideration when doing something like that (see 13.1 Fallacies).
Folks at Apache Felix (implementation of OSGI) I think have implementation of this with iPOJO, called Distributed Services with iPOJO (their wrapper to make using services easier). I've never used this - so ignore me if I am wrong.
I'd use KryoNet with local sockets since it specialises heavily in serialisation and is quite lightweight (you also get Remote Method Invocation! I'm using it right now), but disable the socket disconnection timeout.
RMI basically works on the principle that you have a remote type and that the remote type implements an interface. This interface is shared. On your local machine, you bind the interface via the RMI library to code 'injected' in-memory from the RMI library, the result being that you have something that satisfies the interface but is able to communicate with the remote object.
akka is another option, as well as other java actor frameworks, it provides communication and other goodies derived from the actor model.
If you can't use stdin/stdout, then i'd go with sockets. You need some sort of serialization layer on top of the sockets (as you would with stdin/stdout), and RMI is a very easy to use and pretty effective such layer.
If you used RMI and found the performance wasn't good enough, i'd switch to some more efficient serializer - there are plenty of options.
I wouldn't go anywhere near web services or XML. That seems like a complete waste of time, likely take more effort and deliver less performance than RMI.
Not many people seem to like RMI any longer.
Options:
Web Services. e.g. http://cxf.apache.org
JMX. Now, this is really a means of using RMI under the table, but it would work.
Other IPC protocols; you cited Hessian
Roll-your-own using sockets, or even shared memory. (Open a mapped file in the parent, open it again in the child. You'd still need something for synchronization.)
Examples of note are Apache ant (which forks all sorts of Jvms for one purpose or another), Apache maven, and the open source variant of the Tanukisoft daemonization kit.
Personally, I'm very facile with web services, so that's the hammer which which I tend to turn things into nails. A typical JAX-WS+JAX-B or JAX-RS+JAX-B service is very little code with CXF, and manages all the data serialization and deserialization for me.
It was mentioned above, but i wanted to expand a bit on the JMX suggestion. we actually are doing pretty much exactly what you are planning to do (from what i can glean from your various comments). we landed on using jmx for a variety of reasons, a few of which i'll mention here. for one thing, jmx is all about management, so in general it is a perfect fit for what you want to do (especially if you already plan on having jmx services for other management tasks). any effort you put into jmx interfaces will do double duty as apis you can call using java management tools like jvisualvm. this leads to my next point, which is the most relevant to what you want. the new Attach API in jdk 6 and above is very sweet. it enables you to dynamically discover and communicate with running jvms. this allows, for example, for your "controller" process to crash and restart and re-find all the existing worker processes. this is the makings of a very robust system. it was mentioned above that jmx basically rmi under the hood, however, unlike using rmi directly, you don't need to manage all the connection details (e.g. dealing with unique ports, discoverability, etc). the attach api is a bit of a hidden gem in the jdk, as it isn't very well documented. when i was poking into this stuff initially, i didn't know the name of the api, so figuring how the "magic" in jvisualvm and jconsole worked was very difficult. finally, i came across an article like this one, which shows how to actually use the attach api dynamically in your own program.
Although it's designed for potentially remote communication between JVMs, I think you'll find that Netty works extremely well between local JVM instances as well.
It's probably the most performant / robust / widely supported library of its type for Java.
A lot is discussed above. But be it sockets, rmi, jms - there is a lof of dirty work involved.
I would ratter advice akka. It is a actor based model which communicate with each other using Messages.
The beauty is, the actors can be on same JVM or another (very little config) and akka takes care the rest for you. I haven't seen a more cleaner way than doing this :)
Try out jGroups if the data to be communicated is not huge.
How about http://code.google.com/p/protobuf/
It is lightweight.
As you mentioned you can obviously send the objects over the network but that is a costly thing not to mention start up a separate JVM.
Another approach if you just want to separate your different worlds inside one JVM is to load the classes with different classloaders. ClassA#CL1!=ClassA#CL2 if they are loaded by CL1 and CL2 as sibling classloaders.
To enable communications between classA#CL1 and classA#CL2 you could have three classloaders.
CL1 that loads process1
CL2 that loads process2 (same classes as in CL1)
CL3 that loads communication classes (POJOs and Service).
Now you let CL3 be the parent classloader of CL1 and CL2.
In classes loaded by CL3 you can have a light-weight communication send/receive functionality (send(Pojo)/receive(Pojo)) the POJOs between classes in CL1 and classes in CL2.
In CL3 you expose a static service that enables implementations from CL1 and CL2 register to send and receive the POJOs.
We have a Java web application and we'd like to set up some basic monitoring with a view to expanding this monitoring in future. Our plan is as follows:
(1) Collect generic information (e.g. memory and threads) about the virtual machine of the web container that application is running in.
(2) Monitor the "state" of the application. This is rather vague but at the least we'd like to see if the web application is still alive and can respond to requests.
(3) In the future we'd like to collect more information that is specific to our application. Again this is rather vague but you can assume that we might want to make certain statistics collected internally by the application available to the support staff.
Usually the web application will be deployed in a Tomcat 5.5 or 6 environment. A quick bit of searching on the web shows that JMX can be enabled for Tomcat and that JConsole can then be used to connect to the server. This gives us lots of basic information that solves point (1). Also, some information is available in the MBeans section for "Catalina" and drilling down on this I can at least, for example, see how many requests a particular servlet has received. This is not quite what we want for point (2) but at least gives us some information. There seems to be quite a lot of information there but it's rather difficult to interpret using JConsole. Perhaps there is a better tool for interpreting the MBeans exposed by Tomcat.
For point (3), it seems, at first glance that we could write our own MBeans and then make these available to something like JConsole. Personally, this would involve me learning about JMX which I'm quite happy to do but I have a concern. Having looked around I notice that most of the textbooks on the subject haven't been updated for several years and the open source tools seem to be languishing without recent updates. So my main question is a simple one. What are your opinions on JMX? Does it have a future or is it/has it been superseded by something else? Given we already have our web application but we're starting from scratch for the management console, should we choose JMX or is there something more appropriate with a better future ahead of it?
I ask this question with no personal axe to grind, I'm simply interested to hear your opinions and experiences. I'm sure there's no one correct answer but I think an informed discussion would be useful.
Thanks in advance,
Adam.
JMX is certainly a good solution here. I wouldn't worry about it languishing. Most enterprises I've worked for recently use (or have plans to use) JMX, and I'd have to hear a pretty convincing argument before choosing something else in the Java world. It's easy to write clients (monitoring solutions) for it and you can return complex data very easily indeed. Most 3rd party components support monitoring via JMX as well.
Note that you may want to consider integration with any existing management solutions (e.g. Nagios, BNC Patrol, HP Openview etc.) as well. They may not be so Java-aware, but rather prefer tests like simple HTTP connectivity for testing if a web-site is up (easy using Nagios), or integration using SNMP (which Openview talks natively).
If applicable to your situation (Java 6 update 10 JDK or later, plus on the same machine) then consider using jvisualvm instead as it can dig even deeper than JConsole.
You may find that the easiest way to do what you need is a plugin to jvisualvm knowing your application