How to effectively manage a bunch of jar files and their plumbing?

This is a rather high-level question so apologies if it's off-topic. I'm new to the enterprise Java world.
Suppose I have written some individual Java packages that do things like parse data feeds and store the parsed information to a queue. Another package might read from that queue and ingest those entries into a rules engine package. Tripped alerts get fed into another queue, which is polled by an alerting service (assume it's written in Python) that reads from the queue and issues emails.
As it stands I have to manually run each jar file and stick it in the background. While I could probably daemonize some or all of these services for resiliency or write some kind of service manager to do the same, this strikes me as being very amateur. Especially since I'd have to start a dozen services for this single workflow at boot.
I feel like I'm missing something, but I don't know what I don't know. Short of writing one giant, monolithic application, what should I be looking into to help me manage all these discrete components and be able to (conceptually) deliver a holistic application? I'd like to end up with some sort of hypervisor where I can click one button, it starts/stops all the above services, provides me some visibility into their status and makes sure the services are running when they should.
Is this where frameworks come into play? I see a number of them but don't know if that's just overkill, especially if I'm not actively developing a solution for that framework.

It seems you architected a system with a lot of components, and then after some time you decided to aggregate some of them because they happen to share the same programming language: Java. So, first a warning: this is not the best way to wire components together.
Also, it seems you don't know Java very well, because you mix up terms like package, jar and executable, which are distinct concepts.
However, let's assume the current shape of the system is the best possible and is immutable. Your current requirement is building a graphical interface (I guess HTTP/HTML based) to manage all the distinct components of the system written in Java. I suggest you use a single JVM, write your components as EJBs (essentially a start(), a stop() and a method to query the component state that returns a custom object), and finally wire everything up with the Spring framework, which has nice annotation-driven configuration for @Bean definitions.
Spring Boot also has an Actuator module that simplifies exposing objects. You may also find it useful to register your beans as managed beans (MBeans) and use the Hawtio console to administer them (via a Jolokia agent).
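For illustration, here is a minimal sketch of one such component, assuming Spring Boot with the actuator and Jolokia dependencies on the classpath (FeedParserService and the JMX object name are made-up examples):

    import org.springframework.context.SmartLifecycle;
    import org.springframework.jmx.export.annotation.ManagedAttribute;
    import org.springframework.jmx.export.annotation.ManagedResource;
    import org.springframework.stereotype.Component;

    // One component of the workflow, with the start/stop/status lifecycle
    // described above, exposed over JMX so Hawtio/Jolokia can manage it.
    @Component
    @ManagedResource(objectName = "workflow:name=feedParser")
    public class FeedParserService implements SmartLifecycle {

        private volatile boolean running = false;

        @Override
        public void start() {
            // open feed connections, start worker threads, etc.
            running = true;
        }

        @Override
        public void stop() {
            // drain the queue, close connections, etc.
            running = false;
        }

        @ManagedAttribute(description = "Whether the component is currently running")
        @Override
        public boolean isRunning() {
            return running;
        }
    }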

I am not sure if you're actually using J2EE (i.e. Java Enterprise Edition). It is possible to write enterprise software in J2SE as well. J2SE doesn't have much available off the shelf for this, but there are a lot of micro-frameworks such as Ninja, and full-stack frameworks such as the Play framework, which work quite well, are much easier to program with, and perform much better than J2EE.
If you're not using J2EE, then you can go as simple as:
make one new Java project
add all the jars as dependencies of that project (see the comment on Maven above by NimChimpsky)
start the classes in the jars by simply calling their constructors
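A minimal sketch of such a launcher (the Component interface and the component names are made up; in practice the classes would come from the individual jars):

    public class WorkflowLauncher {

        interface Component {
            void start();
            void stop();
        }

        static Component named(String name) {
            return new Component() {
                public void start() { System.out.println(name + " started"); }
                public void stop()  { System.out.println(name + " stopped"); }
            };
        }

        public static void main(String[] args) {
            Component[] components = {
                named("feed-parser"), named("queue-ingester"), named("rules-engine")
            };
            for (Component c : components) c.start();

            // Stop everything in reverse order when the JVM exits.
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                for (int i = components.length - 1; i >= 0; i--) components[i].stop();
            }));
        }
    }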
This is quite a naive approach, but can serve you at this point. Of course, if you're aiming for a scalable platform, there is a lot more you need to learn first. For scalability, I suggest the Play! framework as a good start. Alternatively you can use Vert.x which has its own message queue implementation as well as support for high performance distributed caches.
The standard J2EE approach is doable (and considered the de-facto standard in many old-school enterprises) but has fundamental flaws, or "differences", which make for a very steep learning curve and applications that scale poorly.

It seems like you're writing your application in a microservice architecture.
You need an orchestrator.
If you are running everything on a single machine, a simple orchestrator that you are probably already running is systemd. You write systemd service descriptions, and systemd maintains your services accordingly. You can specify the order in which services should be brought up based on dependencies between them, the restart policy if a service goes down unexpectedly, logging for stdout/stderr, etc. Note that this is the same systemd that runs the startup sequence of most modern Linux distros.
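For example, a minimal sketch of a unit file for one of the jars (paths and names here are made up), say /etc/systemd/system/feed-parser.service:

    [Unit]
    Description=Feed parser service
    After=network.target

    [Service]
    ExecStart=/usr/bin/java -jar /opt/workflow/feed-parser.jar
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

Enable it at boot with systemctl enable feed-parser and start it with systemctl start feed-parser; the Restart= policy takes care of restarting it if it dies.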
If you're running multiple machines, you can still keep using a single-machine orchestrator like systemd, but usually the requirements for the orchestrator become more complex. With multiple machines, you now have to take into account things like moving services between machines, phased rollouts, etc. For these setups there is software that adapts systemd to multi-machine orchestration, like CoreOS's fleet, and there are standalone multi-machine orchestrators like Kubernetes. Both use Docker as the application container mechanism.
None of what I've described here is Java specific, which means you can use the same orchestration for Java as you used for Python or other languages or architecture.

You have to choose. As Raffaele suggested, you can write all your requirements into one app/service. That seems like a feasible mission using Java EJBs, or using Spring Integration's AmqpTemplate (you can write to a queue with AmqpTemplate and receive the message with a dedicated listener; see the linked example).
Or you can choose an implementation with a microservices architecture: write one service that pushes to the queue, another that contains the listener, and so on. This is a task that can be done easily with Spring Boot.
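As a rough sketch of that queue plumbing with Spring Boot (assuming spring-boot-starter-amqp on the classpath, which auto-configures the template and listener container; the "alerts" queue name is made up):

    import org.springframework.amqp.core.AmqpTemplate;
    import org.springframework.amqp.rabbit.annotation.RabbitListener;
    import org.springframework.stereotype.Component;

    @Component
    public class AlertChannel {

        private final AmqpTemplate amqpTemplate;

        public AlertChannel(AmqpTemplate amqpTemplate) {
            this.amqpTemplate = amqpTemplate;
        }

        // Producer side: push a tripped alert onto the queue.
        public void publish(String alert) {
            amqpTemplate.convertAndSend("alerts", alert);
        }

        // Consumer side: the dedicated listener mentioned above.
        @RabbitListener(queues = "alerts")
        public void onAlert(String alert) {
            System.out.println("Received alert: " + alert);
        }
    }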
"One button to control them all" - in the case of a monolithic app - it's easy.
If you choose a microservices architecture, it depends on what your needs are. If it's just the "start"/"stop" operations, I guess starting and stopping your Tomcat (or other) server will do. For other metrics there is a variety of solutions; again, it depends on your needs.

Related

Google Cloud Platform: are my architectural solutions correct?

I'm trying to make a simple application and deploy it on Google Cloud Platform Flexible App Engine; it will contain two main parts:
Front end application (simple Web UI based on Java 8 (Spring + Thymeleaf) with OAuth authorization from different external sites)
Back end application (monitoring several resources in separate threads, based on logged in users and reacting to their input in a certain way (behavioral changes))
Initially I was planning to make them one app, but I think that potentially heavy background processing may cause failures in the front-end part, plus the App Engine docs say that deployed services behave similarly to a microservice architecture.
My questions are:
Do I really need to separate front end from back end, if I need to react to user input as fast as possible? (but delays up to 2 seconds aren't that critical)
If I do need to separate them (and I strongly believe that I do) - how do I set up the interaction between the applications?
Each resource must be tracked by exactly one thread on the back end - what are the best practices around this? I thought about having a SQL table with a list of acquired resources, but the flaw I see there is that if an instance fails I will need to do some kind of cleanup on that table and redetermine which resources are actually acquired.
Your proposed architecture sounds like the right approach in separating the two into different services for the following reasons:
Can deploy code for each separately, rollback versions separately, and split traffic separately for experiments or phased rollouts.
Can adjust machine types and memory allocations for each service to better suit its needs. If you're doing work that is memory intensive on the backend, you can adjust that service's settings to allocate more memory per instance.
Allow each type of service to scale independently based on demand, which should result in better utilization of the services and less waste. This should also lower your overall spending compared to a one-size-fits-all approach in a single monolithic service.
You can mix different runtime environments across services. For example, you can mix language runtimes within a project, or you could even mix standard and flexible environments. Say your front-end code is more cost efficient in standard: designate that service as a standard environment service and your back end as a flexible environment service. Or say you need a custom Dockerfile with Perl in it: you could do that as a flexible environment custom runtime and have your front end in Java 8.
You can still share common services like Cloud SQL, Pub/Sub, Cloud Tasks (currently in alpha) or Redis for in-memory caching. Your workers don't need to reside in App Engine; they could reside in a different product if that better suits your needs.
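For example, a minimal sketch of the front end handing work to the back end through Pub/Sub, using the google-cloud-pubsub Java client (project ID, topic name and payload are made up):

    import com.google.cloud.pubsub.v1.Publisher;
    import com.google.protobuf.ByteString;
    import com.google.pubsub.v1.PubsubMessage;
    import com.google.pubsub.v1.TopicName;

    public class FrontendToBackend {
        public static void main(String[] args) throws Exception {
            TopicName topic = TopicName.of("my-project", "user-input");
            Publisher publisher = Publisher.newBuilder(topic).build();
            try {
                PubsubMessage message = PubsubMessage.newBuilder()
                        .setData(ByteString.copyFromUtf8("{\"userId\":42,\"action\":\"refresh\"}"))
                        .build();
                // publish() is asynchronous; get() blocks until the message id returns.
                String messageId = publisher.publish(message).get();
                System.out.println("Published " + messageId);
            } finally {
                publisher.shutdown();
            }
        }
    }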
Overall, you get much better control over your application to split it apart. The biggest benefit likely comes down to optimizing your application for spending only on what you need.
I think that you will likely be able to deploy everything as an App Engine app, unless you use some exotic Java libraries that are not whitelisted. It could still be good to deploy it with Compute Engine for increased configurability and versatility.
You can create one front-end instance and one back-end instance in Compute Engine and divide the resources between them like that. Google's documentation has an example of how to do this.

How to build a distributed java application?

First of all, I have a conceptual question: does the word "distributed" only mean that the application runs on multiple machines? Or are there other ways an application can be considered distributed (for example, if there are many independent modules interacting together but on the same machine, is this distributed?)?
Second, I want to build a system which executes four types of tasks. There will be multiple customers, and each one will have many tasks of each type to be run periodically. For example: customer1 will have task_type1 today, task_type2 after two days, and so on; there might be a customer2 who has task_type1 to be executed at the same time as customer1's task_type1, i.e. there is a need for concurrency. Configuration for executing the tasks will be stored in the DB, and the outcomes of these tasks will be stored in the DB as well. The customers will use the system from a web browser (HTML pages) to interact with it (basically, to configure tasks and see the outcomes).
I thought about using a REST web service (using JAX-RS) that the HTML pages would communicate with, and using threads on the back end for concurrent execution.
Questions:
This sounds simple, but am I going in the right direction? Or should I be using other technologies or concepts, like Java Beans for example?
If my approach is fine, do I need to use a templating technology like JSP, or can I submit HTML forms directly to the REST URLs and get the result back (as JSON, for example)?
If I want to make the application distributed, is it possible with my idea? If not, what would I need to use?
Sorry for having many questions , but I am really confused about this.
I just want to add one point to the already posted answers. Please take my remarks with a grain of salt, since all the web applications I have ever built have run on one server only (aside from applications deployed to Heroku, which may "distribute" your application for you).
If you feel that you may need to distribute your application for scalability, the first thing you should think about is not web services and multithreading and message queues and Enterprise JavaBeans and...
The first thing to think about is your application domain itself and what the application will be doing. Where will the CPU-intensive parts be? What dependencies are there between those parts? Do the parts of the system naturally break down into parallel processes? If not, can you redesign the system to make it so? IMPORTANT: what data needs to be shared between threads/processes (whether they are running on the same or different machines)?
The ideal situation is where each parallel thread/process/server can get its own chunk of data and work on it without any need for sharing. Even better is if certain parts of the system can be made stateless -- stateless code is infinitely parallelizable (easily and naturally). The more frequent and fine-grained data sharing between parallel processes is, the less scalable the application will be. In extreme cases, you may not even get any performance increase from distributing the application. (You can see this with multithreaded code -- if your threads constantly contend for the same lock(s), your program may even be slower with multiple threads+CPUs than with one thread+CPU.)
The conceptual breakdown of the work to be done is more important than what tools or techniques you actually use to distribute the application. If your conceptual breakdown is good, it will be much easier to distribute the application later if you start with just one server.
The term "distributed application" means that parts of the application system will execute on different computational nodes (which may be different CPU/cores on different machines or among multiple CPU/cores on the same machine).
There are many different technological solutions to the question of how the system could be constructed. Since you were asking about Java technologies, you could, for example, build the web application using Google's Web Toolkit, which will give you a rich browser based client user experience. For the server deployed parts of your system, you could start out using simple servlets running in a servlet container such as Tomcat. Your servlets will be called from the browser using HTTP based remote procedure calls.
Later, if you run into scalability problems, you can start to migrate parts of the business logic to EJB3 components, which can themselves ultimately be deployed on many computational nodes within the context of an application server, like GlassFish, for example. I don't think you need to tackle this problem until you run into it. It is hard to say whether you will without knowing more about the nature of the tasks the customers will be performing.
To answer your question about forms - you could have the form submit directly to the REST URLs. Obviously it depends on your exact requirements.
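For illustration, a minimal sketch of such a resource, assuming a JAX-RS 2.x runtime such as Jersey (the path, form fields and pool size are made up):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import javax.ws.rs.Consumes;
    import javax.ws.rs.FormParam;
    import javax.ws.rs.POST;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    @Path("/tasks")
    public class TaskResource {

        // Shared pool for executing customer tasks concurrently.
        private static final ExecutorService POOL = Executors.newFixedThreadPool(4);

        @POST
        @Consumes(MediaType.APPLICATION_FORM_URLENCODED)
        @Produces(MediaType.APPLICATION_JSON)
        public String submit(@FormParam("customerId") String customerId,
                             @FormParam("taskType") String taskType) {
            POOL.submit(() -> {
                // run the task and store its outcome in the DB
            });
            return "{\"status\":\"queued\",\"customer\":\"" + customerId + "\"}";
        }
    }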
As @AlexD mentioned in the comments above, you don't always need to distribute an application. However, if you wish to do so, you should probably consider looking at JMS, a messaging API that lets you run almost any number of worker machines, reading messages from the message queue and processing them.
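A minimal sketch of such a worker, assuming the JMS provider (e.g. ActiveMQ) supplies the ConnectionFactory and that the queue is named "tasks":

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;
    import javax.jms.MessageConsumer;
    import javax.jms.Session;

    public class TaskWorker {
        public static void run(ConnectionFactory factory) throws JMSException {
            Connection connection = factory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageConsumer consumer = session.createConsumer(session.createQueue("tasks"));
            consumer.setMessageListener(message -> {
                // process the task described by the message, then store the outcome
            });
            connection.start(); // begin message delivery
        }
    }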
If you wanted to produce a dynamically distributed application, running on, say, multiple low-resource VMs (such as Amazon EC2 Micro instances) or physical hardware that can be added and removed at will to cope with demand, then you might wish to consider integrating it with Project Shoal, a Java framework that allows for clustering of application nodes and having them appear/disappear at any time. Project Shoal uses JXTA and JGroups as the underlying communication protocols.
Another route could be to distribute your application using EJBs running on an application server.

Invoking a service on other java application running on the same machine

I created a command line interface on a small java application I created for personal use.
For the moment the CLI resides in the same project as the original application, but I'm planning to extract it into its own project, effectively building two separate executable jars, enabling me to start the CLI as needed and query the other running program for information.
I'm trying to figure out the easiest and most lightweight solution to call a remote service, on the same machine.
I looked at Spring Remoting, but many of the provided solutions such as HttpInvoker, Hessian/Burlap and JAX-RPC web services are based on HTTP or SOAP and are therefore not suited for the job.
JMS also seems like overkill.
This leaves me with RMI, which looks rather heavyweight, and possibly JMX?
Suggestions?
JMX uses RMI underneath for remote access. JMX is meant for exposing admin APIs (monitoring/management); it is not intended as a general-purpose remoting API.
RMI with the Spring Remoting support is fairly lightweight from a development point of view. Even at runtime it is the option that adds the least overhead compared to the other options you have listed.
Also, with the Spring Remoting support you can easily switch over to a different option later if required.
Take a look at this article, which compares/benchmarks the performance of the above options.
I'd say it depends very much on where the project/functionality is heading. JMX is easy enough to set up, and you can make use of existing clients/GUIs to query and set parameters - this may save you a lot of work. It may also allow your system to integrate with existing monitoring tools.
If, on the other hand, the functionality has little to do with management/monitoring and is more along the lines of pumping data in and out, one option may be Apache MINA. I've used it in the past with great results. But you'll effectively be creating your own protocol! I doubt that MINA will end up being "less heavyweight" than simple RMI, though.
In an app for personal use, I'd go with JMX because it should be the path of least resistance. I've had great experiences with this in the past. You'll be able to get it up and running in minutes, and you won't have to think about what message format to move data in (as long as your beans are Serializable, that is).
Put an interface in front of the remote call, so that you can drop in another implementation later if JMX turns out to be inadequate.
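For illustration, a minimal sketch of that idea (the interface names and the object name are made up; the connector URL would be the standard service:jmx:rmi:... address of the running application):

    import javax.management.JMX;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // The CLI codes against this interface, not against JMX directly.
    interface QueryService {
        String status();
    }

    // The management interface the running application registers as an MBean.
    interface StatusMBean {
        String getStatus();
    }

    // JMX-backed implementation; swap in something else later if needed.
    public class JmxQueryService implements QueryService {

        private final StatusMBean proxy;

        public JmxQueryService(String url) throws Exception {
            JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url));
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            proxy = JMX.newMBeanProxy(connection, new ObjectName("app:type=Status"),
                    StatusMBean.class);
        }

        @Override
        public String status() {
            return proxy.getStatus();
        }
    }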

Communication between local JVMs

My question: What approach could/should I take to communicate between two or more JVM instances that are running locally?
Some description of the problem:
I am developing a system for a project that requires separate JVM instances to isolate certain tasks from each other entirely.
As it runs, the 'parent' JVM will create 'child' JVMs that it will expect to execute and then return results to it (in the form of relatively simple POJO classes, or perhaps structured XML data). These results should not be transferred over the SysErr/SysOut/SysIn pipes, as the child may already use these as part of its running.
If a child JVM does not respond with results within a certain time, the parent JVM should be able to signal to the child to cease processing, or to kill the child process. Otherwise, the child JVM should exit normally at the end of completing its task.
Research so far:
I am aware there are a number of technologies that may be of use e.g....
Using Java's RMI library
Using sockets to transfer objects
Using distribution libraries such as Cajo, Hessian
...but am interested in hearing what approaches others may consider before pursuing one of these options, or any others.
Thanks for any help or advice on this!
Edits:
Quantity of data to transfer- relatively small, it will mostly be just a handful of POJOs containing strings that will represent the result of the child executing. If any solution would be inefficient on larger amounts of information, this is unlikely to be a problem in my system. The amount being transferred should be pretty static and so this does not have to be scalable.
Latency of transfer- not a critical concern in this case, although if any 'polling' of results is needed this should be able to be fairly frequent without significant overheads, so I can maintain a responsive GUI on top of this at a later time (e.g. progress bar)
Not directly an answer to your question, but a suggestion of an alternative.
Have you considered OSGI?
It lets you run java projects in complete isolation from each other, within the SAME jvm.
The beauty of it is that communication between projects is very easy with services (see the Core Specifications PDF, page 123). This way there is no "serialization" of any sort being done, as the data and calls all stay in the same JVM.
Furthermore, all your quality-of-service requirements (response time etc.) go away - you only have to worry about whether the service is UP or DOWN at the time you want to use it. And for that there is a really nice specification that handles it for you, called Declarative Services (see the Enterprise Spec PDF, page 141).
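For illustration, a minimal sketch using the Declarative Services annotations (org.osgi.service.component.annotations, DS 1.3+ for field references; ResultSink is a made-up service interface shared between bundles):

    import org.osgi.service.component.annotations.Activate;
    import org.osgi.service.component.annotations.Component;
    import org.osgi.service.component.annotations.Reference;

    // A service interface shared between bundles (exported by a common bundle).
    interface ResultSink {
        void accept(String result);
    }

    // A component in another bundle; DS injects the service when it is UP.
    @Component
    class ChildTask {

        @Reference
        private ResultSink sink;

        @Activate
        void activate() {
            // Same JVM, no serialization: a plain method call across bundles.
            sink.accept("task finished");
        }
    }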
Sorry for the off-topic answer, but I thought some other people might consider this as an alternative.
Update
To answer your question about security, I have never considered such a scenario. I don't believe there is a way to enforce "memory" usage within OSGI.
However, there is a way of communicating between different OSGi runtimes outside of the JVM. It is called Remote Services (see the Enterprise Spec PDF, page 7). There is also a nice discussion there of the factors to take into consideration when doing something like that (see 13.1 Fallacies).
The folks at Apache Felix (an OSGi implementation) have, I think, an implementation of this with iPOJO, called Distributed Services with iPOJO (their wrapper that makes using services easier). I've never used it, so ignore me if I am wrong.
I'd use KryoNet with local sockets since it specialises heavily in serialisation and is quite lightweight (you also get Remote Method Invocation! I'm using it right now), but disable the socket disconnection timeout.
RMI basically works on the principle that you have a remote type and that the remote type implements an interface. This interface is shared. On your local machine, you bind the interface via the RMI library to code 'injected' in-memory from the RMI library, the result being that you have something that satisfies the interface but is able to communicate with the remote object.
Akka is another option, as are other Java actor frameworks; it provides communication and other goodies derived from the actor model.
If you can't use stdin/stdout, then I'd go with sockets. You need some sort of serialization layer on top of the sockets (as you would with stdin/stdout), and RMI is a very easy to use and pretty effective such layer.
If you used RMI and found the performance wasn't good enough, I'd switch to a more efficient serializer - there are plenty of options.
I wouldn't go anywhere near web services or XML. That seems like a complete waste of time, likely to take more effort and deliver less performance than RMI.
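For illustration, a minimal sketch of that RMI layer (names and port are made up, and parent and child are collapsed into one main() here just to show the calls; normally they would be separate JVMs):

    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.registry.LocateRegistry;
    import java.rmi.registry.Registry;
    import java.rmi.server.UnicastRemoteObject;

    public class RmiSketch {

        // The remote interface shared between parent and child JVMs.
        public interface ResultService extends Remote {
            void submit(String result) throws RemoteException;
        }

        public static void main(String[] args) throws Exception {
            // Parent side: export a ResultService and register it locally.
            ResultService impl = result -> System.out.println("child says: " + result);
            ResultService stub = (ResultService) UnicastRemoteObject.exportObject(impl, 0);
            Registry registry = LocateRegistry.createRegistry(1099);
            registry.rebind("results", stub);

            // Child side (normally a separate JVM): look the service up and call it.
            ResultService remote = (ResultService)
                    LocateRegistry.getRegistry("localhost", 1099).lookup("results");
            remote.submit("done");
        }
    }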
Not many people seem to like RMI any longer.
Options:
Web Services. e.g. http://cxf.apache.org
JMX. Now, this is really a means of using RMI under the table, but it would work.
Other IPC protocols; you cited Hessian
Roll-your-own using sockets, or even shared memory. (Open a mapped file in the parent, open it again in the child. You'd still need something for synchronization.)
Examples of note are Apache Ant (which forks all sorts of JVMs for one purpose or another), Apache Maven, and the open source variant of the Tanukisoft daemonization kit.
Personally, I'm very facile with web services, so that's the hammer with which I tend to turn things into nails. A typical JAX-WS+JAXB or JAX-RS+JAXB service is very little code with CXF, and it manages all the data serialization and deserialization for me.
It was mentioned above, but I wanted to expand a bit on the JMX suggestion. We are actually doing pretty much exactly what you are planning to do (from what I can glean from your various comments), and we landed on JMX for a variety of reasons, a few of which I'll mention here. For one thing, JMX is all about management, so in general it is a perfect fit for what you want to do (especially if you already plan on having JMX services for other management tasks). Any effort you put into JMX interfaces will do double duty as APIs you can call using Java management tools like jvisualvm.
This leads to my next point, which is the most relevant to what you want: the new Attach API in JDK 6 and above is very sweet. It enables you to dynamically discover and communicate with running JVMs. This allows, for example, your "controller" process to crash, restart and re-find all the existing worker processes - the makings of a very robust system. It was mentioned above that JMX is basically RMI under the hood; however, unlike using RMI directly, you don't need to manage all the connection details (e.g. dealing with unique ports, discoverability, etc.).
The Attach API is a bit of a hidden gem in the JDK, as it isn't very well documented. When I was poking into this stuff initially, I didn't know the name of the API, so figuring out how the "magic" in jvisualvm and jconsole worked was very difficult. Finally, I came across an article like this one, which shows how to actually use the Attach API dynamically in your own program.
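For illustration, a minimal sketch of that discovery step (requires tools.jar on the classpath on JDK 6-8, or the jdk.attach module on JDK 9+; it lists running JVMs and asks each for its local JMX connector address):

    import com.sun.tools.attach.VirtualMachine;
    import com.sun.tools.attach.VirtualMachineDescriptor;

    public class WorkerFinder {
        public static void main(String[] args) throws Exception {
            for (VirtualMachineDescriptor vmd : VirtualMachine.list()) {
                System.out.println(vmd.id() + " " + vmd.displayName());
                VirtualMachine vm = VirtualMachine.attach(vmd.id());
                try {
                    // The local JMX connector address, if a management agent is running.
                    String address = vm.getAgentProperties()
                            .getProperty("com.sun.management.jmxremote.localConnectorAddress");
                    System.out.println("  connector: " + address);
                } finally {
                    vm.detach();
                }
            }
        }
    }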
Although it's designed for potentially remote communication between JVMs, I think you'll find that Netty works extremely well between local JVM instances as well.
It's probably the most performant / robust / widely supported library of its type for Java.
A lot has been discussed above. But be it sockets, RMI or JMS, there is a lot of dirty work involved.
I would rather advise Akka. It is an actor-based model in which actors communicate with each other using messages.
The beauty is that the actors can be on the same JVM or on another one (very little configuration needed), and Akka takes care of the rest for you. I haven't seen a cleaner way of doing this :)
Try out JGroups if the data to be communicated is not huge.
How about http://code.google.com/p/protobuf/
It is lightweight.
As you mentioned, you can obviously send the objects over the network, but that is a costly thing, not to mention the cost of starting up a separate JVM.
Another approach, if you just want to separate your different worlds inside one JVM, is to load the classes with different classloaders: ClassA#CL1 != ClassA#CL2 if they are loaded by sibling classloaders CL1 and CL2.
To enable communications between classA#CL1 and classA#CL2 you could have three classloaders.
CL1 that loads process1
CL2 that loads process2 (same classes as in CL1)
CL3 that loads communication classes (POJOs and Service).
Now you let CL3 be the parent classloader of CL1 and CL2.
In classes loaded by CL3 you can have a light-weight send/receive facility (send(Pojo)/receive(Pojo)) for passing the POJOs between classes in CL1 and classes in CL2.
In CL3 you expose a static service that lets implementations from CL1 and CL2 register to send and receive the POJOs.
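For illustration, a minimal sketch of that arrangement (the jar paths and the class name are made up):

    import java.net.URL;
    import java.net.URLClassLoader;

    public class IsolationDemo {
        public static void main(String[] args) throws Exception {
            // CL3: the shared communication classes (POJOs and the service).
            URLClassLoader cl3 = new URLClassLoader(
                    new URL[] { new URL("file:lib/comm.jar") }, null);

            // CL1 and CL2: sibling loaders for the two "processes", both with
            // CL3 as parent, so they share the POJOs but nothing else.
            URLClassLoader cl1 = new URLClassLoader(
                    new URL[] { new URL("file:lib/process.jar") }, cl3);
            URLClassLoader cl2 = new URLClassLoader(
                    new URL[] { new URL("file:lib/process.jar") }, cl3);

            Class<?> a1 = cl1.loadClass("com.example.ClassA");
            Class<?> a2 = cl2.loadClass("com.example.ClassA");
            System.out.println(a1 == a2); // false: one class per defining loader
        }
    }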

How do I decide between a using a Swing GUI or a light-weight web client for the user front end of my Java application?

I always seem to have this internal struggle when it comes to user interface. I build up an application "engine" and tend to defer user interface to after I get my algorithms working. Then I go back and forth trying to decide how to let a user interact with my program. Personally, I'm a fan of the command line, but I can't expect that of my users generally.
I really like what's possible in the browser in the age of web 2.0 and ajax. On the other hand it's not so hard to make a Swing front-end either, and you can generally count on a more consistent presentation to the user (though using a good javascript framework like YUI or jQuery goes a long way toward normalizing browsers).
Clearly both approaches have their merits and drawbacks. So, what criteria / parameters / situations should lead me to use a lightweight (e.g. web-based) GUI? What criteria / parameters / situations should lead me to use a heavier (e.g. Swing-based) GUI?
It is not my intent to start a flame war, merely interested in the community's constructive/objective opinions.
Edit #1
In light of the first few responses, I would like to clarify that I would like to deploy my application regardless, not host it on some internet server necessarily. So I would have to deploy with a light-weight web-server infrastructure a la Jetty/Tomcat or similar.
It depends on the application and this is essentially a usability driven question (though there are considerations like data storage and platform requirements). Think of the pros and cons.
Pros of a lightweight Web UI:
Ease of distribution
Platform independent
Ease of maintenance
Cons of a lightweight Web UI:
Less environmental control
Markup standards vary between browsers
Requires a web server and everything that goes with it
Pros of an executable UI
More environmental control (i.e.: full screen applications, etc)
Not necessarily subject to latency and outages
Cons of an executable UI
Pushing updates may be more difficult
Requires installation
Potential platform requirements (frameworks, packages, etc)
Potentially requires knowledge of advanced networking topics (web services, etc)
One small factor you may want to consider is that the user will have to go through some type of installation (albeit minimal) if you distribute a Swing application.
Also, a web application will allow you to accurately track usage of your application (via Google Analytics or something similar). Not sure if that's a concern, but it may be useful to you in the future.
If it is a client-server application I would normally go for a web frontend for the application.
You will save yourself countless problems with things like installed JRE versions, distributing upgrades, application permissions, disappearing shortcuts...
You need to break the requirements of the application down to decide this...
Do the users have Java of sufficient version installed? It will need to be, to run a Swing GUI.
Do you have a web server?
Do you need the flexibility of a Swing GUI or the accessibility of the web interface?
Is Java Web Start an option? If so, you can distribute a Swing GUI via the web.
Does your application perform extensive calculations or processing? If so, a client app may be the answer.
There are a million questions such as these. I would suggest a brainstorming session, keeping track of all the pros and cons of each, adding a point score, then throwing it all away and going with your gut feeling :)
If you anticipate frequent updates to the app, then web-based may be better, since the user would not have to update or reinstall the client to pick up the changes.
If you think that the user may need to use the app while not connected to the internet, then Swing would be better.
Just two things off the top of my head.
Think about the users and use cases of your project.
Do users expect to have access to it when they're disconnected from the Internet (for example, on an airplane or in a coffee shop with no Internet access)? Use Swing.
Do you want users to be able to access the same tool from different computers (for example, both at work and at home)? Use a web UI.
Also consider whether the user needs to save and load data, and whether the tool produces data files that some might consider sensitive (if so, storage on the web might be an issue).
To make a quick guess, I often ask myself/the customers whether the application has a high "write" demand.
For a mostly read-only application, a thin-client solution is perfectly well suited.
But if a lot of write actions are needed, then a Swing desktop application has more flexibility.
Personally, I always prefer a Swing desktop application. It can easily be deployed using Java Web Start.
Not knowing anything about your application, I can't give the best recommendation possible. However, I can state from personal/professional experience that installing an application on client machines is a LOT more of a pain in the ass than it seems.
With AJAX/web you really only have to worry about supporting three or so browsers. Installation messes/updates are only felt once, when you deploy the product to the web server.
With a stand-alone Swing app, you get to deal with the really, really big mess that is installing the application onto unknown systems. This mess was so bad that things like AJAX were pushed along specifically to make web apps behave/feel like real native apps.
