How to build a distributed Java application?

First of all, I have a conceptual question: does the word "distributed" only mean that the application runs on multiple machines, or are there other ways in which an application can be considered distributed (for example, if many independent modules interact with each other on the same machine, is that distributed)?
Second, I want to build a system that executes four types of tasks. There will be multiple customers, and each one will have many tasks of each type to be run periodically. For example, customer1 will have task_type1 today, task_type2 two days later, and so on; customer2 might have a task_type1 that must be executed at the same time as customer1's task_type1, i.e. there is a need for concurrency. The configuration for executing the tasks will be stored in a DB, and the outcomes of these tasks will be stored in the DB as well. The customers will use the system from a web browser (HTML pages) to interact with it (basically, to configure tasks and see the outcomes).
I thought about using a REST web service (using JAX-RS) that the HTML pages would communicate with, and using threads on the backend for concurrent execution.
Questions:
1. This sounds simple, but am I going in the right direction? Or should I be using other technologies or concepts, like Java Beans for example?
2. If my approach is fine, do I need to use a scripting language like JSP, or can I submit HTML forms directly to the REST URLs and get the result (using JSON for example)?
3. If I want to make the application distributed, is it possible with my idea? If not, what would I need to use?
Sorry for having so many questions, but I am really confused about this.

I just want to add one point to the already posted answers. Please take my remarks with a grain of salt, since all the web applications I have ever built have run on one server only (aside from applications deployed to Heroku, which may "distribute" your application for you).
If you feel that you may need to distribute your application for scalability, the first thing you should think about is not web services and multithreading and message queues and Enterprise JavaBeans and...
The first thing to think about is your application domain itself and what the application will be doing. Where will the CPU-intensive parts be? What dependencies are there between those parts? Do the parts of the system naturally break down into parallel processes? If not, can you redesign the system to make it so? IMPORTANT: what data needs to be shared between threads/processes (whether they are running on the same or different machines)?
The ideal situation is where each parallel thread/process/server can get its own chunk of data and work on it without any need for sharing. Even better is if certain parts of the system can be made stateless -- stateless code is infinitely parallelizable (easily and naturally). The more frequent and fine-grained data sharing between parallel processes is, the less scalable the application will be. In extreme cases, you may not even get any performance increase from distributing the application. (You can see this with multithreaded code -- if your threads constantly contend for the same lock(s), your program may even be slower with multiple threads+CPUs than with one thread+CPU.)
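To make the "own chunk of data, no sharing" idea concrete, here is a minimal sketch in plain Java. The class name ChunkedSum and the summing workload are made up for illustration. Each worker gets a private slice of the input and a private accumulator, and the partial results are combined only once at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: each worker sums its own slice of the input and shares nothing.
class ChunkedSum {

    // Splits data into `workers` contiguous slices and sums each slice on its own thread.
    static long parallelSum(int[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (data.length + workers - 1) / workers; // ceiling division
            List<Future<Long>> partials = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(data.length, w * chunk);
                final int to = Math.min(data.length, from + chunk);
                Callable<Long> task = () -> {
                    long local = 0; // private accumulator, so no lock is needed
                    for (int i = from; i < to; i++) {
                        local += data[i];
                    }
                    return local;
                };
                partials.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // results are combined only once, at the end
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4)); // prints 500500 (the sum of 1..1000)
    }
}
```

Because the slices never overlap, the same breakdown would carry over to separate processes or machines: only the slice boundaries go out and only one number per worker comes back.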
The conceptual breakdown of the work to be done is more important than what tools or techniques you actually use to distribute the application. If your conceptual breakdown is good, it will be much easier to distribute the application later if you start with just one server.

The term "distributed application" means that parts of the application system will execute on different computational nodes (which may be different CPU/cores on different machines or among multiple CPU/cores on the same machine).
There are many different technological solutions to the question of how the system could be constructed. Since you were asking about Java technologies, you could, for example, build the web application using Google's Web Toolkit, which will give you a rich browser based client user experience. For the server deployed parts of your system, you could start out using simple servlets running in a servlet container such as Tomcat. Your servlets will be called from the browser using HTTP based remote procedure calls.
Later, if you run into scalability problems, you can start to migrate parts of the business logic to EJB3 components, which can ultimately be deployed on many computational nodes within the context of an application server such as Glassfish. I don't think you need to tackle this problem until you run into it. It is hard to say whether you will without knowing more about the nature of the tasks the customers will be performing.

To answer your first question - you could get the form to submit directly to the REST URLs. Whether that is appropriate depends on your exact requirements.
As @AlexD mentioned in the comments above, you don't always need to distribute an application. If you do wish to, you should probably consider JMS, a messaging API that lets you run almost any number of worker machines, each reading messages from the message queue and processing them.
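The worker pattern described here can be sketched without a JMS broker. The example below is not JMS: it uses java.util.concurrent.BlockingQueue as an in-process stand-in for the message queue, and the names QueueWorkers, process and the STOP marker are invented for illustration. With a real broker, each worker would be a separate process consuming from a shared queue instead of a thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-process stand-in for "N workers reading from one message queue" (not actual JMS).
class QueueWorkers {
    // Hypothetical marker message telling a worker to shut down.
    static final String STOP = "__STOP__";

    static List<String> process(List<String> messages, int workerCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> results = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < workerCount; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        String message = queue.take(); // blocks until a message arrives
                        if (message.equals(STOP)) {
                            return;
                        }
                        results.add("done:" + message); // placeholder for the real task
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.start();
            workers.add(worker);
        }

        queue.addAll(messages);
        for (int i = 0; i < workerCount; i++) {
            queue.add(STOP); // one stop marker per worker, queued after all real messages
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Result order depends on thread scheduling, so it may vary between runs.
        System.out.println(process(Arrays.asList("task-1", "task-2", "task-3"), 2));
    }
}
```

The producer never needs to know how many workers exist or where they run, which is what makes it possible to add or remove worker machines later.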
If you wanted to produce a dynamically distributed application, to run on say, multiple low-resourced VMs (such as Amazon EC2 Micro instances) or physical hardware, that can be added and removed at will to cope with demand, then you might wish to consider integrating it with Project Shoal, which is a Java framework that allows for clustering of application nodes, and having them appear/disappear at any time. Project Shoal uses JXTA and JGroups as the underlying communication protocol.
Another route could be to distribute your application using EJBs running on an application server.

Related

What is the difference between the microservices and monolithic approaches for the provided use case?

So I started reading about different software architectures and inevitably came across the microservices architecture. Yet I am wondering how these architectures differ from each other. In a monolithic approach I would, for example, modify a pom.xml to take my different layers and pack them into one application to deploy. I'd say this might even be the most common way to set up a simple application.
Now, as I understand microservices, you separate the services from each other and let them run independently. For me that means that every service is deployed on its own (so you basically have a Tomcat running with quite a lot of .war files on it). But I think I am missing the difference from a monolithic application.
I am going to try to give a (quite common) example:
You have a frontend (e.g. Angular) with a Spring Boot backend, communicating via REST services. Now I take a pom.xml and do the following steps:
build the Frontend-Application
include the necessary JS-files into my Spring-Application
create WAR-file from the project
As a result I get one single WAR file that I can deploy, but it includes two services: backend and frontend. Yet I would call this a monolithic approach.
Now suppose I put the Angular application into my Tomcat and deploy a WAR file of the Spring Boot part of the application (without the integrated frontend). That way I would have two deployed services running on the same server, interacting with each other, and each could be replaced without touching the other. By definition I would not call this a monolithic approach (same code, different deployment) but a microservice architecture, right? But this cannot be the case, since literally every article I read lists the same advantages and disadvantages for these architectures, and I cannot see any difference except for the possibility of exchanging the frontend and backend (which I have in both cases, although in the first case I would need to redeploy the full application).

Microservices are just a set of guidelines on how to design your application so that it is scalable, manageable and adapts to a fast pace of development. It is not just about how you deploy your application.
Over the years, we have learned that building one big application as a monolith gives you speed at first: the modules in the monolith have complete visibility of each other and can access and tweak things as they wish. But a change that should affect only one module can leak into other classes where it does not belong. This helps you prototype, but the code becomes less and less maintainable. You can of course put in effort to keep the code clean, but that effort grows as the app grows.
Also, as a developer you are required to know the whole product, and it is difficult to work on one part without worrying about the whole architecture, which makes it hard for new people to join and make changes.
Next, when you deploy, scale is important, especially nowadays, and you need to adapt to traffic. Not all of your modules will see high traffic 24/7. But with a monolith, even if only one module is being used by hundreds of users, the whole application has to scale for hundreds of users.
Microservices draw on these lessons and define some guidelines:
You should break down your app by business domain. Every service is responsible for one aspect only. Services talk to each other via a contract (API or events), and as long as the contract stands you can do what you want within your service. A new developer needs to learn just one service to get started.
Scaling becomes easy, because if only one service is under load, only that service will scale. Other modules, deployed independently, can scale according to their own load.
By keeping services small you can build fast and make changes rapidly. Having no shared database means that each service decides what it wants to save, how it saves it and how it changes it.
For your case, just deploy it the way you think is best. But if you start to grow and end up with some 50-odd services (or a project of that size), you will see the benefits of divide and conquer.

Splitting the frontend from the backend is not the best/canonical example of a microservices deployment; it is the equivalent of having layers in a monolith. Think instead about how you'd split your monolith by (sub)domain into modules, each module having frontend and backend responsibilities; then each module can become a microservice, if needed.
The canonical MS architecture for a web-based app is a Gateway that assembles (in parallel!) HTML responses from different MSs. So an individual MS would respond with HTML, CSS and JS instead of JSON or some other incomplete form of data. This is the Tell, Don't Ask principle applied to MSs. This gives you a real MS, which you can very easily replace with another.
If the Gateway cannot assemble the individual responses in parallel because they depend on one another, then the splitting is wrong and you need to refactor.
The biggest notable difference between a modular monolith and microservices is that microservices run in separate processes.
If you build your monolith using location transparency, then you can deploy components as microservices without touching the code of other components. For example, if you use CQRS, you could deploy a read model as a microservice just by cutting and pasting its code from the monolith into the microservice.

Google Cloud Platform: are my architectural solutions correct?

I'm trying to make simple application and deploy it on Google Cloud Platform Flexible App Engine, which will contain two main parts:
Front end application (simple Web UI based on Java 8 (Spring + Thymeleaf) with OAuth authorization from different external sites)
Back end application (monitoring several resources in separate threads, based on logged in users and reacting to their input in a certain way (behavioral changes))
Initially I was planning to make them one app, but I think that potentially heavy background processing may cause failures in the front end part. Also, the App Engine docs say that deployed services behave similarly to a microservice architecture.
My questions are:
Do I really need to separate front end from back end, if I need to react to user input as fast as possible? (but delays up to 2 seconds aren't that critical)
If I do need to separate them (and I strongly believe that I do) - how do I set up interaction between the applications?
Each resource must be tracked by exactly one thread on the back end - what are the best practices for this? I thought about having a SQL table with a list of acquired resources, but the flaw I see is that if an instance fails, I will need to do some kind of cleanup on that table and redetermine which resources are actually acquired.

Your proposed architecture sounds like the right approach in separating the two into different services for the following reasons:
Can deploy code for each separately, rollback versions separately, and split traffic separately for experiments or phased rollouts.
Can adjust machine types and memory allocations for each service to better suit its needs. If you're doing work that is memory intensive on the backend, you can adjust that service's settings to allocate more memory per instance.
Allow each type of service to scale independently based on demand, which should result in better utilization of the services and less waste. This should also lower your overall spending compared with a one-size-fits-all approach in a single monolithic service.
You can mix different runtime environments across services. For example, you can mix language runtimes within a project, or you could even mix standard and flexible environments. Say your front-end code is more cost-efficient in standard: designate that service as a standard environment service and your backend as a flexible environment service. Or say you need a custom Dockerfile with Perl in it: you could do that as a flexible environment custom runtime and have your front end in Java 8.
You can still share common services like Cloud SQL, PubSub, Cloud Tasks (currently in alpha) or Redis for in-memory caching. Your workers don't need to reside in App Engine; they could reside in a different product if that better suits your needs.
Overall, you get much better control over your application to split it apart. The biggest benefit likely comes down to optimizing your application for spending only on what you need.

I think that you are likely to be able to deploy everything as an appengine app except if you use some exotic Java libraries that are not whitelisted. It could still be good to deploy it with compute engine for increased configurability and versatility.
You can create one front-end instance and one back-end instance in compute engine and divide the resources between them like that. Google's documentation has an example where you can do that.

How to effectively manage a bunch of jar files and their plumbing?

This is a rather high-level question so apologies if it's off-topic. I'm new to the enterprise Java world.
Suppose I have written some individual Java packages that do things like parse data feeds and store the parsed information to a queue. Another package might read from that queue and ingest those entries into a rules engine package. Tripped alerts get fed into another queue, which is polled by an alerting service (assume it's written in Python) that reads from the queue and issues emails.
As it stands I have to manually run each jar file and stick it in the background. While I could probably daemonize some or all of these services for resiliency or write some kind of service manager to do the same, this strikes me as being very amateur. Especially since I'd have to start a dozen services for this single workflow at boot.
I feel like I'm missing something, but I don't know what I don't know. Short of writing one giant, monolithic application, what should I be looking into to help me manage all these discrete components and be able to (conceptually) deliver a holistic application? I'd like to end up with some sort of hypervisor where I can click one button, it starts/stops all the above services, provides me some visibility into their status and makes sure the services are running when they should.
Is this where frameworks come into play? I see a number of them but don't know if that's just overkill, especially if I'm not actively developing a solution for that framework.

It seems you architected a system with a lot of components, and then after some time you decided to aggregate some of them because they happen to share the same programming language: Java. So, first a warning: this is not the best way to wire components together.
Also, it seems you don't know Java very well because you mix terms like package, jar and executable that are totally unrelated and distinct concepts.
However, let's assume that the current state of the art is the best possible and is immutable. Your current requirement is building a graphical interface (I guess HTTP/HTML based) to manage all the distinct components of the system written in Java. I suggest you use a single JVM, write your components as EJBs (essentially a start(), a stop() and a method to query the component state that returns a custom object), and finally wire everything up with the Spring framework, which has nice annotation-driven configuration for @Bean definitions.
Spring Boot also has an actuator package that simplifies exposing objects. You may also find it useful to register your beans as managed beans and use the Hawtio framework to administer them (via a Jolokia agent).
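As a rough illustration of the start()/stop()/state contract described above, here is a plain-Java sketch with no EJB or Spring involved. All names (ManagedComponent, ComponentState, SimpleComponent, ComponentManager) are hypothetical. In a real setup the container or Spring context would play the manager's role, and each component would wrap one of the existing jars.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lifecycle contract; the names are illustrative, not an EJB or Spring API.
interface ManagedComponent {
    void start();
    void stop();
    ComponentState state();
}

// The "custom object" returned by the state query, kept minimal here.
enum ComponentState { STOPPED, RUNNING }

// Stand-in for a real component such as the feed parser or the rules engine.
class SimpleComponent implements ManagedComponent {
    private volatile ComponentState state = ComponentState.STOPPED;

    public void start() { state = ComponentState.RUNNING; }
    public void stop() { state = ComponentState.STOPPED; }
    public ComponentState state() { return state; }
}

// One place to start, stop and inspect every registered component.
class ComponentManager {
    private final Map<String, ManagedComponent> components = new LinkedHashMap<>();

    void register(String name, ManagedComponent component) {
        components.put(name, component);
    }

    void startAll() { components.values().forEach(ManagedComponent::start); }

    void stopAll() { components.values().forEach(ManagedComponent::stop); }

    // Snapshot of every component's state, in registration order.
    Map<String, ComponentState> status() {
        Map<String, ComponentState> snapshot = new LinkedHashMap<>();
        components.forEach((name, component) -> snapshot.put(name, component.state()));
        return snapshot;
    }

    public static void main(String[] args) {
        ComponentManager manager = new ComponentManager();
        manager.register("feed-parser", new SimpleComponent());
        manager.register("rules-engine", new SimpleComponent());
        manager.startAll();
        System.out.println(manager.status());
        manager.stopAll();
        System.out.println(manager.status());
    }
}
```

A web UI or a JMX/Jolokia endpoint would then only need to call startAll(), stopAll() and status().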

I am not sure if you're actually using J2EE (i.e. Java Enterprise Edition). It is also possible to write enterprise software in J2SE. J2SE does not offer much off the shelf for this, but there are many micro-frameworks such as Ninja, and full-stack frameworks such as the Play framework, which work quite well, are much easier to program, and perform much better than J2EE.
If you're not using J2EE, then you can go as simple as:
make one new Java project
add all the jars as dependency to that project (see the comment on Maven above by NimChimpsky)
start the classes in the jars by simply calling their constructor
This is quite a naive approach, but can serve you at this point. Of course, if you're aiming for a scalable platform, there is a lot more you need to learn first. For scalability, I suggest the Play! framework as a good start. Alternatively you can use Vert.x which has its own message queue implementation as well as support for high performance distributed caches.
The standard J2EE approach is doable (and considered the de facto choice in many old-school enterprises), but it has fundamental -flaws- or "differences" that make for a very steep learning curve and an application that is very hard to scale.

It seems like you're writing your application in a microservice architecture.
You need an orchestrator.
If you are running everything on a single machine, a simple orchestrator that you are probably already running is systemd. You write a systemd service description for each service, and systemd will maintain your services according to those descriptions. You can specify the order in which the services should be brought up based on the dependencies between them, the restart policy if a service goes down unexpectedly, logging for stdout/stderr, etc. Note that this is the same systemd that runs the startup sequence of most modern Linux distros.
If you're running multiple machines, you can still keep using a single-machine orchestrator like systemd, but the requirements for the orchestrator usually become more complex. With multiple machines, you now have to take into account things like moving services between machines, phased rollouts, etc. For these setups, there is software that adapts systemd for multi-machine orchestration, like CoreOS's fleetd, and there are also standalone multi-machine orchestrators like Kubernetes. Both use Docker as the application container mechanism.
None of what I've described here is Java specific, which means you can use the same orchestration for Java as you used for Python or other languages or architecture.

You have to choose. As Raffaele suggested, you can write all your requirements into one app/service. That seems feasible using Java EJBs or Spring Integration with AmqpTemplate (you can write to a queue with AmqpTemplate and receive the message with a dedicated listener).
Or you can choose an implementation with a microservices architecture: write one service that pushes to the queue, another one that contains the listener, etc. This can be done easily with Spring Boot.
"One button to control them all": in the case of a monolithic app, it's easy.
If you choose a microservices architecture, it depends on what your needs are. If it's just the "start" and "stop" operations, I guess that starting and stopping your Tomcat/other server will do. For other metrics, there is a variety of solutions. Again, it depends on your needs.

Ruby on Rails with JRuby

I'm working on a Ruby on Rails app, currently hosted on Heroku.
We have about 5 web dynos and about 2 worker process running on average. But because we're using adeptscale these can change a lot, and the cost is increasing from month to month.
We're thinking about changing the process and the infrastructure (using our own, off of amazon/google etc). And also because of the performance, access to java libraries and other gains we're planning to go with jRuby.
I haven't got much experience with jRuby at all, but I do have Java experience. So I have a few questions:
Question intro: The Rails philosophy/approach differs from Java's: a Ruby web server uses far less memory but can only process one request at a time, so running multiple servers compensates for the inability to process multiple requests.
If we go with jRuby (and have our Rails project packaged as a WAR file and deployed to a servlet container, e.g. Tomcat or JBoss, which is more than just a container), will we then be able to process multiple requests?
Question intro: Currently we have some application logic running in the workers (instead of blocking the web server and not being able to serve other clients/browsers). For example, when a user submits a form and our app needs to contact a 3rd party service to get a response, we let a worker do the work of calling the 3rd party service and then updating the UI (which shows a waiting status) via websockets with whatever status the 3rd party service returned.
If we switch to jRuby, how will we achieve similar logic? I mean, do we go with Java code that has some kind of thread pool of workers, where free workers do the work of contacting the 3rd party service etc.? How would we go about this if we decide to go with jRuby?

1) You can serve multiple requests at a time in jruby with nearly any container, but you can also serve multiple requests at a time with mri-ruby. You only have to have a threadsafe app (config.threadsafe! is default in rails4). Different rack servers have different approaches to serve multiple requests at a time. For example unicorn uses multiple processes while passenger or puma go for a multi-threaded approach.
In my experience, JRuby containers like JBoss or Tomcat are more complicated to configure properly, but there are things like TorqueBox and Trinidad that help you with this. You can also still go for one of the Ruby servers (e.g. puma) that don't use C extensions.
2) If I understand you correctly, you are looking for a background-processing library? You can use sidekiq or resque with Ruby or JRuby (JRuby will be faster in general, and it's easier to debug memory leaks with it). You can even use Ruby for your rack servers and JRuby for your workers (they can be run side by side with things like rvm/rbenv).
In general, I would only go for the JRuby option if you know what you are doing and need better performance for your app servers, or if you want to speed up your worker servers. If I were you, I would probably stay in the Ruby world and use puma for your app and sidekiq as a background service. Both are very elegant and don't need much configuration.

Yes, JRuby uses Java threads and is really multithreaded. And I can say that it's really good in integration with Java, even using classes for JNI.
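For the "thread pool of workers" idea raised in the question, a minimal Java sketch might look like the following. The class BackgroundCalls, the pool size of 4 and the callback shape are assumptions made for illustration, not part of Rails, JRuby or any background-job gem. The slow third-party call runs on a pool thread, and the callback (for example, pushing a status over a websocket) fires when it finishes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper: runs slow calls on a fixed pool so request threads are not blocked.
class BackgroundCalls {
    // Pool size is an arbitrary assumption for this sketch.
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    // Runs slowCall on a worker thread, then hands its result to onDone.
    CompletableFuture<Void> submit(Supplier<String> slowCall, Consumer<String> onDone) {
        return CompletableFuture.supplyAsync(slowCall, workers).thenAccept(onDone);
    }

    void shutdown() {
        workers.shutdown();
    }

    public static void main(String[] args) {
        BackgroundCalls calls = new BackgroundCalls();
        CompletableFuture<Void> pending = calls.submit(
                () -> "status: ok",                                 // placeholder for the third-party request
                status -> System.out.println("notify UI: " + status)); // placeholder for the websocket push
        pending.join(); // only so the demo waits; a web request would return immediately
        calls.shutdown();
    }
}
```

Unlike sidekiq or resque, jobs submitted this way live only in memory and are lost if the process dies, which is one reason to prefer a persistent job queue for anything that must not be dropped.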
I can recommend the following servers (some have already been mentioned):
puma (https://github.com/puma/puma)
any servlet container (even IBM WebSphere Application Server!) - just use warbler (https://github.com/jruby/warbler)
The 'simplest' way to run an application on a servlet container is to build a .war with warbler. The resulting .war file usually includes all dependencies and the JRuby interpreter, so it is usually around 30 MB. But I think warbler is not that easy to set up, so I wouldn't recommend this route unless you really need to run Rails in an enterprise Java environment.
And I would just remind you that Rails opens a DB connection for every request, so the default DB connection pool size of 5 isn't enough - don't forget to increase it before load testing :) (e.g. the default thread pool for puma is 16, for IBM WAS 50, and for Tomcat 200 threads).
I agree with smallbutton.com that puma is a good choice. Finally, with puma you can switch between JRuby and another interpreter fairly easily (in my experience there is one difference: gem names).

What are my options in distributing a J2EE app on multiple servers?

I'm using JSP+Struts2+Tomcat6+Hibernate+MySQL as my J2EE development environment. The first phase of the project is finished, and it's up and running on a single server. Due to the growing scale of the website, we expect to face some performance issues in the future.
So we want to distribute the application across several servers. What are my options here?

Before optimizing anything, you should find out where your bottleneck is (services, database, ...). If you do not, the optimization will be a waste of time and money.
The right optimization then depends on your use case.
For example, if you have a read-only application and the bottleneck is both the Java server and the database, then you can set up two database servers and two Java servers.
Hardware is very important too. The easiest way may be to upgrade the hardware, but this will only work if the hardware is the bottleneck.

You can use any J2EE application server that supports clustering (e.g. WebLogic, WebSphere, JBoss, Tomcat). You are already using Tomcat, so you may want to use its clustering solution. Note that each offering provides a different level of clustering support, so you should do some research before picking a particular app server (make sure it is the right clustering solution for your needs).
Also porting code from a standalone to a cluster environment often requires a non-negligible amount of development work. Among many other things you'll need to make sure that your application doesn't rely on any local files on the file system (this is a bad J2EE practice anyway), that state (HTTP sessions or stateful EJB - if any) gets properly propagated to all nodes in your cluster, etc. As a general rule, the more stateless, the smoother the transition to a cluster environment.
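One concrete piece of that porting work: session replication generally needs whatever you store in the HTTP session to be serializable, since the state has to be copied to other nodes. The sketch below uses no servlet API and is only a local check. CartState and SessionStateCheck are hypothetical names. It round-trips an object through Java serialization, which is roughly what replication needs to be able to do with each session attribute.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical session attribute: everything it holds is itself serializable.
class CartState implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String customerId;
    private final List<String> items = new ArrayList<>();

    CartState(String customerId) { this.customerId = customerId; }

    void add(String item) { items.add(item); }
    String customerId() { return customerId; }
    List<String> items() { return items; }
}

class SessionStateCheck {
    // Serializes the value to bytes and reads it back as a new, independent copy.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                         new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            // A non-serializable field surfaces here as NotSerializableException.
            throw new IllegalStateException("state is not safely serializable", e);
        }
    }

    public static void main(String[] args) {
        CartState cart = new CartState("customer-42");
        cart.add("book");
        CartState copy = roundTrip(cart);
        System.out.println(copy.customerId() + " " + copy.items());
    }
}
```

Running a check like this in a unit test for every class that ends up in the session can catch non-serializable fields before the application is moved to a cluster.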

As you are using Tomcat, I'd recommend taking a look at mod_cluster. But I suggest you consider a real application server, like JBoss AS. Also, make sure to run some performance tests and understand where the bottleneck of your application is. Throwing more application servers at the problem is ineffective if, for instance, the bottleneck is at the database.
