What is the difference between SEDA, VM and direct in Apache Camel? - java

I have worked with both SEDA and direct, and I've also read the documentation.
But I still cannot visualize when to use SEDA and when to use direct. VM is new to me.
Please explain with an example.

There are at least four different mechanisms by which one Camel route can directly pass data to another. By "directly" I mean without using a network or some form of intermediate storage (file, database). These mechanisms can be grouped according to whether they can pass data between CamelContext instances or not, and whether they are synchronous or asynchronous.
direct -- single CamelContext, synchronous (blocks producer)
SEDA -- single CamelContext, asynchronous (does not block producer)
VM -- multiple CamelContext, asynchronous (does not block producer)
direct-VM -- multiple CamelContext, synchronous (blocks producer)
The direct and direct-VM mechanisms are synchronous, in the sense that the producing endpoint blocks until the consuming endpoint, and all the rest of its routing logic, is complete. The SEDA and VM mechanisms both use a pool of threads on the consumer, such that each request made by the producer is assigned to one of the threads in the pool. This allows the consumer endpoint and its associated routing logic to act independently of the producer.
Both VM endpoints (VM and direct-VM) are needed when the communication is between different Camel contexts. In many cases it is possible to combine routes into the same CamelContext. However, this may sometimes be inadvisable, for reasons of modularity, or impossible, because some application framework prevents it. For example, I might implement some Camel routing logic in a library (or component) with the intention that the library be used by other code. To be complete, this library will probably define a self-contained CamelContext with various routes. If I want to invoke the Camel logic in the library, I will need to use VM or direct-VM, because direct and SEDA endpoints do not contain the logic needed to route between Camel contexts.
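To make the blocking behaviour concrete, here is a plain-Java analogy rather than Camel code. DirectChannel and QueueChannel are hypothetical classes invented for this sketch. The first runs the consumer on the producer's thread, which is the behaviour described above for direct. The second hands the message to a queue drained by a small pool of consumer threads, which is the behaviour described for SEDA and VM. It does not show how Camel implements its endpoints internally.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Plain-Java analogy only; none of these classes are part of Camel.
class HandOffSketch {

    // "direct"-style: the consumer logic runs on the caller's thread,
    // so send() returns only after the consumer has finished.
    static final class DirectChannel {
        private final Consumer<String> consumer;
        DirectChannel(Consumer<String> consumer) { this.consumer = consumer; }
        void send(String body) { consumer.accept(body); }
    }

    // "seda"-style: send() only enqueues; worker threads drain the queue.
    static final class QueueChannel implements AutoCloseable {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        private final ExecutorService workers;

        QueueChannel(int concurrentConsumers, Consumer<String> consumer) {
            workers = Executors.newFixedThreadPool(concurrentConsumers);
            for (int i = 0; i < concurrentConsumers; i++) {
                workers.execute(() -> {
                    try {
                        while (true) {
                            consumer.accept(queue.take());
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // stop on shutdown
                    }
                });
            }
        }

        void send(String body) { queue.add(body); }

        @Override public void close() { workers.shutdownNow(); }
    }

    // True if the consumer ran on the same thread as the producer.
    static boolean directRunsOnCallerThread() {
        Thread caller = Thread.currentThread();
        boolean[] same = new boolean[1];
        new DirectChannel(body -> same[0] = Thread.currentThread() == caller).send("hello");
        return same[0];
    }

    // True if the consumer ran on the same thread as the producer.
    static boolean queueRunsOnCallerThread() {
        Thread caller = Thread.currentThread();
        boolean[] same = new boolean[1];
        CountDownLatch done = new CountDownLatch(1);
        try (QueueChannel channel = new QueueChannel(2, body -> {
            same[0] = Thread.currentThread() == caller;
            done.countDown();
        })) {
            channel.send("hello"); // returns as soon as the message is queued
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("consumer did not run");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return same[0];
    }
}
```

Roughly speaking, a producer sending to a direct endpoint corresponds to DirectChannel.send(), and a producer sending to a SEDA endpoint corresponds to QueueChannel.send().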

The difference between the direct: and seda: components is that the first is synchronous and the second is asynchronous, with a queue.
The practical difference is that when sending synchronous messages you must wait for the route to complete, whereas asynchronous messages are "fire and forget": you put them onto a queue and presume a consumer will process them. With a queue you can also have multiple consumers (parallelisation).
The last one, vm:, is also asynchronous, but it can additionally call routes in different Camel contexts within the same JVM. Imagine application 1 has a Camel context and application 2 has a Camel context; with vm: they can communicate with each other.
Edit:
In relation to "what to use when":
use direct: for ordinary calls between endpoints within a Camel context
use seda: when you need parallelisation or queues, but don't want to use jms:
use vm: when calling between applications.
There are of course many other use cases, but those are the common ones (in my own experience).

Kafka Consumers in Onion Architecture

I am working on a project that follows DDD and has consumers of a Kafka queue. My question is straightforward: where do the consumers reside in the Onion Architecture and the Hexagonal Architecture? Are they event handlers, or should they be part of the infrastructure?
I am using Kafka consumers to listen to change events of other aggregate roots, and I want to store the data in my current aggregate. Basically, I am replicating data from one microservice to another.
The way I see it is:
Your aggregates are the Core
The message handlers are Use Cases, which use dependencies (like repository interfaces) and the core to execute the business use case.
There is infrastructure code that pulls messages from the queue and triggers use cases
With this approach, you can unit test your use cases without worrying about infrastructure and you can replace the whole messaging queue technology. The same approach works for handling API requests. In practice, the only difference is that with the API you can return a response synchronously, and with messaging you can't.
As a practical note, in .NET I use a library called MediatR to implement and trigger the use cases. In Java, I found PipelinR, which looks similar at first sight. This type of approach allows you to implement all use cases the same way for both synchronous and asynchronous usage.
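The layering above can be sketched in plain Java without any messaging or mediator library. All names here (Customer, ReplicateCustomer, CustomerChangedConsumer, CustomerEndpoint, the "id;name" payload format) are made up for illustration. The use case depends only on a repository interface, and two thin infrastructure adapters trigger it: one standing in for a Kafka listener, one for an API endpoint that can answer synchronously.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical names throughout; this only illustrates the layering.
class OnionSketch {

    // Core: an aggregate holding data replicated from another service.
    static final class Customer {
        final String id;
        final String name;
        Customer(String id, String name) { this.id = id; this.name = name; }
    }

    // Dependency of the use case; infrastructure provides the implementation.
    interface CustomerRepository {
        void save(Customer customer);
        Optional<Customer> findById(String id);
    }

    // Use case: knows nothing about Kafka, HTTP, or any other channel.
    static final class ReplicateCustomer {
        private final CustomerRepository repository;
        ReplicateCustomer(CustomerRepository repository) { this.repository = repository; }

        void handle(String id, String name) {
            if (id == null || id.isEmpty()) {
                throw new IllegalArgumentException("customer id is required");
            }
            repository.save(new Customer(id, name));
        }
    }

    // Infrastructure: stands in for a Kafka listener. It only parses the
    // message ("id;name", an invented format) and triggers the use case.
    static final class CustomerChangedConsumer {
        private final ReplicateCustomer useCase;
        CustomerChangedConsumer(ReplicateCustomer useCase) { this.useCase = useCase; }

        void onMessage(String payload) {
            String[] parts = payload.split(";", 2);
            useCase.handle(parts[0], parts.length > 1 ? parts[1] : "");
        }
    }

    // Infrastructure: stands in for an API endpoint. Unlike the consumer,
    // it can return a response (here a status code) synchronously.
    static final class CustomerEndpoint {
        private final ReplicateCustomer useCase;
        CustomerEndpoint(ReplicateCustomer useCase) { this.useCase = useCase; }

        int put(String id, String name) {
            try {
                useCase.handle(id, name);
                return 204;
            } catch (IllegalArgumentException e) {
                return 400;
            }
        }
    }

    // In-memory repository, enough to unit test the use case.
    static final class InMemoryCustomerRepository implements CustomerRepository {
        private final Map<String, Customer> store = new HashMap<>();
        @Override public void save(Customer customer) { store.put(customer.id, customer); }
        @Override public Optional<Customer> findById(String id) {
            return Optional.ofNullable(store.get(id));
        }
    }
}
```

Swapping the messaging technology would then only touch CustomerChangedConsumer; the use case and its tests stay as they are.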
Think of it this way: if the same message your application receives from a Kafka queue should also be accepted by a RESTful endpoint, would there be any difference between the consumer and the RESTful endpoint from the point of view of their place in the architectural layers? No, there wouldn't. They'd be part of the same layer because they do the same thing: they accept an external message, even though the communication channel and type are (very) different.
According to The Onion Architecture : part 2:
SpeakerController is part of the user interface
and I'd say that the same (i.e. user interface) holds for the RESTful endpoint and the Kafka queue consumer. Both of them would contain no business logic at all; they would delegate to an Application Service. If the message types are not exactly the same, there might be message-integrity validation and conversion to a DTO particular to each of them before delegating to the target Application Service.
The idea is that one could add more communication channels (e.g. command line, web sockets, etc.), but as long as the use cases don't change, the entire Application Core remains unchanged, because it doesn't depend on the User Interface; the dependency points the other way.

Asynchronous Message-Passing and Microservices

I am planning the development of a microservice-based application and I decided to use Kafka for the internal communication. While reading the book Microservice Architecture by Ronnie Mitra, Matt McLarty, Mike Amundsen and Irakli Nadareishvili, I came across this passage:
letting microservices directly interact with message brokers (such as
RabbitMQ, etc.) is rarely a good idea. If two microservices are
directly communicating via a message-queue channel, they are sharing a
data space (the channel) and we have already talked, at length, about
the evils of two microservices sharing a data space. Instead, what we
can do is encapsulate message-passing behind an independent
microservice that can provide message-passing capability, in a loosely
coupled way, to all interested microservices.
I am using Netflix Eureka for Service registration and discovery, Zuul as edge server and Hystrix.
That said, in practice, how can I implement that kind of microservice? How can I make my microservices independent of the communication channel (in this case Kafka)?
Currently I'm interacting with the channel directly, so I don't have an extra layer between my publishers/subscribers and Kafka.
UPDATE 06/02/2018
To be more precise, we have a couple of microservices: one publishes news on a topic (ActiveMQ, Kafka...) and the other is subscribed to that topic and performs some operations on the incoming messages. So these services are coupled to the message broker (to the channel): the message broker's APIs are embedded in our code and, for example, if we want to change the message broker we have to change all the microservices that use the message broker's API. So the authors suggest using a microservice (in the picture, I assume it is the Events Hub) that acts as the "dispatcher" of the various messages. That way it is the only component that interacts with the channel.
A general foreword - Don't do it if you don't need it. Introducing a queue system can be a big improvement if you are dealing with a high number of events, events backing up, and similar issues. But if you don't face any such issues, you are probably better off with the lower complexity of direct service communication.
Back to your question - It sounds like you want to abstract your communication with the queue because you are worried about the effort for replacing the queue with a different system - Is that correct?
In this case you can either do what you proposed - Develop a new service in the middle. This comes with all the baggage of a physical service (including deployment, scaling, etc).
Or the second alternative is to write a client library that abstracts the queue the way you want and allows you to reuse it in all services requiring to participate in the queue. This way you don't have to physically deploy another service for this purpose but you are still in full control of what your interface to the queue should look like and you have a single piece of code to incorporate changes (at least toward the direction of the queue). This would work given you are sure the app-facing side of the library can be stable enough.
But, again, don't do any of those in the first iteration when you are not sure you need all the complexity. (Over-engineering is a dangerous thing)
You should create an interface, let's say "Queue", which provides all the functionality you want from Kafka or RabbitMQ, then create different implementations of it, like KafkaQueue and RabbitMQQueue, and inject whichever implementation you want to use in your system.
This way, if a new queue system is introduced, your existing code will not have to change.
Creating another microservice is extra overhead in this case.
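A minimal sketch of that interface approach, using only the JDK. MessageQueue, InMemoryQueue, NewsPublisher and NewsSubscriber are hypothetical names, and the two-method interface is an assumption; a real one would expose whatever your services need. The KafkaQueue and RabbitMQQueue implementations are not shown, since they would wrap the respective client libraries behind the same interface.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

class QueueAbstractionSketch {

    // Broker-neutral interface; the method set is an assumption.
    interface MessageQueue {
        void publish(String topic, String message);
        void subscribe(String topic, Consumer<String> handler);
    }

    // In-memory implementation, useful for tests. A KafkaQueue or
    // RabbitMQQueue would implement the same interface.
    static final class InMemoryQueue implements MessageQueue {
        private final Map<String, List<Consumer<String>>> handlers = new ConcurrentHashMap<>();

        @Override public void publish(String topic, String message) {
            List<Consumer<String>> subscribers =
                    handlers.getOrDefault(topic, Collections.emptyList());
            for (Consumer<String> handler : subscribers) {
                handler.accept(message);
            }
        }

        @Override public void subscribe(String topic, Consumer<String> handler) {
            handlers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
        }
    }

    // A service that publishes news; it sees only the interface.
    static final class NewsPublisher {
        private final MessageQueue queue;
        NewsPublisher(MessageQueue queue) { this.queue = queue; }
        void publish(String headline) { queue.publish("news", headline); }
    }

    // A service that reacts to news; also unaware of the broker in use.
    static final class NewsSubscriber {
        final List<String> received = new CopyOnWriteArrayList<>();
        NewsSubscriber(MessageQueue queue) { queue.subscribe("news", received::add); }
    }
}
```

Changing the broker then means writing one new MessageQueue implementation and changing what gets injected; NewsPublisher and NewsSubscriber are untouched.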
In a service architecture, the proper way to make your code independent of the constraints of the communication channel is to have properly modeled, self-sufficient messages. Historic examples would be WSDL in document mode, EDIFACT, HATEOAS etc. From this point of view, microservices with spring-boot and Kafka are just a different implementation of the same old thing that has been done since mainframes ruled the world.
Essentially, if you view your app as a black-box asynchronous server, everything the app does is receive events and produce new ones. It should not matter how events are raised within the app. HTTP requests, XML within JMS messages, JSON in Kafka, whatever: all those things are just ways to pass events, and the business layer of the application should respond only to the content of events.
So the business layer is usually structured around some custom model/domain, which is delivered as payload. The business layer is invoked/triggered by a listener/producer layer which talks to the communication channel (Kafka listener, HTTP listener etc.). Aside from logging and enforcing security, you should not have communication channel logic in the app. I have seen unfortunate examples of business logic driven by the originating JMS connection or by parsing the URL of the request. If you ever have this in your code, you have failed to structure your code properly.
However, that is easier said than done. Some people are good at this level of modeling, and some never learn.
And there is no other way to learn but to try and fail.

Consume message only once from Topic per listeners running in cluster

I'm implementing a Domain Event infrastructure, but the project doesn't allow any messaging infrastructure (financial services client), so I found an alternative in Hazelcast Topics and ExecutorService.
The problem is that, when running in a cluster, the message is delivered to every listener in the cluster. For a cluster of 2, the same listener runs in 2 JVMs, so the message is consumed and acted upon twice. Suppose the Domain Event is supposed to perform some non-idempotent operation, such as crediting loyalty points: unless I explicitly keep a record of the domain events already acted upon and check against it every time I receive an event, I will end up crediting the points twice. Any suggestion for implementing this without having to write that boilerplate, possibly at the infrastructure layer? Or is there a known pattern for such an implementation?
Edit: Meanwhile I'm also evaluating Hazelcast ExecutorService, as suggested here.
The use case you described can be solved by using Hazelcast's Queues instead of Topics. The main reason to use topics is if you are interested that multiple (possibly independent) consumers get the same message. Your requirement sounds like you are interested that only one of the consumers gets the message, and that's what queues are for, see the Hazelcast documentation for Queues.
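The difference in delivery semantics can be shown with a plain-Java analogy. This is not the Hazelcast API: Topic here is a hypothetical class invented for the sketch, and a JDK LinkedBlockingQueue stands in for a distributed queue. With topic semantics both cluster members see the event and the points are credited twice; with queue semantics only one member removes the event.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Plain-Java analogy; these classes are not Hazelcast's topic or queue.
class TopicVsQueueSketch {

    // Topic semantics: every registered listener receives every message.
    static final class Topic {
        private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();
        void addListener(Consumer<String> listener) { listeners.add(listener); }
        void publish(String event) {
            for (Consumer<String> listener : listeners) {
                listener.accept(event);
            }
        }
    }

    // Two cluster members listen on the topic: the credit is applied twice.
    static int creditsViaTopic() {
        AtomicInteger points = new AtomicInteger();
        Topic topic = new Topic();
        topic.addListener(event -> points.addAndGet(10)); // member 1
        topic.addListener(event -> points.addAndGet(10)); // member 2
        topic.publish("PurchaseCompleted");
        return points.get();
    }

    // Queue semantics: each element is removed by exactly one consumer.
    static int creditsViaQueue() {
        AtomicInteger points = new AtomicInteger();
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("PurchaseCompleted");
        // Each member polls the shared queue; only the first one gets the event.
        for (int member = 1; member <= 2; member++) {
            String event = queue.poll();
            if (event != null) {
                points.addAndGet(10);
            }
        }
        return points.get();
    }
}
```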

Best way to utilize multiple instances of a service

We have a component called Workflow which exposes a SOAP web service. We are trying to introduce asynchronous processing in Workflow by allowing it to consume messages from WebSphere MQ. We also want to utilize multiple instances of Workflow, so there can be 4 instances of Workflow listening to the same queue. The problem here is how to make sure all Workflow instances are utilized evenly and no single instance is overloaded.
Workflow is completely written in Java. We use Spring and Hibernate extensively. The processes which will be submitting message to Workflow are written in Java. For message processing and MQ, we use Spring Integration.
The best way to ensure that no Workflow instance is overloaded is to have each individual Workflow instance not consume a message from the message queue that will overload it. In this case, you may not care whether the work is distributed evenly, as long as all the work gets done promptly.
If you really want to make sure all Workflow instances are used evenly even when your load is so light that you don't need all of the instances, you may need to check whether there's a way of reconfiguring WebSphere MQ to distribute messages on a FIFO basis rather than a LIFO basis, or if WebSphere MQ can't be configured that way, to switch to a different message queue. However, I don't recommend this: the system as a whole can work perfectly fine even if, at low loads, only some of the Workflow instances are utilized, with all being utilized only at high loads.
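The pull-based idea can be sketched with JDK classes only. This is an analogy, not WebSphere MQ or Spring Integration code, and process() is a placeholder for the real Workflow logic. Each worker takes the next message only after finishing the previous one, so a busy instance simply does not pick up more work. How the messages end up split between workers is left to timing and is not guaranteed to be even.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Analogy with JDK classes; not WebSphere MQ or Spring Integration code.
class CompetingConsumersSketch {

    // Runs `instances` workers against one shared queue and returns how
    // many messages each worker ended up processing.
    static int[] run(int instances, int messages) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) {
            queue.add(i);
        }

        AtomicIntegerArray handled = new AtomicIntegerArray(instances);
        CountDownLatch finished = new CountDownLatch(instances);

        for (int w = 0; w < instances; w++) {
            final int worker = w;
            new Thread(() -> {
                Integer message;
                // Pull the next message only once the previous one is done.
                while ((message = queue.poll()) != null) {
                    process(message);
                    handled.incrementAndGet(worker);
                }
                finished.countDown();
            }).start();
        }

        try {
            if (!finished.await(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        int[] result = new int[instances];
        for (int i = 0; i < instances; i++) {
            result[i] = handled.get(i);
        }
        return result;
    }

    // Placeholder for the real work done per message.
    static void process(int message) {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```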

Does synchronous servlet processing make sense for a distributed server-side application

The scope/context of this question:
I am to develop a Java/Java EE based distributed server-side application that is scalable (scale-up, rather than scale-out).
My application consists of servlets that use multiple instances of distributed back-end services to process client requests. If I need to achieve more throughput, I want to be able to just add more instances of these distributed services (JVMs on the same or another machine) and (expect to) see an increase in throughput.
To achieve this, I was thinking of a loosely-coupled asynchronous system.
I thought I would use Async Servlets (servlet 3.0) and an application-managed thread-pool that places client requests on JMS queues, which would be picked by one of the distributed service instances and processed. The responses can be relayed back to the client using JMS, from the service instances to a response-thread in the servlet container.
However, an asynchronous system seems to be (obviously) more complex than a synchronous one (e.g. error handling and relaying errors to the client, request tracking, etc.). I am also worried about the future maintainability of the design/code.
So a question arises: does it make sense to do this synchronously, while still remaining distributed, scalable and loosely coupled?
If the answer is yes, then please also share possible ways of achieving this (while remaining 'constructive').
If I can do this well in a synchronous way, then it will simplify the entire system.
I don't want to add complexity to the system unnecessarily.
(Assuming it makes sense) One possible implementation I could think of is using RMI.
For example: a service registry for the distributed service instances to register with, and a load balancer to distribute the RMI calls across all the available instances. But it feels like an old-generation solution. Are there any better options available?
Edit:
Other details about the scope of this question:
The client side is browser-based and does not demand an asynchronous server side.
I don't need server push.
At any time, I won't have more outstanding requests than the max-worker-threads of the popular web servers (even Apache).
For the above reasons, the use cases mentioned in a related question don't seem to apply to my scenario.
Loose coupling and distribution are independent of whether processing is synchronous or asynchronous.
With scalability, the matter is more complex. In a synchronous model, you will need one thread per pending request. If you need to scale to really high load (say, thousands of concurrent requests per server), an asynchronous model may scale better. To reap that benefit, however, the entire processing, starting from the handling of incoming connections, needs to be done asynchronously. There is little point in having a synchronous request processing thread delegate to an asynchronous thread pool and then block until that thread pool has computed the result; after all, the request thread could just as well have done the work itself.
If you need to return a response, I'd therefore go for synchronous request processing whenever scalability permits (which it usually does).
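The "delegate and block" pattern can be sketched as follows. The names are hypothetical and process() is a stand-in for the real back-end call. Both handlers return the same result, but the pooled variant keeps the request thread blocked in get() for the whole duration while also occupying a pool thread, so it gains nothing over doing the work directly.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockingHandOffSketch {

    // Stand-in for the real back-end call.
    static String process(String request) {
        return request.toUpperCase();
    }

    // Synchronous: the request thread does the work itself.
    static String handleDirectly(String request) {
        return process(request);
    }

    // Asynchronous in name only: the request thread hands the work to a
    // pool and then blocks in get() until the pool thread has finished.
    static String handleViaPool(ExecutorService pool, String request) {
        Callable<String> task = () -> process(request);
        Future<String> result = pool.submit(task);
        try {
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    // Convenience check that both variants produce the same response.
    static boolean sameResult(String request) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return handleDirectly(request).equals(handleViaPool(pool, request));
        } finally {
            pool.shutdown();
        }
    }
}
```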
Edit:
There are numerous ways to talk to the distributed backend servers. You might simply use EJB (which, if I recall correctly, uses RMI under the hood). Or, you might use webservices behind a load balancer.
