Spring Integration - Sharing lock across steps - java

I have the following setup: a number of devices send data via HTTP to my backend, where multiple instances of a receiver component are running. I need to process the data and then send it to an external partner, who needs this data in timestamp order. So I came up with the following architecture:
There are n receiver instances running, with a load balancer in front of them, so they potentially get data from all devices. These instances process each incoming data entry by adding some information and then put it into a Redis Sorted Set (there is one for each device). After this they send a message (via Redis) about how many data entries are currently in the set.
There are m processing instances whose task it is to send the data to the external partner. They listen to the messages sent by the receivers, and if the number of entries in a set exceeds some threshold, they retrieve the data from the queue, add some other information and then send it to the external partner.
The problem I have is the timestamp-order requirement. I have n and m instances, each one running multiple threads. For the processing instances, which all receive the messages from the receivers, I thought about doing the retrieval of the data from the set and the sending to the external partner inside a shared Redis lock for the queue associated with the message (and the respective device). But currently there are multiple Spring Integration steps in the processing flow: get the data from the queue -> transform it for sending -> send it via an HTTP outbound channel. I thought about using a lock that is obtained in the first step (getting the data from the queue) and released in the last step (after sending it via the outbound channel). In case of an error the lock would be released in the error-handling step.
Are there any ideas for alternatives to this? I was thinking about sending the lock as part of the message header through the remaining flow and then releasing it at the end.
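To make the header idea concrete, here is a rough sketch of what I mean, using Spring Integration's RedisLockRegistry (the deviceLock header name and the step methods are illustrative, not my actual flow):

    import java.util.concurrent.locks.Lock;
    import org.springframework.integration.redis.util.RedisLockRegistry;
    import org.springframework.integration.support.MessageBuilder;
    import org.springframework.messaging.Message;

    public class DeviceLockSteps {

        private final RedisLockRegistry lockRegistry; // shared across instances via Redis

        public DeviceLockSteps(RedisLockRegistry lockRegistry) {
            this.lockRegistry = lockRegistry;
        }

        // First step: obtain the per-device lock and carry it in a header.
        public Message<?> startProcessing(String deviceId, Object entries) {
            Lock lock = lockRegistry.obtain("device:" + deviceId);
            lock.lock();
            return MessageBuilder.withPayload(entries)
                    .setHeader("deviceLock", lock) // illustrative header name
                    .build();
        }

        // Last step (and the error flow): release the carried lock.
        public void release(Message<?> message) {
            Lock lock = (Lock) message.getHeaders().get("deviceLock");
            if (lock != null) {
                lock.unlock();
            }
        }
    }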

Since you mention ordering, you should consider using a PriorityChannel or a Resequencer to reorder records before sending them to the external partner.
Both of them can be configured with a shared MessageStore.
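For illustration, a sketch of a Resequencer with a Redis-backed store shared by all processing instances; the channel names and the deviceId correlation header are just assumptions about your flow:

    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.data.redis.connection.RedisConnectionFactory;
    import org.springframework.integration.dsl.IntegrationFlow;
    import org.springframework.integration.dsl.IntegrationFlows;
    import org.springframework.integration.redis.store.RedisMessageStore;

    @Configuration
    public class OrderingConfig {

        @Bean
        public RedisMessageStore messageStore(RedisConnectionFactory connectionFactory) {
            // Backed by Redis, so all m processing instances see the same groups
            return new RedisMessageStore(connectionFactory);
        }

        @Bean
        public IntegrationFlow orderingFlow(RedisMessageStore messageStore) {
            return IntegrationFlows.from("fromReceivers")
                    .resequence(r -> r
                            .messageStore(messageStore)
                            // one message group per device (hypothetical header)
                            .correlationExpression("headers['deviceId']")
                            // release in-order messages as soon as they are ready
                            .releasePartialSequences(true))
                    .channel("toExternalPartner")
                    .get();
        }
    }

Because the store lives in Redis, all m instances work against the same message groups, so ordering is enforced across the whole fleet rather than per instance.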

Related

Manage delivery of JMS messages to multiple servers

Our app uses Spring Boot and JMS messages with Tibco. We have two production servers running and processing messages concurrently. The servers are listening to the same single queue, and each server has 10 concurrent listeners. I do not want the very same message to get processed by both servers at the same time. Nothing prevents our queue from having duplicate messages; e.g. we can have two copies of message A in the queue. If the messages in the queue are A, A, B, C, D, then if the first A gets delivered to server1 and the second A gets delivered to server2, and both servers process A at the same time, there is a chance of creating duplicate entities. I want to find a way to send all A messages to only one server. I can't use a Message Selector b/c we have the same code base running on both servers. This is what I'm considering:
Based on the message, set properties in the headers. Once the message gets delivered to the process() method, depending on which server is processing the message, either discard it (simply return) or process the message and acknowledge it. The problem with this solution is that since we need to dynamically find out which server is processing the message, the server name needs to be hardcoded, meaning if the server moves, the code breaks!
Another solution that might work is the Destination field.
https://docs.spring.io/spring/docs/4.0.x/spring-framework-reference/html/jms.html
Destinations, like ConnectionFactories, are JMS administered objects that can be stored and retrieved in JNDI. When configuring a Spring application context you can use the JNDI factory class JndiObjectFactoryBean to perform dependency injection on your object's references to JMS destinations.
It's something I have never done before. Is there any way to configure the Destination so that it picks the right server to route the message to? Meaning, if message1 is supposed to be delivered to server1, it does not even get delivered to server2 and remains in the queue until server1 consumes it?
What are other ways to implement this?
EDIT:
I still do not know the best way to send certain messages to only one server for processing; however, I accepted the answer suggesting a database for validation, b/c this is the approach we are considering to avoid creating duplicate entities when processing the data.
I think the idea of using the JMS Destination is a non-starter as there is nothing in the JMS specification which guarantees any kind of link between the destination and a broker. The destination is just an encapsulation for the provider-specific queue/topic name.
The bottom line here is that you either need to prevent the duplicate messages in the first place or have some way to coordinate the consumers to deal with the duplicates after they've been pulled off the queue. I think you could do either of these using an external system like a database, e.g.:
When producing the message check the database for an indication that the message was sent already. If no indication is found then write a record to the database (will need to use a primary key to prevent duplicates) and send the message. Otherwise don't send the message.
When consuming the message check the database for an indication that the message is being (or was) consumed already. If no indication is found then write a record to the database (will need to use a primary key to prevent duplicates) and process the message. Otherwise just acknowledge the message without processing it.
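For illustration, a minimal sketch of the consume-side check, assuming a table processed_messages(message_id VARCHAR PRIMARY KEY); the produce-side check would look the same:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.sql.SQLIntegrityConstraintViolationException;
    import javax.sql.DataSource;

    public class MessageClaimCheck {

        // Returns true if this server "won" the message and should process it.
        public boolean tryClaim(DataSource ds, String messageId) throws SQLException {
            try (Connection con = ds.getConnection();
                 PreparedStatement ps = con.prepareStatement(
                         "INSERT INTO processed_messages (message_id) VALUES (?)")) {
                ps.setString(1, messageId);
                ps.executeUpdate();
                return true;   // first insert wins: process the message
            } catch (SQLIntegrityConstraintViolationException duplicate) {
                return false;  // another server already claimed it: just acknowledge
            }
        }
    }

The primary key is what makes this race-free: whichever server inserts first wins, and the loser gets a constraint violation instead of processing the message a second time.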
I suggest an alternative to "post DB sync".
Keep the servers and listeners as-is, and broadcast all the processed messages on a topic. For servers just starting, you can use durable subscribers so they do not miss any messages.
If you broadcast each start and end of processing for messages A, B, C, etc., and consider adding a little pause (a few milliseconds), you should avoid collisions. That is the main risk of this approach, of course.
It's not clear to me whether you should validate for duplicate processing at the beginning or at the end of message processing; it depends on your needs.
If this whole idea is not acceptable, DB validation might be the only option, but as stated in comments above, I fear for scaling.

How to send requests parallel to asynchronous services and collect the responses in a Java EE application?

I am developing an application that at some point starts to aggregate information from a bunch of services. Some of those services are called synchronously via SOAP interfaces, and some of them work asynchronously: I have to send a request to JMS queue Q1 and get an answer on Q2 at some point.
The problem is that the app sends requests in one thread while the responses are processed using MDBs (Message-Driven Beans). The solution off the top of my head is to store the already-aggregated responses in some shared container (like a ConcurrentHashMap) keyed by a correlationId. So when an MDB gets a response, it looks it up in the shared container and adds the response to the corresponding record.
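To illustrate, a minimal sketch of the shared container I have in mind (all names are illustrative):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class ResponseAggregator {

        private final Map<String, List<String>> byCorrelationId = new ConcurrentHashMap<>();

        // called from an MDB when a response arrives on Q2
        public void onResponse(String correlationId, String body) {
            byCorrelationId
                    .computeIfAbsent(correlationId,
                            id -> Collections.synchronizedList(new ArrayList<>()))
                    .add(body);
        }

        // called by the aggregating thread to check progress
        public List<String> responsesFor(String correlationId) {
            return byCorrelationId.getOrDefault(correlationId, List.of());
        }
    }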
The app runs on WildFly AS in domain HA mode.
Are there any problems I can run into with this approach? For example, the container being instantiated once per node in the cluster.
Or could I accidentally accept so many requests that the stored responses cause an OutOfMemoryError?
What are the best approaches for this kind of problem?
Let me answer your questions:
A response to a JMS service call could arrive at any time, or quite late (the destination server may be down, an operator may take a break, etc.). So you should store the requests in a database for the duration of the data aggregation.
Performance issues can always happen when you serve many requests in parallel. And if you have asynchronous answers, you may be storing many, many hashes for a long time (or with SFSB activate/passivate) until the last answer arrives. The first point above partly solves this problem as well, because it keeps most of the data in the database and only the current items in memory. It is also more robust: the persisted data survives a server crash or shutdown.
When you need the data, create a database entry for the aggregation and send out the requests with its primary key in the header. When an answer arrives, its header contains the same primary key for identification. MDBs are the best way to receive them, but use them just to receive the messages; process their contents with EJBs. Delegate the message contents synchronously to the EJB(s) and acknowledge the messages according to the EJB results. At the very end of the EJB processing, fetch the IDs of the unprocessed requests belonging to the current aggregation. If there are none left, (remove the request entries from the database table and) call the appropriate EJB (via an MDB?) to proceed with the work, its data needs now fulfilled.
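A minimal sketch of that receive side, assuming the producer put the aggregation's primary key into a message property named aggregationId (the property name and the AggregationService EJB are illustrative):

    import javax.ejb.ActivationConfigProperty;
    import javax.ejb.EJB;
    import javax.ejb.MessageDriven;
    import javax.jms.JMSException;
    import javax.jms.Message;
    import javax.jms.MessageListener;
    import javax.jms.MessageListener;
    import javax.jms.TextMessage;

    @MessageDriven(activationConfig = {
            @ActivationConfigProperty(propertyName = "destinationLookup",
                                      propertyValue = "jms/Q2")
    })
    public class ResponseMdb implements MessageListener {

        @EJB
        private AggregationService aggregationService; // hypothetical EJB doing the real work

        @Override
        public void onMessage(Message message) {
            try {
                long requestId = message.getLongProperty("aggregationId"); // PK sent with the request
                String body = ((TextMessage) message).getText();
                // The MDB only receives; processing is delegated synchronously to the
                // EJB, so the message is acknowledged only if the EJB call succeeds.
                aggregationService.storeResponse(requestId, body);
            } catch (JMSException e) {
                throw new RuntimeException(e); // force redelivery
            }
        }
    }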

Send data from Node.js API through SQS to Java worker and return the result of the worker to the API through SQS

I am creating a logging service. The architecture is as follows: a Node.js API receives requests from a website (the requests can be GET or POST) and sends each request to an SQS queue. A Java worker listens on the SQS queue for messages: if it is a POST, I write the data to a Cassandra database; if it is a GET, I read the necessary data from Cassandra, do some computations and return it to the Node.js API, which in turn returns it to the client.
The read part is a little blurry for me. Is it possible to return the data as a message in SQS? (I read that a single message can contain only 256 KB of data, and the data being read can be more than that.) I will be running multiple instances of the Node API, so is there a way to know to which instance I need to return the data? Should I create a Java API that receives read requests from the Node API (bypassing SQS)? What is the best way to do this?
Should I use a message queue to retrieve analytics data, or should I just connect to the service that will prepare the data and receive the data from there?
You don't really return data to Node.js from SQS; Node.js has to be polling the queue and act on messages if/when they arrive.
So you could use two queues: one for incoming messages (your POSTs) and a second queue for your outgoing messages (your GETs). But in all cases, the process that is going to consume those messages needs to be polling for them: you can't 'push' messages to a listener, the listener needs to 'pull' them.
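For illustration, this is roughly what the pull model looks like on the Java side with the AWS SDK for Java v2 (the queue URL is illustrative); the Node.js side would poll its reply queue the same way:

    import software.amazon.awssdk.services.sqs.SqsClient;
    import software.amazon.awssdk.services.sqs.model.DeleteMessageRequest;
    import software.amazon.awssdk.services.sqs.model.Message;
    import software.amazon.awssdk.services.sqs.model.ReceiveMessageRequest;
    import software.amazon.awssdk.services.sqs.model.ReceiveMessageResponse;

    public class ReplyQueuePoller {
        public static void main(String[] args) {
            SqsClient sqs = SqsClient.create();
            String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/replies"; // illustrative
            while (true) {
                ReceiveMessageResponse resp = sqs.receiveMessage(ReceiveMessageRequest.builder()
                        .queueUrl(queueUrl)
                        .waitTimeSeconds(20)     // long polling: wait up to 20 s for messages
                        .maxNumberOfMessages(10)
                        .build());
                for (Message m : resp.messages()) {
                    System.out.println("got reply: " + m.body());
                    sqs.deleteMessage(DeleteMessageRequest.builder() // "ack" by deleting
                            .queueUrl(queueUrl)
                            .receiptHandle(m.receiptHandle())
                            .build());
                }
            }
        }
    }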

Design for scalable periodic queue message batching

We currently have a distributed setup where we publish events to SQS, and we have an application with multiple hosts that drains messages from the queue, does some transformation on them and transmits them to interested parties. I have a use case where the receiving endpoint has scalability concerns with the message volume, and hence we would like to batch these messages periodically (say every 15 minutes) in the application before sending them.
The incoming message rate is around 200 messages per second and each message is no more than 10 KB. This system need not be real-time, but timeliness would definitely be a good-to-have, and the order is not important (it's okay if a batch containing older messages gets sent first).
One approach that I can think of is maintaining an embedded database within the application (on each host) that accumulates the events, plus another thread that runs periodically and clears the data.
Another approach could be to create timestamped buckets in a distributed key-value store (S3, Dynamo, etc.), where we write each message to the correct bucket based on the message's timestamp and periodically clear the buckets.
We can run into several issues here: since the messages would be out of order, a bucket might have already been cleared (which can be solved by having a default bucket, though), we would need to accurately decide when to clear a bucket, etc.
The way I see it, at least two components would be required: one that does the batching into temporary storage and another that clears it.
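To make the first approach concrete, a minimal in-memory sketch (the buffer is a placeholder where the embedded database would go, and sendBatch() stands in for the call to the receiving endpoint):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class PeriodicBatcher {

        private final ConcurrentLinkedQueue<String> buffer = new ConcurrentLinkedQueue<>();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        public PeriodicBatcher() {
            scheduler.scheduleAtFixedRate(this::flush, 15, 15, TimeUnit.MINUTES);
        }

        // called by the queue-draining threads for every incoming message
        public void add(String message) {
            buffer.offer(message);
        }

        private void flush() {
            List<String> batch = new ArrayList<>();
            String m;
            while ((m = buffer.poll()) != null) {
                batch.add(m);
            }
            if (!batch.isEmpty()) {
                sendBatch(batch);
            }
        }

        private void sendBatch(List<String> batch) {
            // placeholder: transmit the aggregated batch to the receiving endpoint
        }
    }

Note that at 200 messages/second x 10 KB each, a 15-minute window accumulates roughly 1.8 GB in total, which is why I lean towards an embedded database rather than a purely in-memory buffer.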
Any feedback on the above approaches would help. Also, this looks like a common problem; are there any existing solutions that I can leverage?
Thanks

Generic QoS Message batching and compression in Java

We have a custom messaging system written in Java, and I want to implement a basic batching/compression feature: basically, under heavy load it would aggregate a bunch of push responses into a single push response.
Essentially:
* if we detect 3 messages were sent in the past second, then start batching responses and schedule a timer to fire in 5 seconds
* the timer will aggregate all the message responses received in the next 5 seconds into a single message
I'm sure this has been implemented before; I'm just looking for the best example of it in Java. I'm not looking for a full-blown messaging layer, just the basic detect-messages-per-second-and-schedule-some-task logic. (Obviously I can easily write this myself; I just want to compare it with any existing algorithms to make sure I'm not missing any edge cases and that I've simplified the problem as much as possible.)
Are there any good open-source examples of basic QoS batching/throttling/compression implementations?
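To make the question concrete, this is roughly the shape of what I have in mind (concurrency is deliberately simplified and all names are illustrative):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicBoolean;

    public class BurstBatcher {

        private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        private final ConcurrentLinkedQueue<String> batch = new ConcurrentLinkedQueue<>();
        private final ConcurrentLinkedQueue<Long> recent = new ConcurrentLinkedQueue<>();
        private final AtomicBoolean batching = new AtomicBoolean(false);

        public void push(String response) {
            long now = System.currentTimeMillis();
            recent.add(now);
            Long head;
            while ((head = recent.peek()) != null && now - head > 1000) {
                recent.poll(); // drop timestamps older than one second
            }
            if (batching.get()) {
                batch.add(response); // batching mode: just collect
            } else if (recent.size() >= 3 && batching.compareAndSet(false, true)) {
                batch.add(response); // burst detected: collect and arm the timer
                timer.schedule(this::flush, 5, TimeUnit.SECONDS);
            } else {
                sendNow(List.of(response)); // normal, unbatched path
            }
        }

        private void flush() {
            List<String> out = new ArrayList<>();
            String r;
            while ((r = batch.poll()) != null) {
                out.add(r);
            }
            batching.set(false);
            if (!out.isEmpty()) {
                sendNow(out); // one aggregate push instead of many
            }
        }

        private void sendNow(List<String> responses) {
            // placeholder: push the response(s) to the client
        }
    }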
We are using a very similar mechanism for high load.
It works as you described it:
* Aggregate messages over a given interval
* Send a List instead of a single message after that.
* Start aggregating again.
You should watch out for the following pitfalls:
* If you are using a transacted messaging system like JMS you can get into trouble, because your implementation will not be able to send inside the JMS transaction, so it will keep aggregating. Depending on the size of the data structure that holds the messages, this can run out of space. If you have very long transactions sending many messages, this can pose a problem.
* Sending a message this way happens asynchronously, because a different thread will be sending the message; the thread calling the send() method only puts it in the data structure.
* Sticking to the JMS example, you should keep in mind that the way messages are consumed is also changed by this approach, because you will receive the list of messages from JMS as a single message. So once you commit this single JMS message, you have committed the entire list of messages. You should check whether this is a problem for your requirements.
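For illustration, a rough sketch of the mechanism (the JMS Session is touched only from the single timer thread, since Sessions are not thread-safe; the error handling is naive):

    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;

    public class BatchingSender {

        private final ConcurrentLinkedQueue<Serializable> pending = new ConcurrentLinkedQueue<>();
        private final Session session;          // used only from the single timer thread
        private final MessageProducer producer;

        public BatchingSender(Session session, MessageProducer producer, long intervalMs) {
            this.session = session;
            this.producer = producer;
            Executors.newSingleThreadScheduledExecutor()
                    .scheduleAtFixedRate(this::flush, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        }

        // Callers "send" asynchronously: the payload only lands in the buffer here.
        public void send(Serializable payload) {
            pending.offer(payload);
        }

        private void flush() {
            ArrayList<Serializable> batch = new ArrayList<>();
            Serializable p;
            while ((p = pending.poll()) != null) {
                batch.add(p);
            }
            if (batch.isEmpty()) {
                return;
            }
            try {
                // the whole list travels as ONE message, so it is also consumed as one
                producer.send(session.createObjectMessage(batch));
            } catch (JMSException e) {
                pending.addAll(batch); // naive retry: put the batch back
            }
        }
    }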
