Spring integration - Appropriate pattern for collating/batching service calls - java

I have a remote service that I'm calling to load pricing data for a product, when a specific event occurs. Once loaded, the product pricing is then broadcast for another consumer to process elsewhere.
Rather than call the remote service on every event, I'd like to batch the events into small groups, and send them in one go.
I've cobbled together the following pattern based on an Aggregator. Although it works, lots of it 'smells' -- especially my SimpleCollatingAggregator. I'm new to Spring Integration, and EIP in general, and suspect I'm misusing components.
The Code
My code is triggered elsewhere in code by calling a method on the below #Gateway:
public interface ProductPricingGateway {
#Gateway(requestChannel="product.pricing.outbound.requests")
public void broadcastPricing(ProductIdentifer productIdentifier);
}
This is then wired to an aggregator, as follows:
<int:channel id="product.pricing.outbound.requests" />
<int:channel id="product.pricing.outbound.requests.batch" />
<int:aggregator input-channel="product.pricing.outbound.requests"
output-channel="product.pricing.outbound.requests.batch" release-strategy="releaseStrategy"
ref="collatingAggregator" method="collate"
correlation-strategy-expression="0"
expire-groups-upon-completion="true"
send-partial-result-on-expiry="true"/>
<bean id="collatingAggregator" class="com.mangofactory.pricing.SimpleCollatingAggregator" />
<bean id="releaseStrategy" class="org.springframework.integration.aggregator.TimeoutCountSequenceSizeReleaseStrategy">
<!-- Release when: 10 Messages ... or ... -->
<constructor-arg index="0" value="10" />
<!-- ... 5 seconds since first request -->
<constructor-arg index="1" value="5000" />
</bean>
Here's the aggregator implementation:
public class SimpleCollatingAggregator {
public List<?> collate(List<?> input)
{
return input;
}
}
Finally, this gets consumed on the following #ServiceActivator:
#ServiceActivator(inputChannel="product.pricing.outbound.requests.batch")
public void fetchPricing(List<ProductIdentifer> identifiers)
{
// omitted
}
Note: In practice, I'm also using #Async, to keep the calling code as quick-to-return as possible. I have a bunch of questions about that too, which I'll move to a seperate question.
Question 1:
Given what I'm trying to acheive, is an aggregator pattern an appropriate choice here? This feels like a lot of boilerplate -- is there a better way?
Question 2:
I'm using a fixed collation value of 0, to effectively say : 'It doesn't matter how you group these messages, take 'em as they come.'
Is this an appropriate way of achieving this?
Question 3:
SimpleCollatingAggregator simply looks wrong to me.
I want this to receive my individual inbound ProductIdentifier objects, and group them into batches, and then pass them along. This works, but is it appropriate? Are there better ways of acheiving the same thing?

Q1: Yes, but see Q3 and the further discussion below.
Q2: That is the correct way to say 'no correlation needed' (but you need the expire-groups-on-completion, which you have).
Q3: In this case, you don't need a custom Aggregator, just use the default (remove the ref and method attributes).
Note that the aggregator is a passive component; the release is triggered by the arrival of a new message; hence the second part of your release strategy will only kick in when a new message arrives (it won't spontaneously release the group after 5 seconds).
However, you can configure a MessageGroupStoreReaper for that purpose: http://static.springsource.org/spring-integration/reference/html/messaging-routing-chapter.html#aggregator

Related

Spring Integration and Transaction Management - how difficult need it be?

Using Spring Integration I am trying to built a simple message producing component. Basically something like this:
<jdbc:inbound-channel-adapter
channel="from.database"
data-source="dataSource"
query="SELECT * FROM my_table"
update="DELETE FROM my_table WHERE id IN (:id)"
row-mapper="someRowMapper">
<int:poller fixed-rate="5000">
<int:transactional/>
</int:poller>
</jdbc:inbound-channel-adapter>
<int:splitter
id="messageProducer"
input-channel="from.database"
output-channel="to.mq" />
<jms:outbound-channel-adapter
channel="to.mq"
destination="myMqQueue"
connection-factory="jmsConnectionFactory"
extract-payload="true" />
<beans:bean id="myMqQueue" class="com.ibm.mq.jms.MQQueue">
<!-- properties omitted --!>
</beans:bean>
The "messageProducer" may produce several messages per poll but not necessarily one per row.
My concern is that I want to make sure that rows are not deleted from my_table unless the messages produced has been committed to the MQ channel.
On the other hand I will accept that rows in case of db- or network failure are not deleted thus causing duplicate messages to be produced. In other words I will settle for a non-XA one-phase commit with possible duplicates.
When trying to figure out what I need to put to my Spring configuration I quickly get lost in endless discussions about transaction managers, AOP and transaction advice chains which I find difficult to understand - I know I ought to though.
But I fear that I will spend a lot of time cooking up a configuration that is not really necessary for my problem at hand.
So - my question is: Can it be that simple - or do I need to provide explicit configuration for transaction synchronization?
But can I do something similar with a jdbc/jms mix?
I'd say "Yes".
Please, read Dave Syer's article about Best effort 1PC, where the ChainedTransactionManager came from.

Server-side paging possible?

In a Java application, I am using Spring-Data to access a Neo4j database via the REST binding.
The spring.xml used as a context contains the following lines:
<neo4j:config graphDatabaseService="graphDatabaseService" />
<neo4j:repositories base-package="org.example.graph.repositories"/>
<bean id="graphDatabaseService"
class="org.springframework.data.neo4j.rest.SpringRestGraphDatabase">
<constructor-arg index="0" value="http://example.org:1234/db/data" />
</bean>
My repository is very simple:
public interface FooRepository extends GraphRepository<Foo> {
}
Now, I would like to loop through some Foos:
for (Foo foo : fooRepository.findAll(new PageRequest(0, 5))) //...
However, the performance of this request is awful: It takes over 400 seconds (!) to complete.
After a bit of debugging, I found out that Spring-data generates the following query:
START `foo`=node:__types__(className="org.example.Foo") RETURN `foo`
It then looks like as if paging is done on the client and all Foos (more than 100,000) are transferred to the client. When issuing the above query to the Neo4j server using the web interface, it takes around 60 seconds. However, if I manually append a "LIMIT 5", the execution time reduces to around 0.5 seconds.
What am I doing wrong so that spring-data does not use server-side, CYPHER pagination?
According to Programming Model
the expensive operations like traversals and querying are executed efficiently on the server side by using the REST API to forward those calls.
Or does this exclude the pagination?
What other options do I have in this case?
You can do the below to handle this server side.
Provide your own query method in the repository
The cypher query should use order, skip and limit and parameterize them so that you can pass in the skip and limit values on a per page basis.
E.g.
start john=node:users("name:pangea")
match john-[:HAS_SEEN]-(movie)
return movie
order by movie.name?
skip 20
limit 10

How to wait for completion of multi-threaded publish-subscribe-channel

I have a Spring Integration project where I want to process a message concurrently through multiple actions. So I have set up a publish-subscribe-channel with a task-executor. However I want to wait for all processing to complete before moving on. How would I do this?
<publish-subscribe-channel id="myPubSub" task-executor="my10ThreadPool"/>
<channel id="myOutputChannel"/>
<service-activator input-channel="myPubSub" output-channel="myOutputChannel"
ref="beanA" method="blah"/>
<service-activator input-channel="myPubSub" output-channel="myOutputChannel"
ref="beanB" method="blah"/>
<service-activator id="afterThreadingProcessor" input-channel="myOutputChannel" .../>
So in the above case, I want my afterThreadingProcessor to be invoked only once after both beanA and beanB have completed their work. However, in the above afterThreadingProcessor will be invoked twice.
Add apply-sequence="true" to the pub-sub channel (this adds default correlation data to the messages, including correlationId, sequenceSize, and sequenceNumber and allows default strategies to be used on downstream components).
Add an <aggregator/> before afterThreadingProcessor and route the output from the two <service-activator/>s to it.
Add a <splitter/> after the aggregator - the default splitter will split the collection made by the aggregator into two messages.
afterThreadingProcessor will be invoked once for each message on the second thread that completes its work.
You can make the configuration easier by using a chain...
<chain input-channel="myOutputChannel">
<aggregator />
<splitter />
<service-activator id="afterThreadingProcessor" input-channel="myOutputChannel" .../>
</chain>
To make a single call to the final service, just change your service to take a Collection<?> instead of adding the splitter.
EDIT:
In order to do what you want in comment #3 (run the final service on the original thread), something this should work...
<int:channel id="foo" />
<int:service-activator ref="twoServicesGateway" input-channel="foo"
output-channel="myOutputChannel" />
<int:gateway id="twoServicesGateway" default-request-channel="myPubSub"/>
<int:publish-subscribe-channel id="myPubSub" task-executor="my10ThreadPool"
apply-sequence="true"/>
<int:service-activator input-channel="myPubSub" output-channel="aggregatorChannel"
ref="beanA" method="blah"/>
<int:service-activator input-channel="myPubSub" output-channel="aggregatorChannel"
ref="beanB" method="blah"/>
<int:aggregator input-channel="aggregatorChannel" />
<int:service-activator id="afterThreadingProcessor" input-channel="myOutputChannel" .../>
In this case, the gateway encapsulates the two other services and the aggregator; the default service-interface is a simple RequestReplyExchanger. The calling thread will wait for the output. Since the aggregator has no output-channel the framework will send the reply to the gateway, and the waiting thread will receive it, return to the <service-activator/> and the result will then be sent to the final service.
You would probably want to put a reply-timeout on the gateway because, by default, it will wait indefinitely and, if one of the services returns a null, no agreggated response will ever be received.
Note that I indented the gateway flow just to show it runs from the gateway, they are NOT child elements of the gateway.
Same kind of behavior can now be achieved using a more cleaner approach introduced in Spring Integration 4.1.0 as an implementation of EIP Scatter-Gather pattern.
Checkout Scatter-Gather example gist:
https://gist.github.com/noorulhaq/b13a19b9054941985109

Calling Spring Integration transformers dynamically

I am new in Spring integration and working on a SI project. I am doing a simple job of getting message from a channel (fromAdapter), calling a transformer and sending the output to another channel (toQueue). The below code is used in the SI configuration file ----
<int:channel id="fromAdapter"></int:channel>
<int:channel id="toQueue">
</int:channel>
<bean id="trans" class="src.MyTransformer"></bean>
<int:transformer input-channel="fromAdapter" output-channel="toQueue" ref="trans"></int:transformer>
However, now I have a slightly complex requirement. Instead of always sending the message to one transformer,based on some value of the message, I want to send the message to any one of 6 transformers. How can this be implemented?
The recipient list router will work, and may be appropriate if you want to send a message to multiple transformers, but if not, you'll have to be careful to make the selector expressions mutually exclusive. Maybe one of the simpler routers might be more appropriate. For example...
<header-value-router input-channel="routingChannel" header-name="foo">
<mapping value="1" channel="channel1" />
<mapping value="2" channel="channel2" />
</header-value-router>
or
<router id="spelRouter" input-channel="expressionRouter"
expression="payload.someProperty"
default-output-channel="defaultChannelForExpression"
resolution-required="false">
<mapping value="foo" channel="fooChannelForExpression"/>
<mapping value="bar" channel="barChannelForExpression"/>
</router>
You can declare those 6 transformers as subscribers to a single point-to-point channel and by default it will use a round robin dispatching strategy (it will only invoke a single transformer for each message, but it will always pick the next transformer in the list and then cycle).
In your case, you should simply declare all those transformers to use the exact same input and output channels and the above will automagically happen.
To pick the transformer based on some attribute of your message, you can use a recipient-list-router and define a selector-expression for each recipient in the list in order to match a particular kind of message. Also, for each recipient you should use a different channel name. Then each of those channels will be used as input by the desired transformer:
<recipient-list-router input-channel="fromAdapter" default-output-channel="toQueue">
<recipient channel="t1" selector-expression="payload.someFlag"/>
<recipient channel="t2" selector-expression="headers.someOtherFlag"/>
</recipient-list-router>
<transformer input-channel="t1" ref="transformer1" method="transform"/>
<transformer input-channel="t2" ref="transformer2" method="transform"/>
Keep in mind that with this approach, a message could match more than one selector expression so it's up to you to provide mutually exclusive expressions.
Or, if you are willing to write some infrastructure code, you can write your own implementation of LoadBalancingStrategy and provide that to your point-to-point channel. Your strategy will then be responsible for picking the right handler for each message.

Waiting for all threads to finish in Spring Integration

I have a self-executable jar program that relies heavily on Spring Integration. The problem I am having is that the program is terminating before the other Spring beans have completely finished.
Below is a cut-down version of the code I'm using, I can supply more code/configuration if needed. The entry point is a main() method, which bootstraps Spring and starts the import process:
public static void main(String[] args) {
ctx = new ClassPathXmlApplicationContext("flow.xml");
DataImporter importer = (DataImporter)ctx.getBean("MyImporterBean");
try {
importer.startImport();
} catch (Exception e) {
e.printStackTrace();
} finally {
ctx.close();
}
}
The DataImporter contains a simple loop that fires messages to a Spring Integration gateway. This delivers an active "push" approach to the flow, rather than the common approach of polling for data. This is where my problem comes in:
public void startImport() throws Exception {
for (Item item : items) {
gatewayBean.publish(item);
Thread.sleep(200); // Yield period
}
}
For completeness, the flow XML looks something like this:
<gateway default-request-channel="inChannel" service-interface="GatewayBean" />
<splitter input-channel="inChannel" output-channel="splitChannel" />
<payload-type-router input-channel="splitChannel">
<mapping type="Item" channel="itemChannel" />
<mapping type="SomeOtherItem" channel="anotherChannel" />
</payload-type-router>
<outbound-channel-adapter channel="itemChannel" ref="DAOBean" method="persist" />
The flow starts and processes items effectively, but once the startImport() loop finishes the main thread terminates and tears down all the Spring Integration threads immediately. This results in a race condition, the last (n) items are not completely processed when the program terminates.
I have an idea of maintaining a reference count of the items I am processing, but this is proving to be quite complicated, since the flow often splits/routes the messages to multiple service activators - meaning it is difficult to determine if each item has "finished".
What I think I need is some way to either check that no Spring beans are still executing, or to flag that all items sent to the gateway have been completely processed before terminating.
My question is, how might I go about doing either of these, or is there a better approach to my problem I haven't thought of?
You're not using a request-response pattern here.
outbound-channel-adapter is a fire and forget action, if you want to wait for the response you should use an outbound-gateway that will wait for response, and connect the response to the original gateway, then in java sendAndReceive not just publish.
If you can get an Item to determine, whether it is still needed or not (processingFinished() or something similar executed in the back-end-stages), you can register all Items at a central authority, which keeps track of the number of non-finished Items and effecitvely determines a termination-condition.
If this approach is feasible, you could even think of packaging the items into FutureTask-objects or make use of similar concepts from java.util.concurrent.
Edit: Second Idea:
Have you thought about making the channels more intelligent? A sender closes the channel once it does not send any more data. In this scenario, the worker-beans do not have to be deamon threads but can determine their termination-criterion based on a closed and empty input channel.

Categories