Spring Integration with multiple executor channels not processing in parallel - java

I have a requirement where I need to pass a message to multiple channels asynchronously. To make the flow async I am using executor channels everywhere, but for some reason the flow is still sequential. I can see different threads, as configured in the task executors, but they run in sequence.
Here is the configuration I am using:
<int:channel id="mainChannel">
<int:interceptors>
<int:wire-tap channel="channel1"/>
<int:wire-tap channel="channel2"/>
<int:wire-tap channel="channel3"/>
</int:interceptors>
</int:channel>
<int:channel id="channel1">
<int:dispatcher task-executor="exec1" />
</int:channel>
<int:channel id="channel2">
<int:dispatcher task-executor="exec2" />
</int:channel>
<int:channel id="channel3">
<int:dispatcher task-executor="exec3" />
</int:channel>
As per my understanding, all of this should be async (in my case, 3 threads should run in parallel),
but from the log I can see everything is sequential, just with different thread names.
I was assuming preSend/postSend would be called in random order.
Am I missing anything to make multiple executor channels run in parallel?
I would really appreciate any help.

You might need to reference an async TaskExecutor implementation bean, as shown:
<beans:bean id="asyncExecutor"
class="org.springframework.core.task.SimpleAsyncTaskExecutor"/>
<int:channel id="channel1">
<int:dispatcher task-executor="asyncExecutor" />
</int:channel>
<int:channel id="channel2">
<int:dispatcher task-executor="asyncExecutor" />
</int:channel>
<int:channel id="channel3">
<int:dispatcher task-executor="asyncExecutor" />
</int:channel>
Description of SimpleAsyncTaskExecutor:
public class SimpleAsyncTaskExecutor extends CustomizableThreadCreator
implements AsyncListenableTaskExecutor, Serializable
TaskExecutor implementation that fires up a new Thread for each task,
executing it asynchronously.
Supports limiting concurrent threads through the "concurrencyLimit"
bean property. By default, the number of concurrent threads is
unlimited.
NOTE: This implementation does not reuse threads! Consider a
thread-pooling TaskExecutor implementation instead, in particular for
executing a large number of short-lived tasks.
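Since SimpleAsyncTaskExecutor starts a new thread per task, a pooled executor is usually the better choice once you have many short-lived tasks. As a minimal sketch (not from the original answer; the bean name exec1 is just an assumption matching the question's config), such a pooled executor could be defined in Java config and then referenced from task-executor="exec1":
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class ExecutorConfig {

    // Sketch only: pool sizes are placeholders, tune them for the real load.
    @Bean
    public ThreadPoolTaskExecutor exec1() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(4);       // threads kept in the pool and reused
        executor.setMaxPoolSize(8);        // upper bound when the queue is full
        executor.setQueueCapacity(100);    // tasks buffered before extra threads are added
        executor.setThreadNamePrefix("exec1-");
        return executor;
    }
}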
Example of usage from GitHub:
<?xml version="1.0" encoding="UTF-8"?>
<beans:beans xmlns="http://www.springframework.org/schema/integration"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:beans="http://www.springframework.org/schema/beans"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/integration
http://www.springframework.org/schema/integration/spring-integration.xsd">
<channel id="taskExecutorOnly">
<dispatcher task-executor="taskExecutor"/>
</channel>
<channel id="failoverFalse">
<dispatcher failover="false"/>
</channel>
<channel id="failoverTrue">
<dispatcher failover="true"/>
</channel>
<channel id="loadBalancerDisabled">
<dispatcher load-balancer="none"/>
</channel>
<channel id="loadBalancerDisabledAndTaskExecutor">
<dispatcher load-balancer="none" task-executor="taskExecutor"/>
</channel>
<channel id="roundRobinLoadBalancerAndTaskExecutor">
<dispatcher load-balancer="round-robin" task-executor="taskExecutor"/>
</channel>
<channel id="lbRefChannel">
<dispatcher load-balancer-ref="lb"/>
</channel>
<beans:bean id="taskExecutor"
class="org.springframework.core.task.SimpleAsyncTaskExecutor"/>
<beans:bean id="lb"
class="org.springframework.integration.channel.config.DispatchingChannelParserTests.SampleLoadBalancingStrategy"/>
</beans:beans>

"From the log I can see everything is sequential, just with different thread names."
That's because the log is a single place where messages are printed, and they really are written by one writer, one line at a time, even when they come from different threads. Under a decent load you would definitely see messages logged in an unexpected order.
"I was assuming preSend/postSend would be called in random order."
That's not true. Interceptors are called in the order they were added to the channel, and that order is the same for every message, which is the case for you. How those interceptors are implemented is not the interceptor chain's responsibility.
I think you simply weren't lucky enough to see the logs in an arbitrary order, probably because the consumers on those executor channels are plain loggers: there is no load to hold the threads long enough to give the impression that the work on the other threads is being done in parallel.
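To make that point concrete, here is a standalone plain-JDK sketch (not from the original answer) that mimics three executor channels with three single-thread executors. With Thread.sleep standing in for real work, the output from the three threads interleaves; with near-instant tasks, the lines tend to appear one after another even though the threads run in parallel:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class InterleavingDemo {
    public static void main(String[] args) {
        ExecutorService[] executors = {
            Executors.newSingleThreadExecutor(),
            Executors.newSingleThreadExecutor(),
            Executors.newSingleThreadExecutor()
        };
        for (int i = 0; i < executors.length; i++) {
            final int id = i;
            executors[i].submit(() -> {
                try {
                    Thread.sleep(500); // simulate a consumer that actually holds the thread
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.println(Thread.currentThread().getName()
                        + " handled message for channel" + (id + 1));
            });
        }
        for (ExecutorService e : executors) {
            e.shutdown();
        }
    }
}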

Related

Spring Integration : Dispatcher has no subscribers for channel

I am trying to build a Spring Integration application, which has the following configuration (the culprit seems to be the channel xsltSpecific):
<beans:beans>
<channel id="channel1"></channel>
<channel id="channel2"></channel>
<channel id="xsltSpecific"></channel>
<channel id="xsltSpecificDelayed"></channel>
<channel id="xsltCommon"></channel>
<channel id="irdSpecificUnmarshallerChannel"></channel>
<channel id="irdSpecificInputChannel"></channel>
<file:outbound-channel-adapter
directory="${dml.ird.directory}" channel="channel1"
auto-create-directory="true" filename-generator="timestampedFileNameGenerator">
</file:outbound-channel-adapter>
<recipient-list-router input-channel="fileChannel">
<recipient channel="channel1" selector-expression="${dml.data.logs.enable}" />
<recipient channel="channel2" />
</recipient-list-router>
<recipient-list-router input-channel="channel2">
<recipient channel="xsltSpecificDelayed"></recipient>
<recipient channel="xsltCommon"></recipient>
</recipient-list-router>
<delayer id="specificDelayer" input-channel="xsltSpecificDelayed" default-delay="5000" output-channel="xsltSpecific"/>
<jms:message-driven-channel-adapter
id="jmsInboundAdapterIrd" destination="jmsInputQueue" channel="fileChannel"
acknowledge="transacted" transaction-manager="transactionManager"
error-channel="errorChannel" client-id="${ibm.jms.connection.factory.client.id}"
subscription-durable="true" durable-subscription-name="${ibm.jms.subscription.id1}" />
<si-xml:xslt-transformer input-channel="xsltCommon" output-channel="jmsInputChannel"
xsl-resource="classpath:summit-hub-to-cpm-mapping.xsl" result-transformer="resultTransformer" >
</si-xml:xslt-transformer>
<si-xml:xslt-transformer input-channel="xsltSpecific" output-channel="irdSpecificUnmarshallerChannel"
xsl-resource="classpath:summit-hub-specific.xsl" result-transformer="resultTransformer" >
</si-xml:xslt-transformer>
<si-xml:unmarshalling-transformer id="irdUnmarshaller"
unmarshaller="irdUnmarshallerDelegate" input-channel="irdSpecificUnmarshallerChannel"
output-channel="saveSpecificTradeChannel" />
<beans:bean id="irdUnmarshallerDelegate"
class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
<beans:property name="schema"
value="summit-hub-specific.xsd" />
<beans:property name="contextPath"
value="com.h.i.c.d.i.mapping" />
</beans:bean>
<beans:bean id="resultTransformer" class="org.springframework.integration.xml.transformer.ResultToStringTransformer" />
<service-activator ref="specificTradeService" input-channel="saveSpecificTradeChannel"
requires-reply="false" method="save"/>
<file:inbound-channel-adapter directory="${dml.retry.directoryForIrd}"
channel="fileChannelAfterRetry" auto-create-directory="true"
prevent-duplicates="false" filename-regex=".*\.(msg|xml)" queue-size="50" >
<poller fixed-delay="${dml.retry.delay}" max-messages-per-poll="50">
<transactional transaction-manager="transactionManager" />
</poller>
</file:inbound-channel-adapter>
<channel id="fileChannel"/>
<channel id="fileChannelAfterRetry"/>
<file:file-to-string-transformer
input-channel="fileChannelAfterRetry" output-channel="fileChannel"
delete-files="true" />
<beans:import resource="classpath:cpm-dml-common-main.xml" />
</beans:beans>
But I am getting the following exception:
org.springframework.messaging.MessageDeliveryException: Dispatcher has no subscribers for channel 'org.springframework.context.support.GenericApplicationContext#6950e31.xsltSpecific'.; nested exception is org.springframework.integration.MessageDispatchingException: Dispatcher has no subscribers
What does this exception mean?
Also, I am not able to spot the problem. Can you help me fix this issue?
UPDATE
Sorry, I didn't give the whole context earlier, because I didn't think it was relevant.
The exception arises during a test derived from AbstractTransactionalJUnit4SpringContextTests, which closes the application context at the end of the test, before the message has had a chance to reach the end of the flow.
I've added a Thread.sleep(10000) at the end of the test, and the exception doesn't happen anymore.
The xsltSpecific channel is just a default DirectChannel with a UnicastingDispatcher to deliver messages to the channel's subscribers.
According to your configuration, you send a message to this channel from:
<delayer id="specificDelayer" input-channel="xsltSpecificDelayed" default-delay="5000" output-channel="xsltSpecific"/>
And it also looks like you really do have a subscriber on this channel:
<si-xml:xslt-transformer input-channel="xsltSpecific" output-channel="irdSpecificUnmarshallerChannel"
xsl-resource="classpath:summit-hub-specific.xsl" result-transformer="resultTransformer" >
</si-xml:xslt-transformer>
What is really not clear is when this subscriber gets lost. It doesn't look like you have auto-startup="false" on this endpoint, but on the other hand maybe you really do stop it at runtime...
Would you mind sharing more of the stack trace on the matter? I want to see who the original caller is for that lost message.

New threads are getting created on polling a directory using spring integration

This is our Spring configuration:
<int-file:inbound-channel-adapter id="fileReprocessorChannelId" channel="fileReprocessorChannel"
directory="${file.location}" scanner="headScanner">
<int:poller cron="${reprocess.cronExpression}" max-messages-per-poll="${reprocess.maxMsgPerPoll}" />
</int-file:inbound-channel-adapter>
<int:chain id="reprocessorChain" input-channel="fileReprocessorChannel" output-channel="transformerChannel">
<int-file:file-to-string-transformer delete-files="false" charset="UTF-8" />
<int:header-enricher>
<int:header name="Operation" value="${operation.fileReprocessor}" overwrite="true" />
<int:header name="GUID" method="getGuidForReprocessing" ref="headerAttributesGenerator"/>
</int:header-enricher>
</int:chain>
<bean id="headScanner" class="FileStreamDirectoryScanner">
<constructor-arg>
<value>${reprocess.maxMsgPerPoll}</value>
</constructor-arg>
<constructor-arg>
<value>${reprocess.fileAgeInMillis}</value>
</constructor-arg>
<property name="locker" ref="nio-locker" />
</bean>
<bean id="nio-locker" class="org.springframework.integration.file.locking.NioFileLocker" />
<int:channel id="transformerChannel">
<int:interceptors>
<int:wire-tap channel="loggerChannel"/>
</int:interceptors>
</int:channel>
On running the server with around 10,000 files on disk, we see the following exception after around 7,000 files have been processed: java.nio.file.FileSystemException: Too many open files.
On debugging the code, the threads seem to be created here: https://github.com/spring-projects/spring-integration/blob/master/spring-integration-core/src/main/java/org/springframework/integration/endpoint/AbstractPollingEndpoint.java#L334
The huge number of threads (around 70) is consuming a lot of CPU, leading to an application crash.
Could you please advise whether there is a better way to do this (are we doing something wrong?), or whether this is a known bug in the Spring code?
Edit:
Thread dump attached:
Thread dump
Instead, I would recommend the WatchService if the files are in a few directories. It likely spins up a thread too, but for the directory, not for each file in the directory.
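For illustration, here is a plain-JDK sketch of the WatchService idea (not from the original answer; the directory path is hypothetical): one watcher, and effectively one waiting thread, per directory, no matter how many files show up:
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class DirectoryWatcher {
    public static void main(String[] args) throws Exception {
        Path dir = Paths.get("/tmp/inbound"); // hypothetical directory
        try (WatchService watchService = FileSystems.getDefault().newWatchService()) {
            dir.register(watchService, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watchService.take(); // blocks until events arrive
                for (WatchEvent<?> event : key.pollEvents()) {
                    System.out.println("New file: " + event.context());
                }
                key.reset();
            }
        }
    }
}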
The default taskExecutor is a SyncTaskExecutor so the task runs on the scheduler thread.
The default taskScheduler bean only has 10 threads so you must have some other configuration that you are not showing.
Have you looked at the thread dump (jstack pid) to see what all these threads are doing?

reply-timeout meaning in tcp-inbound-gateway in spring integration

A Spring Integration TCP gateway can be set up as follows:
<!-- Server side -->
<int-ip:tcp-connection-factory id="crLfServer"
type="server"
port="${availableServerSocket}"/>
<int-ip:tcp-inbound-gateway id="gatewayCrLf"
connection-factory="crLfServer"
request-channel="serverBytes2StringChannel"
error-channel="errorChannel"
reply-timeout="10000" />
<int:channel id="toSA" />
<int:service-activator input-channel="toSA"
ref="echoService"
method="test"/>
<bean id="echoService"
class="org.springframework.integration.samples.tcpclientserver.EchoService" />
<int:object-to-string-transformer id="serverBytes2String"
input-channel="serverBytes2StringChannel"
output-channel="toSA"/>
<int:transformer id="errorHandler"
input-channel="errorChannel"
expression="Error processing payload"/>
Notice the reply-timeout, which is set to 10 seconds.
Does it mean that the TCP server will call the service and wait for a maximum of 10 seconds? If the service does not reply within 10 seconds, will the TCP server send the message to errorChannel, which in turn sends the client the error message "Error processing payload"?
When I tested the TCP server with a service that takes 20 seconds, the client takes 20 seconds to get the response. I am not seeing the error message.
Can you please help me understand reply-timeout in the TCP inbound gateway?
Thanks
UPDATE:
Thanks to Artem for helping out with this issue.
The best way to solve this problem is with the following configuration:
<beans>
<int-ip:tcp-connection-factory id="crLfServer" type="server" port="${availableServerSocket}"/>
<int-ip:tcp-inbound-gateway id="gatewayCrLf" connection-factory="crLfServer" request-channel="requestChannel" error-channel="errorChannel" reply-timeout="5000" />
<int:service-activator input-channel="requestChannel" ref="gateway" requires-reply="true"/>
<int:gateway id="gateway" default-request-channel="timeoutChannel" default-reply-timeout="5000" />
<int:object-to-string-transformer id="serverBytes2String" input-channel="timeoutChannel" output-channel="serviceChannel"/>
<int:channel id="timeoutChannel">
<int:dispatcher task-executor="executor"/>
</int:channel>
<bean id="executor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="5" />
<property name="maxPoolSize" value="10" />
<property name="queueCapacity" value="25" />
</bean>
<int:service-activator input-channel="serviceChannel" ref="echoService" method="test"/>
<bean id="echoService" class="org.springframework.integration.samples.tcpclientserver.EchoService" />
<int:transformer id="errorHandler" input-channel="errorChannel" expression="payload.failedMessage.payload + ' errorHandleMsg: may be timeout error'"/>
</beans>
Thanks
Well, actually, we should add a description for that attribute like we have in other similar places, e.g. the HTTP Inbound Gateway:
<xsd:attribute name="reply-timeout" type="xsd:string">
<xsd:annotation>
<xsd:documentation><![CDATA[
Used to set the receiveTimeout on the underlying MessagingTemplate instance
(org.springframework.integration.core.MessagingTemplate) for receiving messages
from the reply channel. If not specified this property will default to "1000"
(1 second).
]]></xsd:documentation>
</xsd:annotation>
</xsd:attribute>
That timeout means how long to wait for a reply from the downstream flow. But! That only works if your flow is shifted to another thread somewhere. Otherwise everything is performed on the caller's thread, and therefore the wait time isn't deterministic.
Anyway, we return null there after the timeout without a reply, and that is reflected in the TcpInboundGateway:
Message<?> reply = this.sendAndReceiveMessage(message);
if (reply == null) {
if (logger.isDebugEnabled()) {
logger.debug("null reply received for " + message + " nothing to send");
}
return false;
}
We could reconsider the logic in the TcpInboundGateway to something like:
if (reply == null && this.errorOnTimeout) {
if (object instanceof Message) {
error = new MessageTimeoutException((Message<?>) object, "No reply received within timeout");
}
else {
error = new MessageTimeoutException("No reply received within timeout");
}
}
But it seems to me it really would be better to rely on the timeout on the client side.
UPDATE
I think we can overcome the limitation and meet your requirements with a midflow <gateway>:
<gateway id="gateway" default-request-channel="timeoutChannel" default-reply-timeout="10000"/>
<channel id="timeoutChannel">
<dispatcher task-executor="executor"/>
</channel>
<service-activator input-channel="requestChannel"
ref="gateway"
requires-reply="true"/>
So, the <service-activator> calls the <gateway> and waits for a reply from it. Requiring a reply from the latter (requires-reply="true"), of course, ends up with a ReplyRequiredException, which you can convert into the desired MessageTimeoutException in your error flow on the error-channel="errorChannel".
The timeoutChannel is an executor channel, which makes our default-reply-timeout="10000" really useful, because we hand the message off to a separate thread on the gateway immediately and move straight from there into the reply-wait process, wrapped with that timeout on a CountDownLatch.
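As a plain-JDK analogy (this is not the framework's internal code), the hand-off described above behaves like submitting the work to another thread and then waiting on a future with a timeout; the timeout is only meaningful because the work is no longer running on the caller's thread:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReplyTimeoutAnalogy {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> reply = executor.submit(() -> {
            Thread.sleep(20_000); // slow downstream service
            return "reply";
        });
        try {
            // Waits at most 10 seconds, like the gateway's reply timeout.
            System.out.println(reply.get(10, TimeUnit.SECONDS));
        } catch (TimeoutException e) {
            System.out.println("no reply within timeout -> error flow");
        } finally {
            executor.shutdownNow();
        }
    }
}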
Hope that is clear.

TCP Spring Integration - Dispatcher has no subscribers for 'response' channel

I am using Spring Integration to make a TCP call to a server, passing a message and getting a response back. I prefer to use channel adapters so I can send and receive bulk messages. The problem I am facing is with the response channel: I am getting "Dispatcher has no subscribers for channel" for the response channel.
Everything is working fine except that the response is not getting transported on the response channel. I can see the handshake at the server, and the response in the log being put on the response and logger channels, but after that the exception is thrown. The configuration setup is:
<gateway id="clientPositionsGateway" service-interface="MyGatewayInterface">
<method name="fetchClientPositions" request-channel="clientPositionsRequestChannel" />
</gateway>
<channel id="clientPositionsRequestChannel" />
<splitter input-channel="clientPositionsRequestChannel"
output-channel="singleClientPositionsRequestChannel" />
<channel id = "singleClientPositionsRequestChannel" />
<transformer
input-channel="singleClientPositionsRequestChannel"
output-channel="dmQueryRequestChannel"
ref="dmPosBaseQueryTransformer" />
<channel id = "dmQueryRequestChannel">
<!-- <dispatcher task-executor="executor"/> -->
</channel>
<ip:tcp-connection-factory id="csClient"
type="client"
host="somehost"
port="12345"
single-use="true"
deserializer="connectionSerializeDeserialize"
/>
<ip:tcp-outbound-channel-adapter id="dmServerOutboundAdapter"
channel="dmQueryRequestChannel"
connection-factory="csClient"
order="2"
/>
<ip:tcp-inbound-channel-adapter id="dmServerInboundAdapter"
channel="dmQueryResponseChannel"
connection-factory="csClient"
error-channel="errorChannel"/>
<channel id="dmQueryResponseChannel"/>
As Artem said in his comment, 'Dispatcher has no subscribers' means here that there is no endpoint configured to receive the response on dmQueryResponseChannel, or the endpoint configured with that channel as its input channel is not started.
In any case, even when you resolve that, using independent adapters for request/response scenarios is tricky because the framework has no way to automatically correlate the response to the request. That's what the outbound gateway is for. You can use collaborating adapters, but you have to deal with the correlation yourself. If you are using a request/reply gateway to initiate the flow, you will have to use a technique such as the one explored in the tcp-client-server-multiplex sample. This is because using independent adapters means you'll lose the replyChannel header used to get the response back to the gateway.
Or, you can use a void returning gateway to send the request, and an <int:outbound-channel-adapter/> so the framework will call back with the response and you can do your own correlation programmatically.
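To sketch the "do your own correlation programmatically" option (all names here are hypothetical, not from the sample), the request side can register a pending future under a correlation id before sending through the void gateway, and the endpoint that consumes responses from the inbound adapter completes it:
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

public class ResponseCorrelator {

    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Called before sending a request through the void gateway.
    public CompletableFuture<String> expectReply(String correlationId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(correlationId, future);
        return future;
    }

    // Called by the endpoint that consumes responses from the inbound adapter.
    public void onResponse(String correlationId, String payload) {
        CompletableFuture<String> future = pending.remove(correlationId);
        if (future != null) {
            future.complete(payload);
        }
    }
}
The catch, as with the multiplex sample, is that the correlation id has to be recoverable from the response payload itself, since the reply no longer carries the request's headers.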
If your clientPositionsGateway is invoked from client threads, there is no reason to use executor channels. If you call clientPositionsGateway in a loop millions of times, try the Future gateway instead: http://docs.spring.io/spring-integration/docs/2.2.5.RELEASE/reference/html/messaging-endpoints-chapter.html#async-gateway - and again, without executor channels. And I don't see a reason to use reply-channel on both gateways.
And one more thing: you have a <splitter> before the <tcp:outbound-gateway>, but where is the <aggregator> after the <tcp:outbound-gateway>?..
In your current case you get one reply on your clientPositionsGateway and all the others will be dropped, because the TemporaryReplyChannel will already be closed.
public interface ClientPositionsGateway {
    String fetchClientPositions(List<String> clientList);
}
Here is the code that solved my problem:
@ContextConfiguration(locations = {"/clientGIM2Position.xml"})
@RunWith(SpringJUnit4ClassRunner.class)
public class GetClientPositionsTest {

    @Autowired
    ClientPositionsGateway clientPositionsGateway;

    @Test
    public void testGetPositions() throws Exception {
        // clientList is assumed to be built elsewhere in the test class
        String positions = clientPositionsGateway.fetchClientPositions(clientList);
        System.out.println("returned !!!!" + positions);
    }
}
<?xml version="1.0" encoding="UTF-8"?>
<beans:beans xmlns:beans="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.springframework.org/schema/integration"
xmlns:ip="http://www.springframework.org/schema/integration/ip"
xmlns:context="http://www.springframework.org/schema/context"
xmlns:task="http://www.springframework.org/schema/task"
xsi:schemaLocation="http://www.springframework.org/schema/integration/ip http://www.springframework.org/schema/integration/ip/spring-integration-ip.xsd
http://www.springframework.org/schema/integration http://www.springframework.org/schema/integration/spring-integration.xsd
http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
http://www.springframework.org/schema/task http://www.springframework.org/schema/task/spring-task.xsd">
<!-- intercept and log every message -->
<logging-channel-adapter id="logger" level="DEBUG" />
<wire-tap channel = "logger" />
<gateway id="clientPositionsGateway"
service-interface="com.example.ClientPositionsGateway">
<method name="fetchClientPositions" request-channel="clientPositionsRequestChannel" reply-channel="dmQueryResponseChannel"/>
</gateway>
<channel id="clientPositionsRequestChannel" />
<splitter input-channel="clientPositionsRequestChannel"
output-channel="singleClientPositionsRequestChannel" />
<channel id = "singleClientPositionsRequestChannel" />
<transformer
input-channel="singleClientPositionsRequestChannel"
output-channel="dmQueryRequestChannel"
ref="dmPosBaseTransQueryTransformer" />
<channel id = "dmQueryRequestChannel">
<dispatcher task-executor="executor"/>
</channel>
<ip:tcp-connection-factory id="csClient"
type="client"
host="hostserver"
port="22010"
single-use="true"
deserializer="connectionSerializeDeserialize"
/>
<ip:tcp-outbound-gateway id="dmServerGateway"
request-channel="dmQueryRequestChannel"
reply-channel="dmQueryResponseChannel"
connection-factory="csClient" />
<channel id="dmQueryResponseChannel">
<dispatcher task-executor="executor"/>
</channel>
<channel id="serverBytes2StringChannel" />
<bean id="connectionSerializeDeserialize" class="com.example.DMQueryResponseSerializer"/>
<bean id="dmPosBaseTransQueryTransformer" class="com.example.DMPOSBaseTransQueryTransformer"/>
<task:executor id="executor" pool-size="5"/>
</beans:beans>
Configuration settings:
<gateway id="clientPositionsGateway" service-interface="com.example.ClientPositionsGateway">
<method name="fetchClientPositions" request-channel="clientPositionsRequestChannel" reply-channel="dmQueryResponseChannel"/>
</gateway>
<channel id="clientPositionsRequestChannel" />
<splitter input-channel="clientPositionsRequestChannel"
output-channel="singleClientPositionsRequestChannel" />
<channel id = "singleClientPositionsRequestChannel" />
<transformer
input-channel="singleClientPositionsRequestChannel"
output-channel="dmQueryRequestChannel"
ref="dmPosBaseQueryTransformer" />
<logging-channel-adapter channel="clientBytes2StringChannel"/>
<channel id = "dmQueryRequestChannel">
<dispatcher task-executor="executor"/>
</channel>
<ip:tcp-connection-factory id="csClient"
type="client"
host="serverHost"
port="22010"
single-use="true"
deserializer="connectionSerializeDeserialize"
/>
<ip:tcp-outbound-channel-adapter id="dmServerOutboundAdapter"
channel="dmQueryRequestChannel"
connection-factory="csClient"
/>
<ip:tcp-inbound-channel-adapter id="dmServerInboundAdapter"
channel="dmQueryResponseChannel"
connection-factory="csClient"
error-channel="errorChannel"/>
<transformer input-channel="dmQueryResponseChannel" output-channel="clientBytes2StringChannel" ref="dmPOSBaseQueryResponseTransformer"
/>
<channel id="dmQueryResponseChannel"/>
<channel id="clientBytes2StringChannel"/>

Several data sources in one spring integration pipeline?

I have a configured Spring Integration pipeline where XML files are parsed into various objects. The objects go through several channel endpoints where they are slightly modified - nothing special, just some properties added.
The last endpoint in the pipeline is the persister, where the objects are persisted to the DB. There might be duplicates, so this endpoint also checks whether the object is already persisted or is a new one.
I use a message-driven architecture with simple direct channels.
<int:channel id="parsedObjects1" />
<int:channel id="parsedObjects2" />
<int:channel id="processedObjects" />
<int:service-activator input-channel="parsedObjects1" ref="processor1" method="process" />
<int:service-activator input-channel="parsedObjects2" ref="processor2" method="process" />
<int:service-activator input-channel="processedObjects" ref="persister" method="persist" />
At the moment there is only one data source from which I get XML files, and everything goes smoothly. The problems begin when I need to attach a second data source. The files arrive at the same time, so I want them processed in parallel. So I've placed two parser instances, and each parser sends messages through the pipeline.
The configuration with the direct channels that I have creates concurrency problems, so I've tried modifying it. I've tried several configurations from the Spring Integration documentation, but so far with no success.
I've tried a dispatcher configured with a max pool size of 1 - one thread per message in every channel endpoint:
<task:executor id="channelTaskExecutor" pool-size="1-1" keep-alive="10" rejection-policy="CALLER_RUNS" queue-capacity="1" />
<int:channel id="parsedObjects1" >
<int:dispatcher task-executor="channelTaskExecutor" />
</int:channel>
<int:channel id="parsedObjects2" >
<int:dispatcher task-executor="channelTaskExecutor" />
</int:channel>
<int:channel id="processedObjects" >
<int:dispatcher task-executor="channelTaskExecutor" />
</int:channel>
I have tried the queue-poller configuration also:
<task:executor id="channelTaskExecutor" pool-size="1-1" keep-alive="10" rejection-policy="CALLER_RUNS" queue-capacity="1" />
<int:channel id="parsedObjects1" >
<int:rendezvous-queue/>
</int:channel>
<int:channel id="parsedObjects2" >
<int:rendezvous-queue/>
</int:channel>
<int:channel id="processedObjects" >
<int:rendezvous-queue/>
</int:channel>
<int:service-activator input-channel="parsedObjects1" ref="processor1" method="process" >
<int:poller task-executor="channelTaskExecutor" max-messages-per-poll="1" fixed-rate="2" />
</int:service-activator>
<int:service-activator input-channel="parsedObjects2" ref="processor2" method="process" >
<int:poller task-executor="channelTaskExecutor" max-messages-per-poll="1" fixed-rate="2" />
</int:service-activator>
<int:service-activator input-channel="processedObjects" ref="persister" method="persist" >
<int:poller task-executor="channelTaskExecutor" max-messages-per-poll="1" fixed-rate="2" />
</int:service-activator>
Basically, I want to get rid of any race conditions in the channel endpoints - in my case in the persister. The persister channel endpoint should block for every message, because if it runs in parallel, I get many duplicates persisted in the DB.
EDIT:
After some debugging, it seems that the problems are in the endpoint logic rather than in the configuration. Some of the objects sent through the pipeline to the persister are also stored in a local cache until parsing of the file is done; they are later sent through the pipeline again to persist some join tables as part of other domain entities. With the above configurations, some of those objects have not yet been persisted when they are sent through the pipeline for the second time, so I end up with duplicates in the DB.
I'm fairly new to Spring Integration, so at this point I will probably ask more general questions. In a setup with multiple data sources - meaning multiple instances of parsers, etc.:
Is there a common (best) way to configure the pipeline to enable parallelization?
If there is a need, is there a way to serialize the message handling?
Any suggestions are welcome. Thanks in advance.
First, can you describe what the "concurrency problems" are? Ideally you would not need to serialize the message handling, so that would be a good place to start.
Second, the thread pool as you've configured it will not completely serialize. You will have one thread available in the pool, but the rejection policy you've chosen means a caller thread runs the task itself (basically throttling) whenever the queue is at capacity. That means you will get a caller-run thread executing concurrently with the one from the pool.
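A plain-JDK sketch (not from the original answer) of that CALLER_RUNS behavior: with one pool thread and a tiny queue, the submitting thread ends up running the overflow task itself, so two tasks execute at the same time even though the pool size is 1:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallerRunsDemo {
    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 10, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());

        Runnable task = () -> {
            System.out.println("running on " + Thread.currentThread().getName());
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        for (int i = 0; i < 4; i++) {
            executor.execute(task); // once the queue is full, "main" runs the task itself
        }
        executor.shutdown();
        executor.awaitTermination(10, TimeUnit.SECONDS);
    }
}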
The best way that I can think of for your scenario would be along these lines:
Make your parsedObjects1 and parsedObjects2 channels normal queue channels; the capacity of the queue can be set appropriately (say, 25 at any time):
<int:channel id="parsedObjects1" >
<int:queue />
</int:channel>
At this point your XML processors on the two channels - parsedObjects1 and parsedObjects2 - will process the XML and should output to the processedObjects channel. You can use a configuration similar to what you already have, except that I have explicitly specified the processedObjects output channel:
<int:service-activator input-channel="parsedObjects1" ref="processor1" method="process" output-channel="processedObjects">
<int:poller task-executor="channelTaskExecutor"/>
</int:service-activator>
The third step is where I deviate from your configuration. You said you want to serialize the persistence; the best way to do that is through a DIFFERENT task executor with a pool size of 1, so that only one instance of your persister is running at any point in time:
<task:executor id="persisterpool" pool-size="1"/>
<int:service-activator input-channel="processedObjects" ref="persister" method="persist" >
<int:poller task-executor="persisterpool" fixed-delay="2"/>
</int:service-activator>
I managed to get the pipeline working. I'm not sure if I'll keep the current configuration, or experiment some more, but for now, this is the configuration I ended up with:
<task:executor id="channelTaskExecutor" pool-size="1-1" keep-alive="10" rejection-policy="CALLER_RUNS" queue-capacity="1" />
<int:channel id="parsedObjects1" >
<int:queue capacity="1000" />
</int:channel>
<int:channel id="parsedObjects2" >
<int:queue capacity="1000" />
</int:channel>
<int:channel id="processedObjects" >
<int:queue capacity="1000" />
</int:channel>
<int:service-activator input-channel="parsedObjects1" ref="processor1" method="process" >
<int:poller task-executor="channelTaskExecutor" max-messages-per-poll="100" fixed-rate="2" />
</int:service-activator>
<int:service-activator input-channel="parsedObjects2" ref="processor2" method="process" >
<int:poller task-executor="channelTaskExecutor" max-messages-per-poll="100" fixed-rate="2" />
</int:service-activator>
<int:service-activator input-channel="processedObjects" ref="persister" method="persist" >
<int:poller task-executor="channelTaskExecutor" max-messages-per-poll="1" fixed-rate="2" />
</int:service-activator>
