I have a configured spring integration pipeline where xml files are parsed into various objects. The objects are going through several channel endpoints where they are slightly modified - nothing special, just some properties added.
The last endpoint in the pipeline is the persister, where the objects are persisted in the DB. There might be duplicates, so this endpoint also checks whether the object is already persisted or is a new one.
I use a message driven architecture, with simple direct channels.
<int:channel id="parsedObjects1" />
<int:channel id="parsedObjects2" />
<int:channel id="processedObjects" />
<int:service-activator input-channel="parsedObjects1" ref="processor1" method="process" />
<int:service-activator input-channel="parsedObjects2" ref="processor2" method="process" />
<int:service-activator input-channel="processedObjects" ref="persister" method="persist" />
At the moment there is only one data source from which I get XML files, and everything runs smoothly. The problems begin when I need to attach a second data source. The files arrive at the same time, so I want them processed in parallel. So I've set up two parser instances, and each parser sends messages through the pipeline.
The configuration with the direct channels that I have creates concurrency problems, so I've tried modifying it. I've tried several configurations from the Spring Integration documentation, but so far with no success.
I've tried a dispatcher configured with a max pool size of 1 - one thread per message in every channel endpoint.
<task:executor id="channelTaskExecutor" pool-size="1-1" keep-alive="10" rejection-policy="CALLER_RUNS" queue-capacity="1" />
<int:channel id="parsedObjects1" >
<int:dispatcher task-executor="channelTaskExecutor" />
</int:channel>
<int:channel id="parsedObjects2" >
<int:dispatcher task-executor="channelTaskExecutor" />
</int:channel>
<int:channel id="processedObjects" >
<int:dispatcher task-executor="channelTaskExecutor" />
</int:channel>
I have tried the queue-poller configuration also:
<task:executor id="channelTaskExecutor" pool-size="1-1" keep-alive="10" rejection-policy="CALLER_RUNS" queue-capacity="1" />
<int:channel id="parsedObjects1" >
<int:rendezvous-queue/>
</int:channel>
<int:channel id="parsedObjects2" >
<int:rendezvous-queue/>
</int:channel>
<int:channel id="processedObjects" >
<int:rendezvous-queue/>
</int:channel>
<int:service-activator input-channel="parsedObjects1" ref="processor1" method="process" >
<int:poller task-executor="channelTaskExecutor" max-messages-per-poll="1" fixed-rate="2" />
</int:service-activator>
<int:service-activator input-channel="parsedObjects2" ref="processor2" method="process" >
<int:poller task-executor="channelTaskExecutor" max-messages-per-poll="1" fixed-rate="2" />
</int:service-activator>
<int:service-activator input-channel="processedObjects" ref="persister" method="persist" >
<int:poller task-executor="channelTaskExecutor" max-messages-per-poll="1" fixed-rate="2" />
</int:service-activator>
Basically, I want to get rid of any race conditions in the channel endpoints - in my case in the persister. The persister channel endpoint should block for every message, because if it runs in parallel, I get many duplicates persisted in the DB.
EDIT:
After some debugging I've done, it seems that the problems are in the endpoints logic rather than the configuration. Some of the objects which are sent through the pipeline to the persister, are also stored in a local cache until parsing of the file is done - they are later sent through the pipeline as well to persist some join tables as a part of some other domain entities. It happens that with the above configurations, some of the objects were not yet persisted when they are sent for the second time in the pipeline, so at the end I get duplicates in the DB.
I'm fairly new at spring integration, so probably at this point I will ask more general questions. In a setup with multiple data sources - meaning multiple instances of parsers etc:
Is there a common way (best way) to go to configure the pipeline to enable parallelization?
If there is need, is there a way to serialize the message handling?
Any suggestions are welcomed. Thanks in advance.
First, can you describe what the "concurrency problems" are? Ideally you would not need to serialize the message handling, so that would be a good place to start.
Second, the thread pool as you've configured it will not completely serialize. You will have 1 thread available in the pool but the rejection policy you've chosen leads to a caller thread running the task itself (basically throttling) in the case that the queue is at capacity. That means you will get a caller-run thread concurrently with the one from the pool.
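To see the caller-runs effect in isolation, here is a minimal sketch in plain java.util.concurrent. It assumes that a ThreadPoolExecutor with one thread, a one-slot queue and CallerRunsPolicy is a fair stand-in for the channelTaskExecutor above; the class name and the A/B/C task labels are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class CallerRunsDemo {

    /** Submits three tasks and returns the name of the thread that ran each one. */
    static Map<String, String> run() {
        // Assumed equivalent of pool-size="1-1", queue-capacity="1", rejection-policy="CALLER_RUNS".
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 10, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        Map<String, String> ranOn = new ConcurrentHashMap<>();
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Task A occupies the single pool thread until it is released.
            pool.execute(() -> {
                ranOn.put("A", Thread.currentThread().getName());
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Task B fills the one-slot queue.
            pool.execute(() -> ranOn.put("B", Thread.currentThread().getName()));
            // Task C is rejected, so CallerRunsPolicy runs it on the submitting thread
            // while task A is still active on the pool thread.
            pool.execute(() -> ranOn.put("C", Thread.currentThread().getName()));

            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return ranOn;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Task C ends up on the submitting thread at the same time as task A is running on the pool thread, which is exactly the overlap you are trying to avoid in the persister.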
The best way that I can think of for your scenario would be along these lines:
Make your parsedObjects1 and parsedObjects2 channels normal queue channels; the capacity of the queue can be set appropriately (say 25 messages at any time):
<int:channel id="parsedObjects1" >
<int:queue capacity="25" />
</int:channel>
At this point your XML processors on the two channels, parsedObjects1 and parsedObjects2, will process the XML and should output to the processedObjects channel. You can use a configuration similar to what you have, except that I have explicitly specified the processedObjects channel as the output channel:
<int:service-activator input-channel="parsedObjects1" ref="processor1" method="process" output-channel="processedObjects">
<int:poller task-executor="channelTaskExecutor"/>
</int:service-activator>
The third step is where I will deviate from your configuration. You said you want to serialize the persistence; the best way would be to do it through a DIFFERENT task executor with a pool size of 1, so that only one instance of your persister is running at any point in time:
<task:executor id="persisterpool" pool-size="1"/>
<int:service-activator input-channel="processedObjects" ref="persister" method="persist" >
<int:poller task-executor="persisterpool" fixed-delay="2"/>
</int:service-activator>
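The idea behind the dedicated single-thread pool can be sketched without Spring. This is only an illustration under assumptions: the Persister class and its in-memory list are hypothetical stand-ins for your persister bean and the DB table, and Executors.newSingleThreadExecutor() plays the role of persisterpool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class SerializedPersisterDemo {

    /** Hypothetical stand-in for the persister; deliberately not thread-safe. */
    static final class Persister {
        private final List<Integer> rows = new ArrayList<>();

        void persist(int id) {
            // Check-then-insert: only safe if no two calls ever overlap.
            if (!rows.contains(id)) {
                rows.add(id);
            }
        }

        int rowCount() {
            return rows.size();
        }
    }

    /** Two "parsers" send the same 1000 ids; returns the number of rows stored. */
    static int run() {
        Persister persister = new Persister();
        // One thread only, so persist() calls run strictly one after another.
        ExecutorService persisterPool = Executors.newSingleThreadExecutor();
        Runnable parser = () -> {
            for (int id = 0; id < 1000; id++) {
                final int key = id;
                persisterPool.execute(() -> persister.persist(key));
            }
        };
        Thread parser1 = new Thread(parser);
        Thread parser2 = new Thread(parser);
        parser1.start();
        parser2.start();
        try {
            parser1.join();
            parser2.join();
            persisterPool.shutdown();
            persisterPool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return persister.rowCount();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Both parsers run in parallel, yet every id is stored once, because the duplicate check and the insert never interleave. Note that the executor here has no rejection policy that could push work back onto the parser threads.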
I managed to get the pipeline working. I'm not sure if I'll keep the current configuration, or experiment some more, but for now, this is the configuration I ended up with:
<task:executor id="channelTaskExecutor" pool-size="1-1" keep-alive="10" rejection-policy="CALLER_RUNS" queue-capacity="1" />
<int:channel id="parsedObjects1" >
<int:queue capacity="1000" />
</int:channel>
<int:channel id="parsedObjects2" >
<int:queue capacity="1000" />
</int:channel>
<int:channel id="processedObjects" >
<int:queue capacity="1000" />
</int:channel>
<int:service-activator input-channel="parsedObjects1" ref="processor1" method="process" >
<int:poller task-executor="channelTaskExecutor" max-messages-per-poll="100" fixed-rate="2" />
</int:service-activator>
<int:service-activator input-channel="parsedObjects2" ref="processor2" method="process" >
<int:poller task-executor="channelTaskExecutor" max-messages-per-poll="100" fixed-rate="2" />
</int:service-activator>
<int:service-activator input-channel="processedObjects" ref="persister" method="persist" >
<int:poller task-executor="channelTaskExecutor" max-messages-per-poll="1" fixed-rate="2" />
</int:service-activator>
I have a requirement where I need to pass a message to multiple channels asynchronously. To make my flow asynchronous I am using executor channels throughout, but for some reason the flow is still sequential. I can see different threads, as configured in the task executors, but they run in sequence.
Here is the configuration I am using:
<int:channel id="mainChannel">
<int:interceptors>
<int:wire-tap channel="channel1"/>
<int:wire-tap channel="channel2"/>
<int:wire-tap channel="channel3"/>
</int:interceptors>
</int:channel>
<int:channel id="channel1">
<int:dispatcher task-executor="exec1" />
</int:channel>
<int:channel id="channel2">
<int:dispatcher task-executor="exec2" />
</int:channel>
<int:channel id="channel3">
<int:dispatcher task-executor="exec3" />
</int:channel>
As per my understanding, all of this should be asynchronous (in my case, 3 threads should run in parallel).
From the log I can see everything is sequential, but with different thread names.
I am assuming preSend/postSend should have been called in random order.
Am I missing anything needed to make multiple executor channels run in parallel?
I would really appreciate any help.
You might need to call the async implementation bean as shown:
<beans:bean id="asyncExecutor"
class="org.springframework.core.task.SimpleAsyncTaskExecutor"/>
<int:channel id="channel1">
<int:dispatcher task-executor="asyncExecutor" />
</int:channel>
<int:channel id="channel2">
<int:dispatcher task-executor="asyncExecutor" />
</int:channel>
<int:channel id="channel3">
<int:dispatcher task-executor="asyncExecutor" />
</int:channel>
Description of SimpleAsyncTaskExecutor:
public class SimpleAsyncTaskExecutor extends CustomizableThreadCreator
implements AsyncListenableTaskExecutor, Serializable
TaskExecutor implementation that fires up a new Thread for each task,
executing it asynchronously.
Supports limiting concurrent threads through the "concurrencyLimit"
bean property. By default, the number of concurrent threads is
unlimited.
NOTE: This implementation does not reuse threads! Consider a
thread-pooling TaskExecutor implementation instead, in particular for
executing a large number of short-lived tasks.
Example Of Usage from Github:
<?xml version="1.0" encoding="UTF-8"?>
<beans:beans xmlns="http://www.springframework.org/schema/integration"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:beans="http://www.springframework.org/schema/beans"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/integration
http://www.springframework.org/schema/integration/spring-integration.xsd">
<channel id="taskExecutorOnly">
<dispatcher task-executor="taskExecutor"/>
</channel>
<channel id="failoverFalse">
<dispatcher failover="false"/>
</channel>
<channel id="failoverTrue">
<dispatcher failover="true"/>
</channel>
<channel id="loadBalancerDisabled">
<dispatcher load-balancer="none"/>
</channel>
<channel id="loadBalancerDisabledAndTaskExecutor">
<dispatcher load-balancer="none" task-executor="taskExecutor"/>
</channel>
<channel id="roundRobinLoadBalancerAndTaskExecutor">
<dispatcher load-balancer="round-robin" task-executor="taskExecutor"/>
</channel>
<channel id="lbRefChannel">
<dispatcher load-balancer-ref="lb"/>
</channel>
<beans:bean id="taskExecutor"
class="org.springframework.core.task.SimpleAsyncTaskExecutor"/>
<beans:bean id="lb"
class="org.springframework.integration.channel.config.DispatchingChannelParserTests.SampleLoadBalancingStrategy"/>
</beans:beans>
from log i can see all sequential but with diff thread name
Because a log is a single place where messages are printed, and they are written by one writer even when they come from different threads, so they appear there one by one. Under real load you would definitely see messages logged in an unexpected order.
I am assuming preSend/Postsend should have been called in random order.
That's not true. Interceptors are called in the order in which they were added to the channel when their order values are the same, which is the case for you. How those interceptors are implemented is not the interceptor chain's responsibility.
I think you were just not lucky enough to see the logs in arbitrary order, probably because the consumers for those executor channels are plain loggers: there is no load to hold a thread long enough to give the impression that work in the other threads is done in parallel.
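If you want proof of overlap that does not depend on log ordering, a sketch like the following may help. It is plain java.util.concurrent, not Spring Integration: three single-thread executors stand in for exec1, exec2 and exec3, and a CyclicBarrier (my addition, not part of your flow) only opens when all three handlers are running at the same moment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ParallelDispatchDemo {

    /** Returns true if all three consumers were inside their handler at the same time. */
    static boolean run() {
        List<ExecutorService> executors = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            executors.add(Executors.newSingleThreadExecutor());
        }
        // The barrier opens only once three threads wait on it together,
        // so passing it shows that the handlers overlapped in time.
        CyclicBarrier barrier = new CyclicBarrier(3);
        Callable<Integer> consumer = () -> barrier.await(5, TimeUnit.SECONDS);

        // The hand-off happens in a fixed order (like the wire-taps) but does not wait.
        List<Future<Integer>> results = new ArrayList<>();
        for (ExecutorService executor : executors) {
            results.add(executor.submit(consumer));
        }

        boolean overlapped = true;
        for (Future<Integer> result : results) {
            try {
                result.get();
            } catch (Exception e) {
                overlapped = false; // timeout or broken barrier: no overlap
            }
        }
        executors.forEach(ExecutorService::shutdown);
        return overlapped;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The messages are handed off sequentially, which is why the send-side log lines look ordered, but the consumers still run concurrently once each one does enough work to keep its thread busy.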
This is our spring config:
<int-file:inbound-channel-adapter id="fileReprocessorChannelId" channel="fileReprocessorChannel"
directory="${file.location}" scanner="headScanner">
<int:poller cron="${reprocess.cronExpression}" max-messages-per-poll="${reprocess.maxMsgPerPoll}" />
</int-file:inbound-channel-adapter>
<int:chain id="reprocessorChain" input-channel="fileReprocessorChannel" output-channel="transformerChannel">
<int-file:file-to-string-transformer delete-files="false" charset="UTF-8" />
<int:header-enricher>
<int:header name="Operation" value="${operation.fileReprocessor}" overwrite="true" />
<int:header name="GUID" method="getGuidForReprocessing" ref="headerAttributesGenerator"/>
</int:header-enricher>
</int:chain>
<bean id="headScanner" class="FileStreamDirectoryScanner">
<constructor-arg>
<value>${reprocess.maxMsgPerPoll}</value>
</constructor-arg>
<constructor-arg>
<value>${reprocess.fileAgeInMillis}</value>
</constructor-arg>
<property name="locker" ref="nio-locker" />
</bean>
<bean id="nio-locker" class="org.springframework.integration.file.locking.NioFileLocker" />
<int:channel id="transformerChannel">
<int:interceptors>
<int:wire-tap channel="loggerChannel"/>
</int:interceptors>
</int:channel>
On running the server with around 10000 files on disk, we get the following exception once around 7000 files have been processed: java.nio.file.FileSystemException: Too many open files.
On debugging the code, the threads seem to be created here: https://github.com/spring-projects/spring-integration/blob/master/spring-integration-core/src/main/java/org/springframework/integration/endpoint/AbstractPollingEndpoint.java#L334
The huge number of threads consumes a lot of CPU, and at ~70 threads the application crashes.
Could you please advise whether there is a better way to do this (are we doing something wrong?), or whether this is a known bug in the Spring code?
Edit:
Thread dump attached.
I would recommend instead the WatchService if the files are in a few directories. This likely spins up a thread too but for the directory, not each file in the directory.
The default taskExecutor is a SyncTaskExecutor so the task runs on the scheduler thread.
The default taskScheduler bean only has 10 threads so you must have some other configuration that you are not showing.
Have you looked at the thread dump (jstack pid) to see what all these threads are doing?
A Spring Integration TCP gateway can be set up as follows:
<!-- Server side -->
<int-ip:tcp-connection-factory id="crLfServer"
type="server"
port="${availableServerSocket}"/>
<int-ip:tcp-inbound-gateway id="gatewayCrLf"
connection-factory="crLfServer"
request-channel="serverBytes2StringChannel"
error-channel="errorChannel"
reply-timeout="10000" />
<int:channel id="toSA" />
<int:service-activator input-channel="toSA"
ref="echoService"
method="test"/>
<bean id="echoService"
class="org.springframework.integration.samples.tcpclientserver.EchoService" />
<int:object-to-string-transformer id="serverBytes2String"
input-channel="serverBytes2StringChannel"
output-channel="toSA"/>
<int:transformer id="errorHandler"
input-channel="errorChannel"
expression="'Error processing payload'"/>
Notice the reply-timeout, which is set to 10 seconds.
Does it mean that the TCP server will call the service and wait for a maximum of 10 seconds? If the service does not reply within 10 seconds, will the TCP server send the message to errorChannel, which in turn sends the client the error message "Error processing payload"?
When I tested the TCP server with a service that takes 20 seconds, the client took 20 seconds to get the response. I did not see the error message.
Can you please help me understand the reply-timeout in the TCP inbound-gateway?
Thanks
UPDATE:
Thanks to Artem for helping out with this issue.
The best way to solve this problem is with the following config:
<beans>
<int-ip:tcp-connection-factory id="crLfServer" type="server" port="${availableServerSocket}"/>
<int-ip:tcp-inbound-gateway id="gatewayCrLf" connection-factory="crLfServer" request-channel="requestChannel" error-channel="errorChannel" reply-timeout="5000" />
<int:service-activator input-channel="requestChannel" ref="gateway" requires-reply="true"/>
<int:gateway id="gateway" default-request-channel="timeoutChannel" default-reply-timeout="5000" />
<int:object-to-string-transformer id="serverBytes2String" input-channel="timeoutChannel" output-channel="serviceChannel"/>
<int:channel id="timeoutChannel">
<int:dispatcher task-executor="executor"/>
</int:channel>
<bean id="executor" class="org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor">
<property name="corePoolSize" value="5" />
<property name="maxPoolSize" value="10" />
<property name="queueCapacity" value="25" />
</bean>
<int:service-activator input-channel="serviceChannel" ref="echoService" method="test"/>
<bean id="echoService" class="org.springframework.integration.samples.tcpclientserver.EchoService" />
<int:transformer id="errorHandler" input-channel="errorChannel" expression="payload.failedMessage.payload + ' errorHandleMsg: may be timeout error'"/>
</beans>
Thanks
Well, actually we should have a description on that attribute like the one we have in other similar places, e.g. the HTTP Inbound Gateway:
<xsd:attribute name="reply-timeout" type="xsd:string">
<xsd:annotation>
<xsd:documentation><![CDATA[
Used to set the receiveTimeout on the underlying MessagingTemplate instance
(org.springframework.integration.core.MessagingTemplate) for receiving messages
from the reply channel. If not specified this property will default to "1000"
(1 second).
]]></xsd:documentation>
</xsd:annotation>
</xsd:attribute>
That timeout is how long to wait for a reply from the downstream flow. But that only works if your flow is shifted to another thread somewhere. Otherwise everything is performed in the caller's thread, and therefore the wait time isn't deterministic.
Anyway, we return null there after a timeout without a reply. That is reflected in the TcpInboundGateway:
Message<?> reply = this.sendAndReceiveMessage(message);
if (reply == null) {
if (logger.isDebugEnabled()) {
logger.debug("null reply received for " + message + " nothing to send");
}
return false;
}
We could reconsider the logic in the TcpInboundGateway to do something like:
if (reply == null && this.errorOnTimeout) {
if (object instanceof Message) {
error = new MessageTimeoutException((Message<?>) object, "No reply received within timeout");
}
else {
error = new MessageTimeoutException("No reply received within timeout");
}
}
But it seems to me it really would be better to rely on the timeout on the client side.
UPDATE
I think we can overcome the limitation and meet your requirements with a mid-flow <gateway>:
<gateway id="gateway" default-request-channel="timeoutChannel" default-reply-timeout="10000"/>
<channel id="timeoutChannel">
<dispatcher task-executor="executor"/>
</channel>
<service-activator input-channel="requestChannel"
ref="gateway"
requires-reply="true"/>
So the <service-activator> calls the <gateway> and waits for a reply from it. The reply is required, of course, so that a missing one ends up as a ReplyRequiredException, which you can convert into the desired MessageTimeoutException in your error flow on the error-channel="errorChannel".
The timeoutChannel is an executor channel, which makes our default-reply-timeout="10000" really useful: the gateway shifts the message to a separate thread immediately and moves straight into the reply-waiting process, which is wrapped with that timeout on a CountDownLatch.
Hope that is clear.
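For intuition about why the thread shift matters, here is a rough sketch in plain java.util.concurrent. It is not the Spring Integration implementation: Future.get with a timeout stands in for the gateway's latch-based wait, and the class and method names are made up for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ReplyTimeoutDemo {

    /** Hands the request to another thread and waits at most timeoutMillis for the reply. */
    static String sendAndReceive(ExecutorService executor, Callable<String> service, long timeoutMillis) {
        Future<String> reply = executor.submit(service);
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            reply.cancel(true); // interrupt the slow service
            return null;        // no reply within the timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the reply of a fast service and of one that never answers in time. */
    static String[] run() {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        CountDownLatch neverReleased = new CountDownLatch(1);
        try {
            String fast = sendAndReceive(executor, () -> "echo", 5000);
            String slow = sendAndReceive(executor, () -> {
                neverReleased.await(); // simulates a service slower than the timeout
                return "late";
            }, 200);
            return new String[] {fast, slow};
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String[] replies = run();
        System.out.println(replies[0] + " / " + replies[1]);
    }
}
```

The waiting thread can only give up after the timeout because the service runs somewhere else. If the service were called directly on the waiting thread, as with a direct channel, the caller would simply be stuck for as long as the service takes.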
I would like to apply an Interceptor on the reply-channel of an http:inbound-gateway to save some event related data to a table. The flow continues in a chain which then goes to a header-value-router. As an example let's take a service-activator at the end of this flow, where the output-channel is not specified. In this case, the replyChannel header holds a TemporaryReplyChannel object (anonymous reply channel) instead of the gateway's reply-channel. This way the Interceptor is never called.
Is there a way to "force" the usage of the specified reply-channel? The Spring document states that
by defining a default-reply-channel you can point to a channel of your choosing, which in this case would be a publish-subscribe-channel. The Gateway would create a bridge from it to the temporary, anonymous reply channel that is stored in the header.
I've tried using a publish-subscribe-channel as reply-channel, but it didn't make any difference. Maybe I misunderstood the article...
Inside my chain I've also experimented with a header-enricher. I wanted to overwrite the value of the replyChannel with the id of the channel I want to intercept (submit.reply.channel). While debugging I was able to see "submit.reply.channel" in the header, but then I got an exception java.lang.NoClassDefFoundError: org/springframework/transaction/interceptor/NoRollbackRuleAttribute and stopped trying ;-)
Code snippets
<int-http:inbound-gateway id="submitHttpGateway"
request-channel="submit.request.channel" reply-channel="submit.reply.channel" path="/submit" supported-methods="GET">
<int-http:header name="requestAttributes" expression="#requestAttributes" />
<int-http:header name="requestParametersMap" expression="#requestParams" />
</int-http:inbound-gateway>
<int:channel id="submit.request.channel" />
<int:publish-subscribe-channel id="submit.reply.channel">
<int:interceptors>
<int:ref bean="replyChannelInterceptor" />
</int:interceptors>
</int:publish-subscribe-channel>
Thanks in advance for your help!
The only "easy" way is to explicitly send the reply via the output-channel on the last endpoint.
In fact, all that happens when you send to a declared channel is the reply channel is simply bridged to the replyChannel header.
You could do it by saving off the replyChannel header in another header, set the replyChannel header to some other channel (which you can intercept); then restore the replyChannel header to the saved-off channel before the reply is returned to the gateway.
EDIT:
Sample config...
<int:channel id="in" />
<int:header-enricher input-channel="in" output-channel="next">
<int:header name="origReplyChannel" expression="headers['replyChannel']"/>
<int:reply-channel ref="myReplies" overwrite="true" />
</int:header-enricher>
<int:router input-channel="next" expression="payload.equals('foo')">
<int:mapping value="true" channel="channel1" />
<int:mapping value="false" channel="channel2" />
</int:router>
<int:transformer input-channel="channel1" expression="payload.toUpperCase()" />
<int:transformer input-channel="channel2" expression="payload + payload" />
<int:channel id="myReplies" />
<!-- restore the reply channel -->
<int:header-enricher input-channel="myReplies" output-channel="tapped">
<int:reply-channel expression="headers['origReplyChannel']" overwrite="true" />
</int:header-enricher>
<int:channel id="tapped">
<int:interceptors>
<int:wire-tap channel="loggingChannel" />
</int:interceptors>
</int:channel>
<int:logging-channel-adapter id="loggingChannel" log-full-message="true" logger-name="tapInbound"
level="INFO" />
<!-- route reply -->
<int:bridge id="bridgeToNowhere" input-channel="tapped" />
Test:
MessageChannel channel = context.getBean("in", MessageChannel.class);
MessagingTemplate template = new MessagingTemplate(channel);
String reply = template.convertSendAndReceive("foo", String.class);
System.out.println(reply);
reply = template.convertSendAndReceive("bar", String.class);
System.out.println(reply);
Result:
09:36:30.224 INFO [main][tapInbound] GenericMessage [payload=FOO, headers={replyChannel=org.springframework.messaging.core.GenericMessagingTemplate$TemporaryReplyChannel@fba92d3, errorChannel=org.springframework.messaging.core.GenericMessagingTemplate$TemporaryReplyChannel@fba92d3, id=326a610f-80c6-5b74-0158-e3644b732aab, origReplyChannel=org.springframework.messaging.core.GenericMessagingTemplate$TemporaryReplyChannel@fba92d3, timestamp=1442496990223}]
FOO
09:36:30.227 INFO [main][tapInbound] GenericMessage [payload=barbar, headers={replyChannel=org.springframework.messaging.core.GenericMessagingTemplate$TemporaryReplyChannel@662b4c69, errorChannel=org.springframework.messaging.core.GenericMessagingTemplate$TemporaryReplyChannel@662b4c69, id=d161917c-ca73-a5a9-d0f1-d7a4346a459e, origReplyChannel=org.springframework.messaging.core.GenericMessagingTemplate$TemporaryReplyChannel@662b4c69, timestamp=1442496990227}]
barbar
I have a spring-integration message channel which reads from a database using a JPA inbound-channel-adapter.
<int:channel id="logChannel">
<int:priority-queue capacity="20" />
</int:channel>
<int-jpa:inbound-channel-adapter
channel="logChannel" entity-class="com.objects.Transactionlog"
entity-manager-factory="entityManagerFactory" auto-startup="true"
jpa-query="SELECT x FROM Transactionlog AS x WHERE x.status LIKE '1'" max-results="1">
<int:poller fixed-rate="5000">
<int:transactional propagation="REQUIRED"
transaction-manager="transactionManager" />
</int:poller>
</int-jpa:inbound-channel-adapter>
This always reads only the first row of the table transactionlog. So I want to update the status of each database entry right after it is read. Does anybody know how to do that?
If max-results="1" is OK for you and receiving only one entity every 5 seconds is appropriate for your use case, let it be.
Now, how to update that entity so it is skipped on the next poll.
The <int-jpa:inbound-channel-adapter> has a delete-after-poll="true" option, which performs entityManager.remove(entity) after an entity is retrieved.
Right, that is a real removal from the DB. To convert it into an UPDATE, you can mark your entity with:
@SQLDelete(sql = "UPDATE Transactionlog SET status = false WHERE id = ?")
Or something similar that is appropriate for you.
Another feature is Transaction Synchronization: you give your <poller> a synchronization factory with a before-commit hook and do the UPDATE there. Something like:
<int-jpa:inbound-channel-adapter ...>
<int:poller fixed-rate="5000">
<int:transactional propagation="REQUIRED"
transaction-manager="transactionManager"
synchronization-factory="txSyncFactory" />
</int:poller>
</int-jpa:inbound-channel-adapter>
<int:transaction-synchronization-factory id="txSyncFactory">
<int:before-commit channel="updateEntityChannel" />
</int:transaction-synchronization-factory>
<int:chain input-channel="updateEntityChannel">
<int:enricher>
<int:property name="status" value="true"/>
</int:enricher>
<int-jpa:outbound-channel-adapter entity-manager="entityManager"/>
</int:chain>
Something like that.