Launching Spring batch job - java

I have a problem where I need to receive a series of messages from an MQ queue, write them to a file, and initiate a Spring Batch job with that file as input. Right now I'm thinking of launching the job from the MDB itself with a wired @Autowired JobLauncher jobLauncher and @Autowired Job job. But I feel this is not a good approach, since Spring Batch may create a series of threads and EJBs as such don't support multithreading.
Is there any other effective way to do this? I don't want to use the Quartz scheduler or anything else, since it adds complexity. Is there any interface in Spring Batch itself which launches a job as soon as a file arrives in a directory? Any leads on doing this better would be appreciated.
Thanks.

I have a problem where I need to receive a series of messages from an MQ queue, write them to a file, and initiate a Spring Batch job with the file as input
One way to do that would be to engage a bit of Spring Integration, where you would have a file poller that polls for new files:
<file:inbound-channel-adapter id="filePoller"
channel="filesAreComing"
directory="file:${input.directory}"
filename-pattern="test*" />
Adapt the file message (java.io.File) to a file name (String), since that is what Spring Batch needs as a job parameter. This can be done with an adapter that is already available from Spring Batch Admin:
@ServiceActivator
public JobLaunchRequest adapt(File file) throws NoSuchJobException {
    // "job" is the Spring Batch Job to launch, wired into the enclosing adapter bean
    JobParameters jobParameters = new JobParametersBuilder()
            .addString("input.file", file.getAbsolutePath())
            .toJobParameters();
    return new JobLaunchRequest(job, jobParameters);
}
This wraps it in a JobLaunchRequest (which is just a holder for a Job and JobParameters); send the request, as a message, to a JobLaunchingMessageHandler:
<service-activator input-channel="jobLauncher">
<beans:bean class="org.springframework.batch.integration.launch.JobLaunchingMessageHandler">
<beans:constructor-arg ref="jobLauncher" />
</beans:bean>
</service-activator>
that would launch the job.
"input.file" is a parameter that is bound at runtime ( hence #{...} ):
<bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">
<property name="resource" value="#{jobParameters[input.file]}" />
... line mapper and other props
</bean>

I'm not sure I understand why you have a message queue, a message-driven POJO/EJB, AND a batch job.
One way to do it is to have the message-driven POJO/EJB do the work. It's already an asynchronous process. You can pool the message-driven beans so there are sufficient workers to handle the load. Why add complexity?
If you'd rather not do that, forget the queue and use Spring Batch on its own. I wouldn't do both.
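For illustration, here is a minimal sketch of a message-driven bean that simply does the processing itself, assuming a JMS queue and a TextMessage payload; the queue name, class name and processing step are illustrative, not taken from the original post:

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Hypothetical MDB that processes each message directly instead of handing off to a batch job.
// The container pools instances of this bean, so concurrency is handled for you.
@MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "destination", propertyValue = "jms/inputQueue") })
public class InputMessageProcessorBean implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            String payload = ((TextMessage) message).getText();
            // parse, validate and persist the record here
        } catch (JMSException e) {
            throw new RuntimeException("Failed to read JMS message", e);
        }
    }
}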

Related

SPRING BATCH : How to configure remote chunking for multiple jobs running in a task executor

I am new to Spring Batch processing. I am using remote chunking, where there is a master, multiple slaves, and ActiveMQ for messaging.
The master has a job and a job launcher, and the job launcher has a task executor with the following configuration:
<task:executor id="batchJobExecutor" pool-size="2" queue-capacity="100" />
The chunk configuration is:
<bean id="chunkWriter"
class="org.springframework.batch.integration.chunk.ChunkMessageChannelItemWriter" scope="step">
<property name="messagingOperations" ref="messagingGateway" />
<property name="replyChannel" ref="replies" />
<property name="throttleLimit" value="50" />
<property name="maxWaitTimeouts" value="60000" />
</bean>
<bean id="chunkHandler"
class="org.springframework.batch.integration.chunk.RemoteChunkHandlerFactoryBean">
<property name="chunkWriter" ref="chunkWriter" />
<property name="step" ref="someJobId" />
</bean>
<integration:service-activator
input-channel="requests" output-channel="replies" ref="chunkHandler" />
So we are allowed to run two jobs at a time and the remaining jobs wait in the queue.
When two jobs are submitted, the master creates the chunks and submits them to the queue, and the slaves process them.
But the acknowledgment from the slave to the master gives this error:
java.lang.IllegalStateException: Message contained wrong job instance id [9331] should have been [9332].
at org.springframework.util.Assert.state(Assert.java:385) ~[Assert.class:4.1.6.RELEASE]
at org.springframework.batch.integration.chunk.ChunkMessageChannelItemWriter.getNextResult
Please help me with this.
The ChunkMessageChannelItemWriter is only designed for one concurrent step - you need to put it in step scope so each job gets its own instance - see this test case
EDIT
Actually, no; that won't work - since the bean instances are using the same reply channel, they could get each other's replies. I opened a JIRA Issue.
This is a very old post, but I think the issue you see here might be related to the throttle limit being larger than the maxWaitTimeouts value.
What we have seen is that the implementation will not read more than maxWaitTimeouts entries from the reply queue after the job has finished. I think this is a bug.
See also the question I asked on Stack Overflow: Remote batch job does not read all responses in afterStep method
I made a bug report for this as well: https://jira.spring.io/browse/BATCH-2651 and am creating a PR to fix the issue.

Spring Integration: Poller acting weird

I have a configuration to read data from a DB using a jdbc:inbound-channel-adapter. The configuration:
<int-jdbc:inbound-channel-adapter query="SELECT * FROM requests WHERE processed_status = '' OR processed_status IS NULL LIMIT 5" channel="requestsJdbcChannel"
data-source="dataSource" update="UPDATE requests SET processed_status = 'INPROGRESS', date_processed = NOW() WHERE id IN (:id)" >
<int:poller fixed-rate="30000" />
</int-jdbc:inbound-channel-adapter>
<int:splitter input-channel="requestsJdbcChannel" output-channel="requestsQueueChannel"/>
<int:channel id="requestsQueueChannel">
<int:queue capacity="1000"/>
</int:channel>
<int:chain id="requestsChain" input-channel="requestsQueueChannel" output-channel="requestsApiChannel">
<int:poller max-messages-per-poll="1" fixed-rate="1000" />
.
.
</int:chain>
In the above configuration, I have defined the JDBC poller with a fixed rate of 30 seconds. When there is a direct channel instead of requestsQueueChannel, the select query gets only 5 rows (since I am limiting the rows in the select query) and then waits another 30 seconds for the next poll.
But after I introduce requestsQueueChannel with a queue and add a poller inside requestsChain, the JDBC inbound adapter doesn't work as expected. It doesn't wait another 30 seconds for the next poll. Sometimes it polls the DB twice in a row (within a second), as if there were two threads running, and gets two sets of rows from the DB. However, there is no async handoff except those mentioned above.
My understanding is that even with requestsQueueChannel, once it executes the select query it should wait another 30 seconds before polling the DB again. Is there anything I am missing? I just want to understand the behavior of this configuration.
When using a DirectChannel the next poll isn't considered until the current one ends.
When using a QueueChannel (or task executor), the poller is free to run again.
Inbound adapters have max-messages-per-poll set to 1 by default so your config should work as expected. Can you post a DEBUG log somewhere?
The issue of Spring Integration pollers activating twice, as though there are two threads, is basically the same problem I came across with file system pollers:
How to prevent duplicate Spring Integration service activations when polling directory
Apparently this is a relatively common misconfiguration, where Spring root and servlet contexts both load the Spring Integration configuration. As a result of this, there are indeed two threads, and pollers can be seen to activate twice within their polling period. Usually within a few seconds of each other, as each will start when its context loads.
My approach to ensuring that the Spring Integration configuration was only loaded in a single context was to structure the project packages to ensure separation.
First define a web config which only picks up classes under the "web" package.
@Configuration
@ComponentScan(basePackages = { "com.myapp.web" })
@EnableWebMvc
public class WebConfig extends WebMvcConfigurerAdapter {

    @Override
    public void configureDefaultServletHandling(DefaultServletHandlerConfigurer configurer) {
        configurer.enable();
    }
}
Create separate root configuration classes to load beans such as services and repositories, which do not belong in the servlet context. One of these should load the Spring Integration configuration. i.e.:
@Configuration
@ComponentScan(basePackages = { "com.myapp.eip" })
@ImportResource(value = { "classpath:META-INF/spring/integration-context.xml" })
public class EipConfig {
}
An additional factor in the configuration that took a little while to work out was that my servlet filters and web security config needed to be in the root context rather than the servlet context.
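As a rough sketch of how the two contexts can be wired up so that only the root context loads the integration configuration, using the standard AbstractAnnotationConfigDispatcherServletInitializer (the class names match the configs above; the servlet mapping is illustrative):

import org.springframework.web.servlet.support.AbstractAnnotationConfigDispatcherServletInitializer;

// Registers EipConfig in the root context only, and WebConfig in the DispatcherServlet's
// context, so the Spring Integration beans and their pollers are created exactly once.
public class AppInitializer extends AbstractAnnotationConfigDispatcherServletInitializer {

    @Override
    protected Class<?>[] getRootConfigClasses() {
        return new Class<?>[] { EipConfig.class };
    }

    @Override
    protected Class<?>[] getServletConfigClasses() {
        return new Class<?>[] { WebConfig.class };
    }

    @Override
    protected String[] getServletMappings() {
        return new String[] { "/" };
    }
}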

Is there a way to start the file:inbound-channel-adapter through code?

I have a situation where a particular file is to be copied from one location to another. Polling is not required, as the action will be deliberately triggered. Also, the directory from which the file is to be picked up is decided at run time.
I can have a configuration as follows:
<int-file:inbound-channel-adapter id="filesIn" directory="#outPathBean.getPath()" channel="abc" filter="compositeFilter" >
<int:poller id="poller" fixed-delay="5000" />
</int-file:inbound-channel-adapter>
<int:channel id="abc"/>
<int-file:outbound-channel-adapter channel="abc" id="filesOut"
directory-expression="file:${paths.root}"
delete-source-files="true" filename-generator="fileNameGenerator" />
The fileNameGenerator and compositeFilter classes are configured as well.
I am new to Spring. Please point me in the right direction!
You can use a FireOnceTrigger as discussed in this answer and start/stop the adapter as needed.
To get a reference to the adapter (a SourcePollingChannelAdapter), inject (or @Autowire etc.) it as a Lifecycle bean (start()/stop() etc.).
Or you can do the whole thing programmatically using a FileReadingMessageSource, as discussed in this answer.
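FireOnceTrigger is not a framework class; a minimal sketch of what such a trigger could look like, implementing Spring's Trigger so it schedules exactly one immediate execution, might be:

import java.util.Date;
import org.springframework.scheduling.Trigger;
import org.springframework.scheduling.TriggerContext;

// Trigger that fires once, immediately, and never again.
public class FireOnceTrigger implements Trigger {

    private volatile boolean fired;

    @Override
    public Date nextExecutionTime(TriggerContext triggerContext) {
        if (fired) {
            return null; // no further executions
        }
        fired = true;
        return new Date(); // fire now
    }
}

You could then set this trigger on the SourcePollingChannelAdapter, as in the start/stop sample below, instead of a CronTrigger.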
A sample for starting and stopping the adapter, in case it's useful:
SourcePollingChannelAdapter sourcePollingChannelAdapter =
        (SourcePollingChannelAdapter) context.getBean("filesIn"); // adapter id in the bean configuration

// Stop
if (sourcePollingChannelAdapter.isRunning()) {
    sourcePollingChannelAdapter.stop();
}

// Set a cron expression if required when starting, or use any other trigger
CronTrigger cronTrigger = new CronTrigger("* * * * * ?");
sourcePollingChannelAdapter.setTrigger(cronTrigger);

// Start
if (!sourcePollingChannelAdapter.isRunning()) {
    sourcePollingChannelAdapter.start();
}

JMS Listener fires before Hibernate is setup on Server startup

I have a Grails 2.2 application that uses the JMS plugin (version 1.3).
The situation I have is that when my server starts up, the JMS plugin initialises and the listener service grabs any waiting messages on the queue before the server has finished setting up.
Specifically, it hits the first Hibernate query in the code and fails with the following error:
| Error 2014-10-14 11:06:56,535 [ruleInputDataListenerJmsListenerContainer-1] ERROR drms.RuleInputDataListenerService - Message Exception: Failed to process JMS Message.
groovy.lang.MissingMethodException: No signature of method: au.edu.csu.drms.Field.executeQuery() is applicable for argument types: () values: []
Possible solutions: executeQuery(java.lang.String), executeQuery(java.lang.String, java.util.Collection), executeQuery(java.lang.String, java.util.Map), executeQuery(java.lang.String, java.util.Collection, java.util.Map), executeQuery(java.lang.String, java.util.Map, java.util.Map)
The code in question is correct:
String query = "SELECT f FROM field f WHERE (attributeName = :attributeName AND entityName = :entityName)"
def fieldList = Field.executeQuery(query, [attributeName: _attributeName, entityName: _entityName])
From what I can tell, it's a matter of Hibernate not being initialised when the JMS listener executes the onMessage method. It also happens with withCriteria or any other Hibernate query method.
It only happens when there are messages on the queue at server start-up, and it fails for each waiting message. Once the queue has been drained and new messages are processed, it works fine.
Is there a way to either get Hibernate to initialise in time, or to delay the listener service from executing (much like the Quartz plugin, which has a start-up delay timer)?
Update:
I don't use a bean configuration because it's a daemon-type application - we have no beans to define.
Is there a way to use @DependsOn and have my listener depend on Hibernate itself?
Let's say you have the following EntityManagerFactory configuration:
<bean id="entityManagerFactory" class="org.springframework.orm.jpa.LocalContainerEntityManagerFactoryBean">
<property name="jpaDialect" ref="jpaDialect"/>
</bean>
You need to make your JMS connection factory depend on entityManagerFactory:
<bean id="jmsConnectionFactory" class="org.apache.activemq.pool.PooledConnectionFactory"
destroy-method="stop" depends-on="jmsBroker, entityManagerFactory">
<property name="connectionFactory" ref="activeMQConnectionFactory"/>
</bean>
Unfortunately, the @DependsOn annotation didn't work due to the nature of my application (no bean configuration).
Given that there are a few bugs/problems with the Grails JMS plugin, the solution to my problem was to use the following code prior to processing the JMS Message:
def onMessage(msg) {
    try {
        // Cheap no-op transaction: this throws if Hibernate/GORM is not ready yet
        Rule.withNewTransaction {
            log.info("Hibernate is up and running!")
        }
    } catch (Exception e) {
        // Hibernate not initialised yet - put the message back on the queue
        resendMessage(msg)
    }
    // rest of code...
}
Here I use a transaction to test whether Hibernate is fully initialised (I had already confirmed that it isn't when the JMS listener fires at start-up); if an exception is caught, the message is resent to the queue for re-processing.

Issue with Quartz persistent jobs while using with Spring

I have previously configured a Spring method-invoking job, which is working fine. Now my requirement is to make this job persistent so that it runs in a clustered environment.
After configuring Quartz as clustered and persistent, the application throws the following exception at deployment:
java.io.NotSerializableException: Unable to serialize JobDataMap for
insertion into database because the value of property 'methodInvoker'
is not serializable:
org.springframework.scheduling.quartz.MethodInvokingJobDetailFactoryBean
I am using the following versions:
Spring version 3.1.4.RELEASE
Quartz version 2.1.7
Update: As per the documentation of MethodInvokingJobDetailFactoryBean:
JobDetails created via this FactoryBean are not serializable.
So I am looking for an alternative approach to configure a persistent job in Spring.
I have solved the problem by replacing MethodInvokingJobDetailFactoryBean with JobDetailFactoryBean. The configuration is as follows:
<bean name="myJob" class="org.springframework.scheduling.quartz.JobDetailFactoryBean">
<property name="jobClass" value="mypackage.MyJob" />
<property name="group" value="MY_JOBS_GROUP" />
<property name="durability" value="true" />
</bean>
However, to autowire the Spring-managed beans into my job class mypackage.MyJob, I have added the following as the first line of my execute method:
class MyJob implements Job {
    ...
    public void execute(final JobExecutionContext context) throws JobExecutionException {
        // Process @Autowired injection for the given target object, based on the current web application context.
        SpringBeanAutowiringSupport.processInjectionBasedOnCurrentContext(this);
        ...
    }
}
Hope it will help someone else facing the same issue.
When you are using persistent Quartz jobs, you should set the org.quartz.jobStore.useProperties property to true. That forces the job data to be saved as Strings instead of serialized Java objects.
Doing so may, however, cause some problems with Spring, which are easily solvable.
Check these links for more details:
http://site.trimplement.com/using-spring-and-quartz-with-jobstore-properties/
http://forum.spring.io/forum/spring-projects/container/121806-quartz-error-ioexception
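For reference, a minimal sketch of setting that property through Spring's SchedulerFactoryBean in Java config; the configuration class, bean method and data source wiring are assumed, not taken from the posts above:

import java.util.Properties;
import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.quartz.SchedulerFactoryBean;

@Configuration
public class QuartzConfig {

    @Bean
    public SchedulerFactoryBean schedulerFactoryBean(DataSource dataSource) {
        SchedulerFactoryBean factory = new SchedulerFactoryBean();
        factory.setDataSource(dataSource); // JDBC job store for persistent jobs
        Properties quartzProperties = new Properties();
        // Store JobDataMap entries as Strings so no Java serialization is involved
        quartzProperties.setProperty("org.quartz.jobStore.useProperties", "true");
        factory.setQuartzProperties(quartzProperties);
        return factory;
    }
}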
The other way to solve this problem is to avoid using the 'jobDataMap' property of the 'JobDetailFactoryBean' bean. Instead, add the dependency (the bean containing the method to run) to the Scheduler with the 'schedulerContextAsMap' property.
<bean class="org.springframework.scheduling.quartz.SchedulerFactoryBean">
... (other properties)...
<property name="schedulerContextAsMap">
<map>
<entry key="executeProcessBean" value-ref="executeProcessBean" />
</map>
</property>
</bean>
The documentation of schedulerContextAsMap in SchedulerFactoryBean describes this usage when you have Spring-managed beans:
/**
 * Register objects in the Scheduler context via a given Map.
 * These objects will be available to any Job that runs in this Scheduler.
 * <p>Note: When using persistent Jobs whose JobDetail will be kept in the
 * database, do not put Spring-managed beans or an ApplicationContext
 * reference into the JobDataMap but rather into the SchedulerContext.
 * @param schedulerContextAsMap Map with String keys and any objects as
 * values (for example Spring-managed beans)
 * @see JobDetailFactoryBean#setJobDataAsMap
 */
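To illustrate how a job can then pick that bean up at runtime, here is a sketch; the bean type ExecuteProcessBean and its run() method are assumed, and the key matches the map entry above:

import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.SchedulerException;

public class MyJob implements Job {

    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        try {
            // Fetch the Spring-managed bean from the SchedulerContext rather than the JobDataMap
            ExecuteProcessBean executeProcessBean = (ExecuteProcessBean)
                    context.getScheduler().getContext().get("executeProcessBean");
            executeProcessBean.run(); // hypothetical method on the Spring bean
        } catch (SchedulerException e) {
            throw new JobExecutionException(e);
        }
    }
}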
just add implements Serializable
