How to send emails from a Java EE Batch Job - java

I have a requirement to process a list of large number of users daily to send them email and SMS notifications based on some scenario. I am using Java EE batch processing model for this. My Job xml is as follows:
<step id="sendNotification">
<chunk item-count="10" retry-limit="3">
<reader ref="myItemReader"></reader>
<processor ref="myItemProcessor"></processor>
<writer ref="myItemWriter"></writer>
<retryable-exception-classes>
<include class="java.lang.IllegalArgumentException"/>
</retryable-exception-classes>
</chunk>
</step>
MyItemReader's onOpen method reads all users from database, and readItem() reads one user at a time using list iterator. In myItemProcessor, the actual email notification is sent to user, and then the users are persisted in database in myItemWriter class for that chunk.
#Named
public class MyItemReader extends AbstractItemReader {
private Iterator<User> iterator = null;
private User lastUser;
#Inject
private MyService service;
#Override
public void open(Serializable checkpoint) throws Exception {
super.open(checkpoint);
List<User> users = service.getUsers();
iterator = users.iterator();
if(checkpoint != null) {
User checkpointUser = (User) checkpoint;
System.out.println("Checkpoint Found: " + checkpointUser.getUserId());
while(iterator.hasNext() && !iterator.next().getUserId().equals(checkpointUser.getUserId())) {
System.out.println("skipping already read users ... ");
}
}
}
#Override
public Object readItem() throws Exception {
User user=null;
if(iterator.hasNext()) {
user = iterator.next();
lastUser = user;
}
return user;
}
#Override
public Serializable checkpointInfo() throws Exception {
return lastUser;
}
}
My problem is that checkpoint stores the last record that was executed in the previous chunk. If I have a chunk with next 10 users, and exception is thrown in myItemProcessor of the 5th user, then on retry the whole chunck will be executed and all 10 users will be processed again. I don't want notification to be sent again to the already processed users.
Is there a way to handle this? How should this be done efficiently?
Any help would be highly appreciated.
Thanks.

I'm going to build on the comments from #cheng. My credit to him here, and hopefully my answer provides additional value in organizing and presenting the options usefully.
Answer: Queue up messages for another MDB to get dispatched to send emails
Background:
As #cheng pointed out, a failure means the entire transaction is rolled back, and the checkpoint doesn't advance.
So how to deal with the fact that your chunk has sent emails to some users but not all? (You might say it rolled back but with "side effects".)
So we could restate your question then as: How to send email from a batch chunk step?
Well, assuming you had a way to send emails through an transactional API (implementing XAResource, etc.) you could use that API.
Assuming you don't, I would do a transactional write to a JMS queue, and then send the emails with a separate MDB (as #cheng suggested in one of his comments).
Suggested Alternative: Use ItemWriter to send messages to JMS queue, then use separate MDB to actually send the emails
With this approach you still gain efficiency by batching the processing and the updates to your DB (you were only sending the emails one at a time anyway), and you can benefit from simple checkpointing and restart without having to write complicated error handling.
This is also likely to be reusable as a pattern across batch jobs and outside of batch even.
Other alternatives
Some other ideas that I don't think are as good, listed for the sake of discussion:
Add batch application logic tracking users emailed (with ItemProcessListener)
You could build your own list of either/both successful/failed emails using the ItemProcessListener methods: afterProcess and onProcessError.
On restart, then, you could know which users had been emailed in the current chunk, which we are re-positioned to since the entire chunk rolled back, even though some emails have already been sent.
This certainly complicates your batch logic, and you also have to persist this success or failure list somehow. Plus this approach is probably highly specific to this job (as opposed to queuing up for an MDB to process).
But it's simpler in that you have a single batch job without the need for a messaging provider and a separate app component.
If you go this route you might want to use a combination of both a skippable and a "no-rollback" retryable exception.
single-item chunk
If you define your chunk with item-count="1", then you avoid complicated checkpointing and error handling code. You sacrifice efficiency though, so this would only make sense if the other aspects of batch were very compelling: e.g. scheduling and management of jobs through a common interface, the ability to restart at the failing step within a job
If you were to go this route, you might want to consider defining socket and timeout exceptions as "no-rollback" exceptions (using ) since there's nothing to be gained from rolling back, and you might want to retry on a network timeout issue.
Since you specifically mentioned efficiency, I'm guessing this is a bad fit for you.
use a Transaction Synchronization
This could work perhaps, but the batch API doesn't especially make this easy, and you still could have a case where the chunk completes but one or more email sends fail.

Your current item processor is doing something outside the chunk transaction scope, which has caused the application state to be out of sync. If your requirement is to send out emails only after all items in a chunk have successfully completed, then you can move the emailing part to a ItemWriterListener.afterWrite(items).

Related

How to scale more than 1 instance and deal with scheduled task in spring?

I am having a push notifications being send to android and ios application through spring boot every day at 8am Europe/Paris.
If I run multiple instances, the notifications will send multiple times. I am thinking to send every day notifications send on the database, and check them but I am worried it still run multiple times, this is what I am doing:
#Component
public class ScheduledTasks {
private static final Logger log = LoggerFactory.getLogger(ScheduledTasks.class);
private static final SimpleDateFormat dateFormat = new SimpleDateFormat("HH:mm:ss");
#Autowired
private ExpoPushTokenRepository expoPushTokenRepository;
#Autowired
private ExpoPushNotificationService expoPushNotificationService;
#Autowired
private MessageSource messageSource;
// TODO: if instances > 1, this will run multiple times, save to database the notifications send and prevent multiple sending.
#Scheduled(cron = "${cron.promotions.notification}", zone = "Europe/Paris")
public void sendNewPromotionsNotification() {
List<ExpoPushToken> expoPushTokenList = expoPushTokenRepository.findAll();
ArrayList<NotifyRequest> notifyRequestList = new ArrayList<>();
for (ExpoPushToken expoPushToken : expoPushTokenList) {
NotifyRequest notifyRequest = new NotifyRequest(
expoPushToken.getToken(),
"This is a test title",
"This is a test subtitle",
"This is a test body"
);
notifyRequestList.add(notifyRequest);
}
expoPushNotificationService.sendPushNotificationToList(notifyRequestList);
log.info("{} Send push notification to " + expoPushTokenList.size() + " userse", dateFormat.format(new Date()));
}
}
Does anybody have an idea on how I can prevent that safely?
Quartz would be my mostly database-agnostic solution for the task at hand, but was ruled out, so we are not going to discuss it.
The solution we are going to explore instead makes the following assumptions:
Postgres >= 9.5 is used (because we are going to use SKIP LOCKED, which was introduced in Postgresl 9.5).
It is okay to run a native query.
Under this conditions, we can retrieve batches of notifications from multiple instances of the application running through the following query:
SELECT * FROM expo_push_token FOR UPDATE SKIP LOCKED LIMIT 100;
This will retrieve and lock up to 100 entries from the table expo_push_token. If two instances of the application execute this query simultaneously, the received results will be disjoint. 100 is just some sample value. We may want to fine-tune this value for our use case. The locks stay active until the current transaction ends.
After an instance has fetched a batch of notifications, it has to also delete the entries it locked from the table or otherwise mark that this entry has been processed (if we go down this route, we have to modify the query above to filter-out already processed entires) and close the current transaction to release the locks. Each instance of the application would then repeat this query until the query returns zero entries.
There is also an alternative approach: an instance first fetches a batch size of notifications to send, keeps the transaction to the database open (thus continues holding the lock on the database), sends out its notification and then deletes/updates the entries and closes the transactions.
The two solutions have different strengths/weaknesses:
the first solutions keeps the transaction short. But if the application crashes in the middle of sending out notificatiosn, the part of its batch that was not send out is lost in this run.
the second solution keeps the transaction open, for possibly a long time. If it crashes in the middle fo sending out notifications, all entries will be unlocked and its batch would be re-processed, possibly resulting in some notifications being sent out twice.
For this solution to work, we also need some kind of job that fills table expo_push_token with the data we need. This job should run beforehand, i.e. its execution should not overlap with the notification sending process.

How to limit the number of active Spring WebClient calls

I have a requirement where I read a bunch of rows (thousands) from a SQL DB using Spring Batch and call a REST Service to enrich content before writing them on a Kafka topic.
When using the Spring Reactive webClient, how do I limit the number of active non-blocking service calls? Should I somehow introduce a Flux in the loop after I read data using Spring Batch?
(I understand the usage of delayElements and that it serves a different purpose, as when a single Get Service Call brings in lot of data and you want the server to slow down -- here though, my use case is a bit different in that I have many WebClient calls to make and would like to limit the number of calls to avoid out of memory issues but still gain the advantages of non-blocking invocations).
Very interesting question. I pondered about it and I thought of a couple of ideas on how this could be done. I will share my thoughts on it and hopefully there are some ideas here that perhaps help you with your investigation.
Unfortunately, I'm not familiar with Spring Batch. However, this sounds like a problem of rate limiting, or the classical producer-consumer problem.
So, we have a producer that produces so many messages that our consumer cannot keep up, and the buffering in the middle becomes unbearable.
The problem I see is that your Spring Batch process, as you describe it, is not working as a stream or pipeline, but your reactive Web client is.
So, if we were able to read the data as a stream, then as records start getting into the pipeline those would get processed by the reactive web client and, using back-pressure, we could control the flow of the stream from producer/database side.
The Producer Side
So, the first thing I would change is how records get extracted from the database. We need to control how many records get read from the database at the time, either by paging our data retrieval or by controlling the fetch size and then, with back pressure, control how many of those are sent downstream through the reactive pipeline.
So, consider the following (rudimentary) database data retrieval, wrapped in a Flux.
Flux<String> getData(DataSource ds) {
return Flux.create(sink -> {
try {
Connection con = ds.getConnection();
con.setAutoCommit(false);
PreparedStatement stm = con.prepareStatement("SELECT order_number FROM orders WHERE order_date >= '2018-08-12'", ResultSet.TYPE_FORWARD_ONLY);
stm.setFetchSize(1000);
ResultSet rs = stm.executeQuery();
sink.onRequest(batchSize -> {
try {
for (int i = 0; i < batchSize; i++) {
if (!rs.next()) {
//no more data, close resources!
rs.close();
stm.close();
con.close();
sink.complete();
break;
}
sink.next(rs.getString(1));
}
} catch (SQLException e) {
//TODO: close resources here
sink.error(e);
}
});
}
catch (SQLException e) {
//TODO: close resources here
sink.error(e);
}
});
}
In the example above:
I control the amount of records we read per batch to be 1000 by setting a fetch size.
The sink will send the amount of records requested by the subscriber (i.e. batchSize) and then wait for it to request more using back pressure.
When there are no more records in the result set, then we complete the sink and close resources.
If an error occurs at any point, we send back the error and close resources.
Alternatively I could have used paging to read the data, probably simplifying the handling of resources by having to reissue a query at every request cycle.
You may consider also doing something if subscription is cancelled or disposed (sink.onCancel, sink.onDispose) since closing the connection and other resources is fundamental here.
The Consumer Side
At the consumer side you register a subscriber that only requests messages at a speed of 1000 at the time and it will only request more once it has processed that batch.
getData(source).subscribe(new BaseSubscriber<String>() {
private int messages = 0;
#Override
protected void hookOnSubscribe(Subscription subscription) {
subscription.request(1000);
}
#Override
protected void hookOnNext(String value) {
//make http request
System.out.println(value);
messages++;
if(messages % 1000 == 0) {
//when we're done with a batch
//then we're ready to request for more
upstream().request(1000);
}
}
});
In the example above, when subscription starts it requests the first batch of 1000 messages. In the onNext we process that first batch, making http requests using the Web client.
Once the batch is complete, then we request another batch of 1000 from the publisher, and so on and so on.
And there your have it! Using back pressure you control how many open HTTP requests you have at the time.
My example is very rudimentary and it will require some extra work to make it production ready, but I believe this hopefully offers some ideas that can be adapted to your Spring Batch scenario.

Hold thread in spring rest request for long-polling

As I wrote in title we need in project notify or execute method of some thread by another. This implementation is part of long polling. In following text describe and show my implementation.
So requirements are that:
UserX send request from client to server (poll action) immediately when he got response from previous. In service is executed spring async method where thread immediately check cache if there are some new data in database. I know that cache is usually used for methods where for specific input is expected specific output. This is not that case, because I use cache to reduce database calls and output of my method is always different. So cache help me store notification if I should check database or not. This checking is running in while loop which end when thread find notification to read database in cache or time expired.
Assume that UserX thread (poll action) is currently in while loop and checking cache.
In that moment UserY (push action) send some data to server, data are stored in database in separated thread, and also in cache is stored userId of recipient.
So when UserX is checking cache he found id of recipient (id of recipient == his id in this case), and then break loop and fetch these data.
So in my implementation I use google guava cache which provide manually write.
private static Cache<Long, Long> cache = CacheBuilder.newBuilder()
.maximumSize(100)
.expireAfterWrite(5, TimeUnit.MINUTES)
.build();
In create method I store id of user which should read these data.
public void create(Data data) {
dataRepository.save(data);
cache.save(data.getRecipient(), null);
System.out.println("SAVED " + userId + " in " + Thread.currentThread().getName());
}
and here is method of polling data:
#Async
public CompletableFuture<List<Data>> pollData(Long previousMessageId, Long userId) throws InterruptedException {
// check db at first, if there are new data no need go to loop and waiting
List<Data> data = findRecent(dataId, userId));
data not found so jump to loop for some time
if (data.size() == 0) {
short c = 0;
while (c < 100) {
// check if some new data added or not, if yes break loop
if (cache.getIfPresent(userId) != null) {
break;
}
c++;
Thread.sleep(1000);
System.out.println("SEQUENCE: " + c + " in " + Thread.currentThread().getName());
}
// check database on the end of loop or after break from loop
data = findRecent(dataId, userId);
}
// clear data for that recipient and return result
cache.clear(userId);
return CompletableFuture.completedFuture(data);
}
After User X got response he send poll request again and whole process is repeated.
Can you tell me if is this application design for long polling in java (spring) is correct or exists some better way? Key point is that when user call poll request, this request should be holded for new data for some time and not response immediately. This solution which I show above works, but question is if it will be works also for many users (1000+). I worry about it because of pausing threads which should make slower another requests when no threads will be available in pool. Thanks in advice for your effort.
Check Web Sockets. Spring supports it from version 4 on wards. It doesn't require client to initiate a polling, instead server pushes the data to client in real time.
Check the below:
https://spring.io/guides/gs/messaging-stomp-websocket/
http://www.baeldung.com/websockets-spring
Note - web sockets open a persistent connection between client and server and thus may result in more resource usage in case of large number of users. So, if you are not looking for real time updates and is fine with some delay then polling might be a better approach. Also, not all browsers support web sockets.
Web Sockets vs Interval Polling
Longpolling vs Websockets
In what situations would AJAX long/short polling be preferred over HTML5 WebSockets?
In your current approach, if you are having a concern with large number of threads running on server for multiple users then you can trigger the polling from front-end every time instead. This way only short lived request threads will be triggered from UI looking for any update in the cache. If there is an update, another call can be made to retrieve the data. However don't hit the server every other second as you are doing otherwise you will have high CPU utilization and user request threads may also suffer. You should do some optimization on your timing.
Instead of hitting the cache after a delay of 1 sec for 100 times, you can apply an intelligent algorithm by analyzing the pattern of cache/DB update over a period of time.
By knowing the pattern, you can trigger the polling in an exponential back off manner to hit the cache when the update is most likely expected. This way you will be hitting the cache less frequently and more accurately.

Avoiding multiple calls when consuming a web service

I have a task where a user consumes XML from a third party. The XML feed is only updated once a day. The XML is stored in a database and returned to the user when requested. If the XML is not in the database, then it is retrieved from the third party, stored in the database and returned to the user. All subsequent requests will simply read the XML from the database.
Now my question. Say it takes 10 seconds for the request to the third party to return. In this period, there are multiple server calls for the same data. I don't want each of these to fire off requests to the third party and I don't want the user to receive nothing or an error. They should probably wait for the first request to complete at which point the XML would be available. This is a relatively simple problem but I want to know what the best way of catering for it is.
Do I just use a simple flag to control requests or maybe something like a semaphore? Are there better solutions based on the stack I intend to use which is the Play framework and a cassandra backend. Is there something I could do with callbacks or triggers?
By the way, I need to lazy load the data when the first request comes in. So, in this task it isn't an option to get the data in a separate process or when the app starts...
Thanks
All you need to do is create a separate component that is responsible to get the XML from the third party and save it to the database.
In your code the various thread try to "fetch" the XML from this component.
This component returns the XML from the database if it exists. If it does not exist then you use a ReentrantLock to synchronize.
So you do a trylock and only one of your threads succeeds. The rest will be blocked. When the lock is released the other threads are unblocked but the XML has already been fetched from the third party and stored to the database by the thread that first managed to gain the lock. So the other threads just return the XML from the DB.
Example code (this is just a "pseudo code" to get you started. You should handle exceptions etc but the main skeleton can be used. Do NOT forget to unlock in a finally so that your code does not block indefinitelly):
public String getXML() {
String xml = getXMLFromDatabase();
if(xml == null) {
if(glocalLock.tryLock()) {
try{
xml = getXMLFromThirdParty();
storeXMLToDatabase(xml);
}
finally {
globalLock.unlock(); //ok! got XML and stored in DB. Wake-up others!
}
}
else {
try{ //Another thread got the lock and will do the query. Just wait on lock!
globalLock.lock();
}
finally {
//woken up but the xml is already fetched
xml = getXMLFromDatabase();
globalLock.unlock();
}
}
return xml;
}

How to synchronize concurring Web Service calls in Java

I'm currently developing some web services in Java (& JPA with MySQL connection) that are being triggered by an SAP System.
To simplify my problem I'm referring the two crucial entities as BlogEntry and Comment. A BlogEntry can have multiple Comments. A Comment always belongs to exactly one BlogEntry.
So I have three Services (which I can't and don't want to redefine, since they're defined by the WSDL I exported from SAP and used parallel to communicate with other Systems): CreateBlogEntry, CreateComment, CreateCommentForUpcomingBlogEntry
They are being properly triggered and there's absolutely no problem with CreateBlogEntry or CreateComment when they're called seperately.
But: The service CreateCommentForUpcomingBlogEntry sends the Comment and a "foreign key" to identify the "upcoming" BlogEntry. Internally it also calls CreateBlogEntry to create the actual BlogEntry. These two services are - due to their asynchronous nature - concurring.
So I have two options:
create a dummy BlogEntry and connect the Comment to it & update the BlogEntry, once CreateBlogEntry "arrives"
wait for CreateBlogEntry and connect the Comment afterwards to the new BlogEntry
Currently I'm trying the former but once both services are fully executed, I end up with two BlogEntries. One of them only has the ID delivered by CreateCommentForUpcomingBlogEntry but it is properly connected to the Comment (more the other way round). The other BlogEntry has all the other information (such as postDate or body), but the Comment isn't connected to it.
Here's the code snippet of the service implementation CreateCommentForUpcomingBlogEntry:
#EJB
private BlogEntryFacade blogEntryFacade;
#EJB
private CommentFacade commentFacade;
...
List<BlogEntry> blogEntries = blogEntryFacade.findById(request.getComment().getBlogEntryId().getValue());
BlogEntry persistBlogEntry;
if (blogEntries.isEmpty()) {
persistBlogEntry = new BlogEntry();
persistBlogEntry.setId(request.getComment().getBlogEntryId().getValue());
blogEntryFacade.create(persistBlogEntry);
} else {
persistBlogEntry = blogEntries.get(0);
}
Comment persistComment = new Comment();
persistComment.setId(request.getComment().getID().getValue());
persistComment.setBody(request.getComment().getBody().getValue());
/*
set other properties
*/
persistComment.setBlogEntry(persistBlogEntry);
commentFacade.create(persistComment);
...
And here's the code snippet of the implementation CreateBlogEntry:
#EJB
private BlogEntryFacade blogEntryFacade;
...
List<BlogEntry> blogEntries = blogEntryFacade.findById(request.getBlogEntry().getId().getValue());
BlogEntry persistBlogEntry;
Boolean update = false;
if (blogEntries.isEmpty()) {
persistBlogEntry = new BlogEntry();
} else {
persistBlogEntry = blogEntries.get(0);
update = true;
}
persistBlogEntry.setId(request.getBlogEntry().getId().getValue());
persistBlogEntry.setBody(request.getBlogEntry().getBody().getValue());
/*
set other properties
*/
if (update) {
blogEntryFacade.edit(persistBlogEntry);
} else {
blogEntryFacade.create(persistBlogEntry);
}
...
This is some fiddling that fails to make things happen as supposed.
Sadly I haven't found a method to synchronize these simultaneous service calls. I could let the CreateCommentForUpcomingBlogEntry sleep for a few seconds but I don't think that's the proper way to do it.
Can I force each instance of my facades and their respective EntityManagers to reload their datasets? Can I put my requests in some sort of queue that is being emptied based on certain conditions?
So: What's the best pracice to make it wait for the BlogEntry to exist?
Thanks in advance,
David
Info:
GlassFish Server 3.1.2
EclipseLink, version: Eclipse Persistence Services - 2.3.2.v20111125-r10461
If you are sure you are getting a CreateBlogEntry call, queue the CreateCommentForUpcomingBlogEntry calls and dequeue and process them once you receive the CreateBlogEntry call.
Since you are on an application server, for queues, you can probably use JMS queues that autoflush to storage or use the DB cache engine (Ehcache ?), in case you receive a lot of calls or want to provide a recovery mechanism across restarts.

Categories