Spring and background thread execution - java

I have a Spring Boot 1.3.5 web application (running on Tomcat 8). One of its features is to contact a third-party API through REST and launch many lengthy jobs (from 1 to around 30 depending on the user input, each one with its own REST call in a for loop). I have all this logic in a controller called via a POST with some parameters.
What I need is to launch a background task after each job has been acknowledged by the API. The task would be passed a parameter (the job ID) and would periodically (~30 s) poll another API to fetch the job output (again, these jobs may take from several seconds up to an hour, and fetching a job's output takes about 3-4 seconds plus parsing a long string), then run some business logic based on the job status (updating a DB record for now).
However, I'm not sure which TaskExecutor to use, if any, or whether I should use Java's Future constructs for this. I might benefit from a thread pool that runs only X threads in parallel and queues the rest, so the server isn't overloaded. Is there an example I can use to learn from and start off?
Sample of my existing code:
@RequestMapping(value = {"/job/launch"}, method = {RequestMethod.POST})
public ResponseEntity<String> runJob(HttpServletRequest req) {
    for (int deployments = 1; deployments <= deployments_required; deployments++) {
        httpPost.setEntity(new StringEntity(jsonInput));
        CloseableHttpResponse response = httpclient.execute(httpPost);
        HttpEntity entity = response.getEntity();
        responseString = EntityUtils.toString(entity, "UTF-8");
        JsonObject jsonObject = new JsonParser().parse(responseString).getAsJsonObject();
        if (response.getStatusLine().getStatusCode() != 200) {
            resultsNotOk.add(new ResponseEntity<String>(jsonObject.get("message").getAsString(),
                    HttpStatus.INTERNAL_SERVER_ERROR));
            continue;
        }
        String deploymentId = jsonObject.get("id").getAsString();
        // Start background task to keep checking the job every few seconds
        // and find the created instances' IP addresses
        start_checking_execution(deploymentId);
    }
    // ... build and return the aggregated ResponseEntity (omitted) ...
}
(Yes, this code may be better placed in a service, but it was originally built this way and I haven't moved it yet. It may be a good time to do that now.)

I would say this is a job for Spring Batch.
You can define a Reader, a Processor (to convert the objects read from the source into the objects to write to the target), and a Writer to implement the logic.
You can use JobOperator to get the job state. See job status transitions.
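If you would rather stay with the TaskExecutor approach asked about in the question, here is a minimal sketch using a bounded ThreadPoolTaskExecutor and an @Async polling method. The pool sizes, the "DONE"/"FAILED" states, and the fetchJobStatus/updateDbRecord helpers are assumptions, not part of the original code:
@Configuration
@EnableAsync
public class AsyncConfig {

    // Bounded pool: at most 5 polls run in parallel, the rest wait in the queue
    @Bean(name = "jobPollExecutor")
    public TaskExecutor jobPollExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(5);
        executor.setQueueCapacity(100);
        executor.setThreadNamePrefix("job-poll-");
        return executor;
    }
}

@Service
public class JobPollingService {

    @Async("jobPollExecutor")
    public void startCheckingExecution(String deploymentId) {
        // Poll the status API roughly every 30 s until the job reaches a terminal state
        while (true) {
            String state = fetchJobStatus(deploymentId);  // hypothetical REST call to the job-status API
            updateDbRecord(deploymentId, state);          // hypothetical DB update with the current status
            if ("DONE".equals(state) || "FAILED".equals(state)) {
                return;
            }
            try {
                Thread.sleep(30_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
The controller would then call jobPollingService.startCheckingExecution(deploymentId) instead of start_checking_execution(deploymentId); submissions beyond the pool size simply wait in the queue rather than overloading the server.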

Related

How to use updateMetadata(request) in Cognos Analytics in parallel threads?

I am trying to update existing code that sends mdprovider requests to the metadata service, to update or publish the metadata in an unpublished model, using parallel threads. My model has 1,000 query subjects, and initially we validated them sequentially; that takes almost 4 hours to complete. Now what I am trying to do is run it in 3 parallel threads, and my aim is to bring the time down.
I have used ExecutorService, created a fixed thread pool of 3, and submitted the tasks.
ExecutorService exec = Executors.newFixedThreadPool(thread);
exec.submit(task);
Inside the run method I connect to Cognos, log on, and call updateMetadata():
MetadataService_PortType mdService;

public void run() {
    cognosConnect();
    if (namespace.length() > 0) {
        login(namespace, userName, password);
    }
    // xml = the transaction XML is built here
    // Calls the method
    boolean testdblResult = validateQS(xml);
}

Boolean validateQS(String actionXml) {
    // actionXml: transaction XML to test a query subject
    // Cognos SDK method
    result = mdService.updateMetadata(actionXml);
    return result;
}
This executes successfully. But the problem is that although the 3 threads send their requests to the Cognos SDK method mdService.updateMetadata() in parallel, the responses come back from the method sequentially. For example, let's say at the 10th second it sends requests for 3 query subject validations in parallel; the responses for those 3 query subjects come back at the 15th, 20th, and 24th second, one after another.
Is this the expected behaviour of Cognos? Does mdService.updateMetadata(xmlActionXml); execute the requests sequentially internally? Or is there any other way to achieve parallelism here? I couldn't find much information in the SDK documentation.
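For reference, a minimal sketch of the fixed-pool submission described above that also collects the results, assuming a hypothetical ValidationTask Callable wrapping the connect/logon/validateQS steps and an actionXmls list of transaction XMLs (neither is Cognos SDK API):
ExecutorService exec = Executors.newFixedThreadPool(3);
List<Callable<Boolean>> tasks = new ArrayList<>();
for (String actionXml : actionXmls) {
    // hypothetical Callable that connects, logs on and calls validateQS(actionXml)
    tasks.add(new ValidationTask(actionXml));
}
try {
    // invokeAll blocks until every validation has completed (successfully or not)
    List<Future<Boolean>> results = exec.invokeAll(tasks);
    for (Future<Boolean> result : results) {
        System.out.println("Validation result: " + result.get());
    }
} catch (InterruptedException | ExecutionException e) {
    e.printStackTrace();
} finally {
    exec.shutdown();
}
If the responses still come back one after another with a setup like this, the serialisation is most likely happening on the Cognos side rather than in your thread pool.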

Hold thread in spring rest request for long-polling

As the title says, in our project we need one thread to notify or trigger a method of another thread. This is part of a long-polling implementation. The following text describes and shows my implementation.
So the requirements are:
UserX sends a request from the client to the server (poll action) immediately after he gets the response to the previous one. The service runs a Spring async method in which a thread immediately checks the cache to see whether there are new data in the database. I know a cache is usually used for methods where a specific input is expected to produce a specific output. That is not the case here, because I use the cache to reduce database calls and the output of my method is always different; the cache just stores a notification of whether I should check the database. This check runs in a while loop that ends when the thread either finds the "read the database" notification in the cache or the time expires.
Assume that the UserX thread (poll action) is currently in the while loop, checking the cache.
At that moment UserY (push action) sends some data to the server; the data are stored in the database in a separate thread, and the userId of the recipient is also stored in the cache.
So when UserX checks the cache he finds the recipient's id (which equals his own id in this case), breaks the loop, and fetches the data.
In my implementation I use the Google Guava cache, which allows manual writes.
private static Cache<Long, Long> cache = CacheBuilder.newBuilder()
.maximumSize(100)
.expireAfterWrite(5, TimeUnit.MINUTES)
.build();
In the create method I store the id of the user who should read the data.
public void create(Data data) {
    dataRepository.save(data);
    // Guava's Cache has no save() method and does not accept null values,
    // so put() a non-null marker (here the data id) under the recipient's id
    cache.put(data.getRecipient(), data.getId());
    System.out.println("SAVED " + data.getRecipient() + " in " + Thread.currentThread().getName());
}
And here is the data-polling method:
@Async
public CompletableFuture<List<Data>> pollData(Long previousMessageId, Long userId) throws InterruptedException {
    // check the DB first; if there are new data there is no need to enter the loop and wait
    List<Data> data = findRecent(previousMessageId, userId);
    // data not found, so enter the loop and wait for some time
    if (data.size() == 0) {
        short c = 0;
        while (c < 100) {
            // check if some new data were added; if yes, break the loop
            if (cache.getIfPresent(userId) != null) {
                break;
            }
            c++;
            Thread.sleep(1000);
            System.out.println("SEQUENCE: " + c + " in " + Thread.currentThread().getName());
        }
        // check the database at the end of the loop or after breaking out of it
        data = findRecent(previousMessageId, userId);
    }
    // clear the marker for that recipient and return the result
    cache.invalidate(userId);
    return CompletableFuture.completedFuture(data);
}
After UserX gets the response he sends a poll request again and the whole process repeats.
Can you tell me whether this design for long polling in Java (Spring) is correct, or whether there is a better way? The key point is that when a user sends a poll request, the request should be held for some time waiting for new data rather than answered immediately. The solution shown above works, but the question is whether it will also work for many users (1000+). I worry about it because pausing threads will slow down other requests once no threads are left in the pool. Thanks in advance for your effort.
Check out WebSockets. Spring supports them from version 4 onwards. They don't require the client to initiate polling; instead the server pushes data to the client in real time.
Check the below:
https://spring.io/guides/gs/messaging-stomp-websocket/
http://www.baeldung.com/websockets-spring
Note - WebSockets open a persistent connection between the client and the server and thus may result in more resource usage when there is a large number of users. So, if you are not looking for real-time updates and are fine with some delay, then polling might be the better approach. Also, not all browsers support WebSockets.
Web Sockets vs Interval Polling
Longpolling vs Websockets
In what situations would AJAX long/short polling be preferred over HTML5 WebSockets?
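If you go the WebSocket route, here is a minimal sketch of a Spring 4 STOMP configuration (the endpoint and destination prefixes are made-up names, not taken from the question):
@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig extends AbstractWebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry config) {
        // Destinations under /topic are broadcast to subscribed clients by the simple in-memory broker
        config.enableSimpleBroker("/topic");
        config.setApplicationDestinationPrefixes("/app");
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        // Clients connect here; SockJS gives a fallback for browsers without WebSocket support
        registry.addEndpoint("/notifications").withSockJS();
    }
}
The create(Data data) method could then push to the recipient with something like simpMessagingTemplate.convertAndSend("/topic/data/" + data.getRecipient(), data) instead of writing a marker into the cache.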
In your current approach, if you are concerned about the large number of threads running on the server for multiple users, you can instead trigger the polling from the front end each time. That way only short-lived request threads hit the server to look for an update in the cache, and if there is one, another call can be made to retrieve the data. However, don't hit the server every second as you are doing now, otherwise you will get high CPU utilization and user request threads may also suffer; you should optimize your timing.
Instead of hitting the cache after a delay of 1 second, 100 times, you can apply a more intelligent algorithm by analyzing the pattern of cache/DB updates over a period of time.
Knowing the pattern, you can trigger the polling with exponential back-off, hitting the cache when an update is most likely. This way you will hit the cache less frequently and more accurately.
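A minimal sketch of what that back-off could look like inside the polling loop above (the initial delay, cap, and overall budget are assumptions):
long delayMs = 250;                                          // assumed initial delay
final long maxDelayMs = 8_000;                               // assumed cap
final long deadline = System.currentTimeMillis() + 100_000;  // overall wait budget, as in the original loop

while (System.currentTimeMillis() < deadline) {
    if (cache.getIfPresent(userId) != null) {
        break; // new data signalled for this user, go read the database
    }
    Thread.sleep(delayMs);
    delayMs = Math.min(delayMs * 2, maxDelayMs); // back off exponentially up to the cap
}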

Google Cloud Platform blocking BatchRequest request - Java

Is it possible to wait until a batch job (BatchRequest object) in GCP is completed?
E.g. you can do it with a normal Job:
final Job job = createJob(jobId, projectId, datasetId, tableId, destinationBucket);
service.jobs().insert(projectId, job).execute();
final Get request = service.jobs().get(projectId, jobId);
JobStatus response;
while (true) {
    Thread.sleep(500); // improve this sleep policy
    response = request.execute().getStatus();
    if (response.getState().equals("DONE") || response.getState().equals("FAILED")) {
        break;
    }
}
Something like the code above works fine. The problem with BatchRequest is that the jobRequest.execute() method does not return a response object.
When you execute it, the batch request returns after it has initialised all the jobs specified in its queue, but it does not wait until all of them have actually finished. Indeed, your execute() method returns, yet jobs can still fail later on (e.g. errors due to quota issues, schema issues, etc.), and I can't notify the client in time with the right information.
You can only check the status of all the created jobs in the web UI, via the job history button in the BigQuery view; you can't return an error message to a client.
Any ideas?
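A possible workaround, sketched below: register a JsonBatchCallback for each insert, collect the job IDs it reports, and then reuse the polling loop above for each of them. The jobsToRun list is an assumption, and the snippet assumes the enclosing method declares throws IOException, InterruptedException like the single-job example:
final List<String> jobIds = new ArrayList<>();

BatchRequest batch = service.batch();
for (Job job : jobsToRun) { // jobsToRun: the jobs you would otherwise insert one by one
    service.jobs().insert(projectId, job).queue(batch, new JsonBatchCallback<Job>() {
        @Override
        public void onSuccess(Job inserted, HttpHeaders responseHeaders) {
            // the insert was accepted; remember the job id so it can be polled below
            jobIds.add(inserted.getJobReference().getJobId());
        }

        @Override
        public void onFailure(GoogleJsonError e, HttpHeaders responseHeaders) {
            // the insert itself was rejected; this can be reported to the client right away
            System.err.println("Job insert failed: " + e.getMessage());
        }
    });
}
batch.execute(); // returns once all inserts have been sent, not when the jobs finish

// now block until every accepted job reaches a terminal state, as in the single-job loop
for (String id : jobIds) {
    JobStatus status;
    do {
        Thread.sleep(500); // improve this sleep policy
        status = service.jobs().get(projectId, id).execute().getStatus();
    } while (!"DONE".equals(status.getState()));
    // BigQuery reports failed jobs as DONE with a non-null errorResult
    if (status.getErrorResult() != null) {
        System.err.println("Job " + id + " failed: " + status.getErrorResult().getMessage());
    }
}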

Convert a for loop to a Multi-threaded chunk

I have the following for loop in a function which I intended to parallelize, but I'm not sure whether the overhead of multiple threads will outweigh the benefit of concurrency.
All I need is to send different log files to the corresponding receivers. For the time being, let's say the number of receivers won't be more than 10. Instead of sending the log files back to back, is it more efficient to send them all in parallel?
for (int i = 0; i < receiversList.size(); i++) {
    String receiverURL = serverURL + receiversList.get(i);
    HttpPost method = new HttpPost(receiverURL);
    String logPath = logFilesPath + logFilesList.get(i);
    messagesList = readMsg(logPath);
    for (String message : messagesList) {
        StringEntity entity = new StringEntity(message);
        log.info("Sending message:");
        log.info(message + "\n");
        method.setEntity(entity);
        if (receiverURL.startsWith("https")) {
            processAuthentication(method, username, password);
        }
        httpClient.execute(method).getEntity().getContent().close();
    }
    Thread.sleep(500); // Waiting time for the message to be sent
}
Also, please tell me how to make it parallel if it is going to help. Should I do it manually or use an ExecutorService?
All I need is to send different log files to the corresponding receivers. For the time being, let's say the number of receivers won't be more than 10. Instead of sending the log files back to back, is it more efficient to send them all in parallel?
There are a lot of questions to be asked before we can determine whether doing this in parallel will buy you anything. You mention "receivers", but are you really talking about different receiving servers at different web addresses, or are all threads sending their log files to the same server? If it is the latter, chances are you will get very little improvement in speed from concurrency: a single thread should be able to fill the network pipeline just fine.
Also, you probably would get no speed-up if the messages are small. Only large messages take any real time and would give you any true savings when sent in parallel.
I'm most familiar with the ExecutorService classes. You could do something like:
ExecutorService threadPool = Executors.newFixedThreadPool(10);
...
threadPool.submit(new Runnable() {
    // you could create your own Runnable class if each one needs its own httpClient
    public void run() {
        StringEntity entity = new StringEntity(message);
        ...
        // we assume that the client is some sort of pooling client
        httpClient.execute(method).getEntity().getContent().close();
    }
});
What would be good is to queue up these messages and send them from a background thread so you don't slow down your program. You could submit the messages to the threadPool and keep moving, or you could put them in a BlockingQueue<String> and have a thread take from the BlockingQueue and call httpClient.execute(...), as sketched below.
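A minimal sketch of the queue-plus-background-sender idea, assuming a single consumer thread and a hypothetical sendMessage helper that wraps the HttpPost/execute code from the question:
BlockingQueue<String> queue = new LinkedBlockingQueue<>();

// one background sender; switch to a pool if a single connection cannot keep up
ExecutorService sender = Executors.newSingleThreadExecutor();
sender.submit(() -> {
    try {
        while (true) {
            String message = queue.take(); // blocks until a message is available
            sendMessage(message);          // hypothetical helper wrapping httpClient.execute(...)
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // stop when the sender is shut down
    }
});

// producers just enqueue and keep moving
queue.put("log message to send");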
More implementation details from this good ExecutorService tutorial.
Lastly, how about putting all of your messages into one entity and splitting the messages on the server? That would be the most efficient, although you might not control the server handler code.
Hello. ExecutorService is certainly an option. You have 4 ways to do it in Java:
Using Threads directly (exposes too many details; it is easy to make mistakes)
An ExecutorService, as you have already mentioned, available since Java 5
Here is a tutorial demonstrating ExecutorService: http://tutorials.jenkov.com/java-util-concurrent/executorservice.html
The Fork/Join framework, which comes with Java 7
Parallel streams, which come with Java 8; below is a solution using parallel streams
Going for a higher-level API will spare you some errors you might otherwise make.
IntStream.range(0, receiversList.size()).parallel().forEach(i -> {
    String receiverURL = serverURL + receiversList.get(i);
    HttpPost method = new HttpPost(receiverURL);
    String logPath = logFilesPath + logFilesList.get(i);
    for (String message : readMsg(logPath)) {
        try {
            StringEntity entity = new StringEntity(message);
            log.info("Sending message:");
            log.info(message + "\n");
            method.setEntity(entity);
            if (receiverURL.startsWith("https")) {
                processAuthentication(method, username, password);
            }
            httpClient.execute(method).getEntity().getContent().close();
        } catch (Exception e) {
            log.error("Failed to send message to " + receiverURL, e);
        }
    }
});

Recursive function call with time delay

I have a web application and I need to run a background process which will hit a web service; after getting the response it will wait a few seconds (say 30) and then hit the service again. The response data can vary from very small to very large, so I don't want to call the process again until I have finished processing the data. So it's a recursive call with a time delay. How I intend to do it:
Add a ContextListener to the web app.
In the contextInitialized() method, call invokeWebService(), i.e. an arbitrary method that hits the web service.
invokeWebService will look like:
invokeWebService()
{
    //make request
    //hit service
    //get response
    //process response
    timeDelayInSeconds(30);
    //recursive call
    invokeWebService();
}
Please suggest whether I am doing it right, or whether I should go with threads or schedulers. Please answer with sample code.
You could use a ScheduledExecutorService, which has been part of the standard JDK since 1.5:
ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
Runnable r = new Runnable() {
    @Override
    public void run() {
        invokeWebService();
    }
};
scheduler.scheduleAtFixedRate(r, 0, 30, TimeUnit.SECONDS);
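Since the requirement is to wait ~30 s after the processing of one response finishes, rather than to start at a fixed rate regardless of how long processing takes, scheduleWithFixedDelay may be the closer fit (a sketch, reusing the same scheduler and Runnable as above):
// the next run starts 30 s after the previous run has completed
scheduler.scheduleWithFixedDelay(r, 0, 30, TimeUnit.SECONDS);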
What you need is not recursion but repetition. You have two choices here:
Use a Timer and a TimerTask with scheduleAtFixedRate
Use Quartz with a repeated schedule.
In Quartz, you can create a repeated schedule with this code:
TriggerBuilder.newTrigger()
        .withSchedule(SimpleScheduleBuilder.repeatSecondlyForever(30))
        .build();
From what I gather, waiting sort of implies hanging, which I do not really think is a good idea. I would recommend that you use something such as Quartz and run your method at whatever interval you wish.
Quartz is a full-featured, open source job scheduling service that can
be integrated with, or used along side virtually any Java EE or Java
SE application
Tutorials can be accessed here.
As stated here, you can do something like this:
JobDetail existingJobDetail = sched.getJobDetail(jobName, jobGroup);
if (existingJobDetail != null) {
    List<JobExecutionContext> currentlyExecutingJobs = (List<JobExecutionContext>) sched.getCurrentlyExecutingJobs();
    for (JobExecutionContext jec : currentlyExecutingJobs) {
        if (existingJobDetail.equals(jec.getJobDetail())) {
            //String message = jobName + " is already running.";
            //log.info(message);
            //throw new JobExecutionException(message,false);
        }
    }
    //sched.deleteJob(jobName, jobGroup); if you want to delete the scheduled but not-currently-running job
}
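For completeness, a minimal sketch of scheduling the web-service call with Quartz 2.x (the job class, trigger names, and group names are made up for the example). @DisallowConcurrentExecution keeps a new run from starting while the previous one is still processing, which matches the requirement of not calling the service again until processing is done:
@DisallowConcurrentExecution
public class InvokeWebServiceJob implements Job {
    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        invokeWebService(); // make request, hit service, process response
    }
}

// Scheduling, e.g. from contextInitialized():
Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
scheduler.start();

JobDetail job = JobBuilder.newJob(InvokeWebServiceJob.class)
        .withIdentity("invokeWebServiceJob", "polling")
        .build();

Trigger trigger = TriggerBuilder.newTrigger()
        .withIdentity("invokeWebServiceTrigger", "polling")
        .startNow()
        .withSchedule(SimpleScheduleBuilder.repeatSecondlyForever(30))
        .build();

scheduler.scheduleJob(job, trigger);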
