How to use updateMetadata(request) in Cognos Analytics in parallel threads?

I am trying to update existing code that sends mdprovider requests to the metadata service to update or publish the metadata in an unpublished model, so that it uses parallel threads. My model has 1000 query subjects, and initially we validated them sequentially, which took almost 4 hours to complete. Now I am trying to run the validation in 3 parallel threads, with the aim of bringing the time down.
I used ExecutorService, created a fixed thread pool of 3 threads, and submitted the tasks:
ExecutorService exec = Executors.newFixedThreadPool(thread);
exec.submit(task);
Inside the run method I connect to Cognos, log on, and call updateMetadata():
MetadataService_PortType mdService;

public void run() {
    cognosConnect();
    if (namespace.length() > 0) {
        login(namespace, userName, password);
    }
    // xml = will build the XML here
    // Calls the method
    boolean testdblResult = validateQS(xml);
}

Boolean validateQS(String actionXml) {
    // actionXml: transaction XML to test a query subject
    // Cognos SDK method
    result = mdService.updateMetadata(actionXml);
}
This executes successfully. The problem is that although the 3 threads send requests to the Cognos SDK method mdService.updateMetadata() in parallel, the responses come back sequentially. For example, say the requests to validate 3 query subjects are sent in parallel at the 10th second; the responses for those 3 query subjects arrive at the 15th, 20th, and 24th second, one after another.
Is this the expected behaviour of Cognos? Does mdService.updateMetadata(actionXml) execute requests sequentially internally, or is there another way to achieve parallelism here? I couldn't find much information in the SDK documentation.

Related

How to scale more than 1 instance and deal with scheduled task in spring?

I have push notifications that are sent to Android and iOS applications through Spring Boot every day at 8 am Europe/Paris.
If I run multiple instances, the notifications will be sent multiple times. I am thinking of saving the notifications sent each day to the database and checking them before sending, but I am worried that the task will still run multiple times. This is what I am doing:
@Component
public class ScheduledTasks {
    private static final Logger log = LoggerFactory.getLogger(ScheduledTasks.class);
    private static final SimpleDateFormat dateFormat = new SimpleDateFormat("HH:mm:ss");

    @Autowired
    private ExpoPushTokenRepository expoPushTokenRepository;
    @Autowired
    private ExpoPushNotificationService expoPushNotificationService;
    @Autowired
    private MessageSource messageSource;

    // TODO: if instances > 1, this will run multiple times; save the sent notifications to the database and prevent multiple sending.
    @Scheduled(cron = "${cron.promotions.notification}", zone = "Europe/Paris")
    public void sendNewPromotionsNotification() {
        List<ExpoPushToken> expoPushTokenList = expoPushTokenRepository.findAll();
        ArrayList<NotifyRequest> notifyRequestList = new ArrayList<>();
        for (ExpoPushToken expoPushToken : expoPushTokenList) {
            NotifyRequest notifyRequest = new NotifyRequest(
                expoPushToken.getToken(),
                "This is a test title",
                "This is a test subtitle",
                "This is a test body"
            );
            notifyRequestList.add(notifyRequest);
        }
        expoPushNotificationService.sendPushNotificationToList(notifyRequestList);
        log.info("{} Sent push notification to {} users", dateFormat.format(new Date()), expoPushTokenList.size());
    }
}
Does anybody have an idea on how I can prevent that safely?

Quartz would be my mostly database-agnostic solution for the task at hand, but was ruled out, so we are not going to discuss it.
The solution we are going to explore instead makes the following assumptions:
Postgres >= 9.5 is used (because we are going to use SKIP LOCKED, which was introduced in PostgreSQL 9.5).
It is okay to run a native query.
Under these conditions, multiple running instances of the application can each retrieve batches of notifications through the following query:
SELECT * FROM expo_push_token FOR UPDATE SKIP LOCKED LIMIT 100;
This will retrieve and lock up to 100 entries from the table expo_push_token. If two instances of the application execute this query simultaneously, the results they receive will be disjoint. 100 is just a sample value; we may want to fine-tune it for our use case. The locks stay active until the current transaction ends.
After an instance has fetched a batch of notifications, it also has to delete the entries it locked from the table, or otherwise mark them as processed (if we go down this route, we have to modify the query above to filter out already processed entries), and close the current transaction to release the locks. Each instance of the application then repeats this query until it returns zero entries.
There is also an alternative approach: an instance first fetches a batch of notifications to send, keeps the database transaction open (and thus keeps holding the locks), sends out its notifications, then deletes/updates the entries and closes the transaction.
The two solutions have different strengths and weaknesses:
The first solution keeps the transaction short. But if the application crashes in the middle of sending out notifications, the part of its batch that was not sent out is lost in this run.
The second solution keeps the transaction open, possibly for a long time. If the application crashes in the middle of sending out notifications, all entries are unlocked and its batch is re-processed, possibly resulting in some notifications being sent out twice.
For this solution to work, we also need some kind of job that fills the table expo_push_token with the data we need. This job should run beforehand, i.e. its execution should not overlap with the notification sending process.
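The "repeat until the query returns zero entries" loop of the first approach could look roughly like the sketch below. Everything named here (TokenStore, claimAndRemoveBatch, NotificationDrainer, InMemoryTokenStore) is hypothetical and not part of the original answer. In a real application, claimAndRemoveBatch would run the SKIP LOCKED query and the delete inside one short transaction; the in-memory store only stands in for the table so that the loop can be run on its own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical abstraction over the expo_push_token table. */
interface TokenStore {
    /**
     * Claims up to 'limit' entries and removes them. A database-backed
     * implementation would do this in one transaction:
     * SELECT ... FOR UPDATE SKIP LOCKED LIMIT ?, then DELETE, then COMMIT.
     */
    List<String> claimAndRemoveBatch(int limit);
}

final class NotificationDrainer {
    /** Fetches and sends batches until the store returns an empty batch. */
    static int drain(TokenStore store, int batchSize, Consumer<List<String>> sender) {
        int total = 0;
        while (true) {
            List<String> batch = store.claimAndRemoveBatch(batchSize);
            if (batch.isEmpty()) {
                return total; // nothing left for this instance to claim
            }
            sender.accept(batch); // sending happens outside the transaction
            total += batch.size();
        }
    }
}

/** In-memory stand-in for the table, used only to exercise the loop. */
final class InMemoryTokenStore implements TokenStore {
    private final Deque<String> tokens;

    InMemoryTokenStore(Collection<String> initial) {
        this.tokens = new ArrayDeque<>(initial);
    }

    @Override
    public synchronized List<String> claimAndRemoveBatch(int limit) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < limit && !tokens.isEmpty()) {
            batch.add(tokens.poll());
        }
        return batch;
    }
}
```

With 250 tokens and a batch size of 100, drain hands the sender three batches (100, 100 and 50 entries). Because the entries are removed before sending, a crash during sender.accept loses the rest of that batch, which is exactly the weakness of the first approach described above.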

spring boot API - document processing and executing python script on documents in parallel

Scenario:
In my application, there are 3 processes which are copying documents on a shared drive in their respective folders.
As soon as a document is copied to the shared drive (by any process), the directory watcher (Java) code picks up the document, calls the Python script using "Process", and does some processing on the document. The code snippet is as follows:
Process pr = Runtime.getRuntime().exec(pythonCommand);
// retrieve output from python script
BufferedReader bfr = new BufferedReader(new InputStreamReader(pr.getInputStream()));
String line = "";
while ((line = bfr.readLine()) != null) {
// display each output line from python script
logger.info(line);
}
pr.waitFor();
Currently my code waits until the Python script has finished with the document; only after that does it pick up the next document. The Python code takes 30 seconds to complete.
After processing, the document is moved from the current folder to the archive or error folder.
Please find below screen shot of the scenario:
What is the problem?
My code processes documents sequentially, and I need to process them in parallel.
As the Python code takes around 30 seconds, some of the events created by the directory watcher are also getting lost.
If around 400 documents arrive within a short span of time, document processing stops.
What I am looking for?
Design solution for processing documents in parallel.
In case of any failure scenario for document processing, pending documents must be processed automatically.
I tried the Spring Boot scheduler as well, but documents are still processed sequentially.
Is it possible to call the Python code in parallel as a background process?
Sorry for the long question, but I have been stuck on this for many days and have already looked at many similar questions.
Thank you!

One option would be to use an ExecutorService provided by the JDK, which can execute Runnable and Callable tasks. You will need to create a class that implements Runnable and executes your Python script; each time a new document arrives, you create a new instance of this class and pass it to the ExecutorService.
To show how this works, we will use a simple Python script that takes a thread name as an argument, prints the start time of its execution, sleeps 10 seconds and prints the end time:
import time
import sys
print "%s start : %s" % (sys.argv[1], time.ctime())
time.sleep(10)
print "%s end : %s" % (sys.argv[1], time.ctime())
First, we implement the class that runs the script and passes it the name obtained in the constructor:
class ScriptRunner implements Runnable {
    private String thread;

    ScriptRunner(String thread) {
        this.thread = thread;
    }

    @Override
    public void run() {
        try {
            ProcessBuilder ps = new ProcessBuilder("py", "test.py", thread);
            ps.redirectErrorStream(true);
            Process pr = ps.start();
            try (BufferedReader in = new BufferedReader(new InputStreamReader(pr.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
            }
            pr.waitFor();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Then we write a main method that creates an ExecutorService with a fixed pool of 5 threads and submits 10 instances of ScriptRunner to it, pausing 1 second between submissions:
public static void main(String[] args) throws InterruptedException {
    ExecutorService executor = Executors.newFixedThreadPool(5);
    for (int i = 1; i <= 10; i++) {
        executor.submit(new ScriptRunner("Thread_" + i));
        Thread.sleep(1000);
    }
    executor.shutdown();
}
If we run this method, we will see that the service, due to the specified limit, has a maximum of 5 parallel-running tasks, and the rest fall into the queue and start in freed threads:
Thread_1 start : Sat Nov 23 11:40:14 2019
Thread_1 end : Sat Nov 23 11:40:24 2019 // the first task is completed..
Thread_2 start : Sat Nov 23 11:40:15 2019
...
Thread_5 end : Sat Nov 23 11:40:28 2019
Thread_6 start : Sat Nov 23 11:40:24 2019 // ..and the sixth is started
...
Thread_10 end : Sat Nov 23 11:40:38 2019

You can try Python's multiprocessing module here.
Because of the GIL, Python's threading will not speed up computations that are CPU bound.
Possible duplicate of this question Solving embarassingly parallel problems using Python multiprocessing

Create two queues (Blocking Queue):
executionQueue
errorQueue
Create two threads (you can create as many as you need):
FirstThread
SecondThread
Concept:
Producer-Consumer
Producer(Directory watcher Thread):
Directory watcher
Consumer:
FirstThread
SecondThread
Details:
The add and remove methods of both queues must be synchronized, so that at any moment only one thread is inside them. If one thread is in the critical section (producer or consumer), the other threads wait for their turn.
The producer starts working first, and initially the consumers are sleeping.
Why? To keep the whole system running in step.
How do you get that? Put the producer thread to sleep after it has processed an event, and have each consumer sleep at the start of its job.
Whichever thread (producer or consumer) arrives first acquires the lock on the queue, does its work, and releases it. If another thread (producer or consumer) arrives in the meantime to fetch data, it waits for its turn (using the concept of a thread pool).
As soon as a document is copied to the shared drive (by any process), the directory watcher (producer) picks up the path of that document and stores it in executionQueue synchronously.
Now the consumers come to fetch the data. FirstThread wakes up first and goes to executionQueue; it acquires the lock on executionQueue, fetches the data, and releases the lock. If SecondThread comes to fetch data in the meantime, it waits for its turn.
After fetching the path from executionQueue, FirstThread picks up the document from that location and calls the Python script with it.
Meanwhile, SecondThread acquires the lock, fetches a path, and starts processing in the same way as FirstThread.
A few seconds later FirstThread finishes its job, goes back to executionQueue, acquires the lock again, fetches the next file path, releases the lock, and starts processing; the same goes for SecondThread.
If an error occurs while a file is being processed, send its path to errorQueue and analyse the contents of errorQueue at the end of the day or when your system is idle, either using the same concept or manually.
If no data is available in executionQueue, the producer thread (directory watcher) is already sleeping. A consumer thread that comes to executionQueue finds no data and goes to sleep for, say, 1 minute; after 1 minute it wakes up, tries to fetch data again, and so on.
Logging at each step will help you with later analysis.
Using this concept you can run the whole system in parallel.
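A minimal sketch of this producer-consumer layout, assuming the JDK's LinkedBlockingQueue is used for both queues. It differs from the description above in one respect: a BlockingQueue does its own locking, and take() blocks while the queue is empty, so the sketch needs neither explicit synchronization nor sleep intervals. DocumentPipeline, submit, shutdownAndWait and the processor callback are hypothetical names; in the scenario from the question, the processor would start the Python script for the given path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class DocumentPipeline {
    // Unique sentinel object that tells a consumer to stop (compared by identity).
    private static final String STOP = new String("stop");

    final BlockingQueue<String> executionQueue = new LinkedBlockingQueue<>();
    final BlockingQueue<String> errorQueue = new LinkedBlockingQueue<>();
    private final List<Thread> consumers = new ArrayList<>();

    DocumentPipeline(int consumerCount, Consumer<String> processor) {
        for (int i = 0; i < consumerCount; i++) {
            Thread t = new Thread(() -> consume(processor), "consumer-" + (i + 1));
            consumers.add(t);
            t.start();
        }
    }

    /** Called by the producer (the directory watcher) for every new document. */
    void submit(String path) {
        executionQueue.add(path);
    }

    private void consume(Consumer<String> processor) {
        try {
            while (true) {
                String path = executionQueue.take(); // blocks until a path is available
                if (path == STOP) {
                    return;
                }
                try {
                    processor.accept(path);
                } catch (RuntimeException e) {
                    errorQueue.add(path); // keep failed documents for later analysis
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Lets the consumers finish the queued work, then waits for them to exit. */
    void shutdownAndWait() {
        for (int i = 0; i < consumers.size(); i++) {
            executionQueue.add(STOP); // one sentinel per consumer
        }
        for (Thread t : consumers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

Because the directory watcher only enqueues a path and returns, it is no longer blocked for the 30 seconds a script run takes, which should also reduce the chance of missing watcher events.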

Report to database only once from multiple machines

I have a Spring Boot app with a scheduler that inserts data into a remote database at 2 a.m. every day.
@Scheduled(cron = "0 0 2 * * ?")
public void reportDataToDB() {
    // code omitted
}
The problem is, the app runs on multiple machines, so the database would receive multiple duplicate insertions of data.
What is the idiomatic way to solve this?

We solved such a problem by using a central scheduler. In our case we use Rundeck, which calls a URL on our service (going through the load balancer), which then executes the task (in our case, data cleanup). This way you can make sure that the logic is executed on only one instance of the service.

Hold thread in spring rest request for long-polling

As the title says, in our project we need one thread to notify another thread, or trigger a method on it. This is part of a long-polling implementation. Below I describe and show my implementation.
The requirements are:
UserX sends a request from the client to the server (poll action) as soon as he gets the response to the previous one. The service executes a Spring async method in which the thread immediately checks a cache to see whether there is new data in the database. I know that a cache is usually used for methods where a specific input is expected to give a specific output. That is not the case here: I use the cache to reduce database calls, and the output of my method is always different. So the cache stores a notification telling me whether I should check the database or not. The check runs in a while loop, which ends when the thread finds a notification in the cache telling it to read the database, or when the time expires.
Assume that the UserX thread (poll action) is currently in the while loop, checking the cache.
At that moment UserY (push action) sends some data to the server; the data is stored in the database in a separate thread, and the userId of the recipient is stored in the cache.
So when UserX checks the cache, he finds the recipient id (which equals his own id in this case), breaks out of the loop, and fetches the data.
In my implementation I use the Google Guava cache, which supports manual writes.
private static Cache<Long, Long> cache = CacheBuilder.newBuilder()
.maximumSize(100)
.expireAfterWrite(5, TimeUnit.MINUTES)
.build();
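In the create method I store the id of the user who should read the data.
public void create(Data data) {
    dataRepository.save(data);
    // Guava's Cache uses put() and rejects null values, so the recipient id is stored as the value too
    cache.put(data.getRecipient(), data.getRecipient());
    System.out.println("SAVED " + data.getRecipient() + " in " + Thread.currentThread().getName());
}
And here is the method that polls for data: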
@Async
public CompletableFuture<List<Data>> pollData(Long previousMessageId, Long userId) throws InterruptedException {
    // check the db first; if there is new data, there is no need to enter the loop and wait
    List<Data> data = findRecent(previousMessageId, userId);
    // data not found, so wait in the loop for some time
    if (data.size() == 0) {
        short c = 0;
        while (c < 100) {
            // check whether new data has been added; if so, break out of the loop
            if (cache.getIfPresent(userId) != null) {
                break;
            }
            c++;
            Thread.sleep(1000);
            System.out.println("SEQUENCE: " + c + " in " + Thread.currentThread().getName());
        }
        // check the database at the end of the loop or after breaking out of it
        data = findRecent(previousMessageId, userId);
    }
    // clear the entry for that recipient and return the result
    cache.invalidate(userId);
    return CompletableFuture.completedFuture(data);
}
After UserX gets the response, he sends the poll request again and the whole process repeats.
Can you tell me whether this application design for long polling in Java (Spring) is correct, or whether there is a better way? The key point is that when a user sends a poll request, the request should be held for some time waiting for new data rather than answered immediately. The solution shown above works, but the question is whether it will also work for many users (1000+). I worry about the paused threads, which could slow down other requests when no threads are available in the pool. Thanks in advance for your effort.

Check Web Sockets. Spring has supported them since version 4. They don't require the client to initiate polling; instead, the server pushes data to the client in real time.
Check the below:
https://spring.io/guides/gs/messaging-stomp-websocket/
http://www.baeldung.com/websockets-spring
Note: web sockets open a persistent connection between client and server and may thus result in more resource usage with a large number of users. So, if you are not looking for real-time updates and are fine with some delay, polling might be a better approach. Also, not all browsers support web sockets.
Web Sockets vs Interval Polling
Longpolling vs Websockets
In what situations would AJAX long/short polling be preferred over HTML5 WebSockets?
In your current approach, if you are concerned about the large number of threads running on the server for multiple users, you can instead trigger the polling from the front end each time. This way only short-lived request threads are triggered from the UI to look for an update in the cache. If there is an update, another call can be made to retrieve the data. However, don't hit the server every second as you are doing now; otherwise you will have high CPU utilization, and user request threads may also suffer. You should optimize your timing.
Instead of hitting the cache 100 times at 1-second intervals, you can apply a smarter algorithm by analyzing the pattern of cache/DB updates over a period of time.
Knowing the pattern, you can trigger the polling with exponential backoff so that the cache is hit when an update is most likely. This way you hit the cache less often and more accurately.
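As a rough illustration of exponential backoff, the delay before the next poll can be doubled after every poll that finds nothing, up to a cap. The class name PollBackoff and the base and cap values in the usage note are assumptions for this sketch; the answer itself does not prescribe concrete numbers, and the same calculation could equally live in the front end.

```java
final class PollBackoff {
    /**
     * Delay before the next poll: baseMillis doubled once per unsuccessful
     * attempt, never exceeding maxMillis.
     */
    static long delayMillis(int attempt, long baseMillis, long maxMillis) {
        if (attempt < 0 || baseMillis < 0 || maxMillis < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        long delay = baseMillis;
        // Stop doubling as soon as the cap is reached, which also keeps the value from overflowing.
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }
}
```

With a base of 1000 ms and a cap of 30000 ms, the delays for attempts 0 to 5 are 1000, 2000, 4000, 8000, 16000 and 30000 ms. The attempt counter would be reset to 0 whenever a poll returns data.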

Spring and background thread execution

I have a Spring Boot 1.3.5 web application (running on Tomcat 8). One of its features is to contact a third-party API through REST and launch many lengthy jobs (from 1 to maybe around 30 depending on the user input, each one with its own REST call in a for loop). I have all this logic in a controller that is called using a POST with some parameters.
What I need is to launch a background task after each job has been acknowledged by the API. The task would be passed a parameter (the job ID) and would periodically (~30 s) poll another API to fetch the job output (again, these jobs may take from several seconds up to an hour, and fetching the output takes about 3-4 seconds plus parsing a long string) and do some business logic based on the job status (updating a DB record for now).
However, I'm not sure which TaskExecutor to use, if any, or whether I should use Java's Future structures for this. I might benefit from a thread pool that runs only X threads in parallel and queues the others so as not to overload the server. Is there an example I can learn from to get started?
Sample of my existing code:
@RequestMapping(value={"/job/launch"}, method={RequestMethod.POST})
public ResponseEntity<String> runJob(HttpServletRequest req) {
    for (int deployments=1; deployments <= deployments_required; deployments++) {
        httpPost.setEntity((HttpEntity)new StringEntity(jsonInput));
        CloseableHttpResponse response = httpclient.execute(httpPost);
        HttpEntity entity = response.getEntity();
        responseString = EntityUtils.toString(entity, "UTF-8");
        JsonObject jsonObject = new JsonParser().parse(responseString).getAsJsonObject();
        if (response.getStatusLine().getStatusCode() != 200) {
            resultsNotOk.add(new ResponseEntity<String>(jsonObject.get("message").getAsString(), HttpStatus.INTERNAL_SERVER_ERROR));
            continue;
        }
        String deploymentId;
        deploymentId = jsonObject.get("id").getAsString();
        // Start background task to keep checking the job every few seconds and find created instance IP addresses
        start_checking_execution(deploymentId);
    }
}
(Yes, this code may be better put in a Service but it was originally built as is so I haven't moved it yet. It may be a good time to do it now)

I would say this is a job for Spring Batch.
You can define a Reader/Processor (to convert the objects read from the source into the objects written to the target)/Writer to implement the logic.
You can use JobOperator to get the job state. See job status transitions.
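If Spring Batch turns out to be more than the task needs, the periodic status check from the question can also be sketched with the JDK's ScheduledExecutorService alone. This is an alternative under stated assumptions, not what the answer above proposes: JobPoller, watch and the isFinished predicate are hypothetical names, and the predicate stands in for the REST call that fetches the job status and updates the DB record.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

final class JobPoller {
    private final ScheduledExecutorService scheduler;

    /** At most maxParallelPolls status checks run at once; further ones wait in the queue. */
    JobPoller(int maxParallelPolls) {
        this.scheduler = Executors.newScheduledThreadPool(maxParallelPolls);
    }

    /**
     * Calls isFinished for the job right away and then again after each delay,
     * until it returns true. The returned future completes with the job id.
     */
    CompletableFuture<String> watch(String jobId, Predicate<String> isFinished,
                                    long delay, TimeUnit unit) {
        CompletableFuture<String> done = new CompletableFuture<>();
        ScheduledFuture<?> handle = scheduler.scheduleWithFixedDelay(() -> {
            try {
                if (isFinished.test(jobId)) {
                    done.complete(jobId);
                }
            } catch (RuntimeException e) {
                done.completeExceptionally(e); // report the failure and stop polling
            }
        }, 0, delay, unit);
        // Cancel the periodic task as soon as the future completes either way.
        done.whenComplete((id, error) -> handle.cancel(false));
        return done;
    }

    void shutdown() {
        scheduler.shutdown();
    }
}
```

In the controller from the question, start_checking_execution(deploymentId) could delegate to something like watch(deploymentId, statusCheck, 30, TimeUnit.SECONDS), where statusCheck is the code that queries the third-party API; the controller does not need to wait on the returned future.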
