I have a service deployed as a microservice, and a MongoDB collection whose documents have a few states, for example: READY, RUNNING, COMPLETED. I need to pick the documents in the "READY" state and then process them. But with multiple instances running there is a high possibility of processing duplicates. I have seen the thread below, but it is only concerned with a single instance picking up tasks.
Spring boot Webservice / Microservices and scheduling
The above talks about a solution using Hazelcast and MongoDB. But what I am looking for is that all instances wait for the lock, get their own (non-duplicate) documents and process them. I have checked various documents and unfortunately I have not been able to find any solution.
One of the options I thought of is to introduce Kafka, where we can assign specific tasks to specific consumers. But before opting for that, I would like to see if there are any solutions that can be implemented using simpler methods such as database locks, etc. Any pointers towards this are highly appreciated.
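For what it's worth, here is a minimal sketch of the kind of database-level approach I have in mind, assuming Spring Data MongoDB (the class name and the "tasks" collection name are illustrative, not from my actual code): each instance atomically flips a document from READY to RUNNING and only processes what it managed to claim.

```java
import org.bson.Document;
import org.springframework.data.mongodb.core.FindAndModifyOptions;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.data.mongodb.core.query.Update;

public class TaskClaimer {

    private final MongoTemplate mongoTemplate;

    public TaskClaimer(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    /** Atomically claims one READY document, or returns null if none is left. */
    public Document claimNext() {
        Query query = Query.query(Criteria.where("state").is("READY"));
        Update update = Update.update("state", "RUNNING");
        // findAndModify is atomic per document, so two instances can never
        // claim the same one - duplicates are avoided without a global lock.
        return mongoTemplate.findAndModify(
                query, update,
                FindAndModifyOptions.options().returnNew(true),
                Document.class, "tasks"); // collection name is an assumption
    }
}
```

Each instance would simply loop on claimNext() until it returns null, and mark the document COMPLETED with a regular update afterwards.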
I've spent some time today on stackoverflow.com and haven't found an answer for my challenge.
The challenge
I would like to create a Spring Boot based microservice that I can scale easily. That microservice will be writing some entries (e.g. Product) to a database and reading them. The service will be deployed as a k8s Deployment and saves its data in an AWS RDS MySQL instance. I will be scaling the microservice using the built-in k8s Deployment mechanisms.
Questions
How can I create a Spring Boot based app that will handle saving data across many threads and across many instances of the app?
I've read some posts saying that in that case there should be a queue, and only one instance with one thread should save the data from the queue, but I guess that's cumbersome. I expect that there will be more and more traffic and, as a consequence, more and more messages in the queue to process.
Can you recommend some books, ideally about that problem (in my words, the "multithreading and multi-instance write synchronization" problem)?
Thank you for any help.
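Edit: for context, one building block I have been looking at is JPA optimistic locking with a @Version column; a minimal sketch of what I mean (the Product entity here is only illustrative). MySQL serializes the actual row writes, and the version check makes conflicting updates fail fast instead of silently overwriting each other.

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Version;

@Entity
public class Product {

    @Id
    @GeneratedValue
    private Long id;

    private String name;

    // Incremented by JPA on every update; a stale concurrent update throws
    // an OptimisticLockException, which the caller can catch and retry.
    @Version
    private Long version;

    // getters and setters omitted for brevity
}
```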
I'm currently working on a web application which needs to import data and do some processing. This can take some time (probably in the "several minutes" range, once the data sets grow), so we're running it in the background - and now the time has come to show status in the frontend, instead of tailing log files :)
The frontend is using Angular, hooked up to REST endpoints (JAX-RS) calling services in EJBs that do persistence via JPA, running on JBoss EAP 6.4 / AS 7.5 (EE6). Standard stuff, but this is the first time I'm dealing with Java EE.
With regards to querying status, polling a REST endpoint periodically is fine - we don't need fancy stuff like websockets. We do need to support multiple background jobs, though, and information consisting of runstate (running/done/error), progress and list of errors.
So, I currently have two questions:
1) Is there a more suitable way of launching a background task than calling an @Asynchronous EJB method?
2) Which options do I have for keeping track of the background tasks, and which is most suitable?
My first idea was to keep a HashMap, but that quickly ended up looking like too much manual (and fragile-looking) code with concurrency and lifetime concerns - and I prefer not to reinvent the wheel. The safe choice seems to be persisting it via JPA, but that seems somewhat clumsy for volatile status information.
I'm obviously not the first person facing these issues, but my google-fu seems to be lacking at the moment :)
The tasks could be launched using @Asynchronous or by using a JMS @MessageDriven bean.
From Java EE 7 onward, a ManagedExecutorService is also an option.
The tasks would then update their own state, which is stored in a ConcurrentHashMap inside a @Singleton EJB.
If you are in a clustered environment, the state of the tasks is better stored using JPA, as a @Singleton is per JVM and not shared across the whole cluster.
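A minimal sketch of that combination, assuming plain EE6 EJBs (two files shown in one block; the class and method names are illustrative):

```java
// JobStatusRegistry.java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.ejb.Singleton;

@Singleton
public class JobStatusRegistry {

    // One entry per background job; volatile status only, lost on redeploy.
    private final Map<String, String> statusByJobId = new ConcurrentHashMap<>();

    public void update(String jobId, String status) {
        statusByJobId.put(jobId, status);
    }

    public String status(String jobId) {
        String status = statusByJobId.get(jobId);
        return status != null ? status : "UNKNOWN";
    }
}

// ImportService.java
import javax.ejb.Asynchronous;
import javax.ejb.Stateless;
import javax.inject.Inject;

@Stateless
public class ImportService {

    @Inject
    private JobStatusRegistry registry;

    @Asynchronous
    public void runImport(String jobId) {
        registry.update(jobId, "RUNNING");
        try {
            // ... long-running import work ...
            registry.update(jobId, "DONE");
        } catch (Exception e) {
            registry.update(jobId, "ERROR: " + e.getMessage());
        }
    }
}
```

Your JAX-RS endpoint would call runImport(jobId) to start a job and read JobStatusRegistry.status(jobId) when the frontend polls.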
I am trying to determine the best way to handle long-running batch jobs in Spring MVC. I came across Akka in my searching as a non-blocking framework for async processing, which is preferred because I don't want the batch processing to eat up all the threads from the thread pool.
Essentially what I will be doing is have a job that needs to run on some set schedule that will go out and call various web services, process the data, and persist it.
I have seen some code examples of using it with Spring, but I've never seen it used with a CRON-type scheduler. It always seems to use a fixed time period.
I'm not sure if this is even the best approach to handling large scale batch processing within Spring. Any suggestions or links to good Akka Spring resources are welcome.
I would suggest you look into the Spring Integration and Spring Batch projects. The first one allows you to configure chains of services using EIP. We used it in our project to fetch files from FTP, deserialize and process them, import them into the DB, send emails if required, etc. - all on a schedule. The second one is more straightforward and basically provides a framework to work on rows of data. Both are configurable with Quartz and integrate into a Spring MVC project nicely.
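On the CRON question specifically: plain Spring scheduling already accepts cron expressions, so Akka is not required just for that. A minimal sketch (the bean names and the cron value are illustrative):

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Configuration
@EnableScheduling
class SchedulingConfig {
}

@Component
class NightlyBatchJob {

    // Runs at 02:30 every day. The work itself can be handed off to
    // Spring Batch, Spring Integration, or an async executor so it does
    // not tie up the scheduler thread.
    @Scheduled(cron = "0 30 2 * * *")
    public void runBatch() {
        // call the web services, process the data, persist it ...
    }
}
```

If you need persistence or clustering of the schedule itself, Quartz's CronTrigger gives you the same cron syntax with those extra features.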
We have a component called Workflow which exposes a SOAP web service. We are trying to introduce asynchronous processing in Workflow by allowing it to consume messages from WebSphere MQ. We also want to utilize multiple instances of Workflow, so there can be 4 instances of Workflow listening to the same queue. The problem here is how to make sure all Workflow instances are utilized evenly and no single instance is overloaded.
Workflow is completely written in Java. We use Spring and Hibernate extensively. The processes which will be submitting message to Workflow are written in Java. For message processing and MQ, we use Spring Integration.
The best way to ensure that no Workflow instance is overloaded is for each individual Workflow instance to only consume messages from the queue at the rate it can actually handle. In this case, you may not care whether the work is distributed evenly, as long as all the work gets done promptly.
If you really want to make sure all Workflow instances are used evenly even when your load is so light that you don't need all of the instances, you may need to check whether there's a way of reconfiguring WebSphere MQ to distribute messages on a FIFO basis rather than a LIFO basis, or if WebSphere MQ can't be configured that way, to switch to a different message queue. However, I don't recommend this: the system as a whole can work perfectly fine even if, at low loads, only some of the Workflow instances are utilized, with all being utilized only at high loads.
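A rough sketch of that consumer-side throttling idea using Spring's JMS listener container (the queue name, concurrency numbers and bean names are illustrative, and the WebSphere MQ ConnectionFactory is assumed to be configured elsewhere): each instance caps its own concurrency, so a busy instance simply stops taking messages and the others pick up the slack.

```java
import javax.jms.ConnectionFactory;
import javax.jms.MessageListener;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jms.listener.DefaultMessageListenerContainer;

@Configuration
public class WorkflowListenerConfig {

    @Bean
    public MessageListener workflowListener() {
        return message -> {
            // hand the message over to the Workflow service here
        };
    }

    @Bean
    public DefaultMessageListenerContainer workflowListenerContainer(
            ConnectionFactory connectionFactory, MessageListener workflowListener) {
        DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);      // WebSphere MQ connection factory
        container.setDestinationName("WORKFLOW.REQUEST.QUEUE"); // shared queue, competing consumers
        container.setMessageListener(workflowListener);
        container.setConcurrentConsumers(2);                    // baseline listener threads per instance
        container.setMaxConcurrentConsumers(5);                 // cap so one instance cannot hog the work
        return container;
    }
}
```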
I currently have a Tomcat container with a servlet running on it, listening for requests. I need the result of an HTTP request to be a submission to a job queue which will then be processed asynchronously. I want each "job" to be persisted as a row in a DB for tracking and for recovery in case of failure. I've been doing a lot of reading. Here are my options (note I have to use open-source stuff for everything).
1) JMS -- use ActiveMQ (but who would be the consumer of the job in this case -- another servlet?)
2) Have my request create a row in the DB. Have a separate servlet inside my Tomcat container that always runs -- it uses Quartz Scheduler or the utilities provided in java.util.concurrent to continuously process the rows as jobs (using thread pooling).
I am leaning towards the latter because looking at the JMS documentation gives me a headache, and while I know it's a more robust solution, I need to implement this relatively quickly. I'm not anticipating huge amounts of load in the early days of deploying this server in any case.
A lot of people say Spring might be good for either 1 or 2. However I've never used Spring and I wouldn't even know how to start using it to solve this problem. Any pointers on how to dive in without having to re-write my entire project would be useful.
Otherwise if you could weigh in on option 1 or 2 that would also be useful.
Clarification: The asynchronous process would be to screen-scrape a third-party web site and send a message notification to the original requester. The third-party web site is a bit flaky and slow, and that's why it will be handled as an asynchronous process (with several retry attempts built in). I will also be pulling files from that site and storing them in S3.
Your Quartz Job doesn't need to be a Servlet! You can persist incoming Jobs in the DB and have Quartz started when your main Servlet starts up. The Quartz Job can be a simple POJO and check the DB for any jobs periodically.
However, I would suggest taking a look at Spring. It's not hard to learn and easy to set up within Tomcat. You can find a lot of good information in the Spring reference documentation. It has Quartz integration, which is much easier than doing it manually.
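A rough sketch of the plain-Quartz variant described above, in case you skip Spring for now (the jobs table, SQL and polling interval are assumptions; only the Quartz wiring is standard API):

```java
import org.quartz.Job;
import org.quartz.JobBuilder;
import org.quartz.JobDetail;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.SimpleScheduleBuilder;
import org.quartz.Trigger;
import org.quartz.TriggerBuilder;
import org.quartz.impl.StdSchedulerFactory;

public class PendingJobPoller implements Job {

    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        // e.g. SELECT ... FROM jobs WHERE status = 'PENDING', process each row,
        // and mark it RUNNING/DONE so a crashed run can be retried later.
    }

    // Typically called from a ServletContextListener when the webapp starts.
    public static void start() throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        JobDetail job = JobBuilder.newJob(PendingJobPoller.class)
                .withIdentity("pendingJobPoller")
                .build();
        Trigger trigger = TriggerBuilder.newTrigger()
                .withSchedule(SimpleScheduleBuilder.repeatSecondlyForever(30)) // poll every 30s
                .build();
        scheduler.scheduleJob(job, trigger);
        scheduler.start();
    }
}
```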
A suitable solution which will not require you to do a lot of design and programming is to create the object you will need later in the servlet, and serialize it to a byte array. Then put that in a BLOB field in the database and be done with it.
Then your processing thread can just read the contents, deserialize them and work with the resurrected object.
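A minimal sketch of that serialize-to-BLOB approach with plain JDBC (the jobs table, its columns and the class name are made up for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class JobBlobStore {

    /** Serializes the job object and inserts it as a new PENDING row. */
    public void save(Connection conn, Serializable job) throws SQLException, IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(job);
        }
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO jobs (payload, status) VALUES (?, 'PENDING')")) {
            ps.setBytes(1, bytes.toByteArray());
            ps.executeUpdate();
        }
    }

    /** Reads a row back and deserializes the stored object. */
    public Object load(Connection conn, long jobId)
            throws SQLException, IOException, ClassNotFoundException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT payload FROM jobs WHERE id = ?")) {
            ps.setLong(1, jobId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                try (ObjectInputStream in = new ObjectInputStream(
                        new ByteArrayInputStream(rs.getBytes("payload")))) {
                    return in.readObject(); // the resurrected object
                }
            }
        }
    }
}
```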
But, you may get better answers by describing what you need your system to actually DO :)