Spring microservices: Kafka event processing issue [closed] - java

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 months ago.
I'm new to microservices and need some suggestions on how to address the issue below.
I have two microservices: an order microservice and a delivery microservice. The order microservice pushes events to the delivery microservice through Kafka.
I need some assistance with the following:
How do I track a particular event in both microservices? This will help with logging and tracking event changes, etc.
Could I generate a random number and add it to the payload to be used for tracking?
Let's assume that the order microservice has 10 orders, all ten have been processed, and 10 events are generated and pushed to Kafka. If there is a failure, how should I handle it?
I thought of creating an error queue for the order service: if there are any errors while processing events on the order microservice side, they can be pushed there. The user can correct the issue and retry only those events.
At the delivery microservice, how do I ensure that a particular event is not processed more than once?
Also, if there are any errors while processing events on the delivery microservice side, how should they be handled?
The same as the second point?
Are there any more scenarios I need to consider?

How do I track a particular event in both microservices? This will help in logging and tracking the event changes etc.
For that, what is usually used in production environments is a traceId: a unique identifier generated once (for example when the order is created) and carried in every event and log line, so the same request can be followed across both services.
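A minimal sketch of the idea (class and method names here are illustrative, not from any framework): the producer generates the traceId once and ships it inside the payload, so the consumer logs the same id instead of generating its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical event builder: the traceId is generated once by the order
// service and travels with the payload, so the delivery service can log
// the same id for correlation.
public class TraceIdExample {
    public static Map<String, String> newOrderEvent(String orderId) {
        Map<String, String> event = new HashMap<>();
        event.put("orderId", orderId);
        event.put("traceId", UUID.randomUUID().toString());
        return event;
    }
}
```

In a Spring setup the same effect is usually achieved with a tracing library that propagates the id automatically (e.g. via Kafka record headers) rather than hand-rolling it.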
Let us assume that the order microservice has 10 orders, all ten have been processed, and 10 events are generated and pushed to Kafka. If there is a failure, how should I handle it?
Generally what is implemented is a retry template (Spring Retry's RetryTemplate if you are using Spring Boot; otherwise something similar in whichever framework you are working with). It retries for a set number of times, absorbing any transient errors. If the error still persists, there are multiple possibilities: you could log appropriately and reprocess from that offset, or, in a less critical environment, you could save the event to a file and build an admin RESTful endpoint which takes the event and processes it the same way you process the Kafka events.
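The retry idea can be sketched in plain Java (Spring's RetryTemplate wraps the same loop with richer backoff and recovery policies; names here are illustrative):

```java
public class RetrySketch {
    public interface Task {
        void run() throws Exception;
    }

    // Retry up to maxAttempts times with a fixed backoff between attempts;
    // rethrow the last error once the attempts are exhausted.
    public static void retry(Task task, int maxAttempts, long backoffMillis) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                task.run();
                return; // success
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis); // wait out a transient failure
                }
            }
        }
        throw last;
    }
}
```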
At the delivery microservice, how do I ensure that a particular event is not processed more than once?
This depends on the logic you perform per Kafka message. If you are creating new data (inserting rows) in your DB, you can simply check for uniqueness on the primary key (in this case the order id or something similar) and refuse to add the duplicate, logging a warning. Alternatively, a common way is to overwrite the previously inserted data. Either way should be about as performant as processing only unique records, provided you leave the uniqueness check to the DB and properly handle the exception.
Another workaround for uniqueness is configuring the producer to send only unique messages (Kafka's idempotent producer setting addresses duplicates caused by producer retries).
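A minimal sketch of the duplicate check (in production the set would be the database's unique constraint on the order id; the in-memory set here is only to illustrate the flow):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentConsumer {
    // Stand-in for a unique constraint on order_id in the database.
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    // Does the work only the first time an orderId is seen;
    // duplicates are logged with a warning and skipped.
    public boolean process(String orderId) {
        if (!processed.add(orderId)) {
            System.out.println("WARN duplicate event for " + orderId + ", skipping");
            return false;
        }
        // ... perform the delivery-side work here ...
        return true;
    }
}
```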
Also, if there are any errors while processing the events on the delivery microservice side, how should this be handled?
See point 2.

Related

How to delay the execution of a timed task for two or more years

In a microservice architecture, suppose there is a business scenario where a user purchases something that will expire after two years, and the system needs to notify the user a little in advance.
In this case, how should we handle the situation so that users can be notified on time, even when there are many users to notify?
For example, using a delayed queue in a message broker will cause messages to pile up when there are many users; using a timed task, too many users will overload the server CPU.
Is there a good way to do this?
While "microservices" do not inherently mean "REST", they usually are RESTful, and in REST you shouldn't store in memory anything that needs to survive more than one request. Two years is an extreme case, but even if it were just 10 minutes, it should probably go to the DB.
Building up a queue for two years will be very impractical and likely to fail if the queue contents are not persisted somewhere. Since you mention purchases, I assume you have some sort of data store recording them, either SQL or NoSQL.
You can simply add purchase date/time column(s) to the table to make life easier. If your volumes are low enough for daily purchases, I would start with a date-based lookup only. You will need a scheduled execution of some service method, say at 6 a.m. every day, that looks up purchases close to expiry, i.e. 7 days before the 2-year mark (purchase_date = now - 723 days), and then sends a REST request somewhere, or publishes an event or JMS message with the order number and purchase_date as content, for each purchase order. This will then be picked up by an event/message listener somewhere and processed accordingly, i.e. a notification is sent to the customer. To avoid sending duplicate notifications, you should also persist the expiry notifications in a database and check that a notification has not already been sent for a purchase id before sending it again.
If you ever reach a situation where you are processing thousands of orders a day and don't want to publish a large number of events in one go, extend the functionality to filter by purchase timestamp and process chunks of purchases multiple times a day by changing the lookup condition.
This is just the general idea of such a requirement; you will have to fine-tune a lot of implementation details, such as what happens if your email server is down.
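The date arithmetic behind that daily lookup can be sketched as follows (class and method names are illustrative):

```java
import java.time.LocalDate;

public class ExpiryNotifier {
    // Purchases to notify today: those made exactly 2 years minus 7 days ago,
    // i.e. the 6 a.m. job queries WHERE purchase_date = lookupDate(today).
    public static LocalDate lookupDate(LocalDate today) {
        return today.minusYears(2).plusDays(7);
    }
}
```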
You can use a Quartz job configured to use persistent mode in the database (the JDBC JobStore) so that information is not lost; it is also suitable for clustered mode.
Quartz periodically checks the database for the nearest task (the interval is a configurable parameter); when the time comes, it processes the notification.
You can configure the thread pool size to avoid overload.
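A sketch of the relevant quartz.properties entries for a clustered JDBC JobStore (the data source name and thread count are placeholders to adapt to your setup):

```properties
# Persist jobs and triggers in the database instead of RAM
org.quartz.jobStore.class = org.quartz.impl.jdbcjobstore.JobStoreTX
org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.dataSource = myDS
# Allow several nodes to share the same store
org.quartz.jobStore.isClustered = true
org.quartz.scheduler.instanceId = AUTO
# Bound the number of concurrently firing jobs
org.quartz.threadPool.threadCount = 10
```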

Efficiently insert data into a database in Java [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I'm building an application with a database that will be used solely for logging. We log the incoming transaction ID and its start and end times; the application itself makes no other use of this database. Hence, I want to execute this insert query as efficiently as possible without affecting the application itself. My idea is to execute the whole database insert code in a separate thread, so the insert runs without interfering with the actual work. I would like to know whether there is any design pattern for this kind of scenario, or whether my thinking is on the right track.
Your thinking is right. Post your generated data from your main thread(s) into a thread-safe blocking queue, and have the logging thread loop: block waiting for a message to appear in the queue, send that message to the database, and repeat.
If there is a chance, however small, that your application may be generating messages faster than your logging thread can process them, then consider giving the queue a maximum capacity, so that the application gets blocked when trying to enqueue a message in the event that the maximum capacity is reached. This will incur a performance penalty, but at least it will be controlled, whereas allowing the queue to grow without a limit may lead to degraded performance in all sorts of other unexpected and nasty ways, and even to out-of-memory errors.
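A sketch of that design using only stdlib classes (the `written` list is a stand-in for the actual database INSERT; class and field names are illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class AsyncLogger {
    // Bounded queue: log() blocks once 10,000 entries are pending, giving
    // controlled backpressure instead of unbounded memory growth.
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
    public final List<String> written = Collections.synchronizedList(new ArrayList<>());
    private volatile boolean running = true;
    private final Thread worker;

    public AsyncLogger() {
        worker = new Thread(() -> {
            try {
                // Keep draining until shutdown is requested AND the queue is empty.
                while (running || !queue.isEmpty()) {
                    String msg = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (msg != null) {
                        written.add(msg); // stand-in for the real INSERT
                    }
                }
            } catch (InterruptedException ignored) {
            }
        });
        worker.start();
    }

    // Called from the main thread(s); blocks only if the queue is full.
    public void log(String message) throws InterruptedException {
        queue.put(message);
    }

    public void shutdown() throws InterruptedException {
        running = false;
        worker.join();
    }
}
```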
Be advised, however, that plain insert operations (with no cursors and no returned fields) are quite fast as they are, so the gains from using a separate thread might be negligible.
Try running a benchmark while doing your logging a) from a separate logging thread as per your plan, and b) from within your main thread, and see whether it makes any difference. (And post your results here if you can, they would be interesting for others to see.)
From my point of view, the best idea is a Java + RabbitMQ broker + background process architecture.
For example:
The Java process enqueues a JSON message in a RabbitMQ queue. This step can be done asynchronously through the ExecutorService class if you want a thread pool; it can also be done synchronously, given RabbitMQ's high enqueue speed.
The background process connects to the queue that contains the messages and starts consuming them. Its task is to read and interpret message by message and perform the database insert with each message's content.
This way, you will have two separate processes, and database operations won't affect the main process.

Executing millions of threads concurrently in Java [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have a requirement to handle millions of threads, and I know this is quite dependent on the hardware configuration and the JVM.
I have used executors for the task.
Call flow of my project:
user(mobile)----->Server(Telecom) ------>Application----->Server(Telecom)----->User
Code call flow:
A------>B---------->C
//Code snippet of A
public static final int maxPoolSize=100;
ExecutorService executorCU=Executors.newFixedThreadPool(maxPoolSize);
Runnable handleCalltask=new B(valans, sessionID, msisdn);
executorCU.execute(handleCalltask);
//Code snippet of B
public static final int maxPoolSize=10;
ExecutorService executor=Executors.newFixedThreadPool(maxPoolSize);
Runnable handleCalltask=new C(valans, sessionID, msisdn);
executor.execute(handleCalltask);
There is also a shared map, which I implemented as a ConcurrentHashMap, that gets loaded when the application starts.
Is my approach correct, and if not, can anybody suggest how I can achieve maximum threading in my web application?
I have tested with JMeter and its results are not at all encouraging.
Thanks.
Is my approach correct
IMO, no, it's definitely not the correct approach.
and if not can anybody suggest how I can achieve maximum threading in my web application.
Separate receiving messages from the client from processing the messages. That way, you can horizontally scale the two parts independently to meet your requirements without having millions of threads in a single JVM.
A few suggestions:
1) I'd make the web application as light as possible and submit any long-running tasks to some sort of backend processor.
Within the same JVM, you could use a ThreadPoolExecutor with an ArrayBlockingQueue.
If you wanted to submit the jobs to another JVM, you could use JMS with competing consumers or something like Apache Kafka.
Again the benefit here is that you can add more nodes to either the backend or frontend of the app as required.
2) If required, make your application server's thread pool larger.
For instance, with Tomcat you'd tweak the parameters described here: http://tomcat.apache.org/tomcat-7.0-doc/config/executor.html. Explaining how to correctly tune these parameters is more than I can describe here. Among other things, the values you select will depend on the average number of concurrent requests, the maximum number of concurrent requests, the time required to serve a single request, and the number of application servers in your pool.
3) You'll get the most scalability by reducing statefulness.
If a request can be dispatched to any front end consumer and then processed by any backend consumer, you can add more instances of either to scale. If one request depends on another, you'll need to synchronize the processing of requests across nodes, which reduces scalability. Design things to be stateless from the start if at all possible.
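The ThreadPoolExecutor-with-ArrayBlockingQueue suggestion in point 1 can be sketched as follows (pool and queue sizes are placeholders to tune for your load):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BackendPool {
    // 10 workers and a bounded queue of 100 pending jobs.  When the queue is
    // full, CallerRunsPolicy makes the submitting thread execute the task
    // itself, which throttles the frontend instead of dropping work.
    public static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(
                10, 10,                        // core and max pool size
                0L, TimeUnit.MILLISECONDS,     // keep-alive (unused for a fixed-size pool)
                new ArrayBlockingQueue<>(100),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```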
I have tested with JMeter and its results are not at all encouraging.
You need to profile your application to determine where the hot spots are. If you follow my recommendations above, you can easily add more horsepower where required.

Design for scalable periodic queue message batching

We currently have a distributed setup where we publish events to SQS, and an application with multiple hosts that drains messages from the queue, does some transformation on them, and transmits them to interested parties. I have a use case where the receiving endpoint has scalability concerns with the message volume, so we would like to batch these messages periodically (say every 15 minutes) in the application before sending them.
The incoming message rate is around 200 messages per second, and each message is no more than 10 KB. This system need not be real-time (though that would definitely be good to have), and order is not important (it's okay if a batch containing older messages gets sent first).
One approach I can think of is maintaining an embedded database within the application (on each host) that batches the events, with another thread that runs periodically and clears the data.
Another approach could be to create timestamped buckets in a distributed key-value store (S3, DynamoDB, etc.), where we write each message to the correct bucket based on its timestamp and periodically clear the buckets.
We can run into several issues here: since the messages can arrive out of order, a bucket might already have been cleared (which could be solved by having a default bucket), we would need to decide accurately when to clear a bucket, etc.
The way I see it, at least two components are required: one that does the batching into temporary storage, and another that clears it.
Any feedback on the above approaches would help. This also looks like a common problem; are there any existing solutions I can leverage?
Thanks

Generic QoS Message batching and compression in Java

We have a custom messaging system written in Java, and I want to implement a basic batching/compression feature: under heavy load it will aggregate a bunch of push responses into a single push response.
Essentially:
If we detect that 3 messages were sent in the past second, start batching responses and schedule a timer to fire in 5 seconds.
The timer aggregates all the message responses received in the next 5 seconds into a single message.
I'm sure this has been implemented before; I'm just looking for the best example of it in Java. I'm not looking for a full-blown messaging layer, just the basic detect-messages-per-second-and-schedule-a-task logic (obviously I could easily write this myself; I just want to compare it with any existing algorithms to make sure I'm not missing edge cases and that I've simplified the problem as much as possible).
Are there any good open source examples of building a basic QoS batching/throttling/compression implementations?
We are using a very similar mechanism for high load.
It works as you described it:
* Aggregate messages over a given interval
* Send a List instead of a single message after that.
* Start aggregating again.
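The aggregate/send/restart cycle above can be sketched with two methods (a ScheduledExecutorService would call flush() at the end of each 5-second window; names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class ResponseBatcher {
    private final List<String> buffer = new ArrayList<>();

    // Producers add responses here during the aggregation window.
    public synchronized void offer(String message) {
        buffer.add(message);
    }

    // Timer callback: ship the whole window as one batch and start over.
    public synchronized List<String> flush() {
        List<String> batch = new ArrayList<>(buffer);
        buffer.clear();
        return batch;
    }
}
```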
You should watch out for the following pitfalls:
* If you are using a transacted messaging system like JMS, you can get into trouble because your implementation will not be able to send inside the JMS transaction, so it will keep aggregating. Depending on the size of the data structure holding the messages, this can run out of space. If you have very long transactions sending many messages, this can pose a problem.
* Sending a message in such a way happens asynchronously, because a different thread sends the message; the thread calling the send() method only puts it in the data structure.
* Sticking with the JMS example, keep in mind that the way messages are consumed is also changed by this approach, because you will receive the list of messages from JMS as a single message. So once you commit that single JMS message, you have committed the entire list of messages. You should check whether this is a problem for your requirements.
