I have one server and multiple clients. Each client sends an alive packet to the server at some period. (At the moment, the server doesn't respond to alive packets.) The period may differ from device to device and is configurable at runtime, on both the server and the clients. I want to generate an alert when one or more clients fail to send the alive packet (one missed packet, or two in a row, etc.). This aliveness information is used by other parts of the application, so the quicker the notice, the better. I came up with some ideas but couldn't choose one.
Create a task that compares every client's last alive-packet timestamp with the current time and generates alerts. Run this task at some period, which should be smaller than the minimum client period.
This actually seems better to me; however, it checks some clients unnecessarily. (For example, if client periods range from 1 to 5 minutes, the task has to run at least every minute, so checking all clients with periods above 2 minutes every minute is redundant.) Also, if the minimum client period decreases, I have to decrease the task's period too.
Create a task for each client that compares that client's last alive-packet timestamp with the current time, then sleeps for that client's period.
This way, if the number of clients grows large, there will be dozens of tasks. Since they will sleep most of the time, I doubt this is more elegant.
Is there any idiom or pattern for this kind of situation? I think a watchdog-style implementation would suit well, but I haven't seen anything like it in Java.
Approach 2 is not very practical; writing 100 tasks for 100 clients doesn't scale.
Approach 1 can be optimized if you use the average client period instead of the minimum.
It depends on your needs.
Is it critical if an alert is generated a few seconds later (or earlier) than it should be?
If not, then it may be worth grouping clients with nearby heartbeat intervals and running the check against a group of clients rather than a single client. This decreases the number of tasks (100 -> 10) and increases the number of clients handled by a single task (1 -> 10).
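A minimal sketch of that grouping idea, assuming a hypothetical `Client` record and a 60-second bucket width (both are illustrative, not from the question) — each resulting bucket would then get one scheduled checker task instead of one task per client:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class HeartbeatGrouping {
    // Illustrative client shape: an id plus its configured heartbeat period.
    record Client(String id, int periodSeconds) {}

    // Round each client's period up to the nearest bucket boundary, so clients
    // with nearby intervals land in the same group.
    static Map<Integer, List<Client>> groupByInterval(List<Client> clients, int bucketSeconds) {
        return clients.stream().collect(Collectors.groupingBy(
                c -> ((c.periodSeconds() + bucketSeconds - 1) / bucketSeconds) * bucketSeconds));
    }

    public static void main(String[] args) {
        List<Client> clients = List.of(
                new Client("a", 60), new Client("b", 70),
                new Client("c", 240), new Client("d", 300));
        // One scheduled task per bucket instead of one per client.
        groupByInterval(clients, 60).forEach((bucket, members) ->
                System.out.println(bucket + "s bucket: " + members.size() + " client(s)"));
    }
}
```

The check for a bucket only needs to run about as often as that bucket's interval, so slow clients stop forcing a fast scan.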
The first approach is fine.
The only thing I would suggest is to create an independent service for this monitoring. If you run the task as a thread inside your server, it won't be as manageable: imagine your control thread breaks or is killed — how would you notice? So build an independent OS service, a separate Java program, that checks the last alive timestamps periodically.
That way you can easily modify and restart the service and read its logs separately. Depending on its importance, you might even build a "watchdog of the watchdog" service.
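The first approach could look roughly like the sketch below: a single scheduled task scans every client's last-heartbeat timestamp and flags the silent ones. All class, method, and field names are illustrative assumptions, and the "two missed periods" alert rule is just one possible policy:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class AliveChecker {
    private final Map<String, Long> lastSeenMillis = new ConcurrentHashMap<>();
    private final Map<String, Long> periodMillis = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Record an alive packet; the period is stored per client so it can
    // change at runtime, as the question requires.
    public void heartbeat(String clientId, long periodMs, long atMillis) {
        lastSeenMillis.put(clientId, atMillis);
        periodMillis.put(clientId, periodMs);
    }

    // Run the scan somewhat faster than the smallest client period.
    public void start(long checkEveryMs) {
        scheduler.scheduleAtFixedRate(
                () -> scan(System.currentTimeMillis())
                        .forEach(id -> System.out.println("ALERT: " + id + " is silent")),
                checkEveryMs, checkEveryMs, TimeUnit.MILLISECONDS);
    }

    // Tolerate one missed packet: alert only after two full periods of silence.
    public List<String> scan(long nowMillis) {
        List<String> silent = new ArrayList<>();
        lastSeenMillis.forEach((id, last) -> {
            if (nowMillis - last > 2 * periodMillis.get(id)) silent.add(id);
        });
        return silent;
    }
}
```

Run as a separate process, this is essentially the independent watchdog service described above.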
I have a typical Kafka consumer/producer app that polls for data all the time. Sometimes there might be no data for hours, but sometimes there could be thousands of messages per second. Because of this, the application is built to poll continuously, with a 500 ms timeout.
However, I've noticed that if the Kafka cluster goes down, the consumer client, once started, won't throw an exception; it simply times out after 500 ms and keeps returning empty ConsumerRecords<K,V>. So, as far as the application is concerned, there is no data to consume, when in reality the whole Kafka cluster could be unreachable, and the app itself has no idea.
I checked the docs, and I couldn't find a way to validate consumer health, other than perhaps closing the connection and re-subscribing to the topic every single time, but I really don't want to do that in a long-running application.
What's the best way to validate that the consumer is active and healthy while polling, ideally from the same thread/client object, so that the app can distinguish between no data and an unreachable kafka cluster situation?
I am sure this is not the best way to achieve what you are looking for.
But one simple way I implemented in my application is to maintain a counter, emptyRecordSetReceived, in the application. Whenever a poll operation returns an empty record set, I increment this counter.
This counter was emitted to Graphite at a periodic interval (say, every minute) via the application's metric registry.
Now, say you know the maximum time frame for which no message will be available for this application to consume; for example, 6 hours. Given that you are polling every 500 milliseconds, you know that if no message is received for 6 hours, the counter will have increased by:
2 polls per second * 60 seconds * 60 minutes * 6 hours = 43,200.
We placed an alerting check on this counter value as reported to Graphite. This metric gave me a decent idea of whether it was a genuine problem in the application, or whether something was down on the broker or producer side.
This is just the naive way I solved this use case, to some extent. I would love to hear how it is actually done without maintaining such counters.
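The counter logic above can be isolated into a small helper that the poll loop feeds; this is a sketch, with the Kafka client and Graphite reporting left out (in the real app you would call `onPoll(records.count())` after each `poll()` and ship `count()` via the metric registry):

```java
import java.util.concurrent.atomic.AtomicLong;

public class EmptyPollTracker {
    static final long POLLS_PER_SECOND = 2;  // one poll every 500 ms
    // 2 polls/s * 60 s * 60 min * 6 h = 43,200 consecutive empty polls
    static final long ALERT_THRESHOLD = POLLS_PER_SECOND * 60 * 60 * 6;

    private final AtomicLong consecutiveEmptyPolls = new AtomicLong();

    // Call after every poll(): count empty results, reset as soon as data arrives.
    public void onPoll(int recordCount) {
        if (recordCount == 0) consecutiveEmptyPolls.incrementAndGet();
        else consecutiveEmptyPolls.set(0);
    }

    public boolean shouldAlert() {
        return consecutiveEmptyPolls.get() >= ALERT_THRESHOLD;
    }

    public long count() {
        return consecutiveEmptyPolls.get();
    }
}
```

Counting *consecutive* empty polls (resetting on data) distinguishes a genuinely quiet topic from a cluster that has silently become unreachable.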
When setting up timeouts for REST calls, we should specify both of these parameters, but I'm not sure why both are needed and exactly what different purposes they serve. Also, what happens if we set only one of them, or both with different values?
CONNECT_TIMEOUT is the amount of time the client will wait to establish the connection to the host. Once connected, READ_TIMEOUT is the amount of time allowed for the server to respond with all of the content of a given request.
How you set each one will depend on your requirements, but they can be different values. CONNECT_TIMEOUT should not require a large value, because it covers only the time needed to set up a socket connection with the server. 30 seconds should be ample time; frankly, if it doesn't complete within 10 seconds, it is taking too long, and the server is likely hosed, or at least overloaded.
READ_TIMEOUT could be longer, especially if you know that the action/resource you requested takes a long time to process. You might set this as high as 60 seconds, or even several minutes. Again, this depends on how critical it is that you wait for confirmation that the process completed, weighed against how quickly your system needs to respond on its end. If your client times out while waiting for the process to complete, that doesn't necessarily mean the process stopped; it may keep running until it finishes on the server (or at least until it reaches the server's own timeout).
If these calls are directly driving an interface, then you may want much lower times, as your users may not have the patience for such a delay. If it is called in a background or batch process, then longer times may be acceptable. This is up to you.
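With the standard `HttpURLConnection`, the two timeouts are set independently; a minimal sketch (the URL and the exact values are placeholders, and no network I/O happens until the connection is actually used):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutConfig {
    public static HttpURLConnection configure(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(10_000); // fail fast if the socket can't be opened
        conn.setReadTimeout(60_000);    // allow a slow server time to finish responding
        return conn;                    // connection is configured but not yet opened
    }
}
```

If either value is left at its default of 0, that phase waits indefinitely, which is why setting only one of them still leaves the call able to hang in the other phase.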
I've built a server application in Java that clients can connect to. I've implemented a heartbeat system where each client sends a small message every x seconds.
On the server side I save in a HashMap the time each client last sent a message, and I use a TimerTask per client to check every x seconds whether I received any message from that client.
Everything works fine for a small number of clients, but once the number of clients increases (2k+) the memory use becomes very large; in addition, the Timer has to deal with a lot of TimerTasks and the program starts to eat a lot of CPU.
Is there a better way to implement this? I thought about using a database and selecting the clients that haven't sent any update within a certain amount of time.
Do you think that would work better, or is there a better way of doing this?
A few random suggestions:
Instead of one timer per client, have a single global timer that examines the map of received heartbeats quite often (say, 10 times per second). Iterate over that map and find dead clients. Remember to keep the shared data structure thread-safe!
If you want to use a database, use a lightweight in-memory DB like H2. But that still sounds like overkill.
Use a cache or some other expiring map that notifies you whenever an entry is evicted. You basically put an entry in the map when a client sends a heartbeat; if nothing happens to that entry within the given amount of time, the map implementation removes it and calls a listener.
Use an actor-based system like Akka (it has a Java API). You can have one actor on the server side per client; that's much more efficient than one thread/timer each.
Use a different data structure, e.g. a queue. Every time you receive a heartbeat, remove the client from the queue and put it back at the end. Then periodically check only the head of the queue, which always holds the client with the oldest heartbeat.
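The last suggestion can be sketched with a `LinkedHashMap`, whose insertion order serves as the queue: a heartbeat moves the client to the tail, and the expiry check walks only the stale head entries. Names and the timeout handling here are illustrative assumptions:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HeartbeatQueue {
    // Insertion-ordered: the head is always the client with the oldest heartbeat.
    private final LinkedHashMap<String, Long> order = new LinkedHashMap<>();

    public synchronized void heartbeat(String clientId, long nowMillis) {
        order.remove(clientId);          // pull the client out of its old position...
        order.put(clientId, nowMillis);  // ...and re-insert it at the tail, fresh
    }

    // Inspect only the head; stop at the first client that is still alive,
    // since everything behind it is newer.
    public synchronized List<String> expired(long nowMillis, long timeoutMillis) {
        List<String> dead = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = order.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> head = it.next();
            if (nowMillis - head.getValue() < timeoutMillis) break;
            dead.add(head.getKey());
            it.remove();
        }
        return dead;
    }
}
```

The periodic check is then O(number of dead clients) rather than O(number of all clients), which is what makes this shape attractive at 2k+ connections.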
I intend to build a service where people can submit tasks (specifically transcoding tasks) to the system. Tasks should be serviced soon, but at the same time no one else should be starved; i.e., it must be fair. If one person submits 2,000 tasks, the system should not cater only to him the whole time, but instead round-robin (or something like that) among other people's requests.
Are there any solutions available? I looked at RabbitMQ and other messaging systems, but they don't exactly address my problem. How are fair task queues implemented?
I would implement it like this:
Have a listener on an intake queue which, when a message arrives, checks the last time a task from that user was received: if the time is < 1 second, put it on queue 1; if < 10 seconds, on queue 2; if < 100 seconds, on queue 3; otherwise, on queue 4. You would then have listeners on the four queues processing the tasks.
Of course, you can change the number of queues and tune the times to match the best throughput. Ideally, you want your queues to be busy all the time.
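The routing rule above reduces to picking a queue index from how recently the same user last submitted; a sketch with the answer's literal thresholds (the class name is hypothetical, and how the four queues are then serviced — e.g. round-robin across them — is up to the consumers):

```java
import java.util.HashMap;
import java.util.Map;

public class FairRouter {
    private final Map<String, Long> lastSubmitMillis = new HashMap<>();

    // Returns the queue number 1..4, mirroring the rule above: the shorter the
    // gap since the user's previous task, the lower the queue number.
    public int route(String userId, long nowMillis) {
        Long last = lastSubmitMillis.put(userId, nowMillis);
        if (last == null) return 4;        // first task from this user
        long gap = nowMillis - last;
        if (gap < 1_000) return 1;
        if (gap < 10_000) return 2;
        if (gap < 100_000) return 3;
        return 4;
    }
}
```

With consumers pulling from all four queues, a user flooding the system lands entirely on one queue and can only consume that queue's share of capacity, while occasional submitters keep flowing through the others.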
I don't think this behavior exists natively but I could see it being implemented with some of RabbitMQ's features.
http://www.rabbitmq.com/blog/2010/08/03/well-ill-let-you-go-basicreject-in-rabbitmq/
That would let you reject messages and requeue them. You would then have to write a utility that chooses to execute or requeue messages based on some identifying property of the message (in this case, the requester, which is custom to your app). Conceivably you could design the policy entirely around the routing key if it contains the ID of the user you are trying to throttle.
Your policy could be structured by responding with basic.reject using {requeue=true}.
Hopefully this helps!
Is it possible to limit the number of JMS receiver instances to a single instance, i.e. process only a single message from a queue at any one time?
The reason I ask is because I have a fairly intensive render type process to run for each message (potentially many thousands). I'd like to limit the execution of this code to a single instance at a time.
My application server is JBoss AS 6.0
You can configure the queue listener pool to have a single thread, so no more than one listener is handling requests, but this makes no sense to me.
The right answer is to tune the size of the thread pool to balance performance with memory requirements.
Many thousands? Per second, per minute, per hour? The rate at which they arrive, and the time each task takes, are both crucial. How much time, memory, CPU per request? Make sure you configure your queue to handle what could be a rather large backlog.
UPDATE: If ten messages arrive per second, and it takes 10 seconds for a single listener to process a message, then you'll need about 100 listener threads to keep up: while the first listener works through its 10-second task, another 100 messages arrive, and at steady state the required concurrency is the arrival rate times the service time (10 messages/second * 10 seconds = 100). If you need 1 MB of RAM per listener, you'll need roughly 100 MB of RAM just to process all the messages on one server. You'll need a similar estimate for CPU as well.
It might be wise to think about multiple queues on multiple servers and load balancing between them if one server isn't sufficient.