SimGrid. Asynchronous communications and failing links

The simulation has one master and seven workers. When a worker finishes executing its data, it dsends a task to the master to report that execution is complete:
getHost().setProperty("busy", "no");
ReleaseTask releaseTask = new ReleaseTask(getHost().getName());
releaseTask.dsend("Master");
The link that connects worker1 and the master is set to fail. This is the link1.fail file:
PERIODICITY 2
0 1
1 0
I expected that only one releaseTask (the one from worker1) would fail to reach the master. Unfortunately, none of the releaseTasks from the other workers reach the master either. This error appears:
[13.059397] /builds/workspace/SimGrid-Multi/build_mode/Debug/node/simgrid-ubuntu-trusty-64/build/SimGrid-3.13/src/simix/smx_global.cpp:554: [simix_kernel/CRITICAL] Oops ! Deadlock or code not perfectly clean.
[13.059397] [simix_kernel/INFO] 16 processes are still running, waiting for something.
The master receives tasks this way:
Task listenTask = Task.receive("Master");
When the link that connects worker1 and the master isn't broken, the whole simulation works fine.
How can I avoid this problem?
UPDATED
My platform.xml file:
<link id="0_11" state_file="linkfailures/0_11.fail" bandwidth="3.430125Bps" latency="4.669142ms"/>
0_11.fail file:
PERIODICITY 2
0 1
1 0
The worker starts to dsend a MessageTask to the master at 6.94 s. The MessageTask transmission time is 0.07 s. But at 7.00 s the link that connects the master and the worker fails. I guess the master keeps "receiving" the data forever and the error occurs. But how do I handle it?

If you send your data with dsend, it only means that you don't care whether the receiver gets it or whether an error occurs. It does not make the communication more robust (nor less robust either).
You updated your question, giving two possible outcomes of your simulation. Sometimes you say that no communication makes it to the master and that the simulation ends when SimGrid reports a deadlock (16 processes are still running, waiting for something), and sometimes you report that a TransferFailureException is occurring. But actually, that's exactly what is expected in your case, if I'm right.
Here is what happens:
You send a message with dsend.
The message gets lost because the link fails. Nope, it does not take forever to deliver because the link fails, it just disappears immediately.
At this point there are two possible outcomes, depending on whether the link fails before or after the communication starts (before or after the receiver posts its recv).
If the link fails before the time where the receiver (the master in your case, it seems) posts its recv request, then the failure will not be noticed. Indeed, there is no receiver yet to inform and the sender said that it does not care about the communication outcome, by using a dsend.
If the link fails after the time where the receiver posts its request, then the sender does not notice anything (because of the dsend), and the receiver gets a TransferFailureException on its receive action. So the failing communication is killing someone even if you sent it with dsend, but actually it's the master who dies. That is why the other workers cannot communicate with the master: it got an uncaught exception while receiving something from the fishy host.
If you want the sender to notice that your message did not go through (to resend it maybe), then you don't want to use dsend but isend (for an asynchronous communication) or send (for a blocking communication). And the sender has to pay attention to the status of the communication.
If you want your message to be really delayed but not destroyed, then try changing the bandwidth of the link to 0 for a while (using availability_file instead of state_file).
If you want your receiver to survive such communication issue, just catch the exception it gets.
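A minimal sketch of that last point, written without SimGrid so it runs on its own. FakeMailbox and the local TransferFailureException are hypothetical stand-ins, not the SimGrid classes; in the real master the try/catch would wrap Task.receive("Master") and catch whatever exceptions that call declares. The point is only the shape of the loop: the failed transfer is caught, logged and skipped, and the master keeps listening for the other workers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class SurvivingReceiver {
    /** Hypothetical local stand-in for the exception the receiver gets when a link fails. */
    static class TransferFailureException extends Exception {
        TransferFailureException(String message) { super(message); }
    }

    /** Hypothetical mailbox: each entry is either a payload or a marker for a failed transfer. */
    static class FakeMailbox {
        private final Queue<String> incoming = new ArrayDeque<>();

        void deliver(String payload) { incoming.add(payload); }

        void deliverFailure(String sender) { incoming.add("FAIL:" + sender); }

        boolean isEmpty() { return incoming.isEmpty(); }

        String receive() throws TransferFailureException {
            String next = incoming.remove();
            if (next.startsWith("FAIL:")) {
                throw new TransferFailureException("link failed while receiving from " + next.substring(5));
            }
            return next;
        }
    }

    /** Master loop: a failed transfer is logged and skipped instead of killing the master. */
    static List<String> runMaster(FakeMailbox mailbox) {
        List<String> received = new ArrayList<>();
        while (!mailbox.isEmpty()) {
            try {
                received.add(mailbox.receive());
            } catch (TransferFailureException e) {
                System.out.println("Master: " + e.getMessage() + ", still listening");
            }
        }
        return received;
    }

    public static void main(String[] args) {
        FakeMailbox mailbox = new FakeMailbox();
        mailbox.deliverFailure("worker1");          // the transfer over the broken link
        mailbox.deliver("release from worker2");
        mailbox.deliver("release from worker3");
        System.out.println(runMaster(mailbox));
    }
}
```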

Related

Queueing a message in JMS, for delayed processing

I have a piece of middleware that sits between two JMS queues. From one it reads, processes some data into the database, and writes to the other.
Here is a small diagram to depict the design:
With that in mind, I have some interesting logic that I would like to integrate into the service.
Scenario 1: Say the middleware service receives a message from Queue 1, and hits the database to store portions of that message. If all goes well, it constructs a new message with some data, and writes it to Queue 2.
Scenario 2: Say that the database complains about something when the service attempts to perform some logic after getting a message from Queue 1. In this case, instead of writing a message to Queue 2, I would retry the database operation with increasing timeouts, i.e. try again in 5 seconds, then 30 seconds, then 1 minute if it is still down. The catch, of course, is to be able to read other messages independently of this retry, i.e. retry this one request while listening for other requests.
With that in mind, what is both the correct and most modern way to construct a future proof solution?
After reading some posts on the net, it seems that I have several options.
One, I could spin off a new thread once a new message is received, so that I can both perform the "re-try" functionality and listen to new requests.
Two, I could possibly send the message back to the Queue with a delay, i.e. if the process failed to execute in the db, write the message to the JMS queue with some amount of delay added to it.
I am more fond of the first solution. However, I wanted to get the opinion of the community on whether there is a newer/better way to solve this in Java 7. Is there something built into JMS to support this sort of "send message back for reprocessing at a specific time"?

JMS 2.0 specification describes the concept of delayed delivery of messages. See "What's new" section of https://java.net/projects/jms-spec/pages/JMS20FinalRelease
Many JMS providers have implemented the delayed delivery feature.
But I wonder how delayed delivery will help in your scenario. Since the database writes have issues, processing subsequent messages and attempting to write them to the database might end up in the same situation. I guess it might be better to sort out the issues with the database updates and then pick up messages from the queue.
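For the first option in the question (retrying off the listener thread), here is a minimal sketch using a ScheduledExecutorService rather than a hand-rolled thread. It is not JMS-specific: BackoffRetry and its methods are hypothetical names, the BooleanSupplier stands in for the database write, and the delays are passed in milliseconds so the demo finishes quickly (the question's 5 s, 30 s and 60 s would be 5000, 30000 and 60000). It uses lambdas, so it assumes Java 8 or later; on Java 7 the same structure works with anonymous Runnables.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

class BackoffRetry {
    private final ScheduledExecutorService scheduler;
    private final long[] delaysMillis;

    BackoffRetry(ScheduledExecutorService scheduler, long... delaysMillis) {
        this.scheduler = scheduler;
        this.delaysMillis = delaysMillis.clone();
    }

    /**
     * Hands the attempt to the scheduler and returns immediately, so the caller
     * (the message listener) can go back to reading other messages.
     * The future completes with true on success, false once all retries are used up.
     */
    CompletableFuture<Boolean> submit(BooleanSupplier attempt) {
        CompletableFuture<Boolean> result = new CompletableFuture<>();
        scheduler.execute(() -> tryOnce(attempt, 0, result));
        return result;
    }

    private void tryOnce(BooleanSupplier attempt, int failuresSoFar, CompletableFuture<Boolean> result) {
        boolean ok;
        try {
            ok = attempt.getAsBoolean();
        } catch (RuntimeException e) {
            ok = false; // treat an exception from the database call as a failed attempt
        }
        if (ok) {
            result.complete(true);
        } else if (failuresSoFar < delaysMillis.length) {
            // Wait for the next delay in the list, then try again.
            scheduler.schedule(() -> tryOnce(attempt, failuresSoFar + 1, result),
                    delaysMillis[failuresSoFar], TimeUnit.MILLISECONDS);
        } else {
            result.complete(false); // give up; the caller decides what to do with the message
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        BackoffRetry retry = new BackoffRetry(scheduler, 50, 300, 600);
        AtomicInteger calls = new AtomicInteger();
        // Simulated database write that fails twice and then succeeds.
        CompletableFuture<Boolean> stored = retry.submit(() -> calls.incrementAndGet() >= 3);
        System.out.println("listener is free while retries are pending");
        System.out.println("stored=" + stored.join() + " after " + calls.get() + " attempts");
        scheduler.shutdown();
    }
}
```

Note that this keeps the message only in memory while it waits. If the process dies between attempts the message is lost unless the JMS acknowledgement is held back until the future completes.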

Handling Failed calls on the Consumer end (in a Producer/Consumer Model)

Let me try explaining the situation:
There is a messaging system that we are going to incorporate which could either be a Queue or Topic (JMS terms).
1 ) Producer/Publisher : There is a service A. A produces messages and writes to a Queue/Topic
2 ) Consumer/Subscriber : There is a service B. B asynchronously reads messages from the Queue/Topic. B then calls a web service and passes the message to it. The web service takes a significant amount of time to process the message. (This action need not be processed in real time.)
The Message Broker is Tibco
My intention is : Not to miss out processing any message from A. Re-process it at a later point in time in case the processing failed for the first time (perhaps as a batch).
Question:
I was thinking of writing the message to a DB before making a webservice call. If the call succeeds, I would mark the message processed. Otherwise failed. Later, in a cron job, I would process all the requests that had initially failed.
Is writing to a DB a typical way of doing this?

Since you have a fail callback, you can just requeue your Message and have your Consumer/Subscriber pick it up and try again. If it failed because of some problem in the web service and you want to wait X time before trying again, then you can either schedule the web service call for that specific Message at a later time (look into ScheduledExecutorService) or do as you described and use a cron job with some database entries.
If you only want it to try again once per message, then keep an internal counter either with the Message or within a Map<Message, Integer> as a counter for each Message.
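A minimal sketch of the counter idea, using an in-memory Deque in place of the real queue and a Predicate in place of the web service call. RequeueWithCounter, drain and the message strings are hypothetical, and messages are assumed to have unique identifiers so they can serve as map keys.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class RequeueWithCounter {
    /**
     * Processes the queue until it is empty. A message whose call fails is put back
     * at the end of the queue until it has been tried maxAttempts times; after that
     * it is returned in the "failed for good" list (e.g. for a later batch job).
     */
    static List<String> drain(Deque<String> queue, Predicate<String> callWebService, int maxAttempts) {
        Map<String, Integer> attempts = new HashMap<>();
        List<String> failedForGood = new ArrayList<>();
        while (!queue.isEmpty()) {
            String message = queue.pollFirst();
            int attempt = attempts.merge(message, 1, Integer::sum); // count this try
            if (callWebService.test(message)) {
                attempts.remove(message);       // done, forget the counter
            } else if (attempt < maxAttempts) {
                queue.addLast(message);         // requeue behind the other messages
            } else {
                attempts.remove(message);
                failedForGood.add(message);     // stop retrying
            }
        }
        return failedForGood;
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>();
        queue.add("order-1");
        queue.add("order-2");
        queue.add("order-3");
        // Simulated web service that always rejects order-2.
        List<String> failed = drain(queue, message -> !message.equals("order-2"), 2);
        System.out.println(failed); // prints [order-2]
    }
}
```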

Crudely put that is the technique, although there could be out-of-the-box solutions available which you can use. Typical ESB solutions support reliable messaging. Have a look at MuleESB or Apache ActiveMQ as well.

It might be interesting to take advantage of the EMS platform you already have (example 1) instead of building a custom solution (example 2).
But it all depends on the implementation language:
Example 1 - EMS is the "keeper" : If I were to solve such problem with TIBCO BusinessWorks, I would use the "JMS transaction" feature of BW. By encompassing the EMS read and the WS call within the same "group", you ask for them to be both applied, or not at all. If the call failed for some reason, the message would be returned to EMS.
Two problems with this solution : You might not have BW, and the first failed operation would block all the rest of the batch process (that may be the desired behavior).
FYI, I understand it is possible to use such feature in "pure java", but I never tried it : http://www.javaworld.com/javaworld/jw-02-2002/jw-0315-jms.html
Example 2 - A DB is the "keeper" : If you go with your "DB" method, your queue/topic consumer continuously inserts data into a DB, and each record represents a task to be executed. This feels an awful lot like the simple "mapping engine" problem every integration middleware aims to make easier. You could solve this with anything from custom Java code and multiple threads (DB inserter, WS job handlers, etc.) to an EAI middleware (like BW) or even a BPM engine (TIBCO has many solutions for that).
Of course, there are also other vendors... EMS is a JMS standard implementation, as you know.

I would recommend using the built-in EMS (& JMS) features, as "guaranteed delivery" is what it's built for ;) - no db needed at all...
You need to be aware that the first decision will be:
do you need to deliver in order? (then only 1 JMS Session and Client Ack mode should be used)
how often and at what recurring intervals do you want to retry? (So that a message that couldn't be processed by that web service doesn't loop forever.)
This is independent whatever kind of client you use (TIBCO BW or e.g. Java onMessage() in a MDB).
For "in order" delivery: make sure only 1 JMS Session processes the messages and that it uses Client acknowledge mode. After you process the message successfully, you need to acknowledge it, either by calling the JMS API "acknowledge()" method or, in TIBCO BW, by executing the "commit" activity.
In case of an error you don't execute the acknowledge for the message, so the message will be put back in the Queue for redelivery (you can see how many times it was redelivered in the JMS header).
EMS's Explicit Client Acknowledge mode also enables you to do the same if order is not important and you need a few client threads to process the messages.
For controlling how often the message gets processed, use:
the max redelivery properties of the EMS queue (e.g. you could put the message in the dead letter queue after x redeliveries so that it doesn't hold up other messages)
the redelivery delay, to put a "pause" in between redeliveries. This is useful in case the Web Service needs to recover after a crash and should not get stormed by the same message again and again at short intervals through redelivery.
Hope that helps
Cheers
Seb

ConcurrentLinkedQueue doesn't work as expected

I'm developing an Android app that is made of a Service running in the background and some Activities connected to that Service. The Service runs in its own process.
My Service mainly has 3 classes: ServiceMain, ServiceWorker,Message.
ServiceMain has all the functions that are used by the Activities like logIn,logOut,send ... and so on.
Message represents a message that is sent to our server or received from it. It is simply a String and a bool, where the String is the message and the bool is a flag saying whether a response from the server is needed.
ServiceWorker is a subclass of Thread and does all the sending and receiving of messages using Sockets.
ServiceMain contains 2 Queues:
Queue<Message> Sendingqueue= new ConcurrentLinkedQueue<Message>();
Queue<Message> Recievequeue = new ConcurrentLinkedQueue<Message>();
If the logIn method is called, a ServiceWorker is created and started. In its constructor it gets references to both queues and holds them.
private final Queue<Message> Sendingqueue;
private final Queue<Message> Recievequeue;
ServiceMain then creates some messages (M1,M2 for example) and adds them to the Sendingqueue.
The ServiceWorker builds the connection to our server and then runs a loop where it looks for messages in Sendingqueue, sends them and does some other stuff like receiving.
Hope the scenario is clear now.
Within ServiceWorker something strange happens on Sendingqueue:
Let's say ServiceMain added two messages, M1 and M2 to Sendingqueue while ServiceWorker is doing something time consuming or is not connected to our server.
Sendingqueue now contains two messages.
If the ServiceWorker next time gets the length of the Queue it sees 2 items. Ok so far.
Then it calls peek() on the Sendingqueue (the message is removed only if it was sent successfully) and should get M1 because it was added first.
But it gets M2.
The Sendingqueue seems to be reversed.
What's going wrong here ? What can I do to avoid this?
Thanks for any constructive reply.
Detlef

ConcurrentLinkedQueue orders its elements FIFO, so the order of elements shouldn't change: if you are adding to the end and taking from the start, this should work. You could run into a problem if you add and remove from the same end, as this will mean you are processing the newest rather than the oldest each time.
If you had a large powerful server, I would still suggest this approach is overkill. Instead of having a background thread to perform the processing, I would use the main thread.
Note: The socket is already an input and output queue on the client and on the server, so adding a third layer of queuing may be redundant in a large system and inefficient in a smaller device.
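A small check of the ordering described in the question, with plain Strings standing in for the Message class (FifoCheck and sendAll are hypothetical names). One thread adds M1 then M2, and the consumer uses the same peek-then-poll pattern as the ServiceWorker. It only demonstrates the queue's FIFO behavior; it does not diagnose what reorders the messages in the real app.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class FifoCheck {
    /** Peeks at the head, "sends" it, and removes it only after the send succeeded. */
    static List<String> sendAll(Queue<String> sendingQueue) {
        List<String> sent = new ArrayList<>();
        String head;
        while ((head = sendingQueue.peek()) != null) {
            sent.add(head);          // stands in for a successful socket write
            sendingQueue.poll();     // remove the message we just sent
        }
        return sent;
    }

    public static void main(String[] args) throws InterruptedException {
        Queue<String> sendingQueue = new ConcurrentLinkedQueue<>();
        // Producer thread, playing the role of ServiceMain.
        Thread serviceMain = new Thread(() -> {
            sendingQueue.add("M1");
            sendingQueue.add("M2");
        });
        serviceMain.start();
        serviceMain.join();
        System.out.println(sendingQueue.size() + " queued, head is " + sendingQueue.peek()); // 2 queued, head is M1
        System.out.println(sendAll(sendingQueue)); // [M1, M2]
    }
}
```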

Java Async Processing

I am currently developing a system that uses a lot of async processing. The transfer of information is done using Queues. So one process will put info in the Queue (and terminate) and another will pick it up and process it. My implementation leaves me facing a number of challenges and I am interested in what everyone's approach is to these problems (in terms of architecture as well as libraries).
Let me paint the picture. Let's say you have three processes:
Process A -----> Process B
                      |
Process C <-----------|
So Process A puts a message in a queue and ends, Process B picks up the message, processes it and puts it in a "return" queue. Process C picks up the message and processes it.
How does one handle Process B not listening or processing messages off the Queue? Is there some JMS type method that prevents a Producer from submitting a message when the Consumer is not active? So Process A will submit but throw an exception.
Let's say Process C has to get a reply within X minutes, but Process B has stopped (for any reason). Is there some mechanism that enforces a timeout on a Queue? So a guaranteed reply within X minutes, which would kick off Process C.
Can all of these matters be handled using a dead letter Queue of some sort? Should I maybe be doing this all manually with timers and checks? I have mentioned JMS but I am open to anything; in fact I am using Hazelcast for the Queues.
Please note this is more of an architectural question, in terms of available Java technologies and methods, and I do feel this is a proper question.
Any suggestions will be greatly appreciated.
Thanks

IMHO, the simplest solution is to use an ExecutorService, or a solution based on an executor service. This supports a queue of work and scheduled tasks (for timeouts).
It can also work in a single process. (I believe Hazelcast supports distributed ExecutorService)
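A minimal single-process sketch of the timeout part, assuming the work of Process B can be expressed as a Callable. TimedReply and awaitReply are hypothetical names, and the string results are placeholders for whatever Process C would do in each case. This uses the plain java.util.concurrent API, not Hazelcast's distributed executor.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedReply {
    /** Submits the work and waits at most timeoutMillis for its result. */
    static String awaitReply(ExecutorService pool, Callable<String> processB, long timeoutMillis) {
        Future<String> reply = pool.submit(processB);
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            reply.cancel(true);                   // interrupt the worker, it is too late
            return "TIMEOUT";
        } catch (ExecutionException e) {
            return "FAILED: " + e.getCause();     // the work itself threw an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt for the caller
            return "INTERRUPTED";
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        System.out.println(awaitReply(pool, () -> "done", 1000));
        System.out.println(awaitReply(pool, () -> {
            Thread.sleep(5000);                   // simulates a stalled Process B
            return "late";
        }, 100));
        pool.shutdownNow();
    }
}
```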

It seems to me that the type of questions you're asking are "smells" that queues and async processing may not be the best tools for your situation.
1) That defeats the purpose of a queue. Sounds like you need a synchronous request-response process.
2) Process C is not getting a reply generally speaking. It's getting a message from a queue. If there is a message in the queue and the Process C is ready then it will get it. Process C could decide that the message is stale once it gets it, for example.

I think your first question has already been answered adequately by the other posters.
On your second question, what you are trying to do may be possible depending on the messaging engine used by your application. I know this works with IBM MQ. I have seen this being done using the WebSphere MQ Classes for Java but not JMS. The way it works is that when Process A puts a message on a queue, it specifies the time it will wait for a response message. If Process A fails to receive a response message within the specified time, the system throws an appropriate exception.
I do not think there is a standard way in JMS to handle request/response timeouts the way you want so you may have to use platform specific classes like WebSphere MQ Classes for Java.

Well, kind of the point of queues is to keep things pretty isolated.
If you're not stuck on any particular tech, you could use a database for your queues.
But first, a simple mechanism to ensure two processes are coordinated is to use a socket. If practical, simply have process B open a socket listener on some well-known port, and have process A connect to that socket and monitor it. If process B ever goes away, process A can tell because the socket gets shut down, and it can use that as an alert of problems with process B.
For the B -> C problem, have a db table:
create table queue (
id integer,
payload varchar(100), -- or whatever you can use to indicate a payload
status varchar(1),
updated timestamp
)
Then, Process A puts its entry on the queue, with the current time and a status of "B". B listens on the queue:
select * from queue where status = 'B' order by updated
When B is done, it updates the queue to set the status to "C".
Meanwhile, "C" is polling the DB with:
select * from queue where status = 'C'
or (status = 'B' and updated < (now - threshold)) order by updated
(with the threshold being however long you want things to rot on the queue).
Finally, C updates the queue row to 'D' for done, or deletes it, or whatever you like.
The dark side is that there is a bit of a race condition here, where C might try to grab an entry while B is just starting up. You can probably get through that with a strict isolation level and some locking. Something as simple as:
select * from queue where status = 'C'
or (status = 'B' and updated < (now - threshold)) order by updated
FOR UPDATE
Also use FOR UPDATE for B's select. This way whoever wins the select race will get an exclusive lock on the row.
This will get you pretty far down the road in terms of actual functionality.

You are expecting the semantics of synchronous processing with an async (messaging) setup, which is not possible. I have worked on WebSphere MQ, and normally when the consumer dies, the messages are kept in the queue forever (unless you set the expiry). Once the queue reaches its maximum depth, the subsequent messages are moved to the dead letter queue.

I've used a similar approach to create a queuing and processing system for video transcoding jobs. Basically the way it worked was:
Process A posts a "schedule" message to Arbiter Q, which adds the job into its "waiting" queue.
Process B requests the next job from Arbiter Q, which removes the next item in its "waiting" queue (subject to some custom scheduling logic to ensure that a single user couldn't flood transcode requests and prevent other users from being able to transcode videos) and inserts it into its "processing" set before returning the job back to Process B. The job is timestamped when it goes into the "processing" set.
Process B completes the job and posts a "complete" message to Arbiter Q, which removes the job from the "processing" set and then modifies some state so that Process C knows the job completed.
Arbiter Q periodically inspects the jobs in its "processing" set, and times out any that have been running for an unusually long amount of time. Process A is then free to attempt to queue up the same job again, if it wants.
This was implemented using JMX (JMS would have been much more appropriate, but I digress). Process A was simply the servlet thread which responded to a user-initiated transcode request. Arbiter Q was an MBean singleton (persisted/replicated across all the nodes in a cluster of servers) that received "schedule" and "complete" messages. Its internally managed "queues" were simply List instances, and when a job completed it modified a value in the application's database to refer to the URL of the transcoded video file. Process B was the transcoding thread. Its job was simply to request a job, transcode it, and then report back when it finished. Over and over again until the end of time. Process C was another user/servlet thread. It would see that the URL was available, and present the download link to the user.
In such a case, if Process B were to die then the jobs would sit in the "waiting" queue forever. In practice, however, that never happened. If your Process B is not running/doing what it is supposed to do then I think that suggests a problem in your deployment/configuration/implementation of Process B more than it does a problem in your overall approach.
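The arbiter described above can be sketched in plain Java as follows. This is a simplified, single-JVM illustration, not the JMX/MBean implementation from the answer: the class and method names are hypothetical, jobs are plain Strings, the custom per-user scheduling logic is left out, and the current time is passed in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Arbiter {
    private final Deque<String> waiting = new ArrayDeque<>();
    private final Map<String, Long> processing = new LinkedHashMap<>(); // job -> start time
    private final Set<String> completed = new HashSet<>();
    private final long timeoutMillis;

    Arbiter(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    /** Process A: add a job to the waiting queue. */
    synchronized void schedule(String job) { waiting.addLast(job); }

    /** Process B: take the next waiting job and timestamp it; null if nothing is waiting. */
    synchronized String nextJob(long nowMillis) {
        String job = waiting.pollFirst();
        if (job != null) {
            processing.put(job, nowMillis);
        }
        return job;
    }

    /** Process B: report a finished job so that Process C can see it. */
    synchronized void complete(String job) {
        if (processing.remove(job) != null) {
            completed.add(job);
        }
    }

    /** Periodic check: drop jobs that have been processing for longer than the timeout. */
    synchronized List<String> timeOutStale(long nowMillis) {
        List<String> timedOut = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = processing.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                timedOut.add(entry.getKey());
                it.remove();
            }
        }
        return timedOut;
    }

    /** Process C: check whether a job is done. */
    synchronized boolean isCompleted(String job) { return completed.contains(job); }

    public static void main(String[] args) {
        Arbiter arbiter = new Arbiter(1000);
        arbiter.schedule("video-1");
        arbiter.schedule("video-2");
        String first = arbiter.nextJob(0);
        arbiter.nextJob(100);                              // video-2 starts but never completes
        arbiter.complete(first);
        System.out.println(arbiter.isCompleted("video-1")); // true
        System.out.println(arbiter.timeOutStale(1500));     // [video-2]
    }
}
```

A timed-out job is simply dropped here; as in the description, it is up to Process A to schedule it again.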

Greedy threads are grabbing too many JMS messages under WebLogic

We encountered a problem under WebLogic 8.1 that we lived with but could never fix. We often queue up a hundred or more JMS messages, each of which represents a unit of work. Despite the fact that each message is of the same size and looks the same, one may take only seconds to complete while the next one represents 20 minutes of solid crunching.
Our problem is that each of the message driven beans we have doing the work of these messages ends up on a thread that seems to grab ten messages at a time (we think it is being done as a WebLogic optimization to keep from having to hit the queue over and over again for small messages). Then, as one thread after another finishes all of its small jobs and no new ones come in, we end up with a single thread log jammed on a long running piece of work with up to nine other items sitting waiting on it to finish, despite the fact that other threads are free and could start on those units of work.
Now we are at a point where we are converting to WebLogic 10 so it is a natural point to return to this problem and find out if there is any solution that we could implement so that either: a) each thread only grabs one JMS message at a time to process and leaves all the others waiting in the incoming queue, or b) it would automatically redistribute waiting messages (even ones already assigned to a particular thread) out to free threads. Any ideas?

Enable the Forward Delay and provide an appropriate value. This will cause the JMS Queue to redistribute messages to its peers if they have not been processed in the configured time.
Taking a single message off the queue every time might be overkill - It's all a balance on the number of messages you are processing and what you gauge as an issue.
There are also multiple issues with JMS on WebLogic 10 depending on your setup. You can save yourself a lot of time and trouble by using the latest MP right from the start.

When a thread is in "starvation", it can execute only after it gets the resources it needs. Threads that are in starvation are called "greedy threads".
