What is the difference between the .poll() and .recv() methods in ZeroMQ? - java

In the ZeroMQ PUB/SUB tutorial, there is a .poll() method:
ZMQ.Poller items = new ZMQ.Poller (1);
while (!Thread.currentThread ().isInterrupted ()) {
    byte[] message;
    items.poll();
    if (items.pollin(0)) {
        message = subscriber.recv(0);
        System.out.println("received message:" + message);
    }
}
This method lets you check the status of a connection. But it seems that .poll() is just "another" .recv() without a timeout!?
Both of them block the thread/program until data is received.
Using .poll() just makes the program block at .poll() instead of at .recv()!?
What am I missing here?

ZeroMQ has introduced two, very different mechanics here:
.recv() method:
The .recv( <aSocketAccessPointINSTANCE>, <ZMQ_DONTWAIT_FLAG> ) method may, but need not, block while attempting to receive a message delivered to aSocketAccessPoint and hand it to user code for further processing.
As the syntax enforces, the coder has to specify the Access Point of the ZeroMQ Scalable Formal Communication Archetype from which the method ought to try to pick the next FIFO message or the next part of a multi-part message.
ZeroMQ ensures atomic delivery of messages: peers shall receive either all message parts of a message or none at all. The total number of message parts is unlimited except by available memory.
An application that processes multi-part messages must check the ZMQ_RCVMORE option via zmq_getsockopt(3) after calling zmq_recvmsg() to determine whether there are further parts to receive.
.poll() method is different ( while it may "shortcut" some logic ):
Before one can use the .poll() method, the .Poller() instance has to be set up first, i.e. told which Access Points and which events to watch.
ZeroMQ provides this as a mechanism for multiplexing many input/output events over a set of Access Points ( containing both ZeroMQ smart-sockets and, in more recent API versions, also standard, plain O/S-sockets ). This mechanism mirrors the standard sockets' poll() system call.
A Poller() instance can be instructed to .poll() one or several "local" Access Points to ask them about their internal state, and can receive { zero | one | more } answers, depending on the call setup, the actual state of the queried resources, and whether the timeout has run out before any specified event arrived on the "local" side of the listed Access Points.
The ZeroMQ original API defines for this:
int zmq_poll ( zmq_pollitem_t *items, int nitems, long timeout );
whereas the respective language bindings may re-wrap this API into some sort of higher-level helper method ( so that one does not have to manually declare how many records nitems are passed in *items, once the size of the MUX-events object is known at runtime, before the low-level API gets called ), so re-check the ZeroMQ binding documentation for the exact syntax exposed to user code.
As noted in the O/P, if .poll() is called with timeout == -1 and just a single Access Point in *items, .poll() will block indefinitely until a requested event has occurred on the specified zmq_pollitem_t set for { ZMQ_POLLIN }, so here the blocking of the user code effectively mirrors what .recv() would do at that very place. Yet, the respective mechanics are very different.
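For a rough feel of the multiplexing idea without any ZeroMQ dependency, here is a sketch that uses Java NIO's Selector as a stand-in. It is an analogy only, not the ZeroMQ Poller, and the class and method names are made up for illustration. One select() call waits on two pipes at once, with a timeout, and reports how many of them are readable; a plain blocking read would only ever watch one of them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Analogy only: Java NIO Selector stands in for a poller over several sources.
final class PollVersusRead {
    // Waits up to timeoutMillis (must be > 0) on TWO pipes at once and
    // returns how many of them have data ready to read.
    static int readyCount(boolean writeToSecondPipe, long timeoutMillis) {
        try (Selector selector = Selector.open()) {
            Pipe first = Pipe.open();
            Pipe second = Pipe.open();
            try {
                // Channels must be non-blocking before they can be registered.
                first.source().configureBlocking(false);
                second.source().configureBlocking(false);
                first.source().register(selector, SelectionKey.OP_READ);
                second.source().register(selector, SelectionKey.OP_READ);
                if (writeToSecondPipe) {
                    second.sink().write(ByteBuffer.wrap(new byte[] {42}));
                }
                // Returns 0 if the timeout expires with nothing readable.
                return selector.select(timeoutMillis);
            } finally {
                first.source().close();
                first.sink().close();
                second.source().close();
                second.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With nothing written, the call comes back after the timeout with zero ready sources, which is the case a bare blocking read cannot express.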

recv() waits for messages from just 1 ZeroMQ socket, while poll() lets you wait for messages from many ZeroMQ sockets.
The Poller also lets you easily specify a timeout when waiting for messages.
Note that your code seems to be missing the required call to
items.register( subscriber, ZMQ.Poller.POLLIN );

Related

How to make ZMQ pub client socket buffer messages while sub server socket is down

Given 2 applications where
application A is using a publisher client to continuously stream data to application B, which has a sub server socket to accept that data, how can we configure the pub client socket in application A so that when B is unavailable (e.g. it is being redeployed or restarted) A buffers all the pending messages, and when B becomes available again the buffered messages go through and the socket catches up with the real-time stream?
In a nutshell, how do we make a PUB CLIENT socket buffer messages, up to some limit, while the SUB SERVER is unavailable?
The default behaviour for a PUB client is to drop messages in the mute state, but it would be great if we could change that to a limited-size buffer. Is that possible with zmq, or do I need to do it at the application level?
I've tried setting HWM and LINGER on my sockets, but if I'm not wrong they only matter in the slow-consumer case, where my publisher is connected to a subscriber, but the subscriber is so slow that the publisher starts to buffer messages (HWM will limit the number of those messages)...
I'm using jeromq since I'm targeting the JVM platform.
First of all, welcome to the world of Zen-of-Zero, where latency matters most
PROLOGUE :
ZeroMQ was designed by Pieter HINTJENS' team of ultimately experienced masters, Martin SUSTRIK to be named first. The design was professionally crafted so as to avoid any unnecessary latency. So asking about having a (limited) persistence? No, sir, not confirmed: the PUB/SUB Scalable Formal Communication Pattern Archetype will not have it built in, precisely because of the added problems and the decreased performance and scalability ( add-on latency, add-on processing, add-on memory-management ).
If one needs a (limited) persistence ( for absent remote SUB-side agents' connections ), feel free to implement it on the app-side, or design and implement a new ZMTP-compliant behaviour-pattern Archetype that extends the ZeroMQ framework, should such work reach a stable and publicly accepted state. But do not ask for the high-performance, latency-shaved standard PUB/SUB, polished to almost linear scalability ad astra, to be modified in this direction. It is definitely not the way to go.
Solution ?
The app-side may easily implement the added logic, using dual-pointer circular buffers, working as a sort of (app-side-managed) Persistence-PROXY in front of the PUB-sender.
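A minimal sketch of such an app-side circular buffer, in plain Java with no ZeroMQ calls. The class name BoundedMessageBuffer and its methods are hypothetical; it keeps the newest `capacity` messages, overwrites the oldest one when full, and is not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical app-side buffer placed in front of the PUB sender.
// Fixed-size circular buffer: head points at the oldest stored message,
// (head + size) % capacity is the next write position.
final class BoundedMessageBuffer {
    private final byte[][] slots;
    private int head = 0;
    private int size = 0;

    BoundedMessageBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        slots = new byte[capacity][];
    }

    // Stores a message; when the buffer is full the oldest message is overwritten.
    void offer(byte[] message) {
        int tail = (head + size) % slots.length;
        slots[tail] = message;
        if (size == slots.length) {
            head = (head + 1) % slots.length; // the oldest entry was just replaced
        } else {
            size++;
        }
    }

    // Removes and returns all buffered messages, oldest first.
    List<byte[]> drain() {
        List<byte[]> out = new ArrayList<>(size);
        while (size > 0) {
            out.add(slots[head]);
            slots[head] = null;
            head = (head + 1) % slots.length;
            size--;
        }
        return out;
    }

    int size() {
        return size;
    }
}
```

The application would call offer() while the subscriber is away and send everything returned by drain() once it is back; how "away" and "back" are detected is a separate question.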
Your design may squeeze some additional sauce from the ZeroMQ internals if it also uses the built-in ZeroMQ socket-monitor component to set up an additional control layer and receive there a stream of events as seen from "inside" the PUB-side Context-instance, where some additional network and connection-management events may shed more light on the state of your (app-side-managed) Persistence-PROXY.
Yet, be warned that
The _zmq_socket_monitor()_ method supports only connection-oriented
transports, that is, TCP, IPC, and TIPC.
so one may straight forget about this in case any of the ultimately interesting transport-classes was planned to be used { inproc:// | norm:// | pgm:// | epgm:// | vmci:// }
Heads up !
There are inaccurate, if not wrong, pieces of information from our Community honorable member smac89, who tried his best to address your additional interest expressed in the comment:
"...zmq optimizes publishing on topics? like if you keep publishing on some 100char long topic rapidly, is it actually sending the topic every time or it maps to some int and sends the int subsequently...?"
telling you:
"It will always publish the topic. When I use the pub-sub pattern, I usually publish the topic first and then the actual message, so in the subscriber I just read the first frame and ignore it and then read the actual message"
ZeroMQ does not work this way. There is no such thing as a "separate" <topic> followed by a <message-body>, but rather the opposite.
The TOPIC and the mechanisation of topic-filtering work in a very different way.
1) you never know who .connect()-s: i.e. one can be almost sure that versions 2.x through 4.2+ will handle the topic-filtering in different manners ( the ZMTP RFC defines an initial capability-version handshake, to let the Context-instance decide which version of topic-filtering has to be used ). Ver 2.x used to move all messages to all peers and let the SUB-side Context-instance do the local topic-list filter processing, whereas ver 4.2+ are sure to perform the topic-list filter processing on the PUB-side Context-instance ( CPU-usage grows, network-transport load does the opposite ), so your SUB-side will never be delivered a byte of "useless", read "not-subscribed-to", messages.
2) (you may, but) there is no need to separate a "topic" into the first frame of a thus-implied multi-frame message. Perhaps just the opposite ( it is rather an anti-pattern to do this in high-performance, low-latency distributed-system design ).
The topic-filtering process is defined and works byte-wise, from left to right, pattern-matching each topic-list member value against the delivered message payload.
Adding extra data and extra frame-management processing only increases the end-to-end latency and processing overhead. It is never a good idea to do this instead of proper distributed-system design work.
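The byte-wise, left-to-right matching rule can be illustrated in a few lines of plain Java. This is only a sketch of the rule (a subscription matches when it is a prefix of the message bytes), not ZeroMQ's actual implementation; TopicFilter and matches() are made-up names.

```java
import java.nio.charset.StandardCharsets;

// Illustration of prefix-based topic matching; not taken from ZeroMQ itself.
final class TopicFilter {
    // True when `subscription` is a byte-wise prefix of `message`.
    // An empty subscription matches every message.
    static boolean matches(byte[] subscription, byte[] message) {
        if (subscription.length > message.length) {
            return false;
        }
        for (int i = 0; i < subscription.length; i++) {
            if (subscription[i] != message[i]) {
                return false;
            }
        }
        return true;
    }

    // Convenience overload for text topics, compared as UTF-8 bytes.
    static boolean matches(String subscription, String message) {
        return matches(subscription.getBytes(StandardCharsets.UTF_8),
                       message.getBytes(StandardCharsets.UTF_8));
    }
}
```

Under this rule a subscription to "/some/feed" also matches a single-frame message such as "/some/feed/42 payload", which is why the topic does not need a frame of its own.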
EPILOGUE :
There are no easy wins nor any low-hanging fruit in professional distributed-systems design, the less if low-latency or ultra-low-latency are the design targets.
On the other hand, be sure that ZeroMQ framework was made with this in mind and these efforts were crowned with stable, ultimately performant well-balanced set of tools for smart (by design), fast (in operation) and scalable (as hell may envy) signaling/messaging services people love to use right because of this design wisdom.
Wish you live happy with ZeroMQ as it is and feel free to add any additional set of features "in front" of the ZeroMQ layer, inside your application suite of choice.
I'm posting a quick update since the other two answers (though very informative) were actually wrong, and I don't want others to be misinformed by my accepted answer. Not only can you do this with zmq, it is actually the default behaviour.
The trick is that if your publisher client has never connected to the subscriber server before, it keeps dropping messages (and that is why I thought it does not buffer messages), but if your publisher connects to the subscriber and you then restart the subscriber, the publisher will buffer messages until the HWM is reached, which is exactly what I asked for... So in short, the publisher wants to know there is someone on the other end accepting messages; only after that will it buffer messages...
Here is some sample code which demonstrates this (you might need to do some basic edits to compile it).
I used only this dependency: org.zeromq:jeromq:0.5.1.
zmq-publisher.kt
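import org.zeromq.SocketType
import org.zeromq.ZContext

// log(), json(), telegramMessage(), notInterrupted(), sleepInterruptible(),
// 1.second and Msg come from the author's project and are not shown here.
fun main() {
    val uri = "tcp://localhost:3006"
    val context = ZContext(1)
    val socket = context.createSocket(SocketType.PUB)
    socket.hwm = 10000
    socket.linger = 0
    "connecting to $uri".log()
    socket.connect(uri)  // the publisher is the connecting (client) side
    fun publish(path: String, msg: Msg) {
        ">> $path | ${msg.json()}".log()
        socket.sendMore(path)
        socket.send(msg.toByteArray())
    }
    var count = 0
    while (notInterrupted()) {
        val msg = telegramMessage("message : ${++count}")
        publish("/some/feed", msg)
        println()
        sleepInterruptible(1.second)
    }
}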
and of course zmq-subscriber.kt
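import org.zeromq.SocketType
import org.zeromq.ZContext

// log(), json() and Msg come from the author's project and are not shown here.
fun main() {
    val uri = "tcp://localhost:3006"
    val context = ZContext(1)
    val socket = context.createSocket(SocketType.SUB)
    socket.hwm = 10000
    socket.receiveTimeOut = 250
    "connecting to $uri".log()
    socket.bind(uri)  // the subscriber is the binding (server) side
    socket.subscribe("/some/feed")
    while (true) {
        val path = socket.recvStr() ?: continue  // null when the receive timeout expires
        val bytes = socket.recv()
        val msg = Msg.parseFrom(bytes)
        "<< $path | ${msg.json()}".log()
    }
}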
Try running publisher first without subscriber, then when you launch subscriber you missed all the messages so far... now without restarting publisher, stop subscriber wait for some time and start it again.
Here is an example of one of my services actually benefiting from this...
This is the structure [current service]sub:server <= pub:client[service being restarted]sub:server <=* pub:client[multiple publishers]
Because I restart the service in the middle, all the publishers start buffering their messages. The final service, which was observing ~200 messages per second, sees the rate drop to 0 (those 1 or 2 are heartbeats), then a sudden burst of 1000+ messages comes in, because all publishers flushed their buffers (the restart took about 5 seconds)... I am actually not losing a single message here...
Note that you must have a subscriber:server <= publisher:client pair (this way the publisher knows "there is only one place I need to deliver these messages to"). You can try binding on the publisher and connecting on the subscriber, but you will no longer see the publisher buffering messages, simply because it is questionable whether a subscriber that just disconnected did so because it no longer needs the data or because it failed.
As we've discussed in the comments there is no way for the publisher to buffer messages while having nothing connected to it, it will simply drop any new messages:
From the docs:
If a publisher has no connected subscribers, then it will simply drop all messages.
This means your buffer needs to be outside of zeromq's care. Your buffer could then be a list, or a database, or any other method of storage you choose, but you cannot use your publisher for doing that.
Now the next problem is dealing with how to detect that a subscriber has connected/disconnected. This is needed to tell us when we need to start reading from the buffer/filling the buffer.
I suggest using Socket.monitor and listening for ZMQ_EVENT_CONNECTED and ZMQ_EVENT_DISCONNECTED, as these will tell you when a client has connected/disconnected and thus enable you to switch to filling your buffer of choice. Of course, there might be other ways of doing this that do not directly involve zeromq, but that's up to you to decide.
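The switch between "send" and "fill the buffer" could be sketched like this, independent of ZeroMQ. BufferingPublisher and its methods are hypothetical names; the sender callback stands for whatever actually sends on the socket, and wiring onConnected()/onDisconnected() to the monitor events is assumed and left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical wrapper: sends directly while a peer is connected,
// otherwise keeps the newest `limit` messages for later.
final class BufferingPublisher {
    private final Deque<byte[]> pending = new ArrayDeque<>();
    private final Consumer<byte[]> sender; // assumption: e.g. bytes -> socket.send(bytes)
    private final int limit;
    private boolean connected = false;

    BufferingPublisher(Consumer<byte[]> sender, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.sender = sender;
        this.limit = limit;
    }

    synchronized void publish(byte[] message) {
        if (connected) {
            sender.accept(message);
            return;
        }
        if (pending.size() == limit) {
            pending.pollFirst(); // buffer full: drop the oldest message
        }
        pending.addLast(message);
    }

    // To be called when the monitor reports a connection: flush oldest first.
    synchronized void onConnected() {
        connected = true;
        byte[] next;
        while ((next = pending.pollFirst()) != null) {
            sender.accept(next);
        }
    }

    // To be called when the monitor reports a disconnection.
    synchronized void onDisconnected() {
        connected = false;
    }
}
```

Dropping the oldest message when the limit is hit is one possible policy; rejecting new messages instead would be an equally valid choice.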

Dilemma of setting a timeout in ZeroMQ

I currently use ZeroMQ with Java binding. My program is in a PUB/SUB mode.
It seems reasonable to set a timeout for the case where a client can't receive a message from the PUB side.
But for a publish server that sends messages without a fixed frequency, it's hard to decide on a reasonable timeout.
On the other hand, if no timeout is set, then the program would probably get stuck in the function:
recv()
forever, even if the publish server is dead.
Is there a good solution to fix this issue?
Yes, there is a good solution or two:
The principally best solution is to use a Poller instance, whose .poll() method tells your code, with the help of an explicitly configurable timeout in [ms], whether there is any incoming message. If there is, attempt a non-blocking .recv(); if the call to .poll() reports that no message is present, do not even try ( for details check the API ).
Another way is to use the non-blocking mode of the .recv( aSockINST, ZMQ_DONTWAIT ) method. Here the API and the wrapper / binding specify how they signal the state where no message was locally ready to be .recv()-ed, so one may rely on the language's usual syntax scaffolding, like { try: except: finally: } or { try: catch: } or { defer; panic(); recover() }, to handle either of the return states from the .recv( .., ZMQ_DONTWAIT ) call. Similar rules apply to an ( almost ) blocking call with some moderately small .recv() timeout.
You can use Pollers:
poller = zmq.Poller()
poller.register(client_receiver, zmq.POLLIN);
for further reading:
http://learning-0mq-with-pyzmq.readthedocs.io/en/latest/pyzmq/multisocket/zmqpoller.html
hope it helps.

Handling Failed calls on the Consumer end (in a Producer/Consumer Model)

Let me try explaining the situation:
There is a messaging system that we are going to incorporate which could either be a Queue or Topic (JMS terms).
1 ) Producer/Publisher : There is a service A. A produces messages and writes to a Queue/Topic
2 ) Consumer/Subscriber : There is a service B. B asynchronously reads messages from Queue/Topic. B then calls a web service and passes the message to it. The webservice takes significant amount of time to process the message. (This action need not be processed real-time.)
The Message Broker is Tibco
My intention is : Not to miss out processing any message from A. Re-process it at a later point in time in case the processing failed for the first time (perhaps as a batch).
Question:
I was thinking of writing the message to a DB before making a webservice call. If the call succeeds, I would mark the message processed. Otherwise failed. Later, in a cron job, I would process all the requests that had initially failed.
Is writing to a DB a typical way of doing this?
Since you have a fail callback, you can just requeue your Message and have your Consumer/Subscriber pick it up and try again. If it failed because of some problem in the web service and you want to wait X time before trying again, then you can either schedule the web service to be called at a later time for that specific Message (look into ScheduledExecutorService) or do as you described and use a cron job with some database entries.
If you only want it to try again once per message, then keep an internal counter either with the Message or within a Map<Message, Integer> as a counter for each Message.
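The per-message counter could look roughly like this. It is a sketch under assumptions: RetryTracker is a made-up name, messages are identified by a String id rather than by the Message object itself, and maxAttempts counts the first attempt too (so 2 means "retry once").

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts failed attempts per message id.
final class RetryTracker {
    private final Map<String, Integer> attempts = new HashMap<>();
    private final int maxAttempts;

    RetryTracker(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    // Records one failed attempt; returns true if the message may be requeued.
    boolean recordFailureAndCheckRetry(String messageId) {
        int count = attempts.merge(messageId, 1, Integer::sum);
        if (count >= maxAttempts) {
            attempts.remove(messageId); // give up; the caller decides what to do with it
            return false;
        }
        return true;
    }

    // Call on success so the map does not grow without bound.
    void recordSuccess(String messageId) {
        attempts.remove(messageId);
    }
}
```

In the fail callback one would requeue the message only when recordFailureAndCheckRetry() returns true, and otherwise park it somewhere for manual or batch handling.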
Crudely put that is the technique, although there could be out-of-the-box solutions available which you can use. Typical ESB solutions support reliable messaging. Have a look at MuleESB or Apache ActiveMQ as well.
It might be interesting to take advantage of the EMS platform you already have (example 1) instead of building a custom solution (example 2).
But it all depends on the implementation language:
Example 1 - EMS is the "keeper" : If I were to solve such problem with TIBCO BusinessWorks, I would use the "JMS transaction" feature of BW. By encompassing the EMS read and the WS call within the same "group", you ask for them to be both applied, or not at all. If the call failed for some reason, the message would be returned to EMS.
Two problems with this solution : You might not have BW, and the first failed operation would block all the rest of the batch process (that may be the desired behavior).
FYI, I understand it is possible to use such feature in "pure java", but I never tried it : http://www.javaworld.com/javaworld/jw-02-2002/jw-0315-jms.html
Example 2 - A DB is the "keeper" : If you go with your "DB" method, your queue/topic consumer continuously inserts data into a DB, and each record represents a task to be executed. This feels an awful lot like the simple "mapping engine" problem every integration middleware aims to make easier. You could solve this with anything from custom Java code and multiple threads (DB inserter, WS job handlers, etc.) to an EAI middleware (like BW) or even a BPM engine (TIBCO has many solutions for that).
Of course, there are also other vendors... EMS is a JMS standard implementation, as you know.
I would recommend using the built-in EMS (& JMS) features, as "guaranteed delivery" is what it's built for ;) - no DB needed at all...
You need to be aware that the first decision will be:
do you need to deliver in order? (then only 1 JMS Session and Client Ack mode should be used)
how often and at what recurring intervals do you want to retry? (So as not to create an infinite loop for a message that can't be processed by that web service.)
This is independent of the kind of client you use (TIBCO BW or e.g. Java onMessage() in an MDB).
For "in order" delivery: make sure only 1 JMS Session processes the messages and that it uses Client acknowledge mode. After you process the message successfully, you need to acknowledge it, either by calling the JMS API "acknowledge()" method or, in TIBCO BW, by executing the "commit" activity.
In case of an error you don't acknowledge the message, so it will be put back in the Queue for redelivery (you can see how many times it was redelivered in the JMS header).
EMS's Explicit Client Acknowledge mode also enables you to do the same if order is not important and you need a few client threads to process the messages.
To control how often the message gets processed, use:
max redelivery properties of the EMS queue (e.g. you could put the message in the dead letter queue after x redeliveries so as not to hold up other messages)
redelivery delay to put a "pause" in between redeliveries. This is useful in case the Web Service needs to recover after a crash and should not get stormed by the same message again and again at short intervals through redelivery.
Hope that helps
Cheers
Seb

RabbitMQ multi-threaded channels and queue binding

I have inherited some legacy RabbitMQ code that is giving me some serious headaches. Can anyone help, ideally pointing to some "official" documentation where I can browse for similar questions?
We create some channels to receive responses from workers which perform a search, using channels like so:
channelIn.queueDeclare("", false, false, true, null);
channelIn.queueBind("", AmqpClient.NAME_EXCHANGE,
AmqpClient.ROUTING_KEY_ROOT_INCOMING + uniqueId);
My understanding from browsing mailing lists and forums is that
declaring a queue with an empty name allows the server to auto-generate a unique name, and
queues must have a globally unique name.
Is this true?
Also, in the second line above, my understanding based on some liberal interpretation of blogs and mailing lists is that queueBind with an empty queue name automatically binds to the last created queue. It seems nice because then you wouldn't have to pull the auto-generated name out of the clunky DeclareOK object.
Is this true? If so, will this work in a multithreaded environment?
I.e. is it possible some channel will bind itself to another channel's queue, then if that other channel closes, the incorrectly bound channel would get an error trying to use the queue? (note that the queue was created with autodelete=true.) My testing leads me to think yes, but I'm not confident that's where the problem is.
I cannot be certain that this will work in a multithreaded environment. It may be fine a high percentage of the time but it is possible you will get the wrong queue. Why take the risk?
Wouldn't this be better and safer?
String queueName = channelIn.queueDeclare("", false, false, true, null).getQueue();
channelIn.queueBind(queueName, AmqpClient.NAME_EXCHANGE,
AmqpClient.ROUTING_KEY_ROOT_INCOMING + uniqueId);
Not exactly clunky.
Q: What happens when a queue is declared with no name?
A: The server picks a unique name for the queue. When no name is supplied, the RabbitMQ server will generate a unique-for-that-RabbitMQ-cluster name, create a queue with that name, and then transmit the name back to the client that called queue.declare. RabbitMQ does this in a thread-safe way internally (e.g. many clients calling queue.declare with blank names will never get the same name). Here is the documentation on this behavior.
Q: Do queue names need to be globally unique?
A: No, but they may need to be in your use case. Any number of publishers and subscribers can share a queue. Queue declarations are idempotent, so if 2 clients declare a queue with the same name and settings at the same time, or at different times, the server state will be the same as if just one declared it. Queues with blank names, however, will never collide. Consider declaring a queue with a blank name as if it were two operations: an RPC asking RabbitMQ "give me a globally unique name that you will reserve just for my use", and then idempotently declaring a queue with that name.
Q: Will queue.bind with a blank name bind to the last created queue in a multithreaded environment?
A: Yes, but you should not do that; it achieves nothing, is confusing, and has unspecified/poorly-specified behavior. This technique is largely pointless and prone to bugs in client code (What if lines got added between the declare and the bind? Then it would be very hard to determine which queue was being bound).
Instead, use the return value of queueDeclare; that return value will contain the name of the queue that was declared. If you declared a queue with no name, the return value of queueDeclare will contain the new globally-unique name provided by RabbitMQ. You can provide that explicitly to subsequent calls that work with that queue (like binding it).
For an additional reason not to do this, the documentation regarding blank-queue-name behavior is highly ambiguous:
The client MUST either specify a queue name or have previously
declared a queue on the same channel
What does that mean? If more than one queue was declared, which one will be bound? What if the previously-declared queue was then deleted on that same channel? This seems like a very good reason to be as explicit as possible and not rely on this behavior.
Q: Can queues get deleted "underneath" channels connected to them?
A: Yes, in specific circumstances. Minor clarification on your question's terminology: channels don't "bind" themselves to queues: a channel can consume a queue, though. Think of a channel like a network port and a queue like a remote peer: you don't bind a port to a remote peer, but you can talk to more than one peer through the same port. Consumers are the equivalent of connected sockets; not channels. Anyway:
Channels don't matter here, but consumers and connections do (you can have more than one consumer, even on the same queue, per channel; you can have more than one channel per connection). Here are the situations in which a queue can be deleted "underneath" a channel subscribing to it (I may have missed some, but these are all the non-disastrous conditions I know of, i.e. excluding things like "the server exploded"):
A queue was declared with exclusive set to true, and the connection on which the queue was declared closes. The channel used to declare the queue can be closed, but so long as the connection stays open the queue will keep existing. Clients connected to the exclusive queue will see it disappear. However, clients may not be able to access the exclusive queue for consumption in the first place if it is "locked" to its declarer--the documentation is not clear on what "used" means with regards to exclusive locking.
A queue which is manually deleted via a queue.delete call. In this case, all consumers connected to the queue may encounter an error the next time they try to use it.
Note that in many client situations, consumers are often "passive"
enough that they won't realize that a queue is gone; they'll just
listen forever on what is effectively a closed socket. Publishing to
a queue, or attempting to redeclare it with passive (existence
poll) is guaranteed to surface the nonexistence; consumption alone
is not: sometimes you will see a "this queue was deleted!" error,
sometimes it will take minutes or hours to arrive, sometimes you
will never see such an error if all you're doing is consuming.
Q: Will auto_delete queues get deleted "underneath" one consumer when another consumer exits?
A: No. auto_delete queues are deleted sometime after the last consumer leaves the queue. So if you start two consumers on an auto_delete queue, you can exit one without disturbing the other. Here's the documentation on that behavior.
Additionally, queues which expire (via per-queue TTL) follow the same behavior: the queue will only go away sometime after the last consumer leaves.

Pattern for processing communication protocol objects

I am using protobuf for implementing a communication protocol between a Java application and a native application written in C++. The messages are event driven: when an event occurs in the C++ application a protobuf message is constructed and sent.
message MyInterProcessMessage {
int32 id = 1;
message EventA { ... }
message EventB { ... }
...
}
In Java I receive on my socket an object of the class: MyInterProcessMessageProto. From this I can get my data very easily since they are encapsulated into each other: myMessage.getEventA().getName();
I am facing two problems:
How to delegate the processing of the received messages?
Because analysing the whole message and distinguishing the different event types and the actions they imply resulted in a huge, unmaintainable method with many if-cases.
I would like to find a pattern where I can preserve the messages and not only apply them but also undo them, the way the Command pattern is used for this.
My first approach would be: create different wrapper classes for each event with a specified apply() and undo() method and delegate the job this way.
However, I am not sure whether this is the right way or whether there are better solutions.
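The wrapper idea described above could look roughly like this. JvmModel, ModelCommand, ThreadStarted, ThreadExited and CommandHistory are made-up names for illustration, and a real model would hold far more state than a thread counter.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model state: just a count of live threads.
final class JvmModel {
    int liveThreads = 0;
}

// One command per event type, each knowing how to apply and revert itself.
interface ModelCommand {
    void apply(JvmModel model);
    void undo(JvmModel model);
}

final class ThreadStarted implements ModelCommand {
    public void apply(JvmModel model) { model.liveThreads++; }
    public void undo(JvmModel model)  { model.liveThreads--; }
}

final class ThreadExited implements ModelCommand {
    public void apply(JvmModel model) { model.liveThreads--; }
    public void undo(JvmModel model)  { model.liveThreads++; }
}

// Applies commands in order and remembers them so the latest can be undone.
final class CommandHistory {
    private final Deque<ModelCommand> applied = new ArrayDeque<>();
    private final JvmModel model;

    CommandHistory(JvmModel model) {
        this.model = model;
    }

    void execute(ModelCommand command) {
        command.apply(model);
        applied.push(command); // most recent command sits on top
    }

    // Undoes the most recent command; returns false if there is nothing to undo.
    boolean undoLast() {
        ModelCommand last = applied.poll();
        if (last == null) {
            return false;
        }
        last.undo(model);
        return true;
    }
}
```

Stepping back to an earlier state then costs one undo() per event instead of a replay from the very beginning.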
To clarify my application:
The Java application models a running Java Virtual Machine and holds information, for instance Threads, Monitors, Memory, etc.
Every event changes the current state of the modeled JVM. For instance, a new thread was launched, another thread goes into blocking state, memory was freed etc. The events are modeled accordingly: ThreadEvent, MemoryEvent, etc.
This means, the messages have to be processed sequentially. In order to iterate back to previous states of the JVM, I would like to implement this undo functionality.
For undo I have already tried: clearAllStates, then apply the events again up to event #i.
Unfortunately, with 20,000+ events this is totally inefficient.
To provide a tailored answer it would be good to know what you're doing with received messages, whether they can be processed concurrently or not, and how an undo impacts the processing of messages received after an undone message.
However, here's a generic suggestion: A typical approach is to delegate received messages to a queue-like handler class, which usually runs in its own thread (to let the message receiver get ready for the next incoming message as soon as possible) and sequentially processes received messages. You could use a stack-like class to keep track of processed messages for the sake of the undo feature. You could also use specific queues and stacks for different event types.
Basically this resembles the thread pool pattern.
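A small sketch of that hand-off, assuming events are plain strings for brevity. SequentialEventHandler is a made-up name; a single-threaded executor plays the role of the queue plus worker thread, and a Deque plays the role of the stack of processed messages.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical handler: the receiver thread hands events over and returns at once;
// one worker thread processes them strictly in submission order.
final class SequentialEventHandler {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Deque<String> processed = new ArrayDeque<>(); // newest on top

    // Called by the receiver thread; does not wait for processing.
    void submit(String event) {
        worker.execute(() -> {
            synchronized (processed) {
                processed.push(event); // real code would apply the event to the model here
            }
        });
    }

    // Removes and returns the most recently processed event, or null if none.
    String popLastProcessed() {
        synchronized (processed) {
            return processed.poll();
        }
    }

    // Stops accepting events and waits for the queued ones to finish.
    boolean shutdownAndWait(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because there is only one worker thread, events are applied in arrival order, which matches the requirement that the model be updated sequentially.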
