Parallel between LMax Disruptor and Rx Framework concept?


As I read here http://mechanitis.blogspot.fr/2011/06/dissecting-disruptor-how-do-i-read-from.html
"for every individual item, the Consumer simply says "Let me know when you've got more than this number", and is told in return how many more entries it can grab."
Doesn't this relate to the Rx Framework concept as presented by Erik Meijer
http://www.youtube.com/watch?v=8Mttjyf-8P4 ?
If so, could the Rx Framework be helpful for implementing a similar piece of software?

Nice question, I've been wondering about this myself, for one of my current projects.
I don't feel greatly qualified to give a definitive answer, however:
They are designed to scratch different itches.
Disruptor is clearly designed for performance first, as close to the metal as possible. It doesn't do anything fancy apart from what it does.
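To make the behaviour quoted in the question concrete, here is a minimal sketch of a batch read in plain Java. It is not the Disruptor API: the class and method names are made up for illustration, it assumes exactly one producer and one consumer, and it does nothing to stop the producer from lapping the consumer.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified ring buffer: one producer, one consumer,
// and no protection against the producer overwriting unread slots.
final class BatchReadSketch {
    private final long[] ring;
    private final int mask;
    // Sequence of the last published entry; -1 means "nothing yet".
    private final AtomicLong cursor = new AtomicLong(-1);

    BatchReadSketch(int sizePowerOfTwo) {
        ring = new long[sizePowerOfTwo];
        mask = sizePowerOfTwo - 1;
    }

    // Producer: fill the slot first, then publish it by moving the cursor.
    void publish(long value) {
        long next = cursor.get() + 1;
        ring[(int) (next & mask)] = value;
        cursor.set(next);
    }

    // Consumer: "let me know when you've got at least this sequence".
    // The answer is the highest sequence that is safe to read.
    long waitFor(long wanted) {
        long available;
        while ((available = cursor.get()) < wanted) {
            Thread.yield();
        }
        return available;
    }

    long get(long sequence) {
        return ring[(int) (sequence & mask)];
    }

    public static void main(String[] args) {
        BatchReadSketch buffer = new BatchReadSketch(8);
        buffer.publish(10);
        buffer.publish(20);
        buffer.publish(30);

        long next = 0;                          // first sequence not yet read
        long available = buffer.waitFor(next);  // one wait...
        for (; next <= available; next++) {     // ...then a whole batch
            System.out.println(buffer.get(next));
        }
    }
}
```

The point is that the consumer waits once and then drains everything up to the returned sequence, which is a pull with batching rather than the per-item push that Rx observers receive.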
Rx is higher level, it is 'Linq to events', it allows you to do nice things with 'events' that you couldn't with normal framework events (you can't filter a standard event and then continue propagating it as an event).
Semantic differences
As the originator of Disruptor.Net pointed out here:
The interface matches but I think the semantic behind RX does not:
an exception (OnError) terminates the stream; this is not the case with the disruptor
you cannot subscribe to the disruptor while it's hot: observers would have to be set up before "starting" the disruptor, which does not work very well with operators like retry, for instance, which will re-subscribe in case of error
lots of operators do not make sense with the disruptor or would just not work
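The first point in the quote can be illustrated with a small sketch. The types below are hypothetical stand-ins written for this example, not Rx or Disruptor classes. The second method models only the behaviour the quote attributes to the disruptor; how a real disruptor reacts to a failing handler depends on how it is configured.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; neither method is Rx or Disruptor API.
final class TerminationSketch {
    interface Handler {
        void handle(int item);
    }

    // Rx-style contract: the first error is terminal, nothing is delivered after it.
    static List<String> rxStyle(int[] items, Handler handler) {
        List<String> log = new ArrayList<>();
        for (int item : items) {
            try {
                handler.handle(item);
                log.add("next " + item);
            } catch (RuntimeException e) {
                log.add("error " + item);
                return log;
            }
        }
        log.add("completed");
        return log;
    }

    // Behaviour the quote attributes to the disruptor: the failure is
    // reported and the remaining entries are still processed.
    static List<String> keepGoingStyle(int[] items, Handler handler) {
        List<String> log = new ArrayList<>();
        for (int item : items) {
            try {
                handler.handle(item);
                log.add("next " + item);
            } catch (RuntimeException e) {
                log.add("error " + item);
            }
        }
        return log;
    }

    public static void main(String[] args) {
        Handler failsOnTwo = item -> {
            if (item == 2) throw new IllegalStateException("boom");
        };
        int[] items = {1, 2, 3};
        System.out.println(rxStyle(items, failsOnTwo));
        System.out.println(keepGoingStyle(items, failsOnTwo));
    }
}
```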
Having said that, he was (at least at one time) thinking about integration between Disruptor.Net, TPL Dataflow and Rx.
Here is another page where someone asks the same question, the page concludes with:
Disruptor is in fact more like TPL DataFlow in my opinion.

Without knowing the Rx framework, I'd say you could be right. However, Disruptor.Net is designed to be a port of the Java version, so it will be as similar as possible. Given that the original doesn't use Rx, using a different library would add lots of rework and possibly performance issues.

Related

Implementing retractions in google dataflow

I read the paper "The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing". Alas, the SDK does not yet expose the accumulating & retracting triggering mode (section 2.3).
I was wondering if there is a workaround for getting similar semantics.
I have been reading the source and have figured out that StateTag or StateNamespace may be the way I can store the "last emitted value of the window", which could then be used to calculate the retraction message further down the pipeline. Is this the correct path, or are there other classes/ways I can/should look at?

The upcoming state API is indeed your best bet for emulating retractions. The classes you mentioned are part of the state API, but everything in the com.google.cloud.dataflow.sdk.util package is for internal use only; we technically make no guarantees that those APIs won't change drastically, or that they will ever be released. That said, releasing that API is on our roadmap, and I'm hopeful we'll get it released relatively soon.
One thing to keep in mind: all the code downstream of your custom retractions will need to be able to differentiate them from normal records. This is something we'll do automatically for you once bona fide retraction support is ready, but in the meantime, you'll just need to make sure all the code you write that might receive a retraction knows how to recognize and handle it as such.
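As a rough illustration of that last point, here is a sketch in plain Java. None of it is Dataflow SDK API: Update, Emitter and RunningSum are hypothetical names. The lastEmitted field stands in for whatever per-window state the pipeline would really use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration; none of this is Dataflow SDK API.
final class RetractionSketch {
    // A record that is either a normal value or a retraction of an earlier one.
    static final class Update {
        final long value;
        final boolean retraction;

        Update(long value, boolean retraction) {
            this.value = value;
            this.retraction = retraction;
        }
    }

    // Upstream: remembers the last value emitted for a window and
    // retracts it before emitting the refined value.
    static final class Emitter {
        private Long lastEmitted;   // null until the first pane fires

        List<Update> fire(long newValue) {
            List<Update> out = new ArrayList<>();
            if (lastEmitted != null) {
                out.add(new Update(lastEmitted, true));
            }
            out.add(new Update(newValue, false));
            lastEmitted = newValue;
            return out;
        }
    }

    // Downstream: must recognise retractions and undo their effect.
    static final class RunningSum {
        private long sum;

        void accept(Update u) {
            sum += u.retraction ? -u.value : u.value;
        }

        long sum() {
            return sum;
        }
    }

    public static void main(String[] args) {
        Emitter window = new Emitter();
        RunningSum total = new RunningSum();
        for (long pane : new long[] {5, 12, 14}) {   // successive refinements
            for (Update u : window.fire(pane)) {
                total.accept(u);
            }
        }
        System.out.println(total.sum());   // only the latest pane is counted
    }
}
```

Without the retraction flag, the downstream sum would count every pane of the same window, which is exactly the double counting that retractions exist to avoid.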

Akka typed actors in Java

I don't understand why not to use TypedActors in Akka. Using reflection (well.. instanceof) to compensate for the lack of pattern matching in Java is quite ugly.
As far as I understand, TypedActors should act as a gate between the "Akka world" and the "non-Akka world" of your software. But why should we throw away all OO principles and just use reflection?
Why wouldn't you want to use an actor and know exactly what it should respond to? Or, for the sake of keeping Akka's actor model, why not create a message hierarchy that uses double dispatch to activate the right method in the actor (and I know you shouldn't pass Actors as parameters and should use ActorRef instead)?
DISCLAIMER: I'm new to Akka and this model, and I haven't written a single line of code using Akka, but just reading the documentation is giving me a headache.

Before we get started: the question is about the deprecated "typed actors" module, which will soon be replaced with akka-typed, a far superior take on the problem that avoids the shortcomings explained below. Please do have a look at akka-typed if you're interested in typed actors!
I'll enumerate a number of downsides of using the typed actors implementation you refer to. Please do note however that we have just merged a new akka-typed module, which brings in type safety back to the world of akka actors. For the sake of this post, I will not go in depth into the reasons developing the typed version was such a tough challenge, let's for now answer the question of "why not use the (old) typed actors".
Firstly, they were never designed to be the core of the toolkit. They are built on top of the messaging infrastructure Akka provides. Please note that thanks to that messaging infrastructure we're able to achieve location transparency and Akka's well-known performance. Typed actors heavily use reflection and JDK proxies to translate between method calls and message sends. This is very expensive (time-wise) and degrades performance around 10-fold compared to plain Akka Actors; see below for a "ping pong" benchmark (implemented using both styles: the sender tells the actor, the actor replies, 100.000 times):
Unit = ops/ms
Benchmark                                                 Mode    Samples            Mean    Mean error  Units
TellPingPongBenchmark.tell_100000_msgs                    thrpt        20   119973619.810  79577253.299  ops/ms
JdkProxyTypedActorTellPingPongBenchmark.tell_100000_msgs  thrpt        20    16697718.988    406179.847  ops/ms
Unit = us/op
Benchmark                                                 Mode    Samples            Mean    Mean error  Units
TellPingPongBenchmark.tell_100000_msgs                    sample   133647           1.223         0.916  us/op
JdkProxyTypedActorTellPingPongBenchmark.tell_100000_msgs  sample   222869          12.416         0.045  us/op
(Benchmarks are kept in akka/akka-bench-jmh and run using the OpenJDK JMH tool, via the sbt-jmh plugin.)
Secondly, using methods to abstract over distributed systems is just not a good way of going about it (oh, how I remember RMI... let's not go there again). Using such "looks like a method" makes you stop thinking about message loss, reordering and all the things which can and do happen in distributed systems. It also encourages (makes it "too easy to do the wrong thing") using signatures like def getThing(id: Int): Thing - which would generate blocking code - which is horrible for performance! You really do want to stay asynchronous and responsive, which is why you'd end up with loads of futures when trying to work properly with these (proxy based) typed actors.
Lastly, you basically lose one of the main Actor capabilities. The 3 canonical operations an Actor can perform are 1) send messages, 2) start child actors, and 3) change its own behaviour based on received messages (see Carl Hewitt's original paper on the Actor Model). The 3rd capability is used to beautifully model state machines. For example, you can say (in plain Akka actors) become(active) and then become(allowOnlyPrivileged) to switch between receive implementations, making finite state machine implementations (we also have a DSL for FSMs) a joy to work with. You cannot express this nicely in JDK-proxied typed actors, because you cannot change the set of exposed methods. This is a major downside once you get into thinking and modeling with state machines.
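For readers who have not seen behaviour switching before, the idea can be imitated in a few lines of plain Java. This is only a sketch and deliberately not the Akka API: tell and become here are hypothetical methods on an ordinary object, and the messages are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java imitation of behaviour switching; not the Akka API.
final class BecomeSketch {
    final List<String> log = new ArrayList<>();

    // The current "receive" implementation; swapped on certain messages.
    private Consumer<Object> behaviour = this::idle;

    void tell(Object message) {
        behaviour.accept(message);
    }

    private void become(Consumer<Object> next) {
        behaviour = next;
    }

    private void idle(Object message) {
        if ("start".equals(message)) {
            log.add("idle -> active");
            become(this::active);
        } else {
            log.add("idle ignores " + message);
        }
    }

    private void active(Object message) {
        if ("stop".equals(message)) {
            log.add("active -> idle");
            become(this::idle);
        } else {
            log.add("active handles " + message);
        }
    }

    public static void main(String[] args) {
        BecomeSketch actor = new BecomeSketch();
        actor.tell("job");     // ignored while idle
        actor.tell("start");
        actor.tell("job");     // handled while active
        actor.tell("stop");
        actor.log.forEach(System.out::println);
    }
}
```

The same message ("job") is treated differently depending on the current behaviour. A proxy with a fixed set of methods has no natural way to express that.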
A New Hope (Episode 1): Please do have a look at the upcoming akka-typed module authored by Roland Kuhn (preview to be included in the 2.4 release soon), I'm pretty sure you'll like what you'll find there typesafety wise. And also, that implementation will eventually be even faster than the current untyped actors (omitting impl details here as the answer got pretty long already - short version: basically we'll remove a load of allocations thanks to the new implementation).
I hope you'll enjoy this thorough answer. Feel free to ask follow up questions in comments here or on akka-user - our official mailing list. Happy Hakking!

Typed Actors provide you with a static contract defined in terms of your domain: you can give their messages (which will be delegated to an underlying implementation and executed asynchronously) names that make sense in your domain, and you avoid using reflection yourself. (TypedActors use JDK Proxies under the hood, so there is still reflection going on; you just don't have to worry about it.) You also gain type-checking of the arguments passed to the active object/typed actor and of its return types. The documentation is pretty clear on this, but I know that for those new to actor-based concurrency additional examples always help, so feel free to post additional questions/comments if you are still having trouble grokking the difference.

But do you realize that a huge number of companies don't have expert developers, yet do have a big infrastructure that can scale horizontally as much as needed? For them, raw performance is not always the main goal; what matters is being responsive, message-driven, elastic and resilient. That is what typed actors give us right now, even when used by developers who don't know anything about Akka or reactive programming.
Don't get me wrong, I use pure Akka typed day to day, but for delivery teams we have a framework that uses typed actors, and our consumers use them as POJOs without knowing that they are coding in a reactive system. And that's an awesome feature.

How to implement a network protocol?

Here is a generic question. I'm not in search of the best answer, I'd just like you to express your favourite practices.
I want to implement a network protocol in Java (but this is a rather general question; I faced the same issues in C++). This is not the first time, as I have done this before, but I think I am missing a good way to implement it. Usually it's all about exchanging text messages and some byte buffers between hosts, storing the status and waiting until the next message comes. The problem is that I usually end up with a bunch of switch statements and more or less complex if statements that react to different statuses / messages. The whole thing usually gets complicated and hard to maintain. Not to mention that sometimes what comes out has some "blind spot", I mean statuses of the protocol that have not been covered and that behave in an unpredictable way. I tried to write some state machine classes that take care of checking start and end statuses for each action in more or less smart ways. This makes programming the protocol very complicated, as I have to write lines and lines of code to cover every possible situation.
What I'd like is something like a good pattern, or a best practice that is used in programming complex protocols, that is easy to maintain and extend and very readable.
What are your suggestions?

Read up on the State design pattern to learn how to avoid lots of switch statements.
"sometimes what comes out has some "blind spot", I mean statuses of the protocol that have not been covered..."
State can help avoid gaps. It can't guarantee a good design, you still have to do that.
"...as I have to write lines and lines of code to cover every possible situation."
This should not be considered a burden or a problem: You must write lines of code to cover every possible situation.
State can help because you get to leverage inheritance. It can't guarantee a good design, you still have to do that.
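As a minimal sketch of how that can look, here is a toy protocol (HELLO, then any number of data lines, then BYE) written with the State pattern in Java. The protocol and all class names are invented for illustration. The base class is where inheritance earns its keep: any message a state does not explicitly handle ends up in one place instead of in a forgotten branch.

```java
// Hypothetical toy protocol: HELLO, then any number of data lines, then BYE.
final class StatePatternSketch {
    // Base class: any message a state does not override is a protocol error.
    static abstract class State {
        State onHello() { return unexpected("HELLO"); }
        State onData(String payload) { return unexpected("DATA"); }
        State onBye() { return unexpected("BYE"); }

        State unexpected(String what) {
            return new Failed(what + " not allowed in " + getClass().getSimpleName());
        }
    }

    static final class AwaitingHello extends State {
        @Override State onHello() { return new Connected(); }
    }

    static final class Connected extends State {
        @Override State onData(String payload) { return this; }
        @Override State onBye() { return new Closed(); }
    }

    static final class Closed extends State { }

    static final class Failed extends State {
        final String reason;
        Failed(String reason) { this.reason = reason; }
        @Override State unexpected(String what) { return this; }  // stay failed
    }

    // The connection only decodes messages and forwards them to the current state.
    static State run(String... messages) {
        State state = new AwaitingHello();
        for (String m : messages) {
            if (m.equals("HELLO")) state = state.onHello();
            else if (m.equals("BYE")) state = state.onBye();
            else state = state.onData(m);
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(run("HELLO", "abc", "BYE").getClass().getSimpleName());
        System.out.println(run("abc").getClass().getSimpleName());
    }
}
```

Adding a state means adding a class and overriding only the messages it accepts. Everything else is still covered by the default in the base class.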

Designing a protocol is usually all about the application space you are working within. For instance, HTTP is all about handling web pages, graphics, and posts, while FTP is all about transferring files.
So in short, to start, you should decide what application space you are in, then define the actions that need to be taken. Then finally, before you start designing your actual protocol, you should seriously, seriously hunt for another protocol stack that does what you want to do and avoid implementing a protocol stack altogether. Only after you have determined that something else pre-built absolutely won't work for you should you start building your own protocol stack.

In C++ you can use the Boost.Spirit library to parse your protocol messages easily. The only "difficulty" is defining the grammar of your message protocol. Take a look at the Gnutella source code to see how they solve this problem. The Gnutella protocol specification is here: http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf

A finite state machine (FSM) is what you want.
You define a whole bunch of states that you can be in as a receiver or sender (idle, connecting_phase1, connecting_phase2, packet expected, ...).
Then define all the possible events (packet1 arrives, net closes, ...).
Finally, you have a table that says 'when in state x and event n happens, do func y and transition to state q' for every state and event (many will be null or dups).
Edit - how to make a FSM (rough sketch)
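struct Context;                      // whatever data the handlers need
typedef void (*Handler)(Context&);   // action to run on a transition
struct FSMNode
{
    int m_nextState;                 // state to move to afterwards
    Handler m_func;                  // action for this state/event pair
};
// bang, wiz, fertang, noop and ole are handler functions defined elsewhere
FSMNode states[NUMSTATES][NUMEVENTS] =
{
    {   // state 0
        {3, bang},      // event 0
        {2, wiz},       // event 1
        {1, fertang}    // event 2
    },
    {   // state 1
        {1, noop},      // event 0
        {1, noop},      // event 1
        {3, ole}        // event 2
    },
    // .......
};
// dispatch: look up the cell, run its action, then move to the next state
FSMNode node = states[mystate][event];
node.m_func(context);
mystate = node.m_nextState;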
I am sure this is full of invalid syntax - but I hope you get the drift
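Since the question is about Java, the same table-driven idea might look roughly like this there. The states, events and actions are invented for the example. One deliberate choice is that every cell is filled with a default before the real transitions are registered, so there is no state/event pair without a defined reaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical states, events and actions; only the table-driven shape matters.
final class FsmTableSketch {
    enum State { IDLE, CONNECTING, CONNECTED }
    enum Event { CONNECT, ACK, CLOSE }

    // One table cell: what to do, and where to go next.
    static final class Transition {
        final State next;
        final Runnable action;

        Transition(State next, Runnable action) {
            this.next = next;
            this.action = action;
        }
    }

    final List<String> log = new ArrayList<>();
    State state = State.IDLE;

    private final Transition[][] table =
            new Transition[State.values().length][Event.values().length];

    FsmTableSketch() {
        // Default for every cell: stay in the same state and record the event.
        for (State s : State.values()) {
            for (Event e : Event.values()) {
                on(s, e, s, () -> log.add("ignored " + e + " in " + s));
            }
        }
        on(State.IDLE, Event.CONNECT, State.CONNECTING, () -> log.add("send SYN"));
        on(State.CONNECTING, Event.ACK, State.CONNECTED, () -> log.add("ready"));
        on(State.CONNECTING, Event.CLOSE, State.IDLE, () -> log.add("abort"));
        on(State.CONNECTED, Event.CLOSE, State.IDLE, () -> log.add("send FIN"));
    }

    private void on(State from, Event event, State to, Runnable action) {
        table[from.ordinal()][event.ordinal()] = new Transition(to, action);
    }

    // Dispatch: look up the cell, run its action, then move to the next state.
    void fire(Event event) {
        Transition t = table[state.ordinal()][event.ordinal()];
        t.action.run();
        state = t.next;
    }

    public static void main(String[] args) {
        FsmTableSketch fsm = new FsmTableSketch();
        fsm.fire(Event.CONNECT);
        fsm.fire(Event.ACK);
        fsm.fire(Event.ACK);     // not meaningful here, hits the default cell
        fsm.fire(Event.CLOSE);
        fsm.log.forEach(System.out::println);
    }
}
```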

Why not use XML as your protocol? You can encapsulate and categorize all your pieces of data inside XML nodes

Can't give you an example myself, but how about looking at how other (competent) people are doing it?
Like this one?
http://anonsvn.jboss.org/repos/netty/trunk/src/main/java/org/jboss/netty/handler/codec/http/
P.S. For that matter, I actually recommend using netty as your network framework and building your protocol on top of it. It should be very easy, and you'll probably get rid of a bunch of headaches...

If you are using Java, consider looking at Apache MINA; its documentation and samples should point you in the right direction.

Right-click the network connection icon in the System Tray.
Click Troubleshoot problems.
The troubleshooter may find and fix the problem; in that case, you can quickly get back to work.
If the troubleshooter can't fix the Winsocks problem, then you may get an error looking like:
"One or more network protocols are missing on this computer"

Challenging Multithreading Problems

Is there some resource for challenging multi-threading problems? Would like to pose these to interviewees if possible. Tired of asking the same wait-notify questions that everyone gets right these days, but can't visualise a real scenario where multi-threading was employed.

The problem is that concurrent programming is a difficult topic. If you (the interviewer) are not fully on top of it, it will be difficult for you to tell if the interviewee knows their stuff. It is very easy to come up with solutions to concurrency problems that have subtle flaws. Conversely, it is unfair on candidates [1] if you reject them because you think their answers are wrong when they are actually correct.
[1] And bad for your organisation. If the candidate actually knows more about multi-threading than you, then you arguably need to employ him. Other factors being equal, of course.

Java Concurrency in Practice. I like to know whether a candidate understands data races, CAS, the Michael-Scott queue and other concurrent data structures, and why thread safety matters more as the number of cores grows.

As multithreading is hard (as others have pointed out), I would suggest covering this in an actual programming session, where the potential employee is given a programming problem, preferably based on something that has actually happened, and works on it alongside one of your experienced programmers. That way you can actually SEE how the candidate attempted to solve the problem, and the experienced programmer can evaluate what happened.
It must not be too complex, but complex enough that your experienced programmer gets enough information.

Well, if you want to have fun with the poor sap, ask him about Dekker's Algorithm (and Peterson's variation thereof). If you're feeling nasty, ask him if he has ever used either one on real multiprocessor hardware.
If you feel extra-nasty, ask him to show you a technique suitable for lock-free true concurrent single-reader single-writer unidirectional communications, between two processors with shared memory, in which the only atomic operations are single-word reads and writes. There is no read-modify-write instruction, on either side, and the processor architectures need not be the same. (Yes, such a technique exists.)

I wouldn't ask overly specific/detailed questions. But the above-mentioned book 'Java Concurrency in Practice' is a good helper. Just go through it chapter by chapter and pick out the key points, e.g.:
Explain difference between mutable/immutable
What does it mean to share data in concurrency setup
What problems do you solve with concurrency
etc.

First, try to get hold of a real scenario yourself, and then ask job seekers about it.
For this you could pose a question like: what is a real scenario for multithreading?
Hope this helps.

I got one in an interview recently. Get the candidate to write a Servlet that implements an accurate in-memory hit counter indexed by URL (to serve a JavaScript-style hit counter on a number of web pages). Try it for yourself; it's not as easy as it sounds. The solution is a cut-down implementation of the Memoizer pattern from Concurrency in Practice.
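For anyone who wants to try it, here is one possible core for such a counter, leaving the Servlet plumbing out. Note that this is not the Memoizer-based solution mentioned above. It is a shorter alternative that leans on ConcurrentHashMap.computeIfAbsent and AtomicLong, and the class name is made up.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter; a servlet would call hit(url) once per request.
final class HitCounterSketch {
    private final ConcurrentMap<String, AtomicLong> hits = new ConcurrentHashMap<>();

    // Atomically creates the per-URL counter on first use, then increments it.
    long hit(String url) {
        return hits.computeIfAbsent(url, key -> new AtomicLong()).incrementAndGet();
    }

    long count(String url) {
        AtomicLong counter = hits.get(url);
        return counter == null ? 0 : counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        HitCounterSketch counter = new HitCounterSketch();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 10_000; i++) {
            pool.execute(() -> counter.hit("/index.html"));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println(counter.count("/index.html"));
    }
}
```

The tempting wrong answers are a plain HashMap, or a check-then-put sequence on a concurrent map. Both can lose updates when two requests for a new URL arrive at the same time.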

Is there a .Net equivalent to java.util.concurrent.Executor?

Have a long running set of discrete tasks: parsing 10s of thousands of lines from a text file, hydrating into objects, manipulating, and persisting.
If I were implementing this in Java, I suppose I might add a new task to an Executor for each line in the file or task per X lines (i.e. chunks).
For .Net, which is what I am using, I'm not so sure. I have a suspicion maybe CCR might be appropriate here, but I'm not familiar enough with it, which is why I pose this question.
Can CCR function in an equivalent fashion to Java Executors, or is there something else available?
Thanks

You may want to look at the Task Parallel Library.
As of C# 5, the language's async and await keywords build on the library's Task type.

If you're going to ask a bunch of .NET people what's closest to being equivalent to Java Executors, it might not hurt to describe the distinguishing features of Java Executors. The person who knows your answer may not be any more familiar with Java than you are with .NET.
That said, if the already-mentioned Task Parallel Library is overkill for your needs, or you don't want to wait for .NET 4.0, perhaps ThreadPool.QueueUserWorkItem() would be what you're looking for.
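For .NET readers unfamiliar with the Java side, the pattern the question describes (one task per chunk of lines, submitted to a thread pool) looks roughly like the sketch below. The process method is a placeholder for the real parse/hydrate/persist work, and the chunk size and thread count are arbitrary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ChunkedExecutorSketch {
    // Placeholder for "parse, hydrate, manipulate, persist" on one line.
    static int process(String line) {
        return line.trim().length();
    }

    // Splits the lines into chunks and submits one task per chunk.
    static int processAll(List<String> lines, int chunkSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunkSize) {
                int end = Math.min(start + chunkSize, lines.size());
                List<String> chunk = lines.subList(start, end);
                results.add(pool.submit(() -> {
                    int sum = 0;
                    for (String line : chunk) {
                        sum += process(line);
                    }
                    return sum;
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();   // blocks until that chunk is done
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList("a", "bb", " ccc ", "dddd", "eeeee");
        System.out.println(processAll(lines, 2, 4));
    }
}
```

The distinguishing features are a fixed pool of worker threads, a queue of submitted tasks, and a Future per task for collecting results. That is the feature set to compare against on the .NET side.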

Maybe this is related: Design: Task Parallel Library explored.
See 10-4 Episode 6: Parallel Extensions as a quick intro.
For an older, thread-based approach, there's ThreadPool for pooling.

The BackgroundWorker class is probably what you're looking for. As the name implies, it allows you to run background tasks, with automatically managed pooling, and status update events.

For anyone looking for a more contemporary solution (as I was), check out the EventLoopScheduler class.
