I read the "The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in MassiveScale, Unbounded, Out of Order Data Processing" paper. Alas, the SDK does not yet expose the accumulating & retracting triggering mode (section 2.3).
I was wondering if there was a workaround for getting similar semantics?
I have been reading the source and have figured out that StateTag or StateNamespace may be the way I can store the "last emitted value of the window", which could then be used to calculate the retraction message down the pipeline. Is this the correct path, or are there other classes/ways I can/should look at?
The upcoming state API is indeed your best bet for emulating retractions. The classes you mentioned are part of the state API, but everything in the com.google.cloud.dataflow.sdk.util package is for internal use only; we technically make no guarantees that the APIs won't change drastically, or even that they will ever be released. That said, releasing that API is on our roadmap, and I'm hopeful we'll get it released relatively soon.
One thing to keep in mind: all the code downstream of your custom retractions will need to be able to differentiate them from normal records. This is something we'll do automatically for you once bona fide retraction support is ready, but in the meantime, you'll just need to make sure all the code you write that might receive a retraction knows how to recognize and handle it as such.
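If it helps, here is a minimal sketch of the kind of tagging I mean. None of these classes exist in the SDK; the wrapper type and its names are made up purely for illustration:

```java
// Hypothetical wrapper so downstream code can tell retractions apart
// from normal records; not an SDK class.
public class MaybeRetraction<T> implements java.io.Serializable {
  public final T value;
  public final boolean isRetraction;

  private MaybeRetraction(T value, boolean isRetraction) {
    this.value = value;
    this.isRetraction = isRetraction;
  }

  public static <T> MaybeRetraction<T> record(T value) {
    return new MaybeRetraction<>(value, false);
  }

  public static <T> MaybeRetraction<T> retraction(T value) {
    return new MaybeRetraction<>(value, true);
  }
}
```

A downstream transform would then branch on isRetraction, e.g. subtracting a retracted value from a running aggregate instead of adding it.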
What is the technical drawback of deleting a row on a GET method call in REST? I know that it is not standard practice, but will there be any issues?
The technical drawback is that indexers (think Google) come along and GET all of the links that they can find, just to see what's there. General purpose components that see a link to your thing might do a GET on it as a way of pre-caching the results in case the client wants them.
Fielding, writing in 2002:
"HTTP does not attempt to require the results of a GET to be safe. What it does is require that the semantics of the operation be safe, and therefore it is a fault of the implementation, not the interface or the user of that interface, if anything happens as a result that causes loss of property (money, BTW, is considered property for the sake of this definition)."
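To make the contrast concrete, here is a minimal JAX-RS sketch (the resource and path names are made up) of where row deletion belongs:

```java
import javax.ws.rs.DELETE;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;

@Path("/rows/{id}")
public class RowResource {

  // Safe: a crawler or pre-fetching proxy that GETs this link changes nothing.
  @GET
  public Response read(@PathParam("id") long id) {
    return Response.ok().build(); // would return the row's representation
  }

  // Unsafe by design: generic components know not to issue DELETE
  // speculatively, which is exactly why the deletion goes here.
  @DELETE
  public Response delete(@PathParam("id") long id) {
    // remove the row here
    return Response.noContent().build();
  }
}
```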
I don't understand why one should not use TypedActors in Akka. Using reflection (well... instanceof) to compensate for the lack of pattern matching in Java is quite ugly.
As far as I understand, TypedActors should be like a gate between the "Akka world" and the "non-Akka world" of your software. But why would we throw away all OO principles and just use reflection?
Why wouldn't you want to use an actor and know exactly what it should respond to? Or, for the sake of keeping Akka's actor model, why not create a message hierarchy that uses double-dispatch in order to activate the right method in the actor (and I know you shouldn't pass actors as parameters; use ActorRef instead)? A sketch of what I mean follows the disclaimer below.
DISCLAIMER: I'm new to Akka and this model, and I haven't written a single line of code using Akka; just reading the documentation is giving me a headache.
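To make the double-dispatch idea concrete, here is roughly what I have in mind (untested, and all the names are mine):

```java
// The message hierarchy dispatches to a typed handler, so the actor's
// receive method needs no instanceof chains per message type.
interface OrderHandler {
  void handle(PlaceOrder msg);
  void handle(CancelOrder msg);
}

interface OrderMessage {
  void dispatch(OrderHandler handler);
}

class PlaceOrder implements OrderMessage {
  public void dispatch(OrderHandler handler) { handler.handle(this); }
}

class CancelOrder implements OrderMessage {
  public void dispatch(OrderHandler handler) { handler.handle(this); }
}

// Inside the actor (which would implement OrderHandler), roughly:
//   if (message instanceof OrderMessage) {
//     ((OrderMessage) message).dispatch(this);
//   }
```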
Before we get started: the question is about the deprecated "typed actors" module, which will soon be replaced with akka-typed, a far superior take on the problem that avoids the shortcomings explained below - please do have a look at akka-typed if you're interested in typed actors!
I'll enumerate a number of downsides of using the typed actors implementation you refer to. Please do note, however, that we have just merged a new akka-typed module, which brings type safety back to the world of Akka actors. For the sake of this post, I will not go in depth into the reasons developing the typed version was such a tough challenge; for now, let's answer the question of "why not use the (old) typed actors".
Firstly, they were never designed to be the core of the toolkit. They are built on top of the messaging infrastructure Akka provides. Please note that thanks to that messaging infrastructure we're able to achieve location transparency, and Akka's well-known performance. They heavily use reflection and JDK proxies to translate to and from methods to message sends. This is very expensive (time-wise), and degrades performance around 10-fold in contrast to plain Akka actors; see below for a "ping pong" benchmark (implemented using both styles; sender tells to actor, actor replies - 100,000 times):
Unit = ops/ms
Benchmark                                                  Mode    Samples  Mean           Mean error    Units
TellPingPongBenchmark.tell_100000_msgs                     thrpt   20       119973619.810  79577253.299  ops/ms
JdkProxyTypedActorTellPingPongBenchmark.tell_100000_msgs   thrpt   20       16697718.988   406179.847    ops/ms

Unit = us/op
Benchmark                                                  Mode    Samples  Mean     Mean error  Units
TellPingPongBenchmark.tell_100000_msgs                     sample  133647   1.223    0.916       us/op
JdkProxyTypedActorTellPingPongBenchmark.tell_100000_msgs   sample  222869   12.416   0.045       us/op
(Benchmarks are kept in akka/akka-bench-jmh and run using the OpenJDK JMH tool, via the sbt-jmh plugin.)
Secondly, using methods to abstract over distributed systems is just not a good way of going about it (oh, how I remember RMI... let's not go there again). Making a remote call "look like a method" makes you stop thinking about message loss, reordering, and all the things which can and do happen in distributed systems. It also makes it "too easy to do the wrong thing": signatures like def getThing(id: Int): Thing generate blocking code, which is horrible for performance! You really do want to stay asynchronous and responsive, which is why you'd end up with loads of futures when trying to work properly with these (proxy-based) typed actors.
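For contrast, here is a rough sketch of the asynchronous alternative in plain Akka's Java API (the message class and actor are made up): instead of a proxied method that blocks for a Thing, you ask and compose on a Future:

```java
import akka.actor.ActorRef;
import akka.pattern.Patterns;
import akka.util.Timeout;
import scala.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class AskExample {
  // Non-blocking: the caller composes on the Future instead of waiting.
  public static Future<Object> getThingAsync(ActorRef thingActor, int id) {
    Timeout timeout = new Timeout(1, TimeUnit.SECONDS);
    return Patterns.ask(thingActor, new GetThing(id), timeout);
  }

  // Hypothetical request message.
  static final class GetThing {
    final int id;
    GetThing(int id) { this.id = id; }
  }
}
```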
Lastly, you basically lose one of the main actor capabilities. The 3 canonical operations an actor can perform are 1) send messages, 2) start child actors, and 3) change its own behaviour based on received messages (see Carl Hewitt's original paper on the Actor Model). The 3rd capability is used to beautifully model state machines. For example, you can say (in plain Akka actors) become(active) and then become(allowOnlyPrivileged) to switch between receive implementations, making finite state machine implementations (we also have a DSL for FSMs) a joy to work with. You cannot express this nicely in JDK-proxied typed actors, because you cannot change the set of exposed methods. This is a major downside once you get into thinking and modeling using state machines.
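Here is a minimal sketch of that 3rd capability using the Java actor API of later Akka versions (the actor and messages are made up); this behaviour swap is precisely what a JDK-proxied typed actor cannot express, since its method set is fixed:

```java
import akka.actor.AbstractActor;

public class Gate extends AbstractActor {
  private Receive open() {
    return receiveBuilder()
        .matchEquals("close", m -> getContext().become(closed()))
        .matchEquals("pass", m -> getSender().tell("ok", getSelf()))
        .build();
  }

  private Receive closed() {
    return receiveBuilder()
        .matchEquals("open", m -> getContext().become(open()))
        .matchAny(m -> getSender().tell("denied", getSelf()))
        .build();
  }

  @Override
  public Receive createReceive() {
    return open(); // start in the open state
  }
}
```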
A New Hope (Episode 1): Please do have a look at the upcoming akka-typed module authored by Roland Kuhn (a preview is to be included in the 2.4 release soon); I'm pretty sure you'll like what you'll find there type-safety-wise. Also, that implementation will eventually be even faster than the current untyped actors (omitting the implementation details here, as the answer got pretty long already - short version: basically, we'll remove a load of allocations thanks to the new implementation).
I hope you'll enjoy this thorough answer. Feel free to ask follow up questions in comments here or on akka-user - our official mailing list. Happy Hakking!
Typed Actors provide you with a static contract defined in the terms of your domain: you can name their messages (which will be delegated to an underlying implementation and executed asynchronously) after actions which make sense in your domain, avoiding the use of reflection on your part. TypedActors use JDK proxies under the hood, so there is still reflection going on; you just don't have to worry about it, and you gain type-checking in terms of the arguments passed to the active object/typed actor and its return types. The documentation is pretty clear on this, but I know that for those new to actor-based concurrency, additional examples always help, so feel free to ask additional questions/comments if you are still having trouble grokking the difference.
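For those who prefer to see it, this is roughly the shape of that contract (Squarer and its implementation are made-up examples in the style of the Akka docs):

```java
import akka.actor.ActorSystem;
import akka.actor.TypedActor;
import akka.actor.TypedProps;
import akka.dispatch.Futures;
import scala.concurrent.Future;

interface Squarer {
  Future<Integer> square(int i); // asynchronous, but statically typed
}

class SquarerImpl implements Squarer {
  public Future<Integer> square(int i) {
    return Futures.successful(i * i);
  }
}

public class TypedActorExample {
  public static void main(String[] args) {
    ActorSystem system = ActorSystem.create("demo");
    // The returned proxy turns each method call into an async message send.
    Squarer squarer = TypedActor.get(system).typedActorOf(
        new TypedProps<Squarer>(Squarer.class, SquarerImpl.class));
    squarer.square(3);
  }
}
```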
But do you guys realize that there are a huge number of companies that don't have expert developers, but do have a big infrastructure that can scale horizontally as much as needed? Performance is not always the thing to go for; instead you want to be responsive, message-driven, elastic, and resilient, which right now, thanks to typed actors, we have - used by developers who don't know anything about Akka or Reactive Programming.
Don't get me wrong, I use pure Akka Typed in my day-to-day work, but for delivery teams we have this framework that uses typed actors, and our consumers use it as POJOs without knowing that they are coding in a reactive system. And that's an awesome feature.
I'm writing a system that will leverage Mongo for persistence and RabbitMQ for a message bus/event queueing, and I'm trying to figure out the best way to be resilient to failures on the publication side.
There are three scenarios I can think of:
Everything works - consistent
Everything fails - consistent
Part of it works; whichever happens later is out of date - inconsistent
The last case is the one I'm interested in, and I'm curious to know how others have solved the issue, given that XA isn't an option (and I wouldn't want the performance overhead anyway).
There are a couple of solutions I can think of:
Add a "lastEvent" (or some similar) field to the Mongo document. On a periodic interval, scan for documents where lastEvent < lastUpdated, and fire an event (this requires an extra update for every change, and loses context of the "old" document in the case of an update)
Fire the event in Rabbit before persisting in Mongo, and allow safe handling of events that may not have actually happened (I really dislike this approach)
Could anyone else shed some light on how to provide some sort of consistency across a persistence layer and message bus?
1 is never a good idea. The notion of "last X time" falls over as soon as you introduce multi-threaded or multi-process systems and consider when that "time" is generated (if some requests take longer to process than others, the "later" time might be written to the persistent store before the "earlier" times).
2 is basically idempotence, and it's a pattern that works very well for designing fault-tolerant systems if done properly.
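A minimal sketch of what "done properly" can look like on the consuming side (all names are mine; in a real system the seen-ids set would live in Mongo, e.g. behind a unique index, rather than in memory):

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentHandler {
  // Ids of events already applied; in memory only to keep the sketch self-contained.
  private final Set<String> handledEventIds = ConcurrentHashMap.newKeySet();

  public void onEvent(String eventId, Runnable sideEffect) {
    // add() returns false if the id was already present, so a duplicate
    // or premature event is applied at most once.
    if (handledEventIds.add(eventId)) {
      sideEffect.run();
    }
  }
}
```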
This is one of those questions that involves crossing what I call the "Hello World Gulf". I'm on the "Hello World" side: I can use SQLite and Content Providers (and resolvers), but I now need to cross to the other side, where I cannot make the assumption that onUpgrade will be quick.
Now, my go-to book (Wrox, Professional Android 4 Development - I didn't choose it because of "professional"; I chose it because Wrox are like the O'Reilly of guides - O'Reilly suck at guides; they are reference books) only touches briefly on using Loaders, so I've done some searching, some more reading, and so forth.
I've basically concluded a Loader is little more than a wrapper: it just does things on a different thread and gives you a callback to process the results in. It gives you three steps: initiating the query, using the results of the query, and resetting the query.
This seems like quite a thin wrapper, so question 1:
Why would I want to use Loaders?
I sense I may be missing something, you see: most "utilities" like this in Android are really useful if you go with the grain, so to speak, and as I said, Loaders seem like a pretty thin wrapper. They also force me to have callback names, which could become tedious if there are multiple queries going on.
http://developer.android.com/reference/android/content/Loader.html
Reading that points out that "they ought to monitor the data and act upon changes" - this sounds great, but it isn't obvious how that is actually done (I am thinking about database tables, though).
Presentation
How should this alter the look of my application? Should I put up a loading spinner (I'm not sure of the name; I've never needed one before) after a certain amount of time post activity creation? So the fragment is blank, but if X time elapses without the loader reporting back, I show a spinner?
Other operations
Loaders are clearly useless for updates and such; their name alone tells one this much, so any nasty updates would have to be wrapped by my own system for shunting work to a worker thread. This further leads me to wonder why I would want Loaders.
What I think my answer is
Some sort of wrapper (at some level, content provider or otherwise) to do stuff on a worker thread will mean that the upgrade takes place on that thread, and this solves the problem because... well, that's not on the main thread.
If I write my own, I can then (if I want to) ensure queries happen in a certain order and use my own data structures (rather than Bundles); it seems that I have better control.
What I am really looking for
Discussion. I find that when one knows why things are the way they are, one makes fewer mistakes and just generally has more confidence. I am sure there's a reason Loaders exist, and there will be some pattern that all of Android lends itself towards; I want to know why this is.
Example:
With Adapters (for ListViews), it's not immediately obvious how one keeps track of rows (insert), or why one must specify a default style (and why ArrayAdapter uses toString) when most of the time (in my experience, dare I say) it is subclassed. Reading the source code gives one an understanding of what the Adapter must actually do; then I challenge myself: "Can I think of a (better) system that meets these requirements?" Usually (and hopefully) my answer converges on how it's actually done.
Thus the "Hello World Gulf" is crossed.
I look forward to reading answers and any linked text-walls on the matter.
You shouldn't use Loaders directly, but rather LoaderManager.
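A minimal sketch of that route with a CursorLoader (the URI is a placeholder). Note that CursorLoader registers a content observer on the URI, which is the "monitor the data and act upon changes" part: onLoadFinished is re-delivered when the underlying data changes, and it runs on the main thread:

```java
import android.app.Activity;
import android.app.LoaderManager;
import android.content.CursorLoader;
import android.content.Loader;
import android.database.Cursor;
import android.net.Uri;
import android.os.Bundle;

public class RowsActivity extends Activity
    implements LoaderManager.LoaderCallbacks<Cursor> {

  private static final Uri ROWS_URI = Uri.parse("content://com.example/rows");

  @Override
  protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    // Kicks off (or reconnects to) loader 0; the query runs off the UI thread.
    getLoaderManager().initLoader(0, null, this);
  }

  @Override
  public Loader<Cursor> onCreateLoader(int id, Bundle args) {
    return new CursorLoader(this, ROWS_URI, null, null, null, null);
  }

  @Override
  public void onLoadFinished(Loader<Cursor> loader, Cursor data) {
    // Main thread, fresh results; swap the Cursor into an adapter here.
  }

  @Override
  public void onLoaderReset(Loader<Cursor> loader) {
    // Release any references to the old Cursor.
  }
}
```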
I am trying to design a simple FIX message encoder and decoder to encode (convert to FIX) and decode (convert from FIX) my business domain Order objects. I have designed something, but I am not able to achieve the beautiful design I want. I wanted to see if others who have experience building this kind of thing have better design ideas.
This is what I roughly have: a business object Order and a QuickFIX object Message.
I need to generate NewOrder/Cancel/Replace messages, and the message could be different for different exchanges.
I can have ReplaceEncoder --> NewOrderEncoder --> AbstractEncoder, CancelEncoder --> AbstractEncoder.
But if I want another dimension to this, like having custom message generation for different exchanges, then it results in too many combinations of hierarchies.
Is my only bet to mundanely write different code for different exchanges? How do others achieve this? Thanks.
I think you will probably come across a similar problem to the one we have: every FIX implementation is different. Some use 4.2, others 4.4; some use certain tags, others ignore them; some use many of their own tags, others use very few. What we have done is create general FIX sessions, with subclasses for FIX 4.2 and 4.4, and then subclasses for each specific session (i.e. individual brokers). That gives us a reasonable amount of code reuse for sending and receiving FIX messages, with just the specifics changed for things like handling account names, passwords, etc.
For message generation, we have a factory method that returns an adapter. All the adapters have the same API, which converts our business Order object into a FIX Message object. Of course, each adapter is specific to the API of the broker. I guess we could probably reuse some code between the adapters, but currently we don't.
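In case it helps, this is roughly the shape of the factory-plus-adapter arrangement (Order, the broker ids, and the tag choices are simplified placeholders; only quickfix.Message and the field classes are real QuickFIX/J types):

```java
import quickfix.Message;
import quickfix.field.ClOrdID;
import quickfix.field.MsgType;

// Simplified stand-in for the business order object.
final class Order {
  final String clientOrderId;
  Order(String clientOrderId) { this.clientOrderId = clientOrderId; }
}

interface OrderAdapter {
  Message toNewOrder(Order order);
}

// One adapter per broker; each knows that broker's quirks.
final class BrokerAAdapter implements OrderAdapter {
  public Message toNewOrder(Order order) {
    Message msg = new Message();
    msg.getHeader().setField(new MsgType(MsgType.ORDER_SINGLE));
    msg.setField(new ClOrdID(order.clientOrderId));
    // broker-A-specific tags would be set here
    return msg;
  }
}

final class OrderAdapters {
  static OrderAdapter forBroker(String brokerId) {
    if ("BROKER_A".equals(brokerId)) {
      return new BrokerAAdapter();
    }
    throw new IllegalArgumentException("unknown broker: " + brokerId);
  }
}
```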
Is my only bet to mundanely write different code for different exchanges?
Certainly not. In a FIX message there are compulsory and non-compulsory fields. You cannot negotiate on the required fields, because then you could not guarantee the authenticity and completeness of the messages. Now, I am not saying this is impossible; many counterparties have their own specific user-level agreements with exchanges for their own specific messages.
With QuickFIX, the XML data dictionary, from which the engine confirms the completeness of messages, is in your hands. Tweak it for your own requirements. You would certainly have multiple sessions. I am not sure if this is possible (I haven't tried it myself), but do different sessions allow different data dictionaries? If yes, then use them for different counterparties. If that isn't possible, one way which crosses my mind is to add extra code for processing your specific fields (not the whole message) in messages expected from certain counterparties.
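For what it's worth, in QuickFIX/J the DataDictionary setting can be given per [SESSION] block in the session settings file, so per-counterparty dictionaries are worth trying. A sketch (the file names and comp ids are made up):

```
[DEFAULT]
ConnectionType=initiator

[SESSION]
BeginString=FIX.4.4
SenderCompID=OURFIRM
TargetCompID=BROKER_A
DataDictionary=FIX44-brokerA.xml

[SESSION]
BeginString=FIX.4.2
SenderCompID=OURFIRM
TargetCompID=BROKER_B
DataDictionary=FIX42-brokerB.xml
```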
At one place where I worked, we were using something along these lines: receive whatever version you may, but once the message is received, convert it into a specific version of the FIX message which only exists inside your system. So your engine basically reads only one FIX version of messages. The added complexity is that you have to code a converter. I am not sure how feasible that is for you.
FIX is an extraordinarily slippery protocol when it comes to message definitions.
In practice, every institution that offers a FIX interface has made modifications to the default message set. That means, for instance, a FIX4.4 NewOrderSingle message from counterparty A may have different fields than one from counterparty B.
In fact, counterparty A may have made up some fields whole-cloth and added them in. For any new counterparty, there's a chance you'll encounter fields that you've never seen before.
I've written a few adapters for a few different exchanges, and unfortunately, you're really forced to handle them individually. You may be able to capitalize on some commonalities, but you can't make any assumptions on that until you've reviewed their FIX interface's specs.
So, short answer to your question:
Is my only bet to mundanely write different code for different exchanges?
Yep, pretty much.
What we ended up doing was writing a base FIX layer that applies only the required FIX tags. In the FIX spec, certain tags are flagged as required for each message type.
Once this message has been created, we apply a filter to the message that is specific to a broker and instrument type.
I.e., if you trade options and equities with Goldman and JPMorgan, you'd write the following filters:
Goldman-Equity
Goldman-Option
JPMorgan-Equity
JPMorgan-Option
Each would apply vendor- and instrument-specific fields to the base message, as sketched below.
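A sketch of the shape (the filter names mirror the list above; everything except quickfix.Message is our own):

```java
import quickfix.Message;

interface MessageFilter {
  void apply(Message message);
}

// One filter per vendor/instrument combination.
final class GoldmanEquityFilter implements MessageFilter {
  public void apply(Message message) {
    // set Goldman- and equity-specific tags on the base message
  }
}

final class JpMorganOptionFilter implements MessageFilter {
  public void apply(Message message) {
    // set JPMorgan- and option-specific tags
  }
}

final class FixMessagePipeline {
  // The base message carries only the tags the FIX spec marks required;
  // the filter layers on the counterparty specifics.
  static Message build(Message baseMessage, MessageFilter filter) {
    filter.apply(baseMessage);
    return baseMessage;
  }
}
```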