Akka typed actors in Java

I don't understand why one should not use TypedActors in Akka. Using reflection (well... instanceof) to compensate for the lack of pattern matching in Java is quite ugly.
As far as I understand, TypedActors should act as a gate between the "Akka world" and the "non-Akka world" of your software. But why should we throw away all OO principles and just use reflection?
Why wouldn't you want to use an actor and know exactly what it should respond to? Or, for the sake of keeping the actor model, why not create a message hierarchy that uses double dispatch to activate the right method in the actor (and I know you shouldn't pass actors as parameters, and should use ActorRef instead)?
DISCLAIMER: I'm new to Akka and this model, and I haven't written a single line of code using Akka, but just reading the documentation is giving me a headache.

Before we get started: the question is about the deprecated "typed actors" module, which will soon be replaced with akka-typed, a far superior take on the problem that avoids the shortcomings explained below - please do have a look at akka-typed if you're interested in typed actors!
I'll enumerate a number of downsides of using the typed actors implementation you refer to. Please do note, however, that we have just merged a new akka-typed module, which brings type safety back to the world of Akka actors. For the sake of this post, I will not go in depth into the reasons developing the typed version was such a tough challenge; for now, let's answer the question of "why not use the (old) typed actors".
Firstly, they were never designed to be the core of the toolkit. They are built on top of the messaging infrastructure Akka provides. Note that it is thanks to that messaging infrastructure that we're able to achieve location transparency and Akka's well-known performance. Typed actors heavily use reflection and JDK proxies to translate method calls to and from message sends. This is very expensive time-wise, and degrades performance around 10-fold compared to plain Akka actors; see below for a "ping pong" benchmark (implemented in both styles: sender tells to actor, actor replies - 100,000 times):
Unit = ops/ms
Benchmark                                                  Mode    Samples  Mean           Mean error    Units
TellPingPongBenchmark.tell_100000_msgs                     thrpt   20       119973619.810  79577253.299  ops/ms
JdkProxyTypedActorTellPingPongBenchmark.tell_100000_msgs   thrpt   20       16697718.988   406179.847    ops/ms

Unit = us/op
Benchmark                                                  Mode    Samples  Mean    Mean error  Units
TellPingPongBenchmark.tell_100000_msgs                     sample  133647   1.223   0.916       us/op
JdkProxyTypedActorTellPingPongBenchmark.tell_100000_msgs   sample  222869   12.416  0.045       us/op
(Benchmarks are kept in akka/akka-bench-jmh and run using the OpenJDK JMH tool, via the sbt-jmh plugin.)
Secondly, using methods to abstract over distributed systems is just not a good way of going about it (oh, how I remember RMI... let's not go there again). Using something that "looks like a method" makes you stop thinking about message loss, reordering, and all the things which can and do happen in distributed systems. It also encourages (makes it "too easy to do the wrong thing") signatures like def getThing(id: Int): Thing - which generates blocking code, which is horrible for performance! You really do want to stay asynchronous and responsive, which is why you'd end up with loads of Futures when trying to work properly with these (proxy-based) typed actors.
Lastly, you basically lose one of the main actor capabilities. The 3 canonical operations an actor can perform are 1) send messages, 2) start child actors, and 3) change its own behaviour based on received messages (see Carl Hewitt's original paper on the actor model). The 3rd capability is used to beautifully model state machines. For example, you can say (in plain Akka actors) become(active) and then become(allowOnlyPrivileged) to switch between receive implementations - making finite state machine implementations (we also have a DSL for FSMs) a joy to work with. You cannot express this nicely in JDK-proxied typed actors, because you cannot change the set of exposed methods. This is a major downside once you get into thinking and modeling using state machines.
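To illustrate that third capability, here is a minimal sketch of behaviour switching with become(), assuming the plain Akka AbstractActor API (roughly Akka 2.5); the Toggler actor and its messages are invented for illustration:

    import akka.actor.AbstractActor;

    // A toy actor that switches between two receive implementations.
    public class Toggler extends AbstractActor {

      private Receive active;
      private Receive inactive;

      public Toggler() {
        active = receiveBuilder()
            .matchEquals("toggle", msg -> getContext().become(inactive))
            .matchAny(msg -> getSender().tell("I'm active", getSelf()))
            .build();

        inactive = receiveBuilder()
            .matchEquals("toggle", msg -> getContext().become(active))
            .matchAny(msg -> getSender().tell("I'm inactive", getSelf()))
            .build();
      }

      @Override
      public Receive createReceive() {
        return active; // initial behaviour
      }
    }

A proxy-based typed actor cannot do this, because its set of exposed methods is fixed by the interface.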
A New Hope (Episode 1): Please do have a look at the upcoming akka-typed module authored by Roland Kuhn (a preview is to be included in the 2.4 release soon); I'm pretty sure you'll like what you'll find there type-safety-wise. Also, that implementation will eventually be even faster than the current untyped actors (omitting implementation details here as the answer got pretty long already - short version: we'll basically remove a load of allocations thanks to the new implementation).
I hope you'll enjoy this thorough answer. Feel free to ask follow-up questions in the comments here or on akka-user - our official mailing list. Happy Hakking!

Typed Actors provide you with a static contract defined in the terms of your domain: you can name their messages (which will be delegated to an underlying implementation and executed asynchronously) after actions which make sense in your domain, avoiding the use of reflection on your part (TypedActors use JDK proxies under the hood, so there is still reflection going on; you just don't have to worry about it), and you gain type-checking of the arguments passed to the active object/typed actor and of its return types. The documentation is pretty clear on this, but I know that for those new to actor-based concurrency, additional examples always help, so feel free to ask additional questions/comments if you are still having trouble grokking the difference.
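For a concrete picture of such a contract, here is a minimal sketch loosely based on the Squarer example from the Akka documentation for the (now deprecated) TypedActor API; treat the details as an assumption about that API rather than a definitive reference:

    import akka.actor.ActorSystem;
    import akka.actor.TypedActor;
    import akka.actor.TypedProps;
    import akka.dispatch.Futures;
    import scala.concurrent.Future;

    // The domain-level contract: callers see ordinary methods, not messages.
    interface Squarer {
      void squareDontCare(int i);     // fire-and-forget, becomes a tell
      Future<Integer> square(int i);  // non-blocking request-reply
    }

    class SquarerImpl implements Squarer {
      @Override
      public void squareDontCare(int i) {
        int sq = i * i; // do something with it
      }

      @Override
      public Future<Integer> square(int i) {
        return Futures.successful(i * i);
      }
    }

    public class TypedActorDemo {
      public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("demo");
        // The JDK proxy turns method calls into asynchronous message sends.
        Squarer squarer = TypedActor.get(system).typedActorOf(
            new TypedProps<SquarerImpl>(Squarer.class, SquarerImpl.class));
        squarer.squareDontCare(10);
      }
    }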

But do you guys realize that there are a huge number of companies that don't have expert developers, but do have a big infrastructure to scale horizontally as much as needed? Performance is not always the thing to go for; instead you want to be responsive, message-driven, elastic and resilient, which, thanks to typed actors, we now have - used by developers that don't know anything about Akka or Reactive Programming.
Don't get me wrong, I use pure Akka Typed in my day-to-day work, but for delivery teams we have this framework that uses typed actors, and our consumers use it as POJOs without knowing that they are coding in a reactive system. And that's an awesome feature.

Related

Implementing retractions in Google Dataflow

I read the "The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in MassiveScale, Unbounded, Out of Order Data Processing" paper. Alas, the SDK does not yet expose the accumulating & retracting triggering mode (section 2.3).
I was wondering if there was a workaround for getting similar semantics?
I have been reading the source and have figured out that StateTag or StateNamespace may be the way I can store the "last emitted value of the window" and hence can be used to calculate the retraction message down the pipeline. Is this the correct path, or are there other classes/ways I can/should look at?
The upcoming state API is indeed your best bet for emulating retractions. The classes you mentioned are part of the state API, but everything in the com.google.cloud.dataflow.sdk.util package is for internal use only; we technically make no guarantees that those APIs won't change drastically, or that they won't simply remain unreleased. That said, releasing that API is on our roadmap, and I'm hopeful we'll get it released relatively soon.
One thing to keep in mind: all the code downstream of your custom retractions will need to be able to differentiate them from normal records. This is something we'll do automatically for you once bona fide retraction support is ready, but in the meantime, you'll just need to make sure all the code you write that might receive a retraction knows how to recognize and handle it as such.
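A minimal sketch of one way to do that tagging, using nothing beyond plain Java (the MaybeRetraction name and shape are entirely illustrative, not part of the SDK):

    // Wraps every element flowing through the affected part of the pipeline
    // so downstream code can tell retractions apart from normal records.
    public class MaybeRetraction<T> implements java.io.Serializable {
      public final T value;
      public final boolean isRetraction;

      private MaybeRetraction(T value, boolean isRetraction) {
        this.value = value;
        this.isRetraction = isRetraction;
      }

      public static <T> MaybeRetraction<T> of(T value) {
        return new MaybeRetraction<>(value, false);
      }

      public static <T> MaybeRetraction<T> retraction(T value) {
        return new MaybeRetraction<>(value, true);
      }
    }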

How many GWT services

Starting a new GWT application and wondering if I can get some advice from someone's experience.
I have a need for a lot of server-side functionality through RPC services...but I am wondering where to draw the line.
I can make a service for every little call or I can make fewer services which handle more operations.
Let's say I have Customer, Vendor and Administration services. I could make 3 services or a service for each function in each category.
I noticed that much of the service implementation does not provide compile-time help and is at times troublesome to get going, but it provides good modularity. With a larger service, I don't have the modularity I described, but I avoid the service-creation issues and reduce the number of entries in my web.xml file.
Is there a resource issue with using a lot of services? What is the best practice to determine what level of granularity to use?
In my opinion, you should make an RPC service for "logical" things.
In your example:
one for customers, another for vendors, and a third one for admin.
That way, you keep services grouped by meaning, and you will have only a few lines to maintain in the web.xml file (and that is good news :-).
More seriously, RPC services are usually wrappers that call database stuff, so you could even make a single 'MagicBlackBoxRpc' with a single web.xml entry and thousands of operations!
But making a separate RPC service for admin operations, as you suggest, seems like a good idea.
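To make the grouping concrete, here is a minimal sketch of one such "logical" service using the standard GWT RPC interfaces (the Customer type and the method names are invented for illustration):

    import com.google.gwt.user.client.rpc.RemoteService;
    import com.google.gwt.user.client.rpc.RemoteServiceRelativePath;
    import java.util.List;

    // Illustrative DTO; real fields omitted.
    class Customer implements java.io.Serializable {}

    // One service per logical area: all customer-related calls live together,
    // and web.xml needs only a single servlet mapping for them.
    @RemoteServiceRelativePath("customer")
    public interface CustomerService extends RemoteService {
      Customer findCustomer(long id);
      List<Customer> listCustomers();
      void saveCustomer(Customer customer);
    }

The client would call the matching CustomerServiceAsync interface (same methods with AsyncCallback parameters), as usual for GWT RPC.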
Read general advice on "how big should a class be?", which is available in any decent software engineering book.
In my opinion:
One class = One Subject (i.e. a group of related functions or behaviours)
A class should not deal with more than one subject. For example:
Class PersonDao -> Subject: interface between the DB and Java code.
It WILL NOT:
- cache Person instances
- update fields automatically (for example, update the field 'lastModified')
- find the database
Why?
Because for all these other things, there will be other classes doing it! Respectively:
- a cache around the PersonDao is concerned with the efficient storage of information to avoid hitting the DB more often than necessary
- the Service class which uses the DAO is responsible for modifying anything that needs to be modified automagically.
- Finding the database is the responsibility of the DataSource (usually part of a framework like Spring), and your DAO should NOT be worried about that. It's not part of its subject.
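A minimal sketch of that separation, with purely illustrative names and nothing beyond plain Java:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class Person {}

    interface PersonDao {
      Person findById(long id);
    }

    // The DAO's one subject: talking to the database. Nothing else.
    class JdbcPersonDao implements PersonDao {
      @Override
      public Person findById(long id) {
        // ... plain JDBC/ORM lookup goes here ...
        return null; // placeholder
      }
    }

    // Caching is a separate subject, so it lives in a separate class that
    // wraps the DAO instead of being baked into it.
    class CachingPersonDao implements PersonDao {
      private final PersonDao delegate;
      private final Map<Long, Person> cache = new ConcurrentHashMap<>();

      CachingPersonDao(PersonDao delegate) {
        this.delegate = delegate;
      }

      @Override
      public Person findById(long id) {
        return cache.computeIfAbsent(id, delegate::findById);
      }
    }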
TDD is the answer
The need for this kind of separation becomes really clear when you do TDD (Test-Driven Development). Try to do TDD on bad code where a single class does all sorts of things! You can't even get started with one unit test! So this is my final hint: use TDD and that will tell you how big a class should be.
I think the thing to optimize for is that you can accomplish a result in one round trip to the server. I have an ad-hoc collection of methods on my service object, one for each situation the client finds itself in when it has to get something done. You do not want the client to RPC to the server several times in a row while the user is sitting there waiting.
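For illustration, a minimal sketch of that "one situation, one round trip" idea in GWT terms (all names invented):

    import com.google.gwt.user.client.rpc.RemoteService;

    // Illustrative payload bundling everything one screen needs:
    // customer summary, open orders, notifications, and so on.
    class DashboardData implements java.io.Serializable {}

    public interface DashboardService extends RemoteService {
      // One situation the client finds itself in -> one method, one round trip,
      // instead of several sequential RPCs while the user waits.
      DashboardData loadDashboard(long userId);
    }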
REST makes things orthogonal, but orthogonality has a cost: there is a reason that the frequently used verbs in languages are irregular. In terms of maintaining a clean orthogonal structure in your app, make sure your schema is well designed; that is, each class should have semantics orthogonal to those of the other classes. When the semantics of each RPC call can be stated cleanly in the schema, there will be no confusion as to what they mean, even if they aren't RESTfully ideal.

Parallel between LMAX Disruptor and Rx Framework concepts?

As I read here http://mechanitis.blogspot.fr/2011/06/dissecting-disruptor-how-do-i-read-from.html
"for every individual item, the Consumer simply says "Let me know when you've got more than this number", and is told in return how many more entries it can grab."
Doesn't this relate to the Rx Framework concept as exposed by Erik Meijer (http://www.youtube.com/watch?v=8Mttjyf-8P4)?
If yes, could the Rx Framework be helpful for implementing a similar piece of software?
Nice question, I've been wondering about this myself, for one of my current projects.
I don't feel greatly qualified to give a definitive answer, however:
They are designed to scratch different itches.
Disruptor is clearly designed for performance first, as close to the metal as possible. It doesn't do anything fancy beyond its core job.
Rx is higher-level: it is "LINQ to events". It allows you to do nice things with "events" that you couldn't do with normal framework events (you can't filter a standard event and then continue propagating it as an event).
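That point is about Rx.NET, but the same composition works in RxJava; a minimal sketch (assuming RxJava 3 on the classpath):

    import io.reactivex.rxjava3.subjects.PublishSubject;

    public class RxFilterDemo {
      public static void main(String[] args) {
        PublishSubject<Integer> clicks = PublishSubject.create();

        clicks.filter(x -> x % 2 == 0)   // drop odd values...
              .map(x -> "even: " + x)    // ...and keep composing the stream
              .subscribe(System.out::println);

        clicks.onNext(1); // filtered out
        clicks.onNext(2); // prints "even: 2"
      }
    }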
Semantic differences
As the originator of Disruptor.Net pointed out here:
The interface matches, but I think the semantics behind Rx do not:
- an exception (OnError) terminates the stream; this is not the case with the Disruptor
- you cannot subscribe to the Disruptor while it's hot: observers would have to be set up before "starting" the Disruptor, which does not work very well with operators like retry, for instance, which will re-subscribe in case of error
- lots of operators do not make sense with the Disruptor or would just not work
Having said that, he was (at least at one time) thinking about integration between Disruptor.Net, TPL Dataflow and Rx.
Here is another page where someone asks the same question, the page concludes with:
Disruptor is in fact more like TPL DataFlow in my opinion.
Without knowing the Rx framework, you could be right. However, Disruptor.Net is designed to be a port of the Java version, so it will be as similar as possible. Given that the original doesn't use Rx, it would add lots of rework and possibly performance issues to use a different library.

Task scheduling frameworks - not thread scheduling!

I'm working on a Java application which should allow users to optimize their daily schedule. For that, I need a framework that helps calculate optimal times for "tasks", taking into account:
Required resources and resource usage limits
Dependencies between tasks (can do with only F->S relations though)
Earliest and latest start-finish times, slack times
Baseline vs. actual times - allowing users to report actual start and finish times, and updating the rest of the tasks accordingly
Some clarifications: I am looking for neither a framework to draw these Gantt charts, nor a framework that deals with one specific problem domain (such as classrooms), and definitely not a framework that deals with thread scheduling.
Thanks!
I don't think there is a framework that will suit your needs out of the box. I know you said you're not looking for a job/thread scheduler, but I think your best bet is probably to roll your own optimization/prioritization code around a "dumb" job/thread scheduling framework like Quartz (or whatever you have in place). If you go with Quartz, the API can probably provide you with some information useful for items 3 and 4 of your optimization criteria. Additionally, Quartz has a job "priority" concept, so once you've computed the optimized priority, it should make scheduling the execution easy.
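As a rough illustration, here is a minimal sketch of handing an externally computed start time and priority to Quartz (assuming Quartz 2.x; everything except the Quartz API itself is invented):

    import org.quartz.*;
    import org.quartz.impl.StdSchedulerFactory;
    import java.util.Date;

    public class OptimizedScheduling {
      // Once your own optimizer has computed a start time and priority for a
      // task, hand the actual execution to Quartz.
      public static void schedule(Class<? extends Job> jobClass,
                                  Date optimizedStart,
                                  int computedPriority) throws SchedulerException {
        Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
        scheduler.start();

        JobDetail job = JobBuilder.newJob(jobClass)
            .withIdentity("task-" + jobClass.getSimpleName())
            .build();

        Trigger trigger = TriggerBuilder.newTrigger()
            .startAt(optimizedStart)         // start time computed upstream
            .withPriority(computedPriority)  // Quartz's job "priority" concept
            .build();

        scheduler.scheduleJob(job, trigger);
      }
    }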
If you do find a framework that does what you ask, please post back here -- I'm sure there are others who could use something similar.
You could look into project management software. It seems you need it written in Java with the ability to modify the code. That really narrows down the list, but I did a quick scan and see at least 2 that could help (Endeavour and Project.net).
Perhaps what you need is something like an evolutionary/genetic algorithm to generate an optimized schedule?
If yes, you may have a look at the Watchmaker Framework:
http://watchmaker.uncommons.org/
With an evolutionary/genetic algorithm, it randomly generates a pool of schedules. Your main focus will be defining the scoring criteria used to evaluate each generated schedule. Then let the generated schedules evolve from generation to generation until one is optimal enough for you.
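To make the scoring idea concrete, here is a minimal sketch of a Watchmaker fitness evaluator; the Schedule type and its penalty methods are invented for illustration:

    import org.uncommons.watchmaker.framework.FitnessEvaluator;
    import java.util.List;

    // Illustrative candidate type; a real one would hold task start times,
    // resource assignments, and so on.
    interface Schedule {
      int resourceOverAllocations();
      int dependencyViolations();
      double totalSlackTime();
    }

    // The scoring criteria expressed as a FitnessEvaluator: each generated
    // schedule is scored by accumulating penalties.
    class ScheduleEvaluator implements FitnessEvaluator<Schedule> {
      @Override
      public double getFitness(Schedule candidate,
                               List<? extends Schedule> population) {
        double penalty = 0;
        penalty += candidate.resourceOverAllocations() * 100; // hard constraint
        penalty += candidate.dependencyViolations() * 100;    // hard constraint
        penalty += candidate.totalSlackTime();                // soft preference
        return penalty;
      }

      @Override
      public boolean isNatural() {
        return false; // false = lower scores are better (we minimize penalties)
      }
    }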

How easily customizable are SAP industry-specific solutions?

First of all, I have a very superficial knowledge of SAP. According to my understanding, they provide a number of industry-specific solutions. The concept seems very interesting, and I work on something similar for the banking industry. The biggest challenge we face is how to adapt our products for different clients. Many concepts are quite similar across enterprises, but there are always some client-specific requirements that have to be resolved through configuration and customization. Often this requires reimplementing and developing customer-specific features.
I wonder how efficient SAP products are in this sense. How much effort has to be spent in order to adapt the product so it satisfies specific customer needs? What are the mechanisms used (configuration, programming, etc.)? How would this compare to developing a custom solution from scratch? Are they capable of leveraging and promoting best practices?
Disclaimer: I'm talking about the ABAP-based part of SAP software only.
Disclaimer 2, ref. PATRY's response: HR is quite a bit different from the rest of the SAP/ABAP world. I do feel rather competent as a general-purpose ABAP developer, but HR programming is so far off my personal radar that I've never even tried to understand what they're doing there. %-|
According to my understanding, they provide a number of industry specific solutions.
They do - but be careful when comparing your own programs to these solutions. For example, IS-H (SAP for Healthcare) started off as an extension of the SD (Sales & Distribution) system, but has become much more since then. While you could technically use all of the techniques they use for their IS, you really should ask a competent technical consultant before you do - there are an awful lot of pitfalls to avoid.
The concept seems very interesting and I work on something similar for banking industry.
Note that a SAP for Banking IS already exists. See here for the documentation.
The biggest challenge we face is how to adapt our products for different clients.
I'd rather rephrase this as "The biggest challenge is to know where the product is likely to be adapted and to structurally prepare it for adaptation." The adaptation techniques are well researched and easily employed once you know where the customer is likely to deviate from your idea of the perfect solution.
How much effort has to be spent in order to adapt the product so it satisfies specific customer needs?
That obviously depends on how far the customer's needs deviate from the standard path - but that won't help you. With an SAP-based system, you always have three choices. You can try to customize the system within its limits. Customizing basically means tweaking settings (think configuration tables, tens of thousands of them) and adding stuff (program fragments, forms, ...) in places that are intended for it. Technology - see below.
Sometimes customizing isn't enough - you can develop things additionally. A very frequent requirement is some additional reporting tool. With the SAP system, you get the entire development environment delivered - the very same tools that all the standard applications were written with. Your programs can peacefully coexist with the standard programs and even use common routines and data. Of course you can really screw things up, but show me a real programming environment where you can't.
The third option is to modify the standard implementations. Modifications are like a really sharp two-edged kitchen knife - you might be able to cook really cool things in half of the time required by others, but you might hurt yourself really badly if you don't know what you're doing. Even if you don't really intend to modify the standard programs, it's very comforting to know that you could and that you have full access to the coding.
(Note that this is about the application programs only - you have no chance whatsoever to tweak the kernel, but fortunately, that's rarely necessary.)
What are the mechanisms used (configuration, programming etc)?
Configuration is mostly about configuration tables with more or less sophisticated dialog applications. For the programming part of customizing, there's the extension framework - see http://help.sap.com/saphelp_nw70ehp1/helpdata/en/35/f9934257a5c86ae10000000a155106/frameset.htm for details. It's basically a controlled version of dependency injection. As a solution developer, you have to anticipate the extension points, define the interface that has to be implemented by the customer code, and then embed the call in your code. As a project developer, you have to create an implementation that adheres to the interface and activate it. The basic runtime system takes care of gluing the two programs together; you don't have to worry about that.
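To make the "controlled dependency injection" idea concrete outside of ABAP, here is a rough Java analogy (this is not SAP code; all names are invented): the solution developer anticipates the hook, the customer supplies an implementation, and the runtime glues them together.

    import java.util.ServiceLoader;

    // Defined by the solution developer: the anticipated extension point.
    interface PricingEnhancement {
      double adjustPrice(double standardPrice, String customerGroup);
    }

    class PricingEngine {
      double finalPrice(double standardPrice, String customerGroup) {
        double price = standardPrice;
        // The runtime discovers active customer implementations (here via
        // ServiceLoader, i.e. registered under META-INF/services) and calls them.
        for (PricingEnhancement ext : ServiceLoader.load(PricingEnhancement.class)) {
          price = ext.adjustPrice(price, customerGroup);
        }
        return price;
      }
    }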
How would this compare to developing custom solution from scratch?
IMHO this depends on how much of the solution is the same for all customers and how much of it has to be adapted. It's really hard to be more specific without knowing more about what you want to do.
I can only speak for the Human Resources component, but this is a component where there is a lot of difference between customers, based on a common need.
First, most of the time you set the value for a group, and then associate the object (person, location...) with a group depending on one or two values. This is akin to an indirection, and allows for great flexibility, as you can change the association for a given location without changing the others. In a few cases, there is even a 3-level indirection...
Second, there is a lot of customization that is nearly programming. Payroll and administrative operations are first-class examples of this. In the latter case, you get a table with the operation (hiring, for example), the event (creation, modification...), a code for the action (I for test, F to call a function, O for a standard operation) and a text field describing the parameters of a function ("C P0001, begda, endda" to create a structure P0001 with default values).
Third, you can also use such a table to indicate a function or class (ABAP-OO) that will be called dynamically. You get a developer to create this function or class, and then indicate it in the table. This is a way to replace one functionality with another, or to extend it. It is used extensively in the ESS/MSS.
Last, there are also extension points or files that you can modify. This is nearly the same as the previous one, except that you don't need to indicate the change: the file is always used (ZXPADU01/02 for HR modification of infotypes).
Hope this helps,
Guillaume PATRY
