I am building an application in Play Framework that has to do some intense file parsing. This parsing involves parsing multiple files, preferably in parallel.
A user uploads an archive that gets unzipped, and the files are stored on the drive.
In that archive there is a file (let's call it main.csv) that has multiple columns. One such column is the name of another file from the archive (like subPage1.csv). This column can be empty, so that not all rows from the main.csv have subpages.
Now, I start an Akka Actor to parse the main.csv file. In this actor, using @Inject, I have another ActorRef:
public class MainParser extends AbstractActor {
    @Inject
    @Named("subPageParser")
    private ActorRef subPageParser;
    public Receive createReceive() {
        ...
        if (column[3] != null) {
            subPageParser.tell(column[3], getSelf());
        }
    }
}
SubPageParser Props:
public static Props getProps(JPAApi jpaApi) {
    return new RoundRobinPool(3).props(Props.create((Class<?>) SubPageParser.class, jpaApi));
}
Now, my question is this: considering that a subPage may take 5 seconds to parse, will I be using a single instance of SubPageParser, or will there be multiple instances that do the processing in parallel?
Also, consider another scenario, where the names are stored in the DB, and I use something like this:
List<String> names = dao.getNames();
for (String name : names) {
    subPageParser.tell(name, null);
}
In this case, considering that the subPageParser ActorRef is obtained using Guice @Inject as before, will I do parallel processing?
If I am doing processing in parallel, how do I control the number of Actors that are being spawned? If I have 1000 subPages, I don't want 1000 Actors. Also, their lifetime may be an issue.
NOTE:
I have an ActorsModule like this, so that I can use @Inject and not Props:
public class ActorsModule extends AbstractModule implements AkkaGuiceSupport {
    @Override
    protected void configure() {
        bindActor(MainParser.class, "mainparser");
        Function<Props, Props> props = p -> SubPageParser.getProps();
        bindActor(SubPageParser.class, "subPageParser", props);
    }
}
UPDATE: I have modified the code to use a RoundRobinPool. However, this does not work as intended. I specified 3 as the number of instances, but I get a new object for each parse request in the if.
Injecting an actor like you did will lead to one SubPageParser per MainParser. While you might send 1000 messages to it (using tell), they will get processed one by one while the others are waiting in the mailbox to be processed.
With regard to your design, you need to be aware that injecting an actor like that will create another top-level actor rather than create the SubPageParser as a child actor, which would allow the parent actor to control and monitor it. Play Framework has support for injecting child actors, as described in its documentation: https://www.playframework.com/documentation/2.6.x/JavaAkka#Dependency-injecting-child-actors
While you could get Akka to use a certain number of child actors to distribute the load, I think you should question why you have used actors in the first place. Most problems can be solved with simple Futures. For example, you can configure a custom thread pool to run your Futures with and have them do the work at whatever level of parallelism you wish: https://www.playframework.com/documentation/2.6.x/ThreadPools#Using-other-thread-pools
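As a rough illustration of the Futures approach, here is a minimal plain-Java sketch that caps parallelism with a fixed-size thread pool. The class name SubPageParsing and the parseSubPage placeholder are assumptions made for illustration, not part of the question's code. In a Play application you would more likely run the futures on a custom execution context configured as described on the ThreadPools page linked above, rather than create a pool by hand.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class SubPageParsing {
    // Hypothetical stand-in for the real parsing work; it just returns the name length.
    static int parseSubPage(String fileName) {
        return fileName.length();
    }

    // Parses all sub-pages on a pool with a fixed number of threads,
    // so at most `parallelism` files are parsed at the same time.
    static List<Integer> parseAll(List<String> fileNames, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<CompletableFuture<Integer>> futures = fileNames.stream()
                .map(name -> CompletableFuture.supplyAsync(() -> parseSubPage(name), pool))
                .collect(Collectors.toList());
            // join() blocks until each result is ready; results keep the input order.
            return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
        } finally {
            pool.shutdown();
        }
    }
}
```

With 1000 file names and a parallelism of 3, this still uses only three worker threads; the remaining tasks wait in the pool's queue.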
I am reading the book "Clean Code" by Robert C. Martin and trying to apply the SRP (single responsibility principle), and I have some questions about it. I have a service in my application, and I do not know how to refactor it so that it follows the right approach. For example, I have this service:
public interface SendRequestToThirdPartySystemService {
    void sendRequest();
}
What does it do if you look at the class name? - send a request to the third party system. But I have this implementation:
@Slf4j
@Service
public class SendRequestToThirdPartySystemServiceImpl implements SendRequestToThirdPartySystemService {
    @Value("${topic.name}")
    private String topicName;
    private final EventBus eventBus;
    private final ThirdPartyClient thirdPartyClient;
    private final CryptoService cryptoService;
    private final Marshaller marshaller;
    public SendRequestToThirdPartySystemServiceImpl(EventBus eventBus, ThirdPartyClient thirdPartyClient, CryptoService cryptoService, Marshaller marshaller) {
        this.eventBus = eventBus;
        this.thirdPartyClient = thirdPartyClient;
        this.cryptoService = cryptoService;
        this.marshaller = marshaller;
    }
    @Override
    public void sendRequest() {
        try {
            ThirdPartyRequest thirdPartyRequest = createThirdPartyRequest();
            Signature signature = signRequest(thirdPartyRequest);
            thirdPartyRequest.setSignature(signature);
            ThirdPartyResponse response = thirdPartyClient.getResponse(thirdPartyRequest);
            byte[] serialize = SerializationUtils.serialize(response);
            eventBus.sendToQueue(topicName, serialize);
        } catch (Exception e) {
            log.error("Send request failed with exception: {}", e.getMessage());
        }
    }
    private ThirdPartyRequest createThirdPartyRequest() {
        ...
        return thirdPartyRequest;
    }
    private Signature signRequest(ThirdPartyRequest thirdPartyRequest) {
        byte[] elementForSignBytes = marshaller.marshal(thirdPartyRequest);
        Element element = cryptoService.signElement(elementForSignBytes);
        Signature signature = new Signature(element);
        return signature;
    }
}
What does it actually do? It creates a request, signs it, sends it, and then sends the response to a queue.
This service injects 4 other services: eventBus, thirdPartyClient, cryptoService and marshaller, and the sendRequest method calls each of them.
If I want to create a unit test for this service, I need to mock 4 services. I think that's too many.
Can somebody indicate how this service can be changed?
Change the class name and leave as is?
Split into several classes?
Something else?
The SRP is a tricky one.
Let's ask two questions:
What is a responsibility?
What are the different types of responsibilities?
One important thing about responsibilities is that they have a Scope, that you can define them at different levels of Granularity, and that they are hierarchical in nature.
Everything in your application can have a responsibility.
Let's start with Modules. Each module has responsibilities and can adhere to the SRP.
Then this Module can be made of Layers. Each Layer has a responsibility and can adhere to the SRP.
Each Layer is made of different Objects, Functions etc. Each Object and/or Function has responsibilities and can adhere to the SRP.
Each Object has Methods. Each Method can adhere to the SRP. Objects can contain other objects and so on.
Each Function or Method in an Object is made of statements and can be broken down to more Functions/Methods. Each statement can have responsibilities too.
Let's give an example. Let's say we have a Billing module. If this module is implemented in a single huge class, does this module adhere to the SRP?
From the point of view of the system, the module does indeed adhere to the SRP. The fact that it's a mess doesn't affect this fact.
From the point of view of the module, the class that represents this module doesn't adhere to the SRP as it will do a lot of other things, like communicate with DB, send Emails, do business logic etc.
Let's take a look at the different types of responsibilities.
When something should be done
How it should be done
Let's take an example.
public class UserService_v1 {
    public void SomeOperation(Guid userID) {
        var user = GetUserByID(userID);
        // do something with the user
    }
    public User GetUserByID(Guid userID) {
        var query = "SELECT * FROM USERS WHERE ID = {userID}";
        var dbResult = db.ExecuteQuery(query);
        return CreateUserFromDBResult(dbResult);
    }
    public User CreateUserFromDBResult(DbResult result) {
        // parse and return User
    }
}
public class UserService_v2 {
    public void SomeOperation(Guid userID) {
        var user = UserRepository.getByID(userID);
        // do something with the user
    }
}
Let's take a look at these two implementations.
UserService_v1 and UserService_v2 do exactly the same thing, but in different ways. From the point of view of the System, these services adhere to the SRP as they contain operations related to Users.
Now let's take a look at what they actually do to complete their work.
UserService_v1 does these things:
1. Builds a SQL query string
2. Calls the db to execute the query
3. Takes the specific DbResult and creates a User from it
4. Does the operation on the User
UserService_v2 does these things:
1. Requests from the repository the User by ID
2. Does the operation on the User
UserService_v1 contains:
How the specific query is built
How the specific DbResult is mapped to a User
When this query needs to be called (at the beginning of the operation in this case)
UserService_v2 contains:
When a User should be retrieved from the DB
UserRepository contains:
How the specific query is built
How the specific DbResult is mapped to a User
What we do here is to move the responsibility of How from the Service to the Repository. This way each class has one reason to change. If how changes, we change the Repository. If when changes, we change the Service.
This way we create objects that collaborate with each other to do specific work, by dividing responsibilities. The tricky part is: which responsibilities do we divide?
If we have a UserService and OrderService we don't divide when and how here. We divide what so we can have one service per Entity in our system.
It's natural for these services to need other objects to do their work. We can of course add all of the responsibilities of what, when and how to a single object, but that just makes it messy, unreadable and hard to change.
In this regard the SRP helps us to achieve cleaner code by having more smaller parts that collaborate with and use each other.
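To make the when/how split concrete, here is a minimal Java sketch. All names in it (UserRepository, InMemoryUserRepository, UserGreetingService, and the greeting operation itself) are hypothetical and chosen only for illustration; a real repository would build and run the SQL query instead of reading from a map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// "How" a user is loaded lives behind this interface.
interface UserRepository {
    Optional<String> findNameById(int userId);
}

// Hypothetical in-memory implementation, handy as a test double.
class InMemoryUserRepository implements UserRepository {
    private final Map<Integer, String> names = new HashMap<>();

    void save(int userId, String name) {
        names.put(userId, name);
    }

    @Override
    public Optional<String> findNameById(int userId) {
        return Optional.ofNullable(names.get(userId));
    }
}

// The service only decides "when" the user is needed and what to do with it.
class UserGreetingService {
    private final UserRepository repository;

    UserGreetingService(UserRepository repository) {
        this.repository = repository;
    }

    String greet(int userId) {
        return repository.findNameById(userId)
            .map(name -> "Hello, " + name)
            .orElse("Unknown user");
    }
}
```

Because the service depends on a single interface, a test needs only one small stand-in instead of a database connection and a result mapper.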
Let's take a look at your specific case.
If you can move the responsibility for how the ThirdPartyRequest is created and signed into the ThirdPartyClient, your SendRequestToThirdPartySystemService will only tell when this request should be sent. This will remove Marshaller and CryptoService as dependencies from your SendRequestToThirdPartySystemService.
Also, you have SerializationUtils, which you could probably rename to Serializer to capture the intent better, as Utils is something that we attach to objects that we just don't know how to name and that contain a lot of logic (and probably multiple responsibilities).
This will reduce the number of dependencies, and your tests will have fewer things to mock.
Here's a version of the sendRequest method with fewer responsibilities.
@Override
public void sendRequest() {
    try {
        // the parameters are unclear because your code does not show them
        ThirdPartyResponse response = thirdPartyClient.sendRequest(param1, param2);
        byte[] serializedMessage = SerializationUtils.serialize(response);
        eventBus.sendToQueue(topicName, serializedMessage);
    } catch (Exception e) {
        log.error("Send request failed with exception: {}", e.getMessage());
    }
}
From your code I'm not sure whether you can also move the responsibility for serialization and deserialization to the EventBus, but if you can, it will remove serialization from your service as well. This makes the EventBus responsible for how it serializes and stores the things inside it, making it more cohesive. Other objects that collaborate with it will just tell it to send an object to the queue, not caring how this object gets there.
I am developing an application that creates some Akka actors to manage and process messages coming from a Kafka topic. Messages with the same key are processed by the same actor. I use the message key also to name the corresponding actor.
When a new message is read from the topic, I don't know if the actor with the id equal to the message key was already created by the actor system or not. Therefore, I try to resolve the actor using its name, and if it does not exist yet, I create it. I need to manage concurrency in regard to actor resolution. So it is possible that more than one client asks the actor system if an actor exists.
The code I am using right now is the following:
private CompletableFuture<ActorRef> getActor(String uuid) {
    return system.actorSelection(String.format("/user/%s", uuid))
        .resolveOne(Duration.ofMillis(1000))
        .toCompletableFuture()
        .exceptionally(ex ->
            system.actorOf(Props.create(MyActor.class, uuid), uuid))
        .exceptionally(ex -> {
            try {
                return system.actorSelection(String.format("/user/%s", uuid)).resolveOne(Duration.ofMillis(1000)).toCompletableFuture().get();
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        });
}
The above code is not optimised, and the exception handling can be made better.
However, is there in Akka a more idiomatic way to resolve an actor, or to create it if it does not exist? Am I missing something?
Consider creating an actor that maintains as its state a map of message IDs to ActorRefs. This "receptionist" actor would handle all requests to obtain a message processing actor. When the receptionist receives a request for an actor (the request would include the message ID), it tries to look up an associated actor in its map: if such an actor is found, it returns the ActorRef to the sender; otherwise it creates a new processing actor, adds that actor to its map, and returns that actor reference to the sender.
I would consider using akka-cluster and akka-cluster-sharding. This gives you throughput as well as reliability, and it also makes the system manage the creation of the 'entity' actors.
But you have to change the way you talk to those actors. You create a ShardRegion actor which handles all the messages:
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.cluster.sharding.ClusterSharding;
import akka.cluster.sharding.ClusterShardingSettings;
import akka.cluster.sharding.ShardRegion;
import akka.event.Logging;
import akka.event.LoggingAdapter;
public class MyEventReceiver extends AbstractActor {
    private final LoggingAdapter log = Logging.getLogger(getContext().getSystem(), this);
    private final ActorRef shardRegion;
    public static Props props() {
        return Props.create(MyEventReceiver.class, MyEventReceiver::new);
    }
    static ShardRegion.MessageExtractor messageExtractor
        = new ShardRegion.HashCodeMessageExtractor(100) {
        // using the supplied hash code extractor to shard
        // the actors based on the hashcode of the entityid
        @Override
        public String entityId(Object message) {
            if (message instanceof EventInput) {
                return ((EventInput) message).uuid().toString();
            }
            return null;
        }
        @Override
        public Object entityMessage(Object message) {
            if (message instanceof EventInput) {
                return message;
            }
            return message; // I don't know why they do this it's in the sample
        }
    };
    public MyEventReceiver() {
        ActorSystem system = getContext().getSystem();
        ClusterShardingSettings settings =
            ClusterShardingSettings.create(system);
        // this is setup for the money shot
        shardRegion = ClusterSharding.get(system)
            .start("EventShardingSytem",
                Props.create(EventActor.class),
                settings,
                messageExtractor);
    }
    @Override
    public Receive createReceive() {
        return receiveBuilder().match(
            EventInput.class,
            e -> {
                log.info("Got an event with UUID {} forwarding ... ",
                    e.uuid());
                // the money shot
                shardRegion.tell(e, getSender());
            }
        ).build();
    }
}
So this Actor, MyEventReceiver, runs on all nodes of your cluster and encapsulates the shardRegion Actor. You no longer message your EventActors directly; instead, using the MyEventReceiver and shardRegion Actors, you let the sharding system keep track of which node in the cluster the particular EventActor lives on. It will create one if none has been created before, or route messages to it if it has. Every EventActor must have a unique id, which is extracted from the message (a UUID is pretty good for that, but it could be some other id, like a customerID or an orderID, as long as it's unique for the Actor instance you want to process it with).
(I'm omitting the EventActor code, it's otherwise a pretty normal Actor, depending what you are doing with it, the 'magic' is in the code above).
The sharding system automatically knows to create the EventActor and allocate it to a shard, based on the algorithm you've chosen (in this particular case, it's based on the hashCode of the unique ID, which is all I've ever used). Furthermore, you're guaranteed only one Actor for any given unique ID. The message is transparently routed to the correct Node and Shard wherever it is; from whichever Node and Shard it's being sent.
There's more info and sample code in the Akka site & documentation.
This is a pretty rad way to make sure that the same Entity/Actor always processes messages meant for it. The cluster and sharding automatically take care of distributing the Actors properly, of failover, and the like. If the Actor has strict state associated with it that must be restored, you would have to add akka-persistence to get passivation, rehydration, and failover.
The answer by Jeffrey Chung is indeed the Akka way. The downside of such an approach is its low performance. The most performant solution is to use Java's ConcurrentHashMap.computeIfAbsent() method.
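A minimal sketch of that idea follows, written against a generic handle type so that it stays independent of Akka. HandlerRegistry is a hypothetical name; in the question's setting the factory would presumably be something like uuid -> system.actorOf(Props.create(MyActor.class, uuid), uuid). Note that this sketch never removes entries, so an actor that stops would have to be evicted from the map separately.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Generic get-or-create registry; H stands for the handle type (an ActorRef in the question).
class HandlerRegistry<H> {
    private final ConcurrentMap<String, H> handlers = new ConcurrentHashMap<>();
    private final Function<String, H> factory;

    HandlerRegistry(Function<String, H> factory) {
        this.factory = factory;
    }

    // computeIfAbsent is atomic on ConcurrentHashMap: the factory runs
    // at most once per key, even when several threads ask for the same key.
    H getOrCreate(String key) {
        return handlers.computeIfAbsent(key, factory);
    }
}
```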
I have a use case where I am creating multiple AWS resources, for example S3 buckets, SNS topics etc., in a CloudFormation stack. All of these are bare-minimum resources, i.e. the S3 bucket would not have any objects in it.
I have a workflow set up where my code will pick up a random resourceType and then create the resource. Now, I am trying to build a generic class/method which would delete these created resources.
I store the resources as JSON fields which give me the details about the resourceType:
{
    "AWSService": "S3",
    "AWSResourceType": "Bucket",
    "ResourceAttributes": {
        "BucketName": "MyBucket"
    }
}
For the cleanup, I was thinking that I have a map with key as the AWSService and the value to be a runnable which would call the appropriate serviceType to delete the resource.
But, runnables cannot take in parameters, and therefore I cannot pass in the resourceName/Arn to be deleted so that the API knows which resource to delete.
Is there a way I can store this information as a map and still pass in parameters to the method being executed?
Not sure I fully understand all of the implications of what you are doing without seeing some code, but I think this might get you going in the right direction.
You can implement the Runnable interface in a new generic class: http://leo.ugr.es/elvira/devel/Tutorial/Java/essential/threads/clock.html
So what you could do is create a generic class that implements the Runnable interface and has either a constructor that takes the variables you need, or getters/setters, etc.
Something along the lines of:
public class ResourceCleanup implements Runnable {
    private String arn;
    @Override
    public void run() {
        // do the cleanup with the arn
    }
    public ResourceCleanup(String arn) {
        this.arn = arn;
    }
    // etc.
}
Or you could pass in the map instead of the String, use Java Generics, etc. as necessary. Hope this helps!
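Another option, sketched here under assumptions, is to keep the map from the question but store a Consumer instead of a Runnable, so that each entry receives the resource attributes as a parameter. ResourceCleaner and its method names are hypothetical, and the sketch assumes the "ResourceAttributes" field has already been parsed into a Map<String, String>.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

class ResourceCleaner {
    // Maps the "AWSService" value to an action that receives the "ResourceAttributes".
    private final Map<String, Consumer<Map<String, String>>> cleaners = new HashMap<>();

    void register(String awsService, Consumer<Map<String, String>> cleaner) {
        cleaners.put(awsService, cleaner);
    }

    // Looks up the action for the service and hands it the attributes of one resource.
    void cleanup(String awsService, Map<String, String> attributes) {
        Consumer<Map<String, String>> cleaner = cleaners.get(awsService);
        if (cleaner == null) {
            throw new IllegalArgumentException("No cleaner registered for " + awsService);
        }
        cleaner.accept(attributes);
    }
}
```

Registering an entry could then look like cleaner.register("S3", attrs -> deleteBucket(attrs.get("BucketName"))), where deleteBucket is a hypothetical wrapper around the real AWS SDK call.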
I'm new to Akka and I'm trying it with Java. I'd like to understand unit testing of business logic within actors. I read the documentation, and the only example of isolated business logic within an actor is:
static class MyActor extends UntypedActor {
    public void onReceive(Object o) throws Exception {
        if (o.equals("say42")) {
            getSender().tell(42, getSelf());
        } else if (o instanceof Exception) {
            throw (Exception) o;
        }
    }
    public boolean testMe() { return true; }
}
@Test
public void demonstrateTestActorRef() {
    final Props props = Props.create(MyActor.class);
    final TestActorRef<MyActor> ref = TestActorRef.create(system, props, "testA");
    final MyActor actor = ref.underlyingActor();
    assertTrue(actor.testMe());
}
While this is simple, it implies that the method I want to test is public. However, considering that actors should communicate only via messages, my understanding is that there is no reason to have public methods, so I made my method private, as in the example below:
public class LogRowParser extends AbstractActor {
    private final Logger logger = LoggerFactory.getLogger(LogRowParser.class);
    public LogRowParser() {
        receive(ReceiveBuilder.
            match(LogRow.class, lr -> {
                ParsedLog log = parse(lr.rowText);
                final ActorRef logWriter = getContext().actorOf(Props.create(LogWriter.class));
                logWriter.tell(log, self());
            }).
            matchAny(o -> logger.info("Unknown message")).build()
        );
    }
    private ParsedLog parse(String rowText) {
        // Log parsing logic
    }
}
So to test the parse method I either:
need to make it package-private
or test the actor's public interface, i.e. that the next actor, LogWriter, received the correctly parsed message from my actor LogRowParser
My questions:
Are there any downsides to option #1? Assuming that actors communicate only via messages, are encapsulation and clean open interfaces less important?
If I try to use option #2, is there a way to catch messages sent from the actor downstream in a test (testing LogRowParser and catching in LogWriter)? I reviewed various examples of JavaTestKit, but all of them catch messages that are responses back to the sender, and none shows how to intercept a message sent to a new actor.
Is there another option that I'm missing?
Thanks!
UPD:
Forgot to mention that I also considered options like:
Moving logic out of actors completely into helper classes. Is it common practice with Akka?
Powermock... but I'm trying to avoid it if a redesign is possible
There's really no good reason to make that method private. One generally makes a method on a class private to prevent someone who has a direct reference to an instance of that class from calling that method. With an actor instance, no one will have a direct reference to an instance of that actor class. All you can get to communicate with an instance of that actor class is an ActorRef which is a light weight proxy that only allows you to communicate by sending messages to be handled by onReceive via the mailbox. An ActorRef does not expose any internal state or methods of that actor class. That's sort of one of the big selling points of an actor system. An actor instance completely encapsulates its internal state and methods, protecting them from the outside world and only allows those internal things to change in response to receiving messages. That's why it does not seem necessary to mark that method as private.
Edit
Unit testing of an actor, IMO, should always go through the receive functionality. If you have some internal methods that are then called by the handling in receive, you should not focus on testing these methods in isolation but instead make sure that the paths that lead to their invocation are properly exercised via the messages that you pass during test scenarios.
In your particular example, parse is producing a ParsedLog message that is then sent on to a logWriter child actor. For me, knowing that parse works as expected means asserting that the logWriter received the correct message. In order to do this, I would allow the creation of the child logWriter to be overridden and then do just that in the test code and replace the actor creation with a TestProbe. Then, you can use expectMsg on that probe to make sure that it received the expected ParsedLog message thus also testing the functionality in parse.
As far as your other comment around moving the real business for the actor out into a separate and more testable class and then calling that from in the actor, some people do this, so it's not unheard of. I personally don't, but that's just me. If that approach works for you, I don't see any major issues with it.
I had the same problem 3 years ago when dealing with actors: the best approach I found was to keep the actor's responsibility to a minimum, namely messaging.
The actor will receive the message and choose the object's method to call, the message to send, or the exception to throw, and that's it.
This way it will be very simple to mock both the services called by the actor and the input to those services.
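As an illustration of keeping the actor thin, here is a minimal sketch of parsing logic pulled out into a plain class that can be tested without an actor system. LogLineParser, ParsedLine, and the assumed "LEVEL message" row format are all hypothetical, since the question does not show the real format; the actor's message handler would only call parse and forward the result.

```java
// Hypothetical plain class holding the parsing logic; the actor would only delegate to it.
class LogLineParser {
    // Minimal value type standing in for ParsedLog from the question.
    static final class ParsedLine {
        final String level;
        final String message;

        ParsedLine(String level, String message) {
            this.level = level;
            this.message = message;
        }
    }

    // Assumes rows look like "LEVEL message text"; a row without a space is treated as level only.
    ParsedLine parse(String rowText) {
        String trimmed = rowText.trim();
        int space = trimmed.indexOf(' ');
        if (space < 0) {
            return new ParsedLine(trimmed, "");
        }
        return new ParsedLine(trimmed.substring(0, space), trimmed.substring(space + 1).trim());
    }
}
```

A unit test can then call new LogLineParser().parse(...) directly, while a separate and much smaller test checks that the actor forwards whatever the parser returns.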
I have implemented an Actor system using Akka and its Java API UntypedActor. In it, one actor (type A) starts other actors (type B) dynamically on demand, using getContext().actorOf(...);. Those B actors will do some computation which A doesn't really care about anymore. But I'm wondering: is it necessary to clean up those actors of type B when they have finished? If so, how?
By having B actors call getContext().stop(getSelf()) when they're done?
By having B actors call getSelf().tell(Actors.poisonPill()); when they're done? [this is what I'm using now].
By doing nothing?
By ...?
The docs are not clear on this, or I have overlooked it. I have some basic knowledge of Scala, but the Akka sources aren't exactly entry-level stuff...
What you are describing are single-purpose actors created per “request” (defined in the context of A), which handle a sequence of events and then are done, right? That is absolutely fine, and you are right to shut those down: if you don’t, they will accumulate over time and you run into a memory leak. The best way to do this is the first of the possibilities you mention (most direct), but the second is also okay.
A bit of background: actors are registered within their parent in order to be identifiable (e.g. needed in remoting but also in other places) and this registration keeps them from being garbage collected. OTOH, each parent has a right to access the children it created, hence no automatic termination (i.e. by Akka) makes sense, instead requiring explicit shutdown in user code.
In addition to Roland Kuhn's answer, rather than create a new actor for every request, you could create a predefined set of actors that share the same dispatcher, or you can use a router that distributes requests to a pool of actors.
The Balancing Pool Router, for example, allows you to have a fixed set of actors of a particular type share the same mailbox:
akka.actor.deployment {
    /parent/router9 {
        router = balancing-pool
        nr-of-instances = 5
    }
}
Read the documentation on dispatchers and on routing for further detail.
I was profiling (with VisualVM) one of the sample cluster applications from the Akka documentation, and I see garbage collection cleaning up the per-request actors during every GC. I am unable to completely understand the recommendation of explicitly killing the actor after use. My ActorSystem and actors are managed by the Spring IoC container, and I use the Spring extension's indirect actor producer to create actors. The "aggregator" actor is getting garbage collected on every GC; I monitored the number of instances in VisualVM.
@Component
@Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE)
public class StatsService extends AbstractActor {
    private final LoggingAdapter log = Logging.getLogger(getContext().getSystem(), this);
    @Autowired
    private ActorSystem actorSystem;
    private ActorRef workerRouter;
    @Override
    public void preStart() throws Exception {
        System.out.println("Creating Router" + this.getClass().getCanonicalName());
        workerRouter = getContext().actorOf(SPRING_PRO.get(actorSystem)
            .props("statsWorker").withRouter(new FromConfig()), "workerRouter");
        super.preStart();
    }
    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(StatsJob.class, job -> !job.getText().isEmpty(), job -> {
                final String[] words = job.getText().split(" ");
                final ActorRef replyTo = sender();
                final ActorRef aggregator = getContext().actorOf(SPRING_PRO.get(actorSystem)
                    .props("statsAggregator", words.length, replyTo));
                for (final String word : words) {
                    workerRouter.tell(new ConsistentHashableEnvelope(word, word),
                        aggregator);
                }
            })
            .build();
    }
}
Actors by default do not consume much memory. If the application intends to use the B actors later on, you can keep them alive. If not, you can shut them down via PoisonPill. As long as your actors are not holding resources, leaving an actor alive should be fine.