Global variables in the Play framework / Keeping track of open streams - Java

I'm working on a Twitter app right now using twitter4j. Basically I am allowing users to create accounts, add their personal keywords, and start their own stream.
I'm trying to play as nice as possible with the Twitter API: avoid rate limits, don't connect the same account over and over, etc. So what I think I need is some object that contains a list of all the active TwitterStream objects, but I don't know how to approach this. This is the controller that starts the stream.
public static Result startStream() {
    ObjectNode result = Json.newObject();
    // openStreams is a Map<Long, TwitterStream> in the TwitterListener class
    if (TwitterListener.openStreams.containsKey(Long.parseLong(session().get("id")))) {
        result.put("status", "running");
        return ok(result);
    }
    Cache.set("twitterStream", TwitterListener.listener(
            Person.find.byId(Long.parseLong(session().get("id")))));
    result.put("status", "OK");
    return ok(result);
}
As you can see, I am putting them in the Cache right now, but I'd like to keep streams open for long periods of time, so the cache won't suffice.
What is the most appropriate way to structure my application for this purpose?
Should I be using Akka?
How could I implement Play's Global object to do this?

As soon as you start to think about introducing global state into your application, you have to ask yourself: is there any possibility that I might want to scale to multiple nodes, or have multiple nodes for redundancy? If there's even the slightest chance the answer is yes, then you should use Akka, because Akka lets you easily adapt your code to a multi-node environment with Akka clustering by simply introducing a consistent hashing router. If you don't use Akka, you'll practically have to redesign your application when the requirement for multiple nodes comes in.
So I'm going to assume that you want to future proof your application, and explain how to use Akka (Akka is a nice way of managing global state anyway even if you don't need multiple nodes).
So in Akka, what you want is a stream manager actor. It will be responsible for creating stream actors as its children if they don't already exist. The stream actors will then be responsible for handling the stream, sending it to subscribers, tracking how many connections are subscribed to them, and shutting down when there are no longer any subscribers.
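A rough sketch of that layout in Java with classic Akka actors. The message type, the naming scheme, and the actual TwitterStream wiring are placeholders, not a drop-in implementation:

import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.Props;

public class StreamManager extends AbstractActor {

    // Hypothetical message: "give me the stream for this user"
    public static class Subscribe {
        public final long userId;
        public Subscribe(long userId) { this.userId = userId; }
    }

    @Override
    public Receive createReceive() {
        return receiveBuilder()
            .match(Subscribe.class, msg -> {
                // Create the per-user stream actor lazily, as a child,
                // so the manager owns its lifecycle.
                ActorRef child = getContext()
                    .findChild("stream-" + msg.userId)
                    .orElseGet(() -> getContext().actorOf(
                        Props.create(StreamActor.class, msg.userId),
                        "stream-" + msg.userId));
                child.forward(msg, getContext());
            })
            .build();
    }

    // Placeholder child: a real version would open the TwitterStream, push
    // tweets to subscribers, count them, and stop itself at zero subscribers.
    public static class StreamActor extends AbstractActor {
        private final long userId;
        public StreamActor(long userId) { this.userId = userId; }

        @Override
        public Receive createReceive() {
            return receiveBuilder()
                .match(Subscribe.class, msg -> { /* register getSender() as a subscriber */ })
                .build();
        }
    }
}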

Related

Events handling between Spring boot and Reactjs

This might seem like an easy problem with plenty of solutions on the internet, but believe me, I have looked through a lot of examples and couldn't figure out which approach to choose.
Requirement :
I have a subscriber at the application service (Spring Boot/Java) end, subscribed to blockchain events (Corda). I want to push these events to the UI (ReactJS) whenever there is a change in state.
I could subscribe to the blockchain events successfully, but I am stuck with multiple incomplete or tangled ideas about pushing them to the UI and how the UI would receive my events (kindly don't suggest paid services, APIs, libraries, etc.).
I have come across and tried out all of the approaches below; since I'm new to working with events, I need some ray of light as to how to approach a complete solution.
Publisher-subscriber pattern
Observable pattern
SseEmitter
Flux & Mono
Firebase ( a clear NO )
What's also boggling me:
For event handling between the service and the UI, should it be via API/endpoint calls, or can events just be emitted "into the air" (I'm not clear on this) and subscribed to in the UI based on the event name?
Should I have two APIs dedicated to this? One to trigger the subscription and another that actually executes the emitter?
If the endpoint is always listening, doesn't it need a dedicated resource?
I basically need a CLEAR approach to handle this.
Code can be provided based on demand
I see you mention you are able to capture events in Spring Boot, so you are left with sending the event information to the front end. I can think of three ways to do this.
WebSockets: Might be overkill, as I suppose you won't need bi-directional communication.
SSE: Perhaps a better choice than WebSockets.
Or simply polling: Not a bad choice either, if you are not looking for realtime notifications.
Yes, long polling.
The solution is pretty simple: make the connection once and let it wait for as long as possible, so that if any new data reaches the server in the meantime, the server can directly send the response back. This way we can definitely reduce the number of request/response cycles involved.
You will find multiple examples on the internet of how long polling is implemented in a Spring Boot project.
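For illustration, here is a minimal long-polling sketch using Spring MVC's DeferredResult. The endpoint path, the String payload, and the way events arrive from the Corda subscriber are assumptions, not a prescribed design:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.context.request.async.DeferredResult;

@RestController
public class EventController {

    private final Queue<DeferredResult<String>> waiting = new ConcurrentLinkedQueue<>();

    // The UI calls this; the request parks until an event arrives or 30s pass.
    @GetMapping("/events/poll")
    public DeferredResult<String> poll() {
        DeferredResult<String> result = new DeferredResult<>(30_000L, "timeout");
        result.onCompletion(() -> waiting.remove(result));
        waiting.add(result);
        return result;
    }

    // Call this from wherever you capture the blockchain event.
    public void onBlockchainEvent(String eventJson) {
        DeferredResult<String> r;
        while ((r = waiting.poll()) != null) {
            r.setResult(eventJson); // completes the parked request immediately
        }
    }
}

The React side would then simply re-issue the GET in a loop: each response (or timeout) triggers the next poll.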

Invoking an external REST API concurrently via AKKA in Java

I am trying to get the information of cars from the Tesla server through its API, and I want to do it concurrently, i.e. fetch the information of multiple cars in parallel using Akka actors.
My Approach:
(1) First get the total number of cars.
(2) Create actors equal to the number of cars.
(3) Inside each actor, call the REST API to get a car's information in parallel, i.e. each actor will be provided with a URL containing the car id.
Is my approach right?
Specifically, in point number 3, I have made the call to the Tesla server inside each actor using AsyncHttpClient from com.ning. Will using AsyncHttpClient inside each actor ensure that each actor sends its request asynchronously to the server without blocking other actors?
I will provide further information if need be. I am a beginner in Akka. I've looked at a lot of threads but could not find exactly what I was looking for.
Specifically for point number 3: as long as you use a Future-based API in your actors, the actors will not block.
In general it is hard to tell much more about your approach without knowing why you chose to use one actor per car.
Consider this question: why couldn't you simply create a listOfCars: List[String] of URLs and use Future.traverse(listOfCars)(downloadCarDataForUrl _)?
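For comparison, here is a rough Java analogue of that suggestion using CompletableFuture; downloadCarDataForUrl is a stand-in for a non-blocking HTTP call (e.g. one wrapping AsyncHttpClient):

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class CarFetcher {

    public CompletableFuture<List<String>> fetchAll(List<String> carUrls) {
        List<CompletableFuture<String>> futures = carUrls.stream()
                .map(this::downloadCarDataForUrl)
                .collect(Collectors.toList());

        // One future that completes when every download has finished.
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join) // safe: all completed by now
                        .collect(Collectors.toList()));
    }

    private CompletableFuture<String> downloadCarDataForUrl(String url) {
        // Placeholder: issue a non-blocking GET here and complete the
        // future with the response body.
        return CompletableFuture.completedFuture("car data for " + url);
    }
}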
Finally, I don't know how AsyncHttpClient behaves, but I would double-check that if you have a list of thousands of cars, AsyncHttpClient does not try to download all of them concurrently; if it does, you risk being blocked quite quickly by the API provider. If this becomes a problem, you could look into akka-http, which only uses a limited number of connections to a given host.

How to store variables that can be accessed by all Spouts and Bolts in Apache Storm?

I have a Storm topology that creates many Spouts and Bolts. They will obviously be spread out on various systems/nodes, each with its own JVM.
I understand that Storm will automatically manage the network communications so that the tuples emitted by the Spout will reach the Bolts on a different JVM.
What I don't understand is about how I can maintain a few variables that can keep track of things.
I want one variable that counts the number of tuples that have been processed by all instances of Bolt-A. Another variable for counting for Bolt-B and so on.
I also need a variable that acts as a flag so that I'll know when the Spouts have no more data to emit, so that the Bolts can start writing to SQL.
I considered using Redis, but wanted to know if that is the best way or is there any other way? Any code samples available anywhere? I Google-searched, but couldn't find much useful info.
First of all, there's no way to share a variable between tasks in Storm.
Instead of directly sharing the flag, you can define your own 'control' message and send it to the Bolts so they know there are no more messages for the Spout to emit.
Sharing state via Redis is one possible option (you need to implement your own logic), but the flag value could flicker, so you may want to take care of that.
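For illustration, the Redis option might look roughly like this with the Jedis client (key names are invented; pooling and error handling omitted):

import redis.clients.jedis.Jedis;

public class SharedCounters {

    // Each Bolt-A task calls this after processing a tuple.
    public static void recordProcessed() {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            jedis.incr("boltA:processed"); // atomic across all JVMs
        }
    }

    // Bolts check this before flushing; as noted above, it can flicker.
    public static boolean spoutDone() {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            return "true".equals(jedis.get("spout:done"));
        }
    }
}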
You should be able to get the number of tuples emitted and transferred per component and also per instance of each component from the Storm UI. There is even a REST API to retrieve the values.
For the first requirement you may use the Metrics API (http://storm.apache.org/releases/0.10.1/Metrics.html).
For the second requirement, why not send a "flush" tuple similar to the timer tuple?
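Here is a sketch of that idea, assuming a recent Storm version (org.apache.storm packages) and a spout that emits the end-of-data signal on a separate "control" stream:

import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

public class CountingBolt extends BaseBasicBolt {

    private long processed; // per-task counter; not shared across JVMs

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        if ("control".equals(input.getSourceStreamId())) {
            flushToSql(); // the spout signalled there is no more data
            return;
        }
        processed++;
        // ... normal tuple processing, possibly emitting downstream ...
    }

    private void flushToSql() {
        // write the accumulated results to SQL here
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // declare output streams/fields if this bolt emits anything
    }
}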

Using Stream API for organising application pipeline

As far as I know, the Stream API is intended to be applied to collections. But I like the idea so much that I try to apply it whenever I can, and even when I shouldn't.
Originally my app had two threads communicating through a BlockingQueue. The first would populate new elements; the second would transform them and save them to disk. It looked like a perfect Stream opportunity to me at the time.
Code I ended up with:
Stream.generate(...).flatMap(...).filter(...).forEach(...)
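Fleshed out, and assuming the producer still feeds a BlockingQueue of batches, the pipeline looks roughly like this (all names are placeholders):

import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.stream.Stream;

public class Pipeline {

    private final BlockingQueue<List<String>> queue = new LinkedBlockingQueue<>();

    public void run() {
        Stream.generate(this::takeBatch)   // blocking supplier backed by the queue
              .flatMap(List::stream)       // explode each batch into elements
              .filter(item -> !item.isEmpty())
              .forEach(this::writeToDisk); // never returns: the stream is infinite
    }

    private List<String> takeBatch() {
        try {
            return queue.take(); // blocks until the producer offers a batch
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return List.of();
        }
    }

    private void writeToDisk(String item) {
        // persist the element
    }
}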
I'd like to put a few maps in there, but it turns out I have to drag one additional field all the way to forEach. So I either have to create a meaningless class with two fields and an obscure name, or use AbstractMap.SimpleEntry to carry both fields through, which doesn't look like a great deal to me.
Anyway, I rewrote my app and it even seems to work. However, there are some caveats. As I have an infinite stream, 'the thing' can't be stopped. For now I'm starting it on a daemon thread, but this is not a solution. Business logic (like reacting to connection loss/recovery, though this is probably not BL) looks alienated. Maybe I just need a proxy for this.
On the other hand, there is free laziness with queue population, and one thread instead of two (not sure how good that is). And it's hopefully a familiar pattern for other developers.
So my question is: how viable is using the Stream API for organising application flow? Are there more hidden pitfalls? If it's not recommended, what are the alternatives?

Designing a point system in Spring

I have a lot of existing data in my database already, and I want to develop a points mechanism that computes a score for each user based on the actions they perform.
I am implementing this functionality in a pluggable way, so that it is independent of the main logic, and relies on Spring events being sent around, once an entity gets modified.
The problem is what to do with the existing data. I do not want to start collecting points from now, but rather include all the data until now.
What is the most practical way to do this? Should I design my plugins to provide an index() method, which would force my system to fetch every single entity from the database, fire an EntityDirtyEvent for each one to trigger the points plugins, and then update the entity so the points get saved next to it? That could result in a lot of overhead, right?
The simplest thing would be to create a complex stored procedure and have index() call it. That, however, seems like a bad idea too: since I will have to write the logic for computing the points in Java anyway, why duplicate it in SQL? Also, in general I am not a fan of splitting business logic across layers.
Has anyone done this before? Please help.
First let's distinguish between the implementation strategy and business rules.
Since you already have the data, consider obtaining results directly from it. This forms the data domain model. Design the data model to store all your data, then create a set of queries, views, and stored procedures to access and update it.
Once you have those views, use a data access library such as Spring's JdbcTemplate to fetch this data and represent it as Java objects (lists, maps, persons, point tables, etc.).
What you have completed thus far does not change much, irrespective of what happens in the upper layers of the system. This is the Model.
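For illustration, the Model piece might look roughly like this with Spring's JdbcTemplate (the view and column names are invented):

import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;

public class PointsRepository {

    private final JdbcTemplate jdbc;

    public PointsRepository(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Reads from a view that aggregates the raw data into a points table.
    public List<Map<String, Object>> pointsPerUser() {
        return jdbc.queryForList("SELECT user_id, points FROM v_user_points");
    }
}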
Then develop a rule base or logic implementation which determines, for given inputs, user actions, data conditions, or any other conditions, what data is needed. In a mathematical sense, this is like a matrix; in a programming sense, it is a set of logic statements: if this and this and this is true, then get this data, else get that data, and so on. This encompasses the logic in your system, hence it is called the Controller.
Do not move this logic into the queries/stored procedures/views.
Then finally develop a front end or "console" for this. In the simplest case, develop a console input system which takes a .. and displays a set of results. This is your "view" of the system.
You can eventually develop the view into a web application. The command-line view above can still be viable in the form of a RESTful API server.
I think there is one problem to be considered here: as I understand it, there's a huge amount of data in the database, so creating only one mechanism to calculate the point system may not be the best approach.
In fact, if you don't want to start collecting points from scratch but want to include all the existing data, you must process and calculate the information you have now. Yes, the first run can result in overhead, but as you said, you need this data calculated.
On the other hand, you can include another mechanism that listens for changes to an entity and launches a different process capable of calculating the new point difference that applies to this particular modification.
So you can use one service responsible for calculating the point system for a single entity, and another (which may take longer to finish) capable of calculating the global points. And if it doesn't need to be calculated in real time, you can create a scheduled job responsible for launching it.
Finally, I know it's not a good approach to split the business logic across two layers (DB + Java), but sometimes it's a requirement, for example if you need to reply quickly to a request that works over a lot of records. I've found cases where there was no option other than putting business logic in the database (as stored procedures, etc.) to process a lot of data and return the final result to the browser client (e.g. a calculation process at one specific time).
You seem to be heading in the right direction. You know you want your "points" thing decoupled from the main application. Since it is implied you are already using Hibernate (by the tag!), you can tap into the Hibernate event system (see section 14.2 here). Depending upon the size/complexity of your system, you can plug your points calculations in here (if it is not a large/complex system), or you can publish your own event to be picked up by whatever software is listening.
The point in either design approach is that neither knows or cares about your point calculations. If you are, as I am guessing, trying to create a fairly general-purpose plugin mechanism, then you publish your own events to that system from this tie-in point. If you have no plug-ins on a given install/setup, then no one gets/processes the events. If you have multiple plug-ins on another install/setup, then each can decide what processing to do based on the event received. In the case of the "points plugin", it would calculate its point value and store it. No stored proc required...
You're trying to accomplish "bootstrapping." The approach you choose should depend on how complicated the point calculations are. If stored procedures or plain update statements are the simplest solution, do that.
If the calculations are complicated, write a batch job that loads your existing data, probably orders it oldest first, and fires the events corresponding to that data as if they've just happened. The code which deals with an event should be exactly the same code that will deal with a future event, so you won't have to write any additional code other than the batch jobs themselves.
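A minimal sketch of that replay idea; EntityDirtyEvent is the event class the question already mentions, while the repository and its query method are invented stand-ins:

import org.springframework.context.ApplicationEventPublisher;
import org.springframework.stereotype.Component;

@Component
public class PointsBootstrapJob {

    private final EntityRepository repository;        // hypothetical
    private final ApplicationEventPublisher publisher;

    public PointsBootstrapJob(EntityRepository repository,
                              ApplicationEventPublisher publisher) {
        this.repository = repository;
        this.publisher = publisher;
    }

    // Run once: replay history oldest-first so events arrive in the order
    // they originally happened; the existing listeners do the rest.
    public void index() {
        repository.findAllOrderByCreatedAtAsc()       // hypothetical query
                  .forEach(e -> publisher.publishEvent(new EntityDirtyEvent(e)));
    }
}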
Since you're only going to run this thing once, go with the simplest solution, even if it is quick and dirty.
There are two different ways.
One you already know: poll the database for changed data. In that case you are hitting the database even when there may be no changes, which can slow down your process.
The second approach: whenever a change happens in the database, the database fires an event. You can do that using CDC (Change Data Capture), which minimizes the overhead.
You can look for more options in Spring Integration.
