Processing large data from an API to visualise

Let's say I have an API that gives me the values of stock for the last month. The data is sampled every hour.
Now I want to make a web app that would visualize this data on a line chart. I don't need all the hourly samples, so my question is how should I make this work?
My idea is that there would be a backend app (e.g. in Java Spring) that would GET the data from the API, calculate the average for each day (using a stream, maybe a parallel stream?), put the results in a new collection, and pass that on to the front end to put in a chart.

Start thinking from the UI: what do you need there, how often do you need it, and how fast?
Then get the data from the backend. If there is too much data at once and the API cannot do otherwise, either:
get data and reduce to what the UI needs (backend), use once and throw away
OR get data and reduce to what the UI needs (backend), keep in cache for a while
OR pre-process the data so that when the UI needs it, it will be ready
For the return format, consider something lightweight, such as a simple named JSON object like {"dayAverages": [0.34, 1253.432, ...], "month": 2, "year": 2018}, then adapt it in the UI to the needs of your library (that is debatable).
Also observe how users use the UI, then you may get some ideas on how to optimize the experience (preload next month ...)
If you do this for learning purposes, consider doing it async + lambdas = bonus :)
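The "reduce to what the UI needs" step could look roughly like the sketch below, assuming Java 17 or later. The HourlySample record and the averagePerDay method are hypothetical names for illustration; the real API payload will have its own shape.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical type: one hourly sample as returned by the (assumed) stock API.
record HourlySample(LocalDateTime timestamp, double price) {}

class DailyAverages {
    // Groups the hourly samples by calendar day and averages each group.
    // A TreeMap keeps the days in chronological order for the chart.
    static Map<LocalDate, Double> averagePerDay(List<HourlySample> samples) {
        return samples.stream()
                .collect(Collectors.groupingBy(
                        s -> s.timestamp().toLocalDate(),
                        TreeMap::new,
                        Collectors.averagingDouble(HourlySample::price)));
    }
}
```

A month of hourly data is only about 720 samples (30 days x 24 hours), so a sequential stream is likely enough here; a parallel stream would probably add overhead without a measurable gain.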

As to your question "...how should I make this work?" --
This is extremely broad. There are many, many ways to do this. Some of these ways depend heavily on your architecture, how much traffic is expected to your app, what request-load the API can handle, etc. Here are a few general things to consider:
Any sort of MVC architecture (or similar) would be a good fit for your Web app.
You mention needing a "backend app" of some type. Not sure what you mean here, but the averaging features can be built directly into your Web app framework without needing a separate back-end app.
If you're going to calculate averages for display in the Web app, you will need to maintain state somewhere. Assuming the API doesn't give this to you, you'll need a database of some type, or at least some type of memory caching storage engine to facilitate this. How you do this will depend on your architecture and the traffic/load on your app (e.g. will you have multiple, load-balanced servers).
Hope that helps. We could give more if you ask some specific questions.
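As an illustration of the in-memory caching option mentioned above, here is a minimal sketch of a cache whose entries expire after a fixed time (Java 17+ assumed). ExpiringCache and its loader function are hypothetical names; with several load-balanced servers, a shared cache or a database would probably be a better fit.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: caches computed values (e.g. daily averages per month)
// and recomputes them once they are older than the configured time-to-live.
class ExpiringCache<K, V> {
    private record Entry<V>(V value, Instant loadedAt) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Duration ttl;
    private final Function<K, V> loader;

    ExpiringCache(Duration ttl, Function<K, V> loader) {
        this.ttl = ttl;
        this.loader = loader;
    }

    V get(K key) {
        Instant now = Instant.now();
        // Reuse the cached entry while it is still fresh, otherwise reload it.
        Entry<V> entry = entries.compute(key, (k, old) ->
                old != null && old.loadedAt().plus(ttl).isAfter(now)
                        ? old
                        : new Entry<>(loader.apply(k), now));
        return entry.value();
    }
}
```

The loader would be the code that calls the API and reduces the data; the time-to-live decides how stale a chart is allowed to be.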

Related

combined vs. separate backend calls

I'm trying to figure out the best solution for a use case I'm working on, and I'd appreciate some architectural advice.
I have a use case where the frontend should display a list of users assigned to a task and a list of users who are not assigned but able to be assigned to the same task.
I don't know what the better solution is:
have one backend call which collects both lists of users and sends them back to the frontend within a new data class containing both lists.
have two backend calls which collect one of the two lists and send them back separately.
The first solution's pro is the single backend call whereas the second solution's pro is the reusability of the separate methods in the backend.
Any advice on which solution to prefer and why?
Is there any pattern or standard I should get familiar with?

When I stumble across the requirement to get data from a server I start with doing just a single call for, more or less (depends on the problem domain), a single feature (which I would call your task-user-list).
This approach saves implementation complexity on the client's side and saves protocol overhead for transactions (TCP header, etc.).
If performance analysis shows that the call is too slow because it requests too much data (user experience suffers) then I would go with your 2nd solution.
In summary, I would start with the first approach and optimize (move to the more complex solution) only when it's necessary.

I'd prefer the two calls because of the reusability. Maybe one day you will need to add a third list of users for one case, and with a single method you would have to change that method. But there may be other use cases which only require the two lists and not the third, so you would need to change code there as well. You would also need to change all your test methods. As your project gets bigger, this makes it hard to update or fix, and all the modifications increase the chances of introducing new bugs.
It helps to treat the backend methods that the frontend can call as an interface.
In general an interface should be open for extension but closed with respect to what its methods return and require; otherwise a slight modification leads to many further modifications.
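The two answers are not mutually exclusive. A sketch of one way to get both, assuming Java 17+: keep two reusable methods in the backend and let a combined method compose them for the single-call case. TaskUsers, TaskUserService and the in-memory sets are hypothetical stand-ins for the real data access.

```java
import java.util.List;
import java.util.Set;

// Hypothetical response type for the combined call.
record TaskUsers(List<String> assigned, List<String> assignable) {}

class TaskUserService {
    // Hypothetical in-memory data; a real service would query a repository.
    private final Set<String> allUsers;
    private final Set<String> assignedUsers;

    TaskUserService(Set<String> allUsers, Set<String> assignedUsers) {
        this.allUsers = allUsers;
        this.assignedUsers = assignedUsers;
    }

    // Reusable on its own, e.g. behind a separate endpoint.
    List<String> findAssigned() {
        return assignedUsers.stream().sorted().toList();
    }

    // Reusable on its own as well.
    List<String> findAssignable() {
        return allUsers.stream()
                .filter(user -> !assignedUsers.contains(user))
                .sorted()
                .toList();
    }

    // Single call for the frontend, built from the two methods above.
    TaskUsers findTaskUsers() {
        return new TaskUsers(findAssigned(), findAssignable());
    }
}
```

With this shape, adding a third list later means adding a new method (and perhaps a new combined type) rather than changing what the existing methods return.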

Converting object from one format to another Java ( Design pattern )

I am building a service that depends on another service, a typical service-oriented architecture. The service I depend on exposes some APIs and data types. I am unsure whether I should convert the object types exposed by that service into specific objects which my service understands. I do expect their service to change over time, as these are two different services. I have two options:
Directly use those data types in my service and pass those in methods.
Transform those into specific data types which only my service understands. ( objects will look exactly same if i do this with 0 changes ).
I tried to answer these questions but still could not make the final call. I need help in making this decision.
Why should I have encapsulated/transformed types?
To avoid rebuilding every time they change their service.
To prevent widespread changes (adapter pattern): changes to the wire format will lead me to change only the encapsulating classes.
Why should I not encapsulate the types?
The classes will look exactly the same as the wire format classes (useless effort to maintain extra classes).
As I understand it, the impact will be the same whichever approach I go with. Help?

I am no architect or SOA specialist, so excuse me if I am saying anything stupid :-)
But I really think the way here is to keep your services simple.
In your shoes, I'd just use the existing API directly. I would not spend any time wrapping or adapting the methods into another API. The business logic of your second service (the one that uses the existing first service) should take care of this conversion, IMO, unless you're being forced to do something that is really expensive with the existing API.
Remember that services are mutable. They're software. They have bugs, business logic changes as time goes and you'll have to change the API and sometimes you'll have to keep older methods compatible for other service consumers. You probably don't want to maintain two APIs that provide the same information without any good practical reason. Not for twice the maintenance work.
Creating another API just to adapt the data format sounds to me a little like that old "DTOs are evil" flame war. And I think a very few people write about the advantages of using DTO nowadays :-)

This is a somewhat opinion-based question, so my opinion is: you should make your own data types, so that your code knows what should be contained in which variable.
I think of a service as a data provider, which accepts certain requests, fulfills our needs and may give us some data in return. I think the role of a service is just to provide services to clients.
It should be the responsibility of the client to accept the data returned by the service and store it in a suitable data structure, as there can be n different clients for a single service, and they can have n different requirements which may lead them to design client-specific data structures to hold the data.
Also, as you said, the service is subject to change over time. If you make your own data structure, you will need to make the change in one single place, and the rest of your code will be safe.
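A minimal sketch of the "change in one single place" idea, assuming Java 17+. All three types are hypothetical: RemoteCustomerDto stands for a wire type owned by the other service, Customer for the type your own code uses, and CustomerMapper for the only class that knows both.

```java
// Hypothetical wire type owned by the other service (field names are invented).
record RemoteCustomerDto(String id, String fullName, String emailAddress) {}

// Type owned by this service; the rest of the code depends only on this.
record Customer(String id, String name, String email) {}

// The single place that has to change when the wire format changes.
final class CustomerMapper {
    private CustomerMapper() {}

    static Customer fromRemote(RemoteCustomerDto dto) {
        return new Customer(dto.id(), dto.fullName(), dto.emailAddress());
    }
}
```

If the two types really are field-for-field identical and expected to stay that way, the mapper is pure overhead, which is the argument made by the other answer.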

Designing a point system in Spring

I have a lot of existing data in my database already, and want to develop a points mechanism that computes a score for each user based on what actions they do.
I am implementing this functionality in a pluggable way, so that it is independent of the main logic, and relies on Spring events being sent around, once an entity gets modified.
The problem is what to do with the existing data. I do not want to start collecting points from now, but rather include all the data until now.
What is the most practical way to do this? Should I design my plugins in such a way as to provide for an index() method, which will force my system to fetch every single entity from the database, send an EntityDirtyEvent, to fire the points plugins, for each one, and then update it, to let points get saved next to each entity. That could result in a lot of overhead, right?
The simplest thing would be to create a complex stored procedure, and then make the index() call that stored procedure. That however, seems to me like a bad thing either. Since I will have to write the logic for computing the points in java anyway, why have it once again in SQL? Also, in general I am not a fan of splitting business logic into the different layers.
Has anyone done this before? Please help.

First let's distinguish between the implementation strategy and business rules.
Since you already have the data, consider obtaining results directly from the data. This forms the data domain model. Design the data model to store all your data. Then, create a set of queries, views and stored procedures to access and update the data.
Once you have those views, use a data access library such as Spring JDBC Template to fetch this data and represent them into java objects (lists, maps, persons, point-tables etc).
What you have completed thus far does not change much, irrespective of what happens in the upper layers of the system. This is called Model.
Then, develop a rule base or logic implementation which determines what data is needed for given inputs, user actions, data conditions and any other conditions. In a mathematical sense, this is like a matrix. In a programming sense, this would be a set of logic statements: if this and this and this is true, then get this data, else get that data, etc. This encompasses the logic in your system. Hence it is called "Controller".
Do not move this logic into the queries/stored procedure/views.
Then finally develop a front-end or "console" for this. In the simplest case, develop a console input system, which takes a .. and displays a set of results. This is your "view" of the system.
You can eventually develop the view into a web application. The above command-line view can still be viable in the form of a Restful API server.

There is one problem to consider here: as I understand it, there is a huge amount of data in the database, so a single mechanism for calculating the points may not be the best approach.
If you don't want to start collecting points from now but want to include all the existing data, you have to process the information you already have. Yes, the first run may be expensive, but as you said, you need this data calculated.
On the other hand, you can add a second mechanism that listens for changes to an entity and launches a separate process that calculates the point difference for that particular modification.
So you can use one service responsible for the point calculations with two operations: one for a single entity, and another, possibly longer-running, that calculates the global points. If you don't need the results in real time, you can even create a scheduled job to launch it.
Finally, I know it's not ideal to split the business logic across two layers (DB + Java), but sometimes it is required, for example when you need to reply quickly to a request that works with a lot of records. I've seen cases where there was no other option than to add business logic to the database (as stored procedures, etc.) to process a lot of data and return the final result to the browser client (e.g. a calculation process at one specific time).

You seem to be heading in the right direction. You know you want your "points" thing decoupled from the main application. Since it is implied you are already using hibernate (by the tag!), you can tap into the hibernate event system (see here section 14.2). Depending upon the size/complexity of your system, you can plugin your points calculations here (if it is not a large/complex system), or you can publish your own event to be picked up by whatever software is listening.
The point in either design approach is that neither knows or cares about your point calculations. If you are, as I am guessing, trying to create a fairly general purpose plugin mechanism, then you publish your own events to that system from this tie-in point. Then if you have no plug-ins on a given install/setup, then no one gets/processes the events. If you have multiple plug-ins on another install/setup, then they each can decide what processing they need to do based upon the event received. In the case of the "points plugin" it would calculate its point value and store it. No stored proc required....

You're trying to accomplish "bootstrapping." The approach you choose should depend on how complicated the point calculations are. If stored procedures or plain update statements are the simplest solution, do that.
If the calculations are complicated, write a batch job that loads your existing data, probably orders it oldest first, and fires the events corresponding to that data as if they've just happened. The code which deals with an event should be exactly the same code that will deal with a future event, so you won't have to write any additional code other than the batch jobs themselves.
Since you're only going to run this thing once, go with the simplest solution, even if it is quick and dirty.
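The batch-job variant could be sketched as follows (Java 17+ assumed). UserAction, PointsHandler, BackfillJob and the point values are all hypothetical; the point is only that the backfill replays old data, oldest first, through the same handler that will process future events.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event type; in the real system this would be the Spring event payload.
record UserAction(String userId, String type, Instant occurredAt) {}

// The same handler is used for live events and for the backfill.
class PointsHandler {
    private final Map<String, Integer> pointsByUser = new HashMap<>();

    void onAction(UserAction action) {
        // Invented scoring rules, for illustration only.
        int points = switch (action.type()) {
            case "POST" -> 10;
            case "COMMENT" -> 2;
            default -> 0;
        };
        pointsByUser.merge(action.userId(), points, Integer::sum);
    }

    int pointsFor(String userId) {
        return pointsByUser.getOrDefault(userId, 0);
    }
}

class BackfillJob {
    // Replays existing actions in chronological order as if they had just happened.
    static void replay(List<UserAction> existing, PointsHandler handler) {
        existing.stream()
                .sorted(Comparator.comparing(UserAction::occurredAt))
                .forEach(handler::onAction);
    }
}
```

For a large table, the list would be loaded and replayed in pages rather than all at once.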

There are two different ways.
One you already know: poll the database for changed data. In that case you hit the database even when there may be no change, which may slow down your process.
The second approach: whenever a change happens in the database, the database fires an event. You can do that using CDC (Change Data Capture). It will minimize the overhead.
You can look for more options in Spring Integration.

How to track and persist events from caught exceptions or logged data processing deviations?

Let's say that there are events that are unlikely to occur but should be registered when they do. This is data that one needs just for tuning: to see what has happened and what needs to be changed and improved.
Typically this is done in catch blocks when an exception is thrown, or simply when some if condition passes.
I don't want to write shell scripts and collect data from logs, so I can either use a DB and create a table for every context, which is possible in most cases but extremely inconvenient for maintenance and refactoring in further development, especially because the data will be typed when an RDB is used. Mostly the only shared data is userId, time, component and varying data like fileId, fileSize || elementsCount, count deviance || etc.
Or I can use some NoSQL store for that, which is also a little overkill just for this, but as the data is rather type-free in nature, I guess it would be more convenient.
Could you please explain how you do it? What is this even called? I think that JMX doesn't deal with this scenario. Spring AOP or AOP in general deals only with the distributed nature of this.

It sounds like you have two separate questions:
How should I capture certain events in the first place?
What should I do with the events once I've captured them?
Regarding 1, yes, AOP is a pretty common solution for capturing events. I'm not sure what you mean by "AOP in general deals only with the distributed nature of this". There's nothing distributed about AOP. You haven't told us enough about your application for anybody to say how to integrate AOP as easily as possible, etc, but lots of information is available online.
Regarding 2, how much data are we talking about? How much information do you want to store per event? What's similar/different about each message? I'd probably take the following approach:
Figure out how much data you're going to save during any given second, minute, hour, day, etc. If it's small enough to fit into one of your existing databases, then don't complicate your environment by introducing a new technology.
Can the data be loaded synchronously? If yes, then that's easy. Otherwise, I probably would log the data, and consolidate it periodically with a simple ETL script. This probably will be a whole lot easier and cheaper than setting up a new nosql store that you don't have in production now.
Decide what data you want to keep. It probably will look something like: id, type, timestamp, source (e.g. server or instance of the application), details. Details should be type-specific.
Decide what types of queries or reports you want to run on the data.
Do you need to structure the type-specific stuff so that specific queries are possible? Can you keep the type-specific stuff in an XML or JSON document, and only parse them in type-specific reports? Or, do you need to refer to type-specific stuff in the queries themselves? Type-specific details can make queries hard, but a nosql database such as mongodb might actually help here.
Figure out your data retention policy. You probably need to clean up old data at some point. This might affect the design of your storage.
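The event shape suggested above (id, type, timestamp, source, details) might be modelled like this, assuming Java 17+. TrackedEvent is a hypothetical name, and keeping the type-specific details as a map is only one option; it would typically be serialized to a JSON column or document.

```java
import java.time.Instant;
import java.util.Map;
import java.util.UUID;

// Hypothetical event type: shared fields are typed, type-specific data goes into details.
record TrackedEvent(UUID id, String type, Instant timestamp, String source,
                    Map<String, Object> details) {

    // Convenience factory that fills in the id and timestamp.
    static TrackedEvent of(String type, String source, Map<String, Object> details) {
        return new TrackedEvent(UUID.randomUUID(), type, Instant.now(), source,
                Map.copyOf(details));
    }
}
```

A call from a catch block could then look like TrackedEvent.of("FILE_TOO_LARGE", "app-1", Map.of("fileId", 42, "fileSize", 1024L)), with the result handed to whatever store you decide on.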

Building a Java based stock trading application, need pointers for technologies to use

I am building an application in Java (with a jQuery frontend) that needs to talk to a third-party application. It needs to update the interface every two seconds at the most.
Would it be a good idea to use Comet? If so, how does it fit into the picture?
What other means/technologies can I use to make the application better?
The application will poll stock prices from a third-party app, write them to a database, and then push them to the front end every second. For the polling, I have a timer that runs every second to call the third-party app for data. I then have to display it on the front end using JSP or something similar.
At this point I'm not sure if I should use a servlet to write this out to the front end. What would you recommend? How should I go about it?
Is there any newer technology that I can use instead of servlets?
I am also using Berkeley DB to store the data. Do you think it's a good option? What would be the drawbacks, if any, of using Berkeley DB?
I'm absolutely clueless, so any advice will be much appreciated.
Thanks!
Edit: I am planning to do this so that a desktop app constantly polls the third party and writes to the database, and a web app only reads and displays from the database. This will reduce the load on the web app, since all it has to do is read from the DB.

Take a look at using a web application framework instead of Servlets - unless it's a really basic project with one screen. There are lots in the Java world unfortunately and it can be a bit of a minefield. Stick with maybe SpringMVC or Struts 2, the worst part is setting these up, but take a look at a sample application plus a tutorial or two and work from there.
http://www.springsource.org/about
http://struts.apache.org/2.x/index.html
Another option to look at is using a template framework such as Appfuse to get yourself up and running without having to integrate a lot of the framework together, see:
http://appfuse.org/display/APF/AppFuse+QuickStart
It provides you with a template to set up SpringMVC with MySQL as a database plus Spring as a POJO framework. It may be a quick way to get started and build a prototype.
Judging by your latency requirement of 2 seconds it would be wise to look at some sort of AJAX framework - JQuery or Prototype/Scriptaculous are both good places to start.
http://jquery.com/
http://www.prototypejs.org/
In terms of other technologies to make things better, you will want to consider a build system. Ant and Maven are both fine, with Maven the slightly more complex of the two.
http://ant.apache.org/
http://maven.apache.org/download.html
Also, consider JUnit for testing the application. You might want to consider Selenium for functional testing of the front end.
http://www.junit.org
http://seleniumhq.org/

Is this really a stock trading application? Or just a stock price display application? I am asking because from your description it sounds like the latter.
How critical is it that data is polled every second? Specifically would it matter if some polls are a second or two late?
If you are building a stock trading application (where the timing is absolutely critical), or if you cannot afford to be delayed on your polling, I'd recommend you have a look at one of the Java Real Time solutions:
Sun Java Real-Time System (http://java.sun.com/javase/technologies/realtime/index.jsp)
WebSphere Real Time (http://www-01.ibm.com/software/webservers/realtime/)
Oracle JRockit Real Time (http://download.oracle.com/docs/cd/E13150_01/jrockit_jvm/jrockit/docs30/index.html)
Other than that, my only advice is that you stick to good OO design practices. For instance, use a DAO to write to your database, this way, if you find that Berkeley DB isn't quite for you, you can switch to a relational database system with relative ease. It also makes it easy for you to move on to some database partitioning solutions (e.g., Hibernate Shards) if you decide you need it.
While I may have my own technology preferences (for instance, I'd choose Spring MVC for the front end as others have mentioned, and I'd try to use Hibernate for persistence), I really cannot claim that these would be better than other technologies out there. Go with something you are familiar with, if it fits the bill.
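To make the DAO suggestion concrete, here is a minimal sketch. PriceDao and InMemoryPriceDao are hypothetical names; a Berkeley DB or relational implementation would implement the same interface, so the calling code would not need to change when the store is swapped.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical DAO: callers depend on this interface, not on a concrete store.
interface PriceDao {
    void save(String symbol, BigDecimal price);

    Optional<BigDecimal> findLatest(String symbol);
}

// Stand-in implementation for illustration and tests; a Berkeley DB or JDBC
// implementation would replace it without touching the callers.
class InMemoryPriceDao implements PriceDao {
    private final Map<String, BigDecimal> latest = new ConcurrentHashMap<>();

    @Override
    public void save(String symbol, BigDecimal price) {
        latest.put(symbol, price);
    }

    @Override
    public Optional<BigDecimal> findLatest(String symbol) {
        return Optional.ofNullable(latest.get(symbol));
    }
}
```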

I think you should focus on your architectural design before picking technologies with a focus on scalability and extendability. Once an architectural design is in place you can look to see what's available and what you need to build, all of which should be pretty obvious.
While not directly comparable look at how Google, eBay and YouTube deal with the scalability problems they face. While a trading system won't have the issues these guys have with sheer numbers of users, you'll get similar problems with data volumes and being able to process price ticks in a timely fashion.
The LSE has getting on for 3000 names; multiply this by the 10 or so popular exchanges round the world and you've got a lot of data being updated continuously over the period each market is open. To give you an idea of what is involved in capturing data from a single exchange, take a look at http://kx.com/.
From a database perspective you're going to need something industrial strength that allows clustering and has reliable replication - for me this means Oracle. You also want to look at a Time-series Database Design, which in my experience is the best way to build this sort of system.
The same scaling and reliability requirements will apply to your app servers, with JBoss being the logical choice there, although I'd also consider the OSGi Spring Server (http://www.springsource.com/products/dmserver) as its lightweight nature could make it faster.
You'll also want Apache servers for load balancing and to serve static content - a quick Google will turn up stacks of information on that so I won't repeat it here.
Also forget polling, it doesn't scale. Look at using messaging and consumer processes for the cross-process communication, events and worker threads for the in-process communication. Both techniques achieve a natural load balancing effect that can be tuned by increasing the number of consumer processes or worker threads as need be.
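For the in-process half of that advice (events and worker threads), a minimal sketch using only java.util.concurrent, assuming Java 17+. PriceTick and TickPipeline are hypothetical names, and the "processing" here only records the tick; cross-process messaging would use a message broker instead of an in-memory queue.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical event type for one price update.
record PriceTick(String symbol, double price) {}

class TickPipeline {
    // Marker object ("poison pill") that tells a worker to stop.
    private static final PriceTick STOP = new PriceTick("", 0);

    private final BlockingQueue<PriceTick> queue = new LinkedBlockingQueue<>();
    private final List<PriceTick> processed = new CopyOnWriteArrayList<>();
    private final ExecutorService workers;
    private final int workerCount;

    TickPipeline(int workerCount) {
        this.workerCount = workerCount;
        this.workers = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            workers.submit(this::consume);
        }
    }

    // Producers push ticks as they arrive instead of consumers polling for them.
    void publish(PriceTick tick) {
        queue.add(tick);
    }

    private void consume() {
        try {
            while (true) {
                PriceTick tick = queue.take(); // blocks until a tick is available
                if (tick == STOP) {
                    return;
                }
                processed.add(tick); // placeholder for real processing
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Sends one stop marker per worker and waits for the workers to finish.
    List<PriceTick> shutdownAndGetProcessed() {
        for (int i = 0; i < workerCount; i++) {
            queue.add(STOP);
        }
        workers.shutdown();
        try {
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return List.copyOf(processed);
    }
}
```

Raising workerCount is the tuning knob the answer refers to: whichever worker is idle takes the next tick from the queue.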
Also a static front-end isn't going to cut the mustard, IMHO. Take a look at what's out in the market already - CNC Markets, IG Index, etc all have pretty impressive real-time trading apps.
As an aside, assuming this is a commercial project and not meaning to put a downer on the whole thing, companies like CNC Markets, IG Index, etc make their money from trading fees, the software being a means to an end, which you get access to for free simply by having an account. The other target for the trading software is commercial institutions such as the banks, investment managers, etc. I'd want a pretty watertight plan for how I was going to break into either of these markets before expending too much time and effort.

PostgreSQL is probably the right database. It's a little more enterprisy than MySQL. As for the front-end, there's lots of stuff that can go "on top" of servlets, SpringMVC, Tapestry, and so on and so forth. The actual servlet implementation will be hidden from you.
Many will suggest, and it's probably not a bad suggestion to use Spring to configure the application and to do any dependency injection.
If you're looking for something a little more lightweight, you might consider grails. It's quick to develop with and becoming mature.
Really though, it's kind of hard to recommend things without knowing what kind of "production" environment this would be. Are we talking lots of transactions? (sure, it's a stock trading program, but is it a simulation with a small number of users etc...) It's fun to suggest things, but if you're serious, I'm not sure I would start a major project like this. There are lots of ways to do this, and lots of ways to do this wrong.

Your intention is to build a web UI which shows realtime data eg: time, market data etc...
One of the technologies I have personally used is Web Firm Framework, an opensource framework under Apache License 2.0. It is a java server side framework to build web UI. For each and every tag & attribute there is a corresponding java class. We are just building the UI with Java code instead of pure HTML and JavaScript. The advantage is whatever changes we are making in the server tag & attribute objects will be reflected to the browser page without any explicit trigger from the client. In your case we can simply use ScheduledExecutorService to make data changes in the UI.
Eg:
    AtomicReference<BigDecimal> oneUSDToOneGBPRef = new AtomicReference<>(new BigDecimal("0.77"));
    SharedTagContent<BigDecimal> amountInBaseCurrencyUSD = new SharedTagContent<>(BigDecimal.ZERO);

    Div usdToGBPDataDiv = new Div(null).give(dv -> {
        // the second argument is the formatter
        new Span(dv).subscribeTo(amountInBaseCurrencyUSD, content -> {
            BigDecimal amountInUSD = content.getContent();
            if (amountInUSD != null) {
                return new SharedTagContent.Content<>(amountInUSD.toPlainString(), false);
            }
            return new SharedTagContent.Content<>("-", false);
        });
        new Span(dv).give(spn -> {
            new NoTag(spn, " USD to GBP: ");
        });
        new Span(dv).subscribeTo(amountInBaseCurrencyUSD, content -> {
            BigDecimal amountInUSD = content.getContent();
            if (amountInUSD != null) {
                BigDecimal oneUSDToOneGBP = oneUSDToOneGBPRef.get();
                BigDecimal usdToGBP = amountInUSD.multiply(oneUSDToOneGBP);
                return new SharedTagContent.Content<>(usdToGBP.toPlainString(), false);
            }
            return new SharedTagContent.Content<>("-", false);
        });
    });

    amountInBaseCurrencyUSD.setContent(BigDecimal.ONE);

    // just to test
    // will print <div><span>1</span><span> USD to GBP: </span><span>0.77</span></div>
    System.out.println(usdToGBPDataDiv.toHtmlString());

    ScheduledExecutorService scheduledExecutorService =
            Executors.newScheduledThreadPool(1);
    Runnable task = () -> {
        // dynamically get USD to GBP exchange value
        oneUSDToOneGBPRef.set(new BigDecimal("0.77"));
        // to update latest converted value
        amountInBaseCurrencyUSD.setContent(amountInBaseCurrencyUSD.getContent());
    };
    // run the task every second; schedule(...) would run it only once
    ScheduledFuture<?> scheduledFuture =
            scheduledExecutorService.scheduleAtFixedRate(task, 1, 1, TimeUnit.SECONDS);

    // to cancel the realtime update
    // scheduledFuture.cancel(false);
For displaying time in real-time you can use SharedTagContent<Date> and ContentFormatter<Date> to show time in specific timezone. You can watch this video for better understanding. You can also download sample projects from this github repository.
