How do I perform a large query in google app engine? - java

I have a collection of apps that totals almost 1 million users. I am now adding a push notification system using Google Cloud Messaging (GCM) to create alerts. My database contains an entity with the GcmId and the app name (e.g. "myApp1").
Now I want to send a GCM message to all users of "myApp1". The Objectify documentation does not describe the .limit() function well, though. For example, from the GCM demo app:
List<RegistrationRecord> records = ofy().load().type(RegistrationRecord.class).limit(10).list();
will send to the first 10 entries. But I need all entries that match appType="myApp1". This is harder because the query can be large: it could potentially match half a million users, and I need to send the GCM push to all of them.
How is such a large query performed?
EDIT: I am currently using
List<RegistrationRecord> records = ofy().load().type(RegistrationRecord.class).filter("app","myApp1").list();
for testing, and it is working fine. However, when this goes live the dataset will be huge, and I don't know what the repercussions will be.

I believe you are looking at this from the wrong angle.
Objectify and the low-level App Engine API deal very well with paginated results using cursors, so you would need to process the results in chunks. I won't go into much detail on how to do that, because it would end up costing you a lot of money in datastore reads, and you would need task queues.
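For completeness, the cursor pattern would look roughly like the sketch below (assuming the filter("app", ...) query from your edit; it still costs a datastore read per record, and each chunk would typically run from a task queue):
// One chunk per task-queue invocation; cursorStr comes from the previous task (null on the first run).
// Uses com.google.appengine.api.datastore.Cursor/QueryResultIterator and com.googlecode.objectify.cmd.Query.
Query<RegistrationRecord> query = ofy().load().type(RegistrationRecord.class)
        .filter("app", "myApp1")
        .limit(500);
if (cursorStr != null) {
    query = query.startAt(Cursor.fromWebSafeString(cursorStr));
}
QueryResultIterator<RegistrationRecord> iterator = query.iterator();
boolean processedAny = false;
while (iterator.hasNext()) {
    RegistrationRecord record = iterator.next();
    // send (or enqueue) the GCM message for this record here
    processedAny = true;
}
if (processedAny) {
    String nextCursor = iterator.getCursor().toWebSafeString();
    // re-enqueue this task with nextCursor to process the next chunk
}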
Instead, look at topics in Google Cloud Messaging:
https://developers.google.com/cloud-messaging/topic-messaging
The user (client-side app) subscribes to the topic (the app id in your case). You then send a single topic push, which is much easier from an App Engine frontend instance (limited to a 30-second response and such).
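A rough sketch of that single topic send over GCM's HTTP endpoint (API_KEY and the payload below are placeholders):
// Send one message to everyone subscribed to /topics/myApp1 (API_KEY is your GCM server key).
URL url = new URL("https://gcm-http.googleapis.com/gcm/send");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setDoOutput(true);
conn.setRequestMethod("POST");
conn.setRequestProperty("Authorization", "key=" + API_KEY);
conn.setRequestProperty("Content-Type", "application/json");
String body = "{\"to\": \"/topics/myApp1\", \"data\": {\"message\": \"hello\"}}";
conn.getOutputStream().write(body.getBytes("UTF-8"));
int status = conn.getResponseCode();  // 200 means GCM accepted the topic message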
I found this blog post to be a great example of a full implementation, and of how to properly handle possible errors:
https://blog.pushbullet.com/2014/02/12/keeping-google-cloud-messaging-for-android-working-reliably-techincal-post/
The only issue I can see is that a push from the server is documented to take up to 30 seconds. An App Engine frontend instance also has a 30-second limit in total, so while it waits for the GCM push to complete, the servlet itself can time out. One way to solve this is to send the push from a task queue, which gives you 60 seconds for URLFetch calls (I assume that limit applies to any API call as well): https://cloud.google.com/appengine/docs/java/urlfetch/
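Enqueuing that work into the default push queue is short; in the sketch below, the /tasks/sendTopicPush worker URL is a hypothetical servlet you would map yourself:
// From the frontend handler: defer the actual GCM call to a task-queue worker, where URLFetch gets the longer deadline.
Queue queue = QueueFactory.getDefaultQueue();
queue.add(TaskOptions.Builder
        .withUrl("/tasks/sendTopicPush")   // hypothetical worker servlet you map yourself
        .param("topic", "myApp1")
        .method(TaskOptions.Method.POST));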

Related

QuickFIX/J - Best way to consume market data

I am using the QuickFIX library to connect to my broker's FIX server and send orders. My application (a QuickFIX Initiator) receives order messages from different algorithms by consuming a RabbitMQ queue and routes them to my broker using the FIX protocol, as outlined in the diagram below:
I had to implement a custom order execution that must check the bid and ask of a specific security before deciding whether the order to be sent is a LIMIT or a STOP LIMIT. Before having access to market data through FIX, I used to make a request to an internal API built upon the Bloomberg API:
// Fetch a snapshot of the instrument from the internal Bloomberg-backed API
instrumentData = bbgApi.getInstrumentData(this.params.getSymbol());
String side = this.params.getSide();
if (side.equals(Side.BUY)) {
    Double ask = instrumentData.getAsk();
    // ... decide between LIMIT and STOP LIMIT using the ask, then route the order
    Session.sendToTarget(this.getOrder(), sessionID);
} else if (side.equals(Side.SELL)) {
    Double bid = instrumentData.getBid();
    // ... decide between LIMIT and STOP LIMIT using the bid, then route the order
    Session.sendToTarget(this.getOrder(), sessionID);
}
I just have to check the bid and ask at the specific moment that I am sending the order.
Now that I have access to market data through FIX, I would like to consume the bid and ask from it, because the prices are closer to the real price of the securities at each moment. But because of the way FIX works, I don't know how I can make a call like
instrumentData = fixApi.getInstrumentData(this.params.getSymbol());
because when I send a FIX message, it does not return a "promise" that I can wait on before continuing my code execution. I am used to the way that JavaScript and REST APIs work, so I am a little bit stuck. I am wondering what the best way is to consume and produce market data that is received through FIX.
My ideas
1. Create a Market Data FIX application (Initiator) that will subscribe to securities data. The data for each security will be put into a RabbitMQ queue. For each order received by my FIX order application, the Order Execution Object will consume from that security's queue, react to the first market data message received, and send the order.
2. Also create a Market Data FIX application (Initiator) that will subscribe to securities data, but put the data for each security into a MySQL table. Then I would create an API like the bbgApi mentioned above that queries MySQL and returns the most recent data for the required security.
3. Create a Market Data application composed of an Initiator and an Acceptor. The Initiator will connect to my broker's market data app, and the Acceptor will accept new connections from internal applications. My Order Execution Object will request market data through the Acceptor and wait for a message containing the required data.
In my opinion, solution 3 seems ideal, but it would require multiple connections to my Acceptor, which could slow down (and delay) order execution. It would be ideal if I had only one object connected to the FIX market data app that sends the request and returns a promise that, when completed, yields the required data.
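Roughly, I picture something like the sketch below (InstrumentData, buildMarketDataRequest and marketDataSessionID are placeholders I have not written yet; it uses java.util.concurrent.CompletableFuture/ConcurrentHashMap and quickfix.Session):
// One shared FIX market data session; requests are keyed by MDReqID and completed when the snapshot arrives.
private final ConcurrentHashMap<String, CompletableFuture<InstrumentData>> pending = new ConcurrentHashMap<>();

public CompletableFuture<InstrumentData> getInstrumentData(String symbol) throws SessionNotFound {
    String reqId = UUID.randomUUID().toString();
    CompletableFuture<InstrumentData> future = new CompletableFuture<>();
    pending.put(reqId, future);
    Message request = buildMarketDataRequest(reqId, symbol);  // MarketDataRequest construction omitted
    Session.sendToTarget(request, marketDataSessionID);
    return future;
}

// Called from Application.fromApp() when the MarketDataSnapshotFullRefresh carrying reqId arrives.
public void onSnapshot(String reqId, InstrumentData data) {
    CompletableFuture<InstrumentData> future = pending.remove(reqId);
    if (future != null) {
        future.complete(data);
    }
}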
I would appreciate your opinion about which way is better to consume market data. If you have a different opinion or suggestion, please let me know. Thank you very much for your help.
Edit:
RabbitMQ won't be a good option because I may want to have multiple consumers for the same message. Maybe Kafka would be ideal.

Java: best practice to update data online when internet can be interrupted

I have 2 applications:
desktop (Java)
web (Symfony)
I have some data in the desktop app that must be kept consistent with the data in the web app.
So basically I send a POST request from the desktop app to the web app to update the online data.
But the problem is that the internet is not always available when I send my request, and at the same time I can't prevent the user from updating the desktop data.
So far, this is what I have in mind to make sure the data is synchronized when the internet is available.
Am I heading in the right direction or not?
If not, I hope you guys can put me on the right path to achieve my goal in a professional way.
Any links about this kind of topic would be appreciated.
In this case the useful pattern is to assume that sending data is asynchronous by default. The data, after being collected, are stored in some intermediate structure and wait for a suitable moment to be sent. I think a queue could be useful because it can be backed by a database and prevent data loss in case the sending fails. A separate thread (e.g. a job) checks for data in the queue and, if any exist, reads them and tries to send them. If sending completes correctly, the data are removed from the queue. If a failure occurs, the data stay in the queue and another attempt will be made to send them next time.
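A minimal sketch of such a sending job, assuming the pending requests sit in a (database-backed) outbox queue of hypothetical PendingUpdate objects and a hypothetical postToWebApp() HTTP call:
// Drain the outbox every 30 seconds; an item is only removed after it has been sent successfully.
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleWithFixedDelay(() -> {
    PendingUpdate update;
    while ((update = outbox.peek()) != null) {
        try {
            postToWebApp(update);   // hypothetical HTTP POST to the Symfony app
            outbox.remove();        // success: drop the head of the queue
        } catch (IOException e) {
            break;                  // offline or server down: leave it queued and retry next run
        }
    }
}, 0, 30, TimeUnit.SECONDS);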
This is a typical scenario where you want to send a message to a non-transactional external system from within a transaction, and you need to guarantee that the data will be transferred to the external system as soon as possible without losing it.
Two solutions come to mind; maybe the second fits your architecture better.
Use case 1)
You can use a message queue plus a redelivery-limit setting with the dead letter pattern. In that case you need to have an application server.
Here you can read the details of the Dead Letter pattern.
This document explains how the redelivery limit works on WebLogic Server.
Use case 2)
You can create an interface table in the database of the desktop application. Then insert your original data into the database, and insert a new record containing the data you want to POST into the interface table as well (all in the same transaction). The status flag of the new record in the interface table can be "ARRIVED". Then create an independent timer in your desktop app which periodically searches for records in the interface table with status "ARRIVED". This timer-controlled process will try to POST the data to the web service. If the HTTP response is 200, it updates the status of the record to "SENT".
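A hedged sketch of that timer-controlled process, assuming a plain JDBC connection and a hypothetical postToWebService() helper:
// Runs on a timer: POST every "ARRIVED" row and mark it "SENT" only when the web app answers 200.
try (PreparedStatement select = connection.prepareStatement(
         "SELECT id, payload FROM interface_table WHERE status = 'ARRIVED'");
     ResultSet rows = select.executeQuery()) {
    while (rows.next()) {
        long id = rows.getLong("id");
        int httpStatus = postToWebService(rows.getString("payload"));  // hypothetical HTTP helper
        if (httpStatus == 200) {
            try (PreparedStatement update = connection.prepareStatement(
                     "UPDATE interface_table SET status = 'SENT' WHERE id = ?")) {
                update.setLong(1, id);
                update.executeUpdate();
            }
        }
    }
}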
Boot can work like a charm.
You can solve this in many ways. Here are two:
1. You can use the circuit breaker pattern. You can get a link about it from here.
2. You can use JMS to manage this.

Pagination of highly dynamic and frequently changing data in Java

I am a Java developer and my application runs on iOS and Android. I have created a web service for it using the Restlet framework, with JDBC for DB connectivity.
My problem is that I have three types of data, together called an intersection: current + past + future. This intersection contains a list of users as its data. There is a single web service that gives the device all the users in his/her intersection. I have implemented pagination, but the server has to process all of his/her intersections and then return only the (start-end) slice to the device. I did this because there is a chance that a past user may also come into the current set. That is the whole logic.
But as the intersections in a profile grow, the server has to process every user, so it becomes slow, which is to be expected. Also, the device calls this web service every 5 minutes.
Please suggest a better way to handle this scenario.
Thanks in advance.
Ketul Rathod
It's a little hard to follow your logic, but it sounds like you could probably benefit from caching your results on the server.
If it makes sense for your case, after every time you process the user's data on the server, save the results (to a file, to a database table, whatever). Then, 5 minutes later, if there are no changes, simply return the same results. If there were changes, retrieve the results from the cache (optionally invalidating the cache in the process), append the changes to what was cached, re-save the results in the cache, and return them.
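For instance, a minimal sketch of that caching idea with an in-memory map keyed by user (CachedIntersection, computeIntersection and hasChangedSince are made-up names):
// Cache each user's computed intersection together with the time it was built.
private final ConcurrentHashMap<String, CachedIntersection> cache = new ConcurrentHashMap<>();

public List<User> getIntersectionPage(String userId, int start, int end) {
    CachedIntersection cached = cache.get(userId);
    if (cached == null || hasChangedSince(userId, cached.builtAt)) {   // hypothetical change check
        cached = new CachedIntersection(computeIntersection(userId),   // the expensive full processing
                                        System.currentTimeMillis());
        cache.put(userId, cached);
    }
    List<User> all = cached.users;
    return all.subList(Math.min(start, all.size()), Math.min(end, all.size()));
}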
If this is applicable to your workflow, your server-side processing time will be significantly less.

CouchDB data replication

I have 30 GB of Twitter data stored in CouchDB. I am aiming to process each tweet in Java, but the Java program cannot hold that much data at one time. In order to process the entire dataset, I am planning to divide it into smaller pieces with the help of the filtered replication supported by CouchDB. But, as I am new to CouchDB, I am facing a lot of problems in doing so. Any better ideas for doing this are welcome. Thanks.
You can always query CouchDB for a dataset that is small enough for your Java program, so there should be no reason to replicate subsets into smaller databases. See this Stack Overflow answer for a way to get paged results from CouchDB. You might even employ CouchDB itself for the processing with map/reduce, but that depends on your problem.
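As a rough sketch, pulling one page at a time over CouchDB's HTTP API could look like this (the database name, page size, and pagination over _all_docs keys are assumptions):
// Fetch one page of documents; pass the last key of the previous page as startkey (with skip=1
// so the boundary document is not repeated), per the usual CouchDB pagination recipe.
String base = "http://localhost:5984/tweets/_all_docs?include_docs=true&limit=1000";
String pageUrl = (lastKey == null)
        ? base
        : base + "&startkey=" + URLEncoder.encode("\"" + lastKey + "\"", "UTF-8") + "&skip=1";
HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
try (InputStream in = conn.getInputStream()) {
    // parse the JSON rows (e.g. with Jackson), process each tweet, and remember the last row's key
}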
Depending on the complexity of the queries and the changes you make when processing your data set, you should be fine with one instance.
Like the previous poster said, you can use paged results; I tend to do something slightly different:
I have a document for social likes. Each one refers to a URL, and I want to refresh it roughly every 2-3 hours.
I have a view that sorts the documents' URLs by the age of the last update request and the last update.
I query this view so that I exclude the articles that had a request within the last 30 minutes or were updated less than 2 hours ago.
I use RabbitMQ for enqueuing the jobs, and if they are not picked up within 30 minutes, they expire.

How to limit the quota in Jersey for every client

I've already built a web service API with the Jersey framework.
Now I want to limit the quota for every client.
For example:
- one client can make fewer than 10,000 requests in one day.
- one client can make fewer than 10 requests per second.
and so on and so forth.
Should I store this information in a database table?
But if I do that, will handling these requests cost a lot of time, because I have to update the table on every request?
I am looking for more efficient ways to solve this problem.
Because this is my first time doing this kind of job, I hope somebody can give me some advice on these problems.
Thanks~!
Without information about how you define a client, it's difficult to answer this question. However, one method would be to filter all incoming requests using a ContainerRequestFilter.
At that level you can define what a client is and log every access by that client to your Jersey application, perhaps by incrementing a value in a data structure or in a database, and then have a cron job flush that data every 24 hours.
Ideally you would want to store the data in an in-memory data structure: the data is transient, it won't grow to a large size, and it will be deleted after a short period of time anyway. However, this will become an issue if you ever scale up to multiple machines, or to multiple instances on a single machine.
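A minimal sketch of such a filter, assuming clients are identified by an X-Client-Id header (that header name and the limit value are placeholders):
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.core.Response;
import javax.ws.rs.ext.Provider;

@Provider
public class QuotaFilter implements ContainerRequestFilter {

    // Per-client counters for the current day; a cron job (or a timer) would clear this map every 24 hours.
    private static final ConcurrentHashMap<String, AtomicLong> DAILY_COUNTS = new ConcurrentHashMap<>();
    private static final long DAILY_LIMIT = 10000;

    @Override
    public void filter(ContainerRequestContext requestContext) {
        String clientId = requestContext.getHeaderString("X-Client-Id");  // however you identify a client
        if (clientId == null) {
            return;  // or reject, depending on your policy
        }
        long count = DAILY_COUNTS.computeIfAbsent(clientId, k -> new AtomicLong()).incrementAndGet();
        if (count > DAILY_LIMIT) {
            requestContext.abortWith(
                    Response.status(429).entity("Daily quota exceeded").build());  // 429 Too Many Requests
        }
    }
}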
Without more information from you, I can't really give more detail.
