Cassandra Client API Most Similar to App Engine Datastore API? - java

With the announcement of Google App Engine's new pricing model, I've realized my application will not be able to sustain itself due to the extremely high price of Google Datastore interactions. Because it is a social game that relies on consistent and quick user input, the application simply requires far too many datastore interactions on a per-user basis to be viable (even with memcache mediating common queries and operations).
From the research I've done, it seems like the best solution would be for my team to migrate to a Cassandra-based database solution. I've looked at the various popular APIs like Hector and Pelops, but from my initial inspection it seems these are a little too low-level for what I'm looking for. Is there a Cassandra client API in Java that emulates App Engine's low-level Datastore API and uses the same "Entity Group"/property model? At the very least I would like the API to have the same "Ancestor" Entity concepts and maintain cross-group transactions in the same manner.
EDIT: To clarify, what I'm really looking for is a Cassandra API that supports transactions. As far as I can understand, transactions in a NoSQL environment are difficult, if not impossible, to implement without some hierarchical grouping of "objects" (call them what you will: entities, tables, etc.). This seems to be why Hector does not implement them.
So, my question is, what is the most popular Cassandra API that implements some form of transactional system, preferably one that uses a GAE-like Entity structure?
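For readers unfamiliar with the model the question refers to: in the App Engine Datastore, an entity's key carries its ancestor path, and all entities that share the same root ancestor form one entity group. The following is only a minimal sketch of that idea in plain Java. `PathKey` and its methods are hypothetical names invented for illustration; they are not part of the Datastore API, Hector, or Pelops.

```java
import java.util.Objects;

/** Hypothetical, simplified stand-in for a Datastore key: kind, name, optional parent. */
final class PathKey {
    final PathKey parent; // null for a root entity
    final String kind;
    final String name;

    PathKey(PathKey parent, String kind, String name) {
        this.parent = parent;
        this.kind = Objects.requireNonNull(kind);
        this.name = Objects.requireNonNull(name);
    }

    /** Walks up the ancestor path; the root key identifies the entity group. */
    PathKey root() {
        PathKey k = this;
        while (k.parent != null) {
            k = k.parent;
        }
        return k;
    }

    /** Keys can take part in the same single-group transaction only if their roots match. */
    boolean sameGroupAs(PathKey other) {
        return root().equals(other.root());
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PathKey)) {
            return false;
        }
        PathKey k = (PathKey) o;
        return kind.equals(k.kind) && name.equals(k.name) && Objects.equals(parent, k.parent);
    }

    @Override
    public int hashCode() {
        return Objects.hash(parent, kind, name);
    }

    @Override
    public String toString() {
        String self = kind + "(\"" + name + "\")";
        return parent == null ? self : parent + "/" + self;
    }
}

class EntityGroupDemo {
    public static void main(String[] args) {
        PathKey alice = new PathKey(null, "Player", "alice");
        PathKey inventory = new PathKey(alice, "Inventory", "main");
        PathKey sword = new PathKey(inventory, "Item", "sword");
        PathKey bob = new PathKey(null, "Player", "bob");

        System.out.println(sword);                        // full ancestor path
        System.out.println(sword.sameGroupAs(inventory)); // true: both rooted at alice
        System.out.println(sword.sameGroupAs(bob));       // false: different root
    }
}
```

Any Cassandra-backed emulation would need some equivalent of `root()` to decide which rows a transaction is allowed to touch together, which is the hierarchical grouping the EDIT above alludes to.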

Did you see hector-object-mapper? https://github.com/rantav/hector/tree/master/object-mapper
Lightweight, annotation driven persistence for Apache Cassandra via Hector. For more docs on Hector usage, see: http://hector-client.org

The best I could find is AppScale, which uses Cassandra, though it's unclear to me how to set up and run the Datastore as a standalone service without the other GAE services. I've created an issue for that. Technically you could use the same client library with minor tweaks (i.e. replace the datastore URL/endpoint with your own).

Related

Is hibernate recommended in a heterogeneous environment?

Is Hibernate less effective in some environments, like a polyglot company where several distributed systems are accessing the same db? If Acme Company has a python website reading from and writing to the same database as a java web app (web services), will Hibernate be a poor choice for the java web services app? In other words, does Hibernate caching and session management assume all db transactions for Acme will be using Hibernate? Do I need to be sensitive to certain ORM concerns at a company where several programming languages are writing a lot of updates to the same data concurrently? Is Hibernate more advantageous for a strict java shop using a java ee app server for nearly all of its business operations?
Hibernate does have some performance overhead over pure JDBC, but if you're using it cautiously it should be fine for most use cases.
Hibernate does not assume that it handles all operations itself. The only thing I would worry about is second level cache if you need it. You won't have a way to keep it in sync if other apps access the same DB (but you don't have to use it).
Having said that, I must add that having multiple apps write to the same DB is not a good practice. I'd rather create one app that handles this DB and have others communicate with this one - this way it's much easier to keep the database consistent.
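The second-level cache caveat above can be illustrated without Hibernate at all. The sketch below is an assumption-laden analogy in plain Java: `CachingReader` is a hypothetical read-through cache, and the shared `Map` stands in for the database that the Python application also writes to. It does not use or reproduce Hibernate's caching API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical read-through cache sitting in front of a store shared with other apps. */
final class CachingReader {
    private final Map<String, String> database; // shared with other applications
    private final Map<String, String> cache = new HashMap<>();

    CachingReader(Map<String, String> database) {
        this.database = database;
    }

    /** Hits the database only on a cache miss; later reads are served from the cache. */
    String read(String key) {
        return cache.computeIfAbsent(key, database::get);
    }

    /** Without an explicit eviction, external writes are never noticed. */
    void evict(String key) {
        cache.remove(key);
    }
}

class StaleCacheDemo {
    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("user:1", "Ann");

        CachingReader javaApp = new CachingReader(db);
        System.out.println(javaApp.read("user:1")); // Ann (now cached)

        db.put("user:1", "Anne");                   // another application writes directly
        System.out.println(javaApp.read("user:1")); // still Ann: the cache is stale

        javaApp.evict("user:1");
        System.out.println(javaApp.read("user:1")); // Anne
    }
}
```

The Java side has no way of knowing when to call `evict` unless the other writers tell it, which is why leaving the second-level cache off, or routing all writes through one application, sidesteps the problem.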

Use couchDB with vert.x

I'm looking into a NoSQL database for use with Vert.x
Based on the not so favorable results, MongoDB is out, so I'm looking at CouchDB/Couchbase, not least because some of our data collection runs on a Raspberry Pi fed by Arduino I/O (with a Raspberry Pi CouchDB instance for offline collection).
What Java library would be suitable/best for use with CouchDB and Vert.x?
I don't know a lot about vert.x but it appears to run on the JVM, so you should just be able to use Ektorp, which is pretty much the standard Java library for CouchDB nowadays. It covers all the core functionality, it's fairly well thought out, and the maintainer has been reasonably responsive to pull requests etc, as far as I've seen.
There's more documentation on Ektorp here.

Are there any frameworks to synchronize data generated on one peer with all other peers in an unreliable network?

We are developing a system with the following requirements.
There are N systems that each generate data that is unique to themselves
Each system requires the data from every other system to perform its end goal
These systems are talking to each other on an unreliable network.
It is expected that some systems will be completely unavailable for extended periods of time (but they may be in contact with some of their peers who are in contact with the rest of the network)
To put it another way, each system needs to replicate its data to N peer systems. Ideally, this will be done in an intelligent manner.
I have considered looking into database synchronization frameworks, but I am concerned that it is overkill for this problem. I don't think there is any possibility for row conflicts because each system's data is entirely independent of other systems.
The question is, do you know of any frameworks that could help solve this problem? Or possibly a way to phrase this issue that might help me down a path to discover a solution.
Finally, ideally, this framework would be in C++ (and potentially, java).
SymmetricDS.org
The solution you are looking for sounds a lot like the open source software SymmetricDS.
"SymmetricDS is an asynchronous data replication software package that supports multiple subscribers and bi-directional synchronization. It uses web and database technologies to replicate tables between relational databases, in near real time if desired. The software was designed to scale for a large number of databases, work across low-bandwidth connections, and withstand periods of network outage."
-SymmetricDS.org
Symmetric was designed to be used as a Java library, as well as a stand alone application. Used with a lightweight database like H2, you could avoid your overkill scenario. H2 can optionally be run embedded within an application and can store data in memory or to disk.
Disclaimer: I recently started working for JumpMind, the company that develops this software.
0mq. It is a C framework with a C++ interface. It notably supports EPGM (reliable multicast over UDP) and N-to-N connections. Though, there will be work to do for your special use case.
Interesting problem. Many of the issues you've described lend themselves particularly well to the BitTorrent protocol.
It seems you want to implement a reliable broadcast for your peer communication. Check out the library J.N. provided, and if it is not sufficient (or you want to modify it) there are some algorithms in this book.
Check Causal Order Broadcast and Total Order Broadcast.
My teacher at the university did implement such a library; I will update when I find it.
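Because each system only ever appends to its own data, one simple way to get the "everyone eventually has everything" behaviour is a pull-based exchange in which a node copies whatever records its peer holds that it has not yet seen, including records that originated on third nodes. The following is a minimal sketch of that idea in Java under those assumptions. `ReplicaNode` and its methods are hypothetical names, the network is replaced by direct method calls, and this is not the algorithm used by 0mq or by any library mentioned in these answers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical peer that keeps one append-only log per originating node. */
final class ReplicaNode {
    final String id;
    // origin node id -> that origin's records, in the order the origin produced them
    private final Map<String, List<String>> logs = new HashMap<>();

    ReplicaNode(String id) {
        this.id = id;
    }

    /** Only the owning node appends to its own log, so logs never conflict. */
    void produce(String record) {
        logs.computeIfAbsent(id, k -> new ArrayList<>()).add(record);
    }

    /** Number of records held for an origin; the set of counts acts as a version vector. */
    int known(String origin) {
        List<String> log = logs.get(origin);
        return log == null ? 0 : log.size();
    }

    /** Copies every record the peer holds that this node is missing; returns the count. */
    int pullFrom(ReplicaNode peer) {
        int copied = 0;
        for (Map.Entry<String, List<String>> e : peer.logs.entrySet()) {
            List<String> theirs = e.getValue();
            List<String> mine = logs.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
            for (int i = mine.size(); i < theirs.size(); i++) {
                mine.add(theirs.get(i));
                copied++;
            }
        }
        return copied;
    }

    List<String> recordsFrom(String origin) {
        List<String> log = logs.get(origin);
        return log == null ? new ArrayList<String>() : new ArrayList<String>(log);
    }
}

class ReplicationDemo {
    public static void main(String[] args) {
        ReplicaNode a = new ReplicaNode("A");
        ReplicaNode b = new ReplicaNode("B");
        ReplicaNode c = new ReplicaNode("C");
        a.produce("a1");
        a.produce("a2");

        b.pullFrom(a);                          // B talks to A while A is reachable
        c.pullFrom(b);                          // A may now be offline; C learns A's data via B
        System.out.println(c.recordsFrom("A")); // [a1, a2]
        System.out.println(c.pullFrom(b));      // 0: nothing new, so nothing is re-sent
    }
}
```

A real deployment would exchange the per-origin counts first and then ship only the missing suffixes over the wire; the loop in `pullFrom` is the in-memory equivalent of that exchange.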
What you are looking for is called a "distributed database", and they are used extensively, even in production systems; http://www.project-voldemort.com/ for example, is used by LinkedIn.
Since P2P networks built on DHTs such as Kademlia ARE key->value databases, there are also P2P databases, where new nodes are added automatically and the failure of any single node is tolerated well, as the resilience and scalability of those networks is proven.
So just search your preferred search engine for "p2p database" and "distributed database", and you will find a lot of implementations.

How to design a 2/3 tier distributed application in Java?

I got the task to design a distributed system that basically consists of one centrally shared database and multiple fat clients (Swing based GUIs) that should interact with this database. I basically want to administer some addresses, deadlines, tasks, etc. I am currently using Java SE 6, JPA (EclipseLink), and a MySQL database. Now I am facing some problems:
How is client 2 informed about data changes committed to the database by client 1? Is it a good idea to use an RMI approach for messaging?
I am dealing with stale data, since the JPA EntityManager caches the query results. Does it make sense to broadcast "db-change"-messages to all active clients so that they may refresh the locally cached entity objects?
Is there a much simpler approach to achieve these goals by using an application server like GlassFish? Or is the usage of Java EE application servers only convenient for web development? (sorry for these newbie questions, but I really didn't find any clear answers by reading the Java EE docs, or I simply didn't get it :-/ ...)
Any help is highly appreciated - many thanks in advance!
Is there a much simpler approach to achieve these goals by using an application server like GlassFish ?
That is the precise point of an application server (which is distinct from a web-server) in a 3-tier setup. Certainly you can poll and/or use messaging to provide additional hooks for meta-data (e.g. db change event) communication, but you will end up poorly reinventing a very well known (and non-trivial) wheel (e.g. data synchronization in a distributed tier).
If you can live without caching query results in the client and latencies of accessing the server (2nd tier) for data access are acceptable, then clearly that is the way to go.
[below is a fairly late p.s. but happened to read this and the question again today and personally felt it required clarification.]
Java EE is a distributed container/component based architecture for the enterprise tier. Putting aside the failure of a component market to emerge for J2EE (though some did try), what remains is the fact of its COA and its inherent support for distribution as a foundational concern of the architecture. Note that the web profile (e.g. "web-server") of Java EE is also part of the same architecture.
So what do you get when you use one of these Java EE application servers and how would it address your requirement/design concerns.
Two key aspects of the support for distribution offered by Java EE are (a) its distributed name-space (JNDI), and (b) its menu of offerings for connectivity across tiers: pure RMI (where you roll your own distributed RPC based system) and Enterprise Beans, aka EJBs (remotely and locally exposed component interfaces with well defined semantics in terms of lookup and life-cycle in distributable containers). Of the EJB flavors, in terms of connection semantics, you have messaging (JMS) and straight-up RPC.
For your case, you could, for example, opt for a JMS message bus with both fat-client JMS end-points and MessageDrivenBean EJBs. You c/would design a messaging domain with both topic/subscription based and straight up Queues. These can be declaratively configured to be durable, or not, etc.
Your application server c/would provide this JMS provider, or you could opt for a best of breed, e.g. TIBCO, for your needs, per your requirements.
You are not reinventing any of the above very non-trivial concerns. Your focus remains your domain requirements, and you have all the tools you need to create, within certain reasonable SLAs, your platform.
A viable alternative to this is to compose what boils down to the same exact thing minus the COA approach of Java EE (which both gets you declarative magic and pita development ceremony) with stand alone OSS software e.g. ØMQ for your bus, REST remote RPC, and possibly REDIS for beefing up persistence guarantees for your messages and for coordinating (no JNDI ..) your distributed balls in the air.
I personally prefer the latter, given that it is more fun for me. Also, efficiencies gained due to more direct control over the distribution layer allow for scalability gains given very stringent requirements (e.g. a tiny minority of requirements out there).
A distributed system design for the enterprise ("have been tasked") needs to consider business requirements beyond merely the application domain. That is part of the equation.
Hope this is helpful (and timely ;)
Since you are using JPA you could benefit from its entity locking and concurrency mechanisms.
There are two main concepts for JPA (Quoted from Java EE 6 tutorial):
Optimistic locking:
By default, persistence providers use optimistic locking, where,
before committing changes to the data, the persistence provider checks
that no other transaction has modified or deleted the data since the
data was read. This is accomplished by a version column in the
database table, with a corresponding version attribute in the entity
class. When a row is modified, the version value is incremented.
Pessimistic locking:
Pessimistic locking goes further than optimistic locking. With
pessimistic locking, the persistence provider creates a transaction
that obtains a long-term lock on the data until the transaction is
completed, which prevents other transactions from modifying or
deleting the data until the lock has ended. Pessimistic locking is a
better strategy than optimistic locking when the underlying data is
frequently accessed and modified by many transactions.
Choose the strategy that best fits your application's behavior and functional requirements.
The fat clients can poll on a configured interval. This is similar to mail clients like Outlook, which poll for new e-mail messages.
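To make the version check in the optimistic locking quote concrete, here is a minimal sketch in plain Java that mimics what a persistence provider does with a version column. `VersionedTable` and `VersionedRow` are hypothetical names and an in-memory map stands in for the database table. In real JPA code the version attribute is marked with `@Version`, and a failed check surfaces as an `OptimisticLockException` rather than a `false` return value.

```java
import java.util.HashMap;
import java.util.Map;

/** Immutable snapshot of a row as a client read it. */
final class VersionedRow {
    final String value;
    final long version;

    VersionedRow(String value, long version) {
        this.value = value;
        this.version = version;
    }
}

/** Hypothetical in-memory table with a version column per row. */
final class VersionedTable {
    private final Map<Long, VersionedRow> rows = new HashMap<>();

    synchronized void insert(long id, String value) {
        rows.put(id, new VersionedRow(value, 0));
    }

    synchronized VersionedRow read(long id) {
        return rows.get(id);
    }

    /** Mimics: UPDATE t SET value = ?, version = version + 1 WHERE id = ? AND version = ? */
    synchronized boolean update(long id, VersionedRow readEarlier, String newValue) {
        VersionedRow current = rows.get(id);
        if (current == null || current.version != readEarlier.version) {
            return false; // modified or deleted by someone else since it was read
        }
        rows.put(id, new VersionedRow(newValue, current.version + 1));
        return true;
    }
}

class OptimisticLockDemo {
    public static void main(String[] args) {
        VersionedTable tasks = new VersionedTable();
        tasks.insert(1L, "call supplier");

        VersionedRow seenByClient1 = tasks.read(1L); // both clients read version 0
        VersionedRow seenByClient2 = tasks.read(1L);

        System.out.println(tasks.update(1L, seenByClient1, "call supplier (done)")); // true
        System.out.println(tasks.update(1L, seenByClient2, "email supplier"));       // false

        // Client 2 must refresh its stale copy and decide whether to retry.
        VersionedRow refreshed = tasks.read(1L);
        System.out.println(tasks.update(1L, refreshed, "email supplier"));           // true
        System.out.println(tasks.read(1L).version);                                  // 2
    }
}
```

The rejected second update is exactly the point at which a fat client learns that its cached entity is stale, so the same mechanism doubles as a cheap staleness detector for the Swing clients.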
Your clients conceptually connect to a "middle-tier" which contains the "business logic".
Your clients send all requests to the "middle-tier" and the "middle-tier" performs them. This means that if a middle tier cares about coordinating clients, the middle-tier can remember which clients have "looked at" an important object, and (provided the technology supports it) can transmit an update to the appropriate clients.
Clients mainly contain code to present the data under this scenario, and the code they contain to accept requests mostly proxies the request to the middle tier.
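The "remember who looked at what" idea described above can be sketched as follows. This is an illustrative, in-process version only: `MiddleTier`, `DataChangeListener` and `RecordingClient` are hypothetical names, and a direct callback stands in for whatever remote transport (RMI, JMS, and so on) would actually carry the update to a fat client.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Callback a client registers implicitly by reading an object. */
interface DataChangeListener {
    void changed(String key, String newValue);
}

/** Hypothetical middle tier that owns the data and tracks interested clients. */
final class MiddleTier {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, Set<DataChangeListener>> watchers = new HashMap<>();

    /** Reading an object also records that this client has looked at it. */
    synchronized String read(String key, DataChangeListener client) {
        watchers.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(client);
        return data.get(key);
    }

    /** All writes go through the middle tier, which notifies every other watcher. */
    synchronized void write(String key, String value, DataChangeListener writer) {
        data.put(key, value);
        Set<DataChangeListener> interested = watchers.get(key);
        if (interested == null) {
            return;
        }
        for (DataChangeListener client : interested) {
            if (client != writer) {
                client.changed(key, value);
            }
        }
    }
}

/** Stand-in for a Swing client; a real one would refresh its view instead. */
final class RecordingClient implements DataChangeListener {
    final List<String> received = new ArrayList<>();

    @Override
    public void changed(String key, String newValue) {
        received.add(key + "=" + newValue);
    }
}

class MiddleTierDemo {
    public static void main(String[] args) {
        MiddleTier server = new MiddleTier();
        RecordingClient client1 = new RecordingClient();
        RecordingClient client2 = new RecordingClient();

        server.read("task:7", client1);
        server.read("task:7", client2);
        server.write("task:7", "done", client1);

        System.out.println(client2.received); // [task:7=done]
        System.out.println(client1.received); // []: the writer is not notified of its own change
    }
}
```

Invoking callbacks while holding the lock keeps the sketch short; a production middle tier would more likely hand notifications to a queue so that one slow client cannot block writers.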

JDO vs JPA for Java on Google App Engine

I want to develop my project on Google App Engine with Struts2. For the database I have two options: JPA and JDO. Could you please suggest which one to use? Both are new to me and I need to learn them, so I will focus on one after your replies.
Thanks.
The GAE/J Google group has several posts about this very thing. I'd do a search on there and look at people's opinions. You will get a very different message to the opinions expressed above. Also focus on the fact that BigTable is not an RDBMS. Use the right tool for the job.
JPA is Sun's standard for persistence, JDO is IMHO dying (actually, it's dead but still moving). In other words, JPA seems to be a better investment on the long term. So I guess I'd choose JPA if both were new to me.
Just saw this comparison between JPA and JDO by DataNucleus themselves:-
http://www.datanucleus.org/products/accessplatform_2_1/jdo_jpa_faq.html
An eye-opener.
I'm a happy user of JDO. Keep up the good work guys.
The claim that JDO is dead is not without merit. Here is what I read in the book Pro EJB 3 Java Persistence API: "Shortly thereafter Sun announced that JDO would be reduced to specification maintenance mode and that the Java Persistence API would draw from both JDO and the other persistence vendors and become the single supported standard going forward." The author, Mike Keith, is the co-specification leader on EJB3. Of course he is a big supporter of JPA, but I doubt he is biased enough to lie.
It is true that when the book was published, most major vendors were united behind JPA rather than JDO, even though JDO does have more advanced technical features than JPA. It is not surprising, because big players in the EE world such as IBM/Oracle are also big RDBMS vendors. More customers are using RDBMS than non-RDBMS in their projects. JDO was dying until GAE gave it a big boost. It makes sense, because the GAE datastore is not a relational database. Some JPA features do not work with Bigtable, such as aggregation queries, join queries, and owned many-to-many relationships. BTW, GAE supports JDO 2.3 but only JPA 1.0. I would recommend JDO if GAE is your target cloud platform.
For the record, it is Google App Engine (GAE), so we play with the Google rules not with the Oracle/Sun rules.
Under those rules, JPA is not suitable for GAE: it is unstable and it does not work as expected. Nor is Google willing to support it beyond the bare minimum.
On the other hand, JDO is quite stable in GAE and it is (to some extent) well documented by Google.
However, Google does not recommend any of them.
http://code.google.com/appengine/docs/java/datastore/overview.html
Low-level API will give the best performance and GAE is all about performance.
http://gaejava.appspot.com/
For example, adding 10 entities:
Python: 68 ms
JDO: 378 ms
Java Native: 30 ms
In race between JDO vs JPA I can only agree with the datanucleus posters.
First of all, and also most importantly, the DataNucleus posters know what they are doing. They are, after all, developing a persistence library and are familiar with data models other than the relational one, e.g. Bigtable. I am sure that if a Hibernate developer were here, he would have said: "all our assumptions when building our core libraries are tightly coupled to the relational model; Hibernate is not optimized for GAE".
Secondly, JPA is unquestionably in more widespread use, being a part of the official Java EE stack helps a bit, but that does not necessarily mean that it is better.
In fact, JDO, if you read about it, corresponds to a higher level of abstraction than JPA. JPA is tightly coupled to the RDBMS data model.
From a programming stand point, using the JDO APIs is a much better option, because you are conceptually compromising a lot less. You can switch, theoretically to any data model of your desire, provided the provider you use supports the underlying database.
(In practice you rarely achieve such a high level of transparency, because you will find yourself setting your primary keys on GAE's objects and you will be tying yourself to a specific database provider, e.g. Google.) It will still be easier to migrate, though.
Thirdly, you can use Hibernate, EclipseLink, and even Spring with GAE. Google seems to have made a big effort to allow you to use the frameworks you are used to building your applications on. But what people realize when they build their GAE applications as if they were running on an RDBMS is that they are slow. Spring on GAE is SLOW. You can google Google IO videos on this topic to see that it is true.
Also, adhering to standards is a good sensible thing to do, in principle I applaud. On the other hand, JPA being part of the Java EE stack makes people, at times, lose their notion of options.
Realize, if you will, that Java Server Faces is also part of the Java EE stack. And it is an unbelievably tidy solution for web GUI development. But in the end, why do people, the smarter people if I may say so, deviate from this standard and use GWT instead?
In all of this, I have to state that there is one very significant thing going for JPA: Guice and its convenient support for JPA. It seems that Google was not as smart as usual on this point and is content, for now, with not supporting JDO. I still think that they can afford it, and eventually Guice will engulf JDO as well... or maybe not.
Go JDO. Even if you don't have experience in it, it is not hard to pick up, and you will have a new skill under your belt!
What I think is terrible about using JDO at the time of writing is that the only implementation vendor is DataNucleus, and the drawback of that is a lack of competition, which leads to numerous issues like:
Documentation that is not very detailed on some aspects, like extensions
You often get sarcastic responses from the authors, like "Have you checked the logs? Maybe there is a reason for having them", and other annoying responses like that
You don't get an answer to your question in a helpful amount of time; if you get an answer in less than 7 days you should consider yourself lucky, even here on Stack Overflow
I'm always hoping for someone to start implementing the JDO specification themselves; maybe then they'll offer something more, and hopefully more free attention to the community, without always bothering about being paid for support. I'm not saying that the DataNucleus authors only care about commercial support, but I'm just saying.
I personally consider that the DataNucleus authors have no obligation whatsoever to DataNucleus itself nor to its community. They can drop the whole project at any time and no one can judge them for it; it's their effort and their own property. But you should know what you are getting into. You see, when one of us developers looks for a framework to use, you cannot punish or command the framework's author, but on the other hand, you need your work done! If you had time to write that framework, why would you look for one in the first place?!
On the other hand, JDO itself has some complications like objects life cycle and stuff which isn't very intuitive and common (I think).
Edit: Now I know JPA also enforces the object life cycle mechanism, so it looks like it's inevitable to deal with persisted entities' life cycle states if you wish to use a standard ORM API (i.e. JPA or JDO)
What I like most about JDO is the ability to work with ANY database management system without considerable effort.
GAE/J is slated to add MySQL before the end of the year.
JPA is the way to go, as it seems to be pushed as a standardized API and has recently gained momentum in EJB 3.0. JDO seems to have lost steam.
Neither!
Use Objectify, because it is cheaper (uses fewer resources) and faster.
FYI: http://paulonjava.blogspot.mx/2010/12/tuning-google-appengine.html
Objectify is a Java data access API specifically designed for the
Google App Engine datastore. It occupies a "middle ground"; easier to
use and more transparent than JDO or JPA, but significantly more
convenient than the Low-Level API. Objectify is designed to make
novices immediately productive yet also expose the full power of the
GAE datastore.
Objectify lets you persist, retrieve, delete, and query your own typed objects.
@Entity
class Car {
    @Id String vin; // Can be Long, long, or String
    String color;
}

ofy().save().entity(new Car("123123", "red")).now();
Car c = ofy().load().type(Car.class).id("123123").now();
ofy().delete().entity(c);
