Client side caching in GWT - java

We have a GWT client which receives quite a lot of data from our servers. Logically, I want to cache the data on the client side, sparing the server from unnecessary requests.
So far I have left it up to my models to handle the caching of data, which doesn't scale very well. It has also become a problem because different developers in our team write their own "caching" functionality, which floods the project with duplication.
I'm thinking about how one could implement a "single point of entry", that handles all the caching, leaving the models clueless about how the caching is handled.
Does anyone have any experience with client side caching in GWT? Is there a standard approach that can be implemented?

I suggest you look into gwt-presenter and the CachingDispatchAsync. It provides a single point of entry for executing remote commands and therefore a perfect opportunity for caching.
A recent blog post outlines a possible approach.

You might want to take a look at the Command Pattern; Ray Ryan held a talk at Google IO about best practices in GWT, here is a transcript: http://extgwt-mvp4g-gae.blogspot.com/2009/10/gwt-app-architecture-best-practices.html
He proposes the use of the Command Pattern using Action and Response/Result objects which are thrown in and out the service proxy. These are excellent objects to encapsulate any caching that you want to perform on the client.
Here's an excerpt: "I've got a nice unit of currency for implementing caching policies. May be whenever I see the same GET request twice, I'll cache away the response I got last time and just return that to myself immediately. Not bother with a server-side trip."
In a fairly large project, I took another direction. I developed a DtoCache object which essentially held a reference to each AsyncCallback that was expecting a response from a service call in a waiting queue. Once the DtoCache received the objects from the server, they were cached inside the DtoCache. The cached result was henceforth returned to all queued and newly created AsyncCallbacks for the same service call.
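The DtoCache idea can be sketched roughly as follows. This is only an illustration of the queueing behaviour described above, not the original implementation: `Callback` and `RemoteCall` are hypothetical stand-ins (GWT's real `AsyncCallback` has the same two methods), the string cache key is an assumption, and the sketch assumes single-threaded use as in browser-side GWT code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for GWT's AsyncCallback so the sketch compiles without GWT. */
interface Callback<T> {
    void onSuccess(T result);
    void onFailure(Throwable caught);
}

/** Hypothetical wrapper around one remote service call. */
interface RemoteCall<T> {
    void execute(Callback<T> callback);
}

/** Caches results per key and queues callbacks while a request is in flight. */
class DtoCache<T> {
    private final Map<String, T> cached = new HashMap<>();
    private final Map<String, List<Callback<T>>> waiting = new HashMap<>();

    void get(final String key, RemoteCall<T> call, Callback<T> callback) {
        // Already cached: answer immediately, no server trip.
        if (cached.containsKey(key)) {
            callback.onSuccess(cached.get(key));
            return;
        }
        // A request for this key is already in flight: just join the queue.
        List<Callback<T>> queue = waiting.get(key);
        if (queue != null) {
            queue.add(callback);
            return;
        }
        // First request for this key: open a queue and call the server once.
        queue = new ArrayList<>();
        queue.add(callback);
        waiting.put(key, queue);
        call.execute(new Callback<T>() {
            public void onSuccess(T result) {
                cached.put(key, result);
                for (Callback<T> cb : waiting.remove(key)) {
                    cb.onSuccess(result);
                }
            }
            public void onFailure(Throwable caught) {
                // Failures are not cached, so a later call retries the server.
                for (Callback<T> cb : waiting.remove(key)) {
                    cb.onFailure(caught);
                }
            }
        });
    }
}
```

Callers go through `get(...)` instead of calling the service directly, so the models never see whether a result came from the server or from the cache. Eviction and invalidation are left out of the sketch.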

For an already-fully-built, very sophisticated caching engine for CRUD operations, consider Smart GWT. This example demonstrates the ability to do client-side operations adaptively (when the cache allows it) while still supporting paging for large datasets:
http://www.smartclient.com/smartgwt/showcase/#grid_adaptive_filter_featured_category
This behavior is exposed via the ResultSet class if you need to put your own widgets on top of it:
http://www.smartclient.com/smartgwtee/javadoc/com/smartgwt/client/data/ResultSet.html

There are two levels of caching:
Caching within one browser session.
Caching across browser sessions, i.e. the cached data should still be available after the browser is restarted.
What to cache: depending on your application, you may want to cache
Protected data for a particular user
Public static (or semi-static, i.e. rarely changing) data
How to cache:
For the first caching level, you can use GWT code as suggested in the other answers, or write your own.
For the second one, you must rely on the browser's caching features. The standard approach is to put your data inside HTML (either static HTML files or pages generated dynamically, for example by a JSP or servlet). Your application then uses the overlay techniques described at http://code.google.com/webtoolkit/doc/latest/DevGuideCodingBasicsOverlay.html to read the data.

I thought Itemscript was kind of neat. It's a RESTful JSON database that works on both the client (GWT) and server.
Check it out!
-JP

Related

Best way to directly manipulate java-based backend objects from flex front-end?

I'm currently stuck between two options:
1) Store the object's information in the file.xml that is returned to my application at initialization to be displayed when the GUI is loaded and then perform asynchronous calls to my backend whenever the object is edited via the GUI (saving to the file.xml in the process).
-or-
2) Make the whole thing asynchronous so that when my custom object is brought up for editing by the end-user it queries the backend for the object, returns the xml to be displayed in the GUI, and then do another asynchronous call for if something was changed.
Either way I see many cons to both of these approaches. I really only need one representation of the object (on the backend) and would not like to manage the front-end version of the object as well as the conversion of my object to an xml representation and then breaking that out into another object on the flex front-end to be used in datagrids.
Is there a better way to do this that allows me to only manage my backend java object and create the interface to it on the front-end without worrying about the asynchronous nature of it and multiple representations of the same object?
You should look at Granite Data Services: http://www.graniteds.org If you are using Hibernate: it should be your first choice, as BlazeDS is not so advanced. Granite implements a great facade in Flex to access backend java objects with custom serialization in AMF, support for lazy-loading, an entity cache on the flex-side with bean validation. Globally, it is a top-down approach with generation of AS3 classes from your java classes.
If you need real-time features you can push data changes on flex client (Gravity module) and solve conflicts on the front side or implement conflict resolvers on the backend.
You will still eventually have to deal with harder conflicts, where a Flex client sends back an object that is already out of date on the server. A basic safeguard is to add a version field and have the backend automatically reject changes to such stale objects (there are many ways to do that). You will then have to implement a custom way for the Flex client to bring itself up to date with the current state, which implies that some work could be dropped (data lost) on the Flex client.
If not so many people work on the same objects on your flex application, this will not happen a lot, like in a distributed VCS.
Depending on your real-time needs (how often does your Java object change? This is the most important question), you can choose to "cache" changes on the Flex side and then send the whole thing at once (but you'll get troublesome conflicts if changes have happened in the meantime), or you can check with the server side every time (Granite enables this), which gives fewer conflicts (and simpler ones when they do happen) but will probably mean more code to synchronize objects and more network traffic.
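The version-field idea mentioned above can be sketched in plain Java. This is a generic optimistic-locking illustration and not Granite's API: `VersionedStore`, its method names, and the in-memory map are all assumptions made for the example. With Hibernate or JPA you would normally get the same check from a `@Version` field instead of writing it by hand.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store that rejects updates based on a stale version. */
class VersionedStore {
    private static final class Entry {
        final long version;
        final String payload;
        Entry(long version, String payload) {
            this.version = version;
            this.payload = payload;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /**
     * Applies the update only if the client edited the latest version.
     * Objects that do not exist yet are treated as version 0.
     * Returns the new version number.
     */
    synchronized long update(String id, long expectedVersion, String payload) {
        Entry current = entries.get(id);
        long currentVersion = (current == null) ? 0 : current.version;
        if (currentVersion != expectedVersion) {
            // The client worked on an outdated copy; it must reload and retry.
            throw new IllegalStateException(
                "stale update for " + id + ": expected version " + expectedVersion
                + " but server has " + currentVersion);
        }
        long next = currentVersion + 1;
        entries.put(id, new Entry(next, payload));
        return next;
    }

    /** Returns the current payload, or null if the object is unknown. */
    synchronized String read(String id) {
        Entry current = entries.get(id);
        return (current == null) ? null : current.payload;
    }
}
```

A client that receives the rejection has to fetch the current object and reapply or discard its edits, which is where the lost work mentioned above comes from.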

architecture - should I combine many ad-hoc applications to a single application?

My main goal is to provide a search application written in jQuery that is based on Solr. (For those who are unfamiliar with Solr, just assume it's a REST API that returns search results.)
For this goal I wrote many small applications and servlets that each one does an ad-hoc task.
For example:
SearchApp - a jquery app in which an end user can perform searches.
SolrProxy - A java servlet that plays a proxy role between the SearchApp and solr. One of the things it does is logging the user request for later analysis.
StatsApp- a servlet that performs analysis of the user activity and returns a json with the data.
Indexer - a Java application that indexes data to Solr according to my requirements. In the process it also reads from a SQL Server DB and then runs some update commands against that DB.
IndexerServlet - an asynchronous servlet that uses Indexer to provide an ability to execute index by http request.
Nutch - an open source project that indexes data to solr for other requirements that are not accomplished in Indexer(3).
(MAYBE) - some service that will perform scheduled Nutch running.
And more components might be added.
It seems a bit wrong to have multiple java projects that each one does a single task, instead of having one project that handles most of the components.
Any ideas and insights on this?
Should I combine all the Java apps into a single project? Should I use some kind of framework for this? Or should I leave it as it is now?
I don't think it's a bad idea that you have all these separate applications. They all seem to be doing one thing, and doing it well. What you can do, is expose them via a unified interface. So essentially you have a facade that sits in front of all these disparate services that presents an abstract and uniform interface. The consumers of this service will have no idea what sits behind that facade. This is just as well, because now you can discretely update and replace individual components without affecting others. If you had combined all of them into one, you would have to push a new release every time you modified one of the components.
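As a rough sketch of such a facade: the interfaces below are hypothetical and only loosely modelled on the SolrProxy and StatsApp components from the question. The point is that callers depend on `SearchPlatformFacade` alone, so either backend can be replaced without touching them.

```java
import java.util.List;

/** Hypothetical view of the search component (e.g. something like SolrProxy). */
interface SearchBackend {
    List<String> search(String query);
}

/** Hypothetical view of the statistics component (e.g. something like StatsApp). */
interface StatsBackend {
    void record(String query);
    int totalQueries();
}

/** Single entry point that hides which components sit behind it. */
class SearchPlatformFacade {
    private final SearchBackend search;
    private final StatsBackend stats;

    SearchPlatformFacade(SearchBackend search, StatsBackend stats) {
        this.search = search;
        this.stats = stats;
    }

    /** Logs the query for later analysis, then delegates to the search component. */
    List<String> search(String query) {
        stats.record(query);
        return search.search(query);
    }

    /** Exposes usage statistics without revealing where they are stored. */
    int totalQueries() {
        return stats.totalQueries();
    }
}
```

In a deployment like the one described, each backend implementation would typically wrap an HTTP call to the separate application, so the components still ship and scale independently.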

Is it possible to save state between requests in GAE/Java?

I plan to implement a GAE app only for my own usage.
The application will get its data using URL Fetch service, updating it every x minutes (using Scheduled tasks). Then it will serve that information to me when I request it.
I have barely started to look into GAE, but I have a main question that I am not able to clear. Can state be maintained in GAE between different requests without using jdo/jpa and the datastore?
As I am the only user, I guess I could keep the info in a servlet subclass and so avoid having to deal with the Datastore... but my concern is that, as this app will receive very few requests, if it is moved to disk or whatever (I don't know yet if this has a specific name), will it lose its state?
I am not concerned about having to restart the whole app and start collecting data from scratch from time to time, that is ok.
If this is an app for your own use, and you're double-extra sure that you won't be making it multi-user, and you're not concerned about the possibility that you might be using it from two browsers at once, you can skip using sessions and use a known key for storing information in memcache.
If your reason for avoiding the datastore is concern over performance, then I strongly recommend testing that assumption. You may be pleasantly surprised.
You could use the http session to maintain state between requests, but that will use the datastore itself (although you won't have to write any code to get this behaviour).
You might also consider using the Cache API (like memcache). It's JSR 107, I think, and Google provides an implementation of it. The cache is shared between instances, but it can be emptied at any time. If you're happy with that behaviour, this may be an option. Looking at your requirements, this may be the most feasible option if you don't want to write your own persistence code.
You could store data as a static against your Class or in an application scoped Object, but doing that means when your instance spins down or your instance switches to another instance, the data would be lost as your classes would need to be loaded into the new instance.
Or you could serialize the state to the client and send it back in with each request.
The most robust option is persistence to the datastore - the JPA code is trivial. Perhaps you should reconsider?

What is the disadvantage of DWR?

When using DWR in an intranet, will disadvantages like performance or security issues occur?
Direct Web Remoting (DWR) is a tool that uses Ajax requests to contact a server from a JS file.
One thing I would watch out for is that your server will most likely get hit by more HTTP requests than if you have the (normal) full page HTTP delivery.
Let me explain. When your web page is AJAX-enabled, your clients will end up creating more HTTP requests for (say) form filling, page-fragment regeneration etc. I've seen scenarios where developers have gone AJAX-crazy, and made the web page a largely dynamic document. This results in a great user experience (if done well), but every request results in a server hit, leading to scalability and latency issues.
Note - this isn't particular to DWR, but is an AJAX issue. I've used DWR, and it works nicely. Unfortunately, I found that it worked so well, and so easily, that everything becomes a candidate for remoting, and you can end up with huge numbers of small requests.
I worked on a project with DWR - a really nice tool.
I'm not convinced about the pace of development though. They did post on the development log that they're working on getting 3.0 out the door, but the last stable release - 2.0 - was out in summer 2006. It's a bit worrying taken from a support perspective - bug fixes especially.
The main problem I've experienced is trying to script a load test on a system where the bulk of the work is done via DWR calls. The format of the calls is difficult to replicate compared with just replaying a bunch of URLs with changing parameters.
Still DWR is an excellent framework and makes implementing Javascript -> Java RPC pretty damn easy.
One thing any user of the current DWR 3.x should watch out for: when a bean instance has properties whose value is NULL, those properties are still written into the JSON, and this redundant data DOES affect performance.
When a property has the value NULL, it usually should not be sent to the front end.
Details of problem: http://dwr.2114559.n2.nabble.com/Creating-Custom-bean-converter-td6178318.html
DWR is a great tool when your site has a lot of ajax calls.
Each page that makes dwr rpc calls needs to include :
a) an interface file corresponding to the calls being made.
and
b) a JS file bundled with DWR that contains the DWR engine code that makes these calls possible, e.g. <script src="/dwr/engine.js"></script>
One technique that is frequently used when optimizing web applications is to use the browser cache as much as possible when a resource (like a JS file) has not changed on the server.
engine.js will never change unless you upgrade DWR to a newer version. But by default, engine.js is not a static file served by your web server. It is bundled as part of DWR itself and is served by the DWR controller servlet. This doesn't aid client-side caching.
So, it is beneficial to save engine.js under the document root of your webserver and let the webserver serve it as a static file.
The biggest difference among other solutions to transfer objects (marshaling) is object references.
For instance, if you use it to transfer a tree:
A
|-B
|-C
in a list {A,B,C}:
B.parent = A
C.parent= A
then A is the same object in JavaScript!
On the downside, if you have complex structures with circular dependencies and lots of objects (A<-B, B<-C, C<-B, C<-A, ...), it could crash.
Anyway, I use it in a real project used by many hundreds of companies in production to transfer thousands of objects to a single html page in order to draw a complex graph and it works nicely with a good performance.

Share file storage index with multiple open applications in Java

I'm writing an HTTP Cache library for Java, and I'm trying to use that library in the same application which is started twice. I want to be able to share the cache between those instances.
What is the best solution for this? I also want to be able to write to that same storage, and it should be available for both instances.
Now I have a memory-based index of the files available to the cache, and this is not shareable over multiple VMs. It is serialized between startups, but this won't work for a shared cache.
According to the HTTP Spec, I can't just map files to URIs as there might be a variation of the same payload based on the request. I might, for instance, have a request that varies on the 'accept-language' header: In that case I would have a different file for each subsequent request which specifies a different language.
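One way to picture the variation problem is a storage key built from the URI plus the request-header values that the response's Vary header names. The sketch below is only an illustration under assumptions: `VaryCacheKey` and its method are made up for this example, `Vary: *` (which can never match a later request) is not handled, and a real cache would still need to remember per URI which Vary header was received before it can compute the key for a lookup.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that derives a cache key from a URI and the Vary-selected headers. */
class VaryCacheKey {

    /**
     * @param uri            the request URI
     * @param varyHeader     value of the response's Vary header, e.g. "Accept-Language", or null
     * @param requestHeaders headers of the request being stored or looked up
     */
    static String of(String uri, String varyHeader, Map<String, String> requestHeaders) {
        // Header names are case-insensitive, so normalise them to lower case.
        Map<String, String> normalised = new TreeMap<>();
        for (Map.Entry<String, String> e : requestHeaders.entrySet()) {
            normalised.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue().trim());
        }

        // Pick only the headers named in Vary; sorting keeps the key order-independent.
        Map<String, String> selected = new TreeMap<>();
        if (varyHeader != null) {
            for (String name : varyHeader.split(",")) {
                String n = name.trim().toLowerCase(Locale.ROOT);
                if (n.isEmpty()) {
                    continue;
                }
                String value = normalised.get(n);
                selected.put(n, value == null ? "" : value);
            }
        }

        StringBuilder key = new StringBuilder(uri);
        for (Map.Entry<String, String> e : selected.entrySet()) {
            key.append('\n').append(e.getKey()).append('=').append(e.getValue());
        }
        return key.toString();
    }
}
```

Because the key is computed only from request and response data, two JVMs compute the same key for the same request. Hashing it into a file name would be one possible way to locate a stored variant without a shared in-memory index, although concurrent writes would still need some form of file locking.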
Any Ideas?
First, are you sure you want to write your own cache when there are several around? Things like:
ehcache
jboss cache
memcached
The first two are written in Java and the third can be accessed from Java. The first two also handle distributed caching, which is the general case of what you are asking for, I think. When they start up, they look to connect to other members so that they maintain a consistent cache across instances. Changes to one are reflected across instances. They can be set up to connect via multicast or with specific lists of servers specified.
Memcached typically works in a slightly different manner in that it is running externally to the Java processes you are running, so that all Java instances that start up will be talking to a common service. You can set up memcached to work in a distributed manner, but it does so by hashing keys so that the server you want to connect to can be determined by what it is you are looking for.
Doing a true distributed cache with consistent content is very hard to do well, which is why I suggest looking at an existing library. If you want to do it yourself, it would still help to look at those listed to see how they go about it and consider using something like JGroups as your underlying mechanism.
I think you should have a look at the WebDAV specifications. It's an HTTP extension for sharing, editing, storing, and versioning resources on a server. There is an implementation available as an Apache module, which lets you get started quickly.
So instead of writing your own cache server implementation, you might be better off with a local Apache + mod_dav instance that is available to both of your applications.
Extra bonus: since WebDAV is a specified protocol, you get interoperability with lots of tools for free.
