Anyone using JavaSpaces technology? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Are there real practical uses of JavaSpaces technology out there and how exactly is it implemented?

We are currently using JavaSpaces (Sun's Outrigger implementation) to coordinate loosely coupled processes. The idea behind it is compelling, and the API is very simple, but the actual implementation has been a problem. It's built on Jini, so five or six processes are required to bring up a space. And, at least in Sun's implementation, there is no way to make it communicate over specific ports, which makes firewalls a bit of a pain.
The other issue that we have run into is that there is no implied ordering in the space. So if you put 5 objects in, and your template on the read/take matches all 5, it is unspecified which one you will get. Depending on the application, this may or may not be an issue.
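To make the template-matching semantics concrete, here is a toy, single-JVM model of write/read/take. It is not the net.jini.space.JavaSpace API (whose methods also take a Transaction and a lease or timeout); the ToySpace and Task names are hypothetical. As in JavaSpaces, a null field in the template acts as a wildcard, and a non-null field must be equal.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Hypothetical entry type. Real JavaSpaces entries implement net.jini.core.entry.Entry
// and expose public object-typed fields; null template fields act as wildcards.
class Task {
    String type;
    Integer priority;

    Task(String type, Integer priority) {
        this.type = type;
        this.priority = priority;
    }
}

// Toy in-memory "space": one JVM, no leases, no transactions, not thread-safe.
class ToySpace {
    private final List<Task> entries = new ArrayList<>();

    void write(Task t) {
        entries.add(t);
    }

    // A null template field matches anything; otherwise the values must be equal.
    private static boolean matches(Task tmpl, Task t) {
        return (tmpl.type == null || Objects.equals(tmpl.type, t.type))
            && (tmpl.priority == null || Objects.equals(tmpl.priority, t.priority));
    }

    // Returns some matching entry without removing it, or null if nothing matches.
    Task read(Task tmpl) {
        for (Task t : entries) {
            if (matches(tmpl, t)) return t;
        }
        return null;
    }

    // Removes and returns some matching entry, or null if nothing matches.
    Task take(Task tmpl) {
        Iterator<Task> it = entries.iterator();
        while (it.hasNext()) {
            Task t = it.next();
            if (matches(tmpl, t)) {
                it.remove();
                return t;
            }
        }
        return null;
    }

    int size() {
        return entries.size();
    }
}
```

This toy happens to scan in insertion order, so take returns the earliest match; the JavaSpaces specification gives no such guarantee, which is exactly the ordering issue described above.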

GigaSpaces is a mature implementation of JavaSpaces. It is widely used in financial applications, which tend to be kept quiet.
As for the implementation, it is basically a transactional object database on top of Jini. The queries are similar to db4o's.

I've seen it used in a financial application, mostly for managing compute workers (grid style): front-tier applications wrote entries into the space, and workers took them out by matching on a field indicating that work was needed. Results could be written back into the space, triggering a notify registered by the front-tier app, which then read back the finished piece of work.
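The master/worker flow described above (the front tier writes work, workers take entries flagged as needing work and write results back) can be sketched in one JVM. This is an illustration under stated assumptions, not JavaSpaces code: a small synchronized "space" with a blocking, predicate-based take stands in for a real space, the `done` flag plays the role of the field workers match on, and the names WorkItem, BlockingSpace and MasterWorkerDemo are hypothetical. A real deployment would use take() with an entry template and a notify registration instead.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical entry: "done" is the field workers and the master match on.
class WorkItem {
    final int id;
    final boolean done;
    final long value;

    WorkItem(int id, boolean done, long value) {
        this.id = id;
        this.done = done;
        this.value = value;
    }
}

// Minimal blocking "space" shared by threads in one JVM (no leases or transactions).
class BlockingSpace {
    private final List<WorkItem> entries = new ArrayList<>();

    synchronized void write(WorkItem e) {
        entries.add(e);
        notifyAll(); // wake every thread blocked in take()
    }

    // Blocks until an entry satisfying the predicate exists, then removes and returns it.
    synchronized WorkItem take(Predicate<WorkItem> match) throws InterruptedException {
        while (true) {
            Iterator<WorkItem> it = entries.iterator();
            while (it.hasNext()) {
                WorkItem e = it.next();
                if (match.test(e)) {
                    it.remove();
                    return e;
                }
            }
            wait(); // nothing matched yet: release the monitor until the next write
        }
    }
}

class MasterWorkerDemo {
    // Front tier writes n pending items, workers square them, the master sums the results.
    static long run(int n, int workers) throws InterruptedException {
        BlockingSpace space = new BlockingSpace();
        List<Thread> pool = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        WorkItem job = space.take(e -> !e.done);           // pending work
                        space.write(new WorkItem(job.id, true, job.value * job.value));
                    }
                } catch (InterruptedException stop) {
                    // interrupted by the master: leave the worker loop
                }
            });
            t.setDaemon(true);
            t.start();
            pool.add(t);
        }
        for (int i = 1; i <= n; i++) space.write(new WorkItem(i, false, i));
        long sum = 0;
        for (int i = 0; i < n; i++) sum += space.take(e -> e.done).value; // finished work
        for (Thread t : pool) t.interrupt();
        return sum;
    }
}
```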
For compute workers it's OK, but the lack of ordering may be an issue for you (if only because of the unpredictability); some implementations have features to enforce FIFO ordering. It was also used for long-term data storage, as it was persistent, but I don't think that was a good idea. The admin tooling wasn't good enough to make it manageable, and performance suffered due to the volume of data.
Dan Creswell's Blitz JavaSpaces implementation was used - it's got a good range of features (can run in transient or persistent modes), is designed to be robust (with transaction logging) and retain high performance, and it's very tunable. As with the other Jini services, you can configure the "exporter" to have it listen on specific ports to make firewalling easier - SSL transports and full PKI were used too and are made possible by Jini's abstraction of communication.
I think GigaSpaces is the only implementation that has continued to innovate by extending the specification in numerous ways, which is nice to see. They've made it fit a wide variety of use cases and added implementation features such as clustering and high availability. Using it would worry me, though, as I'd be much happier seeing two or more implementations of these features in the community, given that GigaSpaces is fairly proprietary.

I believe Orbitz, the online travel and hotel reservation system, runs on Jini.
Based on Java Posse episodes #82, #84 and #86, which are an interview with Vin Simmons, this technology is sometimes used in military or financial applications, which unfortunately are kept quiet.

I used it a few years back but it probably has not changed much.
@Keith: It is (or at least used to be) possible to start all the services in a single process/JVM, and I think there is documentation out there on how to do this.
I believe Jini/JavaSpaces is used in a few large applications (ticketing, cell phones, etc.) in Europe. It is also used by GE Aircraft for research and analysis.
The SORCER lab at Texas Tech has a large service-oriented architecture built on top of Jini/JavaSpaces, and you may be able to find some help there.

I'm not aware of any new usage of JavaSpaces at this point in time. For distributed computing, most large-scale systems are being built with In-Memory Data Grid technology or partitioned NoSQL-like solutions. (I see a lot of Oracle Coherence being used, but that's probably because I work with it.)
For the sake of full disclosure, I work at Oracle. The opinions and views expressed in this post are my own, and do not necessarily reflect the opinions or views of my employer.

Related

Which are the advantages of developing in Java a server-side application compared to other languages? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
Our company is starting the development of a client-server application and a discussion is going on about which technologies should be used.
For the client (GUI) side we are leaning towards Qt and C++. For the server side, we have been advised to use Java, and indeed it looks like one of the languages most used for server development.
Can anyone elaborate on the advantages Java offers for server-side development, and why adopting it should make our lives as developers easier and/or allow us to reach better results than if we used, let's say, .NET or even C++?
Thanks in advance.

Some advantages:
Run compiled code across platforms.
Managed memory (garbage collection).
Huge wealth of excellent open-source libraries.
Large developer market.
Easy migration for C++ developers.
Some disadvantages:
Aging language: has not kept up with language advances IMO (e.g. adding functional facilities).
Future uncertain after the Oracle acquisition (will become clearer with time).
Low level programming difficult.
You may also want to look at other languages that run on the JVM, such as Scala and Groovy, at .NET (which can run on Linux et al. using Mono), and even at the D language, a C++-like language compiled to native code, with modern features such as (optional) garbage collection, code contracts, lambdas, etc. These languages provide many of the benefits of Java over C/C++ but have taken the progression a bit further or in different directions.

Apart from platform independence, the main advantage of server-side Java development is the wide selection of mature libraries and standardized frameworks. However, most of them are focused on web development.
For a C++ client, Java could still be beneficial if you use REST as the protocol between client and server (JAX-RS is pretty nice). Otherwise, it depends very much on your application domain and whether there are Java libraries that could help you in that regard.

Let's put it this way... it's not about which server-side language is better and whatnot; it's about what's available in your company that you can leverage and make good use of. When you work in a big corporation, sometimes you cannot just introduce "yet another language"... it doesn't work that way. :)
Further, every language has its pros and cons. You can argue the pros/cons either way, depending on how biased or open-minded you are. You can choose RoR and all those bleeding-edge technologies, but if your team members are not comfortable dealing with a brand new language, how exactly are you going to maintain the project in the long run? I mean, if your team is familiar with PHP, I don't see anything wrong with using that compared to Java, .NET, etc.
Your customers don't care about the underlying implementation as long as it works.

Java advantages:
- mature
- good to excellent backward compatibility
- wide range of available frameworks for almost any problem
- robust: garbage collection, APIs such as java.util.concurrent
- great tools to manage code quality, good IDEs etc.
- very good performance
- support for scripting
disadvantages:
- sometimes too many frameworks for the same thing
- not all the frameworks have as good quality of code as you need
- looks easier than it really is
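As a small illustration of the java.util.concurrent point in the list above, here is a sketch of a server-side handler that fans requests out to a bounded thread pool and collects the responses through Futures. The PooledHandler class and the "handled:" responses are hypothetical; only standard JDK APIs (ExecutorService, Executors, Callable, Future) are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PooledHandler {
    // Processes every request on a fixed-size pool; responses come back in request order.
    static List<String> handleAll(List<String> requests, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String r : requests) {
                tasks.add(() -> "handled:" + r.toUpperCase()); // stand-in for real work
            }
            // invokeAll blocks until all tasks finish; the futures keep submission order.
            List<String> responses = new ArrayList<>();
            for (Future<String> f : pool.invokeAll(tasks)) {
                responses.add(f.get());
            }
            return responses;
        } finally {
            pool.shutdown(); // let the worker threads exit
        }
    }
}
```

The pool size bounds how many requests run concurrently, which is the kind of resource control the bullet's "robust" point alludes to.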

You have many options on the server side. Since you control the server side, you can basically use anything. Using .NET forces you to use Windows Server, so I would prefer a framework that can run on any operating system and is portable.
Java was the right answer 5-10 years ago, because it had portability and could work on any system. But these days developers look for languages/frameworks that are easier to use, maintain and code. That's why I would vote for Python for server-side development these days: it's fast, the code is easier to read and maintain, and it has many open-source projects/libraries that you can use. Even Google is favoring Python over Java (GAE had Python support first; support for Java came later). You can use Django on Python for web development and Twisted for writing a server that communicates over TCP.

There are several issues you need to take into account when selecting the language:
which languages your team knows best / well enough
which languages the team that must maintain the server knows
whether there are suitable frameworks, of a quality that makes you want to use them
will the code be maintainable as long as the server is in production
how fast development will be: the important thing here is not the time you spend typing the code, but the time it takes until the product is stable enough to use in production without significant bugs
communication with other systems: if every system you need to communicate with is .NET, it would be wise to build the new system in .NET too
are there any constraints (must use this server, open source policy of your company, ...)
cost of licences, ...
...
In the end, the decision to use a specific language for a project of reasonable size is always a question of cost: not only the cost to build the system, but also the cost to maintain it. The points mentioned above are all cost-related. For example: if you do not know the language, you are slower (-> $); if the system cannot be maintained, it must be rebuilt (-> $); if the right libraries are not there, you need to implement them on your own (-> $); if the language you picked makes it easy for bugs to hide, it takes a long time until the system can go into production (-> $).
IMHO, the advantages of Java are: widespread knowledge (true for .NET too), a huge number of really mature open-source frameworks (this is the point in favor of Java over .NET), and a strongly typed system with a compiler, which results in fewer bugs (a long-term advantage of Java and .NET over every non-strongly-typed scripting language).
One must-have for any language you use on a server is garbage collection!

RealWorld HazelCast [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 5 years ago.
Does anyone have any real-world experience with the Hazelcast distributed data grid and execution product? How has it worked for you? It has an astonishingly simple API, and functionality that seems almost too good to be true for such a simple-to-use tool. I have done some very simple apps and it seems to work as advertised so far. So here I am looking for the real-world 'reality check'. Thank you.

We've been using it in production since version 1.8+, mainly the distributed locking feature. It works great; we've found a couple of bugs (and needed a couple of workarounds), but those were fixed relatively quickly.
With 1.8M locks per day, we have found no problems so far.
I recommend starting with version 1.9.4.4.

There are still some open issues with its development:
http://code.google.com/p/hazelcast/issues/list
Generally, you can choose either to let it discover members via multicast or to specify your own IPs. We've tried it in a LAN environment and it works pretty well. Performance-wise it's not bad, but the monitoring tool didn't work very well, as it failed to update most of the time. If you can live with the current issues, then by all means go for it. I would use it with caution, but it's a great working tool IMHO.
Update:
We've been using Hazelcast for a few months now and it's working very well. The settings are relatively easy to set up and with the new updates, are comprehensive enough to customize even small things like the number of threads allowed in read/write operations.

We are using Hazelcast (now 1.9.4.6) in production, integrated with a complicated transactional service. It was added to alleviate immediate database throughput issues. We have discovered that we frequently have to stop it, bringing down all transaction services for at least an hour. We are running clients in superclient mode because it is the only option that even remotely meets our performance requirements (about 4 times faster than native clients). Unfortunately, stopping a superclient node causes split-brain issues and causes the grid to lose records, forcing a complete shutdown of services. We have been trying to make this product work for us for almost a full year now, and even paid to have 2 Hazelcast reps flown in to help. They were unable to produce a solution, but were able to let us know that we were probably doing it wrong. In their opinion it should work better, but it was pretty much a wasted trip.
At this point we are on the hook for over 6 figures per year in licensing fees and we are currently using about 5 times the resources to keep the grid alive and meet our performance needs than we would be using with a clustered and optimized database stack. This was absolutely the wrong decision for us.
This product is killing us off. Use with caution, sparingly, and only for simple services.

If my own company and projects count as real world, here's my experience. I wanted to get as close as possible to eliminating external (disk) storage in favor of limitless and persistent "RAM". For starters, that eliminates the CRUD plumbing, which sometimes makes up to 90% of the so-called "middle tier". There are other benefits: since RAM is your "database", you don't need any complex caches or HTTP session replication (which in turn eliminates the ugly sticky-session technique).
I believe RAM is the future, and Hazelcast has everything it needs to be an in-memory database: queries, transactions, etc. So I wrote a mini-framework abstracting it, which loads data from persistent storage (I can plug in anything that can store BLOBs; the fastest turned out to be MySQL). It would take too long to explain why I didn't like Hazelcast's built-in persistence support. It's rather generic and rudimentary; they should remove it. It is not rocket science to implement your own distributed and optimized write-behind and write-through. It took me a week.
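The answer doesn't show its framework, but the general write-behind idea it mentions fits in a few lines: writes update the in-memory map immediately and mark the key dirty, and a later flush pushes only the dirty entries to the persistent store in one batch. This is a single-JVM sketch under stated assumptions: the WriteBehindMap class is hypothetical, a plain Map stands in for the BLOB store (MySQL in the answer), and a real distributed version would also have to cope with a node dying before it flushes.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Write-behind map: callers see writes immediately; the backing store is updated in batches.
class WriteBehindMap<K, V> {
    private final Map<K, V> memory = new HashMap<>();
    private final Set<K> dirty = new LinkedHashSet<>(); // keys changed since the last flush
    private final Map<K, V> store;                       // stand-in for the persistent store

    WriteBehindMap(Map<K, V> store) {
        this.store = store;
    }

    synchronized void put(K key, V value) {
        memory.put(key, value); // visible to readers at once
        dirty.add(key);         // persisted on the next flush, not now
    }

    synchronized V get(K key) {
        V v = memory.get(key);
        if (v == null) {        // miss: load from the store (null values unsupported here)
            v = store.get(key);
            if (v != null) memory.put(key, v);
        }
        return v;
    }

    // Writes each dirty key once, however often it changed; returns how many keys were written.
    synchronized int flush() {
        for (K key : dirty) {
            store.put(key, memory.get(key));
        }
        int written = dirty.size();
        dirty.clear();
        return written;
    }
}
```

Calling flush() periodically (for example from a ScheduledExecutorService) gives write-behind; a write-through variant would instead call store.put inside put() before returning.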
Everything was fine until I started performance testing. Queries are slow, even after all the optimizations I did: indexes, Portable serialization, explicit comparators, etc. A simple "greater than" query on an indexed field takes 30 seconds on a set of 60K 1 KB records (map entries). I believe the Hazelcast team did everything they could. As much as I hate to say it, Java collections are still slow compared to the super-optimized C++ code that normal databases use. There are some open-source Java projects that address that. However, at this time query performance is unacceptable. It should be instant on a single local instance; it is an in-memory technology, after all.
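For comparison, this is what an ordered index buys you for a "greater than" predicate inside one JVM: a NavigableMap keyed by the indexed field can jump to the first qualifying key instead of scanning every entry. It is a generic sketch, not how Hazelcast implements its indexes; the RangeIndex class and the "rec" ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Secondary index: indexed field value -> ids of the records holding that value.
class RangeIndex {
    private final NavigableMap<Integer, List<String>> byValue = new TreeMap<>();

    void add(String id, int value) {
        byValue.computeIfAbsent(value, v -> new ArrayList<>()).add(id);
    }

    // Ids of all records whose indexed value is strictly greater than threshold.
    // tailMap(threshold, false) is a view starting at the first key above threshold,
    // so the cost is one O(log n) lookup plus the size of the result.
    List<String> greaterThan(int threshold) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : byValue.tailMap(threshold, false).entrySet()) {
            ids.addAll(e.getValue());
        }
        return ids;
    }
}
```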
I switched to Mongo for the database, however left Hazelcast for shared runtime data - namely sessions. Once they improve query performance I'll switch back.

If you have alternatives to Hazelcast, maybe look at those first. We have it running in production and it is still quite buggy; just check out the open issues.
However, the integration with Spring, Hibernate etc. is quite nice and the setup is really easy :)

We use Hazelcast in our e-commerce application to make sure that our inventory is consistent.
We make extensive use of distributed locking to make sure inventory SKU items are modified atomically, because there are hundreds of nodes in our web application cluster operating concurrently on these items.
We also use a distributed map for caching, shared across all the nodes. Since scaling out nodes in Hazelcast is so simple and it utilizes all CPU cores, it gives an added advantage over Redis or any other caching framework.
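The per-SKU locking described above boils down to "lock the key, read-modify-write, unlock". The sketch below shows that pattern with local ReentrantLocks inside one JVM; in the answer's setup the lock would come from Hazelcast and be cluster-wide instead. The Inventory class and its methods are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class Inventory {
    private final Map<String, Integer> stock = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    void setStock(String sku, int qty) {
        stock.put(sku, qty);
    }

    int stockOf(String sku) {
        return stock.getOrDefault(sku, 0);
    }

    // Reserves qty units of one SKU atomically; returns false if there is not enough stock.
    boolean reserve(String sku, int qty) {
        ReentrantLock lock = locks.computeIfAbsent(sku, k -> new ReentrantLock());
        lock.lock(); // only this SKU is blocked; other SKUs proceed in parallel
        try {
            int available = stock.getOrDefault(sku, 0);
            if (available < qty) return false;
            stock.put(sku, available - qty);
            return true;
        } finally {
            lock.unlock(); // always release, even if the update throws
        }
    }
}
```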

We have been using Hazelcast for the last 3 years in our e-commerce application to make sure availability (supply & demand) is consistent, atomic, available & scalable.
We use IMap (distributed map) to cache the data, and EntryProcessor for read & write operations, which performs fast in-memory operations on the IMap without our having to worry about locks.
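The appeal of an entry processor is that the update function is applied atomically to one entry where it lives, so callers never take an explicit lock. As a local analogy only (this is not Hazelcast's IMap/EntryProcessor API), ConcurrentHashMap's compute methods give the same "atomic function on one entry" semantics inside a single JVM. The Availability and AvailabilityStore names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-SKU availability record; immutable, so each update replaces the value.
final class Availability {
    final int supply;
    final int demand;

    Availability(int supply, int demand) {
        this.supply = supply;
        this.demand = demand;
    }

    int available() {
        return supply - demand;
    }
}

class AvailabilityStore {
    private final ConcurrentHashMap<String, Availability> map = new ConcurrentHashMap<>();

    void addSupply(String sku, int qty) {
        // compute() runs the function atomically for this key: no explicit lock needed
        map.compute(sku, (k, cur) -> cur == null
                ? new Availability(qty, 0)
                : new Availability(cur.supply + qty, cur.demand));
    }

    // Records demand only if enough is available; returns true when the order is accepted.
    boolean order(String sku, int qty) {
        boolean[] accepted = {false}; // local to this call, so no race on the flag
        map.computeIfPresent(sku, (k, cur) -> {
            if (cur.available() < qty) return cur; // leave the entry unchanged
            accepted[0] = true;
            return new Availability(cur.supply, cur.demand + qty);
        });
        return accepted[0];
    }

    int available(String sku) {
        Availability a = map.get(sku);
        return a == null ? 0 : a.available();
    }
}
```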

cloud hosting vs. managed hosting [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
It seems the hype about cloud computing cannot be avoided, but the actual transition to that new platform is subject to many discussions...
From a theoretical viewpoint, the following can be said:
Cloud:
architectural change (you might not install anything you want)
learning curve (because of the above)
no failover (since failure is taken care of)
granular cost (pay per Ghz or Gbyte)
instantaneous scalability (not so instantaneous, but at least transparent?) -> lower latency
Managed:
failover (depends on provider)
manual scalability (requires maintenance)
static cost (you pay for the package, whether you use it fully or not)
lower cost (for entry-level packages only)
data ownership (you do)
liberty (you do) -> lower latency (depends on provider)
Whether or not the above is correct, a logical position is "it depends"... on the application itself.
Now comes the hidden question: how would you profile your J2EE app to determine whether or not it is a candidate for the cloud, knowing that it:
is quite a big app in terms of number of services/functions (i.e. servlets)
relies on a complex database (i.e. many tables)
doesn't need many media resources (it's mostly text-based)

"Now comes the hidden question: how would you profile your j2ee app to determine if it is a candidate to cloud or not; knowing that it is"
As an aside, make that the explicit question. Make it the TITLE of this question. Put it at the beginning of the question. If possible, delete all of your assumptions, and focus on the question.
Here's what we do.
Call some vendors for your "cloud" or "managed service" arrangement. Not too many. One or two of each.
Ask them what they support. More importantly, what they don't support.
Then, given a short list of features that aren't supported, look at your code for those features. If they don't support things you need, you have some architecture work to do. Or cross them off your preferred vendor list.
For good vendors, write a pilot contract that gives you free (or cheap) access for a few months to install and test. If it doesn't work, you haven't paid much.
"But why go through the expense of trying to host it when it may not work?"
What expense? You can spend months "studying" your code. Or you can try to host it. Usually, the "try to host it" will turn up an answer within a few days. It's less effort to just do it.

What sort of cloud service are you talking about? IaaS, PaaS, DaaS?
architectural change (you might not install anything you want)
Depends: moving from a "managed server" to a Platform (e.g. GAE) might be.
learning curve (because of the above)
Amazon EC2 might not be a big learning curve if you are used to running your own server
no failover (since failure is taken care of)
Depends: EC2 -> you have to roll your own
instantaneous scalability (not so instantaneous, but at least transparent?) -> lower latency
Depends: EC2 -> you have to plan for this / use an adjunct service

There are numerous cloud providers and as far as I've seen there are two main types of them:
Ones that support a cloud computing platform (e.g. Amazon EC2, MS Azure)
Virtual instance providers giving you the ability to create numerous running instances (e.g. RightScale)
As far as the platforms, be aware that relational database support is still quite poor and the learning curve is quite long.
As for virtual instance providers, the learning curve is really short (you just have to fire up your instances), but instances need some way of synchronizing... for a complex application this might not work.
As for your original question: I don't think there's any standard way to profile whether an application should/could be moved to the cloud. You probably need to familiarize yourself with the options, narrow them down to a few providers, and see whether the benefits you would get from them would be a significant win over managed hosting (which you're probably doing currently).

In some ways comparing google app engine (gae) and amazon ec2 is like comparing apples and oranges.
With ec2 you get an operating system, with or without installed server software (tomcat, database, etc.; your choice, depending on which ami you choose). With ec2 you need (or need to be) a system administrator to keep things running smoothly. Load balancing on ec2 is something you'll have to figure out and implement; I've never done that part. The big advantage of ec2 is that you can spin new instances up and down programmatically and, compared to a regular web server provider, only pay for the time your instance is up and running. You use this "auto spin up/down" to implement your load balancing and failover. But you have to do the implementation (again, I have no experience with this part).
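The "auto spin up/down" logic mentioned above usually starts as a simple rule that turns measured load into a desired instance count. The sketch below is a generic, hypothetical rule, not an EC2 API: size the fleet for the observed request rate given an assumed per-instance capacity, and clamp the result between a minimum (for failover) and a maximum (for cost). Actually launching or terminating instances would go through the provider's API, which is not shown.

```java
// Hypothetical threshold-based scaling rule; the capacity figure must be measured for your app.
class ScalingPolicy {
    final int minInstances;            // keep at least this many for failover
    final int maxInstances;            // cost ceiling
    final double capacityPerInstance;  // requests/sec one instance handles (assumed)

    ScalingPolicy(int minInstances, int maxInstances, double capacityPerInstance) {
        this.minInstances = minInstances;
        this.maxInstances = maxInstances;
        this.capacityPerInstance = capacityPerInstance;
    }

    // Instances needed for the observed load, clamped to [minInstances, maxInstances].
    int desiredInstances(double requestsPerSecond) {
        int needed = (int) Math.ceil(requestsPerSecond / capacityPerInstance);
        return Math.max(minInstances, Math.min(maxInstances, needed));
    }

    // Positive: instances to launch; negative: instances to terminate.
    int delta(int running, double requestsPerSecond) {
        return desiredInstances(requestsPerSecond) - running;
    }
}
```

For example, with a minimum of 2, a maximum of 10 and 50 requests/sec per instance, a load of 120 requests/sec asks for 3 instances.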
With google app engine (gae), all of the system administration is taken care of for you. It also automatically scales out as needed, both on the app side and the database side. You also only pay for what you use; an idle app that gets no hits incurs no costs. The downsides to gae are that you're restricted in the languages you can use; python and java (or things that run on the jvm, like jruby). An even bigger downside is that the database is not sql (they don't call it a database; they call it the datastore) and will likely require reworking your ddl if you have an existing database; it's a definite cost in programmer time to understand how it works and how to use it effectively.
So if you're starting from scratch (or willing to rewrite) and you have the resources and time to learn its ways, gae may be the way to go. If you have a sysadmin or sysadmin skills and have the time or know how to set up load balancing and failover then ec2 could be the way to go. If your app is going to be idle a lot then ec2 is expensive; last time I checked it was something like $70 a month to leave a small instance up.
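To put that idle cost in per-hour terms, using only the answer's own $70 figure and assuming a 30-day month of 720 hours, an always-on small instance costs roughly ten cents for every hour it is up, whether or not it serves any requests:

```latex
\frac{\$70\ \text{per month}}{30 \times 24\ \text{h}} = \frac{\$70}{720\ \text{h}} \approx \$0.097\ \text{per hour}
```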

I think the points you have to focus are:
granular cost (pay per Ghz or Gbyte)
instantaneous scalability (not so instantaneous, but at least transparent?) -> lower latency
Changing your application to run on a cloud will take a good amount of time, and it will not really be worth it if the cloud doesn't lower your costs and/or you don't really need instantaneous/fast scalability (the classic example being e-commerce apps).
After considering these two points, the one IMO you should think about is "relies on a complex database (ie. num. tables)", since, depending on its complexity, changing to a cloud environment can be really troublesome.

Ideas on this alternative to ORM + RDBMS? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I am currently developing a proof of concept for an alternative data store. The reason why is I need to enhance a read-mostly clustered webapp, but also because I want to free myself from the pain of the sometimes overly-complex ORM+RDBMS solution.
Overall, the idea is quite similar to a distributed cache with persistence (letting the cluster be the SoR, i.e. the system of record); however:
want to be able to retrieve any object along with its children, by id (providing class & id) [only that to start off, as the main querying part is already resolved with Lucene in my app].
need to have a map of maps of types (~ tables in the relational world), and therein distributed maps of 'dehydrated' stored objects (flattening the object graph via reflection deep cloning)
a bin log (like Prevayler, for example) for:
- eventual recovery if the whole cluster goes down
- development (and the ability to refactor code / change structure)
- perhaps asynchronous processing for other purposes (reporting, whatever)
eventually, later on, try to integrate a statically-typed query mechanism, like LINQ, Jaque or H2's JaQu / see ODBs / Lucene (?)
it has to be transaction-aware (not sure "JTA type" though)
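The bin-log item follows Prevayler's "system prevalence" idea: every change is expressed as a command object, appended to a journal before it is applied to the in-memory model, and the model is rebuilt after a crash by replaying the journal in order. Below is a minimal single-JVM sketch of that idea under stated assumptions: the journal is an in-memory list rather than a file, and the Model, Command, Put, Delete and JournaledStore names are hypothetical, not Prevayler's API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// The in-memory "system of record": here simply id -> name.
class Model {
    final Map<Long, String> objects = new HashMap<>();
}

// A deterministic change; replaying the same commands in order rebuilds the same state.
interface Command {
    void applyTo(Model m);
}

class Put implements Command {
    final long id;
    final String name;

    Put(long id, String name) {
        this.id = id;
        this.name = name;
    }

    public void applyTo(Model m) {
        m.objects.put(id, name);
    }
}

class Delete implements Command {
    final long id;

    Delete(long id) {
        this.id = id;
    }

    public void applyTo(Model m) {
        m.objects.remove(id);
    }
}

class JournaledStore {
    // Stand-in for an append-only file; a real journal would serialize and fsync each command.
    final List<Command> journal;
    final Model model = new Model();

    JournaledStore(List<Command> journal) {
        this.journal = journal;
        for (Command c : journal) c.applyTo(model); // recovery: replay in order
    }

    synchronized void execute(Command c) {
        journal.add(c);   // log first...
        c.applyTo(model); // ...then apply, so every applied change is in the log
    }
}
```

Replaying the journal is also what makes the "asynchronously processed for other purposes" idea possible: a reporting process can consume the same command stream.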
I'm planning to implement this idea with Hazelcast (I love its super-simple API) or Terracotta (which I never used - but I'm aware of their 'sweet spot', medium-term data). If you will, my aim is to do more or less what Jonas once blogged about. Using one of these, stored data would roughly have to fit in the sum of the JVM heaps of the cluster.
This should be pretty simple to scale, would avoid the relational impedance mismatch (ie save as with an ODB) and JDBC + I/O overhead.
Do you know of other tools/frameworks or combination thereof already providing similar functionality, that I'm ignoring?
Can you suggest other ways of tackling this 'getting rid of the DB'? What flaws do you already see in this idea?
Concurrency-wise would it make sense to consider Scala instead of Java?
How about non-relational data stores such as Couch DB, Neo4j, HyperTable, HBase?
A similar question was asked one month ago - but there was no concrete solution.
BTW I just stumbled upon the concept of Enterprise Data Fabric, which, to my surprise, describes a lot of these ideas.

Definitely give Terracotta a try. It's free (unless you go Enterprise which has an SLA and support). It is a JVM-level cluster, so to speak, so you don't have the issues associated with sessions on multiple boxes behind disparate JK workers (assuming you're using this for a J2EE app).
I'm just rambling, so have a look here: http://en.wikipedia.org/wiki/Terracotta_Cluster
UPDATE: there are numerous bits of info on Terracotta on the web too, e.g. http://blog.terracottatech.com/2007/12/fud_of_the_week_terracotta_doe.html
UPDATE2: A bit of background on my views: I work for a company with a fairly big audience. We have an enterprise MySQL running with a master and about 5 slaves (times 2, considering we have 2 channels, with 4 app servers per channel), using MySQL's JDBC Replication driver (for which we've already submitted various patches). We use Spring 2.5/Hibernate 3 with Spring's declarative JTA transaction management, so read-only transactions go to the slaves. With the advent of numerous Ajax enhancements in a future version of our site, our DB servers' load has gone up: we create pricing summaries for thousands of products for all countries, taking into account duty/tax rules for all these countries (plus promotions and real-time auctions running all the time), and the Ajax services then have the latest prices in a blink. Terracotta takes the load off the DB and app servers by making these prices available to all app servers at the JVM layer, with all the JVMs across the boxes linked. So server A can update the prices every few minutes, and if an Ajax request hits server B, the prices are available immediately. I know there are people/companies out there with similar businesses who probably have better ideas and implementations, so I'm always open to discussion, but this is my two cents.
I get inspiration from the guys at Facebook too, for instance this very informative article:
http://www.facebook.com/note.php?note_id=23844338919
They talk about memcached which you should also definitely check out.
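memcached is typically used in a cache-aside (look-aside) fashion: read from the cache, fall back to the database on a miss, then populate the cache; when the underlying row changes, invalidate the cached copy. The sketch below shows that flow in a single JVM, with a ConcurrentHashMap standing in for memcached and a Function standing in for the database query. The PriceCache name and the loader are hypothetical, and expiry is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class PriceCache {
    private final Map<String, Double> cache = new ConcurrentHashMap<>(); // stand-in for memcached
    private final Function<String, Double> loadFromDb;                   // stand-in for the DB query
    final AtomicInteger dbHits = new AtomicInteger();                    // how often we fell through

    PriceCache(Function<String, Double> loadFromDb) {
        this.loadFromDb = loadFromDb;
    }

    double price(String productId) {
        Double cached = cache.get(productId);
        if (cached != null) return cached;          // hit: no database round trip
        dbHits.incrementAndGet();
        double fresh = loadFromDb.apply(productId); // miss: query the database (must not be null)...
        cache.put(productId, fresh);                // ...and populate the cache
        return fresh;
    }

    // After the database row changes, drop the stale copy so the next read reloads it.
    void invalidate(String productId) {
        cache.remove(productId);
    }
}
```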

As Neo4j is mentioned in the question, I'm chiming in with a few thoughts on using a graph database in this case. (I'm part of the Neo4j team)
retrieving children is trivial in a graph db
there is a map implementation for neo4j
as graphs are native to a graph db, you could consider not flattening the object graph, but persisting data in nodes and edges/relationships (this gives you more flexibility in handling the data)
neo4j is fully transactional
With the new DB technologies emerging today, there's really no need to stay with a RDBMS if your data isn't a good fit for the relational paradigm.

Seems to me Terracotta is a perfect fit for your requirements:
cluster a map to retrieve children via keys (e.g. clustered Map)
map of maps - no problem
no explicit bin log - but Terracotta already persists everything to disk so full cluster restart is already supported
integrated already to Compass, Hibernate Search, and Lucene for search
Transactions? Too slow. Use the cache as a datastore. With persistence you won't lose data writing to (clustered) memory and trickle back to the DB.
In addition, Terracotta does the "reflection" thing you ask for - although it doesn't use reflection as that is far too slow. It uses BCM. Only changes are propagated on the network.
Hazelcast btw requires serialization so it will be slow and will not do well at all with a map of maps data structure (every put will result in a full deep clone copy across the network) and it doesn't have any kind of persistence built in.

Interesting.
I have a view that we all develop a zoo which comprises all the abstraction layers we habitually use in our projects. And each abstraction layer is a completely different animal.
My goal is to minimize the amount of time spent on just care and feeding of the animals whenever it diverts me from solving the problem at hand - it's overhead - wasted resources. So the fewer, simpler abstraction layers we can get away with, the more productive we are.
I can usually do just fine with two beasties, OOP and RDBMS, coupled through a nice, simple, minimal, hand-crafted DAL. For me, ORM is mostly overhead: one abstraction too many, and a pretty hungry one.
Don't discount the option of treating stored procedures as an abstraction tool, either. If you're really comfortable with SQL, they can be a useful resource for implementing a lightweight BL facade, which means not needing to think about the ORM problem.
And this post suggests the emergence of alternatives to RDBMS for some requirements, anyway.

Thanks for your answers.
Actually, you talk about DBs which is something I want to completely take out of the picture.
The use case I'm targeting is a startup's small/medium-sized clustered webapp (boxes in a LAN, or in the cloud). It needs to retrieve objects at ~RAM-speed levels and scale fairly easily. As a side effect, one wouldn't have to think about DB server installations, impedance mismatch, JDBC, caches, polluting domain models with annotations, etc.
Again, what I want to accomplish is something like described here, and I would love to have some more feedback on ideas concerning the actual implementation (why use Terracotta instead of Hazelcast, use serialization or deep cloning via reflection or whatever else, and also the major drawbacks of an approach like this - eg. why wouldn't you change it for your current ORM/DB setup).
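To make the "deep cloning via reflection" option concrete: dehydrating an object means walking its fields and copying them into plain maps, recursing into nested objects, so the result can be stored in a distributed map independently of the class. This is a simplified sketch under stated assumptions: the Dehydrator, Order and Customer names are hypothetical, only fields declared on each object's own class are read (superclass fields are ignored), and collections and cyclic graphs are not handled. Java serialization would be the built-in alternative.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

class Dehydrator {
    // Converts an object graph into nested maps of field name -> value.
    static Map<String, Object> dehydrate(Object obj) throws IllegalAccessException {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("@class", obj.getClass().getName()); // needed to rehydrate later
        for (Field f : obj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue; // skip class-level state
            f.setAccessible(true);
            Object v = f.get(obj); // primitives come back boxed
            out.put(f.getName(), isSimple(v) ? v : dehydrate(v)); // recurse into children
        }
        return out;
    }

    // Values stored as-is; anything else is treated as a nested object to flatten.
    private static boolean isSimple(Object v) {
        return v == null || v instanceof String || v instanceof Number
            || v instanceof Boolean || v instanceof Character || v instanceof Enum<?>;
    }
}

// Hypothetical domain classes used to exercise the sketch.
class Customer {
    String name = "Ada";
}

class Order {
    long id = 42;
    Customer customer = new Customer();
}
```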
It has to be super simple to integrate, so it'll feature a really neat Java API, improving code readability. No other software (DB, memcached) will be required.

Try GigaSpaces. I think they have exactly what you require, and if I'm not mistaken there's a free version for startups.
Some concepts:
"Space" is some place where you can store and retrieve objects
Space can be backed by any JDBC-compliant DB, automatically (no code, only configuration)
Space can be started in your java process, so all accesses are at RAM speed
Space can be clustered/partitioned in any way you want (full mirror, partial, grid).
Space supports distributed or local transactions
Check their wiki (but only the "programmer's guide"; all the rest is marketing BS).
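For the partitioned ("grid") topology in the list above, the core idea is that each object carries a routing key and is sent to exactly one partition by hashing that key, so reads and writes for the same key always land on the same node. A hedged sketch of that routing rule follows; the PartitionRouter name is hypothetical, and real products layer replicas, failover and rebalancing on top of it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionRouter {
    private final List<Map<String, String>> partitions = new ArrayList<>();

    PartitionRouter(int count) {
        for (int i = 0; i < count; i++) partitions.add(new HashMap<>());
    }

    // Same key -> same partition; floorMod keeps the index non-negative for negative hash codes.
    int partitionFor(String routingKey) {
        return Math.floorMod(routingKey.hashCode(), partitions.size());
    }

    void write(String routingKey, String value) {
        partitions.get(partitionFor(routingKey)).put(routingKey, value);
    }

    String read(String routingKey) {
        return partitions.get(partitionFor(routingKey)).get(routingKey);
    }

    int sizeOf(int partition) {
        return partitions.get(partition).size();
    }
}
```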

What would you recommend for a large-scale Java data grid technology: Terracotta, GigaSpaces, Coherence, etc? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I've been reading up on so-called "data grid" solutions for the Java platform, including Terracotta, GigaSpaces and Coherence. I was wondering if anyone has real-world experience working with any of these tools and could share it. I'm also really curious to know what scale of deployment people have worked with: are we talking 2-4 node clusters, or have you worked with anything significantly larger than that?
I'm attracted to Terracotta because of its "drop in" support for Hibernate and Spring, both of which we use heavily. I also like the idea of how it decorates bytecode based on configuration and doesn't require you to program against a "grid API." I'm not aware of any advantages to tools which use the approach of an explicit API but would love to hear about them if they do in fact exist. :)
I've also spent time reading about memcached but am more interested in hearing feedback on these three specific solutions. I would be curious to hear how they measure up against memcached in the event someone has used both.

You may want to check out Hazelcast also. Hazelcast is an open source transactional, distributed/partitioned implementation of queue, topic, map, set, list, lock and executor service. It is super easy to work with; just add hazelcast.jar into your classpath and start coding. Almost no configuration is required.
Hazelcast is released under Apache license and enterprise grade support is also available. Code is hosted at Google Code.

We had 50 servers running a web service application, all load-balanced using BIG-IP. The requirement was to cache each user's state so that subsequent states don't redo the same processing but get the data from the previous state. This way, the client of the web service doesn't need to maintain state.
We used Terracotta to cache the states and never faced any performance issues. At peak times, the application receives 100 requests per second.

The library you choose really depends on your application and what you're trying to achieve.
I worked for a shop that used Coherence to provide scalability (and redundancy, sort of) for its web applications. We found that you need around 4-5 nodes to start getting any benefit from Coherence (2 or 3 nodes can potentially reduce performance). I believe Oracle's docs say you need lots of nodes (30+) to really benefit from Coherence. If you do go with Coherence, make sure your hardware is set up properly; it is very sensitive to latency.
I personally would stay away from "drop-in" stuff. It might give you something to start with, but you'll eventually run into synchronization or performance problems and will have to start writing code specific to your grid layer anyway. Basically, you know your app better than the library does, and you will be able to figure out which items need to be in the cache, how long they need to live, how your app will be used, etc.

I don't have enough experience with these technologies, but I think Apache Hadoop has proven to be scalable and reliable. Yahoo ran it on a 10,000-core Linux cluster.
It's based on Google's MapReduce algorithm.
This article describes MapReduce and why you should care about it.
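The MapReduce model behind Hadoop can be illustrated without Hadoop itself: a map phase turns each input record into key/value pairs, the framework groups the pairs by key (the shuffle), and a reduce phase folds each group into a result. The single-JVM word count below is a sketch of that model only; it does not use Hadoop's Mapper/Reducer API, and the class name is hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountMapReduce {
    // Map phase: one input line -> (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String w : line.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) pairs.add(new SimpleEntry<>(w, 1)); // split can yield a leading ""
        }
        return pairs;
    }

    // Reduce phase: all values emitted for one key -> a single count.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) sum += c;
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle: group every emitted value by its key (a TreeMap keeps output sorted).
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> p : map(line)) {
                groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }
}
```

In Hadoop, the map and reduce calls run on different machines and the shuffle moves data across the network, which is where the scalability comes from; the structure of the computation stays the same.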
