Different scenarios on distributed processing [closed] - java

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
Improve this question
I have a web application - a simple web application archive file - that having several storage adapters for different storage types ie. MongoDB and CouchDB. By using this application I can store/query data to those databases with the web services I have written. Currently I can only have one single database instance per application, cannot have more than one which prevents me having parallel processing.
What I want is to run my application on several machines. And above of those, I want to write a UI enabling the Client to store/query the data without knowing the database types/addresses.
I have two different scenarios and wanted to ask you which one of them is a better way to do it and why.
1) Let's say I have three servers running three single databases - couchdb. I can upload my application to those servers and then with the help of my UI or a layer above my application I can define a map of servers so that I can store and query the data.
As you see above, database and application lies in the same server, so they are remote.
2) Let's say three servers are still running remotely but in this case my application is local. And I enabled it to accept several database instances.
I actually prefer the first one since in that case I won't need to extend my application but I wanted to hear what you think about it. I will be glad if you can provide some sources for that kind of distributed scenarios - I had no experience at all on that kind of stuff.

Please take a look on article, which is describing of Instagram architecture. It's quite interesting to know how 3 engineers handled 15-25 millions of user with 150 millions of photos per day.
Also I would recommend interesting blog, which describes different scalability solutions for popular web-resources:
Facebook: An Example Canonical Architecture for Scaling Billions of Messages
Tumblr Architecture - 15 Billion Page Views a Month and Harder to Scale than Twitter
There are lots of information.
But the most common things are:
keep everything as easy as it's possible
use well-known, mature technologies, which have good support of the community
scale each part of application separately
And even though you may find explanations of each these, I'd like to focus on the last one according to your requirements.
When you want to make your application horizontally scalable, you need to consider each of clusters as separate logical module, regardless on actual number of servers, involved into cluster. F.e. for your web-application you can setup several instances of that application and set a load-balancer before them. Thus users can access single entry point (e.g. http://mysite.com), meanwhile actual instance may be arbitrary.
If you need to collaborate instances between each other, then you need to avoid in-memory storage, but use "key-value" storages, such as Redis, along with Messages Brokers, such as ActiveMQ, RabbitMQ or cloud version Iron.IO etc.
Datastorage you also need to consider as single entry point, e.g. sharded cluster (f.e. MongoDB supports out-of-box auto-sharding, and most of NoSQL solutions also have it - CouchDB, HBase).
So basically you call some shard-controller, which according specific shard-key redirects to corresponding instance. But please note, that usually sharding might be quite non-trivial thing, therefore in most cases when you deal with RDBMS you need to use vertical scalability.
Considering everything above I would suggest you such structure:
For sure ideally all the servers must be near each other physically (f.e. in the same data-centre). But if you are going to use your application as World-wide, then you need to shard your instances according less latency. Here is quite interesting lecture about server's configuration (even though it's about MongoDb, I believe some approaches might be helpful in your case as well): https://www.youtube.com/watch?v=TZOH92mZIN8
But if do not need to use all your servers for distributed "map/reduce" computing, and for getting result you need only one particular server's instance, in that case I believe scenario #1 is fairly suitable and better for your needs (in case if you setup load-balancer before your instances).

Related

Spring boot API - How to scale an App as users grow [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
I am making an API with Spring boot which will maybe have requests peaks.
So let's get into the worst scenario. Imagine that suddenly I have 2 million API requests.
What can I do to be able to handle this?
I read something about queues and workers but I don't know if that's the best way.
Is there some way to do it with something with AWS?
This is always a tricky question to answer. Firstly does your app really need to scale to 2 million API requests at peak? I ask because it is easy to over-engineer a solution 'to deal with future scale' which ends up a bit buggy and not even dealing with current scale very well as a result.
But assuming you really will have massive request peaks, the current microservice approach (or buzzword?) is quite a popular way of dealing with these periods of high demand. Basically you split your application up into smaller, self contained services ('microservices') which can be more-easily scaled as required.
Individual microservices can then be scaled up and down to match the load, using something like Kubernetes or Amazon ECS.
In terms of where Spring ties into this, Spring has a handy set of technologies called Spring Cloud - you'll notice Spring Cloud AWS there, too (although Spring Cloud in general can also work fine on bare metal servers, Docker, Kubernetes etc etc). A year ago I put together a simple Spring cloud/microservices demo on Github which shows how different Spring-powered microservices can fit together, which you may find useful.
The other nice thing with microservices is that you can swap out the language that a particular service is written in fairly easily, especially if the microservices 'speak' to each-other in a general format (such as JSON via REST requests). So if you had 10 different microservices, all powered by Spring Boot, but you find that a couple of them are better written in another language, you can just rewrite them: as long as they send and receive data in the same way, then the other parts of your system don't need to care.
Okay, that's a lot of buzzwords and new concepts. Feel free to ask questions if I can clarify anything, but microservices + kubernetes/AWS is a popular solution.
Other approaches which will probably work equally well, however, are:
Serverless - this is where you use a bunch of cloud-provider's tools and solutions to power your webapp, including things like lambdas, instead of hosting your application on a traditional server/VM. This medium tutorial gives a simple introduction to serverless.
Monoliths - this is a traditional web application which is a single, large, sprawling codebase, but this doesn't mean you can only have one instance of it running (i.e. you can still scale it). This very site is a successfully scalable monolith.
Your question is very broad, as there are lots of different solutions:
1) Use a load balancer and have multiple instances of your application
2) Use a containerization tool like docker and kubernetes to increase the amount of instances depending on current load. You can essentially scale on demand
3) We don't know what your app actually does: is it read heavy, is it write heavy? Will users be downloading content? The answers to this question can change whether or not a specific solution is feasible
4) You can maybe use a messenger queue like RabbitMQ to assist with distributing load across different services. You can have multiple services reading from this queue and performing actions at the same time...but again, this depends on what your app will actually be doing.
Check out AWS EC2 and Elastic Beanstalk. You can also get a simple load balancer up and running with nginx. Good luck

How to build a distributed java application?

First of all, I have a conceptual question, Does the word "distributed" only mean that the application is run on multiple machines? or there are other ways where an application can be considered distributed (for example if there are many independent modules interacting togehter but on the same machine, is this distributed?).
Second, I want to build a system which executes four types of tasks, there will be multiple customers and each one will have many tasks of each type to be run periodically. For example: customer1 will have task_type1 today , task_type2 after two days and so on, there might be customer2 who has task_type1 to be executed at the same time like customer1's task_type1. i.e. there is a need for concurrency. Configuration for executing the tasks will be stored in DB and the outcomes of these tasks are going to be stored in DB as well. the customers will use the system from a web browser (html pages) to interact with system (basically, configure tasks and see the outcomes).
I thought about using a rest webservice (using JAX-RS) where the html pages would communicate with and on the backend use threads for concurrent execution.
Questions:
This sounds simple, But am I going in the right direction? or i should be using other technologies or concepts like Java Beans for example?
2.If my approach is fine, do i need to use a scripting language like JSP or i can submit html forms directly to the rest urls and get the result (using JSON for example)?
If I want to make the application distributed, is it possible with my idea? If not what would i need to use?
Sorry for having many questions , but I am really confused about this.
I just want to add one point to the already posted answers. Please take my remarks with a grain of salt, since all the web applications I have ever built have run on one server only (aside from applications deployed to Heroku, which may "distribute" your application for you).
If you feel that you may need to distribute your application for scalability, the first thing you should think about is not web services and multithreading and message queues and Enterprise JavaBeans and...
The first thing to think about is your application domain itself and what the application will be doing. Where will the CPU-intensive parts be? What dependencies are there between those parts? Do the parts of the system naturally break down into parallel processes? If not, can you redesign the system to make it so? IMPORTANT: what data needs to be shared between threads/processes (whether they are running on the same or different machines)?
The ideal situation is where each parallel thread/process/server can get its own chunk of data and work on it without any need for sharing. Even better is if certain parts of the system can be made stateless -- stateless code is infinitely parallelizable (easily and naturally). The more frequent and fine-grained data sharing between parallel processes is, the less scalable the application will be. In extreme cases, you may not even get any performance increase from distributing the application. (You can see this with multithreaded code -- if your threads constantly contend for the same lock(s), your program may even be slower with multiple threads+CPUs than with one thread+CPU.)
The conceptual breakdown of the work to be done is more important than what tools or techniques you actually use to distribute the application. If your conceptual breakdown is good, it will be much easier to distribute the application later if you start with just one server.
The term "distributed application" means that parts of the application system will execute on different computational nodes (which may be different CPU/cores on different machines or among multiple CPU/cores on the same machine).
There are many different technological solutions to the question of how the system could be constructed. Since you were asking about Java technologies, you could, for example, build the web application using Google's Web Toolkit, which will give you a rich browser based client user experience. For the server deployed parts of your system, you could start out using simple servlets running in a servlet container such as Tomcat. Your servlets will be called from the browser using HTTP based remote procedure calls.
Later if you run into scalability problems you can start to migrate parts of the business logic to EJB3 components that themselves can ultimately deployed on many computational nodes within the context of an application server, like Glassfish, for example. I don think you don't need to tackle this problem until you run it to it. It is hard to say whether you will without know more about the nature of the tasks the customer will be performing.
To answer your first question - you could get the form to submit directly to the rest urls. Obviously it depends exactly on your requirements.
As #AlexD mentioned in the comments above, you don't always need to distribute an application, however if you wish to do so, you should probably consider looking at JMS, which is a messaging API, which can allow you to run almost any number of worker application machines, readying messages from the message queue and processing them.
If you wanted to produce a dynamically distributed application, to run on say, multiple low-resourced VMs (such as Amazon EC2 Micro instances) or physical hardware, that can be added and removed at will to cope with demand, then you might wish to consider integrating it with Project Shoal, which is a Java framework that allows for clustering of application nodes, and having them appear/disappear at any time. Project Shoal uses JXTA and JGroups as the underlying communication protocol.
Another route could be to distribute your application using EJBs running on an application server.

RealWorld HazelCast [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
Does anyone have any real world experience with Hazelcast distributed data grid and execution product? How has it worked for you? It has an astonishingly simple API and functionality that seems almost to good to be true for such a simple to use tool. I have done some very simple apps and it seems to work as advertised so far. So here I am looking for the real world 'reality check'. Thank you.
We've been using it in production since version 1.8+, using mainly the distributed locking feature. It works great, we've found a couple of workarounds/bugs, but those were fixed relatively fast.
With 1.8M locks per day we found no problems so far.
I recommend start using version 1.9.4.4.
There are still some issues still with its development,
http://code.google.com/p/hazelcast/issues/list
Generally, you can choose to either let it use its own multicast algorithm or specify your own ip's. We've tried it in a LAN environment and it works pretty well. Performance wise it's not bad but the monitoring tool didn't work very well as it failed to update most of the time. If you can live with the current issues then by all mean go for it. I would use it with caution but it's a great working tool IMHO.
Update:
We've been using Hazelcast for a few months now and it's working very well. The settings are relatively easy to set up and with the new updates, are comprehensive enough to customize even small things like the number of threads allowed in read/write operations.
We are using Hazelcast (1.9.4.6 now) in production integrated with a complicated transactional service. It was added to alleviate immediate database throughput issues. We have discovered that we frequently have to stop it bringing down all transaction services for at least an hour. We are running clients in superclient mode because it is the only option that even remotely meets our performance requirements (about 4 times faster than native clients.) Unfortunately stopping a superclient node causes split brain issues and causes the grid to lose records, forcing a complete shutdown of services. We have been trying to make this product work for us for almost a full year now, and even paid to have 2 hazelcast reps flown in to help. They were unable to produce a solution, but were able to let us know that we were probably doing it wrong. In their opinion it should work better but it was pretty much a wasted trip.
At this point we are on the hook for over 6 figures per year in licensing fees and we are currently using about 5 times the resources to keep the grid alive and meet our performance needs than we would be using with a clustered and optimized database stack. This was absolutely the wrong decision for us.
This product is killing us off. Use with caution, sparingly, and only for simple services.
If my own company and projects count as real world, here's my experience. I wanted to get as close to eliminating external (disk) storage in favor of limitless and persistent "RAM". For starters that eliminates CRUD plumbing which sometimes makes up to 90% of the so-called "middle tier". There are other benefits. Since RAM is your "database" you don't need any complex caches or HTTP session replication (which in turn eliminates ugly sticky session technique).
I believe RAM is the future and Hazelcast has everything to be an in-memory database: queries, transactions, etc. So I wrote a mini-framework abstracting it: to load data from the persistent storage (I can plugin anything that can store BLOBs - the fastest turned out to be MySQL). It is too long to explain why I didn't like Hazelcast's built-in persistence support. It's rather generic and rudimentary. They should remove it. It is not rocket science to implement your own distributed and optimized write-behind and write-through. Took me a week.
Everything was fine until I started performance-testing. Queries are slow - after all of the optimizations I did: indexes, Portable serialization, explicit comparators, etc. A simple "greater than" query on an indexed field takes 30 seconds on the set of 60K of 1K records (map entries). I believe Hazelcast team did everything they could. As much as I hate to say it, Java collections are still slow compared to super-optimized C++ code normal databases use. There are some open-source Java projects that address that. However at this time query persistence is unacceptable. It should be instant on a single local instance. It is an in-memory technology after all.
I switched to Mongo for the database, however left Hazelcast for shared runtime data - namely sessions. Once they improve query performance I'll switch back.
If you have alternatives to hazelcast maybe look at these first. We have it in running production mode and it is still quite buggy, just check out the open issues.
However, the integration with Spring, Hibernate etc. is quite nice and the setup is really easy :)
We use Hazelcast in our e-commerce application to make sure that our inventory is consistent.
We use extensive use of distributed locking to make sure SKU Items of inventory are modified in atomic way because there are hundred of nodes in our web application cluster that operates concurrently on these items.
Also, we use distributed map for caching purpose which are shared across all the nodes. Since scaling node in Hazelcast is so simple and it utilises all its CPU core, it gives added advantage over redis or any other caching framework.
We are using Hazelcast from last 3 years in our e-commerce application to make sure availability (supply & demand) is consistent, atomic, available & scalable.
We are using IMap (distributed map) to cache the data and Entry Processor for read & write operations to do fast in-memory operations on IMap without you having to worry about locks.

cloud hosting vs. managed hosting [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
Improve this question
It seems the hype about cloud computing cannot be avoided, but the actual transition to that new platform is subject to many discussions...
From a Theoretical viewpoint, the following can be said:
Cloud:
architectural change (you might not install anything you want)
learning curve (because of the above)
no failover (since failure is taken care of)
granular cost (pay per Ghz or Gbyte)
instantaneous scalability (not so instantaneous, but at least transparent?) ? lower latency
Managed:
failover (depends on provider)
manual scalability (requires maintenance)
static cost (you pay the package , whether you use it fully or not)
lower cost (for entry- packages only)
data ownership ( you do )
liberty ( you do ) ? lower latency ( depends on provider)
Assuming the above is correct or not; Nevertheless, a logical position is "it depends.." .. on the application itself.
Now comes the hidden question: how would you profile your j2ee app to determine if it is a candidate to cloud or not; knowing that it is
a quite big app in number of services/ functions (i.e.; servlets)
relies on a complex database (ie. num. tables)
doesn't need much media resources, mostly text based
"Now comes the hidden question: how would you profile your j2ee app to determine if it is a candidate to cloud or not; knowing that it is"
As an aside, make that the Explicit question. Make it the TITLE of this question. Put it in the beginning of the question. If possible, delete all of your assumptions, and focus on the question.
Here's what we do.
Call some vendors for your "cloud" or "managed service" arrangement. Not too many. One or two of each.
Ask them what they support. More importantly, what they don't support.
Then, given a short list of features that aren't supported, look at your code for those features. If they don't support things you need, you have some architecture work to do. Or cross them off your preferred vendor list.
For good vendors, write a pilot contract that gives you free (or cheap) access for a few months to install and test. If it doesn't work, you haven't paid much.
"But why go through the expense of trying to host it when it may not work?"
What expense? You can spend months "studying" your code. Or you can try to host it. Usually, the "try to host it" will turn up an answer within a few days. It's less effort to just do it.
What sort of Cloud Service are you talking about? IaaS, PaaS, DaaS ?
architectural change (you might not install anything you want)
Depends: moving from a "managed server" to a Platform (e.g. GAE) might be.
learning curve (because of the above)
Amazon EC2 might not be a big learning curve if you are used to running your own server
no failover (since failure is taken care of)
Depends: EC2 -> you have to roll your own
instantaneous scalability (not so instantaneous, but at least transparent?) ? lower latency
Depends: EC2 -> you have to plan for this / use an adjunct service
There are numerous cloud providers and as far as I've seen there are two main types of them:
Ones that support a cloud computing platform (e.g. Amazon E2C, MS Azure)
Virtual instance providers providing you ability to create numerous running instances (e.g. RightScale)
As far as the platforms, be aware that relational database support is still quite poor and the learning curve is quite long.
As for virtual instance providers the learning curve is really little (you just have to fire up your instances), but instances need some way of synchronizing... for a complex application this might not work.
As for your original question: I don't think there's any standard way you could profile wether an application should / could be moved to the cloud. You probably need to familiarize yourself with the options, narrow down to a few providers and see if the benefits you would get to them would be of any significant win over managed hosting (which you're probably currently doing).
In some ways comparing google app engine (gae) and amazon ec2 is like comparing apples and oranges.
With ec2 you get an operating system, with or without installed server software (tomcat, database, etc.; your choice, depending on which ami you choose). With ec2 you need a (or be a) system administrator to keep things running smoothly. Load balancing on ec2 is something you'll have to figure out and implement; I've never done that part. The big advantage with ec2 is that you can spin up and down new instances programatically, and compared to a regular web server provider, only pay for when your instance is up and running. You use this "auto spin up/down" to implement your load balancing and failover. But you have to do the implementation (again, I have no experience with this part).
With google app engine (gae), all of the system administration is taken care of for you. It also automatically scales out as needed, both on the app side and the database side. You also only pay for what you use; an idle app that gets no hits incurs no costs. The downsides to gae are that you're restricted in the languages you can use; python and java (or things that run on the jvm, like jruby). An even bigger downside is that the database is not sql (they don't call it a database; they call it the datastore) and will likely require reworking your ddl if you have an existing database; it's a definite cost in programmer time to understand how it works and how to use it effectively.
So if you're starting from scratch (or willing to rewrite) and you have the resources and time to learn its ways, gae may be the way to go. If you have a sysadmin or sysadmin skills and have the time or know how to set up load balancing and failover then ec2 could be the way to go. If your app is going to be idle a lot then ec2 is expensive; last time I checked it was something like $70 a month to leave a small instance up.
I think the points you have to focus are:
granular cost (pay per Ghz or Gbyte)
instantaneous scalability (not so instantaneous, but at least transparent?) ? lower latency
Changing your application to run on a cloud will take a good amount of time, but it will not really matter if the cloud don't lower your costs and/or you don't really need instantaneous/fast scalability (the classic example are eCommerce app)
After considering these 2 points. The one IMO you should think about is relies on a complex database (ie. num. tables), since depending on its "conplexity", changing to a cloud environment can be really troublesome.

Ideas on this alternative to ORM + RDBMS? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I am currently developing a proof of concept for an alternative data store. The reason why is I need to enhance a read-mostly clustered webapp, but also because I want to free myself from the pain of the sometimes overly-complex ORM+RDBMS solution.
Overall the idea is quite similar to a distributed cache with persistence (letting the cluster be the SoR), however:
want to be able to retrieve any object along with its children, by
id (providing class & id) [only that to start off, as the main
querying part is already resolved with lucene in my app].
need to have map of maps of types ( ~ tables in the relational
world), and therein distributed maps of 'dehydrated' stored objects (flattening the object graph via reflection deep cloning)
a bin log (like Prevayler, for example) for
eventual recovery if whole cluster goes down
development (and ability to refactor code / change structure)
perhaps asynchronously processed for other purposes (reporting, whatever)
eventually later on try to integrate a statically-typed query mechanism, like LINQ, Jaque or H2's JaQu / see ODBs / Lucene (?)
it has to be transaction-aware (not sure "JTA type" though)
I'm planning to implement this idea with Hazelcast (I love its super-simple API) or Terracotta (which I never used - but I'm aware of their 'sweet spot', medium-term data). If you will, my aim is to do more or less what Jonas once blogged about. Using one of these, stored data would roughly have to fit in the sum of the JVM heaps of the cluster.
This should be pretty simple to scale, would avoid the relational impedance mismatch (ie save as with an ODB) and JDBC + I/O overhead.
Do you know of other tools/frameworks or combination thereof already providing similar functionality, that I'm ignoring?
Can you suggest other ways of tackling this 'getting rid of the DB'? What flaws do you already see in this idea?
Concurrency-wise would it make sense to consider Scala instead of Java?
How about non-relational data stores such as Couch DB, Neo4j, HyperTable, HBase?
A similar question was asked one month ago - but there was no concrete solution.
BTW I just stumbled upon the concept of Enterprise Data Fabric, which, to my surprise, describes a lot of these ideas.
Definitely give Terracotta a try. It's free (unless you go Enterprise which has an SLA and support). It is a JVM-level cluster, so to speak, so you don't have the issues associated with sessions on multiple boxes behind disparate JK workers (assuming you're using this for a J2EE app).
I'm just rambling, so have a look here: http://en.wikipedia.org/wiki/Terracotta_Cluster
UPDATE numerous bits of info on Terracotta on the web too, e.g. http://blog.terracottatech.com/2007/12/fud_of_the_week_terracotta_doe.html
UPDATE2 Bit of background on my views: I work for a company with a fairly big audience. We have a enterprise MySQL running with a master and about 5 slaves (times 2 considering we have 2 channels, with 4 app servers per channel), using MySQL's JDBC Replication driver (for which we've already submitted various patches). We use Spring2.5/Hibernate3 using Spring's declarative JTA transaction management, so read-onlies go to the slaves. With the advent of numerous Ajax enhancements on a future version of our site, our DB servers' load has gone up - we create pricing summaries for thousands of products for all countries, taking into account duties/tax rules for all these countries (plus promotions and real-time auctions running all the time), then the Ajax services have the latest prices in a blink. Terracotta takes the load off the DB and app servers by making these prices available to all app servers on a JVM-layer, with all the JVMs across the boxes linked. So, server A can update the prices every few minutes, and if Ajax hits server B, the prices are available immediately. I know there are people/companies out there with similar businesses, who probably have better ideas and implementations, so I'm always open for discussion, but this is my two cents.
I get inspiration from the guys at Facebook too, for instance this very informative article:
http://www.facebook.com/note.php?note_id=23844338919
They talk about memcached which you should also definitely check out.
As Neo4j is mentioned in the question, I'm chiming in with a few thoughts on using a graph database in this case. (I'm part of the Neo4j team)
retrieving children is trivial in a
graph db
there is a map implementation
for neo4j
as graphs are native to a graph db
you could consider not to flatten the
object graph, but to persist data in nodes
and edges/relationships (this gives you
more flexibility in handling the data)
neo4j is fully transactional
With the new DB technologies emerging today, there's really no need to stay with a RDBMS if your data isn't a good fit for the relational paradigm.
Seems to me Terracotta is a perfect fit for your requirements:
cluster a map to retrieve children
via keys (e.g. clustered Map)
map of maps - no problem
no explicit bin log - but Terracotta already persists everything to disk so full cluster restart is already supported
integrated already to Compass, Hibernate Search, and Lucene for search
Transactions? Too slow. Use the cache as a datastore. With persistence you won't lose data writing to (clustered) memory and trickle back to the DB.
In addition, Terracotta does the "reflection" thing you ask for - although it doesn't use reflection as that is far too slow. It uses BCM. Only changes are propagated on the network.
Hazelcast btw requires serialization so it will be slow and will not do well at all with a map of maps data structure (every put will result in a full deep clone copy across the network) and it doesn't have any kind of persistence built in.
Interesting.
I have a view that we all develop a zoo which comprises all the abstraction layers we habitually use in our projects. And each abstraction layer is a completely different animal.
My goal is to minimize the amount of time spent on just care and feeding of the animals whenever it diverts me from solving the problem at hand - it's overhead - wasted resources. So the fewer, simpler abstraction layers we can get away with, the more productive we are.
I can usually do just fine with two beasties - OOP and RDBMS, coupled through nice, simple, minimal, hand-crafted DAL. For me, ORM is mostly overhead - one abstraction too many, and a pretty hungry one.
Don't discount the option of treating stored procedures as an abstraction tool, either. If you're real comfortable with SQL, it can be a useful resource for implementing a light-weight BL facade that means not needing to think about the ORM problem.
And this post suggests the emergence of alternatives to RDBMS for some requirements, anyway.
Thanks for your answers.
Actually, you talk about DBs which is something I want to completely take out of the picture.
The use case I'm targetting is a startup's small/medium-sized clustered webapp (boxes in a LAN, or in the cloud). It needs to retrieve objects at ~RAM-speed levels and scale fairly easily. As a side-effect, one wouldn't have to think about DB server installations, impedance mismatch, JDBC, caches, polluting domain models with annotations, etc.
Again, what I want to accomplish is something like described here, and I would love to have some more feedback on ideas concerning the actual implementation (why use Terracotta instead of Hazelcast, use serialization or deep cloning via reflection or whatever else, and also the major drawbacks of an approach like this - eg. why wouldn't you change it for your current ORM/DB setup).
It has to be super simple to integrate so it'll feature a really neat Java API, improving code readability. No other software (DB, memcached will be required).
Try GigaSpaces. I think they have exactly what you require, and if I'm not mistaken there's a free version for startups.
Some concepts:
"Space" is some place where you can store and retrieve objects
Space can be backed by any JDBC-compliant DB, automatically (no code, only configuration)
Space can be started in your java process, so all accesses are at RAM speed
Space can be clustered/partitioned in any way you want (full mirror, partial, grid).
Space supports distributed or local transactions
Check their wiki, (but only "programmer's guide" - all the rest is marketing BS).

Categories