Scaling Java applications - existing cluster-aware IoC frameworks?

Most people use some kind of IoC framework - Guice, Spring, you name it. Many of us need to scale our applications too, so we complicate our lives with Terracotta or GlassFish/JBoss/insert-your-favourite-here clusters.
But is it really the way to go? Are you using any of the above?
Here are some ideas we currently have implemented in a yet-to-be-open-sourced framework, and I'd like to see what you think of them - or maybe hear "it's a complete ripoff of XY!".
- cluster-wide object replication: give it a name, and whenever you do something on such an object (in any node), it will get replicated - with different guarantees available
- transparent soft load balancing - simplest scenario: a RESTful web service method call proxied to another node
- view-only node injection: inject a proxy to a "named" object, and get your calls automatically proxied to a node (see the sketch below)
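To make the last idea concrete, here is roughly what view-only injection could look like with a JDK dynamic proxy. This is a minimal sketch, not our actual implementation; NodeClient and its send() method are made-up names for whatever transport the framework uses.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical transport: sends a method invocation to the node that
// owns the named object and returns the (possibly serialized) result.
interface NodeClient {
    Object send(String objectName, String methodName, Object[] args);
}

public class RemoteProxyFactory {
    // Returns a proxy for iface; every call is forwarded to the remote node.
    @SuppressWarnings("unchecked")
    public static <T> T viewOnly(Class<T> iface, String name, NodeClient client) {
        InvocationHandler handler = (proxy, method, args) ->
                client.send(name, method.getName(), args);
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
    }
}
```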
Would you use something like that? Is there a current, stable, enterprise-ready implementation out there?

FWIW I work at a company that does very large scale web applications and we tend not to use this form of object caching.
In fact we tend to make our lives easier by not storing anything in the session and not caching anything that is transactional and needs to be read in a current state. This way your application is quite simple, easy to reason about, and very easy to scale horizontally.
I guess the rationale behind using these object caches is primarily to reduce load on your persistence store and possibly to reduce latency. My suggestion is to scale this backend independently of the relatively dumb webapp. Most large sites do this through read replicas and data sharding. Have a look here: http://highscalability.com/livejournal-architecture. I remember looking at this a long time ago and it was pretty interesting. It also fits reasonably well into the kind of architecture I see being used in high-traffic websites.
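To illustrate the sharding idea only - this is a toy sketch, not how any particular site does it; ShardRouter and the JDBC URLs are made up, and production systems typically use consistent hashing plus managed replicas:

```java
import java.util.List;

// Routes reads for a given user to a fixed shard, so the same user's
// data always comes from the same database.
public class ShardRouter {
    private final List<String> shardJdbcUrls;

    public ShardRouter(List<String> shardJdbcUrls) {
        this.shardJdbcUrls = shardJdbcUrls;
    }

    public String shardFor(long userId) {
        int index = (Long.hashCode(userId) & 0x7fffffff) % shardJdbcUrls.size();
        return shardJdbcUrls.get(index);
    }
}
```

A call like router.shardFor(42) then always resolves to the same shard URL, keeping the webapp itself stateless and dumb.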

Related

In modern application design, how do you implement / convert between TransferObject and BusinessObject?

With organizations that are slow to adopt modern technology finally junking EJBs and getting ready to transform to Spring Boot, microservices, REST, and Angular, there are some questions about application design. One is about TransferObjects and BusinessObjects.
When the call comes to the REST Controller, is it still popular to populate a TO (POJO) and then make a Service call, which in turn populates a BusinessObject and then calls a Repository service?
OR
At the REST Controller layer, do we directly populate the BO and send it to the Service? (This does not make sense to me, because a BO is populated only during the execution of business logic.)
If nowadays it's still Option 1, then how do we avoid writing two nearly identical POJO classes in most cases (in order to use BeanUtils.copyProperties()), with the BO decorated with @Id, @Column, etc.?
To elaborate on @Turing85's comments...
Option 1 usually (see the end of my answer) makes infinitely more sense. It's a question of responsibility (purpose) and change, for the two logical components you refer to: a REST API and a repository / system service.
Responsibility: a REST service cares about working with its callers, so ideally when designing a REST API you should be involving someone from the client side (client as in caller), because if the API doesn't work for them it's not going to be an effective API. On the other hand, repositories are somewhat self-centered, and may need to consider things that are of no interest to API callers (and vice versa).
Change: if you pay attention to design principles, like SOLID, you'll know that each part of a system should do one job - as a way of limiting the reasons why it needs to change (see: SRP). Trying to use one object across both outward-facing APIs and inward-facing repositories is asking for trouble, because it's trying to do too much - it's trying to help solve problems in two very different parts of the wider solution, and the two have very different change drivers working against them. @Turing85's comment about the persistence layer stems from the same idea.
"Option 1 usually makes infinitely more sense":
One case where the REST API's objects will / can bear a very close resemblance to those that hit the actual repository (or even be reused, I guess) is when the REST API is a System API - i.e. a dedicated façade / proxy to the repository. In this case, the System API is largely driven by the repository, i.e. the main change driver is just the repository.
After researching a bit, I agree - keeping things simple results in simpler code. I found a nice, simple way to take care of this manual work:
https://www.baeldung.com/entity-to-and-from-dto-for-a-java-spring-application
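The approach in that article boils down to two small classes and one hand-written mapper. A minimal sketch, with made-up Book / BookDto names; the JPA annotations live only on the entity, so the DTO stays a plain POJO:

```java
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.Id;

// Book.java - the persistence-side BusinessObject, decorated with @Id, @Column
@Entity
public class Book {
    @Id
    private Long id;
    @Column(name = "title")
    private String title;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
}

// BookDto.java - the API-side TransferObject, a plain POJO
public class BookDto {
    private Long id;
    private String title;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
}

// BookMapper.java - the manual copying lives in exactly one place
public final class BookMapper {
    public static BookDto toDto(Book entity) {
        BookDto dto = new BookDto();
        dto.setId(entity.getId());
        dto.setTitle(entity.getTitle());
        return dto;
    }

    public static Book toEntity(BookDto dto) {
        Book entity = new Book();
        entity.setId(dto.getId());
        entity.setTitle(dto.getTitle());
        return entity;
    }
}
```

If the classes multiply, libraries like MapStruct or ModelMapper can generate this boilerplate instead.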

Is it a bad practice to use a ThreadLocal Object for storing web request metadata?

I am working on a J2EE webapp divided into several modules. I have some metadata, such as the user name and preferences, that I would like to access from everywhere in the app. I would maybe also like to gather data similar to logging information, but specific to a request, and store it in that metadata so that I could optionally send it back to the user as debug information.
Aside from passing a generic context object through every method, from the upper presentation classes down to the DAOs, or using AOP, the only solution that came to mind was a ThreadLocal "Context" object (very similar to a session, BTW), with a filter that binds it on the incoming request and unbinds it on the response.
But such a thing feels a little hacky, since it breaks several patterns and could complicate testing and debugging, so I wanted to ask: in your experience, is it OK to proceed like this?
ThreadLocal is a hack to make up for bad design and/or architecture. It's a terrible practice:
It's a pool of one or more global variables, and global variables in any language are bad practice (there's a whole set of problems associated with them - search the net).
It may lead to memory leaks, in any J2EE container that manages its threads, if you don't handle it well.
What's even worse practice is to use the ThreadLocal in the various layers.
Data communicated from one layer to another should be passed using Transfer Objects (a standard pattern).
It's hard to think of a good justification for using ThreadLocal. Perhaps if you need to communicate some values between 2 layers that have a third/middle layer between them, and you don't have the means to make changes to that middle layer. But if that's the case, I would look for a better middle layer.
In any case, if you store the values at one specific point in the code and retrieve them at another single point, then it may be excusable; otherwise you just never know what side effects any executing method may have on the values in the ThreadLocal.
Personally I prefer passing a context object, as the fact that the same thread is used for processing is an artifact of the implementation, and you shouldn't rely on such artifacts. The moment you want to use other threads, you'll hit a wall.
If those states are encapsulated in a Context object, I think that's clean enough.
When it comes to testing, the best tool is dependency injection. It allows you to inject fake dependencies into the object under test.
And all dependency injection frameworks (Spring, CDI, Guice) have the concept of a scope (where request is one of these scopes). Under the hood, beans stored in the request scope are indeed associated with a ThreadLocal variable, but this is all done by the dependency injection framework.
What I would do is thus use a DI framework, which makes request-scoped objects available anywhere without having to look them up (which would break testability). Just inject a request-scoped object where you want to use it, and the DI framework will retrieve it for you.
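In Spring, for example, that boils down to something like this. A minimal sketch, assuming Spring 4.3+ with Spring MVC on the classpath; RequestContext is a made-up name for the metadata holder:

```java
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;
import org.springframework.web.context.annotation.RequestScope;

// One instance per HTTP request, created and cleaned up by the container.
@Component
@RequestScope
class RequestContext {
    private String userName;
    String getUserName() { return userName; }
    void setUserName(String userName) { this.userName = userName; }
}

// A singleton service: @RequestScope defaults to a scoped proxy, so each
// call transparently reaches the RequestContext of the current request.
@Service
class AuditService {
    private final RequestContext context;

    AuditService(RequestContext context) {
        this.context = context;
    }

    void audit(String action) {
        System.out.println(context.getUserName() + " did: " + action);
    }
}
```

In a test, you simply construct AuditService with a hand-built RequestContext - no container, no ThreadLocal lookup.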
You must know that a servlet container can and will re-use threads for requests, so if you do use ThreadLocals, you'll need to clean up after yourself once the request is finished (perhaps using a filter).
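That cleanup filter is short. A minimal sketch, assuming the javax.servlet API; the holder and its Map payload are made-up names:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Holds the per-request metadata.
final class RequestContextHolder {
    private static final ThreadLocal<Map<String, Object>> CONTEXT = new ThreadLocal<>();

    static Map<String, Object> get() { return CONTEXT.get(); }
    static void bind() { CONTEXT.set(new HashMap<>()); }
    static void unbind() { CONTEXT.remove(); } // critical: threads are re-used
}

public class ContextFilter implements Filter {
    @Override
    public void init(FilterConfig config) { }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        RequestContextHolder.bind();
        try {
            chain.doFilter(req, res);
        } finally {
            RequestContextHolder.unbind(); // always unbind, even on exceptions
        }
    }

    @Override
    public void destroy() { }
}
```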
If you are the only developer on the project and you think you gain something: just do it! It is your time. But be prepared to revert the decision and reorganize the code base later, as should always be the case.
Let's say there are ten developers on the project. Everybody might like to have their own thread-local variable to pass on parameters like currency, locale, or roles - and maybe it even becomes a HashMap...
I think that, in the end, not everything which is feasible should be done. Complexity will strike back at you...
ThreadLocal can lead to memory leaks if you do not call remove() (or set the value to null) once it is out of scope.

How many GWT services

Starting a new GWT application, and wondering if I can get some advice from someone's experience.
I have a need for a lot of server-side functionality through RPC services... but I am wondering where to draw the line.
I can make a service for every little call or I can make fewer services which handle more operations.
Let's say I have Customer, Vendor and Administration services. I could make 3 services or a service for each function in each category.
I noticed that much of the service implementation does not provide compile-time help and is at times troublesome to get going, but it provides good modularity. When I have a larger service, I don't have the modularity I described, but I avoid the service creation issues and reduce the entries in my web.xml file.
Is there a resource issue with using a lot of services? What is the best practice to determine what level of granularity to use?
In my opinion, you should make an RPC service for "logical" things.
in your example:
one for customer, another for vendors and a third one for admin
In that way, you keep several services grouped by meaning, and you will have only a few lines to maintain in the web.xml file (and that is good news :-)
More seriously, RPC services are usually wrappers that call database stuff, so you could even make a single 'MagicBlackBoxRpc' with a single web.xml entry and thousands of operations!
But making a separate RPC for admin operations, like you suggest, seems a good thing.
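A grouped service then looks like one interface with several related methods. A minimal sketch, assuming GWT 2.x; Customer and the method set are made up:

```java
import com.google.gwt.user.client.rpc.RemoteService;
import com.google.gwt.user.client.rpc.RemoteServiceRelativePath;
import java.io.Serializable;
import java.util.List;

// Made-up domain type; GWT-RPC payloads must be serializable.
class Customer implements Serializable {
    String name;
}

// One "logical" service covering all customer operations:
// a single servlet mapping in web.xml instead of one per call.
@RemoteServiceRelativePath("customerService")
public interface CustomerService extends RemoteService {
    Customer findCustomer(long id);
    List<Customer> searchCustomers(String query);
    void saveCustomer(Customer customer);
}
```

GWT also requires a matching CustomerServiceAsync interface for the client-side stubs; that, plus one web.xml entry, is the whole per-service overhead.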
Read general advice on "how big should a class be?", which is available in any decent software engineering book.
In my opinion:
One class = One Subject (i.e. a group of functions or behaviours that are related)
A class should not deal with more than one subject. For example:
Class PersonDao -> Subject: interface between the DB and Java code.
It WILL NOT:
- cache Person instances
- update fields automatically (for example, update the field 'lastModified')
- find the database
Why?
Because for all these other things, there will be other classes doing it! Respectively:
- a cache around the PersonDao is concerned with the efficient storage of information to avoid hitting the DB more often than necessary
- the Service class which uses the DAO is responsible for modifying anything that needs to be modified automagically.
- Finding the database is the responsibility of the DataSource (usually part of a framework like Spring); your DAO should NOT be worried about that. It's not part of its subject.
TDD is the answer
The need for this kind of separation becomes really clear when you do TDD (Test-Driven Development). Try to do TDD on bad code where a single class does all sorts of things! You can't even get started with one unit test! So this is my final hint: use TDD and that will tell you how big a class should be.
I think the thing to optimize for is that you can accomplish a result in one round trip to the server. I have an ad-hoc collection of methods on my service object, one for each situation the client finds itself in when it has to get something done. You do not want the client to RPC to the server several times in a row while the user is sitting there waiting.
REST makes things orthogonal, but orthogonality has a cost: there is a reason that the frequently used verbs in natural languages are irregular. In terms of maintaining a clean orthogonal structure in your app, make sure your schema is well-designed - that is, each class should have semantics orthogonal to those of the other classes. When the semantics of each RPC call can be stated cleanly in the schema, there will be no confusion as to what they mean, even if they aren't RESTfully ideal.

Properly using the singleton design pattern

I'm running a server which, on occasion, has to search for what a client queries. I'd like to write the client query to disk for the records, but I don't want to slow down the search any more than I have to. (The search is already the bottleneck...)
So, when the client performs a search, I'm having the client's thread send a message to a singleton thread, which handles the disk write, while the client thread continues to handle the client's requests. That way, the file on disk doesn't run into sync issues, and it doesn't slow the client's experience down.
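In code, the shape I have in mind is roughly this (a minimal sketch; QueryLogger and the file name are made up):

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public final class QueryLogger {
    private static final QueryLogger INSTANCE = new QueryLogger();
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    private QueryLogger() {
        Thread writer = new Thread(this::drain, "query-logger");
        writer.setDaemon(true); // don't keep the JVM alive just for logging
        writer.start();
    }

    public static QueryLogger getInstance() { return INSTANCE; }

    // Called from client threads; returns immediately.
    public void log(String query) { queue.offer(query); }

    // Single writer thread: no file sync issues, clients never block on I/O.
    private void drain() {
        try (PrintWriter out = new PrintWriter(new FileWriter("queries.log", true))) {
            while (true) {
                out.println(queue.take()); // blocks until a query arrives
                out.flush();
            }
        } catch (IOException | InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```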
I have a conceptual question here: is the singleton appropriate in this case? I've been using the singleton design pattern a little too much in my recent programming, and I want to make sure that I'm using it for its intended use.
Any feedback is greatly appreciated.
The singleton pattern is definitely overused and comes with its share of difficulties (unit-testing is the canonical example), but like everything in design, you need to weigh the pros and cons for your specific scenario. The singleton pattern does have its uses. There are options that may allow you to get the singleton behaviour, while alleviating some of the inherent issues:
Interception (often referred to as aspect-oriented programming, though I've seen debate that they are not exactly the same thing... can't find the article I read on this at this time) is definitely an option. You could use any combination of constructor injection, the decorator pattern, an abstract factory, and an inversion-of-control container. I'm not up on my Java IoC containers, but there are some .NET containers that allow automatic interception (I believe Spring.NET does, so likely Spring (Java) has this built in). This is very handy for any type of cross-cutting concern, where you need to perform certain types of actions across multiple layers (security, logging, etc.). Also, most IoC containers allow you to control lifetime management, so you would be able to treat your logger as a singleton without having to actually implement the singleton pattern manually.
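As a small illustration of the decorator half of that suggestion - a sketch only, with a made-up SearchService interface, reusing the QueryLogger from the question's scenario:

```java
// SearchService.java - the interface the rest of the code depends on.
interface SearchService {
    String search(String query);
}

// LoggingSearchService.java - the decorator: adds the cross-cutting
// logging concern without touching the real implementation.
public class LoggingSearchService implements SearchService {
    private final SearchService delegate;
    private final QueryLogger logger;

    public LoggingSearchService(SearchService delegate, QueryLogger logger) {
        this.delegate = delegate;
        this.logger = logger;
    }

    @Override
    public String search(String query) {
        logger.log(query);             // record the query asynchronously
        return delegate.search(query); // then do the actual work
    }
}
```

An IoC container would wire the decorator around the real implementation and manage the logger's lifetime, replacing the hand-rolled singleton.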
To sum it up. If a singleton fits for your scenario (seems plausible from your description), go for it. Just make sure you have weighed the pros and cons. You may want to try a different approach and compare the two.

Can/should I use WeakReference in my complex object structure with db4o?

I'm considering porting an application to db4o. The data model consists of lots of small objects with a lot of references between each other. For example, I have a book which points to an author and to chapters. Chapters have sections; sections have large blobs of text and images, and they reference the characters mentioned.
I think it should be possible to keep the meta structure in memory (everything except the text blobs), but I was wondering whether I could use some clever trick involving WeakReference so that db4o would keep only the part of the model in memory that I really need (i.e. the parts I've been using recently).
The same is true for the text blobs (which should be around 1-10 KB each). Is it possible to get a String without having to worry about the DB layer, without having to query for the text blob using an artificial ID inside the getter, and without using a hard reference that keeps the whole text in memory all the time?
Turning off WeakReferences is mostly used for performance tuning. The downsides to this approach are not negligible - so be careful. I would not recommend it.
Controlling memory usage should be done using the activation features. Activation can help you keep only part of your model in memory, and WeakReferences will help the GC collect objects that are no longer used. I think that's the way to go.
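For example (a minimal sketch, assuming db4o's embedded API; Book, Chapter, and getChapters() are made-up stand-ins for your model):

```java
import java.util.List;
import com.db4o.Db4oEmbedded;
import com.db4o.ObjectContainer;
import com.db4o.ObjectSet;
import com.db4o.config.EmbeddedConfiguration;

// Made-up stand-ins for the question's model.
class Chapter { String text; }
class Book {
    List<Chapter> chapters;
    List<Chapter> getChapters() { return chapters; }
}

public class ActivationExample {
    public static void main(String[] args) {
        EmbeddedConfiguration config = Db4oEmbedded.newConfiguration();
        config.common().activationDepth(1); // load only direct fields by default
        ObjectContainer db = Db4oEmbedded.openFile(config, "books.db4o");
        try {
            ObjectSet<Book> books = db.query(Book.class);
            if (books.hasNext()) {
                Book book = books.next();
                // Chapters and their text blobs are not in memory yet;
                // pull them in explicitly, two levels deep, when needed:
                db.activate(book.getChapters(), 2);
            }
        } finally {
            db.close();
        }
    }
}
```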
Also - you can post your questions to db4o forums to get help from the db4o community.
Goran
I've not used db4o, or any ORM/OODB product, recently; however, it strikes me that this kind of memory management and graph management feature should be part of the framework itself rather than something you build on top of it. If Versant's db4o doesn't offer this, it might be worth looking into another product that does. So, I realise this is not the answer you're looking for, but leveraging the framework would be my first port of call.
