Cache objects as Immutable classes - java

I am asking a very generic question, and the answer can vary from requirement to requirement, but as a general rule of thumb, can we say the following is a good design rule:
The classes to be cached (static/reference data) should be designed as
immutable, with exceptions reasoned.
What could be the design/performance issues with the above statement, if it is not true?

@JohnB has a good answer about the underlying data.
If, however, the question is referring to the immutability of the cached classes themselves (which are holding the data in the cache), then the answer is that mutable classes can cause thread-safety issues if the instances of the classes are referenced by multiple threads (as can often happen with data shared via a cache). Additionally, "accidental" modification of the data may occur, where a shared instance is unintentionally modified (because the modifying code did not know that the data was shared).

This follows from what a cache does: it holds data rather than retrieving it from the data source again. For example, you query the database for a value, then put it in a memory-based cache so you don't have to query the DB again. However, if the value in the DB can change, then the value in the cache will be out of date and your application will be using the wrong data.
Therefore, caching is best if the data cannot change during the life of the application. If the data can change, then a strategy must be developed to regularly check whether the data has changed.
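To make the accidental-modification hazard concrete, here is a minimal sketch (the UserCache and UserRecord names are hypothetical, not from the question): two callers receive the same shared instance from a cache, and a change made by one silently leaks into the other.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A mutable class held in a cache.
class UserRecord {
    private String displayName;
    UserRecord(String displayName) { this.displayName = displayName; }
    String getDisplayName() { return displayName; }
    void setDisplayName(String displayName) { this.displayName = displayName; }
}

class UserCache {
    private final Map<Integer, UserRecord> cache = new ConcurrentHashMap<>();

    UserRecord get(int id) {
        // Every caller gets a reference to the SAME instance.
        return cache.computeIfAbsent(id, k -> new UserRecord("Alice"));
    }
}

public class AccidentalModification {
    public static void main(String[] args) {
        UserCache cache = new UserCache();
        UserRecord forReport = cache.get(1);
        UserRecord forEditScreen = cache.get(1);

        // The edit screen "harmlessly" tweaks its copy... which is not a copy.
        forEditScreen.setDisplayName("TEMP");

        // The report now sees the modified value: prints "TEMP".
        System.out.println(forReport.getDisplayName());
    }
}
An immutable UserRecord would make this bug impossible to write.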

To put what jtahlborn is explaining in other words: an immutable class provides methods to obtain "static" data.
If your class is immutable, you will NOT have setters; state is supplied only through the constructor's parameters.
Take care, though: immutable instances are not meant to be created for a single use and thrown away; that would result in a performance loss, and if the class holds mutable inner attributes, copies of them have to be made each time you access the get... methods.
Example:
class MyImmutableThing {
    private final String myProperty;

    MyImmutableThing(String myProperty) {
        this.myProperty = myProperty;
    }

    String obtainMyProperty() {
        return myProperty;
    }

    // Note there is no means to modify myProperty: the original value remains. ;)
    // That's it!
}
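To illustrate the copying cost mentioned above: when an immutable class wraps a mutable type such as java.util.Date, defensive copies must be made on the way in and on every get... call, which is where the per-access cost comes from. A minimal sketch (the Event class is hypothetical):
import java.util.Date;

final class Event {
    private final String name;
    private final Date when; // Date is mutable, so it must never escape directly

    Event(String name, Date when) {
        this.name = name;
        this.when = new Date(when.getTime()); // defensive copy in
    }

    String getName() {
        return name; // String is immutable: safe to return as-is
    }

    Date getWhen() {
        return new Date(when.getTime()); // defensive copy out, paid on every call
    }
}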

Related

Single class file on server memory using Spring causing problems

I don't quite know how to explain the situation; I will try to be as clear as possible.
I am currently writing a web application, using Spring to manage the beans. Obviously, more than one person will use this application. Each user has a set of data related to himself. My problem comes from some poor design I introduced when I had just entered the development field. Here is the case:
@Component
public class ServiceClass implements IService {

    @Autowired
    private Dependency firstDependency;

    @Autowired
    private UsefulObject secondDependency;

    private DataSet dataSet; // THIS LINE IS IMPORTANT

    public void entryPoint(String arg1, int arg2, Structure arg3) {
        /* Query data from a database specific to the project (not
           SQL-oriented). I absolutely need this information to keep going. */
        dataSet = gatherDataSet(arg1);

        /* Treat the data */
        subMethodOne(arg1);
        subMethodTwo(arg2);
        subMethodThree(arg3);
    }

    private void subMethodOne(String arg1) {
        // Do some things with arg1, whatever
        subSubMethod(arg1);
    }

    private void subSubMethod(String arg1) {
        /* Use the DataSet previously gathered */
        dataSet.whateverDoing();
    }

    ... // Functions calling sub-methods, using the DataSet
}
As every user would have a different dataSet, I thought it would be good to fetch it at the beginning of every call to my service. In the same way, as it is used very deep in the call hierarchy, I thought it would be a good idea to store it as an attribute.
The problem I encounter is that, when two users go through this service nearly simultaneously, I have a cross-data issue. The following happens:
First user comes in, calls gatherDataSet.
Second user comes in, calls gatherDataSet. First user is still being processed!
First user still uses the dataSet object, which was overwritten by the second user.
Basically, the data the first user makes use of becomes false, because he uses data from the second user, who came in shortly after him.
My questions are the following:
Are there design patterns / methods to avoid this kind of behavior?
Can you configure Spring so that it uses two instances for two users (and so on), to avoid this kind of problem?
Bonus (kind of unrelated): How do you implement a very large data mapper?
Object member variables (fields) are stored on the heap along with the object. Therefore, if two threads call a method on the same object instance and this method updates object member variables, the method is not thread safe.
However, If a resource is created, used and disposed within the control of the same thread, and never escapes the control of this thread, the use of that resource is thread safe.
With this in mind, change your design. https://books.google.co.in/books?isbn=0132702258 is a must-read book for coming up with good Java-based software design.
More Stack Overflow links: Why are local variables thread safe in Java, Instance methods and thread-safety of instance variables
Spring promotes the singleton pattern (it is the default bean scope). Spring configuration for having two service class objects for two different users is called prototype bean scoping, but it should be avoided as far as possible.
Consider the use of an in-memory Map, an external NoSQL datastore, or an external relational database.
Can you configure Spring so that it uses two instances for two users (and so on), to avoid this kind of problem?
You already mentioned correctly that the design decisions you took are flawed. But to answer your specific question: the following should get your use case to work correctly, though at a performance cost:
You can set Spring beans to various scopes (relevant for your use case: prototype / request or session), which modifies when beans get instantiated. The default behaviour is one bean per Spring container (singleton), hence the concurrency issues. See https://docs.spring.io/spring/docs/3.0.0.M3/reference/html/ch04s04.html
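A minimal sketch of the request-scope variant, assuming a servlet-based Spring web application (ServiceClass, IService and DataSet are the question's own names):
import org.springframework.context.annotation.Scope;
import org.springframework.context.annotation.ScopedProxyMode;
import org.springframework.stereotype.Component;
import org.springframework.web.context.WebApplicationContext;

// Request scope: every HTTP request gets its own instance, so the
// dataSet field is no longer shared between concurrent users.
// The proxy lets this bean still be injected into singleton beans.
@Component
@Scope(value = WebApplicationContext.SCOPE_REQUEST, proxyMode = ScopedProxyMode.TARGET_CLASS)
public class ServiceClass implements IService {
    private DataSet dataSet;
    // ... unchanged methods ...
}
Note that plain prototype scope alone would not fix the problem if the service is injected once into a singleton controller, since injection happens only once; the scoped proxy is what makes a fresh instance get resolved per request.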
The easiest solution is simply to not store the dataset in a class field.
Instead, store the dataset in a local variable and pass it as an argument to the other functions. This way there will not be any concurrency problems, as each call stack will have its own instance.
Example:
public void entryPoint(String arg1, int arg2, Structure arg3) {
    // Store the dataset in a local variable, avoiding concurrency problems
    DataSet dataSet = gatherDataSet(arg1);

    // Treat the data, passing the dataset as an argument
    subMethodOne(arg1, dataSet);
    subMethodTwo(arg2, dataSet);
    subMethodThree(arg3, dataSet);
}
Use the synchronized keyword for it.
As "Synchronization plays a key role in applications where multiple threads tend to share the same resources, especially if these resources must keep some kind of sensitive state where manipulations done by multiple threads at the same time could lead the resource to an inconsistent state."
public void someMethod() {
    // 'object' here is a shared lock object, e.g. a private final field
    synchronized (object) {
        // A thread that is executing this code section
        // has acquired the object's intrinsic lock.
        // Only a single thread may execute this
        // code section at a given time.
    }
}

java standard serialization order

I want to know in which order the attributes of the following example class would be serialized:
import java.io.Serializable;
import java.util.Date;

public class Example implements Serializable {
    private static final long serialVersionUID = 8845294179690379902L;

    public int score;
    public String name;
    public Date eventDate;
}
EDIT:
Why I want to know this:
I have a serialized string in a file for a class of mine which has no implementation of
readObject() or writeObject(). Now that implementation has changed (some properties are gone)
and I want to write a readObject() method that handles the old serialized class.
There I would just read those properties but wouldn't save them to the created object.
This is basically just for legacy; I use a database now but need to support the old
serialized files.
To write this readObject() I need the order of the properties that are in the stream.
Based on a brief reading of the spec:
The fields are written in the order of the field descriptors in the class descriptor.
The field descriptors are in "canonical order", which is defined as follows:
"The descriptors for primitive typed fields are written first sorted by field name followed by descriptors for the object typed fields sorted by field name. The names are sorted using String.compareTo."
(I suspect that the bit about the canonical order should not matter. The actual order of the fields in the serialization should be recoverable from the actual order of the field descriptors in the class descriptor in the same serialization. I suspect that the reason a canonical order is specified is that it affects the computed serialization id. But I could easily be wrong about this :-) )
Reference:
Java Object Serialization Specification - Section 4.3 and Section 6.4.1 (the comment on the "nowrclass" production)
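For the asker's concrete goal (reading an old stream while ignoring removed properties), note that a readObject() implementation does not have to depend on field order at all: ObjectInputStream.readFields() exposes the serialized fields by name. A minimal sketch, using the Example class above:
private void readObject(java.io.ObjectInputStream in)
        throws java.io.IOException, ClassNotFoundException {
    // Reads the serialized fields for this class, keyed by name.
    java.io.ObjectInputStream.GetField fields = in.readFields();
    this.score = fields.get("score", 0);
    this.name = (String) fields.get("name", null);
    this.eventDate = (java.util.Date) fields.get("eventDate", null);
    // Fields present in the old stream but no longer declared on the class
    // are simply skipped; fields.defaulted("score") reports whether a
    // current field was absent from the stream (i.e. the stream predates it).
}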
With regards to your original problem, some testing would suggest that if you've maintained your serialVersionUID and haven't removed fields containing values you need, you can probably just deserialize your old objects without error.
Any fields you no longer have will be ignored. Any new fields will be initialised to default values (e.g. null, or 0 etc.).
Bear in mind, this approach might violate constraints you've placed upon your class. For example, it may not be legal (in your eyes) to have null values in your fields.
Final warning: this is based on some testing of mine and research on the Internet. I haven't yet encountered any hard proof that this is guaranteed to work in all situations, nor that it is guaranteed to continue to work in the future. Tread carefully.
It doesn't matter. Fields are serialized along with their names. Changing the order doesn't affect serialization compatibility, as long as the serialVersionUID is the same. There are a lot of other things that don't matter either. See the Versioning chapter in the Object Serialization Specification.

NoSQL Schemaless data and statically typed language

One of the key benefits of NoSQL data stores like MongoDB is that they're schemaless. With dynamically typed languages this seems to be a natural fit. You can receive some arbitrary JSON inputs, perform business logic on the known fields, and persist the whole thing without first having to define the object.
What if your choice of language is limited to the statically typed, say Java? How could I achieve the same level of flexibility?
A typical data flow looks like the following:
JSON input
Deserialize into a Java object to perform business logic
Serialize into BSON to persist in Mongo
where the mapping-to-object step is necessary since you want to perform business logic with POJOs, not JSON strings. However, before I can map the input into objects, I must define them first. What if the input contains additional fields undefined in the object? While they may not be used in the business logic, I may still want to be able to persist them. I have seen implementations where the undefined fields are put into a map, but am not sure if that's the best approach. For one, the undefined fields may be complex objects as well.
Schemaless data doesn't necessarily mean structureless data; the fields are typically known in advance, and some type-safe pattern can be applied on top of them to avoid the Magic Container anti-pattern. But this is not always the case: sometimes keys are entered by the user and cannot be known in advance.
I've used the Role Object Pattern several times to give coherence to a dynamic structure. I think it is well suited here for both cases.
The Role Object Pattern defines a way to access different views of an object. The canonical example being a User that can assume several roles such as Customer, Vendor, and Seller. Each of these views has different operations it can perform and can be accessed from any of the other views. Common fields are typically available at the interface level (especially userId(), or in your case toJson()).
Here's an example of using the pattern:
public void displayPage(User user) {
    display(user.getName());

    if (user.hasView(Customer.class))
        displayShoppingCart(user.getView(Customer.class));

    if (user.hasView(Seller.class))
        displayProducts(user.getView(Seller.class));
}
In the case of data with a known structure, you can have several views bringing different sets of keys into cohesive units. These different views can read the json data on construction.
In the case of data with a dynamic structure, an authoritative RawDataView can hold the data in its dynamic form (i.e. a Magic Container like a HashMap<String, Object>). This can be used to query the dynamic data. At the same time, type-safe wrappers can be created lazily and can delegate to the RawDataView to assist program readability/maintainability:
public class Customer implements UserView {
    private final RawDataView data;

    public Customer(UserView source) {
        this.data = source.getView(RawDataView.class);
    }

    // All UserView implementations must specify this
    @Override
    public long id() {
        return data.getId();
    }

    @Override
    public <T extends UserView> T getView(Class<T> view) {
        // construct or look up the view
    }

    @Override
    public Json toJson() {
        return data.toJson();
    }

    //
    // Specific to Customer
    //
    public List<Item> shoppingCart() {
        return (List<Item>) data.getValue("items", List.class);
    }

    // etc....
}
I've had success with both of these approaches. Here are some extra pointers that I've discovered along the way:
Have a static structure to your data as much as possible. This makes things a lot easier to maintain. I had to break this rule and use the RawDataView approach when working on a legacy system. You may also have to break it with dynamically entered user data, as mentioned above. In that case, use a convention for non-dynamic field names, such as a leading underscore (_userId).
Have equals() and hashCode() implemented such that user.getView(A.class).equals(user.getView(B.class)) is always true for the same user.
Have a UserCore class that does all the heavy lifting of common code, such as creating views; performing common operations (like toJson()); returning common fields (like userId()); and implementing equals() and hashCode(). Have all views delegate to this core object.
Have an AbstractUserView that delegates to the UserCore and implements equals() and hashCode().
Use a type-safe heterogeneous container (like Guava's ClassToInstanceMap) for constructing/caching views.
Allow the existence of a view to be queried. This can be done either with a hasView() method or by having getView return Optional<T>.
You can always have a class which provides both:
easy access to the attributes you know about, with optional fallbacks to older formats (for example, it can return "name" if it exists, or fall back to the older "name.first" + "name.last" form if it doesn't, or some similar scenario)
easy access to unknown elements by simulating the Map interface
Whether you do full validation or not, and whether you allow extra undefined attributes or not, depends on what you want to achieve. But I think creating an abstraction that allows both ways of accessing the data is the best solution.
Hopefully over time, you'll get to the stage where your schema is pretty much stable and messing directly with the attributes is not needed anymore.
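A minimal sketch of such an abstraction (all names here are hypothetical): a wrapper exposing a typed accessor with a fallback to the older format, plus raw map-style access for unknown elements.
import java.util.Map;

// Hypothetical wrapper over a parsed JSON document.
public class PersonDocument {
    private final Map<String, Object> raw;

    public PersonDocument(Map<String, Object> raw) {
        this.raw = raw;
    }

    // Known attribute, with a fallback to the older "name.first"/"name.last" format.
    public String getName() {
        Object name = raw.get("name");
        if (name instanceof String) {
            return (String) name;
        }
        if (name instanceof Map) {
            Map<?, ?> parts = (Map<?, ?>) name;
            return parts.get("first") + " " + parts.get("last");
        }
        return null;
    }

    // Map-like escape hatch for elements the schema doesn't know about yet.
    public Object get(String key) {
        return raw.get(key);
    }
}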
This is not well solved in Java due to the lack of dynamic types. One way it can be solved is by using Maps: a Map<String, Object> whose values can themselves again be Maps of objects.
This is not an elegant way, but it works in Java. An example: the SnakeYAML library for YAML allows traversal in this way.
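The same map-based traversal works for JSON; a minimal sketch, assuming the Jackson library is on the classpath (ObjectMapper and TypeReference are Jackson classes):
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.Map;

public class DynamicJson {
    public static void main(String[] args) throws Exception {
        String json = "{\"name\":\"Ada\",\"address\":{\"city\":\"London\"}}";

        // Parse arbitrary JSON into nested maps without defining a POJO first.
        Map<String, Object> doc = new ObjectMapper()
                .readValue(json, new TypeReference<Map<String, Object>>() {});

        // Nested objects come back as further Map<String, Object> instances.
        Map<String, Object> address = (Map<String, Object>) doc.get("address");
        System.out.println(address.get("city")); // London
    }
}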

Pattern/Library for sending objects over network, keeping pointers

Let's say you have a Client and a Server that wants to share/synchronize the same Models/Objects. The models point to each other, and you want them to keep pointing at the same object after being sent/serialized between the client and the server. My current solution roughly looks like this:
class Person {
    static Map<Integer, Person> allPeople = new HashMap<>();
    int myDogId;

    static Person getPerson(int key) {
        return allPeople.get(key);
    }

    Dog getMyDog() {
        return Dog.getDog(myDogId);
    }
}

class Dog {
    static Map<Integer, Dog> allDogs = new HashMap<>();
    int myOwnersId;

    static Dog getDog(int key) {
        return allDogs.get(key);
    }

    Person getMyOwner() {
        return Person.getPerson(myOwnersId);
    }
}
But I'm not too satisfied with this solution, with fields being integers and so on. This should also be a pretty common problem. So what I'm looking for here is a name for this problem, a pattern, a common solution, or a library/framework.
There are two issues here.
Are you replicating the data in the Client and the Server (if so, why?) or does one, the other, or a database agent hold the Model?
How does each agent access (its/the) model?
If the model is only held by one agent (Client, Server, Database), then the other agents need a way to remotely query the model (e.g., object enumerators, getters and setters for various fields) operating on abstract model entities (e.g., model element identifiers, which might be implemented as integers, as you have done).
Regardless of who holds the model (one or all), each model can be implemented naturally. The normal implementation has each object simply refer to other objects using normal object references, as if you had coded this without any thought of sharing between agents, unlike what you did.
You can associate an objectid with each object, as you have, but your application code doesn't need to use it; it is only necessary when referencing a remote copy of the model. Whether this objectid is associated with each object as a special field, a hash table, or is computed on the fly is just an implementation detail.
One way to handle this is to compute the objectid on the fly. You can do this if there is a canonical spanning tree over the entire model. In this case, the objectid is "just" the path from the root of the spanning tree to the location of the object. If you don't have a spanning tree, or it is too expensive to compute, you can assign objectids as objects are created.
The real problem with a duplicated, distributed model, as you have suggested you have, is keeping it up to date with both agents updating it. How do you prevent one from creating an object (and assigning an objectid) at the same time as the other, with the objects being created different but sharing the same objectid, or the same but holding different objectids? You'll need remote locking and signalling to keep the models in sync (this is the same problem as "cache coherency" for multiple CPUs; just think of each object as acting like a cache line). The way it is generally solved is to designate who holds the master copy (perhaps of the entire model, perhaps of individual objects within the model) and then issue queries, reads, reads-with-intent-to-modify, or writes to ensure that the "unique" entire model gets updated.
The only solution I am aware of is to send the complete structure, i.e. Dogs and Persons, over the network. Then they will end up pointing to the correct copy on the other side of the network. The implementation of this solution, however, depends on a lot of circumstances. For example, when your inclusion relation defines a tree, you can approach this problem differently than if it is a graph with cycles.
Have a look at this for more information.
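On the JVM, sending the complete structure this way is exactly what built-in serialization gives you: within a single ObjectOutputStream, shared references are written once and restored as the same object, so cross-pointers (even cycles) survive the trip. A minimal sketch reusing the question's Person/Dog names, but with direct object references instead of integer ids:
import java.io.*;

class Person implements Serializable {
    String name;
    Dog myDog; // direct reference instead of an integer id
}

class Dog implements Serializable {
    Person myOwner;
}

public class GraphRoundTrip {
    public static void main(String[] args) throws Exception {
        Person p = new Person();
        Dog d = new Dog();
        p.myDog = d;
        d.myOwner = p; // a cycle: fine, the stream tracks identity

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p); // writes the whole reachable graph once
        }

        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Person copy = (Person) in.readObject();
            // The restored graph preserves identity: prints true.
            System.out.println(copy.myDog.myOwner == copy);
        }
    }
}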
I guess one can use the proxy pattern for this.

Persisting data suited for enums

Most projects have some sort of data that are essentially static between releases and well-suited for use as an enum, like statuses, transaction types, error codes, etc. For example's sake, I'll just use a common status enum:
public enum Status {
    ACTIVE(10, "Active"),
    EXPIRED(11, "Expired");
    /* other statuses... */

    /* constructors, getters, etc. */
}
I'd like to know what others do in terms of persistence regarding data like these. I see a few options, each of which have some obvious advantages and disadvantages:
Persist the possible statuses in a status table and keep all of the possible status domain objects cached for use throughout the application
Only use an enum and don't persist the list of available statuses, creating a data consistency holy war between me and my DBA
Persist the statuses and maintain an enum in the code, but don't tie them together, creating duplicated data
My preference is the second option, although my DBA claims that our end users might want to access the raw data to generate reports, and not persisting the statuses would lead to an incomplete data model (counter-argument: this could be solved with documentation).
Is there a convention that most people use here? What are peoples' experiences with each and are there other alternatives?
Edit:
After thinking about it for a while, my real persistence struggle comes with handling the id values that are tied to the statuses in the database. These values would be inserted as default data when installing the application. At this point they'd have ids that are usable as foreign keys in other tables. I feel like my code needs to know about these ids so that I can easily retrieve the status objects and assign them to other objects. What do I do about this? I could add another field, like "code", to look stuff up by, or just look up statuses by name, which is icky.
We store enum values using some explicit string or character value in the database. Then to go from database value back to enum we write a static method on the enum class to iterate and find the right one.
If you expect a lot of enum values, you could create a static mapping HashMap<String,MyEnum> to translate quickly.
Don't store the actual enum name (i.e. "ACTIVE" in your example) because that's easily refactored by developers.
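A minimal sketch of that approach, applied to the question's Status enum (the one-character db codes are hypothetical):
import java.util.HashMap;
import java.util.Map;

public enum Status {
    ACTIVE("A", "Active"),
    EXPIRED("E", "Expired");

    private static final Map<String, Status> BY_DB_CODE = new HashMap<>();
    static {
        for (Status s : values()) {
            BY_DB_CODE.put(s.dbCode, s);
        }
    }

    private final String dbCode;      // stored in the database, immune to renames
    private final String displayName;

    Status(String dbCode, String displayName) {
        this.dbCode = dbCode;
        this.displayName = displayName;
    }

    public String toDbCode() {
        return dbCode;
    }

    // Translate a database value back to the enum constant.
    public static Status fromDbCode(String dbCode) {
        Status s = BY_DB_CODE.get(dbCode);
        if (s == null) {
            throw new IllegalArgumentException("Unknown status code: " + dbCode);
        }
        return s;
    }
}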
I'm using a blend of the three approaches you have documented...
Use the database as the authoritative source for the Enum values. Store the values in a 'code' table of some sort. Each time you build, generate a class file for the Enum to be included in your project.
This way, if the enum changes value in the database, your code will be properly invalidated and you will receive appropriate compile errors from your Continuous Integration server. You have a strongly typed binding to your enumerated values in the database, and you don't have to worry about manually syncing the values between code and the data.
Joshua Bloch gives an excellent explanation of enums and how to use them in his book "Effective Java, Second Edition" (p.147)
There you can find all sorts of tricks how to define your enums, persist them and how to quickly map them between the database and your code (p.154).
During a talk at Jazoon 2007, Bloch gave the following reasons to use an extra attribute to map enums to DB fields and back: an enum is a constant, but code isn't. To make sure that a developer editing the source can't accidentally break the DB mapping by reordering the enums or renaming them, you should add a specific attribute (like "dbName") to the enum and use that to map it.
Enums have an intrinsic id (the ordinal, which is used in switch statements), but this id changes when you change the order of the elements (for example by sorting them or by adding elements in the middle).
So the best solution is to add a toDB() and a fromDB() method and an additional field. I suggest using short, readable strings for this new field, so you can decode a database dump without having to look up the enums.
While I am not familiar with the idea of "attributes" in Java (and I don't know what language you're using), I've generally used the idea of a code table (or domain-specific tables), and I've attributed my enum values with more specific data, such as human-readable strings (for instance, if my enum value is NewStudent, I would attribute it with "New Student" as a display value). I then use reflection to examine the data in the database and insert or update records to bring them in line with my code, using the actual enum value as the key ID.
What I have used on several occasions is to define the enum in the code and a storage representation in the persistence layer (DB, file, etc.), and then have conversion methods to map them to each other. These conversion methods need only be used when reading from or writing to the persistent store, and the application can use the type-safe enums everywhere. In the conversion methods I used switch statements to do the mapping. This also allows throwing an exception if a new or unknown state is to be converted (usually because either the app or the data is newer than the other and new or additional states have been declared).
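A minimal sketch of that switch-based mapping at the persistence boundary (Status is the question's enum; the "A"/"E" storage values are hypothetical):
// Conversion methods used only when reading from or writing to the store.
public final class StatusMapper {

    public static String toDb(Status status) {
        switch (status) {
            case ACTIVE:  return "A";
            case EXPIRED: return "E";
            default:
                // A state added in code but not yet mapped here.
                throw new IllegalStateException("Unmapped status: " + status);
        }
    }

    public static Status fromDb(String dbValue) {
        switch (dbValue) {
            case "A": return Status.ACTIVE;
            case "E": return Status.EXPIRED;
            default:
                // Data newer than the application.
                throw new IllegalArgumentException("Unknown status value: " + dbValue);
        }
    }
}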
If there's at least a minor chance that the list of values will need to be updated, then it's 1. Otherwise, it's 3.
Well we don't have a DBA to answer to, so our preference is for option 2).
We simply save the enum value into the database, and when we are loading data out of the database and into our domain objects, we just map the integer value back to the enum type.
This avoids any of the synchronisation headaches with options 1) and 3). The list is defined once - in the code.
However, we have a policy that nobody else accesses the database directly; they must come through our web services to access any data. So this is why it works well for us.
In your database, the primary key of this "domain" table doesn't have to be a number. Just use a varchar pk and a description column (for the purposes your DBA is concerned with). If you need to guarantee the ordering of your values without relying on the alphabetical sort, just add a numeric column named "order" or "sequence".
In your code, create a static class with constants whose names (camel-cased or not) map to the description and whose values map to the pk. If you need more than this, create a class with the necessary structure and comparison operators and use instances of it as the values of the constants.
If you do this too much, build a script to generate the instantiation/declaration code.
