What is wrong in sharing Mutable State? [duplicate] - java

In chapter 3 of Java Concurrency in Practice, the author advises against sharing mutable state. He adds that the code below is not a good way to share state.
class UnsafeStates {
private String[] states = new String[] {
"AK", "AL"
};
public String[] getStates() {
return states;
}
}
From the book:
Publishing states in this way is problematic because any caller can modify its contents. In this case, the states array has escaped its intended scope, because what was supposed to be private state has been effectively made public.
My question is: we often use getters and setters to access a class's private mutable fields. If that is not the correct way, what is the correct way to share the state? What is the proper way to encapsulate state?

For primitive types, int, float etc, using a simple getter like this does not allow the caller to set its value:
someObj.getSomeInt() = 10; // error!
However, with an array, you could change its contents from the outside, which might be undesirable depending on the situation:
someObj.getSomeArray()[0] = newValue; // perfectly fine
This could lead to problems where a field is unexpectedly changed by other parts of code, causing hard-to-track bugs.
What you can do instead, is to return a copy of the array:
public String[] getStates() {
return Arrays.copyOf(states, states.length);
}
This way, even if the caller changes the contents of the returned array, the array held by the object won't be affected.
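To make the effect of the copying getter concrete, here is a minimal sketch. The class name CopyingStates and the demo class are made up for illustration; only the getter body comes from the answer above.

```java
import java.util.Arrays;

// Hypothetical counterpart to UnsafeStates: the getter hands out a copy.
final class CopyingStates {
    private final String[] states = new String[] { "AK", "AL" };

    String[] getStates() {
        // A shallow copy is enough here because String elements are immutable.
        return Arrays.copyOf(states, states.length);
    }
}

class CopyingStatesDemo {
    public static void main(String[] args) {
        CopyingStates safe = new CopyingStates();
        safe.getStates()[0] = "VT"; // modifies only the caller's copy
        System.out.println(Arrays.toString(safe.getStates())); // prints [AK, AL]
    }
}
```

Note that a shallow copy would not be enough if the elements were themselves mutable objects.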

With what you have it is possible for someone to change the content of your private array just through the getter itself:
public static void main(String[] args) {
UnsafeStates us = new UnsafeStates();
us.getStates()[0] = "VT";
System.out.println(Arrays.toString(us.getStates()));
}
Output:
[VT, AL]
If you want to encapsulate your States and make it so they cannot change then it might be better to make an enum:
public enum SafeStates {
AR,
AL
}
Creating an enum gives a couple of advantages. It restricts callers to an exact set of values. The values cannot be modified, they are easy to test against, and you can easily use them in a switch statement. The only downside of an enum is that the values have to be known ahead of time, i.e. you write them in code; they cannot be created at run time.

This question seems to be asked with respect to concurrency in particular.
Firstly, of course, there is the possibility of modifying non-primitive objects obtained via simple-minded getters; as others have pointed out, this is a risk even with single-threaded programs. The way to avoid this is to return a copy of an array, or an unmodifiable instance of a collection: see for example Collections.unmodifiableList.
However, for programs using concurrency, there is risk of returning the actual object (i.e., not a copy) even if the caller of the getter does not attempt to modify the returned object. Because of concurrent execution, the object could change "while he is looking at it", and in general this lack of synchronization could cause the program to malfunction.
It's difficult to turn the original getStates example into a convincing illustration of my point, but imagine a getter that returns a Map instead. Inside the owning object, correct synchronization may be implemented. However, a getTheMap method that returns just a reference to the Map is an invitation for the caller to call Map methods (even if just map.get) without synchronization.
There are basically two options to avoid the problem: (1) return a deep copy; an unmodifiable wrapper will not suffice in this case, and it should be a deep copy otherwise we just have the same problem one layer down, or (2) do not return unmediated references; instead, extend the method repertoire to provide exactly what is supportable, with correct internal synchronization.
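The two options might be sketched as follows. The class Registry and its methods are hypothetical names chosen for illustration. The snapshot here gets away with a shallow copy only because String and Integer are immutable; with mutable values a deep copy would be needed, as the answer says.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical owner of a Map; the map reference itself never leaves the class.
final class Registry {
    private final Map<String, Integer> counts = new HashMap<>();

    // Option 2: expose operations, not the map. Every access holds the same lock.
    synchronized void increment(String key) {
        counts.merge(key, 1, Integer::sum);
    }

    synchronized int get(String key) {
        return counts.getOrDefault(key, 0);
    }

    // Option 1: hand out a copy taken under the lock. The caller owns the copy
    // and later changes to the registry are not visible through it.
    synchronized Map<String, Integer> snapshot() {
        return new HashMap<>(counts);
    }
}
```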

Related

Is "double checked locking" broken here in java?

I find an example for double checked locking.
However, I think this example is invalid because it's possible that another thread may see a non-null reference to a DoorControlManage object of door 1 but see the default values for fields of the DoorControlManage object of door 1 rather than the values set in the constructor.
(Ref: https://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html)
Could you let me know whether I am right?
Thanks a lot!
public class DoorControlManager {
private static HashMap<Integer, DoorControlManager> mInstances = new HashMap<>();
public static DoorControlManager getInstance(int door) {
if (!mInstances.containsKey(door)) {
synchronized (mInstances) {
if (!mInstances.containsKey(door)) {
mInstances.put(door, new DoorControlManager(door));
}
}
}
return mInstances.get(door);
}
...
}
Yes this code is broken, though not for the normal reason.
In this case, you have different threads accessing a HashMap without proper synchronization. Since HashMap is not a thread-safe class, this is not thread-safe. It is possible that the first containsKey call will see stale values in the internals of the map, and behave in unspecified (implementation dependent) ways.
Making "simple" changes to concurrency sensitive code can completely destroy the properties that make the original version thread-safe. If you are going to attempt to write "clever" code like this, you need to have a deep understanding of Java concurrency ... and how the Java Memory Model really works.
There are a couple of ways that this code could be written correctly:
Use a ConcurrentHashMap and implement the getInstance method as:
return mInstances.computeIfAbsent(
door, key -> new DoorControlManager(key));
Keep using a HashMap and don't use the DCL pattern. Simply lock before testing.
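Both options in the list above might look roughly like this as complete classes. The class names are invented so the two variants can sit side by side; the original class is DoorControlManager.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Variant 1: ConcurrentHashMap performs the check-then-create atomically per key.
final class ConcurrentDoorManager {
    private static final ConcurrentHashMap<Integer, ConcurrentDoorManager> INSTANCES =
            new ConcurrentHashMap<>();
    private final int door;

    private ConcurrentDoorManager(int door) { this.door = door; }

    static ConcurrentDoorManager getInstance(int door) {
        return INSTANCES.computeIfAbsent(door, key -> new ConcurrentDoorManager(key));
    }

    int door() { return door; }
}

// Variant 2: plain HashMap, but every read and write happens under one lock.
final class LockedDoorManager {
    private static final Map<Integer, LockedDoorManager> INSTANCES = new HashMap<>();
    private final int door;

    private LockedDoorManager(int door) { this.door = door; }

    static LockedDoorManager getInstance(int door) {
        synchronized (INSTANCES) {
            LockedDoorManager existing = INSTANCES.get(door);
            if (existing == null) {
                existing = new LockedDoorManager(door);
                INSTANCES.put(door, existing);
            }
            return existing;
        }
    }

    int door() { return door; }
}
```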
Note that the DCL initialization pattern in Java 5+ is not broken, provided that you are initializing a single field and the field is declared as volatile. But there are other (better) ways to achieve the same effect, so its use is not recommended.
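For reference, here is a sketch of the single-field volatile form of DCL, followed by the initialization-on-demand holder idiom. The answer does not say which "better ways" it has in mind, so the holder idiom is my assumption of a typical alternative. All class names are hypothetical.

```java
// Double-checked locking on one volatile field (valid under the Java 5+ memory model).
final class LazyConfig {
    private static volatile LazyConfig instance;
    private final String name;

    private LazyConfig() { this.name = "default"; }

    static LazyConfig getInstance() {
        LazyConfig local = instance;           // single volatile read on the fast path
        if (local == null) {
            synchronized (LazyConfig.class) {
                local = instance;              // re-check under the lock
                if (local == null) {
                    local = new LazyConfig();
                    instance = local;          // volatile write publishes the fully built object
                }
            }
        }
        return local;
    }

    String name() { return name; }
}

// Holder idiom: the JVM's class initialization provides the laziness and the locking.
final class HolderConfig {
    private HolderConfig() { }

    private static final class Holder {
        static final HolderConfig INSTANCE = new HolderConfig();
    }

    static HolderConfig getInstance() {
        return Holder.INSTANCE;
    }
}
```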

Assigning fields in an equals() method

Suppose you have a written a class and have used lazy initialization to assign one of its fields. Suppose that the computation for that field only involves the other fields and is guaranteed to produce the same result every time. When two equal instances of the class encounter one another, it makes sense for them to share the value of the lazily initialized field (if either knows it). You could do this in the equals() method. Here is a class showing what I mean.
final class MyClass {
private final int number;
private String string;
MyClass(int number) {
this.number = number;
}
String getString() {
if (string == null) {
string = OtherClass.expensiveCalculation(number);
}
return string;
}
@Override
public boolean equals(Object object) {
if (object == this) { return true; }
if (!(object instanceof MyClass)) { return false; }
MyClass that = (MyClass) object;
if (that.number != number) { return false; }
String thatString = that.string;
if (string == null && thatString != null) {
string = thatString;
} else if (thatString == null && string != null) {
that.string = string;
}
return true;
}
@Override
public int hashCode() { return number; }
}
To me, this information-sharing seems the logical thing to do if you are going to go to the effort of lazily initializing a field, yet I have never seen an example of anyone using the equals() method in this way.
Is it a common or standard technique? If so, what is it called? If it is not a common technique, can I ask (at the risk of having the question put on hold as primarily opinion-based) what people think about it? Is it a good idea to use the equals() method to do anything other than check for equality?
This looks dangerous to me: the use of a side effect of a public method of Object to set an object's state. This will break if you subclass this class, and then override the subclass's equals method, a common thing to do. Just don't do this.
"Suppose that the computation for that field only involves the other fields and is guaranteed to produce the same result every time."
Given this supposition, you can assert that the value of the lazily initialized field does not matter because if the values of the other fields are the same, the calculated value will also be the same.
Edit
I guess I sidestepped the original question, so I'll answer that too. In the scenario you've created, there is nothing inherently wrong with what you're proposing.
The argument I would make is simply from a pragmatic standpoint: what happens when someone else is changing the definition of getString() (or more likely - changing the definition of the long running calculation that results in that value) and it starts relying on something that's not part of the object's equality considerations?
The reason conventional wisdom says that equals() should be side effect free is that most developers expect it to be side effect free.
I would not do this, for three reasons:
General software-engineering principles, such as cohesion, loose coupling, and "don't repeat yourself", militate against it: your equals(...) method will be doing something not very "equals"-y, that overlaps with the logic of your getString() method. Someone updating the logic of getString() might well fail to notice if they also need to update the logic of equals(...). (You might think that the logic of equals(...) will continue to be correct no matter how getString() is changed; after all, you're just having equals(...) copy the reference from one object to an equivalent one, so presumably that should always stay the same? But the problem is that complex systems evolve in ways that you can't always predict in advance. When a requirement changes, you don't want to have to make random changes in parts of the code that aren't obviously related to the requirement.)
Thread-safety. Your string field currently isn't volatile, and your getString() method currently isn't synchronized, so there's no attempt at thread-safety here anyway; but if you were to make the rest of the class thread-safe, it would not be perfectly straightforward to change equals(...) to be thread-safe without risking deadlocks. (This overlaps a bit with point #1, but I'm listing it separately because #1 is solely about the difficulty of knowing that you have to change equals(...), whereas this issue is a bit tricky to address even given that knowledge.)
Unlikelihood of usefulness. There's not much reason to expect it to happen very often that two instances get equals(...)-compared when one has already been lazy-initialized and the other has not; so the extra code complexity, and downsides mentioned above, are not likely to be worth it. (Remember: code is not free. In order to pass cost–benefit analysis, the benefits of a piece of code must exceed the costs of testing, understanding, maintaining, and supporting it in the future.) If it's worthwhile to share these lazy-initialized values between equivalent instances, then that should be done in a clearer and more-organized fashion that does not rely on happenstance. (For example, you might make the class's constructor private, and have a static factory-method that checks a static WeakHashMap for an existing instance before creating and returning a new one.)
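The static-factory suggestion in the last point could be sketched like this. It is only one possible shape under stated assumptions: the class name SharedValue, the of(...) factory, and the "value-" string standing in for OtherClass.expensiveCalculation(number) are all invented for the example.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

final class SharedValue {
    // Canonical instances keyed by themselves. WeakHashMap holds its keys weakly, and
    // the value is wrapped in a WeakReference so that it does not keep its own key alive.
    private static final Map<SharedValue, WeakReference<SharedValue>> CACHE = new WeakHashMap<>();

    private final int number;
    private String string; // lazily initialized, guarded by "this"

    private SharedValue(int number) { this.number = number; }

    // Returns an existing equal instance if one is still reachable, else registers a new one.
    static SharedValue of(int number) {
        SharedValue candidate = new SharedValue(number);
        synchronized (CACHE) {
            WeakReference<SharedValue> ref = CACHE.get(candidate);
            SharedValue existing = (ref == null) ? null : ref.get();
            if (existing != null) {
                return existing;
            }
            CACHE.put(candidate, new WeakReference<>(candidate));
            return candidate;
        }
    }

    synchronized String getString() {
        if (string == null) {
            string = "value-" + number; // placeholder for the expensive calculation
        }
        return string;
    }

    @Override
    public boolean equals(Object object) {
        return object instanceof SharedValue && ((SharedValue) object).number == number;
    }

    @Override
    public int hashCode() { return number; }
}
```

With this arrangement equal instances are the same object, so the lazily computed value is shared without equals() having any side effects.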
The approach you describe is sometimes a good one, especially in situations where it is likely that many large immutable objects, despite being independently constructed, will end up being identical. Because it is much faster to compare equal references than to compare large objects which happen to be equal, it may be advantageous to have code which compares two large-objects and finds them to be identical replace one of the references with a reference to the other. For this to be workable, one should attempt to establish some sort of ordering among the objects in question to ensure that repeated comparisons will eventually yield the same canonical value. This could be accomplished by having objects include a long sequence number and consistently replacing references to newer values with references to older-but-equal values, or by comparing the identityHashCode value of the equal references and discarding whichever one, if any, has the lower value (if two references which identify distinct but identical instances, happen to report the same identityHashCode, both should be kept).
A nasty but unfortunate wrinkle in this is that Java has very poor multi-threading support for effectively-immutable objects. For an effectively-immutable object to be thread-safe, any access to an array or non-final field must go through a final field. The cheapest way of accomplishing that is probably to have the object contain a final field into which it stores a reference to itself, and have all methods which access non-final fields do so through that final field, but that's a bit ugly. Still, replacing distinct-but-identical references with references to the same object could offer some significant performance advantages despite the silly redundant final field accesses (since the target of the final field would be guaranteed to be in-cache, dereferencing it would be much cheaper than a normal dereference).
BTW, it would in many cases be possible to include an "equivalence-relation" mechanism such that once some objects were compared and found to be equal, discovering that any of them is equal to another object would cause all of them to be quickly recognizable as such. I haven't figured out how to avoid the possibility of a deliberately-nasty-but-legitimate usage pattern causing a memory leak, however.

can I add to a private list directly through the getter?

I realize I'm going to get flamed for not simply writing a test myself... but I'm curious about people's opinions, not just the functionality, so... here goes...
I have a class that has a private list. I want to add to that private list through the public getMyList() method.
so... will this work?
public class ObA{
private List<String> foo;
public List<String> getFoo(){return foo;}
}
public class ObB{
public void dealWithObAFoo(ObA obA){
obA.getFoo().add("hello");
}
}
Yes, that will absolutely work - which is usually a bad thing. (This is because you're really returning a reference to the collection object, not a copy of the collection itself.)
Very often you want to provide genuinely read-only access to a collection, which usually means returning a read-only wrapper around the collection. Making the return type a read-only interface implemented by the collection and returning the actual collection reference doesn't provide much protection: the caller can easily cast to the "real" collection type and then add without any problems.
Indeed, not a good idea. Do not publish your mutable members outside, make a copy if you cannot provide a read-only version on the fly...
public class ObA{
private List<String> foo;
public List<String> getFoo(){return Collections.unmodifiableList(foo);}
public void addString(String value) { foo.add(value); }
}
If you want an opinion about doing this, I'd remove the getFoo() method and add add(String msg) and remove(String msg) methods (or whatever other functionality you want to expose) to ObA.
Giving access to collection always seems to be a bad thing in my experience--mostly because they are virtually impossible to control once they get out. I've taken to the habit of NEVER allowing direct access to collections outside the class that contains them.
The main reasoning behind this is that there is almost always some sort of business logic attached to the collection of data--for instance, validation on addition or perhaps some day you'll need to add a second closely-related collection.
If you allow access like you are talking about, it will be very difficult in the future to make a modification like this.
Oh, also, I often find that I eventually have to store a little more data with the object I'm storing--so I create a new object (only known inside the "Container" that houses the collection) and I put the object inside that before putting it in the collection.
If you've kept your collection locked down, this is a trivial refactor. Try to imagine how difficult it would be in some case you've worked on where you didn't keep the collection locked down...
If you wanted to support add and remove functions on Foo, I would suggest the methods addFoo() and removeFoo(). Ideally you could eliminate getFoo altogether by creating a method for each piece of functionality you need. This makes it clear which operations a caller will perform on the list.
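A sketch of what a locked-down version might look like, with no getter for the list at all. FooHolder, hasFoo and fooCount are hypothetical names, and the non-empty check is just an example of the kind of validation that becomes possible once all additions go through one method.

```java
import java.util.ArrayList;
import java.util.List;

// The list never leaves the class; callers get one method per supported operation.
final class FooHolder {
    private final List<String> foo = new ArrayList<>();

    void addFoo(String value) {
        // Example business rule; adjust or drop as the real requirements dictate.
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("foo must be non-empty");
        }
        foo.add(value);
    }

    boolean removeFoo(String value) {
        return foo.remove(value);
    }

    boolean hasFoo(String value) {
        return foo.contains(value);
    }

    int fooCount() {
        return foo.size();
    }
}
```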

Should Java method arguments be used to return multiple values?

Since arguments sent to a method in Java point to the original data structures in the caller method, did its designers intend for them to be used for returning multiple values, as is the norm in other languages like C?
Or is this a hazardous misuse of Java's general property that variables are pointers ?
A long time ago I had a conversation with Ken Arnold (one time member of the Java team), this would have been at the first Java One conference probably, so 1996. He said that they were thinking of adding multiple return values so you could write something like:
x, y = foo();
The recommended way of doing it back then, and now, is to make a class that has multiple data members and return that instead.
Based on that, and other comments made by people who worked on Java, I would say the intent is/was that you return an instance of a class rather than modify the arguments that were passed in.
This is common practice (as is the desire by C programmers to modify the arguments; eventually they usually see the Java way of doing it). Just think of it as returning a struct. :-)
(Edit based on the following comment)
I am reading a file and generating two
arrays, of type String and int from
it, picking one element for both from
each line. I want to return both of
them to any function which calls it
which a file to split this way.
I think, if I am understanding you correctly, that I would probably do something like this:
// could go with the Pair idea from another post, but I personally don't like that way
class Line
{
// would use appropriate names
private final int intVal;
private final String stringVal;
public Line(final int iVal, final String sVal)
{
intVal = iVal;
stringVal = sVal;
}
public int getIntVal()
{
return (intVal);
}
public String getStringVal()
{
return (stringVal);
}
// equals/hashCode/etc... as appropriate
}
and then have your method like this:
public void foo(final File file, final List<Line> lines)
{
// add to the List.
}
and then call it like this:
{
final List<Line> lines;
lines = new ArrayList<Line>();
foo(file, lines);
}
In my opinion, if we're talking about a public method, you should create a separate class representing a return value. When you have a separate class:
it serves as an abstraction (i.e. a Point class instead of array of two longs)
each field has a name
can be made immutable
makes evolution of API much easier (i.e. what about returning 3 instead of 2 values, changing type of some field etc.)
I would always opt for returning a new instance, instead of actually modifying a value passed in. It seems much clearer to me and favors immutability.
On the other hand, if it is an internal method, I guess any of the following might be used:
an array (new Object[] { "str", longValue })
a list (Arrays.asList(...) returns a fixed-size list, though its elements can still be replaced with set)
pair/tuple class, such as this
static inner class, with public fields
Still, I would prefer the last option, equipped with a suitable constructor. That is especially true if you find yourself returning the same tuple from more than one place.
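A small sketch of that last option: a static nested class with public final fields and a constructor, returned from the method instead of filling in arguments. The names LineParser and ParsedLine, and the "text,number" line format, are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

class LineParser {
    // Small immutable carrier for the two values produced per line.
    static final class ParsedLine {
        public final String text;
        public final int number;

        ParsedLine(String text, int number) {
            this.text = text;
            this.number = number;
        }
    }

    // Assumes each raw line looks like "text,123" (hypothetical format).
    static List<ParsedLine> parse(List<String> rawLines) {
        List<ParsedLine> result = new ArrayList<>();
        for (String raw : rawLines) {
            String[] parts = raw.split(",", 2);
            result.add(new ParsedLine(parts[0].trim(), Integer.parseInt(parts[1].trim())));
        }
        return result;
    }
}
```

A caller would then write something like `List<LineParser.ParsedLine> lines = LineParser.parse(rawLines);` and read `lines.get(0).text` and `lines.get(0).number`.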
I do wish there was a Pair<E,F> class in JDK, mostly for this reason. There is Map.Entry<K,V>, but creating an instance was always a big pain.
Now I use com.google.common.collect.Maps.immutableEntry when I need a Pair
See this RFE launched back in 1999:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4222792
I don't think the intention was to ever allow it in the Java language, if you need to return multiple values you need to encapsulate them in an object.
Using languages like Scala however you can return tuples, see:
http://www.artima.com/scalazine/articles/steps.html
You can also use Generics in Java to return a pair of objects, but that's about it AFAIK.
EDIT: Tuples
Just to add some more on this. I've previously implemented a Pair in projects because of the lack within the JDK. Link to my implementation is here:
http://pbin.oogly.co.uk/listings/viewlistingdetail/5003504425055b47d857490ff73ab9
Note, there isn't a hashcode or equals on this, which should probably be added.
I also came across this whilst doing some research into this question; it provides tuple functionality:
http://javatuple.com/
It allows you to create Pair including other types of tuples.
You cannot truly return multiple values, but you can pass objects into a method and have the method mutate those values. That is perfectly legal. Note that you cannot pass an object in and have the object itself become a different object. That is:
private void myFunc(Object a) {
a = new Object();
}
will result in temporarily and locally changing the value of a, but this will not change the value of the caller, for example, from:
Object test = new Object();
myFunc(test);
After myFunc returns, you will have the old Object and not the new one.
Legal (and often discouraged) is something like this:
private void changeDate(final Date date) {
date.setTime(1234567890L);
}
I picked Date for a reason. This is a class that people widely agree should never have been mutable. The method above will change the internal value of any Date object that you pass to it. This kind of code is legal when it is very clear that the method will mutate or configure or modify what is being passed in.
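The difference between reassigning a parameter and mutating the object it refers to can be seen side by side in a short sketch. ArgumentDemo and its method names are illustrative only.

```java
import java.util.Date;

class ArgumentDemo {
    static void reassign(Date date) {
        date = new Date(0L);        // rebinds the local parameter only
    }

    static void mutate(Date date) {
        date.setTime(1234567890L);  // changes the object the caller also refers to
    }

    public static void main(String[] args) {
        Date d = new Date(42L);
        reassign(d);
        System.out.println(d.getTime()); // prints 42
        mutate(d);
        System.out.println(d.getTime()); // prints 1234567890
    }
}
```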
NOTE: Generally, it's said that a method should do one of these things:
Return void and mutate its incoming objects (like Collections.sort()), or
Return some computation and don't mutate incoming objects at all (like Collections.min()), or
Return a "view" of the incoming object but do not modify the incoming object (like Collections.checkedList() or Collections.singleton())
Mutate one incoming object and return it (Collections doesn't have an example, but StringBuilder.append() is a good example).
Methods that mutate incoming objects and return a separate return value are often doing too many things.
There are certainly methods that modify an object passed in as a parameter (see java.io.Reader.read(char[] cbuf) as an example), but I have not seen parameters used as an alternative for a return value, especially with multiple parameters. It may technically work, but it is nonstandard.
It's not generally considered terribly good practice, but there are very occasional cases in the JDK where this is done. Look at the 'biasRet' parameter of View.getNextVisualPositionFrom() and related methods, for example: it's actually a one-dimensional array that gets filled with an "extra return value".
So why do this? Well, just to save you having to create an extra class definition for the "occasional extra return value". It's messy, inelegant, bad design, non-object-oriented, blah blah. And we've all done it from time to time...
Generally what Eddie said, but I'd add one more:
Mutate one of the incoming objects, and return a status code. This should generally only be used for arguments that are explicitly buffers, like Reader.read(char[] cbuf).
I had a Result object that cascades through a series of validating void methods as a method parameter. Each of these validating void methods would mutate the result parameter object to add the result of the validation.
But this is impossible to test because now I cannot stub the void method to return a stub value for the validation in the Result object.
So, from a testing perspective it appears that one should favor returning an object instead of mutating a method parameter.

Should I keep instance variables in Java always initialized or not?

I recently started a new project and I'm trying to keep my instance variables always initialized to some value, so none of them is at any time null. Small example below:
public class ItemManager {
ItemMaster itemMaster;
List<ItemComponentManager> components;
ItemManager() {
itemMaster = new ItemMaster();
components = new ArrayList<ItemComponentManager>();
}
...
}
The point is mainly to avoid the tedious checking for null before using an instance variable somewhere in the code. So far, it's working well and you mostly don't need the null value, as you can also check for an empty string or empty list, etc. I'm not using this approach for method-scoped variables, as their scope is very limited and so doesn't affect other parts of the code.
This all is kind of experimental, so I'd like to know if this approach could work or if there are some pitfalls which I'm not seeing yet. Is it generally a good idea to keep instance variables initialized?
I usually treat an empty collection and a null collection as two separate things:
An empty collection implies that I know there are zero items available. A null collection will tell me that I don't know the state of the collection, which is a different thing.
So I really do not think it's an either/or. And I would declare the variables final if I initialize them in the constructor. If you declare a collection final it becomes very clear to the reader that it cannot be null.
First and foremost, all non-final instance variables must be declared private if you want to retain control!
Consider lazy instantiation as well -- this also avoids "bad state" but only initializes upon use:
class Foo {
private List<X> stuff;
public void add(X x) {
if (stuff == null)
stuff = new ArrayList<X>();
stuff.add(x);
}
public List<X> getStuff() {
if (stuff == null)
return Collections.emptyList();
return Collections.unmodifiableList(stuff);
}
}
(Note the use of Collections.unmodifiableList -- unless you really want a caller to be able to add/remove from your list, you should make it immutable)
Think about how many instances of the object in question will be created. If there are many, and you always create the lists (and might end up with many empty lists), you could be creating many more objects than you need.
Other than that, it's really a matter of taste and if you can have meaningful values when you construct.
If you're working with a DI/IOC, you want the framework to do the work for you (though you could do it through constructor injection; I prefer setters)
-- Scott
I would say that is totally fine - just as long as you remember that you have "empty" placeholder values there and not real data.
Keeping them null has the advantage of forcing you to deal with them - otherwise the program crashes. If you create empty objects, but forget them you get undefined results.
And just to comment on the defensive coding: if you are the one creating the objects and are never setting them to null, there is no need to check for null every time. If for some reason you get a null value, then you know something has gone catastrophically wrong and the program should crash anyway.
I would make them final if possible. Then they have to be initialized in the constructor and cannot become null.
You should also make them private in any case, to prevent other classes from assigning null to them. If you can check that null is never assigned in your class then the approach will work.
I have come across some cases where this causes problems.
During deserialization, some frameworks will not call the constructor. I don't know how or why they choose to do this, but it happens. This can result in your values being null. I have also come across the case where the constructor is called but for some reason member variables are not initialized.
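Java's built-in serialization is one mechanism that behaves this way: for a Serializable class, neither its constructor nor its field initializers run on deserialization, so a transient field comes back as null. The sketch below shows the problem and one possible fix using a private readObject method. All class names are hypothetical, and other frameworks may behave differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// The constructor initializes the field, but deserialization bypasses the constructor.
final class PlainBasket implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient List<String> items;

    PlainBasket() { items = new ArrayList<>(); }

    List<String> items() { return items; }
}

// Same class with the invariant restored by hand after deserialization.
final class FixedBasket implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient List<String> items;

    FixedBasket() { items = new ArrayList<>(); }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        items = new ArrayList<>(); // re-establish "never null"
    }

    List<String> items() { return items; }
}

final class SerializationDemo {
    // Serializes to a byte array and reads the object back.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After a round trip, `PlainBasket.items()` returns null while `FixedBasket.items()` returns an empty list.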
In actual fact I'd use the following code instead of yours:
public class ItemManager {
ItemMaster itemMaster = new ItemMaster();
List<ItemComponentManager> components = new ArrayList<ItemComponentManager>();
ItemManager() {
...
}
...
}
The way I deal with any variable I declare is to decide if it will change over the lifetime of the object (or class if it is static). If the answer is "no" then I make it final.
Making it final forces you to give it a value when the object is created... personally I would do the following unless I knew that I would be changing what they point at:
private final ItemMaster itemMaster;
private final List<ItemComponentManager> components;
// instance initialization block - happens at construction time
{
itemMaster = new ItemMaster();
components = new ArrayList<ItemComponentManager>();
}
The way your code is right now you must check for null all the time because you didn't mark the variables as private (which means that any class in the same package can change the values to null).
Yes, it is very good idea to initialize all class variables in the constructor.
The point is mainly to avoid the
tedious checking for null before using
a class variable somewhere in the
code.
You still have to check for null. Third party libraries and even the Java API will sometimes return null.
Also, instantiating an object that may never be used is wasteful, but that would depend on the design of your class.
An object should be 100% ready for use after it's constructed. Users should not have to be checking for nulls. Defensive programming is the way to go - keep the checks.
In the interest of DRY, you can put the checks in the setters and simply have the constructor call them. That way you don't code the checks twice.
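One way to keep the check in a single place is sketched below. ValidatedCustomer is a hypothetical class; it is declared final so that the setter called from the constructor cannot be overridden by a subclass.

```java
import java.util.Objects;

final class ValidatedCustomer {
    private String name;

    ValidatedCustomer(String name) {
        setName(name); // reuse the setter's check instead of repeating it
    }

    void setName(String name) {
        this.name = Objects.requireNonNull(name, "name must not be null");
    }

    String getName() {
        return name;
    }
}
```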
If it's all your code and you want to set that convention, it should be a nice thing to have. I agree with Paul's comment, though, that nothing prevents some errant code from accidentally setting one of your class variables to null. As a general rule, I always check for null. Yeah, it's a PITA, but defensive coding can be a good thing.
From the name of the class "ItemManager", ItemManager sounds like a singleton in some app. If so you should investigate and really, really, know Dependency Injection. Use something like Spring ( http://www.springsource.org/ ) to create and inject the list of ItemComponentManagers into ItemManager.
Without DI, Initialization by hand in serious apps is a nightmare to debug and connecting up various "manager" classes to make narrow tests is hell.
Use DI always (even when constructing tests). For data objects, create a get() method that creates the list if it doesn't exist. However, if the object is complex, you will almost certainly find your life better using the Factory or Builder pattern and having the F/B set the member variables as needed.
What happens if in one of your methods you set
itemMaster = null;
or you return a reference to the ItemManager to some other class and it sets itemMaster as null.
(You can guard against this easily by returning a clone of your ItemManager etc.)
I would keep the checks as this is possible.
