I am currently working with a lot of nested level objects and was thinking about performance.
So let's say I have the following classes:
class Address {
private String doorNumber;
private String street;
...
}
and another class Customer.
class Customer {
private List<Address> addressList;
private String firstName;
.....
.....
.....
}
and when I try to access it like below:
public static void main(String[] str) {
Customer customer = new Customer();
// ... and add addresses to this Customer object.
// Set 1
// And if I then do...
customer.getAddressList().get(0).getHouseNumber();
customer.getAddressList().get(0).getStreet();
// OR
// Set 2
Address address = customer.getAddressList().get(0);
address.getHouseNumber();
address.getStreet();
}
I know the first set of lines is not clean code, and I assumed the compiler would sort this out, but it doesn't: when I decompile my code, I get exactly the same thing back, so I am not sure the compiler is doing any optimisation there. So my first question is: why doesn't the compiler clean this up and assign the result to a temporary variable?
My next question: does this affect performance? Which of the two is the more performant, apart from the first not being very clean? Does it mean that my second set of lines would internally get translated to the first during compilation?
And finally, is it faster to access fields on a class directly than to call its getter methods? I am only thinking about performance here, not clean code.
Side effects.
Consider this case, where instead of returning some text, calling your get method has some internal side effect:
// This goes up each time getAddressList is called.
public int addressesRequested;
public List<Address> getAddressList(){
addressesRequested++;
return addressList;
}
Of course, in this method such a side effect doesn't make much sense, but there are a wide variety of ways in which a method call can leave some form of left over effect.
customer.getAddressList(); // addressesRequested is now 1.
customer.getAddressList(); // addressesRequested is now 2.
...
As a result, the compiler can't optimise multiple method calls into one - it has to assume that a method call has side effects.
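To make this concrete, here is a minimal sketch built on the counting getter above. The class names CountingCustomer and SideEffectDemo are hypothetical, and Address is cut down to a single field; the point is only that the two access styles from the question are observably different once the getter has a side effect.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cut-down versions of the classes from the question.
class Address {
    private final String street;
    Address(String street) { this.street = street; }
    String getStreet() { return street; }
}

class CountingCustomer {
    // Goes up each time getAddressList is called.
    int addressesRequested;
    private final List<Address> addressList = new ArrayList<>();

    CountingCustomer() { addressList.add(new Address("Main Street")); }

    List<Address> getAddressList() {
        addressesRequested++;
        return addressList;
    }
}

class SideEffectDemo {
    // "Set 1": the getter runs once per statement.
    static int chained() {
        CountingCustomer c = new CountingCustomer();
        c.getAddressList().get(0).getStreet();
        c.getAddressList().get(0).getStreet();
        return c.addressesRequested;
    }

    // "Set 2": the getter runs once and the result is reused.
    static int cached() {
        CountingCustomer c = new CountingCustomer();
        Address address = c.getAddressList().get(0);
        address.getStreet();
        address.getStreet();
        return c.addressesRequested;
    }

    public static void main(String[] args) {
        System.out.println(chained()); // 2
        System.out.println(cached());  // 1
    }
}
```

If the compiler silently rewrote the first form into the second, the counter would end up at 1 instead of 2, which would change the program's behaviour.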
It's also worth noting that a method can be inlined: the body of the method is copied to the call site to avoid a method call's overhead. This generally only happens when the JVM believes such an optimisation is merited, i.e. because the method is being called frequently. Inlining by itself does not, however, introduce a temporary variable at the call site: the inlined body still runs once per call, side effects included.
What about fields? They can't produce side effects..can they?
Ok, so you're now thinking about this:
// Assume addressList was public and could be accessed like so:
customer.addressList.get(0)..
customer.addressList.get(0)..
..
They don't produce side effects, but the compiler won't drop it in a temporary variable either. This is because side effects are a two way street - some other method could be changing that addressList field; most likely from some other thread.
In chapter 3 of Java Concurrency in Practice, the author suggests not sharing mutable state. He adds that the code below is not a good way to share state.
class UnsafeStates {
private String[] states = new String[] {
"AK", "AL"
};
public String[] getStates() {
return states;
}
}
From the book:
Publishing states in this way is problematic because any caller can modify its contents. In this case, the states array has escaped its intended scope, because what was supposed to be private state has been effectively made public.
My question here is: we often use getters and setters to access class-level private mutable variables. If that is not the correct way, what is the correct way to share state? What is the proper way to encapsulate state?
For primitive types, int, float etc, using a simple getter like this does not allow the caller to set its value:
someObj.getSomeInt() = 10; // error!
However, with an array, you could change its contents from the outside, which might be undesirable depending on the situation:
someObj.getSomeArray()[0] = newValue; // perfectly fine
This could lead to problems where a field is unexpectedly changed by other parts of code, causing hard-to-track bugs.
What you can do instead, is to return a copy of the array:
public String[] getStates() {
return Arrays.copyOf(states, states.length);
}
This way, even if the caller changes the contents of the returned array, the array held by the object won't be affected.
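As a quick check of that claim, here is a minimal sketch. CopiedStates and DefensiveCopyDemo are hypothetical names; the getter is the copying version shown above.

```java
import java.util.Arrays;

// Hypothetical counterpart to UnsafeStates that hands out copies.
class CopiedStates {
    private final String[] states = new String[] { "AK", "AL" };

    // Returns a copy, so callers never see the internal array.
    String[] getStates() {
        return Arrays.copyOf(states, states.length);
    }
}

class DefensiveCopyDemo {
    static String firstStateAfterTampering() {
        CopiedStates safe = new CopiedStates();
        safe.getStates()[0] = "VT"; // modifies only the caller's copy
        return safe.getStates()[0];
    }

    public static void main(String[] args) {
        System.out.println(firstStateAfterTampering()); // AK
    }
}
```

The cost is one array allocation per call, which is usually acceptable for small arrays but worth keeping in mind for large ones.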
With what you have it is possible for someone to change the content of your private array just through the getter itself:
public static void main(String[] args) {
UnsafeStates us = new UnsafeStates();
us.getStates()[0] = "VT";
System.out.println(Arrays.toString(us.getStates()));
}
Output:
[VT, AL]
If you want to encapsulate your States and make it so they cannot change then it might be better to make an enum:
public enum SafeStates {
AR,
AL
}
Creating an enum gives a couple of advantages. It restricts callers to an exact set of values. The values can't be modified, they are easy to test against, and you can easily switch on them. The only downside of an enum is that the values have to be known ahead of time, i.e. you code them; they cannot be created at run time.
This question seems to be asked with respect to concurrency in particular.
Firstly, of course, there is the possibility of modifying non-primitive objects obtained via simple-minded getters; as others have pointed out, this is a risk even with single-threaded programs. The way to avoid this is to return a copy of an array, or an unmodifiable instance of a collection: see for example Collections.unmodifiableList.
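For the collection case, a minimal sketch might look like this. StateRegistry and UnmodifiableViewDemo are hypothetical names; Collections.unmodifiableList is the standard JDK method mentioned above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class StateRegistry {
    private final List<String> states = new ArrayList<>();

    StateRegistry() {
        states.add("AK");
        states.add("AL");
    }

    // Read-only view: no copy is made, but mutating calls throw.
    List<String> getStates() {
        return Collections.unmodifiableList(states);
    }
}

class UnmodifiableViewDemo {
    static boolean writeIsRejected() {
        StateRegistry registry = new StateRegistry();
        try {
            registry.getStates().set(0, "VT");
            return false;
        } catch (UnsupportedOperationException expected) {
            return true;
        }
    }
}
```

Note that the wrapper is a view, not a snapshot: later changes made by the owning object remain visible through it.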
However, for programs using concurrency, there is a risk in returning the actual object (i.e., not a copy) even if the caller of the getter does not attempt to modify the returned object. Because of concurrent execution, the object could change "while he is looking at it", and in general this lack of synchronization could cause the program to malfunction.
It's difficult to turn the original getStates example into a convincing illustration of my point, but imagine a getter that returns a Map instead. Inside the owning object, correct synchronization may be implemented. However, a getTheMap method that returns just a reference to the Map is an invitation for the caller to call Map methods (even if just map.get) without synchronization.
There are basically two options to avoid the problem: (1) return a deep copy; an unmodifiable wrapper will not suffice in this case, and it should be a deep copy otherwise we just have the same problem one layer down, or (2) do not return unmediated references; instead, extend the method repertoire to provide exactly what is supportable, with correct internal synchronization.
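Both options can be sketched roughly as follows. PopulationTable and its methods are hypothetical, and plain synchronized methods stand in for whatever locking the owning object really uses.

```java
import java.util.HashMap;
import java.util.Map;

class PopulationTable {
    private final Map<String, Integer> byState = new HashMap<>();

    // Option (2): callers never get the map, only mediated operations.
    synchronized void put(String state, int population) {
        byState.put(state, population);
    }

    // Returns -1 for an unknown state (an arbitrary choice for this sketch).
    synchronized int populationOf(String state) {
        Integer value = byState.get(state);
        return value == null ? -1 : value;
    }

    // Option (1): a snapshot taken under the lock. The keys and values are
    // immutable here, so copying the map is already a deep copy.
    synchronized Map<String, Integer> snapshot() {
        return new HashMap<>(byState);
    }
}
```

With mutable values, snapshot() would have to copy each value as well, which is the "one layer down" problem mentioned above.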
Situation: I have multiple states of the same object represented by different instances (which are made using a deep-copy). Now I want to make sure that, no matter which of these grouped instances is accessed, all operations that perform modifications are redirected onto the youngest of these instances[1].
Example:[2]
//Let's create an object
MyObject mObj = new MyObject(...);
//Let's create a list of past states
List<MyObject> pastStates = new ArrayList<MyObject>();
//doing some operations on mObj ....
mObj.modify(...);
//done modifying mObj, now let's save its state and then create a copy to begin again
pastStates.add(mObj.copy());
//more of this...
mObj.modify(...);
pastStates.add(mObj.copy());
//let's compare some old states for whatever reason (e.g. part of an algorithm)
void compare(MyObject o1, MyObject o2) {
if (o1.getA() == o2.getA()) {
o2.modify(...); // wait, we modified an old state...
}
}
Now this is a rather obvious example and probably a classic case of programmer error: they modified something that is clearly advertised as a past state. But say we still want to be nice and try to help, and thus intercept the method call and perform it on the correct instance, namely the youngest/master instance.[3]
Question: Is there a way to do this with standard java?
Bonus: Is there a way that doesn't have a horrible impact on performance?
Background: I'm experimenting with different ways to make a library/engine that I'm writing for fun harder to misuse by the end user. As I will need these states internally anyway (snapshots in time for certain background functionality), I would like to make them available to the end user as well, so they can benefit from my state keeping, e.g. for use in analytical algorithms.
[1] There can be multiple groups of instances of an object that are not related to each other; relation will presumably be kept by a one way link to the youngest instance which simply won't ever change.
[2] This code is meant as an example, it is clear that this mistake could be prevented by the enduser paying more attention when writing code.
[3] Now an easy way to prevent modification is to wrap the object in an immutable version which throws an exception when someone tries to modify it, but we do not write this object ourselves and don't want to force the end user to write two versions of their own object if we don't have to...
I would probably create two classes: an "inner" one which is immutable and an "outer" one that maintains a list of inners. (Note: I don't mean inner classes in the JLS sense, just an object that is fully controlled by its wrapper.)
Something like this:
public final class Outer {
private final List<Inner> history = new ArrayList<>(); //history is inverted for brevity, 0 is the latest one
public Outer(int x) {
this.history.add(new Inner(x));
}
public void add(int x) {
history.add(0, new Inner(history.get(0).x + x));
}
public Inner current() {
return history.get(0);
}
public static final class Inner {
private final int x;
private Inner(int x) {
this.x = x;
}
public int getX() {
return x;
}
}
}
With this setup clients can only instantiate Outer, can only mutate Outer but have access to a read-only copy of all the past states. There is no way to accidentally modify a past state. There is no need for separate grouping logic either because each instance of Outer naturally only records its own history.
Method interception can be done with AOP by using an around advice. AspectJ is a good tool for solving such problems. The impact on performance should also be no problem.
In an around advice in most cases you call proceed to execute the target method on the target object, but you can also prevent the method execution and instead do a method call on another object.
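For comparison, the same "run the call on another object" idea can be sketched with the JDK's own java.lang.reflect.Proxy. This is a different technique from AspectJ, not an AspectJ example, and it only works when the object is used through an interface, which the question's MyObject may not offer. Versioned, SimpleVersioned and Redirect are hypothetical names.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical interface; dynamic proxies only work through interfaces.
interface Versioned {
    void modify(String newState);
    String getState();
}

class SimpleVersioned implements Versioned {
    private String state = "";
    public void modify(String newState) { state = newState; }
    public String getState() { return state; }
}

class Redirect {
    // Wraps a past state so that modify(...) is executed on master instead,
    // while every other call still reaches the past state.
    static Versioned guard(Versioned past, Versioned master) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object target = method.getName().equals("modify") ? master : past;
            return method.invoke(target, args);
        };
        return (Versioned) Proxy.newProxyInstance(
                Versioned.class.getClassLoader(),
                new Class<?>[] { Versioned.class },
                handler);
    }
}
```

Usage would be along the lines of pastStates.add(Redirect.guard(copy, master)). Every call on the guarded object goes through reflection, so this is likely slower than a direct call.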
Yes, it is possible using bytecode modification.
Actually, if it were done by AspectJ or another library, it would be implemented using proxies or bytecode modification. But I'm not sure that this specific task is possible with the APIs of aspect-oriented programming libraries.
You can find working example for your task in this repo.
This test from repository works fine:
//Let's create an object
MyObject mObj = new MyObject();
MyObjectActiveRepository.INSTANCE.putToGroup(mObj, "group1");
MyObjectActiveRepository.INSTANCE.registerActiveForItsGroup(mObj);
//Let's create a list of past states
List<MyObject> pastStates = new ArrayList<MyObject>();
//doing some operations on mObj ....
mObj.modify("state1");
//done modifying mObj, now let's save its state and then create a copy to begin again
pastStates.add(mObj.copy());
//more of this...
mObj.modify("state2");
pastStates.add(mObj.copy());
mObj.modify("state3");
assertEquals("state1", pastStates.get(0).getState());
assertEquals("state2", pastStates.get(1).getState());
assertEquals("state3", mObj.getState());
pastStates.get(0).modify("stateNew");
assertEquals("state1", pastStates.get(0).getState());
assertEquals("state2", pastStates.get(1).getState());
assertEquals("stateNew", mObj.getState());
In short:
I use ByteBuddy (a bytecode generation and modification tool) to redefine the class bytecode before the class is loaded, in order to:
remove final from the class (if present)
add a field to store MyObject's "group", to address your note [1]
intercept calls to copy (we additionally need to copy the "group" field) and to modify (to retarget the call)
replace the class code in the classloader
TypePool typePool = TypePool.Default.ofClassPath();
new ByteBuddy()
.rebase(typePool.describe("MyObject").resolve(), ClassFileLocator.ForClassLoader.ofClassPath())
.modifiers(TypeManifestation.PLAIN) //our class can be final and we have no access to it - so remove final
.defineField("group", String.class, Visibility.PUBLIC)
.method(named("modify")).intercept(MethodDelegation.to(typePool.describe("Interceptors").resolve()))
.method(named("copy")).intercept(MethodDelegation.to(typePool.describe("Interceptors").resolve()))
.make()
.load(InterceptorsInitializer.class.getClassLoader(), ClassLoadingStrategy.Default.INJECTION);
I implemented MyObjectActiveRepository, which holds the information about the active object for each group and the "group" field related functionality, and Interceptors, with a simple copy redefinition that adds the "group" setting and a modify redefinition that does the retargeting.
I think the overhead should be light; the most expensive part is the reflective call to the setter on group-to-object assignment after object creation (this part can be improved: with ByteBuddy we can replace reflection by implementing a new interface with getGroup() and setGroup(String) methods during bytecode generation and delegating them to FieldAccessor.ofField("group"), which gives an efficient call through the interface). modify() should have nearly the same performance as before, because it doesn't use reflection, only fully generated bytecode. I didn't do any benchmarking.
I've got a recursive method which has local String variables:
private void recursiveUpdate(int id){
String selectQuery="Select ...";
String updateQuery="Update or rollback ...";
...
for(int childID: children)
recursiveUpdate(childID);
}
Is there any reason to externalize local String variables like this:
private static final String selectQuery="Select ...";
private static final String updateQuery="Update or rollback ...";
private void recursiveUpdate(int id){
...
for(int childID: children)
recursiveUpdate(childID);
}
From a technical point of view the difference between the two should be negligible, since in either case you'd always use the same string instances. If you are parsing those queries in every call, you might consider externalizing that as well (e.g. using prepared statements).
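The "same string instances" part can be checked directly, because Java string literals are interned. A minimal sketch, with the hypothetical class LiteralDemo standing in for the recursive method:

```java
class LiteralDemo {
    // Returns the literal as seen at the deepest level of the recursion.
    static String queryAtDepth(int depth) {
        String selectQuery = "Select ..."; // a reference to the interned literal
        return depth == 0 ? selectQuery : queryAtDepth(depth - 1);
    }

    static boolean sameInstance() {
        // == compares references: both calls see the same String object.
        return queryAtDepth(0) == queryAtDepth(5);
    }
}
```

Each invocation gets its own local variable, but that variable is just a reference; no new String object is created per call.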
From a development point of view, I'd probably externalize the queries to separate them from the call logic.
In the former case, you are relying on the compiler to recognize that those strings are unchanging across all calls so it doesn't need to give a fresh copy of each variable to each invocation of recursiveUpdate, whereas in the latter case, there is no question about it.
Yes. You'd want to externalize the variables. If left as a local variable, and depending on the size of the call stack, you could quickly accumulate many string objects and lower efficiency. Even worse if making edits to the string inside the recursive method. Also, you will not be making changes to the strings as it appears to me, so if used as a 'reference' it would be better to externalize it.
Typically you should be concerned with the size of your stack memory when making recursive calls. Each stack frame tells the CPU where to jump back to when the method completes; it contains your method parameters and the return location.
Objects instantiated within the body of the method are saved in the heap. In this case, I think the compiler will figure out that these are constants and save them to static memory. The heap is much larger and is more likely to survive the recursion when objects are instantiated, so I wouldn't worry about it. By moving the objects out, you'll save a little space in your heap.
IMO, it's best to move the variables out if the values are always the same (non-dynamic). This way, if they ever change, you can find them easily.
A very unimportant question about Java performance, but it made me wonder today.
Say I have simple getter:
public Object getSomething() {
return this.member;
}
Now, say I need the result of getSomething() twice (or more) in some function/algorithm. My question: is there any difference in either calling getSomething() twice (or more) or in declaring a temporary, local variable and use this variable from then on?
That is, either
public void algo() {
Object o = getSomething();
... use o ...
}
or
public void algo() {
... call getSomething() multiple times ...
}
I tend to mix both options, for no specific reason. I know it doesn't matter, but I am just wondering.
Thanks!
Technically, it's faster to not call the method multiple times, however this might not always be the case. The JVM might optimize the method calls to be inline and you won't see the difference at all. In any case, the difference is negligible.
However, it's probably safer to always use a getter. What if the value of the state changes between your calls? If you want to use a consistent version, then you can save the value from the first call. Otherwise, you probably want to always use the getter.
In any case, you shouldn't base this decision on performance because it's so negligible. I would pick one and stick with it consistently. I would recommend always going through your getters/setters.
Getters and setters are about encapsulation and abstraction. When you decide to invoke the getter multiple times, you are making assumptions about the inner workings of that class. For example that it does no expensive calculations, or that the value is not changed by other threads.
I'd argue that it's better to call the getter once and store its result in a temporary variable, thus allowing you to freely refactor the implementing class.
As an anecdote, I was once bitten by a change where a getter returned an array, but the implementing class was changed from an array property to using a list and doing the conversion in the getter.
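A hypothetical reconstruction of that kind of change (Playlist and getTracks are made-up names, not the code from the anecdote): once the getter converts a list to an array, every call allocates and copies.

```java
import java.util.ArrayList;
import java.util.List;

class Playlist {
    // Used to be a String[] field; now a list.
    private final List<String> tracks = new ArrayList<>();

    void add(String track) { tracks.add(track); }

    // After the refactoring, the getter converts on every call.
    String[] getTracks() {
        return tracks.toArray(new String[0]);
    }
}

class GetterCostDemo {
    static boolean returnsFreshArrayEachCall() {
        Playlist p = new Playlist();
        p.add("a");
        // Two calls, two different array objects.
        return p.getTracks() != p.getTracks();
    }
}
```

A caller that invokes getTracks() inside a loop now pays for a copy on every iteration, whereas a caller that stored the result in a local variable is unaffected.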
The compiler should optimize either one to be basically the same code.
Since arguments sent to a method in Java point to the original data structures in the caller method, did its designers intend for them to be used for returning multiple values, as is the norm in other languages like C?
Or is this a hazardous misuse of Java's general property that variables are pointers?
A long time ago I had a conversation with Ken Arnold (one time member of the Java team), this would have been at the first Java One conference probably, so 1996. He said that they were thinking of adding multiple return values so you could write something like:
x, y = foo();
The recommended way of doing it back then, and now, is to make a class that has multiple data members and return that instead.
Based on that, and other comments made by people who worked on Java, I would say the intent is/was that you return an instance of a class rather than modify the arguments that were passed in.
This is common practice (as is the desire by C programmers to modify the arguments... eventually they usually see the Java way of doing it). Just think of it as returning a struct. :-)
(Edit based on the following comment)
I am reading a file and generating two arrays, of type String and int from it, picking one element for both from each line. I want to return both of them to any function which calls it which a file to split this way.
I think, if I am understanding you correctly, that I would probably do something like this:
// could go with the Pair idea from another post, but I personally don't like that way
class Line
{
// would use appropriate names
private final int intVal;
private final String stringVal;
public Line(final int iVal, final String sVal)
{
intVal = iVal;
stringVal = sVal;
}
public int getIntVal()
{
return (intVal);
}
public String getStringVal()
{
return (stringVal);
}
// equals/hashCode/etc... as appropriate
}
and then have your method like this:
public void foo(final File file, final List<Line> lines)
{
// add to the List.
}
and then call it like this:
{
final List<Line> lines;
lines = new ArrayList<Line>();
foo(file, lines);
}
In my opinion, if we're talking about a public method, you should create a separate class representing a return value. When you have a separate class:
it serves as an abstraction (i.e. a Point class instead of array of two longs)
each field has a name
can be made immutable
makes evolution of API much easier (i.e. what about returning 3 instead of 2 values, changing type of some field etc.)
I would always opt for returning a new instance, instead of actually modifying a value passed in. It seems much clearer to me and favors immutability.
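A minimal sketch of that style, assuming each input line looks like "<label> <int>"; ParsedLine and LineParser are hypothetical names and the file reading is left out:

```java
import java.util.ArrayList;
import java.util.List;

// Immutable value class: each field has a name and cannot change.
final class ParsedLine {
    final String label;
    final int value;

    ParsedLine(String label, int value) {
        this.label = label;
        this.value = value;
    }
}

class LineParser {
    // Returns a new list instead of filling one passed in by the caller.
    static List<ParsedLine> parse(List<String> rawLines) {
        List<ParsedLine> result = new ArrayList<>();
        for (String raw : rawLines) {
            String[] parts = raw.trim().split("\\s+");
            result.add(new ParsedLine(parts[0], Integer.parseInt(parts[1])));
        }
        return result;
    }
}
```

Adding a third value later means adding a field to ParsedLine, without changing the signature of parse.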
On the other hand, if it is an internal method, I guess any of the following might be used:
an array (new Object[] { "str", longValue })
a list (Arrays.asList(...) returns a fixed-size list)
pair/tuple class, such as this
static inner class, with public fields
Still, I would prefer the last option, equipped with a suitable constructor. That is especially true if you find yourself returning the same tuple from more than one place.
I do wish there was a Pair<E,F> class in the JDK, mostly for this reason. There is Map.Entry<K,V>, but creating an instance was always a big pain.
Now I use com.google.common.collect.Maps.immutableEntry when I need a Pair
See this RFE launched back in 1999:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4222792
I don't think the intention was to ever allow it in the Java language, if you need to return multiple values you need to encapsulate them in an object.
Using languages like Scala however you can return tuples, see:
http://www.artima.com/scalazine/articles/steps.html
You can also use Generics in Java to return a pair of objects, but that's about it AFAIK.
EDIT: Tuples
Just to add some more on this. I've previously implemented a Pair in projects because of the lack within the JDK. Link to my implementation is here:
http://pbin.oogly.co.uk/listings/viewlistingdetail/5003504425055b47d857490ff73ab9
Note, there isn't a hashcode or equals on this, which should probably be added.
I also came across this whilst doing some research into this question, which provides tuple functionality:
http://javatuple.com/
It allows you to create a Pair as well as other types of tuples.
You cannot truly return multiple values, but you can pass objects into a method and have the method mutate those values. That is perfectly legal. Note that you cannot pass an object in and have the object itself become a different object. That is:
private void myFunc(Object a) {
a = new Object();
}
will result in temporarily and locally changing the value of a, but this will not change the value of the caller, for example, from:
Object test = new Object();
myFunc(test);
After myFunc returns, you will have the old Object and not the new one.
Legal (and often discouraged) is something like this:
private void changeDate(final Date date) {
date.setTime(1234567890L);
}
I picked Date for a reason. This is a class that people widely agree should never have been mutable. The method above will change the internal value of any Date object that you pass to it. This kind of code is legal when it is very clear that the method will mutate or configure or modify what is being passed in.
NOTE: Generally, it's said that a method should do one of these things:
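The difference between reassigning a parameter and mutating the object it refers to can be sketched side by side. PassByValueDemo is a hypothetical name, and StringBuilder is used only because it is a convenient mutable class.

```java
class PassByValueDemo {
    // Reassigns only the method's local copy of the reference.
    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("replaced");
    }

    // Mutates the object that the caller's reference points to.
    static void mutate(StringBuilder sb) {
        sb.append(" mutated");
    }

    static String demo() {
        StringBuilder test = new StringBuilder("original");
        reassign(test); // no visible effect for the caller
        mutate(test);   // visible to the caller
        return test.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // original mutated
    }
}
```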
Return void and mutate its incoming objects (like Collections.sort()), or
Return some computation and don't mutate incoming objects at all (like Collections.min()), or
Return a "view" of the incoming object but do not modify the incoming object (like Collections.checkedList() or Collections.singleton())
Mutate one incoming object and return it (Collections doesn't have an example, but StringBuilder.append() is a good example).
Methods that mutate incoming objects and return a separate return value are often doing too many things.
There are certainly methods that modify an object passed in as a parameter (see java.io.Reader.read(char[] cbuf) as an example), but I have not seen parameters used as an alternative for a return value, especially with multiple parameters. It may technically work, but it is nonstandard.
It's not generally considered terribly good practice, but there are very occasional cases in the JDK where this is done. Look at the 'biasRet' parameter of View.getNextVisualPositionFrom() and related methods, for example: it's actually a one-dimensional array that gets filled with an "extra return value".
So why do this? Well, just to save you having to create an extra class definition for the "occasional extra return value". It's messy, inelegant, bad design, non-object-oriented, blah blah. And we've all done it from time to time...
Generally what Eddie said, but I'd add one more:
Mutate one of the incoming objects, and return a status code. This should generally only be used for arguments that are explicitly buffers, like Reader.read(char[] cbuf).
I had a Result object that cascades through a series of validating void methods as a method parameter. Each of these validating void methods would mutate the result parameter object to add the result of the validation.
But this is impossible to test because now I cannot stub the void method to return a stub value for the validation in the Result object.
So, from a testing perspective, it appears that one should favor returning an object over mutating a method parameter.
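A minimal sketch of the returning style, with hypothetical validators and a plain list of messages standing in for the Result object:

```java
import java.util.ArrayList;
import java.util.List;

class Validators {
    // Returns the problems found instead of writing into a shared Result.
    static List<String> validateName(String name) {
        List<String> problems = new ArrayList<>();
        if (name == null || name.isEmpty()) {
            problems.add("name is required");
        }
        return problems;
    }

    static List<String> validateAge(int age) {
        List<String> problems = new ArrayList<>();
        if (age < 0) {
            problems.add("age must not be negative");
        }
        return problems;
    }

    // The caller combines the returned values.
    static List<String> validate(String name, int age) {
        List<String> all = new ArrayList<>(validateName(name));
        all.addAll(validateAge(age));
        return all;
    }
}
```

Because each validator returns its result, a test can assert on the return value directly, and a stubbed validator only needs to return a prepared list.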