I ran the FindBugs tool on my project and it found 18 problems of this type:
Storing reference to mutable object -> May expose internal representation by incorporating reference to mutable object
So I have a class whose constructor accepts an array of type Object and assigns it to a private member variable. Here is an example:
public class HtmlCellsProcessing extends HtmlTableProcessing
{
private Object[] htmlCells;
public HtmlCellsProcessing(Object[] htmlCells)
{
this.htmlCells = htmlCells;
}
}
Here is a further explanation about the warning:
This code stores a reference to an externally mutable object into the internal representation of the object. If instances are accessed by untrusted code, and unchecked changes to the mutable object would compromise security or other important properties, you will need to do something different. Storing a copy of the object is better approach in many situations.
The advice they give me is pretty obvious, but what happens if the array is very big? If I copy its values into the member array, the application is going to use twice as much memory.
What should I do in such a scenario, where I have a large amount of data? Should I pass it as a reference or always copy it?
It depends. You have multiple concerns, including space, time and correctness.
A defensive copy helps you guarantee that the list items will not change without the knowledge of the class holding the array. But it will take O(n) time and space.
For a very large array, you may find that the costs of a defensive copy in space and time are harmful to your application. If you control all the code with access to the array, it may be reasonable to guarantee correctness without a defensive copy, and suppress the FindBugs warning on that class.
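A minimal sketch of the defensive copy described above. The class name CellHolder and its get method are hypothetical stand-ins for the question's class; the copy is shallow, so it costs O(n) references rather than duplicating the elements.

```java
import java.util.Arrays;

// Hypothetical class, loosely modelled on the question's HtmlCellsProcessing.
final class CellHolder {
    private final Object[] cells;

    CellHolder(Object[] cells) {
        // Shallow defensive copy: a new array holding the same element references.
        this.cells = Arrays.copyOf(cells, cells.length);
    }

    Object get(int index) {
        return cells[index];
    }

    public static void main(String[] args) {
        Object[] input = {"a", "b"};
        CellHolder holder = new CellHolder(input);
        input[0] = "changed";               // the caller mutates its own array
        System.out.println(holder.get(0));  // prints "a": the holder is unaffected
    }
}
```

If the elements themselves are mutable, this copy protects only the array slots, not the objects they point to.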
I'd suggest trying the immutable list from the Guava library. See http://code.google.com/p/guava-libraries/wiki/ImmutableCollectionsExplained
If both encapsulation and performance are required, the typical solution is to pass a reference to an immutable object instead.
Therefore, rather than pass a huge array directly, encapsulate it in an object that does not permit the array's modification:
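import java.util.Arrays;

final class ArraySnapshot {
    private final Object[] array;
    ArraySnapshot(Object[] array) {
        // Arrays.copyOf needs the new length as its second argument
        this.array = Arrays.copyOf(array, array.length);
    }
    // methods to read from the array
}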
This object can now be passed around cheaply, but since it is immutable, encapsulation is ensured.
This idea, of course, is nothing new: it's what String does for char[].
The advice they give me is pretty obvious but what happens if the
array's size is very big and if i copy its values into the member
variable array the application is going to take twice more memory.
In Java, copying an array copies the references, not the objects themselves, unless you do a deep copy.
So if your only concern is to get rid of the warning (which is valid, though, especially if you don't understand what you actually store and you have multiple threads modifying the objects), you could make a copy without much concern about memory.
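To illustrate the point about memory, here is a small sketch (the class and method names are made up for this example). Cloning the array allocates a second array of references, but every element object is still shared between the two arrays.

```java
final class ShallowCopyDemo {
    // Copies the array of references; no element object is duplicated.
    static Object[] shallowCopy(Object[] source) {
        return source.clone();
    }

    public static void main(String[] args) {
        Object[] original = { new StringBuilder("big payload"), new StringBuilder("another") };
        Object[] copy = shallowCopy(original);
        System.out.println(copy != original);        // true: a second array exists
        System.out.println(copy[0] == original[0]);  // true: same element object
    }
}
```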
"Suppose you are passing or returning an array of references to mutable objects to/from a method. Is it safe to make a reference copy only? Is it safe to make a shallow copy?"
This is a study question that was given to my class and the answer is "Neither one is safe. Only a deep copy is safe in this case."
Why is this?
"Safe" can mean a lot of things, but in your particular context it is about the safety of "your" private data (the text refers to "you" as a writer of some Java class). Your private data cannot be safe if you let the client of your class access and modify it behind your back.
Therefore:
if you return an array of mutable objects, you must make copies of all those objects and return them in a new array;
if you get an array of mutable objects passed in, you must again copy them all and put them into a new array—because your client already has references to the objects he passed in.
In practice, all this is a lot of CPU work and takes memory, so it is rarely done. You either design everything to be immutable—or else live with the danger inherent to mutable objects.
If your objects are mutable that means that any client with a reference to them can modify them. This can lead to race conditions, deadlocks and other un-fun behavior. However, if you make a deep copy of your objects right before using them, you will effectively be working with a snapshot of the objects. This ensures no other client is able to modify them, eliminating any concurrency or correctness concern.
In the first case both the original array elements and the objects can be modified. In the second case only the objects can be modified, as the other party no longer has access to the original array. With a deep copy, the two sides work with entirely different arrays and objects, so of course it's safe.
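The three cases can be sketched as follows. Counter is a hypothetical mutable element type invented for this example, and deepCopy assumes the array contains no null elements.

```java
final class CopyLevels {
    // Hypothetical mutable element type, for illustration only.
    static final class Counter {
        int value;
        Counter(int value) { this.value = value; }
    }

    // Reference copy: the same array, so slots and objects are both exposed.
    static Counter[] referenceCopy(Counter[] src) {
        return src;
    }

    // Shallow copy: a new array, but the element objects are still shared.
    static Counter[] shallowCopy(Counter[] src) {
        return src.clone();
    }

    // Deep copy: a new array holding new objects (assumes no null elements).
    static Counter[] deepCopy(Counter[] src) {
        Counter[] out = new Counter[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = new Counter(src[i].value);
        }
        return out;
    }

    public static void main(String[] args) {
        Counter[] mine = { new Counter(1) };
        Counter[] shallow = shallowCopy(mine);
        Counter[] deep = deepCopy(mine);
        deep[0].value = -1;     // touches only the copied object
        shallow[0].value = 99;  // reaches my object through the shared element
        System.out.println(mine[0].value); // prints 99
    }
}
```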
Suppose there is an Integer array in my class:
public class Foo {
private Integer[] arr = new Integer[20];
.....
}
On a 64 bit architecture the space requirement for this is ~ (20*8+24) + 24*20 {space required for references + some array overhead + space required for objects}.
Why does Java store references to all 20 Integer objects? Wouldn't knowing the first memory location and the number of items in the array suffice? (I'm assuming, as I read somewhere, that objects in an array are placed contiguously anyway.) I want to know the reason for this sort of implementation. Sorry if this is a noobish question.
Like every other class, Integer is a reference type. This means it can only be accessed indirectly, via a reference. You cannot store an instance of a reference type in a field, a local variable, a slot in a collection, etc. -- you always have to store a reference and allocate the object itself separately. There are a variety of reasons for this:
You need to be able to represent null.
You need to be able to replace it with another instance of a subtype (assuming subtypes are possible, i.e. the class is not final). For example, an Object[] may actually store instances of any number of different classes with wildly varying sizes.
You need to preserve sharing, e.g. after a[0] = a[1] = someObject; all three must refer to the same object. This is much more important (vital even) if the object is mutable, but even with immutable objects the difference can be observed via reference equality checks (==).
You need reference assignment to be atomic (cf. Java memory model), so copying the whole instance is even more expensive than it seems.
With these and many other constraints, always storing references is the only feasible implementation strategy (in general). In very specific circumstances, a JIT compiler may avoid allocating an object entirely and store its fields directly (e.g. on the stack), but this is an obscure implementation detail, and not widely applicable. I only mention this for completeness and because it's a wonderful illustration of the as-if rule.
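Several of these constraints can be observed directly in a few lines. This is only an illustrative sketch; the class and method names are made up.

```java
final class ReferenceSlots {
    // Fills an Object[] to show null slots, sharing and mixed element classes.
    static Object[] build(StringBuilder shared) {
        Object[] slots = new Object[4];
        slots[0] = slots[1] = shared;    // two slots refer to one object
        slots[2] = Integer.valueOf(42);  // a different class in the same array
        // slots[3] is left as null: the slot holds no object at all
        return slots;
    }

    public static void main(String[] args) {
        StringBuilder shared = new StringBuilder("hi");
        Object[] slots = build(shared);
        shared.append("!");
        System.out.println(slots[0] == slots[1]); // true: the same object
        System.out.println(slots[1]);             // hi!
        System.out.println(slots[3] == null);     // true
    }
}
```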
I'm writing software that processes some annotations. One of an annotation's parameters is an array. One object finds this array and passes it to another object for processing. FindBugs then starts to scream that I'm passing a private array that may be mutated by malicious code. So the question is: is that true? Can annotation parameters be changed at runtime?
This is true: you pass a reference to an array, and arrays are mutable. The callee can modify this array.
Your best course of action is to pass a copy of that array to the callee instead of the original array, for instance by using Arrays.copyOf().
Alternatively, instead of an array, you may want to return a List instead and use the Collections.unmodifiableList() wrapper since this will avoid unnecessary copies.
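Both options might look like this. AnnotationValues is a hypothetical holder class; only Arrays.copyOf, Arrays.asList and Collections.unmodifiableList come from the JDK.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical holder for an array that must not be modified by callees.
final class AnnotationValues {
    private final String[] values;

    AnnotationValues(String... values) {
        this.values = values.clone();
    }

    // Option 1: hand out a fresh copy each time (O(n) per call).
    String[] copyOfValues() {
        return Arrays.copyOf(values, values.length);
    }

    // Option 2: hand out a read-only view over the array (no copy).
    List<String> viewOfValues() {
        return Collections.unmodifiableList(Arrays.asList(values));
    }
}
```

Calling set or add on the returned view throws UnsupportedOperationException, while changes to a returned copy never reach the internal array.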
Arrays returned through reflection should be a fresh copy every time they are retrieved, so there's no problem.
From a mobile-code, or general code-quality, perspective you should expect an array returned from or passed as an argument to an untrusted method to be maliciously modified. Similarly, on the receiving side, arrays passed as parameters or returned from callbacks may be maliciously modified later. So these arrays need to be copied before handing them out and also as they are received (even before any validation).
@fge mentions Lists. When sending these out, an unmodifiable collection cannot be modified by the receiver. Receiving collections is a little trickier. Obviously, taking an untrusted List and wrapping it with unmodifiableList won't work. new ArrayList<>(things) is the way to go. Don't attempt to clone a malicious ArrayList because you cannot be sure what clone actually does.
Obviously, if you have an array of mutable objects, both the array and elements will need to be copied.
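A sketch of the receiving side, assuming the elements are immutable Strings. Inbox and its null check are invented for illustration; the point is the order: copy first, validate the copy, then store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical receiver of an untrusted list.
final class Inbox {
    private final List<String> items;

    Inbox(List<String> untrusted) {
        // Copy before validating, so later changes by the caller cannot
        // invalidate what was checked.
        List<String> copy = new ArrayList<>(untrusted);
        if (copy.contains(null)) {
            throw new IllegalArgumentException("null item");
        }
        this.items = Collections.unmodifiableList(copy);
    }

    List<String> items() {
        return items;
    }
}
```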
I've been reading this section about static methods and about passing arrays by call-by-reference. So my question is: When is it ever a good time to return an array argument as an array as opposed to void?
Thanks in advance!
I've been reading this section about static methods passing arrays by call-by-reference.
If this is a Java article, site, blog, book, whatever, you should probably find a better one. Java doesn't use "call-by-reference" on any parameters. Anyone who says that either doesn't understand Java, or doesn't understand what "call by reference" means.
Java passes all arguments by value. Period.
The point of confusion is that some people don't understand that objects and arrays in Java are always references. So when you pass an array (for instance) as an argument, you are passing the reference to the array by value. However, this is NOT a nitpick. Passing a reference by value is semantically very different to true "call by value".
Re the actual quote:
"All of these examples highlight the basic fact that the mechanism for passing arrays in Java is a call by reference mechanism with respect to the contents of the array" (Sedgewick).
I understand what he is saying, given the qualification "with respect to the contents of the array". But calling this "call by reference" is misleading. (And clearly, you were misled by it, to a certain degree!)
I would also argue that it is technically wrong. The terms "call-by-value", "call-by-reference", "call-by-name" (and so on) are about parameter passing / returning. In this case, the parameter is the array as a whole, not the contents of the array. And at that level, the semantics are clearly NOT call-by-reference. (Assigning a new array reference to the parameter name in the method does not update the array variable in the caller.) The fact that the behaviour is not distinguishable from call-by-reference with respect to the array contents does not make it call-by-reference.
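The distinction is easy to check with a short experiment. The class and method names below are made up for this sketch.

```java
final class PassByValueDemo {
    // Rebinds only the local parameter; the caller's variable is untouched.
    static void reassign(int[] arr) {
        arr = new int[] {9, 9, 9};
    }

    // Writes through the copied reference into the caller's array.
    static void mutate(int[] arr) {
        arr[0] = 9;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        reassign(data);
        System.out.println(data[0]); // 1: the caller still sees its own array
        mutate(data);
        System.out.println(data[0]); // 9: the contents were changed in place
    }
}
```

Under true call-by-reference, reassign would have replaced the caller's array and the first line printed would be 9.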
Now to the meat of your question ...
When is it ever a good time to return an array argument as an array as opposed to void?
It is not entirely clear what you mean, but I assume that you are talking about these two alternatives:
public void method(String arg, String[] result) ...
versus
public String[] method(String arg) ...
I'd say that the second form is normally preferable because it is easier to understand and to use. Furthermore, the second form allows the method to choose the size of the result array. (With the first form, if the array is too small or too large, there is no way to return the reference to a reallocated array.)
The only cases where the first form should be used are:
when the functional requirements for the method depend on it being able to update an existing array, or
when there is an overarching need to minimize the number of objects that are allocated; e.g. to minimize GC pauses.
The first case might arise if the array is already referenced in other data structures, and finding / updating those references would be difficult. It also might arise if the array is large, and the cost of making a copy would dominate the cost of the real work done by the method.
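A small sketch of the two forms side by side, using a made-up squares computation in place of the question's method. With the first form the caller fixes the size and can reuse the buffer; with the second the method allocates and chooses the size.

```java
final class ResultForms {
    // Form 1: the caller supplies the array; nothing is allocated here.
    static void squares(int[] result) {
        for (int i = 0; i < result.length; i++) {
            result[i] = i * i;
        }
    }

    // Form 2: the method allocates the result and decides its size.
    static int[] squares(int count) {
        int[] result = new int[count];
        squares(result);
        return result;
    }

    public static void main(String[] args) {
        int[] buffer = new int[3];
        squares(buffer);                          // buffer can be reused across calls
        System.out.println(buffer[2]);            // 4
        System.out.println(squares(5).length);    // 5
    }
}
```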
All parameters passed into a Java method, except those of primitive types, are references, so there is only one array object in memory whether you are inside or outside the method. Static methods are NOT treated as a special case here. In your case, returning the array or declaring the method void would not make any difference.
If you return this array, the returned value is what exactly you just now passed into this method.
WELL, ultimately... passing an array by value is slow. It has to grab a block of memory and copy the array. If the array is only a few bytes in size, it's not a big deal. But if it's a large chunk of memory, then this will be a slow memory operation. ESPECIALLY if this is happening in a tight loop, it will hurt performance.
Passing by reference will allow you to create a buffer ahead of time and reuse it.
In Java, we can always use an array to store object reference. Then we have an ArrayList or HashTable which is automatically expandable to store objects. But does anyone know a native way to have an auto-expandable array of object references?
Edit: What I mean is I want to know if the Java API has some class with the ability to store references to objects (but not storing the actual object like XXXList or HashTable do) AND the ability of auto-expansion.
Java arrays are, by their definition, fixed size. If you need auto-growth, you use XXXList classes.
EDIT - question has been clarified a bit
When I was first starting to learn Java (coming from a C and C++ background), this was probably one of the first things that tripped me up. Hopefully I can shed some light.
Unlike C++, Object arrays in Java do not store objects. They store object references.
In C++, if you declared something similar to:
String myStrings[10];
You would get 10 String objects. At this point, it would be perfectly legal to do something like println(myStrings[5].length); - you'd get '0' - the default constructor for String creates an empty string with length 0.
In Java, when you construct a new array, you get an empty container that can hold 10 String references. So the call:
String[] myStrings = new String[10];
System.out.println(myStrings[5].length());
would throw a null pointer exception, because you haven't actually placed a String reference into the array yet.
If you are coming from a C++ background, think of new String[10] as being equivalent to new String*[10] in C++.
So, with that in mind, it should be fairly clear why ArrayList is the solution for an auto expanding array of objects (and in fact, ArrayList is implemented using simple arrays, with a growth algorithm built in that allocates new expanded arrays as needed and copies the content from the old to the new).
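The growth idea can be sketched in a few lines. This is a simplified illustration, not ArrayList's actual source: the class name, the initial capacity of 4 and the doubling factor are all assumptions chosen for the example.

```java
import java.util.Arrays;

// Simplified growable array of object references (illustrative only).
final class GrowableArray {
    private Object[] elements = new Object[4]; // assumed initial capacity
    private int size;

    void add(Object item) {
        if (size == elements.length) {
            // Allocate a larger array and copy the references across.
            elements = Arrays.copyOf(elements, elements.length * 2);
        }
        elements[size++] = item;
    }

    Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return elements[index];
    }

    int size() {
        return size;
    }
}
```

Only references are copied when the array grows; the stored objects themselves are never duplicated.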
In practice, there are actually relatively few situations where we use arrays. If you are writing a container (something akin to ArrayList, or a BTree), then they are useful, or if you are doing a lot of low level byte manipulation - but at the level that most development occurs, using one of the Collections classes is by far the preferred technique.
All the classes implementing Collection are expandable and store only references: you don't store objects, you create them in some data space and only manipulate references to them, until they go out of scope without reference on them.
You can put a reference to an object in two or more Collections. That's how you can have sorted hash tables and such...
What do you mean by "native" way? If you want an expandable list of objects then you can use the ArrayList. With List collections you have the get(index) method that allows you to access objects in the list by index, which gives you similar functionality to an array. Internally the ArrayList is implemented with an array and the ArrayList handles expanding it automatically for you.
Straight from the Array Java Tutorials on the sun webpage:
-> An array is a container object that holds a fixed number of values of a single type.
Because the size of the array is declared when it is created, there is actually no way to expand it afterwards. The whole purpose of declaring an array of a certain size is to only allocate as much memory as will likely be used when the program is executed. What you could do is declare a second array that is a function based on the size of the original, copy all of the original elements into it, and then add the necessary new elements (although this isn't very 'automatic' :) ). Otherwise, as you and a few others have mentioned, the List Collections is the most efficient way to go.
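The manual approach described above could look like this. ArrayAppend.append is a hypothetical helper written for this example; System.arraycopy is the JDK call doing the copying.

```java
final class ArrayAppend {
    // Returns a new array one slot longer, with 'extra' in the last position.
    static String[] append(String[] original, String extra) {
        String[] bigger = new String[original.length + 1];
        System.arraycopy(original, 0, bigger, 0, original.length);
        bigger[original.length] = extra;
        return bigger;
    }

    public static void main(String[] args) {
        String[] names = {"a", "b"};
        names = append(names, "c");         // the old array is left unchanged
        System.out.println(names.length);   // 3
    }
}
```

Each call copies the whole array, so appending n elements one at a time this way costs O(n^2) overall.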
In Java, all object variables are references. So
Foo myFoo = new Foo();
Foo anotherFoo = myFoo;
means that both variables are referring to the same object, not to two separate copies. Likewise, when you put an object in a Collection, you are only storing a reference to the object. Therefore using ArrayList or similar is the correct way to have an automatically expanding piece of storage.
There's no first-class language construct that I'm aware of that does this, if that's what you're looking for.
It's not very efficient, but if you're just appending to an array, you can use Apache Commons ArrayUtils.add(). It returns a copy of the original array with the additional element in it.
If you can write your code in JavaScript, yes, you can do that. JavaScript arrays are sparse arrays, so they will expand whichever way you want.
You can write
a[0] = 4;
a[1000] = 434;
a[888] = "a string";