I'm writing software that processes some annotations. One of the annotation's parameters is an array. One object finds this array and passes it to another object for processing. FindBugs then starts to scream that I'm passing a private array that may be mutated by malicious code. So the question is: is that true? Can annotation parameters be changed at runtime?
This is true: you pass a reference to an array, and arrays are mutable. The callee can modify this array.
Your best course of action is to pass a copy of that array to the callee instead of the original array, for instance by using Arrays.copyOf().
Alternatively, instead of an array, you may want to return a List wrapped with Collections.unmodifiableList(), since this avoids unnecessary copies.
Arrays returned through reflection should be a fresh copy every time they are retrieved, so there's no problem.
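To check the "fresh copy" claim for yourself, here is a minimal sketch. The annotation Tags and the class Annotated are hypothetical names made up for the illustration, and the behavior shown is what I'd expect on OpenJDK-based runtimes, where an array-valued annotation member is cloned on each access:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

final class AnnotationArrayDemo {
    // Hypothetical annotation with an array-valued parameter.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Tags {
        String[] value();
    }

    // Hypothetical annotated class used only for this demonstration.
    @Tags({"alpha", "beta"})
    static final class Annotated {}

    // Reads the array, mutates that result, then reads the array again.
    static String[] readAfterTampering() {
        Tags tags = Annotated.class.getAnnotation(Tags.class);
        String[] first = tags.value();
        first[0] = "tampered";   // changes only the array this caller received
        return tags.value();     // second read still carries the declared values
    }
}
```

The second read is unaffected by the write to the first array, which is why mutation by the callee cannot reach the annotation itself.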
From a mobile code perspective, or a general code quality perspective, you should expect an array that is returned or passed as an argument to an untrusted method to be maliciously modified. Similarly, on the receiving side, arrays passed as parameters or returned from callbacks may be maliciously modified later. So these arrays need to be copied before handing them out, and also as they are received (even before any validation).
@fge mentions Lists. When sending these out, an unmodifiable collection cannot be modified by the receiver. Receiving collections is a little more tricky. Obviously, taking an untrusted List and wrapping it with unmodifiableList won't work. new ArrayList<>(things) is the way to go. Don't attempt to clone a malicious ArrayList, because you cannot be sure what clone actually does.
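A rough sketch of the receiving side, assuming a hypothetical class Inbox that accepts a list of names: copy first with new ArrayList<>(...), validate the copy rather than the caller's list, and hand out only an unmodifiable view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical receiver of an untrusted list.
final class Inbox {
    private final List<String> names;

    Inbox(List<String> untrusted) {
        // Copy before validating, so the caller cannot change the data
        // between the check and the use.
        List<String> copy = new ArrayList<>(untrusted);
        for (String name : copy) {
            if (name == null || name.isEmpty()) {
                throw new IllegalArgumentException("blank name");
            }
        }
        this.names = copy;
    }

    // Callers get a read-only view of the private copy.
    List<String> names() {
        return Collections.unmodifiableList(names);
    }
}
```

Since String is immutable, copying the list is enough here; a list of mutable elements would also need its elements copied.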
Obviously, if you have an array of mutable objects, both the array and elements will need to be copied.
Related
"Suppose you are passing or returning an array of references to mutable objects to/from a method. Is it safe to make a reference copy only? Is it safe to make a shallow copy?"
This is a study question that was given to my class and the answer is "Neither one is safe. Only a deep copy is safe in this case."
Why is this?
"Safe" can mean a lot of things, but in your particular context it is about the safety of "your" private data (the text refers to "you" as a writer of some Java class). Your private data cannot be safe if you let the client of your class access and modify it behind your back.
Therefore:
if you return an array of mutable objects, you must make copies of all those objects and return them in a new array;
if you get an array of mutable objects passed in, you must again copy them all and put them into a new array—because your client already has references to the objects he passed in.
In practice, all this is a lot of CPU work and takes memory, so it is rarely done. You either design everything to be immutable—or else live with the danger inherent to mutable objects.
If your objects are mutable that means that any client with a reference to them can modify them. This can lead to race conditions, deadlocks and other un-fun behavior. However, if you make a deep copy of your objects right before using them, you will effectively be working with a snapshot of the objects. This ensures no other client is able to modify them, eliminating any concurrency or correctness concern.
In the first case both the original array elements and the objects can be modified. In the second case only the objects can be modified, as we no longer have access to the original array. If you perform a deep copy we are working with entirely different arrays and objects, so of course it's safe.
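The three cases can be put side by side in a small sketch. Point is a hypothetical mutable element type invented for the example, and the copy constructor is just one possible way to duplicate an element:

```java
import java.util.Arrays;

final class CopyKinds {
    // Hypothetical mutable element type.
    static final class Point {
        int x;
        Point(int x) { this.x = x; }
        Point(Point other) { this.x = other.x; }
    }

    // Reference copy: the very same array object.
    static Point[] referenceCopy(Point[] src) {
        return src;
    }

    // Shallow copy: a new array that points at the same Point objects.
    static Point[] shallowCopy(Point[] src) {
        return Arrays.copyOf(src, src.length);
    }

    // Deep copy: a new array holding new Point objects.
    static Point[] deepCopy(Point[] src) {
        Point[] out = new Point[src.length];
        for (int i = 0; i < src.length; i++) {
            out[i] = (src[i] == null) ? null : new Point(src[i]);
        }
        return out;
    }
}
```

Changing original[0].x is visible through the reference copy and the shallow copy but not through the deep copy; replacing original[0] with another object is visible only through the reference copy.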
I ran the FindBugs tool on my project and it found 18 problems of the type:
Storing reference to mutable object -> May expose internal representation by incorporating reference to mutable object
So I have a class whose constructor accepts an array of type Object[] and assigns it to a private member variable. Here is an example:
public class HtmlCellsProcessing extends HtmlTableProcessing
{
    private Object[] htmlCells;

    public HtmlCellsProcessing(Object[] htmlCells)
    {
        // stores the caller's array directly; this is the assignment FindBugs flags
        this.htmlCells = htmlCells;
    }
}
Here is a further explanation about the warning:
This code stores a reference to an externally mutable object into the internal representation of the object. If instances are accessed by untrusted code, and unchecked changes to the mutable object would compromise security or other important properties, you will need to do something different. Storing a copy of the object is better approach in many situations.
The advice they give me is pretty obvious but what happens if the array's size is very big and if I copy its values into the member variable array the application is going to take twice more memory.
What should I do in such a scenario where I have large amount of data? Should I pass it as reference or always copy it?
It depends. You have multiple concerns, including space, time and correctness.
A defensive copy helps you guarantee that the list items will not change without the knowledge of the class holding the array. But it will take O(n) time and space.
For a very large array, you may find that the costs of a defensive copy in space and time are harmful to your application. If you control all the code with access to the array, it may be reasonable to guarantee correctness without a defensive copy, and suppress the FindBugs warning on that class.
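For reference, a defensive copy in both directions might look like the sketch below. CellHolder is a hypothetical stand-in for the class in the question, not the asker's actual code; the indexed accessor is there to show that reads do not have to pay for a copy each time.

```java
// Hypothetical holder that never shares its internal array.
final class CellHolder {
    private final Object[] cells;

    CellHolder(Object[] cells) {
        // O(n) copy of the references; the caller's array is not retained.
        this.cells = cells.clone();
    }

    // Returns a copy, so callers cannot write into the internal array.
    Object[] cells() {
        return cells.clone();
    }

    // Read access to a single element without copying the whole array.
    Object cellAt(int index) {
        return cells[index];
    }

    int size() {
        return cells.length;
    }
}
```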
I'd suggest trying an immutable list from the Guava library. See http://code.google.com/p/guava-libraries/wiki/ImmutableCollectionsExplained
If both encapsulation and performance are required, the typical solution is to pass a reference to an immutable object instead.
Therefore, rather than pass a huge array directly, encapsulate it in an object that does not permit the array's modification:
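import java.util.Arrays;

final class ArraySnapshot {
    private final Object[] array;

    ArraySnapshot(Object[] array) {
        // Arrays.copyOf takes the new length as its second argument
        this.array = Arrays.copyOf(array, array.length);
    }
    // methods to read from the array
}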
This object can now be passed around cheaply, and since it is immutable, encapsulation is ensured.
This idea, of course, is nothing new: it's what String does for char[].
The advice they give me is pretty obvious but what happens if the
array's size is very big and if i copy its values into the member
variable array the application is going to take twice more memory.
In Java you copy references not object themselves unless you do a deep copy.
So if your only concern is to get rid of the warning (which is valid, though, especially if you don't understand what you actually store and you have multiple threads modifying the objects), you could make a copy without much concern about memory.
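A quick way to convince yourself that the copy duplicates only the references is the sketch below (ReferenceCopyCost is a made-up helper name). It clones an array and checks that every slot of the clone points at the identical element object:

```java
final class ReferenceCopyCost {
    // Returns true if the clone is a distinct array whose slots refer to
    // exactly the same element objects as the original.
    static boolean sharesElements(Object[] original) {
        Object[] copy = original.clone();
        if (copy == original) {
            return false; // clone() must produce a new array object
        }
        for (int i = 0; i < original.length; i++) {
            if (copy[i] != original[i]) {
                return false; // an element was duplicated
            }
        }
        return true;
    }
}
```

So the extra memory is one array of references, not a second set of objects. The flip side is that the shared elements can still be modified through either array.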
If fooService.getFoos() returns List<Foo>.
then you can write this:
List<Foo> fooList = fooService.getFoos();
or this:
List<Foo> fooList = new ArrayList(fooService.getFoos());
Is there any significant difference in the resulting fooList between these two approaches?
Yes - you are creating a completely new List, containing the elements of the original one. You are duplicating the collection in memory, and iterating over it from start to end. You are also not using the instance provided by the service, and you can't modify the original. And finally, you've omitted the generics declaration in the second snippet.
So use the first option.
Update: you indicated you are not allowed to modify the original list. This is actually a problem of fooService, not of its clients. If the service is also in your control, return Collections.unmodifiableList(originalList) - thus clients won't be able to perform modification operations (on attempt an exception will be thrown)
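To make the difference concrete, here is a small sketch. FooListDemo is a hypothetical stand-in for fooService (with String elements instead of Foo to keep it short), and getFoosReadOnly is an invented method name showing the unmodifiableList variant:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for fooService.
final class FooListDemo {
    private final List<String> foos = new ArrayList<>(Arrays.asList("a", "b"));

    // Hands out the internal list itself: callers can modify it.
    List<String> getFoos() {
        return foos;
    }

    // Hands out a read-only view: modification attempts throw
    // UnsupportedOperationException.
    List<String> getFoosReadOnly() {
        return Collections.unmodifiableList(foos);
    }
}
```

With List<String> same = service.getFoos(), an add on same changes the service's list. With new ArrayList<>(service.getFoos()), an add changes only the copy. With getFoosReadOnly(), an add throws.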
The second isn't really a good idea because you omit the generic part.
But the main problem is the unnecessary code which will be called. You can look at the ArrayList source code, and you'll see all the operations used in the constructor. If you only need a List, and fooService.getFoos() returns a valid List, you should stick with it.
The result of those two statements will be more or less the same unless:
later you check if your list is an instance of ArrayList and cast it, but let's face it, you would have declared ArrayList<Foo> if it was the case.
the list returned by fooService.getFoos() shouldn't be modified (for any reason) but you still want to modify elements in the List on your side (without affecting the original list).
Resources :
grepcode - ArrayList
I'd stick with the first one just because it reads lots easier and makes much more sense than the second one.
In the second statement, it only has to return a List type. If you are sure the method returns the same type, then you can use the first form.
When should I use an ArrayList in Java, and when should I use an array?
Some differences:
Arrays are fixed in size: you cannot easily remove an element and close the resulting hole, whereas doing so with an ArrayList is straightforward
Arrays are faster (handled directly by the JVM as special objects) than an ArrayList and require less memory
Arrays have a nice syntax for accessing elements (e.g. a[i] vs a.get(i))
Arrays don't play well with generics (e.g. you cannot create a generic array)
Arrays cannot be wrapped as easily as an ArrayList (e.g. by Collections utilities like checkedList, synchronizedList and unmodifiableList)
By declaring the ArrayList as a List you can easily swap the implementation for a LinkedList when you need to; this, IMHO, is the best advantage over plain arrays
Array's toString, equals and hashCode are weird and error-prone, you must use Arrays class utilities
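The last point is easy to trip over, so here is a short sketch of it. ArrayIdentityDemo is just a made-up wrapper around the standard calls; equals on an array compares identity, while the java.util.Arrays helpers compare and print contents:

```java
import java.util.Arrays;

final class ArrayIdentityDemo {
    // Object.equals on an array: true only for the very same array object.
    static boolean sameByEquals(int[] a, int[] b) {
        return a.equals(b);
    }

    // Arrays.equals: true when lengths and elements match.
    static boolean sameByContent(int[] a, int[] b) {
        return Arrays.equals(a, b);
    }

    // Arrays.toString renders the elements, e.g. "[1, 2, 3]".
    static String render(int[] a) {
        return Arrays.toString(a);
    }
}
```

For two separate arrays holding 1, 2, 3, sameByEquals is false and sameByContent is true. Arrays.hashCode follows the same content-based idea.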
Another couple of points:
You may want to consider using an array to represent more than one dimension (e.g. matrix).
Arrays can be used to store primitives and hence offer a more compact representation of your data than using an ArrayList.
ArrayLists are useful when you don't know in advance the number of elements you will need. Simple example: you are reading a text file and building a list of all the words you find. You can just keep adding to your ArrayList, and it will grow.
With arrays, you need to declare the size up front.
It's not only about the fact that arrays need to grow, a collection is easier to deal with.
Sometimes arrays are fine, when you just need to iterate over elements, read-only. However, most of the time you want to use methods like contains, etc.
You can't create generic arrays so it 'might' or might not bother you.
When in doubt, use Collections; it will make the people who use your API love you :-). If you only provide them with arrays, the first line of code that they'll write is:
Arrays.asList(thatGuyArray);
The List interface, of which ArrayList is an implementation in the Java Collections Framework, is much richer than what a plain Java array has to offer. Due to the relatively widespread support of the collection framework throughout Java and 3rd party libraries, using an ArrayList instead of an array makes sense in general. I'd only use arrays if there is a real need for them:
They are required by some other interface I'm calling
Profiling shows a bottleneck in a situation where array access can yield a significant speedup over list access
Situations where an array feels more natural such as buffers of raw data as in
byte[] buffer = new byte[0x400]; // allocate 1k byte buffer
You can always get an array representation of your ArrayList if you need one:
Foo[] bar = fooList.toArray(new Foo[fooList.size()]);
It is a common failure pattern that methods return a reference to a private array member (field) of a class. This breaks the class' encapsulation as outsiders gain mutable access to the class' private state. Consequently you would need to always clone the array and return a reference to the cloned array. With an ArrayList you can use...
return Collections.unmodifiableList(privateListMember);
... in order to return a wrapper that protects the actual list object. Of course you need to make sure that the objects in the list are immutable too, but that also holds for a (cloned) array of mutable objects.
As per Nick Holt's comment, you shouldn't expose the fact that a List is an ArrayList anywhere:
private List<Foo> fooList = new ArrayList<Foo>();
public List<Foo> getFooList() {
return Collections.unmodifiableList(fooList);
}
An array has to be declared with a fixed size therefore you need to know the number of elements in advance.
An ArrayList is preferable when you don't know how many elements you will need in advance as it can grow as desired.
An ArrayList may also be preferable if you need to perform operations that are available in its API but would require manual implementation for an array (e.g. indexOf).
When you want to change its size by adding or removing elements.
When you want to pass it to something that wants a Collection or Iterable (although you can use Arrays.asList(a) to make an array, a, look like a List).
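One caveat about Arrays.asList, sketched below with a made-up class name AsListDemo: the returned List is a fixed-size view backed by the array, so set() writes through to the array, but add() and remove() throw UnsupportedOperationException.

```java
import java.util.Arrays;
import java.util.List;

final class AsListDemo {
    // Writes through the List view and returns the underlying array.
    static String[] writeThrough() {
        String[] a = {"x", "y"};
        List<String> view = Arrays.asList(a);
        view.set(0, "z"); // also changes a[0]
        return a;
    }

    // Reports whether the view accepts new elements.
    static boolean canGrow() {
        List<String> view = Arrays.asList("x", "y");
        try {
            view.add("z");
            return true;
        } catch (UnsupportedOperationException e) {
            return false; // the view has a fixed size
        }
    }
}
```

If you need a list that can grow, wrap it: new ArrayList<>(Arrays.asList(a)).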
I would say the default presumption should be to use an ArrayList unless you have a specific need, simply because it keeps your code more flexible and less error prone. No need to expand the declaration size when you add an extra element 500 lines of code away, etc. And reference the List interface, so you can replace the ArrayList with a LinkedList or a CopyOnWriteArrayList or any other list implementation that may help a situation without having to change a lot of code.
That being said, arrays have some properties that you just won't get out of a list. One is a defined size with null elements. This can be useful if you don't want to keep things in a sequential order. For example a tic-tac-toe game.
Arrays can be multi-dimensional. ArrayLists cannot.
Arrays can deal with primitives, something an ArrayList cannot (although there are third party collection classes that wrap primitives, they aren't part of the standard collections API).
G'day,
A couple of points that people seem to have missed so far.
an array can only contain one type of object whereas an ArrayList is a container that can contain a mixture of object types, it's heterogeneous,
an array must declare the type of its contents when the array itself is declared. An ArrayList doesn't have to declare the type of its contents when the ArrayList is declared,
you must insert an item into a specific location in an array. Adding to an ArrayList is done by means of the add() method on the container, and
objects are stored in an array and retain their type because of the way the array can only store objects of a particular type. Objects are stored in an ArrayList by means of the superclass type Object.
Edit: Ooop. Regarding the last point on the list, I forgot the special case where you have an array of Objects then these arrays can also contain any type of object. Thanks for the comment, Yishai! (-:
HTH
cheers,
In Java, we can always use an array to store object references. Then we have ArrayList or Hashtable, which expand automatically to store objects. But does anyone know a native way to have an auto-expandable array of object references?
Edit: What I mean is I want to know if the Java API has some class with the ability to store references to objects (but not storing the actual object like XXXList or Hashtable do) AND the ability of auto-expansion.
Java arrays are, by their definition, fixed size. If you need auto-growth, you use XXXList classes.
EDIT - question has been clarified a bit
When I was first starting to learn Java (coming from a C and C++ background), this was probably one of the first things that tripped me up. Hopefully I can shed some light.
Unlike C++, Object arrays in Java do not store objects. They store object references.
In C++, if you declared something similar to:
String myStrings[10];
You would get 10 String objects. At this point, it would be perfectly legal to do something like println(myStrings[5].length); - you'd get '0' - the default constructor for String creates an empty string with length 0.
In Java, when you construct a new array, you get an empty container that can hold 10 String references. So the call:
String[] myStrings = new String[10];
System.out.println(myStrings[5].length());
would throw a null pointer exception, because you haven't actually placed a String reference into the array yet.
If you are coming from a C++ background, think of new String[10] as being equivalent to new (String *)[10] from C++.
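A tiny sketch of that behavior, using a made-up class name NullSlotsDemo: a freshly created String[] holds only null references, and a slot becomes usable once a reference has been stored in it.

```java
final class NullSlotsDemo {
    // A new object array contains null in every slot.
    static boolean freshSlotIsNull() {
        String[] myStrings = new String[10];
        return myStrings[5] == null;
    }

    // After storing a reference, the slot can be dereferenced safely.
    static int lengthAfterAssign() {
        String[] myStrings = new String[10];
        myStrings[5] = "hello";
        return myStrings[5].length();
    }
}
```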
So, with that in mind, it should be fairly clear why ArrayList is the solution for an auto expanding array of objects (and in fact, ArrayList is implemented using simple arrays, with a growth algorithm built in that allocates new expanded arrays as needed and copies the content from the old to the new).
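To illustrate the idea (this is not the JDK's actual ArrayList code), here is a minimal growable array. The class name GrowableArray, the initial capacity of 4 and the doubling policy are all assumptions chosen for simplicity; the real ArrayList uses its own growth rule.

```java
import java.util.Arrays;

// Simplified illustration of an array-backed, auto-expanding container.
final class GrowableArray<T> {
    private Object[] items = new Object[4]; // assumed initial capacity
    private int size;

    void add(T item) {
        if (size == items.length) {
            // Allocate a larger array and copy the old references into it.
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[size++] = item;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return (T) items[index];
    }

    int size() {
        return size;
    }

    int capacity() {
        return items.length;
    }
}
```

Only references are copied when the backing array grows; the stored objects themselves stay where they are.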
In practice, there are actually relatively few situations where we use arrays. If you are writing a container (something akin to ArrayList, or a BTree), then they are useful, or if you are doing a lot of low level byte manipulation - but at the level that most development occurs, using one of the Collections classes is by far the preferred technique.
All the classes implementing Collection are expandable and store only references: you don't store objects, you create them in some data space and only manipulate references to them, until no references to them remain.
You can put a reference to an object in two or more Collections. That's how you can have sorted hash tables and such...
What do you mean by "native" way? If you want an expandable list of objects then you can use the ArrayList. With List collections you have the get(index) method that allows you to access objects in the list by index which gives you similar functionality to an array. Internally the ArrayList is implemented with an array and the ArrayList handles expanding it automatically for you.
Straight from the Array Java Tutorials on the sun webpage:
-> An array is a container object that holds a fixed number of values of a single type.
Because the size of the array is declared when it is created, there is actually no way to expand it afterwards. The whole purpose of declaring an array of a certain size is to only allocate as much memory as will likely be used when the program is executed. What you could do is declare a second array that is a function based on the size of the original, copy all of the original elements into it, and then add the necessary new elements (although this isn't very 'automatic' :) ). Otherwise, as you and a few others have mentioned, the List Collections is the most efficient way to go.
In Java, all object variables are references. So
Foo myFoo = new Foo();
Foo anotherFoo = myFoo;
means that both variables are referring to the same object, not to two separate copies. Likewise, when you put an object in a Collection, you are only storing a reference to the object. Therefore using ArrayList or similar is the correct way to have an automatically expanding piece of storage.
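A short sketch of this, where Foo is a hypothetical class with a single mutable field: a change made through one variable is visible through the other variable and through the list, because all three hold the same reference.

```java
import java.util.ArrayList;
import java.util.List;

final class AliasDemo {
    // Hypothetical mutable class.
    static final class Foo {
        String label = "initial";
    }

    // Mutates the object through one reference and reads it back via the list.
    static String labelSeenThroughList() {
        Foo myFoo = new Foo();
        Foo anotherFoo = myFoo;        // second reference to the same object
        List<Foo> list = new ArrayList<>();
        list.add(myFoo);               // the list stores a reference, not a copy
        anotherFoo.label = "changed";
        return list.get(0).label;      // sees the change
    }
}
```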
There's no first-class language construct that does that that I'm aware of, if that's what you're looking for.
It's not very efficient, but if you're just appending to an array, you can use Apache Commons ArrayUtils.add(). It returns a copy of the original array with the additional element in it.
If you can write your code in JavaScript, yes, you can do that. JavaScript arrays are sparse, so they expand whichever way you want.
You can write:
a[0] = 4;
a[1000] = 434;
a[888] = "a string";