Java: Setting array length for an unknown number of entries

I am trying to fill a RealVector (from Apache Commons Math) with values. I tried using the class's append method, but that didn't actually add anything. So now I'm using a double[], which works fine, except I don't know in advance how big the array needs to be.
private void runAnalysis() throws IllegalArgumentException, IllegalAccessException, InvocationTargetException {
Double attr;
double[] data = new double[100]; // TODO: bad.
int i = 0;
for (Method m : ParseTree.class.getMethods()) {
if (m.isAnnotationPresent(Analyze.class)) {
attr = (Double) m.invoke(this);
analysis.put(m.getAnnotation(Analyze.class).name(), attr);
data[i++] = attr * m.getAnnotation(Analyze.class).weight();
}
}
weightedAnalysis = new ArrayRealVector(data);
}
How can I deal with this issue? Here are my ideas:
1. Iterate through the class and count the methods with the annotation, then use that count to size the array. However, this requires an extra loop, and reflection is performance-intensive. (right?)
2. Pick an arbitrary size for the array and double it if space runs out. Downside: requires more lines of code.
3. Use a List<Double>, then somehow weasel the Double objects back into doubles so they can be put in the RealVector. Downside: uses more memory for the list.
4. Just pick a huge size for the starting array and hope that it never overflows. Downside: this is begging for an ArrayIndexOutOfBoundsException.
Or am I just using append(double d) wrong?
private void runAnalysis() throws IllegalArgumentException, IllegalAccessException, InvocationTargetException {
Double attr;
weightedAnalysis = new ArrayRealVector(data);
for (Method m : ParseTree.class.getMethods()) {
if (m.isAnnotationPresent(Analyze.class)) {
attr = (Double) m.invoke(this);
analysis.put(m.getAnnotation(Analyze.class).name(), attr);
weightedAnalysis.append(attr * m.getAnnotation(Analyze.class).weight());
}
}
}

RealVector.append() doesn't modify the vector, but rather constructs a new vector:
The [Java doc of RealVector.append()](http://commons.apache.org/math/apidocs/org/apache/commons/math/linear/RealVector.html#append(double)) explains:
append
RealVector append(double d)
Construct a vector by appending a double to this vector.
Parameters:
d - double to append.
Returns:
a new vector
Please note that building the vector with RealVector.append() is quite expensive, because every call copies all existing elements into a new vector (i.e. constructing the vector the way you described runs in O(n^2) time).
I would recommend simply using Java's ArrayList<Double> during construction, and then converting to a RealVector or any other data abstraction you like.
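A minimal sketch of that collect-then-convert approach, using only the standard library. The class and method names here are made up for illustration; the ArrayRealVector call is left as a comment because it needs Commons Math on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

final class CollectThenConvert {
    // Unboxes a List<Double> into a primitive array in a single pass.
    static double[] toPrimitive(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        List<Double> weighted = new ArrayList<>();
        // Stand-ins for the attr * weight values computed in the reflection loop.
        weighted.add(0.5 * 2.0);
        weighted.add(1.5 * 4.0);
        double[] data = toPrimitive(weighted);
        // With Commons Math available, the question's own constructor call follows:
        // weightedAnalysis = new ArrayRealVector(data);
        System.out.println(data.length);
    }
}
```

The list grows on its own, and the one conversion at the end is linear in the number of entries.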

Why not use an ArrayList and add the elements to that?

I would suggest option 3 (the List<Double>) as a good choice. Using Double instead of double is only a minor inconvenience now that Java has autoboxing.

Building the vector with RealVector.append() will take a huge amount of memory and computation time, because each call has to be written as:
RealVector newVector = oldVector.append(d);
append() returns a newly constructed object and leaves oldVector untouched, so the result must be kept for the code to be correct.
If you're okay with heavy overhead on build, take a look at ArrayUtils from Apache Commons Lang, specifically add(double[], double) and/or toPrimitive(Double[]).

You mentioned that you tried the append method, but that it didn't actually add anything. According to the javadoc, append returns a new vector, so make sure that you assign its result back to the original variable. You probably already tried this, but just in case you overlooked it:
RealVector myRealVector = new ArrayRealVector(data);
myRealVector = myRealVector.append(1.0);
In other words, this won't change myRealVector:
RealVector myRealVector = new ArrayRealVector(data);
myRealVector.append(1.0);
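The same pitfall can be reproduced without Commons Math. The sketch below uses ImmutableVec, a hypothetical stand-in class (not the real RealVector) that only mimics the documented "append returns a new vector" behaviour:

```java
import java.util.Arrays;

// Hypothetical stand-in, NOT the Commons Math class: it only mimics the
// documented behaviour of append() returning a new vector.
final class ImmutableVec {
    private final double[] data;

    ImmutableVec(double[] data) {
        this.data = data.clone();
    }

    // Copies every existing element into a new, longer array.
    ImmutableVec append(double d) {
        double[] copy = Arrays.copyOf(data, data.length + 1);
        copy[data.length] = d;
        return new ImmutableVec(copy);
    }

    int getDimension() {
        return data.length;
    }

    public static void main(String[] args) {
        ImmutableVec v = new ImmutableVec(new double[0]);
        v.append(1.0);      // result discarded: v is unchanged
        v = v.append(2.0);  // result kept: v now has one element
        System.out.println(v.getDimension());
    }
}
```

Because each append copies the whole array, calling it n times in a loop does on the order of n^2 element copies in total.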

You could initialize the array using
ParseTree.class.getMethods().length
as the initial capacity:
double[] buf = new double[ParseTree.class.getMethods().length];
or better
DoubleBuffer buf = DoubleBuffer.allocate(ParseTree.class.getMethods().length);
This may waste some memory, but it is a safe solution; how much is wasted depends on how many methods pass the if inside the loop.
If you prefer, you can count the annotated methods in advance and then allocate exactly that size for the array.
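A sketch of the upper-bound idea with the unused tail trimmed off afterwards, which avoids the second reflection pass. The Analyze annotation and the Sample class below are hypothetical stand-ins for the question's @Analyze and ParseTree; only weight() is modelled.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.Arrays;

final class TrimToFit {
    // Hypothetical annotation standing in for the question's @Analyze.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Analyze {
        double weight();
    }

    // Hypothetical class standing in for ParseTree.
    static final class Sample {
        @Analyze(weight = 2.0) public Double depth() { return 3.0; }
        @Analyze(weight = 0.5) public Double width() { return 4.0; }
        public Double ignored() { return 99.0; }
    }

    static double[] weightedValues(Object target) {
        Method[] methods = target.getClass().getMethods();
        double[] buf = new double[methods.length]; // upper bound: every method annotated
        int count = 0;
        try {
            for (Method m : methods) {
                Analyze a = m.getAnnotation(Analyze.class);
                if (a != null) {
                    buf[count++] = ((Double) m.invoke(target)) * a.weight();
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return Arrays.copyOf(buf, count); // drop the unused tail
    }

    public static void main(String[] args) {
        System.out.println(weightedValues(new Sample()).length);
    }
}
```

Note that getMethods() makes no promise about ordering, so the positions of the values in the result should not be relied on.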

Related

how can i initialize my array when i cant initialize as null?

i have an array of strings which i want to convert to int, pretty simple and straightforward here is the code :
public static void main(String[] args) {
String myarray[]=readfile("[pathtothefile]");
int mynums[] = new int[myarray.length];
for (int i=0;i<myarray.length;i++){
mynums[i]=Integer.parseInt(myarray[i]);
}
System.out.print(Arrays.toString(mynums));
}
But the problem here is, if I initialize "mynums" like this: mynums[]=null; I get a NullPointerException on the following line:
"mynums[i]=Integer.parseInt(myarray[i]);"
What I have to do to solve it is
int mynums[] = new int[myarray.length];
Here someone explained why it happens, but I don't know how to initialize it now! I mean, sometimes I don't know how big my array can get and I just want to initialize it. Is it even possible?
In Java, every array or object variable is a reference (a pointer behind the scenes). So when you do mynums[]=null, the variable points to nothing. So what is null[i]? That is where your NPE comes from. When you point it to an array instead, you are actually accessing the i'th element of that array.
You have to first initialize the array because it allocates memory depending on the array size. When you want to add for example an integer to an array it writes the int into previously allocated memory.
The memory size won't grow as you add more items (unless you use Lists, HashMaps and so on, but that is not true for plain arrays).
If you don't know how big your array will be, consider using a growable collection. On Android there is SparseIntArray, which maps int keys to int values and grows as you add items; in plain Java, an ArrayList does the job.
Briefly, in java an array is an object, thus you need to treat it like an object and initialize it prior to doing anything with it.
Here's an idea. When you're initializing something as null, you're simply declaring that it exists. For example ... if I told you that there is a dog, but I told you nothing about it ... I didn't tell you where it was, how tall it was, how old, male/female, etc ... I told you none of its properties or how to access it, and all I told you was that there IS a dog (whose name is Array, for sake of argument), then that would be all you know. There's a dog whose name is Array and that is it.
Typically, arrays are used when the size is already known and generally the data is meant to be immutable. For data that are meant to be changed, you should use things like ArrayList. These are intended to be changed at will; you can add/remove elements at a whim. For more information about ArrayList, read up on the links posted above.
Now, as for your code:
public static void main(String[] args) {
ArrayList<Integer> myInts = new ArrayList<Integer>();
// define a new, empty ArrayList of Integers (generics cannot take the primitive int).
// I'm going to assume that readfile() is a way for you get the file
// into myarray. I'm not quite sure why you would need the [], but I'll
// leave it.
String myarray[] = readfile("[pathtothefile]");
for (int i = 0; i < myarray.length; i++) {
//adds the value you've specifed as an integer to the arraylist.
myInts.add(Integer.parseInt(myarray[i]));
}
for (int i = 0; i < myInts.size(); i++) {
//print the integers
System.out.print(Integer.toString(myInts.get(i)));
}
}
What if you don't use an array but an ArrayList? It grows dynamically as you add elements.
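If a primitive int[] is still needed at the end, one possible sketch is to collect into an ArrayList<Integer> first and convert once the final size is known. The class name and the rule of skipping blank tokens are assumptions made for illustration, so that the result size is not known up front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ParseInts {
    static int[] parseAll(String[] tokens) {
        List<Integer> parsed = new ArrayList<>(); // grows as needed
        for (String token : tokens) {
            String trimmed = token.trim();
            if (trimmed.isEmpty()) {
                continue; // skipped entries are why the final size is unknown
            }
            parsed.add(Integer.parseInt(trimmed));
        }
        // Unbox into a primitive array once the final size is known.
        return parsed.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] nums = parseAll(new String[] {"4", " 8", "", "15"});
        System.out.println(Arrays.toString(nums));
    }
}
```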

Pass zero-sized array, save allocation?

In this code sample from page 114 of The Well-Grounded Java Developer, the last line:
Update[] updates = lu.toArray(new Update[0]);
contains the note: Pass zero-sized array, save allocation
List<Update> lu = new ArrayList<Update>();
String text = "";
final Update.Builder ub = new Update.Builder();
final Author a = new Author("Tallulah");
for (int i=0; i<256; i++) {
text = text + "X";
long now = System.currentTimeMillis();
lu.add(ub.author(a).updateText(text).createTime(now).build());
try {
Thread.sleep(1);
} catch (InterruptedException e) {
}
}
Collections.shuffle(lu);
Update[] updates = lu.toArray(new Update[0]);
What allocation is this saving, exactly?
The javadoc for List#toArray(T[] a) mentions:
If the list fits in the specified array, it is returned therein.
Otherwise, a new array is allocated with the runtime type of the
specified array and the size of this list.
Which is what I remembered: if the array you pass to toArray(T[] a) can't fit everything in the list, a new array is allocated. Plainly, there are 256 elements in the list, which cannot fit in an array of size 0, therefore a new array must be allocated inside the method, right?
So is that note incorrect? Or is there something else it means?
Plainly, there are 256 elements in the list, which cannot fit in an array of size 0, therefore a new array must be allocated inside the method, right?
yes.
You can use
private static final Update[] NO_UPDATES = { };
lu.toArray(NO_UPDATES);
however, this should only help if you expect the list to typically have length 0.
Generally, I would use the same approach as fge:
lu.toArray(new Update[lu.size()]);
In your specific case you know the size in advance so you can do
Update[] updates = new Update[256];
String text = "";
final Update.Builder ub = new Update.Builder();
final Author a = new Author("Tallulah");
long now = System.currentTimeMillis();
for (int i=0; i<updates.length; i++)
updates[i] = ub.author(a).updateText(text += 'X').createTime(now++).build();
Collections.shuffle(Arrays.asList(updates));
Going off of @Andreas's comment on the question, I think it is a typo, and should say:
Pass zero-sized array, safe allocation.
Because if you passed nothing to the method, you'll end up calling the List#toArray() no-argument overload!
This would return an Object[] (though it would contain nothing but Update instances) and would require changing the type of the updates variable, so the last line would become:
Object[] updates = lu.toArray();
And then every time you wanted to iterate over and use the elements in that array, you'd have to cast them to Update.
Supplying the array calls the List#toArray(T[] a) method, which returns a <T> T[]. This array is reified to know it is an array of Update instances.
So supplying an empty array of Updates results in an Update[] coming back from the toArray call, not an Object[]. This is a much more type-safe allocation! The word "save" in the note must be a typo!
...this consumed way too much mental effort. Will post link to this in the book's forums so they can correct it.
It saves allocation compared to toArray(new Update[255]) or toArray(new Update[1000]), where the array passed in is the wrong size.
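The three call shapes discussed in these answers can be compared side by side. This is a small sketch with String elements instead of Update; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ToArrayDemo {
    static String[] viaEmpty(List<String> list) {
        // Too small to hold the elements, so toArray allocates a String[]
        // of exactly list.size() and returns that instead.
        return list.toArray(new String[0]);
    }

    static String[] viaPresized(String[] target, List<String> list) {
        // If target fits the list, the very same array is returned.
        return list.toArray(target);
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Object[] untyped = list.toArray(); // no-argument overload: Object[]
        System.out.println(untyped.getClass().getSimpleName());
        System.out.println(viaEmpty(list).length);
        String[] target = new String[list.size()];
        System.out.println(viaPresized(target, list) == target);
    }
}
```

Either typed variant allocates one array of the right size in total, apart from the empty throwaway in the first; only a wrongly sized non-empty argument wastes a real allocation.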

How to get an objects string and add this to a string array

I have an ArrayList of my own class Case. The class Case provides the method getCaseNumber(). I want to add the case number of every Case to a String[] caseNumber. I've tried this:
public String[] getCaseNumberToTempList(ArrayList<Case> caseList) {
String[] objectCaseNumber = null;
for(int i = 0; i < caseList.size(); i++) {
objectCaseNumber[i] = caseList.get(i).getCaseNumber();
}
return objectCaseNumber;
}
But my compiler complains that objectCaseNumber is null at that point inside the for-loop. How can I get this to work?
Well, you need to create an array to start with, and initialize the variable with a reference to the array. (See the Java tutorial for arrays for more information.) For example:
String[] objectCaseNumber = new String[caseList.size()];
Alternatively, build a List<String> (e.g. using ArrayList) instead. That's more flexible - in this case it's simple as you know the size up front, but in other cases being able to just add to a list makes life a lot simpler.
In idiomatic Java, you wouldn't use ArrayList as a parameter type. Use List.
Slightly more overhead, but simpler and more readable, is to accumulate in another List and then convert it into an array:
public String[] getCaseNumberToTempList(List<Case> caseList) {
final List<String> r = new ArrayList<String>();
for (Case c : caseList) r.add(c.getCaseNumber());
return r.toArray(new String[0]); // element type must be String, not Case
}
In your code it does make sense to insist on ArrayList due to performance implications of random access via get, but if you use this kind of code (and I suggest making a habit of it), then you can work with any List with the same results.
Well, I think you may have mistaken arrays for a primitive type. Arrays in Java are objects, and they need to be initialized before you access them.
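On Java 8 or later, the same mapping can also be written with a stream, which sizes the result array itself. This is a sketch only; the nested Case class is a hypothetical minimal stand-in for the question's class.

```java
import java.util.Arrays;
import java.util.List;

final class CaseNumbers {
    // Hypothetical minimal stand-in for the question's Case class.
    static final class Case {
        private final String caseNumber;

        Case(String caseNumber) {
            this.caseNumber = caseNumber;
        }

        String getCaseNumber() {
            return caseNumber;
        }
    }

    static String[] caseNumbers(List<Case> cases) {
        // The stream allocates the String[] at the right size.
        return cases.stream().map(Case::getCaseNumber).toArray(String[]::new);
    }

    public static void main(String[] args) {
        List<Case> cases = Arrays.asList(new Case("A-1"), new Case("B-2"));
        System.out.println(Arrays.toString(caseNumbers(cases)));
    }
}
```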

Serializable, cloneable and memory use in Java

I am using an inner class that is a subclass of a HashMap. I have a String as the key and double[] as the values. I store about 200 doubles per double[]. I should be using around 700 MB to store the keys, the pointers and the doubles. However, memory analysis reveals that I need a lot more than that (a little over 2 GB).
Using TIJmp (profiling tool) I saw there was a char[] that was using almost half of the total memory. TIJmp said that char[] came from Serializable and Cloneable. The values in it ranged from a list of fonts and default paths to messages and single characters.
What is the exact behavior of Serializable in the JVM? Is it keeping a "persistent" copy at all times thus, doubling the size of my memory footprint? How can I write binary copies of an object at runtime without turning the JVM into a memory hog?
PS: The method where the memory consumption increases the most is the one below. The file has around 229,000 lines and 202 fields per line.
public void readThetas(String filename) throws Exception
{
long t1 = System.currentTimeMillis();
documents = new HashMapX<String,double[]>(); //Document names to indices.
Scanner s = new Scanner(new File(filename));
int docIndex = 0;
if (s.hasNextLine())
System.out.println(s.nextLine()); // Consume useless first line :)
while(s.hasNextLine())
{
String[] fields = s.nextLine().split("\\s+");
String docName = fields[1];
numTopics = fields.length/2-1;
double[] thetas = new double[numTopics];
for (int i=2;i<numTopics;i=i+2)
thetas[Integer.valueOf(fields[i].trim())] = Double.valueOf(fields[i+1].trim());
documents.put(docName,thetas);
docIndex++;
if (docIndex%10000==0)
System.out.print("*"); //progress bar ;)
}
s.close();
long t2 = System.currentTimeMillis();
System.out.println("\nRead file in "+ (t2-t1) +" ms");
}
Oh!, and HashMapX is an inner class declared like this:
public static class HashMapX< K, V> extends HashMap<K,V> {
public V get(Object key, V altVal) {
if (this.containsKey(key))
return this.get(key);
else
return altVal;
}
}
This may not address all of your questions, but is a way in which serialization can significantly increase memory usage: http://java.sun.com/javase/technologies/core/basic/serializationFAQ.jsp#OutOfMemoryError.
In short, if you keep an ObjectOutputStream open then none of the objects that have been written to it can be garbage-collected unless you explicitly call its reset() method.
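A sketch of what calling reset() on a long-lived stream might look like. The class name, the resetEvery parameter, and the in-memory sink are assumptions made for illustration; a real program would more likely write to a file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

final class ResetWhileWriting {
    // Writes every record, clearing the stream's table of already-written
    // objects after each batch of resetEvery records.
    static byte[] writeAll(double[][] records, int resetEvery) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            for (int i = 0; i < records.length; i++) {
                out.writeObject(records[i]);
                if ((i + 1) % resetEvery == 0) {
                    out.reset(); // the stream no longer pins the written objects
                }
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    // Reads count arrays back and returns the total number of doubles seen.
    static int readBack(byte[] bytes, int count) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            int elements = 0;
            for (int i = 0; i < count; i++) {
                elements += ((double[]) in.readObject()).length;
            }
            return elements;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = writeAll(new double[3][2], 2);
        System.out.println(readBack(bytes, 3));
    }
}
```

The trade-off is that after a reset, an object written again is serialized in full rather than as a short back-reference.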
So, I found the answer. It is a memory leak in my code. Had nothing to do with Serializable or Cloneable.
This code is trying to parse a file. Each line contains a set of values which I am trying to extract. Then, I keep some of those values and store them in a HashMapX or some other structure.
The core of the problem is here:
String[] fields = s.nextLine().split("\\s+");
String docName = fields[1];
and I propagate it here:
documents.put(docName,thetas);
What happens is that docName is one of the strings produced by split, and I keep a reference to it for the life of the program (by storing it in the global HashMap documents). On older JVMs (before Java 7u6), a string produced by substring or split shares the char[] of the string it was cut from, so as long as that reference stays alive, the characters of the whole line cannot be garbage collected. The solution:
String docName = new String(fields[1]); // A copy, not a reference.
This copies only the characters of that one field and drops the link to the line's char[]. In this way, the garbage collector can free the rest of the line once I have processed every field.
I hope this will be useful to all of those who parse large text files using split and store some of the fields in global variables.
Thanks everybody for their comments. They guided me in the right direction.

Java: Serializing unknown Arraysize

If I save an array and reload it, is there a way to get its size if it is unknown?
Thanks
What do you mean by "unknown"? You can get the length of any java array with the length field.
int[] myArray = deserializeSomeArray();
int size = myArray.length;
It sounds like you're serializing and storing the individual objects in the array (after much reading between the lines). Use the ObjectOutputStream to store the array itself. If the objects stored in the array are serializable, they'll be stored too. When you deserialize you'll get the entire array back intact.
I think you need to supply some more information. How are you saving the array? Using an ObjectOutputStream?
No, because the length of the array is just the size of the memory allocated divided by the size of the object stored in it, and since no objects have a size of 0, you will always have a proper length (which could be 0).
If you use ObjectInputStream.readObject() to read the saved array, it will be reconstituted with the proper length and you can just read the size with array.length.
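A short round-trip sketch of that behaviour, serializing to an in-memory buffer. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

final class ArrayRoundTrip {
    static int[] roundTrip(int[] original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original); // the array itself, not its elements
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (int[]) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] restored = roundTrip(new int[] {7, 8, 9});
        System.out.println(restored.length); // length travels with the array
    }
}
```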
Attempting to read between the lines...
If you are actually reading an array, then (unlike C) all arrays know their length. Java is a safe language, so the length is necessary for bounds checking.
MyType[] things = (MyType[])in.readObject();
int len = things.length;
Perhaps your difficulty is that you are doing custom (de)serialisation and are writing out the individual elements of the array (hint: don't; write the array itself). In that case you need to catch OptionalDataException to detect the end of the enclosing object's custom data:
private static final MyType[] NOTHING = new MyType[0];
private transient MyType[] things = NOTHING;
private void writeObject(ObjectOutputStream out) throws IOException {
out.defaultWriteObject(); // Do not forget this call!
for (MyType thing : things) {
out.writeObject(thing);
}
}
private void readObject(
ObjectInputStream in
) throws IOException, ClassNotFoundException {
in.defaultReadObject(); // Do not forget this call!
List<MyType> things = new ArrayList<MyType>();
try {
for (;;) {
things.add((MyType) in.readObject());
}
} catch (OptionalDataException exc) {
// Okay - end of custom data.
}
this.things = things.toArray(NOTHING);
}
If you are going to do that sort of thing, it's much better to write out the number of objects you are going to read as an int before the actual data.
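A sketch of that count-first variant. Basket is a hypothetical class used only for illustration, with String elements in place of MyType; the copyViaSerialization helper exists just to exercise the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical example class, not taken from the original answer.
final class Basket implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient String[] things;

    Basket(String... things) {
        this.things = things.clone();
    }

    int size() {
        return things.length;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(things.length); // count first, so the reader knows when to stop
        for (String thing : things) {
            out.writeObject(thing);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int count = in.readInt();
        things = new String[count];
        for (int i = 0; i < count; i++) {
            things[i] = (String) in.readObject();
        }
    }

    static Basket copyViaSerialization(Basket original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Basket) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyViaSerialization(new Basket("a", "b", "c")).size());
    }
}
```

With the count written up front, the reader needs no exception to find the end of the data and can allocate the array at its final size straight away.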
