Java: Serializing unknown Arraysize

If I save an array and reload it, is there a way to get its size if it's unknown?
Thanks

What do you mean by "unknown"? You can get the length of any java array with the length field.
int[] myArray = deserializeSomeArray();
int size = myArray.length;

It sounds like you're serializing and storing the individual objects in the array (after much reading between the lines). Use the ObjectOutputStream to store the array itself. If the objects stored in the array are serializable, they'll be stored too. When you deserialize you'll get the entire array back intact.

I think you need to supply some more information. How are you saving the array? Using an ObjectOutputStream?

No special handling is needed: every Java array stores its own length alongside its elements, so you will always have a proper length (which could be 0).

If you use ObjectInputStream.readObject() to read the saved array, it will be reconstituted with the proper length and you can just read the size with array.length.
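
As a minimal sketch of that round trip (the class and method names here are made up for illustration, and an in-memory buffer stands in for whatever file or socket you actually use), writing the array as a single object and reading it back could look like this:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

class ArrayRoundTrip {
    // Serializes the whole array as one object into an in-memory buffer.
    static byte[] save(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(values);
        }
        return bytes.toByteArray();
    }

    // Reads the array back; its length is restored along with its contents.
    static int[] load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (int[]) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] restored = load(save(new int[] {3, 1, 4, 1, 5}));
        System.out.println(restored.length); // prints 5
    }
}
```

The reader never has to be told the size: it is part of the serialized array.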

Attempting to read between the lines...
If you are actually reading an array, then (unlike C) all arrays know their length. Java is a safe language, so the length is necessary for bounds checking.
MyType[] things = (MyType[])in.readObject();
int len = things.length;
Perhaps your difficulty is that you are doing custom (de)serialisation and are writing out individual elements of the array (hint: don't - use an array). In that case you need to catch OptionalDataException to detect the end of the enclosing object's custom data:
private static final MyType[] NOTHING = new MyType[0];
private transient MyType[] things = NOTHING;
private void writeObject(ObjectOutputStream out) throws IOException {
out.defaultWriteObject(); // Do not forget this call!
for (MyType thing : things) {
out.writeObject(thing);
}
}
private void readObject(
ObjectInputStream in
) throws IOException, ClassNotFoundException {
in.defaultReadObject(); // Do not forget this call!
List<MyType> things = new ArrayList<MyType>();
try {
for (;;) {
things.add((MyType) in.readObject());
}
} catch (OptionalDataException exc) {
// Okay - end of custom data.
}
this.things = things.toArray(NOTHING);
}
If you are going to do that sort of thing, it's much better to write out the number of objects you are going to read as an int before the actual data.
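
A rough sketch of that count-first variant follows. Basket, its String elements, and the roundTrip helper are hypothetical names chosen for this example, not anything from the question:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Basket implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient String[] items;

    Basket(String... items) { this.items = items.clone(); }

    int size() { return items.length; }
    String get(int i) { return items[i]; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(items.length);          // element count goes first
        for (String item : items) {
            out.writeObject(item);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int count = in.readInt();            // the reader now knows how many to expect
        items = new String[count];
        for (int i = 0; i < count; i++) {
            items[i] = (String) in.readObject();
        }
    }

    // Helper for the example: serialize to memory and read back.
    static Basket roundTrip(Basket b) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(b);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Basket) in.readObject();
        }
    }
}
```

With the count written up front, the read loop is an ordinary for loop and no exception is needed to detect the end of the data.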

Related

How can I initialize my array when I can't initialize it as null?

I have an array of strings which I want to convert to int. Pretty simple and straightforward; here is the code:
public static void main(String[] args) {
String myarray[]=readfile("[pathtothefile]");
int mynums[] = new int[myarray.length];
for (int i=0;i<myarray.length;i++){
mynums[i]=Integer.parseInt(myarray[i]);
}
System.out.print(Arrays.toString(mynums));
}
But the problem is that if I initialize "mynums" like this: mynums[]=null; I get a NullPointerException on the following line:
"mynums[i]=Integer.parseInt(myarray[i]);"
What I have to do to solve it is
int mynums[] = new int[myarray.length];
Someone here explained why it happens, but I don't know how to initialize it now! Sometimes I don't know how big my array can get and I just want to initialize it. Is that even possible?

In Java every variable of an object or array type is a reference (a pointer behind the scenes). So when you do mynums[]=null, the variable points to nothing. So what is null[i]? That is where your NPE comes from. When you point it to an array instead, you are actually accessing the i'th element of that array.

You have to initialize the array first because that is what allocates its memory, and the amount depends on the array size. When you add, for example, an integer to an array, it is written into that previously allocated memory.
The memory won't grow as you add more items (unless you use Lists or HashMaps, ..., but that is not true for plain arrays).
If you don't know how big your array will be, consider a growable collection such as ArrayList<Integer>, or, on Android, SparseIntArray (which maps int keys to int values). These grow as you add items.

Briefly, in java an array is an object, thus you need to treat it like an object and initialize it prior to doing anything with it.

Here's an idea. When you're initializing something as null, you're simply declaring that it exists. For example ... if I told you that there is a dog, but I told you nothing about it ... I didn't tell you where it was, how tall it was, how old, male/female, etc ... I told you none of its properties or how to access it, and all I told you was that there IS a dog (whose name is Array, for sake of argument), then that would be all you know. There's a dog whose name is Array and that is it.
Typically, arrays are used when the size is already known and generally the data is meant to be immutable. For data that are meant to be changed, you should use things like ArrayList. These are intended to be changed at will; you can add/remove elements at a whim. For more information about ArrayList, read up on the links posted above.
Now, as for your code:
public static void main(String[] args) {
ArrayList<Integer> myInts = new ArrayList<Integer>();
// define a new, empty ArrayList of integers (generics need Integer, not int).
// I'm going to assume that readfile() is a way for you get the file
// into myarray. I'm not quite sure why you would need the [], but I'll
// leave it.
String myarray[] = readfile("[pathtothefile]");
for (int i = 0; i < myarray.length; i++) {
//adds the value you've specifed as an integer to the arraylist.
myInts.add(Integer.parseInt(myarray[i]));
}
for (int i = 0; i < myInts.size(); i++) {
//print the integers
System.out.print(Integer.toString(myInts.get(i)));
}
}

What if you don't use an array but an ArrayList? It grows dynamically as you add elements.
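
A small sketch of that idea, assuming the input is already available as strings (GrowableInts and parseAll are illustrative names): collect into a list while the size is unknown, then copy into an int[] once it is known.

```java
import java.util.ArrayList;
import java.util.List;

class GrowableInts {
    // Parses each token and collects the results; the list grows as needed.
    static int[] parseAll(String[] tokens) {
        List<Integer> parsed = new ArrayList<>();
        for (String token : tokens) {
            parsed.add(Integer.parseInt(token.trim()));
        }
        // Copy into a primitive array now that the final size is known.
        int[] result = new int[parsed.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = parsed.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] nums = parseAll(new String[] {"1", " 20", "300"});
        System.out.println(nums.length); // prints 3
    }
}
```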

Pass zero-sized array, save allocation?

In this code sample from page 114 of The Well-Grounded Java Developer, the last line:
Update[] updates = lu.toArray(new Update[0]);
contains the note: Pass zero-sized array, save allocation
List<Update> lu = new ArrayList<Update>();
String text = "";
final Update.Builder ub = new Update.Builder();
final Author a = new Author("Tallulah");
for (int i=0; i<256; i++) {
text = text + "X";
long now = System.currentTimeMillis();
lu.add(ub.author(a).updateText(text).createTime(now).build());
try {
Thread.sleep(1);
} catch (InterruptedException e) {
}
}
Collections.shuffle(lu);
Update[] updates = lu.toArray(new Update[0]);
What allocation is this saving, exactly?
The javadoc for List#toArray(T[] a) mentions:
If the list fits in the specified array, it is returned therein.
Otherwise, a new array is allocated with the runtime type of the
specified array and the size of this list.
Which is what I remembered: if the array you pass to toArray(T[] a) can't fit everything in the list, a new array is allocated. Plainly, there are 256 elements in the list, which cannot fit in an array of size 0, therefore a new array must be allocated inside the method, right?
So is that note incorrect? Or is there something else it means?

Plainly, there are 256 elements in the list, which cannot fit in an array of size 0, therefore a new array must be allocated inside the method, right?
yes.
You can use
private static final Update[] NO_UPDATES = { };
lu.toArray(NO_UPDATES);
However, this only helps if you expect the list to typically have length 0.
Generally, I would use the same approach as fge:
lu.toArray(new Update[lu.size()]);
In your specific case you know the size in advance so you can do
Update[] updates = new Update[256];
String text = "";
final Update.Builder ub = new Update.Builder();
final Author a = new Author("Tallulah");
long now = System.currentTimeMillis();
for (int i=0; i<updates.length; i++)
updates[i] = ub.author(a).updateText(text += 'X').createTime(now++).build();
Collections.shuffle(Arrays.asList(updates));
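
To see the javadoc's rule in action, here is a small check using String instead of Update (ToArrayDemo and reusesArgument are names invented for this sketch). It reports whether toArray handed back the very array it was given:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ToArrayDemo {
    // True when toArray returned the array that was passed in,
    // false when it had to allocate a new one.
    static boolean reusesArgument(List<String> list, String[] target) {
        return list.toArray(target) == target;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        // Too small: a fresh String[3] is allocated inside toArray.
        System.out.println(reusesArgument(list, new String[0]));           // prints false
        // Exactly big enough: the argument itself is filled and returned.
        System.out.println(reusesArgument(list, new String[list.size()])); // prints true
    }
}
```

Either way one array of the list's size ends up allocated; the zero-length version just adds a tiny throwaway array on top.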

Going off of @Andreas's comment on the question, I think it is a typo, and should say:
Pass zero-sized array, safe allocation.
Because if you passed nothing to the method, you'll end up calling the List#toArray() no-argument overload!
This would return an Object[] (though it would contain nothing but Update instances) and would require changing the type of the updates variable, so the last line would become:
Object[] updates = lu.toArray();
And then every time you wanted to iterate over and use the elements in that array, you'd have to cast them to Update.
Supplying the array calls the List#toArray(T[] a) method, which returns a <T> T[]. This array is reified to know it is an array of Update instances.
So supplying an empty array of Updates results in an Update[] coming back from the toArray call, not an Object[]. This is a much more type-safe allocation! The word "save" in the note must be a typo!
...this consumed way too much mental effort. Will post link to this in the book's forums so they can correct it.

It saves allocation compared to toArray(new Update[255]) or toArray(new Update[1000]).

Serializable, cloneable and memory use in Java

I am using an inner class that is a subclass of a HashMap. I have a String as the key and double[] as the values. I store about 200 doubles per double[]. I should be using around 700 MB to store the keys, the pointers and the doubles. However, memory analysis reveals that I need a lot more than that (a little over 2 GB).
Using TIJmp (profiling tool) I saw there was a char[] that was using almost half of the total memory. TIJmp said that char[] came from Serializable and Cloneable. The values in it ranged from a list of fonts and default paths to messages and single characters.
What is the exact behavior of Serializable in the JVM? Is it keeping a "persistent" copy at all times thus, doubling the size of my memory footprint? How can I write binary copies of an object at runtime without turning the JVM into a memory hog?
PS: The method where the memory consumption increases the most is the one below. The file has around 229,000 lines and 202 fields per line.
public void readThetas(String filename) throws Exception
{
long t1 = System.currentTimeMillis();
documents = new HashMapX<String,double[]>(); //Document names to indices.
Scanner s = new Scanner(new File(filename));
int docIndex = 0;
if (s.hasNextLine())
System.out.println(s.nextLine()); // Consume useless first line :)
while(s.hasNextLine())
{
String[] fields = s.nextLine().split("\\s+");
String docName = fields[1];
numTopics = fields.length/2-1;
double[] thetas = new double[numTopics];
for (int i=2;i<numTopics;i=i+2)
thetas[Integer.valueOf(fields[i].trim())] = Double.valueOf(fields[i+1].trim());
documents.put(docName,thetas);
docIndex++;
if (docIndex%10000==0)
System.out.print("*"); //progress bar ;)
}
s.close();
long t2 = System.currentTimeMillis();
System.out.println("\nRead file in "+ (t2-t1) +" ms");
}
Oh!, and HashMapX is an inner class declared like this:
public static class HashMapX< K, V> extends HashMap<K,V> {
public V get(Object key, V altVal) {
if (this.containsKey(key))
return this.get(key);
else
return altVal;
}
}

This may not address all of your questions, but is a way in which serialization can significantly increase memory usage: http://java.sun.com/javase/technologies/core/basic/serializationFAQ.jsp#OutOfMemoryError.
In short, if you keep an ObjectOutputStream open then none of the objects that have been written to it can be garbage-collected unless you explicitly call its reset() method.
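
A hedged sketch of what calling reset() looks like in practice. ResetDemo, writeRows and readRows are illustrative names, and the in-memory buffer stands in for a long-lived stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class ResetDemo {
    static byte[] writeRows(List<double[]> rows) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(rows.size());
            for (double[] row : rows) {
                out.writeObject(row);
                // Drop the stream's table of already-written objects so the
                // stream itself no longer holds references to them.
                out.reset();
            }
        }
        return bytes.toByteArray();
    }

    static List<double[]> readRows(byte[] data) throws IOException, ClassNotFoundException {
        List<double[]> rows = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                rows.add((double[]) in.readObject());
            }
        }
        return rows;
    }
}
```

The trade-off is that after reset() the stream forgets shared references, so an object written twice is serialized twice.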

So, I found the answer. It is a memory leak in my code. Had nothing to do with Serializable or Cloneable.
This code is trying to parse a file. Each line contains a set of values which I am trying to extract. Then, I keep some of those values and store them in a HashMapX or some other structure.
The core of the problem is here:
String[] fields = s.nextLine().split("\\s+");
String docName = fields[1];
and I propagate it here:
documents.put(docName,thetas);
What happens is that docName is a substring of the line, produced by split(). On older JVMs (before Java 7u6, where String.substring shared its parent's backing char[]), such a substring keeps the whole line's char[] reachable. Because I keep docName for the life of the program (by storing it in the global HashMap documents), the characters of every line stay in memory even though I only need one field. The solution:
String docName = new String(fields[1]); // A copy with its own char[], not a view of the line.
This copies just the characters of that field and releases the reference to the line's char[]. In this way, the garbage collector can free the memory used by each line once I have processed its fields.
I hope this will be useful to all of those who parse large text files using split and store some of the fields in global variables.
Thanks everybody for their comments. They guided me in the right direction.

Java: Setting array length for an unknown number of entries

I am trying to fill a RealVector (from Apache Commons Math) with values. I tried using the class's append method, but that didn't actually add anything. So now I'm using a double[], which works fine, except I don't know in advance how big the array needs to be.
private void runAnalysis() throws IllegalArgumentException, IllegalAccessException, InvocationTargetException {
Double attr;
double[] data = new double[100]; // TODO: bad.
int i = 0;
for (Method m : ParseTree.class.getMethods()) {
if (m.isAnnotationPresent(Analyze.class)) {
attr = (Double) m.invoke(this);
analysis.put(m.getAnnotation(Analyze.class).name(), attr);
data[i++] = attr * m.getAnnotation(Analyze.class).weight();
}
}
weightedAnalysis = new ArrayRealVector(data);
}
How can I deal with this issue? Here are my ideas:
1. Iterate through the class and count the methods with the annotation, then use that size to initialize the array. However, this will require an extra loop, and reflection is performance-intensive. (right?)
2. Pick an arbitrary size for the array, doubling it if space runs out. Downside: requires more lines of code.
3. Use a List<Double>, then somehow weasel the Double objects back into doubles so they can be put in the RealVector. Uses more memory for the list.
4. Just pick a huge size for the starting array, and hope that it never overflows. Downside: this is begging for ArrayIndexOutOfBoundsException errors.
Or am I just using append(double d) wrong?
private void runAnalysis() throws IllegalArgumentException, IllegalAccessException, InvocationTargetException {
Double attr;
weightedAnalysis = new ArrayRealVector(data);
for (Method m : ParseTree.class.getMethods()) {
if (m.isAnnotationPresent(Analyze.class)) {
attr = (Double) m.invoke(this);
analysis.put(m.getAnnotation(Analyze.class).name(), attr);
weightedAnalysis.append(attr * m.getAnnotation(Analyze.class).weight());
}
}
}

RealVector.append() doesn't modify the vector, but rather constructs a new vector:
The [Java doc of RealVector.append()](http://commons.apache.org/math/apidocs/org/apache/commons/math/linear/RealVector.html#append(double)) explains:
append
RealVector append(double d)
Construct a vector by appending a double to this vector.
Parameters:
d - double to append.
Returns:
a new vector
Please note that using RealVector to construct the vector is quite an expensive operation, as append() would need to copy the elements over and over (i.e. constructing the array in the way you explained runs in O(n^2) time).
I would recommend simply using java's ArrayList<Double> during construction, and then simply converting to RealVector or any other data abstraction you like.
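
One possible way to do that conversion, sketched with Java 8 streams (an assumption; the original thread predates them) and a made-up helper name. The resulting double[] can then be handed to new ArrayRealVector(double[]), the constructor the question already uses:

```java
import java.util.ArrayList;
import java.util.List;

class DoubleCollector {
    // Unboxes a List<Double> into a double[] of exactly the right size.
    static double[] toPrimitive(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        List<Double> collected = new ArrayList<>();
        collected.add(1.5);   // appending is amortized O(1), unlike RealVector.append
        collected.add(2.5);
        double[] data = toPrimitive(collected);
        System.out.println(data.length); // prints 2
    }
}
```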

Why not use an ArrayList and add the elements to that?

I would suggest 3 as a good option. Using Double vs double is a minimal problem since autoboxing was introduced.

Using RealVector will take a huge amount of memory and computation time to build, because what you want is:
RealVector newVector = oldVector.append(d);
append() returns a newly constructed object, which is what you'd want for correctness.
If you're okay with heavy overhead on build, take a look at Apache Commons ArrayUtils, specifically add(double[], double) and/or toPrimitive(Double[]).

You mentioned that you tried the append method, but that didn't actually add anything. After looking at the javadoc, make sure that you assign the result of the append method back to the original value...You probably already tried this, but just in case you overlooked:
RealVector myRealVector = new ArrayRealVector(data);
myRealVector = myRealVector.append(1.0);
in other words, this won't change myRealVector:
RealVector myRealVector = new ArrayRealVector(data);
myRealVector.append(1.0);

You could initialize the array using
ParseTree.class.getMethods().length
as the initial capacity:
double[] buf = new double[ParseTree.class.getMethods().length];
or better
DoubleBuffer buf = DoubleBuffer.allocate(ParseTree.class.getMethods().length);
This may waste some memory but is a safe solution; how much it wastes depends on how many methods pass the if inside the loop.
If you prefer, you can count how many methods are annotated in advance and then allocate the exact size for the array.
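
A sketch of the "allocate the upper bound, then trim" idea. The non-negative filter below is only a stand-in for the annotation check in the question, and TrimDemo is an invented name:

```java
import java.util.Arrays;

class TrimDemo {
    // Keeps the values that pass the filter, using an array sized for the worst case.
    static double[] keepNonNegative(double[] candidates) {
        double[] buf = new double[candidates.length]; // upper bound: every candidate passes
        int count = 0;
        for (double c : candidates) {
            if (c >= 0) {              // placeholder for isAnnotationPresent(...)
                buf[count++] = c;
            }
        }
        // Copy only the filled prefix, so the result has the exact size.
        return Arrays.copyOf(buf, count);
    }

    public static void main(String[] args) {
        double[] kept = keepNonNegative(new double[] {1.0, -2.0, 3.0});
        System.out.println(kept.length); // prints 2
    }
}
```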

Grab a segment of an array in Java without creating a new array on heap

I'm looking for a method in Java that will return a segment of an array. An example would be to get the byte array containing the 4th and 5th bytes of a byte array. I don't want to have to create a new byte array in the heap memory just to do that. Right now I have the following code:
doSomethingWithTwoBytes(byte[] twoByteArray);
void someMethod(byte[] bigArray)
{
byte[] x = {bigArray[4], bigArray[5]};
doSomethingWithTwoBytes(x);
}
I'd like to know if there was a way to just do doSomething(bigArray.getSubArray(4, 2)) where 4 is the offset and 2 is the length, for example.

Disclaimer: This answer does not conform to the constraints of the question:
I don't want to have to create a new byte array in the heap memory just to do that.
(Honestly, I feel my answer is worthy of deletion. The answer by @unique72 is correct. Imma let this edit sit for a bit and then I shall delete this answer.)
I don't know of a way to do this directly with arrays without additional heap allocation, but the other answers using a sub-list wrapper have additional allocation for the wrapper only – but not the array – which would be useful in the case of a large array.
That said, if one is looking for brevity, the utility method Arrays.copyOfRange() was introduced in Java 6 (late 2006?):
byte [] a = new byte [] {0, 1, 2, 3, 4, 5, 6, 7};
// get a[4], a[5]
byte [] subArray = Arrays.copyOfRange(a, 4, 6);

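Arrays.asList(myArray) delegates to new ArrayList(myArray) (a private class nested in java.util.Arrays, not java.util.ArrayList), which doesn't copy the array but just stores the reference. Using List.subList(start, end) after that makes a SubList which just references the original list (which still just references the array). No copying of the array or its contents, just wrapper creation, and all lists involved are backed by the original array. (I thought it'd be heavier.) Note that this only works for object arrays: for a byte[], Arrays.asList produces a single-element List<byte[]>, so you would need a Byte[].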

If you're seeking a pointer style aliasing approach, so that you don't even need to allocate space and copy the data then I believe you're out of luck.
System.arraycopy() will copy from your source to destination, and efficiency is claimed for this utility. You do need to allocate the destination array.

One way is to wrap the array in java.nio.ByteBuffer, use the absolute put/get functions, and slice the buffer to work on a subarray.
For instance:
void doSomething(ByteBuffer twoBytes) {
byte b1 = twoBytes.get(0);
byte b2 = twoBytes.get(1);
...
}
void someMethod(byte[] bigArray) {
int offset = 4;
int length = 2;
doSomething(ByteBuffer.wrap(bigArray, offset, length).slice());
}
Note that you have to call both wrap() and slice(), since wrap() by itself only affects the relative put/get functions, not the absolute ones.
ByteBuffer can be a bit tricky to understand, but is most likely efficiently implemented, and well worth learning.
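
To make the wrap-then-slice behaviour concrete, here is a small self-contained sketch (SliceDemo and window are names invented for the example). It shows that the slice indexes from the offset and writes through to the original array:

```java
import java.nio.ByteBuffer;

class SliceDemo {
    // Returns a view of array[offset .. offset+length) without copying the bytes.
    static ByteBuffer window(byte[] array, int offset, int length) {
        return ByteBuffer.wrap(array, offset, length).slice();
    }

    public static void main(String[] args) {
        byte[] big = {0, 1, 2, 3, 4, 5, 6, 7};
        ByteBuffer two = window(big, 4, 2);
        System.out.println(two.get(0));      // prints 4: index 0 of the slice is big[4]
        System.out.println(two.remaining()); // prints 2
        two.put(1, (byte) 99);               // absolute put writes into the backing array
        System.out.println(big[5]);          // prints 99
    }
}
```

Only the small ByteBuffer wrapper objects are allocated; the byte data itself is shared.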

Use the java.nio Buffer classes. They are lightweight wrappers for buffers of various primitive types and help manage slicing, position, conversion, byte ordering, etc.
If your bytes originate from a Stream, the NIO Buffers can use "direct mode" which creates a buffer backed by native resources. This can improve performance in a lot of cases.

You could use the ArrayUtils.subarray in apache commons. Not perfect but a bit more intuitive than System.arraycopy. The downside is that it does introduce another dependency into your code.

I see the subList answer is already here, but here's code that demonstrates that it's a true sublist, not a copy:
public class SubListTest extends TestCase {
public void testSubarray() throws Exception {
Integer[] array = {1, 2, 3, 4, 5};
List<Integer> list = Arrays.asList(array);
List<Integer> subList = list.subList(2, 4);
assertEquals(2, subList.size());
assertEquals((Integer) 3, subList.get(0));
list.set(2, 7);
assertEquals((Integer) 7, subList.get(0));
}
}
I don't believe there's a good way to do this directly with arrays, however.

List.subList(int startIndex, int endIndex)

The Lists allow you to use and work with subList of something transparently. Primitive arrays would require you to keep track of some kind of offset - limit. ByteBuffers have similar options as I heard.
Edit:
If you are in charge of the method that uses the bytes, you could just define it with bounds (as done in many array-related methods in Java itself):
void doUseful(byte[] arr, int start, int len) {
// implementation here
}
void doUseful(byte[] arr) {
doUseful(arr, 0, arr.length);
}
It's not clear, however, if you work on the array elements themselves, e.g. you compute something and write back the result?

One option would be to pass the whole array and the start and end indices, and iterate between those instead of iterating over the whole array passed.
void method1(byte[] array) {
method2(array,4,5);
}
void method2(byte[] smallarray,int start,int end) {
for ( int i = start; i <= end; i++ ) {
....
}
}
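
As a concrete instance of passing bounds instead of a sub-array, here is a sketch that sums a range of bytes. RangeSum is a made-up example, and it uses an offset plus length rather than the inclusive start and end shown above:

```java
class RangeSum {
    // Sums data[offset .. offset+length) without creating a sub-array.
    static int sum(byte[] data, int offset, int length) {
        if (offset < 0 || length < 0 || offset > data.length - length) {
            throw new IndexOutOfBoundsException("offset=" + offset + ", length=" + length);
        }
        int total = 0;
        for (int i = offset; i < offset + length; i++) {
            total += data[i];
        }
        return total;
    }

    // Convenience overload for the whole array.
    static int sum(byte[] data) {
        return sum(data, 0, data.length);
    }

    public static void main(String[] args) {
        byte[] big = {0, 1, 2, 3, 4, 5, 6, 7};
        System.out.println(sum(big, 4, 2)); // prints 9 (4 + 5)
    }
}
```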

Java references always point to an object. The object has a header that amongst other things identifies the concrete type (so casts can fail with ClassCastException). For arrays, the start of the object also includes the length; the data then follows immediately after in memory (technically an implementation is free to do what it pleases, but it would be daft to do anything else). So, you can't have a reference that points somewhere into an array.
In C pointers point anywhere and to anything, and you can point to the middle of an array. But you can't safely cast or find out how long the array is. In D the pointer contains an offset into the memory block and length (or equivalently a pointer to the end, I can't remember what the implementation actually does). This allows D to slice arrays. In C++ you would have two iterators pointing to the start and end, but C++ is a bit odd like that.
So getting back to Java: no, you can't. As mentioned, NIO ByteBuffer allows you to wrap an array and then slice it, but gives an awkward interface. You can of course copy, which is probably very much faster than you would think. You could introduce your own String-like abstraction that allows you to slice an array (the current Sun implementation of String has a char[] reference plus a start offset and length; higher-performance implementations just have the char[]). byte[] is low level, but any class-based abstraction you put on that is going to make an awful mess of the syntax, until JDK7 (perhaps).

@unique72's answer as a simple function or line; you may need to replace Object with the respective class type you wish to 'slice'. Two variants are given to suit various needs.
/// Extract out array from starting position onwards
public static Object[] sliceArray( Object[] inArr, int startPos ) {
return Arrays.asList(inArr).subList(startPos, inArr.length).toArray();
}
/// Extract out array from starting position to ending position
public static Object[] sliceArray( Object[] inArr, int startPos, int endPos ) {
return Arrays.asList(inArr).subList(startPos, endPos).toArray();
}

How about a thin List wrapper?
List<Byte> getSubArrayList(final byte[] array, final int offset, final int size) {
return new AbstractList<Byte>() {
@Override
public Byte get(int index) {
if (index < 0 || index >= size)
throw new IndexOutOfBoundsException();
return array[offset+index]; // autoboxed to Byte
}
@Override
public int size() {
return size;
}
};
}
(Untested)

I needed to iterate through the end of an array and didn't want to copy the array. My approach was to make an Iterable over the array.
public static Iterable<String> sliceArray(final String[] array,
final int start) {
return new Iterable<String>() {
@Override
public Iterator<String> iterator() {
return new Iterator<String>() {
// Position is kept per iterator, so every iterator() call starts at 'start'.
int posn = start;
@Override
public boolean hasNext() {
return posn < array.length;
}
@Override
public String next() {
if (!hasNext())
throw new java.util.NoSuchElementException();
return array[posn++];
}
@Override
public void remove() {
throw new UnsupportedOperationException("No remove");
}
};
}
};
}

This is a little more lightweight than Arrays.copyOfRange, with no separate range or negative-length checks:
public static final byte[] copy(byte[] data, int pos, int length )
{
byte[] transplant = new byte[length];
System.arraycopy(data, pos, transplant, 0, length);
return transplant;
}
