Serializable, cloneable and memory use in Java - java

I am using an inner class that is a subclass of a HashMap. I have a String as the key and double[] as the values. I store about 200 doubles per double[]. I should be using around 700 MB to store the keys, the pointers and the doubles. However, memory analysis reveals that I need a lot more than that (a little over 2 GB).
Using TIJmp (profiling tool) I saw there was a char[] that was using almost half of the total memory. TIJmp said that char[] came from Serializable and Cloneable. The values in it ranged from a list of fonts and default paths to messages and single characters.
What is the exact behavior of Serializable in the JVM? Is it keeping a "persistent" copy at all times, thus doubling the size of my memory footprint? How can I write binary copies of an object at runtime without turning the JVM into a memory hog?
PS: The method where the memory consumption increases the most is the one below. The file has around 229,000 lines and 202 fields per line.
public void readThetas(String filename) throws Exception
{
    long t1 = System.currentTimeMillis();
    documents = new HashMapX<String,double[]>(); // Document names to theta vectors.
    Scanner s = new Scanner(new File(filename));
    int docIndex = 0;
    if (s.hasNextLine())
        System.out.println(s.nextLine()); // Consume useless first line :)
    while (s.hasNextLine())
    {
        // Line layout: <index> <docName> <topic> <weight> <topic> <weight> ...
        String[] fields = s.nextLine().split("\\s+");
        String docName = fields[1];
        numTopics = fields.length/2 - 1;
        double[] thetas = new double[numTopics];
        // Walk every (topic, weight) pair; the bound is fields.length, not numTopics.
        for (int i = 2; i + 1 < fields.length; i += 2)
            thetas[Integer.parseInt(fields[i].trim())] = Double.parseDouble(fields[i+1].trim());
        documents.put(docName, thetas);
        docIndex++;
        if (docIndex % 10000 == 0)
            System.out.print("*"); // progress bar ;)
    }
    s.close();
    long t2 = System.currentTimeMillis();
    System.out.println("\nRead file in " + (t2-t1) + " ms");
}
Oh, and HashMapX is a nested class declared like this:
public static class HashMapX<K, V> extends HashMap<K, V> {
    // Returns the value mapped to key, or altVal if the key is absent.
    public V get(Object key, V altVal) {
        if (this.containsKey(key))
            return this.get(key);
        else
            return altVal;
    }
}
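Since Java 8, java.util.Map has a built-in getOrDefault(key, defaultValue) that behaves like HashMapX.get(key, altVal), including for keys mapped to null, so the subclass is not needed; HashMap implements it with one hash lookup instead of containsKey followed by get. A minimal sketch (the class and method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

class DefaultLookup {
    // Same behavior as HashMapX.get(key, altVal), using the standard Map API (Java 8+).
    static double[] thetasOrDefault(Map<String, double[]> documents, String docName, double[] altVal) {
        return documents.getOrDefault(docName, altVal);
    }

    public static void main(String[] args) {
        Map<String, double[]> documents = new HashMap<>();
        documents.put("docA", new double[] {0.5, 0.25});
        double[] empty = new double[0];
        System.out.println(thetasOrDefault(documents, "docA", empty).length);    // 2
        System.out.println(thetasOrDefault(documents, "missing", empty).length); // 0
    }
}
```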

This may not address all of your questions, but here is one way in which serialization can significantly increase memory usage: http://java.sun.com/javase/technologies/core/basic/serializationFAQ.jsp#OutOfMemoryError.
In short, if you keep an ObjectOutputStream open then none of the objects that have been written to it can be garbage-collected unless you explicitly call its reset() method.
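A minimal sketch of that pattern, assuming a long-lived ObjectOutputStream that writes many records: calling reset() periodically clears the stream's table of already-written objects, so they no longer stay reachable through the stream. The class name and the resetEvery parameter are illustrative; when reading back, ObjectInputStream handles the reset markers transparently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

class ResettingWriter {
    // Writes each record, calling reset() every resetEvery writes (resetEvery must be > 0)
    // so the stream's back-reference table does not keep every written object alive.
    static byte[] writeAll(List<double[]> records, int resetEvery) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            int written = 0;
            for (double[] r : records) {
                out.writeObject(r);
                if (++written % resetEvery == 0) {
                    out.reset(); // forget the objects written so far
                }
            }
        }
        return bytes.toByteArray();
    }

    // Reads back count records; reset markers in the stream are skipped automatically.
    static List<double[]> readAll(byte[] data, int count) throws IOException, ClassNotFoundException {
        List<double[]> result = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                result.add((double[]) in.readObject());
            }
        }
        return result;
    }
}
```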

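So, I found the answer. It is a memory leak in my code and had nothing to do with Serializable or Cloneable (every Java array, including char[], implements both interfaces, which is presumably why the profiler mentioned them).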
This code is trying to parse a file. Each line contains a set of values which I am trying to extract. Then, I keep some of those values and store them in a HashMapX or some other structure.
The core of the problem is here:
String[] fields = s.nextLine().split("\\s+");
String docName = fields[1];
and I propagate it here:
documents.put(docName,thetas);
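What happens is that docName is not an independent String. In the JDKs of the time (before Java 7 update 6), split() and substring() returned Strings that shared the original line's backing char[] instead of copying their characters. I keep docName for the life of the program (by storing it as a key in the global HashMap documents), so each key keeps the char[] of its entire line alive, all ~200 fields of it, instead of just the characters of the name. That matches the char[] usage the profiler reported. The solution:
String docName = new String(fields[1]); // A copy of just the name's characters.
The copy has its own small char[], so nothing refers to the line's backing array any more, and the garbage collector can reclaim it once every field has been processed. (Since Java 7u6, substring() and split() copy the characters themselves, so this particular leak no longer occurs on current JDKs.)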
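A minimal sketch of the fixed parsing loop, assuming the same line layout as in the question (an index, the document name, then topic/weight pairs whose topic indices fall in 0..numTopics-1). It reads from any Reader so it can be exercised without a file; the class name is illustrative. The explicit new String copy matters on JDKs before 7u6 and is merely redundant on later ones.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

class ThetaReader {
    // Parses "<index> <docName> <topic> <weight> <topic> <weight> ..." lines after one header line.
    static Map<String, double[]> readThetas(Reader source) throws IOException {
        Map<String, double[]> documents = new HashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            in.readLine(); // skip the header line
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.trim().split("\\s+");
                if (fields.length < 2) {
                    continue; // skip blank or malformed lines
                }
                // Copy the name so the key does not share the whole line's char[] (JDKs before 7u6).
                String docName = new String(fields[1]);
                double[] thetas = new double[fields.length / 2 - 1];
                for (int i = 2; i + 1 < fields.length; i += 2) {
                    thetas[Integer.parseInt(fields[i])] = Double.parseDouble(fields[i + 1]);
                }
                documents.put(docName, thetas);
            }
        }
        return documents;
    }
}
```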
I hope this will be useful to all of those who parse large text files using split and store some of the fields in global variables.
Thanks everybody for their comments. They guided me in the right direction.

Related

How does the method assign the array?

I am confused about how the array ends up holding any data, since the method is meant to be self-contained,
or I haven't understood a fundamental concept.
// Craft stall stock and till program
import java.util.Scanner;
public class revisonTest {
    // One Scanner for the whole program: a new Scanner per prompt can lose input already buffered by an earlier one.
    private static final Scanner sc = new Scanner(System.in);

    public static void main(String[] args) // where the program executes
    {
        final int numOFitems = 50;
        String[] item = new String[numOFitems];
        int[] broughtItem = new int[numOFitems];
        int[] costItem = new int[numOFitems];
        int COUNT = getDetail(item, broughtItem, costItem);
        System.out.println(item[0]);
    }

    public static int getDetail(String[] name, int[] quantities, int[] cost)
    {
        int count = 1;
        int arrayIndex = 0;
        String answer = "";
        while (!(answer.equals("Exit")))
        {
            answer = userInput("Item" + count + ": ");
            if (!(answer.equals("Exit")))
            {
                name[arrayIndex] = answer;
                quantities[arrayIndex] = Integer.parseInt(userInput("How many " + name[arrayIndex] + " have you brought? "));
                cost[arrayIndex] = Integer.parseInt(userInput("How much does a " + name[arrayIndex] + " cost? "));
                count++;
                arrayIndex++;
            }
        }
        return count;
    }

    public static String userInput(String question)
    {
        System.out.println(question);
        return sc.nextLine();
    }
}
String[] item = new String[numOFitems];
This first makes a new treasure map named 'item'.
This makes a new treasurechest capable of containing numOFitems treasuremaps, and buries it in the sand. It is then filled with that many blank maps that lead to no treasure.
This updates your item treasuremap to point at this treasurechest-containing-maps.
getDetail(item,broughtItem,costItem);
This takes your treasuremap to the treasure-of-maps and makes a copy of it, and then hands the copy to the getDetail method. Your copy is unmodified and cannot be modified by getDetail... but that's just your copy of the treasure MAP, not the treasure. Note that getDetail calls this copy name and not item - which it is free to do.
(in getDetail) name[arrayIndex] = answer;
This is getDetail taking its name treasuremap (which is a copy of main's item map), follows the map, gets a shovel out, digs down, finds the treasure, opens it, finds the arrayIndexth map in it, pulls it up, and copies its answer map onto it.
Thus, when main follows its own copy of the map to the same treasure, it finds the maps that getDetail put there.
Of course, in java we use different jargon.
'treasure' -> 'object'
'treasuremap' -> 'reference'
'follow the map, dig down, open treasure' -> 'dereference'.
'create treasure' -> 'instantiate an object'
There are two different concepts here:
Allocating an array and assigning an array reference to a variable, and
Assigning values to elements in the array
In main, the new operation creates an array of a certain size, and assigns a reference to that array to the variable named item.
The call of getDetail(item,...) makes a copy of that reference (not the array itself) available to the method. Inside getDetail, this reference is stored in what is effectively a local variable, named name.
The loop inside getDetail is collecting answers (which are actually String references) and storing them in successive elements of the array that it knows as name and which the caller knows as item.
name[arrayIndex] = answer;
(Similarly for the other two arrays, of course)
In summary, getDetail is provided with an existing array, into which it writes values.
Incidentally, if the user types too many answers (more than name.length), the loop runs off the end of the array and throws an ArrayIndexOutOfBoundsException.
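Both points can be seen in a small, testable sketch that writes answers into an array supplied by the caller and stops when the array is full instead of running off the end. The answers come from a list rather than the keyboard, and the names ArrayFiller and fillNames are illustrative.

```java
import java.util.Arrays;
import java.util.List;

class ArrayFiller {
    // Writes answers into the caller's array until "Exit", the end of the answers, or the array is full.
    // Returns how many elements were filled.
    static int fillNames(String[] name, List<String> answers) {
        int arrayIndex = 0;
        for (String answer : answers) {
            if (answer.equals("Exit") || arrayIndex >= name.length) {
                break; // stop before an ArrayIndexOutOfBoundsException could occur
            }
            name[arrayIndex] = answer; // writes into the same array object the caller holds
            arrayIndex++;
        }
        return arrayIndex;
    }

    public static void main(String[] args) {
        String[] item = new String[2];
        int filled = fillNames(item, Arrays.asList("Mug", "Scarf", "Candle"));
        System.out.println(filled + " " + item[0] + " " + item[1]); // 2 Mug Scarf
    }
}
```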
In Java, String is a non-primitive (reference) type. So when you created your item array using:
String[] item = new String[numOFitems];
You actually created an array of 50 String references, all initially null. Based on your code, the array has 50 empty slots where you can store data.
The next part of your code is designed to get input from the user and fill those arrays:
int COUNT = getDetail(item,broughtItem,costItem);
Note: getDetail() never returns the item[] array, so how do you access the data?
When you pass your item array as an argument to the getDetail() method, you are passing a copy of the reference to that array, not a copy of the array itself.
Strictly speaking, Java passes every argument by value, but for non-primitive types that value is a reference. This means that instead of sending the data to the getDetail() method, you're actually sending information about where the data is located in memory.
Within your getDetail() method you can therefore modify the array's elements, and the changes are visible through the original item variable without having to return the array.
That is the reason why your print statement shows data in the array:
System.out.println(item[0]);
Any changes that getDetail() makes to the array's elements automatically appear in the original array.
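A short sketch that makes the distinction concrete: a method can change the elements of an array it receives, but reassigning its parameter to a new array has no effect on the caller's variable, because only a copy of the reference was passed. The class and method names are illustrative.

```java
import java.util.Arrays;

class PassByValueDemo {
    // Modifies the array object that the caller also refers to.
    static void writeElement(String[] name) {
        name[0] = "Mug";
    }

    // Rebinds only the local copy of the reference; the caller's variable is unaffected.
    static void replaceArray(String[] name) {
        name = new String[] {"Replaced"};
        name[0] = "Still not visible to the caller";
    }

    public static void main(String[] args) {
        String[] item = new String[1];
        writeElement(item);
        replaceArray(item);
        System.out.println(Arrays.toString(item)); // [Mug]
    }
}
```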

Using variables instead of a long statement?

I am confused about something while working with objects. I searched Google but couldn't find the right words to search for. The question is:
I am working with objects that contain other objects. For example:
public void mapObjects(A a, B b) {
    a.setWeight(BigDecimal.valueOf(b.getWeight()));
    // Now my doubt lies here
    if (a.getCharges().getDiscounts().getDiscountList() != null) {
        for (int i = 0; i < a.getCharges().getDiscounts().getDiscountList().size(); i++) {
            b.getList().get(0).setDiscountValue(a.getCharges().getDiscounts().getDiscountList().get(i).getValue());
            b.getList().get(0).setDiscountName(a.getCharges().getDiscounts().getDiscountList().get(i).getName());
        }
    }
}
The above code is just an example; the project I am working on uses a similar coding style. Code like a.getCharges().getDiscounts().getDiscountList() always bugs me, because I keep calling the same chain again and again.
When I asked a senior why we don't save this expression in a simple List<> variable, he told me that it would use extra references, which would increase overhead. Can using a variable really cost more than calling the getters again and again?
Because Java passes references around rather than the objects themselves, taking a local variable just adds one reference entry to the method's stack frame.
That memory is tiny, practically negligible.
It is released as soon as the method returns, because the variable is local to the method.
Beyond that, local variables can improve performance, because inside the loop you are extracting the same information over and over:
a.getCharges().getDiscounts().getDiscountList().size() is called on every iteration. It should be a local variable.
b.getList().get(0) is called multiple times. It should be a local variable.
a.getCharges().getDiscounts().getDiscountList() is called multiple times. It should be a local variable.
Changing these to local variables would result in performance gains, because the unnecessary method calls would be saved.
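A sketch of the question's loop rewritten with local variables. The model classes (A, Charges, Discounts, Discount, Item, B) are hypothetical minimal stand-ins for the question's types, Discount is assumed to have getName() and getValue(), and the loop keeps the question's behavior of writing every discount onto the first item of b's list.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal stand-ins for the question's types.
class Discount {
    private final String name;
    private final BigDecimal value;
    Discount(String name, BigDecimal value) { this.name = name; this.value = value; }
    String getName() { return name; }
    BigDecimal getValue() { return value; }
}

class Discounts {
    private final List<Discount> discountList = new ArrayList<>();
    List<Discount> getDiscountList() { return discountList; }
}

class Charges {
    private final Discounts discounts = new Discounts();
    Discounts getDiscounts() { return discounts; }
}

class A {
    private final Charges charges = new Charges();
    Charges getCharges() { return charges; }
}

class Item {
    BigDecimal discountValue;
    String discountName;
    void setDiscountValue(BigDecimal v) { discountValue = v; }
    void setDiscountName(String n) { discountName = n; }
}

class B {
    private final List<Item> list = new ArrayList<>();
    List<Item> getList() { return list; }
}

class Mapper {
    static void mapDiscounts(A a, B b) {
        // Walk the getter chain once and keep the result in a local variable.
        List<Discount> discountList = a.getCharges().getDiscounts().getDiscountList();
        if (discountList == null || discountList.isEmpty()) {
            return;
        }
        Item target = b.getList().get(0); // looked up once instead of twice per iteration
        for (Discount discount : discountList) {
            target.setDiscountValue(discount.getValue());
            target.setDiscountName(discount.getName());
        }
    }
}
```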
Point your senior to this. If it works for the limited resources of Android, I guess storing everything used in a for loop in local variables is beneficial for performance anywhere.
In the excerpt below, note that we aren't even talking about the overhead of calling the (virtual) list.size() method; merely storing array.length in a local variable produces notable differences in performance.
// Context from the excerpt: Foo is a class with an int field mSplat, and mArray is a Foo[] field.
public void zero() {
    int sum = 0;
    for (int i = 0; i < mArray.length; ++i) {
        sum += mArray[i].mSplat;
    }
}
public void one() {
    int sum = 0;
    Foo[] localArray = mArray;
    int len = localArray.length;
    for (int i = 0; i < len; ++i) {
        sum += localArray[i].mSplat;
    }
}
public void two() {
    int sum = 0;
    for (Foo a : mArray) {
        sum += a.mSplat;
    }
}
zero() is slowest, because the JIT can't yet optimize away the cost of getting the array length once for every iteration through the loop.
one() is faster. It pulls everything out into local variables, avoiding the lookups. Only the array length offers a performance benefit.
two() is fastest for devices without a JIT, and indistinguishable from one() for devices with a JIT. It uses the enhanced for loop syntax introduced in version 1.5 of the Java programming language.
Just make the discountList field never null (i.e. initialize it to an empty list) and iterate over it. Something like:
for (Discount discount : a.getCharges().getDiscounts().getDiscountList()) {
b.getList().get(0).setDiscountValue(discount.getValue());
b.getList().get(0).setDiscountName(discount.getName());
}
Your "senior" may need to do some research. The "performance impact" of doing this is a few bytes per object and a negligible cost per access. If he's really hung up about memory, initialize it with a LinkedList, whose empty instance has almost no memory footprint.
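A minimal sketch of the never-null approach, assuming you control the class that holds the list: initialize the field to an empty list and normalize null in the setter, so callers can always iterate without a null check. The class DiscountHolder and its String elements are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class DiscountHolder {
    // Never null: starts out as an empty list.
    private List<String> discountList = new ArrayList<>();

    List<String> getDiscountList() {
        return discountList;
    }

    void setDiscountList(List<String> discountList) {
        // Replace null with a fresh empty list so the invariant always holds.
        this.discountList = (discountList == null) ? new ArrayList<>() : discountList;
    }

    // Caller-side usage: iterate directly, no null check needed.
    static int countDiscounts(DiscountHolder holder) {
        int n = 0;
        for (String discount : holder.getDiscountList()) {
            n++;
        }
        return n;
    }
}
```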
In Java, a variable V pointing to an object instance O simply holds a reference, essentially a value identifying the memory location where the object's data is stored.
When we assign V to another variable V1, all that happens is that V1 now points to the same memory location where the data for O is stored. This means that no new memory is allocated for the object on a simple assignment, unlike in C++, where the = operator can be overloaded to do a deep copy, in which case new memory actually is allocated. Here is an example.
Consider a class like the one below:
import java.util.ArrayList;
import java.util.List;

class Foo {
    private List<String> barList = new ArrayList<>();
    // class methods...
    // getter for the list
    List<String> getBarList() {
        return this.barList;
    }
}

class FooDemo {
    public static void main(String[] args) {
        Foo f = new Foo();
        // The line below prints 0, since no bar string has been added yet.
        System.out.println("Bar list size: " + f.getBarList().size());
        // Add a bar string. Note that I am simply getting the list
        // and adding to it, similar to how your code is currently structured.
        f.getBarList().add("SomeString");
        // Now get a reference to the list and store it in a variable.
        // anotherList only points to the same memory location
        // where the original bar list is. No new list is allocated.
        List<String> anotherList = f.getBarList();
        // Print the contents of the bar list and anotherList. Both are the same.
        for (String s : f.getBarList()) {
            System.out.println(s);
        }
        for (String s : anotherList) {
            System.out.println(s);
        }
        // Add a new bar string through the local reference.
        anotherList.add("Another string");
        // Print both again; they still match. If anotherList had its own copy of the list,
        // the new string would show up only when printing anotherList, not f.getBarList().
        // So both variables refer to the same object on the heap.
        for (String s : f.getBarList()) {
            System.out.println(s);
        }
        for (String s : anotherList) {
            System.out.println(s);
        }
    }
}
Hope this helps.

When to create references, store values and set references to null

I have a couple of questions:
Regarding the creation of references to objects and primitive values, I was wondering: when is it usually appropriate to store values in a variable?
From my general knowledge, the rule of thumb would be to create references when the same value is used more than once, or to avoid hard-coding, e.g.:
String name = "Bob";
System.out.println("Welcome " + name + ". Is your name really " + name + "?");
Whereas if it is only used once, as in the example below, it would be more performant to simply do the following:
System.out.println("Welcome Bob");
as opposed to
String name = "Bob";
System.out.println("Welcome " + name + ".");
Added question: If we are talking about a variable that is used when iterating over an array or enumerable object, which of the following would be more performant (assuming we are looping over an object like 1 million times)? Or would there be no difference and is simply a stylistic choice?
For example,
// nameArray is an extremely long array
public static void loop(String[] nameArray) {
    String name; // Should this be declared inside the loop?
    int len = nameArray.length;
    for (int i = 0; i < len; i++) {
        name = nameArray[i];
        System.out.println(name);
    }
}
or would this be more preferred?
// nameArray is an extremely long array
public static void loop(String[] nameArray) {
    int len = nameArray.length;
    for (int i = 0; i < len; i++) {
        String name = nameArray[i]; // Declare String reference inside for loop
        System.out.println(name);
    }
}
In regards to garbage collection: after a reference to an object/primitive has passed its useful life, is it always good practice to set that value to null to make it eligible for garbage collection (assuming that there are no other references to that object/primitive value)?
For example,
String name = "Bob";
System.out.println("Welcome " + name + ".");
name = null;
Thank you in advance for taking the time to look at this.
No, it makes no difference: the object exists whether or not you use a local variable to refer to it. Use whatever is more readable.
It is almost never good practice to set values to null explicitly. There are a few corner cases, such as when not doing so would keep unnecessary references to objects that would otherwise be eligible for garbage collection (see, for example, Effective Java, Item 6: Eliminate obsolete object references). In all other situations, limiting the scope of variables as much as possible is the most effective way to help the garbage collector.
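The classic corner case is a class that manages its own storage, such as an array-backed stack: after pop(), the array slot still refers to the popped object unless you clear it, so the object can never be collected while the stack lives. A minimal sketch (the class name SimpleStack is illustrative, not the book's exact code):

```java
import java.util.Arrays;
import java.util.EmptyStackException;

class SimpleStack {
    private Object[] elements = new Object[16];
    private int size = 0;

    void push(Object e) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, 2 * size + 1); // grow when full
        }
        elements[size++] = e;
    }

    Object pop() {
        if (size == 0) {
            throw new EmptyStackException();
        }
        Object result = elements[--size];
        elements[size] = null; // clear the obsolete reference so the object can be collected
        return result;
    }

    int size() {
        return size;
    }

    // Exposed only so the cleared slot can be checked.
    boolean slotIsClear(int index) {
        return elements[index] == null;
    }
}
```

Local variables don't need this treatment: they stop being roots as soon as they go out of scope.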
The bottom line being: use variables when you need them and let the garbage collector do its job, unless you have a compelling reason not to.
when is it usually appropriate to store values in a variable?
In most cases the answer is: When the code benefits from it from a maintenance perspective. If the code becomes easier to understand or debug, then use a variable.
After a reference to an object/primitive has passed its useful life, is it always good practice to set that value to null to make it eligible for garbage collection?
If the variable goes out of scope shortly after, then setting it to null will just unnecessarily clutter the code. I would use it only for long-lived variables, and perhaps for variables that point to large objects.
From my general knowledge, the rule of thumb would be to create references when the same value is used more than once or to avoid hard-coding
Please, forget this rule. Use variables and their scope to help others (and yourself) understand your program more easily. You can do something like this:
private static final String MY_UNCLE_NAME = "Bob";
System.out.println(String.format("Welcome %s.", MY_UNCLE_NAME));
Updated
If we are talking about a variable that is used when iterating over an array or enumerable object, which of the following would be more performant (assuming we are looping over an object like 1 million times)? Or would there be no difference and is simply a stylistic choice?
Always use this (keep the scope of variables as small as possible and forget about local optimizations):
// nameArray is an extremely long array
public static void loop(String[] nameArray) {
    int len = nameArray.length;
    for (int i = 0; i < len; i++) {
        String name = nameArray[i]; // Declare String reference inside for loop
        System.out.println(name);
    }
}
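For completeness, the enhanced for loop gives the same narrow scope with no index variable at all. In practice javac compiles a variable declared inside the loop and one declared just before it to essentially the same bytecode, so the choice is about readability. A minimal sketch that builds the output into a StringBuilder instead of printing, so it can be checked; the class and method names are illustrative.

```java
class LoopDemo {
    // Same traversal as loop(), but collects the lines instead of printing them.
    static String joinLines(String[] nameArray) {
        StringBuilder out = new StringBuilder();
        for (String name : nameArray) { // name is scoped to the loop body
            out.append(name).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(joinLines(new String[] {"Bob", "Alice"}));
    }
}
```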

Java Array of Objects crashes

Java noob question:
Consider the following C array and initializer code:
typedef struct {
    int x;
    int y;
} point_t;
point_t points[1000];
Easy. This gets created and memory allocated at load time.
Now consider the similar in Java:
public class point_t
{
    public int x;
    public int y;
}

point_t points[] = new point_t[1000];
// Without this loop, java will crash when you run this
for (int i = 0; i < 1000; i++)
{
    points[i] = new point_t();
}
points[0].x = 10; // Crash would occur here without above loop
points[1].x = 10;
Initially my java program was crashing with a null pointer dereference. The problem was that, coming from C++, I was not aware that you have to create the 1000 point_t objects. Just a comment but this seems INSANE. Suppose the array size was 1 million or 1 billion. It would literally take seconds simply to "create" this array with empty entries at run time. In C++ it all happens at load time. I admit that you don't always know what would be in the C++ array's cells, but in embedded systems where I work, quite often the memory is auto initialized to zeros so it works.
So is there any easier, quicker, more efficient way in Java to create an array and allocate the memory when you have an array of objects? Or am I doing something wrong in the code above?
Since you are coming from a C++ background, this may help. In Java, when you write
point_t points[] = new point_t[1000];
This is similar to writing, in C++,
point_t** points = new point_t*[1000];
That is, in Java, when you create the array, you are not creating an array of point objects, but rather an array of point references, the same as if you had created an array of point pointers in C++.
Java is a managed (garbage-collected) language; that is what Java programmers would expect.
As for the second part of your question, how one would create the objects themselves, what you did is fine. Create 1000 point objects in a loop and load them up. If you want shorter code, you can write a nice method to do this work. :)
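Two shorter alternatives, assuming Java 8 or later. Arrays.setAll fills the array of references in one call. If allocation cost is the real concern, parallel primitive arrays behave more like the C version: the JVM zero-initializes them, so no per-element objects are created. The class names below are illustrative.

```java
import java.util.Arrays;

class Point {
    public int x;
    public int y;
}

class PointArrays {
    // One Point object per slot, created by the generator function.
    static Point[] objects(int n) {
        Point[] points = new Point[n];
        Arrays.setAll(points, i -> new Point());
        return points;
    }

    // Structure-of-arrays alternative: two int arrays, already all zeros, no per-point objects.
    static int[][] primitives(int n) {
        return new int[][] {new int[n], new int[n]}; // [0] holds x values, [1] holds y values
    }

    public static void main(String[] args) {
        Point[] points = objects(1000);
        points[0].x = 10; // no NullPointerException: every slot holds a Point
        int[][] xy = primitives(1000);
        xy[0][1] = 10; // x of point 1
        System.out.println(points[0].x + " " + xy[0][1] + " " + xy[1][999]); // 10 10 0
    }
}
```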
You can also look into other collection libraries that might have these kinds of convenience factory methods.
Writing
point_t[] points = new point_t[ 1000 ];
is allocating a thousand references to point_t objects. (In C parlance, it's allocating pointers to structs of that type.)
That loop
for (int i=0; i<1000; i++)
{
points[i] = new point_t();
}
allocates a new point_t object and puts a reference (pointer) to it in the array. Until you did that, the array held nothing but nulls, which is why you got a NullPointerException.
That's not an array of point_t instances; those live out on the heap.
It's really an array of references to those point_t instances out on the heap.
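Any array element or field of reference type that is not initialized by being assigned a reference value (e.g. the result of new) defaults to null. (Local variables are different: the compiler refuses to let you read one before it is assigned.)
It's true for fields of ordinary classes, too: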
public class Person {
private String name; // not initialized; that means it's null
public Person() {} // oops; constructor should have initialized name, but now it's null
public String getName() { return name; } // returns null unless you set it properly
public void setName(String newName) { this.name = newName; }
}
You can use the Flyweight pattern to share the same data between different objects, and defer the creation of each point object until it is really necessary.
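A minimal sketch of the "defer creation" part, which is lazy initialization rather than Flyweight proper: the array starts out as nulls, and each point is created the first time it is requested. The names LazyPoints and get are illustrative, and this version is not thread-safe.

```java
class LazyPoints {
    static class Point {
        int x;
        int y;
    }

    private final Point[] points;

    LazyPoints(int size) {
        points = new Point[size]; // only references are allocated; all null
    }

    // Creates the point at index i on first access, then returns the same instance.
    Point get(int i) {
        if (points[i] == null) {
            points[i] = new Point();
        }
        return points[i];
    }

    // Counts how many points have actually been created.
    int created() {
        int n = 0;
        for (Point p : points) {
            if (p != null) n++;
        }
        return n;
    }
}
```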

Java: Setting array length for an unknown number of entries

I am trying to fill a RealVector (from Apache Commons Math) with values. I tried using the class's append method, but that didn't actually add anything. So now I'm using a double[], which works fine, except I don't know in advance how big the array needs to be.
private void runAnalysis() throws IllegalArgumentException, IllegalAccessException, InvocationTargetException {
    Double attr;
    double[] data = new double[100]; // TODO: bad.
    int i = 0;
    for (Method m : ParseTree.class.getMethods()) {
        if (m.isAnnotationPresent(Analyze.class)) {
            attr = (Double) m.invoke(this);
            analysis.put(m.getAnnotation(Analyze.class).name(), attr);
            data[i++] = attr * m.getAnnotation(Analyze.class).weight();
        }
    }
    weightedAnalysis = new ArrayRealVector(data);
}
How can I deal with this issue? Here are my ideas:
Iterate through the class and count the methods with the annotation, then use that size to initialize the array. However, this will require an extra loop, and reflection is performance-intensive. (right?)
Pick an arbitrary size for the array, doubling it if space runs out. Downside: requires more lines of code
Use a List<Double>, then somehow weasel the Double objects back into doubles so they can be put in the RealVector. Uses more memory for the list.
Just pick a huge size for the starting array, and hope that it never overflows. Downside: this is begging for ArrayIndexOutOfBoundsExceptions.
Or am I just using append(double d) wrong?
private void runAnalysis() throws IllegalArgumentException, IllegalAccessException, InvocationTargetException {
    Double attr;
    weightedAnalysis = new ArrayRealVector(); // start from an empty vector
    for (Method m : ParseTree.class.getMethods()) {
        if (m.isAnnotationPresent(Analyze.class)) {
            attr = (Double) m.invoke(this);
            analysis.put(m.getAnnotation(Analyze.class).name(), attr);
            weightedAnalysis.append(attr * m.getAnnotation(Analyze.class).weight());
        }
    }
}
RealVector.append() doesn't modify the vector, but rather constructs a new vector:
The [Java doc of RealVector.append()](http://commons.apache.org/math/apidocs/org/apache/commons/math/linear/RealVector.html#append(double)) explains:
append
RealVector append(double d)
Construct a vector by appending a double to this vector.
Parameters:
d - double to append.
Returns:
a new vector
Please note that using RealVector to construct the vector is quite an expensive operation, as append() would need to copy the elements over and over (i.e. constructing the vector the way you described runs in O(n^2) time).
I would recommend simply using java's ArrayList<Double> during construction, and then simply converting to RealVector or any other data abstraction you like.
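A minimal sketch of that suggestion, assuming Java 8 streams: collect into an ArrayList<Double> while scanning, then unbox once into a double[], which can be passed to the ArrayRealVector(double[]) constructor the question already uses. The class name and sample weights are illustrative, and Commons Math itself is left out so the sketch stands alone.

```java
import java.util.ArrayList;
import java.util.List;

class GrowThenConvert {
    // Converts the collected values to a primitive array in one pass.
    static double[] toArray(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        List<Double> weighted = new ArrayList<>();
        // In the question this would be: weighted.add(attr * m.getAnnotation(Analyze.class).weight());
        weighted.add(1.5 * 2.0);
        weighted.add(4.0 * 1.0);
        double[] data = toArray(weighted);
        System.out.println(data.length); // 2
        // Then: weightedAnalysis = new ArrayRealVector(data);
    }
}
```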
Why not use an ArrayList and add the elements to that?
I would suggest 3 as a good option. Using Double vs double is a minimal problem since autoboxing was introduced.
Using RealVector will take a huge amount of memory and computation time to build, because what you want is:
RealVector newVector = oldVector.append(d);
append() returns a newly constructed object, which is what you'd want for correctness.
If you're okay with heavy overhead on build, take a look at Apache Commons ArrayUtils, specifically add(double[], double) and/or toPrimitive(Double[]).
You mentioned that you tried the append method, but it didn't actually add anything. After looking at the Javadoc: make sure that you assign the result of the append method back to the original variable. You probably already tried this, but just in case you overlooked it:
RealVector myRealVector = new ArrayRealVector(data);
myRealVector = myRealVector.append(1.0);
In other words, this won't change myRealVector:
RealVector myRealVector = new ArrayRealVector(data);
myRealVector.append(1.0);
You could initialize the array using
ParseTree.class.getMethods().length
as the initial capacity:
double[] buf = new double[ParseTree.class.getMethods().length];
or, better,
DoubleBuffer buf = DoubleBuffer.allocate(ParseTree.class.getMethods().length);
This may waste some memory, but it is a safe solution; how much is wasted depends on how many methods pass the if inside the loop.
If you prefer, you can count the annotated methods in advance and then allocate an array of the exact size.
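A sketch of the upper-bound-then-trim idea: size the buffer by getMethods().length, fill only the annotated entries, then copy to the exact length with Arrays.copyOf. The @Analyze annotation and the Sample class below are hypothetical stand-ins for the question's types (its weight() is assumed to return double). Note that getMethods() returns methods in no particular order.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;

// Hypothetical stand-in for the question's @Analyze annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Analyze {
    String name();
    double weight() default 1.0;
}

// Hypothetical class with two annotated methods and one that should be ignored.
class Sample {
    @Analyze(name = "a", weight = 2.0) public Double a() { return 1.5; }
    @Analyze(name = "b") public Double b() { return 4.0; }
    public Double notAnalyzed() { return 99.0; }
}

class UpperBoundCollector {
    static double[] weighted(Object target) throws ReflectiveOperationException {
        Method[] methods = target.getClass().getMethods();
        double[] buf = new double[methods.length]; // upper bound: one slot per public method
        int n = 0;
        for (Method m : methods) {
            Analyze a = m.getAnnotation(Analyze.class);
            if (a != null) {
                Double attr = (Double) m.invoke(target);
                buf[n++] = attr * a.weight();
            }
        }
        return Arrays.copyOf(buf, n); // trim to the number of slots actually filled
    }
}
```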
