If for some reason I have to use as little memory as possible, is the second piece of code below better than the first? (The code is just for illustration and doesn't have any meaning.) (Edit: imagine I want to look at the assembly code of this before any optimization is done by the JVM. Do I then use 99999998 extra memory locations in the first procedure compared to the second one? The focus is just on memory usage.)
First:
for (int i = 0; i < 99999999; i++) {
    int k = 2 * i;
}
Second:
int k = 0;
for (int i = 0; i < 99999999; i++) {
    k = 2 * i;
}
What I'm sure of:
In every case, the difference will not be visible. If you want to make such a small optimization, Java is surely not the best technology, which makes me recommend the first one, because it makes the code more readable and logical. (I find it strange to declare a variable outside the for loop when you use it only inside it; it's confusing.)
What I think:
In your small example, and since you're looking at memory footprint, the first one is better because it follows the implicit rule mentioned in this comment: the smaller the scope is, the better it is.
In the first case the variable k is used only in a really small loop. So the optimizer will easily understand it and use only a register: no memory usage and fewer instructions.
In the second case, it will be harder for the optimizer to determine that k is not used elsewhere. So it could allocate some memory instead of using a register. It would then use some memory and be less optimized, since it would need instructions to load and store that memory.
As mentioned in this comment, it will mostly depend on how you use the variable. In your example the optimizer will detect that the usage is the same in both cases and will use no memory. But in harder code it will not always figure this out. So I recommend keeping the scope as small as possible.
@pranay-khandelwal's response links to an excellent discussion of this question in a different language. The Java JVM, however, recompiles and optimizes bytecode at runtime in pursuit of better performance, which can complicate things.
There's actually another discussion here on the similar topic of in/out-of-loop declaration best practices in general for readability, which resulted in some benchmarks and discussion that one of the participants documented here.
As a general rule of thumb, the second option will be better for memory and performance under almost all circumstances, whereas the former may be more self-documenting and maintainable, and avoids accidental use elsewhere in the outer scope.
Since you mention that the code in your post is only a representative example and this could apply to more than just simple types:
Replacing the contents of an already allocated memory area is less costly than allocating a new one and deallocating the old one (or waiting for it to be garbage collected). Even where a language's compiler or interpreter smartly reuses recently unreferenced memory to store new things, that reuse also takes work, and is overhead that can optimistically be avoided with outer declarations; though, as others mention, this is usually an unnecessary and potentially bug-spawning micro-optimization.
Judging from these short examples you provided: the second option. However, it always depends on the logic of your code.
Thinking about performance and minimising execution space and time, the second code scales better, even though it looks like it goes against some good coding practices.
k in your code is used only inside the loop block, but it is "reused" over multiple "block iterations". Take a look at the syntax of your for loop: it declares i (int i) at the beginning of the statement, and this declaration happens just once. Declaring a variable multiple times may waste time and memory.
The JVM optimiser might do a good job in general, simple cases. However, it might fail to capture the semantics of your (Java) code.
for (int i = 0, k = 0; i < 99999999; i++) {
    k = 2 * i;
}
Java programs can be very memory hungry. For example, a Double object has 24 bytes: 8 bytes of data and 16 bytes of JVM-imposed overhead. In general, the objects that represent the primitive types are very expensive.
The same happens for any collection in the Java Standard Library. There are even some counterintuitive facts such as a HashSet being more memory hungry than a HashMap, since a HashSet contains a HashMap inside (http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html).
Could you come up with some advice when modeling data and delegation of objects in high performance settings so that these "weaknesses" of Java are mitigated?
Some techniques I use to reduce memory:
Make your own IntArrayList (etc.) class that prevents boxing (see the sketch after this list)
Make your own IntHashMap (etc) class where keys are primitives
Use nio's ByteBuffer to store large arrays of data efficiently (and in native memory, outside heap). It's like a byte array but contains methods to store/retrieve all primitive types from the buffer at any arbitrary offset (trade memory for speed)
Don't use pooling because pools keep unused instances explicitly alive.
Use threads sparingly; they're super memory hungry (in native memory, outside the heap)
When making substrings of big strings and discarding the original, the substrings still refer to the original's backing array. So use new String(substring) to dispose of the old big string.
A linear array is smaller than a multidimensional array, and if the size of all but the last dimension is a power of two, calculating indices is fastest: array[x|y<<4] for a 16xN array.
Initialize collections and StringBuilder with an initial capacity chosen such that it prevents internal reallocation in a typical circumstance.
Use StringBuilder instead of string concatenation, because the compiled class files use new StringBuilder() without initial capacity to concatenate strings.
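For the first technique, a minimal sketch of a boxing-free primitive list might look like this (the class name and growth policy are illustrative, not taken from any particular library):

public class IntArrayList {
    private int[] data = new int[16]; // backing array, grown on demand
    private int size = 0;

    public void add(int value) {
        if (size == data.length) {
            // double the capacity when full, like ArrayList does
            data = java.util.Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return data[index];
    }

    public int size() {
        return size;
    }
}

Storing a million ints this way costs roughly 4 MB for the data itself, whereas a List<Integer> additionally pays per-object overhead for every boxed Integer.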
Depends on the application, but generally speaking
Layout data structures in (parallel) arrays of primitives
Try to make big "flat" objects, inlining otherwise sensible sub-structures
Specialize collections of primitives
Reuse objects, use object pools, ThreadLocals
Go off-heap (see the ByteBuffer sketch below)
I cannot say these practices are "best", because they, unfortunately, make you suffer: you lose the point of why you are using Java, and you reduce the flexibility, supportability, reliability, testability and other "good" properties of the codebase.
But they certainly allow you to lower the memory footprint and GC pressure.
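As a rough illustration of the off-heap point, a direct ByteBuffer keeps its contents in native memory outside the GC-managed heap (a minimal sketch; the size and offsets are arbitrary):

import java.nio.ByteBuffer;

// Room for one million ints in native memory, outside the heap.
ByteBuffer buf = ByteBuffer.allocateDirect(1_000_000 * Integer.BYTES);
buf.putInt(0, 42);         // write at byte offset 0
int value = buf.getInt(0); // read it back

The bytes behind the buffer never become individual Java objects, so the garbage collector has almost nothing to trace.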
One of the memory problems that are easy to overlook in Java is memory leakage. Nicholas Greene already pointed you to memory profiling.
Many people assume that Java's garbage collection prevents memory leaks, but that is not actually true - all it takes is one forgotten reference somewhere to keep an object around in perpetuity. Paradoxically, trying to optimize your program may introduce more opportunities for memory leaks because you end up with more complex data structures.
An example of a memory leak, if you are implementing, for instance, a stack:
Integer[] stack = new Integer[10];
int stackPtr = 0;

// a few push operations on our stack
stack[stackPtr++] = new Integer(5);
stack[stackPtr++] = new Integer(3);

// and pop from the stack again
--stackPtr;
--stackPtr;

// At this point the stack is logically empty, but
// the Integer objects are still referenced by the array,
// and are effectively leaked.
The correct solution would have been:
stack[--stackPtr] = null;
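A complete pop that both returns the element and clears the slot could look like this (a sketch building on the stack array and stackPtr above):

Integer pop() {
    Integer top = stack[--stackPtr];
    stack[stackPtr] = null; // clear the slot so the object can be garbage collected
    return top;
}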
If you have high performance constraints and need to use collections for simple types, you might take a look at some implementations of Primitive Collections for Java.
Some are:
HPPC
GNU Trove
Apache Commons Primitives
Also, as a reference take a look at this question: Why can Java Collections not directly store Primitives types?
Luís Bianchin already gave you a few libraries which implement optimal collections in Java.
Nevertheless, it seems that you are especially concerned about Java collections' memory allocation. In that case, there are a few alternatives which are quite straightforward.
Cache
You could use a cache to limit the memory the collection (the cache) can allocate. By doing that, you only load into main memory the most frequently used entries, and you don't need to load the whole data set from disk/network/whatever. I highly recommend Guava Cache, as it's very well documented and pretty mature.
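For illustration, a size-bounded loading cache with Guava could look roughly like this (the maximum size is arbitrary and loadFromDisk is a hypothetical loader):

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

LoadingCache<String, String> cache = CacheBuilder.newBuilder()
        .maximumSize(10_000) // bound memory by capping the entry count
        .build(new CacheLoader<String, String>() {
            @Override
            public String load(String key) throws Exception {
                return loadFromDisk(key); // hypothetical: fetch the entry on a miss
            }
        });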
Persistent Collections
Sometimes a cache is not a solution for your problem. For example, in an ETL solution, you might know you will only load each entry once. For this scenario I recommend going for persistent collections. These are disk-stored collections that are way faster than traditional databases but have nice Java APIs.
MapDB and PCollections are for me the best libraries.
Profile memory usage
On top of that, if you really want to know the actual state of your program's memory allocation, I highly recommend you use a profiler. This way you will not only know how much memory your collections occupy, but also how the GC behaves over time.
In fact, you should only try an alternative to Java's collections and data structures if there is an actual memory problem, and that is something a profiler can tell you.
The JDK has a profiler called VisualVM which does a great job. Nevertheless, I recommend using a commercial profiler if you can afford it. Commercial profilers usually have a lower impact on the application's performance than VisualVM.
Memory-optimal data over the network
Finally, this is not strictly related to your question, but it's closely connected. In case you want to serialize your Java objects into an optimal binary representation, I recommend Google Protocol Buffers in Java. Protocol buffers are ideal for transferring data structures through the network using the least bandwidth possible, with really fast encoding/decoding.
Well, there are a lot of things you can do.
Here are a few problems and solutions:
When you change the value of a string in Java, the string is not actually overwritten. Instead, a new string is created to replace the old one, but the old string still exists. This can be a problem when using RAM efficiently is a concern. Here are some solutions:
When using a string to specify something like the "state" of an object, or anything else that can only have a specific set of possible values, don't use a string; use an enum instead. If you don't know what an enum is or how to use one yet, here's a link to a tutorial on what enums are and how to use them.
If you are using a string as a variable whose value will change at some point in the program, don't define the string the way you usually would. Instead, use the StringBuilder class from the java.lang package. StringBuilder is a class used to build and modify character sequences, and it handles them differently than String does: when you change the value, StringBuilder doesn't create a duplicate string with a different value to replace the old one, it mutates its internal character buffer in place. Therefore, since you aren't creating duplicate strings, this saves RAM. Here is a link to the StringBuilder class in the Java API.
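For example, building a string in a loop might look like this (a minimal sketch):

// One mutable buffer instead of a new String per iteration.
StringBuilder sb = new StringBuilder(64); // initial capacity avoids early reallocations
for (int i = 0; i < 10; i++) {
    sb.append(i).append(',');
}
String result = sb.toString(); // a single String is created at the end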
Writer and reader objects, such as FileWriters and FileReaders, also take up RAM. If you have a lot of them, this can also cause problems. Here are some solutions:
All reader and writer objects have a method called close(). As you can probably guess, it closes the writer or reader object and releases the resources it holds. Whenever you reach the point in your code where you know you will never use a reader or writer object again, call this method; the object can then be garbage collected, freeing some RAM.
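In modern Java, the try-with-resources statement calls close() for you as soon as the block is left (a minimal sketch; the file name is arbitrary):

import java.io.FileReader;
import java.io.IOException;

try (FileReader reader = new FileReader("data.txt")) {
    int firstChar = reader.read(); // use the reader
} catch (IOException e) {
    e.printStackTrace();
} // reader.close() has been called automatically here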
Every object in Java takes up memory. When you have an object that you won't use anymore, keeping it reachable wastes RAM.
Note that, despite its name, the Object class's finalize() method does not get rid of an object: it is invoked by the garbage collector itself, and calling it manually frees nothing. To let an object be reclaimed, drop every reference to it (for example, set the variables or fields that point to it to null, as in the stack example above), and the garbage collector will do the rest.
Beware of early optimisation.
See When is optimisation premature?
While not knowing the exact requirements of your application or runtime environment, in my experience Java has been able to handle anything I have thrown at it. Doing some profiling on your demo/proof-of-concept app might be time well spent if performance or garbage collection (you tagged memory-leaks) is a concern.
What is the best practice for initializing an ArrayList in Java?
If I initialize an ArrayList using the new operator, the ArrayList will by default have memory allocated for 10 buckets, which is a performance hit.
I don't know, maybe I am wrong, but it seems to me that I should create an ArrayList by specifying the size, if I am sure about the size!
Which is a performance hit.
I wouldn't worry about the "performance hit". Object creation in Java is very fast. The performance difference is unlikely to be measurable by you.
By all means use a size if you know it. If you don't, there's nothing to be done about it anyway.
The kind of thinking that you're doing here is called "premature optimization". Donald Knuth says it's the root of all evil.
A better approach is to make your code work before you make it fast. Optimize with data in hand that tells you where your code is slow. Don't guess - you're likely to be wrong. You'll find that you rarely know where the bottlenecks are.
If you know how many elements you will add, initialize the ArrayList with correct number of objects. If you don't, don't worry about it. The performance difference is probably insignificant.
This is the best advice I can give you:
Don't worry about it. Yes, you have several options to create an ArrayList, but using new, the default option provided by the library, isn't a BAD choice; otherwise it'd be stupid to make it the default for everyone without clarifying what's better.
If it turns out that this is a problem, you'll quickly discover it when you profile. That's the proper place to find problems, when you profile your application for performance/memory problems. When you first write the code, you don't worry about this stuff -- that's premature optimization -- you just worry about writing good, clean code, with good design.
If your design is good, you should be able to fix this problem in no time, with little impact on the rest of the system. Effective Java 2nd Edition, Item 52: Refer to objects by their interfaces. You may even be able to switch to a LinkedList, or any other kind of List out there, if that turns out to be a better data structure. Design for this kind of flexibility.
Finally, Effective Java 2nd Edition, Item 1: Consider static factory methods instead of constructors. You may even be able to combine this with Item 5: Avoid creating unnecessary objects, if in fact no new instances are actually needed (e.g. Integer.valueOf doesn't always create a new instance).
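For example (a small sketch of Item 5 in action):

Integer a = Integer.valueOf(100);
Integer b = Integer.valueOf(100);
// Values in the default cache range (-128..127) reuse the same instance:
System.out.println(a == b); // true
System.out.println(Integer.valueOf(1000) == Integer.valueOf(1000)); // usually false: outside the cache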
Related questions
Java Generics Syntax - in-depth about type inferring static factory methods (also in Guava)
On ArrayList micromanagement
Here are some specific tips if you need to micromanage an ArrayList:
You can use ArrayList(int initialCapacity) to set the initial capacity of a list. The list will automatically grow beyond this capacity if needed.
When you're about to populate/add to an ArrayList and you know what the total number of elements will be, you can use ensureCapacity(int minCapacity) (or the constructor above directly) to reduce the number of intermediate growths (see the combined sketch below). Each add will run in amortized constant time regardless of whether or not you do this (as guaranteed in the API), so this can only reduce the cost by a constant factor.
You can call trimToSize() to minimize the storage usage.
This kind of micromanagement is generally unnecessary, but should you decide (justified by conclusive profiling results) that it's worth the hassle, you may choose to do so.
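Put together, the three calls above look like this (a sketch; the element counts are arbitrary):

import java.util.ArrayList;

ArrayList<String> list = new ArrayList<>(1000); // expected element count known up front
// ... add up to 1000 elements ...
list.ensureCapacity(2000); // about to add many more: grow once instead of repeatedly
// ... add the remaining elements ...
list.trimToSize(); // done populating: shrink the backing array to the actual size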
See also
Collections.singletonList - Returns an immutable list containing only the specified object.
If you already know the (approximate) size of your ArrayList, you should use the constructor that takes a capacity. But most of the time developers don't really know what will be in the List, so the default capacity of 10 should be sufficient for most cases.
The 10 buckets are an approximation and aren't a performance hit unless you already know that your ArrayList will contain tons of elements; in that case, the need to resize the array all the time will be the performance hit.
You don't need to specify an initial size for an ArrayList. You can always add/remove elements from it easily.
If this is a performance matter, please keep the following in mind:
Initialization of an ArrayList is very fast. Don't worry about it.
Adding/removing elements from an ArrayList is also very fast. Don't worry about it.
If you find your code runs too slowly, the first thing to blame is your algorithm, no offense. Machine specs, the OS and the language participate too, but their contribution is insignificant compared to that of your algorithm.
If you don't know the size of the ArrayList, then you're probably better off using a LinkedList, since the LinkedList.add() operation runs in constant time.
However, as most people here have said, you should not worry about speed before you do some kind of profiling.
You can use this old, but good (in my opinion) article for reference.
http://chaoticjava.com/posts/linkedlist-vs-arraylist/
Since ArrayList is implemented with an array underneath, we have to choose an initial size for that array.
If you really care, you can call trimToSize() once you have constructed and populated the object. The javadoc states that the capacity will be at least as large as the list size. As previously stated, it's unlikely that you will find the memory allocated to an ArrayList to be a performance bottleneck, and if it were, I would recommend you use an array instead.
I was recently reviewing some Java Swing code and saw this:
byte[] fooReference;

String getFoo() {
    return new String(fooReference);
}

void setFoo(String foo) {
    this.fooReference = foo.getBytes();
}
The above can be useful to save on your memory footprint, or so I'm told.
Is this overkill? Is anyone else encapsulating their Strings in this way?
That's a really, really bad idea. Don't use the platform default encoding. There's nothing to say that if you call setFoo and then getFoo that you'll get back the same data.
If you must do something like this, then use UTF-8 which can represent the whole of Unicode for certain... but I really wouldn't do it. It potentially saves some memory, but at the cost of performing conversions unnecessarily for most of the time - and being error-prone, in terms of failing to use an appropriate encoding.
I dare say there are some applications where this would be appropriate, but for 99.99% of them, it's a terrible idea.
This is not really useful:
1. You are copying the string every time getFoo or setFoo is called, increasing both CPU and memory usage
2. It's obscure
A little historical excursion...
Using byte arrays instead of String objects actually used to have some considerable advantages in the early days of Java (1.0/1.1), if you could be sure that you would never need anything outside ISO-8859-1. With the VMs of that time it was more than 10 times faster to use drawBytes() compared to drawString(), and it really did save memory, which was still very scarce at that time; applets used to have a hard-coded memory barrier of 32 and later 64 MB anyway. Not only is a byte[] smaller than the embedded char[] of a String object, but you could also save the comparatively heavy String object itself, which made quite a difference if you had lots of short strings. Besides that, accessing a plain byte array is also faster than using the accessor methods of String with all their extra bounds checks.
But since drawBytes ceased to be any faster in Java 1.2, and since current JITs are much better than the Symantec JIT of that time, the remaining minimal performance advantage of byte[] arrays over strings is no longer worth the hassle. The memory advantage is still there, so it might remain an option in some very rare, extreme scenarios, but nowadays it's nothing that should be considered unless it's really necessary.
It may well be overkill, and it may even consume more memory, since you now have two copies of the string. How long the actual string lives depends upon the client, but as with many such hacks, it smells a lot like premature optimization.
If you anticipate that you'll have a lot of identical strings, another much better way you can save memory is with the String.intern() method.
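For example (a small sketch):

String a = new String("config");
String b = new String("config");
System.out.println(a == b);                   // false: two distinct objects
System.out.println(a.intern() == b.intern()); // true: both resolve to the pooled instance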
Each call to getFoo() instantiates a new String. How is this saving memory? If anything, you're adding extra overhead for your garbage collector, which has to go and clean up these new instances once they become unreferenced.
This indeed does not make any sense. If it were a compile-time constant which you don't need to massage back into a String, it would make a bit more sense. You still have the character-encoding problem.
It would make more sense to me if it were a char[] constant. In the real world there are several JSP compilers which optimize String constants away into a char[], which in turn can easily be written to a Writer#write(char[]). This is ultimately "slightly" more efficient, but those little bits count a lot in large, heavily used applications like Google Search and so on.
Tomcat's JSP compiler Jasper does this as well; check the genStringAsCharArray setting. It then generates
static final char[] text1 = "some static text".toCharArray();
instead of
static final String text1 = "some static text";
which ends up with less overhead. It doesn't need a whole String instance around those characters.
If, after profiling your code, you find that memory usage for strings is a problem, you're much better off using a general string compressor and storing compressed strings, rather than trying to use UTF-8 strings for the minor reduction in space they give you. With English language strings, you can generally compress them to 1-2 bits per character; most other languages are probably similar. Getting to <1 bit per character is hard, but possible if you have a lot of data.
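As a rough sketch of that idea using only the JDK (someLargeString is a placeholder, and the output buffer is naively sized to the input length):

import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

byte[] input = someLargeString.getBytes(StandardCharsets.UTF_8);
Deflater deflater = new Deflater();
deflater.setInput(input);
deflater.finish();
byte[] output = new byte[input.length]; // assumes compression actually shrinks the data
int compressedLength = deflater.deflate(output);
deflater.end();
// Keep only the first compressedLength bytes; inflate them again when the string is needed.

A dedicated string compressor tuned to your data should do better than general-purpose deflate, but this shows the storage pattern.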
Hmmm. I have a table which is an array of structures I need to store in Java. The naive don't-worry-about-memory approach says do this:
public class Record {
final private int field1;
final private int field2;
final private long field3;
/* constructor & accessors here */
}
List<Record> records = new ArrayList<Record>();
If I end up using a large number (> 10^6) of records, where individual records are accessed occasionally, one at a time, how would I figure out how the preceding approach (an ArrayList) compares with an optimized approach for storage costs:
public class OptimizedRecordStore {
final private int[] field1;
final private int[] field2;
final private long[] field3;
Record getRecord(int i) { return new Record(field1[i],field2[i],field3[i]); }
/* constructor and other accessors & methods */
}
edit:
assume the # of records is something that is changed infrequently or never
I'm probably not going to use the OptimizedRecordStore approach, but I want to understand the storage cost issue so I can make that decision with confidence.
obviously if I add/change the # of records in the OptimizedRecordStore approach above, I either have to replace the whole object with a new one, or remove the "final" keyword.
kd304 brings up a good point that was in the back of my mind. In other situations similar to this, I need column access on the records, e.g. if field1 and field2 are "time" and "position", and it's important for me to get those values as an array for use with MATLAB, so I can graph/analyze them efficiently.
The answers that give the general "optimise when you have to" advice are unhelpful in this case because, IMHO, programmers should always be aware of the performance implications of different design choices when a choice can lead to an order-of-magnitude performance penalty, particularly API writers.
The original question is quite valid and I would tend to agree that the second approach is better, given his particular situation. I've written image processing code where each pixel requires a data structure, a situation not too dissimilar to this, except I needed frequent random access to each pixel. The overhead of creating one object for each pixel was enormous.
The second version is much, much worse. Instead of resizing one array, you're resizing three arrays when you do an insert or delete. What's more, the second version will lead to the creation of many more temporary objects and it will do so on accesses. That could lead to a lot of garbage (from a GC point of view). Not good.
Generally speaking, you should worry about how you use the objects long before you think about performance. So you have a record with three fields, or three arrays. Which one more accurately depicts what you're modeling? By this I mean: when you insert or delete an item, are you updating just one of the three arrays, or all three as a block?
I suspect it's the latter in which case the former makes far more sense.
If you're really concerned about insertion/deletion performance then perhaps a different data structure is appropriate, perhaps a SortedSet or a Map or SortedMap.
If you have millions of records, the second approach has several advantages:
Memory usage: the first approach uses more memory because a) every Java object in heap has a header (containing class id, lock state etc.); b) objects are aligned in memory; c) each reference to an object costs 4 bytes (on 64-bit JVMs with Compressed OOPs or 32-bit JVMs) or 8 bytes (64-bit JVMs without Compressed OOPs). See e. g. CompressedOops for more details. So the first approach takes about two times more memory (more precisely: according to my benchmark, an object with 16 bytes of payload + a reference to it took 28 bytes on 32-bit Java 7, 36 bytes on 64-bit Java 7 with compressed OOPs, and 40 bytes on 64-bit Java 7 w/o compressed OOPs).
Garbage collection: although the second approach seems to create many objects (one on each call of getRecord), it might not be so, as modern server JVMs (e. g. Oracle's Java 7) can apply escape analysis and stack allocation to avoid heap allocation of temporary objects in some cases; anyway, GCing short-lived objects is cheap. On the other hand, it is probably easier for the garbage collector if there are not millions of long-lived objects (as there are in the first approach) whose reachability to check (or at least, such objects may make your application need more careful tuning of GC generation sizes). Thus the second approach may be better for GC performance. However, to see whether it makes a difference in the real situation, one should make a benchmark oneself.
Serialization speed: the speed of (de)serializing a large array of primitives on disk is only limited by HDD speed; serializing many small objects is inevitably slower (especially if you use Java's default serialization).
Therefore I have used the second approach quite often for very large collections. But of course, if you have enough memory and don't care about serialization, the first approach is simpler.
How are you going to access the data? If accesses to the fields are always coupled, use the first option; if you are going to process the fields on their own, the second option is better.
See this article in wikipedia: Parallel Array
A good example of when it's more convenient to have separate arrays is a simulation where the numerical data is packed together in one array, and other attributes like name, colour, etc., which are accessed just for presentation of the data, live in another array.
I was curious so I actually ran a benchmark. If you don't re-create the object like you are[1], then SoA beats AoS by 5-100% depending on workload[2]. See my code here:
https://gist.github.com/twolfe18/8168262c5420c7a62d39
[1] I didn't add that because if you are concerned enough about speed to consider this refactor, it would be silly to do that.
[2] This also doesn't account for re-allocation, but again, this is often something you can either amortize away or know statically. This is a reasonable assumption for a pure-speed benchmark.
Notice that the second approach might have a negative impact on caching behaviour. If you want to access a single record at a time, you'd better not have that record scattered all across the place.
Also, the only memory you win in the second approach is (possibly) due to member alignment and to not having to allocate a separate object per record.
Otherwise, they have exactly the same memory use, asymptotically. The first option is much better due to locality, IMO
Whenever I have tried doing number crunching in Java, I have always had to revert to C-style coding (i.e. close to your option 2). It minimises the number of objects floating around in your system: instead of 1,000,000 objects, you only have 3. I was able to do a bit of FFT analysis of real-time sound data using the C style, whereas it was far too slow using objects.
I'd choose the first method (array of structures) unless you access the store relatively infrequently and are running into serious memory pressure issues.
The first version basically stores the objects in their "natural" form (+1 BTW for using immutable records). This uses a little more memory because of the per-object overhead (probably around 8-16 bytes depending on your JVM), but is very good for accessing and returning objects in a convenient, human-understandable form in one simple step.
The second version uses less memory overall, but allocating a new object on every "get" is a pretty ugly solution that will not perform well if accesses are frequent.
Some other possibilities to consider:
An interesting "extreme" variant would be to take the second version but write your algorithms / access methods to interact with the underlying arrays directly. This is clearly going to result in complex inter-dependencies and some ugly code, but would probably give you the absolute best performance if you really needed it. It's quite common to use this approach for intensive graphics applications such as manipulating a large array of 3D coordinates.
A "hybrid" option would be to store the underlying data in a structure of arrays as in the second version, but cache the accessed objects in a HashMap so that you only generate the object the first time a particular index is accessed. Might make sense if only a small fraction of objects are ever likely to accessed, but all data is needed "just in case".
(Not a direct answer, but one that I think should be given)
From your comment,
"cletus -- I greatly respect your thoughts and opinions, but you gave me the high-level programming & software design viewpoint which is not what I'm looking for. I cannot learn to ignore optimization until I can get an intuitive sense for the cost of different implementation styles, and/or the ability to estimate those costs. – Jason S Jul 14 '09 at 14:27"
You should always ignore optimization until it presents itself as a problem. Most important is to have the system be usable by a developer (so they can make it usable by a user). There are very few times that you should concern yourself with optimization, in fact in ~20 years of professional coding I have cared about optimization a total of two times:
Writing a program that had its primary purpose to be faster than another product
Writing a smartphone app with the intention of reducing the amount of data sent between the client and server
In the first case I wrote some code and then ran it through a profiler. When I wanted to do something and was not sure which approach was best (for speed/memory), I would code it one way and see the result in the profiler, then code it the other way and see the result, and choose the faster of the two. This works, and you learn a lot about low-level decisions. I did not, however, allow it to impact the higher-level classes.
In the second case, there was no programming involved, but I did the same basic thing of looking at the data being sent and figuring out how to reduce the number of messages being sent as well as the number of bytes being sent.
If your code is clear, it will be easier to speed up once you find out it is slow. As cletus said in his answer, you are resizing one array versus three; one resize will be faster than three. From a higher point of view, one resize is also simpler to understand than three, and thus more likely to be correct.
Personally, I'd rather get the right answer slowly than the wrong answer quickly. Once I know how to get the right answer, I can find out where the system is slow and replace those parts with faster implementations.
Because you are making the int[] fields final, you are stuck with a single initialization of each array and that is it. Thus, if you wanted 10^6 field1 values, Java would need to reserve that much memory for each of those int[] arrays up front, because you cannot resize them. With an ArrayList, if you do not know the number of records beforehand and will potentially be removing records, you save a lot of space upfront, and later on as well when you go to remove records.
I would go for the ArrayList version too, so I don't need to worry about growing it. Do you need column-like access to the values? What is the scenario behind your question?
Edit: You could also use a common long[][] matrix.
I don't know how you pass the columns to MATLAB, but I guess you wouldn't gain much speed with column-based storage; more likely you'd lose speed in the Java computation.