Java large datastructure for storing a matrix

Java large datastructure for storing a matrix - java

I need to store a 2d matrix containing zip codes and the distance in km between each one of them. My client has an application that calculates the distances which are then stored in an Excel file. Currently, there are 952 places. So the matrix would have 952x952 = 906304 entries.
I tried to map this into a HashMap[Integer, Float]. The Integer is the hash code of the two Strings for two places, e.g. "A" and "B". The float value is the distance in km between them.
While filling in the data I run into OutOfMemoryExceptions after 205k entries. Do you have a tip how I can store this in a clever way? I even don't know if it's clever to have the whole bunch in memory. My options are SQL and MS Access...
The problem is that I need to access the data very quickly and possibly very often which is why I chose the HashMap because it runs in O(1) for the look up.
Thansk for your replies and suggestions!
Marco

A 2d array would be more memory efficient.
You can use a small hashmap to map the 952 places into a number between 0 and 951 .
Then, just do:
float[][] distances= new float[952][952];
To look things up, just use two hash lookups to convert the two places into two integers, and use them as indexes into the 2d array.
By doing it this way, you avoid the boxing of floats, and also the memory overhead of the large hashmap.
However, 906304 really isn't that many entries, you may just need to increase the Xmx maximum heap size

I would have thought that you could calculate the distances on the fly. Presumably someone has already done this, so you simply need to find out what algorithm they used, and the input data; e.g. longitude/latitude of the notional centres of each ZIP code.
EDIT: There are two commonly used algorithms for finding the (approximate) geodesic distance between two points given by longitude/latitude pairs.
The Vicenty formula is based on an ellipsoid approximation. It is more accurate, but more complicated to implement.
The Haversine formula is based on a spherical approximation. It is less accurate (0.3%), but simpler to implement.

Can you simply boost the memory available to the JVM ?
java -Xmx512m ...
By default the maximum memory configuration is 64Mb. Some more tuning tips here. If you can do this then you can keep the data in-process and maximise the performance (i.e. you don't need to calculate on the fly).

I upvoted Chi's and Benjamin's answers, because they're telling you what you need to do, but while I'm here, I'd like to stress that using the hashcode of the two strings directly will get you into trouble. You're likely to run into the problem of hash collisions.
This would not be a problem if you were concatenating the two strings (being careful to use a delimiter which cannot appear in the place designators), and letting HashMap do its magic, but the method you suggested, using the hashcodes for the two strings as a key, that's going to get you into trouble.

You will simply need more memory. When starting your Java process, kick it off like so:
java -Xmx256M MyClass
The -Xmx defines the max heap size, so this says the process can use up to 256 MB of memory for the heap. If you still run out, keep bumping that number up until you hit the physical limit.

Lately I've managed similar requisites for my master thesis.
I ended with a Matrix class that uses a double[], not a double[][], in order to alleviate double deref costs (data[i] that is an array, then array[i][j] that is a double) while allowing the VM to allocate a big, contiguous chunk of memory:
public class Matrix {
private final double data[];
private final int rows;
private final int columns;
public Matrix(int rows, int columns, double[][] initializer) {
this.rows = rows;
this.columns = columns;
this.data = new double[rows * columns];
int k = 0;
for (int i = 0; i < initializer.length; i++) {
System.arraycopy(initializer[i], 0, data, k, initializer[i].length);
k += initializer[i].length;
}
}
public Matrix set(int i, int j, double value) {
data[j + i * columns] = value;
return this;
}
public double get(int i, int j) {
return data[j + i * columns];
}
}
this class should use less memory than an HashMap since it uses a primitive array (no boxing needed): it needs only 906304 * 8 ~ 8 Mb (for doubles) or 906304 * 4 ~ 4 Mb (for floats). My 2 cents.
NB
I've omitted some sanity checks for simplicity's sake

Stephen C. has a good point: if the distances are as-the-crow-flies, then you could probably save memory by doing some calculations on the fly. All you'd need is space for the longitude and latitude for 952 zip codes and then you could use the vicenty formula to do your calculation when you need to. This would make your memory usage O(n) in zipcodes.
Of course, that solution makes some assumptions that may turn out to be false in your particular case, i.e. that you have longitude and latitude data for your zipcodes and that you're concerned with as-the-crow-flies distances and not something more complicated like driving directions.
If those assumptions are true though, trading a few computes for a whole bunch of memory might help you scale in the future if you ever need to handle a bigger dataset.

The above suggestions regarding heap size will be helpful. However, I am not sure if you gave an accurate description of the size of your matrix.
Suppose you have 4 locations. Then you need to assess the distances between A->B, A->C, A->D, B->C, B->D, C->D. This suggests six entries in your HashMap (4 choose 2).
That would lead me to believe the actual optimal size of your HashMap is (952 choose 2)=452,676; NOT 952x952=906,304.
This is all assuming, of course, that you only store one-way relationships (i.e. from A->B, but not from B->A, since that is redundant), which I would recommend since you are already experiencing problems with memory space.
Edit: Should have said that the size of your matrix is not optimal, rather than saying the description was not accurate.

Create a new class with 2 slots for location names. Have it always put the alphabetically first name in the first slot. Give it a proper equals and hashcode method. Give it a compareTo (e.g. order alphabetically by names). Throw them all in an array. Sort it.
Also, hash1 = hash2 does not imply object1 = object2. Don't ever do this. It's a hack.

Related

Best data structure to hold large amounts of data?

Reading in a lot of data from a file. There may be 100 different data objects with necessary headings, but there can be well over 300,000 values stored in each of these data objects. The values need to be stored in the same order that they are read in. This is the constructor for the data object:
public Data(String heading, ArrayList<Float> values) {
this.heading = heading;
this.values = values;
}
What would be the quickest way to store and retrieve these values sequentially in RAM?

Although in your comments you mention "quickness", without specifying what operation needs to be "quick", your main concern seems to be heap memory consumption.
Let's assume 100 groups of 300,000 numbers (you've used words like "may be" and "well over" but this will do as an example).
That's 30,000,000 numbers to store, plus 100 headings and some structural overhead for grouping.
A primitive Java float is 32 bits, that is 4 bytes. So at an absolute minimum, you're going to need 30,000,000 * 4 bytes == 120MB.
An array of primitives - float[30000000] - is just all the values concatenated into a contiguous chunk of memory, so will consume this theoretical minumum of 120MB -- plus a few bytes of once-per-array overhead that I won't go into detail about here.
A java Float wrapper object is 12 bytes. When you store an object (rather than a primitive) in an array, the reference itself is 4 bytes. So an array of Float - Float[30000000] will consume 30,000,000 * (12 + 4) == 480MB.
So, you can cut your memory use by more than half by using primitives rather than wrappers.
An ArrayList is quite a light wrapper around an array of Object and so has about the same memory costs. The once-per-list overheads are too small to have an impact compared to the elements, at these list sizes. But there are some caveats:
ArrayList can only store Objects, not primitives, so if you choose a List you're stuck with the 12-bytes-per-element overhead of Float.
There are some third-party libraries that provide lists of primitives - see: Create a List of primitive int?
The capacity of an ArrayList is dynamic, and to achieve this, if you grow the list to be bigger than its backing array, it will:
create a new array, 50% bigger than the old array
copy the contents of the old array into the new array (this sounds expensive, but hardware is very fast at doing this)
discard the old array
This means that if the backing array happens to have 30 million elements, and is full, ArrayList.add() will replace the array with one of 45 million elements, even if your List only needs 30,000,001.
You can avoid this if you know the needed capacity in advance, by providing the capacity in the constructor.
You can use ArrayList.trimToSize() to drop unneeded capacity and claw some memory back after you've filled the ArrayList.
If I was striving to use as little heap memory as possible, I would aim to store my lists of numbers as arrays of primitives:
class Data {
String header;
float[] values;
}
... and I would just put these into an ArrayList<Data>.
With this structure, you have O(1) access to arbitrary values, and you can use Arrays.binarySearch() (if the values are sorted) to find by value within a group.
If at all possible, I would find out the size of each group before reading the values, and initialise the array to the right size. If you can, make your input file format facilitate this:
while(line = readLine()) {
if(isHeader(line)) {
ParsedHeader header = new ParsedHeader(line);
currentArray = new float[header.size()];
arrayIndex = 0;
currentGroup = new Group(header.name(), currentArray);
groups.add(currentGroup);
} else if (isValue(line)) {
currentArray[arrayIndex++] = parseValue(line);
}
}
If you can't change the input format, consider making two passes through the file - once to discover group lengths, once again to fill your arrays.
If you have to consume the file in one pass, and the file format can't provide group lengths before groups, then you'll have to do something that allows a "list" to grow arbitrarily. There are several options:
Consume each group into an ArrayList<Float> - when the group is complete, convert it into an array[float]:
float[] array = new float[list.size()];
int i = 0;
for (Float f : list) {
array[i] = f; // auto-unboxes Float to float
}
Use a third-party list-of-float library class
Copy the logic used by ArrayList to replace your array with a bigger one when needed -- http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/ArrayList.java#ArrayList.ensureCapacity%28int%29
Any number of approaches discussed in Computer Science textbooks, for example a linked list of arrays.
However none of this considers your reasons for slurping all these numbers into memory in the first place, nor whether this store meets your needs when it comes to processing the numbers.
You should step back and consider what your actual data processing requirement is, and whether slurping into memory is the best approach.
See whether you can do your processing by storing only a slice of data at a time, rather than storing the whole thing in memory. For example, to calculate max/min/mean, you don't need every number to be in memory -- you just need to keep a running total.
Or, consider using a lightweight database library.

You could use a RedBlack BST, which will be an extremely efficient way to store/retrieve data. This relies on nodes that link to other nodes, so there's no limit to the size of the input, as long as you have enough memory for java.

Java - Large array advice on how to break it down [duplicate]

I'm trying to find a counterexample to the Pólya Conjecture which will be somewhere in the 900 millions. I'm using a very efficient algorithm that doesn't even require any factorization (similar to a Sieve of Eratosthenes, but with even more information. So, a large array of ints is required.
The program is efficient and correct, but requires an array up to the x i want to check for (it checks all numbers from (2, x)). So, if the counterexample is in the 900 millions, I need an array that will be just as large. Java won't allow me anything over about 20 million. Is there anything I can possibly do to get an array that large?

You may want to extend the max size of the JVM Heap. You can do that with a command line option.
I believe it is -Xmx3600m (3600 megabytes)

Java arrays are indexed by int, so an array can't get larger than 2^31 (there are no unsigned ints). So, the maximum size of an array is 2147483648, which consumes (for a plain int[]) 8589934592 bytes (= 8GB).
Thus, the int-index is usually not a limitation, since you would run out of memory anyway.
In your algorithm, you should use a List (or a Map) as your data structure instead, and choose an implementation of List (or Map) that can grow beyond 2^31. This can get tricky, since the "usual" implementation ArrayList (and HashMap) uses arrays internally. You will have to implement a custom data structure; e.g. by using a 2-level array (a list/array). When you are at it, you can also try to pack the bits more tightly.

Java will allow up to 2 billions array entries. It’s your machine (and your limited memory) that can not handle such a large amount.

900 million 32 bit ints with no further overhead - and there will always be more overhead - would require a little over 3.35 GiB. The only way to get that much memory is with a 64 bit JVM (on a machine with at least 8 GB of RAM) or use some disk backed cache.

If you don't need it all loaded in memory at once, you could segment it into files and store on disk.

What do you mean by "won't allow". You probably getting an OutOfMemoryError, so add more memory with the -Xmx command line option.

You could define your own class which stores the data in a 2d array which would be closer to sqrt(n) by sqrt(n). Then use an index function to determine the two indices of the array. This can be extended to more dimensions, as needed.
The main problem you will run into is running out of RAM. If you approach this limit, you'll need to rethink your algorithm or consider external storage (ie a file or database).

If your algorithm allows it:
Compute it in slices which fit into memory.
You will have to redo the computation for each slice, but it will often be fast enough.
Use an array of a smaller numeric type such as byte.

Depending on how you need to access the array, you might find a RandomAccessFile will allow you to use a file which is larger than will fit in memory. However, the performance you get is very dependant on your access behaviour.

I wrote a version of the Sieve of Eratosthenes for Project Euler which worked on chunks of the search space at a time. It processes the first 1M integers (for example), but keeps each prime number it finds in a table. After you've iterated over all the primes found so far, the array is re-initialised and the primes found already are used to mark the array before looking for the next one.
The table maps a prime to its 'offset' from the start of the array for the next processing iteration.
This is similar in concept (if not in implementation) to the way functional programming languages perform lazy evaluation of lists (although in larger steps). Allocating all the memory up-front isn't necessary, since you're only interested in the parts of the array that pass your test for primeness. Keeping the non-primes hanging around isn't useful to you.
This method also provides memoisation for later iterations over prime numbers. It's faster than scanning your sparse sieve data structure looking for the ones every time.

I second #sfossen's idea and #Aaron Digulla. I'd go for disk access. If your algorithm can take in a List interface rather than a plain array, you could write an adapter from the List to the memory mapped file.

Use Tokyo Cabinet, Berkeley DB, or any other disk-based key-value store. They're faster than any conventional database but allow you to use the disk instead of memory.

could you get by with 900 million bits? (maybe stored as a byte array).

You can try splitting it up into multiple arrays.
for(int x = 0; x <= 1000000; x++){
myFirstList.add(x);
}
for(int x = 1000001; x <= 2000000; x++){
mySecondList.add(x);
}
then iterate over them.
for(int x: myFirstList){
for(int y: myFirstList){
//Remove multiples
}
}
//repeat for second list

Use a memory mapped file (Java 5 NIO package) instead. Or move the sieve into a small C library and use Java JNI.

fastest way to map a large number of longs

I'm writing a java application that transforms numbers (long) into a small set of result objects. This mapping process is very critical to the app's performance as it is needed very often.
public static Object computeResult(long input) {
Object result;
// ... calculate
return result;
}
There are about 150,000,000 different key objects, and about 3,000 distinct values.
The transformation from the input number (long) to the output (immutable object) can be computed by my algorithm with a speed of 4,000,000 transformations per second. (using 4 threads)
I would like to cache the mapping of the 150M different possible inputs to make the translation even faster but i found some difficulties creating such a cache:
public class Cache {
private static long[] sortedInputs; // 150M length
private static Object[] results; // 150M length
public static Object lookupCachedResult(long input) {
int index = Arrays.binarySearch(sortedInputs, input);
return results[index];
}
}
i tried to create two arrays with a length of 150M. the first array holds all possible input longs, and it is sorted numerically. the second array holds a reference to one of the 3000 distinct, precalculated result objects at the index corresponding to the first array's input.
to get to the cached result, i do a binary search for the input number on the first array. the cached result is then looked up in the second array at the same index.
sadly, this cache method is not faster than computing the results. not even half, only about 1.5M lookups per second. (also using 4 threads)
Can anyone think of a faster way to cache results in such a scenario?
I doubt there is a database engine that is able to answer more than 4,000,000 queries per second on, let's say an average workstation.

Hashing is the way to go here, but I would avoid using HashMap, as it only works with objects, i.e. must build a Long each time you insert a long, which can slow it down. Maybe this performance issue is not significant due to JIT, but I would recommend at least to try the following and measure performance against the HashMap-variant:
Save your longs in a long-array of some length n > 3000 and do the hashing by hand via a very simple hash-function (and thus efficient) like
index = key % n. Since you know your 3000 possible values before hand you can empirically find an array-length n such that this trivial hash-function won't cause collisions. So you circumvent rehashing etc. and have true O(1)-performance.
Secondly I would recommend you to look at Java-numerical libraries like
https://github.com/mikiobraun/jblas
https://github.com/fommil/matrix-toolkits-java
Both are backed by native Lapack and BLAS implementations that are usually highly optimized by very smart people. Maybe you can formulate your algorithm in terms of matrix/vector-algebra such that it computes the whole long-array at one time (or chunk-wise).

There are about 150,000,000 different key objects, and about 3,000 distinct values.
With the few values, you should ensure that they get re-used (unless they're pretty small objects). For this an Interner is perfect (though you can run your own).
i tried hashmap and treemap, both attempts ended in an outOfMemoryError.
There's a huge memory overhead for both of them. And there isn't much point is using a TreeMap as it uses a sort of binary search which you've already tried.
There are at least three implementations of a long-to-object-map available, google for "primitive collections". This should use slightly more memory than your two arrays. With hashing being usually O(1) (let's ignore the worst case as there's no reason for it to happen, is it?) and much better memory locality, it'll beat(*) your binary search by a factor of 20. You binary search needs log2(150e6), i.e., about 27 steps and hashing may need on the average maybe two. This depends on how tightly you pack the hash table; this is usually a parameter given when it gets created.
In case you run your own (which you most probably shouldn't), I'd suggest to use an array of size 1 << 28, i.e., 268435456 entries, so that you can use bitwise operations for indexing.
(*) Such predictions are hard, but I'm sure it's worth trying.

ArrayList<Double> to double[] with 300 million entries

I'm using a java program to get some data from a DB. I then calculate some numbers and start storing them in an array. The machine I'm using has 4 gigs of RAM. Now, I don't know how many numbers there will be in advance, so I use an ArrayList<Double>. But I do know there will be roughly 300 million numbers.
So, since one double is 8 bytes a rough estimate of the memory this array will consume is 2.4 gigs (probably more because of the overheads of an ArrayList). After this, I want to calculate the median of this array and am using the org.apache.commons.math3.stat.descriptive.rank.Median library which takes as input a double[] array. So, I need to convert the ArrayList<Double> to double[].
I did see many questions where this is raised and they all mention there is no way around looping through the entire array. Now this is fine, but since they also maintain both objects in memory, this brings my memory requirements up to 4.8 gigs. Now we have a problem since the total RAM available us 4 gigs.
First of all, is my suspicion that the program will at some point give me a memory error correct (it is currently running)? And if so, how can I calculate the median without having to allocate double the memory? I want to avoid sorting the array as calculating the median is O(n).

Your problem is even worse than you realize, because ArrayList<Double> is much less efficient than 8 bytes per entry. Each entry is actually an object, to which the ArrayList keeps an array of references. A Double object is probably about 12 bytes (4 bytes for some kind of type identifier, 8 bytes for the double itself), and the reference to it adds another 4, bringing the total up to 16 bytes per entry, even excluding overhead for memory management and such.
If the constraints were a little wider, you could implement your own DoubleArray that is backed by a double[] but knows how to resize itself. However, the resizing means you'll have to keep a copy of both the old and the new array in memory at the same time, also blowing your memory limit.
That still leaves a few options though:
Loop through the input twice; once to count the entries, once to read them into a right-sized double[]. It depends on the nature of your input whether this is possible, of course.
Make some assumption on the maximum input size (perhaps user-configurable), and allocate a double[] up front that is this fixed size. Use only the part of it that's filled.
Use float instead of double to cut memory requirements in half, at the expense of some precision.
Rethink your algorithm to avoid holding everything in memory at once.

There are many open source libraries that create dynamic arrays for primitives. One of these:
http://trove.starlight-systems.com/

The Median value is the value at the middle of a sorted list. So you don't have to use a second array, you can just do:
Collections.sort(myArray);
final double median = myArray.get(myArray.size() / 2);
And since you get that data from a DB anyways, you could just tell the DB to give you the median instead of doing it in Java, which will save all the time (and memory) for transmitting the data as well.

I agree, use Trove4j TDoubleArrayList class (see javadoc) to store double or TFloatArrayList for float. And by combining previous answers, we gets :
// guess initialcapacity to remove requirement for resizing
TDoubleArrayList data = new TDoubleArrayList(initialcapacity);
// fill data
data.sort();
double median = data.get(data.size()/2);

Representing a 100K X 100K matrix in Java

How can I store a 100K X 100K matrix in Java?
I can't do that with a normal array declaration as it is throwing a java.lang.OutofMemoryError.

The Colt library has a sparse matrix implementation for Java.
You could alternatively use Berkeley DB as your storage engine.
Now if your machine has enough actual RAM (at least 9 gigabytes free), you can increase the heap size in the Java command-line.

If the vast majority of entries in your matrix will be zero (or even some other constant value) a sparse matrix will be suitable. Otherwise it might be possible to rewrite your algorithm so that the whole matrix doesn't exist simultaneously. You could produce and consume one row at a time, for example.

Sounds like you need a sparse matrix. Others have already suggested good 3rd party implementations that may suite your needs...
Depending on your applications, you could get away without a third-party matrix library by just using a Map as a backing-store for your matrix data. Kind of...
public class SparseMatrix<T> {
private T defaultValue;
private int m;
private int n;
private Map<Integer, T> data = new TreeMap<Integer, T>();
/// create a new matrix with m rows and n columns
public SparseMatrix(int m, int n, T defaultValue) {
this.m = m;
this.n = n;
this.defaultValue = defaultValue;
}
/// set value at [i,j] (row, col)
public void setValueAt(int i, int j, T value) {
if (i >= m || j >= n || i < 0 || j < 0)
throw new IllegalArgumentException(
"index (" + i + ", " +j +") out of bounds");
data.put(i * n + j, value);
}
/// retrieve value at [i,j] (row, col)
public T getValueAt(int i, int j) {
if (i >= m || j >= n || i < 0 || j < 0)
throw new IllegalArgumentException(
"index (" + i + ", " +j +") out of bounds");
T value = data.get(i * n + j);
return value != null ? value : defaultValue;
}
}
A simple test-case illustrating the SparseMatrix' use would be:
public class SparseMatrixTest extends TestCase {
public void testMatrix() {
SparseMatrix<Float> matrix =
new SparseMatrix<Float>(100000, 100000, 0.0F);
matrix.setValueAt(1000, 1001, 42.0F);
assertTrue(matrix.getValueAt(1000,1001) == 42.0);
assertTrue(matrix.getValueAt(1001,1000) == 0.0);
}
}
This is not the most efficient way of doing it because every non-default entry in the matrix is stored as an Object. Depending on the number of actual values you are expecting, the simplicity of this approach might trump integrating a 3rd-party solution (and possibly dealing with its License - again, depending on your situation).
Adding matrix-operations like multiplication to the above SparseMatrix implementation should be straight-forward (and is left as an exercise for the reader ;-)

100,000 x 100,000 = 10,000,000,000 (10 billion) entries. Even if you're storing single byte entries, that's still in the vicinity of 10 GB - does your machine even have that much physical memory, let alone have a will to allocate that much to a single process?
Chances are you're going to need to look into some kind of a way to only keep part of the matrix in memory at any given time, and the rest buffered on disk.

There are a number possible solutions depending on how much memory you have, how sparse the array actually is, and what the access patterns are going to be.
If the calculation of 100K * 100K * 8 is less than the amount of physical memory on your machine for use by the JVM, a simple non-sparse array is viable solution.
If the array is sparse, with (say) 75% or more of the elements being zero, then you can save space by using a sparse array library. Various alternatives have been suggested, but in all cases, you still need to work out if this is going to give you enough savings. Figure out how many non-zero elements there are going to be, multiply that by 8 (to give you doubles) and (say) 4 to account for the overheads of the sparse array. If that is less than the amount of physical memory that you can make available to the JVM, then sparse arrays are a viable solution.
If sparse and non-sparse arrays (in memory) won't work, things will get more complicated, and the viability of any solution will depend on the access patterns for the array data.
One approach is to represent the array as a file that is mapped into memory in the form of a MappedByteBuffer. Assuming that you don't have enough physical memory to store the entire file in memory, you are going to be hitting the virtual memory system hard. So it is best if your algorithm only needs to operate on contiguous sections of the array at any time. Otherwise, you'll probably die from swapping.
A second approach is a variation of the first. Map the array/file a section at a time, and when you are done, unmap and move to the next section. This only works if the algorithm works on the array in sections.
A third approach is to represent the array using a light-weight database like BDB. This will be slower than any in-memory solution because reading array elements will translate into disc accesses. But if you get it wrong it won't kill the system like the memory mapped approach will. (And if you do this on Linux/Unix, the system's disc block cache may speed things up, depending on your algorithm's array access patterns)
A fourth approach is to use a distributed memory cache. This replaces disc i/o with network i/o, and it is hard to say whether this is a good or bad thing.
A fifth approach is to analyze your algorithm and see if it is amenable to implementing as a distributed algorithm; e.g. with sections of the array and corresponding parts of the algorithm on different machines.

You can upgrade to this machine:
http://www.azulsystems.com/products/compute_appliance.htm
864 processor cores and 768 GB of memory, only costs a single family house somewhere.

Well, I'd suggest that you increase the memory in your jvm but you've going to need a lot of memory, as you're talking about 10 billion items. It's (barely) possible with lots of memory or a clustered jvm, but that's probably the wrong answer.
You're getting the outOfmemory because if you declare int[1000], the memory is allocated immediately (additionally doubles take up more space than ints-an int representation will also save you space). Maybe you can substitute a more efficient implementation of your array (if you have many empty entries lookup "sparse matrix" representations).
You could store pieces in an outside system, like memcached or memory-mapped buffers.
There are lots of good suggestions here, maybe if you posted a more detailed description of the problem you're trying to solve people could be more specific.

You should try an "external" package to handle matrices, I never did that though, maybe something like jama.

Unless you have 100K x 100K x 8 ~ 80GB of memory, you cannot create this matrix in memory. You can create this matrix on disk and access it using memory mapping. However, using this approach will be very slow.
What are you trying to do? You may find that representing your data in a different way will be much more efficient.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.