How to store table or matrix in Java?

I used to use matrices in Octave to store data from a data set. How can I do that in Java? Assume I have 10-20 columns and a large amount of data; I don't think
int[][] data;
would be the best option. Is a nested map the only solution?

You could create a class Coordinate that takes X and Y values and properly implements hashCode and equals.
Then create a HashMap<Coordinate, Data> and work with it.
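A minimal sketch of that idea, in case it helps. The wrapper name MapTable, the Double value type (standing in for Data) and the "unset cells read as 0.0" rule are assumptions made for the example, not part of the answer above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Immutable key: two coordinates with value-based equals and hashCode.
final class Coordinate {
    private final int x;
    private final int y;

    Coordinate(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Coordinate)) return false;
        Coordinate other = (Coordinate) o;
        return x == other.x && y == other.y;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }
}

// Hypothetical wrapper: a table that stores only the cells that were set.
final class MapTable {
    private final Map<Coordinate, Double> cells = new HashMap<>();

    void set(int row, int col, double value) {
        cells.put(new Coordinate(row, col), value);
    }

    // Assumption of this sketch: cells that were never set read as 0.0.
    double get(int row, int col) {
        return cells.getOrDefault(new Coordinate(row, col), 0.0);
    }

    int storedCells() {
        return cells.size();
    }
}
```

Because the key is immutable, it avoids the usual trap of a key whose hash changes after it has been put into the map.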

It depends on what you need to do. If you know the size of the data up front, an array is ideal: it gives you constant-time reads and writes at any position, which is very useful for speed.
Maps are better if you don't know the size and the structure needs to be able to grow.
Finally, as I discovered in a previous question, if you have a ton of data and a lot of it will be 0, you might also want to consider using a sparse matrix.

This answer merges some of gnomed's answer and SJuan76's answer contents.
At a quick glance, I'd suggest using two-dimensional arrays such as int[][].
It's not a huge amount of data (we're speaking of ≈500 ints), so it's not a bad idea.
Advantages: It's the simplest and, from the data-structuring side, the ideal way to go,
especially if every “slot” of the matrix contains data.
Disadvantages: You have to know the size of the matrix before constructing it.
You can still resize it later by copying it into a larger array, for example with the java.util.Arrays utilities.
If you want to handle sparse data more efficiently, you can use a single map keyed by points.
That is, the key of every entry is a java.awt.Point that defines where the value is located.
Advantages: It's more space-efficient than a 2D array
when part of your matrix doesn't contain data.
It's also adaptive; you don't need to know any sizes to construct or resize it.
Disadvantages: If every “slot” of your matrix contains data,
you'll lose (a lot of) space and performance. A 2D array is more efficient then.
Want more? If your data is really huge, you can use a sparse matrix.
See this question for more details.

I would not discard multidimensional arrays just yet: have you tried them? Are you finding specific limitations? IMHO as long as your data fits in memory, arrays can be good.
If your data is very sparse though, you may want to look at maps indeed.
Related question btw: Making a very large Java array

You can use multidimensional arrays, or you can try a map keyed by index pairs, such as a HashMap.

I think multidimensional arrays are the best choice! They should serve your purpose. If your data set is only integers, int[][] is an ideal choice.

Well, if your indices are small integers, you can certainly use nested arrays.
In a matrix class, you may want to use a plain array, like so: (assuming n is the number of columns)
double get(int i, int j) { return data[i*n + j]; }
For a general table (sparse matrix), you can use nested maps, but consider using com.google.common.collect.Table implementations from the Google Guava library.
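To make the plain-array layout concrete, here is a minimal sketch of a matrix class built around the get shown above. The class name DenseMatrix and the bounds checks are assumptions added for the example.

```java
// Hypothetical class: a rows x cols matrix stored in one row-major array.
final class DenseMatrix {
    private final int rows;
    private final int cols;
    private final double[] data;

    DenseMatrix(int rows, int cols) {
        if (rows < 0 || cols < 0) {
            throw new IllegalArgumentException("negative dimension");
        }
        this.rows = rows;
        this.cols = cols;
        this.data = new double[rows * cols];
    }

    double get(int i, int j) {
        return data[index(i, j)];
    }

    void set(int i, int j, double value) {
        data[index(i, j)] = value;
    }

    // Row-major offset: element (i, j) lives at i * cols + j.
    private int index(int i, int j) {
        if (i < 0 || i >= rows || j < 0 || j >= cols) {
            throw new IndexOutOfBoundsException("(" + i + ", " + j + ")");
        }
        return i * cols + j;
    }
}
```

Compared with double[][], this uses one allocation instead of one per row, at the cost of doing the index arithmetic yourself.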

Related

Best practice for using array as the key of memoization in Java

I am doing some algorithm problems in Java, and from time to time a problem needs memoization to improve speed. Oftentimes the key is an array. What I usually use is
HashMap<ArrayList<Integer>, Integer> mem;
The main reason to use ArrayList<Integer> instead of int[] is that the hashCode() of a primitive array is based on the reference (object identity), whereas ArrayList<Integer> hashes and compares the actual element values, which is the desired behavior.
However, it is not very efficient, and the code can get pretty lengthy as well. So I am wondering if there is any best practice for this kind of memoization in Java? Thanks.
UPDATE: As many have pointed out in the comments, it is a very bad idea to use mutable objects as the key of a HashMap, and I totally agree.
To clarify the question a little more: when I use this type of memoization, I will NOT change the ArrayList<Integer> once it is inserted into the map. Normally the array represents some state, and I need to cache the corresponding value for that state in case it is visited again.
So please do not focus on how bad it is to use a mutable object as the key to a HashMap. Please do suggest a better way to do this kind of memoization.
UPDATE2: In the end I chose the Arrays.toString() approach, since I am doing algorithm problems on TopCoder/Codeforces and it is quick and dirty to code.
However, I do think HashMap is the more reasonable and readable way to do this.

You can create a new class, Key, that holds the array as a field, and implement your own hashCode() and equals() based on the contents of the array.
It will improve the readability as well:
HashMap<Key, Integer> mem;
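One way such a Key class could look, as a sketch only. The defensive copy in the constructor is an assumption of this example; it keeps the key stable even if the caller later modifies the original array.

```java
import java.util.Arrays;

// Wraps an int[] so that equality and hashing depend on the contents.
final class Key {
    private final int[] values;

    Key(int[] values) {
        // Copy so later changes to the caller's array cannot alter the key.
        this.values = values.clone();
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Key && Arrays.equals(values, ((Key) o).values);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(values);
    }
}
```

With this in place, mem.put(new Key(state), result) followed by mem.get(new Key(sameContents)) finds the cached value even though the two arrays are different objects.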

If your ArrayList contains usually 3-4 elements,
I would not worry much about performance. Your approach is OK.
But as others pointed out, your key is thus mutable which is
a bad idea.
Another approach is to append all elements of the ArrayList
together using some separator (say #) and thus have this kind
of string for key: 123#555#66678 instead of an ArrayList of
these 3 integers. You can just call Arrays.toString(int[])
and get a decent string key out of an array of integers.
I would choose the second approach.
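For illustration, the string-key variant might look like the sketch below. The class name StringKeyMemo and its lookup/store methods are made up for the example; only Arrays.toString is from the answer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical memo table keyed by the textual form of an int[] state.
final class StringKeyMemo {
    private final Map<String, Integer> mem = new HashMap<>();

    // Arrays.toString renders {1, 2, 3} as "[1, 2, 3]",
    // so arrays with equal contents produce equal keys.
    static String keyOf(int[] state) {
        return Arrays.toString(state);
    }

    // Returns the cached value, or null if this state was never stored.
    Integer lookup(int[] state) {
        return mem.get(keyOf(state));
    }

    void store(int[] state, int value) {
        mem.put(keyOf(state), value);
    }
}
```

The string is an immutable snapshot of the array, so the mutability concern goes away; the price is building a new string on every lookup.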

If the input array is large, the main problem seems to be the efficiency of lookup. On the other hand, your computation is probably much more expensive than that, so you've got some CPU cycles to spare.
Lookup time will depend both on the hashcode calculation and on the brute-force equals needed to pinpoint the key in a hash bucket. This is why the array as a key is out of the question.
The suggestion already given by user:XpressOneUp, creating a class which wraps the array and provides its custom hash code, seems like your best bet and you can optimize hashcode calculation to involve only some array elements. You'll know best which elements are the most salient.

If the values in the array are small integers, then here is a way to do it efficiently:
HashMap<String, Integer> map = new HashMap<>();

// Builds a comma-separated key, e.g. [1, 2, 3] -> "1,2,3,".
public String encode(List<Integer> arr) {
    StringBuilder key = new StringBuilder();
    for (int i = 0; i < arr.size(); i++) {
        key.append(arr.get(i)).append(',');
    }
    return key.toString();
}
Use the encode method to convert your array to a unique string, and use that string to add and look up values in the HashMap.

Multidimensional arrays with different sizes

I just had an idea to test something out and it worked:
String[][] arr = new String[4][4];
arr[2] = new String[5];
for(int i = 0; i < arr.length; i++)
{
    System.out.println(arr[i].length);
}
The output obviously is:
4
4
5
4
So my questions are:
Is this good or bad style of coding?
What could this be good for?
And most of all, is there a way to create such a construct in the declaration itself?
Also... why is it even possible to do?

Is this good or bad style of coding?
Like anything, it depends on the situation. There are situations where jagged arrays (as they are called) are in fact appropriate.
What could this be good for?
Well, for storing data sets with different lengths in one array. For instance, if we had the strings "hello" and "goodbye", we might want to store their character arrays in one structure. These char arrays have different lengths, so we would use a jagged array.
And most of all, is there a way to create such a construct in the declaration itself?
Yes:
char[][] x = {{'h','e','l','l','o'},
{'g','o','o','d','b','y','e'}};
Also... why is it even possible to do?
Because it is allowed by the Java Language Specification, §10.6.
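As another sketch of where a jagged array fits naturally: a triangular table, where row r needs exactly r + 1 entries. The class and method names here are made up for the example.

```java
// Hypothetical demo: Pascal's triangle stored as a jagged int[][].
final class JaggedDemo {
    static int[][] pascal(int rows) {
        // Allocate only the outer array; each row gets its own length below.
        int[][] t = new int[rows][];
        for (int r = 0; r < rows; r++) {
            t[r] = new int[r + 1];
            t[r][0] = 1;
            t[r][r] = 1;
            for (int c = 1; c < r; c++) {
                t[r][c] = t[r - 1][c - 1] + t[r - 1][c];
            }
        }
        return t;
    }
}
```

A rectangular int[rows][rows] would hold the same data but leave roughly half of its cells unused.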

This is a fine style of coding, there's nothing wrong with it. I've created jagged arrays myself for different problems in the past.
This is good because you might need to store data in this way. Data stored this way can save memory, and in certain scenarios it is a more natural way to map items.
In a single line, without explicitly populating the array? No. This is the closest thing I can think of.
int[][] test = new int[10][];
test[0] = new int[100];
test[1] = new int[500];
This would allow you to populate the rows with arrays of different lengths. I prefer this approach to populating with values like so:
int[][] test = new int[][]{{1,2,3},{4},{5,6,7}};
Because it is more readable, and practical to type when dealing with large ragged arrays.
It's possible for the reasons given in 2. People have valid reasons for needing ragged arrays, so the creators of the language gave us a way to do it.

(1) While nothing is technically/functionally/syntactically wrong with it, I would say it is bad coding style because it breaks the assumption provided by the initialization of the object (String[4][4]). This, ultimately, is up to user preference; if you're the only one reading it, and you know exactly what you're doing, it would be fine. If other people share/use your code, it adds confusion.
(2) The only concept I could think of is if you had multiple arrays to be read in, but didn't know the size of them beforehand. However, it would make more sense to use ArrayList<String> in that case, unless the added overhead was a serious matter.
(3) I'm not sure what you're asking about here. Do you mean, can you somehow specify individual array lengths in that initial declaration? The answer to that is no.
(4) It's possible because a String[][] is really an array of references to row arrays. Assigning arr[2] doesn't extend or shrink anything; it just points that slot at a newly allocated array and leaves the old one to the garbage collector.

Memory-efficient sparse array in Java

(There are some questions about time-efficient sparse arrays but I am looking for memory efficiency.)
I need the equivalent of a List<T> or Map<Integer,T> which
Can grow on demand just by setting a key larger than any encountered before. (Can assume keys are nonnegative.)
Is about as memory-efficient as an ArrayList<T> in the case that most of the indices are not null, i.e. when the actual data is not very sparse.
When the indices are sparse, consumes space proportional to the number of non-null indices.
Uses less memory than HashMap<Integer,T> (as this autoboxes the keys and probably does not take advantage of the scalar key type).
Can get or set an element in amortized log(N) time where N is the number of entries: need not be linear time, binary search would be acceptable.
Implemented in a nonviral open-source pure Java library (preferably in Maven Central).
Does anyone know of such a utility class?
I would have expected Commons Collections to have one but it did not seem to.
I came across org.apache.commons.math.util.OpenIntToFieldHashMap which looks almost right except the value type is a FieldElement which seems gratuitous; I just want T extends Object. It looks like it would be easy to edit its source code to be more generic, though I would rather use a binary dependency if one is available.

I would try with trove collections, there is TIntObjectMap which can work for your intents.

I would look at Android's SparseArray implementation for inspiration. You can view the source by downloading AOSP's source code here http://source.android.com/source/downloading.html
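As a rough sketch of the general idea (sorted int keys in one array, values in a parallel array, binary search for lookup), and not a copy of Android's code: the class name SortedSparseArray and the initial capacity are assumptions of this example.

```java
import java.util.Arrays;

// Hypothetical sparse array: sorted primitive keys plus a parallel value array.
final class SortedSparseArray<T> {
    private int[] keys = new int[8];
    private Object[] values = new Object[8];
    private int size;

    // Binary search over the used prefix; null means "no such key".
    @SuppressWarnings("unchecked")
    T get(int key) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        return i >= 0 ? (T) values[i] : null;
    }

    void put(int key, T value) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i >= 0) {
            values[i] = value; // key already present: overwrite
            return;
        }
        int at = -(i + 1); // insertion point reported by binarySearch
        if (size == keys.length) {
            keys = Arrays.copyOf(keys, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        // Shift the tail right by one to keep the keys sorted.
        System.arraycopy(keys, at, keys, at + 1, size - at);
        System.arraycopy(values, at, values, at + 1, size - at);
        keys[at] = key;
        values[at] = value;
        size++;
    }

    int size() {
        return size;
    }
}
```

Lookups are O(log N) and there is no boxing of keys or per-entry node object. Note that inserting a new key in the middle shifts the tail, so a put can cost O(N); appending ever-larger keys stays cheap.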

I suggest using OpenIntObjectHashMap from the Colt library.

I have saved my test case as jglick/inthashmap. The results:
HashMap size: 1017504
TIntObjectMap size: 853216
IntHashMap size: 846984
OpenIntObjectHashMap size: 760472

Late to this question, but there is IntMap in libgdx which uses cuckoo hashing. If anything it would be interesting to compare with the others.

Fastest way to access a table of data Java

I am in the middle of a friendly code optimisation battle (to get the fastest program), and I am trying to find a faster way to access a dictionary of hard-coded data than a multidimensional array.
E.g. to get the value for x:
int x = array[v1][v2][v3];
I have read that nested switch statements in a custom array may possibly be faster. Or is there a way to access memory more directly, similar to pointers in C? Any ideas appreciated!
My 'competitor' is using a truth table, and the idea is to find something faster!
Many Thanks
Sam

If the array is regular in shape (i.e. MxNxK for some fixed M, N and K), you could try flattening it to achieve better locality of reference:
int array[] = new int[M*N*K];
...
int x = array[v1*N*K + v2*K + v3];
Also, if the entire array doesn't fit in the CPU cache, you might want to examine the patterns in which the array is accessed, to perhaps re-order the indices or change your code to make better use of the caches.
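A small sketch of the flattening, wrapped in a class so the index arithmetic lives in one place. The name Flat3D and the from helper are made up for the example, and from assumes a rectangular, non-empty source array.

```java
// Hypothetical wrapper around a flattened M x N x K table.
final class Flat3D {
    private final int n;
    private final int k;
    private final int[] data;

    Flat3D(int m, int n, int k) {
        this.n = n;
        this.k = k;
        this.data = new int[m * n * k];
    }

    // Same offset as v1*N*K + v2*K + v3, written in nested form.
    int get(int v1, int v2, int v3) {
        return data[(v1 * n + v2) * k + v3];
    }

    void set(int v1, int v2, int v3, int value) {
        data[(v1 * n + v2) * k + v3] = value;
    }

    // Copies a rectangular nested array into the flat layout, row by row.
    static Flat3D from(int[][][] src) {
        Flat3D flat = new Flat3D(src.length, src[0].length, src[0][0].length);
        for (int a = 0; a < src.length; a++) {
            for (int b = 0; b < src[a].length; b++) {
                System.arraycopy(src[a][b], 0, flat.data, (a * flat.n + b) * flat.k, flat.k);
            }
        }
        return flat;
    }
}
```

Whether this actually beats array[v1][v2][v3] depends on the JIT and the access pattern, so it is worth measuring both versions on the real workload.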

How should I map string keys to values in Java in a memory-efficient way?

I'm looking for a way to store a string->int mapping. A HashMap is, of course, the most obvious solution, but I'm memory constrained and need to store 2 million pairs with 7-character keys, so I need something memory-efficient; retrieval speed is a secondary concern.
Currently I'm going along the line of:
List<Tuple<String, Integer>> list = new ArrayList<Tuple<String, Integer>>();
list.add(...); // load from file
Collections.sort(list);
and then for retrieval:
Collections.binarySearch(list, key); // log(n), acceptable
Should I perhaps go for a custom tree (each node a single character, each leaf with result), or is there an existing collection that fits this nicely? The strings are practically sequential (UK postcodes, they don't differ much), so I'm expecting nice memory savings here.

Edit: I just saw you mentioned the Strings were UK postcodes, so I'm fairly confident you couldn't go very wrong by using a Trove TLongIntHashMap (btw Trove is a small library and it's very easy to use).
Edit 2: Lots of people seem to find this answer interesting so I'm adding some information to it.
The goal here is to use a map containing keys/values in a memory-efficient way so we'll start by looking for memory-efficient collections.
The following SO question is related (but far from identical to this one).
What is the most efficient Java Collections library?
Jon Skeet mentions that Trove is "just a library of collections from primitive types" [sic] and, that, indeed, it doesn't add much functionality. We can also see a few benchmarks (by the.duckman) about memory and speed of Trove compared to the default Collections. Here's a snippet:
                   100000 put operations   100000 contains operations
java collections   1938 ms                 203 ms
trove              234 ms                  125 ms
pcj                516 ms                  94 ms
And there's also an example showing how much memory can be saved by using Trove instead of a regular Java HashMap:
java collections   oscillates between 6644536 and 7168840 bytes
trove              1853296 bytes
pcj                1866112 bytes
So even though benchmarks always need to be taken with a grain of salt, these numbers suggest that Trove not only saves memory but is also faster than the standard collections.
So our goal now becomes to use Trove (given that when you put millions and millions of entries in a regular HashMap, your app begins to feel unresponsive).
You mentioned 2 million pairs, 7-character keys and a String/int mapping.
2 million is really not that much, but you'll still feel the "Object" overhead and the constant (un)boxing of primitives to Integer in a regular HashMap<String, Integer>, which is why Trove makes a lot of sense here.
However, I'd point out that if you have control over the "7 characters", you could go even further: if you're using, say, only ASCII or ISO-8859-1 characters, your 7 characters would fit in a long (*). In that case you can avoid object creation altogether and represent your 7 characters as a long. You'd then use a Trove TLongIntHashMap and bypass the "Java Object" overhead altogether.
You stated specifically that your keys were 7 characters long and then commented they were UK postcodes: I'd map each postcode to a long and save a tremendous amount of memory by fitting millions of key/value pairs into memory using Trove.
The advantage of Trove is basically that it is not doing constant boxing/unboxing of Objects/primitives: Trove works, in many cases, directly with primitives and primitives only.
(*) say you only have at most 256 codepoints/characters used, then it fits on 7*8 == 56 bits, which is small enough to fit in a long.
Sample method for encoding the String keys into long's (assuming ASCII characters, one byte per character for simplification - 7 bits would be enough):
long encode(final String key) {
    final int length = key.length();
    if (length > 8) {
        throw new IndexOutOfBoundsException(
                "key is longer than 8 characters");
    }
    long result = 0;
    for (int i = 0; i < length; i++) {
        // Mask to one byte so characters >= 0x80 are not sign-extended.
        result |= (key.charAt(i) & 0xFFL) << (i * 8);
    }
    return result;
}
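To show the long encoding in use without pulling in a library, here is a sketch that swaps the Trove map for two plain parallel arrays (sorted encoded keys plus their values) and a binary search, much like the sorted-list idea in the question. The class name PackedLookup and the "missing" sentinel parameter are assumptions of this example, and it assumes single-byte characters as above.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical read-only lookup table: sorted long keys, parallel int values.
final class PackedLookup {
    private final long[] keys;   // encoded keys in ascending order
    private final int[] values;  // values[i] belongs to keys[i]

    PackedLookup(String[] rawKeys, int[] rawValues) {
        int n = rawKeys.length;
        long[] encoded = new long[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            encoded[i] = encode(rawKeys[i]);
            order[i] = i;
        }
        // Sort positions by encoded key so keys and values move together.
        Arrays.sort(order, Comparator.comparingLong(i -> encoded[i]));
        keys = new long[n];
        values = new int[n];
        for (int i = 0; i < n; i++) {
            keys[i] = encoded[order[i]];
            values[i] = rawValues[order[i]];
        }
    }

    // Returns the stored value, or the caller-supplied sentinel if absent.
    int get(String key, int missing) {
        int i = Arrays.binarySearch(keys, encode(key));
        return i >= 0 ? values[i] : missing;
    }

    // One byte per character, first character in the lowest byte.
    static long encode(String key) {
        if (key.length() > 8) {
            throw new IllegalArgumentException("key is longer than 8 characters");
        }
        long result = 0;
        for (int i = 0; i < key.length(); i++) {
            result |= (key.charAt(i) & 0xFFL) << (i * 8);
        }
        return result;
    }
}
```

Once built, each entry costs 8 bytes for the key and 4 for the value, so 2 million pairs need about 24 MB, with O(log n) lookups. The temporary Integer[] used for sorting is only alive during construction.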

Use the Trove library.
The Trove library has optimized HashMap and HashSet classes for primitives. In this case, TObjectIntHashMap<String> will map the parameterized object (String) to a primitive int.

First off, did you measure that LinkedList is indeed more memory efficient than a HashMap, or how did you come to that conclusion? Secondly, a LinkedList's access time for an element is O(n), so you cannot do an efficient binary search on it. If you want to take such an approach, you should use an ArrayList, which should give you the best compromise between performance and space. However, again, I doubt that a HashMap, Hashtable or, in particular, a TreeMap would consume that much more memory, and the first two would provide constant-time access and the tree map logarithmic access, along with a nicer interface than a plain list. I would take some measurements to see how big the difference in memory consumption really is.
UPDATE: Given, as Adamski pointed out, that the Strings themselves, not the data structure they are stored in, will consume the most memory, it might be a good idea to look into data structures that are specific for strings, such as tries (especially patricia tries), which might reduce the storage space needed for the strings.

What you are looking for is a succinct-trie - a trie which stores its data in nearly the least amount of space theoretically possible.
Unfortunately, there are no succinct-trie class libraries currently available for Java. One of my next projects (in a few weeks) is to write one for Java (and other languages).
In the meanwhile, if you don't mind JNI, there are several good native succinct-trie libraries you could reference.

Have you looked at tries? I've not used them, but they may fit with what you're doing.

A custom tree would have the same complexity of O(log n), don't bother. Your solution is sound, but I would go with an ArrayList instead of the LinkedList because the linked list allocates one extra object per stored value, which will amount to a lot of objects in your case.

As Erick writes using the Trove library is a good place to start as you save space in storing int primitives rather than Integers.
However, you are still faced with storing 2 million String instances. Given that these are keys in the map, interning them won't offer any benefit so the next thing I'd consider is whether there's some characteristic of the Strings that can be exploited. For example:
If the Strings represent sentences of common words then you could transform the String into a Sentence class, and intern the individual words.
If the Strings only contain a subset of Unicode characters (e.g. only letters A-Z, or letters + digits) you could use a more compact encoding scheme than Java's Unicode.
You could consider transforming each String into a UTF-8 encoded byte array and wrapping this in class: MyString. Obviously the trade-off here is the additional time spent performing look-ups.
You could write the map to a file and then memory map a portion or all of the file.
You could consider libraries such as Berkeley DB that allow you to define persistent maps and cache a portion of the map in memory. This offers a scalable approach.

Maybe you can go with a RadixTree?

Use java.util.TreeMap instead of java.util.HashMap. It makes use of a red-black binary search tree and doesn't use more memory than what is required for the nodes holding the elements in the map. No extra buckets, unlike HashMap or Hashtable.

I think the solution is to step a little outside of Java. If you have that many values, you should use a database. If you don't feel like installing Oracle, SQLite is quick and easy. That way the data you don't immediately need is stored on the disk, and all of the caching/storage is done for you. Setting up a DB with one table and two columns won't take much time at all.

I'd consider using some cache as these often have the overflow-to-disk ability.

You might create a key class that matches your needs. Perhaps like this:
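import java.util.Arrays;

public class MyKey implements Comparable<MyKey>
{
    private final char[] keyValue = new char[7];

    public MyKey(String keyValue)
    {
        // Copy at most 7 characters; shorter keys stay padded with '\0'.
        keyValue.getChars(0, Math.min(keyValue.length(), 7), this.keyValue, 0);
    }

    public int compareTo(MyKey rhs)
    {
        // Lexicographic comparison, character by character.
        for (int i = 0; i < keyValue.length; i++)
        {
            int diff = keyValue[i] - rhs.keyValue[i];
            if (diff != 0) return diff;
        }
        return 0;
    }

    public boolean equals(Object rhs)
    {
        return rhs instanceof MyKey && Arrays.equals(keyValue, ((MyKey) rhs).keyValue);
    }

    public int hashCode()
    {
        return Arrays.hashCode(keyValue);
    }
}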

Try this one:
OptimizedHashMap<String, int[]> myMap = new OptimizedHashMap<String, int[]>();
for(int i = 0; i < 2000000; i++)
{
    myMap.put("iiiiii" + i, new int[]{i});
}
System.out.println(myMap.containsValue(new int[]{3}));
System.out.println(myMap.get("iiiiii" + 1)[0]);
public class OptimizedHashMap<K,V> extends HashMap<K,V>
{
    // Compares single-element int[] values by content instead of by reference.
    @Override
    public boolean containsValue(Object value) {
        if(value != null)
        {
            Class<? extends Object> aClass = value.getClass();
            if(aClass.isArray())
            {
                Collection<V> values = this.values();
                for(Object val : values)
                {
                    int[] newval = (int[]) val;
                    int[] newvalue = (int[]) value;
                    if(newval[0] == newvalue[0])
                    {
                        return true;
                    }
                }
            }
        }
        return false;
    }
}

Actually, HashMap and List are too general for such a specific task as looking up an int by postcode. You should take advantage of knowing what kind of data is used. One option is to use a prefix tree whose leaves store the int values. It could also be pruned if (my guess) a lot of codes with the same prefix map to the same integer.
Looking up the int by postcode in such a tree is linear in the length of the code and does not grow as the number of codes increases, compared to O(log(N)) in the case of binary search.
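A bare-bones sketch of such a prefix tree, to show the lookup walking one node per character. The class name IntTrie and the "missing" sentinel are assumptions of this example, and the per-node HashMap is chosen for brevity only: a memory-tuned version would use compact child arrays and the pruning mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical prefix tree mapping strings to ints.
final class IntTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean hasValue;
        int value;
    }

    private final Node root = new Node();

    void put(String key, int value) {
        Node node = root;
        for (int i = 0; i < key.length(); i++) {
            // Create the child for this character on first use.
            node = node.children.computeIfAbsent(key.charAt(i), c -> new Node());
        }
        node.hasValue = true;
        node.value = value;
    }

    // Walks one node per character; cost depends on key length, not on entry count.
    int get(String key, int missing) {
        Node node = root;
        for (int i = 0; i < key.length() && node != null; i++) {
            node = node.children.get(key.charAt(i));
        }
        return node != null && node.hasValue ? node.value : missing;
    }
}
```

Keys that share a prefix, such as postcodes in the same district, share the nodes for that prefix, which is where the space saving would come from.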

Since you intend to use hashing, you can try numerical conversions of the strings based on their ASCII values.
The simplest idea is (where arr is the key's character array):
int sum = 0;
for (int i = 0; i < arr.length; i++) {
    sum += (int) arr[i];
}
Then hash "sum" using a well-defined hash function. You would choose a hash function based on the expected input patterns.
E.g. if you use the division method:
public int hasher(int sum) {
    return sum % PRIME; // PRIME: a prime number of your choosing
}
Selecting a prime number that is not close to an exact power of two improves performance and gives a more uniform distribution of hashed keys.
Another method is to weight the characters based on their position.
E.g. with the method above, both "abc" and "cab" hash to the same location. If you need them stored in two distinct locations, give weights to the positions, as positional number systems do.
int sum = 0;
int weight = 1;
for (int i = 0; i < arr.length; i++) {
    sum += (int) arr[i] * weight;
    weight = weight * 2; // using powers of 2 gives better results. (you know why :))
}
As your sample is quite large, you'd resolve collisions with a chaining mechanism rather than with a probe sequence.
After all, which method you choose depends entirely on the nature of your application.
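The two sums above can be put side by side in a short sketch. The class and method names are made up, and the prime passed to bucket is whatever the caller picks.

```java
// Hypothetical helpers comparing the plain sum with the position-weighted sum.
final class HashDemo {
    // Plain sum of character codes: any permutation of the same letters collides.
    static int additive(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // Each position gets weight 1, 2, 4, ... so character order matters.
    static int weighted(String s) {
        int sum = 0;
        int weight = 1;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i) * weight;
            weight = weight * 2;
        }
        return sum;
    }

    // Division method; floorMod keeps the bucket non-negative even if sum overflowed.
    static int bucket(int sum, int prime) {
        return Math.floorMod(sum, prime);
    }
}
```

For "abc" and "cab" the plain sum is 294 in both cases, while the weighted sums are 689 and 685, so only the weighted version tells them apart.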

The problem is the memory overhead of objects, but with some tricks you can try to implement your own hash set. Something like this. As others said, strings have quite a large overhead, so you need to "compress" them somehow. Also try not to use too many arrays (lists) in the hash table (if you build a chaining hash table), as they are also objects and also carry overhead. Better yet, use an open-addressing hash table.
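A minimal sketch of an open-addressing table over primitives, assuming the keys have already been packed into longs. The class name LongIntOpenMap, the fixed capacity and the linear probing are choices made for this example; there is no resizing or removal.

```java
// Hypothetical fixed-capacity long -> int map using open addressing.
final class LongIntOpenMap {
    private final long[] keys;
    private final int[] values;
    private final boolean[] used;
    private int size;

    LongIntOpenMap(int capacity) {
        if (capacity < 2) {
            throw new IllegalArgumentException("capacity must be at least 2");
        }
        keys = new long[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
    }

    // Finds the slot holding the key, or the first free slot on its probe path.
    private int slot(long key) {
        int i = Math.floorMod(Long.hashCode(key), keys.length);
        while (used[i] && keys[i] != key) {
            i = (i + 1) % keys.length; // linear probing
        }
        return i;
    }

    void put(long key, int value) {
        int i = slot(key);
        if (!used[i]) {
            // Always keep one slot free so that probing terminates.
            if (size + 1 >= keys.length) {
                throw new IllegalStateException("table full");
            }
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    int get(long key, int missing) {
        int i = slot(key);
        return used[i] ? values[i] : missing;
    }

    int size() {
        return size;
    }
}
```

All entries live in three flat arrays, so there is no per-entry object at all. In practice you would size the table well above the number of entries so that probe sequences stay short.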
