Best way to store a table of data

Best way to store a table of data - java

I have a table of data (the number of columns can vary in length on different rows). I also need to be able to delete or add new rows of data.
What is the best way to store this data?
My first guess would be an ArrayList.

Two approaches:
Convert everything to strings and use an ArrayList<List<String>> where each entry is a row represented by an ArrayList<String>.
Advantage: Don't need to create your own class to represent a "row".
Disadvantage: Need to convert data, can't do mathematical operations without converting data back, need to make sure all rows are the same length.
As dystroy said, create a class representing a Row in the table, and use an ArrayList<Row>
Advantage: entries keep their actual types, rows don't have variable lengths (unless you want them to), and you can have meaningful ways to access columns (e.g. row.getDate() instead of row.get(3) ).
Disadvantage: might be more work to create the additional class.

I'd choose LinkedList especially if you expect your list to work as a Stack.
Main Drawback of ArrayList is that this one recreates a larger table when capacity is reached => table allocation and copy gets performance slower.
Whereas with LinkedList, there is no concept of capacity since all works by pointers.
According to me, the main (and probably unique in mostly cases...) reason to prefer ArrayList rather than LinkedList is when you mainly want to access (read part so) a particular index. With ArrayList it is O(1), whereas with LinkedList it is O(n).
You can read this post for more information :
When to use LinkedList over ArrayList?

Related

What are the benefits of using Map over ArrayList of costume class

I am learning Java now and I am learning about different kinds of collections, so far I learned about LinkedList, ArrayList and Array[].
Now I've been introduced to Hash types of collections, HashSet and HashMap, and I didn't quite understand why there are useful, because the list of commands that they support is quietly limited, also, they are sorted in a random order and I need to Override the equal and HashKey methods in order to make it work right with class.
Now, what I don't understand is the benefits over the hassle of using these types instead of ArrayList of a costume class.
I mean, what Map is doing is connecting 2 objects as 1, but wouldn't it just be better to create a class that contains this 2 objects as parameters, and have getters to modify and use them?
If the benefit is that this Hash objects can only contain 1 object of the same name, wouldn't it just be easier to make the ArrayList check that the type is not already there before adding it?
So far I learned to choose when to use LinkedList, ArrayList or Array[] by the rule of "if it's really simple, use Array[], if it's a bit more complex use ArrayList (for example to hold collection of certain class), and if the list is dynamic with a lot of objects inside that need to change order according to removing or adding a new one in the middle or go back and forth within the list then use LinkedList.
But I couldn't understand when to prefer HashMap or HashSet, and I would be really glad if you could explain it to me.

Let me help you out here...
Hashed collections are the most efficient to add, search and remove data, since they hash the key (in HashMap) or the element (in HashSet) to find the place where they belong in a single step.
The concept of hashing is really simple. It is the process of representing an object as a number that can work as it´s id.
For example, if you have a string in Java like String name = "Jeremy";, and you print its hashcode: System.out.println(name.hashCode());, you will see a big number there (-2079637766), that was created using that string object values (in this string object, it's characters), that way, that number can be used as an Id for that object.
So the Hashed collections like the ones mentioned above, use this number to use it as an array index to find the elements in no-time. But obviously is too big to use it as an array index for a possible small array. So they need to reduce that number so it fits in the range of the array size. (HashMap and HashSet use arrays to store their elements).
The operation that they use to reduce that number is called hashing, and is something like this: Math.abs(-2079637766 % arrayLength);.
It's not like that exactly, it's a bit more complex, but this is to simplify.
Let's say that arrayLength = 16;
The % operator will reduce that big number to a number smaller than 16, so that it can be fit in the array.
That is why a Hashed collection will not allow duplicate, because if you try to add the same object or an equivalent one (like 2 strings with the same characters), it will produce the same hashcode and will override whatever value is in the result index.
In your question, you mentioned that if you are worried about duplicates items in an ArrayList, we can just check if the item is there before inserting it, so this way we don't need to use a HashSet. But that is not a good idea, because if you call the method list.contains(elem); in an ArrayList, it needs to go one by one comparing the elements to see if it's there. If you have 1 million elements in the ArrayList, and you check if an element is there, but it is not there, the ArrayList iterated over 1 million elements, that is not good. But with a HashSet, it would only hashed the object and go directly where it is supposed to be in the array and check, doing it in just 1 step, instead of 1 million. So you see how efficient a HashSet is compared to an ArrayList.
The same happens with a HashMap of size 1 million, that it will only take 1 single step to check if a key is there, and not 1 million.
The same thing happens when you need to add, find and remove an element, with the hashed collections it will do all that in a single step (constant time, doesn't depend on the size of the map), but that varies for other structures.
That's why it is really efficient and widely used.
Main Difference between an ArrayList and a LinkedList:
If you want to find the element at place 500 in an ArrayList of size 1000, you do: list.get(500); and it will do that in a single step, because an ArrayList is implemented with an array, so with that 500, it goes directly where the element is in the array.
But a LinkedList is not implemented with an array, but with objects pointing to each other. This way, they need to go linearly and counting from 0, one by one until they get to the 500, which is not really efficient compared to the 1 single step of the ArrayList.
But when you need to add and remove elements in an ArrayList, sometimes the Array will need to be recreated so more elements fit in it, increasing the overhead.
But that doesn't happen with the LinkedList, since no array has to be recreated, only the objects (nodes) have to be re-referenced, which is done in a single step.
So an ArrayList is good when you won't be deleting or adding a lot of elements on the structure, but you are going to read a lot from it.
If you are going to add and remove a lot of elements, then is better a linked list since it has less work to do with those operations.
Why you need to implement the equals(), hashCode() methods for user-defined classes when you want to use those objects in HashMaps, and implement Comparable interface when you want to use those objects with TreeMaps?
Based on what I mentioned earlier for HashMaps, is possible that 2 different objects produce the same hash, if that happens, Java will not override the previous one or remove it, but it will keep them both in the same index. That is why you need to implement hashCode(), so you make sure that your objects will not have a really simple hashCode that can be easily duplicated.
And the reason why is recommended to override the equals() method is that if there is a collision (2 or more objects sharing the same hash in a HashMap), then how do you tell them apart? Well, asking the equals() method of those 2 objects if they are the same. So if you ask the map if it contains a certain key, and in that index, it finds 3 elements, it asks the equals() methods of those elements if its equals() to the key that was passed, if so, it returns that one. If you don't override the equals() method properly and specify what things you want to check for equality (like the properties name, age, etc.), then some unwanted overrides inside the HashMap will happen and you will not like it.
If you create your own classes, say, Person, and has properties like name, age, lastName and email, you can use those properties in the equals() method and if 2 different objects are passed but have the same values in your selected properties for equality, then you return true to indicate that they are the same, or false otherwise. Like the class String, that if you do s1.equals(s2); if s1 = new String("John"); and s2 = new String("John");, even though they are different objects in Java Heap Memory, the implementation of String.equals method uses the characters to determine if the objects are equals, and it returns true for this example.
To use a TreeMap with user-defined classes, you need to implement the Comparable interface, since the TreeMap will compare and sort the objects based on some properties, you need to specify by which properties your objects will be sorted. Will your objects be sorted by age? By name? By id? Or by any other property that you would like. Then, when you implement the Comparable interface and override the compareTo(UserDefinedClass o) method, you do your logic and return a positive number if the current object is greater than the o object passed, 0 if they are the same and a negative number if the current object is smaller. That way, the TreeMap will know how to sort them, based on the number returned.

First HashSet. In HashSet, you can easily get whether it contains given element. Let's have a set of people in your class and you want to ask whether a guy is in your class. You can make an array list of strings. And if you want to ask if a guy is in your class, you have to iterate through whole the list until you find him, which might be too slow for longer lists. If you use HashSet instead, the operation is much faster. You calculate the hash of the searched string and then you go directly to the hash, so you don't need to pass so many elements to answer your question. Well, you can also make a workaround to make the ArrayList faster to access for this purpose but this is already prepared.
And now HashMap. Now imagine that you also want to store a score for each person. So now you can use HashMap. You enter the name and you get his score in a short time, without the need of iterating through whole the data structure.
Does it make sense?

Concerning your question:
"But I couldn't understand when to prefer HashMap or HashSet, and I
would be really glad if you could explain it to me"
The HashMap implement the Map interface, to be used for mapping a Key (K) to a value (V) in constant time, and where order doesn't matter, so you can put and retrieve those data efficiently if you now the key.
And HashSet implement the Set interface, but is internanly using and HashMap, its role is to be used as a Set, meaning you're not supposed to retrieve an element, you just check that is in the set or not (mostly).
In HashMap, you can have identical value, while you can't in a Set (because its a property of a Set).
Concerning this question :
If the benefit is that this Hash objects can only contain 1 object of the same name, >wouldn't it just be easier to make the ArrayList check that the type is not already >there before adding it?
When dealing with collection, you have may base you choice of a particular one on the data representation but also on the way you want to access and store those data, how do you access it ? Do you need to sort them ? Because each implemenation may have different complexity (https://en.wikipedia.org/wiki/Time_complexity), it become important.
Using the doc,
For ArrayList:
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking).
For HashMap:
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
So it's about the time complexity.
You may choose even more untypical collection for certain problems :).

This has little to do with Java specifically, and the choice depends mostly on performance requirements, but there's a fundamental difference that must be highlighted. Conceptually, Lists are types of collections that keep the order of insertion and may have duplicates, Sets are more like bags of items that have no specific order and no duplicates. Of course, different implementations may find a way around it (like a TreeSet).
First, let's check the difference between ArrayList and LinkedList. A linked list is a set of nodes, where each node contains a value and a link to the next and previous nodes. This makes inserting an element to a linked list a matter of appending a node to the end of the list, which is a quick operation since the memory does not have to be contiguous, as long as a node keeps a reference to the next node. On the other side, accessing a specific element requires transversing the entire list until finding it.
An array list, as the name implies, wraps an array. Accessing elements in an array by using its index is direct access, but inserting an element implies resizing the array to include the new element, so the memory it occupies is contiguous, making writes a bit heavier in this case.
A HashMap works like a dictionary, where for each key there's a value. The behavior of the insertion will mostly depend on how the hashCode and equals functions of the object used as a key are implemented. If the hashCode of two keys is the same, there's a hash collision, so equals will be used to understand if it's the same key or not. If equals is the same, then it's the same key, so the value is replaced. If not, the new value is added to the collection. Accessing and Writing values depends mostly on calculating the hash of the key followed by direct access to the value, making both operations really quick, O(1).
A set is pretty much like a hash map, without the "values" part, thus, it follows the same rules regarding the implementation of hashCode and equals operations for the added value.
It might be handy to study a bit about the Big-O notation and complexity of algorithms. If you are starting with Java, I'd strongly recommend the book Effective Java, by Joshua Bloch.
Hope it helps you dig further.

Copying an array or forcing an initial size on ArrayList. What is more efficient?

I'm trying to make a mutable list (list in a non programming sense) of values. The list is to be a predecided length n.
My first choice was to use an array initialized to n. But since an array isn't mutable, the only choice I have if I want to add another value is to copy the array into an array of bigger size. I decided to try using an ArrayList instead.
Then the trade off is that I can't set the ArrayList to an initial size n. However, I can add values to the ArrayList, since it's mutable. So I'm wondering whether it would be more efficient to create an array then copy it into a bigger array when needed, or to make an ArrayList and 'initialize' all the n values with a for loop.
What would be better to use?

Have a look at the implementation of ArrayList is does exactly what you suggest. It simply creates a bigger array whenever needed and copies all the entries form the small array.
This is quite efficient as long as the copy task must only be performed rarely and on not too big arrays. This is why you should always try to create an ArrayList with a size parameter that is about the size you expect that the list will grow to (or a bit bigger). This way you can minimize (or maybe avoid) the replacement of the internal array by a bigger one and all the effort it takes (the array copy).
P.s.
If your data is of an elementar data type you might want to avoid the overhead (memory and speed) of boxing into wrapper types. In this case ArrayList is not suitable for you and you might either implement a similar mechanism for the elementar data type you need it for or have a look at the www. I bed there are plenty implementations like this around there.
P.p.s
If you are unsure about the performance of different approaches, it is always a good idea to implement both and profile the performance for test data that are as close to your real data as possible.

Java: Wrapping objects in some type of collection to store duplicates in a set

I want to make a set of some type of collection (not sure which one yet) as a way of "storing duplicates" in a set. For example if I wanted to add the integer 5 with 39 additional copies I could put it into an arraylist at index 39. Thus if I were to get the size of the arraylist, I would know how many copies of 5 existed within the set.
There are a few other ways I could implement this but I have yet to decide on one. The main issue I'm having with implementing this is that I'm not sure how I can "dynamically" make arraylists (or whatever collection I may end up using) so that whenever someone were to call mySet.add(object), the object is first inserted into a unique arraylist then into the set itself.
Can anyone give me some ideas on how I could approach this?
EDIT:
Sorry I should have been more clear in my question. The point of the code that I'm writing is that we have a set-like collection that allows duplicates. And yes some of the associated methods will be re-written/will have to be re-written. Also my code should be written under the assumption that we do not know what type of object is being inserted(only one data type per set though) nor how many instances of the same object will be added nor how many different unique objects will be added.

I would rather go for using a Map like
HashMap list <Object, Integer>
where Object is the Object that you want to count and Integer is the count

You could try guava's MultiSet, I think it's what you want.
It can store the count of each object. What you need to do is just
multiSet.put(object);
And if it is put for the first time, like you said, a new list will be created, or its count will added by one.

Guava Table vs. 2D array

What is the difference between using a Guava Table implementation and a 2D array if I know the size of the array beforehand?
Is one more efficient than the other? How?
Would it make a difference in running time?

The most obvious and crucial difference is that an array is always indexed, with ints, whereas a Table can be indexed with arbitrary objects.
Consider the Table Example from the Guava site:
Table<Vertex, Vertex, Double> weightedGraph = HashBasedTable.create();
weightedGraph.put(v1, v2, 4);
...
The indexing here happens via Vertex objects. If you wanted to do the same with an array, you would have to add some getIndex method to the Vertex class, and accesss the array like this
double array[][] = new double[4][5];
array[v1.getIndex()][v2.getIndex()] = 4;
This is inconvenient and particularly hard to maintain - especially, when the indices may change (or when vertices have to be added/removed, although you mentioned that this is not the case for you).
Additionally, the Guava table allows obtaining rows or columns as separate entities. In a 2D array, you can always access either one row or one column of the array - depending on how you interpret the 2 dimensions of the array. The table allows accessing both, each in form of a Map.
Concerning the performance: There will be cases where you'll have a noticable difference in performance. Particularly when you have a large 2D array of primitive types (int, double etc), and the alternative would be a large table with the corresponding reference types (Integer, Double etc). But again, this will only be noticable when the array/table is really large.

Additionally to what Marco13 said, read this: https://stackoverflow.com/a/6105705/1273080.
Collections are better than object arrays in basically every way
imaginable.
The same applies here. A 2D array is a low-level tool that might be needed when you need some high-performance structure for primitives. However, arrays have no meaningful methods, no behaviour, no nothing, and therefore are usually an underlying data structure in classes that add some behaviour to them. Do this with a 2D fixed-size array and you'll end up with ... a Guava-esque Table.
Also, a Table can be a 2D-array, or a Map<R, Map<C, V>>, in the future we might also have resizable Table implementations - all within one interface.
Regarding the performance - you should almost always go for the more high-level approach to get code as readable and clear as possible, then measure the performance and only if it's problematic, go for different approaches.

Randomly getting elements in a HashMap or HashSet without looping

I have roughly 420,000 elements that I need to store easily in a Set or List of some kind. The restrictions though is that I need to be able to pick a random element and that it needs to be fast.
Initially I used an ArrayList and a LinkedList, however with that many elements it was very slow. When I profiled it, I saw that the equals() method in the object I was storing was called roughly 21 million times in a very short period of time.
Next I tried a HashSet. What I gain in performance I loose in functionality: I can't pick a random element. HashSet is backed by a HashMap which is backed by an array of HashMap.Entry objects. However when I attempted to expose them I was hindered by the crazy private and package-private visibility of the entire Java Collections Framework (even copying and pasting the class didn't work, the JCF is very "Use what we have or roll your own").
What is the best way to randomly select an element stored in a HashSet or HashMap? Due to the size of the collection I would prefer not to use looping.
IMPORTANT EDIT: I forgot a really important detail: exactly how I use the Collection. I populate the entire Collection at the begging of the table. During the program I pick and remove a random element, then pick and remove a few more known elements, then repeat. The constant lookup and changing is what causes the slowness

There's no reason why an ArrayList or a LinkedList would need to call equals()... although you don't want a LinkedList here as you want quick random access by index.
An ArrayList should be ideal - create it with an appropriate capacity, add all the items to it, and then you can just repeatedly pick a random number in the appropriate range, and call get(index) to get the relevant value.
HashMap and HashSet simply aren't suitable for this.

If ALL you need to do is get a large collection of values and pick a random one, then ArrayList is (literally) perfect for your needs. You won't get significantly faster (unless you went directly to primitive array, where you lose benefits of abstraction.)
If this is too slow for you, it's because you're using other operations as well. If you update your question with ALL the operations the collection must service, you'll get a better answer.

If you don't call contains() (which will call equals() many times), you can use ArrayList.get(randomNumber) and that will be O(1)
You can't do it with a HashMap - it stores the objects internally in an array, where the index = hashcode for the object. Even if you had that table, you'd need to guess which buckets contain objects. So a HashMap is not an option for random access.

Assuming that equals() calls are because you sort out duplicates with contains(), you may want to keep both a HashSet (for quick if-already-present lookup) and an ArrayList (for quick random access). Or, if operations don't interleave, build a HashSet first, then extract its data with toArray() or transform it into ArrayList with constructor of the latter.
If your problems are due to remove() call on ArrayList, don't use it and instead:
if you remove not the last element, just replace (with set()) the removed element with the last;
shrink the list size by 1.
This will of course screw up element order, but apparently you don't need it, judging by description. Or did you omit another important detail?

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.