I am learning Java and am currently studying different kinds of collections; so far I have learned about LinkedList, ArrayList and plain arrays (Array[]).
Now I've been introduced to the hash types of collections, HashSet and HashMap, and I didn't quite understand why they are useful: the list of operations they support is quite limited, they are stored in a seemingly random order, and I need to override the equals() and hashCode() methods to make them work correctly with my classes.
Now, what I don't understand is the benefit, over the hassle, of using these types instead of an ArrayList of a custom class.
I mean, what a Map does is connect 2 objects as 1, but wouldn't it just be better to create a class that contains these 2 objects as fields, with getters to use and modify them?
If the benefit is that these hash objects can only contain 1 object with the same name, wouldn't it just be easier to make the ArrayList check that the item is not already there before adding it?
So far I have learned to choose between LinkedList, ArrayList and Array[] by the rule of: "if it's really simple, use Array[]; if it's a bit more complex, use ArrayList (for example to hold a collection of a certain class); and if the list is dynamic, with a lot of objects inside that need to change order when one is removed or added in the middle, or you need to move back and forth within the list, then use LinkedList."
But I couldn't understand when to prefer HashMap or HashSet, and I would be really glad if you could explain it to me.
Let me help you out here...
Hashed collections are the most efficient for adding, searching and removing data, since they hash the key (in a HashMap) or the element (in a HashSet) to find the place where it belongs in a single step.
The concept of hashing is really simple: it is the process of representing an object as a number that can work as its id.
For example, if you have a string in Java like String name = "Jeremy"; and you print its hash code with System.out.println(name.hashCode());, you will see a big number (-2079637766). It was computed from that string object's values (for a string, its characters), so that number can be used as an id for that object.
The hashed collections mentioned above use this number as an array index to find the elements almost instantly (HashMap and HashSet use arrays internally to store their elements). But that number is obviously far too big to be used directly as an index into a small array, so they need to reduce it so that it fits within the range of the array size.
The operation they use to reduce that number is part of the hashing process, and it looks something like Math.abs(-2079637766 % arrayLength);.
It's not exactly like that (the real implementation is a bit more involved), but this keeps things simple.
Let's say that arrayLength = 16;
The % operator will reduce that big number to a number smaller than 16, so it fits in the array.
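Putting those two steps together, here is a simplified sketch of the idea (the real HashMap also mixes the bits of the hash before computing the index, so this is only an illustration, not HashMap's actual code):
int hash = "Jeremy".hashCode();           // -2079637766, as above
int arrayLength = 16;                     // the default initial capacity of a HashMap
int index = Math.abs(hash % arrayLength); // simplified: squeeze the hash into 0..15
System.out.println(index);                // the bucket where "Jeremy" would land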
That is why a hashed collection will not allow duplicates: if you try to add the same object, or an equivalent one (like 2 strings with the same characters), it will produce the same hashcode and end up at the same index, so the existing entry is simply replaced (in a HashMap) or the add is ignored (in a HashSet).
In your question, you mentioned that if you are worried about duplicate items in an ArrayList, you can just check whether the item is already there before inserting it, so that you don't need a HashSet. But that is not a good idea, because when you call list.contains(elem); on an ArrayList, it has to go one by one comparing the elements to see whether it's there. If you have 1 million elements in the ArrayList and you check for an element that is not there, the ArrayList iterates over all 1 million elements, which is not good. A HashSet, on the other hand, just hashes the object, goes directly to where it is supposed to be in the array and checks there, doing in 1 step what took 1 million. So you can see how efficient a HashSet is compared to an ArrayList for this.
The same happens with a HashMap holding 1 million entries: it only takes a single step to check whether a key is there, not 1 million.
The same thing happens when you need to add, find or remove an element: the hashed collections do all of that in a single step (constant time, independent of the size of the map), whereas the cost varies for other structures.
That's why hashed collections are really efficient and widely used.
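To make the difference concrete, here is a small sketch (the element values are made up):
List<String> list = new ArrayList<>();
Set<String> set = new HashSet<>();
for (int i = 0; i < 1_000_000; i++) {
    list.add("user" + i);
    set.add("user" + i);
}
list.contains("not-there"); // compares against up to 1,000,000 elements, one by one
set.contains("not-there");  // hashes the string and checks a single bucket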
Main Difference between an ArrayList and a LinkedList:
If you want to find the element at position 500 in an ArrayList of size 1000, you call list.get(500); and it does that in a single step, because an ArrayList is implemented with an array, so with that 500 it goes directly to where the element is in the array.
But a LinkedList is not implemented with an array; it is made of objects (nodes) pointing to each other. To reach element 500 it has to walk linearly, counting from 0 one node at a time, which is not efficient compared to the single step of the ArrayList.
But when you need to add and remove elements in an ArrayList, the backing array sometimes has to be recreated so that more elements fit in it, which adds overhead.
But that doesn't happen with the LinkedList, since no array has to be recreated, only the objects (nodes) have to be re-referenced, which is done in a single step.
So an ArrayList is good when you won't be adding or deleting a lot of elements, but you are going to read a lot from it.
If you are going to add and remove a lot of elements, then a LinkedList is better, since it has less work to do for those operations.
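A tiny illustration of both points (the element values are arbitrary):
List<Integer> arrayList = new ArrayList<>();
List<Integer> linkedList = new LinkedList<>();
for (int i = 0; i < 1000; i++) { arrayList.add(i); linkedList.add(i); }

arrayList.get(500);    // jumps straight to index 500 of the backing array
linkedList.get(500);   // walks node by node from the head until it reaches 500

arrayList.add(0, -1);  // shifts all 1000 existing elements one position to the right
linkedList.add(0, -1); // only re-links the first node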
Why do you need to implement the equals() and hashCode() methods for user-defined classes when you want to use those objects in HashMaps, and implement the Comparable interface when you want to use them with TreeMaps?
Based on what I mentioned earlier about HashMaps, it is possible for 2 different objects to produce the same hash. If that happens, Java will not override the previous one or remove it; it keeps them both at the same index. That is why you need to implement hashCode(): so your objects don't end up with a really simple hash code that collides easily.
And the reason why it is recommended to override the equals() method is that when there is a collision (2 or more objects sharing the same hash in a HashMap), you need a way to tell them apart: you ask the equals() method of those objects whether they are the same. So if you ask the map whether it contains a certain key and it finds 3 elements at that index, it calls equals() on each of them against the key that was passed and returns the one that matches. If you don't override equals() properly and specify which properties you want to check for equality (like name, age, etc.), then unwanted overwrites will happen inside the HashMap, and you will not like it.
If you create your own class, say Person, with properties like name, age, lastName and email, you can use those properties in the equals() method: if 2 different objects are passed but have the same values in the properties you selected for equality, you return true to indicate they are the same, or false otherwise. This is just like the String class: if you do s1.equals(s2); with s1 = new String("John"); and s2 = new String("John");, even though they are different objects on the Java heap, String's equals() implementation uses the characters to determine whether the objects are equal, and it returns true for this example.
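A minimal sketch of such a Person class (which fields exist, and which ones count for equality, are just an example):
import java.util.Objects;

class Person {
    private final String name;
    private final String lastName;
    private final int age;
    private final String email;

    Person(String name, String lastName, int age, String email) {
        this.name = name;
        this.lastName = lastName;
        this.age = age;
        this.email = email;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Person)) return false;
        Person other = (Person) o;
        return age == other.age
                && Objects.equals(name, other.name)
                && Objects.equals(lastName, other.lastName)
                && Objects.equals(email, other.email);
    }

    @Override
    public int hashCode() {
        // uses the same fields as equals(), so equal objects land in the same bucket
        return Objects.hash(name, lastName, age, email);
    }
}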
To use a TreeMap with user-defined classes, you need to implement the Comparable interface, since the TreeMap compares and sorts the objects based on some property, and you have to specify which property your objects will be sorted by. Will they be sorted by age? By name? By id? Or by any other property you like? When you implement the Comparable interface and override the compareTo(UserDefinedClass o) method, you write your logic and return a positive number if the current object is greater than the o object passed in, 0 if they are the same, and a negative number if the current object is smaller. That way the TreeMap knows how to sort them, based on the number returned.
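For example, if you decide that the Person class sketched above should be sorted by age in a TreeMap (the choice of age is purely illustrative), you would declare it as implements Comparable<Person> and add:
@Override
public int compareTo(Person other) {
    // negative if this person is younger, 0 if the same age, positive if older
    return Integer.compare(this.age, other.age);
}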
First, HashSet. With a HashSet you can easily find out whether it contains a given element. Say you have the set of people in your class and you want to ask whether a particular guy is in it. You could keep an ArrayList of strings, but then to answer that question you have to iterate through the whole list until you find him, which can be too slow for long lists. If you use a HashSet instead, the operation is much faster: you calculate the hash of the searched string and go directly to that hash, so you don't have to pass through nearly as many elements to answer your question. Sure, you could build a workaround to make the ArrayList faster for this purpose, but a HashSet already gives it to you.
And now HashMap. Imagine that you also want to store a score for each person. Now you can use a HashMap: you enter the name and you get the score back quickly, without iterating through the whole data structure.
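In code, that could look like this (the names and scores are made up):
Set<String> classmates = new HashSet<>();
classmates.add("Alice");
classmates.add("Bob");
boolean known = classmates.contains("Charlie"); // false, answered without scanning a list

Map<String, Integer> scores = new HashMap<>();
scores.put("Alice", 95);
scores.put("Bob", 87);
int aliceScore = scores.get("Alice"); // 95, looked up directly from the key's hash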
Does it make sense?
Concerning your question:
"But I couldn't understand when to prefer HashMap or HashSet, and I
would be really glad if you could explain it to me"
HashMap implements the Map interface; it maps a key (K) to a value (V) in constant time, and order doesn't matter, so you can put and retrieve data efficiently if you know the key.
HashSet implements the Set interface, but internally it uses a HashMap. Its role is to be used as a Set, meaning you're not really supposed to retrieve an element; you mostly just check whether it is in the set or not.
In a HashMap you can have identical values (though not identical keys), while in a Set you can't have identical elements at all (that's a defining property of a Set).
Concerning this question:
"If the benefit is that these hash objects can only contain 1 object with the same name, wouldn't it just be easier to make the ArrayList check that the item is not already there before adding it?"
When dealing with collections, you may base your choice of a particular one on the data representation, but also on the way you want to access and store that data: how do you access it? Do you need to sort it? Because each implementation may have a different time complexity (https://en.wikipedia.org/wiki/Time_complexity), this becomes important.
Using the doc,
For ArrayList:
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking).
For HashMap:
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iteration over collection views requires time proportional to the "capacity" of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
So it's about the time complexity.
You may even choose a more untypical collection for certain problems :).
This has little to do with Java specifically, and the choice depends mostly on performance requirements, but there's a fundamental difference that must be highlighted. Conceptually, Lists are collections that keep the order of insertion and may contain duplicates, while Sets are more like bags of items with no specific order and no duplicates. Of course, particular implementations may work around parts of that (a TreeSet, for example, keeps its elements in sorted order).
First, let's check the difference between ArrayList and LinkedList. A linked list is a chain of nodes, where each node contains a value and a link to the next and previous nodes. This makes inserting an element into a linked list a matter of appending a node to the end, which is a quick operation since the memory does not have to be contiguous, as long as each node keeps a reference to the next one. On the other hand, accessing a specific element requires traversing the list until you find it.
An array list, as the name implies, wraps an array. Accessing an element by its index is direct access, but inserting an element may require resizing the array so that the memory it occupies stays contiguous and the new element fits, making writes a bit heavier in this case.
A HashMap works like a dictionary, where for each key there's a value. The behavior of an insertion depends mostly on how the hashCode and equals functions of the object used as the key are implemented. If the hashCode of two keys is the same, there's a hash collision, so equals is used to work out whether it is actually the same key or not. If equals says they are the same, it is the same key, so the value is replaced; if not, the new entry is added to the collection. Accessing and writing values mostly comes down to calculating the hash of the key followed by direct access to the value, making both operations really quick, O(1).
A set is pretty much like a hash map without the "values" part, so it follows the same rules regarding the implementation of hashCode and equals for the added elements.
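A quick illustration of that replacement rule for keys that are equal:
Map<String, Integer> map = new HashMap<>();
map.put("John", 1);
map.put("John", 2);                  // equal key, so the value 1 is replaced
System.out.println(map.get("John")); // 2
System.out.println(map.size());      // 1

Set<String> set = new HashSet<>();
set.add("John");
set.add("John");                     // duplicate element, the second add is ignored
System.out.println(set.size());      // 1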
It might be handy to study a bit about the Big-O notation and complexity of algorithms. If you are starting with Java, I'd strongly recommend the book Effective Java, by Joshua Bloch.
Hope it helps you dig further.
I've got Treasure objects in a TreasureCollectionDB class.
The TreasureCollectionDB class has a Map<Long, Treasure> called treasures (the Long being an id generated by the TreasureCollectionDB)
and a second data collection/list (the available treasures).
What I need the other collection or list to do is hold Treasures that I will add/remove through JSP pages. The Treasures in this list should be unique, but sorted alphabetically (if there's no data holder that does this by itself, I will write a sort method).
Anyone know what data holder I should use? Answers on the internet are confusing me as to which is most suitable.
You may use a TreeSet; that should give you the desired results.
A Set doesn't allow duplicates, and the tree maintains sorted order.
Treasure may implement the Comparable interface so that you can sort on the desired field(s).
You would need to create equals and hashCode methods and also write a Comparator. You can pass that Comparator to a TreeSet and use it through the SortedSet interface.
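A minimal sketch of that, assuming Treasure has a getName() accessor to sort on (the getter is an assumption, adjust it to your real field):
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

Comparator<Treasure> byName = Comparator.comparing(Treasure::getName); // assumes getName() exists
SortedSet<Treasure> availableTreasures = new TreeSet<>(byName);
availableTreasures.add(treasure); // the set stays unique and alphabetically sorted by name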
I am considering which Java collection would work best for random insertions. I will be inserting a lot and only reading the collection once at the end.
My desired functionality is adding an element at a specified index, anywhere between 0 and the current length. Which collection would be the most efficient to use?
Useful link for your reference:
http://www.coderfriendly.com/wp-content/uploads/2009/05/java_collections_v2.pdf
I'm not entirely sure how you will be reading the information after inserting it (or how important that is to you). A HashMap or an ArrayList could make sense depending on what you are looking to do. Also not sure whether you need something thread-safe or not.
Hope it helps.
The inefficiency of using a List here is inherent to the problem: every time you add something, every subsequent element has to be re-indexed, as the javadoc states:
Shifts the element currently at that position (if any) and any subsequent elements to the right (adds one to their indices).
From your question/comments, it appears that you have a bunch of objects and you're sorting them as you go. A more efficient solution would be to write a Comparator (or make your object implement Comparable), and then use Collections.sort(list, comparator) (or Collections.sort(list)).
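A sketch of that approach, assuming the objects are sorted by some name field (the class and getter names are only examples):
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// given a List<MyObject> list that you filled while inserting:
Collections.sort(list, Comparator.comparing(MyObject::getName)); // sort with a Comparator
// or, if MyObject implements Comparable<MyObject>:
Collections.sort(list);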
You might point out that your objects are being sorted on the basis of other variables. In that case, you could create a wrapper (an extension of the object) with those other variables as fields and implementing Comparable, and with a method like getOriginal(). You add these wrapped objects to your list, sort it, and then iterate through the list, adding the original objects (from getOriginal()) to a new list.
For info on the sorting algorithm of collections - see this SO question
If I have a class that needs to return an array of strings of variable dimension (and that dimension could only be determined upon running some method of the class), how do I declare the dynamic array in my class' constructor?
If the question wasn't clear enough,
in PHP we can simply declare an array of strings as $my_string_array = array();
and add elements to it with $my_string_array[] = "New value";
What is the equivalent of the above code in Java?
You will want to look into the java.util package, specifically the ArrayList class. It has methods such as .add(), .remove(), .indexOf(), .contains(), .toArray() and more.
Plain Java arrays (i.e. String[] strings) cannot be resized dynamically; when you're out of room but still want to add elements to your array, you need to create a bigger one and copy the existing array into its first n positions.
Fortunately, there are java.util.List implementations that do this work for you. Both java.util.ArrayList and java.util.Vector are implemented using arrays.
But then, do you really care whether the strings happen to be stored internally in an array, or do you just need a collection that lets you keep adding items without worrying about running out of room? If the latter, you can pick any of the several general-purpose List implementations out there. Most of the time the choices are:
ArrayList - basic array based implementation, not synchronized
Vector - synchronized, array based implementation
LinkedList - Doubly linked list implementation, faster for inserting items in the middle of a list
Do you expect your list to have duplicate items? If duplicate items should never exist for your use case, then you should prefer a java.util.Set. Sets are guaranteed to not contain duplicate items. A good general-purpose set implementation is java.util.HashSet.
Answer to follow-up question
To access strings using a key, similar to $my_string_array["property"], you need to put them in a Map<String, String>, also in the java.util package. A good general-purpose Map implementation is HashMap.
Once you've created your map,
Use map.put("key", "string") to add strings
Use map.get("key") to access a string by its key.
Note that a java.util.Map cannot contain duplicate keys. If you call put repeatedly with the same key, only the value set in the latest call remains; the earlier ones are lost. But I'd guess this is also the behavior of PHP associative arrays, so it shouldn't be a surprise.
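For example (the key name is only illustrative):
Map<String, String> myStringMap = new HashMap<>();
myStringMap.put("property", "New value");     // like $my_string_array["property"] = "New value";
String value = myStringMap.get("property");   // "New value"
myStringMap.put("property", "Another value"); // same key, so "New value" is replaced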
Create a List instead.
List<String> l = new LinkedList<String>();
l.add("foo");
l.add("bar");
There is no dynamic array in Java; the length of an array is fixed.
The closest structure is ArrayList, which is backed by a real array underneath.
Hence the name ArrayList :)
Given a List of MyClass objects (and a custom Comparator myComparitor if needed), what good options are there for checking whether the List contains two "equal" objects?
Edit: if there are duplicates, return a reference to one or more of the duplicates.
Overriding MyClass.equals(MyClass) in this case is not an option.
My initial thought is to create a hash table of sorts, but I suspect that there's a non-hack way to accomplish the same thing:
SortedSet<MyClass> mySet = new TreeSet<>(myComparitor);
mySet.addAll(myList);
// Find duplicates in a sorted set in O(N) time
If the element's equals(Object) method does not give you the semantics you require, then HashMap and HashSet are not options. Your choices are:
Use a TreeMap for de-duping. This is O(NlogN).
Sort the ArrayList (or a copy), then iterate over it looking for an element i that equals element i + 1. This is O(NlogN).
Find an alternative implementation of hash sets that allows you to provide a separate object to implement equality and hashing. (Neither Apache nor Google collections support this, so you'll need to look further afield.)
Create a wrapper class for your element type that overrides equals(Object) and hashCode(), and de-dup using a HashSet of the wrapped objects. This is O(N), but the constant of proportionality will be larger than for a plain HashSet due to the creation of the wrapper objects.
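A rough sketch of that wrapper approach; the getKey() accessor is hypothetical and stands in for whatever field(s) actually define equality for your comparator:
import java.util.Objects;

class MyClassWrapper {
    private final MyClass original;

    MyClassWrapper(MyClass original) {
        this.original = original;
    }

    MyClass getOriginal() {
        return original;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MyClassWrapper)) return false;
        // compare on the same field(s) your comparator uses (getKey() is an assumption)
        return Objects.equals(original.getKey(), ((MyClassWrapper) o).original.getKey());
    }

    @Override
    public int hashCode() {
        return Objects.hash(original.getKey());
    }
}
Adding the wrappers to a HashSet and checking whether add() returns false then tells you, in O(N) overall, which elements are duplicates, and getOriginal() gives you back the references you wanted.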
When de-duping with a Set, it is probably better to use a loop rather than addAll. The loop is necessary if you need to know what all of the duplicates are; if you don't need to know that, the loop still lets you stop as soon as you find the first duplicate. The only case where addAll is likely to perform better is when there are likely to be no duplicates.
If you already have a sorted list, you can just compare each element with the next one; if they're the same, you have dups.
In your question you are using a TreeSet, which would have culled out duplicates already, so if you just need to know whether you have duplicates, just compare the size of mySet with the size of myList; if they aren't the same, you have dups.
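For example, using the mySet and myList from your question:
SortedSet<MyClass> mySet = new TreeSet<>(myComparitor);
mySet.addAll(myList);
if (mySet.size() != myList.size()) {
    // at least one element was a duplicate according to the comparator
}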