Why is the HashMap#resize implementation so complex?

After reading the source code of java.util.HashMap#resize, I'm confused by one part: the branch that handles a bin containing more than one node.
else { // preserve order
    Node<K,V> loHead = null, loTail = null;
    Node<K,V> hiHead = null, hiTail = null;
    Node<K,V> next;
    do {
        next = e.next;
        if ((e.hash & oldCap) == 0) {
            if (loTail == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
        }
        else {
            if (hiTail == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
        }
    } while ((e = next) != null);
    if (loTail != null) {
        loTail.next = null;
        newTab[j] = loHead;
    }
    if (hiTail != null) {
        hiTail.next = null;
        newTab[j + oldCap] = hiHead;
    }
}
This part seems unnecessary to me. Why not just use the code below?
newTab[e.hash & (newCap - 1)] = e;
I think it has the same effect.
So why does the else branch need so much code?

At resize, every bin is split into two separate bins. So if the bin contained several linked items, you cannot move all of them into a single target bin based on the hash of the first item: you must recheck every hash and distribute the items into the "hi" and "lo" bins, depending on the newly significant bit of the hash ((e.hash & oldCap) == 0). This was somewhat simpler in Java 7, before the introduction of tree bins, but the older algorithm could change the order of the items, which is not acceptable now.
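The lo/hi split can be illustrated with a small sketch. It is not the JDK code: the names ResizeSplitDemo, indexFor and split are made up for illustration, and plain int hashes stand in for nodes. It shows that entries sharing one bucket in a 16-slot table end up at index j or j + oldCap after doubling, so placing the whole chain by its first node would misplace the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ResizeSplitDemo {
    // Bucket index in a power-of-two table: the low bits of the hash.
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    // Splits the hashes of one old bucket into "lo" (index 0 of the result)
    // and "hi" (index 1 of the result), keeping their relative order.
    static List<List<Integer>> split(List<Integer> bucketHashes, int oldCap) {
        List<Integer> lo = new ArrayList<>();
        List<Integer> hi = new ArrayList<>();
        for (int h : bucketHashes) {
            if ((h & oldCap) == 0) {
                lo.add(h);
            } else {
                hi.add(h);
            }
        }
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(lo);
        parts.add(hi);
        return parts;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = 32;
        // All four hashes land in bucket 5 of a 16-slot table.
        List<Integer> bucket = Arrays.asList(5, 21, 37, 53);
        List<List<Integer>> parts = split(bucket, oldCap);
        System.out.println("lo = " + parts.get(0)); // [5, 37], stays at index 5
        System.out.println("hi = " + parts.get(1)); // [21, 53], moves to index 5 + 16
        for (int h : bucket) {
            System.out.println(h + " -> " + indexFor(h, newCap));
        }
    }
}
```

Because only one extra hash bit becomes significant when the capacity doubles, testing (hash & oldCap) is enough to pick between the two target buckets without recomputing the full index.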

EDIT:
The threshold for treeifying a bin changes as the table is made bigger. That's what it is doing.
I haven't read the entire file, but this could be a possible reason (line 220):
The use and transitions among plain vs tree modes is
complicated by the existence of subclass LinkedHashMap. See
below for hook methods defined to be invoked upon insertion,
removal and access that allow LinkedHashMap internals to
otherwise remain independent of these mechanics. (This also
requires that a map instance be passed to some utility methods
that may create new nodes.)

Related

Why does the HashMap split method need to check if (hiHead != null) before calling loHead.treeify(tab)?

When I read the source code of the HashMap split method, I found this piece of code:
final void split(HashMap<K,V> map, Node<K,V>[] tab, int index, int bit) {
    TreeNode<K,V> b = this;
    // Relink into lo and hi lists, preserving order
    TreeNode<K,V> loHead = null, loTail = null;
    TreeNode<K,V> hiHead = null, hiTail = null;
    ....
    if (loHead != null) {
        if (lc <= UNTREEIFY_THRESHOLD)
            tab[index] = loHead.untreeify(map);
        else {
            tab[index] = loHead;
            if (hiHead != null) // (else is already treeified)
                loHead.treeify(tab);
        }
    }
    .......
}
I can't understand why the check if (hiHead != null) is needed before calling loHead.treeify(tab). My understanding is that HashMap is meant for single-threaded use, so I can't see any relationship between hiHead and loHead. The official comment is "(else is already treeified)".

The input to the method is a tree bin to be split. If all entries in that bin fall into one or the other split sets (i.e. they all have the same value for the newly added hash bit), then the input tree is already correct. It can just be reused as-is by making its root the new bin value with the other being empty.
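The decision for the "lo" half can be modeled in a few lines. This is a simplified sketch, not the JDK implementation: the class TreeBinSplitModel, the Action enum and the method actionForLo are hypothetical, and the bin is represented only by its hash values. The constant 6 matches UNTREEIFY_THRESHOLD in the JDK 8 source.

```java
final class TreeBinSplitModel {
    // Same value as HashMap.UNTREEIFY_THRESHOLD in the JDK 8 source.
    static final int UNTREEIFY_THRESHOLD = 6;

    // Hypothetical labels for what split() does with the "lo" half.
    enum Action { EMPTY, UNTREEIFY, REUSE_EXISTING_TREE, TREEIFY }

    // Counts how many hashes fall into each half and mirrors the
    // if/else structure quoted in the question for the lo list.
    static Action actionForLo(int[] hashes, int bit) {
        int lc = 0;
        int hc = 0;
        for (int h : hashes) {
            if ((h & bit) == 0) {
                lc++;
            } else {
                hc++;
            }
        }
        if (lc == 0) {
            return Action.EMPTY;
        }
        if (lc <= UNTREEIFY_THRESHOLD) {
            return Action.UNTREEIFY;
        }
        // hc == 0 means every node stayed in lo, so the old tree is intact.
        return hc == 0 ? Action.REUSE_EXISTING_TREE : Action.TREEIFY;
    }

    public static void main(String[] args) {
        int[] allLo = {5, 37, 69, 101, 133, 165, 197, 229};
        int[] mixed = {5, 37, 69, 101, 133, 165, 197, 21, 53};
        System.out.println(actionForLo(allLo, 16));
        System.out.println(actionForLo(mixed, 16));
    }
}
```

When the hi list is empty, nothing was removed from the original tree, which is why rebuilding it would be wasted work.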

A question about the "transfer" method in HashMap (JDK 1.6)

The source code looks like this:
void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}
But I want to do it like this instead. Does it work, or is there a problem?
I think the hash values of all elements in the same list are the same, so we don't need to calculate the bucket index in the new table for each of them; the only thing we should do is transfer the head element to the new table, and this will save time.
Thank you for your answer.
void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (null != e) {
            src[j] = null;
            int i = indexFor(e.hash, newCapacity);
            newTable[i] = e;
        }
    }
}

I think you are asking about the HashMap class as implemented in Java 6, and (specifically) whether your idea for optimizing the internal transfer() method would work.
The answer is: no, it won't.
This transfer method is called when the main array has been resized, and its purpose is to transfer all existing hash entries to the correct hash chain in the resized table. Your code is simply moving entire chains. That won't work. The problem is that the hash chains typically hold entries with multiple different hashcodes. While a given group of entries all belonged on the same chain in the old version of the table, in the new version they probably won't.
In short, your proposed modification would "lose" some entries because they would be in the wrong place in the resized table.
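A small simulation makes the loss visible. This is a sketch under simplifying assumptions, not the Java 6 code: buckets are lists of int hashes instead of Entry chains, and the names WholeChainMoveDemo, moveWholeChains and contains are invented for illustration. Only indexFor follows the same hash & (length - 1) formula.

```java
import java.util.ArrayList;
import java.util.List;

final class WholeChainMoveDemo {
    static int indexFor(int hash, int length) {
        return hash & (length - 1);
    }

    static List<List<Integer>> emptyTable(int capacity) {
        List<List<Integer>> table = new ArrayList<>();
        for (int i = 0; i < capacity; i++) {
            table.add(new ArrayList<Integer>());
        }
        return table;
    }

    // The proposed shortcut: place each old chain as a unit in the
    // bucket chosen by the hash of its first element only.
    static List<List<Integer>> moveWholeChains(List<List<Integer>> oldTable, int newCapacity) {
        List<List<Integer>> newTable = emptyTable(newCapacity);
        for (List<Integer> chain : oldTable) {
            if (!chain.isEmpty()) {
                newTable.get(indexFor(chain.get(0), newCapacity)).addAll(chain);
            }
        }
        return newTable;
    }

    // A lookup only inspects the one bucket that the hash maps to.
    static boolean contains(List<List<Integer>> table, int hash) {
        return table.get(indexFor(hash, table.size())).contains(hash);
    }

    public static void main(String[] args) {
        List<List<Integer>> oldTable = emptyTable(16);
        // Hashes 5 and 21 differ, yet both map to bucket 5 of 16.
        oldTable.get(indexFor(5, 16)).add(5);
        oldTable.get(indexFor(21, 16)).add(21);
        List<List<Integer>> newTable = moveWholeChains(oldTable, 32);
        System.out.println("find 5:  " + contains(newTable, 5));
        System.out.println("find 21: " + contains(newTable, 21));
    }
}
```

After the move, hash 21 still sits in bucket 5, but a lookup in the 32-slot table searches bucket 21, so the entry is effectively lost.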
There is a meta-answer to this too. The standard Java classes were written and tested by a group of really smart software engineers. Since then, the code has probably been read by tens or hundreds of thousands of other smart people outside of Sun / Oracle. The probability that everyone else has missed a simple / obvious optimization like this is ... vanishingly small.
So if you do find something that looks to you like an optimization (or a bug) in the Java source code for Java SE, you are probably mistaken!

Should we localize scope variables at cost of multiple declarations

Effective Java strongly emphasizes minimizing the scope of local variables. But with if/else branches this can lead to multiple declarations, e.g.:
public List<E> midPoint() {
    if (first == null) {
        throw new NullPointerException("Linked list is empty");
    }
    if (first.next == null) {
        ArrayList<E> arr = new ArrayList<E>();
        arr.add(first.element);
        return arr;
    }
    Node<E> fast = first.next;
    Node<E> slow = first;
    while (fast != null && fast.next != null) {
        slow = slow.next;
        fast = fast.next.next;
    }
    // even count for number of nodes in linkedlist.
    if (fast != null) {
        ArrayList<E> arr = new ArrayList<E>();
        arr.add(slow.element);
        arr.add(slow.next.element);
        return arr;
    } else {
        ArrayList<E> arr = new ArrayList<E>();
        arr.add(slow.element);
        return arr;
    }
}
In the above code the ArrayList declaration occurs multiple times, but the variable's scope is localized. Is it good the way it is, or should the ArrayList be declared once at the top and returned wherever a condition matches? E.g.:
public List<E> midPoint() {
    if (first == null) {
        throw new NullPointerException("Linked list is empty");
    }
    ArrayList<E> arr = new ArrayList<E>(); // NOTE - JUST A SINGLE DECLARATION.
    if (first.next == null) {
        arr.add(first.element);
        return arr;
    }
    Node<E> fast = first.next;
    Node<E> slow = first;
    while (fast != null && fast.next != null) {
        slow = slow.next;
        fast = fast.next.next;
    }
    // even count for number of nodes in linkedlist.
    if (fast != null) {
        arr.add(slow.element);
        arr.add(slow.next.element);
        return arr;
    } else {
        arr.add(slow.element);
        return arr;
    }
}
Thanks

In this case, I would advise declaring it in only one place. It will be more readable and save a few lines of code.
Renaming it would also help: pick a name that suggests it is the final result of your method (like returnArray or resultArray).
In other circumstances, where the list would mean several different things, it would be better to declare it separately each time; in that case you would also use different names.

"Should we localize scope variables at cost of multiple declarations"
Different people (including well-respected authors of well-known textbooks) will have different opinions on what makes code readable. The problem is that readability is a subjective measure: it depends on the reader.
So I think it is up to you to decide. The chances are that you are going to be the primary reader of your code, at least to start with. So ...
Use the version that you think makes the code more readable.
If you want a second opinion, ask your co-workers.
If you have chosen to use a style guide ... be guided by what it says.
FWIW, my personal opinion is that it really depends on the context. Sometimes it is better to localize, sometimes not. A lot depends on how "far away" the declaration is from the usage, and how intuitive the meaning of the variable is. (For example, if arr was named res or result, you would not need to look at the variable declaration ... assuming you knew the signature of the current method.)

There's nothing wrong with declaring it multiple times, but you have a lot of repeated code: you can significantly improve your code by refactoring.
In your case, the JDK provides a convenience utility method to create a List in-line:
Instead of:
ArrayList<E> arr = new ArrayList<E>();
arr.add(slow.element);
arr.add(slow.next.element);
return arr;
Code this:
return Arrays.asList(slow.element, slow.next.element);
And so forth.
Note that the list returned from asList() is fixed-size: set() works, but add() and remove() throw UnsupportedOperationException. If you need a resizable list, pass it to ArrayList's copy constructor:
return new ArrayList<E>(Arrays.<E>asList(slow.element, slow.next.element));
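The difference between the two variants can be checked with a short sketch. The class AsListDemo and its helper methods are hypothetical names used only for this illustration; Arrays.asList and the ArrayList copy constructor are the standard JDK calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class AsListDemo {
    // Fixed-size view backed by an array: set() works, add()/remove() do not.
    static <E> List<E> fixedSizePair(E a, E b) {
        return Arrays.asList(a, b);
    }

    // Independent, resizable copy of the same two elements.
    static <E> List<E> resizablePair(E a, E b) {
        return new ArrayList<E>(Arrays.asList(a, b));
    }

    // Returns true if adding an element is rejected by the list.
    static boolean addFails(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> fixed = fixedSizePair("a", "b");
        fixed.set(0, "z"); // allowed: replaces an element in place
        System.out.println("fixed-size rejects add: " + addFails(fixed));
        System.out.println("copy rejects add: " + addFails(resizablePair("a", "b")));
    }
}
```

If callers of the method only read the result, the fixed-size list is usually enough and avoids the extra copy.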

Java : Merging two sorted linked lists

I have written code to merge two already-sorted linked lists in Java.
I need help with the following:
How do I retain the value of head node of merged list without using tempNode?
Can this code be better optimized?
public static ListNode mergeSortedListIteration(ListNode nodeA, ListNode nodeB) {
    ListNode mergedNode;
    ListNode tempNode;
    if (nodeA == null) {
        return nodeB;
    }
    if (nodeB == null) {
        return nodeA;
    }
    if (nodeA.getValue() < nodeB.getValue())
    {
        mergedNode = nodeA;
        nodeA = nodeA.getNext();
    }
    else
    {
        mergedNode = nodeB;
        nodeB = nodeB.getNext();
    }
    tempNode = mergedNode;
    while (nodeA != null && nodeB != null)
    {
        if (nodeA.getValue() < nodeB.getValue())
        {
            mergedNode.setNext(nodeA);
            nodeA = nodeA.getNext();
        }
        else
        {
            mergedNode.setNext(nodeB);
            nodeB = nodeB.getNext();
        }
        mergedNode = mergedNode.getNext();
    }
    if (nodeA != null)
    {
        mergedNode.setNext(nodeA);
    }
    if (nodeB != null)
    {
        mergedNode.setNext(nodeB);
    }
    return tempNode;
}

1: You have to keep a record of the first node, which means you will have to store it in a variable such as tempNode.
2: No. There's not much to optimize here. The process is quite trivial.
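For comparison, here is a sketch of a common variant that uses a placeholder (sentinel) node in front of the result. It is not a way around point 1: the sentinel is still a second reference that remembers where the merged list starts, it only removes the special case for picking the first node. The nested ListNode class is a minimal stand-in for the one in the question (its constructor is an assumption), and MergeSketch, listOf and render are hypothetical helpers.

```java
final class MergeSketch {
    // Minimal stand-in for the question's ListNode; the constructor is assumed.
    static final class ListNode {
        private final int value;
        private ListNode next;
        ListNode(int value) { this.value = value; }
        int getValue() { return value; }
        ListNode getNext() { return next; }
        void setNext(ListNode next) { this.next = next; }
    }

    static ListNode merge(ListNode a, ListNode b) {
        ListNode sentinel = new ListNode(0); // placeholder; its value is never read
        ListNode tail = sentinel;
        while (a != null && b != null) {
            if (a.getValue() < b.getValue()) {
                tail.setNext(a);
                a = a.getNext();
            } else {
                tail.setNext(b);
                b = b.getNext();
            }
            tail = tail.getNext();
        }
        // At most one list still has nodes; append it as a whole.
        tail.setNext(a != null ? a : b);
        return sentinel.getNext();
    }

    // Builds a linked list from the given values, in order.
    static ListNode listOf(int... values) {
        ListNode head = null;
        for (int i = values.length - 1; i >= 0; i--) {
            ListNode node = new ListNode(values[i]);
            node.setNext(head);
            head = node;
        }
        return head;
    }

    // Renders the list as space-separated values.
    static String render(ListNode head) {
        StringBuilder sb = new StringBuilder();
        for (ListNode n = head; n != null; n = n.getNext()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(n.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(merge(listOf(1, 3, 5), listOf(2, 4, 6))));
    }
}
```

The asymptotic cost is unchanged (one pass over both lists); the trade-off is one extra node allocation per call in exchange for fewer branches.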

There are a few possibilities:
1) Instead of using mergedNode to keep track of the previous node, use nodeA.getNext().getValue() and nodeB.getNext().getValue(). Your algorithm will become more complex and you will have to deal with a few edge cases, but it is possible to eliminate one of your variables.
2) Use a doubly linked list, and then use nodeA.getPrev().getValue() and nodeB.getPrev().getValue() instead of mergedNode. You will have to deal with edge cases here too.
In order to deal with edge cases, you will have to guarantee that your references can not possibly be null before calling getPrev(), getNext() or getValue(), or else you will throw an exception.
Note that the above modifications sacrifice execution time slightly and (more importantly) simplicity in order to eliminate a variable. Any gains would be marginal, and developer time is far more important than shaving a microsecond or two off of your operation.

Slow implementation and runs out of heap space (even when vm args are set to 2g)

I'm writing a function which generates all paths in a tree as XPath statements and stores them in a bag. Below is a naive version (sorry, this is long), and below that is my attempt to optimize it:
/**
 * Create the structural fingerprint of a tree. Defined as the multiset of
 * all paths and their multiplicities
 */
protected Multiset<String> createSF(AbstractTree<String> t,
        List<AbstractTree<String>> allSiblings) {
    /*
     * difference between unordered and ordered trees is that the
     * next-sibling axis must also be used
     *
     * this means that each node's children are liable to be generated more
     * than once and so are memo-ised and reused
     */
    Multiset<String> res = new Multiset<String>();
    // so, we return a set containing:
    // 1. the node name itself, prepended by root symbol
    res.add("/" + t.getNodeName());
    List<AbstractTree<String>> children = t.getChildren();
    // all of the childrens' sets prepended by this one
    if (children != null) {
        for (AbstractTree<String> child : children) {
            Multiset<String> sub = createSF(child, children);
            for (String nextOne : sub) {
                if (nextOne.indexOf("//") == 0) {
                    res.add(nextOne);
                } else {
                    res.add("/" + nextOne);
                    res.add("/" + t.getNodeName() + nextOne);
                }
            }
        }
    }
    // 2. all of the following siblings' sets, prepended by this one
    if (allSiblings != null) {
        // node is neither original root nor leaf
        // first, find current node
        int currentNodePos = 0;
        int ptrPos = 0;
        for (AbstractTree<String> node : allSiblings) {
            if (node == t) {
                currentNodePos = ptrPos;
            }
            ptrPos++;
        }
        // 3. then add all paths deriving from (all) following siblings
        for (int i = currentNodePos + 1; i < allSiblings.size(); i++) {
            AbstractTree<String> sibling = allSiblings.get(i);
            Multiset<String> sub = createSF(sibling, allSiblings);
            for (String nextOne : sub) {
                if (nextOne.indexOf("//") == 0) {
                    res.add(nextOne);
                } else {
                    res.add("/" + nextOne);
                    res.add("/" + t.getNodeName() + nextOne);
                }
            }
        }
    }
    return res;
}
And now the optimization which is (currently) in a subclass:
private Map<AbstractTree<String>, Multiset<String>> lookupTable = new HashMap<AbstractTree<String>, Multiset<String>>();

public Multiset<String> createSF(AbstractTree<String> t,
        List<AbstractTree<String>> allSiblings) {
    Multiset<String> lookup = lookupTable.get(t);
    if (lookup != null) {
        return lookup;
    } else {
        Multiset<String> res = super.createSF(t, allSiblings);
        lookupTable.put(t, res);
        return res;
    }
}
My trouble is that the optimized version runs out of heap space (the vm args are set at -Xms2g -Xmx2g) and is very slow on moderately large input. Can anyone see a way to improve on this?

Run the code through a profiler. That's the only way to get real facts about the code. Everything else is just guesswork.

"generates all paths in a tree as xpath statements"
How many paths are you creating? This can be non-trivial. The number of paths should be O( n log n ), but the algorithm could be much worse depending on what representation they use for children of a parent.
You should profile the simple enumeration of paths without worrying about the bag storage.

Your code eats RAM exponentially. So one layer more means children.size() times more RAM.
Try to use a generator instead of materializing the results: Implement a Multiset which does not calculate the results beforehand but iterates through the tree structure as you call next() on the set's iterator.
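The generator idea can be sketched as follows. This is a deliberately reduced illustration, not a replacement for createSF: it only enumerates the plain root-to-node paths (no "//" paths and no sibling axis), and the Tree class, LazyPaths and rootPaths are hypothetical stand-ins for AbstractTree and the Multiset. The point is that each path is built when next() is called, so memory holds only the pending stack frames rather than every generated string.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class LazyPaths {
    // Hypothetical minimal tree node, standing in for AbstractTree<String>.
    static final class Tree {
        final String name;
        final List<Tree> children = new ArrayList<>();
        Tree(String name) { this.name = name; }
        Tree add(Tree child) { children.add(child); return this; }
    }

    // A node that still has to be emitted, together with its full path.
    private static final class Frame {
        final Tree node;
        final String path;
        Frame(Tree node, String path) { this.node = node; this.path = path; }
    }

    // Lazily yields one root-to-node path per call to next(), in pre-order.
    static Iterator<String> rootPaths(Tree root) {
        final Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(root, "/" + root.name));
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return !stack.isEmpty();
            }

            @Override
            public String next() {
                if (stack.isEmpty()) {
                    throw new NoSuchElementException();
                }
                Frame frame = stack.pop();
                // Push children in reverse so the first child is visited first.
                for (int i = frame.node.children.size() - 1; i >= 0; i--) {
                    Tree child = frame.node.children.get(i);
                    stack.push(new Frame(child, frame.path + "/" + child.name));
                }
                return frame.path;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static void main(String[] args) {
        Tree root = new Tree("a").add(new Tree("b").add(new Tree("d"))).add(new Tree("c"));
        for (Iterator<String> it = rootPaths(root); it.hasNext(); ) {
            System.out.println(it.next());
        }
    }
}
```

A consumer that only needs counts or a comparison against another fingerprint can process each path as it arrives and discard it, instead of storing the full multiset.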
