A question about the "transfer" method in HashMap (JDK 1.6)? - java

The source code looks like this:
void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (e != null) {
            src[j] = null;
            do {
                Entry<K,V> next = e.next;
                int i = indexFor(e.hash, newCapacity);
                e.next = newTable[i];
                newTable[i] = e;
                e = next;
            } while (e != null);
        }
    }
}
But I want to do it like this; will this work well, or is there a problem?
I think the hash values of all elements in the same list are the same, so we don't need to calculate the bucketIndex of the new table; the only thing we should do is transfer the head element to the new table, and this will save time.
Thank you for your answer.
void transfer(Entry[] newTable) {
    Entry[] src = table;
    int newCapacity = newTable.length;
    for (int j = 0; j < src.length; j++) {
        Entry<K,V> e = src[j];
        if (null != e) {
            src[j] = null;
            int i = indexFor(e.hash, newCapacity);
            newTable[i] = e;
        }
    }
}

I think you are asking about the HashMap class as implemented in Java 6, and (specifically) whether your idea for optimizing the internal transfer() method would work.
The answer is no, it won't.
This transfer method is called when the main array has been resized, and its purpose is to transfer all existing hash entries to the correct hash chain in the resized table. Your code is simply moving entire chains. That won't work. The problem is that the hash chains typically hold entries with multiple different hashcodes. While a given group of entries all belonged on the same chain in the old version of the table, in the new version they probably won't.
In short, your proposed modification would "lose" some entries because they would be in the wrong place in the resized table.
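To make that concrete, here is a small sketch (not the JDK code; the hash values are made up) using the same index computation that indexFor performs, hash & (capacity - 1). Two entries can share a bucket at the old capacity but belong in different buckets at the new one:

public class ChainSplitDemo {
    // Same index computation HashMap's indexFor uses: hash & (capacity - 1).
    static int indexFor(int hash, int capacity) {
        return hash & (capacity - 1);
    }

    public static void main(String[] args) {
        int h1 = 5, h2 = 21; // 21 = 5 + 16, so they collide at capacity 16
        System.out.println(indexFor(h1, 16)); // 5
        System.out.println(indexFor(h2, 16)); // 5  -> same chain in the old table
        System.out.println(indexFor(h1, 32)); // 5
        System.out.println(indexFor(h2, 32)); // 21 -> different chains after resizing
    }
}

A chain moved as one block would leave the entry with hash 21 sitting in bucket 5 of the 32-slot table, where a lookup for it would never find it.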
There is a meta-answer to this too. The standard Java classes were written and tested by a group of really smart software engineers. Since then, the code has probably been read by tens or hundreds of thousands of other smart people outside of Sun / Oracle. The probability that everyone else has missed a simple / obvious optimization like this is ... vanishingly small.
So if you do find something that looks to you like an optimization (or a bug) in the Java source code for Java SE, you are probably mistaken!

Related

Why is the HashMap#resize implementation so complex?

After reading the source code of java.util.HashMap#resize, I'm confused by one part -- the case where a bin has more than one node.
else { // preserve order
    Node<K,V> loHead = null, loTail = null;
    Node<K,V> hiHead = null, hiTail = null;
    Node<K,V> next;
    do {
        next = e.next;
        if ((e.hash & oldCap) == 0) {
            if (loTail == null)
                loHead = e;
            else
                loTail.next = e;
            loTail = e;
        }
        else {
            if (hiTail == null)
                hiHead = e;
            else
                hiTail.next = e;
            hiTail = e;
        }
    } while ((e = next) != null);
    if (loTail != null) {
        loTail.next = null;
        newTab[j] = loHead;
    }
    if (hiTail != null) {
        hiTail.next = null;
        newTab[j + oldCap] = hiHead;
    }
}
Why do I feel this part doesn't need to exist? Just using the code below
newTab[e.hash & (newCap - 1)] = e;
should be OK -- I think they have the same effect.
So why bother with so much code in the else branch?
At resize, every bin is split into two separate bins. So if the bin contained several linked items, you cannot move all of them into the single target bin based on the hash of the first item: you should recheck all the hashes and distribute them into "hi" and "lo" bin, depending on the new significant bit inside the hash ((e.hash & oldCap) == 0). This was somewhat simpler in Java 7 before the introduction of tree-bins, but the older algorithm could change the order of the items which is not acceptable now.
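As a quick illustration with made-up numbers (oldCap = 16, newCap = 32, bin j = 3): the test (e.hash & oldCap) == 0 checks exactly the one new bit that enters the index computation, so it decides between "stay at j" and "move to j + oldCap" without recomputing e.hash & (newCap - 1):

public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32, j = 3;
        int hashLo = 3;  // binary 00011: bit 16 clear -> stays in bin j = 3
        int hashHi = 19; // binary 10011: bit 16 set   -> moves to bin j + oldCap = 19

        // The single-bit test agrees with the full index computation:
        System.out.println((hashLo & oldCap) == 0);                // true  -> "lo" list
        System.out.println((hashLo & (newCap - 1)) == j);          // true
        System.out.println((hashHi & oldCap) != 0);                // true  -> "hi" list
        System.out.println((hashHi & (newCap - 1)) == j + oldCap); // true
    }
}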
EDIT:
The threshold for treeifying a bin changes as the table is made bigger. That's what it is doing.
I haven't read the entire file, but this could be a possible reason (line 220)
The use and transitions among plain vs tree modes is
complicated by the existence of subclass LinkedHashMap. See
below for hook methods defined to be invoked upon insertion,
removal and access that allow LinkedHashMap internals to
otherwise remain independent of these mechanics. (This also
requires that a map instance be passed to some utility methods
that may create new nodes.)

Java Searching through two Arrays

I have two ArrayLists. ArrayList A has 8.1k elements and ArrayList B has 81k elements.
I need to iterate through B, search for each item in A, and then change a field in the matched element in list B.
Here's my code:
private void mapAtoB(List<A> aList, ListIterator<B> it) {
    AtomicInteger i = new AtomicInteger(-1);
    while (it.hasNext()) {
        System.out.print(i.incrementAndGet() + ", ");
        B b = it.next();
        aList.stream().filter(a -> b.equalsB(a)).forEach(a -> {
            b.setId(String.valueOf(a.getRedirectId()));
            it.set(b);
        });
    }
    System.out.println();
}
public class B {
    public boolean equalsB(A a) {
        if (a == null) return false;
        if (this.getFullURL().contains(a.getFirstName())) return true;
        return false;
    }
}
public class B {
public boolean equalsB(A a) {
if (a == null) return false;
if (this.getFullURL().contains(a.getFirstName())) return true;
return false;
}
}
But this is taking forever: the method takes close to 15 minutes to finish. Is there any way to optimize any of this? A 15-minute run time is way too much.
I'd be happy to see a good and thorough solution; meanwhile I can propose two ideas (or maybe two incarnations of one).
The first is to speed up the search for all objects of type A within one object of type B. For that, the Rabin-Karp algorithm seems applicable and simple enough to implement quickly; Aho-Corasick is harder but will probably give better results, though I'm not sure how much better.
The other option is to limit the number of objects of type B that must be fully processed for each object of type A. For that you could, e.g., build an inverse N-gram index: for each fullUrl you take all its substrings of length N ("N-grams") and build a map from each such N-gram to the set of B's that have that N-gram in their fullUrl. When searching for an object A, you take all of its N-grams, find the set of B's for each N-gram, and intersect all those sets; the intersection contains all the B's that you need to fully process.
I implemented this approach quickly; for the sizes you specified it gives a 6-7x speedup with N=4. As N grows, search becomes faster, but building the index slows down (so if you can reuse the index, you are probably better off choosing a bigger N). The index takes about 200 MB for the sizes you specified, so this approach will only scale so far as the collection of B's grows.
Assuming that all strings are longer than NGRAM_LENGTH, here is the quick and dirty code for building the index using Guava's SetMultimap (HashMultimap):
SetMultimap<String, B> idx = HashMultimap.create();
for (B b : bList) {
    for (int i = 0; i < b.getFullURL().length() - NGRAM_LENGTH + 1; i++) {
        idx.put(b.getFullURL().substring(i, i + NGRAM_LENGTH), b);
    }
}
And for the search:
private void mapAtoB(List<A> aList, SetMultimap<String, B> mmap) {
    for (A a : aList) {
        Collection<B> possible = null;
        for (int i = 0; i < a.getFirstName().length() - NGRAM_LENGTH + 1; i++) {
            String ngram = a.getFirstName().substring(i, i + NGRAM_LENGTH);
            Set<B> forNgram = mmap.get(ngram);
            if (possible == null) {
                possible = new ArrayList<>(forNgram);
            } else {
                possible.retainAll(forNgram);
            }
            if (possible.size() < 20) { // it's ok to scan through 20
                break;
            }
        }
        for (B b : possible) {
            if (b.equalsB(a)) {
                b.setId(a.getRedirectId());
            }
        }
    }
}
A possible direction for optimization would be to use hashes instead of full N-grams thus reducing the memory footprint and necessity for N-gram key comparisons.
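For instance (a rough sketch reusing the names above), you could key the multimap by each N-gram's hash code instead of the substring itself, so the index stores small Integer keys rather than Strings; hash collisions merely add a few false candidates, which equalsB() filters out anyway:

SetMultimap<Integer, B> idx = HashMultimap.create();
for (B b : bList) {
    String url = b.getFullURL();
    for (int i = 0; i < url.length() - NGRAM_LENGTH + 1; i++) {
        // Store the hash of the N-gram: the substring is short-lived
        // and the key actually kept in the index is just an Integer.
        idx.put(url.substring(i, i + NGRAM_LENGTH).hashCode(), b);
    }
}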

How to implement efficient hash cons with java HashSet

I am trying to implement a hash cons in Java, comparable to what String.intern does for strings. I.e., I want a class that stores all distinct values of a data type T in a set and provides a T intern(T t) method that checks whether t is already in the set. If so, the instance in the set is returned; otherwise t is added to the set and returned. The reason is that the resulting values can be compared using reference equality, since two equal values returned from intern are guaranteed to be the same instance.
Of course, the most obvious candidate data structure for a hash cons is java.util.HashSet<T>. However, it seems that its interface is flawed and does not allow efficient insertion, because there is no method to retrieve an element that is already in the set or insert one if it is not in there.
An algorithm using HashSet would look like this:
class HashCons<T> {
    HashSet<T> set = new HashSet<>();

    public T intern(T t) {
        if (set.contains(t)) {
            return ???; // <----- PROBLEM
        } else {
            set.add(t); // <--- Inefficient, second hash lookup
            return t;
        }
    }
}
As you see, the problem is twofold:
This solution would be inefficient, since I would access the hash table twice, once for contains and once for add. But okay, this may not be too big a performance hit, since the correct bucket will be in the cache after the contains call, so the add will not trigger a cache miss and should be quite fast.
I cannot retrieve an element already in the set (see line flagged PROBLEM). There is just no method to retrieve the element in the set. So it is just not possible to implement this.
Am I missing something here? Or is it really impossible to build a usual hash cons with java.util.HashSet?
I don't think it's possible using HashSet. You could use some kind of Map instead, using your value as both key and value. java.util.concurrent.ConcurrentMap also happens to possess the quite convenient method
putIfAbsent(K key, V value)
which returns the existing value if the key is already present. However, I don't know about the performance of this method (compared to checking "manually" on non-concurrent implementations of Map).
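A minimal sketch of that idea, assuming T has proper equals/hashCode (note that ConcurrentHashMap rejects null keys):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConcurrentHashCons<T> {
    private final ConcurrentMap<T, T> map = new ConcurrentHashMap<>();

    public T intern(T t) {
        // putIfAbsent returns the previously mapped value, or null if t was new.
        T existing = map.putIfAbsent(t, t);
        return existing != null ? existing : t;
    }
}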
Here is how you would do it using a HashMap:
class HashCons<T> {
    Map<T,T> map = new HashMap<T,T>();

    public T intern(T t) {
        if (!map.containsKey(t))
            map.put(t, t);
        return map.get(t);
    }
}
I think the reason why it is not possible with HashSet is quite simple: to the set, if contains(t) is fulfilled, it means that the given t also equals one of the t' in the set. There is no reason to be able to return it (as you already have it).
Well, HashSet is implemented as a HashMap wrapper in OpenJDK, so you won't win on memory usage compared to the solution suggested by aRestless.
10-min sketch
class HashCons<T> {
    T[] table;
    int size;
    int sizeLimit;

    HashCons(int expectedSize) {
        init(Math.max(Integer.highestOneBit(expectedSize * 2) * 2, 16));
    }

    private void init(int capacity) {
        table = (T[]) new Object[capacity];
        size = 0;
        sizeLimit = (int) (capacity * 2L / 3);
    }

    T cons(@Nonnull T key) {
        int mask = table.length - 1;
        int i = key.hashCode() & mask;
        do {
            if (table[i] == null) break;
            if (key.equals(table[i])) return table[i];
            i = (i + 1) & mask;
        } while (true);
        table[i] = key;
        if (++size > sizeLimit) rehash();
        return key;
    }

    private void rehash() {
        T[] table = this.table;
        if (table.length == (1 << 30))
            throw new IllegalStateException("HashCons is full");
        init(table.length << 1);
        for (T key : table) {
            if (key != null) cons(key);
        }
    }
}
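Hypothetical usage of this sketch:

HashCons<String> strings = new HashCons<>(1000);
String a = strings.cons(new String("hello")); // two distinct String instances...
String b = strings.cons(new String("hello"));
System.out.println(a == b); // true: both calls return the interned instance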

Insert into an already-sorted list

With Java, I have a class, known as TestClass, which has a member named Name, which is a string. I also have an ArrayList of this type, which is already sorted alphabetically by Name. What I want to do is find the best index in which to put a new instance of TestClass. The best approach I could come up with so far is this:
public static int findBestIndex(char entry, ArrayList<TestClass> list) {
    int desiredIndex = -1;
    int oldPivot = list.size();
    int pivot = list.size() / 2;
    do
    {
        char test = list.get(pivot).Name.charAt(0);
        if (test == entry)
        {
            desiredIndex = pivot;
        }
        else if (Math.abs(oldPivot - pivot) <= 1)
        {
            if (test < entry)
            {
                desiredIndex = pivot + 1;
            }
            else
            {
                desiredIndex = pivot - 1;
            }
        }
        else if (test < entry)
        {
            int tempPiv = pivot;
            pivot = oldPivot - (oldPivot - pivot) / 2;
            oldPivot = tempPiv;
        }
        else
        {
            int tempPiv = pivot;
            pivot = pivot - (oldPivot - pivot) / 2;
            oldPivot = tempPiv;
        }
    } while (desiredIndex < 0);
    return desiredIndex;
}
Essentially: break the array in half, check to see if your value goes before, after, or at that point. If it's after, check the first half of the array; otherwise, check the second half. Then repeat. I understand that this method only tests by the first character, but that's easily fixed and not relevant to my main problem. For some scenarios, this approach works well enough. For most, it works horribly. I assume that it isn't finding the new pivot point properly, and if that's the case, how would I fix it?
Edit: For clarification, I'm using this for an inventory system, so I'm not sure a LinkedList would be appropriate. I'm using an ArrayList because they are more familiar to me, and thus would be easier to translate into another language, if needed (which is likely, at the moment, might be moving over to C#). I'm trying to avoid things like Comparable for that reason, as I'd have to completely re-write if C# lacks it.
Edit part Deux: Figured out what I was doing wrong. Instead of using the previous pivot point, I should have been setting and changing the boundaries of the area I was checking, and creating the new pivot based on that.
It might not be a good idea to use a SortedSet (e.g. a TreeSet) for this, because Sets don't allow duplicate elements. If you have duplicate elements (i.e. TestClass instances with the same name), then a List should be used. Inserting an element into an already sorted list is as simple as this:
void insert(List<TestClass> list, TestClass element) {
    int index = Collections.binarySearch(list, element, Comparator.comparing(TestClass::getName));
    if (index < 0) {
        index = -index - 1;
    }
    list.add(index, element);
}
This code requires Java 8 or later, but can be rewritten to work in older Java versions as well.
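Hypothetical usage, assuming a TestClass constructor that takes the name and a getName() accessor:

List<TestClass> list = new ArrayList<>();
insert(list, new TestClass("Bob"));
insert(list, new TestClass("Alice"));
insert(list, new TestClass("Carol"));
// list is now sorted by name: Alice, Bob, Carol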
As already pointed out, there is no reason to implement this yourself; a simple code example:
class FooBar implements Comparable<FooBar> {
    String name;

    FooBar(String name) {
        this.name = name;
    }

    @Override
    public int compareTo(FooBar other) {
        return name.compareTo(other.name);
    }
}
TreeSet<FooBar> foobarSet = new TreeSet<>();
FooBar f1;
foobarSet.add(new FooBar("2"));
foobarSet.add(f1 = new FooBar("1"));
int index = foobarSet.headSet(f1).size();
(Based on How to find the index of an element in a TreeSet?)
I think the problem is in this bit of the code:
else if (test < entry)
{
    int tempPiv = pivot;
    pivot = oldPivot - (oldPivot - pivot) / 2;
    oldPivot = tempPiv;
}
else
{
    int tempPiv = pivot;
    pivot = pivot - (oldPivot - pivot) / 2;
    oldPivot = tempPiv;
}
You are performing the same actions whether test < entry or test > entry. This will lead to a linear search when the item you are searching for is at the start of the array.
I prefer to use bounds (low and high), like:
high = list.size();
low = 0;
do {
    pivot = (high + low) / 2;
    if (test < entry) {
        low = pivot;
    } else if (test > entry) {
        high = pivot;
    } else {
        ....
    }
} while ...;
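Spelled out, a complete bounds-based version of findBestIndex might look like this (a sketch keeping the question's first-character comparison; note that low must advance past the pivot, otherwise the loop can stall):

public static int findBestIndex(char entry, ArrayList<TestClass> list) {
    int low = 0, high = list.size();
    while (low < high) {
        int pivot = (low + high) >>> 1; // unsigned shift avoids int overflow
        char test = list.get(pivot).Name.charAt(0);
        if (test < entry) {
            low = pivot + 1;  // entry sorts after list[pivot]
        } else {
            high = pivot;     // entry sorts at or before list[pivot]
        }
    }
    return low; // first index whose element is >= entry
}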
You could use something like a PriorityQueue that already has a sense of order. Inserting into a collection with a sense of order will automatically place the element in the correct place in minimal time (usually log(n) or less).
If you want to do arbitrary inserts without this, then I would suggest using a LinkedList, which won't have to be re-sorted or completely copied over to insert a single item, unlike the ArrayList you currently have. While finding the correct insert location in a LinkedList takes up to O(n) time, in practice it can still be faster than doing a log(n) search for the correct location in an ArrayList but then needing to copy or sort it afterwards.
Also the code for finding the insert location in a linked list is much simpler.
if (next != null && next.compareTo(insertElement) > 0) {
    // You have the right location
}
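A fuller sketch of that walk with a ListIterator, assuming TestClass implements Comparable<TestClass>:

import java.util.LinkedList;
import java.util.ListIterator;

static void insertSorted(LinkedList<TestClass> list, TestClass element) {
    ListIterator<TestClass> it = list.listIterator();
    while (it.hasNext()) {
        if (it.next().compareTo(element) > 0) {
            it.previous(); // step back so add() inserts before the larger element
            break;
        }
    }
    it.add(element); // inserts at the cursor: before the larger element, or at the end
}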
There are other data structures you could use instead of a list, like a tree, a priority queue, etc.
Make a list implementation of your own, and in your add method have these lines:
wrappedList.add(object);
Collections.sort(wrappedList);

Is there a way to optimize this code so that I avoid outOfMemory error?

My overall project is to create a tree and use Huffman coding to encode and decode a given file. I am at the point where I need to decode my file. To do this, I am having to step through my Huffman tree until I reach a bottom-most leaf and then return the byte represented by that leaf. I traverse the tree according to the bit string given to the method, i.e. if the current bit is 1, I go to childOne in the tree, and so forth. The problem is that I keep getting an OutOfMemoryError. Is there any way to optimize this code so that it won't use as much memory?
public static int decode(List<Integer> bitArray, HuffmanNode root, int startPosition,
                         ArrayList<Byte> finalArray)
{
    HuffmanNode childOne;
    HuffmanNode childZero;
    int currentBit = bitArray.get(startPosition);
    byte newByte;
    childOne = root.getChildOne();
    childZero = root.getChildZero();
    if (childOne == null && childZero == null)
    {
        finalArray.add(root.getByteRepresented());
        return startPosition;
    }
    else if (currentBit == 1)
    {
        startPosition++;
        startPosition = decode(bitArray, childOne, startPosition, finalArray);
    }
    else
    {
        startPosition++;
        startPosition = decode(bitArray, childZero, startPosition, finalArray);
    }
    return startPosition;
}
I need to know the place in the bitArray where it ended, as well as put the specified byte into an array; that is why I am putting the byte into the array within the method and returning an int. Basically, is there any better way to get this done?
Yes, there is. Change recursion to iteration:
temp = root;
childOne = temp.getChildOne();
childZero = temp.getChildZero();
while (childOne != null && childZero != null) {
    currentBit = bitArray.get(startPosition++);
    if (currentBit == 1) {
        temp = childOne;
    } else {
        temp = childZero;
    }
    childOne = temp.getChildOne();
    childZero = temp.getChildZero();
}
I don't know how big things are, but I don't think you need recursion. Can't you do what you need with a loop like this one:
while (curNode.isNotLeaf())
{
    if (currentBit == 1) curNode = curNode.getChildOne();
    else curNode = curNode.getChildZero();
    currentBit = nextBit;
}
addByte(curNode, bigArray);
so you go through your bits in this loop, adding represented bytes when you get to leaves, and then continue on -- no need for all the stack frames recursion involves.
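Putting that together, a whole-file version of the loop might look like this (a sketch assuming the HuffmanNode accessors from the question):

static void decodeAll(List<Integer> bitArray, HuffmanNode root, List<Byte> out) {
    int pos = 0;
    while (pos < bitArray.size()) {
        HuffmanNode node = root;
        // Walk down until we reach a leaf (leaves have no children).
        while (node.getChildOne() != null || node.getChildZero() != null) {
            node = bitArray.get(pos++) == 1 ? node.getChildOne() : node.getChildZero();
        }
        out.add(node.getByteRepresented());
    }
}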
Also, consider using lower-level data structures like java.util.BitSet instead of List<Integer> and java.io.ByteArrayOutputStream instead of ArrayList<Byte>.
If recursion were your issue you would most likely be getting a StackOverflowError. Since you are running out of memory instead, I would suggest you look at the following:
- Be sure to discard bitArray and finalArray after each call.
- Use a stream to output to.
- Use a BitSet for the bitArray.
- Make sure you are not accidentally making loops in your tree.
Have a look at compressed bit set implementations like:
- OpenBitSet (Lucene)
- http://code.google.com/p/javaewah/
- http://www.censhare.com/en/aktuelles/censhare-labs/yet-another-compressed-bitset
