HashMap performs better than array? [duplicate] - java

Is it better, performance-wise, to use arrays or HashMaps when the indexes of the array are known? Keep in mind that the 'objects' array/map in the example is just an example; in my real project it is generated by another class, so I can't use individual variables.
ArrayExample:
SomeObject[] objects = new SomeObject[2];
objects[0] = new SomeObject("Obj1");
objects[1] = new SomeObject("Obj2");
void doSomethingToObject(String identifier) {
    SomeObject object;
    if (identifier.equals("Obj1")) {
        object = objects[0];
    } else if (identifier.equals("Obj2")) {
        object = objects[1];
    }
    // do stuff
}
HashMapExample:
HashMap objects = new HashMap();
objects.put("Obj1", new SomeObject());
objects.put("Obj2", new SomeObject());
void doSomethingToObject(String identifier) {
    SomeObject object = (SomeObject) objects.get(identifier);
    // do stuff
}
The HashMap one looks much much better but I really need performance on this so that has priority.
EDIT: Well, arrays it is then; suggestions are still welcome.
EDIT: I forgot to mention, the size of the Array/HashMap is always the same (6)
EDIT: It appears that HashMaps are faster:
Array: 128ms
Hash: 103ms
With fewer iterations, the HashMap was even twice as fast.
test code:
import java.util.HashMap;
import java.util.Random;

// SomeObject (with its kill() method) is defined elsewhere in the project.
public class OptimizationTest {
    private static Random r = new Random();
    private static HashMap<String, SomeObject> hm = new HashMap<String, SomeObject>();
    private static SomeObject[] o = new SomeObject[6];
    private static String[] identifiers = {"Obj1", "Obj2", "Obj3", "Obj4", "Obj5", "Obj6"};
    private static int t = 1000000;

    public static void main(String[] args) {
        createHash();
        createArray();
        long loopTime = processArray();
        long hashTime = processHash();
        System.out.println("Array: " + loopTime + "ms");
        System.out.println("Hash: " + hashTime + "ms");
    }

    public static void createHash() {
        for (int i = 0; i <= 5; i++) {
            hm.put("Obj" + (i + 1), new SomeObject());
        }
    }

    public static void createArray() {
        for (int i = 0; i <= 5; i++) {
            o[i] = new SomeObject();
        }
    }

    public static long processArray() {
        // StopWatch is a simple timing helper class (not shown in the original post)
        StopWatch sw = new StopWatch();
        sw.start();
        for (int i = 1; i <= t; i++) {
            checkArray(identifiers[r.nextInt(6)]);
        }
        sw.stop();
        return sw.getElapsedTime();
    }

    private static void checkArray(String identifier) {
        SomeObject object;
        if (identifier.equals("Obj1")) {
            object = o[0];
        } else if (identifier.equals("Obj2")) {
            object = o[1];
        } else if (identifier.equals("Obj3")) {
            object = o[2];
        } else if (identifier.equals("Obj4")) {
            object = o[3];
        } else if (identifier.equals("Obj5")) {
            object = o[4];
        } else if (identifier.equals("Obj6")) {
            object = o[5];
        } else {
            object = new SomeObject();
        }
        object.kill();
    }

    public static long processHash() {
        StopWatch sw = new StopWatch();
        sw.start();
        for (int i = 1; i <= t; i++) {
            checkHash(identifiers[r.nextInt(6)]);
        }
        sw.stop();
        return sw.getElapsedTime();
    }

    private static void checkHash(String identifier) {
        SomeObject object = hm.get(identifier); // no cast needed with the typed map
        object.kill();
    }
}

HashMap uses an array underneath, so it can never be faster than using an array correctly.
Random.nextInt() is many times slower than the lookup you are testing; even in the array test, the call that picks a random identifier is going to bias your results.
The reason your array benchmark is so slow is the chain of equals() comparisons, not the array access itself.
Hashtable is usually much slower than HashMap because it does much the same thing but is also synchronized.
A common problem with micro-benchmarks is the JIT, which is very good at removing code that doesn't do anything. If you are not careful, you will only be testing whether you have confused the JIT enough that it cannot work out that your code does nothing.
This is one of the reasons you can write micro-benchmarks that outperform C++ systems: Java is a simpler language and easier to reason about, which makes it easier to detect code that does nothing useful. This can lead to tests which show that Java does "nothing useful" much faster than C++ ;)
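To make the point concrete, here is a minimal sketch (mine, not from the thread) of a more robust version of the loop above: the random indices are pre-generated outside the timed region so Random.nextInt() is not part of the measurement, and every result feeds a checksum that is printed at the end, so the JIT cannot prove the loop body is dead.
import java.util.Random;

public class DeadCodeSafeBench {
    public static void main(String[] args) {
        int iterations = 1_000_000;
        int[] picks = new int[iterations];
        Random r = new Random(42);
        // Pre-generate the random indices so nextInt() is outside the timed loop.
        for (int i = 0; i < iterations; i++) {
            picks[i] = r.nextInt(6);
        }

        String[] identifiers = {"Obj1", "Obj2", "Obj3", "Obj4", "Obj5", "Obj6"};
        long checksum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Consume every result so the JIT cannot prove the work is unused.
            checksum += identifiers[picks[i]].hashCode();
        }
        long elapsed = System.nanoTime() - start;
        // Printing the checksum keeps the whole loop "live".
        System.out.println("checksum=" + checksum + ", time=" + (elapsed / 1_000_000) + "ms");
    }
}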

Arrays, when the indexes are known, are faster (HashMap uses an array of linked lists behind the scenes, which adds a bit of overhead on top of the array accesses, not to mention the hashing that needs to be done).
And FYI, HashMap<String,SomeObject> objects = new HashMap<String,SomeObject>(); makes it so you won't have to cast.

For the example shown, the HashTable wins, I believe. The problem with the array approach is that it doesn't scale: I imagine you want to have more than two entries in the table, and the conditional branch tree in doSomethingToObject will quickly get unwieldy and slow.

Logically, HashMap is definitely the fit in your case. From a performance standpoint it also wins, since in the case of arrays you would need to do a number of string comparisons (in your algorithm), while in a HashMap you just use the hash code if the load factor is not too high. Both an array and a HashMap need to be resized if you add many elements, but a HashMap also needs to redistribute its elements on resize; in that respect the HashMap loses.
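If the resize-and-redistribute cost matters, it can be avoided by presizing the map; a minimal sketch (mine, not from the answer), assuming the number of entries is known up front:
import java.util.HashMap;
import java.util.Map;

public class PresizedMapExample {
    public static void main(String[] args) {
        // With the default load factor of 0.75, an initial capacity of
        // ceil(n / 0.75) holds n entries without any rehashing.
        int expectedEntries = 6;
        Map<String, Object> objects =
                new HashMap<String, Object>((int) Math.ceil(expectedEntries / 0.75));
        for (int i = 1; i <= expectedEntries; i++) {
            objects.put("Obj" + i, new Object());
        }
        System.out.println(objects.size()); // 6, with no intermediate rehash
    }
}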

Arrays will usually be faster than the collection classes.
PS. You mentioned a HashTable in your post ("The HashTable one looks much much better"). Hashtable has even worse performance than HashMap; I assume the mention of HashTable was a typo.

The example is strange. The key question is whether your data is dynamic. If it is, you could not write your program that way (as in the array case); in other words, comparing your array and hash implementations is not fair. The hash implementation works for dynamic data, but the array implementation does not.
If you only have static data (6 fixed objects), either the array or the hash just works as a data holder. You could even define static objects.
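To illustrate that last suggestion (a sketch of mine, with assumed names), six fixed objects could simply live in static final fields, which removes the lookup entirely when the call site already knows which object it wants:
public class StaticObjectsExample {
    // One static instance per fixed object; no map or array lookup needed.
    static final SomeObject OBJ1 = new SomeObject("Obj1");
    static final SomeObject OBJ2 = new SomeObject("Obj2");

    static class SomeObject {
        private final String name;
        SomeObject(String name) { this.name = name; }
        String name() { return name; }
    }

    public static void main(String[] args) {
        System.out.println(OBJ1.name()); // direct field access, no lookup at all
    }
}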

Related

Splitting vectors into subvectors - Java

I have a function that processes vectors. The size of the input vector can be anything up to a few million elements. The problem is that the function can only process vectors no bigger than 100k elements without problems.
I would like to call the function on smaller parts if the vector has too many elements:
Vector<Stuff> process(Vector<Stuff> input) {
    Vector<Stuff> output = new Vector<Stuff>();
    while (true) {
        if (input.size() > 50000) {
            // pseudocode: take (and remove) the first 50k elements as a subvector
            output.addAll(doStuff(input.pop_front_50k_first_ones_as_subvector()));
        } else {
            output.addAll(doStuff(input));
            break;
        }
    }
    return output;
}
How should I do this?
Not sure if a Vector with millions of elements is a good idea, but Vector implements List, and thus there is subList, which provides a lightweight (non-copying) view of a section of the Vector.
You may have to update your code to work with the interface List instead of only the specific implementation Vector, though (because the sublist returned is not a Vector, and it is just good practice in general).
You probably want to rewrite your doStuff method to take a List rather than a Vector argument,
public Collection<Output> doStuff(List<Stuff> v) {
    // calculation
}
(and notice that Vector<T> is a List<T>)
and then change your process method to something like
Vector<Stuff> process(Vector<Stuff> input) {
    Vector<Stuff> output = new Vector<Stuff>();
    int startIdx = 0;
    while (startIdx < input.size()) {
        int endIdx = Math.min(startIdx + 50000, input.size());
        output.addAll(doStuff(input.subList(startIdx, endIdx)));
        startIdx = endIdx;
    }
    return output;
}
This should work as long as the "input" Vector isn't being concurrently modified while the process method runs.
If you can't change the signature of doStuff, you're probably going to need to wrap a new Vector around the result of subList,
output.addAll(doStuff(new Vector<Stuff>(input.subList(startIdx, endIdx))));

Is there a way to test for enum value in a list of candidates? (Java)

This is a simplified example. I have this enum declaration as follows:
public enum ELogLevel {
    None,
    Debug,
    Info,
    Error
}
I have this code in another class:
if ((CLog._logLevel == ELogLevel.Info) || (CLog._logLevel == ELogLevel.Debug) || (CLog._logLevel == ELogLevel.Error)) {
    System.out.println(formatMessage(message));
}
My question is whether there is a way to shorten the test. Ideally I would like something to the tune of this (borrowed from Pascal/Delphi):
if (CLog._logLevel in [ELogLevel.Info, ELogLevel.Debug, ELogLevel.Error])
Instead of the long list of comparisons. Is there such a thing in Java, or maybe a way to achieve it? I am using a trivial example, my intention is to find out if there is a pattern so I can do these types of tests with enum value lists of many more elements.
EDIT: It looks like EnumSet is the closest thing to what I want. The naïve way of implementing it is via something like:
if (EnumSet.of(ELogLevel.Info, ELogLevel.Debug, ELogLevel.Error).contains(CLog._logLevel))
But under benchmarking, this performs two orders of magnitude slower than the long if/then statement, I guess because the EnumSet is being instantiated every time it runs. This is a problem only for code that runs very often, and even then it's a very minor problem, since over 100M iterations we are talking about 7ms vs 450ms on my box; a very minimal amount of time either way.
What I settled on for code that runs very often is to pre-instantiate the EnumSet in a static variable, and use that instance in the loop, which cuts down the runtime back down to a much more palatable 9ms over 100M iterations.
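A minimal sketch of that winning approach (my reconstruction, reusing ELogLevel and CLog._logLevel from the question; the other names are assumptions):
import java.util.EnumSet;

public class CLog {
    // Instantiated once; the hot path then only pays for contains().
    private static final EnumSet<ELogLevel> PRINTABLE_LEVELS =
            EnumSet.of(ELogLevel.Info, ELogLevel.Debug, ELogLevel.Error);

    static ELogLevel _logLevel = ELogLevel.Info;

    static void log(String message) {
        if (PRINTABLE_LEVELS.contains(_logLevel)) {
            System.out.println(message);
        }
    }
}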
So it looks like we have a winner! Thanks guys for your quick replies.
What you want is an EnumSet:
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/EnumSet.html
Put the elements you want to test for in the set, and then use the Set method contains().
import java.util.EnumSet;

public class EnumSetExample {
    enum Level { NONE, DEBUG, INFO, ERROR }

    public static void main(String[] args) {
        EnumSet<Level> subset = EnumSet.of(Level.DEBUG, Level.INFO);
        for (Level currentLevel : EnumSet.allOf(Level.class)) {
            if (subset.contains(currentLevel)) {
                System.out.println("we have " + currentLevel.toString());
            } else {
                System.out.println("we don't have " + currentLevel.toString());
            }
        }
    }
}
There's no way to do it concisely in Java. The closest you can come is to dump the values in a set and call contains(). An EnumSet is probably most efficient in your case. You can shorten the set initialization a little using the double-brace idiom, though this has the drawback of creating a new anonymous inner class each time you use it, and hence increases the memory usage slightly.
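For illustration, a sketch (mine) of the double-brace idiom with a plain HashSet; note that EnumSet can't be subclassed this way (its constructors are package-private), and that each occurrence of the idiom defines a new anonymous class:
import java.util.HashSet;
import java.util.Set;

public class DoubleBraceExample {
    enum Level { NONE, DEBUG, INFO, ERROR }

    // The "double brace" idiom: an anonymous subclass of HashSet whose
    // instance-initializer block fills the set. Concise, but each use
    // of the idiom defines a new class.
    private static final Set<Level> PRINTABLE = new HashSet<Level>() {{
        add(Level.DEBUG);
        add(Level.INFO);
        add(Level.ERROR);
    }};

    public static void main(String[] args) {
        System.out.println(PRINTABLE.contains(Level.INFO)); // true
    }
}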
In general, logging levels are implemented as integers:
public static final int LEVEL_NONE = 0;
public static final int LEVEL_DEBUG = 1;
public static final int LEVEL_INFO = 2;
public static final int LEVEL_ERROR = 3;
and then you can test for severity with a simple comparison:
if (CLog._logLevel >= LEVEL_DEBUG) {
    // log
}
You could use a list of the required levels, i.e.:
// Lists.newArrayList is from Guava; java.util.Arrays.asList would work as well
List<ELogLevel> levels = Lists.newArrayList(ELogLevel.Info,
        ELogLevel.Debug, ELogLevel.Error);
if (levels.contains(CLog._logLevel)) {
    //
}

Most efficient way to create an array of counting numbers

What's the most efficient way to make an array of a given length, with each element containing its subscript?
A possible description, with my dummy-level code:
/**
 * The IndGen function returns an integer array with the specified dimensions.
 *
 * Each element of the returned integer array is set to the value of its
 * one-dimensional subscript.
 *
 * @see Modeled on IDL's INDGEN function:
 *      http://idlastro.gsfc.nasa.gov/idl_html_help/INDGEN.html
 *
 * @param size
 * @return int[size], each element set to the value of its subscript
 * @author you
 */
public int[] IndGen(int size) {
    int[] result = new int[size];
    for (int i = 0; i < size; i++) result[i] = i;
    return result;
}
Other tips, such as doc style, welcome.
Edit
I've read elsewhere how inefficient a for loop is compared to other methods, as for example in Copying an Array:
Using clone: 93 ms
Using System.arraycopy: 110 ms
Using Arrays.copyOf: 187 ms
Using for loop: 422 ms
I've been impressed by the imaginative responses to some questions on this site, e.g., Display numbers from 1 to 100 without loops or conditions. Here's an answer that might suggest some methods:
public class To100 {
    public static void main(String[] args) {
        String set = new java.util.BitSet() {{ set(1, 100+1); }}.toString();
        System.out.append(set, 1, set.length()-1);
    }
}
If you're not up to tackling this challenging problem, no need to vent: just move on to the next unanswered question, one you can handle.
Since it's infeasible to use terabytes of memory at once, and especially to do any calculation on them simultaneously, you might consider using a generator. (You were probably planning to loop over the array, right?) With a generator, you don't need to initialize an array (so you can start using it immediately), and almost no memory is used (O(1)).
I've included an example implementation below. It is bounded by the limitations of the long primitive.
import java.util.Iterator;
import java.util.NoSuchElementException;

public class Counter implements Iterator<Long> {
    private long count;
    private final long max;

    public Counter(long start, long endInclusive) {
        this.count = start;
        this.max = endInclusive;
    }

    @Override
    public boolean hasNext() {
        return count <= max;
    }

    @Override
    public Long next() {
        if (this.hasNext())
            return count++;
        else
            throw new NoSuchElementException();
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException();
    }
}
Find a usage demonstration below.
Iterator<Long> i = new Counter(0, 50);
while (i.hasNext()) {
System.out.println(i.next()); // Prints 0 to 50
}
The only thing I can think of is using ++i instead of i++, but I think the Java compiler already performs this optimization.
Other than that, this is pretty much the best algorithm there is.
You could make a class that acts as if it has an array yet doesn't, and simply returns the same number it is given (i.e. the identity function), but that's not what you asked for.
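Here is a sketch of that last idea (mine, not the answerer's): a "virtual" counting array whose get(i) simply returns i, so it needs no storage at all.
// A "virtual" counting array: get(i) == i, so no backing storage is needed.
public class IdentityArray {
    private final int size;

    public IdentityArray(int size) {
        this.size = size;
    }

    public int get(int index) {
        if (index < 0 || index >= size) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
        return index; // the identity function
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        IdentityArray a = new IdentityArray(5);
        System.out.println(a.get(3)); // 3
    }
}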
As others have said in their answers, your code is already close to the most efficient I can think of, at least for small arrays. If you need to create those arrays many times and they are very big, instead of iterating in a for loop each time you could create all the arrays once and then copy them; the copy operation is faster than iterating over the array if the array is very big. It would be something like this (in this example for a maximum of 1000 elements):
public static int[][] cache = {{0},{0,1},{0,1,2},{0,1,2,3},{0,1,2,3,4}, ..., {0,1,2,...,998,999}};
Then, from the code where you need to create those arrays a lot of times, you would use something like this:
int[] arrayOf50Elements = Arrays.copyOf(cache[49], 50);
Note that this way you are using a lot of memory to improve the speed. I want to emphasize that this will only be worth the complication when you need to create those arrays a lot of times, the arrays are very big, and maximum speed is one of your requirements. In most of the situations I can think of, the solution you proposed will be the best one.
Edit: I've just seen the huge amount of data and memory you need. The approach I propose would require memory of the order of n^2, where n is the maximum integer you expect to have. In this case that's impractical, due to the monstrous amount of memory you would need. Forget about this. I leave the post because maybe it is useful for others.

Performance issue - clear and reuse a collection OR throw it and get a new one [duplicate]

This question already has answers here:
list.clear() vs list = new ArrayList<Integer>(); [duplicate]
(8 answers)
Closed 8 years ago.
Say we try to implement a merge sort algorithm. Given an array of arrays to merge, which is the better approach: this,
public void merge(ArrayList<ArrayList<E>> a) {
    ArrayList<ArrayList<E>> tmp = new ArrayList<ArrayList<E>>();
    while (a.size() > 1) {
        for (int i = 1; i < a.size(); i += 2) {
            // merge(List, List) merges two sorted lists (not shown)
            tmp.add(merge(a.get(i - 1), a.get(i)));
        }
        if (a.size() % 2 == 1) tmp.add(a.get(a.size() - 1));
        a = tmp;
        tmp = new ArrayList<ArrayList<E>>();
    }
}
or this:
public void merge(ArrayList<ArrayList<E>> a) {
    ArrayList<ArrayList<E>> tmp = new ArrayList<ArrayList<E>>(), tmp2;
    while (a.size() > 1) {
        for (int i = 1; i < a.size(); i += 2) {
            tmp.add(merge(a.get(i - 1), a.get(i)));
        }
        if (a.size() % 2 == 1) tmp.add(a.get(a.size() - 1));
        // swap the two lists and clear the one that will be refilled
        tmp2 = a;
        a = tmp;
        tmp = tmp2;
        tmp.clear();
    }
}
To make it clearer: what I am doing is merging each pair of neighbours in a and putting the resulting merged arrays into an external array of arrays, tmp. After merging all pairs, the first approach is to clear a, move tmp into a, and then use the cleared a as the new tmp.
The second approach is to "throw away" the old tmp and get a new tmp instead of reusing the old one.
As a general rule, don't spend energy trying to reuse old collections; it just makes your code harder to read (and frequently doesn't give you any actual benefit). Only try optimizations like these if you already have your code working, and you have hard numbers that say the speed of your algorithm is improved.
Always allocating a new ArrayList and filling it will result in more garbage collections, which generally slows everything down (minor GCs are cheap, but not free).
Reusing the ArrayList will result in fewer calls to Arrays.copyOf(), which is used when the array inside the ArrayList needs to be resized (resizing is cheap, but not free).
On the other hand, clear() nulls out the array contents so the GC can collect the unused objects, which is of course also not free.
Still, if execution speed is the concern, I would reuse the ArrayList.

Time efficient implementation of generating probability tree and then sorting the results

I have some events, where each of them has a probability to happen, and a weight if they do. I want to create all possible combinations of probabilities of events, with the corresponding weights. In the end, I need them sorted in weight order. It is like generating a probability tree, but I only care about the resulting leaves, not which nodes it took to get them. I don't need to look up specific entries during the creation of the end result, just to create all the values and sort them by weight.
There will be only about 5-15 events, but since there are 2^n resulting possibilities with n events, and this is to be done very often, I don't want it to take an unnecessarily long time. Speed is much more important than the amount of storage used.
The solution I came up with works but is slow. Any idea for a quicker solution or some ideas for improvement?
class ProbWeight {
    double prob;
    double eventWeight;

    public ProbWeight(double aProb, double aEventWeight) {
        prob = aProb;
        eventWeight = aEventWeight;
    }

    public ProbWeight(ProbWeight aCellProb) {
        prob = aCellProb.getProb();
        eventWeight = aCellProb.getEventWeight();
    }

    public double getProb() {
        return prob;
    }

    public double getEventWeight() {
        return eventWeight;
    }

    // the event happens: multiply in its probability and add its weight
    public void doesHappen(ProbWeight aProb) {
        prob *= aProb.getProb();
        eventWeight += aProb.getEventWeight();
    }

    // the event does not happen: multiply in the complementary probability
    public void doesNotHappen(ProbWeight aProb) {
        prob *= (1 - aProb.getProb());
    }
}
// Data generation for testing
List<ProbWeight> dataList = new ArrayList<ProbWeight>();
for (int i = 0; i < 5; i++) {
    ProbWeight prob = new ProbWeight(Math.random(), 10 * Math.random());
    dataList.add(prob);
}

// The list where the results will end up
List<ProbWeight> resultingProbList = new ArrayList<ProbWeight>();
// a temporary list to avoid modifying a list while looping through it
List<ProbWeight> tempList = new ArrayList<ProbWeight>();

resultingProbList.add(dataList.remove(0));
for (ProbWeight data : dataList) { // for each event
    // go through the already created event combinations and create two new ones for each
    for (ProbWeight listed : resultingProbList) {
        ProbWeight firstPossibility = new ProbWeight(listed);
        ProbWeight secondPossibility = new ProbWeight(listed);
        firstPossibility.doesHappen(data);
        secondPossibility.doesNotHappen(data);
        tempList.add(firstPossibility);
        tempList.add(secondPossibility);
    }
    resultingProbList = new ArrayList<ProbWeight>(tempList);
    tempList.clear(); // reset the scratch list so combinations are not duplicated
}
// Then sort the list by weight using sort and a comparator
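For completeness, the sort step elided in the last comment might look like this (a sketch, using the getter names from the class above):
import java.util.Collections;
import java.util.Comparator;

// Sort by weight, ascending; swap the arguments for descending order.
Collections.sort(resultingProbList, new Comparator<ProbWeight>() {
    @Override
    public int compare(ProbWeight a, ProbWeight b) {
        return Double.compare(a.getEventWeight(), b.getEventWeight());
    }
});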
It is 50% about choosing an appropriate data structure and 50% about the algorithm. Data structure: I believe TreeBidiMap (from Apache Commons Collections) will do the magic for you. You will need to implement two Comparators: one for the weight and another for the probability.
Algorithm - trivial.
Good luck!
Just a few tricks to try to speed up your code:
- try to avoid unnecessary object allocation
- use the right constructor for your collections: in your code sample it seems that you already know the size of the collections, so pass it as a parameter to the constructor to prevent useless collection resizing (and GC calls); see the sketch below
You may also try to use a sorted Set instead of a List, in order to get the ordering done on the fly.
HTH
jerome
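A minimal sketch of the sized-constructor tip (my example; the event count is an assumption): since each event doubles the number of combinations, the final list size is known in advance.
import java.util.ArrayList;
import java.util.List;

public class PresizedListExample {
    public static void main(String[] args) {
        int events = 15;
        // 2^events combinations in the end, so size the list once up front
        // instead of letting ArrayList grow and copy its backing array repeatedly.
        int combinations = 1 << events;
        List<double[]> results = new ArrayList<double[]>(combinations);
        System.out.println("capacity reserved for " + combinations + " entries");
    }
}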
