Java: Returning Infinity [duplicate]

From the document of ConcurrentHashMap:
A hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates.
Can we fully trust ConcurrentHashMap to perform thread-safe operations?
I am using a ConcurrentHashMap for mapping keys to their values. My key-value pair is:
Map<Integer, ArrayList<Double>> map1 = new ConcurrentHashMap<>();
The keys range over [0, 1000000]. I have 20 threads which can access/modify the value corresponding to a key at the same time. This is not frequent, but that condition is possible. I am getting an infinity from the following code:
Double sum = 0.0;
sum = sum + Math.exp(getScore(contextFeatureVector, entry.getValue()) + constant);
contextFeatureVector and entry.getValue() are the ArrayLists associated with a key.
[EDIT]
constant = 0.0001

private double getScore(List<Double> featureVector, List<Double> weightVector) throws NullPointerException
{
    double score = 0.0;
    int length = featureVector.size();
    for (int i = 0; i < length; i++) {
        score = score + (featureVector.get(i) * weightVector.get(i));
    }
    return score;
}
Both featureVector and weightVector look like this:
[-0.005554038592516575, 0.0048966974158881175, -0.05315976588195846, -0.030837804373964654, 0.014483064988148562, -0.018962129117649, -0.015221386014208877, 0.015825702365331477, -0.11363620479662287, 0.00802609847263844, -0.062106636476812194, 0.008108854471293185, -0.03193255218671684, 0.04949650992670292, -0.0545583154094599, -0.04873314092706468, 0.013534731656877033, 0.08433117163682455, 0.050310355477044114, -0.002420513353516017, -0.02708299928442614, -0.023489187394176294, -0.1277699782685597, -0.10071004855129333, 0.08649040730064464, -0.04940329664431305, -0.027481729446035053, -0.0571846057609884, -0.036738550618481455, -0.035608113682344365]
Thus the value returned from getScore does not get exceptionally large; it will be in the thousands at most.

It is thread-safe, but you can use it in a manner which is not thread-safe.
I suspect you haven't investigated the problem enough to conclude that there is a bug in a JDK library that has been used for more than a decade.

The data structure you use makes me believe there must be some bug in your code. Most likely you are fetching the list from the map and updating it in place:
map1.get(42).add(5.0);
Note that add(5.0) is not thread-safe, as it operates on an ordinary ArrayList. You either need a thread-safe list or the replace(K key, V oldValue, V newValue) method.
If you read carefully through the guarantees that ConcurrentHashMap gives, you can use it effectively.
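For example, on Java 8+ one way to keep the per-key list consistent is to let ConcurrentHashMap do the per-key locking via compute(). This is only a sketch of that idea, not the asker's code; the method name append and the sample key/value are made up:

import java.util.ArrayList;
import java.util.concurrent.ConcurrentHashMap;

class SafeListUpdate {
    private final ConcurrentHashMap<Integer, ArrayList<Double>> map1 = new ConcurrentHashMap<>();

    // compute() runs the remapping function atomically for the given key,
    // so no two threads mutate the same ArrayList concurrently.
    void append(int key, double value) {
        map1.compute(key, (k, list) -> {
            if (list == null) {
                list = new ArrayList<>();   // first value for this key
            }
            list.add(value);
            return list;
        });
    }
}

Usage would be something like new SafeListUpdate().append(42, 5.0) instead of mutating the ArrayList obtained from get().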

If you call Math.exp(...) on an input that is too large, you will get Infinity. That is the probable cause of your problem ... not some imagined problem with thread safety.
I suggest that you add some trace code to see what
getScore(contextFeatureVector, entry.getValue())
is returning when sum becomes Infinity. Beyond that, I don't think we'll be able to help without seeing more of your code.
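For example, a minimal sketch of such a trace (this helper is invented purely for illustration; score and constant mirror the names in the question):

// Hypothetical drop-in helper, for illustration only: wraps the Math.exp() term
// from the question and reports the offending score whenever the result overflows.
static double addExpTerm(double sum, double score, double constant) {
    double term = Math.exp(score + constant);
    if (Double.isInfinite(term)) {
        System.err.println("Math.exp overflowed: score = " + score);
    }
    return sum + term;
}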

The largest number that can be stored in a Java double is approximately exp(709). So if you pass anything larger than about 709 into exp(), you should expect the result to overflow to Infinity.
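A quick standalone check of that cutoff (illustrative values, not from the question):

public class ExpOverflowDemo {
    public static void main(String[] args) {
        System.out.println(Double.MAX_VALUE);  // ~1.8e308, roughly exp(709.78)
        System.out.println(Math.exp(709));     // ~8.2e307, still finite
        System.out.println(Math.exp(710));     // Infinity: no longer fits in a double
    }
}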

Related

Using LongAdder to calculate a max value for a statistical counter?

We collect some statistics using AtomicLongs. Some users are seeing contention on these and have suggested using LongAdder instead. However I see no way to calculate the maximum value as we are currently doing with the Atomic:
AtomicLong _current, _total, _max;
...
void add(long delta)
{
    long current = _current.addAndGet(delta);
    if (delta > 0)
    {
        _total.addAndGet(delta);
        long max = _max.get();
        while (current > max)
        {
            if (_max.compareAndSet(max, current))
                break;
            max = _max.get();
        }
    }
}
So I think we can replace _total easily enough with a LongAdder, but because we do _current.addAndGet(delta), _current will not work well as a LongAdder, nor can we do a CAS operation on the _max value.
Are there any good algorithms for collecting such statistics based on LongAdder or similar scalable lock free constructs?
Actually, while I'm asking: our stats typically update 6 to 10 AtomicLongs. If we are seeing contention anyway, could it possibly be better to just grab a lock and update 6 to 10 normal longs?
You don't want LongAdder but LongAccumulator here: new LongAccumulator(Math::max, Long.MIN_VALUE) does the right thing. LongAdder is essentially a special case of LongAccumulator.
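A minimal sketch of how the add() method above might look with these classes (class and field names are illustrative; the running level stays an AtomicLong because the code needs its updated value immediately):

import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAccumulator;
import java.util.concurrent.atomic.LongAdder;

class Stats {
    private final AtomicLong current = new AtomicLong();
    private final LongAdder total = new LongAdder();
    private final LongAccumulator max = new LongAccumulator(Math::max, Long.MIN_VALUE);

    void add(long delta) {
        long now = current.addAndGet(delta);   // still an AtomicLong: we need the new value
        if (delta > 0) {
            total.add(delta);                  // striped add, low contention
            max.accumulate(now);               // replaces the hand-rolled CAS loop
        }
    }

    long total() { return total.sum(); }
    long max()   { return max.get(); }
}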

Creating Map from large file

I have a very large file (10^8 lines) with counts of events as follows,
A 10
B 11
C 23
A 11
I need to accumulate the counts for each event, so that my map contains
A 21
B 11
C 23
My current approach:
Read the lines, maintain a map, and update the counts in the map as follows
void updateCount(Map<String, Long> countMap, String key, Long c) {
    if (countMap.containsKey(key)) {
        Long val = countMap.get(key);
        countMap.put(key, val + c);
    } else {
        countMap.put(key, c);
    }
}
Currently this is the slowest part of the code (it takes around 25 ms).
Note that the map is backed by MapDB, but I doubt that the updates are slow due to that (or are they?)
This is the MapDB config for the map:
DBMaker.newFileDB(dbFile).freeSpaceReclaimQ(3)
.mmapFileEnablePartial()
.transactionDisable()
.cacheLRUEnable()
.closeOnJvmShutdown();
Are there ways to speed this up?
EDIT:
The number of unique keys is of the order of the number of pages on Wikipedia. The data is actually page traffic data from here.
You might try
class Counter {
    long count;
}

void updateCount(Map<String, Counter> countMap, String key, int c) {
    Counter counter = countMap.get(key);
    if (counter == null) {
        counter = new Counter();
        countMap.put(key, counter);
        counter.count = c;
    } else {
        counter.count += c;
    }
}
This does not create many Long wrappers; it just allocates one Counter per key.
Note: do not create Longs. Above I made c an int to avoid the long/Long boxing pitfall.
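For context, a minimal sketch of how that Counter map could be driven while reading the file (the file name "events.txt", the plain HashMap, and the "<key> <count>" line format are my assumptions, not part of the original setup):

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class EventCounts {
    static class Counter {
        long count;
    }

    public static void main(String[] args) throws IOException {
        Map<String, Counter> countMap = new HashMap<>();
        // "events.txt" is a placeholder; each line is assumed to be "<key> <count>"
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("events.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\\s+");
                String key = parts[0];
                int c = Integer.parseInt(parts[1]);
                Counter counter = countMap.get(key);
                if (counter == null) {
                    counter = new Counter();
                    countMap.put(key, counter);
                }
                counter.count += c;   // accumulate without creating Long wrappers
            }
        }
        countMap.forEach((k, v) -> System.out.println(k + " " + v.count));
    }
}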
As a starting point, I'd suggest thinking about:
What is the yardstick by which you're saying that 25 ms is actually an unreasonable amount of time for the amount of data involved and for a generic map implementation? If you quantify that, it might help you work out whether there is anything wrong.
How much time is being spent re-hashing the map versus other operations (e.g. calculation of hash codes on each put)?
What do your "events", as you call them, consist of? How many unique events, and hence unique keys, are there? How are keys to the map being generated, and is there a more efficient way to do so? (In a standard hash map, for example, additional objects are created for each association, and the key objects themselves are stored, increasing the memory footprint.)
Depending on the answers to the previous, you could potentially roll a more efficient map structure yourself (see this example that you might be able to adapt). Essentially, you need to look specifically at what is taking the time (e.g. hash code calculation per put / cost of rehashing) and try and optimise that part.
If you are using a TreeMap, there are performance-tuning options, such as:
The number of entries in each node.
You could also use specific key and value serializers to speed up serialization and deserialization.
You could use Pump mode to build the tree, which is very fast. One caveat is that it is only useful when you are building a new map from scratch. You can find a full example here:
https://github.com/jankotek/MapDB/blob/master/src/test/java/examples/Huge_Insert.java

Create an almost unique identifier based on a given array of numbers

Given an array of numbers, I would like to create a numeric identifier that represents that combination as uniquely as possible.
For example:
int[] inputNumbers = { 543, 134, 998 };
int identifier = createIdentifier(inputNumbers);
System.out.println( identifier );
Output:
4532464234
-The returned number must be as unique as possible
-Ordering of the elements must influence the result
-The algorithm must always return the same result for the same input array
-The algorithm must be as fast as possible, because it is used a lot in 'for' loops
The purpose of this algorithm is to create a small value to be stored in a DB and to be easily comparable. It is nothing critical, so it's acceptable for some arrays of numbers to return the same value, but such cases must be rare.
Can you suggest a good way to accomplish this?
The standard (Java 7) implementation of Arrays.hashCode(int[]) has the required properties. It is implemented thus:
public static int hashCode(int a[]) {
    if (a == null)
        return 0;

    int result = 1;
    for (int element : a)
        result = 31 * result + element;

    return result;
}
As you can see, the implementation is fast, and the result depends on the order of the elements as well as the element values.
If there is a requirement that the hash values are the same across all Java platforms, I think you can rely on that being satisfied. The javadoc says that the method returns a value equal to what you would get by calling List<Integer>.hashCode() on an equivalent list, and the formula for that hash code is specified.
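A quick standalone illustration (with made-up inputs) of the order sensitivity and repeatability:

import java.util.Arrays;

public class ArrayHashDemo {
    public static void main(String[] args) {
        int[] a = { 543, 134, 998 };
        int[] b = { 998, 134, 543 };  // same elements, different order

        System.out.println(Arrays.hashCode(a));                               // stable value for this sequence
        System.out.println(Arrays.hashCode(b));                               // different value: order matters
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(a.clone())); // true: same input, same result
    }
}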
Have a look at Arrays.hashCode(int[]); it does exactly this.
documentation
What you're looking for is the array's hash code.
int hash = Arrays.hashCode(new int[]{1, 2, 3, 4});
See also the Java API
I also think you are looking for some kind of hash function.
I don't know how much you rely on point 3, "The algorithm must always return the same result for the same input array", but this depends on the JVM implementation.
So depending on your use case you might run into some trouble (the solution then would be to use an external hashing library).
For further information take a look at this SO question: Java, Object.hashCode() result constant across all JVMs/Systems?
EDIT
I just read that you want to store the values in a DB. In that case I would recommend using an external hashing library that is reliable and guaranteed to yield the same value every time it is invoked. Otherwise you would have to re-hash your whole DB every time you start your application to keep it in a consistent state.
EDIT2
Since you are using only plain ints, the hash value will be the same every time, as @Stephen C showed in his answer.

Array access optimization

I have a 10x10 array in Java. Some of the items in the array are not used, and I need to traverse all elements as part of a method. Which would be better:
Go through all elements with 2 nested for loops and check for null to avoid errors, e.g.
for (int y = 0; y < 10; y++) {
    for (int x = 0; x < 10; x++) {
        if (array[x][y] != null) {
            // perform task here
        }
    }
}
Or would it be better to keep a list of all the used addresses, say an ArrayList of points?
Or something different I haven't mentioned?
I look forward to any answers :)
Any solution you try needs to be tested in controlled conditions resembling the production conditions as closely as possible. Because of the nature of Java, you need to exercise your code a bit to get reliable performance stats, but I'm sure you know that already.
That said, there are several things you may try, which I've used to optimize my Java code with success (though not on the Android JVM).
for (int y = 0; y < 10; y++) {
    for (int x = 0; x < 10; x++) {
        if (array[x][y] != null) {
            // perform task here
        }
    }
}
should in any case be reworked into
for (int x = 0; x < 10; x++) {
    for (int y = 0; y < 10; y++) {
        if (array[x][y] != null) {
            // perform task here
        }
    }
}
Often you will get a performance improvement from caching the row reference. Let us assume the array is of type Foo[][]:
for (int x = 0; x < 10; x++) {
    final Foo[] row = array[x];
    for (int y = 0; y < 10; y++) {
        if (row[y] != null) {
            // perform task here
        }
    }
}
Using final with local variables was supposed to help the JVM optimize the code, but I think that modern JIT compilers can in many cases figure out on their own whether a variable is changed in the code or not. On the other hand, sometimes this may be more efficient, although it definitely takes us into the realm of micro-optimizations:
Foo[] row;
for (int x = 0; x < 10; x++) {
    row = array[x];
    for (int y = 0; y < 10; y++) {
        if (row[y] != null) {
            // perform task here
        }
    }
}
If you don't need to know the element's indices in order to perform the task on it, you can write this as
for (final Foo[] row : array) {
    for (final Foo elem : row) {
        if (elem != null) {
            // perform task here
        }
    }
}
Another thing you may try is to flatten the array and store the elements in a one-dimensional Foo[] array, ensuring maximum locality of reference. You have no inner loop to worry about, but you need to do some index arithmetic when referencing particular array elements (as opposed to looping over the whole array). Depending on how often you do it, it may or may not be beneficial.
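A rough sketch of that index arithmetic (the Foo stub and the 10x10 dimensions are placeholders for illustration):

// Sketch only: Foo stands in for whatever element type the grid holds.
class Foo {
    void task() { /* perform task here */ }
}

class FlatGrid {
    static final int WIDTH = 10, HEIGHT = 10;

    // The element that would have been array[x][y] lives at flat[x * HEIGHT + y].
    final Foo[] flat = new Foo[WIDTH * HEIGHT];

    Foo get(int x, int y) {
        return flat[x * HEIGHT + y];
    }

    void forEachNonNull() {
        // Iterating the whole grid needs no index arithmetic at all.
        for (Foo elem : flat) {
            if (elem != null) {
                elem.task();
            }
        }
    }
}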
Since most of the elements will be non-null, keeping them in a sparse structure is not beneficial for you, as you would lose locality of reference.
Another problem is the null test. The null test itself doesn't cost much, but the conditional statement following it does, as you get a branch in the code and lose time on wrong branch predictions. What you can do is use a "null object", on which the task can be performed but amounts to a no-op or something equally benign. Depending on the task you want to perform, it may or may not work for you.
Hope this helps.
You're better off using a List than an array, especially since you may not use the whole set of data. This has several advantages:
You're not checking for nulls and won't accidentally try to use a null object.
It is more memory efficient, in that you're not allocating memory which may not be used.
For a hundred elements, it's probably not worth using any of the classic sparse array
implementations. However, you don't say how sparse your array is, so profile it and see how much time you spend skipping null items compared to whatever processing you're doing.
(As Tom Hawtin - tackline mentions) when using an array of arrays, you should try to loop over the members of each array rather than looping over the same index of different arrays. Not all algorithms allow you to do that, though.
for (int x = 0; x < 10; ++x) {
    for (int y = 0; y < 10; ++y) {
        if (array[x][y] != null) {
            // perform task here
        }
    }
}
or
for (Foo[] row : array) {
    for (Foo item : row) {
        if (item != null) {
            // perform task here
        }
    }
}
You may also find it better to use a null object rather than testing for null, depending on the complexity of the operation you're performing. Don't use the polymorphic version of the pattern - a polymorphic dispatch will cost at least as much as a test and branch - but if you were summing properties, having an object that holds a zero is probably faster on many CPUs.
double sum = 0;
for (Foo[] row : array) {
    for (Foo item : row) {
        sum += item.value();
    }
}
As to what applies on Android, I'm not sure; again, you need to test and profile for any optimisation.
Holding an ArrayList of points would be over-engineering the problem. You have a multi-dimensional array; the best way to iterate over it is with two nested for loops. Unless you can change the representation of the data, that's roughly as efficient as it gets.
Just make sure you go in row order, not column order.
It depends on how sparse or dense your matrix is.
If it is sparse, you are better off storing a list of points; if it is dense, go with the 2D array. If it is somewhere in between, you can use a hybrid solution storing a list of sub-matrices.
This implementation detail should be hidden within a class anyway, so your code can convert between any of these representations at any time.
I would discourage you from settling on any of these solutions without profiling with your real application.
I agree that an array with a null test is the best approach unless you expect sparsely populated arrays.
Reasons for this:
1. It is more memory efficient for dense arrays (a list needs to store the index).
2. It is more computationally efficient for dense arrays (you need only compare the value you just retrieved to null, instead of having to also get the index from memory).
Also, a small suggestion: in Java especially you are often better off faking a multi-dimensional array with a 1D array where possible (for square/rectangular arrays in 2D). Bounds checking then only happens once per iteration, instead of twice. I'm not sure if this still applies in the Android VMs, but it has traditionally been an issue. Regardless, you can ignore it if the loop is not a bottleneck.
