Java: Enum vs. Int

When using flags in Java, I have seen two main approaches. One uses int values and a line of if-else statements. The other is to use enums and case-switch statements.
I was wondering if there was a difference in terms of memory usage and speed between using enums vs ints for flags?

Both ints and enums can be used with either switch or if-then-else statements, memory usage is minimal for both, and speed is similar - there is no significant difference between them on the points you raised.
However, the most important difference is type checking: enums are checked, ints are not.
Consider this code:
public class SomeClass {
    public static final int RED = 1;
    public static final int BLUE = 2;
    public static final int YELLOW = 3;
    public static final int GREEN = 3; // sic

    private int color;

    public void setColor(int color) {
        this.color = color;
    }
}
While many clients will use this properly,
new SomeClass().setColor(SomeClass.RED);
There is nothing stopping them from writing this:
new SomeClass().setColor(999);
There are three main problems with using the public static final pattern:
The problem occurs at runtime, not compile time, so it's going to be more expensive to fix and harder to trace
You have to write code to handle bad input - typically an if-then-else chain with a final else throw new IllegalArgumentException("Unknown color " + color); - again expensive
There is nothing preventing a collision of constants - the class above compiles even though YELLOW and GREEN share the value 3
If you use enums, you address all these problems:
Your code won't compile unless you pass valid values in
No need for any special "bad input" code - the compiler handles that for you
Enum values are unique
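To illustrate, here is a minimal enum-based sketch of the same class (the nested Color enum, the getter, and the main method are my additions for the example):

```java
public class SomeClass {
    // The valid colors form a closed set; collisions and stray ints are impossible.
    public enum Color { RED, BLUE, YELLOW, GREEN }

    private Color color;

    public void setColor(Color color) {
        this.color = color;
    }

    public Color getColor() {
        return color;
    }

    public static void main(String[] args) {
        SomeClass s = new SomeClass();
        s.setColor(Color.RED);  // fine
        // s.setColor(999);     // does not compile: an int is not a Color
        System.out.println(s.getColor()); // prints RED
    }
}
```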

Memory usage and speed aren't the considerations that matter. You would not be able to measure a difference either way.
I think enums should be preferred when they apply, because they emphasize that the chosen values belong together and form a closed set. Readability is improved a great deal, too: code using enums is more self-documenting than stray int values scattered throughout your code.
Prefer enums.

You may even use enums to replace bitwise-combined int flags like int flags = FLAG_1 | FLAG_2;
Instead you can use a typesafe EnumSet:
Set<FlagEnum> flags = EnumSet.of(FlagEnum.FLAG_1, FlagEnum.FLAG_2);
// then simply test with contains()
if(flags.contains(FlagEnum.FLAG_1)) ...
The documentation states that these classes are internally represented as bit vectors and that the implementation should perform well enough to replace int-based flags.
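A self-contained sketch of that replacement (FlagEnum and its constants are placeholder names):

```java
import java.util.EnumSet;
import java.util.Set;

public class FlagDemo {
    enum FlagEnum { FLAG_1, FLAG_2, FLAG_3 }

    public static void main(String[] args) {
        // typesafe equivalent of: int flags = FLAG_1 | FLAG_2;
        Set<FlagEnum> flags = EnumSet.of(FlagEnum.FLAG_1, FlagEnum.FLAG_2);

        System.out.println(flags.contains(FlagEnum.FLAG_1)); // true
        System.out.println(flags.contains(FlagEnum.FLAG_3)); // false

        flags.add(FlagEnum.FLAG_3);    // like flags |= FLAG_3
        flags.remove(FlagEnum.FLAG_1); // like flags &= ~FLAG_1
    }
}
```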

One of the reasons you will see some code using int flags instead of an enum is that Java did not have enums until Java 1.5.
So if you are looking at code that was originally written for an older version of Java, then the int pattern was the only option available.
There are a very small number of places where using int flags is still preferable in modern Java code, but in most cases you should prefer to use an enum, due to the type safety and expressiveness that they offer.
In terms of efficiency, it will depend on exactly how they are used. The JVM handles both types very efficiently; the int approach may be slightly more efficient in some use cases (because ints are handled as primitives rather than objects), while in other cases the enum may be more efficient (because it avoids boxing/unboxing).
You would be hard pressed to find a situation in which the efficiency difference would be in any way noticeable in a real world application, so you should make the decision based on the quality of the code (readability and safety), which should lead you to use an enum 99% of the time.

Bear in mind that enums are type-safe, and you can't mix values from one enum with another. That's a good reason to prefer enums over ints for flags.
On the other hand, if you use ints for your constants, you can mix values from unrelated constants, like this:
public static final int SUNDAY = 1;
public static final int JANUARY = 1;
...
// even though this works, it's a mistake:
int firstMonth = SUNDAY;
The memory usage of enums over ints is negligible, and the type safety enums provide makes the minimal overhead acceptable.

Yes, there is a difference. Under modern 64-bit Java, enum values are essentially pointers to objects: they either take 64 bits (without compressed oops) or cost additional CPU work to decode (with compressed oops).
My test showed about a 10% performance degradation for enums (1.8u25, AMD FX-4100): ~13k ns vs ~14k ns.
Test source below:
public class Test {
    public static enum Enum {
        ONE, TWO, THREE
    }

    static class CEnum {
        public Enum e;
    }

    static class CInt {
        public int i;
    }

    public static void main(String[] args) {
        CEnum[] enums = new CEnum[8192];
        CInt[] ints = new CInt[8192];
        for (int i = 0; i < 8192; i++) {
            enums[i] = new CEnum();
            ints[i] = new CInt();
            ints[i].i = 1 + (i % 3);
            if (i % 3 == 0) {
                enums[i].e = Enum.ONE;
            } else if (i % 3 == 1) {
                enums[i].e = Enum.TWO;
            } else {
                enums[i].e = Enum.THREE;
            }
        }

        int k = 0; // accumulate something to prevent the tests from being optimized out
        for (int n = 0; n < 10; n++) {
            k += test1(enums);
        }
        System.out.println();
        for (int n = 0; n < 10; n++) {
            k += test2(ints);
        }
        System.out.println(k);
    }

    private static int test1(CEnum[] enums) {
        int k = 0;
        for (int i = 0; i < 1000; i++) {
            k += test(enums); // warm-up
        }
        long t = System.nanoTime();
        k += test(enums);
        System.out.println((System.nanoTime() - t) / 100 + "ns");
        return k;
    }

    private static int test2(CInt[] ints) {
        int k = 0;
        for (int i = 0; i < 1000; i++) {
            k += test(ints); // warm-up
        }
        long t = System.nanoTime();
        k += test(ints);
        System.out.println((System.nanoTime() - t) / 100 + "ns");
        return k;
    }

    private static int test(CEnum[] enums) {
        int i1 = 0;
        int i2 = 0;
        int i3 = 0;
        for (int j = 100; j != 0; --j) {
            for (int i = 0; i < 8192; i++) {
                CEnum c = enums[i];
                if (c.e == Enum.ONE) {
                    i1++;
                } else if (c.e == Enum.TWO) {
                    i2++;
                } else {
                    i3++;
                }
            }
        }
        return i1 + i2 * 2 + i3 * 3;
    }

    private static int test(CInt[] ints) {
        int i1 = 0;
        int i2 = 0;
        int i3 = 0;
        for (int j = 100; j != 0; --j) {
            for (int i = 0; i < 8192; i++) {
                CInt c = ints[i];
                if (c.i == 1) {
                    i1++;
                } else if (c.i == 2) {
                    i2++;
                } else {
                    i3++;
                }
            }
        }
        return i1 + i2 * 2 + i3 * 3;
    }
}

Answer to your question: no - after a negligible time to load the enum class, the performance is the same.
As others have stated both types can be used in switch or if else statements. Also, as others have stated, you should favor Enums over int flags, because they were designed to replace that pattern and they provide added safety.
HOWEVER, there is a better pattern that you should consider: providing whatever value your switch/if statement was supposed to produce as a property of the enum itself.
Look at this link: http://docs.oracle.com/javase/1.5.0/docs/guide/language/enums.html Notice the pattern provided for giving the planets masses and radii. Providing the property in this manner ensures that you won't forget to cover a case when you add a constant.
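A sketch of that pattern (constant names and numbers are invented for illustration; the linked page uses planet masses and radii): each constant carries its value as a field, so there is no switch statement to forget to update when a new constant is added.

```java
// Each constant supplies its own property through the constructor.
enum Severity {
    LOW(1), MEDIUM(5), HIGH(10);

    private final int weight;

    Severity(int weight) {
        this.weight = weight;
    }

    public int weight() {
        return weight;
    }
}
```

Adding a new constant without a value is a compile error, so the "forgotten case" problem cannot occur.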

I like using enums when possible, but I had a situation where I was computing millions of file offsets for different file types, which I had defined in an enum, and I had to execute a switch statement tens of millions of times to compute the offset based on the enum type. I ran the following test:
import java.util.Random;

public class switchTest
{
    public enum MyEnum
    {
        Value1, Value2, Value3, Value4, Value5
    };

    public static void main(String[] args)
    {
        final String s1 = "Value1";
        final String s2 = "Value2";
        final String s3 = "Value3";
        final String s4 = "Value4";
        final String s5 = "Value5";
        String[] strings = new String[]
        {
            s1, s2, s3, s4, s5
        };
        Random r = new Random();
        long l = 0;

        long t1 = System.currentTimeMillis();
        for (int i = 0; i < 10_000_000; i++)
        {
            String s = strings[r.nextInt(5)];
            switch (s)
            {
                case s1:
                    // make sure the compiler can't optimize the switch out of
                    // existence by making the work each case does different
                    l = r.nextInt(5);
                    break;
                case s2:
                    l = r.nextInt(10);
                    break;
                case s3:
                    l = r.nextInt(15);
                    break;
                case s4:
                    l = r.nextInt(20);
                    break;
                case s5:
                    l = r.nextInt(25);
                    break;
            }
        }

        long t2 = System.currentTimeMillis();
        for (int i = 0; i < 10_000_000; i++)
        {
            MyEnum e = MyEnum.values()[r.nextInt(5)];
            switch (e)
            {
                case Value1:
                    l = r.nextInt(5);
                    break;
                case Value2:
                    l = r.nextInt(10);
                    break;
                case Value3:
                    l = r.nextInt(15);
                    break;
                case Value4:
                    l = r.nextInt(20);
                    break;
                case Value5:
                    l = r.nextInt(25);
                    break;
            }
        }

        long t3 = System.currentTimeMillis();
        for (int i = 0; i < 10_000_000; i++)
        {
            int xx = r.nextInt(5); // yields 0..4, so the cases below cover 0..4
            switch (xx)
            {
                case 0:
                    l = r.nextInt(5);
                    break;
                case 1:
                    l = r.nextInt(10);
                    break;
                case 2:
                    l = r.nextInt(15);
                    break;
                case 3:
                    l = r.nextInt(20);
                    break;
                case 4:
                    l = r.nextInt(25);
                    break;
            }
        }

        long t4 = System.currentTimeMillis();
        System.out.println("strings:" + (t2 - t1));
        System.out.println("enums  :" + (t3 - t2));
        System.out.println("ints   :" + (t4 - t3));
    }
}
and got the following results:
strings:442
enums :455
ints :362
So from this I decided that, for me, enums were efficient enough. When I decreased the loop counts from 10M to 1M, the strings and enums took about twice as long as the ints, which indicates there was some first-use overhead for strings and enums compared to ints.

Even though this question is old, I'd like to point out what you can't do with ints:
public interface AttributeProcessor {
    public void process(AttributeLexer attributeLexer, char c);
}

public enum ParseArrayEnd implements AttributeProcessor {
    State1 {
        public void process(AttributeLexer attributeLexer, char c) {
            .....
        }
    },
    State2 {
        public void process(AttributeLexer attributeLexer, char c) {
            .....
        }
    }
}
And what you can do is make a map where the expected value is the key and the enum is the value:
Map<String, AttributeProcessor> map
map.getOrDefault(key, ParseArrayEnd.State1).process(this, c);
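Since AttributeLexer isn't shown, here is a self-contained variant of the same dispatch idea with invented names: enum constants implement an interface, and a map routes an input key to the right constant, falling back to a default via getOrDefault.

```java
import java.util.Map;

public class EnumDispatch {
    interface Handler {
        String handle(char c);
    }

    // Each constant carries its own behavior, like the States above.
    enum State implements Handler {
        UPPER {
            public String handle(char c) { return String.valueOf(Character.toUpperCase(c)); }
        },
        LOWER {
            public String handle(char c) { return String.valueOf(Character.toLowerCase(c)); }
        }
    }

    public static void main(String[] args) {
        Map<String, Handler> handlers = Map.of("up", State.UPPER, "down", State.LOWER);
        // unknown keys fall back to a default state
        System.out.println(handlers.getOrDefault("up", State.LOWER).handle('a'));   // A
        System.out.println(handlers.getOrDefault("nope", State.LOWER).handle('A')); // a
    }
}
```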

Related

Why is x not initialized in this?

Why is x not initialized in the following?
public class rough {
    public static void main(String[] args) {
        int x;
        boolean found = false;
        for (int i = 0; i < 10; i++) {
            if (Math.random() < 0.5) {
                found = true;
                x = 10;
                break;
            }
        }
        if (!found)
            x = -1;
        System.out.println(x); // x isn't initialized here
    }
}
On average, the if inside the for loop will be true for half of the runs, initializing x. For the other half, found stays false, so the outer if initializes x. Therefore, I don't understand why the compiler is annoyed.
As the ultimate distillation (see successive simplifications below), consider
public static void main(String[] args) {
    int x;
    boolean found = false;
    if (!found)
        x = -1;
    System.out.println(x);
}
which also gives the error that x isn't initialized.
previous simplifications
Even more surprisingly, changing
if (Math.random() < 0.5) to if(true) also has the same problem.
In fact, investigating further, replacing the original for loop with either of these
for (int i = 0; i < 1; i++)
    x = 10;
for (; !found; ) {
    x = 10;
    break;
}
raises the same error. Only for(;;){... break;} and for(;true;){... break;} don't raise any initialization errors.
The compiler can't easily detect all branches lead to x being initialized, but you can fix that (and the code) pretty easily by assigning -1 to x to begin with. Something like
public static void main(String[] args) {
    int x = -1;
    for (int i = 0; i < 10; i++) {
        if (Math.random() < 0.5) {
            x = 10;
            break;
        }
    }
    System.out.println(x);
}
And now you don't need found (so I removed it too).
(Writing this up as a separate answer since I think it'll benefit from being taken out of comments).
This is the language-lawyer's answer.
The language specification requires initialization of variables before they are used.
The rules include:
The variable must be given a value on all possible paths through the code. The specification refers to this as 'definite assignment'.
The compiler does not consider the values of expressions in this analysis. See Example 16.2 for this.
The second rule explains why even in cases that are 'obvious' to us, the compiler can't use that knowledge. Even if the compiler-writer cared to do a deeper analysis, adherence to the Java specification forbids it.
If the next question is 'but why?' then I'd have to guess, but the point of a standard is to get consistent behaviour. You don't want one compiler accepting as legal Java something that another compiler rejects.
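A minimal illustration of those two rules (my example, not taken from the spec): the constant expression true satisfies definite assignment, while a variable that merely happens to be true does not, because the analysis ignores runtime values.

```java
public class DefiniteAssignment {
    static int assignedViaConstantTrue() {
        int x;
        if (true) { // constant expression: the compiler knows this branch runs
            x = 1;
        }
        return x;   // compiles: x is definitely assigned
    }

    public static void main(String[] args) {
        System.out.println(assignedViaConstantTrue()); // prints 1

        // By contrast, the same shape with a variable does not compile,
        // because definite-assignment analysis ignores the variable's value:
        // boolean b = true;
        // int y;
        // if (b) { y = 1; }
        // System.out.println(y); // error: y might not have been initialized
    }
}
```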

Java converting a string binary to integer without using maths pow [duplicate]

This question already has answers here:
How to convert a Binary String to a base 10 integer in Java
(12 answers)
Closed 4 years ago.
Main:
public class Main{
    public static void main(String[] args){
        System.out.println(Convert.BtoI("101101010101"));
        System.out.println(Convert.BtoI("1011110"));
    }
}
Sub:
public class Convert{
    public static int BtoI(String value){
        int no = 0;
        for(int i = value.length()-1; i >= 0; i--){
            if(value.charAt(i) == '1')
                no += (???);
            ++;
        }
        return no;
    }
}
How can I convert a binary string to an integer without using Math.pow, just using + - * and /? Should I implement another for loop, something like for (int j = 1; j <= example; j *= 2)? I am quite confused and want to learn how to do this without Math.pow or similar methods.
Walking the string from the last character to the first, you can add the current power of two whenever you see a '1'; summing these gives you the base-ten value:
public static int binaryToInt(String binaryNum) {
    int pow = 1;
    int total = 0;
    for (int i = binaryNum.length(); i > 0; i--) {
        if (binaryNum.charAt(i - 1) == '1') {
            total += pow;
        }
        pow *= 2;
    }
    return total;
}
But I think this is way more elegant:
String binaryNum = "1001";
int decimalValue = Integer.parseInt(binaryNum, 2);
How about Integer.parseInt(yourString, 2)? The 2 means you are parsing a base-2 number.
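For the strings in the question, the library call can serve as a reference to check any hand-rolled version against:

```java
public class ParseDemo {
    public static void main(String[] args) {
        // Integer.parseInt with radix 2 parses a binary string
        System.out.println(Integer.parseInt("101101010101", 2)); // 2901
        System.out.println(Integer.parseInt("1011110", 2));      // 94
    }
}
```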
Starting from your code + some vital style changes:
public class Convert {
    public static int binaryToInt(String value) {
        int no = 0;
        for (int i = 0; i < value.length(); i++) {
            no = no * 2; // or no *= 2;
            if (value.charAt(i) == '1') {
                no = no + 1; // or no++
            }
        }
        return no;
    }
}
The meaning of the code I added should be self-evident. If not, I would encourage you to work out what it does as an exercise. (If you are not sure, use a pencil and paper to "hand execute" it ...)
The style changes (roughly in order of importance) are:
Don't start a method name with an uppercase character.
Use camel-case.
Avoid cute / obscure / unnecessary abbreviations. The next guy reading your code should not have to use a magic decoder ring to understand your method names.
Put spaces around operators.
4 character indentation.
Put spaces between ) and { and before / after keywords.
Use curly brackets around the "then" and "else" parts of an if statement, even if they are not strictly necessary. (This is to avoid a problem where indentation mistakes cause you to misread the code.)

Coding pattern for random percentage branching?

So let's say we have a code block that we want to execute 70% of times and another one 30% of times.
if(Math.random() < 0.7)
70percentmethod();
else
30percentmethod();
Simple enough. But what if we want it to be easily expandable to say, 30%/60%/10% etc.?
Here it would require adding to and changing all the if statements on every change, which isn't exactly great to use - slow and mistake-inducing.
So far I've found large switches to be decently useful for this use case, for example:
switch(rand(0, 10)){
    case 0:
    case 1:
    case 2:
    case 3:
    case 4:
    case 5:
    case 6:
    case 7: 70percentmethod(); break;
    case 8:
    case 9:
    case 10: 30percentmethod(); break;
}
Which can be very easily changed to:
switch(rand(0, 10)){
    case 0: 10percentmethod(); break;
    case 1:
    case 2:
    case 3:
    case 4:
    case 5:
    case 6:
    case 7: 60percentmethod(); break;
    case 8:
    case 9:
    case 10: 30percentmethod(); break;
}
But these have their drawbacks as well, being cumbersome and split onto a predetermined amount of divisions.
Something ideal would be based on a "frequency number" system I guess, like so:
(1,a),(1,b),(2,c) -> 25% a, 25% b, 50% c
then if you added another one:
(1,a),(1,b),(2,c),(6,d) -> 10% a, 10% b, 20% c, 60% d
So simply adding up the numbers, making the sum equal 100% and then split that.
I suppose it wouldn't be that much trouble to make a handler for it with a customized hashmap or something, but I'm wondering if there's some established way/pattern or lambda for it before I go all spaghetti on this.
EDIT: See edit at end for more elegant solution. I'll leave this in though.
You can use a NavigableMap to store these methods mapped to their cumulative percentages.
NavigableMap<Double, Runnable> runnables = new TreeMap<>();
runnables.put(0.3, this::thirtyPercentMethod);
runnables.put(1.0, this::seventyPercentMethod);

public static void runRandomly(Map<Double, Runnable> runnables) {
    double percentage = Math.random();
    for (Map.Entry<Double, Runnable> entry : runnables.entrySet()) {
        if (percentage < entry.getKey()) {
            entry.getValue().run();
            return; // make sure you only call one method
        }
    }
    throw new RuntimeException("map not filled properly for " + percentage);
}

// or, because I'm still practicing streams by using them for everything
public static void runRandomly(Map<Double, Runnable> runnables) {
    double percentage = Math.random();
    runnables.entrySet().stream()
        .filter(e -> percentage < e.getKey())
        .findFirst()
        .orElseThrow(() -> new RuntimeException("map not filled properly for " + percentage))
        .getValue().run();
}
The NavigableMap is sorted by its keys (a HashMap, by contrast, gives no ordering guarantees), so you get the entries ordered by their percentages. This is relevant because if you have two items (3,r1),(7,r2), they result in the entries r1 = 0.3 and r2 = 1.0, and they need to be evaluated in that order (if they were evaluated in reverse order, the result would always be r2).
As for the splitting, it should go something like this:
With a Tuple class like this
static class Pair<X, Y>
{
    public final X first;
    public final Y second;

    public Pair(X f, Y s)
    {
        first = f;
        second = s;
    }
}
You can create a map like this
// the parameter contains the (1,m1), (1,m2), (3,m3) pairs
private static Map<Double, Runnable> splitToPercentageMap(Collection<Pair<Integer, Runnable>> runnables)
{
    // this adds all Runnables to lists keyed by their int value;
    // overall those lists are sorted by that int (so least probable first)
    double total = 0;
    Map<Integer, List<Runnable>> byNumber = new TreeMap<>();
    for (Pair<Integer, Runnable> e : runnables)
    {
        total += e.first;
        List<Runnable> list = byNumber.getOrDefault(e.first, new ArrayList<>());
        list.add(e.second);
        byNumber.put(e.first, list);
    }

    Map<Double, Runnable> targetList = new TreeMap<>();
    double current = 0;
    for (Map.Entry<Integer, List<Runnable>> e : byNumber.entrySet())
    {
        for (Runnable r : e.getValue())
        {
            double percentage = (double) e.getKey() / total;
            current += percentage;
            targetList.put(current, r);
        }
    }
    return targetList;
}
And all of this added to a class
class RandomRunner {
    private List<Pair<Integer, Runnable>> runnables = new ArrayList<>();

    public void add(int value, Runnable toRun) {
        runnables.add(new Pair<>(value, toRun));
    }

    public void remove(Runnable toRemove) {
        for (Iterator<Pair<Integer, Runnable>> r = runnables.iterator(); r.hasNext(); ) {
            if (toRemove == r.next().second) {
                r.remove();
                break;
            }
        }
    }

    public void runRandomly() {
        // split list, use code from above
    }
}
EDIT :
Actually, the above is what you get if you get an idea stuck in your head and don't question it properly.
Keeping the RandomRunner class interface, this is much easier:
class RandomRunner {
    List<Runnable> runnables = new ArrayList<>();

    public void add(int value, Runnable toRun) {
        // add the methods as often as their weight indicates.
        // this should be fine for smaller numbers;
        // if you get lists with millions of entries, optimize
        for (int i = 0; i < value; i++) {
            runnables.add(toRun);
        }
    }

    public void remove(Runnable r) {
        Iterator<Runnable> myRunnables = runnables.iterator();
        while (myRunnables.hasNext()) {
            if (myRunnables.next() == r) {
                myRunnables.remove();
            }
        }
    }

    public void runRandomly() {
        if (runnables.isEmpty()) return;
        // roll an n-sided die
        int runIndex = ThreadLocalRandom.current().nextInt(0, runnables.size());
        runnables.get(runIndex).run();
    }
}
All these answers seem quite complicated, so I'll just post the keep-it-simple alternative:
double rnd = Math.random();
if ((rnd -= 0.6) < 0)
    60percentmethod();
else if ((rnd -= 0.3) < 0)
    30percentmethod();
else
    10percentmethod();
Doesn't need changing other lines, and one can quite easily see what happens without digging into auxiliary classes. A small downside is that it doesn't enforce that the percentages sum to 100%.
I am not sure if there is a common name for this, but I think I learned it as the "wheel of fortune" back in university.
It basically works just as you described: it receives a list of values and "frequency numbers", and one is chosen according to the weighted probabilities.
list = (1,a),(1,b),(2,c),(6,d)
total = list.sum()
rnd = random(0, total)
sum = 0
for i from 0 to list.size():
    sum += list[i]
    if sum >= rnd:
        return list[i]
return list.last()
The list can be a function parameter if you want to generalize this.
This also works with floating point numbers and the numbers don't have to be normalized. If you normalize (to sum up to 1 for example), you can skip the list.sum() part.
EDIT:
Due to demand, here is an actual compiling Java implementation and usage example:
import java.util.ArrayList;
import java.util.Random;

public class RandomWheel<T>
{
    private static final class RandomWheelSection<T>
    {
        public double weight;
        public T value;

        public RandomWheelSection(double weight, T value)
        {
            this.weight = weight;
            this.value = value;
        }
    }

    private ArrayList<RandomWheelSection<T>> sections = new ArrayList<>();
    private double totalWeight = 0;
    private Random random = new Random();

    public void addWheelSection(double weight, T value)
    {
        sections.add(new RandomWheelSection<T>(weight, value));
        totalWeight += weight;
    }

    public T draw()
    {
        double rnd = totalWeight * random.nextDouble();
        double sum = 0;
        for (int i = 0; i < sections.size(); i++)
        {
            sum += sections.get(i).weight;
            if (sum >= rnd)
                return sections.get(i).value;
        }
        return sections.get(sections.size() - 1).value;
    }

    public static void main(String[] args)
    {
        RandomWheel<String> wheel = new RandomWheel<String>();
        wheel.addWheelSection(1, "a");
        wheel.addWheelSection(1, "b");
        wheel.addWheelSection(2, "c");
        wheel.addWheelSection(6, "d");

        for (int i = 0; i < 100; i++)
            System.out.print(wheel.draw());
    }
}
While the selected answer works, it is unfortunately asymptotically slow for your use case. Instead of doing this, you could use something called Alias Sampling. Alias sampling (or alias method) is a technique used for selection of elements with a weighted distribution. If the weights of choosing those elements doesn't change you can do selection in O(1) time!. If this isn't the case, you can still get amortized O(1) time if the ratio between the number of selections you make and the changes you make to the alias table (changing the weights) is high. The current selected answer suggests an O(N) algorithm, the next best thing is O(log(N)) given sorted probabilities and binary search, but nothing is going to beat the O(1) time I suggested.
This site provides a good overview of Alias method that is mostly language agnostic. Essentially you create a table where each entry represents the outcome of two probabilities. There is a single threshold for each entry at the table, below the threshold you get one value, above you get another value. You spread larger probabilities across multiple table values in order to create a probability graph with an area of one for all probabilities combined.
Say you have the probabilities A, B, C, and D with the values 0.1, 0.1, 0.1, and 0.7 respectively. The alias method spreads the excess of the 0.7 across the others. One index corresponds to each probability, and with four outcomes each index holds a total probability of 0.25: A, B, and C each keep their own 0.1 plus 0.15 borrowed from D, while D's own index holds the remaining 0.25. Within each index you normalize, so in A's index you end up with a 0.4 chance of getting A and a 0.6 chance of getting D (0.1/(0.1 + 0.15) and 0.15/(0.1 + 0.15) respectively), likewise for B's and C's indices, and a 100% chance of getting D in D's index (0.25/0.25 is 1).
Given an unbiased uniform PRNG (Math.random()) for indexing, you get an equal probability of choosing each index, but you also do a coin flip per index, which provides the weighted probability. You have a 25% chance of landing on the A or D slot, but within A's slot you only have a 40% chance of picking A and a 60% chance of picking D. 0.40 * 0.25 = 0.1, our original probability, and if you add up all of D's probabilities strewn throughout the other indices, you get 0.70 again.
So to do random selection, you need only to generate a random index from 0 to N, then do a coin flip, no matter how many items you add, this is very fast and constant cost. Making an alias table doesn't take that many lines of code either, my python version takes 80 lines including import statements and line breaks, and the version presented in the Pandas article is similarly sized (and it's C++)
For your Java implementation, one could map between probabilities and the indices of a list holding the functions to execute; alternatively you could use function objects (functors) with a method that you pass parameters into. A sketch (YourFunctionObject, listOfProbabilities, and parameters are stand-ins for your own types):
List<YourFunctionObject> functionList = new ArrayList<>();
// add functions ...
AliasSampler aliasSampler = new AliasSampler(listOfProbabilities);
// somewhere later, with some type T and some parameter values:
int index = aliasSampler.sampleIndex();
T result = functionList.get(index).apply(parameters);
EDIT:
I've created a Java version of the AliasSampler, using classes; it provides the sampleIndex method and can be used as above.
import java.util.ArrayList;
import java.util.Collections;
import java.util.Random;

public class AliasSampler {
    private ArrayList<Double> binaryProbabilityArray;
    private ArrayList<Integer> aliasIndexList;

    AliasSampler(ArrayList<Double> probabilities) {
        // the input probabilities must sum to 1 (within floating-point tolerance)
        double sum = probabilities.stream().mapToDouble(Double::doubleValue).sum();
        assert Math.abs(sum - 1.0) < 1e-9;
        int n = probabilities.size();
        // probabilityArray is the list of incoming probabilities scaled by the number of
        // probabilities. This allows us to figure out which probabilities need to be spread
        // to others since they are too large, ie [0.1, 0.1, 0.1, 0.7] -> [0.4, 0.4, 0.4, 2.8]
        ArrayList<Double> probabilityArray = new ArrayList<>();
        for (Double probability : probabilities) {
            probabilityArray.add(probability * n);
        }
        binaryProbabilityArray = new ArrayList<Double>(Collections.nCopies(n, 0.0));
        aliasIndexList = new ArrayList<Integer>(Collections.nCopies(n, 0));
        ArrayList<Integer> lessThanOneIndexList = new ArrayList<Integer>();
        ArrayList<Integer> greaterThanOneIndexList = new ArrayList<Integer>();
        for (int index = 0; index < probabilityArray.size(); index++) {
            double probability = probabilityArray.get(index);
            if (probability < 1.0) {
                lessThanOneIndexList.add(index);
            } else {
                greaterThanOneIndexList.add(index);
            }
        }
        // while we still have indices to check in each list, we spread the probability of the
        // larger elements: in our example this takes the greater-than-one element (2.8) and
        // repeatedly removes 0.6, spreading it to other indices, so (((2.8 - 0.6) - 0.6) - 0.6)
        // ends at 1.0, and the rest become 0.4 + 0.6 = 1.0 as well.
        while (lessThanOneIndexList.size() != 0 && greaterThanOneIndexList.size() != 0) {
            //https://stackoverflow.com/questions/16987727/removing-last-object-of-arraylist-in-java
            // last element removal is equivalent to pop, java does this in constant time
            int lessThanOneIndex = lessThanOneIndexList.remove(lessThanOneIndexList.size() - 1);
            int greaterThanOneIndex = greaterThanOneIndexList.remove(greaterThanOneIndexList.size() - 1);
            double probabilityLessThanOne = probabilityArray.get(lessThanOneIndex);
            binaryProbabilityArray.set(lessThanOneIndex, probabilityLessThanOne);
            aliasIndexList.set(lessThanOneIndex, greaterThanOneIndex);
            probabilityArray.set(greaterThanOneIndex,
                    probabilityArray.get(greaterThanOneIndex) + probabilityLessThanOne - 1);
            if (probabilityArray.get(greaterThanOneIndex) < 1) {
                lessThanOneIndexList.add(greaterThanOneIndex);
            } else {
                greaterThanOneIndexList.add(greaterThanOneIndex);
            }
        }
        // if there are any probabilities left in either index list, they can't be spread across
        // the other indices, so they are set with probability 1.0. They still have the
        // probabilities they should at this step; it works out mathematically.
        while (greaterThanOneIndexList.size() != 0) {
            int greaterThanOneIndex = greaterThanOneIndexList.remove(greaterThanOneIndexList.size() - 1);
            binaryProbabilityArray.set(greaterThanOneIndex, 1.0);
        }
        while (lessThanOneIndexList.size() != 0) {
            int lessThanOneIndex = lessThanOneIndexList.remove(lessThanOneIndexList.size() - 1);
            binaryProbabilityArray.set(lessThanOneIndex, 1.0);
        }
    }

    public int sampleIndex() {
        int index = new Random().nextInt(binaryProbabilityArray.size());
        double r = Math.random();
        if (r < binaryProbabilityArray.get(index)) {
            return index;
        } else {
            return aliasIndexList.get(index);
        }
    }
}
You could compute the cumulative probability for each class, pick a random number from [0; 1) and see where that number falls.
class WeightedRandomPicker {
    private static Random random = new Random();

    public static int choose(double[] probabilities) {
        double randomVal = random.nextDouble();
        double cumulativeProbability = 0;
        for (int i = 0; i < probabilities.length; ++i) {
            cumulativeProbability += probabilities[i];
            if (randomVal < cumulativeProbability) {
                return i;
            }
        }
        return probabilities.length - 1; // to account for numerical errors
    }

    public static void main(String[] args) {
        double[] probabilities = new double[]{0.1, 0.1, 0.2, 0.6}; // the final value is optional
        for (int i = 0; i < 20; ++i) {
            System.out.printf("%d\n", choose(probabilities));
        }
    }
}
The following is a bit like @daniu's answer but makes use of the methods provided by TreeMap:
private final NavigableMap<Double, Runnable> map = new TreeMap<>();
{
    map.put(0.3d, this::branch30Percent);
    map.put(1.0d, this::branch70Percent);
}

private final SecureRandom random = new SecureRandom();

private void branch30Percent() {}

private void branch70Percent() {}

public void runRandomly() {
    final Runnable value = map.tailMap(random.nextDouble(), true).firstEntry().getValue();
    value.run();
}
This way there is no need to iterate the whole map until the matching entry is found; instead, TreeMap's ability to locate an entry whose key compares in a specific way to another key is used. This will only make a measurable difference if the number of entries in the map is large, but it does save a few lines of code.
I'd do that something like this:
class RandomMethod {
    private final Runnable method;
    private final int probability;

    RandomMethod(Runnable method, int probability) {
        this.method = method;
        this.probability = probability;
    }

    public int getProbability() { return probability; }
    public void run() { method.run(); }
}

class MethodChooser {
    private final List<RandomMethod> methods;
    private final int total;

    MethodChooser(final List<RandomMethod> methods) {
        this.methods = methods;
        this.total = methods.stream().collect(
            Collectors.summingInt(RandomMethod::getProbability)
        );
    }

    public void chooseMethod() {
        final Random random = new Random();
        final int choice = random.nextInt(total);
        int count = 0;
        for (final RandomMethod method : methods) {
            count += method.getProbability();
            if (choice < count) {
                method.run();
                return;
            }
        }
    }
}
Sample usage:
MethodChooser chooser = new MethodChooser(Arrays.asList(
    new RandomMethod(Blah::aaa, 1),
    new RandomMethod(Blah::bbb, 3),
    new RandomMethod(Blah::ccc, 1)
));
IntStream.range(0, 100).forEach(
    i -> chooser.chooseMethod()
);

Ways of encapsulating choice of Java primitive; Avoiding "magic" primitives

I'm writing a program which creates a large number of large arrays to store data. All of this data has to be held in RAM, so I'm avoiding objects and currently using shorts to save space. These shorts serve as ID numbers which can be put into a lookup class to get the corresponding object on demand. I have recently questioned whether I'll need the whole 2 bytes of a short, so I'm now wondering if there's any way to define the data type being stored in one place in my code, so that I can change it easily without having to hunt down every cast, return type, etc. that is currently set to short.
If I were willing to use objects I could easily just do
class MySmallNumber extends Short {} // (not actually possible: java.lang.Short is final)
and change the parent class if necessary.
If this were C/C++, i could use
#define small short
for the effect I'm looking for.
I'm searching for a way to do something like this in java that won't require storing 64-bit object references in my arrays. Any help is greatly appreciated. Right now I'm looking at a really messy IDE replace all in order to do this.
You can encapsulate your array in a custom class. It shouldn't add considerable space overhead, because you are working with large arrays.
Everywhere else in your code you can use long; when you pass these longs to your custom array class, it can convert them to whatever type it uses internally.
That way you only ever have to make changes in this one class.
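As a sketch of that idea (the class name and method signatures are my own, not from the answer): the storage type is mentioned in exactly one place, while callers pass and receive ordinary ints.

```java
// Wraps the backing primitive array so the storage type (currently short)
// appears in exactly one class. Switching to byte storage later means
// editing only this class; callers are unaffected.
class IdArray {
    private final short[] ids; // the only place the storage type appears

    IdArray(int size) {
        ids = new short[size];
    }

    void set(int index, int id) {
        ids[index] = (short) id; // narrowing conversion happens in one place
    }

    int get(int index) {
        return ids[index]; // widened back to int for callers
    }

    int size() {
        return ids.length;
    }
}
```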
I would suggest factoring out all code that depends on the type of your ID values into a separate class. Let that class handle all the operations (including lookup) that depend on whether the ID values are short, byte, or something else. You can pass individual values in and out as short or even int values, even if internally they are converted to byte. (This is, for instance, how java.io.DataOutputStream.writeByte(int) was written—it takes an int argument and treats it as a byte value.)
Not quite sure what you are after here, but this may be of interest:
import java.util.Arrays;
interface Index {
short getIndex(int i);
void setIndex(int i, short value);
int size();
}
class ShortIndexImpl implements Index {
ShortIndexImpl(int n) {
indices = new short[n];
}
@Override public short getIndex(int i) {
return indices[i];
}
@Override public void setIndex(int i, short value) {
indices[i] = value;
}
@Override public int size() {
return indices.length;
}
final short[] indices;
}
class TenBitIndexImpl implements Index {
TenBitIndexImpl(int n) {
size = n;
indices = new int[(n + 2) / 3]; // three 10-bit values packed per int
}
@Override public short getIndex(int i) {
int index = i / 3;
int remainder = i % 3;
int word = indices[index];
return (short) (0x3ff & (word >> shifts[remainder]));
}
@Override public void setIndex(int i, short value) {
int index = i / 3;
int remainder = i % 3;
int word = indices[index] & ~masks[remainder];
// mask to 10 bits so stray high bits cannot clobber neighbouring values
int shiftedValue = (value & 0x3ff) << shifts[remainder];
word |= shiftedValue;
indices[index] = word;
}
@Override public int size() {
return size; // the number of stored values, not the length of the backing array
}
final int masks[] = new int[] { 0x3ff00000, 0xffc00, 0x3ff };
final int shifts[] = new int[] { 20, 10, 0 };
final int[] indices;
final int size;
}
public class Main {
static void test(Index index) {
for (int i = 0; i < values.length; i++)
index.setIndex(i, values[i]);
for (int i = 0; i < values.length; i++) {
System.out.println(values[i] + " " + index.getIndex(i));
if (index.getIndex(i) != values[i])
System.out.println("expected " + values[i] + " but got " + index.getIndex(i));
}
}
public static void main(String[] args) {
Index index = new ShortIndexImpl(values.length);
test(index);
index = new TenBitIndexImpl(values.length);
test(index);
System.out.println("indices");
for (int i = 0; i < ((TenBitIndexImpl) index).indices.length; i++)
System.out.println(((TenBitIndexImpl) index).indices[i]);
}
static short[] values = new short[] { 1, 2, 3, 4, 5, 6 };
}

Performance intensive string splitting and manipulation in java

What is the most efficient way to split a string by a very simple separator?
Some background:
I am porting a function I wrote in C, with a bunch of pointer arithmetic, to Java, and it is incredibly slow (after some optimisation, still about 5× slower).
Profiling shows that a lot of that overhead is in String.split.
The function in question takes a host name or ip address and makes it generic:
123.123.123.123->*.123.123.123
a.b.c.example.com->*.example.com
This can be run over several million items on a regular basis, so performance is an issue.
Edit: the rules for converting are thus:
If it's an ip address, replace the first part
Otherwise, find the main domain name, and make the preceding part generic.
foo.bar.com-> *.bar.com
foo.bar.co.uk-> *.bar.co.uk
I have now rewritten using lastIndexOf and substring to work myself in from the back and the performance has improved by leaps and bounds.
I'll leave the question open for another 24 hours before settling on the best answer for future reference
Here's what I've come up with now (the IP part is a trivial check before calling this function):
private static String hostConvert(String in) {
final String [] subs = { "ac", "co", "com", "or", "org", "ne", "net", "ad", "gov", "ed" };
int dotPos = in.lastIndexOf('.');
if(dotPos == -1)
return in;
int prevDotPos = in.lastIndexOf('.', dotPos-1);
if(prevDotPos == -1)
return in;
CharSequence cs = in.subSequence(prevDotPos+1, dotPos);
for(String cur : subs) {
if(cur.contentEquals(cs)) {
int start = in.lastIndexOf('.', prevDotPos-1);
if(start == -1 || start == 0)
return in;
return "*" + in.substring(start);
}
}
return "*" + in.substring(prevDotPos);
}
If there's any space for further improvement it would be good to hear.
Something like this is about as fast as you can make it:
static String starOutFirst(String s) {
final int K = s.indexOf('.');
return "*" + s.substring(K);
}
static String starOutButLastTwo(String s) {
final int K = s.lastIndexOf('.', s.lastIndexOf('.') - 1);
return "*" + s.substring(K);
}
Then you can do:
System.out.println(starOutFirst("123.123.123.123"));
// prints "*.123.123.123"
System.out.println(starOutButLastTwo("a.b.c.example.com"));
// prints "*.example.com"
You may need to use regex to see which of the two methods is applicable for any given string.
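One way to make that dispatch (my own sketch; the answer does not specify the test) is a dotted-quad regex check in front of the two methods. The class name and the `mask` helper are illustrative, not from the answer:

```java
class HostMasker {
    // replace everything up to the first '.' with '*' (for IP addresses)
    static String starOutFirst(String s) {
        return "*" + s.substring(s.indexOf('.'));
    }

    // keep only the last two labels, replacing the rest with '*'
    static String starOutButLastTwo(String s) {
        return "*" + s.substring(s.lastIndexOf('.', s.lastIndexOf('.') - 1));
    }

    // dispatch: treat dotted quads of 1-3 digits as IP addresses
    static String mask(String s) {
        return s.matches("\\d{1,3}(\\.\\d{1,3}){3}")
                ? starOutFirst(s)
                : starOutButLastTwo(s);
    }
}
```

Note that in a hot loop the regex itself has a cost; for millions of inputs it would be worth pre-compiling the Pattern or replacing the check with a hand-rolled digit scan.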
I'd try using .indexOf('.') and .substring(index).
You didn't elaborate on the exact pattern you wanted to match but if you can avoid split(), it should cut down on the number of new strings it allocates (1 instead of several).
It's unclear from your question exactly what the code is supposed to do. Does it find the first '.' and replace everything up to it with a '*'? Or is there some fancier logic behind it? Maybe everything up to the nth '.' gets replaced by '*'?
If you're trying to find an instance of a particular string, use something like the Boyer-Moore algorithm. It should be able to find the match for you and you can then replace what you want.
Keep in mind that String in Java is immutable. It might be faster to change the sequence in-place. Check out other CharSequence implementations to see what you can do, e.g. StringBuffer and CharBuffer. If concurrency is not needed, StringBuilder might be an option.
By using a mutable CharSequence instead of the methods on String, you avoid a bunch of object churn. If all you're doing is replacing some slice of the underlying character array with a shorter array (i.e. {'*'}), this is likely to yield a speedup since such array copies are fairly optimized. You'll still be doing an array copy at the end of the day, but it may be faster than new String allocations.
UPDATE
All the above is pretty much hogwash. Sure, maybe you could implement your own CharSequence that gives you better slicing and lazily resizes the array (i.e. doesn't actually truncate anything until it absolutely must), returning Strings based on offsets and whatnot. But StringBuffer and StringBuilder, at least used directly, do not perform as well as the solution poly posted. CharBuffer is entirely inapplicable; I didn't realize earlier that it is an NIO class: it's meant for other things entirely.
There is one interesting thing about poly's code, which I wonder whether they knew before posting it: changing the "*" on the final lines of the methods to a '*' results in a significant slowdown.
Nevertheless, here is my benchmark. I found one small optimization: declaring the '.' and "*" expressions as constants adds a bit of a speedup as well as using a locally-scoped StringBuilder instead of the binary infix string concatenation operator.
I know the gc() is at best advisory and at worst a no-op, but I figured adding it with a bit of sleep time might let the VM do some cleanup after creating 1M Strings. Someone may correct me if this is totally naïve.
Simple Benchmark
import java.util.ArrayList;
import java.util.Arrays;
public class StringSplitters {
private static final String PREFIX = "*";
private static final char DOT = '.';
public static String starOutFirst(String s) {
final int k = s.indexOf(DOT);
return PREFIX + s.substring(k);
}
public static String starOutFirstSb(String s) {
StringBuilder sb = new StringBuilder();
final int k = s.indexOf(DOT);
return sb.append(PREFIX).append(s.substring(k)).toString();
}
public static void main(String[] args) throws InterruptedException {
double[] firstRates = new double[10];
double[] firstSbRates = new double[10];
double firstAvg = 0;
double firstSbAvg = 0;
double firstMin = Double.POSITIVE_INFINITY;
double firstMax = Double.NEGATIVE_INFINITY;
double firstSbMin = Double.POSITIVE_INFINITY;
double firstSbMax = Double.NEGATIVE_INFINITY;
for (int i = 0; i < 10; i++) {
firstRates[i] = testFirst();
firstAvg += firstRates[i];
if (firstRates[i] < firstMin)
firstMin = firstRates[i];
if (firstRates[i] > firstMax)
firstMax = firstRates[i];
Thread.sleep(100);
System.gc();
Thread.sleep(100);
}
firstAvg /= 10.0d;
for (int i = 0; i < 10; i++) {
firstSbRates[i] = testFirstSb();
firstSbAvg += firstSbRates[i];
if (firstSbRates[i] < firstSbMin)
firstSbMin = firstSbRates[i];
if (firstSbRates[i] > firstSbMax)
firstSbMax = firstSbRates[i];
Thread.sleep(100);
System.gc();
Thread.sleep(100);
}
firstSbAvg /= 10.0d;
System.out.printf("First:\n\tMin:\t%07.3f\tMax:\t%07.3f\tAvg:\t%07.3f\n\tRates:\t%s\n\n", firstMin, firstMax,
firstAvg, Arrays.toString(firstRates));
System.out.printf("FirstSb:\n\tMin:\t%07.3f\tMax:\t%07.3f\tAvg:\t%07.3f\n\tRates:\t%s\n\n", firstSbMin,
firstSbMax, firstSbAvg, Arrays.toString(firstSbRates));
}
private static double testFirst() {
ArrayList<String> strings = new ArrayList<String>(1000000);
for (int i = 0; i < 1000000; i++) {
int first = (int) (Math.random() * 128);
int second = (int) (Math.random() * 128);
int third = (int) (Math.random() * 128);
int fourth = (int) (Math.random() * 128);
strings.add(String.format("%d.%d.%d.%d", first, second, third, fourth));
}
long before = System.currentTimeMillis();
for (String s : strings) {
starOutFirst(s);
}
long after = System.currentTimeMillis();
return 1000000000.0d / (after - before);
}
private static double testFirstSb() {
ArrayList<String> strings = new ArrayList<String>(1000000);
for (int i = 0; i < 1000000; i++) {
int first = (int) (Math.random() * 128);
int second = (int) (Math.random() * 128);
int third = (int) (Math.random() * 128);
int fourth = (int) (Math.random() * 128);
strings.add(String.format("%d.%d.%d.%d", first, second, third, fourth));
}
long before = System.currentTimeMillis();
for (String s : strings) {
starOutFirstSb(s);
}
long after = System.currentTimeMillis();
return 1000000000.0d / (after - before);
}
}
Output
First:
Min: 3802281.369 Max: 5434782.609 Avg: 5185796.131
Rates: [3802281.3688212926, 5181347.150259067, 5291005.291005291, 5376344.086021505, 5291005.291005291, 5235602.094240838, 5434782.608695652, 5405405.405405405, 5434782.608695652, 5405405.405405405]
FirstSb:
Min: 4587155.963 Max: 5747126.437 Avg: 5462087.511
Rates: [4587155.963302752, 5747126.436781609, 5617977.528089887, 5208333.333333333, 5681818.181818182, 5586592.17877095, 5586592.17877095, 5524861.878453039, 5524861.878453039, 5555555.555555556]
