Efficient mapping of Strings to ints in Android


Currently I'm using a HashMap<String, Integer> to map strings to ints, and the int values need to be accessed frequently.
I'm looking for a better way to do this, if possible with minimal object creation, and preferably storing the values as primitive ints without wrapping them in the Integer class.
(Basically, the reverse of the SparseArray's int->object mapping.)

Are they arbitrary values known at compile time? In that case, use a Java Enum or an Android Enum.

If you're using a HashMap, there is no way around using some object as the value. If your maximum value is 31, all values will be cached by the Integer implementation (Integer.valueOf caches at least the range -128 to 127). No objects are created as long as you use autoboxing or Integer.valueOf to obtain the Integer instances (not new Integer(..)).
If you want to minimize object creation, you could write a mutable wrapper around a primitive int. One alternative is to (mis-)use java.util.concurrent.atomic.AtomicInteger, which has some overhead.
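A minimal sketch of the mutable-wrapper idea, assuming the values are updated in place after the first insert. IntHolder and StringCounter are hypothetical names used only for illustration; neither is a JDK or Android class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder (not a JDK class): a mutable box around a primitive int.
final class IntHolder {
    int value;

    IntHolder(int value) {
        this.value = value;
    }
}

// Hypothetical counter: one allocation per distinct key, none per update.
final class StringCounter {
    private final Map<String, IntHolder> counts = new HashMap<String, IntHolder>();

    void increment(String key) {
        IntHolder holder = counts.get(key);
        if (holder == null) {
            counts.put(key, new IntHolder(1)); // allocated once per distinct key
        } else {
            holder.value++;                    // in-place update, no boxing
        }
    }

    int get(String key) {
        IntHolder holder = counts.get(key);
        return holder == null ? 0 : holder.value; // 0 for unknown keys is an assumption
    }
}
```

The map still stores one object per key, so this only helps when values change often; for read-only values in the cached Integer range it saves nothing.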

Write a class like this:
public class Foo
{
public static final String A = "a";
public static final String B = "b";
public static int foo(String str)
{
final int val;
if(str == A || str.equals(A))
{
val = 0x01;
}
else if(str == B || str.equals(B))
{
val = 0x02;
}
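// etc...
else
{
val = -1; // assumed "not found" value; without a final else, val may be unassigned and the code does not compile
}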
return (val);
}
}
You only create the Strings once each, the number of Strings is small enough that .equals won't get called many times, and if you always use the constants then .equals won't get called at all.
If you use a Map it will take more memory than this, and also given that the number of Strings is small the code above might be faster. Also you can refactor the implementation to use a Map internally (or an array, or whatever) and see what is the fastest/uses the least memory without having to change your API.
EDIT:
Another thing to look at is the proposal for switch with String... if this comes into the language (I think it is) and if Android adopts it, then you would be able to replace the code I have above with a switch without a performance hit. Essentially they do a switch on the hashCode and then only call .equals on objects that have the same hashCode. This does require that you figure out the hashCodes in advance, and that the hashCode always returns the same thing for ever (which it should since String defines the way hashCode must work).
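As a rough sketch of what that would look like, assuming a language level of Java 7 or later (where switch on String is part of the language). FooSwitch and the -1 fallback are hypothetical choices for illustration.

```java
// Hypothetical variant of the Foo class above, using switch on String (Java 7+).
final class FooSwitch {
    static final String A = "a";
    static final String B = "b";

    static int foo(String str) {
        // The compiler switches on str.hashCode() and then confirms with equals().
        // Note: a null str throws NullPointerException here.
        switch (str) {
            case A:
                return 0x01;
            case B:
                return 0x02;
            default:
                return -1; // assumed "not found" value
        }
    }
}
```

The case labels must be compile-time constants, which is why A and B are declared static final with literal initializers.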

Related

What is the most memory-saving data structure that can serve my purpose?

I'm trying to memory-optimize my server that keeps running into OOM.
Most of the objects (by count) in the server take the following form:
Each object is a HashMap
HashMap keys are strings
HashMap values are objects of Attribute class, which just has an int and 2 booleans.
Important caveat: 95% of such hashmaps would ONLY EVER have one key; and I know whether that's the case when creating the hashmap.
There are millions of these hashmaps.
I already asked a separate question about optimizing those hashmaps memory wise, and someone in a comment suggested that perhaps re-engineering my whole data structure would be better since even with initial size of "1" HashMaps still take up extra memory.
As such, my question is; is there a better Java data structure I can implement which can store the same exact data with better memory efficiency?
NOTE: I need to be able to look up whether a specific value is present as a key; therefore I have considered but rejected storing the data in a list of 4-tuples of [string_value, int, boolean, boolean].

Expose the less specific Map interface to the user instead of HashMap; this gives you the freedom to use hash maps or singleton maps depending on the case.
Memory efficient singleton maps can be created via Collections.singletonMap(): https://docs.oracle.com/javase/7/docs/api/java/util/Collections.html#singletonMap(K,%20V)
SingletonMap seems to be implemented as an object with just two fields and some cache: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Collections.java#Collections.SingletonMap
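A small sketch of choosing the implementation behind the Map interface. AttributeMaps and its method names are hypothetical; Collections.singletonMap is the real JDK call, and the map it returns is immutable.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical factory: callers only ever see Map, so the backing class can vary.
final class AttributeMaps {
    // For the common case where the map will only ever hold one key.
    static <V> Map<String, V> singleKey(String key, V value) {
        return Collections.singletonMap(key, value); // immutable, no hash table
    }

    // For the rare case with several keys.
    static <V> Map<String, V> multiKey(int expectedSize) {
        return new HashMap<String, V>(expectedSize);
    }
}
```

Because the singleton map rejects put(), this only fits if the single-key case is known at creation time, as the question states.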
p.s. You could have something even more compact for your special case by collapsing the map and value into a single instance like this:
public final class SingletonAttributeMap extends Attribute implements Map<String,Attribute> {
private final String key;
public Attribute get(Object key) { // Map.get takes Object, not String
return this.key.equals(key) ? this : null;
}
....
}
p.p.s Is there a maximum size for the integers in the attributes? You might be able to do tricks like this (whether this actually saves memory will depend on padding / alignment, see What is the memory consumption of an object in Java?):
class Attribute {
private int value;
boolean getBoolean1() {
return (value & 1) != 0;
}
boolean getBoolean2() {
return (value & 2) != 0;
}
int getInt() {
return value >> 2;
}
void setBoolean1(boolean b) {
value = (value & ~1) | (b ? 1 : 0);
}
void setInt(int i) {
value = (value & 3) | (i << 2); // keep the two flag bits, replace the int bits
}
...
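For reference, a self-contained sketch of the packing idea, assuming the int fits in 30 signed bits. PackedAttribute and the range check are hypothetical additions; note that setInt masks with 3 so the two flag bits survive.

```java
// Hypothetical complete version: bit 0 = boolean1, bit 1 = boolean2,
// bits 2..31 = a signed 30-bit integer.
final class PackedAttribute {
    private int value;

    boolean getBoolean1() { return (value & 1) != 0; }

    boolean getBoolean2() { return (value & 2) != 0; }

    int getInt() { return value >> 2; } // arithmetic shift preserves the sign

    void setBoolean1(boolean b) { value = (value & ~1) | (b ? 1 : 0); }

    void setBoolean2(boolean b) { value = (value & ~2) | (b ? 2 : 0); }

    void setInt(int i) {
        // Only 30 bits are available, so reject values that would lose bits.
        if (i < -(1 << 29) || i >= (1 << 29)) {
            throw new IllegalArgumentException("value does not fit in 30 bits: " + i);
        }
        value = (value & 3) | (i << 2); // keep the flags, replace the int bits
    }
}
```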

Each object is a HashMap? Not a very good abstraction. I'd prefer composition: create an object that HAS-A HashMap and provides a clear API. Make it Comparable and implement hashCode and equals.
I would suggest that you follow the Flyweight pattern if lots of those are repeated.
If the objects are immutable and read-only this will be a perfectly good optimization.
Have a Factory object to create instances and maintain a List of unique ones. If someone asks for one that's already in the List return the immutable copy.
You can calculate how much of a memory savings this will provide before you implement it.
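A minimal Flyweight sketch under the assumptions above (immutable, frequently repeated attributes). ImmutableAttribute and AttributeFactory are hypothetical names, and a HashMap is swapped in for the List mentioned above so that lookups do not scan every instance. It is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable value: one int and two booleans, as in the question.
final class ImmutableAttribute {
    final int number;
    final boolean flagA;
    final boolean flagB;

    ImmutableAttribute(int number, boolean flagA, boolean flagB) {
        this.number = number;
        this.flagA = flagA;
        this.flagB = flagB;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ImmutableAttribute)) {
            return false;
        }
        ImmutableAttribute other = (ImmutableAttribute) o;
        return number == other.number && flagA == other.flagA && flagB == other.flagB;
    }

    @Override
    public int hashCode() {
        return 31 * (31 * number + (flagA ? 1 : 0)) + (flagB ? 1 : 0);
    }
}

// Hypothetical factory: hands out one shared instance per distinct value.
final class AttributeFactory {
    private final Map<ImmutableAttribute, ImmutableAttribute> pool =
            new HashMap<ImmutableAttribute, ImmutableAttribute>();

    ImmutableAttribute of(int number, boolean flagA, boolean flagB) {
        ImmutableAttribute candidate = new ImmutableAttribute(number, flagA, flagB);
        ImmutableAttribute existing = pool.get(candidate);
        if (existing != null) {
            return existing;            // reuse the shared instance
        }
        pool.put(candidate, candidate); // first time this value is seen
        return candidate;
    }
}
```

The saving depends on how many distinct (int, boolean, boolean) combinations actually occur; with mostly unique ints the pool itself costs more than it saves.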

Benefits of using NumberUtils.INTEGER_ONE and other such utilities

In java for a comparison in an if statement, I wrote
if (x == 1)
and got a comment in code review to use NumberUtils.INTEGER_ONE instead of 1. I was wondering what benefit does it actually add to the code.

NumberUtils.INTEGER_ONE probably comes from commons-lang.
In commons-lang, it is defined as :
public static final Integer INTEGER_ONE = new Integer(1);
In commons-lang3, it is defined as :
public static final Integer INTEGER_ONE = Integer.valueOf(1);
The first version doesn't use the internal Integer cache (it didn't exist yet),
while the second version takes advantage of it.
Whichever version you are using, it doesn't really matter for your question, because you are comparing integer values, not assigning or creating them (the case where the cache could make more sense).
Suppose you are using it in this way :
if (x == NumberUtils.INTEGER_ONE)
If x is a primitive, it is not very efficient as it will produce an unboxing operation to convert NumberUtils.INTEGER_ONE to a 1 int primitive.
If x is an object, it is not a good idea either as Integer objects should be compared with equals() or intValue().
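The two cases can be sketched as follows. OneCheck and its method names are hypothetical and only illustrate the comparison styles described above.

```java
// Hypothetical helper showing both comparison styles.
final class OneCheck {
    // Primitive operand: compare against the literal; no boxing or unboxing.
    static boolean isOne(int x) {
        return x == 1;
    }

    // Boxed operand: compare by value. Using == between two Integer
    // references would compare object identity instead.
    static boolean isOneBoxed(Integer x) {
        return x != null && x.intValue() == 1;
    }
}
```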

Java: best way to return two "immutable" integers?

1) When Java encounters int[], does it actually understand it as Integer[], I mean array can only hold references, not primitives ?
2) When it comes to return two immutable integers from a function, are those two ways equivalent?
int a, b;
...
int[] returnVal = {a, b};
return(returnVal);
vs.
Integer a, b;
...
Integer[] returnVal = {a, b};
return(returnVal);
3) What is the standard practice to return two immutable integers?
Edits:
I'm wondering if "immutable" is actually the correct term to use as my question is about how to return safely a pair of integer values to a caller and at the same time preventing the caller to change the original values without using unnecessary clone().
By trying different pieces of code, the short answer to point #2 seems to be that you can safely return the values as int[] or Integer[]. The caller may change the elements of the returned array, but not the initial values.
Answers below provide explanations for that, and valuable clues for points #1 and #3. As I cannot select multiple answers as correct, I've selected the most useful for me, but I thank everyone for their assistance.

No, they are not identical. The compiler uses a trick called auto-boxing to convert a and b between int and Integer. It applies Integer.valueOf(primitiveInt) or integerInstance.intValue() on demand and automatically.
The main semantic difference is whether either number can be absent (null). That is only possible with Integer, not int.
There is a suitable Pair class in apache commons lang:
http://commons.apache.org/proper/commons-lang/javadocs/api-3.1/org/apache/commons/lang3/tuple/Pair.html
Along with an immutable one.
Or create your own custom one:
Using Pairs or 2-tuples in Java

The standard practice is:
public class NameThatDescribesWhatAPairOfIntegersSymbolicallyRepresents {
// Apostrophes are not legal in Java identifiers, so the field names omit them.
private final int nameDescribingFirstIntegersRole;
private final int nameDescribingSecondIntegersRole;
public NameThatDescribesWhatAPairOfIntegersSymbolicallyRepresents(
int firstInteger, int secondInteger) {
nameDescribingFirstIntegersRole = firstInteger;
nameDescribingSecondIntegersRole = secondInteger;
}
public int getDescriptiveNameOfFirstInteger() {
return nameDescribingFirstIntegersRole;
}
public int getDescriptiveNameOfSecondInteger() {
return nameDescribingSecondIntegersRole;
}
}
Anything less will lead to a poor, disgruntled coder two years from now, staring at:
status = result[0] + 2 * result[1];
And proceeding to tear his hair out as he mouths to himself "What the !##$ is int[] result"?
=== Edit ===
Rant aside, the answers to your questions are:
1) No, these are totally different things. Primitives and Objects have different handles, consume different amounts of memory, and behave in different ways.
2) See (1) - no.
Additionally, no array - be it int[], or Integer[] - can ever be immutable. Java arrays are defined to be mutable always, you can't stop the caller from changing out the elements in an array. The only way to have a method return an "Immutable" array is if it generates a brand new copy every time it is called, and never hands out internal, mutable data.
3) See above
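A short sketch of the "brand new copy every time" approach mentioned in point 2, assuming the pair is stored in a private array. IntRange is a hypothetical class used only for illustration.

```java
// Hypothetical holder that never hands out its internal array.
final class IntRange {
    private final int[] bounds;

    IntRange(int low, int high) {
        bounds = new int[] {low, high};
    }

    int low() {
        return bounds[0];
    }

    int high() {
        return bounds[1];
    }

    // Returns a fresh copy, so callers can modify the result freely
    // without affecting this object.
    int[] toArray() {
        return bounds.clone();
    }
}
```

The copy costs one small allocation per call; the named accessors low() and high() avoid it entirely.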

When Java encounters int[], does it actually understand it as
Integer[], I mean array can only hold references, not primitives ?
No. An int[] is an array of primitive int values, which is not the same as an Integer[], an array of references to Integer objects.
For the second question:
No, the two are still not the same, because it depends on the method's return type.
You cannot return an int[] where the method's return type is Integer[].
Regarding immutability: you never modify an Integer object; you can only point the reference at a different object. This applies to Integer, not to int.
It depends on your code, but if you ask which one is better, I would say wrapping int in Integer is better in practice.

We come across these situations, in case we want to return wrapper class objects.
Simple solution for this would be to return map as follow:
private void parentFunction() {
Map<String,Integer> flagsForEventRule = new HashMap<>();
// call the method inside which you want to set these parameters
sample(flagsForEventRule);
System.out.println(flagsForEventRule.get("pathsMatched"));
}
private void sample(Map<String,Integer> flagsForEventRule) {
// you can set the values in map like this
flagsForEventRule.put("pathsMatched", 1);
flagsForEventRule.put("shouldCreateRule", 2);
}

1) When Java encounters int[], does it actually understand it as
Integer[], I mean array can only hold references, not primitives ?
These are two different things:
int[] is a standard primitives array,
Integer[] is an array of Objects - meaning array of pointers to Integer Objects.
When it comes to return two immutable integers from a function, are
those two ways equivalent?
int is always mutable
Integer is immutable
So in this case if you want to work with immutable objects, you need to use Integer, but then again you can always change the pointer in your array, and that way you'll lose consistency.
What is the standard practice to return two immutable integers?
I would look at apache common ImmutablePair()

Generics, Guava Ordering.arbitrary()

@SuppressWarnings("unchecked")
public static final Ordering<EmailTemplate> ARBITRARY_ORDERING = (Ordering)Ordering.arbitrary();
public static final Ordering<EmailTemplate> ORDER_BY_NAME = Ordering.natural().nullsFirst().onResultOf(GET_NAME);
public static final Ordering<EmailTemplate> ORDER_BY_NAME_SAFE = Ordering.allEqual().nullsFirst()
.compound(ORDER_BY_NAME)
.compound(ARBITRARY_ORDERING);
Here's the code I use to order EmailTemplate.
If I have a list of EmailTemplate, I want the null elements of the list to appear at the beginning, then the elements with a null name, then the rest in natural name order and, if they have the same name, in an arbitrary order.
Is this how I am supposed to do it? It seems strange to start the comparator with "allEqual", I think...
I also wonder what's the best way to deal with the Ordering.arbitrary(), since it's a static method that returns Ordering. Is there any elegant way to use it? I don't really like this kind of useless, with warning, line:
@SuppressWarnings("unchecked")
public static final Ordering<EmailTemplate> ARBITRARY_ORDERING = (Ordering)Ordering.arbitrary();
By the way, the documentation says:
Returns an arbitrary ordering over all objects, for which compare(a,
b) == 0 implies a == b (identity equality). There is no meaning
whatsoever to the order imposed, but it is constant for the life of the VM.
Does this mean that my object being compared with this Ordering will never be garbage collected?

Regarding the second question: no. Guava uses the identity hash codes of the objects to sort them arbitrarily.
Regarding the first question: I would use a comparison chain to sort by name, then by arbitrary order:
// static, so it can be instantiated from the static ORDER field below
private static class ByNameThenArbitrary implements Comparator<EmailTemplate> {
@Override
public int compare(EmailTemplate e1, EmailTemplate e2) {
return ComparisonChain.start()
.compare(e1.getName(), e2.getName(), Ordering.<String>natural().nullsFirst())
.compare(e1, e2, Ordering.arbitrary())
.result();
}
}
Then I would create the real ordering to order the templates with nulls first:
private static final Ordering<EmailTemplate> ORDER =
Ordering.fromComparator(new ByNameThenArbitrary()).nullsFirst();
Not tested, though.

I'm pretty sure you're making this too complicated:
Ordering.arbitrary() works with any Object and the compound doesn't require to restrict it to EmailTemplate
Saying nullsFirst() takes priority when null gets compared, and I'd suggest to apply it last
You don't need to define multiple constants, it all should be easy
I'd go for
public static final Ordering<EmailTemplate> ORDER_BY_NAME_SAFE = Ordering
.natural()
.onResultOf(GET_NAME)
.compound(Ordering.arbitrary())
.nullsFirst();
but I haven't tested it.
What's confusing here is the way compound and nullsFirst work. With the former, this takes precedence, while with the latter, testing for null wins. Both are logical:
compound works left to right
nullsFirst must first test for null, otherwise we'd get an exception
but taken together it's confusing.
Does this mean that my object being compared with this Ordering will never be garbage collected?
No, it uses weak references. Whenever an object isn't referenced elsewhere, it can be garbage collected. This is no contradiction to "the ordering is constant for the life of the VM", since a no more existing object can't be compared anymore.
Note that Ordering.arbitrary() is indeed arbitrary and based on object's identity rather than on equals, which means that
Ordering.arbitrary().compare(new String("a"), new String("a"))
doesn't return 0.
I wonder if an "equals-compatible arbitrary ordering" could be implemented.
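The same "apply nullsFirst last" idea can be sketched with the JDK's own java.util.Comparator (Java 8+) instead of Guava's Ordering. This is a swapped-in technique for illustration only: Template is a hypothetical stand-in for EmailTemplate, and the arbitrary tie-breaker is left out because the JDK has no direct equivalent of Ordering.arbitrary().

```java
import java.util.Comparator;

// Hypothetical stand-in for EmailTemplate.
final class Template {
    private final String name;

    Template(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

final class TemplateOrder {
    // Null names sort before non-null names; the rest use natural String order.
    static final Comparator<String> NAME_ORDER =
            Comparator.nullsFirst(Comparator.<String>naturalOrder());

    // Compare templates by name; this comparator itself must not see null templates.
    static final Comparator<Template> BY_NAME =
            Comparator.comparing(Template::getName, NAME_ORDER);

    // Wrapping last makes the null test for whole elements run first.
    static final Comparator<Template> BY_NAME_SAFE = Comparator.nullsFirst(BY_NAME);
}
```

Sorting a list with BY_NAME_SAFE puts null elements first, then elements with a null name, then the rest by name; elements with equal names keep their relative order because List.sort is stable.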

is there a performance hit when using enum.values() vs. String arrays?

I'm using enumerations to replace String constants in my java app (JRE 1.5).
Is there a performance hit when I treat the enum as a static array of names in a method that is called constantly (e.g. when rendering the UI)?
My code looks a bit like this:
public String getValue(int col) {
return ColumnValues.values()[col].toString();
}
Clarifications:
I'm concerned with a hidden cost related to enumerating values() repeatedly (e.g. inside paint() methods).
I can now see that all my scenarios include some int => enum conversion - which is not Java's way.
What is the actual price of extracting the values() array? Is it even an issue?
Android developers
Read Simon Langhoff's answer below, which was pointed out earlier by Geeks On Hugs in the accepted answer's comments. Enum.values() must do a defensive copy.

For enums, in order to maintain immutability, the backing array is cloned every time you call the values() method. This means that it will have a performance impact. How much depends on your specific scenario.
I have been monitoring my own Android app and found out that this simple call used 13.4% of CPU time in my specific case.
In order to avoid cloning the values array, I decided to simply cache the values in a private field and then loop through those values whenever needed:
private final static Protocol[] values = Protocol.values();
After this small optimisation my method call only hogged a negligible 0.0% CPU time
In my use case, this was a welcome optimisation, however, it is important to note that using this approach is a tradeoff of mutability of your enum. Who knows what people might put into your values array once you give them a reference to it!?
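One way to keep the cached array from leaking is to hold it in a private field and expose only lookups. The sketch below assumes hypothetical constants for Protocol and hypothetical accessor names.

```java
// Hypothetical enum; the constants are placeholders.
enum Protocol {
    HTTP, FTP, SSH;

    // Cloned once at class initialization; private so callers cannot modify it.
    private static final Protocol[] VALUES = values();

    // Index-based lookup without the per-call clone that values() performs.
    static Protocol fromOrdinal(int ordinal) {
        return VALUES[ordinal];
    }

    static int count() {
        return VALUES.length;
    }
}
```

Callers that still need an array can call values() and pay for the copy; the hot path uses fromOrdinal() and never sees the shared array.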

Enum.values() gives you a reference to an array, and iterating over an array of enums costs the same as iterating over an array of strings. Meanwhile, comparing enum values to other enum values can actually be faster than comparing strings to strings.
Meanwhile, if you're worried about the cost of invoking the values() method versus already having a reference to the array, don't worry. Method invocation in Java is (now) blazingly fast, and any time it actually matters to performance, the method invocation will be inlined by the compiler anyway.
So, seriously, don't worry about it. Concentrate on code readability instead, and use Enum so that the compiler will catch it if you ever try to use a constant value that your code wasn't expecting to handle.
If you're curious about why enum comparisons might be faster than string comparisons, here are the details:
It depends on whether the strings have been interned or not. For Enum objects, there is always only one instance of each enum value in the system, and so each call to Enum.equals() can be done very quickly, just as if you were using the == operator instead of the equals() method. In fact, with Enum objects, it's safe to use == instead of equals(), whereas that's not safe to do with strings.
For strings, if the strings have been interned, then the comparison is just as fast as with an Enum. However, if the strings have not been interned, then the String.equals() method actually needs to walk the list of characters in both strings until either one of the strings ends or it discovers a character that is different between the two strings.
But again, this likely doesn't matter, even in Swing rendering code that must execute quickly. :-)
@Ben Lings points out that Enum.values() must do a defensive copy, since arrays are mutable and it's possible you could replace a value in the array that is returned by Enum.values(). This means that you do have to consider the cost of that defensive copy. However, copying a single contiguous array is generally a fast operation, assuming that it is implemented "under the hood" using some kind of memory-copy call, rather than naively iterating over the elements in the array. So, I don't think that changes the final answer here.

As a rule of thumb : before thinking about optimizing, have you any clue that this code could slow down your application ?
Now, the facts.
enum are, for a large part, syntactic sugar scattered across the compilation process. As a consequence, the values method, defined for an enum class, returns a static collection (that's to say loaded at class initialization) with performances that can be considered as roughly equivalent to an array one.

If you're concerned about performance, then measure.
From the code, I wouldn't expect any surprises, but 90% of all performance guesswork is wrong. If you want to be safe, consider moving the enums up into the calling code (i.e. public String getValue(ColumnValues value) {return value.toString();}).

use this:
private enum ModelObject { NODE, SCENE, INSTANCE, URL_TO_FILE, URL_TO_MODEL,
ANIMATION_INTERPOLATION, ANIMATION_EVENT, ANIMATION_CLIP, SAMPLER, IMAGE_EMPTY,
BATCH, COMMAND, SHADER, PARAM, SKIN }
private static final ModelObject int2ModelObject[] = ModelObject.values();

If you're iterating through your enum values just to look for a specific value, you can statically map the enum values to integers. This pushes the performance impact on class load, and makes it easy/low impact to get specific enum values based on a mapped parameter.
public enum ExampleEnum {
value1(1),
value2(2),
valueUndefined(Integer.MAX_VALUE);
private final int enumValue;
// Typed map (needs java.util.Map and java.util.HashMap); filled once at class load.
private static final Map<Integer, ExampleEnum> enumMap = new HashMap<Integer, ExampleEnum>();
ExampleEnum(int value){
enumValue = value;
}
static {
for (ExampleEnum exampleEnum: ExampleEnum.values()) {
enumMap.put(exampleEnum.enumValue, exampleEnum);
}
}
public static ExampleEnum getExampleEnum(int value) {
// Map has containsKey(), not contains(); a single get() also avoids a second lookup.
ExampleEnum found = enumMap.get(value);
return found != null ? found : valueUndefined;
}
}

I think yes. And it is more convenient to use Constants.
