Using boolean instead of byte or int in Java

Is using boolean instead of byte (if I only need 2 states) in Java useful for performance, or is that just an illusion? Is any space saving cancelled out by alignment?

You should use whichever is clearer, unless you have profiled your code, and decided that making this optimization is worth the cost in readability. Most of the time, this sort of micro-optimization isn't worth the performance increase.
According to Oracle:
boolean: ... This data type represents one bit of information, but its "size" isn't something that's precisely defined.

To give you an idea, I once consulted in a mini-shop (16-bit machines).
Sometimes people would have a "flag word", a global int containing space for 16 boolean flags.
This was to save space.
Never mind that to test a flag required two 16-bit instructions, and to set a flag required three or more.

Yes, a boolean may use as little as 1 bit. But more importantly, it makes it clearer to another developer reading your code that there are only two possible states.

The answer depends on your JVM, and on your code. The only way to find out for sure is by profiling your actual code.

If you only have 2 states that you want to represent and you want to reduce memory usage, you can use a java.util.BitSet.
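For illustration, a minimal sketch (the flag indices and sizes are invented): a BitSet packs each flag into a single bit rather than a whole field:

import java.util.BitSet;

public class FeatureFlags {
    public static void main(String[] args) {
        // One bit per flag instead of one byte (or more) per boolean field
        BitSet flags = new BitSet(1024);

        flags.set(3);      // turn flag 3 on
        flags.clear(7);    // turn flag 7 off (already off here)

        if (flags.get(3)) {
            System.out.println("flag 3 is set");
        }

        // 1024 flags fit in roughly 128 bytes of backing storage
        System.out.println("highest set bit + 1: " + flags.length());
    }
}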

On most JVMs a boolean uses the same amount of space as a byte. Accessing a byte/boolean can be more work than accessing an int or long, so if performance is the only consideration, an int or long can be faster. When you share a value between threads, there can (in the most extreme cases) be an advantage to reserving a whole cache line for the field; a cache line is 64 bytes on many CPUs.
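A hedged sketch of that last point (the class and field names are illustrative, and whether the padding survives the JVM's field layout is implementation-dependent): code sometimes pads a contended field out to its own 64-byte cache line by hand:

// Illustrative only: surround the hot field with enough longs that it is
// unlikely to share a 64-byte cache line with other frequently written data.
class PaddedFlag {
    long p1, p2, p3, p4, p5, p6, p7;   // padding before (7 * 8 = 56 bytes)
    volatile long flag;                // the field shared between threads
    long q1, q2, q3, q4, q5, q6, q7;   // padding after
}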

Related

Best way to define a constant identifier?

In my Java application one of my objects has exactly one value from a set of values. Now I wonder how to define those values to get the best performance:
private static final String ITEM_TYPE1 = "type1";
private static final int ITEM_TYPE1 = 1;
Is defining an int better than a String? (I need to convert the value to a string anyway, so I would like to define it as a String, but I fear for performance, because comparing ints is presumably simpler than comparing strings.)
EDIT: I am aware of enums, but I just want to know whether ints perform better than strings or not. This depends on how the JDK and runtime handle it under the hood (on Android, Dalvik or ART...).
In my java application one of my objects has exactly one value from a set of values
That is what Java enums are for.
Regarding the question "do ints have more performance than strings", that is almost nonsensical.
You are talking about static constants. Even if they are used 100 or 1,000 times in your app, performance doesn't matter here. What matters is to write code that is easy to read and maintain. Because then the JIT can kick in and turn it into nicely optimized machine code.
Please understand: premature optimisation is the root of all evil! Good or bad performance of your app depends on many other factors, definitely not on representing constants as ints or strings.
Beyond that: the type of something in Java should reflect its nature. If it is a string, make it a String (for example when you mainly use it as a string and concatenate it to other strings). When you have numbers and deal with them as numbers, make it an int.
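Since enums are the recommendation here, a minimal sketch (the type and constant names are invented for illustration):

// Hypothetical item types; the names are invented for illustration.
enum ItemType {
    TYPE1,
    TYPE2
}

class Item {
    private final ItemType type;

    Item(ItemType type) {
        this.type = type;
    }

    boolean isType1() {
        // Enum comparison is a reference comparison, as cheap as comparing ints
        return type == ItemType.TYPE1;
    }

    String typeAsString() {
        // A readable string is still available when you need one
        return type.name().toLowerCase();
    }
}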
First of all, an int always has a fixed size in memory; in Java it is always 4 bytes.
A String is a complex type, which means it takes not only the bytes of the actual string data but also additional data such as the length of the string and so on.
So if you have the choice between String and int, you should always choose int: it takes less space and is faster to operate on.

Which is the best way to reduce complexity of time and space in Java?

I am writing a program and I am concerned about the time it takes to run and the space it takes.
In the program I used a variable to store the length of an array:
int len = newarray3.length;
Now, I want to know whether I can reduce the space complexity by not using the len variable and instead calling newarray3.length whenever needed.
Note: There are only two occasions when the length needs to be used.
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" - Donald Knuth
First off, a single int variable uses a negligible amount of space. More generally, don't worry about these minor performance ("time and space") issues. Focus on writing clear, readable programs and efficient algorithms. If performance becomes an issue down the line, you'll be able to optimize then as long as your general architecture is solid. So focus on that architecture.
I really doubt that this is the thing that will improve your performance.
Anyway, I would use newarray3.length (instead of assigning it to a new variable, which takes memory and adds another operation). length is a field, not even a method call; reading it costs the same as reading the int you copied, but you save the copy and the 4 bytes the extra int consumes.
Java arrays are not resizable, so the length member is constant for that object; this can enable various JIT optimizations.

Performance implications of using Java BigInteger for a huge bitmask

We have an interesting challenge. We have to control access to data that reside in "bins". There will be, potentially, hundreds of thousands of "bins". Access to each bin is controlled individually but the restrictions can, and probably will, overlap. We are thinking of assigning each bin a position in a bitmask (1,2,3,4, etc..).
Then when a user logs into the system, we look at his security attributes and determine which bins he's allowed to see. With that info we construct a bitmask for this user where the "set" bits correspond to the identifier of the bins he's allowed to see. So if he can see bins 1, 3 and 4, his bit mask would be 1101.
So when a user searches the data, we can look at the bin index of the returned row and see if that bit is set on his bitmask. If his bitmask has that bit set we let him see that row. We are planning for the bitmask to be stored as a BigInteger in Java.
My question is: Assuming the index number doesn't get bigger than Integer.MAX_VALUE, is a BigInteger bitmask going to scale to hundreds of thousands of bit positions? Would it take forever to run BigInteger.testBit(n) where n could be huge (e.g. 874,837)? Would it take forever to create such a BigInteger?
And secondly: If you have an alternative approach, I'd love to hear it.
BigInteger should be fast if you don't change it often.
A more obvious choice would be BitSet which is designed for this sort of thing. For looking up bits, I suspect the performance is similar. For creating/modifying it would be more efficient to use a BitSet.
Note: PaulG has commented the difference is "impressive" and BitSet is faster.
Java has a more convenient class for this, called BitSet.
You do not need to check if the bit is set in a loop: you can make a mask, use a bitwise and, and see if the result is non-empty to decide on whether to grant or deny the access:
BitSet resourceAccessMask = ... // bits set for the bins this row belongs to
BitSet userAllowedAccessMask = ... // bits set for the bins this user may see
// and() modifies the receiver, so work on a copy of the resource mask
BitSet test = (BitSet) resourceAccessMask.clone();
test.and(userAllowedAccessMask);
if (!test.isEmpty()) {
    System.out.println("access granted");
} else {
    System.out.println("access denied");
}
We used this class in a similar situation in my prior company, and the performance was acceptable for our purposes.
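As an aside, when only the yes/no result is needed, BitSet.intersects performs the same check without the clone-and-and dance; a minimal sketch reusing the masks above:

// intersects() returns true if the two sets share at least one set bit
if (resourceAccessMask.intersects(userAllowedAccessMask)) {
    System.out.println("access granted");
} else {
    System.out.println("access denied");
}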
You could define your own Java interface for this, initially using a Java BitSet to implement that interface.
If you run into performance issues, or if you require the use of long indices later on, you can always provide a different implementation (e.g. one that uses caching or similar improvements) without changing the rest of the code. Think carefully about the interface you require, and choose a long index just to be sure; you can always check whether it is out of bounds in the implementation later on (or simply return "no access" initially) for any index > Integer.MAX_VALUE.
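A minimal sketch of that idea (the interface and class names are invented for illustration, not taken from any library):

import java.util.BitSet;

// Hypothetical abstraction over the user's allowed bins.
interface BinAccessMask {
    boolean canSee(long binIndex);
}

// Initial implementation backed by a BitSet; it can later be swapped for a
// cached, paged, or long-indexed implementation without touching callers.
class BitSetAccessMask implements BinAccessMask {
    private final BitSet bits;

    BitSetAccessMask(BitSet bits) {
        this.bits = bits;
    }

    @Override
    public boolean canSee(long binIndex) {
        if (binIndex < 0 || binIndex > Integer.MAX_VALUE) {
            return false; // out of range: deny access, as suggested above
        }
        return bits.get((int) binIndex);
    }
}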
Using BigInteger is not such a good idea, as the class was not written for that particular purpose, and the only way of changing it is to create a fully new copy. It is efficient regarding memory use; it uses an array of 64-bit longs internally (at the moment; this could of course change).
One thing that should be worth considering (beside using BitSet) is using different granularity. Therefore you use a shorter bit set where each bit 'guards' multiple real bits. This way you would not need to have millions of bits per user in ram.
A simple way to achieve this is to keep a smaller bit set of size n/32 and do something like this (assuming both fields are BitSets):
boolean isSet(int n) {
    // a clear guard bit means all 32 real bits it covers are zero,
    // so the real bits never need to be loaded
    return guardingBits.get(n / 32) && realBits.get(n);
}
This gives you a good chance of avoiding loading the real bits if those bits are mostly zero. You can modify the approach to match the expected bit distribution: if you expect almost all bits to be set, you can instead use a guard bit to store a one when all the bits it guards are set, so you only need to check bits that might be zero.
And this might be just the beginning: depending on the usage and requirements, you might want to use a B-tree or a paginated version where you only hold a fraction of the big bit field in memory.

Comparing C and Java program runtimes

I had a job interview today where we were given a programming question and asked to solve it using C/C++/Java. I solved it in Java and its runtime was 3 seconds (the test input was over 16,000 lines, and the person accompanying us said the running time was reasonable). Another person there solved it in C and the runtime was 0.25 seconds. So I was wondering: is a factor of 12 normal?
Edit:
As I said, I don't think there was really much room for algorithm variation, except maybe in one little thing. Anyway, there was this protocol that we had to implement:
A (client) and B (server) communicate according to some protocol p. Before the messages are delivered, their validity is checked. The protocol is defined by its states and the text messages that can be sent in each state; in every state there was only one valid message that could be sent, except for one state in which about 10 different messages could be sent. There are 5 states, and the state transitions are defined by the protocol too.
So what I did for the state from which 10 different messages can be sent was store their string values in an ArrayList, and when I needed to check a message's validity in that state I checked arrayList.contains(sentMessageStr). I would think this operation's complexity is O(n), though Java may have some built-in optimization for it; now that I think about it, maybe I should have used a HashSet (a quick sketch of the two approaches follows). I suppose the C implementation stored those predefined legal strings lexicographically in an array and implemented a binary search.
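A minimal sketch of the two lookup approaches (the message strings below are invented):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MessageValidation {
    // O(n): contains() scans the list, calling equals() on each element
    static final List<String> VALID_LIST =
            new ArrayList<>(Arrays.asList("HELLO", "DATA", "ACK", "BYE"));

    // Expected O(1): contains() hashes the string and probes one bucket
    static final Set<String> VALID_SET = new HashSet<>(VALID_LIST);

    static boolean isValidLinear(String msg) {
        return VALID_LIST.contains(msg);
    }

    static boolean isValidHashed(String msg) {
        return VALID_SET.contains(msg);
    }
}

With only about 10 legal messages, though, either lookup is negligible next to I/O, which is in line with the answers below.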
thanks
I would guess that the JVM likely took a significant portion of those 3 seconds just to load. Try running your Java version on the same machine 5 times in a row, or try running both on a dataset 500 times as large. I suspect you'll see a significant constant latency for the Java version that becomes insignificant once runtimes go into the minutes.
Sounds more like a case of insufficient samples and unequal implementations (and possibly unequal test beds).
One of the first rules in measurement is to establish enough samples and obtain the mean of the samples for comparison. Even a couple of runs of the same program is not sufficient. You need to tax the machine enough to obtain samples whose values can be compared. That's why test-beds need to be warmed up, so that there are little or no variables at play, except for the system under observation.
And of course, you also have different people implementing the same requirement/algorithm in different manners. It counts. Period. Unless the algorithm implementations have been "normalized", obtaining samples and comparing them are the same as comparing apples and watermelons.
I don't think I need to expand on the fact that the testbeds could have been of varying configurations, or under varying loads.
It's almost impossible to say without seeing the code - you may have a better algorithm for example that scales up much better for larger input but has a greater overhead for small input sizes.
Having said that, this kind of 12x difference is roughly what I would expect if you coded the solution using "higher level" constructs such as ArrayLists / boxed objects and the C solution was basically using optimised, low level pointer arithmetic into a pre-allocated memory region.
I'd rather maintain the higher-level solution, but there are times when only hand-optimised low-level code will do...
Another potential explanation is that the JIT had not yet warmed up on your code. In general, you need to reach "steady state" (typically a few thousand iterations of every code path) before you will see top performance in JIT-compiled code.
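A rough sketch of what that warm-up looks like when timing by hand (solve() is a placeholder for the actual task, not a real API):

class NaiveBenchmark {
    static void solve() {
        // placeholder for the workload being measured
    }

    public static void main(String[] args) {
        // Warm-up: give the JIT a chance to compile the hot paths first
        for (int i = 0; i < 10_000; i++) {
            solve();
        }

        long start = System.nanoTime();
        solve();
        long elapsed = System.nanoTime() - start;
        System.out.printf("measured run: %.3f ms%n", elapsed / 1_000_000.0);
    }
}

A proper harness such as JMH takes care of warm-up (and of pitfalls like dead-code elimination) for you.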
Performance depends on implementation. Without knowing exactly what you code and what your competitor did, it's very difficult to tell exactly what happened.
But let's say, for instance, that you used objects like Vector or ArrayList to solve the problem and the C guy used plain arrays: his implementation is going to be faster than yours for sure.
C code can be translated very efficiently into assembly instructions, while Java, on the other hand, relies on a lot of machinery (like the JVM) that can make your program heavier and probably a little slower.
You will be hard pressed to find something that executes faster in Java than in C. It's true that an order of magnitude is a big difference, but in general C is more performant.
On the other hand you can produce a solution to any given problem much quicker in Java (especially taking into account the richness of the libraries).
So at the end of the day, if there is a choice at all, it comes down as a dilemma between performance and productivity.
That depends on the algorithm. Java is of course generally slower than C/C++ as it runs on a virtual machine, but for most common applications its speed is sufficient. I would not call a factor of 12 normal for common applications.
Would be nice if you posted the C and Java codes for comparison.
A factor of 12 can be normal. So could a factor of 1 or 1/2. As some commentators mentioned, a lot has to do with how you coded your solution.
Don't forget that Java programs have to run in a JVM (unless you compile them to native machine code), so any benchmark should take that into account.
You can google 'java and c speed comparisons' for some analysis.
Back in the day I'd have said there's nothing wrong with your Java code being 12 times slower, but nowadays I'd rather say the C guy simply implemented it more efficiently. Obviously I might be wrong, but are you sure you used proper data structures and simply coded it well?
Also, did you measure memory usage? This might sound silly, but last year at university we had a programming challenge. I don't really remember what it was, but we had to solve a graph problem in whatever language we wanted. I did two implementations of my algorithm, one in C and one in Java; the Java one was ~1.5-2x slower. But, for instance, I knew I didn't have to worry about memory management (I knew exactly how big the input would be and how many test samples the teacher would run), so I simply didn't free any memory in C (freeing took way too much time in a program that ran for ~1-2 seconds on a graph with ~15k nodes, or was it 150k?). So my Java code was better memory-wise, but it was slower. I also parsed the input myself in C (I didn't do that in Java), which saved me really A LOT of time (~20-25% boost; I was amazed myself). I'd say 1.5-2x is more realistic than 12x.
Most likely the algorithm used in the implementation was different.
For instance (an oversimplification), if you want to add a number N, M times, one implementation could be:
long addTimes(long n, long m) {
    long r = 0;
    long i;
    for (i = 0; i < m; i++) {
        r += n;
    }
    return r;
}
And another implementation could simply be:
long addTimes(long n, long m) {
    return n * m;
}
Both will run mostly the same in Java and C (you don't even have to change the code), and still, one implementation will run a lot faster than the other.

Anyone using short and byte primitive types, in real apps?

I have been programming in Java since 2004, mostly enterprise and web applications. But I have never used short or byte, other than in a toy program just to see how these types work. Even in a for loop of 100 iterations, we usually go with int. And I don't remember ever coming across any code that made use of byte or short, other than some public APIs and frameworks.
Yes I know, you can use a short or byte to save memory in large arrays, in situations where the memory savings actually matter. Does anyone care to practice that? Or is it just something in the books?
[Edited]
Using byte arrays for network programming and socket communication is quite common. Thanks, Darren, for pointing that out. Now how about short? Ryan gave an excellent example. Thanks, Ryan.
I use byte a lot. Usually in the form of byte arrays or ByteBuffer, for network communications of binary data.
I rarely use float or double, and I don't think I've ever used short.
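For example, a minimal sketch of packing a small binary message with ByteBuffer (the field layout is invented for illustration):

import java.nio.ByteBuffer;

class MessageCodec {
    // Hypothetical wire format: 1-byte type, 2-byte length, 4-byte payload id
    static byte[] encode(byte type, short length, int payloadId) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 2 + 4);
        buf.put(type);
        buf.putShort(length);
        buf.putInt(payloadId);
        return buf.array();
    }

    static void decode(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        byte type = buf.get();
        short length = buf.getShort();
        int payloadId = buf.getInt();
        System.out.printf("type=%d length=%d payloadId=%d%n", type, length, payloadId);
    }
}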
Keep in mind that Java is also used on mobile devices, where memory is much more limited.
I used 'byte' a lot, in C/C++ code implementing functionality like image compression (i.e. running a compression algorithm over each byte of a black-and-white bitmap), and processing binary network messages (by interpreting the bytes in the message).
However I have virtually never used 'float' or 'double'.
The primary usage I've seen for them is while processing data with an unknown structure or even no real structure. Network programming is an example of the former (whoever is sending the data knows what it means but you might not), something like image compression of 256-color (or grayscale) images is an example of the latter.
Off the top of my head, grep comes to mind as another use, as does any sort of file copy. (Sure, the OS will do it, but sometimes that's not good enough.)
The Java language itself makes it unreasonably difficult to use the byte or short types. Whenever you perform any operation on a byte or short value, Java promotes it to an int first, and the result of the operation is returned as an int. Also, they're signed, and there are no unsigned equivalents, which is another frequent source of frustration.
So you end up using byte a lot because it's still the basic building block of all things cyber, but the short type might as well not exist.
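A short sketch of the promotion and sign issues described above:

class BytePitfalls {
    public static void main(String[] args) {
        byte a = 100;
        byte b = 50;

        // byte + byte is promoted to int; the result must be cast back down
        byte sum = (byte) (a + b);
        System.out.println(sum);                     // -106: 150 wrapped into the signed range

        // bytes are signed, so "high" byte values come out negative...
        byte raw = (byte) 0xF0;
        System.out.println(raw);                     // -16

        // ...and need masking (or Byte.toUnsignedInt, Java 8+) to get 0-255
        System.out.println(raw & 0xFF);              // 240
        System.out.println(Byte.toUnsignedInt(raw)); // 240
    }
}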
Until today I hadn't noticed how seldom I use them.
I've used byte for network-related stuff, but most of the time that was for my own tools/learning. In work projects these things are handled by frameworks (JSP for instance).
Short? Almost never.
Long? Neither.
My preferred integer type is always int: for loops, counters, etc.
When data comes from another place (a database for instance) I use the proper type, but for literals I always use int.
I use bytes in lots of different places, mostly involving low-level data processing. Unfortunately, the designers of the Java language made bytes signed. I can't think of any situation in which having negative byte values has been useful. Having a 0-255 range would have been much more helpful.
I don't think I've ever used shorts in any proper code. I also never use floats (if I need floating point values, I always use double).
I agree with Tom. Ideally, in high-level languages we shouldn't be concerned with the underlying machine representations. We should be able to define our own ranges or use arbitrary precision numbers.
When we are programming for electronic devices like mobile phones, we use byte and short. In that case we have to take care with memory management.
It's perhaps more interesting to look at the semantics of int. Are those arbitrary limits and silent truncation what you want? Application-level code really wants arbitrary-sized integers; it's just that Java has no way of expressing those reasonably.
I have used bytes when saving State while doing model checking. In that application the space savings are worth the extra work. Otherwise I never use them.
I found I was using byte variables when doing some low-level image processing. The .Net GDI+ draw routines were really slow so I hand-rolled my own.
Most times, though, I stick with signed integers unless I am forced to use something larger, given the problem constraints. Any sort of physics modeling I do usually requires floats or doubles, even if I don't need the precision.
Apache POI used short quite a few times, probably because of Excel's row/column number limitations.
A few months ago they changed to int, replacing createCell(short columnIndex) with createCell(int column).
On in-memory datagrids, it can be useful.
The concept of a datagrid like Gemfire is to have a huge distributed map.
When you don't have enough memory you can overflow to disk with an LRU strategy, but the keys of all entries of your map remain in memory (at least with Gemfire).
Thus it is very important to give your keys a small footprint, particularly if you are handling very large datasets.
For the entry values, when you can, it's also better to use the appropriate type with a small memory footprint...
I have used shorts and bytes in Java apps communicating with custom USB or serial microcontrollers, to receive 10-bit values wrapped in 2 bytes as shorts.
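For illustration, a sketch of unpacking such a value (the exact bit layout of the device is an assumption, not from the original post):

// Assumed layout: low byte first, then a high byte carrying the top 2 bits
// of the 10-bit sample; the real device protocol may differ.
static short decodeSample(byte low, byte high) {
    // Mask with 0x03 / 0xFF so the signed bytes don't sign-extend
    return (short) (((high & 0x03) << 8) | (low & 0xFF));
}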
bytes and shorts are extensively used in Java Card development. Take a look at my answer to Are there any real life uses for the Java byte primitive type?.
