Construct a string from a repeated character - java

In Java I need to construct a string of n zeros with n unknown at compile time. Ideally I'd use
String s = new String('0', n);
But no such constructor exists. CharSequence doesn't seem to have a suitable constructor either. So I'm tempted to build my own loop using StringBuilder.
Before I do this and risk getting defenestrated by my boss, could anyone advise: is there a standard way of doing this in Java? In C++, one of the std::string constructors allows this.

If you don't mind creating an extra string:
String zeros = new String(new char[n]).replace((char) 0, '0');
Or more explicit (and probably more efficient):
char[] c = new char[n];
Arrays.fill(c, '0');
String zeros = new String(c);
Performance-wise, the Arrays.fill option performs best in most situations, especially for large strings. A StringBuilder is quite slow for large strings but efficient for small ones. The replace trick is a nice one-liner and performs OK for larger strings, but not as well as fill.
Micro benchmark for different values of n:
Benchmark (n) Mode Samples Score Error Units
c.a.p.SO26504151.builder 1 avgt 3 29.452 ± 1.849 ns/op
c.a.p.SO26504151.builder 10 avgt 3 51.641 ± 12.426 ns/op
c.a.p.SO26504151.builder 1000 avgt 3 2681.956 ± 336.353 ns/op
c.a.p.SO26504151.builder 1000000 avgt 3 3522995.218 ± 422579.979 ns/op
c.a.p.SO26504151.fill 1 avgt 3 30.255 ± 0.297 ns/op
c.a.p.SO26504151.fill 10 avgt 3 32.638 ± 7.553 ns/op
c.a.p.SO26504151.fill 1000 avgt 3 592.459 ± 91.413 ns/op
c.a.p.SO26504151.fill 1000000 avgt 3 706187.003 ± 152774.601 ns/op
c.a.p.SO26504151.replace 1 avgt 3 44.366 ± 5.153 ns/op
c.a.p.SO26504151.replace 10 avgt 3 51.778 ± 2.959 ns/op
c.a.p.SO26504151.replace 1000 avgt 3 1385.383 ± 289.319 ns/op
c.a.p.SO26504151.replace 1000000 avgt 3 1486335.886 ± 1807239.775 ns/op
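Worth adding: since Java 11 there is a standard one-liner, String.repeat, which makes all of the workarounds above unnecessary on current JDKs; a minimal sketch:

```java
public class Zeros {
    public static void main(String[] args) {
        int n = 8;
        // String.repeat (Java 11+) repeats the receiver n times
        String zeros = "0".repeat(n);
        System.out.println(zeros); // 00000000
    }
}
```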

Create an n-sized char array and convert it to a String:
char[] myZeroCharArray = new char[n];
for(int i = 0; i < n; i++) myZeroCharArray[i] = '0';
String myZeroString = new String(myZeroCharArray);

See StringUtils in Apache Commons Lang
https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringUtils.html#repeat%28java.lang.String,%20int%29

There isn't a standard JDK way, but Apache Commons (almost a de facto standard) has the StringUtils.repeat() method, e.g.:
String s = StringUtils.repeat('x', 5); // s = "xxxxx"

or the plain old String Format
int n = 10;
String s = String.format("%" + n + "s", "").replace(' ', '0');
System.out.println(s);


Why does a Java compiled regex work slower than the interpreted one in String::split?

I'm trying to improve the following code:
public int applyAsInt(String ipAddress) {
var ipAddressInArray = ipAddress.split("\\.");
...
So I compile the regular expression into a static constant:
private static final Pattern PATTERN_DOT = Pattern.compile(".", Pattern.LITERAL);
public int applyAsInt(String ipAddress) {
var ipAddressInArray = PATTERN_DOT.split(ipAddress);
...
The rest of the code remained unchanged.
To my amazement, the new code is slower than the previous one.
Below are the test results:
Benchmark (ipAddress) Mode Cnt Score Error Units
ConverterBenchmark.mkyongConverter 1.2.3.4 avgt 10 166.456 ± 9.087 ns/op
ConverterBenchmark.mkyongConverter 120.1.34.78 avgt 10 168.548 ± 2.996 ns/op
ConverterBenchmark.mkyongConverter 129.205.201.114 avgt 10 180.754 ± 6.891 ns/op
ConverterBenchmark.mkyong2Converter 1.2.3.4 avgt 10 253.318 ± 4.977 ns/op
ConverterBenchmark.mkyong2Converter 120.1.34.78 avgt 10 263.045 ± 8.373 ns/op
ConverterBenchmark.mkyong2Converter 129.205.201.114 avgt 10 331.376 ± 53.092 ns/op
Help me understand why this is happening, please.
String.split has code aimed at exactly this use case:
https://github.com/openjdk/jdk17u/blob/master/src/java.base/share/classes/java/lang/String.java#L3102
/* fastpath if the regex is a
* (1) one-char String and this character is not one of the
* RegEx's meta characters ".$|()[{^?*+\\", or
* (2) two-char String and the first char is the backslash and
* the second is not the ascii digit or ascii letter.
*/
That means that when using split("\\.") the string is effectively not split using a regular expression - the method splits the string directly at the '.' characters.
This optimization is not possible when you write PATTERN_DOT.split(ipAddress).
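A minimal sketch of the two call styles (hypothetical class name; both produce identical results, but only the first takes the fastpath):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Compiled once, but split() on a Pattern always runs the regex machinery
    private static final Pattern PATTERN_DOT = Pattern.compile(".", Pattern.LITERAL);

    public static void main(String[] args) {
        String ip = "129.205.201.114";
        String[] viaFastpath = ip.split("\\.");      // single-char fastpath, no regex engine
        String[] viaPattern = PATTERN_DOT.split(ip); // full Pattern path
        System.out.println(Arrays.equals(viaFastpath, viaPattern)); // true
    }
}
```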

Java Collections.reverseOrder() vs. Comparator

I am practicing LeetCode and oftentimes I see different ways of reversing data structures. I was wondering if there is a benefit to doing it one way vs. the other?
For example, to create a max heap from a PriorityQueue I can use:
PriorityQueue<Integer> heap = new PriorityQueue<>((a,b) -> b-a);
or
PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
Is the time and space complexity the same? Should I default to a custom comparator, or is Collections.reverseOrder() good?
The first Comparator (the one with the lambda) is wrong because it is subject to integer overflow (for example, with b = Integer.MIN_VALUE and a > 0, it would return a positive value, not a negative one); you should use:
PriorityQueue<Integer> heap = new PriorityQueue<>((a,b) -> Integer.compare(b, a));
Integer.compare is also what Integer.compareTo delegates to, which in turn is what reverseOrder ends up calling.
As said in the other answer, you should use reverseOrder or Comparator.reversed() because it makes the intention clear.
Now, for your question in depth, you should note that:
The lambda (a, b) -> b - a is affected by unboxing and should be read as b.intValue() - a.intValue(); since this is invalid for some values due to integer overflow, it should really be Integer.compare(b.intValue(), a.intValue()).
reverseOrder simply calls b.compareTo(a), which calls Integer.compare(value, other.value), where value is the actual value of the boxed Integer.
The performance difference would amount to:
The cost of the two unboxings
The cost of calling the methods
The JVM's optimizations
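The overflow from the first point can be demonstrated directly; a small sketch:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        Integer a = 1, b = Integer.MIN_VALUE;
        // The naive comparator unboxes and subtracts, wrapping around on overflow
        int naive = b - a;                // Integer.MIN_VALUE - 1 wraps to Integer.MAX_VALUE
        int safe = Integer.compare(b, a); // no arithmetic on the values, no overflow
        System.out.println(naive > 0);    // true: wrong sign, b should compare as smaller
        System.out.println(safe < 0);     // true: correct
    }
}
```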
You could wind up a JMH test (like the one below, based on one I wrote for another answer). I shortened the v1/v2 parameter lists to 3 values each because the full run takes too long (~40 min).
package stackoverflow;
import java.util.*;
import java.util.stream.*;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
@State(Scope.Benchmark)
@Warmup( time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement( time = 1, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode({ Mode.AverageTime})
public class ComparatorBenchmark {
private List<Integer> values;
#Param({ "" + Integer.MIN_VALUE, "" + Integer.MAX_VALUE, "10", "-1000", "5000", "10000", "20000", "50000", "100000" })
private Integer v1;
#Param({ "" + Integer.MIN_VALUE, "" + Integer.MAX_VALUE, "10", "1000", "-5000", "10000", "20000", "50000", "100000" })
private Integer v2;
private Comparator<Integer> cmp1;
private Comparator<Integer> cmp2;
private Comparator<Integer> cmp3;
@Setup
public void setUp() {
cmp1 = (a, b) -> Integer.compare(b, a);
cmp2 = (a, b) -> b - a;
cmp3 = Collections.reverseOrder();
}
@Benchmark
public void with_Integer_compare(Blackhole blackhole) {
blackhole.consume(cmp1.compare(v1, v2));
}
@Benchmark
public void with_b_minus_a(Blackhole blackhole) {
blackhole.consume(cmp2.compare(v1, v2));
}
@Benchmark
public void with_reverse_comparator(Blackhole blackhole) {
blackhole.consume(cmp3.compare(v1, v2));
}
}
Running this on:
Windows 10
Java 17.0.2
AMD Ryzen 7 2700X @ 3.70 GHz
Produces the following result (I limited it to 3 values: MIN_VALUE, MAX_VALUE and 10, because otherwise the ETA is ~40 min):
Benchmark (v1) (v2) Mode Cnt Score Error Units
ComparatorBenchmark.with_Integer_compare -2147483648 -2147483648 avgt 5 1,113 ± 0,074 ns/op
ComparatorBenchmark.with_Integer_compare -2147483648 2147483647 avgt 5 1,111 ± 0,037 ns/op
ComparatorBenchmark.with_Integer_compare -2147483648 10 avgt 5 1,111 ± 0,075 ns/op
ComparatorBenchmark.with_Integer_compare 2147483647 -2147483648 avgt 5 1,122 ± 0,075 ns/op
ComparatorBenchmark.with_Integer_compare 2147483647 2147483647 avgt 5 1,123 ± 0,070 ns/op
ComparatorBenchmark.with_Integer_compare 2147483647 10 avgt 5 1,102 ± 0,039 ns/op
ComparatorBenchmark.with_Integer_compare 10 -2147483648 avgt 5 1,097 ± 0,024 ns/op
ComparatorBenchmark.with_Integer_compare 10 2147483647 avgt 5 1,094 ± 0,019 ns/op
ComparatorBenchmark.with_Integer_compare 10 10 avgt 5 1,097 ± 0,034 ns/op
ComparatorBenchmark.with_b_minus_a -2147483648 -2147483648 avgt 5 1,105 ± 0,054 ns/op
ComparatorBenchmark.with_b_minus_a -2147483648 2147483647 avgt 5 1,099 ± 0,040 ns/op
ComparatorBenchmark.with_b_minus_a -2147483648 10 avgt 5 1,094 ± 0,038 ns/op
ComparatorBenchmark.with_b_minus_a 2147483647 -2147483648 avgt 5 1,112 ± 0,044 ns/op
ComparatorBenchmark.with_b_minus_a 2147483647 2147483647 avgt 5 1,105 ± 0,029 ns/op
ComparatorBenchmark.with_b_minus_a 2147483647 10 avgt 5 1,112 ± 0,068 ns/op
ComparatorBenchmark.with_b_minus_a 10 -2147483648 avgt 5 1,086 ± 0,010 ns/op
ComparatorBenchmark.with_b_minus_a 10 2147483647 avgt 5 1,125 ± 0,084 ns/op
ComparatorBenchmark.with_b_minus_a 10 10 avgt 5 1,125 ± 0,082 ns/op
ComparatorBenchmark.with_reverse_comparator -2147483648 -2147483648 avgt 5 1,121 ± 0,050 ns/op
ComparatorBenchmark.with_reverse_comparator -2147483648 2147483647 avgt 5 1,122 ± 0,067 ns/op
ComparatorBenchmark.with_reverse_comparator -2147483648 10 avgt 5 1,129 ± 0,094 ns/op
ComparatorBenchmark.with_reverse_comparator 2147483647 -2147483648 avgt 5 1,117 ± 0,046 ns/op
ComparatorBenchmark.with_reverse_comparator 2147483647 2147483647 avgt 5 1,122 ± 0,072 ns/op
ComparatorBenchmark.with_reverse_comparator 2147483647 10 avgt 5 1,116 ± 0,080 ns/op
ComparatorBenchmark.with_reverse_comparator 10 -2147483648 avgt 5 1,114 ± 0,052 ns/op
ComparatorBenchmark.with_reverse_comparator 10 2147483647 avgt 5 1,133 ± 0,068 ns/op
ComparatorBenchmark.with_reverse_comparator 10 10 avgt 5 1,134 ± 0,036 ns/op
As you can see, the scores are within the same margin.
You would not gain or lose much with either implementation. As already said, you should use Collections.reverseOrder() to make your intention clear; failing that, use Integer.compare, not a subtraction that is subject to integer overflow, unless you are sure each Integer is bounded (for example, to Short.MIN_VALUE through Short.MAX_VALUE).
The asymptotics are the same, but Collections.reverseOrder() has several important advantages:
It guarantees that it doesn't do an allocation. ((a, b) -> b - a probably doesn't allocate, either, but reverseOrder guarantees a singleton.)
It is clearly self-documenting.
It works for all integers; b-a will break if comparing e.g. Integer.MAX_VALUE and -2 due to overflow.
Yes, the time and space complexity is the same (constant).
Using Collections.reverseOrder() is better because it names the order explicitly, instead of making the reader read the implementation and infer the order.
I think the time and space complexity must be the same, because in the end you're passing a comparator to both constructors, so the only difference is:
//passing specified comparator
PriorityQueue<Integer> heap = new PriorityQueue<>((a,b) -> b-a);
//Collections.reverseOrder returns a comparator with the reverse of the natural ordering
PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
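Either construction behaves the same as a max heap; a quick sanity check:

```java
import java.util.Collections;
import java.util.PriorityQueue;

public class MaxHeapDemo {
    public static void main(String[] args) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Collections.reverseOrder());
        heap.add(1);
        heap.add(5);
        heap.add(3);
        // The head of a reverse-ordered queue is the largest element
        System.out.println(heap.peek()); // 5
    }
}
```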

JMH: strange dependency on the environment

While making my first approaches to using JMH to benchmark my class, I encountered a behavior that confuses me, and I'd like to clarify the issue before moving on.
The situation that confuses me:
When I run the benchmarks while the CPU is loaded (78%-80%) by extraneous processes, the results shown by JMH look quite plausible and stable:
Benchmark Mode Cnt Score Error Units
ArrayOperations.a_bigDecimalAddition avgt 5 264,703 ± 2,800 ns/op
ArrayOperations.b_quadrupleAddition avgt 5 44,290 ± 0,769 ns/op
ArrayOperations.c_bigDecimalSubtraction avgt 5 286,266 ± 2,454 ns/op
ArrayOperations.d_quadrupleSubtraction avgt 5 46,966 ± 0,629 ns/op
ArrayOperations.e_bigDecimalMultiplcation avgt 5 546,535 ± 4,988 ns/op
ArrayOperations.f_quadrupleMultiplcation avgt 5 85,056 ± 1,820 ns/op
ArrayOperations.g_bigDecimalDivision avgt 5 612,814 ± 5,943 ns/op
ArrayOperations.h_quadrupleDivision avgt 5 631,127 ± 4,172 ns/op
Relatively large errors are because I need only a rough estimate right now and I trade precision for quickness deliberately.
But the results obtained without extraneous load on the processor seem amazing to me:
Benchmark Mode Cnt Score Error Units
ArrayOperations.a_bigDecimalAddition avgt 5 684,035 ± 370,722 ns/op
ArrayOperations.b_quadrupleAddition avgt 5 83,743 ± 25,762 ns/op
ArrayOperations.c_bigDecimalSubtraction avgt 5 531,430 ± 184,980 ns/op
ArrayOperations.d_quadrupleSubtraction avgt 5 85,937 ± 103,351 ns/op
ArrayOperations.e_bigDecimalMultiplcation avgt 5 641,953 ± 288,545 ns/op
ArrayOperations.f_quadrupleMultiplcation avgt 5 102,692 ± 31,625 ns/op
ArrayOperations.g_bigDecimalDivision avgt 5 733,727 ± 161,827 ns/op
ArrayOperations.h_quadrupleDivision avgt 5 820,388 ± 546,990 ns/op
Everything seems to work almost twice as slow, iteration times are very unstable (they may vary from 500 to 1300 ns/op between neighboring iterations), and the errors are accordingly unacceptably large.
The first set of results was obtained with a bunch of applications running, including a Folding@home distributed-computing client (FahCore_a7.exe) that takes 75% of CPU time, a BitTorrent client that actively uses the disks, a dozen browser tabs, an e-mail client, etc. Average CPU load is about 85%. During the benchmark execution, FahCore decreases its load so that Java takes 25% and the total load is 100%.
The second set of results was taken with all unnecessary processes stopped: the CPU was practically idle, only Java took its 25%, and a couple of percent were used for system needs.
My CPU is an Intel i5-4460 (4 cores, 3.2 GHz), RAM 32 GB, OS Windows Server 2008 R2.
java version "1.8.0_231"
Java(TM) SE Runtime Environment (build 1.8.0_231-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.231-b11, mixed mode)
The questions are:
Why do the benchmarks show much worse and less stable results when the JVM is the only task loading the machine?
Can I consider the first set of results more or less reliable when they depend on the environment so dramatically?
Should I set up the environment somehow to eliminate this dependency?
Or is my code to blame?
The code:
package com.mvohm.quadruple.benchmarks;
// Required imports here
import com.mvohm.quadruple.Quadruple; // The class under tests
@State(value = Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(java.util.concurrent.TimeUnit.NANOSECONDS)
@Fork(value = 1)
@Warmup(iterations = 3, time = 7)
@Measurement(iterations = 5, time = 10)
public class ArrayOperations {
// To do BigDecimal arithmetic with the precision close to this of Quadruple
private static final MathContext MC_38 = new MathContext(38, RoundingMode.HALF_EVEN);
private static final int DATA_SIZE = 0x1_0000; // 65536
private static final int INDEX_MASK = DATA_SIZE - 1; // 0xFFFF
private static final double RAND_SCALE = 1e39; // To provide a sensible range of operands,
// so that the actual calculations don't get bypassed
private final BigDecimal[] // Data to apply operations to
bdOp1 = new BigDecimal[DATA_SIZE], // BigDecimals
bdOp2 = new BigDecimal[DATA_SIZE],
bdResult = new BigDecimal[DATA_SIZE];
private final Quadruple[]
qOp1 = new Quadruple[DATA_SIZE], // Quadruples
qOp2 = new Quadruple[DATA_SIZE],
qResult = new Quadruple[DATA_SIZE];
private int index = 0;
@Setup
public void initData() {
final Random rand = new Random(12345); // for reproducibility
for (int i = 0; i < DATA_SIZE; i++) {
bdOp1[i] = randomBigDecimal(rand);
bdOp2[i] = randomBigDecimal(rand);
qOp1[i] = randomQuadruple(rand);
qOp2[i] = randomQuadruple(rand);
}
}
private static Quadruple randomQuadruple(Random rand) {
return Quadruple.nextNormalRandom(rand).multiply(RAND_SCALE); // ranged 0 .. 9.99e38
}
private static BigDecimal randomBigDecimal(Random rand) {
return Quadruple.nextNormalRandom(rand).multiply(RAND_SCALE).bigDecimalValue();
}
@Benchmark
public void a_bigDecimalAddition() {
bdResult[index] = bdOp1[index].add(bdOp2[index], MC_38);
index = ++index & INDEX_MASK;
}
@Benchmark
public void b_quadrupleAddition() {
// semantically the same as above
qResult[index] = Quadruple.add(qOp1[index], qOp2[index]);
index = ++index & INDEX_MASK;
}
// Other methods are similar
public static void main(String... args) throws IOException, RunnerException {
final Options opt = new OptionsBuilder()
.include(ArrayOperations.class.getSimpleName())
.forks(1)
.build();
new Runner(opt).run();
}
}
The reason was very simple, and I should have understood it immediately. Power saving mode was enabled in the OS, which reduced the clock frequency of the CPU under low load. The moral is, always disable power saving when benchmarking!

Performance comparison of modulo operator and bitwise AND

I'm working to determine whether a 32-bit integer is even or odd. I have set up 2 approaches:
modulo(%) approach
int r = (i % 2);
bitwise(&) approach
int r = (i & 0x1);
Both approaches work correctly. So I ran each line 15000 times to test performance.
Result:
modulo(%) approach (source code)
mean 141.5801887ns | SD 270.0700275ns
bitwise(&) approach (source code)
mean 141.2504ns | SD 193.6351007ns
Questions:
Why is bitwise (&) more stable than modulo (%)?
Does the JVM optimize modulo (%) using bitwise AND (&), as described here?
Let's try to reproduce with JMH.
@Benchmark
@Measurement(timeUnit = TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
public int first() throws IOException {
return i % 2;
}
@Benchmark
@Measurement(timeUnit = TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
public int second() throws IOException {
return i & 0x1;
}
Okay, it is reproducible. The first is slightly slower than the second. Now let's figure out why. Run it with -prof perfnorm:
Benchmark Mode Cnt Score Error Units
MyBenchmark.first avgt 50 2.674 ± 0.028 ns/op
MyBenchmark.first:CPI avgt 10 0.301 ± 0.002 #/op
MyBenchmark.first:L1-dcache-load-misses avgt 10 0.001 ± 0.001 #/op
MyBenchmark.first:L1-dcache-loads avgt 10 11.011 ± 0.146 #/op
MyBenchmark.first:L1-dcache-stores avgt 10 3.011 ± 0.034 #/op
MyBenchmark.first:L1-icache-load-misses avgt 10 ≈ 10⁻³ #/op
MyBenchmark.first:LLC-load-misses avgt 10 ≈ 10⁻⁴ #/op
MyBenchmark.first:LLC-loads avgt 10 ≈ 10⁻⁴ #/op
MyBenchmark.first:LLC-store-misses avgt 10 ≈ 10⁻⁵ #/op
MyBenchmark.first:LLC-stores avgt 10 ≈ 10⁻⁴ #/op
MyBenchmark.first:branch-misses avgt 10 ≈ 10⁻⁴ #/op
MyBenchmark.first:branches avgt 10 4.006 ± 0.054 #/op
MyBenchmark.first:cycles avgt 10 9.322 ± 0.113 #/op
MyBenchmark.first:dTLB-load-misses avgt 10 ≈ 10⁻⁴ #/op
MyBenchmark.first:dTLB-loads avgt 10 10.939 ± 0.175 #/op
MyBenchmark.first:dTLB-store-misses avgt 10 ≈ 10⁻⁵ #/op
MyBenchmark.first:dTLB-stores avgt 10 2.991 ± 0.045 #/op
MyBenchmark.first:iTLB-load-misses avgt 10 ≈ 10⁻⁵ #/op
MyBenchmark.first:iTLB-loads avgt 10 ≈ 10⁻⁴ #/op
MyBenchmark.first:instructions avgt 10 30.991 ± 0.427 #/op
MyBenchmark.second avgt 50 2.263 ± 0.015 ns/op
MyBenchmark.second:CPI avgt 10 0.320 ± 0.001 #/op
MyBenchmark.second:L1-dcache-load-misses avgt 10 0.001 ± 0.001 #/op
MyBenchmark.second:L1-dcache-loads avgt 10 11.045 ± 0.152 #/op
MyBenchmark.second:L1-dcache-stores avgt 10 3.014 ± 0.032 #/op
MyBenchmark.second:L1-icache-load-misses avgt 10 ≈ 10⁻³ #/op
MyBenchmark.second:LLC-load-misses avgt 10 ≈ 10⁻⁴ #/op
MyBenchmark.second:LLC-loads avgt 10 ≈ 10⁻⁴ #/op
MyBenchmark.second:LLC-store-misses avgt 10 ≈ 10⁻⁵ #/op
MyBenchmark.second:LLC-stores avgt 10 ≈ 10⁻⁴ #/op
MyBenchmark.second:branch-misses avgt 10 ≈ 10⁻⁴ #/op
MyBenchmark.second:branches avgt 10 4.014 ± 0.045 #/op
MyBenchmark.second:cycles avgt 10 8.024 ± 0.098 #/op
MyBenchmark.second:dTLB-load-misses avgt 10 ≈ 10⁻⁵ #/op
MyBenchmark.second:dTLB-loads avgt 10 10.989 ± 0.161 #/op
MyBenchmark.second:dTLB-store-misses avgt 10 ≈ 10⁻⁶ #/op
MyBenchmark.second:dTLB-stores avgt 10 3.004 ± 0.042 #/op
MyBenchmark.second:iTLB-load-misses avgt 10 ≈ 10⁻⁵ #/op
MyBenchmark.second:iTLB-loads avgt 10 ≈ 10⁻⁵ #/op
MyBenchmark.second:instructions avgt 10 25.076 ± 0.296 #/op
Note the difference in cycles and instructions. Now it's kind of obvious: the first has to care about the sign, but the second does not (it is just a bitwise AND). To make sure this is the reason, take a look at the assembly fragments:
first:
0x00007f91111f8355: mov 0xc(%r10),%r11d ;*getfield i
0x00007f91111f8359: mov %r11d,%edx
0x00007f91111f835c: and $0x1,%edx
0x00007f91111f835f: mov %edx,%r10d
0x00007f6bd120a6e2: neg %r10d
0x00007f6bd120a6e5: test %r11d,%r11d
0x00007f6bd120a6e8: cmovl %r10d,%edx
second:
0x00007ff36cbda580: mov $0x1,%edx
0x00007ff36cbda585: mov 0x40(%rsp),%r10
0x00007ff36cbda58a: and 0xc(%r10),%edx
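The sign handling that the extra neg/test/cmovl sequence implements is visible at the Java level too; a small sketch:

```java
public class ModVsAnd {
    public static void main(String[] args) {
        int i = -3;
        // Java's % keeps the sign of the dividend, so the JIT must patch up
        // the bitwise result for negative inputs
        System.out.println(i % 2); // -1
        // Bitwise AND just masks the low bit; the sign is ignored
        System.out.println(i & 1); // 1
    }
}
```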
An execution time of 150 ns is about 500 clock cycles. I don't think there has ever been a processor that went about the business of checking a bit this inefficiently :-).
The problem is that your test harness is flawed in many ways. In particular:
you make no attempt to trigger JIT compilation before starting the clock
System.nanoTime() is not guaranteed to have nanosecond accuracy
System.nanoTime() is quite a bit more expensive to call than the code you want to measure
See How do I write a correct micro-benchmark in Java? for a more complete list of things to keep in mind.
Here's a better benchmark:
public abstract class Benchmark {
final String name;
public Benchmark(String name) {
this.name = name;
}
@Override
public String toString() {
return name + "\t" + time() + " ns / iteration";
}
private BigDecimal time() {
try {
// automatically detect a reasonable iteration count (and trigger just in time compilation of the code under test)
int iterations;
long duration = 0;
for (iterations = 1; iterations < 1_000_000_000 && duration < 1_000_000_000; iterations *= 2) {
long start = System.nanoTime();
run(iterations);
duration = System.nanoTime() - start;
cleanup();
}
return new BigDecimal((duration) * 1000 / iterations).movePointLeft(3);
} catch (Throwable e) {
throw new RuntimeException(e);
}
}
/**
* Executes the code under test.
* @param iterations
* number of iterations to perform
* @return any value that requires the entire code to be executed (to
* prevent dead code elimination by the just in time compiler)
* @throws Throwable
* if the test could not complete successfully
*/
protected abstract Object run(int iterations) throws Throwable;
/**
* Cleans up after a run, setting the stage for the next.
*/
protected void cleanup() {
// do nothing
}
public static void main(String[] args) throws Exception {
System.out.println(new Benchmark("%") {
@Override
protected Object run(int iterations) throws Throwable {
int sum = 0;
for (int i = 0; i < iterations; i++) {
sum += i % 2;
}
return sum;
}
});
System.out.println(new Benchmark("&") {
@Override
protected Object run(int iterations) throws Throwable {
int sum = 0;
for (int i = 0; i < iterations; i++) {
sum += i & 1;
}
return sum;
}
});
}
}
On my machine, it prints:
% 0.375 ns / iteration
& 0.139 ns / iteration
So the difference is, as expected, on the order of a couple clock cycles. That is, & 1 was optimized slightly better by this JIT on this particular hardware, but the difference is so small it is extremely unlikely to have a measurable (let alone significant) impact on the performance of your program.
The two operations correspond to different JVM bytecode instructions:
irem // int remainder (%)
iand // bitwise and (&)
Somewhere I read that irem is usually implemented in software by the JVM, while iand is available directly in hardware. Oracle explains the two instructions as follows:
iand
An int result is calculated by taking the bitwise AND (conjunction) of value1 and value2.
irem
The int result is value1 - (value1 / value2) * value2.
It seems reasonable to me to assume that iand results in fewer CPU cycles.
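The spec's formula for irem can be checked directly against the % operator; a small sketch:

```java
public class IremDemo {
    public static void main(String[] args) {
        int value1 = -7, value2 = 3;
        // irem is specified as value1 - (value1 / value2) * value2
        // (integer division truncates toward zero, so -7 / 3 == -2)
        int spec = value1 - (value1 / value2) * value2;
        System.out.println(spec);                    // -1
        System.out.println(spec == value1 % value2); // true
    }
}
```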

Fastest way of converting uppercase to lowercase and lowercase to uppercase in Java

This is a question about performance. I can convert from uppercase to lowercase and vice versa by using this code:
From lowercase to uppercase:
// Uppercase letters.
class UpperCase {
public static void main(String args[]) {
char ch;
for (int i = 0; i < 10; i++) {
ch = (char) ('a' + i);
System.out.print(ch);
// This statement turns off the 6th bit.
ch = (char) ((int) ch & 65503); // ch is now uppercase
System.out.print(ch + " ");
}
}
}
From uppercase to lowercase:
// Lowercase letters.
class LowerCase {
public static void main(String args[]) {
char ch;
for (int i = 0; i < 10; i++) {
ch = (char) ('A' + i);
System.out.print(ch);
ch = (char) ((int) ch | 32); // ch is now lowercase
System.out.print(ch + " ");
}
}
}
I know that Java provides the methods .toUpperCase() and .toLowerCase(). Thinking about performance, what is the fastest way to do this conversion: using bitwise operations the way I showed in the code above, or using the .toUpperCase() and .toLowerCase() methods? Thank you.
Edit 1: Notice how I am using decimal 65503, which is binary 1111111111011111. I am using 16 bits, not 8. According to the currently top-voted answer at How many bits or bytes are there in a character?:
A Unicode character in UTF-16 encoding is between 16 (2 bytes) and 32 bits (4 bytes), though most of the common characters take 16 bits. This is the encoding used by Windows internally.
The code in my question is assuming UTF-16.
Yes, a method you write yourself will be slightly faster if you perform the case conversion with a simple bitwise operation, whereas Java's methods contain more complex logic to support Unicode characters, not just the ASCII charset.
If you look at String.toLowerCase() you'll notice that there's a lot of logic in there, so if you were working with software that needed to process huge amounts of ASCII only, and nothing else, you might actually see some benefit from using a more direct approach.
But unless you are writing a program that spends most of its time converting ASCII, you won't be able to notice any difference even with a profiler (and if you are writing that kind of a program...you should look for another job).
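One more caveat worth demonstrating: the bit trick blindly flips bit 5 of any character, letter or not, while the library methods leave non-letters alone. A small sketch:

```java
public class CaseTrickCaveat {
    public static void main(String[] args) {
        // Works for ASCII letters:
        System.out.println((char) ('a' & 65503)); // A
        // But corrupts non-letters ('@' is 64; 64 | 32 == 96, the backtick):
        System.out.println((char) ('@' | 32)); // `
        // The library method checks whether the char is a letter first:
        System.out.println(Character.toLowerCase('@')); // @
    }
}
```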
As promised, here are two JMH benchmarks; one comparing Character#toUpperCase to your bitwise method, and the other comparing Character#toLowerCase to your other bitwise method. Note that only characters within the English alphabet were tested.
First Benchmark (to uppercase):
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Fork(3)
public class Test {
#Param({"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
"n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"})
public char c;
@Benchmark
public char toUpperCaseNormal() {
return Character.toUpperCase(c);
}
@Benchmark
public char toUpperCaseBitwise() {
return (char) (c & 65503);
}
}
Output:
Benchmark (c) Mode Cnt Score Error Units
Test.toUpperCaseNormal a avgt 30 2.447 ± 0.028 ns/op
Test.toUpperCaseNormal b avgt 30 2.438 ± 0.035 ns/op
Test.toUpperCaseNormal c avgt 30 2.506 ± 0.083 ns/op
Test.toUpperCaseNormal d avgt 30 2.411 ± 0.010 ns/op
Test.toUpperCaseNormal e avgt 30 2.417 ± 0.010 ns/op
Test.toUpperCaseNormal f avgt 30 2.412 ± 0.005 ns/op
Test.toUpperCaseNormal g avgt 30 2.410 ± 0.004 ns/op
Test.toUpperCaseBitwise a avgt 30 1.758 ± 0.007 ns/op
Test.toUpperCaseBitwise b avgt 30 1.789 ± 0.032 ns/op
Test.toUpperCaseBitwise c avgt 30 1.763 ± 0.005 ns/op
Test.toUpperCaseBitwise d avgt 30 1.763 ± 0.012 ns/op
Test.toUpperCaseBitwise e avgt 30 1.757 ± 0.003 ns/op
Test.toUpperCaseBitwise f avgt 30 1.755 ± 0.003 ns/op
Test.toUpperCaseBitwise g avgt 30 1.759 ± 0.003 ns/op
Second Benchmark (to lowercase):
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Fork(3)
public class Test {
#Param({"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
"N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"})
public char c;
@Benchmark
public char toLowerCaseNormal() {
return Character.toLowerCase(c);
}
@Benchmark
public char toLowerCaseBitwise() {
return (char) (c | 32);
}
}
Output:
Benchmark (c) Mode Cnt Score Error Units
Test.toLowerCaseNormal A avgt 30 2.084 ± 0.007 ns/op
Test.toLowerCaseNormal B avgt 30 2.079 ± 0.006 ns/op
Test.toLowerCaseNormal C avgt 30 2.081 ± 0.005 ns/op
Test.toLowerCaseNormal D avgt 30 2.083 ± 0.010 ns/op
Test.toLowerCaseNormal E avgt 30 2.080 ± 0.005 ns/op
Test.toLowerCaseNormal F avgt 30 2.091 ± 0.020 ns/op
Test.toLowerCaseNormal G avgt 30 2.116 ± 0.061 ns/op
Test.toLowerCaseBitwise A avgt 30 1.708 ± 0.006 ns/op
Test.toLowerCaseBitwise B avgt 30 1.705 ± 0.018 ns/op
Test.toLowerCaseBitwise C avgt 30 1.721 ± 0.022 ns/op
Test.toLowerCaseBitwise D avgt 30 1.718 ± 0.010 ns/op
Test.toLowerCaseBitwise E avgt 30 1.706 ± 0.009 ns/op
Test.toLowerCaseBitwise F avgt 30 1.704 ± 0.004 ns/op
Test.toLowerCaseBitwise G avgt 30 1.711 ± 0.007 ns/op
I've only included a few letters (even though all were tested), as they all share similar results.
It's clear that your bitwise methods are faster, mainly because Character#toUpperCase and Character#toLowerCase perform logical checks (as I mentioned earlier in my comment).
Your code only works for ASCII characters. What about languages where no clear conversion between lowercase and uppercase exists, e.g. the German ß (please correct me if I'm wrong, my German is horrible), or when a letter/symbol is written using a multi-byte UTF-8 code point? Correctness comes before performance, and the problem is not so simple if you have to handle UTF-8, as is evident in the String.toLowerCase(Locale) method.
Just stick to the provided methods .toLowerCase() and .toUpperCase(). Adding two separate classes to reimplement two methods already provided by java.lang is overkill and will slow down your program (by a small margin).
