Improving performance of network coding encoding in Java

I'm currently developing a Java-based library for network coding (http://en.wikipedia.org/wiki/Network_coding). This is very CPU-intensive, so I need some help optimizing the encoding stage. Essentially, I'm creating random linear combinations of the original data, where addition is XOR and multiplication is a Galois-field multiplication in GF(2^16).
I've taken the optimizations as far as I can on my own. For instance, I'm using tricks like this one: http://groups.google.com/group/comp.dsp/browse_thread/thread/cba57ae9db9971fd/7cd21eec39ddae1a?hl=en&lnk=gst&q=Sarwate+Galois#7cd21eec39ddae1a to make the multiplications faster.
I'm therefore looking for tips on how to optimize this further. It's hard to profile, since the profilers I've used don't give any hints about which operation is the most expensive (e.g. is it the array lookup or the XOR?). So I'm at the point where I'm more or less randomly trying out different ideas and testing whether they improve the overall performance.
More specifically some potential areas of improvement that I need help on are:
How can I make sure that Java can skip the bounds-checking on the array-operations?
How can I retrieve the native code that actually executes after HotSpot is done optimizing?
Here's the core of the algorithm. It might be hard to understand out of context but if you see any unnecessarily expensive operations I'm doing then please let me know!
int messageFragmentStart = 0;
int messageFragmentEnd = fragmentCharSize;
int coefficientIndex = fragmentID * messageFragmentsPerDataBlock;
final int resultArrayIndexStart = fragmentID * fragmentCharSize;
// For each message fragment: multiply it by one coefficient (via the log/exp tables)
// and XOR the product into the result block.
for (int messageFragmentIndex = 0; messageFragmentIndex < messageFragmentsPerDataBlock; messageFragmentIndex++) {
    final int coefficientLogValue = coefficientLogValues[coefficientIndex++];
    int resultArrayIndex = resultArrayIndexStart;
    for (int i = messageFragmentStart; i < messageFragmentEnd; i++) {
        final int logSum = coefficientLogValue + logOfDataToEncode[i];
        final int messageMultipliedByCoefficient = expTable[logSum];
        resultArray[resultArrayIndex++] ^= messageMultipliedByCoefficient;
    }
    messageFragmentStart += fragmentCharSize;
    messageFragmentEnd = Math.min(messageFragmentEnd + fragmentCharSize, maxTotalLength);
}

You can't make Java forgo the bounds checking, as it's specified in the JLS. But in most cases the JIT is able to eliminate it, as long as the bounds check is simple (e.g. i < array.length); if not, there's no way to avoid it (well, I assume one could play with Unsafe?).
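One thing that sometimes helps is giving the JIT a loop shape it can reason about: hoist the arrays into locals (or parameters), iterate from 0, and index with base + i. Below is a rough sketch of your inner loop factored out that way; the method and parameter names are made up, but the semantics mirror your snippet.

// Hypothetical helper: XORs one coefficient-scaled fragment into the result.
// Plain 0..length indexing against parameters gives the JIT the best chance
// to prove the bounds and drop the checks.
private static void xorScaledFragment(int[] logOfData, int[] expTable, int[] result,
                                      int dataOffset, int resultOffset, int length,
                                      int coefficientLogValue) {
    for (int i = 0; i < length; i++) {
        final int logSum = coefficientLogValue + logOfData[dataOffset + i];
        result[resultOffset + i] ^= expTable[logSum];
    }
}

Whether this actually removes the bounds checks depends on the JVM version and flags, so verify with the assembly dump mentioned below rather than assuming.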
For your second problem: you can't get post-JIT bytecode, because HotSpot compiles to native machine code, but you can have the JVM dump that generated code with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly (this requires the hsdis disassembler plugin to be installed), which should fulfill the purpose just fine.
But anyhow, from your code it seems like this problem is trivial to vectorize, and sadly the JVM isn't very good at that (if it does it at all). Hence implementing the same code in C/C++ using compiler intrinsics (you could even try the auto-vectorization of ICC/GCC) could lead to quite noticeable speedups, assuming we're not completely memory bound. So I'd implement it in C++ and call it through JNI, at least as a reference point to compare against.

Related

Is Java string concatenation optimisation applied in this case?

Let's imagine I have a lib which contains the following simple method:
private static final String CONSTANT = "Constant";

public static String concatStringWithCondition(String condition) {
    return "Some phrase" + condition + CONSTANT;
}
What if someone wants to use my method in a loop? As I understand it, the string optimisation (where + gets replaced with a StringBuilder or whatever is more optimal) does not apply in that case? Or does it only apply to strings initialised outside of the loop?
I'm using Java 11 (Dropwizard).
Thanks.
No, this is fine.
The only case where string concatenation can be problematic is when you're using a loop to build up one single string. Your method by itself is fine. Callers of your method can, of course, mess things up, but not in a way that's related to your method.
The code as written should be as efficient as making a StringBuilder and appending these 3 parts to it. There is certainly no difference at all between a literal ("Some phrase") and an expression that the compiler can treat as a compile-time constant (which CONSTANT, here, clearly is, given that it is static, final, not null, and of a constant-able type: all primitives and String).
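As a small illustration of that (this snippet is mine, not from the question): when every operand is a compile-time constant, javac folds the whole expression into a single literal in the class file, so no concatenation happens at runtime at all.

private static final String CONSTANT = "Constant";
// Folded by javac into the single literal "Some phraseConstant" at compile time.
private static final String FOLDED = "Some phrase" + CONSTANT;

It's only the presence of the non-constant condition parameter that forces any runtime work in the first place.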
However, is that 'efficient'? I doubt it; making a StringBuilder is not particularly cheap either. It's orders of magnitude cheaper than continually making new strings, sure, but there's always a bigger fish:
It doesn't matter
Computers are fast. Really, really fast. It is highly likely that you can write this incredibly badly (performance wise) and it still won't be measurable. You won't even notice. Less than a millisecond slower.
In general, anybody who worries about performance at this level simply lacks perspective and knowledge: if you apply that level of fretting to your Java code, and you have the knowledge to know what could in theory be non-perfectly-performant, you'll be sweating every third character you ever type. That's no way to program. So gain that perspective (or take it from me; "just git gud" is not exactly something you can do in a week, so take it on faith for now, and verify as you learn) and don't worry about it. Unless you actually run into a situation where the code is slower than it feels like it could be, or slower than it needs to be: then toss profilers and microbenchmark frameworks at it, and THEN, armed with all that information (and not before!), consider optimizing. The reports tell you what to optimize, because literally less than 1% of the code is responsible for 99% of the performance loss; spending any time on code that isn't in that 1% is an utter waste of time, which is why you must get those reports first, or not start at all.
... or perhaps it does
But if it does matter, and it's really that 1% of the code that is responsible for 99% of the loss, then usually you need to go a little further than just 'optimize the method'. Optimize the entire pipeline.
What is happening with this string? Take that into consideration.
For example, let's say that it, itself, is being appended to a much bigger StringBuilder. In that case, making a tiny StringBuilder here is incredibly inefficient compared to rewriting the method to:
public static void concatStringWithCondition(StringBuilder sb, String condition) {
    sb.append("Some phrase").append(condition).append(CONSTANT);
}
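A caller that is assembling a bigger string would then pass its own builder in; for example (the surrounding content here is made up):

StringBuilder page = new StringBuilder(256);
page.append("Report: ");
concatStringWithCondition(page, condition);
page.append(" -- end");
String report = page.toString();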
Or, perhaps this data is being turned into bytes using UTF_8 and then tossed onto a web socket. In that case:
private static final byte[] PREFIX = "Some phrase".getBytes(StandardCharsets.UTF_8);
private static final byte[] SUFFIX = "Some Constant".getBytes(StandardCharsets.UTF_8);

// note: the writes can throw IOException, so the method must declare it
public void concatStringWithCondition(OutputStream out, String condition) throws IOException {
    out.write(PREFIX);
    out.write(condition.getBytes(StandardCharsets.UTF_8));
    out.write(SUFFIX);
}
and check whether that OutputStream is buffered. If not, make it buffered; that will help a ton and would completely dwarf the cost of not using string concatenation. If the 'condition' string can get quite large, the above is no good either: you want a CharsetEncoder that encodes straight to the OutputStream, and may even want to replace all of that with some ByteBuffer-based approach.
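For completeness, here is a rough sketch of that CharsetEncoder idea: it encodes a CharSequence straight into a reused ByteBuffer and drains it to the OutputStream, so no intermediate byte array or String is created. This is my own sketch (not from the original answer); it is not thread-safe, and the buffer size is arbitrary.

import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public final class StreamingEncoder {
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
    private final ByteBuffer buffer = ByteBuffer.allocate(8192); // reused scratch buffer

    // Encodes 'text' directly to 'out' without building an intermediate byte array.
    public void write(OutputStream out, CharSequence text) throws IOException {
        encoder.reset();
        CharBuffer in = CharBuffer.wrap(text);
        while (true) {
            CoderResult result = encoder.encode(in, buffer, true);
            drain(out);
            if (result.isUnderflow()) {
                break;
            }
        }
        while (!encoder.flush(buffer).isUnderflow()) {
            drain(out);
        }
        drain(out);
    }

    private void drain(OutputStream out) throws IOException {
        buffer.flip();
        if (buffer.hasRemaining()) {
            out.write(buffer.array(), buffer.position(), buffer.remaining());
        }
        buffer.clear();
    }
}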
Conclusion
Assume performance is never relevant until it is.
IF performance truly must be tackled, strap in, it'll take ages to do it right. Doing it 'wrong' (applying dumb rules of thumb that do not work) isn't useful. Either do it right, or don't do it.
IF you're still on board, always start with profiler reports and use JMH to gather information (a minimal JMH sketch is included just after this list).
Be prepared to rewrite the pipeline - change the method signatures, in order to optimize.
That means that micro-optimizing, which usually sacrifices nice abstracted APIs, is actively bad for performance - because changing pipelines is considerably more difficult if all code is micro-optimized, given that this usually comes at the cost of abstraction.
And now the circle is complete: point 5 shows why the worrying about performance you are doing in this question is in fact detrimental. It is far too likely that this worry results in you 'optimizing' some code in a way that doesn't actually run faster (because the JVM is a complex beast), and even if it did, it is irrelevant, because the code path this code is on is literally only 0.01% or less of the total runtime expenditure. In the meantime you've made your APIs worse and lost abstraction, which would make any actually useful optimization much harder than it needs to be.
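For reference, a minimal JMH harness for this exact question could look like the sketch below. This assumes JMH is set up in your build; the benchmark class and parameter values are made up, and on a modern JDK the two variants should come out roughly equal.

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@Fork(1)
@Warmup(iterations = 3)
@Measurement(iterations = 5)
public class ConcatBenchmark {

    // Vary the input so the result isn't specific to one string length.
    @Param({"x", "a considerably longer condition to vary the input size"})
    public String condition;

    @Benchmark
    public String plusConcatenation() {
        return "Some phrase" + condition + "Constant";
    }

    @Benchmark
    public String explicitBuilder() {
        return new StringBuilder("Some phrase").append(condition).append("Constant").toString();
    }
}

Run it through the JMH runner (for example the self-contained benchmarks jar that the standard JMH Maven setup produces) and let the report, not intuition, decide.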
But I really want rules of thumb!
Alright, fine. Here are two easy rules of thumb that will lead to better performance:
When in Rome...
The JVM is an optimising marvel and will run even the craziest code quite quickly. However, it does this primarily by being a giant pattern-matching machine: it finds recognizable code snippets and rewrites them into the fastest, most carefully tuned machine code it can for juuust your combination of hardware. But this pattern machine isn't voodoo magic: it has a limited set of patterns. Which patterns do JVM makers ship with their JVMs? Why, the common patterns, of course. Why include a pattern for exotic code virtually nobody ever writes? Waste of space.
So, write code the way Java programmers tend to write it. Which very much means: do not write crazy code just because you think it might be faster. It'll likely be slower. Just follow the crowd.
Trivial example:
Which one is faster:
List<String> list = new ArrayList<String>();
for (int i = 0; i < 10000; i++) list.add(someRandomName());

// option 1:
String[] arr = list.toArray(new String[list.size()]);

// option 2:
String[] arr = list.toArray(new String[0]);
You might think: obviously, option 1, right? Option 2 'wastes' a string array, making a 0-length array just to toss it in the garbage right after. But you'd be wrong: option 2 is in fact faster. If you want an explanation: the JVM recognizes it and does a hacky move: it makes a new string array that does not need to be initialized with all zeroes first. Normal Java code cannot do this (arrays are necessarily initialized blank, to prevent memory corruption issues), but specifically .toArray(new X[0])? Those pattern-matching machines I told you about detect this and replace it with code that just blits the refs straight into a patch of memory without wasting time writing zeroes to it first.
It's a subtle difference that is highly unlikely to matter - it just highlights: Your instincts? They will mislead you every time.
Fortunately, .toArray(new X[0]) is common Java code, and it's easier and shorter. So just write nice, convenient code that looks like what other folks write, and you'd have gotten the right answer here, without having to know such crazy esoterics as reasoning out how the JVM needs to waste time zeroing out that array and how HotSpot's pattern matching might eliminate this, thus making it faster. That's just one of 5 million things you'd have to know, and nobody can know them all. Thus: just write Java code in a simple, common style.
Algorithmic complexity is a thing HotSpot can't fix for you
Given an O(n^3) algorithm fighting an O(log(n) * n^2) algorithm, make n large enough and the second algorithm has to win; that's what big-O notation means. The JVM can do a lot of magic, but it can pretty much never optimize an algorithm into a faster class of algorithmic complexity. You might be surprised at how large n has to be before algorithmic complexity dominates, but it is perfectly acceptable to realize that your algorithm could be in a fundamentally faster complexity class and to do the work of rewriting it, even without profiler reports and benchmark harnesses and the like.
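A toy illustration of that last point (the collection names here are made up, both assumed to be List<String>): the JIT will happily speed up both versions below, but it will never turn the first one into the second.

// O(n * m): scans the whole allNames list once per candidate
List<String> slowMatches = new ArrayList<>();
for (String s : candidates) {
    if (allNames.contains(s)) slowMatches.add(s);
}

// O(n + m): build a HashSet once, then do constant-time lookups
Set<String> nameSet = new HashSet<>(allNames);
List<String> fastMatches = new ArrayList<>();
for (String s : candidates) {
    if (nameSet.contains(s)) fastMatches.add(s);
}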

Implementing nonlinear optimization with nonlinear inequality constraints with java

How do I implement a nonlinear optimization with nonlinear constraints in java? I am currently using org.apache.commons.math3.optim.nonlinear.scalar.noderiv, and I have read that none of the optimizers (such as the one I am currently working with, SimplexOptimizer) take constraints by default, but that instead one must map the constrained parameters to unconstrained ones by implementing the MultivariateFunctionPenaltyAdapter or MultivariateFunctionMappingAdapter classes. However, as far as I can tell, even using these wrappers, one can still only implement linear or "simple" constraints. I am wondering if there is any way to include nonlinear inequality constraints?
For example, suppose that my objective function is a function of 3 parameters a, b, and c (depending on them non-linearly), and that additionally these parameters are subject to the constraint that ab
Any advice that would solve the problem using just apache commons would be great, but any suggestions for extending existing classes or augmenting the package would also be welcome of course.
My best attempt so far at implementing the COBYLA package is given below:
public static double[] Optimize(double[][] contractDataMatrix, double[] minData, double[] maxData, double[] modelData, String modelType, String weightType) {
    ObjectiveFunction objective = new ObjectiveFunction(contractDataMatrix, modelType, weightType);
    double rhobeg = 0.5;
    double rhoend = 1.0e-6;
    int iprint = 3;
    int maxfun = 3500;
    int n = modelData.length;
    Calcfc calcfc = new Calcfc() {
        @Override
        public double Compute(int n, int m, double[] x, double[] con) {
            con[0] = x[3] * x[3] - 2 * x[0] * x[1];
            System.out.println("constraint: " + (x[3] * x[3] - 2 * x[0] * x[1]));
            return objective.value(x);
        }
    };
    COBYLAExitStatus result = COBYLA.FindMinimum(calcfc, n, 1, modelData, rhobeg, rhoend, iprint, maxfun);
    return modelData;
}
The issue is that I am still getting illegal values in my optimization. As you can see, within the anonymous override of the Compute function, I am printing out the value of my constraint. The result is often negative. But shouldn't this value be constrained to be non-negative?
EDIT: I found the bug in my code, which was unrelated to the optimizer itself but rather my implementation.
Best,
Paul
You might want to consider an optimizer that is not available in Apache Commons Math. COBYLA is a derivative-free method for relatively small optimization problems (less than 100 variables) with nonlinear constraints. I have ported the original Fortran code to Java, the source code is here.
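If you would rather stay purely within Apache Commons Math, the usual workaround for a nonlinear inequality constraint g(x) >= 0 is a penalty formulation: wrap the objective so that infeasible points become artificially expensive, and hand the wrapped function to SimplexOptimizer as usual. A rough sketch is below; this class is not part of the library, the penalty weight needs tuning, and the constraint is only enforced approximately (for exact feasibility a dedicated constrained solver such as the COBYLA port mentioned above is the better tool).

import org.apache.commons.math3.analysis.MultivariateFunction;

// Wraps an objective so that violating g(x) >= 0 adds a quadratic penalty.
public final class PenalizedObjective implements MultivariateFunction {
    private final MultivariateFunction objective;
    private final MultivariateFunction constraint; // should be >= 0 at feasible points
    private final double penaltyWeight;

    public PenalizedObjective(MultivariateFunction objective,
                              MultivariateFunction constraint,
                              double penaltyWeight) {
        this.objective = objective;
        this.constraint = constraint;
        this.penaltyWeight = penaltyWeight;
    }

    @Override
    public double value(double[] x) {
        double violation = Math.min(constraint.value(x), 0.0); // 0 when feasible
        return objective.value(x) + penaltyWeight * violation * violation;
    }
}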

Simple: efficiency of ' if '

I do not have a teacher who I can ask questions about efficiency, so I will ask it here.
If I am only looking to have fast working code, not paying attention to RAM use, only CPU:
I assume that checking 'if' once is faster than writing a variable once. But what is the ratio? When is it worth always checking if the variable is not already at the value that I am going to set it to?
For example:
// ex. 1
int a = 5;
while (true) {
    a = 5;
}

// ex. 2
int a = 5;
while (true) {
    if (a != 5) a = 5;
}

// ex. 3
int a = 6;
while (true) {
    if (a != 5) a = 5;
    a = 6;
}
I guess ex. 2 will run faster than ex. 1 because 'a' always stays at 5. In this case the 'if' speeds up the process by not writing a new value to 'a' every time. But if 'a' changes often, like in ex. 3, then checking if (a != 5) is not necessary and slows down the process. So this check is worth it if the variable stays the same most of the time, and not worth it if the variable changes most of the time. But where is the ratio? Maybe writing a variable takes 1000 times more time than just checking it? Or maybe writing takes almost the same time as checking? I'm not asking for an exact answer, I just always wonder what is best for my code.
Short answer: it doesn't matter.
Long answer: it really doesn't matter at that low level. Even if you were to actually compare the executed machine code, there are so many things in between (the JIT compiler for one, all sorts of CPU caches for another).
Gone are the times when you needed to micro-optimize things like this. What you need to make sure is that you're using efficient algorithms. And as always, premature optimization is the root of all evil.
I noted that you wrote "I just always wonder what is the best way for my code". The best way is to write clear code, so that other people can understand what you're doing (if they saw code like in your examples, they would think you're insane). Another old adage was that in order for the JVM to optimize your code in the best way, you should write "dumb code". The JIT optimizer can then understand the code better and convert it to a more efficient form.

Good choice for a lightweight checksum algorithm?

I find myself needing to generate a checksum for a string of data, for consistency purposes. The broad idea is that the client can regenerate the checksum based on the payload it receives and thus detect any corruption that took place in transit. I am vaguely aware that there are all kinds of mathematical principles behind this kind of thing, and that it's very easy for subtle errors to make the whole algorithm ineffective if you try to roll it yourself.
So I'm looking for advice on a hashing/checksum algorithm with the following criteria:
It will be generated by Javascript, so needs to be relatively light computationally.
The validation will be done by Java (though I cannot see this actually being an issue).
It will take textual input (URL-encoded Unicode, which I believe is ASCII) of a moderate length; typically around 200-300 characters and in all cases below 2000.
The output should be ASCII text as well, and the shorter it can be the better.
I'm primarily interested in something lightweight rather than getting the absolute smallest potential for collisions possible. Would I be naive to imagine that an eight-character hash would be suitable for this? I should also clarify that it's not the end of the world if corruption isn't picked up at the validation stage (and I do realise that this will not be 100% reliable), though the rest of my code is markedly less efficient for every corrupt entry that slips through.
Edit - thanks to all who contributed. I went with the Adler32 option: given that it is natively supported in Java, extremely easy to implement in JavaScript, fast to calculate at both ends, and has a short (8 hex character) output, it was exactly right for my requirements.
(Note that I realise that the network transport is unlikely to be responsible for any corruption errors and won't be folding my arms on this issue just yet; however adding the checksum validation removes one point of failure and means we can focus on other areas should this reoccur.)
CRC32 is not too hard to implement in any language, it is good enough to detect simple data corruption, and when implemented well it is very fast. However, you can also try Adler32, which is almost as good as CRC32 but even easier to implement (and about equally fast).
Adler32 in the Wikipedia
CRC32 JavaScript implementation sample
Both of these are available in Java right out of the box, in java.util.zip.
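On the Java side, validating either checksum is only a few lines with java.util.zip; a minimal sketch (the payload string is just a placeholder):

import java.nio.charset.StandardCharsets;
import java.util.zip.Adler32;
import java.util.zip.CRC32;

public class ChecksumCheck {
    public static void main(String[] args) {
        byte[] payload = "some url-encoded payload".getBytes(StandardCharsets.US_ASCII);

        Adler32 adler = new Adler32();
        adler.update(payload, 0, payload.length);
        System.out.println("Adler-32: " + Long.toHexString(adler.getValue()));

        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        System.out.println("CRC-32:   " + Long.toHexString(crc.getValue()));
    }
}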
Are you aware that both TCP and UDP (and IP, and Ethernet, and...) already provide checksum protection for data in transit?
Unless you're doing something really weird, if you're seeing corruption, something is very wrong. I suggest starting with a memory tester.
Also, you receive strong data integrity protection if you use SSL/TLS.
Javascript implementation of MD4, MD5 and SHA1. BSD license.
Other people have mentioned CRC32 already, but here's a link to the W3C implementation of CRC-32 for PNG, as one of the few well-known, reputable sites with a reference CRC implementation.
(A few years back I tried to find a well-known site with a CRC algorithm or at least one that cited the source for its algorithm, & was almost tearing my hair out until I found the PNG page.)
[UPDATE 30/5/2013: The link to the old JS CRC32 implementation died, so I've now linked to a different one.]
Google CRC32: fast, and much lighter weight than MD5 et al. There is a Javascript implementation here.
In my search for a JavaScript implementation of a good checksum algorithm I came across this question. Andrzej Doyle rightfully chose Adler32 as the checksum, as it is indeed easy to implement and has some excellent properties. DroidOS then provided an actual implementation in JavaScript, which demonstrated its simplicity.
However, the algorithm can be improved further, as detailed on the Wikipedia page and as implemented below. The trick is that you need not compute the modulo in each step; rather, you can defer it to the end. This considerably increases the speed of the implementation, up to 6x faster on Chrome and Safari. In addition, this optimisation does not affect the readability of the code, making it a win-win. As such, it definitely fits the original question's requirement of an algorithm / implementation that is computationally light.
function adler32(data) {
    var MOD_ADLER = 65521;
    var a = 1, b = 0;
    var len = data.length;
    for (var i = 0; i < len; i++) {
        a += data.charCodeAt(i);
        b += a;
    }
    a %= MOD_ADLER;
    b %= MOD_ADLER;
    return (b << 16) | a;
}
edit: imaya created a jsperf comparison a while back showing the difference in speed when running the simple version, as detailed by DroidOS, compared to an optimised version that defers the modulo operation. I have added the above implementation under the name full-length to the jsperf page showing that the above implementation is about 25% faster than the one from imaya and about 570% faster than the simple implementation (tests run on Chrome 30): http://jsperf.com/adler-32-simple-vs-optimized/6
edit2: please don't forget that, when working on large files, you will eventually hit the limit of your JavaScript implementation in terms of the a and b variables. As such, when working with a large data source, you should perform intermediate modulo operations as to ensure that you do not exceed the maximum value of the integer that you can reliably store.
Use SHA-1 JS implementation. It's not as slow as you think (Firefox 3.0 on Core 2 Duo 2.4Ghz hashes over 100KB per second).
Here's a relatively simple one I've 'invented' - there's no mathematical research behind it but it's extremely fast and works in practice. I've also included the Java equivalent that tests the algorithm and shows that there's less than 1 in 10,000,000 chance of failure (it takes a minute or two to run).
JavaScript
function getCrc(s) {
    var result = 0;
    for (var i = 0; i < s.length; i++) {
        var c = s.charCodeAt(i);
        result = (result << 1) ^ c;
    }
    return result;
}
Java
package test;

import java.util.*;

public class SimpleCrc {

    public static void main(String[] args) {
        final Random randomGenerator = new Random();
        int lastCrc = -1;
        int dupes = 0;
        for (int i = 0; i < 10000000; i++) {
            final StringBuilder sb = new StringBuilder();
            for (int j = 0; j < 1000; j++) {
                final char c = (char) (randomGenerator.nextInt(128 - 32) + 32);
                sb.append(c);
            }
            final int crc = crc(sb.toString());
            if (lastCrc == crc) {
                dupes++;
            }
            lastCrc = crc;
        }
        System.out.println("Dupes: " + dupes);
    }

    public static int crc(String string) {
        int result = 0;
        for (final char c : string.toCharArray()) {
            result = (result << 1) ^ c;
        }
        return result;
    }
}
This is a rather old thread, but I suspect it is still viewed quite often, so if all you need is a short but reliable piece of code to generate a checksum, the Adler-32 algorithm has to be your choice. Here is the JavaScript code:
function adler32(data) {
    var MOD_ADLER = 65521;
    var a = 1, b = 0;
    for (var i = 0; i < data.length; i++) {
        a = (a + data.charCodeAt(i)) % MOD_ADLER;
        b = (b + a) % MOD_ADLER;
    }
    var adler = a | (b << 16);
    return adler;
}
The corresponding fiddle demonstrating the algorithm in action is here.

Compile time optimization of String concatenation

I'm curious how far the optimization of the following code snippet will go.
As far as I know, whenever the capacity of a StringBuilder is exceeded, it costs some CPU work, because its contents need to be reallocated. However, I would guess Java compiler optimization could precalculate the required capacity instead of doing multiple reallocations.
The question is: will the following snippet of code be optimized so?
public static String getGetRequestURL(String baseURL, Map<String, String> parameters) {
    StringBuilder stringBuilder = new StringBuilder();
    parameters.forEach(
            (key, value) -> stringBuilder.append(key).append("=").append(value).append("&"));
    // drop the trailing '&'
    return baseURL + "?" + stringBuilder.deleteCharAt(stringBuilder.length() - 1);
}
In Java, most optimization is performed by the runtime's just-in-time compiler, so generally javac optimizations don't matter much.
As a consequence, a Java compiler is not required to optimize string concatenation, though all tend to do so as long as no loops are involved. You can check the extent of such compile-time optimizations by using javap (the class file disassembler included with the JDK).
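For example, assuming the method above ends up in a class called UrlBuilder (the name is made up), you can inspect what javac emitted with:

javap -c -p UrlBuilder

On Java 9 and later, the baseURL + "?" + ... expression typically compiles to a single invokedynamic call bootstrapped by StringConcatFactory.makeConcatWithConstants; on Java 8 and earlier, javac emits an explicit StringBuilder append chain instead.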
So, could javac conceivably optimize this? To determine the length of the string builder, it would have to iterate the map twice. Since Java does not feature const references, and the compiler has no special treatment for Map, the compiler cannot determine that this rewrite would preserve the meaning of the code. And even if it could, it's not at all clear that the gains would be worth the cost of iterating twice. After all, modern processors can copy 4 to 8 characters in a single CPU instruction, and since the memory access is sequential, there won't be many cache misses while growing the buffer. On the other hand, iterating the map a second time will likely cause additional cache misses, because the Map entries (and the strings they reference) can be scattered all over main memory.
In any case, I would not worry about the efficiency of this code. Even if your URL is 1000 characters long, resizing the buffer will take about 0.1 microseconds. Unless you have evidence that this really is a performance hotspot, your time is probably better spent elsewhere.
First of all:
You can find out what (javac) compile time optimizations occur by looking at the bytecodes using the javap tool.
You can find out what JIT compiler optimizations are performed by getting the JVM to dump the native code.
So, if you need to know how your code has been optimized (on a particular platform) for practical reasons, then you should check.
In reality, the optimizations by javac are pretty simple-minded, and do not go to the extent of precalculating buffer sizes. I haven't checked, but I expect that the same is true for the JIT compiler. I doubt that it makes any attempt to preallocate a StringBuilder with an "optimal" size.
Why?
The reasons include the following:
An inaccurate precalculation (on average) doesn't help, and may be worse than doing nothing.
An accurate precalculation typically involves measuring the (dynamic) lengths of the actual strings to be joined.
Implementing the optimization logic would be complicated, and would make the optimizers slower and more effort to maintain.
At runtime, measuring the strings introduces overhead. Whether you would come out ahead often enough to make a difference is difficult to determine. (People don't like optimizations that make their code run slower ...)
There are better (more efficient) ways to do large scale text assembly than using String concatenation. The programmer (who has more knowledge of the problem domain and application logic) can optimize this better than a compiler. If it matters enough to spend the developer effort on this.
One optimization is to put the baseURL and the '?' separator into the StringBuilder up front, instead of using string concatenation at the end, such as:
public static String getGetRequestURL(String baseURL, Map<String, String> parameters) {
    StringBuilder stringBuilder = new StringBuilder(baseURL);
    stringBuilder.append("?"); // '?' starts the query string
    parameters.forEach((key, value) -> stringBuilder.append(key).append("=").append(value).append("&"));
    stringBuilder.setLength(stringBuilder.length() - 1); // drop the trailing '&'
    return stringBuilder.toString();
}
If you want a little more speed, and since neither javac nor the JIT will optimize based on the potential string size, you can track that yourself without incurring much overhead by adding a max-size tracker, such as this:
protected static int URL_SIZE = 256;

public static String getGetRequestURL(String baseURL, Map<String, String> parameters) {
    StringBuilder stringBuilder = new StringBuilder(URL_SIZE);
    stringBuilder.append(baseURL);
    stringBuilder.append("?");
    parameters.forEach((key, value) -> stringBuilder.append(key).append("=").append(value).append("&"));
    int size = stringBuilder.length();
    if (size > URL_SIZE) {
        URL_SIZE = size;
    }
    stringBuilder.setLength(size - 1);
    return stringBuilder.toString();
}
That said, with some testing over 1 million calls, I found that the different versions performed as follows (in milliseconds):
Your version: total = 1151, average = 230
Above version 1: total = 936, average = 187
Above version 2: total = 839, average = 167
