I have a requirement to set the values in a byte array of size 20 MB.
I'm looking for a Java API that does the following. I've gone through Apache Commons ArrayUtils but couldn't find anything useful.
The operation should be something of this type. Say the values range from 0 to 100.
I'd like to manipulate the array such that values less than 15 are changed to 15 and values greater than 70 are changed to 70.
Basically, I'm looking for an operation that would avoid me doing this: iterate through the array; if a value is below 15, set it to 15; if it is above 70, set it to 70.
Any help is appreciated.
Even if there's some third-party library which has this functionality, it's just going to be doing exactly the same operation - looping over an array. Fundamentally you need something like:
for (int i = 0; i < array.length; i++)
{
array[i] = clamp(array[i], 15, 70);
}
...
public static byte clamp(byte value, byte min, byte max)
{
return value < min ? min
: value > max ? max
: value;
}
You could implement this in native code if you really wanted, but I suspect you won't find an existing implementation. It's more likely that there are libraries which perform the sort of image manipulation you're interested in as image manipulation rather than as an array operation.
You could use Guava's Lists.transform method to update the values. However, this results in a new array rather than updating the values in the existing one.
List<Byte> list = Bytes.asList(myArray);
List<Byte> trans = Lists.transform(list, new Function<Byte, Byte>(){...});
byte[] bytes = Bytes.toArray(trans);
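The Function body is elided above; one possible shape for it, using the 15/70 clamp from the question, is sketched below. It assumes the usual Guava classes (com.google.common.base.Function, com.google.common.collect.Lists and com.google.common.primitives.Bytes) and is an illustration rather than a drop-in solution:
// Clamp each boxed byte into the range [15, 70].
Function<Byte, Byte> clamp = new Function<Byte, Byte>() {
    @Override
    public Byte apply(Byte value) {
        if (value < 15) return (byte) 15;
        if (value > 70) return (byte) 70;
        return value;
    }
};
List<Byte> trans = Lists.transform(Bytes.asList(myArray), clamp);
byte[] bytes = Bytes.toArray(trans);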
However, given what you are trying to do, I would suggest just looping over the values.
I'd recommend that you write the simple loop and profile it in the context of your application. Only if you can demonstrate that this code is the overall bottleneck would it make sense to try to make it faster.
I'd try something like this:
final int n = array.length;
for (int i = 0; i < n; i++) {
int val = array[i];
if (val < 15) {
array[i] = 15;
} else if (val > 70) {
array[i] = 70;
}
}
My final point is that this type of code is likely to be limited by memory bandwidth, so it seems unlikely that a native C solution would be a lot faster anyway.
Instead of checking the ranges as Jon Skeet proposes, you could create a lookup table for each of the 256 possible values a byte can have, i.e. something like
{15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,16,17,18,...,69,70,70,70,70,...}
for (int i = 0; i < len; i++)
{
array[i] = lookup[array[i]];
}
In C: Less branching, much faster. In Java: Unfortunately not faster, even a bit slower, maybe because Java's array range checks eat up the speed gained; and since Java's bytes are always signed, it's a bit more complicated than shown above.
In C, you could even do that for 16-bit halfwords, making it faster again (probably by a factor of 2).
EDIT: To my own shame, I must admit that proper testing revealed that the lookup table isn't faster in C. My first results were probably skewed by compiler optimisations. Anyway, at least on my machine,
if (array[i]<15) array[i]=15;
else if (array[i]>70) array[i]=70;
is noticeably faster than using the ternary operator.
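As for the signed-byte complication mentioned above: since Java's byte is signed (-128 to 127), the value has to be masked before it can be used as a table index. A minimal sketch of that variant, reusing the 15/70 bounds from the question (everything else is illustrative):
// Build the 256-entry table once: identity inside [15, 70], clamped outside.
byte[] lookup = new byte[256];
for (int v = 0; v < 256; v++) {
    byte b = (byte) v;                              // v reinterpreted as a signed byte
    lookup[v] = (byte) (b < 15 ? 15 : (b > 70 ? 70 : b));
}
// Apply it; & 0xFF maps the signed byte onto an index in 0..255.
for (int i = 0; i < array.length; i++) {
    array[i] = lookup[array[i] & 0xFF];
}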
Assume I have a Java BitSet. I now need to make combinations of the BitSet such that only bits which are set can be flipped, i.e. I only need combinations of the bits that are set.
For example: BitSet - 1010, Combinations - 1010, 1000, 0010, 0000
BitSet - 1100, Combinations - 1100, 1000, 0100, 0000
I can think of a few solutions, e.g. I can take combinations of all 4 bits and then XOR the combinations with the original BitSet. But this would be very resource-intensive for large, sparse BitSets, so I was looking for a more elegant solution.
It appears that you want to get the power set of the bit set. There is already an answer here about how to get the power set of a Set<T>. Here, I will show a modified version of the algorithm shown in that post, using BitSets:
private static Set<BitSet> powerset(BitSet set) {
Set<BitSet> sets = new HashSet<>();
if (set.isEmpty()) {
sets.add(new BitSet(0));
return sets;
}
Integer head = set.nextSetBit(0);
BitSet rest = set.get(0, set.size());
rest.clear(head);
for (BitSet s : powerset(rest)) {
BitSet newSet = s.get(0, s.size());
newSet.set(head);
sets.add(newSet);
sets.add(s);
}
return sets;
}
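For example, for the 1010 case from the question (bits 1 and 3 set, counting the rightmost bit as index 0), a quick check could look like this (illustrative only; the order of the printed sets is unspecified because a HashSet is used):
BitSet input = new BitSet();
input.set(1);
input.set(3);                        // input is {1, 3}, i.e. 1010
for (BitSet s : powerset(input)) {
    System.out.println(s);           // prints {}, {1}, {3} and {1, 3} in some order
}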
You can perform the operation in a single linear pass instead of using recursion, if you realize that integer numbers are a computer's intrinsic variant of "on off" patterns, and that iterating over the appropriate integer range will ultimately produce all possible permutations. The only challenge in your case is to transfer the densely packed bits of an integer number to the target bits of a BitSet.
Here is such a solution:
static List<BitSet> powerset(BitSet set) {
int nBits = set.cardinality();
if(nBits > 30) throw new OutOfMemoryError(
"Not enough memory for "+BigInteger.ONE.shiftLeft(nBits)+" BitSets");
int max = 1 << nBits;
int[] targetBits = set.stream().toArray();
List<BitSet> sets = new ArrayList<>(max);
for(int onOff = 0; onOff < max; onOff++) {
BitSet next = new BitSet(set.size());
for(int bitsToSet = onOff, ix = 0; bitsToSet != 0; ix++, bitsToSet>>>=1) {
if((bitsToSet & 1) == 0) {
int skip = Integer.numberOfTrailingZeros(bitsToSet);
ix += skip;
bitsToSet >>>= skip;
}
next.set(targetBits[ix]);
}
sets.add(next);
}
return sets;
}
It uses an int value for the iteration, which is already enough to represent all permutations that can ever be stored in one of Java's built-in collections. If your source BitSet has 32 one bits, the 2³² possible combinations would not only require a hundred GB of heap, but also a collection supporting 2³² elements, i.e. a size not representable as an int.
So the code above terminates early if the number exceeds these capabilities, without even trying. You could rewrite it to use a long or even BigInteger instead, to keep it busy in such cases, until it eventually fails with an OutOfMemoryError anyway.
For the working cases, the int solution is the most efficient variant.
Note that the code returns a List rather than a HashSet to avoid the costs of hashing. The values are already known to be unique and hashing would only pay off if you want to perform lookups, i.e. call contains with another BitSet. But to test whether an existing BitSet is a permutation of your input BitSet, you wouldn’t even need to generate all permutations, a simple bit operation, e.g. andNot would tell you that already. So for storing and iterating the permutations, an ArrayList is more efficient.
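The andNot test mentioned above could be as simple as the following sketch (the names candidate and input are placeholders): a BitSet is one of the combinations of input exactly when it has no bit set outside of input.
static boolean isCombinationOf(BitSet candidate, BitSet input) {
    BitSet extra = (BitSet) candidate.clone();
    extra.andNot(input);          // keep only bits set in candidate but not in input
    return extra.isEmpty();       // nothing left => candidate uses only input's bits
}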
I'm trying to convert a decimal into binary number using iterative process. How can I make this have a space complexity of O(1) instead of O(n)?
int i = 0;
int j;
int bin[] = new int[n]; //n here is my parameter int n
while(n > 0) {
bin[i] = n % 2;
n /= 2;
i++;
}
//I'm reversing the order of index i with variable j to get right order (e.g. 26 has 11010, instead of 01011)
for(j = i -1; j >= 0; j--) {
System.out.print(bin[j]);
}
First, you don't need space for n bits if the value itself is n; you just need log2(n)+1 bits. It won't give you wrong results to use n bits, but for big values of n, the memory available to your Java process might not be enough.
And, about O(1)... maybe not really what you were thinking, but:
Java's int has a specific fixed value range, which guarantees that a (positive) int value needs at most 31 bits (if you have negative numbers too, storing the sign somewhere is necessary; that's bit 32).
With that information, strictly speaking, you can get O(1) just by rewriting your loops so that they loop exactly 31 times. Then, for every value of n, your code performs exactly the same number of steps, and that is O(1) by definition.
Going the bit fiddling route won't help here. There are some useful shortcuts if your values fulfil certain conditions, but if you want your code to work with any int value, the normal loop as you have here is likely the best you can get.
(Of course, CPU intrinsics may help, but not in Java...)
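A minimal sketch of that fixed-31-iteration idea, assuming n is non-negative: it prints the bits from the most significant end downwards, so it needs neither an auxiliary array nor a reversal pass.
boolean started = false;
for (int bit = 30; bit >= 0; bit--) {        // the 31 value bits of a non-negative int
    int digit = (n >> bit) & 1;
    if (digit == 1) started = true;          // skip leading zeros
    if (started) System.out.print(digit);
}
if (!started) System.out.print(0);           // n == 0 prints a single 0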
What is the best and most efficient way to get the maximum i, which is the number of rows, and j, which is the number of columns, in a two-dimensional array?
Ideally, the time complexity would be lower than O(n) in every case: no loop, yet still able to find the maximum j.
For example, if I have an array like this one
[
[18,18,19,19,20,22,22,24,25,26],
[1,2,3],
[0,0,0,0]
]
Then I want to get i = 3 and j = 10 here as a result.
Can anyone help me?
You can avoid writing the loop yourself, but you can't avoid having a runtime of at least O(n), since "someone" needs to loop the source array.
Here is a possible way to do that in Java 8:
Arrays.stream(arr).map(row -> row.length).max(Integer::compare).get();
This returns the maximum length of a "row" in your 2d array:
10
Another version which avoids using the Comparator and therefore might be a bit easier to read:
Arrays.stream(arr).mapToInt(row -> row.length).max().getAsInt();
arr is supposed to be your source array.
Edit: the older version used .max(Integer::max), which is wrong and causes wrong results. See this answer for an explanation.
Assuming your array does not contain null values, you could write something like this:
private static final Comparator<int[]> lengthComparator = new Comparator<int[]> () {
@Override
public int compare(int[] o1, int[] o2) {
return o1.length - o2.length;
}
};
@Test
public void soArrayMaxLength() {
int[][] array = new int[][] {
{18,18,19,19,20, 22, 22, 24, 25,26},
{1,2,3},
{0,0,0,0}
};
int i = array.length;
Optional<int[]> longestArray =
Arrays.stream(array)
.max(lengthComparator);
int j = longestArray.isPresent() ? longestArray.get().length : 0;
System.out.println(String.format("i=%d j=%d", i, j));
}
If you happen to create a parallel stream from the array instead, you could speed this up even further.
Another option is to sort the array by length; quicksort usually has an average complexity of O(n log n), so this isn't faster:
int i = array.length;
Arrays.parallelSort(array, lengthComparator);
int j = array[i-1].length;
System.out.println(String.format("i=%d j=%d", i, j));
Your i is the number of rows, which is simply the length of the 2-D array (assuming you are OK with including empty/null rows in this count).
The max row length j, however, would require iterating over all the rows to find the row i having the maximum arr[i].length.
There will always be a loop1, even though the looping will be implicit in solutions that use Java 8 streams.
The complexity of getting the max number of columns is O(N) where N is the number of rows.
Implicit looping using streams probably will be less efficient than explicit looping using for.
Here's a neat solution using a for loop:
int max = 0;
for (int i = 0; i < array.length; i++) {
max = Math.max(max, array[i].length);
}
This works in the edge-case where array.length == 0, but if array or any array[i] is null you will get a NullPointerException. (You could modify the code to allow for that, but if the nulls are not expected, an NPE is probably a better outcome.)
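If you did want to tolerate nulls rather than fail, a small variant of the loop above (a sketch; null rows are simply treated as having length 0) would do:
int max = 0;
if (array != null) {
    for (int[] row : array) {
        if (row != null) {
            max = Math.max(max, row.length);
        }
    }
}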
1 - In theory, you could unroll the loops for all cases of array.length from 0 to Integer.MAX_VALUE, you would not need a loop. However, the code would not compile on any known Java compiler because it would exceed JVM limits on bytecode segments, etcetera. And the performance would be terrible for various reasons.
You could try it this way: loop over the array and find the maximum length of the arrays it contains.
byte[][] arrs = new byte[3][];
int maxLength = 0;
for (byte[] array : arrs) {
if (maxLength < array.length) {
maxLength = array.length;
}
}
The TL;DR version, for those who don't want the background, is the following specific question:
Question
Why doesn't Java have an implementation of true multidimensional arrays? Is there a solid technical reason? What am I missing here?
Background
Java has multidimensional arrays at the syntax level, in that one can declare
int[][] arr = new int[10][10];
but it seems that this is really not what one might have expected. Rather than having the JVM allocate a contiguous block of RAM big enough to store 100 ints, it comes out as an array of arrays of ints: so each layer is a contiguous block of RAM, but the thing as a whole is not. Accessing arr[i][j] is thus rather slow: the JVM has to
find the int[] stored at arr[i];
index this to find the int stored at arr[i][j].
This involves querying an object to go from one layer to the next, which is rather expensive.
Why Java does this
At one level, it's not hard to see why this can't be optimised to a simple scale-and-add lookup even if it were all allocated in one fixed block. The problem is that arr[3] is a reference all of its own, and it can be changed. So although arrays are of fixed size, we could easily write
arr[3] = new int[11];
and now the scale-and-add is screwed because this layer has grown. You'd need to know at runtime whether everything is still the same size as it used to be. In addition, of course, this will then get allocated somewhere else in RAM (it'll have to be, since it's bigger than what it's replacing), so it's not even in the right place for scale-and-add.
What's problematic about it
It seems to me that this is not ideal, and that for two reasons.
For one, it's slow. A test I ran with these methods for summing the contents of a single dimensional or multidimensional array took nearly twice as long (714 seconds vs 371 seconds) for the multidimensional case (an int[1000000] and an int[100][100][100] respectively, filled with random int values, run 1000000 times with warm cache).
public static long sumSingle(int[] arr) {
long total = 0;
for (int i=0; i<arr.length; i++)
total+=arr[i];
return total;
}
public static long sumMulti(int[][][] arr) {
long total = 0;
for (int i=0; i<arr.length; i++)
for (int j=0; j<arr[0].length; j++)
for (int k=0; k<arr[0][0].length; k++)
total+=arr[i][j][k];
return total;
}
Secondly, because it's slow, it thereby encourages obscure coding. If you encounter something performance-critical that would be naturally done with a multidimensional array, you have an incentive to write it as a flat array, even if that makes the code unnatural and hard to read. You're left with an unpalatable choice: obscure code or slow code.
What could be done about it
It seems to me that the basic problem could easily enough be fixed. The only reason, as we saw earlier, that it can't be optimised is that the structure might change. But Java already has a mechanism for making references unchangeable: declare them as final.
Now, just declaring it with
final int[][] arr = new int[10][10];
isn't good enough because it's only arr that is final here: arr[3] still isn't, and could be changed, so the structure might still change. But if we had a way of declaring things so that it was final throughout, except at the bottom layer where the int values are stored, then we'd have an entire immutable structure, and it could all be allocated as one block, and indexed with scale-and-add.
How it would look syntactically, I'm not sure (I'm not a language designer). Maybe
final int[final][] arr = new int[10][10];
although admittedly that looks a bit weird. This would mean: final at the top layer; final at the next layer; not final at the bottom layer (else the int values themselves would be immutable).
Finality throughout would enable the JIT compiler to optimise this to give performance to that of a single dimensional array, which would then take away the temptation to code that way just to get round the slowness of multidimensional arrays.
(I hear a rumour that C# does something like this, although I also hear another rumour that the CLR implementation is so bad that it's not worth having... perhaps they're just rumours...)
Question
So why doesn't Java have an implementation of true multidimensional arrays? Is there a solid technical reason? What am I missing here?
Update
A bizarre side note: the difference in timings drops away to only a few percent if you use an int for the running total rather than a long. Why would there be such a small difference with an int, and such a big difference with a long?
Benchmarking code
Code I used for benchmarking, in case anyone wants to try to reproduce these results:
public class Multidimensional {
public static long sumSingle(final int[] arr) {
long total = 0;
for (int i=0; i<arr.length; i++)
total+=arr[i];
return total;
}
public static long sumMulti(final int[][][] arr) {
long total = 0;
for (int i=0; i<arr.length; i++)
for (int j=0; j<arr[0].length; j++)
for (int k=0; k<arr[0][0].length; k++)
total+=arr[i][j][k];
return total;
}
public static void main(String[] args) {
final int iterations = 1000000;
Random r = new Random();
int[] arr = new int[1000000];
for (int i=0; i<arr.length; i++)
arr[i]=r.nextInt();
long total = 0;
System.out.println(sumSingle(arr));
long time = System.nanoTime();
for (int i=0; i<iterations; i++)
total = sumSingle(arr);
time = System.nanoTime()-time;
System.out.printf("Took %d ms for single dimension\n", time/1000000, total);
int[][][] arrMulti = new int[100][100][100];
for (int i=0; i<arrMulti.length; i++)
for (int j=0; j<arrMulti[i].length; j++)
for (int k=0; k<arrMulti[i][j].length; k++)
arrMulti[i][j][k]=r.nextInt();
System.out.println(sumMulti(arrMulti));
time = System.nanoTime();
for (int i=0; i<iterations; i++)
total = sumMulti(arrMulti);
time = System.nanoTime()-time;
System.out.printf("Took %d ms for multi dimension\n", time/1000000, total);
}
}
but it seems that this is really not what one might have expected.
Why?
Consider that the form T[] means "array of type T"; just as we would expect int[] to mean "array of type int", we would expect int[][] to mean "array of type array of type int", because there's no less reason for having int[] as the T than int.
As such, considering that one can have arrays of any type, it follows just from the way [ and ] are used in declaring and initialising arrays (and for that matter, {, } and ,), that without some sort of special rule banning arrays of arrays, we get this sort of use "for free".
Now consider also that there are things we can do with jagged arrays that we can't do otherwise:
We can have "jagged" arrays where different inner arrays are of different sizes.
We can have null inner arrays within the outer array where appropriate to the mapping of the data, or perhaps to allow lazy building.
We can deliberately alias within the array so e.g. lookup[1] is the same array as lookup[5]. (This can allow for massive savings with some data-sets, e.g. many Unicode properties can be mapped for the full set of 1,112,064 code points in a small amount of memory because leaf arrays of properties can be repeated for ranges with matching patterns).
Some heap implementations can handle the many smaller objects better than one large object in memory.
There are certainly cases where these sort of multi-dimensional arrays are useful.
Now, the default state of any feature is unspecified and unimplemented. Someone needs to decide to specify and implement a feature, or else it wouldn't exist.
As shown above, the array-of-arrays sort of multidimensional array will exist unless someone decides to introduce a special rule banning arrays of arrays. Since arrays of arrays are useful for the reasons above, that would be a strange decision to make.
Conversely, the sort of multidimensional array where an array has a defined rank that can be greater than 1 and so be used with a set of indices rather than a single index, does not follow naturally from what is already defined. Someone would need to:
Decide on the specification for the declaration, initialisation and use would work.
Document it.
Write the actual code to do this.
Test the code to do this.
Handle the bugs, edge-cases, reports of bugs that aren't actually bugs, backward-compatibility issues caused by fixing the bugs.
Also users would have to learn this new feature.
So, it has to be worth it. Some things that would make it worth it would be:
If there was no way of doing the same thing.
If the way of doing the same thing was strange or not well-known.
People would expect it from similar contexts.
Users can't provide similar functionality themselves.
In this case though:
There is a way of doing the same thing: arrays of arrays, as shown above.
Using strides within flat arrays was already well known to C and C++ programmers, and Java built on their syntax, so the same techniques are directly applicable.
Java's syntax was based on C++, and C++ similarly only has direct support for multidimensional arrays as arrays-of-arrays. (Except when statically allocated, but that's not something that would have an analogy in Java where arrays are objects).
One can easily write a class that wraps an array and details of stride-sizes and allows access via a set of indices.
Really, the question is not "why doesn't Java have true multidimensional arrays"? But "Why should it?"
Of course, the points you made in favour of multidimensional arrays are valid, and some languages do have them for that reason, but the burden is nonetheless to argue a feature in, not argue it out.
(I hear a rumour that C# does something like this, although I also hear another rumour that the CLR implementation is so bad that it's not worth having... perhaps they're just rumours...)
Like many rumours, there's an element of truth here, but it is not the full truth.
.NET arrays can indeed have multiple ranks. This is not the only way in which it is more flexible than Java. Each rank can also have a lower-bound other than zero. As such, you could for example have an array that goes from -3 to 42 or a two dimensional array where one rank goes from -2 to 5 and another from 57 to 100, or whatever.
C# does not give complete access to all of this from its built-in syntax (you need to call Array.CreateInstance() for lower bounds other than zero), but it does allow you to use the syntax int[,] for a two-dimensional array of int, int[,,] for a three-dimensional array, and so on.
Now, the extra work involved in dealing with lower bounds other than zero adds a performance burden, and yet these cases are relatively uncommon. For that reason single-rank arrays with a lower-bound of 0 are treated as a special case with a more performant implementation. Indeed, they are internally a different sort of structure.
In .NET, multi-dimensional arrays with lower bounds of zero are treated as multi-dimensional arrays whose lower bounds just happen to be zero (that is, as an instance of the slower case), rather than the faster case being extended to handle ranks greater than 1.
Of course, .NET could have had a fast-path case for zero-based multi-dimensional arrays, but then all the reasons for Java not having them apply and the fact that there's already one special case, and special cases suck, and then there would be two special cases and they would suck more. (As it is, one can have some issues with trying to assign a value of one type to a variable of the other type).
Not a single thing above shows that Java couldn't possibly have had the sort of multi-dimensional array you talk of; it would have been a sensible enough decision, but the decision that was made was also sensible.
This should be a question to James Gosling, I suppose. The initial design of Java was about OOP and simplicity, not about speed.
If you have a better idea of how multidimensional arrays should work, there are several ways of bringing it to life:
Submit a JDK Enhancement Proposal.
Develop a new JSR through Java Community Process.
Propose a new Project.
Update: of course, you are not the first to question the design of Java's arrays.
For instance, projects Sumatra and Panama would also benefit from true multidimensional arrays.
"Arrays 2.0" is John Rose's talk on this subject at JVM Language Summit 2012.
To me it looks like you sort of answered the question yourself:
... an incentive to write it as a flat array, even if that makes the unnatural and hard to read.
So write it as a flat array which is easy to read. With a trivial helper like
double get(int row, int col) {
return data[rowLength * row + col];
}
and a similar setter and possibly a +=-equivalent (see the sketch below), you can pretend you're working with a 2D array. It's really no big deal. You can't use the array notation and everything gets verbose and ugly, but that seems to be the Java way. It's exactly the same as with BigInteger or BigDecimal. You can't use indexing brackets for accessing a Map; that's a very similar case.
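A minimal sketch of those helpers, assuming the same row-major layout with a double[] data field and a rowLength as in the getter above (the names are purely illustrative):
void set(int row, int col, double value) {
    data[rowLength * row + col] = value;
}
void add(int row, int col, double delta) {    // the "+=" equivalent
    data[rowLength * row + col] += delta;
}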
Now the question is how important all those features are? Would more people be happy if they could write x += BigDecimal.valueOf("123456.654321") + 10;, or spouse["Paul"] = "Mary";, or use 2D arrays without the boilerplate, or what? All of this would be nice and you could go further, e.g., array slices. But there's no real problem. You have to choose between verbosity and inefficiency as in many other cases. IMHO, the effort spent on this feature can be better spent elsewhere. Your 2D arrays are a new best as....
Java actually has no 2D primitive arrays; it's mostly syntactic sugar, as the underlying thing is an array of objects.
double[][] a = new double[1][1];
Object[] b = a;
As arrays are reified, the current implementation needs hardly any support. Your implementation would open a can of worms:
There are currently 8 primitive types, which means 9 array types, would a 2D array be the tenth? What about 3D?
There is a single special object header type for arrays. A 2D array could need another one.
What about java.lang.reflect.Array? Clone it for 2D arrays?
Many other features would have to be adapted, e.g. serialization.
And what would
??? x = {new int[1], new int[2]};
be? An old-style 2D int[][]? What about interoperability?
I guess, it's all doable, but there are simpler and more important things missing from Java. Some people need 2D arrays all the time, but many can hardly remember when they used any array at all.
I am unable to reproduce the performance benefits you claim. Specifically, the test program:
public abstract class Benchmark {
final String name;
public Benchmark(String name) {
this.name = name;
}
abstract int run(int iterations) throws Throwable;
private BigDecimal time() {
try {
int nextI = 1;
int i;
long duration;
do {
i = nextI;
long start = System.nanoTime();
run(i);
duration = System.nanoTime() - start;
nextI = (i << 1) | 1;
} while (duration < 1000000000 && nextI > 0);
return new BigDecimal((duration) * 1000 / i).movePointLeft(3);
} catch (Throwable e) {
throw new RuntimeException(e);
}
}
@Override
public String toString() {
return name + "\t" + time() + " ns";
}
public static void main(String[] args) throws Exception {
final int[] flat = new int[100*100*100];
final int[][][] multi = new int[100][100][100];
Random chaos = new Random();
for (int i = 0; i < flat.length; i++) {
flat[i] = chaos.nextInt();
}
for (int i=0; i<multi.length; i++)
for (int j=0; j<multi[0].length; j++)
for (int k=0; k<multi[0][0].length; k++)
multi[i][j][k] = chaos.nextInt();
Benchmark[] marks = {
new Benchmark("flat") {
@Override
int run(int iterations) throws Throwable {
long total = 0;
for (int j = 0; j < iterations; j++)
for (int i = 0; i < flat.length; i++)
total += flat[i];
return (int) total;
}
},
new Benchmark("multi") {
@Override
int run(int iterations) throws Throwable {
long total = 0;
for (int iter = 0; iter < iterations; iter++)
for (int i=0; i<multi.length; i++)
for (int j=0; j<multi[0].length; j++)
for (int k=0; k<multi[0][0].length; k++)
total+=multi[i][j][k];
return (int) total;
}
},
new Benchmark("multi (idiomatic)") {
@Override
int run(int iterations) throws Throwable {
long total = 0;
for (int iter = 0; iter < iterations; iter++)
for (int[][] a : multi)
for (int[] b : a)
for (int c : b)
total += c;
return (int) total;
}
}
};
for (Benchmark mark : marks) {
System.out.println(mark);
}
}
}
run on my workstation with
java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)
prints
flat 264360.217 ns
multi 270303.246 ns
multi (idiomatic) 266607.334 ns
That is, we observe a mere 3% difference between the one-dimensional and the multi-dimensional code you provided. This difference drops to 1% if we use idiomatic Java (specifically, an enhanced for loop) for traversal (probably because bounds checking is performed on the same array object the loop dereferences, enabling the just in time compiler to elide bounds checking more completely).
Performance therefore seems an inadequate justification for increasing the complexity of the language. Specifically, to support true multi dimensional arrays, the Java programming language would have to distinguish between arrays of arrays, and multidimensional arrays.
Likewise, programmers would have to distinguish between them, and be aware of their differences. API designers would have to ponder whether to use an array of arrays, or a multidimensional array. The compiler, class file format, class file verifier, interpreter, and just in time compiler would have to be extended. This would be particularly difficult, because multidimensional arrays of different dimension counts would have an incompatible memory layout (because the size of their dimensions must be stored to enable bounds checking), and can therefore not be subtypes of each other. As a consequence, the methods of class java.util.Arrays would likely have to be duplicated for each dimension count, as would all otherwise polymorphic algorithms working with arrays.
To conclude, extending Java to support multidimensional arrays would offer negligible performance gain for most programs, but require non-trivial extensions to its type system, compiler and runtime environment. Introducing them would therefore have been at odds with the design goals of the Java programming language, specifically that it be simple.
Since this question is to a great extent about performance, let me contribute a proper JMH-based benchmark. I have also changed some things to make your example both simpler and the performance edge more prominent.
In my case I compare a 1D array with a 2D-array, and use a very short inner dimension. This is the worst case for the cache.
I have tried with both long and int accumulator and saw no difference between them. I submit the version with int.
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(X*Y)
@Warmup(iterations = 30, time = 100, timeUnit=MILLISECONDS)
@Measurement(iterations = 5, time = 1000, timeUnit=MILLISECONDS)
@State(Scope.Thread)
@Threads(1)
@Fork(1)
public class Measure
{
static final int X = 100_000, Y = 10;
private final int[] single = new int[X*Y];
private final int[][] multi = new int[X][Y];
@Setup public void setup() {
final ThreadLocalRandom rnd = ThreadLocalRandom.current();
for (int i=0; i<single.length; i++) single[i] = rnd.nextInt();
for (int i=0; i<multi.length; i++)
for (int j=0; j<multi[0].length; j++)
multi[i][j] = rnd.nextInt();
}
@Benchmark public long sumSingle() { return sumSingle(single); }
@Benchmark public long sumMulti() { return sumMulti(multi); }
public static long sumSingle(int[] arr) {
int total = 0;
for (int i=0; i<arr.length; i++)
total+=arr[i];
return total;
}
public static long sumMulti(int[][] arr) {
int total = 0;
for (int i=0; i<arr.length; i++)
for (int j=0; j<arr[0].length; j++)
total+=arr[i][j];
return total;
}
}
The difference in performance is larger than what you have measured:
Benchmark Mode Samples Score Score error Units
o.s.Measure.sumMulti avgt 5 1,356 0,121 ns/op
o.s.Measure.sumSingle avgt 5 0,421 0,018 ns/op
That's a factor above three. (Note that the timing is reported per array element.)
I also note that there is no warmup involved: the first 100 ms are as fast as the rest. Apparently this is such a simple task that the interpreter already does all it takes to make it optimal.
Update
Changing sumMulti's inner loop to
for (int j=0; j<arr[i].length; j++)
total+=arr[i][j];
(note arr[i].length) resulted in a significant speedup, as predicted by maaartinus. Using arr[0].length makes it impossible to eliminate the index range check. Now the results are as follows:
Benchmark Mode Samples Score Error Units
o.s.Measure.sumMulti avgt 5 0,992 ± 0,066 ns/op
o.s.Measure.sumSingle avgt 5 0,424 ± 0,046 ns/op
If you want a fast implementation of a true multi-dimensional array, you could write a custom implementation like this. But you are right: it is not as crisp as the array notation. Still, a neat implementation could be quite friendly.
public class MyArray{
private int rows = 0;
private int cols = 0;
String[] backingArray = null;
public MyArray(int rows, int cols){
this.rows = rows;
this.cols = cols;
backingArray = new String[rows*cols];
}
public String get(int row, int col){
return backingArray[row*cols + col];
}
... setters and other stuff
}
Why is it not the default implementation?
The designers of Java probably had to decide how the default notation of the usual C array syntax would behave: they had a single array notation which could either implement arrays of arrays or true multi-dimensional arrays.
I think the early Java designers were really concerned with Java being safe. A lot of decisions seem to have been taken to make it difficult for the average programmer (or a good programmer on a bad day) to mess something up. With true multi-dimensional arrays, it is easier for users to waste large chunks of memory by allocating blocks where they are not useful.
Also, given Java's embedded-systems roots, the designers probably expected it to be more likely to find small pieces of memory to allocate than the large contiguous chunks required for true multi-dimensional objects.
Of course, the flip side is that places where multi-dimensional arrays really make sense suffer. And you are forced to use a library and messy looking code to get your work done.
Why is it still not included in the language?
Even today, true multi-dimensional arrays are a risk from the point of view of possible memory wastage and misuse.
Basically what I want is to skip the elements at the index values that are in the set; otherwise I should just push the old array elements into the new array.
So if my set contains [2, 4, 9, 10], I should skip the values at indices 2, 4, 9, 10 in the old array and put the values at the other index locations into my new array.
I am writing code like this
int[] newArr = new int[oldArray.length - set.size()];
for(int i = 0, j = 0; j < newArr.length && i < oldArray.length; j++,i++){
if(set.contains(i) )
i++;
else
newArr[j] = oldArray[i];
}
I am creating and filling my set like this
Set<Integer> commonSet = new HashSet<>();
for(int i = 0; i < array1.length; i++ ){
for(int j= 0; j < array2.length; j++) {
if(array1[i] == array2[j]){
commonSet.add(i);// Here I am saving the indices.
}
}
}
Not sure if this is the best way. Is there any other way which would be more efficient?
Or must I resort to Collection classes like ArrayList?
Using Collection classes instead of arrays would make your code much simpler.
Doing array subtraction using common libraries like Apache Commons Collections' CollectionUtils looks like this:
Collection<Integer> diff = CollectionUtils.subtract(Arrays.asList(array1), Arrays.asList(array2));
Unless you're going to be working with very large sets of data, it won't have a noticeable impact on speed.
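For example (a sketch; it assumes the inputs are boxed as Integer[] so that Arrays.asList yields lists of elements rather than single-element lists containing the arrays themselves, and it uses the Commons Collections 4 package name):
import java.util.Arrays;
import java.util.Collection;
import org.apache.commons.collections4.CollectionUtils;

Integer[] array1 = {1, 2, 3, 4, 5};
Integer[] array2 = {2, 4};
// Elements of array1 that are not in array2.
Collection<Integer> diff = CollectionUtils.subtract(Arrays.asList(array1), Arrays.asList(array2));
System.out.println(diff);   // [1, 3, 5]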
Also, creating a set of different indexes the way you do above is going to scale very poorly for larger data sets. Just calculating the times for doing a difference using CollectionUtils.subtract() vs your set creation code shows the scaling problems (arrays filled with random Integers):
array1.length = 1000
array2.length = 1000
diff.size() = 530
elapsed ms for calculating diff = 39
set creation ms = 7
array1.length = 10000
array2.length = 10000
diff.size() = 5182
elapsed ms for calculating diff = 47
set creation ms = 519
array1.length = 50000
array2.length = 50000
diff.size() = 26140
elapsed ms for calculating diff = 101
set creation ms = 12857
array1.length = 1000000
array2.length = 1000000
diff.size() = 524142
elapsed ms for calculating diff = 1167
(didn't bother to wait for the program to finish)
As you can see, doing a double loop to compare every element scales quite poorly, and that's not even counting the subtraction you'll have to do afterwards.
EDIT updated to reflect changes in the question
If you're worried about performance, definitely do not use any list or collection classes. They are notorious for re-allocating arrays frequently as they need more capacity, which is a very slow operation.
Unfortunately, I don't know how you create/fill the set of indices. If it is possible for you to have your set in an array as well and generate it in such a way that its entries are sorted, you can optimize your code significantly.
If set is fairly long compared to oldArray, do this (this assumes no duplicate entries in set!):
int l = oldArray.length; // Cache length (some compilers might do this for you)
for (int i=0, j=0, k=0; i<l; i++) {
if (k < set.length && set[k] == i) {
k++;
} else {
newArr[j++] = oldArray[i];
}
}
If set is fairly short, do this (this can handle duplicate entries, but set still needs to be sorted):
int o1=0;
int o2=0;
for (int p:set) {
System.arraycopy(oldArray, o1, newArr, o2, p-o1);
o2 += p - o1;
o1 = p + 1;
}
System.arraycopy(oldArray, o1, newArr, o2, oldArray.length - o1);
The former avoids function calls and the latter banks on the optimized memory-copy implementation of System.arraycopy(...) (and set can be any sorted Iterable, although an array will be faster).
Which one is faster will depend on the exact sizes of your arrays and which system (CPU, JVM) you use.
If set is not sorted, you can either use your approach (debugged, of course) or you can sort it first and then use one of the approaches here. Again, which one will give you better performance will depend on the size of set and your system.
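If you go the sort-first route, something along these lines (a sketch, assuming the indices live in a Set<Integer> such as the commonSet from the question) would produce the sorted int[] that the snippets above expect:
int[] set = commonSet.stream()
                     .mapToInt(Integer::intValue)
                     .sorted()
                     .toArray();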
This piece of code is doing it for me.
Thanks @Patricia Shanahan.
int j = 0, i = 0;
while( j < newArr.length && i < oldArray.length){
if(commonSet.contains(i)){
i++;
}
else{
newArr[j] = oldArray[i];
j++;
i++;
}
}