Java: most efficient way to defensively copy an int[]?

I have an interface DataSeries with a method
int[] getRawData();
For various reasons (primarily because I'm using this with MATLAB, and MATLAB handles int[] well) I need to return an array rather than a List.
I don't want my implementing classes to return their internal int[] array because it is mutable. What is the most efficient way to copy an int[] array (sizes in the 1,000 to 1,000,000 element range)? Is it clone()?
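For reference, the defensive-copy getter being asked about would look something like this (a minimal sketch; the implementing class name is made up, and DataSeries is the interface described above):

public class ArrayBackedSeries implements DataSeries {

    private final int[] rawData;

    public ArrayBackedSeries(int[] rawData) {
        // copy on the way in as well, so the caller cannot mutate our state later
        this.rawData = rawData.clone();
    }

    @Override
    public int[] getRawData() {
        // copy on the way out, so callers cannot mutate the internal array
        return rawData.clone();
    }
}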

The only alternative is Arrays#copyOf() (which uses System#arraycopy() under the hood).
Just test it.
package com.stackoverflow.q2830456;

import java.util.Arrays;
import java.util.Random;

public class Test {

    public static void main(String[] args) throws Exception {
        Random random = new Random();
        int[] ints = new int[100000];
        for (int i = 0; i < ints.length; ints[i++] = random.nextInt());

        long st = System.currentTimeMillis();
        test1(ints);
        System.out.println(System.currentTimeMillis() - st);

        st = System.currentTimeMillis();
        test2(ints);
        System.out.println(System.currentTimeMillis() - st);
    }

    static void test1(int[] ints) {
        for (int i = 0; i < ints.length; i++) {
            ints.clone();
        }
    }

    static void test2(int[] ints) {
        for (int i = 0; i < ints.length; i++) {
            Arrays.copyOf(ints, ints.length);
        }
    }

}
20203
20131
and when test1() and test2() are swapped:
20157
20275
The difference is negligible. I'd say just go for clone(), since it is more readable and Arrays#copyOf() is Java 6 only.
Note: actual results may depend on the platform and JVM used; this was tested on a Dell Latitude E5500 with an Intel P8400, 4GB PC2-6400 RAM, WinXP, and JDK 1.6.0_17_b04.

No one ever solved their app's performance problems by going through and changing arraycopy() calls to clone() or vice versa.
There is no one definitive answer to this question. It isn't just that it might be different on different VMs, versions, operating systems and hardware: it really is different.
I benchmarked it anyway, on a very recent OpenJDK (on a recent ubuntu) and found that arraycopy is much faster. So is this my answer to you? NO! Because if it proves to be true, there's a bug with the intrinsification of Arrays.copyOf, and that bug will likely get fixed, so this information is only transient to you.

http://www.javapractices.com/topic/TopicAction.do?Id=3
The numbers will likely be different depending on your specs, but it seems clone() is the best choice.

Related

HashMap performs better than array? [duplicate]

Is it (performance-wise) better to use Arrays or HashMaps when the indexes of the Array are known? Keep in mind that the 'objects array/map' in the example is just an example; in my real project it is generated by another class, so I can't use individual variables.
ArrayExample:
SomeObject[] objects = new SomeObject[2];
objects[0] = new SomeObject("Obj1");
objects[1] = new SomeObject("Obj2");

void doSomethingToObject(String Identifier) {
    SomeObject object;
    if (Identifier.equals("Obj1")) {
        object = objects[0];
    } else if (Identifier.equals("Obj2")) {
        object = objects[1];
    }
    // do stuff
}
HashMapExample:
HashMap objects = new HashMap();
objects.put("Obj1", new SomeObject());
objects.put("Obj2", new SomeObject());

void doSomethingToObject(String Identifier) {
    SomeObject object = (SomeObject) objects.get(Identifier);
    // do stuff
}
The HashMap one looks much, much better, but I really need performance on this, so that has priority.
EDIT: Well, arrays it is then; suggestions are still welcome.
EDIT: I forgot to mention, the size of the array/HashMap is always the same (6).
EDIT: It appears that HashMaps are faster:
Array: 128ms
Hash: 103ms
When using fewer cycles the HashMap was even twice as fast.
test code:
import java.util.HashMap;
import java.util.Random;

public class Optimizationsest {

    private static Random r = new Random();
    private static HashMap<String, SomeObject> hm = new HashMap<String, SomeObject>();
    private static SomeObject[] o = new SomeObject[6];
    private static String[] Indentifiers = {"Obj1", "Obj2", "Obj3", "Obj4", "Obj5", "Obj6"};
    private static int t = 1000000;

    public static void main(String[] args) {
        CreateHash();
        CreateArray();
        long loopTime = ProcessArray();
        long hashTime = ProcessHash();
        System.out.println("Array: " + loopTime + "ms");
        System.out.println("Hash: " + hashTime + "ms");
    }

    public static void CreateHash() {
        for (int i = 0; i <= 5; i++) {
            hm.put("Obj" + (i + 1), new SomeObject());
        }
    }

    public static void CreateArray() {
        for (int i = 0; i <= 5; i++) {
            o[i] = new SomeObject();
        }
    }

    public static long ProcessArray() {
        StopWatch sw = new StopWatch();
        sw.start();
        for (int i = 1; i <= t; i++) {
            checkArray(Indentifiers[r.nextInt(6)]);
        }
        sw.stop();
        return sw.getElapsedTime();
    }

    private static void checkArray(String Identifier) {
        SomeObject object;
        if (Identifier.equals("Obj1")) {
            object = o[0];
        } else if (Identifier.equals("Obj2")) {
            object = o[1];
        } else if (Identifier.equals("Obj3")) {
            object = o[2];
        } else if (Identifier.equals("Obj4")) {
            object = o[3];
        } else if (Identifier.equals("Obj5")) {
            object = o[4];
        } else if (Identifier.equals("Obj6")) {
            object = o[5];
        } else {
            object = new SomeObject();
        }
        object.kill();
    }

    public static long ProcessHash() {
        StopWatch sw = new StopWatch();
        sw.start();
        for (int i = 1; i <= t; i++) {
            checkHash(Indentifiers[r.nextInt(6)]);
        }
        sw.stop();
        return sw.getElapsedTime();
    }

    private static void checkHash(String Identifier) {
        SomeObject object = (SomeObject) hm.get(Identifier);
        object.kill();
    }

}
HashMap uses an array underneath, so it can never be faster than using an array correctly.
Random.nextInt() is many times slower than what you are testing; even using an array to pick the test key is going to bias your results.
The reason your array benchmark is so slow is the chain of equals comparisons, not the array access itself.
Hashtable is usually much slower than HashMap because it does much the same thing but is also synchronized.
A common problem with micro-benchmarks is that the JIT is very good at removing code which doesn't do anything. If you are not careful, you will only be testing whether you have confused the JIT enough that it cannot work out that your code doesn't do anything.
This is one of the reasons you can write micro-benchmarks which outperform C++ systems: Java is a simpler language, which makes it easier for the JIT to reason about and detect code which does nothing useful. This can lead to tests which show that Java does "nothing useful" much faster than C++ ;)
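One common way to guard against this (a sketch, not taken from the benchmark above; the names are illustrative) is to accumulate something from every iteration and print it at the end, so the loop body has an observable side effect the JIT cannot remove:

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

public class KeepResultsAlive {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 1; i <= 6; i++) {
            map.put("Obj" + i, i);
        }
        Random r = new Random();
        long checksum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            // use every looked-up value, so the JIT cannot prove the loop body is dead code
            checksum += map.get("Obj" + (r.nextInt(6) + 1));
        }
        // printing the accumulated value makes the work observable and keeps it from being elided
        System.out.println("checksum = " + checksum);
    }
}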
Arrays, when the indexes are known, are faster (HashMap uses an array of linked lists behind the scenes, which adds a bit of overhead on top of the array accesses, not to mention the hashing operations that need to be done).
And FYI, HashMap<String, SomeObject> objects = new HashMap<String, SomeObject>(); makes it so you won't have to cast.
For the example shown, the hash-table approach wins, I believe. The problem with the array approach is that it doesn't scale: I imagine you want to have more than two entries in the table, and the conditional branch tree in doSomethingToObject will quickly get unwieldy and slow.
Logically, a HashMap is definitely a fit in your case. From a performance standpoint it also wins, since in the case of arrays you would need to do a number of string comparisons (in your algorithm), while in a HashMap you just use the hash code if the load factor is not too high. Both the array and the HashMap will need to be resized if you add many elements, but in the case of the HashMap you will also need to redistribute the elements; in that respect the HashMap loses.
Arrays will usually be faster than the Collections classes.
PS. You mentioned Hashtable in your post. Hashtable has even worse performance than HashMap; I assume your mention of Hashtable was a typo:
"The HashTable one looks much much better"
The example is strange. The key problem is whether your data is dynamic. If it is, you could not write your program that way (as in the array case). In other words, comparing your array and hash implementations is not fair: the hash implementation works for dynamic data, but the array implementation does not.
If you only have static data (6 fixed objects), either an array or a hash works merely as a data holder. You could even define static objects.

BigDecimal.valueOf caching mechanism

I heard that the BigDecimal.valueOf() method is better than calling new BigDecimal() because it caches common values. I wanted to know how the caching mechanism of valueOf works.
Looking at the JDK 1.8 sources, it looks like it's just a static array which is initialized as part of class initialization - it only caches the values 0 to 10 inclusive, but that's an implementation detail. For example, given dasblinkenlight's post, it looks like earlier versions only cached 0 and 1.
For more detail - and to make sure you're getting information about the JDK which you're actually running - look at the source of the JDK you're using for yourself - most IDEs will open the relevant source code automatically, if they detect the source archive has been included in your JDK installation. Of course, if you're using a different JRE at execution time, you'd need to validate that too.
It's easy to tell whether or not a value has been cached, based on reference equality. Here's a short but complete program which finds the first non-negative value which isn't cached:
import java.math.BigDecimal;

public class Test {
    public static void main(String[] args) {
        for (long x = 0; x < Long.MAX_VALUE; x++) {
            if (BigDecimal.valueOf(x) != BigDecimal.valueOf(x)) {
                System.out.println("Value for " + x + " wasn't cached");
                break;
            }
        }
    }
}
On my machine with Java 8, the output is:
Value for 11 wasn't cached
Of course, an implementation could always cache the most recently requested value, in which case the above code would run for a very long time and then finish with no output...
If I'm not wrong, it caches only zero, one, two and ten. That's why we have only:

public static final BigInteger ZERO = new BigInteger(new int[0], 0);
public static final BigInteger ONE = valueOf(1);
private static final BigInteger TWO = valueOf(2);
public static final BigInteger TEN = valueOf(10);

and most of those constants are, in turn, created through the valueOf(x) method.
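To make the effect of the cache visible, here is a small sketch; as noted above, the exact cached range is an implementation detail and may differ between JDK versions:

import java.math.BigDecimal;

public class ValueOfCacheDemo {
    public static void main(String[] args) {
        // On the JDK 8 implementation discussed above, small values come from a cache,
        // so valueOf returns the same instance each time.
        System.out.println(BigDecimal.valueOf(5) == BigDecimal.valueOf(5));   // true on JDK 8
        // The constructor always allocates, so reference equality fails.
        System.out.println(new BigDecimal(5) == new BigDecimal(5));           // false
        // Outside the cached range, valueOf also allocates a fresh instance.
        System.out.println(BigDecimal.valueOf(11) == BigDecimal.valueOf(11)); // false on JDK 8
    }
}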

Java multiple instances of same class share the same running objects, but they shoud not

I'm writing a multithreaded Java optimization algorithm which creates several instances of the same subclass, for performance reasons. This subclass itself has other subclasses.
The algorithm searches through the search space for an optimal solution by means of random movements, so if I run several instances of it, I should take advantage of my system's cores and improve the search by widening the search space.
I've noticed that the first instance runs well, but the others seem to share the running objects of the first, picking up the information they hold, even after it has finished.
That's not what I want; I want each instance to be isolated from the others.
I'm using an ExecutorService:
Code:
ExecutorService executorService = Executors.newCachedThreadPool();
ExecutorCompletionService<float[][]> service = new ExecutorCompletionService<float[][]>(executorService);
IteratedGreedy[] ig = new IteratedGreedy[instances];
Future<float[][]>[] future = new Future[instances];

// launching instances:
for (int i = 0; i < instances; i++) {
    path = "\\" + i + ".txt";
    ig[i] = new IteratedGreedy(path);
    future[i] = service.submit(ig[i]);
}

// retrieving solutions:
for (int i = 1; i < instances; i++) {
    solutions[i] = future[i].get();
}
As you may guess, the IteratedGreedy class has its own subclasses inside.
Any help is appreciated.
The problem is that somewhere in the code there is a class with a static variable:

static float[][] matrix;

and then a method uses it:

void SomeMethod() {
    float f = matrix[i][j];
}
The solution is to change the way the method obtains the matrix: keep it as a non-static (per-instance) field, or pass it in explicitly:

float[][] matrix;   // per-instance field, not static

void SomeMethod(float[][] matrix) {
    float f = matrix[i][j];
}
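Put differently, the state each worker mutates should live in the worker itself. A minimal sketch of that structure, with illustrative names since the real IteratedGreedy internals are not shown in the question:

import java.util.concurrent.Callable;

// Hypothetical sketch: each task owns its own matrix as an instance field,
// so parallel instances no longer see each other's data.
public class IteratedGreedyTask implements Callable<float[][]> {

    private final float[][] matrix;   // one private copy per task, never static

    public IteratedGreedyTask(float[][] matrix) {
        this.matrix = matrix;
    }

    @Override
    public float[][] call() {
        // work only on this instance's matrix; no shared static state is touched
        float f = matrix[0][0];
        // ... the search logic would go here ...
        return matrix;
    }
}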

FST (Finite-state transducers) Libraries, C++ or java

I have a problem to solve using FSTs.
Basically, I'm building a morphological parser, and at this point I have to work with large transducers, so performance is the big issue here.
Recently I worked in C++ on other projects where performance matters, but now I'm considering Java, because of Java's benefits and because Java is getting better.
I've studied some comparisons between Java and C++, but I cannot decide which language I should use for this specific problem, because it depends on the library in use.
I can't find much information about Java libraries, so my question is: are there any open-source Java libraries with good performance, like the RWTH FSA Toolkit, which an article I read describes as the fastest C++ library?
Thanks all.
What are the "benefits" of Java, for your purposes? What specific problem does that platform solve that you need? What is the performance constraint you must consider? Were the "comparisons" fair, because Java is actually extremely difficult to benchmark. So is C++, but you can at least get some algorithmic boundary guarantees from STL.
I suggest you look at OpenFst and the AT&T finite-state transducer tools. There are others out there, but I think your worry about Java puts the cart before the horse-- focus on what solves your problem well.
Good luck!
http://jautomata.sourceforge.net/ and http://www.cs.duke.edu/csed/jflap/ are Java-based finite state machine libraries, although I don't have experience using them, so I cannot comment on their efficiency.
I'm one of the developers of the morfologik-stemming library. It's pure Java and its performance is very good, both when you build the automaton and when you use it. We use it for morphological analysis in LanguageTool.
The problem here is the minimum size of your objects in Java. In C++, without virtual methods and run-time type identification, your objects weigh exactly their content. And the time your automata take to manipulate memory has a big impact on performance.
I think that should be the main reason for choosing C++ over Java.
OpenFST is a C++ finite state transducer framework that is really comprehensive. Some people from CMU ported it to Java for use in their natural language processing work.
A blog post series describing it.
The code is located on svn.
Update:
I ported it to Java here.
Lucene has an excellent implementation of an FST, which is easy to use and performs well; it is part of what lets query engines like Elasticsearch and Solr deliver very fast, sub-second term-based queries. Let me take an example:
import com.google.common.base.Preconditions;
import org.apache.lucene.store.ByteArrayDataInput;
import org.apache.lucene.store.DataInput;
import org.apache.lucene.store.GrowableByteArrayDataOutput;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRefBuilder;
import org.apache.lucene.util.fst.Builder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.PositiveIntOutputs;
import org.apache.lucene.util.fst.Util;

import java.io.IOException;

public class T {

    private final String inputValues[] = {"cat", "dog", "dogs"};
    private final long outputValues[] = {5, 7, 12};

    // https://lucene.apache.org/core/8_4_0/core/org/apache/lucene/util/fst/package-summary.html
    public static void main(String[] args) throws IOException {
        T t = new T();
        FST<Long> fst = t.buildFSTInMemory();
        System.out.println(String.format("memory used for fst is %d bytes", fst.ramBytesUsed()));
        t.searchFST(fst);
        byte[] bytes = t.serialize(fst);
        System.out.println(String.format("length of serialized fst is %d bytes", bytes.length));
        fst = t.deserialize(bytes);
        t.searchFST(fst);
    }

    private FST<Long> buildFSTInMemory() throws IOException {
        // Input values (keys). These must be provided to Builder in Unicode sorted order!
        // Use Collections.sort() to sort inputValues first.
        PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
        Builder<Long> builder = new Builder<Long>(FST.INPUT_TYPE.BYTE1, outputs);
        BytesRef scratchBytes = new BytesRef();
        IntsRefBuilder scratchInts = new IntsRefBuilder();
        for (int i = 0; i < inputValues.length; i++) {
            // scratchBytes.copyChars(inputValues[i]);
            scratchBytes.bytes = inputValues[i].getBytes();
            scratchBytes.offset = 0;
            scratchBytes.length = inputValues[i].length();
            builder.add(Util.toIntsRef(scratchBytes, scratchInts), outputValues[i]);
        }
        FST<Long> fst = builder.finish();
        return fst;
    }

    private FST<Long> deserialize(byte[] bytes) throws IOException {
        DataInput in = new ByteArrayDataInput(bytes);
        PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
        FST<Long> fst = new FST<Long>(in, outputs);
        return fst;
    }

    private byte[] serialize(FST<Long> fst) throws IOException {
        final int capacity = 32;
        GrowableByteArrayDataOutput out = new GrowableByteArrayDataOutput(capacity);
        fst.save(out);
        return out.getBytes();
    }

    private void searchFST(FST<Long> fst) throws IOException {
        for (int i = 0; i < inputValues.length; i++) {
            Long value = Util.get(fst, new BytesRef(inputValues[i]));
            Preconditions.checkState(value == outputValues[i], "fatal error");
        }
    }

}

Better practice to re-instantiate a List or invoke clear()

Using Java (1.6) is it better to call the clear() method on a List or just re-instantiate the reference?
I have an ArrayList that is filled with an unknown number of Objects and periodically "flushed" - where the Objects are processed and the List is cleared. Once flushed the List is filled up again. The flush happens at a random time. The number within the List can potentially be small (10s of Objects) or large (millions of objects).
So is it better to have the "flush" call clear() or new ArrayList() ?
Is it even worth worrying about this sort of issue, or should I let the VM worry about it? How could I go about looking at the memory footprint of Java to work this sort of thing out for myself?
Any help greatly appreciated.
The main thing to be concerned about is what other code might have a reference to the list. If the existing list is visible elsewhere, do you want that code to see a cleared list, or keep the existing one?
If nothing else can see the list, I'd probably just clear it - but not for performance reasons; just because the way you've described the operation sounds more like clearing than "create a new list".
The ArrayList<T> docs don't specify what happens to the underlying data structures, but looking at the 1.7 implementation in Eclipse, it looks like you should probably call trimToSize() after clear() - otherwise you could still have a list backed by a large array of null references. (Maybe that isn't an issue for you, of course... maybe that's more efficient than having to copy the array as the size builds up again. You'll know more about this than we do.)
(Of course creating a new list doesn't require the old list to set all the array elements to null... but I doubt that that will be significant in most cases.)
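If the large-backing-array case does matter, the clear-then-trim pattern described above would look roughly like this (a sketch; process() stands in for whatever the flush actually does):

import java.util.ArrayList;

public class FlushDemo {

    private final ArrayList<Object> buffer = new ArrayList<>();

    void add(Object item) {
        buffer.add(item);
    }

    void flush() {
        for (Object item : buffer) {
            process(item);          // handle each buffered object
        }
        buffer.clear();             // empties the list but keeps the old, possibly huge, backing array
        buffer.trimToSize();        // shrinks the backing array so that memory can be reclaimed
    }

    private void process(Object item) {
        // application-specific work
    }
}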
The way you are using it looks very much like how a Queue is used: when you work off the items on the queue, they are removed as you process them.
Using one of the Queue classes might make the code more elegant.
There are also variants which handle concurrent updates in a predictable way.
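A minimal sketch of that queue-based alternative; ArrayDeque is just one possible choice, and process() again stands in for the real work:

import java.util.ArrayDeque;
import java.util.Queue;

public class QueueFlushDemo {

    private final Queue<Object> pending = new ArrayDeque<>();

    void add(Object item) {
        pending.add(item);
    }

    void flush() {
        // items are removed as they are processed, so there is nothing left to clear() afterwards
        Object item;
        while ((item = pending.poll()) != null) {
            process(item);
        }
    }

    private void process(Object item) {
        // application-specific work
    }
}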
I think that if the ArrayList is flushed very frequently, like if it runs continuously in a loop or something, then it is better to use clear(); if the flushing is not too frequent, then you may create a new instance. Also, since you say that the size may vary from tens of objects to millions, you could go for an in-between initial size for each new ArrayList you create, so that the list can avoid a lot of resizing.
There is no advantage to list.clear() over creating a new list.
Here is my investigation comparing the performance:
import java.util.ArrayList;
import java.util.List;

public class ClearList {

    public static void testClear(int m, int n) {
        List<Integer> list = new ArrayList<>();
        long start = System.currentTimeMillis();
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                list.add(Integer.parseInt("" + j + i));
            }
            list.clear();
        }
        System.out.println(System.currentTimeMillis() - start);
    }

    public static void testNewInit(int m, int n) {
        List<Integer> list = new ArrayList<>();
        long start = System.currentTimeMillis();
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                list.add(Integer.parseInt("" + j + i));
            }
            list = new ArrayList<>();
        }
        System.out.println(System.currentTimeMillis() - start);
    }

    public static void main(String[] args) {
        System.out.println("clear ArrayList:");
        testClear(991000, 100);
        System.out.println("new ArrayList:");
        testNewInit(991000, 100);
    }

}

/*
 * Out:
 *
 * clear ArrayList:
 * 8391
 * new ArrayList:
 * 6871
 */
