How do I efficiently iterate through a big list? - java

I want to make an open-world 2D Minecraft-like game and have the world load in chunks (just like MC) with a size of 16x16 blocks (256 blocks in total). But I found out that iterating 256 times takes almost 20ms with code like this:
long time = System.nanoTime();
for (int i = 0; i < 16 * 16; i++)
{
    System.out.println(i);
}
System.out.println(System.nanoTime() - time);
And since I'm not only going to print numbers but also get a block, get its texture and draw that texture onto the frame, I fear it might take even longer to iterate. Maybe I'm exaggerating a bit, but is there a way to iterate faster?

It's not the iteration that takes 20ms; it's println().
The following will be much faster:
long time = System.nanoTime();
StringBuilder sb = new StringBuilder();
String newline = System.getProperty("line.separator");
for (int i = 0; i < 16 * 16; i++)
{
    sb.append(i).append(newline);
}
System.out.println(sb);
System.out.println(System.nanoTime() - time);

So, first off, take into account that a list with 256 elements is generally not considered big.
The main thing consuming time in your code is not iterating through the list but the call to System.out.println(). Printing to the console (or any I/O operation) tends to take much longer than other instructions.
When I try your code locally I get roughly 6 ms, but if I do something like this:
long secondStart = System.nanoTime();
StringBuffer stringBuffer = new StringBuffer();
for (int i = 0; i < 16 * 16; i++)
{
    stringBuffer.append(i);
    stringBuffer.append("\n");
}
System.out.println(stringBuffer);
System.out.println(System.nanoTime() - secondStart);
I get 0.5ms.
If that approach is not suitable for your needs, then you would need to do as other comments say: consider traversing different parts of the list in parallel, move to a different kind of traversal, or even switch to a different kind of structure.
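If you ever did want to try the parallel route, a minimal sketch with Java 8 streams (assuming Java 8+ is available; the 16 * 16 range mirrors your chunk size) could look like the following, though for only 256 elements the overhead of splitting the work will usually outweigh any gain, so measure before adopting it:
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Build the whole output in parallel, then print it once.
String output = IntStream.range(0, 16 * 16)
        .parallel()
        .mapToObj(i -> Integer.toString(i))
        .collect(Collectors.joining(System.lineSeparator()));
System.out.println(output);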
Hope this helps.

You should ask yourself whether you really need to do all that work. Do you need to draw things that are not seen by the camera, for example? Of course not, so exclude every block in that chunk that lies outside the camera rect.
Filtering out the blocks that are not visible implies some overhead, but it is generally worth it compared to drawing every block in the chunk on each render update, because drawing is quite a heavy operation.
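As a rough sketch of that idea (the chunk, camera, and block names below are placeholders for whatever your engine actually uses, not taken from your code), compute the range of block indices that overlap the camera rectangle and only draw those:
// Hypothetical names: BLOCK_SIZE, CHUNK_SIZE, camX/camY/camWidth/camHeight
// and blocks[][] stand in for your own data.
int firstX = Math.max(0, camX / BLOCK_SIZE);
int firstY = Math.max(0, camY / BLOCK_SIZE);
int lastX = Math.min(CHUNK_SIZE - 1, (camX + camWidth) / BLOCK_SIZE);
int lastY = Math.min(CHUNK_SIZE - 1, (camY + camHeight) / BLOCK_SIZE);

for (int x = firstX; x <= lastX; x++) {
    for (int y = firstY; y <= lastY; y++) {
        drawBlock(blocks[x][y]); // only blocks that intersect the camera rect
    }
}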
If you only want to speed up the traversal itself, you could spawn threads that traverse the chunk in parallel, or buy better hardware. But it is better to start with the question of how you could achieve the same result with less work.
On the other hand, your computer should probably be able to draw 256 textures without problems, especially if done on the GPU. So do some testing before making premature optimizations.
PS. It isn't really the traversal itself you want to optimize, but rather the work done in each iteration. Just iterating 256 times is going to be quite fast.

Related

execution of task in java within specified time

I want to execute a few lines of code within 5ms in Java. Below is a snippet of my code:
public void delay(ArrayList<Double> delay_array, int counter_main) {
    long start = System.currentTimeMillis();
    ArrayList<Double> delay5msecs = new ArrayList<Double>();
    int index1 = 0, i1 = 0;
    while (System.currentTimeMillis() - start <= 5)
    {
        delay5msecs.add(i1, null);
        //System.out.println("time");
        i1++;
    }
    for (int i = 0; i < counter_main - 1; i++) {
        if (delay5msecs.get(i) != null) {
            double x1 = delay_array.get(i - index1);
            delay5msecs.add(i, x1);
            //System.out.println(i);
        } else {
            index1++;
            System.out.println("index is :" + index1);
        }
    }
}
Now the problem is that the entire array list is getting filled with null values, and I am also getting some index-related exceptions. Basically, I want to fill my array list with 0 until 5ms have passed, and after that fill it with the data from another array list. I haven't done any coding in a long time. Appreciate your help.
Thank you.
System.currentTimeMillis() will probably not have the resolution you need for 5ms. The granularity on Windows may be no better than 15ms anyway, so your code will be very platform sensitive and may not actually do what you want.
The resolution you need might be achievable with System.nanoTime(), but again, there are platform limitations you might have to research. I recall that you can't just scale the value you get and have it work everywhere.
If you can guarantee no other threads running this code, then I suppose a naive loop and fill will work, without having to implement a worker thread that waits for the filler thread to finish.
You should try to use the Collection utilities and for-each loops instead of doing all this index math in the second part.
I suppose I should also warn you that nothing in a regular JVM is guaranteed to be real-time. So if you need a hard, dependable, reproducible 5ms you might be out of luck.
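If a busy-wait is acceptable for such a short interval, a minimal sketch of a 5 ms window based on System.nanoTime() (still subject to the platform caveats above, and the work inside the loop is only a placeholder) could be:
import java.util.concurrent.TimeUnit;

final long windowNanos = TimeUnit.MILLISECONDS.toNanos(5);
final long start = System.nanoTime();
while (System.nanoTime() - start < windowNanos) {
    // work to perform during the 5 ms window goes here
}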

Significantly slower processing as Set size goes beyond 500,000

I'm not used to working with really large datasets and I'm kind of stumped here.
I have the following code:
private static Set<String> extractWords(BufferedReader br) throws IOException {
    String strLine;
    String tempWord;
    Set<String> words = new HashSet<String>();
    Utils utils = new Utils();
    int articleCounter = 0;
    while ((strLine = br.readLine()) != null) {
        if (utils.lineIsNotCommentOrLineChange(strLine)) {
            articleCounter++;
            System.out.println("Working article : " + utils.getArticleName(strLine) + " *** Article #" + articleCounter + " of 3.769.926");
            strLine = utils.removeURLs(strLine);
            strLine = utils.convertUnicode(strLine);
            String[] temp = strLine.split("\\W+");
            for (int i = 0; i < temp.length; i++) {
                tempWord = temp[i].trim().toLowerCase();
                if (utils.validateWord(tempWord)) {
                    words.add(tempWord);
                    System.out.println("Added word " + tempWord + " to list");
                }
            }
        }
    }
    return words;
}
This basically gets a huge text file from the BufferedReader, where each line is the text of an article. I want to make a list of the unique words in this file, but there are 3,769,926 articles in there, so the word count is quite immense.
From what I understand about Sets, or specifically HashSets, this should be the man for the job, so to speak. Everything runs quite smoothly at first, but after 500,000 articles it starts slowing down a bit. When it reaches 700,000 it gets slow enough that it basically stops for a second or two before going on again. There's a bottleneck here somewhere, and I can't see what it is.
Any ideas?
I believe the issue you may be facing is that a hash table (set or map) is backed by an array with a fixed number of slots. Your first declaration may have a table able to hold 16 entries. Putting aside things like load factors, once you try to put a 17th entry into the table, it has to grow to accommodate more entries and prevent collisions, so Java expands it for you.
This expansion involves creating a new table with 2 * previousSize entries and then copying over the old entries. So if you are constantly expanding, you may end up hitting a size like 524,288 where it has to grow again: it will create a new table able to hold 1,048,576 entries, but it will have to copy over the entire previous table first.
If you don't mind the extra lookup time, you might think about using a TreeSet instead of a HashSet. Your lookups will then take logarithmic time, but a tree doesn't have a pre-allocated table and can grow dynamically with ease. Either use this, or declare the size of your HashSet up front so it won't need to grow dynamically.
Honestly for that sort of scale you are better off going over to a database. You can embed Derby inside your application if you don't want to use a separate one.
Their indexing systems are optimised for this sort of scale, and while HashSet etc. will cope if you massage them right, you are better off using the right tool for the job.
As noted by TheSageMage, the HashSet implementation will constantly resize the underlying HashMap as the data grows. There are a couple of ways of getting around that: initial capacity and load factor. You can set both using the 2-arg constructor: HashSet(int, float). If you know the approximate number of words you are going to need, you can set the initial capacity to be bigger than that number. This will make small maps work a little slower, but will prevent the dramatic slow-down for larger maps. The load factor determines how full the map must get before the underlying table is resized and rehashed. Since rehashing is a relatively time-consuming operation for large maps, you may want to set it to a large fraction, say 0.9. If your initial capacity is set so that you may exceed it but will never exceed twice that size, a large load factor guarantees that you rehash only once, and as late as possible.
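For example, if you expect very roughly two million distinct words (an illustrative guess, not a measured figure), a pre-sized set could be constructed like this:
// 2-arg constructor: initial capacity and load factor.
// Sized well above the expected word count, with a 0.9 load factor,
// so the underlying table should rehash rarely, if at all.
Set<String> words = new HashSet<String>(3000000, 0.9f);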

Snake self-eating optimization

I'm writing my own Snake game, where the snake is an ArrayList of Points, and I use this method to check for self-eating:
public void checkSelfEating() {
    for (int i = 1; i < body.size(); i++) {
        if (body.get(i).equals(body.get(0))) {
            sgv.setGameOverState(true);
            sgv.setMessage("Game over!");
            System.out.println("SelfEatingdetected");
        }
    }
}
Video (Started at 35 s.)
But it is too slow, and the snake makes about 5 moves before the game is over. Is there a better solution?
Store the body segments in a HashSet via add and remove calls, giving O(1) lookups. Furthermore, if you use a LinkedHashSet it will be very easy to manage the head and tail (per the comment).
That all being said, while this is the correct data structure and it answers your question, I have absolutely no idea why a for loop over a few dozen elements is making your program so horribly slow. I strongly recommend profiling to find the actual bottleneck, as I'm not even sure a hash set will be faster at this scale.
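A minimal sketch of that idea, assuming the body stays an ArrayList of Points and a parallel set is kept in sync on every move (moveHeadTo is a hypothetical helper, not from your code, and it assumes the snake is not growing on this move):
// Kept in sync with the body list: drop the old tail, add the new head.
Set<Point> occupied = new HashSet<Point>(body);

boolean moveHeadTo(Point newHead) {
    Point tail = body.remove(body.size() - 1); // the tail vacates its cell first
    occupied.remove(tail);
    if (occupied.contains(newHead)) {
        sgv.setGameOverState(true);            // self-collision detected in O(1)
        sgv.setMessage("Game over!");
        return false;
    }
    body.add(0, newHead);
    occupied.add(newHead);
    return true;
}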

Time based reservoir sampling in java?

I've devised a way to do reservoir sampling in Java; the code I used is here.
I've put in a huge file to be read now, and it takes about 40 seconds to read the lot before outputting the results to the screen, and then it reads the lot again. The file is too big to store in memory and just pick a random sample from.
I was hoping I could write an extra while loop in there to get it to output my reservoirList at a set time interval, not just after it has finished scanning the file.
Something like:
long startTime = System.nanoTime();
timeElapsed = 0;
while (sc.hasNext()) { //avoid end of file
    do {
        long currentTime = System.nanoTime();
        timeElapsed = (int) TimeUnit.MILLISECONDS.convert(startTime - currentTime,
                                                          TimeUnit.NANOSECONDS);
        //sampling code goes here
    } while (timeElapsed % 5000 != 0);
    return reservoirList;
}
return reservoirList;
But this outputs a bunch of lines (not the full length of my reservoirList) and then a whole stream (a few hundred?) of the same line.
Is there a more elegant way to do this? One that actually works, ideally.
I've cheated. For now I'm outputting every X lines read from the file, where X is large enough to give me a nice delay between samples. I use the count from the sampling program to work out when this happens.
do {
    //sampling, which includes a count++
} while (count % 5000 != 0);
One final note: I initialise count to 1 to stop it outputting the first ten lines as a sample.
If anyone has a better, time-based solution, let me know.
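For a genuinely time-based variant, one sketch (with made-up names for the sampling step) is to remember when the reservoir was last printed and compare the elapsed time against the interval, rather than relying on a modulo test ever hitting zero:
long lastOutput = System.nanoTime();
final long interval = TimeUnit.SECONDS.toNanos(5);

while (sc.hasNext()) {
    sampleLine(sc.next());                 // hypothetical: your reservoir update
    if (System.nanoTime() - lastOutput >= interval) {
        System.out.println(reservoirList); // emit the current sample
        lastOutput = System.nanoTime();
    }
}
return reservoirList;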

Array access optimization

I have a 10x10 array in Java, some of the items in the array are not used, and I need to traverse all the elements as part of a method. What would be better to do:
Go through all elements with two nested for loops and check for null to avoid errors, e.g.
for (int y = 0; y < 10; y++) {
    for (int x = 0; x < 10; x++) {
        if (array[x][y] != null)
            //perform task here
    }
}
Or would it be better to keep a list of all the used addresses, say an ArrayList of Points?
Or something different I haven't mentioned?
I look forward to any answers :)
Any solution you try needs to be tested under controlled conditions resembling the production conditions as closely as possible. Because of the nature of Java, you need to exercise your code a bit to get reliable performance stats, but I'm sure you know that already.
That said, there are several things you may try, which I've used to optimize my Java code with success (but not on the Android JVM).
for (int y = 0; y < 10; y++) {
    for (int x = 0; x < 10; x++) {
        if (array[x][y] != null)
            //perform task here
    }
}
should in any case be reworked into
for (int x = 0; x < 10; x++) {
    for (int y = 0; y < 10; y++) {
        if (array[x][y] != null)
            //perform task here
    }
}
Often you will get a performance improvement from caching the row reference. Let us assume the array is of type Foo[][]:
for (int x = 0; x < 10; x++) {
    final Foo[] row = array[x];
    for (int y = 0; y < 10; y++) {
        if (row[y] != null)
            //perform task here
    }
}
Using final with local variables was supposed to help the JVM optimize the code, but I think modern JIT compilers can in many cases figure out on their own whether a variable is changed in the code or not. On the other hand, sometimes this may be more efficient, although it definitely takes us into the realm of micro-optimizations:
Foo[] row;
for (int x = 0; x < 10; x++) {
    row = array[x];
    for (int y = 0; y < 10; y++) {
        if (row[y] != null)
            //perform task here
    }
}
If you don't need to know the element's indices in order to perform the task on it, you can write this as
for (final Foo[] row : array) {
    for (final Foo elem : row) {
        if (elem != null)
            //perform task here
    }
}
Another thing you may try is to flatten the array and store the elements in a one-dimensional Foo[], ensuring maximum locality of reference. You have no inner loop to worry about, but you need to do some index arithmetic when referencing particular array elements (as opposed to looping over the whole array). Depending on how often you do that, it may or may not be beneficial.
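A sketch of that flattened layout (WIDTH and HEIGHT are illustrative names): element (x, y) lives at index x * HEIGHT + y, so a single loop can walk the whole array.
final int WIDTH = 10, HEIGHT = 10;
Foo[] flat = new Foo[WIDTH * HEIGHT];

// Single pass over the whole array, maximal locality of reference.
for (int i = 0; i < flat.length; i++) {
    if (flat[i] != null) {
        //perform task here; recover the coordinates if needed:
        //int x = i / HEIGHT, y = i % HEIGHT;
    }
}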
Since most of the elements will be non-null, keeping them as a sparse array is not beneficial for you, as you lose locality of reference.
Another problem is the null test. The test itself doesn't cost much, but the conditional statement following it does: you get a branch in the code and lose time on wrong branch predictions. What you can do is use a "null object", on which the task can be performed but amounts to a no-op or something equally benign. Depending on the task you want to perform, this may or may not work for you.
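A sketch of that non-polymorphic "null object" idea (Foo and performTask are placeholders, not from your code): every empty cell holds a shared instance whose task is a harmless no-op, so the loop body has no branch at all.
// Shared placeholder stored in every empty cell instead of null.
static final Foo EMPTY = new Foo(); // constructed so the task amounts to a no-op

for (Foo[] row : array) {
    for (Foo elem : row) {
        elem.performTask(); // EMPTY.performTask() does nothing harmful
    }
}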
Hope this helps.
You're better off using a List than an array, especially since you may not use the whole set of data. This has several advantages.
You're not checking for nulls and won't accidentally try to use a null object.
It's more memory efficient, since you're not allocating memory that may never be used.
For a hundred elements, it's probably not worth using any of the classic sparse array implementations. However, you don't say how sparse your array is, so profile it and see how much time you spend skipping null items compared to whatever processing you're doing.
(As Tom Hawtin - tackline mentions) when using an array of arrays, you should try to loop over the members of each inner array rather than looping over the same index of different arrays. Not all algorithms allow you to do that, though.
for (int x = 0; x < 10; ++x) {
    for (int y = 0; y < 10; ++y) {
        if (array[x][y] != null)
            //perform task here
    }
}
or
for (Foo[] row : array) {
    for (Foo item : row) {
        if (item != null)
            //perform task here
    }
}
You may also find it better to use a null object rather than testing for null, depending on the complexity of the operation you're performing. Don't use the polymorphic version of the pattern (a polymorphic dispatch will cost at least as much as a test and branch), but if you were summing properties, having an object with a zero value is probably faster on many CPUs.
double sum = 0;
for (Foo[] row : array) {
    for (Foo item : row) {
        sum += item.value();
    }
}
As for what applies to Android, I'm not sure; again, you need to test and profile for any optimisation.
Holding an ArrayList of points would be "over engineering" the problem. You have a multi-dimensional array; the best way to iterate over it is with two nested for loops. Unless you can change the representation of the data, that's roughly as efficient as it gets.
Just make sure you go in row order, not column order.
It depends on how sparse or dense your matrix is.
If it is sparse, you are better off storing a list of points; if it is dense, go with the 2D array. If it is somewhere in between, you can use a hybrid solution storing a list of sub-matrices.
This implementation detail should be hidden within a class anyway, so your code can also anytime convert between any of these representations.
I would discourage you from settling on any of these solutions without profiling with your real application.
I agree an array with a null test is the best approach unless you expect sparsely populated arrays.
Reasons for this:
1. It is more memory efficient for dense arrays (a list also needs to store the index).
2. It is more computationally efficient for dense arrays (you only need to compare the retrieved value to null, instead of also fetching the index from memory).
Also, a small suggestion, but in Java especially you are often better off faking a multi-dimensional array with a 1D array where possible (for square/rectangular arrays in 2D). Bounds checking then happens only once per iteration instead of twice. Not sure if this still applies in the Android VMs, but it has traditionally been an issue. Regardless, you can ignore it if the loop is not a bottleneck.
