I was wondering if anyone could answer my fairly noobish question about which is better practice to use when programming.
I know that to make a two-dimensional array I can either use:
ArrayList<ArrayList<String>> myArray = new ArrayList<ArrayList<String>>();
or
String[][] myArray2;
Using ArrayList in the manner you showed is completely fine. However, only use ArrayList if you really need the array to grow and shrink at run time. If you're not sure whether you need it to grow or shrink, then go ahead and just use the ArrayList to start. If you know you don't need the array to be dynamic, go with String[][] instead; keep in mind that a String[][] cannot grow or shrink after it has been allocated.
If your application is performance intensive, I strongly suggest trying to write the app such that String[][] is used. More importantly, try to avoid a lot of string manipulation, because that causes a lot of allocations (and thus garbage collections). If you can reuse a CharBuffer[][] instead, you can avoid many of those allocations. Again though, if you're writing a simple app and it doesn't matter all that much that your app allocates (and thus eventually triggers garbage collections), going with the ArrayList makes things easier to handle.
Since you consider yourself a noob, you might want to just go ahead and start with the ArrayList because it's a little easier to deal with.
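To make the difference concrete, here is a minimal sketch (the array version assumes you know the dimensions up front; java.util.ArrayList imported):
// Fixed size: the dimensions must be known when the array is allocated.
String[][] fixedGrid = new String[3][3];
fixedGrid[0][0] = "a";
// Dynamic: rows and cells can be added (or removed) at run time.
ArrayList<ArrayList<String>> dynamicGrid = new ArrayList<ArrayList<String>>();
dynamicGrid.add(new ArrayList<String>()); // add a new row on the fly
dynamicGrid.get(0).add("a");              // add a cell to that row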
Also, this isn't really much of an Android question, just more of a Java question.
Since Lists can grow in size, they are the better choice if you are unsure how big your two-dimensional data structure will be. If you know how big the structure will be, use the plain array instead, so the small overhead that the List objects would cause won't be an issue.
Related
I am reading an XML document into HashMaps and ArrayLists so that the relationships are preserved in memory. My code does the job, but I am worried about the iterations and method calls I am performing on these huge maps and lists. Currently the XML data I am working with is not that big, but I don't know what happens if it grows. What test cases do I need to run against the logic that uses these HashMaps? How bad is using Java collections for such huge data? Are there any alternatives? Could the huge data cause the JVM to crash?
Java collections have a certain overhead, which can increase the memory usage a lot (20 times in extreme cases) when they're the primary data structures of an application and the payload data consists of a large number of small objects. This could lead to the application terminating with an OutOfMemoryError even though the actual data is much smaller than the available memory.
ArrayList is actually very efficient for large numbers of elements, but inefficient when you have a large number of lists that are empty or contain only one element. For those cases, you could use Collections.emptyList() and Collections.singletonList() to improve efficiency.
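For instance, a small sketch of that idea (both factory methods return immutable lists, so this only works for rows you never modify):
List<String> emptyRow = Collections.emptyList();           // one shared instance, no extra allocation
List<String> singleRow = Collections.singletonList("abc"); // tiny fixed-size wrapper
// Compare with new ArrayList<String>(), which is a separate object per row even when empty.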
HashMap has the same problem as well as a considerable overhead for each element stored in it. So the same advice applies as for ArrayList. If you have a large number of elements, there may be alternative Map implementations that are more efficient, e.g. Google Guava.
The biggest overheads happen when you store primitive values such as int or long in collections, as they need to be wrapped in objects. In those cases, the GNU Trove collections offer an alternative.
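A quick illustration of the boxing cost (the Trove class in the comment is from Trove 3.x and is mentioned only as an example):
List<Integer> boxed = new ArrayList<Integer>();
boxed.add(42);                        // 42 is autoboxed into an Integer object (header + value + padding)
int[] primitives = new int[] { 42 };  // the value is stored directly, 4 bytes per element
// With Trove you could write something like:
// TIntArrayList ints = new TIntArrayList();   // backed by an int[], no wrapper objects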
In your case specifically, the question is whether you really need to keep the entire data from the XML in memory at once, or whether you can process it in small chunks. This would probably be the best solution if your data can grow arbitrarily large.
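If chunked processing is possible, a streaming parser such as StAX lets you handle one record at a time instead of building everything in memory. A rough sketch (the file name and element name are made up; exception handling omitted):
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;

XMLStreamReader reader = XMLInputFactory.newInstance()
        .createXMLStreamReader(new FileInputStream("data.xml"));
while (reader.hasNext()) {
    if (reader.next() == XMLStreamConstants.START_ELEMENT
            && "record".equals(reader.getLocalName())) {
        String value = reader.getElementText(); // process one record, then let it be collected
    }
}
reader.close();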
The easiest short term solution would be to simply buy more memory. It's cheap.
The JVM will not crash from what you describe; what may happen is an OutOfMemoryError. Also, if you retain the data in those Collections for a long time, you may have issues with garbage collection. Do you really need to store the whole XML data in memory?
If you are dealing with temporary data and you need fast access to it, you do not have too many alternatives. The question is what you mean when you say "huge": megabytes? Gigabytes? Terabytes?
As long as your data does not exceed 1 GB, IMHO holding it in memory may be OK. Otherwise you should think about alternatives like a database (relational or NoSQL), files, etc.
In your specific example I'd think about replacing ArrayList with LinkedList unless you need random access to the list. ArrayList is just a wrapper over an array, so when you need 1 million elements it allocates an array 1 million elements long. LinkedList is better when the number of elements is big, but access by index costs O(n). If you need both (i.e. a huge list and fast access), use a TreeMap with the index as the key instead; you get O(log n) access.
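A quick sketch of that TreeMap-as-indexed-list suggestion:
TreeMap<Integer, String> indexed = new TreeMap<Integer, String>();
indexed.put(0, "first");        // the "index" is just the key
indexed.put(1, "second");
String second = indexed.get(1); // O(log n) lookup by index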
What test cases do I need to run against the logic that uses these HashMaps?
Why not generate large XML files (for example, 5 times larger than your current data samples) and test your parsers/memory storage with them? Since only you know what files are possible in your case and how fast they will grow, this is really the only approach.
How bad is using Java collections for such huge data? Are there any alternatives? Could the huge data cause the JVM to crash?
Of course, it is possible that you will get an OutOfMemoryError if you try to store too much data in memory and it is not eligible for GC. This library: http://trove.starlight-systems.com/ claims to use less memory, but I didn't use it myself. Some discussion is available here: What is the most efficient Java Collections library?
How bad is using Java collections for such huge data?
Java Map implementations and (to a lesser extent) Collection implementations do tend to use a fair amount of memory. The effect is most pronounced when the key / value / element types are wrapper types for primitive types.
Are there any alternatives?
There are alternative implementations of "collections" of primitive types that use less memory; e.g. the GNU Trove libraries. But they don't implement the standard Java collection APIs, and that severely limits their usefulness.
If your collections don't use the primitive wrapper classes, then your options are more limited. You might be able to implement your own custom data structures to use less memory, but the saving won't be that great (in percentage terms) and you've got a significant amount of work to do to implement the code.
A better solution is to redesign your application so that it doesn't need to represent the entire XML data structure in memory. (If you can achieve this.)
Could the huge data cause the JVM to crash?
It could cause a JVM to throw an OutOfMemoryError. That's not technically a crash, but in your use-case it probably means that the application has no choice but to give up.
Could you please suggest to me (a novice in Android/Java) what's the most efficient way to deal with a relatively large amount of data?
I need to compute some stuff for each of 1000...5000 elements in, say, a big data type (x1, y1, z1 - double; flag1...flagn - boolean; desc1...descn - string) quite often (once a second), which is why I want to do it as fast as possible.
What way would be the best? To declare a multidimensional array, or an array for each field (x1[i], y1[i], ...), a special class, some sort of JavaBean? Which one is the most efficient in terms of speed etc.? Which is the most common way to deal with that sort of thing in Java?
Many thanks in advance!
Nick, you've asked a very general question. I'll do my best to answer it, but please be aware that if you want anything more specific, you're going to need to narrow your question down a bit.
Some back-of-the-envelope calculations show that for an array of 5000 doubles you'll use 8 bytes * 5000 = 40,000 bytes, or roughly 40 kB of memory. This isn't too bad, as memory on most Android devices is on the order of megabytes or even gigabytes. A good ol' ArrayList should do just fine for storing this data. You could probably make things a little faster by specifying the ArrayList's capacity in the constructor (a quick sketch follows below); that way the ArrayList doesn't have to dynamically expand every time you add more data to it.
Word of caution though. Since we are on a memory restricted device, what could potentially happen is if you generate a lot of these ArrayLists rapidly in succession, you might start triggering the garbage collector a lot. This could cause your app to slow down (the whole device actually). If you're really going to be generating lots of data, then don't store it in memory. Store it off on disk where you'll have plenty of room and won't be triggering the garbage collector all the time.
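As for the capacity hint, it is just a constructor argument; a quick sketch (5000 matches the element count from the question):
ArrayList<Double> xs = new ArrayList<Double>(5000); // backing array pre-sized, no re-growth while filling
for (int i = 0; i < 5000; i++) {
    xs.add(0.0); // note: each value is autoboxed; a plain double[] avoids the wrapper objects entirely
}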
I think that the efficiency with which you write the computation you need to do on each element is way more important than the data structure you use to store it. The difference between using an array for each element or an array of objects (each of which is the instance of a class containing all elements) should practically be negligible. Use whatever data structures you feel most comfortable with and focus on writing efficient algorithms.
Firstly, I did my homework. I've checked this:
How to store 2D Array coordinates in a list in java
I have created my own Vector2 class (like XNA).
I think if I use a List and access it every frame, it might slow my game down :S
I need to access this array every frame, so I really want to know if I can create it in a highly efficient way (not using a List or a class, just some basic type like int[]).
Thanks in advance!
Edit:
I just access this array (it is created at init), because I need to search its elements every frame in my game. I did want to try Vector2[]... but the efficiency, y'know ^^
Edit 2: Eh, I'm really new to Java. By "map", do you mean using a Map to store Vector2?
In Java, arrays are (mostly) objects. That is why you can do stuff like
int[] b = { 2, 4, 5 };
System.out.println(b.length);
I know you are concerned with speed; however, attempting to program in a non-object-oriented manner in Java doesn't speed up your program. The JVM is optimized heavily to make object-oriented code run fast.
Concentrate more on your algorithms. If you can keep a list of "to be updated" items instead of iterating over a list of all items, you will dramatically speed up your program regardless of the data structure. Only go shopping for better implementations of data structures after you have determined that the data structure is the slow part.
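As a rough sketch of that "to be updated" idea (Item and its update() method are hypothetical names):
List<Item> dirty = new ArrayList<Item>(); // only the items that actually changed this frame

void markChanged(Item item) {
    dirty.add(item);
}

void updateFrame() {
    for (Item item : dirty) {
        item.update(); // touch only what changed instead of iterating over every item
    }
    dirty.clear();
}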
Premature optimization means you start writing your code to be fast (which normally means making large sacrifices in readability) before you have determined that the section of code actually affects the speed of your program. You don't want to make a section of critical code unreadable before you realize it's only 5% of what is making your code slow.
Semi-recent benchmarking puts a correctly sized ArrayList within 3% of the speed of a raw array. If you are wrapping your raw array in an object, odds are good the stock implementation of the ArrayList around its embedded array is much faster.
I am sorting a number of integers from a file, which will probably be too large to fit into memory in one go. My current idea is to sort chunks with quicksort, then merge-sort them together. I would like to make the chunks as big as possible, so I'd like to know how much I can read in one go.
I know about Runtime.freeMemory(), but how should I go about using it? Should I carefully work out what other variables I use in the program and then create an array of size (freeMemory - variableSizes), or is that too likely to go wrong?
Thanks!
Experiment until you find a size that works well. The largest array you can allocate on the heap isn't necessarily the fastest way to do it. In many circumstances, the entire heap does not fit in the computer's RAM and might be partly swapped out. Just because you can allocate a huge array does not mean it will be the best size for optimizing speed.
Some adaptive approach would probably be best (testing number of items sorted/second depending on array size) and adjusting for what you can fit without getting an OutOfMemoryError.
Simpler: stick with some large value that works well, but isn't necessarily the largest you can use.
Or: use an external library/database to do what you want - working with huge amounts of data is tricky to get right in general, and you will probably get better performance and shorter development time if you don't reinvent the wheel.
I'd start with a relatively small size for the first chunk, then double the size for every subsequent chunk until you get an OutOfMemoryError. That will probably trigger swapping, though.
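A rough sketch of probing for a chunk size that fits (same idea, just backing off instead of doubling up):
int chunkSize = 16 * 1024 * 1024; // start optimistic: 16M ints = 64 MB
int[] chunk = null;
while (chunk == null) {
    try {
        chunk = new int[chunkSize];
    } catch (OutOfMemoryError e) {
        chunkSize /= 2; // halve until the allocation succeeds
    }
}
// Runtime.getRuntime().freeMemory() / maxMemory() can also guide the initial guess,
// but leave a generous margin for everything else the program allocates.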
I think figuring out exactly how much memory you can allocate is sticky business: by default the JVM allocates a heap of around 256 MB (the exact default depends on the JVM and the machine), but this can always be increased using -Xmx. So it is best to trade performance for portability by having a fixed chunk size of, let's say, around 150 MB.
If you go with Java's built-in Collections.sort, you will have to use a Collection of some sort, which will not take the primitive int type; rather, you will have to use Integer objects (List<Integer>).
In my experience (not to be taken as gospel), an int weighs in at (obviously) 4 bytes of RAM, whereas an Integer weighs in at around 12 bytes on a 32-bit machine and 24 bytes on a 64-bit machine.
If you need to minimize the memory footprint, use int[] and sort it with Arrays.sort(int[]) (or implement your own sorter)...
However, it might be easier all around to use List<Integer> with the built-in sorting functions and just deal with having more, smaller-sized lists.
To answer the question though, you should definitely look at the merge-sort angle of attack on this problem and just pick an arbitrary list size to start with. You will likely find, after some experimentation, that there is a trade-off between list size and number of chunks. Find the sweet spot and tell us your results!
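For what it's worth, a quick sketch of both options for sorting a single chunk (java.util imports assumed):
// Primitive chunk: small footprint, sorted in place.
int[] chunk = { 5, 3, 9, 1 };
Arrays.sort(chunk);
// Boxed chunk: larger footprint, but works with the Collections API.
List<Integer> boxedChunk = new ArrayList<Integer>(Arrays.asList(5, 3, 9, 1));
Collections.sort(boxedChunk);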
When I was using C++ in college, I was told to use multidimensional arrays (hereafter MDA) whenever possible, since an MDA exhibits better memory locality because it's allocated in one big chunk. An array of arrays (AoA), on the other hand, is allocated in multiple smaller chunks, possibly scattered all over physical memory wherever vacancies are found.
So I guess the first question is: is this a myth, or is this an advice worth following?
Assuming that it's the latter, then the next question would be what to do in a language like Java that doesn't have true MDA. It's not that hard to emulate MDA with a 1DA, of course. Essentially, what is syntactic sugar for languages with MDA can be implemented as library support for languages without MDA.
Is this worth the effort? Is this too low level of an optimization issue for a language like Java? Should we just abandon arrays and use Lists even for primitives?
Another question: in Java, does allocating an AoA at once (new int[M][N]) possibly yield a different memory allocation than doing it hierarchically (new int[M][], then new int[N] for each row)?
Java and C# allocate memory in a much different fashion than C++ does. In fact, in .NET the arrays of an AoA will almost certainly be close together if they are allocated one after another, because memory there is one contiguous chunk without any fragmentation to speak of.
But it is still true for C++ and still makes sense if you want maximum speed. You shouldn't follow that advice every time you want a multidimensional array, though: write maintainable code first and then profile it if it is slow; premature optimization is the root of all evil.
Is this worth the effort? Is this too low level of an optimization issue for a language like Java?
Generally speaking, it is not worth the effort. The best strategy is to forget about this issue in the first version of your application and implement it in a straightforward (i.e. easy to maintain) way. If the first version runs too slowly for your requirements, use a profiling tool to find the application's bottlenecks. If the profiling suggests that arrays of arrays are likely to be the problem, do some experiments: change your data structures to simulated multi-dimensional arrays and profile to see if it makes a significant difference. [I suspect that it won't make much difference. But the most important thing is to not waste your time optimizing something unnecessarily.]
Should we just abandon arrays and use Lists even for primitives?
I wouldn't go that far. Assuming that you are dealing with arrays of a predetermined size:
arrays of objects will be a bit faster than equivalent lists of objects, and
arrays of primitives will be considerably faster and take considerably less space than equivalent lists of primitive wrappers.
On the other hand, if your application needs to "grow" the arrays, using a List will simplify your code.
I would not waste the effort to use a 1D array as a multidimensional array in Java, because there is no syntax to help. Of course one could define functions (methods) to hide the work for you, but you just end up with a function call instead of following a pointer, as you would with an array of arrays. Even if the compiler/interpreter speeds this up for you, I still don't think it is worth the effort. In addition, you can run into complications when trying to use code that expects 2D (or N-dimensional) arrays as arrays of arrays; I'm sure most general code out there will be written for arrays like that in Java. Also, with arrays of arrays you can cheaply reassign rows (or columns, if you decide to think of them that way).
If you know that this multidim array is a bottleneck, you may disregard what I said and see if manually optimizing helps.
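For reference, the kind of wrapper being described looks roughly like this (row-major indexing):
// A 2D grid emulated on top of a single flat array.
class Grid2D {
    private final int[] data;
    private final int cols;

    Grid2D(int rows, int cols) {
        this.data = new int[rows * cols]; // one contiguous allocation
        this.cols = cols;
    }

    int get(int row, int col) { return data[row * cols + col]; }

    void set(int row, int col, int value) { data[row * cols + col] = value; }
}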
From personal experience in Java, multidimensional arrays are far, far slower than one-dimensional arrays when loading a large amount of data or accessing elements at scattered positions in the data. I wrote a program that took a screenshot image in BMP format and then searched the screenshot for a smaller image. Loading the screenshot image (approx. 3 MB) into a multidimensional array (three-dimensional, [xPos][yPos][color], with color=0 being the red value, and so forth) took 14 seconds. Loading it into a one-dimensional array took 1 second.
The gain for finding the smaller image in the larger image was similar: it took around 28 seconds when both images were stored as multidimensional arrays, and around a second when both were stored as one-dimensional arrays. That said, I first wrote my program using multidimensional arrays for the sake of readability.