HashMap vs ArrayList (160 entries) [duplicate] - java

Possible Duplicate:
Memory overhead of Java HashMap compared to ArrayList
I tried searching on Google but didn't find an answer. I need to store 160 entries in a collection, and I don't want to iterate over them; I want to get each value by a key, a 2D point. What is the best choice?
With less memory consumption and faster access?
Thanks a lot in advance ;)

For 160 entries? Who cares. Use whichever API is better for this – probably the HashMap.
Also, with data structures, very frequently the tradeoff is speed versus memory use – as a rule you can't have both. (E.g. a hash table that uses chaining for collision resolution will have memory overhead for the buckets over an array; accessing an element by key in it will be faster than searching the array for the key.)
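As a concrete illustration of the HashMap route, a minimal sketch follows; the value type (String) and the use of java.awt.Point as the key are assumptions, since the question doesn't say what the 160 values actually are.

```java
import java.awt.Point;
import java.util.HashMap;
import java.util.Map;

public class PointLookup {
    public static void main(String[] args) {
        // java.awt.Point already implements equals() and hashCode(),
        // so it can be used directly as a HashMap key.
        Map<Point, String> values = new HashMap<>(256); // room for ~160 entries without resizing
        values.put(new Point(3, 7), "some value");
        values.put(new Point(12, 42), "another value");

        // O(1) expected lookup by key, no iteration over the collection.
        String hit = values.get(new Point(3, 7));
        System.out.println(hit); // prints "some value"
    }
}
```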

With less memory consumption and faster access?
HashMap probably gives faster access. This could be computationally important, but only if the application does a huge number of lookups using the collection.
ArrayList probably gives least memory usage. However, for a single collection containing 160 elements, the difference is probably irrelevant.
My advice is to not spend a lot of time trying to decide which is best. Toss a coin if you need to, and then move on to more important problems. You should only be worrying about this kind of thing (where the collection is always small) if CPU and/or memory profiling tells you that this data structure is a critical bottleneck.

If HashMap vs ArrayList are your only two options, obviously it is going to be a HashMap for speed of retrieval. I am not sure about memory usage. But ArrayList has the ability to maintain order.

Related

Are Structs of Arrays always faster than Arrays of Structs?

I was wondering if the Struct of Arrays (SoA) data layout is always faster than an Array of Structs (AoS) or an Array of Pointers (AoP) for problems whose input only fits in RAM, programmed in C/Java.
A few days ago I was improving the performance of a Molecular Dynamics algorithm (in C). Summarizing, this algorithm calculates the force interaction among particles based on their force and position.
Originally the particles were represented by a struct containing 9 doubles: 3 for the particle's forces (Fx, Fy, Fz), 3 for positions and 3 for velocity. The algorithm had an array containing pointers to all the particles (AoP). I decided to change the layout from AoP to SoA to improve cache use.
So now I have a struct with 9 arrays, where each array stores the forces, velocities and positions (x, y, z) of the particles. Each particle is accessed by its own array index.
I got a performance gain (for an input that only fits in RAM) of about 1.9x, so I was wondering whether changing from AoP or AoS to SoA always performs better, and if not, in which types of algorithms it does not.
Much depends on how useful all the fields are. If you have a data structure where using one field means you are likely to use all of them, then an array of structs is more efficient, as it keeps together all the things you are likely to need.
Say you have time-series data where you only need a small selection of the possible fields. You might have all sorts of data about an event or point in time, but you only need, say, 3-5 of them. In this case a structure of arrays is more efficient because a) you don't need to cache the fields you don't use and b) you often access values in order, i.e. caching a field, its next value and the one after that is useful.
For this reason, time-series information is often stored as a collection of columns.
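A minimal sketch of the two layouts in Java (field names follow the question's description; note that in Java an array of objects is really an array of references, so the "AoS" variant behaves more like the question's AoP):

```java
public class Layouts {
    // Array of Structs (AoS): one object per particle, all nine fields kept together.
    // Good when a computation touches most fields of each particle it visits.
    static class Particle {
        double x, y, z;    // position
        double vx, vy, vz; // velocity
        double fx, fy, fz; // force
    }

    // Struct of Arrays (SoA): one array per field, so values of a field are contiguous.
    // Good when a computation sweeps over only a few fields (e.g. just the forces).
    static class ParticleSystem {
        final double[] x, y, z, vx, vy, vz, fx, fy, fz;

        ParticleSystem(int n) {
            x = new double[n];  y = new double[n];  z = new double[n];
            vx = new double[n]; vy = new double[n]; vz = new double[n];
            fx = new double[n]; fy = new double[n]; fz = new double[n];
        }

        // Sweeping a single field streams through contiguous memory.
        void clearForces() {
            for (int i = 0; i < fx.length; i++) {
                fx[i] = 0.0; fy[i] = 0.0; fz[i] = 0.0;
            }
        }
    }

    public static void main(String[] args) {
        Particle[] aos = new Particle[1000];
        for (int i = 0; i < aos.length; i++) aos[i] = new Particle();

        ParticleSystem soa = new ParticleSystem(1000);
        soa.clearForces();
        System.out.println("AoS particles: " + aos.length + ", SoA length: " + soa.fx.length);
    }
}
```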
This will depend on how exactly you access the data.
Try to imagine, what exactly happens in the hardware when you access your data, in either SoA or AoS.
To reason about your question, you must consider the following things:
If the cache is absent, the performance should be the same, assuming that memory access latency is equal for all the elements of the data.
Now, with a cache, if you access consecutive address locations you will definitely get a performance improvement. This is exactly the case here: with AoS the locations are not consecutive in memory, so you must lose some performance there.
You are presumably accessing your data in for loops like for (int i = 0; i < 1000000; i++) Fx[i] = 0. So if the data is large you will easily see the performance benefits; if your data were small, this would not matter much.
Finally, there is also the DRAM you are using to consider. It benefits when you access consecutive data; to understand why, you can refer to the wiki.
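A crude way to see the effect of the access pattern described above (a rough sketch only: timings from a loop like this are noisy without a proper harness such as JMH, and the field/array names are made up):

```java
public class LayoutSweep {
    static class Particle { double fx, fy, fz, x, y, z, vx, vy, vz; }

    public static void main(String[] args) {
        int n = 1_000_000;

        // AoS/AoP-style: an array of object references; each Particle lives
        // somewhere on the heap, so the fx fields are not contiguous.
        Particle[] particles = new Particle[n];
        for (int i = 0; i < n; i++) particles[i] = new Particle();

        // SoA-style: one contiguous array for the field being swept.
        double[] fx = new double[n];

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) particles[i].fx = 0.0; // pointer chase per element
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) fx[i] = 0.0;           // consecutive addresses
        long t2 = System.nanoTime();

        System.out.printf("AoS-style sweep: %.2f ms, SoA-style sweep: %.2f ms%n",
                (t1 - t0) / 1e6, (t2 - t1) / 1e6);
    }
}
```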

Fastest way to access this object

Let's say I have a list of 1,000,000 users where their unique identifier is their username string. So to compare two User objects I just override the compareTo() method and compare the username members.
Given a username string, I wish to find the User object in the list. What, in the average case, would be the fastest way to do this?
I'm guessing a HashMap, mapping usernames to User objects, but I wondered if there was something else that I didn't know about which would be better.
If you don't need to store them in a database (which is the usual scenario), a HashMap<String, User> would work fine - it has O(1) complexity for lookup.
As noted, the usual scenario is to have them in the database. But in order to get faster results, caching is utilized. You can use EhCache - it is similar to ConcurrentHashMap, but it has time-to-live for elements and the option to be distributed across multiple machines.
You should not dump your whole database in memory, because it will be hard to synchronize. You will face issues with invalidating the entries in the map and keeping them up-to-date. Caching frameworks make all this easier. Also note that the database has its own optimizations, and it is not unlikely that your users will be kept in memory there for faster access.
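A minimal sketch of the in-memory approach (the User class here is a stand-in for whatever the real one looks like):

```java
import java.util.HashMap;
import java.util.Map;

public class UserLookup {
    // Minimal User type for illustration; the real class presumably has more fields.
    static class User {
        final String username;
        User(String username) { this.username = username; }
    }

    public static void main(String[] args) {
        // No compareTo() override is needed for a HashMap lookup:
        // String already provides equals() and hashCode() for the key.
        Map<String, User> byName = new HashMap<>(2_000_000); // avoid rehashing for ~1M users
        byName.put("alice", new User("alice"));
        byName.put("bob", new User("bob"));

        User u = byName.get("alice"); // O(1) expected time
        System.out.println(u != null ? u.username : "not found");
    }
}
```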
I'm sure you want a hash map. They're the fastest thing going, and memory efficient. As also noted in other replies, a String works as a great key, so you don't need to override anything. (This is also true of the following.)
The chief alternative is a TreeMap. This is slower and uses a bit more memory. It's a lot more flexible, however. The same map will work great with 5 entries and 5 million entries; you don't need to clue it in in advance. If your list varies wildly in size, the TreeMap will grab memory as it needs it and let it go when it doesn't. HashMaps are not so good about letting go, and as I explain below, they can be awkward when grabbing more memory.
TreeMaps work better with garbage collectors. They ask for memory in small, easily found chunks. If you start a hash table with room for 100,000 entries, when it gets full it will free the 100,000-element array (almost a megabyte on a 64-bit machine) and ask for one that's even larger. If it does this repeatedly, it can get ahead of the GC, which tends to throw an out-of-memory exception rather than spend a lot of time gathering up and compacting scattered bits of free memory. (It prefers to maintain its reputation for speed at the expense of your machine's reputation for having a lot of memory. You really can manage to run out of memory with 90% of your heap unused because it's fragmented.)
So if you are running your program full tilt, your list of names varies wildly in size--and perhaps you even have several lists of names varying wildly in size--a TreeMap will work a lot better for you.
A hash map will no doubt be just what you need. But when things get really crazy, there's the ConcurrentSkipListMap. This is everything a TreeMap is except it's a bit slower. On the other hand, it allows adds, updates, deletes, and reads from multiple threads willy-nilly, with no synchronization. (I mention it just to be complete.)
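For completeness, a small sketch of the ConcurrentSkipListMap mentioned above (the usernames and values are placeholders):

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class ConcurrentUserIndex {
    public static void main(String[] args) {
        // Sorted like a TreeMap, but safe for concurrent reads and writes
        // from multiple threads without external synchronization.
        ConcurrentNavigableMap<String, String> users = new ConcurrentSkipListMap<>();
        users.put("alice", "User alice");
        users.put("bob", "User bob");

        System.out.println(users.get("alice")); // O(log n) lookup
        System.out.println(users.firstKey());   // sorted views come for free
        System.out.println(users.headMap("b")); // all usernames before "b"
    }
}
```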
In terms of data structures, a HashMap can be a good choice. It favours larger datasets. The time for inserts is considered constant, O(1).
In this case it sounds like you will be carrying out more lookups than inserts. For lookups the average time complexity is O(1 + n/k); the key factor here (sorry about the pun) is how effective the hashing algorithm is at evenly distributing the data across the buckets.
The risk here is that the usernames are short and use a small character set such as a-z, in which case there could be a lot of collisions, causing the HashMap to be loaded unevenly and therefore slowing down lookups. One option to improve this could be to create your own user key object and override the hashCode() method with an algorithm that suits your keys better.
In summary, if you have a large data set, a good/suitable hashing algorithm and the space to hold it all in memory, then a HashMap can provide a relatively fast lookup.
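If profiling really did show poor key distribution, the custom key object suggested above could look roughly like this (a hypothetical sketch: the UserKey class and the multiplier are illustrative, and the default String.hashCode() is usually good enough):

```java
import java.util.Objects;

// Hypothetical key wrapper; only worth doing if profiling actually shows
// String.hashCode() clustering your particular usernames.
final class UserKey {
    private final String username;

    UserKey(String username) { this.username = Objects.requireNonNull(username); }

    @Override
    public boolean equals(Object o) {
        return o instanceof UserKey && ((UserKey) o).username.equals(username);
    }

    @Override
    public int hashCode() {
        // Example alternative mixing; any replacement must still be consistent with equals().
        int h = 0;
        for (int i = 0; i < username.length(); i++) {
            h = h * 131 + username.charAt(i);
        }
        return h;
    }
}
```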
I think, given your last post on the ArrayList and its scalability, I would take Bozho's suggestion and go for a purpose-built cache such as EhCache. This will allow you to control memory usage and eviction policies, and it is still a lot faster than DB access.
If you don't change your list of users very often then you may want to use Aho-Corasick. You will need a pre-processing step that will take O(T) time and space, where T is the sum of the lengths of all user names. After that you can match user names in O(n) time, where n is the length of the user name you are looking for. Since you will have to look at every character in the user name you are looking for I don't think it's possible to do better than this.

How to determine Java object size in memory efficiently? [duplicate]

Possible Duplicate:
How to determine the size of an object in Java
I am using an LRU cache which has a limit on memory usage. The LRU cache consists of two data structures: a HashMap and a linked list. The HashMap holds the cache objects, and the linked list keeps a record of the cache objects' access order. In order to determine Java object memory usage, I use an open source tool -- the Classmexer agent,
which is a simple Java instrumentation agent, at http://www.javamex.com/classmexer/.
I also tried another tool named SizeOf at http://sizeof.sourceforge.net/.
The problem is that the measurement is very expensive: the time cost of measuring one object's size is around 1.3 seconds, which is very slow. This can reduce the benefits of caching to zero.
Is there any other solution for measuring a Java object's memory size while it is alive?
Thanks.
Getting an accurate, to-the-byte value will be expensive, as it needs to use reflection to get a nested size, but it can be done.
In Java, what is the best way to determine the size of an object?
http://docs.oracle.com/javase/7/docs/api/java/lang/instrument/Instrumentation.html
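A bare-bones sketch of using that Instrumentation API yourself, in the spirit of Classmexer (the class name SizeAgent is made up; note that getObjectSize() only reports the shallow size of one object, so a deep measurement still means walking references yourself):

```java
import java.lang.instrument.Instrumentation;

// The agent jar must declare "Premain-Class: SizeAgent" in its manifest
// and be loaded with -javaagent:sizeagent.jar.
public class SizeAgent {
    private static volatile Instrumentation instrumentation;

    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    public static long shallowSizeOf(Object o) {
        if (instrumentation == null) {
            throw new IllegalStateException("Agent not loaded; run with -javaagent");
        }
        return instrumentation.getObjectSize(o); // shallow size only
    }
}
```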
Look at this: How to calculate the memory usage of a Java array. Maybe it helps you in the analysis.
Since you already have a tool that does it...the reality is that there's no fast way to do this in Java.

How bad is it to use HashMaps and ArrayLists with huge data?

I am reading an XML document into HashMaps and ArrayLists so that the relationships are maintained in memory. My code does the job, but I am worried about the iterations and function calls I perform on these huge maps and lists. Currently the XML data I am working with is not that huge, but I don't know what will happen if it is. What are the test cases I need to run on the logic that uses these HashMaps? How bad is using Java collections for such huge data? Are there any alternatives? Will the huge data cause the JVM to crash?
Java collections have a certain overhead, which can increase the memory usage a lot (20 times in extreme cases) when they're the primary data structures of an application and the payload data consists of a large number of small objects. This could lead to the application terminating with an OutOfMemoryError even though the actual data is much smaller than the available memory.
ArrayList is actually very efficient for large numbers of elements, but inefficient when you have a large number of lists that are empty or contain only one element. For those cases, you could use Collections.emptyList() and Collections.singletonList() to improve efficiency.
HashMap has the same problem as well as a considerable overhead for each element stored in it. So the same advice applies as for ArrayList. If you have a large number of elements, there may be alternative Map implementations that are more efficient, e.g. Google Guava.
The biggest overheads happen when you store primitive values such as int or long in collections, as they need to be wrapped as objects. In those cases, the GNU Trove collections offer an alternative.
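A small sketch of the boxing overhead being described (the sizes in the comments are typical for a 64-bit HotSpot JVM with compressed oops, not guarantees):

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingOverhead {
    public static void main(String[] args) {
        int n = 1_000_000;

        // Boxed: each element is a separate Integer object (roughly 16 bytes each)
        // plus a reference slot in the ArrayList's backing array.
        List<Integer> boxed = new ArrayList<>(n);
        for (int i = 0; i < n; i++) boxed.add(i);

        // Primitive: one contiguous int[] at 4 bytes per element.
        int[] primitive = new int[n];
        for (int i = 0; i < n; i++) primitive[i] = i;

        // Libraries such as GNU Trove or Eclipse Collections provide list/map types
        // backed by primitive arrays, giving the int[] footprint with a List-like API.
        System.out.println(boxed.size() + " boxed vs " + primitive.length + " primitive ints");
    }
}
```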
In your case specifically, the question is whether you really need to keep the entire data from the XML in memory at once, or whether you can process it in small chunks. This would probably be the best solution if your data can grow arbitrarily large.
The easiest short term solution would be to simply buy more memory. It's cheap.
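If processing the XML in small chunks is an option, as suggested above, a minimal StAX sketch might look like this (the file name data.xml and the element name "record" are placeholders):

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;

public class StreamingXml {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try (FileInputStream in = new FileInputStream("data.xml")) { // hypothetical file
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "record".equals(reader.getLocalName())) { // hypothetical element
                    // Process one record here and let it become garbage afterwards,
                    // instead of accumulating everything in HashMaps/ArrayLists.
                }
            }
            reader.close();
        }
    }
}
```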
The JVM will not crash in what you describe. What may happen is an OutOfMemoryError. Also, if you retain the data in those collections for long, you may have issues with garbage collection. Do you really need to store the whole XML data in memory?
If you are dealing with temporary data and you need fast access to it, you do not have too many alternatives. The question is what you mean when you say "huge": megabytes? Gigabytes? Terabytes?
As long as your data does not exceed 1 GB, IMHO holding it in memory may be OK. Otherwise you should think about alternatives like a DB (relational or NoSQL), files, etc.
In your specific example I'd think about replacing ArrayList with LinkedList, unless you need a random-access list. ArrayList is just a wrapper over an array, so when you need 1 million elements it allocates an array 1 million elements long. A linked list is better when the number of elements is large, but access by index is O(n). If you need both (i.e. a huge list and fast access), use a TreeMap with the index as the key instead. You will get O(log n) access.
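A small sketch of the TreeMap-with-index idea from this answer (purely illustrative; for plain index-based access, an ArrayList's O(1) get is still hard to beat):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class IndexedTree {
    public static void main(String[] args) {
        // What this answer describes: a map keyed by position, giving O(log n)
        // access by index without one huge contiguous backing array.
        NavigableMap<Integer, String> byIndex = new TreeMap<>();
        for (int i = 0; i < 1_000; i++) {
            byIndex.put(i, "element " + i);
        }
        System.out.println(byIndex.get(500)); // O(log n) lookup by index
    }
}
```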
What are the test cases I need to run on the logic that uses these HashMaps?
Why not generate large XML files (for example, 5 times larger than your current data samples) and check your parsers/memory storage with them? Since only you know what files are possible in your case and how fast they will grow, this is the only solution.
How bad is using Java collections for such huge data? Are there any
alternatives? Will the huge data cause the JVM to crash?
Of course, it is possible that you will get an OutOfMemoryError if you try to store too much data in memory and it is not eligible for GC. This library: http://trove.starlight-systems.com/ claims that it uses less memory, but I didn't use it myself. Some discussion is available here: What is the most efficient Java Collections library?
How bad is using Java collections for such huge data?
Java Map implementations and (to a lesser extent) Collection implementations do tend to use a fair amount of memory. The effect is most pronounced when the key / value / element types are wrapper types for primitive types.
Are there any alternatives?
There are alternative implementations of "collections" of primitive types that use less memory; e.g. the GNU Trove libraries. But they don't implement the standard Java collection APIs, and that severely limits their usefulness.
If your collections don't use the primitive wrapper classes, then your options are more limited. You might be able to implement your own custom data structures to use less memory, but the saving won't be that great (in percentage terms) and you've got a significant amount of work to do to implement the code.
A better solution is to redesign your application so that it doesn't need to represent the entire XML data structure in memory. (If you can achieve this.)
Will the huge data cause the JVM to crash?
It could cause a JVM to throw an OutOfMemoryError. That's not technically a crash, but in your use-case it probably means that the application has no choice but to give up.

What is the fastest way to deal with large arrays of data in Android?

Could you please suggest to me (a novice in Android/Java) the most efficient way to deal with relatively large amounts of data?
I need to compute some stuff for each of the 1000...5000 elements of, say, a big data type (x1, y1, z1 - double, flag1...flagn - boolean, desc1...descn - string) quite often (once a second), which is why I want to do it as fast as possible.
What would be the best way? To declare a multidimensional array, to produce an array for each element (x1[i], y1[i]...), a special class, some sort of JavaBean? Which one is the most efficient in terms of speed etc.? Which is the most common way to deal with that sort of thing in Java?
Many thanks in advance!
Nick, you've asked a very general question. I'll do my best to answer it, but please be aware that if you want anything more specific, you're going to need to narrow your question down a bit.
Some back-of-the-envelope calculations show that for an array of 5000 doubles you'll use 8 bytes * 5000 = 40,000 bytes, or roughly 40 kB of memory. This isn't too bad, as memory on most Android devices is on the order of megabytes or even gigabytes. A good ol' ArrayList should do just fine for storing this data. You could probably make things a little faster by specifying the ArrayList's capacity in the constructor; that way the ArrayList doesn't have to expand dynamically every time you add more data to it.
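A minimal sketch of the pre-sized ArrayList suggestion (the Sample class and its fields are made up to match the question's description):

```java
import java.util.ArrayList;
import java.util.List;

public class SensorBuffer {
    // Illustrative element type matching the question's description (names are made up).
    static class Sample {
        double x, y, z;
        boolean flag1;
        String desc1;
    }

    public static void main(String[] args) {
        int expected = 5_000;

        // Passing the expected size up front avoids repeated growth/copy of the
        // backing array while the 1000..5000 elements are added each second.
        List<Sample> samples = new ArrayList<>(expected);
        for (int i = 0; i < expected; i++) {
            samples.add(new Sample());
        }
        System.out.println("buffered " + samples.size() + " samples");
    }
}
```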
Word of caution though. Since we are on a memory restricted device, what could potentially happen is if you generate a lot of these ArrayLists rapidly in succession, you might start triggering the garbage collector a lot. This could cause your app to slow down (the whole device actually). If you're really going to be generating lots of data, then don't store it in memory. Store it off on disk where you'll have plenty of room and won't be triggering the garbage collector all the time.
I think that the efficiency with which you write the computation you need to do on each element is way more important than the data structure you use to store it. The difference between using an array for each element or an array of objects (each of which is the instance of a class containing all elements) should practically be negligible. Use whatever data structures you feel most comfortable with and focus on writing efficient algorithms.
