I have a map of key-value pairs of huge size, approximately 10^7 entries, and I have to loop through it 15 times a second in order to update its contents.
Is there any class or structure that offers good complexity and reduces the time needed to loop through?
Currently I am using a TreeMap, but the complexity is O(log n) only for contains, put, get and remove; looping through the elements is O(n).
Do you know any structure or do you have any idea that may reduce the complexity below n?
If you have to loop over the entire collection, you will not get better than O(n). If all you do is loop over the whole collection, you could use a simple ArrayList; but if you need to access specific data in the collection using a key, TreeMap will be fine.
You can't beat the O(n) bound on any sequential (or finitely parallel) computer if your problem is just to look at all n values.
If you have a finitely parallel machine and depending on exactly how you're updating the elements, you could achieve speedup. For instance, using CUDA and a GPU or OpenMP/MPI and a cluster/multi-core workstation, you could compute A[i] = A[i]^3 or some such with good speedup. Of course, then there's the question of communication... but this might be something to look at.
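In plain Java (no CUDA or MPI needed), the same kind of data-parallel update can be sketched with parallel streams. This is only a sketch: the array size, the class name and the cubing operation are placeholders for whatever per-element update you actually need.

import java.util.Arrays;
import java.util.stream.IntStream;

public class ParallelUpdate {
    public static void main(String[] args) {
        double[] a = new double[10_000_000];
        Arrays.fill(a, 2.0);

        // Touch every element once, spread across the available cores;
        // the cubing is only a stand-in for the real per-element work.
        IntStream.range(0, a.length)
                 .parallel()
                 .forEach(i -> a[i] = a[i] * a[i] * a[i]);

        System.out.println(a[0]); // 8.0
    }
}

The total work is still O(n); parallelism only divides the wall-clock time by roughly the number of cores, as noted above.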
Related
I want to store 1*10^8 objects in a map for searching. When my program starts, it reads and stores these objects in the map. After the reading is finished, the map is never updated until the program dies. I don't want the JVM to discard any of them. I have learned that HashMap wastes a lot of memory; is there any type of map that can store this many objects and still save memory?
I also know that the JVM will scan these objects (during garbage collection), which wastes time. How can I avoid this?
Sorry, the situation is this: I am writing a bolt with Apache Storm and I want to read data from databases. When a bolt is processing a tuple, I need to do calculations with the data from the databases. For the performance of the program I have to keep that data in memory. I know the JVM is not good at managing a lot of memory, so maybe I should try Koloboke?
A HashMap needs to allocate a backing array of sufficient size in order to minimize hash collisions: two or more objects that are not equal can have the same hash code, and the probability of that depends on the quality of the hash function. Collisions are resolved by techniques such as linear probing (store the entry at the next unoccupied (hash + i) mod length index), quadratic probing (store it at the next unoccupied (hash + i^k) mod length index), or separate chaining (store a linked list of entries at each bucket, which is what Java's HashMap does). The collision probability is decreased by increasing the length of the backing array, and that is where the memory goes to waste.
However, you can use TreeMap, which stores its entries in a tree structure (a red-black tree in the JDK) and creates exactly one node per entry, i.e. it uses memory efficiently.
Note that there is a difference in the complexity of the get, put and remove operations: HashMap has complexity O(1), while TreeMap has complexity O(log n).
Suppose you want to get an entry from a map of size 100,000,000. In the worst case (the element to be found is a leaf, i.e. it sits on the last level of the tree), the path that has to be walked down a balanced binary tree has length log2(100,000,000) ≈ 27.
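If you do stay with HashMap, you can at least control the backing array yourself by passing an initial capacity and load factor to the constructor, so it is allocated once and never rehashed while loading. A minimal sketch; the key/value types and the sizes are made up, and the question's 10^8 entries would need a correspondingly large heap.

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MapChoice {
    public static void main(String[] args) {
        int expectedEntries = 10_000_000;   // illustrative; scale up to your real size

        // Capacity chosen so expectedEntries stays below capacity * loadFactor,
        // which means the map is never resized/rehashed while it is being filled.
        Map<Long, Double> hashMap =
                new HashMap<>((int) (expectedEntries / 0.75f) + 1, 0.75f);

        // TreeMap allocates exactly one node per entry and no backing array,
        // trading O(1) lookups for O(log n) ones.
        Map<Long, Double> treeMap = new TreeMap<>();

        hashMap.put(42L, 3.14);
        treeMap.put(42L, 3.14);
    }
}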
Well, I am back.
At first I used about 30 GB to store about 5x10^7 key-value entries, but GC was not stable. I had made the mistake of storing doubles as strings, which takes more memory than a double because a char is 16 bits in Java. After I fixed that mistake, GC behaved better, but still not well enough. Finally I used 'filedb' in MapDB to fix this.
Suppose you are given a list of integers that have already been sorted such as (1,7,13,14,50). It should be noted that the list will contain no duplicates.
Is there some data structure that could store this while allowing me to add any new element (at its proper location) in constant time? add(10) would yield (1,7,10,13,14,50).
Similarly, would I be able to update an element (such as changing 7 to 19) and shift the order accordingly in constant time? change(7,19) yields (1,13,14,19,50).
For a class I need to write a data structure that performs these operations as quickly as possible, but I just wanted to know whether constant time is possible and, if not, what the ideal runtime would be.
Constant-time insertion, O(1), would only occur as a best case for any of these data structures. Hash tables generally have the best insertion time, but it might not always be O(1) when there are collisions and separate chaining is used. A hash table is not kept sorted, though, so its complexity is irrelevant to this problem.
Binary trees have a good insertion time and, as a bonus, the structure is already sorted after each insertion. On average an insert takes O(log n) time, however. The best case for inserting is O(1), when the tree is empty.
Those were just a couple of examples; see here for more info on the complexities of these operations: http://bigocheatsheet.com/
In general? No. Determining where to insert a new element, or re-ordering the list after insertion, involves analysing the list's contents, which means reading the elements of the list, which (in general) means iterating over some portion of the length of the list. This (again, in general) is dependent on how many elements are in the list, which by definition is not a constant. Hence, a constant-time sorted insert is simply not possible except in special cases.
A binary tree, e.g. a TreeSet, would be adequate. An array with Arrays.binarySearch and Arrays.copyOf would be fine too, because here we have ints, so we do not even need the wrapper class Integer.
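A minimal sketch of the array variant (the class and method names are made up): Arrays.binarySearch encodes the insertion point in its negative return value when the key is absent, and the copy keeps the array sorted. Note that only the search is O(log n); the insert itself is still O(n) because of the shift.

import java.util.Arrays;

public class SortedInsert {
    // Returns a new sorted array that also contains value (duplicates ignored).
    static int[] insert(int[] sorted, int value) {
        int pos = Arrays.binarySearch(sorted, value);
        if (pos >= 0) {
            return sorted;                     // already present
        }
        int at = -pos - 1;                     // decode the "not found" result
        int[] result = new int[sorted.length + 1];
        System.arraycopy(sorted, 0, result, 0, at);
        result[at] = value;
        System.arraycopy(sorted, at, result, at + 1, sorted.length - at);
        return result;
    }

    public static void main(String[] args) {
        int[] d = {1, 7, 13, 14, 50};
        System.out.println(Arrays.toString(insert(d, 10)));   // [1, 7, 10, 13, 14, 50]
    }
}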
For real constant time, O(1), one must pay in space: use a BitSet. To add 17, simply set bit 17 to true. There are optimized methods to find the next set bit and so on.
But I doubt optimizing is really needed at this spot. File I/O might pay off more.
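A minimal sketch of the BitSet idea, assuming the values are non-negative ints: add and change are O(1) single-bit operations, and nextSetBit walks the values in ascending order.

import java.util.BitSet;

public class BitSetExample {
    public static void main(String[] args) {
        BitSet set = new BitSet();
        for (int v : new int[] {1, 7, 13, 14, 50}) {
            set.set(v);                  // add(v) in O(1)
        }

        set.clear(7);                    // change(7, 19): drop the old value...
        set.set(19);                     // ...and set the new one, both O(1)

        // Iterate in ascending order.
        for (int v = set.nextSetBit(0); v >= 0; v = set.nextSetBit(v + 1)) {
            System.out.print(v + " ");   // 1 13 14 19 50
        }
    }
}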
I have a sorted array, let's say D = {1,2,3,4,5,6}, and I want to add the number 5 in the middle. I can do that by putting the value 5 in the middle and moving the other values one step to the right.
The problem is that I have an array of length 1000 and I need to do that operation 10,000 times, so I need a faster way.
What options do I have? Can I use LinkedLists for better performance?
That depends on how you add said numbers. If only in ascending or descending order, then yes, a LinkedList will do the trick, but only if you keep a reference to the insertion position (e.g. a ListIterator) between inserts.
If you're adding numbers in arbitrary order, you may want to deconstruct your array, add the new entries and reconstruct it again. This way you can use a data structure that's good at adding and removing entries while maintaining "sortedness". You have to relax one of your assumptions however.
Option 1
Assuming you don't need constant time random access while adding numbers:
Use a binary sorted tree.
The downside: while you're adding, you cannot read or reference an element by its position, at least not easily. Best case scenario, you're using a tree that keeps track of how many elements its left subtree has and can get the i-th element in O(log n) time. You can still get pretty good performance if you're just iterating through the elements, though.
Total runtime is down to O(n log n) from O(n^2). Random access is O(log n).
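A minimal sketch of Option 1 with the JDK's TreeSet (a red-black tree), using made-up input: O(log n) per insert and sorted iteration, but no cheap positional access, and duplicate values are silently dropped (a TreeMap from value to count handles duplicates if you need them).

import java.util.TreeSet;

public class Option1Sketch {
    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>();
        for (int value : new int[] {6, 1, 4, 2, 5, 3}) {
            set.add(value);              // O(log n), stays sorted
        }
        set.add(5);                      // a duplicate is simply ignored

        System.out.println(set);         // [1, 2, 3, 4, 5, 6]
    }
}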
Option 2
Assuming you don't need the elements sorted while you're adding them.
Use a normal array, but add elements to the end of it, then sort it all when you're done.
Total runtime: O(n log n). Random access is O(1); however, the elements are not sorted until the final sort is done.
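A minimal sketch of Option 2 with made-up data: append in O(1) each, sort once when you're done.

import java.util.Arrays;

public class Option2Sketch {
    public static void main(String[] args) {
        int n = 10_000;
        int[] values = new int[n];
        int size = 0;

        for (int i = 0; i < n; i++) {
            values[size++] = (i * 31) % 1000;   // arbitrary example data, appended unsorted
        }

        Arrays.sort(values, 0, size);           // one O(n log n) sort at the end
    }
}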
Option 3
(This is kinda cheating, but...)
If you have a limited number of values, then employing the idea of BucketSort will help you achieve great performance. Essentially - you would replace your array with a sorted map.
Runtime is O(n), random access is O(1), but it's only applicable to a very small number of situations.
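A minimal sketch of Option 3, using a TreeMap from value to count as the buckets (the input values are made up); it assumes the values come from a limited range, and it handles duplicates naturally.

import java.util.Map;
import java.util.TreeMap;

public class Option3Sketch {
    public static void main(String[] args) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int value : new int[] {5, 1, 3, 5, 2, 3}) {
            counts.merge(value, 1, Integer::sum);    // one bucket per distinct value
        }

        // Walk the buckets in ascending key order to read the data back sorted.
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                System.out.print(e.getKey() + " ");  // 1 2 3 3 5 5
            }
        }
    }
}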
TL;DR
Getting arbitrary values, quick adding and constant-time positional access, all while maintaining sortedness, is difficult. I don't know of any such structure. You have to relax some assumption to make room for optimizations.
A LinkedList will probably not help you very much, if at all. Basically you are exchanging the cost of shifting every value on insert for the cost of having to traverse each node in order to reach the insertion point.
This traversal cost will also need to be paid whenever accessing each node. A LinkedList shines as a queue, but if you need to access the internal nodes individually it's not a great choice.
In your case, you want a sorted tree of some sort. A BST (Binary Search Tree, also referred to as a sorted binary tree) is one of the simplest types and is probably a good place to start.
A good option is a TreeSet, which is likely functionally equivalent to how you were using an array, if you simply need to keep track of a set of sorted numbers.
I am trying to find out which structure would be the fastest, because I have a problem with my code. I have a large amount of data to store; maybe thousands of nodes are needed. My first thought was to create an ArrayList and start adding integers to use later. This ArrayList is meant for fast access to byte positions in random access files. So I put in the first node, which represents a pointer to the first entry in a RandomAccessFile, then the second in the same way, and so on.
My program takes too long when putting the integers into the ArrayList.
Could I fix my code by using a faster structure?
Yes,
you can use a LinkedList. Your ArrayList has amortized O(1) insertion, but when a huge ArrayList needs to be resized it takes a long time to allocate a new backing array, copy the current elements over, and continue.
E.g. if you have 10 million elements in your ArrayList and it is full, then when you insert one more, the ArrayList has to allocate a larger backing array and copy all the existing elements into it. This is a very expensive operation.
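That said, if the grow-and-copy is the actual bottleneck, you can also avoid it without switching structures by giving the ArrayList its expected size up front. A minimal sketch; the size and the offsets are made up and scaled down.

import java.util.ArrayList;
import java.util.List;

public class PreSizedList {
    public static void main(String[] args) {
        int expected = 1_000_000;           // use your real expected element count here

        // One allocation up front, so no resize-and-copy happens during the adds.
        List<Integer> pointers = new ArrayList<>(expected);
        for (int i = 0; i < expected; i++) {
            pointers.add(i * 8);            // e.g. a byte offset into a random access file
        }

        // ArrayList.ensureCapacity(int) does the same for a list that already exists.
    }
}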
If you use a LinkedList you have O(1) insertion but no random access, so if you want to reach the n-th element you have to traverse all the nodes up to n, which takes O(n). But do you really need to do that?
So a LinkedList is an option (Java's LinkedList is in fact a doubly linked list).
If you want fast reads as well as fast insertion, you can use a hash-based map such as HashMap (or the legacy Dictionary/Hashtable): you get O(1) writes and reads, provided the hashing is good.
But again, internally Hashtable and HashMap use arrays, so once the map grows too large you have the same resizing problem; moreover, each time the backing array expands, all entries have to be redistributed into the new buckets.
You can use trees, with O(log n) writes and reads.
You can use a skip list, with O(log n) writes and reads.
An ArrayList is clearly not the fastest thing here, because an ArrayList does not store ints but the Integer wrapper type. Therefore a plain int[] array has the lowest overhead.
On the other hand: if you can omit the list/array completely and do the calculations immediately, that would save even more overhead. This points toward not micro-optimizing, but thinking about the problem and perhaps using a completely different algorithm.
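And if the list really cannot be omitted, here is a minimal sketch of the plain int[] idea with manual growth, so there is no Integer boxing at all; the class name, sizes and growth factor are just illustrative choices.

public class IntBuffer {
    private int[] data = new int[16];
    private int size = 0;

    // Append a primitive int; grow the backing array manually when it is full.
    void add(int value) {
        if (size == data.length) {
            data = java.util.Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = value;
    }

    int get(int index) {
        return data[index];
    }

    public static void main(String[] args) {
        IntBuffer b = new IntBuffer();
        for (int i = 0; i < 1_000_000; i++) {
            b.add(i * 8);                // e.g. byte offsets into a random access file
        }
        System.out.println(b.get(123));  // 984
    }
}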
I am comparing 2 HashMaps, and I am trying to figure out the time complexity of the comparison loop.
The code is as follows:
//map1 is a HashMap and contains m elements and keys
//map2 is a HashMap and contains n elements and keys
List<myObject> myList = new ArrayList<myObject>();
for (String key : map1.keySet()) {
    if (!map2.containsKey(key)) {
        myList.add(map1.get(key));
    }
}
The for loop will be O(m). I found on some other forum that containsKey() takes O(log n) time. Can someone please confirm that? I couldn't find it in the JavaDocs.
If so, then the total time complexity would be O(m log n).
Also any ideas on how to do this comparison in a better way would be helpful.
That depends on your hashCode algorithm and on collisions.
With a perfect hash code, a map lookup is theoretically O(1), constant time; with collisions, it can degrade to O(n).
So in your case, if you have a good hash algorithm, it would be O(m).
If you look at the Wikipedia article on hash tables, you can get a better understanding of the concept. You can also look at the HashMap source code.
The Java HashMap implementation keeps resizing its internal array so that it stays larger than the number of elements in the map by a certain factor, and the hashing algorithm is good, so I would assume collisions are minimal and that you will get much closer to O(1) than O(n).
What HashMap are you using? The one that comes with Java? Your own?
You're right about the time complexity of the outer loop: O(m). The asymptotic complexity of HashMap.containsKey() is O(1) unless you've done something ridiculous in your implementation of myObject.hashCode(). So your method should run in O(m) time. An optimization would be to ensure you're looping over the smaller of the two maps.
Note that TreeMap.containsKey() has O(log n) complexity, not HashMap... Stop looking at those forums :)
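As for doing the comparison in a better way: one compact alternative (a sketch with made-up key and value types) is to copy map1's key set and remove map2's keys in one call. It is still O(m) overall, the same as your loop, just shorter.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MapDiff {
    public static void main(String[] args) {
        Map<String, String> map1 = new HashMap<>();
        Map<String, String> map2 = new HashMap<>();
        map1.put("a", "1");
        map1.put("b", "2");
        map2.put("a", "x");

        // Keys that are in map1 but not in map2.
        Set<String> onlyInMap1 = new HashSet<>(map1.keySet());
        onlyInMap1.removeAll(map2.keySet());

        // Collect the corresponding values from map1.
        List<String> myList = new ArrayList<>();
        for (String key : onlyInMap1) {
            myList.add(map1.get(key));
        }
        System.out.println(myList);      // [2]
    }
}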