Java: Inserting into LinkedList efficiently - java

I am optimizing an implementation of a sorted LinkedList.
To insert an element I traverse the list and compare each element until I have the correct index, and then break loop and insert.
I would like to know if there is any other way that I can insert the element at the same time as traversing the list to reduce the insert from O(n + (n capped at size()/2)) to O(n).
A ListIterator is almost what Im after because of its add() method, but unfortunately in the case where there are elements in the list equal to the insert, the insert has to be placed after them in the list. To implement this ListIterator needs a peek() which it doesnt have.
edit: I have my answer, but will add this anyway since a lot of people havent understood correctly:
I am searching for an insertion point AND inserting, which combined is higher than O(n)

You may consider a skip list, which is implemented using multiple linked lists at varying granularity. E.g. the linked list at level 0 contains all items, level 1 only links to every 2nd item on average, level 2 to only every 4th item on average, etc.... Searching starts from the top level and gradually descends to lower levels until it finds an exact match. This logic is similar to a binary search. Thus search and insertion is an O(log n) operation.
A concrete example in the Java class library is ConcurrentSkipListSet (although it may not be directly usable for you here).

I'd favor Péter Török suggestion, but I'd still like to add something for the iterator approach:
Note that ListIterator provides a previous() method to iterate through the list backwards. Thus first iterate until you find the first element that is greater and then go to the previous element and call add(...). If you hit the end, i.e. all elements are smaller, then just call add(...) without going back.

I have my answer, but will add this anyway since a lot of people havent understood correctly: I am searching for an insertion point AND inserting, which combined is higher than O(n).
Your require to maintain a collection of (possibly) non-unique elements that can iterated in an order given by a ordering function. This can be achieved in a variety of ways. (In the following I use "total insertion cost" to mean the cost of inserting a number (N) of elements into an initially empty data structure.)
A singly or doubly linked list offers O(N^2) total insertion cost (whether or not you combine the steps of finding the position and doing the insertion!), and O(N) iteration cost.
A TreeSet offers O(NlogN) total insertion cost and O(N) iteration cost. But has the restriction of no duplicates.
A tree-based multiset (e.g. TreeMultiset) has the same complexity as a TreeSet, but allows duplicates.
A skip-list data structure also has the same complexity as the previous two.
Clearly, the complexity measures say that a data structure that uses a linked list performs the worst as N gets large. For this particular group of requirements, a well-implemented tree-based multiset is probably the best, assuming there is only one thread accessing the collection. If the collection is heavily used by many threads (and it is a set), then a ConcurrentSkipListSet is probably better.
You also seem to have a misconception about how "big O" measures combine. If I have one step of an algorithm that is O(N) and a second step that is also O(N), then the two steps combined are STILL O(N) .... not "more than O(N)". You can derive this from the definition of "big O". (I won't bore you with the details, but the Math is simple.)

Related

Singly linked list iteration complexity

First of all, my basic understanding of a singly linked list has been that every node only points to the next subsequent node, so my problem might stem from the fact that my definition of such list is incorrect.
Given the list setup, getting to node n would require iterating through n-1 nodes that come before it, so search and access would be O(n). Now, apparently node insertion and deletion take O(1), but unless they are talking about first item insertion, then in reality it would be O(n) + O(1) for you to insert the item between nodes n and n+1.
Now, indexing a list would also have O(n) complexity, yet apparently building such indexes is frowned upon, and I cannot understand why. Couldn't we build a singly linked list index, which would allow us true O(1) insertion and deletion without having to perform an O(n) iteration over the list to get to our specific node? It wouldn't even need to be an index of all nodes, and we could have it pointing to subindexes i.e. for a list of 1000 items, first index would point to 10 different indexes for items between 1-100, 101-200, etc. and then those indexes could point to smaller indexes that go by 10. This way getting to node 543 could potentially take only 3+(index traversal) iterations, instead of 543 as it would for a typical singly linked list.
I guess, what I am asking is why such indexing should typically be avoided?
You are describing a skip-list.
A skip list have a search, insert, delete time complexity of O(logN), since this "smaller subindexes" you describe - you have logarithmic number of them (what happens if your list has 100 elements? How many of these levels do you need? And how much for 1,000,000 elements? and 10^30?).
Note that a skip list is usually maintained sorted, but you can do it unsorted (which is sorted by index - actually) if you wish as well.
With a singly-linked list, even if you already have a direct reference to the node to delete, the complexity is not O(1). This is because you have to update the prior node's next-node reference, which requires you to iterate through the list -- resulting in O(N) complexity. To get O(1) complexity for deletion, you'd need a doubly-linked list.
There is already a collections class that combines a HashMap with a doubly-linked list: the LinkedHashMap.

Fastest way to add a value in the middle of a sorted array - Java

I have a sorted array, lets say D={1,2,3,4,5,6} and I want to add the number 5 in the middle. I can do that by adding the value 5 in the middle and move the other values one step to the right.
The problem is that I have an array with 1000 length and I need to do that operation 10.000 times, so I need a faster way.
What options do I have? Can I use LinkedLists for better performance?
That depends on how you add said numbers. If only in ascending or descending order - then yes, LinkedList will do the trick, but only if you keep the node reference in between inserts.
If you're adding numbers in arbitrary order, you may want to deconstruct your array, add the new entries and reconstruct it again. This way you can use a data structure that's good at adding and removing entries while maintaining "sortedness". You have to relax one of your assumptions however.
Option 1
Assuming you don't need constant time random access while adding numbers:
Use a binary sorted tree.
The downside - while you're adding, you cannot read or reference an element by their position, not easily at least. Best case scenario - you're using a tree that keeps track of how many elements the left node has and can get the ith element in log(n) time. You can still get pretty good performance if you're just iterating through the elements though.
Total runtime is down to n * log(n) from n^2. Random access is log(n).
Option 2
Assuming you don't need the elements sorted while you're adding them.
Use a normal array, but add elements to the end of it, then sort it all when you're done.
Total runtime: n * log(n). Random access is O(1), however elements are not sorted.
Option 3
(This is kinda cheating, but...)
If you have a limited number of values, then employing the idea of BucketSort will help you achieve great performance. Essentially - you would replace your array with a sorted map.
Runtime is O(n), random access is O(1), but it's only applicable to a very small number of situations.
TL;DR
Getting arbitrary values, quick adding and constant-time positional access, while maintaining sortedness is difficult. I don't know any such structure. You have to relax some assumption to have room for optimizations.
A LinkedList will probably not help you very much, if at all. Basically you are exchanging the cost of shifting every value on insert with the cost of having to traverse each node in order to reach the insertion point.
This traversal cost will also need to be paid whenever accessing each node. A LinkedList shines as a queue, but if you need to access the internal nodes individually it's not a great choice.
In your case, you want a sorted Tree of some sort. A BST (Balanced Search Tree, also referred to as a Sorted Binary Tree) is one of the simplest types and is probably a good place to start.
A good option is a TreeSet, which is likely functionally equivalent to how you were using an array, if you simply need to keep track of a set of sorted numbers.

Sorted data structure with O(logN) insertion that gives insertion point index

My goal is a sorted data structure that can accomplish 2 things:
Fast insertion (at the location according to sort order)
I can quickly segment my data into the sets of everything greater than or less than or equal to an element. I need to know the size of each of these partitions, and I need to be able to "get" these partitions.
Currently, I'm implementing this in java using an ArrayList which provides #2 very easily since I can perform binary search (Collections.binarySearch) and get an insertion index telling me at what point an element would be inserted. Then based on the fact that indices range from 0 to the size of the array, I immediately know how many elements are greater than my element or smaller than my elements, and I can easily get at those elements (as a sublist). However, this doesn't have property #1, and results in too much array copying.
This makes me want to use something like a SkipList or RedBlackTree that could perform the insertions faster, but then I can't figure out how to satisfy property #2 without making it take O(N) time.
Any suggestions would be appreciated. Thanks
EDIT: Thanks for the answers below that reference data structures that perform the insertion in O(logN) time and that can partition quickly as well, but I want to highlight the size() requirement - I need to know the size of these partitions without having to traverse the entire partition (which, according to this is what the TreeSet does. The reasoning behind this is that in my use case I maintain my data using several different copies of data structures each using a different comparator, and then need to ask "according to what comparator is the set of all things larger than a particular element smallest". In the ArrayList case, this is actually easy and takes only O(YlogN) where Y is the number of comparators, because I just binary search each of the Y arrays and return the arraylist with the highest insertion index. It's unclear to me how I could this with a TreeSet without taking O(YN).
I should also add that an approximate answer for the insertion index would still be valuable even if it couldn't be solved exactly.
Use a common Java TreeSet. Insertion takes O(logN), so #1 of your requirements is done. Here's the qouting from documentation:
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
And as it implements the NavigableSet interface, you have #2 or your requirements with the following methods:
tailSet(someElem) returns a Set view starting from someElem till the last element
headSet(someElem) returns a Set view starting from the first element till someElem
subSet(fromElem, toElem) returns a Set view starting from fromElem till toElem
These operations are overloaded with versions that include/exclude the bounds provided.
TreeSet is quite flexible: it allows you to define a Comparator to order the Set in a custom way, or you can also rely on the natural ordering of the elements.
EDIT:
As per the requirement of returned subsets size() operation to not be O(n), I'm afraid there's no adhoc implementation in the Java API.
It is true, the set views returned by TreeSet range operations, implement size() by 'jumping' to the first element of the view in O(log n) time, and then iterating over the subsequent elements, adding 1 in each iteration, until the end of the subset is reached.
I must say this is quite unfortunate, since it's not always needed to traverse the returned subset view, but sometimes, knowing the size of the subset in advance can be quite useful (as it's your use case).
So, in order to fulfil your requirement, you need another structure, or at least, an auxiliary structure. After some research, I suggest you use a Fenwick tree. A Fenwick tree is also known as a Binary Indexed Tree (BIT), and can be either immutable or mutable. The immutable version is implemented with an array, while the mutable version could be implemented with a balanced binary tree, i.e. a Red-Black tree (Java TreeSet is actually implemented as a Red-Black tree). Fenwick trees are mainly used to store frequencies and calculate the sum of all frequencies up to a given element in O(log n) time.
Please refer to this question here on Stack Overflow for a complete introduction to this quite unknown but yet incredibly useful structure. (As the explanation is here in Stack Overflow, I won't copy it here).
Here's another Stack Overflow question asking how to properly initialize a Fenwick tree, and here's actual Java code showing how to implement Fenwick tree's operations. Finally, here's a very good theoretic explanation about the structure and the underlying algorithms being used.
The problem with all the samples in the web is that they use the immutable version of the structure, which is not suitable to you, since you need to interleave queries with adding elements to the structure. However, they are all very useful to fully understand the structure and algorithms being used.
My suggestion is that you study Java TreeMap's implementation and see how to modify/extend it so that you can turn it into a Fenwick tree that keeps 1 as a value for every key. This 1 would be each key's frequency. So Fenwick tree's basic operation getSum(someElement) would actually return the size of the subset from first element up to someElement, in O(log n) time.
So the challenge is to implement a balanced tree (a descendant of Java's Red-Black TreeMap, actually), that implements all Fenwick tree's operations you need. I believe you'd be done with getSum(somElement), but maybe you could also extend the returned subtree range views so that they all refer to getSum(someElelment) when implementing size() operation for range views.
Hope this helps, at least I hope it's a good place to start. Please, let me know if you need clarifications, as well as examples.
If you don't need duplicate elements (or if you can make the elements look distinct), I'd use a java.util.TreeSet. It meets your stated requirements.
O(log n) sorted insertion due to binary tree structure
O(log n) segmentation time using in-place subsets
Unfortunately, the O(log n) segmentation time is effectively slowed to O(n) by your requirement to always know the size of the segment, due to the reason in the answer you linked. The in-place subsets don't know their size until you ask them, and then they count. The counted size is stored, but if the backing set is changed in any way, the subset has to count again.
I think the best data structure for this problem would be a B-Tree with a dense index. Such a B-Tree is built from:
- inner nodes containing only pointers to child nodes
- leafs containing pointers to paged arrays
- a number of equal-sized-arrays (pages)
Unfortunately there are few generic implementations of a B-Tree in Java, probably because so many Variations exist.
The cost of insertion would be
O(log(n)) to find the position
O(p) to insert a new element into a page (where p is the constant page size)
Maybe this data structure also covers your segmentation problem. If not: The cost of extracting would be
O(log(n)) to find the borders
O(e) to copy the extract (where e is the size of the extract)
One easy way to get what you want involves augmenting your favourite binary search tree data structure (red-black trees, AVL trees, etc...) with left and right subtree sizes at each node --- call them L-size and R-size.
Assume that updating these fields in your tree data structures can be done efficiently (say constant time). Then here is what you get:
Insertion, deletion, and all the regular binary search tree operations as efficient as your choice of data structure --- O(log n) for red-back trees.
Given a key x, you can get the number of elements in your tree that have keys less than x in O(log n) time, by descending down the tree to find the appropriate location for x, summing up the L-sizes (plus one for the actual node you're traversing) each time you "go right". The "greater than" case is symmetrical.
Given a key x, you can get the sorted list x_L of elements that are less than x in O(log n + |x_L|) time by, again, descending down the tree to find the appropriate location for x, and each time you go right you tag the node you just traversed, appending it to a list h_L. Then doing in-order traversals of each of the nodes in h_L (in order of addition to h_L) will give you x_L (sorted). The "greater than" case is symmetrical.
Finally, for my answer to work, I need to guarantee you that we can maintain these L- and R-sizes efficiently for your choice of specific tree data structure. I'll consider the case of red-black trees.
Note that maintaining L-sizes and R-sizes is done in constant time for vanilla binary search trees (when you add a node starting from the root, just add one to L-sizes if the node should go in the left subtree, or one to the R-sizes if it goes in the right subtree). Now the additional balancing procedures of red-black trees only alter the tree structure through local rotations of nodes --- see Wikipedia's depiction of rotations in red-black trees. It's easy to see that the post-rotation L-size and R-size of P and Q can be recalculated from the L-sizes and R-sizes of A,B,C. This only adds a constant amount of work to the red-black tree's operations.

treemap vs arraylist - perfomance & resources while iterating/adding/editing values

Talking about performance and resources.
Which one is faster and needs less resources when adding and editing values, ArrayList or TreeMap?
Or is there any type of data that can beat these two? (it has to be able to somehow make the data sorted)
ArrayLists and TreeMaps are different types of structures meant for different things. It would be helpful to know what you are planning on using these structures for.
ArrayList
Allows duplicates (it is a List)
Amortized O(1) to add to the end of the list
O(n) to insert anywhere else in the list
O(1) to access
O(n) to remove
TreeMap
Does not allow duplicate keys (it is a Map)
O(logn) to insert
O(logn) to access
O(logn) to remove
Sorting the ArrayList will take O(nlogn) time (after inserting everything), while the TreeMap will always be sorted.
EDIT
You've mentioned that you are working with records retrieved from a database. Since they are coming from a DB, I would assume that they are already sorted - in which case you should just insert them one by one into an ArrayList.
Depends on what you need.
If we are talking about sorted data, as the ArrayList is a list/array, if it is sorted, you could get a value in O(log n) speed. To insert, though, it is O(n), because you potentially have to move the whole array when inserting a new element.
The TreeMap data structure is implemented as a red-black tree, which both the insertion time and search time are O(log n).
So, in a nutshell:
Data Structure Insertion Speed Search Speed
ArrayList (sorted) O(n) O(log n)
TreeMap O(log n) O(log n)
I'd definitely go with a TreeMap. It has the additional bonus of having it all ready to go right now (you'd have to implement some of the code yourself to make the ArrayList work).
Note:
If you don't get the O(n) (called big-Oh) notation, think of it as a formula for the amount seconds needed when the structure has n elements. So, if you had an ArrayList with 1000 elements (n=1000), it'd take 3 seconds (log 1000 = 3) to find an item there. It would take 1000 seconds to insert a new element though.
The TreeMap, on the other hand, would take 3 seconds to both search and insert.
ArrayList requires only O(1) to add and editing a value since you're only accessing it with the index. However, searching an element in a ArrayList is O(n) because you have to get through at most all the elements of the list.
But you have to be aware of what data structures you will implement. It's hard to choose betwen an ArrayList and a TreeMap since they are not really for the same purpose(a Map doesn't allow duplicates but not an ArrayList, etc.).
Here's two sheet which describes the differences between both (and others type of collections too).
List and Map :
Added Set too :

Binary tree to get minimum element in O(1)

I'm accessing the minimum element of a binary tree lots of times. What implementations allow me to access the minimum element in constant time, rather than O(log n)?
Depending on your other requirements, a min-heap might be what you are looking for. It gives you constant time retrieval of the minimum element.
However you cannot do some other operations with the same ease as with a simple binary search tree, like determining if a value is in the tree or not. You can take a look at splay trees, a kind of self-balancing binary tree that provides improved access time to recently accessed elements.
Find it once in O(log n) and then compare new values which you are going to add with this cached minimum element.
UPD: about how can this work if you delete the minimum element. You'll just need to spend O(log n) one more time and find new one.
Let's imagine that you have 10 000 000 000 000 of integers in your tree. Each element needs 4 bytes in memory. In this case all your tree needs about 40 TB of harddrive space. Time O (log n) which should be spent for searching minimum element in this huge tree is about 43 operations. Of course it's not the simplest operations but anyway it's almost nothing even for 20 years old processors.
Of course this is actual if it's a real-world problem. If for some purposes (maybe academical) you need real O(1) algorithm then I'm not sure that my approach can give you such performance without using additional memory.
This may sound silly, but if you mostly access the minimum element, and don't change the tree too much, maintaining a pointer to the minimal element on add/delete (on any tree) may be the best solution.
Walking the tree will always be O(log n). Did you write the tree implementation yourself? You can always simply stash a reference to the current lowest value element alongside your data structure and keep it updated when adding/removing nodes. (If you didn't write the tree, you could also do this by wrapping the tree implementation in your own wrapper object that does the same thing.)
There is a implementation in the TAOCP that uses the "spare" pointers in non-full nodes to complete a double linked list along the nodes in order (I don't recall the detail right now, but I image you have to have a "has_child" flag for each direction to make it work).
With that and a start pointer, you could have the starting element in O(1) time.
This solution is not faster or more efficient that caching the minimum.
If by minimum element you mean element with the smallest value then you could use TreeSet with a custom Comparator which sorts the items into correct order to store individual elements and then just call SortedSet#first() or #last() to get the biggest/smallest values as efficiently as possible.
Note that inserting new items to TreeSet is slightly slow compared to other Sets/Lists but if you don't have a huge amount of elements which constantly change then it shouldn't be a problem.
If you can use a little memory it sounds like a combined collection might work for you.
For instance, what you are looking for sounds a lot like a linked list. You can always get to the minimum element but an insert or lookup of an arbitrary node might take longer because you have to do a lookup O(n)
If you combine a linked list and a tree you might get the best of both worlds. To look up an item for get/insert/delete operations you would use the tree to find the element. The element's "holder" node would have to have ways to cross over from the tree to the linked list for delete operations. Also the linked list would have to be a doubly linked list.
So I think getting the smallest item would be O(1), any arbitrary lookup/delete would be O(logN)--I think even an insert would be O(logN) because you could find where to put it in the tree, look at the previous element and cross over to your linked list node from there, then add "next".
Hmm, this is starting to seem like a really useful data structure, perhaps a little wasteful in memory, but I don't think any operation would be worse than O(logN) unless you have to re balance the tree.
If you upgrade / "upcomplex" your binary tree to a threaded binary tree, then you can get O(1) first and last elements.
You basically keep a reference to the current first and last nodes.
Immediately after an insert, if first's previous is non-null, then update first. Similarly for last.
Whenever you remove, you first check whether the node being removed is the first or last. And update that stored first, last appropriately.

Categories