Thinking about a more optimal solution for the algorithm below - Java

There are n vendors on Amazon selling the product at a particular price during a particular time window. I have to design an algorithm that selects the product with the least price at a given time.
For example, for the below set of inputs:
Input format:
<StartTime, EndTime, Price of product by this vendor in this time frame>
1 5 20
3 8 15
7 10 8
Output should be:
1 2 20
3 6 15
7 10 8
I solved this by storing the prices for each time in a HashMap, updating the price whenever a lower price exists for that time, and then keeping a list in the vendor class of all the times corresponding to a particular price.
But this solution takes O(n^2) time, so I am looking for a better data structure or approach that solves it in lower time complexity.

You can use a sweep line algorithm and a multiset to solve it in O(N log N) time:
Let's create two events for each vendor: the moment she starts selling the item and the moment she ends. We'll also create one "check" event for each time we're interested in.
Now we'll sort the list of events by their times.
For each event, we do the following: if it's a start event, we add the new price to the multiset. Otherwise, we remove it.
At any moment of time, the answer is the smallest element in the multiset, so we can answer each query efficiently.
If the multiset supports "fast" (that is, O(log N) or better) insertion, deletion and finding the smallest element, this solution uses O(N log N) time and O(N) space. There is no multiset in the Java standard library, but you can use a TreeSet of pairs (price, vendor_id) to work around this limitation.
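A minimal sketch of this sweep line, assuming inclusive [StartTime, EndTime] intervals as in the example. Instead of a TreeSet of (price, vendor_id) pairs it uses a TreeMap from price to count as the multiset, which handles equal prices from different vendors the same way:

import java.util.*;

public class CheapestPriceSweep {

    // Each interval is {startTime, endTime, price}; returns segments {from, to, price}
    // covering the times where that price is the cheapest one on offer.
    static List<int[]> cheapestSegments(int[][] intervals) {
        // Event: at time e[0], add (e[1] = +1) or remove (e[1] = -1) price e[2] from the multiset.
        List<int[]> events = new ArrayList<>();
        for (int[] iv : intervals) {
            events.add(new int[]{iv[0], +1, iv[2]});
            events.add(new int[]{iv[1] + 1, -1, iv[2]});   // intervals are inclusive, so the offer ends after endTime
        }
        events.sort(Comparator.comparingInt(e -> e[0]));

        TreeMap<Integer, Integer> active = new TreeMap<>(); // price -> count, used as a multiset
        List<int[]> result = new ArrayList<>();
        Integer segStart = null, segPrice = null;

        int i = 0;
        while (i < events.size()) {
            int t = events.get(i)[0];
            while (i < events.size() && events.get(i)[0] == t) {   // apply every event at time t
                int[] e = events.get(i++);
                if (active.merge(e[2], e[1], Integer::sum) == 0) active.remove(e[2]);
            }
            Integer best = active.isEmpty() ? null : active.firstKey();
            if (!Objects.equals(best, segPrice)) {                 // cheapest price changed at time t
                if (segPrice != null) result.add(new int[]{segStart, t - 1, segPrice});
                segStart = t;
                segPrice = best;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] input = {{1, 5, 20}, {3, 8, 15}, {7, 10, 8}};
        for (int[] seg : cheapestSegments(input))
            System.out.println(seg[0] + " " + seg[1] + " " + seg[2]); // prints 1 2 20, 3 6 15, 7 10 8
    }
}

Sorting the 2N events dominates, so the whole thing is O(N log N) with O(N) extra space.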


Time Complexity and Better approach of coding

I have a problem: the input contains a logged-in user ID and the timestamp at which the user logged in. When a user logs in again, I need to find the number of times that user has logged in within the last x seconds.
input =
{(P1, 0),
(P2, 1),
(P3, 2),
(P1,3),
(P1,4),
(P2,5),
(P1,6)}
Q1: For the last 4 seconds, output how many times P1 has logged in.
output: P1,2
Q2: In last 6 seconds for P1:
output: P1,4
To solve this, I initially used a HashMap with the person ID as the key and a set of timestamps for each person as the value:
Map<String, TreeSet<Integer>> personMap = new HashMap<>();
personMap.put("P1", new TreeSet<>(List.of(0, 3, 4)));
personMap.put("P2", new TreeSet<>(List.of(1, 5)));
personMap.put("P3", new TreeSet<>(List.of(2)));
Then, on the final entry, we retrieve the TreeSet and find the values that fall within the last 4 seconds of the given time. This works, but I am confused about the time complexity: what is the time complexity of insertion and retrieval, considering the values are TreeSets?
Is there any better datastructure of storing timestamps in personMap?
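Roughly, the lookup I described can be written with tailSet (now and x here are just illustrative names):

int now = 6, x = 4;   // illustrative values for the Q1 example
// timestamps >= now - x for P1; note that size() on the returned view walks its
// elements, so this count is linear in the number of matching timestamps
int count = personMap.get("P1").tailSet(now - x, true).size();   // gives 2 for Q1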
TreeSet is backed by a red-black tree, so the time complexity of a query is O(log n).
Compared with HashSet, TreeSet has slightly lower performance: insertion, query and other operations are O(log n).
HashMap is recommended.
Time Complexity:
tldr: I'm pretty sure this takes O(n) per query and O(log(n)) per insert.
For each query, you initially use log(n) on the TreeSet to select the right user.
Once you have the user, it sounds like you are finding the first timestamp within your bound (again, this can be log(K)) and iterating through the rest within your bound. This part becomes O(K) because you could query for a very long time back and end up iterating through your whole set.
Assuming your queries are random, the time complexity looks something like O(log(N) + K/2 + log(K)) -> O(K/2) -> O(n) (K is the size of that user's timestamp set, N is the user count).
Solution:
tldr: Use a prefix sum.
The one way I can think of optimizing this from O(K) into O(log(K)) would be to keep track of the total number of logins performed by the user and store that with each timestamp.
Something like:
P1#Time0: 1 total login
P1#Time5: 2 total logins
P1#Time7: 3 total logins
Then, for each query, you can just find the first timestamp outside your range and subtract its login count from the current login count.
Example: You login at time = 9 and query for all logins within 4 seconds prior.
You have:
P1#Time0: 1 total login
P1#Time5: 2 total logins
P1#Time7: 3 total logins
P1#Time9: 4 total logins (just now added)
Query finds Time0 as upper bound in log(n):
P1#Time0: 1 total login <- closest time outside 4 seconds
P1#Time5: 2 total logins
P1#Time7: 3 total logins
P1#Time9: 4 total logins
Take 4 - 1 (from their login counts) = 3: you have logged in three times from your current login back to 4 seconds ago. (This method counts your current login as well; subtract one from your result if you don't want that.)
Look up a prefix sum for more info on this.
One note about this: it assumes that any newly added entries come later than previous ones, or that you have the entire set of timestamps at the beginning. Your question sounds a lot like one coming from LeetCode, and if that's the case they most likely would put a condition like this in there. Otherwise, the solution might not work for what you need.
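A minimal sketch of this prefix-sum idea, assuming timestamps arrive in non-decreasing order as noted above (class and method names are illustrative, not from the post):

import java.util.*;

class LoginCounter {
    // per user: entries of {timestamp, total logins so far}, kept in arrival order
    private final Map<String, List<int[]>> history = new HashMap<>();

    void login(String user, int timestamp) {
        List<int[]> h = history.computeIfAbsent(user, k -> new ArrayList<>());
        int total = h.isEmpty() ? 1 : h.get(h.size() - 1)[1] + 1;
        h.add(new int[]{timestamp, total});                    // O(1) amortized per insert
    }

    // Logins by 'user' in the window [now - windowSeconds, now], counting any login at 'now' itself.
    int countRecent(String user, int now, int windowSeconds) {
        List<int[]> h = history.getOrDefault(user, List.of());
        if (h.isEmpty()) return 0;
        int cutoff = now - windowSeconds;
        // binary search for the last entry with timestamp < cutoff, i.e. the closest time outside the window
        int lo = 0, hi = h.size() - 1, idx = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (h.get(mid)[0] < cutoff) { idx = mid; lo = mid + 1; } else hi = mid - 1;
        }
        int outside = (idx == -1) ? 0 : h.get(idx)[1];
        return h.get(h.size() - 1)[1] - outside;               // prefix-sum difference, O(log K) per query
    }
}

For the worked example above: after logins at times 0, 5, 7 and 9, countRecent("P1", 9, 4) returns 4 - 1 = 3.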

If the array needs to be sorted, would that count as part of the binary search algorithm?

I am trying to understand the speed of the Binary Search algorithm.
I understand it needs to operate on a sorted array.
However, if the array comes in unsorted and I have to perform the sorting, wouldn't the sorting be part of the binary search, and thus wouldn't its performance be slower?
I am confused because I think that there is very little chance to use this algorithm if the data does not come in sorted.
And if my code needs to sort it, then why doesn't that count towards the search algorithm?
Sorry if I am being confusing.
Thank you for helping.
You can't just point at an algorithm and say: It's got O(n^2) complexity!
That's what people usually say, sure. But that's shorthand. They're omitting things; assuming that the listener / reader will make assumptions.
You need to fully describe the exact algorithm, the conditions under which it is applied, and the precise definition of n and any other variable.
Then, you can answer that question. The problem you're having here is that the definition of 'what is the performance of binary search' is unclear. If you assume it means X whilst your buddy assumes it means Y, and you then argue about the answers, you're not actually having a constructive debate at all. You're just tilting at windmills; the real problem is that neither of you figured out the problem is communicating the basics.
Given that there is some confusion here, I'll give you three different, more or less equally sensible, more fully fleshed-out definitions, along with the actual answer for each. Hint: for one of them, 'binary search' isn't the fastest algorithm!
Given [1] a list that is already sorted, and [2] a single value, write me an algorithm that determines if this value is in the list or not.
The best answer would be: a binary search algorithm, and its complexity would be O(log n).
Given [1] a list that is not sorted, and [2] a single value, write me an algorithm that determines if this value is in the list or not.
The best answer would be: just iterate through the list. Its complexity would be O(n), and binary search is not part of this answer at all.
Given [1] a list that is not sorted, and [2] a list of tests, where each individual test is defined by a single value but they all use the same unsorted input list, write an algorithm that will, for each test, determine whether the value for that test is in the list or not, and then give me the amortized complexity (basically, the complexity of the whole thing divided by the number of tests we ran).
Then the best answer would be: First sort the list, spending O(n log n) time to do so, but we get to amortize that over the test case count, and then use binary search for each individual test, adding an O(log n) complexity to each test. If we term n the size of the input list and t the number of tests we have, this gets us:
O((n log n)/t + log n)
Which is the actual answer to the question, complex as it may look. But, if t is large or even considered effectively infinite in size, OR we add one more rider to the question:
The list from [1] is given to you in advance and, within reasonable time and memory limits, you may preprocess this data without needing to amortize these costs across your test cases
then that boils down to just O(log n), as the large value for t makes that (n log n) / t factor approach zero.
In communicating this to your buddy, given that we don't talk in entire scientific papers, one might then say: "The algorithmic complexity of the binary search algorithm is O(log n)", even if that omits a gigantic chunk of the full story.
You interpret the question as per the second case (input is unsorted, the input comprises both the list and the value to search for, no multi-test clause). Someone who says 'binary search is O(log n)' is labouring under either the first or third. You're both right.
NB: The third definition seems unusually complicated. However, it matches common scenarios. For example: 'we have compiled a list of folks living in town and their phone numbers, and we want to print them in a giant book with the aim of letting recipients of this book look up phone numbers. We expect that over the lifetime of a single print run, the 100,000 recipients of the book will each do on average about 50 lookups, for a grand total of 5 million lookups for this single list.' That gives you t = 5 million, n = 200,000 (let's say 200k people live here, half of whom get a phonebook). Plug those numbers in and sorting the phonebook wins by a landslide versus releasing the phonebook in arbitrary, unsorted order, even if, yes, you start out 'down' the effort of sorting it and won't make up for that loss until a few folks have speedily looked up a few phone numbers.
Yes. If
the data comes in unsorted
you only need to search for one element
...then you would have to first sort the data to use binary search, which would take a total of O(n log n + log n) = O(n log n) time.
But once the data is sorted, you can then binary search on that data as many times as you want. You don't have to sort it again each time.
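A small illustration of that "sort once, search many times" point, using only standard library calls (the data values are made up):

import java.util.Arrays;

public class SortThenSearch {
    public static void main(String[] args) {
        int[] data = {42, 7, 19, 3, 88, 56, 23};

        Arrays.sort(data);                              // O(n log n), paid once

        int[] queries = {19, 5, 88};
        for (int q : queries) {                         // each lookup is O(log n)
            int idx = Arrays.binarySearch(data, q);
            System.out.println(q + (idx >= 0 ? " found at index " + idx : " not found"));
        }
    }
}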

DYNAMICALLY perform insert(index, data), delete(index), getAt(index) operations in efficient time and memory.

Recently I developed and implemented an algorithm that performs the following 4 operations in non-amortized O(log m) time, where m is the number of elements present, using O(m) memory.
1. insert(int index, int data):
It inserts data at any index (just like in an array), randomly and dynamically, but with a different time complexity. It can be understood as inserting into an array at a particular index, but the main feature is that it can insert data at an already-occupied index in O(log m) time while shifting all data present contiguously from that index onwards. For example, insert data as:
index ,data
{
(1 ,0),
(2 ,1),
(78 ,2),
(0 ,3),
(45 ,4),
(58999,5),
(32111,6),
(1 ,7),
(78 ,8),
(78 ,9),
(78 ,-1),
(0 ,-2),
(0 ,-3),
(0 ,-4),
(23 ,-5)
}. Then the total time complexity for all insertions is O(log(m!)), where m = 15, and memory = O(15).
Note: the new sequence of data stored by my algorithm is:
{
(0,-4),
(1,-3),
(2,-2),
(3,3),
(4,7),
(5,0),
(6,1),
(23,-5),
(45,4),
(78,-1),
(79,9),
(80,8),
(81,2),
(32111,6),
(58999,5)
}.
Here, insertion at an already-occupied index happens at that same index, shifting all elements at and beyond that index, with a worst case of O(log m) time complexity.
2. delete(int index):
It deletes the data, if present, at index. Worst-case time is O(log m).
3. getAt(int index):
It retrieves the data, if present, at index. Worst-case time is O(log m).
4. printAll(): It prints all elements (data) in increasing order of index. Time in all cases is O(m), where m = number of elements present.
Time cost:
The worst-case time for each of the first 3 operations is O(log m) at any time, where m = number of elements present at that time; for the last one it is O(m).
Space cost:
The space used is O(m), where m = number of elements present.
I have searched the internet thoroughly but have not found solutions with such optimized time and space cost for all 4 operations stated above.
I want to know whether such time and memory cost has been achieved for all 4 operations by anyone before now, and if not, can it be patented?
Also, I cannot reveal anything more about my algorithm than this...
I don't think that's possible.
If we use an ordered binary search tree, then for the above test cases it will also take O(log(58999!)) time and O(58999) space,
because all the empty nodes would have to be created to reach the last index.
Even if we use an array, we would have to create the array up to index 58999, and moreover a worst-case insertion would take O(58999) time.
@Gaara: It's amazing that you have achieved O(log 15) time complexity in the worst case, but as for getting a patent I truly don't know, since you will not reveal anything more about your algorithm...
@David Eisenstat: I also want to know how you would achieve it.

Way to make this HashMap more efficient

I have a class User that has 3 fields (I'm not sure of the terminology):
an (int) ID code
an (int) date that the user was created
and a (string) name
I am trying to create methods that:
Add a user to my data structure (1)
return the name of a user based on their ID number (2)
return a full list of all users sorted by date (3)
return a list of users whose name contains a certain string, sorted by date (4)
return a list of users who joined before a certain date (5)
I have made 10 arrays based on the years joined (2004-2014), and then sorted the elements in each array again by date (sorting by month, then day).
Am I correct in thinking that this means methods (3) and (5) have O(1) time complexity, but that (1), (4) and (2) have O(N)?
Also, is there another data structure/method that I can use to get O(1) for all my methods? I tried repeatedly to come up with one, but the inclusion of method (2) has me stumped.
Comparison-based sorting is always O(N log N), and adding to an already sorted container is O(log N). To avoid that, you need buckets, the way you now have them for years. This trades memory for execution time.
(1) can be O(1) only if you only add things to HashMaps.
(2) can be O(1) if you have a separate HashMap which maps the ID to the user.
(3) of course is O(N) because you need to list all N users, but if you have a HashMap where the key is the day and the value is the list of users for that day, you only need to go through a constant (10 years * 365 days + 2) number of arrays to list all users. So (3) is O(N) with (1) still being O(1), assuming users are unsorted within a single day.
(4) Basically the same as (3) for a simple implementation, just with less printing. You could perhaps speed up the best case with a trie or something, but it'll still be O(N) because a certain percentage of N will match.
(5) Same as (3); you can just break out sooner.
You have to make compromises, and make informed guesses about the most common operations. There is a good chance that the most common operation will be to find a user by ID. A HashMap is thus the ideal structure for that: it's O(1), as well as the insertion into the map.
To implement the list of users sorted by date, and the list of users before a given date, the best data structure would be a TreeSet. The TreeSet is already sorted (so your 3rd operation would be O(1)), and it can return a sorted subset in O(log(n)) time.
But maintaining a TreeSet in parallel to a HashMap is cumbersome, error-prone, and costs memory. And insertion complexity would become O(log(N)). If these aren't common operations, you could simply iterate over the entries and filter them/sort them. Definitely forget about your 10 arrays. This is unmaintainable, and a TreeSet is a much better and easier solution, not limited to 10 years.
The list of users by name containing a given string is O(N), whatever the data structure you choose.
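A minimal sketch of that combination, assuming an int date field and using a TreeMap keyed by date (with a list per date to allow ties) in place of the TreeSet; all names here are illustrative:

import java.util.*;

class User {
    final int id;
    final int date;   // e.g. days since some epoch
    final String name;
    User(int id, int date, String name) { this.id = id; this.date = date; this.name = name; }
}

class UserStore {
    private final Map<Integer, User> byId = new HashMap<>();             // (2): O(1) lookup by ID
    private final TreeMap<Integer, List<User>> byDate = new TreeMap<>(); // (3)/(5): kept sorted by date

    void add(User u) {                                                   // (1): O(log n) because of the TreeMap
        byId.put(u.id, u);
        byDate.computeIfAbsent(u.date, d -> new ArrayList<>()).add(u);
    }

    String nameById(int id) {                                            // (2)
        User u = byId.get(id);
        return u == null ? null : u.name;
    }

    List<User> allByDate() {                                             // (3): already in date order
        List<User> out = new ArrayList<>();
        byDate.values().forEach(out::addAll);
        return out;
    }

    List<User> joinedBefore(int date) {                                  // (5): O(log n) to find the cut, then copy
        List<User> out = new ArrayList<>();
        byDate.headMap(date, false).values().forEach(out::addAll);
        return out;
    }

    List<User> nameContains(String s) {                                  // (4): O(n), as noted above
        List<User> out = new ArrayList<>();
        for (List<User> bucket : byDate.values())
            for (User u : bucket)
                if (u.name.contains(s)) out.add(u);
        return out;
    }
}

Whether maintaining the second structure is worth it depends, as the answer says, on how common the date-ordered operations are.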
A HashMap doesn't sort anything; its primary purpose/advantage is to offer near-O(1) lookups (which you can use for lookups by ID). If you need to sort something, you should have the class implement Comparable, add it to a List, and use Collections.sort to sort the elements.
As for efficiency:
(1) O(1)
(2) O(1)
(3) O(n log n)
(4) O(n) (at least)
(5) O(n) (or less, but I think it will have to be O(n))
Also is there another data structure/method that I can use to have O(1) for all my methods?
Three HashMaps. This isn't a database, so you have to maintain "indices" by hand.

How can I calculate the Big O complexity of my program?

I have a Big O notation question. Say I have a Java program that does the following things:
Read an array of Integers into a HashMap that keeps track of how many occurrences of each Integer exist in the array. [1,2,3,1] would be [1->2, 2->1, 3->1].
Then I grab the Keys from the HashMap and place them in an Array:
Set<Integer> keys = dictionary.keySet();
Integer[] keysToSort = new Integer[keys.size()];
keys.toArray(keysToSort);
Sort the keyArray using Arrays.sort.
Then iterate through the sorted keyArray grabbing the corresponding value from the HashMap, in order to display or format the results.
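Roughly, the whole program looks like this (a sketch, not my exact code; the input values are made up):

import java.util.*;

public class FrequencyCount {
    public static void main(String[] args) {
        int[] input = {1, 2, 3, 1};

        // Step 1: count occurrences - O(n)
        Map<Integer, Integer> dictionary = new HashMap<>();
        for (int x : input) dictionary.merge(x, 1, Integer::sum);

        // Step 2: copy the keys into an array
        Set<Integer> keys = dictionary.keySet();
        Integer[] keysToSort = new Integer[keys.size()];
        keys.toArray(keysToSort);

        // Step 3: sort the keys - O(n log n)
        Arrays.sort(keysToSort);

        // Step 4: walk the sorted keys and print each count - O(n)
        for (Integer k : keysToSort)
            System.out.println(k + " -> " + dictionary.get(k));
    }
}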
I think I know the following:
Step 1 is O(n)
Step 3 is O(n log n) if I'm to believe the Java API
Step 4 is O(n)
Step 2: When doing this type of calculation, I should know how Java implements the Set's toArray method. I would assume that it iterates through the HashMap retrieving the keys; if that's the case I'll assume it's O(n).
If sequential operations dictate I add each part then the final calculation would be
O(n + n log n + n + n) = O(3n + n log n).
Skip the constants and you have O(n + n log n). Can this be reduced any further, or am I just completely wrong?
I believe O(n + n log n) can be further simplified to just O(n log n). This is because the n becomes asymptotically insignificant compared to the n log n, since they are different orders of complexity: n log n is of a higher order than n. This can be verified on the Wikipedia page by scrolling down to the Order of Common Functions section.
When using complex data structures like hash maps you do need to know how it retrieves the object, not all data structures have the same retrieval process or time to retrieve elements.
This might help you with the finding the Big O of complex data types in Java:
http://www.coderfriendly.com/wp-content/uploads/2009/05/java_collections_v2.pdf
Step 2 takes O(capacity of the map).
Steps 1 and 4 can get bad if you have many keys with the same hash code (i.e. O(number of those keys) for a single lookup or change, multiplied by the number of those lookups/changes).
O(n + n·log n) = O(n·log n)
You are correct to worry a little about step 2. As far as I can tell the Java API does not specify running times for these operations.
As for O(n + n log n), Treebranch is right: you can reduce that to O(n log n). The reason is that there is some base value n0 such that n log n > c*n for all c != 0 and n > n0. This is obviously the case, since no matter what number you choose for c, you can set n0 to 2^c + 1.
First,
Step 1 is only O(n) if inserting integers into a HashMap is O(1). In Perl, the worst case for inserting into a hash is O(N) for N items (a.k.a. amortised O(1)), and that's if you discount the length of the key (which is acceptable here). HashMap could be less efficient depending on how it addresses certain issues.
Second,
O(N) is O(N log N), so O(N + N log N) is O(N log N).
One thing big O doesn't tell you is how big the scaling factor is. It also assumes you have an ideal machine. The reason this is important is that reading from a file is likely to be far more expensive than everything else you do.
If you actually time this, you will get something which is startup cost + read time. The startup cost is likely to be the largest even for one million records. The read time will be proportional to the number of bytes read (i.e. the length of the numbers can matter). If you have 100 million records, the read time is likely to be more important. If you have one billion records, a lot will depend on the number of unique entries rather than the total number of entries. The number of unique entries is limited to ~2 billion.
BTW: To perform the counting more efficiently, try Trove's TIntIntHashMap, which can minimise object creation, making it several times faster.
Of course I am only talking about real machines which big O doesn't consider ;)
The point I am making is that you can do a big O calculation but it will not be informative as to how a real application will behave.
