I know priority queues tend to use heaps, but what is the point of a priority queue when it basically seems the same as a heap? I initially thought all priority queues use hash maps to keep track of each object's location in the heap, making it easier to find and update or delete that object. However, I have used Java's priority queue, and you have to manually iterate over it to update or delete objects that are not at the root. It seems odd to have a priority queue that appears to literally just be a heap with nothing else special about it.
It might help to reason by analogy here:
List is to dynamic array as PriorityQueue is to binary heap.
That is, the abstract idea of a list (a sequence of things starting at position zero where items can be inserted and removed) is a nice, high-level concept, while a dynamic array (an array along with a capacity that doubles or 1.5x’s in size if extra space is needed) is one possible way of implementing a list. If you’re using a list, you can just think “oh, it’s a sequence, and I can put things places” without worrying how that sequence is actually represented. On the other hand, working with a dynamic array requires you to track which array elements are valid versus which ones don’t actually get used, you need to manually transfer things over when there’s no more space and think carefully about your growth strategy, etc. The distinction here is at what level you’re viewing things. If you just need “a sequence,” think “list.” If you need to build a type from scratch representing a sequence, think “dynamic array.”
This is basically what’s going on with priority queues versus binary heaps. A priority queue abstractly represents the idea of “I can put things in and they’ll come back in sorted order.” A binary heap is one specific possible way of implementing a priority queue. When working with an abstract priority queue, you can focus your thoughts purely on questions like “what elements do I want to add?” and “how do I rank them?” When working with a binary heap, you have to think about things like “do I use one-indexing or zero indexing?” and “what’s the formula for identifying the children of a node at index k?” If all you need is the ability to put things in a bag and take them out in sorted order, you can use a priority queue without worrying about how it works. If you need to build one from scratch, you can use a binary heap.
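To make the "build one from scratch" side concrete, here is a minimal sketch of a zero-indexed binary min-heap of ints. The class name IntMinHeap and its methods are hypothetical, chosen only for illustration; the point is the index arithmetic a heap implementer has to think about and a priority queue user does not.

```java
import java.util.Arrays;

/** Minimal array-backed binary min-heap of ints (zero-indexed); illustrative only. */
class IntMinHeap {
    private int[] items = new int[16];
    private int size = 0;

    // Zero-indexed layout: the children of k are at 2k+1 and 2k+2, the parent at (k-1)/2.
    private static int parent(int k) { return (k - 1) / 2; }
    private static int left(int k)   { return 2 * k + 1; }

    boolean isEmpty() { return size == 0; }

    void add(int value) {
        if (size == items.length) items = Arrays.copyOf(items, size * 2);
        int k = size++;
        // Sift up: pull larger parents down until the slot for value is found.
        while (k > 0 && items[parent(k)] > value) {
            items[k] = items[parent(k)];
            k = parent(k);
        }
        items[k] = value;
    }

    int poll() {
        if (size == 0) throw new IllegalStateException("heap is empty");
        int min = items[0];
        int last = items[--size];
        int k = 0;
        // Sift down: promote the smaller child until 'last' fits at position k.
        while (left(k) < size) {
            int child = left(k);
            if (child + 1 < size && items[child + 1] < items[child]) child++;
            if (items[child] >= last) break;
            items[k] = items[child];
            k = child;
        }
        items[k] = last;
        return min;
    }
}
```

Code that only needs "put things in, take them out in sorted order" would call add and poll and never see parent or left.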
Going back to the list versus dynamic array analogy: there are many types you can use to represent lists. Dynamic arrays are one, but you could also use a circular buffer (good if you add or remove from the ends) or a linked list (good if items get moved around between lists). Similarly, there are many ways you can implement a priority queue. Binary heaps are one option, but there’s also pairing heaps, binomial heaps, etc. Keeping the relevant abstraction in focus - I just want a sequence of things, I just want a way to retrieve things in sorted order - means that you don’t need to worry as much about how things work when what you care about is what operations you want to do.
In my opinion you are right: Java's PriorityQueue is a heap. Since Java is a high-level programming language, it is reasonable for it to provide implementations of the common, standard data structures and algorithms. Most of the time we focus on business logic and on getting the job done faster, so we don't want to spend too much time building a priority queue from the ground up; besides, doing it yourself is tedious and error-prone.
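If you want to update an object that is not at the root, and don't want to iterate over the queue manually, you can remove it and add it back:
Object updatedObject; // the element whose priority has changed
priorityQueue.remove(updatedObject); // returns a boolean, not the element
priorityQueue.add(updatedObject); // re-inserts it at its new position
This is not efficient when updates occur frequently, because remove(Object) takes linear time. There is an alternative data structure, the Fibonacci heap, that does this job better.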
It seems odd to have a priority queue that appears to literally just be a heap with nothing else special about it.
Why?
Nothing about the name PriorityQueue promises anything more than the ability to put items in one end and get them out the other in sorted-by-priority order. That's also basically the definition of a heap, which is why a heap makes an ideal data structure to implement a priority queue.
So, essentially, the Java Collections Framework designers implemented a heap. Only instead of calling it a Heap, they called it a PriorityQueue. End of story. As the song lyric goes: "Who could ask for anything more?"
Java's PriorityQueue can act as either a min-heap or a max-heap: depending on the ordering you construct it with (the natural ordering or a Comparator), it will always give you the min or the max value.
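As a small illustration of the min/max point, the sketch below builds the same java.util.PriorityQueue with two different comparators. The helper class HeapOrderDemo and its drain method are made-up names for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class HeapOrderDemo {
    /** Adds the values to a queue ordered by the given comparator, then polls them all. */
    static List<Integer> drain(Comparator<Integer> order, int... values) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(order);
        for (int v : values) pq.add(v);
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) out.add(pq.poll());
        return out;
    }

    public static void main(String[] args) {
        // Natural ordering: the smallest element is at the head (min-heap behaviour).
        System.out.println(drain(Comparator.<Integer>naturalOrder(), 3, 1, 2));
        // Reversed ordering: the largest element is at the head (max-heap behaviour).
        System.out.println(drain(Comparator.<Integer>reverseOrder(), 3, 1, 2));
    }
}
```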
When I have an array and want to remove a value from it, I need to shift the following elements to the left. My idea is to do the shifting only once, after n null values have accumulated in the array.
Of course this is a micro-optimisation, and ArrayList (maybe LinkedList) would be a production-quality data structure for dynamic arrays.
Here you might keep an extra list of nulled entries. At a certain threshold one could do **System.arraycopy**s to remove the gaps. If there are many index based inserts too, you might opt for keeping gaps, maybe collecting small gaps together.
This is a traditional technique in text editors.
For several data structures one might search through the Guava classes.
For instance, copy-on-write data structures.
Or concurrency: compacting in the background.
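The threshold idea could look roughly like the sketch below. LazyCompactingArray and all of its members are hypothetical names, and it assumes null is never a legitimate value, since null marks a gap. Removal only nulls a slot; the shifting happens once, via System.arraycopy, when enough gaps have piled up.

```java
import java.util.Arrays;

/** Hypothetical array wrapper that nulls removed slots and compacts lazily. */
class LazyCompactingArray<T> {
    private final Object[] slots;
    private int used = 0;  // slots[0..used) hold live entries or gaps
    private int gaps = 0;  // number of nulled slots in slots[0..used)
    private final int threshold;

    LazyCompactingArray(int capacity, int threshold) {
        this.slots = new Object[capacity];
        this.threshold = threshold;
    }

    void add(T value) {
        if (value == null) throw new IllegalArgumentException("null marks a gap");
        if (used == slots.length) compact();
        if (used == slots.length) throw new IllegalStateException("array is full");
        slots[used++] = value;
    }

    /** Marks the slot as a gap; shifting is deferred until enough gaps pile up. */
    void removeAt(int slot) {
        if (slot < 0 || slot >= used || slots[slot] == null) return;
        slots[slot] = null;
        if (++gaps >= threshold) compact();
    }

    /** Closes all gaps in one pass, copying each run of live entries once. */
    void compact() {
        int write = 0;
        int read = 0;
        while (read < used) {
            if (slots[read] == null) { read++; continue; }
            int runStart = read;
            while (read < used && slots[read] != null) read++;
            System.arraycopy(slots, runStart, slots, write, read - runStart);
            write += read - runStart;
        }
        Arrays.fill(slots, write, used, null);
        used = write;
        gaps = 0;
    }

    int liveCount() { return used - gaps; }

    /** Compacts, then returns a copy of the live entries in order. */
    Object[] snapshot() { compact(); return Arrays.copyOf(slots, used); }
}
```

One consequence of this design is that slot indices are only stable between compactions, so callers that hold on to indices would need some extra bookkeeping.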
For a specific data structure & algorithm maybe someone else can give pointers.
I need help with an optimized solution for the following problem: http://acm.ro/prob/probleme/B.pdf.
Depending on the cost, I either traverse the graph using only new edges or using only
old edges. Both of them work, but I need to pass the tests within a limited number of milliseconds,
and the algorithm for the old edges is dragging me down.
I need a way to optimize this; any suggestions are welcome.
EDIT: for safety reasons I am taking the algorithm down. I'm sorry, I'm new, so I don't
know what I need to do to delete the post now that it has answers.
My initial algorithmic suggestion relied on an incorrect reading of the problem. Further, a textbook breadth-first search or Dijkstra on a graph of this size is unlikely to finish in a reasonable amount of time. There's likely an early-termination trick that you can employ for large cases; the thread Niklas B. linked to suggests several (as well as some other approaches). I couldn't find an early-termination trick that I could prove worked.
These micro-optimisation notes are still relevant, however:
I'd suggest not using Java's built-in Queue container for this (or any other built-in Java containers for anything else in a programming contest). It turns your 4-byte int into a gargantuan Integer structure. This is very probably where your blowup is coming from. You can use a 500000-long int[] to store the data in your queue and two ints for the front and back of the queue instead. In general, you want to avoid instantiating Objects in Java contest programming because of their overhead.
Likewise, I'd suggest representing the edges of the graph as either a single big int[] or a 500000-long int[][] to cut down on that piece of overhead.
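A minimal sketch of that suggestion, assuming an unweighted graph given as an int[][] adjacency list: a plain breadth-first search whose queue is an int[] with two index variables. It only illustrates the container swap, not a solution to the linked problem, and the class and method names are made up.

```java
import java.util.Arrays;

class PrimitiveBfs {
    /**
     * Breadth-first search that uses an int[] as its queue instead of Queue<Integer>.
     * adj[v] lists the neighbours of vertex v. Returns the hop distance from source
     * to every vertex, or -1 for vertices that cannot be reached.
     */
    static int[] bfs(int[][] adj, int source) {
        int n = adj.length;
        int[] dist = new int[n];
        Arrays.fill(dist, -1);
        int[] queue = new int[n];  // each vertex is enqueued at most once
        int front = 0;
        int back = 0;
        dist[source] = 0;
        queue[back++] = source;
        while (front < back) {
            int v = queue[front++];
            for (int w : adj[v]) {
                if (dist[w] == -1) {
                    dist[w] = dist[v] + 1;
                    queue[back++] = w;
                }
            }
        }
        return dist;
    }
}
```

Because every vertex enters the queue at most once, an array of length n is enough and no wrap-around is needed.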
I only saw one queue in your code. That means you are searching from one direction only.
You may want to take a look at
Bidirectional Search
I was wondering whether the Struct of Arrays (SoA) data layout is always faster than an Array of Structs (AoS) or an Array of Pointers (AoP) for problems, programmed in C or Java, whose inputs only fit in RAM.
Some days ago I was improving the performance of a molecular dynamics algorithm (in C). In short, the algorithm calculates the force interactions among particles based on their forces and positions.
Originally, the particles were represented by a struct containing 9 doubles: 3 for the particle's forces (Fx, Fy, Fz), 3 for its position and 3 for its velocity. The algorithm had an array containing pointers to all the particles (AoP). I decided to change the layout from AoP to SoA to improve cache use.
So now I have a struct with 9 arrays, where each array stores one component (x, y or z) of the force, velocity or position of every particle. Each particle is accessed by its own array index.
I had a performance gain (for an input that only fits in RAM) of about 1.9x, so I was wondering whether changing from AoP or AoS to SoA will typically perform better and, if not, for which types of algorithms this does not hold.
Much depends on how useful all the fields are. If you have a data structure where using one field means you are likely to use all of them, then an array of structs is more efficient, as it keeps together all the things you are likely to need.
Say you have time series data where you only need a small selection of the possible fields you have. You might have all sorts of data about an event or point in time, but you only need say 3-5 of them. In this case a structure of arrays is more efficient because a) you don't need to cache the fields you don't use b) you often access values in order i.e. caching a field, its next value and its next is useful.
For this reason, time-series information is often stored as a collection of columns.
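To make the two layouts concrete, here is a small Java sketch with made-up event fields (timestamp, price, volume). Both methods compute the same sum; the columnar version reads only the price array, while the object version has to visit every event object to reach its price. It shows the access pattern only and makes no claim about measured speed.

```java
class LayoutDemo {
    /** Array-of-structs style: one object per event, all fields stored together. */
    static final class Event {
        final long timestamp;
        final double price;
        final double volume;
        Event(long timestamp, double price, double volume) {
            this.timestamp = timestamp;
            this.price = price;
            this.volume = volume;
        }
    }

    /** Struct-of-arrays (columnar) style: one primitive array per field. */
    static final class EventColumns {
        final long[] timestamp;
        final double[] price;
        final double[] volume;
        EventColumns(int n) {
            timestamp = new long[n];
            price = new double[n];
            volume = new double[n];
        }
    }

    static double sumPriceAos(Event[] events) {
        double sum = 0;
        for (Event e : events) sum += e.price;  // touches each event object
        return sum;
    }

    static double sumPriceSoa(EventColumns cols) {
        double sum = 0;
        for (double p : cols.price) sum += p;   // walks one contiguous array
        return sum;
    }
}
```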
This will depend on how exactly you access the data.
Try to imagine, what exactly happens in the hardware when you access your data, in either SoA or AoS.
To reason about your question, you must consider the following things:
If the cache is absent, the performance should be the same, assuming that memory access latency is equal for all the elements of the data.
Now, with a cache, if you access consecutive address locations you will definitely get a performance improvement. This is exactly what applies in your case. With AoS, the values of a single field are not consecutive in memory, so you must lose some performance there.
You are presumably accessing your data in for loops such as for(int i=0;i<1000000;i++) Fx[i] = 0. So if the amount of data is huge, you will easily see even small performance benefits. If your data were small, this would not matter much.
Finally, you also don't know about the DRAM that you are using. It will give some benefit when you access consecutive data. To understand why, you can refer to the wiki.
I am creating a tree in Java to model the extensive form of a game for an AI. The tree will be a 25-ary tree (a tree in which each branch has at most 25 child branches) because at each turn of the game there are 25 different moves. Because the number of new branches that have to be created in each new layer of the tree is 25^n, I'm very concerned with making this efficient. (I intend to remorselessly cut off branches to stop them from growing and keep things from getting bogged down.) What is the best way to model such a tree when efficiency is such a concern? My first impression is to have a node object where each node has a parent node and an array of child nodes, but this means creating a lot of objects. Ultimately these are my questions:
Is this the fastest way to create and manage my tree?
What is a good way to figure out how much time any given algorithm or process in a program is going to take? (The only one I've thought of so far is to create a date before the process and another after it, and compare the number of milliseconds that have passed.)
Any other thoughts are also welcome. My question implies and is related to a great number of other questions, I would expect. If I have been ambiguous or unclear, please comment to let me know instead of down-voting and storming off, as this isn't productive.
Realistically, the way you described is the best approach. It'll perform reasonably well compared to anything else you could do and will be straightforward to implement.
Time and again people are asking questions about how to do something "efficiently". The best answer is nearly always, "don't even bother trying". Unless your improvement is an algorithmic one, it's unlikely to make much difference anyway, and especially in a case like this, the extra effort and complexity isn't worth whatever minuscule gain you might be able to achieve.
Putting it another way, and to borrow a quote (though I can't remember the originator), the first rule of optimization is: don't.
Having said that, if you really feel the need to squeeze every last drop of speed, you could try caching and re-using objects (instead of discarding them completely, keep track of them in a free object store, and then when you need to create a new object, first check the free object store to check if there is an existing one). As always, you'll need to measure performance before and after to see if it really helps (chances are it won't help much, unless physical memory is really constrained, in which case garbage collection can become expensive).
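For what it's worth, a free object store of the kind described could be sketched like this. NodePool, Node and their members are hypothetical names, and the 25-slot child array simply mirrors the branching factor from the question. Whether it helps at all is something only measurement can tell.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical free-object store for tree nodes; illustrative only. */
class NodePool {
    static final class Node {
        Node parent;
        final Node[] children = new Node[25];

        /** Clears all links so a recycled node does not keep old subtrees alive. */
        void reset() {
            parent = null;
            Arrays.fill(children, null);
        }
    }

    private final ArrayDeque<Node> free = new ArrayDeque<>();

    /** Reuses a discarded node if one is available, otherwise allocates a new one. */
    Node acquire() {
        Node n = free.poll();
        return n != null ? n : new Node();
    }

    /** Returns a node that is no longer part of the tree to the free store. */
    void release(Node n) {
        n.reset();
        free.push(n);
    }

    int freeCount() { return free.size(); }
}
```

When a branch is pruned, each of its nodes would be passed to release, and later expansions would call acquire instead of new Node().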
I agree with the previous comment about only optimizing once you have implemented the rest of the application.
On the other hand, I do realize a few things that may be of importance:
Branching factor of 25: Although not ridiculously huge, it is still large with respect to other problems. For a tree, you will definitely have to have a list for each node to indicate the list of SubNodes. You can do this either by making a Node class which has a collection of nodes within it, or have an external Map that maps a given node to a list of children nodes.
Removing and adding of elements will be done: This lends itself to a LinkedList implementation of the stored children since you don't want to perform costly removes and adds. A HashSet may work also, but the problem is that you may need more memory.
Iteration of the elements may or may not be done: If you want to iterate over the entire list at each step, LinkedLists are fine. If you want to prioritize the nodes then you may be saving memory by using a priority queue data structure. Priority queues are especially helpful if you are going to implement a heuristic function and evaluate which child to move to at any given node.
That's all I have so far, but I'll keep updating if I think of more things, or if you update your content.
I am trying to find (or write) a Java class that represents a fixed-size, non-blocking, auto-discarding FIFO queue. (e.g. if the queue has a capacity of 100, putting item 101 removes item 1 then successfully appends item 101.) The answer to this question seems helpful, but I have an extra constraint - I need it to be fast, for capacities of around 100-1000.
The items in my queue are only Floats, so is it generally more efficient to use something like the AutoDiscardingDeque<Float> described in the linked question, or just to use a float[] and some System.arraycopy() manipulation to handle it?
Alternatively, is there an even better solution that I haven't thought of?
If you only ever need to use floats, then yes, a float[] will be optimal in the implementation. You shouldn't need to copy the array at all - just maintain a "start position" and an "end position". You already know the capacity, so you can create the array to start with and never budge from it.
Note that I'm not suggesting you use a float[] instead of a queue here - just that you can implement the queue using a float[]. Of course, that means you can't easily make it implement Deque<Float> which you may want it to, without incurring boxing/unboxing costs... but if you're happy only ever using the concrete class within your client code, you'll end up with efficiency savings.
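A minimal sketch of that idea, assuming a concrete class rather than a Deque<Float> implementation: the backing float[] is allocated once, and a start index plus a size stand in for the "start position" and "end position". FloatRingBuffer and its methods are made-up names.

```java
/** Fixed-capacity FIFO of floats that silently discards the oldest element when full. */
class FloatRingBuffer {
    private final float[] data;
    private int start = 0;  // index of the oldest element
    private int size = 0;   // number of stored elements

    FloatRingBuffer(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        data = new float[capacity];
    }

    /** Appends a value; if the buffer is full, the oldest value is overwritten. */
    void add(float value) {
        int end = (start + size) % data.length;
        data[end] = value;
        if (size == data.length) {
            start = (start + 1) % data.length;  // the oldest element was just overwritten
        } else {
            size++;
        }
    }

    /** Returns the i-th element counting from the oldest (i = 0). */
    float get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
        return data[(start + i) % data.length];
    }

    int size() { return size; }
}
```

No element is ever copied or boxed: add and get are a couple of integer operations and one array access each.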
If you think you are going to want to perform a number of math-related functions on your structure, specifically statistics functions like mean, max, min, etc., then you could use DescriptiveStatistics from Apache Commons Math (http://commons.apache.org/math/userguide/stat.html#a1.2_Descriptive_statistics). You can set your window size and it will automatically maintain your elements. However, it takes doubles, not floats, so it might not be the perfect solution for you.
I need it to be fast, for capacities of around 100-1000
Please specify which operations you need to be fast. The implementation is very sensitive to how you are going to use it.
If you need to access it by index very often, then the solution above seems good enough.