I'm trying to solve a problem but unfortunately my solution is not really the best for this task.
Task:
At a party there are N guests ( 0 < N < 30000 ). All guests tell when they get to the party and when they leave (for example [10;12]). The task is to take photos of as many people as possible at the party. On a photo there can only be 2 people (a pair) and each person can only be on exactly one photo. Of course, a photo can only be taken when the two persons are at the party at the same time. This is the case when their attendance intervals overlap.
My idea: I wrote a program which builds a graph of connections from the intervals. In the graph I search for the person with the fewest connections. Among the people connected to that person, I again select the one with the fewest connections. These two are chosen as a pair for a photo and both are removed from the graph. The algorithm runs until no connections are left.
This approach works, however there is a 10-second time limit for the program. With 1000 entries it runs in 2 seconds, but with 4000 it already takes a long time. Furthermore, when I tried it with 25000 entries, the program stopped with an out-of-memory error, so I cannot even store the connections properly.
I think a new approach is needed here, but I couldn't find another way to make this work.
Can anyone help me to figure out the proper algorithm for this task?
Thank you very much!
Sample Data:
10
1 100
2 92
3 83
4 74
5 65
6 55
7 44
8 33
9 22
10 11
The first line is the number of guests; the following lines are the attendance intervals of the people at the party.
No need to create a graph here; this problem can be solved directly on the interval structure. Sort the people in ascending order of their leaving time (the ending point of the interval). Then iterate over them in that sorted order: if the current person does not overlap with anyone still unmatched, remove them. If they overlap with at least one person, take as their pair the one with the earliest leaving time. During the iteration you only need to compare each person with the ones that come after them.
Proving that this approach is correct is not too difficult, so I hope you can prove it yourself.
Regarding running time, a simple implementation will be O(N^2), though I think it can be reduced to O(N * logN). Either way, O(N^2) will fit in 10 seconds on a normal PC.
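As an illustration only, here is a rough Java sketch of this greedy idea (the I/O format follows the sample data above; the class name and the assumption that touching intervals still count as overlapping are mine):

import java.util.*;

public class PartyPhotos {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int n = in.nextInt();
        int[][] guests = new int[n][2];              // guests[i] = {arrival, departure}
        for (int i = 0; i < n; i++) {
            guests[i][0] = in.nextInt();
            guests[i][1] = in.nextInt();
        }
        // Sort by departure time, ascending.
        Arrays.sort(guests, Comparator.comparingInt(g -> g[1]));

        boolean[] used = new boolean[n];
        int photos = 0;
        for (int i = 0; i < n; i++) {
            if (used[i]) continue;
            // Among the later, still unmatched guests, the first overlapping one
            // is also the one with the earliest leaving time.
            for (int j = i + 1; j < n; j++) {
                if (!used[j] && guests[j][0] <= guests[i][1]) {
                    used[i] = used[j] = true;
                    photos++;
                    break;
                }
            }
        }
        System.out.println(photos);
    }
}

This simple version compares each guest only with later ones, so it is O(N^2); with an ordered structure over the arrival times it could likely be brought down to O(N log N).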
Seems like a classical maximum matching problem to me.
You build a graph where people who can possibly be pictured together (their time intervals intersect) are connected with an edge, and then find a maximum matching, for example with Edmonds' blossom algorithm.
I wouldn't say that it's easy to implement, though. However, you can get quite a good approximation with Kuhn's algorithm for maximum matching in bipartite graphs. That one is really easy to implement, but it won't give you the exact solution here.
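For reference, a minimal sketch of Kuhn's augmenting-path algorithm on a bipartite graph (class and field names are mine; note that the overlap graph in this problem is not bipartite, so applying this directly only gives the approximation mentioned above):

import java.util.*;

// Maximum matching in a bipartite graph given as adjacency lists
// from the left part to the right part.
public class KuhnMatching {
    private final List<List<Integer>> adj;   // adj.get(u) = right-side neighbours of left vertex u
    private final int[] matchTo;             // matchTo[v] = left vertex matched to right vertex v, or -1
    private boolean[] visited;

    public KuhnMatching(List<List<Integer>> adj, int rightSize) {
        this.adj = adj;
        this.matchTo = new int[rightSize];
        Arrays.fill(matchTo, -1);
    }

    // Tries to find an augmenting path starting from left vertex u.
    private boolean tryAugment(int u) {
        for (int v : adj.get(u)) {
            if (visited[v]) continue;
            visited[v] = true;
            // v is free, or its current partner can be re-matched elsewhere.
            if (matchTo[v] == -1 || tryAugment(matchTo[v])) {
                matchTo[v] = u;
                return true;
            }
        }
        return false;
    }

    public int maximumMatching() {
        int matched = 0;
        for (int u = 0; u < adj.size(); u++) {
            visited = new boolean[matchTo.length];
            if (tryAugment(u)) matched++;
        }
        return matched;
    }
}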
I have a really simple idea:
Assume the party lasts X hours; make X sets, one per hour, and add the appropriate people to each. Of course, people who stay longer than an hour will appear in several sets. Now, if two adjacent sets each contain an even number of people, you can simply take n/2 photos within each set. If two adjacent sets each contain an odd number of people, look for someone who appears in both of them and move that person into one of them (so you end up with two even-sized sets of people who are at the party at the same time).
Remember to remove all people already used (consider some class, e.g. a Person with a list of all his/her hours).
My idea should probably be expanded into a more advanced "moving people" algorithm that looks across more than one neighboring set.
I think the following will do:
First, read all the guests' data and sort them into an array by leaving time, ascending. Then take the first element of the array and iterate through the following elements until the very first time-match is found (the next guest's entry time is not later than this guest's leaving time); if found, remove both from the array as a pair and record it elsewhere. If not, remove the guest, as they can't be paired at all. Repeat until the array is empty.
The worst case of this is also N^2, as the party can look like [1,2],[3,4],... where no guests can be paired with each other, and the algorithm will search through all 30000 guests in vain every time. So I don't think this is the fastest algorithm, but it should give an exact answer.
You say you already have a graph structure representation. I assume your vertices represent the guest and the interval of their staying at the party and the edges represent overlap of the respective intervals. What you then have to solve is the graph theoretical maximum matching problem, which has been solved before.
However, as indicated in my comments above, I think you can exploit the properties of the problem, especially the transitivity-like "if A leaves before B leaves and B leaves before C arrives, then A and C will not meet, either" like this:
Wait until the next yet unphotographed guest is about to leave, then take a photo of this one with the one who leaves next among those present.
You might succeed by thinking about the earliest time a photo of a pair can be taken: it is the time when the later of the two arrives at the party.
So, as the photographer, go to the party as the first person and wait. Whenever a person arrives, take a photo of him/her with all other persons currently at the party. Since each person arrives only once, you will not get any duplicates.
While taking the photos (i.e. iterating over the list of guests), remove those guests who have actually left the party.
I'm trying to write a Java program to find the best itinerary to do a driving tour to see a game at each of the 30 Major League Baseball stadiums. I define the "best" itinerary using the metric Miles Driven + (200 * Number of Days on the Road); this eliminates tours that are 20,000 miles over 30 days, or 11,000 miles over 90 days, neither of which would be a trip that I'd want to take. Each team plays 81 home games over the course of a 183-day season, so when a team is at home needs to be taken into consideration.
Also, I'm not just looking for one best tour for the entire baseball season. I'm looking to find the best tour that starts/ends at any given MLB city, on any given date (Detroit on June 15, Atlanta on August 3, etc.).
I've got the program producing results that I'm pretty happy with, but it would take a few months to run to completion on my laptop, and I'm wondering if anyone has any ideas for how to make it more efficient.
The program runs iteratively. It starts with a single game; say, Chicago on April 5. It figures out which games you could get to next within the next day or two after the Chicago game; let's say there are two such games, in Cincinnati and Detroit. It creates a data structure containing all the stops on each prospective tour (one for Chicago-Cincinnati, one for Chicago-Detroit). Then, it does the same thing to find prospective 3rd stops for both of the 2-stop tours, and so on, until it gets to the 30th and last stop, at which point it ascertains the best tour.
It uses a couple of methods to prune inefficient tours as it goes. The main one is employed using a HashMap. The key is a character sequence that denotes (1) which ballparks have already been visited, and (2) which was the last one visited. So it would run into a duplicate on, say, A-B-C-D-E and A-D-B-C-E. It would then keep the shorter route and eliminate the longer one. For most starting points, this keeps the maximum number of tours on the list at any given time at around 20 million, but for some starting points, it gets up to around 90 million.
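For concreteness, one way such a key could be encoded (this is only an illustrative sketch with made-up names, not my actual code): since there are only 30 ballparks, the visited set fits into a bitmask, combined with the index of the last stadium, keeping only the cheapest partial tour per key.

import java.util.*;

// Dominance pruning sketch: among partial tours that have visited the same set
// of stadiums and end at the same stadium, keep only the cheapest one.
class TourPruner {
    // key = (visitedMask << 5) | lastStadium; 30 stadiums fit in 30 bits, the index in 5 bits
    private final Map<Long, Double> bestCost = new HashMap<>();

    /** Returns true if this partial tour should be kept (cheapest seen so far for its key). */
    boolean keep(int visitedMask, int lastStadium, double cost) {
        long key = ((long) visitedMask << 5) | lastStadium;
        Double best = bestCost.get(key);
        if (best != null && best <= cost) {
            return false;   // a cheaper tour covering the same stops and ending here already exists
        }
        bestCost.put(key, cost);
        return true;
    }
}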
So ... any ideas?
Your algorithm is not actually greedy -- it enumerates all possible tours (pruning the obviously bad ones as you go). A greedy algorithm looks only one step ahead, makes the best decision for that step, then moves on to the next step, and so on. Greedy algorithms are not exact, but they are very fast. I would suggest adapting a standard greedy-type TSP heuristic for your problem. There are many common ones -- nearest neighbor, cheapest insertion, etc. You'll find various online sources describing them if you aren't already familiar with them.
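A rough sketch of what a nearest-neighbor style greedy could look like here (everything below -- the Game, Schedule and CostModel types and their methods -- is made up, standing in for your real data structures):

import java.util.*;

// Nearest-neighbour style greedy: from the current game, always move to the
// cheapest feasible next game (by miles + 200 * days) until all 30 parks are visited.
class GreedyTour {
    interface Game { int stadium(); }                                                   // assumed accessor
    interface CostModel { double cost(Game from, Game to); }                            // miles + 200*days, assumed
    interface Schedule { List<Game> reachableAfter(Game g, Set<Integer> visited); }     // assumed lookup

    List<Game> build(Game start, Schedule schedule, CostModel costModel) {
        List<Game> tour = new ArrayList<>();
        Set<Integer> visited = new HashSet<>();
        Game current = start;
        tour.add(current);
        visited.add(current.stadium());
        while (visited.size() < 30) {
            Game best = null;
            double bestCost = Double.POSITIVE_INFINITY;
            for (Game next : schedule.reachableAfter(current, visited)) {
                double c = costModel.cost(current, next);
                if (c < bestCost) { bestCost = c; best = next; }
            }
            if (best == null) return null;   // greedy got stuck; try a different starting game
            tour.add(best);
            visited.add(best.stadium());
            current = best;
        }
        return tour;
    }
}

Such a tour won't be optimal, but it runs in a fraction of a second per starting game, so you can run it from every (city, date) start and keep the best result, or use it as an upper bound to prune your exhaustive search more aggressively.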
You could also create duplicate stadium nodes, one for each home game, and then model this as a generalized TSP (GTSP) in which each "cluster" consists of the nodes for a given stadium. The distance from node (i_1,j_1) to (i_2,j_2) (where i = stadium and j = date) is defined by your metric.
Technically this is a TSP with time windows, but it's more complicated than the usual definition of the TSPTW because usually each node has a single contiguous time window (e.g., arrive between 8 am and 2 pm) whereas here you have a disjoint set of windows and you must choose one of them.
Hope this helps.
Firstly, I have read every thread that I could find on Stack Overflow or through other internet searches. I did learn about different aspects, but it isn't exactly what I need.
I need to solve a Rush Hour puzzle of size no larger than 8 X 8 tiles.
As I have stated in the title, I want to use A*; as a heuristic for it I was going to use:
the number of cars blocking the red car's path (the car that needs to be taken out) should decrease or stay the same.
I have read the BFS solution for Rush hour.
I don't know how to start or better said, what steps to follow.
In case anyone needs any explanation, here is the link to the task :
http://www.cs.princeton.edu/courses/archive/fall04/cos402/assignments/rushhour/index.html
So far, from what I have read (especially from polygenelubricants's answer), I need to generate a graph of states, including the initial one and the "success" one, and determine the minimum path from the initial to the final state using the A* algorithm?
Should I create a backtracking function to generate all the possible (valid) moves?
As I have previously stated, I need help outlining the steps I need to take rather than with the implementation itself.
Edit: Do I need to generate all the possible moves and convert them into graph nodes? Isn't that time-consuming? I need to solve an 8x8 puzzle in less than 10 seconds.
A* is an algorithm for searching graphs. Graphs consist of nodes and edges. So we need to represent your problem as a graph.
We can call each possible state of the puzzle a node. Two nodes have an edge between them if they can be reached from each other using exactly one move.
Now we need a start node and an end node. Which puzzle-states would represent our start- and end-nodes?
Finally, A* requires one more thing: an admissible distance heuristic - a guess at how many moves the puzzle will take to complete. The only restriction on this guess is that it must never exceed the actual number of moves, so what we're really looking for is a lower bound. Setting the heuristic to 0 would satisfy this, but if we can come up with a tighter lower bound, the algorithm will run faster. Can you come up with a lower bound on the number of moves the puzzle will take to complete?
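One such lower bound is the blocking-cars count: the red car needs at least one move itself, and every distinct car between it and the exit has to move at least once. A hedged sketch (the board representation -- a char grid with '.' for empty cells, 'R' for the red car, and the exit at the right end of the red car's row -- is an assumption):

// Admissible heuristic sketch for Rush Hour:
// 0 if the red car is already at the exit, otherwise
// 1 (the red car must move) + number of distinct cars blocking its path.
class RushHourHeuristic {
    int estimate(char[][] board) {
        int redRow = -1, redRightmost = -1;
        for (int r = 0; r < board.length; r++) {
            for (int c = 0; c < board[r].length; c++) {
                if (board[r][c] == 'R') { redRow = r; redRightmost = c; }
            }
        }
        java.util.Set<Character> blockers = new java.util.HashSet<>();
        for (int c = redRightmost + 1; c < board[redRow].length; c++) {
            if (board[redRow][c] != '.') blockers.add(board[redRow][c]);
        }
        if (blockers.isEmpty() && redRightmost == board[redRow].length - 1) return 0;
        return 1 + blockers.size();
    }
}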
This is basically what my program is doing:
if you have 5 different shirts and 4 different pants to choose from, there are 20 different combinations of shirts and pants you can wear, and my program would iterate through all 20 combinations to determine which is the "best" to wear.
Except, in my situation, there are 11 different types of clothing (such as headgear, gloves, pants, earrings, shoes, cloak, etc.) and as many as 10 items in each category. So there could be as many as 10^11 combinations, and when I tried running my program with only 4 items in each category (4^11 combinations), it took about 5 seconds to complete. The full 10^11 would take DAYS.
Currently, what I have going on is 11 for loops nested inside each other to go through every combination. This obviously isn't the best way to do this because it is so slow. How can I make this faster? For context, I have 1 "outer" ArrayList containing 11 ArrayLists, and each of those 11 ArrayLists is a list of objects (the clothing).
In general there is no way of doing it without a brute-force search; however, you can cut down the search space by generating a range of possible matches and only iterating through the items in the list that fall within that range.
Edit
Since your score is additive, the best item of clothing from each category is the one that gives the best sum of dex+str+int. This means that you don't need to consider combinations, just pick the best shirt, then the best pants, etc. So 11 loops, not 11 nested loops.
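A sketch of what that looks like (the Item type and its stat accessors are assumptions; the point is just that each category is scanned independently):

import java.util.*;

// If the total score is just the sum of each worn item's own score, the best outfit
// is the best item from each category chosen independently: 11 loops, not 11 nested loops.
class OutfitPicker {
    interface Item { int dex(); int str(); int intel(); }   // assumed accessors

    static int score(Item it) { return it.dex() + it.str() + it.intel(); }

    static List<Item> bestOutfit(List<List<Item>> categories) {
        List<Item> outfit = new ArrayList<>();
        for (List<Item> category : categories) {       // 11 categories
            Item best = null;
            for (Item candidate : category) {           // up to 10 items each
                if (best == null || score(candidate) > score(best)) best = candidate;
            }
            if (best != null) outfit.add(best);
        }
        return outfit;
    }
}

That is roughly 11 * 10 = 110 score evaluations instead of 10^11 combinations.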
Old answer
In the general case, brute force (what you're doing right now) is the only way to guarantee a correct answer.
Under some conditions, however, you can do better. For example, if I know that one shirt-pants combination is "better" than another shirt-pants combination no matter what other items of clothing I choose, I can find the best shirt-pants combination first and then consider which item of clothing is best to add to it. It's essentially a greedy search.
I am looking for help in designing a scheduling algorithm for a medical review board:
Every day, hundreds of patients are scheduled with specialized doctors, starting 14 days out.
Each patient may need to visit more than one doctor, in extreme cases could be up to 5 visits.
There are a fixed number of rooms, some of them with specialized equipment. For some of the meetings only specific rooms can be used.
Each doctor has a specific schedule, but usually between 14:00 and 19:00.
The main requirement is to try to have each patient come only once.
There are many constraints, including second visits with the same doctor and avoiding conflicts of interest (patient and doctor know each other), among others. The hospitals/residents problem formulation is not suitable, mainly because of these constraints. We are trying a solution using a prioritizing scheme and then trying to reschedule the exceptions.
At this moment we are trying to define the algorithm; this is part of a whole system to manage the medical review board.
The system is based on Java, with Dojo for the front end and EJB for the back end.
This is a question that may be closed because it is too localized. It won't be much help to someone else. But it's a fun problem so I thought I'd throw out some ideas.
You are going to need to find matches for the most complex cases first.
Look for "best-fit" solutions. Don't take time on an empty day if you can fill up another day.
You're going to have to figure out a way to iterate through the matching so you try a wide range of possibilities. Some way to pull back, make a different choice, and then continue without getting into an infinite loop.
You might do the fitting up to (let's say) 80% and then swap around people. Swap a 3 hour appointment with a 2 and a 1 or something. The goal is to leave the schedule with the most "flexibility".
You're going to need to determine your swapping rules. What makes a schedule better?
Here's a bunch of SO questions for you to read:
Worker Scheduling Algorithm
Best Fit Scheduling Algorithm
Is there a scheduling algorithm that optimizes for "maker's schedules"?
Hope some of this helps.
Just wondering if anyone could help me out with some code that I'm currently working on for uni. It's a sliding tile puzzle that I'm coding and I've implemented an A* algorithm with a Manhattan distance heuristic. At the moment the time for it to solve the puzzle can range from a few hundred milliseconds up to about 12 seconds for some configurations. What I was wanting to know is whether this range in time is what I should be expecting.
I've never really done any AI before and I'm having to learn this on the fly, so any help would be appreciated.
What I was wanting to know is if this range in time is what I should be expecting?
That's a little hard to figure out just from the information you've provided. It would help if you could describe how you implemented A*, or if you profiled your application and needed help with specific areas that were slow.
One thing to note that'd probably speed up your average solution time: Half of the starting positions of any n-tile puzzle can never lead to a solution, so you can immediately exclude certain configurations very quickly. For example, you cannot solve an 8-tile puzzle that looks like this:
1 2 3
4 5 6
8 7 .
To see why, note that every move swaps the blank with an adjacent tile, i.e. it is a single transposition. Because the blank space has to wind up back where it started, the overall number of "up"/"down" moves must be equal, as must the overall number of "left"/"right" moves, so the total number of moves - and hence of transpositions - must be even.
But the 7/8 swap here is a single transposition away from the solved configuration, without changing the blank's position, which is an odd permutation. So this puzzle can't be solved. (However, if we made two transpositions, it would be solvable again.)
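A quick way to filter these out up front, assuming an odd board width such as the 3x3 puzzle and the usual 1..8 goal, is to count inversions among the tiles; the configuration is solvable exactly when that count is even. A hedged sketch:

// Solvability check for an odd-width sliding puzzle (e.g. the 3x3 8-puzzle).
// tiles is the board flattened row by row, with 0 standing for the blank.
// For the standard goal 1,2,...,8,blank the state is solvable iff the number
// of inversions among the non-blank tiles is even.
class PuzzleSolvability {
    static boolean isSolvable(int[] tiles) {
        int inversions = 0;
        for (int i = 0; i < tiles.length; i++) {
            for (int j = i + 1; j < tiles.length; j++) {
                if (tiles[i] != 0 && tiles[j] != 0 && tiles[i] > tiles[j]) inversions++;
            }
        }
        return inversions % 2 == 0;
    }

    public static void main(String[] args) {
        // The configuration shown above: 1 2 3 / 4 5 6 / 8 7 .
        System.out.println(PuzzleSolvability.isSolvable(new int[] {1, 2, 3, 4, 5, 6, 8, 7, 0}));  // false
    }
}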
As you should know, you cannot expect any general running time. It always depends on the code itself, especially on how deep your implementation walks down the search tree, and on whether your code can take advantage of processor features.
For debugging I would save or print out (but this takes time!) which level of the tree you are currently in.
Also remember that the weights are very important. E.g., suppose the final state is:

1 2 3
4 . 6
7 8 9

Reaching it from

2 1 3
4 . 6
7 8 9

(tiles 1 and 2 swapped) is much more expensive than reaching it from

1 . 3
4 2 6
7 8 9

(tile 2 just one move away from its place).
I hope that helps.
Obviously, this depends not only on your hardware, but on your implementation.
It's not a good measure of performance, though: what you want to do is determine the effective branching factor of your heuristic versus the actual branching factor of some other, non-heuristic approach.
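For reference, the effective branching factor b* is usually defined like this: if the search generates N nodes and the solution is found at depth d, then b* is the value that solves N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d. The closer b* stays to 1 across a range of instances, the better the heuristic.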
I don't want to say too much more, since this is a homework problem, but if memory serves, Russell and Norvig cover this in the context of the sliding puzzle itself... chapter three, perhaps? (My R&N is not at hand.)