Roommate matching algorithm - java

I'm currently in an advanced data structures class and have learned a good bit about graphs. This summer I was asked to help write an algorithm to match roommates. For my data structures class I've written a city-path graph and implemented some sorting and Prim's algorithm, so I'm thinking that a graph may be a good place to start with my roommate matching algorithm.
I was thinking that our database could just be a text file, nothing too fancy. I could initialize each node in the graph as a student, and each student would have an undirected edge to many other students (with no edge between two students when one of them doesn't want to room with the other; the sorority also doesn't want repeat roommates). I could also increase an edge's weight depending on shared special interests.
Everything listed above is quite simple and I don't think I'll run into any problem implementing it. But here is my question:
How should I update the common interest field? Should I start that with a physical survey and then go back into the text file and update the weight of the edge manually? Or should I be creating a field that keeps track of the matching interests?

What you're trying to design is a matching problem, closely related to bipartite matching. Fortunately, unlike many other matching problems, you won't need fancy graph algorithms or a complex implementation for this. It is very close to the Stable Marriage Problem, and surprisingly there are very effective and fairly simple algorithms for it.
If you are interested, I can share my C++ implementation of the stable marriage problem.
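For readers who want to see the idea in Java, here is a minimal sketch of the Gale-Shapley algorithm for the Stable Marriage Problem. It is not the C++ implementation mentioned above. It assumes the students can be split into two equally sized groups that rank each other, which is a simplification of roommate matching where everyone is in one pool. The class name `StableMatchingSketch` and the array layout are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical sketch: Gale-Shapley for two equally sized groups.
final class StableMatchingSketch {
    // proposerPrefs[p] lists receivers in p's order of preference (best first).
    // receiverPrefs[r] lists proposers in r's order of preference (best first).
    // Returns an array where result[r] is the proposer matched with receiver r.
    static int[] match(int[][] proposerPrefs, int[][] receiverPrefs) {
        int n = proposerPrefs.length;
        // rank[r][p] = position of proposer p in receiver r's list (lower is better).
        int[][] rank = new int[n][n];
        for (int r = 0; r < n; r++) {
            for (int i = 0; i < n; i++) {
                rank[r][receiverPrefs[r][i]] = i;
            }
        }
        int[] next = new int[n];            // index of the next receiver each proposer asks
        int[] matchOfReceiver = new int[n]; // -1 means the receiver is still free
        Arrays.fill(matchOfReceiver, -1);
        Deque<Integer> free = new ArrayDeque<>();
        for (int p = 0; p < n; p++) {
            free.add(p);
        }
        while (!free.isEmpty()) {
            int p = free.poll();
            int r = proposerPrefs[p][next[p]++];
            int current = matchOfReceiver[r];
            if (current == -1) {
                matchOfReceiver[r] = p;      // receiver was free, accept
            } else if (rank[r][p] < rank[r][current]) {
                matchOfReceiver[r] = p;      // receiver prefers the new proposer
                free.add(current);
            } else {
                free.add(p);                 // rejected, will try the next choice
            }
        }
        return matchOfReceiver;
    }
}
```

The preference lists could be derived from the survey data, for example by sorting the other group by the number of shared interests.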

Related

Genetic Algorithm for Appliance Scheduling in java

I am doing a project on household appliance scheduling using a genetic algorithm in Java. I have little knowledge of genetic algorithms and Java programming. I would really appreciate it if anyone could help me.
My project is to schedule appliances such as a washing machine, aircon, etc. over 48 time slots, with an electricity price that varies between slots but stays constant within a slot. My objective is minimum cost. I am using binary values to represent appliance ON/OFF, and the solution should be an n x m matrix.
For a washing machine, for example, which has a cycle of 1 hour, the binary string should look like "0000110000..." and not "0001010000". There is also a preferred ON period for appliances, for example 6-8am and 6-10pm; if the appliance is to be used twice a day, it runs once during the 6-8am period and once during the 6-10pm period.
I also have to include a renewable source (PV), selling the energy when the price is right or storing it to supply later when it is not worth selling. I'm not sure how. Should this be another GA, or can it be part of the same one?
How do I start to program this? And how do I use an Excel file to store information about the appliances so that I can extract it when I need it?
Thank you.
First, I suggest developing the model for the genetic algorithm; I didn't understand exactly how your model is going to work.
You need to define these key components:
how to represent the problem with chromosomes (building blocks)
how to induce mutation and breed between 2 (or more) chromosomes
a fitness function to evaluate each solution
Because of your lack of knowledge in Java, I suggest starting with a programming language you are familiar with. The best approach would be a data-oriented language such as Python or R; if you aren't familiar with either, I suggest going with Python.
Once you have developed and tested your model (the hard part), you can write it in Java if you want to (my suggestion is to use Python all the way).
I suggest focusing on that first. All other utilities such as Excel, CSV and so on are pretty straightforward once you have your model and data figured out and implemented.
The good news is that a genetic algorithm is a well-suited approach for this kind of hard combinatorial problem. I suggest searching the internet for implementations of variants of this problem.
Also take a look at this solution to the traveling salesman problem to get familiar with the genetic algorithm approach.
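Since the question asks about Java, the three components listed above (chromosome, mutation and breeding, fitness function) can be sketched as follows. This is only an illustrative sketch under several assumptions: each appliance runs once per day, each gene stores the start slot of that run instead of a row of 48 bits (which keeps every run contiguous by construction), and preferred periods and PV are not modelled. The class name `ApplianceGaSketch`, the mutation rate and all parameters are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch: each gene is the start slot of one appliance run.
final class ApplianceGaSketch {
    static final int SLOTS = 48;

    // Fitness: total cost, i.e. power (kW) times price for every slot an appliance is on.
    static double cost(int[] start, int[] length, double[] power, double[] price) {
        double total = 0;
        for (int a = 0; a < start.length; a++) {
            for (int s = start[a]; s < start[a] + length[a]; s++) {
                total += power[a] * price[s];
            }
        }
        return total;
    }

    static int[] randomChromosome(int[] length, Random rnd) {
        int[] c = new int[length.length];
        for (int a = 0; a < c.length; a++) {
            c[a] = rnd.nextInt(SLOTS - length[a] + 1); // the run always fits in the day
        }
        return c;
    }

    // Tournament selection: the cheaper of two randomly picked individuals.
    static int[] tournament(int[][] pop, int[] length, double[] power,
                            double[] price, Random rnd) {
        int[] a = pop[rnd.nextInt(pop.length)];
        int[] b = pop[rnd.nextInt(pop.length)];
        return cost(a, length, power, price) <= cost(b, length, power, price) ? a : b;
    }

    static int[] solve(int[] length, double[] power, double[] price,
                       int popSize, int generations, long seed) {
        Random rnd = new Random(seed);
        int[][] pop = new int[popSize][];
        int[] best = null;
        for (int i = 0; i < popSize; i++) {
            pop[i] = randomChromosome(length, rnd);
            if (best == null
                    || cost(pop[i], length, power, price) < cost(best, length, power, price)) {
                best = pop[i].clone();
            }
        }
        for (int g = 0; g < generations; g++) {
            int[][] nextPop = new int[popSize][];
            for (int i = 0; i < popSize; i++) {
                int[] p1 = tournament(pop, length, power, price, rnd);
                int[] p2 = tournament(pop, length, power, price, rnd);
                // Uniform crossover: each gene is copied from one of the two parents.
                int[] child = new int[p1.length];
                for (int a = 0; a < child.length; a++) {
                    child[a] = rnd.nextBoolean() ? p1[a] : p2[a];
                }
                // Mutation: occasionally move one appliance to a new random start slot.
                if (rnd.nextDouble() < 0.2) {
                    int a = rnd.nextInt(child.length);
                    child[a] = rnd.nextInt(SLOTS - length[a] + 1);
                }
                nextPop[i] = child;
                if (cost(child, length, power, price) < cost(best, length, power, price)) {
                    best = child.clone();
                }
            }
            pop = nextPop;
        }
        return best;
    }
}
```

Constraints such as preferred ON periods could be added either by restricting the random start slots or by adding a penalty term to `cost`.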

Which data structure should be chosen? [Android Dictionary]

After several hours of searching the internet, I'm still not sure about anything. My problem is: I want to implement a dictionary on Android devices (Java based). My requirements are speed first and then memory efficiency, but I couldn't decide which data structure to use for searching.
I have a list of data structures; please help me understand them and choose one:
Ternary tree
TRIE
Aho–Corasick tree
[...your suggested DS...]
It would also be very kind if somebody could guide me on retrieving the details of a word (many fields: pronunciation, meaning, example sentence...) after we have found it. Should we save this information in another data file?
You need to list the major concerns of your design before searching for data structures. What functions does this dictionary offer? What are its major features? Fast search? Space compactness? Insertion/deletion friendliness? Cross-referencing friendliness? Only when you have these in mind can you measure how good a candidate structure is.
It can be implemented in several ways; one of them is a trie. The route is represented by the digits and the nodes point to a collection of words. Usage of a trie is explained here
Agree with Hunter Mcmillen's comment. In case you need the words to be sorted alphabetically like a regular dictionary, you can use Java's TreeMap, which is a SortedMap.
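As a rough illustration of the TreeMap suggestion, the sketch below keeps the words sorted and stores the extra fields from the question in a small entry object. The class and field names (`DictionarySketch`, `Entry`, `pronunciation`, `meaning`, `example`) are hypothetical, and this says nothing about whether a TreeMap is fast or compact enough for a full dictionary on a device.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of a sorted in-memory dictionary.
final class DictionarySketch {
    // Hypothetical entry type holding the fields mentioned in the question.
    static final class Entry {
        final String pronunciation;
        final String meaning;
        final String example;

        Entry(String pronunciation, String meaning, String example) {
            this.pronunciation = pronunciation;
            this.meaning = meaning;
            this.example = example;
        }
    }

    private final TreeMap<String, Entry> words = new TreeMap<>();

    void add(String word, Entry entry) {
        words.put(word, entry);
    }

    // Exact lookup, O(log n) in the number of words.
    Entry lookup(String word) {
        return words.get(word);
    }

    // All words that start with the prefix, in alphabetical order.
    // Character.MAX_VALUE serves as an upper bound for any following character.
    SortedMap<String, Entry> withPrefix(String prefix) {
        return words.subMap(prefix, prefix + Character.MAX_VALUE);
    }
}
```

The prefix view is what a "search as you type" box would use; the entries themselves could equally be loaded lazily from a separate data file.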

String analysis and classification

I am developing a financial manager in my free time with Java and a Swing GUI. When the user adds a new entry, they are prompted to fill in: amount of money, date, comment and section (e.g. Car, Salary, Computer, Food, ...).
The sections are created "on the fly". When the user enters a new section, it will be added to the section-jcombobox for further selection. The other point is, that the comments could be in different languages. So the list of hard coded words and synonyms would be enormous.
So, my question is: is it possible to analyse the comment (e.g. "Fuel", "Car service", "Lunch at **") and preselect a fitting Section?
My first thought was to do it with a neural network and learn from the input whenever the user selects another section.
But my problem is, I don't know how to start at all. I tried "encog" with Eclipse and did some tutorials (XOR, ...). But all of them only use doubles as input/output.
Could anyone give me a hint on how to start, or suggest any other possible solution for this?
Here is a runnable JAR (current development state, requires Java 7) and the Sourceforge page
Forget about neural networks. This is a highly technical and specialized field of artificial intelligence, which is probably not suitable for your problem and requires solid expertise. Besides, there are many simpler and better solutions for your problem.
The first obvious solution: build a list of words and synonyms for all your sections and parse the comments for these synonyms. You can then collect comments online for synonym analysis, or use the comments/sections provided by your users to statistically detect relations between words, etc.
There is an endless number of possible solutions, ranging from the simplest to the most overkill. Now you need to decide whether this feature of your system is critical (prefilling? probably not, then) and what any development effort will bring you. One hour of work could give you a feature that is 80% satisfying, while aiming for 90% could cost one week of work. Is it really worth it?
Go for the simplest solution and tackle the real challenge of any dev project: delivering. Once your app is delivered, then you can always go back and improve as needed.
// Compare case-insensitively so that "Fuel" and "fuel" also match.
String comment = paramInput.toUpperCase();
if (comment.contains("FUEL")) {
    // do the fuel functionality
}
In a simple app with only a few specific sections, you can take the comment string, check whether it contains certain keywords, and set the value of Section accordingly.
If you have a lot of categories, I would use something like Apache Lucene, where you could index all the categories with their names and potential keywords/phrases that might appear in a user's description. Then you could simply run the description through Lucene and use the top matched category as a "best guess".
P.S. Neural network inputs and outputs are numeric (doubles or floats), typically scaled to values between 0 and 1. As for how to implement String matching with one, I wouldn't even know where to start.
It seems to me that the following will do:
hard word statistics
maybe a stemming class (English/Spanish) which reduces a word like "lunches" to "lunch"
a list of the most frequent non-words (the, at, a, for, ...)
The best fit is a linear problem, so in theory it fits a neural net, but why not compute the numerical best fit directly?
A machine learning algorithm such as an Artificial Neural Network doesn't seem like the best solution here. ANNs can be used for multi-class classification (i.e. 'which of the provided pre-trained classes does the input belong to?', not just 'does the input represent an X?'), which fits your use case. The problem is that they are supervised learning methods, and as such you need to provide a list of pairs of keywords and classes (Sections) that spans every possible input that your users will provide. This is impossible, and in practice ANNs are re-trained when more data is available to produce better results and create a more accurate decision boundary / representation of the function that maps the inputs to outputs. This also assumes that you know all possible classes before you start and that each of those classes has training input values that you provide.
The issue is that the input to your ANN (a list of characters or a numerical hash of the string) provides no context by which to classify. There's no higher level information provided that describes the word's meaning. This means that a different word that hashes to a numerically close value can be misclassified if there was insufficient training data.
(As maclema said, the output from an ANN will always be floats with each value representing proximity to a class - or a class with a level of uncertainty.)
A better solution would be to employ some kind of word-relation or synonym graph. A Bag of words model might be useful here.
Edit: In light of your comment that you don't know the Sections beforehand, an easy solution to program would be to provide a list of keywords in a file that gets updated as people use the program. Simply storing a mapping of provided comments -> Sections, which you will already have in your database, would allow you to filter out non-keywords (and, or, the, ...). One option is then to find each Section that the typed keywords belong to, suggest multiple Sections and let the user pick one. The feedback that you get from user selections would enable improved suggestions in the future. Another would be to calculate a Bayesian probability (the probability that this word belongs to Section X given the previously stored mappings) for all keywords and Sections and either take the modal Section or normalise over each unique keyword and take the mean. The probabilities will need to be updated as you gather more information, of course; perhaps this could be done with every new addition in a background thread.
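The stored comment -> Section mapping described above could be sketched roughly like this. It is a simplified frequency count, not a full Bayesian classifier: for each word it keeps how often the word was filed under each Section, and a suggestion sums the per-word shares. The class name `SectionSuggesterSketch`, the stop-word list and the tokenisation rule are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: learn word -> Section counts from the user's own entries.
final class SectionSuggesterSketch {
    // Assumed stop-word list; a real one would be longer and per language.
    private static final Set<String> STOP_WORDS = Set.of("the", "at", "a", "for", "and", "or");

    // word -> (section -> number of comments with this word filed under that section)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Call this whenever the user saves an entry with a comment and a section.
    void learn(String comment, String section) {
        for (String word : tokens(comment)) {
            counts.computeIfAbsent(word, k -> new HashMap<>()).merge(section, 1, Integer::sum);
        }
    }

    // Returns the section with the highest summed per-word share, or null if no word is known.
    String suggest(String comment) {
        Map<String, Double> score = new HashMap<>();
        for (String word : tokens(comment)) {
            Map<String, Integer> bySection = counts.get(word);
            if (bySection == null) {
                continue;
            }
            int total = bySection.values().stream().mapToInt(Integer::intValue).sum();
            bySection.forEach((s, c) -> score.merge(s, (double) c / total, Double::sum));
        }
        return score.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    // Lower-cases the comment and splits it on anything that is not a letter or digit.
    private static List<String> tokens(String comment) {
        List<String> out = new ArrayList<>();
        for (String w : comment.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty() && !STOP_WORDS.contains(w)) {
                out.add(w);
            }
        }
        return out;
    }
}
```

Because the counts come from the user's own entries, this works for Sections created on the fly and for comments in any language, as long as the same words recur.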

Which Java data object to use for multidimensional range matching?

Project Background:
I am writing a map tile overlay class for Java that can use gdal2tiles.py tiles. Basically I will end up with thousands of jpg files that are in a file structure like
"Zoom Level/X coordinate/Y coordinate"
The coordinates are ints but will not necessarily start at 0 or 1.
I will have to search for tiles that are within a certain range to find out which ones I need to render.
My Problem:
I tried iterating over the file structure itself, but it is wicked slow (not surprising).
I tried iterating over an ArrayList of strings of the file structure using .contains(), but it seems to be even slower (not too surprising).
Optimally I would like to use a data structure that lets me choose a range on multiple dimensions so that I can call something like:
Tiles.getWhere(zoomLevel, minX, maxX, minY, maxY);
I assume that some sort of Collection or TreeMap would be the right choice, but I'm not experienced enough with Java to know for sure, and I'd prefer not to have to benchmark a lot of different approaches.
I could use SQLite to do it but that seems like overkill.
My Question:
What is the most efficient way to check for the existence of datasets given multiple dimensional constraints?
Maybe you are looking for a map with multiple keys.
Commons-collections provides a map with multiple lookup keys:
http://commons.apache.org/collections/apidocs/org/apache/commons/collections/map/MultiKeyMap.html
A hash-based map offers expected O(1) insertion and O(1) lookup.
Thinking about your problem, I found three directions in which you could aim your search next (this is not a step-by-step guide but rather an out-of-the-box brain opener for the stuck situation you are facing):
1) Usage of Java's built-in structures. Yes, indeed, a list is the worst case for searching. A Map, as the name suggests, is far more convenient for maps. It is not only the name: indexing into a Map is significantly less time consuming than searching a List. You can imagine your map as a cube, where you have to handle about half of the dots inside it if you use a List, and probably only a narrow layer of it when you search by indexing a Map. There is an order of magnitude of difference. So, my answer here: Map is a keyword pointing in the right direction (assuming you still want to do it this way after reading the rest of my answer).
2) Usage of a map server solution. This is probably too far from your approach, but entire frameworks are made for solving your type of question. An example is GeoServer. It has a ready-made solution for the entire problem. It is a stable solution for the bigger problem you possibly have at hand: showing a map to a user from a source.
3) Sticking to the GDAL framework you are using, you could pick a slightly different .py file, such as gdal_proximity.py, and - wow! - you have a search capability at hand! This particular one searches by a center point and a distance, but it should do what you need =)
That is a starting point for how I would do it. Could this be of any use?
Sounds to me like you are looking for something like an Interval Tree.
http://en.wikipedia.org/wiki/Interval_tree
I have implemented one of these in the past but only in one dimension. The Wikipedia reference mentions extensions to more dimensions.
Paul
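As a simpler alternative to a full interval tree, the range lookup from the question can be sketched with nested sorted collections from the standard library. This is only a sketch: the class name `TileIndexSketch` and its methods are hypothetical, and it assumes the index is built once by walking the "Zoom Level/X coordinate/Y coordinate" directories and then kept in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: in-memory index of which tiles exist.
final class TileIndexSketch {
    // zoom -> x -> sorted set of y values for which a tile file exists
    private final Map<Integer, TreeMap<Integer, TreeSet<Integer>>> index = new HashMap<>();

    void add(int zoom, int x, int y) {
        index.computeIfAbsent(zoom, z -> new TreeMap<>())
             .computeIfAbsent(x, k -> new TreeSet<>())
             .add(y);
    }

    // Returns {x, y} pairs of existing tiles inside the inclusive ranges.
    // Assumes minX <= maxX and minY <= maxY (subMap/subSet throw otherwise).
    List<int[]> getWhere(int zoom, int minX, int maxX, int minY, int maxY) {
        List<int[]> hits = new ArrayList<>();
        TreeMap<Integer, TreeSet<Integer>> columns = index.get(zoom);
        if (columns == null) {
            return hits; // no tiles at this zoom level
        }
        for (Map.Entry<Integer, TreeSet<Integer>> col
                : columns.subMap(minX, true, maxX, true).entrySet()) {
            for (int y : col.getValue().subSet(minY, true, maxY, true)) {
                hits.add(new int[] {col.getKey(), y});
            }
        }
        return hits;
    }
}
```

Only the x columns and y values inside the requested window are visited, so the cost depends on the size of the window rather than on the total number of tiles.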

Solving The 8 Puzzle With A* Algorithm

I would like to solve/implement the 8 puzzle problem using the A* algorithm in Java. I am asking if someone can help me by explaining the steps I must follow to solve it. I have read on the net how A* works, but I don't know how to begin the implementation in Java.
I will be very grateful if you can help me and give me guidelines so that I can implement it myself in Java. I really want to do it myself to be able to understand it, so I just need guidelines to start.
I will use priority queues and will read the initial configuration from a text file which looks, for example, like this:
4 3 6
1 2 5
7 8
Pointers to other sites for more explanation/tutorials are welcome.
I'd begin with deciding how you want to represent the game board states,
then implement the operators (e.g. move (blank) tile up, move (blank) tile down, ...).
Typically you will have a data structure to represent the open list (i.e. those states
discovered but as yet unexplored, that is, not yet compared with the goal state) and another for the
closed list (i.e. those states discovered and explored and found not to be the goal state).
You seed the open list with the starting state, and repeatedly take the "next" state to
be explored from the open list, apply the operators to it to generate new possible states,
and so on ...
There is a tutorial I prepared many years ago at:
http://www.cs.rmit.edu.au/AI-Search/
It is far from the definitive word on state space searching though, it is simply an educational tool for those brand new to the concept.
Check http://olympiad.cs.uct.ac.za/presentations/camp1_2004/heuristics.pdf it describes ways of tackling this very problem.
A* is a lot like Dijkstra's algorithm except it includes a heuristic. You might want to read the wiki article on it or read about single-source shortest path algorithms in general.
A lot of the basic stuff is important but obvious. You'll need to represent the board and create a method for generating the possible next states.
The base score for any position is the number of moves actually made to reach it. For A* to work, you also need a heuristic that estimates the remaining cost and helps you pick the best of the possible next states. One heuristic might be the number of pieces that are not yet in the correct position.
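Putting the pieces from the answers together, a compact A* for the 8 puzzle could look like the sketch below. It is a starting point under stated assumptions, not a reference implementation: the board is encoded as a 9-character string read row by row with '0' for the blank, the goal is assumed to be "123456780", and the heuristic is the sum of Manhattan distances (a common alternative to counting misplaced tiles). The class name `EightPuzzleSketch` is hypothetical. With this encoding, the example file from the question becomes "436125780".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: A* search for the 8 puzzle.
final class EightPuzzleSketch {
    static final String GOAL = "123456780"; // assumed goal, '0' marks the blank

    // Heuristic h: sum of Manhattan distances of all tiles from their goal cells.
    static int manhattan(String s) {
        int sum = 0;
        for (int i = 0; i < 9; i++) {
            int tile = s.charAt(i) - '0';
            if (tile == 0) {
                continue; // the blank is not counted
            }
            int goal = tile - 1;
            sum += Math.abs(i / 3 - goal / 3) + Math.abs(i % 3 - goal % 3);
        }
        return sum;
    }

    private static final class Node {
        final String state;
        final int g; // moves made so far
        final int f; // g plus heuristic estimate

        Node(String state, int g) {
            this.state = state;
            this.g = g;
            this.f = g + manhattan(state);
        }
    }

    // Returns the minimum number of moves to reach GOAL, or -1 if it is unreachable.
    static int solve(String start) {
        PriorityQueue<Node> open = new PriorityQueue<>((a, b) -> Integer.compare(a.f, b.f));
        Map<String, Integer> bestG = new HashMap<>(); // cheapest known cost per state
        open.add(new Node(start, 0));
        bestG.put(start, 0);
        int[] delta = {-3, 3, -1, 1}; // blank moves up, down, left, right
        while (!open.isEmpty()) {
            Node n = open.poll();
            if (n.state.equals(GOAL)) {
                return n.g;
            }
            if (n.g > bestG.get(n.state)) {
                continue; // stale queue entry, a cheaper path was found later
            }
            int blank = n.state.indexOf('0');
            for (int d : delta) {
                int target = blank + d;
                if (target < 0 || target > 8) {
                    continue; // off the board
                }
                if (Math.abs(d) == 1 && target / 3 != blank / 3) {
                    continue; // left/right must stay in the same row
                }
                char[] c = n.state.toCharArray();
                c[blank] = c[target];
                c[target] = '0';
                String next = new String(c);
                int g = n.g + 1;
                if (g < bestG.getOrDefault(next, Integer.MAX_VALUE)) {
                    bestG.put(next, g);
                    open.add(new Node(next, g));
                }
            }
        }
        return -1;
    }
}
```

Here the priority queue plays the role of the open list, and the `bestG` map doubles as the record of states already discovered.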
