Matching Algorithm (Java)

I am writing an algorithm to match students with different groups. Each group has a limited number of spots. Each student provides their top 5 choices of groups. The students are then placed into groups in a predetermined order (older students and students with perfect attendance are given higher priority). There is no requirement for groups to be filled entirely, but they cannot be filled past capacity.
I've looked into similar marriage problems such as the Gale-Shapley stable marriage algorithm, but the problem I am having is that there are far fewer groups than students and each group can accept multiple students.
What is the best way to implement such an algorithm to find a solution that is fully optimized, such that no better arrangement of students in groups exists? In terms of algorithm complexity, I'm placing roughly 600 students into 10-20 groups.

NB: The close votes are terribly misplaced. Algorithm choice and design to solve an ambiguous problem are absolutely part of programming.
I think you'll get farther with Minimum Weight Bipartite Matching (MWBM) than with Stable Marriage. This is also called the Hungarian method or algorithm, or Maximum Weight Matching (which can give you a minimum weight matching just by negating the weights).
You are out to match positions with students, so positions and students are the two node types in the bipartite graph.
The simplest statement of the algorithm requires a complete weighted bipartite graph with equal numbers of nodes in each set. You can think of this as a square matrix: the weights are the elements, the rows are students, and the columns are positions.
The algorithm will pick a single element from each row/column such that the sum is minimized.
@nava's proposal is basically a greedy version of MWBM, which is not optimal. The true Hungarian algorithm will give you an optimal answer.
Handling the fact that you have fewer positions than students is easy: to the "real" positions, add as many "dummy" positions as needed, and connect these to all the students with super-high-weight edges. The algorithm will only pick them after all the real positions are matched.
The trick is to pick the edge weights. Let O_i be the ordinal at which the i'th student is considered for placement. Then let R_ip be the ranking that student i places on the p'th position. Finally, W_ip is the weight of the edge connecting the i'th student to the p'th position. You'll want something like:
W_ip = A * R_ip + B * O_i
You get to pick A and B to specify the relative importance of the students' preferences and the order they're ranked. It sounds like order is quite important. So in that case you want B to be big enough to completely override students' rankings.
A = 1, B = N^2, where N is the number of students.
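For illustration, here is a minimal sketch of building that square cost matrix, assuming rank[i][p] holds R_ip (with some large default for unranked positions) and order[i] holds O_i. The Hungarian solver itself would come from whatever library you pick; all names here are illustrative, not from a real library.

static double[][] buildCostMatrix(int[][] rank, int[] order, int realPositions) {
    int n = order.length;                 // number of students
    double A = 1.0;
    double B = (double) n * n;            // B = N^2 so order dominates rankings
    double DUMMY = 1e9;                   // dummy edges: picked only when forced
    double[][] w = new double[n][n];      // square matrix: pad with dummy columns
    for (int i = 0; i < n; i++) {
        for (int p = 0; p < n; p++) {
            w[i][p] = (p < realPositions)
                    ? A * rank[i][p] + B * order[i]
                    : DUMMY;              // columns beyond the real positions are dummies
        }
    }
    return w;                             // feed this to a min-weight matching solver
}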
Once you get an implementation working, it's actually fun to tweak the parameters to see how many students get which preference, etc. You might want to adjust them a bit to give up a little on the order.
At the time I was working on this (late '90s), the only open-source MWBM I could find was an ancient FORTRAN lib. It was O(N^3). It handled 1,000 students (selecting core academic program courses) in a few seconds. I spent a lot of time coding a fancy O(N^2 log N) version that turned out to be about 3x slower for N = 1000. It only started "winning" at about N = 5,000.
These days there are probably better options.

I would modify the Knapsack problem (Knapsack problem, wikipedia) to work with K groups (knapsacks) instead of just one. You can assign a "value" to each student's preferences, and the number of spots would be the maximum "weight" of each knapsack. With this, you can backtrack to find the optimal solution of the problem.
I am not sure how efficient you need the solution to be, but I think this will work.

"the most mathematically perfect"
is very opinion-based. Simplicity (almost) always wins here.
Here is the pseudocode:
students <-- sorted by priority (age, attendance)
for i = 0 to n-1 in students:
    studentAssigned <-- false
    groups <-- sorted by the ith student's preference
    for j = 0 to m-1 in groups:
        if group j has space:
            add student i to group j
            studentAssigned <-- true
            break
    if not studentAssigned:
        add i to unallocated
for each i in unallocated:
    allocate i to a random group that is not full
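A direct Java translation of this pass might look like the sketch below; Student and Group are minimal illustrative types I'm assuming here, not classes from the question.

import java.util.*;

// Minimal illustrative types (assumed, not from the question).
record Group(int capacity, List<Student> members) {}
record Student(int priority, List<Group> preferences) {}

// Greedy one-pass allocation; returns the students who missed all their choices.
static List<Student> allocate(List<Student> students, List<Group> groups) {
    students.sort(Comparator.comparingInt(Student::priority).reversed()); // highest priority first
    List<Student> unallocated = new ArrayList<>();
    for (Student s : students) {
        boolean assigned = false;
        for (Group g : s.preferences()) {                 // top-5 choices, in order
            if (g.members().size() < g.capacity()) {
                g.members().add(s);
                assigned = true;
                break;
            }
        }
        if (!assigned) unallocated.add(s);                // missed every listed choice
    }
    for (Student s : unallocated) {                       // fallback: any group with space
        for (Group g : groups) {
            if (g.members().size() < g.capacity()) { g.members().add(s); break; }
        }
    }
    return unallocated;
}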

For each group:
create an ordered set and add all the students to it. You must design the heuristic that orders the students inside the set; it could be, for example, attendance level multiplied by 1 if the group is among the student's choices, and by 0 otherwise.
Fill the group with the first n students, where n is the group's capacity.
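As a sketch in Java (attendance(), choices(), and id() are assumed accessors on a Student type, and group/students are assumed to be in scope):

Comparator<Student> byScore = Comparator
        .comparingDouble((Student s) ->
                s.attendance() * (s.choices().contains(group) ? 1 : 0))
        .reversed()
        .thenComparing(Student::id);       // tie-break so equal scores aren't dropped
TreeSet<Student> ordered = new TreeSet<>(byScore);
ordered.addAll(students);
// Fill the group by taking the first n (= capacity) students from 'ordered'.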
But there are some details that you didn't explain. For example, what happens if some students couldn't get into any of their 5 choices, because those groups filled up with higher-priority students?

Related

Is this way to select parents, like a gambling roulette, workable?

It occurs to me that using a random number to select the parents, like a gambling roulette, may be workable. Let me explain it using an example of finding the maximum value of a function. The example is shown below:
1. Imagine that we have already generated n random individuals and calculated their function values. We name the j'th individual Xj and its function value f(Xj), and we find the maximum function value and name it maxValue.
2. It is clear that the fitness of individual j is f(Xj)/maxValue; we can name it g(Xj). We then calculate the fitness of every individual.
3. The next step is to find the parents. (We abandon individuals whose fitness value is less than 0.) A classic way is the gambling roulette: the chance of selecting Xj and Xk is g(Xj)*g(Xk)/[g(X1)+g(X2)+...+g(Xn)]^2.
My idea is:
1. Select two random individuals Xj and Xk.
2. Generate a random number rn in the range 0~1.
3. If rn is less than both g(Xj) and g(Xk) (the fitnesses of Xj and Xk), then they are allowed to reproduce; crossover and mutate.
4. Judge whether we have generated enough child individuals; if so, end.
Otherwise, repeat 1-3.
The chance of selecting Xj and Xk is then g(Xj)*g(Xk)/n^2, which is similar to the gambling roulette. Considering that the denominators of both probabilities are constant values, the two schemes are equivalent in a certain sense.
double randomNumToJudge = Math.random(); // random number to judge against the fitnesses
int randomMother = (int) (Math.random() * 1000);
int randomFather = (int) (Math.random() * 1000); // randomly pick the parents
// If the number is less than both parents' fitnesses, they are permitted to reproduce.
if (randomNumToJudge <= individualArray[generation][randomFather].fitnessValue
        && randomNumToJudge <= individualArray[generation][randomMother].fitnessValue) {
    // Crossover and mutate to generate the child individual.
    Individual childIndividual = individualArray[generation][randomFather].crossOverAndMutate(
            individualArray[generation][randomFather], individualArray[generation][randomMother]);
    individualArray[generation + 1][counter] = childIndividual; // add the child to the array
    counter++; // count of individuals in the child generation
}
I tested this approach in Java code. The function is x + 10sin(5x) + 7cos(4x), x ∈ [0,10). I ran 100 generations with 1000 individuals per generation.
The results are correct.
In one execution, in the 100th generation, I found the best individual to be 7.856744175554171, with a best function value of 24.855362868957645.
I tested 10 times; every result was accurate to 10 decimal places by the 100th generation.
So is this way workable? Has it already been thought of by others?
Any comments are appreciated ^#^
PS: Pardon my poor English -_-
From point 2, I am assuming your target fitness is 1. Your algorithm will likely never fully converge (find a local minimum). This is because your random value range (0~>1) does not change even when your fitnesses do.
Note that this does not mean better fitnesses are not created; they will be. But there will be a sharp decline in the speed at which better fitnesses are created, because you keep checking fitnesses against the same fixed range (random 0~>1).
Consider this example where all fitnesses have converged to be high:
[0.95555, 0.98888, 0.92345, 0.92366]
Here, all values are very likely to satisfy randomNumToJudge<=fitness. This means any of the values are equally likely to be chosen as a parent. You do not want this - you want the best values to have a higher chance of being chosen.
Your algorithm could be amended to converge properly if you set your randomNumToJudge to have a range of (median fitness in population ~> 1), though this still is not optimal.
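For instance, a one-line sketch of that amendment (medianFitness is a hypothetical helper over your population):

double median = medianFitness(population);                         // hypothetical helper
double randomNumToJudge = median + Math.random() * (1.0 - median); // range (median ~> 1)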
Alternative Method
I recommend implementing the classic roulette wheel method.
The roulette wheel method assigns each individual a probability of being chosen as a parent based on how "fit" it is. Essentially, the greater the fitness, the bigger the slice of the wheel the individual occupies, and the higher the chance a random number will choose this position on the wheel.
Example Java code for roulette wheel selection
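A minimal sketch, assuming all fitness values are strictly positive:

static int rouletteSelect(double[] fitness, java.util.Random rng) {
    double total = 0;
    for (double f : fitness) total += f;          // total circumference of the wheel
    double spin = rng.nextDouble() * total;       // random point on the wheel
    double cumulative = 0;
    for (int i = 0; i < fitness.length; i++) {
        cumulative += fitness[i];
        if (spin < cumulative) return i;          // the spin landed on slice i
    }
    return fitness.length - 1;                    // guard against rounding error
}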

Time complexity assignment

I have an assignment in my intro to programming course that I don't understand at all. I've been falling behind because of problems at home. I'm not asking you to do my assignment for me; I'm just hoping for some help for a programming boob like me.
The question is this:
Calculate the time complexity in the average case for searching, adding, and removing in an
- unsorted vector
- sorted vector
- unsorted singly linked list
- sorted singly linked list
- hash table
Let n be the number of elements in the data structure, and present the solution in a table with three rows and five columns.
I'm not sure what this even means... I've read as much as I can about time complexity, but I don't understand it. It's so confusing, and I don't know where I would even start. Remember, I'm a novice programmer, as dumb as they come. I did really well last semester but had problems at home at the start of this one, so I missed a lot of lectures and the first assignments, and now I'm in over my head.
Maybe if someone could give me the answer and the reasoning behind it on a couple of them, I could understand and do the others? I have a hard time learning through theory; examples work best.
Time complexity is a formula that describes how the cost of an operation varies with the number of elements. It is usually expressed using "big-O" notation, for example O(1) for constant time, O(n) where cost relates linearly to n, or O(n^2) where cost increases as the square of the size of the input. There can be others involving exponentials or logarithms. Read up on "Big-O Notation".
You are being asked to evaluate five different data structures, and provide average cost for three different operations on each data structure (hence the table with three rows and five columns).
Time complexity is an abstract concept that allows us to compare the complexity of various algorithms by looking at how many operations they perform in order to handle their inputs. To be precise, the exact number of operations isn't important; the bottom line is how the number of operations scales with the increasing size of the inputs.
Generally, the number of inputs is denoted as n and the complexity is denoted as O(p(n)), with p(n) being some kind of expression in n. If an algorithm has O(n) complexity, it means that it scales linearly: with every new input, the time needed to run the algorithm increases by the same amount.
If an algorithm has complexity of O(n^2), it means that the number of operations grows as the square of the number of inputs. This goes on and on, up to exponentially complex algorithms that are effectively useless for large enough inputs.
What your professor asks of you is to have a look at the given operations and judge how they are going to scale with the increasing size of the structures you are handling. Basically, this is done by looking at the algorithm and imagining what kinds of loops are going to be necessary. For example, if the task is to pick the first element, the complexity is O(1), meaning it doesn't depend on the size of the input. However, if you want to find a given element in the list, you need to scan the whole list, and this costs you depending on the list size. I hope this gives you a bit of an idea of how algorithm complexity works; good luck with your assignment.
Ok, well there are a few things you have to start with first. Algorithmic complexity has a lot of heavy math behind it and so it is hard for novices to understand, especially if you try to look up Wikipedia definitions or more-formal definitions.
A simple definition is that time-complexity is basically a way to measure how much an operation costs to perform. Alternatively, you can also use it to see how long a particular algorithm can take to run.
Complexity is described using what is known as big-O notation. You'll usually end up seeing things like O(1) and O(n). n is usually the number of elements (possibly in a structure) on which the algorithm is operating.
So let's look at a few big-O notations:
O(1): This means that the operation runs in constant time. What this means is that regardless of the number of elements, the operation always runs in constant time. An example is looking at the first element in a non-empty array (arr[0]). This will always run in constant time because you only have to directly look at the very first element in an array.
O(n): This means that the time required for the operation increases linearly with the number of elements. An example is if you have an array of numbers and you want to find the largest number. To do this, you will have to, in the worst case, look at every single number in the array until you find the largest one. Why is that? This is because you can have a case where the largest number is the last number in the array. So you cannot be sure until you have examined every number in the array. This is why the cost of this operation is O(n).
O(n^2): This means that the time required for the operation increases quadratically with the number of elements. This usually means that for each element in the set, you run through the entire set. So that is n x n, or n^2. A well-known example is the bubble-sort algorithm. In this algorithm, you run through the array and swap adjacent elements to ensure that it is sorted according to the order you need. The array is sorted when no more swaps need to be made. So you make multiple passes through the array, the number of which in the worst case is equal to the number of elements in the array.
Now there are certain things in code that you can look at to get a hint to see if the algorithm is O(n) or O(n^2).
Single loops are usually O(n), since it means you are iterating over a set of elements once:
for (int i = 0; i < n; i++) {
    ...
}
Doubly-nested loops are usually O(n^2), since you are iterating over an entire set of elements for each element in the set:
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        ...
    }
}
Now how does this apply to your homework? I'm not going to give you the answer directly, but I will give you more than enough hints to figure it out :). What I wrote above, describing big-O, should also help you. Your homework asks you to apply runtime analysis to different data structures. Well, certain data structures have certain runtime properties based on how they are set up.
For example, in a linked list, the only way to get to an element in the middle of the list is to start with the first element and follow the next pointer until you find the element that you want. Think about that. How many steps would it take for you to find the element that you need? What do you think those steps are related to? Does the number of elements in the list have any bearing on that? How can you represent the cost of this operation using big-O notation?
For each data structure that your teacher has asked you about, try to figure out how it is set up, and try to work out manually what each operation (searching, adding, removing) entails. I'm talking about writing the steps out and drawing pictures of the structures on paper! This will help you out immensely! Looking at that, you should have enough information to figure out the number of steps required and how it relates to the number of elements in the set.
Using this approach you should be able to solve your homework. Good luck!

Genetic Algorithm - Grouping people: Only find solutions containing criteria X,Y and Z

I am trying to solve the following problem:
I have a list of 30 people.
These people need to be divided into groups of 6.
Each person has given the names of 3 other people who they would like to be in a group with.
I thought of solving this problem using a genetic algorithm.
The fitness function could evaluate all the groups, and assign a fitness score based on how many people per room have all their preferences met. (or a scoring system similar to that)
Example:
One of the generated solutions is: 1,3,19,5,22,2,7,8,11,12,13,14,15,13,17....etc
I would assume the first 5 people are in the first group, the next 5 in the next group, and so on, and calculate a fitness value from that.
I think that this solution would work - does anyone see a better way of doing this?
My main question is this:
If I want to make sure person A and B are definitely in the same group, I could implement the fitness function to check for this and assign a terrible fitness if this condition isn't met. Is this the best way to do it? It seems quite inefficient.
Is there a way to 'lock' certain parts of the solution ("certain genes") and just solve for the remainder?
Any help or insights will be appreciated.
Thanks in advance.
AK
Just to clarify a bit: your problem isn't about genetic programming but about genetic algorithms, which are two different things. Genetic programming is about generating (using evolutionary algorithms) executable individuals that will generate your solutions, while in genetic algorithms the individuals directly represent your solutions.
That being said, your two assumptions are correct. Data representation is a key element of evolutionary algorithms in general, and a bad representation may hinder efficient exploration of the solution space. Your current data representation seems correct to me, given that groups are only allowed to have exactly 5 individuals.
Your second thought, about the way to enforce some criteria, is also right. Assigning a huge fitness value (preferably one that can't represent a potentially valid, even if bad, solution), such as infinity (if your library/language allows it easily), is the preferred way in the literature to express invalid solutions. This has multiple advantages over simply deleting invalid individuals: during the selection stage, bad individuals won't be selected, and thus the solution space they represent won't be explored as much as interesting ones, which is computationally good because it surely won't contain optimal solutions. Knowing a solution is bad is good knowledge, after all. At the same time, genetic diversity is really important in evolutionary algorithms in order to avoid stagnation, so at least some bad individuals should be kept for the sake of genetic diversity, in order to explore the solution spaces between the currently represented zones.
The goal of genetic algorithms is to compute solutions that are either impossible or too hard to compute analytically or by brute force. Trying to dynamically lock down some genes with heuristics would require much knowledge about the inner workings of your problem, as well as about the underlying evolution mechanisms, and would defeat the purpose of using evolutionary algorithms. The effective goal of evolutionary algorithms is to lock down, on their own, the genes that seem correct.
In fact, if you are a priori absolutely, positively certain that some given genes must have a given value, don't represent them in your individuals at all. For instance, make your first group 3 individuals long if you are sure that the other 2 must be of some given value. You can then code your evaluation function as if there were 5 individuals in the first group, but you won't be evolving / searching to replace the 2 fixed ones.
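A sketch of that idea (decodeGroups and fitness are hypothetical helpers, and the two locked members and the group layout are assumed purely for illustration):

static final int FIXED_A = 7, FIXED_B = 12;             // assumed: always in group 0

static double evaluate(int[] genome) {
    // The genome holds only the evolvable slots; splice the fixed members
    // back in for evaluation only, so the search never touches them.
    List<List<Integer>> groups = decodeGroups(genome);  // hypothetical decoder
    groups.get(0).add(FIXED_A);
    groups.get(0).add(FIXED_B);
    return fitness(groups);                             // hypothetical scorer
}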
What does your crossover operation look like? The way you have it laid out in your description, I'm not sure how you would implement it cleanly. For instance, if you have two solutions:
1, 2, 3, 4, 5, ....., 30
and
1, 2, 30, 29,......,10
Assuming you're using a single-point crossover function, the genomes above could give you the same person assigned multiple times and other people not assigned at all.
I would have a genome with 30 values, where each value defines a person's group assignment (1-6). It would look like 656324113255632...etc. So person 1 is assigned group 6, person 2 group 5, etc. This makes the crossover operation easier to implement, because after crossover each new solution is still a valid assignment (every person gets exactly one group), whether or not it's optimal.
The fitness function would assign a penalty for each group not having the proper number of members (5), and additional penalties for group member assignments that are suboptimal. I would make the first penalty significantly larger than the second, and then adjust these to get the results you're looking for.
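A sketch of that penalty-based fitness under this encoding (genome[p] is person p's group, 0..5 here; wants[p], the three requested peers of person p, is an assumed input, and the weight 1000 is just a stand-in for "significantly larger"):

static int penalty(int[] genome, int[][] wants) {
    int[] sizes = new int[6];
    for (int g : genome) sizes[g]++;                          // count members per group
    int cost = 0;
    for (int size : sizes) cost += Math.abs(size - 5) * 1000; // group-size penalty dominates
    for (int p = 0; p < genome.length; p++)
        for (int peer : wants[p])
            if (genome[peer] != genome[p]) cost++;            // unmet preference
    return cost;                                              // lower is better
}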
This can be modeled as a generalized quadratic assignment problem (GQAP). This problem lets you specify a set of equipment (people) that demands a certain capacity, a set of locations (groups) that offers capacity, a weights matrix that specifies the closeness between pieces of equipment, and a distance matrix specifying the distance between locations. Additionally, there are installation costs, but these are not required for your problem. I have implemented this problem in HeuristicLab. It's not part of the trunk, but I can send you the plugin if you're interested (or you can compile it yourself).
It seems that the most challenging part of using a genetic algorithm for this problem is implementing the crossover. Here's how I would do it:
First choose a constant, C. C will stay constant throughout all generations, and I will explain its purpose in a moment.
I will use a smaller example than 5 groups of 6 to demonstrate this crossover, but the premise is the same. Say we have 2 parents, each consisting of 3 groups of 3. Let's make one [[1,2,3],[4,5,6],[7,8,9]], and the other [[9,4,3],[5,7,8],[6,1,2]].
1. Make a list of possible numbers (1 through the total number of people); in this case it is simply [1,2,3,4,5,6,7,8,9]. Remove one random number from the list. Let's say we remove 2; the list becomes [1,3,4,5,6,7,8,9].
2. Assign each remaining number a probability. The probability starts at 1 and goes up by C for each match with the parents. For example, in parent 1, 3 and 2 are in the same group, so 3 gets a probability of 1+C. The same goes for 6, because it forms a match in parent 2. 1 gets a probability of 1+2C, because it is in the same group as 2 in both parents. Based on these probabilities, use a roulette-wheel-type selection. Let's say we pick 6.
3. Now we have 2 and 6 in the same group. We similarly look for matches with these numbers and build probabilities: for each parent, we add C if a number matches with only 2 or only 6, and 2C if it matches with both. Continue this until the group is done (for 3x3 this is the last selection, but for 5x6 there would be a few more).
4. Choose a new random number that has not been picked, and continue with the other groups.
One of the good things about this crossover is that it basically includes mutation already: there are built-in chances to group people who were not grouped in either parent.
Credit: I adapted the idea from the Omicron Genetic Algorithm

Algorithm Complexity (Big-O) of sudoku solver

I'm looking for the "how do you find it", because I have no idea how to approach finding the algorithmic complexity of my program.
I wrote a Sudoku solver in Java, without efficiency in mind (I wanted to try to make it work recursively, which I succeeded at!).
Some background:
My strategy employs backtracking to determine, for a given Sudoku puzzle, whether the puzzle has exactly one unique solution or not. So I basically read in a given puzzle and solve it. Once I've found one solution, I'm not necessarily done; I need to continue exploring for further solutions. At the end, one of three possible outcomes holds: the puzzle is not solvable at all, the puzzle has a unique solution, or the puzzle has multiple solutions.
My program reads the puzzle in from a file that has one line for each given digit, consisting of the row, column, and digit. By my own convention, a 7 in the upper-left square is written as 007.
Implementation:
I load the values in from the file and store them in a 2-D array.
I go down the array until I find a blank (unfilled value) and set it to 1, then check for any conflicts (whether the value I entered is valid or not).
If it is valid, I move on to the next value.
If not, I increment the value by 1 until I find a digit that works, or, if none of them work (1 through 9), I go back one step to the last value that I adjusted and increment that one (using recursion).
I am done solving when all 81 elements have been filled without conflicts.
If any solutions are found, I print them to the terminal.
Otherwise, if I try to "go back one step" on the FIRST element that I initially modified, it means that there were no solutions.
How can I work out my program's algorithmic complexity? I thought it might be linear [O(n)], but I am accessing the array multiple times, so I'm not sure :(
Any help is appreciated
O(n ^ m) where n is the number of possibilities for each square (i.e., 9 in classic Sudoku) and m is the number of spaces that are blank.
This can be seen by working backwards from only a single blank. If there is only one blank, then you have n possibilities that you must work through in the worst case. If there are two blanks, then you must work through n possibilities for the first blank and n possibilities for the second blank for each of the possibilities for the first blank. If there are three blanks, then you must work through n possibilities for the first blank. Each of those possibilities will yield a puzzle with two blanks that has n^2 possibilities.
This algorithm performs a depth-first search through the possible solutions. Each level of the graph represents the choices for a single square. The depth of the graph is the number of squares that need to be filled. With a branching factor of n and a depth of m, finding a solution in the graph has a worst-case performance of O(n ^ m).
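A stripped-down backtracker makes the bound visible; isLegal is assumed to be the usual row/column/box conflict check (this sketch stops at the first solution, while the question's solver keeps searching, but the worst-case argument is the same):

static boolean solve(int[][] g, int pos) {
    if (pos == 81) return true;                    // all cells filled: a solution
    int r = pos / 9, c = pos % 9;
    if (g[r][c] != 0) return solve(g, pos + 1);    // given digit: skip it
    for (int d = 1; d <= 9; d++) {                 // branching factor n = 9
        if (isLegal(g, r, c, d)) {
            g[r][c] = d;
            if (solve(g, pos + 1)) return true;    // one level deeper per blank (depth m)
            g[r][c] = 0;                           // undo and try the next digit
        }
    }
    return false;                                  // no digit fits: backtrack
}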
In many Sudokus, there will be a few numbers that can be placed directly with a bit of thought. By placing a number in the first empty cell, you give up on a lot of opportunities to reduce the possibilities. If the first ten empty cells have lots of possibilities, you get exponential growth. I'd ask the questions:
Where in the first line can the number 1 go?
Where in the first line can the number 2 go?
...
Where in the last line can the number 9 go?
Same but with nine columns?
Same but with the nine boxes?
Which number can go into the first cell?
Which number can go into the 81st cell?
That's 324 questions. If any question has exactly one answer, you pick that answer. If any question has no answer at all, you backtrack. If every question has two or more answers, you pick a question with the minimal number of answers.
You may get exponential growth, but only for problems that are really hard.
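A sketch of just the per-cell part of this idea (81 of the 324 questions); isLegal is again assumed to be the solver's existing conflict check:

static int[] mostConstrainedCell(int[][] grid) {
    int bestRow = -1, bestCol = -1, bestCount = 10;
    for (int r = 0; r < 9; r++)
        for (int c = 0; c < 9; c++) {
            if (grid[r][c] != 0) continue;             // already filled
            int count = 0;
            for (int d = 1; d <= 9; d++)
                if (isLegal(grid, r, c, d)) count++;   // candidate digits for this cell
            if (count < bestCount) {                   // fewest answers: ask this question first
                bestCount = count; bestRow = r; bestCol = c;
            }
        }
    return new int[] { bestRow, bestCol, bestCount };  // count 0 means backtrack now
}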

Markov Chain Text Generation

We were just assigned a new project in my data structures class -- generating text with Markov chains.
Overview
Given an input text file, we create an initial seed of length n characters. We add that to our output string and choose our next character based on frequency analysis.
This is the cat and there are two dogs.
Initial seed: "Th"
Possible next letters -- i, e, e
Therefore, the probability of choosing i is 1/3, and of choosing e, 2/3.
Now, say we choose i. We add "i" to the output string. Then our seed becomes
hi and the process continues.
My solution
I have 3 classes: Node, ConcreteTrie, and Driver.
Of course, the ConcreteTrie class isn't a trie in the traditional sense. Here is how it works:
Given the sentence with k=2:
This is the cat and there are two dogs.
I generate nodes Th, hi, is, ..., gs, s.
Each of these nodes has children that are the letters that follow it. For example, node Th would have children i and e. I maintain counts in each of those nodes so that I can later generate the probabilities for choosing the next letter.
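Stripped of my Node/ConcreteTrie plumbing, the counting step is roughly this map-based sketch (a simplification, not my actual classes):

import java.util.HashMap;
import java.util.Map;

static Map<String, Map<Character, Integer>> buildCounts(String text, int k) {
    Map<String, Map<Character, Integer>> counts = new HashMap<>();
    for (int i = 0; i + k < text.length(); i++) {
        String seed = text.substring(i, i + k);        // e.g. "Th" for k = 2
        char next = text.charAt(i + k);                // the letter that follows the seed
        counts.computeIfAbsent(seed, s -> new HashMap<>())
              .merge(next, 1, Integer::sum);           // bump the follower's count
    }
    return counts;
}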
My question:
First of all, what is the most efficient way to complete this project? My solution seems to be very fast, but I really want to knock my professor's socks off. (On my last project, a variation of the edit distance problem, I did A*, a genetic algorithm, BFS, and simulated annealing -- and I know that the problem is NP-hard.)
Second, what's the point of this assignment? It doesn't really seem to relate to much of what we've covered in class. What are we supposed to learn?
On the relevance of this assignment with what you covered in class (Your second question). The idea of a 'data structures' class is to expose students to the very many structures frequently encountered in CS: lists, stacks, queues, hashes, trees of various types, graphs at large, matrices of various creed and greed, etc. and to provide some insight into their common implementations, their strengths and weaknesses and generally their various fields of application.
Since most any game / puzzle / problem can be mapped to some set of these structures, there is no lack of subjects upon which to base lectures and assignments. Your class seems interesting because while keeping some focus on these structures, you are also given a chance to discover real applications.
For example, in a thinly disguised fashion, the "cat and two dogs" thing is an introduction to statistical models applied to linguistics. Your curiosity and motivation prompted you to make the connection with Markov models, and that's a good thing, because chances are you'll meet "Markov" a few more times before graduation ;-) and certainly in a professional life in CS or a related domain. So, yes! It may seem that you're butterflying around many applications, but so long as you get a feel for which structures and algorithms to select in particular situations, you're not wasting your time!
Now, a few hints on possible approaches to the assignment
The trie seems like a natural support for this type of problem. Maybe you should ask yourself, however, how this approach would scale if you had to index, say, a whole book rather than this short sentence. It seems mostly linear, although this depends on the cost of each choice along the three hops in the trie (for this 2nd-order Markov chain): as the number of choices increases, picking a path may become less efficient.
A possible alternative storage for building the index is a stochastic matrix (actually a 'plain', if sparse, matrix during the statistics-gathering process, turned stochastic at the end when you normalize each row -- or column, depending on how you set it up -- to sum to one (100%)). Such a matrix would be roughly 729 x 28 and would allow indexing, in one single operation, a two-letter tuple and its associated following letter. (I got 28 by including the "start" and "stop" signals, details...)
The cost of this more efficient indexing is the use of extra space. Space-wise, the trie is very efficient, storing only the combinations of letter triplets actually in existence; the matrix, however, wastes some space (you can bet that in the end it will be very sparsely populated, even after indexing much more text than the "dog/cat" sentence).
This size vs. CPU compromise is very common, although some algorithms/structures are sometimes better than others on both counts... Furthermore, the matrix approach wouldn't scale nicely, size-wise, if the problem were changed to base the choice of letters on the preceding, say, three characters.
Nonetheless, maybe look into the matrix as an alternate implementation. It is very much in the spirit of this class to try various structures and see why/where they are better than others (in the context of a specific task).
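For instance, here is a rough sketch of the matrix version, assuming lower-cased text and a single '.' start/stop marker (27 symbols, so 27*27 = 729 rows by 27 columns -- one fewer column than the 28 mentioned above):

static int sym(char c) { return c == '.' ? 26 : c - 'a'; }            // '.' is the assumed marker

static double[][] buildMatrix(String text) {
    double[][] m = new double[27 * 27][27];
    for (int i = 0; i + 2 < text.length(); i++) {
        int row = sym(text.charAt(i)) * 27 + sym(text.charAt(i + 1)); // two-letter tuple -> row
        m[row][sym(text.charAt(i + 2))]++;                            // count the following letter
    }
    for (double[] row : m) {                                          // normalize: make it stochastic
        double sum = 0;
        for (double v : row) sum += v;
        if (sum > 0) for (int j = 0; j < row.length; j++) row[j] /= sum;
    }
    return m;
}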
A small side trip you can take is to create a tag cloud based on the probabilities of the letter pairs (or triplets): both the trie and the matrix contain all the data necessary for that; the matrix, with all its interesting properties, may be more suited to this.
Have fun!
You are using a bigram approach with characters, but it is usually applied to words, because the output will be more meaningful with just a simple generator, as in your case.
1) From my point of view, you are doing it right. But maybe you should try to slightly randomize the selection of the next node? E.g., select a random node from the 5 highest. If you always select the node with the highest probability, your output string will be too uniform.
2) I've done exactly the same homework at my university. I think the point is to show the students that Markov chains are powerful, but that without extensive study of the application domain, the generator's output will be ridiculous.
