Related
I am trying to understand the speed of the Binary Search algorithm.
I understand it needs to operate on a sorted array.
However if the array comes in unsorted and performing the sorting. Wouldn't the sorting be part of the Binary Search and thus its performance would be slower?
I am confused because I think that there is very little chance to use this algorithm if the data does not come in sorted.
And if my code needs to sort it then why isn't it counting towards the search algorithm.
Sorry if I am confusing,
Thank you for helping.
You can't just point at an algorithm and say: It's got O(n^2) complexity!
That's what people usually say, sure. But that's shorthand. They're omitting things; assuming that the listener / reader will make assumptions.
You need to fully describe the exact algorithm, the conditions under which it is applied, and the precise definition of n and any other variable.
Then, you can answer that question. The problem you're having here is that the definition of 'what is the performance of binary search' is unclear. If you assume it means X whilst your buddy assumes it means Y, and you then argue about the answers, you're not actually having a constructive debate at all. You're just tilting at windmills; the real problem is that neither of you figured out the problem is communicating the basics.
Given that there is some confusion here, I'll give you 3 different more or less equally sensible more fleshed out definitions, along with the actual answer for each such definition. Hint, for one of them, 'binary search' isn't the fastest algorithm!
Given [1] a list that is already sorted, and [2] a single value, write me an algorithm that determines if this value is in the list or not.
The best answer would be: A binary sort algorithm, and its complexity would be O(log n).
Given [1] a list that is not sorted, and [2] a single value, write me an algorithm that determines if this value is in the list or not.
The best answer would be: Just iterate through the list. Its complexity would be O(n), and binary sort is not part of this answer at all.
given [1] a list that is not sorted, and [2] a list of tests, whereby each individual test is defined by a single value, but they all use the same input unsorted list, write an algorithm that will, for each test, determine if the value for that test is in the list or not, and then give me the amortized complexity (basically, the complexity of the whole thing, divided by the # of tests we ran).
Then the best answer would be: First sort the list, spending O(n log n) time to do so, but we get to amortize that over the test case count, and then use binary search for each individual test, adding an O(log n) complexity to each test. If we term n the size of the input list and t the number of tests we have, this gets us:
O( (n log n)/t + O(log n) )
Which is the actual answer to the question, complex as it may look. But, if t is large or even considered effectively infinite in size, OR we add one more rider to the question:
The list from [1] is given to you in advance and, within reasonable time and memory limits, you may preprocess this data without needing to amortize these costs across your test cases
then that boils down to just O(log n), as the large value for t makes that (n log n) / t factor approach zero.
In communicating this to your buddy, given that we don't talk in entire scientific papers, one might then say: "The algorithmic complexity of the binary sort algorithm is O(log n)", even if that omits a gigantic chunk of the full story.
You interpret the question as per the second case (input is unsorted, the input comprises both the list and the value to search for, no multi-test clause). Someone who says 'binary search is O(log n)' is labouring under either the first or third. You're both right.
NB: The third definition seems unusually complicated. However, it matches common scenarios. For example, 'we have compiled a list of folks living in town and their phone numbers, and we want to print them in a giant book with the aim of letting recipients of this book look up phone numbers. We expect over the lifetime of a single print run that the 100,000 citizens of the township will eaech do on average about 50 lookups, for a grand total of 5 million lookups for this single list. That gives you t= 5 million, n = 200,000 (let's say 200k people live here, half of which get a phonebook). Plug those numbers in and sorting the phonebook wins by a landslide vs. releasing the phonebook in arbitrary, unsorted order. Even if, yes, you start 'down' the effort of sorting it and won't make up for that loss until a few folks have speedily looked up a few phone numbers to make up for your efforts in sorting it before printing the book.
Yes. If
the data comes in unsorted
you only need to search for one element
...then you would have to first sort the data to use binary search, which would take a total of O(n log n + log n) = O(n log n) time.
But once the data is sorted, you can then binary search on that data as many times as you want. You don't have to sort it again each time.
I wonder whether there is any automatic way of determining (at least roughly) the Big-O time complexity of a given function?
If I graphed an O(n) function vs. an O(n lg n) function I think I would be able to visually ascertain which is which; I'm thinking there must be some heuristic solution which enables this to be done automatically.
Any ideas?
Edit: I am happy to find a semi-automated solution, just wondering whether there is some way of avoiding doing a fully manual analysis.
It sounds like what you are asking for is an extention of the Halting Problem. I do not believe that such a thing is possible, even in theory.
Just answering the question "Will this line of code ever run?" would be very difficult if not impossible to do in the general case.
Edited to add:
Although the general case is intractable, see here for a partial solution: http://research.microsoft.com/apps/pubs/default.aspx?id=104919
Also, some have stated that doing the analysis by hand is the only option, but I don't believe that is really the correct way of looking at it. An intractable problem is still intractable even when a human being is added to the system/machine. Upon further reflection, I suppose that a 99% solution may be doable, and might even work as well as or better than a human.
You can run the algorithm over various size data sets, and you could then use curve fitting to come up with an approximation. (Just looking at the curve you create probably will be enough in most cases, but any statistical package has curve fitting).
Note that some algorithms exhibit one shape with small data sets, but another with large... and the definition of large remains a bit nebulous. This means that an algorithm with a good performance curve could have so much real world overhead that (for small data sets) it doesn't work as well as the theoretically better algorithm.
As far as code inspection techniques, none exist. But instrumenting your code to run at various lengths and outputting a simple file (RunSize RunLength would be enough) should be easy. Generating proper test data could be more complex (some algorithms work better/worse with partially ordered data, so you would want to generate data that represented your normal use-case).
Because of the problems with the definition of "what is large" and the fact that performance is data dependent, I find that static analysis often is misleading. When optimizing performance and selecting between two algorithms, the real world "rubber hits the road" test is the only final arbitrator I trust.
A short answer is that it's impossible because constants matter.
For instance, I might write a function that runs in O((n^3/k) + n^2). This simplifies to O(n^3) because as n approaches infinity, the n^3 term will dominate the function, irrespective of the constant k.
However, if k is very large in the above example function, the function will appear to run in almost exactly n^2 until some crossover point, at which the n^3 term will begin to dominate. Because the constant k will be unknown to any profiling tool, it will be impossible to know just how large a dataset to test the target function with. If k can be arbitrarily large, you cannot craft test data to determine the big-oh running time.
I am surprised to see so many attempts to claim that one can "measure" complexity by a stopwatch. Several people have given the right answer, but I think that there is still room to drive the essential point home.
Algorithm complexity is not a "programming" question; it is a "computer science" question. Answering the question requires analyzing the code from the perspective of a mathematician, such that computing the Big-O complexity is practically a form of mathematical proof. It requires a very strong understanding of the fundamental computer operations, algebra, perhaps calculus (limits), and logic. No amount of "testing" can be substituted for that process.
The Halting Problem applies, so the complexity of an algorithm is fundamentally undecidable by a machine.
The limits of automated tools applies, so it might be possible to write a program to help, but it would only be able to help about as much as a calculator helps with one's physics homework, or as much as a refactoring browser helps with reorganizing a code base.
For anyone seriously considering writing such a tool, I suggest the following exercise. Pick a reasonably simple algorithm, such as your favorite sort, as your subject algorithm. Get a solid reference (book, web-based tutorial) to lead you through the process of calculating the algorithm complexity and ultimately the "Big-O". Document your steps and results as you go through the process with your subject algorithm. Perform the steps and document your progress for several scenarios, such as best-case, worst-case, and average-case. Once you are done, review your documentation and ask yourself what it would take to write a program (tool) to do it for you. Can it be done? How much would actually be automated, and how much would still be manual?
Best wishes.
I am curious as to why it is that you want to be able to do this. In my experience when someone says: "I want to ascertain the runtime complexity of this algorithm" they are not asking what they think they are asking. What you are most likely asking is what is the realistic performance of such an algorithm for likely data. Calculating the Big-O of a function is of reasonable utility, but there are so many aspects that can change the "real runtime performance" of an algorithm in real use that nothing beats instrumentation and testing.
For example, the following algorithms have the same exact Big-O (wacky pseudocode):
example a:
huge_two_dimensional_array foo
for i = 0, i < foo[i].length, i++
for j = 0; j < foo[j].length, j++
do_something_with foo[i][j]
example b:
huge_two_dimensional_array foo
for j = 0, j < foo[j].length, j++
for i = 0; i < foo[i].length, i++
do_something_with foo[i][j]
Again, exactly the same big-O... but one of them uses row ordinality and one of them uses column ordinality. It turns out that due to locality of reference and cache coherency you might have two completely different actual runtimes, especially depending on the actual size of the array foo. This doesn't even begin to touch the actual performance characteristics of how the algorithm behaves if it's part of a piece of software that has some concurrency built in.
Not to be a negative nelly but big-O is a tool with a narrow scope. It is of great use if you are deep inside algorithmic analysis or if you are trying to prove something about an algorithm, but if you are doing commercial software development the proof is in the pudding, and you are going to want to have actual performance numbers to make intelligent decisions.
Cheers!
This could work for simple algorithms, but what about O(n^2 lg n), or O(n lg^2 n)?
You could get fooled visually very easily.
And if its a really bad algorithm, maybe it wouldn't return even on n=10.
Proof that this is undecidable:
Suppose that we had some algorithm HALTS_IN_FN(Program, function) which determined whether a program halted in O(f(n)) for all n, for some function f.
Let P be the following program:
if(HALTS_IN_FN(P,f(n)))
{
while(1);
}
halt;
Since the function and the program are fixed, HALTS_IN_FN on this input is constant time. If HALTS_IN_FN returns true, the program runs forever and of course does not halt in O(f(n)) for any f(n). If HALTS_IN_FN returns false, the program halts in O(1) time.
Thus, we have a paradox, a contradiction, and so the program is undecidable.
A lot of people have commented that this is an inherently unsolvable problem in theory. Fair enough, but beyond that, even solving it for any but the most trivial cases would seem to be incredibly difficult.
Say you have a program that has a set of nested loops, each based on the number of items in an array. O(n^2). But what if the inner loop is only run in a very specific set of circumstances? Say, on average, it's run in aprox log(n) cases. Suddenly our "obviously" O(n^2) algorithm is really O(n log n). Writing a program that could determine if the inner loop would be run, and how often, is potentially more difficult than the original problem.
Remember O(N) isn't god; high constants can and will change the playing field. Quicksort algorithms are O(n log n) of course, but when the recursion gets small enough, say down to 20 items or so, many implementations of quicksort will change tactics to a separate algorithm as it's actually quicker to do a different type of sort, say insertion sort with worse O(N), but much smaller constant.
So, understand your data, make educated guesses, and test.
I think it's pretty much impossible to do this automatically. Remember that O(g(n)) is the worst-case upper bound and many functions perform better than that for a lot of data sets. You'd have to find the worst-case data set for each one in order to compare them. That's a difficult task on its own for many algorithms.
You must also take care when running such benchmarks. Some algorithms will have a behavior heavily dependent on the input type.
Take Quicksort for example. It is a worst-case O(n²), but usually O(nlogn). For two inputs of the same size.
The traveling salesman is (I think, not sure) O(n²) (EDIT: the correct value is 0(n!) for the brute force algotithm) , but most algorithms get rather good approximated solutions much faster.
This means that the the benchmarking structure has to most of the time be adapted on an ad hoc basis. Imagine writing something generic for the two examples mentioned. It would be very complex, probably unusable, and likely will be giving incorrect results anyway.
Jeffrey L Whitledge is correct. A simple reduction from the halting problem proves that this is undecidable...
ALSO, if I could write this program, I'd use it to solve P vs NP, and have $1million... B-)
I'm using a big_O library (link here) that fits the change in execution time against independent variable n to infer the order of growth class O().
The package automatically suggests the best fitting class by measuring the residual from collected data against each class growth behavior.
Check the code in this answer.
Example of output,
Measuring .columns[::-1] complexity against rapid increase in # rows
--------------------------------------------------------------------------------
Big O() fits: Cubic: time = -0.017 + 0.00067*n^3
--------------------------------------------------------------------------------
Constant: time = 0.032 (res: 0.021)
Linear: time = -0.051 + 0.024*n (res: 0.011)
Quadratic: time = -0.026 + 0.0038*n^2 (res: 0.0077)
Cubic: time = -0.017 + 0.00067*n^3 (res: 0.0052)
Polynomial: time = -6.3 * x^1.5 (res: 6)
Logarithmic: time = -0.026 + 0.053*log(n) (res: 0.015)
Linearithmic: time = -0.024 + 0.012*n*log(n) (res: 0.0094)
Exponential: time = -7 * 0.66^n (res: 3.6)
--------------------------------------------------------------------------------
I guess this isn't possible in a fully automatic way since the type and structure of the input differs a lot between functions.
Well, since you can't prove whether or not a function even halts, I think you're asking a little much.
Otherwise #Godeke has it.
I don't know what's your objective in doing this, but we had a similar problem in a course I was teaching. The students were required to implement something that works at a certain complexity.
In order not to go over their solution manually, and read their code, we used the method #Godeke suggested. The objective was to find students who used linked list instead of a balansed search tree, or students who implemented bubble sort instead of heap sort (i.e. implementations that do not work in the required complexity - but without actually reading their code).
Surprisingly, the results did not reveal students who cheated. That might be because our students are honest and want to learn (or just knew that we'll check this ;-) ). It is possible to miss cheating students if the inputs are small, or if the input itself is ordered or such. It is also possible to be wrong about students who did not cheat, but have large constant values.
But in spite of the possible errors, it is well worth it, since it saves a lot of checking time.
As others have said, this is theoretically impossible. But in practice, you can make an educated guess as to whether a function is O(n) or O(n^2), as long as you don't mind being wrong sometimes.
First time the algorithm, running it on input of various n. Plot the points on a log-log graph. Draw the best-fit line through the points. If the line fits all the points well, then the data suggests that the algorithm is O(n^k), where k is the slope of the line.
I am not a statistician. You should take all this with a grain of salt. But I have actually done this in the context of automated testing for performance regressions. The patch here contains some JS code for it.
If you have lots of homogenious computational resources, I'd time them against several samples and do linear regression, then simply take the highest term.
It's easy to get an indication (e.g. "is the function linear? sub-linear? polynomial? exponential")
It's hard to find the exact complexity.
For example, here's a Python solution: you supply the function, and a function that creates parameters of size N for it. You get back a list of (n,time) values to plot, or to perform regression analysis. It times it once for speed, to get a really good indication it would have to time it many times to minimize interference from environmental factors (e.g. with the timeit module).
import time
def measure_run_time(func, args):
start = time.time()
func(*args)
return time.time() - start
def plot_times(func, generate_args, plot_sequence):
return [
(n, measure_run_time(func, generate_args(n+1)))
for n in plot_sequence
]
And to use it to time bubble sort:
def bubble_sort(l):
for i in xrange(len(l)-1):
for j in xrange(len(l)-1-i):
if l[i+1] < l[i]:
l[i],l[i+1] = l[i+1],l[i]
import random
def gen_args_for_sort(list_length):
result = range(list_length) # list of 0..N-1
random.shuffle(result) # randomize order
# should return a tuple of arguments
return (result,)
# timing for N = 1000, 2000, ..., 5000
times = plot_times(bubble_sort, gen_args_for_sort, xrange(1000,6000,1000))
import pprint
pprint.pprint(times)
This printed on my machine:
[(1000, 0.078000068664550781),
(2000, 0.34400010108947754),
(3000, 0.7649998664855957),
(4000, 1.3440001010894775),
(5000, 2.1410000324249268)]
I am trying to solve the following problem:
I have a list of 30 people.
These people need to be divided into groups of 6.
Each person has given the names of 3 other people who they would like to be in a group with.
I thought of solving this problem using a genetic algorithm.
The fitness function could evaluate all the groups, and assign a fitness score based on how many people per room have all their preferences met. (or a scoring system similar to that)
Example:
One of the generated solutions is: 1,3,19,5,22,2,7,8,11,12,13,14,15,13,17....etc
I would assume the first 5 people are in the first group, and the next 5 in the the next group and calculate a fitness value from that.
I think that this solution would work - does anyone see a better way of doing this?
My main question is this:
If I want to make sure person A and B are definitely in the same group, I could implement the fitness function to check for this and assign a terrible fitness if this condition isn't met. Is this the best way to do it? It seems quite inefficient.
Is there a way to 'lock' certain parts of the solution ("certain genes") and just solve or the remainder?
Any help or insights will be appreciated.
Thanks in advance.
AK
Just to clarify a bit, your problem isn't about genetic programming but genetic algorithms, which are two different things. Genetic programming is about generating (using evolutionary algorithms) executable individuals that will generate your solutions while genetic algorithms individuals represent directly your solutions.
That being said, your two assumptions are corrects. Data representation is a key element of evolutionary algorithms in general and a bad representation may hinder efficient solution space exploration. Your current data representation seems correct to me, given groups are only allowed to have exactly 5 individuals. Your second thought about the way to enforce some criteria is also right. Putting a large fitness value (preferably one that can't represent a potentially valid even if bad solution) such as infinity (if your library / language allows it easily) is the preferred way to express invalid solutions in literature. This has multiple advantages over simply deleting invalid individuals: During the selection stage, bad individuals won't be selected and thus the solution space they represent won't be explored as much as interesting ones, which is computationally good because it surely won't contain optimal solutions. Knowing a solution is bad is good knowledge, after all. At the same time, genetic diversity is really important in evolutionary algorithms in order to avoid stagnation. At least some bad individual should be kept for the sake of genetic diversity in order to explore solution spaces between currently represented zones.
The goal of genetic algorithms is to compute solutions that are either impossible or too hard to compute analytically or by brute-force. Trying to dynamically lock down some genes with heuristics would require much knowledge about the inner working of your problems as well as the underlying evolution mechanisms and would be defeating the purpose of using evolutionary algorithms. The effective goal of evolutionary algorithms is to lock down genes that seems correct.
In fact, if you are a priori absolutely positively certain that some given genes must have a given value, don't represent them in your individuals. For instance, make your first group 3 individuals long if you are sure that the 2 others must be of some given value. You can then code your evaluation function as if there was 5 individuals in the first group but won't be evolving / searching to replace the 2 fixed ones.
What does your crossover operation look like? The way you have it laid out in your description, I'm not sure how you implement it cleanly. For instance if you have two solutions:
1, 2, 3, 4, 5, ....., 30
and
1, 2, 30, 29,......,10
Assuming you're using single point crossover function, you would have the potential to get multiple assignments for the same people and other people not being assigned at all using the genomes above.
I would have a genome with 30 values, where each value defines a person's group assignment (1-6). It would look like 656324113255632....etc. So person 1 is assigned group 6, person 2 group 5, etc. This would make the crossover operation easier to implement, because you don't have to ensure that after crossover each new solution is a valid assignment regardless of whether it's optimal.
The fitness function would assign a penalty for each group not having the proper number of members (5), and additional penalties for group member assignments that are suboptimal. I would make the first penalty significantly larger than the second, and then adjust these to get the results you're looking for.
This can be modeled as a generalized quadratic assignment problem (GQAP). This problem allows to specify a number of equipment (people) that demand a certain capacity, a number of locations (groups) that offer a capacity and the weights matrix that specifies the closeness between equipment and the distance matrix specifying the distance between locations. Additionally, there are install costs, but these are not required for your problem. I have implemented this problem in HeuristicLab. It's not part of the trunk, but I can send you the plugin if you're interested (or you compile it yourself).
It seems that the most challenging part of using a genetic algorithm for this problem is implementing the crossover. Here's how I would do it:
First choose a constant, C. C will stay constant throughout all generations, and I will explain it's purpose in a moment.
I will use a smaller example than 5 groups of 6 to demonstrate this crossover, but the premise is the same. Say we have 2 parents, each consisting of 3 groups of 3. Let's make one [[1,2,3],[4,5,6],[7,8,9]], and the other [[9,4,3],[5,7,8],[6,1,2]].
Make a list of possible numbers (1 through total number of people), in this case it is simply [1,2,3,4,5,6,7,8,9]. Remove 1 random number from the list. Let's say we remove 2. The list becomes [1,3,4,5,6,7,8,9]
We assign each remaining number a probability. The probability starts at 1, and goes up by C for any matches with the parents. For example, in parent 1, 3 and 2 are in the same group so 3 would have a probability of 1+C. Same thing with 6 because it forms a match in parent 2. 1 would have a probability of 1+2C, because it is in the same group as 2 in both parents. Based on these probabilities, use a roulette wheel type selection. Let's say we pick 6.
Now, we have 2 and 6 in the same group. We similarly look for matches with these numbers and make probabilities. For each parent, we add C if it matches with only 2 or only 6, and 2C if it matches with both. Continue this until the group is done (for 3x3 this is the last selection, but for 5x6 there would be a few more)
4.Choose a new random number that has not been picked and continue for other groups
One of the good things about this crossover, is that it basically includes mutations already. There are chances built in to group people that were not grouped in their parents
Credit: I adapted the idea from the Omicron Genetic Algorithm
Article at http://leepoint.net/notes-java/algorithms/big-oh/bigoh.html says that Big O notation For accessing middle element in linked list is O(N) .should not it be O(N/2) .Assume we have 100 elements in linked list So to access the 50th element it has to traverse right from first node to 50th. So number of operation will be around 50.
One more big notation defined for binary search is log(N). Assume we have 1000 elements. As per log(N) we will be needing the opeartion close to 3. Now lets calculate it manually below will be the pattern
500 th element, 250th, 125,63,32,16,8,4,2 so operation are around 9 which is much larger than 3.
Is there any thing i am missing here?
What you're is missing that any constant multiples don't matter for Big O. So we have O(N) = O(N/2).
About the log part of the question, it is actually log2(N) not log10(N), so in this case log(1000) is actually 9 (when rounding down). Also, as before, O(log2(N)) = O(log10(N)), since the two are just constant multiples of each other. Specifically, log2(N) = log10(N) / log10(2)
The last thing to consider is that, if several functions are added together, the function of lower degree doesn't matter with Big O. That's because higher degree functions grow more quickly than functions of lower degree. So we find things like O(N3 + N) = O(N3), and O(eN + N2 + 1) = O(eN).
So that's two things to consider: drop multiples, and drop function of low degree.
O(n) means "proportional to n, for all large n".1
So it doesn't have to be equal to n.
Edit:
My explanation above was a little unclear as to whether something in O(n) is also in O(n2). It is -- my use of the word "proportional" was poor, and as the others were trying to mention (and as I was misunderstanding originally), "converge" would be a better term.2
1: It actually means "proportional to n when n is large in the worst-case scenario", but books usually consider the "average" case, which is more accurately represented by f(n) ~ n, where f is your function.
2: For the more technical people: it should be obvious, but I am not intending to be mathematically rigorous. Math.SE might be a better choice for asking this question, if you're looking for formal proofs (with ratios, limits, and whatnot).
From the web page you linked:
Similarly, constant multipliers are ignored. So a O(4*N) algorithm is
equivalent to O(N), which is how it should be written. Ultimately you
want to pay attention to these multipliers in determining the
performance, but for the first round of analysis using Big-Oh, you
simply ignore constant factors.
The Big O notation is a general expression of an algorithm's efficiency. You should not construe it as an exact equation for how fast an algorithm is going to be, or how much memory it is going to require, but rather view it as an approximation. The constant multipliers of the function are ignored, so for example you could view your example as O 1/2(N), or O k(N), drop the k which gives you O(N).
We were just assigned a new project in my data structures class -- Generating text with markov chains.
Overview
Given an input text file, we create an initial seed of length n characters. We add that to our output string and choose our next character based on frequency analysis..
This is the cat and there are two dogs.
Initial seed: "Th"
Possible next letters -- i, e, e
Therefore, probability of choosing i is 1/3, e is 2/3.
Now, say we choose i. We add "i" to the output string. Then our seed becomes
hi and the process continues.
My solution
I have 3 classes, Node, ConcreteTrie, and Driver
Of course, the ConcreteTrie class isn't a Trie of the traditional sense. Here is how it works:
Given the sentence with k=2:
This is the cat and there are two dogs.
I generate Nodes Th, hi, is, ... + ... , gs, s.
Each of these nodes have children that are the letter that follows them. For example, Node Th would have children i and e. I maintain counts in each of those nodes so that I can later generate the probabilities for choosing the next letter.
My question:
First of all, what is the most efficient way to complete this project? My solution seems to be very fast, but I really want to knock my professor's socks off. (On my last project A variation of the Edit distance problem, I did an A*, a genetic algorithm, a BFS, and Simulated Annealing -- and I know that the problem is NP-Hard)
Second, what's the point of this assignment? It doesn't really seem to relate to much of what we've covered in class. What are we supposed to learn?
On the relevance of this assignment with what you covered in class (Your second question). The idea of a 'data structures' class is to expose students to the very many structures frequently encountered in CS: lists, stacks, queues, hashes, trees of various types, graphs at large, matrices of various creed and greed, etc. and to provide some insight into their common implementations, their strengths and weaknesses and generally their various fields of application.
Since most any game / puzzle / problem can be mapped to some set of these structures, there is no lack of subjects upon which to base lectures and assignments. Your class seems interesting because while keeping some focus on these structures, you are also given a chance to discover real applications.
For example in a thinly disguised fashion the "cat and two dogs" thing is an introduction to statistical models applied to linguistics. Your curiosity and motivation prompted you to make the relation with markov models and it's a good thing, because chances are you'll meet "Markov" a few more times before graduation ;-) and certainly in a professional life in CS or related domain. So, yes! it may seem that you're butterflying around many applications etc. but so long as you get a feel for what structures and algorithms to select in particular situations, you're not wasting your time!
Now, a few hints on possible approaches to the assignment
The trie seems like a natural support for this type of problem. Maybe you can ask yourself however how this approach would scale, if you had to index say a whole book rather than this short sentence. It seems mostly linearly, although this depends on how each choice on the three hops in the trie (for this 2nd order Markov chain) : as the number of choices increase, picking a path may become less efficient.
A possible alternative storage for the building of the index is a stochatisc matrix (actually a 'plain' if only sparse matrix, during the statistics gathering process, turned stochastic at the end when you normalize each row -or column- depending on you set it up) to sum-up to one (100%). Such a matrix would be roughly 729 x 28, and would allow the indexing, in one single operation, of a two-letter tuple and its associated following letter. (I got 28 for including the "start" and "stop" signals, details...)
The cost of this more efficient indexing is the use of extra space. Space-wise the trie is very efficient, only storing the combinations of letter triplets effectively in existence, the matrix however wastes some space (you bet in the end it will be very sparsely populated, even after indexing much more text that the "dog/cat" sentence.)
This size vs. CPU compromise is very common, although some algorithms/structures are somtimes better than others on both counts... Furthermore the matrix approach wouldn't scale nicely, size-wize, if the problem was changed to base the choice of letters from the preceding say, three characters.
None the less, maybe look into the matrix as an alternate implementation. It is very much in spirit of this class to try various structures and see why/where they are better than others (in the context of a specific task).
A small side trip you can take is to create a tag cloud based on the probabilities of the letters pairs (or triplets): both the trie and the matrix contain all the data necessary for that; the matrix with all its interesting properties, may be more suited for this.
Have fun!
You using bigram approach with characters, but usually it applied to words, because the output will be more meaningful if we use just simple generator as in your case).
1) From my point of view you doing all right. But may be you should try slightly randomize selection of the next node? E.g. select random node from 5 highest. I mean if you always select node with highest probability your output string will be too uniform.
2) I've done exactly the same homework at my university. I think the point is to show to the students that Markov chains are powerful but without extensive study of application domain output of generator will be ridiculous