Implementing a graph in either C++ or Java [closed]

Closed 11 years ago.
What are the best ways of implementing a graph in either C++ or Java? In C++, I was thinking about using a 2D array to do it. In Java, I was considering an ArrayList.

Firstly, language choice isn't the most massive issue in the world, in my opinion. Unless you are required to use a specific language or have portability constraints, either C++ or Java will be sufficient. Having said that, your question seems somewhat homework-ish. Are you using Prim's algorithm for your MST implementation?
As other answers have already said, there are a few ways to represent graphs. The two that I'm most familiar with for representing graphs are:
An Adjacency Matrix, which is a 2D array where each "row" and "column" is a node, and a value at that position of the matrix indicates an edge (or an edge weight) and a null-value (or a 0-value, or some other sentinel/meaningful value) indicates no edge
An Adjacency List, which is a 2D array (kinda), where the i-th list is the list of all the nodes that are connected to (adjacent to) node i. At times, you may also choose to make the list a list of pairs of node names and edge weights, if your graph is weighted.
In the adjacency list article on Wikipedia (linked above) there is a section on tradeoffs between the two representations.
On the subject of the MST algorithm:
You will probably get better performance using an adjacency list, off the top of my head, but that's only theoretically (I think?). Implementation-wise, there are things such as locality of reference to take into account. I would personally prefer, for ease of coding, however, to use an adjacency matrix (I just personally find them easier to work with, especially on weighted graphs), unless there's a need for really good performance.
Adjacency Matrix (C++):
vector<vector<int> > adj_Mat(n, vector<int>(n, 0));
where n is the number of nodes in the graph. Then adj_Mat[i][j] is the weight of the edge between nodes i and j.
Adjacency List (C++):
vector<vector<pair<int, int> > > adj_list(n);
Then, if the i-th node has an edge of weight k to node j, I'd do something like this (assuming the graph is directed):
adj_list[i].push_back(pair<int, int>(j, k));
Now, my C++ is really hacky because I tend to use it for hacking random code in coding competitions, so this isn't really good code, but these are really basic ways to code up these data representations.
Sorry about the massive rant, and I hope it does help.

There are 2 conventional ways to represent a graph.
Adjacency matrix: if you have n nodes, then you have an n-by-n matrix where matrix[i, j] indicates whether there is an edge between nodes i and j (usually a boolean, or an int if you want weights).
Adjacency list: have a collection for each node that holds the nodes it has edges to.
For the first approach a 2D array is the best; for the second I would go with a HashSet (or a HashMap if you want weights).
When choosing one over the other you need to consider the graph size, the number of edges vs number of nodes and what algorithm you are going to run on it.
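As a rough illustration of the two options, here is a minimal Java sketch for a small weighted, directed graph. The class and method names (GraphRepresentations, newMatrix, newAdjacency, addEdge) are made up for this example, and using 0 to mean "no edge" in the matrix is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

class GraphRepresentations {
    // Adjacency matrix: matrix[i][j] holds the weight of edge i -> j.
    // Assumption: 0 is used as the "no edge" sentinel.
    static int[][] newMatrix(int n) {
        return new int[n][n];
    }

    // Adjacency list backed by hash maps: node -> (neighbor -> weight).
    static Map<Integer, Map<Integer, Integer>> newAdjacency() {
        return new HashMap<>();
    }

    // Adds a directed, weighted edge; the inner map is created on first use.
    static void addEdge(Map<Integer, Map<Integer, Integer>> adj, int from, int to, int weight) {
        adj.computeIfAbsent(from, k -> new HashMap<>()).put(to, weight);
    }

    public static void main(String[] args) {
        int[][] matrix = newMatrix(4);
        matrix[0][2] = 7;                        // edge 0 -> 2 with weight 7

        Map<Integer, Map<Integer, Integer>> adj = newAdjacency();
        addEdge(adj, 0, 2, 7);                   // the same edge in the list form

        System.out.println(matrix[0][2]);        // lookup by array index
        System.out.println(adj.get(0).get(2));   // lookup by hashing
    }
}
```

The matrix always allocates n * n cells, while the map only stores the edges that exist, which is the size trade-off mentioned above.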

Check out the Boost Graph Library. There is a learning phase, but once you get it you will never look back.

The most efficient way to implement graphs G=(V,E) depends on the properties of these graphs and what kind of operations you tend to perform on them. Generally speaking, for sparse graphs, i.e. graphs where |E| << |V|^2, one usually chooses adjacency lists (slower edge lookups, but memory-efficient). For graphs with |E| =~ |V|^2, adjacency matrices are usually better (constant-time edge lookups, but memory-intensive).
Depending on what exactly you intend to do with your graphs, other approaches might be more performant.

C++ is usually faster than Java. And if you want to create a good graph, you should make your own type with the fields you need (for example, the weights of the graph's nodes). You should also create a Node class, which contains its attributes and a list of the other nodes to which it's connected.
In Graph you should have a list of its nodes, its attributes, and functions (for example bool isFlat()).
I think it's a better way to implement a graph than a 2D array or an ArrayList.
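A minimal Java sketch of that object-based design might look like the following. The fields (name, weight) and the methods (addNode, connect, nodeCount, edgeCount) are illustrative assumptions, and the graph is assumed to be undirected.

```java
import java.util.ArrayList;
import java.util.List;

class Node {
    final String name;      // example attribute (assumed)
    final double weight;    // example attribute: a per-node weight
    final List<Node> neighbors = new ArrayList<>();

    Node(String name, double weight) {
        this.name = name;
        this.weight = weight;
    }
}

class Graph {
    private final List<Node> nodes = new ArrayList<>();

    // Creates a node, registers it with the graph, and returns it.
    Node addNode(String name, double weight) {
        Node node = new Node(name, weight);
        nodes.add(node);
        return node;
    }

    // Adds an undirected edge by linking both nodes to each other.
    void connect(Node a, Node b) {
        a.neighbors.add(b);
        b.neighbors.add(a);
    }

    int nodeCount() {
        return nodes.size();
    }

    int edgeCount() {
        int degreeSum = 0;
        for (Node n : nodes) {
            degreeSum += n.neighbors.size();
        }
        return degreeSum / 2;   // each undirected edge is counted twice
    }
}
```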

If you need a fast algorithm, use an array-based approach in C++.
But if you are writing an algorithm that should be read by other people, use an OOP-based approach in Java.

Related

Implementing Dijkstra's algorithm in Java

I've done a fair bit of reading around this, and know that discussions regarding this algorithm in Java have been semi-frequent. My issue with implementing Dijkstra's algorithm in Java is simply that I'm not sure how to prepare my data.
I have a set of coordinates within an array, and a set of 1s and 0s in a matrix that represent whether there is a path between the points that the coordinates represent. My question is, how do I present this information so that I can search for the best path with Dijkstra? I have seen many people create a "Node" class, but they never seem to store the coordinates within that Node. Is there some standardized way of creating this kind of structure (I suppose it's a graph?) that I am simply missing?
Any help would be appreciated.
There are two main options:
1. You can use an adjacency matrix in which rows and columns represent your nodes. The value matrix[x, y] must be the weight (e.g. distance or cost) to travel from x to y. You could use the Euclidean distance to calculate these values from your coordinate array;
2. You can implement a couple of classes (Node, Edge - or just Node with an internal Map from neighboring node to edge weight) - it is a graph indeed.
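Option 1 can be sketched in Java roughly as follows, assuming the coordinates are given as {x, y} pairs and the 1/0 matrix is an int[][]. The class and method names are hypothetical, and the simple O(n^2) array-based form of Dijkstra's algorithm is used here instead of a priority queue to keep the sketch short.

```java
import java.util.Arrays;

class CoordinateDijkstra {
    // Builds a weight matrix: Euclidean distance where connected[i][j] == 1,
    // infinity where there is no path between the two points.
    static double[][] buildWeights(double[][] coords, int[][] connected) {
        int n = coords.length;
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) {
                    w[i][j] = 0.0;
                } else if (connected[i][j] == 1) {
                    w[i][j] = Math.hypot(coords[i][0] - coords[j][0],
                                         coords[i][1] - coords[j][1]);
                } else {
                    w[i][j] = Double.POSITIVE_INFINITY;
                }
            }
        }
        return w;
    }

    // Array-based Dijkstra: shortest distance from source to every node.
    static double[] shortestDistances(double[][] w, int source) {
        int n = w.length;
        double[] dist = new double[n];
        boolean[] done = new boolean[n];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[source] = 0.0;
        for (int step = 0; step < n; step++) {
            // Pick the closest node that has not been finalized yet.
            int u = -1;
            for (int v = 0; v < n; v++) {
                if (!done[v] && (u == -1 || dist[v] < dist[u])) {
                    u = v;
                }
            }
            if (dist[u] == Double.POSITIVE_INFINITY) {
                break;   // all remaining nodes are unreachable
            }
            done[u] = true;
            // Relax every edge leaving u.
            for (int v = 0; v < n; v++) {
                if (dist[u] + w[u][v] < dist[v]) {
                    dist[v] = dist[u] + w[u][v];
                }
            }
        }
        return dist;
    }
}
```

Because the coordinates stay in their own array, node i in the matrix is simply the point at coords[i]; no separate Node class is needed for this variant.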

When representing a sparse graph with an adjacency list, why use linked list as structure to contain edges?

When representing graphs in memory in a language like Java, either an adjacency matrix is used (for dense graphs) or an adjacency list for sparse graphs.
So say we represent the latter like
Map<Integer, LinkedList<Integer>> graph;
The integer key represents the vertex and LinkedList contains all the other vertexes it points to.
Why use a LinkedList to represent the edges? Couldn't an int[] or ArrayList work just as well, or is there a reason why you want to represent the edges in a way that maintains the ordering, such as
2 -> 4 -> 1 -> 5
Either an int[] or ArrayList could also work.
I wouldn't recommend an int[] right off the bat though, as you'll need to cater for resizing in case you don't know all the sizes from the start, essentially simulating the ArrayList functionality, but it might make sense if memory is an issue.
A LinkedList might be slightly preferable since you'd need to either make the array / ArrayList large enough to handle the maximum number of possible edges, or resize it as you go, whereas you don't have this problem with a LinkedList, but then again, creating the graph probably isn't the most resource-intensive task for most applications.
Bottom line - it's most likely going to make a negligible difference for most applications - just pick whichever one you feel most comfortable with (unless of course you need to do access-by-index a lot, or something which one of the two performs a lot better than the other).
Algorithms 4th Edition by Sedgewick and Wayne proposes the following desired performance characteristics for a graph implementation that is useful for most graph-processing applications:
1. Space usage proportional to V + E
2. Constant time to add an edge
3. Time proportional to the degree of v to iterate through vertices adjacent to v (constant time per adjacent vertex processed)
Using a linked list to represent the vertices adjacent to each vertex has all these characteristics. Using an array instead of a linked list would result in either (1) or (2) not being achieved.
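A minimal Java sketch of a linked-list-backed adjacency list with those three properties could look like this. It is not the book's code; the class name and methods are assumptions, and java.util.LinkedList stands in for whatever list type the book uses.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class LinkedAdjacencyGraph {
    // One linked list per vertex: total space is proportional to V + E.
    private final List<LinkedList<Integer>> adj = new ArrayList<>();
    private int edgeCount = 0;

    LinkedAdjacencyGraph(int vertexCount) {
        for (int v = 0; v < vertexCount; v++) {
            adj.add(new LinkedList<>());
        }
    }

    // Constant time: one element is linked onto each endpoint's list.
    void addEdge(int v, int w) {
        adj.get(v).addFirst(w);
        adj.get(w).addFirst(v);
        edgeCount++;
    }

    // Iterating the result costs time proportional to the degree of v.
    Iterable<Integer> adjacent(int v) {
        return adj.get(v);
    }

    int degree(int v) {
        return adj.get(v).size();
    }

    int edges() {
        return edgeCount;
    }
}
```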

Using a Map for a Graph

Hi I'm going about implementing a Graph data structure for a course (the graph isn't part of the requirement, I've chosen to use it to approach the problem) and my first thought was to implement it using an adjacency list, because that requires less memory, and I don't expect to have that many edges in my graph.
But then it occurred to me. I can implement an adjacency list Graph data structure using a Map (HashMap to be specific). Instead of the list of vertices I'll have a Map of vertices, which then hold a short list of edges to vertices.
This seems to be the way to go for me. But I was wondering if anyone can see any drawbacks that a student such as I might have missed in using a HashMap for this? (unfortunately I recall being very tired whilst we were going over HashMaps...so my knowledge of them is less than all the other data structures I know of.) So I want to be sure.
By the way I'm using Java.
The two primary ways of representing a graph are:
with the adjacency list (as you mentioned)
with the adjacency matrix
Since you will not have too many edges (i.e. the adjacency matrix representing your graph would be sparse), I think your decision to use the list instead of the matrix is a good one since, as you said, it will indeed take up less space since no space is wasted to represent the absent edges. Furthermore, the Map approach seems to be logical, as you can map each Node of your graph to the list of Nodes to which it is connected. Another alternative would be to have each Node object contain, as a data field, the list of nodes to which it is connected. I think either of these approaches could work well. I've summed it up below.
First approach (maintain the map):
Map<Node, Node[]> graph = new HashMap<Node, Node[]>();
Second approach (data built into Node class):
public class Node {
    private Node[] adjacentNodes;
    public Node(Node[] nodes) { adjacentNodes = nodes; }
    public Node[] adjacentNodes() { return adjacentNodes; }
}
Graphs are traditionally represented either via an adjacency list or an adjacency matrix (there are other ways that are optimized for certain graph formats, such as if the node IDs are labeled sequentially and/or you know the number of nodes/edges ahead of time, but I won't get into that).
Picking between an adjacency list and an adjacency matrix depends on your needs. Clearly, an adjacency matrix will take up more space than an adjacency list (a matrix will always take (# of nodes)^2, whereas a list will take (# of nodes + # of edges)), but if your graph is "small" then it doesn't really make a difference.
Another concern is how many edges you have (is your graph sparse or dense)? You can find the density of your graph by taking the # of edges you have and dividing it by:
n(n-1) / 2
Where "n" is the number of nodes of the graph. The above equation finds the total # of possible edges in an "n" node UNDIRECTED graph. If the graph is directed, remove the " / 2".
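The density calculation above can be written as a small Java helper. The names (GraphDensity, maxEdges, density) are made up for this sketch, and it assumes a simple graph (no self-loops or parallel edges) with at least 2 nodes.

```java
class GraphDensity {
    // Maximum number of edges in a simple graph with n nodes:
    // n(n-1) for a directed graph, n(n-1)/2 for an undirected one.
    static long maxEdges(long n, boolean directed) {
        long orderedPairs = n * (n - 1);
        return directed ? orderedPairs : orderedPairs / 2;
    }

    // Density = actual edges / possible edges (assumes nodes >= 2).
    static double density(long nodes, long edges, boolean directed) {
        return (double) edges / maxEdges(nodes, directed);
    }
}
```

For example, an undirected graph with 5 nodes and 4 edges has density 4 / 10 = 0.4.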
Something else to think about is whether efficient edge membership is important. An adjacency matrix can detect edge membership easily (O(1)) since it's just an array lookup; for an adjacency list, if the "list" is stored as something other than a HashSet, it will be much slower since you will have to look through the entire edge list. Or maybe you keep the edge lists sorted and you can just do a binary search, but then edge insertion takes longer. Maybe your graph is very sparse, and an adjacency matrix is using too much memory, so you have to use an adjacency list. Lots of things to think about.
There are a lot more concerns that may relate to your project; I've just listed a few.
In general, assuming your graph isn't very complex or "big" (in the sense of millions of nodes), a HashMap where the key is the node ID and the value is a Set or some other collection of node IDs indicating neighbors of the key node is fine; I've done this for 400,000+ node graphs on an 8 GB machine. A HashMap based implementation will probably be easiest to implement.
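Here is a minimal sketch of that HashMap-of-Sets idea in Java, assuming integer node IDs and an undirected graph. The class name HashGraph and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class HashGraph {
    // Key: node ID. Value: IDs of the nodes adjacent to the key node.
    private final Map<Integer, Set<Integer>> neighbors = new HashMap<>();

    // Adds an undirected edge, creating the neighbor sets on first use.
    void addEdge(int a, int b) {
        neighbors.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        neighbors.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Expected O(1): one map lookup plus one set lookup.
    boolean hasEdge(int a, int b) {
        return neighbors.getOrDefault(a, Collections.emptySet()).contains(b);
    }

    // Read-only view of a node's neighbors; empty for unknown IDs.
    Set<Integer> neighborsOf(int id) {
        return Collections.unmodifiableSet(
                neighbors.getOrDefault(id, Collections.emptySet()));
    }
}
```

Storing each neighbor collection as a HashSet rather than a list is what keeps the edge-membership check cheap.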

Generating Array-based Graph With Java

I need to generate a graph using integer arrays. Edges of the graphs are kept as edges[e][2] where e is the number of edges.
I need my graph to be connected, i.e. you should be able to traverse from all nodes to all nodes.
edges[0] = {0,5} means an edge connects node 0 and node 5.
Could you suggest an algorithm, please?
And please keep in mind that I will generate graphs with millions of nodes, so it is better if the algorithm's complexity isn't too high.
If each node is directly connected to each node, don't store all the edges ;)
If each node is reachable from each other node, but not necessarily directly, use an adjacency matrix. That's the simplest way if you need to use integer arrays.
If the matrix is sparse, I would store it differently, though. The best encoding depends on what graph algorithms you want to use it for. The Wikipedia article on sparse matrices lists the major ones.
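For the edges[e][2] format in the question, one possible approach (not taken from the answer above) is to build a random spanning tree first, which guarantees connectivity, and then add extra random edges. A rough Java sketch, where the class name, the extraEdges parameter, and the seed are all assumptions, and n is assumed to be at least 2:

```java
import java.util.Random;

class ConnectedGraphGenerator {
    // Returns an edge array of shape [n - 1 + extraEdges][2] for nodes 0..n-1.
    // Runs in time proportional to the number of edges produced.
    static int[][] generate(int n, int extraEdges, long seed) {
        Random rng = new Random(seed);
        int[][] edges = new int[n - 1 + extraEdges][2];
        int e = 0;
        // Spanning tree: attach every node to a randomly chosen earlier node,
        // so each node is reachable from node 0.
        for (int i = 1; i < n; i++) {
            edges[e][0] = rng.nextInt(i);
            edges[e][1] = i;
            e++;
        }
        // Extra edges between two distinct random nodes.
        // Duplicate edges are not filtered out in this sketch.
        for (int k = 0; k < extraEdges; k++) {
            int a = rng.nextInt(n);
            int b = rng.nextInt(n - 1);
            if (b >= a) {
                b++;   // shift so that b is never equal to a
            }
            edges[e][0] = a;
            edges[e][1] = b;
            e++;
        }
        return edges;
    }
}
```

The tree built this way is not uniformly random over all spanning trees, which may or may not matter for your use case.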

Markov Chain Text Generation

We were just assigned a new project in my data structures class -- Generating text with markov chains.
Overview
Given an input text file, we create an initial seed of length n characters. We add that to our output string and choose our next character based on frequency analysis.
This is the cat and there are two dogs.
Initial seed: "Th"
Possible next letters -- i, e, e
Therefore, probability of choosing i is 1/3, e is 2/3.
Now, say we choose i. We add "i" to the output string. Then our seed becomes "hi" and the process continues.
My solution
I have 3 classes, Node, ConcreteTrie, and Driver
Of course, the ConcreteTrie class isn't a Trie of the traditional sense. Here is how it works:
Given the sentence with k=2:
This is the cat and there are two dogs.
I generate Nodes Th, hi, is, ... + ... , gs, s.
Each of these nodes has children that are the letters that follow it. For example, Node Th would have children i and e. I maintain counts in each of those nodes so that I can later generate the probabilities for choosing the next letter.
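The counting scheme described above can also be sketched with nested hash maps instead of a custom trie. This is only an illustration, not the poster's ConcreteTrie; the class and method names are assumptions. Note that the "Th" example only gives i = 1/3 and e = 2/3 if the text is lower-cased first, since "Th" and "th" are otherwise different seeds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class MarkovText {
    // Maps each k-character seed to the counts of the characters that follow it.
    static Map<String, Map<Character, Integer>> buildCounts(String text, int k) {
        Map<String, Map<Character, Integer>> counts = new HashMap<>();
        for (int i = 0; i + k < text.length(); i++) {
            String seed = text.substring(i, i + k);
            char next = text.charAt(i + k);
            counts.computeIfAbsent(seed, s -> new HashMap<>())
                  .merge(next, 1, Integer::sum);
        }
        return counts;
    }

    // Picks a follower with probability proportional to its count.
    static char sampleNext(Map<Character, Integer> followers, Random rng) {
        int total = 0;
        for (int count : followers.values()) {
            total += count;
        }
        int r = rng.nextInt(total);
        for (Map.Entry<Character, Integer> entry : followers.entrySet()) {
            r -= entry.getValue();
            if (r < 0) {
                return entry.getKey();
            }
        }
        throw new IllegalStateException("unreachable for non-empty followers");
    }

    // Starts from the first k characters and extends the output one
    // character at a time, up to the requested length.
    static String generate(String text, int k, int length, Random rng) {
        Map<String, Map<Character, Integer>> counts = buildCounts(text, k);
        StringBuilder out = new StringBuilder(text.substring(0, k));
        while (out.length() < length) {
            String seed = out.substring(out.length() - k);
            Map<Character, Integer> followers = counts.get(seed);
            if (followers == null) {
                break;   // this seed was never followed by anything in the text
            }
            out.append(sampleNext(followers, rng));
        }
        return out.toString();
    }
}
```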
My question:
First of all, what is the most efficient way to complete this project? My solution seems to be very fast, but I really want to knock my professor's socks off. (On my last project, a variation of the edit distance problem, I did an A*, a genetic algorithm, a BFS, and simulated annealing -- and I know that the problem is NP-hard.)
Second, what's the point of this assignment? It doesn't really seem to relate to much of what we've covered in class. What are we supposed to learn?
On the relevance of this assignment with what you covered in class (Your second question). The idea of a 'data structures' class is to expose students to the very many structures frequently encountered in CS: lists, stacks, queues, hashes, trees of various types, graphs at large, matrices of various creed and greed, etc. and to provide some insight into their common implementations, their strengths and weaknesses and generally their various fields of application.
Since most any game / puzzle / problem can be mapped to some set of these structures, there is no lack of subjects upon which to base lectures and assignments. Your class seems interesting because while keeping some focus on these structures, you are also given a chance to discover real applications.
For example in a thinly disguised fashion the "cat and two dogs" thing is an introduction to statistical models applied to linguistics. Your curiosity and motivation prompted you to make the relation with markov models and it's a good thing, because chances are you'll meet "Markov" a few more times before graduation ;-) and certainly in a professional life in CS or related domain. So, yes! it may seem that you're butterflying around many applications etc. but so long as you get a feel for what structures and algorithms to select in particular situations, you're not wasting your time!
Now, a few hints on possible approaches to the assignment
The trie seems like a natural support for this type of problem. Maybe you can ask yourself, however, how this approach would scale if you had to index, say, a whole book rather than this short sentence. It seems to scale mostly linearly, although this depends on how each choice is made on the three hops in the trie (for this 2nd-order Markov chain): as the number of choices increases, picking a path may become less efficient.
A possible alternative storage for building the index is a stochastic matrix (actually a 'plain', if sparse, matrix during the statistics-gathering process, turned stochastic at the end when you normalize each row, or column, depending on how you set it up, to sum to one (100%)). Such a matrix would be roughly 729 x 28, and would allow the indexing, in one single operation, of a two-letter tuple and its associated following letter. (I got 28 for including the "start" and "stop" signals, details...)
The cost of this more efficient indexing is the use of extra space. Space-wise the trie is very efficient, only storing the combinations of letter triplets effectively in existence; the matrix, however, wastes some space (you can bet that in the end it will be very sparsely populated, even after indexing much more text than the "dog/cat" sentence).
This size vs. CPU compromise is very common, although some algorithms/structures are sometimes better than others on both counts... Furthermore, the matrix approach wouldn't scale nicely, size-wise, if the problem was changed to base the choice of letters on the preceding, say, three characters.
None the less, maybe look into the matrix as an alternate implementation. It is very much in spirit of this class to try various structures and see why/where they are better than others (in the context of a specific task).
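To make the matrix idea concrete, here is a small Java sketch of the normalization step only: raw counts, with one row per two-letter prefix and one column per following symbol, are turned into a row-stochastic matrix. The class and method names are assumptions, and how prefixes and symbols are mapped to indices is left out.

```java
class TransitionMatrix {
    // Converts raw follow-up counts into probabilities so that every
    // row that has at least one observation sums to 1.
    static double[][] normalizeRows(int[][] counts) {
        double[][] probs = new double[counts.length][];
        for (int r = 0; r < counts.length; r++) {
            long rowTotal = 0;
            for (int count : counts[r]) {
                rowTotal += count;
            }
            probs[r] = new double[counts[r].length];
            if (rowTotal == 0) {
                continue;   // prefix never seen: leave the row at zero
            }
            for (int c = 0; c < counts[r].length; c++) {
                probs[r][c] = (double) counts[r][c] / rowTotal;
            }
        }
        return probs;
    }
}
```

With the "Th" example from the question, a row with counts {1, 2} for i and e becomes {1/3, 2/3}.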
A small side trip you can take is to create a tag cloud based on the probabilities of the letter pairs (or triplets): both the trie and the matrix contain all the data necessary for that; the matrix, with all its interesting properties, may be more suited for this.
Have fun!
You are using a bigram approach with characters, but usually it is applied to words, because the output will be more meaningful if we use just a simple generator as in your case.
1) From my point of view you are doing it all right. But maybe you should try to slightly randomize the selection of the next node? E.g. select a random node from the 5 highest. I mean, if you always select the node with the highest probability, your output string will be too uniform.
2) I've done exactly the same homework at my university. I think the point is to show students that Markov chains are powerful, but without extensive study of the application domain the output of the generator will be ridiculous.
