I am having trouble understanding how to read an adjacency list into a graph. I understand how adjacency lists work and how vertices map to each other, but what I don't understand is what data type to store them in. My assignment is to take an input file that gives the number of vertices of G=(V,E) and, for each vertex, the edges to the other vertices in the graph.
So for example:
3
010
101
110
so:
0 maps to 1
1 maps to 0
2 maps to 0
2 maps to 1
From there I have to implement a breadth-first search and a depth-first search on them. Would a hash table be my best bet?
The difference between BFS and DFS is which data structure you keep the pending nodes in: one uses a queue, the other a stack (that's your answer). If you use a Java List, you can take elements from either the beginning or the end, but you can also use a "real" stack and queue.
So in your case, create a List and store the origin of your search in it.
Then run a while loop: keep going while your list still has elements.
Pick an element from the list (first or last) and check whether it is your target; if it is not, add all of its neighbors to the list and keep going.
You should add something to stop adding the same element twice: keep a list of visited nodes.
But I have doubts whether you wanted to know where to store the adjacency list itself. An array of lists would do: every vertex vertex[i] has a List of all the vertices it is connected to.
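A minimal Java sketch of reading the question's input format into an array of adjacency lists and running BFS on it (class and method names are my own; swapping the queue for a stack gives DFS):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.Scanner;

public class GraphScan {
    // Parses the input format from the question: a vertex count,
    // then one row of 0/1 characters per vertex.
    static List<List<Integer>> parse(Scanner in) {
        int n = Integer.parseInt(in.next().trim());
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adj.add(new ArrayList<>());
            String row = in.next().trim();
            for (int j = 0; j < n; j++) {
                if (row.charAt(j) == '1') adj.get(i).add(j);
            }
        }
        return adj;
    }

    // Breadth-first search from a start vertex; returns visit order.
    // A Deque used as a stack instead of a queue would give DFS.
    static List<Integer> bfs(List<List<Integer>> adj, int start) {
        boolean[] visited = new boolean[adj.size()];
        Queue<Integer> queue = new ArrayDeque<>();
        List<Integer> order = new ArrayList<>();
        queue.add(start);
        visited[start] = true;  // "list of visited nodes" from the answer
        while (!queue.isEmpty()) {
            int v = queue.remove();
            order.add(v);
            for (int w : adj.get(v)) {
                if (!visited[w]) { visited[w] = true; queue.add(w); }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("3\n010\n101\n110");
        List<List<Integer>> adj = parse(in);
        System.out.println(adj);         // [[1], [0, 2], [0, 1]]
        System.out.println(bfs(adj, 0)); // [0, 1, 2]
    }
}
```

No hash table is needed here because the vertices are numbered 0..n-1, so a plain array (or ArrayList) indexed by vertex works.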
I have an input text file containing a line for each edge of a simple undirected graph. The file contains reciprocal edges, i.e. if there's a line u,v, then there's also the line v,u.
I need an algorithm which just counts the number of 4-cycles in this graph. I don't need it to be optimal because I only have to use it as a term of comparison. If you can suggest me a Java implementation, I would appreciate it for the rest of my life.
Thank you in advance.
Construct the adjacency matrix M, where M[i,j] is 1 if there's an edge between i and j. M² is then a matrix which counts the numbers of paths of length 2 between each pair of vertices.
The number of 4-cycles is sum_{i<j}(M²[i,j]*(M²[i,j]-1)/2)/2. This is because if there are n paths of length 2 between a pair of vertices, the graph has n choose 2 (that is, n*(n-1)/2) 4-cycles through that pair. We sum only the top half of the matrix to avoid double counting and degenerate paths like a-b-a-b-a. We still count each 4-cycle twice (once per pair of opposite vertices on the cycle), so we divide the overall total by another factor of 2.
If you use a matrix library, this can be implemented in very few lines of code.
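A sketch of that computation in plain Java, without a matrix library (class and method names are my own). It checks out on K4, the complete graph on 4 vertices, which has exactly 3 four-cycles:

```java
public class FourCycles {
    // Counts 4-cycles with the formula above:
    // sum over i<j of M2[i][j] choose 2, then divide by 2.
    static long countFourCycles(int[][] m) {
        int n = m.length;
        long[][] m2 = new long[n][n];
        // M2 = M * M: m2[i][j] counts length-2 paths from i to j.
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
                if (m[i][k] == 1)
                    for (int j = 0; j < n; j++)
                        m2[i][j] += m[k][j];
        long total = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)          // top half only
                total += m2[i][j] * (m2[i][j] - 1) / 2;
        return total / 2;                            // each cycle counted twice
    }

    public static void main(String[] args) {
        // K4: every pair of distinct vertices is connected.
        int[][] k4 = new int[4][4];
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                if (i != j) k4[i][j] = 1;
        System.out.println(countFourCycles(k4)); // 3
    }
}
```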
Detecting a cycle is one thing but counting all of the 4-cycles is another. I think what you want is a variant of breadth first search (BFS) rather than DFS as has been suggested. I'll not go deeply into the implementation details, but note the important points.
1) A path is a concatenation of edges sharing the same vertex.
2) A 4-cycle is a 4-edge path where the start and end vertices are the same.
So I'd approach it this way.
Read in graph G and maintain it using Java objects Vertex and Edge. Every Vertex object will have an ArrayList of all of the Edges that are connected to that Vertex.
The object Path will contain all of the vertexes in the path in order.
PathList will contain all of the paths.
Initialize PathList to all of the 1-edge paths, which are exactly all of the edges in G. BTW, this list will contain all of the 1-cycles (vertices connected to themselves) as well as all other paths.
Create a function that will (pseudocode, infer the meaning from the function name)
PathList iterate(PathList currentPathList)
{
    PathList newPathList = new PathList();
    for (path in currentPathList.getPaths())
    {
        for (edge in path.lastVertex().getEdges())
        {
            newPathList.addPath(Path.newPathFromPathAndEdge(path, edge));
        }
    }
    return newPathList;
}
Replace currentPathList with iterate(currentPathList) once and you will have all of the 2-edge paths (including the 2-cycles); call it twice and you will have all of the 3-edge paths; call it three times and you will have all of the 4-edge paths.
Search through all of the paths and find the 4-cycles by checking
path.firstVertex().isEqualTo(path.lastVertex())
Depth-first search (DFS) is what you need.
Construct an adjacency matrix, as prescribed by Anonymous on Jan 18th, and then find all the cycles of size 4.
It is an enumeration problem. If we know that the graph is a complete graph, then we know a generating function for the number of cycles of any length. But for most other graphs, you have to find all the cycles to get their exact number.
Depth-first search with backtracking should be the ideal strategy. Run it with each node as the starting node, one by one, and keep track of visited nodes. If you run out of nodes without finding a cycle of size 4, just backtrack and try a different route.
Backtracking is not ideal for larger graphs, though. For example, even a complete graph of order 11 is a little too much for backtracking algorithms. For larger graphs you can look for a randomized algorithm.
So I am currently learning Java, and I was asking myself why the insertion sort method doesn't need to use the swap operation. As far as I understood, elements get swapped, so wouldn't it be useful to use the swap operation in this sorting algorithm?
As I said, I am new to this, but I try to understand the background of these algorithms and why they are the way they are.
Would be happy for some insights :)
B.
Wikipedia's article for Insertion sort states
Each iteration, insertion sort removes one element from the input
data, finds the location it belongs within the sorted list, and
inserts it there. It repeats until no input elements remain. [...] If
smaller, it finds the correct position within the sorted list, shifts
all the larger values up to make a space, and inserts into that
correct position.
You can consider this shift as an extreme swap. What actually happens is that the value is stored in a placeholder and checked against the other values. If those values are larger, they are simply shifted, i.e. moved one position toward the end of the list/array. The placeholder's value is then put in the position freed by the last shift.
Insertion Sort does not perform swapping. It performs insertions by shifting elements in a sequential list to make room for the element that is being inserted.
That is why it is an O(N^2) algorithm: for each element out of N, there can be O(N) shifts.
So, you could do insertion sort by swapping.
But is that the best way to do it? Think about what a swap is...
temp = a
a=b
b=temp
There are 3 assignments for a single swap.
e.g. [2,3,1]
If the above list is to be sorted, you could 1. swap 3 and 1, then 2. swap 1 and 2.
That's 6 assignments in total.
Now, instead of swapping, if you just shift 2 and 3 one place to the right (1 assignment each) and then put 1 in array[0], you end up with just 3 assignments instead of the 6 you would do with swapping.
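A sketch of that shifting version in Java, assuming ascending order (this is the standard textbook formulation, not code from the question):

```java
import java.util.Arrays;

public class InsertionSort {
    // Insertion sort using shifts instead of swaps: the element being
    // inserted is held in a placeholder, larger elements are shifted right
    // with one assignment each, and the placeholder fills the gap.
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];          // one assignment into the placeholder
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];     // one assignment per shift, not three
                j--;
            }
            a[j + 1] = key;          // one assignment to place the element
        }
    }

    public static void main(String[] args) {
        int[] a = {2, 3, 1};         // the example from the answer above
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 3]
    }
}
```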
I need to implement a B+ tree.
And I need to create the following methods:
Insert(x) - O(log_t(x)).
Search - successful search: O(log_t(x)); unsuccessful search: O(1) (with high likelihood).
So I started by implementing Insert(x): each time I have a full leaf, I want to split it into two separate leaves.
One leaf with the keys lower than or equal to the median key; the second one will contain the keys greater than the median.
How can I find this median without hurting the running time?
I thought about:
Representing each of the internal nodes and leaves as a smaller B+ tree, but then the median is the root (or one of the elements in the root) only when the tree is fully balanced.
Representing each of the internal nodes and leaves as a doubly linked list, and trying to track the median key as the input is inserted, but there is input this doesn't work for.
Representing them as arrays might give me the middle, but then when I split one up I need at least O(n/2) to insert the keys into a new array.
What can i do?
And about the search, idea-wise: the difference between a successful and an unsuccessful search is about searching in the leaves, but I still need to 'run' through the different keys of the tree to determine whether the key is in the tree. So how can it be O(1)?
In B+ trees, all the values are stored in the leaves.
Note that you can add a pointer from each leaf to the following leaf, and you get in addition to the standard B+ tree an ordered linked list with all elements.
Now, note that assuming you know what the current median in this linked list is - upon insertion/deletion you can cheaply calculate the new median (it can be the same node, the next node or the previous node, no other choices).
Note that modifying this pointer is O(1) (though the insertion/deletion itself is O(log n)).
Given that knowledge - one can cache a pointer to the median element and make sure to maintain it upon deletion/insertion. When you ask for median - just take the median from the cache - O(1).
Regarding unsuccessful search in O(1) with high likelihood - this one screams Bloom filters, which are a probabilistic set implementation that never has false negatives (never says something is not in the set while it is), but has some false positives (says something is in the set while in fact it isn't).
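A toy sketch of the Bloom-filter idea (the two hash functions and the filter size here are arbitrary illustrations, not a tuned design):

```java
import java.util.BitSet;

public class TinyBloom {
    // Minimal Bloom filter: never reports a present key as absent;
    // may occasionally report an absent key as present.
    private final BitSet bits = new BitSet(1 << 16);

    // Two cheap, illustrative hash functions mapped into [0, 65535].
    private int h1(int x) { return (x * 0x9E3779B9) & 0xFFFF; }
    private int h2(int x) { return (x * 0x85EBCA6B + 1) & 0xFFFF; }

    void add(int key) {
        bits.set(h1(key));
        bits.set(h2(key));
    }

    // O(1): two bit tests, independent of how many keys are stored.
    boolean mightContain(int key) {
        return bits.get(h1(key)) && bits.get(h2(key));
    }

    public static void main(String[] args) {
        TinyBloom filter = new TinyBloom();
        filter.add(42);
        System.out.println(filter.mightContain(42)); // true, guaranteed
        // A key that was never added is *probably* rejected, but a
        // false positive is possible; that's the trade-off.
        System.out.println(filter.mightContain(7));
    }
}
```

A "no" from mightContain lets you skip the O(log n) tree search entirely, which is how the O(1) unsuccessful-search bound (with high likelihood) can be achieved.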
You don't need the median of the B+-tree. You need the median key in the node you're splitting. You have to split at that median to satisfy the condition that each node has N/2 <= n <= N keys. The median key in a node is just the one in the middle, at n/2, where n is the number of actual keys in the node. That's where you split the node. Computing that is O(1): it won't hurt the runtime.
You can't get O(1) search failure time from a B+-tree without superimposing another data structure.
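A minimal illustration of that split in Java, assuming the node's keys sit in a sorted list (splitting at index n/2 is one common convention; the class name is my own):

```java
import java.util.ArrayList;
import java.util.List;

public class LeafSplit {
    // Splits a full (sorted) node's key list at its own median index n/2.
    // Locating the split point is O(1); copying the keys into the two new
    // nodes is unavoidable with array-backed nodes, but the node size is a
    // constant of the tree, so it doesn't change the O(log n) insert bound.
    static List<List<Integer>> split(List<Integer> keys) {
        int mid = keys.size() / 2;
        List<Integer> left = new ArrayList<>(keys.subList(0, mid));
        List<Integer> right = new ArrayList<>(keys.subList(mid, keys.size()));
        List<List<Integer>> halves = new ArrayList<>();
        halves.add(left);
        halves.add(right);
        return halves;
    }

    public static void main(String[] args) {
        List<List<Integer>> halves = split(List.of(1, 3, 5, 7, 9));
        System.out.println(halves); // [[1, 3], [5, 7, 9]]
    }
}
```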
I've already posted an answer (and since deleted it), but it's possible I've misinterpreted, so here's an answer for another interpretation...
What if you need to always know which item is the median in the complete B+ tree container.
As amit says, you can keep a pointer (along with your root pointer) to the current leaf node that contains the median. You can also keep an index into that leaf node. So you get O(1) access by following those directly to the correct node and item.
The issue is maintaining that. Certainly amit is correct that for each insert, the median must also remain the same item, or must step to the one just before or after. And if you have a linked list through the leaf nodes, that can be handled efficiently even if that means stepping to an adjacent leaf node.
I'm not convinced, though, that it's trivial to determine whether or which way to step, except in the special case where the median and the insert happen to be in the same leaf node.
If you know the size of the complete tree (which you can easily store and maintain with the root pointer), you can at least determine which index the median item should be at both before and after the insert.
However, you need to know whether the previous median item had its index shifted up by the insert - that is, whether the insert point was before or after the median. Unless the insert point and the median happen to be in the same node, that's a problem.
Overkill way - augment the B+ tree to support calculating the index of an item and searching for indexes. The trick for that is that each node keeps a total of the number of items in the leaf nodes of its subtree. That can be pushed up a level so each branch node has an array of subtree sizes along with its array of child node pointers.
This offers two solutions. You could use the information to determine the index for the insert point as you search, or (providing nodes have parent pointers) you could use it to re-determine the index of the previous median item after the insert.
[Actually three. After inserting, you could just search for the new half-way index based on the new size without reference to the previous median link.]
In terms of data stored for augmentation, though, this turns out to be overkill. You don't need to know the index of the insert point or the previous median - you can make do with knowing which side of the median the insert occurred on. If you know the trail to follow from the root to the median item, you should be able to keep track of which side of it you are as you search for the insert point. So you only need to augment with enough information to find and maintain that trail.
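The subtree-size augmentation can be illustrated on a deliberately flattened two-level structure (not a real B+ tree; names are my own): each child's size tells the search how many global indexes to skip.

```java
public class RankedTree {
    // Toy two-level "tree": a root whose children are sorted leaf arrays.
    // Each leaf's length plays the role of the subtree-size totals kept in
    // branch nodes by the augmentation described above.
    static int selectByIndex(int[][] leaves, int index) {
        for (int[] leaf : leaves) {
            if (index < leaf.length) {
                return leaf[index];   // the target lies in this subtree
            }
            index -= leaf.length;     // skip the whole subtree using its size
        }
        throw new IndexOutOfBoundsException("index past last key");
    }

    public static void main(String[] args) {
        int[][] leaves = {{1, 3}, {5, 7, 9}, {11}};
        // Global index 3 (0-based) is the 4th-smallest key overall.
        System.out.println(selectByIndex(leaves, 3)); // 7
    }
}
```

In a real multi-level tree the same subtraction happens once per level, so finding the item at the half-way index after an insert costs one O(log n) descent.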
Hi, I'm implementing a Graph data structure for a course (the graph isn't part of the requirement; I've chosen to use it to approach the problem), and my first thought was to implement it using an adjacency list, because that requires less memory and I don't expect to have that many edges in my graph.
But then it occurred to me. I can implement an adjacency list Graph data structure using a Map (HashMap to be specific). Instead of the list of vertices I'll have a Map of vertices, which then hold a short list of edges to vertices.
This seems to be the way to go for me. But I was wondering if anyone can see any drawbacks that a student such as I might have missed in using a HashMap for this? (unfortunately I recall being very tired whilst we were going over HashMaps...so my knowledge of them is less than all the other data structures I know of.) So I want to be sure.
By the way I'm using Java.
The two primary ways of representing a graph are:
with the adjacency list (as you mentioned)
with the adjacency matrix
Since you will not have too many edges (i.e. the adjacency matrix representing your graph would be sparse), I think your decision to use the list instead of the matrix is a good one since, as you said, it will indeed take up less space since no space is wasted to represent the absent edges. Furthermore, the Map approach seems to be logical, as you can map each Node of your graph to the list of Nodes to which it is connected. Another alternative would be to have each Node object contain, as a data field, the list of nodes to which it is connected. I think either of these approaches could work well. I've summed it up below.
First approach (maintain the map):
Map<Node, Node[]> graph = new HashMap<Node, Node[]>();
Second approach (data built into Node class):
public class Node {
    private Node[] adjacentNodes;
    public Node(Node[] nodes) { adjacentNodes = nodes; }
    public Node[] adjacentNodes() { return adjacentNodes; }
}
Graphs are traditionally represented either via an adjacency list or an adjacency matrix (there are other ways that are optimized for certain graph formats, such as if the node id's are labeled sequentially and/or you know the number of nodes/edges ahead of time, but I won't get into that).
Picking between an adjacency list and an adjacency matrix depends on your needs. Clearly, an adjacency matrix will take up more space than an adjacency list (a matrix always takes (# of nodes)^2 space, whereas a list takes (# of nodes + # of edges)), but if your graph is "small" then it doesn't really make a difference.
Another concern is how many edges you have (is your graph sparse or dense)? You can find the density of your graph by taking the # of edges you have and dividing it by:
n(n-1) / 2
Where "n" is the number of nodes of the graph. The above equation finds the total # of possible edges in an "n" node UNDIRECTED graph. If the graph is directed, remove the " / 2".
Something else to think about is whether efficient edge-membership testing is important. An adjacency matrix can test edge membership easily (O(1)) since it's just an array lookup; with an adjacency list, if the "list" is stored as something other than a HashSet, lookups will be much slower since you have to scan the entire edge list. Or maybe you keep the edge lists sorted and binary-search them, but then edge insertion takes longer. Maybe your graph is very sparse and an adjacency matrix uses too much memory, so you have to use an adjacency list. Lots of things to think about.
There are many more concerns that may relate to your project; I've just listed a few.
In general, assuming your graph isn't very complex or "big" (in the sense of millions of nodes), a HashMap where the key is the node ID and the value is a Set (or some other collection) of the node IDs of the key node's neighbors is fine; I've done this for 400,000+ node graphs on an 8 GB machine. A HashMap-based implementation will probably be the easiest to implement.
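A minimal sketch of that HashMap-of-Sets representation (treating the graph as undirected with integer node IDs is my assumption; the class name is illustrative):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MapGraph {
    // Undirected adjacency list: node ID -> set of neighbor IDs.
    private final Map<Integer, Set<Integer>> adj = new HashMap<>();

    void addEdge(int u, int v) {
        adj.computeIfAbsent(u, k -> new HashSet<>()).add(v);
        adj.computeIfAbsent(v, k -> new HashSet<>()).add(u);
    }

    // Using a HashSet for the neighbor collection keeps edge-membership
    // queries at expected O(1), addressing the lookup concern above.
    boolean hasEdge(int u, int v) {
        return adj.getOrDefault(u, Set.of()).contains(v);
    }

    Set<Integer> neighbors(int u) {
        return adj.getOrDefault(u, Set.of());
    }

    public static void main(String[] args) {
        MapGraph g = new MapGraph();
        g.addEdge(1, 2);
        g.addEdge(1, 3);
        System.out.println(g.hasEdge(2, 1)); // true
        System.out.println(g.hasEdge(2, 3)); // false
        System.out.println(g.neighbors(1));  // the set {2, 3}, order unspecified
    }
}
```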
I want to know how to implement a DFA as a linked list in C/C++/Java.
Since every state can have several outgoing transitions, you probably need more than one linked list. That means every state has an array of n linked lists, so it's more like a tree structure with cycles than a simple linked list.
This is definitely possible, but would be grossly inefficient. What you would do is simply store all your states in a linked list, and then each state would keep a transition table. The transition table would look something like:
'a' -> 2
'b' -> 5
where your alphabet is {a,b}, and 2 and 5 are the states stored at position 2 and 5 in the linked list. As I said, this is definitely NOT how you would want to implement a DFA, but it is possible.
The first thing that came to my mind is this:
Create a class/struct called state with two array members: one for the states that can reach our state, and one for the states reachable from our state.
Then create a linked list whose elements are your states.
Here's my implementation of this class:
#include <string>
#include <vector>

class state
{
private:
    std::string stateName;
    std::vector<state*> states_before_me; // states with a transition into this state
    std::vector<state*> states_after_me;  // states reachable from this state
    state* next;                          // next state in the linked list
    // methods of this state
};
A singly linked list can't represent a DFA efficiently. You can think of a DFA as a directed, weighted graph data structure: the states are vertices, the transitions are edges, and the transition symbols are the weights. There are two main ways to implement a graph structure.
i) Adjacency list: it basically keeps V (number of vertices) linked lists. Each linked list contains the vertices to which the corresponding vertex has an edge. If we have vertices (1,2,3) and edges (1,2),(1,3),(2,1),(2,3),(3,3), the corresponding adjacency list is:
1->2->3
2->1->3
3->3
ii) Adjacency matrix: a VxV matrix where the entry at (i,j) indicates an edge from i to j. The same example is represented like this (1 means there is an edge, 0 means there is not):
1 2 3
1 0 1 1
2 1 0 1
3 0 0 1
But you must make small changes to these, because your graph is weighted.
For the list implementation, you can change the vertices in the linked list to a struct containing the vertex and the weight of the edge connecting the two vertices.
For the matrix implementation, you can place the weights directly into the matrix instead of the 0/1 values.
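A sketch of the transition-table idea in Java, using a map per state in place of the per-state linked lists (the even-number-of-'a's DFA below is just an illustrative example, not from the question):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class Dfa {
    // Per-state transition map: the input symbol plays the role of the
    // "weight" on each edge described above.
    private final Map<Integer, Map<Character, Integer>> delta = new HashMap<>();
    private final int start;
    private final Set<Integer> accepting;

    Dfa(int start, Set<Integer> accepting) {
        this.start = start;
        this.accepting = accepting;
    }

    void addTransition(int from, char symbol, int to) {
        delta.computeIfAbsent(from, k -> new HashMap<>()).put(symbol, to);
    }

    boolean accepts(String input) {
        int state = start;
        for (char c : input.toCharArray()) {
            Map<Character, Integer> row = delta.get(state);
            if (row == null || !row.containsKey(c)) return false; // no transition
            state = row.get(c);
        }
        return accepting.contains(state);
    }

    public static void main(String[] args) {
        // Example: accepts strings over {a, b} with an even number of 'a's.
        Dfa d = new Dfa(0, Set.of(0));
        d.addTransition(0, 'a', 1);
        d.addTransition(1, 'a', 0);
        d.addTransition(0, 'b', 0);
        d.addTransition(1, 'b', 1);
        System.out.println(d.accepts("abba")); // true
        System.out.println(d.accepts("ab"));   // false
    }
}
```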
If you don't want to deal with implementing a graph class yourself, there are libraries like the Boost Graph Library, which contains both implementations and all the important graph algorithms, from DFS to Dijkstra's shortest path. You can look it up at http://www.boost.org/doc/libs/1_47_0/libs/graph/doc/index.html.