Issues in Implementing Karger's Min Cut Algorithm using Union-Find - java

I've implemented Karger's algorithm using the union-find data structure with the path-compression heuristic and union by rank, but I've run into a couple of issues.
What I basically do is run the algorithm N*N*log(N) times for a good estimate of the answer. However, I simply don't get the right MinCut value. Each run I pick a random edge, which has 2 members: the source 's' and the destination 'd'. If their parents are not equal, I merge them and decrement the vertex count 'vcnt', which was initially equal to the original number of vertices. This process continues until 2 vertices are left. Finally, I find the parent of the source and destination of each edge, and if they are not equal, I increment the MinCut count. This repeats N*N*log(N) times.
I've tried running my code on a lot of test data, but I don't seem to be getting the MinCut value, especially for large data.
Could anyone help me out? Also, performance improvement suggestions are welcome. Here is the code:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
class KragersMinCut
{
static int n=200;//Number of Vertices
static int[] u=new int[n];
static int[]rank =new int[n];
static class Edge //Edge which holds the source and destination
{
int s,d;//Source,Destination
Edge(int s,int d)
{
this.s=s;
this.d=d;
}
}
private static void InitializeUnionFindData()
{
for(int i=0;i<n;i++)
{
u[i]=i;
rank[i]=1;
}
}
private static int FIND(int xx) //Finding Parent using Path-Compression Heuristics
{
if(u[xx]!=u[u[xx]])
{
u[xx]=FIND(u[xx]);
}
return u[xx];
}
private static boolean UNION(int x,int y) //Union by Order-by-Rank to create evenly balanced search trees
{
int px=FIND(x),py=FIND(y);
if(rank[px]>rank[py])
{
int temp=px;
px=py;
py=temp;
}
else if(rank[px]==rank[py])
rank[py]++;
u[px]=py;
return true;
}
public static void main(String[] args) throws IOException
{
BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
ArrayList<Edge> EdgeList=new ArrayList<Edge>();
for(int i=0;i<n;i++)
{
String x=br.readLine();
ArrayList<Integer>al=new ArrayList<Integer>();
for(int j=0;j<x.length();j++) //This loop is for parsing the input format
{
if(x.charAt(j)<48 || x.charAt(j)>57)
continue;
int p=j;
String input="";
while(p!=x.length()&&(x.charAt(p)>=48 && x.charAt(p)<=57))
{
input+=(x.charAt(p));
p++;
}
j=p;
al.add(Integer.parseInt(input.trim())-1);
}
for(int j=1;j<al.size();j++)
{
EdgeList.add(new Edge(al.get(0),al.get(j)));//Source,Destination
}
}
//Edge list ready
int MinCut=Integer.MAX_VALUE;
for(int q=0;q<(n*n)*Math.log(n);q++)//Running theta(n^2*ln(n)) times for a good estimate. Runs in about 20 secs
{
int vcnt=n;//Essentially n
InitializeUnionFindData();
while(vcnt>2)
{
Edge x=EdgeList.get((int)(Math.random()*(EdgeList.size()-1)+1));//Obtaining random valued element at index from EdgeList
int s=x.s,d=x.d;
int ps=FIND(s),pd=FIND(d);
if(ps!=pd)//Contracting. Essentially making their parents equal
{
UNION(s,d);
vcnt--;
}
}
int CurrMinCutValue=0;
for(Edge i:EdgeList)
{
int px=FIND(i.s),py=FIND(i.d);
if(px!=py)//Since they belong to different Vertices
{
CurrMinCutValue++;
}
}
MinCut=Math.min(MinCut,CurrMinCutValue);//Finding Minimum cut of all random runs
}
System.out.println(MinCut);
}
}
TestData: (Source Vertex-> Connected Vertices)
1 2 3 4 7
2 1 3 4
3 1 2 4
4 1 2 3 5
5 4 6 7 8
6 5 7 8
7 1 5 6 8
8 5 6 7
Answer: 4 | Expected Answer: 2
Link: http://ideone.com/QP62FN
Thanks

The algorithm requires that every edge be equally likely to be selected for the merging.
But your code never selects the edge at index 0.
So modify the line:
Edge x=EdgeList.get((int)(Math.random()*(EdgeList.size()-1)+1));
to this:
Edge x=EdgeList.get((int)(Math.random()*(EdgeList.size())));
Also, because every edge is listed twice in the edge list, you should print the following:
System.out.println(MinCut/2);
Now it should work.
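For clarity, a minimal sketch of both fixes in context (RNG is a name I've introduced; Edge, EdgeList and MinCut are from the posted code; Random.nextInt(bound) returns a uniform value in [0, bound), so index 0 is included):

static final java.util.Random RNG = new java.util.Random();

// inside the contraction loop: every index 0..EdgeList.size()-1 is equally likely
Edge x = EdgeList.get(RNG.nextInt(EdgeList.size()));

// after all runs: each undirected edge was stored twice (once per endpoint's line),
// so every crossing edge is counted twice
System.out.println(MinCut / 2);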

Related

How can I improve this search algorithm's runtime?

I'm trying to solve an interview problem I was given a few years ago in preparation for upcoming interviews. The problem is outlined in a pdf here. I wrote a simple solution using DFS that works fine for the example outlined in the document, but I haven't been able to get the program to meet the criteria of
Your code should produce correct answers in under a second for a
10,000 x 10,000 Geo GeoBlock containing 10,000 occupied Geos.
To test this I generated a CSV file with 10000 random entries and when I run the code against it, it averages just over 2 seconds to find the largest geo block in it. I'm not sure what improvements could be made to my approach to cut the runtime by over half, other than running it on a faster laptop. From my investigations it appears the search itself seems to only take about 8ms, so perhaps the way I load the data into memory is the inefficient part?
I'd greatly appreciate any advice on how this could be improved. See the code below:
GeoBlockAnalyzer
package analyzer.block.geo.main;
import analyzer.block.geo.model.Geo;
import analyzer.block.geo.result.GeoResult;
import java.awt.*;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.*;
public class GeoBlockAnalyzer {
private static final DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd");
private final int width;
private final int height;
private final String csvFilePath;
private GeoResult result = new GeoResult();
// Map of the geo id and respective geo object
private final Map<Integer, Geo> geoMap = new HashMap<>();
// Map of coordinates to each geo in the grid
private final Map<Point, Geo> coordMap = new HashMap<>();
/**
* Constructs a geo grid of the given width and height, populated with the geo data provided in
* the csv file
*
* @param width the width of the grid
* @param height the height of the grid
* @param csvFilePath the csv file containing the geo data
* @throws IOException
*/
public GeoBlockAnalyzer(final int width, final int height, final String csvFilePath)
throws IOException {
if (!Files.exists(Paths.get(csvFilePath)) || Files.isDirectory(Paths.get(csvFilePath))) {
throw new FileNotFoundException(csvFilePath);
}
if (width <= 0 || height <= 0) {
throw new IllegalArgumentException("Input height or width is 0 or smaller");
}
this.width = width;
this.height = height;
this.csvFilePath = csvFilePath;
populateGeoGrid();
populateCoordinatesMap();
calculateGeoNeighbours();
// printNeighbours();
}
/** @return the largest geo block in the input grid */
public GeoResult getLargestGeoBlock() {
for (final Geo geo : this.geoMap.values()) {
final List<Geo> visited = new ArrayList<>();
search(geo, visited);
}
return this.result;
}
/**
* Iterative DFS implementation to find largest geo block.
*
* @param geo the geo to be evaluated
* @param visited list of visited geos
*/
private void search(Geo geo, final List<Geo> visited) {
final Deque<Geo> stack = new LinkedList<>();
stack.push(geo);
while (!stack.isEmpty()) {
geo = stack.pop();
if (visited.contains(geo)) {
continue;
}
visited.add(geo);
final List<Geo> neighbours = geo.getNeighbours();
for (int i = neighbours.size() - 1; i >= 0; i--) {
final Geo g = neighbours.get(i);
if (!visited.contains(g)) {
stack.push(g);
}
}
}
if (this.result.getSize() < visited.size()) {
this.result = new GeoResult(visited);
}
}
/**
* Creates a map of the geo grid from the csv file data
*
* @throws IOException
*/
private void populateGeoGrid() throws IOException {
try (final BufferedReader br = Files.newBufferedReader(Paths.get(this.csvFilePath))) {
int lineNumber = 0;
String line = "";
while ((line = br.readLine()) != null) {
lineNumber++;
final String[] geoData = line.split(",");
LocalDate dateOccupied = null;
// Handle for empty csv cells
for (int i = 0; i < geoData.length; i++) {
// Remove leading and trailing whitespace
geoData[i] = geoData[i].replace(" ", "");
if (geoData[i].isEmpty() || geoData.length > 3) {
throw new IllegalArgumentException(
"There is missing data in the csv file at line: " + lineNumber);
}
}
try {
dateOccupied = LocalDate.parse(geoData[2], formatter);
} catch (final DateTimeParseException e) {
throw new IllegalArgumentException("There input date is invalid on line: " + lineNumber);
}
this.geoMap.put(
Integer.parseInt(geoData[0]),
new Geo(Integer.parseInt(geoData[0]), geoData[1], dateOccupied));
}
}
}
/** Create a map of each coordinate in the grid to its respective geo */
private void populateCoordinatesMap() {
// Using the geo id, calculate its point on the grid
for (int i = this.height - 1; i >= 0; i--) {
int blockId = (i * this.width);
for (int j = 0; j < this.width; j++) {
if (this.geoMap.containsKey(blockId)) {
final Geo geo = this.geoMap.get(blockId);
geo.setCoordinates(i, j);
this.coordMap.put(geo.getCoordinates(), geo);
}
blockId++;
}
}
}
private void calculateGeoNeighbours() {
for (final Geo geo : this.geoMap.values()) {
addNeighboursToGeo(geo);
}
}
private void addNeighboursToGeo(final Geo geo) {
final int x = geo.getCoordinates().x;
final int y = geo.getCoordinates().y;
final Point[] possibleNeighbours = {
new Point(x, y + 1), new Point(x - 1, y), new Point(x + 1, y), new Point(x, y - 1)
};
Geo g;
for (final Point p : possibleNeighbours) {
if (this.coordMap.containsKey(p)) {
g = this.coordMap.get(p);
if (g != null) {
geo.getNeighbours().add(g);
}
}
}
}
private void printNeighbours() {
for (final Geo geo : this.geoMap.values()) {
System.out.println("Geo " + geo.getId() + " has the following neighbours: ");
for (final Geo g : geo.getNeighbours()) {
System.out.println(g.getId());
}
}
}
}
GeoResult
package analyzer.block.geo.result;
import analyzer.block.geo.model.Geo;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
public class GeoResult {
private final List<Geo> geosInBlock = new ArrayList<>();
public GeoResult() {
}
public GeoResult(final List<Geo> geosInBlock) {
this.geosInBlock.addAll(geosInBlock);
}
public List<Geo> getGeosInBlock() {
this.geosInBlock.sort(Comparator.comparingInt(Geo::getId));
return this.geosInBlock;
}
public int getSize() {
return this.geosInBlock.size();
}
@Override
public String toString() {
final StringBuilder sb = new StringBuilder();
sb.append("The geos in the largest cluster of occupied Geos for this GeoBlock are: \n");
for(final Geo geo : this.geosInBlock) {
sb.append(geo.toString()).append("\n");
}
return sb.toString();
}
}
Geo
package analyzer.block.geo.model;
import java.awt.Point;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
public class Geo {
private final int id;
private final String name;
private final LocalDate dateOccupied;
private final Point coordinate;
private final List<Geo> neighbours = new ArrayList<>();
public Geo (final int id, final String name, final LocalDate dateOccupied) {
this.id = id;
this.name = name;
this.dateOccupied = dateOccupied;
this.coordinate = new Point();
}
public int getId() {
return this.id;
}
public String getName() {
return this.name;
}
public LocalDate getDateOccupied() {
return this.dateOccupied;
}
public void setCoordinates(final int x, final int y) {
this.coordinate.setLocation(x, y);
}
public Point getCoordinates() {
return this.coordinate;
}
public String toString() {
return this.id + ", " + this.name + ", " + this.dateOccupied;
}
public List<Geo> getNeighbours() {
return this.neighbours;
}
@Override
public int hashCode() {
return Objects.hash(this.id, this.name, this.dateOccupied);
}
@Override
public boolean equals(final Object obj) {
if(this == obj) {
return true;
}
if(obj == null || this.getClass() != obj.getClass()) {
return false;
}
final Geo geo = (Geo) obj;
return this.id == geo.getId() &&
this.name.equals(geo.getName()) &&
this.dateOccupied == geo.getDateOccupied();
}
}
The major optimization available here is a conceptual one. Unfortunately, this type of optimization is not easy to teach, nor look up in a reference somewhere. The principle being used here is:
It's (almost always) cheaper to use an analytic formula to compute a known result than to (pre)compute it. [1]
It's clear from your code & the definition of your problem that you are not taking advantage of this principle and the problem specification. In particular, one of the key points taken directly from the problem specification is this:
Your code should produce correct answers in under a second for a 10,000 x 10,000 Geo GeoBlock containing 10,000 occupied Geos.
When you read this statement a few things should be going through your mind (when thinking about runtime efficiency):
10,000^2 is a much larger number than 10,000 (exactly 10,000 times larger!) There is a clear efficiency gain if you can maintain an algorithm that is O(n) as opposed to O(n^2) (in the expected case because of the use of hashing.)
touching (i.e. computing any O(1) operation) for the entire grid is going to immediately yield a O(n^2) algorithm; clearly, this is something that must be avoided if possible
from the problem statement, we should never expect O(n^2) geos to need touching. This should be a major hint as to what the person who wrote the problem is looking for. BFS or DFS is an O(N+M) algorithm, where N and M are the number of nodes and edges touched. Thus, we should be expecting an O(n) search.
based on the above points, it is clear that the solution being looked for here should be O(10,000) for a problem input with grid size 10,000 x 10,000 and 10,000 geos
The solution you provided is O(n^2) because,
You use visited.contains where visited is a List, so each lookup is O(n). This is not showing up in your testing as a problem area because I suspect you are using small geo clusters. Try using a large geo cluster (one with 10,000 geos); you should see a major slowdown compared to, say, a largest cluster of 3 geos. The solution here is to use an efficient data structure for visited; some that come to mind are a bit set (java.util.BitSet) or a hash set (java.util.HashSet). Because you did not notice this in testing, it suggests to me that you are not vetting/testing your code with enough varied examples of the corner cases you expect. This should have come up immediately in any thorough testing/profiling of your code. As per my comment, I would have liked to see this type of groundwork/profiling done before the question was posted.
You touch the entire 10,000 x 10,000 grid in the member function populateCoordinatesMap. This is clearly already O(n^2) where n=10,000. Notice that the only location where coordMap is used outside populateCoordinatesMap is in addNeighboursToGeo. This is a major bottleneck, and for no reason: addNeighboursToGeo can be computed in O(1) time without the need for a coordMap. However, we can still use your code as is with the minor modification given below.
I hope it is obvious how to fix (1). To fix (2), replace populateCoordinatesMap
/** Create a map of each coordinate in the grid to its respective geo */
private void populateCoordinatesMap() {
for (Map.Entry<int,Geo> entry : geoMap.entrySet()) {
int key = entry.getKey();
Geo value = entry.getValue();
int x = key % this.width;
int y = key / this.width;
value.setCoordinates(x, y);
this.coordMap.put(geo.getCoordinates(), geo);
}
}
Notice the principle being put to use here. Instead of iterating over the entire grid as you were doing before (O(n^2) immediately), this iterates only over the occupied Geos, and uses the analytic formula for indexing a 2D array (as opposed to doing copious computation to compute the same thing.) Effectively, this change improves populateCoordinatesMap from being O(n^2) to being O(n).
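For fix (1), here is a minimal sketch of search with visited as a HashSet (this relies on the posted Geo overriding hashCode/equals, which it does; the caller in getLargestGeoBlock would pass new HashSet<>() instead of a list, and java.util.* is already imported in the posted class):

private void search(Geo geo, final Set<Geo> visited) {
    final Deque<Geo> stack = new ArrayDeque<>(); // array-backed, cheaper than LinkedList
    stack.push(geo);
    while (!stack.isEmpty()) {
        geo = stack.pop();
        if (!visited.add(geo)) { // add() returns false if already visited: expected O(1)
            continue;
        }
        for (final Geo g : geo.getNeighbours()) {
            if (!visited.contains(g)) {
                stack.push(g);
            }
        }
    }
    if (this.result.getSize() < visited.size()) {
        this.result = new GeoResult(new ArrayList<>(visited)); // GeoResult expects a List
    }
}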
Some general & opinionated comments below:
Overall, I strongly disagree with using an object-oriented approach over a procedural one for this problem. I think the OO approach is completely unjustified for how simple this code should be, but I understand that the interviewer wanted to see it.
This is a very simple problem you are trying to solve, and I think the object-oriented approach you took here confounds it so much that you could not see the forest for the trees (or perhaps the trees for the forest). A much simpler approach could have been taken to implementing this algorithm, even an object-oriented one.
It's clear from the points above that you could benefit from knowing the available tools in the language you are working in. By this I mean you should know what containers are readily available and what the trade-offs are for each operation on each container. You should also know at least one decent profiling tool for the language you are working with if you are going to be optimizing code. Given that you failed to post a profiling summary, even after I asked for it, it suggests to me that you do not know of such a tool for Java. Learn one.
[1] I provide no reference for this principle because it is a first principle, and can be explained by the fact that running fewer constant-time operations is cheaper than running many. The assumption here is that the known analytic form requires less computation. There are occasional exceptions to this rule, but it should be stressed that such exceptions are almost always because of hardware limitations or advantages. For example, when computing the Hamming distance it is cheaper to use a precomputed LUT for the population count on a hardware architecture without access to SSE registers/operations.
Without testing, it seems to me that the main block here is the literal creation of the map, which could be up to 100,000,000 cells. There would be no need for that if instead we labeled each CSV entry and had a function getNeighbours(id, width, height) that returned the list of possible neighbour IDs (think modular arithmetic). As we iterate over each CSV entry in turn, if (1) neighbour IDs were already seen that all had the same label, we'd label the new ID with that label; if (2) no neighbours were seen, we'd use a new label for the new ID; and if (3) two or more different labels existed between seen neighbour IDs, we'd combine them to one label (say the minimal label), by having a hash that mapped a label to its "final" label. Also store the sum and size for each label. Your current solution is O(n), where n is width x height. The idea here would be O(n), where n is the number of occupied Geos.
Here's something really crude in Python that I wouldn't expect to have all scenarios handled but could hopefully give you an idea (sorry, I don't know Java):
def get_neighbours(id, width, height):
    neighbours = []
    if id % width != 0:
        neighbours.append(id - 1)
    if (id + 1) % width != 0:
        neighbours.append(id + 1)
    if id - width >= 0:
        neighbours.append(id - width)
    if id + width < width * height:
        neighbours.append(id + width)
    return neighbours

def f(data, width, height):
    ids = {}
    labels = {}
    current_label = 0
    for line in data:
        [idx, name, dt] = line.split(",")
        idx = int(idx)
        this_label = None
        neighbours = get_neighbours(idx, width, height)
        no_neighbour_was_seen = True
        for n in neighbours:
            # A neighbour was seen
            if n in ids:
                no_neighbour_was_seen = False
                # We have yet to assign a label to this ID
                # (test against None explicitly: label 0 is falsy)
                if this_label is None:
                    this_label = ids[n]["label"]
                    ids[idx] = {"label": this_label, "data": name + " " + dt}
                    final_label = labels[this_label]["label"]
                    labels[final_label]["size"] += 1
                    labels[final_label]["sum"] += idx
                    labels[final_label]["IDs"] += [idx]
                # This neighbour has yet to be connected
                elif ids[n]["label"] != this_label:
                    old_label = ids[n]["label"]
                    old_obj = labels[old_label]
                    final_label = labels[this_label]["label"]
                    ids[n]["label"] = final_label
                    labels[final_label]["size"] += old_obj["size"]
                    labels[final_label]["sum"] += old_obj["sum"]
                    labels[final_label]["IDs"] += old_obj["IDs"]
                    del labels[old_label]
        if no_neighbour_was_seen:
            this_label = current_label
            current_label += 1
            ids[idx] = {"label": this_label, "data": name + " " + dt}
            labels[this_label] = {"label": this_label, "size": 1, "sum": idx, "IDs": [idx]}
    for i in ids:
        print i, ids[i]["label"], ids[i]["data"]
    print ""
    for i in labels:
        print i
        print labels[i]
    return labels, ids
data = [
"4, Tom, 2010-10-10",
"5, Katie, 2010-08-24",
"6, Nicole, 2011-01-09",
"11, Mel, 2011-01-01",
"13, Matt, 2010-10-14",
"15, Mel, 2011-01-01",
"17, Patrick, 2011-03-10",
"21, Catherine, 2011-02-25",
"22, Michael, 2011-02-25"
]
f(data, 4, 7)
print ""
f(data, 7, 4)
Output:
"""
4 0 Tom 2010-10-10
5 0 Katie 2010-08-24
6 0 Nicole 2011-01-09
11 1 Mel 2011-01-01
13 2 Matt 2010-10-14
15 1 Mel 2011-01-01
17 2 Patrick 2011-03-10
21 2 Catherine 2011-02-25
22 2 Michael 2011-02-25
0
{'sum': 15, 'size': 3, 'IDs': [4, 5, 6], 'label': 0}
1
{'sum': 26, 'size': 2, 'IDs': [11, 15], 'label': 1}
2
{'sum': 73, 'size': 4, 'IDs': [13, 17, 21, 22], 'label': 2}
---
4 0 Tom 2010-10-10
5 0 Katie 2010-08-24
6 0 Nicole 2011-01-09
11 0 Mel 2011-01-01
13 0 Matt 2010-10-14
15 3 Mel 2011-01-01
17 2 Patrick 2011-03-10
21 3 Catherine 2011-02-25
22 3 Michael 2011-02-25
0
{'sum': 39, 'size': 5, 'IDs': [4, 5, 6, 11, 13], 'label': 0}
2
{'sum': 17, 'size': 1, 'IDs': [17], 'label': 2}
3
{'sum': 58, 'size': 3, 'IDs': [21, 22, 15], 'label': 3}
"""

How to divide a set of numbers into two sets such that the difference of their sum is minimum

How do I write a Java program to divide a set of numbers into two sets such that the difference of the sums of their numbers is minimum?
For example, I have an array containing the integers [5,4,8,2]. I can divide it into two arrays: [8,2] and [5,4]. Assuming that the given set of numbers has a unique solution, like in the above example, how do I write a Java program to achieve the solution? It would be fine even if I am only able to find out the minimum possible difference.
Let's say my method receives an array as a parameter. The method first has to divide the array into two arrays and then sum the integers in each. Thereafter, it has to return the difference between the sums, such that the difference is the minimum possible.
P.S. I have had a look around here, but couldn't find any specific solution to this. The most probable solution seemed to be the one given here: divide an array into two sets with minimal difference. But I couldn't gather from that thread how I could write a Java program that gets a definite solution to the problem.
EDIT:
After looking at the comment of @Alexandru Severin, I tried a Java program. It works for one set of numbers [1,3,5,9], but doesn't work for another set [4,3,5,9,11]. Below is the program. Please suggest changes:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class FindMinimumDifference {
public static void main(String[] args) {
int[] arr= new int[]{4,3,5,9, 11};
FindMinimumDifference obj= new FindMinimumDifference();
obj.returnMinDiff(arr);
}
private int returnMinDiff(int[] array){
int diff=-1;
Arrays.sort(array);
List<Integer> list1= new ArrayList<>();
List<Integer> list2= new ArrayList<>();
int sumOfList1=0;
int sumOfList2=0;
for(int a:array){
for(Integer i:list1){
sumOfList1+=i;
}
for(Integer i:list2){
sumOfList2+=i;
}
if(sumOfList1<=sumOfList2){
list1.add(a);
}else{
list2.add(a);
}
}
List<Integer> list3=new ArrayList<>(list1);
List<Integer> list4= new ArrayList<>(list2);
Map<Integer, List<Integer>> mapOfProbables= new HashMap<Integer, List<Integer>>();
int probableValueCount=0;
for(int i=0; i<list1.size();i++){
for(int j=0; j<list2.size();j++){
if(abs(list1.get(i)-list2.get(j))<
abs(getSumOfEntries(list1)-getSumOfEntries(list2))){
List<Integer> list= new ArrayList<>();
list.add(list1.get(i));
list.add(list2.get(j));
mapOfProbables.put(probableValueCount++, list);
}
}
}
int minimumDiff=abs(getSumOfEntries(list1)-getSumOfEntries(list2));
List resultList= new ArrayList<>();
for(List probableList:mapOfProbables.values()){
list3.remove(probableList.get(0));
list4.remove(probableList.get(1));
list3.add((Integer)probableList.get(1));
list4.add((Integer)probableList.get(0));
if(minimumDiff>abs(getSumOfEntries(list3)-getSumOfEntries(list4))){
// valid exchange
minimumDiff=abs(getSumOfEntries(list3)-getSumOfEntries(list4));
resultList=probableList;
}
}
System.out.println(minimumDiff);
if(resultList.size()>0){
list1.remove(resultList.get(0));
list2.remove(resultList.get(1));
list1.add((Integer)resultList.get(1));
list2.add((Integer)resultList.get(0));
}
System.out.println(list1+""+list2); // the two resulting set of
// numbers with modified data giving expected result
return minimumDiff;
}
private static int getSumOfEntries(List<Integer> list){
int sum=0;
for(Integer i:list){
sum+=i;
}
return sum;
}
private static int abs(int i){
if(i<=0)
i=-i;
return i;
}
}
First of all, sorting the array and then putting the first member in one group and the second in another would never work, and here is why:
Given the input[1,2,3,100].
The result would be: [1,3] and [2,100], clearly wrong.
The correct answer should be: [1,2,3] and [100]
You can find many optimization algorithms on google for this problem, but since I assume you're a beginner, I'll try to give you a simple algorithm that you can implement:
sort the array
iterate from highest to lowest value
for each iteration, calculate the sum of each group, then add the element to the group with minimum sum
At the end of the loop you should have two fairly balanced arrays. Example:
Array: [1,5,5,6,7,10,20]
i1: `[20] []`
i2: `[20] [10]`
i3: `[20] [10,7]`
i4: `[20] [10,7,6]`
i5: `[20,5] [10,7,6]`
i6: `[20,5] [10,7,6,5]`
i7: `[20,5,1] [10,7,6,5]`
Where the sums are 26 and 28. As you can see we can further optimize the solution: if we exchange 5 and 6, resulting in [20,6,1] and [10,7,5,5], the sums are equal.
For this step you can:
find all groups of elements (x,y) where x is in group1, y is in group2, and |x-y| < |sum(group1) - sum(group2)|
loop all groups and try exchanging x with y until you get a minimum difference
after each exchange, check if the minimum value in the group with the highest sum is higher than the difference of the groups; if so, transfer it to the other group
This algorithm will always return the best solution and is a whole lot better than a plain greedy approach. However, it is not optimal in terms of complexity, speed, and memory. If one needs it for very large arrays and the resources are limited, the most optimal algorithm may differ depending on the speed/memory ratio and the accepted error percentage.
This is a variation on the Partition Problem https://en.wikipedia.org/wiki/Partition_problem
If you want the optimal solution you have to test every possible combination of output sets. That may be feasible for small sets but is infeasible for large inputs.
One good approximation is the greedy algorithm I present below.
This heuristic works well in practice when the numbers in the set are
of about the same size as its cardinality or less, but it is not
guaranteed to produce the best possible partition.
First you need to put your input in a sortable collection such as a List.
1) Sort the input collection.
2) Create 2 result sets.
3) Iterate over the sorted input. If the index is even put the item in result1 else put the item in result2.
List<Integer> input = new ArrayList<Integer>();
Collections.sort(input);
Set<Integer> result1 = new HashSet<Integer>();
Set<Integer> result2 = new HashSet<Integer>();
for (int i = 0; i < input.size(); i++) {
if (i % 2 == 0) {// if i is even
result1.add(input.get(i));
} else {
result2.add(input.get(i));
}
}
I seem to have got the perfect solution for this. The Java program below works perfectly. The only assumption is that the given problem has a unique solution (just one solution); this assumption implies only non-zero numbers. I am putting the program below. I request everyone to tell me if the program could fail for certain scenarios, or if it could be improved/optimized in some way. Credit to Mr Alexandru Severin's algorithm, posted as one of the answers in this thread.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class FindMinimumDifference {
static List<Integer> list1= new ArrayList<>();
static List<Integer> list2= new ArrayList<>();
public static void main(String[] args) {
int[] arr= new int[]{3,-2,9,7};
// tested for these sample data:- [1,5,9,3] ; [4,3,5,9,11] ;
//[7,5,11,2,13,15,14] ; [3,2,1,7,9,11,13] ;
//[3,1,0,5,6,9] ; [6,8,10,2,4,0] ; [3,1,5,7,0] ; [4,-1,5,-3,7] ; [3,-2,9,7]
System.out.println("the minimum possible difference is: "+returnMinDiff(arr));
System.out.println("the two resulting set of nos. are: "+list1+" and "+list2);
}
private static int returnMinDiff(int[] array){
int diff=-1;
Arrays.sort(array);
for(int a:array){
int sumOfList1=0;
int sumOfList2=0;
for(Integer i:list1){
sumOfList1+=i;
}
for(Integer i:list2){
sumOfList2+=i;
}
if(sumOfList1<=sumOfList2){
list1.add(a);
}else{
list2.add(a);
}
}
List<Integer> list3=new ArrayList<>(list1);
List<Integer> list4= new ArrayList<>(list2);
if(list3.size()!=list4.size()){ // both list should contain equal no. of entries.
//If not, add 0 to the list having lesser no. of entries
if(list3.size()<list4.size()){
list3.add(0);
}else{
list4.add(0);
}
}
Map<Integer, List<Integer>> mapOfProbables= new HashMap<Integer, List<Integer>>();
int probableValueCount=0;
for(int i=0; i<list3.size();i++){
for(int j=0; j<list4.size();j++){
if(abs(list3.get(i)-list4.get(j))
<abs(getSumOfEntries(list3)-getSumOfEntries(list4))){
List<Integer> list= new ArrayList<>();
list.add(list3.get(i));
list.add(list4.get(j));
mapOfProbables.put(probableValueCount++, list);
}
}
}
int minimumDiff=abs(getSumOfEntries(list1)-getSumOfEntries(list2));
List resultList= new ArrayList<>();
for(List probableList:mapOfProbables.values()){
list3=new ArrayList<>(list1);
list4= new ArrayList<>(list2);
list3.remove(probableList.get(0));
list4.remove(probableList.get(1));
list3.add((Integer)probableList.get(1));
list4.add((Integer)probableList.get(0));
if(minimumDiff>abs(getSumOfEntries(list3)-getSumOfEntries(list4))){ // valid exchange
minimumDiff=abs(getSumOfEntries(list3)-getSumOfEntries(list4));
resultList=probableList;
}
}
if(resultList.size()>0){ // forming the two set of nos. whose difference of sum comes out to be minimum
list1.remove(resultList.get(0));
list2.remove(resultList.get(1));
if(!resultList.get(1).equals(0) ) // (resultList.get(1).equals(0) && !list1.contains(0))
list1.add((Integer)resultList.get(1));
if(!resultList.get(0).equals(0) || (resultList.get(0).equals(0) && list2.contains(0)))
list2.add((Integer)resultList.get(0));
}
return minimumDiff; // returning the minimum possible difference
}
private static int getSumOfEntries(List<Integer> list){
int sum=0;
for(Integer i:list){
sum+=i;
}
return sum;
}
private static int abs(int i){
if(i<=0)
i=-i;
return i;
}
}
For this question, assume that we can divide the array into two subarrays such that their sums are equal. (Even when they are not equal, it will work.)
So if the sum of the elements in the array is S, your goal is to find a subset with sum S/2. You can write a recursive function for this.
int difference = Integer.MAX_VALUE;
int S; // total sum of the array, computed before the first call

public void recursiveSum(int[] array, int presentSum, int index, Set<Integer> presentSet){
    if(index == array.length){
        if(Math.abs(presentSum - (S/2)) < difference){
            difference = Math.abs(presentSum - (S/2));
            // presentSet is your answer
        }
        return;
    }
    recursiveSum(array, presentSum, index+1, presentSet); // don't consider the present element in the final solution
    presentSet.add(array[index]);
    recursiveSum(array, presentSum + array[index], index+1, presentSet); // consider the present element in the final solution
    presentSet.remove(array[index]); // backtrack so the set is consistent for the caller
}
You can also write an equivalent dynamic programming solution for this, in O(N*S) pseudo-polynomial time.
I was just demonstrating the idea.
So when you find this set with sum S/2, you have automatically divided the array into two parts with the same sum (S/2 here).
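For reference, a sketch of the dynamic programming version mentioned above, assuming non-negative integers (a classic subset-sum table: mark every reachable subset sum, then take the reachable sum closest to S/2; minDifference is a name I've introduced):

public static int minDifference(int[] array) {
    int s = 0;
    for (int v : array) s += v; // S, the total sum
    boolean[] reachable = new boolean[s + 1];
    reachable[0] = true; // the empty subset
    for (int v : array) {
        for (int sum = s; sum >= v; sum--) { // downwards, so each element is used at most once
            if (reachable[sum - v]) reachable[sum] = true;
        }
    }
    int best = 0;
    for (int sum = 0; sum <= s / 2; sum++) {
        if (reachable[sum]) best = sum; // reachable sum closest to S/2 from below
    }
    return s - 2 * best; // difference between the two subset sums
}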
It seems that you are more interested in the algorithm than the code, so here is my pseudocode:
int A[];//This contains your elements in sorted (descending) order
int a1[],a2[];//The two sub-arrays
int sum1=0,sum2=0;//These store the sum of the elements of the 2 subarrays respectively
for(i=0;i<A.length;i++)
{
//Calculate the absolute difference of the sums for each element and then add accordingly and thereafter update the sum
if(abs(sum1+A[i]-sum2)<=abs(sum2+A[i]-sum1))
{a1.add(A[i]);
sum1+=A[i];}
else
{a2.add(A[i]);
sum2+=A[i];}
}
This will work for all integers, positive or negative.
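For reference, a direct, runnable Java rendering of the pseudocode above (the class name is mine); on the question's example [5,4,8,2] it produces [8,2] and [5,4] with difference 1:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GreedyPartition {
    public static void main(String[] args) {
        int[] a = {5, 4, 8, 2};
        Arrays.sort(a); // ascending; walk backwards below to process in descending order
        List<Integer> a1 = new ArrayList<>(), a2 = new ArrayList<>();
        int sum1 = 0, sum2 = 0;
        for (int i = a.length - 1; i >= 0; i--) {
            // add to whichever group keeps the running difference smaller
            if (Math.abs(sum1 + a[i] - sum2) <= Math.abs(sum2 + a[i] - sum1)) {
                a1.add(a[i]);
                sum1 += a[i];
            } else {
                a2.add(a[i]);
                sum2 += a[i];
            }
        }
        System.out.println(a1 + " " + a2 + " -> difference " + Math.abs(sum1 - sum2));
    }
}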

Code chef: ODD, fastest way to calculate the nearest power of 2 for a given large number?

The link to the problem on codechef is:
http://www.codechef.com/problems/DCE05
The problem is:
The contestants have to stand in a line. They are given the numbers in the order in which they stand, starting from 1. The captain then removes all the contestants that are standing at an odd position.
Initially, standing people have numbers - 1,2,3,4,5...
After first pass, people left are - 2,4,...
After second pass - 4,....
And so on.
You want to board the ship as a crew member. Given the total number of applicants for a position, find the best place to stand in the line so that you are selected.
Input
First line contains the number of test cases t (t<=10^5). The next t lines contain integer n, the number of applicants for that case. (n<=10^9)
Output
Display t lines, each containing a single integer, the place where you would stand to win a place at TITANIC.
Example
Input:
2
5
12
Output:
4
8
I noticed a pattern:
For 1 : Output=1 (2^0)
For 2 : Output=2 (2^1)
For 3 : Output=2 (2^1)
For 4 : Output=4 (2^2)
For 5 : Output=4 (2^2)
For 6 : Output=4 (2^2)
For 7 : Output=4 (2^2)
For 8 : Output=8 (2^3) and so on
So the answer every time is the largest power of 2 that is <= the number.
Here's my code:
import java.io.*;
public class Main {
public static void main(String[] args) throws NumberFormatException, IOException
{
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
int n=Integer.parseInt(br.readLine());
int x=0;
int output[]=new int[n];
for(int i = 0; i < n; i++)
{
output[i]=(int) Math.pow(2, Math.floor(Math.log(Integer.parseInt(br.readLine()))/Math.log(2)));
}
for(int i=0; i<n;i++)
{
System.out.println(output[i]);
}
}
}
Approach 1: I used Math.pow() to calculate powers of two in a loop while the power remained <= the number, which I suppose was very inefficient.
Approach 2: I replaced Math.pow() with multiplication by 2 in the loop. (Still time exceeded)
Approach 3: I replaced the multiplication by 2 with a left shift in the loop. (Still time exceeded)
Approach 4: I replaced the loop with the log-2 logic I found on Stack Overflow. (Still time exceeded)
Still it's showing time exceeded.
What is the fastest way to do this?
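For reference, the JDK has this operation built in: Integer.highestOneBit(n) returns the highest set bit of n, which for positive n is exactly the largest power of two <= n, computed with a few shifts and no floating point. A quick check against the sample above:

for (int n : new int[]{5, 12}) {
    System.out.println(Integer.highestOneBit(n)); // prints 4, then 8
}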
Rest assured that Integer.highestOneBit is the fastest part of your code. Even your original version was most probably way faster than the parsing and formatting. I tried it myself and succeeded with:
final StringBuilder sb = new StringBuilder();
for (int i = 0; i < n; i++) {
if (sb.length() > 100000) {
System.out.print(sb);
sb.delete(0, sb.length());
}
final int x = Integer.parseInt(br.readLine());
final int y = Integer.highestOneBit(x);
sb.append(y).append("\n");
}
System.out.print(sb);
The computation is trivial, so I guessed the problem was the output and added some buffering. There's probably a simpler way, but I don't care.
Another possibility is that the site works non-deterministically and I was just lucky.

Find Survivor when n people are sitting in a circle

Hi, I came across this problem and am trying to solve it:
Take a second to imagine that you are in a room with 100 chairs arranged in a circle. These chairs are numbered sequentially from One to One Hundred.
At some point in time, the person in chair #1 will be told to leave the room. The person in chair #2 will be skipped, and the person in chair #3 will be told to leave. Next to go is person in chair #6. In other words, 1 person will be skipped initially, and then 2, 3, 4.. and so on. This pattern of skipping will keep going around the circle until there is only one person remaining.. the survivor. Note that the chair is removed when the person leaves the room.Write a program to figure out which chair the survivor is sitting in.
I made good progress but am stuck with an issue: after the count reaches 100, I am not sure how to iterate from there. Can anyone help me? This is my code:
import java.util.ArrayList;
public class FindSurvivor {
public static void main(String[] args) {
System.out.println(getSurvivorNumber(10));
}
private static int getSurvivorNumber(int numChairs) {
// Handle bad input
if (numChairs < 1) {
return -1;
}
// Populate chair array list
ArrayList<Integer> chairs = new ArrayList<Integer>();
for (int i = 0; i < numChairs; i++) {
chairs.add(i + 1);
}
int chairIndex = 0;
int lr =0;
while (chairs.size() > 1) {
chairs.remove(lr);
chairIndex+=1;
System.out.println(lr+" lr, size "+chairs.size()+" index "+chairIndex);
if(lr==chairs.size()||lr==chairs.size()-1)
lr=0;
lr = lr+chairIndex;
printChair(chairs);
System.out.println();
}
return chairs.get(0);
}
public static void printChair(ArrayList<Integer> chairs){
for(int i : chairs){
System.out.print(i);
}
}
}
The answer is 31. Here are three different implementations:
var lastSurvivor = function(skip, count, chairs) {
//base case checks to see if there is a lone survivor
if (chairs.length === 1)
return chairs[0];
//remove chairs when they are left/become dead
chairs.splice(skip, 1);
//increment the skip count so we know which chair
//to leave next.
skip = (skip + 1 + count) % chairs.length;
count++;
//recursive call
return lastSurvivor(skip, count, chairs);
};
/** TESTS *******************************************************************
----------------------------------------------------------------------------*/
var chairs = [];
for (var i = 1; i <= 100; i++)
chairs.push(i);
var result = lastSurvivor(0, 0, chairs);
console.log('The lone survivor is located in chair #', result);
// The lone survivor is located in chair # 31
/** ALTERNATE IMPLEMENTATIONS ***********************************************
-----------------------------------------------------------------------------
/* Implementation 2
-----------------*/
var lastSurvivor2 = function(chairs, skip) {
skip++;
if (chairs === 1)
return 1;
else
return ((lastSurvivor2(chairs - 1, skip) + skip - 1) % chairs) + 1;
};
/** Tests 2 *******************************************************************/
var result = lastSurvivor2(100, 0);
console.log('The lone survivor is located in chair #', result);
// The lone survivor is located in chair # 31
/* Implementation 3
------------------*/
var chairs2 = [];
for (var i = 1; i <= 100; i++)
chairs2.push(i);
var lastSurvivor3 = function(chairs, skip) {
var count = 0;
while (chairs.length > 1) {
chairs.splice(skip, 1);
skip = (skip + 1 + count) % chairs.length;
count++;
}
return chairs[0];
};
/** Tests 3 *******************************************************************/
var result = lastSurvivor3(chairs2, 0);
console.log('The lone survivor is located in chair #', result);
// The lone survivor is located in chair # 31
I'm not sure what your removal pattern is but I'd probably implement this as a circular linked list where the 100th seat holder will connect back to the 1st seat holder. If you use an array, you will have to worry about re-organizing the seats after every removal.
There is an elegant analytical solution:
Let's change the numbering of the people: #2 -> #1, #3 -> #2, ..., #1 -> #100 (at the end we just need to subtract 1 to "fix" the result). Now the first person remains instead of leaving. Suppose there are only 64 people in the circle. It's easy to see that after the first elimination pass 32 people remain and the numbering starts again from #1. So in the end only #1 will remain.
We have 100 people. After 36 people leave the circle we end up with 64 people, and we know how to solve that case. For each person that leaves the room one person remains, so the circle of 64 people starts from 1 + 2*36 = #73 (the new #1). Because of the renumbering in the first step, the final answer is #72.
In the general case, res = 2*(N - closest_smaller_pow_2) = 2*N - closest_larger_pow_2. The code is trivial:
public static long remaining(long total) {
long pow2 = 1;
while (pow2 < total) {
pow2 *= 2;
}
return 2*total - pow2;
}
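For example, remaining(100) computes pow2 = 128 and returns 2*100 - 128 = 72, matching the walkthrough above.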
Also, this algorithm has O(log(N)) complexity instead of O(N), so it's possible to calculate the function for huge inputs (it can easily be adapted to use BigInteger instead of long).
First, let's assume the chairs are numbered from 0. We'll switch the numbering back at the end -- but usually things are simpler when items are enumerated from 0 rather than 1.
Now, if you've got n people and you start eliminating at chair x (x is 0 or 1) then in a single pass through you're going to eliminate half the people. Then you've got a problem of roughly half the size (possibly plus one), and if you solve that, you can construct the solution to the original problem by multiplying that sub-result by 2 and maybe adding one.
To code this, it's simply a matter of getting the 4 cases (n odd or even vs x 0 or 1) right. Here's a version that gets the 4 cases right by using bitwise trickery.
public static long j2(long n, long x) {
if (n == 1) return 0;
return 2*j2(n/2 + (n&x), (n&1)^x) + 1-x; // double the sub-result, then add one unless x is 1
}
A solution with chairs numbered from 1 and without the extra argument can now be written:
public static long remaining(long n) {
return 1 + j2(n, 0);
}
This runs in O(log n) time and uses O(log n) memory.
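For example, remaining(100) returns 72, matching the analytical solution above.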
If your step is incremental, you can use the following code:
int cur = 0;
int step = 1;
while (chairs.size() > 1) {
chairs.remove(cur);
cur += ++step;
cur %= chairs.size();
}
return chairs.get(0);
If your step is fixed to 1 then, based on the explanation provided by @Jarlax, you can solve the problem with one line of code in O(log n) time:
//for long values
public static long remaining(long numChairs) {
return (numChairs << 1) - (long)Math.pow(2,Long.SIZE - Long.numberOfLeadingZeros(numChairs));
}
//for BigInteger values
public static BigInteger remaining(BigInteger numChairs) {
return numChairs.shiftLeft(1).subtract(new BigInteger("2").pow(numChairs.bitLength()));
}
However, if you stick with ArrayLists, no extra variables are required in your code. Always remove the first element, then remove-and-re-add the next at the end of the list. This is, however, O(n).
while (chairs.size() > 1) {
chairs.remove(0);
chairs.add(chairs.remove(0));
}
return chairs.get(0);

Graph colouring algorithm: typical scheduling problem

I'm practicing coding problems (UVa-style), and I have this one in which I have to, given a set of n exams and k students enrolled in the exams, find whether it is possible to schedule all exams in two time slots.
Input
Several test cases. Each one starts with a line containing the number 1 < n < 200 of different examinations to be scheduled.
The 2nd line has the number of cases k in which there is at least 1 student enrolled in 2 examinations. Then k lines follow, each containing 2 numbers that specify the pair of examinations for each case above.
(An input with n = 0 means the end of the input and is not to be processed.)
Output:
You have to decide whether the examination plan is possible or not for 2 time slots.
Example:
Input:
3
3
0 1
1 2
2 0
9
8
0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8
0
Output:
NOT POSSIBLE.
POSSIBLE.
I think the general approach is graph colouring, but I'm really a newbie and I must confess I had some trouble understanding the problem.
Anyway, I'm trying to do it and then submit it.
Could someone please help me write some code for this problem?
I will have to understand this algorithm thoroughly now in order to use it later, over and over.
I prefer C or C++, but if you want, Java is fine to me ;)
Thanks in advance
You are correct that this is a graph coloring problem. Specifically, you need to determine if the graph is 2-colorable. This is trivial: do a DFS on the graph, coloring nodes alternately black and white. If you find a conflict, then the graph is not 2-colorable, and the scheduling is impossible.
possible = true

for all vertex V
    color[V] = UNKNOWN

for all vertex V
    if color[V] == UNKNOWN
        colorify(V, BLACK, WHITE)

procedure colorify(V, C1, C2)
    color[V] = C1
    for all edge (V, V2)
        if color[V2] == C1
            possible = false
        if color[V2] == UNKNOWN
            colorify(V2, C2, C1)
This runs in O(|V| + |E|) with an adjacency list.
In practice the question is whether you can partition the n examinations into two subsets A and B (two time slots) such that for every pair (a, b) in the list of k examination pairs, either a belongs to A and b belongs to B, or a belongs to B and b belongs to A.
You are right that it is a 2-coloring problem; it's a graph with n vertices and an undirected edge between vertices a and b iff the pair (a, b) appears in the list. Then the question is about the graph's 2-colorability, the two colors denoting the partition into time slots A and B.
A 2-colorable graph is a "bipartite graph". You can test for bipartiteness easily, see http://en.wikipedia.org/wiki/Bipartite_graph.
I've translated polygenelubricant's pseudocode to Java in order to provide a solution to my problem. We have a submission platform (like UVa/ACM contests), so I know it passed, even on the problem's larger and harder cases.
Here it is:
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.Scanner;
/**
*
* @author newba
*/
public class GraphProblem {
class Edge {
int v1;
int v2;
public Edge(int v1, int v2) {
this.v1 = v1;
this.v2 = v2;
}
}
public GraphProblem () {
Scanner cin = new Scanner(System.in);
while (cin.hasNext()) {
int num_exams = cin.nextInt();
if (num_exams == 0)
break;
int k = cin.nextInt();
Hashtable<Integer,String> exams = new Hashtable<Integer, String>();
ArrayList<Edge> edges = new ArrayList<Edge>();
for (int i = 0; i < k; i++) {
int v1 = cin.nextInt();
int v2 = cin.nextInt();
exams.put(v1,"UNKNOWN");
exams.put(v2,"UNKNOWN");
//add the edge from A->B and B->A
edges.add(new Edge(v1, v2));
edges.add(new Edge(v2, v1));
}
boolean possible = true;
for (Integer key: exams.keySet()){
if (exams.get(key).equals("UNKNOWN")){
if (!colorify(edges, exams,key, "BLACK", "WHITE")){
possible = false;
break;
}
}
}
if (possible)
System.out.println("POSSIBLE.");
else
System.out.println("NOT POSSIBLE.");
}
}
public boolean colorify (ArrayList<Edge> edges,Hashtable<Integer,String> verticesHash,Integer node, String color1, String color2){
verticesHash.put(node,color1);
for (Edge edge : edges){
if (edge.v1 == (int) node) {
if (verticesHash.get(edge.v2).equals(color1)){
return false;
}
if (verticesHash.get(edge.v2).equals("UNKNOWN")){
colorify(edges, verticesHash, edge.v2, color2, color1);
}
}
}
return true;
}
public static void main(String[] args) {
new GraphProblem();
}
}
I haven't optimized it yet; I don't have the time right now. But if you want, we can discuss it here.
Hope you enjoy it! ;)
