knapsack-like algorithm - java

Here is a very interesting java problem I've found:
Before book printing was found the books were copied by certain people called "writers".
The bookkeeper has a stack of N books that need to be copied.For that purpose he has K writers. Each book can have a different number of pages and every writer can only take books from the top of the stack (if he takes book 1 then he can take book 2 but not book 4 or book 5). The bookkeeper knows the number of pages each book has and he needs to share the books between the writers in order for the maximum number of pages a writer has to copy to be the minimum possible.The pages of course can't be split for example you can't have a 30 page book split into 15 and 15.
For example if we have 7 books with 3 writers and the books pages accordingly: 30 50 5 60 90 5 80 then the optimal solution would be for the first writer to take the first 4 books, the second writer the next book and the 3rd the last two books so we would have:
1st = 145 pages
2nd = 90 pages
3rd = 85 pages
So the program is to write an algorithm which finds the optimal solution for sharing the pages between the writers. So in the end of the program you have to show how many pages each one got.
This was in a programming contest years ago and I wanted to give it a try and what I've found so far is that if you take the total number of pages of all the books and divide them by the number of writers you get in the example 106.66 pages and then you try to give to each writer the continuous books from the stack that are closest to that number, but that doesn't work well at all for large page numbers especially if the number of pages a book has exceeds the total number of pages divided by the number of writers
So share your opinion and give tips and hints if you'd like, mathematical or whatever, this is a very interesting algorithm to be found!

I've come up with a straight forward solution, perhaps not very efficient, but the logic works. Basically you start with the number of writers being the same number as that of the number of books and reduce until you have your number of writers.
To show with an example. Suppose you start with your seven values, 30 50 5 60 90 5 80. For each step you reduce it by one by summing up the "lowest pair". The values in bold are the pair being carried on to the next iteration.
7
30 50 5 60 90 5 80
6
30 55 60 90 5 80
5
30 55 60 90 85
4
85 60 90 85
3
145 90 85
With some pseudo programming, this example shows how it could be implemented
main(books: { 30 50 5 60 90 5 80 }, K: 3)
define main(books, K)
writers = books
while writers.length > K do
reduceMinimalPair(writers)
endwhile
end
define reduceMinimalPair(items)
index = 0
minvalue = items[0] + items[1]
for i in 1..items.length-1 do
if items[i] + items[i + 1] < minvalue then
index = i
minvalue = items[i] + items[i + 1]
endif
endfor
items.removeat(index)
items.removeat(index + 1)
items.insertat(index, minvalue)
end

Let us assume you have books 1...n with pages b1,b2,...,bn. Assume you have K writers.
Initialize a matrix F[1...n,1...K]=infinity.
Let F[i,1]= sum_j=1..i (bj)
Now, for every k=2..K
F[i,k] = min_j=1..i( max(F[j,k-1], sum_r=j+1..i (br) )

I think the way you thought is the right one, but if you are saying it did not work for big numbers then maybe you should check if a bigger number than the average exists and do something else in that case. Maybe remove the number and give it from the start to a writer or something along those lines

Alternate to solving it with Dynamic Programming, you can also binary search a upper page limit that everyone will not copy more than this number of pages. When this number converge, that's the answer.

Related

Array and file creation basic Java Array Problem

I happen to be having a hard time with a problem. A test will be coming up soon for me and I'm just not so sure as to how to get this problem done. (Nothing too advanced) My main issue is that the reading of and creating the final file. the problem is right below. Any help is appreciated, thanks!
I've been told to add a summary due to the difficulty to understand what im asking. So essentially you have a file, text.txt and the program takes said file and adds all the numbers in it, spaces make the numbers distinguished and if a zero is at the front it needs to get taken away. All numbers are separated (whatever number of spaces are between them) is all separated with a space + space and a space = at the end followed by space the result of the addition. I'm looking for help as how to:
a) Make the new file
b) Make the +s and =s
c) Transferring from the original file
d) How to do the addition (dealing with large space gaps and doing the summing)
IM STILL VERY NEW TO JAVA SO ANY LONGER EXPLANATIONS WOULD BE GREATLY APPRECIATED
The approach you are to implement is to store each integer in an array of digits, with one digit per array element. We will be using arrays of length 25, so we will be able to store integers up to 25 digits long. We have to be careful in how we store these digits. Consider, for example, storing the numbers 38423 and 27. If we store these at the “front” of the array with the leading digit of each number in index 0 of the array, then when we go to add these numbers together, we’re likely to add them like this:
38423
27
Thus, we would be adding 3 and 2 in the first column and 8 and 7 in the second column. Obviously this won’t give the right answer. We know from elementary school arithmetic that we have to shift the second number to the right to make sure that we have the appropriate columns lined up:
38423
27
To simulate this right-shifting of values, we will store each value as a sequence of exactly 25 digits, but we’ll allow the number to have leading 0’s. For example, the problem above is converted into:
0000000000000000000038423
0000000000000000000000027
Now the columns line up properly and we have plenty of space at the front in case we have even longer numbers to add to these.
The data for your program will be stored in a file called sum.txt. Each line of the input file will have a different addition problem for you to solve. Each line will have one or more integers to be added together. Take a look at the input file at the end of this write-up and the output you are supposed to produce. Notice that you produce a line of output for each input line showing the addition problem you are solving and its answer. Your output should also indicate at the end how many lines of input were processed. You must exactly reproduce this output.
You should use the techniques described in chapter 6 to open a file, to read it line by line, and to process the contents of each line. In reading these numbers, you won’t be able to read them as ints or longs because many of them are too large to be stored in an int or long. So you’ll have to read them as String values using calls on the method next(). Your first task, then, will be to convert a String of digits into an array of 25 digits. As described above, you’ll want to shift the number to the right and include leading 0’s in front. Handout #15 provides an example of how to process an array of digits. Notice in particular the use of the String method charAt and the method Character.getNumericValue. These will be helpful for solving this part of the problem.
You are to add up each line of numbers, which means that you’ll have to write some code that allows you to add together two of these numbers or to add one of them to another. This is something you learned in Elementary School to add starting from the right, keeping track of whether there is a digit to carry from one column to the next. Your challenge here is to take a process that you are familiar with and to write code that performs the corresponding task.
Your program also must write out these numbers. In doing so, it should not print any leading 0’s. Even though it is convenient to store the number internally with leading 0’s, a person reading your output would rather see these numbers without any leading 0’s.
You can assume that the input file has numbers that have 25 or fewer digits and that the answer is always 25 digits or fewer. Notice, however, that you have to deal with the possibility that an individual number might be 0 or the answer might be 0. There will be no negative integers in the input file.
You should solve this problem using arrays that are exactly 25 digits long. Certain bugs can be solved by stretching the array to something like 26 digits, but it shouldn’t be necessary to do that and you would lose style points if your arrays require more than 25 digits.
The choice of 25 for the number of digits is arbitrary (a magic number), so you should introduce a class constant that you use throughout that would make it easy to modify your code to operate with a different number of digits.
Consider the input file as an example of the kind of problems your program must solve. We might use a more complex input file for actual grading.
The Java class libraries include classes called BigInteger and BigDecimal that use a strategy similar to what we are asking you to implement in this program. You are not allowed to solve this problem using BigInteger or BigDecimal. You must solve it using arrays of digits.
The sample program of handout #18 should be particularly helpful to study to prepare you for this programming problem. It has some significant differences from the task you are asked to solve, but it also has some similarities that you will find helpful to study. The textbook has a discussion of the program starting in section 7.6.
You may assume that the input file has no errors. In particular, you may assume that each line of input begins with at least one number and that each number and each answer will be 25 digits or fewer. There will be whitespace separating the various numbers, although there is no guarantee about how much whitespace there will be between numbers.
You will again be expected to use good style throughout your program and to comment each method and the class itself. A major portion of the style points will be awarded based on how you break this program down into static methods. As with the sample program in handout #18, try to think in terms of logical subtasks of the overall task and create different methods for different subtasks. You should have at least four static methods other than main and you are welcome to introduce more than four if you find it helpful.
Your program should be stored in a file called Sum.java. You will need to include the files Scanner.java and sum.txt from the class web page (under the “assignments” link) in the same folder as your program. For those using DrJava, you will either have to use a full path name for the file sum.txt (see section 6.2.2 of the book) or you will have to put the file in the same directory as the DrJava program.
Input file sum.txt
82384
204 435
22 31 12
999 483
28350 28345 39823 95689 234856 3482 55328 934803
7849323789 22398496 8940 32489 859320
729348690234239 542890432323 534322343298
3948692348692348693486235 5834938349234856234863423
999999999999999999999999 432432 58903 34
82934 49802390432 8554389 4789432789 0 48372934287
0
0 0 0
7482343 0 4879023 0 8943242
3333333333 4723 3333333333 6642 3333333333
Output that should be produced
82384 = 82384
204 + 435 = 639
22 + 31 + 12 = 65
999 + 483 = 1482
28350 + 28345 + 39823 + 95689 + 234856 + 3482 + 55328 + 934803 = 1420676
7849323789 + 22398496 + 8940 + 32489 + 859320 = 7872623034
729348690234239 + 542890432323 + 534322343298 = 730425903009860
3948692348692348693486235 + 5834938349234856234863423 = 9783630697927204928349658
999999999999999999999999 + 432432 + 58903 + 34 = 1000000000000000000491368
82934 + 49802390432 + 8554389 + 4789432789 + 0 + 48372934287 = 102973394831
0 = 0
0 + 0 + 0 = 0
7482343 + 0 + 4879023 + 0 + 8943242 = 21304608
3333333333 + 4723 + 3333333333 + 6642 + 3333333333 = 10000011364
Total lines = 14

Competitive Coding - Clearing all levels with minimum cost : Not passing all test cases

I was solving problems on a competitive coding website when I came across this. The problem states that:
In this game there are N levels and M types of available weapons. The levels are numbered from 0 to N-1 and the weapons are numbered from 0 to M-1 . You can clear these levels in any order. In each level, some subset of these M weapons is required to clear this level. If in a particular level, you need to buy x new weapons, you will pay x^2 coins for it. Also note that you can carry all the weapons you have currently to the next level . Initially, you have no weapons. Can you find out the minimum coins required such that you can clear all the levels?
Input Format
The first line of input contains 2 space separated integers:
N = the number of levels in the game
M = the number of types of weapons
N lines follows. The ith of these lines contains a binary string of length M. If the jth character of
this string is 1 , it means we need a weapon of type j to clear the ith level.
Constraints
1 <= N <=20
1<= M <= 20
Output Format
Print a single integer which is the answer to the problem.
Sample TestCase 1
Input
1 4
0101
Output
4
Explanation
There is only one level in this game. We need 2 types of weapons - 1 and 3. Since, initially Ben
has no weapons he will have to buy these, which will cost him 2^2 = 4 coins.
Sample TestCase 2
Input
3 3
111
001
010
Output
3
Explanation
There are 3 levels in this game. The 0th level (111) requires all 3 types of weapons. The 1st level (001) requires only weapon of type 2. The 2nd level requires only weapon of type 1. If we clear the levels in the given order(0-1-2), total cost = 3^2 + 0^2 + 0^2 = 9 coins. If we clear the levels in the order 1-2-0, it will cost = 1^2 + 1^2 + 1^2 = 3 coins which is the optimal way.
Approach
I was able to figure out that we can calculate the minimum cost by traversing the Binary Strings in a way that we purchase minimum possible weapons at each level.
One possible way could be traversing the array of binary strings and calculating the cost for each level while the array is already arranged in the correct order. The correct order should be when the Strings are already sorted i.e. 001, 010, 111 as in case of the above test case. Traversing the arrays in this order and summing up the cost for each level gives the correct answer.
Also, the sort method in java works fine to sort these Binary Strings before running a loop on the array to sum up cost for each level.
Arrays.sort(weapons);
This approach work fine for some of the test cases, however more than half of the test cases are still failing and I can't understand whats wrong with my logic. I am using bitwise operators to calculate the number of weapons needed at each level and returning their square.
Unfortunately, I cannot see the test cases that are failing. Any help is greatly appreciated.
This can be solved by dynamic programming.
The state will be the bit mask of weapons we currently own.
The transitions will be to try clearing each of the n possible levels in turn from the current state, acquiring the additional weapons we need and paying for them.
In each of the n resulting states, we take the minimum cost of the current way to achieve it and all previously observed ways.
When we already have some weapons, some levels will actually require no additional weapons to be bought; such transitions will automatically be disregarded since in such case, we arrive at the same state having paid the same cost.
We start at the state of m zeroes, having paid 0.
The end state is the bitwise OR of all the given levels, and the minimum cost to get there is the answer.
In pseudocode:
let mask[1], mask[2], ..., mask[n] be the given bit masks of the n levels
p2m = 2 to the power of m
f[0] = 0
all f[1], f[2], ..., f[p2m-1] = infinity
for state = 0, 1, 2, ..., p2m-1:
current_cost = f[state]
current_ones = popcount(state) // popcount is the number of 1 bits
for level = 1, 2, ..., n:
new_state = state | mask[level] // the operation is bitwise OR
new_cost = current_cost + square (popcount(new_state) - current_ones)
f[new_state] = min (f[new_state], new_cost)
mask_total = mask[1] | mask[2] | ... | mask[n]
the answer is f[mask_total]
The complexity is O(2^m * n) time and O(2^m) memory, which should be fine for m <= 20 and n <= 20 in most online judges.
The dynamic optimization idea by #Gassa could be extended by using A* by estimating min and max of the remaining cost, where
minRemaining(s)=bitCount(maxState-s)
maxRemaining(s)=bitCount(maxState-s)^2
Start with a priority queue - and base it on cost+minRemaining - with the just the empty state, and then replace a state from this queue that has not reached maxState with at most n new states based the n levels:
Keep track bound=min(cost(s)+maxRemaining(s)) in queue,
and initialize all costs with bitCount(maxState)^2+1
extract state with lowest cost
if state!=maxState
remove state from queue
for j in 1..n
if (state|level[j]!=state)
cost(state|level[j])=min(cost(state|level[j]),
cost(state)+bitCount(state|level[j]-state)^2
if cost(state|level[j])+minRemaining(state|level[j])<=bound
add/replace state|level[j] in queue
else break
The idea is to skip dead-ends. So consider an example from a comment
11100 cost 9 min 2 max 4
11110 cost 16 min 1 max 1
11111 cost 25 min 0 max 0
00011 cost 4 min 3 max 9
bound 13
remove 00011 and replace with 11111 (skipping 00011 since no change)
11111 cost 13 min 0 max 0
11100 cost 9 min 2 max 4
11110 cost 16 min 1 max 1
remove 11100 and replace with 11110 11111 (skipping 11100 since no change):
11111 cost 13 min 0 max 0
11110 cost 10 min 1 max 1
bound 11
remove 11110 and replace with 11111 (skipping 11110 since no change)
11111 cost 11 min 0 max 0
bound 11
Number of operations should be similar to dynamic optimization in the worst case, but in many cases it will be better - and I don't know if the worst case can occur.
The logic behind this problem is that every time you have to find the minimum count of set bits corresponding to a binary string which will contain the weapons so far got in the level.
For ex :
we have data as
4 3
101-2 bits
010-1 bits
110-2 bits
101-2 bits
now as 010 has min bits we compute cost for it first then update the current pattern (by using bitwise OR) so current pattern is 010
next we find the next min set bits wrt to current pattern
i have used the logic by first using XOR for current pattern and the given number then using AND with the current number(A^B)&A
so the bits become like this after the operation
(101^010)&101->101-2 bit
(110^010)&110->100-1 bit
now we know the min bit is 110 we pick it and compute the cost ,update the pattern and so on..
This method returns the cost of a string with respect to current pattern
private static int computeCost(String currPattern, String costString) {
int a = currPattern.isEmpty()?0:Integer.parseInt(currPattern, 2);
int b = Integer.parseInt(costString, 2);
int cost = 0;
int c = (a ^ b) & b;
cost = (int) Math.pow(countSetBits(c), 2);
return cost;
}

Multi Thread Processing in Java

Im just having a few issues getting my head around this problem. Any help would be greatly appreciated.
The program must read a text file, where it will compute the sum of divisors of each input number. For example, number 20 has the sum 1+2+4+5+10=22. Then these sums are summed up line-by-line. For each of these sums above the divisor is then found, and then finally they are totaled up.
E.g Initial File
1 2 4 6 15 20
25 50 100 125 250 500
16 8 3
Then computes the sum of divisors.
1 1 3 6 9 22
6 43 117 31 218 592
15 7 1
Summed up line by line
42
1007
23
Then above sums are computed.
54
73
1
Then finally totaled up and returned.
128
I need to complete the process with each new line being completed by a threadpool.
My logic is as follows.
5.2. For each input line (Add each line to an ArrayBlockingQueue,
Then add each item in the Queue to an ExecutorService Which will run the follow)
5.2.1. Parse the current input line into integers
5.2.2. For each integer in the current input line
5.2.2.1. Compute the sum-of-divisors of this integer
5.2.2.2. Add this to the cumulated sum-of-divisors
5.2.3. Compute the sum-of-divisors of this cumulated sum
5.2.4. Add this to the grand total
I get stuck after 5.2, Do I either create a new class that implements the runnable interface and then adds the cumulated sum to an atomicArray, or is best to create a class that implements the callable interface and then get it to return the cumulated sum? Or is there a completely different way.
Here is what i have so far which returns the desired result but in a sequential matter.
http://pastebin.com/AyB58fpr
Use
java.util.concurrent.Future
and a
java.util.concurrent.Executors.newFixedThreadPool(int nThreads)
This will be really easy to do.
Follow the Oracle tutorial if you are not familiar with Executors.
I prefer the Callable interface since that doesn't create a dependency of the code which processes the input to how the output is gathered.
The usual approach is to collect the tasks in a list of Futures. See this answer for an example.

Reducing the granularity of a data set

I have an in-memory cache which stores a set of information by a certain level of aggregation - in the Students example below let's say I store it by Year, Subject, Teacher:
# Students Year Subject Teacher
1 30 7 Math Mrs Smith
2 28 7 Math Mr Cork
3 20 8 Math Mrs Smith
4 20 8 English Mr White
5 18 8 English Mr Book
6 10 12 Math Mrs Jones
Now unfortunately my cache doesn't have GROUP BY or similar functions - so when I want to look at things at a higher level of aggregation, I will have to 'roll up' the data myself. For example, if I aggregate Students by Year, Subject the aforementioned data would look like so:
# Students Year Subject
1 58 7 Math
2 20 8 Math
3 38 8 English
4 10 12 Math
My question is thus - how would I best do this in Java? Theoretically I could be pulling back tens of thousands of objects from this cache, so being able to 'roll up' these collections quickly may become very important.
My initial (perhaps naive) thought would be to do something along the following lines;
Until I exhaust the list of records:
Each 'unique' record that I come
across is added as a key to a
hashmap.
If I encounter a record that
has the same data for this new level
of aggregation, add its quantity to
the existing one.
Now for all I know this is a fairly common problem and there's much better ways of doing this. So I'd welcome any feedback as to whether I'm pointing myself in the right direction.
"Get a new cache" not an option I'm afraid :)
-Dave.
Your "initial thought" isn't a bad approach. The only way to improve on it would be to have an index for the fields on which you are aggregating (year and subject). (That's basically what a dbms does when you define an index.) Then your algorithm could be recast as iterating through all index values; you wouldn't have to check the results hash for each record.
Of course, you would have to build the index when populating the cache and maintain it as data is modified.

Interpolating Large Datasets On the Fly

Interpolating Large Datasets
I have a large data set of about 0.5million records representing the exchange rate between the USD / GBP over the course of a given day.
I have an application that wants to be able to graph this data or maybe a subset. For obvious reasons I do not want to plot 0.5 million points on my graph.
What I need is a smaller data set (100 points or so) which accurately (as possible) represents the given data. Does anyone know of any interesting and performant ways this data can be achieved?
Cheers, Karl
There are several statistical methods for reducing a large dataset to a smaller, easier to visualize dataset. It's not clear from your question what summary statistic you want. I've just assumed that you want to see how the exchange rate changes as a function of time, but perhaps you are interested in how often the exchange rate goes above a certain value, or some other statistic that I'm not considering.
Summarizing a trend over time
Here is an example using the lowess method in R (from the documentation on scatter plot smoothing):
> library(graphics)
# print out the first 10 rows of the cars dataset
> cars[1:10,]
speed dist
1 4 2
2 4 10
3 7 4
4 7 22
5 8 16
6 9 10
7 10 18
8 10 26
9 10 34
10 11 17
# plot the original data
> plot(cars, main = "lowess(cars)")
# fit a loess-smoothed line to the points
> lines(lowess(cars), col = 2)
# plot a finger-grained loess-smoothed line to the points
> lines(lowess(cars, f=.2), col = 3)
The parameter f controls how tightly the regression fits to your data. Use some thoughtfulness with this, as you want something that accurately fits your data without overfitting. Rather than speed and distance, you could plot the exchange rate versus time.
It's also straightforward to access the results of the smoothing. Here's how to do that:
> data = lowess( cars$speed, cars$dist )
> data
$x
[1] 4 4 7 7 8 9 10 10 10 11 11 12 12 12 12 13 13 13 13 14 14 14 14 15 15 15 16 16 17 17 17 18 18 18 18 19 19
[38] 19 20 20 20 20 20 22 23 24 24 24 24 25
$y
[1] 4.965459 4.965459 13.124495 13.124495 15.858633 18.579691 21.280313 21.280313 21.280313 24.129277 24.129277
[12] 27.119549 27.119549 27.119549 27.119549 30.027276 30.027276 30.027276 30.027276 32.962506 32.962506 32.962506
[23] 32.962506 36.757728 36.757728 36.757728 40.435075 40.435075 43.463492 43.463492 43.463492 46.885479 46.885479
[34] 46.885479 46.885479 50.793152 50.793152 50.793152 56.491224 56.491224 56.491224 56.491224 56.491224 67.585824
[45] 73.079695 78.643164 78.643164 78.643164 78.643164 84.328698
The data object that you get back contains entries named x and y, which correspond to the x and y values passed into the lowess function. In this case, x and y represent speed and dist.
One thought is use the DBMS to compress the data for you using an appropriate query. Something along the lines of having it take a median for a specific range, a pseudo-query:
SELECT truncate_to_hour(rate_ts), median(rate) FROM exchange_rates
WHERE rate_ts >= start_ts AND rate_ts <= end_ts
GROUP BY truncate_to_hour(rate_ts)
ORDER BY truncate_to_hour(rate_ts)
Where truncate_to_hour is something appropriate to your DBMS. Or a similar approach with some kind of function to segment the time into unique blocks (such as round to nearest 5 minute interval), or another math function to aggregate the group thats appropriate in place of median. Given the complexity of the time segmenting procedure and how your DBMS optimizes it may be more efficient to run a query on a temporary table with the segmented time value.
If you wanted to write your own, one obvious solution would be to break your record set into fixed number-of-points chunks, for which the value would be the average (mean, median, ... pick one). This has the probable advantage of being the fastest, and shows overall trends.
But it lacks the drama of price ticks. A better solution would probably involve looking for the inflection points, then selecting among them using sliding windows. This has the advantage of better displaying the actual events of the day, but will be slower.
Something like RRDTool would do what you need automatically - the tutorial should get you started, and drraw will graph the data.
I use this at work for things like error graphs, I don't need 1-minute resolution for a 6-month time period, only for the most recent few hours. After that I have 1-hour resolution for a few days, then 1-day resolution for a few months.
The naive approach is simply calculating an average per timeinterval corresponding to a pixel.
http://commons.wikimedia.org/wiki/File:Euro_exchange_rate_to_AUD.svg
This does not show flunctuations. I would suggest also calculating the standard deviation in each time interval and plot that too (essentially making each pixel higher than one single pixel). I could not locate an example, but I know that Gnuplot can do this (but is not written in Java).
How about to make enumeration/iterator wrapper. I'm not familiar with Java, but it may looks similar to:
class MedianEnumeration implements Enumeration<Double>
{
private Enumeration<Double> frameEnum;
private int frameSize;
MedianEnumeration(Enumeration<Double> e, int len) {
frameEnum = e;
frameSize = len;
}
public boolean hasMoreElements() {
return frameEnum.hasMoreElements();
}
public Double nextElement() {
Double sum = frameEnum.nextElement();
int i;
for(i=1; (i < frameSize) && (frameEnum.hasMoreElements()); ++i) {
sum += (Double)frameEnum.nextElement();
}
return (sum / i);
}
}

Categories