Seeking maximum capacity utilization for 0-1 Multidimensional Knapsack - java

The typical objective function of the 0-1 Multidimensional Knapsack problem is to maximize the total value of the items in the knapsack. A good algorithm is given in this Stack Exchange post: 0-1 Multidimensional Knapsack.
However, what if my objective function is to pack as many items into the knapsack as possible? All items have equal value. The Stack Exchange post (Knapsack problem with all profits equal to 1) claims that a one-dimensional knapsack with equal values can be solved by a greedy algorithm. Is that true? I thought the 0-1 knapsack problem was NP-hard, so a greedy algorithm may not give the optimal solution.
So my question is two-part:
1) Can a greedy algorithm give the optimal solution in this case (0-1 knapsack with equal values)?
2) How do I implement a multi-dimensional greedy algorithm? The ratio vi/wi becomes a value divided by a vector...

1.) The knapsack problem IS NP-hard. So in short, no, you cannot solve it optimally using a greedy algorithm in general. Instead, heuristic approaches exist that can get you very close.
2.) The equal-profit knapsack will likely reduce to a simple bin-packing-style problem. In this context, if all choices are equal in terms of profit, you will need to focus on a different aspect of the problem, which is probably something like "size." If that is the case, you will want to pick the smallest item you can each time, in which case a greedy algorithm is probably sufficient and can be implemented by simply looking through your choices and choosing the smallest element.
It should be noted that a linear search can add an annoying amount of overhead to your program if it is repeated a great number of times. Instead, you may want to consider using a min-heap, as this makes the smallest value available at a lower computational cost.
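For the one-dimensional, equal-value case, a minimal greedy sketch using Java's PriorityQueue as the min-heap might look like the following; how to score items in the multidimensional case is an open choice, flagged in the comments as an assumption.

import java.util.PriorityQueue;

public class GreedyPacker {
    // A minimal greedy sketch, assuming a single weight per item. For the
    // multidimensional case you would need some scalar "size" score per item
    // (e.g. the sum of its weights normalized by each capacity) - that scoring
    // choice is an assumption, not part of the answer above.
    public static int packCount(int[] weights, int capacity) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap by weight
        for (int w : weights) heap.add(w);
        int count = 0;
        while (!heap.isEmpty() && heap.peek() <= capacity) {
            capacity -= heap.poll(); // always take the smallest remaining item
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(packCount(new int[]{5, 3, 8, 1}, 10)); // prints 3 (items 1, 3, 5)
    }
}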

Related

Find the longest repeated subarray

Given an array a[0..N-1] of integers, with 0 < N < 10^5, find the size of the longest, most repeated subarray in optimal time and space. I was thinking of using a HashMap to store the indices where a number begins, and then checking, for each index, the possible subarrays it can form. But this seems inefficient, so I would like to hear some opinions on how I can approach this problem.
Example:
Input a = [0,1,2,1,4,1,2,1,0,5]
Expected output: Most Repeated: [1,2,1]; Size: 3
Such problems are known to have a naive approach: something that takes a lot of time and performs a brute-force search over all possible solutions.
Obviously, you are not looking for that. Whenever you are in such a situation, the answer is always Dynamic Programming. It is a fairly broad topic, difficult for beginners, and very important in computer science, which means I cannot cover it fully in this answer.
But here is one approach on how you can solve this problem.
Since a common subarray of A and B must start at some A[i] and B[j], let dp[i][j] be the length of the longest common prefix of A[i:] and B[j:]. Whenever A[i] == B[j], we know dp[i][j] = dp[i+1][j+1] + 1. Also, the answer is max(dp[i][j]) over all i, j. (For your single-array problem, take A and B to be the same array and require i != j, so a subarray is not matched against itself.)
We can perform bottom-up dynamic programming to find the answer based on this recurrence. Our loop invariant is that the answer is already calculated correctly and stored in dp for all larger i, j.
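For illustration, a rough Java sketch of that bottom-up DP (my adaptation to the single-array case; note that it permits overlapping occurrences):

// dp[i][j] = length of the common prefix of a[i:] and a[j:], with A = B = a
// and i != j so a subarray is not matched against itself.
static int longestRepeatedSubarray(int[] a) {
    int n = a.length;
    int[][] dp = new int[n + 1][n + 1];
    int best = 0;
    for (int i = n - 1; i >= 0; i--) {
        for (int j = n - 1; j >= 0; j--) {
            if (i != j && a[i] == a[j]) {
                dp[i][j] = dp[i + 1][j + 1] + 1;
                best = Math.max(best, dp[i][j]);
            }
        }
    }
    return best; // e.g. 3 for a = {0,1,2,1,4,1,2,1,0,5}
}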
Hope this helps. Good luck.
I believe this problem is equivalent to finding the longest repeated substring: https://en.wikipedia.org/wiki/Longest_repeated_substring_problem. If overlapping substrings are allowed, the problem can be solved efficiently using a suffix tree in linear time. However, in this case it seems that identifying non-overlapping substrings is required, and that approach doesn't work. At the least, it can be solved in quadratic time using dynamic programming: https://www.geeksforgeeks.org/longest-repeating-and-non-overlapping-substring/.

Recursive Knapsack in Java

I have read many variations of the Knapsack problem, but the version I am tasked with is a little different and I don't quite understand how to solve it.
I have an array of integers that represent weights (e.g. {1,4,6,12,7,2}) and need to find just one solution that adds up to the target weight.
I understand the basic algorithm, but I cannot understand how to implement it.
First, what is my base case? Is it when the array is empty? When the target has been reached? When the target has been over-reached? Or some combination?
Second, when the target is over-reached, how do I backtrack and try the next item?
Third, what am I supposed to return? Should I return ints (in which case, am I supposed to print them out?), or do I return arrays, with the final return being the solution?
Think carefully about the problem you are trying to solve. To approach this problem, I am considering the inputs and outputs of the Knapsack algorithm.
Input: A set of integers (the knapsack) and a single integer (the proposed sum)
Output: A set of integers that add up to the proposed sum, or null.
In this way your code might look like this:
public int[] knapsackSolve(int[] sack, int prospectiveSum) {
    // your algorithm here
}
The recursive algorithm is straightforward. First compute the sum of the sack. If it equals prospectiveSum, return the sack. Otherwise iterate over the sack and, for each item, initialise a new knapsack with that item removed. Call knapsackSolve on this. If there is a solution, return it. Otherwise proceed to the next item.
For example if we call knapsackSolve({1,2,3},5) the algorithm tries 1 + 2 + 3 = 5 which is false. So it loops through {1,2,3} and calls knapsackSolve on the sublists {2,3},{1,3} and {1,2}. knapsackSolve({2,3},5) is the one that returns a solution.
This isn't a very efficient algorithm although it illustrates fairly well how complex the Knapsack problem is!
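For example, a minimal sketch of that recursion (the early exit when the sum drops below prospectiveSum is an added pruning step, not part of the description above):

public class Knapsack {
    // Returns a subset of sack summing to prospectiveSum, or null if none exists.
    public static int[] knapsackSolve(int[] sack, int prospectiveSum) {
        int sum = 0;
        for (int w : sack) sum += w;
        if (sum == prospectiveSum) return sack;
        if (sum < prospectiveSum) return null; // pruning: removing items only shrinks the sum
        for (int i = 0; i < sack.length; i++) {
            // Copy the sack with element i removed, then recurse.
            int[] smaller = new int[sack.length - 1];
            for (int j = 0, k = 0; j < sack.length; j++) {
                if (j != i) smaller[k++] = sack[j];
            }
            int[] solution = knapsackSolve(smaller, prospectiveSum);
            if (solution != null) return solution;
        }
        return null;
    }

    public static void main(String[] args) {
        int[] result = knapsackSolve(new int[]{1, 2, 3}, 5);
        System.out.println(java.util.Arrays.toString(result)); // prints [2, 3]
    }
}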
Basically the Knapsack problem is formulated as (from Wikipedia): "Given a set of items, each with a mass and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible." For your problem we are interested in weights only, so you can set all values to 1. Additionally, we only want to know whether the target weight can be reached exactly. I guess you are allowed to use each weight only once and not multiple times?
This problem can be solved nicely with dynamic programming. Are you familiar with that principle? Applying dynamic programming is much more elegant and quicker to program than backtracking. But if you are only allowed to use backtracking, use the approach user2738608 posted above.
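Not from the original answer, but for illustration: a minimal bottom-up subset-sum sketch of that dynamic-programming idea (the method name is mine).

// Bottom-up subset-sum: reachable[s] is true if some subset of the
// weights sums to exactly s. Runs in O(n * target) time.
static boolean canReachTarget(int[] weights, int target) {
    boolean[] reachable = new boolean[target + 1];
    reachable[0] = true; // the empty subset sums to 0
    for (int w : weights) {
        // Iterate downwards so each weight is used at most once (0-1 constraint).
        for (int s = target; s >= w; s--) {
            if (reachable[s - w]) reachable[s] = true;
        }
    }
    return reachable[target];
}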

Time complexity assignment

I have an assignment in my intro to programming course that I don't understand at all. I've been falling behind because of problems at home. I'm not asking you to do my assignment for me, I'm just hoping for some help for a programming noob like me.
The question is this:
Calculate the average-case time complexity for searching, adding, and removing in a
- unsorted vector
- sorted vector
- unsorted singly linked list
- sorted singly linked list
- hash table
Let n be the number of elements in the data structure, and present the solution in a table with three rows and five columns.
I'm not sure what this even means... I've read as much as I can about time complexity but I don't understand it... It's so confusing. I don't know where I would even start... Remember, I'm a novice programmer, as dumb as they come. I did really well last semester but had problems at home at the start of this one, so I missed a lot of lectures and the first assignments, and now I'm in over my head...
Maybe if someone could give me the answer and the reasoning behind it for a couple of them, I could understand it and do the others? I have a hard time learning through theory; examples work best.
Time complexity is a formula that describes how the cost of an operation varies with the number of elements. It is usually expressed using "big-O" notation: for example O(1), or constant time; O(n), where cost relates linearly to n; or O(n^2), where cost increases as the square of the size of the input. There can be others involving exponentials or logarithms. Read up on "Big-O notation".
You are being asked to evaluate five different data structures, and provide average cost for three different operations on each data structure (hence the table with three rows and five columns).
Time complexity is an abstract concept that allows us to compare the complexity of various algorithms by looking at how many operations are performed in order to handle their inputs. To be precise, the exact number of operations isn't important; the bottom line is how the number of operations scales with increasing size of the input.
Generally, the number of inputs is denoted as n and the complexity is denoted as O(p(n)), with p(n) being some expression in n. If an algorithm has O(n) complexity, it means that it scales linearly: with every new input, the time needed to run the algorithm increases by the same amount.
If an algorithm has a complexity of O(n^2), it means that the number of operations grows as the square of the number of inputs. This goes on and on, up to exponentially complex algorithms that are effectively useless for large enough inputs.
What your professor asks of you is to look at the given operations and judge how they are going to scale with the increasing size of the lists you are handling. Basically this is done by looking at the algorithm and imagining what kinds of loops are going to be necessary. For example, if the task is to pick the first element, the complexity is O(1), meaning that it doesn't depend on the size of the input. However, if you want to find a given element in the list, you already need to scan the whole list, and this costs you depending on the list size. Hope this gives you a bit of an idea of how algorithm complexity works; good luck with your assignment.
Ok, well there are a few things you have to start with first. Algorithmic complexity has a lot of heavy math behind it, so it is hard for novices to understand, especially if you try to look up Wikipedia definitions or other, more formal definitions.
A simple definition is that time-complexity is basically a way to measure how much an operation costs to perform. Alternatively, you can also use it to see how long a particular algorithm can take to run.
Complexity is described using what is known as big-O notation. You'll usually end up seeing things like O(1) and O(n). n is usually the number of elements (possibly in a structure) on which the algorithm is operating.
So let's look at a few big-O notations:
O(1): This means that the operation runs in constant time. What this means is that regardless of the number of elements, the operation always runs in constant time. An example is looking at the first element in a non-empty array (arr[0]). This will always run in constant time because you only have to directly look at the very first element in an array.
O(n): This means that the time required for the operation increases linearly with the number of elements. An example is if you have an array of numbers and you want to find the largest number. To do this, you will have to, in the worst case, look at every single number in the array until you find the largest one. Why is that? This is because you can have a case where the largest number is the last number in the array. So you cannot be sure until you have examined every number in the array. This is why the cost of this operation is O(n).
O(n^2): This means that the time required for the operation increases quadratically with the number of elements. This usually means that for each element in the set of elements, you are running through the entire set. So that is n x n, or n^2. A well-known example is the bubble-sort algorithm. In this algorithm you run through and swap adjacent elements to ensure that the array is sorted according to the order you need. The array is sorted when no more swaps need to be made. So you have multiple passes through the array, which in the worst case is equal to the number of elements in the array.
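For reference, a minimal sketch of that algorithm in Java:

// Bubble sort: repeatedly sweep the array, swapping adjacent elements
// that are out of order, until a full pass makes no swaps. In the worst
// case this takes about n passes over n elements, hence O(n^2).
static void bubbleSort(int[] arr) {
    boolean swapped = true;
    while (swapped) {
        swapped = false;
        for (int i = 0; i + 1 < arr.length; i++) {
            if (arr[i] > arr[i + 1]) {
                int tmp = arr[i];
                arr[i] = arr[i + 1];
                arr[i + 1] = tmp;
                swapped = true;
            }
        }
    }
}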
Now there are certain things in code that you can look at to get a hint to see if the algorithm is O(n) or O(n^2).
Single loops are usually O(n), since it means you are iterating over a set of elements once:
for (int i = 0; i < n; i++) {
    ...
}
Doubly-nested loops are usually O(n^2), since you are iterating over an entire set of elements for each element in the set:
for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        ...
    }
}
Now how does this apply to your homework? I'm not going to give you the answer directly, but I will give you more than enough hints to figure it out :). What I wrote above, describing big-O, should also help you. Your homework asks you to apply runtime analyses to different data structures. Well, certain data structures have certain runtime properties based on how they are set up.
For example, in a linked list, the only way you can get to an element in the middle of the list is by starting with the first element and then following the next pointer until you find the element that you want. Think about that. How many steps would it take for you to find the element that you need? What do you think those steps are related to? Does the number of elements in the list have any bearing on that? How can you represent the cost of this function using big-O notation?
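To make that traversal concrete, here is a minimal sketch (the Node class is illustrative, not from the assignment):

// Illustrative singly linked list node (not from the assignment).
class Node {
    int value;
    Node next;
}

class ListSearch {
    // Follows the next pointers from the head until the target is found.
    // Ask yourself: how many nodes does this visit in the worst case?
    static boolean contains(Node head, int target) {
        for (Node cur = head; cur != null; cur = cur.next) {
            if (cur.value == target) return true;
        }
        return false;
    }
}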
For each data structure that your teacher has asked you about, try to figure out how it is set up and work out manually what each operation (searching, adding, removing) entails. I'm talking about writing the steps out and drawing pictures of the structures on paper! This will help you immensely! Looking at that, you should have enough information to figure out the number of steps required and how it relates to the number of elements in the set.
Using this approach you should be able to solve your homework. Good luck!

Addition Chains [duplicate]

How can you compute a shortest addition chain (sac) for an arbitrary n <= 600 within one second?
Notes
This is this month's programming competition on Codility.
Addition chains are numerically very important, since they are the most economical way to compute x^n (by consecutive multiplications).
Knuth's Art of Computer Programming, Volume 2, Seminumerical Algorithms has a nice introduction to addition chains and some interesting properties, but I didn't find anything that enabled me to fulfill the strict performance requirements.
What I've tried (spoiler alert)
Firstly, I constructed a (highly branching) tree (starting 1 -> 2 -> (3 -> ..., 4 -> ...)) such that for each node n, the path from the root to n is a sac for n. But for values > 400, the runtime is about the same as for making a coffee.
Then I used that program to find some useful properties for reducing the search space. With those, I'm able to build all solutions up to 600 while making a coffee. But for a given n, I need to compute all solutions up to n. Unfortunately, Codility measures the class initialization's runtime, too...
Since the problem is probably NP-hard, I ended up hard-coding a lookup table. But since Codility asks you to construct the sac, I don't know if they had a lookup table in mind, so I feel dirty and like a cheater. Hence this question.
Update
If you think a hard-coded, full lookup table is the way to go, can you give an argument why you think a full computation/partly computed solutions/heuristics won't work?
I have just got my Golden Certificate for this problem. I will not provide a full solution because the problem is still available on the site. I will instead give you some hints:
You might consider doing a depth-first search.
There exists a minimal star chain for each n < 12509.
You need to know how to prune your search space.
You need a good lower bound for the length of the chain you are looking for.
Remember that you need just one solution, not all of them.
Good luck.
"Addition chains are numerically very important, since they are the most economical way to compute x^n (by consecutive multiplications)."
This is not true. They are not always the most economical way to compute x^n. Graham et al. proved that:
If each step in addition chain is assigned a cost equal to the product of the numbers at that step, "binary" addition chains are shown to minimize the cost.
The situation changes dramatically when we compute x^n (mod m), which is a common case, for example in cryptography.
Now, to answer your question: apart from hard-coding a table with answers, you could try a Brauer chain. A Brauer chain (aka star chain) is an addition chain where each new element is formed as the sum of the previous element and some earlier element (possibly the previous element itself). A Brauer chain is a sac for every n < 12509. Quoting Daniel J. Bernstein:
Brauer's algorithm is often called "the left-to-right 2^k-ary method", or simply "2^k-ary method". It is extremely popular. It is easy to implement; constructing the chain for n is a simple matter of inspecting the bits of n. It does not require much storage.
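As an illustrative sketch in Java (my own, not Bernstein's code), here is the simplest k = 1 case, the left-to-right binary method, which builds a star chain by scanning the bits of n; the result is not necessarily a shortest chain:

import java.util.ArrayList;
import java.util.List;

public class BinaryChain {
    // Left-to-right binary method (Brauer's 2^k-ary method with k = 1).
    // Scans the bits of n from the most significant downwards: double at
    // every step, and add 1 whenever the current bit is set. The chain is
    // a star chain, but not necessarily a shortest addition chain.
    static List<Integer> chain(int n) {
        List<Integer> chain = new ArrayList<>();
        int cur = 1;
        chain.add(cur);
        for (int bit = 30 - Integer.numberOfLeadingZeros(n); bit >= 0; bit--) {
            cur += cur;          // doubling step
            chain.add(cur);
            if (((n >> bit) & 1) == 1) {
                cur += 1;        // add-one step for a set bit
                chain.add(cur);
            }
        }
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(chain(15)); // [1, 2, 3, 6, 7, 14, 15]
    }
}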
BTW, does anybody know a decent C/C++ implementation of Brauer's chain computation? I'm working, in part, on a comparison of exponentiation times using binary and Brauer's chains for both cases: x^n and x^n (mod m).

Fast counting of 2D sub-matrices within a large, dense 2D matrix?

What's a good algorithm for counting occurrences of a submatrix within a larger, dense matrix? If I had a single line of data I could use a suffix tree, but I'm not sure whether generalizing a suffix tree into higher dimensions is straightforward, or the best approach here.
Thoughts?
My naive solution of indexing the first element of the dense matrix to eliminate full-matrix searching provided only a modest improvement over full-matrix scanning.
What's the best way to solve this problem?
Example:
Input:
Full matrix:
123
212
421
Search matrix:
12
21
Output:
2
This sub-matrix occurs twice in the full matrix, so the output is 2. The full matrix could be 1000x1000, however, with search matrices as large as 100x100 (of varying size), and I need to process a number of search matrices in a row. Ergo, a brute-force approach is far too inefficient to meet my sub-second search time for several matrices.
For an algorithms course, I once worked an exercise in which the Rabin-Karp string-search algorithm had to be extended slightly to search for a matching two-dimensional submatrix in the way you describe.
I think if you take the time to understand the algorithm as it is described on Wikipedia, the natural way of extending it to two dimensions will be clear to you. In essence, you just make several passes over the matrix, creeping along one column at a time. There are some little tricks to keep the time complexity as low as possible, but you probably won't even need them.
Searching an N×N matrix for an M×M matrix, this approach should give you an O(N²⋅M) algorithm. With tricks, I believe it can be refined to O(N²).
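For concreteness, here is a rough sketch of that two-dimensional extension (my own code, with arbitrary hash bases and an explicit verification step to guard against hash collisions):

// Sketch of 2D Rabin-Karp: hash every K-wide window of each row, then
// roll a second hash vertically over those row hashes. Overflowing long
// arithmetic is fine here because every candidate match is verified.
static int countOccurrences(int[][] text, int[][] pat) {
    int N = text.length, C = text[0].length;
    int M = pat.length, K = pat[0].length;
    if (M > N || K > C) return 0;
    final long BR = 1_000_003L, BC = 1_000_033L; // arbitrary hash bases

    long powR = 1;
    for (int x = 0; x < K - 1; x++) powR *= BR;
    long[][] rowHash = new long[N][C - K + 1];
    for (int i = 0; i < N; i++) {
        long h = 0;
        for (int j = 0; j < K; j++) h = h * BR + text[i][j];
        rowHash[i][0] = h;
        for (int j = 1; j + K <= C; j++) { // slide the window one column right
            h = (h - text[i][j - 1] * powR) * BR + text[i][j + K - 1];
            rowHash[i][j] = h;
        }
    }

    long patHash = 0; // the pattern's row hashes, combined vertically
    for (int i = 0; i < M; i++) {
        long h = 0;
        for (int j = 0; j < K; j++) h = h * BR + pat[i][j];
        patHash = patHash * BC + h;
    }

    long powC = 1;
    for (int x = 0; x < M - 1; x++) powC *= BC;
    int count = 0;
    for (int j = 0; j + K <= C; j++) {
        long h = 0;
        for (int i = 0; i < M; i++) h = h * BC + rowHash[i][j];
        for (int i = 0; ; i++) {
            if (h == patHash && matchesAt(text, pat, i, j)) count++;
            if (i + M >= N) break;
            h = (h - rowHash[i][j] * powC) * BC + rowHash[i + M][j]; // slide down one row
        }
    }
    return count;
}

// Cell-by-cell check of a candidate match with its top-left corner at (r, c).
static boolean matchesAt(int[][] text, int[][] pat, int r, int c) {
    for (int i = 0; i < pat.length; i++)
        for (int j = 0; j < pat[0].length; j++)
            if (text[r + i][c + j] != pat[i][j]) return false;
    return true;
}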
Algorithms and Theory of Computation Handbook suggests an N^2 * log(alphabet size) solution. Given a sub-matrix to search for, first of all de-dupe its rows. Now note that if you search the large matrix row by row, at most one of the de-duped rows can appear at any position. Use Aho-Corasick to search for these rows in N^2 * log(alphabet size) time, and write down at each cell in the large matrix either null or an identifier for the matching row of the sub-matrix. Now use Aho-Corasick again to search down the columns of this matrix of row matches, and signal a match wherever all the rows are present below each other.
This sounds similar to template matching. If motivated, you could probably transform your original array with the FFT and drop a log from the brute-force search: O(N log M) instead of O(NM).
I don't have a ready answer but here's how I would start:
-- You want very fast lookup; how much time can you spend on building index structures? When brute force isn't fast enough, you need indexes.
-- What do you know about your data that you haven't told us? Are all the values in all your matrices single-digit integers?
-- If they are single-digit integers (or anything else you can represent as a single character or index value), think about linearising your 2D structures. One way to do this would be to read the matrix along the anti-diagonals running top-right to bottom-left, scanning from top-left to bottom-right. Difficult to explain in words, but read the matrix:
1234
5678
90ab
cdef
as 125369470c8adbef
(get it?)
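For what it's worth, a minimal sketch of that anti-diagonal read for a square char matrix (it reproduces the example above):

// Reads an n x n matrix along its anti-diagonals, top-left diagonal first.
static String linearize(char[][] m) {
    int n = m.length;
    StringBuilder sb = new StringBuilder();
    for (int d = 0; d <= 2 * (n - 1); d++) {
        for (int i = Math.max(0, d - n + 1); i <= Math.min(d, n - 1); i++) {
            sb.append(m[i][d - i]); // cell (i, d - i) lies on anti-diagonal d
        }
    }
    return sb.toString(); // for the 4x4 example this yields "125369470c8adbef"
}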
Now you can index your super-matrix to whatever depth your speed and space requirements demand; in my example key 1253... points to element (1,1), key abef points to element (3,3). Not sure if this works for you, and you'll have to play around with the parameters to your solution. Choose your favourite method for storing the key-value pairs: a hash, a list, or even build some indexes into the index if things get wild.
Regards
Mark
