Parse CSV file to give values in ranges Java


I have a CSV file which has data on it such as:
File % Diff
testfile0.xml 44.948
testfile1.xml 22.232
testfile2.xml 45.343
testfile3.xml 2.345
testfile4.xml -3.948
testfile5.xml 22.232
testfile6.xml 45.343
testfile7.xml 2.345
testfile8.xml -3.948
testfile9.xml 90.948
I'd like to summarize this data and categorize the % Diff results into 10% increments:
%Grouping No. of files
neg-00 2
00-10 2
10-20 0
20-30 2
30-40 0
40-50 3
50-60 0
60-70 0
70-80 0
80-90 0
90-100 1
So essentially, how can I group doubles together in ranges of 10% increments using Java? Any help or assistance would be much appreciated.

If your groups are numbered -1 to 9, then:
int groupNumber = (diff < 0) ? -1 : (int) (diff / 10);

Store the counts of the different increments in an array with 11 locations. Initialize the 11 counts to 0.
Read the double value on each line.
If the double value is negative, increment the count in the 0th index. Otherwise, truncate the double value to an integer, and then integer divide the result by 10 and add 1 to get the index of the count to increment.
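A minimal Java sketch of the array-of-counts approach. Assumptions not stated in the question: the file name and the diff are separated by a comma and/or whitespace, the header row is skipped because its last field isn't numeric, and diffs of 100% or more are lumped into the 90-100 bucket. The class and method names (PercentDiffBuckets, countBuckets) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; names are illustrative.
class PercentDiffBuckets {
    static final String[] LABELS = {
        "neg-00", "00-10", "10-20", "20-30", "30-40", "40-50",
        "50-60", "60-70", "70-80", "80-90", "90-100"
    };

    // counts[0] holds negative diffs; counts[k] (k = 1..10) holds diffs in [10*(k-1), 10*k).
    static int[] countBuckets(List<String> lines) {
        int[] counts = new int[LABELS.length];
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Assumption: fields are separated by a comma and/or whitespace.
            String[] fields = trimmed.split("[,\\s]+");
            double diff;
            try {
                diff = Double.parseDouble(fields[fields.length - 1]);
            } catch (NumberFormatException e) {
                continue; // skips the header row and any line without a numeric last field
            }
            // Negative check first; otherwise truncate, integer-divide by 10 and add 1.
            int index = (diff < 0) ? 0 : (int) diff / 10 + 1;
            // Assumption: diffs of 100% or more go into the last bucket.
            counts[Math.min(index, counts.length - 1)]++;
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Pass a CSV path as the first argument; otherwise the sample rows are used.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Paths.get(args[0]))
            : Arrays.asList("File,% Diff",
                "testfile0.xml,44.948", "testfile1.xml,22.232", "testfile2.xml,45.343",
                "testfile3.xml,2.345", "testfile4.xml,-3.948", "testfile5.xml,22.232",
                "testfile6.xml,45.343", "testfile7.xml,2.345", "testfile8.xml,-3.948",
                "testfile9.xml,90.948");
        int[] counts = countBuckets(lines);
        System.out.println("%Grouping No. of files");
        for (int k = 0; k < counts.length; k++) {
            System.out.println(LABELS[k] + " " + counts[k]);
        }
    }
}
```

The negative check has to happen before truncation: casting -3.948 to int gives -3, and -3 / 10 + 1 would put it in the 00-10 bucket instead of neg-00.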

Related

The files in a folder are stored in an n x m matrix, where n gives the number of rows and m gives the number of columns

In an interview I was asked this 2D matrix problem, but I couldn't work out the logic.
Please help me with the logic.
The files in a folder are stored in an n x m matrix, where n gives the number of rows and m gives the number of columns. The numbering system starts from (1,1).
There is a powerful virus in one of the files, and the location of that file is given by (r,c).
The virus spreads to adjacent blocks in one second. From each infected block, it takes another second to spread to its adjacent blocks, and so on.
For example, if the virus is at (1,1), it takes a second to spread to the blocks (1,2), (2,1) and (2,2). After two seconds, the infected blocks are (1,1), (1,2), (2,1), (2,2), (1,3), (2,3), (3,3), (3,2), (3,1). And so on.
So, given the values of n, m and (r,c), find the number of seconds it will take to spread through the entire folder.
Input Format
The input contains:
The first line contains t, the number of test cases.
Each test case contains two lines:
The first line contains n and m separated by a space. The next line of the test case contains (r,c), which gives the position of the virus-infected file in the folder.
Output Format
The output contains t lines, each of which contains the time needed for the virus to spread to the entire folder, in minutes and seconds. Note that if the time taken is less than a minute, the output should be x seconds. If the time is 1 second, the output should be 1 second. If the time is 1 minute, then the time should be output as 1 minute 0 seconds. See the test cases for clarity.
Sample Input
3
6 5
(2,2)
100 50
(39,5)
44 130
(1,1)
Sample Output
4 seconds
1 minute 1 second
2 minutes 9 seconds
Explanation
For the first test case, there are 6 rows and 5 columns. The virus is at position (2,2); it will take 1 second to spread to (1,1), (1,2), (1,3), (2,1), (2,3), (3,1), (3,2), (3,3). From there it will take 1 more second to spread to adjacent files. To spread to all the files, it will take a total of 4 seconds.

Just an idea (needs to be proven).
n - row count
m - column count
r - row index of virus starting pos.
c - column index of virus starting pos.
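then the answer will be Math.max(Math.max(r - 1, n - r), Math.max(c - 1, m - c)), the distance to the farthest corner. When the virus starts in the top-left part of the grid (as in all three test cases), this reduces to Math.max(n - r, m - c);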
Test cases:
Math.max(6 - 2, 5 - 2) == 4 seconds;
Math.max(100 - 39, 50 - 5) == 61 seconds == 1 minute 1 second;
Math.max(44 - 1, 130 - 1) == 129 seconds == 2 minutes 9 seconds;
Explanation:
Let's simplify and suppose we have just one row [1; 100]. If the virus starts spreading from 1, then in 99 seconds all cells would be infected.
If the virus starts from 50, then in 50 seconds infestation would be completed.
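So along a single row, the time is the distance to the farther end: max(startPos - 1, rowSize - startPos). Because the virus also spreads diagonally, it covers the row distance and the column distance at the same time, so for the whole grid the answer is the larger of the two.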
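A minimal Java sketch of this closed-form idea, assuming diagonal spread as in the problem's example, so the time is the Chebyshev distance from (r,c) to the farthest corner: max(r - 1, n - r, c - 1, m - c). The general form also covers a virus starting near the bottom-right corner, where Math.max(n - r, m - c) alone would give 0. The names (VirusSpread, secondsToInfectAll, plural, format, solve) are hypothetical, and the parser assumes the position is written as "(r,c)" with no spaces inside the parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class; names are illustrative.
class VirusSpread {
    // Chebyshev distance from (r, c) to the farthest corner of an n x m grid (1-based).
    static int secondsToInfectAll(int n, int m, int r, int c) {
        int rowDistance = Math.max(r - 1, n - r);
        int colDistance = Math.max(c - 1, m - c);
        return Math.max(rowDistance, colDistance);
    }

    // "1 second" vs "9 seconds", "1 minute" vs "2 minutes".
    static String plural(int value, String unit) {
        return value + " " + unit + (value == 1 ? "" : "s");
    }

    // Under a minute: seconds only. Otherwise both parts, e.g. "1 minute 0 seconds".
    static String format(int totalSeconds) {
        int minutes = totalSeconds / 60;
        int seconds = totalSeconds % 60;
        if (minutes == 0) {
            return plural(seconds, "second");
        }
        return plural(minutes, "minute") + " " + plural(seconds, "second");
    }

    // Reads t test cases in the problem's input format; returns one output line per case.
    static List<String> solve(Scanner in) {
        int t = in.nextInt();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < t; i++) {
            int n = in.nextInt();
            int m = in.nextInt();
            // Assumption: the position token looks like "(2,2)".
            String[] rc = in.next().replaceAll("[()]", "").split(",");
            int r = Integer.parseInt(rc[0]);
            int c = Integer.parseInt(rc[1]);
            out.add(format(secondsToInfectAll(n, m, r, c)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Swap the sample string for System.in to read real input.
        String sample = "3\n6 5\n(2,2)\n100 50\n(39,5)\n44 130\n(1,1)\n";
        for (String line : solve(new Scanner(sample))) {
            System.out.println(line);
        }
    }
}
```

For the sample input it prints 4 seconds, 1 minute 1 second and 2 minutes 9 seconds, matching the expected output.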

You should create a two-dimensional array of boolean values with n rows and m columns, initializing everything to false: true means infected, false means not infected.
Start by setting the first infected cell, (r,c), to true.
Then start a loop with the condition "while there are still cells that are not infected (false)".
Inside this loop, go through all the cells of the table: for every infected cell, find its adjacent non-infected (false) cells, if they exist, and keep them in a list. Include all eight neighbours (above, below, left, right and the four diagonals), because in the problem's example the virus spreads from (1,1) to (2,2) in one second.
The big loop ends when all the cells are infected. For every iteration of the loop, add 1 to a seconds variable; at the end, this variable holds the number of seconds needed to infect the entire folder.
Here's some pseudocode:
table initialization (n, m)
set first infected cell (r, c)
number of seconds = 0
while (not all cells infected) {
    reset the infectionExtensionList
    for (i: rows) {
        for (j: columns) {
            if (i,j is infected) {
                append to the infectionExtensionList all 8 neighbours of (i,j):
                (i-1,j-1), (i-1,j), (i-1,j+1), (i,j-1), (i,j+1), (i+1,j-1), (i+1,j), (i+1,j+1) // if they exist!
            }
        }
    }
    turn all the cells of the infectionExtensionList to infected
    number of seconds ++
}
return number of seconds
Good luck!
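The simulation can be sketched in Java as follows. It is a minimal sketch, assuming 1-based (r,c) as in the problem, and it checks all eight neighbours because the problem's example infects (2,2) from (1,1) in one second. Newly infected cells are collected first and only marked after the full scan, so a cell infected during a given second does not spread until the next one. The class name VirusSimulation is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names are illustrative.
class VirusSimulation {
    // Second-by-second simulation on an n x m grid; (r, c) is 1-based.
    static int simulate(int n, int m, int r, int c) {
        boolean[][] infected = new boolean[n][m];
        infected[r - 1][c - 1] = true;
        int infectedCount = 1;
        int seconds = 0;
        while (infectedCount < n * m) {
            List<int[]> infectionExtensionList = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < m; j++) {
                    if (!infected[i][j]) {
                        continue;
                    }
                    // All 8 neighbours, since the virus also spreads diagonally.
                    for (int di = -1; di <= 1; di++) {
                        for (int dj = -1; dj <= 1; dj++) {
                            int ni = i + di;
                            int nj = j + dj;
                            if (ni >= 0 && ni < n && nj >= 0 && nj < m && !infected[ni][nj]) {
                                infectionExtensionList.add(new int[] {ni, nj});
                            }
                        }
                    }
                }
            }
            // Infect only after the scan, so this second's new cells spread next second.
            for (int[] cell : infectionExtensionList) {
                if (!infected[cell[0]][cell[1]]) { // a cell may be listed more than once
                    infected[cell[0]][cell[1]] = true;
                    infectedCount++;
                }
            }
            seconds++;
        }
        return seconds;
    }

    public static void main(String[] args) {
        System.out.println(simulate(6, 5, 2, 2)); // first sample test case
    }
}
```

Each second rescans the whole grid, so this costs O(n * m * seconds); a breadth-first search from (r,c) with a queue would visit each cell only once, but the version above stays close to the loop described.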

Competitive Coding - Clearing all levels with minimum cost : Not passing all test cases

I was solving problems on a competitive coding website when I came across this. The problem states that:
In this game there are N levels and M types of available weapons. The levels are numbered from 0 to N-1 and the weapons are numbered from 0 to M-1. You can clear these levels in any order. In each level, some subset of these M weapons is required to clear this level. If in a particular level you need to buy x new weapons, you will pay x^2 coins for them. Also note that you can carry all the weapons you currently have to the next level. Initially, you have no weapons. Can you find the minimum number of coins required to clear all the levels?
Input Format
The first line of input contains 2 space separated integers:
N = the number of levels in the game
M = the number of types of weapons
N lines follow. The ith of these lines contains a binary string of length M. If the jth character of this string is 1, it means we need a weapon of type j to clear the ith level.
Constraints
1 <= N <= 20
1 <= M <= 20
Output Format
Print a single integer which is the answer to the problem.
Sample TestCase 1
Input
1 4
0101
Output
4
Explanation
There is only one level in this game. We need 2 types of weapons: 1 and 3. Since Ben initially has no weapons, he will have to buy these, which will cost him 2^2 = 4 coins.
Sample TestCase 2
Input
3 3
111
001
010
Output
3
Explanation
There are 3 levels in this game. The 0th level (111) requires all 3 types of weapons. The 1st level (001) requires only the weapon of type 2. The 2nd level (010) requires only the weapon of type 1. If we clear the levels in the given order (0-1-2), total cost = 3^2 + 0^2 + 0^2 = 9 coins. If we clear the levels in the order 1-2-0, it will cost 1^2 + 1^2 + 1^2 = 3 coins, which is the optimal way.
Approach
I figured out that we can calculate the minimum cost by traversing the binary strings in an order in which we purchase the minimum possible number of weapons at each level.
One possible way is to traverse the array of binary strings and calculate the cost of each level, with the array already arranged in the correct order. The correct order should be the sorted order, i.e. 001, 010, 111 for the test case above. Traversing the array in this order and summing up the cost of each level gives the correct answer.
Also, Java's sort method works fine for sorting these binary strings before looping over the array to sum up the cost of each level:
Arrays.sort(weapons);
This approach works for some of the test cases, but more than half of the test cases still fail and I can't understand what's wrong with my logic. I am using bitwise operators to calculate the number of weapons needed at each level and returning its square.
Unfortunately, I cannot see the test cases that are failing. Any help is greatly appreciated.

This can be solved by dynamic programming.
The state will be the bit mask of weapons we currently own.
The transitions will be to try clearing each of the n possible levels in turn from the current state, acquiring the additional weapons we need and paying for them.
In each of the n resulting states, we take the minimum cost of the current way to achieve it and all previously observed ways.
When we already have some weapons, some levels will actually require no additional weapons to be bought; such transitions will automatically be disregarded since in such case, we arrive at the same state having paid the same cost.
We start at the state of m zeroes, having paid 0.
The end state is the bitwise OR of all the given levels, and the minimum cost to get there is the answer.
In pseudocode:
let mask[1], mask[2], ..., mask[n] be the given bit masks of the n levels
p2m = 2 to the power of m
f[0] = 0
all f[1], f[2], ..., f[p2m-1] = infinity
for state = 0, 1, 2, ..., p2m-1:
    current_cost = f[state]
    current_ones = popcount(state) // popcount is the number of 1 bits
    for level = 1, 2, ..., n:
        new_state = state | mask[level] // the operation is bitwise OR
        new_cost = current_cost + square (popcount(new_state) - current_ones)
        f[new_state] = min (f[new_state], new_cost)
mask_total = mask[1] | mask[2] | ... | mask[n]
the answer is f[mask_total]
The complexity is O(2^m * n) time and O(2^m) memory, which should be fine for m <= 20 and n <= 20 in most online judges.
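A minimal Java translation of this dynamic programming pseudocode, assuming each level arrives as a binary string in which character j == '1' means weapon j is needed (weapon j is mapped to bit j). Unreachable states are skipped so that "infinity" is never added to. The class name MinWeaponCost is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper class; names are illustrative.
class MinWeaponCost {
    // levels[i].charAt(j) == '1' means level i needs weapon j; m is the number of weapon types.
    static int minCost(String[] levels, int m) {
        int n = levels.length;
        int[] mask = new int[n];
        int maskTotal = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                if (levels[i].charAt(j) == '1') {
                    mask[i] |= 1 << j; // weapon j -> bit j
                }
            }
            maskTotal |= mask[i];
        }
        int p2m = 1 << m;
        int[] f = new int[p2m]; // f[state] = min cost to own exactly the weapons in state
        Arrays.fill(f, Integer.MAX_VALUE); // "infinity"
        f[0] = 0;
        // state | mask[level] is never smaller than state, so by the time a state is
        // reached in this increasing loop, every way into it has already been relaxed.
        for (int state = 0; state < p2m; state++) {
            if (f[state] == Integer.MAX_VALUE) {
                continue; // unreachable state; also avoids overflow below
            }
            int currentOnes = Integer.bitCount(state);
            for (int level = 0; level < n; level++) {
                int newState = state | mask[level];
                int bought = Integer.bitCount(newState) - currentOnes;
                f[newState] = Math.min(f[newState], f[state] + bought * bought);
            }
        }
        return f[maskTotal];
    }

    public static void main(String[] args) {
        System.out.println(minCost(new String[] {"0101"}, 4));
        System.out.println(minCost(new String[] {"111", "001", "010"}, 3));
    }
}
```

For the two sample test cases it prints 4 and 3. For the levels 11100, 11110, 11111 and 00011 it returns 11, whereas clearing them in Arrays.sort order (00011, 11100, 11110, 11111) costs 4 + 9 + 0 + 0 = 13, which may explain why the sorted-order approach fails some tests.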

The dynamic programming idea by @Gassa could be extended to an A* search by estimating the min and max of the remaining cost, where
minRemaining(s)=bitCount(maxState-s)
maxRemaining(s)=bitCount(maxState-s)^2
Start with a priority queue, ordered by cost + minRemaining and containing just the empty state, and then repeatedly replace a state from this queue that has not reached maxState with at most n new states based on the n levels:
Keep track of bound = min(cost(s) + maxRemaining(s)) over the states s in the queue,
and initialize all costs with bitCount(maxState)^2 + 1
repeat:
    extract the state with the lowest cost + minRemaining
    if state != maxState
        remove state from queue
        for j in 1..n
            if (state|level[j] != state)
                cost(state|level[j]) = min(cost(state|level[j]),
                                           cost(state) + bitCount(state|level[j] - state)^2)
                if cost(state|level[j]) + minRemaining(state|level[j]) <= bound
                    add/replace state|level[j] in queue
    else break
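The idea is to skip dead ends. Consider an example from a comment, with the four levels 11100, 11110, 11111 and 00011 (each line lists a queued state with its cost and its min/max remaining cost):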
11100 cost 9 min 2 max 4
11110 cost 16 min 1 max 1
11111 cost 25 min 0 max 0
00011 cost 4 min 3 max 9
bound 13
remove 00011 and replace with 11111 (skipping 00011 since no change)
11111 cost 13 min 0 max 0
11100 cost 9 min 2 max 4
11110 cost 16 min 1 max 1
remove 11100 and replace with 11110 11111 (skipping 11100 since no change):
11111 cost 13 min 0 max 0
11110 cost 10 min 1 max 1
bound 11
remove 11110 and replace with 11111 (skipping 11110 since no change)
11111 cost 11 min 0 max 0
bound 11
In the worst case the number of operations should be similar to the dynamic programming solution, but in many cases it will be lower, and I don't know whether the worst case can occur.
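A simplified Java sketch of this best-first search, under two assumptions that differ from the pseudocode: instead of replacing a queued state it pushes a new entry and skips stale entries when they are polled (lazy deletion), and bound is the minimum of cost + maxRemaining over every state generated so far rather than only those still in the queue. Both keep bound an upper bound on the optimum, because any completion that buys k remaining weapons costs at most k^2. minRemaining never overestimates (a sum of squares of positive parts adding up to k is at least k), and it drops by exactly k when a level buys k weapons, so the first time maxState is polled its cost is optimal. The class name WeaponAStar is hypothetical, and weapon j is mapped to bit j.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical helper class; names are illustrative.
class WeaponAStar {
    static int minCost(String[] levels) {
        int n = levels.length;
        int[] level = new int[n];
        int maxState = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < levels[i].length(); j++) {
                if (levels[i].charAt(j) == '1') {
                    level[i] |= 1 << j;
                }
            }
            maxState |= level[i];
        }
        int full = Integer.bitCount(maxState);
        int[] cost = new int[maxState + 1]; // every reachable state is a subset of maxState
        Arrays.fill(cost, full * full + 1); // "infinity", as in the pseudocode
        cost[0] = 0;
        int bound = full * full; // cost(empty) + maxRemaining(empty)
        // Entries are {cost + minRemaining, cost, state}, ordered by the first field.
        PriorityQueue<int[]> queue = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        queue.add(new int[] {full, 0, 0});
        while (!queue.isEmpty()) {
            int[] top = queue.poll();
            int g = top[1];
            int state = top[2];
            if (g != cost[state]) {
                continue; // stale entry: a cheaper cost for this state was found later
            }
            if (state == maxState) {
                return g;
            }
            for (int j = 0; j < n; j++) {
                int next = state | level[j];
                if (next == state) {
                    continue; // this level needs no new weapons
                }
                int bought = Integer.bitCount(next & ~state);
                int nextCost = g + bought * bought;
                int remaining = Integer.bitCount(maxState & ~next); // minRemaining(next)
                if (nextCost < cost[next] && nextCost + remaining <= bound) {
                    cost[next] = nextCost;
                    bound = Math.min(bound, nextCost + remaining * remaining); // + maxRemaining
                    queue.add(new int[] {nextCost + remaining, nextCost, next});
                }
            }
        }
        return cost[maxState];
    }

    public static void main(String[] args) {
        System.out.println(minCost(new String[] {"11100", "11110", "11111", "00011"}));
    }
}
```

On the example levels it returns 11, the same final bound as in the trace above; note that it prunes 11110 at cost 16 right away, because the bound has already dropped to 13 within the first expansion.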

The idea behind this approach is to always pick, among the remaining levels, the one that needs the fewest weapons you don't already own. The current pattern (the bitwise OR of the levels cleared so far) holds the weapons collected so far.
For example, given the data
4 3
101 - 2 bits
010 - 1 bit
110 - 2 bits
101 - 2 bits
Since 010 has the fewest set bits, we compute its cost first and then update the current pattern (using bitwise OR), so the current pattern is 010.
Next we find the level with the fewest set bits with respect to the current pattern.
I do this by XOR-ing the level A with the current pattern B and then AND-ing the result with the level: (A^B)&A, which keeps exactly the bits of A that are not in B.
So the bits become:
(101^010)&101 -> 101 - 2 bits
(110^010)&110 -> 100 - 1 bit
Now the minimum is 110 (1 new bit), so we pick it, compute its cost, update the pattern, and so on.
This method returns the cost of a level string with respect to the current pattern:
private static int computeCost(String currPattern, String costString) {
    // An empty pattern means no weapons are owned yet.
    int a = currPattern.isEmpty() ? 0 : Integer.parseInt(currPattern, 2);
    int b = Integer.parseInt(costString, 2);
    // Bits set in b but not in a: the weapons that still have to be bought.
    int c = (a ^ b) & b;
    int newWeapons = Integer.bitCount(c);
    return newWeapons * newWeapons;
}
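To show the whole greedy loop, here is a minimal sketch that calls computeCost repeatedly. computeCost is repeated so the block compiles on its own, with the bit count done by Integer.bitCount; GreedyWeapons and greedyTotal are hypothetical names. Choosing the cheapest next level is a heuristic, so it is worth checking against an exhaustive method: on the levels 11100, 11110, 11111 and 00011 it picks 00011 first and totals 13, while the order 11100, 11110, 11111, 00011 costs 9 + 1 + 1 + 0 = 11.

```java
// Hypothetical helper class; names other than computeCost are illustrative.
class GreedyWeapons {
    // Cost of clearing level costString when the weapons in currPattern are already owned.
    private static int computeCost(String currPattern, String costString) {
        int a = currPattern.isEmpty() ? 0 : Integer.parseInt(currPattern, 2);
        int b = Integer.parseInt(costString, 2);
        int newWeapons = Integer.bitCount((a ^ b) & b);
        return newWeapons * newWeapons;
    }

    // Repeatedly clears the remaining level that needs the fewest new weapons.
    static int greedyTotal(String[] levels) {
        boolean[] cleared = new boolean[levels.length];
        int pattern = 0; // weapons owned so far (bitwise OR of cleared levels)
        int total = 0;
        for (int step = 0; step < levels.length; step++) {
            int best = -1;
            int bestCost = Integer.MAX_VALUE;
            for (int i = 0; i < levels.length; i++) {
                if (cleared[i]) {
                    continue;
                }
                int cost = computeCost(Integer.toBinaryString(pattern), levels[i]);
                if (cost < bestCost) {
                    bestCost = cost;
                    best = i;
                }
            }
            cleared[best] = true;
            pattern |= Integer.parseInt(levels[best], 2);
            total += bestCost;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(greedyTotal(new String[] {"101", "010", "110", "101"}));
    }
}
```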

smallest possible sum of the values of subarrays

I'm trying to understand this question so that I can solve it in Java,
but first I can't quite understand it.
Here is the question:
You are given an array a of length n and an integer c.
The value of some array b of length k is the sum of its elements except for the ⌊k/c⌋ smallest. For example, the value of the array [3, 1, 6, 5, 2] with c = 2 is 3 + 6 + 5 = 14.
Among all possible partitions of a into contiguous subarrays output the smallest possible sum of the values of these subarrays.
Input
The first line contains integers n and c (1 ≤ n, c ≤ 100 000).
The second line contains n integers a_i (1 ≤ a_i ≤ 10^9), the elements of a.
Output
Output a single integer — the smallest possible sum of values of these subarrays of some partition of a.
Examples
input
3 5
1 2 3
output
6
input
12 10
1 1 10 10 10 10 10 10 9 10 10 10
output
92
input
7 2
2 3 6 4 5 7 1
output
17
input
8 4
1 3 4 5 5 3 4 1
output
23
In the third example one of the optimal partitions is [2, 3], [6, 4, 5, 7], [1] with the values 3, 13 and 1 respectively.
My Understanding:
1) The partition is done into contiguous runs of numbers. Correct?
2) What is the significance of the integer c in the input?
3) How is the partition done in the third example? I mean, after forming the subarrays, how does 13 come out of the second subarray?
Can anyone help me to understand the question ? I can write code myself.
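To make the definition concrete, here is a small Java helper that computes the value of one subarray, assuming (as the examples imply) that the ⌊k/c⌋ smallest elements of a length-k array are left out: 3 + 6 + 5 = 14 drops the two smallest of five elements with c = 2. It only illustrates the definition; it does not search for the best partition. SubarrayValue and value are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper class; names are illustrative.
class SubarrayValue {
    // Sum of b minus its floor(k / c) smallest elements, where k = b.length.
    static long value(int[] b, int c) {
        int[] sorted = b.clone();
        Arrays.sort(sorted);
        int skip = sorted.length / c; // integer division is the floor for positive ints
        long sum = 0; // long, since a_i can be up to 10^9
        for (int i = skip; i < sorted.length; i++) {
            sum += sorted[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Third example, c = 2: the partition [2, 3], [6, 4, 5, 7], [1].
        System.out.println(value(new int[] {2, 3}, 2));
        System.out.println(value(new int[] {6, 4, 5, 7}, 2));
        System.out.println(value(new int[] {1}, 2));
    }
}
```

For [6, 4, 5, 7], k = 4 and c = 2, so the two smallest elements (4 and 5) are dropped and the value is 6 + 7 = 13; the three parts give 3 + 13 + 1 = 17. In the first example c = 5 exceeds every possible subarray length, so ⌊k/5⌋ = 0, nothing is ever dropped, and the answer is just 1 + 2 + 3 = 6.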

Big O notation and not understanding from class lecture

I have this problem where my professor said in class that the statement below is O(log(n)), whereas I thought it was O(n). Could someone please clarify how it is O(log(n))?
Printing a number of magnitude n in binary. Assume that printing each bit requires constant time.

You should work out some examples. Write some numbers in binary. For example, how many bits are there in 63, 255, and 511? Notice that the number of bits does not grow nearly as quickly as the number itself.

It's O(log(n)) because you have to divide by 2 every time you are going to print a 0 or 1.
For example, to print 256 in binary, you'd divide by 2 repeatedly starting from 256 and print the result of % 2 each time (this produces the bits from least to most significant):
256 % 2 -> 0
128 % 2 -> 0
64 % 2 -> 0
32 % 2 -> 0
16 % 2 -> 0
8 % 2 -> 0
4 % 2 -> 0
2 % 2 -> 0
1 % 2 -> 1
So, for a number of magnitude 256 you iterate 9 times, which is floor(log2(256)) + 1. In general a number n has floor(log2(n)) + 1 bits, so the loop runs O(log(n)) times.
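A short Java sketch of the repeated-division loop that also reports how many iterations (bits) each number needs; BinaryPrinter and toBinary are hypothetical names.

```java
// Hypothetical helper class; names are illustrative.
class BinaryPrinter {
    // Binary digits of n >= 0 via repeated division by 2.
    // For n >= 1 the loop body runs floor(log2(n)) + 1 times, once per bit.
    static String toBinary(int n) {
        if (n == 0) {
            return "0";
        }
        StringBuilder bits = new StringBuilder();
        while (n > 0) {
            bits.append(n % 2); // least significant bit comes out first
            n /= 2;
        }
        return bits.reverse().toString(); // most significant bit first for printing
    }

    public static void main(String[] args) {
        for (int n : new int[] {63, 255, 256, 511}) {
            String bits = toBinary(n);
            System.out.println(n + " -> " + bits + " (" + bits.length() + " iterations)");
        }
    }
}
```

For 63, 255, 256 and 511 it reports 6, 8, 9 and 9 iterations: doubling n adds only one more iteration.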

O(log(n)) is about repeatedly cutting the data down by a constant factor.
When each step of an algorithm does a constant amount of work and rules out a fixed fraction of the remaining input (e.g. you always cut the space in half, or by a third, or even to 99/100 of its previous size), that algorithm runs in O(log(n)) time.
Hope this helps.

How to reduce the time complexity of bucket filling program?

I was solving a problem which states the following:
There are n buckets in a row. A gardener waters the buckets. Each day he waters the buckets between positions i and j (inclusive). He does this for t days for different i and j.
Output the volume of the waters in the buckets assuming initially zero volume and each watering increases the volume by 1.
Input: the first line contains t and n separated by a space.
The next t lines contain i and j separated by a space.
Output: a single line showing the volumes in the n buckets, separated by spaces.
Example:
Input:
2 2
1 1
1 2
Output:
2 1
Constraints:
0 <= t <= 10^4; 1 <= n <= 10^5
I tried this problem, but my algorithm is O(n*t): on each day I increment every bucket from i to j. This exceeds the time limit. Is there a more efficient algorithm to solve this problem? A small hint would suffice.
P.S.: I tagged this C++ and Java because the program can be written in either language.

Instead of remembering the amount of water in each bucket, remember the difference between each bucket and the previous one.
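This hint is the difference-array technique. A minimal Java sketch, assuming 1-based bucket positions with 1 ≤ i ≤ j ≤ n (DifferenceArrayBuckets and volumes are hypothetical names): each day touches only two entries, and one prefix-sum pass at the end recovers the volumes, for O(n + t) overall.

```java
// Hypothetical helper class; names are illustrative.
class DifferenceArrayBuckets {
    // diff[k] = volume[k] - volume[k - 1] (1-based). Watering [i, j] raises the step at i
    // and lowers it again right after j, so each day changes just two entries.
    static int[] volumes(int n, int[][] waterings) {
        int[] diff = new int[n + 2]; // slot n + 1 absorbs j == n
        for (int[] w : waterings) {
            diff[w[0]]++;
            diff[w[1] + 1]--;
        }
        int[] volume = new int[n];
        int running = 0;
        for (int k = 1; k <= n; k++) {
            running += diff[k]; // prefix sum turns differences back into volumes
            volume[k - 1] = running;
        }
        return volume;
    }

    public static void main(String[] args) {
        int[] v = volumes(2, new int[][] {{1, 1}, {1, 2}});
        StringBuilder line = new StringBuilder();
        for (int x : v) {
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(x);
        }
        System.out.println(line);
    }
}
```

For the example in the question it prints 2 1.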

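Keep two lists of the intervals, one sorted by lower bound and one sorted by upper bound.
Then iterate over the buckets 1..n, starting with a volume v of 0.
On each iteration, for bucket k:
while the next interval in the lower-bound list starts at k, increase v by one and move on to the next interval;
print v;
while the next interval in the upper-bound list ends at k, decrease v by one and move on to the next interval (intervals are inclusive, so it still counts for bucket k itself);
repeat with the next bucket.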
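A minimal Java sketch of this sweep, assuming 1-based buckets and 1 ≤ i ≤ j ≤ n (SweepBuckets and volumes are hypothetical names). Only the sorted lower bounds and the sorted upper bounds are needed, not the paired intervals, so the two lists become two sorted int arrays. The volume is decreased after the bucket is recorded because the upper bound is inclusive. The cost is O(t log t + n).

```java
import java.util.Arrays;

// Hypothetical helper class; names are illustrative.
class SweepBuckets {
    // intervals[k] = {i, j}: buckets i..j (1-based, inclusive) are watered on day k.
    static int[] volumes(int n, int[][] intervals) {
        int t = intervals.length;
        int[] lower = new int[t];
        int[] upper = new int[t];
        for (int k = 0; k < t; k++) {
            lower[k] = intervals[k][0];
            upper[k] = intervals[k][1];
        }
        Arrays.sort(lower); // the "sorted by lower bound" list
        Arrays.sort(upper); // the "sorted by upper bound" list
        int[] result = new int[n];
        int v = 0;
        int s = 0; // next unread lower bound
        int e = 0; // next unread upper bound
        for (int bucket = 1; bucket <= n; bucket++) {
            while (s < t && lower[s] <= bucket) { // intervals starting here
                v++;
                s++;
            }
            result[bucket - 1] = v;
            while (e < t && upper[e] <= bucket) { // intervals ending here (inclusive)
                v--;
                e++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(volumes(2, new int[][] {{1, 1}, {1, 2}})));
    }
}
```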

I think the key observation here is that you need a way to represent your (possibly) 10^5 buckets without allocating space for every one of them and tracking each separately. You need to come up with a sparse representation to track your buckets and the water inside.
The fact that your input comes in ranges gives you a good hint: you should probably make use of ranges in your sparse representation. You can do this by just tracking the buckets on the ends of each range.
I suggest you do this with a linked list. Each list node will contain 2 pieces of information:
a bucket number
the amount of water in that bucket
You assume that all buckets between the current bucket and the next bucket have the same volume of water.
Here's an example:
Input:
5 30
1 5
4 20
7 13
25 30
19 27
Here's what would happen on each step of the algorithm, with step 1 being the initial state, and each successive step being what you do after parsing a line.
1:0→NULL (all buckets are 0)
1:1→6:0→NULL (1-5 have 1, rest are 0)
1:1→4:2→6:1→21:0→NULL (1-3 have 1, 4-5 have 2, 6-20 have 1, rest have 0)
1:1→4:2→6:1→7:2→14:1→21:0→NULL
1:1→4:2→6:1→7:2→14:1→21:0→25:1→NULL
1:1→4:2→6:1→7:2→14:1→19:2→21:1→25:2→28:1→NULL
You should be able to infer from the above example that the complexity of this method is actually O(t^2) instead of O(n×t): each watering adds at most two nodes, so the list never holds more than 2t + 1 nodes, and each of the t updates walks at most that many. This should be much faster. As I said in my comment above, the bottleneck this way should actually be the parsing and output rather than the actual computation.
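A minimal Java sketch of this sparse linked-list representation, assuming 1-based buckets with 1 ≤ i ≤ j ≤ n; SparseBuckets, Node, split, build, render and expand are hypothetical names. Each node marks the first bucket of a run with equal volume; a watering splits runs at i and at j + 1 (when j + 1 ≤ n) and then increments every node in between.

```java
// Hypothetical helper class; names are illustrative.
class SparseBuckets {
    // One node per run of consecutive buckets that share the same volume.
    static final class Node {
        int bucket; // first bucket of the run
        int volume; // volume of every bucket up to the next node
        Node next;

        Node(int bucket, int volume, Node next) {
            this.bucket = bucket;
            this.volume = volume;
            this.next = next;
        }
    }

    // Makes sure a run starts exactly at bucket, splitting the run that contains it if needed.
    static Node split(Node head, int bucket) {
        Node cur = head;
        while (cur.next != null && cur.next.bucket <= bucket) {
            cur = cur.next;
        }
        if (cur.bucket == bucket) {
            return cur;
        }
        cur.next = new Node(bucket, cur.volume, cur.next);
        return cur.next;
    }

    static Node build(int n, int[][] waterings) {
        Node head = new Node(1, 0, null); // all buckets start at 0
        for (int[] w : waterings) {
            Node first = split(head, w[0]);
            if (w[1] + 1 <= n) {
                split(head, w[1] + 1); // the run after j keeps its old volume
            }
            for (Node cur = first; cur != null && cur.bucket <= w[1]; cur = cur.next) {
                cur.volume++;
            }
        }
        return head;
    }

    // Same notation as the trace, with "->" in place of the arrow.
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node cur = head; cur != null; cur = cur.next) {
            sb.append(cur.bucket).append(':').append(cur.volume).append("->");
        }
        return sb.append("NULL").toString();
    }

    // Expands the runs back into one volume per bucket, for the final output line.
    static int[] expand(Node head, int n) {
        int[] volume = new int[n];
        for (Node cur = head; cur != null; cur = cur.next) {
            int last = (cur.next == null) ? n : cur.next.bucket - 1;
            for (int k = cur.bucket; k <= last; k++) {
                volume[k - 1] = cur.volume;
            }
        }
        return volume;
    }

    public static void main(String[] args) {
        int[][] waterings = {{1, 5}, {4, 20}, {7, 13}, {25, 30}, {19, 27}};
        System.out.println(render(build(30, waterings)));
    }
}
```

For the 5 30 example it reproduces the last line of the trace: 1:1->4:2->6:1->7:2->14:1->19:2->21:1->25:2->28:1->NULL.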

Here's an algorithm with space and time complexity O(n)
I am using Java since I am used to it.
1) Create a hash map (or simply an array) with n entries.
2) Each time a watering is made, increase the counts of the respective entries.
3) After parsing of the input is complete, iterate over the map to calculate the result.
