How to find a cycle/repeats in a string?

I need to detect a cycle/sequence in a string and return the first occurrence. How should I go about doing it?
Example :
2 0 5 3 1 5 3 1 5 3 1
The first sequence to occur is 5 3 1.
There are no rules. The sequence can be half the string length, for example
5 3123 1231 231 31 231 41 452 3453 21 312312 5 3123 1231 231 31 231 41 452 3453 21 312312
The sequence is 5 3123 1231 231 31 231 41 452 3453 21 312312

Have you studied Floyd's cycle-finding algorithm? That may well help you if you want to find cycles. It is very easy to implement as well.

Clarification based on the comments: a cycle is a sequence of digits that repeats immediately. So
1 1
is a cycle, while
1 3 1
is not, because the potential cycle of 1s is interrupted by the 3, and
1 3 1 3
is a cycle (1 3).
So a basic algorithm could look like this:
Iterate over the string.
For each digit, find its next occurrence in the string. If there is none, continue with the next digit.
If a next occurrence is found, compare the sequence from the current digit up to (but not including) that occurrence with the sequence of the same length beginning at that occurrence. If they are the same, you have found a cycle. If not, continue with the following occurrence.
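The steps above can be sketched in Java roughly as follows. This is only a minimal sketch: the class and method names are made up for illustration, and it assumes the input is split on whitespace so that multi-digit entries such as 3123 are compared as whole tokens.

```java
import java.util.Arrays;
import java.util.List;

class CycleFinder {
    /** Returns the first immediately repeating token sequence, or an empty list if there is none. */
    static List<String> firstCycle(String input) {
        String[] t = input.trim().split("\\s+");
        for (int i = 0; i < t.length; i++) {
            // Try every later occurrence j of the token found at position i.
            for (int j = i + 1; j < t.length; j++) {
                if (!t[j].equals(t[i])) {
                    continue;
                }
                int len = j - i;
                if (j + len > t.length) {
                    break; // not enough tokens left for a full repeat
                }
                // The candidate t[i..j) must be followed directly by an identical copy t[j..j+len).
                if (Arrays.equals(t, i, j, t, j, j + len)) {
                    return Arrays.asList(t).subList(i, j);
                }
            }
        }
        return List.of();
    }

    public static void main(String[] args) {
        System.out.println(firstCycle("2 0 5 3 1 5 3 1 5 3 1")); // [5, 3, 1]
        System.out.println(firstCycle("1 3 1"));                 // []
    }
}
```

The nested loops make this quadratic or worse in the number of tokens, which should be fine for short inputs like the ones in the question.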

Related

How to calculate the time complexity of a program having an unknown number of iterations

I have a program of which I am unable to find out the time complexity.
Pseudo-code of the mentioned program is as follows:
n = 10
while n > 1
    print n;
    if n is even
        n = n / 2
    else
        n = 11 * n + 1
endwhile
Can anyone suggest how to find the time complexity of this program, and of similar programs where the number of iterations is unpredictable?

This seems like a variation on the Collatz conjecture. The number of iterations you have to perform is equal to the total stopping time. It is also stated on the wiki page that it is not known whether every number eventually reaches 1.
Your case is slightly different in that it uses a different factor: 11 instead of 3. We could try all sorts of different factors. If you use 5 instead of 11 (or 3) and start with n=5, you get a sequence which repeats itself.
5 -> 5 * 5 + 1 = 26
26 -> 26 / 2 = 13
13 -> 5 * 13 + 1 = 66
66 -> 66 / 2 = 33
33 -> 5 * 33 + 1 = 166
166 -> 166 / 2 = 83
83 -> 5 * 83 + 1 = 416
416 -> 416 / 2 = 208
208 -> 208 / 2 = 104
104 -> 104 / 2 = 52
52 -> 52 / 2 = 26
26 -> 26 / 2 = 13 (back to line 2)
When trying the same with factor 11 on a couple of odd numbers, it seems to create divergent sequences: the values keep growing without entering a loop, so the program will not stop (unless it overflows your integer type). What would be left to do is to show that this always happens (with some mathematical proof, perhaps), which may or may not be an easy task.
In general you need to show that there is some bound on the number of iterations that can occur, using specific details of the algorithm in question. For instance if the question used 5 instead of 11 the answer would be that your program will loop forever on some inputs. Any 2 ** k * 5 would cause infinite looping, so in Big O notation there exists no function which can be a valid bound for the time complexity.
Collatz Conjecture: https://en.wikipedia.org/wiki/Collatz_conjecture
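To experiment with different factors, a small simulation along these lines may help. It is only a sketch: the class name, the sentinel constants and the step cap are assumptions for illustration. It uses long with overflow checks, and it reports a cycle as soon as a value repeats.

```java
import java.util.HashSet;
import java.util.Set;

class GeneralizedCollatz {
    static final long CYCLE = -1;      // a value repeated, so the loop would never end
    static final long UNDECIDED = -2;  // step cap reached or long overflow, nothing proven

    /** Returns the number of iterations until n <= 1, or one of the sentinels above. */
    static long stepsToOne(long start, long factor, int maxSteps) {
        Set<Long> seen = new HashSet<>();
        long n = start;
        for (int steps = 0; steps <= maxSteps; steps++) {
            if (n <= 1) {
                return steps;
            }
            if (!seen.add(n)) {
                return CYCLE;
            }
            if (n % 2 == 0) {
                n /= 2;
            } else {
                try {
                    n = Math.addExact(Math.multiplyExact(factor, n), 1L);
                } catch (ArithmeticException overflow) {
                    return UNDECIDED;
                }
            }
        }
        return UNDECIDED;
    }

    public static void main(String[] args) {
        System.out.println(stepsToOne(5, 5, 1000));  // -1: the factor-5 loop traced above
        System.out.println(stepsToOne(8, 11, 1000)); // 3: 8 -> 4 -> 2 -> 1
    }
}
```

An UNDECIDED result says nothing about termination; it only means the simulation gave up.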

You are right, though, that different values of n will yield different iteration counts (online compiler here: https://rextester.com/PKCTO53046):
start value 32: iteration 5
start value 31: iteration 31
start value 30: iteration 49
start value 29: iteration 67
start value 28: iteration 58
start value 27: iteration 81
start value 26: iteration 98
start value 25: iteration 89
start value 24: iteration 76
start value 23: iteration 35
start value 22: iteration 80
start value 21: iteration 71
start value 20: iteration 62
start value 19: iteration 49
start value 18: iteration 93
start value 17: iteration 71
start value 16: iteration 4
start value 15: iteration 48
start value 14: iteration 57
start value 13: iteration 97
start value 12: iteration 75
start value 11: iteration 79
start value 10: iteration 61
start value 9: iteration 92
start value 8: iteration 3
start value 7: iteration 56
start value 6: iteration 74
start value 5: iteration 60
start value 4: iteration 2
start value 3: iteration 73
start value 2: iteration 1
start value 1: iteration 0
So in your case you want to calculate how many iterations it takes for your int to overflow into a negative value, based on your starting value and the multiplier.
You need to rewrite your algorithm as a series of for loops based on the pattern that relates the starting value to the number of iterations, and use that iteration count (rather than the original n) as your N value.
You need to break your algorithm into different operations. You can see there is actually a pattern.
For example, each family of starting values can be calculated like this:
Powers of two (2, 4, 8, 16, etc.) are solved in log2(n) iterations (1, 2, 3, 4, ...).
Values of the form 3 * 2^k (3, 6, 12, 24, etc.) are solved in 73 + log2(n/3) iterations (73, 74, 75, ...).
Values of the form 5 * 2^k (5, 10, 20, etc.) are solved in (60, 61, 62, ...) iterations.
Values of the form 7 * 2^k (7, 14, 28, etc.) are solved in (56, 57, 58, ...) iterations.
etc...
You add those up and take the highest-order term for the algorithm. My guess is that the result will be O(N) after you remove all the constants.
Based on https://medium.com/dataseries/how-to-calculate-time-complexity-with-big-o-notation-9afe33aa4c46:
1. Break your algorithm/function into individual operations
2. Calculate the Big O of each operation
3. Add up the Big O of each operation together
4. Remove the constants
5. Find the highest order term: this will be what we consider the Big O of our algorithm/function
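Iteration counts of the kind listed in this answer can be gathered with a loop like the one below. This is a sketch under assumptions: it takes the answer at its word that the loop ends once 32-bit int arithmetic wraps n into a negative value, the class name and the step cap are made up, and only the power-of-two rows are checked here, not the rest of the list.

```java
class OverflowCount {
    /** Counts loop iterations with 32-bit int arithmetic, where 11 * n + 1 may silently wrap around. */
    static int iterations(int start, int maxSteps) {
        int n = start;
        int count = 0;
        // A wrapped, negative n fails the n > 1 test and ends the loop.
        while (n > 1 && count < maxSteps) {
            if (n % 2 == 0) {
                n = n / 2;
            } else {
                n = 11 * n + 1; // silent int overflow is possible here
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        for (int start = 32; start >= 1; start--) {
            System.out.println("start value " + start + ": iteration " + iterations(start, 1_000_000));
        }
    }
}
```

The step cap is there because a wrapped value can also come out positive, in which case nothing guarantees that the loop ends on its own.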

Check my math: The bouncycastle issue: Odds 2 non-equal passwords considered equal

This is the 'check if hashes are equal' code of BouncyCastle's v1.66 release of their OpenBSD BCrypt implementation:
for (int i = 0; i != sLength; i++) {
    isEqual &= (bcryptString.indexOf(i) == newBcryptString.indexOf(i));
}
where sLength is guaranteed 60 (see line 268), and bcryptString is a full openbsd-style bcrypt string, such as for example $2y$10$.vGA1O9wmRjrwAVXD98HNOgsNpDczlqm3Jq7KnEd1rVAGv3Fykk1a.
The error is the used method: They meant to use charAt.
The intent of this loop is to check, for each position from 0 to 59, if the character at position i in a is the same character at position i in b.
But, due to the erroneous use of indexOf(int), this instead checks whether the position of the first character with Unicode value i in a matches the position of the first character with Unicode value i in b, with 'not in string' matching 'not in string'.
Example: "Hello".indexOf(101) returns 1 (Java is 0-based, 101 is the Unicode value of e, and e is the second character). "Hello".indexOf(0) returns -1 (because there is no character with Unicode value 0 in "Hello").
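The difference between the two methods can be seen in a small sketch. The method names buggyEquals and fixedEquals are made up for illustration, and the two 60-character strings are invented stand-ins, not real bcrypt output. They differ in 53 positions, but only in characters whose Unicode values lie above 59.

```java
class IndexOfVsCharAt {
    /** Mirrors the faulty loop: i is treated as a character code, not as a position. */
    static boolean buggyEquals(String a, String b) {
        boolean isEqual = a.length() == b.length();
        for (int i = 0; i != a.length(); i++) {
            isEqual &= (a.indexOf(i) == b.indexOf(i));
        }
        return isEqual;
    }

    /** What was presumably intended: compare the characters at each position. */
    static boolean fixedEquals(String a, String b) {
        boolean isEqual = a.length() == b.length();
        for (int i = 0; i < a.length() && i < b.length(); i++) {
            isEqual &= (a.charAt(i) == b.charAt(i));
        }
        return isEqual;
    }

    public static void main(String[] args) {
        String real = "$2y$10$" + "a".repeat(53); // made-up string of length 60
        String fake = "$2y$10$" + "b".repeat(53); // differs from real in 53 positions
        System.out.println(buggyEquals(real, fake)); // true
        System.out.println(fixedEquals(real, fake)); // false
    }
}
```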
I'm trying to do the math to figure out the answer to following question:
If you try to log in for a given user by using an arbitrarily chosen password instead of the actual password (and let's posit that the odds that you so happen to choose the exact password are zero), what are the odds that this algorithm erroneously considers the arbitrarily chosen password as 'equal'?
Construction of the openbsd string
As far as I can tell, this: $2y$10$.vGA1O9wmRjrwAVXD98HNOgsNpDczlqm3Jq7KnEd1rVAGv3Fykk1a
breaks down as follows:
$2y$ - constant - it's a marker that means 'bcrypt string', pretty much.
10 - constant per server - number of rounds used; it's 10 almost everywhere, and will always have the same value for any given user's passhash.
$ - constant, again.
.vGA1O9wmRjrwAVXD98HNO (22 characters): The 16-byte salt padded with 2 zero bytes, then base-64ed, and then chuck out the last 2 characters. This can be used to reconstruct the salt.
the rest (31 characters): The result of bcrypt(rounds, concat(salt+utf8encode(pass))), base64-encoded, and toss out the last character.
Note that the base64 impl uses all lowercase letters, all uppercase letters, all digits, the dot, and the slash.
Basic realizations about the odds
The faulty algorithm will end up checking if the position of the first occurrence of all characters in unicode range 0 to 59 (inclusive) is the same for both hashes ("realhash" and "inhash"). If all 60 have the same 'pos of first occurrence' for both realhash and inhash, the algorithm considers the passwords equal.
Of all unicode symbols between 0 and 59, the only ones that could even be in this string are 0123456789$./.
However, of those, the $012 are irrelevant: For any passhash.indexOf('$'), the answer is 0. For any passhash.indexOf('1'), the answer is 4. Same goes for 0 and 2. That leaves just 9 characters that could possibly result in the algorithm saying "inhash" is not equal to "realhash": 3456789./.
To figure out what the odds are, we need for each of these 9 characters to not end up being a differentiator. To figure out what the odds are that one specific character (let's say the '3') fails to differentiate, then that's just 1-Z with Z being the odds that the '3' is enough to differentiate.
Z = Q * (U*A + (1-U)*B)
Q = '3' is not in the salt section of realhash. (first 22)
U = '3' is in the pass section of inhash. (last 31)
A = either '3' is not in the pass section of realhash, or it is, but its first occurrence is not in the same place as the first occurrence of '3' in inhash.
B = '3' is in the pass section of realhash.
A = 1- (V*W); V = '3' is in the pass section of realhash, W = provided '3' is in both realhash and passhash, its first occurrence is the same place.
Once Z has been determined, the odds that my arbitrary password results in this algorithm thinking it is correct, even when it isn't, is then defined by: Neither the '3', nor any of the other 8 characters was sufficient to differentiate. Thus: (1-Z)^9.
Z = Q * ( (U * (1 - (V * W))) + ((1-U) * (1-V)) )
Q = (63/64)^22 ~= 0.707184
U = 1-(63/64)^31 ~= 0.386269
V = 1-(63/64)^31 ~= 0.386269
W = 1/31 ~= 0.032258
(1 - (V * W)) ~= 0.987539
(1-U) * (1-V) ~= 0.376666
Z ~= 0.536131
Chance that the 3 fails to differentiate:
(1-Z) ~= 0.463868
Chance that all of 3456789./ fail:
(1-Z)^9 ~= 0.00099438
Thus, roughly 0.001: about one in a thousand odds that the algorithm says that 2 passwords are equal when they are not.
Am I missing anything significant?
NB: BouncyCastle's most recent public version has fixed this bug (CVE-2020-28052 tracks the problem).

For problems like this, a Monte Carlo simulation is a useful sanity check. The result I got from the simulation was 0.0044, about 4 times higher than the calculated result in the question. That seemed high to me, so I did some debugging to see where that result was coming from.
It turns out that the vast majority of the false matches are due to one very simple mechanism: the 22 character salt eliminates some of the characters-of-interest, and the remaining characters-of-interest do not appear in the rest of the hash.
As mentioned in the question, there are 9 characters-of-interest: 3456789./
If any of those appear in the salt, then the indexOf() for that character will match, and that character is no longer of interest. Monte Carlo shows that on average, 2.6 of the 9 characters appear in the salt, and are eliminated from consideration. That makes sense because the salt contains at most 22 of the base-64 characters, so about one third. Here's a sample run of the Monte Carlo simulation:
0 35645
1 156228
2 283916
3 281018
4 166381
5 61024
6 13791
7 1850
8 139
9 3
The first column is the number of characters-of-interest that were eliminated by the salt. The second column is the number of times that occurred out of 1 million attempts. For example, in 3 out of a million attempts, the salt eliminated all 9 characters-of-interest, which then guarantees a false match.
In 139 out of a million attempts, the salt eliminated 8 of the 9 characters-of-interest. The remaining character then either needs to match in the two hash strings, or it needs to be absent from both hash strings. The odds that it will be absent are (63/64) ^ 62 = 0.377.
So we can augment the table of results like so (these are Monte Carlo results):
0 35645
1 156228
2 283916
3 281018
4 166381
5 61024
6 13791
7 1850
8 139 53 0.386053
9 3 3 1.000000
The second to last line can be interpreted as follows: 8 characters-of-interest were eliminated by the salt in 139 out of 1 million attempts. Of the 139, 53 (or 38.6%) resulted in a match because the single remaining character-of-interest did not appear in the last 31 characters of either hash string.
Here are the full results:
0 35645 2 0.000070 0
1 156228 39 0.000250 5
2 283916 214 0.000757 25
3 281018 628 0.002237 64
4 166381 1056 0.006349 83
5 61024 1114 0.018260 68
6 13791 702 0.050936 32
7 1850 265 0.143467 7
8 139 53 0.386053 0
9 3 3 1.000000 0
false matches due to elimination and absence: 4076
false matches due to elimination, and matching: 284
total false matches: 4360 out of 1 million
The last column is the number of times one or more characters of interest matched indexes in the final 31 characters of the hash strings.
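A simulation of this kind could look roughly like the sketch below. It is not the code that produced the numbers above. It assumes every salt and hash character is drawn uniformly and independently from the 64-character bcrypt alphabet, that both strings share the same salt, and that the constant $2y$10$ prefix can be left out because it contains none of the characters-of-interest. The class and method names are made up.

```java
import java.util.Random;

class FalseMatchMonteCarlo {
    static final String ALPHABET =
            "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    static final String CHARS_OF_INTEREST = "3456789./";

    static String randomChars(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    /** Fraction of trials in which the indexOf-style check accepts two unrelated hashes. */
    static double falseMatchRate(int trials, long seed) {
        Random rnd = new Random(seed);
        int matches = 0;
        for (int t = 0; t < trials; t++) {
            String salt = randomChars(rnd, 22);
            String real = salt + randomChars(rnd, 31);
            String fake = salt + randomChars(rnd, 31);
            boolean equal = true;
            // Only the nine characters-of-interest can ever tell the two strings apart.
            for (int i = 0; i < CHARS_OF_INTEREST.length(); i++) {
                char c = CHARS_OF_INTEREST.charAt(i);
                equal &= (real.indexOf(c) == fake.indexOf(c));
            }
            if (equal) {
                matches++;
            }
        }
        return (double) matches / trials;
    }

    public static void main(String[] args) {
        System.out.println(falseMatchRate(1_000_000, 42L));
    }
}
```

Under these assumptions the printed rate should land in the same region as the figure of roughly 0.0044 reported above, though the exact value depends on the seed.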
Mathematical Analysis of the Problem
There are two steps to analyzing the problem:
1. Compute the odds that the salt (22 characters) will eliminate characters of interest (koi [1]).
2. Compute the odds that the characters in the tail of the hash (the last 31 characters) either match (at the first occurrence), or don't occur.
[1] I'm using "koi" as the abbreviation since the spell checker thinks it's a fish, and will leave it alone.
The decision tree for the first step looks like this:
The column headers are the number of characters in the salt seen so far. The column footers are the divisor for that column. The row labels are the number of koi left.
Column 0 of the tree:
1/1 is the probability that 9 koi remain after 0 characters have been seen.
Column 1 of the tree:
55/64 is the probability that the first character of the salt is not a koi.
9/64 is the probability that the first character of the salt is a koi, leaving 8 koi.
Column 2 of the tree:
55*55/4096 is the probability that neither of the first two characters is a koi.
55*9/4096 is the probability of a non-koi followed by a koi, leaving 8 koi left.
9*56/4096 is the probability that the first character was a koi (leaving 8) followed by a non-koi.
9*8/4096 is the probability that the first two characters are both koi, leaving 7 koi.
The full decision tree has 23 columns (0 to 22 characters in the salt), and ten rows (9 to 0 koi remaining). The last column in the decision tree (shown below) gives the odds (out of a million) for that number of koi remaining after the salt has been checked.
9 35647
8 156071
7 283872
6 281006
5 166489
4 61078
3 13837
2 1861
1 134
0 4
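The salt decision tree described above can be evaluated column by column with a small dynamic program. The following is a sketch with made-up names. It assumes, as the tree does, that each salt character is uniform over 64 symbols, so with k koi left a character is a not-yet-seen koi with probability k/64.

```java
class SaltKoi {
    /** Returns p where p[k] is the probability that k koi remain after saltLen characters. */
    static double[] remainingAfterSalt(int koi, int saltLen, int alphabet) {
        double[] p = new double[koi + 1];
        p[koi] = 1.0; // column 0: all koi remain
        for (int c = 0; c < saltLen; c++) {
            double[] next = new double[koi + 1];
            for (int k = 0; k <= koi; k++) {
                next[k] += p[k] * (alphabet - k) / alphabet;  // character is not a remaining koi
                if (k > 0) {
                    next[k - 1] += p[k] * k / alphabet;       // character eliminates one koi
                }
            }
            p = next;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] p = remainingAfterSalt(9, 22, 64);
        for (int k = 9; k >= 0; k--) {
            System.out.printf("%d %.0f%n", k, p[k] * 1_000_000);
        }
    }
}
```

The row for 9 remaining koi is simply (55/64)^22, which gives a quick way to check the first line of the table.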
The decision tree for the second step is a bit more complicated. At each of the 31 character positions, there are two characters to consider, the character in the real hash, and the character in the fake hash. Each is independently and randomly selected, so there are 64*64=4096 possibilities for each of the 31 character positions. There are four possible outcomes:
1. Neither character is a koi. The probability is (64-k)/64 * (64-k)/64, where k is the number of koi left. The number of koi remaining is unchanged.
2. The character in the real hash is not a koi, but the character in the fake hash is a koi. The probability is (64-k)/64 * k/64, and the outcome is failure.
3. The character in the real hash is a koi, and the character in the fake hash is not the exact same character. The probability is k/64 * 63/64, and the outcome is failure.
4. The character in the real hash is a koi, and the character in the fake hash matches. The probability is k/64 * 1/64, and the number of koi is reduced by one.
The decision tree starting with 9 koi looks like this:
The biggest difference compared to the salt decision tree is the addition of the failure row. The matching can fail at any column. Once a failure occurs, that failure carries forward to the last column in the decision tree. For example, the 55*9 failure in the column 1 carries forward as 55*9 * 64*64 in the column 2 (because regardless of which two characters occur at the second position, the result is still failure). The probability that the fake hash will match the real hash for a given k (the "success" rate) is 1 - failures for that k.
Hence, for each value of k, we can multiply the odds for that k (from step 1) by the success rate for that k. The table below shows the results.
The first column is k, the number of koi that remain after the salt.
The second column is the odds for that number of koi.
The third column is the number of matches that occur because none of the koi appear in the tail of either hash.
The fourth column is the number of matches that occur when one or more koi successfully match in the tail.
All values are out of one million.
9 35647 3 1
8 156071 40 6
7 283872 216 27
6 281006 628 63
5 166489 1074 85
4 61078 1117 67
3 13837 705 30
2 1861 260 7
1 134 51 1
0 4 4 0
Matches with no koi present in the tail of either hash: 4098
Matches where 1 or more koi match in the tail: 287
Total matches per million: 4385
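The second step can be evaluated the same way. The sketch below uses made-up names. It takes the four per-position outcomes listed above, drops the probability mass of the two failure outcomes, and weights the resulting success rate for each k by the step 1 odds, which are copied here (per million) from the table earlier in this answer.

```java
class TailMatchOdds {
    static final int ALPHABET = 64;
    static final int TAIL_LEN = 31;

    /** Probability that no tail position yields a failure when k0 koi are left after the salt. */
    static double successRate(int k0) {
        double[] p = new double[k0 + 1];
        p[k0] = 1.0;
        double pairs = (double) ALPHABET * ALPHABET; // 4096 character pairs per position
        for (int pos = 0; pos < TAIL_LEN; pos++) {
            double[] next = new double[k0 + 1];
            for (int k = 0; k <= k0; k++) {
                next[k] += p[k] * (ALPHABET - k) * (ALPHABET - k) / pairs; // neither is a koi
                if (k > 0) {
                    next[k - 1] += p[k] * k / pairs;                       // the same koi in both
                }
                // Everything else is a failure and is not carried forward.
            }
            p = next;
        }
        double success = 0;
        for (double v : p) {
            success += v;
        }
        return success;
    }

    /** koiLeftPerMillion[k] holds the step 1 odds (per million) that k koi remain. */
    static double matchesPerMillion(double[] koiLeftPerMillion) {
        double total = 0;
        for (int k = 0; k < koiLeftPerMillion.length; k++) {
            total += koiLeftPerMillion[k] * successRate(k);
        }
        return total;
    }

    public static void main(String[] args) {
        // Index = koi remaining (0..9), values taken from the step 1 table.
        double[] step1 = {4, 134, 1861, 13837, 61078, 166489, 281006, 283872, 156071, 35647};
        System.out.printf("%.0f matches per million%n", matchesPerMillion(step1));
    }
}
```

Because the step 1 inputs are already rounded, the total can differ slightly from an exact calculation.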

smallest possible sum of the values of subarrays

I'm trying to understand this question and solve it using Java.
But first, I'm not able to understand it properly.
Here is the question:
You are given an array a of length n and an integer c.
The value of some array b of length k is the sum of its elements except for the ⌊k/c⌋ smallest. For example, the value of the array [3, 1, 6, 5, 2] with c = 2 is 3 + 6 + 5 = 14.
Among all possible partitions of a into contiguous subarrays output the smallest possible sum of the values of these subarrays.
Input
The first line contains integers n and c (1 ≤ n, c ≤ 100 000).
The second line contains n integers ai (1 ≤ ai ≤ 10^9): the elements of a.
Output
Output a single integer — the smallest possible sum of values of these subarrays of some partition of a.
Examples
input
3 5
1 2 3
output
6
input
12 10
1 1 10 10 10 10 10 10 9 10 10 10
output
92
input
7 2
2 3 6 4 5 7 1
output
17
input
8 4
1 3 4 5 5 3 4 1
output
23
In the third example one of the optimal partitions is [2, 3], [6, 4, 5, 7], [1] with the values 3, 13 and 1 respectively.
My understanding:
1) The partition is made over contiguous elements. Correct?
2) What is the significance of the integer c in the input?
3) How is the third example worked out? I mean, once we have the subarrays, how does 13 come out of the second subarray?
Can anyone help me understand the question? I can write the code myself.
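One reading that fits every number in the statement is that the value of a subarray of length k drops its ⌊k/c⌋ smallest elements; the formula seems to have been lost from the quoted text, so treat this as an assumption. Under that reading, a small helper (the class and method names are made up) reproduces both the 14 from the definition and the 3, 13 and 1 from the third example:

```java
import java.util.Arrays;

class SubarrayValue {
    /** Sum of b without its floor(k / c) smallest elements, where k = b.length. */
    static long value(int[] b, int c) {
        int[] sorted = b.clone();
        Arrays.sort(sorted);
        int skip = sorted.length / c; // how many of the smallest elements are left out
        long sum = 0;
        for (int i = skip; i < sorted.length; i++) {
            sum += sorted[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(value(new int[] {3, 1, 6, 5, 2}, 2)); // 14
        System.out.println(value(new int[] {6, 4, 5, 7}, 2));    // 13
    }
}
```

So c controls how many elements a subarray may drop: [6, 4, 5, 7] with c = 2 drops its two smallest elements (4 and 5), leaving 6 + 7 = 13, while [1] has length 1, drops nothing and keeps its value of 1.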

Regex that blocks single digit numbers 2,3,4,5

I am pretty new to regular expressions and for some particular case need a regular expression which will block numbers:
2,3,4 and 5
... out of:
0 to 21
Specifically, this should block only single digit 2,3,4 and 5 and not 12,13,14,15 or 21 and 22 for that matter.
I tried [^\d2-5], but then it's also blocking 12, 13, 14, 15, 21, 20 and 22, which is not desired since only the 4 numbers 2, 3, 4 and 5 are to be blocked.
Any help on this would be appreciated.

For the range [0;21] excluding [2;5] range you can use the following:
^(?:[016789]|1\d|2[01])$
If you just need to exclude [2;5] range, then the following might suit you:
^(?:[016789]|[1-9]\d+)$
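A quick way to check the first pattern in Java is to run it over 0 to 22. This is a sketch with made-up class and method names. Note that the backslash in \d has to be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class RangeRegexCheck {
    // The pattern for [0;21] without [2;5], written as a Java string literal.
    static final Pattern ALLOWED = Pattern.compile("^(?:[016789]|1\\d|2[01])$");

    static boolean isAllowed(int n) {
        return ALLOWED.matcher(Integer.toString(n)).matches();
    }

    public static void main(String[] args) {
        StringBuilder rejected = new StringBuilder();
        for (int n = 0; n <= 22; n++) {
            if (!isAllowed(n)) {
                rejected.append(n).append(' ');
            }
        }
        System.out.println("Rejected: " + rejected.toString().trim()); // Rejected: 2 3 4 5 22
    }
}
```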

If I understand your question correctly, you can try finding those specific digits by delimiting them with word boundaries.
For instance:
// requires java.util.regex.Pattern and java.util.regex.Matcher
String singleDigitsToBeBlocked = "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21";
Pattern p = Pattern.compile("\\b[2-5]\\b");
Matcher m = p.matcher(singleDigitsToBeBlocked);
while (m.find()) {
    System.out.printf("Blocked: %s%n", m.group());
}
Output
Blocked: 2
Blocked: 3
Blocked: 4
Blocked: 5

What about just checking for patterns that include everything EXCEPT those single digits?
(\b[0-9]{2,})|([01])|([6-9])

It is very simple:
(?:1[0-9]|2[0-1]|[0-1]|[6-9])
You can test this here: http://www.regexr.com/39044

What was wrong with my logic on Java insertion sorts?

I had a question on a test. I got it wrong, but I don't know why.
Question: An array of integers is to be sorted from biggest to smallest using an insertion sort. Assume the array originally contains the following elements:
5 9 17 12 2 14
What will it look like after the third pass through the for loop?
17 9 5 12 2 14
17 12 9 5 2 14 - Correct answer
17 12 9 5 14 2
17 14 12 9 5 2
9 5 17 12 2 14
In an insertion sort, I thought the source array is untouched; I felt, at the third pass, the destination array would be incomplete.
How did the test arrive at this answer?

Basically, after n passes of insertion sort, the first n+1 elements are sorted in the correct order and the rest are untouched. As you can see in the alternatives, the correct answer is the only one that fulfills that. Every step is only inserting relative to the already sorted numbers, so each step one extra number is correctly sorted.
Step 0 (Original, 5 assumed sorted)
5 9 17 12 2 14
Step 1, takes the 9 and puts it in the correct place before 5 (result, 9 5 sorted)
9 5 17 12 2 14
Step 2, takes the 17 and puts it in the correct place before 9 (result 17 9 5 sorted)
17 9 5 12 2 14
Step 3, takes the 12 and puts it in the correct place after 17 and before 9 (result, 17 12 9 5 sorted)
17 12 9 5 2 14
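The passes can be reproduced with a small in-place descending insertion sort that stops after a given number of passes. This is a sketch with made-up names, not the code from the test; it works on a copy only so that the same input can be reused for several calls.

```java
import java.util.Arrays;

class DescendingInsertionSort {
    /** Runs at most `passes` passes of a descending insertion sort on a copy of the input. */
    static int[] afterPasses(int[] input, int passes) {
        int[] a = input.clone();
        for (int i = 1; i < a.length && i <= passes; i++) {
            int key = a[i];
            int j = i - 1;
            // Shift smaller elements of the sorted prefix one slot to the right.
            while (j >= 0 && a[j] < key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
        return a;
    }

    public static void main(String[] args) {
        int[] data = {5, 9, 17, 12, 2, 14};
        System.out.println(Arrays.toString(afterPasses(data, 3))); // [17, 12, 9, 5, 2, 14]
    }
}
```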

Insertion sort is typically done in-place. After k iterations the first k + 1 elements are sorted, hence the answer you've shown above.
