Regex that blocks single digit numbers 2,3,4,5 - java

I am pretty new to regular expressions and for some particular case need a regular expression which will block numbers:
2,3,4 and 5
... out of:
0 to 21
Specifically, this should block only single digit 2,3,4 and 5 and not 12,13,14,15 or 21 and 22 for that matter.
I tried [^\d2-5], but then it's also blocking 12, 13, 14, 15, 20, 21 and 22, which is not desired, since only the four numbers 2, 3, 4 and 5 are to be blocked.
Any help on this will be really helpful.

For the range [0;21] excluding [2;5] range you can use the following:
^(?:[016789]|1\d|2[01])$
If you just need to exclude [2;5] range, then the following might suit you:
^(?:[016789]|[1-9]\d+)$
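As a quick check of the first pattern, here is a minimal Java sketch. The class and method names (AllowedNumbers, isAllowed, rejected) are made up for illustration; only the regex itself comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class AllowedNumbers {
    // Pattern from the answer: 0..21 with the single digits 2..5 excluded.
    // Backslashes are doubled because this is a Java string literal.
    static final Pattern ALLOWED = Pattern.compile("^(?:[016789]|1\\d|2[01])$");

    static boolean isAllowed(int n) {
        return ALLOWED.matcher(Integer.toString(n)).matches();
    }

    // Collects every number in [from, to] that the pattern does NOT accept.
    static List<Integer> rejected(int from, int to) {
        List<Integer> result = new ArrayList<>();
        for (int n = from; n <= to; n++) {
            if (!isAllowed(n)) {
                result.add(n);
            }
        }
        return result;
    }
}
```

Calling AllowedNumbers.rejected(0, 21) should return only 2, 3, 4 and 5, while 12 to 15, 20 and 21 are accepted.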

If I understand your question correctly, you can try finding those specific digits by delimiting them with word boundaries.
For instance:
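import java.util.regex.Matcher;
import java.util.regex.Pattern;

String singleDigitsToBeBlocked = "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21";
// \b on both sides means the digit must stand alone, so 12 or 25 do not match
Pattern p = Pattern.compile("\\b[2-5]\\b");
Matcher m = p.matcher(singleDigitsToBeBlocked);
while (m.find()) {
    System.out.printf("Blocked: %s%n", m.group());
}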
Output
Blocked: 2
Blocked: 3
Blocked: 4
Blocked: 5

What about just checking for patterns that include everything EXCEPT those single digits?
(\b[0-9]{2,})|([01])|([6-9])

It's very simple:
(?:1[0-9]|2[0-1]|[0-1]|[6-9])
You can test this here: http://www.regexr.com/39044


Check my math: The bouncycastle issue: Odds 2 non-equal passwords considered equal

This is the 'check if hashes are equal' code of BouncyCastle's v1.66 release of their OpenBSD BCrypt implementation:
for (int i = 0; i != sLength; i++) {
    isEqual &= (bcryptString.indexOf(i) == newBcryptString.indexOf(i));
}
where sLength is guaranteed 60 (see line 268), and bcryptString is a full openbsd-style bcrypt string, such as for example $2y$10$.vGA1O9wmRjrwAVXD98HNOgsNpDczlqm3Jq7KnEd1rVAGv3Fykk1a.
The error is the used method: They meant to use charAt.
The intent of this loop is to check, for each position from 0 to 59, if the character at position i in a is the same character at position i in b.
But, due to erroneous use of indexOf(int), instead, this checks if the position of the first character with unicode i in a matches the position of the first character with unicode i in b, with 'not in string' matching 'not in string'.
Example: "Hello".indexOf(101) returns 1 (java is 0-based, 101 is the unicode of e, and e is the second character). "Hello".indexOf(0) returns -1 (because there is no unicode-0 in "Hello").
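To make the difference concrete, here is a small sketch that puts the faulty indexOf loop next to a charAt loop. The class name, the method names and the fakeHash helper are invented for illustration and are not BouncyCastle code; the two sample strings are not real bcrypt hashes, just 60-character strings with the usual $2y$10$ prefix.

```java
class IndexOfVersusCharAt {
    // Same shape as the quoted loop: indexOf(i) searches for the character
    // with code point i, it does not read the character at position i.
    static boolean buggyEquals(String a, String b, int length) {
        boolean isEqual = true;
        for (int i = 0; i != length; i++) {
            isEqual &= (a.indexOf(i) == b.indexOf(i));
        }
        return isEqual;
    }

    // What the loop presumably meant to do: compare position by position.
    static boolean intendedEquals(String a, String b, int length) {
        boolean isEqual = true;
        for (int i = 0; i != length; i++) {
            isEqual &= (a.charAt(i) == b.charAt(i));
        }
        return isEqual;
    }

    // Hypothetical helper: a 60-character string with a bcrypt-style prefix,
    // padded with one repeated filler character.
    static String fakeHash(char filler) {
        StringBuilder sb = new StringBuilder("$2y$10$");
        while (sb.length() < 60) {
            sb.append(filler);
        }
        return sb.toString();
    }
}
```

With a = fakeHash('a') and b = fakeHash('b'), buggyEquals(a, b, 60) returns true even though the strings differ in 53 positions, because letters have code points above 59 and are never looked at; intendedEquals(a, b, 60) returns false.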
I'm trying to do the math to figure out the answer to following question:
If you try to log in for a given user by using an arbitrarily chosen password instead of the actual password (and let's posit that the odds that you so happen to choose the exact password are zero), what are the odds that this algorithm erroneously considers the arbitrarily chosen password as 'equal'?
Construction of the openbsd string
As far as I can tell, this: $2y$10$.vGA1O9wmRjrwAVXD98HNOgsNpDczlqm3Jq7KnEd1rVAGv3Fykk1a
breaks down as follows:
$2y$ - constant - it's a marker that means 'bcrypt string', pretty much.
10 - constant per server - number of rounds used; it's 10 almost everywhere, and will always have the same value for any given user's passhash.
$ - constant, again.
.vGA1O9wmRjrwAVXD98HNO (22 characters): The 16-byte salt padded with 2 zero bytes, then base-64ed, and then chuck out the last 2 characters. This can be used to reconstruct the salt.
the rest (31 characters): The result of bcrypt(rounds, concat(salt+utf8encode(pass))), base64-encoded, and toss out the last character.
Note that the base64 impl uses all lowercase letters, all uppercase letters, all digits, the dot, and the slash.
Basic realizations about the odds
The faulty algorithm will end up checking if the position of the first occurrence of all characters in unicode range 0 to 59 (inclusive) is the same for both hashes ("realhash" and "inhash"). If all 60 have the same 'pos of first occurrence' for both realhash and inhash, the algorithm considers the passwords equal.
Of all unicode symbols between 0 and 59, the only ones that could even be in this string are 0123456789$./.
However, of those, the $012 are irrelevant: For any passhash.indexOf('$'), the answer is 0. For any passhash.indexOf('1'), the answer is 4. Same goes for 0 and 2. That leaves just 9 characters that could possibly result in the algorithm saying "inhash" is not equal to "realhash": 3456789./.
To figure out what the odds are, we need for each of these 9 characters to not end up being a differentiator. To figure out what the odds are that one specific character (let's say the '3') fails to differentiate, then that's just 1-Z with Z being the odds that the '3' is enough to differentiate.
Z = Q * (U*A + (1-U)*B)
Q = '3' is not in the salt section of realhash. (first 22)
U = '3' is in the pass section of inhash. (last 31)
A = either '3' is not in the pass section of realhash, or it is, but its first occurrence is not in the same place as the first occurrence of '3' in inhash.
B = '3' is in the pass section of realhash.
A = 1 - (V*W); V = '3' is in the pass section of realhash, W = provided '3' is in the pass section of both realhash and inhash, its first occurrence is in the same place.
Once Z has been determined, the odds that my arbitrary password results in this algorithm thinking it is correct, even when it isn't, is then defined by: Neither the '3', nor any of the other 8 characters was sufficient to differentiate. Thus: (1-Z)^9.
Z = Q * ( (U * (1 - (V * W))) + ((1-U) * (1-V)) )
Q = (63/64)^22 ~= 0.707184
U = 1-(63/64)^31 ~= 0.386269
V = 1-(63/64)^31 ~= 0.386269
W = 1/31 ~= 0.032258
(1 - (V * W)) ~= 0.987539
(1-U) * (1-V) ~= 0.376666
Z ~= 0.536131
Chance that the 3 fails to differentiate:
(1-Z) ~= 0.463868
Chance that all of 3456789./ fail:
(1-Z)^9 ~= 0.00099438
Thus, roughly 0.001: about one in a thousand odds that the algorithm says that 2 passwords are equal when they are not.
Am I missing anything significant?
NB: BouncyCastle's most recent public version has fixed this bug (CVE-2020-28052 tracks the problem).
For problems like this, a Monte Carlo simulation is a useful sanity check. The result I got from the simulation was 0.0044, about 4 times higher than the calculated result in the question. That seemed high to me, so I did some debugging to see where that result was coming from.
It turns out that the vast majority of the false matches are due to one very simple mechanism: the 22 character salt eliminates some of the characters-of-interest, and the remaining characters-of-interest do not appear in the rest of the hash.
As mentioned in the question, there are 9 characters-of-interest: 3456789./
If any of those appear in the salt, then the indexOf() for that character will match, and that character is no longer of interest. Monte Carlo shows that on average, 2.6 of the 9 characters appear in the salt, and are eliminated from consideration. That makes sense because the salt contains at most 22 of the base-64 characters, so about one third. Here's a sample run of the Monte Carlo simulation:
0 35645
1 156228
2 283916
3 281018
4 166381
5 61024
6 13791
7 1850
8 139
9 3
The first column is the number of characters-of-interest that were eliminated by the salt. The second column is the number of times that occurred out of 1 million attempts. For example, in 3 out of a million attempts, the salt eliminated all 9 characters-of-interest, which then guarantees a false match.
In 139 out of a million attempts, the salt eliminated 8 of the 9 characters-of-interest. The remaining character then either needs to match in the two hash strings, or it needs to be absent from both hash strings. The odds that it will be absent are (63/64) ^ 62 = 0.377.
So we can augment the table of results like so (these are Monte Carlo results):
0 35645
1 156228
2 283916
3 281018
4 166381
5 61024
6 13791
7 1850
8 139 53 0.386053
9 3 3 1.000000
The second to last line can be interpreted as follows: 8 characters-of-interest were eliminated by the salt in 139 out of 1 million attempts. Of the 139, 53 (or 38.6%) resulted in a match because the single remaining character-of-interest did not appear in the last 31 characters of either hash string.
Here are the full results:
0 35645 2 0.000070 0
1 156228 39 0.000250 5
2 283916 214 0.000757 25
3 281018 628 0.002237 64
4 166381 1056 0.006349 83
5 61024 1114 0.018260 68
6 13791 702 0.050936 32
7 1850 265 0.143467 7
8 139 53 0.386053 0
9 3 3 1.000000 0
false matches due to elimination and absence: 4076
false matches due to elimination, and matching: 284
total false matches: 4360 out of 1 million
The last column is the number of times one or more characters of interest matched indexes in the final 31 characters of the hash strings.
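A simulation along these lines can be sketched in a few lines of Java. This is not the program that produced the tables above; it is a minimal reconstruction under stated assumptions: every salt and hash character is drawn uniformly from the 64-character alphabet, both strings share the same prefix and salt, and the class and method names are made up.

```java
import java.util.Random;

class BcryptCollisionMonteCarlo {
    // Assumed alphabet: dot, slash, upper case, lower case, digits (64 symbols).
    static final String ALPHABET =
        "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    static String randomChars(Random rnd, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // The faulty comparison from the question, fixed at 60 iterations.
    static boolean buggyEquals(String a, String b) {
        boolean isEqual = true;
        for (int i = 0; i != 60; i++) {
            isEqual &= (a.indexOf(i) == b.indexOf(i));
        }
        return isEqual;
    }

    // Fraction of trials in which two different hash strings compare as equal.
    static double falseMatchRate(long seed, int trials) {
        Random rnd = new Random(seed);
        int matches = 0;
        for (int t = 0; t < trials; t++) {
            String prefixAndSalt = "$2y$10$" + randomChars(rnd, 22);
            String real = prefixAndSalt + randomChars(rnd, 31);
            String fake = prefixAndSalt + randomChars(rnd, 31);
            if (!real.equals(fake) && buggyEquals(real, fake)) {
                matches++;
            }
        }
        return (double) matches / trials;
    }
}
```

With a large trial count, falseMatchRate should land in the neighbourhood of the rate reported above, a few false matches per thousand; the exact figure depends on the seed.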
Mathematical Analysis of the Problem
There are two steps to analyzing the problem:
Compute the odds that the salt (22 characters) will eliminate characters of interest (koi; see note (1) below).
Compute the odds that the characters in the tail of the hash (the last 31 characters) either match (at the first occurrence), or don't occur.
(1): I'm using "koi" as the abbreviation since the spell checker thinks it's a fish, and will leave it alone.
The decision tree for the first step looks like this:
The column headers are the number of characters in the salt seen so far. The column footers are the divisor for that column. The row labels are the number of koi left.
Column 0 of the tree:
1/1 is the probability that 9 koi remain after 0 characters have been seen.
Column 1 of the tree:
55/64 is the probability that the first character of the salt is not a koi.
9/64 is the probability that the first character of the salt is a koi, leaving 8 koi.
Column 2 of the tree:
55*55/4096 is the probability that neither of the first two characters is a koi.
55*9/4096 is the probability of a non-koi followed by a koi, leaving 8 koi left.
9*56/4096 is the probability that the first character was a koi (leaving 8) followed by a non-koi.
9*8/4096 is the probability that the first two characters are both koi, leaving 7 koi.
The full decision tree has 23 columns (0 to 22 characters in the salt), and ten rows (9 to 0 koi remaining). The last column in the decision tree (shown below) gives the odds (out of a million) for that number of koi remaining after the salt has been checked.
9 35647
8 156071
7 283872
6 281006
5 166489
4 61078
3 13837
2 1861
1 134
0 4
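The salt decision tree can be evaluated column by column with a short dynamic-programming loop. The sketch below assumes, as the text does, that each salt character is uniform over 64 symbols; the class and method names are illustrative only.

```java
class SaltEliminationTree {
    // Returns p where p[k] is the probability that exactly k koi remain
    // after saltLength uniformly random characters have been seen.
    static double[] remainingKoiDistribution(int koi, int saltLength, int alphabetSize) {
        double[] p = new double[koi + 1];
        p[koi] = 1.0; // column 0: all koi remain with probability 1
        for (int c = 0; c < saltLength; c++) {
            double[] next = new double[koi + 1];
            for (int k = 0; k <= koi; k++) {
                // character is not one of the k remaining koi: k is unchanged
                next[k] += p[k] * (alphabetSize - k) / alphabetSize;
                // character is one of the k remaining koi: one fewer remains
                if (k > 0) {
                    next[k - 1] += p[k] * k / alphabetSize;
                }
            }
            p = next;
        }
        return p;
    }
}
```

For example, remainingKoiDistribution(9, 22, 64)[9] equals (55/64)^22, about 0.0356, in line with the 35647 per million in the first row of the table.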
The decision tree for the second step is a bit more complicated. At each of the 31 character positions, there are two characters to consider, the character in the real hash, and the character in the fake hash. Each is independently and randomly selected, so there are 64*64=4096 possibilities for each of the 31 character positions. There are four possible outcomes:
Neither character is a koi. The probability (64-k)/64 * (64-k)/64 where k is the number of koi left. The number of koi remaining is unchanged.
The character in the real hash is not a koi, but the character in the fake hash is a koi. The probability is (64-k)/64 * k/64, and the outcome is failure.
The character in the real hash is a koi, and the character in the fake hash is not the exact same character. The probability is k/64 * 63/64, and the outcome is failure.
The character in the real hash is a koi, and the character in the fake hash matches. The probability is k/64 * 1/64, and the number of koi is reduced by one.
The decision tree starting with 9 koi looks like this:
The biggest difference compared to the salt decision tree is the addition of the failure row. The matching can fail at any column. Once a failure occurs, that failure carries forward to the last column in the decision tree. For example, the 55*9 failure in the column 1 carries forward as 55*9 * 64*64 in the column 2 (because regardless of which two characters occur at the second position, the result is still failure). The probability that the fake hash will match the real hash for a given k (the "success" rate) is 1 - failures for that k.
Hence, for each value of k, we can multiply the odds for that k (from step 1) by the success rate for that k. The table below shows the results.
The first column is k, the number koi that remain after the salt.
The second column is the odds for that number of koi.
The third column is the number of matches that occur because none of the koi appear in the tail of either hash.
The fourth column is the number of matches that occur when one or more koi successfully match in the tail.
All values are out of one million.
9 35647 3 1
8 156071 40 6
7 283872 216 27
6 281006 628 63
5 166489 1074 85
4 61078 1117 67
3 13837 705 30
2 1861 260 7
1 134 51 1
0 4 4 0
Matches with no koi present in the tail of either hash: 4098
Matches where 1 or more koi match in the tail: 287
Total matches per million: 4385
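The second decision tree can be evaluated the same way. The sketch below computes only the success rate for a given k, using the four outcomes listed above and tracking just the non-failure rows; names are illustrative, and uniform, independent characters are assumed.

```java
class TailMatchTree {
    // Probability that the faulty comparison still reports "equal" after
    // tailLength positions, given that koi characters of interest remain.
    static double successRate(int koi, int tailLength, int alphabetSize) {
        double n = alphabetSize;
        double[] p = new double[koi + 1]; // p[k]: k koi left, no failure so far
        p[koi] = 1.0;
        for (int pos = 0; pos < tailLength; pos++) {
            double[] next = new double[koi + 1];
            for (int k = 0; k <= koi; k++) {
                // neither character is a koi: k is unchanged
                next[k] += p[k] * ((n - k) / n) * ((n - k) / n);
                // real hash has a koi and the fake hash has the same character
                if (k > 0) {
                    next[k - 1] += p[k] * (k / n) * (1.0 / n);
                }
                // every other combination is a failure and is simply dropped
            }
            p = next;
        }
        double success = 0;
        for (double v : p) {
            success += v;
        }
        return success;
    }
}
```

Multiplying successRate(k, 31, 64) by the step 1 odds for each k and summing gives the total; for k = 1 the rate comes out at about 0.38.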

Print triangle of numbers in java

I've been having some trouble with a homework assignment for my Java class. In it, we're supposed to take in an integer between 1 and 13 and display three different triangles consisting of numbers. For example, if I were to enter 5, the result would be:
Triangle 1
1
2 3
4 5 6
7 8 9 10
11 12 13 14 15
Triangle 2
1
2 6
3 7 10
4 8 11 13
5 9 12 14 15
Triangle 3
5
4 9
3 8 12
2 7 11 14
1 6 10 13 15
I've already got the first Triangle going fine, but my big concern is the second triangle. I haven't attempted the third one yet. The other thing is that my Professor is picky about what method we use in creating the project. In other words, we can only use what he has taught us. He told us to use the System.out.printf("%3d", n) statement to space out the characters and we have to create them within a separate class.
The code for the first triangle is as follows:
void triangle1(int n)
{
    int k = 1; // next number to print
    for (int i = 1; i <= n; i++)
    {
        for (int j = 0; j < i; j++) // row i holds i numbers
        {
            System.out.printf("%3d", k);
            k += 1;
        }
        System.out.println();
    }
}
So, pretty much, I need to follow that standard to create the other two triangles, but I'm really stuck on the second one and I don't know where to start. Any help will be much appreciated!
Here is the way I would approach it.
Programs print one line at a time, you cannot print half a line then start to print another line.
With that being said, you should recognize the pattern in the triangles.
1
2 6
3 7 10
4 8 11 13
5 9 12 14 15
Call the first number n; the next row then starts with n + 1. The next number in that row is (n + 1) + t, where t = 4. There is a pattern there.
The third row follows the same pattern.
The first number is (n + 1), and you can then calculate the others by adding t, then (t - 1), and so on.
This can be done with a for loop, like you did for the first triangle.
For the last triangle you can use the same process, just change the signs and t would equal something different.
Algorithm writing is all about identifying patterns.
If you look closely, you'll see there's a repeating pattern between each number and the one that follows on a given line.
3 7 10 => [3 & 7 differ by 4][7 & 10 differ by 3]
4 8 11 13 => [4 & 8 differ by 4][8 & 11 differ by 3][11 & 13 differ by 2]
5 9 12 14 15 => [... differ by 4][... by 3][... by 2][... by 1]
You can use that information to make the second triangle. I'll leave the rest to you. I hope that helps!
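One way to turn the shrinking differences into code is sketched below. It is only a sketch, not the required solution: it builds the output as a String with String.format("%3d", ...) so the result can be inspected, whereas the assignment asks for System.out.printf, and the class and method names are made up.

```java
class TriangleTwoSketch {
    // Builds the second triangle for size n, one row per line.
    static String triangle2(int n) {
        StringBuilder out = new StringBuilder();
        for (int row = 1; row <= n; row++) {
            int value = row;  // each row starts with its own row number
            int step = n - 1; // the first gap in every row is n - 1
            for (int col = 0; col < row; col++) {
                out.append(String.format("%3d", value));
                value += step; // move to the next number in the row
                step--;        // the gap shrinks by one each time
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

For n = 5 the last row comes out as 5, 9, 12, 14, 15, matching the example in the question.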
Seems you are a CS-Student, thus I will not present a finished solution. I'll give you some hints how I would solve this.
This is what the print statement has to do:
for i=1 j=0 print 1
for i=2 j=0 print 2
for i=2 j=1 print 6
for i=3 j=0 print 3
for i=3 j=1 print 7
for i=3 j=2 print 10
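Find a formula that calculates the correct output from i and j (and n); it's a simple arithmetic expression, though not purely linear in j, since the step between columns shrinks by one each time.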
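For reference, one closed form that reproduces the values listed above is i + j*(n - 1) - j*(j - 1)/2, with i the 1-based row and j the 0-based column. The sketch below is just that expression wrapped in a hypothetical helper; it is one possible formula, not necessarily the one the answer had in mind.

```java
class TriangleTwoFormula {
    // Value at 1-based row i and 0-based column j of the second triangle
    // of size n. j * (j - 1) is always even, so the division is exact.
    static int valueAt(int n, int i, int j) {
        return i + j * (n - 1) - j * (j - 1) / 2;
    }
}
```

With n = 5, valueAt(5, 2, 1) is 6 and valueAt(5, 3, 2) is 10, as in the list above.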

Multi Thread Processing in Java

I'm just having a few issues getting my head around this problem. Any help would be greatly appreciated.
The program must read a text file and compute the sum of divisors of each input number. For example, the number 20 has the sum 1+2+4+5+10=22. These sums are then added up line by line. The sum of divisors of each of these line sums is then computed, and finally those results are totaled.
E.g Initial File
1 2 4 6 15 20
25 50 100 125 250 500
16 8 3
Then computes the sum of divisors.
1 1 3 6 9 22
6 43 117 31 218 592
15 7 1
Summed up line by line
42
1007
23
Then the sum of divisors of each of the above sums is computed.
54
73
1
Then finally totaled up and returned.
128
I need to complete the process with each new line being completed by a threadpool.
My logic is as follows.
5.2. For each input line (add each line to an ArrayBlockingQueue,
then add each item in the queue to an ExecutorService, which will run the following)
5.2.1. Parse the current input line into integers
5.2.2. For each integer in the current input line
5.2.2.1. Compute the sum-of-divisors of this integer
5.2.2.2. Add this to the cumulated sum-of-divisors
5.2.3. Compute the sum-of-divisors of this cumulated sum
5.2.4. Add this to the grand total
I get stuck after 5.2. Do I create a new class that implements the Runnable interface and adds the cumulated sum to an atomicArray, or is it best to create a class that implements the Callable interface and have it return the cumulated sum? Or is there a completely different way?
Here is what I have so far, which returns the desired result but in a sequential manner.
http://pastebin.com/AyB58fpr
Use java.util.concurrent.Future and a java.util.concurrent.Executors.newFixedThreadPool(int nThreads).
This will be really easy to do.
Follow the Oracle tutorial if you are not familiar with Executors.
I prefer the Callable interface since that doesn't create a dependency of the code which processes the input to how the output is gathered.
The usual approach is to collect the tasks in a list of Futures. See this answer for an example.
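A minimal sketch of the Callable-plus-Future approach might look like the following. The class and method names are made up. The divisor rule is inferred from the sample in the question (proper divisors only, with 1 mapping to 1), so treat that special case as an assumption rather than a given.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DivisorSumPool {
    // Sum of proper divisors; 1 -> 1 is assumed to match the question's sample.
    static long sumOfDivisors(long n) {
        if (n == 1) {
            return 1;
        }
        long sum = 0;
        for (long d = 1; d <= n / 2; d++) {
            if (n % d == 0) {
                sum += d;
            }
        }
        return sum;
    }

    // Steps 5.2.1 to 5.2.3: parse the line, add up the divisor sums,
    // then take the divisor sum of that cumulated value.
    static long processLine(String line) {
        long lineSum = 0;
        for (String token : line.trim().split("\\s+")) {
            lineSum += sumOfDivisors(Long.parseLong(token));
        }
        return sumOfDivisors(lineSum);
    }

    // Step 5.2.4: one task per line, results gathered through Futures.
    static long grandTotal(List<String> lines, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String line : lines) {
                Callable<Long> task = () -> processLine(line);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get(); // blocks until that line is done
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("line processing failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each task returns its own result, no shared accumulator is needed; the only summation happens in the thread that collects the Futures. For the three sample lines in the question, grandTotal gives 128.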

How to find a cycle/repeats in a string?

I need to detect a cycle/sequence in a string and return the first occurrence. How should I go about doing it?
Example :
2 0 5 3 1 5 3 1 5 3 1
The first sequence to occur is 5 3 1.
There are no rules. The sequence can be half the string length, for example
5 3123 1231 231 31 231 41 452 3453 21 312312 5 3123 1231 231 31 231 41 452 3453 21 312312
The sequence is 5 3123 1231 231 31 231 41 452 3453 21 312312
Have you studied Floyd's cycle-finding algorithm? That may well help you if you want to find cycles. Very easy to implement as well.
Clarification based on the comments: cycle means a sequence of digits which repeats immediately. So
1 1
would be a cycle
1 3 1
wouldn't, because the potential cycle of 1s is interrupted by 3
1 3 1 3
is a cycle (1 3).
So a basic algorithm could look like this.
Iterate over the String.
For each digit, find its next occurrence in the String. If nothing is found, continue with the next character.
If a next occurrence is found, compare the sequence from the current digit up to that occurrence with the sequence of the same length beginning at that occurrence. If they are the same, you have found a cycle. If not, continue with the following occurrence.
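The steps above can be sketched as follows. This is a straightforward, unoptimised version that assumes the input is a list of whitespace-separated tokens, as in the question's examples; the class and method names are invented for illustration.

```java
import java.util.Arrays;

class FirstCycleFinder {
    // Returns the first token sequence that is immediately repeated,
    // joined by single spaces, or an empty string if there is none.
    static String firstCycle(String input) {
        String[] t = input.trim().split("\\s+");
        for (int i = 0; i < t.length; i++) {
            for (int j = i + 1; j < t.length; j++) {
                if (!t[j].equals(t[i])) {
                    continue; // not a later occurrence of the current token
                }
                int len = j - i;
                if (j + len > t.length) {
                    break; // no room left for a full repeat, also for larger j
                }
                String[] first = Arrays.copyOfRange(t, i, j);
                String[] second = Arrays.copyOfRange(t, j, j + len);
                if (Arrays.equals(first, second)) {
                    return String.join(" ", first);
                }
            }
        }
        return "";
    }
}
```

For the input 2 0 5 3 1 5 3 1 5 3 1 this returns 5 3 1; for 1 3 1 it returns an empty string, and for 1 3 1 3 it returns 1 3.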

What was wrong with my logic on Java insertion sorts?

I had a question on a test. I got it wrong, but I don't know why.
Question: An array of integers is to be sorted from biggest to smallest using an insertion sort. Assume the array originally contains the following elements:
5 9 17 12 2 14
What will it look like after the third pass through the for loop?
17 9 5 12 2 14
17 12 9 5 2 14 - Correct answer
17 12 9 5 14 2
17 14 12 9 5 2
9 5 17 12 2 14
In an insertion sort, I thought the source array is untouched; I felt, at the third pass, the destination array would be incomplete.
How did the test arrive at this answer?
Basically, after n passes of insertion sort, the first n+1 elements are sorted in the correct order and the rest are untouched. As you can see in the alternatives, the correct answer is the only one that fulfills that. Every step is only inserting relative to the already sorted numbers, so each step one extra number is correctly sorted.
Step 0 (Original, 5 assumed sorted)
5 9 17 12 2 14
Step 1, takes the 9 and puts it in the correct place before 5 (result, 9 5 sorted)
9 5 17 12 2 14
Step 2, takes the 17 and puts it in the correct place before 9 (result 17 9 5 sorted)
17 9 5 12 2 14
Step 3, takes the 12 and puts it in the correct place after 17 and before 9 (result 17 12 9 5 sorted)
17 12 9 5 2 14
Insertion sort is typically done in-place. After k iterations the first k + 1 elements are sorted, hence the answer you've shown above.
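To see the passes in action, here is a small sketch of an in-place, descending insertion sort that stops after a given number of passes. It works on a copy so the caller's array is left alone; the class and method names are made up, and the test's own implementation may differ in detail.

```java
import java.util.Arrays;

class InsertionSortPasses {
    // Returns the array state after the given number of passes of a
    // descending insertion sort (pass 1 inserts the element at index 1).
    static int[] afterPasses(int[] input, int passes) {
        int[] a = Arrays.copyOf(input, input.length);
        for (int i = 1; i < a.length && i <= passes; i++) {
            int key = a[i];
            int j = i - 1;
            // shift smaller elements right to make room for key
            while (j >= 0 && a[j] < key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
        return a;
    }
}
```

With the array 5 9 17 12 2 14, afterPasses(..., 3) yields 17 12 9 5 2 14: the first four elements are sorted and the last two are untouched.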
