I need regular expressions to match the below cases.
3 or more consecutive sequential characters/numbers; e.g. 123, abc, 789, pqr, etc.
3 or more consecutive identical characters/numbers; e.g. 111, aaa, bbb, 222, etc.
I don't think you can (easily) use regex for the first case. The second case is easy though:
Pattern pattern = Pattern.compile("([a-z\\d])\\1\\1", Pattern.CASE_INSENSITIVE);
Since \\1 refers back to whatever group 1 matched, this will match any sequence of three identical characters that are either letters in the range a-z (in either case, because of CASE_INSENSITIVE) or digits (\d).
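A minimal sketch of how that pattern might be used with Matcher.find() to detect a run anywhere inside a string. The class and method names (RepeatCheck, hasTripleRepeat) are made up for illustration and are not part of the answer above.

```java
import java.util.regex.Pattern;

final class RepeatCheck {
    // Same pattern as above: one letter or digit, then two back-references to it.
    private static final Pattern TRIPLE =
            Pattern.compile("([a-z\\d])\\1\\1", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if s contains three identical letters/digits in a row.
    static boolean hasTripleRepeat(String s) {
        return TRIPLE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasTripleRepeat("pass111word")); // true
        System.out.println(hasTripleRepeat("aabbcc"));      // false
    }
}
```

Because the back-reference is also compared case-insensitively, a mixed-case run such as aAa counts as a match here; drop the flag and use [a-zA-Z\\d] if that is not wanted.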
Update
To be clear, you can use regex for the first case. However, the pattern is so laborious and convoluted that you are better off not doing it at all, especially if you want to cover the whole alphabet. In that case you should probably generate the pattern programmatically, by iterating over the char codes of the Unicode charset (or whatever range you need) and generating an alternative for every three consecutive characters. However, you should realize that with such a large decision tree for the pattern matcher, matching performance is bound to suffer (O(n), where n is the number of groups, i.e. the size of the Unicode charset minus 2).
I disagree, case 1 is possible to regex, but you have to tell it the sequences to match... which is kind of long and boring:
/(abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz|012|123|234|345|456|567|678|789)+/ig
http://regexr.com/3dqln
for the second question:
\\b([a-zA-Z0-9])\\1\\1+\\b
explanation:
\\b : zero-length word boundary
( : start capture group 1
[a-zA-Z0-9] : a letter or a digit
) : end group
\\1 : same character as group 1
\\1+ : same character as group 1 one or more times
\\b : zero-length word boundary
To my knowledge, the first case is indeed not possible. The regex engine doesn't know anything about the order of the natural numbers or the alphabet. But it's at least possible to differentiate between 3 or more numbers and 3 or more letters, for example:
[a-z]{3,}|[A-Z]{3,}|\d{3,}
This matches abcd, ABCDE or 123 but doesn't match ab2d, A5c4 or 12z, for example. According to this, the second case can be correctly given in a shorter version as:
(\w)\1{2,}
3 or more consecutive sequential characters/numbers ex - 123, abc, 789, pqr etc.
Not possible with regular expressions.
3 or more consecutive identical characters/numbers ex - 111, aaa, bbb. 222 etc.
Use a pattern of (?i)(?:([a-z0-9])\\1{2,})*.
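If you want to check the whole string, use Matcher.matches(). To find matches within a string, use Matcher.find(). Note that because of the trailing *, this pattern also matches the empty string, so for find() you would want to drop the outer group and the * and use (?i)([a-z0-9])\\1{2,} instead.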
Here's some sample code:
final String ps = "(?i)(?:([a-z0-9])\\1{2,})*";
final String psLong =
"(?i)\t\t\t# Case insensitive flag\n"
+ "(?:\t\t\t\t# Begin non-capturing group\n"
+ " (\t\t\t\t# Begin capturing group\n"
+ " [a-z0-9]\t\t# Match an alpha or digit character\n"
+ " )\t\t\t\t# End capturing group\n"
+ " \\1\t\t\t\t# Back-reference first capturing group\n"
+ " {2,}\t\t\t# Match previous atom 2 or more times\n"
+ ")\t\t\t\t# End non-capturing group\n"
+ "*\t\t\t\t# Match previous atom zero or more characters\n";
System.out.println("***** PATTERN *****\n" + ps + "\n" + psLong
+ "\n");
final Pattern p = Pattern.compile(ps);
for (final String s : new String[] {"aa", "11", "aaa", "111",
"aaaaaaaaa", "111111111", "aaa111bbb222ccc333",
"aaaaaa111111bbb222"})
{
final Matcher m = p.matcher(s);
if (m.matches()) {
System.out.println("Success: " + s);
} else {
System.out.println("Fail: " + s);
}
}
And the output is:
***** PATTERN *****
(?i)(?:([a-z0-9])\1{2,})*
(?i) # Case insensitive flag
(?: # Begin non-capturing group
( # Begin capturing group
[a-z0-9] # Match an alpha or digit character
) # End capturing group
\1 # Back-reference first capturing group
{2,} # Match previous atom 2 or more times
) # End non-capturing group
* # Match previous atom zero or more characters
Fail: aa
Fail: 11
Success: aaa
Success: 111
Success: aaaaaaaaa
Success: 111111111
Success: aaa111bbb222ccc333
Success: aaaaaa111111bbb222
A regex to match three consecutive identical digits or letters is
"([0-9]|[a-zA-Z])\1\1"
Thanks All for helping me.
For the first case (3 or more consecutive sequential characters/numbers; e.g. 123, abc, 789, pqr, etc.) I used the code logic below. Please share your comments on this.
public static boolean validateConsecutiveSeq(String epin) {
char epinCharArray[] = epin.toCharArray();
int asciiCode = 0;
boolean isConSeq = false;
int previousAsciiCode = 0;
int numSeqcount = 0;
for (int i = 0; i < epinCharArray.length; i++) {
asciiCode = epinCharArray[i];
if ((previousAsciiCode + 1) == asciiCode) {
numSeqcount++;
if (numSeqcount >= 2) {
isConSeq = true;
break;
}
} else {
numSeqcount = 0;
}
previousAsciiCode = asciiCode;
}
return isConSeq;
}
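One possible comment on the method above: it compares raw char codes, so a run that crosses out of the digits or letters (for example 8, 9, : or y, z, {) also counts as sequential. The following is a sketch of a variant that stays inside 0-9, a-z and A-Z and takes the run length as a parameter. The names SequenceCheck, hasSequentialRun and inRange are hypothetical, and runLength is assumed to be at least 2.

```java
final class SequenceCheck {
    // Hypothetical helper: true if s contains runLength or more ascending
    // consecutive characters that all lie within 0-9, a-z or A-Z.
    static boolean hasSequentialRun(String s, int runLength) {
        int run = 1; // length of the ascending run ending at the current character
        for (int i = 1; i < s.length(); i++) {
            char prev = s.charAt(i - 1);
            char cur = s.charAt(i);
            boolean sameRange = inRange(prev, cur, '0', '9')
                    || inRange(prev, cur, 'a', 'z')
                    || inRange(prev, cur, 'A', 'Z');
            if (cur == prev + 1 && sameRange) {
                run++;
                if (run >= runLength) {
                    return true;
                }
            } else {
                run = 1; // the run is broken, start counting again
            }
        }
        return false;
    }

    // True if both characters lie between lo and hi (inclusive).
    private static boolean inRange(char a, char b, char lo, char hi) {
        return a >= lo && a <= hi && b >= lo && b <= hi;
    }

    public static void main(String[] args) {
        System.out.println(hasSequentialRun("pin1234", 3)); // true
        System.out.println(hasSequentialRun("89:;", 3));    // false
    }
}
```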
If you have a lower bound (3) and an upper bound, the regex string can be generated as follows:
public class RegexBuilder {
public static void main(String[] args) {
StringBuilder sb = new StringBuilder();
int seqStart = 3;
int seqEnd = 5;
buildRegex(sb, seqStart, seqEnd);
System.out.println(sb);
}
private static void buildRegex(StringBuilder sb, int seqStart, int seqEnd) {
for (int i = seqStart; i <= seqEnd; i++) {
buildRegexCharGroup(sb, i, '0', '9');
buildRegexCharGroup(sb, i, 'A', 'Z');
buildRegexCharGroup(sb, i, 'a', 'z');
buildRegexRepeatedString(sb, i);
}
}
private static void buildRegexCharGroup(StringBuilder sb, int seqLength,
char start, char end) {
for (char c = start; c <= end - seqLength + 1; c++) {
char ch = c;
if (sb.length() > 0) {
sb.append('|');
}
for (int i = 0; i < seqLength; i++) {
sb.append(ch++);
}
}
}
private static void buildRegexRepeatedString(StringBuilder sb, int seqLength) {
sb.append('|');
sb.append("([a-zA-Z\\d])");
for (int i = 1; i < seqLength; i++) {
sb.append("\\1");
}
}
}
Output
012|123|234|345|456|567|678|789|ABC|BCD|CDE|DEF|EFG|FGH|GHI|HIJ|IJK|JKL|KLM|LMN|MNO|NOP|OPQ|PQR|QRS|RST|STU|TUV|UVW|VWX|WXY|XYZ|abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz|([a-z\d])\1\1|0123|1234|2345|3456|4567|5678|6789|ABCD|BCDE|CDEF|DEFG|EFGH|FGHI|GHIJ|HIJK|IJKL|JKLM|KLMN|LMNO|MNOP|NOPQ|OPQR|PQRS|QRST|RSTU|STUV|TUVW|UVWX|VWXY|WXYZ|abcd|bcde|cdef|defg|efgh|fghi|ghij|hijk|ijkl|jklm|klmn|lmno|mnop|nopq|opqr|pqrs|qrst|rstu|stuv|tuvw|uvwx|vwxy|wxyz|([a-z\d])\1\1\1|01234|12345|23456|34567|45678|56789|ABCDE|BCDEF|CDEFG|DEFGH|EFGHI|FGHIJ|GHIJK|HIJKL|IJKLM|JKLMN|KLMNO|LMNOP|MNOPQ|NOPQR|OPQRS|PQRST|QRSTU|RSTUV|STUVW|TUVWX|UVWXY|VWXYZ|abcde|bcdef|cdefg|defgh|efghi|fghij|ghijk|hijkl|ijklm|jklmn|klmno|lmnop|mnopq|nopqr|opqrs|pqrst|qrstu|rstuv|stuvw|tuvwx|uvwxy|vwxyz|([a-z\d])\1\1\1\1
All put together:
([a-zA-Z0-9])\1\1+|(abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz|012|123|234|345|456|567|678|789)+
3 or more consecutive sequential characters/numbers; e.g. 123, abc, 789, pqr, etc.
(abc|bcd|cde|def|efg|fgh|ghi|hij|ijk|jkl|klm|lmn|mno|nop|opq|pqr|qrs|rst|stu|tuv|uvw|vwx|wxy|xyz|012|123|234|345|456|567|678|789)+
3 or more consecutive identical characters/numbers; e.g. 111, aaa, bbb, 222, etc.
([a-zA-Z0-9])\1\1+
https://regexr.com/4727n
This also works:
(?:(?:0(?=1)|1(?=2)|2(?=3)|3(?=4)|4(?=5)|5(?=6)|6(?=7)|7(?=8)|8(?=9)){2,}\d|(?:a(?=b)|b(?=c)|c(?=d)|d(?=e)|e(?=f)|f(?=g)|g(?=h)|h(?=i)|i(?=j)|j(?=k)|k(?=l)|l(?=m)|m(?=n)|n(?=o)|o(?=p)|p(?=q)|q(?=r)|r(?=s)|s(?=t)|t(?=u)|u(?=v)|v(?=w)|w(?=x)|x(?=y)|y(?=z)){2,}[[:alpha:]])|([a-zA-Z0-9])\1\1+
https://regex101.com/r/6fXC9u/1
For the first question, this works if you're OK with using less regex (it only checks letters):
function containsConsecutiveCharacters(str) {
  for (let i = 0; i <= str.length - 3; i++) {
    const allThree = str[i] + str[i + 1] + str[i + 2];
    const s1 = str.charCodeAt(i);
    const s2 = str.charCodeAt(i + 1);
    const s3 = str.charCodeAt(i + 2);
    if (
      // all three characters must be letters, and their codes must ascend by one
      /^[a-zA-Z]+$/.test(allThree) &&
      s2 === s1 + 1 && s3 === s2 + 1
    ) {
      return true;
    }
  }
  return false;
}
3 or more consecutive sequential characters/numbers; e.g. 123, abc, 789, pqr, etc.
(?:(?:0(?=1)|1(?=2)|2(?=3)|3(?=4)|4(?=5)|5(?=6)|6(?=7)|7(?=8)|8(?=9)){2,}\d|(?:a(?=b)|b(?=c)|c(?=d)|d(?=e)|e(?=f)|f(?=g)|g(?=h)|h(?=i)|i(?=j)|j(?=k)|k(?=l)|l(?=m)|m(?=n)|n(?=o)|o(?=p)|p(?=q)|q(?=r)|r(?=s)|s(?=t)|t(?=u)|u(?=v)|v(?=w)|w(?=x)|x(?=y)|y(?=z)){2,}[\p{Alpha}])
https://regex101.com/r/5IragF/1
3 or more consecutive identical characters/numbers; e.g. 111, aaa, bbb, 222, etc.
([\p{Alnum}])\1{2,}
https://regex101.com/r/VEHoI9/1
For case #2 I got inspired by a sample on regextester and created the following regex to match n identical digits (to check for both numbers and letters replace 0-9 with A-Za-z0-9):
const n = 3
const identicalAlphanumericRegEx = new RegExp("([0-9])" + "\\1".repeat(n - 1))
I was discussing this with a coworker and we think we have a good solution for #1.
To check for abc or bcd or ... or 012 or 123 or even any number of sequential characters, try:
.*((a(?=b))|(?:b(?=c))|(?:c(?=d))|(?:d(?=e))|(?:e(?=f))|(?:f(?=g))|(?:g(?=h))|(?:h(?=i))|(?:i(?=j))|(?:j(?=k))|(?:k(?=l))|(?:l(?=m))|(?:m(?=n))|(?:n(?=o))|(?:o(?=p))|(?:p(?=q))|(?:q(?=r))|(?:r(?=s))|(?:s(?=t))|(?:t(?=u))|(?:u(?=v))|(?:v(?=w))|(?:w(?=x))|(?:x(?=y))|(?:y(?=z))|(?:0(?=1))|(?:1(?=2))|(?:2(?=3))|(?:3(?=4))|(?:4(?=5))|(?:5(?=6))|(?:6(?=7))|(?:7(?=8))|(?:8(?=9))){2,}.*
The nice thing about this solution is if you want more than 3 consecutive characters, increase the {2,} to be one less than what you want to check for.
the ?: in each group prevents the group from being captured.
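Since the alternation is tedious to type, it could also be generated. The following is a rough sketch in Java that builds the same kind of lookahead chain for a-z and 0-9 and uses find(), so the leading and trailing .* are not needed. LookaheadSequence, appendPairs and sequential are hypothetical names, and runLength is assumed to be at least 2.

```java
import java.util.regex.Pattern;

final class LookaheadSequence {
    // Appends "c(?=next)" alternatives for every adjacent pair in [first, last].
    private static void appendPairs(StringBuilder sb, char first, char last) {
        for (char c = first; c < last; c++) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(c).append("(?=").append((char) (c + 1)).append(')');
        }
    }

    // Hypothetical helper: pattern that finds runLength or more sequential characters.
    static Pattern sequential(int runLength) {
        StringBuilder alternatives = new StringBuilder();
        appendPairs(alternatives, 'a', 'z');
        appendPairs(alternatives, '0', '9');
        // Each repetition consumes one character whose successor follows it,
        // so runLength - 1 repetitions imply runLength sequential characters.
        return Pattern.compile("(?:" + alternatives + "){" + (runLength - 1) + ",}");
    }

    public static void main(String[] args) {
        System.out.println(sequential(3).matcher("xx789").find()); // true
        System.out.println(sequential(3).matcher("ab1").find());   // false
    }
}
```

As in the pattern above, only lowercase letters and digits are covered; uppercase would need its own appendPairs call.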
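Try this for the first question.
It returns true if the argument contains 3 consecutive characters whose char codes go up or down by one (so it also matches descending runs such as cba, and it is not limited to letters and digits).
function check(val){
for (let i = 0; i <= val.length - 3; i++) {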
var s1 = val.charCodeAt(i);
var s2 = val.charCodeAt(i + 1);
var s3 = val.charCodeAt(i + 2);
if (Math.abs(s1 - s2) === 1 && s1 - s2 === s2 - s3) {
return true;
}
}
return false;
}
console.log(check('Sh1ak#ki1r#100'));
I'm trying to parse input such as this:
VAR1: 7, VAR2: [1,2,3], VAR3: value1=1,value2=2, TIMEZONE: GMT+5, TIME: 17:15:00
into a Map:
{VAR1=7, VAR2=[1,2,3], VAR3=value1=1,value2=2, TIMEZONE=GMT+5, TIME=17:15:00}
So variables are separated by commas (,) and their values come after a colon (:). They're not always in caps; I wrote them like this to make it more obvious which are names of variables and which are values. Also, whitespace can appear anywhere around names or in values.
Problem is that commas can appear in values like in VAR2 or VAR3 and colons can appear in variables like TIME.
I tried splitting string like this to get values out:
final String regex = ",?\\s*(\\w+)\\s*:\\s*";
final String[] values = inputString.split(regex);
and it works as long as inputString doesn't contain any time variables with colons in its value. Otherwise it returns this as values:
[, 7, [1,2,3], value1=1,value2=2, GMT+5, , , 00]
instead of:
[7, [1,2,3], value1=1,value2=2, GMT+5, 17:15:00]
I suspect that it matches the last colon in TIME rather than the first one located after variable's name separating it from its value.
I tried using reluctant quantifier for colon ",?\s*(\w+)\s*:?\s" but this returned:
[, :, , : [, , , ], :, =, , =, , :, +, , :, :, :]
Which is nonsense.
I would appreciate any ideas to improve regex.
Assuming that a variable name cannot start with a digit the colons in the date/time are not a problem. I have more issues with the commas in the values.
Here's how I solved the problem:
String input = "VAR1: 7, VAR2: [1,2,3], VAR3: value1=1,value2=2, TIMEZONE: GMT+5, TIME: 17:15:00";
Pattern re = Pattern.compile(
"^\\s*(\\p{Alpha}\\p{Alnum}*)\\s*:\\s*(\\S*)(?:,\\s*(\\p{Alpha}\\p{Alnum}*\\s*:.*))?$");
Matcher matcher = re.matcher(input);
while (matcher.matches()) {
String name = matcher.group(1);
String value = matcher.group(2);
String tail = matcher.group(3);
System.out.println(name + ": " + value);
if (tail == null) {
break;
}
matcher = re.matcher(tail);
}
Result:
VAR1: 7
VAR2: [1,2,3]
VAR3: value1=1,value2=2
TIMEZONE: GMT+5
TIME: 17:15:00
UPDATE:
It also works with:
Pattern re = Pattern.compile(
"^\\s*(\\w+)\\s*:\\s*(\\S*)(?:,\\s*(\\w+\\s*:.*))?\\s*$");
Possible solution (online test):
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "(.+?):\\s?(.+?)(?:,\\W|$)";
final String string = "VAR1: 7, VAR2: [1,2,3], VAR3: value1 =1,value2=2, TIMEZONE: GMT+5, TIME: 17:15:00";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
Just collect the results in a map to obtain what you asked for
Regex explanation:
(.+?): Captures your keys (example: VAR1)
:: Matches the : symbol literally
\s?: Matches an optional whitespace character
(.+?): Captures your values (example: 7)
(?:,\\W|$): Matches, without capturing, a comma followed by a non-word character such as a space (these two symbols together are our actual separator) OR the end of the string
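Collecting the results in a map, as suggested above, might look roughly like this. It is only a sketch: KeyValueParser and parse are made-up names, a LinkedHashMap is assumed so that the input order is kept, and keys and values are trimmed. It also assumes that a value never contains a comma followed by a space, because that pair is what the regex treats as the separator.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyValueParser {
    // Same regex as above: lazy key, colon, optional space, lazy value,
    // then either ", " (comma plus non-word character) or end of input.
    private static final Pattern PAIR =
            Pattern.compile("(.+?):\\s?(.+?)(?:,\\W|$)");

    // Hypothetical helper: returns the pairs in the order they appear.
    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = PAIR.matcher(input);
        while (matcher.find()) {
            result.put(matcher.group(1).trim(), matcher.group(2).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "VAR1: 7, VAR2: [1,2,3], VAR3: value1=1,value2=2, "
                + "TIMEZONE: GMT+5, TIME: 17:15:00";
        System.out.println(parse(input));
    }
}
```

The colons inside 17:15:00 are not a problem here because the key group is lazy and stops at the first colon, while the value group runs on until the next separator or the end of the string.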
Basically I would like to split a string into an array delimiting by spaces and operators, but keep the operators while removing the spaces
ex. 3 52 9+- 2 3 * /
will be [3][52][9][+][-][2][3][*][/]
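The logic you want when splitting is to consume delimiters which are whitespace and to not consume delimiters which are arithmetic symbols. Towards this end, we can use a lookbehind to split right after a symbol without consuming it, and use plain \\s to split by whitespace and remove it from the result. The \\s alternative comes first so that a space directly after a symbol is consumed rather than left at the start of the next token.
import java.util.Arrays;
String input = "3 52 9+- 2 3 * /";
// Insert a space before an operator so it is separated from the preceding operand,
// then collapse runs of whitespace into a single space.
input = input.replaceAll("([\\+\\-*/])(.)", " $1$2")
.replaceAll("\\s+", " ");
// Split on whitespace (consumed) or right after an operator (zero-width).
String[] parts = input.split("\\s|(?<=[+\\-*/])");
System.out.println(Arrays.toString(parts));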
Output:
[3, 52, 9, +, -, 2, 3, *, /]
import java.util.ArrayList;
import java.util.List;
public class Test {
public static void main(String[] args) {
String input = "3 52 9+- 2 3 * /";
input = input.replaceAll("([\\+\\-*/])", " $1 ").replaceAll("\\s+", " ");
String[] parts = input.split("(?<=[+\\-*/ ])");
List<String> finalList = new ArrayList<String>();
for(String part : parts) {
if(part.trim().length() > 0) {
finalList.add(part);
}
}
System.out.println(finalList);
}
}
Output
[3 , 52 , 9 , +, -, 2 , 3 , *, /]
Try this regex:
([\-]?[^\s\+\-\*\/]+|[\+\-\*\/])
It will select:
[\-]? an optional minus sign
[^\s\+\-\*\/]+ one or more characters that are neither whitespace nor + - * /
or [\+\-\*\/] a single + - * /
Just match your case.
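In Java, that regex could be applied with Matcher.find() to collect the tokens instead of splitting. The sketch below assumes nothing beyond java.util.regex; TokenMatcher and tokenize are made-up names, and the pattern is the one above with the unnecessary escapes removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenMatcher {
    // Either an optionally signed operand, or a single operator.
    private static final Pattern TOKEN =
            Pattern.compile("-?[^\\s+\\-*/]+|[+\\-*/]");

    // Hypothetical helper: returns every operand and operator, skipping whitespace.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3 52 9+- 2 3 * /"));
        // prints [3, 52, 9, +, -, 2, 3, *, /]
    }
}
```

One consequence of the optional sign: a minus directly in front of an operand is absorbed into it, so 5-3 yields 5 and -3 rather than 5, - and 3.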
I am trying to make a simple calculator application that would take a string like this
5 + 4 + 3 - 2 - 10 + 15
I need Java to parse this string into an array
{5, +4, +3, -2, -10, +15}
Assume the user may enter 0 or more spaces between each number and each operator
I'm new to Java so I'm not entirely sure how to accomplish this.
You can use Integer.parseInt to get the values; splitting the string can be done with the String class. A regex could work, but I don't know how to do those :3
Take a look at String.split():
String str = "1 + 2";
System.out.println(java.util.Arrays.toString(str.split(" ")));
[1, +, 2]
Note that split takes a regular expression, so to split on "." or another character with a special meaning you have to escape it (for example "\\."). Also, multiple spaces in a row will create empty strings in the resulting array, which you would need to skip.
This solves the simple example. For more rigorous parsing of true expressions you would want to create a grammar and use something like Antlr.
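One way to avoid the empty strings caused by repeated spaces is to split on runs of whitespace instead of a single space. A small sketch, assuming the tokens are always separated by at least one space (SplitDemo and tokens are hypothetical names):

```java
import java.util.Arrays;

final class SplitDemo {
    // Trims the ends, then splits on one or more whitespace characters,
    // so repeated spaces do not produce empty strings.
    static String[] tokens(String expression) {
        return expression.trim().split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("  1  +   2 ")));
        // prints [1, +, 2]
    }
}
```

This still relies on whitespace being present: an input such as 1+2 comes back as a single token.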
Let str be your line buffer.
Use a java.util.regex.Matcher with the pattern ([-+]?[ \t]*[0-9]+).
Accumulate all matches into String[] tokens.
Then, for each token in tokens:
String s[] = tokens[i].split(" +");
if (s.length > 1)
tokens[i] = s[0] + s[1];
else
tokens[i] = s[0];
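The steps above could be put together roughly as follows. This is a sketch rather than the exact procedure described: it collapses the split-and-rejoin step into a single replaceAll on each match, and SignedTerms and terms are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SignedTerms {
    // An optional sign, optional blanks or tabs, then the digits.
    private static final Pattern TERM =
            Pattern.compile("([-+]?[ \\t]*[0-9]+)");

    // Hypothetical helper: returns each number together with its sign.
    static List<String> terms(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TERM.matcher(line);
        while (matcher.find()) {
            // Drop the blanks between sign and digits, e.g. "+ 4" becomes "+4".
            tokens.add(matcher.group(1).replaceAll("[ \\t]+", ""));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(terms("5 + 4 + 3 - 2 - 10 + 15"));
        // prints [5, +4, +3, -2, -10, +15]
    }
}
```

Each element can then be passed to Integer.parseInt, which accepts a leading + or - sign.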
You can use positive lookbehind:
String s = "5 + 4 + 3 - 2 - 10 + 15";
Pattern p = Pattern.compile("(?<=[0-9]) +");
String[] result = p.split(s);
for(String ss : result)
System.out.println(ss.replaceAll(" ", ""));
String cal = "5 + 4 + 3 - 2 - 10 + 15";
//matches combinations of '+' or '-', whitespace, number
Pattern pat = Pattern.compile("[-+]{1}\\s*\\d+");
Matcher mat = pat.matcher(cal);
List<String> ops = new ArrayList<String>();
while(mat.find())
{
ops.add(mat.group());
}
//gets first number and puts in beginning of List
ops.add(0, cal.substring(0, cal.indexOf(" ")));
for(int i = 0; i < ops.size(); i++)
{
//remove whitespace
ops.set(i, ops.get(i).replaceAll("\\s*", ""));
}
System.out.println(Arrays.toString(ops.toArray()));
//[5, +4, +3, -2, -10, +15]
Based off the input of some of the answers here, I found this to be the best solution
// input
String s = "5 + 4 + 3 - 2 - 10 + 15";
ArrayList<Integer> numbers = new ArrayList<Integer>();
// remove whitespace
s = s.replaceAll("\\s+", "");
// parse string
Pattern pattern = Pattern.compile("[-]?\\d+");
Matcher matcher = pattern.matcher(s);
// add numbers to array
while (matcher.find()) {
numbers.add(Integer.parseInt(matcher.group()));
}
// numbers
// {5, 4, 3, -2, -10, 15}
How can I create a Java regular expression for a comma-separated list such as:
(3)
(3,6)
(3 , 6 )
I tried, but it does not match anything:
Pattern.compile("\\(\\S[,]+\\)")
and how can I get the value "3", or "3" and "6", in my code from the Matcher class?
It's not clear to me exactly what your input looks like, but I doubt the pattern you're using is what you want. Your pattern will match a literal (, followed by a single non-whitespace character, followed by one or more commas, followed by a literal ).
If you want to match a number, optionally followed by a comma and another number, all surrounded by parentheses and with optional whitespace around each part, you could try this pattern:
"\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+))?\\s*\\)"
That should match (3), ( 3 ), ( 3, 6), (3 , 6 ), etc. but not (a) or (3, a).
You can retrieve the matched number(s) using Matcher.group; the first number will be group 1, the second (if any) will be group 2.
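A short sketch of reading the groups back out with Matcher. The pattern used here allows whitespace after the comma and captures only the digits in group 2; ParenPair and extract are hypothetical names, and the numbers are assumed to fit in an int.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParenPair {
    // "(" number, optionally "," and a second number, ")" with optional whitespace.
    private static final Pattern PAIR =
            Pattern.compile("\\(\\s*(\\d+)\\s*(?:,\\s*(\\d+))?\\s*\\)");

    // Hypothetical helper: returns the one or two numbers, or an empty list
    // if the input does not have the expected shape.
    static List<Integer> extract(String input) {
        List<Integer> numbers = new ArrayList<>();
        Matcher matcher = PAIR.matcher(input);
        if (matcher.matches()) {
            numbers.add(Integer.parseInt(matcher.group(1)));
            if (matcher.group(2) != null) { // group 2 is null when there is no second number
                numbers.add(Integer.parseInt(matcher.group(2)));
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extract("(3)"));      // [3]
        System.out.println(extract("(3 , 6 )")); // [3, 6]
        System.out.println(extract("(3, a)"));   // []
    }
}
```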
Validation regex
You can try this meta-regex approach for clarity:
String pattern =
"< part (?: , part )* >"
.replace("<", "\\(")
.replace(">", "\\)")
.replace(" ", "\\s*")
.replace("part", "[^\\s*(,)]++");
System.out.println(pattern);
/*** this is the pattern
\(\s*[^\s*(,)]++\s*(?:\s*,\s*[^\s*(,)]++\s*)*\s*\)
****/
The part pattern is [^\s*(,)]+, i.e. one or more of anything but whitespace, asterisks, brackets and commas. This construct is called a negated character class. [aeiou] matches any of the 5 vowel letters; [^aeiou] matches everything else (which includes consonants but also numbers, symbols, whitespace).
The + repetition is also made possessive to ++ for optimization. The (?:...) construct is a non-capturing group, also for optimization.
References
regular-expressions.info/Character Class, Possessive Quantifier, Non-capturing Group
java.util.regex.Pattern
Testing and splitting
We can then test the pattern as follows:
String[] tests = {
"(1,3,6)",
"(x,y!,a+b=c)",
"( 1, 3 , 6)",
"(1,3,6,)",
"(())",
"(,)",
"()",
"(oh, my, god)",
"(oh,,my,,god)",
"([],<>)",
"( !! , ?? , ++ )",
};
for (String test : tests) {
if (test.matches(pattern)) {
String[] parts = test
.replaceAll("^\\(\\s*|\\s*\\)$", "")
.split("\\s*,\\s*");
System.out.printf("%s = %s%n",
test,
java.util.Arrays.toString(parts)
);
} else {
System.out.println(test + " no match");
}
}
This prints:
(1,3,6) = [1, 3, 6]
(x,y!,a+b=c) = [x, y!, a+b=c]
( 1, 3 , 6) = [1, 3, 6]
(1,3,6,) no match
(()) no match
(,) no match
() no match
(oh, my, god) = [oh, my, god]
(oh,,my,,god) no match
([],<>) = [[], <>]
( !! , ?? , ++ ) = [!!, ??, ++]
This uses String.split to get a String[] of all the parts after trimming the brackets out.