Trouble writing a regex in Java

The string always consists of two distinct alternating characters. For example, if the string's two distinct characters are x and y, then it could be xyxyx or yxyxy but not xxyy or xyyx.
But a.matches() always returns false and the output becomes 0. Help me understand what's wrong here.
public static int check(String a) {
    char on = a.charAt(0);
    char to = a.charAt(1);
    if(on != to) {
        if(a.matches("["+on+"("+to+""+on+")*]|["+to+"("+on+""+to+")*]")) {
            return a.length();
        }
    }
    return 0;
}

Use the regex (.)(.)(?:\1\2)*\1?
(.) Match any character, and capture it as group 1
(.) Match any character, and capture it as group 2
\1 Match the same character that was captured in group 1
\2 Match the same character that was captured in group 2
(?:\1\2)* Match 0 or more pairs of group 1+2
\1? Optionally match a dangling group 1
The input must be at least two characters long. An empty string and a one-character string will not match.
As Java code, that would be:
if (a.matches("(.)(.)(?:\\1\\2)*\\1?")) {
See regex101.com for working examples [1].
[1] Note that regex101 requires the use of ^ and $, which are implied by the matches() method. It also requires the flags g and m to showcase multiple examples at the same time.
UPDATE
As pointed out by Austin Anderson:
fails on yyyyyyyyy or xxxxxx
To prevent that, we can add a zero-width negative lookahead, to ensure input doesn't start with two of the same character:
(?!(.)\1)(.)(.)(?:\2\3)*\2?
See regex101.com.
Or you can use Austin Anderson's simpler version:
(.)(?!\1)(.)(?:\1\2)*\1?
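As a quick check, the lookahead version can be wrapped in a small method and run against the examples from the question. This is only a sketch: the class name AlternatingCheck and the constant ALTERNATING are made up for illustration, and the method keeps the question's convention of returning the length on success and 0 otherwise.

```java
import java.util.regex.Pattern;

class AlternatingCheck {
    // Two different characters, then repeated pairs of them,
    // then an optional trailing copy of the first character.
    private static final Pattern ALTERNATING =
            Pattern.compile("(.)(?!\\1)(.)(?:\\1\\2)*\\1?");

    // Returns the length of the string if it alternates between
    // two distinct characters, otherwise 0.
    static int check(String a) {
        return ALTERNATING.matcher(a).matches() ? a.length() : 0;
    }

    public static void main(String[] args) {
        String[] samples = {"xyxyx", "yxyxy", "xy", "xxyy", "xyyx", "xxxx", "x"};
        for (String s : samples) {
            System.out.println(s + " -> " + check(s));
        }
    }
}
```

Precompiling the pattern is optional here; a.matches(...) with the same regex string behaves the same way.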

Actually your regex is almost correct, but the problem is that you have enclosed it in two character classes, and you also need to match an optional second character at the end.
You just need to build the regex like this:
public static int check(String a) {
    if (a.length() < 2)
        return 0;
    char on = a.charAt(0);
    char to = a.charAt(1);
    if(on != to) {
        String re = on+"("+to+on+")*"+to+"?|"+to+"("+on+to+")*"+on+"?";
        System.out.println("re: " + re);
        if(a.matches(re)) {
            return a.length();
        }
    }
    return 0;
}
Code Demo
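Note that a regex built from the input's own characters breaks if one of them is a regex metacharacter such as + or (. A minimal sketch that guards against this with Pattern.quote, assuming the same return convention as the question (the class name QuotedAlternatingCheck is hypothetical):

```java
import java.util.regex.Pattern;

class QuotedAlternatingCheck {
    static int check(String a) {
        if (a.length() < 2) {
            return 0;
        }
        char on = a.charAt(0);
        char to = a.charAt(1);
        if (on == to) {
            return 0;
        }
        // Pattern.quote makes characters such as '+' or '(' match literally.
        String first = Pattern.quote(String.valueOf(on));
        String second = Pattern.quote(String.valueOf(to));
        // The string is known to start with 'on', so one alternative is enough:
        // on, then any number of (to on) pairs, then an optional trailing to.
        String re = first + "(?:" + second + first + ")*(?:" + second + ")?";
        return a.matches(re) ? a.length() : 0;
    }

    public static void main(String[] args) {
        System.out.println(check("+-+-+"));
        System.out.println(check("+--"));
    }
}
```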

Related

Regex for negative number with leading zeros and words with apostrophe

Hey, I need a regex that removes leading zeros.
Right now I am using this code. It works, but it doesn't keep the negative sign.
String regex = "^+(?!$)";
String numbers = txaTexte.getText().replaceAll(regex, "");
After that I split numbers so that the numbers are put into an array.
input :
-0005
0003
-87
output :
-5
3
-87
I was also wondering what regex I could use to get the following.
The words before the arrow are the input and those after it are the output.
The text is in French. Right now I am using this; it works, but not with the apostrophe.
String [] tab = txaTexte.getText().split("(?:(?<![a-zA-Z])'|'(?![a-zA-Z])|[^a-zA-Z'])+");
Un beau JOUR. -> Un/beau/JOUR
La boîte crânienne -> La/boîte/crânienne
C’était mieux aujourd’hui -> C’/était/mieux/aujourd’hui
qu’autrefois -> qu’/autrefois
D’hier jusqu’à demain! -> D’/hier/jusqu’/à/demain
Dans mon sous-sol -> Dans/mon/sous-sol
You might capture an optional hyphen in group 1, then match one or more zeros, and capture one or more digits starting with a digit 1-9 in group 2:
^(-?)0+([1-9]\d*)$
^ Start of string
(-?) Capture group 1, match optional hyphen
0+ Match 1 or more zeroes
([1-9]\d*) Capture group 2, match 1+ digits starting with a digit 1-9
$ End of string
See a regex demo.
In the replacement use group 1 and group 2.
String regex = "^(-?)0+([1-9]\\d*)$";
String text = "-0005";
String numbers = text.replaceAll(regex, "$1$2");
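The question's input is several numbers on separate lines. Assuming the text area returns them separated by line breaks, the pattern needs the MULTILINE flag so that ^ and $ match at every line rather than only at the start and end of the whole text. A minimal sketch (the class name LeadingZeros and the method strip are made up for illustration):

```java
class LeadingZeros {
    // (?m) turns on MULTILINE so that ^ and $ match at every line boundary.
    private static final String REGEX = "(?m)^(-?)0+([1-9]\\d*)$";

    // Removes leading zeros from each line, keeping an optional minus sign.
    static String strip(String text) {
        return text.replaceAll(REGEX, "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(strip("-0005\n0003\n-87"));
    }
}
```

Because group 2 has to start with a digit 1-9, a line consisting only of zeros such as 000 does not match and is left unchanged.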
Here is one way that preserves the sign:
capture the optional sign,
skip 0 or more leading zeros,
capture the 1 or more digits that follow.
String regex = "^([+-])?0*(\\d+)";
String [] data = {"-1415", "+2924", "-0000123", "000322", "+000023"};
for (String num : data) {
    String after = num.replaceAll(regex, "$1$2");
    System.out.printf("%8s --> %s%n", num , after);
}
prints
   -1415 --> -1415
   +2924 --> +2924
-0000123 --> -123
  000322 --> 322
 +000023 --> +23
If you want to keep -000, 000, +0000 etc. as just 0, try this regex:
`^[-+]?0*(0)$|^([-+])?0*(\d+)$`
Break down:
^...$ means the entire string should match (^ is the start of the string, $ is the end)
...|... is an alternative
[-+] is a character class that contains only the plus and minus characters. Note that - has a special meaning ("range") in character classes if it's not the first or last character
(...) is a capturing group which can be referenced in the replacement string by $number, where number is the 1-based position of the group within the regex (groups are numbered by the order in which they start, so the first group to start is no. 1, etc.)
?, * and + are quantifiers when used outside character classes, meaning "0 or 1 occurrence" (?), "any number of occurrences, including none" (*) and "at least one occurrence" (+)
^[-+]?0*(0)$ thus means: the entire string must be an optional sign, followed by any number of zeros and ending with a single zero which is captured as group 1.
Alternatively, ^([-+])?0*(\d+)$ means the entire string must be an optional sign which is captured as group 2, followed by any number of zeros and ending in at least one digit which is captured as group 3.
This regex can then be used with String.replaceAll(regex, "$1$2$3") in order to keep only the single 0 from group 1, or the optional sign and the number without leading zeros from groups 2 and 3. Any groups that did not match result in empty strings, which is why this works.
However, regular expressions can be slow, especially if you have to process a lot of strings.
One thing to improve this would be to compile the pattern only once:
//compile the pattern once and reuse it
Pattern p = Pattern.compile("^[+-]?0*(0)$|^([+-])?0*(\\d+)$");
//build a matcher from the pattern and the input string, and do the replacement
String number = p.matcher(txaTexte.getText()).replaceAll("$1$2$3");
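To see what the precompiled pattern does with zero-only inputs, here is a small self-contained sketch. The class name ZeroAwareStripper and the sample values are assumptions chosen for illustration; the pattern and the replacement string are the ones given above.

```java
import java.util.regex.Pattern;

class ZeroAwareStripper {
    // Compiled once and reused for every input string.
    private static final Pattern P =
            Pattern.compile("^[+-]?0*(0)$|^([+-])?0*(\\d+)$");

    // Groups that did not take part in the match contribute nothing
    // to the replacement, so only one alternative's groups show up.
    static String strip(String s) {
        return P.matcher(s).replaceAll("$1$2$3");
    }

    public static void main(String[] args) {
        String[] samples = {"-000", "000", "+0000", "-0005", "000322", "+000023"};
        for (String s : samples) {
            System.out.println(s + " -> " + strip(s));
        }
    }
}
```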
If you're working on a large number of strings (> 10000) you might want to use some specialized plain parsing without regex. Consider something like this, which on my machine is about 10x faster than the regex approach with reused pattern:
public static String stripLeadingZeros(String s) {
    //nothing to do, return the string as is
    if( s == null || s.isEmpty() ) {
        return s;
    }
    char[] chars = s.toCharArray();
    int usedChars = 0;
    //check if the first character is the sign
    boolean hasSign = false;
    if(chars[0] == '-' || chars[0] == '+') {
        hasSign = true;
        usedChars++;
        //special case: just a sign
        if(chars.length == 1) {
            return s;
        }
    }
    //process the rest of the characters
    boolean stripZeros = true;
    for( int i = usedChars; i < chars.length; i++) {
        //not a digit, this isn't a simple integer, stop processing and keep the original string
        if( chars[i] < '0' || chars[i] > '9') {
            return s;
        }
        //are we still in zero-stripping mode
        if( stripZeros) {
            if( chars[i] == '0') {
                continue; //check next char
            }
            //we've found a non-zero char, keep it and end zero-stripping mode
            if(chars[i] >= '1' && chars[i] <= '9') {
                stripZeros = false;
            }
        }
        //since we are ignoring leading zeros, we just move all digits of the actual number to the left
        chars[usedChars++] = chars[i];
    }
    //handle special case of number 0 (with optional sign)
    if( usedChars == (hasSign ? 1 : 0)) {
        chars[0] = '0';
        usedChars = 1;
    }
    return new String(chars,0, usedChars);
}

Regex to identify strings containing a particular symbol?

I have a set of inputs: ++++, ----, +-+-. Out of these inputs, I want the strings containing only + symbols.
If you want to see if a String contains nothing but + characters, write a loop to check it:
private static boolean containsOnly(String input, char ch) {
    if (input.isEmpty())
        return false;
    for (int i = 0; i < input.length(); i++)
        if (input.charAt(i) != ch)
            return false;
    return true;
}
Then call it to check:
System.out.println(containsOnly("++++", '+')); // prints: true
System.out.println(containsOnly("----", '+')); // prints: false
System.out.println(containsOnly("+-+-", '+')); // prints: false
UPDATE
If you must do it using regex (worse performance), then you can do any of these:
// escape special character '+'
input.matches("\\++")
// '+' not special in a character class
input.matches("[+]+")
// if "+" is dynamic value at runtime, use quote() to escape for you,
// then use a repeating non-capturing group around that
input.matches("(?:" + Pattern.quote("+") + ")+")
Replace final + with * in each of these, if an empty string should return true.
The regular expression for checking if a string is composed of only one repeated symbol is
^(.)\1*$
If you only want lines composed of '+', then it's
^\++$, or ^\+\+*$ if your regex implementation does not support the + quantifier (meaning "one or more").
For a sequence of the same symbol, use
(.)\1+
as the regular expression. For example, matched against the whole string, this will match +++ and --- but not +--.
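In Java, the two whole-string checks above could be sketched as follows. The class and method names are hypothetical; String.matches() already anchors at both ends, so ^ and $ are left out.

```java
class RepeatedSymbol {
    // True when the whole string is a single character repeated one or more times.
    static boolean isSingleRepeatedSymbol(String input) {
        return input.matches("(.)\\1*");
    }

    // True when the whole string consists only of '+' characters.
    static boolean isOnlyPlus(String input) {
        return input.matches("\\++");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"++++", "----", "+-+-"}) {
            System.out.println(s + " single symbol: " + isSingleRepeatedSymbol(s)
                    + ", only plus: " + isOnlyPlus(s));
        }
    }
}
```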
Regex pattern: ^[^\+]*?\+[^\+]*$
This will only permit one plus sign per string.
Demo Link
Explanation:
^ #From start of string
[^\+]* #Match 0 or more non plus characters
\+ #Match 1 plus character
[^\+]* #Match 0 or more non plus characters
$ #End of string
Edit: I just read the comments under the question. I didn't copy the regex from the comments; we simply arrived at the same pattern independently.
Also, when using matches() the ^ and $ anchors can be dropped:
input.matches("[^\\+]*?\\+[^\\+]*")

Regex for detecting repeating symbols

I'm looking for a regular expression that will detect repeating symbols in a String. So far I haven't found a solution that fits all my requirements.
The requirements are pretty simple:
detect any repeating symbol in a String;
be able to set the repeat count (e.g. more than twice).
Examples of required detection (of symbol 'a', more than 2 times; true if detected, false otherwise):
"Abcdefg" - false
"AbcdaBCD" - false
"abcd_ab_ab" - true (symbol 'a' used three times)
"aabbaabb" - true (symbol 'a' used four times)
Since I'm not a pro with regexes, a code snippet and an explanation would be appreciated!
Thanks!
I think that
(.).*\1
would work:
(.) match a single character and capture
.* match any intervening characters
\1 match the captured group again.
(You'd need to compile with the DOTALL flag, or replace . with [\s\S] or similar if the string contains characters not ordinarily matched by .)
If you want to require that the character is found at least 3 times, group the last two parts and add a quantifier:
(.)(.*\1){2}
etc.
This is going to be pretty inefficient, though, because it's going to have to do the "search for the next matching character" between every character in the string and the end of the string, making it at least quadratic.
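For reference, the regex approach with a configurable count could look like this in Java. It is a sketch only: the class name RepeatDetector and the parameter minOccurrences are made up, and find() is used because the repeated character may sit anywhere in the string.

```java
import java.util.regex.Pattern;

class RepeatDetector {
    // Builds (.)(?:.*\1){minOccurrences - 1}: one captured character
    // followed by the required number of further occurrences of it.
    static boolean hasSymbolRepeated(String s, int minOccurrences) {
        if (minOccurrences < 2) {
            throw new IllegalArgumentException("minOccurrences must be at least 2");
        }
        Pattern p = Pattern.compile(
                "(.)(?:.*\\1){" + (minOccurrences - 1) + "}",
                Pattern.DOTALL); // DOTALL lets . match line terminators too
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Abcdefg", "AbcdaBCD", "abcd_ab_ab", "aabbaabb"}) {
            System.out.println(s + " -> " + hasSymbolRepeated(s, 3));
        }
    }
}
```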
You might be as well off not using regular expressions, e.g.
char[] cs = str.toCharArray();
Arrays.sort(cs);
int n = numOccurrencesRequired - 1;
for (int i = n; i < cs.length; ++i) {
    boolean allSame = true;
    for (int j = 1; j <= n && allSame; ++j) {
        allSame = cs[i] == cs[i - j];
    }
    if (allSame) return true;
}
return false;
This sorts all of the same characters together, allowing you just to pass over the string once looking for adjacent equal characters.
Note that this doesn't quite work for any symbol: it will split up multi-char codepoints like 🍕. You can adapt the code above to work with codepoints, rather than chars.
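One possible adaptation to code points, as a sketch rather than a definitive version (the class and method names are hypothetical). It sorts the code points instead of the chars; because the array is sorted, comparing an element with the one n positions earlier is enough to know that everything in between is equal as well.

```java
class CodePointRepeats {
    static boolean hasRepeatedCodePoint(String str, int numOccurrencesRequired) {
        if (numOccurrencesRequired < 1) {
            throw new IllegalArgumentException("numOccurrencesRequired must be positive");
        }
        // codePoints() keeps surrogate pairs together as a single int value.
        int[] cps = str.codePoints().sorted().toArray();
        int n = numOccurrencesRequired - 1;
        for (int i = n; i < cps.length; ++i) {
            // Sorted array: equal ends imply that all elements in between are equal.
            if (cps[i] == cps[i - n]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Pizza (U+1F355) and hamburger (U+1F354) share the same high surrogate,
        // so a char-based check would wrongly report a repeated symbol here.
        System.out.println(hasRepeatedCodePoint("\uD83C\uDF55\uD83C\uDF54", 2));
    }
}
```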
Try this regex: (.)(?:.*\1)
It matches any character (.) that is followed by anything (.*) and then by itself (\1). To check for more repeats, add {n,} at the end, with n being the number of repeats you want to check for.
Yes, such a regex exists, but only because the set of characters is finite.
regex: .*(a.*a|b.*b|c.*c|...|y.*y|z.*z).*
It makes no sense to write it that way, though. Use another approach:
String string = "something";
// one counter for every possible char value
int[] count = new int[Character.MAX_VALUE + 1];
for (int i = 0; i < string.length(); i++) {
    char c = string.charAt(i);
    count[c]++;
}
Now you have all characters counted and you can use them as you wish.
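Applied to the question's requirement (symbol 'a', more than 2 times), the counting idea could be sketched like this. Note that it swaps the fixed-size array for a HashMap, and that the class and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

class SymbolCounter {
    // Counts how often each char occurs in the string.
    static Map<Character, Integer> countChars(String s) {
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            counts.merge(s.charAt(i), 1, Integer::sum);
        }
        return counts;
    }

    // True when the given symbol occurs more than 'times' times.
    static boolean occursMoreThan(String s, char symbol, int times) {
        return countChars(s).getOrDefault(symbol, 0) > times;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Abcdefg", "AbcdaBCD", "abcd_ab_ab", "aabbaabb"}) {
            System.out.println(s + " -> " + occursMoreThan(s, 'a', 2));
        }
    }
}
```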

Java regular expression example for a match without length limitation

I'm trying to write a regular expression to match a string that starts with the letter "G", has a digit (0-9) as its second character, and whose remainder can contain anything and be of any length.
I'm stuck with the following code:
String[] array = { "DA4545", "G121", "G8756942", "N45", "4578", "#45565" };
String regExp = "^[G]\\d[0-9]";
for(int i = 0; i < array.length; i++)
{
    if(Pattern.matches(regExp, array[i]))
    {
        System.out.println(array[i] + " - Successful");
    }
}
output:
G12 - Successful
Why does it not match the third element, "G8756942"?
G - the letter G
[0-9] - a digit
.* - any sequence of characters
So the expression
G[0-9].*
will match a letter G followed by a digit followed by any sequence of characters.
When you write \d it already means [0-9],
so \d[0-9] means exactly two digits.
Better use:
^G\\d*
which will match strings that start with G and continue with zero or more digits (and nothing else)
"^[G]\\d[0-9]"
This regex matches "G" followed by a digit (\\d), then another digit.
Use one of these (with find(), because matches() has to match the whole input):
"^G\\d"
"^G[0-9]"
Also note that you don't need a character class since it only contains one letter, so it's redundant.
Try this regex; the .* will match any characters after the digit:
^G\\d.*
http://regex101.com/r/uE4tX1/1
Why does it not match the third element, "G8756942"?
Because you match a string starting with G followed by exactly two digits: \\d is one digit and [0-9] is another. Solution:
^[G]\d
This regex would be fine, as long as it is applied with find() rather than matches().
"G\\d.*"
Because the matches method tries to match the whole input, you need to add .* at the end of your pattern. You also don't need to include anchors.
String[] array = { "DA4545", "G121", "G8756942", "N45", "4578", "#45565" };
String regExp = "G\\d.*";
for(int i = 0; i < array.length; i++)
{
    if(Pattern.matches(regExp, array[i]))
    {
        System.out.println(array[i] + " - Successful");
    }
}
Output:
G121 - Successful
G8756942 - Successful
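The difference between matching the whole input and finding a match inside it explains why several of the shorter patterns above only work with find(). A small sketch to compare the two (the class name and the helper method names are made up for illustration):

```java
import java.util.regex.Pattern;

class MatchesVersusFind {
    // Pattern.matches succeeds only if the regex covers the entire input.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Matcher.find succeeds if the regex matches anywhere in the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "G8756942";
        System.out.println(wholeMatch("^G\\d", input));   // false: only two characters are covered
        System.out.println(partialMatch("^G\\d", input)); // true: the string starts with G and a digit
        System.out.println(wholeMatch("G\\d.*", input));  // true: .* covers the rest
    }
}
```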

How can I look for two specific characters in a string?

String abc = "||:::|:|::";
It should return true if there are two | and three : characters.
I'm not sure how to use "regex" or if it's the right method to use. There's no specific pattern in the abc String.
Using a regex would be a bad idea, especially if there's no specific order to them. Make a function that counts the number of times a character appears in a string, and use that:
public int count(String base, char toFind)
{
    int count = 0;
    char[] haystack = base.toCharArray();
    for (int i = 0; i < haystack.length; i++)
        if (haystack[i] == toFind)
            count++;
    return count;
}
String abc = "||:::|:|::";
// char literals, because count() takes a char
if (count(abc, '|') >= 2 && count(abc, ':') >= 3)
{
    //Do some code here
}
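My favorite method for counting the occurrences of a character in a string is int num = s.length() - s.replace("|", "").length(); you can do that for both characters and test those ints. Use replace() rather than replaceAll() here, because replaceAll() would treat | as the regex alternation operator and remove nothing.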
If you want to test all conditions in one regex you can use look-ahead (?=condition).
Your regex can look like
String regex =
    "(?=(.*[|]){2})" // contains at least two |
    + "(?=(.*:){3})" // contains at least three :
    + "[|:]+";       // is built only from : and | characters
Now you can use it with matches like
String abc = "||:::|:|::";
System.out.println(abc.matches(regex));//true
abc = "|::::::";
System.out.println(abc.matches(regex));//false
Anyway, you can avoid regex and write your own method that counts the | and : characters in your string and checks whether these counts are greater than or equal to 2 and 3. You can use StringUtils.countMatches from Apache Commons Lang, so your test code could look like
public static boolean testString(String s){
    int pipes = StringUtils.countMatches(s, "|");
    int colons = StringUtils.countMatches(s, ":");
    return pipes>=2 && colons>=3;
}
or
public static boolean testString(String s){
    return StringUtils.countMatches(s, "|")>=2
        && StringUtils.countMatches(s, ":")>=3;
}
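If pulling in Apache Commons is not an option, the same test can be sketched with the JDK alone by counting characters on the stream returned by String.chars(). The class name CharCounts and the helper count are hypothetical.

```java
class CharCounts {
    // Counts how many times toFind occurs in s using the JDK's IntStream.
    static long count(String s, char toFind) {
        return s.chars().filter(c -> c == toFind).count();
    }

    // True when s contains at least two '|' and at least three ':'.
    static boolean testString(String s) {
        return count(s, '|') >= 2 && count(s, ':') >= 3;
    }

    public static void main(String[] args) {
        System.out.println(testString("||:::|:|::")); // true
        System.out.println(testString("|::::::"));    // false
    }
}
```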
This assumes you are looking for two '|' one after the other, followed somewhere later by three ':' in a row. You can do that with the following single regular expression (the | has to be escaped, because otherwise it means alternation):
".*\\|\\|.*:::.*"
If you just want to check for the presence of the characters irrespective of their order, use the String.matches method with the following two regular expressions, combined with a logical AND:
".*\\|.*\\|.*"
".*:.*:.*:.*"
Here is a cheat sheet for regular expressions. It's fairly simple to learn. Look at groups and quantifiers in the document to understand the above expressions.
Haven't tested it, but this should work
Pattern.compile("^(?=(?:.*[|]){2,})(?=(?:.*[:]){3,}).*$");
Each lookahead (?=...) scans the string from the start and checks that its character occurs often enough: | at least twice, and : at least three times. The trailing .* then consumes the rest of the string, so the pattern can match the whole input.
