JAVA regex pattern if statement - java

This is what I have already:
^[abceghj-prstw-z][a-np-z]$
I am trying to form regex pattern with these requirements:
First position can be any letter but d,f,i,q,u,v.
Second position can be any letter but o.
The first and second position can't be BG, GB, NK, KN, TN, NT, ZZ.
So for example string "ap" = true.
ao = false (because second position is o).
gb = false (because it can't be gb).
I am pretty new with regular expressions so any help would be great.
Thanks.

You need to make use of negative lookahead to make the regex fail if those specific patterns exist:
^(?i)(?!(bg)|(gb)|(nk)|(kn)|(tn)|(nt)|(zz))[abceghj-prstw-z][a-np-z]$
(?i) makes it case-insensitive.

As answered here, you may add a negative lookahead at the beginning of your regex to exclude the forbidden pairs:
^(?!bg|gb|nk|kn|tn|nt|zz)[abceghj-prstw-z][a-np-z]$
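For reference, here is a minimal Java sketch of the lookahead approach, assuming the pairs should be matched case-insensitively as in the first answer. The class and method names (PairCodeCheck, isValidPair) are made up for illustration.

```java
import java.util.regex.Pattern;

class PairCodeCheck {
    // The negative lookahead rejects the forbidden pairs up front;
    // the two character classes then enforce the per-position rules.
    private static final Pattern PAIR = Pattern.compile(
            "^(?!bg|gb|nk|kn|tn|nt|zz)[abceghj-prstw-z][a-np-z]$",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper name, not part of the original answers.
    static boolean isValidPair(String input) {
        return PAIR.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ap", "ao", "gb", "GB", "da", "gg"}) {
            System.out.println(s + " " + isValidPair(s));
        }
    }
}
```

Compiling the pattern once in a static field avoids recompiling it on every call, which matters if the check runs in a loop.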

You can use negative lookaheads or negative lookbehinds if you don't want to check the exceptions (gb, ...) manually. Here's an example with a negative lookbehind:
Pattern p = Pattern.compile("[abceghj-prstw-z][a-np-z](?<!gb|bg|nk|kn|tn|nt|zz)", Pattern.CASE_INSENSITIVE);
List<String> inputs = Arrays.asList("ap", "apo", "AP", "GB", "gb", "gg");
for (String input : inputs) {
System.out.println(input + " " + p.matcher(input).matches());
}
Prints:
ap true
apo false
AP true
GB false
gb false
gg true

Related

Regex to extract two double numbers separated by a hyphen

I have strings like:
some foo text
some foo
1-2
1.00-2.00
3.21-1.23
2.12-2.12
I have to check whether the string consists of two numbers separated by a hyphen.
How can I do it?
Thanks
The regex for a float is ^[1-9]\d*\.\d+$ or, if the decimal part is optional, ^[1-9]\d*(?:\.\d+)?$ (note that [1-9] rejects a leading zero, so 0.5 will not match).
Repeat it twice with hyphen in between:
`^[1-9]\d*(?:\.\d+)?-[1-9]\d*(?:\.\d+)?$`
You can use the regex:
^\d+(\.\d+)?-\d+(\.\d+)?$
Explanation can be found here.
Using java you can create a method that checks whether your desired pattern exists or not:
public static boolean returnMatch(String input) {
Pattern p1 = Pattern.compile("^\\d+(\\.\\d+)?-\\d+(\\.\\d+)?$");
Matcher m1 = p1.matcher(input);
return m1.find(); // the ^ and $ anchors already force a whole-string match
}
Now call it using:
System.out.println(returnMatch("some foo text")); // false
System.out.println(returnMatch("1.00-2.00")); // true
System.out.println(returnMatch("2.12-2.12")); // true
System.out.println(returnMatch("10-20")); // true
Use a simple Regex:
(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)
This solution treats the decimal part as optional; when it is present, it must contain at least one digit. Demo at Regex101.
\d is a digit
\d+ is at least one digit
\. matches a dot (.) literally
() is a capturing group
(?:\.\d+)? is a non-capturing group which optionally matches the decimal part
Don't forget the proper escaping in Java String regex = "(\\d+(?:\\.\\d+)?)-(\\d+(?:\\.\\d+)?)";
In case one or more whitespace characters appear between the dash and the numbers, use:
(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)
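The capturing groups are what make this last variant useful for extraction. Below is a small Java sketch, assuming the whole string must be the range (hence the added ^ and $ anchors) and that both parts should come back as doubles. NumberRange and parse are illustrative names only.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberRange {
    // Group 1 is the number before the hyphen, group 2 the number after it.
    private static final Pattern RANGE = Pattern.compile(
            "^(\\d+(?:\\.\\d+)?)\\s*-\\s*(\\d+(?:\\.\\d+)?)$");

    // Returns both numbers, or an empty Optional if the format does not match.
    static Optional<double[]> parse(String input) {
        Matcher m = RANGE.matcher(input);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new double[] {
                Double.parseDouble(m.group(1)),
                Double.parseDouble(m.group(2))});
    }

    public static void main(String[] args) {
        for (String s : new String[] {"some foo text", "1-2", "3.21-1.23", "1.00 - 2.00"}) {
            Optional<double[]> r = parse(s);
            System.out.println(s + " -> "
                    + (r.isPresent() ? r.get()[0] + " and " + r.get()[1] : "no match"));
        }
    }
}
```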

Writing regex for string containing not only numbers

I need to write a regex that matches strings containing not only digits [0-9]. How can I do that without explicitly specifying all possible characters in a group? Is it possible to do through lookahead/lookbehind? Examples:
034987694 - doesn't match
23984576s9879 - match
rtfsdbhkjdfg - match
=-0io[-09uhidkbf - match
9347659837564983467 - doesn't match
^(?!\\d+$).*$
This should do it for you. See demo.
https://regex101.com/r/fM9lY3/1
The negative lookahead checks that the string does not consist solely of digits from start to end. You need the $ so that the check runs to the end of the string; otherwise it would reject any string that merely starts with a digit.
If you just need to detect whether the string is not numbers-only, then you can simply test for /\D/ - "succeed if there is a non-digit anywhere".
Why not check whether it contains only digits? If it doesn't, it's a match.
String[] strings = {"034987694", "23984576s9879",
"rtfsdbhkjdfg",
"=-0io[-09uhidkbf",
"9347659837564983467"};
for (String s : strings) {
System.out.printf("%s = %s%n", s, !s.matches("\\d*"));
}
output
034987694 = false
23984576s9879 = true
rtfsdbhkjdfg = true
=-0io[-09uhidkbf = true
9347659837564983467 = false
You may try the following:
string.matches(".*\\D.*");
This expects at least one non-digit character.
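The answers above agree on the sample inputs but differ on one edge case, the empty string. Here is a short Java sketch comparing the lookahead pattern with the "find a non-digit" test. The class and method names are invented for this comparison.

```java
import java.util.regex.Pattern;

class NotOnlyDigits {
    private static final Pattern NON_DIGIT = Pattern.compile("\\D");

    // Lookahead version: anything that is not one or more digits from start to end.
    static boolean viaLookahead(String s) {
        return s.matches("^(?!\\d+$).*$");
    }

    // Non-digit version: succeeds as soon as one non-digit is found anywhere.
    static boolean viaNonDigit(String s) {
        return NON_DIGIT.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"034987694", "23984576s9879", "rtfsdbhkjdfg", ""};
        for (String s : samples) {
            System.out.printf("\"%s\": lookahead=%b, nonDigit=%b%n",
                    s, viaLookahead(s), viaNonDigit(s));
        }
    }
}
```

The empty string passes the lookahead version (it is not "one or more digits") but fails the non-digit version (it contains no non-digit), so which one is right depends on how empty input should be treated.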

Match exactly N repetitions of the same character

How do I write an expression that matches exactly N repetitions of the same character (or, ideally, the same group)? Basically, what (.)\1{N-1} does, but with one important limitation: the expression should fail if the subject is repeated more than N times. For example, given N=4 and the string xxaaaayyybbbbbzzccccxx, the expressions should match aaaa and cccc and not bbbb.
I'm not focused on any specific dialect, feel free to use any language. Please do not post code that works for this specific example only, I'm looking for a general solution.
Use negative lookahead and negative lookbehind.
This would be the regex: (.)(?<!\1.)\1{N-1}(?!\1) except that Python's re module is broken (see this link).
English translation: "Match any character. Make sure that after you match that character, the character before it isn't also that character. Match N-1 more repetitions of that character. Make sure that the character after those repetitions is not also that character."
Unfortunately, the re module (and most regular expression engines) are broken, in that you can't use backreferences in a lookbehind assertion. Lookbehind assertions are required to be constant length, and the compilers aren't smart enough to infer that it is when a backreference is used (even though, like in this case, the backref is of constant length). We have to handhold the regex compiler through this, as so:
The actual answer will have to be messier: r"(.)(?<!(?=\1)..)\1{N-1}(?!\1)"
This works around that bug in the re module by using (?=\1).. instead of \1. (these are equivalent most of the time.) This lets the regex engine know exactly the width of the lookbehind assertion, so it works in PCRE and re and so on.
Of course, a real-world solution is something like [x.group() for x in re.finditer(r"(.)\1*", "xxaaaayyybbbbbzzccccxx") if len(x.group()) == 4]
I suspect you want to be using negative lookahead: (.)\1{N-1}(?!\1).
But that said...I suspect the simplest cross-language solution is just write it yourself without using regexes.
UPDATE:
^(.)\\1{3}(?!\\1)|(.)(?<!(?=\\2)..)\\2{3}(?!\\2) works for me more generally, including matches starting at the beginning of the string.
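To illustrate why the lookahead alone is not enough, here is a Java sketch that runs the lookahead-only pattern next to a "find whole runs, then check the length" approach. ExactRuns and its method names are hypothetical, and n is assumed to be at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactRuns {
    // Lookahead only: nothing stops a match from starting inside a longer run.
    static List<String> lookaheadOnly(String s, int n) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("(.)\\1{" + (n - 1) + "}(?!\\1)").matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Match each maximal run of one character, then keep the runs of length n.
    static List<String> wholeRuns(String s, int n) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("(.)\\1*").matcher(s);
        while (m.find()) {
            if (m.group().length() == n) {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "xxaaaayyybbbbbzzccccxx";
        System.out.println(lookaheadOnly(s, 4)); // includes bbbb, taken from the tail of bbbbb
        System.out.println(wholeRuns(s, 4));     // only aaaa and cccc
    }
}
```

The lookahead-only version reports bbbb because the engine, after failing at the first b, retries one position later and finds four b's followed by z.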
It is easy to put too much burden onto regular expressions and try to get them to do everything, when just nearly everything will do!
Use a regex to find all substrings consisting of a single character, and then check their length separately, like this:
use strict;
use warnings;
my $str = 'xxaaaayyybbbbbzzccccxx';
while ( $str =~ /((.)\2*)/g ) {
next unless length $1 == 4;
my $substr = $1;
print "$substr\n";
}
output
aaaa
cccc
Perl’s regex engine does not support variable-length lookbehind, so we have to be deliberate about it.
sub runs_of_length {
    my($n,$str) = @_;
    my $n_minus_1 = $n - 1;
    my $_run_pattern = qr/
        (?:
            # In the middle of the string, we have to force the
            # run being matched to start on a new character.
            # Otherwise, the regex engine will give a false positive
            # by starting in the middle of a run.
            (.) ((?!\1).) (\2{$n_minus_1}) (?!\2) |
            #$1 $2 $3
            # Don't forget about a potential run that starts at
            # the front of the target string.
            ^(.) (\4{$n_minus_1}) (?!\4)
            # $4 $5
        )
    /x;
    my @runs;
    while ($str =~ /$_run_pattern/g) {
        push @runs, defined $4 ? "$4$5" : "$2$3";
    }
    @runs;
}
A few test cases:
my @tests = (
    "xxaaaayyybbbbbzzccccxx",
    "aaaayyybbbbbzzccccxx",
    "xxaaaa",
    "aaaa",
    "",
);
$" = "][";
for (@tests) {
    my @runs = runs_of_length 4, $_;
    print qq<"$_":\n>,
          " - [@runs]\n";
}
Output:
"xxaaaayyybbbbbzzccccxx":
- [aaaa][cccc]
"aaaayyybbbbbzzccccxx":
- [aaaa][cccc]
"xxaaaa":
- [aaaa]
"aaaa":
- [aaaa]
"":
- []
It’s a fun puzzle, but your regex-averse colleagues will likely be unhappy if such a construction shows up in production code.
How about this in python?
def match(string, n):
    parts = []
    current = None
    for c in string:
        if not current:
            current = c
        else:
            if c == current[-1]:
                current += c
            else:
                parts.append(current)
                current = c
    # Append the final run, which the loop above never flushes.
    if current:
        parts.append(current)
    result = []
    for part in parts:
        if len(part) == n:
            result.append(part)
    return result
Testing with your string with various sizes:
match("xxaaaayyybbbbbzzccccxx", 6) = []
match("xxaaaayyybbbbbzzccccxx", 5) = ['bbbbb']
match("xxaaaayyybbbbbzzccccxx", 4) = ['aaaa', 'cccc']
match("xxaaaayyybbbbbzzccccxx", 3) = ['yyy']
match("xxaaaayyybbbbbzzccccxx", 2) = ['xx', 'zz', 'xx']
Explanation:
The first loop basically splits the text into parts, like so: ["xx", "aaaa", "yyy", "bbbbb", "zz", "cccc", "xx"]. Then the second loop tests those parts for their length. In the end the function only returns the parts that have the current length. I'm not the best at explaining code, so anyone is free to enhance this explanation if needed.
Anyways, I think this'll do!
Why not leave to regexp engine what it does best - finding longest string of same symbols and then check length yourself?
In Perl:
my $str = 'xxaaaayyybbbbbzzccccxx';
while($str =~ /(.)\1{3,}/g){
if(($+[0] - $-[0]) == 4){ # insert here full match length counting specific to language
print (($1 x 4), "\n")
}
}
>>> import itertools
>>> zz = 'xxaaaayyybbbbbzzccccxxaa'
>>> z = [''.join(grp) for key, grp in itertools.groupby(zz)]
>>> z
['xx', 'aaaa', 'yyy', 'bbbbb', 'zz', 'cccc', 'xx', 'aa']
From there you can iterate through the list and pick out the runs whose length is 4 very easily, like this:
>>> [item for item in z if len(item)==4]
['aaaa', 'cccc']
In Java, we can do it like this:
String test ="xxaaaayyybbbbbzzccccxx uuuuuutttttttt";
int trimLegth = 4; // length of the same characters
Pattern p = Pattern.compile("(\\w)\\1+",Pattern.CASE_INSENSITIVE| Pattern.MULTILINE);
Matcher m = p.matcher(test);
while (m.find())
{
if(m.group().length()==trimLegth) {
System.out.println("Same Characters String " + m.group());
}
}

Regex to get first number in string with other characters

I'm new to regular expressions, and was wondering how I could get only the first number in a string like 100 2011-10-20 14:28:55. In this case, I'd want it to return 100, but the number could also be shorter or longer.
I was thinking about something like [0-9]+, but it takes every single number separately (100,2001,10,...)
Thank you.
/^[^\d]*(\d+)/
This will start at the beginning, skip any non-digits, and match the first sequence of digits it finds
EDIT:
This regex will match the first group of digits but, as pointed out in other answers, parseInt is a better solution if you know the number is at the beginning of the string.
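A Java rendering of the pattern above might look like the sketch below. The helper name firstNumber is hypothetical, and \D is used as shorthand for [^\d].

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstNumber {
    // Skip any leading non-digits, then capture the first run of digits.
    private static final Pattern FIRST = Pattern.compile("^\\D*(\\d+)");

    // Returns the first digit sequence as text, or empty if there is none.
    static Optional<String> firstNumber(String s) {
        Matcher m = FIRST.matcher(s);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("100 2011-10-20 14:28:55").orElse("none"));
        System.out.println(firstNumber("id: 42, then 7").orElse("none"));
        System.out.println(firstNumber("no digits here").orElse("none"));
    }
}
```

Returning the digits as a String leaves it to the caller to decide between Integer.parseInt, Long.parseLong or BigInteger, since the question says the number may be longer.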
Try this to match the first standalone number in the string (which need not be at the beginning of the string):
String s = "2011-10-20 525 14:28:55 10";
Pattern p = Pattern.compile("(^|\\s)([0-9]+)($|\\s)");
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println(m.group(2));
}
Just
([0-9]+) .*
If you always have the space after the first number, this will work
Assuming there's always a space between the first two numbers, then
preg_match('/^(\d+)/', $number_string, $matches);
$number = $matches[1]; // 100
But for something like this, you'd be better off using simple string operations:
$space_pos = strpos($number_string, ' ');
$number = substr($number_string, 0, $space_pos);
Regexes are comparatively expensive, and for a task this simple plain string operations are usually enough.
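The code below would do the trick, provided the leading token is isolated first (in Java, Integer.parseInt throws a NumberFormatException if the string contains anything other than an optionally signed integer):
Integer num = Integer.parseInt("100 2011-10-20 14:28:55".split(" ")[0]);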
[0-9] means any of the digits 0-9 can be used, and + means one or more times. Using [0-9]{3} instead will match exactly 3 digits.
Try ^(?'num'[0-9]+).*$ which forces it to start at the beginning, read a number, store it to 'num' and consume the remainder without binding.
This string extension works even when the string does not start with a number.
It returns 1234 in each of these cases: "1234asdfwewf", "%sdfsr1234", "## # 1234"
public static string GetFirstNumber(this string source)
{
if (string.IsNullOrEmpty(source) == false)
{
// take non digits from string start
string notNumber = new string(source.TakeWhile(c => Char.IsDigit(c) == false).ToArray());
if (string.IsNullOrEmpty(notNumber) == false)
{
//replace non digit chars from string start
source = source.Replace(notNumber, string.Empty);
}
//take digits from string start
source = new string(source.TakeWhile(char.IsDigit).ToArray());
}
return source;
}
NOTE: In Java, when you define the patterns as string literals, do not forget to use double backslashes to define a regex escaping backslash (\. = "\\.").
To get the number that appears at the start of a string you may consider using
^[0-9]*\.?[0-9]+ # Float or integer, leading digit may be missing (e.g, .35)
^-?[0-9]*\.?[0-9]+ # Optional - before number (e.g. -.55, -100)
^[-+]?[0-9]*\.?[0-9]+ # Optional + or - before number (e.g. -3.5, +30)
See this regex demo.
If you want to also match numbers with scientific notation at the start of the string, use
^[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Just number
^-?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional -
^[-+]?[0-9]*\.?[0-9]+([eE][+-]?[0-9]+)? # Number with an optional - or +
See this regex demo.
To make sure there is no other digit on the right, add a \b word boundary, or a (?!\d) or (?!\.?\d) negative lookahead that will fail the match if there is any digit (or . and a digit) on the right.
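As a rough Java illustration of the signed, scientific-notation variant (without the optional trailing boundary check), something along these lines should work. LeadingNumber and leadingNumber are made-up names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingNumber {
    // Optional sign, optional integer part, optional dot, digits, optional exponent.
    private static final Pattern LEADING = Pattern.compile(
            "^[-+]?[0-9]*\\.?[0-9]+(?:[eE][-+]?[0-9]+)?");

    // Returns the number text at the start of the input, or empty if there is none.
    static Optional<String> leadingNumber(String s) {
        Matcher m = LEADING.matcher(s);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"100 2011-10-20", "-3.5e2abc", ".35 rest", "+30", "abc 5"}) {
            System.out.println(s + " -> " + leadingNumber(s).orElse("no leading number"));
        }
    }
}
```

Because the pattern is anchored with ^, an input such as "abc 5" yields no result even though it contains a digit later on.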
public static void main(String[] args) {
    Scanner s = new Scanner(System.in);
    String str = s.nextLine();
    Pattern p = Pattern.compile("[0-9]+");
    Matcher m = p.matcher(str);
    // Prints every number in the line; stop after the first find() to get only the first one.
    while (m.find()) {
        System.out.println(m.group() + " ");
    }
}
\d+
\d stands for any decimal digit, while + extends the match to any further digits that follow directly, until a non-digit character such as a space or letter is reached.

test method of RegExp GWT/Javascript

I want to detect if a String is a decimal by using a regular expression. My question is more on how to use the regular expression mechanism than detecting that a String is a decimal. I use the RegExp class provided by GWT.
String regexDecimal = "\\d+(?:\\.\\d+)?";
RegExp regex = RegExp.compile(regexDecimal);
String[] decimals = { "one", "+2", "-2", ".4", "-.4", ".5", "2.5" };
for (int i = 0; i < decimals.length; i++) {
System.out.println(decimals[i] + " "
+ decimals[i].matches(regexDecimal) + " "
+ regex.test(decimals[i]) + " "
+ regex.exec(decimals[i]));
}
The output:
one false false null
+2 false true 2
-2 false true 2
.4 false true 4
-.4 false true 4
.5 false true 5
2.5 true true 2.5
I was expecting that both methods String.matches() and RegExp.test() return the same result.
So what's the difference between both methods?
How do I use RegExp.test() to get the same behaviour?
Try changing the regex to
"^\\d+(?:\\.\\d+)?$"
Explanation:
The double escape is needed because we're in a Java string literal.
Starting the regex with ^ forces the match to begin at the very start of the string.
Ending the regex with $ forces the match to extend to the very end of the string.
This is how you get GWT's RegExp.test() to behave the same as String.matches().
I don't know the difference, but I would say that RegExp.test() is correct, because your regex matches as soon as there is a digit anywhere in your string, whereas String.matches() behaves as if there were anchors around the regex.
\\d+(?:\\.\\d+)?
Your non capturing group is optional, so one \\d ([0-9]) is enough to match, no matter what is around.
When you add anchors to your regex, that means it has to match the string from the start to the end, then RegExp.test() will probably show the same results.
^\\d+(?:\\.\\d+)?$
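The same distinction can be reproduced in plain Java without GWT: String.matches() requires the whole string to match, while Matcher.find() succeeds on any substring. The sketch below uses find() as a stand-in for RegExp.test(); that is an analogy based on the output shown in the question, not GWT's own API, and the class name is invented.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    static final String UNANCHORED = "\\d+(?:\\.\\d+)?";
    static final String ANCHORED = "^\\d+(?:\\.\\d+)?$";

    // Whole-string semantics, as String.matches() provides.
    static boolean wholeString(String s) {
        return s.matches(UNANCHORED);
    }

    // Substring semantics, comparable to what RegExp.test() showed in the question.
    static boolean anywhere(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"one", "+2", ".4", "2.5"}) {
            System.out.println(s + " " + wholeString(s) + " "
                    + anywhere(UNANCHORED, s) + " " + anywhere(ANCHORED, s));
        }
    }
}
```

With the anchored pattern, the third column should line up with the first for these inputs.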
