regex certain character can exist or not but nothing after that - java

I'm new to regex and I'm trying to search a couple of strings.
I want to check whether a certain character, in this case ":" (without the quotes), exists in the strings.
If : does not exist in the string it should still match, but if : exists there should be nothing after it; only spaces and newlines are allowed.
I have this pattern, but it does not seem to work as I want.
(.*)(:?\s*\n*)
Thank you.

If I understand your question correctly, ^[^:]*(:\s*)?$
Let's break this down a bit:
^ Starting anchor; without this, the match can restart itself every time it sees another colon, or non-whitespace following a colon.
[^:]* Match any number of characters that AREN'T colon characters; this way, if the entire string is non-colon characters, the string is treated as a valid match.
(:\s*)? If at any point we do see a colon, all following characters must be white space until the end of the string; the grouping parens and following ? act to make this an all-or-nothing conditional statement.
$ Ending anchor; without this, the regex won't know that if it sees a colon the following whitespace MUST persist until the end of the string.
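A minimal sketch of how that pattern could be exercised from Java. The class name ColonRule and the helper isValid are made up for illustration; only the regex comes from the answer above.

```java
import java.util.regex.Pattern;

final class ColonRule {
    // Pattern from the answer; backslashes are doubled for the Java string literal.
    private static final Pattern RULE = Pattern.compile("^[^:]*(:\\s*)?$");

    // Hypothetical helper: true when the input has no colon,
    // or when only whitespace follows its single colon.
    static boolean isValid(String input) {
        return RULE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("no colon here")); // true
        System.out.println(isValid("key:  \n"));      // true
        System.out.println(isValid("key: value"));    // false
        System.out.println(isValid("a:b:"));          // false
    }
}
```

Since matches() already requires the whole input to match, the anchors are redundant here, but they are needed if the same pattern is used with find().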

Here is a pattern which should work (the surrounding slashes are regex delimiters used in other languages; drop them in Java):
/^([^:]*|([^:]*:\s*))$/
You can use the pipe to manage alternatives.

Another way is:
^[^:]*(|:[\n]*)$
^[^:]* => starts with anything except :
(|:[\n]*)$ => ends either with exactly nothing OR ':' followed by line breaks only; to also allow spaces after the colon, as the question asks, use \s* in place of [\n]*

Related

Using regular expression, how to remove matching sequence at the beginning and ending of the text but keeping what's in the middle?

My problem is very simple, but I can't figure out the correct regular expression to use.
I have the following variable (Java) :
String text = "\033[1mYO\033[0m"; // this is ANSI for bold text in the Terminal
My goal is to remove the ANSI codes with a single regular expression (I just want to keep the plain text at the middle). I cannot modify the text in any way and those ANSI codes will always be at the same place (so one at the beginning, one at the end, though sometimes it's possible that there is none).
With this regular expression, I will remove them using the replaceAll method:
String plainText = text.replaceAll(unknownRegex, "");
Any idea on what the unknown regex could be?
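Well, you can use a single regex that has the ANSI codes optionally at the beginning and end, captures anything in between, and replaces the entire string with the value of the group: text.replaceAll("^(?:\\\\\\d+\\[1m)?(.*?)(?:\\\\\\d+\\[0m)?$", "$1") (this might not capture every ANSI code; adjust if needed). Note that this pattern looks for a literal backslash followed by digits, so it fits text that contains the characters \033 literally; if the string holds the actual ESC character, as the Java literal "\033" does, use \\033 in place of \\\\\\d+.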
Breaking the expression down (note that the example above escapes backslashes for Java strings so they are doubled):
^ is the start of the string
(?:\\\d+\[1m)? matches an optional \<at least 1 digit>[1m
(.*?) matches any text but as little as possible, and captures it into group 1
(?:\\\d+\[0m)? matches an optional \<at least 1 digit>[0m
$ is the end of the input
In the replacement $1 refers to the value of capturing group 1 which is (.*?) in the expression.
Found the answer thanks to a comment that disappeared.
Actually, I just need a group that captures what's in the middle of the string, and then use it ($1) to replace the whole thing:
String plainText = text.replaceAll("\\033\\[.*m(.+)\\033\\[.*m", "$1");
Not sure if this will remove every ANSI code, but it is enough for what I want to do.
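As an alternative sketch (not the approach from either answer), the codes themselves can be deleted instead of capturing the middle. It assumes only sequences of the form ESC [ digits-or-semicolons m need to go; the class and method names are made up for illustration.

```java
final class AnsiStrip {
    // Assumption: only codes shaped like ESC [ 1 m or ESC [ 0 ; 31 m are present.
    // In the regex, \033 is an octal escape for the ESC character.
    static String stripCodes(String text) {
        return text.replaceAll("\\033\\[[0-9;]*m", "");
    }

    public static void main(String[] args) {
        String text = "\033[1mYO\033[0m"; // same sample as in the question
        System.out.println(stripCodes(text)); // YO
    }
}
```

Because [0-9;]* cannot run past the terminating m, this variant does not depend on where the letter m appears in the plain text, and it leaves strings without any codes unchanged.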

Regex to allow only Numbers, alphabets, spaces and hyphens - Java

I need to allow the user to enter only numbers, letters, spaces or hyphens, or any combination of these.
I tried the following:
String regex = "/^[0-9A-Za-z\s\-]+$/";
sampleString.matches(regex);
but it is not working properly. Could somebody help me fix it, please?
Issue: your regex is trying to match a / symbol at the beginning and at the end.
In Java there is no need for / before and after the regex (Java != JavaScript), so use:
"^[0-9A-Za-z\\s-]+$"
^ : beginning of match
[0-9A-Za-z\\s-]+ : one or more letters, digits, whitespace characters or hyphens
$ : end of match
You are close but should make two changes.
The first is to double-escape (i.e., use \\ instead of \). This is needed because the backslash is also the escape character in Java string literals (see the section "Backslashes, escapes, and quoting" in the Javadoc for the Pattern class). The second is to drop the slashes and the explicit references to the start and end of the string; the anchors are implied when using matches(). So the correct Java code is
String regex = "[0-9A-Za-z\\s\\-]+";
sampleString.matches(regex);
While that will work, you can also replace the "0-9" reference with \d and drop the escaping of the "-". That gives you
String regex = "[\\dA-Za-z\\s-]+";
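A small sketch of that last pattern in use. The class name AllowedChars and the helper isAllowed are hypothetical.

```java
final class AllowedChars {
    // String.matches() requires the whole input to match, so no anchors are needed.
    static boolean isAllowed(String input) {
        return input.matches("[\\dA-Za-z\\s-]+");
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("abc 123-x")); // true
        System.out.println(isAllowed("abc_1"));     // false: underscore is not allowed
        System.out.println(isAllowed(""));          // false: + needs at least one character
    }
}
```

Note that \s accepts tabs and newlines as well as spaces; if only the space character should be allowed, a literal space in the character class would be the stricter choice.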

Java Regular Expression Negative Look Ahead Finding Wrong Match

Assume I have the following string.
create or replace package test as
-- begin null; end;/
end;
/
I want a regular expression that will find the semicolon not preceded by a set of "--" double dashes on the same line. I'm using the following pattern "(?!--.*);" and I'm still getting matches for the two semicolons on the 2nd line.
I feel like I'm missing something about negative look aheads but I can't figure out what.
If you want to match semicolons only on the lines which do not start with --, this regex should do the trick:
^(?!--).*(;)
Example
I only made a few changes from your regex:
Multi-line mode, so we can use ^ and $ and search by line
^ at the beginning to indicate start of a line
.* between the negative lookahead and the semicolon, because otherwise with the first change it would try to match something like ^;, which is wrong
(I also added parentheses around the semicolon so the demo page displays the result more clearly, but this is not necessary and you can change to whatever is most convenient for your program.)
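A minimal sketch of this regex with multi-line mode enabled in Java, run against the sample text from the question. The class name and the positions helper are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UncommentedSemicolons {
    // MULTILINE makes ^ match at the start of every line, not just the start of the input.
    private static final Pattern P = Pattern.compile("^(?!--).*(;)", Pattern.MULTILINE);

    // Hypothetical helper: index of the captured semicolon on each matching line.
    static List<Integer> positions(String text) {
        List<Integer> result = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            result.add(m.start(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String sql = "create or replace package test as\n-- begin null; end;/\nend;\n/";
        // Only the semicolon of the "end;" line is reported.
        System.out.println(positions(sql));
    }
}
```

Because .* is greedy, only the last semicolon of each line is captured, and a comment that starts in the middle of a line is not excluded by this pattern.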
First of all, what you need is a negative lookbehind (?<!) and not a negative lookahead (?!), since you want to check what's behind your potential match.
Even with that, a plain negative lookbehind such as (?<!--.*) will not work here, since Java's regex engine does not support lookbehind of unbounded length: the lookbehind must have an obvious maximum length.
With that said, wouldn't it be simpler in your case to just split your String by linefeed/carriage return and then remove the lines that start with "--"?
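The line-splitting idea could look roughly like this. The class and method names are hypothetical, and \R (any line break) assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;

final class LineFilter {
    // Hypothetical helper: keeps every line that does not start with "--".
    static List<String> withoutCommentLines(String text) {
        List<String> kept = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (!line.startsWith("--")) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String sql = "create or replace package test as\n-- begin null; end;/\nend;\n/";
        // Prints the three non-comment lines; semicolons can then be searched in those only.
        System.out.println(withoutCommentLines(sql));
    }
}
```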
The reason "(?!--.*);" isn't working is that the negative lookahead asserts that, when positioned right before a ;, the next two chars are not --, which of course succeeds every time (the next char is ;, so it can never be the start of --).
In java, to match a ; that doesn't have -- anywhere before it:
"\\G(((?<!--)[^;])*);"
To see this in action using a replaceAll() call:
String s = "foo; -- begin null; end;";
s = s.replaceAll("\\G(((?<!--)[^;])*);", "$1!");
System.out.println(s);
Output:
foo! -- begin null; end;
This shows that only semicolons that come before a double dash are matched.

How to match ^(d+) in a particular text using regex

For example I have text like below :
case1:
(1) Hello, how are you?
case2:
Hi. (1) How're you doing?
Now I want to match the text which starts with (\d+).
I have tried the following regex but nothing is working.
^[\(\d+\)], ^\(\d+\).
[] are used to match any one of the characters you specify inside the brackets, and may be followed by a quantifier, so your first pattern is a character class rather than a sequence.
The second regexp will work: ^\(\d+\), so check your code.
Also check that there's no space in front of the first parenthesis, or add \s* in front.
EDIT: Also, java can be tricky with escapes depending on if the regexp you type is directly translated to a regexp or is first a string literal. You may need to double escape your escapes.
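As a minimal sketch, here is the second pattern written as a Java string literal, with every regex backslash doubled. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class ParenNumber {
    // Regex ^\(\d+\) as a Java literal: each backslash is doubled.
    private static final Pattern AT_START = Pattern.compile("^\\(\\d+\\)");
    // Same pattern without the anchor, to find "(1)" anywhere in the text.
    private static final Pattern ANYWHERE = Pattern.compile("\\(\\d+\\)");

    static boolean startsWithNumber(String s) {
        return AT_START.matcher(s).find();
    }

    static boolean containsNumber(String s) {
        return ANYWHERE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(startsWithNumber("(1) Hello, how are you?"));   // true
        System.out.println(startsWithNumber("Hi. (1) How're you doing?")); // false
        System.out.println(containsNumber("Hi. (1) How're you doing?"));   // true
    }
}
```

find() is used instead of matches() because matches() would require the pattern to cover the entire string.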
In Java you have to escape parentheses, so "\\(\\d+\\)" should match (1) in both case1 and case2. Adding ^ as you did, "^\\(\\d+\\)", will match only case1.
You have to use double backslashes within a Java string. Consider this:
"\n" gives you [line break]
"\\n" gives you [backslash][n]
If you are going to downvote my post, at least comment to tell me WHY it's not useful.
I believe Java's Regex Engine supports Positive Lookbehind, in which case you can use the following regex:
(?<=[(][0-9]{1,9999}[)]\s?)\b.*$
Which matches:
The literal text (
Any digit [0-9], between 1 and 9999 times {1,9999}
The literal text )
A space, between 0 and 1 times \s?
A word boundary \b
Any character, between 0 and unlimited times .*
The end of a string $
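A sketch of how this lookbehind pattern might be used from Java. The bounded quantifier {1,9999} keeps the lookbehind at a finite maximum length, which Java's engine requires. The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AfterNumber {
    private static final Pattern P =
            Pattern.compile("(?<=[(][0-9]{1,9999}[)]\\s?)\\b.*$");

    // Hypothetical helper: the text following "(digits)", or null if there is none.
    static String textAfterNumber(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(textAfterNumber("(1) Hello, how are you?"));
        System.out.println(textAfterNumber("Hi. (1) How're you doing?"));
    }
}
```

Note that this returns the text after the number in both cases, since the pattern does not require the number to be at the start of the string.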

regex for specific digit prefix

I am trying to write the following regex rules, but couldn't find a solution.
I am sorry if I didn't make it clear: I want a different regex for each rule. I am using Java.
The first rule should fail for all digit inputs that start with the prefix '1900' or '1901'.
(190011 - fail, 190111 - fail, 41900 - success...)
The second rule should succeed for all digit inputs with the prefix '*'.
I want a different regex for each rule (I am not looking for a combination of both of them together).
Does this RE fit the purpose?
'\A(\*|(?!190[01])).*'
\A means 'the beginning of the string'. I think it's the same in Java's regexes.
EDIT
\A : "from the very beginning of the string ....". In Python (which is what I know, in fact) this can be omitted if we use the function match(), which always analyzes from the very beginning, instead of search(), which searches everywhere in a string. If you want the regex to analyze each line from its very beginning, replace \A with ^ and enable multiline mode.
(...|...) : ".... there must be one of the two following options : ....."
\* : "...the first option is one character only, a star; ..." . As a star is a special character meaning 'zero, one or more times what is before' in regex strings, it must be escaped to strictly mean 'a star' only.
(?!190[01]) : "... the second option isn't a pattern that must be found and possibly captured but a pattern that must be absent (still after the very beginning). ...". The two characters ?! are what says 'there must not be the following characters'. The pattern not to be found is 4 digit characters long, '1900' or '1901' .
(?!.......) is a negative lookahead assertion. All kinds of assertions begin with (? : the parenthesis changes the usual meaning of ? , that's why all assertions are always written with parentheses.
If \* has matched, one character has been consumed. On the contrary, if the assertion is verified, the corresponding first 4 characters of the string haven't been consumed: the regex engine has gone through the analysed string up to the 4th character to verify them, and then it has come back to its initial position, that is to say, here, the very beginning of the string.
If you want the bi-optional part (...|...) not to be a capturing group, you will write ?: just after the first paren, then '\A(?:\*|(?!190[01])).*'
.* : After the beginning pattern (one star matched, or an assertion verified) the regex engine goes on and matches all the characters until the end of the line. If the string has newlines and you want the regex to match all the characters until the end of the string, and not only of a line, you must specify that . matches newlines too (in Python with re.DOTALL, in Java with Pattern.DOTALL), or replace .* with (.|\r|\n)*
I finally understand that you apparently want to catch strings composed of digits characters. If so the RE must be changed to '\A(?:\*|(?!190[01]))\d*' . This RE matches with empty strings. If you want no-match with empty strings, put \d+ in place of \d* . If you want that only strings with at least one digit, even after the star when it begins with a star, match, then do '\A(?:\*|(?!190[01]))(?=\d)\d*'
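A short Java sketch of the digits-only variant above, using \d+ so that empty strings are rejected. The class name PrefixRule and the helper accepts are made up for illustration.

```java
import java.util.regex.Pattern;

final class PrefixRule {
    // Either a leading star, or a position that is not followed by 1900/1901; then digits.
    private static final Pattern RULE =
            Pattern.compile("\\A(?:\\*|(?!190[01]))\\d+");

    static boolean accepts(String input) {
        return RULE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("190011")); // false
        System.out.println(accepts("190111")); // false
        System.out.println(accepts("41900"));  // true
        System.out.println(accepts("*1900"));  // true: the star branch consumed the first char
    }
}
```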
For the first rule, you could use a combined regex with two captures, one to capture the 1900/1901-prefixed case and one to capture the rest. Then you can decide whether the string should succeed or fail by examining the two captures:
(190[01]\d+)|(\d+)
Or just a simple 190[01]\d+ and negate your logic.
Regexes are not really very good at excluding something.
You may exclude a prefix using negative look-behind, but it won't work in this case because the prefix is itself a stream of digits.
You seem to be trying to exclude 1-900/901 phone numbers in the US. If the number of digits is definite, you can use a negative look-behind to exclude this prefix while matching the remaining exact number digits.
For the second rule, simply:
\*\d+
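Both rules kept separate, as the question asks, might be sketched like this in Java. The first rule uses the "negate your logic" idea with the 190[01]\d+ pattern from this answer; the class and method names are hypothetical.

```java
final class SeparateRules {
    // Rule 1: all digits, and NOT a 1900/1901 prefix followed by more digits.
    static boolean rule1(String input) {
        return input.matches("\\d+") && !input.matches("190[01]\\d+");
    }

    // Rule 2: a literal star followed by one or more digits.
    static boolean rule2(String input) {
        return input.matches("\\*\\d+");
    }

    public static void main(String[] args) {
        System.out.println(rule1("190011")); // false
        System.out.println(rule1("41900"));  // true
        System.out.println(rule2("*123"));   // true
        System.out.println(rule2("123"));    // false
    }
}
```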
