Combine multiple regexes to extract a substring from a colon-separated string - Java

I have been stuck for some time trying to develop a single regex that extracts a path from any of the following strings:
1. "life:living:fast"
2. "life"
3. ":life"
4. ":life:"
I have these regex expressions to use:
"(.{3,}):", ":(.{3,}):", ":(.{3,})", "(.{3,})"
The first match is all I need, i.e. the desired result for each input is the substring located where the word life is. Treat life as a placeholder for any value.
But for some reason combining these individual regexes is a pain: if I execute them sequentially I get the word 'life' extracted, but I am unable to combine them into one.
I appreciate your effort.

If you want the first life with the colons, you can use this:
^:?(?:.{3,}?)(?::|$)
If you prefer the first life without the colons, switch to this:
((?<=^:)|^)([^:]{3,}?)(?=:|$)
How it Works #1: ^:?(?:.{3,}?)(?::|$)
With ^:?, at the beginning of the string, we match an optional colon
(?:.{3,}?) lazily matches three or more chars up to...
(?::|$) a colon or the end of the string
How it Works #2: ((?<=^:)|^)([^:]{3,}?)(?=:|$)
((?<=^:)|^) ensures that we are either positioned at the beginning of the string, or after a colon immediately after the beginning of the string
([^:]{3,}?) lazily matches chars that are not colons...
up to a point where the lookahead (?=:|$) can assert that what follows is a colon or the end of the string.
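A minimal Java sketch of the two patterns above, run against the four sample inputs from the question. The class and method names (FirstSegment, firstMatch) are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstSegment {
    // Pattern #1 from the answer: the match keeps the surrounding colons.
    static final Pattern WITH_COLONS =
            Pattern.compile("^:?(?:.{3,}?)(?::|$)");
    // Pattern #2 from the answer: lookarounds keep the colons out of the match.
    static final Pattern WITHOUT_COLONS =
            Pattern.compile("((?<=^:)|^)([^:]{3,}?)(?=:|$)");

    // Returns the first match of the pattern, or null when nothing matches.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] samples = {"life:living:fast", "life", ":life", ":life:"};
        for (String s : samples) {
            System.out.println(s + " -> " + firstMatch(WITHOUT_COLONS, s)
                    + " | " + firstMatch(WITH_COLONS, s));
        }
    }
}
```

With pattern #2 the first group is zero-width, so the whole match (m.group()) is already the bare segment and is the same text as m.group(2).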

You can use this pattern, since you are looking for the first word:
(?<=^:?)[^:]{3,}
Note that this pattern doesn't validate the whole string; it only finds the first word.
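As a rough illustration of that pattern in Java (the class name LeadingWord and the extra inputs "life:!!" and "ab:life" are invented for this sketch): the lookbehind pins the match to the start of the string, or to the position right after a single leading colon, and whatever follows the first word is never inspected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LeadingWord {
    // The lookbehind requires the match to begin at the start of the string
    // or immediately after one leading colon.
    static final Pattern FIRST_WORD = Pattern.compile("(?<=^:?)[^:]{3,}");

    // Returns the first word, or null when the string does not start with one.
    static String firstWord(String input) {
        Matcher m = FIRST_WORD.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] samples = {"life:living:fast", "life", ":life", ":life:", "life:!!", "ab:life"};
        for (String s : samples) {
            System.out.println(s + " -> " + firstWord(s));
        }
    }
}
```

The last sample yields no match: "ab" is shorter than three characters, and "life" is not at the start of the string.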


How to remove front and end character of a string

For this input : |PS D#W ||OOOP #||# || QQWQ|
I want to remove the first and last pipe from the string by modifying the regex below, which I wrote to remove spaces and special characters.
str = str.replaceAll("[^a-zA-Z|]","");
I also want to combine it with this regex: str.replaceAll("\\|+","|") (which collapses runs of pipes inside the string into a single pipe). Is it possible to combine these into one regex?
Expected output: PSDW|OOOP|QQWQ
I won't ask why you want a regex to remove the first and last pipe (you could check whether the string starts and ends with a pipe and use substring).
Remember that a regex is heavier than plain string operations.
But to remove the first and last pipe you can use:
str.replaceAll("^\\|(.*)\\|$","$1")
and then apply the other replacements.
Of course, it depends on how long the string is and how often you call this method, but it is the shortest answer to the question as asked.
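A small sketch of chaining the three replacements in Java, using the input from the question. The class name PipeCleanup is hypothetical, and the order used here (clean, collapse, then strip) is a choice of this sketch rather than something the answer prescribes.

```java
final class PipeCleanup {
    static String clean(String str) {
        return str
                .replaceAll("[^a-zA-Z|]", "")       // drop everything except letters and pipes
                .replaceAll("\\|+", "|")            // collapse runs of pipes into a single pipe
                .replaceAll("^\\|(.*)\\|$", "$1");  // strip one leading and one trailing pipe
    }

    public static void main(String[] args) {
        System.out.println(clean("|PS D#W ||OOOP #||# || QQWQ|"));
    }
}
```

Note that the last replacement only fires when the string both starts and ends with a pipe; if only one side may carry a pipe, a pattern such as "^\\||\\|$" with an empty replacement would be one alternative.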

Regex for multiple occurrences of specific words

Hello
I'm trying to create a validation rule that checks the regular expression to accept only specific phrases. Regex is based on Java.
Here are examples of correct inputs:
1OR2
2
1 OR 2 OR 15
( 2OR3) AND 1
(12AND13 AND1)OR(4 AND5)
((2AND3 AND 1)OR(4AND5))AND6
but I would be happy even if the regex also accepted something like:
())34AND(4
I have no idea how to create a regex that checks whether the brackets open and close correctly (they can be nested). I assume this may be impossible with a regex, so I have already implemented proper bracket validation in code (a stack implementation), which runs as a second validation step on the phrase.
All I need the regex to do is check that the phrase contains only these specific things:
numbers, round brackets, and the words AND and OR, each possibly occurring multiple times; whitespace is allowed.
It should NOT accept letters or other characters.
All I managed to create so far is this:
^[0-9 \\(][0-9 \\(\\)]*
I also tried adding something like:
\\b(AND|OR)\\b
inside the second pair of square brackets, but with no luck.
I cannot figure out how to extend it to allow the words OR and AND.
I used the following and matched all the inputs you gave:
^[^\)][0-9 \( (AND|OR)]*$
I assumed you didn't want to start with ), which is why I included ^[^\)].
In case you weren't aware, I use https://www.regexpal.com to check my regular expressions for code.
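One caveat worth checking: inside a character class, (AND|OR) is read as the individual characters (, A, N, D, |, O, R and ), not as the two words, so strings such as "DNA" or "1|2" would also pass. A hedged alternative, assuming the goal is only the first-pass check described in the question (digits, round brackets, whitespace, and the whole words AND and OR), is to move the words into an alternation. The class name PhraseCheck is hypothetical.

```java
import java.util.regex.Pattern;

final class PhraseCheck {
    // Each repetition consumes one digit, bracket, or whitespace character,
    // or one whole word AND / OR. Bracket balance is NOT checked here.
    static final Pattern ALLOWED = Pattern.compile("(?:[0-9()\\s]|AND|OR)+");

    static boolean hasOnlyAllowedTokens(String phrase) {
        // matches() requires the whole phrase to fit the pattern.
        return ALLOWED.matcher(phrase).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"1OR2", "( 2OR3) AND 1", "())34AND(4", "1 XOR 2", "DNA"};
        for (String s : samples) {
            System.out.println(s + " -> " + hasOnlyAllowedTokens(s));
        }
    }
}
```

This deliberately still accepts "())34AND(4", leaving bracket matching to the stack-based second step mentioned in the question.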
Since you have an arbitrary number of nested elements it's arguably not possible with regex.
For demonstration purposes only, this matches zero or more conjunctions and one set of parentheses:
^\d+(\s*(?:AND|OR)\s*(\d+|\(?\s*\d+(\s*(?:AND|OR)\s*\d+)\s*\)))*$|^(\d+|\(?\s*\d+(\s*(?:AND|OR)\s*\d+)\s*\))\s*(\s*(?:AND|OR)\s*\d+)*$
That's it. Adding more sets and levels of nested parentheses leads to exponentially increasing complexity, until it breaks altogether.

Matches A but not B in Java Regex?

I have a big document. Let's scale it down to:
location=State-City-House
location=City-House
What I want to do is replace all those not starting with State with some other string, say "NY". Those starting with State must remain untouched.
So my end result would be:
location=State-City-House
location=NY-City-House
1. Obviously I can't use String.replaceAll().
2. Using Pattern.matcher() is tricky, since there are two different patterns, where one must be found and one must not be found.
3. I tried a dirty way: replacing "location=State" with "bocation=State" first, then replacing the others, and then reverting the first replacement.
So, is there a neat and simple way to do it?
You can definitely use replaceAll with a negative lookahead:
String repl = input.replaceAll( "(?m)^(location=)(?!State)", "$1NY-" );
(?m) sets the MULTILINE modifier so that the anchors ^ and $ match at the start and end of each line
(location=) matches location= and captures it in group #1
(?!State) is a negative lookahead that fails the match when State appears right after the captured group #1, i.e. after location=
In the replacement we use $1NY- to turn the start of the line into location=NY-.
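Put together as a runnable sketch, using the two sample lines from the question (the class and method names are invented for this example):

```java
final class LocationFix {
    // Inserts "NY-" after "location=" on every line that does not already
    // continue with "State".
    static String addDefaultState(String input) {
        return input.replaceAll("(?m)^(location=)(?!State)", "$1NY-");
    }

    public static void main(String[] args) {
        String input = "location=State-City-House\nlocation=City-House";
        System.out.println(addDefaultState(input));
    }
}
```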
If I understand your intention correctly, you don't actually have the string "State" in your input, but varying strings that represent states.
But some of your text lines are missing the state altogether and only have the name of the City and House. Is that correct? In that case, the defining characteristic between the 2 kinds of lines is the number of dashes.
^location=([^-]+)-([^-]+)$
The above regex matches only full lines with only 1 dash.
I might have misunderstood the task. It would be easier if you would post some of the actual input.
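Under that reading (lines with exactly one dash are the ones missing a state), the pattern could be plugged into replaceAll roughly as follows. This is only a sketch: the replacement string, the (?m) flag, and the class name LocationByDashCount are assumptions added here, not part of the answer.

```java
final class LocationByDashCount {
    // (?m) lets ^ and $ match per line; only lines with exactly one dash
    // after "location=" are rewritten.
    static String addDefaultState(String input) {
        return input.replaceAll("(?m)^location=([^-]+)-([^-]+)$", "location=NY-$1-$2");
    }

    public static void main(String[] args) {
        String input = "location=State-City-House\nlocation=City-House";
        System.out.println(addDefaultState(input));
    }
}
```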

regex certain character can exist or not but nothing after that

I'm new to regex and I'm trying to do a search on a couple of strings.
I want to check whether a certain character, in this case ":" (without the quotes), exists in the strings.
If : does not exist in the string, it should still match; but if : exists, nothing may follow it except spaces and newlines.
I have this pattern, but it does not seem to work as I want:
(.*)(:?\s*\n*)
Thank you.
If I understand your question correctly, ^[^:]*(:\s*)?$
Let's break this down a bit:
^ Starting anchor; without this, the match can restart itself every time it sees another colon, or non-whitespace following a colon.
[^:]* Match any number of characters that AREN'T colon characters; this way, if the entire string is non-colon characters, the string is treated as a valid match.
(:\s*)? If at any point we do see a colon, all following characters must be white space until the end of the string; the grouping parens and following ? act to make this an all-or-nothing conditional statement.
$ Ending anchor; without this, the regex won't know that if it sees a colon the following whitespace MUST persist until the end of the string.
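A quick way to check the breakdown above in Java; the class name ColonCheck and the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

final class ColonCheck {
    // No colon at all, or one colon followed only by whitespace up to the end.
    static final Pattern TRAILING_COLON = Pattern.compile("^[^:]*(:\\s*)?$");

    static boolean isValid(String s) {
        return TRAILING_COLON.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"no colon here", "label:", "label:   \n", "label: value", "a:b:"};
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + isValid(s));
        }
    }
}
```

Since matches() already requires the whole input to match, the anchors are redundant in this particular call, but they keep the pattern correct when it is used with find().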
Here is a pattern which should work:
/^([^:]*|([^:]*:\s*))$/
You can use the pipe to manage alternatives.
Another way is:
^[^:]*(|:[\n]*)$
^[^:]* => starts with anything except :
(|:[\n]*)$ => ends either with exactly nothing OR ':' followed by line breaks

Regular Expression - Return all matches as a single match

I'm working with a piece of code that applies a regex to a string and returns the first match. I don't have access to modify the code to return all matches, nor do I have the ability to implement alternative code.
I have the following example target string:
usera,userb,,userc,,userd,usere,userf,
This is a list of comma delimited usernames joined from multiple sources, some of which were blank resulting in two commas in some places. I'm trying to write a regex that will return all of the comma delimited usernames except for specific values.
For example, consider the following expression:
[^,]\w{1,},(?<!(userb|userc|userd),)
This results in three matches:
usera,
usere,
userf,
Is there any way to get these results as a single match, instead of a match collection, e.g. a single match having the text 'usera,usere,userf,' ?
If I could write code in any language this would be trivial, but I'm limited to supplying only the target string and the pattern, and I need a single match that contains all items except the ones I'm omitting. I'm not sure this is even possible; everything I've ever done with regexes involves processing multiple items in a match collection.
Here is an example in Regex Coach. This image shows that there are the three matches I want, but my requirement is to have the text in a single match, not three separate matches.
EDIT1:
To clarify this ticket is specifically intended to solve the use case using only regular expression syntax. Solving this problem in code is trivial but solving it using only a regex was the requirement given the fact that the executing code is part of a 3rd party product that I didn't want to reverse engineer, wrap, or replace.
Is there any way to get these results as a single match, instead of a match collection, e.g. a single match having the text 'usera,usere,userf,'?
No. Regex matches are consecutive.
A regular expression matches a (sub)string from start to finish. You cannot drop the middle part, this is not how regex engines work. But you can apply the expression again to find another matching substring (incremental search - that's what Regex Coach does). This would result in a match collection.
That being said, you could also just match everything you don't want to keep and remove it, e.g.
,(?=[\s,]+)|(userb|userc|userd)[\s,]*
http://rubular.com/r/LOKOg6IeBa
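To illustrate the removal approach, here is a sketch in Java. Java is used only for illustration, since the question does not say which engine the third-party product uses, and the class name DropUsers is hypothetical.

```java
final class DropUsers {
    // Removes the excluded usernames (with any separators that follow them)
    // and any comma that is directly followed by whitespace or another comma.
    static String dropExcluded(String csv) {
        return csv.replaceAll(",(?=[\\s,]+)|(userb|userc|userd)[\\s,]*", "");
    }

    public static void main(String[] args) {
        System.out.println(dropExcluded("usera,userb,,userc,,userd,usere,userf,"));
    }
}
```

This still relies on a replace operation being available, so it does not lift the restriction described in the question of supplying only a target string and a pattern.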
