Constructing a regular expression - Java

I want to construct a regular expression in order to validate a string with Oval.
I got lost in all the symbols and expressions.
I want my string not to start with certain words and not to contain certain special characters.
For example, I want to exclude the words ignoreMe1, ignoreMe2 and ignoreMe3 at the beginning of the string, and exclude the characters ?;*/ anywhere in it.
As a start, I tried ^(?!ignoreMe1|ignoreMe2|ignoreMe3), but it doesn't work.
How should I proceed?

This will match anything that doesn't start with ignoreMe1/2/3 and does not contain any of those symbols:
^(?!ignoreMe1|ignoreMe2|ignoreMe3)[^?;*/]+$
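A minimal Java sketch of checking strings against that pattern with java.util.regex. The class StartAndCharFilter and its isValid method are illustrative names, not part of Oval or of the original answer. Note that the lookahead only tests the prefix, so a string such as ignoreMe10 is rejected as well.

```java
import java.util.regex.Pattern;

class StartAndCharFilter {
    // Negative lookahead rejects the three prefixes; the class [^?;*/]
    // requires one or more characters, none of which is ?, ;, * or /.
    static final Pattern VALID =
            Pattern.compile("^(?!ignoreMe1|ignoreMe2|ignoreMe3)[^?;*/]+$");

    static boolean isValid(String s) {
        // matches() must consume the whole input, so ^ and $ are redundant but harmless
        return VALID.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"hello world", "ignoreMe2 abc", "a/b", "xignoreMe1"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

This prints true, false, false and true for the four samples: only the second starts with a forbidden word and only the third contains a forbidden character.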

Related

Regular expression to return results that do not match selection

I work on a product that provides a Java API to extend it.
The API provides a function which takes a Perl regular expression and returns a list of matching files.
I want to filter the list to remove all files that end in .xml, .xsl and .cfg; basically the opposite of .*(\.xml|\.xsl|\.cfg).
I have been searching but I haven't been able to get anything to work yet.
I tried .*(?!\.cfg) and ^((?!cfg).)*$ and \.(?!cfg$|?!xml$|?!xsl$).
I don't know if I am on the right track or not.
Note: I know the regex flavors are similar, but I can't get a Java regex working either.
You may use
^(?!.*\.(x[ms]l|cfg)$).+
Details:
^ - start of a string
(?!.*\.(x[ms]l|cfg)$) - a negative lookahead that fails the match if any 0+ chars other than line break chars (.*) are followed by a dot (\.) and xml, xsl or cfg ((x[ms]l|cfg)) at the end of the string ($)
.+ - any 1 or more chars other than line break chars. It might be omitted if a match of the entire string is not required (some tools do require it, though).
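As a sketch, assuming the file names are available as a plain List<String> (the product's own API is not shown in the question), the same pattern can filter the list in Java. ExtensionFilter and keep are hypothetical names:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ExtensionFilter {
    // The lookahead fails the match when the name ends in .xml, .xsl or .cfg
    static final Pattern KEEP = Pattern.compile("^(?!.*\\.(x[ms]l|cfg)$).+");

    static List<String> keep(List<String> files) {
        return files.stream()
                .filter(f -> KEEP.matcher(f).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.xml", "b.txt", "c.cfg", "d.xsl", "e.xmlx");
        System.out.println(keep(files)); // [b.txt, e.xmlx]
    }
}
```

Because of the $ inside the lookahead, e.xmlx is kept: it contains ".xml" but does not end with it.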
You need something like this, which matches only if the end of the string isn't preceded by a dot and one of the three unwanted extensions:
/(?<!\.(?:xml|xsl|cfg))\z/
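In Java the /.../ delimiters are dropped and the backslashes are doubled in the string literal. A minimal sketch (LookbehindFilter is a hypothetical name) that uses find(), since the pattern is a zero-width test at the end of the input rather than a whole-string match:

```java
import java.util.regex.Pattern;

class LookbehindFilter {
    // \z is the absolute end of input; the negative lookbehind fails if the
    // last four characters are .xml, .xsl or .cfg
    static final Pattern KEEP = Pattern.compile("(?<!\\.(?:xml|xsl|cfg))\\z");

    static boolean keep(String name) {
        // matches() would fail here because the pattern consumes no characters
        return KEEP.matcher(name).find();
    }

    public static void main(String[] args) {
        for (String name : new String[] {"a.xml", "b.txt", "config.cfg", "notes.xsl.bak"}) {
            System.out.println(name + " -> " + keep(name));
        }
    }
}
```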

Combine multiple regexes to extract a substring from a :-separated string

I have been stuck for some time trying to develop a single regex to extract a path from any of the following strings:
1. "life:living:fast"
2. "life"
3. ":life"
4. ":life:"
I have these regular expressions to use:
"(.{3,}):", ":(.{3,}):", ":(.{3,})", "(.{3,})"
The first match is all I need, i.e. the desired result for each should be the string located where the word life is. Consider life to be a variable.
But for some reason combining these individual regexes is a pain: if I execute them sequentially I get the word 'life' extracted, but I am unable to combine them into one.
I appreciate your effort.
If you want the first life with the colons, you can use this:
^:?(?:.{3,}?)(?::|$)
If you prefer the first life without the colons, switch to this:
((?<=^:)|^)([^:]{3,}?)(?=:|$)
How it Works #1: ^:?(?:.{3,}?)(?::|$)
With ^:?, at the beginning of the string, we match an optional colon
(?:.{3,}?) lazily matches three or more chars up to...
(?::|$) a colon or the end of the string
How it Works #2: ((?<=^:)|^)([^:]{3,}?)(?=:|$)
((?<=^:)|^) ensures that we are either positioned at the beginning of the string, or after a colon immediately after the beginning of the string
([^:]{3,}?) lazily matches chars that are not colons...
up to a point where the lookahead (?=:|$) can assert that what follows is a colon or the end of the string.
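A small Java sketch running both patterns from this answer over the four sample strings; FirstSegment and its methods are illustrative names. The second pattern's segment is read from group 2, because group 1 only holds the empty anchor alternative ((?<=^:)|^).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstSegment {
    // Pattern #1: the match keeps the surrounding colons
    static final Pattern WITH_COLONS = Pattern.compile("^:?(?:.{3,}?)(?::|$)");
    // Pattern #2: group 2 holds the segment without colons
    static final Pattern WITHOUT_COLONS = Pattern.compile("((?<=^:)|^)([^:]{3,}?)(?=:|$)");

    static String withColons(String s) {
        Matcher m = WITH_COLONS.matcher(s);
        return m.find() ? m.group() : null;
    }

    static String withoutColons(String s) {
        Matcher m = WITHOUT_COLONS.matcher(s);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"life:living:fast", "life", ":life", ":life:"}) {
            System.out.println(s + " -> " + withColons(s) + " / " + withoutColons(s));
        }
    }
}
```

For the four inputs the first pattern yields life:, life, :life and :life:, while the second yields life every time.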
You can use this pattern, since you are looking for the first word:
(?<=^:?)[^:]{3,}
Note that this pattern doesn't validate the whole string; it only finds the first segment.
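The same idea as a Java sketch (FirstWord is a hypothetical name). Java accepts this lookbehind because its length is bounded (zero or one character), and find() returns the first segment:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstWord {
    // Lookbehind: the match must start at the beginning of the input,
    // optionally right after a single leading colon
    static final Pattern FIRST = Pattern.compile("(?<=^:?)[^:]{3,}");

    static String first(String s) {
        Matcher m = FIRST.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"life:living:fast", "life", ":life", ":life:"}) {
            System.out.println(s + " -> " + first(s));
        }
    }
}
```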

Regex and Java to ignore keywords and strings inside quotation marks

I'm searching for keywords that have to start with a letter, followed by letters or digits, or nothing.
Things I am looking for: x, x2, xx, and so on.
The regular expression I have is [A-Za-z][A-Za-z0-9]+|[a-zA-Z]
I need to ignore words such as INT, WRITE, READ and so on, but I'm not sure how to implement that.
Also, if it comes across a quoted string, I need it to ignore whatever is inside the quotation marks.
Any help?
Thanks in advance.
Your question is not clear to me. If you want to accept words that start with a letter and continue with letters or digits (or underscores), but exclude words from a list, you can use the regex:
(?!\b(?:INT|WRITE|READ)\b)\b[A-Za-z]\w*\b
If, instead of a list, you want to exclude words that consist entirely of capital letters, then try:
(?!(?:\b[A-Z]+\b))\b[A-Za-z]\w*\b
In a Java string literal you need to double the backslashes, so it would be:
"(?!\\b(?:INT|WRITE|READ)\\b)\\b[A-Za-z]\\w*\\b"
If you also want to exclude strings within quotes, you could use something like:
"[^"]+"|((?!\b(?:INT|WRITE|READ)\b)\b[A-Za-z]\w*\b)
and then check whether capturing group 1 matched. Group 1 will NOT contain the text delimited by double quotes, because quoted strings are consumed by the first alternative, which has no group.
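A sketch of the capturing-group approach in Java, using the answer's pattern with the backslashes and quotes escaped for a string literal; IdentifierScanner is a hypothetical name. Matches of the quoted-string alternative leave group 1 null and are skipped:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdentifierScanner {
    // Alternative 1 consumes a quoted string; alternative 2 captures, in group 1,
    // an identifier that is not exactly INT, WRITE or READ
    static final Pattern TOKEN = Pattern.compile(
            "\"[^\"]+\"|((?!\\b(?:INT|WRITE|READ)\\b)\\b[A-Za-z]\\w*\\b)");

    static List<String> identifiers(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // group(1) is null when the quoted-string alternative matched
            if (m.group(1) != null) {
                result.add(m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(identifiers("x2 INT \"hello world\" WRITE abc READ xx")); // [x2, abc, xx]
    }
}
```

Thanks to the \b inside the lookahead, a longer word such as INTEGER is still reported; only the exact keywords are skipped.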
Another option would be to replace everything you don't want (the listed words as well as the quoted text) with nothing. In Java, something like:
String resultString = subjectString.replaceAll("\"[^\"]*\"|\\b(?:WRITE|INT|READ)\\b", "");
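One way to combine this with the identifier regex, sketched in Java (StripThenScan is a hypothetical name). It deliberately replaces removed text with a space rather than an empty string, so that the text on either side of a removed token, as in a"q"b, is not glued into one word:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StripThenScan {
    static final Pattern IDENT = Pattern.compile("\\b[A-Za-z]\\w*\\b");

    static List<String> identifiers(String subject) {
        // Same removal pattern as above, but replacing with " " (our choice, not the answer's)
        String stripped = subject.replaceAll("\"[^\"]*\"|\\b(?:WRITE|INT|READ)\\b", " ");
        List<String> result = new ArrayList<>();
        Matcher m = IDENT.matcher(stripped);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(identifiers("x2 INT \"hello world\" WRITE abc READ xx")); // [x2, abc, xx]
    }
}
```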

Java RegEx pattern is invalid when trying to exclude commas

I'm building a function to validate usernames, and in this case I want to accept alphabetic characters only. I'm matching the provided user input against this regex:
[1-9!##$%&*()_+=|<>?{}\\[\\]~-,]
This is the method that makes use of the regex:
public static String purgeInvalidLogin(String failedLogin, String pattern) {
    Pattern special = Pattern.compile(pattern);
    String purgedLogin = failedLogin.replaceAll(special.pattern(), ""); // remove any special characters before moving on
    purgedLogin = StringUtils.deleteWhitespace(purgedLogin);
    return purgedLogin;
}
However when trying to run this I get this message:
Illegal character range near index 25 [!##$%&*()_+=|<>?{}[]~-,] ^
This only started happening once I added the comma. I've also tried the expression [!##$%&*()_+=|<>?{}[]~-\,] (escaping the comma), to no avail. How can I write the regex properly so that it also excludes commas, using my method above?
Thanks in advance.
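Escape the hyphen just before the comma. As soon as you add another character (the comma) after it, the hyphen is interpreted as defining a range of characters from ~ to the comma, which is illegal because the comma (U+002C) comes before ~ (U+007E).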
[1-9!##$%&*()_+=|<>?{}\\[\\]~\\-,]
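A quick Java check of both versions (HyphenEscape is an illustrative name): the unescaped class fails to compile with a PatternSyntaxException, while the escaped one compiles and strips the listed characters. Placing the hyphen first or last in the class would also make it literal.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class HyphenEscape {
    // Unescaped: "~-," is read as a range from '~' (U+007E) to ',' (U+002C), which is reversed
    static final String BROKEN = "[1-9!##$%&*()_+=|<>?{}\\[\\]~-,]";
    // Escaped: "\\-" in the string literal becomes \- in the regex, a literal hyphen
    static final String FIXED = "[1-9!##$%&*()_+=|<>?{}\\[\\]~\\-,]";

    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles(BROKEN)); // false
        System.out.println(compiles(FIXED));  // true
        System.out.println("joe,1 smith!".replaceAll(FIXED, "")); // joe smith
    }
}
```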
You want to accept only alphabetic characters, but you are doing it by listing every possible illegal character. I think you have got this backwards: it would be better to look for what you do want (which would be a much shorter regex) and flag non-matches.
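A sketch of that whitelist approach in Java, assuming only the ASCII letters A-Z and a-z are allowed (if other letters should count, \p{L} is Java's Unicode letter class). AlphaOnly and its methods are hypothetical names:

```java
import java.util.regex.Pattern;

class AlphaOnly {
    // Whitelist: one or more ASCII letters and nothing else
    static final Pattern VALID = Pattern.compile("[A-Za-z]+");

    static boolean isValid(String login) {
        return VALID.matcher(login).matches();
    }

    // Purge variant: delete every character that is not an ASCII letter
    // (this also removes whitespace, so no separate whitespace step is needed)
    static String purge(String login) {
        return login.replaceAll("[^A-Za-z]", "");
    }

    public static void main(String[] args) {
        System.out.println(isValid("JohnDoe"));   // true
        System.out.println(isValid("john,doe"));  // false
        System.out.println(purge("jo-hn 42,"));   // john
    }
}
```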

Regular Expression - Return all matches as a single match

I'm working with a piece of code that applies a regex to a string and returns the first match. I don't have access to modify the code to return all matches, nor do I have the ability to implement alternative code.
I have the following example target string:
usera,userb,,userc,,userd,usere,userf,
This is a list of comma-delimited usernames joined from multiple sources, some of which were blank, resulting in two consecutive commas in some places. I'm trying to write a regex that will return all of the comma-delimited usernames except for specific values.
For example, consider the following expression:
[^,]\w{1,},(?<!(userb|userc|userd),)
This results in three matches:
usera,
usere,
userf,
Is there any way to get these results as a single match, instead of a match collection, e.g. a single match having the text 'usera,usere,userf,' ?
If I could write code in any language this would be trivial, but I'm limited to supplying only the target string and the pattern, and I need a single match that has all items except the ones I'm omitting. I'm not sure this is even possible; everything I've ever done with regexes involves processing multiple items in a match collection.
In Regex Coach I get the three matches I want, but my requirement is to have the text in a single match, not three separate matches.
EDIT1:
To clarify, this ticket is specifically intended to solve the use case using only regular expression syntax. Solving this problem in code is trivial, but solving it using only a regex was the requirement, because the executing code is part of a third-party product that I didn't want to reverse engineer, wrap, or replace.
Is there any way to get these results as a single match, instead of a match collection, e.g. a single match having the text 'usera,usere,userf,'?
No. A regex match is always a contiguous substring.
A regular expression matches a (sub)string from start to finish. You cannot drop the middle part; this is not how regex engines work. But you can apply the expression again to find another matching substring (incremental search, which is what Regex Coach does). This results in a match collection.
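Outside the regex, a match collection is typically turned into one string by looping over find() and concatenating, as in this Java sketch. It only illustrates the point above: the joining happens in code, which the asker cannot add. MatchCollection is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchCollection {
    // The asker's pattern: a username followed by a comma, unless it is userb, userc or userd
    static final Pattern ITEM = Pattern.compile("[^,]\\w{1,},(?<!(userb|userc|userd),)");

    static String joinMatches(String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = ITEM.matcher(input);
        // Each find() returns one separate, contiguous match; code joins them
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinMatches("usera,userb,,userc,,userd,usere,userf,")); // usera,usere,userf,
    }
}
```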
That being said, you could also just match everything you don't want to keep and remove it, e.g.
,(?=[\s,]+)|(userb|userc|userd)[\s,]*
http://rubular.com/r/LOKOg6IeBa
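The removal approach expressed in Java with String.replaceAll, as a sketch; whether the third-party product can perform a replacement at all is not stated in the question. RemoveUnwanted is a hypothetical name:

```java
class RemoveUnwanted {
    // Alternative 1 drops a comma that is followed by more commas or whitespace;
    // alternative 2 drops an unwanted username together with its trailing commas
    static String clean(String input) {
        return input.replaceAll(",(?=[\\s,]+)|(userb|userc|userd)[\\s,]*", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("usera,userb,,userc,,userd,usere,userf,")); // usera,usere,userf,
    }
}
```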
