(?<=^|[a-z]\-|[\s\p{Punct}&&[^\-]])([A-Z][A-Z0-9_]*-\d+)(?![^\W_])
I use the library regexp2.
This regex does not work in Go and reports an error:
regexp2: Compile(`(?<=^|[a-z]\-|[\s\p{Punct}&&[^\-]])([A-Z][A-Z0-9_]*-\d+)(?![^\W_])`): error parsing regexp: unknown unicode category, script, or property 'Punct' in `(?<=^|[a-z]\-|[\s\p{Punct}&&[^\-]])([A-Z][A-Z0-9_]*-\d+)(?![^\W_])`
I would like it to compile and run normally.
If you take a look at the Java regex Pattern documentation, you'll see how \p{Punct} is defined:
\p{Punct} Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
So you need to convert this into a Go regex by spelling the class out explicitly, checking the regexp syntax documentation as you go.
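As a rough sketch of that conversion (written in Java only so the two classes can be compared side by side; the class name PunctClassSketch and its method are made up for illustration), the intersection [\s\p{Punct}&&[^\-]] can be expanded by hand into an explicit list. The assumption here is that the target engine lacks both \p{Punct} and && intersection, so the explicit class is what you would carry over into the Go pattern after checking its escaping rules:

```java
import java.util.regex.Pattern;

class PunctClassSketch {
    // Java shorthand: whitespace or ASCII punctuation, with the hyphen removed.
    static final Pattern JAVA_CLASS = Pattern.compile("[\\s\\p{Punct}&&[^\\-]]");

    // Hand-expanded equivalent: the same punctuation list without '-'.
    // Only '[', '\' and ']' are escaped inside the class.
    static final Pattern EXPLICIT_CLASS =
        Pattern.compile("[\\s!\"#$%&'()*+,./:;<=>?@\\[\\\\\\]^_`{|}~]");

    // Returns true when both classes agree on every ASCII character.
    static boolean classesAgreeOnAscii() {
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            boolean viaShorthand = JAVA_CLASS.matcher(s).matches();
            boolean viaExplicit = EXPLICIT_CLASS.matcher(s).matches();
            if (viaShorthand != viaExplicit) {
                return false;
            }
        }
        return true;
    }
}
```

The check only covers ASCII, which is all that \p{Punct} matches in Java's default mode.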
I want to capture an alphanumeric group in a regex such that it does not capture the leading underscore. For example, _reverse(abc) should return reverse(. I am using (?<name>\w+) but it returns _reverse(.
You can try this,
[^a-zA-Z0-9()\\s+]
The output will be reverse(abc)
You can specify characters explicitly, e.g.:
[a-zA-Z0-9]+
From what you are showing, I assume you want to strip underscores and content behind the opening parentheses.
Basically, that should work with a regex like this:
"_([a-zA-Z0-9]+\\()"
This can be used in conjunction with a Matcher to extract the capturing group (in this case [a-zA-Z0-9]+\(, written with a doubled backslash inside a Java string literal) and return it.
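A minimal sketch of the Matcher approach, assuming the input looks like the _reverse(abc) example from the question (the class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureSketch {
    // Skip the leading underscore, then capture letters/digits plus the opening parenthesis.
    static final Pattern NAME = Pattern.compile("_([a-zA-Z0-9]+\\()");

    // Returns capturing group 1, or null when the input contains no match.
    static String extract(String input) {
        Matcher m = NAME.matcher(input);
        return m.find() ? m.group(1) : null;
    }
}
```

Calling CaptureSketch.extract("_reverse(abc)") gives reverse( because the underscore sits outside the parentheses of the group.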
Note that you can find almost all the help you need with regular expressions on utility sites like RegEx 101 and Regexper, the latter being a nice visualizer that only works with JavaScript-style expressions.
Also, RegEx 101 includes a regex debugger to help avoid dangerous regular expressions.
I'm building some regex expressions to match naming conventions in Sigasi Studio (which uses Java syntax for regex). For example, a port name must end in _i or _o - e.g. my_input_port_i
I tried using the txt2re generator, however instead of a simple expression it generated code.
Looking at regex syntax, it seems that the "$" character (end of line) and the "|" symbol (OR) could be helpful - something like $_i|_o but after testing with regex101.com no matches are found.
Naming convention dialog:
In Sigasi Studio the entire name should match. So you are looking for:
.*_[io]
The $ means end of the string, but you use it at the beginning.
Maybe you are looking for the following, which matches an underscore _, then a character class matching i or o, and then the end of the string $:
_[io]$
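To make the difference between the two answers concrete, here is a small Java sketch (Sigasi Studio is only assumed to behave like Matcher.matches(), as the first answer says; the class and method names are invented):

```java
import java.util.regex.Pattern;

class PortNameSketch {
    // Whole-name match: the pattern has to cover the entire identifier.
    static final Pattern WHOLE_NAME = Pattern.compile(".*_[io]");

    // Search-style match: only the suffix is described, anchored with $.
    static final Pattern SUFFIX_ONLY = Pattern.compile("_[io]$");

    static boolean matchesWholeName(String name) {
        return WHOLE_NAME.matcher(name).matches();
    }

    static boolean hasSuffix(String name) {
        return SUFFIX_ONLY.matcher(name).find();
    }
}
```

Both methods accept my_input_port_i and reject my_input_port_x; which pattern you need depends on whether the tool matches the whole name or searches inside it.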
I am trying to modify an existing Regex expression being pulled in from a properties file from a Java program that someone else built.
The current Regex expression used to match an email address is -
RR.emailRegex=^[a-zA-Z0-9_\\.]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
That matches email addresses such as abc.xyz@example.com, but now some email addresses have dashes in them such as abc-def.xyz@example.com and those are failing the Regex pattern match.
What would my new Regex expression be to add the dash to that regular expression match or is there a better way to represent that?
Based on the regex you are using, you can add the dash to your character classes. Change
RR.emailRegex=^[a-zA-Z0-9_\\.]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
to
RR.emailRegex=^[a-zA-Z0-9_\\.-]+@[a-zA-Z0-9_-]+\\.[a-zA-Z0-9_-]+$
By the way, you can shorten your regex like this:
RR.emailRegex=^[\\w.-]+@[\\w-]+\\.[\\w-]+$
Anyway, I would use Apache EmailValidator instead like this:
if (EmailValidator.getInstance().isValid(email)) ....
The meaning of - inside a character class is different from its meaning elsewhere. Inside a character class, - denotes a range, e.g. 0-9. If you want to include a literal -, write it at the beginning or end of the character class, like [-0-9] or [0-9-].
You also don't need to escape . inside a character class, because it is treated as a literal . there.
Your regex can be simplified further. \w denotes [A-Za-z0-9_]. So you can use
^[-\w.]+@[\w]+\.[\w]+$
In Java, this can be written as
^[-\\w.]+@[\\w]+\\.[\\w]+$
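A quick way to check the hyphen placement is a sketch like the one below, using ordinary addresses with @ as the separator (the class name and helper are illustrative, not part of the original program). Note that this variant only allows the dash in the part before the @:

```java
import java.util.regex.Pattern;

class EmailDashSketch {
    // The hyphen comes first in the class, so it is a literal and not a range operator.
    static final Pattern EMAIL = Pattern.compile("^[-\\w.]+@[\\w]+\\.[\\w]+$");

    static boolean looksLikeEmail(String candidate) {
        return EMAIL.matcher(candidate).matches();
    }
}
```

With this pattern abc-def.xyz@example.com is accepted, while an address with a dash in the domain, such as abc@my-host.com, is still rejected; add the hyphen to the later classes as well if that case matters.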
^[a-zA-Z0-9_\\.\\-]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
This should solve your problem. In regex you need to escape anything that has meaning in the regex engine (e.g. -, ?, *, etc.).
The correct Regex fix is below.
OLD Regex Expression
^[a-zA-Z0-9_\\.]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
NEW Regex Expression
^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$
Actually, I read this post, which covers all the special cases, so the best one that works correctly with Java is:
String pattern ="(?:[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";
Could you please provide regex to match assignments in the text
$one: 3-2; $three: 4-1; 4
$one: 3-2
I've tried \$\w+:.+?;? and expect it to match $one: 3-2; and $one: 3-2.
But it matches only $one:
What am I missing?
I think you mean this:
\$\w+:[^;]*;?
In Java:
"\\$\\w+:[^;]*;?"
Your regex matches only up to the first space, mainly because of the non-greedy pattern and the optional semicolon that follows it.
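The difference can be seen by running both patterns over the sample text. This is only a sketch; the class AssignmentSketch and its constants are names chosen for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AssignmentSketch {
    // Original attempt: the lazy .+? takes one character and the optional ; is skipped.
    static final Pattern LAZY = Pattern.compile("\\$\\w+:.+?;?");

    // Suggested fix: consume everything that is not a semicolon, then an optional semicolon.
    static final Pattern FIXED = Pattern.compile("\\$\\w+:[^;]*;?");

    // Collects every match of the pattern in the text, in order.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }
}
```

On the first sample line, FIXED yields the two full assignments including their semicolons, whereas LAZY stops right after the space that follows each colon. Keep in mind that [^;] also matches line breaks, so feed the text line by line if the lines do not all end in a semicolon.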
\$\w+:.+?(?:;|$)
You need to use this with multiline mode. Yours is not working because ; is optional and .+? is non-greedy, so it stops after consuming only one character, since it is free to skip the ;.
See demo:
https://regex101.com/r/eX9gK2/8
You can use this regex:
\$\w+:\s*([^;]+)
RegEx Demo
In Java it will be:
"\\$\\w+:\\s*([^;]+)"
Consider the following command line: tfile -a -fn P2324_234.w07 -tc 8811
The regex to parse this: -\w+|\w+\s|\w+\.+\w+\s (see screenshot below)
The problem is when the file name has multiple dots, say: tfile -a -fn P23.24.23.4.w07 -tc 8811
Question: how to ensure the P23.24.23.4.w07 is parsed as one argument (as in P23.24.23.4.w07)?
Describe it!
For: P23.24.23.4.w07
use: \w+(?:\.\w+)+
Note that in Java you can use possessive quantifiers and atomic groups:
\\w++(?>\\.\\w++)+
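A short sketch that runs both variants against the command line from the question (the class name and the firstMatch helper are made up for the example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DottedArgSketch {
    // One or more word characters, followed by one or more ".word" groups.
    static final Pattern DOTTED = Pattern.compile("\\w+(?:\\.\\w+)+");

    // Same shape with possessive quantifiers and an atomic group, which avoids backtracking.
    static final Pattern DOTTED_POSSESSIVE = Pattern.compile("\\w++(?>\\.\\w++)+");

    // Returns the first match of the pattern in the input, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }
}
```

Both patterns pick out P23.24.23.4.w07 as a single token; the plain words tfile, fn, tc and 8811 are skipped because they contain no dot.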
Use a character class, e.g., /-fn [a-z0-9.]+ -tc/i. In English, that means "-fn, followed by one or more of characters between a-z, between 0-9, or a ., followed by -tc." If you want to capture that part, wrap that part in parentheses.
I have used this
-\w+|\w+\s|\S+.+\w+\s
Instead of 'word' characters we can use 'non-space' characters. You have not specified any further requirements, so I think this is fine.
Use a quantifier:
-\w+|\w+\s|(?:\w+\.+)+\w+\s
           ^^^      ^^
You can also simplify your expression to:
-?\w+\s?|(?:\w+\.+)+\w+\s
To do this in Java, all you need to do is split the string on the spaces, no regex needed. The good ole String.split() should be able to handle it.
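A minimal sketch of the split approach, assuming arguments never contain quoted spaces (the class and method names are hypothetical). String.split() does take a regex as its argument, but a simple whitespace pattern is all it needs here:

```java
class SplitArgsSketch {
    // Splits a command line on runs of whitespace; quoted arguments are not handled.
    static String[] splitArgs(String commandLine) {
        return commandLine.trim().split("\\s+");
    }
}
```

For tfile -a -fn P23.24.23.4.w07 -tc 8811 this produces six tokens, with the dotted file name kept intact as the fourth one.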