Convert regex from Java to Go - java

(?<=^|[a-z]\-|[\s\p{Punct}&&[^\-]])([A-Z][A-Z0-9_]*-\d+)(?![^\W_])
I use the regexp2 library.
This regex does not compile in Go and reports the following error:
regexp2: Compile(`(?<=^|[a-z]\-|[\s\p{Punct}&&[^\-]])([A-Z][A-Z0-9_]*-\d+)(?![^\W_])`): error parsing regexp: unknown unicode category, script, or property 'Punct' in `(?<=^|[a-z]\-|[\s\p{Punct}&&[^\-]])([A-Z][A-Z0-9_]*-\d+)(?![^\W_])`
How can I get it to run correctly?

If you take a look at the Java regex Pattern documentation, you'll see how \p{Punct} is defined:
\p{Punct} Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
So you need to convert this to a Go regex, checking the regexp syntax documentation for what the Go side supports.
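One way to do that conversion is to spell the punctuation list out by hand and drop the hyphen, so that neither \p{Punct} nor the && intersection is needed. The sketch below does this in Java, only to check that the rewritten pattern behaves like the original on a few sample strings. The class name, the PORTABLE constant and the samples are made up for illustration, and whether regexp2 accepts the rewritten pattern unchanged is an assumption that still needs testing on the Go side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PunctPortability {
    // The Java-only pattern from the question.
    static final Pattern ORIGINAL = Pattern.compile(
        "(?<=^|[a-z]\\-|[\\s\\p{Punct}&&[^\\-]])([A-Z][A-Z0-9_]*-\\d+)(?![^\\W_])");

    // Hypothetical rewrite: \p{Punct}&&[^\-] expanded into an explicit list,
    // i.e. every character from the Java docs' Punct list except the hyphen.
    static final Pattern PORTABLE = Pattern.compile(
        "(?<=^|[a-z]-|[\\s!\"#$%&'()*+,./:;<=>?@\\[\\\\\\]^_`{|}~])"
            + "([A-Z][A-Z0-9_]*-\\d+)(?![^\\W_])");

    // Collects capture group 1 of every match in the input.
    static List<String> findAll(Pattern p, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] samples = {"see ABC-123.", "a-ABC-1", "(ABC-12)", "xABC-1", "-ABC-1", "ABC-12x"};
        for (String s : samples) {
            // Both columns should agree for every sample.
            System.out.println(s + " -> " + findAll(ORIGINAL, s) + " / " + findAll(PORTABLE, s));
        }
    }
}
```

The first three samples should yield one key each and the last three none, with both patterns agreeing.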

Related

I want to capture an alphanumeric group without underscore

I want to capture an alphanumeric group in regex such that it does not capture the starting underscore. For example, _reverse(abc) should return reverse(. I am using (?<name>\w+) but it returns _reverse(.
You can try removing every character that matches this class:
[^a-zA-Z0-9()\\s+]
The output will be reverse(abc)
You can specify characters explicitly, e.g.:
[a-zA-Z0-9]+
From what you are showing, I assume you want to strip underscores and content behind the opening parentheses.
Basically, that should work with a regex like this:
"_([a-zA-Z0-9]+\()"
This can be used in conjunction with a Matcher to extract all capturing groups (in this case, [a-zA-Z0-9]+\() and return them.
Note that you can find almost all the help you need with regular expressions on utility sites like RegEx 101 and Regexper, the latter being a nice visualizer but only working with JavaScript-like expressions.
Also, RegEx 101 contains a regex debugger to help avoid dangerous regular expressions.
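For the Matcher approach mentioned above, a minimal sketch could look like this. The class and method names are made up for illustration. It uses the pattern from the answer and returns capture group 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StripLeadingUnderscore {
    // Group 1 holds the alphanumeric name plus the opening parenthesis,
    // without the leading underscore.
    static final Pattern NAME = Pattern.compile("_([a-zA-Z0-9]+\\()");

    // Returns group 1 of the first match, or null when nothing matches.
    static String extract(String input) {
        Matcher m = NAME.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("_reverse(abc)")); // prints reverse(
    }
}
```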

Regex syntax to check prefixes and suffixes

I'm building some regex expressions to match naming conventions in Sigasi Studio (which uses Java syntax for regex). For example, a port name must end in _i or _o - e.g. my_input_port_i
I tried using the txt2re generator, however instead of a simple expression it generated code.
Looking at regex syntax, it seems that the "$" character (end of line) and the "|" symbol (OR) could be helpful - something like $_i|_o but after testing with regex101.com no matches are found.
Naming convention dialog:
In Sigasi Studio the entire name should match. So you are looking for:
.*_[io]
The $ means end of the string, but you use it at the beginning.
Maybe you are looking for this instead, which matches an underscore _, then a character class matching i or o, and then the end of the string $:
_[io]$
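The difference between the two suggestions is whether the tool requires the whole name to match or only searches inside it. Here is a small sketch of both modes in plain Java. The class and method names are hypothetical, and it only assumes standard java.util.regex behaviour, not anything specific to Sigasi Studio.

```java
import java.util.regex.Pattern;

class PortNameCheck {
    // Whole-name match, as when the entire name must match the expression.
    static boolean fullMatch(String name) {
        return Pattern.matches(".*_[io]", name);
    }

    // Search anywhere in the name; the $ anchors the suffix to the end.
    static boolean suffixFind(String name) {
        return Pattern.compile("_[io]$").matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("my_input_port_i"));  // true
        System.out.println(suffixFind("my_input_port_i")); // true
        System.out.println(fullMatch("my_input_port"));    // false
        System.out.println(suffixFind("my_input_port"));   // false
    }
}
```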

Add Dash to Java Regex

I am trying to modify an existing Regex expression being pulled in from a properties file from a Java program that someone else built.
The current Regex expression used to match an email address is -
RR.emailRegex=^[a-zA-Z0-9_\\.]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
That matches email addresses such as abc.xyz@example.com, but now some email addresses have dashes in them such as abc-def.xyz@example.com and those are failing the Regex pattern match.
What would my new Regex expression be to add the dash to that regular expression match or is there a better way to represent that?
Based on the regex you are using, you can add the dash to your character classes. Change
RR.emailRegex=^[a-zA-Z0-9_\\.]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
to
RR.emailRegex=^[a-zA-Z0-9_\\.-]+@[a-zA-Z0-9_-]+\\.[a-zA-Z0-9_-]+$
Btw, you can shorten your regex like this:
RR.emailRegex=^[\\w.-]+@[\\w-]+\\.[\\w-]+$
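To see the effect of adding the dash, here is a quick sketch that runs the old pattern and the shortened dash-enabled one against two addresses written with the usual @ separator. The class, constant and method names are invented for the example, and the patterns are compiled directly in Java rather than loaded from the properties file.

```java
import java.util.regex.Pattern;

class EmailDashCheck {
    // Pattern without the dash in any character class.
    static final Pattern OLD =
        Pattern.compile("^[a-zA-Z0-9_\\.]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$");

    // Shortened pattern with a trailing dash in each character class.
    static final Pattern WITH_DASH =
        Pattern.compile("^[\\w.-]+@[\\w-]+\\.[\\w-]+$");

    static boolean accepts(Pattern p, String address) {
        return p.matcher(address).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts(OLD, "abc.xyz@example.com"));           // true
        System.out.println(accepts(OLD, "abc-def.xyz@example.com"));       // false
        System.out.println(accepts(WITH_DASH, "abc.xyz@example.com"));     // true
        System.out.println(accepts(WITH_DASH, "abc-def.xyz@example.com")); // true
    }
}
```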
Anyway, I would use the EmailValidator from Apache Commons Validator instead, like this:
if (EmailValidator.getInstance().isValid(email)) ....
The meaning of - inside a character class is different from its meaning elsewhere. Inside a character class, - denotes a range, e.g. 0-9. If you want to include a literal -, write it at the beginning or end of the character class, like [-0-9] or [0-9-].
You also don't need to escape . inside a character class, because there it is treated as a literal dot.
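Both points are easy to confirm with String.matches. The sketch below is only an illustration, and the class and helper names are made up.

```java
class CharClassLiterals {
    // True when the whole input matches the given regex.
    static boolean matches(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        // A dash at the start or end of a class is a literal dash.
        System.out.println(matches("[0-9-]", "-"));  // true
        System.out.println(matches("[-0-9]", "-"));  // true
        // Between two characters it forms a range instead.
        System.out.println(matches("[a-c]", "b"));   // true
        System.out.println(matches("[a-c]", "-"));   // false
        // Escaping the dash makes it literal anywhere in the class.
        System.out.println(matches("[a\\-c]", "b")); // false
        // Inside a class, an unescaped dot only matches a dot.
        System.out.println(matches("[.]", "."));     // true
        System.out.println(matches("[.]", "x"));     // false
    }
}
```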
Your regex can be simplified further. \w denotes [A-Za-z0-9_]. So you can use
^[-\w.]+@[\w]+\.[\w]+$
In Java, this can be written as
^[-\\w.]+@[\\w]+\\.[\\w]+$
^[a-zA-Z0-9_\\.\\-]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
This should solve your problem. In regex you need to escape anything that has meaning in the regex engine (e.g. -, ?, *, etc.).
The correct Regex fix is below.
OLD Regex Expression
^[a-zA-Z0-9_\\.]+@[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+$
NEW Regex Expression
^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$
Actually, I read this post, which covers all the special cases, so the one that works correctly with Java is:
String pattern = "(?:[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*|\"(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21\\x23-\\x5b\\x5d-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])*\")@(?:(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?|\\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-zA-Z0-9-]*[a-zA-Z0-9]:(?:[\\x01-\\x08\\x0b\\x0c\\x0e-\\x1f\\x21-\\x5a\\x53-\\x7f]|\\\\[\\x01-\\x09\\x0b\\x0c\\x0e-\\x7f])+)\\])";

java assignment regex

Could you please provide regex to match assignments in the text
$one: 3-2; $three: 4-1; 4
$one: 3-2
I've tried \$\w+:.+?;? and expect it to match $one: 3-2; and $one: 3-2.
But it matches only $one:
What am I missing?
I think you mean this:
\$\w+:[^;]*;?
In Java:
"\\$\\w+:[^;]*;?"
Your regex matches only up to the first space, mainly because of the non-greedy quantifier and the optional semicolon that follows it.
\$\w+:.+?(?:;|$)
You need to use this in multiline mode. Yours is not working because ; is optional and .+? is non-greedy, so it stops after consuming only one character, since it has the option to skip the ;.
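The lazy-quantifier behaviour described above can be reproduced with a short sketch. The class name, the TEXT constant and the findAll helper are invented for the example, and the sample text is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AssignmentMatch {
    // The two sample lines from the question.
    static final String TEXT = "$one: 3-2; $three: 4-1; 4\n$one: 3-2";

    // Returns every full match of the regex in the input.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Lazy .+? with an optional ; stops after one character.
        System.out.println(findAll("\\$\\w+:.+?;?", 0, TEXT));
        // A negated class consumes everything up to the semicolon.
        System.out.println(findAll("\\$\\w+:[^;]*;?", 0, TEXT));
        // Lazy .+? works once it is forced to reach ; or the end of a line.
        System.out.println(findAll("\\$\\w+:.+?(?:;|$)", Pattern.MULTILINE, TEXT));
    }
}
```

On this input the first pattern only yields the name, the colon and one following space for each assignment, while the other two yield the whole assignments.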
See demo:
https://regex101.com/r/eX9gK2/8
You can use this regex:
\$\w+:\s*([^;]+)
In Java it will be:
"\\$\\w+:\\s*([^;]+)"

Regular expression - filename with multiple periods

Consider the following command line: tfile -a -fn P2324_234.w07 -tc 8811
The regex to parse this: -\w+|\w+\s|\w+\.+\w+\s (see screenshot below)
The problem is when the file name has multiple dots, say: tfile -a -fn P23.24.23.4.w07 -tc 8811
Question: how to ensure the P23.24.23.4.w07 is parsed as one argument (as in P23.24.23.4.w07)?
Describe it!
For: P23.24.23.4.w07
use: \w+(?:\.\w+)+
Note that in Java you can use possessive quantifiers and atomic groups:
\\w++(?>\\.\\w++)+
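As a quick check of both variants, the sketch below pulls the dotted file name out of the two command lines from the question. The class, constant and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DottedFileName {
    // One word followed by one or more ".word" parts.
    static final Pattern DOTTED = Pattern.compile("\\w+(?:\\.\\w+)+");

    // Same idea with possessive quantifiers and an atomic group.
    static final Pattern POSSESSIVE = Pattern.compile("\\w++(?>\\.\\w++)+");

    // Returns the first match in the input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String simple = "tfile -a -fn P2324_234.w07 -tc 8811";
        String dotted = "tfile -a -fn P23.24.23.4.w07 -tc 8811";
        System.out.println(firstMatch(DOTTED, simple));     // P2324_234.w07
        System.out.println(firstMatch(DOTTED, dotted));     // P23.24.23.4.w07
        System.out.println(firstMatch(POSSESSIVE, dotted)); // P23.24.23.4.w07
    }
}
```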
Use a character class, e.g., /-fn [a-z0-9.]+ -tc/i. In English, that means "-fn, followed by one or more of characters between a-z, between 0-9, or a ., followed by -tc." If you want to capture that part, wrap that part in parentheses.
I have used this:
-\w+|\w+\s|\S+.+\w+\s
Instead of 'word' characters we can use 'non-space' characters. You have not specified any extra requirements, so I think this is fine.
Use a quantifier:
-\w+|\w+\s|(?:\w+\.+)+\w+\s
           ^^^      ^^
You can also simplify your expression to:
-?\w+\s?|(?:\w+\.+)+\w+\s
To do this in Java, all you need to do is split the string on spaces; no regex is needed beyond that. The good ole String.split() should be able to handle it.
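A minimal sketch of the split approach follows. The class and method names are hypothetical. It assumes arguments never contain quoted spaces, and it splits on runs of whitespace so that repeated spaces do not produce empty tokens.

```java
import java.util.Arrays;

class SplitArgs {
    // Splits a command line on whitespace; no quoting rules are handled.
    static String[] tokens(String commandLine) {
        return commandLine.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String[] parts = tokens("tfile -a -fn P23.24.23.4.w07 -tc 8811");
        System.out.println(Arrays.toString(parts));
        System.out.println(parts[3]); // P23.24.23.4.w07
    }
}
```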
