Find strings with regex expression [closed] - java

I want to search for Java package names using the following expression:
com.company.*
Test example: https://regex101.com/r/tHTQd9/2
But when I use it in Java code, it doesn't find anything. Do I need to add escape characters for the .?

The following expression would work:
\bcom\.company\.\w[\w\.]*\b
Match between word-boundaries
Use literal dot characters by escaping
1 alphanumeric (or underscore) followed by 0 or more alphanumerics or dots
Pattern regex = Pattern.compile("\\bcom\\.company\\.\\w[\\w\\.]*\\b");
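As a rough illustration of how that compiled pattern could be used, here is a minimal sketch that collects every match from a piece of text. The class name PackageFinder, the method findPackages and the sample input are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PackageFinder {
    // The same pattern as above, written as a Java string literal
    // (every regex backslash is doubled).
    static final Pattern PACKAGE =
            Pattern.compile("\\bcom\\.company\\.\\w[\\w\\.]*\\b");

    // Returns every substring of text that the pattern matches, in order.
    static List<String> findPackages(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PACKAGE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        String text = "import com.company.util.Strings; import org.other.Thing;";
        System.out.println(findPackages(text)); // prints [com.company.util.Strings]
    }
}
```

Because the pattern ends in \b, a trailing dot or semicolon is not included in the match.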

If the last segment should be one or more word characters, you can try:
com\\.company\\.\\w+
Or, more generically (one or more of any character):
com\\.company\\..+
Please remember that this is quite generic and prone to false matches.
If you provide a more detailed explanation or constraints, we can help build a better regex.
Why double backslash in Java?
We know that the backslash character is an escape character in Java String literals as well. Therefore, we need to double the backslash character when using it to precede any character (including the \ character itself).
Source

In Java, to escape a dot (.) you need to precede it with a double backslash (\\). To match whatever follows the package prefix you also need .* after the escaped dot, so your regex will look like this:
com\\.company\\..*
Why the double backslash is needed:
A dot (.) is a special symbol in regex, so you need to escape it with a backslash (\). But the backslash is also an escape character in Java string literals, so the compiler consumes it when it processes the string. To preserve it, we need to add another backslash (\).
The regex string as you write it in source code:
com\\.company\\..*
The string after Java has processed it, which is what the regex engine receives:
com\.company\..*
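The two levels of escaping can be checked directly. The sketch below is only an illustration (the class name EscapeLevels and the sample inputs are invented); it uses a pattern that ends in an escaped dot followed by .*, and also shows Pattern.quote as one way to avoid hand-escaping the literal prefix.

```java
import java.util.regex.Pattern;

class EscapeLevels {
    // Source-code form: each regex backslash is doubled for the Java compiler.
    static final String REGEX = "com\\.company\\..*";

    // Alternative: Pattern.quote wraps the literal prefix so no manual escaping is needed.
    static final String QUOTED = Pattern.quote("com.company.") + ".*";

    // True if the whole name matches the hand-escaped pattern.
    static boolean isCompanyPackage(String name) {
        return Pattern.matches(REGEX, name);
    }

    public static void main(String[] args) {
        System.out.println(REGEX);                                        // prints com\.company\..*
        System.out.println(isCompanyPackage("com.company.util"));         // prints true
        System.out.println(isCompanyPackage("comXcompany.util"));         // prints false
        System.out.println(Pattern.matches(QUOTED, "com.company.util"));  // prints true
    }
}
```

Printing the string is a quick way to see exactly what the regex engine receives.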

Related

What does regular expression (?<=[\\S])[\\S]*\\s* do? [duplicate]

I saw a regular expression in another Stack Overflow question, but I didn't understand the meaning of each part.
String[] split = s.split("(?<=[\\S])[\\S]*\\s*");
The result of this is the acronym of a sentence.
To understand a compound regular expression, should I read it from left to right or vice versa? How can I identify (or delimit) each part?
Thank you for your answers.
(?<=[\\S]) is a lookbehind: it states that the match must be preceded by \\S, that is, any non-whitespace character.
[\\S]* matches zero or more non-whitespace characters.
\\s* matches zero or more whitespace characters.
In essence, the regex finds a non-whitespace character and matches all the non-whitespace characters that follow it, along with any whitespace after them.
The regex matches ohandas<space><space> and aramchand<space> from Mohandas<space><space>Karamchand<space>G
Thus, after using these matches to split the string, you end up with {"M", "K", "G"}
Note the two spaces that the regex matches after Mohandas, because the \\s* part matches zero or more whitespace characters
To make sense of a suspicious regular expression, you can use the websites https://regexr.com/ or https://regex101.com/
Both highlight the parts in color and explain what they do. But you first have to replace the double backslashes with single backslashes.
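To see the split in action, here is a small sketch (the class and method names are made up) that runs the same expression on the sample sentence, with two spaces after the first word.

```java
class Acronym {
    // Splits on "the rest of each word plus any whitespace after it",
    // which leaves only the first character of every word.
    static String[] initials(String sentence) {
        return sentence.split("(?<=[\\S])[\\S]*\\s*");
    }

    public static void main(String[] args) {
        String[] parts = initials("Mohandas  Karamchand G");
        System.out.println(String.join("", parts)); // prints MKG
    }
}
```

The lookbehind fails at the first letter of each word, so that letter is never part of a match and survives the split.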

not able to escape regex using karate framework on IntelliJ [duplicate]

I'm testing a web service with the Karate framework in IntelliJ.
According to the framework's documentation, I should be able to use a regex to assert on XML responses, and I have been able to do so to some extent.
But a problem arises when I want to assert with a regex that contains a backslash, for example: "\X{20,}"
So I tried this (using 3 backslashes, \\\):
Then match response ...... rawData == '#regex \\\X{20,}'
and this gives me an error:
com.intuit.karate.exception.KarateException: Illegal/unsupported escape sequence near index 1
\X{20,}
The regex \\\X{20,} cannot be valid. Double backslash is required in your feature file.
So \\ represents a single regex backslash (meant to escape what follows it). If you want to match a single literal \ in your regex, you need \\\\ in the feature file.
So your pattern should probably be \\\\X{20,} if the content should contain a backslash.
This is documented here.
Note that regex escaping has to be done with a double back-slash - for e.g: '#regex a\\.dot' will match 'a.dot'

(Java) To replace with a slash, why are 4 slashes required in replacement argument of String's replaceAll method? [duplicate]

In Java, the string "\\" represents a single backslash, the first backslash being an escape character. Thus System.out.print("\\") prints \. However, if "\\" is given as the replacement argument to replaceAll, as in "aba".replaceAll("b", "\\"), the following exception is thrown: java.lang.IllegalArgumentException: character to be escaped is missing.
Four backslashes do the trick. Thus if one prints "aba".replaceAll("b", "\\\\") the result is a\a. But why are two backslashes incorrect? Isn't the first backslash the escaping one, and the second the character to be escaped, just like in System.out.print("\\")? Notice that a single escaping backslash is sufficient for other replacement strings passed to replaceAll. E.g. printing "aba".replaceAll("b", "\t") results in a<tab>a.
Note: I'm using Java SE 9.
Edit: Some of the questions suggested as duplicates are not duplicates. Please do not confuse this with the question of why four backslashes are needed in a regex to match a single backslash. This is not the same issue, as the second argument of replaceAll is obviously not a regex. You couldn't specify a replacement String with a regex because ultimately the replacement needs to resolve to a literal String.
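The answer is that replaceAll gives the \ character its own meaning in the replacement string as well: there, \ escapes the character that follows it and $ refers to a captured group. So the backslash has to be escaped twice, once for the Java string literal and once for the replacement syntax.
In simple words, the reason that "aba".replaceAll("b", "\t") results in a<tab>a is that the Java compiler turns \t into a tab character before replaceAll ever sees the string, so the replacement contains only a tab and no backslash.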
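The behaviour can be compared side by side. The sketch below is illustrative only (the class and method names are invented); it shows the four-backslash form next to two standard-library alternatives, Matcher.quoteReplacement and the non-regex String.replace.

```java
import java.util.regex.Matcher;

class BackslashReplacement {
    // "\\\\" is two backslash characters at runtime; replaceAll reads
    // them as "escaped backslash" and inserts one literal backslash.
    static String fourBackslashes(String s) {
        return s.replaceAll("b", "\\\\");
    }

    // quoteReplacement escapes backslashes and dollar signs for us.
    static String quoted(String s) {
        return s.replaceAll("b", Matcher.quoteReplacement("\\"));
    }

    // String.replace takes both arguments literally, so "\\" is enough.
    static String literalReplace(String s) {
        return s.replace("b", "\\");
    }

    public static void main(String[] args) {
        System.out.println(fourBackslashes("aba")); // prints a\a
        System.out.println(quoted("aba"));          // prints a\a
        System.out.println(literalReplace("aba"));  // prints a\a
    }
}
```

When no regex features are needed, String.replace avoids the double escaping altogether.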

Using regex to only match those Strings which use escape character correctly (according to Java syntax)?

Take these strings for example:
"hello world\n" (correct - the regex should match this)
"I'm happy \ here" (incorrect, as the escape character is not used correctly - the regex should not match this one)
I've tried searching on Google but didn't find anything helpful.
I want to use this in a parser that only parses string literals from a Java source file.
Here is the regex I used:
"\\\"(\\[tbnrf\'\"\\])*[a-zA-Z0-9\\`\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)\\_\\-\\+\\=\\|\\{\\[\\}\\]\\;\\:\\'\\/\\?\\>\\.\\<\\,]\\\""
What am I doing wrong?
I guess you gave us the regex in Java String literal form, like
String regex = "\\\"(\\[tbnrf\'\"\\])*[a-zA-Z0-9\\`\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)\\_\\-\\+\\=\\|\\{\\[\\}\\]\\;\\:\\'\\/\\?\\>\\.\\<\\,]\\\"";
Unpacking that from Java's String escaping syntax gives the raw regex:
\"(\[tbnrf'"\])*[a-zA-Z0-9\`\~\!\#\#\$\%\^\&\*\(\)\_\-\+\=\|\{\[\}\]\;\:\'\/\?\>\.\<\,]\"
That consists of:
\" matching a double-quote character (Java String literal begins here). Escaping the double quotes with backslash isn't necessary: " on its own is ok as well.
(\[tbnrf'"\])*: a group, repeated 0...n times. I guess you want that to match against the various Java backslash escapes, but that should read (\\[tbnrf'"\\])* with a double backslash in front and inside the character class. And maybe you want to cover the Java octal escapes as well (see the language specification), giving (\\[tbnrf01234567'"\\])*
[a-zA-Z0-9\`\~\!\#\#\$\%\^\&\*\(\)\_\-\+\=\|\{\[\}\]\;\:\'\/\?\>\.\<\,]: a character class matching one character from a selected list of alphabetic and punctuation characters. I'd replace that with [^"\\], meaning anything but double quote or backslash.
\" matching a double-quote character (string literal ends here). Once again, no need to escape the double quote.
Besides the individual elements, the overall structure of the regex probably isn't what you want: You allow only strings beginning with any number of backslash escapes, followed by exactly one non-escape character, and this enclosed in a pair of double quotes.
The overall structure should instead be "(backslash_escape|simple_character)*"
So, the complete regex would be:
"(\\[tbnrf01234567'"\\]|[^"\\])*"
or, expressed in a Java literal:
String regex = "\"(\\\\[tbnrf01234567'\"\\\\]|[^\"\\\\])*\"";
And, although this is shorter than your original attempt, I'd still not call it readable and opt for a different implementation, not using regular expressions.
P.S. Although I did some testing with my regex, I'm not at all sure that it covers all relevant cases correctly.
P.P.S. There are the \uxxxx escapes, not yet covered by the regex.
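For anyone who wants to try the suggested regex, here is a minimal sketch that checks the two example strings from the question. The class name StringLiteralCheck and the method isValidLiteral are made up for illustration, and \uxxxx escapes are still not handled.

```java
import java.util.regex.Pattern;

class StringLiteralCheck {
    // Raw regex: "(\\[tbnrf01234567'"\\]|[^"\\])*"
    // i.e. a quote, then any number of backslash escapes or plain characters, then a quote.
    static final Pattern LITERAL =
            Pattern.compile("\"(\\\\[tbnrf01234567'\"\\\\]|[^\"\\\\])*\"");

    // True if the whole input is one string literal with valid escapes.
    static boolean isValidLiteral(String source) {
        return LITERAL.matcher(source).matches();
    }

    public static void main(String[] args) {
        // Source text "hello world\n" (a backslash followed by n, not a newline).
        System.out.println(isValidLiteral("\"hello world\\n\""));     // prints true
        // Source text "I'm happy \ here" (a backslash followed by a space).
        System.out.println(isValidLiteral("\"I'm happy \\ here\""));  // prints false
    }
}
```

The two alternatives inside the group cannot match the same character, which keeps backtracking cheap on short inputs.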

Escaping space and equal to character [closed]

I am using Java and have a string that can contain multiple spaces and equals signs (=), as shown below.
String temp = "[name='FPC:CPU']/XM chip/allocate";
This temp string will be passed to another program, which fails because of the space and the equals sign.
How can I escape the space and "=" characters?
My desired output changes the original string from
[name='FPC:CPU']/XM chip/allocate
to
[name\='FPC:CPU']/XM\ chip/allocate
I am wondering how I can do that using temp.replaceAll
That should be pretty straightforward.
System.out.println("foo bar=baz".replaceAll("([ =])", "\\\\$1"));
Should print this
foo\ bar\=baz
The parentheses in the regular expression form a capturing group, and the character class [ =] matches a single space or equals sign.
In the replacement expression, $1 refers to the first capturing group. The only thing that gets a bit tricky is escaping the backslash.
In a regular expression replacement string, the backslash is itself an escape character, so you need two of them together to insert one literal backslash. But the backslash is also an escape character in a Java string literal, so to put two backslashes into the string (to form the replacement escape), you must type four. That's how you end up with "\\\\$1".
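Applied to the string from the question, the same call can be wrapped in a small helper. This is only a sketch; the class name EscapeSpecials and the method escape are invented for the example.

```java
class EscapeSpecials {
    // Prefixes every space and equals sign with a single backslash.
    // "\\\\$1" is the runtime text \\$1: an escaped backslash followed by group 1.
    static String escape(String s) {
        return s.replaceAll("([ =])", "\\\\$1");
    }

    public static void main(String[] args) {
        String temp = "[name='FPC:CPU']/XM chip/allocate";
        System.out.println(escape(temp)); // prints [name\='FPC:CPU']/XM\ chip/allocate
    }
}
```

Adding further characters to escape only requires extending the character class.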
