Regex does not work with trailing quote

I have a string whose value MAY start and end with quotes, and I want to extract the value without them. So I created a regex to do this.
String str = "#TEST_ENV_TEST_VAR=\"value\"";
Pattern p = Pattern.compile("#TEST_ENV_(.*)=\"?(.*)\"?");
Matcher matcher = p.matcher(str);
matcher.find();
String key = matcher.group(2);
But when I check key, the string is value". Shouldn't it be value, since we added ? at the end?
I tried using []? and also tried * but neither works.

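Drop the last ?. With the closing quote optional, the greedy (.*) consumes the quote as well, and \"? then matches the empty string. Making the closing quote mandatory forces (.*) to back off and leave it out of the group: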
#TEST_ENV_(.*)=\"?(.*)\"
Otherwise, if the goal is only to match the string between quotes, you could simply use positive lookaheads and lookbehinds
(?<=\").*(?=\")

This one will match your stuff correctly:
#TEST_ENV_([^=]*)=(\"?)([^\"]*)\2
This matches, in order:
the literal #TEST_ENV_ prefix
anything that is not an equal sign, as group 1
the equal sign itself
an optional quote, as group 2
the value, defined as "anything that is not a quote", as group 3
a back-reference to group 2, meaning a closing quote is required if there was an opening one
The key will be in group 1 and the value in group 3, since group 2 is used for the quote.
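A minimal sketch of the back-reference approach above, checked against both a quoted and an unquoted value. The class name EnvLineDemo and the helper parse are made up for illustration; only the regex comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EnvLineDemo {
    // Group 1: key, group 2: optional opening quote, group 3: value.
    // \2 requires a closing quote only if an opening quote was matched.
    private static final Pattern ENV_LINE =
            Pattern.compile("#TEST_ENV_([^=]*)=(\"?)([^\"]*)\\2");

    /** Hypothetical helper: returns {key, value}, or null if the line does not match. */
    static String[] parse(String line) {
        Matcher m = ENV_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(3) };
    }

    public static void main(String[] args) {
        String[] quoted = parse("#TEST_ENV_TEST_VAR=\"value\"");
        String[] bare = parse("#TEST_ENV_TEST_VAR=value");
        System.out.println(quoted[0] + " -> " + quoted[1]); // TEST_VAR -> value
        System.out.println(bare[0] + " -> " + bare[1]);     // TEST_VAR -> value
    }
}
```

Checking the result of find() before calling group() avoids the IllegalStateException that the original snippet would throw on a non-matching line.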

Related

Can't get rid of the empty first line when I use a regex on polynomials

Here is my code:
String a = "X^5+2X^2+3X^3+4X^4";
String exp[]=a.split("(|\\+\\d)[xX]\\^");
for(int i=0;i<exp.length;i++) {
System.out.println("exp: "+exp[i]+" ");
}
I'm trying to get the output 5, 2, 3, 4,
but instead I got this:
exp:
exp:5
exp:2
exp:3
exp:4
I don't know where the empty first line comes from, and I can't find a way to get rid of it. I tried other regexes and also Pattern.compile, but I still can't get rid of the first line. When I try a new string, "X+X^5+2X^2+3X^3+4X^4", the first line shows exp:X.
I also tried my regex in an online regex tester, which gives 5, 2, 3, 4, but Eclipse gives an empty line and then 5, 2, 3, 4. I need help figuring this out.

Try using a Pattern and Matcher instead of split, e.g.:
String input = "X^5+2X^2+3X^3+4X^4";
Pattern pattern = Pattern.compile("\\^([0-9]+)");
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println("exp: " + matcher.group(1));
}
It gives output:
exp: 5
exp: 2
exp: 3
exp: 4
How it works:
The pattern used is \^([0-9]+)
It matches a ^ followed by one or more digits (note the + sign). The caret (^) is prefixed with a backslash (\) because it has a special meaning in regular expressions (the beginning of the input), but in your case you just want to match a literal ^ character.
We wrap the digits in a group so we can refer to them later during matching. That means marking them with parentheses ( and ).
Then we put the pattern into a Java String. In a String literal, the \ character has a special meaning: it introduces escape sequences, e.g. "\n" represents a new line. So to put our pattern into a String literal we need to escape the \, and the pattern becomes "\\^([0-9]+)". Note the double \.
Next we iterate through all matches, getting group 1, which is the number. Note that the ^ character is not part of group 1 even though it is part of the pattern, because the parentheses mark only the digits as the group we are looking for.

Because you are using the split method, which looks for occurrences of the regex and splits the string at those positions. Your string starts with X^, which matches your regex, so the text before that first delimiter (an empty string) becomes the first element of the result.
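To make the explanation above concrete, here is a small sketch that prints what split() returns for the question's input and then filters out the empty piece. The class name SplitLeadingEmptyDemo and the helper exponents are hypothetical; the regex and input are the ones from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SplitLeadingEmptyDemo {
    /** Hypothetical helper: splits with the question's regex and drops empty pieces. */
    static List<String> exponents(String polynomial) {
        List<String> result = new ArrayList<>();
        for (String piece : polynomial.split("(|\\+\\d)[xX]\\^")) {
            if (!piece.isEmpty()) { // skip the empty string before the first delimiter
                result.add(piece);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String a = "X^5+2X^2+3X^3+4X^4";
        // The first delimiter "X^" sits at index 0, so split() yields "" first.
        System.out.println(Arrays.toString(a.split("(|\\+\\d)[xX]\\^"))); // [, 5, 2, 3, 4]
        System.out.println(exponents(a)); // [5, 2, 3, 4]
    }
}
```

split() only drops trailing empty strings by default, which is why the leading one survives and has to be filtered by hand.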

Java regexp in matcher input

I'm trying to get quoted strings using regexp.
String regexp = "('([^\\\\']+|\\\\([btnfr\"'\\\\]|[0-3]?[0-7]{1,2}|u[0-9a-fA-F]{4}))*'|\"([^\\\\\"]+|\\\\([btnfr\"'\\\\]|[0-3]?[0-7]{1,2}|u[0-9a-fA-F]{4}))*\")";
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(source);
while (m.find()) {
String newElement = m.group(1);
//...
}
It works well, but if the source text contains
' onkeyup="this.value = this.value.replace (/\D/, \'\')">'
the program goes into an endless loop.
How can I correctly extract this string?
For example, I have this text (PHP code):
'qty'=>'<input type="text" maxlength="3" class="qty_text" id='.$key.' value ='
The result should be
'qty'
'<input type="text" maxlength="3" class="qty_text" id='
' value ='

Your regex seems to work okay when presented with a string it matches; it's when it can't match that it goes into the endless loop. (In this case it's the \D that's causing it to choke.) But that regex is much more complicated than it needs to be; you're trying to match them, not validate them. Here's the quintessential regex for a string literal in C-style languages:
"[^"\\\r\n]*(?:\\.[^"\\\r\n]*)*"
...and the single-quoted version, for languages that support that style:
'[^'\\\r\n]*(?:\\.[^'\\\r\n]*)*'
It uses Friedl's "unrolled loop" technique for maximum efficiency. Here's the Java code for it, as generated by RegexBuddy 4:
Pattern regex = Pattern.compile(
"\"[^\"\\\\\r\n]*(?:\\\\.[^\"\\\\\r\n]*)*\"|'[^'\\\\\r\n]*(?:\\\\.[^'\\\\\r\n]*)*'"
);

Maybe I misunderstand the principle, but that looks rather trivial now that you added the example.
Consider this for instance:
String input = "'qty'=>'<input type=\"text\" maxlength=\"3\" class=\"qty_text\" id='.$key.' value ='";
String otherInput = "' onkeyup=\"this.value = this.value.replace (/\\D/, \'\')\">'";
// matching anything starting with single quote and ending with single quote
// included, reluctant quantified
Pattern p = Pattern.compile("'.+?'");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(m.group());
}
m = p.matcher(otherInput);
System.out.println();
while (m.find()) {
System.out.println(m.group());
}
Output:
'qty'
'<input type="text" maxlength="3" class="qty_text" id='
' value ='
' onkeyup="this.value = this.value.replace (/\D/, '
')">'
See the Java Pattern documentation for more detailed explanations.

The character groups that match neither backslashes nor quotes shouldn't be followed by a +. Remove the +es to fix the hang (which was due to catastrophic backtracking).
Also, your original regex wasn't recognizing \D as a valid backslash escape - therefore the string constant in your test input containing \D wasn't being matched. If you make the rules of your regex more liberal to recognize any character immediately following a backslash as part of the string constant, it will behave the way you expect.
"('([^\\\\']|\\\\.)*'|\"([^\\\\\"]|\\\\.)*\")"
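As a rough check of the simplified regex above, the sketch below runs it over both inputs from the question. The class name QuotedStringsDemo and the helper literals are assumptions made for this example; the pattern is copied from the answer unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedStringsDemo {
    // Single- or double-quoted literal; any character may follow a backslash.
    private static final Pattern STRING_LITERAL =
            Pattern.compile("('([^\\\\']|\\\\.)*'|\"([^\\\\\"]|\\\\.)*\")");

    /** Hypothetical helper: collects every quoted literal found in the source text. */
    static List<String> literals(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = STRING_LITERAL.matcher(source);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String php = "'qty'=>'<input type=\"text\" maxlength=\"3\" class=\"qty_text\" id='.$key.' value ='";
        // Here the backslashes are really part of the text (\D, \' and \').
        String js = "' onkeyup=\"this.value = this.value.replace (/\\D/, \\'\\')\">'";
        for (String s : literals(php)) {
            System.out.println(s);
        }
        System.out.println(literals(js).size()); // 1: the whole input is one literal
    }
}
```

For the PHP sample this prints the three literals the question asks for, and the input that used to hang now comes back as a single match.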

You can do it all in one line using split() with the right regex:
String[] array = source.replaceAll("^[^']+", "").split("(?<!\\G.)(?<=').*?(?='|$)");
There's a reasonable amount of regex kung fu going on here, so I'll break it down:
The delimiter is wrapped by even/odd quotes, but can not contain the quotes because split() consumes the delimiter, so a look behind (?<=') and look ahead (?=') (which are non-consuming) is used to match the quotes instead of a literal quote in the regex
a reluctant match .*? for characters between the quotes ensures that it stops at the next quote (instead of matching through to the last quote)
I added an alternate match for end of input to the look ahead (?='|$) in case there's no trailing close quote
And saving the best for last, the regex that is key to making this all work is the negative look behind (?<!\\G.) which means "don't match on the end of the previous match" and ensures the next match advances past the end of the previous delimiter, without which you would end up with just the quote characters in your array. \G matches the end of the previous match, but also matches start of input for the first match, so it rather neatly automatically handles not matching on the first quote - thus making the delimiter wrapped in even/odd quote instead of odd/even as it would be otherwise.
To cater for the input's first character not being a quote, you need to strip off the leading characters before splitting - that's why the replaceAll() is needed
Here's some test code using your sample input:
String source = "'qty'=>'<input type=\"text\" maxlength=\"3\" class=\"qty_text\" id='.$key.' value ='";
String[] array = source.replaceAll("^[^']+", "").split("(?<!\\G.)(?<=').*?(?='|$)");
System.out.println(Arrays.toString(array));
Output:
['qty', '<input type="text" maxlength="3" class="qty_text" id=', ' value =']

Java Regex lookahead takes too much time

I'm trying to create a proper regex for my problem and apparently ran into a weird issue.
Let me describe what I'm trying to do.
My goal is to remove commas from both ends of the string. E.g., the string , ,, ,,, , , Hello, my lovely, world, ,, , should become just Hello, my lovely, world.
I have prepared following regex to accomplish this:
(\w+,*? *?)+(?=(,?\W+$))
It works like a charm in regex validators, but when I'm trying to run it on Android device, matcher.find() function hangs for ~1min to find a proper match...
I assume, the problem is in positive lookahead I'm using, but I couldn't find any better solution than just trim commas separately from the beginning and at the end:
output = input.replaceAll("^(,?\\W?)+", ""); //replace commas at the beginning
output = output.replaceAll("(,?\\W?)+$", ""); //replace commas at the end
Is there something I am missing in positive lookahead in Java regex? How can I retrieve string section between commas at the beginning and at the end?

You don't have to use a lookahead if you use matching groups. Try regex ^[\s,]*(.+?)[\s,]*$:
EDIT: To break it apart, ^ matches the beginning of the line, which is technically redundant if using matches() but may be useful elsewhere. [\s,]* matches zero or more whitespace characters or commas, but greedily--it will accept as many characters as possible. (.+?) matches any string of characters, but the trailing question mark instructs it to match as few characters as possible (non-greedy), and also capture the contents to "group 1" as it forms the first set of parentheses. The non-greedy match allows the final group to contain the same zero-or-more commas or whitespaces ([\s,]*). Like the ^, the final $ matches the end of the line--useful for find() but redundant for matches().
If you need it to match spaces only, replace [\s,] with [ ,].
This should work:
Pattern pattern = Pattern.compile("^[\\s,]*(.+?)[\\s,]*$");
Matcher matcher = pattern.matcher(", ,, ,,, , , Hello, my lovely, world, ,, ,");
if (!matcher.matches())
return null;
return matcher.group(1); // "Hello, my lovely, world"

How can I remove all leading and trailing punctuation?

I want to remove all the leading and trailing punctuation in a string. How can I do this?
Basically, I want to preserve punctuation in between words, and I need to remove all leading and trailing punctuation.
., #, _, &, /, - are allowed if surrounded by letters
or digits
\' is allowed if preceded by a letter or digit
I tried
Pattern p = Pattern.compile("(^\\p{Punct})|(\\p{Punct}$)");
Matcher m = p.matcher(term);
boolean a = m.find();
if(a)
term=term.replaceAll("(^\\p{Punct})", "");
but it didn't work!!

OK. So basically you want to find some pattern in your string and act if the pattern is matched.
Doing this the naive way would be tedious. The naive solution could involve something like
while (myString.startsWith(".") || myString.startsWith(",") || myString.startsWith(";") /* || ... */)
    myString = myString.substring(1);
If you wanted to do a more complex task, it could even be impossible to do it the way I mentioned.
That's why we use regular expressions. A regular expression is a "language" with which you can define a pattern. The computer can then tell whether a string matches that pattern. To learn about regular expressions, just search for the term on Google. One of the first links: http://www.codeproject.com/Articles/9099/The-30-Minute-Regex-Tutorial
As for your problem, you could try this:
myString.replaceFirst("^[^a-zA-Z]+", "")
The meaning of the regex:
The first ^ means that what comes next has to be at the start of the string.
The [] define a set of characters. In this case, those are characters that are NOT (the second ^) letters (a-zA-Z).
The + sign means that the thing before it can be repeated one or more times and still match the regex.
You can use a similar regex to remove trailing chars.
myString.replaceAll("[^a-zA-Z]+$", "");
the $ means "at the end of the string"
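Note that [^a-zA-Z] also strips leading and trailing digits, which the question wants to keep. A possible variant, sketched here under the assumption that "punctuation" means Java's \p{Punct} class, trims both ends in one pass. The names TrimPunctuationDemo and trim are invented for the example.

```java
import java.util.regex.Pattern;

final class TrimPunctuationDemo {
    // A run of punctuation at the very start, or a run at the very end.
    private static final Pattern EDGE_PUNCT =
            Pattern.compile("^\\p{Punct}+|\\p{Punct}+$");

    /** Hypothetical helper: removes leading and trailing punctuation only. */
    static String trim(String s) {
        return EDGE_PUNCT.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(trim("...Hello, world!!")); // Hello, world
        System.out.println(trim("#2nd-place_"));       // 2nd-place
    }
}
```

Punctuation between words is left alone because neither alternative can match in the middle of the string: one is anchored at the start and the other at the end.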

You could use a regular expression:
private static final Pattern PATTERN =
Pattern.compile("^\\p{Punct}*(.*?)\\p{Punct}*$");
public static String trimPunctuation(String s) {
Matcher m = PATTERN.matcher(s);
m.find();
return m.group(1);
}
The boundary matchers ^ and $ ensure the whole input is matched.
A dot . matches any single character.
A star * means "match the preceding thing zero or more times".
The parentheses () define a capturing group whose value is retrieved by calling Matcher.group(1).
The ? in (.*?) means you want the match to be non-greedy, otherwise the trailing punctuation would be included in the group.

Use this tutorial on patterns. You have to create a regex that matches a string starting with a letter or digit and ending with a letter or digit, and then call inputString.matches("regex").

How to find the exact word using a regex in Java?

Consider the following code snippet:
String input = "Print this";
System.out.println(input.matches("\\bthis\\b"));
Output
false
What could be possibly wrong with this approach? If it is wrong, then what is the right solution to find the exact word match?
PS: I have found a variety of similar questions here but none of them provide the solution I am looking for.
Thanks in advance.

When you use the matches() method, it is trying to match the entire input. In your example, the input "Print this" doesn't match the pattern because the word "Print" isn't matched.
So you need to add something to the regex to match the initial part of the string, e.g.
.*\\bthis\\b
And if you want to allow extra text at the end of the line too:
.*\\bthis\\b.*
Alternatively, use a Matcher object and use Matcher.find() to find matches within the input string:
Pattern p = Pattern.compile("\\bthis\\b");
Matcher m = p.matcher("Print this");
m.find();
System.out.println(m.group());
Output:
this
If you want to find multiple matches in a line, you can call find() and group() repeatedly to extract them all.
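The repeated find() calls mentioned above could look like the following sketch, which collects the start index of every whole-word occurrence. The class name FindAllDemo, the helper wordStarts, and the sample sentence are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindAllDemo {
    /** Hypothetical helper: start offsets of every whole-word occurrence of word in text. */
    static List<Integer> wordStarts(String text, String word) {
        // Pattern.quote keeps regex metacharacters in the word literal.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher m = p.matcher(text);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) { // each call resumes after the previous match
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "Print this, then this again; thistle";
        System.out.println(wordStarts(text, "this"));     // [6, 17]
        System.out.println(text.matches("\\bthis\\b"));   // false: matches() needs the whole input
    }
}
```

The word "thistle" is skipped because there is no word boundary between "this" and "tle".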

Full example method for matcher:
public static String REGEX_FIND_WORD="(?i).*?\\b%s\\b.*?";
public static boolean containsWord(String text, String word) {
String regex=String.format(REGEX_FIND_WORD, Pattern.quote(word));
return text.matches(regex);
}
Explanation:
(?i) - ignore case
.*? - allow (optionally) any characters before
\b - word boundary
%s - placeholder filled in by String.format (quoted to avoid regex errors)
\b - word boundary
.*? - allow (optionally) any characters after

For a good explanation, see: http://www.regular-expressions.info/java.html
myString.matches("regex") returns true or false depending whether the
string can be matched entirely by the regular expression. It is
important to remember that String.matches() only returns true if the
entire string can be matched. In other words: "regex" is applied as if
you had written "^regex$" with start and end of string anchors. This
is different from most other regex libraries, where the "quick match
test" method returns true if the regex can be matched anywhere in the
string. If myString is abc then myString.matches("bc") returns false.
bc matches abc, but ^bc$ (which is really being used here) does not.
This writes "true":
String input = "Print this";
System.out.println(input.matches(".*\\bthis\\b"));

You may use groups to find the exact word. Regex API specifies groups by parentheses. For example:
A(B(C))D
This expression has three groups, indexed from 0 (group 0 is always the whole match, so there are two capturing groups):
0th group - ABCD
1st group - BC
2nd group - C
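The group numbering listed above can be checked with a short sketch. The class name GroupNumberingDemo and the helper groups are hypothetical; the pattern A(B(C))D is the one from the answer, matched against the input ABCD.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupNumberingDemo {
    /** Hypothetical helper: returns group 0..groupCount() for a full match, or null. */
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        // groupCount() counts capturing groups only, so group 0 adds one more slot.
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groups("A(B(C))D", "ABCD"))); // [ABCD, BC, C]
    }
}
```

Groups are numbered by the position of their opening parenthesis, counting from left to right.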
So if you need to find a specific word, you can use two methods of the Matcher class: find() to locate the text described by the regex, and then group(int) to get the String captured by a given group number:
String statement = "Hello, my beautiful world";
Pattern pattern = Pattern.compile("Hello, my (\\w+).*");
Matcher m = pattern.matcher(statement);
m.find();
System.out.println(m.group(1));
The code above prints "beautiful".

Is your searchString going to be a regular expression? If not, simply use String.contains(CharSequence s).

System.out.println(input.matches(".*\\bthis$"));
also works. Here the .* matches everything before the word, and this$ requires the word this to be at the end of the input.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
