I've got this bit of code to grab a URL from a textarea. It had been working great until I tried a URL with a '+' in it.
Pattern pattern = Pattern.compile("(.*)(https?[://.0-9-?a-z=_#!A-Z]*)(.*)");
Matcher matcher = pattern.matcher(text);
So I tried putting \\+ and \\\\+ in my code, but it did not work. I did some googling, and Stack Overflow questions kept mentioning this:
Pattern.quote("+");
However, I am not sure how to work that statement into what I currently have, or whether that is even the way I want to go. I'm assuming I need to do something like this:
String quote = Pattern.quote("+");
Pattern pattern = Pattern.compile("(.*)(https?[://.0-9-?a-z=_#!A-Z]*)(.*)");
Matcher matcher = pattern.matcher(text);
And then add the variable quote somewhere in the pattern? Please help! I just learned this stuff today and I'm brand new to it. Thank you!
Just escape the quote with \, for example:
Pattern pattern = Pattern.compile("(.*)(https?[://.0-9-?a-z=_#!A-Z\"]*)(.*)");
(https?[://.0-9-?a-z=_#!A-Z]*)
Bear in mind that [ and ] denote a class of characters, which means that any character listed inside is accepted. [aegl]+ will match "age", "a", "e", "g", "eagle", and "gaggle". It also means that a character listed twice (like /) is completely redundant.
Pattern.quote is useful, but it does not insert backslashes: it wraps the string in \Q and \E so that everything in between is matched literally. Pattern.quote("+") returns \Q+\E.
Because + has no significance between square brackets, you should be able to put a + unescaped within the square brackets. At that point you can also add a \\ if it makes you feel better.
Pattern pattern = Pattern.compile("(.*)(https?[:/.0-9-?a-z=_#!A-Z+]*)(.*)");
Pattern pattern = Pattern.compile("(.*)(https?[:/.0-9-?a-z=_#!A-Z\\+]*)(.*)");
See it here: http://fiddle.re/0780
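A minimal sketch of the point above, assuming you only need the URL itself (so the surrounding (.*) groups are dropped) and that moving the - to the end of the class is acceptable. The class name UrlPlusDemo, the helper firstUrl, and the sample URL are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlPlusDemo {
    // Inside [...] a '+' is an ordinary character, so it is simply listed.
    // The '-' sits at the very end so it cannot be read as a range operator.
    private static final Pattern URL =
            Pattern.compile("https?[:/.0-9?a-z=_#!A-Z+-]*");

    /** Returns the first URL-like token in text, or null if there is none. */
    static String firstUrl(String text) {
        Matcher matcher = URL.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstUrl("see http://example.com/a+b?x=1 for details"));
        // Pattern.quote wraps its argument in \Q...\E; it does not add a backslash.
        System.out.println(Pattern.quote("+"));
    }
}
```

Using find() instead of the leading and trailing (.*) also means the URL no longer has to be pulled out of group(2).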
Update:
I've found the solution thanks to @dasblinkenlight and all the other good Samaritans.
The working code is here for anyone with a similar question:
Pattern pattern = Pattern.compile("(\\d+)(\\s)([-+*%/^])(\\s)(\\d+)");
Matcher matchOp1 = pattern.matcher(text);
matchOp1.find();
System.out.println(matchOp1.group(1));
This will only print the first group.
Original Question:
First and foremost, I cannot use any if statements, therefore I must catch and handle exceptions only.
Assume I have a string which contains "10 + 20".
I have the following regex: "(\d\+)(\s)([\+\-\*\%\^])(\s)([\d\+)".
This regex is intended to match (integer of any length)(space)(an operator)(space)(integer of any length)
Pattern pattern = Pattern.compile("(\\d\\+)(\\s)([\\+\\-\\*\\%\\^])(\\s)([\\d\\+)");
Matcher matchOp1 = pattern.matcher("1 + 1");
System.out.println(matchOp1.group(1));
I want this to print "10" only if there's a match, but this throws PatternSyntaxException. Can anyone give me some insight please?
Thank you!
You have an extra [ in your pattern, and you escaped pluses where you shouldn't have:
Pattern pattern = Pattern.compile("(\\d\\+)(\\s)([\\+\\-\\*\\%\\^])(\\s)([\\d\\+)");
//                                     ^^                                ^   ^^
Removing these will fix the problem.
Note that escaping meta-characters inside character class [...] is not necessary: just be careful to move - to one of the ends, and place ^ in any position other than first:
"(\\d+)(\\s)([-+*%/^])(\\s)(\\d+)"
Note that with all these unnecessary backslashes you forgot the division sign.
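Here is a rough sketch of how the cleaned-up pattern could be used without any if statements, as the question requires. It relies on Matcher.group throwing IllegalStateException when the preceding find() did not succeed. The class name OperandDemo and the helper parse are invented for this example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OperandDemo {
    // '-' is first in the class and '^' is not, so neither needs a backslash.
    private static final Pattern EXPRESSION =
            Pattern.compile("(\\d+)(\\s)([-+*%/^])(\\s)(\\d+)");

    /** Returns {left operand, operator, right operand}; throws if nothing matches. */
    static String[] parse(String text) {
        Matcher matcher = EXPRESSION.matcher(text);
        matcher.find(); // result ignored on purpose: group() throws if this failed
        return new String[] {matcher.group(1), matcher.group(3), matcher.group(5)};
    }

    public static void main(String[] args) {
        try {
            String[] parts = parse("10 + 20");
            System.out.println(parts[0] + " " + parts[1] + " " + parts[2]);
            parse("ten plus twenty"); // no match, so group(1) throws
        } catch (IllegalStateException e) {
            System.out.println("No match: " + e.getMessage());
        }
    }
}
```

If you are allowed to branch, checking the boolean returned by find() is the more usual approach.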
You have some issues in your regex:
Pattern pattern = Pattern.compile("(\\d\\+)(\\s)([\\+\\-\\*\\%\\^])(\\s)([\\d\\+)");
                                       ^                                 ^   ^
You haven't closed your square brackets.
You should not escape +: it is there as a quantifier meaning one or more digits, NOT as a literal +.
Calling group() before a successful find() throws IllegalStateException, so you have to place if(matchOp1.find()) before reading the capturing group.
Instead, it should be like:
(\d+)(\s)([\+\-\*\%\^])(\s)(\d+)
and while using in code:
Pattern pattern = Pattern.compile("(\\d+)(\\s)([\\+\\-\\*\\%\\^])(\\s)(\\d+)");
Matcher matchOp1 = pattern.matcher("1 + 1");
if(matchOp1.find())
System.out.println(matchOp1.group(1));
I'm trying to extract a substring from a String using a regex in Java:
Pattern pattern = Pattern.compile("((.|\\n)*).{4}InsurerId>\\S*.{5}InsurerId>((.|\\n)*)");
Matcher matcher = pattern.matcher(abc);
I'm trying to extract the value between
<_1:InsurerId>F2021633_V1</_1:InsurerId>
I'm not sure where I am going wrong, but I don't get any output from
if (matcher.find())
{
System.out.println(matcher.group(1));
}
You can use:
Pattern pattern = Pattern.compile("<([^:]+:InsurerId)>([^<]*)</\\1>");
Matcher matcher = pattern.matcher(abc);
if (matcher.find()) {
System.out.println(matcher.group(2));
}
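To see what the backreference \1 buys you, here is a small self-contained sketch. The class name InsurerIdBackrefDemo, the helper insurerId, and the surrounding root element in the sample are assumptions made for the demo:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InsurerIdBackrefDemo {
    // Group 1 captures the whole tag name, prefix included; \1 then requires
    // the closing tag to repeat it. Group 2 is the text between the tags.
    private static final Pattern TAG =
            Pattern.compile("<([^:]+:InsurerId)>([^<]*)</\\1>");

    /** Returns the InsurerId value, or null if no well-paired tag is found. */
    static String insurerId(String xml) {
        Matcher matcher = TAG.matcher(xml);
        return matcher.find() ? matcher.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(insurerId("<root><_1:InsurerId>F2021633_V1</_1:InsurerId></root>"));
        // Opening and closing prefixes differ, so nothing matches here.
        System.out.println(insurerId("<_1:InsurerId>X</_2:InsurerId>"));
    }
}
```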
You may want to use the totally awesome page http://regex101.com/ to test your regular expressions. As you can see at https://regex101.com/r/rV8uM3/1, you only have empty capturing groups, but let me explain to you what you did. :D
((.|\n)*) This matches any character or a newline, any number of times. It is capturing, so your first group will always be everything before <_1:InsurerId>, or an empty string. Note that in Java . does not match line terminators by default, so a plain .* only covers newlines when the DOTALL flag (Pattern.DOTALL or (?s)) is set. You can even leave this part out, as it isn't actually part of the String you want to match; keeping it will actually be a problem if you have multiple InsurerIds in your file and want to get them all.
.{4}InsurerId> This matches "InsurerId>" with any four characters in front of it and is exactly what you want. As the first character is probably always an opening angle bracket (and you don't want stuff like "<ExampleInsurerId>"), I'd suggest using <.{3}InsurerId> instead. This still could have some problems (<Test id="<" xInsurerId>), so if you know exactly that it's "_<a digit>:", why not use <_\d:InsurerId>?
\S* matches everything except whitespace, which is probably not the best idea, as XML and similar files can be written without any whitespace at all. You want everything up to the next tag, so use [^<]*, which matches everything except an opening angle bracket. You also want to get this value later, so you have to use a capturing group: ([^<]*)
.{5}InsurerId> The same thing here: use <\/.{3}InsurerId> or <\/_\d:InsurerId> (forward slashes are actually characters interpreted by other RegEx implementations, so I suggest escaping them)
((.|\n)*) Again the same thing: just leave it out.
The resulting Regular Expression would then be the following:
<_\d:InsurerId>([^<]*)<\/_\d:InsurerId>
And as you can see at https://regex101.com/r/mU6zZ3/1 - you have exactly one match, and it's even "F2021633_V1" :D
For Java, you have to escape the backslashes, so the resulting code would look like this:
Pattern pattern = Pattern.compile("<_\\d:InsurerId>([^<]*)<\\/_\\d:InsurerId>");
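As a sketch of why dropping the leading and trailing catch-all groups matters, the loop below collects every InsurerId in a document by calling find() repeatedly. The class name InsurerIdScanDemo, the helper allIds, and the two-element sample input are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InsurerIdScanDemo {
    private static final Pattern ID =
            Pattern.compile("<_\\d:InsurerId>([^<]*)</_\\d:InsurerId>");

    /** Collects the text of every InsurerId element, in document order. */
    static List<String> allIds(String xml) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = ID.matcher(xml);
        while (matcher.find()) { // each call resumes after the previous match
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String xml = "<a>\n<_1:InsurerId>F2021633_V1</_1:InsurerId>\n"
                + "<_2:InsurerId>G1</_2:InsurerId>\n</a>";
        System.out.println(allIds(xml));
    }
}
```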
If you are using Java 7 or above, you can use named groups to make the regex a little more readable (also note the \k backreference, which makes the closing tag match the opening tag):
Pattern pattern = Pattern.compile("(?:<(?<InsurancePrefix>.+)InsurerId>)(?<id>[A-Z0-9_]+)</\\k<InsurancePrefix>InsurerId>");
Matcher matcher = pattern.matcher("<_1:InsurerId>F2021633_V1</_1:InsurerId>");
if (matcher.matches()) {
System.out.println(matcher.group("id"));
}
Thanks to the backreference, matches() fails, for example, on this text
<_1:InsurerId>F2021633_V1</_2:InsurerId>
which is correct
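The two cases can be checked side by side with a small sketch like the following. The class name NamedGroupDemo and the helper idOrNull are made up; the pattern is the one shown above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    private static final Pattern TAG = Pattern.compile(
            "(?:<(?<InsurancePrefix>.+)InsurerId>)(?<id>[A-Z0-9_]+)"
            + "</\\k<InsurancePrefix>InsurerId>");

    /** Returns the id if the whole element matches, otherwise null. */
    static String idOrNull(String element) {
        Matcher matcher = TAG.matcher(element);
        // matches() requires the entire input to fit the pattern.
        return matcher.matches() ? matcher.group("id") : null;
    }

    public static void main(String[] args) {
        System.out.println(idOrNull("<_1:InsurerId>F2021633_V1</_1:InsurerId>"));
        System.out.println(idOrNull("<_1:InsurerId>F2021633_V1</_2:InsurerId>"));
    }
}
```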
Javadoc has a good explanation: https://docs.oracle.com/javase/8/docs/api/
You might also consider using a different tool (an XML parser) instead of a regex, since other people have to maintain your code and a complex regex is usually difficult to understand.
I am trying to match a series of strings that look like this:
item1 = "some value"
item2 = "some value"
I have some strings, though, that look like this:
item-one = "some new value"
item-two = "some new value"
I am trying to parse it using regular expressions, but I can't get it to match the optional hyphen.
Here is my regex string:
Pattern p = Pattern.compile("^(\\w+[-]?)\\w+?\\s+=\\s+\"(.*)\"");
Matcher m = p.matcher(line);
m.find();
String option = m.group(1);
String value = m.group(2);
Could someone please tell me what I am doing wrong?
Thank you
I suspect the main reason for your problem is that you expect \\w+? to make \\w+ optional, whereas in reality the ? makes the + quantifier reluctant. The regex still has to find one or more \\w characters there, so it takes the last character away from ^(\\w+.
Maybe try it this way:
Pattern.compile("^(\\w+(?:-\\w+)?)\\s+=\\s+\"(.*?)\"");
In (\\w+(?:-\\w+)?), the (?:-\\w+) part is a non-capturing group (the regex won't count it as a group, so (.*?) is still group(2) even when this part is present), and the ? after it makes that part optional.
In \"(.*?)\", *? is a reluctant quantifier, which makes the regex look for the shortest match between quotation marks.
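A quick sketch comparing the original pattern with the suggested one on the sample lines from the question. The class name OptionLineDemo and the helper parse are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionLineDemo {
    // Pattern from the question: the trailing \w+? steals from group 1.
    static final Pattern ORIGINAL =
            Pattern.compile("^(\\w+[-]?)\\w+?\\s+=\\s+\"(.*)\"");
    // Suggested fix: the optional "-word" part lives inside group 1.
    static final Pattern FIXED =
            Pattern.compile("^(\\w+(?:-\\w+)?)\\s+=\\s+\"(.*?)\"");

    /** Returns {option, value}, or null if the line does not match. */
    static String[] parse(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        if (!matcher.find()) {
            return null;
        }
        return new String[] {matcher.group(1), matcher.group(2)};
    }

    public static void main(String[] args) {
        String line = "item-one = \"some new value\"";
        System.out.println(parse(ORIGINAL, line)[0]); // option name is cut short
        System.out.println(parse(FIXED, line)[0]);    // full option name
    }
}
```

Note that the original pattern does match; it is group(1) that comes back truncated.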
Your problem is that you have the ? in the wrong place:
Try this regex:
^((\\w+-)?\\w+)\\s*=\\s*\"([^\"]+)\"
But use groups 1 and 3.
I've cleaned up the regex a bit too
This regex should work for you:
^\w[\w-]*(?<=\w)\s*=\s*\"([^"]*)\"
In Java:
Pattern p = Pattern.compile("^\\w[\\w-]*(?<=\\w)\\s*=\\s*\"([^\"]*)\"");
Live Demo: http://www.rubular.com/r/0CvByDnj5H
You want something like this:
([\w\-]+)\s*=\s*"([^"]*)"
With extra backslashes for Java:
([\\w\\-]+)\\s*=\\s*\"([^\"]*)\"
If you expect other symbols to start appearing in the variable name, you could make it a character class like [^=\s] to accept any characters not = or whitespace, for example.
I need to print #OPOK, but in the following code:
String s = "\"MSG1\":\"00\",\"MSG2\":\"#OPOK\",\"MSG3\":\"XXXXXX\"}";
Pattern pattern = Pattern.compile(".*\"MSG2\":\"(.+)\".*");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
System.out.println(matcher.group(1));
} else {
System.out.println("Match not found");
}
I get #OPOK","MSG3":"XXXXXX instead, how do I fix my pattern ?
You want to make your .+ part reluctant. By default it's greedy - it'll match as much as it can without preventing the pattern from matching. You want it to match as little as it can, like this:
Pattern pattern = Pattern.compile(".*\"MSG2\":\"(.+?)\".*");
The ? is what makes it reluctant. See the Pattern documentation for more details.
Or of course you could just match against "any character other than a double quote" which is what Brian's approach will do. Both will work equally well as far as I'm aware; there may well be performance differences between them (I'd expect Brian's to perform better to be honest) but if performance is important to you you should test both approaches.
You probably want the following:
Pattern pattern = Pattern.compile("\"MSG2\":\"([^\"]+)\"");
For the capture group you are interested in, this will match any character except a double quote. Since the group is surrounded by double quotes, this should prevent it from going "too far" in the match.
Edited to add: As @bmorris591 suggested in the comments, you can add an extra + (as shown below) to make the quantifier possessive. This may help improve performance in cases where the matcher fails to find a match.
Pattern pattern = Pattern.compile("\"MSG2\":\"([^\"]++)\"");
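For comparison, this sketch runs the greedy, reluctant, negated-class, and possessive variants against the string from the question. The class name Msg2Demo and the helper capture are invented for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Msg2Demo {
    /** Returns group 1 of the first match of regex in input, or null. */
    static String capture(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "\"MSG1\":\"00\",\"MSG2\":\"#OPOK\",\"MSG3\":\"XXXXXX\"}";
        // Greedy: runs on to the last double quote in the input.
        System.out.println(capture(".*\"MSG2\":\"(.+)\".*", s));
        // Reluctant: stops at the first double quote that lets the match succeed.
        System.out.println(capture(".*\"MSG2\":\"(.+?)\".*", s));
        // Negated class: can never cross a double quote.
        System.out.println(capture("\"MSG2\":\"([^\"]+)\"", s));
        // Possessive: same result, but never gives characters back.
        System.out.println(capture("\"MSG2\":\"([^\"]++)\"", s));
    }
}
```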
I have strings like "xxxxx?434334", "xxx?411112", "xxxxxxxxx?11113" and so on.
How do I take a substring properly to retrieve "xxxxx" (everything that comes before the '?' character)?
return s.substring(0, s.indexOf('?'));
No need for a regex for that.
If you have a problem, use a regex. Now you have two problems.
str = str.replaceAll("[?].*", "");
In other words, "remove everything after, and including, the question mark character". The ? has to be enclosed in square brackets because otherwise it has a special meaning.
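Both approaches can be sketched as below. One caveat worth showing: indexOf returns -1 when there is no '?', and substring(0, -1) would throw, so the sketch guards for that. The class and method names are made up for illustration:

```java
class BeforeQuestionMarkDemo {
    /** Text before the first '?', or the whole string if there is no '?'. */
    static String viaSubstring(String s) {
        int index = s.indexOf('?');
        // indexOf returns -1 when '?' is absent; substring(0, -1) would throw.
        return index < 0 ? s : s.substring(0, index);
    }

    /** Same result via a regex: delete the first '?' and everything after it. */
    static String viaReplaceAll(String s) {
        return s.replaceAll("[?].*", "");
    }

    public static void main(String[] args) {
        System.out.println(viaSubstring("xxxxx?434334"));
        System.out.println(viaReplaceAll("xxx?411112"));
        System.out.println(viaSubstring("no-question-mark"));
    }
}
```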
I would agree with the other answers that you should avoid using a regex wherever possible, but if you did want to use one for this scenario, you could use the following:
Pattern regex = Pattern.compile("([^\\?]*)\\?{1}");
Matcher m = regex.matcher(str);
if (m.find()) {
result = m.group(1);
}
where str is your input string.
EDIT:
Description of the regex: match any group of characters that are not a "?" and that are followed by a single "?".
The pattern ".*(?=\\?)" should work as well. ?= is a positive lookahead, which means the pattern matches everything that comes before a question mark, but not the question mark itself.
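One thing to keep in mind with the lookahead version is that .* is greedy, so with several question marks it runs up to the last one. A small sketch, with the class name LookaheadDemo, the helper firstMatch, and the input "a?b?c" chosen for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    /** Returns the first match of regex in input, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(".*(?=\\?)", "xxxxx?434334"));
        // Greedy .* stops before the LAST '?', not the first.
        System.out.println(firstMatch(".*(?=\\?)", "a?b?c"));
        // A negated class stops before the first '?'.
        System.out.println(firstMatch("[^?]*(?=\\?)", "a?b?c"));
    }
}
```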