This is my original String:
String response = "attributes[{\"id\":50,\"name\":super},{\"id\":55,\"name\":hello}]";
I'm trying to parse the String and extract all the id values, e.g.
50
55
Pattern idPattern = Pattern.compile("{\"id\":(.*),");
Matcher matcher = idPattern.matcher(response);
while(matcher.find()){
System.out.println(matcher.group(1));
}
When I try to print the value I get an exception:
java.util.regex.PatternSyntaxException: Illegal repetition
I haven't had much experience with regular expressions and cannot find a simple solution to this online.
Appreciate any help!
Pattern.compile("\"id\":(\\d+)");
Don't pair a greedy quantifier like * with ., which matches any character, unless you really need to.
If you want the digits extracted, you can use \d.
"id":(\d+)
Within a Java String,
Pattern.compile("\"id\":(\\d+)");
{ is a reserved character in regular expressions and should be escaped.
\{\"id\":(.*?),
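For reference, here is a minimal runnable sketch of both patterns from this answer applied to the string in the question. The class name IdExtractor and the helper extract are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdExtractor {
    // "id": followed by one or more digits; group 1 holds the digits.
    static final Pattern DIGITS = Pattern.compile("\"id\":(\\d+)");
    // Escaped brace plus a reluctant quantifier that stops at the first comma.
    static final Pattern LAZY = Pattern.compile("\\{\"id\":(.*?),");

    // Hypothetical helper: collects group 1 of every match in the input.
    static List<String> extract(Pattern pattern, String input) {
        List<String> ids = new ArrayList<>();
        Matcher matcher = pattern.matcher(input);
        while (matcher.find()) {
            ids.add(matcher.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        String response =
                "attributes[{\"id\":50,\"name\":super},{\"id\":55,\"name\":hello}]";
        System.out.println(extract(DIGITS, response)); // [50, 55]
        System.out.println(extract(LAZY, response));   // [50, 55]
    }
}
```

Both patterns give the same result here, but the \d+ version does not depend on a comma following the value.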
Edit: If you're going to be working with JSON, you should consider using a dedicated JSON parser. It will make your life much easier. See Parsing JSON Object in Java
Related
Pardon my being new to Java. I have the following string (below), and I am trying to clean it and extract only the integer digits. What would be the correct Java regex to achieve my goal?
Original String : uint32_t Count "77 (0x0000004D)"
Desired Output: 77
I have tried reading the Java docs on regex but I only got more confused. I guess EE engineers are not cut out for these fancy coding tricks :D
You could exploit "\\b" which is a word boundary:
String regex = "\\b\\d+\\b";
Matcher m = Pattern.compile(regex).matcher("uint32_t Count \"77 (0x0000004D)\"");
m.find();
System.out.println(m.group()); //output 77
"\\d+" finds a substring of digits, and surrounding it with "\\b" ensures that it is not embedded in another word/symbol.
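To see what the word boundaries buy you, the sketch below lists every match with and without "\\b" on the same input. The class name WordBoundaryDemo and the helper findAll are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: returns every substring matched by the regex.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "uint32_t Count \"77 (0x0000004D)\"";
        // With boundaries, digits glued to letters or underscores are skipped.
        System.out.println(findAll("\\b\\d+\\b", input)); // [77]
        // Without them, the digits inside uint32_t and the hex value match too.
        System.out.println(findAll("\\d+", input)); // [32, 77, 0, 0000004]
    }
}
```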
More examples would help to pin down the pattern, but with what you have given I can think of a simple regex that matches the quote followed by the digits; you then strip out the quote (or read the inner group) to get your integer.
(["](\d{1,}))
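A rough sketch of how that regex could be used from Java, assuming the number always directly follows a double quote. The class QuotedNumber and the method firstQuotedNumber are made-up names; group 2 is the inner group, so no quote stripping is needed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedNumber {
    // Group 1 is the quote plus the digits, group 2 is the digits only.
    static final Pattern QUOTED = Pattern.compile("([\"](\\d{1,}))");

    // Hypothetical helper: returns the first number that follows a quote, or null.
    static String firstQuotedNumber(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String input = "uint32_t Count \"77 (0x0000004D)\"";
        System.out.println(firstQuotedNumber(input)); // 77
    }
}
```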
I would suggest you play around with regexes in an online tester so that you learn as you practice.
I'm trying to extract a substring from a String using a regex in Java
Pattern pattern = Pattern.compile("((.|\\n)*).{4}InsurerId>\\S*.{5}InsurerId>((.|\\n)*)");
Matcher matcher = pattern.matcher(abc);
I'm trying to extract the value between
<_1:InsurerId>F2021633_V1</_1:InsurerId>
I'm not sure where I am going wrong, but I don't get any output for
if (matcher.find())
{
System.out.println(matcher.group(1));
}
You can use:
Pattern pattern = Pattern.compile("<([^:]+:InsurerId)>([^<]*)</\\1>");
Matcher matcher = pattern.matcher(abc);
if (matcher.find()) {
System.out.println(matcher.group(2));
}
RegEx Demo
You may want to use the totally awesome page http://regex101.com/ to test your regular expressions. As you can see at https://regex101.com/r/rV8uM3/1, you only have empty capturing groups, but let me explain to you what you did. :D
((.|\n)*) This matches any character or a newline, any number of times. It is capturing, so your first matching group will always be everything before <_1:InsurerId>, or an empty string. You could use .* instead; note that in Java . only matches line terminators when the Pattern.DOTALL flag is set. You can even leave it out, as it isn't actually part of the String you want to match. Keeping it will actually be a problem if you have multiple InsurerIds in your file and want to get them all.
.{4}InsurerId> This matches "InsurerId>" with any four characters in front of it and is exactly what you want. As the first character is probably always an opening angle bracket (and you don't want stuff like "<ExampleInsurerId>"), I'd suggest using <.{3}InsurerId> instead. This still could have some problems (<Test id="<" xInsurerId>), so if you know exactly that it's "_<a digit>:", why not use <_\d:InsurerId>?
\S* matches everything except for whitespaces - probably not the best idea as XML and similar files can be written to not contain any space at all. You want to have everything to the next tag, so use [^<]* - this matches everything except for an opening angle bracket. You also want to get this value later, so you have to use a capturing group: ([^<]*)
.{5}InsurerId> The same thing here: use <\/.{3}InsurerId> or <\/_\d:InsurerId> (forward slashes are actually characters interpreted by other RegEx implementations, so I suggest escaping them)
((.|\n)*) Again the same thing, just leave it out
The resulting Regular Expression would then be the following:
<_\d:InsurerId>([^<]*)<\/_\d:InsurerId>
And as you can see at https://regex101.com/r/mU6zZ3/1 - you have exactly one match, and it's even "F2021633_V1" :D
For Java, you have to escape the backslashes, so the resulting code would look like this:
Pattern pattern = Pattern.compile("<_\\d:InsurerId>([^<]*)<\\/_\\d:InsurerId>");
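Put together, a small sketch of how this pattern could pull out every InsurerId value when there are several in the input. The class InsurerIdRegex, the helper findAll and the second element (A100) are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InsurerIdRegex {
    // Group 1 captures everything between the opening and closing tag.
    static final Pattern INSURER_ID =
            Pattern.compile("<_\\d:InsurerId>([^<]*)<\\/_\\d:InsurerId>");

    // Hypothetical helper: returns the text of every InsurerId element found.
    static List<String> findAll(String xml) {
        List<String> values = new ArrayList<>();
        Matcher matcher = INSURER_ID.matcher(xml);
        while (matcher.find()) {
            values.add(matcher.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // The second element is made up to show multiple matches.
        String xml = "<_1:InsurerId>F2021633_V1</_1:InsurerId>\n"
                + "<_2:InsurerId>A100</_2:InsurerId>";
        System.out.println(findAll(xml)); // [F2021633_V1, A100]
    }
}
```

Because the pattern has no leading or trailing catch-all group, find() can move on to the next element after each match.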
If you are using Java 7 or above, you can use named groups to make the regex a little more readable (also note the \k back-reference, which makes the closing tag match the opening tag):
Pattern pattern = Pattern.compile("(?:<(?<InsurancePrefix>.+)InsurerId>)(?<id>[A-Z0-9_]+)</\\k<InsurancePrefix>InsurerId>");
Matcher matcher = pattern.matcher("<_1:InsurerId>F2021633_V1</_1:InsurerId>");
if (matcher.matches()) {
System.out.println(matcher.group("id"));
}
Because of the back-reference, matches() fails when the closing tag differs from the opening tag, for example on this text:
<_1:InsurerId>F2021633_V1</_2:InsurerId>
which is the correct behaviour.
Javadoc has a good explanation: https://docs.oracle.com/javase/8/docs/api/
You might also consider using a different tool (an XML parser) instead of a regex, since other people will have to support your code and complex regexes are usually difficult to understand.
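As a rough idea of what the parser route could look like, here is a sketch using the DOM parser that ships with the JDK. The wrapping root element, its xmlns declaration, and the names InsurerIdDomReader and readInsurerIds are assumptions made for this example; a real document will have its own structure.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InsurerIdDomReader {
    // Hypothetical helper: returns the text of every <_1:InsurerId> element.
    static List<String> readInsurerIds(String xml) {
        try {
            // The default factory is not namespace-aware, so the tag is
            // looked up by its full name including the prefix.
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("_1:InsurerId");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need no checked exceptions.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // The root element and namespace URI are invented for this sketch.
        String xml = "<root xmlns:_1=\"urn:example\">"
                + "<_1:InsurerId>F2021633_V1</_1:InsurerId></root>";
        System.out.println(readInsurerIds(xml)); // [F2021633_V1]
    }
}
```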
I have a string like
Berlin -> Munich [label="590"]
and now I'm looking for a regular expression in Java that checks whether a given line (like the one above) is valid.
Currently, my RegExp looks like "\\w\\s*->\\s*\\w\\s*\\[label=\"\\d\"\\]"
However, it doesn't work. I've found out that \\w\\s*->\\s*\\w\\s* still works, but when I add \\[ it can't find the occurrence (\\w\\s*->\\s*\\w\\s*\\[).
What I also found out is that when '->' is removed it works (\\w\\s*\\s*\\w\\s*\\[)
Is the arrow the problem? Can hardly imagine that.
I really need some help on this.
Thank you in advance
This is the correct regular expression:
"\\w+\\s*->\\s*\\w+\\s*\\[label=\"\\d+\"\\]"
What you report about matches and non-matches of partial regular expressions is very unlikely, not possible with the Berlin/Munich string.
Also, if you are really into German city names, you might have to consider names like Castrop-Rauxel (which some wit has called the Latin name of Wanne-Eickel ;-) )
Try this
String message = "Berlin -> Munich [label=\"590\"]";
Pattern p = Pattern.compile("\\w+\\s*->\\s*\\w+\\s*\\[label=\"\\d+\"\\]");
Matcher matcher = p.matcher(message);
while(matcher.find()) {
System.out.println(matcher.group());
}
You need to match more than one character: \\w and \\d each match a single character, so add + to match one or more letters or digits.
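Since the question is about checking whether a whole line is valid, a sketch using matches() (which requires the entire input to match) rather than find() may be closer to what is wanted. The class LineValidator and the method isValid are hypothetical names.

```java
import java.util.regex.Pattern;

class LineValidator {
    // One or more word characters on each side of the arrow,
    // followed by a label of one or more digits.
    static final Pattern LINE =
            Pattern.compile("\\w+\\s*->\\s*\\w+\\s*\\[label=\"\\d+\"\\]");

    // Hypothetical helper: true only if the whole line has the expected shape.
    static boolean isValid(String line) {
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Berlin -> Munich [label=\"590\"]")); // true
        System.out.println(isValid("Berlin -> Munich"));                   // false
        // Hyphenated names are not covered by \w+, as noted above.
        System.out.println(isValid("Castrop-Rauxel -> Munich [label=\"590\"]")); // false
    }
}
```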
String.split(String regex) splits the string around a given regular expression and returns a String array. But I am interested in the regex matches themselves and would like them returned as a String array, instead of the strings around them.
For example,
In the case of a trivial regex like ":" it probably wouldn't matter. But there are regexes which would match a particular date in a paragraph, and I would like to get all these dates, which may be different each time. I checked the JDK API but couldn't find any such method. Is there any method that I can make use of? Any help would be much appreciated.
Take a look at java.util.regex package Matcher and Pattern classes:
http://download.oracle.com/javase/6/docs/api/java/util/regex/package-summary.html
Just use the Java regular expression API
Pattern pat = Pattern.compile("\\d");
Matcher mat= pat.matcher("Foo99Bar66Baz");
while(mat.find()) {
System.out.println(mat.group());
}
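If the matches are needed as a String array rather than printed, the same loop can collect them. A minimal sketch follows; MatchCollector and allMatches are made-up names, and "\\d+" is used here so that whole numbers are returned instead of single digits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchCollector {
    // Hypothetical helper: returns every match of the regex as a String array.
    static String[] allMatches(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher mat = Pattern.compile(regex).matcher(input);
        while (mat.find()) {
            matches.add(mat.group());
        }
        return matches.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] numbers = allMatches("\\d+", "Foo99Bar66Baz");
        System.out.println(Arrays.toString(numbers)); // [99, 66]
    }
}
```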
You can find simple but quite comprehensive examples to get started at the following link
http://www.vogella.de/articles/JavaRegularExpressions/article.html
Also Pattern and Matcher usage example in:
http://www.vogella.de/articles/JavaRegularExpressions/article.html#regexjava
I have the following regex (<.*?>.*?</.*?>|[\w[-]]+)\p{Punct}* which works perfectly for most strings with tags, but if a tag is not preceded by a space then it breaks the tag while finding a match.
Please help me modify this regex so that it doesn't break tags. All I want is to split on spaces, but not if the space is within a tag.
For Example:
BIRD-<abc attr="co_1">ab</span> #apos;<abc attr="co_12">cd</span>FEE DEF
should split into:
BIRD-<abc attr="co_1">ab</span>
#apos;<abc attr="co_12">cd</span>FEE
DEF
I am currently using a matcher to match this pattern and get the tokens
Matcher matcher = REGEX.matcher(newString);
while (matcher.find())
{
token = matcher.group();
}
Try this:
.*?<.*?>.*?</.*?>[^\s]*
It will produce the result you expect.
I would be wary of performing that type of parsing using regex. The pattern you are suggesting, as well as various adaptations of it, may start behaving weirdly if attributes contain the > and/or < characters. The following example would throw your pattern off:
<element attr="></>">value</element>
Any time you need to parse or process an XML file, I would advise you to consider using a proper XML parser. Please see this answer for a longer explanation.