java regular expression issue about capture group


public void test(){
String source = "hello<a>goodA</a>boys can <a href=\"www.baidu.com\">goodB</a>\"\n"
+ " + \"this can help";
Pattern pattern = Pattern.compile("<a[\\s+.*?>|>](.*?)</a>");
Matcher matcher = pattern.matcher(source);
while (matcher.find()){
System.out.println("laozhu:" + matcher.group(1));
}
}
Output:
laozhu:goodA
laozhu:href="www.baidu.com">goodB
Why is the second match not laozhu:goodB?

Try this Regex:
<a(?: .*?)?>(\w+)<\/a>
So your Pattern should look like this:
Pattern pattern = Pattern.compile("<a(?: .*?)?>(\\w+)<\\/a>");
It matches goodA and goodB.
For the detailed description, look here: Regex101.

Pattern pattern = Pattern.compile("<a.*?>(.*?)</a>");
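A likely explanation for the output in the question, offered as a sketch rather than a definitive diagnosis: square brackets form a character class, so [\\s+.*?>|>] matches exactly one character (whitespace or one of + . * ? > |) instead of the intended alternation. For the second link that single character is the space after <a, and the lazy (.*?) then captures everything up to </a>, including the href attribute. The class name AnchorTextDemo, the helper captures, and the exact source string (rebuilt from the output shown in the question) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorTextDemo {
    // Collects group(1) of every match of regex in input.
    static List<String> captures(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed input: the second link carries an href attribute.
        String source = "hello<a>goodA</a>boys can "
                + "<a href=\"www.baidu.com\">goodB</a> this can help";

        // Original pattern: the bracket expression consumes ONE character only.
        System.out.println(captures("<a[\\s+.*?>|>](.*?)</a>", source));

        // Alternative: optional attributes as a non-capturing group, then '>'.
        System.out.println(captures("<a(?:\\s[^>]*)?>(.*?)</a>", source));
    }
}
```

The second pattern uses [^>]* for the attributes so the match cannot run past the end of the opening tag. For anything beyond simple, well-formed snippets an HTML parser is the safer choice.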

Related

Problems defining the pattern to extract multiple dates from a json string in java

I have the following code:
public static void main(String[] args) {
String str = "{\"$and\":[{\\\"$or\\\":[{\\\"origin\\\":{\\\"$eq\\\":\\\"LEMD\\\"}},{\\\"origin\\\":{\\\"$eq\\\":\\\"LEBL\\\"}}]},{\"departure\":{\"$gte\":\"28/02/2015 00:00:00\"}},{\"departure\":{\"$lte\":\"28/02/2015 23:59:59\"}}]}";
Pattern pattern = Pattern.compile("\\{\"(.*?)\":\\{\"\\$(.*?)\":\"[0-9]+/[0-9]+/[0-9]+ [0-9]+:[0-9]+:[0-9]+\"}}");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
}
I want to get the substrings:
{\"departure\":{\"$gte\":\"28/02/2015 00:00:00\"}}
{\"departure\":{\"$lte\":\"28/02/2015 23:59:59\"}}
but the program gives me:
{"$and":[{\"$or\":[{\"origin\":{\"$eq\":\"LEMD\"}},{\"origin\":{\"$eq\":\"LEBL\"}}]},{"departure":{"$gte":"28/02/2015 00:00:00"}}
{"departure":{"$lte":"28/02/2015 23:59:59"}}
The second call to find() matches what I expect, but the first one does not.
Any help?
thanks

Parsing JSON with regular expressions is generally frowned upon; a JSON parser is the more robust tool. If you have to use a regex, I'm guessing you may be trying to write an expression somewhat like this:
{\\"([^\\]+)\\":{\\"\$([^\\]+)\\":\\"[0-9]+\/[0-9]+\/[0-9]+\s+[0-9]+:[0-9]+:[0-9]+\\"}}
I'm not completely sure, though.
The expression is explained in the top-right panel of regex101.com, where you can explore, simplify, or modify it and watch how it matches against sample inputs.
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "\\{\\\\\"([^\\\\]+)\\\\\":\\{\\\\\"\\$([^\\\\]+)\\\\\":\\\\\"[0-9]+\\/[0-9]+\\/[0-9]+\\s+[0-9]+:[0-9]+:[0-9]+\\\\\"\\}}";
final String string = "{\\\"departure\\\":{\\\"$gte\\\":\\\"28/02/2015 00:00:00\\\"}}\n"
+ "{\\\"departure\\\":{\\\"$lte\\\":\\\"28/02/2015 23:59:59\\\"}}";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
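One possible reason the first find() in the question returns too much: the lazy (.*?) is still allowed to cross quotes and braces, so a match that starts at the very first {" simply keeps expanding until the rest of the pattern fits. A minimal sketch of a tighter pattern follows, replacing .*? with the negated class [^"]* so a key can never span a quote. It assumes the target objects use plain (unescaped) quotes, as in the program output shown in the question; the class name DateFilterDemo, the helper findDateFilters, and the shortened input are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateFilterDemo {
    // {"key":{"$op":"dd/MM/yyyy HH:mm:ss"}} where key and op contain no quote.
    private static final Pattern DATE_FILTER = Pattern.compile(
            "\\{\"([^\"]*)\":\\{\"\\$([^\"]*)\":"
            + "\"[0-9]+/[0-9]+/[0-9]+ [0-9]+:[0-9]+:[0-9]+\"\\}\\}");

    static List<String> findDateFilters(String json) {
        List<String> found = new ArrayList<>();
        Matcher m = DATE_FILTER.matcher(json);
        while (m.find()) {
            found.add(m.group(0));
        }
        return found;
    }

    public static void main(String[] args) {
        // Shortened version of the question's input (assumed for the demo).
        String str = "{\"$and\":[{\\\"$or\\\":[{\\\"origin\\\":{\\\"$eq\\\":\\\"LEMD\\\"}}]},"
                + "{\"departure\":{\"$gte\":\"28/02/2015 00:00:00\"}},"
                + "{\"departure\":{\"$lte\":\"28/02/2015 23:59:59\"}}]}";
        for (String s : findDateFilters(str)) {
            System.out.println(s);
        }
    }
}
```

With this input the loop prints the two departure objects and nothing else, because a match attempt starting at {"$and" fails as soon as the key is followed by [ instead of {.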

Java Pattern regex whole word match

I am trying to match a keyword with following string
"abc,pqr(1),xyz"
It should be a successful match only if a whole word matches, e.g. "pqr" or "abc" or "xyz".
Can anyone please help me create a regex for this match?
String text = "hello, hellos(1),bye";
String keyword = "account";
String patternString = "["+ keyword + "]";
Pattern pattern = Pattern.compile(patternString, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
boolean matches = matcher.matches();
System.out.println("matches = " + matches);

This should work; it captures each run of letters as a separate match:
([a-zA-Z]+)
Input:
"abc,pqr(1),xyz"
Output:
abc
pqr
xyz
See: https://regex101.com/r/Us6G3X/2
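If the goal is to test whether one specific keyword occurs as a whole word, a hedged alternative is to wrap the keyword in \b word boundaries and call find() rather than matches(). Putting the keyword inside [ ] as in the question builds a character class, which matches a single character. The sketch below assumes the keyword starts and ends with a word character (\b behaves differently otherwise); WholeWordDemo and containsWord are hypothetical names.

```java
import java.util.regex.Pattern;

class WholeWordDemo {
    // True if keyword appears in text delimited by word boundaries.
    static boolean containsWord(String text, String keyword) {
        // Pattern.quote treats the keyword literally, even if it
        // contains regex metacharacters.
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(keyword) + "\\b",
                Pattern.CASE_INSENSITIVE);
        // find() looks for the pattern anywhere in the text;
        // matches() would require the whole text to match.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "hello, hellos(1),bye";
        System.out.println(containsWord(text, "hello"));   // whole word
        System.out.println(containsWord(text, "hell"));    // only a prefix
        System.out.println(containsWord(text, "HELLOS"));  // case-insensitive
    }
}
```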

Match all lines starting with a space up to a line that doesn't start with a space

So I have a few lines like such:
tag1:
 line1word1 lineoneanychar
 line2word1
tag2:
 line1word1 ....
 line2word1 .....
I am trying to build a java regex that extracts all the data under the tags. i.e:
String parsed1 = line1word1 lineone\nline2word1
String parsed2 = line1word1 ....\nline2word1 .....
I believe the right way to do this is using something like this, but I haven't quite got it right:
Pattern p = Pattern.compile("tag1:\n( {1}.*)\n(?!\\w+)", Pattern.DOTALL);
Matcher m = p.matcher(clean_data);
if(m.find()){
System.out.println(m.group(1));
}
Any help would be appreciated!

It could be something like this:
public static void main(String[] args) throws Exception {
String input = "tag1:\n"
+ " line1word1 lineoneanychar\n"
+ " line2word1\n"
+ "tag2:\n"
+ " line1word1 ....\n"
+ " line2word1 .....\n";
Pattern p = Pattern.compile("tag\\d+:$\\n((?:^\\s.*?$\\n)+)", Pattern.DOTALL|Pattern.MULTILINE);
Matcher m = p.matcher(input);
while(m.find()){
System.out.println(m.group(1));
}
}
Remember to escape each backslash in a Java string literal (write \\d for \d).
\d matches a digit.
\s matches a whitespace character.
(?:something) is a non-capturing group: it groups without creating a numbered group in the matcher.
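Building on the answer above, here is a minimal sketch that collects every tag together with its indented lines into a map and strips the leading space, which is closer to the parsed1/parsed2 strings the question asks for. It assumes each data line starts with exactly one space and that tag names consist of word characters; TagBlocks and parse are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagBlocks {
    // A tag line, then one or more lines that begin with a space.
    private static final Pattern BLOCK =
            Pattern.compile("^(\\w+):\\n((?: .*\\n?)+)", Pattern.MULTILINE);

    static Map<String, String> parse(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = BLOCK.matcher(input);
        while (m.find()) {
            String body = m.group(2)
                    .replaceAll("(?m)^ ", "")      // drop the leading space of each line
                    .replaceFirst("\\n\\z", "");   // drop the final newline, if any
            result.put(m.group(1), body);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "tag1:\n"
                + " line1word1 lineoneanychar\n"
                + " line2word1\n"
                + "tag2:\n"
                + " line1word1 ....\n"
                + " line2word1 .....\n";
        parse(input).forEach((tag, body) ->
                System.out.println(tag + " -> " + body.replace("\n", "\\n")));
    }
}
```

Without DOTALL the dot stops at the end of each line, so the block ends at the first line that does not begin with a space.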

how to deal with a string with regex

for example:
I have a string like this:
http://shop.vipshop.com/detail-97996-12358781.html
I want to use regex to find 97996 and 12358781
java code is appreciated
Many thanks.
String str="http://shop.vipshop.com/detail-97996-12358781.html";
String regex ="\-d{5}\-";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.group());
but it was wrong

Try this
String str="http://shop.vipshop.com/detail-97996-12358781.html";
String regex =".*detail-(\\d+)-(\\d+)\\.html";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
if(matcher.matches()){
System.out.println(matcher.group(1) + "|" + matcher.group(2));
}

You have to invoke either Matcher#find() or Matcher#matches() before you can read a match. In this case you need the former, because the regex only matches a part of the string.
You can also use the + quantifier to match a run of digits of any length. Try this:
String regex ="\\d+";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
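To make the point above concrete, the following sketch shows the three behaviours side by side: group() before any match attempt throws IllegalStateException, matches() fails because \d+ does not cover the whole URL, and a find() loop returns each run of digits. The class and method names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindBeforeGroup {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns every maximal run of digits found in s.
    static List<String> digitRuns(String s) {
        List<String> runs = new ArrayList<>();
        Matcher m = DIGITS.matcher(s);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }

    // Demonstrates that group() needs a successful find()/matches() first.
    static boolean groupWithoutFindThrows(String s) {
        Matcher m = DIGITS.matcher(s);
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String str = "http://shop.vipshop.com/detail-97996-12358781.html";
        System.out.println(groupWithoutFindThrows(str));
        System.out.println(DIGITS.matcher(str).matches());
        System.out.println(digitRuns(str));
    }
}
```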

Two lines:
String num1 = str.replaceAll(".*-(\\d+)-.*", "$1");
String num2 = str.replaceAll(".*-(\\d+)\\..*", "$1");

String str = "http://shop.vipshop.com/detail-97996-12358781.html";
String regex = "(?<=detail-)(\\d+)-(\\d+)(?=\\.html)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
matcher.find();
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
Output:
97996
12358781

You used the quantifier {5}, but the second number (1235...) is 8 digits long.
If the numbers are always between 5 and 8 digits long, you can use:
"([\\d]{5,8})"
Each match is captured in group 1.
But if you need to match the specific form detail-NUMBER-NUMBER.html, you can use:
"detail-([\\d]*)-([\\d]*)\\.html"
The two numbers are captured in groups 1 and 2.
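A small aside on the dot in front of html: an unescaped . matches any character, so a pattern ending in .html also accepts inputs that have no literal dot there. The sketch below compares the two spellings; DotEscapeDemo, loose, and strict are hypothetical names, and the sample inputs are made up for the comparison.

```java
import java.util.regex.Pattern;

class DotEscapeDemo {
    // Unescaped dot: any single character is accepted before "html".
    static boolean loose(String s) {
        return Pattern.matches("detail-(\\d+)-(\\d+).html", s);
    }

    // Escaped dot: only a literal '.' is accepted before "html".
    static boolean strict(String s) {
        return Pattern.matches("detail-(\\d+)-(\\d+)\\.html", s);
    }

    public static void main(String[] args) {
        String good = "detail-97996-12358781.html";
        String odd = "detail-97996-12358781xhtml";
        System.out.println(loose(good) + " " + strict(good));
        System.out.println(loose(odd) + " " + strict(odd));
    }
}
```

Both versions accept the well-formed name; only the loose one also accepts the variant with x in place of the dot.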

you can use this:
String str="http://shop.vipshop.com/detail-97996-12358781.html";
String regex ="[^0-9]+([0-9]+)[^0-9]+([0-9]+).+";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
if(matcher.matches()){
System.out.println(matcher.group(1) + " " + matcher.group(2));
}
For more advanced tutorials, see RegEx tutorial and Regular Expression tutorial.

You should wrap
System.out.println(matcher.group());
in
if(matcher.find()){
}
so that your code becomes:
String str="http://shop.vipshop.com/detail-97996-12358781.html";
String regex ="\\d{5,}";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
if(matcher.find()){
System.out.println(matcher.group());
}

Splitting a string java

I have a string in format:
<+923451234567>: Hi here is the text.
Now I want to get the mobile number (without any non-alphanumeric characters), i.e. 923451234567, from between the < > symbols at the start of the string, and also the text, i.e. Hi here is the text.
At the moment I use hardcoded logic:
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
String[] splitted = stringReceivedInSms.split(">: ", 2);
String mobileNumber=MyUtils.removeNonDigitCharacters(splitted[0]);
String text=splitted[1];
How can I neatly extract the required strings with a regular expression, so that I don't have to change the code whenever the format of the string changes?

String stringReceivedInSms="<+923451234567>: Hi here is the text.";
Pattern pattern = Pattern.compile("<\\+?([0-9]+)>: (.*)");
Matcher matcher = pattern.matcher(stringReceivedInSms);
if(matcher.matches()) {
String phoneNumber = matcher.group(1);
String messageText = matcher.group(2);
}

Use a regex that matches the pattern - <\\+?(\\d+)>: (.*)
Use the Pattern and Matcher java classes to match the input string.
Pattern p = Pattern.compile("<\\+?(\\d+)>: (.*)");
Matcher m = p.matcher("<+923451234567>: Hi here is the text.");
if(m.matches())
{
System.out.println(m.group(1));
System.out.println(m.group(2));
}
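Since the question worries about the format changing later, one option (a sketch, not the only way) is to keep the pattern in a single constant and refer to the parts by name with named groups, so a format change touches one line. The names SmsParser, parse, number, and text are assumptions for illustration; the returned array holds the number first and the message text second.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SmsParser {
    // "<", optional "+", digits, ">:", optional whitespace, then the text.
    private static final Pattern SMS =
            Pattern.compile("<\\+?(?<number>\\d+)>:\\s*(?<text>.*)");

    // Returns {number, text}, or empty if the input has a different shape.
    static Optional<String[]> parse(String sms) {
        Matcher m = SMS.matcher(sms);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group("number"), m.group("text") });
    }

    public static void main(String[] args) {
        parse("<+923451234567>: Hi here is the text.").ifPresent(parts -> {
            System.out.println(parts[0]);
            System.out.println(parts[1]);
        });
    }
}
```

Note that . does not match line terminators by default, so a message containing a newline would not match unless Pattern.DOTALL is added.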

You need to use regex, the following pattern will work:
^<\\+?(\\d++)>:\\s*+(.++)$
Here is how you would use it -
public static void main(String[] args) {
final String s = "<+923451234567>: Hi here is the text.";
final Pattern pattern = Pattern.compile(""
+ "#start of line anchor\n"
+ "^\n"
+ "#literal <\n"
+ "<\n"
+ "#an optional +\n"
+ "\\+?\n"
+ "#match and grab at least one digit\n"
+ "(\\d++)\n"
+ "#literal >:\n"
+ ">:\n"
+ "#any amount of whitespace\n"
+ "\\s*+\n"
+ "#match and grab the rest of the string\n"
+ "(.++)\n"
+ "#end anchor\n"
+ "$", Pattern.COMMENTS);
final Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
}
I have added the Pattern.COMMENTS flag so the code will work with the comments embedded for future reference.
Output:
923451234567
Hi here is the text.

You can get the phone number with just:
stringReceivedInSms.substring(stringReceivedInSms.indexOf("<+") + 2, stringReceivedInSms.indexOf(">"))
So try this snippet:
public static void main(String[] args){
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
System.out.println(stringReceivedInSms.substring(stringReceivedInSms.indexOf("<+") + 2, stringReceivedInSms.indexOf(">")));
}
You don't need to split your String.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
