I just started learning about regular expressions. I am trying to get the attribute values within "mytag" tags using the following code, but it throws a "No match found" exception.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class dummy {
public static void testRegEx()
{
// String pattern_termName = "(?i)\\[.*\\]()\\[.*\\]";
Pattern patternTag;
Matcher matcherTag;
String mypattern= "\\[mytag attr1="(.*?)" attr2="(.*?)" attr3="(.*?)"](.+?)\\[/mytag]";
String term="[mytag attr1=\"20258044753052856\" attr2=\"A security \" attr3=\"cvvc\" ]TagTitle[/mytag]";
patternTag = Pattern.compile(mypattern);
matcherTag = patternTag.matcher(term);
System.out.println(matcherTag.group(1)+"*********"+matcherTag.group(2)+"$$$$$$$$$$$$");
}
public static void main(String args[])
{
testRegEx();
}
}
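I have also tried \" in place of ", but it still throws the same exception.
You forgot to call find() on the matcher before calling group(), and you also need to write \" instead of " inside the pattern string. The find method scans the input sequence looking for the next subsequence that matches the pattern. Note also the \\s* before the closing ], which allows for the space after the last attribute in the input.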
Pattern patternTag;
Matcher matcherTag;
String mypattern= "\\[mytag attr1=\"(.*?)\" attr2=\"(.*?)\" attr3=\"(.*?)\"\\s*](.+?)\\[/mytag]";
String term="[mytag attr1=\"20258044753052856\" attr2=\"A security \" attr3=\"cvvc\" ]TagTitle[/mytag]";
patternTag = Pattern.compile(mypattern);
matcherTag = patternTag.matcher(term);
while(matcherTag.find()){
System.out.println(matcherTag.group(1)+"*********"+matcherTag.group(2)+"$$$$$$$$$$$$");
}
Output:
20258044753052856*********A security $$$$$$$$$$$$
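To make the cause of the exception concrete, here is a minimal sketch (the class name GroupBeforeFind and the shortened input are made up for illustration). It shows that group() throws IllegalStateException until a match attempt such as find() has succeeded:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBeforeFind {
    // Returns true if group(1) throws because no match attempt was made yet.
    static boolean throwsWithoutFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        try {
            m.group(1);              // no find()/matches() call before this
            return false;
        } catch (IllegalStateException e) {
            return true;             // this is the "No match found" case
        }
    }

    // Returns group(1) of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String regex = "attr1=\"(.*?)\"";
        String input = "[mytag attr1=\"42\" ]Title[/mytag]";
        System.out.println(throwsWithoutFind(regex, input)); // true
        System.out.println(firstGroup(regex, input));        // 42
    }
}
```

Checking the result of find() (or matches()) before touching group() also avoids the exception when the input simply does not match.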
The pattern is missing \\s+ or \\s* at the places where the input contains whitespace.
Code:
final String pattern = "\\[\\s*mytag\\s+attr1\\s*=\\s*\"(.*?)\"\\s+attr2\\s*=\\s*\"(.*?)\"\\s+attr3\\s*=\\s*\"(.*?)\"\\s*\\](.+?)\\[/mytag\\]";
final String input = "[mytag attr1=\"20258044753052856\" attr2=\"A security \" attr3=\"cvvc\" ]TagTitle[/mytag]";
final Pattern p = Pattern.compile( pattern );
final Matcher m = p.matcher( input );
if( m.matches()) {
System.out.println(
m.group(1) + '\t' + m.group(2) + '\t' + m.group(3) + '\t' + m.group(4));
}
Output:
20258044753052856 A security cvvc TagTitle
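If the number or order of attributes may vary, one alternative is to pull out each name="value" pair with find() instead of hardcoding attr1..attr3. This is only a sketch under that assumption; the class TagAttributes and its pattern are hypothetical, not part of the original answers:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagAttributes {
    // Hypothetical pattern: a word, optional spaces around '=', then a quoted value.
    private static final Pattern ATTR = Pattern.compile("(\\w+)\\s*=\\s*\"(.*?)\"");

    // Collects every name="value" pair found in the text, in order of appearance.
    static Map<String, String> parse(String tag) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = ATTR.matcher(tag);
        while (m.find()) {
            attrs.put(m.group(1), m.group(2));
        }
        return attrs;
    }

    public static void main(String[] args) {
        String input = "[mytag attr1=\"20258044753052856\" attr2=\"A security \" attr3=\"cvvc\" ]TagTitle[/mytag]";
        System.out.println(parse(input));
    }
}
```

This trades strictness for flexibility: it does not verify that the pairs sit inside a [mytag ...] tag, so it suits input that is already known to be a single tag.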
I need to grab the email value from a large XML document or a plain string.
I have been working on a regex for this, and so far the regex below works correctly for a plain string:
Regex : ^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*#(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{1,6}$
Text : paris#france.c
But when the same text is enclosed in an XML tag, it fails to match:
<email>paris#france.c</email>
I am trying to amend this regex so that it works in both scenarios.
You have put ^ at the beginning which means the "Start of the string", and $ at the end which means the "End of the string". Now, look at your string:
<email>paris#france.c</email>
Does it start and end with an email address?
I have removed both anchors and also escaped the - in your regex. The following auto-generated Java code uses the updated regex:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Example {
public static void main(String[] args) {
final String regex = "[\\w!#$%&'*+/=?`\\{|\\}~^\\-]+(?:\\.[\\w!#$%&'*+/=?`\\{|\\}~^\\-]+)*#(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{1,6}";
final String string = "paris#france.c\n"
+ "<email>paris#france.c</email>";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
}
}
Output:
Full match: paris#france.c
Full match: paris#france.c
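The effect of the anchors can be checked in isolation. The sketch below uses a deliberately simplified stand-in for the address pattern (CORE is an assumption for illustration, not the regex from the question) and keeps the '#' separator used in the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Simplified stand-in for the address pattern; '#' is the separator, as in the question.
    static final String CORE = "[\\w.]+#[\\w-]+(?:\\.[a-zA-Z]{1,6})+";

    // With ^ and $, the whole input must be an address.
    static boolean findAnchored(String input) {
        return Pattern.compile("^" + CORE + "$").matcher(input).find();
    }

    // Without anchors, find() locates the address anywhere in the input.
    static String findUnanchored(String input) {
        Matcher m = Pattern.compile(CORE).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findAnchored("paris#france.c"));                  // true
        System.out.println(findAnchored("<email>paris#france.c</email>"));   // false
        System.out.println(findUnanchored("<email>paris#france.c</email>")); // paris#france.c
    }
}
```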
I have been trying to match the following string -
String temp = "[[Wikipedia:Manual of Style#Links|]]" ;
with the regex
boolean a = temp.matches("\\[\\[Wikipedia:[a-zA-Z_0-9]*#[a-zA-Z_0-9]*\\|\\]\\]");
"\\[\\[Wikipedia:(.*?)#(.*?)\\|\\]\\]"
"\\[\\[Wikipedia:(.*)*#(.+)*\\|\\]\\]"
"\\[\\[(.*?)#(.*?)\\|\\]\\]"
But none of them are giving any positive matches.
Straight away I can see a problem: you are using a character class without a space to match input with spaces.
Try this:
boolean a = temp.matches("\\[\\[Wikipedia:[\\w ]*#[\\w ]+\\|\\]\\]");
Note that [a-zA-Z_0-9] can be replaced by \w. In Java the two are equivalent by default; only with the Pattern.UNICODE_CHARACTER_CLASS flag does \w also match letters and digits from other scripts.
public static void main(String[] args) {
String temp = "[[Wikipedia:Manual of Style#Links|]]";
Pattern pattern = Pattern.compile("\\[\\[Wikipedia:([\\w ]+)#([\\w ]+)\\|\\]\\]");
Matcher matcher = pattern.matcher(temp);
if(matcher.find()) {
System.out.println("page: " + matcher.group(1));
System.out.println("section: " + matcher.group(2));
}
}
or
temp.matches("\\[\\[Wikipedia:([\\w ]+)#([\\w ]+)\\|\\]\\]");
Just add a space to your custom character class:
String temp = "[[Wikipedia:Manual of Style#Links|]]" ;
temp.matches("\\[\\[Wikipedia:[a-zA-Z_0-9 ]*#[a-zA-Z_0-9]*\\|\\]\\]"); //true
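If you also want the two parts back, named groups keep the code readable. This is a small sketch, assuming the link always has the [[Wikipedia:page#section|]] shape from the question; the class WikiLink and the group names page and section are made up here:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WikiLink {
    // Character classes include a space so that "Manual of Style" is accepted.
    private static final Pattern LINK = Pattern.compile(
            "\\[\\[Wikipedia:(?<page>[\\w ]+)#(?<section>[\\w ]+)\\|\\]\\]");

    // Returns {page, section}, or null if the text is not such a link.
    static String[] parse(String text) {
        Matcher m = LINK.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("page"), m.group("section") };
    }

    public static void main(String[] args) {
        String[] parts = parse("[[Wikipedia:Manual of Style#Links|]]");
        System.out.println(parts[0]); // Manual of Style
        System.out.println(parts[1]); // Links
    }
}
```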
I have a string in format:
<+923451234567>: Hi here is the text.
Now I want to get the mobile number (without any non-alphanumeric characters), i.e. 923451234567, from between the < > symbols at the start of the string, and also the text, i.e. Hi here is the text.
I could hardcode the logic, which is what I am currently doing:
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
String[] splitted = cpaMessage.getText().split(">: ", 2);
String mobileNumber=MyUtils.removeNonDigitCharacters(splitted[0]);
String text=splitted[1];
How can I neatly get the required strings from the string with regular expression? So that I don't have to change the code whenever the format of the string changes.
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
Pattern pattern = Pattern.compile("<\\+?([0-9]+)>: (.*)");
Matcher matcher = pattern.matcher(stringReceivedInSms);
if(matcher.matches()) {
String phoneNumber = matcher.group(1);
String messageText = matcher.group(2);
}
Use a regex that matches the pattern - <\\+?(\\d+)>: (.*)
Use the Pattern and Matcher java classes to match the input string.
Pattern p = Pattern.compile("<\\+?(\\d+)>: (.*)");
Matcher m = p.matcher("<+923451234567>: Hi here is the text.");
if(m.matches())
{
System.out.println(m.group(1));
System.out.println(m.group(2));
}
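Since the question asks for code that survives format changes, one option is to keep the regex outside the parsing logic. The sketch below assumes a convention that is not in the original answers: the pattern is passed in (for example from configuration) and must define the named groups number and text. The class name SmsParser is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SmsParser {
    private final Pattern pattern;

    // The regex is injected, so a format change only needs a new pattern string.
    // Assumed convention: it defines the named groups "number" and "text".
    SmsParser(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Returns {number, text}, or null when the message does not fit the format.
    String[] parse(String sms) {
        Matcher m = pattern.matcher(sms);
        return m.matches()
                ? new String[] { m.group("number"), m.group("text") }
                : null;
    }

    public static void main(String[] args) {
        SmsParser parser = new SmsParser("<\\+?(?<number>\\d+)>: (?<text>.*)");
        String[] parts = parser.parse("<+923451234567>: Hi here is the text.");
        System.out.println(parts[0]); // 923451234567
        System.out.println(parts[1]); // Hi here is the text.
    }
}
```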
You need to use a regex; the following pattern will work:
^<\\+?(\\d++)>:\\s*+(.++)$
Here is how you would use it -
public static void main(String[] args) {
final String s = "<+923451234567>: Hi here is the text.";
final Pattern pattern = Pattern.compile(""
+ "#start of line anchor\n"
+ "^\n"
+ "#literal <\n"
+ "<\n"
+ "#an optional +\n"
+ "\\+?\n"
+ "#match and grab at least one digit\n"
+ "(\\d++)\n"
+ "#literal >:\n"
+ ">:\n"
+ "#any amount of whitespace\n"
+ "\\s*+\n"
+ "#match and grab the rest of the string\n"
+ "(.++)\n"
+ "#end anchor\n"
+ "$", Pattern.COMMENTS);
final Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
}
I have added the Pattern.COMMENTS flag so the code will work with the comments embedded for future reference.
Output:
923451234567
Hi here is the text.
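The ++ and *+ in this pattern are possessive quantifiers: they match as much as they can and never give anything back during backtracking. A tiny sketch of the difference from the ordinary greedy form (the class name PossessiveDemo is just for illustration):

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // Thin wrapper so both variants are checked the same way (whole-input match).
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy: \d+ gives back the last digit so the trailing \d can match.
        System.out.println(matches("\\d+\\d", "123"));  // true
        // Possessive: \d++ keeps all three digits, so the trailing \d finds nothing.
        System.out.println(matches("\\d++\\d", "123")); // false
    }
}
```

In the SMS pattern nothing after \d++ needs to match a digit, so the possessive form yields the same result as the greedy one and only skips pointless backtracking.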
You can get the phone number with just:
stringReceivedInSms.substring(stringReceivedInSms.indexOf("<+") + 2, stringReceivedInSms.indexOf(">"))
So try this snippet:
public static void main(String[] args){
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
System.out.println(stringReceivedInSms.substring(stringReceivedInSms.indexOf("<+") + 2, stringReceivedInSms.indexOf(">")));
}
You don't need to split your String.
I am trying to parse a Windows INI file with Java on Windows. Assume that the content is:
[section1]
key1=value1
key2=value2
[section2]
key1=value1
key2=value2
[section3]
key1=value1
key2=value2
I use the following code:
Pattern pattSections = Pattern.compile("^\\[([a-zA-Z_0-9\\s]+)\\]$([^\\[]*)", Pattern.DOTALL + Pattern.MULTILINE);
Pattern pattPairs = Pattern.compile("^([a-zA-Z_0-9]+)\\s*=\\s*([^$]*)$", Pattern.DOTALL + Pattern.MULTILINE);
// parse sections
Matcher matchSections = pattSections.matcher(content);
while (matchSections.find()) {
String keySection = matchSections.group(1);
String valSection = matchSections.group(2);
// parse section content
Matcher matchPairs = pattPairs.matcher(valSection);
while (matchPairs.find()) {
String keyPair = matchPairs.group(1);
String valPair = matchPairs.group(2);
}
}
But it doesn't work properly:
section1 doesn't match. That is probably because it does not come after an EOL. When I put an empty line before [section1], it matches.
valSection returns '\r\nkey1=value1\r\nkey2=value2\r\n' and keyPair returns 'key1', which looks OK. But valPair returns 'value1\r\nkey2=value2\r\n' rather than 'value1' as desired.
What is wrong here?
You do not need the DOTALL flag as you do not use dots at all in your pattern.
I think Java treats \n itself as newline so \r won't be processed. Your pattern:
^\\[([a-zA-Z_0-9\\s]+)\\]$
won't match, but instead
^\\[([a-zA-Z_0-9\\s]+)\\]\r$
will.
I recommend you ignore MULTILINE too and use the following patterns as line separators:
(^|\r\n)
($|\r\n)
The first regex worked for me as it is (could the problem be in how you read the file?). In the second one I added a "?" so that the quantifier is reluctant.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String content = "[section1]\r\n" +
"key1=value1\r\n" +
"key2=value2\r\n" +
"[section2]\r\n" +
"key1=value1\r\n" +
"key2=value2\r\n" +
"[section3]\r\n" +
"key1=value1\r\n" +
"key2=value2\r\n";
Pattern pattSections = Pattern.compile(
"^\\[([a-zA-Z_0-9\\s]+)\\]$([^\\[]*)", Pattern.DOTALL
+ Pattern.MULTILINE);
Pattern pattPairs = Pattern.compile(
"^([a-zA-Z_0-9]+)\\s*=\\s*([^$]*?)$", Pattern.DOTALL
+ Pattern.MULTILINE);
// parse sections
Matcher matchSections = pattSections.matcher(content);
while (matchSections.find()) {
String keySection = matchSections.group(1);
String valSection = matchSections.group(2);
// parse section content
Matcher matchPairs = pattPairs.matcher(valSection);
while (matchPairs.find()) {
String keyPair = matchPairs.group(1);
String valPair = matchPairs.group(2);
}
}
}
}
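As an alternative to the two multi-line regexes, here is a line-by-line sketch that collects the result into nested maps. It is a swapped-in technique, not what the code above does: the input is split on \R (any line break, available since Java 8) and each line is matched on its own. The class name IniSketch and both patterns are assumptions for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IniSketch {
    private static final Pattern SECTION = Pattern.compile("\\[([\\w\\s]+)\\]");
    private static final Pattern PAIR = Pattern.compile("(\\w+)\\s*=\\s*(.*)");

    // Maps section name -> (key -> value); lines before the first section are ignored.
    static Map<String, Map<String, String>> parse(String content) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        Map<String, String> current = null;
        for (String line : content.split("\\R")) {   // \R matches \r\n, \n, \r, ...
            String trimmed = line.trim();
            Matcher section = SECTION.matcher(trimmed);
            Matcher pair = PAIR.matcher(trimmed);
            if (section.matches()) {
                current = new LinkedHashMap<>();
                result.put(section.group(1), current);
            } else if (current != null && pair.matches()) {
                current.put(pair.group(1), pair.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "[section1]\r\nkey1=value1\r\nkey2=value2\r\n"
                + "[section2]\r\nkey1=value1\r\n";
        System.out.println(parse(content));
    }
}
```

Because each line is matched separately, the question of how $ behaves around \r\n does not come up at all.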
How do I write a regex that will match multiple lines delimited by newlines and spaces?
The following code works for input containing a single newline, but does not work if the input is
String input = "A1234567890\nAAAAA\nwwwwwwww"
By which I mean matches() is not true for the input.
Here is my code:
package patternreg;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class pattrenmatching {
public static void main(String[] args) {
String input = "A1234567890\nAAAAA";
String regex = ".*[\\w\\s\\w+].*";
Pattern p = Pattern.compile(regex,Pattern.MULTILINE);
Matcher m =p.matcher(input);
if (m.matches()) {
System.out.println("matches() found the pattern \""
+ "\" starting at index "
+ " and ending at index ");
} else {
System.out.println("matches() found nothing");
}
}
}
You could also add the DOTALL flag to get it working:
Pattern p = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL);
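To see what the flag changes, here is a small sketch that runs the regex from the question against the two inputs, with and without DOTALL (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

class DotallDemo {
    static final String REGEX = ".*[\\w\\s\\w+].*";   // the regex from the question

    // '.' does not match line terminators here.
    static boolean matchesDefault(String input) {
        return Pattern.compile(REGEX, Pattern.MULTILINE).matcher(input).matches();
    }

    // With DOTALL, '.' matches line terminators as well.
    static boolean matchesDotall(String input) {
        return Pattern.compile(REGEX, Pattern.MULTILINE | Pattern.DOTALL)
                .matcher(input).matches();
    }

    public static void main(String[] args) {
        String twoLines = "A1234567890\nAAAAA";
        String threeLines = "A1234567890\nAAAAA\nwwwwwwww";
        System.out.println(matchesDefault(twoLines));    // true
        System.out.println(matchesDefault(threeLines));  // false
        System.out.println(matchesDotall(threeLines));   // true
    }
}
```

Without DOTALL, only the character class in the middle can consume a '\n', which is why one newline works and two do not.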
I believe your problem is that . does not match '\n' by default, so only the single character class in the middle can absorb a '\n', and matches() fails as soon as the input contains two of them.
If you want to stick with the code above, try "[\S]*[\s]+", which means: match zero or more non-whitespace chars followed by one or more whitespace chars.
fixed up code:
public static void main(String[] args) {
String input = "A1234567890\nAAAAA\nsdfasdf\nasdfasdf";
String regex = "[\\S]*[\\s]+";
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(input.substring(m.start(), m.end()) + "*");
}
if (m.matches()) {
System.out.println("matches() found the pattern \"" + "\" starting at index " + " and ending at index ");
} else {
System.out.println("matches() found nothing");
}
}
OUTPUT:
A1234567890
*
AAAAA
*
sdfasdf
*
matches() found nothing
Also, a pattern of
"([\\S]*[\\s]+)+([\\S])*"
will match the entire input (matches() returns true) but messes up the token part of your code.
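If the tokens themselves are what you are after, a simpler sketch is to search for runs of non-whitespace with find(). Unlike the "[\S]*[\s]+" loop in the fixed-up code, this also returns the last token, which has no trailing whitespace. The class name Tokens is only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Tokens {
    // Each match is one maximal run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "A1234567890\nAAAAA\nsdfasdf\nasdfasdf";
        System.out.println(tokens(input)); // [A1234567890, AAAAA, sdfasdf, asdfasdf]
    }
}
```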