regex in java: matching BOL and EOL - java

I am trying to parse a Windows INI file with Java on Windows. Assume the content is:
[section1]
key1=value1
key2=value2
[section2]
key1=value1
key2=value2
[section3]
key1=value1
key2=value2
I use the following code:
Pattern pattSections = Pattern.compile("^\\[([a-zA-Z_0-9\\s]+)\\]$([^\\[]*)", Pattern.DOTALL + Pattern.MULTILINE);
Pattern pattPairs = Pattern.compile("^([a-zA-Z_0-9]+)\\s*=\\s*([^$]*)$", Pattern.DOTALL + Pattern.MULTILINE);
// parse sections
Matcher matchSections = pattSections.matcher(content);
while (matchSections.find()) {
String keySection = matchSections.group(1);
String valSection = matchSections.group(2);
// parse section content
Matcher matchPairs = pattPairs.matcher(valSection);
while (matchPairs.find()) {
String keyPair = matchPairs.group(1);
String valPair = matchPairs.group(2);
}
}
But it doesn't work properly:
The section1 doesn't match. This is probably because it does not start right after an EOL. When I put an empty line before [section1], it matches.
The valSection returns '\r\nkey1=value1\r\nkey2=value2\r\n'. The keyPair returns 'key1', which looks OK. But the valPair returns 'value1\r\nkey2=value2\r\n' rather than the desired 'value1'.
What is wrong here?

You do not need the DOTALL flag, as your patterns contain no dots.
I think Java treats \n itself as the newline, so the \r is not consumed. In that case your pattern:
^\\[([a-zA-Z_0-9\\s]+)\\]$
won't match, but instead
^\\[([a-zA-Z_0-9\\s]+)\\]\r$
will. (By default, Java's $ in MULTILINE mode also stops before \r\n, so this only applies if the UNIX_LINES flag is set.)
I recommend dropping MULTILINE too and using the following patterns as line separators:
(^|\r\n)
($|\r\n)

The first regex just worked for me (could the problem be in how you read the file?). In the second one I added a "?" so that the quantifier is reluctant.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        String content = "[section1]\r\n" +
                "key1=value1\r\n" +
                "key2=value2\r\n" +
                "[section2]\r\n" +
                "key1=value1\r\n" +
                "key2=value2\r\n" +
                "[section3]\r\n" +
                "key1=value1\r\n" +
                "key2=value2\r\n";
        Pattern pattSections = Pattern.compile(
                "^\\[([a-zA-Z_0-9\\s]+)\\]$([^\\[]*)",
                Pattern.DOTALL | Pattern.MULTILINE);
        Pattern pattPairs = Pattern.compile(
                "^([a-zA-Z_0-9]+)\\s*=\\s*([^$]*?)$",
                Pattern.DOTALL | Pattern.MULTILINE);
        // parse sections
        Matcher matchSections = pattSections.matcher(content);
        while (matchSections.find()) {
            String keySection = matchSections.group(1);
            String valSection = matchSections.group(2);
            // parse section content
            Matcher matchPairs = pattPairs.matcher(valSection);
            while (matchPairs.find()) {
                String keyPair = matchPairs.group(1);
                String valPair = matchPairs.group(2);
            }
        }
    }
}
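The reason the original value group ran across lines is that [^$] is a character class excluding only a literal dollar sign, so it also consumes \r and \n. A minimal sketch of the same two-pass parse, assuming the values never contain line breaks, can exclude \r and \n explicitly instead of relying on a reluctant quantifier. The class name IniSketch and the Map-based result are illustrative assumptions, not part of the question's code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and the Map-based result are assumptions.
final class IniSketch {
    // Section header on its own line, then everything up to the next '['.
    private static final Pattern SECTION =
            Pattern.compile("^\\[([\\w\\s]+)\\]$([^\\[]*)", Pattern.MULTILINE);
    // key = value; the value stops at the first \r or \n.
    // [ \t]* is used around '=' so an empty value cannot swallow the next line.
    private static final Pattern PAIR =
            Pattern.compile("^(\\w+)[ \\t]*=[ \\t]*([^\\r\\n]*)$", Pattern.MULTILINE);

    static Map<String, Map<String, String>> parse(String content) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        Matcher sections = SECTION.matcher(content);
        while (sections.find()) {
            Map<String, String> pairs = new LinkedHashMap<>();
            // Second pass: key/value pairs inside the section body.
            Matcher m = PAIR.matcher(sections.group(2));
            while (m.find()) {
                pairs.put(m.group(1), m.group(2));
            }
            result.put(sections.group(1), pairs);
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "[section1]\r\nkey1=value1\r\nkey2=value2\r\n"
                + "[section2]\r\nkey1=value1\r\n";
        System.out.println(parse(content));
    }
}
```

With MULTILINE set and UNIX_LINES not set, $ matches just before a \r\n pair, so no explicit \r is needed in either pattern.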

Related

How can I extract substring from the given url using regex in Android Studio

I'm trying to extract CANseIqFMnf from the URL https://www.instagram.com/p/CANseIqFMnf/ using regex in Android studio. Please help me to get a regex expression eligible for Android Studio.
Here is the code for my method:
String url = "https://www.instagram.com/p/CANseIqFMnf/";
String REGEX = "/p\//";
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(url);
boolean match = matcher.matches();
if (match){
Log.e("success", "start = " + matcher.start() + " end = " + matcher.end() );
}else{
Log.e("failed", "failed");
}
But it gives me failed in return!
Method 1
You can simply use the replaceAll method of String; there is no need to compile a pattern and complicate things:
String input = "https://www.instagram.com/p/CANseIqFMnf/";
String output = input.replaceAll("https://www.instagram.com/p/", "").replaceAll("/", "");
Log.v(TAG, output);
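Note that the first replaceAll removes the URL prefix and the second removes any remaining slashes. Since replaceAll treats its first argument as a regex, the unescaped dots match any character; String.replace does the same job with literal text.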
Method 2
Pattern pattern = Pattern.compile("https://www.instagram.com/p/(.*?)/");
Matcher matcher = pattern.matcher("https://www.instagram.com/p/CANseIqFMnf/");
while(matcher.find()) {
System.out.println(matcher.group(1));
}
Note that when matcher.find() returns true, group(1) holds the text captured by the first parenthesized group, here (.*?), and group(0) holds the entire match, which in your case is the whole URL.
An alternative without regex is to use the java.nio.file.Paths API. Note that this treats the URL as a file path, so it can fail on platforms such as Windows, where ':' is not a legal path character:
public class Url {
public static void main(String[] args) {
String url = "https://www.instagram.com/p/CANseIqFMnf/";
String name = java.nio.file.Paths.get(url).getFileName().toString();
System.out.println(name);
}
}
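The original attempt fails because matches() succeeds only when the pattern covers the entire input, whereas find() searches for the pattern anywhere inside it. A minimal sketch of the difference, assuming the ID is the path segment right after "/p/" (the class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are assumptions.
final class PostIdSketch {
    // Capture the path segment that follows "/p/".
    private static final Pattern POST_ID = Pattern.compile("/p/([^/]+)");

    // Returns the captured segment, or null when the URL has no "/p/<id>" part.
    static String extract(String url) {
        Matcher m = POST_ID.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "https://www.instagram.com/p/CANseIqFMnf/";
        // matches() must consume the whole input, so this prints false.
        System.out.println(POST_ID.matcher(url).matches());
        // find() looks for the pattern anywhere in the input.
        System.out.println(extract(url));
    }
}
```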

Regular Expression : No match found

I just started learning about regular expressions. I am trying to get the attribute values within "mytag" tags and used the following code, but it throws a "No match found" exception.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class dummy {
public static void testRegEx()
{
// String pattern_termName = "(?i)\\[.*\\]()\\[.*\\]";
Pattern patternTag;
Matcher matcherTag;
String mypattern= "\\[mytag attr1="(.*?)" attr2="(.*?)" attr3="(.*?)"](.+?)\\[/mytag]";
String term="[mytag attr1=\"20258044753052856\" attr2=\"A security \" attr3=\"cvvc\" ]TagTitle[/mytag]";
patternTag = Pattern.compile(mypattern);
matcherTag = patternTag.matcher(term);
System.out.println(matcherTag.group(1)+"*********"+matcherTag.group(2)+"$$$$$$$$$$$$");
}
public static void main(String args[])
{
testRegEx();
}
}
I have used \" in place of " but it still shows me same exception.
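You forgot to call find() on the matcher before reading its groups, and you also need to use \" instead of " inside the string literal. The find method scans the input sequence looking for the next subsequence that matches the pattern. The pattern below also adds \\s* before the closing ] to allow for the trailing space in the input.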
Pattern patternTag;
Matcher matcherTag;
String mypattern= "\\[mytag attr1=\"(.*?)\" attr2=\"(.*?)\" attr3=\"(.*?)\"\\s*](.+?)\\[/mytag]";
String term="[mytag attr1=\"20258044753052856\" attr2=\"A security \" attr3=\"cvvc\" ]TagTitle[/mytag]";
patternTag = Pattern.compile(mypattern);
matcherTag = patternTag.matcher(term);
while(matcherTag.find()){
System.out.println(matcherTag.group(1)+"*********"+matcherTag.group(2)+"$$$$$$$$$$$$");
}
Output:
20258044753052856*********A security $$$$$$$$$$$$
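The "No match found" message comes from an IllegalStateException that Matcher.group throws when no match has been attempted yet. The sketch below shows that behavior and, as an alternative to one long pattern, collects the attributes with a find() loop. It assumes attributes always have the form name="value"; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names and the name="value" format are assumptions.
final class TagAttrSketch {
    private static final Pattern ATTR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Collects every name="value" pair found in the input, in order.
    static Map<String, String> attributes(String tag) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = ATTR.matcher(tag);
        while (m.find()) {
            attrs.put(m.group(1), m.group(2));
        }
        return attrs;
    }

    // Returns true if calling group() before find()/matches() throws.
    static boolean groupWithoutFindThrows(String input) {
        Matcher m = ATTR.matcher(input);
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String term = "[mytag attr1=\"20258044753052856\" attr2=\"A security \" "
                + "attr3=\"cvvc\" ]TagTitle[/mytag]";
        System.out.println(groupWithoutFindThrows(term));
        System.out.println(attributes(term));
    }
}
```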
The \\s+ or \\s* between the tokens is missing.
Code:
final String pattern = "\\[\\s*mytag\\s+attr1\\s*=\\s*\"(.*?)\"\\s+attr2\\s*=\\s*\"(.*?)\"\\s+attr3\\s*=\\s*\"(.*?)\"\\s*\\](.+?)\\[/mytag\\]";
final String input = "[mytag attr1=\"20258044753052856\" attr2=\"A security \" attr3=\"cvvc\" ]TagTitle[/mytag]";
final Pattern p = Pattern.compile( pattern );
final Matcher m = p.matcher( input );
if( m.matches()) {
System.out.println(
m.group(1) + '\t' + m.group(2) + '\t' + m.group(3) + '\t' + m.group(4));
}
Output:
20258044753052856 A security cvvc TagTitle

Regex to match [[Wikipedia:Manual of Style#Links|]] # in java

I have been trying to match the following string -
String temp = "[[Wikipedia:Manual of Style#Links|]]" ;
with the regex
boolean a = temp.matches("\\[\\[Wikipedia:[a-zA-Z_0-9]*#[a-zA-Z_0-9]*\\|\\]\\]");
"\\[\\[Wikipedia:(.*?)#(.*?)\\|\\]\\]"
"\\[\\[Wikipedia:(.*)*#(.+)*\\|\\]\\]"
"\\[\\[(.*?)#(.*?)\\|\\]\\]"
But none of them are giving any positive matches.
Straight away I can see a problem: you are using a character class without a space to match input with spaces.
Try this:
boolean a = temp.matches("\\[\\[Wikipedia:[\\w ]*#[\\w ]+\\|\\]\\]");
Note that [a-zA-Z_0-9] can be replaced by \w. In Java, \w matches exactly [a-zA-Z_0-9] by default; it only covers letters and digits from other scripts when the UNICODE_CHARACTER_CLASS flag is set.
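Whether \w covers non-ASCII letters in Java depends on the flags the pattern is compiled with. A quick check, assuming a default Pattern versus one compiled with Pattern.UNICODE_CHARACTER_CLASS (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are assumptions.
final class WordClassSketch {
    // Default \w: ASCII letters, digits and underscore only.
    static boolean isWordDefault(String s) {
        return Pattern.compile("\\w+").matcher(s).matches();
    }

    // With UNICODE_CHARACTER_CLASS, \w also accepts non-ASCII letters.
    static boolean isWordUnicode(String s) {
        return Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(s).matches();
    }

    public static void main(String[] args) {
        String ascii = "Style_1";
        String accented = "caf\u00e9"; // "cafe" with an accented final letter
        System.out.println(isWordDefault(ascii));
        System.out.println(isWordDefault(accented));
        System.out.println(isWordUnicode(accented));
    }
}
```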
public static void main(String[] args) {
String temp = "[[Wikipedia:Manual of Style#Links|]]";
Pattern pattern = Pattern.compile("\\[\\[Wikipedia:([\\w ]+)#([\\w ]+)\\|\\]\\]");
Matcher matcher = pattern.matcher(temp);
if(matcher.find()) {
System.out.println("Manual of Style: " + matcher.group(1));
System.out.println("links : " + matcher.group(2));
}
}
or
temp.matches("\\[\\[Wikipedia:([\\w ]+)#([\\w ]+)\\|\\]\\]");
Just add a space to your custom character class:
String temp = "[[Wikipedia:Manual of Style#Links|]]" ;
temp.matches("\\[\\[Wikipedia:[a-zA-Z_0-9 ]*#[a-zA-Z_0-9]*\\|\\]\\]"); //true

Splitting a string java

I have a string in format:
<+923451234567>: Hi here is the text.
Now I want to get the mobile number (without any non-alphanumeric characters), i.e. 923451234567, which sits between the < > symbols at the start of the string, and also the text, i.e. Hi here is the text.
I currently do this with hardcoded logic:
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
String[] splitted = cpaMessage.getText().split(">: ", 2);
String mobileNumber=MyUtils.removeNonDigitCharacters(splitted[0]);
String text=splitted[1];
How can I neatly get the required strings from the string with regular expression? So that I don't have to change the code whenever the format of the string changes.
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
Pattern pattern = Pattern.compile("<\\+?([0-9]+)>: (.*)");
Matcher matcher = pattern.matcher(stringReceivedInSms);
if(matcher.matches()) {
String phoneNumber = matcher.group(1);
String messageText = matcher.group(2);
}
Use a regex that matches the pattern - <\\+?(\\d+)>: (.*)
Use the Pattern and Matcher java classes to match the input string.
Pattern p = Pattern.compile("<\\+?(\\d+)>: (.*)");
Matcher m = p.matcher("<+923451234567>: Hi here is the text.");
if(m.matches())
{
System.out.println(m.group(1));
System.out.println(m.group(2));
}
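To keep the calling code stable if the group order in the pattern changes later, the same idea can be written with named groups. This is only a sketch under assumptions: the class name, the group names number and text, and the tolerance for missing whitespace after the colon are all illustrative choices, not part of the answers above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class name and group names are assumptions.
final class SmsSketch {
    // matches() anchors at both ends, so ^ and $ are not needed.
    // DOTALL lets the text part span several lines.
    private static final Pattern SMS = Pattern.compile(
            "<\\+?(?<number>\\d+)>:\\s*(?<text>.*)", Pattern.DOTALL);

    // Returns {number, text}, or null when the input has another format.
    static String[] parse(String sms) {
        Matcher m = SMS.matcher(sms);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group("number"), m.group("text") };
    }

    public static void main(String[] args) {
        String[] parts = parse("<+923451234567>: Hi here is the text.");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```

Callers read the groups by name, so adding or reordering groups in the pattern does not shift any indexes.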
You can use a regex; the following pattern will work:
^<\\+?(\\d++)>:\\s*+(.++)$
Here is how you would use it -
public static void main(String[] args) {
    final String s = "<+923451234567>: Hi here is the text.";
    final Pattern pattern = Pattern.compile(""
            + "#start of line anchor\n"
            + "^\n"
            + "#literal <\n"
            + "<\n"
            + "#an optional +\n"
            + "\\+?\n"
            + "#match and grab at least one digit\n"
            + "(\\d++)\n"
            + "#literal >:\n"
            + ">:\n"
            + "#any amount of whitespace\n"
            + "\\s*+\n"
            + "#match and grab the rest of the string\n"
            + "(.++)\n"
            + "#end anchor\n"
            + "$", Pattern.COMMENTS);
    final Matcher matcher = pattern.matcher(s);
    if (matcher.matches()) {
        System.out.println(matcher.group(1));
        System.out.println(matcher.group(2));
    }
}
I have added the Pattern.COMMENTS flag so the code will work with the comments embedded for future reference.
Output:
923451234567
Hi here is the text.
You can get the phone number with just:
stringReceivedInSms.substring(stringReceivedInSms.indexOf("<+") + 2, stringReceivedInSms.indexOf(">"))
So try this snippet:
public static void main(String[] args){
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
System.out.println(stringReceivedInSms.substring(stringReceivedInSms.indexOf("<+") + 2, stringReceivedInSms.indexOf(">")));
}
You don't need to split your String.

java regex string matches and multiline delimited with new line

How do I write a regex that will match multiple lines delimited by newlines and spaces?
The following code works for an input with one line break but does not work if the input
is
String input = "A1234567890\nAAAAA\nwwwwwwww"
By which I mean matches() is not true for the input.
Here is my code:
package patternreg;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class pattrenmatching {
public static void main(String[] args) {
String input = "A1234567890\nAAAAA";
String regex = ".*[\\w\\s\\w+].*";
Pattern p = Pattern.compile(regex,Pattern.MULTILINE);
Matcher m =p.matcher(input);
if (m.matches()) {
System.out.println("matches() found the pattern \""
+ "\" starting at index "
+ " and ending at index ");
} else {
System.out.println("matches() found nothing");
}
}
}
You could also add the DOTALL flag to get it working:
Pattern p = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL);
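The effect of DOTALL can be checked directly against the regex from the question. Without the flag, the dot does not match '\n', so only the character class in the middle can consume a line break. With the flag, the dot matches any character. The sketch below (hypothetical class and method names) leaves out MULTILINE, which only changes the behavior of ^ and $ and has no effect on this pattern.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are assumptions.
final class DotallSketch {
    // The regex from the question, unchanged.
    static final String REGEX = ".*[\\w\\s\\w+].*";

    // Without DOTALL, '.' stops at line terminators.
    static boolean matchesDefault(String input) {
        return Pattern.compile(REGEX).matcher(input).matches();
    }

    // With DOTALL, '.' also matches '\n'.
    static boolean matchesDotall(String input) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        String twoLines = "A1234567890\nAAAAA";
        String threeLines = "A1234567890\nAAAAA\nwwwwwwww";
        System.out.println(matchesDefault(twoLines));
        System.out.println(matchesDefault(threeLines));
        System.out.println(matchesDotall(threeLines));
    }
}
```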
I believe your problem is that, without DOTALL, .* stops at '\n', so only the character class in the middle can consume a line break and the pattern cannot span more than one.
If you want to stick with the code above, try "[\S]*[\s]+", which means: match zero or more non-whitespace chars followed by one or more whitespace chars.
Fixed-up code:
public static void main(String[] args) {
String input = "A1234567890\nAAAAA\nsdfasdf\nasdfasdf";
String regex = "[\\S]*[\\s]+";
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(input.substring(m.start(), m.end()) + "*");
}
if (m.matches()) {
System.out.println("matches() found the pattern \"" + "\" starting at index " + " and ending at index ");
} else {
System.out.println("matches() found nothing");
}
}
OUTPUT:
A1234567890
*
AAAAA
*
sdfasdf
*
matches() found nothing
Also, a pattern of
"([\\S]*[\\s]+)+([\\S])*"
will match the entire input (matches() returns true) but breaks the tokenizing part of your code.
