I have a string in this format:
<+923451234567>: Hi here is the text.
I want to extract the mobile number at the start of the string, between the < > symbols and without any non-digit characters (i.e. 923451234567), and also the message text (i.e. Hi here is the text.).
Right now I use hardcoded logic:
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
String[] splitted = stringReceivedInSms.split(">: ", 2);
String mobileNumber=MyUtils.removeNonDigitCharacters(splitted[0]);
String text=splitted[1];
How can I neatly get the required strings with a regular expression, so that I don't have to change the code whenever the format of the string changes?
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
Pattern pattern = Pattern.compile("<\\+?([0-9]+)>: (.*)");
Matcher matcher = pattern.matcher(stringReceivedInSms);
if(matcher.matches()) {
String phoneNumber = matcher.group(1);
String messageText = matcher.group(2);
}
Use a regex that matches the pattern: <\\+?(\\d+)>: (.*)
Use the Pattern and Matcher classes from java.util.regex to match the input string.
Pattern p = Pattern.compile("<\\+?(\\d+)>: (.*)");
Matcher m = p.matcher("<+923451234567>: Hi here is the text.");
if(m.matches())
{
System.out.println(m.group(1));
System.out.println(m.group(2));
}
You need to use a regex; the following pattern will work:
^<\\+?(\\d++)>:\\s*+(.++)$
Here is how you would use it:
public static void main(String[] args) {
final String s = "<+923451234567>: Hi here is the text.";
final Pattern pattern = Pattern.compile(""
+ "#start of line anchor\n"
+ "^\n"
+ "#literal <\n"
+ "<\n"
+ "#an optional +\n"
+ "\\+?\n"
+ "#match and grab at least one digit\n"
+ "(\\d++)\n"
+ "#literal >:\n"
+ ">:\n"
+ "#any amount of whitespace\n"
+ "\\s*+\n"
+ "#match and grab the rest of the string\n"
+ "(.++)\n"
+ "#end anchor\n"
+ "$", Pattern.COMMENTS);
final Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
}
I have added the Pattern.COMMENTS flag so the code will work with the comments embedded for future reference.
Output:
923451234567
Hi here is the text.
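To address the "don't change the code when the format changes" part, one option is to keep the whole format in a single pattern constant and hide it behind a small method. This is only a sketch: SmsParser, Sms, and the group names number and text are made-up names for illustration, and DOTALL is an assumption so that the message text may span several lines.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SmsParser {
    // The whole format lives in this one constant (group names are illustrative).
    // DOTALL is an assumption: it lets the message text contain line breaks.
    private static final Pattern SMS =
            Pattern.compile("^<\\+?(?<number>\\d+)>:\\s*(?<text>.*)$", Pattern.DOTALL);

    /** Hypothetical holder for the two extracted parts. */
    static final class Sms {
        final String number;
        final String text;

        Sms(String number, String text) {
            this.number = number;
            this.text = text;
        }
    }

    /** Returns the parsed parts, or an empty Optional if the input has another format. */
    static Optional<Sms> parse(String raw) {
        Matcher m = SMS.matcher(raw);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Sms(m.group("number"), m.group("text")));
    }

    public static void main(String[] args) {
        parse("<+923451234567>: Hi here is the text.")
                .ifPresent(s -> System.out.println(s.number + " | " + s.text));
    }
}
```

If the format changes later, only the SMS constant needs to change, as long as the two named groups are kept.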
You can get the phone number by just doing:
stringReceivedInSms.substring(stringReceivedInSms.indexOf("<+") + 2, stringReceivedInSms.indexOf(">"))
So try this snippet:
public static void main(String[] args){
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
System.out.println(stringReceivedInSms.substring(stringReceivedInSms.indexOf("<+") + 2, stringReceivedInSms.indexOf(">")));
}
You don't need to split your String.
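One caveat with the substring approach: indexOf returns -1 when the marker is missing, and the snippet assumes the + is always present. A slightly more defensive sketch could look like this (SubstringExtractor and extractNumber are hypothetical names, and returning null for malformed input is just one possible choice):

```java
final class SubstringExtractor {
    /** Returns the text between '<' and '>' without a leading '+', or null if a marker is missing. */
    static String extractNumber(String sms) {
        int open = sms.indexOf('<');
        if (open < 0) {
            return null; // no opening marker
        }
        int close = sms.indexOf('>', open + 1);
        if (close < 0) {
            return null; // no closing marker after the opening one
        }
        String inside = sms.substring(open + 1, close);
        // The '+' is treated as optional here, like \\+? in the regex answers.
        return inside.startsWith("+") ? inside.substring(1) : inside;
    }

    public static void main(String[] args) {
        System.out.println(extractNumber("<+923451234567>: Hi here is the text."));
    }
}
```

Note that this does not check that the extracted text consists of digits only; the regex versions do.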
I need to grab the email value from a big XML string or a normal string.
I have been working on a regex for it, and so far the regex below works correctly for a normal string:
Regex : ^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*#(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{1,6}$
Text : paris#france.c
But when the same text is enclosed in an XML tag, the regex fails to match:
<email>paris#france.c</email>
I am trying to amend this regex so that it works in both scenarios.
You have put ^ at the beginning which means the "Start of the string", and $ at the end which means the "End of the string". Now, look at your string:
<email>paris#france.c</email>
Do you think it starts and ends with an email address?
I have removed them and also escaped the - in your regex. Here you can check the following auto-generated Java code with the updated regex.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Example {
public static void main(String[] args) {
final String regex = "[\\w!#$%&'*+/=?`\\{|\\}~^\\-]+(?:\\.[\\w!#$%&'*+/=?`\\{|\\}~^\\-]+)*#(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{1,6}";
final String string = "paris#france.c\n"
+ "<email>paris#france.c</email>";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
}
}
Output:
Full match: paris#france.c
Full match: paris#france.c
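To see the effect of the anchors in isolation, here is a small sketch. CORE is a deliberately simplified stand-in for the email regex (an assumption for illustration, not a real email validator), and AnchorDemo and its method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchorDemo {
    // Simplified stand-in for the email regex; '#' plays the role it has in the question.
    static final String CORE = "\\w+#\\w+\\.\\w+";

    /** With ^ and $ the address has to be the entire input. */
    static boolean findAnchored(String s) {
        return Pattern.compile("^" + CORE + "$").matcher(s).find();
    }

    /** Without anchors the address may sit anywhere inside the input. */
    static boolean findUnanchored(String s) {
        return Pattern.compile(CORE).matcher(s).find();
    }

    /** Returns the first address-like substring, or null if there is none. */
    static String firstMatch(String s) {
        Matcher m = Pattern.compile(CORE).matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String wrapped = "<email>paris#france.c</email>";
        System.out.println(findAnchored(wrapped));
        System.out.println(findUnanchored(wrapped));
        System.out.println(firstMatch(wrapped));
    }
}
```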
My input string is like this :
String msgs="<InfoStart>\r\n"
+ "id:1234\r\n"
+ "phone:912119882\r\n"
+ "info_type:1\r\n"
+<InfoEnd>\r\n"
+"<InfoStart>\r\n"
+ "id:5678\r\n"
+ "phone:912119881\r\n"
+ "info_type:1\r\n"
+<InfoEnd>\r\n";
Now I can use a regular expression to get the info array:
private static Pattern patter= Pattern.compile("InfoStart>([\\s\\S]*?)<InfoEnd>");
But how do I get the id and phone using a regular expression? I tried writing the code below, but it fails. How can I fix it?
private static Pattern infP = Pattern.compile("<InfoStart>([\\s\\S]*?)<InfoEnd>");
private static Pattern lineP = Pattern.compile(".*?\r\n");
final java.util.regex.Matcher matcher = patter.matcher(msgs);
while (matcher.find()){
String item = matcher.group(1);
Matcher matcherLine = lineP.matcher(item);
while(matcherLine.find()){
if(matcherLine.groupCount()>0){
String value= matcherLine.group(1);
int firstIndex=value.indexOf(":");
System.out.println("key:"+value.substring(0, firstIndex)+"value:"+value.substring(firstIndex+1));
}
}
}
Perhaps you can try this:
Pattern xmlPattern = Pattern.compile("<InfoStart>\\s+id:(\\d+)\\s+phone:(\\d+)\\s+info_type:(\\d+)\\s+<InfoEnd>");
Matcher matcher = xmlPattern.matcher(msgs);
while (matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
}
The output:
1234
912119882
1
5678
912119881
1
That said, as Tim Biegeleisen mentioned, you would be better off parsing an XML string some other way than with a regex.
Besides, your input string is incorrect; it should be:
String msgs="<InfoStart>\r\n"
+ "id:1234\r\n"
+ "phone:912119882\r\n"
+ "info_type:1\r\n"
+ "<InfoEnd>\r\n" // the opening double quote was missing here
+"<InfoStart>\r\n"
+ "id:5678\r\n"
+ "phone:912119881\r\n"
+ "info_type:1\r\n"
+ "<InfoEnd>\r\n"; // the opening double quote was missing here
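If the set or order of fields inside a block is not fixed, the two-step idea from the question can also be made to work by giving the line pattern real capture groups. This is a rough sketch under that assumption; InfoBlockParser, BLOCK, and LINE are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InfoBlockParser {
    // Step 1: everything between the markers; DOTALL lets '.' cross line breaks.
    private static final Pattern BLOCK =
            Pattern.compile("<InfoStart>(.*?)<InfoEnd>", Pattern.DOTALL);
    // Step 2: one key:value pair per line; MULTILINE makes ^ and $ work per line.
    private static final Pattern LINE =
            Pattern.compile("^([^:\\r\\n]+):([^\\r\\n]*)$", Pattern.MULTILINE);

    /** Returns one key/value map per block, in input order. */
    static List<Map<String, String>> parse(String msgs) {
        List<Map<String, String>> result = new ArrayList<>();
        Matcher block = BLOCK.matcher(msgs);
        while (block.find()) {
            Map<String, String> fields = new LinkedHashMap<>();
            Matcher line = LINE.matcher(block.group(1));
            while (line.find()) {
                fields.put(line.group(1), line.group(2));
            }
            result.add(fields);
        }
        return result;
    }

    public static void main(String[] args) {
        String msgs = "<InfoStart>\r\nid:1234\r\nphone:912119882\r\ninfo_type:1\r\n<InfoEnd>\r\n";
        for (Map<String, String> fields : parse(msgs)) {
            System.out.println(fields.get("id") + " " + fields.get("phone"));
        }
    }
}
```

The original attempt failed partly because ".*?\r\n" has no capture group, so groupCount() is 0 and group(1) is never available.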
I have a few lines like this:
tag1:
line1word1 lineoneanychar
line2word1
tag2:
line1word1 ....
line2word1 .....
I am trying to build a Java regex that extracts all the data under the tags, i.e.:
String parsed1 = line1word1 lineone\nline2word1
String parsed2 = line1word1 ....\nline2word1 .....
I believe the right way to do this is using something like this, but I haven't quite got it right:
Pattern p = Pattern.compile("tag1:\n( {1}.*)\n(?!\\w+)", Pattern.DOTALL);
Matcher m = p.matcher(clean_data);
if(m.find()){
System.out.println(m.group(1));
}
Any help would be appreciated!
It could be something like this:
public static void main(String[] args) throws Exception {
String input = "tag1:\n"
+ " line1word1 lineoneanychar\n"
+ " line2word1\n"
+ "tag2:\n"
+ " line1word1 ....\n"
+ " line2word1 .....\n";
Pattern p = Pattern.compile("tag\\d+:$\\n((?:^\\s.*?$\\n)+)", Pattern.DOTALL|Pattern.MULTILINE);
Matcher m = p.matcher(input);
while(m.find()){
System.out.println(m.group(1));
}
}
Remember to double the backslashes (\\) when you write the regex as a Java string literal.
\d is a digit
\s is a whitespace character
(?:something) is a non-capturing group, so it won't show up as a numbered group in the matcher
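If you also want to know which tag each block belongs to and get rid of the indentation, a variation along these lines might help. It is a sketch that assumes tags are word characters followed by a colon and that data lines start with a space or tab; TagSections and SECTION are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagSections {
    // Group 1: the tag name. Group 2: all following lines that start with a space or tab.
    // (?:...) groups the per-line part without creating an extra numbered group.
    private static final Pattern SECTION =
            Pattern.compile("^(\\w+):\\n((?:^[ \\t].*$\\n?)+)", Pattern.MULTILINE);

    /** Maps each tag to its lines, with indentation and the trailing newline removed. */
    static Map<String, String> parse(String input) {
        Map<String, String> sections = new LinkedHashMap<>();
        Matcher m = SECTION.matcher(input);
        while (m.find()) {
            String body = m.group(2).replaceAll("(?m)^[ \\t]+", "").trim();
            sections.put(m.group(1), body);
        }
        return sections;
    }

    public static void main(String[] args) {
        String input = "tag1:\n line1word1 lineoneanychar\n line2word1\n"
                + "tag2:\n line1word1 ....\n line2word1 .....\n";
        parse(input).forEach((tag, body) -> System.out.println(tag + " -> " + body));
    }
}
```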
Please consider the following text :
String str=
"<div style=\"text-align:left;\">$#abc#$</div>$#pqr#$";
How can I get abc and pqr?
I tried using the code below:
String tempStr =
"$#<div style=\"text-align:left;\">$#Order-CASNo#$</div>$#abc#$";
Pattern p = Pattern.compile("(?<=\\$#)(\\w*)(?=#\\$)");
Matcher m = p.matcher(tempStr);
List<String> tokens = new ArrayList<String>();
while (m.find()) {
System.out.println("Found a " + m.group() + ".");
}
But it gives me just abc. I want Order-CASNo and abc as the answer.
This regex should do it:
\b(?<=\$\#)(.*?)(?=\#\$)\b
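In Java the backslashes have to be doubled inside the string literal. A minimal sketch of how the regex above could be used (TokenExtractor and extract are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenExtractor {
    // Same regex as above, written as a Java string literal.
    // The lookbehind requires "$#" before the token, the lookahead "#$" after it.
    private static final Pattern TOKEN =
            Pattern.compile("\\b(?<=\\$\\#)(.*?)(?=\\#\\$)\\b");

    /** Collects every token found between $# and #$. */
    static List<String> extract(String s) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String tempStr = "$#<div style=\"text-align:left;\">$#Order-CASNo#$</div>$#abc#$";
        System.out.println(extract(tempStr));
    }
}
```

The original attempt used \w*, which cannot match the hyphen in Order-CASNo; .*? accepts any character up to the closing #$.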
How do I write a regex that will match multiline input delimited by newlines and spaces?
The following code works for input with one line break but does not work if the input is
String input = "A1234567890\nAAAAA\nwwwwwwww";
By which I mean matches() does not return true for that input.
Here is my code:
package patternreg;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class pattrenmatching {
public static void main(String[] args) {
String input = "A1234567890\nAAAAA";
String regex = ".*[\\w\\s\\w+].*";
Pattern p = Pattern.compile(regex,Pattern.MULTILINE);
Matcher m =p.matcher(input);
if (m.matches()) {
System.out.println("matches() found the pattern \""
+ "\" starting at index "
+ " and ending at index ");
} else {
System.out.println("matches() found nothing");
}
}
}
You could also add the DOTALL flag to get it working:
Pattern p = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL);
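To check what the flag changes, here is a small sketch that runs the regex from the question with and without DOTALL. DotallDemo and its method names are made up; MULTILINE is left out because it only affects ^ and $, which this regex does not use.

```java
import java.util.regex.Pattern;

final class DotallDemo {
    // The regex from the question, unchanged.
    static final String REGEX = ".*[\\w\\s\\w+].*";

    /** Default mode: '.' does not match line terminators. */
    static boolean matchesDefault(String input) {
        return Pattern.compile(REGEX).matcher(input).matches();
    }

    /** DOTALL mode: '.' matches any character, including '\n'. */
    static boolean matchesDotall(String input) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        String twoLines = "A1234567890\nAAAAA";
        String threeLines = "A1234567890\nAAAAA\nwwwwwwww";
        System.out.println(matchesDefault(twoLines));
        System.out.println(matchesDefault(threeLines));
        System.out.println(matchesDotall(threeLines));
    }
}
```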
I believe your problem is that . does not match '\n' unless DOTALL is set. The character class in the middle of your regex can consume one '\n', but nothing in the pattern can consume a second one, so matches() fails.
If you want to stick with the code above, try "[\S]*[\s]+", which means: match zero or more non-whitespace chars followed by one or more whitespace chars.
Fixed-up code:
public static void main(String[] args) {
String input = "A1234567890\nAAAAA\nsdfasdf\nasdfasdf";
String regex = "[\\S]*[\\s]+";
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(input.substring(m.start(), m.end()) + "*");
}
if (m.matches()) {
System.out.println("matches() found the pattern \"" + "\" starting at index " + " and ending at index ");
} else {
System.out.println("matches() found nothing");
}
}
OUTPUT:
A1234567890
*
AAAAA
*
sdfasdf
*
matches() found nothing
Also, a pattern of
"([\\S]*[\\s]+)+([\\S])*"
will match the entire input (matches() returns true) but messes up the token part of your code.