Break a string variable's content into two parts - java

I have a string variable whose content I need to divide into two parts and save in two different String variables. I have already extracted one part, but I am not able to extract the other.
This is the code:
String set_id="(1) Speed Test 150(min) Demo 1";
set_id = set_id.substring(set_id.indexOf("(") + 1);
set_id = set_id.substring(0, set_id.indexOf(")"));
The above code extracts the digit 1, which is saved in the set_id variable.
Now I want to extract Speed Test 150(min) Demo 1 from the variable and save it in a variable named set_name.
The format of the variable's content will always remain the same, but the digit and the name itself may vary.
What should I do to extract the different parts of the string?

You are overwriting the original string when you are getting the first substring. Save each substring in a new variable:
String set_id="(1) Speed Test 150(min) Demo 1";
String part1 = set_id.substring(set_id.indexOf("(") + 1);
part1 = part1.substring(0, part1.indexOf(")"));
String part2 = set_id.substring(set_id.indexOf(")")+2);
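The same indexOf/substring idea can be wrapped in a helper that returns both parts at once and rejects input that does not start with a parenthesized id. This is a sketch; the class name SetParser and the choice to throw IllegalArgumentException are illustrative, not part of the answer above.

```java
// Hypothetical helper: splits "(id) name" into its two parts using indexOf/substring.
final class SetParser {
    private SetParser() {}

    /** Returns {id, name}; throws if the input does not start with a parenthesized id. */
    static String[] parse(String input) {
        int open = input.indexOf('(');
        int close = input.indexOf(')', open + 1);
        if (open != 0 || close < 0) {
            throw new IllegalArgumentException("Unexpected format: " + input);
        }
        String id = input.substring(open + 1, close);     // text between the first "(" and ")"
        String name = input.substring(close + 1).trim();  // everything after ")", leading space removed
        return new String[] {id, name};
    }

    public static void main(String[] args) {
        String[] parts = parse("(1) Speed Test 150(min) Demo 1");
        System.out.println("set_id: " + parts[0]);
        System.out.println("set_name: " + parts[1]);
    }
}
```

Because the search for ")" starts right after the first "(", the later "(min)" in the name does not interfere.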

Try the following:
\((\d+)\)\s*(.+)
$1 gives the id and $2 gives the name.
Here,
\( and \) match the opening and closing parentheses (escaped, as ( and ) have special meaning; inside a Java string literal each backslash must be doubled, e.g. "\\(")
(\d+) matches one or more digits (captured, so that $1 can be used to refer to it)
\s* matches zero or more whitespace characters
(.+) matches one or more characters of any kind (again captured)
Use it like this:
String string = "(1) Speed Test 150(min) Demo 1";
String id = string.replaceAll("\\((\\d+)\\)\\s*(.+)", "$1");
String name = string.replaceAll("\\((\\d+)\\)\\s*(.+)", "$2");
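Each replaceAll call compiles the pattern and scans the string again, and if the input does not match, replaceAll silently returns it unchanged. When both parts are needed, one Matcher can supply them from a single match and report a mismatch explicitly. A sketch, assuming the input always has the "(digits) name" shape; the class name SetRegex and returning null on a mismatch are illustrative choices.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: one precompiled pattern, one match, both groups read from the same Matcher.
final class SetRegex {
    // \((\d+)\) -> group 1: the digits inside the leading parentheses
    // \s*(.+)   -> group 2: the rest of the string after optional whitespace
    private static final Pattern SET = Pattern.compile("\\((\\d+)\\)\\s*(.+)");

    private SetRegex() {}

    /** Returns {id, name}, or null when the input does not have the "(digits) name" shape. */
    static String[] parse(String input) {
        Matcher m = SET.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] parts = parse("(1) Speed Test 150(min) Demo 1");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```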

Assuming the format is the same:
String set_name = set_id.substring(set_id.indexOf(")") + 2);

Check this; a regex with capturing groups is a cleaner way to extract both parts:
Pattern pattern = Pattern.compile("\\((\\d+)\\)\\s*(.*)");
String string = "(1) Speed Test 150(min) Demo 1";
Matcher matcher = pattern.matcher(string);
if (matcher.matches()) {
    System.out.println("Total groups: " + matcher.groupCount());
    for (int i = 1, max = matcher.groupCount(); i <= max; i++) {
        System.out.println(i + " : " + matcher.group(i));
    }
} else {
    System.out.println("No match");
}

I added "]" to mark the end character.
String set_id="(1) Speed Test 150(min) Demo 1]";
String set_name=set_id.substring(set_id.indexOf(')')+1, set_id.indexOf(']')).trim();
System.out.println(set_name);
Now it's working for me.
Output: Speed Test 150(min) Demo 1

You can use a regex to extract the numbers. The sample below prints the id (1) and the number just before "(min)" (150):
Pattern pattern = Pattern.compile("(\\d+).+?(\\d+)\\(");
String sample = "(1) Speed Test 150(min) Demo 1";
Matcher matcher = pattern.matcher(sample);
if(matcher.find()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}

You could try this:
public class RegexTest {
public static void main(String[] args) {
String originalString = "(1) Speed Test 150(min) Demo 1";
String setId = originalString.replaceAll("\\((\\d+)\\)\\s*(.+)", "$1").trim();
String setName = originalString.replaceAll("\\((\\d+)\\)\\s*(.+)", "$2").trim();
System.out.println("setId: " + setId);
System.out.println("setName: " + setName);
}
}

Related

complex regular expression in Java

I have a problem (it seems rather complex to me) that I'm solving with regular expressions in Java.
I receive text strings that must be in the format:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
I started with a regular expression for extracting the text between the M:/:D:/:C:/:Q: as:
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
And that works fine if the <either a url or string> is just an alphanumeric string. But it all falls apart when the embedded string is a url of the format:
tcp://someurl.something:port
Can anyone help me adjust the above regex to extract the text after :D: as either a URL or an alphanumeric string?
Here's an example:
public static void main(String[] args) {
String name = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
boolean matchFound = false;
ArrayList<String> values = new ArrayList<>();
String pattern2 = "(M:|:D:|:C:|:Q:.*?)([a-zA-Z_\\.0-9]+)";
Matcher m3 = Pattern.compile(pattern2).matcher(name);
while (m3.find()) {
matchFound = true;
String m = m3.group(2);
System.out.println("regex found match: " + m);
values.add(m);
}
}
For the example above, the results I want are:
myString1
tcp://someurl.com:8989
myString2
1
Note that the strings can be of variable length and are alphanumeric, but may contain some other characters (such as the ://, . and - characters found in a URL).
You mention that the format is constant:
M:<some text>:D:<either a url or string>:C:<some more text>:Q:<a number>
Capture groups can do this for you with the pattern:
"M:(.*):D:(.*):C:(.*):Q:(.*)"
Alternatively, you can use String.split() with the pattern "M:|:D:|:C:|:Q:". However, because the input starts with "M:", split returns an empty string as the first element; the four values follow it.
public static void main(String[] args) throws Exception {
System.out.println("Regex: ");
String data = "M:<some text>:D:tcp://someurl.something:port:C:<some more text>:Q:<a number>";
Matcher matcher = Pattern.compile("M:(.*):D:(.*):C:(.*):Q:(.*)").matcher(data);
if (matcher.matches()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
}
System.out.println();
System.out.println("String.split(): ");
String[] pieces = data.split("M:|:D:|:C:|:Q:");
for (String piece : pieces) {
System.out.println(piece);
}
}
Results:
Regex:
<some text>
tcp://someurl.something:port
<some more text>
<a number>
String.split():
<some text>
tcp://someurl.something:port
<some more text>
<a number>
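If the split approach is used, the empty leading element can be dropped before the values are handed on. A sketch; the class name MarkerSplit is illustrative, and it assumes the values themselves never contain one of the marker strings (for example a value containing "M:" would be split as well).

```java
import java.util.Arrays;

// Sketch: split on the four markers and drop the empty string that precedes "M:".
final class MarkerSplit {
    private MarkerSplit() {}

    static String[] fields(String data) {
        String[] pieces = data.split("M:|:D:|:C:|:Q:");
        // "M:" sits at index 0, so split() yields "" before the first real value.
        if (pieces.length > 0 && pieces[0].isEmpty()) {
            pieces = Arrays.copyOfRange(pieces, 1, pieces.length);
        }
        return pieces;
    }

    public static void main(String[] args) {
        String name = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
        for (String f : fields(name)) {
            System.out.println(f);
        }
    }
}
```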
To extract the URL/text part, you don't need a regular expression. Use:
int startPos = input.indexOf(":D:")+":D:".length();
int endPos = input.indexOf(":C:", startPos);
String urlOrText = input.substring(startPos, endPos);
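The snippet above assumes both markers are present: if ":D:" is missing, indexOf returns -1 and startPos silently becomes 2, and if ":C:" is missing, substring throws an IndexOutOfBoundsException. A sketch of a reusable helper that checks both markers; the names Between and between are hypothetical.

```java
import java.util.Optional;

// Hypothetical helper generalizing the indexOf/substring approach to any pair of markers.
final class Between {
    private Between() {}

    /** Text between the first startMarker and the next endMarker after it, if both exist. */
    static Optional<String> between(String input, String startMarker, String endMarker) {
        int start = input.indexOf(startMarker);
        if (start < 0) {
            return Optional.empty();
        }
        start += startMarker.length();              // skip past the marker itself
        int end = input.indexOf(endMarker, start);  // search only after the start marker
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(input.substring(start, end));
    }

    public static void main(String[] args) {
        String input = "M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1";
        System.out.println(between(input, ":D:", ":C:").orElse("<missing>"));
    }
}
```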
Assuming you need to do some validation along with the parsing:
Break the regex into different parts like this:
String m_regex = "[\\w.]+"; // in Java, a . inside [] is just a literal dot
String url_regex = "."; // placeholder: there are plenty of URL regexes online, pick your favorite
String d_regex = "(?:" + url_regex + "|\\p{Alnum}+)"; // a URL or a sequence of alphanumeric characters
String c_regex = "[\\w.]+"; // you probably want this to be a bit stricter
String q_regex = "\\d+"; // what sort of number exactly? This assumes any string of digits
// each named group needs its own unique name
String regex = "M:(?<M>" + m_regex + "):"
        + "D:(?<D>" + d_regex + "):"
        + "C:(?<C>" + c_regex + "):"
        + "Q:(?<Q>" + q_regex + ")";
Pattern p = Pattern.compile(regex);
Pattern p = Pattern.compile(regex);
It might be a good idea to keep the compiled pattern in a static field and build it in a static block, so that the temporary regex strings don't clutter the class with otherwise useless fields.
Then you can retrieve each part by its name:
Matcher m = p.matcher( input );
if (m.matches()) {
String m_part = m.group( "M" );
...
String q_part = m.group( "Q" );
}
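A complete, runnable version of this named-group approach might look like the sketch below. The URL pattern is an assumption (scheme://host with an optional :port), deliberately simple and not a general URL validator; the class name MdcqParser is illustrative. Java rejects duplicate group names, so each part gets its own name, and the pattern is built once in a static block.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the named-group approach with a deliberately simple, assumed URL pattern.
final class MdcqParser {
    private static final Pattern MDCQ;

    static {
        // Assumption: "scheme://host[:port]" is enough of a URL for this input.
        String urlRegex = "[a-zA-Z][a-zA-Z0-9+.-]*://[\\w.-]+(?::\\d+)?";
        String mRegex = "[\\w.]+";
        String dRegex = "(?:" + urlRegex + "|\\p{Alnum}+)"; // URL or alphanumeric run
        String cRegex = "[\\w.]+";
        String qRegex = "\\d+";
        MDCQ = Pattern.compile("M:(?<M>" + mRegex + "):"
                + "D:(?<D>" + dRegex + "):"
                + "C:(?<C>" + cRegex + "):"
                + "Q:(?<Q>" + qRegex + ")");
    }

    private MdcqParser() {}

    /** Returns {M, D, C, Q}, or null if the input does not match the expected format. */
    static String[] parse(String input) {
        Matcher m = MDCQ.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group("M"), m.group("D"), m.group("C"), m.group("Q")};
    }

    public static void main(String[] args) {
        String[] parts = parse("M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1");
        for (String p : parts) {
            System.out.println(p);
        }
    }
}
```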
You can go a step further by creating a RegexGroup interface, where each implementing object represents one part of the regex and holds its name and the actual regex. However, you lose simplicity, and the result is harder to understand at a glance. (I wouldn't do this myself; I'm just pointing out that it's possible and has its own benefits.)
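One possible shape for that RegexGroup idea is sketched below. Everything here is hypothetical: the interface methods, the enum MdcqPart and the builder RegexGroups are illustrative names, and the URL alternative in D is the same simplified assumption as a plain scheme://host[:port]. An enum works neatly because its built-in name() method satisfies the interface's name().

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical RegexGroup: one named part of a larger regex.
interface RegexGroup {
    String name();

    String regex();

    /** Wraps the part in a named capturing group: (?<name>regex). */
    default String asNamedGroup() {
        return "(?<" + name() + ">" + regex() + ")";
    }
}

// One possible implementation: an enum listing the parts in order (Enum.name() implements name()).
enum MdcqPart implements RegexGroup {
    M("[\\w.]+"),
    D("[a-zA-Z][a-zA-Z0-9+.-]*://[\\w.-]+(?::\\d+)?|\\p{Alnum}+"),
    C("[\\w.]+"),
    Q("\\d+");

    private final String regex;

    MdcqPart(String regex) {
        this.regex = regex;
    }

    @Override
    public String regex() {
        return regex;
    }
}

final class RegexGroups {
    private RegexGroups() {}

    /** Joins the parts as "M:(?<M>...):D:(?<D>...):..." in list order. */
    static Pattern build(List<? extends RegexGroup> parts) {
        StringBuilder sb = new StringBuilder();
        for (RegexGroup part : parts) {
            if (sb.length() > 0) {
                sb.append(':');
            }
            sb.append(part.name()).append(':').append(part.asNamedGroup());
        }
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Pattern p = build(List.of(MdcqPart.values()));
        Matcher m = p.matcher("M:myString1:D:tcp://someurl.com:8989:C:myString2:Q:1");
        if (m.matches()) {
            for (MdcqPart part : MdcqPart.values()) {
                System.out.println(part.name() + " = " + m.group(part.name()));
            }
        }
    }
}
```

The trade-off mentioned above is visible here: the final regex never appears in one place, so reading it requires following the builder.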

Regex to match [[Wikipedia:Manual of Style#Links|]] # in java

I have been trying to match the following string -
String temp = "[[Wikipedia:Manual of Style#Links|]]" ;
with the following regexes:
boolean a = temp.matches("\\[\\[Wikipedia:[a-zA-Z_0-9]*#[a-zA-Z_0-9]*\\|\\]\\]");
"\\[\\[Wikipedia:(.*?)#(.*?)\\|\\]\\]"
"\\[\\[Wikipedia:(.*)*#(.+)*\\|\\]\\]"
"\\[\\[(.*?)#(.*?)\\|\\]\\]"
But none of them are giving any positive matches.
Straight away I can see a problem: you are using a character class without a space to match input with spaces.
Try this:
boolean a = temp.matches("\\[\\[Wikipedia:[\\w ]*#[\\w ]+\\|\\]\\]");
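Note that [a-zA-Z_0-9] can be replaced by \w. In Java, \w is equivalent to [a-zA-Z_0-9] by default; it also matches Unicode letters and digits from other languages only when the UNICODE_CHARACTER_CLASS flag (or the embedded (?U) flag) is set.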
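The default and Unicode behaviour of \w can be compared directly by testing an accented word with and without the flag. A small sketch; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

// Compares what \w accepts in Java with and without UNICODE_CHARACTER_CLASS.
final class WordClassDemo {
    private WordClassDemo() {}

    static boolean asciiWord(String s) {
        return Pattern.matches("\\w+", s); // default: \w is [a-zA-Z_0-9]
    }

    static boolean unicodeWord(String s) {
        return Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9"; // "café"
        System.out.println(asciiWord("Links"));    // true
        System.out.println(asciiWord(accented));   // false
        System.out.println(unicodeWord(accented)); // true
    }
}
```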
public static void main(String[] args) {
String temp = "[[Wikipedia:Manual of Style#Links|]]";
Pattern pattern = Pattern.compile("\\[\\[Wikipedia:([\\w ]+)#([\\w ]+)\\|\\]\\]");
Matcher matcher = pattern.matcher(temp);
if(matcher.find()) {
System.out.println("Manual of Style: " + matcher.group(1));
System.out.println("links : " + matcher.group(2));
}
}
or
temp.matches("\\[\\[Wikipedia:([\\w ]+)#([\\w ]+)\\|\\]\\]");
Just add a space to your custom character class:
String temp = "[[Wikipedia:Manual of Style#Links|]]" ;
temp.matches("\\[\\[Wikipedia:[a-zA-Z_0-9 ]*#[a-zA-Z_0-9]*\\|\\]\\]"); //true

Regular expression on a string

I have a String like the one below:
String phone = "(123) 456-7890";
Now I would like my program to verify that my input follows the same pattern as the string phone.
I did the following:
if(phone.contains("([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]")) {
//display pass
}
else {
//display fail
}
It didn't work. I tried other combinations too; nothing worked.
Questions:
1. How can I achieve this without using Pattern, as above?
2. How can I do this with Pattern? I tried the following:
Pattern pattern = Pattern.compile("(\d+)");
Matcher match = pattern.matcher(phone);
if (match.find()) {
//Displaypass
}
String#matches checks if a string matches a pattern:
if (phone.matches("\\(\\d{3}\\) \\d{3}-\\d{4}")) {
//Displaypass
}
The pattern is a regular expression, so I had to escape the parentheses, as they have a special meaning in a regex (they denote capturing groups).
contains() only checks if a string contains the substring passed to it.
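To see the difference concretely, the sketch below runs the same pattern through contains(), matches() and Matcher.find(). The class name and sample strings are illustrative. find() succeeds when the pattern occurs anywhere, which is also why the (\d+) attempt in the question passes for any input containing a digit.

```java
import java.util.regex.Pattern;

// contains() searches for a literal substring; matches() treats its argument as a regex
// that must match the whole string; find() looks for the regex anywhere in the string.
final class ContainsVsMatches {
    static final String PHONE_REGEX = "\\(\\d{3}\\) \\d{3}-\\d{4}";

    private ContainsVsMatches() {}

    static boolean viaContains(String phone) {
        return phone.contains(PHONE_REGEX); // looks for the characters \(\d{3}... literally
    }

    static boolean viaMatches(String phone) {
        return phone.matches(PHONE_REGEX);
    }

    static boolean viaFind(String phone) {
        return Pattern.compile(PHONE_REGEX).matcher(phone).find();
    }

    public static void main(String[] args) {
        String phone = "(123) 456-7890";
        System.out.println(viaContains(phone));                      // false
        System.out.println(viaMatches(phone));                       // true
        System.out.println(viaFind("Call (123) 456-7890 today"));    // true
        System.out.println(viaMatches("Call (123) 456-7890 today")); // false
    }
}
```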
I'm not going to dive too deeply into regex syntax, but there definitely is something off with your regex.
"([0-9][0-9][0-9]) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]"
It contains ( and ), which have special meaning. Escape them:
"\([0-9][0-9][0-9]\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]"
Inside a Java string literal you also have to escape each \, which gives the final version:
"\\([0-9][0-9][0-9]\\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]"
You can write it like this:
Pattern pattern = Pattern.compile("\\(\\d{3}\\) \\d{3}-\\d{4}");
Matcher matcher = pattern.matcher(sPhoneNumber);
if (matcher.matches()) {
System.out.println("Phone Number Valid");
}
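String.matches compiles its regex on every call. When many inputs are checked, a Pattern compiled once and held in a static final field avoids that; Pattern instances are immutable and safe to share between threads, while Matcher instances are not. A sketch; the class name PhoneValidator is illustrative.

```java
import java.util.regex.Pattern;

// Sketch: compile the phone pattern once and reuse it for every check.
final class PhoneValidator {
    // "(ddd) ddd-dddd": escaped parentheses, exactly 3 + 3 + 4 digits
    private static final Pattern PHONE = Pattern.compile("\\(\\d{3}\\) \\d{3}-\\d{4}");

    private PhoneValidator() {}

    static boolean isValid(String phone) {
        // a fresh Matcher per call, since Matchers are not thread-safe
        return phone != null && PHONE.matcher(phone).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"(123) 456-7890", "(1234) 891-6762", "123-456-7890"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```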
For more information you can visit this article.
It appears that your problem is that you didn't escape the parentheses, so your regex fails. Try this:
\([0-9][0-9][0-9]\) [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]
This works:
String PHONE_REGEX = "[(]\\b[0-9]{3}\\b[)][ ]\\b[0-9]{3}\\b[-]\\b[0-9]{4}\\b";
String phone1 = "(1234) 891-6762";
boolean b = phone1.matches(PHONE_REGEX);
System.out.println("is phone: " + phone1 + " :Valid = " + b);
String phone2 = "(143) 456-7890";
b = phone2.matches(PHONE_REGEX);
System.out.println("is phone: " + phone2 + " :Valid = " + b);
Output:
is phone: (1234) 891-6762 :Valid = false
is phone: (143) 456-7890 :Valid = true

Splitting a string java

I have a string in this format:
<+923451234567>: Hi here is the text.
I want to get the mobile number at the start of the string, between the < and > symbols, without any non-alphanumeric characters (i.e. 923451234567), and also the text (i.e. Hi here is the text.).
I can use hardcoded logic, which is what I am currently doing:
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
String[] splitted = stringReceivedInSms.split(">: ", 2);
String mobileNumber=MyUtils.removeNonDigitCharacters(splitted[0]);
String text=splitted[1];
How can I neatly extract the required parts with a regular expression, so that I don't have to change the code whenever the format of the string changes?
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
Pattern pattern = Pattern.compile("<\\+?([0-9]+)>: (.*)");
Matcher matcher = pattern.matcher(stringReceivedInSms);
if(matcher.matches()) {
String phoneNumber = matcher.group(1);
String messageText = matcher.group(2);
}
Use a regex that matches the pattern - <\\+?(\\d+)>: (.*)
Use the Pattern and Matcher java classes to match the input string.
Pattern p = Pattern.compile("<\\+?(\\d+)>: (.*)");
Matcher m = p.matcher("<+923451234567>: Hi here is the text.");
if(m.matches())
{
System.out.println(m.group(1));
System.out.println(m.group(2));
}
You can use a regex; the following pattern will work:
^<\\+?(\\d++)>:\\s*+(.++)$
Here is how you would use it -
public static void main(String[] args) {
final String s = "<+923451234567>: Hi here is the text.";
final Pattern pattern = Pattern.compile(""
+ "#start of line anchor\n"
+ "^\n"
+ "#literal <\n"
+ "<\n"
+ "#an optional +\n"
+ "\\+?\n"
+ "#match and grab at least one digit\n"
+ "(\\d++)\n"
+ "#literal >:\n"
+ ">:\n"
+ "#any amount of whitespace\n"
+ "\\s*+\n"
+ "#match and grab the rest of the string\n"
+ "(.++)\n"
+ "#end anchor\n"
+ "$", Pattern.COMMENTS);
final Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
}
I have added the Pattern.COMMENTS flag so that whitespace and #-comments inside the pattern are ignored, which lets the comments stay embedded for future reference.
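COMMENTS mode can also be switched on from inside the pattern with the embedded flag (?x), which is handy when the pattern string goes to an API that takes no flags argument, such as String.matches. A sketch of the same pattern written that way; the class name InlineCommentsDemo and returning null on a mismatch are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Same pattern, with COMMENTS mode enabled inline via (?x) instead of the flags argument.
final class InlineCommentsDemo {
    private static final Pattern SMS = Pattern.compile(
            "(?x)"                                                  // whitespace ignored, # starts a comment
            + "^ < \\+?     # literal < and an optional +\n"
            + "(\\d++)       # group 1: the digits of the phone number\n"
            + ">: \\s*+      # literal >: followed by any whitespace\n"
            + "(.++) $       # group 2: the rest of the message\n");

    private InlineCommentsDemo() {}

    static String[] parse(String sms) {
        Matcher m = SMS.matcher(sms);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String[] parts = parse("<+923451234567>: Hi here is the text.");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```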
Output:
923451234567
Hi here is the text.
You can get your phone number by just doing:
stringReceivedInSms.substring(stringReceivedInSms.indexOf("<+") + 2, stringReceivedInSms.indexOf(">"))
So try this snippet:
public static void main(String[] args){
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
System.out.println(stringReceivedInSms.substring(stringReceivedInSms.indexOf("<+") + 2, stringReceivedInSms.indexOf(">")));
}
You don't need to split your String.
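The snippet above extracts only the number. A sketch that returns both the number and the message text with indexOf/substring, stripping the "+" only when it is present, might look like this; the class name SmsSubstring and returning null for unexpected input are illustrative choices.

```java
// Sketch: indexOf/substring for both parts, tolerating a missing "+".
final class SmsSubstring {
    private SmsSubstring() {}

    /** Returns {digits, text} for input shaped like "<+digits>: text", or null otherwise. */
    static String[] parse(String sms) {
        int open = sms.indexOf('<');
        int close = sms.indexOf(">:", open + 1);
        if (open < 0 || close < 0) {
            return null;
        }
        String number = sms.substring(open + 1, close);
        if (number.startsWith("+")) {
            number = number.substring(1); // drop the optional leading +
        }
        String text = sms.substring(close + 2).trim(); // skip ">:" and the following space
        return new String[] {number, text};
    }

    public static void main(String[] args) {
        String[] parts = parse("<+923451234567>: Hi here is the text.");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```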

How to take a substring using pattern match

I have:
String content = "<a data-hovercard=\"/ajax/hovercard/group.php?id=180552688740185\">\n"
        + "<a data-hovercard=\"/ajax/hovercard/group.php?id=21392174\">";
I want to get all the ids between "group.php?id=" and "\"".
For example: 180552688740185
Here is my code:
String content1 = "";
Pattern script1 = Pattern.compile("group.php?id=.*?\"");
Matcher mscript1 = script1.matcher(content);
while (mscript1.find()) {
content1 += mscript1.group() + "\n";
}
But for some reason it does not work.
Can you give me some advice?
Why are you using .*? to match the id? .*? matches any characters, but you only need digits, so use \\d. There is a second problem: . and ? are special characters in a regex, and the unescaped ? makes the preceding p optional, so the literal text group.php?id= is never matched.
Also, you need to capture the id and then print it.
// Treat the special characters in the prefix as literals
String str = Pattern.quote("group.php?id=") + "(\\d+)";
Pattern script1 = Pattern.compile(str);
Matcher mscript1 = script1.matcher(content);
while (mscript1.find()) {
    content1 += mscript1.group(1) + "\n"; // Capture group 1 contains your id
}
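Put together, a runnable version of this answer might look like the following sketch. It collects the ids into a List instead of concatenating a String, and uses \d+ so that an id= with no digits after it is skipped; both are choices made for the illustration, and the class name GroupIdExtractor is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: quote the literal prefix, capture the digits after it, collect every match.
final class GroupIdExtractor {
    // Pattern.quote wraps "group.php?id=" in \Q...\E so '.' and '?' are matched literally.
    private static final Pattern GROUP_ID = Pattern.compile(Pattern.quote("group.php?id=") + "(\\d+)");

    private GroupIdExtractor() {}

    static List<String> ids(String content) {
        List<String> result = new ArrayList<>();
        Matcher m = GROUP_ID.matcher(content);
        while (m.find()) {
            result.add(m.group(1)); // group 1 holds just the digits
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "<a data-hovercard=\"/ajax/hovercard/group.php?id=180552688740185\">\n"
                + "<a data-hovercard=\"/ajax/hovercard/group.php?id=21392174\">";
        System.out.println(ids(content)); // [180552688740185, 21392174]
    }
}
```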
