java regular expression get substring - java

I can't find any good resource for parsing with regular expression. Could someone please show me the way.
How can I parse this statement?
"Breakpoint 10, main () at file.c:10"
I want to get the substring "main ()", or the 3rd word of the statement.

This works:
public void test1() {
String text = "Breakpoint 10, main () at file.c:10";
String regex = ",(.*) at";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
}
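Basically, the regular expression ,(.*) at captures everything between the first comma and the last " at". group(1) returns that text, here " main ()" with a leading space, so call trim() on it if the space is unwanted.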

Assuming you want the 3rd word of your string (as you said in your comments), first break it up with a StringTokenizer. It lets you specify the separators (whitespace by default).
List<String> words = new ArrayList<String>();
String str = "Breakpoint 10, main () at file.c:10";
StringTokenizer st = new StringTokenizer(str); // space by default
while (st.hasMoreTokens()) {
words.add(st.nextToken());
}
String result = words.get(2);
That returns main.
If you also want the (), you need to take the next word as well, words.get(3), because the space between main and () acts as a separator.
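As a rough sketch of the same word-based idea, String.split can stand in for StringTokenizer and the wanted words can be joined back together. The class and method names here (ThirdWord, words) are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;

class ThirdWord {
    // Returns `count` whitespace-separated words starting at the 1-based
    // position `first`, joined by single spaces; "" if `first` is out of range.
    static String words(String text, int first, int count) {
        String[] tokens = text.trim().split("\\s+");
        int from = first - 1;
        if (from < 0 || from >= tokens.length) {
            return "";
        }
        int to = Math.min(tokens.length, from + count);
        return String.join(" ", Arrays.copyOfRange(tokens, from, to));
    }

    public static void main(String[] args) {
        String text = "Breakpoint 10, main () at file.c:10";
        System.out.println(words(text, 3, 1)); // main
        System.out.println(words(text, 3, 2)); // main ()
    }
}
```

This only works while the function name and its parentheses are always two separate words; a regex anchored on the surrounding text is more robust if that can vary.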

Good website regular-expressions.info
Good online tester regexpal.com
Java http://download.oracle.com/javase/tutorial/essential/regex/
I turn to these when I want to play with Regex

Have you seen the standard Sun tutorial on regular expressions? In particular, the section on matching groups would be of use.

Try: .*Breakpoint \d+, (.*) at

Well, the regular expression main \(\) does parse this. However, I suspect that you would like everything after the first comma and before the last "at": ,(.*) at gives you that in group(1), the group opened by the parenthesis in the expression.
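For comparison, here is a small sketch that anchors on the literal "Breakpoint <number>, " prefix, so the captured group carries no leading space. The class and method names (BreakpointParser, location) are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BreakpointParser {
    // Matches "Breakpoint <digits>, " and captures everything up to the last " at".
    private static final Pattern LINE = Pattern.compile("Breakpoint \\d+, (.*) at");

    // Returns the captured text, or null when the line does not match.
    static String location(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(location("Breakpoint 10, main () at file.c:10")); // main ()
        System.out.println(location("no breakpoint here"));                  // null
    }
}
```

Because (.*) is greedy, the capture runs to the last " at" in the line; use (.*?) instead if the first one should end the match.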

Related

Java Regular Expression for finding specific string

I have a file with a long string, and I would like to split it by a specific item, i.e.
String line = "{{[Metadata{\"this, is my first, string\"}]},{[Metadata{\"this, is my second, string\"}]},{[Metadata{\"this, is my third string\"}]}}";
String[] tab = line.split("(?=\\bMetadata\\b)");
So now when I iterate my tab I will get lines starting from word: "Metadata" but I would like lines starting from:
"{[Metadata"
I've tried something like:
String[] tab = line.split("(?=\\b{[Metadata\\b)");
but it doesn't work.
Can anyone help me with how to do that, please?
You may use
(?=\{\[Metadata\b)
See a demo on regex101.com.
Note that the backslashes need to be escaped in Java so that it becomes
(?=\\{\\[Metadata\\b)
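A minimal sketch of that lookahead split applied to the sample line follows. The class and method names (MetadataSplit, splitRecords) are invented for this example. Note that the text before the first {[Metadata (a single "{") comes back as the first array element.

```java
class MetadataSplit {
    // Splits at every position that is immediately followed by "{[Metadata".
    // The lookahead is zero-width, so the marker stays at the start of each piece.
    static String[] splitRecords(String line) {
        return line.split("(?=\\{\\[Metadata\\b)");
    }

    public static void main(String[] args) {
        String line = "{{[Metadata{\"this, is my first, string\"}]},"
                + "{[Metadata{\"this, is my second, string\"}]},"
                + "{[Metadata{\"this, is my third string\"}]}}";
        for (String part : splitRecords(line)) {
            System.out.println(part);
        }
    }
}
```

The pieces keep their trailing "," or "}" characters, so some trimming may still be needed depending on what is done with them next.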
Here is a solution using a formal pattern matcher. We can try matching your content using the following regex:
(?<=Metadata\\{\")[^\"]+
This uses a lookbehind to check for the Metadata marker, ending with a double quote. Then, it matches any content up to the closing double quote.
String line = "{{[Metadata{\"this, is my first, string\"}]},{[Metadata{\"this, is my second, string\"}]},{[Metadata{\"this, is my third string\"}]}}";
String pattern = "(?<=Metadata\\{\")[^\"]+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
while (m.find( )) {
System.out.println(m.group(0));
}
this, is my first, string
this, is my second, string
this, is my third string

Regular expression match a-alphanumeric&b-digits&c-digits

I have query about java regular expressions. Actually, I am new to regular expressions.
So I need help to form a regex for the statement below:
Statement: a-alphanumeric&b-digits&c-digits
Possible matching Examples: 1) a-90485jlkerj&b-34534534&c-643546
2) A-RT7456ffgt&B-86763454&C-684241
Use case: First of all I have to validate input string against the regular expression. If the input string matches then I have to extract a value, b value and c value like
90485jlkerj, 34534534 and 643546 respectively.
Could someone please share how I can achieve this in the best possible way?
I really appreciate your help on this.
You can use this pattern:
^(?i)a-([0-9a-z]++)&b-([0-9]++)&c-([0-9]++)$
If what you are trying to match is not the whole string, just remove the anchors:
(?i)a-([0-9a-z]++)&b-([0-9]++)&c-([0-9]++)
Explanations:
(?i) makes the pattern case-insensitive
[0-9]++ a digit, one or more times (possessive)
[0-9a-z]++ the same, with letters allowed as well
^ anchor for the start of the string
$ anchor for the end of the string
The parentheses in the two patterns are capturing groups (to catch what you want).
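To tie validation and extraction together, here is a short sketch that uses the anchored pattern above with Matcher.matches() and reads the three capturing groups. The class and method names (AbcParser, parse) are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AbcParser {
    private static final Pattern FORMAT =
            Pattern.compile("^(?i)a-([0-9a-z]++)&b-([0-9]++)&c-([0-9]++)$");

    // Returns {a, b, c} when the whole input matches the format, otherwise null.
    static String[] parse(String input) {
        Matcher m = FORMAT.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] values = parse("A-RT7456ffgt&B-86763454&C-684241");
        System.out.println(String.join(", ", values)); // RT7456ffgt, 86763454, 684241
        System.out.println(parse("a-x&b-12x&c-3"));    // null (b is not all digits)
    }
}
```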
Given a string with the format a-XXX&b-XXX&c-XXX, you can extract all XXX parts in one simple line:
String[] parts = str.replaceAll("(?i)[abc]-", "").split("&");
parts will be an array with 3 elements, being the target strings you want.
The simplest regex that matches your string is:
^(?i)a-([\\da-z]+)&b-(\\d+)&c-(\\d+)
Your target strings are then in groups 1, 2 and 3, but you need a lot of code around that to get the strings out, which, as shown above, is not necessary.
Following code will help you:
String[] texts = new String[]{"a-90485jlkerj&b-34534534&c-643546", "A-RT7456ffgt&B-86763454&C-684241"};
Pattern full = Pattern.compile("^(?i)a-([\\da-z]+)&b-(\\d+)&c-(\\d+)");
Pattern patternA = Pattern.compile("(?i)([\\da-z]+)&[bc]");
Pattern patternB = Pattern.compile("(\\d+)");
for (String text : texts) {
if (full.matcher(text).matches()) {
for (String part : text.split("-")) {
Matcher m = patternA.matcher(part);
if (m.matches()) {
System.out.println(part.substring(m.start(), m.end()).split("&")[0]);
}
m = patternB.matcher(part);
if (m.matches()) {
System.out.println(part.substring(m.start(), m.end()));
}
}
}
}

Remove parenthesis from String using java regex

I want to remove parentheses using a Java regular expression, but I get the error "No group 1". Please see my code and help me.
public String find_parenthesis(String Expr){
String s;
String ss;
Pattern p = Pattern.compile("\\(.+?\\)");
Matcher m = p.matcher(Expr);
if(m.find()){
s = m.group(1);
ss = "("+s+")";
Expr = Expr.replaceAll(ss, s);
return find_parenthesis(Expr);
}
else
return Expr;
}
and it is my main:
public static void main(String args[]){
Calculator c1 = new Calculator();
String s = "(4+5)+6";
System.out.println(s);
s = c1.find_parenthesis(s);
System.out.println(s);
}
The simplest method is to just remove all parentheses from the string, regardless of whether they are balanced or not.
String replaced = "(4+5)+6".replaceAll("[()]", "");
Correctly handling the balancing requires parsing (or truly ugly REs that only match to a limited depth, or “cleverness” with repeated regular expression substitutions). For most cases, such complexity is overkill; the simplest thing that could possibly work is good enough.
What you want is this: s = s.replaceAll("[()]","");
For more on regex, visit regex tutorial.
You're getting the error because your regex doesn't have any groups, but I suggest you use this much simpler, one-line approach:
expr = expr.replaceAll("\\((.+?)\\)", "$1");
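To illustrate both points, the sketch below compares the group counts of the two patterns and applies the one-line replacement. The class and method names (ParenStripper, stripOnce) are made up for the example. In the question's pattern every parenthesis is escaped, so they all match literal characters and none of them opens a capturing group.

```java
import java.util.regex.Pattern;

class ParenStripper {
    // Removes one level of parentheses around each lazily matched span.
    static String stripOnce(String expr) {
        return expr.replaceAll("\\((.+?)\\)", "$1");
    }

    public static void main(String[] args) {
        // Escaped parentheses only: zero capturing groups, hence "No group 1".
        System.out.println(Pattern.compile("\\(.+?\\)").matcher("").groupCount());   // 0
        // An unescaped pair inside the escaped ones adds group 1.
        System.out.println(Pattern.compile("\\((.+?)\\)").matcher("").groupCount()); // 1

        System.out.println(stripOnce("(4+5)+6")); // 4+5+6
        System.out.println(stripOnce("((1+2))")); // (1+2)
    }
}
```

The last line shows the limit of this approach: with nested parentheses, a single pass pairs the first "(" with the first ")", so it is not a substitute for real expression parsing.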
You can't do this with a regex at all. It won't remove the matching parentheses, just the first left and the first right, and then you won't be able to get the correct result from the expression. You need a parser for expressions. Have a look around for recursive descent expression parsers, the Dijkstra shunting-yard algorithm, etc.
The regular expression defines a character class consisting of any whitespace character (\s, which is written as \\s because we're passing it in a Java String), a dash (escaped because a dash has a special meaning inside character classes), and parentheses. Try this working code:
phoneNumber.replaceAll("[\\s\\-()]", "");
I know I'm very late here, but just in case you're still looking for a better answer: if you want to remove both opening and closing parentheses from a string, you can use a very simple method like this:
String s = "(4+5)+6";
s=s.replaceAll("\\(", "").replaceAll("\\)","");
If you are using this:
s=s.replaceAll("()", "");
you are passing the regex (), which is an empty capturing group. It matches the empty string at every position, so nothing is removed and the parentheses stay in your string. Instead, you should remove the two parentheses separately.
To explain in detail, consider the below code:
String s = "(4+5)+6";
String s1=s.replaceAll("\\(", "").replaceAll("\\)","");
System.out.println(s1);
String s2 = s.replaceAll("()", "");
System.out.println(s2);
The output for this code will be:
4+5+6
(4+5)+6
Also, use replaceAll only if you are in need of a regex. In other cases, replace works just fine. See below:
String s = "(4+5)+6";
String s1=s.replace("(", "").replace(")","");
Output:
4+5+6
Hope this helps!

regex - matching between a literal string and a quotation mark

I'm terrible at Regex and would greatly appreciate any help with this issue, which I think will be newb stuff for anyone familiar.
I'm getting a response like this from a REST call
{"responseData":{"translatedText":"Ciao mondo"},"responseDetails":"","responseStatus":200,"matches":[{"id":"424913311","segment":"Hello World","translation":"Ciao mondo","quality":"74","reference":"","usage-count":50,"subject":"All","created-by":"","last-updated-by":null,"create-date":"2011-12-29 19:14:22","last-update-date":"2011-12-29 19:14:22","match":1},{"id":"0","segment":"Hello World","translation":"Ciao a tutti","quality":"70","reference":"Machine Translation provided by Google, Microsoft, Worldlingo or the MyMemory customized engine.","usage-count":1,"subject":"All","created-by":"MT!","last-updated-by":null,"create-date":"2012-05-14","last-update-date":"2012-05-14","match":0.85}]}
All I need is the 'Ciao mondo' in between those quotations. I was hoping with Java's Split feature I could do this but unfortunately it doesn't allow two separate delimiters as then I could have specified the text before the translation.
To simplify, what I'm stuck on is the regex to gather whatever is in between translatedText":" and the next "
I'd be very grateful for any help
You can use \"translatedText\":\"([^\"]*)\" expression to capture the match.
The expression meaning is as follows: find quoted translatedText followed by a colon and an opening quote. Then match every character before the following quote, and capture the result in a capturing group.
String s = " {\"responseData\":{\"translatedText\":\"Ciao mondo\"},\"responseDetails\":\"\",\"responseStatus\":200,\"matches\":[{\"id\":\"424913311\",\"segment\":\"Hello World\",\"translation\":\"Ciao mondo\",\"quality\":\"74\",\"reference\":\"\",\"usage-count\":50,\"subject\":\"All\",\"created-by\":\"\",\"last-updated-by\":null,\"create-date\":\"2011-12-29 19:14:22\",\"last-update-date\":\"2011-12-29 19:14:22\",\"match\":1},{\"id\":\"0\",\"segment\":\"Hello World\",\"translation\":\"Ciao a tutti\",\"quality\":\"70\",\"reference\":\"Machine Translation provided by Google, Microsoft, Worldlingo or the MyMemory customized engine.\",\"usage-count\":1,\"subject\":\"All\",\"created-by\":\"MT!\",\"last-updated-by\":null,\"create-date\":\"2012-05-14\",\"last-update-date\":\"2012-05-14\",\"match\":0.85}]}";
System.out.println(s);
Pattern p = Pattern.compile("\"translatedText\":\"([^\"]*)\"");
Matcher m = p.matcher(s);
if (!m.find()) return;
System.out.println(m.group(1));
This fragment prints Ciao mondo.
Use look-ahead and look-behind to gather the strings inside quotation marks:
(?<=[,.{}:]\").*?(?=\")
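import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Test
{
public static void main(String[] args)
{
// Read one line (the JSON response) from standard input.
Scanner scanner = new Scanner(System.in);
String in = scanner.nextLine();
// Match the text between a quote preceded by one of , . { } : and the next quote.
Matcher matcher = Pattern.compile("(?<=[,.{}:]\\\").*?(?=\\\")").matcher(in);
while(matcher.find())
System.out.println(matcher.group());
}
}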
Try this regular expression -
^.*translatedText":"([^"]*)"},"responseDetails".*$
The matching group will contain the text Ciao mondo.
This assumes that translatedText and responseDetails will always occur in the positions specified in your sample.
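Written as a Java string literal, that pattern could be used roughly as follows. The sample response is shortened for the example, and the class and method names (TranslatedText, extract) are assumptions. The closing brace is escaped here as a precaution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TranslatedText {
    // Same expression as above, with quotes and backslashes escaped for Java.
    private static final Pattern RESPONSE = Pattern.compile(
            "^.*translatedText\":\"([^\"]*)\"\\},\"responseDetails\".*$");

    // Returns the translated text, or null if the response has a different shape.
    static String extract(String json) {
        Matcher m = RESPONSE.matcher(json);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened sample response, assumed to be on a single line.
        String json = "{\"responseData\":{\"translatedText\":\"Ciao mondo\"},"
                + "\"responseDetails\":\"\",\"responseStatus\":200}";
        System.out.println(extract(json)); // Ciao mondo
    }
}
```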

java regular expression

Can anyone please help me do the following in a java regular expression?
I need to read 3 characters from the 5th position from a given String ignoring whatever is found before and after.
Example : testXXXtest
Expected result : XXX
You don't need regex at all.
Just use substring: yourString.substring(4,7)
Since you do need to use regex, you can do it like this:
Pattern pattern = Pattern.compile(".{4}(.{3}).*");
Matcher matcher = pattern.matcher("testXXXtest");
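// group(1) is only available after a successful match; null means the input was too short.
String whatYouNeed = matcher.matches() ? matcher.group(1) : null;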
What does it mean, step by step:
.{4} - any four characters
( - start capturing group, i.e. what you need
.{3} - any three characters
) - end capturing group, you got it now
.* followed by 0 or more arbitrary characters.
matcher.group(1) - get the 1st (only) capturing group.
You should be able to use the substring() method to accomplish this:
String example = "testXXXtest";
String result = example.substring(4,7);
This might help: Groups and capturing in java.util.regex.Pattern.
Here is an example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Example {
public static void main(String[] args) {
String text = "This is a testWithSomeDataInBetweentest.";
Pattern p = Pattern.compile("test([A-Za-z0-9]*)test");
Matcher m = p.matcher(text);
if (m.find()) {
System.out.println("Matched: " + m.group(1));
} else {
System.out.println("No match.");
}
}
}
This prints:
Matched: WithSomeDataInBetween
If you want to match the entire input string against the pattern (rather than seek a substring that matches), you can use matches() instead of find(). You can continue searching for more matching substrings with subsequent calls to find().
Also, your question did not specify which characters are admissible or how long the string between the two "test" strings may be. I assumed any length is OK, including zero, and that we seek a substring composed of lowercase and uppercase letters as well as digits.
You can use substring for this, you don't need a regex.
yourString.substring(4,7);
I'm sure you could use a regex too, but why would you, if you don't need it? Of course, you should protect this code against null and strings that are too short.
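One way to add that protection is sketched below. The helper name slice and the choice to return an empty string for bad input are assumptions; throwing an exception would be an equally reasonable design.

```java
class SafeSlice {
    // Returns `length` characters starting at the 0-based index `start`,
    // or "" when the input is null or too short to contain that range.
    static String slice(String s, int start, int length) {
        if (s == null || start < 0 || length < 0 || s.length() < start + length) {
            return "";
        }
        return s.substring(start, start + length);
    }

    public static void main(String[] args) {
        System.out.println(slice("testXXXtest", 4, 3)); // XXX
        System.out.println(slice("test", 4, 3).isEmpty()); // true
    }
}
```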
Use the String.replaceAll() method
If performance is not critical, you can try the String.replaceAll() method for a cleaner option:
String sDataLine = "testXXXtest";
String sWhatYouNeed = sDataLine.replaceAll( ".{4}(.{3}).*", "$1" );
References
https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html#using-regular-expressions-with-string-methods
