Regex to extract part of string before regex match - java

Say I have the following line:
My '123? Text – 56x73: Hello World blablabla
I want to extract everything before " - 56x73 ..."
I already found a regex to match the part which I don't want to extract:
\s–\s\d{1,2}x\d{1,2}:\s.+
How can I get only the other part using Java and Regex?

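Use:
String str = ...;
String regex = ...; // your regex, with () around the part you want to extract
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
// group(0) is the whole match; group(1), group(2), ... are the capture groups
String part = matcher.group(1);
}
Use () in your regex to delimit groups.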

You already have a workaround, but this may be helpful, assuming 56 and 73 will NOT be constant.
Use the regex: "(.*)(\\s)(.*)(\\s)([\\d]+[x][\\d]+)"
then call group(int number) on the matcher, where the number is 1 in this case.
I used .* between the two \s intentionally to get around the "-" character. I don't know much about it, but I noticed the issue in one of the comments.
If anybody wants to edit and improve my answer, you are more than welcome.
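Another way to get the leading part is to keep the question's regex for the unwanted tail and cut the line where that match starts, using Matcher.start(). This is a minimal sketch, assuming the separator in the sample line is an en dash (U+2013); the class and method names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PrefixBeforeMatch {
    // The question's regex for the part to discard; the en dash (U+2013)
    // is written as a Unicode escape to avoid source-encoding problems.
    private static final Pattern TAIL =
            Pattern.compile("\\s\u2013\\s\\d{1,2}x\\d{1,2}:\\s.+");

    // Returns everything before the first match, or the whole line if nothing matches.
    static String before(String line) {
        Matcher m = TAIL.matcher(line);
        return m.find() ? line.substring(0, m.start()) : line;
    }

    public static void main(String[] args) {
        // prints: My '123? Text
        System.out.println(before("My '123? Text \u2013 56x73: Hello World blablabla"));
    }
}
```

Returning the unchanged line when nothing matches is a design choice of this sketch; returning null or an Optional would work just as well.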

Related

What is the easiest way to filter a changing number in a string?

Can someone tell me the easiest way to extract the number '20' from the following substring?
Level I (10/20)
Note: The numbers in the brackets and the number after 'Level' change and can contain more characters than in this example.
It would be awesome if there were a way to use a regex and extract a specific part with it.
I'm not the best with regex, but here's a working solution for your example:
String s = "Level I (10/20)";
Pattern pattern = Pattern.compile("\\(\\d+/(\\d+)\\)");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output:
20
How about this one? It works for multi-line input too:
^Level[[:blank:]].+\([\d]*\/([\d]*)\)
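A minimal Java sketch of the multi-line variant. Java's regex engine does not support the POSIX bracket syntax [[:blank:]] (it would be read as a plain character class), so the sketch uses \p{Blank} instead, and it compiles with Pattern.MULTILINE so that ^ matches at the start of every line. The class and method names and the extra sample lines are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LevelTotals {
    // ^ anchors at each line start because of MULTILINE; group 1 is the number after the slash.
    private static final Pattern LEVEL_LINE =
            Pattern.compile("^Level\\p{Blank}.+\\(\\d+/(\\d+)\\)", Pattern.MULTILINE);

    // Collects the second bracketed number from every line that starts with "Level".
    static List<String> totals(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = LEVEL_LINE.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints: [20, 45]
        System.out.println(totals("Level I (10/20)\nLevel II (3/45)\nother (1/2)"));
    }
}
```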

Regex in java to extract specific pattern

I want to match the pattern (including the square brackets, equals, quotes)
[fixedtext="sometext"]
What would be a correct regex expression?
Anything can occur inside quotes. 'fixedtext' is fixed.
Your basic solution (although I'd be skeptical of this, per the comments) is essentially:
"\\[fixedtext=\\\"(.*)\\\"\\]"
which resolves to:
"\[fixedtext=\"(.*)\"\]"
Simple escaping of [] and quotes. The (.*) says capture everything in quotes as a capture group (matcher.group(1)).
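But note that (.*) is greedy, so it captures up to the last "] on the line: for '[fixedtext="abc\"]def"]' you get abc\"]def, yet two such tags on one line would be captured as a single value. A reluctant (.*?) would instead stop at the first "], giving abc\ for that input.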
If you know the ending bracket ends the line, then use:
"\\[fixedtext=\\\"(.*)\\\"\\]$"
(add the $ at the end to mark end of line) and that should be fairly reliable.
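If the quoted text may contain backslash-escaped quotes, one common approach is to describe the quoted content explicitly instead of using .*: any character that is neither a quote nor a backslash, or a backslash followed by any character. The sketch below assumes that escaping convention (the question does not state one), and its class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FixedTextExtractor {
    // Regex: \[fixedtext="((?:[^"\\]|\\.)*)"\]
    // The group accepts ordinary characters or backslash-escaped pairs, so an
    // escaped quote does not end the value.
    private static final Pattern TAG =
            Pattern.compile("\\[fixedtext=\"((?:[^\"\\\\]|\\\\.)*)\"\\]");

    // Returns the raw (still escaped) value of the first tag, or null if there is none.
    static String firstValue(String text) {
        Matcher m = TAG.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstValue("[fixedtext=\"sometext\"]"));
        System.out.println(firstValue("[fixedtext=\"abc\\\"]def\"]"));
        System.out.println(firstValue("[fixedtext=\"a\"] [fixedtext=\"b\"]"));
    }
}
```

Because the two alternatives inside the group cannot match the same character, the pattern does not need the trailing $ to find the correct closing quote.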
My suggestion is using named-capturing groups.
You can find more details here:
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Here's an example for your input:
String input = "[fixedtext=\"sometext\"]";
Pattern pattern = Pattern.compile("\\[(?<field>.*)=\"(?<value>.*)\"]");
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
System.out.println(matcher.group("field"));
System.out.println(matcher.group("value"));
} else {
System.err.println(input + " doesn't match " + pattern);
}

How to extract integer from Java Regex?

Pardon my inexperience with Java. I have the following string (below), and I am trying to clean it and extract only the integer digits. What would be the correct Java regex to achieve my goal?
Original String : uint32_t Count "77 (0x0000004D)"
Desired Output: 77
I have tried reading the Java docs on regex but I only got more confused. I guess EE engineers are not cut out for these fancy coding tricks :D
You could exploit "\\b" which is a word boundary:
String regex = "\\b\\d+\\b";
Matcher m = Pattern.compile(regex).matcher("uint32_t Count \"77 (0x0000004D)\"");
m.find();
System.out.println(m.group()); //output 77
"\\d+" finds a substring of digits, and surrounding it with "\\b" ensures that it is not embedded in another word/symbol.
More examples of the pattern would help, but from what you have given, a simple regex can match the quoted number as a group (group 2 holds the digits without the quote):
(["](\d{1,}))
I would suggest you play around with regex more, so you learn as you practice.
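Both answers can be wrapped in a small helper that checks the result of find() before reading a group and converts the digits to an int. This is a sketch only; the class and method names are invented, and it assumes the wanted number directly follows the opening quote, as in the sample string.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedIntExtractor {
    // A double quote followed by one or more digits; group 1 is the digits.
    private static final Pattern QUOTED_INT = Pattern.compile("\"(\\d+)");

    // Returns the first integer that directly follows a double quote, if any.
    // Note: Integer.parseInt throws NumberFormatException for values beyond int range.
    static OptionalInt firstQuotedInt(String text) {
        Matcher m = QUOTED_INT.matcher(text);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // prints: OptionalInt[77]
        System.out.println(firstQuotedInt("uint32_t Count \"77 (0x0000004D)\""));
    }
}
```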

Correct Regular Expression

I'm using Java regex to read a String of the form
"{\n 'Step 1.supply.vendor1.quantity':\"80"\,\n
'Step 2.supply.vendor2.quantity':\"120"\,\n
'Step 3.supply.vendor3.quantity':\"480"\,\n
'Step 4.supply.vendor4.quantity':\"60"\,\n}"
I have to detect strings of type
'Step 2.supply.vendor2.quantity':\"120"\,\n.
I'm trying to use pattern and matcher of regex but I'm not able to figure out the correct regular expression for lines like
<Beginning of Line><whitespace><whitespace><'Step><whitespace><Number><.><Any number & any type of characters><,\n><EOL>.
The <Beginning of Line> and <EOL> I have used for clarification purpose.
I have tried several patterns
String regex = "(\\n\\s{2})'Step\\s\\d.*,\n";
String regex = "\\s\\s'Step\\s\\d.*,\n";
I always get IllegalStateException: No match found.
I'm not able to find proper material to read on Java Regex with good examples. Any help would be really great. Thanks.
As the others said in the comments, you should really use a JSON Parser.
But if you want to see how it could work with a regex, here is how you can do it:
Take an example of a line you want to capture: Step 1.supply.vendor1.quantity':"80"
Replace digits with \\d* (\\d matches any digit)
Replace dots with \\. (dots need to be escaped)
Add parentheses around the parts that you want to capture
Here is the resulting regex: "Step (\\d*)\\.supply\\.vendor(\\d*)\\.quantity':\"(\\d*)\""
Now, use a Pattern and a Matcher:
String input = "{\n 'Step 1.supply.vendor1.quantity':\"80\"\\,\n";
Pattern pattern = Pattern.compile("Step (\\d*)\\.supply\\.vendor(\\d*)\\.quantity':\"(\\d*)\"");
Matcher matcher = pattern.matcher(input);
while(matcher.find()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
}
Output :
1 //(corresponds to "Step (\\d*)")
1 //(corresponds to "vendor(\\d*)")
80 //(corresponds to "quantity':\"(\\d*)")
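The IllegalStateException: No match found from the question is what Matcher.group() throws when it is called before find() or matches() has succeeded, so groups should only be read after a successful match. The sketch below applies the same pattern to several lines and collects step number and quantity into a map. It assumes the lines are separated by a plain comma and newline (the escaping in the question's sample is ambiguous), it uses \d+ rather than \d* so empty numbers are rejected, and its names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StepQuantities {
    // Group 1: step number, group 2: vendor number, group 3: quantity.
    private static final Pattern LINE = Pattern.compile(
            "'Step (\\d+)\\.supply\\.vendor(\\d+)\\.quantity':\"(\\d+)\"");

    // Maps each step number to its quantity, in order of appearance.
    static Map<Integer, Integer> parse(String input) {
        Map<Integer, Integer> quantities = new LinkedHashMap<>();
        Matcher m = LINE.matcher(input);
        while (m.find()) { // groups are read only after find() returned true
            quantities.put(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(3)));
        }
        return quantities;
    }

    public static void main(String[] args) {
        String input = "{\n 'Step 1.supply.vendor1.quantity':\"80\",\n"
                + " 'Step 2.supply.vendor2.quantity':\"120\",\n}";
        // prints: {1=80, 2=120}
        System.out.println(parse(input));
    }
}
```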

Strip all reluctant curly braces using regex

Note: This is a Java-only question (i.e. no Javascript, sed, Perl, etc.)
I need to filter out all the "reluctant" curly braces ({}) in a long string of text.
(by "reluctant" I mean as in reluctant quantifier).
I have been able to come up with the following regex which correctly finds and lists all such occurrences:
Pattern pattern = Pattern.compile("(\\{)(.*?)(\\})", Pattern.DOTALL);
Matcher matcher = pattern.matcher(originalString);
while (matcher.find()) {
Log.d("WITHIN_BRACES", matcher.group(2));
}
My problem now is how to replace every found matcher.group(0) with the corresponding matcher.group(2).
Intuitively I tried:
while (matcher.find()) {
String noBraces = matcher.replaceAll(matcher.group(2));
}
But that replaced all found matcher.group(0) with only the first matcher.group(2), which is of course not what I want.
Is there an expression or a method in Java's regex to perform this "corresponding replaceAll" that I need?
ANSWER: Thanks to the tip below, I have been able to come up with 2 fixes that did the trick:
if (matcher.find()) {
String noBraces = matcher.replaceAll("$2");
}
Fix #1: Use "$2" instead of matcher.group(2)
Fix #2: Use if instead of while.
Works now like a charm.
You can use the special backreference syntax:
String noBraces = matcher.replaceAll("$2");
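For completeness, a self-contained sketch of the backreference approach. Matcher.replaceAll resets the matcher and scans the whole input itself, so no preceding find() call is needed. The sketch uses a single capture group, which is why the replacement is "$1" rather than "$2"; the class and method names are assumptions.

```java
import java.util.regex.Pattern;

final class BraceStripper {
    // Reluctant match of {...}; DOTALL lets the content span line breaks.
    private static final Pattern BRACES =
            Pattern.compile("\\{(.*?)\\}", Pattern.DOTALL);

    // Replaces every {content} with content, one match at a time.
    static String strip(String text) {
        return BRACES.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        // prints: a b c d
        System.out.println(strip("a {b} c {d}"));
    }
}
```

Nested braces are not balanced by this pattern: each match simply runs from an opening brace to the nearest closing brace.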
