How to crop multiple text from String in Java?

I want to crop a portion of a String:
" this is Test [ABC:123456] Sting with multiple properties [ABC:98765] ..."
As a result, I want the text between "[ ]" (here ABC:123456 and ABC:98765).
Note: there can be any number of properties.
What is the best way to get this result?

public static void main(String[] args) {
String input = "test bla [ABC56465:asd] asdasdqwd [DEF:345]";
Pattern pattern = Pattern.compile("\\[(.*?)\\]");
Matcher match = pattern.matcher(input);
while(match.find()){
System.out.println(match.group());
}
}
Follow the tutorials from Niels. The code above is one possible solution.
To get the output without the "[ ]" just replace:
System.out.println(match.group());
With:
System.out.println(match.group(1));
as mentioned in the comments.
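The loop above can be wrapped in a small helper that collects every bracketed value into a list. This is only a sketch: the class name BracketExtractor and the method name extract are made up for illustration and do not come from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketExtractor {
    // Matches "[...]" and captures the text between the brackets in group 1.
    // The reluctant quantifier .*? stops at the first closing bracket.
    private static final Pattern BRACKETED = Pattern.compile("\\[(.*?)\\]");

    // Returns the contents of every bracketed section, in order of appearance.
    public static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = BRACKETED.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // group(1) excludes the brackets themselves
        }
        return result;
    }

    public static void main(String[] args) {
        String input = " this is Test [ABC:123456] Sting with multiple properties [ABC:98765] ...";
        System.out.println(extract(input)); // prints [ABC:123456, ABC:98765]
    }
}
```

An input without any brackets simply yields an empty list.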

You need to define your pattern and do regex matching, extracting the group you need.
See http://tutorials.jenkov.com/java-regex/matcher.html
for a tutorial on regex (especially the find(), start() and end() section).
In your case the pattern should be very simple.

In Java, you can do this:
String resultString = subjectString.replaceAll("\\[[^\\]]*\\]", "");
Explanation
\[ matches the opening bracket
The negated character class [^\]]* matches any character that is not a closing bracket
\] matches the closing bracket
We replace with the empty string
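Note that this replaceAll call removes the bracketed sections and keeps the surrounding text, which is the opposite of extracting them. A minimal sketch of that behaviour, assuming a hypothetical helper named strip:

```java
public class BracketStripper {
    // Removes every "[...]" section; the text outside the brackets is kept.
    public static String strip(String subject) {
        return subject.replaceAll("\\[[^\\]]*\\]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("a[b]c[d]e")); // prints ace
    }
}
```

If the goal is the text inside the brackets, a Matcher loop with a capturing group is the better fit.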

Related

Regular expression -- ignore part of a matched string

I have text in which I want to find something like this:
"Name DAVID"
I want to match "DAVID" in this, larger, text.
I tried use a regular expression like this:
(Name(.*))
and also
(?:Name(.*))
but this also matched "Name", and I only want to match "DAVID".
Just drop the extra parens:
"Name (.*)"
Even that is probably excessive, you probably want something more like:
"Name (\w*)"
to catch exactly the characters that you want.
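As a rough illustration of the second pattern, the sketch below searches a larger text with find() and returns only the captured word. The class and method names are hypothetical, and Optional is used here just to signal "no match".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NameExtractor {
    // Group 1 captures the run of word characters after "Name ".
    private static final Pattern NAME = Pattern.compile("Name (\\w*)");

    // Returns the word following "Name ", or empty if "Name " does not occur.
    public static Optional<String> extract(String text) {
        Matcher m = NAME.matcher(text);
        // find() looks for the pattern anywhere in the text.
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("Some text, Name DAVID, more text").orElse("no match")); // prints DAVID
    }
}
```

Because \w* also accepts zero characters, a text such as "Name !" yields an empty string rather than no match; \w+ would avoid that.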
You need to use a Matcher. This snippet of code worked for me
public static void main(String[] asdf) {
String text = "NAME David";
Pattern p = Pattern.compile("NAME (.+)");
Matcher m = p.matcher(text);
if (m.matches()){
System.out.println(m.group(1));
}
}
Note that calling m.matches() (or m.find()) first is mandatory; otherwise m.group(1) throws java.lang.IllegalStateException: No match found
String's matches()
public static void main(String[] args) {
String regex = "^Name(.+)$";
System.out.println("Name".matches(regex));
System.out.println("Name MIKE".matches(regex));
System.out.println("Name DAVID".matches(regex));
}
For some reason, it doesn't look like your question had yet been answered with a correct regex for what you requested.
Here is what you are looking for:
(?<=Name )DAVID
This only matches DAVID, in the proper context (see demo).
You probably know this, but this is a way to test a string with this regex:
Pattern regex = Pattern.compile("(?<=Name )DAVID");
Matcher regexMatcher = regex.matcher(subjectString);
boolean foundMatch = regexMatcher.find();
Explain Regex
(?<= # look behind to see if there is:
Name # 'Name '
) # end of look-behind
DAVID # 'DAVID'
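The look-behind idea can be generalized so that it is not tied to one name. The sketch below replaces the literal DAVID with \w+, which is an assumption on my part and not part of the answer; the class and method names are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindName {
    // (?<=Name ) asserts that "Name " precedes the match without consuming it,
    // so group() returns only the word itself. \\w+ is an assumed generalization.
    private static final Pattern AFTER_NAME = Pattern.compile("(?<=Name )\\w+");

    // Returns every word that directly follows "Name ".
    public static List<String> namesIn(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = AFTER_NAME.matcher(text);
        while (m.find()) {
            names.add(m.group()); // no capturing group needed
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesIn("Name DAVID and Name MIKE")); // prints [DAVID, MIKE]
    }
}
```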

Java expression for Embedded parameter

I am defining a Java regular expression for my pattern, but it does not work.
Here is the text that I want to define a pattern for:
"sometext {10} some text {25} sometext".
Named parameters are {10}, {25}, ....
I used a pattern like this: "({\d+})*", but it doesn't work and I received this exception:
Caused by: java.util.regex.PatternSyntaxException: Illegal repetition near index 0
({\d+})*
Here is the code I have:
public static final Pattern pattern = Pattern.compile("({\\d+})*");
public static void main(String[] args) {
String s = "{10}ABC{2}";
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
}
Can someone explain what I am doing wrong here? Thanks.
There are a few issues with your Pattern.
Firstly, you don't need the outer parentheses, as you're not defining any group apart from the main group.
Secondly, you must escape the curly brackets (\\{), otherwise they will be interpreted as a quantifier.
Thirdly, you don't need the last quantifier (*), as you're iterating over matches within the same input String.
So your Pattern will look something like "\\{\\d+\\}".
More info on Java Patterns here.
Edit -- example
String input = "sometext {10} some text {25} sometext";
Pattern p = Pattern.compile("\\{\\d+\\}");
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(m.group());
}
Output:
{10}
{25}
{ is a special character in regex, just double-escape it \\{. Same for }.
Also take into account that if you use *, it will also match empty strings.
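To make both points concrete, here is a sketch that captures only the digits inside the braces and, separately, counts the empty matches that a trailing * produces. The class and method names are hypothetical, and capturing the digits in a group is my own variation on the answers above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BraceParameters {
    // Escaped braces are literals; group 1 holds the digits between them.
    private static final Pattern PARAM = Pattern.compile("\\{(\\d+)\\}");
    // Same idea with a trailing *, which also allows zero-length matches.
    private static final Pattern STARRED = Pattern.compile("(\\{\\d+\\})*");

    // Returns the digits of every {n} parameter, in order.
    public static List<String> parameters(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PARAM.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    // Counts how many matches of the starred pattern are empty strings.
    public static int countEmptyMatches(String input) {
        int empty = 0;
        Matcher m = STARRED.matcher(input);
        while (m.find()) {
            if (m.group().isEmpty()) {
                empty++;
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        System.out.println(parameters("sometext {10} some text {25} sometext")); // prints [10, 25]
        System.out.println(countEmptyMatches("{10}ABC{2}"));
    }
}
```

For an empty match of the starred pattern, group(1) is null, which is why the original loop would also print null lines once the braces are escaped.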

regex - matching between a literal string and a quotation mark

I'm terrible at Regex and would greatly appreciate any help with this issue, which I think will be newb stuff for anyone familiar.
I'm getting a response like this from a REST call
{"responseData":{"translatedText":"Ciao mondo"},"responseDetails":"","responseStatus":200,"matches":[{"id":"424913311","segment":"Hello World","translation":"Ciao mondo","quality":"74","reference":"","usage-count":50,"subject":"All","created-by":"","last-updated-by":null,"create-date":"2011-12-29 19:14:22","last-update-date":"2011-12-29 19:14:22","match":1},{"id":"0","segment":"Hello World","translation":"Ciao a tutti","quality":"70","reference":"Machine Translation provided by Google, Microsoft, Worldlingo or the MyMemory customized engine.","usage-count":1,"subject":"All","created-by":"MT!","last-updated-by":null,"create-date":"2012-05-14","last-update-date":"2012-05-14","match":0.85}]}
All I need is the 'Ciao mondo' in between those quotations. I was hoping with Java's Split feature I could do this but unfortunately it doesn't allow two separate delimiters as then I could have specified the text before the translation.
To simplify, what I'm stuck with is the regex to gather whatever is inbetween translatedText":" and the next "
I'd be very grateful for any help
You can use \"translatedText\":\"([^\"]*)\" expression to capture the match.
The expression meaning is as follows: find quoted translatedText followed by a colon and an opening quote. Then match every character before the following quote, and capture the result in a capturing group.
String s = " {\"responseData\":{\"translatedText\":\"Ciao mondo\"},\"responseDetails\":\"\",\"responseStatus\":200,\"matches\":[{\"id\":\"424913311\",\"segment\":\"Hello World\",\"translation\":\"Ciao mondo\",\"quality\":\"74\",\"reference\":\"\",\"usage-count\":50,\"subject\":\"All\",\"created-by\":\"\",\"last-updated-by\":null,\"create-date\":\"2011-12-29 19:14:22\",\"last-update-date\":\"2011-12-29 19:14:22\",\"match\":1},{\"id\":\"0\",\"segment\":\"Hello World\",\"translation\":\"Ciao a tutti\",\"quality\":\"70\",\"reference\":\"Machine Translation provided by Google, Microsoft, Worldlingo or the MyMemory customized engine.\",\"usage-count\":1,\"subject\":\"All\",\"created-by\":\"MT!\",\"last-updated-by\":null,\"create-date\":\"2012-05-14\",\"last-update-date\":\"2012-05-14\",\"match\":0.85}]}";
System.out.println(s);
Pattern p = Pattern.compile("\"translatedText\":\"([^\"]*)\"");
Matcher m = p.matcher(s);
if (!m.find()) return;
System.out.println(m.group(1));
This fragment prints Ciao mondo.
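One limitation of [^\"]* is that it stops at the first quote, even an escaped one inside the value. The sketch below is a possible variation, not the answer's own pattern: it also accepts backslash-escaped characters inside the value. The class and method names are hypothetical, and the captured text is returned raw, with its backslashes still in place.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TranslatedText {
    // Inside the quotes, accept either a character that is neither a quote
    // nor a backslash, or a backslash followed by any single character.
    private static final Pattern FIELD =
            Pattern.compile("\"translatedText\":\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Returns the raw (still escaped) value of the first translatedText field.
    public static Optional<String> extract(String json) {
        Matcher m = FIELD.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String json = "{\"responseData\":{\"translatedText\":\"Ciao mondo\"},\"responseDetails\":\"\"}";
        System.out.println(extract(json).orElse("not found")); // prints Ciao mondo
    }
}
```

For anything beyond a single field, a JSON parser is likely the more robust choice, since it also decodes escapes and tolerates whitespace around the colon.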
Use look-ahead and look-behind to gather strings inside quotation marks:
(?<=[,.{}:]\").*?(?=\")
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Test
{
public static void main(String[] args)
{
Scanner scanner = new Scanner(System.in);
String in = scanner.nextLine();
Matcher matcher = Pattern.compile("(?<=[,.{}:]\\\").*?(?=\\\")").matcher(in);
while(matcher.find())
System.out.println(matcher.group());
}
}
Try this regular expression -
^.*translatedText":"([^"]*)"},"responseDetails".*$
The matching group will contain the text Ciao mondo.
This assumes that translatedText and responseDetails will always occur in the positions specified in your sample.

java regular expression

Can anyone please help me do the following in a java regular expression?
I need to read 3 characters from the 5th position from a given String ignoring whatever is found before and after.
Example : testXXXtest
Expected result : XXX
You don't need regex at all.
Just use substring: yourString.substring(4,7)
Since you do need to use regex, you can do it like this:
Pattern pattern = Pattern.compile(".{4}(.{3}).*");
Matcher matcher = pattern.matcher("testXXXtest");
matcher.matches();
String whatYouNeed = matcher.group(1);
What does it mean, step by step:
.{4} - any four characters
( - start capturing group, i.e. what you need
.{3} - any three characters
) - end capturing group, you got it now
.* followed by 0 or more arbitrary characters.
matcher.group(1) - get the 1st (only) capturing group.
You should be able to use the substring() method to accomplish this:
String example = "testXXXtest";
String result = example.substring(4,7);
This might help: Groups and capturing in java.util.regex.Pattern.
Here is an example:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Example {
public static void main(String[] args) {
String text = "This is a testWithSomeDataInBetweentest.";
Pattern p = Pattern.compile("test([A-Za-z0-9]*)test");
Matcher m = p.matcher(text);
if (m.find()) {
System.out.println("Matched: " + m.group(1));
} else {
System.out.println("No match.");
}
}
}
This prints:
Matched: WithSomeDataInBetween
If you want to match the entire input string against the pattern (rather than seek a substring that matches), you can use matches() instead of find(). You can continue searching for more matching substrings with subsequent calls to find().
Also, your question did not specify the admissible characters and length of the string between the two "test" strings. I assumed any length is OK, including zero, and that we seek a substring composed of lowercase and uppercase letters as well as digits.
You can use substring for this, you don't need a regex.
yourString.substring(4,7);
I'm sure you could use a regex too, but why would you if you don't need it? Of course, you should protect this code against null and strings that are too short.
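The protection mentioned above could look roughly like this. It is a sketch under my own assumptions: the helper name slice and the use of Optional for the "cannot slice" case are illustrative choices, not something the answer prescribes.

```java
import java.util.Optional;

public class SafeSlice {
    // Returns `length` characters starting at index `start`, or empty when the
    // input is null, the arguments are negative, or the string is too short.
    public static Optional<String> slice(String s, int start, int length) {
        if (s == null || start < 0 || length < 0 || s.length() - start < length) {
            return Optional.empty();
        }
        return Optional.of(s.substring(start, start + length));
    }

    public static void main(String[] args) {
        System.out.println(slice("testXXXtest", 4, 3).orElse("too short")); // prints XXX
        System.out.println(slice("test", 4, 3).orElse("too short"));        // prints too short
    }
}
```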
Use the String.replaceAll() method
If performance is not critical, you can try the String.replaceAll() method for a more compact option:
String sDataLine = "testXXXtest";
String sWhatYouNeed = sDataLine.replaceAll( ".{4}(.{3}).*", "$1" );
References
https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html
http://www.vogella.com/tutorials/JavaRegularExpressions/article.html#using-regular-expressions-with-string-methods
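One caveat with the replaceAll approach is worth a quick sketch: when the pattern does not match, replaceAll returns the input unchanged instead of failing, so a string that is too short comes back as-is. The class and method names below are hypothetical.

```java
public class ReplaceAllSlice {
    // Replaces the whole match with capturing group 1 (characters 5 to 7).
    // If the input has fewer than 7 characters, nothing matches and the
    // input is returned unchanged.
    public static String extract(String dataLine) {
        return dataLine.replaceAll(".{4}(.{3}).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(extract("testXXXtest")); // prints XXX
        System.out.println(extract("short"));       // prints short
    }
}
```

A caller that needs to tell these cases apart may be better served by a Matcher, where matches() reports the failure explicitly.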

How do I make a regex match for measurement units?

I'm building a small Java library which has to match units in strings. For example, if I have "300000000 m/s^2", I want it to match against "m" and "s^2".
So far, I have tried most imaginable (by me) configurations resembling (I hope it's a good start)
"[[a-zA-Z]+[\\^[\\-]?[0-9]+]?]+"
To clarify, I need something that will match letters[^[-]numbers] (where [ ] denotes non obligatory parts). That means: letters, possibly followed by an exponent which is possibly negative.
I have studied regex a little bit, but I'm really not fluent, so any help will be greatly appreciated!
Thank you very much,
EDIT:
I have just tried the first 3 replies
String regex1 = "([a-zA-Z]+)(?:\\^(-?\\d+))?";
String regex2 = "[a-zA-Z]+(\\^-?[0-9]+)?";
String regex3 = "[a-zA-Z]+(?:\\^-?[0-9]+)?";
and it doesn't work... I know the code that tests the patterns works, because if I try something simple, like matching "[0-9]+" in "12345", it will match the whole string. So I don't get what's still wrong. I'm currently trying to change my brackets to parentheses where needed...
CODE USED TO TEST:
public static void main(String[] args) {
String input = "30000 m/s^2";
// String input = "35345";
String regex1 = "([a-zA-Z]+)(?:\\^(-?\\d+))?";
String regex2 = "[a-zA-Z]+(\\^-?[0-9]+)?";
String regex3 = "[a-zA-Z]+(?:\\^-?[0-9]+)?";
String regex10 = "[0-9]+";
String regex = "([a-zA-Z]+)(?:\\^\\-?[0-9]+)?";
Pattern pattern = Pattern.compile(regex3);
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
System.out.println("MATCHES");
do {
int start = matcher.start();
int end = matcher.end();
// System.out.println(start + " " + end);
System.out.println(input.substring(start, end));
} while (matcher.find());
}
}
([a-zA-Z]+)(?:\^(-?\d+))?
You don't need to use the character class [...] if you're matching a single character. (...) here is a capturing bracket for you to extract the unit and exponent later. (?:...) is non-capturing grouping.
You're using square brackets both to denote character classes and to group; grouping needs parentheses. Try this instead:
[a-zA-Z]+(\^-?[0-9]+)?
In many regular expression dialects you can use \d to mean any digit instead of [0-9].
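A likely reason the test code in the question prints nothing with any of these patterns: Matcher.matches() succeeds only when the entire input matches, and "30000 m/s^2" as a whole is not a unit. A sketch that searches with find() instead, using the \d form of the pattern; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnitFinder {
    // Letters, optionally followed by ^ and a possibly negative integer exponent.
    // Group 1 is the unit symbol, group 2 the exponent (null when absent).
    private static final Pattern UNIT = Pattern.compile("([a-zA-Z]+)(?:\\^(-?\\d+))?");

    // Returns every unit found in the input, including its exponent if present.
    public static List<String> findUnits(String input) {
        List<String> units = new ArrayList<>();
        Matcher m = UNIT.matcher(input);
        while (m.find()) { // find() scans for substrings; matches() would need the whole input
            units.add(m.group());
        }
        return units;
    }

    public static void main(String[] args) {
        System.out.println(findUnits("30000 m/s^2")); // prints [m, s^2]
    }
}
```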
Try
"[a-zA-Z]+(?:\\^-?[0-9]+)?"
