How do I get every word that contains _I? I'm using "\S*_I+\S*".
I have this string:
the_B-NP camera_I-NP is_B-VP very_B-ADJP easy_I-ADJP to_B-VP use_I-VP ,_O in_B-PP fact_B-NP on_B-PP a_B-NP recent_I-NP trip_I-NP this_B-NP past_I-NP week_I-NP i_I-NP was_B-VP asked_I-VP to_B-VP take_I-VP a_B-NP picture_I-NP of_B-PP a_B-NP vacationing_I-NP elderly_I-NP group_I-NP ._O
This is my code:
Pattern p = Pattern.compile("\\S*_I+\\S*");
Matcher m = p.matcher(input);
while(m.find()){
hasilReg = m.group();
}
After running it, I get only one value: group_I-NP
But I want every word that contains _I.
Thanks.
group_I-NP is the last match, and you only see it because you overwrite hasilReg on every iteration of the loop. Add each match to a List<String> instead:
String str = "the_B-NP camera_I-NP is_B-VP very_B-ADJP easy_I-ADJP to_B-VP use_I-VP ,_O in_B-PP fact_B-NP on_B-PP a_B-NP recent_I-NP trip_I-NP this_B-NP past_I-NP week_I-NP i_I-NP was_B-VP asked_I-VP to_B-VP take_I-VP a_B-NP picture_I-NP of_B-PP a_B-NP vacationing_I-NP elderly_I-NP group_I-NP ._O ";
Pattern ptrn = Pattern.compile("\\S*_I+\\S*");
Matcher matcher = ptrn.matcher(str);
List<String> lst = new ArrayList<>();
while (matcher.find()) {
lst.add(matcher.group());
}
System.out.println(lst);
// => [camera_I-NP, easy_I-ADJP, use_I-VP, recent_I-NP, trip_I-NP, past_I-NP, week_I-NP, i_I-NP, asked_I-VP, take_I-VP, picture_I-NP, vacationing_I-NP, elderly_I-NP, group_I-NP]
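Here is a self-contained sketch of the same fix, with a second variant that uses Matcher.results() (available since Java 9) to stream every match instead of writing the loop by hand. The class and method names are made up for illustration, and the input is a shortened version of the question's string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class AllMatches {
    // The pattern from the question: a run of non-whitespace containing "_I".
    static final Pattern WORD_WITH_I = Pattern.compile("\\S*_I+\\S*");

    // Classic loop: each find() advances to the next match; keep every one.
    static List<String> withLoop(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD_WITH_I.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Java 9+: Matcher.results() yields the same matches as MatchResult objects.
    static List<String> withResults(String input) {
        return WORD_WITH_I.matcher(input).results()
                .map(r -> r.group())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "the_B-NP camera_I-NP is_B-VP easy_I-ADJP ,_O";
        System.out.println(withLoop(s));    // [camera_I-NP, easy_I-ADJP]
        System.out.println(withResults(s)); // [camera_I-NP, easy_I-ADJP]
    }
}
```

Both variants return the matches in order; the stream form only saves the explicit list handling.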
Related
I have an input string:
invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend
I want to fetch only the subdata parts of it. I tried:
Pattern p = Pattern.compile("(?<=sufixpart).*?(subdata.)+.*?(?=end)", Pattern.DOTALL);
Matcher m = p.matcher(inputString);
while(m.find()){
System.out.println(m.group(1));
}
But I get only the first match. How can I get all of the subdata values, such as [subdata1, subdata2, subdatan]?
I'd go for a simpler approach: first get the blocks with a regex like start(.*?)end, then extract all matches from Group 1 with a simple subdata\S*-like regex.
See the Java demo:
String rx = "(?sm)^sufixpart$(.*?)^end$";
String s = "invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend\ninvalidsufix\nsubadatax\nsufixpart\nsubdata001\nsomerandomn\nsubdata002\nsubdata00n\nend";
Pattern pattern_outer = Pattern.compile(rx);
Pattern pattern_token = Pattern.compile("(?m)^subdata\\S*$");
Matcher matcher = pattern_outer.matcher(s);
List<List<String>> res = new ArrayList<>();
while (matcher.find()){
List<String> lst = new ArrayList<>();
if (!matcher.group(1).isEmpty()) { // If Group 1 is not empty
Matcher m = pattern_token.matcher(matcher.group(1)); // Init the second matcher
while (m.find()) { // If a token is found
lst.add(m.group(0)); // add it to the list
}
}
res.add(lst); // Add the list to the result list
}
System.out.println(res); // => [[subdata1, subdata2, subdatan], [subdata001, subdata002, subdata00n]]
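This is also why a single pattern like (subdata.)+ cannot hand back a list: one match reports at most one value per capturing group, and a quantified group remembers only its last repetition. A small illustration with a made-up input (the class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastRepetition {
    // Returns {whole match, group 1} for the first match, or null if none.
    static String[] firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] r = firstMatch("(subdata\\d)+", "subdata1subdata2subdata3");
        // The whole match spans all three repetitions...
        System.out.println(r[0]); // subdata1subdata2subdata3
        // ...but group 1 only keeps the last one.
        System.out.println(r[1]); // subdata3
    }
}
```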
Another approach is to use a \G-based regex:
(?sm)(?:\G(?!\A)|^sufixpart$)(?:(?!^(?:sufixpart|end)$).)*?(subdata\S*)(?=.*?^end$)
Explanation:
(?sm) - enables DOTALL and MULTILINE modes
(?:\G(?!\A)|^sufixpart$) - matches either the end of the previous successful match (\G(?!\A)) or a whole line with sufixpart text on it (^sufixpart$)
(?:(?!^(?:sufixpart|end)$).)*? - matches any char, zero or more times but as few as possible, as long as that char does not start a whole line consisting of sufixpart or end (so a match never runs past its own block)
(subdata\S*) - Group 1: subdata followed by zero or more non-whitespace chars
(?=.*?^end$) - there must be an end line somewhere after the current position.
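Java demo (here the first alternation is a capturing group, so Group 1 is non-empty only at the start of a new block, and the subdata token moves to Group 2):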
String rx = "(?sm)(\\G(?!\\A)|^sufixpart$)(?:(?!^(?:sufixpart|end)$).)*?(subdata\\S*)(?=.*?^end$)";
String s = "invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend\ninvalidsufix\nsubadatax\nsufixpart\nsubdata001\nsomerandomn\nsubdata002\nsubdata00n\nend";
Pattern pattern = Pattern.compile(rx);
Matcher matcher = pattern.matcher(s);
List<List<String>> res = new ArrayList<>();
List<String> lst = null;
while (matcher.find()){
if (!matcher.group(1).isEmpty()) {
if (lst != null) res.add(lst);
lst = new ArrayList<>();
lst.add(matcher.group(2));
} else lst.add(matcher.group(2));
}
if (lst != null) res.add(lst);
System.out.println(res);
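To see what \G does on its own: it anchors a match to the position where the previous match ended (or to the start of the input for the first find()), so matches must form an unbroken chain. A minimal sketch on a made-up string; findAll is a hypothetical helper:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GAnchorDemo {
    // Collects every match of regex in input using repeated find() calls.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Without \G, find() skips ahead and picks up every digit.
        System.out.println(findAll("\\d", "123a45"));    // [1, 2, 3, 4, 5]
        // With \G, each match must start where the previous one ended,
        // so the chain breaks at the first non-digit.
        System.out.println(findAll("\\G\\d", "123a45")); // [1, 2, 3]
    }
}
```

In the pattern above, (?!\A) stops \G from matching at the very start of the input, so the first match of each chain has to come from the ^sufixpart$ alternative.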
I am trying to extract "d320" from the string below with a Java regex, using this code:
n-us; micromax d320 build/kot49h)
String m = "n-us; micromax d320 build/kot49h) ";
String pattern = "micromax (.*)(\\d\\D)(.*) ";
Pattern r = Pattern.compile(pattern);
Matcher m1 = r.matcher(m);
if (m1.find()) {
System.out.println(m1.group(1));
}
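but the output is "d320 build/kot4", and I want only d320.
The greedy (.*) in your pattern grabs as much as it can before backtracking. Use a lazy (.*?) between two whitespace chars instead, micromax\\s(.*?)\\s, like this: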
String m = "n-us; micromax d320 build/kot49h) ";
String pattern = "micromax\\s(.*?)\\s";
Pattern r = Pattern.compile(pattern);
Matcher m1 = r.matcher(m);
if (m1.find()) {
System.out.println(m1.group(1));
}
Output:
d320
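A side-by-side sketch of greedy vs. lazy on the same input, plus a third option, \S+, which cannot cross whitespace and so needs no lazy quantifier. The helper group1 is hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Returns group 1 of the first match, or null if nothing matches.
    static String group1(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "n-us; micromax d320 build/kot49h) ";
        // Greedy: .* runs to the end, then backs off only to the LAST whitespace.
        System.out.println(group1("micromax\\s(.*)\\s", s));  // d320 build/kot49h)
        // Lazy: .*? stops at the FIRST whitespace.
        System.out.println(group1("micromax\\s(.*?)\\s", s)); // d320
        // \S+ stops at whitespace by construction.
        System.out.println(group1("micromax\\s+(\\S+)", s));  // d320
    }
}
```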
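It's not clear whether you want the word after "micromax" or a word that starts with a letter followed only by digits, so here are both solutions:
To extract the word following "micromax":
String code = m.replaceAll("^(?:.*micromax\\s+(\\w+))?.*$", "$1");
To extract the word that looks like "x9999":
String code = m.replaceAll("^(?:.*?\\b([a-z]\\d+)\\b)?.*$", "$1");
Both snippets return an empty string if there's no match: the prefix is optional, so the pattern always consumes the whole (single-line) input, and $1 is empty when its group did not take part. Note that a word boundary needs "\\b" in a Java string literal; "\b" is a backspace character.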
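A runnable check of the replaceAll idea on one matching and one non-matching input. It uses the anchored form ^(?:...)?.*$ so the whole input is consumed in a single match; the second sample string and the method names are made up for illustration.

```java
class ReplaceAllExtract {
    // Word after "micromax", or "" if there is none.
    static String afterMicromax(String s) {
        return s.replaceAll("^(?:.*micromax\\s+(\\w+))?.*$", "$1");
    }

    // First word shaped like a letter followed by digits (e.g. d320), or "".
    static String letterDigits(String s) {
        return s.replaceAll("^(?:.*?\\b([a-z]\\d+)\\b)?.*$", "$1");
    }

    public static void main(String[] args) {
        String hit = "n-us; micromax d320 build/kot49h) ";
        String miss = "en-gb; some other phone";
        System.out.println(afterMicromax(hit));              // d320
        System.out.println(letterDigits(hit));               // d320
        System.out.println("[" + afterMicromax(miss) + "]"); // []
        System.out.println("[" + letterDigits(miss) + "]");  // []
    }
}
```

Because the optional group is tried first, it captures the word whenever one exists and is skipped (leaving $1 empty) only when none does. For multi-line input, a Matcher with find() is the safer choice, since . does not cross line breaks here.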
I have a string like
{Action}{RequestId}{Custom_21_addtion}{custom_22_substration}{Imapact}{assest}{custom_23_multiplication}
From this, I want only the substrings that contain "custom".
For example, from the string above I want only
{Custom_21_addtion}{custom_22_substration}{custom_23_multiplication}
How can I get this?
You can use a regular expression that matches from {custom up to the next }:
Pattern pattern = Pattern.compile("\\{custom.*?\\}", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(inputString);
while (matcher.find()) {
System.out.print(matcher.group());
}
The .*? after custom matches zero or more characters after the word "custom"; the question mark makes the quantifier lazy, so it takes as few characters as possible and the match stops at the first } it finds. CASE_INSENSITIVE lets it match both Custom and custom.
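A quick sketch of what the question mark changes on this input, plus a negated character class, [^}]*, which cannot cross a } and so gives the same result without relying on laziness. The helper name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CustomTags {
    // Collects every case-insensitive match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "{Action}{RequestId}{Custom_21_addtion}{custom_22_substration}"
                + "{Imapact}{assest}{custom_23_multiplication}";
        // Greedy .* runs to the LAST "}", swallowing the tags in between: one match.
        System.out.println(findAll("\\{custom.*\\}", s));
        // Lazy .*? stops at the first "}": three separate tags.
        System.out.println(findAll("\\{custom.*?\\}", s));
        // [^}]* cannot cross a "}", so it yields the same three tags.
        System.out.println(findAll("\\{custom[^}]*\\}", s));
    }
}
```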
If you'd rather avoid regex, here is an alternative:
String a = "{Action}{RequestId}{Custom_21_addtion}{custom_22_substration}{Imapact}{assest}{custom_23_multiplication}";
String[] b = a.split("}");
StringBuilder result = new StringBuilder();
for(String c : b) {
// if you want case sensitivity, drop the toLowerCase()
if(c.toLowerCase().contains("custom"))
result.append(c).append("}");
}
System.out.println(result.toString());
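You can do something like this (each token keeps its closing } but loses its opening {):
StringTokenizer st = new StringTokenizer(yourString, "{");
List<String> llista = new ArrayList<String>();
// "custom" at the start of a token or after a non-word char, in any case.
// No boundary after it: in "Custom_21_addtion" the next char "_" is a word char.
Pattern pattern = Pattern.compile("(\\W|^)custom", Pattern.CASE_INSENSITIVE);
while (st.hasMoreTokens()) {
String string = st.nextToken(); // nextElement() returns Object; nextToken() returns String
Matcher matcher = pattern.matcher(string);
if (matcher.find()) {
llista.add(string);
}
}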
Another solution:
String inputString = "{Action}{RequestId}{Custom}{Custom_21_addtion}{custom_22_substration}{Imapact}{assest}";
String[] strTokens = inputString.split("\\}");
Pattern pattern = Pattern.compile("custom", Pattern.CASE_INSENSITIVE); // compile once, outside the loop
for (String str : strTokens) {
Matcher matcher = pattern.matcher(str); // test each token, not the whole input
if (matcher.find()) {
System.out.println("Tag Name:" + str.replace("{", ""));
}
}
I need to replace each {word} with a regex named group (?<word>\w+) so that I can later match expressions such as /{name}/{age}. This code doesn't work:
String p = "/{name}/{id}";
p = p.replaceAll("\\{(\\w+)\\}", "(?<$1>\\\\\\\\w+)");
Pattern URL_PATTERN = Pattern.compile(p);
CharSequence cs = "/lucas/3";
Matcher m = URL_PATTERN.matcher(cs);
if(m.matches()){
for(int i=1;i<m.groupCount();++i){
System.out.println(m.group("name"));
}
}
Result: nothing :(
But when I take the result of the replacement, /(?<name>\w+)/(?<id>\w+), and put it into Pattern.compile(), it works:
String p = "/{name}/{id}";
p = p.replaceAll("\\{(\\w+)\\}", "(?<$1>\\\\\\\\w+)");
Pattern URL_PATTERN = Pattern.compile("/(?<name>\\w+)/(?<id>\\w+)");
System.out.println(p);
CharSequence cs = "/lucas/3";
Matcher m = URL_PATTERN.matcher(cs);
if(m.matches()){
for(int i=1;i<m.groupCount();++i){
System.out.println(m.group("name"));
}
}
Result: "lucas"
What's wrong?
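You used too many backslashes in the replacement. In a replaceAll replacement string, a backslash escapes the next character, so "\\\\\\\\w+" (four real backslashes) becomes \\w+ in the pattern, which matches a literal backslash followed by w characters. You want the result to contain exactly one backslash, which takes four backslashes in the Java literal. Try:
p = p.replaceAll("\\{(\\w+)\\}", "(?<$1>\\\\w+)");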
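If counting backslashes gets confusing, Matcher.quoteReplacement can escape the literal part of the replacement for you while $1 stays a live group reference. A sketch of the whole round trip; the class and method names are made up:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RouteTemplate {
    // Turns "/{name}/{id}" into the regex "/(?<name>\w+)/(?<id>\w+)".
    static String toRegex(String template) {
        // quoteReplacement escapes the backslash so replaceAll emits a literal "\w+";
        // "$1" stays outside the quoted part, so it is still a group reference.
        return template.replaceAll("\\{(\\w+)\\}",
                "(?<$1>" + Matcher.quoteReplacement("\\w+") + ")");
    }

    public static void main(String[] args) {
        String regex = toRegex("/{name}/{id}");
        System.out.println(regex); // /(?<name>\w+)/(?<id>\w+)
        Matcher m = Pattern.compile(regex).matcher("/lucas/3");
        if (m.matches()) {
            System.out.println(m.group("name")); // lucas
            System.out.println(m.group("id"));   // 3
        }
    }
}
```

Note that named groups must start with a letter and contain only letters and digits, so a placeholder like {user_id} would not produce a valid group name.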
How would I parse a file like this:
Item costs $15 and is made up of --Metal--
Item costs $64 and is made up of --Plastic--
I can do:
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
if (m.find()) { // find() must succeed before group() can be called
String result = m.group();
}
But how would I get EVERY result?
Call find() in a loop; each call moves to the next match, so add every group() to a list:
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
List<String> matches = new ArrayList<String>();
while(m.find()){
matches.add(m.group());
}
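The question leaves regex unspecified, so here is one hypothetical pattern for the sample lines, capturing the price and the material in separate groups. It assumes the file contents are already in a String (for example via Files.readString on Java 11+); the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ItemParser {
    // Hypothetical pattern for lines like "Item costs $15 and is made up of --Metal--":
    // group 1 = price digits, group 2 = material between the double dashes.
    static final Pattern ITEM =
            Pattern.compile("Item costs \\$(\\d+) and is made up of --(\\w+)--");

    static List<String> parse(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        while (m.find()) { // one iteration per matching line
            out.add(m.group(1) + ":" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "Item costs $15 and is made up of --Metal--\n"
                + "Item costs $64 and is made up of --Plastic--\n";
        System.out.println(parse(input)); // [15:Metal, 64:Plastic]
    }
}
```

The $ is escaped as \\$ because an unescaped $ is an end-of-line anchor in a regex.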