How would I parse a file like this:
Item costs $15 and is made up of --Metal--
Item costs $64 and is made up of --Plastic--
I can do
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
String result = m.group();
But how would I get EVERY result?
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(input);
List<String> matches = new ArrayList<String>();
while(m.find()){
matches.add(m.group());
}
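As a self-contained sketch of the loop above applied to the two sample lines: the regex `\$(\d+).*?--(\w+)--` and the names `ItemParser` and `parse` are assumptions made for illustration, since the question does not show its actual pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ItemParser {
    // Hypothetical pattern: group 1 = price digits, group 2 = material name.
    private static final Pattern ITEM = Pattern.compile("\\$(\\d+).*?--(\\w+)--");

    // Returns one "price:material" string per match found in the input.
    static List<String> parse(String input) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        while (m.find()) { // each call advances to the next match
            items.add(m.group(1) + ":" + m.group(2));
        }
        return items;
    }

    public static void main(String[] args) {
        String input = "Item costs $15 and is made up of --Metal--\n"
                     + "Item costs $64 and is made up of --Plastic--";
        System.out.println(parse(input)); // prints [15:Metal, 64:Plastic]
    }
}
```

Note that group() is only valid after a successful find() or matches(); calling it on a fresh Matcher throws an IllegalStateException.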
How do I get every word that contains _I? I'm using "\S_I+\S".
I have this string:
the_B-NP camera_I-NP is_B-VP very_B-ADJP easy_I-ADJP to_B-VP use_I-VP ,_O in_B-PP fact_B-NP on_B-PP a_B-NP recent_I-NP trip_I-NP this_B-NP past_I-NP week_I-NP i_I-NP was_B-VP asked_I-VP to_B-VP take_I-VP a_B-NP picture_I-NP of_B-PP a_B-NP vacationing_I-NP elderly_I-NP group_I-NP ._O
This is my code:
Pattern p = Pattern.compile("\\S*_I+\\S*");
Matcher m = p.matcher(input);
while(m.find()){
hasilReg = m.group();
}
After running it I get only one value: group_I-NP,
but I would like every word that contains _I.
Thanks.
group_I-NP is the last match, and it is the only one you see because hasilReg is overwritten on every iteration. Add the results to a List<String> instead:
String str = "the_B-NP camera_I-NP is_B-VP very_B-ADJP easy_I-ADJP to_B-VP use_I-VP ,_O in_B-PP fact_B-NP on_B-PP a_B-NP recent_I-NP trip_I-NP this_B-NP past_I-NP week_I-NP i_I-NP was_B-VP asked_I-VP to_B-VP take_I-VP a_B-NP picture_I-NP of_B-PP a_B-NP vacationing_I-NP elderly_I-NP group_I-NP ._O ";
Pattern ptrn = Pattern.compile("\\S*_I+\\S*");
Matcher matcher = ptrn.matcher(str);
List<String> lst = new ArrayList<>();
while (matcher.find()) {
lst.add(matcher.group());
}
System.out.println(lst);
// => [camera_I-NP, easy_I-ADJP, use_I-VP, recent_I-NP, trip_I-NP, past_I-NP, week_I-NP, i_I-NP, asked_I-VP, take_I-VP, picture_I-NP, vacationing_I-NP, elderly_I-NP, group_I-NP]
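On Java 9 or later, the same collection step can be written as a stream via Matcher.results(). This is only a sketch of an alternative, not part of the original answer; the names `TaggedWords` and `wordsWithITag` are made up for the example.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TaggedWords {
    // Same pattern as above: any run of non-whitespace containing "_I".
    private static final Pattern I_TAG = Pattern.compile("\\S*_I+\\S*");

    // Collects every match into a list (requires Java 9+ for Matcher.results()).
    static List<String> wordsWithITag(String text) {
        return I_TAG.matcher(text)
                .results()                 // Stream<MatchResult>, one per match
                .map(MatchResult::group)   // the full matched text
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String str = "the_B-NP camera_I-NP is_B-VP very_B-ADJP easy_I-ADJP";
        System.out.println(wordsWithITag(str)); // prints [camera_I-NP, easy_I-ADJP]
    }
}
```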
I have an input string
invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend
I want to fetch only the subdata parts of it. I tried:
Pattern p = Pattern.compile("(?<=sufixpart).*?(subdata.)+.*?(?=end)", Pattern.DOTALL);
Matcher m = p.matcher(inputString);
while(m.find()){
System.out.println(m.group(1));
}
But I get only the first match. How can I get all the subdata, such as [subdata1,subdata2,subdata3]?
I'd go for a simpler approach, get the blocks first with a regex like start(.*?)end and then extract all the matches from Group 1 with a mere subdata\S*-like regex.
See the Java demo:
String rx = "(?sm)^sufixpart$(.*?)^end$";
String s = "invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend\ninvalidsufix\nsubadatax\nsufixpart\nsubdata001\nsomerandomn\nsubdata002\nsubdata00n\nend";
Pattern pattern_outer = Pattern.compile(rx);
Pattern pattern_token = Pattern.compile("(?m)^subdata\\S*$");
Matcher matcher = pattern_outer.matcher(s);
List<List<String>> res = new ArrayList<>();
while (matcher.find()){
List<String> lst = new ArrayList<>();
if (!matcher.group(1).isEmpty()) { // If Group 1 is not empty
Matcher m = pattern_token.matcher(matcher.group(1)); // Init the second matcher
while (m.find()) { // If a token is found
lst.add(m.group(0)); // add it to the list
}
}
res.add(lst); // Add the list to the result list
}
System.out.println(res); // => [[subdata1, subdata2, subdatan], [subdata001, subdata002, subdata00n]]
Another approach is to use a \G based regex:
(?sm)(?:\G(?!\A)|^sufixpart$)(?:(?!^(?:sufixpart|end)$).)*?(subdata\S*)(?=.*?^end$)
Explanation:
(?sm) - enables DOTALL and MULTILINE modes
(?:\G(?!\A)|^sufixpart$) - matches either the end of the previous successful match (\G(?!\A)) or a whole line with sufixpart text on it (^sufixpart$)
(?:(?!^(?:sufixpart|end)$).)*? - matches any char, 0 or more times and as few as possible, as long as it is not the start of a whole-line sufixpart or end
(subdata\S*) - Group 1 matching subdata and 0+ non-whitespaces
(?=.*?^end$) - there must be an end line after any 0+ chars.
Java demo:
String rx = "(?sm)(\\G(?!\\A)|^sufixpart$)(?:(?!^(?:sufixpart|end)$).)*?(subdata\\S*)(?=.*?^end$)";
String s = "invalidsufix\nsubadatax\nsufixpart\nsubdata1\nsomerandomn\nsubdata2\nsubdatan\nend\ninvalidsufix\nsubadatax\nsufixpart\nsubdata001\nsomerandomn\nsubdata002\nsubdata00n\nend";
Pattern pattern = Pattern.compile(rx);
Matcher matcher = pattern.matcher(s);
List<List<String>> res = new ArrayList<>();
List<String> lst = null;
while (matcher.find()){
if (!matcher.group(1).isEmpty()) {
if (lst != null) res.add(lst);
lst = new ArrayList<>();
lst.add(matcher.group(2));
} else lst.add(matcher.group(2));
}
if (lst != null) res.add(lst);
System.out.println(res);
I have a string like
{Action}{RequestId}{Custom_21_addtion}{custom_22_substration}
{Imapact}{assest}{custom_23_multiplication}.
From this I want only the substrings that contain "custom".
For example, from the string above I want only
{Custom_21_addtion}{custom_22_substration}{custom_23_multiplication}.
How can I get this?
You can use a regular expression, looking from {custom to }. It will look like this:
Pattern pattern = Pattern.compile("\\{custom.*?\\}", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(inputString);
while (matcher.find()) {
System.out.print(matcher.group());
}
The .* after custom means 0 or more characters after the word "custom", and the question mark makes the quantifier lazy, so it takes as few characters as possible and stops at the next } it can find.
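To make the effect of the question mark concrete, here is a small sketch comparing the greedy and lazy forms on a shortened input. The class and method names are invented for the example, and the third pattern (a negated character class) is an alternative that the answer above does not mention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyVsGreedy {
    // Returns all case-insensitive matches of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String in = "{Action}{Custom_21_addtion}{Imapact}{custom_22_substration}";
        // Greedy: .* runs to the LAST '}', swallowing {Imapact} into one big match.
        System.out.println(findAll("\\{custom.*\\}", in));
        // Lazy: .*? stops at the first '}' after each "custom".
        System.out.println(findAll("\\{custom.*?\\}", in));
        // Negated class: [^}]* cannot cross a '}', giving the same result as lazy.
        System.out.println(findAll("\\{custom[^}]*\\}", in));
    }
}
```

The lazy and negated-class patterns both yield the two custom tags separately, while the greedy one returns a single match spanning from the first custom tag to the end of the string.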
If you want an alternative solution without regex:
String a = "{Action}{RequestId}{Custom_21_addtion}{custom_22_substration}{Imapact}{assest}{custom_23_multiplication}";
String[] b = a.split("}");
StringBuilder result = new StringBuilder();
for(String c : b) {
// if you want case sensitivity, drop the toLowerCase()
if(c.toLowerCase().contains("custom"))
result.append(c).append("}");
}
System.out.println(result.toString());
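You can do something like this:
StringTokenizer st = new StringTokenizer(yourString, "{");
List<String> llista = new ArrayList<String>();
// Each token looks like "Custom_21_addtion}". "_" is a word character, so a
// \W boundary after "custom" would reject it; anchor at the token start instead.
Pattern pattern = Pattern.compile("^custom", Pattern.CASE_INSENSITIVE);
while (st.hasMoreTokens()) {
    String string = st.nextToken(); // nextElement() returns Object, not String
    Matcher matcher = pattern.matcher(string);
    if (matcher.find()) {
        llista.add(string);
    }
}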
Another solution:
String inputString = "{Action}{RequestId}{Custom}{Custom_21_addtion}{custom_22_substration}{Imapact}{assest}" ;
String strTokens[] = inputString.split("\\}");
// Compile once, outside the loop
Pattern pattern = Pattern.compile("custom", Pattern.CASE_INSENSITIVE);
for (String str : strTokens) {
    // Test the current token, not the whole input string
    Matcher matcher = pattern.matcher(str);
    if (matcher.find()) {
        System.out.println("Tag Name:" + str.replace("{", ""));
    }
}
I have strings:
#Table(name = "T_MEM_MEMBER_ADDRESS1")
#Table( name = "T_MEM_MEMBER_ADDRESS2")
#Table ( name = "T_MEM_MEMBER_ADDRESS3" )
I want to write a regex that extracts the name value, such as:
T_MEM_MEMBER_ADDRESS1
T_MEM_MEMBER_ADDRESS2
T_MEM_MEMBER_ADDRESS3
I wrote
String regexPattern="...";
Pattern pattern = Pattern.compile(regexPattern);
Matcher matcher = pattern.matcher(input);
boolean matches = matcher.matches();
if (matches){
log.debug(matcher.group(1));
}
but I cannot work out the regexPattern.
You can use this regex:
(?<=")(.+)(?=")
In Java:
String regexPattern="(?<=\")(.+)(?=\")";
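It uses a lookbehind and a lookahead, so the quotes themselves are not part of the match.
Group 1 will contain what you want. Because the pattern does not cover the whole input, call matcher.find() rather than matcher.matches().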
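A minimal sketch of the lookaround pattern used with find() over all three sample lines; the names `TableNames` and `names` are made up for the example. It assumes each line holds at most one quoted value, because the greedy .+ would otherwise run to the last quote on the line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TableNames {
    // Text preceded by a quote and followed by a quote; '.' does not cross newlines.
    private static final Pattern QUOTED = Pattern.compile("(?<=\")(.+)(?=\")");

    // Returns the quoted value from every line of the input.
    static List<String> names(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) { // find() scans for partial matches; matches() would fail here
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "#Table(name = \"T_MEM_MEMBER_ADDRESS1\")\n"
                     + "#Table( name = \"T_MEM_MEMBER_ADDRESS2\")\n"
                     + "#Table ( name = \"T_MEM_MEMBER_ADDRESS3\" )";
        System.out.println(names(input));
    }
}
```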
You can use this piece of code:
String input = "#Table(name = \"T_MEM_MEMBER_ADDRESS1\")";
String regexPattern=".*\"(.*)\".*";
Pattern pattern = Pattern.compile(regexPattern);
Matcher matcher = pattern.matcher(input);
boolean matches = matcher.matches();
if (matches){
System.out.println(matcher.group(1));
}
Hope it helps.
I have a sentence and I want to count the words, semiPunctuation and endPunctuation in it.
m.group() returns the matched String, but how do I know which group matched?
I can check each group for null, but that does not seem like a good approach.
String input = "Some text! Some example text.";
int wordCount=0;
int semiPunctuation=0;
int endPunctuation=0;
Pattern pattern = Pattern.compile( "([\\w]+) | ([,;:\\-\"\']) | ([!\\?\\.]+)" );
Matcher m = pattern.matcher(input);
while (m.find()) {
// need more correct method
if(m.group(1)!=null) wordCount++;
if(m.group(2)!=null) semiPunctuation++;
if(m.group(3)!=null) endPunctuation++;
}
You could use named groups to capture the expressions:
Pattern pattern = Pattern.compile( "(?<words>\\w+)|(?<semi>[,;:\\-\"'])|(?<end>[!?.])" );
Matcher m = pattern.matcher(input);
while (m.find()) {
if (m.group("words") != null) {
wordCount++;
}
...
}
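For reference, a complete runnable sketch of the named-group counter might look like the following. The class name `TokenCounter`, the `count` method, and returning the three counters as an int array are assumptions for illustration and not what the elided part of the answer necessarily contained.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenCounter {
    // Exactly one of the three named groups participates in each match.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<words>\\w+)|(?<semi>[,;:\\-\"'])|(?<end>[!?.])");

    // Returns {wordCount, semiPunctuation, endPunctuation}.
    static int[] count(String input) {
        int words = 0, semi = 0, end = 0;
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group("words") != null) {
                words++;
            } else if (m.group("semi") != null) {
                semi++;
            } else {
                end++; // only the "end" alternative is left
            }
        }
        return new int[] {words, semi, end};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(count("Some text! Some example text.")));
        // prints [5, 0, 2]
    }
}
```

Named groups still have to be checked for null; they mainly make the checks easier to read than numeric indexes.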