Trying to split a string into 3 parts with Regex - java

I need to split up a JSONPath path into 3 parts if it has a separator. A separator would be an indicator of an array.
For example:
$.Colors[*].name
Would need to become:
Before: "$.Colors"
Separator: "[*]"
After: ".name"
If there are multiple separators, for example:
$.Colors[*].Color[*].name
it should split on the first one:
Before: "$.Colors"
Separator: "[*]"
After: ".Color[*].name"
I also want this to work on filters:
$.Colors[?(#.type == 'Primary')].Color[*].name
It would split on that filter value.
Before: "$.Colors"
Separator: "[?(#.type == 'Primary')]"
After: ".Color[*].name"
My attempts have been fruitless thus far:
static private String regexString = "\\[\\*]|\\[\\?\\(.*\\)]";
static private Pattern pattern = Pattern.compile(regexString);
private boolean splittable;
private String pre;
private String post;
private String split;

PathSplitter(String path) {
    Matcher matcher = pattern.matcher(path);
    if (!matcher.find()) {
        splittable = false;
    }
    else {
        splittable = true;
        split = matcher.group(0);
        //pre = matcher.group(1);
        //post = matcher.group(2);
    }
}
Any help would be great!

The following regex produces the matches described in your post:
(.*?)(\[[^\[\]]*\])(.*)
Here,
(.*?) - lazily matches as few characters as possible before the separator and captures them in group 1 (the Before part)
(\[[^\[\]]*\]) - captures the separator in group 2: a literal [, then zero or more characters other than [ and ], then a closing ]
(.*) - captures everything that remains in group 3 (the After part)
Java code:
List<String> list = Arrays.asList("$.Colors[*].name","$.Colors[*].Color[*].name","$.Colors[?(#.type == 'Primary')].Color[*].name");
Pattern p = Pattern.compile("(.*?)(\\[[^\\[\\]]*\\])(.*)");
list.forEach(x -> {
    Matcher m = p.matcher(x);
    if (m.matches()) {
        System.out.println("For string: " + x);
        System.out.println("Before: "+ m.group(1));
        System.out.println("Separator: "+ m.group(2));
        System.out.println("After: "+ m.group(3));
        System.out.println();
    }
});
This prints the output you expected:
For string: $.Colors[*].name
Before: $.Colors
Separator: [*]
After: .name
For string: $.Colors[*].Color[*].name
Before: $.Colors
Separator: [*]
After: .Color[*].name
For string: $.Colors[?(#.type == 'Primary')].Color[*].name
Before: $.Colors
Separator: [?(#.type == 'Primary')]
After: .Color[*].name
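As a rough sketch of how the same idea could be folded back into the question's PathSplitter class: instead of three capture groups, it searches for the first bracketed separator with find() and cuts the string at the match boundaries. The field names follow the question; the SEPARATOR constant, the package-private fields, and the empty-string defaults for unsplittable paths are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical completion of the question's PathSplitter (not the asker's final code).
final class PathSplitter {
    // A literal '[', then any characters other than '[' and ']', then ']'.
    private static final Pattern SEPARATOR = Pattern.compile("\\[[^\\[\\]]*\\]");

    final boolean splittable;
    final String pre;
    final String split;
    final String post;

    PathSplitter(String path) {
        Matcher matcher = SEPARATOR.matcher(path);
        splittable = matcher.find(); // find() stops at the first (leftmost) separator
        if (splittable) {
            pre = path.substring(0, matcher.start());
            split = matcher.group();
            post = path.substring(matcher.end());
        } else {
            // Assumption: an unsplittable path keeps everything in pre.
            pre = path;
            split = "";
            post = "";
        }
    }

    public static void main(String[] args) {
        PathSplitter s = new PathSplitter("$.Colors[*].Color[*].name");
        System.out.println(s.pre + " | " + s.split + " | " + s.post);
        // prints: $.Colors | [*] | .Color[*].name
    }
}
```

Using start() and end() avoids the lazy (.*?) prefix group, and a path without any separator is handled explicitly rather than by a failed matches() call.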


Get the value after a string and comma and ends if there is character '|'

If I have string variable :
String word = "wordA";
and I have another string variable :
String fullText= "wordA,A A|wordB,B B|wordC,C C|wordD,D D";
Is it possible to get the value that follows the word and its comma, up to the next | ?
Example
If word equals "wordA", I should get only "A A", because in fullText the text right after wordA and the comma is "A A", and it ends at |.
If word equals "wordD", the result should be "D D", again based on fullText.
How can I get this result in Java?
You can use a simple regular expression. Like this:
String text = fullText.replaceAll(".*" + word + ",([^\\|]+).*", "$1");
Alternatively:
Matcher matcher = Pattern.compile(word + ",([^\\|]+)").matcher(fullText);
if (matcher.find()) { // group(1) throws IllegalStateException unless find() succeeded
    String result = matcher.group(1); // "A A" for word = wordA
}
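Both variants above concatenate word straight into the regex, so a word containing regex metacharacters (a dot, for instance) or a word that is merely the tail of another key could match in the wrong place. A minimal sketch of a more defensive lookup, assuming a hypothetical helper named valueFor:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedLookup {
    // Returns the text between "word," and the next '|', or null when the word is absent.
    static String valueFor(String fullText, String word) {
        // Pattern.quote makes the word literal, so "a.b" cannot match "axb".
        // (?:^|\\|) requires the word to start an entry, so "ordA" is not found inside "wordA".
        Pattern p = Pattern.compile("(?:^|\\|)" + Pattern.quote(word) + ",([^|]*)");
        Matcher m = p.matcher(fullText);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String fullText = "wordA,A A|wordB,B B|wordC,C C|wordD,D D";
        System.out.println(valueFor(fullText, "wordA")); // A A
        System.out.println(valueFor(fullText, "wordD")); // D D
        System.out.println(valueFor(fullText, "ordA"));  // null
    }
}
```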
If you are using Java 8, you can use a stream like so:
String result = Arrays.stream(fullText.split("\\|"))      // split on |
        .filter(s -> s.startsWith(word + ","))            // keep entries that start with word + ','
        .findFirst()                                      // take the first such entry
        .map(a -> a.substring(word.length() + 1))         // everything after word + ','
        .orElse(null);                                    // null (or any default value) if absent
How about this:
public static String search(String fullText, String key) {
    Pattern re = Pattern.compile("(?:^|\\|)" + key + ",([^|]*)(?:$|\\|)");
    Matcher matcher = re.matcher(fullText);
    if (matcher.find()) {
        return matcher.group(1);
    }
    return null;
}
Example:
String fullText= "wordA,A A|wordB,B B|wordC,C C|wordD,D D";
System.out.println(search(fullText, "wordA"));
System.out.println(search(fullText, "wordB"));
System.out.println(search(fullText, "wordC"));
System.out.println(search(fullText, "wordD"));
Output:
A A
B B
C C
D D
UPDATE: To avoid recompiling the regex at each search:
private static final Pattern RE = Pattern.compile("(?:^|\\|)([^,]*),([^|]*)(?:$|(?=\\|))");

public static String search(String fullText, String key) {
    Matcher matcher = RE.matcher(fullText);
    while (matcher.find()) {
        if (matcher.group(1).equals(key)) {
            return matcher.group(2);
        }
    }
    return null;
}

Java Regex jumps to next match with if clause [duplicate]

I would like to be able to find the first occurrence of m² and then numbers in front of it, could be integers or decimal numbers.
E.g.
"some text" 38 m² "some text" ,
"some text" 48,8 m² "some text",
"some text" 48 m² "some text", etc..
What I have so far is:
\d\d,\d\s*(\m\u00B2)|\d\d\s*(\m\u00B2)
This right now finds all occurrences, although I guess it could be fixed with findFirst(). Any ideas how to improve the Regex part?
To get the first match, you just need to use Matcher#find() inside an if block:
String rx = "\\d+(?:,\\d+)?\\s*m\\u00B2";
Pattern p = Pattern.compile(rx);
Matcher matcher = p.matcher("E.g. : 4668,68 m² some text, some text 48 m² etc");
if (matcher.find()){
    System.out.println(matcher.group());
}
Note that you can get rid of the alternation group using an optional non-capturing group (?:..)?
Pattern breakdown:
\d+ - 1+ digits
(?:,\d+)? - an optional (0 or 1) sequence of a comma followed by 1+ digits
\s* - 0+ whitespace symbols
m\u00B2 - the literal text m².
This is what I came up with, with your help :) (work in progress; later it should return a BigDecimal value). For now it seems to work:
public static String findArea(String description) {
    String tempString = "";
    Pattern p = Pattern.compile("\\d+(?:,\\d+)?\\s*m\\u00B2");
    Matcher m = p.matcher(description);
    if (m.find()) {
        tempString = m.group();
    }
    // strip the "m" and the superscript 2 so the value can be parsed to BigDecimal later
    tempString = tempString.replaceAll("[^0-9,]", "");
    System.out.println(tempString);
    return tempString;
}
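The BigDecimal conversion mentioned above as a later step could look roughly like this. It is only a sketch: the class name AreaParser, the Optional return type, and the capture group around the number are assumptions, not part of the original answers.

```java
import java.math.BigDecimal;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AreaParser {
    // Group 1 captures only the number, so no clean-up with replaceAll is needed.
    private static final Pattern AREA = Pattern.compile("(\\d+(?:,\\d+)?)\\s*m\\u00B2");

    // Returns the first area found in the description, if any.
    static Optional<BigDecimal> findArea(String description) {
        Matcher m = AREA.matcher(description);
        if (!m.find()) {
            return Optional.empty();
        }
        // BigDecimal expects '.' as the decimal separator, so swap the comma.
        return Optional.of(new BigDecimal(m.group(1).replace(',', '.')));
    }

    public static void main(String[] args) {
        System.out.println(findArea("some text 48,8 m\u00B2 some text 38 m\u00B2")); // Optional[48.8]
        System.out.println(findArea("no area here"));                               // Optional.empty
    }
}
```

Returning Optional.empty() for a missing match avoids handing an empty string to the BigDecimal constructor, which would throw NumberFormatException.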
One simple way of doing it is String#replaceFirst:
description.replaceFirst(@NotNull String regex,
                         @NotNull String replacement)
Javadoc: Replaces the first substring of this string that matches the given regular expression with the given replacement.
To find only the last one (and, by reversing the input, the first one):
@Test
public void testFindFirstRegExp() {
    String pattern = ".* (\\d+,\\d+) .*";
    Pattern r = Pattern.compile(pattern);
    String line = "some text 44,66 m² some 33,11 m² text 11,22 m² some text";

    // The greedy .* finds the last number; on the reversed line that is the first one.
    Matcher m = r.matcher(new StringBuilder(line).reverse().toString());
    String expected = "44,66";
    String actual = null;
    if (m.find()) {
        actual = new StringBuilder(m.group(1)).reverse().toString();
    }
    System.out.println("got first:" + actual);
    Assert.assertEquals(expected, actual);

    m = r.matcher(line);
    expected = "11,22";
    actual = null;
    if (m.find()) {
        actual = m.group(1);
    }
    System.out.println("got last:" + actual);
    Assert.assertEquals(expected, actual);
}
prints:
got first:44,66
got last:11,22
Note: when matching against the reversed string, remember that the pattern itself has to be reversed as well, for example:
pattern = ".* (\\d+,\\d+-?) .*"; // reversed form of (-?\\d+,\\d+)
Without the leading and trailing .*, this pattern works as expected:
pattern = " (\\-?\\d+,\\d+) ";
and you get all matches in a loop:
while (m.find()) {
    actual = m.group(1);
    System.out.println("got last:" + actual);
}
Will print:
got last:44,66
got last:33,11
got last:11,22
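The same first and last values can also be collected in one pass over the original string, without reversing anything. A small sketch, assuming a hypothetical helper firstAndLast and a pattern that matches only the number itself:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstAndLast {
    private static final Pattern NUMBER = Pattern.compile("\\d+,\\d+");

    // Returns {first, last}, or null when the line contains no match.
    static String[] firstAndLast(String line) {
        Matcher m = NUMBER.matcher(line);
        String first = null;
        String last = null;
        while (m.find()) {
            if (first == null) {
                first = m.group(); // remember only the first hit
            }
            last = m.group();      // overwritten until the loop ends
        }
        return first == null ? null : new String[] {first, last};
    }

    public static void main(String[] args) {
        String[] r = firstAndLast("some text 44,66 m\u00B2 some 33,11 m\u00B2 text 11,22 m\u00B2 some text");
        System.out.println("first: " + r[0] + ", last: " + r[1]);
        // prints: first: 44,66, last: 11,22
    }
}
```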

How to find match for exact word using pattern matcher in java

I have shared my sample code below. I am trying to find the word "engine" in different strings, and I used word boundaries to match the word.
It also matches when the word is preceded by #, as in #engine.
It should match only the exact word.
private void checkMatch() {
    String source1 = "search engines has ";
    String source2 = "search engine exact word";
    String source3 = "enginecheck";
    String source4 = "has hashtag #engine";
    String key = "engine";
    System.out.println(isContain(source1, key));
    System.out.println(isContain(source2, key));
    System.out.println(isContain(source3, key));
    System.out.println(isContain(source4, key));
}

private boolean isContain(String source, String subItem) {
    String pattern = "\\b" + subItem + "\\b";
    Pattern p = Pattern.compile(pattern);
    Matcher m = p.matcher(source);
    return m.find();
}
**Expected output**
false
true
false
false
**actual output**
false
true
false
true
For this case, use alternation with whitespace and anchors instead of word boundaries. \\b matches between a word character and a non-word character (in either order), so your regex finds a match in #engine because # is a non-word character.
private boolean isContain(String source, String subItem) {
    String pattern = "(?m)(^|\\s)" + subItem + "(\\s|$)";
    Pattern p = Pattern.compile(pattern);
    Matcher m = p.matcher(source);
    return m.find();
}
or
String pattern = "(?<!\\S)" + subItem + "(?!\\S)";
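To check the lookaround variant against the four strings from the question, a minimal sketch follows. The class name ExactWord and the method name containsToken are hypothetical, and Pattern.quote is an addition so that a search term containing regex metacharacters is treated literally.

```java
import java.util.regex.Pattern;

final class ExactWord {
    // True when subItem occurs as a whole whitespace-delimited token in source.
    static boolean containsToken(String source, String subItem) {
        // (?<!\\S): not preceded by a non-whitespace char; (?!\\S): not followed by one.
        Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(subItem) + "(?!\\S)");
        return p.matcher(source).find();
    }

    public static void main(String[] args) {
        String key = "engine";
        System.out.println(containsToken("search engines has ", key));      // false
        System.out.println(containsToken("search engine exact word", key)); // true
        System.out.println(containsToken("enginecheck", key));              // false
        System.out.println(containsToken("has hashtag #engine", key));      // false
    }
}
```

Because the lookarounds consume no characters, the match also works at the very start or end of the input.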
Change your pattern as below.
String pattern = "\\s" + subItem + "\\b";
If you are looking for a literal text enclosed with spaces or start/end of the string, you can split the string with a mere whitespace pattern like \s+ and check if any of the chunks equals the search text.
Java demo:
String s = "Can't start the #engine here, but this engine works";
String searchText = "engine";
boolean found = Arrays.stream(s.split("\\s+"))
.anyMatch(word -> word.equals(searchText));
System.out.println(found); // => true
Change the regexp to
String pattern = "\\s"+subItem + "\\s";
I'm using the
\s A whitespace character: [ \t\n\x0B\f\r]
For more info look into the java.util.regex.Pattern javadoc
Also if you want to support strings like these:
"has hashtag engine"
"engine"
You can improve it by adding the start and end anchors (^ and $),
using this pattern:
String pattern = "(^|\\s)"+subItem + "(\\s|$)";

Regex pattern java with commas

I have the string below, which comes from an Excel column:
"\"USE CODE \"\"Gef, sdf\"\" FROM 1/7/07\""
I would like to write a regex pattern that retrieves the entire string, so that my result is exactly
"USE CODE ""Gef, sdf"" FROM 1/7/07"
Below is what I tried
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main( String args[] ){
        // String to be scanned to find the pattern.
        String line = "\"USE CODE \"\"Gef, sdf\"\" FROM 1/7/07\", Delete , Hello , How are you ? , ";
        String line2 = "Test asda ds asd, tesat2 . test3";
        String dpattern = "(\"[^\"]*\")(?:,(\"[^\"]*\"))*,|([^,]+),";
        // Create a Pattern object
        Pattern d = Pattern.compile(dpattern);
        Matcher md = d.matcher(line2);
        Pattern r = Pattern.compile(dpattern);
        // Now create matcher object.
        Matcher m = r.matcher(line);
        if (m.find( )) {
            System.out.println("Found value: 0 " + m.group(0) );
            // System.out.println("Found value: 1 " + m.group(1) );
            //System.out.println("Found value: 2 " + m.group(2) );
        } else {
            System.out.println("NO MATCH");
        }
    }
}
The match breaks off after the comma inside the quotes, so the output is
Found value: 0 "USE CODE ""Gef,
It should be
Found value: 0 "USE CODE ""Gef sdf"" FROM 1/7/07",
and for the second line Matcher m = r.matcher(line2); the output should be
Found value: 0 "Test asda ds asd",
You may use
(?:"[^"]*(?:""[^"]*)*"|[^,])+
Explanation:
" - leading quote
[^"]* - 0+ chars other than a double quote
(?:""[^"]*)* - 0+ sequences of a "" text followed with 0+ chars other than a double quote
" - trailing quote
OR:
[^,] - any char but a comma
And the whole pattern is matched 1 or more times as it is enclosed with (?:...)+ and + matches 1 or more occurrences.
Java demo:
String line = "\"USE CODE \"\"Gef, sdf\"\" FROM 1/7/07\", Delete , Hello , How are you ? , ";
String line2 = "Test asda ds asd, tesat2 . test3";
Pattern pattern = Pattern.compile("(?:\"[^\"]*(?:\"\"[^\"]*)*\"|[^,])+");
Matcher matcher = pattern.matcher(line);
if (matcher.find()){ // if is used to get the 1st match only
    System.out.println(matcher.group(0));
}
Matcher matcher2 = pattern.matcher(line2);
if (matcher2.find()){
    System.out.println(matcher2.group(0));
}
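If every field of the line is needed rather than only the first, the same pattern can be driven in a while loop. The sketch below is an illustration under assumptions: the class name CsvFields, the trimming, and the skipping of blank fields are choices made here, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CsvFields {
    // Same pattern as above: a quoted run (with "" as an escaped quote) or any non-comma char, 1+ times.
    private static final Pattern FIELD =
            Pattern.compile("(?:\"[^\"]*(?:\"\"[^\"]*)*\"|[^,])+");

    // Collects every non-blank field, trimmed of surrounding whitespace.
    static List<String> fields(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String field = m.group().trim();
            if (!field.isEmpty()) {
                result.add(field);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "\"USE CODE \"\"Gef, sdf\"\" FROM 1/7/07\", Delete , Hello , How are you ? , ";
        fields(line).forEach(System.out::println);
    }
}
```

Since the pattern requires at least one character, two adjacent commas produce no match at all, so empty columns are silently dropped; a real CSV parser would be the safer choice when column positions matter.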

Regex to match [[Wikipedia:Manual of Style#Links|]] # in java

I have been trying to match the following string -
String temp = "[[Wikipedia:Manual of Style#Links|]]" ;
with each of the following regexes:
boolean a = temp.matches("\\[\\[Wikipedia:[a-zA-Z_0-9]*#[a-zA-Z_0-9]*\\|\\]\\]");
"\\[\\[Wikipedia:(.*?)#(.*?)\\|\\]\\]"
"\\[\\[Wikipedia:(.*)*#(.+)*\\|\\]\\]"
"\\[\\[(.*?)#(.*?)\\|\\]\\]"
But none of them are giving any positive matches.
Straight away I can see a problem: you are using a character class without a space to match input with spaces.
Try this:
boolean a = temp.matches("\\[\\[Wikipedia:[\\w ]*#[\\w ]+\\|\\]\\]");
Note that [a-zA-Z_0-9] can be replaced by \w. In Java the two are equivalent by default; \w only extends to letters and digits from other scripts when the pattern is compiled with Pattern.UNICODE_CHARACTER_CLASS.
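A quick way to see how \w behaves in Java with and without the Unicode flag is the sketch below. The class WordClassDemo and its isWord helper are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

final class WordClassDemo {
    // Checks whether the whole string consists of \w characters.
    static boolean isWord(String s, boolean unicodeAware) {
        int flags = unicodeAware ? Pattern.UNICODE_CHARACTER_CLASS : 0;
        return Pattern.compile("\\w+", flags).matcher(s).matches();
    }

    public static void main(String[] args) {
        String accented = "Stra\u00DFe"; // "Strasse" written with a sharp s

        System.out.println(isWord("Style", false));  // true
        System.out.println(isWord(accented, false)); // false: default \w is [a-zA-Z_0-9]
        System.out.println(isWord(accented, true));  // true
    }
}
```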
public static void main(String[] args) {
    String temp = "[[Wikipedia:Manual of Style#Links|]]";
    Pattern pattern = Pattern.compile("\\[\\[Wikipedia:([\\w ]+)#([\\w ]+)\\|\\]\\]");
    Matcher matcher = pattern.matcher(temp);
    if(matcher.find()) {
        System.out.println("Manual of Style: " + matcher.group(1));
        System.out.println("links : " + matcher.group(2));
    }
}
or
temp.matches("\\[\\[Wikipedia:([\\w ]+)#([\\w ]+)\\|\\]\\]");
Just add a space to your custom character class:
String temp = "[[Wikipedia:Manual of Style#Links|]]" ;
temp.matches("\\[\\[Wikipedia:[a-zA-Z_0-9 ]*#[a-zA-Z_0-9]*\\|\\]\\]"); //true
