Java match two strings without last character - java

I have a URL whose path is /mypath/check/10.10/-123.11. I want a match to return true if there are (optionally) 3 digits after the decimal point instead of 2, e.g. /mypath/check/10.101/-123.112 should match. The part before the decimal point must match exactly in both occurrences.
To cite some examples :
Success
/mypath/check/10.10/-123.11 = /mypath/check/10.101/-123.112
/mypath/check/10.10/-123.11 = /mypath/check/10.101/-123.11
/mypath/check/10.10/-123.11 = /mypath/check/10.10/-123.112
/mypath/check/10.10/123.11 = /mypath/check/10.101/123.112
.. and so forth
Failure :
/mypath/check/10.10/-123.11 != /mypath/check/10.121/-123.152
/mypath/check/10.11/-123.11 != /mypath/check/10.12/-123.11
The number before the decimal point has 1 to 3 digits and may be preceded by a - sign.

Try /mypath/check/10\.10/-?123\.11[ ]*=[ ]*/mypath/check/(\d\d)\.\1\d?/

Try this:
url1.equals(url2) || url1.equals(url2.replaceAll("\\d$", ""))
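Note that this expression only strips a digit at the very end of the string, so a third digit on the first number is not covered. A quick check, as a sketch using the question's sample URLs (the class and method names are made up for illustration):

```java
final class LastDigitCheck {
    // Same expression as above, wrapped in a method so it can be exercised.
    static boolean sameUpToLastDigit(String url1, String url2) {
        return url1.equals(url2) || url1.equals(url2.replaceAll("\\d$", ""));
    }

    public static void main(String[] args) {
        String url1 = "/mypath/check/10.10/-123.11";
        // Extra digit only on the last number: accepted.
        System.out.println(sameUpToLastDigit(url1, "/mypath/check/10.10/-123.112"));
        // Extra digit on the first number as well: rejected,
        // although the question expects this pair to match.
        System.out.println(sameUpToLastDigit(url1, "/mypath/check/10.101/-123.112"));
    }
}
```

If both numbers may carry a third digit, one of the normalization approaches below is probably a better fit.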

Idea
Regex subpatterns that should match optionally are suffixed with the ? quantifier. In your case this applies to the 3rd digit after a decimal point.
An equality test modulo that optional digit can be implemented by matching each occurrence of the context pattern and replacing the optional part of the match with the empty string. After this normalization, the strings can be tested for equality.
Code
// Initializing test data.
// Will compare Strings in batch1, batch2 at the same array position.
//
String[] batch1 = {
"/mypath/check/10.10/-123.11"
, "/mypath/check/10.10/-123.11"
, "/mypath/check/10.10/-123.11"
, "/mypath/check/10.10/123.11"
, "/mypath/check/10.10/-123.11"
, "/mypath/check/10.11/-123.11"
};
String[] batch2 = {
"/mypath/check/10.101/-123.112"
, "/mypath/check/10.101/-123.11"
, "/mypath/check/10.10/-123.112"
, "/mypath/check/10.101/123.112"
, "/mypath/check/10.121/-123.152"
, "/mypath/check/10.12/-123.11"
};
// Regex pattern used for normalization:
// - Basic pattern: decimal point followed by 2 or 3 digits
// - Optional part: 3rd digit of the basic pattern
// - Additional context: Pattern must match at the end of the string or be followed by a non-digit character.
//
Pattern re_p = Pattern.compile("([.][0-9]{2})[0-9]?(?:$|(?![0-9]))");
// Replacer routine for processing the regex match. Returns capture group #1
Function<MatchResult, String> fnReplacer= (MatchResult m)-> { return m.group(1); };
// Processing each test case
// Expected result
// match
// match
// match
// match
// mismatch
// mismatch
//
for ( int i = 0; i < batch1.length; i++ ) {
String norm1 = re_p.matcher(batch1[i]).replaceAll(fnReplacer);
String norm2 = re_p.matcher(batch2[i]).replaceAll(fnReplacer);
if (norm1.equals(norm2)) {
System.out.println("Url pair #" + Integer.toString(i) + ": match ( '" + norm1 + "' == '" + norm2 + "' )");
} else {
System.out.println("Url pair #" + Integer.toString(i) + ": mismatch ( '" + norm1 + "' != '" + norm2 + "' )");
}
}
Demo available here (ideone.com).

I'm assuming the first URL always has exactly 2 digits after every decimal point. If so, match the 2nd URL to the regex formed by appending an optional digit to the end of each decimal fraction in the first URL.
static boolean matchURL(String url1, String url2)
{
return url2.matches(url1.replaceAll("([.][0-9]{2})", "$1[0-9]?"));
}
Test:
String url1 = "/mypath/check/10.10/-123.11";
List<String> tests = Arrays.asList(
"/mypath/check/10.10/-123.11",
"/mypath/check/10.10/-123.111",
"/mypath/check/10.101/-123.11",
"/mypath/check/10.101/-123.111",
"/mypath/check/10.11/-123.11"
);
for(String url2 : tests)
System.out.format("%s : %s = %b%n", url1, url2, matchURL(url1, url2));
Output:
/mypath/check/10.10/-123.11 : /mypath/check/10.10/-123.11 = true
/mypath/check/10.10/-123.11 : /mypath/check/10.10/-123.111 = true
/mypath/check/10.10/-123.11 : /mypath/check/10.101/-123.11 = true
/mypath/check/10.10/-123.11 : /mypath/check/10.101/-123.111 = true
/mypath/check/10.10/-123.11 : /mypath/check/10.11/-123.11 = false
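One caveat with the approach above: url1 is used as a regex without escaping, so each '.' in it matches any character (a URL such as /mypath/check/10x10/-123.11 would also be accepted). A possible variant, under the same assumption that url1 has exactly 2 digits after each decimal point, quotes the literal parts with Pattern.quote. The class and method names are illustrative only:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedUrlMatch {
    // A decimal point followed by two digits, as in the answer above.
    private static final Pattern FRACTION = Pattern.compile("[.][0-9]{2}");

    static boolean matchUrl(String url1, String url2) {
        StringBuilder regex = new StringBuilder();
        Matcher m = FRACTION.matcher(url1);
        int last = 0;
        while (m.find()) {
            // Quote everything up to and including the 2-digit fraction ...
            regex.append(Pattern.quote(url1.substring(last, m.end())));
            // ... then allow one optional extra digit.
            regex.append("[0-9]?");
            last = m.end();
        }
        // Quote whatever follows the last fraction (possibly nothing).
        regex.append(Pattern.quote(url1.substring(last)));
        return url2.matches(regex.toString());
    }

    public static void main(String[] args) {
        String url1 = "/mypath/check/10.10/-123.11";
        System.out.println(matchUrl(url1, "/mypath/check/10.101/-123.112"));
        System.out.println(matchUrl(url1, "/mypath/check/10.11/-123.11"));
        System.out.println(matchUrl(url1, "/mypath/check/10x10/-123.11"));
    }
}
```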

Related

Regex: separate 2 numbers by a comma

I'm trying to write a regex that allows only a number, then ",", then another number, or several such pairs separated by ";", like
57,1000
57,1000;6393,1000
So far I have this: Pattern.compile("\\b[0-9;,]{1,5}?\\d+;([0-9]{1,5},?)+").matcher("57,1000").find();
It works for 57,1000;6393,1000, but it also allows letters and does not work for 57,1000.
Try the regex "(\d+,\d+(;\d+,\d+)?)"
@Test
void regex() {
Pattern p = Pattern.compile("(\\d+,\\d+)(;\\d+,\\d+)?");
Assertions.assertTrue(p.matcher("57,1000").matches());
Assertions.assertTrue(p.matcher("57,1000;6393,1000").matches());
}
How about like this. Just look for two numbers separated by a comma and capture them.
String[] data = {"57,1000",
"57,1000;6393,1000"};
Pattern p = Pattern.compile("(\\d+),(\\d+)");
for (String str : data) {
System.out.println("For String : " + str);
Matcher m = p.matcher(str);
while (m.find()) {
System.out.println(m.group(1) + " " + m.group(2));
}
System.out.println();
}
prints
For String : 57,1000
57 1000
For String : 57,1000;6393,1000
57 1000
6393 1000
If you just want to validate such strings, you can do the following. The regex matches one pair, optionally followed by a second pair preceded by a semicolon.
String regex = "(\\d+,\\d+)(;(\\d+,\\d+))?";
for (String str : data) {
System.out.println("Testing String " + str + " : " +str.matches(regex));
}
prints
Testing String 57,1000 : true
Testing String 57,1000;6393,1000 : true
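If more than two pairs can occur (the question does not say, so this is an assumption), the optional group can be repeated with * instead of ?. The sketch below also uses matches() rather than find(): matches() requires the whole input to fit the pattern, which is what rejects letters and trailing separators. The class name is illustrative:

```java
import java.util.regex.Pattern;

final class PairListValidator {
    // One "number,number" pair, followed by any number of ";number,number" pairs.
    static final Pattern PAIRS = Pattern.compile("\\d+,\\d+(?:;\\d+,\\d+)*");

    static boolean isValid(String s) {
        return PAIRS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"57,1000", "57,1000;6393,1000", "1,2;3,4;5,6", "a57,1000", "57,1000;"};
        for (String s : samples) {
            System.out.println(s + " : " + isValid(s));
        }
    }
}
```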

Removing whitespaces at the beginning of the string with Regex gives null Java

I would like to get groups from a string that is loaded from txt file. This file looks something like this (notice the space at the beginning of file):
as431431af,87546,3214| 5a341fafaf,3365,54465 | 6adrT43 , 5678 , 5655
The first part of each record, up to the first comma, can contain digits and letters; the second and third parts contain only digits. After each | the structure repeats.
First, I load the txt file into a string: String readFile3 = readFromTxtFile("/resources/file.txt");
Then I remove all whitespace with a regex:
String no_whitespace = readFile3.replaceAll("\\s+", "");
After that i try to get groups :
Pattern p = Pattern.compile("[a-zA-Z0-9]*,\\d*,\\d*", Pattern.MULTILINE);
Matcher m = p.matcher(ue_No_whitespace);
int lastMatchPos = 0;
while (m.find()) {
System.out.println(m.group());
lastMatchPos = m.end();
}
if (lastMatchPos != ue_No_whitespace.length())
System.out.println("Invalid string!");
Now I would like to remove the "," from each group and assign every value to its own variable, but I am getting these groups (notice the "null" at the start):
nullas431431af,87546,3214
5a341fafaf,3365,54465
6adrT43,5678,5655
What am I doing wrong? Even when I physically remove the space from the beginning of the txt file, the same result occurs.
Is there an easier way to get the groups from this string with a regex and assign each comma-separated part to its own variable?
You can split with | enclosed with optional whitespaces and then split the obtained items with , enclosed with optional whitespaces:
String str = "as431431af,87546,3214| 5a341fafaf,3365,54465 | 6adrT43 , 5678 , 5655";
String[] items = str.split("\\s*\\|\\s*");
List<String[]> res = new ArrayList<>();
for(String i : items) {
String[] parts = i.split("\\s*,\\s*");
res.add(parts);
System.out.println(parts[0] + " - " + parts[1] + " - " + parts[2]);
}
See the Java demo printing
as431431af - 87546 - 3214
5a341fafaf - 3365 - 54465
6adrT43 - 5678 - 5655
The results are in the res list.
Note that
\s* - matches zero or more whitespaces
\| - matches a pipe char
The pattern that you tried only has optional quantifiers (*), so it could also match nothing but commas.
You also don't need Pattern.MULTILINE as there are no anchors in the pattern.
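To illustrate the point about the * quantifiers, here is a small sketch comparing the pattern from the question with a variant that uses + (the class and field names are made up for this example):

```java
import java.util.regex.Pattern;

final class StarVersusPlus {
    // Pattern from the question: every part is optional.
    static final Pattern LOOSE = Pattern.compile("[a-zA-Z0-9]*,\\d*,\\d*");
    // Same shape, but every part needs at least one character.
    static final Pattern STRICT = Pattern.compile("[a-zA-Z0-9]+,\\d+,\\d+");

    public static void main(String[] args) {
        System.out.println(LOOSE.matcher(",,").matches());                  // true
        System.out.println(STRICT.matcher(",,").matches());                 // false
        System.out.println(STRICT.matcher("6adrT43,5678,5655").matches());  // true
    }
}
```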
You can use 3 capture groups with + as the quantifier to match one or more occurrences, and after each part either match a pipe | or assert the end of the string with $:
([a-zA-Z0-9]+),([0-9]+),([0-9]+)(?:\||$)
For example
String readFile3 = "as431431af,87546,3214| 5a341fafaf,3365,54465 | 6adrT43 , 5678 , 5655";
String no_whitespace = readFile3.replaceAll("\\s+", "");
Pattern p = Pattern.compile("([a-zA-Z0-9]+),([0-9]+),([0-9]+)(?:\\||$)");
Matcher matcher = p.matcher(no_whitespace);
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
System.out.println("--------------------------------");
}
Output
as431431af
87546
3214
--------------------------------
5a341fafaf
3365
54465
--------------------------------
6adrT43
5678
5655
--------------------------------
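As for the "null" prefix itself: the question does not show readFromTxtFile, so the following is only a guess. A common way to end up with a literal "null" at the start of a string is to accumulate lines into a String that was initialized to null instead of "". The sketch below uses hypothetical helper names to reproduce that symptom:

```java
final class NullPrefixDemo {
    // Hypothetical reader logic: the accumulator starts as null, not "".
    static String joinBuggy(String[] lines) {
        String content = null;
        for (String line : lines) {
            content += line; // null + " abc" becomes "null abc"
        }
        return content;
    }

    // Same logic with a StringBuilder, which starts out empty.
    static String joinFixed(String[] lines) {
        StringBuilder content = new StringBuilder();
        for (String line : lines) {
            content.append(line);
        }
        return content.toString();
    }

    public static void main(String[] args) {
        String[] lines = {" as431431af,87546,3214| 5a341fafaf,3365,54465"};
        System.out.println(joinBuggy(lines).replaceAll("\\s+", ""));
        System.out.println(joinFixed(lines).replaceAll("\\s+", ""));
    }
}
```

If the helper looks different, the cause is likely elsewhere; the regex itself cannot produce the text "null".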

How to split but not on all char

I have a string which looks like this: String s = "date1, calculatedDate(currentDate, 35), false";.
I need to extract all parameters of the verify function. The expected result is:
elem[0] = date1
elem[1] = calculatedDate(currentDate, 35)
elem[2] = false
If I use the split function on the , character, I get this result instead:
elem[0] = date1
elem[1] = calculatedDate(currentDate
elem[2] = 35)
elem[3] = false
Moreover, the method has to be generic, because some functions have 2 or 7 parameters...
Do you have a solution for this?
You could use StringTokenizer to parse your arguments inside the parentheses:
final static String DELIMITER = ",";
final static String PARENTHESES_START = "(";
final static String PARENTHESES_END = ")";
public static List<String> parseArguments(String text) {
List<String> arguments = new ArrayList<>();
StringBuilder argParsed = new StringBuilder();
StringTokenizer st = new StringTokenizer(text, DELIMITER);
while (st.hasMoreElements()) {
// default: add next token
String token = st.nextToken();
System.out.println("Token: " + token);
argParsed.append(token);
// if token contains '(' we have
// an expression or nested call as argument
if (token.contains(PARENTHESES_START)) {
System.out.println("Nested expression with ( starting: " + token);
// reconstruct to string-builder until ')'
while(st.hasMoreElements() && !token.contains(PARENTHESES_END)) {
// add eliminated/tokenized delimiter
argParsed.append(DELIMITER);
// default: add next token
token=st.nextToken();
System.out.println("Token inside nested expression: " + token);
argParsed.append(token);
}
System.out.println("Nested expression with ) ending: " + token);
}
// add complete argument and start fresh
arguments.add(argParsed.toString());
argParsed.setLength(0);
}
return arguments;
}
It can even parse the following input: date1, calculatedDate(currentDate, 35), false, (a+b), x.toString()
It successfully finds all 5 arguments, including the complex ones:
(nested) function-calls like calculatedDate(currentDate, 35)
expressions like (a+b)
method-calls on objects like x.toString()
Run this demo on IDEone.
Read more and extend
There might be more complex texts or grammars to handle (in the future).
If neither regex capturing, nor string splitting, nor tokenizing can handle them, consider using or generating a PEG or CFG parser. See the discussion about Regular Expression Vs. String Parsing.
Try this:
String s = "verify(date1, calculatedDate(currentDate, 35), false)";
Pattern p = Pattern.compile("(?<=verify\\()(\\w+)(,\\s)(.*)(,\\s)((?<=,\\s)\\w+)(?=\\))");
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(m.group(1) + "\n" + m.group(3) + "\n" + m.group(5));
}
Update for s = "date1, calculatedDate(currentDate, 35), false":
String s = "date1, calculatedDate(currentDate, 35), false";
Pattern p = Pattern.compile("(\\w+)(,\\s)(.*)(,\\s)((?<=,\\s)\\w+)");
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(m.group(1) + "\n" + m.group(3) + "\n" + m.group(5));
}
Output:
date1
calculatedDate(currentDate, 35)
false
About regex:
(\\w+) one or more(+) word characters
(,\\s) , part
(.*) matches any character, here just the part between two ,
(,\\s) , part
((?<=,\\s)\\w+) ?<= is a positive look behind, helps to catch , false part but does not include ,
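Since the question asks for a generic method, another option is to scan the string once and split only on commas that sit at parenthesis depth 0. This is a minimal sketch, not taken from either answer, and it assumes the parentheses are balanced and that commas never appear inside string literals. The names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

final class TopLevelSplit {
    // Splits on commas that are not enclosed in any parentheses.
    static List<String> splitTopLevel(String text) {
        List<String> parts = new ArrayList<>();
        int depth = 0;   // current nesting level of '(' ... ')'
        int start = 0;   // start index of the argument being read
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(text.substring(start, i).trim());
                start = i + 1;
            }
        }
        parts.add(text.substring(start).trim()); // last argument
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitTopLevel("date1, calculatedDate(currentDate, 35), false"));
        System.out.println(splitTopLevel("f(g(a, b), c), x"));
    }
}
```

Because it counts depth rather than looking for the first closing parenthesis, it also handles calls nested more than one level deep and any number of parameters.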

Java - Regex to split mathematical expression for operator excluding operator which comes under brackets

I need to split the string below using a regex, but my attempt also splits the data inside the brackets.
Input
T(i-1).XX_1 + XY_8 + T(i-1).YY_2 * ZY_14
Expected Output
T(i-1).XX_1 , XY_8 , T(i-1).YY_2 , ZY_14
It should not split on operators inside "(" and ")".
I tried the code below, but it also splits the data inside "(" and ")":
String[] result = expr.split("[+*/]");
Any pointers on how to fix this?
I am new to regex.
Input
(T(i-1).XX_1 + XY_8) + T(i-1).YY_2 * (ZY_14 + ZY_14)
Output
T(i-1).XX_1 , XY_8 , T(i-1).YY_2 , ZY_14 , ZY_14
A term like T(i-1) must be left intact.
For the expression below it is not working:
XY_98 + XY_99 +XY_100
String lineExprVal = lineExpr.replaceAll("\\s+","");
String[] result = lineExprVal.split("[+*/-] (?!(^))");
You can split on everything outside your parentheses like this:
String str = "T(i-1).XX_1 + XY_8 + T(i-1).YY_2 * ZY_14";
String result[] = str.split("[+*/-] (?!(^))");
// The character class [+*/-] is the list of your delimiters
System.out.println(Arrays.toString(result));
This will print :
[T(i-1).XX_1 , XY_8 , T(i-1).YY_2 , ZY_14]
The idea is simple: split on the delimiters that are not inside your parentheses.
You can check this here ideone and you can check your regex here Regex demo
EDIT
In your second case you have to use this regex :
String str = "(T(i - 1).XX_1 + XY_8)+ (i - 1).YY_2*(ZY_14 + ZY_14)";
String result[] = str.split("[+*+\\/-](?![^()]*(?:\\([^()]*\\))?\\))");
System.out.println(Arrays.toString(result));
This will give you :
[(T(i-1).XX_1+XY_8), T(i-1).YY_2, (ZY_14+ZY_14)]
^----Group1------^ ^--Group2--^ ^--Group3--^
You can find the Regex Demo; this solution was inspired by the post Regex to match only comma's but not inside multiple parentheses.
Hope this can help you.
Splitting your second mathematical expression is really hard, if not impossible, so use a Pattern with a Matcher instead. For your expression, you need this regex:
(\w+\([\w-*+\/]+\).\w+)|((?:(\w+\(.*?\))))|(\w+)
Here is a Demo regex you will understand more.
To get the result you need to loop through the matches:
public static void main(String[] args) {
String input = "(T(i-1).XX_1 + XY_8) + X + T(i-1).YY_2 * (ZY_14 + ZY_14) + T(i-1)";
Pattern pattern = Pattern.compile("(\\w+\\([\\w-*+\\/]+\\).\\w+)|((?:(\\w+\\(.*?\\))))|(\\w+)");
Matcher matcher = pattern.matcher(input);
List<String> reslt = new ArrayList<>();
while (matcher.find()) { // loop through the matches
if (matcher.group(1) != null) {
reslt.add(matcher.group(1));
}
//In your case you have to avoid this two groups
// if (matcher.group(2) != null) {
// reslt.add(matcher.group(2));
// }
// if (matcher.group(3) != null) {
// reslt.add(matcher.group(3));
// }
if (matcher.group(4) != null) {
reslt.add(matcher.group(4));
}
}
reslt.forEach(System.out::println);
}
This will give you:
T(i-1).XX_1
XY_8
X
T(i-1).YY_2
ZY_14
ZY_14
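A regex-free alternative is to walk the expression character by character. The sketch below is an assumption-laden illustration, not the method used in the answers: it treats a '(' that directly follows a name (as in T(i-1)) as part of the operand, treats every other parenthesis as grouping, and recognizes only + - * / as operators. The class and method names are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

final class OperandScanner {
    static List<String> operands(String expr) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        int callDepth = 0; // > 0 while inside the parentheses of something like T(i-1)
        for (char c : expr.toCharArray()) {
            if (callDepth > 0) {
                // Inside T(...): keep operators and parentheses, drop blanks.
                if (c == '(') callDepth++;
                if (c == ')') callDepth--;
                if (!Character.isWhitespace(c)) cur.append(c);
            } else if (c == '(' && cur.length() > 0) {
                // '(' right after a name opens an index such as T(i-1).
                callDepth = 1;
                cur.append(c);
            } else if (c == '(' || c == ')' || Character.isWhitespace(c) || "+-*/".indexOf(c) >= 0) {
                // Grouping parenthesis, blank or operator: the current operand ends here.
                if (cur.length() > 0) {
                    out.add(cur.toString());
                    cur.setLength(0);
                }
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(operands("(T(i-1).XX_1 + XY_8) + T(i-1).YY_2 * (ZY_14 + ZY_14)"));
        System.out.println(operands("XY_98 + XY_99 +XY_100"));
    }
}
```

A name separated from its '(' by a space would be read as a plain operand followed by a grouping parenthesis, so this only fits input written like the samples in the question.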

Need to extract data from CSV file

In my file I have below data, everything is string
Input
"abcd","12345","success,1234,out",,"hai"
The output should be like below
Column 1: "abcd"
Column 2: "12345"
Column 3: "success,1234,out"
Column 4: null
Column 5: "hai"
We need to use the comma as the delimiter; the null value comes without double quotes.
Could you please help me find a regular expression to parse this data?
You could try a tool like CSVReader from OpenCsv https://sourceforge.net/projects/opencsv/
You can even configure a CSVParser (used by the reader) to output null on several conditions. From the doc :
/**
* Denotes what field contents will cause the parser to return null: EMPTY_SEPARATORS, EMPTY_QUOTES, BOTH, NEITHER (default)
*/
public static final CSVReaderNullFieldIndicator DEFAULT_NULL_FIELD_INDICATOR = NEITHER;
You can use this Regular Expression
"([^"]*)"
DEMO: https://regex101.com/r/WpgU9W/1
Match 1
Group 1. 1-5 `abcd`
Match 2
Group 1. 8-13 `12345`
Match 3
Group 1. 16-32 `success,1234,out`
Match 4
Group 1. 36-39 `hai`
Using the ("[^"]+")|(?<=,)(,) regex you can find either quoted strings ("[^"]+"), which should be taken as they are, or commas preceded by commas, which denote null field values. All you need to do now is iterate through the matches, check which of the two capture groups is defined, and output accordingly:
String input = "\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"";
Pattern pattern = Pattern.compile("(\"[^\"]+\")|(?<=,)(,)");
Matcher matcher = pattern.matcher(input);
int col = 1;
while (matcher.find()) {
if (matcher.group(1) != null) {
System.out.println("Column " + col + ": " + matcher.group(1));
col++;
} else if (matcher.group(2) != null) {
System.out.println("Column " + col + ": null");
col++;
}
}
Demo: https://ideone.com/QmCzPE
Step #1:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "(,,)";
final String string = "\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"\n"
+ "\"abcd\",\"12345\",\"success,1234,out\",\"null\",\"hai\"";
final String subst = ",\"null\",";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
// The substituted value will be contained in the result variable
final String result = matcher.replaceAll(subst);
System.out.println("Substitution result: " + result);
Original Text:
"abcd","12345","success,1234,out",,"hai"
Transformation: (with null)
"abcd","12345","success,1234,out","null","hai"
Step #2: (use REGEXP)
"([^"]*)"
Result:
abcd
12345
success,1234,out
null
hai
Credits:
Emmanuel Guiton [https://stackoverflow.com/users/7226842/emmanuel-guiton] REGEXP
You can also use the replaceAll function:
final String input = "\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"";
System.out.println(input);
String[] strings = input
.replaceAll(",,", ",\"\",")
.replaceAll(",,", ",\"\",") // in case there is more than one null in a row
.replaceAll("\",\"", "\";\"")
.replaceAll("\"\"", "")
.split(";");
for (String string : strings) {
String output = string;
if (output.isEmpty()) {
output = null;
}
System.out.println(output);
}
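If a library is not an option, a small hand-written scanner can do the same job without regex. This is only a sketch under stated assumptions: fields never contain escaped quotes, a record sits on one line, and an empty unquoted field means null. Unlike the expected output in the question, it strips the surrounding quotes. The class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

final class SimpleCsvLine {
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;   // currently between an opening and closing quote
        boolean wasQuoted = false;  // the current field had quotes at all
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
                wasQuoted = true;
            } else if (c == ',' && !inQuotes) {
                fields.add(finish(cur, wasQuoted));
                cur.setLength(0);
                wasQuoted = false;
            } else {
                cur.append(c); // includes commas inside quotes
            }
        }
        fields.add(finish(cur, wasQuoted)); // last field
        return fields;
    }

    // An empty field without quotes is reported as null.
    private static String finish(StringBuilder cur, boolean wasQuoted) {
        return (!wasQuoted && cur.length() == 0) ? null : cur.toString();
    }

    public static void main(String[] args) {
        List<String> cols = parseLine("\"abcd\",\"12345\",\"success,1234,out\",,\"hai\"");
        for (int i = 0; i < cols.size(); i++) {
            System.out.println("Column " + (i + 1) + ": " + cols.get(i));
        }
    }
}
```

For real-world CSV (escaped quotes, multi-line fields), a dedicated parser such as the OpenCsv reader mentioned above is the safer choice.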
