I am new to regular expression syntax. After a whole day of searching on Google, I still can't find a good regex in Java to extract what I want from a string.
For example, I have:
stringA = "-3.5 + 2 * 3 / 2"
stringB = "2 * 3 / 2 - 3.5";
The regex I used was
regex = "[\\+\\-\\*\\/]" --> matches +, -, * or / in the target.
With this, I capture every sign in the string, including the negative sign.
However, I want to capture the minus sign (-) only when it is followed by whitespace.
That is, I want the result for
stringA to be [+, *, /] and for stringB to be [*, /, -].
I realized I only need to add another condition to the regex for the minus sign, something like
regex = "[\\+{\\-\\s}\\*\\/]" --> the same as before, but with
the extra condition that the "-" sign must be followed by whitespace.
Square brackets do not work this way. Could anyone kindly show me how to add this condition to the original regex, or write a new regex that meets the need? Thank you so much in advance.
Chi, this might be the simple regex you're looking for:
[+*/]|(?<=\s)-
How does it work?
There is an alternation | in the middle, which is a way of saying "match this or match that."
On the left, the character class [+*/] matches one character that is a +, * or /
On the right, the lookbehind (?<=\s) asserts "preceded by a whitespace character", then we match a minus.
How to use it?
List<String> matchList = new ArrayList<String>();
try {
Pattern regex = Pattern.compile("[+*/]|(?<=\\s)-");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
matchList.add(regexMatcher.group());
}
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
}
If you are interested, you may want to read up on regex lookaheads and lookbehinds.
Let me know if you have any questions.
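As a quick check against the two strings from the question, the lookbehind pattern can be wrapped in a small helper. This is only a sketch: the class name OperatorExtractor and the method extractOperators are made up for illustration and are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OperatorExtractor {
    // +, * or / anywhere, or a minus that is preceded by a whitespace character.
    private static final Pattern OPERATOR = Pattern.compile("[+*/]|(?<=\\s)-");

    // Hypothetical helper: collects every operator match in order of appearance.
    static List<String> extractOperators(String expression) {
        List<String> operators = new ArrayList<>();
        Matcher matcher = OPERATOR.matcher(expression);
        while (matcher.find()) {
            operators.add(matcher.group());
        }
        return operators;
    }

    public static void main(String[] args) {
        System.out.println(extractOperators("-3.5 + 2 * 3 / 2")); // [+, *, /]
        System.out.println(extractOperators("2 * 3 / 2 - 3.5"));  // [*, /, -]
    }
}
```

The leading minus in the first string is skipped because nothing precedes it, so the lookbehind cannot succeed there.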
What you can do is ditch the character class (the pattern enclosed in []), use alternation (|) instead, and add a negative lookahead to the minus sign so that it does not match when it is followed by a digit:
String input0 = "2 * 3 / 2 - 3.5";
String input1 = "-3.5 + 2 * 3 / 2";
Pattern p = Pattern.compile("\\+|\\-(?!\\d)|\\*|/");
Matcher m = p.matcher(input0);
while (m.find()) {
System.out.println(m.group());
}
System.out.println();
m = p.matcher(input1);
while (m.find()) {
System.out.println(m.group());
}
Output
*
/
-
+
*
/
Yet another solution.
Maybe you want to catch the minus sign regardless of whitespace, depending instead on its meaning, i.e. as a binary minus operator and not as the sign before a number.
You could have a binary minus without any space at all, as in 3-5, or a minus sign before a number with a space between them (which is allowed in many programming languages, Java included). So, to catch your tokens properly (positive and negative numbers plus binary operators), you can try this:
public static void main(String[] args) {
String numberPattern = "(?:-? *\\d+(?:\\.\\d+)?(?:E[+-]?\\d+)?)";
String opPattern = "[+*/-]";
Pattern tokenPattern = Pattern.compile(numberPattern + "|" + opPattern);
String stringA = "-3.5 + -2 * 3 / 2";
Matcher matcher = tokenPattern.matcher(stringA);
while(matcher.find()) {
System.out.println(matcher.group().trim());
}
}
Here you are catching operators AND ALSO operands, regardless of white spaces. If you only need the binary operators, just filter them.
Try with the string "-3.5+-2*3/2" (without spaces at all) and you'll have your tokens anyway.
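One caveat: the number pattern above treats any minus directly in front of a number as a sign, so an input like 3-5 comes out as the tokens 3 and -5. If the binary minus has to be told apart from the sign, one possible approach is to look at the previous token. The following is a sketch under that assumption; the class MinusAwareTokenizer and its method are hypothetical names, and exponents are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MinusAwareTokenizer {
    // Unsigned numbers and single-character operators; whitespace is skipped by find().
    private static final Pattern TOKEN = Pattern.compile("\\d+(?:\\.\\d+)?|[+*/-]");

    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        boolean previousWasNumber = false; // nothing seen yet counts as "not a number"
        boolean pendingSign = false;       // a unary minus waiting for its number
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            String token = m.group();
            boolean isNumber = Character.isDigit(token.charAt(0));
            if (token.equals("-") && !previousWasNumber) {
                pendingSign = true;        // minus at the start or after an operator: a sign
                continue;
            }
            if (isNumber && pendingSign) {
                token = "-" + token;       // attach the sign to the number
                pendingSign = false;
            }
            tokens.add(token);
            previousWasNumber = isNumber;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("3-5"));               // [3, -, 5]
        System.out.println(tokenize("-3.5 + -2 * 3 / 2")); // [-3.5, +, -2, *, 3, /, 2]
    }
}
```

A minus counts as binary only when the token before it is a number; parentheses would need an extra rule.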
Try String#replaceAll(). It's a very simple pattern.
// [any digit] or [minus followed by any digit] or [decimal]
String regex = "(\\d|-\\d|\\.)";
String stringA = "-3.5 + 2 * 3 / 2";
String stringA1 = stringA.replaceAll(regex, "").trim();
System.out.println(stringA1);
String stringB = "2 * 3 / 2 - 3.5";
String stringB1 = stringB.replaceAll(regex, "").trim();
System.out.println(stringB1);
output
+ * /
* / -
Note : You can get all the operators using String#split("\\s+").
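To make the note about String#split concrete, here is a small sketch that strips the numbers and then splits what is left on whitespace. The class and method names are invented for this example.

```java
import java.util.Arrays;

class OperatorsBySplit {
    // Removes digits, a minus directly followed by a digit, and decimal points,
    // then splits the remaining operators on runs of whitespace.
    static String[] operators(String expression) {
        String stripped = expression.replaceAll("(\\d|-\\d|\\.)", "").trim();
        return stripped.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(operators("-3.5 + 2 * 3 / 2"))); // [+, *, /]
        System.out.println(Arrays.toString(operators("2 * 3 / 2 - 3.5")));  // [*, /, -]
    }
}
```

This relies on the operators being separated by whitespace; for an input with no operators at all, split returns a single empty string.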
I have a String:
String thestra = "/aaa/bbb/ccc/ddd/eee";
Every time, in my situation, for this String, a minimum of two slashes will be present without fail.
I am getting /aaa/, the substring between the first two occurrences of the char / in the String, like below:
System.out.println("/" + thestra.split("\\/")[1] + "/");
It serves my purpose, but I am wondering if there is a cleaner, more elegant alternative.
Please note that I need both slashes (leading and trailing) around aaa, i.e. /aaa/
You can use indexOf, which accepts a second argument for an index to start searching from:
int start = thestra.indexOf("/");
int end = thestra.indexOf("/", start + 1) + 1;
System.out.println(thestra.substring(start, end));
Whether or not it's more elegant is a matter of opinion, but at least it doesn't find every / in the string or create an unnecessary array.
Scanner::findInLine, which returns the first match of the pattern, may be used:
String thestra = "/aaa/bbb/ccc/ddd/eee";
System.out.println(new Scanner(thestra).findInLine("/[^/]*/"));
Output:
/aaa/
Use Pattern and Matcher from java.util.regex.
Pattern pattern = Pattern.compile("/.*?/");
Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
String match = matcher.group(0); // output
}
Pattern.compile("/.*?/")
.matcher(thestra)
.results()
.map(MatchResult::group)
.findFirst().ifPresent(System.out::println);
You can test this variant :)
With best regards, Fr0z3Nn
"Every time, in my situation, for this String, minimum two slashes will be present"
If that is guaranteed, split at each / while keeping the delimiters, and take the first three substrings.
String str = String.format("%s%s%s",(thestra.split("((?<=\\/)|(?=\\/))")));
You could also match the leading forward slash, then use a negated character class [^/]* to optionally match any character except / and then match the trailing forward slash.
String thestra = "/aaa/bbb/ccc/ddd/eee";
Pattern pattern = Pattern.compile("/[^/]*/");
Matcher matcher = pattern.matcher(thestra);
if (matcher.find()) {
System.out.println(matcher.group());
}
Output
/aaa/
One of the many ways can be replacing the string with group#1 of the regex, [^/]*(/[^/].*?/).* as shown below:
public class Main {
public static void main(String[] args) {
String thestra = "/aaa/bbb/ccc/ddd/eee";
String result = thestra.replaceAll("[^/]*(/[^/].*?/).*", "$1");
System.out.println(result);
}
}
Output:
/aaa/
Explanation of the regex:
[^/]* : Not the character, /, any number of times
( : Start of group#1
/ : The character, /
[^/]: Not the character, /
.*?: Any character any number of times (lazy match)
/ : The character, /
) : End of group#1
.* : Any character any number of times
Updated the answer as per the following valuable suggestion from Holger:
Note that to the Java regex engine, the / has no special meaning, so there is no need for escaping here. Further, since you're only expecting a single match (the .* at the end ensures this), replaceFirst would be more idiomatic. And since there was no statement about the first / always being at the beginning of the string, prepending the pattern with either .*? or [^/]* would be a good idea.
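The suggestion about replaceFirst can be sketched as follows. The regex is the one from the answer; the class and method names are only placeholders for this example.

```java
class FirstSegment {
    // Keeps only group 1: the first "/.../" segment, wherever the first slash is.
    static String firstSegment(String s) {
        return s.replaceFirst("[^/]*(/[^/].*?/).*", "$1");
    }

    public static void main(String[] args) {
        System.out.println(firstSegment("/aaa/bbb/ccc/ddd/eee")); // /aaa/
        System.out.println(firstSegment("xx/aaa/bbb"));           // /aaa/
    }
}
```

If the pattern does not match at all (for instance when the string contains no slash), replaceFirst returns the input unchanged, so callers may want to check for that case.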
I am surprised nobody mentioned using Path as of Java 7.
String thestra = "/aaa/bbb/ccc/ddd/eee";
String path = Paths.get(thestra).getName(0).toString();
System.out.println("/" + path + "/");
/aaa/
String thestra = "/aaa/bbb/ccc/ddd/eee";
System.out.println(thestra.substring(0, thestra.indexOf("/", 2) + 1));
I need to extract a substring from a string using regex. The tricky (for me) part is that the string may be in one of two formats:
either LLDDDDLDDDDDDD/DDD (eg. AB1000G242424/001) or just between 1 and 7 digits (eg. 242424).
The substring I need to extract is:
If string is 7 digits or longer, then extract substring consisting of 7 digits.
Else (if string is shorter than 7 digits), then extract substring consisting of 1-6 digits.
Below is one of my tries.
String regex = ("([0-9]{7}|[0-9]{0,6})");
Pattern pattern = Pattern.compile(regex);
Matcher matcher;
matcher = pattern.matcher("242424");
String extractedNr1 = "";
while (matcher.find()) {
extractedNr1 += matcher.group();
}
matcher = pattern.matcher("AB1000G242424/001");
String extractedNr2 = "";
while (matcher.find()) {
extractedNr2 += matcher.group();
}
System.out.println("ExtractedNr1 = " + extractedNr1);
System.out.println("ExtractedNr2 = " + extractedNr2);
Output:
ExtractedNr1 = 242424
ExtractedNr2 = 1000242424001
I understand the second one is a concatenation of all the matches, but I don't understand why the matches are arranged like that. Can I make a regex that stops immediately after finding a match (with priority for the first option, that is, 7 digits)?
I thought about using some conditional statement, but apparently these are not supported in java.util.regex, and I cannot use a third-party library.
I can obviously do this in plain Java, but the whole point is to use regex.
Regex is a secondary concern here; the runs of digits must be compared by length. Since in regex \d stands for a digit and \D for a non-digit, you can use Pattern.splitAsStream as follows:
// needs java.util.Comparator, java.util.Optional and java.util.regex.Pattern
Optional<String> takeDigits(String s) {
return Pattern.compile("\\D+").splitAsStream(s)
.filter(w -> !w.isEmpty() && w.length() <= 7)
.max(Comparator.comparingInt(String::length));
}
You can use String.replaceAll to remove the non-digit characters:
String extracted = new String("AB1000G242424/001").replaceAll("[^0-9]","");
if (extracted.length() > 7)
extracted = extracted.substring(0, 7);
Output:
1000242
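On the question of why the matches are "arranged like that": each call to find() returns the next match in the string, so the while loop concatenates every run of digits. Alternation also does not give the 7-digit option priority over the whole string; the alternatives are tried at each position from left to right, so the leftmost run wins. A sketch that gives 7 digits priority by trying two patterns in turn (the names here are hypothetical, and it reads the requirement as "first 7 consecutive digits, otherwise the first shorter run"):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitRunExtractor {
    private static final Pattern SEVEN = Pattern.compile("\\d{7}");
    private static final Pattern UP_TO_SIX = Pattern.compile("\\d{1,6}");

    // Prefers the first run of 7 consecutive digits; otherwise falls back
    // to the first run of 1 to 6 digits. Only the first match is used.
    static Optional<String> extract(String s) {
        Matcher seven = SEVEN.matcher(s);
        if (seven.find()) {
            return Optional.of(seven.group());
        }
        Matcher shorter = UP_TO_SIX.matcher(s);
        return shorter.find() ? Optional.of(shorter.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("242424"));             // Optional[242424]
        System.out.println(extract("AB1000G2424242/001")); // Optional[2424242]
    }
}
```

Using if (find()) instead of while (find()) is what stops at the first match.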
I have a method for masking phone numbers. I need to replace all digits with stars except the last 4.
Sample inputs would be: +91 (333) 444-5678 and +1(333) 456-7890. Outputs should look this way: +**-***-***-5678 and +*-***-***-7890.
But my output actually looks like this: +**--***--***-5678 and +*-***--***-7890.
So here is my code:
public static String maskPhoneNumber(String inputPhoneNum){
return inputPhoneNum.replaceAll("\\(", "-")
.replaceAll("\\)", "-")
.replaceAll(" ", "-")
.replaceAll("\\d(?=(?:\\D*\\d){4})", "*");
}
My method works with country codes of different lengths, but it breaks when the group of three digits after the country code is wrapped in brackets instead of being separated by single spaces.
I would be grateful for some hints on how I can improve my approach!
Currently, you replace each individual space, ( and ) with a -. You need to replace each run of consecutive occurrences with a single hyphen.
Use
public static String maskPhoneNumber(String inputPhoneNum){
return inputPhoneNum.replaceAll("[()\\s]+", "-")
.replaceAll("\\d(?=(?:\\D*\\d){4})", "*");
}
See this Java demo.
The +91 (333) 444-5678 turns into +**-***-***-5678 and +1(333) 456-7890 turns into +*-***-***-7890.
The [()\s]+ pattern matches 1 or more (+) consecutive (, ) or whitespace chars. See the "normalization" step regex demo and the final step demo.
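To see the two steps separately, the normalization and the masking can be split into two small methods. This is just an illustrative sketch; the class and method names are not from the original post.

```java
class PhoneMasker {
    // Step 1: collapse every run of brackets and whitespace into one hyphen.
    static String normalizeSeparators(String phone) {
        return phone.replaceAll("[()\\s]+", "-");
    }

    // Step 2: replace each digit that still has at least four digits after it.
    static String maskAllButLastFour(String phone) {
        return phone.replaceAll("\\d(?=(?:\\D*\\d){4})", "*");
    }

    public static void main(String[] args) {
        String normalized = normalizeSeparators("+91 (333) 444-5678");
        System.out.println(normalized);                     // +91-333-444-5678
        System.out.println(maskAllButLastFour(normalized)); // +**-***-***-5678
    }
}
```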
There is a dedicated API in the language itself for that, in the form of appendReplacement (the StringBuilder overloads used here require Java 9 or later):
String test = "+91 (333) 444-5678";
test = test.replaceAll("[()\\s]+", "-");
Pattern p = Pattern.compile("\\d+(?!\\d*$)");
Matcher m = p.matcher(test);
StringBuilder sb = new StringBuilder(); // +**-***-***-5678
for (; m.find();) {
m.appendReplacement(sb, m.group().replaceAll(".", "*"));
}
m.appendTail(sb);
System.out.println(sb.toString());
So I think I have already solved it. What I did:
String pattern = "(<=|>=)\\s{0,2}(([+]\\s{0,2})?(\\d+\\s{0,2}[/]\\s{0,2}(\\d{2,}|[1-9])\\s{0,2}|\\d+[.]\\d{1,2}|\\d+))\\s{0,2}";
Something was wrong with the pattern; I have corrected it above and now it works :)
I have an inequation that may contain >= or <=, some whitespace, and a number. That number might be an integer, a decimal number with 2 decimal places, or a fraction, and I want to retrieve the number on the right-hand side of the inequation with a Matcher. Example:
4x1 + 6x2 <= 40/3
I tried to construct such a pattern and was able to match it. But then I remembered that a fraction cannot have a zero denominator, so I want to check that as well. For that I used the following code:
String inequation = "4x1 + 6x2 <= 40/3";
String pattern = "(<=|>=)\\s{0,2}(([+]\\s{0,2})?(\\d+\\s{0,2}[/]\\s{0,2}(\\d{2,}|[1-9])\\s{0,2}\\d+|\\d+[.]\\d{1,2}|\\d+))\\s{0,2}";
Pattern ptrn = Pattern.compile(pattern);
Matcher match = ptrn.matcher(inequation);
if(match.find()){
String fraction = match.group(2);
System.out.println(fraction);
} else {
System.out.println("NO MATCH");
}
But it's not working as expected. If the denominator has at least 2 digits, it returns correctly (e.g. 40/32). But if it has only 1 digit, it returns only the integer part (e.g. 40).
Is there any way to solve this?
Which expression should I use?
Do you just want the number after the inequality sign? Then do:
Matcher m = Pattern.compile("[<>]=?\\s*(.+?)\\s*$").matcher(string);
String number = m.find() ? m.group(1) : null;
You could try using debuggex to build regular expressions. It shows you a nice diagram of your expression and you can test your inputs as well.
Java implementation (validates that the denominator is non-zero):
Matcher m = Pattern.compile("[<>]=?\\s{0,2}([0-9]*(/[1-9][0-9]*)?)$").matcher("4x1 + 6x2 <= 40/3");
if (m.find()) {
System.out.println(m.group(1));
}
You need an '$' at the end of your expression, so that it tries to match the entire inequality.
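Another option is to let the regex capture the denominator and do the zero check in Java, which also rejects denominators such as 00. The following is a sketch under that assumption; the class InequationRhs and the method rightHandSide are illustrative names, and the optional leading + from the question is left out.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InequationRhs {
    // Group 1: integer, decimal with 1-2 places, or fraction at the end of the input.
    // Group 2: the denominator, present only for fractions.
    private static final Pattern RHS = Pattern.compile(
            "[<>]=\\s*(\\d+(?:\\s*/\\s*(\\d+)|\\.\\d{1,2})?)\\s*$");

    static Optional<String> rightHandSide(String inequation) {
        Matcher m = RHS.matcher(inequation);
        if (!m.find()) {
            return Optional.empty();
        }
        String denominator = m.group(2);
        if (denominator != null && denominator.matches("0+")) {
            return Optional.empty(); // division by zero is rejected
        }
        return Optional.of(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(rightHandSide("4x1 + 6x2 <= 40/3")); // Optional[40/3]
        System.out.println(rightHandSide("4x1 + 6x2 <= 40/0")); // Optional.empty
    }
}
```

The trailing $ forces the whole right-hand side to be consumed, so a one-digit denominator is no longer cut off.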
No question on SO addresses my particular problem. I know very little about regular expressions. I am building an expression parser in Java using the regex classes. I want to extract operands, arguments, operators, symbols and function names from an expression and save them to an ArrayList. Currently I am using this logic:
String string = "2!+atan2(3+9,2+3)-2*PI+3/3-9-12%3*sin(9-9)+(2+6/2)"; // For testing only; later it will be provided by the user
List<String> res = new ArrayList<>();
Pattern pattern = Pattern.compile("(\\Q^\\E|\\Q/\\E|\\Q-\\E|\\Q-\\E|\\Q+\\E|\\Q*\\E|\\Q)\\E|\\Q)\\E|\\Q(\\E|\\Q(\\E|\\Q%\\E|\\Q!\\E)"); // This string is built in a function from the provided operator names, which means the user can add custom operators and custom functions
Matcher m = pattern.matcher(string);
int pos = 0;
while (m.find())
{
if (pos != m.start())
{
res.add(string.substring(pos, m.start()));
}
res.add(m.group());
pos = m.end();
}
if (pos != string.length())
{
addToTokens(res, string.substring(pos));
}
for(String s : res)
{
System.out.println(s);
}
Output:
2
!
+
atan2
(
3
+
9
,
2
+
3
)
-
2
*
PI
+
3
/
3
-
9
-
12
%
3
*
sin
(
9
-
9
)
+
(
2
+
6
/
2
)
The problem is that the expression can now contain a matrix in a user-defined format. I want to treat every matrix as an operand, or as an argument in the case of functions.
Input 1:
String input_1 = "2+3-9*[{2+3,2,6},{7,2+3,2+3i}]+9*6";
Output Should be:
2
+
3
-
9
*
[{2+3,2,6},{7,2+3,2+3i}]
+
9
*
6
Input 2:
String input_2 = "{[2,5][9/8,func(2+3)]}+9*8/5";
Output Should be:
{[2,5][9/8,func(2+3)]}
+
9
*
8
/
5
Input 3:
String input_3 = "<[2,9,2.36][2,3,2!]>*<[2,3,9][23+9*8/8,2,3]>";
Output Should be:
<[2,9,2.36][2,3,2!]>
*
<[2,3,9][23+9*8/8,2,3]>
I want the ArrayList to contain every operand, operator, argument, function and symbol, one per index. How can I achieve my desired output using regular expressions? Expression validation is not required.
I think you can try with something like:
(?<matrix>(?:\[[^\]]+\])|(?:<[^>]+>)|(?:\{[^\}]+\}))|(?<function>\w+(?=\())|(\d+[eE][-+]\d+)|(?<operand>\w+)|(?<operator>[-+\/*%])|(?<symbol>.)
DEMO
The elements are captured in named capturing groups. If you don't need them, you can use the shorter form:
\[[^\]]+\]|<[^>]+>|\{[^\}]+\}|\d+[eE][-+]\d+|\w+(?=\()|\w+|[-+\/*%]|.
The \[[^\]]+\]|<[^>]+>|\{[^\}]+\} part matches an opening bracket ({, [ or <), then characters that are not the matching closing bracket, then the closing bracket (}, ] or >), so as long as there are no nested brackets of the same type, there is no problem.
Implementation in Java:
public class Test {
public static void main(String[] args) {
String[] expressions = {"2!+atan2(3+9,2+3)-2*PI+3/3-9-12%3*sin(9-9)+(2+6/2)", "2+3-9*[{2+3,2,6},{7,2+3,2+3i}]+9*6",
"{[2,5][9/8,func(2+3)]}+9*8/5","<[2,9,2.36][2,3,2!]>*<[2,3,9][23 + 9 * 8 / 8, 2, 3]>"};
Pattern pattern = Pattern.compile("(?<matrix>(?:\\[[^]]+])|(?:<[^>]+>)|(?:\\{[^}]+}))|(?<function>\\w+(?=\\())|(?<operand>\\w+)|(?<operator>[-+/*%])|(?<symbol>.)");
for(String expression : expressions) {
List<String> elements = new ArrayList<String>();
Matcher matcher = pattern.matcher(expression);
while (matcher.find()) {
elements.add(matcher.group());
}
for (String element : elements) {
System.out.println(element);
}
System.out.println("\n\n\n");
}
}
}
Explanation of alternatives:
\[[^\]]+\]|<[^>]+>|\{[^\}]+\} - matches an opening bracket of a given type, then characters that are not the closing bracket of that type (anything but the closing bracket), and then the closing bracket of that type,
\d+[eE][-+]\d+ - digits, followed by e or E, followed by the sign + or -, followed by digits, to capture elements like 2e+3,
\w+(?=\() - matches one or more word characters (A-Za-z0-9_) if they are followed by (, for matching functions like sin,
\w+ - matches one or more word characters (A-Za-z0-9_), for matching operands,
[-+\/*%] - matches one character from the character class, to match operators,
. - matches any other character, to match other symbols.
The order of alternatives is quite important: the last alternative . will match any character, so it needs to be the last option. The case of \w+(?=\() and \w+ is similar: the second one matches everything the first one does, so it has to come after it. However, if you don't want to distinguish between functions and operands, \w+ alone will be enough for all of them.
In the longer example, the (?<name> ... ) part in every alternative is a named capturing group, and you can see in the demo how it groups the matched fragments into groups like operand, operator, function, etc.
With regular expressions you cannot match any level of nested balanced parentheses.
For example, in your second example {[2,5][9/8,func(2+3)]} you need to match the opening brace with the close brace, but you need to keep track of how many opening and closing inner braces/parens/etc there are. That cannot be done with regular expressions.
If, on the other hand, you simplify your problem to remove any requirement for balancing, then you can probably handle it with regular expressions.
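If nesting does matter, a small hand-written scanner that counts bracket depth is one way around the limitation. The following is only a sketch: it assumes that [, { and < always open a matrix (so < and > cannot also be used as comparison operators), it does not check that bracket types match, and the class name BracketAwareTokenizer is made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class BracketAwareTokenizer {
    private static final String OPEN = "[{<";
    private static final String CLOSE = "]}>";

    static List<String> tokenize(String expr) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < expr.length()) {
            char c = expr.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (OPEN.indexOf(c) >= 0) {
                // Matrix: consume characters until the bracket depth is back to zero.
                int depth = 0;
                do {
                    char d = expr.charAt(i++);
                    if (OPEN.indexOf(d) >= 0) depth++;
                    else if (CLOSE.indexOf(d) >= 0) depth--;
                } while (depth > 0 && i < expr.length());
            } else if (Character.isLetterOrDigit(c)) {
                // Number, constant or function name (a '.' is allowed for decimals).
                while (i < expr.length()
                        && (Character.isLetterOrDigit(expr.charAt(i)) || expr.charAt(i) == '.')) {
                    i++;
                }
            } else {
                i++; // any other character is a one-character operator or symbol
            }
            tokens.add(expr.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{[2,5][9/8,func(2+3)]}+9*8/5"));
        // [{[2,5][9/8,func(2+3)]}, +, 9, *, 8, /, 5]
    }
}
```

Because the depth counter handles nesting, a matrix such as [{2+3,2,6},{7,2+3,2+3i}] comes out as a single token.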