Find dash "-" that's not inside round brackets "()" within String

I'm trying to find/determine if a String contains the character "-" that is not enclosed in round brackets "()".
I've tried the regex
[^\(]*-[^\)]*,
but it's not working.
Examples:
100 - 200 mg -> should match because the "-" is not enclosed in round brackets.
100 (+/-) units -> should NOT match

Do you have to use regex? You could try just iterating over the string and keeping track of the scope like so:
public boolean HasScopedDash(String str)
{
int scope = 0;
boolean foundInScope = false;
for (int i = 0; i < str.length(); i++)
{
char c = str.charAt(i);
if (c == '(')
scope++;
else if (c == '-')
foundInScope = scope != 0;
else if (c == ')' && scope > 0)
{
if (foundInScope)
return true;
scope--;
}
}
return false;
}
Edit: As mentioned in the comments, it might be desirable to exclude cases where the dash comes after an opening parenthesis but no closing parenthesis ever follows. (I.e. "abc(2-xyz") The above edited code accounts for this.
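The method above reports a dash that sits inside brackets. For the question as asked (a dash that is not inside brackets), the same scope-tracking idea can be turned around. This is a minimal sketch, not the answerer's code: the class and method names are hypothetical, a stray ")" with no matching "(" is assumed to be ignored, and everything after an unclosed "(" is treated as being inside brackets.

```java
public class DashFinder {
    // Returns true if the string contains a '-' at nesting depth 0.
    // Assumption: a ')' without a matching '(' is simply ignored.
    public static boolean hasDashOutsideParens(String str) {
        int depth = 0;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')' && depth > 0) {
                depth--;
            } else if (c == '-' && depth == 0) {
                return true; // dash found outside any brackets
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDashOutsideParens("100 - 200 mg"));    // true
        System.out.println(hasDashOutsideParens("100 (+/-) units")); // false
    }
}
```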

You might not need to check for brackets at all. Instead, you could check the characters around the dash. This expression, for instance, requires a digit and whitespace on each side of the dash, and it is much easier to modify:
([0-9]\s+[-]\s+[0-9])
It passes your first input and fails the undesired one. You can allow other characters in the middle by adding them to the [-] character class or by using alternation (|).

Java supports quantified atomic groups, so this works.
It consumes paired parentheses together with their contents,
without giving anything back, until it finds a dash -.
This is done with the atomic group construct (?> ).
^(?>(?>\(.*?\))|[^-])*?-
https://www.regexplanet.com/share/index.html?share=yyyyd8n1dar
(click on the Java button, check the find() function column)
Readable
^
(?>
(?> \( .*? \) )
|
[^-]
)*?
-
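A minimal sketch of how the pattern above could be used from Java. Only the string escaping is added to the regex; the class name, constant name and method name are hypothetical.

```java
import java.util.regex.Pattern;

public class AtomicDashDemo {
    // Same pattern as above, escaped for a Java string literal.
    private static final Pattern UNBRACKETED_DASH =
            Pattern.compile("^(?>(?>\\(.*?\\))|[^-])*?-");

    // find() succeeds only if a dash is reached without passing through "(...)".
    public static boolean hasUnbracketedDash(String s) {
        return UNBRACKETED_DASH.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasUnbracketedDash("100 - 200 mg"));    // true
        System.out.println(hasUnbracketedDash("100 (+/-) units")); // false
    }
}
```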

If you don't mind checking the string with two simple regexes instead of one complicated regex, you can try this instead:
public static boolean match(String input) {
Pattern p1 = Pattern.compile("\\-"); // match dash
Pattern p2 = Pattern.compile("\\(.*\\-.*\\)"); // match dash within bracket
Matcher m1 = p1.matcher(input);
Matcher m2 = p2.matcher(input);
if ( m1.find() && !m2.find() ) {
return true;
} else {
return false;
}
}
Test the string
public static void main(String[] args) {
String input1 = "100 - 200 mg";
String input2 = "100 (+/-) units";
System.out.println(input1 + " : " + ( match(input1) ? "match" : "not match") );
System.out.println(input2 + " : " + ( match(input2) ? "match" : "not match") );
}
The output will be
100 - 200 mg : match
100 (+/-) units : not match

Matcher m = Pattern.compile("\\([^()-]*-[^()]*\\)").matcher(s); return !m.find();
https://ideone.com/YXvuem

Related

Regex to consolidate multiple rules

I'm looking at optimising my string manipulation code and consolidating all of my replaceAll's to just one pattern if possible
Rules -
strip all special chars except -
replace space with -
condense consecutive - 's to just one -
Remove leading and trailing -'s
My code -
public static String slugifyTitle(String value) {
String slugifiedVal = null;
if (StringUtils.isNotEmpty(value))
slugifiedVal = value
.replaceAll("[ ](?=[ ])|[^-A-Za-z0-9 ]+", "") // strips all special chars except -
.replaceAll("\\s+", "-") // converts spaces to -
.replaceAll("--+", "-"); // replaces consecutive -'s with just one -
slugifiedVal = StringUtils.stripStart(slugifiedVal, "-"); // strips leading -
slugifiedVal = StringUtils.stripEnd(slugifiedVal, "-"); // strips trailing -
return slugifiedVal;
}
Does the job but obviously looks shoddy.
My test assertions -
Heading with symbols *~!##$%^&()_+-=[]{};',.<>?/ ==> heading-with-symbols
Heading with an asterisk* ==> heading-with-an-asterisk
Custom-id-&-stuff ==> custom-id-stuff
--Custom-id-&-stuff-- ==> custom-id-stuff

Disclaimer: I don't think a regex approach to this problem is wrong, or that this is an objectively better approach. I am merely presenting an alternative approach as food for thought.
I have a tendency against regex approaches to problems where you have to ask how to solve with regex, because that implies you're going to struggle to maintain that solution in the future. There is an opacity to regexes where "just do this" is obvious, when you know just to do this.
Some problems typically solved with regex, like this one, can be solved using imperative code. It tends to be more verbose, but it uses simple, apparent, code constructs; it's easier to debug; and can be faster because it doesn't involve the full "machinery" of the regex engine.
static String slugifyTitle(String value) {
boolean appendHyphen = false;
StringBuilder sb = new StringBuilder(value.length());
// Go through value one character at a time...
for (int i = 0; i < value.length(); i++) {
char c = value.charAt(i);
if (isAppendable(c)) {
// We have found a character we want to include in the string.
if (appendHyphen) {
// We previously found character(s) that we want to append a single
// hyphen for.
sb.append('-');
appendHyphen = false;
}
sb.append(c);
} else if (requiresHyphen(c)) {
// We want to replace hyphens or spaces with a single hyphen.
// Only append a hyphen if it's not going to be the first thing in the output.
// Doesn't matter if this is set for trailing hyphen/whitespace,
// since we then never hit the "isAppendable" condition.
appendHyphen = sb.length() > 0;
} else {
// Other characters are simply ignored.
}
}
// You can lowercase when appending the character, but `Character.toLowerCase()`
// recommends using `String.toLowerCase` instead.
return sb.toString().toLowerCase(Locale.ROOT);
}
// Some predicate on characters you want to include in the output.
static boolean isAppendable(char c) {
return (c >= 'A' && c <= 'Z')
|| (c >= 'a' && c <= 'z')
|| (c >= '0' && c <= '9');
}
// Some predicate on characters you want to replace with a single '-'.
static boolean requiresHyphen(char c) {
return c == '-' || Character.isWhitespace(c);
}
(This code is wildly over-commented, for the purpose of explaining it in this answer. Strip out the comments and unnecessary things like the else, it's actually not super complicated).

Consider the following regex parts:
Any special chars other than -: [\p{S}\p{P}&&[^-]]+ (character class subtraction)
Any one or more whitespace or hyphens: [\s-]+ (this will be used to replace with a single -)
You will still need to remove leading/trailing hyphens, it will be a separate post-processing step. If you wish, you can use a ^-+|-+$ regex.
So, you can only reduce this to three .replaceAll invocations keeping the code precise and readable:
public static String slugifyTitle(String value) {
String slugifiedVal = null;
if (value != null && !value.trim().isEmpty())
slugifiedVal = value.toLowerCase()
.replaceAll("[\\p{S}\\p{P}&&[^-]]+", "") // strips all special chars except -
.replaceAll("[\\s-]+", "-") // converts spaces/hyphens to -
.replaceAll("^-+|-+$", ""); // remove trailing/leading hyphens
return slugifiedVal;
}
See the Java demo:
List<String> strs = Arrays.asList("Heading with symbols *~!##$%^&()_+-=[]{};',.<>?/",
"Heading with an asterisk*",
"Custom-id-&-stuff",
"--Custom-id-&-stuff--");
for (String str : strs) {
System.out.println("\"" + str + "\" => " + slugifyTitle(str));
}
Output:
"Heading with symbols *~!##$%^&()_+-=[]{};',.<>?/" => heading-with-symbols
"Heading with an asterisk*" => heading-with-an-asterisk
"Custom-id-&-stuff" => custom-id-stuff
"--Custom-id-&-stuff--" => custom-id-stuff
NOTE: if your strings can contain any Unicode whitespace, replace "[\\s-]+" with "(?U)[\\s-]+".

How to replace excessive SQL wildcard by single regex pattern?

I am creating a function that strips the illegal wildcard patterns from the input string. The ideal solution should use a single regex expression, if at all possible.
The illegal wildcard patterns are: %% and %_%. Each instance of those should be replaced with %.
Here's the rub... I'm trying to perform some fuzz testing by running the function against various inputs to try to make it and break it.
It works for the most part; however, with complicated inputs, it doesn't.
The rest of this question has been updated:
The following inputs should return empty string (not an exhaustive list):
The following inputs should return % (not an exhaustive list).
%_%
%%
%%_%%
%_%%%
%%_%_%
%%_%%%_%%%_%
There will be cases where there are other characters with the input... like:
Foo123%_%
Should return "Foo123%"
B4r$%_%
Should return "B4r$%"
B4rs%%_%
Should return "B4rs%"
%%Lorem_%%
Should return "%Lorem_%"
I have tried using several different patterns and my tests are failing.
String input = "%_%%%%_%%%_%";
// old method:
public static String ancientMethod1(String input){
if (input == null)
return "";
return input.replaceAll("%_%", "").replaceAll("%%", ""); // Output: ""
}
// Attempt 1:
// Doesn't quite work right.
// "A%%" is returned as "A%%" instead of "A%"
public static String newMethod1(String input) {
String result = input;
while (result.contains("%%") || result.contains("%_%"))
result = result.replaceAll("%%","%").replaceAll("%_%","%");
if (result.equals("%"))
return "";
return input;
}
// Attempt 2:
// Succeeds, but I would like to simplify this:
public static String newMethod2(String input) {
if (input == null)
return "";
String illegalPattern1 = "%%";
String illegalPattern2 = "%_%";
String result = input;
while (result.contains(illegalPattern1) || result.contains(illegalPattern2)) {
result = result.replace(illegalPattern1, "%");
result = result.replace(illegalPattern2, "%");
}
if (result.equals("%") || result.equals("_"))
return "";
return result;
}
Here's a more complete defined example of how I'm using this: https://gist.github.com/sometowngeek/697c839a1bf1c9ee58be283b1396cf2e

This regular expression string matches all your examples:
"%(?:_?%)+"
It matches strings consisting of a '%' character followed by one or more sequences consisting of zero or one '_' character and one '%' character (close to literal translation), which is another way of saying what I did in comments: "a sequence of '%' and '_' characters, beginning and ending with '%', and not containing two consecutive '_' characters".
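As a sketch of how this pattern might be plugged into the function from the question: every matched run is replaced with a single %. The class and method names are hypothetical, and returning an empty string for null input is an assumption carried over from the question's own attempts.

```java
public class WildcardCleaner {
    // Collapses every run matched by %(?:_?%)+ into a single '%'.
    public static String collapseWildcards(String input) {
        if (input == null) {
            return ""; // assumption: mirrors the null handling in the question
        }
        return input.replaceAll("%(?:_?%)+", "%");
    }

    public static void main(String[] args) {
        System.out.println(collapseWildcards("%%_%%%_%%%_%")); // %
        System.out.println(collapseWildcards("Foo123%_%"));    // Foo123%
        System.out.println(collapseWildcards("%%Lorem_%%"));   // %Lorem_%
    }
}
```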

I'm not sure whether the inputs can take forms other than those listed. If not, an expression with start and end anchors may be more applicable here, either one pattern per input or something similar to:
^%{1,3}(_%{1,3})?(_%{1,3})?(_%)?$
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "^%{1,3}(_%{1,3})?(_%{1,3})?(_%)?$";
final String string = "%_%\n"
+ "%%\n"
+ "%%_%%\n"
+ "%%%_%%%\n"
+ "%_%%%\n"
+ "%%%_%\n"
+ "%%_%_%\n"
+ "%%_%%%_%%%_%";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}

Your newMethod1 actually works, except you have a typo: you're returning the input parameter, not the result of your processing!
Change:
return input; // oops!
to:
return result;
Also, because you're not using regex, you should use replace() rather than replaceAll(), ie:
result = result.replace("%%","%").replace("%_%","%"); // still replaces all occurrences
replace() still replaces all occurrences.
BTW, although not as strict, this works for all of your (currently) posted examples:
public static String myMethod(String input) {
return input.replaceAll("%[%_]*", "%");
}

It looks like all the patterns start with %, then have 0+ % or _ chars and end with %.
Use a mere
input = input.replaceAll("%[%_]*%", "%");
Details
% - a % char
[%_]* - 0 or more % or _ chars
% - a % char.

Make regex not affecting Quotation mark [duplicate]

I have a string vaguely like this:
foo,bar,c;qual="baz,blurb",d;junk="quux,syzygy"
that I want to split by commas -- but I need to ignore commas in quotes. How can I do this? It seems like a regexp approach fails; I suppose I can manually scan and enter a different mode when I see a quote, but it would be nice to use preexisting libraries. (edit: I guess I meant libraries that are already part of the JDK or part of a commonly used library like Apache Commons.)
the above string should split into:
foo
bar
c;qual="baz,blurb"
d;junk="quux,syzygy"
note: this is NOT a CSV file, it's a single string contained in a file with a larger overall structure

Try:
public class Main {
public static void main(String[] args) {
String line = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\"";
String[] tokens = line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", -1);
for(String t : tokens) {
System.out.println("> "+t);
}
}
}
Output:
> foo
> bar
> c;qual="baz,blurb"
> d;junk="quux,syzygy"
In other words: split on the comma only if that comma has zero, or an even number of quotes ahead of it.
Or, a bit friendlier for the eyes:
public class Main {
public static void main(String[] args) {
String line = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\"";
String otherThanQuote = " [^\"] ";
String quotedString = String.format(" \" %s* \" ", otherThanQuote);
String regex = String.format("(?x) "+ // enable comments, ignore white spaces
", "+ // match a comma
"(?= "+ // start positive look ahead
" (?: "+ // start non-capturing group 1
" %s* "+ // match 'otherThanQuote' zero or more times
" %s "+ // match 'quotedString'
" )* "+ // end group 1 and repeat it zero or more times
" %s* "+ // match 'otherThanQuote'
" $ "+ // match the end of the string
") ", // stop positive look ahead
otherThanQuote, quotedString, otherThanQuote);
String[] tokens = line.split(regex, -1);
for(String t : tokens) {
System.out.println("> "+t);
}
}
}
which produces the same as the first example.
EDIT
As mentioned by @MikeFHay in the comments:
I prefer using Guava's Splitter, as it has saner defaults (see the discussion above about empty matches being trimmed by String#split()), so I did:
Splitter.on(Pattern.compile(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"))

While I do like regular expressions in general, for this kind of state-dependent tokenization I believe a simple parser (which in this case is much simpler than that word might make it sound) is probably a cleaner solution, in particular with regards to maintainability, e.g.:
String input = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\"";
List<String> result = new ArrayList<String>();
int start = 0;
boolean inQuotes = false;
for (int current = 0; current < input.length(); current++) {
if (input.charAt(current) == '\"') inQuotes = !inQuotes; // toggle state
else if (input.charAt(current) == ',' && !inQuotes) {
result.add(input.substring(start, current));
start = current + 1;
}
}
result.add(input.substring(start));
If you don't care about preserving the commas inside the quotes you could simplify this approach (no handling of start index, no last character special case) by replacing your commas in quotes by something else and then split at commas:
String input = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\"";
StringBuilder builder = new StringBuilder(input);
boolean inQuotes = false;
for (int currentIndex = 0; currentIndex < builder.length(); currentIndex++) {
char currentChar = builder.charAt(currentIndex);
if (currentChar == '\"') inQuotes = !inQuotes; // toggle state
if (currentChar == ',' && inQuotes) {
builder.setCharAt(currentIndex, ';'); // or '♡', and replace later
}
}
List<String> result = Arrays.asList(builder.toString().split(","));

http://sourceforge.net/projects/javacsv/
https://github.com/pupi1985/JavaCSV-Reloaded
(fork of the previous library that will allow the generated output to have Windows line terminators \r\n when not running Windows)
http://opencsv.sourceforge.net/
CSV API for Java
Can you recommend a Java library for reading (and possibly writing) CSV files?
Java lib or app to convert CSV to XML file?

I would not advise using the regex answer from Bart; I find a parsing solution better in this particular case (as Fabian proposed). I tried the regex solution and my own parsing implementation, and found that:
Parsing is much faster than splitting with the lookahead regex: ~20 times faster for short strings, ~40 times faster for long strings.
The regex split fails to find the empty string after the last comma. That was not in the original question, though; it was my requirement.
My solution and test below.
String tested = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\",";
long start = System.nanoTime();
String[] tokens = tested.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
long timeWithSplitting = System.nanoTime() - start;
start = System.nanoTime();
List<String> tokensList = new ArrayList<String>();
boolean inQuotes = false;
StringBuilder b = new StringBuilder();
for (char c : tested.toCharArray()) {
switch (c) {
case ',':
if (inQuotes) {
b.append(c);
} else {
tokensList.add(b.toString());
b = new StringBuilder();
}
break;
case '\"':
inQuotes = !inQuotes;
// intentional fall-through: the quote itself is appended below
default:
b.append(c);
break;
}
}
tokensList.add(b.toString());
long timeWithParsing = System.nanoTime() - start;
System.out.println(Arrays.toString(tokens));
System.out.println(tokensList.toString());
System.out.printf("Time with splitting:\t%10d\n",timeWithSplitting);
System.out.printf("Time with parsing:\t%10d\n",timeWithParsing);
Of course you are free to change the switch to else-ifs in this snippet if you feel uncomfortable with its ugliness. Note the lack of a break after the quote case: the fall-through into default is intentional, so the quote character is appended to the token. StringBuilder was chosen instead of StringBuffer by design to increase speed, since thread safety is irrelevant here.

You're in that annoying boundary area where regexps almost won't do (as has been pointed out by Bart, escaping the quotes would make life hard), and yet a full-blown parser seems like overkill.
If you are likely to need greater complexity any time soon I would go looking for a parser library. For example this one

I was impatient and chose not to wait for answers... for reference it doesn't look that hard to do something like this (which works for my application, I don't need to worry about escaped quotes, as the stuff in quotes is limited to a few constrained forms):
final static private Pattern splitSearchPattern = Pattern.compile("[\",]");
private List<String> splitByCommasNotInQuotes(String s) {
if (s == null)
return Collections.emptyList();
List<String> list = new ArrayList<String>();
Matcher m = splitSearchPattern.matcher(s);
int pos = 0;
boolean quoteMode = false;
while (m.find())
{
String sep = m.group();
if ("\"".equals(sep))
{
quoteMode = !quoteMode;
}
else if (!quoteMode && ",".equals(sep))
{
int toPos = m.start();
list.add(s.substring(pos, toPos));
pos = m.end();
}
}
if (pos < s.length())
list.add(s.substring(pos));
return list;
}
(exercise for the reader: extend to handling escaped quotes by looking for backslashes also.)

Try a lookaround like (?<!\"),(?!\"). This should match a , that is not directly preceded or followed by a ".

The simplest approach is not to match delimiters, i.e. commas, with a complex additional logic to match what is actually intended (the data which might be quoted strings), just to exclude false delimiters, but rather match the intended data in the first place.
The pattern consists of two alternatives, a quoted string ("[^"]*" or ".*?") or everything up to the next comma ([^,]+). To support empty cells, we have to allow the unquoted item to be empty and to consume the next comma, if any, and use the \\G anchor:
Pattern p = Pattern.compile("\\G\"(.*?)\",?|([^,]*),?");
The pattern also contains two capturing groups to get either, the quoted string’s content or the plain content.
Then, with Java 9, we can get an array as
String[] a = p.matcher(input).results()
.map(m -> m.group(m.start(1)<0? 2: 1))
.toArray(String[]::new);
whereas older Java versions need a loop like
for(Matcher m = p.matcher(input); m.find(); ) {
String token = m.group(m.start(1)<0? 2: 1);
System.out.println("found: "+token);
}
Adding the items to a List or an array is left as an exercise to the reader.
For Java 8, you can use the results() implementation of this answer, to do it like the Java 9 solution.
For mixed content with embedded strings, like in the question, you can simply use
Pattern p = Pattern.compile("\\G((\"(.*?)\"|[^,])*),?");
But then, the strings are kept in their quoted form.

what about a one-liner using String.split()?
String s = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\"";
String[] split = s.split( "(?<!\".{0,255}[^\"]),|,(?![^\"].*\")" );

A regular expression is not capable of handling escaped characters. For my application, I needed the ability to escape quotes and spaces (my separator is spaces, but the code is the same).
Here is my solution in Kotlin (the language from this particular application), based on the one from Fabian Steeg:
fun parseString(input: String): List<String> {
val result = mutableListOf<String>()
var inQuotes = false
var inEscape = false
val current = StringBuilder()
for (i in input.indices) {
// If this character is escaped, add it without looking
if (inEscape) {
inEscape = false
current.append(input[i])
continue
}
when (val c = input[i]) {
'\\' -> inEscape = true // escape the next character, \ isn't added to result
',' -> if (inQuotes) {
current.append(c)
} else {
result += current.toString()
current.clear()
}
'"' -> inQuotes = !inQuotes
else -> current.append(c)
}
}
if (current.isNotEmpty()) {
result += current.toString()
}
return result
}
I think this is not a place to use regular expressions. Contrary to other opinions, I don't think a parser is overkill. It's about 20 lines and fairly easy to test.

Rather than use lookahead and other crazy regex, just pull out the quotes first. That is, for every quote grouping, replace that grouping with __IDENTIFIER_1 or some other indicator, and map that grouping to a map of string,string.
After you split on comma, replace all mapped identifiers with the original string values.
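The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: quoted groups contain no escaped quotes, the input never contains the placeholder text itself, and the class name, method name and the trailing "__" on each placeholder (added so that index 1 is not confused with index 10) are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSplit {
    // Assumption: a quoted group is a '"', any non-quote characters, then '"'.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    public static List<String> split(String input) {
        // Step 1: pull out each quoted group and remember it under a placeholder.
        Map<String, String> saved = new LinkedHashMap<>();
        StringBuffer masked = new StringBuffer();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            String key = "__IDENTIFIER_" + saved.size() + "__";
            saved.put(key, m.group());
            m.appendReplacement(masked, key);
        }
        m.appendTail(masked);

        // Step 2: split on commas, then put the original quoted text back.
        List<String> result = new ArrayList<>();
        for (String part : masked.toString().split(",", -1)) {
            for (Map.Entry<String, String> e : saved.entrySet()) {
                part = part.replace(e.getKey(), e.getValue());
            }
            result.add(part);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "foo,bar,c;qual=\"baz,blurb\",d;junk=\"quux,syzygy\"";
        for (String token : split(input)) {
            System.out.println("> " + token);
        }
    }
}
```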

I would do something like this:
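import java.util.ArrayList;
import java.util.List;
// currentString is the input to split
List<String> parts = new ArrayList<>();
StringBuilder current = new StringBuilder();
boolean foundQuote = false; // true while we are inside a quoted section
for (int i = 0; i < currentString.length(); i++)
{
char c = currentString.charAt(i);
if (c == '"')
{
foundQuote = !foundQuote;
}
if (c == ',' && !foundQuote)
{
// comma outside quotes: finish the current token
parts.add(current.toString());
current.setLength(0);
}
else
{
// inside quotes, or an ordinary character: keep it
current.append(c);
}
}
parts.add(current.toString());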

how to split a string by "|"

I want to use regular expression to split this string:
String filter = "(go|add)addition|(sub)subtraction|(mul|into)multiplication|adding(add|go)values|(add|go)(go)(into)multiplication|";
I want to split it by | except when the pipe appears within brackets, in which case it should be ignored, i.e. I am expecting an output like this:
(go|add)addition
(sub)subtraction
(mul|into)multiplication
adding(add|go)values
(add|go)(go)(into)multiplication
Updated
And then I want to move the words within the brackets at the start to the end.
Something like this..
addition(go|add)
subtraction(sub)
multiplication(mul|into)
adding(add|go)values
multiplication(add|go)(go)(into)
I have tried this regular expression: Splitting of string for `whitespace` & `and` but they have used quotes and I have not been able to make it work for brackets.

I already saw this question 15 minutes ago. Now that it is asked correctly, here is my proposed answer:
Doing this with a regex is complex because you need to count parentheses. I advise you to parse the string manually, like this:
public static void main(String[] args) {
String filter = "(go|add)addition|(sub)subtraction|(mul|into)multiplication|";
List<String> strings = new LinkedList<>();
int countParenthesis = 0;
StringBuilder current = new StringBuilder();
for(char c : filter.toCharArray()) {
if(c == '(') {countParenthesis ++;}
if(c == ')') {countParenthesis --;}
if(c == '|' && countParenthesis == 0) {
strings.add(current.toString());
current = new StringBuilder();
} else {
current.append(c);
}
}
strings.add(current.toString());
for (String string : strings) {
System.out.println(string+" ");
}
}
Output :
(go|add)addition
(sub)subtraction
(mul|into)multiplication

If you don't have nested parentheses (so not (mul(iple|y)|foo)) you can use:
((?:\([^)]*\))*)([^()|]+(?:\([^)]*\)[^()|]*)*)
( #start first capturing group
(?: # non capturing group
\([^)]*\) # opening bracket, then anything except closing bracket, closing bracket
)* # possibly multiple bracket groups at the beginning
)
( # start second capturing group
[^()|]+ # go to the next bracket group, or the closing |
(?:
\([^)]*\)[^()|]* # bracket group, then go to the next bracket group/closing |
)* # possibly multiple brackets groups
) # close second capturing group
and replace with
\2\1
Explanation
((?:\([^)]*\))*) matches and captures all the parenthesis groups at the beginning
[^()|]+ anything except (, ), or |. If there aren't any parentheses, this will match everything.
(?:\([^)]*\)[^()|]*): (?:...) is a non capturing group, \([^)]*\) matches everything inside parentheses, [^()|]* gets us up to the next parenthesis group or the | that ends the match.
Code sample:
String testString = "(go|add)addition|(sub)subtraction|(mul|into)multiplication|adding(add|go)values|(add|go)(go)(into)multiplication|";
Pattern p = Pattern.compile("((?:\\([^)]*\\))*)([^()|]+(?:\\([^)]*\\)[^()|]*)*)");
Matcher m = p.matcher(testString);
while (m.find()) {
System.out.println(m.group(2)+m.group(1));
}
Outputs (demo):
addition(go|add)
subtraction(sub)
multiplication(mul|into)
adding(add|go)values
multiplication(add|go)(go)(into)

Your String
"(go|add)addition|(sub)subtraction|(mul|into)multiplication|"
has the pattern |( at every point where you want to split, so you can split on it for this particular String. But this won't give the expected result if a substring contains a parenthesis ( "(" ) in between, e.g.:
(go|(add))addition.... continue
Hope this helps.
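One possible way to split at that |( pattern is sketched below. Using a lookahead so that the "(" stays with the following element is my assumption about how to apply the idea, not something stated in the answer; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class PipeBeforeParenSplit {
    public static String[] split(String filter) {
        // \| matches the pipe; (?=\() requires a following "(" without consuming it.
        return filter.split("\\|(?=\\()");
    }

    public static void main(String[] args) {
        String filter = "(go|add)addition|(sub)subtraction|(mul|into)multiplication|";
        // prints [(go|add)addition, (sub)subtraction, (mul|into)multiplication|]
        System.out.println(Arrays.toString(split(filter)));
    }
}
```

The trailing | is not followed by "(", so it stays attached to the last element and would need to be stripped separately.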

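Set up a boolean to keep track of whether you are inside parentheses or not, then loop through the string and only split on a | found outside them:
boolean isInside = false;
List<String> parts = new ArrayList<>();
StringBuilder current = new StringBuilder();
for (char c : filter.toCharArray()) {
if (c == '(') isInside = true;
if (c == ')') isInside = false;
if (c == '|' && !isInside) {
parts.add(current.toString()); // | outside parentheses: split here
current.setLength(0);
} else {
current.append(c); // | inside parentheses is left in place
}
}
if (current.length() > 0) parts.add(current.toString()); // trailing token
Something like this should work, I think.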

Android get nearby comma, space, or period in a string

I have a string and I want to get the first comma, space, or period in it.
int word = title.indexOf(" ", idx);
This will get the first space. How can I make it get the first of space, comma, or period?
I tried using ||, but it didn't work.
ex.
int word = title.indexOf(" " || "," || ".", idx);

Gets the index of the first occurrence of a space, comma or dot, or -1 if none of them could be found:
Pattern pattern = Pattern.compile("[ ,\\.]");
Matcher matcher = pattern.matcher(title);
int index = matcher.find() ? matcher.start() : -1;
Note that you can pre-compile the pattern and reuse it as often as you like.
See also http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Note also that if you want to break a text into single words, you can/should use a BreakIterator instead!
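A small sketch of the BreakIterator route, in case it helps. The class and method names are hypothetical, and filtering tokens by their first character is an assumption about what counts as a "word", since the iterator also reports boundaries around punctuation and spaces.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordBreakDemo {
    public static List<String> words(String text) {
        List<String> words = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            // Keep only tokens that start with a letter or digit.
            if (Character.isLetterOrDigit(token.charAt(0))) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world."));
    }
}
```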

What you're doing isn't valid Java syntax. Use the indexOf() method with a space, comma and period, then determine the smallest of these 3 values.
int a = title.indexOf(" ", idx);
int b = title.indexOf(",", idx);
int c = title.indexOf(".", idx);
Then just determine which is the smallest.
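Picking the smallest value needs a little care, because indexOf returns -1 when a separator is absent. A minimal sketch, with hypothetical class and method names:

```java
public class FirstSeparator {
    // Returns the smallest non-negative index among the three searches,
    // or -1 if none of the separators occurs at or after idx.
    public static int firstSeparator(String title, int idx) {
        int best = -1;
        for (String sep : new String[] {" ", ",", "."}) {
            int pos = title.indexOf(sep, idx);
            if (pos >= 0 && (best < 0 || pos < best)) {
                best = pos;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(firstSeparator("Hello, world.", 0)); // 5
        System.out.println(firstSeparator("Hello, world.", 7)); // 12
        System.out.println(firstSeparator("abc", 0));           // -1
    }
}
```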
A faster way would be to write your own method. Behind the scenes, indexOf will just loop over all the characters. You can do that yourself manually
public static int findFirstOccurrence(String s) {
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (c == ',' || c == '.' || c == ' ') {
return i;
}
}
return -1;
}

Unfortunately, you can't pass an array of characters to indexOf; instead you need to call indexOf three times, or you can match a regex. The code you provided is invalid Java syntax: || is the conditional OR operator, which you use to combine boolean expressions, like
if(x || y )

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
