java.util.regex.PatternSyntaxException: Dangling meta character '+' near index 0 + - java

I get the error in the title when I launch my UI and this code runs. It works for all of my other operator symbols, so I am really not sure what's going on here. I didn't want to post all the code, so if this isn't enough you can find the rest on my GitHub: https://github.com/jparr721/Calculator-App/tree/master/src/calculator
public class Calculation_Controls {
public double A, B;
private String[] operators = new String[] {"-","+","/","*","x","^","X"};
/**
* Check for the symbol being used within the TextArea to then
* apply the correct calculation method.
* FIXME - Allow for multiple symbols to be used and have them return
* FIXME - a result in accordance with PEMDAS
*
* @param nums
*
* @return operator, or error
*/
public String findSymbol(String nums) {
for (String operator : operators) {
if (nums.contains(operator)) {
return operator;
}
}
return "invalid input";
}
/**
* Input method to take the user input from the text area
* and apply the correct calculation to it
*
* @param nums - Stores the input as a String which I then convert to an int
* then back to a string to be printed to the TextArea
*
* @return - The result of the calculation as a string
*/
public String input(String nums){
String operator = findSymbol(nums);
if (operator == null){
System.out.println("Invalid input");
}
String[] split = nums.split(operator);
int left = Integer.parseInt(split[0]);
int right = Integer.parseInt((split[1]));
String result = "";
switch (operator){
case "+":
result = Double.toString(add(left, right));
break;
case "-":
result = Double.toString(subtract(left, right));
break;
case "*":
case "x":
case "X":
result = Double.toString(multiply(left, right));
break;
case "/":
result = Double.toString(divide(left, right));
break;
case "^":
result = Double.toString(pwr(left, right));
break;
default:
System.out.println("Invalid Operator");
}
return result;
}

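Some characters are reserved in regex, and you have to escape them to match them literally. For example, you can't use String.split("+"); you have to use String.split("\\+").
The correct operators for split() would be:
String[] operators = new String[] {"-","\\+","/","\\*","x","\\^","X"};
Note that these escaped strings are only valid as regex patterns: nums.contains(operator) and the case labels in the switch still need the plain, unescaped symbols.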

In your case +, * and ^ have a special meaning in regex; such characters are usually called metacharacters. The String.split() method takes a regular expression as its argument and returns a String array. To stop these characters from being treated as metacharacters, use the escape sequences "\\+", "\\*" and "\\^" in your code.
Modify your operator array like this:
private String[] operators = new String[] {"-","\\+","/","\\*","x","\\^","X"};
For more details, refer to the documentation for regex.Pattern and String.split().

you can use
case String.valueOf('+');

Change: String[] split = nums.split(operator);
To this: String[] split = nums.split("\\" + operator);
edit: This will only work for standard operators, not the x or X. You'll have to change your String[] operators declaration actually, like the other answer mentioned. Personally though, I'd do some kind of input validation and do a replace() on x or X to be * instead

The split() method uses regex. The '+' symbol is a special character in regex, so you need to escape it with a backslash ('\'). But you also need to escape the backslash itself in a Java string literal, so you need two backslashes, e.g. "\\+".
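An alternative to escaping each operator by hand is to let Pattern.quote() build a literal pattern at the point where split() is called. The following is only a minimal sketch of that idea: the class name OperatorSplit and the method splitOnOperator are made up for illustration, and the operator list is copied from the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the asker's project.
final class OperatorSplit {
    // Plain, unescaped symbols: safe for contains() and for switch labels.
    private static final String[] OPERATORS = {"-", "+", "/", "*", "x", "^", "X"};

    // Returns the first operator found in the input, or null if there is none.
    static String findSymbol(String input) {
        for (String op : OPERATORS) {
            if (input.contains(op)) {
                return op;
            }
        }
        return null;
    }

    // Splits the input around its operator, treating the operator as literal text.
    static String[] splitOnOperator(String input) {
        String op = findSymbol(input);
        if (op == null) {
            throw new IllegalArgumentException("No operator in: " + input);
        }
        // Pattern.quote turns "+" into a pattern that matches a literal plus sign.
        return input.split(Pattern.quote(op), 2);
    }
}
```

Because the array keeps the plain symbols, the existing contains() check and the switch on the operator keep working unchanged; only the split() call sees a quoted pattern.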


How do separate out these elements of an equation in a string to a required format?

I have a string input which looks like this:
String equation = "(5.5 + 65) - 33";
How would I go about separating these elements into an array which looked like this:
String[] array = {"(", "5.5", "+", "65", ")", "-", "33"};
I tried using the string split() method but because of there being no spaces between the parenthesis and the next digit it produces the incorrect format of:
String array = {"(5.5", "+"
You can do this with a StreamTokenizer:
StreamTokenizer st = new StreamTokenizer(new StringReader(equation));
st.parseNumbers();
List<String> tokens = new ArrayList<>();
while (st.nextToken() != StreamTokenizer.TT_EOF) {
switch (st.ttype) {
case StreamTokenizer.TT_EOL:
// ignore
break;
case StreamTokenizer.TT_WORD:
tokens.add(st.sval);
break;
case StreamTokenizer.TT_NUMBER:
tokens.add(String.valueOf(st.nval));
break;
default:
tokens.add(String.valueOf((char) st.ttype));
}
}
String[] array = tokens.toArray(new String[tokens.size()]);
Note that because this parses the numbers as double, they become e.g. 65.0 when converted back to strings. If you don't want that, you'll need to add some number formatting.
I suspect that whatever you're planning to do with them later, you actually want them as numbers though.
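If the tokens should stay as the original text (so that 65 remains "65" rather than "65.0"), a regex-based scan is another option. This is a rough sketch under assumptions of my own: the class name EquationTokenizer is hypothetical, numbers are unsigned decimals, and the only symbols recognised are + - * / ^ and parentheses. Anything else in the input is silently skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class EquationTokenizer {
    // Either an unsigned decimal number or a single operator/parenthesis.
    private static final Pattern TOKEN = Pattern.compile("\\d+(?:\\.\\d+)?|[-+*/^()]");

    static List<String> tokenize(String equation) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(equation);
        while (m.find()) {
            tokens.add(m.group()); // keep the matched text exactly as written
        }
        return tokens;
    }
}
```

For the equation from the question, tokenize("(5.5 + 65) - 33") yields the seven tokens (, 5.5, +, 65, ), - and 33.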

StreamTokenizer mangles integers and loose periods

I've appropriated and modified the below code which does a pretty good job of tokenizing Java code using Java's StreamTokenizer. Its number handling is problematic, though:
It turns all integers into doubles. I can get past that by testing num % 1 == 0, but this feels like a hack.
More critically, a . following whitespace is treated as a number. "Class .method()" is legal Java syntax, but the resulting tokens are [Word "Class"], [Whitespace " "], [Number 0.0], [Word "method"], [Symbol "("], and [Symbol ")"].
I'd be happy turning off StreamTokenizer's number parsing entirely and parsing the numbers myself from word tokens, but commenting out st.parseNumbers() seems to have no effect.
public class JavaTokenizer {
private String code;
private List<Token> tokens;
public JavaTokenizer(String c) {
code = c;
tokens = new ArrayList<>();
}
public void tokenize() {
try {
// Create the tokenizer
StringReader sr = new StringReader(code);
StreamTokenizer st = new StreamTokenizer(sr);
// Java-style tokenizing rules
st.parseNumbers();
st.wordChars('_', '_');
st.eolIsSignificant(false);
// Don't want whitespace tokens
//st.ordinaryChars(0, ' ');
// Strip out comments
st.slashSlashComments(true);
st.slashStarComments(true);
// Parse the file
int token;
do {
token = st.nextToken();
switch (token) {
case StreamTokenizer.TT_NUMBER:
// A number was found; the value is in nval
double num = st.nval;
if(num % 1 == 0)
tokens.add(new IntegerToken((int) num));
else
tokens.add(new FPNumberToken(num));
break;
case StreamTokenizer.TT_WORD:
// A word was found; the value is in sval
String word = st.sval;
tokens.add(new WordToken(word));
break;
case '"':
// A double-quoted string was found; sval contains the contents
String dquoteVal = st.sval;
tokens.add(new DoubleQuotedStringToken(dquoteVal));
break;
case '\'':
// A single-quoted string was found; sval contains the contents
String squoteVal = st.sval;
tokens.add(new SingleQuotedStringToken(squoteVal));
break;
case StreamTokenizer.TT_EOL:
// End of line character found
tokens.add(new EOLToken());
break;
case StreamTokenizer.TT_EOF:
// End of file has been reached
tokens.add(new EOFToken());
break;
default:
// A regular character was found; the value is the token itself
char ch = (char) st.ttype;
if(Character.isWhitespace(ch))
tokens.add(new WhitespaceToken(ch));
else
tokens.add(new SymbolToken(ch));
break;
}
} while (token != StreamTokenizer.TT_EOF);
sr.close();
} catch (IOException e) {
}
}
public List<Token> getTokens() {
return tokens;
}
}
parseNumbers() is "on" by default. Use resetSyntax() to turn off number parsing and all other predefined character types, then enable what you need.
That said, manual number parsing might get tricky with accounting for dots and exponents... With a scanner and regular expressions it should be relatively straightforward to implement your own tokenizer, tailored exactly to your needs. For an example, you may want to take a look at the Tokenizer inner class here: https://github.com/stefanhaustein/expressionparser/blob/master/core/src/main/java/org/kobjects/expressionparser/ExpressionParser.java (about 120 LOC at the end)
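To make the resetSyntax() suggestion concrete, here is a small sketch. The class name NoNumberTokenizer is invented for this example, and the choice of which characters count as word characters is an assumption: letters, digits and underscore are words, everything up to and including the space character is whitespace, and every other character stays ordinary.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class NoNumberTokenizer {
    static List<String> tokenize(String code) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(code));
        st.resetSyntax();            // every character becomes "ordinary"; number parsing is off
        st.wordChars('a', 'z');
        st.wordChars('A', 'Z');
        st.wordChars('0', '9');      // digits are now part of words, not numbers
        st.wordChars('_', '_');
        st.whitespaceChars(0, ' ');  // control characters and space separate tokens
        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    tokens.add(st.sval);
                } else {
                    // Ordinary character: the token type is the character itself.
                    tokens.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }
}
```

With this setup "Class .method()" comes back as Class, ., method, ( and ), with no 0.0 token. The trade-off is that a literal such as 3.14 arrives as three tokens (3, ., 14), so the caller has to reassemble decimals, and comments and quoted strings would need to be re-enabled separately.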
I'll look into parboiled when I have a chance. In the meantime, the disgusting workaround I implemented to get it working is:
private static final String DANGLING_PERIOD_TOKEN = "___DANGLING_PERIOD_TOKEN___";
Then in tokenize()
//a period following whitespace, not followed by a digit is a "dangling period"
code = code.replaceAll("(?<=\\s)\\.(?![0-9])", " "+DANGLING_PERIOD_TOKEN+" ");
And in the tokenization loop
case StreamTokenizer.TT_WORD:
// A word was found; the value is in sval
String word = st.sval;
if(word.equals(DANGLING_PERIOD_TOKEN))
tokens.add(new SymbolToken('.'));
else
tokens.add(new WordToken(word));
break;
This solution is specific to my needs of not caring what the original whitespace was (as it adds some around the inserted "token")

StreamTokenizer doesn't treat + as word

In code
switch(token){
case StreamTokenizer.TT_EOF:
eof = true;
break;
case StreamTokenizer.TT_NUMBER:
double value = tokenizer.nval;
operands.add(value);
break;
case StreamTokenizer.TT_WORD:
operate(tokenizer.sval);
break;
default:
throw new WrongPhraseException("Unnexpected operator or operand: " + tokenizer.sval +".");
}
I give RPN as input, e.g.: 5 4 3 + *
Why is + not treated as TT_WORD? Because it isn't, the switch falls through to default and throws the exception.
From the StreamTokenizer documentation:
For a single character token, its value is the single character, converted to an integer.
That sentence describes the token type: an ordinary character such as + is neither TT_WORD nor TT_NUMBER. nextToken() returns the character itself as the token type (here '+') and leaves sval as null, so your switch falls through to default and throws. The same applies to your * character. Handle operators in the default branch instead, for example:
default:
// Ordinary characters such as + and * arrive with the token type
// set to the character itself, so convert it back to a String.
operate(String.valueOf((char) token));
break;
Also note that / is a comment character by default, so call tokenizer.ordinaryChar('/') if you want to support division.
Hope this helps!
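For reference, this is how the operator handling could look in a complete evaluator. It is a sketch, not the asker's code: the class name RpnSketch and the apply helper are hypothetical, only the four basic operators are supported, and malformed input simply surfaces as an exception from the operand stack.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical RPN evaluator for illustration only.
final class RpnSketch {
    static double evaluate(String rpn) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(rpn));
        st.ordinaryChar('/'); // '/' starts a comment by default
        Deque<Double> operands = new ArrayDeque<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    operands.push(st.nval);
                } else {
                    // For an ordinary character, ttype is the character itself.
                    double right = operands.pop();
                    double left = operands.pop();
                    operands.push(apply((char) st.ttype, left, right));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return operands.pop();
    }

    private static double apply(char op, double left, double right) {
        switch (op) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right;
            default: throw new IllegalArgumentException("Unexpected token: " + op);
        }
    }
}
```

With the input from the question, evaluate("5 4 3 + *") computes 5 * (4 + 3).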

How to create a Pattern matching given set of chars?

I get a set of chars, e.g. as a String containing all of them and need a charclass Pattern matching any of them. For example
for "abcde" I want "[a-e]"
for "[]^-" I want "[-^\\[\\]]"
How can I create a compact solution and how to handle border cases like empty set and set of all chars?
What chars need to be escaped?
Clarification
I want to create a charclass Pattern, i.e. something like "[...]", no repetitions and no such stuff. It must work for any input, that's why I'm interested in the corner cases, too.
Here's a start:
import java.util.*;
public class RegexUtils {
private static String encode(char c) {
switch (c) {
case '[':
case ']':
case '\\':
case '-':
case '^':
return "\\" + c;
default:
return String.valueOf(c);
}
}
public static String createCharClass(char[] chars) {
if (chars.length == 0) {
return "[^\\u0000-\\uFFFF]";
}
StringBuilder builder = new StringBuilder();
boolean includeCaret = false;
boolean includeMinus = false;
List<Character> set = new ArrayList<Character>(new TreeSet<Character>(toCharList(chars)));
if (set.size() == 1<<16) {
return "[\\w\\W]";
}
for (int i = 0; i < set.size(); i++) {
int rangeLength = discoverRange(i, set);
if (rangeLength > 2) {
builder.append(encode(set.get(i))).append('-').append(encode(set.get(i + rangeLength)));
i += rangeLength;
} else {
switch (set.get(i)) {
case '[':
case ']':
case '\\':
builder.append('\\').append(set.get(i));
break;
case '-':
includeMinus = true;
break;
case '^':
includeCaret = true;
break;
default:
builder.append(set.get(i));
break;
}
}
}
builder.append(includeCaret ? "^" : "");
builder.insert(0, includeMinus ? "-" : "");
return "[" + builder + "]";
}
private static List<Character> toCharList(char[] chars) {
List<Character> list = new ArrayList<Character>();
for (char c : chars) {
list.add(c);
}
return list;
}
private static int discoverRange(int index, List<Character> chars) {
int range = 0;
for (int i = index + 1; i < chars.size(); i++) {
if (chars.get(i) - chars.get(i - 1) != 1) break;
range++;
}
return range;
}
public static void main(String[] args) {
System.out.println(createCharClass("daecb".toCharArray()));
System.out.println(createCharClass("[]^-".toCharArray()));
System.out.println(createCharClass("".toCharArray()));
System.out.println(createCharClass("d1a3e5c55543b2000".toCharArray()));
System.out.println(createCharClass("!-./0".toCharArray()));
}
}
As you can see, the input:
"daecb".toCharArray()
"[]^-".toCharArray()
"".toCharArray()
"d1a3e5c55543b2000".toCharArray()
"!-./0".toCharArray()
prints:
[a-e]
[-\[\]^]
[^\u0000-\uFFFF]
[0-5a-e]
[!\--0]
The corner cases in a character class are:
\
[
]
which will need a \ to be escaped. The character ^ doesn't need an escape if it's not placed at the start of a character class, and the - does not need to be escaped when it's placed at the start, or end of the character class (hence the boolean flags in my code).
The empty set is [^\u0000-\uFFFF], and the set of all the characters is [\u0000-\uFFFF]. Not sure what you need the former for as it won't match anything. I'd throw an IllegalArgumentException() on an empty string instead.
What chars need to be escaped?
The characters - ^ \ [ ] are all of them; I've actually tested it. Unlike in some other regex implementations, [ is considered a meta character inside a character class, possibly because Java allows inner character classes with operators.
The rest of the task sounds easy, but rather tedious. First you need to select unique characters. Then loop through them, appending to a StringBuilder, possibly escaping. If you want character ranges, you need to sort the characters first and select contiguous ranges while looping. If you want the - to be at the beginning of the class with no escaping, then set a flag, but don't append it. After the loop, if the flag is set, prepend - to the result before wrapping it in [].
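The steps above, without the range detection, can be sketched as follows. The class name SimpleCharClass is made up, the empty-set handling (throwing) is one possible choice rather than a requirement, and the five characters are always escaped so that their position in the class no longer matters.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class SimpleCharClass {
    // Characters that are escaped inside the class, regardless of position.
    private static final String META = "-^\\[]";

    static Pattern of(String chars) {
        if (chars.isEmpty()) {
            throw new IllegalArgumentException("empty character set");
        }
        // Keep each character once, in first-seen order.
        Set<Character> unique = new LinkedHashSet<>();
        for (char c : chars.toCharArray()) {
            unique.add(c);
        }
        StringBuilder sb = new StringBuilder("[");
        for (char c : unique) {
            if (META.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return Pattern.compile(sb.append(']').toString());
    }
}
```

Escaping unconditionally gives a slightly longer pattern than the flag-based approach, but it avoids the special cases for a leading ^ or a - in the middle of the class.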
Match all characters: ".*" (zero or more repetitions, *, of any character, .).
Match a blank line: "^$" (match the start of a line, ^, and the end of a line, $; note the lack of anything to match in between).
Not sure if the last pattern is exactly what you wanted, as there are different interpretations of "match nothing".
A quick, dirty, and almost-not-pseudo-code answer:
StringBuilder sb = new StringBuilder("[");
// Characters that need a backslash inside a character class
Set<Character> metaChars = new HashSet<>(Arrays.asList('-', '^', '\\', '[', ']'));
while (sourceString.length() != 0) {
char c = sourceString.charAt(0);
sb.append(metaChars.contains(c) ? "\\" + c : String.valueOf(c));
// Strings are immutable, so reassign; this drops every occurrence of c
sourceString = sourceString.replace(String.valueOf(c), "");
}
sb.append("]");
//...check here for the appropriate sb.length cases before compiling
// e.g, 2 = empty, all chars equals the count of whatever set qualifies as all chars, etc
Pattern p = Pattern.compile(sb.toString());
Which gives you the unique string of chars you want to match, with meta-characters escaped. It will not convert things into ranges (which I think is fine - doing so smells like premature optimization to me). You can do some post tests for simple set cases - like matching sb against digits, non-digits, etc, but unless you know that's going to buy you a lot of performance (or the simplification is the point of this program), I wouldn't bother.
If you really want to do ranges, you could instead sourceString.toCharArray(), sort that, iterate deleting repetitions and doing some sort of range check and replacing meta characters as you add the contents to StringBuilder.
EDIT: I actually kind of liked the toCharArray version, so pseudo-coded it out as well:
//...check for empty here, if not...
char[] sourceC = sourceString.toCharArray();
Arrays.sort(sourceC);
lastC = sourceC[0];
StringBuilder sb = new StringBuilder("[");
StringBuilder range = new StringBuilder();
for (int i=1; i<sourceC.length; i++) {
if (lastC == sourceC[i]) continue;
if (//.. next char in sequence..//) //..add to range
else {
// check range size, append accordingly to sb as a single item, range, etc
}
lastC = sourceC[i];
}

Java String Special character replacement

I have a string which contains alphanumeric and special characters.
I need to replace each special character with some string.
For example:
Input string = "ja*va st&ri%n#&"
Expected o/p = "jaasteriskvaspacestandripercentagenatand"
* = "asterisk"
& = "and"
% = "percentage"
# = "at"
thanks,
Unless you're absolutely desperate for performance, I'd use a very simple approach:
String result = input.replace("*", "asterisk")
.replace("%", "percentage")
.replace("#", "at"); // Add more to taste :)
(Note that there's a big difference between replace and replaceAll - the latter takes a regular expression. It's easy to get the wrong one and see radically different effects!)
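To illustrate the difference mentioned in the note, here is a tiny comparison. The class and method names are invented for the example; the String and Pattern calls are the standard library ones.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
final class ReplaceDemo {
    // replace() treats its first argument as literal text.
    static String literal(String input) {
        return input.replace(".", "dot");
    }

    // replaceAll() treats its first argument as a regex, where "." matches any character.
    static String regex(String input) {
        return input.replaceAll(".", "dot");
    }

    // Pattern.quote() makes replaceAll() behave literally again.
    static String quoted(String input) {
        return input.replaceAll(Pattern.quote("."), "dot");
    }
}
```

For the input "a.b", literal and quoted both give "adotb", while regex gives "dotdotdot" because every character matches the pattern.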
An alternative would be something like:
public static String replaceSpecial(String input)
{
// Output will be at least as long as input
StringBuilder builder = new StringBuilder(input.length());
for (int i = 0; i < input.length(); i++)
{
char c = input.charAt(i);
switch (c)
{
case '*': builder.append("asterisk"); break;
case '%': builder.append("percentage"); break;
case '#': builder.append("at"); break;
default: builder.append(c); break;
}
}
return builder.toString();
}
Take a look at the following java.lang.String methods:
replace()
replaceAll()
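If the list of special characters is likely to grow, a lookup table keeps the loop unchanged as entries are added. The sketch below is only one way to do it: the class name SpecialCharNames is hypothetical, and the "space" entry is inferred from the expected output in the question rather than stated in its mapping list.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only.
final class SpecialCharNames {
    private static final Map<Character, String> NAMES = new HashMap<>();
    static {
        NAMES.put('*', "asterisk");
        NAMES.put('&', "and");
        NAMES.put('%', "percentage");
        NAMES.put('#', "at");
        NAMES.put(' ', "space"); // assumption: taken from the expected output
    }

    static String replaceSpecial(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String name = NAMES.get(c);
            if (name != null) {
                out.append(name); // known special character
            } else {
                out.append(c);    // everything else is copied through
            }
        }
        return out.toString();
    }
}
```

Calling replaceSpecial("ja*va st&ri%n#&") produces the expected string from the question, "jaasteriskvaspacestandripercentagenatand".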
