StreamTokenizer doesn't treat + as word

In this code:
switch(token){
case StreamTokenizer.TT_EOF:
eof = true;
break;
case StreamTokenizer.TT_NUMBER:
double value = tokenizer.nval;
operands.add(value);
break;
case StreamTokenizer.TT_WORD:
operate(tokenizer.sval);
break;
default:
throw new WrongPhraseException("Unexpected operator or operand: " + tokenizer.sval + ".");
}
I give an RPN expression as input, e.g. 5 4 3 + *
Why is + not treated as TT_WORD? Because it isn't, the default branch throws the exception.

From the StreamTokenizer documentation:
For a single character token, its value is the single character, converted to an integer.
Since your + is a single character that is neither a word character nor part of a number, nextToken() returns the character itself as the token type ('+', i.e. 43) rather than TT_WORD or TT_NUMBER, and sval is left null. It therefore falls through to your default branch and throws. The same applies to your unquoted * character. Thus you might handle single-character operators in the default branch, something like this:
default:
if (token == '+' || token == '*') {
// Ordinary character: the token type is the character itself,
// so convert it back to its string form.
operate(String.valueOf((char) token));
} else {
throw new WrongPhraseException("Unexpected operator or operand: " + (char) token + ".");
}
break;
Hope this helps!
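To make the behaviour concrete, here is a minimal sketch of an RPN evaluator that picks operators up as ordinary characters. The class RpnDemo and its methods are made up for illustration and are not the asker's code; it assumes only the four binary operators. Note that '/' is a comment character in StreamTokenizer's default syntax, so the sketch turns it back into an ordinary character first.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical demo class, not part of the original question's code.
class RpnDemo {

    static double evaluate(String input) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(input));
        // '/' starts a comment in the default syntax table; make it ordinary.
        tokenizer.ordinaryChar('/');
        Deque<Double> operands = new ArrayDeque<>();
        try {
            int token;
            while ((token = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
                switch (token) {
                    case StreamTokenizer.TT_NUMBER:
                        operands.push(tokenizer.nval);
                        break;
                    case StreamTokenizer.TT_WORD:
                        throw new IllegalArgumentException("Unexpected word: " + tokenizer.sval);
                    default: {
                        // Ordinary character: the token type is the character itself.
                        double right = operands.pop();
                        double left = operands.pop();
                        operands.push(apply((char) token, left, right));
                        break;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return operands.pop();
    }

    static double apply(char operator, double left, double right) {
        switch (operator) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            case '/': return left / right;
            default:
                throw new IllegalArgumentException("Unexpected operator: " + operator);
        }
    }
}
```

Calling RpnDemo.evaluate("5 4 3 + *") adds 4 and 3 and then multiplies by 5.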

Related

Java switch statement not working, even 'default' isn't

My switch statement isn't working as whole.
I have never used switch in Java, and I don't know what I did wrong. It is also not executing default. I looked up some information about switch statements, and I think maybe it is because of this line:
if (pair.length == 2) {
// Example of using the key/value pairs
switch (pair[0]) {
because in what I looked up, everybody seemed to use a plain variable where I have pair[0].
Thanks in advance!
String scanString = result.getText(); // result.getText();
String[] parts = scanString.split("\\||");
// Loop over all parts between the | characters
for (String part : parts) {
String[] pair = part.split("\\|"); // Holds the key and the value, before and after the bar
if (pair.length == 2) {
// Example of using the key/value pairs
switch (pair[0]) {
case "po":
System.out.println("Productieorder: " + pair[1]);
edt2.setText(pair[1]);
break;
case "tnr":
System.out.println("Tekeningnummer: " + pair[1]);
break;
case "ref":
System.out.println("Referentie: " + pair[1]);
break;
case "hafa":
System.out.println("Half Fabrikaat: " + pair[1]);
break;
case "art":
System.out.println("Artikel: " + pair[1]);
break;
case "atl":
System.out.println("Aantal: " + pair[1]);
break;
case "loc":
System.out.println("Locatie: " + pair[1]);
edt4.setText(pair[1]);
break;
default:
System.out.println("NIET GELUKT");
}
}
}
Edit
I will simply try this: if (pair.length > 2) instead of == 2. I actually don't even know why it was == 2, because I need to scan a QR string that can consist of more than 3000 characters.

The problem is here:
String[] parts = scanString.split("\\||");
The regex \|| means "a literal | or the empty string", so it behaves much like
String[] parts = scanString.split("");
and splits the string between every character.
For example,
"Hello".split("\\||")
returns an array like
["H","e","l","l","o"]
If you want to split the string on two consecutive | characters, you should write:
String[] parts = scanString.split("\\|\\|");

The problem is in String[] parts = scanString.split("\\||");, which splits the string into single characters, so the later String[] pair = part.split("\\|"); never yields two elements.
The condition if (pair.length == 2) is therefore always false, and control never enters the switch block.
You can set a breakpoint and debug it to confirm.

You should use split("\\|") if wanting to split by |; split("\\|\\|") if wanting to split by ||.
Otherwise the second | without regex escape \ will be an OR, and as such the string is split on the empty string too, giving an array of strings containing just one letter (though not |).
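As a minimal sketch of the corrected splitting, assuming the scanned text is shaped like "po|123||tnr|456" (an assumption based on the question's code, not a format the asker confirmed). The class SplitDemo and the method parsePairs are hypothetical names used only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the input format "key|value||key|value" is assumed.
class SplitDemo {

    static Map<String, String> parsePairs(String scanString) {
        Map<String, String> pairs = new LinkedHashMap<>();
        // "\\|\\|" is the regex for two literal pipe characters in a row.
        for (String part : scanString.split("\\|\\|")) {
            // "\\|" is the regex for one literal pipe character.
            String[] pair = part.split("\\|");
            if (pair.length == 2) {
                pairs.put(pair[0], pair[1]);
            }
        }
        return pairs;
    }
}
```

With that input, parsePairs yields one entry per key, and a switch on the key (as in the question) can then be reached.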

If you are using the split function, keep in mind that its argument is a regular expression and is not taken as a literal string. It only acts as a plain literal delimiter when it is:
a one-char String and this character is not one of the RegEx's meta characters ".$|()[{^?*+\"
a two-char String and the first char is the backslash and the second is not the ascii digit or ascii letter (the second char is then the delimiter)

StreamTokenizer mangles integers and loose periods

I've appropriated and modified the below code which does a pretty good job of tokenizing Java code using Java's StreamTokenizer. Its number handling is problematic, though:
It turns all integers into doubles. I can get past that by testing num % 1 == 0, but this feels like a hack
More critically, a . following whitespace is treated as a number. "Class .method()" is legal Java syntax, but the resulting tokens are [Word "Class"], [Whitespace " "], [Number 0.0], [Word "method"], [Symbol "("], and [Symbol ")"]
I'd be happy turning off StreamTokenizer's number parsing entirely and parsing the numbers myself from word tokens, but commenting st.parseNumbers() seems to have no effect.
public class JavaTokenizer {
private String code;
private List<Token> tokens;
public JavaTokenizer(String c) {
code = c;
tokens = new ArrayList<>();
}
public void tokenize() {
try {
// Create the tokenizer
StringReader sr = new StringReader(code);
StreamTokenizer st = new StreamTokenizer(sr);
// Java-style tokenizing rules
st.parseNumbers();
st.wordChars('_', '_');
st.eolIsSignificant(false);
// Don't want whitespace tokens
//st.ordinaryChars(0, ' ');
// Strip out comments
st.slashSlashComments(true);
st.slashStarComments(true);
// Parse the file
int token;
do {
token = st.nextToken();
switch (token) {
case StreamTokenizer.TT_NUMBER:
// A number was found; the value is in nval
double num = st.nval;
if(num % 1 == 0)
tokens.add(new IntegerToken((int) num));
else
tokens.add(new FPNumberToken(num));
break;
case StreamTokenizer.TT_WORD:
// A word was found; the value is in sval
String word = st.sval;
tokens.add(new WordToken(word));
break;
case '"':
// A double-quoted string was found; sval contains the contents
String dquoteVal = st.sval;
tokens.add(new DoubleQuotedStringToken(dquoteVal));
break;
case '\'':
// A single-quoted string was found; sval contains the contents
String squoteVal = st.sval;
tokens.add(new SingleQuotedStringToken(squoteVal));
break;
case StreamTokenizer.TT_EOL:
// End of line character found
tokens.add(new EOLToken());
break;
case StreamTokenizer.TT_EOF:
// End of file has been reached
tokens.add(new EOFToken());
break;
default:
// A regular character was found; the value is the token itself
char ch = (char) st.ttype;
if(Character.isWhitespace(ch))
tokens.add(new WhitespaceToken(ch));
else
tokens.add(new SymbolToken(ch));
break;
}
} while (token != StreamTokenizer.TT_EOF);
sr.close();
} catch (IOException e) {
}
}
public List<Token> getTokens() {
return tokens;
}
}

parseNumbers() is "on" by default. Use resetSyntax() to turn off number parsing and all other predefined character types, then enable what you need.
That said, manual number parsing might get tricky with accounting for dots and exponents... With a scanner and regular expressions it should be relatively straightforward to implement your own tokenizer, tailored exactly to your needs. For an example, you may want to take a look at the Tokenizer inner class here: https://github.com/stefanhaustein/expressionparser/blob/master/core/src/main/java/org/kobjects/expressionparser/ExpressionParser.java (about 120 LOC at the end)
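A minimal sketch of the resetSyntax() approach follows. The class NoNumberTokenizerDemo and its tokenize method are hypothetical, and the choice to treat digits as word characters and to skip whitespace is an assumption made for the illustration. Numbers then arrive as word tokens for you to parse yourself, and a period is always an ordinary character.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: returns token texts instead of Token objects.
class NoNumberTokenizerDemo {

    static List<String> tokenize(String code) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(code));
        // Clear the whole syntax table, including the default number parsing.
        st.resetSyntax();
        // Re-enable only what is needed.
        st.wordChars('a', 'z');
        st.wordChars('A', 'Z');
        st.wordChars('0', '9'); // digits become part of words (assumption)
        st.wordChars('_', '_');
        st.whitespaceChars(0, ' ');
        st.quoteChar('"');
        st.quoteChar('\'');
        st.slashSlashComments(true);
        st.slashStarComments(true);

        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD || st.ttype == '"' || st.ttype == '\'') {
                    // Words and quoted strings carry their text in sval.
                    tokens.add(st.sval);
                } else {
                    // Ordinary character: the token type is the character itself.
                    tokens.add(String.valueOf((char) st.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }
}
```

With this setup, a literal such as 3.14 would come back as three tokens (3, ., 14), so floating-point literals would need to be reassembled by the caller.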

I'll look into parboiled when I have a chance. In the meantime, the disgusting workaround I implemented to get it working is:
private static final String DANGLING_PERIOD_TOKEN = "___DANGLING_PERIOD_TOKEN___";
Then in tokenize()
//a period following whitespace, not followed by a digit is a "dangling period"
code = code.replaceAll("(?<=\\s)\\.(?![0-9])", " "+DANGLING_PERIOD_TOKEN+" ");
And in the tokenization loop
case StreamTokenizer.TT_WORD:
// A word was found; the value is in sval
String word = st.sval;
if(word.equals(DANGLING_PERIOD_TOKEN))
tokens.add(new SymbolToken('.'));
else
tokens.add(new WordToken(word));
break;
This solution is specific to my needs of not caring what the original whitespace was (as it adds some around the inserted "token")

java.util.regex.PatternSyntaxException: Dangling meta character '+' near index 0 +

I get the error in the title when I launch my UI and this code runs. It works for all of my other operator symbols, so I am really not sure what's going on here. I didn't want to post all the code, so if this isn't enough you can find the rest on my GitHub: https://github.com/jparr721/Calculator-App/tree/master/src/calculator
public class Calculation_Controls {
public double A, B;
private String[] operators = new String[] {"-","+","/","*","x","^","X"};
/**
* Check for the symbol being used within the TextArea to then
* apply the correct calculation method.
* FIXME - Allow for multiple symbols to be used and have them return
* FIXME - a result in accordance with PEMDAS
*
* @param nums
*
* @return operator, or error
*/
public String findSymbol(String nums) {
for (String operator : operators) {
if (nums.contains(operator)) {
return operator;
}
}
return "invalid input";
}
/**
* Input method to take the user input from the text area
* and apply the correct calculation to it
*
* @param nums - Stores the input as a String which I then convert to an int
* then back to a string to be printed to the TextArea
*
* @return - The result of the calculation as a string
*/
public String input(String nums){
String operator = findSymbol(nums);
if (operator == null){
System.out.println("Invalid input");
}
String[] split = nums.split(operator);
int left = Integer.parseInt(split[0]);
int right = Integer.parseInt((split[1]));
String result = "";
switch (operator){
case "+":
result = Double.toString(add(left, right));
break;
case "-":
result = Double.toString(subtract(left, right));
break;
case "*":
case "x":
case "X":
result = Double.toString(multiply(left, right));
break;
case "/":
result = Double.toString(divide(left, right));
break;
case "^":
result = Double.toString(pwr(left, right));
break;
default:
System.out.println("Invalid Operator");
}
return result;
}

There are reserved characters in regex, and you should escape these characters to achieve what you want. For example, you can't use String.split("+"); you have to use String.split("\\+").
The correct operators would be:
String[] operators = new String[] {"-","\\+","/","\\*","x","\\^","X"};

In your case +, * and ^ have a special meaning in regular expressions; such characters are usually called metacharacters. The String.split() method takes a regular expression as its argument and returns a String array. To avoid treating these characters as metacharacters, you need to use these escape sequences in your code: "\\+" "\\*" "\\^"
Modify your operator array like this:
private String[] operators = new String[] {"-","\\+","/","\\*","x","\\^","X"};
For more details, refer to the documentation for regex.Pattern and String.split().

you can use
case String.valueOf('+');

Change: String[] split = nums.split(operator);
To this: String[] split = nums.split("\\" + operator);
edit: This will only work for standard operators, not the x or X. You'll have to change your String[] operators declaration actually, like the other answer mentioned. Personally though, I'd do some kind of input validation and do a replace() on x or X to be * instead

The split() method uses regex. The '+' symbol is a special character in regex, so you need to escape it with the backslash symbol ('\'). But you also need to escape the backslash symbol in Java, so you need two backslashes, e.g. "\\+"
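Another option, sketched here under the assumption that the operators array is left exactly as in the question, is to quote the operator at the point of the split with Pattern.quote, which makes every character in it literal. The class OperatorSplitDemo and the method splitOnOperator are hypothetical names for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not taken from the asker's repository.
class OperatorSplitDemo {

    static String[] splitOnOperator(String nums, String operator) {
        // Pattern.quote wraps the operator so it is matched literally,
        // whether or not it contains regex metacharacters.
        return nums.split(Pattern.quote(operator));
    }
}
```

Because the array keeps the plain symbols, checks such as nums.contains(operator) and the switch on "+" in the question continue to work unchanged, and letters such as x or X need no special handling.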

How to exclude an escape character from being treated as an escape character

I have a Java string
String t = "Region S\u00FCdost SER";
where \u00FC is a replacement for the unicode character "ü"
If I add a new escape character to the above string, I would still want the function below to escape the other characters, but not the escape that is already there.
For example, the below function on re running would return the result as "Region S\\u00FCdost SER" and "Region S\\\\u00FCdost SER" on subsequent iterations.
How do we prevent this?
public static String escapeString(String str)
{
StringBuffer result = new StringBuffer();
// char is 16 bits long and can hold an UTF-16 code
// i iterate on chars and not on code points
// i guess this will be enough until we need to support surrogate pairs
for (int i = 0; i < str.length(); i++)
{
char c = str.charAt(i);
switch (c) {
case '"':
result.append("\\\""); //$NON-NLS-1$
break;
case '\b':
result.append("\\b"); //$NON-NLS-1$
break;
case '\t':
result.append("\\t"); //$NON-NLS-1$
break;
case '\n':
result.append("\\n"); //$NON-NLS-1$
break;
case '\f':
result.append("\\f"); //$NON-NLS-1$
break;
case '\r':
result.append("\\r"); //$NON-NLS-1$
break;
case '\'':
result.append("\\'"); //$NON-NLS-1$
break;
case '\\':
result.append("\\\\"); //$NON-NLS-1$
break;
default:
if (c < 128)
{
//is ascii
result.append(c);
}
else
{
result.append(
String.format("\\u%04X", (int) c)); //$NON-NLS-1$
}
}
}
return result.toString();
}
}

You can do:
case '\\':
// Leave the backslash alone if it is already followed by 'u'
if (i + 1 < str.length() && str.charAt(i + 1) == 'u')
result.append("\\");
else
result.append("\\\\");
break;
This assumes that \u will always denote a Unicode character sequence in your string.
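A slightly stricter variant of that idea, as a sketch only: treat a backslash as already escaped only when it is followed by 'u' and exactly four hex digits. The class IdempotentEscapeDemo and both method names are hypothetical, and the sketch covers just backslashes and non-ASCII characters, not the other cases from the question's switch.

```java
// Hypothetical demo class; handles only backslashes and non-ASCII characters.
class IdempotentEscapeDemo {

    // True if str contains a backslash, 'u' and four hex digits starting at index i.
    static boolean isUnicodeEscapeAt(String str, int i) {
        if (i + 6 > str.length() || str.charAt(i) != '\\' || str.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            if (Character.digit(str.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    static String escapeNonAscii(String str) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == '\\') {
                // Keep an existing unicode escape as is; double any other backslash.
                result.append(isUnicodeEscapeAt(str, i) ? "\\" : "\\\\");
            } else if (c < 128) {
                result.append(c);
            } else {
                result.append(String.format("\\u%04X", (int) c));
            }
        }
        return result.toString();
    }
}
```

For the question's string, applying escapeNonAscii a second time returns the same text as the first pass. The trade-off is that a literal backslash that happens to precede "u" plus four hex digits in the input is indistinguishable from an escape and is left untouched.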

When you write a Java string literal as "Region S\u00FCdost SER", the Java compiler will interpret that as the string value Region Südost SER, which is what the escapeString() method will see when called on t.
If you wanted the string Region S\u00FCdost SER, you should have escaped the \ , i.e. "Region S\\u00FCdost SER".
If you keep running the escapeString() method, I believe you'll see what you want.
String s = "Region S\u00FCdost SER";
System.out.println(s); // print original text
for (int i = 0; i < 4; i++) {
s = escapeString(s);
System.out.println(s);
}
Output:
Region Südost SER <-- original text
Region S\u00FCdost SER
Region S\\u00FCdost SER
Region S\\\\u00FCdost SER
Region S\\\\\\\\u00FCdost SER
If you change input to "He'd say: \"Bitte schön\"", you get:
He'd say: "Bitte schön" <-- original text
He\'d say: \"Bitte sch\u00F6n\"
He\\\'d say: \\\"Bitte sch\\u00F6n\\\"
He\\\\\\\'d say: \\\\\\\"Bitte sch\\\\u00F6n\\\\\\\"
He\\\\\\\\\\\\\\\'d say: \\\\\\\\\\\\\\\"Bitte sch\\\\\\\\u00F6n\\\\\\\\\\\\\\\"
I mean, this is what you wanted, right? If not, please clarify question by actually showing example output of what you want.

Return value in atoi function

Currently I have an atoi function in Java which returns an int value when passed a string. The initial return value is set to 0. However, the value returned will be 0 both if the input string contains invalid characters (or nothing but invalid characters) and if the actual string passed is just "0". How can I distinguish these two cases in the return value? Or is this okay, and should I leave it up to the client to handle this?

You almost certainly shouldn't use a return value in that situation - you should probably use an exception.
At that point though, I'm not sure why you're writing your own method in the first place - use Integer.parseInt instead.
If you need to be able to convey the notion of invalid input without an exception, you could potentially write a method which returns Integer instead of int, and returns null if there's invalid input, and an appropriate Integer reference otherwise.
(I'd also point out that Java tends to favour meaningful names, rather than somewhat arbitrary collections of letters such as atoi.)
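The Integer-returning variant mentioned above could look roughly like this. It is a sketch that delegates to Integer.valueOf rather than a hand-written parser; the class ParseIntDemo and the method tryParseInt are hypothetical names, and trimming surrounding whitespace is an assumption about the desired behaviour.

```java
// Hypothetical helper: null signals invalid input, so "0" stays distinguishable.
class ParseIntDemo {

    static Integer tryParseInt(String text) {
        if (text == null) {
            return null;
        }
        try {
            // Integer.valueOf throws NumberFormatException on malformed input.
            return Integer.valueOf(text.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

A caller then checks for null before unboxing, which keeps a valid "0" distinct from unparseable input.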

This exact function is already handled by the Integer.parseInt() family of methods. Note the way they handle malformed input: they throw an exception.
You should seriously consider using these methods instead of your own.

If your function were returning a float or double, you could return NaN, but that's not the case.
Don't encode this information in the return value. Throw an exception if you encounter an invalid input string.

If zero, 0, is an invalid value, for instance in a banking application transferring money, you won't need to differentiate between invalid zero and input zero.
If negative values are invalid input, you can return invalid negative one, -1, as a return value in case of input errors.
If zero, negative and positive values are all valid inputs, you should consider wrapping the object, for instance using Integer.
If you must use a zero to indicate invalid result, and all above cases are not possible to use, add another method, for instance checkAtoI, that returns a boolean whether the parsed input is correct. In this case, you can check for zero and then call the checkAtoI method to know whether it was a valid zero input, or an invalid input resulting in a zero, 0, returned integer. This implementation is quite easy too:
boolean check(String s) {
return s.equals("0");
}

Pay attention: throwing an exception is slower than returning null.
My suggestion is to return null if parsing fails. But this solution must be weighed in the context of your whole application.

If you wanted just the integer prefix of a string, I'd probably use a regex to grab just the digits
[0-9]+
and then call Integer.parseInt
on that, something like
Pattern p = Pattern.compile("\\s*([0-9]+)");
Matcher m = p.matcher(input);
// lookingAt() matches a prefix; matches() would require the whole string to match
if (m.lookingAt())
return Integer.parseInt(m.group(1));
else
return 0;

One option is to throw an exception which the client needs to handle and act on appropriately. Doing so delegates the error handling to the client, so that as the API implementer you do not have to worry about whether this situation is an error, an exception or normal for the clients. They will decide and handle the exception (suppress the noise, alert the user, ask the user to re-enter, etc.).
Another option would be to return -1 if there is any error in parsing the string; in this case, too, the client decides what to do next.
The advantage of throwing an exception, though, is that you can always pass details about what the error was in the exception message, which is not possible if you return -1.
private static int getDecValue(char hex) {
int dec = 0;
switch (hex) {
case '0':
dec = 0;
break;
case '1':
dec = 1;
break;
case '2':
dec = 2;
break;
case '3':
dec = 3;
break;
case '4':
dec = 4;
break;
case '5':
dec = 5;
break;
case '6':
dec = 6;
break;
case '7':
dec = 7;
break;
case '8':
dec = 8;
break;
case '9':
dec = 9;
break;
default:
// do nothing
}
return dec;
}
public static int atoi(String ascii) throws Exception {
int integer = 0;
for (int index = 0; index < ascii.length(); index++) {
if (ascii.charAt(index) >= '0' && ascii.charAt(index) <= '9') {
integer = (integer * 10) + getDecValue(ascii.charAt(index));
} else {
throw new Exception("Is not an Integer : " + ascii.charAt(index));
}
}
return integer;
}
