How to discard unwanted tokens - java

I made this token iterator class that scans an input line character by character and builds Strings from the characters. I am able to make the class read all the tokens and separate them, but I cannot get it to remove the invalid/unwanted tokens from the input.
I tried using Character.toString(counter) so that those characters are turned into a string, and then wrote if statements so that when the token is not "not", "and", "or", "true", "false" or a parenthesis, it is discarded and the iterator goes on to the next token.
import java.util.Iterator;
public class Tokiter implements Iterator<String>{
private int counter = 0; // index of the next character in line
//input line to be tokenized
private String line;
// the next Token, null if no next Token
private String nextToken;
// implement
public Tokiter(String line){
this.line = line;
}
@Override
// implement
public boolean hasNext() {
if (counter >= line.length())
return false;
else if (line.charAt(counter) == ' ')
{
counter++;
return hasNext();
}
else
return true;
}
@Override
//implement
public String next() {
String s = "";
if (!hasNext())
{
// System.out.println("Null");
return null;
}
else if( line.charAt(counter) == ('('))
{
// System.out.println("Token");
s += line.charAt(counter);
counter++;
return s;
}
else if( line.charAt(counter) == (')'))
{
// System.out.println("Token");
s += line.charAt(counter);
counter++;
return s;
}
else
s += line.charAt(counter);
counter++;
if (counter >= line.length()){
return s;
}
while (Character.isLetter(line.charAt(counter)))
{
s += line.charAt(counter);
counter++;
if (counter >= line.length()){
return s;
}
}
return s;
}
@Override
public void remove() {
// TODO Auto-generated method stub
throw new UnsupportedOperationException();
}
// provided
public static void main(String[] args){
String line;
// you can play with other inputs on the command line
if(args.length>0)
line = args[0];
// or do the standard test
else
line = " not random (true or false) ** whatever ";
System.out.println("line: [" + line + "]");
Tokiter tokIt = new Tokiter(line);
while(tokIt.hasNext()){
System.out.println("next token: [" + tokIt.next() + "]");
}
}
}
So, for example, when the program runs on the input line " not random (true or false) ** whatever ", the output should be:
line: [ not random (true or false) ** whatever ]
next token: [not]
next token: [(]
next token: [true]
next token: [or]
next token: [false]
next token: [)]
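One way to get that output is to look one valid token ahead: the iterator pre-reads raw tokens, drops any that are not on a whitelist, and stores the next valid one, so hasNext() only answers true when a valid token really remains. This is a minimal sketch, not the assignment's required design. It assumes the whitelist is exactly the set of tokens in the expected output (not, and, or, true, false and the two parentheses); the class name FilteringTokIter is hypothetical, and Set.of needs Java 9 or later.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

// Sketch: an iterator that reads raw tokens and only yields whitelisted ones.
public class FilteringTokIter implements Iterator<String> {
    // Assumed whitelist, inferred from the expected output in the question.
    private static final Set<String> VALID =
            Set.of("not", "and", "or", "true", "false", "(", ")");

    private final String line;
    private int pos = 0;        // index of the next unread character
    private String pending;     // next valid token, or null when none is left

    public FilteringTokIter(String line) {
        this.line = line;
        advance();
    }

    // Scans raw tokens (runs of letters, or single other characters) until a
    // valid one is found; invalid ones are simply dropped.
    private void advance() {
        pending = null;
        while (pos < line.length()) {
            char c = line.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;
                continue;
            }
            String raw;
            if (Character.isLetter(c)) {
                int start = pos;
                while (pos < line.length() && Character.isLetter(line.charAt(pos))) {
                    pos++;
                }
                raw = line.substring(start, pos);
            } else {
                raw = String.valueOf(c);
                pos++;
            }
            if (VALID.contains(raw)) {
                pending = raw;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return pending != null;
    }

    @Override
    public String next() {
        if (pending == null) {
            throw new NoSuchElementException();
        }
        String result = pending;
        advance();
        return result;
    }

    public static void main(String[] args) {
        FilteringTokIter it = new FilteringTokIter(" not random (true or false) ** whatever ");
        while (it.hasNext()) {
            System.out.println("next token: [" + it.next() + "]");
        }
    }
}
```

Filtering inside next() alone would not be enough: hasNext() could still return true when only unwanted tokens such as "** whatever" remain, and next() would then have nothing valid to return.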

Related

How to skip comments in Java when reading from a text file [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
What I'm trying to do is make it so that, when the text file is read, the comments within the text file are ignored and not printed along with everything else.
This is the code I have at the moment. The first part, which skips single-line comments, works; however, the second part, which tries to skip block comments, just doesn't work and gives the following error message:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 186
at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:48)
at java.base/java.lang.String.charAt(String.java:712)
at Q3.scan(Q3.java:281)
at Q3.main(Q3.java:15)
Any help would be great. I can't really change the style of the code too much, either.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
public class Q3 {
public static void main(String[] args) {
System.out.println("Q3Example1:");
String prog1 = readFile2String("src/Q3Example1.txt");
scan(prog1);
System.out.println("\nQ3Example2:");
String prog2 = readFile2String("src/Q3Example2.txt");
scan(prog2);
//fix block comments
}
/**
* takes the input operator and finds the operator tokentype
* @param ch is the current character
* @return the corresponding tokentype
*/
public static TokenType getOp(char ch) {
TokenType t = null;
if(ch == '+') t = TokenType.OP_ADD;
else if (ch == '-') t = TokenType.OP_SUBTRACT;
else if (ch == '*') t = TokenType.OP_MULTIPLY;
else if (ch == '/') t = TokenType.OP_DIVIDE;
else if (ch == '%') t = TokenType.OP_MOD;
else if (ch == '<') t = TokenType.OP_LESS;
else if (ch == '>') t = TokenType.OP_GREATER;
else if (ch == '=') t = TokenType.OP_ASSIGN;
return t;
}
/**
* takes the input double operator and finds the Double operator tokentype
* @param s is the double operator that is input
* @return the corresponding tokentype to the double operator
*/
public static TokenType getOp(String s) {
TokenType t = null;
if(s.equals("<=")) t = TokenType.OP_LESSEQUAL;
if(s.equals(">=")) t = TokenType.OP_GREATEREQUAL;
if(s.equals("==")) t = TokenType.OP_EQUAL;
if(s.equals("!=")) t = TokenType.OP_NOTEQUAL;
return t;
}
/**
* takes the input character and finds the Symbol tokentype
* @param ch is the current character at the index
* @return the corresponding tokentype if it equals any of the symbol characters
*/
public static TokenType getSymbol(char ch) {
TokenType t = null;
if(ch == '(') t = TokenType.LEFT_PAREN;
else if (ch == ')') t = TokenType.RIGHT_PAREN;
else if (ch == '{') t = TokenType.LEFT_BRACE;
else if (ch == '}') t = TokenType.RIGHT_BRACE;
else if (ch == '[') t = TokenType.LEFT_BRACKET;
else if (ch == ']') t = TokenType.RIGHT_BRACKET;
else if (ch == ';') t = TokenType.SEMICOLON;
else if (ch == ',') t = TokenType.COMMA;
return t;
}
/**
* takes the input string and finds the Keyword tokentype
* @param s the input string
* @return if the string equals any of the else if parameters the keyword tokentype is returned
*/
public static TokenType getKeyword(String s) {
TokenType t = null;
if(s.equals("if")) t = TokenType.KEYWORD_IF;
else if (s.equals("else")) t = TokenType.KEYWORD_ELSE;
else if (s.equals("int")) t = TokenType.KEYWORD_INT;
else if (s.equals("String")) t = TokenType.KEYWORD_STRING;
else if (s.equals("public")) t = TokenType.KEYWORD_PUBLIC;
else if (s.equals("class")) t = TokenType.KEYWORD_CLASS;
else if (s.equals("void")) t = TokenType.KEYWORD_VOID;
else if (s.equals("static")) t = TokenType.KEYWORD_STATIC;
return t;
}
/**
* takes the input string and finds the Klingon tokentype
* @param s the input string
* @return the corresponding tokentype for the string else null
*/
public static TokenType getKlingon(String s) {
TokenType t = null;
if(s.equals("rItlh")) t = TokenType.KLINGON_PRINT;
else if (s.equals("boq")) t = TokenType.KLINGON_ADD;
else if (s.equals("boqha")) t = TokenType.KLINGON_SUBTRACT;
else if (s.equals("boqegh")) t = TokenType.KLINGON_MULTIPLY;
else if (s.equals("boqHaegh")) t = TokenType.KLINGON_DIVIDE;
return t;
}
/**
* checks if the current character is a letter from a-z or A-Z
* @param ch input character
* @return if character is a letter then isLetter returns true else false
*/
public static boolean isLetter(char ch) {
if(ch >='a' && ch<='z') return true;
else if(ch >= 'A' && ch <= 'Z') return true;
else return false;
}
/**
* this is a method for including special symbols within strings and checks if any of the input characters correspond to any of the else if branches
* @param ch input character
* @return either true or false depending on the input character
*/
public static boolean isSpecialSymbol(char ch) {
if(ch == ':')return true;
else if(ch == '@')return true;
else if (ch == ',')return true;
else if (ch == '?')return true;
else if (ch == '#')return true;
else if (ch == '$')return true;
else if (ch == '£')return true;
else if (ch == '!')return true;
else if (ch == '^')return true;
else if (ch == '.')return true;
else if (ch == '~')return true;
else return false;
}
/**
* This method just checks if it is a digit at the current index
* @param ch input character
* @return either true if it is a digit or false if it's not
*/
public static boolean isDigit(char ch) {
if(ch >='0' && ch<='9') return true;
else return false;
}
/**
* This checks for white space
* @param ch input character
* @return true if there is whitespace or false if there isn't
*/
public static boolean isWhiteSpace(char ch) {
if (ch == ' ')return true;
else if(ch == '\t')return true;
else return false;
}
/**
* checks for a new line or a line break
* @param ch input character
* @return true if there is a line break or false if there isn't
*/
public static boolean isLineBreak(char ch) {
if(ch == '\n') return true;
else return false;
}
/**
* reads the specified file
* @param fname the required file name
* @return the content of the file
*/
public static String readFile2String (String fname) {
String content = null;
try {
content = new String(Files.readAllBytes(Paths.get(fname)));
} catch (IOException e){
System.out.println("Fail to read a file");
}
return content;
}
/**
* this takes the input string, reads it and assigns keywords, line numbers etc to each string, letter, mathematical sign
* @param prog the input string
*/
public static void scan(String prog) {
int n = prog.length(); // n = to the length of the string being scanned
int index = 0;
int linenumber = 1;
while (index < n) { // while the current character is less than the total characters the loop will run
char ch = prog.charAt(index);
char ch_next = ' ';
char ch_next2 = ' ';
if (index < n-1) ch_next = prog.charAt(index+1);
if (index < n-2) ch_next2 = prog.charAt(index+2);
boolean blockComment;
boolean whiteSpace = isWhiteSpace(ch);
boolean newline = isLineBreak(ch);
TokenType sym = getSymbol(ch);
TokenType op = getOp(ch);
boolean letter = isLetter(ch);
boolean digit = isDigit(ch);
if (whiteSpace) { // if there is whitespace then it skips it and moves to the next character
index++;
continue;
} else if (newline) {// if there is a new line then the line number is increased by one and the index increases by one
linenumber++;
index++;
continue;
} else if(ch == '/' && ch_next == '/'){
index++;
index++;
ch = prog.charAt(index);
while(ch != '\n') {
index++;
ch = prog.charAt(index);
}
continue;
} else if(ch == '/' && ch_next == '*' && ch_next2 == '*'){
blockComment = true;
index++;
index++;
index++;
ch = prog.charAt(index);
while(blockComment) {
index++;
ch = prog.charAt(index);
if(ch == '*' && ch_next == '/') {
blockComment = false;
}
}
continue;
} else if (sym != null) { // getSymbol is called and if it doesn't return null then this is carried out
System.out.println(linenumber + ", " + sym + ", " + ch);
index++; // the index is increased and the loop is continued to the next character
continue;
} else if (op != null || ch == '!') { // if getOp(ch) doesn't return null or the ch == ! then this is carried out
String operator = ""; // string operator is made
operator += ch; // operator == the current character
index++; // index increases by one to check the next character
while (index < n) { // this while loop adds the next character onto the current character in operator
ch = prog.charAt(index);
operator += ch;
if (getOp(operator) != null) { // if the string operator doesn't return null it means its a double operator so this is carried out and the while statement ends
System.out.println(linenumber + ", " + getOp(operator) + ", " + operator);
break;
} else if (getOp(operator) == null) // if the operator does return null when put into getOp(s) then it must be a single operator and so this branch is carried out
index--; // the index is reduced by one to return it to the previous operator (the single operator)
ch = prog.charAt(index); //ch is assigned to the current character so that it equals the single operator
System.out.println(linenumber + ", " + op + ", " + ch);
break;
}
index++; // index and continue to the next character
continue;
} else if (letter) { // if the current character is a letter then this branch is executed
String word = ""; // similar to the last else if branch new string is made for the word
word += ch; // the word is built up
index++;// move onto the next character
while (index < n) { // while the current index is less than the total the loop will continue
ch = prog.charAt(index);
if (isLetter(ch) || isDigit(ch)) { // the loop takes the current letter and adds it onto the word until it hits something that isn't a letter and then stops
word += ch;
index++;
} else
break;
}
// once the word is made the word runs through the two methods getKeyword and getKlingon to find its tokentype
TokenType keyword = getKeyword(word);
TokenType klingon = getKlingon(word);
// this checks which method didn't return null and if neither klingon or keyword returned a value then it is assigned the identifier tokentype
if (keyword!= null) {
System.out.println(linenumber + ", " + keyword + ", " + word);
} else if(klingon != null) {
System.out.println(linenumber + ", " + klingon + ", " + word);
} else {
System.out.println(linenumber + ", " + TokenType.IDENTIFIER + ", " + word);
continue;
}
} else if (digit) {
// the same process as the word builder
String number = "";
number += ch;
index++;
while(index < n) {
ch = prog.charAt(index);
if(isDigit(ch)) {
number += ch;
index++;
} else break;
}
System.out.println(linenumber + ", " + TokenType.INTEGER + ", " + number);
continue;
} else if (ch == '\"') { // once a double quotation mark is encountered this string takes place
String str = "";// new string made
str += ch;// current character is added to the current string
index++;// index increases by one
while(index < n) { // this loop builds the string literal by adding characters as long as the index is less than the total string length
ch = prog.charAt(index);
if(isLetter(ch)) { // all of these branches check for different types of letters symbols, spaces and the final double quotation marks and adds them onto the string
str += ch;
index++;
} else if (isSpecialSymbol(ch)) {
str += ch;
index++;
continue;
} else if (isWhiteSpace(ch)) {
str += ch;
index++;
continue;
} else if (ch == '\"') {
str += ch;
index++;
continue;
}
else break;
}
//string is printed with the line number tokentype and the string itself
System.out.println(linenumber + ", " + TokenType.STRING + ", " + str);
} else {
index++;
continue;
}
}
}
}
To be honest, I don't understand your code, so here is some simpler code.
My Input:
package de;
public class Test {
// Ignore this
/*
ignore this
*/
/*
* Ignore this too
*/
public void hey() {
}
}
My Code:
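import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// inside a method declared with "throws IOException":
try (BufferedReader reader = new BufferedReader(new FileReader("src/test/java/de/Test.java"))) {
    boolean currentlyInComment = false;
    String line = reader.readLine();
    while (line != null) {
        String trimmed = line.trim();
        if (trimmed.startsWith("/*")) {
            currentlyInComment = true;
        }
        if (!currentlyInComment && !trimmed.startsWith("//")) {
            // Do your algorithmic stuff with line
            System.out.println(line);
        }
        // Leave comment mode only after the current line was skipped, so the
        // closing line (or a one-line /* ... */ comment) is not printed either.
        if (currentlyInComment && trimmed.endsWith("*/")) {
            currentlyInComment = false;
        }
        line = reader.readLine();
    }
}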
My Output:
package de;
public class Test {
public void hey() {
}
}
(So plug your own processing into this loop.)
What does this do?
I introduced a variable that records whether we are currently inside a comment. You have to set the boolean back to false only after handling the current line, because the line that closes the comment has to be ignored too.
Basically, the program recognizes the start of a block comment by "/*", its end by "*/", and single-line comments by "//". Every other line is processed. Note that it only handles comments that occupy whole lines; a comment that follows code on the same line is not removed.
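As for why the original scan() crashes: inside the block-comment loop, ch is re-read on every step but ch_next is not, so ch_next stays '*' and the check ch == '*' && ch_next == '/' never becomes true. The index then runs past the end of the string, which is the StringIndexOutOfBoundsException in the stack trace. That branch also only matches "/**", not "/*", and the "//" branch has no bounds check either, so a comment on the last line without a trailing newline would crash in the same way. Keeping the character-by-character style of scan(), a minimal sketch that re-reads the following character on every step and checks the bounds could look like this. stripComments is a hypothetical helper meant to run on the file contents before scan(); it does not know about string literals, so a "//" inside a quoted string would be treated as a comment.

```java
// Sketch: remove // and /* */ comments from a program string before scanning it.
public class CommentSkipper {
    // Hypothetical helper; keeps the newlines inside block comments so that
    // line numbers reported by a later scan() stay correct.
    public static String stripComments(String prog) {
        StringBuilder out = new StringBuilder();
        int n = prog.length();
        int index = 0;
        while (index < n) {
            char ch = prog.charAt(index);
            // Look at the following character on every iteration, guarding the bounds.
            char next = (index + 1 < n) ? prog.charAt(index + 1) : '\0';
            if (ch == '/' && next == '/') {
                // Single-line comment: skip up to (not including) the newline or end of input.
                index += 2;
                while (index < n && prog.charAt(index) != '\n') {
                    index++;
                }
            } else if (ch == '/' && next == '*') {
                // Block comment (also covers /** ... */): skip until "*/".
                index += 2;
                while (index < n && !(prog.charAt(index) == '*'
                        && index + 1 < n && prog.charAt(index + 1) == '/')) {
                    if (prog.charAt(index) == '\n') {
                        out.append('\n');
                    }
                    index++;
                }
                index += 2; // step over "*/"; an unterminated comment just ends the loop
            } else {
                out.append(ch);
                index++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "int a; // note\n/** doc\n * more */ int b;";
        System.out.print(stripComments(src));
    }
}
```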

Java Stack Evaluation from TXT file

In this assignment, I need to read a .txt file and determine whether the expressions are correct, or "balanced". I got the first problem correct, but for the second problem I am getting more output than I want. Here is problem #2:
Write a stack-based algorithm that evaluates a post-fixed expression. Your program needs to read its input from a file called “problem2.txt”. This file contains one expression per line.
For each expression output its value to the standard output. If an expression is ill-formed print “Ill-formed”.
The contents of Problem2.txt are:
3 2 + 5 6 8 2 / + + * 1 +
8 * 2 3 + + - 9 1 +
1 4 + 9 4 - * 2 *
// For my output I need to get:
76
Ill-formed
50
// With my code I am getting:
76
Ill-formatted
Ill-formatted
Ill-formatted
10
50
// and I’m not sure why I’m getting extra ill-formatted and a 10 in there
Below is my code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Stack;
import java.util.EmptyStackException;
public class Eval {
public static void main(String args[]) throws IOException {
//driver
try (BufferedReader filereader = new BufferedReader(new FileReader("Problem1.txt"))) {
while (true) {
String line = filereader.readLine();
if (line == null) {
break;
}
System.out.println(balancedP(line));
}
}
System.out.println("\n");
try (BufferedReader filereader2 = new BufferedReader(new FileReader("Problem2.txt"))) {
while (true) {
String line = filereader2.readLine();
if (line == null) {
break;
}
System.out.println(evaluatePostfix(line));
}
}
}
public static boolean balancedP (String s) {
Stack<Character> stackEval = new Stack<Character>();
for(int i = 0; i < s.length(); i++) {
char token = s.charAt(i);
if(token == '[' || token == '(' || token == '{' ) {
stackEval.push(token);
} else if(token == ']') {
if(stackEval.isEmpty() || stackEval.pop() != '[') {
return false;
}
} else if(token == ')') {
if(stackEval.isEmpty() || stackEval.pop() != '(') {
return false;
}
} else if(token == '}') {
if(stackEval.isEmpty() || stackEval.pop() != '{') {
return false;
}
}
}
return stackEval.isEmpty();
}
//problem 2 algo to evaluate a post-fixed expression
static int evaluatePostfix(String exp) throws EmptyStackException
{
Stack<Integer> stackEval2 = new Stack<>();
for(int i = 0; i < exp.length(); i++)
{
char c = exp.charAt(i);
if(c == ' ')
continue;
else if(Character.isDigit(c)) {
int n = 0;
while(Character.isDigit(c)) {
n = n*10 + (int)(c-'0');
i++;
c = exp.charAt(i);
}
i--;
stackEval2.push(n);
}
else {
try {
//if operand pops two values to do the calculation through the switch statement
int val1 = stackEval2.pop();
int val2 = stackEval2.pop();
//operands in a switch to test and do the operator's function each value grabbed and tested
switch(c) {
case '+':
stackEval2.push(val2 + val1);
break;
case '-':
stackEval2.push(val2 - val1);
break;
case '/':
stackEval2.push(val2 / val1);
break;
case '*':
stackEval2.push(val2 * val1);
break;
}
} catch (EmptyStackException e) {
System.out.println("Ill-formatted");
}
}
}
return stackEval2.pop();
}
}
A simple way to get the output formatted the way you want is to put the try-catch block around the call to evaluatePostfix() (make sure to delete the try-catch block that is inside evaluatePostfix()):
System.out.println("\n");
try (BufferedReader filereader2 = new BufferedReader(new FileReader("Problem2.txt"))) {
while (true) {
String line = filereader2.readLine();
if (line == null) {
break;
}
try {
System.out.println(evaluatePostfix(line));
} catch (EmptyStackException e) {
System.out.println("Ill-formed");
}
}
}
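This way, when an exception occurs inside evaluatePostfix(), it propagates out of the method and is caught by the caller, so evaluation of that line stops at the first error and the error message is printed only once. In the original version, each caught exception only skipped the current operator and evaluation carried on: the second line printed three error messages, and the final "9 1 +" then left 10 on the stack, which was returned and printed.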
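Moving the try-catch still leaves two gaps: evaluatePostfix() returns whatever is on top of the stack without checking that exactly one value is left (so a line like "1 2" would print 2), and a line ending in a number makes exp.charAt(i) run past the end of the string inside the digit loop. A sketch that validates explicitly instead of relying on EmptyStackException could look like this. It swaps java.util.Stack for ArrayDeque and splits each line on whitespace, which assumes tokens are always separated by spaces as in Problem2.txt; negative literals are not supported, and division by zero still throws ArithmeticException.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class PostfixEval {
    // Evaluates a space-separated postfix expression. Throws
    // IllegalArgumentException when it is ill-formed: an operator without two
    // operands, an unknown token, or leftover operands at the end.
    public static int evaluatePostfix(String exp) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String tok : exp.trim().split("\\s+")) {
            if (tok.isEmpty()) {
                continue; // only happens for a blank line
            }
            if (tok.chars().allMatch(Character::isDigit)) {
                stack.push(Integer.parseInt(tok));
                continue;
            }
            if (tok.length() != 1 || stack.size() < 2) {
                throw new IllegalArgumentException("Ill-formed");
            }
            int right = stack.pop(); // top of the stack is the right operand
            int left = stack.pop();
            switch (tok.charAt(0)) {
                case '+': stack.push(left + right); break;
                case '-': stack.push(left - right); break;
                case '*': stack.push(left * right); break;
                case '/': stack.push(left / right); break;
                default: throw new IllegalArgumentException("Ill-formed");
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("Ill-formed");
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        String[] lines = {
            "3 2 + 5 6 8 2 / + + * 1 +",
            "8 * 2 3 + + - 9 1 +",
            "1 4 + 9 4 - * 2 *"
        };
        for (String line : lines) {
            try {
                System.out.println(evaluatePostfix(line));
            } catch (IllegalArgumentException e) {
                System.out.println("Ill-formed");
            }
        }
    }
}
```

For the three lines of Problem2.txt this prints 76, Ill-formed and 50. Checking the stack size before popping keeps the error handling in one place, instead of using an exception to detect a normal input condition.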

Connecting a wrapper class to another class

So I have this wrapper class that lets me return two values from a method.
Wrapper class:
public class Words
{
private String leftWords;
private String rightWords;
public Words(String leftWords, String rightWords) {
this.leftWords = leftWords;
this.rightWords = rightWords;
}
public String getLeftWords() {
return leftWords;
}
public String getRightWords() {
return rightWords;
}
@Override
public int hashCode()
{
final int prime = 31;
int result = 1;
result = prime * result
+ ((leftWords == null) ? 0 : leftWords.hashCode());
result = prime * result
+ ((rightWords == null) ? 0 : rightWords.hashCode());
return result;
}
@Override
public boolean equals(Object obj)
{
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Words other = (Words) obj;
if (leftWords == null)
{
if (other.leftWords != null)
return false;
}
else if (!leftWords.equals(other.leftWords))
return false;
if (rightWords == null)
{
if (other.rightWords != null)
return false;
}
else if (!rightWords.equals(other.rightWords))
return false;
return true;
}
}
This is the method I want to use it in:
private static Map <Set<String>,Set<Words>> getLeftRightWords(LinkedHashMap<Set<String>,Set<Integer>> nnpIndexTokens, NLChunk chunk) throws FileNotFoundException
{
// Map <Set<String>,Set<Integer>> nnpMap = new LinkedHashMap<Set<String>, Set<Integer>>();
Map <Set<String>,Set<Words>> contextMap = new LinkedHashMap<Set<String>, Set<Words>>();
Set<Words> leftRightWords = new HashSet<Words>();
//for(NLChunk chunk : sentence.getChunks()){
if(chunk.getStrPostags().contains("NNP")){
String leftWords = "";
String rightWords = "";
int chunkStartIndex = chunk.getStartIndex();
int chunkEndIndex = chunk.getEndIndex();
//nnpMap = getNNPs(chunk);
String previous = null;
int previousNnpEndIndex = 0;
int previousNnpStartIndex = 0;
for (Map.Entry<Set<String>, Set<Integer>> entry : nnpIndexTokens.entrySet()){
for (Iterator<String> i = entry.getKey().iterator(); i.hasNext();){
Set<Integer> entryIndex = null;
int nnpStartIndex = 0;
int nnpEndIndex = 0;
String currentElement = i.next();
//Deriving values for beginning and ending of chunk
//and beginning and ending of NNP
if (!(entry.getValue().isEmpty())){
if (currentElement.trim().split(" ").length > 1){
entryIndex = entry.getValue();
nnpStartIndex = entryIndex.iterator().next();
nnpEndIndex = getLastElement(entryIndex);
}
else {
entryIndex = entry.getValue();
nnpStartIndex = entryIndex.iterator().next();
nnpEndIndex = nnpStartIndex;
}
}
if(!(chunkStartIndex<=nnpStartIndex && chunkEndIndex>=nnpEndIndex)){
continue;
}
//Extracting LEFT WORDS of the NNP
//1)If another NNP is present in left words, left words of current NNP start from end index of previous NNP
if (previous != null && chunk.toString().substring(chunkStartIndex, nnpStartIndex).contains(previous)){
int leftWordsEndIndex = nnpStartIndex;
int leftWordsStartIndex = previousNnpEndIndex;
for (NLWord nlword : chunk.getTokens())
{
if(nlword.getIndex()>=leftWordsStartIndex
&& nlword.getIndex()<leftWordsEndIndex )
leftWords+=nlword.getToken() +" ";
}
System.out.println("LEFT WORDS:" + leftWords+ "OF:"+ currentElement);
}
//2) If no left words are present
if (chunkStartIndex == nnpStartIndex){
System.out.println("NO LEFT WORDS");
}
//3) Normal case where left words consist of all the words left of the NNP starting from the beginning of the chunk
else {
for (NLWord nlword : chunk.getTokens())
{
if(nlword.getIndex()>=chunkStartIndex
&& nlword.getIndex()<nnpStartIndex )
leftWords+=nlword.getToken() +" ";
}
System.out.println("LEFT WORDS:" + leftWords+ "OF:"+ currentElement);
}
//Extracting RIGHT WORDS of NNP
if (entry.getKey().iterator().hasNext()){// entry.getKey().iterator().hasNext()){
String nextElement = entry.getKey().iterator().next();
//1)If another NNP is present in right words, right words of current NNP start from end index of current NNP to beginning of next NNP
if (nextElement !=null && !nextElement.equals(currentElement) && chunk.toString().substring(entry.getValue().iterator().next(), chunkEndIndex).contains(nextElement)){
int rightWordsStartIndex = entryIndex.iterator().next();
int rightWordsEndIndex = entry.getValue().iterator().next();
//String rightWord="";
for (NLWord nlword : chunk.getTokens())
{
if(nlword.getIndex()>=rightWordsStartIndex
&& nlword.getIndex()<rightWordsEndIndex )
rightWords+=nlword.getToken() +" ";
}
System.out.println("RIGHT WORDS:" + rightWords+ "OF:"+ currentElement);
}
}
//2) If no right words exist
if(nnpEndIndex == chunkEndIndex){
System.out.println("NO RIGHT WORDS");
//continue;
}
//3) Normal case where right words consist of all the words right of the NNP starting from the end of the NNP till the end of the chunk
else {
for (NLWord nlword : chunk.getTokens())
{
if(nlword.getIndex()>=nnpEndIndex+1
&& nlword.getIndex()<=chunkEndIndex )
rightWords+=nlword.getToken() +" ";
}
System.out.println("RIGHT WORDS:" + rightWords+ "OF:"+ currentElement);
}
if (previous == null){
previous = currentElement;
previousNnpStartIndex = nnpStartIndex;
previousNnpEndIndex = nnpEndIndex;
}
Words contextWords = new Words(leftWords.toString(), rightWords.toString());
leftRightWords.add(contextWords);
}
contextMap.put(entry.getKey(), leftRightWords);
}//nnps set
}
System.out.println(contextMap);
return contextMap;
}
As you can see, what I am trying to do in this method is take a proper noun and extract the words to its left and right. E.g. for the chunk "fellow Rhode Island solution provider" my output is:
LEFT WORDS:fellow OF:Rhode Island
RIGHT WORDS:solution provider OF:Rhode Island
Now I want to put these in a map where Rhode Island is the key and the values for this are solution provider and fellow.
When I try to print this map, the output I get is:
{[Rhode Island ]=[com.gyan.siapp.nlp.test.Words@681330f0]}
How do i get the right output?
I don't know if it is the only issue, but your Words class does not override the toString() method.
I'm not sure about your Java skill level, so sorry if I'm posting something you already know.
System.out.println(...) calls the object's toString() method to get the text to print, and printing a Map calls toString() on each of its keys and values.
By overriding the default implementation with your own:
@Override
public String toString(){
return "leftWords: "+leftWords+", rightWords: "+rightWords;
}
you replace com.gyan.siapp.nlp.test.Words@681330f0 with your own output.
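To see the effect end to end, here is a cut-down sketch: printing the map calls toString() on every key and value, so once Words overrides toString() the map prints readable text instead of the class name and hash. The class name WordsDemo is hypothetical, and equals()/hashCode() are rewritten with java.util.Objects helpers for brevity; they are equivalent to the hand-written versions in the question.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

public class WordsDemo {
    // Trimmed-down stand-in for the Words class, with toString() overridden.
    static final class Words {
        private final String leftWords;
        private final String rightWords;

        Words(String leftWords, String rightWords) {
            this.leftWords = leftWords;
            this.rightWords = rightWords;
        }

        @Override
        public String toString() {
            return "leftWords: " + leftWords + ", rightWords: " + rightWords;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Words)) return false;
            Words w = (Words) o;
            return Objects.equals(leftWords, w.leftWords)
                    && Objects.equals(rightWords, w.rightWords);
        }

        @Override
        public int hashCode() {
            return Objects.hash(leftWords, rightWords);
        }
    }

    public static void main(String[] args) {
        Map<Set<String>, Set<Words>> contextMap = new LinkedHashMap<>();
        Set<String> key = new LinkedHashSet<>();
        key.add("Rhode Island");
        Set<Words> value = new LinkedHashSet<>();
        value.add(new Words("fellow", "solution provider"));
        contextMap.put(key, value);
        // The map's toString() calls toString() on each key and value, so this prints
        // {[Rhode Island]=[leftWords: fellow, rightWords: solution provider]}
        System.out.println(contextMap);
    }
}
```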

Creating a Lexer that implements Iterator<Lexeme>

I'm new to programming, and I have an assignment that asks us to write a Lexer that implements Iterator<Lexeme> in Java. We are given strings containing equations with varying amounts of whitespace, and we have to produce Lexemes of several types. This is what I have so far, but when I run it I get an OutOfMemoryError: Java heap space. I have no idea what is causing this error or whether I am even on the right track with my code. Any suggestions?
Thanks
public enum LexemeType {
LEFT_PAREN, RIGHT_PAREN, OPERATOR, NUMBER, VARIABLE, EQUALS, SEMICOLON, USER_INPUT;
}
import java.io.*;
import java.util.*;
public class Lexer implements Iterator<Lexeme> {
Lexeme token = null; // last token recognized
boolean eof = false; // reached end of file
private Reader reader = null; // input stream
private int lookahead = 0; // lookahead, if any
private int[] buffer = new int[100]; // lexeme buffer
private int index = 0; // length of lexeme
public Lexer(String toLex) {
this.reader = new StringReader(toLex);
}
// Extract lexeme from buffer
public String getLexeme() {
return new String(buffer,0,index);
}
// Reset state to begin new token
private void reset() throws IOException {
if (eof)
throw new IllegalStateException();
index = 0;
token = null;
if (lookahead==0)
read();
}
// Read one more char.
// Add previous char, if any, to the buffer.
private void read() throws IOException {
if (eof)
throw new IllegalStateException();
if (lookahead != 0) {
buffer[index] = lookahead;
index++;
}
lookahead = reader.read();
}
// Recognize a token
public void lex() throws IOException {
reset();
// Skip whitespace
while (Character.isWhitespace(lookahead)) {
read();
}
reset();
// Recognize (
if (lookahead == '(') {
token = new Lexeme(LexemeType.LEFT_PAREN, "(");
read();
return;
}
// Recognize )
if (lookahead == ')') {
token = new Lexeme(LexemeType.RIGHT_PAREN, ")");
read();
return;
}
// Recognize ;
if (lookahead == ';') {
token = new Lexeme(LexemeType.SEMICOLON, ";");
read();
return;
}
// Recognize =
if (lookahead == '=') {
token = new Lexeme(LexemeType.EQUALS, "=");
read();
return;
}
// Recognize ?
if (lookahead == '?') {
token = new Lexeme(LexemeType.USER_INPUT, "?");
read();
return;
}
// Recognize float
if (Character.isDigit(lookahead)) {
do {
read();
} while (Character.isDigit(lookahead));
if (lookahead=='.') {
read();
while (Character.isDigit(lookahead))
read();
}
token = new Lexeme(LexemeType.NUMBER, getLexeme());
return;
}
// Recognize string
if (lookahead=='"') {
do {
read();
} while (lookahead!='"');
read();
token = new Lexeme(LexemeType.VARIABLE, getLexeme());
return;
}
}
@Override
public boolean hasNext() {
if (token!=null)
return true;
if (eof)
return false;
try {
lex();
} catch (IOException e) {
}
return true;
}
@Override
public Lexeme next() {
if (hasNext()) {
Lexeme result = token;
token = null;
return result;
}
else
throw new IllegalStateException();
}
}
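The OutOfMemoryError most likely comes from hasNext(): eof is never set to true, so it always returns true. At the end of the input (where reader.read() returns -1) and on any character lex() does not recognize, such as letters or + - * /, no branch matches, token stays null, and next() keeps returning null forever; presumably the calling loop stores those results, which eventually fills the heap. Below is a minimal sketch of a lexer whose hasNext() can become false. It swaps the Reader and int buffer for an index into the String, uses a hypothetical minimal Lexeme class (the question does not show its own), and assumes that + - * / are the operators and that variables are identifiers made of letters and digits (the original code treats quoted strings as VARIABLE). Unknown characters throw instead of silently producing null.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Sketch: a String-indexed lexer whose hasNext() becomes false at the end of input.
public class SimpleLexer implements Iterator<SimpleLexer.Lexeme> {
    public enum LexemeType {
        LEFT_PAREN, RIGHT_PAREN, OPERATOR, NUMBER, VARIABLE, EQUALS, SEMICOLON, USER_INPUT
    }

    // Hypothetical minimal Lexeme: a type plus the matched text.
    public static final class Lexeme {
        public final LexemeType type;
        public final String text;

        Lexeme(LexemeType type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    private final String input;
    private int pos = 0; // index of the next unread character

    public SimpleLexer(String input) {
        this.input = input;
    }

    @Override
    public boolean hasNext() {
        // Skip whitespace; once only whitespace is left there are no more tokens.
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        return pos < input.length();
    }

    @Override
    public Lexeme next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        int start = pos;
        char c = input.charAt(pos);
        switch (c) {
            case '(': pos++; return new Lexeme(LexemeType.LEFT_PAREN, "(");
            case ')': pos++; return new Lexeme(LexemeType.RIGHT_PAREN, ")");
            case ';': pos++; return new Lexeme(LexemeType.SEMICOLON, ";");
            case '=': pos++; return new Lexeme(LexemeType.EQUALS, "=");
            case '?': pos++; return new Lexeme(LexemeType.USER_INPUT, "?");
            case '+': case '-': case '*': case '/':
                pos++;
                return new Lexeme(LexemeType.OPERATOR, String.valueOf(c));
            default:
                break;
        }
        if (Character.isDigit(c)) { // integer or decimal such as 3.5
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            if (pos < input.length() && input.charAt(pos) == '.') {
                pos++;
                while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            }
            return new Lexeme(LexemeType.NUMBER, input.substring(start, pos));
        }
        if (Character.isLetter(c)) { // assumed: a letter followed by letters/digits
            while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) pos++;
            return new Lexeme(LexemeType.VARIABLE, input.substring(start, pos));
        }
        // Fail loudly instead of producing null tokens forever.
        throw new IllegalStateException("Unexpected character '" + c + "' at index " + pos);
    }

    public static void main(String[] args) {
        SimpleLexer lexer = new SimpleLexer(" x = (3.5 + y) * 2 ; ");
        while (lexer.hasNext()) {
            System.out.println(lexer.next());
        }
    }
}
```

Because hasNext() does the whitespace skipping itself, trailing spaces never produce an extra empty token, and the loop in main stops on its own.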

Java homework (recursion)

This is the question:
Problem I.
We define the Pestaina strings as follows:
ab is a Pestaina string.
cbac is a Pestaina string.
If S is a Pestaina string, so is SaS.
If U and V are Pestaina strings, so is UbV.
Here a, b, c are constants and S, U, V are variables. In these rules,
the same letter represents the same string. So, if S = ab, rule 3
tells us that abaab is a Pestaina string. In rule 4, U and V represent
Pestaina strings, but they may be different.
Write the method
public static boolean isPestaina(String in)
That returns true if in is a Pestaina string and false otherwise.
And this is what I have so far. It only works for the first rule, and there are some cases in which it doesn't work, for example "abaaab":
public class Main {
private static boolean bool = true;
public static void main(String[] args){
String pestaina = "abaaab";
System.out.println(pestaina+" "+pestainaString(pestaina));
}
public static boolean pestainaString(String p){
if(p == null || p.length() == 0 || p.length() == 3) {
return false;
}
if(p.equals("ab")) {
return true;
}
if(p.startsWith("ab")){
bool = pestainaString(p, 1);
}else{
bool = false;
}
return bool;
}
public static boolean pestainaString(String p, int sign){
String letter;
char concat;
if("".equals(p)){
return false;
}
if(p.length() < 3){
letter = p;
concat = ' ';
p = "";
pestainaString(p);
}else if(p.length() == 3 && (!"ab".equals(p.substring(0, 2)) || p.charAt(2) != 'a')){
letter = p.substring(0, 2);
concat = p.charAt(2);
p = "";
pestainaString(p);
}else{
letter = p.substring(0, 2);
concat = p.charAt(2);
pestainaString(p.substring(3));
}
if(letter.length() == 2 && concat == ' '){
if(!"ab".equals(letter.trim())){
bool = false;
//concat = 'a';
}
}else if((!"ab".equals(letter)) || (concat != 'a')){
bool = false;
}
System.out.println(letter +" " + concat);
return bool;
}
}
Please tell me what I have done wrong.
I found the problem: I was calling the wrong method.
You are describing a context-free language, which can be described by a context-free grammar and parsed with it. Parsing such languages is a widely researched field, and there are a lot of resources for it out there.
The Wikipedia page also discusses some algorithms to parse these; specifically, I think you are interested in Earley parsers.
I also believe this "language" can be parsed using a pushdown automaton (though I'm not 100% sure about it).
public static void main(String[] args) {
// TODO code application logic here
String text = "cbacacbac";
System.out.println("Is \""+ text +"\" a Pestaina string? " + isPestaina(text));
}
public static boolean isPestaina(String in) {
if (in.equals("ab")) {
return true;
}
if (in.equals("cbac")) {
return true;
}
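if (in.length() > 3) {
// Shortcut: only the prefix and suffix are checked, so non-Pestaina strings
// such as "abxab" are accepted too; rules 3 and 4 are not actually applied.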
if ((in.startsWith("ab") || in.startsWith("cbac"))
&& (in.endsWith("ab") || in.endsWith("cbac"))) {
return true;
}
}
return false;
}
That was fun.
public boolean isPestaina(String p) {
Set<String> existingPestainas = new HashSet<String>(Arrays.asList(new String[]{"ab", "cbac"}));
boolean isP = false;
int lengthParsed = 0;
do {
if (lengthParsed > 0) {
//just realized there's a touch more to do here for the a/b
//connecting rules... I'll leave it as an exercise for the reader.
if (p.substring(lengthParsed).startsWith("a") ||
p.substring(lengthParsed).startsWith("b")) {
//good connector.
lengthParsed++;
} else {
//bad connector;
return false;
}
}
for (String existingP : existingPestainas) {
if (p.substring(lengthParsed).startsWith(existingP)) {
isP = true;
lengthParsed += existingP.length();
}
}
if (isP) {
System.err.println("Adding pestaina: " + p.substring(0, lengthParsed));
existingPestainas.add(p.substring(0, lengthParsed));
}
} while (isP && p.length() >= lengthParsed + 1);
return isP;
}
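The snippets above only partly apply the four rules, so here is a plain recursive check that tries each rule directly, rather than a general parser such as an Earley parser. One detail worth noting: rule 3 requires both halves of SaS to be the same string, which an ordinary context-free production like P → P a P would not enforce, so the sketch compares the halves explicitly. The memo map is an optional addition that avoids re-checking the same substrings while the rule-4 splits are tried; the class name Pestaina is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class Pestaina {
    // Cache of already-checked strings; rule 4 tries many overlapping substrings.
    private static final Map<String, Boolean> memo = new HashMap<>();

    public static boolean isPestaina(String in) {
        if (in == null) {
            return false;
        }
        Boolean cached = memo.get(in);
        if (cached != null) {
            return cached;
        }
        boolean result = check(in);
        memo.put(in, result);
        return result;
    }

    private static boolean check(String in) {
        // Rules 1 and 2: the two base strings.
        if (in.equals("ab") || in.equals("cbac")) {
            return true;
        }
        int n = in.length();
        // Rule 3: S a S. The length is odd, the middle character is 'a', and
        // both halves are the SAME Pestaina string (shortest S is "ab", so n >= 5).
        if (n >= 5 && n % 2 == 1) {
            int mid = n / 2;
            String left = in.substring(0, mid);
            if (in.charAt(mid) == 'a' && left.equals(in.substring(mid + 1)) && isPestaina(left)) {
                return true;
            }
        }
        // Rule 4: U b V. Try every inner 'b' as the split point; U and V may differ.
        for (int i = 1; i < n - 1; i++) {
            if (in.charAt(i) == 'b'
                    && isPestaina(in.substring(0, i))
                    && isPestaina(in.substring(i + 1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ab", "abaab", "cbacacbac", "abbcbac", "abaaab", "abacbac"}) {
            System.out.println(s + " -> " + isPestaina(s));
        }
    }
}
```

Every recursive call works on a strictly shorter string, so the recursion always terminates. With these rules "abaaab" is rejected: no Pestaina string has length 6, since the shortest ones have lengths 2, 4 and 5.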
