Regular expression for validating an answer to a question - java

Hey everyone,
I'm having a minor difficulty setting up a regular expression that evaluates a sentence entered by a user in a textbox to keyword(s). Essentially, the keywords have to be entered consecutive from one to the other and can have any number of characters or spaces before, between, and after (ie. if the keywords are "crow" and "feet", crow must be somewhere in the sentence before feet. So with that in mind, this statement should be valid "blah blah sccui crow dsj feet "). The characters and to some extent, the spaces (i would like the keywords to have at least one space buffer in the beginning and end) are completely optional, the main concern is whether the keywords were entered in their proper order.
So far, I was able to have my regular expression work in a sentence but failed to work if the answer itself was entered only.
I have the regular expression used in the function below:
// Comparing an answer with the right solution
protected boolean checkAnswer(String a, String s) {
boolean result = false;
//Used to determine if the solution is more than one word
String temp[] = s.split(" ");
//If only one word or letter
if(temp.length == 1)
{
if (s.length() == 1) {
// check multiple choice questions
if (a.equalsIgnoreCase(s)) result = true;
else result = false;
}
else {
// check short answer questions
if ((a.toLowerCase()).matches(".*?\\s*?" + s.toLowerCase() + "\\s*?.*?")) result = true;
else result = false;
}
}
else
{
int count = temp.length;
//Regular expression used to
String regex=".*?\\s*?";
for(int i = 0; i<count;i++)
regex+=temp[i].toLowerCase()+"\\s*?.*?";
//regex+=".*?";
System.out.println(regex);
if ((a.toLowerCase()).matches(regex)) result = true;
else result = false;
}
return result;
Any help would greatly be appreciated.
Thanks.

I would go about this in a different way. Instead of trying to use one regular expression, why not use something similar to:
String answer = ... // get the user's answer
if( answer.indexOf("crow") < answer.indexOf("feet") ) {
// "correct" answer
}
You'll still need to tokenize the words in the correct answer, then check in a loop to see if the index of each word is less than the index of the following word.

I don't think you need to split the result on " ".
If I understand correctly, you should be able to do something like
String regex="^.*crow.*\\s+.*feet.*"
The problem with the above is that it will match "feetcrow feetcrow".
Maybe something like
String regex="^.*\\s+crow.*\\s+feet\\s+.*"
That will enforce that the word is there as opposed to just in a random block of characters.

Depending on the complexity Bill's answer might be the fastest solution. If you'd prefer a regular expression, I wouldn't look for any spaces, but word boundaries instead. That way you won't have to handle commas, dots, etc. as well:
String regex = "\\bcrow(?:\\b.*\\b)?feet\\b"
This should match "crow bla feet" as well as "crowfeet" and "crow, feet".
Having to match multiple words in a specific order you could just join them together using '(?:\b.*\b)?' without requiring any additional sorting or checking.

Following Bill answer, I'd try this:
String input = // get user input
String[] tokens = input.split(" ");
String key1 = "crow";
String key2 = "feet";
String[] tokens = input.split(" ");
List<String> list = Arrays.asList(tokens);
return list.indexOf(key1) < list.indexOf(key2)

Related

Reading a file -- pairing a String and int value -- with multiple split lines

I am working on an exercise with the following criteria:
"The input consists of pairs of tokens where each pair begins with the type of ticket that the person bought ("coach", "firstclass", or "discount", case-sensitively) and is followed by the number of miles of the flight."
The list can be paired -- coach 1500 firstclass 2000 discount 900 coach 3500 -- and this currently works great. However, when the String and int value are split like so:
firstclass 5000 coach 1500 coach
100 firstclass
2000 discount 300
it breaks entirely. I am almost certain that it has something to do with me using this format (not full)
while(fileScanner.hasNextLine())
{
StringTokenizer token = new StringTokenizer(fileScanner.nextLine(), " ")
while(token.hasMoreTokens())
{
String ticketClass = token.nextToken().toLowerCase();
int count = Integer.parseInt(token.nextToken());
...
}
}
because it will always read the first value as a String and the second value as an integer. I am very lost on how to keep track of one or the other while going to read the next line. Any help is truly appreciated.
Similar (I think) problems:
Efficient reading/writing of key/value pairs to file in Java
Java-Read pairs of large numbers from file and represent them with linked list, get the sum and product of each pair
Reading multiple values in multiple lines from file (Java)
If you can afford to read the text file in all at once as a very long String, simply use the built-in String.split() with the regex \\s+, like so
String[] tokens = fileAsString.split("\\s+");
This will split the input file into tokens, assuming the tokens are separated by one or more whitespace characters (a whitespace character covers newline, space, tab, and carriage return). Even and odd tokens are ticket types and mile counts, respectively.
If you absolutely have to read in line-by-line and use StringTokenizer, a solution is to count number of tokens in the last line. If this number is odd, the first token in the current line would be of a different type of the first token in the last line. Once knowing the starting type of the current line, simply alternating types from there.
int tokenCount = 0;
boolean startingType = true; // true for String, false for integer
boolean currentType;
while(fileScanner.hasNextLine())
{
StringTokenizer token = new StringTokenizer(fileScanner.nextLine(), " ");
startingType = startingType ^ (tokenCount % 2 == 1); // if tokenCount is odd, the XOR ^ operator will flip the starting type of this line
tokenCount = 0;
while(token.hasMoreTokens())
{
tokenCount++;
currentType = startingType ^ (tokenCount % 2 == 0); // alternating between types in current line
if (currentType) {
String ticketClass = token.nextToken().toLowerCase();
// do something with ticketClass here
} else {
int mileCount = Integer.parseInt(token.nextToken());
// do something with mileCount here
}
...
}
}
I found another way to do this problem without using either the StringTokenizer or the regex...admittedly I had trouble with the regular expressions haha.
I declare these outside of the try-catch block because I want to use them in both my finally statement and return the points:
int points = 0;
ArrayList<String> classNames = new ArrayList<>();
ArrayList<Integer> classTickets = new ArrayList<>();
Then inside my try-statement, I declare the index variable because I won't need that outside of this block. That variable increases each time a new element is read. Odd elements are read as ticket classes and even elements are read as ticket prices:
try
{
int index = 0;
// read till the file is empty
while(fileScanner.hasNext())
{
// first entry is the ticket type
if(index % 2 == 0)
classNames.add(fileScanner.next());
// second entry is the number of points
else
classTickets.add(Integer.parseInt(fileScanner.next()));
index++;
}
}
You can either catch it here like this or use throws NoSuchElementException in your method declaration -- As long as you catch it on your method call
catch(NoSuchElementException noElement)
{
System.out.println("<###-NoSuchElementException-###>");
}
Then down here, loop through the number of elements. See which flight class it is and multiply the ticket count respectively and return the points outside of the block:
finally
{
for(int i = 0; i < classNames.size(); i++)
{
switch(classNames.get(i).toLowerCase())
{
case "firstclass": // 2 points for first
points += 2 * classTickets.get(i);
break;
case "coach": // 1 point for coach
points += classTickets.get(i);
break;
default:
// budget gets nothing
}
}
}
return points;
The regex seems like the most convenient way, but this was more intuitive to me for some reason. Either way, I hope the variety will help out.
simply use the built-in String.split() - #bui
I was finally able to wrap my head around regular expressions, but \s+ was not being recognized for some reason. It kept giving me this error message:
Invalid escape sequence (valid ones are \b \t \n \f \r " ' \ )Java(1610612990)
So when I went through with those characters instead, I was able to write this:
int points = 0, multiplier = 0, tracker = 0;
while(fileScanner.hasNext())
{
String read = fileScanner.next().split(
"[\b \t \n \f \r \" \' \\ ]")[0];
if(tracker % 2 == 0)
{
if(read.toLowerCase().equals("firstclass"))
multiplier = 2;
else if(read.toLowerCase().equals("coach"))
multiplier = 1;
else
multiplier = 0;
}else
{
points += multiplier * Integer.parseInt(read);
}
tracker++;
}
This code goes one entry at a time instead of reading a whole array void of whitespace as a work-around for that error message I was getting. If you could show me what the code would look like with String[] tokens = fileAsString.split("\s+"); instead I would really appreciate it :)
you need to add another "\" before "\s" to escape the slash before "s" itself – #bui

Why I cannot get the string without tokens with the program I have written?

Scanner scan = new Scanner(System.in);
String s = scan.nextLine();
Queue q=new LinkedList();
for(int i=0;i<s.length();i++){
int x=(int)s.charAt(i);
if(x<65 || (x>90 && x<97) || x>122) {
q.add(s.charAt(i));
}
}
System.out.println(q.peek());
String redex="";
while(!q.isEmpty()) {
redex+=q.remove();
}
String[] x=s.split(redex,-1);
for(String y:x) {
if(y!=null)
System.out.println(y);
}
scan.close();
I am trying to print the string "my name is NLP and I, so, works:fine;"yes"." without tokens such as {[]}+-_)*&%$ but it just prints out all the String as it is, and I don't understand the problem?
This is 3 answers in one:
For your initial problem
For a solution without regex
For a correct use of Scanner (this is up to you).
First
When you use a regex build from whatever character you got under the hand, you should quote it:
String[] x=s.split(Pattern.quote(redex),-1);
That would be the usual problem, but the second problem is that you are building a regexp range but you are omitting the [] making the range, so it can work as is:
String[] x=s.split("[" + Pattern.quote(redex) + "]",-1);
This one may work, but may fail if Pattern.quote don't quote - and - is found in between two characters making a range such as : $-!.
This would means: character in range starting at $ from !. It may fail if the range is invalid and my example may be invalid ($ may be after !).
Finally, you may use:
String redex = q.stream()
.map(Pattern::quote)
.collect(Collectors.joining("|"));
This regexp should match the unwanted character.
Second:
For the rest, the other answer point out another problem: you are not using the Character.isXXX method to check for valid characters.
Firstly, be wary that some method does not use char but code points. For example, isAlphabetic use code points. A code points is simply a representation of a character in a multibyte encoding. There some unicode character which take two char.
Secondly, I think your problem lies in the fact you are not using the right tool to split your words.
In pseudo code, this should be:
List<String> words = new ArrayList<>();
int offset = 0;
for (int i = 0, n = line.length(); i < n; ++i) {
// if the character fail to match, then we switched from word to non word
if (!Character.isLetterOrDigit(line.charAt(i)) {
if (offset != i) {
words.add(line.substring(offset, i));
}
offset = i + 1; // next char
}
}
if (offset != line.length()) {
words.add(line.substring(offset));
}
This would:
- Find transition from word to non word and change offset (where we started)
- Add word to the list
- Add the last token as ending word.
Last
Alternatively, you may also play with Scanner class since it allows you to input a custom delimiter for its hasNext(): https://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html
I quote the class javadoc:
The scanner can also use delimiters other than whitespace. This
example reads several items in from a string:
String input = "1 fish 2 fish red fish blue fish";
Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
System.out.println(s.nextInt());
System.out.println(s.nextInt());
System.out.println(s.next());
System.out.println(s.next());
s.close();
As you guessed, you may pass on any delimiter and then use hasNext() and next() to get only valid words.
For example, using [^a-zA-Z0-9] would split on each non alpha/digit transition.
As noted in the comment, the condition x<65 will catch all sorts of special characters you're not interested in. Using Character's built-in methods will help you write this condition in a clearer, bug-free way:
x = s.charAt(i);
if (Character.isLetter(x) || Character.isWhiteSpace(x)) {
q.add(x);
}

match exact the same words between 2 strings

I would like to compare and match exactly one word (characters and length) between two strings.
This is what I have:
String wordCompare = "eagle:1,3:7;6\nBasils,45673:ewwsk\nlola:flower:1:2:b";
String lolo = scanner.nextLine();
if ( motCompare.toLowerCase().indexOf(lolo.toLowerCase()) != -1 ) {
System.out.println("Bingo !!!");
} else {
System.out.println("not found !!!");
}
If I type eagle:1,3:7;6 it should display Bingo !!!
If I type eagle:1,3 it still displays Bingo !!! which is wrong, it should display Not found.
If I type eagle:1,3:7;6 Basils,45673:ewwsk or eagle:1,3:7;6\nBasils,45673:ewwsk it should also display Not Found. Length of the typed word should be acknowledged between \n.
If I type Basils,45673:ewwsk, it displays bingo !!!
It looks like what you're wanting is an exact match, with the words being split by the newline character. With that assumption in mind, I would recommend splitting the string out into an array and then loading that into a HashSet like so:
boolean search(String wordDictionary, String search){
String[] options = wordDictionary.split("\n");
HashSet<String> searchSet = new HashSet<String>(Arrays.asList(options));
return searchSet.contains(search);
}
If the search function returns true, it has found whatever word you're searching for, if not, it hasn't.
Installing it in your code will look something like this:
String wordCompare = "eagle:1,3:7;6\nBasils,45673:ewwsk\nlola:flower:1:2:b";
String lolo = scanner.nextLine();
if(search(wordCompare, lolo))
System.out.println("Bingo!!!");
else
System.out.println("Not found.");
(For the record, you'd probably be better off with more clear variable names)
As #Grey has already mentioned within his answer, since you have a newline tag (\n) between your phrases you can Split the String using the String.split() method into a String Array and then compare the elements of that Array for equality with what the User supplies.
The code below is just another example of how this can be done. It also allows for the option to Ignore Letter case:
boolean ignoreCase = false;
String userString = "Basils,45673:ewwsk";
String isInString = "'" + userString + "' Was Not Found !!!";
String wordCompare = "eagle:1,3:7;6\nBasils,45673:ewwsk\nlola:flower:1:2:b";
String[] tmp = wordCompare.split("\n");
for (int i = 0; i < tmp.length; i++) {
// Ternary used for whether or not to ignore letter case.
if (!ignoreCase ? tmp[i].trim().equals(userString) :
tmp[i].trim().equalsIgnoreCase(userString)) {
isInString = "Bingo !!!";
break;
}
}
System.out.println(isInString);
Thank you,
The thing is I am not allowed to use regular expression nor tables.
so basing on your suggestions I made this code :
motCompare.toLowerCase().indexOf(lolo.toLowerCase(), ' ' ) != -1 ||
motCompare.toLowerCase().lastIndexOf(lolo.toLowerCase(),' ' ) != -1)
as a condition for a do while loop.
Could you please confirm if it is correct ?
Thank you.

Java: Removing duplicate words & substrings of words in java

Recently i have come up against a question which i am not able to tackle in school.
I need to remove duplicate words in an input string which consists of words. The main issue here is that the requirement states that i cannot use arrays or regular expressions.
E.g.
userInput = "this is a test testing is fun really fun"
the first "is" is a duplicate of "this" as it is a substring
the second "is" is a duplicate of the first "is"
"testing" is not a duplicate of "test" as it is not an exact match
therefore the output comes out as - "this a test testing fun really"
How would one actually achieve this without using Arrays or Regular Expressions as it is impossible to split the words up by the white spaces and dynamically create a String in java.
I didn't compile this code, but I think it should works.
Let me know if it can help you to solved your problem.
public String solve(String input) {
String ret = "";
int pos = 0;
while(pos<input.length()) {
// find next position of space
int next = input.indexOf(' ',pos);
// space not exists, skip next to end of string
if(next==-1) next = input.length();
// take 1 word from input
String word = input.substring(pos,next);
// check if word exists in previous result
if(ret.indexOf(word)==-1) {
if(ret.length() > 0) ret += " ";
// append word to ret
ret += word;
}
pos = next + 1;
}
return ret;
}

Dynamic AND conditions in an if statement

I have an input String that contains a couple of search terms to find a line in a text including all of the search terms.
For example:
String searchTerms = "java stackoverflow conditions";
String [] splittedTerm = searchTerms.split(" ");
The search terms are AND connective:
if (textLine.contains(splittedTerm[0] && textLine.contains(splittedTerm[1]) && textLine.contains(splittedTerm[2])) start=true;
But the number of search terms is dynamic, it alwayse depends on the users request...
So is there any possibility to use the if statement depending on the number of search terms?
You need to loop through the String[] that you get after splitiing the String :-
First add all the elements you want to compare in an array, then iterate and compare through first array and array returned from split(). Make sure both arrays are of equal length
boolean flag=true;
String searchTerms = "java stackoverflow conditions hello test";
String [] splittedTerm = searchTerms.split(" ");
for(int i=0;i<splittedTerm.length;i++){
if (!(textLine[i].equals(splittedTerm[i]))){ //textLine is the array containing String literals you want to compare.
flag=false;
}
}
start=flag;
you could do a loop which iterates over all search terms. If any missmatch is found, set a flag and break the loop. Below the loop you can then check the flag, if it's still true all of your search terms matched.
boolean flag = true;
for (String searchTerm : splittedTerm){
if (!stringToSearch.contains(searchTerm) {
flag = false;
break;
}
}
if (flag)
all terms matched
else
one or more terms did not match

Categories