Why this simple regex/Scanner is not working?

I have to solve the following exercise:
Create a keyboard scanner in which the tokens are unsigned integers, and write the code to determine the sum of the integers.
Note: -5 will be scanned as the unsigned integer 5, and the minus sign will be skipped over as a delimiter.
This is my solution (not working):
import java.util.*;
public class testing{
    public static void main( String[] argv ){
        testing p = new testing();
    }
    public testing(){
        Scanner myScanner = new Scanner(System.in);
        String regex = "\\s+|-";
        myScanner.useDelimiter(regex);
        int sum = 0;
        while(myScanner.hasNextInt()){
            int partial = myScanner.nextInt();
            sum += partial;
        }
        System.out.println(sum);
    }
}
The problem with my solution is that it only works for positive integers, or for a negative integer (as the first input only) added to positive integers.
For example:
-2
3
4
f (used to stop the program)
returns 9.
But
3
-2
stops the program and returns 3.
I am trying to understand the reason for this behavior, but no luck so far.

You are using an OR in the regex, which means the delimiter consumes only the line break and then stops at the '-', because the minus is no longer consumed as part of the same delimiter.
The input your scanner sees is:
-2\n3\n4\nf
'f' is the first thing after an integer that does not match your pattern of whitespace OR a single minus.
The second input, though, needs both a line break and a minus to be matched by your delimiter:
3\n-2
The line break (whitespace) is matched by your delimiter pattern and the minus remains unmatched, leaving an empty token in front of it. Since that is not an integer, hasNextInt() returns false.
A repeatable character class of whitespace and minus works as you intend:
final String regex = "[\\s-]+";
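A minimal sketch of that delimiter in action, assuming the input comes from a String instead of System.in so it is easy to try out (UnsignedSum and sumUnsigned are made-up names for illustration):

```java
import java.util.Scanner;

class UnsignedSum {
    // Sums the integers in `input`, treating any run of whitespace and '-' as one delimiter,
    // so "-2" is read as the unsigned integer 2.
    static int sumUnsigned(String input) {
        int sum = 0;
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter("[\\s-]+");
            // stops at the first token that is not an integer, e.g. "f"
            while (scanner.hasNextInt()) {
                sum += scanner.nextInt();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumUnsigned("3\n-2\nf"));    // 5
        System.out.println(sumUnsigned("-2\n3\n4\nf")); // 9
    }
}
```

Both inputs from the question now give the expected sums, because the line break and the minus in front of 2 are swallowed by a single delimiter match.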
This might accept more than your exercise requires, such as multiple minus signs in front of a number. If that is not acceptable, you can limit the minus to one occurrence after any amount of whitespace:
final String regex = "[\\s]*-?";
The '?' means "once or not at all" and limits the number of minus signs.
Edit:
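As @maraca pointed out in a comment, my previously proposed solution only works for single-digit numbers, because the pattern also matches the empty string. A solution that works for numbers > 9 as well is:
final String regex = "\\s*-|\\s+";
It consumes either zero or more whitespace characters followed by a minus sign, OR one or more whitespace characters. The order of the two alternatives matters: Java tries them from left to right and takes the first one that matches, so with \\s+ first, the line break in 3\n-2 would be consumed on its own and the minus would be left behind again.
Turns out even small things like these can be rather difficult to get right. -.-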
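Java tries the alternatives of a | from left to right and keeps the first one that matches, so the order of \\s+ and \\s*- decides what a single delimiter match swallows. A small sketch to check this (AlternationOrder, consumed and sum are illustrative names, and the input is a String rather than System.in):

```java
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrder {
    // Number of leading characters of `text` consumed by `regex` (0 if it does not match there).
    static int consumed(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.lookingAt() ? m.end() : 0;
    }

    // Sums integers using the given delimiter, stopping at the first non-integer token.
    static int sum(String delimiter, String input) {
        int total = 0;
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiter);
            while (scanner.hasNextInt()) {
                total += scanner.nextInt();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // "\n-2" is what follows the 3 in the input "3\n-2"
        System.out.println(consumed("\\s+|\\s*-", "\n-2")); // 1: only the line break
        System.out.println(consumed("\\s*-|\\s+", "\n-2")); // 2: line break and minus
        System.out.println(sum("\\s+|\\s*-", "3\n-2\n10")); // 3: stops at the leftover minus
        System.out.println(sum("\\s*-|\\s+", "3\n-2\n10")); // 15
    }
}
```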

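You can solve it by reading every token as a String and catching the exception thrown when a token is not a number:
public testing(){
    Scanner myScanner = new Scanner(System.in);
    String regex = "\\s+|-";
    myScanner.useDelimiter(regex);
    int sum = 0;
    while(myScanner.hasNext()){
        String token = myScanner.next();
        try{
            sum += Integer.parseInt(token);
        } catch(NumberFormatException e){
            // "\\s+|-" yields an empty token between a line break and a following '-': skip it.
            // Any other non-numeric token (such as "f") ends the input. Retrying nextInt()
            // after an InputMismatchException would loop forever, because the scanner
            // does not advance past the token that failed.
            if (!token.isEmpty()) {
                break;
            }
        }
    }
    System.out.println(sum);
}
The issue is that you are trying to parse an int, but you receive a "-" (more precisely, the empty token in front of it).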

Related

Reading a file -- pairing a String and int value -- with multiple split lines

I am working on an exercise with the following criteria:
"The input consists of pairs of tokens where each pair begins with the type of ticket that the person bought ("coach", "firstclass", or "discount", case-sensitively) and is followed by the number of miles of the flight."
When each pair is on the same line (coach 1500 firstclass 2000 discount 900 coach 3500), this currently works great. However, when a String and its int value are split across lines like so:
firstclass 5000 coach 1500 coach
100 firstclass
2000 discount 300
it breaks entirely. I am almost certain it has something to do with my using this format (abridged):
while(fileScanner.hasNextLine())
{
    StringTokenizer token = new StringTokenizer(fileScanner.nextLine(), " ");
    while(token.hasMoreTokens())
    {
        String ticketClass = token.nextToken().toLowerCase();
        int count = Integer.parseInt(token.nextToken());
        ...
    }
}
because it always reads the first token on a line as a String and the second as an integer. I am lost on how to keep track of which type comes next when moving on to the next line. Any help is truly appreciated.
Similar (I think) problems:
Efficient reading/writing of key/value pairs to file in Java
Java-Read pairs of large numbers from file and represent them with linked list, get the sum and product of each pair
Reading multiple values in multiple lines from file (Java)

If you can afford to read the whole text file at once as one long String, simply use the built-in String.split() with the regex \\s+, like so:
String[] tokens = fileAsString.split("\\s+");
This splits the input file into tokens, assuming the tokens are separated by one or more whitespace characters (whitespace covers newline, space, tab, and carriage return). Tokens at even indices (0, 2, 4, ...) are ticket types and tokens at odd indices are mile counts.
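A sketch of that approach, assuming the whole file has already been read into fileAsString (for example with Files.readString(Path.of("tickets.txt")) on Java 11+, where the file name is hypothetical). It totals the miles per ticket type; TicketTotals and milesByClass are illustrative names. The trim() call keeps leading whitespace from producing an empty first token.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TicketTotals {
    // Splits on runs of whitespace (line breaks included) and walks the tokens in pairs:
    // even index = ticket type, odd index = mile count.
    static Map<String, Integer> milesByClass(String fileAsString) {
        String[] tokens = fileAsString.trim().split("\\s+");
        Map<String, Integer> totals = new LinkedHashMap<>(); // keeps first-seen order
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            totals.merge(tokens[i], Integer.parseInt(tokens[i + 1]), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String input = "firstclass 5000 coach 1500 coach\n100 firstclass\n2000 discount 300";
        System.out.println(milesByClass(input)); // {firstclass=7000, coach=1600, discount=300}
    }
}
```

Because line breaks are just more whitespace to split(), it does not matter on which line a pair starts or ends.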
If you absolutely have to read line by line and use StringTokenizer, one solution is to count the number of tokens in the previous line. If this number is odd, the first token in the current line is of a different type than the first token in the previous line. Once you know the starting type of the current line, simply alternate types from there.
int tokenCount = 0;
boolean startingType = true; // true for String, false for integer
boolean currentType;
while(fileScanner.hasNextLine())
{
    StringTokenizer token = new StringTokenizer(fileScanner.nextLine(), " ");
    startingType = startingType ^ (tokenCount % 2 == 1); // if tokenCount is odd, the XOR ^ operator will flip the starting type of this line
    tokenCount = 0;
    while(token.hasMoreTokens())
    {
        tokenCount++;
        currentType = startingType ^ (tokenCount % 2 == 0); // alternating between types in current line
        if (currentType) {
            String ticketClass = token.nextToken().toLowerCase();
            // do something with ticketClass here
        } else {
            int mileCount = Integer.parseInt(token.nextToken());
            // do something with mileCount here
        }
        ...
    }
}
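A simpler variant of the same idea, not taken from the answer above: instead of flipping the starting type per line, keep one running token counter across all lines, so even counts are always ticket types no matter where the line breaks fall. A minimal sketch (RunningTokenIndex and pairs are hypothetical names; a Scanner over a String stands in for the file):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.StringTokenizer;

class RunningTokenIndex {
    // Reads line by line, but keeps ONE token counter for the whole input, so a ticket
    // type at the end of a line still pairs with the number at the start of the next line.
    static List<String> pairs(Scanner fileScanner) {
        List<String> result = new ArrayList<>();
        long tokenIndex = 0;        // index of the token within the whole input
        String ticketClass = null;  // ticket type waiting for its mile count
        while (fileScanner.hasNextLine()) {
            StringTokenizer token = new StringTokenizer(fileScanner.nextLine());
            while (token.hasMoreTokens()) {
                String t = token.nextToken();
                if (tokenIndex % 2 == 0) {
                    ticketClass = t.toLowerCase();     // even index: ticket type
                } else {
                    int miles = Integer.parseInt(t);   // odd index: mile count
                    result.add(ticketClass + "=" + miles);
                }
                tokenIndex++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("firstclass 5000 coach 1500 coach\n100 firstclass\n2000 discount 300");
        System.out.println(pairs(in));
        // [firstclass=5000, coach=1500, coach=100, firstclass=2000, discount=300]
    }
}
```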

I found another way to solve this problem without using either StringTokenizer or a regex... admittedly I had trouble with the regular expressions, haha.
I declare these outside of the try-catch block because I use them both in my finally block and to return the points:
int points = 0;
ArrayList<String> classNames = new ArrayList<>();
ArrayList<Integer> classTickets = new ArrayList<>();
Then, inside my try block, I declare the index variable, because I won't need it outside of this block. That variable increases each time a new element is read. Elements at even indices (the 1st, 3rd, 5th, ...) are read as ticket classes, and elements at odd indices as mile counts:
try
{
    int index = 0;
    // read till the file is empty
    while(fileScanner.hasNext())
    {
        // first entry is the ticket type
        if(index % 2 == 0)
            classNames.add(fileScanner.next());
        // second entry is the number of miles
        else
            classTickets.add(Integer.parseInt(fileScanner.next()));
        index++;
    }
}
You can either catch it here like this, or declare throws NoSuchElementException on your method, as long as you catch it where the method is called:
catch(NoSuchElementException noElement)
{
    System.out.println("<###-NoSuchElementException-###>");
}
Then, down here, loop through the elements, check which flight class each one is, multiply its mile count accordingly, and return the points outside of the block:
finally
{
    // classTickets is one entry shorter than classNames if the input ends with a ticket type
    for(int i = 0; i < classTickets.size(); i++)
    {
        switch(classNames.get(i).toLowerCase())
        {
            case "firstclass": // 2 points per mile for first class
                points += 2 * classTickets.get(i);
                break;
            case "coach": // 1 point per mile for coach
                points += classTickets.get(i);
                break;
            default:
                // discount gets nothing
        }
    }
}
return points;
The regex seems like the most convenient way, but this was more intuitive to me for some reason. Either way, I hope the variety will help out.

@bui wrote: "simply use the built-in String.split()"
I was finally able to wrap my head around regular expressions, but \s+ was not being recognized for some reason. It kept giving me this error message:
Invalid escape sequence (valid ones are  \b  \t  \n  \f  \r  \"  \'  \\ )Java(1610612990)
So I used those characters instead and was able to write this:
int points = 0, multiplier = 0, tracker = 0;
while(fileScanner.hasNext())
{
    String read = fileScanner.next().split(
            "[\b \t \n \f \r \" \' \\ ]")[0];
    if(tracker % 2 == 0)
    {
        if(read.toLowerCase().equals("firstclass"))
            multiplier = 2;
        else if(read.toLowerCase().equals("coach"))
            multiplier = 1;
        else
            multiplier = 0;
    }else
    {
        points += multiplier * Integer.parseInt(read);
    }
    tracker++;
}
This code processes one entry at a time instead of reading the whole input into an array of whitespace-free tokens, as a workaround for the error message I was getting. If you could show me what the code would look like with String[] tokens = fileAsString.split("\s+"); instead, I would really appreciate it :)
You need to add another "\" before "\s" to escape the backslash itself, i.e. write "\\s+" in the Java string literal. – @bui
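With the backslash doubled, the split-based version the question asks about could look like the sketch below. It reuses the multipliers from the question's code (firstclass = 2, coach = 1, anything else = 0); FlightPoints and points are illustrative names, and fileAsString is assumed to already hold the whole file.

```java
class FlightPoints {
    static int points(String fileAsString) {
        // "\\s+" in Java source is the regex \s+ : one or more whitespace characters
        String[] tokens = fileAsString.trim().split("\\s+");
        int points = 0;
        // tokens[i] is a ticket type, tokens[i + 1] its mile count
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            int multiplier;
            switch (tokens[i].toLowerCase()) {
                case "firstclass": multiplier = 2; break;
                case "coach":      multiplier = 1; break;
                default:           multiplier = 0;
            }
            points += multiplier * Integer.parseInt(tokens[i + 1]);
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(points("firstclass 5000 coach 1500 coach\n100 firstclass\n2000 discount 300")); // 15600
    }
}
```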

Why I cannot get the string without tokens with the program I have written?

Scanner scan = new Scanner(System.in);
String s = scan.nextLine();
Queue q=new LinkedList();
for(int i=0;i<s.length();i++){
    int x=(int)s.charAt(i);
    if(x<65 || (x>90 && x<97) || x>122) {
        q.add(s.charAt(i));
    }
}
System.out.println(q.peek());
String redex="";
while(!q.isEmpty()) {
    redex+=q.remove();
}
String[] x=s.split(redex,-1);
for(String y:x) {
    if(y!=null)
        System.out.println(y);
}
scan.close();
I am trying to print the string "my name is NLP and I, so, works:fine;"yes"." without special characters such as {[]}+-_)*&%$, but it just prints out the whole String as it is, and I don't understand why.

This is 3 answers in one:
- For your initial problem
- For a solution without regex
- For a correct use of Scanner (this is up to you).
First
When you build a regex from whatever characters you happen to have, you should quote it:
String[] x=s.split(Pattern.quote(redex),-1);
That would be the usual problem, but the second problem is that you are building a character range (class) while omitting the [] that make it one, so the whole string is matched as one literal sequence instead of as a set of characters. With the brackets it can work:
String[] x=s.split("[" + Pattern.quote(redex) + "]",-1);
This one may work, but it may fail if Pattern.quote does not escape - and a - ends up between two characters, forming a range such as $-!.
That would mean: any character in the range from $ to !. It fails if the range is invalid, and my example is indeed invalid: $ (U+0024) comes after ! (U+0021).
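Finally, you may use the following (this assumes q is declared as Queue<Character>, and it needs java.util.regex.Pattern and java.util.stream.Collectors):
String redex = q.stream()
        .map(c -> Pattern.quote(String.valueOf(c)))
        .collect(Collectors.joining("|"));
This regex matches any single one of the unwanted characters.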
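Putting the quoting and the | joining together, a self-contained sketch (QuotedSplit and words are illustrative names; it treats every non-letter as a separator, which is close to the question's ASCII range check):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class QuotedSplit {
    static List<String> words(String s) {
        // Collect each separator once, already quoted so characters like . ; " are literal
        Set<String> quoted = new LinkedHashSet<>();
        for (char c : s.toCharArray()) {
            if (!Character.isLetter(c)) {
                quoted.add(Pattern.quote(String.valueOf(c)));
            }
        }
        List<String> result = new ArrayList<>();
        if (quoted.isEmpty()) {              // nothing to split on
            if (!s.isEmpty()) result.add(s);
            return result;
        }
        // "a|b|c": matches any single one of the separators
        for (String part : s.split(String.join("|", quoted))) {
            if (!part.isEmpty()) {           // adjacent separators yield empty parts
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("my name is NLP and I, so, works:fine;\"yes\"."));
        // [my, name, is, NLP, and, I, so, works, fine, yes]
    }
}
```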
Second:
For the rest, the other answer points out another problem: you are not using the Character.isXxx methods to check for valid characters.
First, be aware that some of these methods take code points rather than char values. For example, isAlphabetic takes a code point (an int). A code point is the number Unicode assigns to a character; some Unicode characters, such as most emoji, need two char values (a surrogate pair) in a Java String.
Second, I think your problem lies in the fact that you are not using the right tool to split your words.
In Java, this could look like the following (line is the input string):
List<String> words = new ArrayList<>();
int offset = 0;
for (int i = 0, n = line.length(); i < n; ++i) {
    // if the character fails to match, we switched from word to non-word
    if (!Character.isLetterOrDigit(line.charAt(i))) {
        if (offset != i) {
            words.add(line.substring(offset, i));
        }
        offset = i + 1; // next char
    }
}
if (offset != line.length()) {
    words.add(line.substring(offset));
}
This would:
- Find each transition from word to non-word and move offset (where the current word started)
- Add each word to the list
- Add the last token as the final word.
Last
Alternatively, you can use the Scanner class, since it lets you set a custom delimiter used by hasNext() and next(): https://docs.oracle.com/javase/7/docs/api/java/util/Scanner.html
I quote the class javadoc:
The scanner can also use delimiters other than whitespace. This
example reads several items in from a string:
String input = "1 fish 2 fish red fish blue fish";
Scanner s = new Scanner(input).useDelimiter("\\s*fish\\s*");
System.out.println(s.nextInt());
System.out.println(s.nextInt());
System.out.println(s.next());
System.out.println(s.next());
s.close();
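As you can guess, you can pass any delimiter and then use hasNext() and next() to get only valid words.
For example, using [^a-zA-Z0-9]+ as the delimiter splits on every run of characters that are neither letters nor digits (without the +, consecutive separators such as ", " would produce empty tokens).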
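A minimal sketch of the Scanner route with that delimiter (ScannerWords and words are illustrative names). The Scanner skips leading delimiters, so no empty first token appears:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ScannerWords {
    static List<String> words(String line) {
        List<String> words = new ArrayList<>();
        // every run of non-letters/non-digits acts as one delimiter
        try (Scanner scanner = new Scanner(line).useDelimiter("[^a-zA-Z0-9]+")) {
            while (scanner.hasNext()) {
                words.add(scanner.next());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(words("my name is NLP and I, so, works:fine;\"yes\"."));
        // [my, name, is, NLP, and, I, so, works, fine, yes]
    }
}
```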

As noted in the comment, the condition x<65 catches all sorts of characters you may not be interested in, digits and spaces among them. Using Character's built-in methods helps you write this condition in a clearer, less error-prone way:
char c = s.charAt(i);
if (Character.isLetter(c) || Character.isWhitespace(c)) {
    q.add(c);
}
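If the goal is only to print the string without the special characters, the same condition can build the result directly with a StringBuilder, with no queue or regex involved. A sketch with hypothetical names (KeepLetters, keepLettersAndSpaces):

```java
class KeepLetters {
    // Keeps letters and whitespace, drops everything else.
    static String keepLettersAndSpaces(String s) {
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetter(c) || Character.isWhitespace(c)) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepLettersAndSpaces("my name is NLP and I, so, works:fine;\"yes\"."));
        // my name is NLP and I so worksfineyes
    }
}
```

Note that words separated only by punctuation, such as works:fine, end up joined together (worksfine); if they should stay separate, splitting into words as in the other approaches fits better.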

How to add a space at the end of every sentence?

I have a couple of sentences that I process in my Android app. At the end of every sentence, I need to add an extra space. I tried the following:
bodyText=body.replaceAll("\\.",". ");
This worked until I found dots inside sentences. For example, if a sentence contains a decimal number, the code above adds a space inside that number too. Here is an example where I applied the code above and it did not work as expected:
Last year the overall pass percentage was 90. 95%. It was 96. 21% in 2016.
You can see how the decimal places are separated by a space.
How can I add a space only at the end of a sentence? Normally every sentence ends with a full stop.

You can get the result with your own approach, like below:
public static String modifySentence(String input) {
    StringBuilder sb = new StringBuilder(input);
    // Counter which increases with every insertion of a char into the StringBuilder
    int insertCounter = 1;
    int index = input.indexOf(".");
    // Skip the dot if it is the last character of the input,
    // or if it is followed by a digit or a space
    while (index >= 0) {
        if ((index + 1 < input.length())
                && (!Character.isDigit(input.charAt(index + 1)))
                && (!Character.isSpaceChar(input.charAt(index + 1)))) {
            sb.insert(index + insertCounter, " ");
            insertCounter++;
        }
        index = input.indexOf(".", index + 1);
    }
    return sb.toString();
}
Input:
System.out.println(modifySentence("Last year the overall pass percentage was 90.95%.It was 96.21% in 2016."));
System.out.println(modifySentence("Last year the overall pass percentage was 90.95%.It was 96.21% in 2016. And this is extra . test string"));
Output:
Last year the overall pass percentage was 90.95%. It was 96.21% in 2016.
Last year the overall pass percentage was 90.95%. It was 96.21% in 2016. And this is extra . test string
As wiktor-stribiżew commented, the same result can be achieved with your_string.replaceAll("\\.([^\\d\\s])", ". $1");. Or you can use your_string.replaceAll("\\.(?!(?<=\\d\\.)\\d)(\\S)", ". $1"), which also handles the case where the next sentence starts with a number right after the dot, while leaving decimals such as 90.95 intact.
If anything about these regexes is unclear, you can ask wiktor-stribiżew directly (by mentioning him in a comment). The idea behind these regexes is his.
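A quick comparison of the first regex with a variant that only skips dots sitting between two digits (the variant nests a lookbehind inside a negative lookahead; RegexSpacing and its methods are illustrative names, and the sentence is made up):

```java
class RegexSpacing {
    // First regex from the answer: add a space after a dot that is followed by
    // anything other than a digit or whitespace.
    static String notBeforeDigit(String s) {
        return s.replaceAll("\\.([^\\d\\s])", ". $1");
    }

    // Variant: add a space after a dot followed by a non-space character, unless the
    // dot is preceded by a digit AND followed by a digit (a decimal number like 90.95).
    static String notInsideDecimal(String s) {
        return s.replaceAll("\\.(?!(?<=\\d\\.)\\d)(\\S)", ". $1");
    }

    public static void main(String[] args) {
        String s = "It was done.5 people passed.";
        System.out.println(notBeforeDigit(s));   // It was done.5 people passed.
        System.out.println(notInsideDecimal(s)); // It was done. 5 people passed.
    }
}
```

On the question's sample text both give the same result; they only differ when a sentence starts with a digit right after a dot.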

I don't know whether this is the correct way, but you can check the letter after the dot (.): if it is a capital (uppercase) letter, you can consider it the end of the sentence and add a space. However, if a sentence starts with a lowercase letter, you cannot use this.
Checking whether the first letter is uppercase takes a little work, but you can start with:
String first = myString.substring(0,1);
Here myString is the text that comes after the dot (.), and it should not start with a number.
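The uppercase check can also be done in one regex with a lookahead, so no substring handling is needed. A sketch, assuming every new sentence starts with an uppercase letter (\\p{Lu} is the Unicode uppercase-letter class); SentenceSpacing is an illustrative name:

```java
class SentenceSpacing {
    // Insert a space after a dot only when the next character is an uppercase letter.
    // The lookahead (?=...) checks that letter without consuming it.
    static String spaceBeforeCapitals(String text) {
        return text.replaceAll("\\.(?=\\p{Lu})", ". ");
    }

    public static void main(String[] args) {
        System.out.println(spaceBeforeCapitals(
                "Last year the overall pass percentage was 90.95%.It was 96.21% in 2016."));
        // Last year the overall pass percentage was 90.95%. It was 96.21% in 2016.
    }
}
```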

If you want to add an additional space after a period that is already followed by a space, you can do the following:
String sentence = "Last year the overall pass percentage was 90.95%. It was 96.21% in 2016.";
sentence = sentence.replaceAll("\\. ", ".  ");
But if you need to add a space between sentences that are not separated by a space after the period, do the following:
import java.util.regex.*;
public class MyClass {
    public static void main(String[] args) {
        String sentence = "Last year the overall pass percentage was 90.95%.It was 96.21% in 2016.example.";
        String[] sentenceArr = sentence.split("\\.");
        String str = "";
        for (int i = 0; i < sentenceArr.length; i++) {
            // a dot between two digits belongs to a decimal number: keep it without a space
            // (the i + 1 check avoids reading past the end of the array)
            if (i + 1 < sentenceArr.length
                    && Pattern.matches(".*\\d+", sentenceArr[i])
                    && Pattern.matches("\\d+.*", sentenceArr[i + 1])) {
                str = str + sentenceArr[i] + ".";
            }
            else {
                str = str + sentenceArr[i] + ". ";
            }
        }
        System.out.println(str);
    }
}
Input: Last year the overall pass percentage was 90.95%.It was 96.21% in 2016.example.
Output: Last year the overall pass percentage was 90.95%. It was 96.21% in 2016. example.

Java - How to create a substring until a non-numerical character is reached?

The title speaks for itself. I'm trying to create a calculator that integrates polynomial functions using basic coding, not just whipping out a math operator to do it for me :). I didn't get far before I hit a wall: I can't find a way to create a substring of the digits in the original string up to the first non-numerical character. For example, if the string is 123x, I want a substring of 123, without the 'x'. Here is what I've got so far:
public static void indefinite()
{
    int x = 0;
    System.out.print("Enter your function to integrate:\n F ");
    Scanner input = new Scanner(System.in);
    String function = input.nextLine();
    String s1 = "";
    for (int i = 0; i < function.length(); i++)
    {
        s1 = s1 + function.substring(x, i+1);
        x = i+1;
    }
}
It all looks a bit nonsensical, but basically, if the string 'function' is 32x^4, I want the substring to be 32. I'll figure out the rest myself, but I can't seem to do this part.
P.S. I know the for loop's bound is wrong: it shouldn't run to the end of the string if I'm looking at functions with more than one term, such as 2x^3 plus other terms. I haven't gotten around to figuring that out yet, so I just made sure it works for one part.

Use replaceAll() to "extract" it:
String number = str.replaceAll("\\D.*", "");
This replaces the first non digit and everything after it with nothing (effectively deleting it), leaving you with just the number.
You can also go directly to a numeric primitive, without having to use a String variable if you prefer (like me) to have less code:
int number = Integer.parseInt(str.replaceAll("\\D.*", ""));
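Since the question asks for basic coding, here is the same extraction with a plain loop next to the regex version (LeadingNumber and its methods are illustrative names). The loop compares against '0' to '9' because Character.isDigit also accepts non-ASCII digits, while \\D treats only 0-9 as digits by default. Both return an empty string when the input does not start with a digit, which Integer.parseInt would reject.

```java
class LeadingNumber {
    // The answer's regex: delete the first non-digit and everything after it.
    static String viaRegex(String str) {
        return str.replaceAll("\\D.*", "");
    }

    // Loop version: advance while the character is an ASCII digit.
    static String viaLoop(String str) {
        int end = 0;
        while (end < str.length() && str.charAt(end) >= '0' && str.charAt(end) <= '9') {
            end++;
        }
        return str.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(viaRegex("32x^4")); // 32
        System.out.println(viaLoop("123x"));   // 123
        int coefficient = Integer.parseInt(viaLoop("32x^4"));
        System.out.println(coefficient * 2);   // 64
    }
}
```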

You could split your string at the letter-digit marks, like so:
str.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
For instance, "123x54y7z" will return [123, x, 54, y, 7, z]
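Applied to the input from the question, the split separates the coefficient, the variable part and the exponent. A small sketch (DigitSplit and splitDigitRuns are illustrative names):

```java
import java.util.Arrays;

class DigitSplit {
    // Zero-width split points: between a non-digit and a digit, or a digit and a non-digit.
    static String[] splitDigitRuns(String str) {
        return str.split("(?<=\\D)(?=\\d)|(?<=\\d)(?=\\D)");
    }

    public static void main(String[] args) {
        String[] parts = splitDigitRuns("32x^4");
        System.out.println(Arrays.toString(parts)); // [32, x^, 4]
        int coefficient = Integer.parseInt(parts[0]); // 32
        System.out.println(coefficient);
    }
}
```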

Finding the index of a permutation within a string

I just attempted a programming challenge that I was not able to complete successfully. The specification is to read 2 lines of input from System.in:
1. A list of 1-100 space-separated words, all of the same length, between 1 and 10 characters each.
2. A string up to a million characters long, which contains a permutation of the above list exactly once. Return the index where this permutation begins in the string.
For example, we may have:
dog cat rat
abcratdogcattgh
3
Where 3 is the result (as printed by System.out).
It's legal to have a duplicated word in the list:
dog cat rat cat
abccatratdogzzzzdogcatratcat
16
My code works provided that the word the answer begins with has not occurred earlier in the string. In the 2nd example here, my code fails because dog already appears before the answer, which begins at index 16.
My theory was to:
1. Find the index where each word occurs in the string.
2. Extract the substring starting there (since we have a known number of words of known length, this is possible).
3. Check that all of the words occur in the substring.
4. If they do, return the index at which this substring occurs in the original string.
Here is my code (it should be compilable):
import java.io.BufferedReader;
import java.io.InputStreamReader;
public class Solution {
    public static void main(String[] args) throws Exception {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        String line = br.readLine();
        String[] l = line.split(" ");
        String s = br.readLine();
        int wl = l[0].length();
        int len = wl * l.length;
        int sl = s.length();
        for (String word : l) {
            int i = s.indexOf(word);
            int z = i;
            //while (i != -1) {
                int y = i + len;
                if (y <= sl) {
                    String sub = s.substring(i, y);
                    if (containsAllWords(l, sub)) {
                        System.out.println(s.indexOf(sub));
                        System.exit(0);
                    }
                }
                //z+= wl;
                //i = s.indexOf(word, z);
            //}
        }
        System.out.println("-1");
    }
    private static boolean containsAllWords(String[] l, String s) {
        String s2 = s;
        for (String word : l) {
            s2 = s2.replaceFirst(word, "");
        }
        if (s2.equals(""))
            return true;
        return false;
    }
}
I can fix my issue and make it pass the 2nd example by uncommenting the while loop. However, this has serious performance implications. With an input of 100 words of 10 characters each and a string of 1,000,000 characters, the time taken to complete is just awful.
Given that each case in the test bench has a maximum execution time, the addition of the while loop would cause the test to fail on the basis of not completing the execution in time.
What would be a better way to approach and solve this problem? I feel defeated.

You could concatenate the strings and use the new string to search with:
String a = "dog";
String b = "cat";
String c = a + b; // c is "dogcat"
This way you avoid the problem of "dog" appearing somewhere else on its own.
But this only finds this particular order; it wouldn't work if catdog (another permutation) is the one in the string.

Here is an approach (pseudo code)
stringArray keys(n) = {"cat", "dog", "rat", "roo", ...};
string bigString(1000000);
L = strlen(keys[0]); // since all are same length
int indices(n, 1000000/L); // much too big - but safe if only one word repeated over and over
for each s in keys
f = -1
do:
f = find s in bigString starting at f+1 // use bigString.indexOf(s, f+1)
write index of f to indices
until no more found
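The first stage of this pseudo code, collecting every index at which a word occurs, could look like this in Java (WordIndices and allIndices are illustrative names; the search restarts at f + 1, so overlapping matches are included, as in the pseudo code):

```java
import java.util.ArrayList;
import java.util.List;

class WordIndices {
    // Every start index of `word` in `big`, found with repeated indexOf calls.
    static List<Integer> allIndices(String big, String word) {
        List<Integer> indices = new ArrayList<>();
        int f = big.indexOf(word);
        while (f != -1) {
            indices.add(f);
            f = big.indexOf(word, f + 1); // continue right after the last match
        }
        return indices;
    }

    public static void main(String[] args) {
        String big = "abccatratdogzzzzdogcatratcat";
        System.out.println(allIndices(big, "dog")); // [9, 16]
        System.out.println(allIndices(big, "cat")); // [3, 19, 25]
        System.out.println(allIndices(big, "rat")); // [6, 22]
    }
}
```

For the second example, the answer (16) shows up as 16, 19, 22, 25: one index per word, each one word length (3) apart.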
When you are all done, you will have a series of indices (location of first letter of match). Now comes the tricky part. Since the words are all the same length, we're looking for a sequence of indices that are all spaced the same way, in the 10 different "collections". This is a little bit tedious but it should complete in a finite time. Note that it's faster to do it this way than to keep comparing strings (comparing numbers is faster than making sure a complete string is matched, obviously). I would again break it into two parts - first find "any sequence of 10 matches", then "see if this is a unique permutation".
sIndx = sort(indices(:))
dsIndx = diff(sIndx);
sequence = find {n} * 10 in dsIndx
for each s in sequence
check if unique permutation
I hope this gets you going.

Perhaps not the most optimized version, but how about the following approach to give you some ideas:
1. Compute the total length of all the words in a row.
2. Take a random word from the list and find the starting index of its first occurrence.
3. Take a substring of the length computed above before and after that index (e.g. if the index is 15 and there are 3 words of 4 letters each, take the substring from 15-8 to 15+11).
4. Make a copy of the word list with the random word from step 2 removed.
5. Check the appended/prepended [word_length] letters to see if they match another word on the list.
6. If a word matches the copy of the list, remove it from the copy and move further.
7. If all words are found, break the loop.
8. If not all words are found, find the starting index of the next occurrence of the random word and go back to 3.
Why it would help:
- Which word you pick to begin with doesn't matter, since every word needs to be in the successful match anyway.
- You don't have to manually loop through most of the characters, unless there are lots of nearly complete false matches.
- As a candidate match keeps growing, you have fewer words left on the list copy to compare against.
- You can also keep track of the furthest index you've reached, so you can sometimes limit the backwards length of the picked substring (as it cannot overlap where you've already been, if the occurrences are close to each other).
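For comparison, a different technique than the ones proposed above: the common sliding-window approach for this kind of problem. For each of the wl possible offsets it walks the string one word at a time while counting the words inside the current window, so the work is roughly proportional to the string length times the word length, rather than re-checking a full substring for every candidate start. A sketch, assuming all words have the same non-zero length as the question states (PermutationIndex and find are illustrative names):

```java
import java.util.HashMap;
import java.util.Map;

class PermutationIndex {
    // First index where all words (any order, duplicates counted) appear back to back in s, or -1.
    static int find(String[] words, String s) {
        int wl = words[0].length();
        int count = words.length;
        Map<String, Integer> need = new HashMap<>();
        for (String w : words) need.merge(w, 1, Integer::sum);

        int best = -1;
        for (int offset = 0; offset < wl; offset++) {
            Map<String, Integer> seen = new HashMap<>();
            int left = offset; // start of the current window
            int used = 0;      // number of words currently in the window
            for (int right = offset; right + wl <= s.length(); right += wl) {
                String w = s.substring(right, right + wl);
                if (!need.containsKey(w)) { // not a list word: restart the window after it
                    seen.clear();
                    used = 0;
                    left = right + wl;
                    continue;
                }
                seen.merge(w, 1, Integer::sum);
                used++;
                while (seen.get(w) > need.get(w)) { // too many copies of w: shrink from the left
                    seen.merge(s.substring(left, left + wl), -1, Integer::sum);
                    used--;
                    left += wl;
                }
                if (used == count) { // window holds exactly the list
                    if (best == -1 || left < best) best = left;
                    break;           // earliest match for this offset found
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(find(new String[]{"dog", "cat", "rat"}, "abcratdogcattgh"));                 // 3
        System.out.println(find(new String[]{"dog", "cat", "rat", "cat"}, "abccatratdogzzzzdogcatratcat")); // 16
    }
}
```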

We Keep Coding
