How to add a space at the end of every sentence? - java

I have a couple of sentences that I process in my Android app. At the end of every sentence, I need to add an extra white space. I tried the following:
bodyText=body.replaceAll("\\.",". ");
This worked until I found dots inside sentences. For example, if a sentence contains a decimal number, the above code adds a space inside that number too. See the example below, where I applied the above code and it did not work as expected.
Last year the overall pass percentage was 90. 95%. It was 96. 21% in 2016.
You can see how the decimal places are separated by a space.
How can I add a space only at the end of a sentence? Normally every sentence ends with a full stop.

You can get the desired result with code like the following:
public static String modifySentence(String input) {
StringBuilder sb = new StringBuilder(input);
// Counter which will increase with every insertion of char in StringBuilder
int insertCounter = 1;
int index = input.indexOf(".");
// Insert a space only when the dot is not the last character of the input
// and is not followed by a digit or a space; in all other cases skip it
while (index >= 0) {
if ((index + 1 < input.length())
&& (!Character.isDigit(input.charAt(index + 1)))
&& (!Character.isSpaceChar(input.charAt(index + 1)))) {
sb.insert(index + insertCounter, " ");
insertCounter++;
}
index = input.indexOf(".", index + 1);
}
return sb.toString();
}
Input is like
System.out.println(modifySentence("Last year the overall pass percentage was 90.95%.It was 96.21% in 2016."));
System.out.println(modifySentence("Last year the overall pass percentage was 90.95%.It was 96.21% in 2016. And this is extra . test string"));
And output is
Last year the overall pass percentage was 90.95%. It was 96.21% in 2016.
Last year the overall pass percentage was 90.95%. It was 96.21% in 2016. And this is extra . test string
As wiktor-stribiżew commented, the same result can be achieved with your_string.replaceAll("\\.([^\\d\\s])", ". $1");. He also suggested your_string.replaceAll("\\.(?<!\\d\\.\\d)(\\S)", ". $1"), which is intended to handle the case where a number starts right after the dot.
If anything about these regexes is unclear, you can ask wiktor-stribiżew directly by mentioning him in a comment. Credit for these regexes goes to him.
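For reference, the first regex can also be written with a lookahead, as in the sketch below. The class and method names are made up for illustration, and the (?=...) form is a rewrite rather than the commenter's exact pattern: it only inspects the character after the dot instead of capturing and re-inserting it.

```java
class SentenceSpacer {
    // Inserts one space after every '.' that is directly followed by a
    // character that is neither a digit nor whitespace. A dot at the very
    // end of the text is left alone because the lookahead needs a character.
    static String addSpaces(String text) {
        return text.replaceAll("\\.(?=[^\\d\\s])", ". ");
    }

    public static void main(String[] args) {
        System.out.println(addSpaces(
            "Last year the overall pass percentage was 90.95%.It was 96.21% in 2016."));
    }
}
```

Decimal numbers such as 90.95 stay intact because the dot is followed by a digit, and sentences that already have a space after the dot are not changed.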

I don't know whether this is the correct way, but you can check the letter after the dot (.): if it is an uppercase letter, you can treat the dot as the end of a sentence and add one space. This does not work if a sentence starts with a lowercase letter.
It is a little awkward to check whether the first letter is a capital,
but you can get that letter with
String first = myString.substring(0,1);
where myString is the text that comes after the dot (.); it should not start with a number.
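A minimal sketch of the uppercase check, assuming a regex lookahead is acceptable instead of substring (the class and method names are hypothetical):

```java
class CapitalAfterDot {
    // Adds a space after a '.' only when the next character is an uppercase
    // letter, which is taken here as the start of a new sentence.
    static String addSpaces(String text) {
        return text.replaceAll("\\.(?=\\p{Lu})", ". ");
    }

    public static void main(String[] args) {
        System.out.println(addSpaces("It was 90.95%.It was 96.21% in 2016.And so on."));
    }
}
```

As noted above, a sentence that starts with a lowercase letter or a digit is not detected by this check.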

If you want to add an additional space to a sentence that already has a space after the period, you can do the following:
String sentence = "Last year the overall pass percentage was 90.95%. It was 96.21% in 2016.";
sentence = sentence.replaceAll("\\. ", ".  ");
But if you need to add a space between sentences that are not separated by a space after the period, do the following:
import java.util.regex.*;
public class MyClass {
    public static void main(String args[]) {
        String sentence = "Last year the overall pass percentage was 90.95%.It was 96.21% in 2016.example.";
        String[] sentenceArr = sentence.split("\\.");
        String str = "";
        for (int i = 0; i < sentenceArr.length; i++) {
            // A dot between two digits is a decimal point: keep it without a space.
            // The i + 1 bounds check avoids reading past the last piece.
            if (i + 1 < sentenceArr.length
                    && Pattern.matches(".*\\d+", sentenceArr[i])
                    && Pattern.matches("\\d+.*", sentenceArr[i + 1])) {
                str = str + sentenceArr[i] + ".";
            } else {
                str = str + sentenceArr[i] + ". ";
            }
        }
        System.out.println(str);
    }
}
Input: Last year the overall pass percentage was 90.95%.It was 96.21% in 2016.example.
Output: Last year the overall pass percentage was 90.95%. It was 96.21% in 2016. example.

Related

Reading a file -- pairing a String and int value -- with multiple split lines

I am working on an exercise with the following criteria:
"The input consists of pairs of tokens where each pair begins with the type of ticket that the person bought ("coach", "firstclass", or "discount", case-sensitively) and is followed by the number of miles of the flight."
The list can be paired -- coach 1500 firstclass 2000 discount 900 coach 3500 -- and this currently works great. However, when the String and int value are split like so:
firstclass 5000 coach 1500 coach
100 firstclass
2000 discount 300
it breaks entirely. I am almost certain that it has something to do with me using this format (not full)
while(fileScanner.hasNextLine())
{
StringTokenizer token = new StringTokenizer(fileScanner.nextLine(), " ");
while(token.hasMoreTokens())
{
String ticketClass = token.nextToken().toLowerCase();
int count = Integer.parseInt(token.nextToken());
...
}
}
because it will always read the first value as a String and the second value as an integer. I am very lost on how to keep track of one or the other while going to read the next line. Any help is truly appreciated.
Similar (I think) problems:
Efficient reading/writing of key/value pairs to file in Java
Java-Read pairs of large numbers from file and represent them with linked list, get the sum and product of each pair
Reading multiple values in multiple lines from file (Java)
If you can afford to read the text file in all at once as a very long String, simply use the built-in String.split() with the regex \\s+, like so
String[] tokens = fileAsString.split("\\s+");
This will split the input file into tokens, assuming the tokens are separated by one or more whitespace characters (a whitespace character covers newline, space, tab, and carriage return). Even and odd tokens are ticket types and mile counts, respectively.
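The split-based approach could be sketched as follows. The class name, the method name and the "type=miles" output format are made up for illustration; fileAsString is assumed to already hold the whole file content.

```java
import java.util.ArrayList;
import java.util.List;

class TicketTokens {
    // Splits the whole input on runs of whitespace and pairs up the tokens.
    // Each pair is returned as "type=miles" to keep the sketch small.
    static List<String> pairs(String fileAsString) {
        // "\\s+" in Java source is the regex \s+; trim() avoids an empty
        // first token when the input starts with whitespace.
        String[] tokens = fileAsString.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            String ticketClass = tokens[i];               // even index: ticket type
            int miles = Integer.parseInt(tokens[i + 1]);  // odd index: mile count
            result.add(ticketClass + "=" + miles);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("firstclass 5000 coach 1500 coach\n100 firstclass\n2000 discount 300"));
    }
}
```

Because line breaks are just another kind of whitespace here, it no longer matters where a pair is split across lines.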
If you absolutely have to read line by line and use StringTokenizer, one solution is to count the number of tokens in the previous line. If this number is odd, the first token in the current line is of a different type than the first token of the previous line. Once you know the starting type of the current line, simply alternate types from there.
int tokenCount = 0;
boolean startingType = true; // true for String, false for integer
boolean currentType;
while(fileScanner.hasNextLine())
{
StringTokenizer token = new StringTokenizer(fileScanner.nextLine(), " ");
startingType = startingType ^ (tokenCount % 2 == 1); // if tokenCount is odd, the XOR ^ operator will flip the starting type of this line
tokenCount = 0;
while(token.hasMoreTokens())
{
tokenCount++;
currentType = startingType ^ (tokenCount % 2 == 0); // alternating between types in current line
if (currentType) {
String ticketClass = token.nextToken().toLowerCase();
// do something with ticketClass here
} else {
int mileCount = Integer.parseInt(token.nextToken());
// do something with mileCount here
}
...
}
}
I found another way to do this problem without using either the StringTokenizer or the regex...admittedly I had trouble with the regular expressions haha.
I declare these outside of the try-catch block because I want to use them in both my finally statement and return the points:
int points = 0;
ArrayList<String> classNames = new ArrayList<>();
ArrayList<Integer> classTickets = new ArrayList<>();
Then inside my try-statement, I declare the index variable because I won't need it outside of this block. That variable increases each time a new element is read. The 1st, 3rd, 5th, ... elements are read as ticket classes and the 2nd, 4th, 6th, ... elements as mile counts:
try
{
int index = 0;
// read till the file is empty
while(fileScanner.hasNext())
{
// first entry is the ticket type
if(index % 2 == 0)
classNames.add(fileScanner.next());
// second entry is the number of miles
else
classTickets.add(Integer.parseInt(fileScanner.next()));
index++;
}
}
You can either catch it here like this or add throws NoSuchElementException to your method declaration, as long as you catch it where the method is called:
catch(NoSuchElementException noElement)
{
System.out.println("<###-NoSuchElementException-###>");
}
Then down here, loop through the collected elements, check which flight class each one is, add the corresponding points for its mile count, and return the points outside of the block:
finally
{
for(int i = 0; i < classNames.size(); i++)
{
switch(classNames.get(i).toLowerCase())
{
case "firstclass": // 2 points for first
points += 2 * classTickets.get(i);
break;
case "coach": // 1 point for coach
points += classTickets.get(i);
break;
default:
// discount gets nothing
}
}
}
return points;
The regex seems like the most convenient way, but this was more intuitive to me for some reason. Either way, I hope the variety will help out.
simply use the built-in String.split() - #bui
I was finally able to wrap my head around regular expressions, but \s+ was not being recognized for some reason. It kept giving me this error message:
Invalid escape sequence (valid ones are \b \t \n \f \r " ' \ )Java(1610612990)
So when I went through with those characters instead, I was able to write this:
int points = 0, multiplier = 0, tracker = 0;
while(fileScanner.hasNext())
{
String read = fileScanner.next().split(
"[\b \t \n \f \r \" \' \\ ]")[0];
if(tracker % 2 == 0)
{
if(read.toLowerCase().equals("firstclass"))
multiplier = 2;
else if(read.toLowerCase().equals("coach"))
multiplier = 1;
else
multiplier = 0;
}else
{
points += multiplier * Integer.parseInt(read);
}
tracker++;
}
This code goes one entry at a time instead of reading a whole array void of whitespace as a work-around for that error message I was getting. If you could show me what the code would look like with String[] tokens = fileAsString.split("\s+"); instead I would really appreciate it :)
You need to add another "\" before "\s" to escape the backslash before "s" itself – #bui

Hashmap in for loop not reading all the input

This is for AOC day 2. The input is something along the lines of
"6-7 z: dqzzzjbzz
13-16 j: jjjvjmjjkjjjjjjj
5-6 m: mmbmmlvmbmmgmmf
2-4 k: pkkl
16-17 k: kkkkkkkkkkkkkkkqf
10-16 s: mqpscpsszscsssrs
..."
It's formatted like 'min-max letter: password' and separated by line. I'm supposed to find how many passwords meet the minimum and maximum requirements. I put the whole input into a string variable and used Pattern.quote("\n") to separate the lines into a string array. This worked fine. Then, I replaced every non-digit character, including the '-', by making a pattern Pattern.compile("[^0-9]|-"); and running that for every index in the array and using .trim() to cut off the whitespace at the start and end of each string. This is all working fine, I'm getting the desired output like 6 7 and 13 16.
However, now I want to try and split this string into two. This is my code:
HashMap<Integer,Integer> numbers = new HashMap<Integer,Integer>();
for(int i = 0; i < inputArray.length; i++){
String [] xArray = x[i].split(Pattern.quote(" "));
int z = Integer.valueOf(xArray[0]);
int y = Integer.valueOf(xArray[1]);
System.out.println(z);
System.out.println(y);
numbers.put(z, y);
}
System.out.println(numbers);
So, first I make a HashMap which will store <min, max> values. Then, the for loop (which runs 1000 times) splits every entry such as 6 7 and 13 16 into two at the " ". The System.out.println(z); and System.out.println(y); are working as intended.
6
7
13
16
...
This output goes on to give me 2000 integers, each on its own line. That's exactly what I want. However, the System.out.println(numbers); is outputting:
{1=3, 2=10, 3=4, 4=7, 5=6, 6=9, 7=12, 8=11, 9=10, 10=18, 11=16, 12=13, 13=18, 14=16, 15=18, 16=18, 17=18, 18=19, 19=20}
I have no idea where to even start with debugging this. I made a test file with an array that is formatted like "even, odd" integers all the way up to 100. Using this exact same code (I did change the variable names), I'm getting a better output. It's not exactly desired since it starts at 350=351 and then goes to like 11=15 and continues in a non-chronological order but at least it contains all the 100 keys and values.
Also, completely unrelated question but is my formatting of the for loop fine? The extra space at the beginning and the end of the code?
Edit: I want my expected output to be something like {6=7, 13=16, 5=6, 2=4, 16=17...}. Basically, the hashmap would have the minimum and maximum as the key and value and it'd be in chronological order.
The problem with your code is that you're trying to drive in a nail with a saw. A HashMap is not the right tool to achieve what you want, since:
Keys are unique. If you put the same key multiple times, the earlier value is overwritten.
The order of items in a HashMap is undefined.
A HashMap expresses a key-value relationship, which does not exist in this context.
A better data structure for your min/max values would probably just be an ArrayList<IntegerPair>, where you would define IntegerPair yourself (for example, a small class with two int fields), since the standard library has no dedicated class for a pair of ints.
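A minimal sketch of that suggestion, assuming each cleaned-up line looks like "6 7". IntegerPair is the hypothetical class mentioned above; PairListDemo and parse are likewise made-up names.

```java
import java.util.ArrayList;
import java.util.List;

class IntegerPair {
    final int min;
    final int max;

    IntegerPair(int min, int max) {
        this.min = min;
        this.max = max;
    }

    @Override
    public String toString() {
        return min + "=" + max;
    }
}

class PairListDemo {
    // Parses lines such as "6 7" into pairs. Unlike a HashMap, the list
    // keeps repeated minimum values and preserves the input order.
    static List<IntegerPair> parse(String[] lines) {
        List<IntegerPair> pairs = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            pairs.add(new IntegerPair(Integer.parseInt(parts[0]), Integer.parseInt(parts[1])));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(parse(new String[] {"6 7", "13 16", "6 9"}));
    }
}
```

Note that the two lines starting with 6 both survive, which is exactly what a HashMap keyed on the minimum cannot do.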
I think you are complicating the task unnecessarily. I would proceed as follows:
1. Split the input using the line separator.
2. For each line, remove the : and split on spaces to get an array of length 3.
3. From the array in step 2, build:
3.1. the min/max character count from array[0]
3.2. character classes for the letter and its negation
3.3. the check itself: remove from the password all letters that do not match the given one, then test whether the remaining length is in range.
Something like:
public static void main(String[] args){
String input = "6-7 z: dqzzzjbzz\n" +
"13-16 j: jjjvjmjjkjjjjjjj\n" +
"5-6 m: mmbmmlvmbmmgmmf\n" +
"2-4 k: pkkl\n" +
"16-17 k: kkkkkkkkkkkkkkkqf\n" +
"10-16 s: mqpscpsszscsssrs\n";
int count = 0;
for(String line : input.split("\n")){
String[] temp = line.replace(":", "").split(" "); //[6-7, z, dqzzzjbzz]
String minMax = "{" + (temp[0].replace('-', ',')) + "}"; //{6,7}
String letter = "[" + temp[1] + "]"; //[z]
String letterNegate = "[^" + temp[1] + "]"; //[^z]
if(temp[2].replaceAll(letterNegate, "").matches(letter + minMax)){
count++;
}
}
System.out.println(count + " passwords are valid");
}

Single characters don't left-justify using format in Java?

A problem I'm having in my programming class is asking me to make a pattern like this:
I used the following code:
public static void ch5ex18c() {
System.out.println("Pattern C");
String num = "6 5 4 3 2 1";
for (int count = 10; count >= 0; count-=2){
System.out.printf("%-10s", num.substring(count, 11) + "\n");
}
}
and I got everything to print out well except the first number line:
I know I can fix this using an if statement, but I'd just prefer not to and I want to know why it would do this in the first place.
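The problem is that the line break is part of the string being padded: "%-10s" left-justifies "1\n" by appending spaces after the \n, so the padding lands at the start of the next line:
1\n + spaces (the spaces show up on the next line, which makes it look as if you are justifying right)
2 1\n + spaces
and so on.
The first line has no previous line to supply that padding, so it is the only one that looks wrong. Remove the minus sign (which justifies left), move the line break out of the padded argument, and adjust the field width accordingly.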
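A sketch of one way to do this, assuming the target is the right-aligned triangle (the picture is missing from the post). The class name and the build helper are made up so the result can be inspected as a string.

```java
class PatternC {
    // Builds the pattern as one string, one row per line.
    static String build() {
        String num = "6 5 4 3 2 1";
        StringBuilder out = new StringBuilder();
        for (int count = 10; count >= 0; count -= 2) {
            // %11s right-justifies within the width of the longest row; the
            // line break (%n) sits in the format string, outside the padded field.
            out.append(String.format("%11s%n", num.substring(count)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(build());
    }
}
```

With the line break outside the padded argument, every row (including the first) gets its padding on its own line.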

Why this simple regex/Scanner is not working?

I have to solve the following exercise:
Create a keyboard scanner in which the tokens are unsigned integers, and write the code to determine the sum of the integers.
Note: -5 will be scanned as the unsigned integer 5, and the minus sign will be skipped over as a delimiter
This is my solution (not working):
import java.util.*;
public class testing{
public static void main( String[] argv ){
testing p = new testing();
}
public testing(){
Scanner myScanner = new Scanner(System.in);
String regex = "\\s+|-";
myScanner.useDelimiter(regex);
int sum = 0;
while(myScanner.hasNextInt()){
int partial = myScanner.nextInt();
sum += partial;
}
System.out.println(sum);
}
}
The problem with my solution is that it works only for positive integers, or when a negative integer is the first input and is followed by positive integers.
For example:
-2
3
4
f (used to stop the program)
will retrieve 9.
but
3
-2
will stop the program and print 3.
I am trying to understand the reason for this behavior, but no luck so far.
You are using an OR in the regex, so each delimiter match consumes either a run of whitespace or a single '-', never both. After the line break has been consumed as a delimiter, the '-' that follows is not part of it.
The input your scanner sees is:
-2\n3\n4\nf
'f' is the first thing after an integer that does not match your pattern of whitespaces OR one minus.
The second input, though, needs both to be matched by your delimiter:
3\n-2
So the line break (whitespace) is matched by your delimiter pattern and the minus is left over as a separate delimiter, with an empty token between the two. Since an empty token is not an integer, .hasNextInt() returns false.
A repeatable group of whitespaces and minus works as you intend:
final String regex = "[\\s-]+";
This might end up accepting more than your exercise requires, like multiple minus signs in front of a number. If that is not acceptable, you can of course limit the minus to one occurrence after an indeterminate amount of whitespace:
final String regex = "[\\s]*-?";
The '?' means "once or not at all" and limits the occurrences of the minus sign.
Edit:
As #maraca pointed out in a comment, my previously proposed solution only works for single digit numbers, as the pattern accepts the empty string as well. The solution that works for numbers >9 as well is:
final String regex = "\\s+|\\s*-";
What it does is consume either one or more whitespaces OR zero or more whitespaces followed by a minus sign.
Turns out even small things like these can be rather difficult to do right. -.-
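The character-class delimiter "[\\s-]+" from this answer can be tried without a keyboard by scanning a String, as in the sketch below. The class and method names are made up for illustration.

```java
import java.util.Scanner;

class UnsignedSum {
    // Sums the integers in the input, treating any run of whitespace and
    // minus signs as one delimiter. Stops at the first token that is not
    // an integer (for example the "f" used to end the input).
    static int sum(String input) {
        Scanner scanner = new Scanner(input);
        scanner.useDelimiter("[\\s-]+");
        int sum = 0;
        while (scanner.hasNextInt()) {
            sum += scanner.nextInt();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sum("3\n-2\nf"));
    }
}
```

Because "\n-" is consumed as a single delimiter, no empty token appears between the line break and the minus sign.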
You can solve this by catching InputMismatchException like this:
public Testing(){
Scanner myScanner = new Scanner(System.in);
String regex = "\\s+|-";
myScanner.useDelimiter(regex);
int sum = 0;
while(myScanner.hasNext()){
int partial = 0;
try{
partial = myScanner.nextInt();
} catch(InputMismatchException e){
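// Caution: nextInt() leaves the rejected token in the scanner, so a
// non-numeric token such as "f" will be retried on every pass.
continue;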
}
sum += partial;
}
System.out.println(sum);
}
The issue is that you are trying to parse an int but you receive a "-".

Finding the index of a permutation within a string

I just attempted a programming challenge, which I was not able to successfully complete. The specification is to read 2 lines of input from System.in.
A list of 1-100 space separated words, all of the same length and between 1-10 characters.
A string up to a million characters in length, which contains a permutation of the above list just once. Return the index of where this permutation begins in the string.
For example, we may have:
dog cat rat
abcratdogcattgh
3
Where 3 is the result (as printed by System.out).
It's legal to have a duplicated word in the list:
dog cat rat cat
abccatratdogzzzzdogcatratcat
16
The code that I produced works provided that the word the answer begins with has not occurred earlier in the string. In the 2nd example here, my code will fail because dog has already appeared before the answer, which begins at index 16.
My theory was to:
Find the index where each word occurs in the string
Extract this substring (as we have a number of known words with a known length, this is possible)
Check that all of the words occur in the substring
If they do, return the index that this substring occurs in the original string
Here is my code (it should be compilable):
import java.io.BufferedReader;
import java.io.InputStreamReader;
public class Solution {
public static void main(String[] args) throws Exception {
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
String line = br.readLine();
String[] l = line.split(" ");
String s = br.readLine();
int wl = l[0].length();
int len = wl * l.length;
int sl = s.length();
for (String word : l) {
int i = s.indexOf(word);
int z = i;
//while (i != -1) {
int y = i + len;
if (y <= sl) {
String sub = s.substring(i, y);
if (containsAllWords(l, sub)) {
System.out.println(s.indexOf(sub));
System.exit(0);
}
}
//z+= wl;
//i = s.indexOf(word, z);
//}
}
System.out.println("-1");
}
private static boolean containsAllWords(String[] l, String s) {
String s2 = s;
for (String word : l) {
s2 = s2.replaceFirst(word, "");
}
if (s2.equals(""))
return true;
return false;
}
}
I am able to solve my issue and make it pass the 2nd example by un-commenting the while loop. However this has serious performance implications. When we have an input of 100 words at 10 characters each and a string of 1000000 characters, the time taken to complete is just awful.
Given that each case in the test bench has a maximum execution time, the addition of the while loop would cause the test to fail on the basis of not completing the execution in time.
What would be a better way to approach and solve this problem? I feel defeated.
You could concatenate the strings together and search for the combined string:
String a = "dog";
String b = "cat";
String c = a+b; //output of c would be "dogcat"
This way you would overcome the problem of dog appearing somewhere else.
But this wouldn't work if catdog is a valid match too.
Here is an approach (pseudo code)
stringArray keys(n) = {"cat", "dog", "rat", "roo", ...};
string bigString(1000000);
L = strlen(keys[0]); // since all are same length
int indices(n, 1000000/L); // much too big - but safe if only one word repeated over and over
for each s in keys
f = -1
do:
f = find s in bigString starting at f+1 // use bigString.indexOf(s, f+1)
write index of f to indices
until no more found
When you are all done, you will have a series of indices (location of first letter of match). Now comes the tricky part. Since the words are all the same length, we're looking for a sequence of indices that are all spaced the same way, in the 10 different "collections". This is a little bit tedious but it should complete in a finite time. Note that it's faster to do it this way than to keep comparing strings (comparing numbers is faster than making sure a complete string is matched, obviously). I would again break it into two parts - first find "any sequence of 10 matches", then "see if this is a unique permutation".
sIndx = sort(indices(:))
dsIndx = diff(sIndx);
sequence = find {n} * 10 in dsIndx
for each s in sequence
check if unique permutation
I hope this gets you going.
Perhaps not the best optimized version, but how about the following approach to give you some ideas:
1. Count the combined length of all words in the list.
2. Take a random word from the list and find the starting index of its first occurrence.
3. Take a substring with the length counted above before and after that index (e.g. if the index is 15 and there are 3 words of 4 letters each, take the substring from 15-8 to 15+11).
4. Make a copy of the word list with the earlier random word removed.
5. Check the appending/prepending [word_length] letters to see if they match a new word on the list.
6. If the word matches one in the copy of the list, remove it from the copy and move further.
7. If all words are found, break the loop.
8. If not all words are found, find the starting index of the next occurrence of the earlier random word and go back to 3.
Why it would help:
Which word you pick to begin with doesn't matter, since every word needs to be in the successful match anyway.
You don't have to manually loop through a lot of the characters, unless there are lots of nearly complete false matches.
As a supposed match keeps growing, you have fewer words left on the list copy to compare to.
You can also keep track of the furthest index you've gone to, so you can sometimes limit the backwards length of the picked substring (as it cannot overlap with where you've already been, if the occurrences are close to each other).
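For comparison, here is a sketch of a different technique than the ones described in the answers: a sliding window over word-sized chunks with word counts kept in a HashMap. It assumes, as the question states, that all words have the same length and that the list is not empty; the class and method names are made up.

```java
import java.util.HashMap;
import java.util.Map;

class PermutationIndex {
    // Returns the smallest index at which some ordering of all the words
    // appears back to back in text, or -1 if there is none.
    static int find(String[] words, String text) {
        int wordLen = words[0].length();
        Map<String, Integer> need = new HashMap<>();
        for (String w : words) {
            need.merge(w, 1, Integer::sum);
        }
        int best = -1;
        // Every match starts at an index congruent to one of these offsets.
        for (int offset = 0; offset < wordLen; offset++) {
            Map<String, Integer> seen = new HashMap<>();
            int left = offset; // start of the current window
            int count = 0;     // number of words inside the window
            for (int right = offset; right + wordLen <= text.length(); right += wordLen) {
                String w = text.substring(right, right + wordLen);
                if (!need.containsKey(w)) {
                    // Not a listed word: no match can span this chunk.
                    seen.clear();
                    count = 0;
                    left = right + wordLen;
                    continue;
                }
                seen.merge(w, 1, Integer::sum);
                count++;
                // Too many copies of w: shrink the window from the left.
                while (seen.get(w) > need.get(w)) {
                    String dropped = text.substring(left, left + wordLen);
                    seen.merge(dropped, -1, Integer::sum);
                    left += wordLen;
                    count--;
                }
                if (count == words.length && (best == -1 || left < best)) {
                    best = left;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(find("dog cat rat cat".split(" "), "abccatratdogzzzzdogcatratcat"));
    }
}
```

Each chunk enters and leaves a window at most once per offset, so the work grows with the text length times the word length rather than with the number of occurrences of any single word.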
