Word count algorithm issue - java

So I am trying to make a word count program where the user can paste text in, or type text in, to get a word count. Typing text in works for the most part, but sometimes I get a "string index out of range" error when I go back and replace text. Pasting works, but I get the same string-index-out-of-range error there as well. My logic works like this: a space counts as a word, two spaces back to back subtract one word, and the end of the string counts as a word. I am relatively new to Java and I thought this was going to be an easy thing to make. I was wrong! Some help or an explanation would be appreciated!
public static int getWordCount(String getInput, int e){
    int numberOfWords = 0;
    char l1 = 0;
    char l2 = 0;
    StringBuilder convertInput = new StringBuilder(getInput);
    System.out.println(convertInput);
    for (int i = 0, i1 = 1; i < getInput.length(); i++, i1++){
        l2 = convertInput.charAt(i);
        if (l2 == ' '){
            numberOfWords += 1;
            l1 = convertInput.charAt(i1);
        }
        if (i == getInput.length() - 1){
            numberOfWords += 1;
        }
        if (l2 == ' ' && l1 == ' '){
            numberOfWords -= 1;
        }
    }
    return numberOfWords;
} // end of getWordCount method

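You can do this quickly and easily with split():
String phrase = "word1 word2 word3 word4";
// split() takes a regex: "\\s+" treats a run of whitespace as one separator.
// For more delimiters use a character class, e.g. "[\\s,.;]+"
String delims = "\\s+";
String[] tokens = phrase.trim().split(delims); // trim() avoids an empty first token
int numberOfWords = tokens.length; // number of words in your string (an empty string still gives 1)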
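The StringIndexOutOfBoundsException in the question comes from convertInput.charAt(i1): when the last character typed or pasted is a space, i1 equals the string's length. A minimal sketch that keeps the question's space-counting idea but only looks one character ahead while a next character exists (the class and method names are illustrative):

```java
class WordCountFix {
    // Counts words as "spaces followed by a non-space, plus one for the first word".
    // The loop stops one character early, so charAt(i + 1) never goes out of range.
    static int getWordCount(String input) {
        String s = input.trim(); // leading/trailing spaces would otherwise skew the count
        if (s.isEmpty()) {
            return 0;
        }
        int words = 1; // a non-empty trimmed string contains at least one word
        for (int i = 0; i + 1 < s.length(); i++) {
            if (s.charAt(i) == ' ' && s.charAt(i + 1) != ' ') {
                words++; // this space starts a new word
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(getWordCount("one two  three ")); // prints 3
    }
}
```

This treats only ' ' as a separator; pasted text containing tabs or newlines between words would need Character.isWhitespace instead.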

Related

Breaking string into multiple lines in Java

I have a single line string of length n, which I want to split into maximum of 3 lines. Each line can have a maximum of 45 chars, after which I want to add a new-line char ("\n"). The 3rd line can have a maximum of 42 chars after which I need to include 3 dots (...) if the string goes beyond that, thus making the total characters in the 3rd line 45 as well.
The condition is that the newline character must not be added in the middle of a word. How do I do this efficiently? This operation is just a small part of the entire program, but it will be called repeatedly, so I'm not sure whether I should bother about efficiency.
What I'm doing right now is first finding where the spaces between words are and adding their indices to a List. I then iterate through the list and find 3 indices, each marking the end of the last word on a line: the first index is the space closest to 45, the next closest to 90, and the third closest to 135. I then use these indices to split the actual string, and add "\n" and "..." respectively. This is my code:
//maxCharsPerLine will be 45
public String splitString(String input, int maxCharsPerLine){
String output = "";
ArrayList<Integer> spaces = new ArrayList<Integer>();
// Logic to figure out after which word the sentence should be split so that we don't split in middle of a word
for(int index = 0; index < input.length(); index++){
if(input.charAt(index)==' '){
spaces.add(index);
}
}
//add index of last word of string
spaces.add(input.length());
int index1 = 0; int index2 = 0; int index3 = 0;
for(Integer index : spaces){
// find word closest to and less than maxCharsPerLine. This index will be used to find the last word in line1
if(index<=maxCharsPerLine)
index1 = index;
// find word closest to and less than 2*maxCharsPerLine. This index will be used to find the last word in line2
else if(index<=2*maxCharsPerLine)
index2 = index;
// find word closest to and less than 3*maxCharsPerLine, but exclude 3 chars for adding the dots (...). This index will be used to find the last word in line3
else if(index<=(3*maxCharsPerLine)-3)
index3 = index;
}
if(input.length()>maxCharsPerLine){
if(index1 > 0)
output = input.substring(0, index1);
if(index2 > 0)
output += "\n"+input.substring(index1+1, index2);
if(index3 > 0){
output += "\n"+input.substring(index2+1, index3);
if(input.length()>3*maxCharsPerLine)
output += "...";
}
}
//if length of input is < 45, just return the input
else
output = input;
return output;
}
I'm not sure in which scenarios this will fail. Is there a better way to do this?
Thanks.
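You can use the WordUtils.wrap method from Apache Commons Lang if the three dots don't need to be considered when wrapping the line (in Commons Lang 3 this class is deprecated in favor of org.apache.commons.text.WordUtils from Commons Text):
WordUtils.wrap(str, 45)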
Code
public class test3 {
public static void main(String[] args) {
String S = "The condition is that the new line should not be added in the middle of a word. How do I do this efficiently? This operation is just a small part of the entire program, but will be called repeatedly. So I'm not sure if I should actually bother about the efficiency";
String Op = "";
String Op1 = "";
String Op2 = "";
String Op3 = "";
String Temp[] = S.split(" ");
int max_size_1 = 45;
int max_size_2 = 45;
int max_size_3 = 42;
int length = 0;
for (int i = 0; i < Temp.length; i++) {
length = length + Temp[i].length()+1;
if(length <= max_size_1) Op1 = Op1 + Temp[i]+" ";
else if(length <= Op1.length()+max_size_2) Op2 = Op2 +Temp[i]+" ";
else if(length <= Op1.length()+Op2.length()+max_size_3) Op3 = Op3 + Temp[i]+" ";
else {Op3 = Op3 +'\b' + "..."; i =Temp.length ; } //backspace
}
Op = Op1+"\n"+Op2+"\n"+Op3;
System.out.println(Op);
System.out.println(Op1.length()+" "+Op2.length()+" "+Op3.length()+" ");
}}
Output
The condition is that the new line should
not be added in the middle of a word. How do
I do this efficiently? This operation...
42 45 45
Here is another solution, which searches backwards from each line's limit for the last space:
// assumes input holds the text to split
String firstString = "";
String secondString = "";
String thirdString = "";
int start = 0; // index where the current line begins
for (int x = 1; x <= 3 && start < input.length(); x++) {
    // the third line may hold only 42 characters, leaving room for "..."
    int limit = (x == 3) ? 42 : 45;
    int end;
    if (input.length() - start <= limit) {
        end = input.length(); // the rest of the text fits on this line
    } else {
        // look for the last space at or before the limit
        end = start + limit;
        while (end > start && input.charAt(end) != ' ') {
            end--;
        }
        if (end == start) {
            end = start + limit; // no space found: the word has to be cut
        }
    }
    String line = input.substring(start, end);
    switch (x) {
        case 1: firstString = line; break;
        case 2: secondString = line; break;
        case 3: thirdString = (end < input.length()) ? line + "..." : line; break;
    }
    // skip the space(s) where the line was broken
    start = end;
    while (start < input.length() && input.charAt(start) == ' ') {
        start++;
    }
}
Based on surya's answer:
import org.apache.commons.lang3.text.WordUtils;

public class test3 {
    public static void main(String[] args) {
        String S = "The condition is that the new line should not be added in the middle of a word. How do I do this efficiently? This operation is just a small part of the entire program, but will be called repeatedly. So I'm not sure if I should actually bother about the efficiency";
        // wrap(str, 45) breaks lines with the system line separator
        String F = WordUtils.wrap(S, 45);
        String[] F1 = F.split(System.lineSeparator());
        System.out.println(F1[0]);
        System.out.println(F1[1]);
        // '\b' (backspace) only moves the cursor back on a terminal; the string still contains the 3 characters
        F1[2] = F1[2] + '\b' + '\b' + '\b' + "...";
        System.out.println(F1[2]);
    }
}
Output
The condition is that the new line should not
be added in the middle of a word. How do I do
this efficiently? This operation is jus...
My proposal is highly efficient, because:
It needs just two objects: the final string and a temporary StringBuilder, which is pre-sized.
It does no pre-processing: it processes each character just once and decides on the fly what to do.
It is also flexible, because all the involved data are received as parameters:
public final class LinesSplitter
{
private LinesSplitter(){}
private static final char NL='\n';
public static String splitInLines(String text, int maxLineLength, int maxLines, String lastLineSuffix)
{
StringBuilder output=new StringBuilder((1 + maxLineLength) * maxLines);
int p=0;
int startOfLine=0;
int lastBlank=0;
int lastNonBlank=0;
int len=text.length();
String neededSuffix=text.length() > maxLineLength * maxLines
? lastLineSuffix
: "";
int lines=0;
while (lines < maxLines && p < len)
{
char c=text.charAt(p);
if (Character.isWhitespace(c))
{
lastBlank=p;
lastNonBlank=1 + p;
}
else if (p < len)
{
int maxLengthForCurrentLine=getMaxLength(maxLineLength, maxLines, 1 + lines, neededSuffix);
if (p - startOfLine == maxLengthForCurrentLine)
{
output.append(text, startOfLine, lastBlank);
String suffix=getSuffix(maxLineLength, maxLines, 1 + lines, neededSuffix);
if (!suffix.isEmpty())
{
output.append(suffix);
}
else
{
output.append(NL);
}
lines++;
startOfLine=lastNonBlank;
}
}
p++;
}
if (lines < maxLines && p - startOfLine > 0)
{
output.append(text, startOfLine, len);
}
return output.toString();
}
private final static int getMaxLength(int maxLineLength, int maxLines, int currentLine, String lastLineSuffix)
{
return currentLine == maxLines
? maxLineLength - lastLineSuffix.length()
: maxLineLength;
}
private final static String getSuffix(int maxLineLength, int maxLines, int currentLine, String lastLineSuffix)
{
return currentLine == maxLines
? lastLineSuffix
: "";
}
}
The only possible drawback is that it does not support several adjacent blanks.
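For comparison, here is a sketch of a plain greedy wrap that uses split and String.join instead of a hand-written scanner (Java 8+). It assumes words are separated by whitespace and that no single word is longer than the line width minus 3; the class and method names are illustrative, not from any library:

```java
import java.util.Arrays;

class GreedyWrap {
    // Greedy wrapping into at most maxLines lines of at most maxLen characters.
    // If text is left over, the last line is refilled to maxLen - 3 characters and "..." is appended.
    static String wrap(String text, int maxLen, int maxLines) {
        String[] words = text.trim().split("\\s+");
        StringBuilder out = new StringBuilder();
        int i = 0; // index of the first word not yet placed
        for (int line = 1; line <= maxLines && i < words.length; line++) {
            int j = fill(words, i, maxLen);
            boolean truncated = line == maxLines && j < words.length;
            if (truncated) {
                j = fill(words, i, maxLen - 3); // leave room for "..."
            }
            if (line > 1) {
                out.append('\n');
            }
            out.append(String.join(" ", Arrays.copyOfRange(words, i, j)));
            if (truncated) {
                out.append("...");
            }
            i = j;
        }
        return out.toString();
    }

    // Returns the index just past the last word that fits on a line starting at words[from].
    // The first word is always taken, so the loop above always makes progress.
    private static int fill(String[] words, int from, int maxLen) {
        int len = words[from].length();
        int j = from + 1;
        while (j < words.length && len + 1 + words[j].length() <= maxLen) {
            len += 1 + words[j].length();
            j++;
        }
        return j;
    }

    public static void main(String[] args) {
        String s = "The condition is that the new line should not be added in the middle of a word. "
                + "How do I do this efficiently? This operation is just a small part of the entire program, "
                + "but will be called repeatedly. So I'm not sure if I should actually bother about the efficiency";
        System.out.println(wrap(s, 45, 3));
    }
}
```

On the question's sample text with maxLen 45 and maxLines 3, this gives the same first two lines as the WordUtils-based answer and "this efficiently? This operation is just a..." as the third line (42 characters plus the dots).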

Can I use indexOf to pull characters from a string

I am trying to use indexOf to print the first letter in a string after each whitespace.
I want it to grab the first letter of each part of a person's full name to print back out the initials: if they entered Billy Bob Joe, it would grab BBJ and print it like that. I am trying to get it to go from each whitespace + 1 to grab the chars.
I cannot use charAt as I do not know the input the user will give.
I have this code. I can get it to go to a certain whitespace, but I cannot get it to grab just the first letter after the whitespace; it takes the whole string after it.
String str ="Billy Joe Bob";
int targetMatch = 1;
int offset = 0;
for(int i = 0 ; i < targetMatch; i++){
int position = str.indexOf(' ',offset);
if(position != -1){
offset = position+1;
}
}
String result = str.substring(offset);
System.out.println(result);
Any help would be appreciated.
String str = "Billy Joe Bob";
String result = str.substring(0, 1); // first letter of the first name
int position = str.indexOf(' ');
// stop when there are no more spaces, or when a space is the last character
while (position != -1 && position + 1 < str.length()) {
    position++; // move to the character right after the space
    result += str.substring(position, position + 1);
    position = str.indexOf(' ', position);
}
System.out.println(result); // prints BJB
Try this.
Ideally, you'd just split the string on whitespace using String.split. For example:
String str = "foo bar qux";
for (String tok : str.split("\\s+"))
    System.out.println(tok.charAt(0));
The easiest solution I can think of is to use String.split.
String str = "Billy Joe Bob";
for (String word : str.split("\\s+")) {
    // a leading space produces an empty first token, so check the length
    if (word.length() >= 1) {
        System.out.print(word.charAt(0));
    }
}
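Since indexOf tells you where each space is, charAt(space + 1) works fine even though the input is not known in advance. A minimal sketch that stays with indexOf as in the question and also tolerates leading or repeated spaces (the class and method names are illustrative):

```java
class Initials {
    // Collects the first character of every word, using indexOf to jump from one space to the next.
    static String initials(String name) {
        StringBuilder result = new StringBuilder();
        int start = 0;
        while (start < name.length()) {
            if (name.charAt(start) == ' ') { // skip leading or repeated spaces
                start++;
                continue;
            }
            result.append(name.charAt(start)); // first letter of this word
            int space = name.indexOf(' ', start);
            if (space == -1) {
                break; // no more words after this one
            }
            start = space + 1;
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(initials("Billy Bob Joe")); // prints BBJ
    }
}
```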

Finding characters in a string

I'm writing an encoding program in which I'm supposed to delete every character in the string that appears twice. I've tried to traverse the string, but it hasn't worked. Does anyone know how to do this? Thanks.
public static String encodeScrambledAlphabet(String str)
{
String newword = str;
String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
newword += alphabet;
newword = newword.toUpperCase();
for (int i = 0, j = newword.length(); i < newword.length() && j >=0; i++,j--)
{
char one = newword.charAt(i);
char two = newword.charAt(j);
if (one == two)
{
newword = newword.replace(one, ' ');
}
}
newword = newword.replaceAll(" ", "");
return newword;
}
Assuming that you would like to keep only the first occurrence of the character, you can do this:
boolean[] seen = new boolean[65536]; // one flag per possible char value
StringBuilder res = new StringBuilder();
str = str.toUpperCase();
for (char c : str.toCharArray()) {
    if (!seen[c]) res.append(c); // keep only the first occurrence
    seen[c] = true;
}
return res.toString();
The seen array contains flags, one per character, indicating that we've seen this character already. If your characters are all ASCII, you can shrink the seen array to 128.
Assuming that by deleting characters that appear twice you mean that AAABB becomes AAA, the code below should work for you.
static String removeDuplicate(String s) {
    StringBuilder newString = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
        String s1 = s.substring(i, i + 1);
        // Strings are immutable, so no copy of s is needed. replace() treats s1
        // literally, unlike replaceAll(), which would interpret it as a regex.
        // The difference in length gives the number of occurrences of that character.
        if (s.length() - s.replace(s1, "").length() != 2)
            newString.append(s1);
    }
    return newString.toString();
}
The efficiency of this code is debatable :) It might be a better approach to count the occurrences of each character in a loop.
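Counting in a loop, as suggested above, avoids rescanning the whole string for every character. A sketch under the same "AAABB becomes AAA" interpretation, using an int array indexed by char value (the class and method names are illustrative):

```java
class RemoveTwice {
    // Two passes: count every character, then keep only characters whose count is not exactly 2.
    static String removeCharsAppearingTwice(String s) {
        int[] counts = new int[Character.MAX_VALUE + 1]; // one counter per possible char value
        for (int i = 0; i < s.length(); i++) {
            counts[s.charAt(i)]++;
        }
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (counts[c] != 2) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeCharsAppearingTwice("AAABB")); // prints AAA
    }
}
```

If the input is known to be ASCII, the counts array can be shrunk to 128 entries, as noted for the seen array.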
So, from the code that you've shown, it looks like you aren't comparing every character in the string. You are comparing the first and last, then the second and next to last. Example:
Here's your string:
THISISTHESTRINGSTRINGABCDEFGHIJKLMNOPQRSTUVWXYZ
First iteration, you will be comparing the T at the beginning, and the Z at the end.
Second iteration, you will be comparing the H and the Y.
Third: I and X
etc.
So the T at the beginning never gets compared to the rest of the characters.
I think a better way to do this would be a double for loop:
int length = newword.length(); // This way the number of iterations doesn't change (replace(char, char) keeps the length)
for (int i = 0; i < length; i++) {
    for (int j = 0; j < length; j++) {
        if (i != j) {
            // skip spaces, which mark characters that were already removed
            if (newword.charAt(i) != ' ' && newword.charAt(i) == newword.charAt(j)) {
                // Strings are immutable: keep the result of replace()
                newword = newword.replace(newword.charAt(i), ' ');
            }
        }
    }
}
I'm sure that's not the most efficient algorithm for it, but it should get it done.
EDIT: Added an if statement in the middle, to handle the i == j case.
EDIT AGAIN: Here's an almost identical post: function to remove duplicate characters in a string

Parse String and Replace Letters Java

As input I have a string such as "today snowing know ". It contains 3 words, and I must parse them as follows: compare every character with all the other characters and count how many times the same character occurs across the words. For example, the letter "o" occurs 3 times (in "today", "snowing" and "know") and the letter "w" occurs 2 times (in "know" and "snowing"). After that I must replace each character with its count (converted to a char). The result should be "13111 1332131 1332".
What did I do?
First I type in some words:
public void inputStringsForThreads () {
boolean flag;
do {
// will invite to input
stringToParse = Input.value();
try {
flag = true;
// in case that found nothing , space , number and other special character , throws an exception
if (stringToParse.equals("") | stringToParse.startsWith(" ") | stringToParse.matches(".*[0-9].*") | stringToParse.matches(".*[~`!##$%^&*()-+={};:',.<>?/'_].*"))
throw new MyStringException(stringToParse);
else analizeString(stringToParse);
}
catch (MyStringException exception) {
stringToParse = null;
flag = false;
exception.AnalizeException();
}
}
while (!flag);
}
Then I split the input into words, and also remove the spaces to join all the words into a single string:
static void analizeString (String someString) {
// + sign treat many spaces as one
String delimitator = " +";
// words is a String Array
words = someString.split(delimitator);
// temp is a string , will contain a single word
temp = someString.replaceAll("[^a-z^A-Z]","");
System.out.println("=============== Words are : ===============");
for (int i=0;i<words.length;i++)
System.out.println((i+1)+")"+words[i]);
}
Then I try to compare each word (split into letters) with all the letters from all the words, but I don't know how to count the number of identical letters and then replace each letter with its count. Any ideas?
// this will containt characters for every word in part
char[] motot = words[id].toCharArray();
// this will containt all characters from all words
char[] notot = temp.toCharArray();
for (int i =0;i<words[i].length();i++)
for (int j=0;j<temp.length ;j++)
{
if (i == j) {
System.out.println("Same word");
}
else if (motot[i] == notot[j] ) {
System.out.println("Found equal :"+lol[i]+" "+lol1[j]);
}}
For counting you might want to use a Map<Character, Integer> counter, such as java.util.HashMap. If getting the value (Integer) for a specific key (Character) from counter returns something that is not null, put back that value + 1 (autoboxing takes care of the conversion). Otherwise, put a new entry (c, 1) in counter.
Replacing the letters with the numbers should then be fairly easy.
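A minimal sketch of the counting approach just described, following its get / null-check / put steps. It assumes spaces are the only separators and keeps them in the output; the class and method names are illustrative:

```java
import java.util.HashMap;
import java.util.Map;

class LetterCounts {
    // Replaces every non-space character with the number of times it occurs in the whole input.
    static String replaceWithCounts(String input) {
        Map<Character, Integer> counter = new HashMap<>();
        for (char c : input.toCharArray()) {
            if (c == ' ') {
                continue; // spaces only separate words
            }
            Integer current = counter.get(c);
            counter.put(c, current == null ? 1 : current + 1); // autoboxing int <-> Integer
        }
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            out.append(c == ' ' ? " " : String.valueOf(counter.get(c)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceWithCounts("today snowing know")); // prints 13111 1332131 1332
    }
}
```

Appending the count as a String rather than a single char keeps counts of 10 or more readable.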
It is better to use pattern matching, like this.
Initially:
private Matcher matcher;
Pattern regexPattern = Pattern.compile(pattern);
matcher = regexPattern.matcher("");
For multiple patterns to match:
private final String[] patterns = new String[] { /* instantiate patterns here */ };
private final Matcher[] matchers = new Matcher[patterns.length];
for (int i = 0; i < patterns.length; i++) {
    Pattern regexPattern = Pattern.compile(patterns[i]);
    matchers[i] = regexPattern.matcher("");
}
Then, to match a pattern:
if (matcher.reset(charBuffer).find()) {
    // matching pattern
}
For checking multiple matchers:
for (int i = 0; i < matchers.length; i++) {
    if (matchers[i].reset(charBuffer).find()) {
        // matching pattern
    }
}
Don't use plain string matching; it is not efficient.
Always use CharBuffer instead of String.
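The key idea in that answer is compiling a Pattern once and reusing one Matcher through reset(CharSequence), which accepts a CharBuffer as well as a String. A small self-contained sketch of that reuse; the vowel pattern is only a hypothetical example and does not by itself solve the letter-counting problem:

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherReuse {
    // Compile once, create one Matcher, and point it at new input with reset(...).
    // Note: a shared Matcher is not thread-safe.
    private static final Matcher VOWELS = Pattern.compile("[aeiou]").matcher("");

    static int countMatches(CharSequence input) {
        VOWELS.reset(input); // works for String, CharBuffer, StringBuilder, ...
        int count = 0;
        while (VOWELS.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches(CharBuffer.wrap("today snowing know"))); // prints 5
        System.out.println(countMatches("know")); // prints 1
    }
}
```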
Here is some C# code (which is reasonably similar to Java):
using System.Collections.Generic;
using System.Text;

string replace(string s){
    Dictionary<char, int> counts = new Dictionary<char, int>();
    foreach (char c in s) {
        // skip spaces
        if (c == ' ') continue;
        // update count for char c
        if (!counts.ContainsKey(c)) counts.Add(c, 1);
        else counts[c]++;
    }
    // build the result: each non-space char is replaced by its count
    StringBuilder result = new StringBuilder();
    foreach (char c in s)
        result.Append(c == ' ' ? " " : counts[c].ToString());
    return result.ToString();
}
Pay attention to immutable strings in the second loop: s[i] cannot be assigned, which is why the result is built in a StringBuilder.
Here is a solution that works for lower case strings only. Horrible horrible code, but I was trying to see how few lines I could write a solution in.
public static String letterCount(String in) {
StringBuilder out = new StringBuilder(in.length() * 2);
int[] count = new int[26];
for (int t = 1; t >= 0; t--)
for (int i = 0; i < in.length(); i++) {
if (in.charAt(i) != ' ') count[in.charAt(i) - 'a'] += t;
out.append((in.charAt(i) != ' ') ? "" + count[in.charAt(i) - 'a'] : " ");
}
return out.substring(in.length());
}

Word count on Java

How can I count the words of a sentence given as a string? We are allowed to use only the following: for loops, if statements, while, charAt and length().
I wrote this code:
public static int getWordCount()
{
String data = "bla bla bla bla";
int Count = 0;
for (int i=0; i<data.length(); i++)
{
if (data.charAt(i) != ' ')
Count ++;
}
return Count;
}
But it counts only the letters and not the words.
Here's a suggestion: Count the number of ' ' and add 1?
Example:
"bla bla bla bla" has 3 spaces: 3 + 1 = 4
"hello" has 0 spaces: 0 + 1 = 1
If you want to get fancy, you could keep a boolean variable named something like lastWasSpace, set it to true when you run into a space and to false when you run into a non-space character. If you only increment Count on a space when lastWasSpace is false, you'll be able to handle strings with multiple consecutive spaces as well.
"bla bla bla"
1 2 : 2 + 1 = 3
lastWasSpace: FFFFTTTFFFFTTTTTFFFF
The given code would indeed count letters and not words. You may want to change the condition to:
if (data.charAt(i) == ' ')
this means, if you find a space, this would mark the beginning of the next word. Also, the last word will not be counted so you should return Count+1 instead of Count.
There are several assumptions I made here:
There will be exactly one space in between words.
There will not be any leading or trailing spaces.
To consider multiple spaces between words, you would need to modify the code a little. Instead of checking if a character is space, check to see if a character is non-space and the previous character was either a space or no character for the case of first word. This would also handle leading and trailing spaces.
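The check just described (a non-space character whose previous character is a space, or which has no previous character) can be sketched with only the constructs the assignment allows; the class and method names are illustrative:

```java
class WordCounter {
    // A word starts at a non-space character that is either the first character
    // or directly follows a space. Uses only for, if, charAt and length().
    static int countWords(String data) {
        int count = 0;
        for (int i = 0; i < data.length(); i++) {
            boolean isStart = data.charAt(i) != ' '
                    && (i == 0 || data.charAt(i - 1) == ' ');
            if (isStart) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  bla  bla bla  ")); // prints 3
    }
}
```

Because the i == 0 test comes first, charAt(i - 1) is never evaluated at index 0, and leading, trailing, and repeated spaces are all handled.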
public class Main {
public static void main(String[] args) {
String data = "This is a Test";
int wordCount = 1;
int charCount = 0;
for (int i = 0; i < data.length(); i++) {
if (data.charAt(i) == ' ') {
wordCount++;
} else {
charCount++;
}
}
System.out.println("wordCount = " + wordCount);
System.out.println("charCount = " + charCount);
}
}
String ss = " leading spaces in string ";
String[] sa = ss.trim().split("\\s+"); // split on runs of whitespace
System.out.println(sa.length); // prints 4
Note the use of trim() to handle surrounding whitespace (otherwise split would return an empty first token).
Use the code below to count the words in a line:
int index = 0;
int numWords =0;
boolean prevwhitespace = true;
String line = "bla bla bla bla";
while(index < line.length())
{
char c = line.charAt(index++);
boolean currwhitespace = Character.isWhitespace(c);
if(prevwhitespace && !currwhitespace)
{
numWords++;
}
prevwhitespace= currwhitespace;
}
System.out.println("no. of words in the line :: " +numWords);
My solution:
public static int getWordCount() {
String data = "bla bla bla bla";
String[] arr = data.split(" ");
return arr.length;
}
String s = "Aljohani Abdallah";
s = s.trim(); // ignore leading and trailing spaces
int counter = 1;
for (int i = 0; i < s.length() - 1; i++) {
    if (s.charAt(i) == ' ' && s.charAt(i + 1) != ' ')
        counter++;
}
if (s.isEmpty()) // compare contents; s == " " would only compare references
    counter = 0;
System.out.println(counter);
The code above counts the words in a String. It loops over the string, and whenever the character at index i is a space and the character after it is not a space, it adds 1 to the counter.
At the end, if the String was empty, the counter is set to zero.
String str = " Hello there my name is Bill ";
str = str.trim();
int count = 0;
for(int i = 0; i<str.length(); i++) {
if(str.charAt(i) == ' ' && str.charAt(i-1) != ' ') {
count++;
}
}
System.out.println(count+1);
