Breaking up paragraphs into String tokens

I can break a paragraph of text into substrings of a given character limit. The problem is that my algorithm does exactly that, so it splits words in the middle. This is where I am stuck: if the character limit falls in the middle of a word, how can I backtrack to a space so that every substring contains only whole words?
This is the algorithm I am using:
int arrayLength = 0;
arrayLength = (int) Math.ceil(((mText.length() / (double) charLimit)));
String[] result = new String[arrayLength];
int j = 0;
int lastIndex = result.length - 1;
for (int i = 0; i < lastIndex; i++) {
    result[i] = mText.substring(j, j + charLimit);
    j += charLimit;
}
result[lastIndex] = mText.substring(j);
I set charLimit to an arbitrary integer value, and mText is a String holding a paragraph of text. Any suggestions on how I can improve this? Thank you in advance.
I am receiving good responses. To show what I did to check whether I landed on a space, here is the while loop I used. I just do not know how to correct the string from this point.
while (!strTemp.substring(strTemp.length() - 1).equalsIgnoreCase(" ")) {
// somehow refine string before added to array
}

I am not sure I understood exactly what you want, but here is an answer to my interpretation:
You could find the last space before your character limit with lastIndexOf and then check whether it is close enough to the limit (to cope with text that has no whitespace), i.e.:
int arrayLength = 0;
arrayLength = (int) Math.ceil(((mText.length() / (double) charLimit)));
String[] result = new String[arrayLength];
int j = 0;
int tolerance = 10;
int splitpoint;
int lastIndex = result.length - 1;
for (int i = 0; i < lastIndex; i++) {
    splitpoint = mText.lastIndexOf(' ', j + charLimit);
    splitpoint = splitpoint > j + charLimit - tolerance ? splitpoint : j + charLimit;
    result[i] = mText.substring(j, splitpoint).trim();
    j = splitpoint;
}
result[lastIndex] = mText.substring(j).trim();
This searches for the last space at or before j + charLimit and splits the string there if that space is less than tolerance characters (10 is just an example value) before the limit; otherwise it splits at charLimit.
The only problem with this solution is that the last String token can be longer than charLimit, so you might need to adjust arrayLength and loop while (mText.length() - j > charLimit).
Edit
running sample code:
public static void main(String[] args) {
    String mText = "I am able to break up paragraphs of text into substrings based upon nth given character limit. The conflict I have is that my algorithm is doing exactly this, and is breaking up words. This is where I am stuck. If the character limit occurs in the middle of a word, how can I back track to a space so that all my substrings have entire words?";
    int charLimit = 40;
    int arrayLength = 0;
    arrayLength = (int) Math.ceil(((mText.length() / (double) charLimit)));
    String[] result = new String[arrayLength];
    int j = 0;
    int tolerance = 10;
    int splitpoint;
    int lastIndex = result.length - 1;
    for (int i = 0; i < lastIndex; i++) {
        splitpoint = mText.lastIndexOf(' ', j + charLimit);
        splitpoint = splitpoint > j + charLimit - tolerance ? splitpoint : j + charLimit;
        result[i] = mText.substring(j, splitpoint).trim(); // trim() drops the space left at the split
        j = splitpoint;
    }
    result[lastIndex] = mText.substring(j).trim();
    for (int i = 0; i < arrayLength; i++) {
        System.out.println(result[i]);
    }
}
Output:
I am able to break up paragraphs of text
into substrings based upon nth given
character limit. The conflict I have is
that my algorithm is doing exactly
this, and is breaking up words. This is
where I am stuck. If the character
limit occurs in the middle of a word,
how can I back track to a space so that
all my substrings have entire words?
Additional edit: added trim() as suggested by curiosu. It removes the whitespace surrounding the String tokens.
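As a follow-up to the caveat about the last token, here is a minimal sketch that collects the tokens in a List instead of a pre-sized array, so no token has to absorb the leftover text. The class name WordBoundarySplitter and the method split are made up for illustration, and the rule for words longer than the limit (cut them at the limit) is an assumption, not something the question specifies.

```java
import java.util.ArrayList;
import java.util.List;

class WordBoundarySplitter {

    // Splits text into chunks of at most charLimit characters, breaking at a
    // space whenever the window would otherwise end inside a word.
    // Assumption: a single word longer than charLimit is cut at the limit.
    static List<String> split(String text, int charLimit) {
        if (charLimit < 1) {
            throw new IllegalArgumentException("charLimit must be at least 1");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            // Skip the space(s) left over from the previous break.
            if (text.charAt(start) == ' ') {
                start++;
                continue;
            }
            int end = Math.min(start + charLimit, text.length());
            if (end < text.length() && text.charAt(end) != ' ') {
                // The window ends inside a word: back up to the last space, if any.
                int space = text.lastIndexOf(' ', end);
                if (space > start) {
                    end = space;
                }
            }
            chunks.add(text.substring(start, end).trim());
            start = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (String chunk : split("The quick brown fox jumps over the lazy dog", 10)) {
            System.out.println(chunk);
        }
    }
}
```

If you need a String[] at the end, chunks.toArray(new String[0]) converts the list once its final size is known.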

Related

Find char in string that is a digit and larger or equal to 2

I have a string that contains digits, like: 02101403101303101303140
How can I iterate over the string to check whether a digit is >= 2 and remember that digit's index in an array or list for further processing?
The further processing should be replacing substrings.
For example: the iterator finds the digit 2 and remembers the index of this character.
Then it takes the character after the 2 and remembers that character's index as well.
Now it is possible to replace the substring.
Let's say there is 21. I want this to become 11.
Or let's say there is 60; this should be replaced with 000000.
The first digit indicates "how many" and the second digit is "what".
Or is there a better way to remember and replace certain substrings in that way?
Thank you in advance.

There you go, but remember to at least try next time.
String str = "02101403101303101303140";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.length(); i += 2)
    for (int j = 0; j < Integer.parseInt(String.valueOf(str.charAt(i))); j++)
        sb.append(str.charAt(i + 1));
System.out.print(sb.toString());

I am not sure I understand your question correctly, but you could try something like this:
String mystring = "02101403101303101303140";
String target = "21";
String replacement = "11";
String newString = mystring.replace(target, replacement);

String str = "02101403101303101303140";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
    if (Integer.parseInt(String.valueOf(str.charAt(i))) >= 2) {
        int temp = Integer.parseInt(String.valueOf(str.charAt(i))) - 1;
        for (int j = 0; j < temp; j++) {
            sb.append(str.charAt(i + 1));
        }
    }
    else {
        sb.append(str.charAt(i));
    }
}
System.out.println(sb.toString());
This would produce: 01101000011101000111010001110000 which is binary for "http" (without quotes).
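To check the "http" claim, here is a small sketch that reads the bit string eight bits at a time and turns each group into a character. The class name BinaryToText and the method decode are hypothetical, and the sketch assumes the input is plain 8-bit ASCII whose length is a multiple of 8 (any trailing bits are ignored).

```java
class BinaryToText {

    // Decodes a string of '0' and '1' characters, 8 bits per character.
    // Assumption: the bits encode ASCII; leftover bits at the end are ignored.
    static String decode(String bits) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i + 8 <= bits.length(); i += 8) {
            int code = Integer.parseInt(bits.substring(i, i + 8), 2); // base-2 parse
            text.append((char) code);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("01101000011101000111010001110000")); // prints "http"
    }
}
```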
Thank you all! What I really needed was a push in the right direction, and I thank zubergu for that. Also, fr34k gave the best answer!

Unsure how to implement for loop

Hello, I am having trouble implementing this function.
Function:
Decompress the String s. Each character in the string is preceded by a number. The number tells you how many times to repeat the letter. Return a new string.
"3d1v0m" becomes "dddv"
I realize my code is incorrect so far. I am unsure how to fix it.
My code so far is:
int start = 0;
for(int j = 0; j < s.length(); j++){
if (s.isDigit(charAt(s.indexOf(j)) == true){
Integer.parseInt(s.substring(0, s.index(j))

Assuming the input is in the correct format, the following is a simple implementation using a for loop. Of course, this is not stylish code; you could write something more concise and functional in style using Commons Lang or Guava.
StringBuilder builder = new StringBuilder();
for (int i = 0; i < s.length(); i += 2) {
    final int n = Character.getNumericValue(s.charAt(i));
    for (int j = 0; j < n; j++) {
        builder.append(s.charAt(i + 1));
    }
}
System.out.println(builder.toString());

Here is a solution that uses regex, which you may like:
String query = "3d1v0m";
StringBuilder result = new StringBuilder();
String[] digitsA = query.split("\\D+");
String[] letterA = query.split("[0-9]+");
for (int arrIndex = 0; arrIndex < digitsA.length; arrIndex++)
{
    for (int count = 0; count < Integer.parseInt(digitsA[arrIndex]); count++)
    {
        result.append(letterA[arrIndex + 1]);
    }
}
System.out.println(result);
Output
dddv
This solution scales to multi-digit counts and multi-letter patterns.
For example:
Input
3vs1a10m
Output
vsvsvsammmmmmmmmm
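A variation on the same idea, sketched with java.util.regex.Matcher so that each count is captured together with the letters that follow it, rather than relying on two parallel arrays from split. The class name RunLengthDecoder and the pattern (\d+)(\D+) are my own choices for illustration; text that does not fit the "count then letters" shape is simply skipped, which may or may not be what you want.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RunLengthDecoder {

    // One run: a count of one or more digits followed by one or more non-digits.
    private static final Pattern RUN = Pattern.compile("(\\d+)(\\D+)");

    static String decompress(String s) {
        StringBuilder out = new StringBuilder();
        Matcher m = RUN.matcher(s);
        while (m.find()) {
            int count = Integer.parseInt(m.group(1)); // e.g. "10" -> 10
            String letters = m.group(2);              // e.g. "vs"
            for (int i = 0; i < count; i++) {
                out.append(letters);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decompress("3d1v0m")); // prints "dddv"
    }
}
```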

Though Nami's answer is terse and good, I'm still adding my solution for variety. It is built as a static method and uses a while loop for the repetitions instead of a nested for loop. It requires that the input string has an even number of characters and that every odd-positioned character (1st, 3rd, ...) in the compressed string is a digit.
public static String decompress_string(String compressed_string)
{
    String decompressed_string = "";
    for (int i = 0; i < compressed_string.length(); i = i + 2) // Skip by 2 characters in the compressed string
    {
        // Check that a full pair remains and that it starts with a digit
        if (i + 1 < compressed_string.length() && compressed_string.substring(i, i + 1).matches("\\d"))
        {
            int reps = Integer.parseInt(compressed_string.substring(i, i + 1)); // Take the count
            String character = compressed_string.substring(i + 1, i + 2); // Take the next character in sequence
            int count = 1;
            while (count <= reps) // Check if at least one more repetition is required
            {
                decompressed_string = decompressed_string + character; // Append the character to the end of the string
                count++;
            }
        }
        else
        {
            // In case the first character of the pair is not a number
            // or the string has an odd number of characters
            return "Incorrect compressed string!!";
        }
    }
    return decompressed_string;
}

Is there a difference in terms of algorithm efficiency between subtracting ASCII values and simply subtracting an integer?

I recently completed TopCoder algorithm contest single round match 618, and the problem was quite simple. Given a string consisting only of capital letters from A to Z, A = 1, B = 2, etc. and Z = 26. The objective was to return the string's total value using these values.
This was my algorithm:
public class WritingWords {
    public int write(String word) {
        int total = 0;
        word = word.replaceAll("\\s+","");
        for (int i = 0; i < word.length(); i++) {
            total += (int) word.charAt(i) - 64;
        }
        return total;
    }
}
I obtained a score of ~165/250.
This is the code of another user who got ~249/250:
public class WritingWords {
    public int write(String word) {
        int total = 0;
        for (int i = 0; i < word.length(); i++) {
            total += word.charAt(i) - 'A' + 1;
        }
        return total;
    }
}
To me, the two pieces of code look very similar, and I'm unsure why I got such a low score by comparison. What might be the reason that the latter algorithm is so much more efficient than mine? It seems to me that they would both run in O(n) time anyway.

Given a string consisting only of capital letters from A to Z, A = 1, B = 2, etc. and Z = 26.
Given that problem statement, this line
word = word.replaceAll("\\s+","");
is useless and iterates over the whole String value pointlessly.

Both total += (int)word.charAt(i)-64; and total += word.charAt(i)-'A'+1; would run pretty much equally fast. The problem is in this line here:
word = word.replaceAll("\\s+","");
This line (which is only in your code) is what slows down your program. As you can see in the other response, this line is unnecessary.

This
total += word.charAt(i) - 64;
is exactly the same as
total += (int) word.charAt(i) - 64;
which is the same as
total += word.charAt(i) - 'A' + 1;
If you want to speed up your program, don't use a regular expression:
public int write(String word) {
    int total = 0;
    for (int i = 0; i < word.length(); i++) {
        char ch = word.charAt(i);
        if (ch >= 'A') // skips spaces and other characters below 'A'
            total += ch - 64;
    }
    return total;
}
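To make the equivalence concrete: 'A' has the code 65, so c - 'A' + 1 and c - 64 are the same int arithmetic once the char is promoted. The sketch below checks this for every capital letter; the class name LetterValues and its helper methods are invented for illustration.

```java
class LetterValues {

    // Both helpers map 'A' -> 1 ... 'Z' -> 26; a char is promoted to int
    // before the subtraction, so the explicit (int) cast changes nothing.
    static int viaLiteral(char c) {
        return (int) c - 64;
    }

    static int viaCharConstant(char c) {
        return c - 'A' + 1;
    }

    // Sums the letter values of a word made only of capital letters.
    static int write(String word) {
        int total = 0;
        for (int i = 0; i < word.length(); i++) {
            total += viaCharConstant(word.charAt(i));
        }
        return total;
    }

    public static void main(String[] args) {
        for (char c = 'A'; c <= 'Z'; c++) {
            if (viaLiteral(c) != viaCharConstant(c)) {
                throw new AssertionError("mismatch at " + c);
            }
        }
        System.out.println(write("ABC")); // prints 6
    }
}
```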

Going back to the first index after reaching the last one in an array

After my array in the for loop reaches the last index, I get an exception saying that the index is out of bounds. What I wanted is for it to go back to the first index until z is equal to ctr. How can I do that?
My code:
char res;
int ctr = 10;
char[] flames = {'F','L','A','M','E','S'};
for (int z = 0; z < ctr-1; z++) {
    res = (flames[z]);
    jLabel1.setText(String.valueOf(res));
}

You need to use an index that is limited to the size of the array. More precisely, you need to map the for-loop iterations {0..8} to the valid indexes of the flames array {0..flames.length-1}, which in this case is {0..5}.
When the loop counter goes from 0 to 5, the mapping is trivial. When the counter reaches 6, you need to map it back to array index 0; when it reaches 7, you map it to array index 1, and so on.
== Naïve Way ==
for (int z = 0, j = 0; z < ctr-1; z++, j++)
{
    if ( j >= flames.length ) // length is a field on arrays, not a method
    {
        j = 0; // reset back to the beginning
    }
    res = (flames[j]);
    jLabel1.setText(String.valueOf(res));
}
== A More Appropriate Way ==
Then you can refine this by noting that flames.length is invariant, so you can read it once outside the for loop.
final int n = flames.length;
for (int z = 0, j = 0; z < ctr-1; z++, j++)
{
    if ( j >= n )
    {
        j = 0; // reset back to the beginning
    }
    res = (flames[j]);
    jLabel1.setText(String.valueOf(res));
}
== How To Do It ==
Now, if you are paying attention, you can see we are simply doing modular arithmetic on the index. So, if we use the remainder (%) operator, we can simplify your code:
final int n = flames.length;
for (int z = 0; z < ctr-1; z++)
{
    res = (flames[z % n]);
    jLabel1.setText(String.valueOf(res));
}
When working on problems like this, think about function mappings from a domain (in this case, for-loop iterations) to a range (valid array indices).
More importantly, work it out on paper before you even begin to code. That will take you a long way towards solving these types of elementary problems.
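Here is a self-contained version of the modulo mapping that you can run without Swing. It collects the letters into a string instead of calling jLabel1.setText; the class name WrapAroundIndex and the method cycle are made up for this sketch.

```java
class WrapAroundIndex {

    // Returns the first `count` letters obtained by walking the array
    // repeatedly; z % letters.length maps any z >= 0 onto a valid index.
    static String cycle(char[] letters, int count) {
        StringBuilder sb = new StringBuilder();
        for (int z = 0; z < count; z++) {
            sb.append(letters[z % letters.length]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char[] flames = {'F', 'L', 'A', 'M', 'E', 'S'};
        int ctr = 10;
        // Same bound as the question's loop: z < ctr - 1, i.e. 9 iterations.
        System.out.println(cycle(flames, ctr - 1)); // prints "FLAMESFLA"
    }
}
```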

While luis.espinal's answer is better performance-wise, I think you should also take a look at Iterators, as they give you greater flexibility in reading back and forth.
That means you could just as easily write FLAMESFLAMES as FLAMESSEMALF, etc.
int ctr = 10;
List<Character> flames = Arrays.asList('F','L','A','M','E','S');
Iterator<Character> it = flames.iterator();
for (int z = 0; z < ctr-1; z++) {
    if (!it.hasNext()) // if you are at the end of the list reset iterator
        it = flames.iterator();
    System.out.println(it.next().toString()); // use the element
}
Out of curiosity, running this loop 1M times (average of 100 samples) takes:
using modulo: 51ms
using iterators: 95ms
using guava cycle iterators: 453ms
Edit:
Cycle iterators, as lbalazscs nicely put it, are even more elegant. They come at a price, and the Guava implementation is 4 times slower. You could roll your own implementation, though.
// guava example of cycle iterators
Iterator<Character> iterator = Iterators.cycle(flames);
for (int z = 0; z < ctr - 1; z++) {
    res = iterator.next();
}
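For the "roll your own" route, a minimal sketch of a cycling iterator over a List could look like the following. The class name CyclingIterator is hypothetical, this is not how Guava implements Iterators.cycle, and I have not measured how it compares to the timings above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class CyclingIterator<T> implements Iterator<T> {

    private final List<T> items;
    private int next = 0; // index of the element returned by the next call

    CyclingIterator(List<T> items) {
        this.items = items;
    }

    // A non-empty list never runs out, because the index wraps around.
    @Override
    public boolean hasNext() {
        return !items.isEmpty();
    }

    @Override
    public T next() {
        if (items.isEmpty()) {
            throw new NoSuchElementException();
        }
        T item = items.get(next);
        next = (next + 1) % items.size(); // wrap back to 0 after the last index
        return item;
    }

    public static void main(String[] args) {
        List<Character> flames = Arrays.asList('F', 'L', 'A', 'M', 'E', 'S');
        Iterator<Character> it = new CyclingIterator<>(flames);
        StringBuilder sb = new StringBuilder();
        for (int z = 0; z < 9; z++) {
            sb.append(it.next());
        }
        System.out.println(sb); // prints "FLAMESFLA"
    }
}
```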

You should use % to keep the index within flames.length so that it is always a valid index:
int len = flames.length;
for (int z = 0; z < ctr-1; z++) {
    res = (flames[z % len]);
    jLabel1.setText(String.valueOf(res));
}

You can try the following:
char res;
int ctr = 10;
char[] flames = {'F','L','A','M','E','S'};
int n = flames.length; // length is a field on arrays, not a method
for (int z = 0; z < ctr-1; z++) {
    res = flames[z % n];
    jLabel1.setText(String.valueOf(res));
}

Here is how I would do this:
String flames = "FLAMES";
int ctr = 10;
textLoop(flames.toCharArray(), jLabel1, ctr);
The textLoop method:
void textLoop(char[] text, JLabel jLabel, int count) { // a char[] is not an Iterable<Character>
    int idx = 0;
    while (true)
        for (char ch : text) {
            jLabel.setText(String.valueOf(ch));
            if (++idx >= count) return; // stop once count characters have been shown
        }
}
EDIT: found a bug in the code (idx needed to be initialized outside the loop). It's fixed now. I've also refactored it into a separate function.

Java, generate a text

I have a problem with part of my program. In the following code, 27 is the number of letters in our alphabet, and list_extended is an object of type Hashtable<StringBuffer, Integer> containing every string of length n+1 in the text together with its number of occurrences. list_extended is built correctly; I have already checked that part. The aim is this: on every iteration of the outer for loop, we take the last n characters of text_generated, and for every letter of the alphabet we count the occurrences in list_extended of the n+1 characters obtained by appending that letter to those n characters; then we choose the letter with the highest number of occurrences. The result I get is that occurrences contains all 0's. Why? The code:
int index;
int[] occurrences = new int[27];
StringBuffer curr_word;
for(int x = 0; x < m; x++){ // m is the number of characters the user wants to add
    curr_word = new StringBuffer(text_generated.substring(x, x+n)); // n is an integer entered previously, initially text_generated is formed by n characters
    for(int j = 0; j < 27; j++){
        if(list_extended.get(curr_word.append(array[j]))==null)
            occurrences[j] = 0;
        else
            occurrences[j] =(int) list_extended.get(curr_word.append(array[j]));
    }
    index = 0;
    for(int j = 1; j < 27; j++){
        if(occurrences[j] > occurrences[index])
            index = j;
    }
    text_generated = text_generated.append(array[index]);
}
System.out.println("The text generatd is \n" + text_generated.toString());

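Because StringBuffer does not override equals() or hashCode(), a Hashtable keyed by StringBuffer only finds a key when you pass the very same object that was stored. You create a new object curr_word that was never put in list_extended, so every time you check
if(list_extended.get(curr_word.append(array[j]))==null)
the result will be null and
occurrences[j] will be 0.
Note also that append() modifies curr_word in place, so the letters accumulate from one iteration of j to the next instead of replacing the previous one.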
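One way around this, sketched under the assumption that you are free to change the key type, is to key the table by String, which compares by content. The names NextLetterSketch, countNGrams and mostLikelyNext are invented for this example and are not from the question's program.

```java
import java.util.HashMap;
import java.util.Map;

class NextLetterSketch {

    // Counts every substring of length n + 1 in text, keyed by String.
    static Map<String, Integer> countNGrams(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n + 1 <= text.length(); i++) {
            counts.merge(text.substring(i, i + n + 1), 1, Integer::sum);
        }
        return counts;
    }

    // Returns the letter of the alphabet that most often follows context.
    static char mostLikelyNext(Map<String, Integer> counts, String context, char[] alphabet) {
        char best = alphabet[0];
        int bestCount = -1;
        for (char c : alphabet) {
            // context + c builds a fresh key each time; context itself is never mutated.
            int count = counts.getOrDefault(context + c, 0);
            if (count > bestCount) {
                bestCount = count;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countNGrams("abcabcabd", 2);
        System.out.println(mostLikelyNext(counts, "ab", "abcd".toCharArray())); // prints "c"
        // Two StringBuffers with the same content are still different keys:
        System.out.println(new StringBuffer("ab").equals(new StringBuffer("ab"))); // prints "false"
    }
}
```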
