Set line length for a string - java

The problem I'm trying to solve: given a string that may contain carriage returns, insert additional carriage returns so that no line exceeds a set number of characters. It should also keep words intact if possible.
Is there a library in either Java or Scala that does what I need?

There is a BreakIterator class in the java.text package that can tell you where you could insert a line break, but it's a little complicated to use. A regular expression like this can do 80% of the job:
str += "\n"; // Needed to handle last line correctly
// insert line break after max 50 chars on a line
str = str.replaceAll("(.{1,50})\\s+", "$1\n");
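For comparison, the BreakIterator route mentioned above might look roughly like the following. This is only a sketch: the class name BreakIteratorWrap and the method wrap are made up for illustration, it uses the default locale's line-break rules, and it assumes that a word longer than the limit is simply left on its own over-long line.

```java
import java.text.BreakIterator;

class BreakIteratorWrap {
    // Hypothetical helper: inserts '\n' at line-break opportunities so that
    // lines stay within maxLength wherever a break opportunity exists.
    static String wrap(String text, int maxLength) {
        BreakIterator breaks = BreakIterator.getLineInstance();
        breaks.setText(text);
        StringBuilder out = new StringBuilder();
        int lineLength = 0; // characters already placed on the current line
        int start = breaks.first();
        for (int end = breaks.next(); end != BreakIterator.DONE; start = end, end = breaks.next()) {
            // A chunk is the text between two consecutive break opportunities.
            String chunk = text.substring(start, end);
            // Trailing whitespace is not counted against the limit.
            if (lineLength > 0 && lineLength + chunk.trim().length() > maxLength) {
                out.append('\n');
                lineLength = 0;
            }
            out.append(chunk);
            // A chunk ending in an existing newline starts a fresh line.
            lineLength = chunk.endsWith("\n") ? 0 : lineLength + chunk.length();
        }
        return out.toString();
    }
}
```

Calling BreakIteratorWrap.wrap(str, 50) would play the same role as the regular expression, with the break positions chosen by the JDK rather than by \s+.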
The Apache commons lang library has a WordUtils class, which includes a wrap method, to wrap a long line of text to several lines of given length on word boundaries.

public static String addReturns(String s, int maxLength)
{
String newString = "";
int ind = 0;
while(ind < s.length())
{
String temp = s.substring(ind, Math.min(s.length(), ind+maxLength));
int lastSpace = temp.lastIndexOf(" ");
int firstNewline = temp.indexOf("\n");
if(firstNewline>-1)
{
newString += temp.substring(0, firstNewline + 1);
ind += firstNewline + 1;
}
else if(lastSpace>-1)
{
newString += temp.substring(0, lastSpace + 1) + "\n";
ind += lastSpace + 1;
}
else
{
newString += temp + "\n";
ind += maxLength;
}
}
return newString;
}
This will do the trick if you don't want to use regular expressions.
System.out.println(addReturns("Hi there, I'm testing to see if this\nalgorithm is going to work or not. Let's see. ThisIsAReallyLongWordThatShouldGetSplitUp", 20));
Output:
Hi there, I'm
testing to see if
this
algorithm is going
to work or not.
Let's see.
ThisIsAReallyLongWor
dThatShouldGetSplitU
p

I think you can start with something like that. Note that you will have to handle the special case when a word is more than the MAX_LINE_LENGTH.
package com.ekse.nothing;
public class LimitColumnSize {
private static String DATAS = "It was 1998 and the dot-com boom was in full effect. I was making websites as a 22 year old freelance programmer in NYC. I charged my first client $1,400. My second client paid $5,400. The next paid $24,000. I remember the exact amounts \u2014 they were the largest checks I\u2019d seen up til that point.\n"
+ "Then I wrote a proposal for $340,000 to help an online grocery store with their website. I had 5 full time engineers at that point (all working from my apartment) but it was still a ton of dough. The client approved, but wanted me to sign a contract \u2014 everything had been handshakes up til then.\n"
+ "No prob. Sent the contract to my lawyer. She marked it up, sent it to the client. Then the client marked it up and sent it back to my lawyer. And so on, back and forth for almost a month. I was inexperienced and believed that this is just how business was done."
+ "Annoyed by my lawyering, the client eventually gave up and hired someone else.";
private static int MAX_LINE_LENGTH = 80;
private static char[] BREAKING_CHAR = {' ', ',', ';', '!', '?', ')', ']', '}'}; // Probably some others
public static void main(String[] args) {
String current = DATAS;
String result = "";
while (current.length() != 0) {
for (int i = (current.length() - 1) < MAX_LINE_LENGTH ? current.length() - 1 : MAX_LINE_LENGTH; i >= 0; i--) {
if (current.charAt(i) == '\n') {
result += current.substring(0, i + 1); // keep the existing newline
current = current.substring(i + 1);
break;
} else if (isBreakingChar(current.charAt(i))) {
result += current.substring(0, i) + "\n";
current = current.substring(i + 1);
break;
} else if (i == 0 && (current.length() - 1) < MAX_LINE_LENGTH) {
result += current;
current = "";
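} else if (i == 0) {
// No break point in the first MAX_LINE_LENGTH characters: hard-split so the loop always makes progress
result += current.substring(0, MAX_LINE_LENGTH) + "\n";
current = current.substring(MAX_LINE_LENGTH);
}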
}
}
System.out.println(result);
}
private static boolean isBreakingChar(char c) {
for (int i = 0; i < BREAKING_CHAR.length; ++i) {
if (c == BREAKING_CHAR[i]) {
return true;
}
}
return false;
}
}

If anybody is interested, my final solution used Apache Commons WordUtils. Thanks to Joni for pointing WordUtils out to me.
private static String wrappify(String source, int lineLength, String eolMarker){
String[] lines = source.split(eolMarker);
StringBuffer wrappedStr = new StringBuffer();
for (String line : lines) {
if(line.length() <= lineLength){
wrappedStr.append(line + eolMarker);
}else{
wrappedStr.append(WordUtils.wrap(line, lineLength, eolMarker, true) + eolMarker);
}
}
return wrappedStr.replace(wrappedStr.lastIndexOf(eolMarker), wrappedStr.length(), "").toString();
}

Related

How to efficiently remove consecutive same characters in a string

I wrote a method to reduce a sequence of the same characters to a single character, as follows. Its logic seems correct, but according to my tutor there is room for improvement in terms of performance. Could anyone shed some light on this?
Comments on aspects other than performance are also really appreciated.
public class RemoveRepetitions {
public static String remove(String input) {
String ret = "";
String last = "";
String[] stringArray = input.split("");
for(int j=0; j < stringArray.length; j++) {
if (! last.equals(stringArray[j]) ) {
ret += stringArray[j];
}
last = stringArray[j];
}
return ret;
}
public static void main(String[] args) {
System.out.println(RemoveRepetitions.remove("foobaarrbuzz"));
}
}
We can improve performance by using a StringBuilder instead of String concatenation, since each += on a String creates a new string. The split call is also unnecessary and only makes the program slower.
Here is a way to solve this:
public static String remove(String input)
{
StringBuilder answer = new StringBuilder("");
int N = input.length();
int i = 0;
while (i < N)
{
char c = input.charAt(i);
answer.append( c );
while (i<N && input.charAt(i)==c)
++i;
}
return answer.toString();
}
The idea is to iterate over all characters of the input string and keep appending every new character to the answer and skip all the same consecutive characters.
Some possible changes to consider in your code:
Time complexity: your code makes a single pass over the input, and no solution can do better than reading every character once. Note, however, that building the result with += copies the string on every append, so a StringBuilder is cheaper for long inputs.
Space complexity: your code uses extra memory for the array created by splitting.
Question to ask: can you produce this output without the extra array that you get from splitting the string? (Character-by-character traversal is possible directly on the string.)
I could provide the code here, but it would be great if you tried it on your own first. Once you are done with your attempts,
you can look up a solution here (you are almost there):
https://www.geeksforgeeks.org/remove-consecutive-duplicates-string/
Good luck!
As mentioned before, it is much better to access the characters in the string using method String::charAt or at least by iterating a char array retrieved with String::toCharArray instead of splitting the input string into String array.
However, Java strings may contain characters outside the Basic Multilingual Plane of Unicode (e.g. emojis such as 😂😁😊, or some rarer Chinese and Japanese characters), which occupy two char values each, so String::codePointAt should be used instead. Correspondingly, Character.charCount should be used to compute the offset while iterating over the input string.
Also the input string should be checked if it's null or empty, so the resulting code may look like this:
public static String dedup(String str) {
if (null == str || str.isEmpty()) {
return str;
}
int prev = -1;
int n = str.length();
System.out.println("length = " + n + " of [" + str + "], real length: " + str.codePointCount(0, n));
StringBuilder sb = new StringBuilder(n);
for (int i = 0; i < n; ) {
int cp = str.codePointAt(i);
if (i == 0 || cp != prev) {
sb.appendCodePoint(cp);
}
prev = cp;
i += Character.charCount(cp); // for emojis it returns 2
}
return sb.toString();
}
A version with String::charAt may look like this:
public static String dedup2(String str) {
if (null == str || str.isEmpty()) {
return str;
}
int n = str.length();
StringBuilder sb = new StringBuilder(n);
sb.append(str.charAt(0));
for (int i = 1; i < n; i++) {
if (str.charAt(i) != str.charAt(i - 1)) {
sb.append(str.charAt(i));
}
}
return sb.toString();
}
The following test proves that charAt fails to deduplicate repeated emojis:
System.out.println("codePoint: " + dedup ("😂😂😁😁😊😊😂 hello"));
System.out.println("charAt: " + dedup2("😂😂😁😁😊😊😂 hello"));
Output:
length = 20 of [😂😂😁😁😊😊😂 hello], real length: 13
codePoint: 😂😁😊😂 helo
charAt: 😂😂😁😁😊😊😂 helo

Need to encode repetitive pattern in String with * , such that * means "repeat from beginning"

Encoding format: introduce * to indicate "repeat from beginning". Example. Input-{a,b,a,b,c,a,b,a,b,c,d} can be written as {a , b, * ,c, * , d}. Output:5; E.g 2: ABCABCE, output- 5.
Here * means repeat from beginning. For example if given String is ABCABCABCABC , it will return ABC**, another example is if String is ABCABCABC, it will return ABC*ABC.
I have the code below, but it assumes that the string contains only the repetitive pattern and no other characters. I want to modify it to:
1. Find which pattern is repeating
2. Ignore non-repeating patterns
3. Encode that pattern according to the problem statement
import java.util.Scanner;
public class Magicpotion {
public static void main(String args[]) {
Scanner sc = new Scanner(System.in);
System.out.println("Enter the string:");
String str = sc.nextLine();
int len = str.length();
if (len != 0) {
int lenby3 = len / 3;
int starcount = (int) (Math.log(lenby3) / Math.log(2));
int leftstring = (lenby3 - (int) Math.pow(2, starcount));
int resultlen = (1 * 3) + starcount + (leftstring * 3);
System.out.println("ResultLength: " + resultlen);
System.out.print("ABC");
for (int i = 0; i < starcount; i++) {
System.out.print("*");
}
for (int i = 0; i < leftstring; i++) {
System.out.print("ABC");
}
} else
System.out.println("ResultLength: " + 0);
}
}
Here my assumption is that ABC will always be repeating pattern , hence I have divided the length by 3. I want to generalise it such that I find the repeating pattern which can be a AB or BC or ABCD and proceed accordingly.
This looks like homework. So instead of a full solution just some hints:
You can process the input string character by character and encode as you go. If you have at some point already read k characters and the next k characters are exactly the same, output a * and advance to position 2k.
Otherwise, output the next input character and advance position to k+1.
As mentioned by dyukha this algorithm does not always result in the shortest possible encoding. If this is required some more effort has to be put into the search.
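The hints above could be turned into code along these lines. This is a sketch of the greedy idea only, not a complete answer to the exercise: the class name GreedyStarEncoder is made up, and it assumes the input itself never contains the character *.

```java
class GreedyStarEncoder {
    // Greedy encoding: whenever the k characters read so far are immediately
    // followed by the same k characters, emit '*' and jump to position 2k.
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int k = 0; // number of input characters consumed so far
        while (k < s.length()) {
            if (k > 0 && 2 * k <= s.length() && s.regionMatches(0, s, k, k)) {
                out.append('*');
                k *= 2;
            } else {
                out.append(s.charAt(k));
                k++;
            }
        }
        return out.toString();
    }
}
```

For the examples in the question this gives ABC*E for ABCABCE, ABC** for ABCABCABCABC and ABC*ABC for ABCABCABC. For aaaaaa it gives a**aa (length 5), although aaa* (length 4) is shorter, which illustrates why the greedy result is not always minimal.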
This problem can be solved using dynamic programming.
Assume that you have processed the string up to some position i. You want to know the minimal length of an encoding of str[0..i]. Let's call it ans[i]. You have two options:
Just add the i-th character to the encoding. So the length is ans[i-1] + 1.
Write *, when possible, that is, when str[0..i] has even length and its first half equals its second half. In this case the length is ans[(i-1)/2] + 1: the cost of encoding the first half plus one for the star.
The final length is in ans[n-1]. You can store how you obtained ans[i] to recover the encoding itself.
Checking whether you can write * can be optimized, using some hashing (to obtain O(n) solution instead of O(n^2)).
The difference from Henry's solution is that he always applies * when it's possible. It's not clear to me that this results in the minimal length (if I understood correctly, aaaaaa is a counterexample), so I'm giving a solution I'm sure about.
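A minimal sketch of this dynamic programming idea follows. The class name MinStarEncoding is made up, and for convenience the array is indexed by prefix length (ans[i] is the best encoding length for the first i characters), so the indices are shifted by one compared to the description above. The half comparison is done directly with regionMatches, so this is the O(n^2) variant without hashing.

```java
class MinStarEncoding {
    // ans[i] = minimal encoding length for the first i characters of s.
    static int minLength(String s) {
        int n = s.length();
        int[] ans = new int[n + 1]; // ans[0] = 0 for the empty prefix
        for (int i = 1; i <= n; i++) {
            // Option 1: append the i-th character literally.
            ans[i] = ans[i - 1] + 1;
            // Option 2: write '*' if the prefix is its first half repeated twice.
            int half = i / 2;
            if (i % 2 == 0 && s.regionMatches(0, s, half, half)) {
                ans[i] = Math.min(ans[i], ans[half] + 1);
            }
        }
        return ans[n];
    }
}
```

This returns 5 for ABCABCE and 4 for aaaaaa (encoded as aaa*). Recording which option won at each i would allow the encoded string itself to be reconstructed.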
/**
* @author mohamed ali
* https://www.linkedin.com/in/oo0shaheen0oo/
*/
public class Magic_potion_encoding
{
private static int minimalSteps( String ingredients )
{
StringBuilder sb = new StringBuilder(ingredients);
for(int i =0;i<sb.length();i++)
{
char startChar = sb.charAt(i);
int walkingIndex1=i;
int startIndex2 =sb.toString().indexOf(startChar,i+1);
int walkingIndex2=startIndex2;
while(walkingIndex2 !=-1 && walkingIndex2<sb.length() && sb.charAt(walkingIndex1) == sb.charAt(walkingIndex2) )
{
if(walkingIndex1+1==startIndex2)
{
String subStringToBeEncoded = sb.substring(i,walkingIndex2+1);//substring the string found and the original "substring does not include the last index hence the +1
int matchStartIndex = sb.indexOf(subStringToBeEncoded,walkingIndex2+1);// look for first match for the whole string matched
int matchEndeIndex= matchStartIndex+subStringToBeEncoded.length();
int origStartIndex=i;
int origEndIndex = i+subStringToBeEncoded.length();
if (matchStartIndex!=-1 )
{
if(origEndIndex==matchStartIndex)
{
sb.replace(matchStartIndex,matchEndeIndex,"*");
}
else
{
while(matchStartIndex!=-1 && matchEndeIndex<sb.length() && sb.charAt(origEndIndex) == sb.charAt(matchEndeIndex) )
{
if(origEndIndex==matchStartIndex-1)// if the index of the 2 strings are right behind one another
{
sb.replace(matchStartIndex,matchEndeIndex+1,"*");
}
else
{
origEndIndex++;
matchEndeIndex++;
}
}
}
}
sb.replace(startIndex2,walkingIndex2+1,"*");
break;
}
walkingIndex1++;
walkingIndex2++;
}
}
System.out.println("orig= " + ingredients + " encoded = " + sb);
return sb.length();
}
public static void main( String[] args )
{
if ( minimalSteps("ABCABCE") == 5 &&
minimalSteps("ABCABCEA") == 6 &&
minimalSteps("abbbbabbbb") == 5 &&
minimalSteps("abcde") == 5 &&
minimalSteps("abcbcbcbcd") == 6 &&
minimalSteps("ababcababce") == 6 &&
minimalSteps("ababababxx") == 6 &&
minimalSteps("aabbccbbccaabbccbbcc") == 8)
{
System.out.println( "Pass" );
}
else
{
System.out.println( "Fail" );
}
}
}
Given that the repetitions are from the beginning, every such repeating substring will start with the very first character of the given string. [Every repetition needs to be represented by a "star" (i.e. ABCABCABC, ans = ABC**). If all sequential repetitions are to be represented with one "star" (i.e. ABCABCABC, ans = ABC*), a slight modification to (2) will do the thing (i.e. remove the if case where just a star is added).]
Divide the given string to substrings based on the first character.
Eg. Given String = "ABABCABD"
Sub Strings = {"AB", "ABC", "AB", "ABD"}
Just traverse through the list of substrings and get the required result. I've used a map here, to make the search easy.
Just a rough write up.
SS = {"AB", "ABC", "AB", "ABD"};
result = SS[0];
Map<string, bool> map;
map.put(SS[0],true);
for (i = 1; i < SS.length; i++){
if (map.hasKey(SS[i])){
result += "*";
}
else {
res = nonRepeatingPart(SS[i], map);
result += "*" + res;
map.put(SS[i], true);
}
}
String nonRepeatingPart(str, map){
for (j = str.length-1; j >= 0; j--){
if (map.hasKey(str.subString(0, j))){
return str.subString(j, str.length-1);
}
}
return throwException("Wrong Input");
}
#include <iostream>
#include <string>
using namespace std;
string getCompressed(string str){
string res;
res += str[0];
int i=1;
while(i<str.size()){
//check if current char is the first char in res
char curr = str[i];
if(res[0]==curr){
if(str.substr(0,i)==str.substr(i,i)){
res += '*';
i+=i; continue;
}else{
res += curr;
i++; continue;
}
}else {
res += curr;
i++; continue;
}
}
return res;
}
int main()
{
string s = "ABCABCABC";
string res = getCompressed(s);
cout<<res.size();
return 0;
}

Returning a string minus a specific character between specific characters

I am going through the Java CodingBat exercises. Here is the one I am stuck on:
Look for patterns like "zip" and "zap" in the string -- length-3, starting with 'z' and ending with 'p'. Return a string where for all such words, the middle letter is gone, so "zipXzap" yields "zpXzp".
Here is my code:
public String zipZap(String str){
String s = ""; //Initialising return string
String diff = " " + str + " "; //Ensuring no out of bounds exceptions occur
for (int i = 1; i < diff.length()-1; i++) {
if (diff.charAt(i-1) != 'z' &&
diff.charAt(i+1) != 'p') {
s += diff.charAt(i);
}
}
return s;
}
This is successful for a few of them but not for others. It seems like the && operator is acting like a || for some of the example strings; that is to say, many of the characters I want to keep are not being kept. I'm not sure how I would go about fixing it.
A nudge in the right direction if you please! I just need a hint!
Actually it is the other way around. You should do:
if (diff.charAt(i-1) != 'z' || diff.charAt(i+1) != 'p') {
s += diff.charAt(i);
}
Which is equivalent to:
if (!(diff.charAt(i-1) == 'z' && diff.charAt(i+1) == 'p')) {
s += diff.charAt(i);
}
This sounds like the perfect use of a regular expression.
The regex "z.p" will match any three letter token starting with a z, having any character in the middle, and ending in p. If you require it to be a letter you could use "z[a-zA-Z]p" instead.
So you end up with
public String zipZap(String str) {
return str.replaceAll("z[a-zA-Z]p", "zp");
}
This passes all the tests, by the way.
You could make the argument that this question is about raw string manipulation, but I would argue that that makes this an even better lesson: applying regexes appropriately is a massively useful skill to have!
public String zipZap(String str) {
    //Create a builder for the string we return at the end.
    StringBuilder ret = new StringBuilder();
    for (int i = 0; i < str.length(); i++)
    {
        //Only look for z-p when two more characters remain, so charAt(i+2) stays in bounds.
        if (i + 2 < str.length() && str.charAt(i) == 'z' && str.charAt(i+2) == 'p')
        {
            //Add "zp" to the return string
            ret.append("zp");
            //Increment to skip over z-p.
            i += 2;
        }
        //If it isn't z-p, we just add the character.
        else
        {
            ret.append(str.charAt(i));
        }
    }
    //Return the string. Strings shorter than 3 chars come back unchanged.
    return ret.toString();
}

Translating phrases into pig latin with string

I am currently coding a pig latin translator that breaks strings down into words and then translates them. If any of the first four letters of a word are consonants, it will move those letters to the back and add "ay". If the word begins with a vowel, it adds "way" to the end of the word.
Apparently my pig latin translator code does not output a translated string; instead it gives me the original English words broken down into several different parts. I am a little stuck on what to do next. If anyone can help me diagnose the problem, that would be great. Thanks!
public class WL10Driver {
public String convertToPig(String english){
String pigLatin = "";
int pigLatinWord = 0;
String vowel = "[aeiouAEIOU]";
for(int i = 0; i<english.length(); i++){
char let = english.charAt(i);
int ind = vowel.indexOf(let);
if(ind > -1){
if(i == 0){
return english+"yay";
}
else{
String start = english.substring(0,i);
String end = english.substring(i);
return end+start+"ay";
}
}
}
return english+"ay";
}
}
It seems like the problem is with the calling method. I made it show JOptionPane.showMessageDialog(null, english).
What should I make JOptionPane show instead?
boolean isVowel(char ch) {
return "aeiouAEIOU".contains("" + ch);
}
public String convertToPig(String english) {
if (english == null) return null;
for (int i = 0; i < Math.min(english.length(), 4); i++) {
char ch = english.charAt(i);
if (i == 0 && isVowel(ch)) return english + "way";
if (!isVowel(ch)) {
String tmp = "";
if (i < english.length() - 1) tmp = english.substring(i + 1);
return english.substring(0, i) + tmp + ch + "ay";
}
}
return english;
}
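On the question of what to display: the dialog should show the value returned by the translation method, not the original english string. A rough sketch of translating a whole phrase word by word is below. The names PigLatinPhrase, convertWord and convertPhrase are made up for illustration, and the rules are an assumed reading of the question (leading consonants among the first four letters move to the back, vowel-initial words get "way").

```java
class PigLatinPhrase {
    static boolean isVowel(char ch) {
        return "aeiouAEIOU".indexOf(ch) >= 0;
    }

    // Translates a single word under the rules assumed above.
    static String convertWord(String word) {
        if (word.isEmpty()) return word;
        if (isVowel(word.charAt(0))) return word + "way";
        int limit = Math.min(word.length(), 4);
        int i = 0;
        // Count the leading consonants, looking at no more than four letters.
        while (i < limit && !isVowel(word.charAt(i))) i++;
        return word.substring(i) + word.substring(0, i) + "ay";
    }

    // Splits the phrase on whitespace and translates each word in turn.
    static String convertPhrase(String english) {
        StringBuilder sb = new StringBuilder();
        for (String word : english.trim().split("\\s+")) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(convertWord(word));
        }
        return sb.toString();
    }
}
```

The caller would then pass the result to the dialog, for example JOptionPane.showMessageDialog(null, PigLatinPhrase.convertPhrase(english)).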

Recursively swap pairs of letters in a string in java

For example, if I call exchangePairs("abcdefg"), I should receive "badcfeg" in return.
This is for a homework assignment, any kind of pseudocode would be very helpful. I am just beginning to learn recursion and up until this problem I haven't had too much of an issue.
public String swapPairs(String s) {
if (s.length() < 2)
return s;
else
return swap(s.charAt(0), s.charAt(1)) + swapPairs(s.substring(2));
}
You're not just beginning to learn recursion, because recursion is part of your everyday life. You just don't notice it, because it is so normal and nobody calls it recursion.
For example, you watch a movie on TV, and in one scene there is someone watching a movie on TV.
In programming, recursion is a way to make hard things easy. Always start with the easy case:
What is the result of exchangePairs("")?
What is the result of exchangePairs("x") where x is any character?
Suppose you have already completed exchangePairs(), how would the result be for "xy..." where "..." is any string? Surely "yx+++", where "+++" is the result of exchangePairs("...").
Now, it turns out that we've covered all cases! Problem solved!
Such is the greatness of recursion. You just use your function as if it were complete, even though you have not completed it yet.
Why use recursion?
for (int i = 0; i + 1 < strlen(str); i += 2) {
char tmp = str[i + 1];
str[i + 1] = str[i];
str[i] = tmp;
}
If you have to use recursion, I suppose you could do something like this:
char* exchangePairs(char* str) {
if (strlen(str) >= 2) {
// if there are characters left, swap the first two, then recurse
char tmp = str[1];
str[1] = str[0];
str[0] = tmp;
exchangePairs(str + 2);
}
return str;
}
That's in C, but it should give you the idea (I'm better in C and didn't want to just give you a copy/pasteable solution).
Use tail recursion
String reverse(String input)
{
if(input.length()==1)
{
return input;
}
else
{
return reverse(input,"");
}
}
String reverse(String input, String result)
{
if(input.length() == 0) return result;
else return reverse(input.substring(1),input.charAt(0) + result);
}
OK, here is my solution. I don't have Java at my disposal, so I did it in C#, which is very similar to Java and so should be easy to understand/port:
public static char[] exchangePairs(char[] charArray, int current)
{
if(current >= charArray.Length - 1)
{
return charArray;
}
char nextChar = charArray[current + 1];
char currentChar = charArray[current];
charArray[current] = nextChar;
charArray[current + 1] = currentChar;
int i = current + 2;
return exchangePairs(charArray, i);
}
Call to the method:
exchangePairs("abcdefghij".ToCharArray(), 0);
public static String swapPairs(String s) {
String even = "";
String odd = "";
int length = s.length();
for (int i = 0; i <= length-2; i+=2) {
even += s.charAt(i+1) + "" + s.charAt(i);
}
if (length % 2 != 0) {
odd = even + s.charAt(length-1);
return odd;
} else {
return even;
}
}
A small addition to Steven's solution: you can use StringBuffer/StringBuilder.reverse() to reverse each pair.
public String swapPairs(String s) {
if (s.length() < 2)
return s;
else {
return new StringBuffer(s.substring(0, 2)).reverse().toString() + swapPairs(s.substring(2));
}
}
I'd introduce an integer recursion control variable which is how much of the string has already been exchanged. At each level, check the control variable to see if there's more to do and, if so, exchange the next pair, increment by 2, and recurse.
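The control-variable approach described above might be sketched as follows. The class name SwapPairsIndexed and the helper swapFrom are hypothetical, and the parameter done plays the role of the integer that tracks how much of the string has already been exchanged.

```java
class SwapPairsIndexed {
    static String swapPairs(String s) {
        return swapFrom(s.toCharArray(), 0);
    }

    // done = number of leading characters that have already been exchanged
    private static String swapFrom(char[] chars, int done) {
        // Base case: fewer than two characters remain, nothing more to swap.
        if (done + 1 >= chars.length) {
            return new String(chars);
        }
        // Exchange the next pair, then recurse two positions further on.
        char tmp = chars[done];
        chars[done] = chars[done + 1];
        chars[done + 1] = tmp;
        return swapFrom(chars, done + 2);
    }
}
```

With this sketch, SwapPairsIndexed.swapPairs("abcdefg") returns "badcfeg", and no substrings are created along the way.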
