Compressing a string in Java

Not sure why my code isn't working. If I input qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT, I get qw9w5e2ry5y4qE2ET3T when I should be getting q9w5e2rt5y4qw2Er3T.
Run-length encoding (RLE) is a simple "compression algorithm" (an algorithm which takes a block of data and reduces its size, producing a block that contains the same information in less space). It works by replacing repetitive sequences of identical data items with short "tokens" that represent entire sequences. Applying RLE to a string involves finding sequences in the string where the same character repeats. Each such sequence should be replaced by a "token" consisting of:
the number of characters in the sequence
the repeating character
If a character does not repeat, it should be left alone.
For example, consider the following string:
qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT
After applying the RLE algorithm, this string is converted into:
q9w5e2rt5y4qw2Er3T
In the compressed string, "9w" represents a sequence of 9 consecutive lowercase "w" characters. "5e" represents 5 consecutive lowercase "e" characters, etc.
Write a program that takes a string as input, compresses it using RLE, and outputs the compressed string. Case matters - uppercase and lowercase characters should be considered distinct. You may assume that there are no digit characters in the input string. There are no other restrictions on the input - it may contain spaces or punctuation. There is no need to treat non-letter characters any differently from letters.
public class Compress{
public static void main(String[] args){
System.out.println("Enter a string: ");
String str = IO.readString();
int count = 0;
String result = "";
for (int i=1; i<=str.length(); i++) {
char a = str.charAt(i-1);
count = 1;
if (i-2 >= 0) {
while (i<=str.length() && str.charAt(i-1) == str.charAt(i-2)) {
count++;
i++;
}
}
if (count==1) {
result = result.concat(Character.toString(a));
}
else {
result = result.concat(Integer.toString(count).concat(Character.toString(a)));
}
}
IO.outputStringAnswer(result);
}
}

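I would start at zero and look forward instead of backward. Comparing each character with the previous one emits the first character of every run before the rest is counted, and the for loop's i++ then skips the character that ended the while loop: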
public static void main(String[] args){
System.out.println("Enter a string: ");
String str = IO.readString();
int count = 0;
String result = "";
for (int i=0; i < str.length(); i++) {
char a = str.charAt(i);
count = 1;
while (i + 1 < str.length() && str.charAt(i) == str.charAt(i+1)) {
count++;
i++;
}
if (count == 1) {
result = result.concat(Character.toString(a));
} else {
result = result.concat(Integer.toString(count).concat(Character.toString(a)));
}
}
IO.outputStringAnswer(result);
}
Some outputs:
qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT => q9w5e2rt5y4qw2Er3T
qqwwwwwwwweeeeerrtyyyyyqqqqwEErTTT => 2q8w5e2rt5y4qw2Er3T
qqwwwwwwwweeeeerrtyyyyyqqqqwEErTXZ => 2q8w5e2rt5y4qw2ErTXZ
aaa => 3a
abc => abc
a => a
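For comparison, the same look-ahead idea can be expressed with a back-reference regular expression. This is only a sketch: the class name RegexRle and the method name compress are made up for illustration, and it assumes the input contains no characters outside the Basic Multilingual Plane, so that the match length equals the number of repeated characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexRle {
    // (.) captures one character, \1* matches any further repeats of it.
    // DOTALL lets '.' match line terminators too, since the input may contain any character.
    private static final Pattern RUN = Pattern.compile("(.)\\1*", Pattern.DOTALL);

    static String compress(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = RUN.matcher(input);
        while (m.find()) {
            int runLength = m.end() - m.start();
            if (runLength > 1) {
                out.append(runLength); // a count is only written for real repeats
            }
            out.append(m.group(1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(compress("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT")); // q9w5e2rt5y4qw2Er3T
    }
}
```

A StringBuilder is used here instead of repeated String.concat calls, which avoids creating a new String for every run.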

Related

Count the numbers of numeric value in a given string

Java program to accept a string and count total numeric values.
public class Test2{
public static void main(String[] args){
String str = "I was 2 years old in 2002";
int count = 0, i;
for(i = 0; i < str.length(); i++){
if(str.charAt(i) >= 48 && str.charAt(i) <= 57){
count++;
// while(str.charAt(i) >= 48 && str.charAt(i) <= 57)
// i++;
}
}
System.out.println("Output: " +count);
}
}
Output = 5
After uncommenting the two lines written inside while loop -
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 25
at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:48)
at java.base/java.lang.String.charAt(String.java:712)
at Test2.main(Test2.java:9)
The output should be 2, because there are two numeric values - 2 and 2002
I have commented out two lines in the code above. With those lines uncommented, the same logic works perfectly in C++.

An alternative to @DarkMatter's answer using Pattern (java.util.regex.Pattern; Matcher.results() requires Java 9 or later):
public static void main(String[] args) {
String str = "I was 2 years old in 2002";
long count = Pattern.compile("\\d+").matcher(str).results().count();
System.out.println(count);
}

You are checking individual characters, so it counts every digit (as you probably realize). Java's String has some nice tools to help you here. You could split the line into words and check each against a regular expression using String.matches():
String str = "I was 2 years old in 2002";
int count = 0;
for(String s : str.split(" ")) {
if(s.matches("[0-9]+")) { // + rather than *, so empty strings from consecutive spaces are not counted
count++;
}
}
System.out.println(count);
You can do the same thing (almost) with a stream:
String str = "I was 2 years old in 2002";
long count = Arrays.stream(str.split(" "))
.filter(s -> s.matches("[0-9]+")).count(); // + rather than *, so empty strings are not counted
System.out.println(count);

In C, strings end in an ASCII NUL character (well, in basic C, strings don't exist, it's a library bolt-on, but most bolt-ons have NUL-terminated strings). In Java, that's not how it works.
The reason that your code is not working in Java, but it is in C, is that the inner while loop just keeps going until it hits a non-digit character. That means if the string ends in a digit (which yours does), your code asks the string: give me the character one position beyond the last one. In C that works; that's the ASCII NUL, and thus your inner loop ends, as that's not a digit.
In Java it doesn't, because you can't ask for a character beyond the end of a string; charAt throws a StringIndexOutOfBoundsException instead.
You can 'fix' your code as pasted by also checking in the while condition that i is still below the length: while (i < str.length() && str.charAt(i) >= 48 && str.charAt(i) <= 57).
As the other answers showed you, there are more Java-idiomatic ways to solve this problem too, and the strategies shown in the other answers are probably what your average Java coder would do if faced with this problem. But there's nothing particularly wrong with your C-esque solution, once you add the 'check length' fix.
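Applied to the loop from the question, the length check could look like the sketch below. The class and method names (NumericRunCounter, countNumericValues) are hypothetical and only there to make the example self-contained.

```java
class NumericRunCounter {
    static int countNumericValues(String str) {
        int count = 0;
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) >= '0' && str.charAt(i) <= '9') {
                count++;
                // Skip the rest of this run of digits. The bounds check comes first,
                // so charAt is never called with an index equal to str.length().
                while (i < str.length() && str.charAt(i) >= '0' && str.charAt(i) <= '9') {
                    i++;
                }
                // i now points at the first non-digit (or the end); the for loop's
                // i++ steps over it, which is harmless because it is not a digit.
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Output: " + countNumericValues("I was 2 years old in 2002")); // Output: 2
    }
}
```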

The code below reads a String from the user and prints the number of numeric values it contains as count.
import java.util.Scanner;
public class NumberCountingString
{
public static void main(String[] args)
{
Scanner in = new Scanner(System.in);
String str = in.nextLine();
int count = 0, i;
int size = str.length(); // will only get size once instead of using in loop which will always get size before comparing
for(i = 0; i < size; i++)
{
if(Character.isDigit(str.charAt(i))) //if char is digit count++
{
count++;
for (int j = i; j < size; ) //inner loop to check if next characters are also digits
{
if(Character.isDigit(str.charAt(j))) // if yes skip next char
{
i++;
j=i;
}
else{ //break inner loop
break;
}
}
}
}
System.out.println("Output: " +count);
}
}

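There are many options in Java, as already shared by others. The code below is very similar to your existing code and gives your desired output (StringUtils here is assumed to be the Apache Commons Lang class, which is not part of the JDK):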
public static void main(String[] args) {
String str = "I was 2 years old in 2002";
String[] splittedString = str.split(" ");
int count = 0, i;
for (i = 0; i < splittedString.length; i++) {
if (StringUtils.isNumeric(splittedString[i])) {
count++;
}
}
System.out.println("Output: " + count);
}

You can split this string into an array of words, then keep only the words whose characters are all digits, i.e. allMatch(Character::isDigit) over the word's code points, and count these words:
String str = "I was 2 years old in 2002";
long count = Arrays
// split string into an array of words
.stream(str.split("\\s+"))
// for each word check the code points of the
// characters, whether they are digits or not.
.filter(w -> w.codePoints()
.mapToObj(ch -> (char) ch)
.allMatch(Character::isDigit))
.count();
System.out.println(count); // 2

How to code all possible combinations of string?

Let's say I have String word = "hello12".
I need to have all possible combinations with special characters in place of the numbers (the characters I get when I use shift+number). So, the result I want to get is hello12, hello!2, hello1#, hello!#.
What I did was create a switch with all the cases (1 = '!', 2 = '#'...), but I can't work out how to code all the combinations. All I could code is replacing all the numbers with special symbols (code is below)
char[] passwordInCharArray;
for(int i=0; i<passwordList.length; i++){
for(int j = 0; j<passwordList[i].length(); j++){
if(Character.isDigit((passwordList[i].charAt(j)))){
passwordInCharArray = passwordList[i].toCharArray();
passwordInCharArray[j] = getSpecialSymbol(passwordList[i].charAt(j));
passwordList[i]=String.valueOf(passwordInCharArray);
}
}
}

Theory
Combinatorial problems are often easier to express with recursive methods (methods that call themselves).
I think that the algorithm is more understandable with an example, so let's take String word = hello12.
We will iterate over each character until a digit is found. The first one is 1. At this point, we can imagine the word being split in two by a virtual cursor:
hello is on the left side. We know that it won't change.
12 is on the right side. Each character is likely to be a digit and thus to change.
To retrieve all the possible combinations, we want to:
Keep the first part of the word
Compute all the possible combinations of the second part of the word
Append each of these combinations to the first part of the word
The following tree represents what we want to compute (the root is the first part of the word, each branch represents a combination)
hello
├───1
│ ├───2 (-> hello12)
│ └───# (-> hello1#)
└───!
├───2 (-> hello!2)
└───# (-> hello!#)
You want to write an algorithm that gathers all the branches of this tree.
Java Code
/!\ I advise you to try to implement what I described above before taking a look at the code: that's how we improve ourselves!
Here is the corresponding Java code:
public static void main(String[] args) {
Set<String> combinations = combinate("hello12");
combinations.forEach(System.out::println);
}
public static Set<String> combinate(String word) {
// Will hold all the combinations of word
Set<String> combinations = new HashSet<String>();
// The word is a combination (could be ignored if empty, though)
combinations.add(word);
// Iterate on each word's characters
for (int i = 0; i < word.length(); i++) {
char character = word.charAt(i);
// If the character should be replaced...
if (Character.isDigit(character)) {
// ... we split the word in two at the character's position, taking care not to exceed the word's length
String firstWordPart = word.substring(0, i);
boolean isWordEnd = i + 1 >= word.length();
String secondWordPart = isWordEnd ? "" : word.substring(i + 1);
// Here is the trick: we compute all combinations of the second word part...
Set<String> allCombinationsOfSecondPart = combinate(secondWordPart);
// ... and we append each of them to the first word part one by one
for (String string : allCombinationsOfSecondPart) {
String combination = firstWordPart + getSpecialSymbol(character) + string;
combinations.add(combination);
}
}
}
return combinations;
}
Please leave a comment if you want me to explain the algorithm further.
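The code above calls getSpecialSymbol, which is not shown in this answer. The sketch below supplies a hypothetical version of it, using the replacement table that appears in another answer to this question (adjust it to your keyboard layout), together with a compact variant of the recursion that follows the tree from the Theory section: it branches only on the first digit and recurses on the rest. The class name Combinator is made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class Combinator {
    // Hypothetical helper: index 0..9 gives the symbol for that digit.
    // The table is an assumption taken from another answer, not part of this one.
    private static final String SYMBOLS = ")!##$%^&*(";

    static char getSpecialSymbol(char digit) {
        return SYMBOLS.charAt(digit - '0');
    }

    static Set<String> combinate(String word) {
        Set<String> combinations = new LinkedHashSet<>();
        // Find the first digit; everything before it never changes.
        int i = 0;
        while (i < word.length() && !Character.isDigit(word.charAt(i))) {
            i++;
        }
        if (i == word.length()) {
            combinations.add(word); // no digit left: the word is its only combination
            return combinations;
        }
        String head = word.substring(0, i);
        char digit = word.charAt(i);
        // Each combination of the tail appears twice: once behind the digit itself
        // and once behind its special symbol (the two branches of the tree).
        for (String tail : combinate(word.substring(i + 1))) {
            combinations.add(head + digit + tail);
            combinations.add(head + getSpecialSymbol(digit) + tail);
        }
        return combinations;
    }

    public static void main(String[] args) {
        combinate("hello12").forEach(System.out::println);
    }
}
```

Because every digit is handled exactly once, this variant never generates the same combination twice; the set is kept only to match the return type used in the answer.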

Building on the code from: Generate All Possible Combinations - Java, I've come up with this implementation that does what you need. It will find the index of all digits in your string and then generate all possibilities in which they can be replaced with the special characters.
import java.util.*;
public class Comb {
public static List<String> combinations(String pass) {
String replace = ")!##$%^&*(";
char[] password = pass.toCharArray();
List<Integer> index = new ArrayList<Integer>();
List<String> results = new ArrayList<String>();
results.add(pass);
//find all digits
for (int i = 0; i < password.length; i++) {
if (Character.isDigit(password[i])) {
index.add(i);
}
}
//generate combinations
int N = (int) Math.pow(2d, Double.valueOf(index.size()));
for (int i = 1; i < N; i++) {
String code = Integer.toBinaryString(N | i).substring(1);
char[] p = Arrays.copyOf(password, password.length);
//replace the digits with special chars
for (int j = 0; j < index.size(); j++) {
if (code.charAt(j) == '1') {
p[index.get(j)] = replace.charAt(p[index.get(j)] - '0');
}
}
results.add(String.valueOf(p));
}
return results;
}
public static void main(String... args) {
System.out.println(combinations("hello12"));
}
}

Modify the characters of words in a Java string with punctuation, but keep the positions of said punctuation?

For instance, take the following list of Strings, disregarding the inverted commas:
"Hello"
"Hello!"
"I'm saying Hello!"
"I haven't said hello yet, but I will."
Now let's say I'd like to perform a certain operation on the characters of each word — for instance, say I'd like to reverse the characters, but keep the positions of the punctuation. So the result would be:
"olleH"
"olleH!"
"m'I gniyas olleH!"
"I tneva'h dias olleh tey, tub I lliw."
Ideally I'd like my code to be independent of the operation performed on the string (another example would be a random shuffling of letters), and independent of all punctuation—so hyphens, apostrophes, commas, full stops, en/em dashes, etc. all remain in their original positions after the operation is performed. This probably requires some form of regular expressions.
For this, I was thinking that I should save the indices and characters of all punctuation in a given word, perform the operation, and then re-insert all punctuation at their correct positions. However, I can't think of a way to do this, or a class to use.
I have a first attempt, but this unfortunately does not work with punctuation, which is the key:
jshell> String str = "I haven't said hello yet, but I will."
str ==> "I haven't said hello yet, but I will."
jshell> Arrays.stream(str.split("\\s+")).map(x -> (new StringBuilder(x)).reverse().toString()).reduce((x, y) -> x + " " + y).get()
$2 ==> "I t'nevah dias olleh ,tey tub I .lliw"
Has anyone got an idea how I might fix this? Thanks very much. There's no need for full working code—maybe just a signpost to an appropriate class I could use to perform this operation.

No need to use regex for this, and you certainly shouldn't use split("\\s+"), since you'd lose consecutive spaces, and the type of whitespace characters, i.e. the spaces of the result could be incorrect.
You also shouldn't use charAt() or anything like it, since that would not support letters from the Unicode Supplemental Planes, i.e. Unicode characters that are stored in Java strings as surrogate pairs.
Basic logic:
Locate start of word, i.e. start of string or first character following whitespace.
Locate end of word, i.e. last character preceding whitespace or end of string.
Iterating from beginning and end in parallel:
Skip characters that are not letters.
Swap the letters.
As Java code, with full Unicode support:
public static String reverseLettersOfWords(String input) {
int[] codePoints = input.codePoints().toArray();
for (int i = 0, start = 0; i <= codePoints.length; i++) {
if (i == codePoints.length || Character.isWhitespace(codePoints[i])) {
for (int end = i - 1; ; start++, end--) {
while (start < end && ! Character.isLetter(codePoints[start]))
start++;
while (start < end && ! Character.isLetter(codePoints[end]))
end--;
if (start >= end)
break;
int tmp = codePoints[start];
codePoints[start] = codePoints[end];
codePoints[end] = tmp;
}
start = i + 1;
}
}
return new String(codePoints, 0, codePoints.length);
}
Test
System.out.println(reverseLettersOfWords("Hello"));
System.out.println(reverseLettersOfWords("Hello!"));
System.out.println(reverseLettersOfWords("I'm saying Hello!"));
System.out.println(reverseLettersOfWords("I haven't said hello yet, but I will."));
System.out.println(reverseLettersOfWords("Works with surrogate pairs: 𝓐𝓑𝓒+𝓓 "));
Output
olleH
olleH!
m'I gniyas olleH!
I tneva'h dias olleh tey, tub I lliw.
skroW htiw etagorrus sriap: 𝓓𝓒𝓑+𝓐
Note that the special letters at the end are mathematical bold script capitals, e.g. the 𝓐 is Unicode Character 'MATHEMATICAL BOLD SCRIPT CAPITAL A' (U+1D4D0), which in Java is two characters "\uD835\uDCD0".
UPDATE
The above implementation is optimized for reversing the letters of the word. To apply an arbitrary operation to mangle the letters of the word, use the following implementation:
public static String mangleLettersOfWords(String input) {
int[] codePoints = input.codePoints().toArray();
for (int i = 0, start = 0; i <= codePoints.length; i++) {
if (i == codePoints.length || Character.isWhitespace(codePoints[i])) {
int wordCodePointLen = 0;
for (int j = start; j < i; j++)
if (Character.isLetter(codePoints[j]))
wordCodePointLen++;
if (wordCodePointLen != 0) {
int[] wordCodePoints = new int[wordCodePointLen];
for (int j = start, k = 0; j < i; j++)
if (Character.isLetter(codePoints[j]))
wordCodePoints[k++] = codePoints[j];
int[] mangledCodePoints = mangleWord(wordCodePoints.clone());
if (mangledCodePoints.length != wordCodePointLen)
throw new IllegalStateException("Mangled word is wrong length: '" + new String(wordCodePoints, 0, wordCodePoints.length) + "' (" + wordCodePointLen + " code points)" +
" vs mangled '" + new String(mangledCodePoints, 0, mangledCodePoints.length) + "' (" + mangledCodePoints.length + " code points)");
for (int j = start, k = 0; j < i; j++)
if (Character.isLetter(codePoints[j]))
codePoints[j] = mangledCodePoints[k++];
}
start = i + 1;
}
}
return new String(codePoints, 0, codePoints.length);
}
private static int[] mangleWord(int[] codePoints) {
return mangleWord(new String(codePoints, 0, codePoints.length)).codePoints().toArray();
}
private static CharSequence mangleWord(String word) {
return new StringBuilder(word).reverse();
}
You can of course replace the hardcoded call to either mangleWord method with a call to a passed-in Function<int[], int[]> or Function<String, ? extends CharSequence> parameter, if needed.
The result with that implementation of the mangleWord method(s) is the same as the original implementation, but you can now easily implement a different mangling algorithm.
E.g. to randomize the letters, simply shuffle the codePoints array:
private static int[] mangleWord(int[] codePoints) {
Random rnd = new Random();
for (int i = codePoints.length - 1; i > 0; i--) {
int j = rnd.nextInt(i + 1);
int tmp = codePoints[j];
codePoints[j] = codePoints[i];
codePoints[i] = tmp;
}
return codePoints;
}
Sample Output
Hlelo
Hlleo!
m'I nsayig oHlel!
I athen'v siad eohll yte, btu I illw.
srWok twih rueoatrsg rpasi: 𝓑𝓒𝓐+𝓓
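The passed-in Function<int[], int[]> parameter mentioned in the update could look roughly like the sketch below. It is a condensed rewrite rather than the answer's exact code: the class name WordMangler and the reverse helper are made up for illustration, and the letters of each word are gathered with Arrays.stream instead of explicit counting loops.

```java
import java.util.Arrays;
import java.util.function.Function;

class WordMangler {
    static String mangleLettersOfWords(String input, Function<int[], int[]> mangler) {
        int[] codePoints = input.codePoints().toArray();
        for (int i = 0, start = 0; i <= codePoints.length; i++) {
            if (i == codePoints.length || Character.isWhitespace(codePoints[i])) {
                // Letters of the current word, i.e. of codePoints[start, i)
                int[] letters = Arrays.stream(codePoints, start, i)
                        .filter(Character::isLetter)
                        .toArray();
                int[] mangled = mangler.apply(letters.clone());
                if (mangled.length != letters.length) {
                    throw new IllegalStateException("Mangled word is wrong length");
                }
                // Write the mangled letters back into the letter positions only,
                // so punctuation keeps its place.
                for (int j = start, k = 0; j < i; j++) {
                    if (Character.isLetter(codePoints[j])) {
                        codePoints[j] = mangled[k++];
                    }
                }
                start = i + 1;
            }
        }
        return new String(codePoints, 0, codePoints.length);
    }

    // Example operation: reverse the code points in place.
    static int[] reverse(int[] codePoints) {
        for (int lo = 0, hi = codePoints.length - 1; lo < hi; lo++, hi--) {
            int tmp = codePoints[lo];
            codePoints[lo] = codePoints[hi];
            codePoints[hi] = tmp;
        }
        return codePoints;
    }

    public static void main(String[] args) {
        System.out.println(mangleLettersOfWords("I haven't said hello yet, but I will.", WordMangler::reverse));
    }
}
```

Any other operation, such as the shuffle shown above, can be passed as a method reference or lambda in the same way, as long as it returns an array of the same length.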

I suspect there's a more efficient solution but here's a naive one:
Split sentence into words on spaces (note - if you have multiple spaces my implementation will have problems)
Strip punctuation
Reverse each word
Go through each letter, and insert character from reversed word AND insert punctuation from original word if necessary
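import java.util.Arrays;
import java.util.stream.Collectors;

public class Reverser {
    public String reverseSentence(String sentence) {
        String[] words = sentence.split(" ");
        return Arrays.stream(words).map(this::reverseWord).collect(Collectors.joining(" "));
    }

    private String reverseWord(String word) {
        // Collect the letters and digits, using the same test as the loop below
        StringBuilder letters = new StringBuilder();
        for (int i = 0; i < word.length(); ++i) {
            char ch = word.charAt(i);
            if (Character.isAlphabetic(ch) || Character.isDigit(ch)) {
                letters.append(ch);
            }
        }
        String reversed = letters.reverse().toString();
        StringBuilder result = new StringBuilder();
        // next is the position in reversed; it is tracked separately from i so that
        // words with more than one punctuation mark (e.g. "a,b,c") stay aligned
        int next = 0;
        for (int i = 0; i < word.length(); ++i) {
            char ch = word.charAt(i);
            if (Character.isAlphabetic(ch) || Character.isDigit(ch)) {
                result.append(reversed.charAt(next++));
            } else {
                result.append(ch); // punctuation stays where it was
            }
        }
        return result.toString();
    }
}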

breaking down any String

Hi guys, I am busy with breaking / splitting Strings.
However, the String is not fixed, so when the input changes the program still has to work with any character input.
I got this far, but then I got lost.
I have made an array of characters and set the size of the array equal to the length of whatever string it will get as input. I made a for loop to loop through the characters of the string.
How do I insert my string into the array now, because I know that my string is not in there yet? Then, when it is finally looping through the characters of my string, it has to print out numbers and operands on different lines. So the output would look like this in this case:
1
+
3
,
432
.
123
etc
I want to do this without using matchers, Scanner, etc. I want to use basic Java techniques like the ones you learn in the first 3 chapters of Head First Java.
public class CharAtExample {
public static void main(String[] args) {
// This is the string we are going to break down
String inputString = "1+3,432.123*4535-24.4";
int stringLength = inputString.length();
char[] destArray = new char[stringLength];{
for (int i=0; i<stringLength; i++);
}

You could use Character.isDigit(char) to distinguish numeric from non-numeric chars, as this is actually the single criterion for grouping multiple chars on the same line.
It would give :
public static void main(String[] args) {
String inputString = "1+3,432.123*4535-24.4";
String currentSequence = "";
for (int i = 0; i < inputString.length(); i++) {
char currentChar = inputString.charAt(i);
if (Character.isDigit(currentChar)) {
currentSequence += currentChar;
continue;
}
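// print the pending number, if any, before the non-digit character
if (!currentSequence.isEmpty()) {
System.out.println(currentSequence);
}
System.out.println(currentChar);
currentSequence = "";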
}
// print the current sequence that is a number if not printed yet
if (!currentSequence.equals("")) {
System.out.println(currentSequence);
}
}
Character.isDigit() relies on the Unicode general category, so it also accepts decimal digits from scripts other than ASCII.
You could code it yourself such as :
if (Character.getType(currentChar) == Character.DECIMAL_DIGIT_NUMBER) {...}
Or you could go lower level still and check that the int value of the char lies in the range of ASCII codes for the digits 0 to 9:
if(currentChar >= 48 && currentChar <= 57 ) {
It outputs what you want :
1
+
3
,
432
.
123
*
4535
-
24
.
4
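The digit checks mentioned above are not interchangeable for non-ASCII input. The small sketch below (the class DigitChecks and the helper isAsciiDigit are hypothetical names) compares them on an ASCII digit and on the Arabic-Indic digit three, U+0663.

```java
class DigitChecks {
    // The low-level range check from the answer, written with char literals
    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        char ascii = '7';
        char arabicIndic = '\u0663'; // ARABIC-INDIC DIGIT THREE

        // Character.isDigit accepts any character in the Unicode category "Nd"
        System.out.println(Character.isDigit(ascii) + " " + isAsciiDigit(ascii));             // true true
        System.out.println(Character.isDigit(arabicIndic) + " " + isAsciiDigit(arabicIndic)); // true false
    }
}
```

For an input such as "1+3,432.123*4535-24.4" both checks behave the same; the difference only matters if the string may contain digits from other scripts.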

It's easier than you might think.
First: to get an array with the chars of your string you just use the toCharArray() method that all strings have. ex. myString.toCharArray()
Second: When you see that a character is not a number, you want to move to the next line, print the character and then move to the next line again. The following code does exactly that :
public class JavaApplication255 {
public static void main(String[] args) {
String inputString = "1+3,432.123*4535-24.4";
char[] destArray = inputString.toCharArray();
for (int i = 0 ; i < destArray.length ; i++){
char c = destArray[i];
if (isBreakCharacter(c)){
System.out.println("\n" + c);
} else {
System.out.print(c);
}
}
}
public static boolean isBreakCharacter(char c){
return c == '+' || c == '*' || c == '-' || c == '.' || c == ',' ;
}
}

char[] charArray = inputString.toCharArray();

Here is a possible solution where we go character by character and either add to an existing string which will be our numbers or it adds the string to the array, clears the current number and then adds the special characters. Finally we loop through the array as many times as we find a number or non-number character. I used the ASCII table to identify a character as a digit, the table will come in handy throughout your programming career. Lastly I changed the array to a String array because a character can't hold a number like "432", only '4' or '3' or '2'.
String inputString = "1+3,432.123*4535-24.4";
int stringLength = inputString.length();
String[] destArray = new String[stringLength];
int destArrayCount = 0;
String currentString = "";
for (int i=0; i<stringLength; i++)
{
//check its ASCII value: is it between '0' (48) and '9' (57)?
if(inputString.charAt(i) >= 48 && inputString.charAt(i) <= 57 )
{
currentString += inputString.charAt(i);
}
else
{
destArray[destArrayCount++] = currentString;
currentString = "";
//we know we don't have a number at i so its a non-number character, add it
destArray[destArrayCount++] = "" + inputString.charAt(i);
}
}
//add the last remaining number
destArray[destArrayCount++] = currentString;
for(int i = 0; i < destArrayCount; i++)
{
System.out.println("(" + i + "): " + destArray[i]);
}
IMPORTANT - This algorithm will fail if a certain type of String is used. Can you find a String where this algorithm fails? What can you do to ensure the count is always correct and not sometimes 1 greater than the actual count?

Can't find logic errors

The code should do the following:
Write a method called compress that takes a string as input, compresses it using RLE, and returns the compressed string. Case matters - uppercase and lowercase characters should be considered distinct. You may assume that there are no digit characters in the input string. There are no other restrictions on the input - it may contain spaces or punctuation. There is no need to treat non-letter characters any differently from letters. If a character does not repeat, it should be left alone.
For example, consider the following string:
qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT
After applying the RLE algorithm, this string is converted into:
q9w5e2rt5y4qw2Er3T
However, when I upload it, the grading system gives a zero and the following hints:
Double check your algorithm for logical errors (2 occurrences)
Double check that you are compressing single characters properly (2 occurrences)
I am not sure where the errors are, since the output was correct for all the test cases I used.
Here is my compress method:
public static String compress (String original)
{
StringBuilder compressed = new StringBuilder();
char letter = 0;
int count = 1;
for (int i = 0; i < original.length(); i++) {
if (letter == original.charAt(i)) {
count = count + 1;
}
else {
compressed = count !=1 ? compressed.append(count) : compressed;
compressed.append(letter);
letter = original.charAt(i);
count = 1;
}
}
compressed = count !=1 ? compressed.append(count) : compressed;
compressed.append(letter);
return compressed.toString();
}

Based on the definition of RLE (https://en.wikipedia.org/wiki/Run-length_encoding), single characters should also have a count in front of them.
So the result should be
1q9w5e2r1t5y4q1w2E1r3T
Instead of
q9w5e2rt5y4qw2Er3T
Therefore, you need to change
compressed = count !=1 ? compressed.append(count) : compressed;
To just
compressed.append(count);
Below is one way to resolve it, I treat the previousLetter a bit differently from you:
public static String compress(String original) {
if (original.isEmpty()) return "";
StringBuilder compressed = new StringBuilder();
char previousLetter = original.charAt(0); // initialize the previous letter
int count = 1;
// start searching from the second letter
for (int i = 1; i < original.length(); i++) {
if (previousLetter == original.charAt(i)) {
count = count + 1;
} else {
compressed.append(count);
compressed.append(previousLetter);
previousLetter = original.charAt(i);
count = 1;
}
}
compressed.append(count);
compressed.append(previousLetter);
return compressed.toString();
}
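If the grader instead follows the assignment text quoted in the question, where a character that does not repeat is left alone, the expected output stays q9w5e2rt5y4qw2Er3T. In that case one visible problem in the question's method is that letter starts as 0, so the first pass through the else branch appends the character '\u0000' to the output (and an empty input returns that character instead of an empty string). The sketch below keeps the assignment's rule and avoids the stray character; the class name RleCompressor and the helper appendRun are hypothetical.

```java
class RleCompressor {
    static String compress(String original) {
        if (original.isEmpty()) {
            return ""; // nothing to compress, and no placeholder character is emitted
        }
        StringBuilder compressed = new StringBuilder();
        char previous = original.charAt(0); // start from a real character, not 0
        int count = 1;
        for (int i = 1; i < original.length(); i++) {
            char current = original.charAt(i);
            if (current == previous) {
                count++;
            } else {
                appendRun(compressed, count, previous);
                previous = current;
                count = 1;
            }
        }
        appendRun(compressed, count, previous); // flush the final run
        return compressed.toString();
    }

    // Writes the count only for runs longer than one character
    private static void appendRun(StringBuilder out, int count, char c) {
        if (count > 1) {
            out.append(count);
        }
        out.append(c);
    }

    public static void main(String[] args) {
        System.out.println(compress("qwwwwwwwwweeeeerrtyyyyyqqqqwEErTTT")); // q9w5e2rt5y4qw2Er3T
    }
}
```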
