Java: Find the longest sequential same character array

I am new to Java. I want to find the longest run of identical consecutive characters in an input character array. For example, in the character array bddfDDDffkl the longest run is DDD, and in rttttDDddjkl the longest run is tttt.
I use the following code to solve this problem, but I want to improve it. For example, if there are two runs of the same length (in rtttgHHH there are two longest runs: ttt and HHH), how do I handle that?
Thanks in advance.
My code is as follows:
import java.util.Scanner;
public class SeqSameChar {
public static void main (String[] args) {
int subLength = 0;
Scanner sc = new Scanner(System.in);
String[] num = null;
num = sc.nextLine().split(" ");
String[] number = new String[num.length];
for(int i = 0; i< number.length;i++) {
number[i] = String.valueOf(num[i]);
}
subLength =length(number,num.length);
System.out.println(subLength);
for(int i = index; i < index+subLength; i++) {
System.out.print(number[i]);
}
}
public static int index;
//to calculate the length of the longest run of equal adjacent elements
public static int length(String[] A,int size){
if(size<=0)return 0;
int res=1;
int current=1;
for(int i=1;i<size;i++){
if(A[i].equals(A[i-1])){
current++;
}
else{
if(current>res){
index=i-current;
res=current;
}
current=1;
}
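}
//the last run ends with the input, so compare it here as well
if(current>res){
index=size-current;
res=current;
}
return res;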
}
}

This algorithm handles the case you describe (it reports at most two tied characters):
First, to be clear: to detect two different characters that are repeated the same number of times, you have to run a second for loop in reverse to identify the second character. If the character found by the reverse loop differs from the one found by the forward loop and its number of repetitions is the same, you print both characters; otherwise you print only the character found by the first loop, because both loops found the same run.
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
System.out.println("Enter String 1: ");
String A1 = sc.nextLine();
MaxRepeat(A1);
}
public static void MaxRepeat(String A) {
int count = 1;
int max1 = 1;
char mostrepeated1 = ' ';
for(int i = 0; i < A.length()-1;i++) {
char number = A.charAt(i);
if(number == A.charAt(i+1)) {
count++;
if(count>max1) {
max1 = count;
mostrepeated1 = number;
}
continue;
}
count = 1;
}
count = 1;
int max2 = 1;
char mostrepeated2 = ' ';
for(int i = A.length()-1; i>0; i--) {
char number = A.charAt(i);
if(number == A.charAt(i-1)) {
count++;
if(count>max2) {
max2 = count;
mostrepeated2 = number;
}
continue;
}
count = 1;
}
if((max1==max2) && (mostrepeated1==mostrepeated2)) {
System.out.println("Most Consecutively repeated character is: " + mostrepeated1 + " and is repeated " + max1 + " times.");
}
else if((max1==max2) && (mostrepeated1!=mostrepeated2)) {
System.out.println("Most continuously repeated characters are: " + mostrepeated1 + " and " + mostrepeated2 + " and they are repeated " + max1 + " times");
}
}
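The two-loop approach above reports at most two tied characters. As a minimal sketch of a more general alternative (the class and method names here are hypothetical, not taken from the answer), a single pass can collect every run whose length equals the maximum, so three or more ties are also reported:

```java
import java.util.ArrayList;
import java.util.List;

class LongestRuns {
    // Returns every run of equal adjacent characters whose length is maximal,
    // in order of appearance. An empty input yields an empty list.
    static List<String> longestRuns(String s) {
        List<String> best = new ArrayList<>();
        int bestLen = 0;
        int start = 0; // index where the current run begins
        for (int i = 1; i <= s.length(); i++) {
            // A run ends at the end of the string or where the character changes.
            if (i == s.length() || s.charAt(i) != s.charAt(start)) {
                int len = i - start;
                if (len > bestLen) {
                    // A strictly longer run invalidates everything collected so far.
                    bestLen = len;
                    best.clear();
                }
                if (len == bestLen) {
                    best.add(s.substring(start, i));
                }
                start = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestRuns("rtttgHHH"));    // [ttt, HHH]
        System.out.println(longestRuns("bddfDDDffkl")); // [DDD]
    }
}
```

Treating the end of the string as the end of a run inside the loop avoids a separate check after the loop for the final run.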

I'll give you a Scala implementation for that problem.
Here is the automated test (in BDD style with ScalaTest):
import org.scalatest._
class RichStringSpec extends FlatSpec with MustMatchers {
"A rich string" should "find the longest run of consecutive characters" in {
import Example._
"abceedd".longestRun mustBe Set("ee", "dd")
"aeebceeedd".longestRun mustBe Set("eee")
"aaaaaaa".longestRun mustBe Set("aaaaaaa")
"abcdefgh".longestRun mustBe empty
}
}
Following is the imperative style implementation, with nested loops and mutable variables as you would normally choose to do in Java or C++:
import scala.collection.mutable
object Example {
implicit class RichString(string: String) {
def longestRun: Set[String] = {
val chunks = mutable.Set.empty[String]
val ilen = string.length
var gmax = 0
for ((ch, curr) <- string.zipWithIndex) {
val chunk = mutable.ListBuffer(ch)
var next = curr + 1
while (next < ilen && string(next) == ch) {
chunk += string(next)
next = next + 1
}
gmax = chunk.length max gmax
if (gmax > 1) chunks += chunk.mkString
}
chunks.toSet.filter( _.length == gmax )
}
}
}
Following is a functional-style implementation, hence no variables, no loops but tail recursion with result accumulators and pattern matching to compare each character with the next one (Crazy! Isn't it?):
import scala.collection.mutable
object Example {
implicit class RichString(string: String) {
def longestRun: Set[String] = {
def recurse(chars: String, chunk: mutable.ListBuffer[Char], chunks: mutable.Set[String]): Set[String] = {
chars.toList match {
case List(x, y, _*) if (x == y) =>
recurse(
chars.tail,
if (chunk.isEmpty) chunk ++= List(x, y) else chunk += y,
chunks
)
case Nil =>
// terminate recursion
chunks.toSet
case _ => // x != y
recurse(
chars.tail,
chunk = mutable.ListBuffer(),
chunks += chunk.mkString
)
}
}
val chunks = recurse(string, mutable.ListBuffer(), mutable.Set.empty[String])
val max = chunks.map(_.length).max
if (max > 0) chunks.filter( _.length == max ) else Set()
}
}
}
For example, for the given "aeebceeedd" string, both implementations above will build the following set of chunks (repeating characters)
Set("ee", "eee", "dd")
and they will filter those chunks having the maximum length (resulting "eee").

Related

How do I find the decomposition of a string?

I need to create an algorithm for String decomposition.
Some examples:
ABCABCDEDEDEF --> ABC*2+DE*3+F
ABCcABCczcz --> ABC*2+cz*2+c
test --> test
Each segment of the string should be separated by a + and, if repeated, followed by a * and the number of times it appears in succession.
This is what I have tried:
private static int[] prefixFunction(String source) {
int n = source.length();
int[] pi = new int[n];
for (int i = 1; i < n; i++) {
int j = pi[i - 1];
while (j > 0 && source.charAt(i) != source.charAt(j))
j = pi[j - 1];
if (source.charAt(i) == source.charAt(j))
j++;
pi[i] = j;
}
return pi;
}

This solution keeps everything in order, meaning an input like ABCABCDEDEDEF will return ABC*2+DE*3+F or an input like abDEDEab will return ab+DE*2+ab.
If you don't keep the order, it will be impossible to reconstruct the String later with 100 % accuracy.
public static void main(String[] args) {
String input = "ABCABCDEDEDEF";
String output = findDecomposition(input);
System.out.println("Output: " + output);
}
public static String findDecomposition(String input) {
String substring = input;
StringBuilder builder = new StringBuilder();
for (int start = 0, count = 1; start < input.length(); start++, count = 1) {
for (int end = start + 1; end < input.length(); end++) {
substring = input.substring(start, end);
while (true) {
String next = input.substring(start + substring.length(), Math.min(end + substring.length(), input.length()));
if (next.equals(substring)) {
count++;
start += substring.length();
end += substring.length();
} else
break;
}
if (count > 1) {
start += substring.length() - 1;
break;
}
}
if (count > 1) {
if (builder.length() > 0 && builder.charAt(builder.length() - 1) != '+')
builder.append('+');
builder.append(substring + "*" + count + "+");
} else
builder.append(input.charAt(start));
}
String result = builder.toString();
if (result.endsWith("+"))
return result.substring(0, result.length() - 1);
else
return result;
}

The brute-force algorithm can work as follows.
Prerequisites:
The first letter is set as the root.
Each possible solution is stored as a linked list. The value of each node is a piece of text to be written.
When outputting a solution, first put all text values into a Map together with their number of appearances. If a value appears more than once, use * as the multiplier.
Example: if one of the solutions looks like ABC-C-ABC, the output will be ABC*2+C.
Solution:
1. Take the next letter from the input.
2. Build new solutions from the existing ones. Each new solution is an old solution plus the new letter, either appended to one of the existing nodes or added as a single letter in a new node.
3. Save the new solutions as the existing solutions.
4. Repeat from 1 until you have processed all letters.
5. Calculate the value of all solutions and select the one with the fewest characters.
I added an example; as you can see, the number of solutions increases quickly, so it is not fully worked out for all 6 letters. Each step represents the cycle from 1. to 4.; in each step the previous solutions are used as the base for new solutions, and multiple new solutions are created for each existing solution.

This code returns the following compositions:
ABCABCDEDEDEF -> ABC*2+DE*3+F
ABCcABCczcz -> ABCc*2+zcz
cefABCcABCczcz -> cef+ABCc*2+zcz
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
public class Decomposition {
public static void main(String[] args) {
Decomposition d = new Decomposition("ABCABCDEDEDEF");
System.out.println(d.getOptimalDecomposition());// Output: ABC*2+DE*3+F
d = new Decomposition("ABCcABCczcz");
System.out.println(d.getOptimalDecomposition());// Output: ABCc*2+zcz
d = new Decomposition("cefABCcABCczcz");
System.out.println(d.getOptimalDecomposition());// Output: cef+ABCc*2+zcz
}
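private List<List<String>> decompositions;
private String toDecompose;
public Decomposition(String toDecompose) {
decompositions = new ArrayList<>();
this.toDecompose = toDecompose;
}
public String getOptimalDecomposition() {
decompose(0, new ArrayList<>());
return calculateOptimal(convertToPartsMap());
}
private String calculateOptimal(List<List<PartCount>> partsCount) {
Collections.sort(partsCount, new SortDecompositions());
StringBuilder optimal = new StringBuilder();
for (int i = 0; i < partsCount.get(0).size(); i++) {
PartCount pc = partsCount.get(0).get(i);
optimal.append(pc.part);
if (pc.count > 1) {
optimal.append("*");
optimal.append(pc.count);
}
if (i != partsCount.get(0).size() - 1) {
optimal.append("+");
}
}
return optimal.toString();
}
private List<List<PartCount>> convertToPartsMap() {
List<List<PartCount>> partsMap = new ArrayList<>();
for (List<String> parts : decompositions) {
List<PartCount> partsCount = new ArrayList<>();
String lastPart = null;
int curCount = 0;
// Assumed reconstruction of this loop body: merge consecutive equal parts
// into a single PartCount that records how often the part repeats.
for (int i = 0; i < parts.size(); i++) {
String part = parts.get(i);
if (part.equals(lastPart)) {
curCount++;
} else {
if (lastPart != null) {
partsCount.add(new PartCount(lastPart, curCount));
}
lastPart = part;
curCount = 1;
}
}
if (lastPart != null) {
partsCount.add(new PartCount(lastPart, curCount));
}
partsMap.add(partsCount);
}
return partsMap;
}
// Assumed signature: nextChar is the index of the next character to place,
// parts is the decomposition built so far.
private void decompose(int nextChar, List<String> parts) {
if (nextChar == toDecompose.length()) {
decompositions.add(parts);
return;
}
char toAdd = toDecompose.charAt(nextChar);
if (parts.isEmpty()) {
parts.add("" + toAdd);
decompose(nextChar + 1, parts);
} else {
// left: append the character to the last existing part
List<String> leftParts = parts.stream().collect(Collectors.toList());// shallow copy
if (!leftParts.isEmpty()) {
int last = leftParts.size() - 1;
leftParts.set(last, leftParts.get(last) + toAdd);
} else {
leftParts.add("" + toAdd);
}
// right: start a new part with the character
List<String> rightParts = parts.stream().collect(Collectors.toList());// shallow copy
rightParts.add("" + toAdd);
decompose(nextChar + 1, leftParts);
decompose(nextChar + 1, rightParts);
}
}
}
class PartCount {
String part;
int count;
public PartCount(String part, int count) {
this.part = part;
this.count = count;
}
@Override
public String toString() {
return "[" + part + ", " + count + "]";
}
}
class SortDecompositions implements Comparator<List<PartCount>> {
public int compare(List<PartCount> a, List<PartCount> b) {
// Here you can define what exactly means "taking up least space".
return countChars(a) - countChars(b);
}
private int countChars(List<PartCount> listPc) {
int count = 0;
for (PartCount pc : listPc) {
count += pc.part.length();
}
return count;
}
}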

This can be solved using the KMP algorithm's lps array (longest proper prefix which is also a suffix).
Steps:
1. Iterate over the string "ABCABCDEDEDEF" and construct the lps array for the string. The values in the array will be
0 0 0 1 2 3 0 0 0 0 0 0 0
This lps array gives the number of times the prefix is repeated in the string.
In the above case it is repeated only one time. Counting the original prefix as well, the number of times will be 2, so it becomes ABC*2.
2. Take the substring of the remaining string and repeat step 1 until the end of the string.
I can provide you the code if needed. The worst-case time complexity will be O(n^2).
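The answer does not include its code, so the following is only one possible reading of the steps, not the author's implementation. It relies on a standard property of the lps array: a prefix of length n has smallest period p = n - lps[n-1], and if p divides n the prefix is its first p characters repeated n/p times. The greedy choice of the longest such prefix, and the names `RepeatedPrefix`, `lps` and `decompose`, are assumptions made for this sketch.

```java
class RepeatedPrefix {
    // Standard KMP prefix function: pi[i] is the length of the longest proper
    // prefix of s[0..i] that is also a suffix of s[0..i].
    static int[] lps(String s) {
        int[] pi = new int[s.length()];
        for (int i = 1; i < s.length(); i++) {
            int j = pi[i - 1];
            while (j > 0 && s.charAt(i) != s.charAt(j)) {
                j = pi[j - 1];
            }
            if (s.charAt(i) == s.charAt(j)) {
                j++;
            }
            pi[i] = j;
        }
        return pi;
    }

    static String decompose(String s) {
        StringBuilder out = new StringBuilder();
        StringBuilder literal = new StringBuilder(); // pending non-repeated characters
        int pos = 0;
        while (pos < s.length()) {
            String rest = s.substring(pos);
            int[] pi = lps(rest);
            int unit = 0;
            int count = 1;
            // Keep the longest prefix that is a whole number of copies of a shorter block.
            for (int n = 2; n <= rest.length(); n++) {
                int period = n - pi[n - 1];
                if (period < n && n % period == 0) {
                    unit = period;
                    count = n / period;
                }
            }
            if (count > 1) {
                appendPart(out, literal.toString());
                literal.setLength(0);
                appendPart(out, rest.substring(0, unit) + "*" + count);
                pos += unit * count;
            } else {
                // No repetition starts here: treat the character as literal text.
                literal.append(rest.charAt(0));
                pos++;
            }
        }
        appendPart(out, literal.toString());
        return out.toString();
    }

    // Appends a segment, inserting '+' between segments and skipping empty ones.
    private static void appendPart(StringBuilder out, String part) {
        if (part.isEmpty()) {
            return;
        }
        if (out.length() > 0) {
            out.append('+');
        }
        out.append(part);
    }

    public static void main(String[] args) {
        System.out.println(decompose("ABCABCDEDEDEF")); // ABC*2+DE*3+F
        System.out.println(decompose("test"));          // test
    }
}
```

Because the lps array is recomputed for each remaining substring, the worst case is quadratic, in line with the estimate given in the answer.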

implement basic string compression

I am working on question 1.5 from the book Cracking The Coding interview. The problem is to take a string "aabcccccaaa" and turn it into a2b1c5a3.
If the compressed string is not smaller than the original string, then return the original string.
My code is below. I used an ArrayList because I would not know how long the compressed string would be.
My output is [a, 2, b, 1, c, 5], aabc, []. When the program gets to the end of the string, it doesn't have a character to compare the last character to.
import java.util.*;
import java.io.*;
public class stringCompression {
public static void main(String[] args) {
String a = "aabcccccaaa";
String b = "aabc";
String v = "aaaa";
check(a);
System.out.println("");
check(b);
System.out.println("");
check(v);
}
public static void check(String g){
ArrayList<Character> c = new ArrayList<Character>();
int count = 1;
int i = 0;
int h = g.length();
for(int j = i + 1; j < g.length(); j++)
{
if(g.charAt(i) == g.charAt(j)){
count++;
}
else {
c.add(g.charAt(i));
c.add((char)( '0' + count));
i = j;
count = 1;
}
}
if(c.size() == g.length()){
System.out.print(g);
}
else{
System.out.print(c);
}
}
}

In the last loop iteration you're not adding the result to the list. When j reaches g.length(), the current char and count still need to be added. So you could check the next value of j before incrementing it:
for(int j = i + 1; j < g.length(); j++)
{
if(g.charAt(i) == g.charAt(j)){
count++;
}
else {
c.add(g.charAt(i));
c.add((char)( '0' + count));
i = j;
count = 1;
}
if((j + 1) == g.length()){
c.add(g.charAt(i));
c.add((char)( '0' + count));
}
}

I would use a StringBuilder rather than an ArrayList to build your compressed String. When you start compressing, the first character should already be added to the result. The count of the character will be added once you've encountered a different character. When you've reached the end of the String you should just be appending the remaining count to the result for the last letter.
public static void main(String[] args) throws Exception {
String[] data = new String[] {
"aabcccccaaa",
"aabc",
"aaaa"
};
for (String d : data) {
System.out.println(compress(d));
}
}
public static String compress(String str) {
StringBuilder compressed = new StringBuilder();
// Add first character to compressed result
char currentChar = str.charAt(0);
compressed.append(currentChar);
// Always have a count of 1
int count = 1;
for (int i = 1; i < str.length(); i++) {
char nextChar = str.charAt(i);
if (currentChar == nextChar) {
count++;
} else {
// Append the count of the current character
compressed.append(count);
// Set the current character and count
currentChar = nextChar;
count = 1;
// Append the new current character
compressed.append(currentChar);
}
}
// Append the count of the last character
compressed.append(count);
// If the compressed string is not smaller than the original string, then return the original string
return (compressed.length() < str.length() ? compressed.toString() : str);
}
Results:
a2b1c5a3
aabc
a4

You have two errors:
one that Typo just mentioned, because your last character was not added;
and another one, if the original string is shorter like "abc" with only three chars: "a1b1c1" has six chars (the task is "If the compressed string is not smaller than the original string, then return the original string.")
You have to change your if statement, ask for >= instead of ==
if(c.size() >= g.length()){
System.out.print(g);
} else {
System.out.print(c);
}

Use StringBuilder and then iterate on the input string.
private static string CompressString(string inputString)
{
var count = 1;
var compressedSb = new StringBuilder();
for (var i = 0; i < inputString.Length; i++)
{
// Check if we are at the end
if(i == inputString.Length - 1)
{
compressedSb.Append(inputString[i] + count.ToString());
break;
}
if (inputString[i] == inputString[i + 1])
count++;
else
{
compressedSb.Append(inputString[i] + count.ToString());
count = 1;
}
}
var compressedString = compressedSb.ToString();
return compressedString.Length >= inputString.Length ? inputString : compressedString;
}

Compression algorithm in java

My goal is to write a program that compresses a string, for example:
input: hellooopppppp!
output:he2l3o6p!
Here is the code I have so far, but there are errors.
When I have the input: hellooo
my code outputs: hel2l3o
instead of: he2l3o
the 2 is being printed in the wrong spot, but I cannot figure out how to fix this.
Also, with an input of: hello
my code outputs: hel2l
instead of: he2lo
It skips the last letter in this case all together, and the 2 is also in the wrong place, an error from my first example.
Any help is much appreciated. Thanks so much!
public class compressionTime
{
public static void main(String [] args)
{
System.out.println ("Enter a string");
//read in user input
String userString = IO.readString();
//store length of string
int length = userString.length();
System.out.println(length);
int count;
String result = "";
for (int i=1; i<=length; i++)
{
char a = userString.charAt(i-1);
count = 1;
if (i-2 >= 0)
{
while (i<=length && userString.charAt(i-1) == userString.charAt(i-2))
{
count++;
i++;
}
System.out.print(count);
}
if (count==1)
result = result.concat(Character.toString(a));
else
result = result.concat(Integer.toString(count).concat(Character.toString(a)));
}
IO.outputStringAnswer(result);
}
}

I would:
count from 0, as that is how indexes work in Java. Your code will be simpler.
compare the current char to the next one. This will avoid printing the first character.
not compress ll as 2l, as it is no smaller. Only sequences of at least 3 will help.
try to detect if a digit from 3 to 9 has been used in the input and at least print an error.
use the debugger to step through the code to understand what it is doing and why it doesn't do what you think it should.
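The suggestions above can be sketched as follows. This is only an illustration under stated assumptions: the class name `RunCompressor` is hypothetical, any digit in the input is rejected (a slightly stricter rule than "3 to 9"), and the count is written before the character as in the question's expected output.

```java
class RunCompressor {
    // Compresses only runs of at least 3 equal characters, since "2l" is no
    // shorter than "ll". Input containing digits is rejected because a digit
    // in the output could not be told apart from a run count.
    static String compress(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0; // zero-based index of the first character of the current run
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isDigit(c)) {
                throw new IllegalArgumentException("input must not contain digits: " + s);
            }
            // Compare the current character with the next one to find the run's end.
            int j = i;
            while (j + 1 < s.length() && s.charAt(j + 1) == c) {
                j++;
            }
            int count = j - i + 1;
            if (count >= 3) {
                out.append(count).append(c);
            } else {
                for (int k = 0; k < count; k++) {
                    out.append(c);
                }
            }
            i = j + 1;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(compress("hellooopppppp!")); // hell3o6p!
        System.out.println(compress("hello"));          // hello
    }
}
```

With this rule the output is never longer than the input, at the cost of not matching the he2l3o6p! format the question asks for.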

I am doing it this way. Very simple:
public static void compressString (String string) {
StringBuffer stringBuffer = new StringBuffer();
for (int i = 0; i < string.length(); i++) {
int count = 1;
while (i + 1 < string.length()
&& string.charAt(i) == string.charAt(i + 1)) {
count++;
i++;
}
if (count > 1) {
stringBuffer.append(count);
}
stringBuffer.append(string.charAt(i));
}
System.out.println("Compressed string: " + stringBuffer);
}

You can accomplish this using nested for loops, doing something similar to:
int count = 0;
String results = "";
for(int i=0;i<userString.length();){
char begin = userString.charAt(i);
//System.out.println("begin is: "+begin);
for(int j=i+1; j<userString.length();j++){
char next = userString.charAt(j);
//System.out.println("next is: "+next);
if(begin == next){
count++;
}
else{
System.out.println("Breaking");
break;
}
}
i+= count+1;
if(count>0){
String add = begin + "";
int tempcount = count +1;
results+= tempcount + add;
}
else{
results+= begin;
}
count=0;
}
System.out.println(results);
I tested this with Hello and the result was He2lo.
I also tested with hellooopppppp and the result was he2l3o6p.

If you don't understand how this works, you should learn regular expressions.
public String rleEncodeString(String in) {
StringBuilder out = new StringBuilder();
Pattern p = Pattern.compile("((\\w)\\2*)");
Matcher m = p.matcher(in);
while(m.find()) {
if(m.group(1).length() > 1) {
out.append(m.group(1).length());
}
out.append(m.group(2));
}
return out.toString();
}
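For readers unfamiliar with backreferences: in the pattern above, group 2 captures a single character and `\\2*` then matches as many further copies of that same character as possible, so each match is one complete run. Note that `\\w` matches only letters, digits and the underscore, so a character such as `!` is never matched and is left out of the result. A minimal variant that keeps every character, assuming a hypothetical class name `RegexRle` and using `.` with `Pattern.DOTALL` instead of `\\w`, might look like this:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexRle {
    // (.) captures one character; \1* matches the following copies of that
    // same character. DOTALL lets '.' match line terminators as well.
    private static final Pattern RUN = Pattern.compile("(.)\\1*", Pattern.DOTALL);

    static String encode(String in) {
        StringBuilder out = new StringBuilder();
        Matcher m = RUN.matcher(in);
        while (m.find()) {
            int runLength = m.group().length(); // the whole match is one run
            if (runLength > 1) {
                out.append(runLength);
            }
            out.append(m.group(1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("hellooopppppp!")); // he2l3o6p!
        System.out.println(encode("hello"));          // he2lo
    }
}
```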

Try something like this:
public static void main(String[] args) {
System.out.println("Enter a string:");
Scanner IO = new Scanner(System.in);
// read in user input
String userString = IO.nextLine() + "-";
int length = userString.length();
int count = 0;
String result = "";
char new_char;
for (int i = 0; i < length; i++) {
new_char = userString.charAt(i);
count++;
if (new_char != userString.charAt(i + 1)) {
if (count != 1) {
result = result.concat(Integer.toString(count));
}
result = result.concat(Character.toString(new_char));
count = 0;
}
if (userString.charAt(i + 1) == '-')
break;
}
System.out.println(result);
}

The problem is that your code checks if the previous letter, not the next, is the same as the current.
Your for loop basically goes through each letter in the string, and if it is the same as the previous letter, it figures out how many of that letter there are and puts that number into the result string. However, for a word like "hello", it will check 'e' and 'l' (and notice that they are preceded by 'h' and 'e', respectively) and think that there is no repeat. It will then get to the next 'l', and only then see that it is the same as the previous letter. It will put '2' in the result, but too late, resulting in "hel2l" instead of "he2lo".
To clean up and fix your code, I recommend the following to replace your for loop:
int count = 1;
String result = "";
for(int i=0;i<length;i++) {
if(i < userString.length()-1 && userString.charAt(i) == userString.charAt(i+1))
count++;
else {
if(count == 1)
result += userString.charAt(i);
else {
result = result + count + userString.charAt(i);
count = 1;
}
}
}
Comment if you need me to explain some of the changes. Some are necessary, others optional.

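Here is a solution that counts characters with a hash map in a single pass over the string. Note that it counts every occurrence of a character in the whole string, not just consecutive ones, so its output matches the run-length format only when equal characters are always adjacent: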
public static void compressString (String string) {
LinkedHashSet<String> charMap = new LinkedHashSet<String>();
HashMap<String, Integer> countMap = new HashMap<String, Integer>();
int count;
String key;
for (int i = 0; i < string.length(); i++) {
key = new String(string.charAt(i) + "");
charMap.add(key);
if(countMap.containsKey(key)) {
count = countMap.get(key);
countMap.put(key, count + 1);
}
else {
countMap.put(key, 1);
}
}
Iterator<String> iterator = charMap.iterator();
String resultStr = "";
while (iterator.hasNext()) {
key = iterator.next();
count = countMap.get(key);
if(count > 1) {
resultStr = resultStr + count + key;
}
else{
resultStr = resultStr + key;
}
}
System.out.println(resultStr);
}

Longest Common Substring in a big text

I have this assignment for school which asks us to write code to find the longest common substring. I have done that, but it only works with texts that are not too big, and we are asked to find the longest common substring of Moby Dick and War and Peace. If you could point me in the right direction about what I'm doing wrong, I would appreciate it. The error is reported in the substring method of the MyString class when I call it to create the SuffixArray, but I don't know why it says the data is too big and gives me an OutOfMemoryError.
package datastructuresone;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;
class SuffixArray
{
private final MyString[] suffixes;
private final int N;
public SuffixArray(String s)
{
N = s.length();
MyString snew = new MyString(s);
suffixes = new MyString[N];
for (int i = 0; i < N; i++)
{
suffixes[i] = snew.substring(i);
}
Arrays.sort(suffixes);
}
public int length()
{
return N;
}
public int index(int i)
{
return N - suffixes[i].length();
}
public MyString select(int i)
{
return suffixes[i];
}
// length of longest common prefix of s and t
private static int lcp(MyString s, MyString t)
{
int N = Math.min(s.length(), t.length());
for (int i = 0; i < N; i++)
{
if (s.charAt(i) != t.charAt(i))
{
return i;
}
}
return N;
}
// longest common prefix of suffixes(i) and suffixes(i-1)
public int lcp(int i)
{
return lcp(suffixes[i], suffixes[i - 1]);
}
// longest common prefix of suffixes(i) and suffixes(j)
public int lcp(int i, int j)
{
return lcp(suffixes[i], suffixes[j]);
}
}
public class DataStructuresOne
{
public static void main(String[] args) throws FileNotFoundException
{
Scanner in1 = new Scanner(new File("./build/classes/WarAndPeace.txt"));
Scanner in2 = new Scanner(new File("./build/classes/MobyDick.txt"));
StringBuilder sb = new StringBuilder();
StringBuilder sb1 = new StringBuilder();
while (in1.hasNextLine())
{
sb.append(in1.nextLine());
}
while (in2.hasNextLine())
{
sb1.append(in2.nextLine());
}
String text1 = sb.toString().replaceAll("\\s+", " ");
String text2 = sb1.toString().replaceAll("\\s+", " ");
int N1 = text1.length();
int N2 = text2.length();
SuffixArray sa = new SuffixArray(text1 + "#" + text2);
int N = sa.length();
String substring = "";
for (int i = 1; i < N; i++)
{
// adjacent suffixes both from second text string
if (sa.select(i).length() <= N2 && sa.select(i - 1).length() <= N2)
{
continue;
}
// adjacent suffixes both from first text string
if (sa.select(i).length() > N2 + 1 && sa.select(i - 1).length() > N2 + 1)
{
continue;
}
// check if adjacent suffixes longer common substring
int length = sa.lcp(i);
if (length > substring.length())
{
substring = sa.select(i).toString().substring(0, length);
System.out.println(substring + " ");
}
}
System.out.println("The length of the substring " + substring.length() + "length on first N " + N1 + " length of Second N " + N2
+ "The length of the array sa: " + N);
System.out.println("'" + substring + "'");
}
}
final class MyString implements Comparable<MyString>
{
public MyString(String str)
{
offset = 0;
len = str.length();
arr = str.toCharArray();
}
public int length()
{
return len;
}
public char charAt(int idx)
{
return arr[ idx + offset];
}
public int compareTo(MyString other)
{
int myEnd = offset + len;
int yourEnd = other.offset + other.len;
int i = offset, j = other.offset;
for (; i < myEnd && j < yourEnd; i++, j++)
{
if (arr[ i] != other.arr[ j])
{
return arr[ i] - other.arr[ j];
}
}
// reached end. Who got there first?
if (i == myEnd && j == yourEnd)
{
return 0; // identical strings
}
if (i == myEnd)
{
return -1;
} else
{
return +1;
}
}
public MyString substring(int beginIndex, int endIndex)
{
return new MyString(arr, beginIndex + offset, endIndex - beginIndex);
}
public MyString substring(int beginIndex)
{
return substring(beginIndex, len);
}
public boolean equals(Object other)
{
return (other instanceof MyString) && compareTo((MyString) other) == 0;
}
public String toString()
{
return new String(arr, offset, len);
}
private MyString(char[] a, int of, int ln)
{
arr = a;
offset = of;
len = ln;
}
private char[] arr;
private int offset;
private int len;
}

Here:
for (int i = 0; i < N; i++)
{
suffixes[i] = snew.substring(i);
}
You are trying to store not only the entire long string, but also the entire string minus 1 letter, the entire string minus 2 letters, and so on. All of these are stored separately.
If your String were only 10 letters long, you would be storing a total of 55 characters in 10 different strings.
At 1000 characters, you are storing 500500 characters in total.
More generally, you have to handle length*(length+1)/2 characters.
Just for fun: I don't know how many characters are in War and Peace, but a page count of around 1250, a typical estimate of 250 words per page, and an average word length of about 5 characters give:
(1250 * 250 * 5)*(1250 * 250 * 5 + 1)/2 = 1.2207039 * 10^12 characters.
With a char taking 2 bytes in memory, you're looking at about 2.22 TiB (compared to about 1.49 MiB for just the text of the novel at one byte per character).
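The arithmetic above is easy to check in code. This small sketch (the class name `SuffixCopyCost` is made up for illustration) evaluates the n*(n+1)/2 formula for the three sizes mentioned, on the assumption that every suffix is held as its own copy:

```java
class SuffixCopyCost {
    // Total characters held if every suffix of an n-character string is stored
    // as a separate copy: n + (n - 1) + ... + 1 = n * (n + 1) / 2.
    static long totalChars(long n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        long n = 1250L * 250 * 5;      // rough character estimate from the answer
        long chars = totalChars(n);
        long bytes = chars * 2;        // a Java char occupies 2 bytes
        System.out.println(totalChars(10));   // 55
        System.out.println(totalChars(1000)); // 500500
        System.out.println(chars);            // 1220703906250
        System.out.printf("%.2f TiB%n", bytes / Math.pow(1024, 4));
    }
}
```

The `long` type matters here: the product overflows a 32-bit `int` well before the size of a novel is reached.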

I count at least 3 copies of both texts in the first few lines of the code. Here's a few ideas
convert the spaces as you read each line in--not after they are huge strings. Don't forget the case of spaces at the front and end of lines.
build your MyString class using StringBuilder as the base instead of String. Do all the looking inside the StringBuilder with its native methods, if you can.
don't extract strings any more than you have to.
Look up the -Xmx Java runtime option and set the heap space larger than the default. Note that the size follows the flag directly, with no equals sign, and needs a unit suffix, for example -Xmx1024m. (Look at the file sizes to see how big the two books are.)

When you construct MyString, you call arr = str.toCharArray(); which makes a new copy of the string's character data. But in Java, a string is immutable - so why not store a reference to the string instead of a copy of its data?
You construct every suffix at once, but you only refer to one (well, two) at a time. If you recode your solution to only reference the suffixes it currently cares about, and construct them only when it needs them (and lose a reference to them afterwards), they can be garbage collected by Java. This will make running out of memory less likely. Compare the memory overhead of storing 2 strings to storing hundreds of thousands of strings :)
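One way to act on the "store a reference, not a copy" idea is to sort suffix start offsets rather than suffix objects, so the text exists exactly once. The sketch below is an assumption-laden illustration (the class `OffsetSuffixArray` and its methods are hypothetical, not the asker's API), and the boxed `Integer[]` still costs memory per entry, so a primitive `int[]` with a custom sort would be leaner for very large texts.

```java
import java.util.Arrays;

class OffsetSuffixArray {
    // Returns the start offsets of all suffixes of text in lexicographic order.
    // Only offsets are stored; comparisons read characters from the one text.
    static Integer[] sortedSuffixOffsets(String text) {
        Integer[] offsets = new Integer[text.length()];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = i;
        }
        Arrays.sort(offsets, (a, b) -> compareSuffixes(text, a, b));
        return offsets;
    }

    // Compares the suffixes starting at a and b character by character.
    static int compareSuffixes(String text, int a, int b) {
        int n = text.length();
        while (a < n && b < n) {
            char ca = text.charAt(a);
            char cb = text.charAt(b);
            if (ca != cb) {
                return ca - cb;
            }
            a++;
            b++;
        }
        // One suffix is a prefix of the other: the one that ran out first is smaller.
        return b - a;
    }

    // Length of the longest common prefix of the suffixes starting at a and b.
    static int lcp(String text, int a, int b) {
        int len = 0;
        while (a + len < text.length() && b + len < text.length()
                && text.charAt(a + len) == text.charAt(b + len)) {
            len++;
        }
        return len;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedSuffixOffsets("banana"))); // [5, 3, 1, 0, 4, 2]
    }
}
```

Comparison-based sorting of suffixes can still be slow on texts with long repeated passages, because each comparison may scan many characters.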

I wrote this program in Scala. Maybe you can translate it to Java.
class MyString private (private val string: String, startIndex: Int, endIndex: Int) extends Comparable[MyString] {
def this(string: String) = this(string, 0, string.length)
def length() = endIndex-startIndex
def charAt(i: Int) = {
if(i >= length) throw new IndexOutOfBoundsException
string.charAt(startIndex + i)
}
def substring(start: Int, end: Int): MyString = {
if(start < 0 || end > length || end < start) throw new IndexOutOfBoundsException
new MyString(string, startIndex + start, startIndex + end)
}
def substring(start: Int): MyString = substring(start, length)
def longestCommonSubstring(other: MyString): MyString = {
var index = 0
val len = math.min(length, other.length)
while(index < len && charAt(index) == other.charAt(index)) index += 1
substring(0, index)
}
def compareTo(other: MyString): Int = {
val len = math.min(length, other.length)
for(i <- 0 until len) {
if(charAt(i) > other.charAt(i)) return 1
if(charAt(i) < other.charAt(i)) return -1
}
length-other.length
}
def >(other: MyString) = compareTo(other) > 0
def <(other: MyString) = compareTo(other) < 0
override def equals(other: Any) = other.isInstanceOf[MyString] && compareTo(other.asInstanceOf[MyString]) == 0
override def toString() = "\"" + string.substring(startIndex, endIndex) + "\""
}
def readFile(name: String) = new MyString(io.Source.fromFile(name).getLines.mkString(" ").replaceAll("\\s+", " "))
def makeList(str: MyString) = (0 until str.length).map(i => str.substring(i)).toIndexedSeq
val string1 = readFile("WarAndPeace.txt")
val string2 = readFile("MobyDick.txt")
val (list1, list2) = (makeList(string1).sorted, makeList(string2).sorted)
var longestMatch = new MyString("")
var (index1, index2) = (0,0)
while(index1 < list1.size && index2 < list2.size) {
val lcs = list1(index1).longestCommonSubstring(list2(index2))
if(lcs.length > longestMatch.length) longestMatch = lcs
if(list1(index1) < list2(index2)) index1 += 1
else index2 += 1
}
println(longestMatch)

Generating the lexicographically greatest string

The question is to generate the lexicographically greatest string given some string s.
So the aim is to find the lexicographically greatest subsequence s1 of s in which every character is unique (no repetitions).
We say that some subsequence s1 is greater than another subsequence s2 if s1 has more characters than s2 or s1 is lexicographically greater than s2 if equal length.
I/O are as follows:
Input is: babab
output is: ba
Second input is: nlhthgrfdnnlprjtecpdrthigjoqdejsfkasoctjijaoebqlrgaiakfsbljmpibkidjsrtkgrdnqsknbarpabgokbsrfhmeklrle
Second output is:
tsocrpkijgdqnbafhmle
This is what I wrote for my java code but my code fails on the second test case. Also I'm having a hard time understanding why second output isn't tsrqponmlkjihgfedcba.
Can somebody provide suggestions for a fix or even java code?
I think the algorithm has to be more efficient than generating all possible unique strings, sort them and find lexicographically largest one.
To make the question much clearer, if the input is babab, then all the possible unique combinations would be b, a, ba, ab. And the output will be ba because it's the longest and lexicographically greater than ab.
Note: this is not a homework assignment.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
public class mostBeautiful {
    final static int MAX = 1000000;
    static String[] permute;
    static void permutation(String prefix, String str, int counter) {
        int n = str.length();
        //System.out.println("n is: "+ n);
        if (n == 0) {
            permute[counter] = prefix;
        } else {
            for (int i = 0; i < n; i++) {
                //System.out.println("str is: "+ str);
                permutation(prefix + str.charAt(i), str.substring(0, i) + str.substring(i+1, n), counter++);
            }
        }
    }
    public static void main(String[] args) throws IOException {
        BufferedReader bf = new BufferedReader(new InputStreamReader(System.in));
        String s = bf.readLine();
        char[] unique = new char[26];
        int counter = 0;
        String answer = "";
        //System.out.println("s is: " + s);
        int ascii = 0;
        final int asciiAVal = 97;
        final int asciiZVal = 122;
        for (int i = 0; i < s.length(); i++) {
            ascii = (int)s.charAt(i);
            if (ascii < asciiAVal || ascii > asciiZVal) {
                continue;
            }
            char ch = s.charAt(i);
            unique[ch - 'a'] = ch;
        }
        String result = "";
        for (int j = 25; j >= 0; j--) {
            result += unique[j];
        }
        result = result.trim();
        System.out.println(result);
        int size = result.length() * (result.length() - 1);
        permute = new String[size];
        permutation("", result, counter);
        for (int i = 1; i < size; i++) {
            if (permute[i].compareTo(permute[i - 1]) > 0){
                answer = permute[i];
            } else {
                answer = permute[i - 1];
            }
        }
        System.out.println("answer is: " + answer);
    }
}

After thinking about this problem in many ways, I have determined a divide-and-conquer algorithm that gets the results right:
Algorithm - Pseudocode
Given an input string S, defined as the concatenation of two substrings A + B, we compute the lexicographically greatest string recursively as:
LexMax(S) = Merge(LexMax(A),LexMax(B))
Where
LexMax(S)
{
    if Length(S) = 1
        return S
    else
    {
        LMA = LexMax(S[0:Length/2])
        LMB = LexMax(S[Length/2:end])
        return Merge(LMA,LMB)
    }
}
Merge(A,B)
{
    Sa = A
    Sb = B
    for n = 0:Length(A)
    {
        if Sb contains A[n]
        {
            if A[n+1:end] contains character > A[n]
                Remove A[n] from Sa
            else
                Remove A[n] from Sb
        }
    }
    return Sa + Sb
}
Java Code
Coming soon!
Example
Given an input string
cefcfdabbcfed
Divide it into
cefcfda
bbcfed
Assuming the function works we have:
LexMax("cefcfda") = "efcda"
LexMax("bbcfed") = "bcfed"
Merging works as follows:
e: efcda bcfed
In both substrings, greater value found to right of e in left substring, remove from left
f: fcda bcfed
In both substrings, no greater value in left substring, remove from right
c: fcda bced
In both substrings, greater value found to right of c in left substring, remove from left
d: fda bced
In both substrings, no greater value in left substring, remove from right
a: fda bce
Not in both substrings, do nothing
Final result:
LexMax(cefcfdabbcfed) = fdabce
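For anyone who wants to experiment with it, the LexMax/Merge pseudocode above can be transcribed into Java roughly as follows. This is only a sketch: the class name DivideAndConquerLexMax and the helper hasGreaterAfter are made up for illustration, and the split point uses integer division, so the halves can differ from the hand-made split in the example. It reproduces the merge step shown above, but the merge rule should not be taken as a verified general solution. For the input "baca" this transcription returns "bac", although "bca" is also a valid subsequence and is greater.

```java
final class DivideAndConquerLexMax {

    // LexMax(S): split S in half, solve each half, then merge the two results.
    static String lexMax(String s) {
        if (s.length() <= 1) {
            return s;
        }
        int mid = s.length() / 2;
        return merge(lexMax(s.substring(0, mid)), lexMax(s.substring(mid)));
    }

    // Merge(A, B): for every character of A that also occurs in B,
    // drop it from the left copy if a greater character follows it in A,
    // otherwise drop it from the right copy.
    static String merge(String a, String b) {
        StringBuilder sa = new StringBuilder(a);
        StringBuilder sb = new StringBuilder(b);
        for (int n = 0; n < a.length(); n++) {
            String c = String.valueOf(a.charAt(n));
            int posInB = sb.indexOf(c);
            if (posInB < 0) {
                continue; // not in both substrings, do nothing
            }
            if (hasGreaterAfter(a, n)) {
                sa.deleteCharAt(sa.indexOf(c));
            } else {
                sb.deleteCharAt(posInB);
            }
        }
        return sa.toString() + sb;
    }

    // Hypothetical helper: does A[n+1:end] contain a character greater than A[n]?
    static boolean hasGreaterAfter(String a, int n) {
        for (int i = n + 1; i < a.length(); i++) {
            if (a.charAt(i) > a.charAt(n)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(merge("efcda", "bcfed")); // the merge step from the example
        System.out.println(lexMax("babab"));
    }
}
```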

This is not a direct answer, but doesn't this code meet the requirement as you explained it in the discussion above?
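// Needs java.util.Collections, java.util.SortedSet and java.util.TreeSet.
final String x = "saontehusanoethusnaoteusnaoetuh";
// A TreeSet with reversed ordering keeps each character once, sorted from greatest to smallest.
final SortedSet<Character> chars =
    new TreeSet<Character>(Collections.reverseOrder());
for (char c : x.toCharArray()) chars.add(c);
System.out.println(chars);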

Lexicographic order arranges words the way a dictionary does, comparing them letter by letter from the first position. It is also known as dictionary order or alphabetical order. For example, "Africa" is smaller than "Bangladesh", and "He" is smaller than "he" (in Java's String comparison, uppercase letters sort before lowercase ones).
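import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;
public class LexicographicExample {
    public static void main(String a[]) {
        Scanner sc = new Scanner(System.in);
        System.out.println("Enter the String:-");
        String str = sc.nextLine();
        System.out.println("Enter the length");
        int count = sc.nextInt();
        // Without this check the list below would be empty and list.get(0) would throw.
        if (count <= 0 || count > str.length()) {
            System.out.println("Length must be between 1 and " + str.length());
            return;
        }
        // Collect every substring of exactly count characters.
        List<String> list = new ArrayList<String>();
        for (int i = 0; i + count <= str.length(); i++) {
            list.add(str.substring(i, count + i));
        }
        // Natural String ordering is lexicographic, so the extremes sit at both ends of the sorted list.
        Collections.sort(list);
        System.out.println("Smallest subString:-" + list.get(0));
        System.out.println("Largest subString:-" + list.get(list.size() - 1));
    }
}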
For reference, see http://techno-terminal.blogspot.in/2015/09/java-program-to-find-lexicographically.html

"tsrqponmlkjihgfedcba" is not the answer because it is not a subsequence of the input. The definition of subsequence requires that the characters of the subsequence occur in the original sequence in the same order. For example, "abc" is a subsequence of "apbqcr", while "cba" is not.
As to the solution, I think a simple greedy algorithm would suffice. First, one has to understand that the maximum possible length of the output is the number of unique symbols (say, N) in the input. Since any output shorter than that would not be the greatest one, it has to be exactly N symbols long. The rest of the procedure is simple and at most quadratic in time complexity: one has to go through the input string and at each step pick the lexicographically highest unused symbol such that the part of the string to the right of it still contains all the other "unused" symbols.
As an example, consider the string "bacb". The first symbol can be 'a' or 'b', since in both cases the remainder contains both of the other letters. 'b' is greater, so we pick it. Now for "acb" we can only pick 'a' and then 'c' according to that condition, so we end up with "bac" as the output.
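Here is a minimal Java sketch of the greedy procedure described above, together with a small subsequence check matching the definition given earlier in this answer. The class and method names (GreedyLexMax, lexMax, isSubsequence) are made up for illustration. It assumes that a pick is safe when it lies at or before the smallest "last occurrence" index among the still unused symbols, which is one way to guarantee that everything unused remains available to the right of the pick.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class GreedyLexMax {

    // True if the characters of candidate occur in source in the same order.
    static boolean isSubsequence(String candidate, String source) {
        int matched = 0;
        for (int i = 0; i < source.length() && matched < candidate.length(); i++) {
            if (source.charAt(i) == candidate.charAt(matched)) {
                matched++;
            }
        }
        return matched == candidate.length();
    }

    static String lexMax(String s) {
        // Last index of every distinct character; entries are removed once used.
        Map<Character, Integer> lastIndex = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            lastIndex.put(s.charAt(i), i);
        }

        StringBuilder result = new StringBuilder();
        int start = 0;
        while (!lastIndex.isEmpty()) {
            // Picking at or before this index leaves every other unused
            // character somewhere to the right of the pick.
            int limit = Collections.min(lastIndex.values());

            // Greatest unused character in s[start..limit], leftmost on ties.
            int best = -1;
            for (int i = start; i <= limit; i++) {
                char c = s.charAt(i);
                if (lastIndex.containsKey(c) && (best < 0 || c > s.charAt(best))) {
                    best = i;
                }
            }
            char chosen = s.charAt(best);
            result.append(chosen);
            lastIndex.remove(chosen);
            start = best + 1;
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(lexMax("babab")); // ba
        System.out.println(lexMax("bacb"));  // bac
        System.out.println(isSubsequence("abc", "apbqcr")); // true
        System.out.println(isSubsequence("cba", "apbqcr")); // false
    }
}
```

Each of the N steps scans a window of the input plus the map of unused symbols, so this stays within the quadratic bound mentioned above.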

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Scanner;
class aaa {
    public static void main(String args[]) throws Exception {
        Scanner scan = new Scanner(System.in);
        // int n = scan.nextInt();
        String s = scan.next();
        // For each character, record every index where it occurs (nl) and its last index (li).
        HashMap<Character, Node5> map = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            if (!map.containsKey(s.charAt(i))) {
                Node5 node = new Node5();
                node.nl.add(i);
                node.li = i;
                map.put(s.charAt(i), node);
            } else {
                Node5 rn = map.get(s.charAt(i));
                rn.nl.add(i);
                rn.li = i;
                map.put(s.charAt(i), rn);
            }
        }
        // Walk the alphabet from 'z' down to 'a' and append every occurrence that lies
        // after the last chosen index. A letter can be appended more than once, so the
        // output may contain repeated characters (for "babab" this prints "bbb").
        String s1 = "";
        int index = -1;
        for (int i = 25; i >= 0; i--) {
            if (map.containsKey((char) (97 + i))) {
                if (map.get((char) (97 + i)).li > index) {
                    for (int j = 0; j < map.get((char) (97 + i)).nl.size(); j++) {
                        if (map.get((char) (97 + i)).nl.get(j) > index) {
                            s1 += (char) (97 + i);
                            index = map.get((char) (97 + i)).nl.get(j);
                        }
                    }
                }
            }
        }
        System.out.println(s1);
        scan.close();
    }
}
// Occurrence data for one character.
class Node5 {
    int li; // last index at which the character occurs
    ArrayList<Integer> nl; // all indices at which the character occurs, in increasing order
    public Node5() {
        this.nl = new ArrayList<>();
    }
}
