How do I find the decomposition of a string?

How do I find the decomposition of a string? - java

I need to create an algorithm for String decomposition.
Some examples:
ABCABCDEDEDEF --> ABC*2+DE*3+F
ABCcABCczcz --> ABC*2+cz*2+c
test --> test
Each segment of the string should be seperated by a + and, if repeated, followed up by a * plus the number of times it appears in succession.
This is what I have tried:
private static int[] prefixFunction(String source) {
int n = source.length();
int[] pi = new int[n];
for (int i = 1; i < n; i++) {
int j = pi[i - 1];
while (j > 0 && source.charAt(i) != source.charAt(j))
j = pi[j - 1];
if (source.charAt(i) == source.charAt(j))
j++;
pi[i] = j;
}
return pi;
}

This solution keeps everything in order, meaning an input like ABCABCDEDEDEF will return ABC*2+DE*3+F or an input like abDEDEab will return ab+DE*2+ab.
If you don't keep the order, it will be impossible to reconstruct the String later with 100 % accuracy.
public static void main(String[] args) {
String input = "ABCABCDEDEDEF";
String output = findDecomposition(input);
System.out.println("Output: " + output);
}
public static String findDecomposition(String input) {
String substring = input;
StringBuilder builder = new StringBuilder();
for (int start = 0, count = 1; start < input.length(); start++, count = 1) {
for (int end = start + 1; end < input.length(); end++) {
substring = input.substring(start, end);
while (true) {
String next = input.substring(start + substring.length(), Math.min(end + substring.length(), input.length()));
if (next.equals(substring)) {
count++;
start += substring.length();
end += substring.length();
} else
break;
}
if (count > 1) {
start += substring.length() - 1;
break;
}
}
if (count > 1) {
if (builder.length() > 0 && builder.charAt(builder.length() - 1) != '+')
builder.append('+');
builder.append(substring + "*" + count + "+");
} else
builder.append(input.charAt(start));
}
String result = builder.toString();
if (result.endsWith("+"))
return result.substring(0, result.length() - 1);
else
return result;
}

THe brute force alghoritm can work as follows.
Prerequisities:
First letter is set as root
Data structure of each possible solution is linked list. Value of each node is text to be written.
When outputting solution, first put to Map all text values together with number of appereances. If it appears more than once, use * as multiplicator
Example: One of the solution looks like this ABC-C-ABC, the output will be ABC*2+C
Solution:
Take next letter from input
New solutions are based on existing solutions. Each new solution is old solution + new letter added in one of the existing nodes or as single letter in new node.
Save new solutions as existing solutions.
Repeat from 1 until you process all letters
Calculate value of all solutions and select one with lowest string characters
I added example, as you can see the number of solutions are increasing quickly so it is not fully finished for all 6 letters. Each step represent the cycle from 1. to 4., you can see that in each step the previous solutions are used as base for new solutions. There are multiple new solutions created for each existing solution.

This code returns the following compositions:
ABCABCDEDEDEF -> ABC*2+DE*3+F
ABCcABCczcz -> ABCc*2+zcz
cefABCcABCczcz -> cef+ABCc*2+zcz
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
public class Decomposition {
public static void main(String[] args) {
Decomposition d = new Decomposition("ABCABCDEDEDEF");
System.out.println(d.getOptimalDecomposition());// Output: ABC*2+DE*3+F
d = new Decomposition("ABCcABCczcz");
System.out.println(d.getOptimalDecomposition());// Output: ABCc*2+zcz
d = new Decomposition("cefABCcABCczcz");
System.out.println(d.getOptimalDecomposition());// Output: cef+ABCc*2+zcz
}
private List> decompositions;
private String toDecompose;
public Decomposition(String toDecompose) {
decompositions = new ArrayList();
this.toDecompose = toDecompose;
}
public String getOptimalDecomposition() {
decompose(0, new ArrayList());
return calculateOptimal(convertToPartsMap());
}
private String calculateOptimal(List> partsCount) {
Collections.sort(partsCount, new SortDecompositions());
StringBuilder optimal = new StringBuilder();
for (int i = 0; i 1) {
optimal.append("*");
optimal.append(pc.count);
}
if (i != partsCount.get(0).size() - 1) {
optimal.append("+");
}
}
return optimal.toString();
}
private List> convertToPartsMap() {
List> partsMap = new ArrayList();
for (List parts : decompositions) {
List partsCount = new ArrayList();
String lastPart = null;
int curCount = 0;
for (int i = 0; i parts) {
if (nextChar == toDecompose.length()) {
decompositions.add(parts);
return;
}
char toAdd = toDecompose.charAt(nextChar);
if (parts.isEmpty()) {
parts.add("" + toAdd);
decompose(nextChar + 1, parts);
} else {
// left
List leftParts = parts.stream().collect(Collectors.toList());// shallow copy
if (!leftParts.isEmpty()) {
int last = leftParts.size() - 1;
leftParts.set(last, leftParts.get(last) + toAdd);
} else {
leftParts.add("" + toAdd);
}
// right
List rightParts = parts.stream().collect(Collectors.toList());// shallow copy
rightParts.add("" + toAdd);
decompose(nextChar + 1, leftParts);
decompose(nextChar + 1, rightParts);
}
}
}
class PartCount {
String part;
int count;
public PartCount(String part, int count) {
this.part = part;
this.count = count;
}
#Override
public String toString() {
return "[" + part + ", " + count + "]";
}
}
class SortDecompositions implements Comparator> {
public int compare(List a, List b) {
// Here you can define what exactly means "taking up least space".
return countChars(a) - countChars(b);
}
private int countChars(List listPc) {
int count = 0;
for (PartCount pc : listPc) {
count += pc.part.length();
}
return count;
}
}

This can be solved by using KMP alogorthm longest prefix which is also suffix
Steps:
iterate the string "ABCABCDEDEDEF" and construct lps array for the string. The values in the array will be
0 0 0 1 2 3 0 0 0 0 0 0 0
This lps array gives the number of times the prefix is repeated in the string.
In the above case it is repeated only one time. Considering the actual prefix number of times will be 2 it becomes ABC*2
Take the substring of the remaining string and repeat the step 1 till the end of the string.
I can provide you the code if needed. The worst time complexity will be O(n2)

Related

sub arraylist's size isn't correct

After hard searchig I still haven't found the proper answer for my question and there is it:
I have to write a java program that enters an array of strings and finds in it the largest sequence of equal elements. If several sequences have the same longest length, the program should print the leftmost of them. The input strings are given as a single line, separated by a space.
For example:
if the input is: "hi yes yes yes bye",
the output should be: "yes yes yes".
And there is my source code:
public static void main(String[] args) {
System.out.println("Please enter a sequence of strings separated by spaces:");
Scanner inputStringScanner = new Scanner(System.in);
String[] strings = inputStringScanner.nextLine().split(" ");
System.out.println(String.join(" ", strings));
ArrayList<ArrayList<String>> stringsSequencesCollection = new ArrayList<ArrayList<String>>();
ArrayList<String> stringsSequences = new ArrayList<String>();
stringsSequences.add(strings[0]);
for (int i = 1; i < strings.length; i++) {
if(strings[i].equals(strings[i - 1])) {
stringsSequences.add(strings[i]);
} else {
System.out.println(stringsSequences + " " + stringsSequences.size());
stringsSequencesCollection.add(stringsSequences);
stringsSequences.clear();
stringsSequences.add(strings[i]);
//ystem.out.println("\n" + stringsSequences);
}
if(i == strings.length - 1) {
stringsSequencesCollection.add(stringsSequences);
stringsSequences.clear();
System.out.println(stringsSequences + " " + stringsSequences.size());
}
}
System.out.println(stringsSequencesCollection.size());
System.out.println(stringsSequencesCollection.get(2).size());
System.out.println();
int maximalStringSequence = Integer.MIN_VALUE;
int index = 0;
ArrayList<String> currentStringSequence = new ArrayList<String>();
for (int i = 0; i < stringsSequencesCollection.size(); i++) {
currentStringSequence = stringsSequencesCollection.get(i);
System.out.println(stringsSequencesCollection.get(i).size());
if (stringsSequencesCollection.get(i).size() > maximalStringSequence) {
maximalStringSequence = stringsSequencesCollection.get(i).size();
index = i;
//System.out.println("\n" + index);
}
}
System.out.println(String.join(" ", stringsSequencesCollection.get(index)));
I think it should be work correct but there is a problem - the sub array list's count isn't correct: All the sub arrayList's size is 1 and for this reason the output is not correct. I don't understand what is the reason for this. If anybody can help me to fix the code I will be gratefull!

I think it is fairly straight forward just keep track of a max sequence length as you go through the array building sequences.
String input = "hi yes yes yes bye";
String sa[] = input.split(" ");
int maxseqlen = 1;
String last_sample = sa[0];
String longest_seq = last_sample;
int seqlen = 1;
String seq = last_sample;
for (int i = 1; i < sa.length; i++) {
String sample = sa[i];
if (sample.equals(last_sample)) {
seqlen++;
seq += " " + sample;
if (seqlen > maxseqlen) {
longest_seq = seq;
maxseqlen = seqlen;
}
} else {
seqlen = 1;
seq = sample;
}
last_sample = sample;
}
System.out.println("longest_seq = " + longest_seq);

Lots of issues.
First of all, when dealing with the last string of the list you are not actually printing it before clearing it. Should be:
if(i == strings.length - 1)
//...
System.out.println(stringsSequences + " " + stringsSequences.size());
stringsSequences.clear();
This is the error in the output.
Secondly, and most importantly, when you do stringsSequencesCollection.add you are adding an OBJECT, i.e. a reference to the collection. When after you do stringsSequences.clear(), you empty the collection you just added too (this is because it's not making a copy, but keeping a reference!). You can verify this by printing stringsSequencesCollection after the first loop finishes: it will contain 3 empty lists.
So how do we do this? First of all, we need a more appropriate data structure. We are going to use a Map that, for each string, contains the length of its longest sequence. Since we want to manage ties too, we'll also have another map that for each string stores the leftmost ending position of the longest sequence:
Map<String, Integer> lengths= new HashMap<>();
Map<String, Integer> indexes= new HashMap<>();
String[] split = input.split(" ");
lengths.put(split[0], 1);
indexes.put(split[0], 0);
int currentLength = 1;
int maxLength = 1;
for (int i = 1; i<split.length; i++) {
String s = split[i];
if (s.equals(split[i-1])) {
currentLength++;
}
else {
currentLength = 1;
}
int oldLength = lengths.getOrDefault(s, 0);
if (currentLength > oldLength) {
lengths.put(s, currentLength);
indexes.put(s, i);
}
maxLength = Math.max(maxLength, currentLength);
}
//At this point, youll have in lengths a map from string -> maxSeqLengt, and in indexes a map from string -> indexes for the leftmost ending index of the longest sequence. Now we need to reason on those!
Now we can just scan for the strings with the longest sequences:
//Find all strings with equal maximal length sequences
Set<String> longestStrings = new HashSet<>();
for (Map.Entry<String, Integer> e: lengths.entrySet()) {
if (e.value == maxLength) {
longestStrings.add(e.key);
}
}
//Of those, search the one with minimal index
int minIndex = input.length();
String bestString = null;
for (String s: longestStrings) {
int index = indexes.get(s);
if (index < minIndex) {
bestString = s;
}
}
System.out.println(bestString);

Below code results in output as you expected:
public static void main(String[] args) {
System.out.println("Please enter a sequence of strings separated by spaces:");
Scanner inputStringScanner = new Scanner(System.in);
String[] strings = inputStringScanner.nextLine().split(" ");
System.out.println(String.join(" ", strings));
List <ArrayList<String>> stringsSequencesCollection = new ArrayList<ArrayList<String>>();
List <String> stringsSequences = new ArrayList<String>();
//stringsSequences.add(strings[0]);
boolean flag = false;
for (int i = 1; i < strings.length; i++) {
if(strings[i].equals(strings[i - 1])) {
if(flag == false){
stringsSequences.add(strings[i]);
flag= true;
}
stringsSequences.add(strings[i]);
}
}
int maximalStringSequence = Integer.MIN_VALUE;
int index = 0;
List <String> currentStringSequence = new ArrayList<String>();
for (int i = 0; i < stringsSequencesCollection.size(); i++) {
currentStringSequence = stringsSequencesCollection.get(i);
System.out.println(stringsSequencesCollection.get(i).size());
if (stringsSequencesCollection.get(i).size() > maximalStringSequence) {
maximalStringSequence = stringsSequencesCollection.get(i).size();
index = i;
//System.out.println("\n" + index);
}
}
System.out.println(stringsSequences.toString());

Java: Find the longest sequential same character array

I am a new guy to java. I want to find the longest sequential same character array in a input character arrays. For example,this character array bddfDDDffkl, the longest is DDD， and this one: rttttDDddjkl, the longest is tttt.
I use the following code to deal with this problem. But, I want to improve my code, For example, if there are two same length arrays (for example rtttgHHH, there are two longest: ttt and HHH), how to solve this problem?
Thanks in advance.
My following code:
public class SeqSameChar {
public static void main (String[] args) {
int subLength = 0;
Scanner sc = new Scanner(System.in);
String[] num = null;
num = sc.nextLine().split(" ");
String[] number = new String[num.length];
for(int i = 0; i< number.length;i++) {
number[i] = String.valueOf(num[i]);
}
subLength =length(number,num.length);
System.out.println(subLength);
for(int i = index; i < index+subLength; i++) {
System.out.print(number[i]);
}
System.out.println(c==c1);
}
public static int index;
//to calculate the longest contiguous increasing sequence
public static int length(String[] A,int size){
if(size<=0)return 0;
int res=1;
int current=1;
for(int i=1;i<size;i++){
if(A[i].equals(A[i-1])){
current++;
}
else{
if(current>res){
index=i-current;
res=current;
}
current=1;
}
}
return res;
}
}

This algorithm will work perfectly fine for what you want to develop:
Before that, let me make it clear that if you want to check repeatitions of 2 different characters same number of times, you have to run a for loop in reverse to identify the 2nd character. So if the 2nd character is not same as the first one identified, and also if it's number of repeatitions are the same, you print both the characters or else, just print the single character you find at the first for loop because both the characters are going to be same.
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
System.out.println("Enter String 1: ");
String A1 = sc.nextLine();
MaxRepeat(A1);
}
public static void MaxRepeat(String A) {
int count = 1;
int max1 = 1;
char mostrepeated1 = ' ';
for(int i = 0; i < A.length()-1;i++) {
char number = A.charAt(i);
if(number == A.charAt(i+1)) {
count++;
if(count>max1) {
max1 = count;
mostrepeated1 = number;
}
continue;
}
count = 1;
}
count = 1;
int max2 = 1;
char mostrepeated2 = ' ';
for(int i = A.length()-1; i>0; i--) {
char number = A.charAt(i);
if(number == A.charAt(i-1)) {
count++;
if(count>max2) {
max2 = count;
mostrepeated2 = number;
}
continue;
}
count = 1;
}
if((max1==max2) && (mostrepeated1==mostrepeated2)) {
System.out.println("Most Consecutively repeated character is: " + mostrepeated1 + " and is repeated " + max1 + " times.");
}
else if((max1==max2) && (mostrepeated1!=mostrepeated2)) {
System.out.println("Most continously repeated characters are: " + mostrepeated1 + " and " + mostrepeated2 + " and they are repeated " + max1 + " times");
}
}

I'll give you a Scala implementation for that problem.
Here it is the automatic test (in BDD style with ScalaTest)
import org.scalatest._
class RichStringSpec extends FlatSpec with MustMatchers {
"A rich string" should "find the longest run of consecutive characters" in {
import Example._
"abceedd".longestRun mustBe Set("ee", "dd")
"aeebceeedd".longestRun mustBe Set("eee")
"aaaaaaa".longestRun mustBe Set("aaaaaaa")
"abcdefgh".longestRun mustBe empty
}
}
Following is the imperative style implementation, with nested loops and mutable variables as you would normally choose to do in Java or C++:
object Example {
implicit class RichString(string: String) {
def longestRun: Set[String] = {
val chunks = mutable.Set.empty[String]
val ilen = string.length
var gmax = 0
for ((ch, curr) <- string.zipWithIndex) {
val chunk = mutable.ListBuffer(ch)
var next = curr + 1
while (next < ilen && string(next) == ch) {
chunk += string(next)
next = next + 1
}
gmax = chunk.length max gmax
if (gmax > 1) chunks += chunk.mkString
}
chunks.toSet.filter( _.length == gmax )
}
}
}
Following is a functional-style implementation, hence no variables, no loops but tail recursion with result accumulators and pattern matching to compare each character with the next one (Crazy! Isn't it?):
object Example {
implicit class RichString(string: String) {
def longestRun: Set[String] = {
def recurse(chars: String, chunk: mutable.ListBuffer[Char], chunks: mutable.Set[String]): Set[String] = {
chars.toList match {
case List(x, y, _*) if (x == y) =>
recurse(
chars.tail,
if (chunk.isEmpty) chunk ++= List(x, y) else chunk += y,
chunks
)
case Nil =>
// terminate recursion
chunks.toSet
case _ => // x != y
recurse(
chars.tail,
chunk = mutable.ListBuffer(),
chunks += chunk.mkString
)
}
}
val chunks = recurse(string, mutable.ListBuffer(), mutable.Set.empty[String])
val max = chunks.map(_.length).max
if (max > 0) chunks.filter( _.length == max ) else Set()
}
}
}
For example, for the given "aeebceeedd" string, both implementations above will build the following set of chunks (repeating characters)
Set("ee", "eee", "dd")
and they will filter those chunks having the maximum length (resulting "eee").

Get all possible combinations of n booleans? [duplicate]

I tied to simplify the task as much as possible, so I could apply it to my algorithm.
And here is the challenge for mathematicians and programmers:
I need to create a method where I pass parameter int n:
public void optionality_generator(int n){
//some kind of loops, or recursions...to make it workable
System.out.println("current combination: ...");
}
The output should show all possible combinations of true's and false's.
Here is examples where N=1; N=2; N=3; N=4; N=5 where x=false and 0=true; Please note, empty break lines is just for you to recognise easier the patterns. Hopefully, I included all possible combinations):
Combination of 1:
0
x
Combination of 2:
00
x0
0x
xx
Combination of 3:
000
X00
0X0
00X
XX0
0XX
XXX
Combination of 4:
0000
X000
0X00
00X0
000X
XX00
X0X0
X00X
0XX0
0X0X
00XX
XXX0
XX0X
X0XX
0XXX
XXXX
Combination of 5:
00000
X0000
0X000
00X00
000X0
0000X
XX000
X0X00
X00X0
X000X
X0X00
X00X0
X000X
0XX00
0X0X0
0X00X
00XX0
00X0X
000XX
XXX00
XX0X0
XX00X
X0XX0
X0X0X
X00XX
0XXX0
0XX0X
00XXX
XXXX0
XXX0X
XX0XX
X0XXX
0XXXX
XXXXX
Also, If you see the output, here is the pattern I recognized, that all combinations are inverted on half (e.g first combination is 00000 last one will be XXXXX, second one X0000, one before the last one will be 0XXXX etc..). Maybe, this pattern will help to make the whole algorithm more efficient, not sure about this.
Thank you in advance!

Here is a really basic way using only Java APIs:
final int n = 3;
for (int i = 0; i < Math.pow(2, n); i++) {
String bin = Integer.toBinaryString(i);
while (bin.length() < n)
bin = "0" + bin;
System.out.println(bin);
}
Result:
000
001
010
011
100
101
110
111
Of course, you can set n to whatever you like. And, with this result, you can pick the nth character from the string as true/false.
If you only need to check if a bit is true, you don't need to convert it to a string. This is just to illustrate the output values.

Just a clue but think about the bits that are set for a number with at most 'n' bits. You'll see if you go from 0 to 'n' number of bits (3 in this case); the bits are 000, 001, 010, 011, 100, 101, 110, 111. You can figure out the max number that can fit in 'n' bits by using the ((n*n)-1) formula.

This should do the trick
int cols = 3;
int rows = (int) Math.pow(2, cols);
for (int row = 0; row < rows; row++)
System.out.println(String.format("%" + cols + "s",
Integer.toBinaryString(row)).replace(' ', '0').replace('1', 'X'));
out:
000
00X
0X0
0XX
X00
X0X
XX0
XXX

Using recursion is not as easy as using the Java Integer.toBinaryString() API for generating binary strings. But the code below gives you the flexibility to generate any base representation, e.g. base 3:
"000"
"001"
"002"
"010"
"011"
"012"
For base 2 (i.e. binary) strings, you call it like this:
getBinaryStrings(2, 3);
For base 3 strings, you call it like this:
getBinaryStrings(3, 3);
Here is the code:
public static List<String> getBinaryStrings(int base, int n){
ArrayList<String> result = new ArrayList<>();
getBinaryStringsCore(base, n, "", result);
return result;
}
private static void getBinaryStringsCore(int base, int n, String tempString, List<String> result){
if (tempString.length() == n) {
result.add(tempString);
return;
}
for (int i = 0; i < base; i++) {
tempString += i;
getBinaryStringsCore(base, n, tempString, result);
tempString = tempString.substring(0, tempString.length() - 1);
}
}

Here's a simple version implemented using recursion
public void optionality_generator(int n){
ArrayList<String> strings = generatorHelper(n);
for(String s : strings){
System.out.println(s);
}
}
private ArrayList<String> generatorHelper(int n){
if(n == 1){
ArrayList<String> returnVal = new ArrayList<String>();
returnVal.add("0");
returnVal.add("X");
return returnVal;
}
ArrayList<String> trueStrings = generatorHelper(n-1);
for(String s : trueStrings){
s += "0";
}
ArrayList<String> falseStrings = generatorHelper(n-1);
for(String s : falseStrings){
s += "X";
}
trueStrings.addAll(falseStrings);
return trueStrings;
}

Here's a test-driven version:
import static org.junit.Assert.assertEquals;
import java.util.ArrayList;
import java.util.List;
import org.junit.Test;
public class OptionalityTest {
#Test
public void testOptionality0() throws Exception {
assertEquals("[]", optionality(0).toString());
}
#Test
public void testOptionality1() throws Exception {
assertEquals("[0, x]", optionality(1).toString());
}
#Test
public void testOptionality2() throws Exception {
assertEquals("[00, x0, 0x, xx]", optionality(2).toString());
}
#Test
public void testOptionality3() throws Exception {
assertEquals("[000, x00, 0x0, xx0, 00x, x0x, 0xx, xxx]", optionality(3).toString());
}
private List<String> optionality(int i) {
final ArrayList<String> list = new ArrayList<String>();
if (i == 1) {
list.add("0");
list.add("x");
}
if (i > 1) {
List<String> sublist = optionality(i - 1);
for (String s : sublist) {
list.add("0" + s);
list.add("x" + s);
}
}
return list;
}
}

Here is a modification from Erics code above, that uses c# and allows input of any number of boolean variable names. It will output all possible combinations in c# code ready for insert into an if statement. Just edit the 1st line of code with var names, and then run in LINQpad to get a text output.
Output example...
!VariableNameA && !VariableNameB && !VariableNameC
!VariableNameA && !VariableNameB && VariableNameC
!VariableNameA && VariableNameB && !VariableNameC
!VariableNameA && VariableNameB && VariableNameC
VariableNameA && !VariableNameB && !VariableNameC
VariableNameA && !VariableNameB && VariableNameC
VariableNameA && VariableNameB && !VariableNameC
VariableNameA && VariableNameB && VariableNameC
//To setup edit var names below
string[] varNames = { "VariableNameA", "VariableNameB", "VariableNameC" };
int n = varNames.Count();
for (int i = 0; i < Math.Pow(2, n); i++) {
String bin = Convert.ToString(i, 2);
while (bin.Length < n) {
bin = "0" + bin;
}
string and = " && ";
string f = "!";
string t = " ";
var currentNot = bin[0] == '0' ? f : t;
//string visual = bin[0].ToString();
string visual = currentNot + varNames[0];
for (var j = 1; j < n; j++) {
currentNot = bin[j] == '0' ? f : t;
//visual = visual + and + bin[j].ToString();
visual = visual + and + currentNot + varNames[j];
}
Console.WriteLine(visual);
}

Longest Common Substring in a big text

i have this assignment for school which ask us to write code to find the longest common Substring. I have done that, but it only works with text that are not so big and it is being asked to find the common substring for Moby Dick and War And Peace. If you could point me in the right direction of what i'm doing wrong, i would appreciate it. The compiler is complaining that the error is in the substring method of the MyString class when i call it to create the SuffixArray but idk why its saying its too big, giving me the outofmemory
package datastructuresone;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;
class SuffixArray
{
private final MyString[] suffixes;
private final int N;
public SuffixArray(String s)
{
N = s.length();
MyString snew = new MyString(s);
suffixes = new MyString[N];
for (int i = 0; i < N; i++)
{
suffixes[i] = snew.substring(i);
}
Arrays.sort(suffixes);
}
public int length()
{
return N;
}
public int index(int i)
{
return N - suffixes[i].length();
}
public MyString select(int i)
{
return suffixes[i];
}
// length of longest common prefix of s and t
private static int lcp(MyString s, MyString t)
{
int N = Math.min(s.length(), t.length());
for (int i = 0; i < N; i++)
{
if (s.charAt(i) != t.charAt(i))
{
return i;
}
}
return N;
}
// longest common prefix of suffixes(i) and suffixes(i-1)
public int lcp(int i)
{
return lcp(suffixes[i], suffixes[i - 1]);
}
// longest common prefix of suffixes(i) and suffixes(j)
public int lcp(int i, int j)
{
return lcp(suffixes[i], suffixes[j]);
}
}
public class DataStructuresOne
{
public static void main(String[] args) throws FileNotFoundException
{
Scanner in1 = new Scanner(new File("./build/classes/WarAndPeace.txt"));
Scanner in2 = new Scanner(new File("./build/classes/MobyDick.txt"));
StringBuilder sb = new StringBuilder();
StringBuilder sb1 = new StringBuilder();
while (in1.hasNextLine())
{
sb.append(in1.nextLine());
}
while (in2.hasNextLine())
{
sb1.append(in2.nextLine());
}
String text1 = sb.toString().replaceAll("\\s+", " ");
String text2 = sb1.toString().replaceAll("\\s+", " ");
int N1 = text1.length();
int N2 = text2.length();
SuffixArray sa = new SuffixArray(text1 + "#" + text2);
int N = sa.length();
String substring = "";
for (int i = 1; i < N; i++)
{
// adjacent suffixes both from second text string
if (sa.select(i).length() <= N2 && sa.select(i - 1).length() <= N2)
{
continue;
}
// adjacent suffixes both from first text string
if (sa.select(i).length() > N2 + 1 && sa.select(i - 1).length() > N2 + 1)
{
continue;
}
// check if adjacent suffixes longer common substring
int length = sa.lcp(i);
if (length > substring.length())
{
substring = sa.select(i).toString().substring(0, length);
System.out.println(substring + " ");
}
}
System.out.println("The length of the substring " + substring.length() + "length on first N " + N1 + " length of Second N " + N2
+ "The length of the array sa: " + N);
System.out.println("'" + substring + "'");
final class MyString implements Comparable<MyString>
{
public MyString(String str)
{
offset = 0;
len = str.length();
arr = str.toCharArray();
}
public int length()
{
return len;
}
public char charAt(int idx)
{
return arr[ idx + offset];
}
public int compareTo(MyString other)
{
int myEnd = offset + len;
int yourEnd = other.offset + other.len;
int i = offset, j = other.offset;
for (; i < myEnd && j < yourEnd; i++, j++)
{
if (arr[ i] != arr[ j])
{
return arr[ i] - arr[ j];
}
}
// reached end. Who got there first?
if (i == myEnd && j == yourEnd)
{
return 0; // identical strings
}
if (i == myEnd)
{
return -1;
} else
{
return +1;
}
}
public MyString substring(int beginIndex, int endIndex)
{
return new MyString(arr, beginIndex + offset, endIndex - beginIndex);
}
public MyString substring(int beginIndex)
{
return substring(beginIndex, offset + len);
}
public boolean equals(Object other)
{
return (other instanceof MyString) && compareTo((MyString) other) == 0;
}
public String toString()
{
return new String(arr, offset, len);
}
private MyString(char[] a, int of, int ln)
{
arr = a;
offset = of;
len = ln;
}
private char[] arr;
private int offset;
private int len;
}

Here:
for (int i = 0; i < N; i++)
{
suffixes[i] = snew.substring(i);
}
You are trying to store, not only the entire long string, but the entire string - 1 letter, and the entire string - 2 letters, etc. All of these are stored separately.
If your String were only 10 letters, you would be storing a total of 55 characters worth in 10 different string.
At 1000 characters, you are storing 500500 characters total.
More generally, you are having to handle, length*(length+1)/2 characters.
Just for fun, I don't know how many characters are in War and Peace, but with a page count around 1250, a typical words/page estimate being 250, and the average word being about 5 characters long, comes to:
(1250 * 250 * 5)*(1250 * 250 * 5 + 1)/2 = 1.2207039 * 10^12 characters.
The size of a char in memory being 2 bytes, so you're looking at about 2.22 TB in size (compared to 1.49 MB for just the text of the novel).

I count at least 3 copies of both texts in the first few lines of the code. Here's a few ideas
convert the spaces as you read each line in--not after they are huge strings. Don't forget the case of spaces at the front and end of lines.
build your MyString class using StringBuilder as the base instead of String. Do all the looking inside the StringBuilder with its native methods, if you can.
don't extract strings any more than you have to.
Look up the -Xmx java runtime option and set the heap space large than the default. You'll have to google this as I don't have it memorized. Just notice that -Xmx=1024M needs that M at the end. (Look at the file size to see how big the two books are.)

When you construct MyString, you call arr = str.toCharArray(); which makes a new copy of the string's character data. But in Java, a string is immutable - so why not store a reference to the string instead of a copy of its data?
You construct every suffix at once, but you only refer to one (well, two) at a time. If you recode your solution to only reference the suffixes it currently cares about, and construct them only when it needs them (and lose a reference to them afterwards), they can be garbage collected by Java. This will make running out of memory less likely. Compare the memory overhead of storing 2 strings to storing hundreds of thousands of strings :)

I wrote this program in Scala. Maybe you can translate it to Java.
class MyString private (private val string: String, startIndex: Int, endIndex: Int) extends Comparable[MyString] {
def this(string: String) = this(string, 0, string.length)
def length() = endIndex-startIndex
def charAt(i: Int) = {
if(i >= length) throw new IndexOutOfBoundsException
string.charAt(startIndex + i)
}
def substring(start: Int, end: Int): MyString = {
if(start < 0 || end > length || end < start) throw new IndexOutOfBoundsException
new MyString(string, startIndex + start, startIndex + end)
}
def substring(start: Int): MyString = substring(start, length)
def longestCommonSubstring(other: MyString): MyString = {
var index = 0
val len = math.min(length, other.length)
while(index < len && charAt(index) == other.charAt(index)) index += 1
substring(0, index)
}
def compareTo(other: MyString): Int = {
val len = math.min(length, other.length)
for(i <- 0 until len) {
if(charAt(i) > other.charAt(i)) return 1
if(charAt(i) < other.charAt(i)) return -1
}
length-other.length
}
def >(other: MyString) = compareTo(other) > 0
def <(other: MyString) = compareTo(other) < 0
override def equals(other: Any) = other.isInstanceOf[MyString] && compareTo(other.asInstanceOf[MyString]) == 0
override def toString() = "\"" + string.substring(startIndex, endIndex) + "\""
}
def readFile(name: String) = new MyString(io.Source.fromFile(name).getLines.mkString(" ").replaceAll("\\s+", " "))
def makeList(str: MyString) = (0 until str.length).map(i => str.substring(i)).toIndexedSeq
val string1 = readFile("WarAndPeace.txt")
val string2 = readFile("MobyDick.txt")
val (list1, list2) = (makeList(string1).sorted, makeList(string2).sorted)
var longestMatch = new MyString("")
var (index1, index2) = (0,0)
while(index1 < list1.size && index2 < list2.size) {
val lcs = list1(index1).longestCommonSubstring(list2(index2))
if(lcs.length > longestMatch.length) longestMatch = lcs
if(list1(index1) < list2(index2)) index1 += 1
else index2 += 1
}
println(longestMatch)

Generating the lexicographically greatest string

The question is to generate the lexicographically greatest string given some string s.
So the aim is to find lexicographically greatest, unique(no repetitions) substring s1 from s.
We say that some subsequence s1 is greater than another subsequence s2 if s1 has more characters than s2 or s1 is lexicographically greater than s2 if equal length.
I/O are as follows:
Input is: babab
output is: ba
Second input is: nlhthgrfdnnlprjtecpdrthigjoqdejsfkasoctjijaoebqlrgaiakfsbljmpibkidjsrtkgrdnqsknbarpabgokbsrfhmeklrle
Second output is:
tsocrpkijgdqnbafhmle
This is what I wrote for my java code but my code fails on the second test case. Also I'm having a hard time understanding why second output isn't tsrqponmlkjihgfedcba.
Can somebody provide suggestions for a fix or even java code?
I think the algorithm has to be more efficient than generating all possible unique strings, sort them and find lexicographically largest one.
To make the question much clearer, if the input is babab, then all the possible unique combinations would be b, a, ba, ab. And the output will be ba because it's the longest and lexicographically greater than ab.
Note: this is not a homework assignment.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
public class mostBeautiful {
final static int MAX = 1000000;
static String[] permute;
static void permutation(String prefix, String str, int counter) {
int n = str.length();
//System.out.println("n is: "+ n);
if (n == 0) {
permute[counter] = prefix;
} else {
for (int i = 0; i < n; i++) {
//System.out.println("str is: "+ str);
permutation(prefix + str.charAt(i), str.substring(0, i) + str.substring(i+1, n), counter++);
}
}
}
public static void main(String[] args) throws IOException {
BufferedReader bf = new BufferedReader(new InputStreamReader(System.in));
String s = bf.readLine();
char[] unique = new char[26];
int counter = 0;
String answer = "";
//System.out.println("s is: " + s);
int ascii = 0;
final int asciiAVal = 97;
final int asciiZVal = 122;
for (int i = 0; i < s.length(); i++) {
ascii = (int)s.charAt(i);
if (ascii < asciiAVal || ascii > asciiZVal) {
continue;
}
char ch = s.charAt(i);
unique[ch - 'a'] = ch;
}
String result = "";
for (int j = 25; j >= 0; j--) {
result += unique[j];
}
result = result.trim();
System.out.println(result);
int size = result.length() * (result.length() - 1);
permute = new String[size];
permutation("", result, counter);
for (int i = 1; i < size; i++) {
if (permute[i].compareTo(permute[i - 1]) > 0){
answer = permute[i];
} else {
answer = permute[i - 1];
}
}
System.out.println("answer is: " + answer);
}
}

After thinking about this problem in many ways, I have determined a divide-and-conquer algorithm that gets the results right:
Algorithm - Pseudocode
Assuming some input string, S defined as a concatenation of two substrings A + B, we compute the lexicographically greatest string recursively as:
LexMax(S) = Merge(LexMax(A),LexMax(B))
Where
LexMax(S)
{
if Length(S) = 1
return S
else
{
LMA = LexMax(S[0:Length/2])
LMB = LexMax(S[Length/2:end])
return Merge(LMA,LMB)
}
}
Merge(A,B)
{
Sa = A
Sb = B
for n = 0:Length(A)
{
if Sb contains A[n]
{
if A[n+1:end] contains character > A[n]
Remove A[n] from Sa
else
Remove A[n] from Sb
}
}
return Sa + Sb
}
Java Code
Coming soon!
Example
Given an input string
cefcfdabbcfed
Divide it into
cefcfda
bbcfed
Assuming the function works we have:
LexMax("cefcfda") = "efcda"
LexMax("bbcfed") = "bcfed"
Merging works as follows:
e: efcda bcfed
In both substrings, greater value found to right of e in left substring, remove from left
f: fcda bcfed
In both substrings, no greater value in left substring, remove from right
c: fcda bced
In both substrings, greater value found to right of c in left substring, remove from left
d: fda bced
In both substrings, no greater value in left substring, remove from right
a: fda bce
Not in both substrings, do nothing
Final result:
LexMax(cefcfdabbcfed) = fdabce

This is not a direct answer, but doesn't this code meet the requirement as you explained it in the discussion above?
final String x = "saontehusanoethusnaoteusnaoetuh";
final SortedSet<Character> chars =
new TreeSet<Character>(Collections.reverseOrder());
for (char c : x.toCharArray()) chars.add(c);
System.out.println(chars);

Lexicographic order is an order in which words are displayed in alphabetical order using the appearance of letters in the word.It is also know as dictionary order or alphabetical order.For ex:-"Africa" is smaller than "Bangladesh" ,"He" is smaller than "he".
public class LexicographicExample {
public static void main(String a[]) {
Scanner sc = new Scanner(System.in);
System.out.println("Enter the String:-");
String str = sc.nextLine();
System.out.println("Enter the length");
int count = sc.nextInt();
List<String> list = new ArrayList<String>();
for (int i = 0; i < str.length(); i = i + 1) {
if (str.length() - i >= count) {
list.add(str.substring(i, count + i));
}
}
Collections.sort(list);
System.out.println("Smallest subString:-" + list.get(0));
System.out.println("Largest subString:-" + list.get(list.size() - 1));
}
}
For reference ,refer this link http://techno-terminal.blogspot.in/2015/09/java-program-to-find-lexicographically.html

"tsrqponmlkjihgfedcba" is not the answer because it is not a subsequence of the input. The definition of subsequence requires that the characters of the subsequence occur in the original sequence in the same order. For example, "abc" is a subsequence of "apbqcr", while "cba" is not.
As to the solution, I think a simple greedy algorithm would suffice. First, one has to understand that the maximum possible length of the output is the number of unique symbols (say, N) in the input. Since any output shorter than that would not be the greatest one, it has to be exactly N symbols long. The rest of the procedure is simple and at most quadratic in time complexity: one has to go through the input string and at each step pick the lexicographically highest symbol such that the part of the string to the left of it would still contain all the "unused" symbols.
As an example, consider a string "bacb". The first symbol can be 'a' or 'b', since in both cases the remainder contains both of the other letters. 'b' is greater, so we pick it. Now for "acb" we can only pick 'a' and than 'c' according to that condition, so we end up with "bac" for output.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Scanner;
class aaa {
public static void main(String args[]) throws Exception {
Scanner scan = new Scanner(System.in);
// int n = scan.nextInt();
String s = scan.next();
HashMap<Character, Node5> map = new HashMap<>();
for (int i = 0; i < s.length(); i++) {
if (!map.containsKey(s.charAt(i))) {
Node5 node = new Node5();
node.nl.add(i);
node.li = i;
map.put(s.charAt(i), node);
} else {
Node5 rn = map.get(s.charAt(i));
rn.nl.add(i);
rn.li = i;
map.put(s.charAt(i), rn);
}
}
String s1 = "";
int index = -1;
for (int i = 25; i >= 0; i--) {
if (map.containsKey((char) (97 + i))) {
if (map.get((char) (97 + i)).li > index) {
for (int j = 0; j < map.get((char) (97 + i)).nl.size(); j++) {
if (map.get((char) (97 + i)).nl.get(j) > index) {
s1 += (char) (97 + i);
index = map.get((char) (97 + i)).nl.get(j);
}
}
}
}
}
System.out.println(s1);
scan.close();
}
}
class Node5 {
int li;
ArrayList<Integer> nl;
public Node5() {
this.nl = new ArrayList<>();
}
}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

How do I find the decomposition of a string? - java

Related

sub arraylist's size isn't correct

Java: Find the longest sequential same character array

Get all possible combinations of n booleans? [duplicate]

Longest Common Substring in a big text

Generating the lexicographically greatest string

Categories

Resources