I need to find the frequency of each character in a String using recursion.
Found this question online and wanted to do this as a challenge.
I have used two variables, 'a' and 'i': 'a' stores the index of the character currently being counted, and 'i' walks through the entire string looking for occurrences of that character.
Finding the frequency of each character present in the word.
import java.util.*;
public class ini {
public static void main(String args[]) {
recur(0, 0, "Hello how are you", 0);
}
static private char m = ' ';
private static boolean recur(int a, int i, String s, int count) {
if (s.length() >= 1) {
if (a < s.length()) {
m = s.charAt(a);
if (i < s.length()) {
if (s.charAt(a) == s.charAt(i)) {
count += 1;
}
recur(a, ++i, s, count);
}
i = 0;
System.out.println(s.charAt(a) + ":" + count);
s = s.replaceAll(Character.toString(s.charAt(a)), "");
a += 1;
count = 0;
}
if (a != s.length() - 1) {
recur(a, i, s, count);
}
} else {
return false;
}
return true;
}
}
The current output ignores the letter "w" altogether
H:1
l:2
:3
o:3
r:1
y:1
Exception in thread "main" java.lang.StackOverflowError
at ini.recur(ini.java:26)
at ini.recur(ini.java:26)
at ini.recur(ini.java:26)
at ini.recur(ini.java:26)
at ini.recur(ini.java:26)
at ini.recur(ini.java:26)
at ini.recur(ini.java:26)
at...
There are a couple of things that we don't know:
Should h and H be considered only one character?
Should you count the spaces? (Programmatically speaking, space is a character)
Do you need an improved solution?
Are you allowed to manipulate the initial text?
Some observations:
You need to rename your variables better
You don't need the static field
You don't need the recursive function to be boolean
a is used only to identify the current character, so it does not need to be incremented
Quick solution:
private static void recur(int startingIndex, int recursionIndex, String text, int count) {
if (text.length() >= 1) {
if (startingIndex < text.length()) {
char currentCharacter = text.charAt(startingIndex);
if (recursionIndex < text.length()) {
if (currentCharacter == text.charAt(recursionIndex)) {
count += 1;
}
recur(startingIndex, ++recursionIndex, text, count);
} else {
System.out.println(currentCharacter + ":" + count);
text = text.replace(Character.toString(currentCharacter), "");
recur(0, 0, text, 0);
}
}
}
}
Improved solution:
public class Main {
public static void main(String[] args) {
recur(0, "Hello how are you", 0);
}
private static void recur(int index, String text, int count) {
if (text.length() >= 1) {
char currentCharacter = text.charAt(0);
if (index< text.length()) {
if (currentCharacter == text.charAt(index)) {
count += 1;
}
recur(++index, text, count);
} else {
System.out.println(currentCharacter + ":" + count);
text = text.replace(Character.toString(currentCharacter), "");
recur(0, text, 0);
}
}
}
}
The optimal solution without modifying the initial text:
private static int recur(char character, String text, int index) {
if (index >= text.length()) {
return 0;
}
int count = text.charAt(index) == character ? 1 : 0;
// Arguments must follow the declared order: (character, text, index)
return count + recur(character, text, index + 1);
}
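The counting function above returns the frequency of a single character, so something still has to call it once per distinct character. A minimal sketch of such a driver, also recursive, might look like this. The class name CharFrequency and the helper printAll are illustrative names, not part of the original answer, and a character is treated as new when its first occurrence is at the current index:

```java
class CharFrequency {
    // Counts occurrences of `character` in `text` from `index` onward.
    static int count(char character, String text, int index) {
        if (index >= text.length()) {
            return 0;
        }
        int here = text.charAt(index) == character ? 1 : 0;
        return here + count(character, text, index + 1);
    }

    // Hypothetical driver: prints each distinct character once,
    // in order of first appearance, without modifying the text.
    static void printAll(String text, int index) {
        if (index >= text.length()) {
            return;
        }
        char c = text.charAt(index);
        if (text.indexOf(c) == index) { // this is the first occurrence of c
            System.out.println(c + ":" + count(c, text, 0));
        }
        printAll(text, index + 1);
    }

    public static void main(String[] args) {
        printAll("Hello how are you", 0);
    }
}
```

Because the text is never modified, upper and lower case stay distinct and spaces are counted like any other character.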
After much tinkering I've figured it out. Basically, you should not increment a: doing so skips over letters, so remove the line a += 1;. Furthermore, with recursion (I was struggling to remember this myself) you need to be careful about how you call the function you are in. If the recursive call is not the last step (tail recursion), execution continues into the code below it once the call returns, which goes wrong here for several reasons. All you need to do is add a return statement before the first recursive call, like so:
import java.util.*;
public class ini {
public static void main(String args[]) {
recur(0, 0, "Hello how are you", 0);
}
static private char m = ' ';
private static boolean recur(int a, int i, String s, int count) {
if (s.length() >= 1) {
if (a < s.length()) {
m = s.charAt(a);
if (i < s.length()) {
if (s.charAt(a) == s.charAt(i)) {
count += 1;
}
//Added crucial return statement
return recur(a, ++i, s, count);
}
i = 0;
System.out.println(s.charAt(a) + ":" + count);
s = s.replaceAll(Character.toString(s.charAt(a)), "");
//removed a += 1;
count = 0;
}
if (a != s.length() - 1) {
recur(a, i, s, count);
}
} else {
return false;
}
return true;
}
}
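Output (the last character, u, is still missing, because the check a != s.length() - 1 stops the recursion when only one character remains):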
H:1
e:2
l:2
o:3
:3
h:1
w:1
a:1
r:1
y:1
Here is a link about tail vs. head recursion: Tail vs. Head Recursion
Hope this helps you!
My approach is slightly different from yours but you might find it interesting.
In my approach I remove the character and check the difference in the length of the String. The change in length is the number of times that character occurred. The rest is explained in the code.
public class CharactersFrequency {
public static void main(String[] args) {
CharactersFrequency cF = new CharactersFrequency();
long startTime = System.currentTimeMillis();
// I generated a string with 1000 characters from a website
cF.frequencyOfCharacters("a quick brown fox jumps over the lazy dog");
long endTime = System.currentTimeMillis();
System.out.println("Runtime: " + (endTime - startTime) + " ms");
}
private void frequencyOfCharacters(String input) {
CharactersFrequency cF = new CharactersFrequency();
cF.frequencyOfCharactersRec(input, input.charAt(0) + "");
}
public void frequencyOfCharactersRec(String input, String currentChar) {
// If only one char is left
if (input.length() <= 1) {
System.out.println(currentChar + ": 1");
} else {
// Checking Initial length and saving it
int inputOldLength = input.length();
// Removing the char whose frequency I am checking
input = input.replace(currentChar, "");
// Checking new length
int inputNewLength = input.length();
// The difference between length should be the number of times that char
// repeated
System.out.println(currentChar + " : " + (inputOldLength - inputNewLength));
// In some cases after replace function the string becomes empty
// thus charAt(0) gives an error
if (inputNewLength > 0) {
frequencyOfCharactersRec(input, input.charAt(0) + "");
}
}
}
}
Output:
a : 2
: 8
q : 1
u : 2
i : 1
c : 1
k : 1
b : 1
r : 2
o : 4
w : 1
n : 1
f : 1
x : 1
j : 1
m : 1
p : 1
s : 1
v : 1
e : 2
t : 1
h : 1
l : 1
z : 1
y : 1
d : 1
g: 1
Runtime: 3 ms
I need to create an algorithm for String decomposition.
Some examples:
ABCABCDEDEDEF --> ABC*2+DE*3+F
ABCcABCczcz --> ABC*2+cz*2+c
test --> test
Each segment of the string should be separated by a + and, if repeated, followed by a * and the number of times it appears in succession.
This is what I have tried:
private static int[] prefixFunction(String source) {
int n = source.length();
int[] pi = new int[n];
for (int i = 1; i < n; i++) {
int j = pi[i - 1];
while (j > 0 && source.charAt(i) != source.charAt(j))
j = pi[j - 1];
if (source.charAt(i) == source.charAt(j))
j++;
pi[i] = j;
}
return pi;
}
This solution keeps everything in order, meaning an input like ABCABCDEDEDEF will return ABC*2+DE*3+F or an input like abDEDEab will return ab+DE*2+ab.
If you don't keep the order, it will be impossible to reconstruct the String later with 100 % accuracy.
public static void main(String[] args) {
String input = "ABCABCDEDEDEF";
String output = findDecomposition(input);
System.out.println("Output: " + output);
}
public static String findDecomposition(String input) {
String substring = input;
StringBuilder builder = new StringBuilder();
for (int start = 0, count = 1; start < input.length(); start++, count = 1) {
for (int end = start + 1; end < input.length(); end++) {
substring = input.substring(start, end);
while (true) {
String next = input.substring(start + substring.length(), Math.min(end + substring.length(), input.length()));
if (next.equals(substring)) {
count++;
start += substring.length();
end += substring.length();
} else
break;
}
if (count > 1) {
start += substring.length() - 1;
break;
}
}
if (count > 1) {
if (builder.length() > 0 && builder.charAt(builder.length() - 1) != '+')
builder.append('+');
builder.append(substring + "*" + count + "+");
} else
builder.append(input.charAt(start));
}
String result = builder.toString();
if (result.endsWith("+"))
return result.substring(0, result.length() - 1);
else
return result;
}
The brute-force algorithm can work as follows.
Prerequisites:
The first letter is set as the root.
The data structure for each possible solution is a linked list. The value of each node is the text to be written.
When outputting a solution, first put all text values into a Map together with their number of appearances. If a value appears more than once, use * as the multiplier.
Example: if one of the solutions looks like ABC-C-ABC, the output will be ABC*2+C.
Solution:
1. Take the next letter from the input.
2. New solutions are based on existing solutions. Each new solution is an old solution plus the new letter, added either to one of the existing nodes or as a single letter in a new node.
3. Save the new solutions as the existing solutions.
4. Repeat from step 1 until all letters have been processed.
5. Calculate the value of every solution and select the one with the fewest characters.
I added an example; as you can see, the number of solutions grows quickly, so it is not fully worked out for all 6 letters. Each step represents one cycle of steps 1 to 4, and in each step the previous solutions are used as the base for new solutions. Multiple new solutions are created for each existing solution.
This code returns the following compositions:
ABCABCDEDEDEF -> ABC*2+DE*3+F
ABCcABCczcz -> ABCc*2+zcz
cefABCcABCczcz -> cef+ABCc*2+zcz
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
public class Decomposition {
public static void main(String[] args) {
Decomposition d = new Decomposition("ABCABCDEDEDEF");
System.out.println(d.getOptimalDecomposition());// Output: ABC*2+DE*3+F
d = new Decomposition("ABCcABCczcz");
System.out.println(d.getOptimalDecomposition());// Output: ABCc*2+zcz
d = new Decomposition("cefABCcABCczcz");
System.out.println(d.getOptimalDecomposition());// Output: cef+ABCc*2+zcz
}
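private List<List<String>> decompositions;
private String toDecompose;
public Decomposition(String toDecompose) {
decompositions = new ArrayList<>();
this.toDecompose = toDecompose;
}
public String getOptimalDecomposition() {
decompose(0, new ArrayList<>());
return calculateOptimal(convertToPartsMap());
}
private String calculateOptimal(List<List<PartCount>> partsCount) {
Collections.sort(partsCount, new SortDecompositions());
StringBuilder optimal = new StringBuilder();
// Loop header and part append were truncated in the source; reconstructed from context.
for (int i = 0; i < partsCount.get(0).size(); i++) {
PartCount pc = partsCount.get(0).get(i);
optimal.append(pc.part);
if (pc.count > 1) {
optimal.append("*");
optimal.append(pc.count);
}
if (i != partsCount.get(0).size() - 1) {
optimal.append("+");
}
}
return optimal.toString();
}
private List<List<PartCount>> convertToPartsMap() {
List<List<PartCount>> partsMap = new ArrayList<>();
for (List<String> parts : decompositions) {
List<PartCount> partsCount = new ArrayList<>();
String lastPart = null;
int curCount = 0;
// The loop body was truncated in the source; this reconstruction assumes it
// collapses consecutive equal parts into one PartCount (run-length encoding).
for (int i = 0; i < parts.size(); i++) {
String part = parts.get(i);
if (part.equals(lastPart)) {
curCount++;
} else {
if (lastPart != null) {
partsCount.add(new PartCount(lastPart, curCount));
}
lastPart = part;
curCount = 1;
}
}
if (lastPart != null) {
partsCount.add(new PartCount(lastPart, curCount));
}
partsMap.add(partsCount);
}
return partsMap;
}
// Recursively builds every way of splitting toDecompose into consecutive parts.
private void decompose(int nextChar, List<String> parts) {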
if (nextChar == toDecompose.length()) {
decompositions.add(parts);
return;
}
char toAdd = toDecompose.charAt(nextChar);
if (parts.isEmpty()) {
parts.add("" + toAdd);
decompose(nextChar + 1, parts);
} else {
// left
List<String> leftParts = parts.stream().collect(Collectors.toList());// shallow copy
if (!leftParts.isEmpty()) {
int last = leftParts.size() - 1;
leftParts.set(last, leftParts.get(last) + toAdd);
} else {
leftParts.add("" + toAdd);
}
// right
List<String> rightParts = parts.stream().collect(Collectors.toList());// shallow copy
rightParts.add("" + toAdd);
decompose(nextChar + 1, leftParts);
decompose(nextChar + 1, rightParts);
}
}
}
class PartCount {
String part;
int count;
public PartCount(String part, int count) {
this.part = part;
this.count = count;
}
@Override
public String toString() {
return "[" + part + ", " + count + "]";
}
}
class SortDecompositions implements Comparator<List<PartCount>> {
public int compare(List<PartCount> a, List<PartCount> b) {
// Here you can define what exactly means "taking up least space".
return countChars(a) - countChars(b);
}
private int countChars(List<PartCount> listPc) {
int count = 0;
for (PartCount pc : listPc) {
count += pc.part.length();
}
return count;
}
}
This can be solved using the KMP algorithm's lps array, which stores for each position the length of the longest proper prefix that is also a suffix.
Steps:
1. Iterate over the string "ABCABCDEDEDEF" and construct the lps array for it. The values in the array will be
0 0 0 1 2 3 0 0 0 0 0 0 0
The lps array shows how far the prefix repeats in the string.
In the above case the prefix ABC is repeated one more time. Counting the original occurrence as well, the total is 2, so it becomes ABC*2.
2. Take the substring of the remaining string and repeat step 1 until the end of the string.
I can provide the code if needed. The worst-case time complexity will be O(n^2).
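The steps above can be sketched roughly as follows. This is one possible greedy reading of them, not the answerer's own code: at each position it takes the longest prefix of the remaining text that consists of whole repetitions of its shortest period (period = len - lps[len - 1]), and otherwise emits a single literal character. The names KmpDecomposition and decompose are illustrative, and the result preserves order but is not guaranteed to be the shortest possible output.

```java
import java.util.ArrayList;
import java.util.List;

class KmpDecomposition {
    // Standard KMP prefix function: pi[i] is the length of the longest
    // proper prefix of s[0..i] that is also a suffix of s[0..i].
    static int[] prefixFunction(String s) {
        int[] pi = new int[s.length()];
        for (int i = 1; i < s.length(); i++) {
            int j = pi[i - 1];
            while (j > 0 && s.charAt(i) != s.charAt(j)) {
                j = pi[j - 1];
            }
            if (s.charAt(i) == s.charAt(j)) {
                j++;
            }
            pi[i] = j;
        }
        return pi;
    }

    static String decompose(String input) {
        List<String> segments = new ArrayList<>();
        StringBuilder literal = new StringBuilder(); // pending non-repeating text
        int pos = 0;
        while (pos < input.length()) {
            String rest = input.substring(pos);
            int[] pi = prefixFunction(rest);
            int bestLen = 0;
            int bestPeriod = 0;
            // Find the longest prefix made of two or more copies of its period.
            for (int len = 2; len <= rest.length(); len++) {
                int period = len - pi[len - 1];
                if (period < len && len % period == 0) {
                    bestLen = len;
                    bestPeriod = period;
                }
            }
            if (bestLen == 0) {
                literal.append(input.charAt(pos)); // no repetition starts here
                pos++;
            } else {
                if (literal.length() > 0) {
                    segments.add(literal.toString());
                    literal.setLength(0);
                }
                segments.add(rest.substring(0, bestPeriod) + "*" + (bestLen / bestPeriod));
                pos += bestLen;
            }
        }
        if (literal.length() > 0) {
            segments.add(literal.toString());
        }
        return String.join("+", segments);
    }

    public static void main(String[] args) {
        System.out.println(decompose("ABCABCDEDEDEF")); // ABC*2+DE*3+F
        System.out.println(decompose("test"));          // test
    }
}
```

Recomputing the prefix function for every remaining substring is what makes the worst case quadratic.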
The scenario is: I read the last line of a file, increment it by one, and write it back.
The reading and writing are done. I am finding it difficult to increment the alphanumeric values, as there are a few conditions.
The conditions are:
It should only be 3 characters long
Example : A01, A02.... A99, B01, B02.... B99..
Once Z99 is reached it should be AA1, AA2, AA3...AA9, .....
Then AB1, AB2,... AZ9
So basically, while incrementing, the value should never become AA10, which would make it 4 characters long.
What I am doing now is separating the letters and the digits, incrementing them, and concatenating them back.
The code so far:
String[] part = lastLine.split("(?<=\\D)(?=\\d)");
System.out.println(part[0]);
System.out.println(part[1]);
int numberOnly = Integer.parseInt(lastLine.replaceAll("[^0-9]", ""));
numberOnly++;
String lettersOnly = lastLine.replaceAll("[^A-Z]", "");
if (lettersOnly.length() > 1){
String lastLetter = lettersOnly.substring(lettersOnly.length() - 1);
if(lastLetter.equalsIgnoreCase("Z") && part[1].equals("9") ){
String notLastLetter = lettersOnly.substring(lettersOnly.length() - 2);
char d = lettersOnly.charAt(0);
d++;
System.out.println("Letters after increment more than two : " +d);
lettersOnly = Character.toString(d) + "Z";
}
}
else{
}
System.out.println("Letters after increment : " +lettersOnly);
Any help would be greatly appreciated.
public class AlphaNumericCounter {
String[] part;
int counter; //Variable storing numeric part of counter
String alpha; //Variable storing Alpha part of counter
static String final_output = "A00"; // First Input considered as A00 and also the variable which will be store each count
static boolean continueIncrement = true; //For running the loop till we reach ZZ9
/* Def constructor */
public AlphaNumericCounter() {
}
/* Constructor called from main method with primary input A00 */
public AlphaNumericCounter(String number) {
part = number.split("(?<=\\D)(?=\\d)");
}
/* Function called each time from inside loop to generate next alphanumeric count */
public void increment() {
part = final_output.split("(?<=\\D)(?=\\d)");
counter = Integer.valueOf(part[1]) + 1;
alpha = part[0];
}
public String toString() {
if (alpha.length() == 1){
if (String.valueOf(counter).length() > 2){
if ((int)alpha.charAt(0) + 1 > 90/*If Z encountered*/){
alpha = "AA";
}else{
alpha = String.valueOf((char)((int)alpha.charAt(0) + 1));//Take Next Alphabet
}
counter = 1; //Reset counter to 1
}
}else{
//We have AA, AB ... ZZ format of alpha
if (String.valueOf(counter).length() > 1){
if ((int)alpha.charAt(0) + 1 > 90 && (int)alpha.charAt(1) + 1 > 90){
continueIncrement = false;
System.out.println("NO MORE COMBINATION AVAILABLE"); //We reached ZZ
return "";
}else if ((int)alpha.charAt(1) + 1 <= 90){
alpha = String.valueOf((char)((int)alpha.charAt(0))) + String.valueOf((char)((int)alpha.charAt(1) + 1));
counter = 1;
}else if ((int)alpha.charAt(1) + 1 > 90){
if ((int)alpha.charAt(0) + 1 <= 90){
alpha = String.valueOf((char)((int)alpha.charAt(0) + 1)) + "A";
counter = 1;
}
}
}
}
generateString();
return final_output;
}
private void generateString(){
int l1 = String.valueOf(counter).length();
int l2 = alpha.length();
final_output = alpha + (l2 == 1 && l1 == 1 ? "0" : "") + String.valueOf(counter);
}
public static void main(String[] args) {
AlphaNumericCounter lic = new AlphaNumericCounter(final_output);
while (continueIncrement){
lic.increment();
System.out.println(lic);
}
}
}
What about incrementing each "digit" separately from right to left and handling the overflow into the next digit:
String number;// number is your original string
char[] digits = number.toCharArray();
boolean overflow = true;
for(int i = 2; i >= 0; i--){
if(overflow){
switch(digits[i]){
case 'Z':
digits[i] = '0';
overflow = true;
break;
case '9':
digits[i] = 'A';
overflow = false;
break;
default:
digits[i]++;
overflow = false;
}
}
}
if(overflow){
//handle ZZZ overflow here
}
String result = new String(digits);
A simple solution is to count in Base36
Try this:
class AlphaNumericIncrementer {
public static void main(String[] args) {
/*
When starting at '000' => We hit 'zzz' (i.e. Dead End) at 46,656
When starting at 'A00' => We hit 'zzz' (i.e. Dead End) at 33,696
*/
int index = 0;
String currentNumber = "000";
while (index < 46656) {
index++;
String incrementedNumber = base36Incrementer(currentNumber, 36);
currentNumber = incrementedNumber;
if (incrementedNumber.toCharArray().length != 3) {
System.out.println("We got intruder with length: " + incrementedNumber.toCharArray().length);
System.out.println("Our Intruder is: " + incrementedNumber);
break;
}
System.out.println(incrementedNumber);
}
System.out.println("Number of entries: " + index);
}
// The function that increments current string
public static String base36Incrementer(String v, int targetBase) {
String answer = Integer.toString(Integer.parseInt(v, targetBase) + 1, targetBase);
return String.format("%3s", answer).replace(' ', '0');
}
}
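The Base36 counter above also produces values such as 00a, which are not part of the sequence described in the question. A minimal sketch that follows the question's sequence directly (A01 ... A99, B01 ... Z99, AA1 ... AA9, AB1 ... ZZ9) could look like this. The class name ThreeCharCounter and the method next are hypothetical, the input is assumed to be a well-formed uppercase value, and the step from AZ9 to BA1 is an assumption, since the question does not spell it out:

```java
class ThreeCharCounter {
    // Returns the value that follows `current` in the question's sequence.
    static String next(String current) {
        if (current.length() != 3) {
            throw new IllegalArgumentException("expected 3 characters: " + current);
        }
        if (Character.isDigit(current.charAt(1))) {
            // One letter followed by two digits: A01 .. Z99
            char letter = current.charAt(0);
            int n = Integer.parseInt(current.substring(1));
            if (n < 99) {
                return letter + String.format("%02d", n + 1);
            }
            if (letter < 'Z') {
                return (char) (letter + 1) + "01";
            }
            return "AA1"; // Z99 rolls over to the two-letter form
        }
        // Two letters followed by one digit: AA1 .. ZZ9
        char first = current.charAt(0);
        char second = current.charAt(1);
        int n = current.charAt(2) - '0';
        if (n < 9) {
            return "" + first + second + (n + 1);
        }
        if (second < 'Z') {
            return "" + first + (char) (second + 1) + "1";
        }
        if (first < 'Z') {
            return "" + (char) (first + 1) + "A1"; // assumed: AZ9 -> BA1
        }
        throw new IllegalStateException("ZZ9 is the last value");
    }

    public static void main(String[] args) {
        System.out.println(next("A99")); // B01
        System.out.println(next("Z99")); // AA1
    }
}
```

Handling the two formats as separate branches keeps the result at exactly three characters without any padding tricks.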
I have a problem with my assignment. I must write a program whose input is a string (which may or may not be a palindrome) and whose output is every way of splitting the string into substrings that are all palindromes, computed recursively. Example:
"marah": the output should be ("m","a","r","a","h"), ("m","ara","h"). I don't know how to implement this recursively. Could anyone help? I really need this code. I am working in Java. Thank you, I hope help is coming :D
// Prints every substring of a line read from standard input (requires java.util.Scanner)
public static void palindrome() {
String s, sub;
int i, c, length;
Scanner in = new Scanner(System.in);
System.out.println("Enter a string to print all its substrings");
s = in.nextLine();
length = s.length();
System.out.println("Substrings of \"" + s + "\" are :-");
for (c = 0; c < length; c++) {
for (i = 1; i <= length - c; i++) {
sub = s.substring(c, c + i);
System.out.println(sub);
}
}
}
public static String longestPalindrome(String s) {
if (s.isEmpty()) {
return null;
}
if (s.length() == 1) {
return s;
}
String longest = s.substring(0, 1);
for (int i = 0; i < s.length(); i++) {
// get longest palindrome with center of i
String tmp = helper(s, i, i);
if (tmp.length() > longest.length()) {
longest = tmp;
}
// get longest palindrome with center of i, i+1
tmp = helper(s, i, i + 1);
if (tmp.length() > longest.length()) {
longest = tmp;
}
}
return longest;
}
// Given a center, either one letter or two letter,
// Find longest palindrome
public static String helper(String s, int begin, int end) {
while (begin >= 0 && end <= s.length() - 1 && s.charAt(begin) == s.charAt(end)) {
begin--;
end++;
}
return s.substring(begin + 1, end);
}
If the input is "mama", the output is only "ama"; the expected output is ("m","a","m","a"), ("mam","a"), and ("m","ama"). Can anybody help?
This is called palindrome partitioning; you can find it here: http://www.programcreek.com/2013/03/leetcode-palindrome-partitioning-java/
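For reference, a minimal recursive sketch of palindrome partitioning is shown below. It is a plain backtracking version written for this thread rather than the code from the linked page, and the names PalindromePartition, partition and backtrack are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

class PalindromePartition {
    // Returns every way to split s into pieces that are all palindromes.
    static List<List<String>> partition(String s) {
        List<List<String>> result = new ArrayList<>();
        backtrack(s, 0, new ArrayList<>(), result);
        return result;
    }

    // Tries every palindromic piece starting at `start`, then recurses on the rest.
    private static void backtrack(String s, int start, List<String> current,
                                  List<List<String>> result) {
        if (start == s.length()) {
            result.add(new ArrayList<>(current)); // copy the finished split
            return;
        }
        for (int end = start + 1; end <= s.length(); end++) {
            String piece = s.substring(start, end);
            if (isPalindrome(piece)) {
                current.add(piece);
                backtrack(s, end, current, result);
                current.remove(current.size() - 1); // undo and try a longer piece
            }
        }
    }

    private static boolean isPalindrome(String t) {
        int left = 0;
        int right = t.length() - 1;
        while (left < right) {
            if (t.charAt(left++) != t.charAt(right--)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(partition("mama")); // [[m, a, m, a], [m, ama], [mam, a]]
    }
}
```

Each recursive call fixes one palindromic piece and leaves the remainder of the string to the next call, which is why every valid split is found exactly once.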
I am new to Java. I want to find the longest run of consecutive identical characters in an input character array. For example, for the character array bddfDDDffkl the longest run is DDD, and for rttttDDddjkl the longest run is tttt.
I use the following code to deal with this problem, but I want to improve it. For example, if there are two runs of the same length (in rtttgHHH there are two longest runs: ttt and HHH), how should I handle that?
Thanks in advance.
My following code:
public class SeqSameChar {
public static void main (String[] args) {
int subLength = 0;
Scanner sc = new Scanner(System.in);
String[] num = null;
num = sc.nextLine().split(" ");
String[] number = new String[num.length];
for(int i = 0; i< number.length;i++) {
number[i] = String.valueOf(num[i]);
}
subLength =length(number,num.length);
System.out.println(subLength);
for(int i = index; i < index+subLength; i++) {
System.out.print(number[i]);
}
System.out.println(c==c1);
}
public static int index;
// calculates the length of the longest run of equal consecutive elements
public static int length(String[] A,int size){
if(size<=0)return 0;
int res=1;
int current=1;
for(int i=1;i<size;i++){
if(A[i].equals(A[i-1])){
current++;
}
else{
if(current>res){
index=i-current;
res=current;
}
current=1;
}
}
return res;
}
}
This algorithm should work for what you want to develop:
First, let me make clear that if you want to detect two different characters that are repeated the same number of times, you have to run a second for loop in reverse to identify the second character. If the second character differs from the first one and has the same number of repetitions, you print both characters; otherwise you print only the character found by the first loop, because both loops found the same character.
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
System.out.println("Enter String 1: ");
String A1 = sc.nextLine();
MaxRepeat(A1);
}
public static void MaxRepeat(String A) {
int count = 1;
int max1 = 1;
char mostrepeated1 = ' ';
for(int i = 0; i < A.length()-1;i++) {
char number = A.charAt(i);
if(number == A.charAt(i+1)) {
count++;
if(count>max1) {
max1 = count;
mostrepeated1 = number;
}
continue;
}
count = 1;
}
count = 1;
int max2 = 1;
char mostrepeated2 = ' ';
for(int i = A.length()-1; i>0; i--) {
char number = A.charAt(i);
if(number == A.charAt(i-1)) {
count++;
if(count>max2) {
max2 = count;
mostrepeated2 = number;
}
continue;
}
count = 1;
}
if((max1==max2) && (mostrepeated1==mostrepeated2)) {
System.out.println("Most Consecutively repeated character is: " + mostrepeated1 + " and is repeated " + max1 + " times.");
}
else if((max1==max2) && (mostrepeated1!=mostrepeated2)) {
System.out.println("Most continuously repeated characters are: " + mostrepeated1 + " and " + mostrepeated2 + " and they are repeated " + max1 + " times");
}
}
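If more than two runs can share the maximum length, a single pass that collects every longest run may be simpler than scanning in both directions. The following is a minimal sketch of that alternative, not the code from the answer above; the names LongestRuns and longestRuns are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

class LongestRuns {
    // Returns all runs of identical consecutive characters that share
    // the maximum length, in order of appearance.
    static List<String> longestRuns(String s) {
        List<String> best = new ArrayList<>();
        int bestLen = 0;
        int start = 0; // index where the current run begins
        for (int i = 1; i <= s.length(); i++) {
            // A run ends at the end of the string or when the character changes.
            if (i == s.length() || s.charAt(i) != s.charAt(start)) {
                int len = i - start;
                if (len > bestLen) {
                    bestLen = len;
                    best.clear(); // shorter runs are no longer candidates
                }
                if (len == bestLen) {
                    best.add(s.substring(start, i));
                }
                start = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestRuns("rtttgHHH")); // [ttt, HHH]
    }
}
```

Letting the loop run one step past the last index means the final run is closed by the same branch as every other run, so a longest run at the end of the input is not missed.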
I'll give you a Scala implementation for that problem.
Here is the automated test (in BDD style with ScalaTest):
import org.scalatest._
class RichStringSpec extends FlatSpec with MustMatchers {
"A rich string" should "find the longest run of consecutive characters" in {
import Example._
"abceedd".longestRun mustBe Set("ee", "dd")
"aeebceeedd".longestRun mustBe Set("eee")
"aaaaaaa".longestRun mustBe Set("aaaaaaa")
"abcdefgh".longestRun mustBe empty
}
}
Following is the imperative style implementation, with nested loops and mutable variables as you would normally choose to do in Java or C++:
import scala.collection.mutable
object Example {
implicit class RichString(string: String) {
def longestRun: Set[String] = {
val chunks = mutable.Set.empty[String]
val ilen = string.length
var gmax = 0
for ((ch, curr) <- string.zipWithIndex) {
val chunk = mutable.ListBuffer(ch)
var next = curr + 1
while (next < ilen && string(next) == ch) {
chunk += string(next)
next = next + 1
}
gmax = chunk.length max gmax
if (gmax > 1) chunks += chunk.mkString
}
chunks.toSet.filter( _.length == gmax )
}
}
}
Following is a functional-style implementation, hence no variables, no loops but tail recursion with result accumulators and pattern matching to compare each character with the next one (Crazy! Isn't it?):
import scala.collection.mutable
object Example {
implicit class RichString(string: String) {
def longestRun: Set[String] = {
def recurse(chars: String, chunk: mutable.ListBuffer[Char], chunks: mutable.Set[String]): Set[String] = {
chars.toList match {
case List(x, y, _*) if (x == y) =>
recurse(
chars.tail,
if (chunk.isEmpty) chunk ++= List(x, y) else chunk += y,
chunks
)
case Nil =>
// terminate recursion
chunks.toSet
case _ => // x != y
recurse(
chars.tail,
chunk = mutable.ListBuffer(),
chunks += chunk.mkString
)
}
}
val chunks = recurse(string, mutable.ListBuffer(), mutable.Set.empty[String])
val max = chunks.map(_.length).max
if (max > 0) chunks.filter( _.length == max ) else Set()
}
}
}
For example, for the given "aeebceeedd" string, both implementations above will build the following set of chunks (repeating characters)
Set("ee", "eee", "dd")
and they will filter those chunks having the maximum length (resulting "eee").
I have an assignment for school that asks us to write code to find the longest common substring. I have done that, but it only works on texts that are not too big, and we are asked to find the longest common substring of Moby Dick and War and Peace. If you could point me in the right direction about what I'm doing wrong, I would appreciate it. The error is reported in the substring method of the MyString class, when I call it to create the SuffixArray, but I don't know why it says the data is too big and gives me an OutOfMemoryError.
package datastructuresone;
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;
class SuffixArray
{
private final MyString[] suffixes;
private final int N;
public SuffixArray(String s)
{
N = s.length();
MyString snew = new MyString(s);
suffixes = new MyString[N];
for (int i = 0; i < N; i++)
{
suffixes[i] = snew.substring(i);
}
Arrays.sort(suffixes);
}
public int length()
{
return N;
}
public int index(int i)
{
return N - suffixes[i].length();
}
public MyString select(int i)
{
return suffixes[i];
}
// length of longest common prefix of s and t
private static int lcp(MyString s, MyString t)
{
int N = Math.min(s.length(), t.length());
for (int i = 0; i < N; i++)
{
if (s.charAt(i) != t.charAt(i))
{
return i;
}
}
return N;
}
// longest common prefix of suffixes(i) and suffixes(i-1)
public int lcp(int i)
{
return lcp(suffixes[i], suffixes[i - 1]);
}
// longest common prefix of suffixes(i) and suffixes(j)
public int lcp(int i, int j)
{
return lcp(suffixes[i], suffixes[j]);
}
}
public class DataStructuresOne
{
public static void main(String[] args) throws FileNotFoundException
{
Scanner in1 = new Scanner(new File("./build/classes/WarAndPeace.txt"));
Scanner in2 = new Scanner(new File("./build/classes/MobyDick.txt"));
StringBuilder sb = new StringBuilder();
StringBuilder sb1 = new StringBuilder();
while (in1.hasNextLine())
{
sb.append(in1.nextLine());
}
while (in2.hasNextLine())
{
sb1.append(in2.nextLine());
}
String text1 = sb.toString().replaceAll("\\s+", " ");
String text2 = sb1.toString().replaceAll("\\s+", " ");
int N1 = text1.length();
int N2 = text2.length();
SuffixArray sa = new SuffixArray(text1 + "#" + text2);
int N = sa.length();
String substring = "";
for (int i = 1; i < N; i++)
{
// adjacent suffixes both from second text string
if (sa.select(i).length() <= N2 && sa.select(i - 1).length() <= N2)
{
continue;
}
// adjacent suffixes both from first text string
if (sa.select(i).length() > N2 + 1 && sa.select(i - 1).length() > N2 + 1)
{
continue;
}
// check if adjacent suffixes longer common substring
int length = sa.lcp(i);
if (length > substring.length())
{
substring = sa.select(i).toString().substring(0, length);
System.out.println(substring + " ");
}
}
System.out.println("The length of the substring " + substring.length() + "length on first N " + N1 + " length of Second N " + N2
+ "The length of the array sa: " + N);
System.out.println("'" + substring + "'");
}
}
final class MyString implements Comparable<MyString>
{
public MyString(String str)
{
offset = 0;
len = str.length();
arr = str.toCharArray();
}
public int length()
{
return len;
}
public char charAt(int idx)
{
return arr[ idx + offset];
}
public int compareTo(MyString other)
{
int myEnd = offset + len;
int yourEnd = other.offset + other.len;
int i = offset, j = other.offset;
for (; i < myEnd && j < yourEnd; i++, j++)
{
if (arr[ i] != arr[ j])
{
return arr[ i] - arr[ j];
}
}
// reached end. Who got there first?
if (i == myEnd && j == yourEnd)
{
return 0; // identical strings
}
if (i == myEnd)
{
return -1;
} else
{
return +1;
}
}
public MyString substring(int beginIndex, int endIndex)
{
return new MyString(arr, beginIndex + offset, endIndex - beginIndex);
}
public MyString substring(int beginIndex)
{
return substring(beginIndex, offset + len);
}
public boolean equals(Object other)
{
return (other instanceof MyString) && compareTo((MyString) other) == 0;
}
public String toString()
{
return new String(arr, offset, len);
}
private MyString(char[] a, int of, int ln)
{
arr = a;
offset = of;
len = ln;
}
private char[] arr;
private int offset;
private int len;
}
Here:
for (int i = 0; i < N; i++)
{
suffixes[i] = snew.substring(i);
}
You are trying to store not only the entire long string, but also the entire string minus 1 letter, the entire string minus 2 letters, and so on. All of these are stored separately.
If your String were only 10 letters long, you would be storing a total of 55 characters in 10 different strings.
At 1000 characters, you are storing 500500 characters in total.
More generally, you have to handle length*(length+1)/2 characters.
Just for fun: I don't know how many characters are in War and Peace, but a page count of around 1250, a typical estimate of 250 words per page, and an average word length of about 5 characters give:
(1250 * 250 * 5)*(1250 * 250 * 5 + 1)/2 = 1.2207039 * 10^12 characters.
A char takes 2 bytes in memory, so you're looking at about 2.22 TiB (compared with about 2.98 MiB for the text of the novel itself).
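The arithmetic above can be checked with a few lines of Java. This is only a sketch of the estimate, assuming every suffix holds its own copy of its characters; the class name SuffixStorage and the method totalChars are illustrative:

```java
class SuffixStorage {
    // Total characters held when each suffix of a length-n string
    // is stored as its own copy: n + (n - 1) + ... + 1.
    static long totalChars(long n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        long novel = 1250L * 250 * 5; // rough character estimate from the text above
        System.out.println(totalChars(10));   // 55
        System.out.println(totalChars(1000)); // 500500
        System.out.println(totalChars(novel) * 2 + " bytes at 2 bytes per char");
    }
}
```

Using long matters here, because the total for the novel-sized input does not fit in an int.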
I count at least 3 copies of both texts in the first few lines of the code. Here are a few ideas:
convert the spaces as you read each line in--not after they are huge strings. Don't forget the case of spaces at the front and end of lines.
build your MyString class using StringBuilder as the base instead of String. Do all the looking inside the StringBuilder with its native methods, if you can.
don't extract strings any more than you have to.
Look up the -Xmx java runtime option and set the heap space larger than the default. You'll have to look up the details, as I don't have them memorized. Just note that the size follows the flag directly, without an equals sign, and needs a unit suffix: -Xmx1024M means 1024 megabytes. (Look at the file size to see how big the two books are.)
When you construct MyString, you call arr = str.toCharArray(); which makes a new copy of the string's character data. But in Java, a string is immutable - so why not store a reference to the string instead of a copy of its data?
You construct every suffix at once, but you only refer to one (well, two) at a time. If you recode your solution to only reference the suffixes it currently cares about, and construct them only when it needs them (and lose a reference to them afterwards), they can be garbage collected by Java. This will make running out of memory less likely. Compare the memory overhead of storing 2 strings to storing hundreds of thousands of strings :)
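One way to act on the advice above is to keep a single reference to the text and sort the starting positions of the suffixes instead of suffix objects. The following is a minimal sketch under that assumption, not a drop-in replacement for the SuffixArray class in the question; the names IndexSuffixArray, build and compareSuffixes are illustrative:

```java
import java.util.Arrays;

class IndexSuffixArray {
    // Returns the starting indices of all suffixes of `text`, ordered so that
    // the suffixes they point to are in ascending lexicographic order.
    static Integer[] build(final String text) {
        Integer[] order = new Integer[text.length()];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> compareSuffixes(text, a, b));
        return order;
    }

    // Compares the suffixes starting at a and b without creating substrings.
    static int compareSuffixes(String text, int a, int b) {
        int n = text.length();
        while (a < n && b < n) {
            char ca = text.charAt(a);
            char cb = text.charAt(b);
            if (ca != cb) {
                return ca - cb;
            }
            a++;
            b++;
        }
        // One suffix is a prefix of the other: the one that ran out first
        // is shorter and therefore sorts first.
        return b - a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(build("banana"))); // [5, 3, 1, 0, 4, 2]
    }
}
```

Only the index array and the original text are held in memory. An int[] with a custom sort would be more compact still; Integer[] is used here because Arrays.sort accepts a comparator only for object arrays.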
I wrote this program in Scala. Maybe you can translate it to Java.
class MyString private (private val string: String, startIndex: Int, endIndex: Int) extends Comparable[MyString] {
def this(string: String) = this(string, 0, string.length)
def length() = endIndex-startIndex
def charAt(i: Int) = {
if(i >= length) throw new IndexOutOfBoundsException
string.charAt(startIndex + i)
}
def substring(start: Int, end: Int): MyString = {
if(start < 0 || end > length || end < start) throw new IndexOutOfBoundsException
new MyString(string, startIndex + start, startIndex + end)
}
def substring(start: Int): MyString = substring(start, length)
def longestCommonSubstring(other: MyString): MyString = {
var index = 0
val len = math.min(length, other.length)
while(index < len && charAt(index) == other.charAt(index)) index += 1
substring(0, index)
}
def compareTo(other: MyString): Int = {
val len = math.min(length, other.length)
for(i <- 0 until len) {
if(charAt(i) > other.charAt(i)) return 1
if(charAt(i) < other.charAt(i)) return -1
}
length-other.length
}
def >(other: MyString) = compareTo(other) > 0
def <(other: MyString) = compareTo(other) < 0
override def equals(other: Any) = other.isInstanceOf[MyString] && compareTo(other.asInstanceOf[MyString]) == 0
override def toString() = "\"" + string.substring(startIndex, endIndex) + "\""
}
def readFile(name: String) = new MyString(io.Source.fromFile(name).getLines.mkString(" ").replaceAll("\\s+", " "))
def makeList(str: MyString) = (0 until str.length).map(i => str.substring(i)).toIndexedSeq
val string1 = readFile("WarAndPeace.txt")
val string2 = readFile("MobyDick.txt")
val (list1, list2) = (makeList(string1).sorted, makeList(string2).sorted)
var longestMatch = new MyString("")
var (index1, index2) = (0,0)
while(index1 < list1.size && index2 < list2.size) {
val lcs = list1(index1).longestCommonSubstring(list2(index2))
if(lcs.length > longestMatch.length) longestMatch = lcs
if(list1(index1) < list2(index2)) index1 += 1
else index2 += 1
}
println(longestMatch)