Find all common substrings with a minimum length of 2 characters

Find all common substrings with a minimum length of 2 characters - java

Task's content:
Find all (starting from the longest, the shortest ending) common substrings with a minimum length of 2 characters, which do not overlap with previously found.
My task is to remake below code or write new one. I don't even know how to start with it :/
public static void main(String[] args) {
String text1 = "AABABB";
String text2 = "BAABAB";
int len1 = text1.length();
int len2 = text2.length();
int max = 0;
int position_w1 = -1;
int position_w2 = -1;
for (int i = 0; i < len1 - max; i++)
{
for (int k = len2 - 1; k >= 0; k--)
{
int counter = 0;
int limit = Math.min(len2, (len1 -i + k) );
for (int j = k; j < limit; j++)
{
if (text1.charAt(i+j-k) == text2.charAt(j))
{
counter++;
if (max < counter)
{
max = counter;
position_w1 = i + j - k - max + 1;
position_w2 = j - max + 1;
}
}
else counter = 0;
}
}
}
System.out.println("Position of text1: " + position_w1 + ", position of text2: " + position_w2 + ", length: " + max);
System.out.println(text1.substring(0, position_w1) + "\u001B[31m"
+ text1.substring(position_w1, max + position_w1) + "\u001B[0m" + text1.substring(max + position_w1) );
System.out.println(text2.substring(0, position_w2) + "\u001B[31m"
+ text2.substring(position_w2, max + position_w2) + "\u001B[0m" + text2.substring(max + position_w2) );
}

Maybe try something like this :
public static void main(String[] args){
String text1 = "AABABB";
String text2 = "BAABAB";
int maxLen = 3;
for (int len = 2; len < maxLen; len++) {
ArrayList<Pair<Integer, Integer>> intervalsText1 = new ArrayList<>();
ArrayList<Pair<Integer, Integer>> intervalsText2 = new ArrayList<>();
for (int i = 0; i < text1.length() - len; i++) {
String sub1 = text1.substring(i, i + len);
for (int j = 0; j < text2.length() - len; i++) {
String sub2 = text2.substring(j, j + len);
if(sub1.equals(sub2)){
//check if overlapping
for(Pair<Integer, Integer> inter : intervalsText1){
for(Pair<Integer, Integer> inter : intervalsText1){
if(!isInsideInterval(inter, i) && !isInsideInterval(inter, i + len)){
if(!isInsideInterval(inter, j) && !isInsideInterval(inter, j + len)){
//the strings are equal and outside previous strings
intervalsText1.add(Pair.of(i, i + len));
intervalsText2.add(Pair.of(j, j + len));
}
}
}
}
}
}
}
}
}
public static boolean isInsideInterval(Pair<Integer, Integer> interval, int toCheck){
if(interval.getLeft() < toCheck && interval.getRight() > toCheck)
return true;
return false;
}
*Please note that I haven't checked it working

As this is home work, just some mental help.
Immediately diving into for-i like loops is hard to read and/or change.
Consider how one would do the task oneself, and see what the original writer did.
Starting with the maximal subsequence, would be:
int max = Math.min(len1, len2);
...
--max;
or even
for (int max = Math.min(len1, len2); may > 0; --max) {
for (int i = 0; i + max <= len1; ++i) {
if (match found) {
print match;
i += max - 1;
}
}
}
The for-k loop does a --k; a bit needlessly IMHO.
Matching a substring and when matched skip by i += max; or i += max - 1; would benefit from the String.substring or your own function for that.
By the way, when ready try to consider things like skipping a second occurrence of a sought substring: "AB" appears twice in text1. Good names important, comments might help. Otherwise demerits for "circumstantial code" might follow.
int notEarlierPos = text1.indexOf(text1.subtring(i, i + max));
if (notEarlierPos == i) { // Not already treated earlier.
int matchPos = text2.indexOf(text1.subtring(i, i + max));
if (matchPos != -1) {
print match;
i += max - 1;
}
}

Related

Most basic way to insert a character into a string in Java?

Say I have a string x and I want to add some character x amount of times so that string x becomes of length y, how would I do this?
String x = "this_is_a_line";
//x.length() = 14;
length y = 20;
//insert character into line so String x becomes balanced to:
x = "this___is___a___line";
//x.length() = 20;
Another example:
String y = "in_it_yeah";
//y.length() = 10
length j = 15;
//inserting characters so String y becomes balanced to:
y = "in____it___yeah";
I want to avoid using StringBuilder to append characters.
My thought process:
Create a Char array of length y.
Copy the string to array.
Attempt to shift the characters one by one
from the furthest character to the right.

I hope I understood you correctly but I think this code does what you're requesting.
Edit: this one will distribute the specified character evenly and can create strings like "a__b_c" if the string isn't long enough yet.
import java.util.ArrayList;
public class Main {
public static void main(String[] args) {
String string = "this_is_a_line";
int length = string.length();
int desiredLength = 16;
int extraLengthRequired = desiredLength - length;
char characterToBeDuplicated = '_';
String[] rawPieces = string.split(Character.toString(characterToBeDuplicated));
ArrayList<String> pieces = new ArrayList<>();
int emptyIndexes = 0;
for (String piece : rawPieces) {
if(piece.equals("")) {
emptyIndexes++;
} else {
pieces.add(piece);
}
}
int numOfCharactersToBeMultiplied = pieces.size() - 1;
int numOfMultiplications = (int) Math.floor((extraLengthRequired + emptyIndexes) / numOfCharactersToBeMultiplied) + 1;
int lengthLeft = (extraLengthRequired + emptyIndexes) % numOfCharactersToBeMultiplied;
String newString = pieces.get(0);
for (int i = 1; i < pieces.size(); i++) {
newString += characterToBeDuplicated;
if(lengthLeft > 0) {
newString += characterToBeDuplicated;
lengthLeft--;
}
for (int j = 1; j < numOfMultiplications; j++) {
newString += characterToBeDuplicated;
}
newString += pieces.get(i);
}
System.out.println(newString + " - " + newString.length());
}
}
Old solution:
public class Main {
public static void main(String[] args) {
String string = "this_is_a_line";
int length = string.length();
int desiredLength = 20;
int extraLengthRequired = desiredLength - length;
char characterToBeDuplicated = '_';
String[] pieces = string.split(Character.toString(characterToBeDuplicated));
int numOfCharactersToBeMultiplied = pieces.length - 1;
int numOfMultiplications = (int) Math.floor(extraLengthRequired / numOfCharactersToBeMultiplied);
String newString = pieces[0];
for (int i = 1; i < pieces.length; i++) {
newString += characterToBeDuplicated;
for (int j = 1; j < numOfMultiplications; j++) {
newString += characterToBeDuplicated;
}
newString += pieces[i];
}
System.out.println(newString);
}
}

I did this one because when I started doing a pseudo code it was a bit challenging for me, though I'm not a fan of answering questions like this on "gimme teh codez".
I recommend you though using StringBuilder on String concatenation inside for loops because it's more efficient than actual String concatenation with +=.
Note that:
This program adds spaces to the end, that's why I print it's length trimmed.
I split it by space with a regex \\s+ instead of _ because that's how you wrote it 1st
The following code works for
a____b_c -> a___b___c (Lenght 9)
a____b_c -> a____b___c (Lenght 10)
this_is_a_line -> this___is___a___line (Lenght 20)
in_it_yeah -> in____it___yeah (Length 15)
Code:
class EnlargeString {
public static void main (String args[]) {
String x = "a b c";
int larger = 10;
int numberOfSpaces = 0;
String s[] = x.split("\\s+"); //We get the number of words
numberOfSpaces = larger - s.length; //The number of spaces to be added after each token
int extraSpaces = numberOfSpaces % s.length; //Extra spaces for the cases of 4 spaces then 3 or something like that
System.out.println(extraSpaces);
String newSpace[] = new String[s.length]; //The String array that will contain all string between words
for (int i = 0; i < s.length; i++) {
newSpace[i] = " "; //Initialize the previous array
}
//Here we add the extra spaces (the cases of 4 and 3)
for (int i = 0; i < s.length; i++) {
if (extraSpaces == 0) {
break;
}
newSpace[i] += " ";
extraSpaces--;
}
for (int i = 0; i < s.length; i++) {
for (int j = 0; j < totalSpaces / s.length; j++) {
newSpace[i] += " "; //Here we add all the equal spaces for all tokens
}
}
String finalWord = "";
for (int i = 0; i < s.length; i++) {
finalWord += (s[i] + newSpace[i]); //Concatenate
}
System.out.println(x);
System.out.println(x.length());
System.out.println(finalWord);
System.out.println(finalWord.trim().length());
}
}
Next time try to do it yourself 1st and show YOUR logic, then your question will be better received (maybe with upvotes instead of downvotes) this is called a Runnable Example. I also recommend you to take a look at String concatenation vs StringBuilder and How to ask a good question, also you might want to Take the tour

Here is my solution:
public class Main
{
public static void main(String args[]){
System.out.print("#Enter width : " );
int width = BIO.getInt();
System.out.print("#Enter line of text : " );
String line = BIO.getString().trim();
int nGaps, spToAdd, gapsLeft, modLeft, rem;
nGaps = spToAdd = gapsLeft = rem = 0;
double route = 0;
String sp = " ";
while ( ! line.equals( "END" )){
nGaps = numGaps(line);
if (nGaps == 0) { line = compLine(line, width).replace(" ", "."); }
else if (nGaps == width) { line = line.replace(" ", "."); }
else{
int posArray[] = new int[nGaps];
posArray = pos(line, nGaps);
gapsLeft = width - line.length();
spToAdd = gapsLeft / nGaps;
modLeft = gapsLeft % nGaps;
route = gapsLeft / nGaps;
sp = spGen(spToAdd);
line = reFormat(posArray, line, width, sp, spToAdd);
if (line.length() < width){
System.out.print("#OK\n");
nGaps = numGaps(line);
int posArray2[] = new int[nGaps];
posArray2 = pos(line, nGaps);
line = compFormat(posArray2, line, modLeft);
}
line = line.replace(" ", ".");
}
System.out.println(line);
System.out.println("#Length is: " + line.length());
System.out.print("#Enter line of text : " );
line = BIO.getString().trim();
}
}
public static int numGaps(String oLine){
int numGaps = 0;
for (char c : oLine.toCharArray()) { if (c == ' ') { numGaps++; } }
return numGaps;
}
public static String spGen(int count) {
return new String(new char[count]).replace("\0", " ");
}
public static String compLine(String oLine, int width){
String newLine = oLine;
int pos = oLine.length();
int numOSpace = width - oLine.length();
String sp = spGen(numOSpace);
newLine = new StringBuilder(newLine).insert(pos, sp).toString();
return newLine;
}
public static int[] pos(String oLine, int nGaps){
int posArray[] = new int[nGaps];
int i = 0;
for (int pos = 0; pos < oLine.length(); pos++) {
if (oLine.charAt(pos) == ' ') { posArray[i] = pos; i++; }
}
//for (int y = 0; y < x; ++y) { System.out.println(posArray[y]); }
return posArray;
}
public static String reFormat(int[] posArray, String oLine, int width, String sp, int spToAdd){
String newLine = oLine;
int mark = 0;
for (int i = 0; i < posArray.length; ++i){ /*insert string at mark, shift next element by the num of elements inserted*/
if (newLine.length() > width) { System.out.println("Maths is wrong: ERROR"); System.exit(1);}
else { newLine = new StringBuilder(newLine).insert(posArray[i]+mark, sp).toString(); mark += spToAdd; }
}
return newLine;
}
public static String compFormat(int[] posArray2, String mLine, int modLeft){
String newLine = mLine;
int mark = 0;
for (int i = 0; i < modLeft; ++i){
//positions
//if position y is != y+1 insert sp modLeft times
if (posArray2[i] != posArray2[i+1] && posArray2[i] != posArray2[posArray2.length - 1]){
newLine = new StringBuilder(newLine).insert(posArray2[i]+mark, " ").toString(); mark++;
}
}
return newLine;
}
}

How to find the longest substring with equal amount of characters efficiently

I have a string that consists of characters A,B,C and D and I am trying to calculate the length of the longest substring that has an equal amount of each one of these characters in any order.
For example ABCDB would return 4, ABCC 0 and ADDBCCBA 8.
My code currently:
public int longestSubstring(String word) {
HashMap<Integer, String> map = new HashMap<Integer, String>();
for (int i = 0; i<word.length()-3; i++) {
map.put(i, word.substring(i, i+4));
}
StringBuilder sb;
int longest = 0;
for (int i = 0; i<map.size(); i++) {
sb = new StringBuilder();
sb.append(map.get(i));
int a = 4;
while (i<map.size()-a) {
sb.append(map.get(i+a));
a+= 4;
}
String substring = sb.toString();
if (equalAmountOfCharacters(substring)) {
int length = substring.length();
if (length > longest)
longest = length;
}
}
return longest;
}
This currently works pretty well if the string length is 10^4 but I'm trying to make it 10^5. Any tips or suggestions would be appreciated.

Let's assume that cnt(c, i) is the number of occurrences of the character c in the prefix of length i.
A substring (low, high] has an equal amount of two characters a and b iff cnt(a, high) - cnt(a, low) = cnt(b, high) - cnt(b, low), or, put it another way, cnt(b, high) - cnt(a, high) = cnt(b, low) - cnt(a, low). Thus, each position is described by a value of cnt(b, i) - cnt(a, i). Now we can generalize it for more that two characters: each position is described by a tuple (cnt(a_2, i) - cnt(a_1, i), ..., cnt(a_k, i) - cnt(a_1, i)), where a_1 ... a_k is the alphabet.
We can iterate over the given string and maintain the current tuple. At each step, we should update the answer by checking the value of i - first_occurrence(current_tuple), where first_occurrence is a hash table that stores the first occurrence of each tuple seen so far. Do not forget to put a tuple of zeros to the hash map before iteration(it corresponds to an empty prefix).

If there were only A's and B's, then you could do something like this.
def longest_balanced(word):
length = 0
cumulative_difference = 0
first_index = {0: -1}
for index, letter in enumerate(word):
if letter == 'A':
cumulative_difference += 1
elif letter == 'B':
cumulative_difference -= 1
else:
raise ValueError(letter)
if cumulative_difference in first_index:
length = max(length, index - first_index[cumulative_difference])
else:
first_index[cumulative_difference] = index
return length
Life is more complicated with all four letters, but the idea is much the same. Instead of keeping just one cumulative difference, for A's versus B's, we keep three, for A's versus B's, A's versus C's, and A's versus D's.

Well, first of all abstain from constructing any strings.
If you don't produce any (or nearly no) garbage, there's no need to collect it, which is a major plus.
Next, use a different data-structure:
I suggest 4 byte-arrays, storing the count of their respective symbol in the 4-span starting at the corresponding string-index.
That should speed it up considerably.

You can count the occurrences of the characters in word. Then, a possible solution could be:
If min is the minimum number of occurrences of any character in word, then min is also the maximum possible number of occurrences of each character in the substring we are looking for. In the code below, min is maxCount.
We iterate over decreasing values of maxCount. At every step, the string we are searching for will have length maxCount * alphabetSize. We can view this as the size of a sliding window we can slide over word.
We slide the window over word, counting the occurrences of the characters in the window. If the window is the substring we are searching for, we return the result. Otherwise, we keep searching.
[FIXED] The code:
private static final int ALPHABET_SIZE = 4;
public int longestSubstring(String word) {
// count
int[] count = new int[ALPHABET_SIZE];
for (int i = 0; i < word.length(); i++) {
char c = word.charAt(i);
count[c - 'A']++;
}
int maxCount = word.length();
for (int i = 0; i < count.length; i++) {
int cnt = count[i];
if (cnt < maxCount) {
maxCount = cnt;
}
}
// iterate over maxCount until found
boolean found = false;
while (maxCount > 0 && !found) {
int substringLength = maxCount * ALPHABET_SIZE;
found = findSubstring(substringLength, word, maxCount);
if (!found) {
maxCount--;
}
}
return found ? maxCount * ALPHABET_SIZE : 0;
}
private boolean findSubstring(int length, String word, int maxCount) {
int startIndex = 0;
boolean found = false;
while (startIndex + length <= word.length()) {
int[] count = new int[ALPHABET_SIZE];
for (int i = startIndex; i < startIndex + length; i++) {
char c = word.charAt(i);
int cnt = ++count[c - 'A'];
if (cnt > maxCount) {
break;
}
}
if (equalValues(count, maxCount)) {
found = true;
break;
} else {
startIndex++;
}
}
return found;
}
// Returns true if all values in c are equal to value
private boolean equalValues(int[] count, int value) {
boolean result = true;
for (int i : count) {
if (i != value) {
result = false;
break;
}
}
return result;
}
[MERGED] This is Hollis Waite's solution using cumulative counts, but taking my observations at points 1. and 2. into consideration. This may improve performance for some inputs:
private static final int ALPHABET_SIZE = 4;
public int longestSubstring(String word) {
// count
int[][] cumulativeCount = new int[ALPHABET_SIZE][];
for (int i = 0; i < ALPHABET_SIZE; i++) {
cumulativeCount[i] = new int[word.length() + 1];
}
int[] count = new int[ALPHABET_SIZE];
for (int i = 0; i < word.length(); i++) {
char c = word.charAt(i);
count[c - 'A']++;
for (int j = 0; j < ALPHABET_SIZE; j++) {
cumulativeCount[j][i + 1] = count[j];
}
}
int maxCount = word.length();
for (int i = 0; i < count.length; i++) {
int cnt = count[i];
if (cnt < maxCount) {
maxCount = cnt;
}
}
// iterate over maxCount until found
boolean found = false;
while (maxCount > 0 && !found) {
int substringLength = maxCount * ALPHABET_SIZE;
found = findSubstring(substringLength, word, maxCount, cumulativeCount);
if (!found) {
maxCount--;
}
}
return found ? maxCount * ALPHABET_SIZE : 0;
}
private boolean findSubstring(int length, String word, int maxCount, int[][] cumulativeCount) {
int startIndex = 0;
int endIndex = (startIndex + length) - 1;
boolean found = true;
while (endIndex < word.length()) {
for (int i = 0; i < ALPHABET_SIZE; i++) {
if (cumulativeCount[i][endIndex] - cumulativeCount[i][startIndex] != maxCount) {
found = false;
break;
}
}
if (found) {
break;
} else {
startIndex++;
endIndex++;
}
}
return found;
}

You'll probably want to cache cumulative counts of characters for each index of String -- that's where the real bottleneck is. Haven't thoroughly tested but something like the below should work.
public class Test {
static final int LEN = 4;
static class RandomCharSequence implements CharSequence {
private final Random mRandom = new Random();
private final int mAlphabetLen;
private final int mLen;
private final int mOffset;
RandomCharSequence(int pLen, int pOffset, int pAlphabetLen) {
mAlphabetLen = pAlphabetLen;
mLen = pLen;
mOffset = pOffset;
}
public int length() {return mLen;}
public char charAt(int pIdx) {
mRandom.setSeed(mOffset + pIdx);
return (char) (
'A' +
(mRandom.nextInt() % mAlphabetLen + mAlphabetLen) % mAlphabetLen
);
}
public CharSequence subSequence(int pStart, int pEnd) {
return new RandomCharSequence(pEnd - pStart, pStart, mAlphabetLen);
}
#Override public String toString() {
return (new StringBuilder(this)).toString();
}
}
public static void main(String[] pArgs) {
Stream.of("ABCDB", "ABCC", "ADDBCCBA", "DADDBCCBA").forEach(
pWord -> System.out.println(longestSubstring(pWord))
);
for (int i = 0; ; i++) {
final double len = Math.pow(10, i);
if (len >= Integer.MAX_VALUE) break;
System.out.println("Str len 10^" + i);
for (int alphabetLen = 1; alphabetLen <= LEN; alphabetLen++) {
final Instant start = Instant.now();
final int val = longestSubstring(
new RandomCharSequence((int) len, 0, alphabetLen)
);
System.out.println(
String.format(
" alphabet len %d; result %08d; time %s",
alphabetLen,
val,
formatMillis(ChronoUnit.MILLIS.between(start, Instant.now()))
)
);
}
}
}
static String formatMillis(long millis) {
return String.format(
"%d:%02d:%02d.%03d",
TimeUnit.MILLISECONDS.toHours(millis),
TimeUnit.MILLISECONDS.toMinutes(millis) -
TimeUnit.HOURS.toMinutes(TimeUnit.MILLISECONDS.toHours(millis)),
TimeUnit.MILLISECONDS.toSeconds(millis) -
TimeUnit.MINUTES.toSeconds(TimeUnit.MILLISECONDS.toMinutes(millis)),
TimeUnit.MILLISECONDS.toMillis(millis) -
TimeUnit.SECONDS.toMillis(TimeUnit.MILLISECONDS.toSeconds(millis))
);
}
static int longestSubstring(CharSequence pWord) {
// create array that stores cumulative char counts at each index of string
// idx 0 = char (A-D); idx 1 = offset
final int[][] cumulativeCnts = new int[LEN][];
for (int i = 0; i < LEN; i++) {
cumulativeCnts[i] = new int[pWord.length() + 1];
}
final int[] cumulativeCnt = new int[LEN];
for (int i = 0; i < pWord.length(); i++) {
cumulativeCnt[pWord.charAt(i) - 'A']++;
for (int j = 0; j < LEN; j++) {
cumulativeCnts[j][i + 1] = cumulativeCnt[j];
}
}
final int maxResult = Arrays.stream(cumulativeCnt).min().orElse(0) * LEN;
if (maxResult == 0) return 0;
int result = 0;
for (int initialOffset = 0; initialOffset < LEN; initialOffset++) {
for (
int start = initialOffset;
start < pWord.length() - result;
start += LEN
) {
endLoop:
for (
int end = start + result + LEN;
end <= pWord.length() && end - start <= maxResult;
end += LEN
) {
final int substrLen = end - start;
final int expectedCharCnt = substrLen / LEN;
for (int i = 0; i < LEN; i++) {
if (
cumulativeCnts[i][end] - cumulativeCnts[i][start] !=
expectedCharCnt
) {
continue endLoop;
}
}
if (substrLen > result) result = substrLen;
}
}
}
return result;
}
}

Suppose there are K possible letters in a string of length N. We could track the balance of letters seen with a vector pos of length K that is updated as follows:
If letter 1 is seen, add (K-1, -1, -1, ...)
If letter 2 is seen, add (-1, K-1, -1, ...)
If letter 3 is seen, add (-1, -1, K-1, ...)
Maintain a hash that maps pos to the first string position where pos is reached. Balanced substrings occur whenever hash[pos] already exists and the substring value is s[hash[pos]:pos].
The cost of maintaining the hash is O(log N) so processing the string takes O(N log N). How does this compare with solutions so far? These types of problems tend to have linear solutions but I haven't come across one yet.
Here's some code demonstrating the idea for 3 letters and a run using biased random strings. (Uniform random strings allow for solutions that are around half the string length, which is unwieldy to print).
#!/usr/bin/python
import random
from time import time
alphabet = "abc"
DIM = len(alphabet)
def random_string(n):
# return a random string over choices[] of length n
# distribution of letters is non-uniform to make matches harder to find
choices = "aabbc"
s = ''
for i in range(n):
r = random.randint(0, len(choices) - 1)
s += choices[r]
return s
def validate(s):
# verify frequencies of each letter are the same
f = [0, 0, 0]
a2f = {alphabet[i] : i for i in range(DIM)}
for c in s:
f[a2f[c]] += 1
assert f[0] == f[1] and f[1] == f[2]
def longest_balanced(s):
"""return length of longest substring of s containing equal
populations of each letter in alphabet"""
slen = len(s)
p = [0 for i in range(DIM)]
vec = {alphabet[0] : [2, -1, -1],
alphabet[1] : [-1, 2, -1],
alphabet[2] : [-1, -1, 2]}
x = -1
best = -1
hist = {str([0, 0, 0]) : -1}
for c in s:
x += 1
p = [p[i] + vec[c][i] for i in range(DIM)]
pkey = str(p)
if pkey not in hist:
hist[pkey] = x
else:
span = x - hist[pkey]
assert span % DIM == 0
if span > best:
best = span
cand = s[hist[pkey] + 1: x + 1]
print("best so far %d = [%d,%d]: %s" % (best,
hist[pkey] + 1,
x + 1,
cand))
validate(cand)
return best if best > -1 else 0
def main():
#print longest_balanced( "aaabcabcbbcc" )
t0 = time()
s = random_string(1000000)
print "generate time:", time() - t0
t1 = time()
best = longest_balanced( s )
print "best:", best
print "elapsed:", time() - t1
main()
Sample run on an input of 10^6 letters with an alphabet of 3 letters:
$ ./bal.py
...
best so far 189 = [847894,848083]: aacacbcbabbbcabaabbbaabbbaaaacbcaaaccccbcbcbababaabbccccbbabbacabbbbbcaacacccbbaacbabcbccaabaccabbbbbababbacbaaaacabcbabcbccbabbccaccaabbcabaabccccaacccccbaacaaaccbbcbcabcbcacaabccbacccacca
best: 189
elapsed: 1.43609690666

How can I get the longest increasing subsequence in a string?

I'm pretty rusty on my Java skills but I was trying to write a program that prompts the user to enter a string and displays a maximum length increasing ordered subsequence of characters. For example, if the user entered Welcome the program would output Welo. If the user entered WWWWelllcommmeee, the program would still output Welo. I've gotten this much done but it's not doing what it should be and I'm honestly at a loss as to why.
import java.util.ArrayList;
import java.util.Scanner;
public class Stuff {
public static void main(String[] args) {
Scanner input = new Scanner(System.in);
System.out.println("Please enter a string. ");
String userString = input.next();
ArrayList charList = new ArrayList();
ArrayList finalList = new ArrayList();
int currentLength = 0;
int max = 0;
for(int i = 0; i < userString.length(); i++){
charList.add(userString.charAt(i));
for(int j = i; j < userString.length(); j++){
int k=j+1;
if(k < userString.length() && userString.charAt(k) > userString.charAt(j)){
charList.add(userString.charAt(j));
currentLength++;
}
}
}
if(max < currentLength){
max = currentLength;
finalList.addAll(charList);
}
for (int i = 0; i < finalList.size(); i++){
char item = (char) finalList.get(i);
System.out.print(item);
}
int size1 = charList.size();
int size2 = finalList.size();
System.out.println("");
System.out.println("Size 1 is: " + size1 + " Size 2 is : " + size2);
}
}
My code, if I input Welcome, outputs WWeceeclcccome.
Does anyone have some tips on what I'm doing wrong?

In these cases it tends to help to step away from the keyboard and think about the algorithm you're trying to implement. Try to explain it first in words.
You are constructing a list of individual characters by appending each of the characters in the input string followed by characters to its right that are in correct alphabetical with their successor. For the input "Welcome" this means the accumulated output will be, showing the outer loop in vertical and inner loop in horizontal:
W W e c
e e c
l c
c c
o
m
e
In total: WWeceeclccome

I can't see the logic of this implementation. Here is a faster solution which runs in O(nlogn) time.
import java.util.Scanner;
public class Stuff
{
//return the index of the first element that's not less than the target element
public static int bsearch(char[] arr, int size, int key)
{
int left = 0;
int right = size - 1;
int mid;
while (left <= right)
{
mid = (left + right) / 2;
if(arr[mid] < key)
left = mid + 1;
else
right = mid - 1;
}
return left;
}
public static void main(String[] args)
{
Scanner input = new Scanner(System.in);
System.out.println("Please enter a string: ");
String userString = input.next();
char[] maxArr = new char[userString.length()];
char[] precedent = new char[userString.length()];
maxArr[0] = userString.charAt(0);
precedent[0] = userString.charAt(0);
int len = 1;
for(int i = 1; i < userString.length(); i++)
{
if(userString.charAt(i) > maxArr[len - 1])
{
maxArr[len] = userString.charAt(i);
precedent[len] = userString.charAt(i);
len++;
}
else
maxArr[bsearch(maxArr, len, userString.charAt(i))] = userString.charAt(i);
}
//System.out.println(len);
for(int i = 0; i < len; i++)
System.out.print(precedent[i]);
}
}

Using Dynamic Programming O(N^2) in lexicography order mean if i/p is abcbcbcd then o/p can be abcccd, abbbcd, abbccd but as per lexicography order o/p will be abbbcd.
public static String longestIncreasingSubsequence(String input1) {
int dp[] = new int[input1.length()];
int i,j,max = 0;
StringBuilder str = new StringBuilder();
/* Initialize LIS values for all indexes */
for ( i = 0; i < input1.length(); i++ )
dp[i] = 1;
/* Compute optimized LIS values in bottom up manner */
for ( i = 1; i < input1.length(); i++ )
for ( j = 0; j < i; j++ )
if (input1.charAt(i) >= input1.charAt(j) && dp[i] < dp[j]+1)
dp[i] = dp[j] + 1;
/* Pick maximum of all LIS values */
for ( i = 0; i < input1.length(); i++ ) {
if ( max < dp[i] ) {
max = dp[i];
if (i + 1 < input1.length() && max == dp[i+1] && input1.charAt(i+1) < input1.charAt(i)) {
str.append(input1.charAt(i+1));
i++;
} else {
str.append(input1.charAt(i));
}
}
}
return str.toString();
}

How to organize a single array into a two-dimensional one?

I am trying to split a one dimensional array of display modes into a 2-dimensional string array, though I have encountered trouble where I got to the point of splitting the sorted array of display modes. My question: How can I split the single array, which is already sorted, into a two dimensional array? The code (Sorry about the weird variable names):
public static String[][] OrganizeDisplayModes (DisplayMode[] modes) {
int iter = 0;
int deltaIter = 0;
int rows = 0;
int columns = 0;
String[][] tobe;
//bubble sorting
for (int a = 0; a < modes.length - 1; a++) {
for(int i = 0; i < modes.length - 1; i++) {
if (modes[i].getWidth() < modes[i+1].getWidth()) {
DisplayMode change = modes[i];
modes[i] = modes[i+1];
modes[i+1] = change;
}
}
}
for(int i = 0; i < modes.length - 1; i++) {
if ((modes[i].getWidth() == modes[i+1].getWidth()) && (modes[i].getBitsPerPixel() < modes[i+1].getBitsPerPixel())) {
DisplayMode change = modes[i];
modes[i] = modes[i+1];
modes[i+1] = change;
}
}
for(int i = 0; i < modes.length - 1; i++) {
if ((modes[i].getWidth() == modes[i+1].getWidth()) && (modes[i].getFrequency() < modes[i+1].getFrequency())) {
DisplayMode change = modes[i];
modes[i] = modes[i+1];
modes[i+1] = change;
}
}
for(int i = 0; i < modes.length; i++) {
DisplayMode current = modes[i];
System.out.println(i + ". " + current.getWidth() + "x" + current.getHeight() + "x" +
current.getBitsPerPixel() + " " + current.getFrequency() + "Hz");
}
//fit into string array
for (int i = 0; i < modes.length - 1; i++) {
if (!(modes[i].getWidth() == modes[i+1].getWidth())) {
rows += 1;
deltaIter = i - deltaIter;
if (deltaIter > columns)
columns = deltaIter;
}
}
//split the displaymode array into the two-dimensional string one here
tobe = new String[rows][columns];
for (int i = 0; i < rows; i++) {
for (int a = 0; a < columns; a++) {
if((modes[iter].getWidth() == modes[iter+1].getWidth())) {
tobe[i][a] = iter + ". " + modes[iter].toString() + " ";
}
else
break;
if (!(iter >= 68))
iter += 1;
}
if (iter >= 68)
break;
}
tobe[rows-1][columns-1] = (iter + 1) + ". " + modes[iter].toString() + " ";
//test to see that it works
for (int i = 0; i < rows; i++) {
for (int a = 0; a < columns; a++) {
if(tobe[i][a] != null)
System.out.print(tobe[i][a]);
else
break;
}
System.out.println("");
}
System.exit(0);
return null;
}
The output looks like this:
0. 1440x900x32 75Hz
1. 1440x900x16 75Hz
2. 1440x900x32 60Hz
3. 1440x900x16 60Hz
with all of the different possible resolutions. Basically, what I'm trying to do is make a readable list of the different resolutions, so a user can select their desired one. Thank you.

How about storing the information in a different manner? When I made the transition from C to Java I used to use multi-dimensional arrays quite a bit, but it's a pain.
public class ScreenType implements Comparable {
int width;
int height;
int color; // not sure what this one was for, assuming 32/16 bit color
int hertz;
public ScreenType(int w, int h, int c, int h) {
width = w;
height = h;
color = t;
hertz = h;
}
// remember to maintain transitive property,
// see JDK documentation for Comparable
#Override
public int compareTo(Tuple other) {
}
// getters and setters and whatnot
#Override
public String toString() {
return width + "X" + height + "X" + color+ " " + hertz + "hz";
}
}
Then store your screen data like this and use an ArrayList of ScreenTypes which, if I remember right, will sort based on your overriding of compareTo()

How to find the longest substring containing two unique repeating characters

The task is to find the longest substring in a given string that is composed of any two unique repeating characters
Ex. in an input string "aabadefghaabbaagad", the longest such string is "aabbaa"
I came up with the following solution but wanted to see if there is a more efficient way to do the same
import java.util.*;
public class SubString {
public static void main(String[] args) {
//String inStr="defghgadaaaaabaababbbbbbd";
String inStr="aabadefghaabbaagad";
//String inStr="aaaaaaaaaaaaaaaaaaaa";
System.out.println("Input string is "+inStr);
StringBuilder sb = new StringBuilder(inStr.length());
String subStr="";
String interStr="";
String maxStr="";
int start=0,length=0, maxStart=0, maxlength=0, temp=0;
while(start+2<inStr.length())
{ int i=0;
temp=start;
char x = inStr.charAt(start);
char y = inStr.charAt(start+1);
sb.append(x);
sb.append(y);
while( (x==y) && (start+2<inStr.length()) )
{ start++;
y = inStr.charAt(start+1);
sb.append(y);
}
subStr=inStr.substring(start+2);
while(i<subStr.length())
{ if(subStr.charAt(i)==x || subStr.charAt(i)==y )
{ sb.append(subStr.charAt(i));
i++;
}
else
break;
}
interStr= sb.toString();
System.out.println("Intermediate string "+ interStr);
length=interStr.length();
if(maxlength<length)
{ maxlength=length;
length=0;
maxStr = new String(interStr);
maxStart=temp;
}
start++;
sb.setLength(0);
}
System.out.println("");
System.out.println("Longest string is "+maxStr.length()+" chars long "+maxStr);
}
}

Here's a hint that might guide you towards a linear-time algorithm (I assume that this is homework, so I won't give the entire solution): At the point where you have found a character that is neither equal to x nor to y, it is not necessary to go all the way back to start + 1 and restart the search. Let's take the string aabaaddaa. At the point where you have seen aabaa and the next character is d, there is no point in restarting the search at index 1 or 2, because in those cases, you'll only get abaa or baa before hitting d again. As a matter of fact, you can move start directly to index 3 (the first index of the last group of as), and since you already know that there is a contiguous sequene of as up to d, you can move i to index 5 and continue.
Edit: Pseudocode below.
// Find the first letter that is not equal to the first one,
// or return the entire string if it consists of one type of characters
int start = 0;
int i = 1;
while (i < str.length() && str[i] == str[start])
i++;
if (i == str.length())
return str;
// The main algorithm
char[2] chars = {str[start], str[i]};
int lastGroupStart = 0;
while (i < str.length()) {
if (str[i] == chars[0] || str[i] == chars[1]) {
if (str[i] != str[i - 1])
lastGroupStart = i;
}
else {
//TODO: str.substring(start, i) is a locally maximal string;
// compare it to the longest one so far
start = lastGroupStart;
lastGroupStart = i;
chars[0] = str[start];
chars[1] = str[lastGroupStart];
}
i++;
}
//TODO: After the loop, str.substring(start, str.length())
// is also a potential solution.

Same question to me, I wrote this code
public int getLargest(char [] s){
if(s.length<1) return s.length;
char c1 = s[0],c2=' ';
int start = 1,l=1, max=1;
int i = 1;
while(s[start]==c1){
l++;
start++;
if(start==s.length) return start;
}
c2 = s[start];
l++;
for(i = l; i<s.length;i++){
if(s[i]==c1 || s[i]==c2){
if(s[i]!=s[i-1])
start = i;
l++;
}
else {
l = i-start+1;
c1 = s[start];
c2 = s[i];
start = i;
}
max = Math.max(l, max);
}
return max;
}

so the way I think of this is to solve it in 2 steps
scan the entire string to find continuous streams of the same letter
loop the extracted segments and condense them until u get a gap.
This way you can also modify the logic to scan for longest sub-string of any length not just 2.
class Program
{
static void Main(string[] args)
{
//.
string input = "aabbccdddxxxxxxxxxxxxxxxxx";
int max_chars = 2;
//.
int flip = 0;
var scanned = new List<string>();
while (flip > -1)
{
scanned.Add(Scan(input, flip, ref flip));
}
string found = string.Empty;
for(int i=0;i<scanned.Count;i++)
{
var s = Condense(scanned, i, max_chars);
if (s.Length > found.Length)
{
found = s;
}
}
System.Console.WriteLine("Found:" + found);
System.Console.ReadLine();
}
/// <summary>
///
/// </summary>
/// <param name="s"></param>
/// <param name="start"></param>
/// <returns></returns>
private static string Scan(string s, int start, ref int flip)
{
StringBuilder sb = new StringBuilder();
flip = -1;
sb.Append(s[start]);
for (int i = start+1; i < s.Length; i++)
{
if (s[i] == s[i - 1]) { sb.Append(s[i]); continue; } else { flip=i; break;}
}
return sb.ToString();
}
/// <summary>
///
/// </summary>
/// <param name="list"></param>
/// <param name="start"></param>
/// <param name="repeat"></param>
/// <param name="flip"></param>
/// <returns></returns>
private static string Condense(List<string> list, int start, int repeat)
{
StringBuilder sb = new StringBuilder();
List<char> domain = new List<char>(){list[start][0]};
for (int i = start; i < list.Count; i++)
{
bool gap = false;
for (int j = 0; j < domain.Count; j++)
{
if (list[i][0] == domain[j])
{
sb.Append(list[i]);
break;
}
else if (domain.Count < repeat)
{
domain.Add(list[i][0]);
sb.Append(list[i]);
break;
}
else
{
gap=true;
break;
}
}
if (gap) { break;}
}
return sb.ToString();
}
}

A general solution: Longest Substring Which Contains K Unique Characters.
int longestKCharSubstring(string s, int k) {
int i, max_len = 0, start = 0;
// either unique char & its last pos
unordered_map<char, int> ht;
for (i = 0; i < s.size(); i++) {
if (ht.size() < k || ht.find(s[i]) != ht.end()) {
ht[s[i]] = i;
} else {
// (k + 1)-th char
max_len = max(max_len, i - start);
// start points to the next of the earliest char
char earliest_char;
int earliest_char_pos = INT_MAX;
for (auto key : ht)
if (key.second < earliest_char_pos)
earliest_char = key.first;
start = ht[earliest_char] + 1;
// replace earliest_char
ht.erase(earliest_char);
ht[s[i]] = i;
}
}
// special case: e.g., "aaaa" or "aaabb" when k = 2
if (k == ht.size())
max_len = max(max_len, i - start);
return max_len;
}

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap; import java.util.Iterator; import java.util.List;
import java.util.Map;
public class PrintLLargestSubString {
public static void main(String[] args){ String string =
"abcdefghijklmnopqrstuvbcdefghijklmnopbcsdcelfabcdefghi";
List<Integer> list = new ArrayList<Integer> (); List<Integer>
keyList = new ArrayList<Integer> (); List<Integer> Indexlist = new
ArrayList<Integer> (); List<Integer> DifferenceList = new
ArrayList<Integer> (); Map<Integer, Integer> map = new
HashMap<Integer, Integer>(); int index = 0; int len = 1; int
j=1; Indexlist.add(0); for(int i = 0; i< string.length() ;i++) {
if(j< string.length()){
if(string.charAt(i) < string.charAt(j)){
len++;
list.add(len);
} else{
index= i+1;
Indexlist.add(index); // System.out.println("\nindex" + index);
len=1;
} } j++; } // System.out.println("\nlist" +list); System.out.println("index List" +Indexlist); // int n =
Collections.max(list); // int ind = Collections.max(Indexlist);
// System.out.println("Max number in IndexList " +n);
// System.out.println("Index Max is " +ind);
//Finding max difference in a list of elements for(int diff = 0;
diff< Indexlist.size()-1;diff++){ int difference =
Indexlist.get(diff+1)-Indexlist.get(diff);
map.put(Indexlist.get(diff), difference);
DifferenceList.add(difference); }
System.out.println("Difference between indexes" +DifferenceList); // Iterator<Integer> keySetIterator = map.keySet().iterator(); // while(keySetIterator.hasNext()){
// Integer key = keySetIterator.next();
// System.out.println("index: " + key + "\tDifference "
+map.get(key)); // // } // System.out.println("Diffferenece List" +DifferenceList); int maxdiff = Collections.max(DifferenceList); System.out.println("Max diff is " + maxdiff); ////// Integer
value = maxdiff; int key = 0; keyList.addAll(map.keySet());
Collections.sort(keyList); System.out.println("List of al keys"
+keyList); // System.out.println(map.entrySet()); for(Map.Entry entry: map.entrySet()){ if(value.equals(entry.getValue())){
key = (int) entry.getKey(); } } System.out.println("Key value of max difference starting element is " + key);
//Iterating key list and finding next key value int next = 0 ;
int KeyIndex = 0; int b; for(b= 0; b<keyList.size(); b++) {
if(keyList.get(b)==key){
KeyIndex = b; } } System.out.println("index of key\t" +KeyIndex); int nextIndex = KeyIndex+1; System.out.println("next Index = " +nextIndex); next = keyList.get(nextIndex);
System.out.println("next Index value is = " +next);
for( int z = KeyIndex; z < next ; z++) {
System.out.print(string.charAt(z)); } }
}

The problem can be solved in O(n). Idea is to maintain a window and add elements to the window till it contains less or equal 2, update our result if required while doing so. If unique elements exceeds than required in window, start removing the elements from left side.
#code
from collections import defaultdict
def solution(s, k):
length = len(set(list(s)))
count_dict = defaultdict(int)
if length < k:
return "-1"
res = []
final = []
maxi = -1
for i in range(0, len(s)):
res.append(s[i])
if len(set(res)) <= k:
if len(res) >= maxi and len(set(res)) <= k :
maxi = len(res)
final = res[:]
count_dict[maxi] += 1
else:
while len(set(res)) != k:
res = res[1:]
if maxi <= len(res) and len(set(res)) <= k:
maxi = len(res)
final = res[:]
count_dict[maxi] += 1
return len(final)
print(solution(s, k))

The idea here is to add occurrence of each character to a hashmap, and when the hasmap size increases more than k, remove the unwanted character.
private static int getMaxLength(String str, int k) {
if (str.length() == k)
return k;
var hm = new HashMap<Character, Integer>();
int maxLength = 0;
int startCounter = 0;
for (int i = 0; i < str.length(); i++) {
char c = str.charAt(i);
if (hm.get(c) != null) {
hm.put(c, hm.get(c) + 1);
} else {
hm.put(c, 1);
}
//atmost K different characters
if (hm.size() > k) {
maxLength = Math.max(maxLength, i - startCounter);
while (hm.size() > k) {
char t = str.charAt(startCounter);
int count = hm.get(t);
if (count > 1) {
hm.put(t, count - 1);
} else {
hm.remove(t);
}
startCounter++;
}
}
}
return maxLength;
}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Find all common substrings with a minimum length of 2 characters - java

Related

Most basic way to insert a character into a string in Java?

How to find the longest substring with equal amount of characters efficiently

How can I get the longest increasing subsequence in a string?

How to organize a single array into a two-dimensional one?

How to find the longest substring containing two unique repeating characters

Categories

Resources