How to find the longest substring with equal amount of characters efficiently


I have a string that consists of characters A,B,C and D and I am trying to calculate the length of the longest substring that has an equal amount of each one of these characters in any order.
For example ABCDB would return 4, ABCC 0 and ADDBCCBA 8.
My code currently:
public int longestSubstring(String word) {
HashMap<Integer, String> map = new HashMap<Integer, String>();
for (int i = 0; i<word.length()-3; i++) {
map.put(i, word.substring(i, i+4));
}
StringBuilder sb;
int longest = 0;
for (int i = 0; i<map.size(); i++) {
sb = new StringBuilder();
sb.append(map.get(i));
int a = 4;
while (i<map.size()-a) {
sb.append(map.get(i+a));
a+= 4;
}
String substring = sb.toString();
if (equalAmountOfCharacters(substring)) {
int length = substring.length();
if (length > longest)
longest = length;
}
}
return longest;
}
This currently works pretty well if the string length is 10^4 but I'm trying to make it 10^5. Any tips or suggestions would be appreciated.

Let's assume that cnt(c, i) is the number of occurrences of the character c in the prefix of length i.
A substring (low, high] has an equal amount of two characters a and b iff cnt(a, high) - cnt(a, low) = cnt(b, high) - cnt(b, low), or, put another way, cnt(b, high) - cnt(a, high) = cnt(b, low) - cnt(a, low). Thus, each position is described by the value of cnt(b, i) - cnt(a, i). Now we can generalize this to more than two characters: each position is described by a tuple (cnt(a_2, i) - cnt(a_1, i), ..., cnt(a_k, i) - cnt(a_1, i)), where a_1 ... a_k is the alphabet.
We can iterate over the given string and maintain the current tuple. At each step, we update the answer by checking the value of i - first_occurrence(current_tuple), where first_occurrence is a hash table that stores the first occurrence of each tuple seen so far. Do not forget to put a tuple of zeros into the hash map before iterating (it corresponds to the empty prefix).
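A minimal Java sketch of the approach described above, assuming the alphabet is exactly A to D. The class name BalancedSubstring, the method name longestBalanced, and the choice to encode the tuple as a comma-separated string key are illustrative assumptions, not part of the original answer.

```java
import java.util.HashMap;
import java.util.Map;

class BalancedSubstring {
    // Length of the longest substring of word (letters 'A'..'D' only, by assumption)
    // in which all four letters occur equally often.
    static int longestBalanced(String word) {
        int[] cnt = new int[4];
        // Maps a difference tuple to the shortest prefix length at which it was seen.
        Map<String, Integer> firstOccurrence = new HashMap<>();
        firstOccurrence.put("0,0,0", 0); // tuple of zeros for the empty prefix
        int best = 0;
        for (int i = 1; i <= word.length(); i++) {
            cnt[word.charAt(i - 1) - 'A']++;
            // Tuple (cnt(B) - cnt(A), cnt(C) - cnt(A), cnt(D) - cnt(A)) for the prefix of length i.
            String key = (cnt[1] - cnt[0]) + "," + (cnt[2] - cnt[0]) + "," + (cnt[3] - cnt[0]);
            Integer first = firstOccurrence.putIfAbsent(key, i);
            if (first != null) {
                // Same tuple at prefixes first and i: the substring (first, i] is balanced.
                best = Math.max(best, i - first);
            }
        }
        return best;
    }
}
```

With a hash map this runs in expected linear time for a fixed alphabet, since each position costs one key construction and one lookup.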

If there were only A's and B's, then you could do something like this.
def longest_balanced(word):
    length = 0
    cumulative_difference = 0
    first_index = {0: -1}
    for index, letter in enumerate(word):
        if letter == 'A':
            cumulative_difference += 1
        elif letter == 'B':
            cumulative_difference -= 1
        else:
            raise ValueError(letter)
        if cumulative_difference in first_index:
            length = max(length, index - first_index[cumulative_difference])
        else:
            first_index[cumulative_difference] = index
    return length
Life is more complicated with all four letters, but the idea is much the same. Instead of keeping just one cumulative difference, for A's versus B's, we keep three, for A's versus B's, A's versus C's, and A's versus D's.

Well, first of all abstain from constructing any strings.
If you don't produce any (or nearly no) garbage, there's no need to collect it, which is a major plus.
Next, use a different data-structure:
I suggest 4 byte-arrays, storing the count of their respective symbol in the 4-span starting at the corresponding string-index.
That should speed it up considerably.

You can count the occurrences of the characters in word. Then, a possible solution could be:
1. If min is the minimum number of occurrences of any character in word, then min is also the maximum possible number of occurrences of each character in the substring we are looking for. In the code below, min is maxCount.
2. We iterate over decreasing values of maxCount. At every step, the string we are searching for will have length maxCount * alphabetSize. We can view this as the size of a sliding window we can slide over word.
3. We slide the window over word, counting the occurrences of the characters in the window. If the window is the substring we are searching for, we return the result. Otherwise, we keep searching.
The code:
private static final int ALPHABET_SIZE = 4;
public int longestSubstring(String word) {
// count
int[] count = new int[ALPHABET_SIZE];
for (int i = 0; i < word.length(); i++) {
char c = word.charAt(i);
count[c - 'A']++;
}
int maxCount = word.length();
for (int i = 0; i < count.length; i++) {
int cnt = count[i];
if (cnt < maxCount) {
maxCount = cnt;
}
}
// iterate over maxCount until found
boolean found = false;
while (maxCount > 0 && !found) {
int substringLength = maxCount * ALPHABET_SIZE;
found = findSubstring(substringLength, word, maxCount);
if (!found) {
maxCount--;
}
}
return found ? maxCount * ALPHABET_SIZE : 0;
}
private boolean findSubstring(int length, String word, int maxCount) {
int startIndex = 0;
boolean found = false;
while (startIndex + length <= word.length()) {
int[] count = new int[ALPHABET_SIZE];
for (int i = startIndex; i < startIndex + length; i++) {
char c = word.charAt(i);
int cnt = ++count[c - 'A'];
if (cnt > maxCount) {
break;
}
}
if (equalValues(count, maxCount)) {
found = true;
break;
} else {
startIndex++;
}
}
return found;
}
// Returns true if all values in c are equal to value
private boolean equalValues(int[] count, int value) {
boolean result = true;
for (int i : count) {
if (i != value) {
result = false;
break;
}
}
return result;
}
This is Hollis Waite's solution using cumulative counts, but taking my observations at points 1. and 2. into consideration. This may improve performance for some inputs:
private static final int ALPHABET_SIZE = 4;
public int longestSubstring(String word) {
// count
int[][] cumulativeCount = new int[ALPHABET_SIZE][];
for (int i = 0; i < ALPHABET_SIZE; i++) {
cumulativeCount[i] = new int[word.length() + 1];
}
int[] count = new int[ALPHABET_SIZE];
for (int i = 0; i < word.length(); i++) {
char c = word.charAt(i);
count[c - 'A']++;
for (int j = 0; j < ALPHABET_SIZE; j++) {
cumulativeCount[j][i + 1] = count[j];
}
}
int maxCount = word.length();
for (int i = 0; i < count.length; i++) {
int cnt = count[i];
if (cnt < maxCount) {
maxCount = cnt;
}
}
// iterate over maxCount until found
boolean found = false;
while (maxCount > 0 && !found) {
int substringLength = maxCount * ALPHABET_SIZE;
found = findSubstring(substringLength, word, maxCount, cumulativeCount);
if (!found) {
maxCount--;
}
}
return found ? maxCount * ALPHABET_SIZE : 0;
}
private boolean findSubstring(int length, String word, int maxCount, int[][] cumulativeCount) {
int startIndex = 0;
// exclusive end: cumulativeCount[c][k] holds the count of c in the prefix of length k
int endIndex = startIndex + length;
boolean found = false;
while (endIndex <= word.length()) {
found = true; // reset for every window
for (int i = 0; i < ALPHABET_SIZE; i++) {
if (cumulativeCount[i][endIndex] - cumulativeCount[i][startIndex] != maxCount) {
found = false;
break;
}
}
if (found) {
break;
} else {
startIndex++;
endIndex++;
}
}
return found;
}

You'll probably want to cache cumulative counts of characters for each index of String -- that's where the real bottleneck is. Haven't thoroughly tested but something like the below should work.
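import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;
public class Test {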
static final int LEN = 4;
static class RandomCharSequence implements CharSequence {
private final Random mRandom = new Random();
private final int mAlphabetLen;
private final int mLen;
private final int mOffset;
RandomCharSequence(int pLen, int pOffset, int pAlphabetLen) {
mAlphabetLen = pAlphabetLen;
mLen = pLen;
mOffset = pOffset;
}
public int length() {return mLen;}
public char charAt(int pIdx) {
mRandom.setSeed(mOffset + pIdx);
return (char) (
'A' +
(mRandom.nextInt() % mAlphabetLen + mAlphabetLen) % mAlphabetLen
);
}
public CharSequence subSequence(int pStart, int pEnd) {
return new RandomCharSequence(pEnd - pStart, pStart, mAlphabetLen);
}
@Override public String toString() {
return (new StringBuilder(this)).toString();
}
}
public static void main(String[] pArgs) {
Stream.of("ABCDB", "ABCC", "ADDBCCBA", "DADDBCCBA").forEach(
pWord -> System.out.println(longestSubstring(pWord))
);
for (int i = 0; ; i++) {
final double len = Math.pow(10, i);
if (len >= Integer.MAX_VALUE) break;
System.out.println("Str len 10^" + i);
for (int alphabetLen = 1; alphabetLen <= LEN; alphabetLen++) {
final Instant start = Instant.now();
final int val = longestSubstring(
new RandomCharSequence((int) len, 0, alphabetLen)
);
System.out.println(
String.format(
" alphabet len %d; result %08d; time %s",
alphabetLen,
val,
formatMillis(ChronoUnit.MILLIS.between(start, Instant.now()))
)
);
}
}
}
static String formatMillis(long millis) {
return String.format(
"%d:%02d:%02d.%03d",
TimeUnit.MILLISECONDS.toHours(millis),
TimeUnit.MILLISECONDS.toMinutes(millis) -
TimeUnit.HOURS.toMinutes(TimeUnit.MILLISECONDS.toHours(millis)),
TimeUnit.MILLISECONDS.toSeconds(millis) -
TimeUnit.MINUTES.toSeconds(TimeUnit.MILLISECONDS.toMinutes(millis)),
TimeUnit.MILLISECONDS.toMillis(millis) -
TimeUnit.SECONDS.toMillis(TimeUnit.MILLISECONDS.toSeconds(millis))
);
}
static int longestSubstring(CharSequence pWord) {
// create array that stores cumulative char counts at each index of string
// idx 0 = char (A-D); idx 1 = offset
final int[][] cumulativeCnts = new int[LEN][];
for (int i = 0; i < LEN; i++) {
cumulativeCnts[i] = new int[pWord.length() + 1];
}
final int[] cumulativeCnt = new int[LEN];
for (int i = 0; i < pWord.length(); i++) {
cumulativeCnt[pWord.charAt(i) - 'A']++;
for (int j = 0; j < LEN; j++) {
cumulativeCnts[j][i + 1] = cumulativeCnt[j];
}
}
final int maxResult = Arrays.stream(cumulativeCnt).min().orElse(0) * LEN;
if (maxResult == 0) return 0;
int result = 0;
for (int initialOffset = 0; initialOffset < LEN; initialOffset++) {
for (
int start = initialOffset;
start < pWord.length() - result;
start += LEN
) {
endLoop:
for (
int end = start + result + LEN;
end <= pWord.length() && end - start <= maxResult;
end += LEN
) {
final int substrLen = end - start;
final int expectedCharCnt = substrLen / LEN;
for (int i = 0; i < LEN; i++) {
if (
cumulativeCnts[i][end] - cumulativeCnts[i][start] !=
expectedCharCnt
) {
continue endLoop;
}
}
if (substrLen > result) result = substrLen;
}
}
}
return result;
}
}

Suppose there are K possible letters in a string of length N. We could track the balance of letters seen with a vector pos of length K that is updated as follows:
If letter 1 is seen, add (K-1, -1, -1, ...)
If letter 2 is seen, add (-1, K-1, -1, ...)
If letter 3 is seen, add (-1, -1, K-1, ...)
Maintain a hash that maps pos to the first string position where pos is reached. A balanced substring occurs whenever hash[pos] already exists; if x is the current string position, the substring is s[hash[pos]+1 : x+1].
The cost of maintaining the hash is O(log N) so processing the string takes O(N log N). How does this compare with solutions so far? These types of problems tend to have linear solutions but I haven't come across one yet.
Here's some code demonstrating the idea for 3 letters and a run using biased random strings. (Uniform random strings allow for solutions that are around half the string length, which is unwieldy to print).
#!/usr/bin/python
import random
from time import time
alphabet = "abc"
DIM = len(alphabet)
def random_string(n):
    # return a random string over choices[] of length n
    # distribution of letters is non-uniform to make matches harder to find
    choices = "aabbc"
    s = ''
    for i in range(n):
        r = random.randint(0, len(choices) - 1)
        s += choices[r]
    return s
def validate(s):
    # verify frequencies of each letter are the same
    f = [0, 0, 0]
    a2f = {alphabet[i] : i for i in range(DIM)}
    for c in s:
        f[a2f[c]] += 1
    assert f[0] == f[1] and f[1] == f[2]
def longest_balanced(s):
    """return length of longest substring of s containing equal
    populations of each letter in alphabet"""
    slen = len(s)
    p = [0 for i in range(DIM)]
    vec = {alphabet[0] : [2, -1, -1],
           alphabet[1] : [-1, 2, -1],
           alphabet[2] : [-1, -1, 2]}
    x = -1
    best = -1
    hist = {str([0, 0, 0]) : -1}
    for c in s:
        x += 1
        p = [p[i] + vec[c][i] for i in range(DIM)]
        pkey = str(p)
        if pkey not in hist:
            hist[pkey] = x
        else:
            span = x - hist[pkey]
            assert span % DIM == 0
            if span > best:
                best = span
                cand = s[hist[pkey] + 1: x + 1]
                print("best so far %d = [%d,%d]: %s" % (best,
                                                        hist[pkey] + 1,
                                                        x + 1,
                                                        cand))
                validate(cand)
    return best if best > -1 else 0
def main():
    #print(longest_balanced("aaabcabcbbcc"))
    t0 = time()
    s = random_string(1000000)
    print("generate time:", time() - t0)
    t1 = time()
    best = longest_balanced(s)
    print("best:", best)
    print("elapsed:", time() - t1)
main()
Sample run on an input of 10^6 letters with an alphabet of 3 letters:
$ ./bal.py
...
best so far 189 = [847894,848083]: aacacbcbabbbcabaabbbaabbbaaaacbcaaaccccbcbcbababaabbccccbbabbacabbbbbcaacacccbbaacbabcbccaabaccabbbbbababbacbaaaacabcbabcbccbabbccaccaabbcabaabccccaacccccbaacaaaccbbcbcabcbcacaabccbacccacca
best: 189
elapsed: 1.43609690666

Related

how to find the highest most repeated number in an integer

for example if
int number = 30530;
it has to return 3.
This is what I tried, but it's beyond me and I don't know where I went wrong. I would also appreciate it if there is another way to do it without converting the number to a String.
public static int maharishi(int functionNum){
String num = Integer.toString(functionNum);
int length = num.length();
int count = 0;
int tempCount = 0;
int charLetter = 0;
for(int i = 1; i < length; i++ ){
for(int j = 1; j < length; j++){
if(i==1 && j!=1 ){
if(num.charAt(i) == num.charAt(j)){
tempCount++;
if(tempCount > count){
count = tempCount;
charLetter = i;
}
}
}
}
}
char highestChar = num.charAt(charLetter);
int change = Integer.parseInt(String.valueOf(highestChar));
return change;
}

You can use a Map<Character, Integer>:
public static int maharishiMaheshYogi(int functionNum){
// Convert the number to a string
String num = Integer.toString(functionNum);
// Create a Map where you will store each character count
final Map<Character, Integer> counts = new HashMap<>();
// Iterate over each character of this string
final int length = num.length();
for (int i = 0; i < length; i++) {
final char c = num.charAt(i);
// Increment the value of its respective character in the map
final int currentCount = counts.getOrDefault(c, 0);
counts.put(c, currentCount + 1);
}
// Return the key with the maximum value in the map
Map.Entry<Character, Integer> maxEntry = null;
for (Map.Entry<Character, Integer> entry : counts.entrySet()) {
if (maxEntry == null || entry.getValue() > maxEntry.getValue()) {
maxEntry = entry;
}
}
return Integer.parseInt(maxEntry.getKey().toString());
}

Your middle if condition makes everything fail: you only count when i is 1, so the other positions are never examined. Also:
do the comparison against count after the inner loop; there is no need to check it for every value
the inner loop should start at i+1, to read only the following chars
tempCount should be reset to 0 on every round of the outer loop
public static int maharishi(int functionNum) {
String num = Integer.toString(functionNum);
int length = num.length();
int count = 0, bestPosition = 0, tempCount;
for (int i = 0; i < length; i++) {
char c = num.charAt(i);
tempCount = 0;
for (int j = i + 1; j < length; j++) {
if (c == num.charAt(j)) {
tempCount++;
}
}
if (tempCount > count) {
count = tempCount;
bestPosition = i;
}
}
char highestChar = num.charAt(bestPosition);
return Integer.parseInt(String.valueOf(highestChar));
}
// testing
System.out.println(maharishi(1234)); // 1
System.out.println(maharishi(12344)); // 4
System.out.println(maharishi(12343)); // 3
System.out.println(maharishi(12342)); // 2
For Stream lovers
public static int maharishi(int functionNum) {
String l = Arrays.stream(Integer.toString(functionNum).split(""))
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet()
.stream()
.max(Map.Entry.comparingByValue())
.orElseThrow()
.getKey();
return Integer.parseInt(l);
}

If you don't want to convert to string, the easiest way would be to use a "helper array" to hold the count for every possible digit from 0 to 9.
public static int maharishi(int functionNum) {
int[] counts = new int[10];
// count all the digits in the number
// (do-while so that 0 counts as one digit 0; Math.abs so that negative input works too)
do {
counts[Math.abs(functionNum % 10)] += 1;
functionNum /= 10;
} while (functionNum != 0);
int record = -1;
int mostRepeated = -1;
// find highest most repeated digit
for (int i = 0; i < counts.length; i++) {
if (counts[i] >= record) {
record = counts[i]; // how many times this digit is in the number
mostRepeated = i; // what digit it is
}
}
return mostRepeated;
}
What it does is:
Count each digit in the number. Separates digits by division and modulus (remainder).
To separate a digit from a number without converting to string, just get a remainder of division by 10.
Run over the counts from 0 to 9, and store which count is biggest. Return the digit (index) that count belongs to, that is your answer.
You can do this without the array, but you would need more loops.
The principle of separating digits by modulus and division remains the same.

Function is working in some cases but fails when longest sub-string "reuses" a character

I have a function called lengthOfLongestSubstring and its job is to find the longest substring without any repeated characters. For the most part, it works, but when it gets an input like "dvdf" it prints out 2 (rather than 3) and gives [dv, df] when it should be [d, vdf].
So, I first go through the string and see if there are any unique characters. If there are, I append it to the ans variable. (I think this is the part that needs some fixing). If there is a duplicate, I store it in the substrings linked list and reset the ans variable to the duplicate string.
Once the whole string has been traversed, I find the longest substring and return its length.
public static int lengthOfLongestSubstring(String s) {
String ans = "";
int len = 0;
LinkedList<String> substrings = new LinkedList<String>();
for (int i = 0; i < s.length(); i++) {
if (!ans.contains("" + s.charAt(i))) {
ans += s.charAt(i);
} else {
substrings.add(ans);
ans = "" + s.charAt(i);
}
}
substrings.add(ans); // add last seen substring into the linked list
for (int i = 0; i < substrings.size(); i++) {
if (substrings.get(i).length() >= len)
len = substrings.get(i).length();
}
System.out.println(Arrays.toString(substrings.toArray()));
return len;
}
Here are some test results:
//correct
lengthOfLongestSubstring("abcabcbb") -> 3 ( [abc, abc, b, b])
lengthOfLongestSubstring("pwwkew") -> 3 ([pw, wke, w]).
lengthOfLongestSubstring("ABDEFGABEF"); -> 6 ([ABDEFG, ABEF])
// wrong
System.out.println(lengthOfLongestSubstring("acadf")); -> 3, ([ac, adf]) *should be 4, with the linked list being [a, cadf]
Any suggestions to fix this? Do I have to redo all my logic?
Thanks!

Your code mistakenly assumes that when you find a repeated character, the next candidate substring starts at the repeated character. That is not true; it starts right after the original character.
Example: If the string is "abcXdefXghiXjkl", there are 3 candidate substrings: "abcXdef", "defXghi", and "ghiXjkl".
As you can see, each candidate substring ends before a repeating character and starts after a repeating character (or at the beginning or end of the string).
So, when you find a repeating character, the position of the previous instance of that character is needed to determine the start of the next substring candidate.
The easiest way to handle that is to build a Map of character to last seen position. That will also perform faster than continually performing substring searches to check for repeating characters, like the question code and the other answers are doing.
Something like this:
public static int lengthOfLongestSubstring(String s) {
Map<Character, Integer> charPos = new HashMap<>();
List<String> candidates = new ArrayList<>();
int start = 0, maxLen = 0;
for (int idx = 0; idx < s.length(); idx++) {
char ch = s.charAt(idx);
Integer preIdx = charPos.get(ch);
if (preIdx != null && preIdx >= start) { // found repeat
if (idx - start > maxLen) {
candidates.clear();
maxLen = idx - start;
}
if (idx - start == maxLen)
candidates.add(s.substring(start, idx));
start = preIdx + 1;
}
charPos.put(ch, idx);
}
if (s.length() - start > maxLen)
maxLen = s.length() - start;
if (s.length() - start == maxLen)
candidates.add(s.substring(start));
System.out.print(candidates + ": ");
return maxLen;
}
The candidates list is only there for debugging purposes and is not needed; without it, the code is somewhat simpler:
public static int lengthOfLongestSubstring(String s) {
Map<Character, Integer> charPos = new HashMap<>();
int start = 0, maxLen = 0;
for (int idx = 0; idx < s.length(); idx++) {
char ch = s.charAt(idx);
Integer preIdx = charPos.get(ch);
if (preIdx != null && preIdx >= start) { // found repeat
if (idx - start > maxLen)
maxLen = idx - start;
start = preIdx + 1;
}
charPos.put(ch, idx);
}
return Math.max(maxLen, s.length() - start);
}
Test
System.out.println(lengthOfLongestSubstring(""));
System.out.println(lengthOfLongestSubstring("x"));
System.out.println(lengthOfLongestSubstring("xx"));
System.out.println(lengthOfLongestSubstring("xxx"));
System.out.println(lengthOfLongestSubstring("abcXdefXghiXjkl"));
System.out.println(lengthOfLongestSubstring("abcabcbb"));
System.out.println(lengthOfLongestSubstring("pwwkew"));
System.out.println(lengthOfLongestSubstring("ABDEFGABEF"));
Output (with candidate lists)
[]: 0
[x]: 1
[x, x]: 1
[x, x, x]: 1
[abcXdef, defXghi, ghiXjkl]: 7
[abc, bca, cab, abc]: 3
[wke, kew]: 3
[ABDEFG, BDEFGA, DEFGAB]: 6

Instead of setting ans to the current char when a character match is found
ans = "" + s.charAt(i);
You should add the current char to all the characters after the first match of the current char
ans = ans.substring(ans.indexOf(s.charAt(i)) + 1) + s.charAt(i);
The full method thus becomes
public static int lengthOfLongestSubstring(String s) {
String ans = "";
int len = 0;
LinkedList<String> substrings = new LinkedList<>();
for (int i = 0; i < s.length(); i++) {
if (!ans.contains("" + s.charAt(i))) {
ans += s.charAt(i);
} else {
substrings.add(ans);
// Only the below line changed
ans = ans.substring(ans.indexOf(s.charAt(i)) + 1) + s.charAt(i);
}
}
substrings.add(ans); // add last seen substring into the linked list
for (int i = 0; i < substrings.size(); i++) {
if (substrings.get(i).length() >= len)
len = substrings.get(i).length();
}
System.out.println(Arrays.toString(substrings.toArray()));
return len;
}
Using this code the acceptance criteria you specified passed successfully
//correct
lengthOfLongestSubstring("dvdf") -> 3 ( [dv, vdf])
lengthOfLongestSubstring("abcabcbb") -> 3 ([abc, bca, cab, abc, cb, b])
lengthOfLongestSubstring("pwwkew") -> 3 ([pw, wke, kew]).
lengthOfLongestSubstring("ABDEFGABEF"); -> 6 ([ABDEFG, BDEFGA, DEFGAB, FGABE, GABEF])
lengthOfLongestSubstring("acadf"); -> 4 ([ac, cadf])

Create a nested for loop to check at each index in the array.
public static int lengthOfLongestSubstring(String s) {
String ans = "";
int len = 0;
LinkedList<String> substrings = new LinkedList<String>();
int k = 0;
for (int i = 0; i < s.length(); i++) {
if(k == s.length()) {
break;
}
for(k = i; k < s.length(); k++) {
if (!ans.contains("" + s.charAt(k))) {
ans += s.charAt(k);
} else {
substrings.add(ans);
ans = "";
break;
}
}
}
substrings.add(ans); // add last seen substring into the linked list
for (int i = 0; i < substrings.size(); i++) {
if (substrings.get(i).length() >= len)
len = substrings.get(i).length();
}
System.out.println(Arrays.toString(substrings.toArray()));
return len;
}
Example:
lengthOfLongestSubstring("ABDEFGABEF"); -> 6 ([ABDEFG, BDEFGA, DEFGAB, EFGAB, FGABE, GABEF])

Compare letter in java

I want to compare every letter in file 2 with file 1.
Example:
file 1 : my name
file 2 : mi n#mes
I want to get the number of differences, which is 3 here, for file 2: (i, #, and s).
Can you help me?
Here is my code:
public float getCER(String originalteks,String extractteks){
int end=0;
int start=0;
int different_char=0;
if(originalteks.length()!=extractteks.length()){
different_char=Math.abs(originalteks.length()-extractteks.length());
}
while(start<end){
if(originalteks.charAt(start)!=originalteks.charAt(start++))
different_char++; // number of differing characters
}
return (float) different_char/originalteks.length();
}
And it only counts the difference in the number of characters, not the characters that differ.

The following implementation computes the total difference you need and can handle strings of different lengths by comparing the shorter string to each substring of the longer one, up to the maximum offset given by their length difference. From those differences the smallest is chosen. If handleOffset is false, we only compare against the start of the longer string and add the length difference to the result.
public int getCER(String originalteks,String extractteks, boolean handleOffset){
String shorter = originalteks;
String longer = extractteks;
if (shorter.length() > longer.length()) {
shorter = extractteks;
longer = originalteks;
}
// one difference counter per candidate offset of the shorter string within the longer one
int[] differences = new int[handleOffset ? (longer.length() - shorter.length() + 1) : 1];
for (int i = 0; i < differences.length; i++) differences[i] = 0;
for (int i = 0; i < shorter.length(); i++) {
for (int j = 0; j < differences.length; j++) {
if (shorter.charAt(i) != longer.charAt(i + j)) {
differences[j]++;
}
}
}
int min = shorter.length() + 1;
for (int i = 0; i < differences.length; i++) {
if (differences[i] < min) min = differences[i];
}
if (!handleOffset) min += longer.length() - shorter.length();
return min;
}

This should work for you. I just comment my changes within the example.
public int getCER(String originalteks,String extractteks){
int end;
int different_char=0;
//define the shorter end
if(originalteks.length() < extractteks.length())
end = originalteks.length();
else
end = extractteks.length();
//no if needed -> same length, diff will be 0
different_char=Math.abs(originalteks.length()-extractteks.length());
for(int start = 0; start < end; start++){
if(originalteks.charAt(start)!=extractteks.charAt(start))
different_char++; // number of differing characters
}
return different_char;
}

KMP Algorithm for string search?

I found this very challenging coding problem online, which I thought I'd give a try.
The general idea is that, given a string of text T and a pattern P, find the occurrences of this pattern, sum up its corresponding value and return the max and min. If you want to read the problem in more detail, please refer to this.
However, below is the code I've provided. It works for a simple test case, but when running on multiple and complex test cases it's pretty slow, and I'm not sure where my code needs to be optimized.
Can anyone please help me see where I'm getting the logic wrong?
public class DeterminingDNAHealth {
private DeterminingDNAHealth() {
/*
* Fixme:
* Each DNA contains number of genes
* - some of them are beneficial and increase DNA's total health
* - Each Gene has a health value
* ======
* - Total health of DNA = sum of all health values of beneficial genes
*/
}
int checking(int start, int end, String pattern) {
String[] genesChar = new String[] {
"a",
"b",
"c",
"aa",
"d",
"b"
};
String numbers = "123456";
int total = 0;
for (int i = start; i <= end; i++) {
total += KMPAlgorithm.initiateAlgorithm(pattern, genesChar[i]) * (i + 1);
}
return total;
}
public static void main(String[] args) {
String[] genesChar = new String[] {
"a",
"b",
"c",
"aa",
"d",
"b"
};
Gene[] genes = new Gene[genesChar.length];
for (int i = 0; i < 6; i++) {
genes[i] = new Gene(genesChar[i], i + 1);
}
String[] checking = "15caaab 04xyz 24bcdybc".split(" ");
DeterminingDNAHealth DNA = new DeterminingDNAHealth();
int i, mostHealthiest, mostUnhealthiest;
mostHealthiest = Integer.MIN_VALUE;
mostUnhealthiest = Integer.MAX_VALUE;
for (i = 0; i < checking.length; i++) {
int start = Character.getNumericValue(checking[i].charAt(0));
int end = Character.getNumericValue(checking[i].charAt(1));
String pattern = checking[i].substring(2, checking[i].length());
int check = DNA.checking(start, end, pattern);
if (check > mostHealthiest)
mostHealthiest = check;
else
if (check < mostUnhealthiest)
mostUnhealthiest = check;
}
System.out.println(mostHealthiest + " " + mostUnhealthiest);
// DNA.checking(1,5, "caaab");
}
}
KMPAlgorithm
public class KMPAlgorithm {
KMPAlgorithm() {}
public static int initiateAlgorithm(String text, String pattern) {
// let us generate our LPC table from the pattern
int[] partialMatchTable = partialMatchTable(pattern);
int matchedOccurrences = 0;
// initially we don't have anything matched, so 0
int partialMatchLength = 0;
// we then start to loop through the text, !note, not the pattern. The text that we are testing the pattern on
for (int i = 0; i < text.length(); i++) {
// if there is a mismatch and there's no previous match, then we've hit the base-case, hence break from while{...}
while (partialMatchLength > 0 && text.charAt(i) != pattern.charAt(partialMatchLength)) {
/*
* otherwise, based on the number of chars matched, we decrement it by 1.
* In fact, this is the unique part of this algorithm. It is this part that we plan to skip partialMatchLength
* iterations. So if our partialMatchLength was 5, then we are going to skip (5 - 1) iteration.
*/
partialMatchLength = partialMatchTable[partialMatchLength - 1];
}
// if however we have a char that matches the current text[i]
if (text.charAt(i) == pattern.charAt(partialMatchLength)) {
// then increment position, so hence we check the next char of the pattern against the next char in text
partialMatchLength++;
// we will know that we're at the end of the pattern matching, if the matched length is same as the pattern length
if (partialMatchLength == pattern.length()) {
// to get the starting index of the matched pattern in text, apply this formula (i - (partialMatchLength - 1))
// this line increments when a match string occurs multiple times;
matchedOccurrences++;
// just before when we have a full matched pattern, we want to test for multiple occurrences, so we make
// our match length incomplete, and let it run longer.
partialMatchLength = partialMatchTable[partialMatchLength - 1];
}
}
}
return matchedOccurrences;
}
private static int[] partialMatchTable(String pattern) {
/*
* TODO
* Note:
* => Proper prefix: All the characters in a string, with one or more cut off the end.
* => proper suffix: All the characters in a string, with one or more cut off the beginning.
*
* 1.) Take the pattern and construct a partial match table
*
* To construct partial match table {
* 1. Loop through the String(pattern)
* 2. Create a table of size String(pattern).length
* 3. For each character c[i], get The length of the longest proper prefix in the (sub)pattern
* that matches a proper suffix in the same (sub)pattern
* }
*/
// we will need two incremental variables
int i, j;
// an LSP table also known as “longest suffix-prefix”
int[] LSP = new int[pattern.length()];
// our initial case is that the first element is set to 0
LSP[0] = 0;
// loop through the pattern...
for (i = 1; i < pattern.length(); i++) {
// set our j as previous elements data (not the index)
j = LSP[i - 1];
// we will be comparing previous and current elements data. ei char
char current = pattern.charAt(i), previous = pattern.charAt(j);
// we will have a case when we're somewhere in loop and two chars will not match, and j is not in base case.
while (j > 0 && current != previous)
// we decrement our j
j = LSP[j - 1];
// simply put, if two characters are same, then we update our LSP to say that at that point, we hold the j's value
if (current == previous)
// increment our j
j++;
// update the table
LSP[i] = j;
}
return LSP;
}
}
Source code credit to GitHub

You may try this KMP implementation. It is O(m+n), as KMP is intended to be. It should be a lot faster:
private static int[] failureFunction(char[] pattern) {
int m = pattern.length;
int[] f = new int[pattern.length];
f[0] = 0;
int i = 1;
int j = 0;
while (i < m) {
if (pattern[i] == pattern[j]) {
f[i] = j + 1;
i++;
j++;
} else if (j > 0) {
j = f[j - 1];
} else {
f[i] = 0;
i++;
}
}
return f;
}
private static int kmpMatch(char[] text, char[] pattern) {
int[] f = failureFunction(pattern);
int m = pattern.length;
int n = text.length;
int i = 0;
int j = 0;
while (i < n) {
if (pattern[j] == text[i]) {
if (j == m - 1){
return i - (m - 1);
} else {
i++;
j++;
}
} else if (j > 0) {
j = f[j - 1];
} else {
i++;
}
}
return -1;
}
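Note that kmpMatch above returns the index of the first match only, while the question needs the number of occurrences of each gene. A minimal sketch of a counting variant follows, assuming overlapping occurrences should be counted (as the question's own matcher does). The names KmpCount and countOccurrences are made up for illustration.

```java
class KmpCount {
    // f[i] = length of the longest proper prefix of pattern[0..i] that is also a suffix of it.
    static int[] failureFunction(char[] pattern) {
        int[] f = new int[pattern.length];
        int j = 0;
        for (int i = 1; i < pattern.length; i++) {
            // Re-read pattern[j] after every fallback, so the comparison is never stale.
            while (j > 0 && pattern[i] != pattern[j]) {
                j = f[j - 1];
            }
            if (pattern[i] == pattern[j]) {
                j++;
            }
            f[i] = j;
        }
        return f;
    }

    // Counts occurrences of pattern in text, including overlapping ones.
    static int countOccurrences(String text, String pattern) {
        if (pattern.isEmpty()) {
            return 0; // assumption: an empty pattern counts as no match
        }
        char[] p = pattern.toCharArray();
        int[] f = failureFunction(p);
        int count = 0;
        int j = 0; // number of pattern characters currently matched
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            while (j > 0 && c != p[j]) {
                j = f[j - 1];
            }
            if (c == p[j]) {
                j++;
            }
            if (j == p.length) {
                count++;
                j = f[j - 1]; // fall back to the longest border so overlaps are still found
            }
        }
        return count;
    }
}
```

For the sample data in the question, countOccurrences("caaab", "aa") counts the two overlapping matches inside "aaa".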

How to find the longest substring containing two unique repeating characters

The task is to find the longest substring of a given string that is composed of only two distinct characters, each of which may repeat.
For example, in the input string "aabadefghaabbaagad", the longest such substring is "aabbaa".
I came up with the following solution, but wanted to see if there is a more efficient way to do the same:
import java.util.*;
public class SubString {
public static void main(String[] args) {
//String inStr="defghgadaaaaabaababbbbbbd";
String inStr="aabadefghaabbaagad";
//String inStr="aaaaaaaaaaaaaaaaaaaa";
System.out.println("Input string is "+inStr);
StringBuilder sb = new StringBuilder(inStr.length());
String subStr="";
String interStr="";
String maxStr="";
int start=0,length=0, maxStart=0, maxlength=0, temp=0;
while(start+2<inStr.length())
{ int i=0;
temp=start;
char x = inStr.charAt(start);
char y = inStr.charAt(start+1);
sb.append(x);
sb.append(y);
while( (x==y) && (start+2<inStr.length()) )
{ start++;
y = inStr.charAt(start+1);
sb.append(y);
}
subStr=inStr.substring(start+2);
while(i<subStr.length())
{ if(subStr.charAt(i)==x || subStr.charAt(i)==y )
{ sb.append(subStr.charAt(i));
i++;
}
else
break;
}
interStr= sb.toString();
System.out.println("Intermediate string "+ interStr);
length=interStr.length();
if(maxlength<length)
{ maxlength=length;
length=0;
maxStr = new String(interStr);
maxStart=temp;
}
start++;
sb.setLength(0);
}
System.out.println("");
System.out.println("Longest string is "+maxStr.length()+" chars long "+maxStr);
}
}

Here's a hint that might guide you towards a linear-time algorithm (I assume that this is homework, so I won't give the entire solution): At the point where you have found a character that is neither equal to x nor to y, it is not necessary to go all the way back to start + 1 and restart the search. Let's take the string aabaaddaa. At the point where you have seen aabaa and the next character is d, there is no point in restarting the search at index 1 or 2, because in those cases, you'll only get abaa or baa before hitting d again. As a matter of fact, you can move start directly to index 3 (the first index of the last group of a's), and since you already know that there is a contiguous sequence of a's up to d, you can move i to index 5 and continue.
Edit: Pseudocode below.
// Find the first letter that is not equal to the first one,
// or return the entire string if it consists of one type of characters
int start = 0;
int i = 1;
while (i < str.length() && str[i] == str[start])
i++;
if (i == str.length())
return str;
// The main algorithm
char[2] chars = {str[start], str[i]};
int lastGroupStart = 0;
while (i < str.length()) {
if (str[i] == chars[0] || str[i] == chars[1]) {
if (str[i] != str[i - 1])
lastGroupStart = i;
}
else {
//TODO: str.substring(start, i) is a locally maximal string;
// compare it to the longest one so far
start = lastGroupStart;
lastGroupStart = i;
chars[0] = str[start];
chars[1] = str[lastGroupStart];
}
i++;
}
//TODO: After the loop, str.substring(start, str.length())
// is also a potential solution.
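
One possible way to turn the pseudocode above into runnable Java, with the two TODOs filled in, is sketched below. The class and method names (TwoCharRun, longestTwoCharSubstring) and the bestStart/bestEnd bookkeeping are my own additions, and the sketch treats a string with a single distinct character as its own answer, as the pseudocode does.

```java
class TwoCharRun {
    // Returns the longest substring of str that uses at most two distinct characters.
    static String longestTwoCharSubstring(String str) {
        if (str.isEmpty()) {
            return str;
        }
        // Find the first letter that differs from the first one.
        int start = 0;
        int i = 1;
        while (i < str.length() && str.charAt(i) == str.charAt(start)) {
            i++;
        }
        if (i == str.length()) {
            return str; // only one kind of character
        }
        char c0 = str.charAt(start);
        char c1 = str.charAt(i);
        int lastGroupStart = 0;           // start of the most recent run of equal characters
        int bestStart = 0, bestEnd = 0;   // best window so far, end exclusive
        while (i < str.length()) {
            char c = str.charAt(i);
            if (c == c0 || c == c1) {
                if (c != str.charAt(i - 1)) {
                    lastGroupStart = i;
                }
            } else {
                // str.substring(start, i) is locally maximal: compare it to the best so far
                if (i - start > bestEnd - bestStart) {
                    bestStart = start;
                    bestEnd = i;
                }
                // the new window keeps only the last run and the new character
                start = lastGroupStart;
                lastGroupStart = i;
                c0 = str.charAt(start);
                c1 = c;
            }
            i++;
        }
        // the window that reaches the end of the string is also a candidate
        if (str.length() - start > bestEnd - bestStart) {
            bestStart = start;
            bestEnd = str.length();
        }
        return str.substring(bestStart, bestEnd);
    }

    public static void main(String[] args) {
        System.out.println(longestTwoCharSubstring("aabadefghaabbaagad")); // prints aabbaa
        System.out.println(longestTwoCharSubstring("aabaaddaa"));          // prints aaddaa
    }
}
```

Each character is visited once and start only moves forward, so the scan is linear in the length of the string.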

I had the same question and wrote this code:
public int getLargest(char [] s){
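// strings shorter than 2 are their own answer (also avoids reading s[1] on a 1-char input)
if(s.length<2) return s.length;
char c1 = s[0],c2=' ';
int start = 1,l=1, max=1;
int i = 1;
while(s[start]==c1){
l++;
start++;
if(start==s.length) return start;
}
c2 = s[start];
l++;
// the initial window already counts, even if the loop below never runs (e.g. "ab")
max = l;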
for(i = l; i<s.length;i++){
if(s[i]==c1 || s[i]==c2){
if(s[i]!=s[i-1])
start = i;
l++;
}
else {
l = i-start+1;
c1 = s[start];
c2 = s[i];
start = i;
}
max = Math.max(l, max);
}
return max;
}

The way I think of this is to solve it in two steps:
1. Scan the entire string to find continuous runs of the same letter.
2. Loop over the extracted segments and condense them until you hit a gap.
This way you can also modify the logic to scan for the longest substring with any number of distinct characters, not just 2.
using System;
using System.Collections.Generic;
using System.Text;
class Program
{
static void Main(string[] args)
{
//.
string input = "aabbccdddxxxxxxxxxxxxxxxxx";
int max_chars = 2;
//.
int flip = 0;
var scanned = new List<string>();
while (flip > -1)
{
scanned.Add(Scan(input, flip, ref flip));
}
string found = string.Empty;
for(int i=0;i<scanned.Count;i++)
{
var s = Condense(scanned, i, max_chars);
if (s.Length > found.Length)
{
found = s;
}
}
System.Console.WriteLine("Found:" + found);
System.Console.ReadLine();
}
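/// <summary>
/// Reads the run of identical characters that begins at <paramref name="start"/>.
/// </summary>
/// <param name="s">The input string.</param>
/// <param name="start">Index at which the run begins.</param>
/// <param name="flip">Set to the index where the next run begins, or -1 at the end of the string.</param>
/// <returns>The run as a string.</returns>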
private static string Scan(string s, int start, ref int flip)
{
StringBuilder sb = new StringBuilder();
flip = -1;
sb.Append(s[start]);
for (int i = start+1; i < s.Length; i++)
{
if (s[i] == s[i - 1]) { sb.Append(s[i]); continue; } else { flip=i; break;}
}
return sb.ToString();
}
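/// <summary>
/// Joins consecutive runs, beginning at <paramref name="start"/>, until a run would
/// introduce more than <paramref name="repeat"/> distinct characters.
/// </summary>
/// <param name="list">The runs produced by Scan.</param>
/// <param name="start">Index of the first run to use.</param>
/// <param name="repeat">Maximum number of distinct characters allowed.</param>
/// <returns>The joined runs.</returns>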
private static string Condense(List<string> list, int start, int repeat)
{
StringBuilder sb = new StringBuilder();
List<char> domain = new List<char>(){list[start][0]};
for (int i = start; i < list.Count; i++)
{
char c = list[i][0];
// check the whole domain, not just its first entry, before deciding this run is new
if (!domain.Contains(c))
{
// the domain is full: this run is the gap that ends the substring
if (domain.Count >= repeat) { break; }
domain.Add(c);
}
sb.Append(list[i]);
}
return sb.ToString();
}
}

A general solution: Longest Substring Which Contains K Unique Characters.
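#include <algorithm>
#include <climits>
#include <string>
#include <unordered_map>
using namespace std;
int longestKCharSubstring(string s, int k) {
int i, max_len = 0, start = 0;
// each unique char & its last pos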
unordered_map<char, int> ht;
for (i = 0; i < s.size(); i++) {
if (ht.size() < k || ht.find(s[i]) != ht.end()) {
ht[s[i]] = i;
} else {
// (k + 1)-th char
max_len = max(max_len, i - start);
// start points to the next of the earliest char
char earliest_char;
int earliest_char_pos = INT_MAX;
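for (auto key : ht)
if (key.second < earliest_char_pos) {
earliest_char = key.first;
// remember the smallest position seen so far, otherwise the last map entry always wins
earliest_char_pos = key.second;
}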
start = ht[earliest_char] + 1;
// replace earliest_char
ht.erase(earliest_char);
ht[s[i]] = i;
}
}
// special case: e.g., "aaaa" or "aaabb" when k = 2
if (k == ht.size())
max_len = max(max_len, i - start);
return max_len;
}

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class PrintLLargestSubString {
    public static void main(String[] args) {
        String string = "abcdefghijklmnopqrstuvbcdefghijklmnopbcsdcelfabcdefghi";
        List<Integer> list = new ArrayList<Integer>();
        List<Integer> keyList = new ArrayList<Integer>();
        List<Integer> Indexlist = new ArrayList<Integer>();
        List<Integer> DifferenceList = new ArrayList<Integer>();
        Map<Integer, Integer> map = new HashMap<Integer, Integer>();
        int index = 0;
        int len = 1;
        int j = 1;
        Indexlist.add(0);
        for (int i = 0; i < string.length(); i++) {
            if (j < string.length()) {
                if (string.charAt(i) < string.charAt(j)) {
                    len++;
                    list.add(len);
                } else {
                    index = i + 1;
                    Indexlist.add(index);
                    // System.out.println("\nindex" + index);
                    len = 1;
                }
            }
            j++;
        }
        // System.out.println("\nlist" +list);
        System.out.println("index List" + Indexlist);
        // int n = Collections.max(list);
        // int ind = Collections.max(Indexlist);
        // System.out.println("Max number in IndexList " +n);
        // System.out.println("Index Max is " +ind);
        //Finding max difference in a list of elements
        for (int diff = 0; diff < Indexlist.size() - 1; diff++) {
            int difference = Indexlist.get(diff + 1) - Indexlist.get(diff);
            map.put(Indexlist.get(diff), difference);
            DifferenceList.add(difference);
        }
        System.out.println("Difference between indexes" + DifferenceList);
        // Iterator<Integer> keySetIterator = map.keySet().iterator();
        // while(keySetIterator.hasNext()){
        // Integer key = keySetIterator.next();
        // System.out.println("index: " + key + "\tDifference " +map.get(key));
        // }
        // System.out.println("Diffferenece List" +DifferenceList);
        int maxdiff = Collections.max(DifferenceList);
        System.out.println("Max diff is " + maxdiff);
        Integer value = maxdiff;
        int key = 0;
        keyList.addAll(map.keySet());
        Collections.sort(keyList);
        System.out.println("List of al keys" + keyList);
        // System.out.println(map.entrySet());
        for (Map.Entry entry : map.entrySet()) {
            if (value.equals(entry.getValue())) {
                key = (int) entry.getKey();
            }
        }
        System.out.println("Key value of max difference starting element is " + key);
        //Iterating key list and finding next key value
        int next = 0;
        int KeyIndex = 0;
        int b;
        for (b = 0; b < keyList.size(); b++) {
            if (keyList.get(b) == key) {
                KeyIndex = b;
            }
        }
        System.out.println("index of key\t" + KeyIndex);
        int nextIndex = KeyIndex + 1;
        System.out.println("next Index = " + nextIndex);
        next = keyList.get(nextIndex);
        System.out.println("next Index value is = " + next);
        for (int z = KeyIndex; z < next; z++) {
            System.out.print(string.charAt(z));
        }
    }
}

The problem can be solved in O(n). The idea is to maintain a window and keep adding elements to it while it contains at most k distinct characters (k = 2 here), updating the result as we go. When the number of distinct characters in the window exceeds k, start removing elements from the left side.
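from collections import defaultdict

def solution(s, k):
    length = len(set(list(s)))
    count_dict = defaultdict(int)
    # fewer than k distinct characters in the whole string
    if length < k:
        return "-1"
    res = []      # current window
    final = []    # longest window seen so far
    maxi = -1
    # note: len(set(res)) rebuilds the set at every step, so this version is
    # quadratic in the worst case; keeping per-character counts makes it linear
    for i in range(0, len(s)):
        res.append(s[i])
        if len(set(res)) <= k:
            if len(res) >= maxi and len(set(res)) <= k:
                maxi = len(res)
                final = res[:]
                count_dict[maxi] += 1
        else:
            # too many distinct characters: drop from the left
            while len(set(res)) != k:
                res = res[1:]
            if maxi <= len(res) and len(set(res)) <= k:
                maxi = len(res)
                final = res[:]
                count_dict[maxi] += 1
    return len(final)

# sample input taken from the question
print(solution("aabadefghaabbaagad", 2))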

The idea here is to add the occurrence count of each character to a hash map, and when the hash map grows beyond k entries, remove characters from the left of the window until it fits again.
private static int getMaxLength(String str, int k) {
if (str.length() == k)
return k;
var hm = new HashMap<Character, Integer>();
int maxLength = 0;
int startCounter = 0;
for (int i = 0; i < str.length(); i++) {
char c = str.charAt(i);
if (hm.get(c) != null) {
hm.put(c, hm.get(c) + 1);
} else {
hm.put(c, 1);
}
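// shrink from the left until at most k different characters remain
while (hm.size() > k) {
char t = str.charAt(startCounter);
int count = hm.get(t);
if (count > 1) {
hm.put(t, count - 1);
} else {
hm.remove(t);
}
startCounter++;
}
// the window [startCounter, i] is valid here; updating on every step also
// counts a window that runs to the end of the string (e.g. "aabb" with k = 2)
maxLength = Math.max(maxLength, i - startCounter + 1);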
}
return maxLength;
}
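
Off-by-one mistakes are easy to make in window code like this, so it can help to compare a sliding-window version against a slow but obvious brute-force reference. The sketch below is one way to do that; the class AtMostKCheck and both method names are hypothetical, and "at most k distinct characters" is the assumed reading of the task.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class AtMostKCheck {
    // Linear-time window: length of the longest substring with at most k distinct characters.
    static int slidingWindow(String s, int k) {
        Map<Character, Integer> counts = new HashMap<>();
        int best = 0;
        int left = 0;
        for (int right = 0; right < s.length(); right++) {
            counts.merge(s.charAt(right), 1, Integer::sum);
            // shrink from the left while the window holds too many distinct characters
            while (counts.size() > k) {
                char c = s.charAt(left++);
                if (counts.merge(c, -1, Integer::sum) == 0) {
                    counts.remove(c);
                }
            }
            best = Math.max(best, right - left + 1);
        }
        return best;
    }

    // Quadratic reference: try every start index and extend while the window stays valid.
    static int bruteForce(String s, int k) {
        int best = 0;
        for (int i = 0; i < s.length(); i++) {
            Set<Character> seen = new HashSet<>();
            for (int j = i; j < s.length(); j++) {
                seen.add(s.charAt(j));
                if (seen.size() > k) {
                    break;
                }
                best = Math.max(best, j - i + 1);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String[] samples = {"aabadefghaabbaagad", "aabb", "abc", "aaaa", ""};
        for (String s : samples) {
            for (int k = 0; k <= 3; k++) {
                if (slidingWindow(s, k) != bruteForce(s, k)) {
                    throw new AssertionError("mismatch for \"" + s + "\" with k = " + k);
                }
            }
        }
        System.out.println(slidingWindow("aabadefghaabbaagad", 2)); // prints 6
    }
}
```

The brute-force version is only meant for small test inputs; the window version is the one that scales to long strings.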
