Find longest substring formed with characters of other string - java

Given two strings:
str1 = "abcdefacbccbagfacbacer"
str2 = "abc"
I have to find the longest substring of str1 that consists only of characters from str2. In this case the answer is 7 (acbccba). What approach solves this with the least complexity? First I thought of DP, but I guess DP is not really required here, since we are searching for a substring and not a subsequence. Then I thought of a suffix tree, but that would require extra preprocessing time.
What would be the best way to do this? In fact, is this problem even suitable for a suffix tree, or DP?

The easiest approach by far:
Build a hashset of the second string.
Loop over the first string and for each character, check if it is in the hashset. Keep track of the longest substring.
Running time: O(n+m) where n is the length of str1 and m is the length of str2.
(Untested) code:
// Needs: import java.util.HashSet; import java.util.Set;
Set<Character> set = new HashSet<>();
for (int i = 0; i < str2.length(); i++) {
    set.add(str2.charAt(i));
}
int longest = 0;     // length of the best run found so far
int current = 0;     // length of the run ending at position i
int longestEnd = -1; // exclusive end index of the best run
for (int i = 0; i < str1.length(); i++) {
    if (set.contains(str1.charAt(i))) {
        current++;
        if (current > longest) {
            longest = current;
            longestEnd = i + 1;
        }
    } else {
        current = 0;
    }
}
String result = "";
if (longest > 0) {
    result = str1.substring(longestEnd - longest, longestEnd);
}

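Just an idea: wrap the second string in [] and use Matcher's find method:
Pattern p = Pattern.compile("([" + str2 + "]+)");
Matcher m = p.matcher(str1);
String longest = "";
while (m.find()) {
    if (m.group(1).length() > longest.length()) longest = m.group(1);
}
Each m.group(1) is one run of characters from str2, and the loop keeps the longest one. This assumes str2 contains no characters that are special inside a character class, such as ], ^, - or \.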
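To make the regex idea concrete, here is a minimal Java sketch. The class and method names (LongestRunRegex, longestRun) are made up for illustration, and it assumes str2 contains only BMP characters. Every non-alphanumeric character is backslash-escaped so that it is taken literally inside the character class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class LongestRunRegex {
    static String longestRun(String str1, String str2) {
        if (str2.isEmpty()) {
            return ""; // "[]+" would not be a valid pattern
        }
        // Build a character class such as [abc]+ from str2.
        StringBuilder cls = new StringBuilder("[");
        for (char c : str2.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                cls.append('\\'); // escape anything that could be special
            }
            cls.append(c);
        }
        cls.append("]+");

        Matcher m = Pattern.compile(cls.toString()).matcher(str1);
        String best = "";
        // find() returns runs left to right, so keep the longest one seen.
        while (m.find()) {
            if (m.group().length() > best.length()) {
                best = m.group();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestRun("abcdefacbccbagfacbacer", "abc"));
    }
}
```

For the strings in the question this yields acbccba, the run of length 7.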

There is actually only one way I can think of:
Go over the chars of the string str1.
For each char in str1, check whether it is in str2.
Increase a counter (i) each time the current char in str1 is found in str2.
Once a char in str1 is not part of str2, save the counter (i) in maxfound if it is bigger than maxfound, which represents the longest sequence found so far, and reset the counter (i). Do the same check once more after the loop ends, in case the longest run reaches the end of str1.

Tested code in Perl.
use strict;
use warnings;
my $str1 = "abcdefacbccbagfacbacer";
my $str2 = "abc";
my @str1 = split ("", $str1);
my @res = ();
my $index = undef;
my $max = 0;
my @max_char = ();
for(my $i = 0; $i < @str1; $i++){
    # $str2 is used directly as a character class
    if ($str1[$i] =~ /[$str2]/){
        push (@res , $str1[$i]);
        next;
    }else{
        if(@res > $max){
            $max = @res;
            @max_char = @res;
            $index = $i;
        }
        @res = ();
    }
}
# A run that reaches the end of $str1 is never closed inside the loop.
if(@res > $max){
    $max = @res;
    @max_char = @res;
    $index = @str1;
}
if($max > 0){
    $index = $index - $#max_char - 1;
    print "Longest substring = @max_char. Starting from index $index";
}


Is algorithm optimal and does it satisfy specified complexity?

Characters of given string must be sorted according to the order defined by another pattern string. Requirements for complexity O(n + m) where n is length of string and m is length of pattern.
Example:
Pattern: 1234567890AaBbCcDdEeFfGgHh
String: dH7ee2D6a341Fb9Ea20dhC1g7ca32Ba2Gac5f76A2g
Result: 112222233456677790AaaaaaBbCccDddEeeFfGggHh
Pattern has all characters of the string and each one appears in pattern only once.
My code:
// Instances of possible values for input:
String pattern = "1234567890AaBbCcDdEeFfGgHh";
String string = "dH7ee2D6a341Fb9Ea20dhC1g7ca32Ba2Gac5f76A2g";
// Builder to collect characters for sorted result:
StringBuilder result = new StringBuilder();
// Hash table based on characters from pattern to count occurrence of each character in string:
Map<Character, Integer> characterCount = new LinkedHashMap<>();
for (int i = 0; i < pattern.length(); i++) {
// Put each character from pattern and initialize its counter with initial value of 0:
characterCount.put(pattern.charAt(i), 0);
}
// Traverse string and increment counter at each occurrence of character
for (int i = 0; i < string.length(); i++) {
char ch = string.charAt(i);
Integer count = characterCount.get(ch);
characterCount.put(ch, ++count);
}
// Traverse completed dictionary and collect sequentially all characters collected from string
for (Map.Entry<Character, Integer> entry : characterCount.entrySet()) {
Integer count = entry.getValue();
if (count > 0) {
Character ch = entry.getKey();
// Append each character as many times as it appeared in string
for (int i = 0; i < count; i++) {
result.append(ch);
}
}
}
// Get final result from builder
return result.toString();
Is this code optimal? Is there any way to improve this algorithm? Do I understand correctly that it satisfies the given complexity O(n + m)?
I'm not sure whether yours or mine is faster in practice.
But here's an alternative:
import java.math.BigDecimal;
class Playground {
public static void main(String[ ] args) {
String pattern = "1234567890AaBbCcDdEeFfGgHh";
String s = "dH7ee2D6a341Fb9Ea20dhC1g7ca32Ba2Gac5f76A2g";
long startTime = System.nanoTime();
StringBuilder sb = new StringBuilder();
for (char c : pattern.toCharArray()) {
sb.append(s.replaceAll("[^" + c + "]", ""));
}
System.out.println(sb.toString());
BigDecimal elapsedTime =
new BigDecimal( String.valueOf(System.nanoTime() - startTime)
)
.divide(
new BigDecimal( String.valueOf(1_000_000_000)
)
);
System.out.println(elapsedTime + " seconds");
}
}
Explanation:
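For each character in pattern, use String's regex-based replaceAll method to replace every character except the current one with an empty string. What remains is all occurrences of that character, and appending those pieces in pattern order gives the sorted result. Note that each replaceAll call scans the whole string, so this takes O(n * m) time rather than O(n + m), and it assumes the pattern contains no characters that are special inside a regex character class.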
Outputs:
112222233456677790AaaaaaBbCccDddEeeFfGggHh
0.021151652 seconds
(The timing is somewhat subjective. It came from the Sololearn Java Playground. It obviously depends on the current load on their servers)
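For comparison, here is a rough sketch of a counting variant that stays within O(n + m) by using a plain int array indexed by char instead of a map. The names PatternSort and sortByPattern are invented for this example, and it relies on the question's guarantee that each character appears in the pattern only once.

```java
// Hypothetical helper illustrating the counting approach with an array.
class PatternSort {
    static String sortByPattern(String pattern, String s) {
        // One counter per possible char value (65536 ints).
        int[] counts = new int[Character.MAX_VALUE + 1];
        for (int i = 0; i < s.length(); i++) {
            counts[s.charAt(i)]++;
        }
        StringBuilder sb = new StringBuilder(s.length());
        // Emit characters in pattern order, each as often as it occurred.
        // Assumes every pattern character is unique, as the question states.
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            for (int k = 0; k < counts[c]; k++) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sortByPattern(
                "1234567890AaBbCcDdEeFfGgHh",
                "dH7ee2D6a341Fb9Ea20dhC1g7ca32Ba2Gac5f76A2g"));
    }
}
```

The array costs a fixed 65536 entries regardless of input size, which is the trade-off for avoiding boxing and hashing.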

searching a Char letter by letter

I'm trying to search for patterns of letters in a file. The pattern is entered by the user and comes in as a String. So far I've got it to find the first letter, but I'm unsure how to make it test whether the next letter also matches the pattern.
This is the loop I currently have. Any help would be appreciated.
public void exactSearch(){
if (pattern==null){UI.println("No pattern");return;}
UI.println("===================\nExact searching for "+patternString);
int j = 0 ;
for(int i=0; i<data.size(); i++){
if(patternString.charAt(i) == data.get(i) )
j++;
UI.println( "found at " + j) ;
}
}
You need to iterate over the first string until you find the first character of the other string. From there, you can create an inner loop and iterate on both simultaneously, like you did.
Hint: be sure to watch for boundaries, as the strings might not be the same size.
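The outer-loop/inner-loop idea described above could look roughly like this. It is only a sketch: NaiveSearch and findAll are hypothetical names, and the data is taken as a plain String rather than whatever collection the question's data field actually is.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing a naive letter-by-letter search.
class NaiveSearch {
    static List<Integer> findAll(String data, String pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) {
            return hits;
        }
        // Stop early enough that the pattern still fits (the boundary check).
        for (int i = 0; i + pattern.length() <= data.length(); i++) {
            int j = 0;
            // Inner loop: advance through both strings while they agree.
            while (j < pattern.length() && data.charAt(i + j) == pattern.charAt(j)) {
                j++;
            }
            if (j == pattern.length()) {
                hits.add(i); // every pattern character matched
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("foo-bar-baz-bar-", "bar"));
    }
}
```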
You can try this :-
String a1 = "foo-bar-baz-bar-";
String pattern = "bar";
int foundIndex = 0;
while(foundIndex != -1) {
foundIndex = a1.indexOf(pattern,foundIndex);
if(foundIndex != -1)
{
System.out.println(foundIndex);
foundIndex += 1;
}
}
indexOf - first parameter is the pattern string,
second parameter is starting index from where we have to search.
If pattern is found, it will return the starting index from where the pattern matched.
If pattern is not found, indexOf will return -1.
String data = "foo-bar-baz-bar-";
String pattern = "bar";
int foundIndex = data.indexOf(pattern);
while (foundIndex > -1) {
System.out.println("Match found at: " + foundIndex);
foundIndex = data.indexOf(pattern, foundIndex + pattern.length());
}
Based on your request, you can use this algorithm to search for your positions:
1) We check whether we have reached the end of the string: to avoid a StringIndexOutOfBoundsException, we stop once the remaining substring is shorter than the pattern.
2) At each iteration we take the substring of the pattern's length and compare it with the pattern.
List<Integer> positionList = new LinkedList<>();
String inputString = "AAACABCCCABC";
String pattern = "ABC";
for (int i = 0 ; i < inputString.length(); i++) {
if (inputString.length() - i < pattern.length()){
break;
}
String currentSubString = inputString.substring(i, i + pattern.length());
if (currentSubString.equals(pattern)){
positionList.add(i);
}
}
for (Integer pos : positionList) {
System.out.println(pos); // Positions : 4 and 9
}
EDIT :
Maybe it can be optimized by not using a Collection for this simple task. I used a LinkedList only to write it quickly.

How many times one string contains another [duplicate]

Possible Duplicate:
Occurences of substring in a string
As in the subject: how do I check how many times one string contains another one?
Example:
s1 "babab"
s2 "bab"
Result : 2
If I use Matcher, it only recognizes the first occurrence:
String s1 = JOptionPane.showInputDialog(" ");
String s2 = JOptionPane.showInputDialog(" ");
Pattern p = Pattern.compile(s2);
Matcher m = p.matcher(s1);
int counter = 0;
while(m.find()){
System.out.println(m.group());
counter++;
}
System.out.println(counter);
I can do it like below, but I would like to use Java library classes such as Scanner, StringTokenizer, Matcher etc.:
String s1 = JOptionPane.showInputDialog(" ");
String s2 = JOptionPane.showInputDialog(" ");
String pom;
int count = 0;
for(int i = 0 ; i< s1.length() ; i++){
if(s1.charAt(i) == s2.charAt(0)){
if(i + s2.length() <= s1.length()){
pom = s1.substring(i,i+s2.length());
if(pom.equals(s2)){
count++;
}
}
}
}
System.out.println(count);
One liner solution for the lulz
longStr is the input string. findStr is the string to search for. No assumption, except that longStr and findStr must not be null and findStr must have at least 1 character.
longStr.length() - longStr.replaceAll(Pattern.quote(findStr.substring(0,1)) + "(?=" + Pattern.quote(findStr.substring(1)) + ")", "").length()
Since two matches are considered different as long as they start at different indices, and overlapping can happen, we need a way to differentiate between the matches and allow the matched parts to overlap.
The trick is to consume only the first character of the search string, and use look-ahead to assert the rest of the search string. This allows overlapping portion to be rematched, and by removing the first character of the match, we can count the number of matches.
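A related way to use the same look-ahead idea is to put the whole search string inside the look-ahead and count the zero-width matches with a Matcher. The sketch below is an illustration only, and OverlapCount and countOverlapping are invented names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts overlapping occurrences via a look-ahead.
class OverlapCount {
    static int countOverlapping(String longStr, String findStr) {
        if (findStr.isEmpty()) {
            return 0; // an empty look-ahead would match at every position
        }
        // (?=...) asserts that findStr starts here without consuming it,
        // so the next search can begin one character later.
        Pattern p = Pattern.compile("(?=" + Pattern.quote(findStr) + ")");
        Matcher m = p.matcher(longStr);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countOverlapping("babab", "bab"));
    }
}
```

Because nothing is consumed, the match starting at index 2 of "babab" is still found after the one at index 0.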
I think this might work if you know the word you are looking for in the string, though you might need to edit the regex pattern.
String string = "hellohellohellohellohellohello";
Pattern pattern = Pattern.compile("hello");
Matcher matcher = pattern.matcher(string);
int count = 0;
while (matcher.find()) count++;
The class Matcher has two methods "start" and "end" which return the start index and end index of the last match. Further, the method find has an optional parameter "start" at which it starts searching.
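As a sketch of how start and find(int) can be combined to count overlapping matches: after each hit, the search restarts one character past the start of that hit. FindFrom and countWithFindFrom are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper using Matcher.find(int) and Matcher.start().
class FindFrom {
    static int countWithFindFrom(String text, String word) {
        if (word.isEmpty()) {
            return 0;
        }
        Matcher m = Pattern.compile(Pattern.quote(word)).matcher(text);
        int count = 0;
        int from = 0;
        // find(int) resets the matcher and searches from the given index.
        while (from <= text.length() && m.find(from)) {
            count++;
            from = m.start() + 1; // step just past the start of this match
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWithFindFrom("babab", "bab"));
    }
}
```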
You can do it like this:
private int counterString(String s, String search) {
    int times = 0;
    int index = s.indexOf(search, 0);
    // indexOf returns -1 when nothing is found; 0 is a valid match position
    while (index >= 0) {
        index = s.indexOf(search, index + 1);
        ++times;
    }
    return times;
}
Some quick brute-force solution:
String someString = "bababab";
String toLookFor = "bab";
int count = 0;
for (int i = 0; i < someString.length(); i++) {
if (someString.length() - i >= toLookFor.length()) {
if (someString.substring(i, i + toLookFor.length()).equals(toLookFor) && !"".equals(toLookFor)) {
count++;
}
}
}
System.out.println(count);
This prints out 3. Please note I assume that none of the Strings is null.

Finding characters in a string

I'm writing an encoding program in which I'm supposed to delete every character in the string that appears twice. I've tried to traverse the string, but it hasn't worked. Does anyone know how to do this? Thanks.
public static String encodeScrambledAlphabet(String str)
{
String newword = str;
String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
newword += alphabet;
newword = newword.toUpperCase();
for (int i = 0, j = newword.length(); i < newword.length() && j >=0; i++,j--)
{
char one = newword.charAt(i);
char two = newword.charAt(j);
if (one == two)
{
newword = newword.replace(one, ' ');
}
}
newword = newword.replaceAll(" ", "");
return newword;
}
Assuming that you would like to keep only the first occurrence of the character, you can do this:
boolean[] seen = new boolean[65536];
StringBuilder res = new StringBuilder();
str = str.toUpperCase();
for (char c : str.toCharArray()) {
if (!seen[c]) res.append(c);
seen[c] = true;
}
return res.toString();
The seen array contains flags, one per character, indicating that we've seen this character already. If your characters are all ASCII, you can shrink the seen array to 128.
Assuming that by deleting characters that appear twice you mean AAABB becomes AAA, the code below should work for you.
static String removeDuplicate(String s) {
    StringBuilder newString = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
        String s1 = s.substring(i, i + 1);
        // The difference in length after removing every s1 gives the
        // number of occurrences of that character. String.replace treats
        // s1 literally and, since Strings are immutable, leaves s untouched.
        if (s.length() - s.replace(s1, "").length() != 2)
            newString.append(s1);
    }
    return newString.toString();
}
The efficiency of this code is arguable :) It might be a better approach to count the number of occurrences of each character with a loop.
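Counting with a loop, as suggested, might look like the following sketch. It makes one pass to count and one pass to filter. DropPairs and removeCharsAppearingTwice are made-up names, and the interpretation (drop characters that occur exactly twice) is the same assumption this answer makes.

```java
// Hypothetical helper: drops every character that occurs exactly twice.
class DropPairs {
    static String removeCharsAppearingTwice(String s) {
        // First pass: count how often each char value occurs.
        int[] counts = new int[Character.MAX_VALUE + 1];
        for (char c : s.toCharArray()) {
            counts[c]++;
        }
        // Second pass: keep only characters whose count is not 2.
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (counts[c] != 2) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeCharsAppearingTwice("AAABB"));
    }
}
```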
So, from the code that you've shown, it looks like you aren't comparing every character in the string. You are comparing the first and last, then the second and next to last. Example:
Here's your string:
THISISTHESTRINGSTRINGABCDEFGHIJKLMNOPQRSTUVWXYZ
First iteration, you will be comparing the T at the beginning, and the Z at the end.
Second iteration, you will be comparing the H and the Y.
Third: I and X
etc.
So the T at the beginning never gets compared to the rest of the characters.
I think a better way to do this would be to do a double for loop:
int length = newword.length(); // This way the number of iterations doesn't change
for (int i = 0; i < length; i++) {
    for (int j = 0; j < length; j++) {
        // Skip i == j and positions that have already been blanked out
        if (i != j && newword.charAt(i) != ' ') {
            if (newword.charAt(i) == newword.charAt(j)) {
                // Strings are immutable, so keep the result of replace
                newword = newword.replace(newword.charAt(i), ' ');
            }
        }
    }
}
I'm sure that's not the most efficient algorithm for it, but it should get it done.
EDIT: Added an if statement in the middle, to handle i==j case.
EDIT AGAIN: Here's an almost identical post: function to remove duplicate characters in a string

Parse String and Replace Letters Java

As input I have a string such as "today snowing know ". It contains 3 words, and I must parse them as follows: every character must be compared with all the other characters, and I must count how many times the same character appears in these words. For example, the letter "o" gives 2 (from "today" and "snowing") and the letter "w" gives 2 (from "know" and "snowing"). After that I must replace each character with its count (converted to a char). The result should be "13111 133211 1332".
What did I do?
First I type some words and validate them:
public void inputStringsForThreads () {
boolean flag;
do {
// will invite to input
stringToParse = Input.value();
try {
flag = true;
// in case that found nothing , space , number and other special character , throws an exception
if (stringToParse.equals("") | stringToParse.startsWith(" ") | stringToParse.matches(".*[0-9].*") | stringToParse.matches(".*[~`!@#$%^&*()-+={};:',.<>?/'_].*"))
throw new MyStringException(stringToParse);
else analizeString(stringToParse);
}
catch (MyStringException exception) {
stringToParse = null;
flag = false;
exception.AnalizeException();
}
}
while (!flag);
}
I eliminate the spaces between the words and join them into a single string:
static void analizeString (String someString) {
// + sign treat many spaces as one
String delimitator = " +";
// words is a String Array
words = someString.split(delimitator);
// temp is a string , will contain a single word
temp = someString.replaceAll("[^a-z^A-Z]","");
System.out.println("=============== Words are : ===============");
for (int i=0;i<words.length;i++)
System.out.println((i+1)+")"+words[i]);
}
Then I try to compare every letter of each word with all the letters from all the words, but I don't know how to count the occurrences of each letter and then replace the letters with the correct count. Any ideas?
// this will containt characters for every word in part
char[] motot = words[id].toCharArray();
// this will containt all characters from all words
char[] notot = temp.toCharArray();
for (int i =0;i<words[i].length();i++)
for (int j=0;j<temp.length ;j++)
{
if (i == j) {
System.out.println("Same word");
}
else if (motot[i] == notot[j] ) {
System.out.println("Found equal :"+lol[i]+" "+lol1[j]);
}}
For counting you might want to use a Map<Character, Integer> counter such as java.util.HashMap. If getting the value (Integer) for a specific key (Character) from counter returns non-null, increment it (leveraging autoboxing) and put it back. Otherwise put a new entry (char, 1) in the counter.
Replacing the letters with the numbers should be fairly easy then.
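A minimal sketch of that counter idea follows. LetterCounts and replaceWithCounts are hypothetical names, and Map.merge stands in for the explicit null check. The sketch counts every occurrence of a letter across the whole input, which is an assumption about the intended rule and may not reproduce the question's sample output for every word.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: replaces each letter with its total count.
class LetterCounts {
    static String replaceWithCounts(String input) {
        Map<Character, Integer> counter = new HashMap<>();
        // First pass: count every non-space character.
        for (char c : input.toCharArray()) {
            if (c != ' ') {
                counter.merge(c, 1, Integer::sum); // insert 1 or add 1
            }
        }
        // Second pass: emit the count in place of each character.
        StringBuilder out = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c == ' ') {
                out.append(' ');
            } else {
                out.append(counter.get(c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceWithCounts("today snowing know"));
    }
}
```

Counts of 10 or more would be written as several digits, so the output can become longer than the input.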
It is better to use Pattern Matching like this:
initially..
private Matcher matcher;
Pattern regexPattern = Pattern.compile( pattern );
matcher = regexPattern.matcher("");
for multiple patterns to match.
private final String[] patterns = new String[] {/* instantiate patterns here..*/};
private Matcher[] matchers = new Matcher[patterns.length];
for ( int i = 0; i < patterns.length; i++) {
Pattern regexPattern = Pattern.compile( patterns[i] );
matchers[i] = regexPattern.matcher("");
}
and then for matching pattern.. you do this..
if(matcher.reset(charBuffer).find() ) {//matching pattern.}
for multiple matcher check.
for ( int i = 0; i < matchers.length; i++ ) if(matchers[i].reset(charBuffer).find() ) {//matching pattern.}
Don't use string matching, not efficient.
Always use CharBuffer instead of String.
Here is some C# code (which is reasonably similar to Java):
void replace(string s){
Dictionary<char, int> counts = new Dictionary<char, int>();
foreach(char c in s){
// skip spaces
if(c == ' ') continue;
// update count for char c
if(!counts.ContainsKey(c)) counts.Add(c, 1);
else counts[c]++;
}
// replace characters in s
for(int i = 0; i < s.Length; i++)
if(s[i] != ' ')
s[i] = counts[s[i]];
}
Pay attention to immutable strings in the second loop. Might want to use a StringBuilder of some sort.
Here is a solution that works for lower case strings only. Horrible horrible code, but I was trying to see how few lines I could write a solution in.
public static String letterCount(String in) {
StringBuilder out = new StringBuilder(in.length() * 2);
int[] count = new int[26];
for (int t = 1; t >= 0; t--)
for (int i = 0; i < in.length(); i++) {
if (in.charAt(i) != ' ') count[in.charAt(i) - 'a'] += t;
out.append((in.charAt(i) != ' ') ? "" + count[in.charAt(i) - 'a'] : " ");
}
return out.substring(in.length());
}
