Reasoning behind using a divide and conquer approach - java

I am trying to solve this question on LeetCode:
A string s is nice if, for every letter of the alphabet that s contains, it appears both in uppercase and lowercase. For example, "abABB" is nice because 'A' and 'a' appear, and 'B' and 'b' appear. However, "abA" is not because 'b' appears, but 'B' does not.
Given a string s, return the longest substring of s that is nice. If there are multiple, return the substring of the earliest occurrence. If there are none, return an empty string.
For s = "YazaAay", the expected output is: "aAa"
One of the top voted solutions uses a Divide and Conquer approach:
import java.util.HashSet;
import java.util.Set;

class Solution {
    public String longestNiceSubstring(String s) {
        if (s.length() < 2) return "";
        char[] arr = s.toCharArray();
        Set<Character> set = new HashSet<>();
        for (char c : arr) set.add(c);
        for (int i = 0; i < arr.length; i++) {
            char c = arr[i];
            // Both cases of c are present in s, so c cannot break niceness here.
            if (set.contains(Character.toUpperCase(c)) && set.contains(Character.toLowerCase(c))) continue;
            // c appears in only one case anywhere in s: no nice substring can contain it,
            // so split around it and keep the longer (earlier on ties) result.
            String sub1 = longestNiceSubstring(s.substring(0, i));
            String sub2 = longestNiceSubstring(s.substring(i + 1));
            return sub1.length() >= sub2.length() ? sub1 : sub2;
        }
        // Every character appears in both cases: the whole string is nice.
        return s;
    }
}
I understand how it works, but not the intuition behind using a Divide and Conquer approach. In other words, if I revisit the problem a few days or weeks later, after I have forgotten everything about it, I won't be able to realize it is a Divide and Conquer problem.
What is that 'thing' that makes it solvable by a Divide and Conquer approach?

This is how the algorithm could be described in plain English:
If the entire string is nice, we are done.
Otherwise, there must be a character which exists in only one case. Such a character naturally divides the string into two substrings. Conquer each of them individually, and compare results.
Edit: BTW, I don't think it is a good example of a D&C problem. The point is, once we encounter the first "bad" character, the substring to the left of it is nice. There is no need to descend into it. Just record its length and keep going. A simple loop it is.

Divide-and-conquer, to paraphrase Wikipedia, is most appropriate when a problem can be broken down into "two or more subproblems". The solution here checks that the input string meets the condition, then breaks it in two at each character that violates it, and recursively checks the pieces until a nice substring (possibly empty) is found. Generally, the application of divide-and-conquer is easy to get a feel for when the problem can be subdivided symmetrically, such as in the DeWall algorithm for computing the Delaunay triangulation of a set of points (http://vcg.isti.cnr.it/publications/papers/dewall.pdf - cool stuff).
What sets the substring problem apart in this instance is that it checks all viable subdivisions by moving the point of subdivision along the string. To clarify for anyone who might be confused: this is necessary because the string can't simply be split down the middle, or else you might split a substring like "aAaA" apart and return only half of it in the end. This sort of satisfies the "or more" part of "two or more subproblems", but I agree it's not intuitive in this instance.
Hope this helps, I had to learn about this a lot recently while implementing the referenced algorithm. Someone with more experience might have a better answer.

Related

boolean character compare - trying to compare three characters at once in Java

I am trying to write a piece of code to simulate text messaging on an old keypad. 2 = a,b,c 3 = def etc.
I can read the string and pull out the character but I am trying to develop an elegant way in Java of mapping the character to the number.
I could use Character.compare, but then I am going to have to compare my character with the full alphabet.
compareOneTwo = Character.compare(ch[r], 'a'); etc
I would rather use a Boolean function that compares three characters at once using an "or"
if(ch[r] = 'a'||'b'||'c') {
But I am struggling to get this to work.
I appreciate that this is basic and probably a silly mistake but we all have to start somewhere...
Any help will be appreciated.
You said it yourself, you are trying to find a way to map the characters to the numbers, so use a map!
Map<Character, Integer> characters = new HashMap<>();
characters.put('a', 2);
characters.put('b', 2);
characters.put('c', 2);
characters.put('d', 3);
...
characters.put('z', 0);
Integer number = characters.get('a');
System.out.println(number); // Will print '2'
The initial setup is a bit more code since you have to specify the whole alphabet, but store it in a static variable and it'll be done once for your whole application.
This will definitely yield the best performance in terms of speed, and regarding memory usage it's only 26 characters and as many integers, so negligible :)
Another advantage is that it is easy to update, if you need to handle a new character like *, just add one row to the map and it's done!
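For instance, here is a minimal sketch of that static-map setup; the class name, the GROUPS array, and the loop-based initialization are just one way to do it and are not from the answer above:
    import java.util.HashMap;
    import java.util.Map;

    class Keypad {
        // Letters grouped by keypad digit, in the standard phone layout (2 = abc ... 9 = wxyz).
        private static final String[] GROUPS = {"abc", "def", "ghi", "jkl", "mno", "pqrs", "tuv", "wxyz"};
        private static final Map<Character, Integer> KEYPAD = new HashMap<>();

        static {
            // Populate the map once for the whole application.
            for (int digit = 2; digit <= 9; digit++) {
                for (char letter : GROUPS[digit - 2].toCharArray()) {
                    KEYPAD.put(letter, digit);
                }
            }
        }

        // Assumes a letter a-z (either case); unknown characters are not handled here.
        static int digitFor(char letter) {
            return KEYPAD.get(Character.toLowerCase(letter));
        }
    }
Keypad.digitFor('b') then returns 2.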
You can't use the OR operator as you wish.
You do have other alternatives if you don't want to have many conditions connected by OR (||) operators.
You can create a Set and use contains:
if (Set.of('a','b','c').contains(ch[r])) {}
Or you can use a range of characters if you need to check for a consecutive range:
if (ch[r] >= 'a' && ch[r] <= 'c') {}

Save Humanity Code in Hacker Rank

There is a problem in the string domain on HackerRank named Save Humanity, with the link:
https://www.hackerrank.com/contests/princetonolympics/challenges/save-humanity.
There are two strings given, under the condition that if the two strings are equal it returns true, and if there is a one-bit error (a single differing character) in the string it returns true with the error indices.
Otherwise it returns false in every other case.
My solution works fine for some test cases, but for some test cases the result is a timeout.
The question is how to decrease the complexity.
For checking the one-bit error I am using the charAt function; because of this the complexity rises.
Please help.
If I understand you correctly, you have two strings that you want to compare, but you want to accept them even if they differ in one character.
This can be done in one loop by just comparing the strings character by character, assuming they have the same length:
boolean atMostOneDiff(String s1, String s2) {   // e.g. s1 = "abc", s2 = "abb"
    char[] a = s1.toCharArray();
    char[] b = s2.toCharArray();
    boolean oneDiff = false;
    for (int i = 0; i < a.length; i++) {
        if (a[i] != b[i]) {
            if (oneDiff)
                return false;   // a second difference: reject
            oneDiff = true;     // remember the first difference
        }
    }
    return true;
}
This has a time complexity of just O(n), which should be fast enough for most cases. If you need faster algorithms you can maybe research Edit Distance which is the name of this problem in the general case, but I don't think there are any faster algorithms.

What is the best way of detecting whether two strings differ by one character?

I have this code, but it seems pretty unwieldy. Is there a more canonical way of doing so in Java?
public boolean oneDiff(String from, String s) {
    if (from.length() != s.length()) return false;
    int differences = 0;
    for (int charIndex = 0; charIndex < from.length(); charIndex++) {
        if (from.charAt(charIndex) != s.charAt(charIndex)) differences++;
    }
    return (differences == 1);
}
I agree with #mk. However, to minimize the loop execution you should not run the loop until the string ends. Instead you can break out of the loop as soon as the difference count becomes greater than 1. Like this:
for (int charIndex = 0; charIndex < from.length(); charIndex++) {
    if (from.charAt(charIndex) != s.charAt(charIndex)) differences++;
    if (differences > 1) break;
}
return (differences == 1);
This will give faster execution through loop optimization, if that is what you want.
Nope, that really is the best way!
There's nothing built-in because this isn't something you need to do often. The closest trick is XOR-ing two integers and then taking the Hamming weight using bitCount, in order to count how many bits differ between them:
Integer.bitCount(int1 ^ int2)
But there's nothing like that for Strings - it's not a common case, so you have to code your own. And the way you've coded it seems fine - you really do have to loop over every character. I guess you could shorten the variable names and remove the parens around your return, but that's just cosmetic.

Java string: how can I implement the length method?

My roommate's teacher gave them an assignment to implement the String length method in Java.
We have thought of two ways:
1. Check each element, and when we get the out-of-bounds exception it means we have reached the end of the string; we catch this exception, and then we know the length.
2. Every time a string is passed in to calculate the length, we add a special character to the end of it; it can be '\0', or "A", etc.
But we think that while these two ways may finish the assignment, they are bad (it's a bad habit to use exceptions that way); it's not cool.
We have googled it, but didn't find what we want.
Something like this?
int i = 0;
for (char ch : string.toCharArray()) {
    i++;
}
The pseudo-code you probably want is:
counter = 0
for (Character c in string) {
    counter = counter + 1
}
This requires you to find a way to turn a Java String into an array of characters.
Likely the teacher is trying to make his or her students think, and will be satisfied with creative solutions that solve the problem.
None of these solutions would be used in the real world, because we have the String.length() method. But the creative, problem-solving process you're learning would be used in real development.
"1. Check the element,and when get the out of bounds exception,it means the end of string,we catch this exception,then we can get the length."
Here, you're causing an exception to be thrown in the normal case. A common style guideline is for exceptions to be thrown only in exceptional cases. Compared to normal flow of control, throwing an exception can be more expensive and more difficult to follow by humans.
That said, this one of your ideas has a potential advantage for very long strings. All of the posted answers so far run in linear time and space. The time and/or additional space they take to execute is proportional to the length of the string. With this approach, you could implement an O(log n) search for the length of the string.
Linear or not, it's possible that the teacher would find this approach acceptable for its creativity. Avoid if the teacher has communicated the idea that exceptions are only for exceptional cases.
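For what it's worth, here is a rough sketch of that O(log n) idea, assuming nothing but charAt and the IndexOutOfBoundsException it throws; the method names are mine:
    static int lengthByProbing(String s) {
        // Exponential probe: grow hi until index hi - 1 is out of bounds.
        int hi = 1;
        while (isValidIndex(s, hi - 1)) {
            hi *= 2;
        }
        int lo = hi / 2; // the length is somewhere in [lo, hi)
        // Binary search for the first invalid index, which equals the length.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (isValidIndex(s, mid)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    static boolean isValidIndex(String s, int index) {
        try {
            s.charAt(index);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }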
"2. Every time a string is pass to calculate the length,we add the special character to the end of it,it can be '\0',or "A",etc.."
This idea has a flaw. What happens if the string contains your special character?
EDIT
A simple implementation would be to get a copy of the underlying char array with String.toCharArray(), then simply take its length. Unlike your ideas, this is not an in-place approach - making the copy requires additional space in memory.
String s = "foo";
int length = s.toCharArray().length;
Try this
public static int Length(String str) {
    // Note: this stops early and undercounts if str itself contains '\0'.
    str = str + '\0';
    int count = 0;
    for (int i = 0; str.charAt(i) != '\0'; i++) {
        count++;
    }
    return count;
}
What about:
"your string".toCharArray().length

Would Java indexOf (brute force method) be more practical for me or some other substring algorithm?

I'm looking at finding very short substrings (pattern, needle) in many short lines of text (haystack). However, I'm not quite sure which method to use outside the naive, brute force method.
Background: I'm doing a side project for fun where I receive text messaging chat logs of multiple users (anywhere from 2000-15000 lines of text and 2-50 users), and I want to find all the various pattern matches in the chat logs based on predetermined words that I've come up with. So far I have about 1600 patterns that I'm looking for, but I may look for more.
So for example, I want to find the number of food related words that are used in an average text message log such as "hamburger", "pizza", "coke", "lunch", "dinner", "restaurant", "McDonalds". While I gave out English examples, I will actually be using Korean for my program. Each of these designated words will have their own respective score, which I put in a hashmap as key and value separately. I then show the top scorers for food related words as well as the most frequent words used by those users for food words.
My current method is to split each line of text on whitespace and process each individual word from the haystack, using the contains method (which uses the indexOf method and the naive substring search algorithm) to check whether the word from the haystack contains the pattern.
wordFromInput.contains(wordFromPattern);
To give an example, with 17 users in chat, 13000 lines of text, and the 1600 patterns, I've found that this whole program took 12-13 seconds with this method. And on the Android app that I'm developing, it took 2 minutes and 30 seconds to process, which is far too slow.
Originally, I tried to use a hash map and to merely get the pattern instead of searching for it in the ArrayList, but I then realized that is not possible with a hash table for what I am trying to do with a substring.
I've looked around through Stack Overflow and found a lot of helpful and related questions, such as these two: 1 and 2. I'm somewhat more familiar with the various string algorithms (Boyer-Moore, KMP, etc.).
I initially thought that the naive method would of course be the worst type of algorithm for my case, but having found this question, I've realized that my case (short pattern, short text) might actually be better served by the naive method. But I wanted to know if there was something that I was neglecting completely.
Here is a snippet of my code though if anyone wants to see my issue more concretely.
While I removed large parts of the code to simplify it, the primary method that I use to actually match substrings is there in the method matchWords().
I know that's really ugly and bad code (5 for loops...), so if there are any suggestions for that, I'm happy to hear it as well.
So to clean it up:
lines of text from chat logs (2000-10,000+), haystack
1600+ patterns, needle(s)
mostly using Korean characters, although some English is included
The brute-force naive method is simply too slow, but I am debating whether there are other alternatives, and even if there are, whether they are practical given the nature of short patterns and text.
I just want some input on my thought process, and possibly some general advice. But additionally, I would like some specific suggestion for a particular algorithm or method if that is possible.
You can replace the hashtable with a Trie.
Split the line of text into words using white space to separate words. Then check if the word is in the Trie. If it is in the Trie, update a counter associated with the word. Ideally, the counter would be integrated into the Trie.
This approach is O(C) where C is the number of characters in the text. It's highly unlikely that you can avoid checking each character at least once, so this approach should be as good as you can get, at least in terms of big O.
However, it sounds like you may not want to list all of the possible words you are searching for. Therefore, you might want to simply build a counting Trie from all of the words in the text. If nothing else, that'll probably make it easier for any pattern matching algorithm you use, although it might require some modifications to the Trie.
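A minimal sketch of such a counting Trie; the class and method names are my own, and children are kept in a HashMap for simplicity:
    import java.util.HashMap;
    import java.util.Map;

    class CountingTrie {
        private static final class Node {
            final Map<Character, Node> children = new HashMap<>();
            boolean isWord; // true if a pattern word ends at this node
            int count;      // how many times that pattern word has been seen
        }

        private final Node root = new Node();

        // Insert one pattern word into the trie.
        void add(String word) {
            Node node = root;
            for (int i = 0; i < word.length(); i++) {
                node = node.children.computeIfAbsent(word.charAt(i), c -> new Node());
            }
            node.isWord = true;
        }

        // If 'word' is a known pattern, bump its counter and return true.
        boolean countIfPresent(String word) {
            Node node = root;
            for (int i = 0; i < word.length(); i++) {
                node = node.children.get(word.charAt(i));
                if (node == null) return false;
            }
            if (!node.isWord) return false;
            node.count++;
            return true;
        }
    }
Each chat line would then be split on whitespace and every token passed to countIfPresent.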
What you're describing sounds like an excellent use case for the Aho-Corasick string-matching algorithm. This algorithm finds all matches of a set of pattern strings inside of a source string and does so in linear time (plus the time to report the matches). If you have a fixed set of strings to search for, you can do linear preprocessing work up front on the patterns to search for all matches very quickly.
There's a Java implementation of Aho-Corasick available here. I haven't tried it out, but it might be a good match.
Hope this helps!
I'm pretty sure string.contains is already highly optimized, so replacing it with something else is not going to do you a lot of good.
So the way to go, I suspect, is not to look for each and every bank-word in your chat words, but rather do multiple comparisons at once.
The first way to do it would be to create one huge regular expression that will match all your bank-words. Compile it and hope the regular expression package is efficient enough (chances are - it is). You will have a rather lengthy setup stage (the regex compilation), but matches should be a lot faster.
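As a minimal sketch of that combined-regex idea, assuming the patterns live in a List<String> called bankWords (the names here are mine, and only three sample words are shown):
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class CombinedRegexDemo {
        public static void main(String[] args) {
            List<String> bankWords = List.of("hamburger", "pizza", "coke"); // stand-ins for the real word bank

            // Build one big alternation, escaping each word in case it contains metacharacters.
            StringBuilder alternation = new StringBuilder();
            for (String word : bankWords) {
                if (alternation.length() > 0) alternation.append('|');
                alternation.append(Pattern.quote(word));
            }
            // Compile once up front, then reuse the Pattern for every chat line.
            Pattern combined = Pattern.compile(alternation.toString());

            Matcher m = combined.matcher("I want pizza and a coke for lunch");
            while (m.find()) {
                System.out.println("matched: " + m.group());
            }
        }
    }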
You can build an index of the words you need to match and count them as you process them. If you can use a HashMap to look up the patterns for each word, the cost will be O(n * m), where n is the number of words and m is the average word length.
You can use a HashMap for all the possible words, you can then dissect the words later.
e.g. say you need to match red and apple, you can combine the sum of
redapple = 1
applered = 0
red = 10
apple = 15
This means that red is actually 11 (10 + 1), and apple is 16 (15 + 1)
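The basic counting step might look roughly like this; chatLines and patternScores are stand-ins for the question's data, with names of my choosing:
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class WordCounter {
        // patternScores: the pre-built map of pattern word -> score from the question.
        static Map<String, Integer> countPatternWords(List<String> chatLines,
                                                      Map<String, Integer> patternScores) {
            Map<String, Integer> counts = new HashMap<>();
            for (String line : chatLines) {                   // the haystack lines
                for (String word : line.split("\\s+")) {      // split on whitespace
                    if (patternScores.containsKey(word)) {    // word is one of the ~1600 patterns
                        counts.merge(word, 1, Integer::sum);  // increment its counter
                    }
                }
            }
            return counts;
        }
    }
Note that this only counts exact word matches; the "dissect the words later" step the answer mentions would be needed to handle compounds like redapple.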
I don't know Korean, so I imagine the same strategies used to tinker with strings in English aren't necessarily possible in Korean, but perhaps this strategy in pseudocode can be applied with your knowledge of Korean to make it work. (Java is of course still the same, but, for example, is it still highly likely in Korean for the letters "ough" to appear in succession? Are there even letters for "ough"? With that being said, hopefully the principle can still be applied.)
I would use String.toCharArray to create a two-dimensional array (or an ArrayList if a variable size is needed). The check would look something like this:
if (first letter of word matches keyword's first letter)       // we have a candidate
    skip to last letter of the current word                    // see comment below
    if (last letter of word matches keyword's last letter)     // strong candidate
        iterate backwards to start+1 checking remainder of letters
The reason I suggest skipping to the last letter is that, statistically, a "consonant, vowel" pattern for the first two letters of a word is very common, especially for nouns, which will make up a lot of your keywords since any food is a noun (almost all of the keyword examples you gave match that consonant-vowel structure). And since there are only 5 vowels (plus y), a second letter like the "i" in the keyword "pizza" is inherently likely to show up, yet after that point there is still a good chance that the word will turn out not to be a match.
However, if you know that the first letter and the last letter match, then you probably have a much stronger candidate and can then iterate in reverse. I think over larger sets of data this would eliminate candidates much faster than checking letters in order, which lets too many false candidates past the second comparison and thus increases your overall number of conditional operations. It might sound like something small, but in a project like this there is a lot of reiterating, so micro-optimizations accumulate very quickly.
If this approach can be applied to a language that is probably structurally very different from English (I'm speaking from ignorance here), then I think it might provide some efficiency for you, whether you implement it by iterating over a char array, with a scanner, or with any other construct.
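One possible reading of that pseudocode in plain Java, treating it as an exact equality check with the cheap first-letter and last-letter filters up front (the method name is mine):
    static boolean matchesKeyword(String word, String keyword) {
        if (word.length() != keyword.length()) return false;
        int last = keyword.length() - 1;
        if (word.charAt(0) != keyword.charAt(0)) return false;        // not even a candidate
        if (word.charAt(last) != keyword.charAt(last)) return false;  // skip to the last letter
        for (int i = last - 1; i > 0; i--) {                          // iterate backwards over the rest
            if (word.charAt(i) != keyword.charAt(i)) return false;
        }
        return true;
    }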
The trick is to realise that if you can describe the string you are searching for as a regular expression you can also, by definition, describe it with a state machine.
At every character in your message start a state machine for every one of your 1600 patterns and pass the character through it. This sounds scary but believe me most of them will terminate immediately anyway so you aren't really doing a huge amount of work. Bear in mind that a state machine can usually be encoded with a simple switch/case or a ch == s.charAt at each step so they are close to the ultimate in light-weight.
Obviously you know what to do whenever one of your search machines terminates at the end of their search. Any that terminate before full-match can be discarded immediately.
// Requires java.util.Iterator, java.util.LinkedList and java.util.List.
private static class Matcher {
    private final int where;
    private final String s;
    private int i = 0;

    public Matcher(String s, int where) {
        this.s = s;
        this.where = where;
    }

    public boolean match(char ch) {
        return s.charAt(i++) == ch;
    }

    public int matched() {
        return i == s.length() ? where : -1;
    }
}

// Words I am looking for.
String[] watchFor = new String[] {"flies", "like", "arrow", "banana", "a"};
// Test string to search.
String test = "Time flies like an arrow, fruit flies like a banana";

public void test() {
    // Use a LinkedList because it is O(1) to remove anywhere.
    List<Matcher> matchers = new LinkedList<>();
    int pos = 0;
    for (char c : test.toCharArray()) {
        // Fire off all of the matchers at this point.
        for (String s : watchFor) {
            matchers.add(new Matcher(s, pos));
        }
        // Discard all matchers that fail here.
        for (Iterator<Matcher> i = matchers.iterator(); i.hasNext(); ) {
            Matcher m = i.next();
            // Should it be removed?
            boolean remove = !m.match(c);
            if (!remove) {
                // Still matches! Is it complete?
                int matched = m.matched();
                if (matched >= 0) {
                    // Todo - Should use getters.
                    System.out.println(" " + m.s + " found at " + m.where + " active matchers " + matchers.size());
                    // Complete!
                    remove = true;
                }
            }
            // Remove it where necessary.
            if (remove) {
                i.remove();
            }
        }
        // Step pos to keep track.
        pos += 1;
    }
}
prints
flies found at 5 active matchers 6
like found at 11 active matchers 6
a found at 16 active matchers 2
a found at 19 active matchers 2
arrow found at 19 active matchers 6
flies found at 32 active matchers 6
like found at 38 active matchers 6
a found at 43 active matchers 2
a found at 46 active matchers 3
a found at 48 active matchers 3
banana found at 45 active matchers 6
a found at 50 active matchers 2
There are several simple optimisations. The most obvious, with some simple pre-processing, is to use the current character to determine which matchers may be applicable.
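For example, one sketch of that pre-processing (byFirstChar is my own name): group the watch words by their first character once, and only fire matchers whose first character equals the character currently being scanned.
    // Built once, before scanning (requires java.util.ArrayList, HashMap, List, Map):
    Map<Character, List<String>> byFirstChar = new HashMap<>();
    for (String s : watchFor) {
        byFirstChar.computeIfAbsent(s.charAt(0), k -> new ArrayList<>()).add(s);
    }
    // Then, inside the scan, instead of firing a matcher for every watch word:
    // for (String s : byFirstChar.getOrDefault(c, List.of())) {
    //     matchers.add(new Matcher(s, pos));
    // }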
This is a pretty broad question, so I won't go into too much detail, but roughly:
Pre-process the haystacks using something like a broad lemmatizer to create "topic word only" versions of the messages by noting which topics all the words in each message cover. For example, any occurrence of "hamburger", "pizza", "coke", "lunch", "dinner", "restaurant", or "McDonalds" would cause the "topic" word "food" to be collected for that message. Some words may have multiple topics, e.g. "McDonalds" may be in the topics "food" and "business". Most words won't have any topic.
After this process, you'll have haystacks consisting of only "topic" words. Then create a Map<String, Set<Integer>> and populate it with each topic word and the Set of chat message ids that contain it. This is a reverse index from topic word to the chat messages that contain it.
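A sketch of how that index might be populated; messages and topicsOf here are assumptions standing in for the lemmatized chat log and the word-to-topics lookup described above:
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    class TopicIndexer {
        static Map<String, Set<Integer>> buildIndex(List<String> messages,
                                                    Map<String, Set<String>> topicsOf) {
            Map<String, Set<Integer>> index = new HashMap<>();
            for (int msgId = 0; msgId < messages.size(); msgId++) {
                for (String word : messages.get(msgId).split("\\s+")) {
                    // e.g. "McDonalds" -> {"food", "business"}; most words map to no topic at all.
                    for (String topic : topicsOf.getOrDefault(word, Set.of())) {
                        index.computeIfAbsent(topic, t -> new HashSet<>()).add(msgId);
                    }
                }
            }
            return index;
        }
    }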
The runtime code to find all documents that contain all n words is then trivial and super fast - near O(#terms):
private Map<String, Set<Integer>> index; // pre-populated

Set<Integer> search(String... topics) {
    Set<Integer> results = null;
    for (String topic : topics) {
        Set<Integer> hits = index.get(topic);
        if (hits == null)
            return Collections.emptySet();
        if (results == null)
            results = new HashSet<Integer>(hits);
        else
            results.retainAll(hits);
        if (results.isEmpty())
            return Collections.emptySet(); // exit early
    }
    return results;
}
This will perform near O(1), and tell you which messages share all search terms. If you just want the number, use the trivial size() of the returned Set.
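For example, using the topic word from the answer's own example:
    Set<Integer> foodMessages = search("food"); // ids of every message containing a food-related word
    int howMany = foodMessages.size();          // or just the count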
