I get a "Terminated due to timeout" error when I run my code. Please help me.
Given two strings, determine if they share a common substring. A substring may be as small as one character.
For example, the words "a", "and", "art" share the common substring "a" . The words "be" and "cat" do not share a substring.
Input Format
The first line contains a single integer, the number of test cases.
The following pairs of lines are as follows:
The first line contains string s1 .
The second line contains string s2 .
Output Format
For each pair of strings, return YES or NO.
My code in Java:
public static void main(String args[])
{
String s1,s2;
int n;
Scanner s= new Scanner(System.in);
n=s.nextInt();
while(n>0)
{
int flag = 0;
s1=s.next();
s2=s.next();
for(int i=0;i<s1.length();i++)
{
for(int j=i;j<s2.length();j++)
{
if(s1.charAt(i)==s2.charAt(j))
{
flag=1;
}
}
}
if(flag==1)
{
System.out.println("YES");
}
else
{
System.out.println("NO");
}
n--;
}
}
}
Any tips?
Below is my approach to get through the same HackerRank challenge described above
static String twoStrings(String s1, String s2) {
String result="NO";
Set<Character> set1 = new HashSet<Character>();
for (char s : s1.toCharArray()){
set1.add(s);
}
for(int i=0;i<s2.length();i++){
if(set1.contains(s2.charAt(i))){
result = "YES";
break;
}
}
return result;
}
It passed all the test cases without timing out.
The reason for the timeout is probably: to compare two strings that each are 1.000.000 characters long, your code needs 1.000.000 * 1.000.000 comparisons, always.
There is a faster algorithm that only needs 2 * 1.000.000 comparisons. You should use the faster algorithm instead. Its basic idea is:
for each character in s1: add the character to a set (this is the first million)
for each character in s2: test whether the set from step 1 contains the character, and if so, return "yes" immediately (this is the second million)
Java already provides a BitSet data type that does all you need. It is used like this:
BitSet seenInS1 = new BitSet();
seenInS1.set('x');
seenInS1.get('x');
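Putting the two steps and the BitSet calls together, a minimal sketch could look like this. The class name BitSetTwoStrings is only an illustration and is not part of the original challenge template; the method name twoStrings follows the other answers.

```java
import java.util.BitSet;

final class BitSetTwoStrings {
    // Returns "YES" if s1 and s2 share at least one character, otherwise "NO".
    static String twoStrings(String s1, String s2) {
        BitSet seenInS1 = new BitSet();
        // Step 1: record every character of s1 (a char widens to an int bit index).
        for (int i = 0; i < s1.length(); i++) {
            seenInS1.set(s1.charAt(i));
        }
        // Step 2: stop at the first character of s2 that was seen in s1.
        for (int i = 0; i < s2.length(); i++) {
            if (seenInS1.get(s2.charAt(i))) {
                return "YES";
            }
        }
        return "NO";
    }

    public static void main(String[] args) {
        System.out.println(twoStrings("hello", "world")); // YES
        System.out.println(twoStrings("hi", "world"));    // NO
    }
}
```

Each string is scanned at most once, so the work grows with the sum of the lengths rather than their product.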
Since you're worried about execution time, if they give you an expected range of characters (for example 'a' to 'z'), you can solve it very efficiently like this:
import java.util.Arrays;
import java.util.Scanner;
public class Whatever {
final static char HIGHEST_CHAR = 'z'; // Use Character.MAX_VALUE if unsure.
public static void main(final String[] args) {
final Scanner scanner = new Scanner(System.in);
final boolean[] characterSeen = new boolean[HIGHEST_CHAR + 1];
mainloop:
for (int word = Integer.parseInt(scanner.nextLine()); word > 0; word--) {
Arrays.fill(characterSeen, false);
final String word1 = scanner.nextLine();
for (int i = 0; i < word1.length(); i++) {
characterSeen[word1.charAt(i)] = true;
}
final String word2 = scanner.nextLine();
for (int i = 0; i < word2.length(); i++) {
if (characterSeen[word2.charAt(i)]) {
System.out.println("YES");
continue mainloop;
}
}
System.out.println("NO");
}
}
}
The code was tested to work with a few inputs.
This uses a fast array rather than slower sets, and it only creates one non-String object (other than the Scanner) for the entire run of the program. It also runs in O(n) time rather than O(n²) time.
The only thing faster than an array might be the BitSet Roland Illig mentioned.
If you wanted to go completely overboard, you could also potentially speed it up by:
skipping the creation of a Scanner and all those String objects by using System.in.read(buffer) directly with a reusable byte[] buffer
skipping the standard process of having to spend time checking for and properly handling negative numbers and invalid inputs on the first line by making your own very fast int parser that just assumes it's getting the digits of a valid nonnegative int followed by a newline
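As a rough sketch of the second idea, a hand-rolled parser might look like the following. The names FastIntReader and readNonNegativeInt are made up for this example, and it reads byte by byte from any InputStream (wrap System.in in a BufferedInputStream in practice) instead of managing a reusable byte[] buffer itself. It assumes the input really is a run of ASCII digits ending in a newline and does no validation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class FastIntReader {
    // Parses ASCII digits up to the next '\n' (or end of stream).
    // Assumes valid, nonnegative input; anything else gives garbage.
    static int readNonNegativeInt(InputStream in) {
        try {
            int value = 0;
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                if (b == '\r') {
                    continue; // tolerate Windows line endings
                }
                value = value * 10 + (b - '0');
            }
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whether this is measurably faster than Scanner for a given judge is something to benchmark rather than assume.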
There are different approaches to this problem, but solving it in linear time is a bit tricky.
Still, it can be done in linear time by applying the KMP algorithm in a modified form.
Say you have two strings. Find the length of each first. Suppose string 1 is longer than string 2: use string 1 as the text and string 2 as the pattern. If the length of the text is n and the length of the pattern is m, the time complexity is O(m+n), which is far faster than O(n^2).
For this problem, the KMP search needs to be modified to get the desired result:
public static void KMPsearch(char[] text,char[] pattern)
{
int[] cache = buildPrefix(pattern);
int i=0,j=0;
while(i<text.length && j<pattern.length)
{
if(text[i]==pattern[j])
{System.out.println("Yes");
return;}
else{
if(j>0)
j = cache[j-1];
else
i++;
}
}
System.out.println("No");
return;
}
Understanding Knuth-Morris-Pratt Algorithm
There are two concepts involved in solving this question.
- Understanding that a single character is a valid substring.
- Deducing that we only need to know that the two strings have a common substring; we don't need to know what that substring is.
Thus, the key to solving this question is determining whether or not the two strings share a common character.
To do this, we create two sets, a and b, where each set contains the unique characters that appear in the string it’s named after.
Because sets don't store duplicate values, we know that the size of our sets will never exceed the 26 letters of the English alphabet.
In addition, the small size of these sets makes finding the intersection very quick.
If the intersection of the two sets is empty, we print NO on a new line; if the intersection is not empty, then we know that the two strings share one or more common characters and we print YES on a new line.
In code, it may look something like this:
import java.util.*;
public class Solution {
static Set<Character> a;
static Set<Character> b;
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
int n = scan.nextInt();
for(int i = 0; i < n; i++) {
a = new HashSet<Character>();
b = new HashSet<Character>();
for(char c : scan.next().toCharArray()) {
a.add(c);
}
for(char c : scan.next().toCharArray()) {
b.add(c);
}
// store the set intersection in set 'a'
a.retainAll(b);
System.out.println( (a.isEmpty()) ? "NO" : "YES" );
}
scan.close();
}
}
public String twoStrings(String sOne, String sTwo) {
if (sOne.equals(sTwo)) {
return "YES";
}
Set<Character> charSetOne = new HashSet<Character>();
for (Character c : sOne.toCharArray())
charSetOne.add(c);
Set<Character> charSetTwo = new HashSet<Character>();
for (Character c : sTwo.toCharArray())
charSetTwo.add(c);
charSetOne.retainAll(charSetTwo);
if (charSetOne.size() > 0) {
return "YES";
}
return "NO";
}
This should work; it was tested with some large inputs.
Python3
def twoStrings(s1, s2):
    flag = False
    for x in s1:
        if x in s2:
            flag = True
    if flag == True:
        return "YES"
    else:
        return "NO"

if __name__ == '__main__':
    q = 2
    text = [("hello","world"), ("hi","world")]
    for q_itr in range(q):
        s1 = text[q_itr][0]
        s2 = text[q_itr][1]
        result = twoStrings(s1, s2)
        print(result)
static String twoStrings(String s1, String s2) {
for (Character ch : s1.toCharArray()) {
if (s2.indexOf(ch) > -1)
return "YES";
}
return "NO";
}
I have 2 strings of pattern a.{var1}.{var2} and b.{var1}.{var2}.
Two Strings are matching if var1 in first string is same as var1 in second string, as well as var2 in first string is same as var2 in second string.
The variables can be any order like a.{var1}.{var2} and b.{var2}.{var1}.
How do I match the two strings efficiently?
Example 1:
String pattern1 = "1.{var1}";
String pattern2 = "2.{var1}";
//Match True = (1.111,2.111)
//Match False = (1.121,2.111)
Example 2:
String pattern1 = "1.{var1}.{var2}";
String pattern2 = "2.{var1}.{var2}";
//Match True = (1.11.22,2.11.22)
//Match False = (1.11.22,2.111.22)
Example 3:
String pattern1 = "1.{var1}.{var2}";
String pattern2 = "2.{var2}.{var1}";
//Match True = (1.22.11,2.11.22)
//Match False = (1.11.22,2.111.22)
So what's the best way to match these 2 strings?
I want to match these 2 strings to find out whether they are related by the pattern mentioned.
This problem extends to sets of strings, i.e. the strings in Set A have to be matched against the strings in Set B, and the pairs of strings that satisfy this matching algorithm have to be collected. The pattern remains the same when matching all strings in Set A against Set B.
This might not be the most efficient way of doing this, but it does give you the expected output.
01/05: Code updated after an error pointed out by Ole in the comments:
private boolean compareStr(String a, String b) {
ArrayList<String> aList = new ArrayList<String>(Arrays.asList(a.split("\\.")));
ArrayList<String> bList = new ArrayList<String>(Arrays.asList(b.split("\\.")));
bList.remove(0);
aList.remove(0);
if(aList.size() != bList.size())
return false;
boolean aMatchFlag = false;
for(int i=0; i< aList.size(); i++){
if (!bList.contains(aList.get(i))) {
return false;
}
}
aMatchFlag = true;
System.out.println("All elements of A are present in B");
boolean bMatchFlag = false;
for(int i=0; i< bList.size(); i++){
if (!aList.contains(bList.get(i))) {
return false;
}
}
bMatchFlag = true;
System.out.println("All elements of B are present in A");
if(aMatchFlag && bMatchFlag)
return true;
else
return false;
}
For those also looking for the performance of the code
Input:1.11.11, 2.11.11.11
Compilation time: 1.45 sec, absolute running time: 0.24 sec, cpu time: 0.26 sec, memory peak: 18 Mb, absolute service time: 1,7 sec
Input:1.11.11, 2.11.22
Compilation time: 1.25 sec, absolute running time: 0.24 sec, cpu time: 0.23 sec, memory peak: 18 Mb, absolute service time: 1,49 sec
Input:1.11.2, 2.11.22
Compilation time: 1.34 sec, absolute running time: 0.24 sec, cpu time: 0.24 sec, memory peak: 18 Mb, absolute service time: 1,58 sec
Input:1.11.2, 2.11.111
Compilation time: 1.65 sec, absolute running time: 0.28 sec, cpu time: 0.32 sec, memory peak: 18 Mb, absolute service time: 1,94 sec
You can use following String class methods:
boolean regionMatches(int toffset, String other, int ooffset, int len)
Tests whether the specified region of this string matches the specified region of the String argument.
Region is of length len and begins at the index toffset for this string and ooffset for the other string.
For ignoring case:
boolean regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len)
More information : https://docs.oracle.com/javase/tutorial/java/data/comparestrings.html
Or try to create a Regex pattern dynamically from one string and compare with other ...though not an efficient approach
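As an illustration of regionMatches on the question's inputs, the sketch below compares everything after the first dot of each string. The helper name sameAfterPrefix is hypothetical, and this only covers the case where the variables appear in the same order (Examples 1 and 2), not the swapped order of Example 3.

```java
final class RegionMatchDemo {
    // True if both strings are identical after their first '.'.
    // If a string has no '.', the whole string is compared.
    static boolean sameAfterPrefix(String a, String b) {
        int aStart = a.indexOf('.') + 1;
        int bStart = b.indexOf('.') + 1;
        int aLen = a.length() - aStart;
        int bLen = b.length() - bStart;
        // Regions of different length can never match.
        return aLen == bLen && a.regionMatches(aStart, b, bStart, aLen);
    }

    public static void main(String[] args) {
        System.out.println(sameAfterPrefix("1.111", "2.111"));       // true
        System.out.println(sameAfterPrefix("1.11.22", "2.111.22")); // false
    }
}
```

This avoids allocating substrings or arrays, which is the main attraction of regionMatches here.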
I suppose following:
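String[] arr1 = pattern1.split("\\.");
String[] arr2 = pattern2.split("\\.");
// Adding the hash codes makes the comparison independent of order.
// Adjust the indices if the first element is a fixed prefix that should be skipped.
int hash1 = arr1[0].hashCode() + arr1[1].hashCode();
int hash2 = arr2[0].hashCode() + arr2[1].hashCode();
// Equal sums suggest a match; hash collisions are possible, so confirm with equals().
boolean match = (hash1 == hash2);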
Remove the pattern prefixes from the Strings, extract the vars by splitting around the dot (assuming your vars have no dots inside), and put them in a Set. Sets don't retain order, which automatically solves the problem of ignoring the position. Then check the Sets for equality.
Running demo: https://ideone.com/5MwOHC
Example code:
final static String pattern1head = "blablabla.";
final static String pattern2head = "yada yada.";
private static Set<String> extractVars(String v){
if (v.startsWith(pattern1head)) { v = v.replace(pattern1head,""); }
else if (v.startsWith(pattern2head)) { v = v.replace(pattern2head,""); }
else { return null; }
return new HashSet<String>(Arrays.asList(v.split("\\.")));
}
private static void checkEquality(String v1, String v2) {
System.out.println("\n"+v1+" == "+v2+" ? " + extractVars(v1).equals(extractVars(v2)));
}
public static void main (String[] args) throws java.lang.Exception {
String v1 = "blablabla.123.456";
String v2 = "yada yada.123.456";
String v3 = "yada yada.456.123";
String v4 = "yada yada.123.456789";
checkEquality(v1,v2);
checkEquality(v1,v3);
checkEquality(v1,v4);
checkEquality(v2,v3);
checkEquality(v2,v4);
}
Output:
blablabla.123.456 == yada yada.123.456 ? true
blablabla.123.456 == yada yada.456.123 ? true
blablabla.123.456 == yada yada.123.456789 ? false
yada yada.123.456 == yada yada.456.123 ? true
yada yada.123.456 == yada yada.123.456789 ? false
This can be done as follows:
While we check if the first string and the first pattern match, we extract a map of the values in the string corresponding to the placeholders (var1, var2, ...) in the pattern;
While we check if the second string and the second pattern match, we also check the second string against the values of the placeholders.
This is interesting because the map placeholder -> value is computed once for a pair (first string, first pattern),
and can then be used to check every pair (second string, second pattern).
In code: create an object of type PatternMatcher from (first string, first pattern). This object will contain a map valueByPlaceHolder
used to check the other pairs.
Here are the relevant parts of the code.
Check if string and pattern match + creation of the map:
private static Optional<Map<String, String>> extractValueByPlaceHolder(
String[] sChunks, String[] patternChunks) {
// string and pattern should have the same length
if (sChunks.length != patternChunks.length)
return Optional.empty();
Map<String, String> valueByPlaceHolder = new HashMap<>(sChunks.length);
for (int i = 0; i < patternChunks.length; i++) {
String patternChunk = patternChunks[i];
String sChunk = sChunks[i];
if (isAPlaceHolder(patternChunk)) { // first char = {, last char = }
valueByPlaceHolder.put(patternChunk, sChunk); // just get the value
} else if (!patternChunk.equals(sChunk)) {
// if it's not a placeholder, the chunks should be the same in the string
// and the pattern
return Optional.empty();
}
}
return Optional.of(valueByPlaceHolder);
}
Check if the other string and the other pattern match + comparison with the first (string, pattern) pair:
public boolean check(String[] otherChunks, String[] otherPatternChunks) {
// other string and other pattern should have the same length, other string and string too
if (otherChunks.length != this.chunks_length || otherChunks.length != otherPatternChunks.length)
return false;
for (int i = 0; i < otherChunks.length; i++) {
String otherPatternChunk = otherPatternChunks[i];
String otherChunk = otherChunks[i];
// get the value from the first string if it's a placeholder, else keep the pattern chunk
String expectedChunk = this.valueByPlaceHolder
.getOrDefault(otherPatternChunk, otherPatternChunk);
// the chunk is neither equal to the value of the placeholder, nor to the chunk of the pattern
if (!expectedChunk.equals(otherChunk))
return false;
}
return true;
}
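Since only fragments are shown above, here is one way the pieces might fit together in a single class. This is a reconstruction under assumptions, not the author's full code: the constructor, the firstMatches field, and the body of isAPlaceHolder are guesses based on the comments in the fragments, and strings are split on dots inside the class for brevity.

```java
import java.util.HashMap;
import java.util.Map;

final class PatternMatcher {
    private final Map<String, String> valueByPlaceHolder = new HashMap<>();
    private final boolean firstMatches; // did (first string, first pattern) match?

    PatternMatcher(String s, String pattern) {
        String[] sChunks = s.split("\\.");
        String[] patternChunks = pattern.split("\\.");
        boolean ok = sChunks.length == patternChunks.length;
        for (int i = 0; ok && i < patternChunks.length; i++) {
            if (isAPlaceHolder(patternChunks[i])) {
                valueByPlaceHolder.put(patternChunks[i], sChunks[i]); // remember the value
            } else {
                ok = patternChunks[i].equals(sChunks[i]); // literal chunks must be equal
            }
        }
        firstMatches = ok;
    }

    // Assumed definition: first char is '{' and last char is '}'.
    static boolean isAPlaceHolder(String chunk) {
        return chunk.startsWith("{") && chunk.endsWith("}");
    }

    boolean check(String other, String otherPattern) {
        if (!firstMatches) {
            return false;
        }
        String[] otherChunks = other.split("\\.");
        String[] otherPatternChunks = otherPattern.split("\\.");
        if (otherChunks.length != otherPatternChunks.length) {
            return false;
        }
        for (int i = 0; i < otherChunks.length; i++) {
            // placeholder -> value captured from the first string; literal -> itself
            String expected = valueByPlaceHolder
                    .getOrDefault(otherPatternChunks[i], otherPatternChunks[i]);
            if (!expected.equals(otherChunks[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        PatternMatcher m = new PatternMatcher("1.22.11", "1.{var1}.{var2}");
        System.out.println(m.check("2.11.22", "2.{var2}.{var1}")); // true
    }
}
```

One matcher built from a string in Set A can then be reused against every candidate in Set B without recomputing the map.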
Use String.split() and then String.equals() on the resulting array elements, handling your three cases separately.
After splitting, first check that both the resulting arrays have the same length (if not they don’t match). Also use String.equals() for checking that the first element is "1" and "2" if this is required. Then branch on whether the length is 2 or 3. If length is 2, check that the match is as in your example 1; again use String.equals() on the array elements. If length is 3, you need to check both orders of the variable parts in accordance with your two examples 2 and 3.
Remember that the argument to String.split() is a regular expression, and that the dot has a special meaning in regular expressions. So you need to use .split("\\."), not .split(".").
It should run pretty fast too. However, don’t start optimizing until you know you need better performance. Readability is king.
Edit: I present my own solution:
public static boolean match(String s1, String s2) {
String[] a1 = s1.split("\\.", 4);
String[] a2 = s2.split("\\.", 4);
if (a1.length != a2.length) {
return false;
}
if (a1[0].equals("1") && a2[0].equals("2")) {
if (a1.length == 2) {
return a1[1].equals(a2[1]);
} else if (a1.length == 3) {
return (a1[1].equals(a2[1]) && a1[2].equals(a2[2]))
|| (a1[1].equals(a2[2]) && a1[2].equals(a2[1]));
}
}
return false;
}
Trying it with the 6 examples from the question:
System.out.println("(1.111,2.111) " + match("1.111", "2.111"));
System.out.println("(1.121,2.111) " + match("1.121", "2.111"));
System.out.println("(1.11.22,2.11.22) " + match("1.11.22", "2.11.22"));
System.out.println("(1.11.22,2.111.22) " + match("1.11.22", "2.111.22"));
System.out.println("(1.22.11,2.11.22) " + match("1.22.11", "2.11.22"));
System.out.println("(1.11.22,2.111.22) " + match("1.11.22", "2.111.22"));
This prints:
(1.111,2.111) true
(1.121,2.111) false
(1.11.22,2.11.22) true
(1.11.22,2.111.22) false
(1.22.11,2.11.22) true
(1.11.22,2.111.22) false
So recently I got invited to this google foo.bar challenge and I believe the code runs the way it should be. To be precise what I need to find is the number of occurrences of "abc" in a String. When I verify my code with them, I pass 3/10 test cases. I'm starting to feel bad because I don't know what I am doing wrong. I have written the code which I will share with you guys. Also the string needs to be less than 200 characters. When I run this from their website, I pass 3 tests and fail 7. Basically 7 things need to be right.
The actual question:
Write a function called answer(s) that, given a non-empty string less
than 200 characters in length describing the sequence of M&Ms. returns the maximum number of equal parts that can be cut from the cake without leaving any leftovers.
Example : Input : (string) s = "abccbaabccba"
output : (int) 2
Input: (string) s = "abcabcabcabc"
output : (int) 4
public static int answer(String s) {
int counter = 0;
int index;
String findWord ="ABC";
if(s!=null && s.length()<200){
s = s.toUpperCase();
while (s.contains(findWord))
{
index = s.indexOf(findWord);
s = s.substring(index + findWord.length(), s.length());
counter++;
}
}
return counter;
}
I see a couple of things in your code snippet:
1.
if(s.length()<200){
Why are you checking that the length is less than 200? Is that a requirement? If not, you can skip the length check.
2.
String findWord ="abc";
...
s.contains(findWord)
Can the test program be checking for upper case alphabets? Example: "ABC"? If so, you might need to consider changing your logic for the s.contains() line.
Update:
You should also consider putting a null check for the input string. This will ensure that the test cases will not fail for null inputs.
The logic of your code is fine, but I noticed that you don't check whether the input string is empty or null.
I believe that Google foo.bar wants to see sound logic and a proper coding style.
So don't feel bad.
I would go for a simpler approach
int beforeLen = s.length ();
String after = s.replace (findWord, "");
int afterLen = after.length ();
return (beforeLen - afterLen) / findWord.length ();
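Wrapped into a complete method, the snippet above could look like this sketch. The class and method names are illustrative only, and the guards for null or empty arguments are an addition, not part of the original answer.

```java
final class ReplaceCount {
    // Counts non-overlapping occurrences of findWord in s by measuring
    // how much shorter s becomes once every occurrence is removed.
    static int countOccurrences(String s, String findWord) {
        if (s == null || findWord == null || findWord.isEmpty()) {
            return 0; // nothing to count; also avoids dividing by zero
        }
        String after = s.replace(findWord, "");
        return (s.length() - after.length()) / findWord.length();
    }

    public static void main(String[] args) {
        System.out.println(countOccurrences("abcabcabcabc", "abc")); // 4
        System.out.println(countOccurrences("abccbaabccba", "abc")); // 2
    }
}
```

String.replace treats findWord literally (no regex), so special characters in the search word need no escaping.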
String pattern = "abc";
String line="<input text here>";
int i=0;
Pattern TokenPattern=Pattern.compile(pattern);
if(line!=null){
Matcher m=TokenPattern.matcher(line);
while(m.find()){
i++;
}}
System.out.println("No of occurences : "+ " "+i);
Put the declaration of index before the while block; it is never good to re-declare the same variable n times.
int index;
while (s.contains(findWord))
{
index = s.indexOf(findWord);
....
}
I hope this helps.
Update:
Try to compact your code:
public static int answer(String s) {
int counter = 0;
int index;
String findWord = "ABC";
if (s != null && s.length() < 200) {
s = s.toUpperCase();
while ((index = s.indexOf(findWord)) > -1) {
s = s.substring(index + findWord.length(), s.length());
counter++;
}
}
return counter;
}
Update:
The logic seems good to me. I'm still trying to improve the performance; if you can, try this (index must be initialized to 0 before the loop):
int index = 0;
while ((index = s.indexOf(findWord, index)) > -1) {
    //s = s.substring(index + findWord.length(), s.length());
    index += findWord.length();
    counter++;
}
I'm using codingbat.com to get some java practice in. One of the String problems, 'withoutString' is as follows:
Given two strings, base and remove, return a version of the base string where all instances of the remove string have been removed (not case sensitive).
You may assume that the remove string is length 1 or more. Remove only non-overlapping instances, so with "xxx" removing "xx" leaves "x".
This problem can be found at: http://codingbat.com/prob/p192570
As you can see from the dropbox-linked screenshot below, all of the runs pass except for three and a final one called "other tests." The thing is, even though they are marked as incorrect, my output matches exactly the expected output for the correct answer.
Here's a screenshot of my output:
And here's the code I'm using:
public String withoutString(String base, String remove) {
String result = "";
int i = 0;
for(; i < base.length()-remove.length();){
if(!(base.substring(i,i+remove.length()).equalsIgnoreCase(remove))){
result = result + base.substring(i,i+1);
i++;
}
else{
i = i + remove.length();
}
if(result.startsWith(" ")) result = result.substring(1);
if(result.endsWith(" ") && base.substring(i,i+1).equals(" ")) result = result.substring(0,result.length()-1);
}
if(base.length()-i <= remove.length() && !(base.substring(i).equalsIgnoreCase(remove))){
result = result + base.substring(i);
}
return result;
}
Your solution IS failing AND there is a display bug in coding bat.
The correct output should be:
withoutString("This is a FISH", "IS") -> "Th  a FH"
Yours is:
withoutString("This is a FISH", "IS") -> "Th a FH"
Yours fails because it is removing spaces, but also, coding bat does not display the correct expected and run output string due to HTML removing extra spaces.
This recursive solution passes all tests:
public String withoutString(String base, String remove) {
int remIdx = base.toLowerCase().indexOf(remove.toLowerCase());
if (remIdx == -1)
return base;
return base.substring(0, remIdx ) +
withoutString(base.substring(remIdx + remove.length()) , remove);
}
Here is an example of an iterative solution. It has more code than the recursive solution but is faster since far fewer function calls are made.
public String withoutString(String base, String remove) {
int remIdx = 0;
int remLen = remove.length();
remove = remove.toLowerCase();
while (true) {
remIdx = base.toLowerCase().indexOf(remove);
if (remIdx == -1)
break;
base = base.substring(0, remIdx) + base.substring(remIdx + remLen);
}
return base;
}
I just ran your code in an IDE. It compiles correctly and matches all tests shown on codingbat. There must be some bug with codingbat's test cases.
If you are curious, this problem can be solved with a single line of code:
public String withoutString(String base, String remove) {
return base.replaceAll("(?i)" + remove, ""); //String#replaceAll(String, String) with case insensitive regex.
}
Regex explanation:
The first argument taken by String#replaceAll(String, String) is what is known as a Regular Expression or "regex" for short.
Regex is a powerful tool to perform pattern matching within Strings. In this case, the regular expression being used is (assuming that remove is equal to IS):
(?i)IS
This particular expression has two parts: (?i) and IS.
IS matches the string "IS" exactly, nothing more, nothing less.
(?i) is simply a flag to tell the regex engine to ignore case.
With (?i)IS, all of: IS, Is, iS and is will be matched.
As an addition, this is (almost) equivalent to the regular expressions: (IS|Is|iS|is), (I|i)(S|s) and [Ii][Ss].
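One caveat with building the regex by plain concatenation: if remove contains characters that are special in a regex (such as "." or "("), they are interpreted as regex syntax. A hedged variant that treats remove literally uses Pattern.quote; the class name below is only for illustration.

```java
import java.util.regex.Pattern;

final class WithoutStringQuoted {
    // Removes all case-insensitive, non-overlapping occurrences of remove from base.
    // Pattern.quote wraps remove so that regex metacharacters match literally.
    static String withoutString(String base, String remove) {
        return base.replaceAll("(?i)" + Pattern.quote(remove), "");
    }

    public static void main(String[] args) {
        System.out.println(withoutString("Hello there", "llo")); // He there
        System.out.println(withoutString("a.b.c", "."));         // abc
    }
}
```

Without the quoting, the second call would treat "." as "any character" and return an empty string.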
EDIT
Turns out that your output is not correct and is failing as expected. See: dansalmo's answer.
public String withoutString(String base, String remove) {
String temp = base.replaceAll(remove, "");
String temp2 = temp.replaceAll(remove.toLowerCase(), "");
return temp2.replaceAll(remove.toUpperCase(), "");
}
Please find below my solution
public String withoutString(String base, String remove) {
final int rLen=remove.length();
final int bLen=base.length();
String op="";
for(int i = 0; i < bLen;)
{
if(!(i + rLen > bLen) && base.substring(i, i + rLen).equalsIgnoreCase(remove))
{
i +=rLen;
continue;
}
op += base.substring(i, i + 1);
i++;
}
return op;
}
Sometimes things go really weird on CodingBat; this is just one of them.
I am adding to a previous solution, but using a StringBuilder for better practice. Most credit goes to Anirudh.
public String withoutString(String base, String remove) {
//create a constant integer the size of remove.length();
final int rLen=remove.length();
//create a constant integer the size of base.length();
final int bLen=base.length();
//Create an empty string;
StringBuilder op = new StringBuilder();
//Create the for loop.
for(int i = 0; i < bLen;)
{
//if a substring of the remove string's length still fits at position i
// and that substring equals the remove string (ignoring case).
if(!(i + rLen > bLen) && base.substring(i, i + rLen).equalsIgnoreCase(remove))
{
//Increment by the remove length, and skip adding it to the string.
i +=rLen;
continue;
}
//else, we add the character at i to the string builder.
op.append(base.charAt(i));
//and increment by one.
i++;
}
//We return the string.
return op.toString();
}
Taylor's solution is the most efficient one, however I have another solution that is a naive one and it works.
public String withoutString(String base, String remove) {
String returnString = base;
while(returnString.toLowerCase().indexOf(remove.toLowerCase())!=-1){
int start = returnString.toLowerCase().indexOf(remove.toLowerCase());
int end = remove.length();
returnString = returnString.substring(0, start) + returnString.substring(start+end);
}
return returnString;
}
@Daemon
Your code works. Thanks for the regex explanation. Though dansalmo pointed out that codingbat is displaying the intended output incorrectly, I threw in some extra lines to your code to (unnecessarily) account for the double spaces:
public String withoutString(String base, String remove){
String result = base.replaceAll("(?i)" + remove, "");
for(int i = 0; i < result.length()-1;){
if(result.substring(i,i+2).equals("  ")){
result = result.replace(result.substring(i,i+2), " ");
}
else i++;
}
if(result.startsWith(" ")) result = result.substring(1);
return result;
}
public String withoutString(String base, String remove){
return base.replace(remove,"");
}
String a="(Yeahhhh) I have finally made it to the (top)";
The String above contains 4 occurrences of '(' and ')' altogether.
My idea is to count them using the String.charAt method. However, this seems rather slow, as I have to perform this count at least 10000 times due to the nature of my project.
Does anyone have a better idea or suggestion than using the charAt method?
Sorry for not explaining clearly earlier: by 10000 times I mean there are 10000 sentences to be analyzed, and the String a above is just one of them.
StringUtils.countMatches(wholeString, searchedString) (from commons-lang)
searchedString may be one-char - "("
It (as noted in the comments) calls charAt(..) multiple times. However, what is the complexity? Well, it's O(n), since charAt(..) has complexity O(1), so I don't understand why you find it slow.
Sounds like homework, so I'll try to keep it at the "nudge in the right direction".
What if you removed all characters NOT the character you are looking for, and look at the length of that string?
There is a String method that will help you with this.
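For readers who want the hint spelled out, one possible reading of it is sketched below. The choice of replaceAll with a negated character class is an assumption about which String method was meant, and the class and method names are illustrative.

```java
final class ParenCount {
    // Deletes every character that is not '(' or ')' and
    // returns the length of what remains.
    static int countParens(String s) {
        return s.replaceAll("[^()]", "").length();
    }

    public static void main(String[] args) {
        String a = "(Yeahhhh) I have finally made it to the (top)";
        System.out.println(countParens(a)); // 4
    }
}
```

This is compact, but it builds a new string and runs the regex engine, so it is unlikely to beat a plain charAt loop on speed.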
You can use toCharArray() once and iterate over that. It might be faster.
Why do you need to do this 10000 times per String? Why don't you simply remember the result of the first time? This would save a lot more than speeding up a single counting.
You can achieve this by following method.
This method returns a map with each character as the key and its number of occurrences in the input string as the value.
Map<Character, Integer> countMap = new HashMap<>();
public Map<Character, Integer> updateCountMap(String inStr, Map<Character, Integer> countMap)
{
    char[] chars = inStr.toCharArray();
    for(int i=0;i<chars.length;i++)
    {
        if(!countMap.containsKey(chars[i]))
        {
            // first occurrence of this character
            countMap.put(chars[i], 1);
        }
        else
        {
            // seen before: increment the stored count
            countMap.put(chars[i], countMap.get(chars[i])+1);
        }
    }
    return countMap;
}
We can read the file line by line and call the above method for every line. Each time, the map keeps adding to the counts (number of occurrences) for each character. Thus the character array never gets too long, and we achieve what we need.
Advantage:
Single iteration over the input string's characters.
Character array size never grows too large.
Result map contains occurrences for each character.
Cheers
You could do that with Regular Expressions:
Pattern pattern = Pattern.compile("[\\(\\)]"); //Pattern says either '(' or ')'
Matcher matcher = pattern.matcher("(Yeahhhh) I have finally made it to the (top)");
int count = 0;
while (matcher.find()) { //call find until nothing is found anymore
count++;
}
System.out.println("count "+count);
The pro is that patterns are very flexible. You could also search for words in parentheses: "\\(\\w+\\)" (a '(' followed by one or more word characters, followed by ')').
The con is that it may be overkill for very simple cases.
See the Javadoc of Pattern for more details on regular expressions.
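To illustrate that flexibility, here is a small sketch that counts whole parenthesized words with the second pattern mentioned above. The class, constant, and method names are made up for the example; the pattern is compiled once and reused, which matters when many sentences are processed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParenWordCount {
    // '(' followed by one or more word characters, followed by ')'
    private static final Pattern PAREN_WORD = Pattern.compile("\\(\\w+\\)");

    static int countParenthesizedWords(String s) {
        Matcher matcher = PAREN_WORD.matcher(s);
        int count = 0;
        while (matcher.find()) { // each find() advances past the previous match
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String a = "(Yeahhhh) I have finally made it to the (top)";
        System.out.println(countParenthesizedWords(a)); // 2
    }
}
```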
I tested the following methods on 10M strings, counting the "," character.
// split a string by ","
public static int nof1(String s)
{
int n = 0;
if (s.indexOf(',') > -1)
n = s.split(",").length - 1;
return n;
} // end method nof1
// count "," using char[]
public static int nof2(String s)
{
char[] C = s.toCharArray();
int n = 0;
for (char c : C)
{
if (c == ',')
n++;
} // end for c
return n;
} // end method nof2
// replace "," and calculate difference in length
public static int nof3(String s)
{
String s2 = s.replaceAll(",", "");
return s.length() - s2.length();
} // end method nof3
// count "," using charAt
public static int nof4(String s)
{
int n = 0;
for(int i = 0; i < s.length(); i++)
{
if (',' == s.charAt(i) )
n++;
} // end for i
return n;
} // end method nof4
// count "," using Pattern
public static int nof5(String s)
{
// Pattern pattern = Pattern.compile(","); // compiled outside the method
Matcher matcher = pattern.matcher(s);
int n = 0;
while (matcher.find() )
{
n++;
}
return n;
} // end method nof5
The results:
nof1: 4538 ms
nof2: 474 ms
nof3: 4357 ms
nof4: 357 ms
nof5: 1780 ms
So, charAt is the fastest one. BTW, grep -o ',' | wc -l took 7402 ms.