First non-repeating character in a stream

First non-repeating character in a stream - java

My answer to this question is as follows, but I want to know if I can use this code and what will be the complexity:
import java.util.LinkedHashMap;
import java.util.Map.Entry;
public class FirstNonRepeatingCharacterinAString {
private char firstNonRepeatingCharacter(String str) {
LinkedHashMap<Character, Integer> hash =
new LinkedHashMap<Character, Integer>();
for(int i = 0 ; i< str.length() ; i++)
{
if(hash.get(str.charAt(i))==null)
hash.put(str.charAt(i), 1);
else
hash.put(str.charAt(i), hash.get(str.charAt(i))+1);
}
System.out.println(hash.toString());
for(Entry<Character, Integer> c : hash.entrySet())
{
if(c.getValue() == 1)
return c.getKey();
}
return 0 ;
}
public static void main(String args[])
{
String str = "geeksforgeeks";
FirstNonRepeatingCharacterinAString obj =
new FirstNonRepeatingCharacterinAString();
char c = obj.firstNonRepeatingCharacter(str);
System.out.println(c);
}
}

Your question about whether you "can use this code" is a little ambiguous - if you wrote it, I'd think you can use it :)
As for the complexity, it is O(n) where n is the number of characters in the String. To count the number of occurrences, you must iterate over the entire String, plus iterate over them again to find the first one with a count of 1. In the worst case, you have no non-repeating characters, or the only non-repeating character is the last one. In either case, you have to iterate over the whole String once more. So it's O(n+n) = O(n).
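The two passes described above could be sketched as follows. This is only an illustrative variant, not the code from the question: the class name FirstNonRepeatingSketch is made up, and the second pass walks the string itself rather than the map's entry set, so the result does not depend on the map's iteration order.

```java
import java.util.HashMap;
import java.util.Map;

final class FirstNonRepeatingSketch {

    // Returns the first character that occurs exactly once, or 0 if there is none.
    static char firstNonRepeating(String str) {
        Map<Character, Integer> counts = new HashMap<>();
        // Pass 1: count every character (expected O(n) with a hash map).
        for (int i = 0; i < str.length(); i++) {
            counts.merge(str.charAt(i), 1, Integer::sum);
        }
        // Pass 2: walk the string in its original order (O(n)).
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (counts.get(c) == 1) {
                return c;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(firstNonRepeating("geeksforgeeks")); // prints f
    }
}
```

Both loops touch each character once, which is where the O(n+n) = O(n) bound comes from.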
EDIT
One thing worth checking in your code is the iteration order of the map. A LinkedHashMap created with the default constructor is insertion-ordered, and re-inserting an existing key with put(Character,Integer) does not change that key's position, so the entry set is iterated in order of first occurrence and your second loop does return the first non-repeating character. Re-ordering on put only happens when the map is constructed in access-order mode (the three-argument constructor with accessOrder set to true). If you want to avoid the repeated get and put calls and the Integer boxing, you could use a LinkedHashMap<Character,int[]> instead: check for the presence of the key before putting, and if it exists, merely increment the value stored in the int[]. That is an optimization rather than a correctness fix. In the worst case you still have to go through the entire entry set, but the loop returns early if the first non-repeating character comes sooner.

Related

ArrayList vs HashMap time complexity

The scenario is the following:
You have 2 strings (s1, s2) and want to check whether one is a permutation of the other, so you generate all permutations of, let's say, s1, store them, and then iterate over them and compare against s2 until it's either found or not.
Now, in this scenario, I am deliberating whether an ArrayList or a HashMap is better to use when considering strictly time complexity, as I believe both have O(N) space complexity.
According to the javadocs, ArrayList has a search complexity of O(N) whereas HashMap is O(1). If this is the case, is there any reason to favor using ArrayList over HashMap here, since HashMap would be faster?
The only potential downside I could think of is that your (k,v) pairs might be a bit weird if the key equals the value, i.e. {k = "ABCD", v = "ABCD"}, etc.

As shown here:
class GFG {
    static int NO_OF_CHARS = 256;

    /* Checks whether two strings are permutations of each other.
       Assumes every character value is below NO_OF_CHARS. */
    static boolean arePermutation(char str1[], char str2[])
    {
        // Strings of different lengths cannot be permutations
        // of each other, e.g. "aaca" and "aca"
        if (str1.length != str2.length)
            return false;

        // Create 2 count arrays; Java initializes all values to 0
        int count1[] = new int[NO_OF_CHARS];
        int count2[] = new int[NO_OF_CHARS];

        // For each character in the input strings,
        // increment the count in the corresponding
        // count array
        for (int i = 0; i < str1.length; i++)
        {
            count1[str1[i]]++;
            count2[str2[i]]++;
        }

        // Compare count arrays
        for (int i = 0; i < NO_OF_CHARS; i++)
            if (count1[i] != count2[i])
                return false;
        return true;
    }

    /* Driver program to test arePermutation */
    public static void main(String args[])
    {
        char str1[] = ("geeksforgeeks").toCharArray();
        char str2[] = ("forgeeksgeeks").toCharArray();
        if (arePermutation(str1, str2))
            System.out.println("Yes");
        else
            System.out.println("No");
    }
}
// This code is contributed by Nikita Tiwari.
If you're set on your implementation, use a HashSet. It still has O(1) expected lookup time, and it stores only the elements themselves, with no separate values.

You can use HashSet as you need only one parameter.

Counting Common Elements in a String Array

I am trying to write a program that counts the characters common to all the Strings in a String[] array. I have the following:
A master array and a flag array, both of size 26.
For each string, I mark frequency 1 in the flag array for each character that appears in the string, without incrementing it.
Then I add the values of the flag array to the corresponding values of the master array.
My code looks like this:
for(String str : arr)
{
for(char ch : str.toCharArray())
{
flag[ch - 97] = 1;
master[ch - 97] =master[ch -97] + flag[ch - 97];
}
}
My plan is to finally count elements in the master array that have value equal to input string array's length. This count will represent the count of characters that are common to all the strings
But my code has a flaw.
If a String has duplicate characters, for example 'ball' (with two ls), the corresponding value in the master array is incremented again,
which makes its value larger than what I wanted.
So this is what I did.
for(String str : arr)
{
newstr = ""; //to keep track of each character in the string
for(char ch : str.toCharArray())
{
int counter = 0;
for(int i = 0; i < newstr.length();i++)
{
char ch2 = newstr.charAt(i);
if (ch == ch2 )
{
counter = counter + 1; //if duplicate
break;
}
}
if(counter == 1)
{
break;
}
flag[ch - 97] = 1;
master[ch - 97] =master[ch -97] + flag[ch - 97];
newstr = newstr + ch;
}
}
Is this the right approach? or could this code be more optimized?

IMHO, "the right approach" is one you fully understand and can refactor at will. There are generally multiple ways to approach any programming problem. Personally, I would approach (what I think is) the problem you are trying to solve in a manner more idiomatic to Java.
For the entire array of strings you are going to examine, every character in the first string is trivially in every string examined so far, so every character in that first string goes into a Map<Character, Integer>, i.e. charCountMap.put(aChar, 1). For the second string and every string thereafter: if a character in the string under examination is in the map's keySet, increment that key's associated value, i.e. charCountMap.put(aChar, charCountMap.get(aChar) + 1). After examining every character in every string, the keys that map to values matching the original string array's length are exactly the characters that were found in every string.
So far, this proposed solution doesn't solve the repeating character problem you describe above. To solve that part, I think you need to keep a separate collection of unique characters "seen so far" in the "string under examination" (and empty it for every new string). You would check "seen so far" first and skip all further processing of a character found there; only characters not "seen so far" (in this string under examination) would be checked against the map's keySet.
There is also a recursive approach to programming a solution to this problem, but I'll leave that fruit hanging low...
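A minimal sketch of the map plus "seen so far" idea might look like this. The names CommonCharsSketch and countCommon are invented for illustration, and the sketch simplifies the description slightly: every string, including the first, adds 1 per distinct character, which yields the same final counts.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class CommonCharsSketch {

    // Counts the characters that occur in every string of arr.
    static int countCommon(String[] arr) {
        Map<Character, Integer> charCountMap = new HashMap<>();
        for (String str : arr) {
            // "Seen so far" is reset for each string under examination.
            Set<Character> seenSoFar = new HashSet<>();
            for (char ch : str.toCharArray()) {
                // add() returns false for a duplicate within this string,
                // so each string contributes at most 1 per character.
                if (seenSoFar.add(ch)) {
                    charCountMap.merge(ch, 1, Integer::sum);
                }
            }
        }
        int common = 0;
        for (int count : charCountMap.values()) {
            if (count == arr.length) {
                common++;
            }
        }
        return common;
    }
}
```

For {"ball", "all", "call"} only 'a' and 'l' reach a count of 3, so the method returns 2 even though "ball" contains two ls.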

concatenation of distinct substrings

question - Arrange all the distinct substrings of a given string in lexicographical order and concatenate them. Print the Kth character of the concatenated string. It is assured that given value of K will be valid i.e. there will be a Kth character
Input Format
First line will contain a number T i.e. number of test cases.
First line of each test case will contain a string containing characters (a−z) and second line will contain a number K.
Output Format
Print Kth character ( the string is 1 indexed )
Constraints
1≤T≤5
1≤length≤105
K will be an appropriate integer.
Sample Input #00
1
dbac
3
Sample Output #00
c
Explanation #00
The substrings when arranged in lexicographic order are as follows
a, ac, b, ba, bac, c, d, db, dba, dbac
On concatenating them, we get
aacbbabaccddbdbadbac
The third character in this string is c and hence the answer.
This is my code :
import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;
public class Solution
{
public static void gen(String str,int k)
{
int i,c;ArrayList<String>al=new ArrayList<String>();
for(c=0;c<str.length();c++)
{
for(i=1;i<=str.length()-c;i++)
{
String sub = str.substring(c,c+i);
al.add(sub);
}
}
HashSet hs = new HashSet();
hs.addAll(al);
al.clear();
al.addAll(hs);
String[] res = al.toArray(new String[al.size()]);
Arrays.sort(res);
StringBuilder sb= new StringBuilder();
for(String temp:res)
{
sb.append(temp);
}
String s = sb.toString();
System.out.println(s.charAt(k-1));
}
public static void main(String[] args)
{
Scanner sc = new Scanner (System.in);
int t = Integer.parseInt(sc.nextLine());
while((t--)>0)
{
String str = sc.nextLine();
int k = Integer.parseInt(sc.nextLine());
gen(str,k);
}
}
}
This code works well for small inputs such as the test case above, but for large inputs it either times out or fails with the error below. I understand that the problem is memory. Is there an alternative method for this question, or any way to reuse the same memory?
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.lang.String.substring(String.java:1913)
at Solution.gen(Solution.java:19)
at Solution.main(Solution.java:54)

With the constraints you are given (up to 105 characters) you shouldn't be having out-of-memory problems. Perhaps you were testing with very big strings.
So in case you have, here are some places where you are wasting memory:
After you fill the set, you copy it to your list. This means two copies of the collection of substrings, while you are not going to use the set any more.
After you copy the list to an array, you now have three copies of the collection of substrings, although you are not going to use the list anymore.
Now you create a StringBuilder and put all the substrings into it. But it's not really interesting to know the entire concatenated string. We only need one character in it, so why put the concatenation in memory at all? In addition, in all the wasteful copies above, at least you didn't duplicate the substrings themselves. But now that you are appending them to the StringBuilder, you are creating a duplicate of them. And that's going to be a very long string.
And then you copy the StringBuilder's content to a new string by using toString(). This creates a copy of the very large concatenated string (which we already said we don't actually need).
You already got sound advice to use a TreeSet and fill it directly rather than creating a list, a set, and a sorted list. The next step is to extract the correct character from that set without actually keeping the concatenated string around.
So, assuming your set is called set:
Iterator<String> iter = set.iterator();
int lengthSoFar = 0;
String str = null;
while (lengthSoFar < k && iter.hasNext()) {
    str = iter.next(); // Got the next substring
    lengthSoFar += str.length();
}
// At this point we have the substring where we expect the k'th
// character to be.
System.out.println(str.charAt(k - lengthSoFar + str.length() - 1));
Note that it will take the program longer to get to high values of k than low values, but generally it will be faster than building the whole concatenated string, because you'll stop as soon as you get to the correct substring.
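Putting the TreeSet and the early-exit walk together, a self-contained sketch could look like the following. The class and method names are hypothetical, and it still materializes every distinct substring, so it only illustrates the indexing and is not expected to cope with very long inputs.

```java
import java.util.TreeSet;

final class KthCharSketch {

    // Returns the k-th (1-indexed) character of the concatenation of all
    // distinct substrings of str in lexicographic order.
    static char kthChar(String str, int k) {
        // TreeSet removes duplicates and keeps the substrings sorted.
        TreeSet<String> set = new TreeSet<>();
        for (int start = 0; start < str.length(); start++) {
            for (int end = start + 1; end <= str.length(); end++) {
                set.add(str.substring(start, end));
            }
        }
        int lengthSoFar = 0; // total length of the substrings already skipped
        for (String sub : set) {
            if (lengthSoFar + sub.length() >= k) {
                // The k-th character lies inside this substring.
                return sub.charAt(k - lengthSoFar - 1);
            }
            lengthSoFar += sub.length();
        }
        throw new IllegalArgumentException("k exceeds the concatenated length");
    }
}
```

With the sample input, kthChar("dbac", 3) skips "a" and returns the second character of "ac", which is c.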

You are running out of memory. You can increase the memory available to the JVM by starting it with -Xms256m -Xmx1024m, and you can try some optimizations.
public static void gen(String str, int k) {
    int i, c;
    // Adding directly to the Set prevents a larger list because you remove the duplicates
    Set<String> set = new TreeSet<String>();
    for (c = 0; c < str.length(); c++) {
        for (i = 1; i <= str.length() - c; i++) {
            String sub = str.substring(c, c + i);
            set.add(sub);
        }
    }
    // TreeSet already orders by the String comparator
    StringBuilder sb = new StringBuilder();
    for (String temp : set) {
        sb.append(temp);
        // Stop as soon as the k-th character has been appended
        if (sb.length() > k) {
            break;
        }
    }
    String s = sb.toString();
    System.out.println(s.charAt(k - 1));
}
[EDIT] Added small performance boost. Try it to see if it gets faster or not, I did not look at the performance of StringBuilder.length() to see if it will improve or decrease.

Most efficient way to find unique entries in a large data set

Before anything, I am making it clear that this is an assignment and I do not expect full coded answers. All I seek is advice and maybe snippets of code that helps me.
So, I am reading in about 900,000 words, all stored in an ArrayList. I need to count unique words using a sorted array (or ArrayList) in Java.
So far, I am simply looping over the given arrayList and use
Collections.sort(words);
and Collections.binarySearch(words, wordToLook); to achieve it like the following:
OrderedSet set = new OrderedSet();
for(String a : words){
if(!set.contains(a)){
set.add(a);
}
}
and
public boolean contains(String word) {
Collections.sort(uniqueWords);
int result = Collections.binarySearch(uniqueWords, word);
if(result<0){
return false;
}else{
return true;
}
}
This code has a running time of about 60 seconds, but I was wondering if there is a better way to do this, because running a sort every time an element is added seems very inefficient (but of course necessary if I were to use binary search).
Any sort of feedback would be greatly appreciated. Thanks.

So, you are required to use a sorted array. That is ok, since you are (not yet) programming in the real world.
I will suggest two alternatives:
The first uses binary search (which you are using in your current code).
I would create a class that contains two fields: the word (a String) and the count for that word (an int). You will build a sorted array of these classes.
Start with an empty array and add to it as you read each word. For each word, do a binary search for the word in the array you are building. The search will either find the entry containing the word (and you will increment the count), or you will determine that the word is not yet in the array.
When your binary search ends without finding the word, you will create a new object to hold the word+count and add it to the array in the location where your search ended (be careful to make sure that your logic really puts it in the right spot to keep your list sorted). Of course, your count is set to 1 for new words.
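As a rough sketch of this first alternative: Collections.binarySearch returns -(insertion point) - 1 when the key is absent, which gives exactly the location where the new entry belongs. The class names SortedWordCounts and WordCount are made up for illustration, and an ArrayList stands in for the sorted array.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SortedWordCounts {

    // Hypothetical holder for a word and its count, ordered by word.
    static final class WordCount implements Comparable<WordCount> {
        final String word;
        int count = 1; // a new word has been seen once

        WordCount(String word) {
            this.word = word;
        }

        @Override
        public int compareTo(WordCount other) {
            return word.compareTo(other.word);
        }
    }

    private final List<WordCount> entries = new ArrayList<>();

    void add(String word) {
        WordCount probe = new WordCount(word);
        int pos = Collections.binarySearch(entries, probe);
        if (pos >= 0) {
            entries.get(pos).count++;     // word already present
        } else {
            entries.add(-pos - 1, probe); // insert at the insertion point, keeping the list sorted
        }
    }

    int uniqueCount() {
        return entries.size();
    }

    int countOf(String word) {
        int pos = Collections.binarySearch(entries, new WordCount(word));
        return pos >= 0 ? entries.get(pos).count : 0;
    }
}
```

The list stays sorted after every add, so no separate sort is needed. Inserting into the middle of an ArrayList still shifts the later elements, so each insertion of a new word costs O(n) in the worst case.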
Another alternative:
Read all of your words into a list and sort it. After sorting, all duplicates will be next to each other in the list.
You will walk down this sorted list once and create a list of word+count as you go. If the next word you see is the same as the last word+count, increment the count. If it is a new word, add a new word+count to your result list with count=1.

I would not use a sorted array. I would create a Map<String, Integer> where the key is your word and the value is the count of the number of occurrences of the word. As you read each word, do something like this:
Integer count = map.get(word);
if (count == null) {
count = 0;
}
map.put(word, count + 1);
Then just iterate over the map's entry set and do whatever you need to do with the counts.
If you know, or can estimate, the number of unique words then you should use this number in the HashMap constructor (so you don't grow the map many times).
If you use a sorted array, your run time cannot be better than proportional to NlogN (where N is the number of words in your list). If you use a HashMap, you can achieve a runtime that grows linearly with N (you save yourself the factor of logN).
Another advantage of using a Map is the memory used is proportional to the number of unique words, rather than the total number of words (assuming that you build the map while reading the words, rather than reading all words into a collection and then adding them to the map).
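A compact sketch of the map-based count, with the map pre-sized from an estimate, might look like this. The names WordCountSketch, countWords and expectedUnique are assumptions for illustration; Map.merge does the same work as the get/put pair above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class WordCountSketch {

    // expectedUnique is a caller-supplied estimate of the number of distinct words.
    static Map<String, Integer> countWords(Iterable<String> words, int expectedUnique) {
        // Size the table so expectedUnique entries fit under the default
        // load factor of 0.75 without the map having to grow.
        Map<String, Integer> map = new HashMap<>((int) (expectedUnique / 0.75f) + 1);
        for (String word : words) {
            map.merge(word, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords(Arrays.asList("to", "be", "or", "not", "to", "be"), 8);
        System.out.println(counts.size()); // prints 4, the number of unique words
    }
}
```

The number of unique words is simply the size of the returned map.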

public static int countUnique(String[] array) {
    if (array.length == 0) return 0;
    int count = 1; // the first entry always starts a group
    for (int i = 1; i < array.length; i++) {
        // In a sorted array, each change between neighbours starts a new group
        if (!array[i].equals(array[i - 1])) count++;
    }
    return count;
}
This is an O(N) algorithm for counting the number of unique entries in a sorted array. The idea behind it is that we count the number of transitions between groups of equal elements. Then, the number of unique entries is the number of transitions plus one (for the first entry).
Hopefully you see how to apply this algorithm to your array after the elements are sorted.

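You could always use a comparator to get unique values. A sorted set built with that comparator keeps only one copy of each word:
Set<String> uniqueWords = new TreeSet<String>(new Comparator<String>() {
    @Override
    public int compare(String o1, String o2) {
        // Case-insensitive ordering; words that compare as 0 are stored only once
        return o1.compareToIgnoreCase(o2);
    }
});
uniqueWords.addAll(words);
Now count:
words.size() - uniqueWords.size() = no. of repeated values.
Hope this helps!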

Count occurrences of each unique character

How to find the number of occurrence of every unique character in a String? You can use at most one loop. please post your solution, thanks.

Since this sounds like a homework problem, let's try to go over how to solve this problem by hand. Once we do that, let's see how we can try to implement that in code.
What needs to be done?
Let's take the following string:
it is nice and sunny today.
In order to get a count of how many times each character appears in the above string, we should:
Iterate over each character of the string
Keep a tally of how many times each character in the string appears
How would we actually try it?
Doing this by hand might go like this:
First, we find a new character i, so we could note that in a table and say that i appeared 1 time so far:
'i' -> 1
Second, we find another new character t, so we could add that in the above table:
'i' -> 1
't' -> 1
Third, a space, and repeat again...
'i' -> 1
't' -> 1
' ' -> 1
Fourth, we encounter an i which happens to exist in the table already. So, we'll want to retrieve the existing count, and replace it with the existing count + 1:
'i' -> 2
't' -> 1
' ' -> 1
And so on.
How to translate into code?
Translating the above to code, we may write something like this:
For every character in the string
Check to see if the character has already been encountered
If no, then remember the new character and say we encountered it once
If yes, then take the number of times it has been encountered, and increment it by one
For the implementation, as others have mentioned, using a loop and a Map could achieve what is needed.
The loop (such as a for or while loop) could be used to iterate over the characters in the string.
The Map (such as a HashMap) could be used to keep track of how many times a character has appeared. In this case, the key would be the character and the value would be the count for how many times the character appears.
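For illustration only, the tally described above could be sketched with a single loop and a map. The class name CharTallySketch is invented here, and a LinkedHashMap is used purely so that printing the map lists characters in the order they were first seen, like the hand-written table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CharTallySketch {

    static Map<Character, Integer> tally(String text) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        // The one loop: visit every character of the string once.
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            Integer seen = counts.get(c);
            if (seen == null) {
                counts.put(c, 1);        // new character, encountered once
            } else {
                counts.put(c, seen + 1); // existing count + 1
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(tally("it is nice and sunny today."));
    }
}
```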
Good luck!

It's homework, so I can't post the code, but here is one approach:
Iterate through the string, char by char.
If the char is not yet a key in the hashmap, add it as a key and set its value (count) to 1. If the char is encountered again, update its value to count+1.

Here you go! I have done a rough program on Count occurrences of each unique character
import java.util.ArrayList;
import java.util.HashMap;
public class CountUniqueChars{
public static void main(String args[]){
HashMap<Character, Integer> map;
ArrayList<HashMap<Character, Integer>> list = new ArrayList<HashMap<Character,Integer>>();
int i;
int x = 0;
Boolean fire = false;
String str = "Hello world";
str = str.replaceAll("\\s", "").toLowerCase();
System.out.println(str.length());
for(i=0; i<str.length() ; i++){
if(list.size() <= 0){
map = new HashMap<Character, Integer>();
map.put(str.charAt(i), 1);
list.add(map);
}else{
map = new HashMap<Character, Integer>();
map.put(str.charAt(i), 1);
fire = false;
for (HashMap<Character, Integer> t : list){
if(t.containsKey(str.charAt(i)) == map.containsKey(str.charAt(i))){
x = list.indexOf(t);
fire = true;
map.put(str.charAt(i), t.get(str.charAt(i))+1);
}
}
if(fire){
list.remove(x);
}
list.add(map);
}
}
System.out.println(list);
}
}
