Reduced String - compress

Reduced String - compress - java

I am practicing Strings programming examples. i would like to reduce the given strings. it should eliminate a character if its in even numbers
example: Input - aaabbc, Output should be: ac
I have used HashMap to count and store character and count value and computing using value % 2 then continue or else print the output. But some of the test cases are failing in Hackerrank. Could you please help me identify the problem?
static String super_reduced_string(String s){
HashMap<Character, Integer> charCount = new HashMap<Character, Integer>();
StringBuilder output = new StringBuilder();
if (s == null || s.isEmpty()) {
return "Empty String";
}
char[] arr = s.toCharArray();
for (int i = 0; i < s.length(); i++) {
char c = arr[i];
if (!charCount.containsKey(c)) {
charCount.put(c,1);
} else {
charCount.put(c,charCount.get(c)+1);
}
}
for (char c:charCount.keySet()) {
if (charCount.get(c) % 2 != 0) {
output.append(c);
}
}
return output.toString();
}

The problem lies in how you are selecting the output. HashSets and HashMaps do not have any ordering associated with them. In your test case, the output could be either ac OR ca .
To solve this, you can do a variety of things. The quickest way I see is to take your orignal string, lets say s, and call
s.replace(c,"")
for ever char you need to remove.
I doubt this is anywhere near the most, or even mildly, efficient way to solve this, but it should work.

Related

Comparing array items, index out of bound

I have a piece of code and I'm a bit confused how to deal with my issue so, please review method below. I was trying to search for a solution but unfortunately none of them fit my needs so I am looking for an advice here. The method is taking a String and removing duplicated characters so for example - input: ABBCDEF should return ABCDEF, but when entering i+1 in the last iteration I got IndexOutOfBound Exception, so I can iterate until string.length-1 but then I loose the last element, what is the SMARTEST solution in your opinion, thanks.
public String removeDuplicates(String source){
if(source.length() < 2){
return source;
}
StringBuilder noDuplicates = new StringBuilder();
char[] string = source.toCharArray();
for(int i = 0; i < string.length-1; i++){
if(string[i] != string[i+1]){
noDuplicates.append(string[i]);
}
}
return noDuplicates.toString();
}

You could do this like so: append the first character in source, and then only append subsequent characters if they are not equal to the previously-appended character.
if (source.isEmpty()) {
return source; // Or "", it doesn't really matter.
}
StringBuilder sb = new StringBuilder();
sb.append(source.charAt(0));
for (int i = 1; i < source.length(); ++i) {
char c = source.charAt(i);
if (c != sb.charAt(sb.length() - 1)) {
sb.append(c);
}
}
return sb.toString();
But if you wanted to do this more concisely, you could do it with regex:
return source.replaceAll("(.)\\1+", "$1");

You could simply append the last character after the loop:
public String removeDuplicates(String source){
...
noDuplicates.append(string[string.length - 1]);
return noDuplicates.toString();
}

You have a simple logic error:
You make your string to a char array.
That is fine, but the length property of any array will show you the
human way of counting if someting is in it.
If there is 1 element the length will be 1
2 -> 2
3 -> 3
etc.
You get the idea.
So when you go string[i + 1] you go one character to far.
You could just change the abort condition to
i < = string.length - 2
Or you could write a string iterator, to be able to access the next element, but
that seems like overkill for this example

This is just what LinkedHashSet was made for! Under the hood it's a HashSet with an iterator to keep track of insertion order, so you can remove duplicates by adding to the set, then reconstruct the string with guaranteed ordering.
public static String removeDuplicates(String source) {
Set<String> dupeSet = new LinkedHashSet<>();
for (Character v : source.toCharArray()) {
dupeSet.add(v.toString());
}
return String.join("", dupeSet);
}

If you wish to remove all repeating characters regardless of their position in the given String you might want to consider using the chars() method which provides a IntStream of the chars and that has the distinct() method to filter out repeating values. You can then put them back together with a StringBuilder like so:
public class RemoveDuplicatesTest {
public static void main(String[] args) {
String value = "ABBCDEFE";
System.out.println("No Duplicates: " + removeDuplicates(value));
}
public static String removeDuplicates(String value) {
StringBuilder result = new StringBuilder();
value.chars().distinct().forEach(c -> result.append((char) c));
return result.toString();
}
}

Why does using arrays instead of string give less memory consumption and time executing?

Given the string in the form of char array. Modify it the way that all the exclamation point symbols '!' are shifted to the start of the array, and all ohters are in the same order. Please write a method with a single argument of type char[]. Focus on either memory and time consumption of alghoritm.
Feedback that i've received: it was possible to use working with arrays instead of strings. Where can i find info about memory?
public static String formatString(char[] chars) {
StringBuilder exclamationSymbols = new StringBuilder();
StringBuilder otherSymbols = new StringBuilder();
for (char c : chars) {
if (c == '!') {
exclamationSymbols.append(c);
} else {
otherSymbols.append(c);
}
}
return (exclamationSymbols.toString() + otherSymbols.toString());
}

You can do this faster using a char[] than a StringBuilder because:
a StringBuilder is just a wrapper around a char[], so there's no way it can be faster. The indirection means it will be slower.
you know exactly how long the result will be, so you can allocate the minimum-sized char[] that you'll need. With a StringBuilder, you can pre-size it, but with two StringBuilders you can't exactly, so you either have to over-allocate the length (e.g. make both the same length as chars) or rely on StringBuilder resizing itself internally (which will be slower than not; and it uses moer memory).
My idea would be to use two integer pointers to point to the next position that you'll write a char to in the string: one starts at the start of the array, the other starts at the end; as you work your way through the input, the two pointers will move closer together.
Once you've processed the entire input, the portion of the result array corresponding to the "end pointer" will be backwards, so reverse it.
You can do it like this:
char[] newChars = new char[chars.length];
int left = 0;
int right = chars.length;
for (char c : chars) {
if (c == '!') {
newChars[left++] = c;
} else {
newChars[--right] = c;
}
}
// Reverse the "otherSymbols".
for (int i = right, j = newChars.length - 1; i < j; ++i, --j) {
char tmp = newChars[i];
newChars[i] = newChars[j];
newChars[j] = tmp;
}
return new String(newChars);
Ideone demo

how to compare two strings to find common substring

i get termination due to timeout error when i compile. Please help me
Given two strings, determine if they share a common substring. A substring may be as small as one character.
For example, the words "a", "and", "art" share the common substring "a" . The words "be" and "cat" do not share a substring.
Input Format
The first line contains a single integer , the number of test cases.
The following pairs of lines are as follows:
The first line contains string s1 .
The second line contains string s2 .
Output Format
For each pair of strings, return YES or NO.
my code in java
public static void main(String args[])
{
String s1,s2;
int n;
Scanner s= new Scanner(System.in);
n=s.nextInt();
while(n>0)
{
int flag = 0;
s1=s.next();
s2=s.next();
for(int i=0;i<s1.length();i++)
{
for(int j=i;j<s2.length();j++)
{
if(s1.charAt(i)==s2.charAt(j))
{
flag=1;
}
}
}
if(flag==1)
{
System.out.println("YES");
}
else
{
System.out.println("NO");
}
n--;
}
}
}
any tips?

Below is my approach to get through the same HackerRank challenge described above
static String twoStrings(String s1, String s2) {
String result="NO";
Set<Character> set1 = new HashSet<Character>();
for (char s : s1.toCharArray()){
set1.add(s);
}
for(int i=0;i<s2.length();i++){
if(set1.contains(s2.charAt(i))){
result = "YES";
break;
}
}
return result;
}
It passed all the Test cases without a time out issue.

The reason for the timeout is probably: to compare two strings that each are 1.000.000 characters long, your code needs 1.000.000 * 1.000.000 comparisons, always.
There is a faster algorithm that only needs 2 * 1.000.000 comparisons. You should use the faster algorithm instead. Its basic idea is:
for each character in s1: add the character to a set (this is the first million)
for each character in s2: test whether the set from step 1 contains the character, and if so, return "yes" immediately (this is the second million)
Java already provides a BitSet data type that does all you need. It is used like this:
BitSet seenInS1 = new BitSet();
seenInS1.set('x');
seenInS1.get('x');

Since you're worried about execution time, if they give you an expected range of characters (for example 'a' to 'z'), you can solve it very efficiently like this:
import java.util.Arrays;
import java.util.Scanner;
public class Whatever {
final static char HIGHEST_CHAR = 'z'; // Use Character.MAX_VALUE if unsure.
public static void main(final String[] args) {
final Scanner scanner = new Scanner(System.in);
final boolean[] characterSeen = new boolean[HIGHEST_CHAR + 1];
mainloop:
for (int word = Integer.parseInt(scanner.nextLine()); word > 0; word--) {
Arrays.fill(characterSeen, false);
final String word1 = scanner.nextLine();
for (int i = 0; i < word1.length(); i++) {
characterSeen[word1.charAt(i)] = true;
}
final String word2 = scanner.nextLine();
for (int i = 0; i < word2.length(); i++) {
if (characterSeen[word2.charAt(i)]) {
System.out.println("YES");
continue mainloop;
}
}
System.out.println("NO");
}
}
}
The code was tested to work with a few inputs.
This uses a fast array rather than slower sets, and it only creates one non-String object (other than the Scanner) for the entire run of the program. It also runs in O(n) time rather than O(n²) time.
The only thing faster than an array might be the BitSet Roland Illig mentioned.
If you wanted to go completely overboard, you could also potentially speed it up by:
skipping the creation of a Scanner and all those String objects by using System.in.read(buffer) directly with a reusable byte[] buffer
skipping the standard process of having to spend time checking for and properly handling negative numbers and invalid inputs on the first line by making your own very fast int parser that just assumes it's getting the digits of a valid nonnegative int followed by a newline

There are different approaches to solve this problem but solving this problem in linear time is a bit tricky.
Still, this problem can be solved in linear time. Just apply KMP algorithm in a trickier way.
Let's say you have 2 strings. Find the length of both strings first. Say length of string 1 is bigger than string 2. Make string 1 as your text and string 2 as your pattern. If the length of the string is n and length of the pattern is m then time complexity of the above problem would be O(m+n) which is way faster than O(n^2).
In this problem, you need to modify the KMP algorithm to get the desired result.
Just need to modify the KMP
public static void KMPsearch(char[] text,char[] pattern)
{
int[] cache = buildPrefix(pattern);
int i=0,j=0;
while(i<text.length && j<pattern.length)
{
if(text[i]==pattern[j])
{System.out.println("Yes");
return;}
else{
if(j>0)
j = cache[j-1];
else
i++;
}
}
System.out.println("No");
return;
}
Understanding Knuth-Morris-Pratt Algorithm

There are two concepts involved in solving this question.
-Understanding that a single character is a valid substring.
-Deducing that we only need to know that the two strings have a common substring — we don’t need to know what that substring is.
Thus, the key to solving this question is determining whether or not the two strings share a common character.
To do this, we create two sets, a and b, where each set contains the unique characters that appear in the string it’s named after.
Because sets 26 don’t store duplicate values, we know that the size of our sets will never exceed the letters of the English alphabet.
In addition, the small size of these sets makes finding the intersection very quick.
If the intersection of the two sets is empty, we print NO on a new line; if the intersection of the two sets is not empty, then we know that strings and share one or more common characters and we print YES on a new line.
In code, it may look something like this
import java.util.*;
public class Solution {
static Set<Character> a;
static Set<Character> b;
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
int n = scan.nextInt();
for(int i = 0; i < n; i++) {
a = new HashSet<Character>();
b = new HashSet<Character>();
for(char c : scan.next().toCharArray()) {
a.add(c);
}
for(char c : scan.next().toCharArray()) {
b.add(c);
}
// store the set intersection in set 'a'
a.retainAll(b);
System.out.println( (a.isEmpty()) ? "NO" : "YES" );
}
scan.close();
}
}

public String twoStrings(String sOne, String sTwo) {
if (sOne.equals(sTwo)) {
return "YES";
}
Set<Character> charSetOne = new HashSet<Character>();
for (Character c : sOne.toCharArray())
charSetOne.add(c);
Set<Character> charSetTwo = new HashSet<Character>();
for (Character c : sTwo.toCharArray())
charSetTwo.add(c);
charSetOne.retainAll(charSetTwo);
if (charSetOne.size() > 0) {
return "YES";
}
return "NO";
}
This must work. Tested with some large inputs.

Python3
def twoStrings(s1, s2):
flag = False
for x in s1:
if x in s2:
flag = True
if flag == True:
return "YES"
else:
return "NO"
if __name__ == '__main__':
q = 2
text = [("hello","world"), ("hi","world")]
for q_itr in range(q):
s1 = text[q_itr][0]
s2 = text[q_itr][1]
result = twoStrings(s1, s2)
print(result)

static String twoStrings(String s1, String s2) {
for (Character ch : s1.toCharArray()) {
if (s2.indexOf(ch) > -1)
return "YES";
}
return "NO";
}

Detecting duplicates in a file generated using the sliding window concept

I am working on a project where I have to parse a text file and divide the strings into substrings of a length that the user specifies. Then I need to detect the duplicates in the results.
So the original file would look like this:
ORIGIN
1 gatccaccca tctcggtctc ccaaagtgct aggattgcag gcctgagcca ccgcgcccag
61 ctgccttgtg cttttaatcc cagcactttc agaggccaag gcaggcgatc agctgaggtc
121 aggagttcaa gaccagcctg gccaacatgg tgaaacccca tctctaatac aaatacaaaa
181 aaaaaacaaa aaacgttagc caggaatgag gcccggtgct tgtaatccta aggaaggaga
241 ccaccactcc tcctgctgcc cttcccttcc ccacaccgct tccttagttt ataaaacagg
301 gaaaaaggga gaaagcaaaa agcttaaaaa aaaaaaaaaa cagaagtaag ataaatagct
I loop over the file and generate a line of the strings then use line.toCharArray() to slide over the resulting line and divide according to the user specification. So if the substrings are of length 4 the result would look like this:
GATC
ATCC
TCCA
CCAC
CACC
ACCC
CCCA
CCAT
CATC
ATCT
TCTC
CTCG
TCGG
CGGT
GGTC
GTCT
TCTC
CTCC
TCCC
CCCA
CCAA
Here is my code for splitting:
try {
scanner = new Scanner(toSplit);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
char[] chars = line.toCharArray();
for (int i = 0; i < chars.length - (k - 1); i++) {
String s = "";
for(int j = i; j < i + k; j++) {
s += chars[j];
}
if (!s.contains("N")) {
System.out.println(s);
}
}
}
}
My question is: given that the input file can be huge, how can I detect duplicates in the results?

If You want to check duplicates a Set would be a good choice to hold and test data. Please tell in which context You want to detect the duplicates: words, lines or "output chars".

You can use a bloom filter or a table of hashes to detect possible duplicates and then make a second pass over the file to check if those "duplicate candidates" are true duplicates or not.
Example with hash tables:
// First we make a list of candidates so we count the times a hash is seen
int hashSpace = 65536;
int[] substringHashes = new int[hashSpace];
for (String s: tokens) {
substringHashes[s.hashCode % hashSpace]++; // inc
}
// Then we look for words that have a hash that seems to be repeated and actually see if they are repeated. We use a set but only of candidates so we save a lot of memory
Set<String> set = new HashSet<String>();
for (String s: tokens) {
if (substringHashes[s.hashCode % hashSpace] > 1) {
boolean repeated = !set.add(s);
if (repeated) {
// TODO whatever
}
}
}

You could do something like this:
Map<String, Integer> substringMap = new HashMap<>();
int index = 0;
Set<String> duplicates = new HashSet<>();
For each substring you pull out of the file, add it to substringMap only if it's not a duplicate (or if it is a duplicate, add it to duplicates):
if (substringMap.putIfAbsent(substring, index) == null) {
++index;
} else {
duplicates.add(substring);
}
You can then pull out all the substrings with ease:
String[] substringArray = new String[substringMap.size()];
for (Map.Entry<String, Integer> substringEntry : substringMap.entrySet()) {
substringArray[substringEntry.getValue()] = substringEntry.getKey();
}
And voila! An array of output in the original order with no duplicates, plus a set of all the substrings that were duplicates, with very nice performance.

Code optimization by chosing another datastructure

I have a piece of code that needs to be optimized.
for (int i = 0; i < wordLength; i++) {
for (int c = 0; c < alphabetLength; c++) {
if (alphabet[c] != x.word.charAt(i)) {
String res = WordList.Contains(x.word.substring(0,i) +
alphabet[c] +
x.word.substring(i+1));
if (res != null && WordList.MarkAsUsedIfUnused(res)) {
WordRec wr = new WordRec(res, x);
if (IsGoal(res)) return wr;
q.Put(wr);
}
}
}
Words are represented by string. The problem is that the code on line 4-6 creates to many string objects, because strings are immutable.
Which data structure should I change my word representation to, if I want to get faster code ? I have tried to change it to char[], but then I have problem with getting the following code work:
x.word.substring(0,i)
How to get subarray from a char[] ? And how to concatenate the char and char[] on line 4.6 ?
Is there any other suitable and mutable datastrucure that I can use ? I have thought of stringbuffer but can't find suitable operations on stringbuffers.
This function generates, given a specific word, all the word that differs by one character.
WordRec is just a class with a string representing a word, and a pointer to the "father" of that word.
Thanks in advance

You can reduce number of objects by using this approach:
StringBuilder tmp = new StringBuilder(wordLength);
tmp.append(x.word);
for (int i=...) {
for (int c=...) {
if (...) {
char old = tmp.charAt(i);
tmp.setCharAt(i, alphabet[c]);
String res = tmp.toString();
tmp.setCharAt(i, old);
...
}
}
}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Reduced String - compress - java

Related

Comparing array items, index out of bound

Why does using arrays instead of string give less memory consumption and time executing?

how to compare two strings to find common substring

Detecting duplicates in a file generated using the sliding window concept

Code optimization by chosing another datastructure

Categories

Resources