Not able to understand the code to Count Duplicates in a string? - java

This program finds the count of duplicates in a string.
Example 1:
Input:
"abbdde"
Output:
2
Explanation:
"b" and "d" are the two duplicates.
Example 2:
Input:
"eefggghii22"
Output:
4
Explanation:
duplicates are "e", "g", "i", and "2".
Help me with this code.
public class CountingDuplicates {
public static int duplicateCount(String str1) {
// Write your code here
int c = 0;
str1 = str1.toLowerCase();
final int MAX_CHARS = 256;
int ctr[] = new int[MAX_CHARS];
countCharacters(str1, ctr);
for (int i = 0; i < MAX_CHARS; i++) {
if(ctr[i] > 1) {
// System.out.printf("%c appears %d times\n", i, ctr[i]);
c = ctr[i];
}
}
return c;
}
static void countCharacters(String str1, int[] ctr)
{
for (int i = 0; i < str1.length(); i++)
ctr[str1.charAt(i)]++;
}
}

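You need to maintain a counter and increment it by one each time a character's count exceeds 1. Your code assigns ctr[i] to c instead, so it returns the occurrence count of the last duplicated character rather than the number of duplicates.
Return that counter to get the number of duplicates.
I have added comments to make the code easier to follow.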
public class CountingDuplicates {
public static int duplicateCount(String str1) {
// Initialised integer to count the duplicates
int count = 0;
// Convert the string to lowercase so that upper- and lowercase letters count as the same character
str1 = str1.toLowerCase();
// 8-bit character codes range from 0 to 255 (plain ASCII uses only 0-127),
// so an array of size 256 holds one counter per character code.
final int MAX_CHARS = 256;
int ctr[] = new int[MAX_CHARS];
countCharacters(str1, ctr);
for (int i = 0; i < MAX_CHARS; i++) {
if(ctr[i] > 1) {
// System.out.printf("%c appears %d times\n", i, ctr[i]);
count = count + 1;
}
}
return count;
}
static void countCharacters(String str1, int[] ctr)
{
for (int i = 0; i < str1.length(); i++)
ctr[str1.charAt(i)]++;
}
}

In short, countCharacters counts how often each character appears in the string str1 and stores the counts in the ctr array.
How? ctr is an array of length 256, so it holds 256 values (indexes 0-255). str1 is the input string. The charAt(i) method returns the character at index i, because a String can be read like an array in which each char is accessed by its index.
Now, assuming your input always consists of ASCII characters, each character has a numeric value between 0 and 127 (for example, 'a' is 97), which fits in the 256-element array. A character with a value above 255 would throw an ArrayIndexOutOfBoundsException. The ++ after a variable adds 1 to it, i.e. c++ means c = c + 1.
Now for the loop body, ctr[str1.charAt(i)]++;. The loop starts at 0 and ends at the length of str1, where index 0 is the first character. So if the first character of str1 is a, str1.charAt(0) returns 'a', and Java converts that char to its numeric value 97 when it is used as an array index. For index 0 the line is therefore effectively ctr[97]++;, which increments the value at index 97 (initially 0) by 1, so it becomes 1.
In this way it only increments the entries whose index matches the numeric value of a character in the string, and so counts how many times each character occurs.
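If the input is not guaranteed to stay within the 0-255 range, the same counting idea can be sketched with a Map instead of a fixed-size array. This is only an illustration, not the original answer's code; the class name DuplicateCounterSketch is made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

class DuplicateCounterSketch {

    // Counts how many distinct characters occur more than once (case-insensitive).
    static int duplicateCount(String text) {
        Map<Character, Integer> occurrences = new HashMap<>();
        for (char ch : text.toLowerCase().toCharArray()) {
            // merge() stores 1 for a new key, otherwise adds 1 to the stored count
            occurrences.merge(ch, 1, Integer::sum);
        }
        int duplicates = 0;
        for (int seen : occurrences.values()) {
            if (seen > 1) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(duplicateCount("abbdde"));         // 2
        System.out.println(duplicateCount("Indivisibility")); // 1
        System.out.println(duplicateCount("λλx"));            // 1
    }
}
```

The Map trades a little speed for not having to guess the largest character code in advance.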

Related

How to obtain the length of the last word in the string

I have reversed the string and have a for loop to iterate through the reversed string.
I am counting characters and I know I have a logic flaw, but I cannot pinpoint why I am having this issue.
The solution needs to return the length of the last word in the string.
My first thought was to iterate through the string backward (I don't know why I decided to create a new string, I should have just iterated through it by decrementing my for loop from the end of the string).
But the logic should be the same from that point for my second for loop.
My logic is basically to count the non-whitespace characters of the last word, and then to break once the count variable has a value and the next whitespace after the last word is reached.
class Solution {
public int lengthOfLastWord(String s) {
int count = 0;
int countWhite = 0;
char ch;
String reversed = "";
for(int i = 0; i < s.length(); i++) {
ch = s.charAt(i);
reversed += ch;
}
for(int i = 0; i < reversed.length(); i++) {
if(!Character.isWhitespace(reversed.charAt(i))) {
count++;
if(count > 1 && Character.isWhitespace(reversed.charAt(i)) == true) {
break;
}
}
}
return count;
}
}
Maybe try this,
public int lengthOfLastWord(String s) {
String [] arr = s.trim().split(" ");
return arr[arr.length-1].length();
}
Another option would be to use index of last space and calculate length from it:
public int lengthOfLastWord(String string) {
int whiteSpaceIndex = string.lastIndexOf(" ");
if (whiteSpaceIndex == -1) {
return string.length();
}
int lastIndex = string.length() - 1;
return lastIndex - whiteSpaceIndex;
}
String.lastIndexOf() finds the start index of the last occurrence of the specified string. A result of -1 means the string was not found, in which case we have a single word and the length of the entire string is what we need. Otherwise we have the index of the last space and can calculate the last word's length as lastIndex - whiteSpaceIndex. Note that this assumes the string has no trailing spaces.
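As a rough sketch of the same lastIndexOf() idea that also tolerates leading and trailing whitespace, the string can be trimmed first. The class name LastWordLengthSketch is hypothetical, and the sketch assumes words are separated by plain space characters.

```java
class LastWordLengthSketch {

    // Length of the last space-separated word; 0 for empty or blank input.
    static int lengthOfLastWord(String text) {
        String trimmed = text.trim();             // drop leading/trailing whitespace
        int lastSpace = trimmed.lastIndexOf(' '); // -1 when there is a single word
        // With lastSpace == -1 this evaluates to trimmed.length(),
        // so the single-word case needs no separate branch.
        return trimmed.length() - 1 - lastSpace;
    }

    public static void main(String[] args) {
        System.out.println(lengthOfLastWord("Hello World"));          // 5
        System.out.println(lengthOfLastWord("fly me to the moon  ")); // 4
        System.out.println(lengthOfLastWord("word"));                 // 4
        System.out.println(lengthOfLastWord("   "));                  // 0
    }
}
```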
There are lots of ways to achieve that. The most efficient approach is to find the index of the last whitespace character that is followed by a letter.
It can be done by iterating over the indexes of the given string (reminder: String maintains an array of bytes internally) or simply by invoking the method lastIndexOf().
Keeping in mind that the length of a string encountered at runtime can be as large as Integer.MAX_VALUE, it is not efficient to allocate an array by splitting such a lengthy string when only the length of a single element is required.
The code below demonstrates how to address this problem with the Stream API and a regular for loop.
The logic of the stream:
Create an IntStream that iterates over the indexes of the given string, starting from the last.
Discard all non-alphabetic symbols at the end of the string with dropWhile().
Then retain all letters until the first non-alphabetic symbol is encountered by using takeWhile().
Get the count of elements in the stream.
Stream-based solution (dropWhile(), takeWhile() and the three-argument iterate() require Java 9 or later):
public static int getLastWordLength(String source) {
return (int) IntStream.iterate(source.length() - 1, i -> i >= 0, i -> --i)
.map(source::charAt)
.dropWhile(ch -> !Character.isLetter(ch))
.takeWhile(Character::isLetter)
.count();
}
If you prefer a loop, there's no need to reverse the string. You can start iterating from the last index, determine the end and start positions, and return the difference.
Just in case you do need to reverse a string, this is the simplest and most efficient way:
new StringBuilder(source).reverse().toString();
Iterative solution:
public static int getLastWordLength(String source) {
int end = -1; // initialized with illegal index
int start = -1; // index just before the word; stays -1 if the word starts the string
for (int i = source.length() - 1; i >= 0; i--) {
if (Character.isLetter(source.charAt(i)) && end == -1) {
end = i;
}
if (Character.isWhitespace(source.charAt(i)) && end != -1) {
start = i;
break;
}
}
return end == -1 ? 0 : end - start;
}
main()
public static void main(String[] args) {
System.out.println(getLastWordLength("Humpty Dumpty sat on a wall % _ (&)"));
}
output
4 - last word is "wall"
Firstly, as you have mentioned, the reversed string you build is just a copy of your original string. To fix that, iterate from the end:
for (int i = s.length() - 1; i >= 0; i--) {
ch = s.charAt(i);
reversed += ch;
}
Secondly, the second if condition is nested inside the first one. That is why it will never break: the outer if is entered only when the character is not whitespace, so the inner condition, which requires the character to be whitespace, can never be satisfied. Moving the second if out of the first one fixes this:
public class HW5 {
public static void main(String[] args) {
String s = "My name is Mathew";
int count = lengthOfLastWord(s);
System.out.println(count);
}
public static int lengthOfLastWord(String s) {
int count = 0;
int countWhite = 0;
char ch;
String reversed = "";
System.out.println("original string is----" + s);
for (int i = s.length() - 1; i >= 0; i--) {
ch = s.charAt(i);
reversed += ch;
}
System.out.println("reversed string is----" + reversed);
for (int i = 0; i < reversed.length(); i++) {
if (!Character.isWhitespace(reversed.charAt(i)))
count++;
if (count > 0 && Character.isWhitespace(reversed.charAt(i))) { // count > 0 also handles a one-letter last word
break;
}
}
return count;
}
}
and the output is :
original string is----My name is Mathew
reversed string is----wehtaM si eman yM
6
Another way to go about it: use the built-in split method, which returns an array of strings, and then return the length of the last string in the array.

Hash only LETTERS using Horner's method in java

I understand how Horner's method works for hashing, but I am having trouble hashing a string that may contain non-alphabetic characters: I want to ignore those and hash only the alphabetic characters.
Here is the code I have written for this, but it doesn't work entirely:
private int hash(String key){
int constant = 27;
int lastHashValue = key.charAt(0); // start with the numeric value of the first char,
// because for each following character we multiply
// the previous hash value by the constant.
for(int i = 1; i < key.length(); i++){
if( Character.isLetter(key.charAt(i)) ){ //checks if it is a letter
lastHashValue = (key.charAt(i) + (constant * lastHashValue) ) % array.length;
}
}
return lastHashValue;
}
Here is the issue I have: what if the first character is a non-alphabetic character? How do I ignore it, given that we need the first character's hash code to move on to the next?
You can initialize lastHashValue to 0 and start looping at index 0.
int lastHashValue = 0;
for(int i = 0; i < key.length(); i++){
if( Character.isLetter(key.charAt(i)) ){ //checks if it is a letter
lastHashValue = (key.charAt(i) + (constant * lastHashValue) ) % array.length;
}
}
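Put together as a self-contained sketch, the suggestion could look like the following. The question's array.length is not shown in full, so TABLE_SIZE below is a hypothetical stand-in for it, and the class name LetterHashSketch is likewise made up for illustration.

```java
class LetterHashSketch {

    private static final int MULTIPLIER = 27;  // same constant as in the question
    private static final int TABLE_SIZE = 101; // hypothetical stand-in for array.length

    // Horner's method over the letters of key only; other characters are skipped.
    static int hash(String key) {
        int hashValue = 0;
        for (int i = 0; i < key.length(); i++) {
            char ch = key.charAt(i);
            if (Character.isLetter(ch)) {
                hashValue = (ch + MULTIPLIER * hashValue) % TABLE_SIZE;
            }
        }
        return hashValue;
    }

    public static void main(String[] args) {
        // Both keys contain the same letters in the same order, so they hash alike.
        System.out.println(hash("abc") == hash("1a-b c!")); // true
        System.out.println(hash("123"));                    // 0, no letters at all
    }
}
```

Starting from 0 works because the first letter then contributes only its own value: (ch + 27 * 0) % TABLE_SIZE.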

How is the count of characters stored in the array? I want to understand how the increment operator works on the array

public static void getCharCountArray(String str)
{
for (int i = 0; i < str.length(); i++)
{
count[str.charAt(i)]++;
}
}
How does the count array get the count of each character? How does the increment operator work here?
count is an array of integers.
Each index used for this array is a char, which Java implicitly converts to its numeric (int) value.
The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
In your loop, str.charAt(i) returns the char of your String str for the current iteration.
You get the previous count for the current char of your String with the expression:
count[str.charAt(i)]
and you increment this value with the ++ operator.
We could rewrite your code like this:
for (int i = 0; i < str.length(); i++)
{
char currentChar = str.charAt(i);
int previousCharCount = count[currentChar];
int currentCharCount = previousCharCount + 1;
count[currentChar] = currentCharCount;
}
Your line count[str.charAt(i)]++; does the same thing, but more concisely.
The ++ operator does not increment the array itself (which would make no sense); it increments the integer value stored at the position given by the char.
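To see the char-to-index conversion in isolation, here is a small sketch. It assumes ASCII-only input so that a 128-element array is large enough; the class name CharIndexSketch and the method countChars are made up for this example.

```java
class CharIndexSketch {

    // Returns per-character counts; assumes every char in text is below 128 (ASCII).
    static int[] countChars(String text) {
        int[] count = new int[128];
        for (int i = 0; i < text.length(); i++) {
            count[text.charAt(i)]++; // the char is widened to int and used as the index
        }
        return count;
    }

    public static void main(String[] args) {
        int index = 'a';                // implicit widening: 'a' becomes 97
        System.out.println(index);      // 97

        int[] count = countChars("banana");
        System.out.println(count['a']); // 3
        System.out.println(count[97]);  // 3, the same slot as count['a']
        System.out.println(count['n']); // 2
    }
}
```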

Java: Assign values to alphabet and determine value of a string

So I am trying to solve the problem in Java below. Could someone give me an idea of how to approach this? I can only think of using a bunch of confusing for-loops to split up the arr, go through the alphabet, and go through each string, and even then I am confused about strings versus chars. Any advice would be great.
--
Suppose the letter 'A' is worth 1, 'B' is worth 2, and so forth, with 'Z' worth 26. The value of a word is the sum of all the letter values in it. Given an array arr of words composed of capital letters, return the value of the word with the largest value. You may assume that arr has length at least 1.
{"AAA","BBB","CCC"} => 9
{"AAAA","B","C"} => 4
{"Z"} => 26
{"",""} => 0
--
Here is what I have tried so far but I'm lost:
public static int largestValue(String[] arr){
String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
int largest = 0;
int wordTotal=0;
for (int i = 0; i < arr.length; i++){
String[] parts = arr[i].split("");
if (wordTotal < largest){ //I don't think this is in the right place
largest = 0; }
for (int j = 0; j < alphabet.length(); j++){
for(int k = 0; k <parts.length; k++){
if ( alphabet.charAt(j) == parts[k].charAt(0) ){
wordTotal = 0;
wordTotal += alphabet.indexOf(alphabet.charAt(j))+1;
}
}
}
}
return largest;
}
I would start by breaking the problem into parts, the first step is summing one String. To calculate the sum you can iterate the characters, test if the character is between 'A' and 'Z' (although your requirements say your input is guaranteed to be valid), subtract 'A' (a char literal) from the character and add it to your sum. Something like,
static int sumString(final String str) {
int sum = 0;
for (char ch : str.toCharArray()) {
if (ch >= 'A' && ch <= 'Z') { // <-- validate input
sum += 1 + ch - 'A'; // <-- 'A' - 'A' == 0, 'B' - 'A' == 1, etc.
}
}
return sum;
}
Then you can iterate an array of String(s) to get the maximum sum; something like
static int maxString(String[] arr) {
int max = sumString(arr[0]);
for (int i = 1; i < arr.length; i++) {
max = Math.max(max, sumString(arr[i]));
}
return max;
}
or with Java 8+
static int maxString(String[] arr) {
return Stream.of(arr).mapToInt(x -> sumString(x)).max().getAsInt();
}
And, finally, validate the entire operation like
public static void main(String[] args) {
String[][] strings = { { "AAA", "BBB", "CCC" }, { "AAAA", "B", "C" },
{ "Z" }, { "", "" } };
for (String[] arr : strings) {
System.out.printf("%s => %d%n", Arrays.toString(arr), maxString(arr));
}
}
And I get
[AAA, BBB, CCC] => 9
[AAAA, B, C] => 4
[Z] => 26
[, ] => 0
I think it helps to take note of the two key parts here:
1: You need to be able to find the value of a single word, which is the sum of each letter
2: You need to find the value of all words, and find the largest
Since you need to go through each element (letter/character) in a string, and also each element (word) in the array, the problem really is set up for using 2 loops. I think part of the whole problem is making the for loops clear and concise, which is definitely doable. I don't want to give it away, but having a function that, given a word, returns the value of the word, will help. You could find the value of a word, see if it's the largest so far, and repeat. Also, to find the value of a word, please do not use 26 if statements (look up an ASCII table instead!). Hope this gives you a better understanding without giving it away!

StringBuilder#appendCodePoint(int) behaves unexpectedly

java.lang.StringBuilder's appendCodePoint(...) method, to me, behaves in an unexpected manner.
For Unicode code points above Character.MAX_VALUE (which need 4 bytes to encode in UTF-8, which is my Eclipse workspace setting), it behaves strangely.
I append a String's Unicode code points one by one to a StringBuilder, but its output looks different in the end.
I suspect that a call to Character.toSurrogates(codePoint, value, count) in AbstractStringBuilder#appendCodePoint(...) causes this, but I don't know how to work around it.
My code:
// returns random string in range of unicode code points 0x2F800 to 0x2FA1F
// e.g. 槪𥥼報悔𦖨嘆汧犕尢𦔣洴真硎尢趼犀㠯弢卿𢛔芋玥峀䔫䩶莭型築𡷦𩐊
String s = getRandomChineseJapaneseKoreanStringCompatibilitySupplementOfMaxLength(length);
System.out.println(s);
StringBuilder sb = new StringBuilder();
for (int i = 0; i < getCodePointCount(s); i++) {
sb.appendCodePoint(s.codePointAt(i));
}
// prints some of the CJK characters, but between them there is a '?'
// e.g. 槪?𥥼?報?悔?𦖨?嘆?汧?犕?尢?𦔣?洴?真?硎?尢?趼?
System.out.println(sb.toString());
// returns random string in range of unicode code points 0x20000 to 0x2A6DF
// e.g. 𤸥𤈍𪉷𪉔𤑺𡹋𠋴𨸁𦧖𣯠𨚾𣥷𪂶𦄃𧊈𤧘𢙕𪚋𤧒𥩛𧆞𨕌𣸑𡚊𥽚𡛳𣐸𩆟𩣞𥑡
s = getRandomChineseJapaneseKoreanStringExtensionBOfMaxLength(length);
// prints the CJK characters correctly
System.out.println(s);
sb = new StringBuilder();
for (int i = 0; i < getCodePointCount(s); i++) {
sb.appendCodePoint(s.codePointAt(i));
}
// prints some of the CJK characters, but between them there is a '?'
// e.g. 𤸥?𤈍?𪉷?𪉔?𤑺?𡹋?𠋴?𨸁?𦧖?𣯠?𨚾?𣥷?𪂶?𦄃?𧊈?
System.out.println(sb.toString());
With:
public static int getCodePointCount(String s) {
return s.codePointCount(0, s.length());
}
public static String getRandomChineseJapaneseKoreanStringExtensionBOfMaxLength(int length) {
return getRandomStringOfMaxLengthInRange(length, 0x20000, 0x2A6DF);
}
public static String getRandomChineseJapaneseKoreanStringCompatibilitySupplementOfMaxLength(int length) {
return getRandomStringOfMaxLengthInRange(length, 0x2F800, 0x2FA1F);
}
private static String getRandomStringOfMaxLengthInRange(int length, int from, int to) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < length; i++) {
// try to find a valid character MAX_TRIES times
for (int j = 0; j < MAX_TRIES; j++) {
int unicodeInt = from + random.nextInt(to - from);
if (Character.isValidCodePoint(unicodeInt) &&
(Character.isLetter(unicodeInt) || Character.isDigit(unicodeInt) ||
Character.isWhitespace(unicodeInt))) {
sb.appendCodePoint(unicodeInt);
break;
}
}
}
return new String(sb.toString().getBytes(), "UTF-8");
}
You're iterating over the code points incorrectly. You should use the strategy presented by Jonathan Feinberg here
final int length = s.length();
for (int offset = 0; offset < length; ) {
final int codepoint = s.codePointAt(offset);
// do something with the codepoint
offset += Character.charCount(codepoint);
}
or since Java 8
s.codePoints().forEach(/* do something */);
Note the Javadoc of String#codePointAt(int)
Returns the character (Unicode code point) at the specified index. The
index refers to char values (Unicode code units) and ranges from 0 to
length()- 1.
You were iterating from 0 to codePointCount. If the character is not a high-low surrogate pair, it's returned alone. In that case, your index should only increase by 1. Otherwise, it should be increased by 2 (Character#charCount(int) deals with this) as you're getting the codepoint corresponding to the pair.
Change your loops from this:
for (int i = 0; i < getCodePointCount(s); i++) {
to this:
for (int i = 0; i < s.length(); i = s.offsetByCodePoints(i, 1)) {
In Java, a char is a single UTF-16 value. Supplemental codepoints take up two chars in a String.
But you are looping every single char in your String. This means that you are reading each supplemental codepoint twice: The first time, you are reading both of its UTF-16 surrogate chars; the second time, you are reading and appending just the low surrogate char.
Consider a string which contains only one codepoint, 0x2f8eb. A Java String representing that codepoint would actually contain this:
"\ud87e\udceb"
If you loop through each individual char index, then your loop would effectively do this:
sb.appendCodePoint(0x2f8eb); // codepoint found at index 0
sb.appendCodePoint(0xdceb); // codepoint found at index 1
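The difference can be reproduced with a short sketch that copies the same string both ways. The class and method names (CodePointCopySketch, copyByCodePoints, copyByCharIndex) are made up for illustration; only String and Character methods from the standard library are used.

```java
class CodePointCopySketch {

    // Copies text code point by code point, advancing by 1 or 2 chars as needed.
    static String copyByCodePoints(String text) {
        StringBuilder sb = new StringBuilder();
        for (int offset = 0; offset < text.length(); ) {
            int codePoint = text.codePointAt(offset);
            sb.appendCodePoint(codePoint);
            offset += Character.charCount(codePoint); // 2 for supplementary code points
        }
        return sb.toString();
    }

    // Reproduces the loop from the question: i counts code points but indexes chars.
    static String copyByCharIndex(String text) {
        StringBuilder sb = new StringBuilder();
        int codePoints = text.codePointCount(0, text.length());
        for (int i = 0; i < codePoints; i++) {
            sb.appendCodePoint(text.codePointAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\ud87e\udceb\ud87e\udceb"; // U+2F8EB twice, four chars in total
        System.out.println(copyByCodePoints(text).equals(text)); // true
        System.out.println(copyByCharIndex(text).equals(text));  // false
    }
}
```

The second method appends the full code point at index 0 and then the lone low surrogate at index 1, which is why an unpaired surrogate (rendered as '?') shows up after each character.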
