Java 8 Streams Remove Duplicate Letter

I'm trying to apply my knowledge of streams to some LeetCode algorithm questions. Here is a general summary of the question:
Given a string which contains only lowercase letters, remove duplicate
letters so that every letter appears once and only once. You must make
sure your result is the smallest in lexicographical order among all
possible results.
Example:
Input: "bcabc"
Output: "abc"
Another example:
Input: "cbacdcbc"
Output: "acdb"
This seemed like a simple problem: stream the characters of the string into a new list, sort them, keep only the distinct values, collect them back into a list, and append the list's values to a string. Here is what I came up with:
// needs java.util.ArrayList, java.util.List and java.util.stream.Collectors
public String removeDuplicateLetters(String s)
{
    char[] c = s.toCharArray();
    List<Character> list = new ArrayList<>();
    for (char ch : c)
    {
        list.add(ch);
    }
    // sort the letters, then keep one copy of each
    List<Character> newVal = list.stream().sorted().distinct().collect(Collectors.toList());
    String newStr = "";
    for (char ch : newVal)
    {
        newStr += ch;
    }
    return newStr;
}
The first example works perfectly, but for the second one I get "abcd" instead of "acdb". Why would "abcd" not be the smallest in lexicographical order? Thanks!

As I pointed out in the comments, using a LinkedHashSet would be best here, but for Streams practice you could do this:
public static String removeDuplicateLetters(String s) {
return s.chars().sorted().distinct().collect(
StringBuilder::new,
StringBuilder::appendCodePoint,
StringBuilder::append
).toString();
}
Note: distinct() comes after sorted() in order to optimize the stream; see Holger's explanation in the comments as well as this answer.
There are a lot of different things here, so I'll number them:
1. You can stream the characters of a String using String#chars() instead of building a List and adding every character to it.
2. To ensure that the resulting string is smallest in lexicographical order, we can sort the IntStream.
3. We can convert the IntStream back to a String by performing a mutable reduction with a StringBuilder, and then convert that StringBuilder to the desired string.
A mutable reduction is the Stream way of doing the equivalent of something like:
for (char ch : newVal) {
newStr += ch;
}
The stream version, however, has the added benefit of concatenating into a StringBuilder instead of creating a new String on every +=. See this answer for why this is more performant.
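To make the three arguments of that mutable reduction explicit, here is the same pipeline as removeDuplicateLetters above with lambdas in place of the method references. The class and method names are illustrative only; the point is which role each argument of IntStream.collect plays.

```java
import java.util.function.BiConsumer;
import java.util.function.ObjIntConsumer;
import java.util.function.Supplier;

public class MutableReductionDemo {
    // Same pipeline as above, with each collect() argument spelled out.
    public static String sortedDistinct(String s) {
        // 1. supplier: creates a fresh, empty result container
        Supplier<StringBuilder> supplier = () -> new StringBuilder();
        // 2. accumulator: folds one code point into a container
        ObjIntConsumer<StringBuilder> accumulator = (sb, cp) -> sb.appendCodePoint(cp);
        // 3. combiner: merges two partial containers (only needed for parallel streams)
        BiConsumer<StringBuilder, StringBuilder> combiner = (left, right) -> left.append(right);

        return s.chars().sorted().distinct()
                .collect(supplier, accumulator, combiner)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(sortedDistinct("bcabc"));    // abc
        System.out.println(sortedDistinct("cbacdcbc")); // abcd
    }
}
```

The loop and the collect call do the same work; the difference is that the stream owns the StringBuilder and appends to it in place.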
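For the actual question about the expected vs. observed output: "abcd" is indeed the smallest string that uses each of these letters once, which is why sorting produces it. The expected "acdb" differs because the problem asks you to remove letters, so the letters you keep must stay in their original relative order. "abcd" cannot be obtained from "cbacdcbc" that way, because no "d" appears after the last "b".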
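If the kept letters must stay in their original order, no amount of sorting reproduces the expected outputs. A common way to solve it, which neither answer here uses and which is shown only as a hedged sketch, is a greedy monotonic stack: scan left to right and pop a larger letter whenever it occurs again later, so a smaller letter can take its place. It assumes, as the problem states, that the input contains only 'a' to 'z'.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

public class RemoveDuplicateLetters {
    // Greedy stack sketch; assumes the input contains only lowercase 'a'..'z'.
    public static String removeDuplicateLetters(String s) {
        int[] lastIndex = new int[26];            // last position of each letter
        for (int i = 0; i < s.length(); i++) {
            lastIndex[s.charAt(i) - 'a'] = i;
        }
        boolean[] onStack = new boolean[26];      // letters already in the result
        Deque<Character> stack = new ArrayDeque<>();
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (onStack[ch - 'a']) {
                continue;                         // each letter is kept only once
            }
            // Drop larger letters that appear again later, so ch can come before them.
            while (!stack.isEmpty() && stack.peek() > ch && lastIndex[stack.peek() - 'a'] > i) {
                onStack[stack.pop() - 'a'] = false;
            }
            stack.push(ch);
            onStack[ch - 'a'] = true;
        }
        // push/pop work on the head, so read from the tail to get left-to-right order
        StringBuilder sb = new StringBuilder();
        for (Iterator<Character> it = stack.descendingIterator(); it.hasNext(); ) {
            sb.append(it.next());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicateLetters("bcabc"));    // abc
        System.out.println(removeDuplicateLetters("cbacdcbc")); // acdb
    }
}
```

For "cbacdcbc" the "d" cannot be popped when the second "b" arrives, because no "d" occurs later, which is exactly why the result ends in "db".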

public static void main(String[] args) {
String string = "cbacdcbc";
string.chars()
.mapToObj(item -> (char) item)
.collect(Collectors.toSet()).forEach(System.out::print);
}
The output is: abcd. Hope this helps!
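Collectors.toSet() makes no guarantee about the type or iteration order of the Set it returns; the output above happens to come out sorted for these particular letters. A sketch that guarantees ascending order collects into a TreeSet instead (the class and method names are illustrative):

```java
import java.util.TreeSet;
import java.util.stream.Collectors;

public class SortedDistinctLetters {
    public static String sortedDistinct(String s) {
        // TreeSet removes duplicates and keeps its elements in natural (ascending) order
        TreeSet<Character> letters = s.chars()
                .mapToObj(item -> (char) item)
                .collect(Collectors.toCollection(TreeSet::new));
        StringBuilder sb = new StringBuilder();
        for (char c : letters) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sortedDistinct("cbacdcbc")); // abcd
    }
}
```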

Related

Convert string representation of a list to a list in Java

Is there any easy, one (not too complicated) line for this?
If we have List<String> li1 = Arrays.asList("a","b");, then li1.toString() will yield [a, b]. So how to convert string [a, b] to a list of strings?
Connected with it is the question why produced string isn't ["a","b"]? Because from string representations we cannot distinguish Arrays.asList("1","2","3") from Arrays.asList(1,2,3). And also it's not possible to parse produced string with e.g. Jackson with objectMapper.readValue(li1.toString(), String[].class);, which would be that one line solution.

So how to convert string [a, b] to a list of strings?
It is not intended for this purpose.
Connected with it is the question why produced string isn't ["a","b"]? Because from string representations we cannot distinguish Arrays.asList("1","2","3") from Arrays.asList(1,2,3).
This probably has to do with the following: List.toString() (inherited from AbstractCollection, and just like Arrays.toString()) simply concatenates the string representations of the items with a ", " separator and puts [] around them. So the real answer is "because the string "1" and the integer 1 have the same representation". And this, again, has to do with the fact that both should be printable as they are. E.g., Python differentiates between __str__() and __repr__(); Java has only .toString().
But as the .toString() outputs are (apart from certain special cases) only made for debugging, it should not matter.
And also it's not possible to parse produced string with e.g. Jackson with objectMapper.readValue(li1.toString(), String[].class);, which would be that one line solution.
If you want to do that, there are other solutions that are far better suited for serialization and deserialization.
Formats for serialization include:
binary
XML
JSON
each of which pairs an unambiguous serialization step, which creates the data, with a deserialization step, which does the opposite.

There are some ways to do it, but I don't think there is any method that does it directly.
It's also not a good idea to rely on toString() the way you are trying to (it can be overridden, for example).
One possible solution is as follows:
// needs java.util.Arrays and java.util.List
public static List<String> getListFromListStringArray(String inString) {
    List<String> returnList = null;
    System.out.println("in: " + inString);
    /*
     * Removes every [ and ] (including the leading and trailing ones)
     * [a, b] => a, b
     */
    inString = inString.replaceAll("[\\[\\]]", "");
    /*
     * Creates a List from the String array split on the delimiter (", " in this case)
     */
    returnList = Arrays.asList(inString.split(", "));
    // prints [a, b]
    System.out.println(returnList.toString());
    return returnList;
}
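A slightly more defensive sketch, still assuming the "[a, b]" format produced by List.toString() and elements that never contain ", ": it strips only the outer brackets, rejects input without them, and returns an empty list for "[]" (the approach above would return a list containing one empty string). The helper name parseListString is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListStringParser {
    // Hypothetical helper: parses the output of List<String>.toString() back into a list.
    public static List<String> parseListString(String in) {
        String trimmed = in.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("Not a list representation: " + in);
        }
        // Strip only the outer brackets, so brackets inside elements survive.
        String body = trimmed.substring(1, trimmed.length() - 1);
        if (body.isEmpty()) {
            return new ArrayList<>();             // "[]" is an empty list, not [""]
        }
        // toString() separates elements with ", "; -1 keeps trailing empty elements
        return new ArrayList<>(Arrays.asList(body.split(", ", -1)));
    }

    public static void main(String[] args) {
        System.out.println(parseListString("[a, b]"));     // [a, b]
        System.out.println(parseListString("[]").size());  // 0
    }
}
```

The ambiguity discussed in the other answers remains: an element that itself contains ", " will still be split.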

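Don't use toString() for serialization/deserialization.
Instead, convert the list to a JSON string, for example with Jackson:
ObjectMapper objectMapper = new ObjectMapper(); // com.fasterxml.jackson.databind.ObjectMapper
// both calls below throw JsonProcessingException (a checked exception)
String str = objectMapper.writeValueAsString(list);
// To convert it back into a string array
String[] array = objectMapper.readValue(str, String[].class);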

As the other answer suggested, it's not a good idea to do that. But to still give an answer to your question:
how to convert string [a, b] to a list of strings?
String s = "[a, b]";
//remove first and last character and create array of Strings splitting around the character ','
String[] strings = s.substring(1, s.length() - 1).split(",");
//Now since every element except the first one has an extra leading space, remove it
for (int i = 1; i < strings.length; i++) {
strings[i] = strings[i].substring(1);
}
List<String> list = Arrays.asList(strings);
System.out.println(list);
Note that if one of the strings in your list contains a comma, this will not work, and there is no way to handle that case, because the string representation is ambiguous.
Or you could use org.springframework.util.StringUtils.commaDelimitedListToStringArray but it will keep the leading spaces.

What should be the logic of hashfunction() in order to check that two strings are anagrams or not?

I want to write a function that takes string as a parameter and returns a number corresponding to that string.
Integer hashfunction(String a)
{
//logic
}
The actual question I'm solving is as follows:
Given an array of strings, return all groups of strings that are anagrams. Represent a group by a list of integers representing the index in the original list.
Input: cat dog god tca
Output: [[1, 4], [2, 3]]
Here is my implementation:
public class Solution {
Integer hashfunction(String a)
{
int i=0;int ans=0;
for(i=0;i<a.length();i++)
{
ans+=(int)(a.charAt(i));//Adding all ASCII values
}
return new Integer(ans);
}
// Obviously this approach is incorrect
public ArrayList<ArrayList<Integer>> anagrams(final List<String> a) {
int i=0;
HashMap<String,Integer> hashtable=new HashMap<String,Integer>();
ArrayList<Integer> mylist=new ArrayList<Integer>();
ArrayList<ArrayList<Integer>> answer=new ArrayList<ArrayList<Integer>>();
if(a.size()==1)
{
mylist.add(new Integer(1));
answer.add(mylist);
return answer;
}
int j=1;
for(i=0;i<a.size()-1;i++)
{
hashtable.put(a.get(i),hashfunction(a.get(i)));
for(j=i+1;j<a.size();j++)
{
if(hashtable.containsValue(hashfunction(a.get(j))))
{
mylist.add(new Integer(i+1));
mylist.add(new Integer(j+1));
answer.add(mylist);
mylist.clear();
}
}
}
return answer;
}
}

Oh boy... there's quite a bit of stuff that's open for interpretation here. Case-sensitivity, locales, characters allowed/blacklisted... There are going to be a lot of ways to answer the general question. So, first, let me lay down a few assumptions:
Case doesn't matter. ("Rat" is an anagram of "Tar", even with the capital lettering.)
Locale is American English when it comes to the alphabet. (26 letters from A-Z. Compare this to Spanish, which has 28 IIRC, among which 'll' is considered a single letter and a potential consideration for Spanish anagrams!)
Whitespace is ignored in our definition of an anagram. ("arthas menethil" is an anagram of "trash in a helmet" even though the number of whitespaces is different.)
An empty string (null, 0-length, all white-space) has a "hash" (I prefer the term "digest", but a name is a name) of 1.
If you don't like any of those assumptions, you can modify them as you wish. Of course, that will result in the following algorithm being slightly different, but they're a good set of guidelines that will make the general algorithm relatively easy to understand and refactor if you wish.
Two strings are anagrams if they are exhaustively composed of the same set of characters and the same number of each included character. There are a lot of tools available in Java that make this task fairly simple. We have String methods, Lists, Comparators, boxed primitives, and existing hashCode methods for... well, all of those. And we're going to use them to make our "hash" method.
private static int hashString(String s) {
    if (s == null) return 1; // null is treated like an empty string, whose hash is 1
    List<Character> charList = new ArrayList<>();
    String lowercase = s.toLowerCase(); // This gets us around case sensitivity
    for (int i = 0; i < lowercase.length(); i++) {
        Character c = Character.valueOf(lowercase.charAt(i));
        if (Character.isWhitespace(c)) continue; // spaces don't count
        charList.add(c); // Note the character for future processing...
    }
    // Now we have a list of Characters... Sort it!
    Collections.sort(charList);
    return charList.hashCode(); // See contract of java.util.List#hashCode (an empty list hashes to 1)
}
And voilà: you have a method that can digest a string and produce an integer representing it, regardless of the order of the characters within. You can use this as the basis for determining whether two strings are anagrams of each other... but I wouldn't. You asked for a digest function that produces an Integer, but keep in mind that in Java, an Integer is merely a 32-bit value. This method can produce at most about 4.3 billion (2^32) distinct values, and there are far more than 4.3 billion strings you can throw at it, so it can produce collisions and give you nonsensical results. If that's a problem, you might want to consider using BigInteger instead.
private static BigInteger hashString(String s) {
BigInteger THIRTY_ONE = BigInteger.valueOf(31); // You should promote this to a class constant!
if (s == null) return BigInteger.ONE; // An empty/null string will return 1.
BigInteger r = BigInteger.ONE; // The value of r will be returned by this method
List<Character> charList = new ArrayList<>();
String lowercase = s.toLowerCase(); // This gets us around case sensitivity
for (int i = 0; i < lowercase.length(); i++) {
Character c = Character.valueOf(lowercase.charAt(i));
if (Character.isWhitespace(c)) continue; // spaces don't count
charList.add(c); // Note the character for future processing...
}
// Now we have a list of Characters... Sort it!
Collections.sort(charList);
// Calculate our bighash, similar to how java's List interface does.
for (Character c : charList) {
int charHash = c.hashCode();
r=r.multiply(THIRTY_ONE).add(BigInteger.valueOf(charHash));
}
return r;
}

You need a number that is the same for all strings made up of the same characters.
The String.hashCode method returns a number that is the same for all strings made up of the same characters in the same order.
If you can sort all words consistently (for example: alphabetically) then String.hashCode will return the same number for all anagrams.
char[] chars = inputString.toCharArray();
Arrays.sort(chars); // Arrays.sort sorts in place and returns void
return String.valueOf(chars).hashCode();
Note: this will work for all words that are anagrams (no false negatives), but it may not work for all words that are not anagrams (possible false positives), because different strings can share a hash code. Collisions are rare for short lowercase words, but they can occur at any length (for example, "Aa" and "BB", both already in sorted order, share the String.hashCode() value 2112), and they become more likely the more distinct words you hash.
Also note: this gives you the answer to the (title of the) question, but it isn't enough for the question you're solving. You need to figure out how to relate this number to an index in your original list.
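One way to relate the key to indices, sketched here as an assumption rather than as the answerer's code: use the sorted string itself (not its hash code) as a map key, which avoids hash collisions entirely, and collect 1-based indices per key. A LinkedHashMap keeps the groups in order of first appearance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnagramGroups {
    public static List<List<Integer>> anagrams(List<String> words) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (int i = 0; i < words.size(); i++) {
            char[] chars = words.get(i).toCharArray();
            Arrays.sort(chars);                   // anagrams share the same sorted form
            String key = new String(chars);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(i + 1); // 1-based index
        }
        return new ArrayList<>(groups.values());
    }

    public static void main(String[] args) {
        System.out.println(anagrams(Arrays.asList("cat", "dog", "god", "tca"))); // [[1, 4], [2, 3]]
    }
}
```

A word with no anagram partner ends up in a group of size one; filter those out if your problem expects only groups with two or more members.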

concatenation of distinct substrings

Question: Arrange all the distinct substrings of a given string in lexicographical order and concatenate them. Print the Kth character of the concatenated string. It is guaranteed that the given value of K is valid, i.e. there will be a Kth character.
Input Format
First line will contain a number T i.e. number of test cases.
First line of each test case will contain a string containing characters (a−z) and second line will contain a number K.
Output Format
Print the Kth character (the string is 1-indexed).
Constraints
1≤T≤5
1≤length≤105
K will be an appropriate integer.
Sample Input #00
1
dbac
3
Sample Output #00
c
Explanation #00
The substrings when arranged in lexicographic order are as follows
a, ac, b, ba, bac, c, d, db, dba, dbac
On concatenating them, we get
aacbbabaccddbdbadbac
The third character in this string is c and hence the answer.
This is my code:
import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;
public class Solution
{
public static void gen(String str,int k)
{
int i,c;ArrayList<String>al=new ArrayList<String>();
for(c=0;c<str.length();c++)
{
for(i=1;i<=str.length()-c;i++)
{
String sub = str.substring(c,c+i);
al.add(sub);
}
}
HashSet hs = new HashSet();
hs.addAll(al);
al.clear();
al.addAll(hs);
String[] res = al.toArray(new String[al.size()]);
Arrays.sort(res);
StringBuilder sb= new StringBuilder();
for(String temp:res)
{
sb.append(temp);
}
String s = sb.toString();
System.out.println(s.charAt(k-1));
}
public static void main(String[] args)
{
Scanner sc = new Scanner (System.in);
int t = Integer.parseInt(sc.nextLine());
while((t--)>0)
{
String str = sc.nextLine();
int k = Integer.parseInt(sc.nextLine());
gen(str,k);
}
}
}
This code works well for small inputs like the test case above, but for large inputs it either times out or fails with the error below. I understand that the problem is memory. Is there an alternative way to solve this question, or a way to reuse the same memory?
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.lang.String.substring(String.java:1913)
at Solution.gen(Solution.java:19)
at Solution.main(Solution.java:54)

With the constraints you are given (up to 105 characters) you shouldn't be having out-of-memory problems. Perhaps you were testing with very big strings.
So, in case you were, here are some places where you are wasting memory:
After you fill the set, you copy it to your list. This means two copies of the collection of substrings, while you are not going to use the set any more.
After you copy the list to an array, you now have three copies of the collection of substrings, although you are not going to use the list anymore.
Now you create a StringBuilder and put all the substrings into it. But it's not really interesting to know the entire concatenated string. We only need one character in it, so why put the concatenation in memory at all? In addition, in all the wasteful copies above, at least you didn't duplicate the substrings themselves. But now that you are appending them to the StringBuilder, you are creating a duplicate of them. And that's going to be a very long string.
And then you copy the StringBuilder's content to a new string by using toString(). This creates a copy of the very large concatenated string (which we already said we don't actually need).
You already got sound advice to use a TreeSet and fill it directly rather than creating a list, a set, and a sorted list. The next step is to extract the correct character from that set without actually keeping the concatenated string around.
So, assuming your set is called set:
Iterator<String> iter = set.iterator();
int lengthSoFar = 0;
String str = null;
while ( lengthSoFar < k && iter.hasNext() ) {
str = iter.next(); // Got the next substring;
lengthSoFar += str.length();
}
// At this point we have the substring where we expect the k'th
// character to be.
System.out.println(str.charAt(k - lengthSoFar + str.length() - 1));
Note that it will take the program longer to get to high values of k than low values, but generally it will be faster than building the whole concatenated string, because you'll stop as soon as you get to the correct substring.
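Putting the TreeSet advice and the iterator walk together gives something like the following sketch (the class and method names are hypothetical). It never builds the concatenated string, but it still stores every distinct substring, so the number of stored strings grows quadratically with the input length.

```java
import java.util.TreeSet;

public class KthCharacter {
    // Returns the k-th (1-based) character of the sorted, concatenated distinct substrings.
    public static char kthChar(String str, long k) {
        TreeSet<String> set = new TreeSet<>();    // sorted and duplicate-free
        for (int start = 0; start < str.length(); start++) {
            for (int end = start + 1; end <= str.length(); end++) {
                set.add(str.substring(start, end));
            }
        }
        long lengthSoFar = 0;                     // characters before the current substring
        for (String sub : set) {
            if (lengthSoFar + sub.length() >= k) {
                // k falls inside this substring
                return sub.charAt((int) (k - lengthSoFar - 1));
            }
            lengthSoFar += sub.length();
        }
        throw new IllegalArgumentException("k is larger than the concatenated length");
    }

    public static void main(String[] args) {
        System.out.println(kthChar("dbac", 3)); // c
    }
}
```

A long is used for the running length because the concatenated length can overflow an int well before the substring count becomes impractical.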

You are running out of memory. You can increase the memory available to the JVM by starting it with -Xms256m -Xmx1024m, and you can try some optimizations.
public static void gen(String str, int k) {
int i, c;
//Adding directly to the Set prevents a larger list because you remove the duplicates
Set<String> set = new TreeSet<String>();
for (c = 0; c < str.length(); c++) {
for (i = 1; i <= str.length() - c; i++) {
String sub = str.substring(c, c + i);
set.add(sub);
}
}
//TreeSet already orders by the String comparator
StringBuilder sb = new StringBuilder();
for (String temp : set) {
sb.append(temp);
if(sb.length()>k){
break;
}
}
String s = sb.toString();
System.out.println(s.charAt(k - 1));
}
[EDIT] Added a small performance boost: the loop now stops once the StringBuilder holds more than k characters. Try it to see whether it gets faster; StringBuilder.length() itself is a constant-time call, so the extra check is cheap.

How to print out all permutations of a string in Java

Given a string, I need to print out all permutations of the string. How should I do that? I have tried:
for(int i = 0; i<word.length();i++)
{
for(int j='a';j<='z';j++){
word = word.charAt(i)+""+(char)j;
System.out.println(word);
}
}
Is there a good way about doing this?

I'm not 100% sure that I understand what you are trying to do. I'm going to go by your original wording of the question and your comment on @ErstwhileIII's answer, which make me think that it's not really "permutations" (i.e. rearrangements of the letters in the word) that you are looking for, but rather possible single-letter modifications (not sure what a better word for this would be either), like this:
Take a word like "hello" and print a list of all "versions" you can get by adding one "typo" to it:
hello -> aello, bello, cello, ..., zello, hallo, hbllo, hcllo, ..., hzllo, healo, heblo, ...
If that's indeed what you're looking for, the following code will do that for you pretty efficiently:
public void process(String word) {
// Convert word to array of letters
char[] letters = word.toCharArray();
// Run through all positions in the word
for (int pos=0; pos<letters.length; pos++) {
// Run through all letters for the current position
for (char letter='a'; letter<='z'; letter++) {
// Replace the letter
letters[pos] = letter;
// Re-create a string and print it out
System.out.println(new String(letters));
}
// Set the current letter back to what it was
letters[pos] = word.charAt(pos);
}
}

Oh... to print out all permutations of a string, consider your algorithm first. What is the definition of "all permutations"? For example:
String "a" would have the answer: a
String "ab" would have the answer: ab, ba
String "abc" would have the answer: abc, acb, bac, bca, cab, cba
Reflect on the algorithm you would use (write it down in English), then translate it to Java code.
While not the most efficient, a recursive solution might be easiest to use (i.e. for a string of length n, go through each of the characters and follow that with the permutations of the string with that character removed).
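The recursive idea described above can be sketched as follows; the method names and the prefix/rest split are illustrative choices, not code from the thread. Repeated letters in the input produce repeated permutations.

```java
import java.util.ArrayList;
import java.util.List;

public class Permutations {
    // Fix each character of `rest` as the next letter, then permute what remains.
    static void permute(String prefix, String rest, List<String> out) {
        if (rest.isEmpty()) {
            out.add(prefix);                      // no letters left: one complete permutation
            return;
        }
        for (int i = 0; i < rest.length(); i++) {
            String remaining = rest.substring(0, i) + rest.substring(i + 1);
            permute(prefix + rest.charAt(i), remaining, out);
        }
    }

    public static List<String> permutations(String word) {
        List<String> out = new ArrayList<>();
        permute("", word, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(permutations("abc")); // [abc, acb, bac, bca, cab, cba]
    }
}
```

A string of length n yields n! results, so this is only practical for short words.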

EDIT: OK... you changed your request. Permutations are a whole other story. I think this will help: Generating all permutations of a given string
Not sure what you are trying to do... Example 1 is to get the alphabet one letter next to another. Example 2 is to print whatever you gave us there as an example.
//Example 1
String word = ""; //empty string
for (int i = 65; i <= 90; i++) { //65-90 are the ASCII codes for the capital letters A-Z
    word += (char) i; //cast int to char
}
System.out.println(word);
//Example 2
word = ""; //reuse the variable; declaring String word twice in one scope would not compile
for (int i = 65; i <= 90; i++) {
    word += (char) i + "rse";
    if (i != 90) { //you don't want this at the end of your sentence, I suppose :)
        word += ", ";
    }
}
System.out.println(word);

Splitting string N into N/X strings

I would like some guidance on how to split a string into N separate strings based on an arithmetic operation, for example string.length()/300.
I am aware of ways to do it with delimiters, such as
testString.split(",");
but how does one use greedy/reluctant/possessive quantifiers with the split method?
Update: as requested, here is an example similar to what I am looking to achieve:
String X = "32028783836295C75546F7272656E745C756E742E657865000032002E002E005C0"
Resulting in X/3 (more or less... done by hand)
X[0] = 32028783836295C75546F
X[1] = 6E745C756E742E6578650
X[2] = 65000032002E002E005C0
Don't worry about explaining how to put it into the array; I have no problem with that. I only need to know how to split without using a delimiter, using an arithmetic operation instead.

You could do that by splitting on (?<=\G.{5}) whereby the string aaaaabbbbbccccceeeeefff would be split into the following parts:
aaaaa
bbbbb
ccccc
eeeee
fff
The \G matches the (zero-width) position where the previous match occurred. Initially, \G starts at the beginning of the string. Note that by default the . meta char does not match line breaks, so if you want it to match every character, enable DOT-ALL: (?s)(?<=\G.{5}).
A demo:
class Main {
public static void main(String[] args) {
int N = 5;
String text = "aaaaabbbbbccccceeeeefff";
String[] tokens = text.split("(?<=\\G.{" + N + "})");
for(String t : tokens) {
System.out.println(t);
}
}
}
which can be tested online here: http://ideone.com/q6dVB
EDIT
Since you asked for documentation on regex, here are the specific tutorials for the topics the suggested regex contains:
\G, see: http://www.regular-expressions.info/continue.html
(?<=...), see: http://www.regular-expressions.info/lookaround.html
{...}, see: http://www.regular-expressions.info/repeat.html

If there's a fixed length that you want each String to be, you can use Guava's Splitter:
int length = string.length() / 300;
Iterable<String> splitStrings = Splitter.fixedLength(length).split(string);
Each String in splitStrings with the possible exception of the last will have a length of length. The last may have a length between 1 and length.
Note that unlike String.split, which first builds an ArrayList<String> and then uses toArray() on that to produce the final String[] result, Guava's Splitter is lazy and doesn't do anything with the input string when split is called. The actual splitting and returning of strings is done as you iterate through the resulting Iterable. This allows you to just iterate over the results without allocating a data structure and storing them all or to copy them into any kind of Collection you want without going through the intermediate ArrayList and String[]. Depending on what you want to do with the results, this can be considerably more efficient. It's also much more clear what you're doing than with a regex.

How about plain old String.substring? In older JDKs it was especially memory friendly because it reused the original char array; since Java 7u6, substring copies the characters into a new array.

Well, I think this is probably as efficient a way to do this as any other.
int N = 300;
int sublen = testString.length() / N;
String[] subs = new String[N];
for (int i = 0; i < N; i++) {
    // the last piece also takes any remainder when the length is not divisible by N
    int end = (i == N - 1) ? testString.length() : (i + 1) * sublen;
    subs[i] = testString.substring(i * sublen, end);
}
You can do it faster if you need the items as a char[] array rather than as individual Strings, depending on how you need to use the results, e.g. using testString.toCharArray().

Dunno, you'll probably need a method that takes a string and a number of parts and returns the pieces. Something like this:
public String[] splitInto(String splitString, int parts)
{
    // round up so that at most "parts" pieces are produced
    int dlength = (splitString.length() + parts - 1) / parts;
    List<String> retVal = new ArrayList<String>(); // needs java.util.ArrayList and java.util.List
    for (int i = 0; i < splitString.length(); i += dlength)
    {
        // the last piece may be shorter than dlength
        retVal.add(splitString.substring(i, Math.min(i + dlength, splitString.length())));
    }
    return retVal.toArray(new String[0]);
}
