How to find duplicate and unique string entries using Hashtable - Java

Assume I'm taking a string as input from the command line and I want to find the duplicate and unique entries in the string by using Hashtable.
E.g.:
Input:
hi hello bye hi good hello name hi day hi
Output:
Unique elements are: bye, good, name, day
Duplicate elements are:
hi 4 times
hello 2 times

You can break the input apart by calling split(" ") on the input String. This returns a String[] containing each word. Iterate over this array and use each String as a key into your Hashtable, with an Integer as the value. Each time you encounter a word, increment its value, starting from 0 if no value is stored for it yet.
Hashtable<String, Integer> hashtable = new Hashtable<String, Integer>(); // needs import java.util.Hashtable
String[] splitInput = input.split(" ");
for (String inputToken : splitInput) {
    Integer val = hashtable.get(inputToken);
    if (val == null) {
        val = 0; // first time we see this word (autoboxing replaces the deprecated new Integer(0))
    }
    ++val;
    hashtable.put(inputToken, val);
}
Also, you may want to look into HashMap rather than Hashtable. HashMap is not thread-safe, but is faster. Hashtable is a bit slower because every method is synchronized, but that is what makes it thread-safe. If you are doing this in a single thread, I would recommend HashMap.
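A minimal end-to-end sketch of the counting approach described above, which also splits the counts into the unique and duplicate lists from the question. It assumes words are separated by whitespace, and it swaps in a LinkedHashMap (instead of Hashtable) only so that words come out in the order they were first seen, matching the sample output; WordCounts and countWords are illustrative names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WordCounts {
    // Counts each whitespace-separated word; LinkedHashMap keeps first-seen order.
    static Map<String, Integer> countWords(String input) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : input.trim().split("\\s+")) {
            counts.merge(word, 1, Integer::sum); // 1 for a new word, otherwise old count + 1
        }
        return counts;
    }

    public static void main(String[] args) {
        // Words from the command line, or the sample input from the question.
        String input = args.length > 0 ? String.join(" ", args)
                : "hi hello bye hi good hello name hi day hi";
        Map<String, Integer> counts = countWords(input);

        List<String> unique = new ArrayList<>();
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == 1) {
                unique.add(e.getKey());
            } else {
                duplicates.add(e.getKey() + " " + e.getValue() + " times");
            }
        }
        System.out.println("Unique elements are: " + String.join(", ", unique));
        System.out.println("Duplicate elements are:");
        duplicates.forEach(System.out::println);
    }
}
```

With the sample input this prints bye, good, name, day as unique, then "hi 4 times" and "hello 2 times". With a plain Hashtable or HashMap the counts are the same, but the iteration order is unspecified.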

Use a hashtable with the string as the key and a numeric type as the counter.
Go through all the words: if a word is not in the map yet, insert it with a count of 1; otherwise increase its count (the value stored in the hashtable).
hth
Mario

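You can convert each string into an integer and then use the generated integer as the hash value. To convert a string to an int, you can treat it as a base-256 number (one digit per character). Such a number quickly overflows an int, so in practice you reduce it modulo the table size as you go; Java's own String.hashCode() uses the same polynomial idea with base 31 and lets the int arithmetic wrap around.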
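A small sketch of that idea, assuming a hypothetical table of 101 buckets: Horner's rule evaluates the base-256 polynomial one character at a time and takes the remainder at every step, so the intermediate value never overflows. polyHash is an illustrative name, not a library method.

```java
public class PolyHash {
    // Hypothetical helper: treats s as a base-256 number (one digit per char)
    // and reduces modulo tableSize after each step (Horner's rule), so the
    // running value stays below tableSize and cannot overflow.
    static int polyHash(String s, int tableSize) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h * 256 + s.charAt(i)) % tableSize;
        }
        return (int) h;
    }

    public static void main(String[] args) {
        int tableSize = 101; // assumed number of buckets
        for (String word : "hi hello bye".split(" ")) {
            System.out.println(word + " -> bucket " + polyHash(word, tableSize));
        }
    }
}
```

For example, polyHash("hi", 101) returns 65, because (104 × 256 + 105) mod 101 = 65. Different words can still land in the same bucket, so the table still has to compare the stored keys.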

Related

Extracting Substrings from a List in Java

If I have a parent string (let's call it output) that contains a list of variable assignments like so ...
status.availability-state available
status.enabled-state enabled
status.status-reason The pool is available
And I want to extract the value of each variable in that list given the variable names, i.e. the substring after the space following status.availability-state, status.enabled-state, and status.status-reason, such that I end up with three different variable assignments that make each of the following String comparisons true ...
String availability = output.substring(TODO);
String enabled = output.substring(TODO);
String reason = output.substring(TODO);
availability.equals("available");
enabled.equals("enabled");
reason.equals("The pool is available");
What is the simplest way to do this? Should I even use substring for this?
This is a little tricky because you need to assign the value to a specific variable - you can't just have a map of keys to variables in Java.
I would consider doing this with a switch:
for (String line : output.split("\n")) { // split() takes a String (regex), not a char
    String[] frags = line.split(" ", 2); // Split the line in 2 at the first space.
    switch (frags[0]) { // This is the "key" of the variable (switch on String needs Java 7+).
        case "status.availability-state":
            availability = frags[1]; // This assigns the "value" to the relevant variable.
            break;
        case "status.enabled-state":
            enabled = frags[1];
            break;
        // ... etc
    }
}
It's not very pretty, but you don't have too many options.
There seem to be two questions here -- how to parse the string, and how to assign to variables by name.
Tackle the string parsing one step at a time:
First, write a program that reads one line at a time and outputs each one in the body of a loop. String.split() or StringTokenizer are two options here.
Next, enhance this by writing a method that handles one line. The same tools are helpful here, to split on spaces.
You should now have a program that can print name: status.availability-state, value: available for each line of input.
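One way the line-by-line steps above might look, as a sketch that reads the sample output through a BufferedReader and uses split(" ", 2) so that a value containing spaces (like the status reason) stays in one piece. ParseLines and parseLine are hypothetical names, and treating a line without a space as having an empty value is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ParseLines {
    // Step 2: handle one line. split(" ", 2) splits only at the first space,
    // so the value keeps any spaces of its own.
    static String[] parseLine(String line) {
        String[] parts = line.split(" ", 2);
        if (parts.length < 2) {
            return new String[] { parts[0], "" }; // assumption: no space means an empty value
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        String output = "status.availability-state available\n"
                + "status.enabled-state enabled\n"
                + "status.status-reason The pool is available";
        // Step 1: read one line at a time.
        try (BufferedReader reader = new BufferedReader(new StringReader(output))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] nameValue = parseLine(line);
                // Step 3: print the name and value for each line.
                System.out.println("name: " + nameValue[0] + ", value: " + nameValue[1]);
            }
        }
    }
}
```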
Next, you're asking to programmatically assign to variables based on the name of the parameter.
There is no legitimate way to look up a local variable by its name at runtime (OK, reflection can look up fields by name, and Java 8 can even report method parameter names, but it shouldn't be used without very good reason).
So, the best you can do is to use a switch or if statement:
switch (name) {
    case "status.availability-state": // String cases must be quoted literals
        availability = value;
        break;
    // ... etc.
}
However, whenever you use switch or if you should think about whether there's a better way.
Is there any reason you can't turn these variables into Map entries?
configMap.put(name, value);
Then to read it:
doSomethingWith(configMap.get("status.availability-state"));
That's what maps are for. Use them.
This is a similar situation to the rookie mistake of using variables called person1, person2, person3... instead of using an array. Eventually they ask "How do I go from the number 25 to my variable person25?" -- and the answer is, you can't, but an array or list makes it easy. people[number] or people.get(number)
A valid alternative is to split the string by \n and add to a Map. Example:
String properties = "status.availability-state available\nstatus.enabled-state enabled\nstatus.status-reason The pool is available";
Map<String, String> map = Arrays.stream(properties.split("\n"))
.collect(Collectors.toMap(s -> s.split(" ")[0], s -> s.split(" ", 2)[1]));
System.out.println(map.get("status.status-reason"));
This should output: The pool is available
This loop will match and extract the variables, and you can then assign them as you see fit:
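Pattern regex = Pattern.compile("status\\.(.*?)-\\S* (.*)"); // needs java.util.regex.Pattern and Matcher
Matcher matcher = regex.matcher(output);
while (matcher.find()) {
    System.out.println(matcher.group(1) + "=" + matcher.group(2));
}
status\\. matches "status."
(.*?) matches any sequence of characters but isn't greedy, and captures them (e.g. availability)
-\\S* matches the dash and the rest of the key (any non-space characters); the following space separates the key from the value
(.*) captures the rest of the line as the value, so The pool is available is kept whole (. does not match a line break, so each match stops at the end of its line)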
Here's one way to do it:
Map<String, String> properties = getProperties(propertiesString);
availability = properties.get("availability-state");
enabled = properties.get("enabled-state");
reason = properties.get("status-reason");
// ...
public static Map<String, String> getProperties(String input) {
    Map<String, String> properties = new HashMap<>();
    String[] lines = input.split("\n");
    for (String line : lines) {
        String[] parts = line.split(" ", 2); // key, then the rest of the line as the value
        int keyStartIndex = parts[0].indexOf(".") + 1; // skip the "status." prefix
        String key = parts[0].substring(keyStartIndex);
        properties.put(key, parts[1]);
    }
    return properties;
}
This seems a bit more straightforward in terms of the code that sets these values, as each value is set to exactly the value from the map, rather than iterating over some list of strings, checking whether each one contains a particular value and doing different things based on that.
This is designed with the primary use-case being that the string is created at runtime in memory. If the properties are created in an external file, this code would still work (after creating the desired String in memory), but it may be a better idea to use either a Properties file, or perhaps a Scanner.
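Since the lines are in "key value" form, java.util.Properties can also parse them directly: Properties.load treats the first unescaped whitespace, '=' or ':' on a line as the separator between key and value, and keeps the rest of the line as the value. A minimal sketch under that assumption (load also gives special meaning to backslashes and to lines starting with # or !, so values containing those would need escaping); PropertiesDemo is an illustrative name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropertiesDemo {
    static Properties parse(String output) throws IOException {
        Properties props = new Properties();
        // load() splits each line at the first unescaped whitespace, '=' or ':',
        // so "status.status-reason The pool is available" keeps its full value.
        props.load(new StringReader(output));
        return props;
    }

    public static void main(String[] args) throws IOException {
        String output = "status.availability-state available\n"
                + "status.enabled-state enabled\n"
                + "status.status-reason The pool is available";
        Properties props = parse(output);
        String availability = props.getProperty("status.availability-state");
        String enabled = props.getProperty("status.enabled-state");
        String reason = props.getProperty("status.status-reason");
        System.out.println(availability.equals("available"));
        System.out.println(enabled.equals("enabled"));
        System.out.println(reason.equals("The pool is available"));
    }
}
```

All three comparisons from the question print true.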

HashSet Java find words according to length

Write a method that takes two parameters, (1) the original string and (2) the word length, and returns a new string that contains the words of the specified length from the original string without any duplicates. Here is an example of program execution:
getWordsOfLengthN(“We are the best, are we ?”, 3) → “are the”
getWordsOfLengthN(“We are the best, are we ?”, 2) → “we”
Notice that the method does not differentiate between upper and lower case.
Hint:
- Tokenize the string into an array of words
- Change all the words to lowercase
- Store all the words in a HashSet
- Retrieve all the items from the HashSet and store them in the resulting string
I am new to Java and I am taking an online course, so everything I know is self-taught. I'm not sure how to go about this method; it is really stumping me. Can anyone offer me some ideas? Thanks
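public static String getWordsOfLengthN(String originalString, int wordLength) {
    // needs java.util.Set and java.util.LinkedHashSet;
    // LinkedHashSet is a HashSet that also remembers insertion order
    Set<String> words = new LinkedHashSet<String>();
    // split on runs of non-letters so punctuation like "best," or "?" is dropped
    for (String word : originalString.split("[^A-Za-z]+")) {
        if (word.length() == wordLength) {
            words.add(word.toLowerCase()); // "We" and "we" count as the same word
        }
    }
    return String.join(" ", words); // "are the" rather than "[are, the]"
}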

Most efficient way to find unique entries in a large data set

Before anything, I am making it clear that this is an assignment and I do not expect full coded answers. All I seek is advice and maybe snippets of code that helps me.
So, I am reading in about 900,000 words, all stored in an ArrayList. I need to count the unique words using a sorted array (or ArrayList) in Java.
So far, I am simply looping over the given ArrayList and using
Collections.sort(words);
and Collections.binarySearch(words, wordToLook); to achieve it, like the following:
OrderedSet set = new OrderedSet();
for(String a : words){
if(!set.contains(a)){
set.add(a);
}
}
and
public boolean contains(String word) {
Collections.sort(uniqueWords);
int result = Collections.binarySearch(uniqueWords, word);
if(result<0){
return false;
}else{
return true;
}
}
This code has a running time of about 60 seconds, but I was wondering if there is a better way to do this, because running a sort every time an element is added seems very inefficient (but of course necessary if I were to use binary search).
Any sort of feedback would be greatly appreciated. Thanks.
So, you are required to use a sorted array. That is ok, since you are (not yet) programming in the real world.
I will suggest two alternatives:
The first uses binary search (which you are using in your current code).
I would create a class that contains two fields: the word (a String) and the count for that word (an int). You will build a sorted array of instances of this class.
Start with an empty array and add to it as you read each word. For each word, do a binary search for the word in the array you are building. The search will either find the entry containing the word (and you will increment the count), or you will determine that the word is not yet in the array.
When your binary search ends without finding the word, you will create a new object to hold the word+count and add it to the array in the location where your search ended (be careful to make sure that your logic really puts it in the right spot to keep your list sorted). Of course, your count is set to 1 for new words.
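A sketch of this first alternative, assuming words arrive one at a time: the hand-written binary search returns either the index of the existing entry or, following the same convention as Collections.binarySearch, -(insertion point) - 1, so the new entry can be inserted exactly where the search ended. WordCount and SortedWordCounts are illustrative names. Keep in mind that ArrayList.add(index, element) shifts every later element, so each insertion costs time proportional to the number of distinct words seen so far.

```java
import java.util.ArrayList;
import java.util.List;

public class SortedWordCounts {
    // One entry per distinct word; the list is kept sorted by word.
    static final class WordCount {
        final String word;
        int count;
        WordCount(String word) { this.word = word; this.count = 1; } // new words start at 1
    }

    private final List<WordCount> entries = new ArrayList<>();

    // Binary search over the sorted entries: returns the index if found,
    // otherwise -(insertionPoint) - 1, like Collections.binarySearch.
    private int indexOf(String word) {
        int lo = 0, hi = entries.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = entries.get(mid).word.compareTo(word);
            if (cmp < 0) lo = mid + 1;
            else if (cmp > 0) hi = mid - 1;
            else return mid;
        }
        return -(lo + 1);
    }

    void add(String word) {
        int i = indexOf(word);
        if (i >= 0) {
            entries.get(i).count++;                     // seen before: bump the count
        } else {
            entries.add(-(i + 1), new WordCount(word)); // insert where the search ended
        }
    }

    int uniqueCount() { return entries.size(); }

    int countOf(String word) {
        int i = indexOf(word);
        return i >= 0 ? entries.get(i).count : 0;
    }

    public static void main(String[] args) {
        SortedWordCounts counts = new SortedWordCounts();
        for (String w : "b a c a b a".split(" ")) {
            counts.add(w);
        }
        System.out.println(counts.uniqueCount() + " unique, \"a\" x" + counts.countOf("a"));
    }
}
```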
Another alternative:
Read all of your words into a list and sort it. After sorting, all duplicates will be next to each other in the list.
You will walk down this sorted list once and create a list of word+count as you go. If the next word you see is the same as the last word+count, increment the count. If it is a new word, add a new word+count to your result list with count=1.
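The second alternative could look roughly like this: sort a copy of the words, then make one pass and either bump the count of the last entry or start a new one. Using AbstractMap.SimpleEntry as the word+count pair is just a convenience choice for the sketch; SortThenScan is an illustrative name.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SortThenScan {
    static List<Map.Entry<String, Integer>> wordCounts(List<String> words) {
        List<String> sorted = new ArrayList<>(words); // copy so the caller's list is untouched
        Collections.sort(sorted);                     // duplicates become adjacent
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        for (String word : sorted) {
            int last = result.size() - 1;
            if (last >= 0 && result.get(last).getKey().equals(word)) {
                // same as the previous word: increment its count
                result.get(last).setValue(result.get(last).getValue() + 1);
            } else {
                result.add(new AbstractMap.SimpleEntry<>(word, 1)); // new word starts at 1
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCounts(Arrays.asList("b", "a", "c", "a")));
    }
}
```

The number of unique words is simply the size of the returned list; the sort dominates the cost at roughly N log N.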
I would not use a sorted array. I would create a Map<String, Integer> where the key is your word and the value is the count of the number of occurrences of the word. As you read each word, do something like this:
Integer count = map.get(word);
if (count == null) {
count = 0;
}
map.put(word, count + 1);
Then just iterate over the map's entry set and do whatever you need to do with the counts.
If you know, or can estimate, the number of unique words, then you should use this number to size the map in the HashMap constructor (so you don't grow the map many times).
If you use a sorted array, your run time cannot be better than proportional to NlogN (where N is the number of words in your list). If you use a HashMap, you can achieve a runtime that grows linearly with N (you save yourself the factor of logN).
Another advantage of using a Map is the memory used is proportional to the number of unique words, rather than the total number of words (assuming that you build the map while reading the words, rather than reading all words into a collection and then adding them to the map).
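Putting the last two points together, a sketch that counts words while they are read (so only distinct words are kept in memory) and pre-sizes the HashMap. It assumes a Scanner as the word source (in practice this might be a Scanner over your input file) and an estimate, expectedUnique, of the number of distinct words; both are assumptions for illustration. Because a HashMap resizes once its size exceeds capacity times the load factor (0.75 by default), the capacity is set to expectedUnique / 0.75 + 1 rather than to expectedUnique itself.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class StreamingWordCount {
    // Counts words as they are read, so memory grows with the number of
    // distinct words rather than the total number of words.
    static Map<String, Integer> countWords(Scanner in, int expectedUnique) {
        // Sized so that expectedUnique entries stay under the 0.75 load-factor threshold.
        Map<String, Integer> counts = new HashMap<>((int) (expectedUnique / 0.75f) + 1);
        while (in.hasNext()) {
            counts.merge(in.next(), 1, Integer::sum); // 1 for a new word, otherwise +1
        }
        return counts;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("to be or not to be"); // stand-in for the real input
        Map<String, Integer> counts = countWords(in, 4);
        System.out.println(counts.size() + " unique words, \"to\" occurs " + counts.get("to") + " times");
    }
}
```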
public static int countUnique(String[] array) {
    if (array.length == 0) return 0;
    int count = 1;
    for (int i = 1; i < array.length; i++) {
        // each place where an element differs from its predecessor starts a new group
        if (!array[i].equals(array[i - 1])) count++;
    }
    return count;
}
This is an O(N) algorithm for counting the number of unique entries in a sorted array. The idea behind it is that we count the number of transitions between groups of equal elements. The number of unique entries is then the number of transitions plus one (for the first entry).
Hopefully you see how to apply this algorithm to your array after the elements are sorted.
You could always use a comparator to get the unique values, for example with a TreeSet that compares words case-insensitively:
Set<String> unique = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
unique.addAll(words);
Now count:
int repeated = words.size() - unique.size(); // no. of repeated values
Hope this helps!

Count the number of numbers in String (Java) [duplicate]

This question already has answers here:
Java equivalent to Explode and Implode(PHP) [closed]
(6 answers)
Closed 8 years ago.
Hi all :) I've got a very easy problem, but I've spent more than 2 hours searching here and in the Java docs.
So I have a string which contains more than 5k lines; in each line there are 6 numbers from 1-49, separated by ";". I want to count how many times each number occurs in my very long string. Most of the topics I found were about counting chars. The closest I found, I think, was Apache Commons Lang and its countMatches function. Should I use an ArrayList? I need some clue; if the solution is too long, give me a tip on how to do it :)
The straightforward solution is to read your file line by line and split each line by ";". Then you have each number as a string; finally, put them into a HashMap<String, Integer>: if the key already exists, just add 1 to its value. At the end you have the count for each string (your number).
I hope I understood your question correctly.
Try this:
Create a variable to hold the counts. Here is an example: Map<String, Integer> counts = new HashMap<String, Integer>();
Split each line using String.split() (specifically line.split(";")).
Each time you split a line you will receive an array of numbers. For each of these numbers, retrieve the value from the counts map. If it is null, add the number to the map with a count of 1; if it is not null, increment the count and put it back into the map.
Edit: some code.
Map<String, Integer> counts = new HashMap<String, Integer>();
String line;
String[] parts;
// reader is assumed to be a BufferedReader over your input (hypothetical name),
// e.g. new BufferedReader(new FileReader(...)); readLine() returns null at the end
while ((line = reader.readLine()) != null)
{
    parts = line.split(";"); // split() never returns null, so no null check is needed
    for (String current : parts)
    {
        Integer value = counts.get(current);
        if (value == null) // number not in the counts map yet.
        {
            counts.put(current, 1);
        }
        else
        {
            int currentCount = value.intValue() + 1;
            counts.put(current, currentCount);
        }
    }
}
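For reference, a self-contained sketch of the same line-by-line counting, assuming the input is available as a BufferedReader. It swaps in two small variations on the code above: the numbers are parsed to Integer keys so that a TreeMap can list them in numeric order (1, 2, ..., 49) instead of string order, and Map.merge (Java 8+) replaces the explicit null check. NumberCounts is an illustrative name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

public class NumberCounts {
    // Counts how often each ';'-separated number appears, reading line by line.
    // Keys are parsed to Integer so the TreeMap keeps them in numeric order.
    static Map<Integer, Integer> count(BufferedReader reader) throws IOException {
        Map<Integer, Integer> counts = new TreeMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            for (String part : line.split(";")) {
                String trimmed = part.trim();
                if (trimmed.isEmpty()) continue; // tolerate blank fields or a trailing ';'
                counts.merge(Integer.parseInt(trimmed), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        String data = "1;2;3;4;5;6\n2;3;10;11;12;49"; // stand-in for the real file
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            System.out.println(count(reader));
        }
    }
}
```

For the two sample lines this prints {1=1, 2=2, 3=2, 4=1, 5=1, 6=1, 10=1, 11=1, 12=1, 49=1}.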

search elements in an array in java

I'm wondering what kind of method I should use to search the elements in an array, and what data structure I should use to store the return value.
For example, a .txt file contains the following:
123 Name line Moon night table
124 Laugh Cry Dog
123 quote line make pet table
127 line array hello table
and the search elements are line and table.
I read every line as a string and then split it by spaces.
The output should look like this:
123 2 (ID 123 occurs twice that contains the search elements)
127 1
I want some suggestions on what kind of method to use to search the elements in the array, and what kind of data structure to use to store the return value (the ID and the number of occurrences; I'm thinking HashMap).
Read the text file and store each line that ends with table in an ArrayList<String>. Then use contains on each element of the ArrayList<String> to check for the other search element (line). Store the result in a HashMap<String, Integer> where the key is the ID and the value is an Integer representing the number of times that ID occurs.
First, I would keep reading through the file line by line, there's really no other way of going about it other than that.
Second, to pick out the rows to save, you don't need to do the split (assumption: they all end in (space)table). You can just get them by using:
if (line.endsWith(" table"))
Then, I would suggest using a Map<String, Integer> to store your information. This way, you have the ID number (key) and how many times it was found in the file (value).
Map<String, Integer> map = new HashMap<String, Integer>();
// ....reading file....
if (line.endsWith(" table")) {
    String number = line.substring(0, line.indexOf(" ")); // the ID before the first space
    if (!map.containsKey(number)) {
        map.put(number, 1);
    } else {
        map.put(number, map.get(number) + 1);
    }
}
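The answer above relies on every matching line ending in table. If a line has to contain all of the search elements (line and table here) anywhere after the ID, a sketch like the following checks each line's words against a Set of search terms and counts hits per ID. It assumes the ID is always the first whitespace-separated token; IdSearch and countHits are illustrative names, and LinkedHashMap is used only to keep IDs in first-seen order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IdSearch {
    // For each line, the first token is the ID; the line counts as a hit only if
    // its remaining tokens contain every search term.
    static Map<String, Integer> countHits(List<String> lines, Set<String> terms) {
        Map<String, Integer> hits = new LinkedHashMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            Set<String> words = new HashSet<>(Arrays.asList(tokens).subList(1, tokens.length));
            if (words.containsAll(terms)) {
                hits.merge(tokens[0], 1, Integer::sum); // one more matching line for this ID
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "123 Name line Moon night table",
                "124 Laugh Cry Dog",
                "123 quote line make pet table",
                "127 line array hello table");
        Set<String> terms = new HashSet<>(Arrays.asList("line", "table"));
        countHits(lines, terms).forEach((id, n) -> System.out.println(id + " " + n));
    }
}
```

For the sample file this prints "123 2" and "127 1", matching the expected output in the question.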
