Distinctive number from hashmap - java

I'm new to Java and need to know how to count the number of distinct words in a HashMap.
I have some tweets and stored their words in an array of strings like this:
String [] words = {i, to , go , eat,know , i ,let , let , figure , eat};
HashMap <String,Integer> set=new HashMap();
for (String w:words)
{
int freq=set.get(w);
if (freq==null)
{
set.put(w1,1)
}
else
set.put(w1,freq+1)
}
Let's suppose the HashMap now holds all the words I need.
How can I count the total number of distinct words?
As I understand it, those are the words whose value is 1 in the HashMap, right?
I tried this check:
if (set.containsvalue(1))
int dist +=set.size();
but it didn't work!

int dist = 0;
for (int i : set.values())
if (i == 1)
++dist;

Before you put a word into set, you should check if the key exists or not. If the key exists, then you should increase the value.

The following segment of your code is wrong:
int freq=set.get(w);
if (freq==null)
{
set.put(w1,1)
}
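freq is declared as an int, which is a primitive, so it cannot be compared with null; null checks apply only to references. Unboxing the null that get returns for a missing key into an int would also throw a NullPointerException.
Also, I think there is a typo: you are mixing up w and w1.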
The correct code is:
String[] words = {"i", "to", "go", "eat", "know", "i", "let", "let", "figure", "eat"};
Map<String, Integer> set = new HashMap<>();
for (String w : words)
{
    Integer freq = set.get(w); // null if the word has not been seen yet
    if (freq == null)
    {
        set.put(w, 1);
    }
    else
        set.put(w, freq + 1);
}
Now if you iterate over the map to check the keys for which the value is 1, you will have your distinct words.

You just have to iterate over the whole map and count the keys with freq == 1:
int unique = 0;
for(String word : set.keySet()) {
int freq = set.get(word);
if(freq == 1) {
unique++;
}
}
System.out.println(unique);
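The answers above can be pulled together into one runnable sketch. It assumes "distinct" can mean either of two things: every different word (the size of the map) or the words that occur exactly once (the values equal to 1). The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class DistinctWords {
    // Builds a word -> frequency map, starting each new word at 1.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) {
            Integer old = freq.get(w); // null when the word is not in the map yet
            freq.put(w, old == null ? 1 : old + 1);
        }
        return freq;
    }

    // Counts the words whose frequency is exactly 1.
    static int countOccurringOnce(Map<String, Integer> freq) {
        int once = 0;
        for (int f : freq.values()) {
            if (f == 1) {
                once++;
            }
        }
        return once;
    }

    public static void main(String[] args) {
        String[] words = {"i", "to", "go", "eat", "know", "i", "let", "let", "figure", "eat"};
        Map<String, Integer> freq = countWords(words);
        System.out.println("different words: " + freq.size());
        System.out.println("words seen once: " + countOccurringOnce(freq));
    }
}
```

For the sample array this reports 7 different words, 4 of which occur only once.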

Related

comparing Hashmaps by different String Keys

I have two HashMaps and want to compare them as fast as possible. The problem is that each key of mapA consists of two words joined by a space, while each key of mapB is a single word.
I don't want to count the occurrences, that is already done; I want to compare the two different kinds of keys.
mapA:
key: hello world, value: 10
key: earth hi, value: 20
mapB:
key: hello, value: 5
key: world, value: 15
key: earth, value: 25
key: hi, value: 35
The first key of mapA should find the keys "hello" and "world" in mapB.
What I am trying to do is parse a long text to find co-occurrences and compute a value for how often they occur relative to all words.
My first try:
for(String entry : mapA.keySet())
{
String key = (String) entry;
Integer mapAvalue = (Integer) mapA.get(entry);
Integer tokenVal1=0, tokenVal2=0;
String token1=key.substring(0, key.indexOf(" "));
String token2=key.substring(key.indexOf(" "),key.length()).trim();
for( String mapBentry : mapb.keySet())
{
String tokenkey = mapBentry;
if(tokenkey.equals(token1)){
tokenVal1=(Integer)tokens.get(tokenentry);
}
if(tokenkey.equals(token2)){
tokenVal2=(Integer)tokens.get(tokenentry);
}
if(token1!=null && token2!=null && tokenVal1>1000 && tokenVal2>1000 ){
procedurecall(mapAvalue, token1, token2, tokenVal1, tokenVal2);
}
}
}
You shouldn't iterate over a HashMap (O(n)) if you are just trying to find a particular key, that's what the HashMap lookup (O(1)) is used for. So eliminate your inner loop.
Also you can eliminate a few unnecessary variables in your code (e.g. key, tokenkey). You also don't need a third tokens map, you can put the token values in mapb.
for (String entry : mapA.keySet())
{
    Integer mapAvalue = (Integer) mapA.get(entry);
    String token1 = entry.substring(0, entry.indexOf(" "));
    String token2 = entry.substring(entry.indexOf(" "), entry.length()).trim();
    if (mapb.containsKey(token1) && mapb.containsKey(token2))
    {
        // look up the tokens:
        Integer tokenVal1 = (Integer) mapb.get(token1);
        Integer tokenVal2 = (Integer) mapb.get(token2);
        if (tokenVal1 > 1000 && tokenVal2 > 1000)
        {
            procedurecall(mapAvalue, token1, token2, tokenVal1, tokenVal2);
        }
    }
}
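A self-contained sketch of the same lookup, using the sample data from the question. The method name frequentPairs and the minCount parameter are hypothetical stand-ins for the asker's procedurecall and the 1000 threshold, and the method collects the matching keys instead of calling a procedure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PairLookup {
    // Returns the two-word keys whose words both occur in singles
    // with a count above minCount (hypothetical threshold parameter).
    static List<String> frequentPairs(Map<String, Integer> pairs,
                                      Map<String, Integer> singles,
                                      int minCount) {
        List<String> result = new ArrayList<>();
        for (String pair : pairs.keySet()) {
            int space = pair.indexOf(' ');
            if (space < 0) {
                continue; // not a two-word key, skip it
            }
            String first = pair.substring(0, space);
            String second = pair.substring(space + 1).trim();
            // get() is a direct hash lookup and returns null for a missing key
            Integer firstCount = singles.get(first);
            Integer secondCount = singles.get(second);
            if (firstCount != null && secondCount != null
                    && firstCount > minCount && secondCount > minCount) {
                result.add(pair);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> mapA = new HashMap<>();
        mapA.put("hello world", 10);
        mapA.put("earth hi", 20);
        Map<String, Integer> mapB = new HashMap<>();
        mapB.put("hello", 5);
        mapB.put("world", 15);
        mapB.put("earth", 25);
        mapB.put("hi", 35);
        System.out.println(frequentPairs(mapA, mapB, 10)); // prints [earth hi]
    }
}
```

Calling get once and checking for null saves the second lookup that a containsKey followed by get would do.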

Detecting duplicates in a file generated using the sliding window concept

I am working on a project where I have to parse a text file and divide the strings into substrings of a length that the user specifies. Then I need to detect the duplicates in the results.
So the original file would look like this:
ORIGIN
1 gatccaccca tctcggtctc ccaaagtgct aggattgcag gcctgagcca ccgcgcccag
61 ctgccttgtg cttttaatcc cagcactttc agaggccaag gcaggcgatc agctgaggtc
121 aggagttcaa gaccagcctg gccaacatgg tgaaacccca tctctaatac aaatacaaaa
181 aaaaaacaaa aaacgttagc caggaatgag gcccggtgct tgtaatccta aggaaggaga
241 ccaccactcc tcctgctgcc cttcccttcc ccacaccgct tccttagttt ataaaacagg
301 gaaaaaggga gaaagcaaaa agcttaaaaa aaaaaaaaaa cagaagtaag ataaatagct
I loop over the file and generate a line of the strings then use line.toCharArray() to slide over the resulting line and divide according to the user specification. So if the substrings are of length 4 the result would look like this:
GATC
ATCC
TCCA
CCAC
CACC
ACCC
CCCA
CCAT
CATC
ATCT
TCTC
CTCG
TCGG
CGGT
GGTC
GTCT
TCTC
CTCC
TCCC
CCCA
CCAA
Here is my code for splitting:
try {
scanner = new Scanner(toSplit);
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
char[] chars = line.toCharArray();
for (int i = 0; i < chars.length - (k - 1); i++) {
String s = "";
for(int j = i; j < i + k; j++) {
s += chars[j];
}
if (!s.contains("N")) {
System.out.println(s);
}
}
}
}
My question is: given that the input file can be huge, how can I detect duplicates in the results?
If you want to check for duplicates, a Set would be a good choice to hold and test the data. Please say in which context you want to detect the duplicates: words, lines or "output chars".
You can use a bloom filter or a table of hashes to detect possible duplicates and then make a second pass over the file to check if those "duplicate candidates" are true duplicates or not.
Example with hash tables:
// First we make a list of candidates so we count the times a hash is seen
int hashSpace = 65536;
int[] substringHashes = new int[hashSpace];
for (String s : tokens) {
    // hashCode() can be negative, so floorMod keeps the index in range
    substringHashes[Math.floorMod(s.hashCode(), hashSpace)]++; // inc
}
// Then we look for words that have a hash that seems to be repeated and actually see if they are repeated. We use a set but only of candidates so we save a lot of memory
Set<String> set = new HashSet<String>();
for (String s : tokens) {
    if (substringHashes[Math.floorMod(s.hashCode(), hashSpace)] > 1) {
        boolean repeated = !set.add(s);
        if (repeated) {
            // TODO whatever
        }
    }
}
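A runnable sketch of the two-pass idea, assuming the tokens fit in a List for the demo; with a huge file each pass would re-read the file instead. The class and method names are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TwoPassDuplicates {
    static Set<String> findDuplicates(List<String> tokens, int hashSpace) {
        // Pass 1: count how many tokens land in each hash bucket.
        int[] bucketCounts = new int[hashSpace];
        for (String s : tokens) {
            bucketCounts[Math.floorMod(s.hashCode(), hashSpace)]++;
        }
        // Pass 2: only tokens in buckets hit more than once can be duplicates,
        // so only those candidates are stored in the set.
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new HashSet<>();
        for (String s : tokens) {
            if (bucketCounts[Math.floorMod(s.hashCode(), hashSpace)] > 1 && !seen.add(s)) {
                duplicates.add(s);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("GATC", "ATCC", "TCTC", "CCCA", "TCTC", "CCCA");
        System.out.println(findDuplicates(tokens, 65536));
    }
}
```

The memory saving depends on hashSpace being large relative to the number of different tokens; with a tiny table nearly every token becomes a candidate.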
You could do something like this:
Map<String, Integer> substringMap = new HashMap<>();
int index = 0;
Set<String> duplicates = new HashSet<>();
For each substring you pull out of the file, add it to substringMap only if it's not a duplicate (or if it is a duplicate, add it to duplicates):
if (substringMap.putIfAbsent(substring, index) == null) {
++index;
} else {
duplicates.add(substring);
}
You can then pull out all the substrings with ease:
String[] substringArray = new String[substringMap.size()];
for (Map.Entry<String, Integer> substringEntry : substringMap.entrySet()) {
substringArray[substringEntry.getValue()] = substringEntry.getKey();
}
And voila! An array of output in the original order with no duplicates, plus a set of all the substrings that were duplicates, with very nice performance.
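Put together with the sliding window from the question, the putIfAbsent approach might look like the sketch below. The input is the first 24 bases of the sample file with spaces and line numbers already stripped, which is an assumption about the preprocessing; the names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class KmerDuplicates {
    // Slides a window of length k over sequence and returns the substrings seen more than once.
    static Set<String> duplicateKmers(String sequence, int k) {
        Map<String, Integer> firstIndex = new HashMap<>();
        Set<String> duplicates = new TreeSet<>(); // sorted only to make the printout stable
        int index = 0;
        for (int i = 0; i + k <= sequence.length(); i++) {
            String kmer = sequence.substring(i, i + k);
            // putIfAbsent returns null when the key was not present before this call
            if (firstIndex.putIfAbsent(kmer, index) == null) {
                index++;
            } else {
                duplicates.add(kmer);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(duplicateKmers("GATCCACCCATCTCGGTCTCCCAA", 4)); // prints [CCCA, TCTC]
    }
}
```

Using substring(i, i + k) also avoids building each window one character at a time.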

How can I use a string array as key in hash map?

I've made a String array out of a .txt file and now want to build a HashMap keyed by its contents. But I don't want the whole array as one key for one value; I want each word to be its own key in the HashMap.
private static String[] readAndConvertInputFile() {
String str = StdIn.readAll();
String conv = str.replaceAll("\'s", "").replaceAll("[;,?.:*/\\-_()\"\'\n]", " ").replaceAll(" {2,}", " ").toLowerCase();
return conv.split(" "); }
So the information in the string is like ("word", "thing", "etc.", "pp.", "thing").
My value should be the frequency of the word in the text. So for example key: "word" value: 1, key: "thing" value: 2 and so on... I'm clueless and would be grateful if someone could help me, at least with the key. :)
You can create a Map while using the String value at each array index as the key, and an Integer as the value to keep track of how many times a word appeared.
Map<String,Integer> occurences = new HashMap<String,Integer>();
Then when you want to increment, you can check if the Map already contains the key, if it does, increase it by 1, otherwise, set it to 1.
if (occurences.containsKey(word)) {
occurences.put(word, occurences.get(word) + 1);
} else {
occurences.put(word, 1);
}
So, while you are looping over your string array, convert the String to lower case (if you want to ignore case for word occurrences), and increment the map using the if statement above.
for (String word : words) {
word = word.toLowerCase(); // remove if you want case sensitivity
if (occurences.containsKey(word)) {
occurences.put(word, occurences.get(word) + 1);
} else {
occurences.put(word, 1);
}
}
A full example is shown below. I converted the words to lowercase to ignore case when using them as keys in the map; if you want to keep case, remove the line where I convert to lowercase.
public static void main(String[] args) {
String s = "This this the has dog cat fish the cat horse";
String[] words = s.split(" ");
Map<String, Integer> occurences = new HashMap<String, Integer>();
for (String word : words) {
word = word.toLowerCase(); // remove if you want case sensitivity
if (occurences.containsKey(word)) {
occurences.put(word, occurences.get(word) + 1);
} else {
occurences.put(word, 1);
}
}
for(Map.Entry<String,Integer> en : occurences.entrySet()){
System.out.println("Word \"" + en.getKey() + "\" appeared " + en.getValue() + " times.");
}
}
Which will give me output:
Word "cat" appeared 2 times.
Word "fish" appeared 1 times.
Word "horse" appeared 1 times.
Word "the" appeared 2 times.
Word "dog" appeared 1 times.
Word "this" appeared 2 times.
Word "has" appeared 1 times.
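Yes, you can use an array (regardless of element type) as a HashMap key.
No, you shouldn't do so. The behavior is unlikely to be what you want (in general): arrays inherit identity-based equals and hashCode from Object, so two arrays with the same contents count as different keys.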
In your particular case, I don't see why you even propose using an array as a key in the first place. You seem to want Strings drawn from among your array elements as keys.
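A small sketch of why array keys tend to surprise: a lookup with a second array holding the same contents misses, while a List key with the same contents is found. The class and method names are made up for the demo.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ArrayKeyDemo {
    // Looks up a fresh array with the same contents as the stored key.
    static boolean arrayKeyFound() {
        Map<String[], Integer> byArray = new HashMap<>();
        byArray.put(new String[] {"word", "thing"}, 1);
        // Arrays compare by identity, so this different array object is not found.
        return byArray.containsKey(new String[] {"word", "thing"});
    }

    // Same experiment with List keys, which compare by contents.
    static boolean listKeyFound() {
        Map<List<String>, Integer> byList = new HashMap<>();
        byList.put(Arrays.asList("word", "thing"), 1);
        return byList.containsKey(Arrays.asList("word", "thing"));
    }

    public static void main(String[] args) {
        System.out.println("array key found: " + arrayKeyFound()); // false
        System.out.println("list key found: " + listKeyFound());   // true
    }
}
```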
You could construct a word frequency table like so:
Map<String, Integer> computeFrequencies(String[] words) {
Map<String, Integer> frequencies = new HashMap<String, Integer>();
for (String word: words) {
Integer wordFrequency = frequencies.get(word);
frequencies.put(word,
(wordFrequency == null) ? 1 : (wordFrequency + 1));
}
return frequencies;
}
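In Java 8, using a stream (the merge function receives the two values that collide on the same key, so it has to add them):
String[] array = new String[]{"a","b","c","a"};
Map<String,Integer> map1 = Arrays.stream(array).collect(Collectors.toMap(x -> x, x -> 1, Integer::sum));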
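Two other standard-library ways to build the same frequency table, shown as a sketch: Map.merge in a plain loop, and Collectors.groupingBy with Collectors.counting, which yields Long counts. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class FrequencyVariants {
    // merge() stores 1 for a new key, otherwise combines the old count and 1 with Integer::sum.
    static Map<String, Integer> withMerge(String[] words) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) {
            freq.merge(w, 1, Integer::sum);
        }
        return freq;
    }

    // groupingBy groups equal words together and counting() counts each group.
    static Map<String, Long> withStream(String[] words) {
        return Arrays.stream(words)
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "c", "a", "a"};
        System.out.println(withMerge(words));
        System.out.println(withStream(words));
    }
}
```

Testing with a word that appears three times is worthwhile, because a merge function that ignores the accumulated count still looks right for words that appear only twice.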

How to get the two highest values in a HashMap<String, Integer>

I have a HashMap made of two types: String as Keys, which are Player names and Integer as values, which are scores.
I am creating a ranking system, and for that, I need to get the two highest values from this HashMap, so the top two players that were put into it.
For example:
If these were the values of my HashMap:
Key Value
String1 1
String2 2
String3 3
String4 4
String5 5
I would want a way to only return String4 and String5.
I thought that getting the entrySet would be enough, but there's no get method in a Set, so I can't get value 0 (highest) and value 1 (second highest).
I also tried using
Collections.max(map);
And it wouldn't accept a HashMap as an argument
And
final String[] top = new String[1];
int topInt = 0;
map.forEach((s, i) -> {if (i > topInt) top[0] = s;});
But that didn't quite work either, it was way too slow for my performance.
How do I obtain both highest values?
Try something like this. I haven't tested it, let me know if I overlooked something.
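int highest = Integer.MIN_VALUE;
String highestString = null;
int secondHighest = Integer.MIN_VALUE;
String secondHighestString = null;
for (Map.Entry<String, Integer> pair : yourHashMap.entrySet())
{
    if (highest < pair.getValue())
    {
        // the old highest becomes the second highest
        secondHighest = highest;
        secondHighestString = highestString;
        highest = pair.getValue();
        highestString = pair.getKey();
    }
    else if (secondHighest < pair.getValue())
    {
        // falls between the current top two: replaces only the second highest
        secondHighest = pair.getValue();
        secondHighestString = pair.getKey();
    }
}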
Also, as others have stated in the comments, this is probably not the best approach, as several keys can share the same value. If indexing is what you are after, use an array, an ArrayList, or another collection from the java.util package.
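If a sorted view is acceptable, one alternative sketch sorts the entries by value in descending order and keeps the first n keys. This is a different technique from the single loop above and costs a full sort; the names are illustrative, and the order among keys with equal scores is unspecified.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TopTwo {
    // Returns the keys with the n highest values, highest first.
    static List<String> topKeys(Map<String, Integer> scores, int n) {
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("String1", 1);
        scores.put("String2", 2);
        scores.put("String3", 3);
        scores.put("String4", 4);
        scores.put("String5", 5);
        System.out.println(topKeys(scores, 2)); // prints [String5, String4]
    }
}
```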

Java - dynamically pad left with printf

I'm learning Java and have spent way too much time on this stupid little problem. I'm trying to dynamically pad the left side of my string outputs with spaces, so all values displayed will be padded left. The problem is, I don't know the length of the values until a user enters them.
Here's an example of what I'm trying to do. nLongestString is the length of the longest string I'm displaying, and strValue is the value of the string itself. This doesn't work dynamically at all. If I hardcode a value for nLongestString it works, but I can't do that since I don't always know how long the strings will be.
System.out.printf("%"+nLongestString+"s", strValue + ": ");
Output should look like:
thisisalongstring:
       longstring:
            short:
I'm not seeing your problem, the following works fine for me. (Java 7)
Edit: Have you checked the value of nLongestString? I'm guessing it doesn't get set to what you think it does.
String[] arr = { "foo", "bar", "foobar" };
int max = 0;
for( String s : arr ) {
if( s.length() > max ) {
max = s.length();
}
}
for( String s : arr ) {
System.out.printf( ">%" + max + "s<%n", s );
}
Random random = new Random( System.currentTimeMillis() );
// just to settle the question of whether it works when
// Java can't know ahead of time what the value will be
max = random.nextInt( 10 ) + 6;
for( String s : arr ) {
System.out.printf( ">%" + max + "s<%n", s );
}
Output:
>   foo<
>   bar<
>foobar<
// the following varies, of course (shown here for max = 9)
>      foo<
>      bar<
>   foobar<
If you already have your data, you just need to find the maximum length of your words and then print them. Here is a code sample:
// lets say you have your data in List of strings
List<String> words = new ArrayList<>();
words.add("thisisalongstring");
words.add("longstring");
words.add("short");
// lets find max length
int nLongestString = -1;
for (String s : words)
if (s.length() > nLongestString)
nLongestString = s.length();
String format = "%"+nLongestString+"s:\n";// notice that I added `:` in format so
// you don't have to concatenate it in
// printf argument
//now lets print your data
for (String s:words)
System.out.printf(format,s);
Output:
thisisalongstring:
       longstring:
            short:
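The same idea can be wrapped in small helpers that return the padded string instead of printing it, which makes the width calculation easy to check. This is only a sketch: longest and padLeft are hypothetical names, and the guard for widths below 1 is there because a format width has to be positive.

```java
class PadLeft {
    // Length of the longest value, or 0 for an empty array.
    static int longest(String[] values) {
        int max = 0;
        for (String v : values) {
            max = Math.max(max, v.length());
        }
        return max;
    }

    // Right-aligns value in a field of the given width by padding with spaces on the left.
    static String padLeft(String value, int width) {
        if (width < 1) {
            return value; // "%0s" is not a valid format, so skip padding
        }
        return String.format("%" + width + "s", value);
    }

    public static void main(String[] args) {
        String[] values = {"thisisalongstring", "longstring", "short"};
        int width = longest(values);
        for (String v : values) {
            System.out.println(padLeft(v, width) + ":");
        }
    }
}
```

If a suffix such as ": " is concatenated to the value before formatting, as in the question, the width has to include the length of that suffix too.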
