Scenario: You have been supplied with an ASCII text file containing one day’s worth of catch records.
Each line in the file contains one "colon separated" catch record with three fields:
CONTESTANTS_NAME:FISH_TYPE:FISH_WEIGHT
for example
PETER:TUNNY:13.3
which indicates that a competitor called PETER caught a TUNNY weighing 13.3 kg. Note
that PETER may have caught more than one fish on the day.
How would you solve this problem using Java's built-in classes Tokenizer and HashMap?
Your design should provide the following analysis:
The total weight of each type of fish caught on the day.
The total weight of fish caught by each competitor.
The top three competitors ranked by total catch weight.
The reason I'm posting this is that at first glance I panicked: a map holds only key-value pairs, and I had no idea how to handle records with three fields. What I did was use two HashMaps. The first is keyed by CONTESTANTS_NAME and the second by FISH_TYPE, and with them I was able to provide the required analysis. This required a number of loops, though, and I'm not sure that's a good way of programming. If somebody has a better approach, please let me know. I just need the logic.
You may want to check out a table class such as Guava's Table (think of it as a two-dimensional map). You can then use the contestant's name as the first key, the fish type as the second, and the weight as the stored value.
Guava's Table is also designed to cope with sparse tables, so I strongly suggest you give it a try.
You can do a get/update/put combination on each HashMap:
Double contestantTotal = contestantMap.get(contestant);
if (contestantTotal == null) contestantTotal = 0.0; // get() returns null if the key is not in the map yet
contestantTotal += weight;
contestantMap.put(contestant, contestantTotal); // put() overwrites the previous value
Double fishTypeTotal = fishTypeMap.get(fishType);
if (fishTypeTotal == null) fishTypeTotal = 0.0;
fishTypeTotal += weight;
fishTypeMap.put(fishType, fishTypeTotal);
This requires just three loops: one input loop and two output loops.
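Putting the whole analysis together might look like the sketch below. It assumes the "Tokenizer" in the assignment means java.util.StringTokenizer and that every line is well formed. The class and method names (CatchAnalysis, addRecord, topContestants) are made up for illustration. It uses Map.merge (Java 8+) as a shorthand for the get/update/put combination.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical helper class; names are illustrative, not from the assignment.
class CatchAnalysis {
    final Map<String, Double> byContestant = new HashMap<>();
    final Map<String, Double> byFishType = new HashMap<>();

    // Parse one "NAME:FISH:WEIGHT" record and add its weight to both totals.
    void addRecord(String line) {
        StringTokenizer st = new StringTokenizer(line, ":");
        String contestant = st.nextToken().trim();
        String fishType = st.nextToken().trim();
        double weight = Double.parseDouble(st.nextToken().trim());
        // merge() stores the weight if the key is new, otherwise adds it to the old total.
        byContestant.merge(contestant, weight, Double::sum);
        byFishType.merge(fishType, weight, Double::sum);
    }

    // Names of the n contestants with the highest total weight, heaviest first.
    List<String> topContestants(int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(byContestant.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Double> e : entries.subList(0, Math.min(n, entries.size()))) {
            names.add(e.getKey());
        }
        return names;
    }

    public static void main(String[] args) {
        CatchAnalysis analysis = new CatchAnalysis();
        String[] sample = {"PETER:TUNNY:13.3", "MARY:COD:20.0", "PETER:COD:4.2", "JOHN:TUNNY:9.5"};
        for (String line : sample) {
            analysis.addRecord(line);
        }
        System.out.println("By fish type:  " + analysis.byFishType);
        System.out.println("By contestant: " + analysis.byContestant);
        System.out.println("Top three:     " + analysis.topContestants(3));
    }
}
```

In a real program the input loop would read the lines from the file instead of an array. The ranking only needs the contestant map's entries sorted by value, so no third map is required.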
I'm new to Java and, as a learning project, I would like to program a little vocabulary application so that the user can test himself but also search for entries. However, I'm struggling to find the right data structure for this, and even after spending the last few days googling for it, I'm still at a loss.
Here is what I have in mind for my vocabulary object:
import java.io.*;
class Vocab implements Serializable {
String lang1;
String lang2;
int rightAnswersInARow; // to influence what to ask during testing
int numberOfTimesSearched; // to influence search suggestions
// ... plus the appropriate setter and getter methods.
}
Now for the testing, at first glance an ArrayList seems to be the most appropriate (choosing a random number and then selecting that object to test). But what if I would also like to factor in rightAnswersInARow and ask vocabulary entries with a low number more often? My approach would be to count the number of objects for each value, give each value an interval (e.g. the interval for rightAnswersInARow = 0 would be inflated by a factor of 3), and then randomly select from there.
But even if I go through the ArrayList each time, get the rightAnswersInARow and determine the intervals...how would I then map the calculated number to the right index since the elements are not sorted? So would a TreeSet be more appropriate?
To search for entries in both languages and maybe even adding a dropdown-list with suggested words (like in Google's search) would require that I find the strings quickly (HashMap?). Or maybe go through 2+ (one for each language) TreeSets to reach the first element that starts with those letters, then selecting the next few elements from there? But that would mean the search would always suggest the same words, ignoring which words were searched for the most.
What would you suggest? Have a HashMap with each value pair and manually implement something like a relational database?
Thank you in advance! :)
To save time on calculations, I am making a program that will use a formula to calculate a value based on the data that the user inputs. The program will prompt the user for five double values: A, B, C, D, and E. It will then multiply A by B and find the corresponding value in a conversion table. It will do the same for C and D, and then plug the corresponding values along with E into a formula to give the user the answer. My question is: how would I include the table of values I mentioned above in my program so that I can easily find the corresponding values? I'm thinking of hardcoding these values into HashMaps, but that would take quite a while. Is there a file format that stores this kind of data and would suit the situation?
Store the values in CSV. Load the values into a custom object/class with a field for each column. Start by looping over the entire set of objects to find the correct value/range each time. If that does not perform well, optimize by doing things like keeping multiple lists of references to the objects, where each list is sorted by a different column, and use those sorted lists to quickly find the correct object.
I say "range" here, because I am assuming you are sometimes looking for doubles. If the result of your calculation tells you to look for 1.999999 you may actually have to look for that +/- some tolerance. For this same reason you wouldn't want to use doubles as the keys for a map.
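As a rough sketch of that first, unoptimized step: the code below assumes a two-column CSV ("input,output") with no header row, which is a guess at the table's shape. ConversionTable, Row, fromCsvLines and lookup are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-column conversion table loaded from CSV text.
class ConversionTable {
    // One row of the table: the value to look up and the value it converts to.
    static final class Row {
        final double input;
        final double output;

        Row(double input, double output) {
            this.input = input;
            this.output = output;
        }
    }

    private final List<Row> rows = new ArrayList<>();

    // Build the table from CSV lines of the form "input,output"; blank lines are skipped.
    static ConversionTable fromCsvLines(List<String> lines) {
        ConversionTable table = new ConversionTable();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            String[] cols = line.split(",");
            table.rows.add(new Row(Double.parseDouble(cols[0].trim()),
                                   Double.parseDouble(cols[1].trim())));
        }
        return table;
    }

    // Linear scan for the first row whose input is within +/- tolerance of key.
    // Returns null when no row matches.
    Double lookup(double key, double tolerance) {
        for (Row row : rows) {
            if (Math.abs(row.input - key) <= tolerance) {
                return row.output;
            }
        }
        return null;
    }
}
```

The lines could come from something like Files.readAllLines on the CSV file. With a table of a few hundred rows the linear scan is unlikely to matter; sorting the rows and using binary search is the next step if it does.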
I have an assignment that I am working on, and I can't get a hold of the professor to get clarity on something. The idea is that we are writing an anagram solver, using a given set of words, that we store in 3 different dictionary classes: Linear, Binary, and Hash.
So we read in the words from a textfile, and for the first 2 dictionary objects(linear and binary), we store the words as an ArrayList...easy enough.
But for the HashDictionary, he wants us to store the words in a Hashtable. I'm just not sure what the values are going to be for the Hashtable, or why you would do that. The instructions say we store the words in a Hashtable for quick retrieval, but I just don't get what the point of that is. It makes sense to store words in an ArrayList, but I'm not sure how key/value pairing helps with a dictionary.
Maybe I'm not giving enough details, but I figured maybe someone would have seen something like this and it's obvious to them.
Each of our classes has a contains method, that returns a boolean representing whether or not a word passed in is in the dictionary, so the linear does a linear search of the arraylist, the binary does a binary search of the arraylist, and I'm not sure about the hash....
The difference is speed. Both methods work, but the hash table is fast.
When you use an ArrayList, or any sort of List, to find an element, you must inspect each list item, one by one, until you find the desired word. If the word isn't there, you've looped through the entire list.
When you use a HashTable, you perform some "magic" on the word you are looking up known as calculating the word's hash. Using that hash value, instead of looping through a list of values, you can immediately deduce where to find your word - or, if your word doesn't exist in the hash, that your word isn't there.
I've oversimplified here, but that's the general idea. You can find another question here with a variety of explanations on how a hash table works.
Here is a small code snippet utilizing a HashMap.
// We will map our words to their definitions; word is the key, definition is the value
Map<String, String> dictionary = new HashMap<String, String>();
dictionary.put("hello", "A common salutation");
dictionary.put("chicken", "A delightful vessel for protein");
// Later ...
dictionary.get("chicken"); // Returns "A delightful vessel for protein"
The problem you describe asks that you use a HashMap as the basis for a dictionary that fulfills three requirements:
Adding a word to the dictionary
Removing a word from the dictionary
Checking if a word is in the dictionary
It seems counter-intuitive to use a map, which stores a key and a value, since all you really want to do is store just a key (or just a value). However, as I described above, a HashMap makes it extremely quick to find the value associated with a key. Similarly, it makes it extremely quick to see if the HashMap knows about a key at all. We can leverage this quality by storing each of the dictionary words as a key in the HashMap, and associating it with a garbage value (since we don't care about it), such as null.
You can see how to fulfill the three requirements, as follows.
Map<String, Object> map = new HashMap<String, Object>();
// Add a word (String literals need double quotes in Java)
map.put("word", null);
// Remove a word
map.remove("word");
// Check for the presence of a word
map.containsKey("word");
I don't want to overload you with information, but the requirements we have here align with a data structure known as a Set. In Java, a commonly used Set is the HashSet, which is almost exactly what you are implementing with this bit of your homework assignment. (In fact, if this weren't a homework assignment explicitly instructing you to use a HashMap, I'd recommend you instead use a HashSet.)
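For comparison, the same three operations on a HashSet could look like this. HashDictionary and its method names are only a guess at what the assignment's class looks like.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical dictionary class backed by a HashSet instead of a HashMap.
class HashDictionary {
    private final Set<String> words = new HashSet<>();

    // Add a word; adding the same word twice has no further effect.
    void add(String word) {
        words.add(word);
    }

    // Remove a word if it is present.
    void remove(String word) {
        words.remove(word);
    }

    // Check for the presence of a word using its hash, without scanning every entry.
    boolean contains(String word) {
        return words.contains(word);
    }
}
```

There is no dummy value to carry around, which is the main thing the set buys you over the map.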
Arrays are hard to find things in. If I gave you array[0] = "cat"; array[1] = "dog"; array[2] = "pikachu";, you'd have to check each element just to know whether jigglypuff is a word. If I gave you hash["cat"] = 1; hash["dog"] = 1; hash["pikachu"] = 1;, the check is instant: you just look the word up directly. The value 1 doesn't matter in this particular case, although you can put useful information there, such as how many times you've looked up a word, or 1 could mean a real word and 2 the name of a Pokemon, or for a real dictionary it could hold a sentence-long definition. That is less relevant here.
It sounds like you don't really understand hash tables then. Even Wikipedia has a good explanation of this data structure.
Your hash table is just going to be a large array of strings (initially all empty). You compute a hash value using the characters in your word, and then insert the word at that position in the table.
There are issues when the hash value for two words is the same. And there are a few solutions. One is to store a list at each array position and just shove the word onto that list. Another is to step through the table by a known amount until you find a free position. Another is to compute a secondary hash using a different algorithm.
The point of this is that hash lookup is fast. It's very quick to compute a hash value, and then all you have to do is check that the word at that array position exists (and matches the search word). You follow the same rules for hash value collisions (in this case, mismatches) that you used for the insertion.
You want your table size to be a prime number that is larger than the number of elements you intend to store. You also need a hash function that diverges quickly so that your data is more likely to be dispersed widely through your hash table (rather than being clustered heavily in one region).
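To make that concrete, here is a minimal sketch of such a table using the "step through the table" collision strategy (linear probing with a step of 1). The hash function and the name ProbingWordTable are illustrative choices, not a recommendation. It does not support removal, which would need extra bookkeeping.

```java
// Hypothetical fixed-size hash table of words using linear probing.
class ProbingWordTable {
    private final String[] slots;

    // capacity should exceed the number of words to store (a prime is a common choice).
    ProbingWordTable(int capacity) {
        slots = new String[capacity];
    }

    // Simple polynomial hash over the characters, reduced to a valid slot index.
    private int indexFor(String word) {
        int h = 0;
        for (int i = 0; i < word.length(); i++) {
            h = 31 * h + word.charAt(i);
        }
        return Math.floorMod(h, slots.length); // floorMod keeps the index non-negative
    }

    // Insert a word; returns false only if the table is full.
    boolean add(String word) {
        int start = indexFor(word);
        for (int step = 0; step < slots.length; step++) {
            int i = (start + step) % slots.length;
            if (slots[i] == null) {
                slots[i] = word;
                return true;
            }
            if (slots[i].equals(word)) {
                return true; // already stored
            }
        }
        return false;
    }

    // Look a word up by following the same probe sequence used for insertion.
    boolean contains(String word) {
        int start = indexFor(word);
        for (int step = 0; step < slots.length; step++) {
            int i = (start + step) % slots.length;
            if (slots[i] == null) {
                return false; // an empty slot ends the probe sequence
            }
            if (slots[i].equals(word)) {
                return true;
            }
        }
        return false;
    }
}
```

When the table is mostly empty, a lookup usually touches only one or two slots, which is where the speed advantage over scanning a list comes from.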
Hope this is a help and points you in the right direction.
All,
I am wondering what's the most efficient way to check if a row already exists in a List<Set<Foo>>. A Foo object has a key/value pair (as well as other fields which aren't applicable to this question). Each Set in the List is unique.
As an example:
List[
Set<Foo>[Foo_Key:A, Foo_Value:1][Foo_Key:B, Foo_Value:3][Foo_Key:C, Foo_Value:4]
Set<Foo>[Foo_Key:A, Foo_Value:1][Foo_Key:B, Foo_Value:2][Foo_Key:C, Foo_Value:4]
Set<Foo>[Foo_Key:A, Foo_Value:1][Foo_Key:B, Foo_Value:3][Foo_Key:C, Foo_Value:3]
]
I want to be able to check if a new Set (Ex: Set[Foo_Key:A, Foo_Value:1][Foo_Key:B, Foo_Value:3][Foo_Key:C, Foo_Value:4]) exists in the List.
Each Set could contain anywhere from 1-20 Foo objects. The List can contain anywhere from 1-100,000 Sets. Foo's are not guaranteed to be in the same order in each Set (so they will have to be pre-sorted for the correct order somehow, like a TreeSet)
Idea 1: Would it make more sense to turn this into a matrix? Where each column would be the Foo_Key and each row would contain a Foo_Value?
Ex:
A B C
-----
1 3 4
1 2 4
1 3 3
And then look for a row containing the new values?
Idea 2: Would it make more sense to create a hash of each Set and then compare it to the hash of a new Set?
Is there a more efficient way I'm not thinking of?
Thanks
If you use TreeSets for your Sets can't you just do list.contains(set) since a TreeSet will handle the equals check?
Also, consider using Guava's Multiset class.
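If scanning the List turns out to be too slow, one option along the lines of Idea 2 is to keep the rows in a HashSet of Sets. This is a sketch under the assumption that Foo's equals and hashCode depend only on the key and value. RowIndex and the Foo shown here are hypothetical. Because Set.equals and Set.hashCode ignore element order, the Foos do not need to be pre-sorted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical index of rows; each row is a Set of Foo.
class RowIndex {
    // Stand-in for the real Foo: equality is based on key and value only.
    static final class Foo {
        final String key;
        final int value;

        Foo(String key, int value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Foo)) {
                return false;
            }
            Foo other = (Foo) o;
            return key.equals(other.key) && value == other.value;
        }

        @Override
        public int hashCode() {
            return Objects.hash(key, value);
        }
    }

    private final Set<Set<Foo>> rows = new HashSet<>();

    // Store a copy of the row; returns false if an equal row already exists.
    boolean addRow(Set<Foo> row) {
        // Copy so that later changes to the caller's set cannot corrupt the index.
        return rows.add(new HashSet<>(row));
    }

    // Hash-based membership check; no scan over all stored rows.
    boolean containsRow(Set<Foo> row) {
        return rows.contains(row);
    }

    // Convenience factory for building a row.
    static Set<Foo> row(Foo... foos) {
        return new HashSet<>(Arrays.asList(foos));
    }
}
```

The stored rows must not be mutated after insertion, which is why addRow copies them.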
I would recommend you use a less weird data structure. As for finding stuff: Generally Hashes or Sorting + Binary Searching or Trees are the ways to go, depending on how much insertion/deletion you expect. Read a book on basic data structures and algorithms instead of trying to re-invent the wheel.
Lastly: if this is not a purely academic question, loop through the list and do the comparison. Most likely, that is acceptably fast. Even 100,000 entries will take a fraction of a second, and will therefore not matter in 99% of all use cases.
I like to quote Knuth: Premature optimisation is the root of all evil.
In a game, I'm trying to keep a list of users and have it sorted by score, so that I could query the list at any given time and return (for example) the top ten users by score. This list should be thread-safe. I envision using the userName string as a key and the value would be a User object which implements Comparable and has properties such as displayName and score. The User object would therefore have a compareTo method which would compare the score attribute to determine its position.
I'm looking at using a ConcurrentSkipListMap for this, but as best I can tell, the Map (as opposed to the Set) uses the key to sort. I'd like to have the list sorted by the score property of the User object, but still use a Map because I need to be able to access any given user and modify their score attribute from a thread.
It doesn't seem that using my own Comparator for the key would solve my problem, as I doubt I'd have access to the associated value for comparison. I could use a ConcurrentSkipListSet but accessing the list to modify an individual user's score would be (I would imagine) an expensive operation (due to the need to iterate every time).
Would anyone be able to suggest how to accomplish this?
No, I don't think you can. The comparator used for ordering is the same one used for indexing. You will probably have to maintain two collections: one for keeping the ordering of the users' scores and the other for referring to the users by name.
get(key) depends on the comparator (to be able to locate the key). You propose a comparator that would depend on get(key) (to access the mapped value of a key and compare based on that). That necessarily leads to infinite recursion and stack overflow (on the bright side, you are posting at the right website!!)
Michael is right, you can't have your cake and eat it too ;)
I think you have 3 choices:
1. Use a Map so that updates to a user's score are quick, and you pay the price when sorting to find the highest scores.
2. Use a SortedSet that sorts by score so that finding the highest scores is fast, but you must pay the price when updating user's scores.
3. Maintain two data structures, so that you can have the best of 1 and 2. For example, you have your real data in a set sorted by score, but then also maintain a mapping of username to index into the set or similar. That way you always have the sorted scores, and updating a user's score is just a lookup, not a search. The price you pay for this is now you are maintaining some duplicate information in two places, and especially considering concurrent access, it can be tricky ensuring both places are always updated in synch.
I would not make assumptions about which is faster between 1 & 2. I would try them both out with your expected usage and measure to see which performs better.
If you are really only interested in the top n scores, then there is the possibility to just maintain that list separately. So have your map of username to score for everyone, but also maintain a small set of the top scores (and their users). Every time you add/update someone's score, just check the score against the top score list, and if it's bigger than the smallest one there, just add it and bump off the lower one. This is similar to suggestion 3 above, but is less overhead and perhaps easier to maintain.
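A minimal sketch of the two-structure approach is shown below. It keeps a map from username to an immutable entry plus a TreeSet ordered by score, and makes every operation synchronized, which is the simplest (not the most scalable) way to keep the two in step. Leaderboard, Entry, setScore and top are made-up names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical leaderboard backed by a map (lookup by name) and a sorted set (order by score).
class Leaderboard {
    // Immutable, so an entry's position in the sorted set can never silently go stale.
    static final class Entry {
        final String userName;
        final int score;

        Entry(String userName, int score) {
            this.userName = userName;
            this.score = score;
        }
    }

    // Highest score first; ties are broken by name so distinct users never compare as equal.
    private static final Comparator<Entry> BY_SCORE_DESC =
            Comparator.comparingInt((Entry e) -> e.score)
                      .reversed()
                      .thenComparing((Entry e) -> e.userName);

    private final Map<String, Entry> byName = new HashMap<>();
    private final TreeSet<Entry> byScore = new TreeSet<>(BY_SCORE_DESC);

    // Set a user's score, updating both structures under one lock.
    synchronized void setScore(String userName, int score) {
        Entry old = byName.get(userName);
        if (old != null) {
            byScore.remove(old); // remove under the old score before re-inserting
        }
        Entry updated = new Entry(userName, score);
        byName.put(userName, updated);
        byScore.add(updated);
    }

    // User names of the n highest scores, best first.
    synchronized List<String> top(int n) {
        List<String> names = new ArrayList<>();
        for (Entry e : byScore) {
            if (names.size() == n) {
                break;
            }
            names.add(e.userName);
        }
        return names;
    }
}
```

Updating a score costs a map lookup plus a remove and an add on the tree, and reading the top ten only walks the first ten entries. Whether the single lock is acceptable depends on how contended the leaderboard is, so it is worth measuring.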