Need help to decide about datastructure of my project - java

I try to index lots of words and group them in different files. I mean for each alphabet character I consider a file, like a.txt,b.txt,...
My idea is this structure but I am looking for better strcucture because for each hashmap's key (word) has one file and no need to have other hashmap.
HashMap<String, HashMap<String,ArrayList<Posting>>>
for example
HashMap<"book", HashMap<"b.txt",ArrayList<Posting>>>
HashMap<"baby", HashMap<"b.txt",ArrayList<Posting>>>

Why do you need a nested hashmap? You can always conclude the file name from the word itself. Alternatively - wrap the word and file name in the object that will be used as a key in single level hashmap.

Related

How to store over 300 words with definitions in a Android studio

I'm trying to create an android app where you can learn hard words its over 300 words. I'm wondering how I should store the data in java.
I have a text file where all the words are. I have split the text so I have one array with the words and another Array with the definitions, they have the same index. In an activity, I want to make it as clean as possible, because sometimes I need to delete an index and It's not efficient to that with an ArrayList since they all need to move down.
PS. I really don't wanna use a database like Firebase.
Instead of using two different arrays and trying to ensure that their order/indices are matched, you should consider defining your own class.
class Word {
String wordName;
String wordDefinition;
}
You can then make a collection of this using ArrayList or similar.
ArrayList<Word> wordList;
I know you were concerned about using an ArrayList due to the large number of words, however I think for your use case the ArrayList is fine. Using a database is probably overkill, unless if you want to put in the whole dictionary ;)
In any case, it is better to define your own class and use this as a "wildcard" to collection types which accept these. This link may give you some ideas of other feasible data types.
https://developer.android.com/reference/java/util/Collections
I personally would use a HashMap.
The reason for this is because you can set the key to be the word and the value to be the definition of the word. And then you can grab the definition of the word by doing something like
// Returns the definition of word or null if the word isn't in the hashmap
hashMapOfWords.getOrDefault(word, null);
Check out this link for more details on a HashMap
https://developer.android.com/reference/java/util/HashMap

Index of words found in document - Java

I am trying to write a program that takes in a text file as input, retrieves the words, and outputs each word with each line number that they are located in. I'm having a lot of trouble with this project, although I've made some progress...
So far I have an ArrayList which holds all of the words found in the document, without punctuation marks. I am able to output this list and see all the words in the text file, but I do not know where to go from here... any ideas?
example:
myList = [A, ACTUALLY, ALMOST,....]
I need to somehow be able to associate each word with which line they came from, so I can populate a data structure that will hold each word with their associated line number(s).
I am a programming novice so I am not very familiar with all the types of data structures and algorithms out there... my instructor suggested I use a dynamic multilinked list but I don't know how I would implement that verses ArrayLists and arrays.
Any ideas would be greatly appreciated. Thanks!
You should use a hash table. A hash table is a key/value pair. The key can be every word in the text file, the value, an array list containing the line numbers.
Basically, loop through every word in the text file. If that word is not in your list of words, add it as the key and the line number as the value in a list into the hash table. If that word is already in the table, append the line number to the array list.
Java has good docs on a hash table here
for you to get the methods you need.

Storing a dictionary in a hashtable

I have an assignment that I am working on, and I can't get a hold of the professor to get clarity on something. The idea is that we are writing an anagram solver, using a given set of words, that we store in 3 different dictionary classes: Linear, Binary, and Hash.
So we read in the words from a textfile, and for the first 2 dictionary objects(linear and binary), we store the words as an ArrayList...easy enough.
But for the HashDictionary, he want's us to store the words in a HashTable. I'm just not sure what the values are going to be for the HashTable, or why you would do that. The instructions say we store the words in a Hashtable for quick retrieval, but I just don't get what the point of that is. Makes sense to store words in an arraylist, but I'm just not sure of how key/value pairing helps with a dictionary.
Maybe i'm not giving enough details, but I figured maybe someone would have seen something like this and its obvious to them.
Each of our classes has a contains method, that returns a boolean representing whether or not a word passed in is in the dictionary, so the linear does a linear search of the arraylist, the binary does a binary search of the arraylist, and I'm not sure about the hash....
The difference is speed. Both methods work, but the hash table is fast.
When you use an ArrayList, or any sort of List, to find an element, you must inspect each list item, one by one, until you find the desired word. If the word isn't there, you've looped through the entire list.
When you use a HashTable, you perform some "magic" on the word you are looking up known as calculating the word's hash. Using that hash value, instead of looping through a list of values, you can immediately deduce where to find your word - or, if your word doesn't exist in the hash, that your word isn't there.
I've oversimplified here, but that's the general idea. You can find another question here with a variety of explanations on how a hash table works.
Here is a small code snippet utilizing a HashMap.
// We will map our words to their definitions; word is the key, definition is the value
Map<String, String> dictionary = new HashMap<String, String>();
map.put("hello","A common salutation");
map.put("chicken","A delightful vessel for protein");
// Later ...
map.get("chicken"); // Returns "A delightful vessel for protein";
The problem you describe asks that you use a HashMap as the basis for a dictionary that fulfills three requirements:
Adding a word to the dictionary
Removing a word from the dictionary
Checking if a word is in the dictionary
It seems counter-intuitive to use a map, which stores a key and a value, since all you really want to is store just a key (or just a value). However, as I described above, a HashMap makes it extremely quick to find the value associated with a key. Similarly, it makes it extremely quick to see if the HashMap knows about a key at all. We can leverage this quality by storing each of the dictionary words as a key in the HashMap, and associating it with a garbage value (since we don't care about it), such as null.
You can see how to fulfill the three requirements, as follows.
Map<String, Object> map = new HashMap<String, Object>();
// Add a word
map.put('word', null);
// Remove a word
map.remove('word');
// Check for the presence of a word
map.containsKey('word');
I don't want to overload you with information, but the requirements we have here align with a data structure known as a Set. In Java, a commonly used Set is the HashSet, which is almost exactly what you are implementing with this bit of your homework assignment. (In fact, if this weren't a homework assignment explicitly instructing you to use a HashMap, I'd recommend you instead use a HashSet.)
Arrays are hard to find stuff in. If I gave you array[0] = "cat"; array[1] = "dog"; array[2] = "pikachu";, you'd have to check each element just to know if jigglypuff is a word. If I gave you hash["cat"] = 1; hash["dog"] = 1; hash["pikachu"] = 1;", instant to do this in, you just look it up directly. The value 1 doesn't matter in this particular case although you can put useful information there, such as how many times youv'e looked up a word, or maybe 1 will mean real word and 2 will mean name of a Pokemon, or for a real dictionary it could contain a sentence-long definition. Less relevant.
It sounds like you don't really understand hash tables then. Even Wikipedia has a good explanation of this data structure.
Your hash table is just going to be a large array of strings (initially all empty). You compute a hash value using the characters in your word, and then insert the word at that position in the table.
There are issues when the hash value for two words is the same. And there are a few solutions. One is to store a list at each array position and just shove the word onto that list. Another is to step through the table by a known amount until you find a free position. Another is to compute a secondary hash using a different algorithm.
The point of this is that hash lookup is fast. It's very quick to compute a hash value, and then all you have to do is check that the word at that array position exists (and matches the search word). You follow the same rules for hash value collisions (in this case, mismatches) that you used for the insertion.
You want your table size to be a prime number that is larger than the number of elements you intend to store. You also need a hash function that diverges quickly so that your data is more likely to be dispersed widely through your hash table (rather than being clustered heavily in one region).
Hope this is a help and points you in the right direction.

Help need in creating a hashset from a hashmap

I've been able to read a four column text file into a hashmap and get it to write to a output file. However, I need to get the second column(distinct values) into a hashset and write to the output file. I've been able to create the hashset, but it is grabbing everything and not sorting. By the way I'm new, so please take this into consideration when you answer. Thanks
Neither HashSet nor HashMap are meant to sort. They're fundamentally unsorted data structures. You should use an implementation of SortedSet, such as TreeSet.
Some guesses, related to mr Skeets answer and your apparent confusion...
Are you sure you are not inserting the whole line in the TreeSet? If you are going to use ONLY the second column, you will need to split() the strings (representing the lines) into columns - that's nothing that's done automatically.
Also, If you are actually trying to sort the whole file using the second column as key, You will need a TreeMap instead, and use the 2:nd column as key, and the whole line as data. But that won't solve the splitting, it only to keep the relation between the line and the key.
Edit: Here is some terminology for you, you might need it.
You have a Set. It's a collection of other objects - like String. You add other objects to it, and then you can fetch all objects in it by iterating through the set. Adding is done through the method add()and iterating can be done using the enhanced for loop syntax or using the iterator() method.
The set doesn't "grab" or "take" stuff; You add something to the set - in this case a String - Not an array of Strings which is written as String[]
(Its apparently possible to add array to a TreeSet (they are objects too) , but the order is not related to the contents of the String. Maybe thats what you are doing.)
String key = splittedLine[1]; // 2:nd element
"The second element of the keys" doesn't make sense at all. And what's the duplicates you're talking about. (note the correct use of apostrophes... :-)

Properties in java - can we have comma-separated keys with single value?

I want to have multiple keys (>1) for a single value in a properties file in my java application. One simple way of doing the define each key in separate line in property file and the same value to all these keys. This approach increases the maintainability of property file. The other way (which I think could be smart way) is define comma separated keys with the value in single line. e.g.
key1,key2,key3=value
Java.util.properties doesn't support this out of box. Does anybody did simillar thing before? I did google but didn't find anything.
--manish
I'm not aware of an existing solution, but it should be quite straightforward to implement:
String key = "key1,key2,key3", val = "value";
Map<String, String> map = new HashMap<String, String>();
for(String k : key.split(",")) map.put(k, val);
System.out.println(map);
One of the nice things about properties files is that they are simple. No complex syntax to learn, and they are easy on the eye.
Want to know what the value of the property foo is? Quickly scan the left column until you see "foo".
Personally, I would find it confusing if I saw a properties file like that.
If that's what you really want, it should be simple to implement. A quick first stab might look like this:
Open file
For each line:
trim() whitespace
If the line is empty or starts with a #, move on
Split on "=" (with limit set to 2), leaving you with key and value
Split key on ","
For each key, trim() it and add it to the map, along with the trim()'d value
That's it.
Since java.util.Properties extends java.util.Hashtable, you could use Properties to load the data, then post-process the data.
The advantage to using java.util.Properties to load the data instead of rolling your own is that the syntax for properties is actually fairly robust, already supporting many of the useful features you might end up having to re-implement (such as splitting values across multiple lines, escapes, etc.).

Categories