I am trying to go through the text of a file, take each word, and save it as a Word object in a HashSet, but the size of the HashSet is always 1: it stores only the first object. Maybe there is a really easy way to do this, or maybe I have made some stupid mistake. Here is the code:
public static void main(String[] args) throws IOException {
    File file = new File("C:\\Users\\Taner\\Desktop\\words.txt");
    Scanner input = new Scanner(file);
    HashSet<Word> wordHash = new HashSet<>();
    while (input.hasNextLine()) {
        String line = input.nextLine();
        for (String retval : line.split(" ", 0)) {
            // the set is declared as wordHash (the post used an undeclared wordTree here)
            wordHash.add(new Word(retval));
        }
    }
    input.close();
    System.out.println(wordHash);
}
This can happen if Word has a broken implementation of hashCode and equals:
it would seem that all the values added in the HashSet are the same.
Check the implementations of those methods in Word.
If you use an IDE like Eclipse or IntelliJ,
it can generate correct implementations for hashCode and equals.
I suggest you use that.
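As a rough illustration of what such generated methods look like, here is a minimal sketch. The Word class below is hypothetical (the asker's real class was not shown) and assumes a Word is identified only by its text; WordSetDemo and toWordSet are likewise made-up names.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical Word class; the asker's real class was not shown.
final class Word {
    private final String text;

    Word(String text) {
        this.text = text;
    }

    // Two Word objects are equal when they wrap the same text.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Word)) return false;
        return Objects.equals(text, ((Word) other).text);
    }

    // Must agree with equals(): equal words produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hashCode(text);
    }

    @Override
    public String toString() {
        return text;
    }
}

class WordSetDemo {
    // Splits a line on single spaces and collects the distinct words.
    static Set<Word> toWordSet(String line) {
        Set<Word> words = new HashSet<>();
        for (String token : line.split(" ")) {
            words.add(new Word(token));
        }
        return words;
    }

    public static void main(String[] args) {
        // "to" and "be" appear twice, so four distinct words remain.
        System.out.println(toWordSet("to be or not to be").size()); // prints 4
    }
}
```

If equals() returned true for every pair and hashCode() returned a constant, the same loop would leave exactly one element in the set, which matches the symptom in the question.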
I am trying to read in a large block of text and store each unique word and the number of times it came up in the text. To do this I made an array list of a Word class. The Word class simply stores the word and number of times it came up. To avoid duplicates of the same word I use the .contains() function to see if I have already processed a word before, but for some reason, .contains() is not working.
class Main
{
public static void main(String[] args) throws FileNotFoundException
{
File file = new File("poe.text");
Scanner f = new Scanner(file);
ArrayList<Word> words = new ArrayList<Word>();
int total = 0;
while(f.hasNext())
{
Word temp = new Word(f.next().toLowerCase().trim());
//System.out.println("temp is "+temp.getWord());
total++;
if(words.contains(temp))
{
words.get(words.indexOf(temp)).up();
} else
{
words.add(temp);
}
}
for(Word w:words)
{
System.out.println(w.toString());
}
}
}
The if statement never evaluates to true and every word is added to the words ArrayList even if it is already present.
According to the Javadoc for List, the contains method uses the equals() method to decide whether two objects are the same. Have you implemented equals() and hashCode() in your Word class?
Javadoc
http://docs.oracle.com/javase/7/docs/api/java/util/List.html#contains%28java.lang.Object%29
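Here is a small sketch of how the counting loop behaves once equals() is overridden. CountedWord is a hypothetical stand-in for the asker's Word class (which was not shown), and it assumes two words are equal when their text matches, whatever their counts are.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the asker's Word class, which was not shown.
final class CountedWord {
    private final String word;
    private int count = 1;

    CountedWord(String word) {
        this.word = word;
    }

    void up() {
        count++;
    }

    int getCount() {
        return count;
    }

    // Equality looks only at the text; the count is deliberately ignored.
    @Override
    public boolean equals(Object other) {
        return other instanceof CountedWord && word.equals(((CountedWord) other).word);
    }

    // Kept consistent with equals() so the class also works in hash-based collections.
    @Override
    public int hashCode() {
        return word.hashCode();
    }
}

class ContainsDemo {
    static List<CountedWord> count(String text) {
        List<CountedWord> words = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            CountedWord temp = new CountedWord(token);
            int index = words.indexOf(temp); // indexOf() relies on equals(), like contains()
            if (index >= 0) {
                words.get(index).up();
            } else {
                words.add(temp);
            }
        }
        return words;
    }
}
```

Without the equals() override, indexOf() and contains() fall back to Object.equals(), which only matches the very same object, so every freshly created word looks new.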
I'm stuck. I am trying to parse the text of a file into words and save them in a List of objects. Is that possible?
public class Text {
public static List<Words> words = new ArrayList<Words>();
}
public class Words {
private String path;
private String[] inside;
private BufferedReader in;
public Words(String path, String[] inside) {
this.inside = inside;
this.path = path;
}
public String[] splittinIntoWords() throws IOException {
in = new BufferedReader(new FileReader(path));
String s;
while ((s = in.readLine()) != null) {
inside = s.split(" ");
//System.out.println(Arrays.toString(inside));
}
return inside;
}
}
and main class
public class Main {
public static void main(String[] args) throws IOException {
String file_name = "book.doc";
String[] inside = null;
Words w = new Words(file_name, inside);
w.splittinIntoWords();
Text.words.add(w); //after add in list i have a reference.
System.out.println(Text.words.toString());
}
}
I am doing something wrong. I understand how to do this with a List of Strings.
Please tell me: is it possible to add the text, split into words, to a List of Words?
You’re overwriting the array of words Words.inside with each line you read. You need to add the output of split() to a List every time round the while loop, not just at the end.
I would expect your code to display the words in the last line of your file, but possibly it has a blank last line, in which case you will see nothing.
Also, I assume your "book.doc" is not really a .doc format file. Word processor files need special parsing; what you have written will only work on plain text files.
There are several things wrong with your code.
Text.words shouldn't be static. Every instance of Text consists of a different collection of words.
When you make a "collection of words", it should be a Collection<Word>, because every item inside the collection is just a single word.
But then again, a Collection<Word> is much the same as a Collection<String>, so use that.
"path" and "in" should not be member variables of Words. Just use them locally in your method, especially since you never close "in".
You're overwriting whatever is in "inside" for each line of the file that you loop over. Once you have your Collection<String> words, just call words.addAll(Arrays.asList(inside)); in your loop.
Yes, I know this is not an answer, but I'm trying to point you in the right direction. This might help you more in the long run.
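Put together, the suggestions from both answers could look roughly like the sketch below. WordReader and readWords are made-up names, and a StringReader stands in for the plain-text file so the example runs on its own; for a real file you would wrap a FileReader instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordReader {
    // Collects the words of every line, not just those of the last line.
    static List<String> readWords(BufferedReader in) throws IOException {
        List<String> words = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            words.addAll(Arrays.asList(line.trim().split("\\s+")));
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader in = new BufferedReader(new StringReader("one two\n\nthree"))) {
            System.out.println(readWords(in)); // prints [one, two, three]
        }
    }
}
```

The reader is a local variable that is closed automatically, and the list grows on every pass through the loop instead of being replaced.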
I am simply trying to see if the inputted value matches a value that is already in the array and if it does return "Valid". I realize this is very simple but I cannot get this to work:
public static void main(String[] args) {
Scanner keyboard = new Scanner(System.in);
String[] accountNums = { "5658845", "8080152", "1005231", "4520125", "4562555",
"6545231", "7895122", "5552012", "3852085", "8777541",
"5050552", "7576651", "8451277", "7881200", "1302850",
"1250255", "4581002" };
String newAccount;
String test = "Invalid";
newAccount = keyboard.next();
for (int i = 0; i < accountNums.length; i++)
{
if(newAccount == accountNums[i])
{
test = "Valid";
}
}
System.out.println(test);
}
}
thank you for any assistance (and patience)
Use the equals method. The == operator compares object references, not the characters in the strings.
if (newAccount.equals(accountNums[i]))
Jayamohan's answer is correct and working but I suggest working with integers rather than Strings. It is a more effective approach as CPUs handle numbers (integers) with a lot more ease than they handle Strings.
What has to be done in this case is change newAccount and accountNums to ints instead of Strings and also remove all the quotation marks from the accountNums initialization. Instead of calling keyboard.next() you can call keyboard.nextInt(), which returns an integer. The if-statement is fine as it is.
Why are you using an array?
List<String> accountNums = Arrays.asList( "5658845", "8080152", "1005231", "4520125", "4562555",
"6545231", "7895122", "5552012", "3852085", "8777541",
"5050552", "7576651", "8451277", "7881200", "1302850",
"1250255", "4581002" );
String test = "Invalid";
Then you just need this (no loop):
if (accountNums.contains(newAccount)) {
test = "Valid";
}
Plus, it's easier to read and understand.
You cannot compare strings with ==
You must use .equals()
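To see the difference, here is a small self-contained sketch. StringCompareDemo and isValid are made-up names; new String(...) is used to imitate a string that arrives at run time, as Scanner input does.

```java
class StringCompareDemo {
    // Returns true if candidate has the same characters as any entry.
    static boolean isValid(String[] accounts, String candidate) {
        for (String account : accounts) {
            if (account.equals(candidate)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String literal = "5658845";
        // A distinct String object with the same characters.
        String typed = new String("5658845");

        System.out.println(literal == typed);      // prints false: different objects
        System.out.println(literal.equals(typed)); // prints true: same characters
    }
}
```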
Sorry, silly question here, I have googled it but anything I search for seems to be returning methods of using binary search etc., but what I need is actually much simpler.
I have an array of languages. I am creating a scanner, asking for input. Trying to make it so that if the language input by the user isn't in the array, it displays an error and asks again. Should be simple, I have just drawn a blank.
Can anyone help, please? Here is what I have so far:
Scanner scan = new Scanner(System.in);
language = scan.next();
while( language NOT IN ARRAY languages) {
System.out.print("error!");
language = scan.next();
}
I understand your question better now. You should use a Set for sure. But you will want to use the contains() method of the set to check if the language exists.
Scanner scan = new Scanner(System.in);
language = scan.next();
while(!set.contains(language)) {
System.out.print("error!");
language = scan.next();
}
Old answer, still relevant info though:
What you want to use is a Set collection type. A Set does not allow duplicate entries.
From the Javadocs:
A collection that contains no duplicate elements.
Set<String> set = new HashSet<String>();
// will not add another entry if set contains language already
set.add(language);
Also, if you want to know whether the value was rejected or not, you can use the return value of the add() method. It returns true if the item did not exist yet, and false otherwise.
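A short sketch of both calls follows. The language list is a made-up example, since the question did not show the real array.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class LanguageSetDemo {
    // Hypothetical language list; the question did not show the real one.
    static final Set<String> LANGUAGES = new HashSet<>(Arrays.asList("Java", "Python", "Ruby"));

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        System.out.println(seen.add("Java")); // prints true: newly added
        System.out.println(seen.add("Java")); // prints false: already present

        System.out.println(LANGUAGES.contains("Python")); // prints true
        System.out.println(LANGUAGES.contains("Cobol"));  // prints false
    }
}
```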
You can do something like:
public static boolean contains(String language, String[] lang_array) {
for (int i = 0; i < lang_array.length; i++) {
if (lang_array[i].equals(language))
return true;
}
return false;
}
public static void main(String[] args) {
String[] lang_array = {"Java", "Python", "Ruby"};
Scanner scan = new Scanner(System.in);
String language = scan.next();
while(!contains(language, lang_array)) {
System.out.print("error!");
language = scan.next();
}
}
You can do this if you really need to use an array:
Arrays.sort(languages);
Scanner scan = new Scanner(System.in);
language = scan.next();
while( Arrays.binarySearch(languages, language) < 0) {
System.out.print("error!");
language = scan.next();
}
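One caveat with this approach: Arrays.binarySearch only gives meaningful results on a sorted array, which is why the Arrays.sort call comes first. A minimal runnable sketch, with a made-up language list and a hypothetical helper named containsSorted:

```java
import java.util.Arrays;

class BinarySearchDemo {
    // Assumes the array is already sorted in natural order.
    static boolean containsSorted(String[] sorted, String key) {
        return Arrays.binarySearch(sorted, key) >= 0;
    }

    public static void main(String[] args) {
        String[] languages = {"Ruby", "Java", "Python"};
        Arrays.sort(languages); // now Java, Python, Ruby

        System.out.println(containsSorted(languages, "Python")); // prints true
        System.out.println(containsSorted(languages, "Cobol"));  // prints false
    }
}
```

Note that Arrays.sort reorders the array in place, so sort a copy if the original order matters.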
There are two things you can do:
Either iterate through the array and compare each element with the value you are looking for, every time you add a language.
OR
Use a LinkedList or some other Java collection that has a .contains() method; internally this does something similar to the array loop described above.
How can you build an efficient many-to-many relation from fileID to words and from word to fileIDs in Java, without database tools like Postgres?
I have the following classes.
The relation from fileID to words is cheap, but not the reverse, since I need three for-loops for it.
[image: http://img191.imageshack.us/img191/4077/oliorakenne1.png]
My solution is apparently not efficient.
Another option may be to create an extra class that has the word as an ID together with an ArrayList of fileIDs.
Reply to JacobM's answer
The relevant part of MyFile's constructors is:
/**
* Synopsis of data in wordToWordCountInFile.txt:
* fileID|wordID|wordCount
*
* Synopsis of the data in the file wordToWordID.txt:
* word|wordID
**/
/**
* Getting words by getting first wordIDs from wordToWordCountInFile.txt and then words in wordToWordID.txt.
*/
InputStream in2 = new FileInputStream("/home/dev/wordToWordCountInFile.txt");
BufferedReader fi2 = new BufferedReader(new InputStreamReader(in2));
ArrayList<Integer> wordIDs = new ArrayList<Integer>();
String line = null;
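while ((line = fi2.readLine()) != null) {
// split() takes a regex, so the pipe must be escaped
String[] fields = line.split("\\|");
// fields: fileID|wordID|wordCount
if (Integer.parseInt(fields[0]) == currentFileID) {
wordIDs.add(Integer.valueOf(fields[1]));
}
}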
in2.close();
// Getting now the words by wordIDs.
InputStream in3 = new FileInputStream("/home/dev/wordToWordID.txt");
BufferedReader fi3 = new BufferedReader(new InputStreamReader(in3));
line = null;
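while ((line = fi3.readLine()) != null) {
// fields: word|wordID (split() takes a regex, so the pipe is escaped)
String[] fields = line.split("\\|");
Integer lineWordID = Integer.valueOf(fields[1]);
for (Integer wordID : wordIDs) {
// compare Integer objects with equals(), not ==
if (wordID.equals(lineWordID)) {
this.words.add(new Word(fields[0], fileID));
break;
}
}
}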
in3.close();
this.words.addAll(words);
The constructor of Word is at the paste.
Wouldn't a more efficient approach be to assign the link from Word to MyFile at the point that you know the Word is in the File? That is to say, how do you build the list of Words in the MyFile object? If you're reading the words in to the MyFile out of, say, a file on the filesystem, then as you read in each word, you assign its MyFile to the current file.
// within MyFile constructor or setter for words (pseudocode)
while (/* there's another word to add */) {
Word newWord = new Word(/* word read from file */);
words.add(newWord);
newWord.setMyFile(this);
}
This is akin to the typical way to manage a bidirectional parent-child relationship:
//in Parent
public void addChild(Child child) {
myChildren.add(child);
child.setParent(this);
}
It might help if you show us how you build the MyFile object.
Edited after you added the code that builds the list of Words:
OK, so having seen the code that builds your Words, I don't think setting up the relationship is the source of your inefficiencies. It looks like you are setting up the relationship in exactly the way I suggested (as you add each word, you give that word the fileID of the corresponding file).
It looks like the source of your inefficiencies is that, for each word, you have to match it up with various things that you currently have in a set of files (e.g. WordToWordId). So for every word you have to loop through every line of that file, and find the match. This is certainly inefficient.
The better approach is to have those pairings in memory in a HashMap, initialized at startup. That way, if you have a particular word and need the corresponding ID, or vice versa, you look them up in your HashMap, which is a constant-time operation. Similarly, for each word, you are looping through every file; again, do that loop ONCE, and store the result in a HashMap. Then lookups become constant time.
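A minimal sketch of that idea, assuming the word|wordID line format from the question. WordIdIndex and load are made-up names, and a StringReader stands in for wordToWordID.txt so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

class WordIdIndex {
    // Reads "word|wordID" lines once and keeps wordID -> word in memory.
    static Map<Integer, String> load(BufferedReader in) throws IOException {
        Map<Integer, String> wordById = new HashMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            // "|" is a regex metacharacter, so it has to be escaped for split().
            String[] parts = line.split("\\|");
            if (parts.length < 2) {
                continue; // ignore lines without a separator
            }
            wordById.put(Integer.parseInt(parts[1].trim()), parts[0]);
        }
        return wordById;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data standing in for the real file.
        try (BufferedReader in = new BufferedReader(new StringReader("raven|1\nnevermore|2"))) {
            Map<Integer, String> wordById = load(in);
            System.out.println(wordById.get(2)); // prints nevermore
        }
    }
}
```

After the one-time load, each lookup is a single get() call instead of a pass over the whole file.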
Both classes should override hashCode and equals. Thus you will decide what is equal.
Then you will create a set in each of your classes.
public class MyFile implements Comparable<MyFile> {
//your fields here
Set<Word> words = new HashSet<Word>(0);
//Remember to override hashCode and equals
}
public class Word implements Comparable<Word> {
//your fields here
Set<MyFile> files = new HashSet<MyFile>(0);
//Remember to override hashCode and equals
}
Each MyFile's set now holds all of its Words and, the other way around, each Word's set holds all the MyFiles it appears in.
I think you want the file to know its words and each word to know the files in which it is used.
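public class File {
    private List<Word> words;

    public File() {
        words = new Vector<Word>();
    }

    /**
     * Adds the word to this file's word list and registers this file with the word.
     */
    public void addWord(Word word) {
        this.words.add(word);
        word.addFile(this);
    }
}

public class Word {
    // initialized here so addFile() does not hit a null list
    private List<File> files = new Vector<File>();

    public void addFile(File file) {
        this.files.add(file);
    }
}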
Or vice versa. But you should check this against the GRASP design patterns. Maybe your data model is not the right one (I won't call it wrong, because it is your design and I respect that).
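For comparison, here is one possible sketch of the many-to-many relation kept as two in-memory maps that are updated together, so lookups in either direction avoid nested loops. WordFileIndex and its method names are hypothetical, and it assumes files are identified by an int fileID and words by their text. It needs Java 8 or later.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WordFileIndex {
    private final Map<Integer, Set<String>> wordsByFile = new HashMap<>();
    private final Map<String, Set<Integer>> filesByWord = new HashMap<>();

    // Records the pair in both directions at once.
    void add(int fileId, String word) {
        wordsByFile.computeIfAbsent(fileId, k -> new HashSet<>()).add(word);
        filesByWord.computeIfAbsent(word, k -> new HashSet<>()).add(fileId);
    }

    // Words that occur in the given file (empty if the file is unknown).
    Set<String> wordsIn(int fileId) {
        return wordsByFile.getOrDefault(fileId, Collections.emptySet());
    }

    // IDs of the files that contain the given word (empty if the word is unknown).
    Set<Integer> filesContaining(String word) {
        return filesByWord.getOrDefault(word, Collections.emptySet());
    }
}
```

Usage would be along the lines of index.add(1, "raven") while parsing, followed by index.filesContaining("raven") at query time. The trade-off is that every pair is stored twice in memory.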