I'm messing around trying to learn to use HashSets to remove duplicate elements in my output but I'm running into some trouble.
My goal is to select a text file when the program is run and for it to display the words of the text file without duplicates, punctuation, or capital letters. All of it works fine except for removing the duplicates.
This is my first time using a Set like this. Any suggestions as to what I'm missing? Thanks!
Partial text file input for example: "Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation, so conceived and so dedicated, can long endure"
import java.util.Scanner;
import java.util.List;
import java.io.*;
import java.util.*;
import javax.swing.JFileChooser;
public class Lab7 {
public interface OrderedList<T extends Comparable<T>> extends Iterable<T>
{
public void add(T element);
public T removeFront();
public T removeRear();
public int size();
public boolean isEmpty();
public boolean contains(T element);
public Iterator<T> iterator();
}
public static void main(String[] arg) throws FileNotFoundException
{
Scanner scan = null;
JFileChooser chooser = new JFileChooser("../Text");
int returnValue = chooser.showOpenDialog(null);
if( returnValue == JFileChooser.APPROVE_OPTION)
{
File file = chooser.getSelectedFile();
scan = new Scanner(file);
}
else
return;
int count = 0;
Set<String> set = new LinkedHashSet<String>();
while(scan.hasNext())
{
String[] noDuplicate = {scan.next().replaceAll("[\\W]", "").toLowerCase()};
List<String> list = Arrays.asList(noDuplicate);
set.addAll(list);
count++;
}
scan.close();
System.out.println(set);
System.out.println();
System.out.println(chooser.getName() + " has " + count + " words.");
}
}
Your problem is that you are creating a new HashSet every time you read a word using the Scanner, so it never gets a chance to de-duplicate anything. Also, a plain HashSet does not retain insertion order. You can fix both with the following steps:
Create the HashSet once, before the Scanner loop.
Use a LinkedHashSet, so that the words are kept in the order in which you added them.
Inside the loop, use set.add(item);. As the other answer mentions, you do not need to create a one-element list.
Adding the code for completeness.
public static void main(String[] arg) throws FileNotFoundException
{
Scanner scan = null;
scan = new Scanner(new File("Input.txt"));
int count = 0;
Set<String> set = new LinkedHashSet<String>();
while(scan.hasNext())
{
String word = scan.next().replaceAll("[\\W]", "").toLowerCase();
set.add(word);
count++;
}
scan.close();
// System.out.println(set);
System.out.println();
System.out.println("Input.txt has " + count + " words.");
// How do I print a set by myself?
for (String word : set) {
// Also remove commas
System.out.println(word.replaceAll(",",""));
}
}
I would do it this way:
Set<String> set = new LinkedHashSet<String>();
while(scan.hasNext())
{
String noDuplicate = scan.next().replaceAll("[\\W]", "").toLowerCase();
set.add(noDuplicate);
}
scan.close();
System.out.println("The text has " + set.size() + " unique words.");
Your solution (creating a one-element array, converting that to a List, and adding that List to the Set) is extremely inefficient, in addition to being incorrect. Just take the String you're already working with and add it to the LinkedHashSet (which will preserve ordering). At the end, set.size() will give you the number of unique words in your text.
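To make the difference between "all words" and "unique words" concrete, here is a minimal sketch of the same idea. It reads from a String rather than a chosen file so that it runs anywhere; the class name DedupSketch and the helper uniqueWords are made up for this illustration and are not part of the original program.

```java
import java.util.LinkedHashSet;
import java.util.Scanner;
import java.util.Set;

class DedupSketch {
    // Hypothetical helper: returns the distinct, normalized words in first-seen order.
    static Set<String> uniqueWords(String text) {
        Set<String> set = new LinkedHashSet<>();
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNext()) {
                // Strip non-word characters and lower-case, as in the question.
                String word = scan.next().replaceAll("\\W", "").toLowerCase();
                if (!word.isEmpty()) {   // skip tokens that were only punctuation
                    set.add(word);       // a duplicate leaves the set unchanged
                }
            }
        }
        return set;
    }

    public static void main(String[] args) {
        Set<String> words = uniqueWords("That nation, or any nation, so conceived and so dedicated");
        System.out.println(words);
        System.out.println(words.size() + " unique words");
    }
}
```

The sample sentence has ten words, but "nation" and "so" each appear twice, so the set ends up with eight entries.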
I have this college assignment where I have to read a file with a list of names and add up to 3 presents to each one. I can do it, but the presents are repeating and some people in the list are getting the same present more than once. How can I stop this so that each person receives a different present each time?
Here is my code:
public static void main(String[] args) throws IOException {
String path = "Christmas.txt";
String line = "";
ArrayList<String> kids = new ArrayList<>();
FileWriter fw = new FileWriter("Deliveries.txt");
SantasFactory sf = new SantasFactory();
try (Scanner s = new Scanner(new FileReader("Christmas.txt"))) {
while (s.hasNext()) {
kids.add(s.nextLine());
}
}
for (String boys : kids) {
ArrayList<String> btoys = new ArrayList<>();
int x = 0;
while (x < 3) {
if (!btoys.contains(sf.getRandomBoyToy().equals(sf.getRandomBoyToy()))) {
btoys.add(sf.getRandomBoyToy());
x++;
}
}
if (boys.endsWith("M")) {
fw.write(boys + " (" + btoys + ")\n\n");
}
}
fw.close();
}
}
Just use a Set data structure instead of a List.
if (!btoys.contains(sf.getRandomBoyToy().equals(sf.getRandomBoyToy()))) {
btoys.add(sf.getRandomBoyToy());
x++;
}
This generates three toys: it compares two of them with each other, checks whether the resulting boolean is present in the list of strings (which it never is), and then appends a third, unrelated toy.
Instead you should generate a single one, and use it for both checking and adding:
String toy = sf.getRandomBoyToy();
if(!btoys.contains(toy)) {
btoys.add(toy);
x++;
}
The Set interface, found in the java.util package, extends the Collection interface. It is an unordered collection of objects in which duplicate values cannot be stored, and it models the mathematical set. It contains the methods inherited from Collection and adds one restriction: duplicate elements cannot be inserted. There are two interfaces which extend the Set interface, namely
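for (String boys : kids) {
    // A Set silently ignores duplicates, so keep drawing until it holds 3 different toys.
    // (This assumes the factory can produce at least 3 different toys.)
    Set<String> btoys = new HashSet<String>();
    while (btoys.size() < 3) {
        btoys.add(sf.getRandomBoyToy());
    }
    if (boys.endsWith("M")) {
        fw.write(boys + " (" + btoys + ")\n\n");
    }
}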
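As a rough, self-contained illustration of letting a Set reject repeats, the sketch below draws random toys until it has collected the requested number of different ones. SantasFactory is not shown in the question, so the TOYS list, the pickDistinct method and the class name are stand-ins invented for this example.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

class DistinctToysSketch {
    // Hypothetical stand-in for the toys SantasFactory.getRandomBoyToy() might return.
    static final List<String> TOYS = Arrays.asList("train", "robot", "ball", "kite", "drum");

    // Draws random toys until `wanted` different ones have been collected.
    static Set<String> pickDistinct(int wanted, Random random) {
        if (wanted > TOYS.size()) {
            // Without this guard the loop below could never finish.
            throw new IllegalArgumentException("not enough different toys");
        }
        Set<String> picked = new LinkedHashSet<>();
        while (picked.size() < wanted) {
            // A repeated toy is ignored, so the size only grows when a new toy is drawn.
            picked.add(TOYS.get(random.nextInt(TOYS.size())));
        }
        return picked;
    }

    public static void main(String[] args) {
        System.out.println(pickDistinct(3, new Random()));
    }
}
```

The guard at the top matters: if fewer different toys exist than are requested, a loop of this kind would spin forever.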
I have the following code:
package sportsCardsTracker;
import java.io.*;
import java.text.*;
import java.util.*;
import java.util.stream.Collectors;
public class Test_Mark6 {
public static ArrayList<String> listingNameList;
public static ArrayList<String> finalNamesList;
public static void main(String[] args) throws IOException, ParseException {
listingNameList = new ArrayList();
listingNameList.add("LeBron James 2017-18 Hoops Card");
listingNameList.add("Stephen Curry Auto Patch, HOT INVESTMENTS!");
listingNameList.add("Michael Jordan 1998 Jersey Worn Card");
ArrayList<String> playersNamesList = new ArrayList();
playersNamesList.add("LeBron James");
playersNamesList.add("Stephen Curry");
playersNamesList.add("Michael Jordan");
finalNamesList = new ArrayList();
String directory = System.getProperty("user.dir");
File file = new File(directory + "/src/sportsCardsTracker/CardPrices.csv");
FileWriter fw = new FileWriter(file, false); //true to not over ride
for (int i = 0; i < listingNameList.size(); i++) {
for (String listingNames : listingNameList) {
List<String> result = NBARostersScraper_Mark3.getNBARoster().stream().map(String::toLowerCase).collect(Collectors.toList());
boolean valueContained = result.stream().anyMatch(s -> listingNames.toLowerCase().matches(".*" + s + ".*"));
if(valueContained == true) {
finalNamesList.add(//The players' name);
}
}
fw.write(String.format("%s, %s\n", finalNamesList.get(i)));
}
}
}
Basically, listingNameList holds the listings' names and playersNamesList holds all the players' names. What I would like is this: if the code finds a player's name inside a listing's name, it should return the player's name only.
For example, instead of "LeBron James 2017-18 Hoops Card" it should return "LeBron James" only. If it does not find anything, it should just return the listing's name. So far, I have created a new ArrayList called finalNamesList; my idea is to use an if statement (if a match is found, add the player's name to finalNamesList; if not, add the listing's name). However, the code above is not working: it just adds all of the names in listingNameList to finalNamesList. I suspect that the way I grab the index is wrong, but I don't know how to fix it.
The method you are using to match the pattern seems wrong. Instead of matches(), you can use the String contains() method, as below.
for (String listingNames : listingNameList) {
    String lowerListing = listingNames.toLowerCase();
    // Keep only the players whose name occurs in this listing (case-insensitive).
    List<String> temp = playersNamesList.stream()
            .filter(s -> lowerListing.contains(s.toLowerCase()))
            .collect(Collectors.toList());
    if (temp.size() > 0) {
        System.out.println(temp.get(0));
        //fw.write(String.format("%s%n", temp.get(0)));
    }
}
One more thing: you don't need two for loops here; a single loop is enough to produce your output.
You can still optimize this code further. For example, the temp list used above can be avoided.
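One possible way to drop the temp list and also cover the "otherwise keep the listing's name" case from the question is findFirst() combined with orElse(). This is only a sketch: the helper name playerOrListing and the "Unknown Rookie Card" listing are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PlayerNameSketch {
    // Returns the first player whose name occurs in the listing (ignoring case),
    // or the listing itself when no player matches.
    static String playerOrListing(String listing, List<String> players) {
        String lowerListing = listing.toLowerCase();
        return players.stream()
                .filter(p -> lowerListing.contains(p.toLowerCase()))
                .findFirst()
                .orElse(listing);
    }

    public static void main(String[] args) {
        List<String> players = Arrays.asList("LeBron James", "Stephen Curry", "Michael Jordan");
        List<String> listings = Arrays.asList(
                "LeBron James 2017-18 Hoops Card",
                "Stephen Curry Auto Patch, HOT INVESTMENTS!",
                "Unknown Rookie Card"); // hypothetical listing with no known player
        List<String> finalNames = new ArrayList<>();
        for (String listing : listings) {
            finalNames.add(playerOrListing(listing, players));
        }
        System.out.println(finalNames);
    }
}
```

Because each listing is mapped to exactly one result, finalNames always has the same length as the list of listings, which avoids the index mismatch suspected in the question.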
I am a beginner in Java. Basically, I have loaded each text document and stored the individual words of each document in a HashMap. After that, I tried storing all the HashMaps in an ArrayList. Now I am stuck on how to retrieve all the words from the HashMaps that are in the ArrayList!
private static long numOfWords = 0;
private String userInputString;
private static long wordCount(String data) {
long words = 0;
int index = 0;
boolean prevWhiteSpace = true;
while (index < data.length()) {
//Intialise character variable that will be checked.
char c = data.charAt(index++);
//Determine whether it is a space.
boolean currWhiteSpace = Character.isWhitespace(c);
//If previous is a space and character checked is not a space,
if (prevWhiteSpace && !currWhiteSpace) {
words++;
}
//Assign current character's determination of whether it is a spacing as previous.
prevWhiteSpace = currWhiteSpace;
}
return words;
} //
public static ArrayList StoreLoadedFiles()throws Exception{
final File f1 = new File ("C:/Users/Admin/Desktop/dataFiles/"); //specify the directory to load files
String data=""; //reset the words stored
ArrayList<HashMap> hmArr = new ArrayList<HashMap>(); //array of hashmap
for (final File fileEntry : f1.listFiles()) {
Scanner input = new Scanner(fileEntry); //load files
while (input.hasNext()) { //while there are still words in the document, continue to load all the words in a file
data += input.next();
input.useDelimiter("\t"); //similar to split function
} //while loop
String textWords = data.replaceAll("\\s+", " "); //remove all found whitespaces
HashMap<String, Integer> hm = new HashMap<String, Integer>(); //Creates a Hashmap that would be renewed when next document is loaded.
String[] words = textWords.split(" "); //store individual words into a String array
for (int j = 0; j < numOfWords; j++) {
int wordAppearCount = 0;
if (hm.containsKey(words[j].toLowerCase().replaceAll("\\W", ""))) { //replace non-word characters
wordAppearCount = hm.get(words[j].toLowerCase().replaceAll("\\W", "")); //remove non-word character and retrieve the index of the word
}
if (!words[j].toLowerCase().replaceAll("\\W", "").equals("")) {
//Words stored in hashmap are in lower case and have special characters removed.
hm.put(words[j].toLowerCase().replaceAll("\\W", ""), ++wordAppearCount);//index of word and string word stored in hashmap
}
}
hmArr.add(hm);//stores every single hashmap inside an ArrayList of hashmap
} //end of for loop
return hmArr; //return hashmap ArrayList
}
public static void LoadAllHashmapWords(ArrayList m){
for(int i=0;i<m.size();i++){
m.get(i); //stuck here!
}
Firstly, your logic won't work correctly. In the StoreLoadedFiles() method you iterate through the words with for (int j = 0; j < numOfWords; j++) { . The numOfWords field is initialized to zero and never updated, so this loop won't execute at all. You should use the length of the words array as the bound instead.
Having said that, to retrieve the values from a list of HashMaps, you should first iterate through the list and, for each HashMap, take its entry set. A Map.Entry is the key/value pair that you store in the HashMap. When you invoke the map.entrySet() method, it returns a java.util.Set<Map.Entry<Key, Value>>. A Set is returned because the keys are unique.
A complete program would look like this:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map.Entry;
import java.util.Scanner;
public class FileWordCounter {
public static List<HashMap<String, Integer>> storeLoadedFiles() {
final File directory = new File("C:/Users/Admin/Desktop/dataFiles/");
List<HashMap<String, Integer>> listOfWordCountMap = new ArrayList<HashMap<String, Integer>>();
Scanner input = null;
StringBuilder data;
try {
for (final File fileEntry : directory.listFiles()) {
input = new Scanner(fileEntry);
input.useDelimiter("\t");
data = new StringBuilder();
while (input.hasNext()) {
data.append(input.next());
}
input.close();
String wordsInFile = data.toString().replaceAll("\\s+", " ");
HashMap<String, Integer> wordCountMap = new HashMap<String, Integer>();
for(String word : wordsInFile.split(" ")){
String strippedWord = word.toLowerCase().replaceAll("\\W", "");
int wordAppearCount = 0;
if(strippedWord.length() > 0){
if(wordCountMap.containsKey(strippedWord)){
wordAppearCount = wordCountMap.get(strippedWord);
}
wordCountMap.put(strippedWord, ++wordAppearCount);
}
}
listOfWordCountMap.add(wordCountMap);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} finally {
if(input != null) {
input.close();
}
}
return listOfWordCountMap;
}
public static void loadAllHashmapWords(List<HashMap<String, Integer>> listOfWordCountMap) {
for(HashMap<String, Integer> wordCountMap : listOfWordCountMap){
for(Entry<String, Integer> wordCountEntry : wordCountMap.entrySet()){
System.out.println(wordCountEntry.getKey() + " - " + wordCountEntry.getValue());
}
}
}
public static void main(String[] args) {
List<HashMap<String, Integer>> listOfWordCountMap = storeLoadedFiles();
loadAllHashmapWords(listOfWordCountMap);
}
}
Since you are a beginner in Java programming, I would like to point out a few best practices that you could start using from the beginning.
1. Closing resources: In your loop over the files you open a Scanner with Scanner input = new Scanner(fileEntry); but you never close it. This leaks resources (open file handles). You should always use a try-catch-finally block and close resources in the finally block, or use try-with-resources on Java 7 and later.
2. Avoid unnecessary redundant calls: If an operation gives the same result on every pass through a loop, move it outside the loop. In your case, setting the scanner delimiter with input.useDelimiter("\t"); is a one-time operation after the scanner is created, so you can move it out of the while loop.
3. Use StringBuilder instead of String: Repeated string manipulation such as concatenation should be done with a StringBuilder (or StringBuffer when you need synchronization) instead of += or +. String is immutable, meaning its value cannot be changed, so each concatenation creates a new String object. This leaves a lot of unused instances in memory, whereas a StringBuilder is mutable and is modified in place.
4. Naming convention: The usual Java convention for method names is camelCase, starting with a lower-case letter and capitalizing the first letter of each following word. So it is standard practice to name a method storeLoadedFiles as opposed to StoreLoadedFiles. (This could be opinion based ;))
5. Give descriptive names: It's good practice to give descriptive names, because it helps with later maintenance. For example, wordCountMap is a better name than hm. Anyone who reads your code in the future will understand it faster with descriptive names. Again, opinion based.
6. Use generics as much as possible: This gives you compile-time type checking and avoids explicit casts.
7. Avoid repetition: Similar to point 2, if an operation produces the same result and is needed several times, store the result in a variable and reuse it. In your case you were calling words[j].toLowerCase().replaceAll("\\W", "") multiple times. The result is the same every time, but each call creates unnecessary instances and clutters the code. Store it in a String once and use that String elsewhere.
8. Use the for-each loop wherever possible: This relieves you from taking care of indexing.
These are just suggestions. I tried to include most of them in my code, but I won't say it's perfect. Since you are a beginner, adopting these practices now will help them become habits. Happy coding.. :)
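As a small illustration of points 1, 7 and 8, here is a sketch that counts words with try-with-resources (available since Java 7) and Map.merge (available since Java 8). It scans a String instead of a file so that it is self-contained; the class and method names are made up for the example.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

class WordCountSketch {
    // Counts how often each normalized word occurs in the given text.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // try-with-resources closes the Scanner automatically, even if an exception is thrown.
        try (Scanner input = new Scanner(text)) {
            while (input.hasNext()) {
                // Normalize once and reuse the result instead of repeating the call chain.
                String word = input.next().toLowerCase().replaceAll("\\W", "");
                if (!word.isEmpty()) {
                    // merge() inserts 1 for a new word, otherwise adds 1 to the existing count.
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // for-each over the entry set, so no index bookkeeping is needed.
        for (Map.Entry<String, Integer> entry : countWords("The cat and the hat.").entrySet()) {
            System.out.println(entry.getKey() + " - " + entry.getValue());
        }
    }
}
```

A TreeMap is used here only so that the output comes out in alphabetical order; a HashMap would work just as well if order does not matter.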
for (HashMap<String, Integer> map : m) {
for(Entry<String,Integer> e:map.entrySet()){
//your code here
}
}
Or, if you are using Java 8, you can do the same with lambdas:
m.stream().forEach((map) -> {
map.entrySet().stream().forEach((e) -> {
//your code here
});
});
But first of all you have to change the method signature to public static void LoadAllHashmapWords(List<HashMap<String,Integer>> m), otherwise you would have to use a cast.
P.S. Are you sure your extraction method works? I tested it a bit and got a list of empty HashMaps every time.
I am piping in a file and tracking word pairs from it. Because I use a TreeMap, the keys are all sorted. However, the words I add under each key are not sorted.
Here is the part of the process function I need help with:
private static void process(){
if(!result.containsKey(thisWord)){
result.put(thisWord, new ArrayList<String>());
}
// Add nextWord to the list of adjacent words to thisWord:
result.get(thisWord).add(nextWord); // nextword is not sorted within the key
thisWord is sorted; nextWord is not.
Can I use Collections.sort(result); somehow?
I'm just not sure how to get to the nextWord entries within result to do that.
Or is there no way to do it in my situation? I would rather not change things unless you recommend it.
This is the program
import java.util.Map.Entry;
import java.util.TreeSet;
import java.io.*;
import java.util.*;
public class program1 {
private static List<String> inputWords = new ArrayList<String>();
private static Map<String, List<String>> result = new TreeMap<String, List<String>>();
public static void main(String[] args) {
collectInput();
process();
generateOutput();
}
private static void collectInput(){
Scanner sc = new Scanner(System.in);
String word;
while (sc.hasNext()) { // is there another word?
word = sc.next(); // get next word
if (word.equals("---"))
{
break;
}
inputWords.add(word);
}
}
private static void process(){
// Iterate through every word in our input list
for(int i = 0; i < inputWords.size() - 1; i++){
// Create references to this word and next word:
String thisWord = inputWords.get(i);
String nextWord = inputWords.get(i+1);
// If this word is not in the result Map yet,
// then add it and create a new empy list for it.
if(!result.containsKey(thisWord)){
result.put(thisWord, new ArrayList<String>());
}
// Add nextWord to the list of adjacent words to thisWord:
result.get(thisWord).add(nextWord); // need to sort nextword
// Collections.sort(result);
}
}
private static void generateOutput()
{
for(Entry e : result.entrySet()){
System.out.println(e.getKey() + ":");
// Count the number of unique instances in the list:
Map<String, Integer> count = new HashMap<String, Integer>();
List<String> words = (List)e.getValue();
for(String s : words){
if(!count.containsKey(s)){
count.put(s, 1);
}
else{
count.put(s, count.get(s) + 1);
}
}
// Print the occurances of following symbols:
for(Entry f : count.entrySet()){
System.out.println(" " + f.getKey() + ", " + f.getValue() );
}
}
System.out.println();
}
}
If you want the collection of "nextword"s sorted, why not use a TreeSet rather than an ArrayList? The only reason I can see against it is if you might have duplicates. If duplicates are allowed, then yes, use Collections.sort on the ArrayList when you're done adding to them. Or look in the Apache Commons or Google collection classes - I don't know them off the top of my head, but I'm sure there is a sorted List that allows duplicates in one or both of them.
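Since the program goes on to count duplicates of each following word, one option that keeps both the sorting and the counts is a TreeMap of counts per key instead of an ArrayList. This is a different data structure from the one in the question, so treat it as a sketch rather than a drop-in fix; the names WordPairsSketch and followers are invented here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordPairsSketch {
    // For each word, maps every word that directly follows it to how often it does so.
    // Both the outer and the inner maps are TreeMaps, so both levels iterate in sorted order.
    static Map<String, Map<String, Integer>> followers(List<String> words) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (int i = 0; i < words.size() - 1; i++) {
            result.computeIfAbsent(words.get(i), key -> new TreeMap<>())
                  .merge(words.get(i + 1), 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> result =
                followers(Arrays.asList("a", "b", "a", "c", "a", "b"));
        for (Map.Entry<String, Map<String, Integer>> entry : result.entrySet()) {
            System.out.println(entry.getKey() + ":");
            for (Map.Entry<String, Integer> next : entry.getValue().entrySet()) {
                System.out.println(" " + next.getKey() + ", " + next.getValue());
            }
        }
    }
}
```

With this layout the separate counting pass in generateOutput is no longer needed, because the counts are built while the pairs are collected.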
result.get(thisWord).add(nextWord);
Collections.sort(result.get(thisWord));
Why don't you try something like this:
Collections.sort(inputWords);
Using a msdos window I am piping in an amazon.txt file.
I am trying to use the collections framework. Keep in mind I want to keep this
as simple as possible.
What I want to do is count all the unique words in the file... with no duplicates.
This is what I have so far. Please be kind this is my first java project.
import java.util.Scanner;
import java.util.ArrayList;
import java.util.Iterator;
public class project1 {
// ArrayList<String> a = new ArrayList<String>();
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
String word;
String grab;
int count = 0;
ArrayList<String> a = new ArrayList<String>();
// Iterator<String> it = a.iterator();
System.out.println("Java project\n");
while (sc.hasNext()) {
word = sc.next();
a.add(word);
if (word.equals("---")) {
break;
}
}
Iterator<String> it = a.iterator();
while (it.hasNext()) {
grab = it.next();
if (grab.contains("a")) {
System.out.println(it.next()); // Just a check to see
count++;
}
}
System.out.println("I counted abc = ");
System.out.println(count);
System.out.println("\nbye...");
}
}
In your version, the word list a will contain all words, including duplicates. You can either
(a) check for every new word, if it is already included in the list (List#contains is the method you should call), or, the recommended solution
(b) replace ArrayList<String> with TreeSet<String>. This will eliminate duplicates automatically and store the words in alphabetical order
Edit
If you want to count the unique words, then do the same as above and the desired result is the collection's size. So if you entered the sequence "a a b c ---", the result would be 3, as there are three unique words (a, b and c).
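A minimal sketch of option (b), using the same "---" end marker as the question. The Scanner is passed in as a parameter so that the example can run on a fixed String as well as on System.in; the class and method names are made up for this illustration.

```java
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

class UniqueWordsSketch {
    // Reads words until "---" or end of input and returns them without duplicates, sorted.
    static Set<String> readUnique(Scanner sc) {
        Set<String> words = new TreeSet<>();
        while (sc.hasNext()) {
            String word = sc.next();
            if (word.equals("---")) {
                break; // check the end marker before adding, so "---" is not counted as a word
            }
            words.add(word);
        }
        return words;
    }

    public static void main(String[] args) {
        // Use new Scanner(System.in) instead to read piped input.
        Set<String> words = readUnique(new Scanner("a a b c ---"));
        System.out.println(words);        // prints [a, b, c]
        System.out.println(words.size()); // prints 3
    }
}
```

Note that the end marker is tested before the word is added; in the question's loop the order is reversed, so "---" ends up in the list.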
Instead of ArrayList<String>, use HashSet<String> (not sorted) or TreeSet<String> (sorted) if you don't need a count of how often each word occurs. If you do need the counts, use Hashtable<String,Integer> or the more commonly used HashMap<String,Integer> (not sorted), or TreeMap<String,Integer> (sorted).
If there are words you don't want, place those in a HashSet<String> and check that this doesn't contain the word your Scanner found before placing into your collection. If you only want dictionary words, put your dictionary in a HashSet<String> and check that it contains the word your Scanner found before placing into your collection.
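The filtering idea could look roughly like the sketch below. The list of unwanted words and the names StopWordSketch and keepWanted are assumptions made for the example; for the dictionary variant, the contains() test would simply be inverted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

class StopWordSketch {
    // Collects the unique words of the text, skipping every word found in `unwanted`.
    static Set<String> keepWanted(String text, Set<String> unwanted) {
        Set<String> words = new TreeSet<>();
        try (Scanner sc = new Scanner(text)) {
            while (sc.hasNext()) {
                String word = sc.next();
                if (!unwanted.contains(word)) { // HashSet lookup, no scan through a list
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical list of words to ignore.
        Set<String> unwanted = new HashSet<>(Arrays.asList("the", "and"));
        System.out.println(keepWanted("the cat and the hat", unwanted)); // prints [cat, hat]
    }
}
```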