add file to collections framework from file - java

I am not trying to duplicate threads here.
My problem is that I am piping in a file called amazon.txt from an MS-DOS window.
The file has 637 words in it.
I want a count of the unique words, excluding words like "a", "the" and "this",
which I haven't accounted for yet in the code.
When I add the words to a TreeSet, it only has 8 words.
There should be at least 300 unique words.
count of total file = 637
count2 of treeset = 8
I thought TreeSet handles duplicates? What am I doing wrong?
The file does contain some ints and $ signs.
import java.util.Scanner;
import java.util.ArrayList;
import java.util.TreeSet;
import java.util.Iterator;
import java.util.HashSet;
public class practice1
{
public static void main(String[] args)
{
Scanner sc = new Scanner(System.in);
String word;
//String grab;
int count = 0;
int count2 =0;
int count3 =0;
int count4 =0;
int number;
//ArrayList<String> a = new ArrayList<String>();
TreeSet<String> a = new TreeSet<String>();
while (sc.hasNext())
{
word = sc.next();
count++; // 637 words
a.add(word);
if (word.equals("---"))
{
break;
}
}
Iterator<String> it = a.iterator();
while(it.hasNext())
{
String grab = it.next();
count2++; // 8 words
if (grab.equals("---"))
{
break;
}
}
System.out.println("count2");
System.out.println(count2);
System.out.println("count");
System.out.println(count);
System.out.println("\nbye...");
}
}

Your method for counting the number of entries in the TreeSet is to iterate over the Set and stop counting when you first see the string "---".
This isn't correct. You are probably assuming that TreeSet.iterator() returns the entries in the order in which they were inserted. That isn't the case:
The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
"Natural ordering" here means the results of String.compareTo(String) (since String implements Comparable<String>), which tests for lexicographical order. In other words, the iterator of a TreeSet<String> returns the items in sorted order, character by character, so a token such as "---" that starts with punctuation comes out ahead of every word that starts with a letter.
If you want to know the size of your Set, just use size().
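To make the ordering point concrete, here is a minimal sketch. The class name TreeSetOrderDemo, its helper methods and the sample tokens are made up for illustration; they are not from the question's file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class; names and sample data are assumptions.
class TreeSetOrderDemo {
    // Number of distinct tokens: this is exactly what size() reports.
    static int countDistinct(List<String> tokens) {
        return new TreeSet<>(tokens).size();
    }

    // Mimics the loop from the question: counts elements in sorted order
    // and stops as soon as the "---" marker is reached.
    static int countUntilMarker(List<String> tokens) {
        int seen = 0;
        for (String s : new TreeSet<>(tokens)) {
            seen++;
            if (s.equals("---")) {
                break;
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("the", "$5", "book", "the", "42", "---");
        System.out.println(new TreeSet<>(tokens));     // [$5, ---, 42, book, the]
        System.out.println(countDistinct(tokens));     // 5
        System.out.println(countUntilMarker(tokens));  // 2
    }
}
```

Only tokens that sort before "---" (here just "$5") are counted by the second method, which is consistent with a file containing a handful of "$" amounts producing a count of 8.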

I don't see anywhere where you are adding the word into the TreeSet 'a'.
If I'm just missing that (and I might be) I'd bet the problem is that a TreeSet is not guaranteed to iterate in the order of insertion. That is, you add "---" last but there's no reason it won't come out of the iterator 8th and terminate your program.
So I'd say get rid of the check where you see if the iterator returns "---" and see where that gets you.
Had time to verify, change:
if (grab.equals("---"))
{
break;
}
to:
if (grab.equals("---"))
{
//break;
}
and it works as expected.
Good luck!

There is no need to iterate a 2nd time, just replace 2nd loop with
System.out.println("Treeset.size():" + a.size() );
and do not add "---" to treeset in the first loop (assuming this is some kind of end of file marker)
if (word.equals("---"))
{
break;
}
a.add(word);
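Put together, the suggestion might look like the sketch below. It assumes, as this answer does, that "---" is an end-of-input marker; the class name UniqueWordCount and the method readUniqueWords are hypothetical.

```java
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class; "---" is assumed to be an end-of-input marker.
class UniqueWordCount {
    // Reads whitespace-separated tokens until input ends or "---" is seen.
    static Set<String> readUniqueWords(Scanner sc) {
        Set<String> words = new TreeSet<>();
        while (sc.hasNext()) {
            String word = sc.next();
            if (word.equals("---")) {
                break; // check first, so the marker is never stored
            }
            words.add(word);
        }
        return words;
    }

    public static void main(String[] args) {
        // Reads the piped-in file from standard input.
        Scanner sc = new Scanner(System.in);
        Set<String> words = readUniqueWords(sc);
        System.out.println("Unique words: " + words.size());
    }
}
```

Taking the Scanner as a parameter is only a convenience so the method can also be tried on a String, for example new Scanner("a b a c --- d"), which yields 3 unique words.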

Related

Removing duplicates from an ArrayList?

Good afternoon everyone, I am currently studying for my Java Final and I have a review exercise that asks the reader to create a program that asks the user to input 10 integers and then to use a method to remove duplicates and display the distinct list. The method is provided for you as well.
I've gotten the majority of the code written, in fact I thought I was done until I realized that the for loop is removing more than just duplicates..
Here is my code:
public class lab25 {
public static void main(String[] args) {
// TODO Auto-generated method stub
Scanner input = new Scanner(System.in);
int i;
//Create array list
ArrayList<Integer> numbers = new ArrayList<>();
System.out.println("Please enter 10 numbers!");
//Populate
for(i=0; i<10; i++) {
numbers.add(input.nextInt());
}
System.out.println("Your numbers are: " + numbers.toString());
removeDuplicate(numbers);
System.out.println("The distinct numbers are: " +numbers.toString());
input.close();
}
public static void removeDuplicate(ArrayList<Integer> list) {
int i;
for(i=0; i<list.size(); i++) {
if(list.contains(list.get(i))) {
list.remove(i);
}
}
}
}
Just curious what I have done wrong here? I think my issue might lie in my for loop.. Thanks to all who answer.
list.contains(list.get(i)) always returns true, since the i'th element of the List is contained in the List.
Therefore removeDuplicate is trying to remove all the elements (but you only remove half of them, since after removing the i'th element you skip the new i'th element).
There are many ways to remove duplicates. The most efficient involve using a HashSet. If you want to find duplicates using only List methods, you can check if list.lastIndexOf(list.get(i)) > i.
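As a rough sketch of the lastIndexOf idea, using only List methods: the class name LastIndexDedup is made up, and note that this variant keeps the last occurrence of each value rather than the first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class illustrating the lastIndexOf check.
class LastIndexDedup {
    // Removes duplicates in place, keeping the LAST occurrence of each value.
    static void removeDuplicates(List<Integer> list) {
        int i = 0;
        while (i < list.size()) {
            if (list.lastIndexOf(list.get(i)) > i) {
                // A later copy exists, so drop this one. Do not advance i:
                // the next element has just shifted into position i.
                list.remove(i);
            } else {
                i++;
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 1, 3, 2));
        removeDuplicates(numbers);
        System.out.println(numbers); // [1, 3, 2]
    }
}
```

Advancing i only when nothing was removed avoids the skipped-element problem described above. Each lastIndexOf call scans the list, so this is fine for 10 numbers but slow for large lists.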
The expression list.contains(list.get(i)) is always true, since you're asking if the list contains some element from the list. You need to check whether list.get(i) already appears among the elements before index i (indices 0 to i-1), which I recommend doing with a loop.
Be aware that a loop with list.remove will run slowly, since removing item i from an ArrayList is done by replacing item i with i+1, then replacing item i+1 with i+2 and so on. This means it takes around length^2 time to make a loop that calls remove in every iteration. The function list.contains has the same problem, as it has to go through the entire list. This may not matter if you have 10 items, but if you had a list with a million items, it would take a long time to run.
The easiest way is to use Stream.distinct():
public static List<Integer> removeDuplicate(List<Integer> list) {
return list.stream().distinct().collect(Collectors.toList());
}
If you are free to choose the collection, you should use a LinkedHashSet instead. It holds unique elements in insertion order.
A solution could be this one. I start at the end of the list so that I don't remove indexes the loop still has to visit.
public static void removeDuplicate(ArrayList<Integer> list) {
int i = list.size() - 1;
while (i > -1) {
// check for duplicate
for (int j = 0; j < i; j++) {
if (list.get(i).equals(list.get(j))) { // equals(), because == compares Integer references
// is duplicate: remove
list.remove(i);
break;
}
}
i--;
}
}
You are calling list.contains(list.get(i)), and list.get(i) is of course always present in the list, so the condition is always true and you remove values that are not duplicates.
You could remove the duplicates by using a set:
Set<Integer> hs = new HashSet<>();
hs.addAll(numbers);
numbers.clear();
numbers.addAll(hs);
A HashSet does not preserve order. If you want to keep the current order, go through a LinkedHashSet instead:
List<Integer> notDuplicatedList =
new ArrayList<>(new LinkedHashSet<>(numbers));

Why isn't my program removing "all"?

My problem is, when I output this code, it's not outputting what I want which is to remove the "all". It outputs the same exact thing the first print statement did.
Here's my code:
// RemoveAll
// Spec: To remove the "all"
// ArrayList remove() exercise
import java.util.ArrayList;
public class RemoveAll
{
public static void main(String args[])
{
ArrayList<String> ray;
ray = new ArrayList<String>();
int spot = ray.size() - 1;
ray.add("all");
ray.add("all");
ray.add("fun");
ray.add("dog");
ray.add("bat");
ray.add("cat");
ray.add("all");
ray.add("dog");
ray.add("all");
ray.add("all");
System.out.println(ray);
System.out.println(ray.size());
// add in a loop to remove all occurrences of all
while (spot >= 0)
{
if (ray.get(spot).equalsIgnoreCase("all"))
{
ray.remove(spot);
}
spot = spot - 1;
}
System.out.println("\n" + ray);
System.out.println(ray.size());
}
}
Any ideas?
You are determining size() before filling the list, so spot is -1 and the loop never runs.
Put this line after the list has been filled (i.e. after all the add() calls):
int spot = ray.size() - 1;
Another way to remove items from the list is to use an Iterator:
for(Iterator<String> i = ray.iterator(); i.hasNext(); ) {
if(i.next().equalsIgnoreCase("all")) {
i.remove();
}
}
That way you don't have to keep track of where you are in the list with respect to removed items.
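On Java 8 or later, Collection.removeIf does the same iterator-based removal in a single call. This is an alternative to the loop above, not something the exercise asks for; the class name RemoveAllDemo and the method withoutAll are made up for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class; removeIf requires Java 8 or later.
class RemoveAllDemo {
    // Returns a copy of the list with every "all" (any case) removed.
    static List<String> withoutAll(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.removeIf(w -> w.equalsIgnoreCase("all"));
        return copy;
    }

    public static void main(String[] args) {
        List<String> ray = Arrays.asList(
                "all", "all", "fun", "dog", "bat", "cat", "all", "dog", "all", "all");
        System.out.println(withoutAll(ray)); // [fun, dog, bat, cat, dog]
    }
}
```

The copy is only there so the sample input stays untouched; calling ray.removeIf(...) directly on a mutable ArrayList works just as well.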
Two problems. You are setting spot before the list has any values in it, so it will have a value of -1 when you get to
while (spot >= 0)
Also, you are modifying the list while you are walking over it by index, which is easy to get wrong (removing while looping forward, for example, skips elements). The usual way to do this is with an iterator:
Iterator<String> iter = ray.iterator();
while (iter.hasNext()) {
String cur = iter.next();
// logic to determine whether cur needs to be removed
if (cur.equalsIgnoreCase("all")) {
iter.remove();
}
}

Finding subsets of size k for a given set of size n

I am trying to work out the solution to the above problem and I came up with this
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;
public class Subset_K {
public static void main(String[]args)
{
Set<String> x;
int n=4;
int k=2;
int arr[]={1,2,3,4};
StringBuilder sb=new StringBuilder();
for(int i=1;i<=(n-k);i++)
sb.append("0");
for(int i=1;i<=k;i++)
sb.append("1");
String bin=sb.toString();
x=generatePerm(bin);
Set<ArrayList <Integer>> outer=new HashSet<ArrayList <Integer>>();
for(String s:x){
int dec=Integer.parseInt(s,2);
ArrayList<Integer> inner=new ArrayList<Integer>();
for(int j=0;j<n;j++){
if((dec&(1<<j))>0)
inner.add(arr[j]);
}
outer.add(inner);
}
for(ArrayList<Integer> z:outer){
System.out.println(z);
}
}
public static Set<String> generatePerm(String input)
{
Set<String> set = new HashSet<String>();
if (input == "")
return set;
Character a = input.charAt(0);
if (input.length() > 1)
{
input = input.substring(1);
Set<String> permSet = generatePerm(input);
for (String x : permSet)
{
for (int i = 0; i <= x.length(); i++)
{
set.add(x.substring(0, i) + a + x.substring(i));
}
}
}
else
{
set.add(a + "");
}
return set;
}
}
I am working on a 4-element set for test purposes, with k=2. What I try to do is first generate a binary string in which k bits are set and n-k bits are not set. Then I find all the possible permutations of this string, and for each permutation I output the corresponding elements of the set. I can't figure out the complexity of this code because I took the generatePerm method from someone else. Can someone help me with the time complexity of the generatePerm method and also the overall time complexity of my solution? I found another, recursive implementation of this problem here: Find all subsets of length k in an array. However, I can't figure out the complexity of that either, so I need some help there.
Also, I was trying to refactor my code so that it works not just for integers but for all types of data. I don't have much experience with generics, so when I try to change ArrayList<Integer> to ArrayList<?> in line 21, Eclipse says
Cannot instantiate the type ArrayList<?>
How do I correct that?
You can use ArrayList<Object> throughout. That will accept any kind of object. If you want a specific type that is determined by the calling code, you will need to introduce a generic type parameter.
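As an illustration of the generic type parameter option, here is a minimal sketch. It swaps in plain recursive backtracking instead of the bit-string permutation approach from the question, and the class name KSubsets and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class; uses backtracking, not the question's bit-string method.
class KSubsets {
    // Returns every subset with exactly k elements, for any element type T.
    static <T> List<List<T>> subsets(List<T> items, int k) {
        List<List<T>> result = new ArrayList<List<T>>();
        collect(items, k, 0, new ArrayList<T>(), result);
        return result;
    }

    private static <T> void collect(List<T> items, int k, int start,
                                    List<T> current, List<List<T>> result) {
        if (current.size() == k) {
            result.add(new ArrayList<T>(current)); // copy, because current is reused
            return;
        }
        for (int i = start; i < items.size(); i++) {
            current.add(items.get(i));             // choose items[i]
            collect(items, k, i + 1, current, result);
            current.remove(current.size() - 1);    // backtrack: undo the choice
        }
    }

    public static void main(String[] args) {
        System.out.println(subsets(Arrays.asList(1, 2, 3, 4), 2));
        // [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
        System.out.println(subsets(Arrays.asList("a", "b", "c"), 2));
        // [[a, b], [a, c], [b, c]]
    }
}
```

The type parameter T is fixed by the caller, so the same method handles Integer, String or any other element type without casts, and new ArrayList<T>() is legal where new ArrayList<?>() is not.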
Note that in your generatePerm method, you should not use the test
if (input == "")
Instead, you should use:
if ("".equals(input))
Your current code will only succeed if input is the interned string "". It will not work, for instance, if input is computed as a substring() with zero length. In general you should always compare strings with .equals() rather than with == (except under very specific conditions when you are looking for object identity rather than object equality).
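A small sketch of the difference, with a made-up class name StringEqualityDemo. It uses new String("") because that always creates a distinct object, which makes the result independent of how a particular JDK implements substring().

```java
// Hypothetical demo class for == versus equals() on strings.
class StringEqualityDemo {
    // Identity: true only if both references point to the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // Equality: true if both strings hold the same characters.
    static boolean sameText(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "";
        String built = new String(""); // always a new, non-interned object
        System.out.println(sameObject(built, literal)); // false
        System.out.println(sameText(built, literal));   // true
        System.out.println(built.isEmpty());            // true
    }
}
```

For this particular test, input.isEmpty() is another way to write the check when input is known to be non-null.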

How to Count Unique Values in an ArrayList?

I have to count the number of unique words from a text document using Java. First I had to get rid of the punctuation in all of the words. I used the Scanner class to scan each word in the document and put in an String ArrayList.
So, the next step is where I'm having the problem! How do I create a method that can count the number of unique Strings in the array?
For example, if the array contains apple, bob, apple, jim, bob; the number of unique values in this array is 3.
public countWords() {
try {
Scanner scan = new Scanner(in);
while (scan.hasNext()) {
String words = scan.next();
if (words.contains(".")) {
words.replace(".", "");
}
if (words.contains("!")) {
words.replace("!", "");
}
if (words.contains(":")) {
words.replace(":", "");
}
if (words.contains(",")) {
words.replace(",", "");
}
if (words.contains("'")) {
words.replace("?", "");
}
if (words.contains("-")) {
words.replace("-", "");
}
if (words.contains("‘")) {
words.replace("‘", "");
}
wordStore.add(words.toLowerCase());
}
} catch (FileNotFoundException e) {
System.out.println("File Not Found");
}
System.out.println("The total number of words is: " + wordStore.size());
}
Are you allowed to use Set? If so, a HashSet may solve your problem. HashSet doesn't accept duplicates.
HashSet<String> noDupSet = new HashSet<>();
noDupSet.add(yourString);
noDupSet.size();
The size() method returns the number of unique words.
If you really have to use only ArrayList, then one way to achieve this may be:
1) Create a temp ArrayList
2) Iterate original list and retrieve element
3) If tempArrayList doesn't contain element, add element to tempArrayList
Starting from Java 8 you can use Stream:
After you add the elements in your ArrayList:
long n = wordStore.stream().distinct().count();
It converts your ArrayList to a stream and then it counts only the distinct elements.
I would advise using a HashSet. It automatically filters out duplicates when you call the add method.
Although I believe a set is the easiest solution, you can still use your original solution and just add an if statement to check if value already exists in the list before you do your add.
if (!wordStore.contains(words.toLowerCase()))
wordStore.add(words.toLowerCase());
Then the number of words in your list is the total number of unique words (i.e. wordStore.size()).
This general-purpose solution takes advantage of the fact that the Set abstract data type does not allow duplicates. The Set.add() method is specifically useful in that it returns a boolean flag indicating whether the element was actually added. A HashMap is used to track the number of occurrences of each original element. This algorithm can be adapted for variations of this type of problem. The solution runs in O(n) time.
public static void main(String args[])
{
String[] strArray = {"abc", "def", "mno", "xyz", "pqr", "xyz", "def"};
System.out.printf("RAW: %s ; PROCESSED: %s \n",Arrays.toString(strArray), duplicates(strArray).toString());
}
public static HashMap<String, Integer> duplicates(String arr[])
{
HashSet<String> distinctKeySet = new HashSet<String>();
HashMap<String, Integer> keyCountMap = new HashMap<String, Integer>();
for(int i = 0; i < arr.length; i++)
{
if(distinctKeySet.add(arr[i]))
keyCountMap.put(arr[i], 1); // unique value or first occurrence
else
keyCountMap.put(arr[i], (Integer)(keyCountMap.get(arr[i])) + 1);
}
return keyCountMap;
}
RESULTS:
RAW: [abc, def, mno, xyz, pqr, xyz, def] ; PROCESSED: {pqr=1, abc=1, def=2, xyz=2, mno=1}
You can create a Hashtable or HashMap as well. The keys would be your input strings and each value would be the number of times that string occurs in your input array. O(N) time and space.
Solution 2:
Sort the input list.
Similar strings would be next to each other.
Compare list(i) to list(i+1) and count the number of duplicates.
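Solution 2 could be sketched as follows. The class name SortedUniqueCount is hypothetical, and the sketch counts distinct values (each run of equal neighbours counts once) rather than duplicates, since that is what the question asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical class for the sort-then-compare-neighbours approach.
class SortedUniqueCount {
    static int countUnique(List<String> words) {
        if (words.isEmpty()) {
            return 0;
        }
        List<String> sorted = new ArrayList<>(words); // leave the input untouched
        Collections.sort(sorted);                     // equal strings become adjacent
        int unique = 1;                               // the first element starts a run
        for (int i = 1; i < sorted.size(); i++) {
            if (!sorted.get(i).equals(sorted.get(i - 1))) {
                unique++;                             // a new run begins here
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "bob", "apple", "jim", "bob");
        System.out.println(countUnique(words)); // 3
    }
}
```

Sorting costs O(n log n), so this is slower than the hash-based approach for large inputs, but it needs no Set or Map.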
As a shorthand, you can do it as follows:
ArrayList<String> duplicateList = new ArrayList<String>();
duplicateList.add("one");
duplicateList.add("two");
duplicateList.add("one");
duplicateList.add("three");
System.out.println(duplicateList); // prints [one, two, one, three]
HashSet<String> uniqueSet = new HashSet<String>();
uniqueSet.addAll(duplicateList);
System.out.println(uniqueSet); // prints [two, one, three]
duplicateList.clear();
System.out.println(duplicateList);// prints []
duplicateList.addAll(uniqueSet);
System.out.println(duplicateList);// prints [two, one, three]
public class UniqueinArrayList {
public static void main(String[] args) {
StringBuffer sb=new StringBuffer();
List al=new ArrayList();
al.add("Stack");
al.add("Stack");
al.add("over");
al.add("over");
al.add("flow");
al.add("flow");
System.out.println(al);
Set s=new LinkedHashSet(al);
System.out.println(s);
Iterator itr=s.iterator();
while(itr.hasNext()){
sb.append(itr.next()+" ");
}
System.out.println(sb.toString().trim());
}
}
3 distinct possible solutions:
Use HashSet as suggested above.
Create a temporary ArrayList and store only unique element like below:
public static int getUniqueElement(List<String> data) {
List<String> newList = new ArrayList<>();
for (String eachWord : data)
if (!newList.contains(eachWord))
newList.add(eachWord);
return newList.size();
}
Java 8 solution
long count = data.stream().distinct().count();

Find unique words in a file - Java

Using a msdos window I am piping in an amazon.txt file.
I am trying to use the collections framework. Keep in mind I want to keep this
as simple as possible.
What I want to do is count all the unique words in the file... with no duplicates.
This is what I have so far. Please be kind this is my first java project.
import java.util.Scanner;
import java.util.ArrayList;
import java.util.Iterator;
public class project1 {
// ArrayList<String> a = new ArrayList<String>();
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
String word;
String grab;
int count = 0;
ArrayList<String> a = new ArrayList<String>();
// Iterator<String> it = a.iterator();
System.out.println("Java project\n");
while (sc.hasNext()) {
word = sc.next();
a.add(word);
if (word.equals("---")) {
break;
}
}
Iterator<String> it = a.iterator();
while (it.hasNext()) {
grab = it.next();
if (grab.contains("a")) {
System.out.println(it.next()); // Just a check to see
count++;
}
}
System.out.println("I counted abc = ");
System.out.println(count);
System.out.println("\nbye...");
}
}
In your version, the word list a will contain all the words, but duplicates as well. You can either
(a) check for every new word, if it is already included in the list (List#contains is the method you should call), or, the recommended solution
(b) replace ArrayList<String> with TreeSet<String>. This will eliminate duplicates automatically and store the words in alphabetical order
Edit
If you want to count the unique words, then do the same as above and the desired result is the collection's size. So if you entered the sequence "a a b c ---", the result would be 3, as there are three unique words (a, b and c).
Instead of ArrayList<String>, use HashSet<String> (not sorted) or TreeSet<String> (sorted) if you don't need a count of how often each word occurs, Hashtable<String,Integer> (not sorted) or TreeMap<String,Integer> (sorted) if you do.
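For the case where you do need a count per word, a minimal sketch with a TreeMap could look like this. The class name WordFrequency is made up, and Map.merge requires Java 8 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical class; Map.merge requires Java 8 or later.
class WordFrequency {
    // Maps each word to how often it occurs, with keys in sorted order.
    static Map<String, Integer> count(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum); // insert 1, or add 1 to the old value
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(Arrays.asList("a", "a", "b", "c"));
        System.out.println(counts);        // {a=2, b=1, c=1}
        System.out.println(counts.size()); // 3 unique words
    }
}
```

The number of unique words is still just the size of the map, so the same structure answers both questions.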
If there are words you don't want, place those in a HashSet<String> and check that this doesn't contain the word your Scanner found before placing into your collection. If you only want dictionary words, put your dictionary in a HashSet<String> and check that it contains the word your Scanner found before placing into your collection.
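The unwanted-words check could be sketched like this. The class name StopWordFilter and the three-word list are placeholders, and lowercasing each token before the lookup is an assumption on my part, not something the question specifies.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class; the stop-word list is a placeholder.
class StopWordFilter {
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "the", "this"));

    // Collects the unique words, skipping anything in STOP_WORDS.
    static Set<String> uniqueWords(List<String> tokens) {
        Set<String> unique = new TreeSet<>();
        for (String token : tokens) {
            String word = token.toLowerCase(); // assumption: ignore case
            if (!STOP_WORDS.contains(word)) {
                unique.add(word);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "book", "a", "this", "Book", "price");
        System.out.println(uniqueWords(tokens)); // [book, price]
    }
}
```

For the dictionary variant, the test is simply inverted: add the word only if the dictionary set contains it.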
