This question already has answers here:
How to remove duplicates from a list?
(15 answers)
Closed 8 years ago.
I have a text file that contains:
File1.txt
File2.doc
File3.out
File4.txt
File5.so
File6.dll
I'm trying to get the output to return just the extension name and how many times it has occurred in the text file.
So for this specific file the output should return:
txt 2
doc 1
out 1
so 1
dll 1
The counts need to be printed by the Java program, not written into the file itself.
I have this so far:
import java.util.*;
import java.io.*;

public class prob03 {
    public static void main(String[] args) throws Exception {
        Scanner input = new Scanner(new File("prob03.txt"));
        ArrayList<String> extensionsArray = new ArrayList<String>();
        while (input.hasNextLine()) {
            String line = input.nextLine();
            String[] parts = line.split("\\.");
            String part1 = parts[0];
            String extension = parts[1];
            extensionsArray.add(extension);
        }
        for (int i = 0; i < extensionsArray.size(); i++) {
            for (int j = 0; j < i; j++) {
                if (extensionsArray.get(i).equals(extensionsArray.get(j))) {
                }
            }
        }
    }
}
Please help me figure out this problem. I'm not trying to copy; this is not homework or any kind of assignment. Let me know if I'm going in the right direction. Thank you.
If you really want to be lazy, you can do something like:
int dupeCount = extensionsArray.size() - new HashSet<String>(extensionsArray).size();
This basically subtracts the number of unique elements from the total number of elements (Sets don't allow duplicates), giving you the number of elements that were duplicates.
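For example, with the extensions from the file in the question (a quick sanity check, not part of the original answer):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class DupeCountDemo {
    public static void main(String[] args) {
        // The extensions parsed from the example file in the question
        List<String> extensionsArray = new ArrayList<>(
                Arrays.asList("txt", "doc", "out", "txt", "so", "dll"));
        // 6 total elements - 5 unique elements = 1 duplicate
        int dupeCount = extensionsArray.size() - new HashSet<String>(extensionsArray).size();
        System.out.println(dupeCount); // prints 1
    }
}
```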
You can use a HashMap, where the key is the file extension and the value is the count of times you found it.
HashMap<String, Integer> extensionsCount = new HashMap<>();
for (String extension : extensionsArray) {
    Integer count = extensionsCount.get(extension);
    if (count == null) {
        count = 0;
    }
    count++;
    extensionsCount.put(extension, count);
}
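On Java 8 and newer, the null check in that loop can be replaced with Map.merge; this is an equivalent alternative sketch, not the original answer's code:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;

public class ExtensionCount {
    public static void main(String[] args) {
        // Extensions as parsed from the question's example file
        List<String> extensionsArray = Arrays.asList("txt", "doc", "out", "txt", "so", "dll");
        HashMap<String, Integer> extensionsCount = new HashMap<>();
        for (String extension : extensionsArray) {
            // merge inserts 1 for a new key, otherwise adds 1 to the existing count
            extensionsCount.merge(extension, 1, Integer::sum);
        }
        System.out.println(extensionsCount.get("txt")); // prints 2
    }
}
```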
Use a HashMap:
ArrayList<String> files; // <-- your list of filenames
HashMap<String, Integer> extensions = new HashMap<String, Integer>();
for (String filename : files) {
    String ext = filename.split("\\.")[1]; // split takes a regex, so the dot must be escaped
    if (extensions.containsKey(ext)) {
        Integer count = extensions.get(ext);
        count++;
        extensions.put(ext, count);
    } else {
        extensions.put(ext, 1);
    }
}
for (String ext : extensions.keySet()) {
    System.out.println(ext + ", " + extensions.get(ext));
}
This will give you the counts of ALL file extensions.
I currently have the words read from a text file into a String ArrayList. My assignment asks me not to use any HashMaps or HashSets, or anything of that nature. While counting the occurrences of each word I also have to remove any extra punctuation (, . : [ ] ; = -) and any duplicates of the same word. I'm currently having trouble with removing the punctuation and the duplicates; any help is appreciated (beginner at Java). I'm unable to use split.
Here is my code:
public static void main(String[] args) throws FileNotFoundException, IOException {
    // Create input Scanner
    FileInputStream file = new FileInputStream("Assignment1BData.txt");
    Scanner input = new Scanner(file);
    // Create the ArrayLists
    ArrayList<String> wordCount = new ArrayList<String>();
    ArrayList<Integer> numCount = new ArrayList<Integer>();
    // Read through the file and find the words from text
    while (input.hasNext()) {
        String word = input.next();
        // Create index to look through lines of text
        if (wordCount.contains(word)) {
            int index = numCount.indexOf(word);
            numCount.set(index, numCount.get(index) + 1);
        } else {
            wordCount.add(word);
            numCount.add(1);
        }
    }
    input.close();
    file.close();
    // Print output in for loop
    for (int i = 0; i < wordCount.size(); i++) {
        System.out.println(wordCount.get(i) + " = " + numCount.get(i));
    }
}
The indexOf(Object o) method on the ArrayList should solve the problem of duplicates. Before you add a string to the ArrayList, call this method: if the string already exists in the ArrayList it returns its index, otherwise it returns -1. Only add the string read from the text file when indexOf returns -1; otherwise simply skip it (since it already exists).
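That suggestion could be sketched like this, using the asker's two parallel lists (the variable names and sample input are mine, for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexOfCount {
    public static void main(String[] args) {
        // Stand-in for the words read from the file
        List<String> input = Arrays.asList("this", "is", "is", "a", "test", "is");
        ArrayList<String> words = new ArrayList<>();
        ArrayList<Integer> counts = new ArrayList<>();
        for (String word : input) {
            int index = words.indexOf(word); // -1 when the word is new
            if (index == -1) {
                words.add(word);
                counts.add(1);
            } else {
                counts.set(index, counts.get(index) + 1);
            }
        }
        System.out.println(words);  // [this, is, a, test]
        System.out.println(counts); // [1, 3, 1, 1]
    }
}
```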
You can use something like newList = removeDuplicates(yourList), so that the duplicates get removed and you get a new list.
Got it here: https://www.geeksforgeeks.org/how-to-remove-duplicates-from-arraylist-in-java/
You can try creating a list of unique elements using the Stream API:
List<Integer> listWithDuplicates = Arrays.asList(5, 0, 3, 1, 2, 3, 0, 0);
List<Integer> listWithoutDuplicates = listWithDuplicates.stream()
        .distinct()
        .collect(Collectors.toList());
Then iterate over the original list (listWithDuplicates), compare each element against listWithoutDuplicates, and count the matches.
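That comparison step could be sketched with Collections.frequency (my choice of helper, not specified in the answer above):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class DistinctCount {
    public static void main(String[] args) {
        List<Integer> listWithDuplicates = Arrays.asList(5, 0, 3, 1, 2, 3, 0, 0);
        List<Integer> listWithoutDuplicates = listWithDuplicates.stream()
                .distinct()
                .collect(Collectors.toList());
        // For each distinct value, count its occurrences in the original list
        for (Integer value : listWithoutDuplicates) {
            int count = Collections.frequency(listWithDuplicates, value);
            System.out.println(value + " occurs " + count + " time(s)");
        }
    }
}
```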
Try this for replacing the special symbols before doing the lookup and the count.
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;

public class WordCounter {
    public static void main(String[] args) throws IOException {
        // Create input Scanner
        FileInputStream file = new FileInputStream("/tmp/Assignment1BData.txt");
        Scanner input = new Scanner(file);
        // Create the ArrayLists
        ArrayList<String> wordCount = new ArrayList<String>();
        ArrayList<Integer> numCount = new ArrayList<Integer>();
        // Read through the file word by word
        while (input.hasNext()) {
            String word = input.next();
            // Strip the punctuation; '-' goes last in the class so it isn't read as a range
            word = word.replaceAll("[,.:;=\\[\\]-]", "");
            // Look the word up among the words seen so far
            if (wordCount.contains(word)) {
                int index = wordCount.indexOf(word);
                numCount.set(index, numCount.get(index) + 1);
            } else {
                wordCount.add(word);
                numCount.add(1);
            }
        }
        input.close();
        file.close();
        // Print output in a for loop
        for (int i = 0; i < wordCount.size(); i++) {
            System.out.println(wordCount.get(i) + " = " + numCount.get(i));
        }
    }
}
I have a method that takes in an ArrayList of strings, with each element in the list being a variation of:
>AX018718 Equine influenza virus H3N8 // 4 (HA)
CAAAAGCAGGGTGACAAAAACATGATGGATTCCAACACTGTGTCAAGCTTTCAGGTAGACTGTTTTCTTT
GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
This method breaks each element down into the Acc, which is AX018718 in this case, and the seq, which is the two lines following the Acc.
This is then checked against another ArrayList of strings called pal, to see if any of the substrings match: [AAAATTTT, AAACGTTT, AAATATATTT]
I am able to get all of the matches for the different elements of the first list outputted as:
AATATATT in organism: AX225014 Was found in position: 15 and at 15
AATATT in organism: AX225014 Was found in position: 1432 and at 1432
AATATT in organism: AX225016 Was found in position: 1404 and at 1404
AATT in organism: AX225016 Was found in position: 169 and at 2205
Is it possible to check, across all of the outputted information, whether several Accs match the same pal?
In the case above, the wanted output would be:
AATATT was found in all of the Acc.
my working code:
public static ArrayList<String> PB2Scan(ArrayList<String> Pal) throws FileNotFoundException, IOException {
    ArrayList<String> PalindromesSpotted = new ArrayList<String>();
    File file = new File("IAV_PB2_32640.txt");
    Scanner sc = new Scanner(file);
    sc.useDelimiter(">");
    // initializes the ArrayList
    ArrayList<String> Gene1 = new ArrayList<String>();
    // initializes the writer
    FileWriter fileWriter = new FileWriter("PB2out");
    PrintWriter printwriter = new PrintWriter(fileWriter);
    // Loads the ArrayList
    while (sc.hasNext()) Gene1.add(sc.next());
    for (int i = 0; i < Gene1.size(); i++) {
        // Acc breaks down the title so the element:
        // >AX225014 Equine influenza virus H3N8 // 1 (PB2)
        // ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
        // GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
        // comes out as AX225014
        String Acc = Accession(Gene1.get(i));
        // seq takes the same element as above and returns only
        // ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
        // GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
        String seq = trimHeader(Gene1.get(i));
        for (int x = 0; x < Pal.size(); x++) {
            if (seq.contains(Pal.get(x))) {
                String match = (Pal.get(x) + " in organism: " + Acc + " Was found in position: "
                        + seq.indexOf(Pal.get(x)) + " and at " + seq.lastIndexOf(Pal.get(x)));
                printwriter.println(match);
                PalindromesSpotted.add(match);
            }
        }
    }
    Collections.sort(PalindromesSpotted);
    return PalindromesSpotted;
}
First off, your code won't write anything to the log file, since you don't close your writers or at the very least flush the PrintWriter. As a matter of fact, you don't close your reader either. You really should close your Readers and Writers to free resources. Food for thought.
You can make your PB2Scan() method return either a simple result list as it does now, a result list of just the Accs which contain the same Pal(s), or both, where the simple result list is logged and a list of Accs containing the same Pal(s) is appended to the end of it.
Some additional code and an additional integer parameter for the PB2Scan() method would do this. For the additional parameter you might want to add something like this:
public static ArrayList<String> PB2Scan(ArrayList<String> Pal, int resultType)
throws FileNotFoundException, IOException
{ .... }
Where the integer resultType argument would take one of three integer values from 0 to 2:
0 - Simple result list as the code currently does now;
1 - Acc's that match Pal's;
2 - Simple result list and Acc's that Match Pal's at the end of result list.
You should also really pass the file to read as an argument to the PB2Scan() method, since the file could very easily have a different name the next go-around. This makes the method more versatile than if the file name were hard-coded.
public static ArrayList<String> PB2Scan(String filePath, ArrayList<String> Pal, int resultType)
throws FileNotFoundException, IOException { .... }
The method can always write the same output file, since that best suits the method it came from.
Using the above concept, rather than writing to the output file (PB2Out.txt) while the PalindromesSpotted ArrayList is being created, I think it's best to write the file after your ArrayList or ArrayLists are complete. To do this, another method (writeListToFile()) is best suited to carry out the task. To find out whether any Pal matches more than one Acc, it's again a good idea to have yet another method (getPalMatches()) do that task.
Since the index locations of multiple occurrences of a given Pal in any given Seq were not being reported properly either, I have provided yet another method (findSubstringIndexes()) to take care of that task.
It should be noted that the code below assumes that the Seq acquired from the trimHeader() method is all one single String with no Line Break characters within it.
The reworked PB2Scan() method and the other above mentioned methods are listed below:
The PB2Scan() Method:
public static ArrayList<String> PB2Scan(String filePath, ArrayList<String> Pal, int resultType)
        throws FileNotFoundException, IOException {
    // Make sure the supplied result type is either
    // 0, 1, or 2. If not, default to 0.
    if (resultType < 0 || resultType > 2) {
        resultType = 0;
    }
    ArrayList<String> PalindromesSpotted = new ArrayList<>();
    File file = new File(filePath);
    Scanner sc = new Scanner(file);
    sc.useDelimiter(">");
    // initializes the ArrayList
    ArrayList<String> Gene1 = new ArrayList<>();
    // Loads the ArrayList
    while (sc.hasNext()) {
        Gene1.add(sc.next());
    }
    sc.close(); // Close the read-in text file.
    for (int i = 0; i < Gene1.size(); i++) {
        // Acc breaks down the title so the element:
        // >AX225014 Equine influenza virus H3N8 // 1 (PB2)
        // ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
        // GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
        // comes out as AX225014
        String Acc = Accession(Gene1.get(i));
        // seq takes the same element as above and returns only
        // ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
        // GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
        String seq = trimHeader(Gene1.get(i));
        for (int x = 0; x < Pal.size(); x++) {
            if (seq.contains(Pal.get(x))) {
                String match = Pal.get(x) + " in organism: " + Acc +
                        " Was found in position(s): " +
                        findSubstringIndexes(seq, Pal.get(x));
                PalindromesSpotted.add(match);
            }
        }
    }
    // If there is nothing to work with, get outta here.
    if (PalindromesSpotted.isEmpty()) {
        return PalindromesSpotted;
    }
    // Sort the ArrayList
    Collections.sort(PalindromesSpotted);
    // Another ArrayList for matching Pals to Accs
    ArrayList<String> accMatchingPal = new ArrayList<>();
    switch (resultType) {
        case 0: // resultType 0 supplied
            writeListToFile("PB2Out.txt", PalindromesSpotted);
            return PalindromesSpotted;
        case 1: // resultType 1 supplied
            accMatchingPal = getPalMatches(PalindromesSpotted);
            writeListToFile("PB2Out.txt", accMatchingPal);
            return accMatchingPal;
        default: // resultType 2 supplied
            accMatchingPal = getPalMatches(PalindromesSpotted);
            ArrayList<String> fullList = new ArrayList<>();
            fullList.addAll(PalindromesSpotted);
            // Create an underline made of = signs in the list.
            fullList.add(String.join("", Collections.nCopies(70, "=")));
            fullList.addAll(accMatchingPal);
            writeListToFile("PB2Out.txt", fullList);
            return fullList;
    }
}
The findSubstringIndexes() Method:
private static String findSubstringIndexes(String inputString, String stringToFind) {
    String indexes = "";
    int index = inputString.indexOf(stringToFind);
    while (index >= 0) {
        indexes += (indexes.equals("")) ? String.valueOf(index) : ", " + String.valueOf(index);
        index = inputString.indexOf(stringToFind, index + stringToFind.length());
    }
    return indexes;
}
The getPalMatches() Method:
private static ArrayList<String> getPalMatches(ArrayList<String> Palindromes) {
    ArrayList<String> accMatching = new ArrayList<>();
    for (int i = 0; i < Palindromes.size(); i++) {
        String matches = "";
        String[] split1 = Palindromes.get(i).split("\\s+");
        String pal1 = split1[0];
        // Make sure the current Pal hasn't already been listed.
        boolean alreadyListed = false;
        for (int there = 0; there < accMatching.size(); there++) {
            String[] th = accMatching.get(there).split("\\s+");
            if (th[0].equals(pal1)) {
                alreadyListed = true;
                break;
            }
        }
        if (alreadyListed) {
            continue;
        }
        for (int j = 0; j < Palindromes.size(); j++) {
            String[] split2 = Palindromes.get(j).split("\\s+");
            String pal2 = split2[0];
            if (pal1.equals(pal2)) {
                // Using the ternary operator to build the matches string
                matches += (matches.equals("")) ? pal1 + " was found in the following Accessions: "
                        + split2[3] : ", " + split2[3];
            }
        }
        if (!matches.equals("")) {
            accMatching.add(matches);
        }
    }
    return accMatching;
}
The writeListToFile() Method:
private static void writeListToFile(String filePath, ArrayList<String> list, boolean... appendToFile) {
    boolean appendFile = false;
    if (appendToFile.length > 0) {
        appendFile = appendToFile[0];
    }
    try (BufferedWriter bw = new BufferedWriter(new FileWriter(filePath, appendFile))) {
        for (int i = 0; i < list.size(); i++) {
            bw.append(list.get(i) + System.lineSeparator());
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
You should probably create a Map<String, List<String>> with each gene as the key and the Pals it contains as the value.
Map<String, List<String>> result = new HashMap<>();
for (String gene : Gene1) {
    List<String> list = new ArrayList<>();
    result.put(gene, list);
    String seq = trimHeader(gene);
    for (String pal : Pal) {
        // record every Pal that occurs in this gene's sequence
        if (seq.contains(pal)) {
            list.add(pal);
        }
    }
}
Now you have a Map that you can query for the Pals every Gene contains:
List<String> containedPals = result.get(gene);
This is a very reasonable result type for a function like this. What you do afterwards (i.e. the writing to a file) is better done in another function (one that calls this one).
So, this is probably what you want to do:
List<String> genes = loadGenes(geneFile);
List<String> pals = loadPal(palFile);
Map<String, List<String>> genesToContainedPal = methodAbove(genes, pals);
switch (resultType) {
    // ...
}
Newbie to coding and Java, please be kind :)
I'm working on a project for school and I'm trying to iterate over an ArrayList that I read in from a text file.
I read the file into an ArrayList using a Scanner, then sort the ArrayList using Collections.sort() so that I can check each element against the next one. If an element is the same as the next one, ignore it and continue; but if the element is not duplicated in the ArrayList, add it to a new ArrayList.
So, when reading in a text file that has these words:
this this is a a sentence sentence that does not not make sense a sentence not sentence not really really why not this a sentence not sentence a this really why
the new ArrayList should be
is that does make sense
because those words only appear once.
public static void main(String[] args) throws FileNotFoundException {
    Scanner fileIn = new Scanner(new File("words.txt"));
    ArrayList<String> uniqueArrList = new ArrayList<String>();
    ArrayList<String> tempArrList = new ArrayList<String>();
    while (fileIn.hasNext()) {
        tempArrList.add(fileIn.next());
        Collections.sort(tempArrList);
    }
    for (String s : tempArrList) {
        if (!uniqueArrList.contains(s))
            uniqueArrList.add(s);
        else if (uniqueArrList.contains(s))
            uniqueArrList.remove(s);
        Collections.sort(uniqueArrList);
        System.out.println(uniqueArrList);
    }
}
This is what I have so far but I keep ending up with this [a, does, is, make, really, sense, that]
I hope someone can tell me what I'm doing wrong :)
Your algorithm is not correct, because it keeps adding and removing items from uniqueArrList. As a result, it finds words that appear an odd number of times, and it does not keep the output in sorted order.
You can sort the list once (move sort out of the loop) and then use a very simple strategy:
Walk the list using an integer index
Check the word at the current index against the word at the next index
If words are different, print the current word, and advance index by one
If words are the same, walk the list forward until you see a different word, and use the location of that word as the next value for the loop index.
Here is a sample implementation:
Scanner fileIn = new Scanner(new File("words.txt"));
List<String> list = new ArrayList<>();
while (fileIn.hasNext()) {
    list.add(fileIn.next());
}
Collections.sort(list);
int pos = 0;
while (pos != list.size()) {
    int next = pos + 1;
    while (next != list.size() && list.get(pos).equals(list.get(next))) {
        next++;
    }
    if (next == pos + 1) {
        System.out.println(list.get(pos));
    }
    pos = next;
}
One option here would be to maintain a hashmap of words to counts as you parse the file. Then, iterate that map at the end to obtain the words which only appeared once:
Scanner fileIn = new Scanner(new File("words.txt"));
Map<String, Integer> map = new HashMap<>();
ArrayList<String> uniqueArrList = new ArrayList<String>();
while (fileIn.hasNext()) {
    String word = fileIn.next();
    Integer cnt = map.get(word);
    map.put(word, cnt == null ? 1 : cnt.intValue() + 1);
}
// now iterate over all words in the map, adding unique words to a separate list
for (Map.Entry<String, Integer> entry : map.entrySet()) {
    if (entry.getValue() == 1) {
        uniqueArrList.add(entry.getKey());
    }
}
Your current approach is close; you should sort once, after you have added all the words. Then keep an index into the List so you can test whether equal elements are adjacent. Something like:
List<String> uniqueArrList = new ArrayList<>();
List<String> tempArrList = new ArrayList<>();
while (fileIn.hasNext()) {
    tempArrList.add(fileIn.next());
}
Collections.sort(tempArrList);
for (int i = 0; i < tempArrList.size(); i++) {
    String s = tempArrList.get(i);
    int next = i + 1;
    // skip all equal and adjacent values
    while (next < tempArrList.size() && s.equals(tempArrList.get(next))) {
        next++;
    }
    if (next == i + 1) {
        uniqueArrList.add(s); // s appeared exactly once
    }
    i = next - 1; // continue after the run of equal values
}
System.out.println(uniqueArrList);
The easiest way would be to use a Set or a HashSet, since then you wouldn't have to control the elements' repetition yourself. However, if you have to use lists, there is no need to sort the elements; just iterate over the words with a nested loop and there you go:
List<String> uniqueWords = new ArrayList<>();
for (int i = 0; i < words.size(); i++) {
    boolean hasDuplicate = false;
    for (int j = 0; j < words.size(); j++) {
        if (i != j && words.get(i).equals(words.get(j))) {
            hasDuplicate = true;
        }
    }
    if (!hasDuplicate) {
        uniqueWords.add(words.get(i));
    }
}
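If Sets were allowed after all, the words occurring exactly once can be found in a single pass with two sets; a sketch under that assumption (sample input taken from the question):

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SingletonWords {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("this", "this", "is", "a", "a", "sentence", "that");
        Set<String> seenOnce = new LinkedHashSet<>();
        Set<String> seenMore = new LinkedHashSet<>();
        for (String w : words) {
            if (seenMore.contains(w)) {
                continue; // already known to be a duplicate
            }
            // add returns false on the second occurrence: demote the word to seenMore
            if (!seenOnce.add(w)) {
                seenOnce.remove(w);
                seenMore.add(w);
            }
        }
        System.out.println(seenOnce); // [is, sentence, that]
    }
}
```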
Your logic error is when you call:
else if (uniqueArrList.contains(s))
    uniqueArrList.remove(s);
Use One Array:
Scanner fileIn = new Scanner(new File("words.txt"));
ArrayList<String> tempArrList = new ArrayList<String>();
while (fileIn.hasNext()) {
    tempArrList.add(fileIn.next());
}
Collections.sort(tempArrList);
System.out.println(tempArrList);
if (tempArrList.size() > 1) {
    for (int i = tempArrList.size() - 1; i >= 0; i--) {
        String item = tempArrList.remove(i);
        if (tempArrList.removeAll(Collections.singleton(item))) {
            if (i > tempArrList.size()) {
                i = tempArrList.size();
            }
        } else {
            tempArrList.add(item);
        }
    }
}
System.out.println(tempArrList);
I hope this helps! Please leave feedback if it was useful.
For completeness only, this question is a no brainer with Java 8 Streams, using the distinct() intermediate operation:
public static void main(String[] args) throws FileNotFoundException {
    final Scanner fileIn = new Scanner(new File("words.txt"));
    final List<String> tempArrList = new ArrayList<String>();
    while (fileIn.hasNext()) {
        tempArrList.add(fileIn.next());
    }
    final List<String> uniqueArrList = tempArrList.stream().distinct().collect(Collectors.toList());
    System.out.println(uniqueArrList);
}
This code prints (for the provided input):
[this, is, a, sentence, that, does, not, make, sense, really, why]
If we want all the words sorted, simply adding sorted() to the stream pipeline does the trick:
tempArrList.stream().sorted().distinct().collect(Collectors.toList());
and we obtain a sorted (and pretty) output:
[a, does, is, make, not, really, sense, sentence, that, this, why]
First off, I want to point out that this assignment is homework, but I am not looking for a direct answer; rather, I'm after a hint or some insight as to why my implementation is not working.
Here is the given: we are provided with a list of words, each 7 characters long, and are asked to sort them using the radix sort algorithm with queues.
EDIT 1: Updated Code
Here is my code:
import java.util.*;
import java.io.File;

public class RadixSort {

    public void radixSort() {
        ArrayList<LinkedQueue> arrayOfBins = new ArrayList<LinkedQueue>();
        LinkedQueue<String> masterQueue = new LinkedQueue<String>();
        LinkedQueue<String> studentQueue = new LinkedQueue<String>();
        // Creating the bins
        for (int i = 0; i < 26; i++) {
            arrayOfBins.add(new LinkedQueue<String>());
        }
        // Getting the file name and reading the lines from it
        try {
            Scanner input = new Scanner(System.in);
            System.out.print("Enter the file name with its extension: ");
            File file = new File(input.nextLine());
            input = new Scanner(file);
            while (input.hasNextLine()) {
                String line = input.nextLine();
                masterQueue.enqueue(line);
            }
            input.close();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        for (int p = 6; p >= 0; p--) {
            for (LinkedQueue queue : arrayOfBins) {
                queue.clear();
            }
            while (masterQueue.isEmpty() == false) {
                String s = (String) masterQueue.dequeue();
                char c = s.charAt(p);
                arrayOfBins.get(c - 'a').enqueue(s);
            }
            for (LinkedQueue queue : arrayOfBins) {
                studentQueue.append(queue);
            }
        }
        masterQueue = studentQueue;
        System.out.println(masterQueue.size());
        System.out.println(masterQueue.dequeue());
    }

    public static void main(String[] args) {
        RadixSort sort = new RadixSort();
        sort.radixSort();
    }
}
I can see so many problems, I'm not sure how you get an answer at all.
Why do you have two nested outermost loops from 0 to 6?
Why don't you ever clear studentQueue?
The j loop doesn't execute as many times as you think it does.
Aside from definite bugs, the program doesn't output anything -- are you just looking at the result in the debugger? Also are you actually allowed to assume that the words will contain no characters besides lowercase letters?
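For what it's worth, here is a minimal sketch of the same LSD pass structure using the standard library's ArrayDeque in place of the custom LinkedQueue (sample words are mine; it assumes lowercase 7-letter words). Note that it empties the bins back into the master queue after every pass, which sidesteps the studentQueue problem entirely:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

public class RadixSortDemo {
    public static void main(String[] args) {
        // Made-up 7-letter lowercase input words
        Queue<String> master = new ArrayDeque<>(Arrays.asList("baobabs", "zebraic", "aardvar"));
        List<Queue<String>> bins = new ArrayList<>();
        for (int i = 0; i < 26; i++) {
            bins.add(new ArrayDeque<String>());
        }
        // One pass per character position, least significant (rightmost) first
        for (int p = 6; p >= 0; p--) {
            while (!master.isEmpty()) {
                String s = master.remove();
                bins.get(s.charAt(p) - 'a').add(s);
            }
            // Collect the bins back into the master queue, in alphabetical bin order
            for (Queue<String> bin : bins) {
                while (!bin.isEmpty()) {
                    master.add(bin.remove());
                }
            }
        }
        System.out.println(master); // [aardvar, baobabs, zebraic]
    }
}
```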
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 8 years ago.
Given two files
random_letters.txt
AABBBBB
FLOWERS
BACKGFD
TOBEACH
dictionary.txt
flowers
to
beach
back
I need to check each combination of the random_letters against the dictionary to see if there is anything in common. A match can be a single word of at least 6 characters, or two words totaling at least 6 characters, which here would make it FLOWERS or TOBEACH.
I am having a tough time figuring out what I need to do. I can get it to work for words with exactly 7 characters because I used strings; I understand I need to use char arrays for the rest to work.
what I have so far:
private static String stringToWrite2 = "";
private static String[] allWords = new String[2187];
private static String[] diction = new String[79765];
private static char[][] test = new char[2187][7];
private static char[][] test2 = new char[79765][7];

public static void main(String[] args) {
    try {
        Scanner file1 = new Scanner(new File("random_letters.txt"));
        Scanner file2 = new Scanner(new File("dictionary.txt"));
        for (int i = 0; i < 2187; i++) {
            allWords[i] = file1.next();
            test[i] = allWords[i].toCharArray();
        }
        for (int i = 0; i < 79765; i++) {
            diction[i] = file2.next();
            diction[i] = diction[i].toUpperCase();
            test2[i] = diction[i].toCharArray();
        }
        for (int i = 0; i < 2187; i++) {
            for (int j = 0; j < 79765; j++) {
                if (allWords[i].equals(diction[j])) {
                    stringToWrite2 += diction[j];
                }
            }
        }
    } catch (IOException e) {
        System.out.println("could not find file");
    }
    System.out.println("-------------------");
    System.out.println(stringToWrite2);
    for (int i = 0; i < 6; i++) {
        for (int j = 0; j < 7; j++) {
            System.out.println(test2[i][j]);
        }
    }
}
You have two somewhat distinct tasks here: determining if there are any words in dictionary that are also in random_letters (of length >= 6), and determining if there are any sets of two words in dictionary such that their union is a word in random_letters.
Instead of using an array, let's use HashSets for storage, because the single most used operation here will probably be .contains(...). It also gives us access to .retainAll(...), which is very useful for finding intersections.
For the second half of the task, my initial thought was to create a data structure with all of the pairwise permutations of words in diction, and intersect that with allWords. I quickly realized how big that would (likely) become. Instead I used an uglier but more space efficient solution.
private static HashSet<String> allWords = new HashSet<String>();
private static HashSet<String> diction = new HashSet<String>();

public static void compare() {
    try {
        Scanner file1 = new Scanner(new File("random_letters.txt"));
        Scanner file2 = new Scanner(new File("dictionary.txt"));
        for (int i = 0; i < 2187; i++) {
            allWords.add(file1.next());
        }
        for (int i = 0; i < 79765; i++) {
            diction.add(file2.next().toUpperCase());
        }
        // Compile the set of words that are in both
        HashSet<String> intersect = new HashSet<String>();
        intersect.addAll(allWords);
        intersect.retainAll(diction);
        for (String s : intersect) {
            System.out.println(s);
        }
        // For every word in random_letters, see if a word in diction is the start of it
        HashSet<String> couplesIntersect = new HashSet<String>();
        for (String s : allWords) {
            for (String d : diction) {
                if (s.startsWith(d)) {
                    // If so, check every word in diction again for one matching the remainder
                    String remainder = s.substring(d.length());
                    for (String d2 : diction) {
                        if (d2.equals(remainder)) {
                            // If so, store this word
                            couplesIntersect.add(s);
                        }
                    }
                }
            }
        }
        // Print those results
        for (String s : couplesIntersect) {
            System.out.println(s);
        }
    } catch (IOException e) {
        System.out.println("could not find file");
    }
}