Ignore numbers in a text file when scanning it in Java

I am doing an assignment in Java that requires us to read two different files. One has the top 1000 boy names, and the other contains the top 1000 girl names. We have to write a program that returns all of the names that are in both files. We have to read each boy and girl name as a String, ignoring the number of namings, and add it to a HashSet. When adding to a HashSet, the add method returns false if the name being added already exists in the HashSet. So to find the common names, you just have to keep track of which names returned false when adding. My problem is that I can't figure out how to ignore the number of namings in each file. My HashSet contains both the names and the numbers, and I just want the names.
Here is what I have so far.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
public class Names {
    public static void main(String[] args) {
        Set<String> boynames = loadBoynames();
        Set<String> girlnames = new HashSet<String>(); // not loaded yet
        System.out.println(boynames);
    }
    private static Set<String> loadBoynames() {
        HashSet<String> d = new HashSet<String>();
        File names = new File("boynames.txt");
        Scanner s = null;
        try {
            s = new Scanner(names);
        } catch (FileNotFoundException e) {
            System.out.println("Can't find boy names file.");
            System.exit(1);
        }
        // next() returns every token, so the naming counts end up in the set too
        while (s.hasNext()) {
            String currentName = s.next();
            d.add(currentName.toUpperCase());
        }
        s.close();
        return d;
    }
}
My plan is to take the HashSet I currently have and add the girl names to it, but before I do that I need to keep the numbers out of my HashSet.
I tried to skip numbers with this code, but it just produced errors:
while (s.hasNextLine()) {
    if (s.hasNextInt()) {
        number = s.nextInt();
    } else {
        String currentName = s.next();
        d.add(currentName.toUpperCase());
    }
}
Any help would be appreciated.

You could also use a regex to remove all digits (or other special characters, if needed):
testStr = testStr.replaceAll("\\d","");
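A minimal sketch of how that regex could be combined with the HashSet.add trick described in the question. It assumes each file is a whitespace-separated list of "name count" entries; the class name CommonNames, the helper addNames, and the sample names and counts are all made up for illustration. In the real program you would pass new Scanner(new File("boynames.txt")) instead, which also means handling FileNotFoundException.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

public class CommonNames {

    // Reads tokens from the scanner, strips every digit with replaceAll("\\d", ""),
    // and adds what is left to the set. Returns the names for which add()
    // returned false, i.e. names that were already in the set.
    static List<String> addNames(Scanner in, Set<String> names) {
        List<String> alreadyPresent = new ArrayList<>();
        while (in.hasNext()) {
            String name = in.next().replaceAll("\\d", "").toUpperCase();
            if (name.isEmpty()) {
                continue; // the token was a pure number (a naming count), so skip it
            }
            if (!names.add(name)) {
                alreadyPresent.add(name);
            }
        }
        return alreadyPresent;
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>();
        // Hypothetical sample data standing in for boynames.txt and girlnames.txt
        addNames(new Scanner("Jacob 100\nJordan 80\nMichael 95"), names);
        List<String> common = addNames(new Scanner("Emily 120\nJordan 30\nMadison 70"), names);
        System.out.println(common); // prints [JORDAN]
    }
}
```

This assumes each name appears only once per file; a name repeated inside the same file would also be reported as common.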

Try using the StreamTokenizer class (java.io) to read the file. It splits the input into tokens and reports each token's type (a word, with its text in sval; a number, with its value as a double in nval; end of line; or end of file), so you can easily pick out the String tokens.
You can find the details here:
http://docs.oracle.com/javase/6/docs/api/java/io/StreamTokenizer.html
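A sketch of the StreamTokenizer approach, assuming the same "name count" layout. TokenizerNames and readNames are hypothetical names, and a StringReader stands in for the real file (a FileReader over boynames.txt should behave the same way).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

public class TokenizerNames {

    // Keeps only word tokens (TT_WORD, text in sval); number tokens
    // (TT_NUMBER, value in nval as a double) are simply ignored.
    static Set<String> readNames(Reader in) {
        StreamTokenizer st = new StreamTokenizer(in);
        Set<String> names = new LinkedHashSet<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    names.add(st.sval.toUpperCase());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical sample input in place of the real file
        System.out.println(readNames(new StringReader("Jacob 100\nEmily 90\n")));
        // prints [JACOB, EMILY]
    }
}
```

With the default settings, '/' starts a comment and quote characters start string tokens, which is harmless for a plain list of names but worth knowing.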

Related

If matched, add elements of ArrayList_A to the ArrayList; if not, add elements of ArrayList_B to the ArrayList

I have the following code:
package sportsCardsTracker;
import java.io.*;
import java.text.*;
import java.util.*;
import java.util.stream.Collectors;
public class Test_Mark6 {
    public static ArrayList<String> listingNameList;
    public static ArrayList<String> finalNamesList;
    public static void main(String[] args) throws IOException, ParseException {
        listingNameList = new ArrayList<>();
        listingNameList.add("LeBron James 2017-18 Hoops Card");
        listingNameList.add("Stephen Curry Auto Patch, HOT INVESTMENTS!");
        listingNameList.add("Michael Jordan 1998 Jersey Worn Card");
        ArrayList<String> playersNamesList = new ArrayList<>();
        playersNamesList.add("LeBron James");
        playersNamesList.add("Stephen Curry");
        playersNamesList.add("Michael Jordan");
        finalNamesList = new ArrayList<>();
        String directory = System.getProperty("user.dir");
        File file = new File(directory + "/src/sportsCardsTracker/CardPrices.csv");
        FileWriter fw = new FileWriter(file, false); // true would append instead of overwriting
        for (int i = 0; i < listingNameList.size(); i++) {
            for (String listingNames : listingNameList) {
                List<String> result = NBARostersScraper_Mark3.getNBARoster().stream().map(String::toLowerCase).collect(Collectors.toList());
                boolean valueContained = result.stream().anyMatch(s -> listingNames.toLowerCase().matches(".*" + s + ".*"));
                if (valueContained == true) {
                    finalNamesList.add(/* the player's name */);
                }
            }
            fw.write(String.format("%s, %s\n", finalNamesList.get(i)));
        }
    }
}
Basically, listingNameList holds the listings' names and playersNamesList holds all the players' names. What I would like is this: if the code finds a player's name inside a listing name, it should return only the player's name.
For example, instead of "LeBron James 2017-18 Hoops Card" it should return only "LeBron James". If it does not find anything, it should just return the listing's name. So far, I have created a new ArrayList called finalNamesList; my idea is to use an if statement (if a match is found, add the player's name to finalNamesList; if not, add the listing's name to finalNamesList). However, the code above is not working: it just adds all of the names in listingNameList to finalNamesList. I suspect that the way I grab the index is wrong, but I don't know how to fix it.
The way you are matching the pattern seems wrong. Instead of matches(), you can use the String contains() method, as below. contains() also treats the player's name as plain text, whereas matches() would interpret characters such as '.' in a name as regex syntax.
for (String listingNames : listingNameList) {
    List<String> temp = playersNamesList.stream()
            .filter(s -> listingNames.toLowerCase().contains(s.toLowerCase()))
            .collect(Collectors.toList());
    if (temp.size() > 0) {
        System.out.println(temp.get(0));
        // fw.write(String.format("%s%n", temp.get(0)));
    } else {
        System.out.println(listingNames); // no player found: keep the listing's name
    }
}
One more thing: you don't need two for loops here; one loop over listingNameList is enough. With the extra outer index loop, every listing is checked once for each element of the list.
The code can still be optimized; for example, you can avoid the temp list above.
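Going one step further, the temp list can be replaced by findFirst(), and orElse() covers the "no player found" case from the question. This is a sketch rather than the answerer's code: ListingMatcher and playerOrListing are made-up names, the player list is hard-coded in place of NBARostersScraper_Mark3.getNBARoster(), and "Vintage Unopened Pack" is an invented listing with no player in it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListingMatcher {

    // Returns the first player whose name occurs (ignoring case) inside the listing,
    // or the listing itself when no player name is found.
    static String playerOrListing(String listing, List<String> players) {
        String lowerListing = listing.toLowerCase();
        return players.stream()
                .filter(player -> lowerListing.contains(player.toLowerCase()))
                .findFirst()
                .orElse(listing);
    }

    public static void main(String[] args) {
        List<String> players = Arrays.asList("LeBron James", "Stephen Curry", "Michael Jordan");
        List<String> listings = Arrays.asList("LeBron James 2017-18 Hoops Card", "Vintage Unopened Pack");
        List<String> finalNames = new ArrayList<>();
        for (String listing : listings) { // a single loop over the listings is enough
            finalNames.add(playerOrListing(listing, players));
        }
        System.out.println(finalNames); // prints [LeBron James, Vintage Unopened Pack]
    }
}
```

Each entry of finalNames can then be written to the CSV with one %s placeholder per value.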

Read & compare text files and print words in alphabetical order

First of all, I'm sorry if similar questions have been asked before, but I couldn't find a solution to what I was looking for. I have this small Java program which compares two text files (text1.txt and text2.txt) and prints all the words of text1.txt that don't exist in text2.txt. The code below does the job:
text1.txt : This is text file 1. some # random - text
text2.txt : this is text file 2.
import java.io.*;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.*;
public class Read {
    public static void main(String[] args) {
        Set<String> textFile1 = readFiles("text1.txt");
        Set<String> textFile2 = readFiles("text2.txt");
        for (String t : textFile1) {
            if (!textFile2.contains(t)) {
                System.out.println(t);
            }
        }
    }
    public static Set<String> readFiles(String filename)
    {
        Set<String> words = new HashSet<String>();
        try {
            for (String line : Files.readAllLines(new File(filename).toPath(), Charset.defaultCharset())) {
                String[] split = line.split("\\s+");
                for (String word : split) {
                    words.add(word.toLowerCase());
                }
            }
        }
        catch (IOException e) {
            System.out.println(e);
        }
        return words;
    }
}
(Prints each word on a new line)
Output: #, some, random, 1.
I'm trying to print all the words in alphabetical order. Also, if possible, it shouldn't print any special characters (#, - or numbers). I've been trying to figure it out but with no luck. I'd appreciate it if someone could help me out with this.
Also, I've taken the following line of code from the internet and I'm not really familiar with it. Is there an easier way to write this line?
for (String line : Files.readAllLines(new File(filename).toPath(), Charset.defaultCharset()))
Edit: A HashSet is a must for this piece of work. Sorry, I forgot to mention that.
Since you are not allowed to use a TreeSet and must use a HashSet, do it this way:
import java.io.*;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.*;
public class Read {
    public static void main(String[] args) {
        Set<String> textFile1 = readFiles("text1.txt");
        Set<String> textFile2 = readFiles("text2.txt");
        Set<String> difference = new HashSet<String>();
        // collect strings by dropping out every string that's not only letters
        // using the regex "[a-zA-Z]+"
        for (String t : textFile1) {
            if (!textFile2.contains(t) && t.matches("[a-zA-Z]+")) {
                difference.add(t);
            }
        }
        // sort
        List<String> dList = new ArrayList<String>(difference);
        Collections.sort(dList);
        // show
        for (String s : dList) {
            System.out.println(s);
        }
    }
    public static Set<String> readFiles(String filename)
    {
        Set<String> words = new HashSet<String>();
        try {
            for (String line : Files.readAllLines(new File(filename).toPath(), Charset.defaultCharset())) {
                String[] split = line.split("\\s+");
                for (String word : split) {
                    words.add(word.toLowerCase());
                }
            }
        }
        catch (IOException e) {
            System.out.println(e);
        }
        return words;
    }
}
Have you looked at any other Set implementations? I think if you use a SortedSet such as a TreeSet, instead of a HashSet, the words will automatically sort into alphabetical order.
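For example, if the words have to be collected in a HashSet (as the assignment requires), they can still be copied into a TreeSet just for printing. SortedDifference and sortedCopy are illustrative names, not part of the question's code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class SortedDifference {

    // Copies the words into a TreeSet, which keeps Strings in their natural
    // (lexicographic) order; the original HashSet is left unchanged.
    static TreeSet<String> sortedCopy(Set<String> words) {
        return new TreeSet<>(words);
    }

    public static void main(String[] args) {
        Set<String> difference = new HashSet<>(Arrays.asList("some", "random", "text"));
        System.out.println(sortedCopy(difference)); // prints [random, some, text]
    }
}
```

String's natural order compares UTF-16 character codes, so uppercase letters sort before lowercase ones; that does not matter here because the words are lower-cased when they are read.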
Stack Overflow works better if you ask one question at a time.
From what I've read in the Java documentation, a HashSet makes no guarantees about the order of its elements. If you use a SortedSet such as TreeSet instead, the elements are kept in order; Strings already have a natural (alphabetical) ordering, so you only need a Comparator if you want a different order.
As for your other questions: for reading files in Java, there is a guide from GeeksforGeeks that I find very user-friendly, especially for beginners, and that shows a variety of ways to read a file.
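As one possible simplification of the line asked about: on Java 8 and later, Files.readAllLines(Path) decodes the file as UTF-8 without a Charset argument, and Paths.get(filename) replaces new File(filename).toPath(). The sketch keeps the question's readFiles signature; the wordsOf helper and the ReadWords class name are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ReadWords {

    // Splits each line on whitespace and adds the lower-cased words to a HashSet.
    static Set<String> wordsOf(List<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) { // a line starting with whitespace yields an empty first element
                    words.add(word.toLowerCase());
                }
            }
        }
        return words;
    }

    // Files.readAllLines(Path) reads the whole file as UTF-8 (Java 8+).
    static Set<String> readFiles(String filename) {
        try {
            return wordsOf(Files.readAllLines(Paths.get(filename)));
        } catch (IOException e) {
            System.out.println(e);
            return new HashSet<>();
        }
    }

    public static void main(String[] args) {
        System.out.println(readFiles("text1.txt"));
    }
}
```

One difference from the original: Charset.defaultCharset() depends on the platform, while readAllLines(Path) always expects UTF-8 and throws a MalformedInputException (an IOException) on invalid input.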
Special characters may be a bit tricky, but there is a guide in a previous Stack Overflow answer that may be helpful.
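One common way to handle special characters, sketched here as an alternative to the matches("[a-zA-Z]+") filter in the answer above: remove every non-letter from each token and skip tokens that end up empty, so "file." becomes "file" while "#", "-" and "1." disappear. LettersOnly and cleanWords are hypothetical names, and the input is the text1.txt line from the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LettersOnly {

    // Lower-cases each token and deletes every character that is not a-z;
    // tokens that become empty (such as "#", "-" or "1.") are dropped.
    static Set<String> cleanWords(String text) {
        Set<String> words = new HashSet<>();
        for (String token : text.split("\\s+")) {
            String cleaned = token.toLowerCase().replaceAll("[^a-z]", "");
            if (!cleaned.isEmpty()) {
                words.add(cleaned);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = cleanWords("This is text file 1. some # random - text");
        List<String> sorted = new ArrayList<>(words); // still a HashSet underneath, sorted only for output
        Collections.sort(sorted);
        System.out.println(sorted); // prints [file, is, random, some, text, this]
    }
}
```

The class [^a-z] keeps only unaccented ASCII letters; a pattern such as [^\p{L}] would keep letters from any alphabet.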

Trying to remove duplicate elements

I'm messing around trying to learn to use HashSets to remove duplicate elements in my output but I'm running into some trouble.
My goal is to select a text file when the program is run and for it to display the words of the text file without duplicates, punctuation, or capital letters. All of it works fine except for removing the duplicates.
This is my first time using a Set like this. Any suggestions as to what I'm missing? Thanks!
Partial text file input for example: "Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation, so conceived and so dedicated, can long endure"
import java.util.Scanner;
import java.util.List;
import java.io.*;
import java.util.*;
import javax.swing.JFileChooser;
public class Lab7 {
    public interface OrderedList<T extends Comparable<T>> extends Iterable<T>
    {
        public void add(T element);
        public T removeFront();
        public T removeRear();
        public int size();
        public boolean isEmpty();
        public boolean contains(T element);
        public Iterator<T> iterator();
    }
    public static void main(String[] arg) throws FileNotFoundException
    {
        Scanner scan = null;
        File file = null;
        JFileChooser chooser = new JFileChooser("../Text");
        int returnValue = chooser.showOpenDialog(null);
        if (returnValue == JFileChooser.APPROVE_OPTION)
        {
            file = chooser.getSelectedFile();
            scan = new Scanner(file);
        }
        else
            return;
        int count = 0;
        Set<String> set = new LinkedHashSet<String>();
        while (scan.hasNext())
        {
            String[] noDuplicate = {scan.next().replaceAll("[\\W]", "").toLowerCase()};
            List<String> list = Arrays.asList(noDuplicate);
            set.addAll(list);
            count++;
        }
        scan.close();
        System.out.println(set);
        System.out.println();
        // file.getName() is the selected file's name (chooser.getName() is the dialog component's name)
        System.out.println(file.getName() + " has " + count + " words.");
    }
}
Your problem is that you are creating a new HashSet every time you read a word using the Scanner, so there is no chance for it to do de-duplication. You can fix it with the following steps. Also, a plain HashSet does not preserve insertion order.
Create the HashSet once, before the Scanner loop.
Use a LinkedHashSet, so that the elements are kept in the order you added them.
Inside the loop, call set.add(item). As the other answer mentions, you do not need to create a one-element list.
Adding the code for completeness.
import java.io.File;
import java.io.FileNotFoundException;
import java.util.LinkedHashSet;
import java.util.Scanner;
import java.util.Set;
public class Lab7 {
    public static void main(String[] arg) throws FileNotFoundException
    {
        Scanner scan = new Scanner(new File("Input.txt"));
        int count = 0;
        // Created once, before the loop; LinkedHashSet keeps insertion order
        Set<String> set = new LinkedHashSet<String>();
        while (scan.hasNext())
        {
            String word = scan.next().replaceAll("[\\W]", "").toLowerCase();
            if (!word.isEmpty()) { // a token made only of punctuation becomes ""
                set.add(word);
            }
            count++;
        }
        scan.close();
        System.out.println();
        System.out.println("Input.txt has " + count + " words.");
        // Print each unique word on its own line; [\\W] has already removed the commas
        for (String word : set) {
            System.out.println(word);
        }
    }
}
I would do it this way:
Set<String> set = new LinkedHashSet<String>();
while(scan.hasNext())
{
    String noDuplicate = scan.next().replaceAll("[\\W]", "").toLowerCase();
    set.add(noDuplicate);
}
scan.close();
System.out.println("The text has " + set.size() + " unique words.");
Your solution (creating a one-element array, converting that to a List, and adding that List to the set) is extremely inefficient, in addition to being incorrect. Just use the String you're originally working with and add it to the LinkedHashSet (which preserves insertion order). At the end, set.size() gives you the number of unique words in your text.

Format an ArrayList

I have a file that I am storing into an ArrayList, and I can't figure out how to organize it so that certain strings of text are stored at particular indexes. The first line will be the category, the second line the question, and the third the answer to the trivia question. I need to do this so that I can randomly pick questions and then check the answers for a trivia game. All I get so far is every word separated by a comma. From the professor:
"The input file contains questions and answers in different categories. For each category, the first line indicates the name of the category. This line will be followed by a number of pairs of lines. The first line of the pair is the question, and the second line is its corresponding answer.
A blank line separates the categories."
Here is my code so far:
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;
import javax.swing.JOptionPane;
public class TriviaGamePlayer {
    /**
     * @param args
     */
    public static void main(String[] args) throws IOException {
        ArrayList<String> triviaQuestion = new ArrayList<String>();
        Scanner infile = new Scanner(new File("trivia.txt"));
        // next() reads one whitespace-separated word at a time
        while (infile.hasNext()) {
            triviaQuestion.add(infile.next());
        }
        System.out.println(triviaQuestion);
    }
}
From what I can see in the question so far, you would be best off creating your own TriviaQuestion class, which would look something like this:
public class TriviaQuestion
{
    public String question;
    public String answer;
    public boolean asked;
    public String category;
    TriviaQuestion(String q, String a, String c)
    {
        question = q;
        answer = a;
        category = c;
    }
}
Then you have a few options, but with this class everything becomes a bit easier. I would create a Map<String, List<TriviaQuestion>> whose key is the category.
When reading the file, you should also use infile.hasNextLine() and infile.nextLine(), which read a whole line at a time, instead of hasNext() and next(), which read one word at a time:
1. Read a line (I assume the first one is the category).
2. Read the next two lines (the question and the answer).
3. Create a new instance: new TriviaQuestion(question, answer, category).
4. Add it to the ArrayList for this category.
5. Repeat until you reach a blank line.
6. If the next line is blank, add the list to the map and loop back to (1).
Something like this (assuming a well-formed file):
String line = infile.nextLine(); // first line is the category
String category = line;
while (infile.hasNextLine())
{
    line = infile.nextLine();
    if (line.isEmpty()) // blank line: the next line is a new category
        category = infile.nextLine();
    else
    {
        String q = line;
        String a = infile.nextLine();
        // do other stuff, e.g. create a TriviaQuestion(q, a, category)
    }
}
Then, to ask a question, get the list for the category, choose a random question, and set its asked flag so it doesn't come up again:
List<TriviaQuestion> questions = yourMap.get("Science");
int aRandomNumber = new Random().nextInt(questions.size()); // java.util.Random; 0 to size - 1
TriviaQuestion questionToAsk = questions.get(aRandomNumber);
System.out.println(questionToAsk.question);
questionToAsk.asked = true;
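Putting the steps above together, a self-contained parser could look like the sketch below. It assumes a well-formed file exactly as the professor describes it (every question line is followed by an answer line); TriviaParser and parse are made-up names, the nested TriviaQuestion is a cut-down copy of the class above (its category is stored as the map key instead), and the sample questions are invented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Scanner;

public class TriviaParser {

    // Cut-down stand-in for the TriviaQuestion class above.
    static class TriviaQuestion {
        final String question;
        final String answer;
        boolean asked;

        TriviaQuestion(String question, String answer) {
            this.question = question;
            this.answer = answer;
        }
    }

    // Category line, then question/answer line pairs; a blank line separates categories.
    static Map<String, List<TriviaQuestion>> parse(Scanner in) {
        Map<String, List<TriviaQuestion>> byCategory = new LinkedHashMap<>();
        String category = null;
        while (in.hasNextLine()) {
            String line = in.nextLine();
            if (line.trim().isEmpty()) {
                category = null;               // a blank line ends the current category
            } else if (category == null) {
                category = line;               // first line of a block is the category name
                byCategory.putIfAbsent(category, new ArrayList<>());
            } else {
                String answer = in.nextLine(); // the line after a question is its answer
                byCategory.get(category).add(new TriviaQuestion(line, answer));
            }
        }
        return byCategory;
    }

    public static void main(String[] args) {
        // Hypothetical sample; for the real file use new Scanner(new File("trivia.txt"))
        String sample = "Science\nWhat is H2O?\nWater\nWhich planet is called the Red Planet?\nMars\n\nMath\nWhat is 2+2?\n4\n";
        for (Map.Entry<String, List<TriviaQuestion>> e : parse(new Scanner(sample)).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " question(s)");
        }
    }
}
```

putIfAbsent also covers the case where the same category name appears twice in the file: the later questions are appended to the existing list.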
I would approach this problem by identifying what is needed. You have a list of categories (Strings). Within each category, there will be a list of question (String) and answer (String) pairs. From there we already see some "logical" ways to organize the data.
Questions - String
Answers - String
Question/Answer pairs - Write a class (for now, let's refer to it as QAPair) with two Strings as fields (one for the question, one for the answer)
List of Q/A pairs within a category - ArrayList
List of Categories, mapped to a list of Q/A pairs - Maybe a Map would do the trick. The type would be: Map<String, ArrayList<QAPair>>
From there you would start parsing the file: for the first line, or after a blank line is encountered, you know the line holds a category name. You can call containsKey() to check whether the category name already exists; if it does, fetch the ArrayList of Q/A pairs for that category and keep adding to it; otherwise, create a new ArrayList and put it in the map for that category.
You could then read pairs of lines. For each pair of lines you read, create a QAPair object and add it to the ArrayList for the category it belongs to.
Here's an example of using a Map:
Map<String, ArrayList<QAPair>> categories = new HashMap<String, ArrayList<QAPair>>();
if (!categories.containsKey("Math")) { // Check to see if a Math category exists
    categories.put("Math", new ArrayList<QAPair>()); // If it doesn't, create it
}
QAPair question1 = new QAPair("2+2", "4");
// get() method returns the ArrayList for the "Math" category
// add() method adds the QAPair to the ArrayList for the "Math" category
categories.get("Math").add(question1);
To get the list of categories from a map and pick one:
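// Convert the map's keys (the category names) to an array of Strings
String[] catArray = categories.keySet().toArray(new String[0]);
// Get the 10th category in the array (indexes start at 0; a HashMap's key order is arbitrary)
// Use catArray.length to find how many categories there are in total
String category = catArray[9];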
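A sketch of picking a category at random instead of by a fixed index. PickCategory and randomCategory are illustrative names, and plain String lists stand in for the QAPair lists because QAPair is not defined here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class PickCategory {

    // Copies the map's keys into a list and picks one uniformly at random.
    // Assumes the map is not empty (nextInt(0) would throw IllegalArgumentException).
    static String randomCategory(Map<String, ?> categories, Random random) {
        List<String> names = new ArrayList<>(categories.keySet());
        return names.get(random.nextInt(names.size()));
    }

    public static void main(String[] args) {
        Map<String, List<String>> categories = new HashMap<>();
        categories.put("Math", new ArrayList<>());
        categories.put("Science", new ArrayList<>());
        System.out.println(randomCategory(categories, new Random()));
    }
}
```

The wildcard value type lets the same method accept a Map<String, ArrayList<QAPair>>.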

Find unique words in a file - Java

Using an MS-DOS window, I am piping in an amazon.txt file.
I am trying to use the collections framework. Keep in mind I want to keep this as simple as possible.
What I want to do is count all the unique words in the file... with no duplicates.
This is what I have so far. Please be kind; this is my first Java project.
import java.util.Scanner;
import java.util.ArrayList;
import java.util.Iterator;
public class project1 {
    // ArrayList<String> a = new ArrayList<String>();
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String word;
        String grab;
        int count = 0;
        ArrayList<String> a = new ArrayList<String>();
        // Iterator<String> it = a.iterator();
        System.out.println("Java project\n");
        while (sc.hasNext()) {
            word = sc.next();
            if (word.equals("---")) {
                break; // stop at the sentinel without adding it
            }
            a.add(word);
        }
        Iterator<String> it = a.iterator();
        while (it.hasNext()) {
            grab = it.next();
            if (grab.contains("a")) {
                System.out.println(grab); // Just a check to see (calling it.next() again would skip a word)
                count++;
            }
        }
        System.out.println("I counted abc = ");
        System.out.println(count);
        System.out.println("\nbye...");
    }
}
In your version, the word list a will contain all the words, duplicates included. You can either
(a) check, for every new word, whether it is already in the list (List#contains is the method to call), or, the recommended solution,
(b) replace ArrayList<String> with TreeSet<String>. This eliminates duplicates automatically and stores the words in alphabetical order.
Edit
If you want to count the unique words, do the same as above; the desired result is the collection's size. So if you entered the sequence "a a b c ---", the result would be 3, as there are three unique words (a, b and c).
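A minimal sketch of option (b) for that example; UniqueWords and uniqueWords are made-up names. It checks for the "---" sentinel before adding a word, so the sentinel itself is not counted.

```java
import java.util.Scanner;
import java.util.Set;
import java.util.TreeSet;

public class UniqueWords {

    // Reads words until "---" (or end of input) into a TreeSet,
    // which drops duplicates and keeps the words in alphabetical order.
    static Set<String> uniqueWords(Scanner sc) {
        Set<String> words = new TreeSet<>();
        while (sc.hasNext()) {
            String word = sc.next();
            if (word.equals("---")) {
                break; // stop before adding the sentinel itself
            }
            words.add(word);
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = uniqueWords(new Scanner(System.in));
        System.out.println(words + " -> " + words.size() + " unique words");
    }
}
```

Run as java UniqueWords < amazon.txt; with the input "a a b c ---" it prints [a, b, c] -> 3 unique words.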
Instead of ArrayList<String>, use HashSet<String> (not sorted) or TreeSet<String> (sorted) if you don't need a count of how often each word occurs, Hashtable<String,Integer> (not sorted) or TreeMap<String,Integer> (sorted) if you do.
If there are words you don't want, place those in a HashSet<String> and check that this doesn't contain the word your Scanner found before placing into your collection. If you only want dictionary words, put your dictionary in a HashSet<String> and check that it contains the word your Scanner found before placing into your collection.
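A sketch combining the word counts and the unwanted-word check from this answer. It uses a TreeMap with Map.merge (Java 8+) rather than Hashtable; WordCounts, countWords, and the two stop words are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;
import java.util.TreeMap;

public class WordCounts {

    // Counts how often each word occurs, skipping words in the "unwanted" set.
    // TreeMap keeps the keys sorted; merge() stores 1 for a new word or adds 1 to its count.
    static Map<String, Integer> countWords(Scanner sc, Set<String> unwanted) {
        Map<String, Integer> counts = new TreeMap<>();
        while (sc.hasNext()) {
            String word = sc.next();
            if (!unwanted.contains(word)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical stop-word list; replace it with the words you want to ignore
        Set<String> unwanted = new HashSet<>(Arrays.asList("the", "a"));
        System.out.println(countWords(new Scanner("the cat saw a cat"), unwanted));
        // prints {cat=2, saw=1}
    }
}
```

For the dictionary variant, flip the test to unwanted.contains(word) with a set of allowed words.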
