I have implemented code that counts the number of:
- chars
- words
- lines
- bytes
in a text file.
But how do I count the dictionary size, i.e. the number of different words used in the file?
Also, how can I implement an iterator that iterates over letters only (ignoring whitespace)?
public class wc {
public static void main(String[] args) throws IOException {
//counters
int charsCount = 0;
int wordsCount = 0;
int linesCount = 0;
Scanner in = null;
File file = new File("Sample.txt");
try(Scanner scanner = new Scanner(new BufferedReader(new FileReader(file)))){
while (scanner.hasNextLine()) {
String tmpStr = scanner.nextLine();
if (!tmpStr.equalsIgnoreCase("")) {
String replaceAll = tmpStr.replaceAll("\\s+", "");
charsCount += replaceAll.length();
wordsCount += tmpStr.split("\\s+").length;
}
++linesCount;
}
System.out.println("# of chars: " + charsCount);
System.out.println("# of words: " + wordsCount);
System.out.println("# of lines: " + linesCount);
System.out.println("# of bytes: " + file.length());
}
}
}
To get the unique words and their count:
1. Split each line obtained from the file into a String array
2. Store the contents of this array in a HashSet
3. Repeat steps 1 and 2 until the end of the file
4. Read the unique words from the HashSet; its size is their count
I prefer posting logic and pseudo code as it will help OP to learn something by solving posted problem.
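For readers who want to check their own attempt afterwards, the four steps above could be sketched roughly as follows. The class name UniqueWords and the helper uniqueWords are made up for this illustration, and the sketch assumes that words are whitespace-separated tokens compared case-sensitively, with punctuation left attached.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

public class UniqueWords {
    // Hypothetical helper: collects the distinct whitespace-separated tokens.
    // Assumption: "Word" and "word" count as different words.
    static Set<String> uniqueWords(Scanner scanner) {
        Set<String> words = new HashSet<>();
        // Step 3: repeat until the end of the input
        while (scanner.hasNextLine()) {
            // Step 1: split the line into a String array
            String[] tokens = scanner.nextLine().trim().split("\\s+");
            // Step 2: store the tokens in the set; duplicates are dropped
            for (String token : tokens) {
                if (!token.isEmpty()) { // a blank line yields one empty token
                    words.add(token);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Reads the file named on the command line, or a built-in sample text
        try (Scanner scanner = args.length > 0
                ? new Scanner(new File(args[0]))
                : new Scanner("aa bb cc cc aa aa")) {
            // Step 4: the size of the set is the dictionary size
            System.out.println("# of unique words: " + uniqueWords(scanner).size());
        }
    }
}
```

If case should be ignored, lower-casing each token before adding it to the set would be one way to do it.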
Example of how my code works:
File with words "aa bb cc cc aa aa" has 3 unique words.
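First, turn the words into a single string with each word followed by a separator. The example shows "-" so that the separator is visible; the code below uses a space.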
String: "aa-bb-cc-cc-aa-aa-"
Get the first word: "aa", set the UniqueWordCount = 1, and then replace "aa-" with "".
New String: "bb-cc-cc-"
Get the first word: "bb", set the UniqueWordCount = 2, and then replace "bb-" with "".
New String: "cc-cc-"
Get the first word: "cc", set the UniqueWordCount = 3, and then replace "cc-" with "".
New String: "", you stop when the string is empty.
private static int getUniqueWordCountInFile(File file) throws FileNotFoundException {
String fileWordsAsString = getFileWords(file);
int uniqueWordCount = 0;
int i = 0;
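// isBlank() is also true for the empty string, so one check is enough
while (!fileWordsAsString.isBlank()) {
if (fileWordsAsString.charAt(i) == ' ') {
// Quote the first word so regex metacharacters in it are taken literally
String firstWord = java.util.regex.Pattern.quote(fileWordsAsString.substring(0, i + 1));
// (?<!\S) only matches at the start of a word, so "aa " does not eat the tail of "baa "
fileWordsAsString = fileWordsAsString.replaceAll("(?<!\\S)" + firstWord, "");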
i = 0;
uniqueWordCount++;
} else {
i++;
}
}
return uniqueWordCount;
}
private static String getFileWords(File file) throws FileNotFoundException {
String toReturn = "";
try (Scanner fileReader = new Scanner(file)) {
while (fileReader.hasNext()) {
if (fileReader.hasNextInt()) {
fileReader.nextInt();
} else {
toReturn += fileReader.next() + " ";
}
}
}
return toReturn;
}
If you want to use my code, just pass getUniqueWordCountInFile() the file containing the words whose unique count you want.
Hey #JeyKey, you can use a HashMap. I am also using an Iterator here. You can check out this code.
public class CountUniqueWords {
public static void main(String args[]) throws FileNotFoundException {
File f = new File("File Name");
ArrayList<String> arr = new ArrayList<>();
// Maps each word to the number of times it occurs in the file
HashMap<String, Integer> listOfWords = new HashMap<>();
try (Scanner in = new Scanner(f)) {
while (in.hasNext()) {
arr.add(in.next());
}
}
Iterator<String> itr = arr.iterator();
while (itr.hasNext()) {
// merge() stores 1 for a new word, otherwise adds 1 to its current count
listOfWords.merge(itr.next(), 1, Integer::sum);
//System.out.println(listOfWords); //for Printing the words
}
// Every distinct word is exactly one key, so the map size is the number of unique words
System.out.println("The number of unique words: " + listOfWords.size());
}
}
I am creating a program that reads a file of names and ages and then prints them out in ascending order. I parse the file to count the name-age pairs and then make my arrays that big.
The input file looks like this:
(23, Matt)(2000, jack)(50, Sal)(47, Mark)(23, Will)(83200, Andrew)(23, Lee)(47, Andy)(47, Sam)(150, Dayton)
When I run my code I get the output (0,null) and I am not sure why. I have been trying to fix it for a while and am lost. If anyone can help, that would be great. My code is below.
public class ponySort {
public static void main(String[] args) throws FileNotFoundException {
int count = 0;
int fileSize = 0;
int[] ages;
String [] names;
String filename = "";
Scanner inputFile = new Scanner(System.in);
File file;
do {
System.out.println("File to read from:");
filename = inputFile.nextLine();
file = new File(filename);
//inputFile = new Scanner(file);
}
while (!file.exists());
inputFile = new Scanner(file);
if (!inputFile.hasNextLine()) {
System.out.println("No one is going to the Friendship is magic Party in Equestria.");
}
while (inputFile.hasNextLine()) {
String data1 = inputFile.nextLine();
String[] parts1 = data1.split("(?<=\\))(?=\\()");
for (String part : parts1) {
String input1 = part.replaceAll("[()]", "");
Integer.parseInt(input1.split(", ")[0]);
fileSize++;
}
}
ages = new int[fileSize];
names = new String[fileSize];
while (inputFile.hasNextLine()) {
String data = inputFile.nextLine();
String[] parts = data.split("(?<=\\))(?=\\()");
for (String part : parts) {
String input = part.replaceAll("[()]", "");
ages[count] = Integer.parseInt(input.split(", ")[0]);
names[count] = input.split(", ")[1];
count++;
}
}
ponySort max = new ponySort();
max.bubbleSort(ages, names, count);
max.printArray(ages, names, count);
}
public void printArray(int ages[], String names[], int count) {
System.out.print("(" + ages[0] + "," + names[0] + ")");
// Checking for duplicates in ages. if it is the same ages as one that already was put in them it wont print.
for (int i = 1; i < count; i++) {
if (ages[i] != ages[i - 1]) {
System.out.print("(" + ages[i] + "," + names[i] + ")");
}
}
}
public void bubbleSort(int ages[], String names[], int count ){
for (int i = 0; i < count - 1; i++) {
for (int j = 0; j < count - i - 1; j++) {
// age is greater so swaps age
if (ages[j] > ages[j + 1]) {
// swap the ages
int temp = ages[j];
ages[j] = ages[j + 1];
ages[j + 1] = temp;
// must also swap the names
String tempName = names[j];
names[j] = names[j + 1];
names[j + 1] = tempName;
}
}
}
}
}
output example
File to read from:
file.txt
(0,null)
Process finished with exit code 0
Your code tries to scan the file twice with the same Scanner.
In the first loop you call
String data1 = inputFile.nextLine();
until hasNextLine() returns false, so by the time that loop ends the scanner has consumed the whole file.
The second loop then calls inputFile.hasNextLine() on the same exhausted scanner. It returns false straight away, so the loop body never runs and the arrays are never filled. That is why printArray prints the default values (0,null).
If you can use Lists, create two ArrayLists and add the ages and names to them during the first scan, so that you read the file only once. When you are done, you can convert the lists to arrays.
If you must use only arrays and want a simple fix, create a new Scanner before the second loop:
ages = new int[fileSize];
names = new String[fileSize];
inputFile = new Scanner(file); // add this line
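The single-pass alternative with lists mentioned above might look roughly like this. It is only a sketch: the class SinglePassRead and the method readPairs are names invented for the illustration, and a Scanner over a sample string stands in for new Scanner(file).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class SinglePassRead {
    // Hypothetical helper: parses "(age, name)(age, name)..." lines in one pass.
    // Ages and names are appended to the two lists at matching indexes.
    static void readPairs(Scanner in, List<Integer> ages, List<String> names) {
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            // Same split as in the question: cut between ")" and "("
            for (String part : line.split("(?<=\\))(?=\\()")) {
                String[] fields = part.replaceAll("[()]", "").split(",\\s*");
                ages.add(Integer.parseInt(fields[0].trim()));
                names.add(fields[1].trim());
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> ageList = new ArrayList<>();
        List<String> nameList = new ArrayList<>();
        // Sample text stands in for the file contents
        readPairs(new Scanner("(23, Matt)(2000, jack)(50, Sal)"), ageList, nameList);

        // Convert to arrays only once the final size is known
        int[] ages = ageList.stream().mapToInt(Integer::intValue).toArray();
        String[] names = nameList.toArray(new String[0]);
        System.out.println(ages.length + " pairs, first: (" + ages[0] + "," + names[0] + ")");
    }
}
```

The resulting arrays can be passed to bubbleSort and printArray unchanged, with ages.length as the count.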
I have a simple .txt file ("theFile.txt") with the following format, where the left column is the lineNumber and the right column is the word:
5 today
2 It's
1 "
4 sunny
3 a
6 "
For this txt file, I'm writing two separate methods, one to get only the number and one to get only the string, plus another method to scan the file and put each lineNumber and word in a doubly linked list (DLL):
String fileName = "theFile.txt";
public int getNumberOnly() {
int lineNumber;
//code to only get the lineNumber but NOT the words
//This is as far as I got and I need help on this part
return lineNumber;
}
public String getWordsOnly() {
String words;
//code to only get the words but NOT the lineNumber
//This is as far as I got and I need help on this part
return words;
}
public void readAndPrintWholeFile(String fileName){
String fileContents = new String();
File file = new File("theFile.txt");
Scanner scanner = new Scanner(new FileInputStream(fileName));
DLL<T> list = new DLL<T>();
//Print each lineNumber and corresponding words for example
// 5 Today
// 2 It's
while (scanner.hasNextLine())
{
fileContents = scanner.nextLine();
System.out.println(list.getNumbersOnly() + " " + list.getWordOnly());
//prints the lineNumber then space then the word
}
}
//I already have all DLL accessors and mutators such as get & set next/previous nodes here, etc.
I'm stuck on how to code the method bodies for both getNumbersOnly() and getWordOnly()
I've tried my best to get to this point. Thanks for your help.
public static void readAndPrintWholeFile(String filename) throws FileNotFoundException {
String fileContents;
File file = new File(filename);
Scanner scanner = new Scanner(new FileInputStream(file));
Map<String, String> map = new HashMap<>();
while (scanner.hasNextLine()) {
try {
fileContents = scanner.nextLine();
String[] as = fileContents.split(" +");
map.put(as[0], as[1]);
System.out.println(as[0] + " " + as[1]);
} catch (ArrayIndexOutOfBoundsException e) {
//If there is some problem with the file format, e.g. a number without a word
}
}
}
You can do this in many ways, and one of them is here.
Here I have not implemented the getNumberOnly() and getWordsOnly() methods, and I don't have the DLL implementation, so I put the data in a map (HashMap).
You need to pass "fileContents" as the parameter to the functions.
public int getNumberOnly(String fileContents) {
int lineNumber;
//Get position of space
int spacePos = fileContents.indexOf(" ");
//Get substring from start till first space is encountered. Also trim off any leading or trailing spaces. Convert string to int via parseInt
lineNumber = Integer.parseInt(fileContents.substring(0, spacePos).trim());
return lineNumber;
}
public String getWordsOnly(String fileContents) {
String words;
int spacePos = fileContents.indexOf(" ");
//Get substring from first space till the end
words = fileContents.substring(spacePos).trim();
return words;
}
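A small usage sketch of the two methods, applied to the sample lines from the question. The wrapper class LineParts is invented for the illustration, and the methods are made static so they can be called without an instance. Both assume every line contains at least one space; a line without one would make substring throw.

```java
public class LineParts {
    // Returns the number before the first space, e.g. "5 today" -> 5
    static int getNumberOnly(String fileContents) {
        int spacePos = fileContents.indexOf(' ');
        return Integer.parseInt(fileContents.substring(0, spacePos).trim());
    }

    // Returns everything after the first space, e.g. "5 today" -> "today"
    static String getWordsOnly(String fileContents) {
        int spacePos = fileContents.indexOf(' ');
        return fileContents.substring(spacePos).trim();
    }

    public static void main(String[] args) {
        // Lines copied from the sample file in the question
        String[] lines = {"5 today", "2 It's", "1 \"", "4 sunny", "3 a", "6 \""};
        for (String line : lines) {
            System.out.println(getNumberOnly(line) + " " + getWordsOnly(line));
        }
    }
}
```

Inside readAndPrintWholeFile, the same two calls would take the fileContents string returned by scanner.nextLine().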
I've got the following code that opens and reads a file and separates it into words.
My problem is making an array of these words in alphabetical order.
import java.io.*;
class MyMain {
public static void main(String[] args) throws IOException {
File file = new File("C:\\Kennedy.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
String line = null;
int line_count=0;
int byte_count;
int total_byte_count=0;
int fromIndex;
while( (line = br.readLine())!= null ){
line_count++;
fromIndex=0;
String [] tokens = line.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");
String line_rest=line;
for (int i=1; i <= tokens.length; i++) {
byte_count = line_rest.indexOf(tokens[i-1]);
//if ( tokens[i-1].length() != 0)
//System.out.println("\n(line:" + line_count + ", word:" + i + ", start_byte:" + (total_byte_count + fromIndex) + "' word_length:" + tokens[i-1].length() + ") = " + tokens[i-1]);
fromIndex = fromIndex + byte_count + 1 + tokens[i-1].length();
if (fromIndex < line.length())
line_rest = line.substring(fromIndex);
}
total_byte_count += fromIndex;
}
}
}
I would read the File with a Scanner [1] (and I would prefer the File(String,String) constructor to provide the parent folder). Also, you should remember to close your resources explicitly in a finally block, or you can use a try-with-resources statement. Finally, for sorting you can store your words in a TreeSet, in which the elements are ordered using their natural ordering [2]. Something like,
File file = new File("C:/", "Kennedy.txt");
try (Scanner scanner = new Scanner(file)) {
Set<String> words = new TreeSet<>();
int line_count = 0;
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
line_count++;
String[] tokens = line.split(",\\s+|\\s*\\\"\\s*|\\s+|\\.\\s*|\\s*\\:\\s*");
Stream.of(tokens).forEach(word -> words.add(word));
}
System.out.printf("The file contains %d lines, and in alphabetical order [%s]%n",
line_count, words);
} catch (Exception e) {
e.printStackTrace();
}
[1] Mainly because it requires less code.
[2] Or by a Comparator provided at set creation time.
If you are storing the tokens in a String array, use Arrays.sort() to sort the array in place. Because the elements are Strings, they are sorted by their natural (lexicographic) ordering.
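A minimal sketch of the Arrays.sort() approach. The class SortTokens and its two helpers are names made up for this example, and the line is split on whitespace only, which is simpler than the regex in the question. Note that natural String ordering puts uppercase letters before lowercase ones; passing String.CASE_INSENSITIVE_ORDER is one way around that.

```java
import java.util.Arrays;

public class SortTokens {
    // Hypothetical helper: splits on whitespace and sorts by natural String order
    static String[] sortedWords(String line) {
        String[] tokens = line.trim().split("\\s+");
        Arrays.sort(tokens);
        return tokens;
    }

    // Same, but ignoring case, so "Ask" and "ask" sort next to each other
    static String[] sortedWordsIgnoringCase(String line) {
        String[] tokens = line.trim().split("\\s+");
        Arrays.sort(tokens, String.CASE_INSENSITIVE_ORDER);
        return tokens;
    }

    public static void main(String[] args) {
        String line = "ask not what your country can do";
        System.out.println(Arrays.toString(sortedWords(line)));
    }
}
```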
I am practicing to write a program that gets a text file from user and provides data such as characters, words, and lines in the text.
I have searched and looked over the same topic but cannot find a way to make my code run.
public class Document{
private Scanner sc;
// Sets users input to a file name
public Document(String documentName) throws FileNotFoundException {
File inputFile = new File(documentName);
try {
sc = new Scanner(inputFile);
} catch (IOException exception) {
System.out.println("File does not exists");
}
}
public int getChar() {
int Char= 0;
while (sc.hasNextLine()) {
String line = sc.nextLine();
Char += line.length() + 1;
}
return Char;
}
// Gets the number of words in a text
public int getWords() {
int Words = 0;
while (sc.hasNext()) {
String line = sc.next();
Words += new StringTokenizer(line, " ,").countTokens();
}
return Words;
}
public int getLines() {
int Lines= 0;
while (sc.hasNextLine()) {
Lines++;
}
return Lines;
}
}
Main method:
public class Main {
public static void main(String[] args) throws FileNotFoundException {
Document doc = new Document("someText.txt");
// outputs 1451, should be 1450
System.out.println("Number of characters: "
+ doc.getChar());
// outputs 0, should be 257
System.out.println("Number of words: " + doc.getWords());
// outputs 0, should be 49
System.out.println("Number of lines: " + doc.getLines());
}
}
I know exactly why I get 1451 instead of 1450. The reason is that I do not have '\n' at the end of the last sentence, but my method adds
numChars += line.length() + 1;
However, I cannot find a solution to why I get 0 for words and lines.
*My texts includes elements as: ? , - '
After all, could anyone help me to make this work?
**So far, the problem that concerns me is how I can get the number of characters if the last sentence does not have a '\n' element. Is there a chance I could fix that with an if statement?
-Thank you!
After doc.getChar() you have reached the end of file. So there's nothing more to read in this file!
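You should create a new Scanner at the start of each of your getChar/Words/Lines methods (this assumes inputFile is stored as a field rather than as a local variable in the constructor), such as: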
public int getChar() {
sc = new Scanner(inputFile);
...
// solving your problem with the last '\n'
while (sc.hasNextLine()) {
String line = sc.nextLine();
if (sc.hasNextLine())
Char += line.length() + 1;
else
Char += line.length();
}
return Char;
}
Please note that a line ending is not always \n! It might also be \r\n (especially under windows)!
public int getWords() {
sc = new Scanner(inputFile);
...
public int getLines() {
sc = new Scanner(inputFile);
...
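Put together, the idea of opening a fresh Scanner in every method could be sketched as below. The class name DocumentSketch and the helpers open() and ofText() are invented for this illustration; ofText() only exists so that the example can run without a real someText.txt. The sketch also calls nextLine() inside getLines(), because a loop that only checks hasNextLine() never advances and would not terminate on a non-empty file.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Scanner;
import java.util.StringTokenizer;

public class DocumentSketch {
    private final File inputFile; // kept as a field so every method can reopen it

    public DocumentSketch(File inputFile) {
        this.inputFile = inputFile;
    }

    // Hypothetical demo helper: writes the text to a temporary file
    static DocumentSketch ofText(String text) {
        try {
            File tmp = File.createTempFile("doc", ".txt");
            tmp.deleteOnExit();
            Files.write(tmp.toPath(), text.getBytes(StandardCharsets.UTF_8));
            return new DocumentSketch(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens a new Scanner positioned at the start of the file
    private Scanner open() {
        try {
            return new Scanner(inputFile);
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }

    public int getChars() {
        int chars = 0;
        try (Scanner sc = open()) {
            while (sc.hasNextLine()) {
                String line = sc.nextLine();
                // Count one separator per line, except after the last line
                chars += sc.hasNextLine() ? line.length() + 1 : line.length();
            }
        }
        return chars;
    }

    public int getWords() {
        int words = 0;
        try (Scanner sc = open()) {
            while (sc.hasNext()) {
                // As in the question, commas also separate words
                words += new StringTokenizer(sc.next(), " ,").countTokens();
            }
        }
        return words;
    }

    public int getLines() {
        int lines = 0;
        try (Scanner sc = open()) {
            while (sc.hasNextLine()) {
                sc.nextLine(); // consume the line, otherwise the loop never ends
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        DocumentSketch doc = ofText("one two\nthree, four");
        System.out.println("Number of characters: " + doc.getChars());
        System.out.println("Number of words: " + doc.getWords());
        System.out.println("Number of lines: " + doc.getLines());
    }
}
```

Like the snippet above it, this counts every line separator as a single character, so a file with \r\n line endings will report fewer characters than it has bytes.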
I would use one sweep to calculate all three, with separate counters: just loop over each char, check whether it starts a new word or a new line, and increase the counts. Use Character.isWhitespace to detect whitespace.
import java.io.*;
/**Count lines, characters and words. Assumes all non-whitespace runs are words, so even () is a word*/
public class ChrCounts{
String data;
int chrCnt;
int lineCnt;
int wordCnt;
public static void main(String args[]){
ChrCounts c = new ChrCounts();
try{
InputStream data = null;
if(args == null || args.length < 1){
data = new ByteArrayInputStream("quick brown foxes\n\r new toy\'s a fun game.\nblah blah.la la ga-ma".getBytes("utf-8"));
}else{
data = new BufferedInputStream( new FileInputStream(args[0]));
}
c.process(data);
c.print();
}catch(Exception e){
System.out.println("ee " + e);
e.printStackTrace();
}
}
public void print(){
System.out.println("line cnt " + lineCnt + "\nword cnt " + wordCnt + "\n chrs " + chrCnt);
}
public void process(InputStream data) throws Exception{
int chrCnt = 0;
int lineCnt = 0;
int wordCnt = 0;
boolean inWord = false;
boolean inNewline = false;
//char prev = ' ';
while(data.available() > 0){
int j = data.read();
if(j < 0)break;
chrCnt++;
final char c = (char)j;
//prev = c;
if(c == '\n' || c == '\r'){
chrCnt--;//some editors do not count line separators as characters
inWord = false;
if(!inNewline){
inNewline = true;
lineCnt++;
}else{
//chrCnt--;//some editors don't count adjacent line separators as characters
}
}else{
inNewline = false;
if(Character.isWhitespace(c)){
inWord = false;
}else{
if(!inWord){
inWord = true;
wordCnt++;
}
}
}
}
//we had some data and last char was not in new line, count last line
if(chrCnt > 0 && !inNewline){
lineCnt++;
}
this.chrCnt = chrCnt;
this.lineCnt = lineCnt;
this.wordCnt = wordCnt;
}
}
I'm reading from the file:
name1 wordx wordy passw1
name2 wordx wordy passw2
name3 wordx wordy passw3
name (i) wordx wordy PASSW (i)
x
x word
x words
words
x
words
At the moment I can print line by line:
Line 1: name1 wordx wordy passw1
Line 2: name2 wordx wordy passw2
I plan to have access to:
users [0] = name1
users [1] = name2
users [2] = name3
..
passws [0] = passw1
passws [1] = passw2
passws [2] = passw3
..
My code is:
public static void main(String args[]) throws FileNotFoundException, IOException {
ArrayList<String> list = new ArrayList<String>();
Scanner inFile = null;
try {
inFile = new Scanner(new File("C:\\file.txt"));
} catch (FileNotFoundException e) {
e.printStackTrace();
}
while (inFile.hasNextLine()) {
list.add(inFile.nextLine()+",");
}
String listString = "";
for (String s : list) {
listString += s + "\t";
}
String[] parts = listString.split(",");
System.out.println("Line1: "+ parts[0]);
}
How do I get the following output:
User is name1 and password is passw1
User is name32 and password is passw32
Thanks in advance.
Something like this will do:
public static void main(String args[]) throws FileNotFoundException, IOException {
ArrayList<String> list = new ArrayList<String>();
Scanner inFile = null;
try {
inFile = new Scanner(new File("C:\\file.txt"));
} catch (FileNotFoundException e) {
e.printStackTrace();
}
while (inFile.hasNextLine()) {
list.add(inFile.nextLine());
}
int line = 0;
String[] parts = list.get(line).split(" ");
String username = parts[0];
String pass = parts[3];
System.out.println("Line" + (line + 1) + ": " + "User is " + username +" and password is " + pass);
}
EDIT: if you want to iterate through all lines, just put the last lines in a loop:
for (int line = 0; line < list.size(); line++) {
String[] parts = list.get(line).split(" ");
String username = parts[0];
String pass = parts[3];
System.out.println("Line" + (line + 1) + ": " + "User is " + username +" and password is " + pass);
}
The first thing to do is to add this loop at the end of your code:
for(int i = 0; i < parts.length; i++){
System.out.println("parts["+i+"] :" + parts[i] );
}
That will simply show the result of the split using ",".
Then adapt your code: you may want to pass another regex to split(), for instance a space.
String[] parts = listString.split(" ");
For documentation about the split() method, see the Javadoc for java.lang.String.
If you want to get that output then this should do the trick:
public static void main(String args[]) throws FileNotFoundException, IOException {
Scanner inFile = null;
try {
inFile = new Scanner(new File("F:\\file.txt"));
} catch (FileNotFoundException e) {
e.printStackTrace();
}
Map<String, String> userAndPassMap = new LinkedHashMap<>();
while (inFile.hasNextLine()) {
String nextLine = inFile.nextLine();
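String[] userAndPass = nextLine.split(" ");
// The password is the 4th token (name wordx wordy passw); skip shorter lines
if (userAndPass.length < 4) continue;
userAndPassMap.put(userAndPass[0], userAndPass[3]);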
}
for (Map.Entry<String, String> entry : userAndPassMap.entrySet()) {
System.out.println("User is:" + entry.getKey() + " and password is:" + entry.getValue());
}
}
By storing them in a map you link each username directly to its password. If you need to save them separately, you can do this in the while loop instead:
List<String> users = new LinkedList<>(),passwords = new LinkedList<>();
while (inFile.hasNextLine()) {
String nextLine = inFile.nextLine();
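String[] userAndPass = nextLine.split(" ");
// The password is the 4th token (name wordx wordy passw); skip shorter lines
if (userAndPass.length < 4) continue;
users.add(userAndPass[0]);
passwords.add(userAndPass[3]);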
}
and later convert them to arrays:
String[] userArray = users.toArray(new String[0]);
I recommend you use a java.util.Map, a standard API which allows you to store objects and read each one of them by a key. (In your case, string objects indexed by string keys). Example:
Let's assume this empty map:
Map<String, String> map=new HashMap<String,String>();
If you store this:
map.put("month", "january");
map.put("day", "sunday");
You can expect that map.get("month") will return "january", map.get("day") will return "sunday", and map.get(any-other-string) will return null.
Back to your case: First, you must create and populate the map:
private Map<String, String> toMap(Scanner scanner)
{
Map<String, String> map=new HashMap<String, String>();
while (scanner.hasNextLine())
{
String line=scanner.nextLine();
String[] parts=line.split(" ");
// Validation: Process only lines with 4 tokens or more:
if (parts.length>=4)
{
map.put(parts[0], parts[parts.length-1]);
}
}
return map;
}
And then, to read the map:
private void listMap(Map<String,String> map)
{
for (String name : map.keySet())
{
String pass=map.get(name);
System.out.println(...);
}
}
You must include both in your class and call them from the main method.
If you need arbitrary indexing of the lines you have read, use an ArrayList:
First, define a javabean User:
public class User
{
private String name;
private String password;
// ... add full constructor, getters and setters.
}
And then, you must create and populate the list:
private List<User> toList(Scanner scanner)
{
List<User> list=new ArrayList<User>();
while (scanner.hasNextLine())
{
String line=scanner.nextLine();
String[] parts=line.split(" ");
// Validation: Process only lines with 4 tokens or more:
if (parts.length>=4)
{
list.add(new User(parts[0], parts[parts.length-1]));
}
}
return list;
}
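To see the list approach end to end, here is a rough runnable sketch. The wrapper class UserListDemo is invented for the example, and its nested User is a simplified, immutable stand-in for the javabean above (constructor and getters only, no setters). A Scanner over a sample string replaces the Scanner over C:\file.txt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class UserListDemo {
    // Simplified stand-in for the User bean: constructor and getters only
    static class User {
        private final String name;
        private final String password;

        User(String name, String password) {
            this.name = name;
            this.password = password;
        }

        String getName() { return name; }
        String getPassword() { return password; }
    }

    // Builds one User per line that has at least 4 tokens
    static List<User> toList(Scanner scanner) {
        List<User> list = new ArrayList<>();
        while (scanner.hasNextLine()) {
            String[] parts = scanner.nextLine().trim().split("\\s+");
            // First token is the name, last token is the password
            if (parts.length >= 4) {
                list.add(new User(parts[0], parts[parts.length - 1]));
            }
        }
        return list;
    }

    public static void main(String[] args) {
        // Sample text shaped like the file in the question
        String text = "name1 wordx wordy passw1\n"
                + "name2 wordx wordy passw2\n"
                + "x word\n";
        List<User> users = toList(new Scanner(text));
        for (User user : users) {
            System.out.println("User is " + user.getName()
                    + " and password is " + user.getPassword());
        }
    }
}
```

Because the result is a List, users.get(i) gives index-based access to any entry, which is what the users[i] and passws[i] arrays in the question were meant to provide.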