I am trying to count the polysyllables in a text file. I have managed to count the syllables and a few other things (i.e., the number of words, sentences, etc.) but I can't get the count of polysyllables. I searched the internet and found very little information on polysyllables to work with. I'd appreciate it if anyone could help.
static void syllableCount(String text) {
Pattern p = Pattern.compile("[aeiouy]+[^$e(,.:;!?)]");
Matcher m = p.matcher(text);
int syllables = 0;
while (m.find()){
syllables++;
}
System.out.println("Syllables: " + syllables);
}
public static void main(String[] args) {
//accept file name or directory name through command line args
//String fname =args[0];
//pass the filename or directory name to File object
File f = new File("in3.txt");
try (Scanner scanner = new Scanner(f)) {
while (scanner.hasNext()) {
String text = scanner.nextLine();
int words = text.split(" |\n|\t").length;
int sentences = text.split("\\.|\\?|!").length;
int characters = text.replaceAll(" |\n|\t","").split("").length;
syllableCount(text);
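// polySyllables(text);   // the method I still need to write
}
} catch (FileNotFoundException e) {
System.out.println("Could not open " + f);
}
}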
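The question's syllableCount counts vowel groups across the whole text, so it cannot tell how many syllables each individual word has. A polysyllable is commonly defined (for example in the SMOG readability formula) as a word with three or more syllables, so one option is to estimate syllables per word and count the words that reach three. The sketch below is a heuristic under stated assumptions: English text, one syllable per run of vowels (y included), and a trailing silent "e" dropped except after a consonant + "le". It is not an exact syllabifier, and PolysyllableCounter, syllablesInWord and countPolysyllables are hypothetical names.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PolysyllableCounter {
    // Assumption: a run of consecutive vowels (y included) is one syllable.
    private static final Pattern VOWEL_GROUP = Pattern.compile("[aeiouy]+");

    /** Rough syllable estimate for one English word (heuristic, not exact). */
    static int syllablesInWord(String word) {
        // Keep letters only, so "yesterday." and "It's" are handled.
        String w = word.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        if (w.isEmpty()) {
            return 0;
        }
        int count = 0;
        Matcher m = VOWEL_GROUP.matcher(w);
        while (m.find()) {
            count++;
        }
        // A final silent "e" ("make") is not a syllable, but consonant + "le" ("table") is.
        if (w.endsWith("e") && count > 1) {
            boolean consonantLe = w.length() > 2 && w.endsWith("le")
                    && "aeiouy".indexOf(w.charAt(w.length() - 3)) < 0;
            if (!consonantLe) {
                count--;
            }
        }
        return Math.max(count, 1);
    }

    /** Counts words with an estimated three or more syllables. */
    static int countPolysyllables(String text) {
        int polysyllables = 0;
        for (String word : text.split("\\s+")) {
            if (syllablesInWord(word) >= 3) {
                polysyllables++;
            }
        }
        return polysyllables;
    }

    public static void main(String[] args) {
        String text = "The beautiful elephant ate a banana yesterday.";
        // beautiful, elephant, banana and yesterday are estimated at 3 syllables each
        System.out.println("Polysyllables: " + countPolysyllables(text));
    }
}
```

In the question's loop, the commented-out polySyllables(text) call could be replaced by adding countPolysyllables(text) to a running total for the file.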
public void readToooooolData(String fileName) throws FileNotFoundException
{
File dataFile = new File(fileName);
Scanner scanner = new Scanner(dataFile);
scanner.useDelimiter("( *, *)|(\\s*,\\s*)|(\r\n)|(\n)");
while(scanner.hasNextLine()) {
String line = scanner.nextLine();
if(!line.startsWith("//") || !(scanner.nextLine().startsWith(" ") )) {
String toolName = scanner.next();
String itemCode = scanner.next();
int timesBorrowed = scanner.nextInt();
boolean onLoan = scanner.nextBoolean();
int cost = scanner.nextInt();
int weight = scanner.nextInt();
storeTool( new Tool(toolName, itemCode, timesBorrowed, onLoan, cost, weight) );
scanner.nextLine();
}
scanner.close();
}
}
The data file:
// this is a comment, any lines that start with //
// (and blank lines) should be ignored
// data is toolName, toolCode, timesBorrowed, onLoan, cost, weight
Makita BHP452RFWX,RD2001, 12 ,false,14995,1800
Flex Impact Screwdriver FIS439,RD2834,14,true,13499,1200
DeWalt D23650-GB Circular Saw, RD6582,54,true,14997,5400
Milwaukee DD2-160XE Diamond Core Drill,RD4734,50,false,38894,9000
Bosch GSR10.8-Li Drill Driver,RD3021,25, true,9995,820
Bosch GSB19-2REA Percussion Drill,RD8654,85,false,19999,4567
Flex Impact Screwdriver FIS439, RD2835,14,false,13499,1200
DeWalt DW936 Circular Saw,RD4352,18,false,19999,3300
Sparky FK652 Wall Chaser,RD7625,15,false,29994,8400
There are some problems with your code.
It's better to use try-with-resources to open the Scanner, so that it is closed even if an exception is thrown.
Some of the delimiters you specified are redundant, and some are unnecessary. \s includes the space character, so (\s*,\s*) already covers ( *, *), and (\r\n)|(\n) is not needed.
You read a line from the file and test it for being a comment or an empty line, which is fine. Then you need to extract the tokens from the line you have already read, but you cannot use scanner.next() for this, because it returns the next token after that line. So your code actually tries to parse the line that follows the non-comment, non-empty line.
There is also another scanner.nextLine() at the end of the loop, so you skip one more line from the file. The if condition itself also calls scanner.nextLine() whenever the line is a comment (the right-hand side of || is only evaluated when the left-hand side is false), which consumes yet another line.
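The nextLine()/next() problem described above can be reproduced in isolation. In this sketch (ScannerPitfallDemo and lineThenToken are hypothetical names, and two of the sample lines are inlined as a string instead of being read from a file), nextLine() consumes the first line, and the following next() already returns a token from the second line.

```java
import java.util.Scanner;

public class ScannerPitfallDemo {
    /** Returns {line read by nextLine(), token read by next() right after it}. */
    static String[] lineThenToken(String data) {
        try (Scanner scanner = new Scanner(data)) {
            scanner.useDelimiter("\\s*,\\s*|\\n");
            String line = scanner.nextLine(); // consumes the whole first line
            String token = scanner.next();    // comes from the SECOND line
            return new String[] {line, token};
        }
    }

    public static void main(String[] args) {
        String[] result = lineThenToken(
                "Makita BHP452RFWX,RD2001\nFlex Impact Screwdriver FIS439,RD2834\n");
        System.out.println("nextLine(): " + result[0]);
        System.out.println("next():     " + result[1]);
    }
}
```

nextLine() ignores the delimiter and always reads up to the line terminator, which is why splitting the line you already hold (as in the corrected code) is the safer approach.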
public void readToooooolData(String fileName) throws FileNotFoundException {
File dataFile = new File(fileName);
try (Scanner scanner = new Scanner(dataFile)) {
while (scanner.hasNextLine()) {
String line = scanner.nextLine().trim();
if (line.startsWith("//") || line.isEmpty()) {
continue;
}
String tokens[] = line.split("\\s*,\\s*");
if (tokens.length != 6) {
throw new RuntimeException("Wrong data file format");
}
String toolName = tokens[0];
String itemCode = tokens[1];
int timesBorrowed = Integer.parseInt(tokens[2]);
boolean onLoan = Boolean.parseBoolean(tokens[3]);
int cost = Integer.parseInt(tokens[4]);
int weight = Integer.parseInt(tokens[5]);
storeTool( new Tool(toolName, itemCode, timesBorrowed, onLoan, cost, weight) );
}
}
}
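The parsing step of the corrected code can be checked against the sample lines without a data file or the Tool class. This is a minimal sketch: ToolLineParser.parseLine is a hypothetical helper that returns null for comment and blank lines instead of using continue, and it throws IllegalArgumentException rather than RuntimeException for malformed lines.

```java
import java.util.Arrays;

public class ToolLineParser {
    /** Splits one data line into its six fields, or returns null for comments and blank lines. */
    static String[] parseLine(String rawLine) {
        String line = rawLine.trim();
        if (line.isEmpty() || line.startsWith("//")) {
            return null; // nothing to parse on this line
        }
        // A comma with optional whitespace on either side separates the fields.
        String[] tokens = line.split("\\s*,\\s*");
        if (tokens.length != 6) {
            throw new IllegalArgumentException("Wrong data file format: " + line);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] samples = {
            "// data is toolName, toolCode, timesBorrowed, onLoan, cost, weight",
            "Makita BHP452RFWX,RD2001, 12 ,false,14995,1800",
            "DeWalt D23650-GB Circular Saw, RD6582,54,true,14997,5400"
        };
        for (String s : samples) {
            String[] fields = parseLine(s);
            if (fields != null) {
                int timesBorrowed = Integer.parseInt(fields[2]);
                boolean onLoan = Boolean.parseBoolean(fields[3]);
                System.out.println(Arrays.toString(fields)
                        + " borrowed=" + timesBorrowed + " onLoan=" + onLoan);
            }
        }
    }
}
```

Because the delimiter absorbs the whitespace around each comma, " 12 " and " RD6582" come out as "12" and "RD6582", so Integer.parseInt does not fail on the padded values.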
I have text like this in my String s (which I have already read from a .txt file):
trump;Donald Trump;trump#yahoo.eu
obama;Barack Obama;obama#google.com
bush;George Bush;bush#inbox.com
clinton,Bill Clinton;clinton#mail.com
Then I try to cut off everything except the e-mail address and print it to the console:
String f1[] = null;
f1=s.split("(.*?);");
for (int i=0;i<f1.length;i++) {
System.out.print(f1[i]);
}
and I get output like this:
trump#yahoo.eu
obama#google.com
bush#inbox.com
clinton#mail.com
How can I avoid this? In other words, how can I get the output without line breaks?
Try the approach below. I read your file with both Scanner and BufferedReader, and in both cases I don't get any line breaks, because nextLine() and readLine() return each line without its line terminator. file.txt is the file that contains the text, and the splitting logic is the same as yours.
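Where the line breaks come from: by default . does not match a line terminator, so the lazy delimiter (.*?); can never run across the end of a line. The newline therefore stays inside the piece that precedes the next line's first delimiter. A minimal demonstration, assuming s holds the lines joined with \n (SplitNewlineDemo and splitLikeQuestion are hypothetical names):

```java
public class SplitNewlineDemo {
    /** The split used in the question. */
    static String[] splitLikeQuestion(String s) {
        return s.split("(.*?);");
    }

    public static void main(String[] args) {
        String s = "trump;Donald Trump;trump#yahoo.eu\n"
                + "obama;Barack Obama;obama#google.com";
        for (String part : splitLikeQuestion(s)) {
            // Show the newline as the two characters \n so it is visible
            System.out.println("[" + part.replace("\n", "\\n") + "]");
        }
    }
}
```

One of the pieces is "trump#yahoo.eu\n", which is why print() still produces line breaks; the other pieces are empty strings produced by adjacent delimiter matches. Splitting each line separately, or removing \r and \n afterwards, avoids the problem.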
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;
public class CC {
public static void main(String[] args) throws IOException {
// Option 1: Scanner.nextLine() returns each line without its line terminator
try (Scanner scan = new Scanner(new File("file.txt"))) {
while (scan.hasNextLine()) {
String[] f1 = scan.nextLine().split("(.*?);");
for (int i = 0; i < f1.length; i++) {
System.out.print(f1[i]);
}
}
}
// Option 2: BufferedReader.readLine() also strips the line terminator
try (BufferedReader br = new BufferedReader(new FileReader(new File("file.txt")))) {
String str;
while ((str = br.readLine()) != null) {
String[] f1 = str.split("(.*?);");
for (int i = 0; i < f1.length; i++) {
System.out.print(f1[i]);
}
}
}
}
}
You can simply remove all line breaks, as shown in the code below:
String f1[] = null;
f1=s.split("(.*?);");
for (int i=0;i<f1.length;i++) {
System.out.print(f1[i].replaceAll("\r", "").replaceAll("\n", ""));
}
This removes all of them without inserting any space.
Instead of splitting, you could match an email-like format directly: the negated character class [^\\s;]+ matches one or more characters that are neither a semicolon nor whitespace, followed by a #, and then again one or more characters that are neither a semicolon nor whitespace.
final String regex = "[^\\s;]+#[^\\s;]+";
final String string = "trump;Donald Trump;trump#yahoo.eu \n"
+ " obama;Barack Obama;obama#google.com \n"
+ " bush;George Bush;bush#inbox.com \n"
+ " clinton,Bill Clinton;clinton#mail.com";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
final List<String> matches = new ArrayList<String>();
while (matcher.find()) {
matches.add(matcher.group());
}
System.out.println(String.join("", matches));
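On Java 9 or later, the same regex can be applied without the explicit while loop by using Matcher.results(), which returns a Stream<MatchResult>. A sketch, with EmailResultsDemo and joinEmails as hypothetical names:

```java
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class EmailResultsDemo {
    // Same pattern as above: non-space, non-semicolon runs on both sides of '#'.
    private static final Pattern EMAIL_LIKE = Pattern.compile("[^\\s;]+#[^\\s;]+");

    static String joinEmails(String text) {
        return EMAIL_LIKE.matcher(text)
                .results()                  // Java 9+: stream of all matches
                .map(MatchResult::group)
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        String s = "trump;Donald Trump;trump#yahoo.eu\n"
                + "obama;Barack Obama;obama#google.com\n"
                + "bush;George Bush;bush#inbox.com\n"
                + "clinton,Bill Clinton;clinton#mail.com";
        System.out.println(joinEmails(s));
    }
}
```

Since [^\\s;] excludes whitespace, no match can contain a newline, so the joined result has no line breaks regardless of how the input was read.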
package com.test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String s = "trump;Donald Trump;trump#yahoo.eu "
+ "obama;Barack Obama;obama#google.com "
+ "bush;George Bush;bush#inbox.com "
+ "clinton;Bill Clinton;clinton#mail.com";
String spaceStrings[] = s.split("[\\s,;]+");
String output="";
for(String word:spaceStrings){
if(validate(word)){
output+=word;
}
}
System.out.println(output);
}
public static final Pattern VALID_EMAIL_ADDRESS_REGEX = Pattern.compile(
"^[A-Z0-9._%+-]+#[A-Z0-9.-]+\\.[A-Z]{2,6}$",
Pattern.CASE_INSENSITIVE);
public static boolean validate(String emailStr) {
Matcher matcher = VALID_EMAIL_ADDRESS_REGEX.matcher(emailStr);
return matcher.find();
}
}
Just remove any '\n' that may appear at the start or end of each piece, like this:
String f1[] = null;
f1=s.split("(.*?);");
for (int i=0;i<f1.length;i++) {
// String.replace needs a replacement argument; "" deletes the newline
f1[i] = f1[i].replace("\n", "");
System.out.print(f1[i]);
}
I have implemented code to count the number of:
- chars
- words
- lines
- bytes
in a text file.
But how do I count the dictionary size, i.e. the number of distinct words used in this file?
Also, how can I implement an iterator that iterates over letters only (ignoring whitespace)?
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;
public class wc {
public static void main(String[] args) throws IOException {
//counters
int charsCount = 0;
int wordsCount = 0;
int linesCount = 0;
File file = new File("Sample.txt");
try (Scanner scanner = new Scanner(new BufferedReader(new FileReader(file)))) {
while (scanner.hasNextLine()) {
// trim so that leading whitespace does not produce an empty first "word"
String tmpStr = scanner.nextLine().trim();
if (!tmpStr.isEmpty()) {
// characters, not counting whitespace
charsCount += tmpStr.replaceAll("\\s+", "").length();
wordsCount += tmpStr.split("\\s+").length;
}
++linesCount;
}
System.out.println("# of chars: " + charsCount);
System.out.println("# of words: " + wordsCount);
System.out.println("# of lines: " + linesCount);
// size of the file on disk
System.out.println("# of bytes: " + file.length());
}
}
}
To get the unique words and their number:
1. Split each line you read from the file into a string array.
2. Store the contents of this string array in a HashSet.
3. Repeat steps 1 and 2 until the end of the file.
4. The HashSet now holds the unique words, and its size() is their number.
I prefer posting the logic as pseudocode, as it helps the OP learn something by solving the problem themselves.
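A minimal Java sketch of those four steps, assuming the lines have already been read and that words differing only in case should count as the same word (UniqueWords and uniqueWords are hypothetical names):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class UniqueWords {
    /** Steps 1-3: split every line and add its words to a HashSet. */
    static Set<String> uniqueWords(Iterable<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    // Assumption: "The" and "the" count as the same word.
                    words.add(word.toLowerCase(Locale.ROOT));
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> words = uniqueWords(Arrays.asList("aa bb cc", "cc aa aa"));
        // Step 4: the set holds the distinct words and size() is their number.
        System.out.println("Unique words: " + words.size());
    }
}
```

For a real file, uniqueWords(Files.readAllLines(Paths.get("Sample.txt"))) would work the same way, since a List<String> is an Iterable<String>.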
Example of how my code works:
A file with the words "aa bb cc cc aa aa" has 3 unique words.
First, turn the words into a string with each word followed by a separator (the code below uses a space; "-" is used here only for readability).
String: "aa-bb-cc-cc-aa-aa-"
Get the first word: "aa", set the UniqueWordCount = 1, and then replace "aa-" with "".
New String: "bb-cc-cc-"
Get the first word: "bb", set the UniqueWordCount = 2, and then replace "bb-" with "".
New String: "cc-cc-"
Get the first word: "cc", set the UniqueWordCount = 3, and then replace "cc-" with "".
New String: "". Stop when the string is empty.
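// uses java.util.regex.Pattern
private static int getUniqueWordCountInFile(File file) throws FileNotFoundException {
// every word is followed by one space, e.g. "aa bb cc cc aa aa "
String fileWordsAsString = getFileWords(file);
int uniqueWordCount = 0;
while (!fileWordsAsString.isBlank()) {
// the first word runs up to the first space
String word = fileWordsAsString.substring(0, fileWordsAsString.indexOf(' '));
// remove every whole-word occurrence; Pattern.quote treats the word literally,
// and the lookbehind stops "aa " from also matching the end of "baa "
fileWordsAsString = fileWordsAsString.replaceAll("(?<=^| )" + Pattern.quote(word) + " ", "");
uniqueWordCount++;
}
return uniqueWordCount;
}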
private static String getFileWords(File file) throws FileNotFoundException {
String toReturn = "";
try (Scanner fileReader = new Scanner(file)) {
while (fileReader.hasNext()) {
if (fileReader.hasNextInt()) {
fileReader.nextInt();
} else {
toReturn += fileReader.next() + " ";
}
}
}
return toReturn;
}
To use my code, just pass getUniqueWordCountInFile() the file whose unique words you want to count.
Hey @JeyKey, you can use a HashMap. I'm also using an Iterator here. Check out this code:
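import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;
public class CountUniqueWords {
public static void main(String[] args) throws FileNotFoundException {
File f = new File("File Name");
List<String> arr = new ArrayList<>();
Map<String, Integer> listOfWords = new HashMap<>();
try (Scanner in = new Scanner(f)) {
while (in.hasNext()) {
arr.add(in.next());
}
}
int i = 0;
Iterator<String> itr = arr.iterator();
while (itr.hasNext()) {
i++;
// a repeated word overwrites its earlier entry, so there is one key per distinct word
listOfWords.put(itr.next(), i);
//System.out.println(listOfWords); //for printing the words
}
// every stored value is a different index, so this equals listOfWords.size()
Set<Integer> uniqueValues = new HashSet<>(listOfWords.values());
System.out.println("The number of unique words: " + uniqueValues.size());
}
}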
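For the second part of the question (an iterator that visits only letters), one option is a small Iterator<Character> that skips every character for which Character.isLetter returns false, so digits and punctuation are skipped along with whitespace. This is a sketch under that assumption; LetterIterator is a hypothetical name, and because it works on char values, letters outside the Basic Multilingual Plane (surrogate pairs) are not handled.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Iterates over the letters of a string, skipping whitespace, digits and punctuation. */
public class LetterIterator implements Iterator<Character> {
    private final String text;
    private int pos = 0;

    public LetterIterator(String text) {
        this.text = text;
        skipNonLetters();
    }

    // Move pos forward to the next letter, or to the end of the text.
    private void skipNonLetters() {
        while (pos < text.length() && !Character.isLetter(text.charAt(pos))) {
            pos++;
        }
    }

    @Override
    public boolean hasNext() {
        return pos < text.length();
    }

    @Override
    public Character next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        char c = text.charAt(pos++);
        skipNonLetters();
        return c;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        Iterator<Character> it = new LetterIterator("aa bb, cc!");
        while (it.hasNext()) {
            sb.append(it.next());
        }
        System.out.println(sb); // aabbcc
    }
}
```

To iterate over a whole file, the iterator can be created for each line returned by scanner.nextLine().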
I have a simple .txt file ("theFile.txt") with the following format, where the left column is the lineNumber and the right column is the word:
5 today
2 It's
1 "
4 sunny
3 a
6 "
For this .txt file, I'm writing two separate methods, one that gets only the number and one that gets only the string, plus another method that scans the file and puts each lineNumber and word into a doubly linked list (DLL):
String fileName = "theFile.txt";
public int getNumberOnly() {
int lineNumber;
//code to only get the lineNumber but NOT the words
//This is as far as I got and I need help on this part
return lineNumber;
}
public String getWordsOnly() {
String words;
//code to only get the words but NOT the lineNumber
//This is as far as I got and I need help on this part
return words;
}
public void readAndPrintWholeFile(String fileName){
String fileContents = new String();
File file = new File("theFile.txt");
Scanner scanner = new Scanner(new FileInputStream(fileName));
DLL<T> list = new DLL<T>();
//Print each lineNumber and corresponding words for example
// 5 Today
// 2 It's
while (scanner.hasNextLine())
{
fileContents = scanner.nextLine();
System.out.println(list.getNumbersOnly() + " " + list.getWordOnly());
//prints the lineNumber then space then the word
}
}
//I already have all DLL accessors and mutators such as get & set next/previous nodes here, etc.
I'm stuck on how to write the method bodies for both getNumberOnly() and getWordsOnly().
I've tried my best to get to this point. Thanks for your help.
public static void readAndPrintWholeFile(String filename) throws FileNotFoundException {
File file = new File(filename);
Map<String, String> map = new HashMap<>();
try (Scanner scanner = new Scanner(new FileInputStream(file))) {
while (scanner.hasNextLine()) {
// split "5 today" into ["5", "today"]
String[] as = scanner.nextLine().trim().split(" +");
if (as.length < 2) {
// problem with the file format, e.g. a number without a word: skip the line
continue;
}
map.put(as[0], as[1]);
System.out.println(as[0] + " " + as[1]);
}
}
}
There are many ways to do this; here is one.
I have not implemented the getNumberOnly() and getWordsOnly() methods here, and since I don't have your DLL implementation, I put the data in a map (HashMap) instead.
You need to pass "fileContents" as the parameter to the functions.
public int getNumberOnly(String fileContents) {
int lineNumber;
//Get the position of the first space
int spacePos = fileContents.indexOf(" ");
//Take the substring from the start up to the first space, trim any leading or trailing spaces, and convert it to an int with Integer.parseInt
lineNumber = Integer.parseInt(fileContents.substring(0, spacePos).trim());
return lineNumber;
}
public String getWordsOnly(String fileContents) {
String words;
int spacePos = fileContents.indexOf(" ");
//Take the substring from the first space to the end and trim it
words = fileContents.substring(spacePos).trim();
return words;
}
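Put together, the two helpers can be exercised on the sample lines. This sketch makes them static, trims each line first, reads the data from a string rather than theFile.txt, and uses java.util.LinkedList (which the JDK documents as a doubly-linked list) in place of the asker's own DLL class. NumberWordDemo is a hypothetical name, and every non-blank line is assumed to contain a number, a space and a word.

```java
import java.util.AbstractMap;
import java.util.LinkedList;
import java.util.Map;
import java.util.Scanner;

public class NumberWordDemo {
    /** Number before the first space, e.g. 5 for "5 today". */
    static int getNumberOnly(String line) {
        String trimmed = line.trim();
        return Integer.parseInt(trimmed.substring(0, trimmed.indexOf(' ')));
    }

    /** Everything after the first space, e.g. "today" for "5 today". */
    static String getWordsOnly(String line) {
        String trimmed = line.trim();
        return trimmed.substring(trimmed.indexOf(' ')).trim();
    }

    public static void main(String[] args) {
        String data = "5 today\n2 It's\n1 \"\n4 sunny\n3 a\n6 \"\n";
        // LinkedList is a doubly-linked list; it stands in for the DLL class here.
        LinkedList<Map.Entry<Integer, String>> list = new LinkedList<>();
        try (Scanner scanner = new Scanner(data)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                list.add(new AbstractMap.SimpleEntry<>(getNumberOnly(line), getWordsOnly(line)));
            }
        }
        for (Map.Entry<Integer, String> e : list) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

A line without a space would make indexOf return -1 and substring throw, so real input may need the same length check as the HashMap answer. Sorting the entries by key (for example with Map.Entry.comparingByKey()) would put the words back in line-number order.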
I have a text file:
John Smith 2009-11-04
Jenny Doe 2009-12-29
Alice Jones 2009-01-03
Bob Candice 2009-01-04
Carol Heart 2009-01-07
Carlos Diaz 2009-01-10
Charlie Brown 2009-01-14
I'm trying to remove the dashes and store the parts as separate fields (first, last, year, month, day) and then add them to a SortedSet/HashMap. But for some reason it's not working right.
Here is my code:
public class Test {
File file;
private Scanner sc;
//HashMap<Name, Date> hashmap = new HashMap<>();
/**
* @param filename
*/
public Test(String filename) {
file = new File(filename);
}
public void openFile(String filename) {
// open the file for scanning
System.out.println("Test file " + filename + "\n");
try {
sc = new Scanner(new File("birthdays.dat"));
}
catch(Exception e) {
System.out.println("Birthdays: Unable to open data file");
}
}
public void readFile() {
System.out.println("Name Birthday");
System.out.println("---- --------");
System.out.println("---- --------");
while (sc.hasNext()) {
String line = sc.nextLine();
String[] split = line.split("[ ]?-[ ]?");
String first = split[0];
String last = split[1];
//int year = Integer.parseInt(split[2]);
//int month = Integer.parseInt(split[3]);
//int day = Integer.parseInt(split[4]);
Resource name = new Name(first, last);
System.out.println(first + " " + last + " " + split[2] );
//hashmap.add(name);
}
}
public void closeFile() {
sc.close();
}
public static void main(String[] args) throws FileNotFoundException,
ArrayIndexOutOfBoundsException {
try {
Scanner sc = new Scanner( new File(args[0]) );
for( int i = 0; i < args.length; i++ ) {
//System.out.println( args[i] );
if( args.length == 0 ) {
}
else if( args.length >= 1 ) {
}
// System.out.printf( "Name %-20s Birthday", name.toString(), date.toString() );
}
} catch (ArrayIndexOutOfBoundsException e) {
System.err.println("Usage: Birthdays dataFile");
// Terminate the program here somehow, or see below.
System.exit(-1);
} catch (FileNotFoundException e) {
System.err.println("Birthdays: Unable to open data file");
// Terminate the program here somehow, or see below.
System.exit(-1);
}
Test r = new Test(args[0]);
r.openFile(args[0]);
r.readFile();
r.closeFile();
}
}
You're splitting on dashes, but your program is built around a split on spaces.
Try splitting on whitespace instead:
String[] split = line.split("\\s");
"John Smith 2009-11-04".split("[ ]?-[ ]?") results in ["John Smith 2009", "11", "04"], when what you want is a split on spaces: ["John", "Smith", "2009-11-04"].
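A quick check of the two calls described above on the first sample line (SplitCompareDemo, questionSplit and spaceSplit are hypothetical names):

```java
import java.util.Arrays;

public class SplitCompareDemo {
    /** The question's split: a hyphen with optional spaces around it. */
    static String[] questionSplit(String line) {
        return line.split("[ ]?-[ ]?");
    }

    /** The suggested split: single whitespace characters. */
    static String[] spaceSplit(String line) {
        return line.split("\\s");
    }

    public static void main(String[] args) {
        String line = "John Smith 2009-11-04";
        // Only the date is broken up; the names stay glued to the year
        System.out.println(Arrays.toString(questionSplit(line)));
        // First name, last name and date come out as separate elements
        System.out.println(Arrays.toString(spaceSplit(line)));
    }
}
```

The third element of the space split can then be split on "-" or passed to LocalDate.parse, which accepts yyyy-MM-dd by default.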
I would do this differently. First, create a domain object:
public class Person {
private String firstName;
private String lastName;
private LocalDate date;
//getters & setters
//equals & hashCode
//toString
}
Now create a method that parses a single String in your format into a Person:
//instance variable
private final DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd");
public Person parsePerson(final String input) {
final String[] data = input.split("\\s+");
final Person person = new Person();
person.setFirstName(data[0]);
person.setLastName(data[1]);
person.setDate(LocalDate.parse(data[2], dateTimeFormatter));
return person;
}
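Note that the DateTimeFormatter is an instance variable, so it is created once rather than on every call; DateTimeFormatter is immutable and thread-safe, so sharing it is safe. If you need to parse locale-dependent text such as month names, set the locale on the formatter with withLocale(...); for a purely numeric pattern like yyyy-MM-dd the locale does not matter.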
Now, you can read your file into a List<Person> very easily:
// needs: import static java.util.stream.Collectors.toList;
public List<Person> readFromFile(final Path path) throws IOException {
try (final Stream<String> lines = Files.lines(path)) {
return lines
.map(this::parsePerson)
.collect(toList());
}
}
And now that you have a List<Person>, you can sort or process them however you want.
You can even sort while creating the List:
// needs: import static java.util.Comparator.comparing;
public List<Person> readFromFile(final Path path) throws IOException {
try (final Stream<String> lines = Files.lines(path)) {
return lines
.map(this::parsePerson)
.sorted(comparing(Person::getLastName).thenComparing(Person::getFirstName))
.collect(toList());
}
}
Or have Person implement Comparable<Person> and simply use its natural ordering.
TL;DR: Use Objects for your objects and life becomes much simpler.
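For a quick end-to-end check without a file, the sketch below inlines the sample lines as a Stream<String> in place of Files.lines(path) and uses a compact, immutable stand-in for Person (a constructor instead of setters). It relies on LocalDate.parse accepting yyyy-MM-dd without an explicit formatter; BirthdayDemo, parsePerson and parseAndSort are hypothetical names.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BirthdayDemo {
    // Compact, immutable stand-in for the answer's Person class.
    static final class Person {
        private final String firstName;
        private final String lastName;
        private final LocalDate date;

        Person(String firstName, String lastName, LocalDate date) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.date = date;
        }

        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
        LocalDate getDate() { return date; }

        @Override
        public String toString() {
            return firstName + " " + lastName + " " + date;
        }
    }

    static Person parsePerson(String input) {
        String[] data = input.trim().split("\\s+");
        // LocalDate.parse uses ISO yyyy-MM-dd by default
        return new Person(data[0], data[1], LocalDate.parse(data[2]));
    }

    static List<Person> parseAndSort(Stream<String> lines) {
        return lines
                .filter(line -> !line.trim().isEmpty())
                .map(BirthdayDemo::parsePerson)
                .sorted(Comparator.comparing(Person::getLastName)
                        .thenComparing(Person::getFirstName))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> lines = Stream.of(
                "John Smith 2009-11-04", "Jenny Doe 2009-12-29", "Alice Jones 2009-01-03",
                "Bob Candice 2009-01-04", "Carol Heart 2009-01-07", "Carlos Diaz 2009-01-10",
                "Charlie Brown 2009-01-14");
        parseAndSort(lines).forEach(System.out::println);
    }
}
```

With a real file, parseAndSort(Files.lines(path)) inside a try-with-resources block gives the same result as the answer's readFromFile.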
I would use a regex:
private static final Pattern LINE_PATTERN
= Pattern.compile("(.+) (.+) ([0-9]{4})-([0-9]{2})-([0-9]{2})");
...
while (sc.hasNext()) {
String line = sc.nextLine();
Matcher matcher = LINE_PATTERN.matcher(line);
if (!matcher.matches()) {
// malformed line
} else {
String first = matcher.group(1);
String last = matcher.group(2);
int year = Integer.parseInt(matcher.group(3));
int month = Integer.parseInt(matcher.group(4));
int day = Integer.parseInt(matcher.group(5));
// do something with it
}
}
You are splitting on a hyphen with optional spaces around it, so only the date is split and the names stay together. Split on spaces first, then split the date on hyphens:
String[] split = line.split(" ");
String first = split[0];
String last = split[1];
line = split[2];
//now split the date
String[] splitz = line.split("-");
Or split on spaces and hyphens in one step:
String delims = "[ -]+";
String[] tokens = line.split(delims);
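The difference between the delimiters is easy to check. In the sketch below (DelimiterDemo, tokens and dateOf are hypothetical names), "[ ]?" is shown for contrast: because it can match the empty string, it cuts the line between characters, so "John" never comes out as a token. "[ -]+" yields the five fields, which are then combined into a java.time.LocalDate.

```java
import java.time.LocalDate;
import java.util.Arrays;

public class DelimiterDemo {
    /** Splits "First Last yyyy-MM-dd" on runs of spaces and hyphens. */
    static String[] tokens(String line) {
        return line.split("[ -]+");
    }

    /** Builds a date from the year, month and day tokens. */
    static LocalDate dateOf(String[] t) {
        return LocalDate.of(Integer.parseInt(t[2]), Integer.parseInt(t[3]), Integer.parseInt(t[4]));
    }

    public static void main(String[] args) {
        String line = "John Smith 2009-11-04";
        // "[ ]?" can match the empty string, so the line is cut between characters
        System.out.println(Arrays.toString(line.split("[ ]?")));
        String[] t = tokens(line);
        System.out.println(t[0] + " " + t[1] + " " + dateOf(t));
    }
}
```

Integer.parseInt accepts the leading zero in "04", and LocalDate.of rejects impossible dates with a DateTimeException, which gives a basic validity check for free.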
If I understood your question correctly, here is an answer:
List<String> listGet = new ArrayList<String>();
String getVal = "John Smith 2009-11-04";
String[] splited = getVal.split("[\\-:\\s]");
for(int j=0;j<splited.length;j++)
{
listGet.add(splited[j]);
}
System.out.println("first name :"+listGet.get(0));
System.out.println("Last name :"+listGet.get(1));
System.out.println("year is :"+listGet.get(2));
System.out.println("month is :"+listGet.get(3));
System.out.println("day is :"+listGet.get(4));
Output:
first name :John
Last name :Smith
year is :2009
month is :11
day is :04