Java reading a file into a list - java

I need to turn an input file into a list of sentences that are delimited by more than one character, specifically periods and exclamation points (. or !).
My input file has a layout similar to this:
Sample textfile!
A man, l, a ballot, a catnip, a pooh, a rail, a calamus, a dairyman, a bater, a canal - Panama!
This is a sentence! This one also.
Heres another one?
Yes another one.
How can I put that file into a list sentence by sentence?
Each sentence in my file is finished once a ! or . character is passed.

There are several ways to do this. Here is one that reads the file line by line and splits each line on specific delimiters into a list, while keeping the delimiter at the end of each sentence.
All of the logic for turning a file into a list based on multiple delimiters is in the turnSentencesToList() method.
In the example below I split on: ! . ?
import java.io.File;
import java.io.FileNotFoundException;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test{
public static void main(String [] args){
LinkedList<String> list = turnSentencesToList("sampleFile.txt");
for(String s: list)
System.out.println(s);
}
private static LinkedList<String> turnSentencesToList(String fileName) {
LinkedList<String> list = new LinkedList<>();
String regex = "\\.|!|\\?";
File file = new File(fileName);
Scanner scan = null;
try {
scan = new Scanner(file);
while(scan.hasNextLine()){
String line = scan.nextLine().trim();
String[] sentences = null;
//we don't need empty lines
if(!line.equals("")) {
//splits by . or ! or ?
sentences = line.split(regex);
//gather delims because split() removes them
List<String> delims = getDelimiters(line, regex);
if(sentences!=null) {
int count = 0;
for(String s: sentences) {
//the last piece has no delimiter if the line does not end with one
String delim = count<delims.size() ? delims.get(count) : "";
list.add(s.trim()+delim);
count++;
}
}
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
return null;
}finally {
if(scan!=null)
scan.close();
}
return list;
}
private static List<String> getDelimiters(String line, String regex) {
//this method is used to provide a list of all found delimiters in a line
List<String> allDelims = new LinkedList<String>();
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(line);
String delim = null;
while(matcher.find()) {
delim = matcher.group();
allDelims.add(delim);
}
return allDelims;
}
}
Based on your example input file, the produced output would be:
Sample textfile!
A man, l, a ballot, a catnip, a pooh, a rail, a calamus, a dairyman, a bater, a canal - Panama!
This is a sentence!
This one also.
Heres another one?
Yes another one.
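As an alternative to collecting the delimiters separately, a lookbehind regex can split right after each delimiter and leave it attached to its sentence. The following is only a minimal sketch: the class name SentenceSplitter and the method splitSentences are made up for illustration, and it assumes every sentence ends on the line where it starts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the answer above.
class SentenceSplitter {
    // Split right after '.', '!' or '?' and swallow any whitespace that follows.
    // The lookbehind (?<=...) is zero-width, so the delimiter stays in the sentence.
    private static final String AFTER_DELIMITER = "(?<=[.!?])\\s*";

    static List<String> splitSentences(String line) {
        List<String> sentences = new ArrayList<>();
        for (String piece : line.trim().split(AFTER_DELIMITER)) {
            // A blank line yields one empty piece; skip it.
            if (!piece.isEmpty()) {
                sentences.add(piece.trim());
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Sample line taken from the question's input file.
        for (String s : splitSentences("This is a sentence! This one also.")) {
            System.out.println(s);
        }
    }
}
```

Calling splitSentences on each line returned by scan.nextLine() would replace both the split() call and the getDelimiters() helper.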

Related

How to read text from a file between two given patterns in java8

I have a text file and need to read the contents between two given patterns.
For example, say I have a file named datafromOU.txt.
I need the data from pattern1 through pattern2.
pattern1: "CREATE EXTR"
pattern2: ";"
The problem is that the file has multiple occurrences of pattern2. So my requirement is to look for pattern1 and then find the first occurrence of pattern2 after it. I need to store this in a string and process it later.
How can I read the data from pattern1 through the next occurrence of pattern2 into a string variable using Java streams?
I use Java 8. Thanks a lot in advance.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
public class Main {
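static String getPatternFromString(String startPattern,String endPattern,String data) {
int startIndex=data.indexOf(startPattern);
if(startIndex<0) return null; // start pattern not found
// search for the end pattern only after the start pattern
int endIndex=data.indexOf(endPattern, startIndex + startPattern.length());
if(endIndex<0) return null; // no end pattern after the start pattern
return data.substring(startIndex,endIndex+endPattern.length());
}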
public static void main(String[] args) throws IOException {
File file1 = new File("C:\\Users\\amish\\Desktop\\partie.txt");
String startPattern="CREATE EXTR";
String endPattern=";";
try (BufferedReader leBuffer1 = new BufferedReader(new FileReader(file1));) {
StringBuilder everything = new StringBuilder();
String line;
while ((line = leBuffer1.readLine()) != null) {
everything.append(line);
}
String data = everything.toString();
data = data.trim();
System.out.println(data);
System.out.println(getPatternFromString(startPattern,endPattern, data));
// no explicit close() needed: try-with-resources closes leBuffer1
} catch (FileNotFoundException exception) {
System.out.println("File not found");
}
}
}
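If the file contains several "CREATE EXTR ... ;" statements, a reluctant regex can collect all of them in one pass. This is a minimal sketch using plain java.util.regex rather than streams; the class name BetweenPatterns and the method extractAll are hypothetical, and it assumes the whole file content already sits in one String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class BetweenPatterns {
    static List<String> extractAll(String data, String startPattern, String endPattern) {
        // Pattern.quote treats both markers as literal text, not as regex syntax.
        // ".*?" is reluctant, so each match stops at the first end marker it meets.
        // DOTALL lets '.' cross line breaks.
        Pattern p = Pattern.compile(
                Pattern.quote(startPattern) + ".*?" + Pattern.quote(endPattern),
                Pattern.DOTALL);
        Matcher m = p.matcher(data);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample data; only the two CREATE EXTR statements are returned.
        String data = "DROP X; CREATE EXTR a, b; SELECT 1; CREATE EXTR c;";
        for (String statement : extractAll(data, "CREATE EXTR", ";")) {
            System.out.println(statement);
        }
    }
}
```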

Convert a .txt file with doubles to an ArrayList

I have a .txt file with this content: "17,23;12,1;20,34;88,23523;".
I want to read this file as doubles into an ArrayList. And eventually print the ArrayList (and eventually print the min. and max., but I don't think that will be a problem after solving this).
But I only get the output "[ ]".
What am I doing wrong? I've been struggling with this for an embarrassing 15+ hours, browsing here, YouTube, course books...
My code:
public static void main(String[] args) throws IOException {
File myFile = new File("text.txt");
Scanner scan = new Scanner(myFile);
ArrayList<Double> aList = new ArrayList<>();
while (scan.hasNextDouble()) {
double nextInput = scan.nextDouble();
if (nextInput != 0) {
break;
}
aList.add(nextInput);
}
System.out.println(aList);
}
You should configure your scanner so it will accept:
; as a delimiter
, as a decimal separator
Working code is:
File myFile = new File("input.txt");
// Swedish locale uses ',' as a decimal separator
Scanner scan = new Scanner(myFile).useDelimiter(";").useLocale(Locale.forLanguageTag("sv-SE"));
ArrayList<Double> aList = new ArrayList<>();
while (scan.hasNextDouble()) {
double nextInput = scan.nextDouble();
aList.add(nextInput);
}
System.out.println(aList);
With output [17.23, 12.1, 20.34, 88.23523]
Scanner works by splitting the input into tokens, where tokens are separated by whitespaces (by default). Since there are no whitespaces in the text, the first/only token is the entire text, and since that text is not a valid double value, hasNextDouble() returns false.
Two ways to fix that:
Change the token separator to ;:
scan.useDelimiter(";");
Read the file with BufferedReader and use split():
String filename = "text.txt";
try (BufferedReader in = Files.newBufferedReader(Paths.get(filename))) {
for (String line; (line = in.readLine()) != null; ) {
String[] tokens = line.split(";");
// code here
}
}
That will now result in the following tokens: 17,23, 12,1, 20,34, 88,23523.
Unfortunately, none of those are valid double values, because they use locale-specific formatting, i.e. the decimal point is a ,, not a ..
Which means that if you kept using Scanner, you can't use hasNextDouble() and nextDouble(), and if you changed to use split(), you can't use Double.parseDouble().
You need to use a NumberFormat to parse locale-specific number formats. Since "Uppgift" looks Swedish, we can use NumberFormat.getInstance(Locale.forLanguageTag("sv-SE")), or simply NumberFormat.getInstance() if your default locale is Sweden.
String filename = "text.txt";
try (BufferedReader in = Files.newBufferedReader(Paths.get(filename))) {
NumberFormat format = NumberFormat.getInstance(Locale.forLanguageTag("sv-SE"));
for (String line; (line = in.readLine()) != null; ) {
List<Double> aList = new ArrayList<>();
for (String token : line.split(";")) {
double value = format.parse(token).doubleValue();
aList.add(value);
}
System.out.println(aList); // Prints: [17.23, 12.1, 20.34, 88.23523]
}
}
You can do it simply by changing the delimiter. This works by reading from a String but you can also do it from a file. It is up to you to put the values in some data structure. This presumes your default locale uses ',' as a decimal point.
String str = "17,23;12,1;20,34;88,23523;";
Scanner scan = new Scanner(str);
scan.useDelimiter("[;]+");
while(scan.hasNextDouble()) {
System.out.println(scan.nextDouble());
}
prints
17.23
12.1
20.34
88.23523
Since the numbers in your file are separated by semicolons, scan.hasNextDouble() cannot read them with the default delimiter. There are several ways to handle this, e.g.
Override the default delimiter.
Read each line as a string, split it on the semicolon, and process each number.
Option-1:
scan.useDelimiter(";")
Note that since your file has the comma instead of the dot as the decimal symbol, you can use a Locale in which it is the default.
scan.useLocale(Locale.FRANCE);
Also, the following block in your code terminates the loop right after the first number is read, because the first number in your file is not equal to zero. Simply remove these lines to get the desired result:
if (nextInput != 0) {
break;
}
Option-2:
Read a line, split it on a semi-colon, replace the comma with a dot, parse each element from the resulting array into Double and store the same into aList.
Demo:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
public class Main {
public static void main(String[] args) throws FileNotFoundException {
File myFile = new File("file.txt");
Scanner scan = new Scanner(myFile);
List<Double> aList = new ArrayList<>();
while (scan.hasNextLine()) {
String nextInput = scan.nextLine();
String[] arr = nextInput.split(";");
for (String s : arr) {
aList.add(Double.valueOf(s.replace(",", ".")));
}
}
System.out.println(aList);
}
}
Output:
[17.23, 12.1, 20.34, 88.23523]
An alternative to replace comma with dot is to use NumberFormat as shown below:
import java.io.File;
import java.io.FileNotFoundException;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;
public class Main {
public static void main(String[] args) throws FileNotFoundException, ParseException {
File myFile = new File("file.txt");
Scanner scan = new Scanner(myFile);
NumberFormat format = NumberFormat.getInstance(Locale.FRANCE);
List<Double> aList = new ArrayList<>();
while (scan.hasNextLine()) {
String nextInput = scan.nextLine();
String[] arr = nextInput.split(";");
for (String s : arr) {
aList.add(format.parse(s).doubleValue());
}
}
System.out.println(aList);
}
}
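The question also mentions printing the minimum and maximum once the list is filled. A minimal sketch of that last step, assuming the comma-decimal input discussed in the answers above; the class name DoubleListStats and the method parseLine are hypothetical, and Collections.min and Collections.max do the actual work.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only.
class DoubleListStats {
    // Parses tokens such as "17,23" separated by ';' using the given locale.
    static List<Double> parseLine(String line, Locale locale) {
        NumberFormat format = NumberFormat.getInstance(locale);
        List<Double> values = new ArrayList<>();
        for (String token : line.split(";")) {
            String trimmed = token.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty tokens, e.g. from ";;"
            }
            try {
                values.add(format.parse(trimmed).doubleValue());
            } catch (ParseException e) {
                // Rethrow unchecked so callers need no throws clause.
                throw new IllegalArgumentException("Not a number: " + trimmed, e);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Locale.FRANCE uses ',' as its decimal separator.
        List<Double> values = parseLine("17,23;12,1;20,34;88,23523;", Locale.FRANCE);
        System.out.println(values);
        System.out.println("min = " + Collections.min(values));
        System.out.println("max = " + Collections.max(values));
    }
}
```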

How to correctly identify words when reading from a file with java Scanner?

I'm trying to do an exercise where I need to create a class that reads the words from a .txt file and puts them in a HashSet. The thing is, if the text reads "I am Daniel, Daniel I am." I'll get separate entries for "am", "am.", "Daniel," and "Daniel". How do I fix this?
Here's my code. (I tried to use regex, but I'm getting an exception):
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;
public class WordCount {
public static void main(String[] args) {
try {
File file = new File(args[0]);
HashSet<String> set = readFromFile(file);
set.forEach(word -> System.out.println(word));
}
catch(FileNotFoundException e) {
System.err.println("File Not Found!");
}
}
private static HashSet<String> readFromFile(File file) throws FileNotFoundException {
HashSet<String> set = new HashSet<String>();
Scanner scanner = new Scanner(file);
while(scanner.hasNext()) {
String s = scanner.next("[a-zA-Z]");
set.add(s.toUpperCase());
}
scanner.close();
return set;
}
}
The error is thrown when the Scanner reads a token that does not match the regex. Note that "[a-zA-Z]" matches only a single letter, so any longer token fails too.
String s = scanner.next("[a-zA-Z]");
Instead of passing the regex to the Scanner, read the word and then remove the special characters, as shown below.
String s = scanner.next();
s = s.replaceAll("[^a-zA-Z]", "");
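Another option is to tell the Scanner that every run of non-letters is a separator, so the tokens it returns are already clean words. This is a minimal sketch; the class name WordSetSketch and the method readWords are hypothetical. Note the difference in behavior: replaceAll turns "don't" into "dont", while this delimiter splits it into "don" and "t".

```java
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

// Hypothetical helper for illustration only.
class WordSetSketch {
    static Set<String> readWords(Scanner scanner) {
        // Treat every run of non-letters (spaces, commas, periods, ...) as a separator.
        scanner.useDelimiter("[^a-zA-Z]+");
        Set<String> words = new HashSet<>();
        while (scanner.hasNext()) {
            String word = scanner.next();
            if (!word.isEmpty()) {
                words.add(word.toUpperCase());
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // A String source keeps the demo self-contained; new Scanner(file) works the same way.
        try (Scanner scanner = new Scanner("I am Daniel, Daniel I am.")) {
            System.out.println(readWords(scanner));
        }
    }
}
```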

Java Scanner cannot find word

I am sorry if there is something very simple I am missing.
I have the following code:
import java.util.Scanner;
import java.io.File;
import java.util.regex.Pattern;
public class UnJumble
{
String[] ws;
int ind=0;
public static void main(String args[]) throws Exception
{
System.out.println("Enter a jumbled word");
String w = new Scanner(System.in).next();
UnJumble uj = new UnJumble();
uj.ws = new String[uj.fact(w.length())];
uj.makeWords("",w);
int c=1;
Scanner sc = new Scanner(new File("dict.txt"));
for(int i=0; i<uj.ws.length; i++)
{
Pattern pat = Pattern.compile(uj.ws[i].toUpperCase());
if(sc.hasNext(pat))
System.out.println(c+++" : \'"+uj.ws[i]+"\'");
}
System.out.println("Search Completed.");
if(c==1) System.out.println("No word found.");
}
public void makeWords(String p,String s)
{
if(s.length()==0)
ws[ind++] = p;
else
for(int i=0; i<s.length(); i++)
makeWords(p+s.charAt(i),s.substring(0,i)+s.substring(i+1));
}
public int fact(int n)
{
if(n==0) return 1;
else return n*fact(n-1);
}
}
The dict.txt file is the SOWPODS dictionary, which is the official Scrabble dictionary.
I want to take in a jumbled word and rearrange it to check whether any arrangement is present in the dictionary. If it is, then print it out.
When I try tra as input, the output says No word found.
But the output should have the words tar, art and rat.
Please tell me where I am making a mistake. I apologize if I have made a very simple mistake, because this is the first time I am working with Pattern.
This is from the JavaDoc of Scanner.hasNext(Pattern pattern) (emphasis on "next complete token"):
Returns true if the next complete token matches the specified pattern.
Because your Scanner was initialized with the file dict.txt, it is positioned on the first word.
hasNext(pattern) never advances the Scanner, so every check looks at that same first token. It does not match any of your scrambled words, so no match is found.
Note: This assumes you have one word per line.
I think you want to change your code to find your scrambled text anywhere in the dictionary file (with a start-of-input or non-word character before and a non-word character or end-of-input after), which gives the pattern "(^|\\W)"+uj.ws[i].toUpperCase()+"(\\W|$)" and something like
Pattern p = Pattern.compile("(^|\\W)"+uj.ws[i].toUpperCase()+"(\\W|$)");
String dictstring = ...; // your dictionary as one string
Matcher m = p.matcher(dictstring);
if(m.find()) {
...
I recommend IOUtils.toString() from Apache Commons IO for reading your file like this:
String dictstring = "";
try(InputStream is = new FileInputStream("dict.txt")) {
// assumes the file is UTF-8 (or plain ASCII)
dictstring = IOUtils.toString(is, StandardCharsets.UTF_8);
}
Here's a small example code to get familiar with pattern and matcher:
String dictString= "ONE\r\nTWO\r\nTHREE";
Pattern p = Pattern.compile("(^|\\W)TWO(\\W|$)");
Matcher m = p.matcher(dictString);
if(m.find()) {
System.out.println("MATCH: " + m.group());
}
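A different technique from the regex search above is to load the dictionary into a HashSet once and look each permutation up directly. This is a minimal sketch, not the answer's method: the class name UnjumbleSketch and the method findWords are hypothetical, the four-word set stands in for dict.txt, and it assumes the dictionary words are upper case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration only.
class UnjumbleSketch {
    // Returns every rearrangement of 'jumbled' that appears in 'dictionary'.
    static Set<String> findWords(String jumbled, Set<String> dictionary) {
        Set<String> found = new TreeSet<>(); // sorted, and drops duplicate permutations
        permute("", jumbled.toUpperCase(), dictionary, found);
        return found;
    }

    // Same recursion as makeWords() in the question, but each finished
    // permutation is checked against the set instead of being stored.
    private static void permute(String prefix, String rest,
                                Set<String> dictionary, Set<String> found) {
        if (rest.isEmpty()) {
            if (dictionary.contains(prefix)) {
                found.add(prefix);
            }
            return;
        }
        for (int i = 0; i < rest.length(); i++) {
            permute(prefix + rest.charAt(i),
                    rest.substring(0, i) + rest.substring(i + 1),
                    dictionary, found);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the contents of dict.txt.
        Set<String> dictionary = new HashSet<>(Arrays.asList("ART", "CAT", "RAT", "TAR"));
        System.out.println(findWords("tra", dictionary)); // prints [ART, RAT, TAR]
    }
}
```

With a real file, the set could be filled by adding every token a Scanner reads from dict.txt, one word per line.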

in java, how to print entire line in the file when string match found

I have a text file called "Sample.text" that contains multiple lines. I need to search this file for a particular string. If the string is found, I need to print the entire line. The search string is in the middle of the line. I am also using a StringBuffer to append the text after reading it from the file. The text file is very large, so I don't want to iterate line by line. How can I do this?
You could do it with FileUtils from Apache Commons IO.
Small sample:
StringBuffer myStringBuffer = new StringBuffer();
// readLines() loads the whole file into memory as a list of lines
List<String> lines = FileUtils.readLines(new File("/tmp/myFile.txt"), "UTF-8");
for (String line : lines) {
if (line.contains("something")) {
myStringBuffer.append(line);
}
}
We can also use a regex for string or pattern matching in a file.
Sample code:
import java.util.regex.*;
import java.io.*;
/**
* Print all the strings that match a given pattern from a file.
*/
public class ReaderIter {
public static void main(String[] args) throws IOException {
// The RE pattern
Pattern patt = Pattern.compile("[A-Za-z][a-z]+");
// A BufferedReader wrapped around a FileReader
BufferedReader r = new BufferedReader(new FileReader("file.txt"));
// For each line of input, try matching in it.
String line;
while ((line = r.readLine()) != null) {
// For each match in the line, extract and print it.
Matcher m = patt.matcher(line);
while (m.find()) {
// Simplest method:
// System.out.println(m.group(0));
// Get the starting position of the text
int start = m.start(0);
// Get ending position
int end = m.end(0);
// Print whatever matched.
// Use line.substring(start, end) to extract it
System.out.println(line.substring(start, end));
}
}
}
}
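The sample above prints only the matched text, while the question asks for the entire line. A minimal sketch of that, using Java 8 streams: the class name LineMatcherSketch and the method matchingLines are hypothetical, and the in-memory Stream.of(...) stands in for the file. For a real file, the stream could come from Files.lines(Paths.get("Sample.text")) inside a try-with-resources block, which reads lines lazily instead of loading the whole file.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only.
class LineMatcherSketch {
    // Keeps only the lines that contain 'needle' anywhere in them.
    static List<String> matchingLines(Stream<String> lines, String needle) {
        return lines.filter(line -> line.contains(needle))
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for the lines of a file.
        Stream<String> lines = Stream.of("first line", "has something inside", "last line");
        matchingLines(lines, "something").forEach(System.out::println);
    }
}
```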
