So I have to extract data from a text file.
The text file is set up like this.
3400 Moderate
310 Light
etc.
I need to extract the numbers and store them in one array, and extract the strings and store them in another array, so I can do calculations on the numbers based on what's written in the string array and then output the results to a file. I've got the last part down; I just can't figure out how to separate the ints from the strings when I read the data from the .txt file.
Here is what I have now, but it just reads each line (the int and the word together) as a single String.
import java.io.*;
import java.util.*;
public class HorseFeed {
public static void main(String[] args){
Scanner sc = null;
try {
sc = new Scanner(new File("C:\\Users\\Patric\\Desktop\\HorseWork.txt"));
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
List<String> lines = new ArrayList<String>();
while (sc.hasNextLine()) {
lines.add(sc.nextLine());
}
String[] arr = lines.toArray(new String[0]);
for(int i = 0; i< 100; i++){
System.out.print(arr[i]);
}
}
}
Use split(String regex) from the String class. Set the regex to match whitespace or digits; split will then return a String[] containing the words (possibly preceded by an empty string when the line starts with a number, which you should skip).
If you are processing the file line by line, collect the words from each line into a second collection, such as a List<String>, as you go.
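A minimal sketch of that idea, assuming lines shaped like the sample in the question. The class and method names (SplitWordsSketch, wordsOf) are made up for illustration. Because each line starts with digits, split returns a leading empty string, so the sketch filters empty pieces out:

```java
import java.util.ArrayList;
import java.util.List;

final class SplitWordsSketch {
    // Splits on runs of whitespace or digits and drops the empty pieces.
    static List<String> wordsOf(String line) {
        List<String> words = new ArrayList<>();
        for (String part : line.split("[\\s\\d]+")) {
            if (!part.isEmpty()) {
                words.add(part);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical input lines copied from the question's sample.
        List<String> all = new ArrayList<>();
        for (String line : new String[] {"3400 Moderate", "310 Light"}) {
            all.addAll(wordsOf(line));
        }
        System.out.println(all); // prints [Moderate, Light]
    }
}
```

Note that this only recovers the words; the numbers still have to be extracted separately.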
Please follow the code below.
import java.io.*;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class HorseFeed {
public static void main(String[] args) throws FileNotFoundException, IOException {
List<String> lineList = new ArrayList<String>();
// Compile the pattern once; it matches each run of digits
Pattern pattern = Pattern.compile("[0-9]+");
BufferedReader br = new BufferedReader(new FileReader(new File("C:\\Users\\Patric\\Desktop\\HorseWork.txt")));
String line;
while ((line = br.readLine()) != null) {
Matcher matcher = pattern.matcher(line);
// find() locates digits anywhere in the line; matches() would require the whole line to be digits
while(matcher.find()){
lineList.add(matcher.group());
}
}
br.close();
}
}
Here lineList contains the integers from the file, stored as strings.
This should work:
import java.io.*;
import java.util.*;
public class HorseFeed {
public static void main(String[] args) throws FileNotFoundException {
List<Integer> intList = new ArrayList<Integer>();
List<String> strList = new ArrayList<String>();
Scanner sc = new Scanner(new File("C:\\Users\\Patric\\Desktop\\HorseWork.txt"));
while (sc.hasNextLine()) {
String line = sc.nextLine();
String[] lineParts = line.split("\\s+");
Integer intValue = Integer.parseInt(lineParts[0]);
String strValue = lineParts[1];
intList.add(intValue);
strList.add(strValue);
}
System.out.println("Values: ");
for(int i = 0; i < intList.size(); i++) {
System.out.print("\t" + intList.get(i) + ": " + strList.get(i));
}
}
}
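As an alternative to splitting each line, Scanner can read the tokens directly. This is only a sketch under the assumption that every record is an integer followed by exactly one word; the class name TokenPairsSketch is invented for illustration, and the sample text is passed as a String so the example runs without the file:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class TokenPairsSketch {
    final List<Integer> amounts = new ArrayList<>();
    final List<String> labels = new ArrayList<>();

    // Reads "<int> <word>" pairs until the next token is not an int.
    static TokenPairsSketch read(Scanner in) {
        TokenPairsSketch result = new TokenPairsSketch();
        while (in.hasNextInt()) {
            int amount = in.nextInt();
            if (!in.hasNext()) {
                break; // a number without a label: stop instead of storing half a pair
            }
            result.amounts.add(amount);
            result.labels.add(in.next());
        }
        return result;
    }

    public static void main(String[] args) {
        // For a real file, pass new Scanner(new File(...)) instead.
        TokenPairsSketch data = read(new Scanner("3400 Moderate\n310 Light\n"));
        System.out.println(data.amounts + " " + data.labels);
    }
}
```

The two lists stay index-aligned, so amounts.get(i) belongs to labels.get(i).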
First read the whole text of the file into a String, then use the replaceAll method of the String class with a pattern to remove the digits from it.
Example:
String fileText = "welcome 2 java";
String ss = fileText.replaceAll("-?\\d+", "");
System.out.println(ss);
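The same replaceAll idea can be run in both directions to get the words and the numbers separately. A rough sketch, assuming the words consist of ASCII letters only; StripDigitsSketch and its method names are hypothetical:

```java
final class StripDigitsSketch {
    // Replaces every (optionally negative) number with a space, then splits what is left.
    static String[] wordsOnly(String text) {
        String stripped = text.replaceAll("-?\\d+", " ").trim();
        return stripped.isEmpty() ? new String[0] : stripped.split("\\s+");
    }

    // Replaces every run of ASCII letters with a space, then splits what is left.
    static String[] numbersOnly(String text) {
        String stripped = text.replaceAll("[A-Za-z]+", " ").trim();
        return stripped.isEmpty() ? new String[0] : stripped.split("\\s+");
    }

    public static void main(String[] args) {
        String text = "3400 Moderate\n310 Light";
        System.out.println(String.join(",", wordsOnly(text)));
        System.out.println(String.join(",", numbersOnly(text)));
    }
}
```

The isEmpty check matters because splitting an empty string yields one empty element rather than an empty array.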
I keep getting an error telling me that lineNumber cannot be resolved to a variable, and I'm not sure how to fix it. Am I missing an import?
Also, how would I count the number of chars with and without spaces?
I also need a method to count unique words, but I'm not really sure what unique words are.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;
import java.util.StringTokenizer;
import java.util.ArrayList;
import java.util.List;
public class LineWordChar {
public void main(String[] args) throws IOException {
// Convert our text file to string
String text = new Scanner( new File("way to your file"), "UTF-8" ).useDelimiter("\\A").next();
BufferedReader bf=new BufferedReader(new FileReader("way to your file"));
String lines="";
int linesi=0;
int words=0;
int chars=0;
String s="";
// while next lines are present in file int linesi will add 1
while ((lines=bf.readLine())!=null){
linesi++;}
// Tokenizer separate our big string "Text" to little string and count them
StringTokenizer st=new StringTokenizer(text);
while (st.hasMoreTokens()){
s = st.nextToken();
words++;
// We take every word during separation and count number of char in this words
for (int i = 0; i < s.length(); i++) {
chars++;}
}
System.out.println("Number of lines: "+linesi);
System.out.println("Number of words: "+words);
System.out.print("Number of chars: "+chars);
}
}
abstract class WordCount {
/**
* @return HashMap a map containing the Character count, Word count and
* Sentence count
* @throws FileNotFoundException
*
*/
public static void main() throws FileNotFoundException {
lineNumber=2; // as u want
File f = null;
ArrayList<Integer> list=new ArrayList<Integer>();
f = new File("file_stats.txt");
Scanner sc = new Scanner(f);
int totalLines=0;
int totalWords=0;
int totalChars=0;
int totalSentences=0;
while(sc.hasNextLine())
{
totalLines++;
if(totalLines==lineNumber){
String line = sc.nextLine();
totalChars += line.length();
totalWords += new StringTokenizer(line, " ,").countTokens(); //line.split("\\s").length;
totalSentences += line.split("\\.").length;
break;
}
sc.nextLine();
}
list.add(totalChars);
list.add(totalWords);
list.add(totalSentences);
System.out.println(lineNumber+";"+totalWords+";"+totalChars+";"+totalSentences);
}
}
In order to get your code running you have to do at least two changes:
Replace:
lineNumber=2; // as u want
with
int lineNumber=2; // as u want
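Also, you need to modify the main method of LineWordChar. It is declared as public void main, without static, so the JVM cannot use it as the entry point. Declaring throws IOException on main is legal (an uncaught exception simply ends the program with a stack trace), but it is usually friendlier to handle the exceptions inside the method: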
public static void main(String[] args) {
// Convert our text file to string
try {
String text = new Scanner(new File("way to your file"), "UTF-8").useDelimiter("\\A").next();
BufferedReader bf = new BufferedReader(new FileReader("way to your file"));
String lines = "";
int linesi = 0;
int words = 0;
int chars = 0;
String s = "";
// while next lines are present in file int linesi will add 1
while ((lines = bf.readLine()) != null) {
linesi++;
}
// Tokenizer separate our big string "Text" to little string and count them
StringTokenizer st = new StringTokenizer(text);
while (st.hasMoreTokens()) {
s = st.nextToken();
words++;
// We take every word during separation and count number of char in this words
for (int i = 0; i < s.length(); i++) {
chars++;
}
}
System.out.println("Number of lines: " + linesi);
System.out.println("Number of words: " + words);
System.out.print("Number of chars: " + chars);
} catch (Exception e) {
e.printStackTrace();
}
}
I've used a single catch for Exception; you can split it into several catch blocks to handle each exception type separately. When I run it I get a FileNotFoundException, which is expected because the placeholder path does not exist; apart from that, your code now runs.
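Splitting the handler into separate catch blocks could look roughly like this. It is a sketch, not the asker's program: SeparateCatchSketch and countLines are invented names, and returning -1 on failure is just one possible convention. FileNotFoundException is a subclass of IOException, so it has to be caught first:

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

final class SeparateCatchSketch {
    // Returns the number of lines, or -1 if the file is missing or cannot be read.
    static int countLines(String path) {
        // try-with-resources closes the reader even when an exception is thrown
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (FileNotFoundException e) {
            System.err.println("File not found: " + path);
            return -1;
        } catch (IOException e) {
            System.err.println("Read failed: " + e.getMessage());
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("Number of lines: " + countLines("way to your file"));
    }
}
```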
The lineNumber variable must be declared with a data type:
int lineNumber=2; // as u want
Change the first line of the main method from lineNumber = 2 to int lineNumber = 2. Java requires every variable to be declared with a type before it is used.
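The other two parts of the question (chars with and without spaces, and unique words) can be sketched as follows. This assumes that "unique words" means distinct words ignoring case, and that "spaces" means any whitespace; TextStatsSketch and its method names are made up for illustration:

```java
import java.util.HashSet;
import java.util.Set;

final class TextStatsSketch {
    // Every character counts, including spaces and line breaks.
    static int charsWithSpaces(String text) {
        return text.length();
    }

    // Remove all whitespace first, then count what is left.
    static int charsWithoutSpaces(String text) {
        return text.replaceAll("\\s", "").length();
    }

    // A Set keeps one copy of each word, so its size is the number of distinct words.
    static int uniqueWords(String text) {
        Set<String> seen = new HashSet<>();
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                seen.add(word);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        String sample = "the cat and the hat";
        System.out.println(charsWithSpaces(sample));    // prints 19
        System.out.println(charsWithoutSpaces(sample)); // prints 15
        System.out.println(uniqueWords(sample));        // prints 4
    }
}
```

Here "the" appears twice but is counted once, which is what makes the count "unique".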
Here is what my .txt file looks like (but more than 100 elements per line and more than 100 lines):
-0.89094 -0.86099 -0.82438 -0.78214 -0.73573 -0.68691 -0.63754
-0.42469 -0.3924 -0.36389 -0.33906 -0.31795 -0.30056 -0.28692
What I want to do is read this .txt file and store the values in an ArrayList. The problem is that when I read and store the data, everything ends up in the same array (I want the values stored in two arrays, one per line).
Here is my code:
public class ReadStore {
public static void main(String[] args) throws IOException {
Scanner inFile = new Scanner(new File("Untitled.txt")).useDelimiter("\\s");
ArrayList<Float> temps = new ArrayList<Float>();
while (inFile.hasNextFloat()) {
// find next line
float token = inFile.nextFloat();
temps.add(token);
}
inFile.close();
Float [] tempsArray = temps.toArray(new Float[0]);
for (Float s : tempsArray) {
System.out.println(s);
}
}
}
Any suggestions for making this work?
I might go about this by just reading in each line in its entirety, and then splitting on whitespace to access each floating point number. This gets around the issue of having to distinguish spaces from line separators.
public class ReadStore {
public static void main(String[] args) throws IOException {
Scanner inFile = new Scanner(new File("Untitled.txt"));
ArrayList<Float> temps = new ArrayList<Float>();
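while (inFile.hasNextLine()) {
String line = inFile.nextLine().trim();
// Skip blank lines; Float.parseFloat("") would throw NumberFormatException
if (line.isEmpty()) {
continue;
}
String[] nums = line.split("\\s+");
for (String num : nums) {
float token = Float.parseFloat(num);
temps.add(token);
}
}
inFile.close();
// Print once, after every line has been read
Float [] tempsArray = temps.toArray(new Float[0]);
for (Float s : tempsArray) {
System.out.println(s);
}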
}
}
Here is a demo (on Rextester) showing that the logic works for a single line of your input file. Note that I call String#trim() on each line before splitting it, just in case there is any leading or trailing whitespace which we don't want.
First read the file line by line, then read each float from each line.
Try this:
public static void main(String[] args) throws IOException {
Scanner inFile = new Scanner(new File("Untitled.txt"));
List<List<Float>> temps = new ArrayList<>();
while (inFile.hasNextLine()) {
List<Float> data = new ArrayList<>();
Scanner inLine = new Scanner(inFile.nextLine());
while (inLine.hasNextFloat()) {
data.add(inLine.nextFloat());
}
inLine.close();
temps.add(data);
}
inFile.close();
Float[][] dataArray = new Float[temps.size()][];
for (int i = 0; i < dataArray.length; i++) {
dataArray[i] = temps.get(i).toArray(new Float[temps.get(i).size()]);
}
System.out.println(Arrays.deepToString(dataArray));
}
Output :
[[-0.89094, -0.86099, -0.82438, -0.78214, -0.73573, -0.68691, -0.63754], [-0.42469, -0.3924, -0.36389, -0.33906, -0.31795, -0.30056, -0.28692]]
You can read the file line by line and split each line on whitespace. Working and tested code:
public static void main(String[] args) throws IOException {
Scanner inFile = new Scanner(new File("C:\\Untitled.txt"));
ArrayList<String[]> temps = new ArrayList<>();
while (inFile.hasNextLine()) {
// find next line
String line = inFile.nextLine();
String[] floats = line.split("\\s+");
temps.add(floats);
}
inFile.close();
temps.forEach(arr -> {
System.out.println(Arrays.toString(arr));
});
}
You can also read float values by regex.
Scanner inFile = new Scanner(new File("C:\\Untitled.txt"));
ArrayList<Float> temps = new ArrayList<>();
// Accept an optional minus sign and one or more digits before the decimal point
while (inFile.hasNext("-?\\d+\\.\\d+")) {
// read the next token
String token = inFile.next();
temps.add(Float.valueOf(token));
}
inFile.close();
Read the whole file at once with the Files.readAllLines method, then split the content on whitespace.
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.stream.Collectors;
public class Test {
public static void main(String[] args) {
String readFileContent = readFileContent(new File("Test.txt"));
ArrayList<Float> temps = new ArrayList<Float>();
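String[] split = readFileContent.trim().split("\\s+");
for (String num : split) {
// An empty file (or a read error) yields a single empty token; skip it
if (num.isEmpty()) {
continue;
}
float token = Float.parseFloat(num);
temps.add(token);
}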
Float [] tempsArray = temps.toArray(new Float[0]);
for (Float s : tempsArray) {
System.out.println(s);
}
}
private static String readFileContent(File file) {
try {
return Files.readAllLines(file.toPath()).stream().collect(Collectors.joining("\n"));
} catch (IOException e) {
System.out.println("Error while reading file " + file.getAbsolutePath());
}
return "";
}
}
Each line can be stored as an ArrayList and each ArrayList inside another ArrayList creating a 2-Dimensional ArrayList of Float type.
Read each line using java.util.Scanner.nextLine() and then parse each line for float value. I have used another scanner to parse each line for float values.
After parsing, store float values into a tmp ArrayList and add that list to the major List. Be sure to close the local scanner inside the while itself.
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;
public class ReadStore {
public static void main(String[] args) throws IOException {
//Input File
Scanner inFile =new Scanner(new File("Untitled1.txt")).useDelimiter("\\s");
//Major List (2-D ArrayList)
ArrayList<ArrayList<Float>> list = new ArrayList<ArrayList<Float>>();
//Reading Each Line
while (inFile.hasNextLine()) {
//tmp ArrayList
ArrayList<Float> arr = new ArrayList<Float>();
String line = inFile.nextLine();
//local scanner to be used for parsing
Scanner local = new Scanner(line);
//Parsing line for float values
while(local.hasNext()){
if(local.hasNextFloat()){
float token = local.nextFloat();
arr.add(token);
} else {
//skip a non-float token; otherwise this loop would never advance
local.next();
}
}
//closing local Scanner
local.close();
//Adding to major List
list.add(arr);
}
inFile.close();
//Display List values
for(ArrayList<Float> arrList:list){
for(Float f : arrList){
System.out.print(f + " ");
}
System.out.println();
}
}
}
I'm assuming that the number of floats on each line might differ from line to line. If they're all the same, I'd suggest reading them into one big list and then splitting it into sublists.
But if you want all the floats on a line to be in a single list, the best approach seems to be to read the file line by line. Then split each line into tokens, convert them to float, and collect the results into a list. The overall result will be a list of lists of floats. This is pretty easy to do with Java 8 streams:
static List<List<Float>> readData() throws IOException {
Pattern pat = Pattern.compile("\\s+");
try (Stream<String> allLines = Files.lines(Paths.get(filename))) {
return allLines.map(line -> pat.splitAsStream(line)
.map(Float::parseFloat)
.collect(Collectors.toList()))
.collect(Collectors.toList());
}
}
Note the use of Files.lines to get a stream of lines, and also note the use of try-with-resources on the resulting stream.
If you have Java 9, you can simplify this a tiny bit by using Scanner instead of Pattern.splitAsStream to parse the tokens on each line:
static List<List<Float>> readData9() throws IOException {
try (Stream<String> allLines = Files.lines(Paths.get(filename))) {
return allLines.map(Scanner::new)
.map(sc -> sc.tokens()
.map(Float::parseFloat)
.collect(Collectors.toList()))
.collect(Collectors.toList());
}
}
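For the other case mentioned in this answer, where every line holds the same number of floats, reading everything into one flat list and cutting it into sublists afterwards might look like the sketch below. ChunkSketch and chunk are hypothetical names, and the row length is assumed to be known in advance:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChunkSketch {
    // Cuts a flat list into consecutive rows of rowLength elements each.
    static List<List<Float>> chunk(List<Float> flat, int rowLength) {
        if (rowLength <= 0 || flat.size() % rowLength != 0) {
            throw new IllegalArgumentException("list size must be a multiple of rowLength");
        }
        List<List<Float>> rows = new ArrayList<>();
        for (int start = 0; start < flat.size(); start += rowLength) {
            // copy the view so each row is independent of the flat list
            rows.add(new ArrayList<>(flat.subList(start, start + rowLength)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // A shortened, made-up version of the question's data: two rows of two values.
        List<Float> flat = Arrays.asList(-0.89094f, -0.86099f, -0.42469f, -0.3924f);
        System.out.println(chunk(flat, 2));
    }
}
```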
Scanner sc = new Scanner("textfile.txt");
List<String> tokens = new ArrayList<String>();
for (int i =0 ; sc.hasNextLine(); i++)
{
String temp = sc.nextLine();
tokens.add(temp);
}
My textfile looks something like
A
B
C
*empty line*
D
E
F
*empty line*
and so on..
The trouble I'm having is I'm trying to store each section to an array (including the empty line), but I don't know how to go about splitting up these sections. By section I mean A B C empty line, is one section.
If you are just splitting at line breaks and not at other whitespace, which seems to be the case since you are using hasNextLine() and nextLine(), you can try this.
final String NEW_LINE = System.getProperty("line.separator");
Scanner sc = new Scanner(new File("textfile.txt"));
List<String> tokens = new ArrayList<String>();
StringBuilder builder = new StringBuilder();
while(sc.hasNextLine()) {
//Read the next line
String temp = sc.nextLine();
//nextLine() strips the line terminator, so add it back to keep the lines separate
builder.append(temp).append(NEW_LINE);
if(temp.trim().equals("")) {
tokens.add(builder.toString()); //The empty line ends the section; copy the section to the list
builder = new StringBuilder(); //Clear the builder
}
}
//Copy any remaining lines (a last section with no trailing empty line) to the list
if(builder.length() > 0) {
tokens.add(builder.toString());
}
Instead of adding every line to the list as you read it from the file, append it to a StringBuilder or a temporary string. When you reach an empty line, add everything collected so far to the list and start a new builder. Repeat until there are no more lines.
Here is a complete example, as originally requested:
package com.raj;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
public class Echo {
public static void main(String[] args) throws Exception {
Scanner sc
= new Scanner(new File("textfile.txt"));
List<StringBuilder> tokens
= new ArrayList<StringBuilder>();
StringBuilder builder
= new StringBuilder();
boolean saveFlag = true;
while (sc.hasNextLine()) {
String temp = sc.nextLine();
if (temp.isEmpty()) {
tokens.add(builder);
builder = new StringBuilder();
saveFlag = false;
continue;
}
builder.append(temp + "\n");
saveFlag = true;
}
sc.close();
if (saveFlag) tokens.add(builder);
for (StringBuilder sb : tokens) {
System.out.println(sb);
}
}
}
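If each section should stay a list of lines, with the empty line kept as its last element as the question asks, a variant could look like this. It is a sketch with invented names (SectionsSketch, sections), and the sample text is fed in as a String so it runs without textfile.txt:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class SectionsSketch {
    // Groups lines into sections; each empty line closes the section it belongs to.
    static List<List<String>> sections(Scanner in) {
        List<List<String>> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        while (in.hasNextLine()) {
            String line = in.nextLine();
            current.add(line); // the empty line stays part of its section
            if (line.trim().isEmpty()) {
                result.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            result.add(current); // last section without a closing empty line
        }
        return result;
    }

    public static void main(String[] args) {
        // For a real file, pass new Scanner(new File("textfile.txt")) instead.
        List<List<String>> parts = sections(new Scanner("A\nB\nC\n\nD\nE\nF\n\n"));
        System.out.println(parts.size() + " sections: " + parts);
    }
}
```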
I am writing code that reads a text file named in the command-line arguments of the main method and prints each word in it on its own line, without printing any word more than once. It does not print anything. Can anyone help?
import java.util.*;
import java.io.*;
public class Tokenization {
public static void main(String[] args) throws Exception{
String x = "";
String y = "";
File file = new File(args[0]);
Scanner s = new Scanner(file);
String [] words = null;
while (s.hasNext()){
x = s.nextLine();
}
words = x.split("\\p{Punct}");
String [] moreWords = null;
for (int i = 0; i < words.length;i++){
y = y + " " + words[i];
}
moreWords = y.split("\\s+");
String [] unique = unique(moreWords);
for (int i = 0;i<unique.length;i++){
System.out.println(unique[i]);
}
s.close();
}
public static String[] unique (String [] s) {
String [] uniques = new String[s.length];
for (int i = 0; i < s.length;i++){
for(int j = i + 1; j < s.length;j++){
if (!s[i].equalsIgnoreCase(s[j])){
uniques[i] = s[i];
}
}
}
return uniques;
}
}
You have several problems:
You read the whole file line by line, but only the last line ends up in the variable x, because each assignment overwrites the previous one.
You do two regex splits where one is enough.
In unique, you fill only some elements of the array; the rest stay null.
Here is a shorter version of what you need:
import java.io.File;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;
public class Tokenization {
public static void main(String[] args) throws Exception {
Set<String> words = new HashSet<String>();
try {
File file = new File(args[0]);
Scanner scanner = new Scanner(file);
while (scanner.hasNext()) {
String[] lineWords = scanner.nextLine().split("[\\p{Punct}\\s]+");
for (String s : lineWords)
if (!s.isEmpty()) // a line starting with punctuation or whitespace yields a leading empty string
words.add(s.toLowerCase());
}
scanner.close();
} catch (Exception e) {
System.out.println("Cannot read file [" + e.getMessage() + "]");
System.exit(1);
}
for (String s : words)
System.out.println(s);
}
}
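For comparison, the array-based unique method from the question can also be repaired without a Set. The sketch below is one way to do it, not the only one: it keeps a word only if no equal word (ignoring case) has been kept already, and trims the result so no null slots remain. UniqueArraySketch is a made-up class name:

```java
import java.util.Arrays;

final class UniqueArraySketch {
    // Keeps the first occurrence of each word, ignoring case, in original order.
    static String[] unique(String[] words) {
        String[] kept = new String[words.length];
        int count = 0;
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // split can produce empty strings; ignore them
            }
            boolean seen = false;
            for (int j = 0; j < count; j++) {
                if (kept[j].equalsIgnoreCase(word)) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                kept[count++] = word;
            }
        }
        // Cut off the unused tail so the result contains no nulls.
        return Arrays.copyOf(kept, count);
    }

    public static void main(String[] args) {
        String[] sample = {"Ask", "not", "ask", "what", "NOT"};
        System.out.println(Arrays.toString(unique(sample))); // prints [Ask, not, what]
    }
}
```

Unlike HashSet, this preserves the order in which the words first appear, at the cost of a nested loop.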
I need to create a method that will read the file, and check each word in the file. Each new word in the file should be stored in a string array. The method should be case insensitive. Please help.
The file says the following:
Ask not what your country can do for you
ask what you can do for your country
So the array should only contain: ask, not, what, your, country, can, do, for, you
import java.util.*;
import java.io.*;
public class TextAnalysis {
public static void main (String [] args) throws IOException {
File in01 = new File("a5_testfiles/in01.txt");
Scanner fileScanner = new Scanner(in01);
System.out.println("TEXT FILE STATISTICS");
System.out.println("--------------------");
System.out.println("Length of the longest word: " + longestWord(fileScanner));
System.out.println("Number of words in file wordlist: " );
countWords();
System.out.println("Word-frequency statistics");
}
public static String longestWord (Scanner s) {
String longest = "";
while (s.hasNext()) {
String word = s.next();
if (word.length() > longest.length()) {
longest = word;
}
}
return (longest.length() + " " + "(\"" + longest + "\")");
}
public static void countWords () throws IOException {
File in01 = new File("a5_testfiles/in01.txt");
Scanner fileScanner = new Scanner(in01);
int count = 0;
while(fileScanner.hasNext()) {
String word = fileScanner.next();
count++;
}
System.out.println("Number of words in file: " + count);
}
public static int wordList (int words) {
File in01 = new File("a5_testfiles/in01.txt");
Scanner fileScanner = new Scanner(in01);
int size = words;
String [] list = new String[size];
for (int i = 0; i <= size; i++) {
while(fileScanner.hasNext()){
if(!list[].contains(fileScanner.next())){
list[i] = fileScanner.next();
}
}
}
}
}
You could use the following code snippet (it does not store duplicate words):
File file = new File("names.txt");
FileReader fr = new FileReader(file);
StringBuilder sb = new StringBuilder();
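char[] c = new char[256];
int n;
// read() returns how many chars were read; append only that many, not the whole buffer
while((n = fr.read(c)) > 0){
sb.append(c, 0, n);
}
fr.close();
// split on any whitespace so that line breaks also separate words
String[] ss = sb.toString().toLowerCase().trim().split("\\s+");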
TreeSet<String> ts = new TreeSet<String>();
for(String s : ss)
ts.add(s);
for(String s : ts){
System.out.println(s);
}
And the output is:
ask
can
country
do
for
not
what
you
your
You could always just try:
List<String> words = new ArrayList<String>();
//read lines in your file all at once
List<String> allLines = Files.readAllLines(yourFile, Charset.forName("UTF-8"));
for(int i = 0; i < allLines.size(); i++) {
//change each line from your file to an array of words using "split(" ")".
//Then add all those words to the list "words"
words.addAll(Arrays.asList(allLines.get(i).split(" ")));
}
//convert the list of words to an array.
String[] arr = words.toArray(new String[words.size()]);
Using Files.readAllLines(yourFile, Charset.forName("UTF-8")) to read all the lines of yourFile (a java.nio.file.Path) at once is much cleaner than reading each line individually. Keep in mind that a line can hold several words, so a count of lines is not a count of words; each line still has to be split as shown above.
Alternatively, if you are not on Java 7 or later, you can build the list of lines as follows and then count the words at the end (as opposed to your approach in countWords()):
List<String> allLines = new ArrayList<String>();
Scanner fileScanner = new Scanner(yourFile);
while (fileScanner.hasNextLine()) {
allLines.add(fileScanner.nextLine());
}
fileScanner.close();
Then split each line as shown in the previous code and create your array. Also note that, ideally, you should wrap your scanner in a try/catch block rather than declaring throws.
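Putting those pieces together, a case-insensitive word list that keeps only new words might be sketched like this. WordListSketch is an invented class name, the file path is the one from the question, and LinkedHashSet is used (as an assumption about what the asker wants) so the words stay in first-seen order:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.LinkedHashSet;
import java.util.Scanner;
import java.util.Set;

final class WordListSketch {
    // Collects each new word once, lower-cased, in the order it first appears.
    static String[] wordList(Scanner in) {
        Set<String> words = new LinkedHashSet<>();
        while (in.hasNext()) {
            words.add(in.next().toLowerCase());
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // try-with-resources closes the scanner; the catch replaces "throws" on main
        try (Scanner in = new Scanner(new File("a5_testfiles/in01.txt"))) {
            System.out.println(String.join(", ", wordList(in)));
        } catch (FileNotFoundException e) {
            System.err.println("Cannot open file: " + e.getMessage());
        }
    }
}
```

For the two sample lines in the question this yields ask, not, what, your, country, can, do, for, you.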