I would like to receive some suggestions regarding a little problem I am going to solve in Java.
I have a file consisting in this format:
#
some text
some text
some text
#
some text
some text
some text
#
some text
some text
some text
...and so on.
I would need to read the next chunk of this text file, then to create an InputStream object consting of the read chunk and to pass the InputStream object to a parser. I have to repeat these operations for every chunk in the text file. Each chunk is written between the lines starting with #. The problem is to parse each section between the # tags using a parser which should read each chunk from an InputStream.
The text file could be big, so I would like to obtain good performance.
How could I solve this problem?
I have thought about doing something like this:
FileReader fileReader = new FileReader(file);
BufferedReader bufferedReader = new BufferedReader(fileReader);
Scanner scanner = new Scanner(bufferedReader);
scanner.useDelimiter("#");
List<ParsedChunk> parsedChunks = new ArrayList<ParsedChunk>();
ChunkParser parser = new ChunkParser();
while(scanner.hasNext())
{
String text = scanner.next();
InputStream inputStream = new ByteArrayInputStream(text.getBytes("UTF-8"));
ParsedChunk parsedChunk = parser.parse(inputStream);
parsedChunks.add(parsedChunk);
inputStream.close();
}
scanner.close();
but I am not sure if it would be a good way to do it.
Thank you.
If I have understood correctly. This is what you are trying to achieve. FYI you will need JAVA 7 to get the below code running
public static void main(String[] args) throws IOException {
List<String> allLines = Files.readAllLines(new File("d:/input.txt").toPath(), Charset.defaultCharset());
List<List<String>> chunks = getChunks(allLines);
//Now you have all te chunks and you can process them
}
private static List<List<String>> getChunks(List<String> allLines) {
List<List<String>> result = new ArrayList<List<String>>();
int i = 0;
int fromIndex = 1;
int toIndex = 0;
for(String line : allLines){
i++;
if(line.startsWith("****") && i != 1){ // To skip the first line and the check next delimiter
toIndex = i-1;
result.add(allLines.subList(fromIndex, toIndex));
fromIndex = i;
}
}
return result;
}
didnt quite get the question but u could try using char at this moment as, storing all the character in char array & going thhrough a loop & condiional statement which breaks the string every time it encounters a'#'
Related
I am reading a file with comma separated values which when split into an array will have 10 values for each line . I expected the file to have line breaks so that
line = bReader.readLine()
will give me each line. But my file doesnt have a line break. Instead after the first set of values there are lots of spaces(465 to be precise) and then the next line begins.
So my above code of readLine() is reading the entire file in one go as there are no lined breaks. Please suggest how best to efficiently tackle this scenario.
One way is to replace String with 465 spaces in your text with new line character "\n" before iterating it for reading.
I second Ninan's answer: replace the 465 spaces with a newline, then run the function you were planning on running earlier.
For aesthetics and readability I would suggest using Regex's Pattern to replace the spaces instead of a long unreadable String.replace(" ").
Your code could like below, but replace 6 with 465:
// arguments are passed using the text field below this editor
public static void main(String[] args)
{
String content = "DOG,CAT MOUSE,CHEESE";
Pattern p = Pattern.compile("[ ]{6}",
Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
String newString = p.matcher(content).replaceAll("\n");
System.out.println(newString);
}
My suggestion is read file f1.txt and write to anther file f2.txt by removing all empty lines and spaces then read f2.txt something like
FileReader fr = new FileReader("f1.txt");
BufferedReader br = new BufferedReader(fr);
FileWriter fw = new FileWriter("f2.txt");
String line;
while((line = br.readLine()) != null)
{
line = line.trim(); // remove leading and trailing whitespace
if (!line.equals("")) // don't write out blank lines
{
fw.write(line, 0, line.length());
}
}
Then try using your code.
You might create your own subclass of a FilterInputStream or a PushbackInputStream and pass that to an InputStreamReader. One overrides int read().
Such a class unfortunately needs a bit of typing. (A nice excercise so to say.)
private static final int NO_CHAR = -2;
private boolean fromCache;
private int cachedSpaces;
private int cachedNonSpaceChar = NO_CHAR;
int read() throws IOException {
if (fromCache) {
if (cachecSpaces > 0) ...
if (cachedNonSpaceChar != NO_CHAR) ...
...
}
int ch = super.read();
if (ch != -1) {
...
}
return ch;
}
The idea is to cache spaces till either a nonspace char, and in read() either take from the cache, return \n instead, call super.read() when not from cache, recursive read when space.
My understanding is that you have a flat CSV file without proper line break, which supposed to have 10 values on each line.
Updated:
1. (Recommended) You can use Scanner class with useDelimiter to parse csv effectively, assuming you are trying to store 10 values from a line:
public static void parseCsvWithScanner() throws IOException {
Scanner scanner = new Scanner(new File("test.csv"));
// set your delimiter for scanner, "," for csv
scanner.useDelimiter(",");
// storing 10 values as a "line"
int LINE_LIMIT = 10;
// implement your own data structure to store each value of CSV
int[] tempLineArray = new int[LINE_LIMIT];
int lineBreakCount = 0;
while(scanner.hasNext()) {
// trim start and end spaces if there is any
String temp = scanner.next().trim();
tempLineArray[lineBreakCount++] = Integer.parseInt(temp);
if (lineBreakCount == LINE_LIMIT) {
// replace your own logic for handling the full array
for(int i=0; i<tempLineArray.length; i++) {
System.out.print(tempLineArray[i]);
} // end replace
// resetting array and counter
tempLineArray = new int[LINE_LIMIT];
lineBreakCount = 0;
}
}
scanner.close();
}
Or use the BufferedReader.
You might not need the ArrayList to store all values if there is memory issue by replacing your own logic.
public static void parseCsv() throws IOException {
BufferedReader br = new BufferedReader(new FileReader(file));
// your delimiter
char TOKEN = ',';
// your requirement of storing 10 values for each "line"
int LINE_LIMIT = 10;
// tmp for storing from BufferedReader.read()
int tmp;
// a counter for line break
int lineBreakCount = 0;
// array for storing 10 values, assuming the values of CSV are integers
int[] tempArray = new int[LINE_LIMIT];
// storing tempArray of each line to ArrayList
ArrayList<int[]> lineList = new ArrayList<>();
StringBuilder sb = new StringBuilder();
while((tmp = br.read()) != -1) {
if ((char)tmp == TOKEN) {
if (lineBreakCount == LINE_LIMIT) {
// your logic to handle the current "line" here.
lineList.add(tempArray);
// new "line"
tempArray = new int[LINE_LIMIT];
lineBreakCount = 0;
}
// storing current value from buffer with trim of spaces
tempArray[lineBreakCount] =
Integer.parseInt(sb.toString().trim());
lineBreakCount++;
// clear the buffer
sb.delete(0, sb.length());
}
else {
// add current char from BufferedReader if not delimiter
sb.append((char)tmp);
}
}
br.close();
}
I´m in a bit of a struggle here, I´m trying to add each word from a textfile to an ArrayList and every time the reader comes across the same word again it will skip it. (Makes sense?)
I don't even know where to start. I kind of know that I need one loop that adds the textfile to the ArrayList and one the checks if the word is not in the list. Any ideas?
PS: Just started with Java
This is what I've done so far, don't even know if I'm on the right path..
public String findWord(){
int text = 0;
int i = 0;
while sc.hasNextLine()){
wordArray[i] = sc.nextLine();
}
if wordArray[i].contains() {
}
i++;
}
A List (an ArrayList or otherwise) is not the best data structure to use; a Set is better. In pseudo code:
define a Set
for each word
if adding to the set returns false, skip it
else do whatever do want to do with the (first time encountered) word
The add() method of Set returns true if the set changed as a result of the call, which only happens if the word isn't already in the set, because sets disallow duplicates.
I once made a similar program, it read through a textfile and counted how many times a word came up.
Id start with importing a scanner, as well as a file system(this needs to be at the top of the java class)
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.File;
import java.io.PrintStream;
import java.util.Scanner;
then you can make file, as well as a scanner reading from this file, make sure to adjsut the path to the file accordingly. The new Printstream is not necessary but when dealing with a big amount of data i dont like to overflow the console.
public static void main(String[] args) throws FileNotFoundException {
File file=new File("E:/Youtube analytics/input/input.txt");
Scanner scanner = new Scanner(file); //will read from the file above
PrintStream out = new PrintStream(new FileOutputStream("E:/Youtube analytics/output/output.txt"));
System.setOut(out);
}
after this you can use scanner.next() to get the next word so you would write something like this:
String[] array=new String[MaxAmountOfWords];//this will make an array
int numberOfWords=0;
String currentWord="";
while(scanner.hasNext()){
currentWord=scanner.next();
if(isNotInArray(currentWord))
{
array[numberOfWords]=currentWord
}
numberOfWords++;
}
If you dont understand any of this or need further guidence to progress, let me know. It is hard to help you if we dont exactly know where you are at...
You can try this:
public List<String> getAllWords(String filePath){
String line;
List<String> allWords = new ArrayList<String>();
BufferedReader reader = new BufferedReader(new FileReader(new File(filePath)));
//read each line of the file
while((line = reader.readLine()) != null) {
//get each word in the line
for(String word: line.split("(\\w)+"))
//validate if the current word is not empty
if(!word.isEmpty())
if(!allWords.contains(word))
allWords.add(word);
}
}
return allWords;
}
Best solution is to use a Set. But if you still want to use a List, here goes:
Suppose the file has the following data:
Hi how are you
I am Hardi
Who are you
Code will be:
List<String> list = new ArrayList<>();
// Get the file.
FileInputStream fis = new FileInputStream("C:/Users/hdinesh/Desktop/samples.txt");
//Construct BufferedReader from InputStreamReader
BufferedReader br = new BufferedReader(new InputStreamReader(fis));
String line = null;
// Loop through each line in the file
while ((line = br.readLine()) != null) {
// Regex for finding just the words
String[] strArray = line.split("[ ]");
for (int i = 0; i< strArray.length; i++) {
if (!list.contains(strArray[i])) {
list.add(strArray[i]);
}
}
}
br.close();
System.out.println(list.toString());
If your text file has sentences with special characters, you will have to write a regex for that.
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Replace first line of a text file in Java
Java - Find a line in a file and remove
I am trying to find a way to remove the first line of text in a text file using java. Would like to use a scanner to do it...is there a good way to do it without the need of a tmp file?
Thanks.
If your file is huge, you can use the following method that is performing the remove, in place, without using a temp file or loading all the content into memory.
public static void removeFirstLine(String fileName) throws IOException {
RandomAccessFile raf = new RandomAccessFile(fileName, "rw");
//Initial write position
long writePosition = raf.getFilePointer();
raf.readLine();
// Shift the next lines upwards.
long readPosition = raf.getFilePointer();
byte[] buff = new byte[1024];
int n;
while (-1 != (n = raf.read(buff))) {
raf.seek(writePosition);
raf.write(buff, 0, n);
readPosition += n;
writePosition += n;
raf.seek(readPosition);
}
raf.setLength(writePosition);
raf.close();
}
Note that if your program is terminated while in the middle of the above loop you can end up with duplicated lines or corrupted file.
Scanner fileScanner = new Scanner(myFile);
fileScanner.nextLine();
This will return the first line of text from the file and discard it because you don't store it anywhere.
To overwrite your existing file:
FileWriter fileStream = new FileWriter("my/path/for/file.txt");
BufferedWriter out = new BufferedWriter(fileStream);
while(fileScanner.hasNextLine()) {
String next = fileScanner.nextLine();
if(next.equals("\n"))
out.newLine();
else
out.write(next);
out.newLine();
}
out.close();
Note that you will have to be catching and handling some IOExceptions this way. Also, the if()... else()... statement is necessary in the while() loop to keep any line breaks present in your text file.
Without temp file you must keep everything in main memory. The rest is straight forward: loop over the lines (ignoring the first) and store them in a collection. Then write the lines back to disk:
File path = new File("/path/to/file.txt");
Scanner scanner = new Scanner(path);
ArrayList<String> coll = new ArrayList<String>();
scanner.nextLine();
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
coll.add(line);
}
scanner.close();
FileWriter writer = new FileWriter(path);
for (String line : coll) {
writer.write(line);
}
writer.close();
If file is not too big, you can read is into a byte array, find first new line symbol and write the rest of array into the file starting from position zero. Or you may use memory mapped file to do so.
I have two files:
1- with 1400000 line or record --- 14 MB
2- with 16000000 -- 170 MB
I want to find if each record or line in file 1 is also in file 2 or not
I develop a java app that do the following: Read file line by line and pass each line to a method that loop in file 2
Here is my code:
public boolean hasIDin(String bioid) throws Exception {
BufferedReader br = new BufferedReader(new FileReader("C://AllIDs.txt"));
long bid = Long.parseLong(bioid);
String thisLine;
while((thisLine = br.readLine( )) != null)
{
if (Long.parseLong(thisLine) == bid)
return true;
}
return false;
}
public void getMBD() throws Exception{
BufferedReader br = new BufferedReader(new FileReader("C://DIDs.txt"));
OutputStream os = new FileOutputStream("C://MBD.txt");
PrintWriter pr = new PrintWriter(os);
String thisLine;
int count=1;
while ((thisLine = br.readLine( )) != null){
String bioid = thisLine;
System.out.println(count);
if(! hasIDin(bioid))
pr.println(bioid);
count++;
}
pr.close();
}
When I run it seems it will take more 1944.44444444444 hours to complete as every line processing takes 5 sec. That is about three months!
Is there any ideas to make it done in a much much more less time.
Thanks in advance.
Why don't you;
read all the lines in file2 into a set. Set is fine, but TLongHashSet would be more efficient.
for each line in the second file see if it is in the set.
Here is a tuned implementation which prints the following and uses < 64 MB.
Generating 1400000 ids to /tmp/DID.txt
Generating 16000000 ids to /tmp/AllIDs.txt
Reading ids in /tmp/DID.txt
Reading ids in /tmp/AllIDs.txt
Took 8794 ms to find 294330 valid ids
Code
public static void main(String... args) throws IOException {
generateFile("/tmp/DID.txt", 1400000);
generateFile("/tmp/AllIDs.txt", 16000000);
long start = System.currentTimeMillis();
TLongHashSet did = readLongs("/tmp/DID.txt");
TLongHashSet validIDS = readLongsUnion("/tmp/AllIDs.txt",did);
long time = System.currentTimeMillis() - start;
System.out.println("Took "+ time+" ms to find "+ validIDS.size()+" valid ids");
}
private static TLongHashSet readLongs(String filename) throws IOException {
System.out.println("Reading ids in "+filename);
BufferedReader br = new BufferedReader(new FileReader(filename), 128*1024);
TLongHashSet ids = new TLongHashSet();
for(String line; (line = br.readLine())!=null;)
ids.add(Long.parseLong(line));
br.close();
return ids;
}
private static TLongHashSet readLongsUnion(String filename, TLongHashSet validSet) throws IOException {
System.out.println("Reading ids in "+filename);
BufferedReader br = new BufferedReader(new FileReader(filename), 128*1024);
TLongHashSet ids = new TLongHashSet();
for(String line; (line = br.readLine())!=null;) {
long val = Long.parseLong(line);
if (validSet.contains(val))
ids.add(val);
}
br.close();
return ids;
}
private static void generateFile(String filename, int number) throws IOException {
System.out.println("Generating "+number+" ids to "+filename);
PrintWriter pw = new PrintWriter(new BufferedWriter(new FileWriter(filename), 128*1024));
Random rand = new Random();
for(int i=0;i<number;i++)
pw.println(rand.nextInt(1<<26));
pw.close();
}
170Mb + 14Mb is not so huge files.
My suggestion is to load the smallest one file into java.util.Map, parse the biggest one line-by-line (record-by-record) file and check if the current line present in this Map.
P.S. The question looks like something trivial in terms of RDBMS - maybe it's worth to use any?
You can't do an O(N^2) when each iteration is so long, that's completely unacceptable.
If you have enough RAM, you simply parse file 1, create a map of all numbers, then parse file 2 and check your map.
If you don't have enough RAM, parse file 1, create a map and store it to a file, then parse file 2 and read the map. The key is to make the map as easy to parse as possible - make it a binary format, maybe with a binary tree or something where you can quickly skip around and search. (EDIT: I have to add Michael Borgwardt's Grace Hash Join link, which shows an even better way: http://en.wikipedia.org/wiki/Hash_join#Grace_hash_join)
If there is a limit to the size of your files, option 1 is easier to implement - unless you're dealing with huuuuuuuge files (I'm talking lots of GB), you definitely want to do that.
Usually, memory-mapping is the most efficient way to read large files. You'll need to use java.nio.MappedByteBuffer and java.io.RandomAccessFile.
But your search algorithm is the real problem. Building some sort of index or hash table is what you need.
I want to read a text file containing space separated values. Values are integers.
How can I read it and put it in an array list?
Here is an example of contents of the text file:
1 62 4 55 5 6 77
I want to have it in an arraylist as [1, 62, 4, 55, 5, 6, 77]. How can I do it in Java?
You can use Files#readAllLines() to get all lines of a text file into a List<String>.
for (String line : Files.readAllLines(Paths.get("/path/to/file.txt"))) {
// ...
}
Tutorial: Basic I/O > File I/O > Reading, Writing and Creating text files
You can use String#split() to split a String in parts based on a regular expression.
for (String part : line.split("\\s+")) {
// ...
}
Tutorial: Numbers and Strings > Strings > Manipulating Characters in a String
You can use Integer#valueOf() to convert a String into an Integer.
Integer i = Integer.valueOf(part);
Tutorial: Numbers and Strings > Strings > Converting between Numbers and Strings
You can use List#add() to add an element to a List.
numbers.add(i);
Tutorial: Interfaces > The List Interface
So, in a nutshell (assuming that the file doesn't have empty lines nor trailing/leading whitespace).
List<Integer> numbers = new ArrayList<>();
for (String line : Files.readAllLines(Paths.get("/path/to/file.txt"))) {
for (String part : line.split("\\s+")) {
Integer i = Integer.valueOf(part);
numbers.add(i);
}
}
If you happen to be at Java 8 already, then you can even use Stream API for this, starting with Files#lines().
List<Integer> numbers = Files.lines(Paths.get("/path/to/test.txt"))
.map(line -> line.split("\\s+")).flatMap(Arrays::stream)
.map(Integer::valueOf)
.collect(Collectors.toList());
Tutorial: Processing data with Java 8 streams
Java 1.5 introduced the Scanner class for handling input from file and streams.
It is used for getting integers from a file and would look something like this:
List<Integer> integers = new ArrayList<Integer>();
Scanner fileScanner = new Scanner(new File("c:\\file.txt"));
while (fileScanner.hasNextInt()){
integers.add(fileScanner.nextInt());
}
Check the API though. There are many more options for dealing with different types of input sources, differing delimiters, and differing data types.
This example code shows you how to read file in Java.
import java.io.*;
/**
* This example code shows you how to read file in Java
*
* IN MY CASE RAILWAY IS MY TEXT FILE WHICH I WANT TO DISPLAY YOU CHANGE WITH YOUR OWN
*/
public class ReadFileExample
{
public static void main(String[] args)
{
System.out.println("Reading File from Java code");
//Name of the file
String fileName="RAILWAY.txt";
try{
//Create object of FileReader
FileReader inputFile = new FileReader(fileName);
//Instantiate the BufferedReader Class
BufferedReader bufferReader = new BufferedReader(inputFile);
//Variable to hold the one line data
String line;
// Read file line by line and print on the console
while ((line = bufferReader.readLine()) != null) {
System.out.println(line);
}
//Close the buffer reader
bufferReader.close();
}catch(Exception e){
System.out.println("Error while reading file line by line:" + e.getMessage());
}
}
}
Look at this example, and try to do your own:
import java.io.*;
public class ReadFile {
public static void main(String[] args){
String string = "";
String file = "textFile.txt";
// Reading
try{
InputStream ips = new FileInputStream(file);
InputStreamReader ipsr = new InputStreamReader(ips);
BufferedReader br = new BufferedReader(ipsr);
String line;
while ((line = br.readLine()) != null){
System.out.println(line);
string += line + "\n";
}
br.close();
}
catch (Exception e){
System.out.println(e.toString());
}
// Writing
try {
FileWriter fw = new FileWriter (file);
BufferedWriter bw = new BufferedWriter (fw);
PrintWriter fileOut = new PrintWriter (bw);
fileOut.println (string+"\n test of read and write !!");
fileOut.close();
System.out.println("the file " + file + " is created!");
}
catch (Exception e){
System.out.println(e.toString());
}
}
}
Just for fun, here's what I'd probably do in a real project, where I'm already using all my favourite libraries (in this case Guava, formerly known as Google Collections).
String text = Files.toString(new File("textfile.txt"), Charsets.UTF_8);
List<Integer> list = Lists.newArrayList();
for (String s : text.split("\\s")) {
list.add(Integer.valueOf(s));
}
Benefit: Not much own code to maintain (contrast with e.g. this). Edit: Although it is worth noting that in this case tschaible's Scanner solution doesn't have any more code!
Drawback: you obviously may not want to add new library dependencies just for this. (Then again, you'd be silly not to make use of Guava in your projects. ;-)
Use Apache Commons (IO and Lang) for simple/common things like this.
Imports:
import org.apache.commons.io.FileUtils;
import org.apache.commons.lang3.ArrayUtils;
Code:
String contents = FileUtils.readFileToString(new File("path/to/your/file.txt"));
String[] array = ArrayUtils.toArray(contents.split(" "));
Done.
Using Java 7 to read files with NIO.2
Import these packages:
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
This is the process to read a file:
Path file = Paths.get("C:\\Java\\file.txt");
if(Files.exists(file) && Files.isReadable(file)) {
try {
// File reader
BufferedReader reader = Files.newBufferedReader(file, Charset.defaultCharset());
String line;
// read each line
while((line = reader.readLine()) != null) {
System.out.println(line);
// tokenize each number
StringTokenizer tokenizer = new StringTokenizer(line, " ");
while (tokenizer.hasMoreElements()) {
// parse each integer in file
int element = Integer.parseInt(tokenizer.nextToken());
}
}
reader.close();
} catch (Exception e) {
e.printStackTrace();
}
}
To read all lines of a file at once:
Path file = Paths.get("C:\\Java\\file.txt");
List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
All the answers so far given involve reading the file line by line, taking the line in as a String, and then processing the String.
There is no question that this is the easiest approach to understand, and if the file is fairly short (say, tens of thousands of lines), it'll also be acceptable in terms of efficiency. But if the file is long, it's a very inefficient way to do it, for two reasons:
Every character gets processed twice, once in constructing the String, and once in processing it.
The garbage collector will not be your friend if there are lots of lines in the file. You're constructing a new String for each line, and then throwing it away when you move to the next line. The garbage collector will eventually have to dispose of all these String objects that you don't want any more. Someone's got to clean up after you.
If you care about speed, you are much better off reading a block of data and then processing it byte by byte rather than line by line. Every time you come to the end of a number, you add it to the List you're building.
It will come out something like this:
private List<Integer> readIntegers(File file) throws IOException {
List<Integer> result = new ArrayList<>();
RandomAccessFile raf = new RandomAccessFile(file, "r");
byte buf[] = new byte[16 * 1024];
final FileChannel ch = raf.getChannel();
int fileLength = (int) ch.size();
final MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, 0,
fileLength);
int acc = 0;
while (mb.hasRemaining()) {
int len = Math.min(mb.remaining(), buf.length);
mb.get(buf, 0, len);
for (int i = 0; i < len; i++)
if ((buf[i] >= 48) && (buf[i] <= 57))
acc = acc * 10 + buf[i] - 48;
else {
result.add(acc);
acc = 0;
}
}
ch.close();
raf.close();
return result;
}
The code above assumes that this is ASCII (though it could be easily tweaked for other encodings), and that anything that isn't a digit (in particular, a space or a newline) represents a boundary between digits. It also assumes that the file ends with a non-digit (in practice, that the last line ends with a newline), though, again, it could be tweaked to deal with the case where it doesn't.
It's much, much faster than any of the String-based approaches also given as answers to this question. There is a detailed investigation of a very similar issue in this question. You'll see there that there's the possibility of improving it still further if you want to go down the multi-threaded line.
read the file and then do whatever you want
java8
Files.lines(Paths.get("c://lines.txt")).collect(Collectors.toList());