I am inserting 2_000_000 long values into a list. Each time I insert a value, I need to print the size of the list to the console using System.out.println. If I do not print the size, the inserts finish in 250 ms. If I print the size after every insert, the time taken is 25000 ms.
How can I fix this performance issue?
I am using Java 7 and Eclipse (Kepler) to test my implementation.
Note: Printing the size after each insert is required by the problem definition.
Try using a BufferedWriter. It allows efficient character writing.
Also, don't forget to call the flush method at the end so that all buffered data is written out.
Working example:
List<Long> list = new ArrayList<>();
try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(System.out))) {
for(long i = 0; i < 2000000; i++) {
list.add(i);
bw.write("Current size:" + list.size());
bw.newLine();
if(i % 100 == 0 || i == 1999999) {
bw.flush();
}
}
} catch (IOException e) {
e.printStackTrace();
}
A second option could be using a StringBuilder.
List<Long> list = new ArrayList<>();
String separator = System.getProperty("line.separator");
int initialCapacity = 200000;
StringBuilder sb = new StringBuilder(initialCapacity);
for (long i = 0; i < 2000000; i++) {
list.add(i);
sb.append("Current size:").append(list.size()).append(separator);
if (i % 10000 == 0 || i == 1999999) {
System.out.print(sb); // sb already ends with a line separator, so println would add a blank line
sb = new StringBuilder(initialCapacity);
}
}
For example
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(System.out));
for (long myLong : values)
{
list.add(myLong);
writer.write(String.valueOf(list.size())); // write(int) would output a single character, not the number
writer.newLine();
}
Once you're completely done, or perhaps every 1000 inserts, call
writer.flush();
For a homework assignment, I need to implement external sorting such that I can sort a 10GB file with 1GB physical memory. Currently, I'm using a BufferedReader on the large file and constructing/sorting the smaller files sequentially. Then in the merge step, I have BufferedReaders open for all small files and a single BufferedWriter for the large final file where I write to the large file using the merge k sorted lists algorithm with a PriorityQueue. This works, but it needs to be faster (take half as much time to be exact).
The entire splitting step happens sequentially and the entire merging step also happens sequentially. I think I can at least split and sort the files in parallel using multiple threads with different virtual memory spaces. Then the memory used is mostly memory-mapped files and the OS will take care of optimally paging in and out data from physical memory. I was wondering if there was a way for Java to do this using parallel streams. Something along the lines of:
largeFile.splitInParallel(100000)
.lines()
.map((s) -> new LineObject(s))
.sorted()
.forEach(writeSmallFileToDisk)
where the argument to splitInParallel is the number of lines I want in the smaller files. Any help is appreciated, thanks!
EDIT:
My code is
public class Main {
private static final int BUFFER_SIZE = 10_000_000;
/**
* A main method to run examples.
*
* @param args args[0] is the input file, args[1] is the batch size
*/
public static void main(String[] args) throws IOException {
System.out.println("Starting...");
String file = args[0];
int batchSize = Integer.parseInt(args[1]);
try {
FileInputStream fin = new FileInputStream(file);
BufferedInputStream bis = new BufferedInputStream(fin, BUFFER_SIZE);
BufferedReader br = new BufferedReader(new InputStreamReader(bis), BUFFER_SIZE);
int lineNumber = 0;
int batchId = 0;
String line;
TaxiEntry[] batch = new TaxiEntry[batchSize];
int i = 0;
while ((line = br.readLine()) != null) {
TaxiEntry taxiEntry = parseLine(line);
batch[i++] = taxiEntry;
lineNumber++;
if (lineNumber % batchSize == 0) {
String outputFileName = String.format("batches/batch_%d.txt", batchId);
BufferedWriter bf = new BufferedWriter(new FileWriter(outputFileName, true), BUFFER_SIZE);
Arrays.parallelSort(batch);
for (int j = 0; j < i; j++) {
bf.write(batch[j].toString());
if (j != i) {
bf.newLine();
}
}
batchId++;
i = 0;
bf.flush();
}
}
String outputFileName = String.format("batches/batch_%d.txt", batchId);
BufferedWriter bf = new BufferedWriter(new FileWriter(outputFileName, true), BUFFER_SIZE);
Arrays.parallelSort(batch, 0, i);
for (int j = 0; j < i; j++) {
bf.write(batch[j].toString());
if (j != i) {
bf.newLine();
}
}
batchId++;
bf.flush();
System.out.println("Processed " + lineNumber + " lines");
merge(batchId);
} catch (IOException e) {
e.printStackTrace();
}
}
public static void merge(int numBatches) throws IOException {
System.out.println("Starting merge...");
// Open readers
BufferedReader[] readers = new BufferedReader[numBatches];
for (int i = 0; i < numBatches; i++) {
String file = String.format("batches/batch_%d.txt", i);
FileInputStream fin = new FileInputStream(file);
BufferedInputStream bis = new BufferedInputStream(fin, BUFFER_SIZE);
BufferedReader br = new BufferedReader(new InputStreamReader(bis), BUFFER_SIZE);
readers[i] = br;
}
// Merge
String outputFileName = "result/final.txt";
BufferedWriter bf = new BufferedWriter(new FileWriter(outputFileName, true), BUFFER_SIZE);
PriorityQueue<IndexedTaxiNode> curEntries = new PriorityQueue<>();
for (int i = 0; i < numBatches; i++) {
BufferedReader reader = readers[i];
String next = reader.readLine();
if (next != null) {
TaxiEntry curr = parseLine(next);
curEntries.add(new IndexedTaxiNode(curr, i));
}
}
while (!curEntries.isEmpty()) {
// get max from curEntries
IndexedTaxiNode maxNode = curEntries.remove();
bf.write(maxNode.toString());
bf.newLine();
int index = maxNode.index;
String next = readers[index].readLine();
if (next != null) {
TaxiEntry newEntry = parseLine(next);
curEntries.add(new IndexedTaxiNode(newEntry, index));
}
}
bf.flush();
}
public static TaxiEntry parseLine(String line) {
return new TaxiEntry(line, Double.parseDouble(line.split(",")[16]));
}
}
I did some timings and found that reading from disk and sorting take times of a similar order of magnitude.
System.out.println("Begin loading file");
// do loading stuff
System.out.format("elapsed %.03f ms%n%n", (finishTime - startTime) / 1e6);
System.out.println("Sorting lines");
// do sorting stuff
System.out.format("elapsed %.03f ms%n", (finishTime - startTime) / 1e6);
Console output is:
Begin loading file
elapsed 918.933 ms
Sorting lines
elapsed 1360.896 ms
I used a modest file of about 150 MB for the timings. It might not be a good idea to have lots of threads all reading from disk at the same time.
My suggestion for what it's worth is to have one thread that does all of the disk reading, and another thread that concurrently does sorting. I could only see a way to do this for the splitting and sorting phase.
For the splitting phase, you cannot read all the segments in one go because that would consume too much memory. So you read a few segments, write a few, read a few, and so on. The idea of this interleaving is to keep the disk continuously busy by delegating the sorting to another thread. Hopefully, by the time the disk is ready to write a segment, the sort on that segment has completed, so the disk never has to wait.
List<String> lines = new ArrayList<>();
int i = 0;
while (someCondition()) {
String line = reader.readLine();
lines.add(line);
if (lines.size() == BATCH_SIZE) {
sendMsgToWorker(lines); // send to worker thread
if (i == MAX_MESSAGE_QUEUE - 1) {
for (int j = 0; j < MAX_MESSAGE_QUEUE; j++) {
List<String> sortedLines = waitForLineFromWorker(); // wait for worker thread
writeTmpFile(sortedLines);
}
}
lines = new ArrayList<>();
i = (i + 1) % MAX_MESSAGE_QUEUE;
}
}
An outline for the splitting and sorting phase is shown above, without covering any edge cases. The amount of memory used would be proportional to BATCH_SIZE * MAX_MESSAGE_QUEUE.
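The outline can be fleshed out into something runnable. What follows is only a minimal sketch, assuming a single-threaded ExecutorService as the sorting worker; the names SplitSortSketch, splitAndSort, batchSize and maxPending are made up for illustration. To stay self-contained it collects the sorted batches in a list, where a real version would write each batch to a temporary file and drop it from memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SplitSortSketch {

    /**
     * Reads lines on the calling thread while a worker thread sorts earlier batches.
     * At most maxPending batches wait for the sorter at any time.
     * Returns the sorted batches in read order (hypothetical stand-in for writing temp files).
     */
    static List<List<String>> splitAndSort(BufferedReader reader, int batchSize, int maxPending)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService sorter = Executors.newSingleThreadExecutor();
        Deque<Future<List<String>>> pending = new ArrayDeque<>();
        List<List<String>> sortedBatches = new ArrayList<>();
        try {
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    pending.add(submitSort(sorter, batch));
                    batch = new ArrayList<>(batchSize);
                    // Bound the queue: once it is full, wait for the oldest batch.
                    if (pending.size() == maxPending) {
                        sortedBatches.add(pending.remove().get());
                    }
                }
            }
            if (!batch.isEmpty()) {
                pending.add(submitSort(sorter, batch)); // final partial batch
            }
            while (!pending.isEmpty()) {
                sortedBatches.add(pending.remove().get());
            }
        } finally {
            sorter.shutdown();
        }
        return sortedBatches;
    }

    private static Future<List<String>> submitSort(ExecutorService sorter, List<String> batch) {
        return sorter.submit(() -> {
            Collections.sort(batch);
            return batch;
        });
    }

    public static void main(String[] args) throws Exception {
        BufferedReader reader = new BufferedReader(new StringReader("d\nb\na\nc\ne\n"));
        System.out.println(splitAndSort(reader, 2, 2)); // prints [[b, d], [a, c], [e]]
    }
}
```

The reading thread only blocks when maxPending batches are already queued, which is the point of the interleaving: the disk keeps reading while the previous batch is being sorted.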
Unfortunately, I don't see a way to apply concurrency to the phase of merging the multiple files. The disk is just the disk so cannot go any faster even with multiple threads.
You could try investigating parallel quicksort, but the problem with quicksort is choosing a pivot point so that the partitions end up a reasonable size.
I am looking for a quick way to read in the roughly 150mb worth of spectroscopic data I have into a program I am writing. The data is currently stored in a text file (.dat) and its content is stored in a format like:
489.99992 490.00000
0.01178
0.01409
where the first N values represent x values and are separated by spaces and the last N values are y values separated by newline characters. (eg. x1= 489.99992, x2= 490.00000, y1=0.01178, y2=0.01409).
I wrote the following parser,
private void parse()
{
FileReader reader = null;
String currentNumber = "";
int indexOfIntensity = 0;
long startTime = System.currentTimeMillis();
try
{
reader = new FileReader(FILE);
char[] chars = new char[65536];
boolean waveNumMode = true;
double valueAsDouble;
//get buffer sized chunks of data from the file
for(int len; (len = reader.read(chars)) > 0;)
{
//parse through the buffer
for(int i = 0; i < len; i++)
{
//is a new number if true
if((chars[i] == ' ' || chars[i] == '\n') && !currentNumber.isEmpty())
{
try
{
valueAsDouble = Double.parseDouble(currentNumber);
}catch(NumberFormatException nfe)
{
System.out.println("Could not convert to double: " + currentNumber);
currentNumber = "";
continue;
}
if(waveNumMode)
{
//System.out.println("Wavenumber: " + valueAsDouble);
listOfPoints.add(new Tuple(valueAsDouble));
}else
{
//System.out.println("Intensity: " + valueAsDouble);
listOfPoints.get(indexOfIntensity).setIntensityValue(valueAsDouble);
indexOfIntensity++;
}
if(chars[i] == '\n')
{
waveNumMode = false;
}
currentNumber = ""; //clear for the next number
continue;
}
currentNumber += chars[i];
}
}
} catch (IOException e) {
e.printStackTrace();
}
try
{
reader.close();
} catch (IOException e)
{
e.printStackTrace();
}
long stopTime = System.currentTimeMillis();
System.out.println("Execution time: " + ((stopTime - startTime) / 1000.0) + " seconds");
}
but this takes around 50 seconds to finish for the 150mb file. For reference, we are using another piece of software which does this in roughly half a second (however it uses its own custom file type). I am willing to use a different file type or whatever really if it brings the execution time down. How can I speed this up?
Thanks in advance
In order to optimize code, you first need to find what parts of the code are slowing things down. Use a profiler to measure your code's performance and identify what parts are slowing down the process.
Try reading all bytes from the file at once and then parsing them:
Files.readAllBytes(Paths.get(fileName))
since many small reader.read() calls are costly in Java.
You can also try wrapping your FileReader in a BufferedReader and checking whether that gives any performance gain.
For more info, visit the link:
https://www.geeksforgeeks.org/different-ways-reading-text-file-java/
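As a rough illustration of the BufferedReader suggestion, here is a sketch that assumes the layout described in the question: the first line holds all x values separated by spaces, and every following line holds one y value. The class and method names are hypothetical, and whether this is faster for your file is something only a profiler run can confirm. It avoids building each number with repeated string concatenation and stores the values in plain double arrays.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.StringTokenizer;

class SpectrumParserSketch {

    /** Returns {xValues, yValues}; assumes line 1 holds all x values and each later line one y value. */
    static double[][] parse(Reader source) throws IOException {
        try (BufferedReader reader = new BufferedReader(source, 1 << 16)) {
            String firstLine = reader.readLine();
            if (firstLine == null) {
                return new double[][] {new double[0], new double[0]};
            }
            StringTokenizer tokens = new StringTokenizer(firstLine, " ");
            double[] x = new double[tokens.countTokens()];
            for (int i = 0; i < x.length; i++) {
                x[i] = Double.parseDouble(tokens.nextToken());
            }
            // One y value is expected per x value; blank lines are skipped.
            double[] y = new double[x.length];
            int count = 0;
            String line;
            while (count < y.length && (line = reader.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    y[count++] = Double.parseDouble(line);
                }
            }
            return new double[][] {x, y};
        }
    }

    public static void main(String[] args) throws IOException {
        double[][] xy = parse(new StringReader("489.99992 490.00000\n0.01178\n0.01409\n"));
        System.out.println(Arrays.toString(xy[0]) + " " + Arrays.toString(xy[1]));
    }
}
```

For a real file you would pass something like a FileReader for the .dat file instead of the StringReader used in main.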
We process huge files (sometimes 50 GB each file). The application reads this one file and based on the business logic, it will write multiple output files (4-6).
The records in the file are of variable length and each field in a record is a delimiter separated.
We went by the understanding that reading a file using FileChannel with a ByteBuffer is always faster than using BufferedReader.readLine and then splitting by the delimiter.
BufferSizes tried 10240(10KB) and even more
Commit interval - 5000, 10000 etc
Below is how we used file channel to read:
Read byte by byte. Check whether the byte read is a newline character (10), which means end of line.
Check for delimiter bytes. Capture the bytes read in a byte array (initialized with a maximum field size of 350 bytes) until the delimiter bytes are encountered.
Convert the bytes read so far to a String using UTF-8 encoding, new String(byteArr, 0, index, "UTF-8") to be specific, where index is the number of bytes read until the delimiter.
Using this method of reading the file using FileChannel took 57 minutes to process the file.
We wanted to decrease this time, so we tried BufferedReader.readLine() followed by a split by the delimiter to see how it fares.
Shockingly, the same file completed processing in only 7 minutes.
What's the catch here? Why is FileChannel taking more time than a BufferedReader followed by a string split?
I was always under the assumption that the readLine and split combination would have a big performance impact.
Can anyone shed light on whether I was using FileChannel in the wrong way?
Thanks in advance. Hope I have summarized the issue properly.
The below is sample code :
while (inputByteBuffer.hasRemaining() && (b = inputByteBuffer.get()) != 0){
boolean endOfField = false;
if (b == 10){
break;
}
else{
if (b == 94){//^
if (!inputByteBuffer.hasRemaining()){
inputByteBuffer.clear();
noOfBytes = inputFileChannel.read(inputByteBuffer);
inputByteBuffer.flip();
}
if (inputByteBuffer.hasRemaining()){
byte b2 = inputByteBuffer.get();
if (b2 == 124){//|
if (!inputByteBuffer.hasRemaining()){
inputByteBuffer.clear();
noOfBytes = inputFileChannel.read(inputByteBuffer);
inputByteBuffer.flip();
}
if (inputByteBuffer.hasRemaining()){
byte b3 = inputByteBuffer.get();
if (b3 == 94){//^
String field = new String(fieldBytes, 0, index, encoding);
if(fieldIndex == -1){
fields = new String[sizeFromAConfiguration];
}else{
fields[fieldIndex] = field;
}
fieldBytes = new byte[maxFieldSize];
endOfField = true;
fieldIndex++;
}
else{
fieldBytes = addFieldBytes(fieldBytes, b, index);
index++;
fieldBytes = addFieldBytes(fieldBytes, b2, index);
index++;
fieldBytes = addFieldBytes(fieldBytes, b3, index);
}
}
else{
endOfFile = true;
//fields.add(new String(fieldBytes, 0, index, encoding));
fields[fieldIndex] = new String(fieldBytes, 0, index, encoding);
fieldBytes = new byte[maxFieldSize];
endOfField = true;
}
}else{
fieldBytes = addFieldBytes(fieldBytes, b, index);
index++;
fieldBytes = addFieldBytes(fieldBytes, b2, index);
}
}else{
endOfFile = true;
fieldBytes = addFieldBytes(fieldBytes, b, index);
}
}
else{
fieldBytes = addFieldBytes(fieldBytes, b, index);
}
}
if (!inputByteBuffer.hasRemaining()){
inputByteBuffer.clear();
noOfBytes = inputFileChannel.read(inputByteBuffer);
inputByteBuffer.flip();
}
if (endOfField){
index = 0;
}
else{
index++;
}
}
You're causing a lot of overhead with the constant hasRemaining()/read() checks as well as the constant get() calls. It would probably be better to get() the entire buffer into an array and process that directly, only calling read() when you get to the end.
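To make the bulk-get idea concrete, here is a minimal sketch rather than a drop-in replacement for the code in the question. It only counts occurrences of the ^|^ delimiter, and the names ChunkScanSketch and countDelimiters are invented for the example; chunkSize is assumed to be greater than zero. The parts that matter are that the channel is read once per chunk, the chunk is copied out with a single bulk get(), and the match state is carried across chunk boundaries instead of refilling the buffer in the middle of a delimiter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;

class ChunkScanSketch {

    /** Counts occurrences of the three-byte delimiter ^|^, scanning one chunk at a time. */
    static long countDelimiters(ReadableByteChannel channel, int chunkSize) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(chunkSize); // heap buffer, reused for every read
        byte[] chunk = new byte[chunkSize];
        long delimiters = 0;
        int matched = 0; // delimiter bytes matched so far; survives chunk boundaries
        while (channel.read(buffer) != -1) {
            buffer.flip();
            int len = buffer.remaining();
            buffer.get(chunk, 0, len); // one bulk copy instead of len single get() calls
            buffer.clear();
            for (int i = 0; i < len; i++) {
                byte b = chunk[i];
                if (matched == 1 && b == '|') {
                    matched = 2;
                } else if (matched == 2 && b == '^') {
                    delimiters++;
                    matched = 0;
                } else {
                    // Restart the match; a '^' can itself begin a new delimiter.
                    matched = (b == '^') ? 1 : 0;
                }
            }
        }
        return delimiters;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a^|^b^|^c\nd^|^e\n".getBytes(StandardCharsets.US_ASCII);
        // A tiny chunk size forces a delimiter to straddle a chunk boundary.
        System.out.println(countDelimiters(Channels.newChannel(new ByteArrayInputStream(data)), 4)); // prints 3
    }
}
```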
And to answer a question in comments, you should not allocate a new ByteBuffer per read. This is expensive. Keep using the same one. And NB do not use a DirectByteBuffer for this application. It is not appropriate: it's only appropriate when you want the data to stay south of the JVM/JNI boundary, e.g. when merely copying between channels.
But I think I would throw this away, or rather rewrite it, using BufferedReader.read(), rather than readLine() followed by string splits, and using much the same logic as you have here, except of course that you don't need to keep calling hasRemaining() and filling the buffer, which BufferedReader will do automatically for you.
You have to take care to store the result of read() into an int, and to check it for -1 after every read().
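A minimal sketch of such a read() loop follows, assuming \n line endings and the ^|^ delimiter from the question; CharLoopSketch and readRecords are hypothetical names. The result of read() is kept in an int so that -1 can be told apart from real characters.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class CharLoopSketch {

    /** Splits the input into records (lines) of fields separated by ^|^, one char at a time. */
    static List<List<String>> readRecords(Reader source) throws IOException {
        List<List<String>> records = new ArrayList<>();
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            int c; // must be an int: -1 signals end of stream and is not a valid char
            while ((c = reader.read()) != -1) {
                if (c == '\n') {
                    fields.add(field.toString());
                    records.add(fields);
                    fields = new ArrayList<>();
                    field.setLength(0);
                } else {
                    field.append((char) c);
                    int n = field.length();
                    // The field ends when its last three chars are the delimiter ^|^.
                    if (c == '^' && n >= 3 && field.charAt(n - 2) == '|' && field.charAt(n - 3) == '^') {
                        field.setLength(n - 3);
                        fields.add(field.toString());
                        field.setLength(0);
                    }
                }
            }
        }
        // Last record when the input does not end with a newline.
        if (field.length() > 0 || !fields.isEmpty()) {
            fields.add(field.toString());
            records.add(fields);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readRecords(new StringReader("a^|^b\nc^|^^|^d"))); // prints [[a, b], [c, , d]]
    }
}
```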
It isn't clear to me that you should be using a Reader at all actually, unless you know you have multibyte text. Possibly a simple BufferedInputStream would be more appropriate.
While one cannot tell with certainty how a particular piece of code will behave, I would imagine the best way is to profile it, just like you did. FileChannel, while perceived to be faster, is actually not helping in your case. But this may not be because of reading from the file; it may be the actual processing that you do with the content you read.
One article I would like to point out while dealing with files is
https://www.redgreencode.com/why-is-java-io-slow/
Also the corresponding Github codebase
Java IO benchmark
I would like to point out this code to use a combination of both worlds
fos = new FileOutputStream(outputFile);
outFileChannel = fos.getChannel();
bufferedWriter = new BufferedWriter(Channels.newWriter(outFileChannel, "UTF-8"));
Since it is read in your case I will consider
File inputFile = new File("C:\\input.txt");
FileInputStream fis = new FileInputStream(inputFile);
FileChannel inputChannel = fis.getChannel();
BufferedReader bufferedReader = new BufferedReader(Channels.newReader(inputChannel,"UTF-8"));
I would also tweak the chunk size; with Spring Batch it is always trial and error to find the sweet spot.
On a completely unrelated note, the reason you were not able to use BufferedReader is the doubling of characters, and I am assuming this happens more commonly with EBCDIC characters. I would simply run a loop like this to identify the troublemakers and eliminate them at the source.
import java.io.UnsupportedEncodingException;
public class EbcdicConvertor {
public static void main(String[] args) throws UnsupportedEncodingException {
int index = 0;
for (int i = -128; i < 128; i++) { // cover every possible byte value
byte[] b = new byte[1];
b[0] = (byte) i;
String cp037 = new String(b, "CP037");
if (cp037.getBytes().length == 2) {
index++;
System.out.println(i + "::" + cp037);
}
}
System.out.println(index);
}
}
The above answer was written without testing my actual hypothesis. Here is an actual program to measure the time. The results on a 200 MB file speak for themselves.
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;
public class ReadComplexDelimitedFile {
private static long total = 0;
private static final Pattern DELIMITER_PATTERN = Pattern.compile("\\^\\|\\^");
private void readFileUsingScanner() {
String s;
try (Scanner stdin = new Scanner(new File(this.getClass().getResource("input.txt").getPath()))) {
while (stdin.hasNextLine()) {
s = stdin.nextLine();
String[] fields = DELIMITER_PATTERN.split(s, 0);
total = total + fields.length;
}
} catch (Exception e) {
System.err.println("Error");
}
}
private void readFileUsingCustomBufferedReader() {
try (BufferedReader stdin = new BufferedReader(new FileReader(new File(this.getClass().getResource("input.txt").getPath())))) {
String s;
while ((s = stdin.readLine()) != null) {
String[] fields = DELIMITER_PATTERN.split(s, 0);
total += fields.length;
}
} catch (Exception e) {
System.err.println("Error");
}
}
private void readFileUsingBufferedReader() {
try (java.io.BufferedReader stdin = new java.io.BufferedReader(new FileReader(new File(this.getClass().getResource("input.txt").getPath())))) {
String s;
while ((s = stdin.readLine()) != null) {
String[] fields = DELIMITER_PATTERN.split(s, 0);
total += fields.length;
}
} catch (Exception e) {
System.err.println("Error");
}
}
private void readFileUsingBufferedReaderFileChannel() {
try (FileInputStream fis = new FileInputStream(this.getClass().getResource("input.txt").getPath())) {
try (FileChannel inputChannel = fis.getChannel()) {
try (BufferedReader stdin = new BufferedReader(Channels.newReader(inputChannel, "UTF-8"))) {
String s;
while ((s = stdin.readLine()) != null) {
String[] fields = DELIMITER_PATTERN.split(s, 0);
total = total + fields.length;
}
}
} catch (Exception e) {
System.err.println("Error");
}
} catch (Exception e) {
System.err.println("Error");
}
}
private void readFileUsingBufferedReaderByteFileChannel() {
try (FileInputStream fis = new FileInputStream(this.getClass().getResource("input.txt").getPath())) {
try (FileChannel inputChannel = fis.getChannel()) {
try (BufferedReader stdin = new BufferedReader(Channels.newReader(inputChannel, "UTF-8"))) {
int b;
StringBuilder sb = new StringBuilder();
while ((b = stdin.read()) != -1) {
if (b == 10) {
total = total + DELIMITER_PATTERN.split(sb, 0).length;
sb = new StringBuilder();
} else {
sb.append((char) b);
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
} catch (Exception e) {
System.err.println("Error");
}
}
private void readFileUsingFileChannelStream() {
try (RandomAccessFile fis = new RandomAccessFile(new File(this.getClass().getResource("input.txt").getPath()), "r")) {
try (FileChannel inputChannel = fis.getChannel()) {
ByteBuffer byteBuffer = ByteBuffer.allocate(8192);
ByteBuffer recordBuffer = ByteBuffer.allocate(250);
int recordLength = 0;
while ((inputChannel.read(byteBuffer)) != -1) {
byte b;
byteBuffer.flip();
while (byteBuffer.hasRemaining() && (b = byteBuffer.get()) != -1) {
if (b == 10) {
recordBuffer.flip();
total = total + splitIntoFields(recordBuffer, recordLength);
recordBuffer.clear();
recordLength = 0;
} else {
++recordLength;
recordBuffer.put(b);
}
}
byteBuffer.clear();
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
private int splitIntoFields(ByteBuffer recordBuffer, int recordLength) {
byte b;
String[] fields = new String[17];
int fieldCount = -1;
StringBuilder sb = new StringBuilder();
for (int i = 0; i < recordLength - 1; i++) {
b = recordBuffer.get(i);
if (b == 94 && recordBuffer.get(++i) == 124 && recordBuffer.get(++i) == 94) {
fields[++fieldCount] = sb.toString();
sb = new StringBuilder();
} else {
sb.append((char) b);
}
}
fields[++fieldCount] = sb.toString();
return fields.length;
}
public static void main(String args[]) {
//JVM warmup
for (int i = 0; i < 100000; i++) {
total += i;
}
// We know scanner is slow-Still warming up
ReadComplexDelimitedFile readComplexDelimitedFile = new ReadComplexDelimitedFile();
List<Long> longList = new ArrayList<>(50);
for (int i = 0; i < 50; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingScanner();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingScanner");
longList.forEach(System.out::println);
// Actual performance test starts here
longList = new ArrayList<>(10);
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingBufferedReaderFileChannel();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingBufferedReaderFileChannel");
longList.forEach(System.out::println);
longList.clear();
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingBufferedReader();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingBufferedReader");
longList.forEach(System.out::println);
longList.clear();
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingCustomBufferedReader();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingCustomBufferedReader");
longList.forEach(System.out::println);
longList.clear();
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingBufferedReaderByteFileChannel();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingBufferedReaderByteFileChannel");
longList.forEach(System.out::println);
longList.clear();
for (int i = 0; i < 10; i++) {
total = 0;
long startTime = System.nanoTime();
readComplexDelimitedFile.readFileUsingFileChannelStream();
long stopTime = System.nanoTime();
long timeDifference = stopTime - startTime;
longList.add(timeDifference);
}
System.out.println("Time taken for readFileUsingFileChannelStream");
longList.forEach(System.out::println);
}
}
BufferedReader was written a very long time ago, so we can rewrite some parts relevant to this example. For instance, we don't care about \r, skipLF, skipCR and the like.
We are only going to read the file from one thread (no need for synchronized).
By extension there is no need for StringBuffer; StringBuilder can be used instead. The performance improvement is seen immediately.
This is a dangerous hack: it removes synchronized and replaces StringBuffer with StringBuilder. Don't use it without proper testing or without knowing what you are doing.
public String readLine() throws IOException {
StringBuilder s = null;
int startChar;
bufferLoop:
for (; ; ) {
if (nextChar >= nChars)
fill();
if (nextChar >= nChars) { /* EOF */
if (s != null && s.length() > 0)
return s.toString();
else
return null;
}
boolean eol = false;
char c = 0;
int i;
/* Skip a leftover '\n', if necessary */
charLoop:
for (i = nextChar; i < nChars; i++) {
c = cb[i];
if (c == '\n') {
eol = true;
break charLoop;
}
}
startChar = nextChar;
nextChar = i;
if (eol) {
String str;
if (s == null) {
str = new String(cb, startChar, i - startChar);
} else {
s.append(cb, startChar, i - startChar);
str = s.toString();
}
nextChar++;
return str;
}
if (s == null)
s = new StringBuilder(defaultExpectedLineLength);
s.append(cb, startChar, i - startChar);
}
}
Java 8 Intel i5 12 GB RAM Windows 10
Result (nanoseconds per run, as measured with System.nanoTime()):
Time taken for readFileUsingBufferedReaderFileChannel::
2581635057 1849820885 1763992972 1770510738 1746444157 1733491399
1740530125 1723907177 1724280512 1732445638
Time taken for readFileUsingBufferedReader
1851027073 1775304769 1803507033 1789979554 1786974538 1802675458
1789672780 1798036307 1789847714 1785302003
Time taken for readFileUsingCustomBufferedReader
1745220476 1721039975 1715383650 1728548462 1724746005 1718177466
1738026017 1748077438 1724608192 1736294175
Time taken for readFileUsingBufferedReaderByteFileChannel
2872857919 2480237636 2917488143 2913491126 2880117231 2904614745
2911756298 2878777496 2892169722 2888091211
Time taken for readFileUsingFileChannelStream
3039447073 2896156498 2538389366 2906287280 2887612064 2929288046
2895626578 2955326255 2897535059 2884476915
Process finished with exit code 0
I did try NIO with all possible options (those provided in this post and others, to the best of my knowledge and research) and found that it came nowhere close to BufferedReader for reading a text file.
After changing BufferedReader to use StringBuilder in place of StringBuffer, I didn't see any significant improvement in performance (only a few seconds for some files, and some were even faster with StringBuffer).
Removing the synchronized block also gave little or no improvement, and it's not worth tweaking something that brings no benefit.
Below is the time taken (reading, processing and writing; processing and writing are not significant, not even 20% of the time) for a file of around 50 GB:
NIO : 71.67 (Minutes)
IO (BufferedReader) : 10.84 (Minutes)
Thank you all for your time to reading and responding to this post and providing suggestions.
The main issue here is creating a new byte[] very rapidly (fieldBytes = new byte[maxFieldSize];).
Since a new array is created for every field, garbage collection kicks in very often, which triggers "stop the world" pauses to reclaim the memory.
The object creation itself could also be expensive.
We could instead initialize the byte array once and then track the index, converting the field to a String using the end index.
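A small sketch of that idea follows. It is not the code we ended up running; the class name ReusedFieldBufferSketch and its methods are made up for the example, and the demo splits on a single comma byte just to keep it short. The array is allocated once, and finishing a field only resets the index.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReusedFieldBufferSketch {
    private final byte[] fieldBytes; // allocated once, reused for every field
    private int index;               // number of valid bytes currently in fieldBytes

    ReusedFieldBufferSketch(int maxFieldSize) {
        fieldBytes = new byte[maxFieldSize];
    }

    /** Appends one byte; throws ArrayIndexOutOfBoundsException if a field exceeds maxFieldSize. */
    void append(byte b) {
        fieldBytes[index++] = b;
    }

    /** Converts the bytes collected so far to a String and makes the buffer ready for the next field. */
    String finishField() {
        String field = new String(fieldBytes, 0, index, StandardCharsets.UTF_8);
        index = 0; // resetting the index replaces "fieldBytes = new byte[maxFieldSize]"
        return field;
    }

    public static void main(String[] args) {
        ReusedFieldBufferSketch buffer = new ReusedFieldBufferSketch(350);
        List<String> fields = new ArrayList<>();
        for (byte b : "ab,cde,f".getBytes(StandardCharsets.UTF_8)) {
            if (b == ',') {
                fields.add(buffer.finishField());
            } else {
                buffer.append(b);
            }
        }
        fields.add(buffer.finishField());
        System.out.println(fields); // prints [ab, cde, f]
    }
}
```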
In any case, BufferedReader was faster than FileChannel, at least for reading ASCII files, and to keep the code simple we continued using BufferedReader.
With BufferedReader, the development and testing effort is reduced because there is no tedious logic for finding delimiters and populating the object.
I have to write 100 random integers to a file and display them in increasing order. PrintWriter writes them, but when I try to read from the file, the Scanner's hasNextInt() method returns false, and I can't understand why. I suppose the problem is with PrintWriter.
try(PrintWriter output = new PrintWriter(file);
Scanner input = new Scanner(file)) {
for (int i = 0; i < 100; i++) {
output.print((int)(Math.random() * 101) + " ");
}
int[] numbers = new int[100];
int i = 0;
while (input.hasNextInt()) {
numbers[i++] = input.nextInt();
}
Arrays.sort(numbers);
for (int n : numbers)
System.out.println(n);
} catch (FileNotFoundException ex) {
System.out.println("Cannot find the file!");
}
You need to flush the output stream so changes are written to the file.
...
for (int i = 0; i < 100; i++) {
output.print((int) (Math.random() * 101) + " ");
}
output.flush();
...
Additionally, close flushes the stream automatically, so if you read the file after the try block, you would see the changes. Note that a PrintWriter created with new PrintWriter(file) does not flush automatically; automatic flushing happens only when it is enabled through a constructor that takes an autoFlush argument, and then only on println, printf, or format.
I believe it is usually not a good idea to have a reader and a writer open on the same file simultaneously.
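One way to avoid having both open at once is to give the writer and the reader their own try-with-resources blocks, so the writer is closed (and therefore flushed) before the Scanner opens the file. This is just a sketch; the class and method names and the temp file in main are placeholders for whatever you use.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.Random;
import java.util.Scanner;

class WriteThenReadSketch {

    /** Writes count random integers in [0, 100] to the file, then reads them back sorted. */
    static int[] writeAndReadSorted(File file, int count) throws IOException {
        Random random = new Random();
        // First block: leaving it closes the writer, which flushes everything to the file.
        try (PrintWriter output = new PrintWriter(file)) {
            for (int i = 0; i < count; i++) {
                output.print(random.nextInt(101) + " ");
            }
        }
        // Second block: the Scanner is opened only after the data is on disk.
        int[] numbers = new int[count];
        int read = 0;
        try (Scanner input = new Scanner(file)) {
            while (read < count && input.hasNextInt()) {
                numbers[read++] = input.nextInt();
            }
        }
        int[] result = Arrays.copyOf(numbers, read);
        Arrays.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        File temp = File.createTempFile("numbers", ".txt");
        temp.deleteOnExit();
        for (int n : writeAndReadSorted(temp, 100)) {
            System.out.println(n);
        }
    }
}
```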
I want to sequentially read each line of an unsorted input file into consecutive elements of an array until there are no more records in the file or until the input size is reached, whichever occurs first. But I can't think of a way to check whether the next line is the end of the file.
This is my code:
Scanner cin = new Scanner(System.in);
System.out.print("Max number of items: ");
int max = cin.nextInt();
String[] input = new String[max];
try {
BufferedReader br = new BufferedReader(new FileReader("src/ioc.txt"));
for(int i=0; i<max; i++){ //to do:check for empty record
input[i] = br.readLine();
}
}
catch (IOException e){
System.out.print(e.getMessage());
}
for(int i=0; i<input.length; i++){
System.out.println((i+1)+" "+input[i]);
}
The file has 205 lines. If I input 210 as max, the array prints with five null elements, like so:
..204 Seychelles
205 Algeria
206 null
207 null
208 null
209 null
210 null
Thanks for your responses in advance!
From the docs:
public String readLine()
Returns: A String containing the contents of the line, not including
any line-termination characters, or null if the end of the stream has
been reached
In other words, you should do
String aux = br.readLine();
if(aux == null)
break;
input.add(aux);
I recommend you use a variable-size list such as an ArrayList (you can pre-allocate it with the requested size if reasonable). That way you get either the expected size or the actual number of lines, and can check later.
(depending on how long your file is, you might want to look at readAllLines() too.)
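For a file this small, a readAllLines() version could look roughly like the sketch below. The class and method names are hypothetical and UTF-8 is an assumption about the file's encoding. Because it loads the whole file into memory first, it is only sensible for modest files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class ReadAllLinesSketch {

    /** Returns at most max lines from the file, or fewer if the file is shorter. */
    static List<String> firstLines(Path file, int max) throws IOException {
        List<String> all = Files.readAllLines(file, StandardCharsets.UTF_8); // assumes UTF-8 content
        return all.subList(0, Math.min(max, all.size()));
    }

    public static void main(String[] args) throws IOException {
        Path temp = Files.createTempFile("ioc", ".txt");
        Files.write(temp, Arrays.asList("Seychelles", "Algeria", "Chad"), StandardCharsets.UTF_8);
        List<String> input = firstLines(temp, 5); // asks for 5, the file only has 3
        for (int i = 0; i < input.size(); i++) {
            System.out.println((i + 1) + " " + input.get(i));
        }
        Files.delete(temp);
    }
}
```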
Please refer to "Number of lines in a file in Java" and modify your for loop to run up to the smaller of the entered max value and the number of lines in the file.
Use List<String>
List<String> lines = new ArrayList<>(); // Growing array.
try (BufferedReader br = new BufferedReader(new FileReader("src/ioc.txt"))) {
for(;;) {
String line = br.readLine();
if (line == null) {
break;
}
lines.add(line);
}
} catch (IOException e) {
System.out.print(e.getMessage());
} // Closes automatically.
// If lines wanted as array:
String[] input = lines.toArray(new String[lines.size()]);
Using a dynamically growing ArrayList is the normal way to deal with such problem.
P.S.
FileReader reads in the current platform encoding, so it suits a local file that was created locally.
You could do a null check in your first for-loop like:
public static void main(String[] args) throws IOException {
Scanner cin = new Scanner(System.in);
System.out.print("Max number of items: ");
int max = cin.nextInt();
BufferedReader br = new BufferedReader(new FileReader("src/ioc.txt"));
List<String> input = new ArrayList<>();
String nextString;
int i;
for (i = 0; i < max && ((nextString = br.readLine()) != null); i++) {
input.add(nextString);
}
for (int j = 0; j < i; j++) {
System.out.println((j + 1) + " " + input.get(j));
}
}
Try :
for(int i=0; i<max; i++){ //to do:check for empty record
String line = br.readLine(); // read once; calling readLine() twice would skip every other line
if(line != null)
input[i] = line;
else
break;
}
int i=0;
for(; i<max; i++){ //to do:check for empty record
String line=br.readLine();
if(line==null){
break;
}
input[i] = line;
}
//i will contain the count of lines read. indexes 0...(i-1) represent the data.