Reading a text file using BufferedReader.readLine() is too slow - java

I am trying to read a text file which contains about 1000 very long lines. The entire file is about 1.4 MB.
I am using BufferedReader's readLine method to read the file. What happens is that it takes 8-10 seconds to print the output to the console. I tried the same thing using PHP's fgets, and it prints all the same lines in the blink of an eye! How is that possible?
Below is the code I am using:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.logging.Level;
import java.util.logging.Logger;
public class ClickLogDataImporter {

    public static void main(String[] args) {
        try {
            new ClickLogDataImporter().getFileData();
        } catch (Exception ex) {
            Logger.getLogger(ClickLogDataImporter.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    public void getFileData() throws FileNotFoundException, IOException {
        String path = "/home/shantanu/Documents";
        BufferedReader br = new BufferedReader(new InputStreamReader(
                new FileInputStream(path + "/sample.txt")));
        String line = "";
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
    }
}
PHP code
<?php
$fileName = "/home/shantanu/Documents/sample.txt";
$file = fopen($fileName, 'r');
while (($line = fgets($file)) != false) {
    echo $line."\n";
}
?>
Please enlighten me about this issue.

I'm not sure, but I think PHP just dumps the file with the method you used, while Java reads the file and splits it into lines, which means checking every character for a line break; the two processes don't seem to be the same at all. PHP's file_get_contents, by contrast, returns the whole file as a single string.
If you try to print each line one by one from the file with PHP, it should be slower.

8 seconds for that code sounds much too long to me. I suspect something else is going on, to be honest. Are you sure it's not console output which is taking a long time?
I suggest you time it (e.g. with System.nanoTime), writing out the total time at the end, but run it with the console minimized. I suspect you'll find it's fast enough then.
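For example, a minimal timing sketch along these lines (it reuses the path from the question and assumes Java 7+ for try-with-resources) separates read time from console time:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadTimer {
    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        try (BufferedReader br = new BufferedReader(
                new FileReader("/home/shantanu/Documents/sample.txt"))) {
            while (br.readLine() != null) {
                // intentionally no println, so console speed can't skew the result
            }
        }
        System.out.println("Reading took "
                + (System.nanoTime() - start) / 1000000 + " ms");
    }
}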

Isn't it just the console output that is slow? Now that you know that your file is read correctly, try commenting out the line System.out.println(line);.

file_get_contents loads the entire file contents into a String; with your Java code you are reading and printing line by line.
If you are testing inside an IDE like Eclipse, the console output can be quite slow.
If you want the exact behavior of file_get_contents, you can use this dirty code:
File f = new File(path, "sample.txt");
// size the buffer to the file length, capped at Integer.MAX_VALUE
ByteArrayOutputStream bos = new ByteArrayOutputStream(
        (int) Math.min(Integer.MAX_VALUE, f.length()));
FileInputStream fis = new FileInputStream(f);
byte[] buf = new byte[1024 * 8];
int size;
while ((size = fis.read(buf)) > 0) {
    bos.write(buf, 0, size);
}
fis.close();
bos.close();
System.out.println(new String(bos.toByteArray()));
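For what it's worth, on Java 7 or later the same thing is a one-liner with java.nio.file.Files (a sketch; it reuses the path variable from above):
// reads the whole file into memory in one go, like PHP's file_get_contents
byte[] all = java.nio.file.Files.readAllBytes(
        java.nio.file.Paths.get(path, "sample.txt"));
System.out.println(new String(all));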

Well, if you use readLine, it will perform a separate read for each of the 1000 lines. Try using the read method with a very big buffer, say 28000 characters or so: it will then read the 1.4 MB file in roughly 50-60 chunks, far fewer than 1000. If you use a small buffer of 1000, it's going to take around 1400 reads, which is even worse than the 1000 readLine calls. Also, when printing the results, use print instead of println, since what you get back is not exactly lines but an array of characters.
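A sketch of that suggestion using BufferedReader.read(char[]) (the path is the one from the question; the buffer sizes are just the example values above):
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class BigBufferRead {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(
                new FileReader("/home/shantanu/Documents/sample.txt"), 28000);
        char[] buf = new char[28000];
        int n;
        while ((n = br.read(buf)) > 0) {
            // print, not println: the chunks are arbitrary runs of characters, not lines
            System.out.print(new String(buf, 0, n));
        }
        br.close();
    }
}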

Readers are usually slower; you should try stream-based reading, which is fast. Also make sure the file-opening step isn't what is taking the time: if you open the file and create the stream objects first, and only then start measuring, you can figure out whether the cost is in opening the file or in reading it. And make sure the system I/O load is not high at the time of this operation, otherwise your measurement will be skewed.
BufferedInputStream reader = new BufferedInputStream(
        new FileInputStream("/home/shantanu/Documents/sample.txt"));
byte[] buf = new byte[1024];
int n;
while ((n = reader.read(buf)) > 0) {
    // convert only the bytes actually read, or stale buffer contents get printed
    System.out.print(new String(buf, 0, n));
}
reader.close();

Related

Limit BufferedReader to prevent DoS attack

I need help with the below code. I need to review it and fix the security issues within it. The issue that I see is that the BufferedReader should read in chunks, which would help prevent a DoS attack. The way the code is written now, it will read a line of unbounded length. I'm not sure of the best way to limit the BufferedReader. Any help would be appreciated.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class example {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // Read the filename from the command line argument
        String filename = args[0];
        BufferedReader inputStream = null;
        String fileLine;
        try {
            inputStream = new BufferedReader(new FileReader(filename));
            System.out.println("Email Addresses:");
            // Read one line using BufferedReader
            while ((fileLine = inputStream.readLine()) != null) {
                System.out.println(fileLine);
            }
        } catch (IOException io) {
            System.out.println("File IO exception" + io.getMessage());
        } finally {
            // Need another catch for closing
            // the streams
            try {
                if (inputStream != null) {
                    inputStream.close();
                }
            } catch (IOException io) {
                System.out.println("Issue closing the Files" + io.getMessage());
            }
        }
    }
}
The requirement behind the warning about BufferedReader.readLine is to impose a reasonable bound on the maximum amount of memory that an adversary can cause to be allocated at a time. In this case the important quantity is the memory held by the String's characters, plus roughly the same again in the buffer used to create it. If the adversary can do this multiple times at once, then that will also need to be limited. Typically, if the resource can be stalled but not closed (for instance, over a network file system), the buffer can be kept in memory indefinitely.
The easy, general solution is to implement an InputStream that limits the total number of bytes that can be read through it. The same thing could also be implemented at the Reader level, limiting the number of characters. The dirty way around it is to ignore BufferedReader, read char arrays yourself, and combine them into a StringBuilder.
Presumably various third-party libraries include code that covers those approaches.
(Also: do use try-with-resources. FileReader picks up whatever character encoding has been left as the default, which is probably wrong. Adding throws IOException to main makes the code simpler.)
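A minimal sketch of the first approach: a FilterInputStream that caps the total number of bytes it will hand out (the class name and the cap shown in the usage line are made up for illustration):
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Fails fast once more than maxBytes have been read through this stream.
public class BoundedInputStream extends FilterInputStream {

    private long remaining;

    public BoundedInputStream(InputStream in, long maxBytes) {
        super(in);
        this.remaining = maxBytes;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            throw new IOException("Input exceeds configured limit");
        }
        int b = super.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) {
            throw new IOException("Input exceeds configured limit");
        }
        // never ask the underlying stream for more than the remaining budget
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }
}
You would then build the reader as new BufferedReader(new InputStreamReader(new BoundedInputStream(new FileInputStream(filename), 1024 * 1024))), so no single readLine call can consume more than the cap allows.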

Separating Get request Response body in java Socket programming

I'm trying to write a curl-like program in Java, using only Java socket programming (and not Apache HttpClient or any other APIs).
I want to have the option of showing either the whole response to my GET request or only its body. I currently came up with the following code:
BufferedReader br = new BufferedReader(new InputStreamReader(s.getInputStream()));
String t;
while ((t = br.readLine()) != null) {
    if (t.isEmpty() && !parameters.isVerbose()) {
        StringBuilder responseData = new StringBuilder();
        while ((t = br.readLine()) != null) {
            responseData.append(t).append("\r\n");
        }
        System.out.println(responseData.toString());
        parameters.verbose = false;
        break;
    } else if (parameters.isVerbose()) // handle output
        System.out.println(t);
}
br.close();
When the verbose option is on, it works quickly and shows the whole response in less than a second, but when I want just the body of the message it takes too much time (approx. 10 seconds) to print it.
Does anyone know how it can be processed in a faster way?
Thank you.
I'm going to assume that by slow you mean it starts displaying something almost immediately but keeps on printing lines for a long time. Writing to the console takes time, and you're printing each line individually, while in the other code path you first store the entire response in memory and then flush it to the console.
If the verbose response is small enough to fit in memory, you should do the same; otherwise you can decide on an arbitrary number of lines to print in batches (i.e. you accumulate n lines in memory, flush them to the console, clear the StringBuilder, and repeat).
The most elegant way to implement my suggestion is to use a PrintStream wrapping a BufferedOutputStream, itself wrapping System.out. All my comments and advice are condensed into the following snippet:
private static final int BUFFER_SIZE = 4096;

public static void printResponse(Socket socket, Parameters parameters) throws IOException {
    try (BufferedReader br = new BufferedReader(new InputStreamReader(socket.getInputStream()));
         PrintStream printStream = new PrintStream(new BufferedOutputStream(System.out, BUFFER_SIZE))) {
        // there is no functional difference in your code between the verbose and non-verbose code paths
        // (they have the same output). That's a bug, but I'm not fixing it in my snippet as I don't know
        // what you intended to do.
        br.lines().forEach(line -> printStream.append(line).append("\r\n"));
    }
}
If it uses any language construct you don't know about, feel free to ask further questions.
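For completeness, the manual batching variant described earlier might look like this (a sketch that reuses the br from your code; 100 lines per flush is an arbitrary choice):
StringBuilder batch = new StringBuilder();
int linesInBatch = 0;
String t;
while ((t = br.readLine()) != null) {
    batch.append(t).append("\r\n");
    if (++linesInBatch == 100) {
        System.out.print(batch); // one console write per 100 lines
        batch.setLength(0);      // clear the StringBuilder and repeat
        linesInBatch = 0;
    }
}
System.out.print(batch); // whatever is left in the last partial batch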

copying XML file from URL returns incomplete file

I am writing a small program to retrieve a large number of XML files. The program sort of works, but no matter which solution from Stack Overflow I use, every XML file I save locally is missing the end of the file. By "the end of the file" I mean approximately 5-10 lines of XML code. The files are of different lengths (~500-2500 lines) and the total length doesn't seem to have an effect on the size of the missing bit. Currently the code looks like this:
package plos;

import static org.apache.commons.io.FileUtils.copyURLToFile;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.logging.Level;
import java.util.logging.Logger;

public class PlosXMLfetcher {

    public PlosXMLfetcher(URL u, File f) {
        try {
            org.apache.commons.io.FileUtils.copyURLToFile(u, f);
        } catch (IOException ex) {
            Logger.getLogger(PlosXMLfetcher.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}
I have tried using BufferedInputStream and ReadableByteChannel as well. I have tried running it in threads, and I have tried using read and readLine. Every solution gives me an incomplete XML file in return.
In some of my tests (I can't remember which, sorry), I got a socket connection reset error - but the above code executes without error messages.
I have manually downloaded some of the XML files as well, to check if they are actually complete on the remote server - which they are.
I'm guessing that somewhere along the way a BufferedWriter or BufferedOutputStream has not had flush() called on it.
Why not write your own copy function to rule out FileUtils.copyURLToFile(u, f)?
public void copyURLToFile(URL u, File f) throws IOException {
    InputStream in = u.openStream();
    try {
        FileOutputStream out = new FileOutputStream(f);
        try {
            byte[] buffer = new byte[1024];
            int count;
            while ((count = in.read(buffer)) > 0) {
                out.write(buffer, 0, count);
            }
            out.flush();
        } finally {
            out.close();
        }
    } finally {
        in.close();
    }
}

Java - Flushing the OutputStream of a process doesn't send the data immediately if it's too small

I'm firing up an external process from Java and grabbing its stdin, stdout and stderr via process.getInputStream() etc. My issue is: when I want to write data to my output stream (the proc's stdin) it's not getting sent until I actually call close() on the stream. I am explicitly calling flush().
I did some experimenting and noticed that if I increased the number of bytes I was sending, it would eventually go through. The magic number, on my system, is 4058 bytes.
To test, I'm sending the data over to a Perl script, which reads like this:
#!/usr/bin/perl
use strict;
use warnings;

print "Perl starting";
while (<STDIN>) {
    print "Perl here, printing this: $_";
}
Now, here's the Java code:
import java.io.InputStream;
import java.io.IOException;
import java.io.OutputStream;

public class StreamsExecTest {

    private static String readInputStream(InputStream is) throws IOException {
        int guessSize = is.available();
        byte[] bytes = new byte[guessSize];
        is.read(bytes); // This call has side effect of filling the array
        String output = new String(bytes);
        return output;
    }

    public static void main(String[] args) {
        System.out.println("Starting up streams test!");
        ProcessBuilder pb;
        pb = new ProcessBuilder("./test.pl");
        // Run the proc and grab the streams
        try {
            Process p = pb.start();
            InputStream pStdOut = p.getInputStream();
            InputStream pStdErr = p.getErrorStream();
            OutputStream pStdIn = p.getOutputStream();
            int counter = 0;
            while (true) {
                String output = readInputStream(pStdOut);
                if (!output.equals("")) {
                    System.out.println("<OUTPUT> " + output);
                }
                String errors = readInputStream(pStdErr);
                if (!errors.equals("")) {
                    System.out.println("<ERRORS> " + errors);
                }
                if (counter == 50) {
                    // Write to the stdin of the execed proc. The \n should
                    // in turn trigger it to treat it as a line to process
                    System.out.println("About to send text to proc's stdin");
                    String message = "hello\n";
                    byte[] pInBytes = message.getBytes();
                    pStdIn.write(pInBytes);
                    pStdIn.flush();
                    System.out.println("Sent " + pInBytes.length + " bytes.");
                }
                if (counter == 100) {
                    break;
                }
                Thread.sleep(100);
                counter++;
            }
            // Cleanup
            pStdOut.close();
            pStdErr.close();
            pStdIn.close();
            p.destroy();
        } catch (Exception e) {
            // Catch everything
            System.out.println("Exception!");
            e.printStackTrace();
            System.exit(1);
        }
    }
}
So when I run this, I get effectively nothing back. If, immediately after calling flush(), I call close() on pStdIn, it works as expected. That isn't what I want, though; I want to be able to hold the stream open continually and write to it whenever it pleases me. As mentioned before, if message is 4058 bytes or larger, this works without the close().
Is the operating system (64-bit Linux with a 64-bit Sun JDK, for what it's worth) buffering the data before sending it? I could see Java having no real control over that: once the JVM makes the system call to write to the pipe, all it can do is wait. There's another puzzle, though:
The Perl script prints a line before going into the while loop. Since I check for any input from Perl's stdout on every iteration of my Java loop, I would expect to see it on the first run through the loop, then see the attempt at sending data from Java to Perl, and then nothing. But I actually only see the initial message from Perl (with that <OUTPUT> prefix) at the moment the write to the output stream happens. Is something blocking that I'm not aware of?
Any help greatly appreciated!
You haven't told Perl to use unbuffered output. Look in perlvar and search for $| for different ways to set unbuffered mode. In essence, one of:
HANDLE->autoflush( EXPR )
$OUTPUT_AUTOFLUSH
$|
Perl may be buffering it before it starts printing anything.
is.read(bytes); // This call has side effect of filling the array
No it doesn't. It has the effect of reading between 1 and bytes.length bytes into the array, and the count it actually read is the return value you're ignoring. See the Javadoc.
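If you actually want the array filled before building the String, you have to loop over read, using its return value (a sketch reusing the is and bytes from your method):
// keep reading until the array is full or the stream ends
int filled = 0;
while (filled < bytes.length) {
    int n = is.read(bytes, filled, bytes.length - filled);
    if (n == -1) {
        break; // end of stream before the array was full
    }
    filled += n;
}
String output = new String(bytes, 0, filled);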
I don't see any obvious buffering in your code, so it may be on the Perl side. What happens if you put a newline \n at the end of your print statement?
Note also that you can't, in general, read the child's stdout and stderr on the main thread like that. You'll be subject to deadlock - e.g. if the child process prints a lot to stderr while the parent is reading stdout, the stderr pipe buffer will fill up and the child will block, but the parent will stay blocked forever trying to read stdout.
You need separate threads to read stderr and stdout (also separate from the main thread, which here is used to pump input to the process).
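A common shape for that is a small "gobbler" thread per stream (a sketch; the class name is made up and error handling is kept minimal):
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;

// Drains one stream on its own thread so the child never blocks on a full pipe.
class StreamGobbler extends Thread {
    private final InputStream in;
    private final String tag;

    StreamGobbler(InputStream in, String tag) {
        this.in = in;
        this.tag = tag;
    }

    @Override
    public void run() {
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(tag + " " + line);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
With the Process from your code, you'd start one right after pb.start() - new StreamGobbler(p.getInputStream(), "<OUTPUT>").start(); and the same for p.getErrorStream() - leaving the main thread free to write to pStdIn.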

How can I print .exe printf() messages from java program

I have an application that prints messages from Test.exe to the console. My Java program creates a process by executing this Test.exe.
The application prints messages by reading from the input stream of that process.
The problem that I am facing is this. I have two scenarios:
1) When I double-click Test.exe, the messages ("Printing : %d") are printed one at a time, every half second.
2) But when I run my Java application, all the messages are printed in one go (not one at a time) just before Test.exe terminates. If the .exe has a huge amount to print, it will print those messages (I think whenever the buffer becomes full) and flushing will be done.
But how can I print the messages as in the 1st case?
Help from anyone would be appreciated. :)
Here is the code for this Test.exe.
#include <stdio.h>
#include <windows.h>

int main(void)
{
    int i = 0;
    while (1)
    {
        Sleep(500);
        printf("\nPrinting : %d", i);
        i++;
        if (i == 10)
        //if (i == 100)
        {
            return 0;
        }
    }
}
And my Java application is below:
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

public class MainClass {

    public static void main(String[] args) {
        String str = "G:\\Charan\\Test\\Debug\\Test.exe";
        try {
            Process testProcess = Runtime.getRuntime().exec(str);
            InputStream inputStream = new BufferedInputStream(
                    testProcess.getInputStream());
            int read = 0;
            byte[] bytes = new byte[1000];
            String text;
            while (read >= 0) {
                if (inputStream.available() > 0) {
                    read = inputStream.read(bytes);
                    if (read > 0) {
                        text = new String(bytes, 0, read);
                        System.out.println(text);
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Is it possible in the reverse direction? If I enter some text on the console, can Java read it and pass that String to the .exe (testProcess)? How can the .exe scan something from the Java program?
Could anyone help me?
Given that you're trying to print stdout from that process line by line, I would create a BufferedReader object using the process's input stream and use the readLine() method on it. You can get a BufferedReader object using the following chain of constructors:
BufferedReader testProcessReader = new BufferedReader(new InputStreamReader(testProcess.getInputStream()));
And to read line by line:
String line;
while ((line = testProcessReader.readLine()) != null) {
    System.out.println(line);
}
The assumption here is that Test.exe is flushing its output, which is required for the Java side to see anything. You can flush the output from C by calling fflush(stdout) after every call to printf().
If you don't flush, the data only lives in a buffer. When considering performance, it's a trade-off: how often you want the data written vs. how many write/flush operations you want to save. If performance is critical, you can consider a more efficient inter-process communication mechanism to pass data between the processes instead of stdout. Since you are on Windows, the first step might be to take a look at the Microsoft IPC help page.
It seems to have something to do with not flushing. I guess it's on both sides: the C library you use seems to flush output automatically only when it is writing to a terminal. Flush manually after calling printf.
On the Java side, try reading from a non-buffered stream.
