I am trying to submit a solution for sorting an array (using some online compiler that has compile-time constraints). Here is my code snippet:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;

class TSORT {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        PrintWriter bw = new PrintWriter(System.out, false);
        int t = Integer.parseInt(br.readLine());
        int[] list = new int[1000001]; // counting sort: one bucket per possible value
        for (int i = 0; i < t; i++) {
            int n = Integer.parseInt(br.readLine());
            list[n]++;
        }
        for (int i = 0; i < 1000001; i++) {
            if (list[i] > 0) {
                for (int j = 0; j < list[i]; j++) {
                    bw.println(i); // if I use bw.flush() here, the time limit gets exceeded
                }
            }
        }
        bw.flush();
    }
}
This code gets submitted successfully, but if I enable automatic flushing (new PrintWriter(System.out, true);), the compiler shows TIME LIMIT EXCEEDED.
My question is: how should I use flush() to get the best compile time?
You're submitting the code, and it is afterwards executed somewhere; that's why you get a Time Limit Exceeded result.
The reason you don't get it when you disable automatic flushing becomes simple once you look at what flush actually means: flush blocks your code and waits until everything that was written to the stream has gone through the stream to its target (in this case System.out).
If you have automatic flushing turned on, the writer flushes after every println call. So after every println your application blocks and waits for the Java VM or the host system to forward your string to System.out.
If you have automatic flushing turned off, the strings passed to println are cached in memory. Depending on the implementation of the stream, it may still flush data from memory in the background, but it doesn't have to. At the end of your application you write all your strings at once (via flush). This is faster because of fewer context switches and because it doesn't block your application from running the loop.
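A minimal sketch of the two constructions (the 100,000 iteration count is arbitrary):

import java.io.PrintWriter;

public class FlushDemo {
    public static void main(String[] args) {
        // autoFlush = true would flush after every println/printf/format call:
        // PrintWriter slow = new PrintWriter(System.out, true);

        // autoFlush = false lets the internal BufferedWriter batch the output:
        PrintWriter fast = new PrintWriter(System.out, false);
        for (int i = 0; i < 100_000; i++) {
            fast.println(i);
        }
        fast.flush(); // one flush at the end instead of one per line
    }
}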
Related
I am writing a lot of data to standard output, and I noticed that the program's execution time varies depending on the output console: the program is slower in the NetBeans console than in the Windows cmd window, for example.
So I think writing to stdout fills a buffer, and writing becomes blocking when this buffer is full (because the console doesn't consume the output fast enough).
I reproduced this behavior with a Java program.
Here is a program that outputs data:
public class Output {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < 10_000; i++) {
            System.out.println("Awesome Output ");
        }
        System.out.println("Total time : " + (System.currentTimeMillis() - start));
    }
}
And here is a program that consumes the output of the program above:
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

public class StdoutBlocking {
    public static void main(String[] args) throws IOException, InterruptedException {
        ProcessBuilder processBuilder = new ProcessBuilder("java", "stackoverflowDemo.StdoutBlocking.Output");
        processBuilder.directory(new File("C:/Users/me/Documents/NetBeansProjects/tmp/build/classes"));
        Process process = processBuilder.start();
        BufferedReader reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
        String line = null;
        String lastLine = null;
        while ((line = reader.readLine()) != null) {
            Thread.sleep(10);
            lastLine = line;
        }
        System.out.println("done : " + lastLine);
    }
}
I launch the second program, which starts Output and consumes its output.
Without the sleep, the Output program is really quick; with it, it's very slow.
So what is the size of this buffer?
This is just out of curiosity; I'm not trying to achieve anything in particular.
The default buffer size for System.out and System.err is 8192 bytes. You can see this by looking at the source code of methods like println(String): they use a BufferedWriter with the default buffer size.
"Without the sleep the program is really quick; with it, it's very slow": this has no relation to the buffer size. Your program sleeps at least 10 milliseconds for each line of output (10,000 lines), so with Thread.sleep(10) it does nothing for at least 100 seconds. That is why it is slow with Thread.sleep().
Regarding performance of System.out.println() check this question: Why is System.out.println so slow?
The underlying OS operation (displaying chars in a console window) is slow because:
1. The bytes have to be sent to the console application (should be quite fast).
2. Each char has to be rendered using (usually) a TrueType font (that's pretty slow; switching off anti-aliasing could improve performance, by the way).
3. The displayed area may have to be scrolled in order to append a new line to the visible area (best case: a bit-block transfer operation; worst case: re-rendering of the complete text area).
You can also find this in the Javadoc for the Process class:
Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the subprocess may cause the subprocess to block, and even deadlock.
For UNIX you could check this link: https://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer
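To illustrate the Javadoc's point, here is a sketch (the "java", "Output" command is a stand-in for whatever subprocess you launch) that drains the subprocess's stdout on a dedicated thread, so the pipe buffer never fills up and blocks the child:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class DrainDemo {
    public static void main(String[] args) throws IOException, InterruptedException {
        Process process = new ProcessBuilder("java", "Output").start();

        // Consume stdout promptly on its own thread; if nobody reads it,
        // the child blocks as soon as the pipe buffer is full.
        Thread drainer = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                while (r.readLine() != null) {
                    // process each line here
                }
            } catch (IOException ignored) {
            }
        });
        drainer.start();
        process.waitFor();
        drainer.join();
    }
}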
I need to read a file one character at a time and I'm using the read() method from BufferedReader. *
I found that read() is about 10x slower than readLine(). Is this expected? Or am I doing something wrong?
Here's a benchmark with Java 7. The input test file has about 5 million lines and 254 million characters (~242 MB) **:
The read() method takes about 7000 ms to read all the characters:
@Test
public void testRead() throws IOException, UnindexableFastaFileException {
    BufferedReader fa = new BufferedReader(new FileReader(new File("chr1.fa")));
    long t0 = System.currentTimeMillis();
    int c;
    while ((c = fa.read()) != -1) {
        //
    }
    long t1 = System.currentTimeMillis();
    System.err.println(t1 - t0); // ~ 7000 ms
}
The readLine() method takes only ~700 ms:
@Test
public void testReadLine() throws IOException {
    BufferedReader fa = new BufferedReader(new FileReader(new File("chr1.fa")));
    String line;
    long t0 = System.currentTimeMillis();
    while ((line = fa.readLine()) != null) {
        //
    }
    long t1 = System.currentTimeMillis();
    System.err.println(t1 - t0); // ~ 700 ms
}
* Practical purpose: I need to know the length of each line, including the newline characters (\n or \r\n) AND the line length after stripping them. I also need to know whether a line starts with the > character. For a given file this is done only once, at the start of the program. Since EOL chars are not returned by BufferedReader.readLine(), I'm resorting to the read() method. If there are better ways of doing this, please say so.
** The gzipped file is here http://hgdownload.cse.ucsc.edu/goldenpath/hg19/chromosomes/chr1.fa.gz. For those who may be wondering, I'm writing a class to index fasta files.
The important thing when analyzing performance is to have a valid benchmark before you start. So let's start with a simple JMH benchmark that shows what our expected performance after warmup would be.
One thing we have to consider is that modern operating systems like to cache file data that is accessed regularly, so we need some way to clear the caches between tests. On Windows there's a small utility that does just this; on Linux you can do it by writing to a pseudo file.
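On Linux the pseudo file in question is /proc/sys/vm/drop_caches; for example (requires root):
sync; echo 3 | sudo tee /proc/sys/vm/drop_caches # drop page cache, dentries and inodes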
The code then looks as follows:
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

@BenchmarkMode(Mode.AverageTime)
@Fork(1)
public class IoPerformanceBenchmark {
    private static final String FILE_PATH = "test.fa";

    @Benchmark
    public int readTest() throws IOException, InterruptedException {
        clearFileCaches();
        int result = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(FILE_PATH))) {
            int value;
            while ((value = reader.read()) != -1) {
                result += value;
            }
        }
        return result;
    }

    @Benchmark
    public int readLineTest() throws IOException, InterruptedException {
        clearFileCaches();
        int result = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(FILE_PATH))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result += line.chars().sum();
            }
        }
        return result;
    }

    private void clearFileCaches() throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("EmptyStandbyList.exe", "standbylist");
        pb.inheritIO();
        pb.start().waitFor();
    }
}
and if we run it with
chcp 65001 # set codepage to utf-8
mvn clean install; java "-Dfile.encoding=UTF-8" -server -jar .\target\benchmarks.jar
we get the following results (about 2 seconds are needed to clear the caches for me, and I'm running this on an HDD, which is why it's a good deal slower than for you):
Benchmark Mode Cnt Score Error Units
IoPerformanceBenchmark.readLineTest avgt 20 3.749 ± 0.039 s/op
IoPerformanceBenchmark.readTest avgt 20 3.745 ± 0.023 s/op
Surprise! As expected there's no performance difference here at all after the JVM has settled into a stable mode. But there is one outlier in the readTest method:
# Warmup Iteration 1: 6.186 s/op
# Warmup Iteration 2: 3.744 s/op
which is exactly the problem you're seeing. The most likely reason I can think of is that OSR isn't doing a good job here, or that the JIT kicks in too late to make a difference on the first iteration.
Depending on your use case this might be a big problem or negligible (if you're reading a thousand files it won't matter; if you're only reading one, this is a problem).
Solving such a problem is not easy and there are no general solutions, although there are ways to handle it. One easy test to see whether we're on the right track is to run the code with the -Xcomp option, which forces HotSpot to compile every method on its first invocation. And indeed, doing so causes the large delay at the first invocation to disappear:
# Warmup Iteration 1: 3.965 s/op
# Warmup Iteration 2: 3.753 s/op
Possible solution
Now that we have a good idea what the actual problem is (my guess is still all those locks neither being coalesced nor using the efficient biased-locking implementation), the solution is rather straightforward: reduce the number of function calls. (Yes, we could have arrived at this solution without everything above, but it's always nice to have a good grip on the problem, and there might have been a solution that didn't involve changing much code.)
The following code runs consistently faster than either of the other two. You can play with the array size, but it's surprisingly unimportant (presumably because, contrary to the other methods, read(char[]) does not have to acquire a lock per call, so the cost per call is lower to begin with).
private static final int BUFFER_SIZE = 256;
private char[] arr = new char[BUFFER_SIZE];

@Benchmark
public int readArrayTest() throws IOException, InterruptedException {
    clearFileCaches();
    int result = 0;
    try (BufferedReader reader = new BufferedReader(new FileReader(FILE_PATH))) {
        int charsRead;
        while ((charsRead = reader.read(arr)) != -1) {
            for (int i = 0; i < charsRead; i++) {
                result += arr[i];
            }
        }
    }
    return result;
}
This is most likely good enough performance-wise, but if you wanted to improve performance even further, a file mapping might help (I wouldn't count on too large an improvement in a case such as this, but if you know that your text is always ASCII, you could make some further optimizations).
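For the curious, a sketch of what such a file mapping could look like (test.fa is the same placeholder file as in the benchmarks; this assumes pure ASCII input and a file small enough for a single mapping, i.e. under 2 GB):

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;

public class MappedScan {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("test.fa"))) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (map.hasRemaining()) {
                sum += map.get(); // one byte == one char only holds for ASCII input
            }
            System.out.println("checksum: " + sum);
        }
    }
}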
So this is the practical answer to my own question: don't use BufferedReader.read(); use FileChannel instead. (Obviously I'm not answering the WHY from the title.) Here's the quick and dirty benchmark; hopefully others will find it useful:
@Test
public void testFileChannel() throws IOException {
    FileChannel fileChannel = FileChannel.open(Paths.get("chr1.fa"));
    long n = 0;
    int noOfBytesRead = 0;
    long t0 = System.nanoTime();
    while (noOfBytesRead != -1) {
        ByteBuffer buffer = ByteBuffer.allocate(10000);
        noOfBytesRead = fileChannel.read(buffer);
        buffer.flip();
        while (buffer.hasRemaining()) {
            char x = (char) buffer.get();
            n++;
        }
    }
    long t1 = System.nanoTime();
    System.err.println((float) (t1 - t0) / 1e6); // ~ 250 ms
    System.err.println("nchars: " + n); // 254235640 chars read
}
With ~250 ms to read the whole file char by char, this strategy is considerably faster than BufferedReader.readLine() (~700 ms), let alone read(). Adding if statements in the loop to check for x == '\n' and x == '>' makes little difference. Also, adding a StringBuilder to reconstruct lines doesn't affect the timing much. So this is plenty good for me (at least for now).
Thanks to @Marco13 for mentioning FileChannel.
Java JIT optimizes away empty loop bodies, so your loops actually look like this:
while((c = fa.read()) != -1);
and
while((line = fa.readLine()) != null);
I suggest you read up on benchmarking here and the optimization of the loops here.
As to why the time taken differs:
Reason one (this only applies if the bodies of the loops contain code): in the first example you're doing one operation per character; in the second, one per line. This adds up the more lines/characters you have.
while ((c = fa.read()) != -1) {
    // One operation per character.
}

while ((line = fa.readLine()) != null) {
    // One operation per line.
}
Reason two: in the class BufferedReader, the method readLine() doesn't use read() behind the scenes; it uses its own code. readLine() performs fewer operations per character to read a line than it would take to read the same line with the read() method, which is why readLine() is faster at reading an entire file.
Reason three: it takes far more iterations to read a file character by character than line by line (unless each character is on its own line); read() is called many more times than readLine().
Thanks @Voo for the correction. What I mention below is correct from the FileReader#read() vs. BufferedReader#readLine() point of view, but not from the BufferedReader#read() vs. BufferedReader#readLine() point of view, so I have struck out the answer.
Using the read() method on a BufferedReader is not a good idea. It wouldn't cause you any harm, but it certainly defeats the purpose of the class.
The whole purpose in life of a BufferedReader is to reduce I/O by buffering the content; you can read about it in the Java tutorials. You may also notice that read() in BufferedReader is actually inherited from Reader, while readLine() is BufferedReader's own method.
If you want to use the read() method, I would say you'd better use a FileReader, which is meant for that purpose; you can read about it in the Java tutorials.
So, I think the answer to your question is very simple (without going into benchmarking and all those explanations):
Each read() call is handled by the underlying OS and triggers disk access, network activity, or some other operation that is relatively expensive.
When you use readLine() you save all these overheads, so readLine() will always be faster than read(); perhaps not substantially for small data, but faster.
It is not surprising to see this difference if you think about it: one test is iterating over the lines of a text file, while the other is iterating over its characters.
Unless each line contains exactly one character, readLine() is expected to be way faster than the read() method (although, as pointed out by the comments above, it is arguable, since a BufferedReader buffers the input, while the physical file reading might not be the only performance-consuming operation).
If you really want to test the difference between the two, I would suggest a setup where you iterate over each character in both tests. E.g. something like:
void readTest(BufferedReader r) throws IOException {
    int c;
    StringBuilder b = new StringBuilder();
    while ((c = r.read()) != -1)
        b.append((char) c);
}

void readLineTest(BufferedReader r) throws IOException {
    String line;
    StringBuilder b = new StringBuilder();
    while ((line = r.readLine()) != null)
        for (int i = 0; i < line.length(); i++)
            b.append(line.charAt(i));
}
Besides the above, please use a Java performance diagnostic tool to benchmark your code, and read up on how to micro-benchmark Java code.
According to the documentation:
Every read() method call makes an expensive system call.
Every readLine() method call still makes an expensive system call, but for many more bytes at once, so there are far fewer calls.
A similar situation occurs when we issue a database update command for each record we want to update, versus a batch update where we make one call for all the records.
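To make the analogy concrete, a hedged JDBC sketch (the connection, table, and column names are made up for illustration):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchDemo {
    static void markProcessed(Connection conn, int[] ids) throws SQLException {
        try (PreparedStatement ps =
                conn.prepareStatement("UPDATE items SET processed = 1 WHERE id = ?")) {
            for (int id : ids) {
                ps.setInt(1, id);
                ps.addBatch();     // queued locally, like the buffered writer
            }
            ps.executeBatch();     // one round trip for all records, like a single flush()
        }
    }
}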
UPDATE:
Turns out that I was initializing my ObjectInputStream on both my server and client at the same time, and that was what was causing the problem.
In the server side of an application I am making, I never reach one part of the code, which causes my client end to block at that point. Initially I assumed there weren't enough resources to run the program, so I quit all my other applications and tried again. I left it running for an hour, still to no avail.
How can I figure out what's wrong? I can't see where it might be blocking, and it can't be stuck in an infinite loop anywhere before that point.
String[][] coordinates1 = new String[10][10], coordinates2 = new String[10][10];

public void run() {
    // initializes the 10x10 grids for the player
    synchronized (this) {
        // outer loop is for the rows
        for (int loop = 0; loop < 10; loop++) {
            // the inner loop is for the columns
            for (int loop2 = 0; loop2 < 10; loop2++) {
                coordinates1[loop][loop2] = "~ ";
                coordinates2[loop][loop2] = "~ ";
            }
        }
    }
    if (player == 1) {
        // deals with the client assuming it is player1
        // declares Object i/o streams
        ObjectInputStream in = null;
        ObjectOutputStream out = null;
        try {
            // initializes i/o streams
            in = new ObjectInputStream(client.getInputStream());
            out = new ObjectOutputStream(client.getOutputStream());
            // writes the two 10x10 grids to the client
            out.writeObject(coordinates2);
            out.writeObject(coordinates1);
            out.flush();
            ...
}
The part it never reaches is the two out.writeObject(..) lines.
The full code can be found here, along with the client end.
It is not blocked; your program is waiting.
You never get to the writeObject line because the preceding two lines (getInputStream and getOutputStream) will wait for the socket to establish a valid connection. You can verify this by placing two log lines around this:
in = new ObjectInputStream(client.getInputStream());
To fix the problem, I would suggest reviewing the code where you establish the connection (ServerSockets and Sockets).
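Given the UPDATE at the top of the question, here is a sketch of the usual fix (client is the Socket from the question's code): construct the ObjectOutputStream first and flush it, because new ObjectInputStream(...) blocks until it has read the serialization header written by the other side's ObjectOutputStream. If both ends construct their ObjectInputStream first, both block forever.

// On BOTH ends, in this order:
ObjectOutputStream out = new ObjectOutputStream(client.getOutputStream());
out.flush(); // pushes the stream header through the socket
ObjectInputStream in = new ObjectInputStream(client.getInputStream());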
I wrote a Java method to send an instruction to a remote device via the serial port and read a known number of bytes as the answer. The code runs on a Raspberry Pi, using the librxtx-java library. The remote device was verified to send an answer of the expected length.
The code below is the last part of the method, where the Raspberry Pi waits for all the bytes of the answer, for up to a given time t_max.
The code as it is throws an IndexOutOfBoundsException during System.arraycopy. If I wrap the arraycopy call in try...catch and print out the pointer variable in the catch block, there is indeed an index overflow.
However, if I uncomment the line which prints the pointer value, the exception is gone. Even replacing this line with System.out.println("X"); makes the exception disappear, but System.out.print("X"); does not, for example.
I tried changing the variables to volatile, but no luck. How can printing to the terminal change the value of a variable?
long t0 = System.currentTimeMillis();
long t = t0;
byte[] answer = new byte[answerLength];
byte[] readBuffer = new byte[answerLength];
int numBytes = 0;
int answerPointer = 0;
while (t - t0 < t_max) {
    try {
        if (inputStream.available() > 0) {
            numBytes = inputStream.read(readBuffer);
        }
    } catch (Exception e) {
    }
    if (numBytes > 0) {
        // System.out.println("answerPointer=" + answerPointer);
        System.arraycopy(readBuffer, 0, answer, answerPointer, numBytes);
        answerPointer = answerPointer + numBytes;
    }
    if (answerPointer == answerLength) {
        return (answer);
    }
    t = System.currentTimeMillis();
}
Have you tried verifying whether the output stream and input stream are linked in any way? Maybe the input stream is reading from the output stream, and '\n' (newline) is being used as the end-of-stream character. Can you try printing to a PrintStream wrapped around a ByteArrayOutputStream instead of standard out and see whether ps.println("X") causes an exception? If it does, then the standard output and input streams are possibly linked, and that is why System.out.println("X") makes the exception go away.
Also, the volatile keyword is used in the context of threads; it will have no effect in a single-threaded environment.
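A minimal sketch of the ByteArrayOutputStream diagnostic suggested above (answerPointer stands in for the variable from the question):

import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class DiagnosticSink {
    public static void main(String[] args) {
        // The buffer is never consumed by anything else, unlike System.out,
        // so any console-related side effects are taken out of the picture.
        PrintStream ps = new PrintStream(new ByteArrayOutputStream());
        int answerPointer = 42;
        ps.println("answerPointer=" + answerPointer);
    }
}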
If inputStream.available() throws an exception on the second iteration of while (t - t0 < t_max), the variables numBytes and readBuffer keep their old values. Try wrapping all the code in the while (t - t0 < t_max) block in try {} catch {}, and don't hide the exception.
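A sketch of that loop with numBytes reset on every iteration and the exception surfaced (all the variables are the ones from the question's code):

while (t - t0 < t_max) {
    int numBytes = 0; // declared inside the loop: a stale count is never reused
    try {
        if (inputStream.available() > 0) {
            numBytes = inputStream.read(readBuffer);
        }
        if (numBytes > 0) {
            System.arraycopy(readBuffer, 0, answer, answerPointer, numBytes);
            answerPointer += numBytes;
        }
        if (answerPointer == answerLength) {
            return answer;
        }
    } catch (IOException e) {
        e.printStackTrace(); // don't hide the exception
    }
    t = System.currentTimeMillis();
}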
I have to write an external sorting program in Java which, given a file A containing an arbitrary number of integers, sorts them using only a file B of the same size as temporary storage. For the first stage I am reading blocks of the file into RAM, sorting them with the built-in Java sort, and writing them back to file B; however, this is proving to be very slow. I would like to know if there are any glaring inefficiencies in my code. Note that input1 and output are RandomAccessFile objects, and BUFFER_SIZE is the block size, decided at runtime by the amount of free memory.
public void SortBlocks() throws IOException {
    int startTime = (int) System.currentTimeMillis();
    input1.seek(0);
    output.seek(0);
    DataInputStream in = new DataInputStream(new BufferedInputStream(new FileInputStream(input1.getFD()), 2048));
    DataOutputStream out = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(output.getFD()), 2048));
    int[] buffer = new int[BUFFER_SIZE];
    int j = 0;
    for (int i = 0; i < input1.length(); i += 4) {
        buffer[j] = in.readInt();
        j++;
        if (j == BUFFER_SIZE) {
            writeInts(buffer, out, j);
            j = 0;
        }
    }
    writeInts(buffer, out, j);
    out.flush();
    SwitchIO();
    int endTime = (int) System.currentTimeMillis();
    System.out.println("sorted blocks in " + Integer.toString(endTime - startTime));
}

private static void writeInts(int[] ints, DataOutputStream out, int size) throws IOException {
    Arrays.sort(ints, 0, size);
    for (int i = 0; i < size; i++) {
        out.writeInt(ints[i]);
    }
}
Thanks in advance for your feedback.
The most glaring inefficiency is the use of input1.length(), which is a relatively expensive operation, and you are calling it for every int value.
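For example, hoisting it out of the loop (a sketch against the code above):

long length = input1.length(); // one call instead of one per int read
for (long i = 0; i < length; i += 4) {
    buffer[j] = in.readInt();
    j++;
    if (j == BUFFER_SIZE) {
        writeInts(buffer, out, j);
        j = 0;
    }
}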
I can't see why you decrease the buffer size when the default (8192) would be more efficient.
If you are reading files, I would use a ByteBuffer viewed as an IntBuffer. A bottleneck is likely to be the way you read and write data; using int values in native byte order would improve the translation performance (rather than the default, which is big-endian).
If you access the file as a memory-mapped file, you may also be able to gracefully handle files larger than the available memory.
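A sketch combining both suggestions: map the file and view it as a native-order IntBuffer (input.bin is a placeholder name, and this assumes the file holds a whole number of native-order ints and fits in a single mapping, i.e. under 2 GB):

import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MappedSortSketch {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("input.bin"),
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_WRITE, 0, ch.size());
            IntBuffer ints = map.order(ByteOrder.nativeOrder()).asIntBuffer();

            int[] block = new int[ints.remaining()];
            ints.get(block);     // bulk transfer, no per-int read call
            Arrays.sort(block);
            ints.rewind();
            ints.put(block);     // sorted ints written back in place
        }
    }
}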