Improve performance of a Java Program - java

I've written an applet search utility that takes a string as input and looks for it in a specified file or folder.
It works, but I'm not happy with its performance.
The process takes too long to respond.
I profiled it to see what was happening and noticed that the method scanner.hasNextLine() takes most of the time.
This method is essential to my program because I have to read every line to find the string. Is there another way to improve performance and reduce the execution time?
Here is the code where I use this method:
fw = new FileWriter("filePath", true);
bw = new BufferedWriter(fw);
for (File file : filenames) {
if(file.isHidden())
continue;
if (!file.isDirectory()) {
Scanner scanner = new Scanner(file);
int cnt = 0;
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
if(!exactMatch)
{
if(!caseSensitive)
{
if (line.toLowerCase().contains(searchString.toLowerCase())) {
// System.out.println(line);
cnt += StringUtils.countMatches(line.toLowerCase(),
searchString.toLowerCase());
}
}
else
{
if (line.contains(searchString)) {
// System.out.println(line);
cnt += StringUtils.countMatches(line,
searchString);
}
}
}
The method toLowerCase() also takes more time than expected.
Update: I changed my code to use BufferedReader in place of Scanner, as Alex and Nrj suggested, and performance improved noticeably.
It now runs in about one third of the time of the earlier version.
Thanks to all who replied.

Following your question I examined the code of Scanner, and I think you are right: it is not optimized for large data. I'd recommend using a simple BufferedReader that wraps an InputStreamReader that wraps a FileInputStream:
BufferedReader r = new BufferedReader(new InputStreamReader(new FileInputStream(fileName)));
then read line by line:
r.readLine()
If this is not enough, try reading lines in bulk and then processing them.
As for toLowerCase(), you can try regular expressions instead. The benefit is that you do not have to change the case of every line. The disadvantage is that in simple cases a regular expression is a bit slower than a plain string comparison.
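A minimal sketch of both suggestions combined, assuming the search string should be matched literally: a BufferedReader reads line by line, and a pattern compiled once with the CASE_INSENSITIVE flag replaces the per-line toLowerCase() calls. The class and method names (LineMatchCounter, countMatches) are made up for illustration, and whether this beats toLowerCase() on your data is something to measure rather than assume.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineMatchCounter {

    /** Counts non-overlapping occurrences of searchString, reading the source line by line. */
    static int countMatches(Reader source, String searchString, boolean caseSensitive)
            throws IOException {
        if (searchString.isEmpty()) {
            return 0; // an empty pattern would match at every position
        }
        // LITERAL treats the search string as plain text, not as regex syntax.
        int flags = Pattern.LITERAL;
        if (!caseSensitive) {
            flags |= Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        }
        // Compile once, outside the loop, and reuse for every line.
        Pattern pattern = Pattern.compile(searchString, flags);

        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher matcher = pattern.matcher(line);
                while (matcher.find()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String text = "Foo bar\nfoo FOO\nnothing here\n";
        System.out.println(countMatches(new StringReader(text), "foo", false)); // prints 3
        System.out.println(countMatches(new StringReader(text), "foo", true));  // prints 1
    }
}
```

For a real file, pass something like new InputStreamReader(new FileInputStream(fileName)) as the source instead of the StringReader used in the demo.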

I would suggest redesigning your solution and using something like Lucene to do the search for you. You can index and search files with Lucene much more efficiently; a tutorial on how to do it with text files can be found here: http://www.avajava.com/tutorials/lessons/how-do-i-use-lucene-to-index-and-search-text-files.html

(Only small optimizations, in response to comment above.)
if (!caseSensitive) {
    // lower-case the search string once, outside the loop
    searchString = searchString.toLowerCase();
}
while (true) {
    String line = bufferedReader.readLine();
    if (line == null)
        break;
    if (!caseSensitive) {
        // lower-case each line only once
        line = line.toLowerCase();
    }
    if (!exactMatch) {
        if (line.contains(searchString)) {
            // System.out.println(line);
            cnt += StringUtils.countMatches(line, searchString);
        }
    }
}

Try using BufferedReader.
Make use of threads: you can search the files in parallel, which should reduce the search time.
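One way the parallel search could look, as a rough sketch: each file becomes one task on a fixed-size thread pool, and each task counts matches with a BufferedReader. The names ParallelSearch, countInFile and countInFiles are hypothetical, the files are assumed to be UTF-8 text, and the matching is case-sensitive only. If the disk rather than the CPU is the bottleneck, extra threads may not help much.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSearch {

    /** Counts non-overlapping, case-sensitive occurrences of needle in one file. */
    static int countInFile(Path file, String needle) {
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int from = 0;
                while ((from = line.indexOf(needle, from)) >= 0) {
                    count++;
                    from += needle.length();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    /** Searches all files on a fixed-size pool; results keep the order of the input list. */
    static Map<Path, Integer> countInFiles(List<Path> files, String needle, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (Path file : files) {
                Callable<Integer> task = () -> countInFile(file, needle);
                futures.add(pool.submit(task));
            }
            Map<Path, Integer> result = new LinkedHashMap<>();
            for (int i = 0; i < files.size(); i++) {
                result.put(files.get(i), futures.get(i).get()); // waits for that task
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: java ParallelSearch <needle> <file>...");
            return;
        }
        List<Path> files = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            files.add(Paths.get(args[i]));
        }
        int threads = Runtime.getRuntime().availableProcessors();
        countInFiles(files, args[0], threads)
                .forEach((file, count) -> System.out.println(file + ": " + count));
    }
}
```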

I would not use Java to search the file system for matches of the string. Instead, invoke a native tool from Java. I would invoke grep using something like this:
ProcessBuilder pb = new ProcessBuilder("grep", "-r", "foo");
pb.directory(new File("myDir"));
Process p = pb.start();
InputStream in = p.getInputStream();
//Do whatever you prefer with the stream

Related

Separating Get request Response body in java Socket programming

I'm trying to write a curl-like program in Java that uses only Java socket programming (and not Apache HttpClient or any other API).
I want the option of showing the user either the whole response to my GET request or only its body. So far I have come up with the following code:
BufferedReader br = new BufferedReader(new InputStreamReader(s.getInputStream()));
String t;
while ((t = br.readLine()) != null) {
if (t.isEmpty() && !parameters.isVerbose()) {
StringBuilder responseData = new StringBuilder();
while ((t = br.readLine()) != null) {
responseData.append(t).append("\r\n");
}
System.out.println(responseData.toString());
parameters.verbose = false;
break;
} else if(parameters.isVerbose())// handle output
System.out.println(t);
}
br.close();
When the verbose option is on, it works quickly and shows the whole response in less than a second. But when I want only the body of the message, it takes too long (approx. 10 seconds) to print it.
Does anyone know how it can be processed faster?
Thank you.
I'm going to assume that by slow you mean it starts displaying something almost immediately but keeps printing lines for a long time. Writing to the console takes time, and you're printing each line individually, while in the other code path you first store the entire response in memory and then flush it to the console.
If the verbose response is small enough to fit in memory, you should do the same; otherwise you can pick an arbitrary number of lines to print in batches (i.e., you accumulate n lines in memory, flush them to the console, clear the StringBuilder, and repeat).
The most elegant way to implement my suggestion is to use a PrintStream wrapping a BufferedOutputStream, itself wrapping System.out. All my comments and advice are condensed in the following snippet:
private static final int BUFFER_SIZE = 4096;
public static void printResponse(Socket socket, Parameters parameters) throws IOException {
    // Created outside the try-with-resources header: closing this PrintStream would also close System.out.
    PrintStream printStream = new PrintStream(new BufferedOutputStream(System.out, BUFFER_SIZE));
    try (BufferedReader br = new BufferedReader(new InputStreamReader(socket.getInputStream()))) {
        // there is no functional difference in your code between the verbose and non-verbose code paths
        // (they have the same output). That's a bug, but I'm not fixing it in my snippet as I don't know
        // what you intended to do.
        br.lines().forEach(line -> printStream.append(line).append("\r\n"));
    } finally {
        // push whatever is still buffered to the console
        printStream.flush();
    }
}
If it uses any language construct you don't know about, feel free to ask further questions.

Resume read of huge text file in Java

I am reading a huge text file of words (one word per line), but I have to stop from time to time and resume reading the next day. Right now I'm using Apache's LineIterator, but it's totally the wrong solution. My file is 7 GB and I had to interrupt reading at around 1 GB. To resume, I saved the number of lines already read, which means I have an if statement inside the while loop to skip them. Apache's FileUtils doesn't allow seeking, so that was my solution.
What is the best/fastest solution? I thought of using RandomAccessFile to get to the right line and continue reading, but I'm not sure whether I can get to the right place or how to save the position I last read. I can read a couple of lines again, so precision is not that important, but I haven't found a way to get the pointer. I have a BufferedReader to read the file and a RandomAccessFile to seek to the right place, but I don't know how to periodically save a position with the BufferedReader.
Any hints?
Code (note the "SOMETHING" where I should print the value I can use for seekToByte):
try {
RandomAccessFile rand = new RandomAccessFile(file,"r");
rand.seek(seekToByte);
startAtByte = rand.getFilePointer();
rand.close();
} catch(IOException e) {
// do something
}
// Do it using the BufferedReader
BufferedReader reader = null;
FileReader freader = null;
try {
freader = new FileReader(file);
reader = new BufferedReader(freader);
reader.skip(startAtByte);
long i=0;
for(String line; (line = reader.readLine()) != null; ) {
lines.add(line);
System.out.print(i+" ");
if (lines.size()>1000) {
commit(lines);
System.out.println("");
lines.clear();
System.out.println(SOMETHING?);
}
}
} catch(Exception e) {
// handle this
} finally {
if (reader != null) {
try {reader.close();} catch(Exception ignore) {}
}
}
RandomAccessFile is indeed one way to go. When you stop reading, save where you are in the file with
long position = file.getFilePointer();
and then restore it with
file.seek(position);
to resume reading at the same place.
However, be careful when using RandomAccessFile, as its readLine method does not completely support Unicode.
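A small sketch of how saving and restoring the position might fit together, assuming the file is UTF-8. RandomAccessFile.readLine() turns each byte into one char with the high byte set to zero, so the sketch re-encodes each line as ISO-8859-1 to get the original bytes back and then decodes them as UTF-8. ResumableLineReader, Chunk and readLines are illustrative names only. Note that readLine() on a RandomAccessFile is not buffered, so on a multi-gigabyte file this is likely to be slow; treat it as a demonstration of the offset bookkeeping rather than a tuned solution.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ResumableLineReader {

    /** Lines read in one session plus the byte offset at which the next session should start. */
    static final class Chunk {
        final List<String> lines;
        final long nextOffset;

        Chunk(List<String> lines, long nextOffset) {
            this.lines = lines;
            this.nextOffset = nextOffset;
        }
    }

    /** Reads up to maxLines lines starting at startOffset (a byte offset on a line boundary). */
    static Chunk readLines(String path, long startOffset, int maxLines) throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(startOffset);
            String raw;
            // Check the line budget first so no line is consumed and then dropped.
            while (lines.size() < maxLines && (raw = file.readLine()) != null) {
                // Recover the raw bytes, then decode them as UTF-8 (assumed file encoding).
                byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
                lines.add(new String(bytes, StandardCharsets.UTF_8));
            }
            // The file pointer now sits right after the last line that was read.
            return new Chunk(lines, file.getFilePointer());
        }
    }
}
```

The caller would persist nextOffset (for example in a small side file) after each committed batch and pass it back as startOffset on the next run.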
Can you somehow use predetermined offsets? For instance, chop the file into four pieces, (offset0, offset1), (offset1, offset2), etc., and use RecursiveAction (Fork/Join API) to take advantage of parallelism.

Java - Use Input and OutputStream of ProcessBuilder continuously

I want to use an external tool while extracting some data (looping through lines).
For that I first used Runtime.getRuntime().exec() to execute it.
But then my extraction got really slow. So I am looking for a way to execute the external tool in each iteration of the loop, using the same shell instance.
I found out that I should use ProcessBuilder, but it's not working yet.
Here is my code to test the execution (already incorporating input from answers here in the forum):
public class ExecuteShell {
ProcessBuilder builder;
Process process = null;
BufferedWriter process_stdin;
BufferedReader reader, errReader;
public ExecuteShell() {
String command;
command = getShellCommandForOperatingSystem();
if(command.equals("")) {
return; // Error! No error handling yet
}
//init shell
builder = new ProcessBuilder( command);
builder.redirectErrorStream(true);
try {
process = builder.start();
} catch (IOException e) {
System.out.println(e);
}
//get stdout of shell
reader = new BufferedReader(new InputStreamReader(process.getInputStream()));
errReader = new BufferedReader(new InputStreamReader(process.getErrorStream()));
//get stdin of shell
process_stdin = new BufferedWriter(new OutputStreamWriter(process.getOutputStream()));
System.out.println("ExecuteShell: Constructor successfully finished");
}
public String executeCommand(String commands) {
StringBuffer output;
String line;
try {
//single execution
process_stdin.write(commands);
process_stdin.newLine();
process_stdin.flush();
} catch (IOException e) {
System.out.println(e);
}
output = new StringBuffer();
line = "";
try {
if (!reader.ready()) {
output.append("Reader empty \n");
return output.toString();
}
while ((line = reader.readLine())!= null) {
output.append(line + "\n");
return output.toString();
}
if (!reader.ready()) {
output.append("errReader empty \n");
return output.toString();
}
while ((line = errReader.readLine())!= null) {
output.append(line + "\n");
}
} catch (Exception e) {
System.out.println("ExecuteShell: error in executeShell2File");
e.printStackTrace();
return "";
}
return output.toString();
}
public int close() {
// finally close the shell by execution exit command
try {
process_stdin.write("exit");
process_stdin.newLine();
process_stdin.flush();
}
catch (IOException e) {
System.out.println(e);
return 1;
}
return 0;
}
private static String getShellCommandForOperatingSystem() {
Properties prop = System.getProperties( );
String os = prop.getProperty( "os.name" );
if ( os.startsWith("Windows") ) {
//System.out.println("WINDOWS!");
return "C:/cygwin64/bin/bash";
} else if (os.startsWith("Linux") ) {
//System.out.println("Linux!");
return"/bin/sh";
}
return "";
}
}
I want to call it from another class, like this test class:
public class TestExec{
public static void main(String[] args) {
String result = "";
ExecuteShell es = new ExecuteShell();
for (int i=0; i<5; i++) {
// do something
result = es.executeCommand("date"); //execute some command
System.out.println("result:\n" + result); //do something with result
// do something
}
es.close();
}
}
My problem is that the output stream is always empty:
ExecuteShell: Constructor successfully finished
result:
Reader empty
result:
Reader empty
result:
Reader empty
result:
Reader empty
result:
Reader empty
I read the thread here: Java Process with Input/Output Stream
But the code snippets were not enough to get me going; I am missing something. I have not worked much with threads, and I am not sure if or how a Scanner would help me. I would really appreciate some help.
Ultimately, my goal is to call an external command repeatedly and make it fast.
EDIT:
I changed the loop so that es.close() is outside it. I should add that this call is not the only thing I want to do inside the loop.
EDIT:
The problem with the time was that the command I called caused an error. When the command does not cause an error, the time is acceptable.
Thank you for your answers.
You are probably experiencing a race condition: after writing the command to the shell, your Java program continues to run and almost immediately calls reader.ready(). The command you wanted to execute has probably not produced any output yet, so the reader has no data available. An alternative explanation is that the command writes nothing to stdout, only to stderr (or perhaps the shell failed to start the command and reported that on stderr). In practice, however, you never read from stderr.
To handle the output and error streams properly, you cannot rely on reader.ready(); you need to call readLine() (which waits until data is available) in a loop. With your code, even if the program got to that point, you would read exactly one line of output. If the command printed more than one line, the remaining lines would be interpreted as the output of the next command. The typical solution is to read in a loop until readLine() returns null, but that does not work here because your program would wait in this loop until the shell terminates (which never happens, so it would hang forever).
Fixing this is pretty much impossible unless you know exactly how many lines each command will write to stdout and stderr.
However, your complicated approach of starting a shell and sending commands to it is probably completely unnecessary. Starting a command directly from your Java program is just as fast as starting it from within the shell, and much easier to write. Similarly, there is no performance difference between Runtime.exec() and ProcessBuilder (the former just calls the latter); you only need ProcessBuilder if you need its advanced features.
If you are experiencing performance problems when calling external programs, you should find out exactly where they are and try to solve them, but not with this approach. For example, normally one starts a thread for reading from each of the output and error streams (if you do not start separate threads and the command produces large output, everything might hang). Starting these threads could be slow, so you could use a thread pool to avoid spawning new threads repeatedly.
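A sketch of the simpler approach described above, under a few assumptions: each command is started directly with its own ProcessBuilder, stderr is merged into stdout with redirectErrorStream(true) so one loop can drain everything without extra threads, and reading continues until readLine() returns null, which happens when the command exits. CommandRunner, Result and run are invented names, and the demo command (the java launcher of the running JVM with -version) is only a portable stand-in for the real tool.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;

final class CommandRunner {

    /** Exit code and combined stdout/stderr text of one finished command. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    /** Starts the command, reads all of its output, then waits for it to exit. */
    static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // one stream to drain, so no reader threads needed
        Process process = builder.start();
        process.getOutputStream().close(); // the command gets no input from us

        StringBuilder output = new StringBuilder();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            // readLine() blocks until data arrives and returns null once the process ends.
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        return new Result(process.waitFor(), output.toString());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Stand-in command for illustration: the launcher of the current JVM.
        String java = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        for (int i = 0; i < 3; i++) {
            Result result = run(Arrays.asList(java, "-version"));
            System.out.println("exit code " + result.exitCode + ":\n" + result.output);
        }
    }
}
```

Because every call gets a fresh process, there is no need to guess where the output of one command ends and the next begins.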

Is there a better way to read from a process's inputstream and then handle using specified methods?

I am writing a program that does the following:
Run a command using ProcessBuilder (like "svn info" or "svn diff");
Read the output of the command from the process's getInputStream();
With the output of the command, do one of the following:
Parse the output, extract what I need, and use it later, OR:
Write the output directly to a specified file.
Currently I use a BufferedReader to read the command's output line by line and save the lines to an ArrayList, and then decide whether to scan the lines for something or write them to a file.
Obviously this is an ugly implementation, because the ArrayList should not be needed when I just want to save a command's output to a file. What would you suggest as a better way to do it?
Here is some of my code.
I use this to run a command and read the output of the process:
private ArrayList<String> runCommand(String[] command) throws IOException {
ArrayList<String> result = new ArrayList<>();
_processBuilder.command(command);
Process process = null;
try {
process = _processBuilder.start();
try (InputStream inputStream = process.getInputStream();
InputStreamReader inputStreamReader = new InputStreamReader(inputStream);
BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {
String line;
while ((line = bufferedReader.readLine()) != null) {
result.add(line);
}
}
}
catch (IOException ex) {
_logger.log(Level.SEVERE, "Error!", ex);
}
finally {
if (process != null) {
try {
process.waitFor();
}
catch (InterruptedException ex) {
_logger.log(Level.SEVERE, null, ex);
}
}
}
return result;
}
and in one method I may do something like this:
ArrayList<String> result = runCommand(command1);
for (String line: result) {
// ...parse the line here...
}
and in another I may do like this:
ArrayList<String> result = runCommand(command2);
File file = new File(...filename, etc...);
try (PrintWriter printWriter = new PrintWriter(new FileWriter(file, false))) {
for (String line: result) {
printWriter.println(line);
}
}
Returning the process output in an ArrayList seems like a fine abstraction to me. Then the caller of runCommand() doesn't need to worry about how the command was run or the output read. The memory used by the extra list is probably not significant unless your command is very prolix.
The only time I could see this being an issue would be if the caller wanted to start processing the output while the command was still running, which doesn't seem to be the case here.
For very big output that you don't want to copy into memory first, one option would be to have runCommand() take a callback like Guava's LineProcessor that it will call for each line of the output. Then runCommand() can still abstract away the whole deal of running the process, reading the output, and closing everything afterwards, but data can be passed out to the callback as it runs rather than waiting for the method to return the whole response in one array.
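A rough sketch of the callback idea, using the JDK's java.util.function.Consumer in place of Guava's LineProcessor so it has no dependencies. StreamingCommandRunner and this runCommand signature are assumptions made for illustration, not the asker's code; merging stderr into stdout is also an assumption, and the demo command is just the current JVM's launcher with -version.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class StreamingCommandRunner {

    /** Runs the command and hands every output line to lineHandler as soon as it is read. */
    static int runCommand(List<String> command, Consumer<String> lineHandler)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // assumption: callers want stderr mixed into the output
        Process process = builder.start();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineHandler.accept(line);
            }
        }
        return process.waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Stand-in command for illustration only.
        String java = System.getProperty("java.home") + File.separator + "bin" + File.separator + "java";
        List<String> command = Arrays.asList(java, "-version");

        // Caller 1: collect the lines for parsing, as the original runCommand() did.
        List<String> lines = new ArrayList<>();
        runCommand(command, lines::add);
        System.out.println("collected " + lines.size() + " lines");

        // Caller 2: stream straight into a file without building a list first.
        File target = File.createTempFile("command-output", ".txt");
        try (PrintWriter writer = new PrintWriter(target)) {
            runCommand(command, writer::println);
        }
        System.out.println("wrote " + target.length() + " bytes to " + target);
    }
}
```

Both callers share the process handling, and only the caller that needs the list pays for it.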
I don't think it's a performance issue that you store the text uselessly in some cases. Nonetheless, for cleanliness, it might be better to write two methods:
private ArrayList<String> runCommand(String[] command)
private void runCommandAndDumpToFile(String[] command, File file)
(It wasn't quite clear from your question, but I assume that you know before running your process whether you'll just write the output to file or process it.)

how to add two words from text file using java program

I am new to Java. This is probably a very simple problem, but I don't know how to solve it. I have a text file that contains some words, like this:
good
bad
efficiency
I want to combine the words in the list with one another using a Java program. My output should look like this:
good bad
good efficiency
bad efficiency
How can I do this in a Java program? I searched for ideas but didn't find any. Please suggest some. Thanks in advance.
If you do not want to learn it from scratch, I would recommend using the Apache Commons IO library.
The FileUtils class has a simple interface to read from and write to a file.
A good place to start learning Java IO would be to look over Sun's Java Tutorials on File IO. If you're looking into how to read in individual lines, I would particularly look at Scanners. And if at some point you're looking to manipulate Strings like this without IO being heavily involved, I'd look at Java's StringBuilder.
import java.io.*;
class Test {
    //--------------------------------------------------< main >--------//
    public static void main (String[] args) {
        Test t = new Test();
        t.readMyFile();
    }
    //--------------------------------------------< readMyFile >--------//
    // Reads two files in lockstep and prints their lines side by side.
    void readMyFile() {
        // try-with-resources closes both readers, even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader("c:/abc/java/prash.txt"));
             BufferedReader br1 = new BufferedReader(new FileReader("c:/abc/java/pras.txt"))) {
            String record;
            String rec;
            // stop as soon as either file runs out of lines
            while ((record = br.readLine()) != null && (rec = br1.readLine()) != null) {
                System.out.println(record + " " + rec);
            }
        } catch (IOException e) {
            // catch possible io errors from readLine()
            System.out.println("Uh oh, got an IOException error!");
            e.printStackTrace();
        }
    } // end of readMyFile()
} // end of class
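For the output shown in the question (every word paired with each word that comes after it in the file), a nested loop over the list of words seems to be enough. The sketch below is one possible reading of the requirement; WordPairs and pairs are made-up names, and the input file name is taken from the command line rather than assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WordPairs {

    /** Returns "first second" for every pair of words where first appears before second. */
    static List<String> pairs(List<String> words) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            // start at i + 1 so a word is never paired with itself or with an earlier word
            for (int j = i + 1; j < words.size(); j++) {
                result.add(words.get(i) + " " + words.get(j));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // With an argument, read one word per line from that file; otherwise use the sample words.
        List<String> words = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
                : Arrays.asList("good", "bad", "efficiency");
        for (String pair : pairs(words)) {
            System.out.println(pair);
        }
    }
}
```

Run without arguments, this prints the three lines from the question: good bad, good efficiency, bad efficiency.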
