I know that going into a catch block has a significant cost when executing a program. However, I was wondering whether entering a try{} block also has any impact, so I started looking for an answer on Google. I found many opinions, but no benchmarking at all. Some answers I found were:
Java try/catch performance, is it recommended to keep what is inside the try clause to a minimum?
Try Catch Performance Java
Java try catch blocks
However, they didn't answer my question with facts, so I decided to try it for myself.
Here's what I did. I have a csv file with this format:
host;ip;number;date;status;email;uid;name;lastname;promo_code;
where everything after status is optional and may not even have the corresponding ;, so when parsing, a validation has to be done to see whether the value is there. That is where the try/catch issue came to my mind.
The current code that I inherited in my company does this:
StringTokenizer st=new StringTokenizer(line,";");
String host = st.nextToken();
String ip = st.nextToken();
String number = st.nextToken();
String date = st.nextToken();
String status = st.nextToken();
String email = "";
try{
    email = st.nextToken();
}catch(NoSuchElementException e){
    email = "";
}
and the same pattern used for email is repeated for uid, name, lastname and promo_code.
and I changed everything to:
if(st.hasMoreTokens()){
    email = st.nextToken();
}
and in fact it performs faster when parsing a file that doesn't have the optional columns. Here are the average times:
--- Trying:122 milliseconds
--- Checking:33 milliseconds
However, here's what confused me and the reason I'm asking: when running the example with values for the optional columns in all 8000 lines of the CSV, the if() version still performs better than the try/catch version. So my question is:
Does the try block really have no performance impact on my code?
The average times for this example are:
--- Trying:105 milliseconds
--- Checking:43 milliseconds
Can somebody explain what's going on here?
Thanks a lot
Yes, try (in Java) does not have any performance impact. The compiler generates no VM statements for a try block. It simply records the program counters between which the try block is active and attaches this information to the method in the class file. Then, when an exception is thrown, the VM unwinds the stack and checks at each frame whether the program counter in that frame is in a relevant try block. This (together with building the stack trace) is quite costly, so catching is expensive. However, trying is free :).
Still, it is not good practice to use exceptions for regular control flow.
The reason your code performs faster is probably that catching is so extremely costly that it outweighs the time saved by replacing the check with a simple try.
A try/catch can be faster in code where the catch is rarely triggered: e.g., if you enter the try 10000 times but only catch once, the try version would be faster than the if-check. Still, this is not good style, and your way of explicitly checking for more tokens is to be preferred.
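For illustration, here is a minimal sketch of the check-based approach wrapped in a small helper (the nextOrEmpty name is just illustrative, not from the original code):
// Returns the next token if one is present, otherwise a default value,
// without using exceptions for control flow.
private static String nextOrEmpty(StringTokenizer st) {
    return st.hasMoreTokens() ? st.nextToken() : "";
}

// Usage for the optional columns:
String email = nextOrEmpty(st);
String uid = nextOrEmpty(st);
String name = nextOrEmpty(st);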
Related
I am currently working on a Spring-based API which has to transform CSV data and expose it as JSON.
It has to read big CSV files, each containing more than 500 columns and 2.5 million lines.
I am not guaranteed to have the same header between files (each file can have a completely different header than another), so I have no way to create a dedicated class which would provide a mapping to the CSV headers.
Currently the API controller is calling a CSV service which reads the CSV data using a BufferedReader.
The code works fine on my local machine, but it is very slow: it takes about 20 seconds to process 450 columns and 40,000 lines.
To improve processing speed, I tried to implement multithreading with Callable(s), but I am not familiar with that kind of concept, so the implementation might be wrong.
Other than that, the API is running out of heap memory when running on the server. I know that a solution would be to increase the amount of available memory, but I suspect that the replace() and split() operations on strings made in the Callable(s) are responsible for consuming a large amount of heap memory.
So I actually have several questions :
#1. How could I improve the speed of the CSV reading ?
#2. Is the multithread implementation with Callable correct ?
#3. How could I reduce the amount of heap memory used in the process ?
#4. Do you know of a different approach to split at commas and replace the double quotes in each CSV line? Would StringBuilder be of any help here? What about StringTokenizer?
Here is the CSV method:
public static final int NUMBER_OF_THREADS = 10;

public static List<List<String>> readCsv(InputStream inputStream) {
    List<List<String>> rowList = new ArrayList<>();
    ExecutorService pool = Executors.newFixedThreadPool(NUMBER_OF_THREADS);
    List<Future<List<String>>> listOfFutures = new ArrayList<>();
    try {
        BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8));
        String line = null;
        while ((line = reader.readLine()) != null) {
            CallableLineReader callableLineReader = new CallableLineReader(line);
            Future<List<String>> futureCounterResult = pool.submit(callableLineReader);
            listOfFutures.add(futureCounterResult);
        }
        reader.close();
        pool.shutdown();
    } catch (Exception e) {
        // log: error reading csv file
    }
    for (Future<List<String>> future : listOfFutures) {
        try {
            List<String> row = future.get();
            rowList.add(row); // collect the parsed row
        } catch (ExecutionException | InterruptedException e) {
            // log: CSV processing interrupted during execution
        }
    }
    return rowList;
}
And the Callable implementation
public class CallableLineReader implements Callable<List<String>> {

    private final String line;

    public CallableLineReader(String line) {
        this.line = line;
    }

    @Override
    public List<String> call() throws Exception {
        return Arrays.asList(line.replace("\"", "").split(","));
    }
}
I don't think that splitting this work onto multiple threads is going to provide much improvement, and may in fact make the problem worse by consuming even more memory. The main problem is using too much heap memory, and the performance problem is likely to be due to excessive garbage collection when the remaining available heap is very small (but it's best to measure and profile to determine the exact cause of performance problems).
The memory consumption would be less from the replace and split operations, and more from the fact that the entire contents of the file need to be read into memory in this approach. Each line may not consume much memory, but multiplied by millions of lines, it all adds up.
If you have enough memory available on the machine to assign a heap size large enough to hold the entire contents, that will be the simplest solution, as it won't require changing the code.
Otherwise, the best way to deal with large amounts of data in a bounded amount of memory is to use a streaming approach. This means that each line of the file is processed and then passed directly to the output, without collecting all of the lines in memory in between. This will require changing the method signature to use a return type other than List. Assuming you are using Java 8 or later, the Stream API can be very helpful. You could rewrite the method like this:
public static Stream<List<String>> readCsv(InputStream inputStream) {
    BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8));
    return reader.lines().map(line -> Arrays.asList(line.replace("\"", "").split(",")));
}
Note that this throws unchecked exceptions in case of an I/O error.
This will read and transform each line of input as needed by the caller of the method, and will allow previous lines to be garbage collected if they are no longer referenced. This then requires that the caller of this method also consume the data line by line, which can be tricky when generating JSON. The JakartaEE JsonGenerator API offers one possible approach. If you need help with this part of it, please open a new question including details of how you're currently generating JSON.
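For illustration only, here is a sketch of how a caller might consume that stream row by row instead of collecting it into a list (the outputStream variable and the "|" separator are placeholders, not part of the original code):
// Hypothetical caller: each row is transformed and written out, then becomes
// eligible for garbage collection; nothing is accumulated in memory.
try (PrintWriter out = new PrintWriter(new OutputStreamWriter(outputStream, StandardCharsets.UTF_8))) {
    readCsv(inputStream).forEach(row -> out.println(String.join("|", row)));
}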
Instead of trying out a different approach, try to run with a profiler first and see where time is actually being spent. And use this information to change the approach.
Async-profiler is a very solid (and free!) profiler that will give you a very good impression of where time is being spent. It will also show the time spent on garbage collection, so you can easily see the ratio of CPU utilization caused by garbage collection. It also has the ability to do allocation profiling to figure out which objects are being created (and where).
For a tutorial see the following link.
Try using Spring batch and see if it helps your scenario.
Ref : https://howtodoinjava.com/spring-batch/flatfileitemreader-read-csv-example/
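For example, here is a rough sketch of a standalone FlatFileItemReader reading rows one at a time (class names are from Spring Batch; the File argument and the delimiter are placeholders, so adapt as needed):
// Sketch only: uses Spring Batch's FlatFileItemReader outside a full job,
// so rows are read one at a time rather than collected in memory.
static void readWithSpringBatch(File csv) throws Exception {
    FlatFileItemReader<FieldSet> reader = new FlatFileItemReaderBuilder<FieldSet>()
            .name("csvReader")
            .resource(new FileSystemResource(csv))
            .lineTokenizer(new DelimitedLineTokenizer(","))
            .fieldSetMapper(new PassThroughFieldSetMapper())
            .build();
    reader.open(new ExecutionContext());
    try {
        FieldSet row;
        while ((row = reader.read()) != null) {
            // process one row here
        }
    } finally {
        reader.close();
    }
}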
I wanted to make a program that outputs a string repeated a given number of times (separated by a space) using String.repeat, but have been running into a java.lang.OutOfMemoryError when the string is repeated too many times. How can I determine the maximum number of times a string can be repeated without causing an out of memory error?
I searched online for the maximum length of a string and came up with 2147483647. In my code, I divide this maximum length by the length of the string to repeat. I wanted it to round down automatically, so I used the int data type. I expected my program to be able to print the word, but instead of printing the result it still generates an out-of-memory error. Is the maximum string length correct? If not, what is the maximum string length?
import java.util.*;

public class Darshit {
    public static void main(String[] Darshit1) {
        Scanner Darshit = new Scanner(System.in);
        System.out.println("WELCOME TO WORD RE-PRINTER!");
        System.out.println("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~");
        System.out.println("Enter the text");
        String b = Darshit.nextLine();
        int len = b.length() + 1;
        int e = 2147483647 / len;
        System.out.println("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~");
        System.out.println("How many times you want to repeat the text");
        System.out.println("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~");
        System.out.println("Note ~>");
        System.out.println("You can only print the word upto " + e + " times!");
        int a = Darshit.nextInt();
        String c = " ";
        String d = b + c;
        System.out.println(d.repeat(a));
        System.out.println("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~");
        System.out.println("Thank you for using it");
        System.out.println("Made by Darshit Sharma");
        System.out.println("8th-C");
        System.out.println("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~");
    }
}
Generally speaking, there are two approaches to handling errors:
test beforehand whether performing an operation will cause error and not perform the operation if it will (defensive programming)
trap errors if they happen (exception handling)
Different platforms have different conventions about which approach is preferred in which context (assuming exception handling is even supported). In this case, however, there is no reliable way to test whether there will be an error caused by repeating the string too many times.
In many situations, handling an OutOfMemoryError won't lead anywhere, as there isn't a way to recover and get the application back to a valid state: if the application is running low on memory, it probably won't be able to do anything useful. It might be able to log or print a message explaining why it's crashing (so the program could catch and rethrow), but usually not much more than that. In this case, your program has enough memory for most of its tasks (just not enough for the primary task of allocating the repeated string) and is simple enough that you can handle the error simply: print a message explaining what happened. After that, the application is already close to exiting, so no other handling should be needed.
try {
    System.out.println(d.repeat(a));
} catch (java.lang.OutOfMemoryError oome) {
    System.err.println("I ran out of memory trying to repeat the string. You asked for too many repetitions.");
}
I'm developing a test for some hundreds of regex I have to manage in Android.
I encountered catastrophic backtracking that I cannot prevent (i.e., the matcher enters exponential complexity and seems to be stuck in an infinite loop, while in reality it is exploring a huge number of possible matches), so I need to limit the overall execution of the matching using a timeout.
I've already found a possible approach here, but I also have to get the boolean return value from the find() method, so the Runnable is not the best choice.
Even the little variation proposed among the other answers in the link above to avoid the use of a thread is not applicable, because it is based upon an extension of CharSequence, which simply doesn't work: charAt is not used in matcher.find() (I checked this twice, both with a breakpoint during debugging and by reading the Matcher source). Edit: I later found that #NullPointerException had also discovered that charAt never gets called, but I don't know whether a solution has been found in the three years since.
So, the best option I found until now seems to be using a FutureTask, which has the possibility to specify a timeout and can also return a value. I implemented the following code:
private boolean interruptMatch(final Matcher matcher) {
    boolean res = false;
    ExecutorService executor = Executors.newSingleThreadExecutor();
    FutureTask<Boolean> future =
            new FutureTask<>(new Callable<Boolean>() {
                public Boolean call() {
                    return matcher.find();
                }
            });
    executor.execute(future);
    try {
        res = future.get(2000, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
        Log.d("TESTER", "Find interrupted after 2000 ms");
    } catch (ExecutionException e) {
        Log.d("TESTER", "Find ExecException after 2000 ms");
    } catch (TimeoutException e) {
        Log.d("TESTER", "Find timeout after 2000 ms");
    }
    future.cancel(true);
    executor.shutdownNow();
    return res;
}
This part of code is being called by the main method, in an almost "classic" way:
pattern = Pattern.compile(pattern, java.util.regex.Pattern.CASE_INSENSITIVE);
matcher = pattern.matcher(inputString);
if (interruptMatch(matcher)) { // before the need to manage catastrophic backtracking, this was simply: if (matcher.find()) {
    // Do something
}
So, everything seemed to work, at least for the first few hundred patterns (with the timeout also limiting the long-running find caused by catastrophic backtracking), until I got the following error:
JNI ERROR (app bug): weak global reference table overflow (max=51200)
It has been generated by the above Java code (this error didn't appear before; obviously, I had removed the pattern which caused the catastrophic backtracking), but I cannot find out how to clean the global reference table (I found a number of answers about similar issues generated directly by JNI code, but not from Java), nor how to find a workaround or another valid approach.
EDIT: I tried to debug further and found that the issue arises when I call the get method. I tried to follow the code of FutureTask, but I didn't find anything useful (and I got bored too fast).
Can you help me, please?
Thank you in advance
After more digging I found that there is a tracked issue in Android (it seems to deal with other topics, but it also answers mine), and from the replies I understand that it is just an issue which appears during debugging. I tested my tester app again and found that this is true: without debugging, the above error doesn't happen. So the severity of the issue is much lower and I can live with it (for me this is a closed issue).
Here's a brief outline of my Java code to upload archive files to a server. It's been working fine for a couple years. But it usually just uploads a small number of files at a time. Now it is needing to upload hundreds or thousands. And it is failing after a large-ish number of iterations.
public class BatchUploader implements Runnable {

    // Fields such as infile, stopRunning, isUploading and myFile, as well as the
    // upload() method, are defined elsewhere in the class; this is only an outline.
    private int processUploads() {
        String myFilename;
        try {
            BufferedReader input = new BufferedReader(new FileReader(infile));
            try {
                while (!stopRunning && (myFilename = input.readLine()) != null) {
                    if (myFilename.trim().isEmpty()) {
                        continue;
                    }
                    myFile = FileHelper.getFileFromFullyQualifiedName(myFilename);
                    upload(myFile);
                }
            } finally {
                input.close();
                isUploading = false;
            }
        } catch (IOException e) {
            // error handling omitted in this outline
        }
        return 0; // outline: the real return value is omitted here
    }
}
After several hundred to several thousand uploads, I get an error like this:
02/20 23:17:05.314 java.io.FileNotFoundException: /home/baz (No such file or directory): java.io.FileInputStream.open(Native Method)
java.io.FileInputStream.<init>(FileInputStream.java:137)
java.io.FileReader.<init>(FileReader.java:72)
bk.a(SourceFile:41)
bk.d(SourceFile:123)
aU.e(SourceFile:181)
aU.run(SourceFile:24)
java.lang.Thread.run(Thread.java:679)
The problem is that the string containing the path to the file (held in the String variable myFilename) is truncated. Instead of /home/baz it should be /home/bazillion/data/filename.arc.
Something seems to run out of memory in this loop. I have no idea what is going on. Can anyone give a suggestion?
Would it help to break out of the while-loop after a certain count, then resume after a few minutes?
To add insult to injury, the list of filenames to upload is wiped out after this exception. I'm sure there's an easy fix for that in my code, but I don't know what it is. I don't work in Java much.
I don't think it's memory management; if it was, you'd get OutOfMemoryError; and in any case, it would not be truncated.
It would be interesting to look at FileHelper.getFileFromFullyQualifiedName(), the problem might very well be there. Or, maybe, in your data file?
Would it help to break out of the while-loop after a certain count, then resume after a few minutes?
Absolutely not. I'd make it an aggravated felony if it was up to me. We've got to program defensively: if you think you might have a bug, debug it, find the precise reason for it and fix it; don't brush it under the rug and then leave it to the poor maintenance programmer who'll be debugging a codebase 10,000 times as large ten years down the road at 3 am.
If your error is reproducible, add an "if" statement with string comparison, and print or log all variables to see what's really going on. Better yet, use a debugger, and set a conditional breakpoint there when the string begins with /home/baz and step through it examining all variables and seeing what happens.
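For example, here is a sketch of such a check inside the existing while-loop (purely for debugging; the prefix test just reuses the paths quoted in the question, and the variable name comes from the outline above):
// Temporary debug check: fires only when the filename looks truncated.
if (myFilename.startsWith("/home/baz") && !myFilename.startsWith("/home/bazillion")) {
    System.err.println("Suspicious filename: '" + myFilename + "' (length=" + myFilename.length() + ")");
    // a conditional breakpoint with the same condition would work just as well
}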
To make a value visible after an exception is thrown, you can assign it to a field variable (one declared just after the public class ... { line).
Good luck!
I saw the following code in this commit for MongoDB's Java Connection driver, and it appears at first to be a joke of some sort. What does the following code do?
if (!((_ok) ? true : (Math.random() > 0.1))) {
return res;
}
(EDIT: the code has been updated since posting this question)
After inspecting the history of that line, my main conclusion is that there has been some incompetent programming at work.
That line is gratuitously convoluted. The general form
a? true : b
for boolean a, b is equivalent to the simple
a || b
The surrounding negation and excessive parentheses convolute things further. Keeping in mind De Morgan's laws it is a trivial observation that this piece of code amounts to
if (!_ok && Math.random() <= 0.1)
    return res;
The commit that originally introduced this logic had
if (_ok == true) {
    _logger.log( Level.WARNING , "Server seen down: " + _addr, e );
} else if (Math.random() < 0.1) {
    _logger.log( Level.WARNING , "Server seen down: " + _addr );
}
This is another example of incompetent coding, but notice the reversed logic: here the event is logged either if _ok is true, or in 10% of the other cases, whereas the simplified code above returns 10% of the time and logs 90% of the time. So the later commit ruined not only clarity, but correctness itself.
I think that in the code you have posted we can actually see how the author intended to transform the original if-then literally into the negation required for the early-return condition, but then messed up and introduced an effective "double negative" by reversing the inequality sign.
Coding style issues aside, stochastic logging is quite a dubious practice all by itself, especially since the log entry does not document its own peculiar behavior. The intention is, obviously, to reduce restatements of the same fact: that the server is currently down. The appropriate solution is to log only changes of the server state, and not each observation of it, let alone a random selection of 10% of such observations. Yes, that takes just a little bit more effort, so let's see some.
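For instance, a minimal sketch of state-change logging (the _ok, _addr and _logger names are borrowed from the snippets above; this is an illustration, not the driver's actual fix):
private volatile boolean _lastLoggedOk = true;

private void logServerState(boolean ok, Exception e) {
    if (ok != _lastLoggedOk) {              // log only when the observed state changes
        _lastLoggedOk = ok;
        if (!ok) {
            _logger.log(Level.WARNING, "Server seen down: " + _addr, e);
        } else {
            _logger.log(Level.INFO, "Server back up: " + _addr);
        }
    }
}
If several threads can observe the state concurrently, the check-and-set could be synchronized, much like the counter in the answer below.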
I can only hope that all this evidence of incompetence, accumulated from inspecting just three lines of code, does not speak fairly of the project as a whole, and that this piece of work will be cleaned up ASAP.
https://github.com/mongodb/mongo-java-driver/commit/d51b3648a8e1bf1a7b7886b7ceb343064c9e2225#commitcomment-3315694
11 hours ago by gareth-rees:
Presumably the idea is to log only about 1/10 of the server failures (and so avoid massively spamming the log), without incurring the cost of maintaining a counter or timer. (But surely maintaining a timer would be affordable?)
Add a class member initialized to negative 1:
private int logit = -1;
In the try block, make the test:
if (!ok && (logit = (logit + 1) % 10) == 0) { // log error
This always logs the first error, then every tenth subsequent error. Logical operators "short-circuit", so logit only gets incremented on an actual error.
If you want the first and every tenth of all errors, regardless of the connection, make logit static instead of an instance member.
As has been noted, this should be made thread-safe:
private synchronized int getLogit() {
    return (logit = (logit + 1) % 10);
}
In the try block, make the test:
if (!ok && getLogit() == 0) { // log error
Note: I don't think throwing out 90% of the errors is a good idea.
I have seen this kind of thing before.
There was a piece of code that could answer certain 'questions' that came from another 'black box' piece of code. In cases where it could not answer them, it would forward them to another piece of 'black box' code that was really slow.
So sometimes previously unseen new 'questions' would show up, and they would show up in a batch, like 100 of them in a row.
The programmer was happy with how the program was working, but he wanted some way of possibly improving the software in the future, should new questions be discovered.
So, the solution was to log unknown questions, but as it turned out, there were thousands of different ones. The logs got too big, and there was no benefit in speeding these up, since they had no obvious answers. But every once in a while, a batch of questions would show up that could be answered.
Since the logs were getting too big, and the logging was getting in the way of logging the really important things, he arrived at this solution:
Only log a random 5%; this will clean up the logs while, in the long run, still showing what questions/answers could be added.
So, if an unknown event occurred, it would be logged in only a random fraction of those cases.
I think this is similar to what you are seeing here.
I did not like this way of working, so I removed this piece of code and just logged these messages to a different file, so they were all present but not clobbering the general log file.