I'm developing a test for several hundred regexes that I have to manage in Android.
I ran into catastrophic backtracking that I cannot prevent (i.e., the matcher hits exponential complexity and seems to be stuck in an infinite loop, while in reality it is exploring a huge number of possible matches), so I need to limit the overall execution time of the match with a timeout.
I've already found a possible approach here, but I also need the boolean return value of the find() method, so a Runnable is not the best choice.
Even the small variation proposed in one of the other answers at the link above, which avoids using a thread, is not applicable: it is based on a custom CharSequence implementation, which simply doesn't work because charAt() is never called by matcher.find() (I checked this twice, both with a breakpoint while debugging and by reading the Matcher source). Edit: I later found that user @NullPointerException had also noticed that charAt() never gets called, but I don't know whether he has found a solution in the three years since.
So far, the best option I have found seems to be a FutureTask, which lets me specify a timeout and can also return a value. I implemented the following code:
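import android.util.Log;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Matcher;
private boolean interruptMatch(final Matcher matcher) {
    boolean res = false;
    ExecutorService executor = Executors.newSingleThreadExecutor();
    // Typed FutureTask/Callable avoids the raw-type (unchecked) warnings
    FutureTask<Boolean> future = new FutureTask<>(new Callable<Boolean>() {
        @Override
        public Boolean call() {
            return matcher.find();
        }
    });
    executor.execute(future);
    try {
        res = future.get(2000, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        Log.d("TESTER", "Find interrupted while waiting");
    } catch (ExecutionException e) {
        Log.d("TESTER", "Find threw: " + e.getCause());
    } catch (TimeoutException e) {
        Log.d("TESTER", "Find timeout after 2000 ms");
    }
    // cancel(true) only sets the worker's interrupt flag; Matcher.find() does not check it,
    // so a backtracking find() may keep running on the worker thread
    future.cancel(true);
    executor.shutdownNow();
    return res;
}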
This method is called from the main method in a fairly "classic" way:
Pattern compiledPattern = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE); // pattern: the regex String under test
Matcher matcher = compiledPattern.matcher(inputString);
if (interruptMatch(matcher)) { // before catastrophic backtracking had to be handled, this was a plain if (matcher.find()) {
    // Do something
}
Everything seemed to work, at least for the first few hundred patterns (the timeout also cut short the long-running find() caused by catastrophic backtracking), until I got the following error:
JNI ERROR (app bug): weak global reference table overflow (max=51200)
It is caused by the Java code above (the error did not appear before, when I had simply removed the pattern that caused the catastrophic backtracking), but I cannot figure out how to clean the global reference table (I found a number of answers about similar issues caused directly by JNI code, but not by Java code), nor how to find a workaround or another valid approach.
EDIT: I debugged further and found that the issue arises when I call the get() method. I tried to follow the FutureTask code, but I didn't find anything useful (and I got bored too quickly).
Can you help me, please?
Thank you in advance
After more digging I found that there is a tracked issue in Android (it seems to deal with other topics, but it also answers mine), and from the replies I understand that the error only appears while debugging. I tested my tester app again and found that this is true: without the debugger, the above error does not happen. So the severity of the issue is much lower and I can live with it (for me this is a closed issue).
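For a plain desktop JVM, the FutureTask approach from the question can be written more compactly. This is a sketch under stated assumptions, not the asker's code: the names TimedFind and findWithTimeout are hypothetical, and it uses a daemon worker thread because Matcher.find() does not respond to interrupts, so cancel(true) stops the waiting but not the runaway match itself. It does not address the Android JNI error discussed above.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TimedFind {
    private TimedFind() {}

    // Returns Optional.of(find() result), or Optional.empty() on timeout or failure.
    static Optional<Boolean> findWithTimeout(Matcher matcher, long timeoutMs) {
        // Daemon worker: a runaway find() keeps running after a timeout,
        // but a daemon thread will not prevent the JVM from exiting.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "regex-find");
            t.setDaemon(true);
            return t;
        });
        Callable<Boolean> task = matcher::find; // resolves to the no-arg find()
        Future<Boolean> future = executor.submit(task);
        try {
            return Optional.of(future.get(timeoutMs, TimeUnit.MILLISECONDS));
        } catch (TimeoutException | ExecutionException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the caller's interrupt status
            return Optional.empty();
        } finally {
            // Only sets the worker's interrupt flag; Matcher.find() does not check it.
            future.cancel(true);
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("a+").matcher("caaab");
        System.out.println(findWithTimeout(m, 2000)); // Optional[true]
    }
}
```

Returning Optional<Boolean> keeps "no match" (Optional[false]) distinct from "gave up" (Optional.empty), which the original boolean return value conflates.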
Related
I have the following code that logs all the errors after every command my tool runs in cmd (it runs p4 integrate commands, about 1000-1500 per task):
if (errorArrayList.size() > 0) {
    LoggerSingleton.I.writeDebugInfoTimeStampedLog("[INFO-CMD] CommandExecuter.java -> runAndGetResults: errors happened while running the following command: [ " + commandResultBean.getCommand() + " ]");
    for (int i = 0; i < errorArrayList.size(); i++) {
        LoggerSingleton.I.writeDebugErrorTimeStampedLog(errorArrayList.get(i));
        commandResultBean.addToCLI_Error(errorArrayList.get(i));
    }
    LoggerSingleton.I.writeDebugInfoTimeStampedLog("[INFO-CMD] CommandExecuter.java -> runAndGetResults: Listing errors of command [" + commandResultBean.getCommand() + "] finished");
}
The feature I'm working on right now is to check each error I get: if it is on a predefined list (errors that don't matter and are not really errors, for example "all revision(s) already integrated"), do nothing else; if it is a "real" error, also write it to another log file (the debug logs are far too long for the users of the tool; they are meant mostly for the developers).
The question is, what is the best way to do this?
I want to avoid a big slowdown. There are far fewer errors than commands, but it is not at all unusual to get 700-800 "irrelevant" errors in one task.
I will use another class for the I/O part, and it is not a problem if catching a "real" error extends the running time.
The list is constant; it is fine if it can only be changed in code.
At the moment I don't know which type to use (two or three separate Strings, a List, an array...). What type should I use? I have never used enums in Java before; should I use one here?
I guess a for or for-each loop calling errorArrayList.get(i).contains(<myVariable>) in a method is the only way to do the check.
If I'm wrong, is there a better way to do this?
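One possible answer to the "which type" question is a constant List<String> of message fragments checked with String.contains(). This is a minimal sketch: KnownErrors and isKnown are hypothetical names, and the only fragment listed is the example from the question.

```java
import java.util.List;

final class KnownErrors {
    private KnownErrors() {}

    // Fragments of messages that are not real errors; changed only in code.
    // Only the example from the question is listed; add others as needed.
    private static final List<String> FRAGMENTS = List.of(
            "all revision(s) already integrated");

    // True if the error message contains any known "irrelevant" fragment.
    static boolean isKnown(String error) {
        for (String fragment : FRAGMENTS) {
            if (error.contains(fragment)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside the existing loop, the check would be something like if (!KnownErrors.isKnown(errorArrayList.get(i))) { ... } before writing to the separate log. With only a handful of fragments, a linear scan like this should be cheap even for several hundred errors per task.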
EDIT
If I have an ArrayList<String> called knownErrors containing the irrelevant errors (I can define only parts of them), will the following code perform better than the approach described above? Also, can I use it if I only have parts of the strings? How?
if (errorArrayList.removeAll(knownErrors)) { // removeAll returns true if at least one element was removed
    //do the logging and stuff
}
ArrayList itself has a method removeAll(Collection c) that removes all elements matching elements of the input collection. So if you put the known errors to be skipped in an ArrayList and pass it to removeAll, it will remove the known errors, and errorArrayList will then contain only the new errors.
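Note that removeAll() compares elements with equals(), so it only removes errors that are identical to an entry in knownErrors; partial strings will not match. It also returns true when at least one element was removed, not when real errors remain, so deciding whether to log would need a check such as !errorArrayList.isEmpty() afterwards. The sketch below contrasts removeAll() with removeIf() plus a substring test; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class FilterErrors {
    private FilterErrors() {}

    // removeAll: drops only errors that are exactly equal (String.equals) to a known error.
    static List<String> withoutExactKnown(List<String> errors, List<String> knownErrors) {
        List<String> copy = new ArrayList<>(errors);
        copy.removeAll(knownErrors);
        return copy;
    }

    // removeIf: drops errors that contain any known fragment, so partial strings work.
    static List<String> withoutKnownFragments(List<String> errors, List<String> fragments) {
        List<String> copy = new ArrayList<>(errors);
        copy.removeIf(err -> fragments.stream().anyMatch(err::contains));
        return copy;
    }
}
```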
I'm trying to write a loop that will find all the instances of "${arbitraryTextHere}" in an input string. For example:
someText${findMe}moreText${findMeToo}EvenMoreText${DontForgetMe}
Here is my code:
Pattern placeholderPattern = Pattern.compile("\\$\\{\\w+\\}"); // \w already covers digits; the original [\\w|\\d] also matched a literal '|'
Matcher placeholderMatcher = placeholderPattern.matcher(templateString);
int workingIndex = 0;
while (placeholderMatcher.find()) {
    workingIndex = placeholderMatcher.start();
}
Note: The templateString I'm testing this with is "SomeString ${someProp}".
The strange thing is that .find() has to return true to get inside the loop, but then .start() throws an IllegalStateException. This is strange because .start() only throws an IllegalStateException if the matcher's internal first variable is less than 0. However, .find(), via the Matcher's boolean search(int from) method, makes sure that first is zero or greater unless no match is found, and if no match is found, .find() returns false and we never enter the loop body.
So what exactly is going on here?
Update: If I put the above code directly into a single unit test, it works. So I think the problem is related to having it in a class whose method is called from the unit test, which is kind of weird. I'm going to dig into this aspect of the problem a bit more and then post an update.
Update: OK, well, I tried turning it off and on again (I restarted IntelliJ and recompiled my code) and now it's not broken anymore, so I must have messed something up in that department.
As per the last update on my question, restarting IntelliJ and recompiling my code fixed things.
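For reference, here is a self-contained version of the loop from the question, which behaves as expected once the build is consistent. It is a sketch: the Placeholders class and its methods are hypothetical, and it adds a capturing group so the placeholder name can be read with group(1).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Placeholders {
    // ${name}, where name is one or more word characters (letters, digits, underscore)
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    private Placeholders() {}

    // Placeholder names in order of appearance.
    static List<String> names(String template) {
        List<String> result = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(template);
        while (m.find()) {
            // group()/start() are valid here because find() just returned true
            result.add(m.group(1));
        }
        return result;
    }

    // Start index of each placeholder (the position of its '$').
    static List<Integer> starts(String template) {
        List<Integer> result = new ArrayList<>();
        Matcher m = PLACEHOLDER.matcher(template);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }
}
```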
I saw the following code in this commit for MongoDB's Java Connection driver, and it appears at first to be a joke of some sort. What does the following code do?
if (!((_ok) ? true : (Math.random() > 0.1))) {
    return res;
}
(EDIT: the code has been updated since posting this question)
After inspecting the history of that line, my main conclusion is that there has been some incompetent programming at work.
That line is gratuitously convoluted. The general form
a ? true : b
for boolean a, b is equivalent to the simple
a || b
The surrounding negation and excessive parentheses convolute things further. Keeping in mind De Morgan's laws, it is a trivial observation that this piece of code amounts to
if (!_ok && Math.random() <= 0.1)
    return res;
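The simplification can be checked exhaustively, since only two booleans are involved. In this sketch randomAbove stands for the value of Math.random() > 0.1, so its negation corresponds to Math.random() <= 0.1; the class name is hypothetical.

```java
final class TernaryEquivalence {
    private TernaryEquivalence() {}

    // The guard from the commit: return early when this is true.
    static boolean original(boolean ok, boolean randomAbove) {
        return !((ok) ? true : randomAbove);
    }

    // a ? true : b == a || b, and !(a || b) == !a && !b (De Morgan).
    static boolean simplified(boolean ok, boolean randomAbove) {
        return !ok && !randomAbove;
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        for (boolean ok : values) {
            for (boolean r : values) {
                System.out.printf("ok=%b random>0.1=%b -> %b / %b%n",
                        ok, r, original(ok, r), simplified(ok, r));
            }
        }
    }
}
```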
The commit that originally introduced this logic had
if (_ok == true) {
    _logger.log( Level.WARNING , "Server seen down: " + _addr, e );
} else if (Math.random() < 0.1) {
    _logger.log( Level.WARNING , "Server seen down: " + _addr );
}
This is another example of incompetent coding, but notice the reversed logic: here the event is logged either when _ok is true or in 10% of the other cases, whereas the code in the question returns early in 10% of the other cases and logs in the remaining 90%. So the later commit ruined not only clarity, but correctness itself.
I think in the code you have posted we can actually see how the author intended to transform the original if-then, somewhat literally, into the negation required for the early-return condition. But then he messed up and inserted an effective "double negative" by also reversing the inequality sign.
Coding style issues aside, stochastic logging is quite a dubious practice in itself, especially since the log entry does not document its own peculiar behavior. The intention is, obviously, to reduce restatements of the same fact: that the server is currently down. The appropriate solution is to log only changes of the server state, not every observation of it, let alone a random selection of 10% of those observations. Yes, that takes a little more effort, so let's see some.
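A minimal sketch of logging only state changes, assuming java.util.logging rather than the driver's own logger; ServerStateTracker and observe() are hypothetical names, not part of the MongoDB driver.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class ServerStateTracker {
    private static final Logger LOGGER = Logger.getLogger(ServerStateTracker.class.getName());

    private final String address;
    private Boolean lastUp; // null until the first observation

    ServerStateTracker(String address) {
        this.address = address;
    }

    // Records one observation and logs only when the state differs from the previous one.
    // Returns true if this observation was logged.
    synchronized boolean observe(boolean up) {
        if (lastUp != null && lastUp == up) {
            return false; // same state as last time: nothing new to report
        }
        lastUp = up;
        if (up) {
            LOGGER.info("Server up: " + address);
        } else {
            LOGGER.log(Level.WARNING, "Server seen down: " + address);
        }
        return true;
    }
}
```

A long outage then produces one warning when it starts and one info line when it ends, instead of a sampled stream of identical warnings.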
I can only hope that all this evidence of incompetence, accumulated from inspecting just three lines of code, does not speak fairly of the project as a whole, and that this piece of work will be cleaned up ASAP.
https://github.com/mongodb/mongo-java-driver/commit/d51b3648a8e1bf1a7b7886b7ceb343064c9e2225#commitcomment-3315694
Commit comment by gareth-rees:
Presumably the idea is to log only about 1/10 of the server failures (and so avoid massively spamming the log), without incurring the cost of maintaining a counter or timer. (But surely maintaining a timer would be affordable?)
Add a class member initialized to -1:
private int logit = -1;
In the try block, make the test:
if( !ok && (logit = (logit + 1 ) % 10) == 0 ) { //log error
This always logs the first error, then every tenth subsequent error. Logical operators "short-circuit", so logit only gets incremented on an actual error.
If you want to log the first and every tenth error across all connections, make logit a static field instead of an instance member.
As has been noted, this should be thread-safe:
private synchronized int getLogit() {
    return (logit = (logit + 1 ) % 10);
}
In the try block, make the test:
if( !ok && getLogit() == 0 ) { //log error
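As an alternative to the synchronized getter, the same counter can be kept in an AtomicInteger. This is a swapped-in technique, not the answer's code; EveryNth and shouldLog() are hypothetical names. Like the answer's version, it starts at -1 and returns true on the 1st, 11th, 21st, ... call.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class EveryNth {
    private final int n;
    private final AtomicInteger counter = new AtomicInteger(-1);

    EveryNth(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    // Thread-safe without synchronized: updateAndGet applies (x + 1) % n atomically.
    // The update function has no side effects, so a retry under contention is harmless.
    boolean shouldLog() {
        return counter.updateAndGet(x -> (x + 1) % n) == 0;
    }
}
```

It would be used as if (!ok && sampler.shouldLog()) { ... }, where sampler is a field holding new EveryNth(10); short-circuiting still means the counter only advances on an actual error.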
Note: I don't think throwing out 90% of the errors is a good idea.
I have seen this kind of thing before.
There was a piece of code that could answer certain 'questions' that came from another 'black box' piece of code. In the case it could not answer them, it would forward them to another piece of 'black box' code that was really slow.
So sometimes previously unseen new 'questions' would show up, and they would show up in a batch, like 100 of them in a row.
The programmer was happy with how the program was working, but he wanted some way of maybe improving the software in the future, if possible new questions were discovered.
So the solution was to log unknown questions, but as it turned out, there were thousands of different ones. The logs got too big, and there was no benefit in speeding these up, since they had no obvious answers. But every once in a while, a batch of questions would show up that could be answered.
Since the logs were getting too big, and the logging was getting in the way of logging the really important things, he arrived at this solution:
Only log a random 5%; this cleans up the logs while, in the long run, still showing which questions/answers could be added.
So, if an unknown event occurred, in a random amount of these cases, it would be logged.
I think this is similar to what you are seeing here.
I did not like this way of working, so I removed this piece of code and just logged these messages to a different file, so they were all present but did not clobber the general log file.
I want to test something for a while, say 5 seconds, and then pass the test if nothing wrong has been asserted. Is this possible with annotations? Can something like @Test(uptime=5000) be used?
Revised answer after question was edited
Fundamentally it feels like you're testing the wrong thing here - it seems very odd for "nothing happening" to be a sign of success.
If you want to prove that your algorithm can run for a certain amount of time without failing, I would actually extract a single cycle and then write a test along these lines:
import java.util.concurrent.TimeUnit;
import org.junit.Test;
@Test
public void fineForFiveSeconds() {
    long start = System.nanoTime();
    long end = start + TimeUnit.SECONDS.toNanos(5);
    while (System.nanoTime() < end) {
        // test: the object under test; executeOneIteration() runs one cycle of the algorithm
        test.executeOneIteration();
    }
}
This way you don't have a separate thread which has to kill the working code, etc.
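The same idea can be factored into a small helper that does not depend on JUnit. This is a sketch with hypothetical names (TimeBoxed, runFor); any exception thrown by an iteration, including a failed assertion, propagates and fails the surrounding test.

```java
import java.time.Duration;

final class TimeBoxed {
    private TimeBoxed() {}

    // Calls iteration.run() repeatedly until the duration has elapsed.
    // Exceptions from an iteration propagate unchanged; returns the number of completed iterations.
    static long runFor(Duration duration, Runnable iteration) {
        long end = System.nanoTime() + duration.toNanos();
        long count = 0;
        while (System.nanoTime() < end) {
            iteration.run();
            count++;
        }
        return count;
    }
}
```

Inside a test method this becomes TimeBoxed.runFor(Duration.ofSeconds(5), test::executeOneIteration), again with no watchdog thread involved.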
Original answer
This answer was written before the question indicated that timing out was a sign of success, not failure.
I think you just want the timeout attribute in the @Test annotation:
@Test(timeout = 5000)
with documentation:
Optionally specify timeout in milliseconds to cause a test method to fail if it takes longer than that number of milliseconds.
I know that entering a catch block has a significant cost when executing a program; however, I was wondering whether entering a try{} block also has any impact, so I started looking for an answer on Google and found many opinions, but no benchmarks at all. Some answers I found were:
Java try/catch performance, is it recommended to keep what is inside the try clause to a minimum?
Try Catch Performance Java
Java try catch blocks
However, they didn't answer my question with facts, so I decided to try it myself.
Here's what I did. I have a CSV file with this format:
host;ip;number;date;status;email;uid;name;lastname;promo_code;
where everything after status is optional and may not even have the corresponding ;, so when parsing, a check has to be done to see whether the value is there. This is where the try/catch question came to mind.
The current code that I inherited in my company does this:
StringTokenizer st = new StringTokenizer(line, ";");
String host = st.nextToken();
String ip = st.nextToken();
String number = st.nextToken();
String date = st.nextToken();
String status = st.nextToken();
String email = "";
try {
    email = st.nextToken();
} catch (NoSuchElementException e) {
    email = "";
}
It repeats the same pattern as for email with uid, name, lastname and promo_code.
I changed everything to:
if (st.hasMoreTokens()) {
    email = st.nextToken();
}
And in fact it performs faster when parsing a file that doesn't have the optional columns. Here are the average times:
--- Trying:122 milliseconds
--- Checking:33 milliseconds
However, here's what confused me and the reason I'm asking: when running the example with values for the optional columns in all 8000 lines of the CSV, the if() version still performs better than the try/catch version, so my question is:
Does the try block really have no performance impact on my code?
The average times for this example are:
--- Trying:105 milliseconds
--- Checking:43 milliseconds
Can somebody explain what's going on here?
Thanks a lot
Yes, try (in Java) does not have any performance impact. The compiler generates no extra bytecode instructions for entering a try block. It simply records the range of instructions covered by the try block and attaches this information to the method's exception table in the class file. Then, when an exception is thrown, the VM unwinds the stack and checks at each frame whether the program counter in that frame is in a relevant try block. This (together with building the stack trace) is quite costly, so catching is expensive. However, trying is free :).
Still, it is not good practice to use exceptions for regular control flow.
The reason why your code performs faster is probably that catching is so costly that it outweighs the time saved by replacing the check with a simple try.
Try/catch can be faster in code where the catch is rarely triggered, e.g., if you enter the try 10000 times but catch only once, the try version would be faster than the if-check. Still, this is not good style, and your approach of explicitly checking for more tokens is preferable.
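The hasMoreTokens() check from the question can be factored into a helper so that each optional column is a single call. This is a sketch with hypothetical names (OptionalFields, nextOrEmpty, parse). Note that StringTokenizer treats consecutive ';' characters as one delimiter, so an empty optional field in the middle of a line shifts the later columns, exactly as in the original code.

```java
import java.util.StringTokenizer;

final class OptionalFields {
    private OptionalFields() {}

    // Next token, or "" when the line has no more tokens (no exception involved).
    static String nextOrEmpty(StringTokenizer st) {
        return st.hasMoreTokens() ? st.nextToken() : "";
    }

    // Returns host, ip, number, date, status, email, uid, name, lastname, promo_code.
    static String[] parse(String line) {
        StringTokenizer st = new StringTokenizer(line, ";");
        String[] fields = new String[10];
        for (int i = 0; i < 5; i++) {
            fields[i] = st.nextToken(); // mandatory columns
        }
        for (int i = 5; i < fields.length; i++) {
            fields[i] = nextOrEmpty(st); // optional columns
        }
        return fields;
    }
}
```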