Duplication Criteria in Sonar - Java

I have followed the link below, which is for JavaScript:
Sonarqube: Is it possible to adapt duplication metric for javascript code?
I have done the same for my Java project.
According to that answer, if we wish to change the duplication criterion (by default 10 lines), we have to add one line to the sonar-project.properties file stored in the project:
sonar.projectKey=Test
sonar.projectName=Test
sonar.projectVersion=1.0
sonar.sources=src
sonar.language=java
sonar.sourceEncoding=UTF-8
sonar.cpd.java.minimumLines=5
But it's not working for Java. Is there anything else I need to configure?

Per SonarQube's Duplications documentation:
A piece of code is considered duplicated as soon as there are at least 100 successive and duplicated tokens (can be overridden with property sonar.cpd.${language}.minimumTokens) spread on at least 10 lines of code (can be overridden with property sonar.cpd.${language}.minimumLines). For Java projects, the duplication detection mechanism behaves slightly differently. A piece of code is considered as duplicated as soon as there is the same sequence of 10 successive statements whatever the number of tokens and lines. This threshold cannot be overridden.
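For languages where these thresholds are configurable (which, per the quote above, excludes Java), the override goes into the same analysis properties file. A rough illustration for a JavaScript project; the js language key below is an assumption about the JavaScript plugin, not something stated here:

# hypothetical excerpt for a JavaScript project
sonar.cpd.js.minimumTokens=50
sonar.cpd.js.minimumLines=5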

Related

Is exitEveryRule the best place to capture rule data (source Line) in Antlr4?

I'm trying to grab some data about the source code file and the node. If I want to capture the source line number (int line = ctx.getStart().getLine();), and I put it in exitEveryRule, am I correct in thinking that this will execute each time a rule ends? I saw another post that seemed to indicate this method was more for errors.
I started to add code to each of the rules, but in Java, that's a lot of rules. It seems like I should simply use this and be done. But before I delete and retry, are there any issues with that approach?
It did in fact work, but there were some interesting dependencies on the grammar. It worked as a first effort, but we found we needed to extend the grammar in a few minor ways to get a consistent output. Most of these were "intermediate" steps so we could produce a consistent result. So it does work, and it does what you would expect, but in the end we needed to extend it a bit anyway.
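For reference, a minimal sketch of this approach: exitEveryRule(ParserRuleContext) is declared on ANTLR4's ParseTreeListener and is invoked once every time any rule finishes, right after the rule-specific exit method. The JavaBaseListener and JavaParser names below stand in for whatever listener and parser your grammar actually generates:

import org.antlr.v4.runtime.ParserRuleContext;

public class LineCapturingListener extends JavaBaseListener {  // hypothetical generated base listener
    @Override
    public void exitEveryRule(ParserRuleContext ctx) {
        // Called once per rule exit, whatever the rule, so one override covers the whole grammar.
        int line = ctx.getStart().getLine();
        String ruleName = JavaParser.ruleNames[ctx.getRuleIndex()];  // hypothetical generated parser
        System.out.println("exited " + ruleName + " starting at line " + line);
    }
}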

Work out Analyzer, Version, etc. from Lucene index files?

Just double-checking on this: I assume it is not possible, and that if you want to keep such info bundled up with the index files in your index directory, you have to work out a way to do it yourself.
Obviously you might be using different Analyzers for different directories, and 99% of the time it is pretty important to use the right one when constructing a QueryParser: if your QP has a different one, all sorts of inaccuracies might crop up in the results.
Equally, using the wrong Version with the index files might, for all I know, not result in complete failure: again, you might instead get inaccurate results.
I wonder whether the Lucene people have ever considered bundling up this sort of info with the index files? Equally I wonder if anyone knows whether any of the Lucene derivative apps, like Elasticsearch, maybe do incorporate such a mechanism?
Actually, just looking inside the "_0" files (_0.cfe, _0.cfs and _0.si) of an index, all 3 do actually contain the word "Lucene" seemingly followed by version info. Hmmm...
PS: other related thoughts which occur: say you are indexing a text document of some kind (or 1000 documents)... and you want to keep your index up to date each time it is opened. One obvious way to do this would be to compare the last-modified date of individual files with the last time the index was updated: any documents which are now out of date would need to have the info pertaining to them removed from the index and then be re-indexed.
This need must occur all the time in connection with Lucene indices. How is it generally tackled in the absence of helpful "meta info" included in with the index files proper?
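One common way this is tackled (a sketch, not something from this discussion): store the path, and optionally the modification time, in each document and call IndexWriter.updateDocument when the file on disk is newer than the last indexing run. This assumes Lucene 5.x and a "path" field used as the unique key:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class IncrementalIndexer {
    // Re-index a file only if it changed since the last indexing run.
    public static void indexIfStale(IndexWriter writer, Path file, long lastIndexTime) throws IOException {
        long modified = Files.getLastModifiedTime(file).toMillis();
        if (modified <= lastIndexTime) {
            return;  // still up to date, nothing to do
        }
        Document doc = new Document();
        doc.add(new StringField("path", file.toString(), Field.Store.YES));
        doc.add(new LongField("modified", modified, Field.Store.YES));
        doc.add(new TextField("contents", new String(Files.readAllBytes(file), StandardCharsets.UTF_8), Field.Store.NO));
        // updateDocument deletes any existing document with the same "path" term, then adds the new one.
        writer.updateDocument(new Term("path", file.toString()), doc);
    }
}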
For anyone interested in this issue:
It does appear from what I said that the Version is contained in the index files. I looked at the CheckIndex class and the various info you can get from it, e.g. CheckIndex.Status.SegmentInfoStatus, without finding a way to obtain the Version. I'm starting to assume this is deliberate, and that the idea is just to let Lucene handle the updating of the index as required. Not an entirely satisfactory state of affairs if so...
As for getting other things, such as the Analyzer class, it appears you have to implement this sort of "metadata" stuff yourself if you want to... this could be done by just including a text file in with the other files, or alternatively by storing it in the index's commit user data. Of course your Version could also be stored this way.
For writing such info, see IndexWriter.setCommitData().
For retrieving such info, you have to use one of several (?) subclasses of IndexReader, such as DirectoryReader.
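A rough sketch of that approach, assuming Lucene 5.x, where IndexWriter.setCommitData takes a Map<String,String> and DirectoryReader.getIndexCommit().getUserData() returns it; the keys and values stored here are only illustrations:

import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class IndexMetadataExample {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(Paths.get("index"));

        // Writing: attach the analyzer name and an application-level version to the next commit.
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Map<String, String> userData = new HashMap<>();
            userData.put("analyzer", StandardAnalyzer.class.getName());
            userData.put("appIndexVersion", "1.0");
            writer.setCommitData(userData);
            writer.commit();
        }

        // Reading: the user data travels with the commit point the reader was opened on.
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            Map<String, String> userData = reader.getIndexCommit().getUserData();
            System.out.println("analyzer = " + userData.get("analyzer"));
        }
    }
}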

How to compare all the variables in two different Eclipse sessions

I am presently trying to find a bug in a long Java code base. This code reads an input text file, runs many CFD simulations, and then collects and analyzes the results. In the input file there is a flag that only changes the order in which a bunch of completely independent threads is run. However, if I change this flag, the Java program crashes after all the simulations have run, while it should actually behave the same.
In order to debug this, I opened two different Eclipse sessions and ran the code in each one. The only difference between the two sessions is the value of the flag mentioned above. I have set a breakpoint on a line after all 37 threads (i.e. simulations) have run and their results have been saved to file. That line is in a subroutine where only the main thread is active (all the other threads have stopped) and it collects and further processes the results of the 37 simulations.
I would like a way to quickly compare all the variables (in the present scope) of one Eclipse session with the other, in order to find possible differences. Saving them to a text file and comparing the two text files would do the trick for me. Do you know how I can do this, if it is even possible at all? Any other working solution is also welcome.
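One low-tech way to get such text files, independent of Eclipse, is to dump the relevant state at that breakpoint line in each run and diff the two files afterwards. A hedged sketch using reflection; the StateDump class and dumpFields method are hypothetical helpers, not an existing API:

import java.io.PrintWriter;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

public final class StateDump {
    // Write all instance fields of 'target' to 'path', sorted by name,
    // so the files produced by the two runs can be compared with a diff tool.
    public static void dumpFields(Object target, String path) throws Exception {
        try (PrintWriter out = new PrintWriter(path)) {
            Field[] fields = target.getClass().getDeclaredFields();
            Arrays.sort(fields, Comparator.comparing(Field::getName));
            for (Field f : fields) {
                if (Modifier.isStatic(f.getModifiers())) {
                    continue;  // skip statics, which are shared rather than per-run state
                }
                f.setAccessible(true);
                out.println(f.getName() + " = " + f.get(target));
            }
        }
    }
}

Calling StateDump.dumpFields(this, "run_A.txt") at the breakpoint (for example from Eclipse's Display view) in each session produces two files that can be diffed directly.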

How to exclude numbers from Lucene Indexing?

I am working on an information retrieval application using Lucene 5.3.1 (the latest as of now). I managed to index the terms from a text file and then search within it. The text file happens to contain chapter numbers like 2.1, 3.4.2 and so on.
The problem is that I don't need these numbers indexed, as I have no need to search for them, and I haven't been able to find out how to exclude certain terms from tokenizing. I know the Analyzer uses the stop-words set to exclude several terms, but it doesn't do anything with numbers as far as I know.
The simplest answer I can come up with – remove numbers from the text before indexing. You can use regular expressions for that. This solution has one side effect – PositionIncrementAttribute will be calculated without those numbers, as they do not appear in the text. This can break some of your PhraseQueries.
Another option, as was already mentioned – write a custom TokenFilter to strip numbers out (a sketch follows this list). But you should remember:
to tune the Analyzer so it does not split terms on dots, otherwise 2.1 will be two terms instead of one. This again can cause problems with PhraseQuery;
to correctly change the value of PositionIncrementAttribute (increment it) while removing terms from the TokenStream.
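A minimal sketch of such a filter, assuming Lucene 5.x and building on FilteringTokenFilter, which already does the PositionIncrementAttribute bookkeeping for dropped tokens; the chapter-number regex is only an illustration:

import java.io.IOException;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.FilteringTokenFilter;

public class NumberStrippingFilter extends FilteringTokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

    public NumberStrippingFilter(TokenStream in) {
        super(in);
    }

    @Override
    protected boolean accept() throws IOException {
        // Keep the token unless it consists only of digits and dots, e.g. "2.1" or "3.4.2".
        return !termAtt.toString().matches("[0-9]+(\\.[0-9]+)*");
    }
}

The filter would then be wired into a custom Analyzer's createComponents, after whatever tokenizer is chosen, so that dotted numbers arrive as single tokens.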

Estimate unit tests required in large code base

Our team is responsible for a large codebase containing legal rules.
The codebase works mostly like this:
class SNR_15_UNR extends Rule {
    public double getValue(RuleContext context) {
        double snr_15_ABK = context.getValue(SNR_15_ABK.class);
        double UNR = context.getValue(GLOBAL_UNR.class);
        if (UNR <= 0) // if UNR value would reduce snr, apply the reduction
            return snr_15_ABK + UNR;
        return snr_15_ABK;
    }
}
When context.getValue(Class<? extends Rule>) is called, it just evaluates the specific rule and returns the result. This allows you to create a dependency graph while a rule is evaluating, and also to detect cyclic dependencies.
There are about 500 rule classes like this. We now want to implement tests to verify the correctness of these rules.
Our goal is to implement a testing list as follows (a rough JUnit sketch of these scenarios is given after the list):
TEST org.project.rules.SNR_15_UNR
INPUT org.project.rules.SNR_15_ABK = 50
INPUT org.project.rules.UNR = 15
OUTPUT SHOULD BE 50
TEST org.project.rules.SNR_15_UNR
INPUT org.project.rules.SNR_15_ABK = 50
INPUT org.project.rules.UNR = -15
OUTPUT SHOULD BE 35
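For a sense of what each entry in that list costs in code, here is a hedged sketch of the two scenarios above as plain JUnit tests. It assumes a RuleContext that can be pre-loaded with fixed values; the setValue method is hypothetical, since only getValue appears in the question:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class SNR_15_UNR_Test {
    @Test
    public void noReductionWhenUnrPositive() {
        RuleContext context = new RuleContext();
        context.setValue(SNR_15_ABK.class, 50.0);   // hypothetical seeding method
        context.setValue(GLOBAL_UNR.class, 15.0);
        assertEquals(50.0, new SNR_15_UNR().getValue(context), 1e-9);
    }

    @Test
    public void reductionAppliedWhenUnrNegative() {
        RuleContext context = new RuleContext();
        context.setValue(SNR_15_ABK.class, 50.0);
        context.setValue(GLOBAL_UNR.class, -15.0);
        assertEquals(35.0, new SNR_15_UNR().getValue(context), 1e-9);
    }
}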
The question is: how many test scenarios are needed? Is it possible to use static code analysis to detect how many unique code paths exist throughout the code? Does any such tool exist, or do I have to start mucking about with Eclipse JDT?
For clarity: I am not looking for code coverage tools. These tell me which code has been executed and which code was not. I want to estimate the development effort required to implement unit tests.
(EDIT 2/25, focused on test-coding effort):
You have 500 sub-classes, and each appears (based on your example with one conditional) to have 2 cases. I'd guess you need 500*2 tests.
If your code is not as regular as you imply, a conventional (branch) code coverage tool might not be the answer you think you want as a starting place, but it might actually help you make an estimate. Write T (< 50) tests across randomly chosen classes, and collect code coverage data P (as a percentage) over whatever part of the code base you think needs testing (particularly your rule classes). Then you need roughly (1-P)*100*T tests.
If your extended classes are all as regular as you imply, you might consider generating them. If you trust the generation process, you might be able to avoid writing the tests.
(ORIGINAL RESPONSE, focused on path coverage tools)
Most code coverage tools are "line" or "branch" coverage tools; they do not count unique paths through the code. At best they count basic blocks.
Path coverage tools do exist; people have built them for research demos, but commercial versions are relatively rare. You can find one at http://testingfaqs.org/t-eval.html#TCATPATH. I don't think this one handles Java.
One of the issues is that the number of apparent paths through code is generally exponential in the number of decisions, since each decision encountered generates a true path and a false path based on the outcome of the conditional (1 decision --> 2 paths, 2 decisions --> 4 paths, ...). Worse, loops are in effect a decision repeated as many times as the loop iterates; a loop that repeats 100 times in effect has 2**100 paths. To control this problem, the more interesting path coverage tools try to determine the feasibility of a path: if the symbolically combined predicates from the conditionals in a prefix of that path are effectively false, the path is infeasible and can be ignored, since it cannot really occur. Another standard trick is to treat loops as 0, 1, and N iterations to reduce the number of apparent paths. Managing the number of paths requires rather a lot of machinery, considerably more than what most branch-coverage test tools need, which helps explain why real path coverage tools are rare.
how many test scenarios are needed?
Many. 500 might be a good start.
Is it possible to use static code analysis to detect how many unique code paths exist throughout the code?
Yes. It's called a code coverage tool. Here are some free ones: http://www.java-sources.com/open-source/code-coverage
