Remove comments from a java file and maintain file structure


I am working on a project that requires me to remove comments from a Java file. Currently, I am using the regular expression
(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)
which I got from http://ostermiller.org/findcomment.html.
The regular expression works well, but the problem is that I need to preserve the file structure when I remove the comments. In other words, if I have a 3-line block comment, I need it to be replaced with 3 blank lines, so that the code stays on the same line numbers as in the original.
How would I replace the 3-line block comment with 3 blank lines?
Edit:
I was able to solve my problem by making use of SableCC.

I haven't fully sussed out what that regex is doing, but if it matches the entire comment, then you can get the matched comment, check to see how many newlines it contains, and then replace the match with that many newlines instead of replacing it with the empty string.
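A minimal sketch of that idea, assuming Java 11+ (for String.repeat) and reusing the ostermiller.org pattern from the question. The class and method names (StripComments, stripKeepingLines) are made up for illustration. Like any regex-only approach, it does not understand string literals, so a "//" or "/*" inside a string would be treated as a comment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StripComments {
    // Pattern from the question (ostermiller.org): block comments | line comments.
    // Assumption: no "//" or "/*" appears inside string or char literals.
    private static final Pattern COMMENT =
            Pattern.compile("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)");

    // Replaces every comment with as many newline characters as it contained.
    static String stripKeepingLines(String source) {
        Matcher m = COMMENT.matcher(source);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // A // comment never contains a newline (. does not match it), so it becomes "";
            // a block comment spanning N line breaks becomes N newlines.
            long newlines = m.group().chars().filter(c -> c == '\n').count();
            m.appendReplacement(out, Matcher.quoteReplacement("\n".repeat((int) newlines)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "int a = 1; // one\n/* two\n   three\n   four */\nint b = 2;\n";
        String result = stripKeepingLines(src);
        System.out.print(result);
        // Same number of lines before and after the replacement
        System.out.println(src.split("\n", -1).length == result.split("\n", -1).length);
    }
}
```

Using appendReplacement/appendTail rather than replaceAll lets the replacement depend on each individual match, which is exactly what counting the newlines requires.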

If you're set on using a regex, you can try this:
~/(?:/.*?$|\*[^*]*\*/)~
This uses a single non-capturing group with two alternatives.
Since all comments (single-line and multi-line) have to start with a /, that's the first character of the regex. A comment then continues with either another / or a *, which is where the alternation comes in. The first alternative, /.*?$, handles single-line comments, while the second, \*[^*]*\*/, matches multi-line comments.
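A hedged sketch of using this pattern from Java. The ~ characters are PCRE-style delimiters and are dropped, and (?m) is added: without MULTILINE mode, Java's $ only matches at the end of the whole input, so the single-line alternative would miss every // comment except one on the last line. The class and method names are hypothetical. The example also shows a limitation of [^*]*: a comment that contains a * inside it, such as a Javadoc-style /** ... */ comment, is not matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindComments {
    // The answer's regex without the ~ delimiters; (?m) lets $ match at every line end
    private static final Pattern COMMENT = Pattern.compile("(?m)/(?:/.*?$|\\*[^*]*\\*/)");

    static List<String> findComments(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String src = "int a; // trailing\n/* block\n   comment */ int b;\n/** doc * star */ int c;\n";
        // Finds the // comment and the plain block comment; the /** ... */ comment
        // is skipped because [^*]* cannot step over its inner * characters
        for (String c : findComments(src)) {
            System.out.println("[" + c + "]");
        }
    }
}
```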
If your multi-line comments are formatted with leading * followed by a <space>, like this:
/* mu
* lti
* line
* comment
*/
then this DEMO should do the trick (I don't think a line can start with a * in Java, unless it's in a comment).
Unfortunately, I have not found a suitable substitution to preserve line spacing if they are not formatted as above.

Related

Regex for adding a word to a specific line if line does not contain the word

I have a YAML file with multiple lines and I know there's one line that looks like this:
...
schemas: core,ext,plugin
...
Note that there is an unknown number of whitespace characters at the beginning of this line (because YAML). The line can be uniquely identified by the schemas: key. The number of existing values for the schemas property is unknown, but greater than zero. I do not know what these values are, except that one of them might be foo.
I would like to use a regex match-and-replace to append the word ,foo to this line if foo is not already contained in the list of values at any position. foo might appear on any other line but I want to ignore these instances. I don't want the other lines to be modified.
I've tried different regular expressions with lookarounds and capture groups, but none did the job. My latest attempt that looked promising at first was:
(?s)(?!.*foo)(.*schemas:.*)
But this does not match if foo is contained on any other line, which is not what I want.
Any assistance would be very much appreciated. Thanks.
(I use the Java regex engine, btw.)

Would this work?
^(?!.*foo)(\s*schemas:.*)$
If you want lines containing words like food or fool (but not foo itself) to still match, you can use this:
^(?!.*(?:foo\s*$|foo,))(\s*schemas:.*)$
Replacement:
$1,foo
If I understood your question correctly, you want to make sure only one line is checked for the negative lookahead. This should accomplish that. I tested it on https://regex101.com/ using the Java 8 engine. You can also check what each operator does there.
Explanation:
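Wrapping the expression with
^...$
makes sure that only one line is considered at a time. In Java this relies on MULTILINE mode ((?m) or Pattern.MULTILINE); without it, ^ and $ match only at the start and end of the whole input.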
The negative lookahead
(?!.*(?:foo\s*$|foo,))
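looks for any "foo" followed either by optional whitespace and the end of the line, or by a comma, within this line. If you want to make the expression faster, you could probably turn the lookahead into a lookbehind, so that the simpler check for "schemas:" comes first. However, I don't know whether this actually improves performance.
^(\s*schemas:.*)(?<!(?:foo\s?$|foo,))$
Java lookbehinds must have a bounded maximum length, so the * quantifier can't be used there; with \s? instead, the regex would still match if foo is followed by more than one whitespace character. Also note that a lookbehind placed just before $ only inspects the last few characters of the line, so it only detects foo when it is the last value (schemas: core,foo) and not when it appears earlier (schemas: foo,core).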
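A hedged sketch of applying the second regex and the $1,foo replacement from Java. The (?m) prefix is an addition: it turns on MULTILINE mode so that ^ and $ match at each line boundary (regex101 exposes the same option as its m flag). The class and method names are hypothetical.

```java
public class AppendFoo {
    // The answer's regex with (?m) added so ^ and $ match at each line boundary.
    // Without DOTALL, the lookahead's .* cannot run past the end of the current line.
    private static final String REGEX = "(?m)^(?!.*(?:foo\\s*$|foo,))(\\s*schemas:.*)$";

    static String appendFoo(String yaml) {
        // $1 is the whole schemas line; ",foo" is appended to it
        return yaml.replaceAll(REGEX, "$1,foo");
    }

    public static void main(String[] args) {
        System.out.print(appendFoo("a:\n  schemas: core,ext,plugin\n  other: foo\n"));
        System.out.print(appendFoo("  schemas: core,foo,ext\n")); // already contains foo: unchanged
        System.out.print(appendFoo("  schemas: core,food\n"));    // food is not foo: ,foo appended
    }
}
```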

Java regex plus at the beginning is optional

I would like to write a Java regex where a plus sign at the beginning is optional.
I tried this, but it is not working correctly:
[+]+[0-9]{3,}
I want both +123 and 123 to be valid.
What am I doing wrong?

As Hamza commented below, use [+]?[0-9]{3,}. A question mark means zero or one of the preceding element, which here means an optional + before the three or more digits. Your [+]+ means one or more plus signs, so the + was mandatory.
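A quick way to check the suggested pattern in Java. Matcher.matches() requires the whole string to match, so no ^ or $ anchors are needed. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class OptionalPlus {
    // [+]? : zero or one plus sign; [0-9]{3,} : three or more digits
    private static final Pattern NUMBER = Pattern.compile("[+]?[0-9]{3,}");

    static boolean isValid(String s) {
        // matches() succeeds only if the entire input matches the pattern
        return NUMBER.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"+123", "123", "++123", "+12"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```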

ignoring multiple line comments in java

I am writing a program where I need to ignore comments in the file passed.
I read about regex patterns for this and am able to ignore single-line comments (//...) and multi-line comments if they are written on a single line (/*...*/).
But I am having difficulty ignoring multi-line comments like the one shown below:
/* ........
..........
....*/
For single-line comments I used
"//.*$"
and for the second case,
"/\\*.*\\*/"
Somewhere I even read that a reluctant quantifier would be helpful. I tried different patterns using a reluctant quantifier, but no joy.
Can someone help me with this..?
Thanks

I think this will work for you:
/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/
or, for both single-line and multi-line comments:
(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|(//.*)
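To use the combined pattern from Java, every backslash has to be doubled in the string literal. A minimal sketch (class and method names hypothetical) that simply deletes the comments; like other regex-only solutions, it does not know about string literals, so something like "http://..." inside a string would be cut off.

```java
public class RemoveComments {
    // The answer's combined pattern, with every backslash doubled for the Java literal
    static final String COMMENTS =
            "(/\\*([^*]|[\\r\\n]|(\\*+([^*/]|[\\r\\n])))*\\*+/)|(//.*)";

    static String removeComments(String source) {
        // Deletes each block comment and each // comment (up to the end of its line)
        return source.replaceAll(COMMENTS, "");
    }

    public static void main(String[] args) {
        String src = "int a; // one\n/* two\n * three\n */\nint b;\n";
        System.out.print(removeComments(src));
    }
}
```

As a side note, [^*] already matches \r and \n, so the [\r\n] alternatives are redundant; because they overlap with [^*] inside a repeated group, an unterminated /* with many line breaks could cause heavy backtracking.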

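The . normally does not match line terminators such as \r and \n.
You can, however, use the Pattern.DOTALL flag, which changes this behaviour; it is equivalent to the embedded flag (?s). Use the reluctant quantifier .*? so that each match ends at the first */ instead of the last one in the file:
"(?s)/\\*.*?\\*/"
Without DOTALL, note that . inside a character class is a literal dot, so [.\r\n] only matches dots and line breaks; use an alternation instead:
"/\\*(?:.|[\r\n])*?\\*/"
This replacement should be done before handling single-line comments, and there . should not match a newline.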
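A sketch contrasting the greedy and reluctant forms under Pattern.DOTALL (the class and method names are hypothetical). The greedy .* runs to the last */ in the input and swallows any code between two comments, while .*? stops at the first */.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Pattern.DOTALL is the flag form of (?s): . also matches \r and \n
    private static final Pattern RELUCTANT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);
    private static final Pattern GREEDY = Pattern.compile("/\\*.*\\*/", Pattern.DOTALL);

    static String stripReluctant(String s) {
        return RELUCTANT.matcher(s).replaceAll("");
    }

    static String stripGreedy(String s) {
        return GREEDY.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String src = "/* a\n b */ int x; /* c */ int y;";
        System.out.println("[" + stripReluctant(src) + "]"); // each comment removed separately
        System.out.println("[" + stripGreedy(src) + "]");    // everything up to the last */ removed
    }
}
```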

Thanks all for the suggestions. However, I found one exact solution: the pattern
"(?s)/\*.+?\*/" works fine for removing multi-line comments.

You may use the following to solve your problem:
"(?s)/\\*.*?\\*/"
"/\\*(?:.|[\r\n])*?\\*/"

Replacing text in between certain symbols

I am given a string like this:
CSF#asomedatahere#iiwin#hnotwhatIwant
And I want to replace the text between #i and #h (h could be any character). This is what I have so far, and I feel that I am close; however, there may not always be a #CHAR after this #idata pattern.
(?<=#i)(.*)(?=#.*)
I would like it to also work when that part is missing. As the link below shows, it works for the first case but not the second. I tried adding a ? at the end to make the last part optional, but then it no longer works for the first case.
Here is a link that shows what is not working: http://fiddle.re/vtvmc

You need to extend the lookahead to also accept the end of the input, and make the capture reluctant (.*?) so that it stops at the first # after #i:
(?<=#i)(.*?)(?=#.*|$)
This matches iwin in each of these inputs:
CSF#asomedatahere#iiwin#hnotwhatIwant
CSF#asomedatahere#iiwin#h
CSF#asomedatahere#iiwin
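A minimal sketch of using this regex with String.replaceAll; the replacement text NEW and the class and method names are placeholders.

```java
public class ReplaceBetween {
    // Lookbehind (?<=#i) and lookahead (?=#.*|$) are zero-width, so only the text
    // between them is replaced; .*? stops at the first # after #i (or at the end)
    private static final String REGEX = "(?<=#i)(.*?)(?=#.*|$)";

    static String replaceBetween(String s, String replacement) {
        // replacement is used as-is; wrap it in Matcher.quoteReplacement if it may contain $ or \
        return s.replaceAll(REGEX, replacement);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "CSF#asomedatahere#iiwin#hnotwhatIwant",
            "CSF#asomedatahere#iiwin#h",
            "CSF#asomedatahere#iiwin"
        };
        for (String in : inputs) {
            System.out.println(replaceBetween(in, "NEW"));
        }
    }
}
```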

Regular expression- prefix and suffix without a specific string

I'd like your help with the following problem.
I'm trying to define a regular expression that represents a valid comment in Java.
For that I want a prefix "/*", then everything including newlines and tabs but not another "*/", then a suffix "*/".
I tried this one: "/\*"[^"\*/"]"\*/" but it does not work. It takes /*fdfsd */ */ as one valid comment
What should I do?

You can try
yourString.matches("/[*]((?![*]/).)*[*]/")
This matches /* at the start and */ at the end. In the middle, a negative lookahead checks that each character (represented by the dot) is not the * that begins */. Since . does not match line terminators by default, add the (?s) flag if the comment can span several lines. It involves a little backtracking, so performance could be improved, but for now it does the trick.
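A hedged sketch of the lookahead pattern with (?s) added, so that the comment may contain newlines and tabs as the question requires; the class and method names are made up.

```java
public class CommentCheck {
    // (?s) lets . match line terminators; ((?![*]/).)* consumes any character
    // that is not the * starting a */
    static boolean isComment(String s) {
        return s.matches("(?s)/[*]((?![*]/).)*[*]/");
    }

    public static void main(String[] args) {
        System.out.println(isComment("/*fdfsd */"));    // true
        System.out.println(isComment("/*fdfsd */ */")); // false: a */ appears before the end
        System.out.println(isComment("/* a\n\tb */"));  // true: spans lines, contains a tab
    }
}
```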
