Java: Removing comments from string


I'd like to write a function that takes a string and, if it contains an inline comment, removes it. I know it sounds pretty simple, but I want to make sure I'm doing this right. For example:
private String filterString(String code) {
    // let's say code = "some code //comment inside"
    // return the string "some code" (without the comment)
}
I thought of two ways (feel free to advise otherwise):
1. Iterating over the string, finding the double slash (//), and using the substring method.
2. Using a regex (I'm not so sure about this one).
Can you tell me which way is best and show me how it should be done? (Please don't suggest overly advanced solutions.)
Edit: can this be done somehow with a Scanner object? (I'm using this object anyway.)

If you want a more efficient regex that matches both kinds of comments (block and line), use this one:
replaceAll("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)","");
Source: http://ostermiller.org/findcomment.html
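As a quick check of what this pattern does, here is a minimal sketch that applies it to two small samples. The class name RegexCommentDemo, the helper strip and the sample strings are made up for illustration. The second call shows the main limitation of a purely regex-based approach: a // inside a string literal is treated as the start of a comment.

```java
public class RegexCommentDemo {
    // The pattern quoted above: a block-comment alternative, then a line-comment alternative.
    static final String COMMENT_REGEX =
            "(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)";

    // Hypothetical helper: returns the code with every match of the pattern removed.
    static String strip(String code) {
        return code.replaceAll(COMMENT_REGEX, "");
    }

    public static void main(String[] args) {
        // Line comment and block comment are both removed.
        System.out.println(strip("int a; // x\nint b; /* y */ int c;"));
        // Pitfall: the "//" inside the string literal is cut off as well.
        System.out.println(strip("String s = \"http://x\";"));
    }
}
```

If the input can contain string literals with // or /* in them, a state machine that tracks whether it is inside a string is the safer choice.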
EDIT:
Another solution, if you're not comfortable with regex, is to write a small automaton like the following:
public static String removeComments(String code){
final int outsideComment=0;
final int insideLineComment=1;
final int insideblockComment=2;
final int insideblockComment_noNewLineYet=3; // we want to have at least one new line in the result if the block is not inline.
int currentState=outsideComment;
String endResult="";
Scanner s= new Scanner(code);
s.useDelimiter("");
while(s.hasNext()){
String c=s.next();
switch(currentState){
case outsideComment:
if(c.equals("/") && s.hasNext()){
String c2=s.next();
if(c2.equals("/"))
currentState=insideLineComment;
else if(c2.equals("*")){
currentState=insideblockComment_noNewLineYet;
}
else
endResult+=c+c2;
}
else
endResult+=c;
break;
case insideLineComment:
if(c.equals("\n")){
currentState=outsideComment;
endResult+="\n";
}
break;
case insideblockComment_noNewLineYet:
if(c.equals("\n")){
endResult+="\n";
currentState=insideblockComment;
}
case insideblockComment:
while(c.equals("*") && s.hasNext()){
String c2=s.next();
if(c2.equals("/")){
currentState=outsideComment;
break;
}
c=c2; // only a '/' directly after a '*' closes the comment
}
}
}
s.close();
return endResult;
}

The best way to do this is to use regular expressions.
First remove the /* */ comments, then remove all // comments. For example:
private String filterString(String code) {
    // (?s) lets . match line breaks; .*? stops at the first closing */
    String partialFiltered = code.replaceAll("(?s)/\\*.*?\\*/", "");
    // without (?s), . stops at the line break, so this strips to the end of each line
    return partialFiltered.replaceAll("//.*", "");
}

Just use the replaceAll method from the String class, combined with a simple regular expression. Here's how to do it:
import java.util.*;
import java.lang.*;
class Main
{
public static void main (String[] args) throws java.lang.Exception
{
String s = "private String filterString(String code) {\n" +
" // lets say code = \"some code //comment inside\"\n" +
" // return the string \"some code\" (without the comment)\n}";
s = s.replaceAll("//.*?\n","\n");
System.out.println("s=" + s);
}
}
The key is the line:
s = s.replaceAll("//.*?\n","\n");
The regex //.*?\n matches from // up to and including the end of the line. Note that a comment on the last line is only matched if that line ends with a newline.
And if you want to see this code in action, go here: http://www.ideone.com/e26Ve
Hope it helps!

Using a regular-expression replacement just to find the text before a fixed substring is a bit much.
You can use indexOf() to find the position where the comment starts and substring() to keep the first part, something like:
String code = "some code // comment";
int offset = code.indexOf("//");
if (-1 != offset) {
code = code.substring(0, offset);
}
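Wrapped into the filterString method from the question, the indexOf/substring idea could look like the sketch below. The class name InlineCommentDemo is made up, and trimming the whitespace before the comment is an assumption based on the question's expected result "some code". Like the snippet above, it does not recognise a // inside a string literal.

```java
public class InlineCommentDemo {
    // Returns the part of the line before the first "//", without trailing whitespace.
    static String filterString(String code) {
        int offset = code.indexOf("//");
        if (offset == -1) {
            return code; // no inline comment: return the input unchanged
        }
        int end = offset;
        // Step back over the whitespace that separated the code from the comment.
        while (end > 0 && Character.isWhitespace(code.charAt(end - 1))) {
            end--;
        }
        return code.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + filterString("some code //comment inside") + "]");
    }
}
```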

@Christian Hujer has correctly pointed out that many or all of the solutions posted fail if the comment markers occur within a string.
@Loïc Gammaitoni suggests that his automaton approach could easily be extended to handle that case. Here is that extension.
enum State { outsideComment, insideLineComment, insideblockComment, insideblockComment_noNewLineYet, insideString };
public static String removeComments(String code) {
State state = State.outsideComment;
StringBuilder result = new StringBuilder();
Scanner s = new Scanner(code);
s.useDelimiter("");
while (s.hasNext()) {
String c = s.next();
switch (state) {
case outsideComment:
if (c.equals("/") && s.hasNext()) {
String c2 = s.next();
if (c2.equals("/"))
state = State.insideLineComment;
else if (c2.equals("*")) {
state = State.insideblockComment_noNewLineYet;
} else {
result.append(c).append(c2);
}
} else {
result.append(c);
if (c.equals("\"")) {
state = State.insideString;
}
}
break;
case insideString:
result.append(c);
if (c.equals("\"")) {
state = State.outsideComment;
} else if (c.equals("\\") && s.hasNext()) {
result.append(s.next());
}
break;
case insideLineComment:
if (c.equals("\n")) {
state = State.outsideComment;
result.append("\n");
}
break;
case insideblockComment_noNewLineYet:
if (c.equals("\n")) {
result.append("\n");
state = State.insideblockComment;
}
case insideblockComment:
while (c.equals("*") && s.hasNext()) {
String c2 = s.next();
if (c2.equals("/")) {
state = State.outsideComment;
break;
}
c = c2; // only a '/' directly after a '*' closes the comment
}
}
}
s.close();
return result.toString();
}

I made an open-source library (on GitHub) for this purpose. It's called CommentRemover, and it can remove single-line and multi-line Java comments.
It can either remove TODOs or leave them in place.
It also supports JavaScript, HTML, CSS, Properties, JSP and XML comments.
Here is a short snippet showing how to use it (there are two kinds of usage):
First way: InternalPath
public static void main(String[] args) throws CommentRemoverException {
// root dir is: /Users/user/Projects/MyProject
// example for startInternalPath
CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
.removeJava(true) // Remove Java file Comments....
.removeJavaScript(true) // Remove JavaScript file Comments....
.removeJSP(true) // etc.. goes like that
.removeTodos(false) // Do Not Touch Todos (leave them alone)
.removeSingleLines(true) // Remove single line type comments
.removeMultiLines(true) // Remove multiple type comments
.startInternalPath("src.main.app") // Starts from {rootDir}/src/main/app , leave it empty string when you want to start from root dir
.setExcludePackages(new String[]{"src.main.java.app.pattern"}) // Refers to {rootDir}/src/main/java/app/pattern and skips this directory
.build();
CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
commentProcessor.start();
}
Second way ExternalPath
public static void main(String[] args) throws CommentRemoverException {
// example for externalPath
CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
.removeJava(true) // Remove Java file Comments....
.removeJavaScript(true) // Remove JavaScript file Comments....
.removeJSP(true) // etc..
.removeTodos(true) // Remove todos
.removeSingleLines(false) // Do not remove single line type comments
.removeMultiLines(true) // Remove multiple type comments
.startExternalPath("/Users/user/Projects/MyOtherProject")// Give it full path for external directories
.setExcludePackages(new String[]{"src.main.java.model"}) // Refers to /Users/user/Projects/MyOtherProject/src/main/java/model and skips this directory.
.build();
CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
commentProcessor.start();
}

For a Scanner, use a delimiter.
Delimiter example:
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;
public class MainClass {
public static void main(String args[]) throws IOException {
FileWriter fout = new FileWriter("test.txt");
fout.write("2, 3.4, 5,6, 7.4, 9.1, 10.5, done");
fout.close();
FileReader fin = new FileReader("test.txt"); // same name (and case) as the file written above
Scanner src = new Scanner(fin);
// Set delimiters to space and comma.
// ", *" tells Scanner to match a comma and zero or more spaces as
// delimiters.
src.useDelimiter(", *");
// Read and print the numbers until a non-numeric token ("done") is reached.
while (src.hasNext()) {
if (src.hasNextDouble()) {
System.out.println(src.nextDouble());
} else {
break;
}
}
fin.close();
}
}
For a plain string, use a tokenizer.
Tokenizer example:
// start with a String of space-separated words
String tags = "pizza pepperoni food cheese";
// convert each tag to a token
StringTokenizer st = new StringTokenizer(tags," ");
while ( st.hasMoreTokens() )
{
String token = (String)st.nextToken();
System.out.println(token);
}
http://www.devdaily.com/blog/post/java/java-faq-stringtokenizer-example

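It would be better if the code handled single-line and multi-line comments separately. Any suggestions?
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
public class RemovingCommentsFromFile {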
public static void main(String[] args) throws IOException {
BufferedReader fin = new BufferedReader(new FileReader("/home/pathtofilewithcomments/File"));
BufferedWriter fout = new BufferedWriter(new FileWriter("/home/result/File1"));
boolean multilinecomment = false;
boolean singlelinecomment = false;
int len,j;
String s = null;
while ((s = fin.readLine()) != null) {
StringBuilder obj = new StringBuilder(s);
len = obj.length();
for (int i = 0; i < len; i++) {
for (j = i; j < len; j++) {
if (j + 1 < len && obj.charAt(j) == '/' && obj.charAt(j + 1) == '*') {
j++; // together with the loop's own j++ this skips both characters of "/*"
multilinecomment = true;
continue;
} else if (j + 1 < len && obj.charAt(j) == '/' && obj.charAt(j + 1) == '/') {
singlelinecomment = true;
j = len;
break;
} else if (j + 1 < len && obj.charAt(j) == '*' && obj.charAt(j + 1) == '/') {
j += 2;
multilinecomment = false;
break;
} else if (multilinecomment == true)
continue;
else
break;
}
if (j == len)
{
singlelinecomment=false;
break;
}
else
i = j;
System.out.print((char)obj.charAt(i));
fout.write((char)obj.charAt(i));
}
System.out.println();
fout.write((char)10);
}
fin.close();
fout.close();
}
}

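An easy solution that doesn't remove extra parts of the code (unlike those above):
// works for any reader; you can also iterate over a list of strings instead
StringBuilder sb = new StringBuilder();
String s;
while ((s = reader.readLine()) != null)
{
// readLine() drops the line terminator, so add it back for every line
sb.append(s.replaceAll("//.*", "")).append("\n");
}
// (?s) lets . match line breaks; .*? stops at the first closing */
String str = sb.toString().replaceAll("(?s)/\\*.*?\\*/", " ");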

Related

How do I fix this return compilation error?

I'm getting the following error and I'm not sure how to solve it:
This method must return a result of type String
The program is supposed to print all lines of the file "months.txt" when the line argument is 0, as in String result = fileRead(0, "months.txt");. It works without the big if-else when I specify a single line to output, but I can't figure out how to get it to work like this:
import java.util.Scanner;
import java.io.File;
import java.io.IOException;
public class methodsExceptions1 {
public static void main(String[] args) throws IOException {
String result = fileRead(0, "months.txt");
System.out.println(result);
}
public static String fileRead(int line, String f) throws IOException {
File myFile = new File("months.txt");
Scanner inputFile = new Scanner(myFile);
if (line == 0 && inputFile.hasNextLine()) {
System.out.println(inputFile.nextLine());
} else {
String lineRead = "";
for (int i = 0; i < line; i++) {
if (inputFile.hasNextLine()) {
lineRead = inputFile.nextLine();
} else {
return "FILE READ ERROR: There are only " + i + " lines of text in this file";
}
}
inputFile.close();
return lineRead;
}
}
}

Your method only returns a string in the else branch. If line == 0 && inputFile.hasNextLine() is true, nothing is returned.
To fix the error, return something in your positive branch, or throw an exception.
Maybe you intended to return inputFile.nextLine() instead of printing it.

If every return statement sits inside an if, if-else or else block, and at least one path through the method reaches the end without returning, you get this type of error.
So you should also write a return statement outside these blocks. (Most IDEs flag this while you type.)
public class MethodsExceptions1 {
...
...
public static String fileRead(int line, String f) throws IOException {
File myFile = new File("months.txt");
Scanner inputFile = new Scanner(myFile);
if(line == 0 && inputFile.hasNextLine()) {
System.out.println(inputFile.nextLine());
}else {
String lineRead = "";
for (int i = 0; i < line; i++) {
if (inputFile.hasNextLine()) {
lineRead = inputFile.nextLine();
} else {
return "FILE READ ERROR: There are only " + i + " lines of text in this file";
}
}
inputFile.close();
return lineRead;
}
//need a return statement here
return new String("");
}
}

Your method is a liar. By its signature, you would expect it to read something from a file and return it as a String. But in one case it returns nothing and prints something to the console instead. It's lying to you. Avoid side effects in your methods.
You can improve this by making the method only return a String. Let the caller decide whether this String should be printed.
Instead of
if (line == 0 && inputFile.hasNextLine())
System.out.println(inputFile.nextLine());
do
if (line == 0 && inputFile.hasNextLine()) {
return inputFile.nextLine();
}

Base64 Encoding in Java vs HttpServerUtility.UrlTokenEncode in C#

I'm having trouble encoding a String in Java.
I have the following code in C# and the string Bpz2Gjg01d7VfGfD8ZP1UA==. When I execute the C# code I get:
QnB6MkdqZzAxZDdWZkdmRDhaUDFVQT090
public static void Main(string[] args)
{
string strWord = "Bpz2Gjg01d7VfGfD8ZP1UA==";
byte[] encbuff = Encoding.UTF8.GetBytes(strWord);
string strWordEncoded = HttpServerUtility.UrlTokenEncode(encbuff);
Console.WriteLine(strWordEncoded);
}
I'm trying to replicate the previous code in Java. In my first attempt I used the javax.xml.bind.DatatypeConverter class:
public static void main(String[] args) {
String strWord = "Bpz2Gjg01d7VfGfD8ZP1UA==";
byte[] encbuff = strWord.getBytes(StandardCharsets.UTF_8);
String strWordEncoded = DatatypeConverter.printBase64Binary(encbuff);
System.out.println(strWordEncoded);
}
But I'm getting the following String (missing the last zero compared to the C# string):
QnB6MkdqZzAxZDdWZkdmRDhaUDFVQT09
In my second attempt I used the BouncyCastle Base64 encoder:
public static void main(String[] args) {
String strWord = "Bpz2Gjg01d7VfGfD8ZP1UA==";
byte[] encbuff = strWord.getBytes(StandardCharsets.UTF_8);
String strWordEncoded = new String(Base64.encode(encbuff));
System.out.println(strWordEncoded);
}
But I'm getting exactly the same String as before (still missing the last zero):
QnB6MkdqZzAxZDdWZkdmRDhaUDFVQT09
Does anyone know what may be happening?

I've had a look at the .NET framework code. UrlTokenEncode actually removes any extra = padding symbols from the end of the base64 string and replaces them with the number of padding symbols, so either 0, 1, or 2. This is what's causing the extra 0 at the end of your string. So be aware: the HttpServerUtility.UrlTokenEncode method is NOT a plain Base64 encoder. It actually uses Convert.ToBase64String internally for the regular encoding and adds some more on top (see my comments on the question). If you need to create this exact string, you will need to implement the same changes in Java on top of the regular base64 encoding.
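The behaviour described above (URL-safe alphabet, padding stripped, padding count appended) can be sketched on top of java.util.Base64, available since Java 8. This is a minimal sketch rather than a verified drop-in replacement: the class name UrlTokenDemo and the method name urlTokenEncode are made up, and returning an empty string for empty input is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class UrlTokenDemo {
    // Base64 with '-' and '_' instead of '+' and '/', where the trailing '='
    // characters are replaced by a single digit giving how many there were.
    static String urlTokenEncode(byte[] input) {
        if (input == null) {
            return null;
        }
        if (input.length == 0) {
            return ""; // assumption of this sketch
        }
        // getUrlEncoder() uses the URL-safe alphabet and still emits '=' padding.
        String b64 = Base64.getUrlEncoder().encodeToString(input);
        int end = b64.length();
        while (end > 0 && b64.charAt(end - 1) == '=') {
            end--;
        }
        int padCount = b64.length() - end; // 0, 1 or 2
        return b64.substring(0, end) + padCount;
    }

    public static void main(String[] args) {
        byte[] bytes = "Bpz2Gjg01d7VfGfD8ZP1UA==".getBytes(StandardCharsets.UTF_8);
        // The 24 input bytes need no padding, so a 0 is appended.
        System.out.println(urlTokenEncode(bytes));
    }
}
```

For the string from the question this yields the same value the C# code printed, QnB6MkdqZzAxZDdWZkdmRDhaUDFVQT090.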

I found a solution based on the comments: I looked at the source code of the method in Microsoft's Reference Source.
Then I translated the C# code to Java, and it looks like this:
public static String UrlTokenEncode(byte[] input) {
try {
if (input == null) {
return null;
}
if (input.length < 1) {
return null;
}
String base64Str = null;
int endPos = 0;
char[] base64Chars = null;
base64Str = Base64.toBase64String(input);
if (base64Str == null) {
return null;
}
for (endPos = base64Str.length(); endPos > 0; endPos--) {
if (base64Str.charAt(endPos - 1) != '=') {
break;
}
}
base64Chars = new char[endPos + 1];
base64Chars[endPos] = (char) ((int) '0' + base64Str.length() - endPos);
for (int iter = 0; iter < endPos; iter++) {
char c = base64Str.charAt(iter);
switch (c) {
case '+':
base64Chars[iter] = '-';
break;
case '/':
base64Chars[iter] = '_';
break;
case '=':
base64Chars[iter] = c;
break;
default:
base64Chars[iter] = c;
break;
}
}
return new String(base64Chars);
} catch (Exception e) {
return null;
}
}
Finally, I tested the method and got the desired output:
public static void main(String[] args) {
String strWord = "Bpz2Gjg01d7VfGfD8ZP1UA==";
byte[] encbuff = strWord.getBytes(StandardCharsets.UTF_8);
String strWordEncoded = UrlTokenEncode(encbuff);
System.out.println(strWordEncoded);
}
QnB6MkdqZzAxZDdWZkdmRDhaUDFVQT090

I want to read a file and check whether a word is present in it; if the word is present, one of my methods should return 1

This is my code. I want to read a file called "write.txt" and then compare what was read with a word held in a String variable. Once the comparison succeeds inside the method findTarget, the method returns 1. I try to call the method but I keep getting an error:
test.java:88: error: cannot find symbol
String testing = findTarget(target1, source1);
^
symbol: variable target1
location: class test
1 error
Can someone correct my mistake? I am quite new to programming.
import java.util.*;
import java.io.*;
public class test {
public static int findTarget( String target, String source )
{
int target_len = target.length();
int source_len = source.length();
int add = 0;
for(int i = 0;i < source_len; ++i) // i is a variable used to count up to source_len.
{
int j = 0; // take another variable to count loops
while(add == 0)
{
if( j >= target_len ) // count upto target length
{
break;
}
else if( target.charAt( j ) != source.charAt( i + j ) )
{
break;
}
else
{
++j;
if( j == target_len )
{
add++; // this will return 1: true
}
}
}
}
return add;
//System.out.println(""+add);
}
public static void main ( String ... args )
{
//String target = "for";
// function 1
try
{
// read the file
File file = new File("write.txt"); //establising a file object
BufferedReader br = new BufferedReader(new FileReader(file));
//reading the files from the file object "file"
String target1;
while ((target1 = br.readLine()) != null) //as long the condition is not null it will keep printing.
System.out.println(target1);
//target.close();
}
catch (IOException e)
{
System.out.println("file error!");
}
String source1 = "Searching for a string within a string the hard way.";
// function 2
test ob = new test();
String testing = findTarget(target1, source1);
// end
//System.out.println(findTarget(target, source));
System.out.println("the answer is: "+testing);
}
}

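The "cannot find symbol" error arises because target1 is declared inside the try block, so it is out of scope at the point where findTarget is called. The call has to move inside the try block, or target1 has to be declared before it.
There are two further problems where you have this:
test ob = new test();
String testing = findTarget(target1, source1);
findTarget is static, so no instance is needed, and it returns an int, so the result cannot be stored in a String:
//test ob = new test(); not needed, the function is static
int testing = test.findTarget(target1, source1);
// also changed the testing type from String to int, as int IS findTarget's return type.
I don't have your file contents to give a trial run, but that should at least help get past the error.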
=====
UPDATE:
You are close!
Inside main, change the code at your loop so that it looks like this:
String target1;
int testing = 0; // move and initialize testing here
while ((target1 = br.readLine()) != null) //as long the condition is not null it will keep printing.
{
//System.out.println(target1);
testing += test.findTarget(target1, source1);
//target1 = br.readLine();
}
System.out.println("answer is: "+testing);
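The running-total idea can be tried without a file by passing the lines in as a list. The sketch below is only an illustration: the class name TargetCountDemo and the method countMatches are made up, and findTarget is rewritten with String.startsWith so that it cannot read past the end of source, which the charAt(i + j) version can do when a partial match reaches the end of the string.

```java
import java.util.Arrays;
import java.util.List;

public class TargetCountDemo {
    // Returns 1 if target occurs somewhere in source, otherwise 0.
    static int findTarget(String target, String source) {
        if (target.isEmpty()) {
            return 0; // a blank line in the word file never counts as a match
        }
        // Stop early enough that target always fits into the rest of source.
        for (int i = 0; i + target.length() <= source.length(); i++) {
            if (source.startsWith(target, i)) {
                return 1;
            }
        }
        return 0;
    }

    // Adds up the per-word results, like "testing += ..." inside the read loop.
    static int countMatches(List<String> targets, String source) {
        int total = 0;
        for (String target : targets) {
            total += findTarget(target, source);
        }
        return total;
    }

    public static void main(String[] args) {
        String source = "Searching for a string within a string the hard way.";
        List<String> targets = Arrays.asList("for", "hard", "xyz");
        System.out.println("answer is: " + countMatches(targets, source));
    }
}
```

The total is printed once, after the loop, which is what turns two separate "answer is: 1" lines into a single sum.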

I have finally been able to solve my problem, but now I am extending the functionality. I want to increment add by 1 for each match, but my program keeps giving me this output:
answer is: 1 answer is: 1
Instead of printing two 1s, I want my program to print 1 + 1 = 2.
Can someone fix this incrementing problem?
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.*;
public class test {
public static int findTarget(String target, String source) {
int target_len = target.length();
int source_len = source.length();
int add = 0;
// this function checks the character whether it is present.
for (int i = 0; i < source_len; ++i) // i is a varialbe used to count upto source_len.
{
int j = 0; // take another variable to count loops
while (add == 0)
{
if (j >= target_len) // count upto target length
{
break;
}
else if (target.charAt(j) != source.charAt(i + j))
{
break;
}
else
{
++j;
if (j == target_len)
{
add++; // this will return 1: true
}
}
}
}
return add;
//System.out.println(""+add);
}
public static void main(String... args) {
//String target = "for";
// function 1
try {
// read the file
Scanner sc = new Scanner(System.in);
System.out.println("Enter your review: ");
String source1 = sc.nextLine();
//String source1 = "Searching for a string within a string the hard way.";
File file = new File("write.txt"); //establising a file object
BufferedReader br = new BufferedReader(new FileReader(file)); //reading the files from the file object "file"
String target1;
while ((target1 = br.readLine()) != null) //as long the condition is not null it will keep printing.
{
//System.out.println(target1);
int testing = test.findTarget(target1, source1);
System.out.println("answer is: "+testing);
//target1 = br.readLine();
}
br.close();
}
catch (IOException e)
{
System.out.println("file error!");
}
}
}

Programmatically remove comments from Java File [duplicate]

I have a Java project with comments in many places across its source files. Now I need to remove all types of comments: single-line and multi-line.
Please suggest a way to automate the removal, using a tool, Eclipse, etc.
Currently I am trying to remove all comments manually.

You can remove all single- or multi-line block comments (but not line comments with //) by searching for the following regular expression in your project(s)/file(s) and replacing by $1:
^([^"\r\n]*?(?:(?<=')"[^"\r\n]*?|(?<!')"[^"\r\n]*?"[^"\r\n]*?)*?)(?<!/)/\*[^\*]*(?:\*+[^/][^\*]*)*?\*+/
It's possible that you have to execute it more than once.
This regular expression avoids the following pitfalls:
Code between two comments /* Comment 1 */ foo(); /* Comment 2 */
Line comments starting with an asterisk: //***NOTE***
Comment delimiters inside string literals: stringbuilder.append("/*");; also if there is a double quote inside single quotes before the comment
To remove all single-line comments, search for the following regular expression in your project(s)/file(s) and replace by $1:
^([^"\r\n]*?(?:(?<=')"[^"\r\n]*?|(?<!')"[^"\r\n]*?"[^"\r\n]*?)*?)\s*//[^\r\n]*
This regular expression also avoids comment delimiters inside double quotes, but does NOT check for multi-line comments, so /* // */ will be incorrectly removed.

I had to write something to do this a few weeks ago. This should handle all comments, nested or otherwise. It is long, but I haven't seen a regex version that handled nested comments properly. I didn't have to preserve Javadoc, but I presume you do, so I added some code that I believe should handle that. I also added code to support the \r\n and \r line separators. The new code is marked as such.
public static String removeComments(String code) {
StringBuilder newCode = new StringBuilder();
try (StringReader sr = new StringReader(code)) {
boolean inBlockComment = false;
boolean inLineComment = false;
boolean out = true;
int prev = sr.read();
int cur;
for(cur = sr.read(); cur != -1; cur = sr.read()) {
if(inBlockComment) {
if (prev == '*' && cur == '/') {
inBlockComment = false;
out = false;
}
} else if (inLineComment) {
if (cur == '\r') { // start untested block
sr.mark(1);
int next = sr.read();
if (next != '\n') {
sr.reset();
}
inLineComment = false;
out = false; // end untested block
} else if (cur == '\n') {
inLineComment = false;
out = false;
}
} else {
if (prev == '/' && cur == '*') {
sr.mark(1); // start untested block
int next = sr.read();
if (next != '*') {
inBlockComment = true; // tested line (without rest of block)
}
sr.reset(); // end untested block
} else if (prev == '/' && cur == '/') {
inLineComment = true;
} else if (out){
newCode.append((char)prev);
} else {
out = true;
}
}
prev = cur;
}
if (prev != -1 && out && !inLineComment) {
newCode.append((char)prev);
}
} catch (IOException e) {
e.printStackTrace();
}
return newCode.toString();
}

you can try it with the java-comment-preprocessor:
java -jar ./jcp-6.0.0.jar --i:/sourceFolder --o:/resultFolder -ef:none --r

I made an open-source library and uploaded it to GitHub. It's called CommentRemover, and it can remove single-line and multi-line Java comments.
It can either remove TODOs or leave them in place.
It also supports JavaScript, HTML, CSS, Properties, JSP and XML comments.
Here is a short snippet showing how to use it (there are two kinds of usage):
First way: InternalPath
public static void main(String[] args) throws CommentRemoverException {
// root dir is: /Users/user/Projects/MyProject
// example for startInternalPath
CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
.removeJava(true) // Remove Java file Comments....
.removeJavaScript(true) // Remove JavaScript file Comments....
.removeJSP(true) // etc.. goes like that
.removeTodos(false) // Do Not Touch Todos (leave them alone)
.removeSingleLines(true) // Remove single line type comments
.removeMultiLines(true) // Remove multiple type comments
.startInternalPath("src.main.app") // Starts from {rootDir}/src/main/app , leave it empty string when you want to start from root dir
.setExcludePackages(new String[]{"src.main.java.app.pattern"}) // Refers to {rootDir}/src/main/java/app/pattern and skips this directory
.build();
CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
commentProcessor.start();
}
Second way ExternalPath
public static void main(String[] args) throws CommentRemoverException {
// example for startExternalPath
CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
.removeJava(true) // Remove Java file Comments....
.removeJavaScript(true) // Remove JavaScript file Comments....
.removeJSP(true) // etc..
.removeTodos(true) // Remove todos
.removeSingleLines(false) // Do not remove single line type comments
.removeMultiLines(true) // Remove multiple type comments
.startExternalPath("/Users/user/Projects/MyOtherProject")// Give it full path for external directories
.setExcludePackages(new String[]{"src.main.java.model"}) // Refers to /Users/user/Projects/MyOtherProject/src/main/java/model and skips this directory.
.build();
CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
commentProcessor.start();
}

This is an old post but this may help someone who enjoys working on command line like myself:
The perl one-liner below will remove all comments:
perl -0pe 's|//.*?\n|\n|g; s#/\*(.|\n)*?\*/##g;' test.java
Example:
cat test.java
this is a test
/**
*This should be removed
*This should be removed
*/
this should not be removed
//this should be removed
this should not be removed
this should not be removed //this should be removed
Output:
perl -0pe 's|//.*?\n|\n|g; s#/\*(.|\n)*?\*/##g;' test.java
this is a test
this should not be removed
this should not be removed
this should not be removed
If you want to get rid of multiple blank lines as well:
perl -0pe 's|//.*?\n|\n|g; s#/\*(.|\n)*?\*/##g; s/\n\n+/\n\n/g' test.java
this is a test
this should not be removed
this should not be removed
this should not be removed
EDIT: Corrected regex

Dealing with source code is hard unless you know more about how the comments were written.
In the general case, you could have // or /* inside text constants. So you really need to parse the file at a syntactic level, not only a lexical one. IMHO the only bulletproof solution would be to start with, for example, the Java parser from OpenJDK.
If you know that your comments are never deeply mixed with the code (in my example comments MUST be full lines), a Python script could help:
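multiple = False
for line in text:
    stripped = line.strip()
    if multiple:
        # inside a block comment: drop the line, and leave the comment on '*/'
        if stripped.endswith('*/'):
            multiple = False
    elif stripped.startswith('/*'):
        # a block comment that also ends on this line is simply skipped
        multiple = not stripped.endswith('*/')
    elif stripped.startswith('//'):
        pass
    else:
        print(line)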

If you are using Eclipse IDE, you could make regex do the work for you.
Open the search window (Ctrl+F), and check 'Regular Expression'.
Provide the expression as
/\*\*(?s:(?!\*/).)*\*/
Prasanth Bhate has explained it in Tool to remove JavaDoc comments?

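import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
public class TestForStrings {
/**
* The main method.
*
* @param args
* the arguments
* @throws Exception
* the exception
*/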
public static void main(String args[]) throws Exception {
String[] imports = new String[100];
String fileName = "Menu.java";
// This will reference one API at a time
String line = null;
try {
FileReader fileReader = new FileReader(fileName);
// Always wrap FileReader in BufferedReader.
BufferedReader bufferedReader = new BufferedReader(fileReader);
int startingOffset = 0;
// This will reference one API at a time
List<String> lines = Files.readAllLines(Paths.get(fileName),
Charset.forName("ISO-8859-1"));
// remove single line comments
for (int count = 0; count < lines.size(); count++) {
String tempString = lines.get(count);
lines.set(count, removeSingleLineComment(tempString));
}
// remove multiple lines comment
for (int count = 0; count < lines.size(); count++) {
String tempString = lines.get(count);
removeMultipleLineComment(tempString, count, lines);
}
for (int count = 0; count < lines.size(); count++) {
System.out.println(lines.get(count));
}
} catch (FileNotFoundException ex) {
System.out.println("Unable to open file '" + fileName + "'");
} catch (IOException ex) {
System.out.println("Error reading file '" + fileName + "'");
} catch (Exception e) {
}
}
/**
* Removes the multiple line comment.
*
* @param tempString
* the temp string
* @param count
* the count
* @param lines
* the lines
* @return the list of lines
*/
private static List<String> removeMultipleLineComment(String tempString,
int count, List<String> lines) {
try {
if (tempString.contains("/**") || (tempString.contains("/*"))) {
int StartIndex = count;
while (!(lines.get(count).contains("*/") || lines.get(count)
.contains("**/"))) {
count++;
}
int endIndex = ++count;
if (StartIndex != endIndex) {
while (StartIndex != endIndex) {
lines.set(StartIndex, "");
StartIndex++;
}
}
}
} catch (Exception e) {
// Do Nothing
}
return lines;
}
/**
* Remove single line comments .
*
* @param line
* the line
* @return the string
* @throws Exception
* the exception
*/
private static String removeSingleLineComment(String line) throws Exception {
try {
if (line.contains(("//"))) {
int startIndex = line.indexOf("//");
int endIndex = line.length();
String tempoString = line.substring(startIndex, endIndex);
line = line.replace(tempoString, "");
}
// remove a block comment that opens and closes on this line
int startIndex = line.indexOf("/*"); // also matches "/**"
int endIndex = (startIndex < 0) ? -1 : line.indexOf("*/", startIndex + 2);
if (endIndex >= 0) {
line = line.substring(0, startIndex) + line.substring(endIndex + 2);
}
} catch (Exception e) {
// Do Nothing
}
return line;
}
}

This is what I came up with yesterday.
This is actually homework I got from school so if anybody reads this and finds a bug before I turn it in, please leave a comment =)
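P.S. FilterState is an enum with the constants IN_CODE, CAN_BE_COMMENT_START, ON_COMMENT_LINE, IN_COMMENT_BLOCK, CAN_BE_COMMENT_END and INSIDE_STRING.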
public static String deleteComments(String javaCode) {
FilterState state = FilterState.IN_CODE;
StringBuilder strB = new StringBuilder();
char prevC=' ';
for(int i = 0; i<javaCode.length(); i++){
char c = javaCode.charAt(i);
switch(state){
case IN_CODE:
if(c=='/')
state = FilterState.CAN_BE_COMMENT_START;
else {
if (c == '"')
state = FilterState.INSIDE_STRING;
strB.append(c);
}
break;
case CAN_BE_COMMENT_START:
if(c=='*'){
state = FilterState.IN_COMMENT_BLOCK;
}
else if(c=='/'){
state = FilterState.ON_COMMENT_LINE;
}
else {
state = (c == '"') ? FilterState.INSIDE_STRING : FilterState.IN_CODE;
strB.append(prevC).append(c); // prevC + c would add the two chars as ints
}
break;
case ON_COMMENT_LINE:
if(c=='\n' || c=='\r') {
state = FilterState.IN_CODE;
strB.append(c);
}
break;
case IN_COMMENT_BLOCK:
if(c=='*')
state=FilterState.CAN_BE_COMMENT_END;
break;
case CAN_BE_COMMENT_END:
if(c=='/')
state = FilterState.IN_CODE;
else if(c!='*')
state = FilterState.IN_COMMENT_BLOCK;
break;
case INSIDE_STRING:
if(c == '"' && prevC!='\\')
state = FilterState.IN_CODE;
strB.append(c);
break;
default:
System.out.println("unknown case");
return null;
}
prevC = c;
}
return strB.toString();
}
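The answer uses FilterState without showing its declaration. A minimal sketch of what that enum could look like, with the constant names taken from the case labels in the switch above (the real declaration is not in the post, so this is an assumption):

```java
// Hypothetical reconstruction: the post only says FilterState is an enum.
// The constants are the ones referenced by the switch in deleteComments.
enum FilterState {
    IN_CODE,               // ordinary source text
    CAN_BE_COMMENT_START,  // just saw '/', the next char decides
    ON_COMMENT_LINE,       // inside a // comment, until end of line
    IN_COMMENT_BLOCK,      // inside a /* ... */ comment
    CAN_BE_COMMENT_END,    // saw '*' inside a block comment
    INSIDE_STRING          // inside a "..." string literal
}
```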

private static int find(String s, String t, int start) {
int ret = s.indexOf(t, start);
return ret < 0 ? Integer.MAX_VALUE : ret;
}
private static int findSkipEsc(String s, String t, int start) {
while(true) {
int ret = find(s, t, start);
if( ret == Integer.MAX_VALUE) return -1;
int esc = find(s, "\\", start);
if( esc > ret) return ret;
start += 2;
}
}
private static String removeLineCommnt(String s) {
int i, start = 0;
while (0 <= (i = find(s, "//", start))) { //Speed it up
int j = find(s, "'", start);
int k = find(s, "\"", start);
int first = min(i, min(j, k));
if (first == Integer.MAX_VALUE) return s;
if (i == first) return s.substring(0, i);
//skip quoted string
start = first+1;
if (k == first) { // " asdas\"dasd "
start = findSkipEsc(s,"\"",start);
if (start < 0) return s;
start++;
continue;
}
//if j == first ' asda\'sasd ' --- not in JSON
start = findSkipEsc(s,"'\"'",start);
if (start < 0) return s;
start++;
}
return s;
}
static String removeLineCommnts(String s) {
if (!s.contains("//")) return s; //Speed it up
return Arrays.stream(s.split("[\\n\\r]+")).
map(Common::removeLineCommnt).
collect(Collectors.joining("\n"));
}

Counting number of words in a file

I'm having a problem counting the number of words in a file. My approach is to count a word whenever I see a space or a newline.
The problem is that if I have multiple blank lines between paragraphs, I end up counting them as words too. If you look at the readFile() method you can see what I am doing.
Could you help me out and guide me in the right direction on how to fix this?
Example input file (including a blank line):
word word word
word word
word word word

You can use a Scanner with a FileInputStream instead of a BufferedReader with a FileReader. For example:
File file = new File("sample.txt");
try(Scanner sc = new Scanner(new FileInputStream(file))){
int count=0;
while(sc.hasNext()){
sc.next();
count++;
}
System.out.println("Number of words: " + count);
}

I would change your approach a bit. First, I would use a BufferedReader to read the file line by line using readLine(). Then split each line on whitespace using String.split("\\s") and use the length of the resulting array to see how many words are on that line. To get the number of characters, you could either look at the length of each line or of each split word (depending on whether you want to count whitespace as characters).
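A minimal sketch of this line-by-line approach. The class and method names are made up for illustration. It splits on "\\s+" rather than "\\s" so that runs of spaces do not produce empty strings, and it skips blank lines, because splitting an empty line still returns an array containing one empty string:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class LineWordCounter {
    // Counts whitespace-separated words, reading one line at a time.
    static int countWords(BufferedReader reader) throws IOException {
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // blank line: split would return one empty string
            }
            count += trimmed.split("\\s+").length;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Same shape as the example input in the question, blank line included.
        String sample = "word word word\nword word\n\nword word word\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            System.out.println(countWords(reader)); // prints 8
        }
    }
}
```

For a real file, the reader would wrap a FileReader instead of the StringReader used here.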

This is just a thought. There is one very easy way to do it: if you just need the number of words and not the actual words, use WordUtils from Apache Commons Lang.
import org.apache.commons.lang.WordUtils;
public class CountWord {
public static void main(String[] args) {
String str = "Just keep a boolean flag around that lets you know if the previous character was whitespace or not pseudocode follows";
String initials = WordUtils.initials(str);
System.out.println(initials);
//so number of words in your file will be
System.out.println(initials.length());
}
}

Just keep a boolean flag around that lets you know if the previous character was whitespace or not (pseudocode follows):
boolean prevWhitespace = true; // treat the start of input like whitespace
int wordCount = 0;
while (char ch = getNextChar(input)) {
if (isWhitespace(ch)) {
prevWhitespace = true;
} else {
if (prevWhitespace) {
wordCount++; // first character of a new word
}
prevWhitespace = false;
}
}
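A runnable Java sketch of the flag idea, under the assumption that the text is already in a String (the class and method names are illustrative). It counts a word at its first non-whitespace character, so leading whitespace and blank lines are ignored and a final word with no trailing whitespace is still counted:

```java
class FlagWordCounter {
    static int countWords(String text) {
        boolean prevWhitespace = true; // start of input behaves like whitespace
        int wordCount = 0;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (Character.isWhitespace(ch)) {
                prevWhitespace = true;
            } else {
                if (prevWhitespace) {
                    wordCount++; // first character of a new word
                }
                prevWhitespace = false;
            }
        }
        return wordCount;
    }
}
```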

I think a correct approach would be to use a regex:
String fileContent = <text from file>;
String[] words = Pattern.compile("\\s+").split(fileContent);
System.out.println("File has " + words.length + " words");
Hope it helps. The meaning of "\s+" is explained in the Pattern javadoc.
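Two edge cases are worth guarding against with this approach: splitting an empty string returns an array holding one empty string, and content that starts with whitespace produces an empty first element. A small sketch that trims first (the class and method names are illustrative, not from the answer):

```java
import java.util.regex.Pattern;

class RegexWordCounter {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static int countWords(String fileContent) {
        String trimmed = fileContent.trim();
        if (trimmed.isEmpty()) {
            return 0; // split would otherwise return one empty string
        }
        // After trimming there is no leading separator, so no empty first element.
        return WHITESPACE.split(trimmed).length;
    }
}
```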

import java.io.BufferedReader;
import java.io.FileReader;
public class CountWords {
public static void main (String args[]) throws Exception {
System.out.println ("Counting Words");
FileReader fr = new FileReader ("c:\\Customer1.txt");
BufferedReader br = new BufferedReader (fr);
String line = br.readLine();
int count = 0;
while (line != null) {
String[] parts = line.trim().split("\\s+");
for (String w : parts)
{
if (!w.isEmpty()) count++; // a blank line yields one empty string; skip it
}
line = br.readLine();
}
System.out.println(count);
}
}

Hack solution
You can read the text file into a String variable, then split the String into an array using a single space as the delimiter: stringVar.split(" ").
The array length would equal the number of "words" in the file.
Of course, this wouldn't give you a count of lines.

Three steps: consume all the whitespace, count a line for each newline found in it, then consume all the non-whitespace.
c = inFile.read();
while (c != -1) { // read() returns -1 at end of file
// consume whitespace, counting line breaks as we go
while (isspace(c)) {
if (c == '\n') numberLines++;
c = inFile.read();
}
if (c == -1) break;
// consume one word
while (c != -1 && !isspace(c)) {
numberChars++;
c = inFile.read();
}
numberWords++;
}

File Word-Count
If there are symbols between the words, you can split on them and count the resulting words.
Scanner sc = new Scanner(new FileInputStream(new File("Input.txt")));
int count = 0;
while (sc.hasNext()) {
String[] s = sc.next().split("d*[.#:=#-]");
for (int i = 0; i < s.length; i++) {
if (!s[i].isEmpty()){
System.out.println(s[i]);
count++;
}
}
}
System.out.println("Word-Count : "+count);

Take a look at my solution here, it should work. The idea is to remove all the unwanted symbols from the words, then separate those words and store them in another variable (I used an ArrayList). By adjusting the "excludedSymbols" variable you can add more symbols that you would like to exclude from the words.
public static void countWords () {
String textFileLocation ="c:\\yourFileLocation";
String readWords ="";
ArrayList<String> extractOnlyWordsFromTextFile = new ArrayList<>();
// excludedSymbols can be extended to whatever you want to exclude from the file
String[] excludedSymbols = {" ", "," , "." , "/" , ":" , ";" , "<" , ">", "\n"};
String readByteCharByChar = "";
boolean testIfWord = false;
try {
InputStream inputStream = new FileInputStream(textFileLocation);
int byte1 = inputStream.read(); // keep as int: a cast to byte would turn 0xFF into -1 and end the loop early
while (byte1 != -1) {
readByteCharByChar +=String.valueOf((char)byte1);
for(int i=0;i<excludedSymbols.length;i++) {
if(readByteCharByChar.equals(excludedSymbols[i])) {
if(!readWords.equals("")) {
extractOnlyWordsFromTextFile.add(readWords);
}
readWords ="";
testIfWord = true;
break;
}
}
if(!testIfWord) {
readWords+=(char)byte1;
}
readByteCharByChar = "";
testIfWord = false;
byte1 = inputStream.read();
if(byte1 == -1 && !readWords.equals("")) {
extractOnlyWordsFromTextFile.add(readWords);
}
}
inputStream.close();
System.out.println(extractOnlyWordsFromTextFile);
System.out.println("The number of words in the chosen text file is: " + extractOnlyWordsFromTextFile.size());
} catch (IOException ioException) {
ioException.printStackTrace();
}
}

This can be done in a very simple way using Java 8:
Files.lines(Paths.get(file))
.flatMap(str->Stream.of(str.split("[ ,.!?\r\n]")))
.filter(s->s.length()>0).count();
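A self-contained sketch of the same pipeline, wrapped in a method so it can be called directly. The class and method names are made up for illustration. It uses try-with-resources because the stream returned by Files.lines holds the file open until it is closed:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

class StreamWordCounter {
    static long countWords(Path file) throws IOException {
        // Files.lines reads lazily and decodes the file as UTF-8 by default.
        try (Stream<String> lines = Files.lines(file)) {
            return lines
                    .flatMap(line -> Arrays.stream(line.split("[ ,.!?\r\n]+")))
                    .filter(s -> !s.isEmpty()) // blank lines yield one empty string
                    .count();
        }
    }
}
```

The delimiter set is the one from the answer; add other punctuation to the character class as needed.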

BufferedReader bf= new BufferedReader(new FileReader("G://Sample.txt"));
String line=bf.readLine();
while(line!=null)
{
String[] words=line.split(" ");
System.out.println("this line contains " +words.length+ " words");
line=bf.readLine();
}

The code below works in Java 8:
// Read the file into a String
String fileContent = new String(Files.readAllBytes(Paths.get("MyFile.txt")), StandardCharsets.UTF_8);
// Split on runs of non-letter characters and keep the pieces in a list
List<String> words = Arrays.asList(fileContent.split("\\PL+"));
int count = 0;
for (String x : words) {
if (x.length() > 1) count++; // note: this skips one-letter words as well as empty strings
}
System.out.println(count);

It's easy once we have the file's contents as a String (say, from a method getText()):
public class Main {
static int countOfWords(String str) {
if (str == null || str.equals("")) { // check for null first to avoid a NullPointerException
return 0;
}else{
int numberWords = 0;
for (char c : str.toCharArray()) {
if (c == ' ') {
numberWords++;
}
}
return ++numberWords;
}
}
}


Easy solution that doesn't remove extra parts of code (like those above):
// works for any reader, you can also iterate over a list of strings instead
String str = "";
String s;
while ((s = reader.readLine()) != null) {
s = s.replaceAll("//.*", "\n");
str += s;
}
str = str.replaceAll("/\\*.*\\*/", " ");
