Compare line by line from a text file in Java - java

I am trying to compare the lines of a text file in Java.
For example, there is a text file with these lines:
temp1 am 32.5 pm 33.5
temp2 am 33.5 pm 33.5
temp3 am 32.5 pm 33.5
temp4 am 31.5 pm 35
Each line has the form:
a b c d e
where a is the name of the line, b is a constant (am), c is a variable, d is a constant (pm), and e is another variable.
Only the variables are compared: temp1(c) to temp2(c), temp1(e) to temp2(e), and so on.
When two or more lines have the same c and the same e, the method should throw a FormatException.
In the example file above, temp1's c is the same as temp3's c and temp1's e is the same as temp3's e, so it should throw a FormatException.
This is what I have so far:
public static Temp read(String file) throws FormatException {
String line = "";
FileReader fr = new FileReader(fileName);
Scanner scanner = new Scanner(fr);
while(scanner.hasNextLine()) {
String line = scanner.nextLine();
System.out.println(line);
}
scanner.close();
if () {
throw new FormatException("Error.");
How can I make this?

You will need to split each line to extract your variables, and use a Set to check for duplicates, as follows:
Set<String> ceValues = new HashSet<>();
while(scanner.hasNextLine()) {
String line = scanner.nextLine();
String[] values = line.split(" ");
if (!ceValues.add(String.format("%s %s", values[2], values[4]))) {
// The value has already been added so we throw an exception
throw new FormatException("Error.");
}
}
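To see the Set-based check end to end, here is a minimal, self-contained sketch run against the four sample lines from the question. The class name `DuplicateReadings`, the helper `findDuplicate`, and the use of an in-memory list instead of a file are assumptions made for illustration. The question's `FormatException` is not reproduced, so the sketch returns the offending line instead of throwing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of the original answer.
class DuplicateReadings {

    // Returns the first line whose (c, e) pair was already seen,
    // or null when every pair is unique.
    static String findDuplicate(List<String> lines) {
        Set<String> seen = new HashSet<>();
        for (String line : lines) {
            String[] values = line.trim().split("\\s+");
            if (values.length < 5) {
                throw new IllegalArgumentException("Malformed line: " + line);
            }
            // values[2] is c (the am reading), values[4] is e (the pm reading)
            String key = values[2] + " " + values[4];
            if (!seen.add(key)) {
                return line; // add() returned false, so the pair is a repeat
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "temp1 am 32.5 pm 33.5",
                "temp2 am 33.5 pm 33.5",
                "temp3 am 32.5 pm 33.5",
                "temp4 am 31.5 pm 35");
        System.out.println(findDuplicate(lines)); // prints "temp3 am 32.5 pm 33.5"
    }
}
```

In the real `read` method, the `return line` branch is where the `FormatException` would be thrown.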

As I don't want to do your homework for you, let me get you started:
while(scanner.hasNextLine()) {
String line = scanner.nextLine();
String[] partials = line.split(" ");
String a = partials[0];
//...
String e = partials[4];
}
I'm splitting the line on a space, as this is the only delimiter in your case. This gives us 5 separate strings (a through e). You will need to save them in a String[][] for later analysis, but you should be able to figure out for yourself how to do this.
Try playing around with this and update your question if you're still stuck.

Here is an example that basically includes:
a collection in which to store your lines
simple pattern matching logic (see Java Regex Tutorial for more)
a try-with-resource statement
a recursive check method
First of all I would make a simple POJO representing a line info:
public class LineInfo {
private String lineName;
private String am;
private String pm;
public LineInfo(String lineName, String am, String pm) {
this.lineName = lineName;
this.am = am;
this.pm = pm;
}
// getters and setters
}
Second I would need a pattern to validate each line and extract data from them:
// group 1: line name, group 2: am value, group 3: am fraction (unused),
// group 4: pm value, group 5: pm fraction (unused)
private static final String LINE_REGEX = "(\\w+)\\s+am\\s+(\\d+(\\.\\d+)?)\\s+pm\\s+(\\d+(\\.\\d+)?)";
private static final Pattern LINE_PATTERN = Pattern.compile(LINE_REGEX);
Third I would rework the read method like this (I return void for simplicity):
public static void read(String fileName) throws FormatException {
// collect your lines (or better the information your lines provide) in some data structure, like a List
final List<LineInfo> lines = new ArrayList<>();
// with this syntax your FileReader and Scanner will be closed automatically
try (FileReader fr = new FileReader(fileName); Scanner scanner = new Scanner(fr)) {
while (scanner.hasNextLine()) {
final String line = scanner.nextLine();
final Matcher matcher = LINE_PATTERN.matcher(line);
if (matcher.find()) {
lines.add(new LineInfo(matcher.group(1), matcher.group(2), matcher.group(4)));
} else {
throw new FormatException("Line \"" + line + "\" is not valid.");
}
}
// recursive method
compareLines(lines, 0);
} catch (final IOException e) {
e.printStackTrace();
// or handle it in some way
}
}
private static void compareLines(List<LineInfo> lines, int index) throws FormatException {
// if there are no more lines return
if (index == lines.size()) {
return;
}
final LineInfo line = lines.get(index);
for (int i = index + 1; i < lines.size(); i++) {
final LineInfo other = lines.get(i);
// do the check
if (line.getAm().equals(other.getAm()) && line.getPm().equals(other.getPm())) {
throw new FormatException(String.format("Lines #%d (%s) and #%d (%s) do not meet the requirements.",
index, line.getLineName(), i, other.getLineName()));
}
}
// do the same thing with the next line
compareLines(lines, index + 1);
}

If I understood your question correctly, you need to check line by line for duplicates, using c and e as the criteria.
This means line n must be compared against all the other lines, and if the pair is repeated, an exception is thrown.
My suggestion is:
Define a class that represents the c and e elements of every line:
class LinePojo {
private String c;
private String e;
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + ((c == null) ? 0 : c.hashCode());
result = prime * result + ((e == null) ? 0 : e.hashCode());
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
LinePojo other = (LinePojo) obj;
if (c == null) {
if (other.c != null)
return false;
} else if (!c.equals(other.c))
return false;
if (e == null) {
if (other.e != null)
return false;
} else if (!e.equals(other.e))
return false;
return true;
}
@Override
public String toString() {
return "(c=" + c + ", e=" + e + ")";
}
public LinePojo(String c, String e) {
this.c = c;
this.e = e;
}
}
Then create a list of that class, into which every line is inserted after checking whether an equal element is already there:
List<LinePojo> myList = new ArrayList<LinePojo>();
Then iterate line by line:
while(scanner.hasNextLine()) {
String line = scanner.nextLine();
String[] lineInfo = line.split(" ");
LinePojo lp = new LinePojo(lineInfo[2], lineInfo[4]);
if (myList.contains(lp)) {
throw new IllegalArgumentException("there is a duplicate element");
} else {
myList.add(lp);
}
}

Related

write to new row every xth element java csv write

Using an answer I found here on SO I have found a way to write out my resultset to a csv file. However it currently just writes every element of the array to a new column. How would I alter the following code to change format to create new row, on every xth element like below?
int value = 2
Current format: a, b, c, d, e, f
Desired format: a, b,
c, d,
e, f
I know I can utilize the modulo of the int value, but am unsure how to write to a specific column or row.
private static final String DEFAULT_SEPARATOR = " ";
public static void writeLine(Writer w, List<String> values, String separators, String customQuote) throws IOException {
boolean first = true;
//default customQuote is empty
if (" ".equals(separators)) {
separators = DEFAULT_SEPARATOR;
}
StringBuilder sb = new StringBuilder();
for (String value : values) {
if (!first) {
sb.append(separators);
}
if (" ".equals(customQuote)) {
sb.append(followCVSformat(value));
} else {
sb.append(customQuote).append(followCVSformat(value)).append(customQuote);
}
first = false;
}
sb.append("\n");
w.append(sb.toString());
}
private static String followCVSformat(String value) {
String result = value;
if (result.contains("\"")) {
result = result.replace("\"", "\"\"");
}
return result;
}
public static void writeLine(
Writer w, List<String> values, String separators, String customQuote, int elementsPerRow)
throws IOException {
...
int counter = 0;
for (String value : values) {
if (!first) {
sb.append(separators);
}
if (counter != 0 && counter % elementsPerRow == 0)
sb.append("\n");
if (" ".equals(customQuote)) {
sb.append(followCVSformat(value));
} else {
sb.append(customQuote).append(followCVSformat(value)).append(customQuote);
}
first = false;
counter++;
}
...
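The modulo idea can also be shown without a Writer. The sketch below only builds the text: between two values it emits either the separator or a line break, depending on the position. `RowChunker` and `toRows` are hypothetical names, quoting is left out, and unlike the desired format in the question it does not keep a trailing separator at the end of each row.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class RowChunker {

    // Joins the values, starting a new row after every elementsPerRow values.
    static String toRows(List<String> values, String separator, int elementsPerRow) {
        if (elementsPerRow <= 0) {
            throw new IllegalArgumentException("elementsPerRow must be positive");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                // Every elementsPerRow-th value starts a new row instead of a new column
                sb.append(i % elementsPerRow == 0 ? "\n" : separator);
            }
            sb.append(values.get(i));
        }
        if (!values.isEmpty()) {
            sb.append("\n"); // terminate the last row
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "c", "d", "e", "f");
        // Prints three rows: "a, b", "c, d" and "e, f"
        System.out.print(toRows(values, ", ", 2));
    }
}
```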

Programmatically remove comments from Java File [duplicate]

I have a Java project and I have used comments in many places in various Java files in the project. Now I need to remove all types of comments: single-line and multi-line comments.
Please suggest a way to automate removing the comments, using a tool, Eclipse, etc.
Currently I am trying to remove all comments manually.
You can remove all single- or multi-line block comments (but not line comments with //) by searching for the following regular expression in your project(s)/file(s) and replacing by $1:
^([^"\r\n]*?(?:(?<=')"[^"\r\n]*?|(?<!')"[^"\r\n]*?"[^"\r\n]*?)*?)(?<!/)/\*[^\*]*(?:\*+[^/][^\*]*)*?\*+/
It's possible that you have to execute it more than once.
This regular expression avoids the following pitfalls:
Code between two comments /* Comment 1 */ foo(); /* Comment 2 */
Line comments starting with an asterisk: //***NOTE***
Comment delimiters inside string literals: stringbuilder.append("/*");; also if there is a double quote inside single quotes before the comment
To remove all single-line comments, search for the following regular expression in your project(s)/file(s) and replace by $1:
^([^"\r\n]*?(?:(?<=')"[^"\r\n]*?|(?<!')"[^"\r\n]*?"[^"\r\n]*?)*?)\s*//[^\r\n]*
This regular expression also avoids comment delimiters inside double quotes, but does NOT check for multi-line comments, so /* // */ will be incorrectly removed.
I had to write something to do this a few weeks ago. This should handle all comments, nested or otherwise. It is long, but I haven't seen a regex version that handled nested comments properly. I didn't have to preserve javadoc, but I presume you do, so I added some code that I believe should handle that. I also added code to support the \r\n and \r line separators. The new code is marked as such.
public static String removeComments(String code) {
StringBuilder newCode = new StringBuilder();
try (StringReader sr = new StringReader(code)) {
boolean inBlockComment = false;
boolean inLineComment = false;
boolean out = true;
int prev = sr.read();
int cur;
for(cur = sr.read(); cur != -1; cur = sr.read()) {
if(inBlockComment) {
if (prev == '*' && cur == '/') {
inBlockComment = false;
out = false;
}
} else if (inLineComment) {
if (cur == '\r') { // start untested block
sr.mark(1);
int next = sr.read();
if (next != '\n') {
sr.reset();
}
inLineComment = false;
out = false; // end untested block
} else if (cur == '\n') {
inLineComment = false;
out = false;
}
} else {
if (prev == '/' && cur == '*') {
sr.mark(1); // start untested block
int next = sr.read();
if (next != '*') {
inBlockComment = true; // tested line (without rest of block)
}
sr.reset(); // end untested block
} else if (prev == '/' && cur == '/') {
inLineComment = true;
} else if (out){
newCode.append((char)prev);
} else {
out = true;
}
}
prev = cur;
}
if (prev != -1 && out && !inLineComment) {
newCode.append((char)prev);
}
} catch (IOException e) {
e.printStackTrace();
}
return newCode.toString();
}
You can try the java-comment-preprocessor:
java -jar ./jcp-6.0.0.jar --i:/sourceFolder --o:/resultFolder -ef:none --r
I made an open-source library and uploaded it to GitHub. It is called CommentRemover, and it can remove single-line and multi-line Java comments.
It supports removing or keeping TODOs.
It also supports JavaScript, HTML, CSS, Properties, JSP and XML comments.
Here is a short code snippet showing how to use it (there are two types of usage):
First way: InternalPath
public static void main(String[] args) throws CommentRemoverException {
// root dir is: /Users/user/Projects/MyProject
// example for startInternalPath
CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
.removeJava(true) // Remove Java file Comments....
.removeJavaScript(true) // Remove JavaScript file Comments....
.removeJSP(true) // etc.. goes like that
.removeTodos(false) // Do Not Touch Todos (leave them alone)
.removeSingleLines(true) // Remove single line type comments
.removeMultiLines(true) // Remove multiple type comments
.startInternalPath("src.main.app") // Starts from {rootDir}/src/main/app , leave it empty string when you want to start from root dir
.setExcludePackages(new String[]{"src.main.java.app.pattern"}) // Refers to {rootDir}/src/main/java/app/pattern and skips this directory
.build();
CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
commentProcessor.start();
}
Second way ExternalPath
public static void main(String[] args) throws CommentRemoverException {
// example for externalInternalPath
CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
.removeJava(true) // Remove Java file Comments....
.removeJavaScript(true) // Remove JavaScript file Comments....
.removeJSP(true) // etc..
.removeTodos(true) // Remove todos
.removeSingleLines(false) // Do not remove single line type comments
.removeMultiLines(true) // Remove multiple type comments
.startExternalPath("/Users/user/Projects/MyOtherProject")// Give it full path for external directories
.setExcludePackages(new String[]{"src.main.java.model"}) // Refers to /Users/user/Projects/MyOtherProject/src/main/java/model and skips this directory.
.build();
CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
commentProcessor.start();
}
This is an old post but this may help someone who enjoys working on command line like myself:
The perl one-liner below will remove all comments:
perl -0pe 's|//.*?\n|\n|g; s#/\*(.|\n)*?\*/##g;' test.java
Example:
cat test.java
this is a test
/**
*This should be removed
*This should be removed
*/
this should not be removed
//this should be removed
this should not be removed
this should not be removed //this should be removed
Output:
perl -0pe 's#/\*\*(.|\n)*?\*/##g; s|//.*?\n|\n|g' test.java
this is a test
this should not be removed
this should not be removed
this should not be removed
If you want get rid of multiple blank lines as well:
perl -0pe 's|//.*?\n|\n|g; s#/\*(.|\n)*?\*/##g; s/\n\n+/\n\n/g' test.java
this is a test
this should not be removed
this should not be removed
this should not be removed
EDIT: Corrected regex
Dealing with source code is hard unless you know more about how the comments are written.
In the more general case, you could have // or /* inside string constants. So you really need to parse the file at a syntactic level, not only a lexical one. IMHO the only bulletproof solution would be to start with, for example, the Java parser from OpenJDK.
If you know that your comments are never deeply mixed with the code (in my example comments MUST be full lines), a Python script could help:
multiple = False
for line in text:
    stripped = line.strip()
    if multiple:
        if stripped.endswith('*/'):
            multiple = False
        continue
    elif stripped.startswith('/*'):
        multiple = True
    elif stripped.startswith('//'):
        pass
    else:
        print(line)
If you are using Eclipse IDE, you could make regex do the work for you.
Open the search window (Ctrl+F), and check 'Regular Expression'.
Provide the expression as
/\*\*(?s:(?!\*/).)*\*/
Prasanth Bhate has explained it in Tool to remove JavaDoc comments?
public class TestForStrings {
/**
* The main method.
*
* @param args
* the arguments
* @throws Exception
* the exception
*/
public static void main(String args[]) throws Exception {
String[] imports = new String[100];
String fileName = "Menu.java";
// This will reference one API at a time
String line = null;
try {
FileReader fileReader = new FileReader(fileName);
// Always wrap FileReader in BufferedReader.
BufferedReader bufferedReader = new BufferedReader(fileReader);
int startingOffset = 0;
// This will reference one API at a time
List<String> lines = Files.readAllLines(Paths.get(fileName),
Charset.forName("ISO-8859-1"));
// remove single line comments
for (int count = 0; count < lines.size(); count++) {
String tempString = lines.get(count);
lines.set(count, removeSingleLineComment(tempString));
}
// remove multiple lines comment
for (int count = 0; count < lines.size(); count++) {
String tempString = lines.get(count);
removeMultipleLineComment(tempString, count, lines);
}
for (int count = 0; count < lines.size(); count++) {
System.out.println(lines.get(count));
}
} catch (FileNotFoundException ex) {
System.out.println("Unable to open file '" + fileName + "'");
} catch (IOException ex) {
System.out.println("Error reading file '" + fileName + "'");
} catch (Exception e) {
}
}
/**
* Removes the multiple line comment.
*
* @param tempString
* the temp string
* @param count
* the count
* @param lines
* the lines
* @return the string
*/
private static List<String> removeMultipleLineComment(String tempString,
int count, List<String> lines) {
try {
if (tempString.contains("/**") || (tempString.contains("/*"))) {
int StartIndex = count;
while (!(lines.get(count).contains("*/") || lines.get(count)
.contains("**/"))) {
count++;
}
int endIndex = ++count;
if (StartIndex != endIndex) {
while (StartIndex != endIndex) {
lines.set(StartIndex, "");
StartIndex++;
}
}
}
} catch (Exception e) {
// Do Nothing
}
return lines;
}
/**
* Remove single line comments .
*
* @param line
* the line
* @return the string
* @throws Exception
* the exception
*/
private static String removeSingleLineComment(String line) throws Exception {
try {
if (line.contains(("//"))) {
int startIndex = line.indexOf("//");
int endIndex = line.length();
String tempoString = line.substring(startIndex, endIndex);
line = line.replace(tempoString, "");
}
if ((line.contains("/*") || line.contains("/**"))
&& (line.contains("**/") || line.contains("*/"))) {
int startIndex = line.indexOf("/*");
int endIndex = line.length();
String tempoString = line.substring(startIndex, endIndex);
line = line.replace(tempoString, "");
}
} catch (Exception e) {
// Do Nothing
}
return line;
}
}
This is what I came up with yesterday.
This is actually homework I got from school so if anybody reads this and finds a bug before I turn it in, please leave a comment =)
P.S. 'FilterState' is an enum class.
public static String deleteComments(String javaCode) {
FilterState state = FilterState.IN_CODE;
StringBuilder strB = new StringBuilder();
char prevC=' ';
for(int i = 0; i<javaCode.length(); i++){
char c = javaCode.charAt(i);
switch(state){
case IN_CODE:
if(c=='/')
state = FilterState.CAN_BE_COMMENT_START;
else {
if (c == '"')
state = FilterState.INSIDE_STRING;
strB.append(c);
}
break;
case CAN_BE_COMMENT_START:
if(c=='*'){
state = FilterState.IN_COMMENT_BLOCK;
}
else if(c=='/'){
state = FilterState.ON_COMMENT_LINE;
}
else {
state = FilterState.IN_CODE;
strB.append(prevC).append(c);
}
break;
case ON_COMMENT_LINE:
if(c=='\n' || c=='\r') {
state = FilterState.IN_CODE;
strB.append(c);
}
break;
case IN_COMMENT_BLOCK:
if(c=='*')
state=FilterState.CAN_BE_COMMENT_END;
break;
case CAN_BE_COMMENT_END:
if(c=='/')
state = FilterState.IN_CODE;
else if(c!='*')
state = FilterState.IN_COMMENT_BLOCK;
break;
case INSIDE_STRING:
if(c == '"' && prevC!='\\')
state = FilterState.IN_CODE;
strB.append(c);
break;
default:
System.out.println("unknown case");
return null;
}
prevC = c;
}
return strB.toString();
}
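The `FilterState` enum itself is not shown in the answer. A minimal definition, assuming it contains only the six constants that the switch in `deleteComments` refers to, could look like this:

```java
// Assumed definition: only the constants named in the switch of deleteComments.
enum FilterState {
    IN_CODE,              // ordinary source text
    CAN_BE_COMMENT_START, // a '/' was seen, the next char decides
    ON_COMMENT_LINE,      // inside a // comment
    IN_COMMENT_BLOCK,     // inside a /* ... */ comment
    CAN_BE_COMMENT_END,   // a '*' was seen inside a block comment
    INSIDE_STRING         // inside a "..." literal
}
```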
private static int find(String s, String t, int start) {
int ret = s.indexOf(t, start);
return ret < 0 ? Integer.MAX_VALUE : ret;
}
private static int findSkipEsc(String s, String t, int start) {
while(true) {
int ret = find(s, t, start);
if( ret == Integer.MAX_VALUE) return -1;
int esc = find(s, "\\", start);
if( esc > ret) return ret;
start += 2;
}
}
private static String removeLineCommnt(String s) {
int i, start = 0;
while (0 <= (i = find(s, "//", start))) { //Speed it up
int j = find(s, "'", start);
int k = find(s, "\"", start);
int first = min(i, min(j, k));
if (first == Integer.MAX_VALUE) return s;
if (i == first) return s.substring(0, i);
//skipp quoted string
start = first+1;
if (k == first) { // " asdas\"dasd "
start = findSkipEsc(s,"\"",start);
if (start < 0) return s;
start++;
continue;
}
//if j == first ' asda\'sasd ' --- not in JSON
start = findSkipEsc(s,"'\"'",start);
if (start < 0) return s;
start++;
}
return s;
}
static String removeLineCommnts(String s) {
if (!s.contains("//")) return s; //Speed it up
return Arrays.stream(s.split("[\\n\\r]+")).
map(Common::removeLineCommnt).
collect(Collectors.joining("\n"));
}

Search ArrayList for certain character in string

What is the correct syntax for searching an ArrayList of strings for a single character? I want to check each string in the array for a single character.
Ultimately I want to perform multiple search and replaces on all strings in an array based on the presence of a single character in the string.
I have reviewed java-examples.com and java docs as well as several methods of searching ArrayLists. None of them do quite what I need.
P.S. Any pointers on using some sort of file library to perform multiple search and replaces would be great.
--- Edit ---
As per MightyPork's recommendations arraylist revised to use simple string type. This also made it compatible with hoosssein's solution which is included.
public void ArrayInput() {
String FileName; // set file variable
FileName = fileName.getText(); // get file name
ArrayList<String> fileContents = new ArrayList<String>(); // create arraylist
try {
BufferedReader reader = new BufferedReader(new FileReader(FileName)); // create reader
String line = null;
while ((line = reader.readLine()) != null) {
if(line.length() > 0) { // don't include blank lines
line = line.trim(); // remove whitespaces
fileContents.add(line); // add to array
}
}
for (String row : fileContents) {
System.out.println(row); // print array to cmd
}
String oldstr;
String newstr;
oldstr = "}";
newstr = "!!!!!";
for(int i = 0; i < fileContents.size(); i++) {
if(fileContents.get(i).contains(oldstr)) {
fileContents.set(i, fileContents.get(i).replace(oldstr, newstr));
}
}
for (String row : fileContents) {
System.out.println(row); // print array to cmd
}
// close file
}
catch (IOException ex) { // E.H. for try
JOptionPane.showMessageDialog(null, "File not found. Check name and directory.");
}
}
First you need to iterate over the list and search each string for that character:
string.contains("A");
To replace the character, keep in mind that String is immutable, so you must put the new string in place of the old one in the list.
So the code looks like this:
public void replace(ArrayList<String> toSearchIn,String oldstr, String newStr ){
for(int i=0;i<toSearchIn.size();i++){
if(toSearchIn.get(i).contains(oldstr)){
toSearchIn.set(i, toSearchIn.get(i).replace(oldstr, newStr));
}
}
}
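On Java 8 or later, the same per-element replacement can also be written with `List.replaceAll`, which avoids the index loop. This is a minimal sketch; the class name `ListReplace`, the method name `replaceInAll`, and the sample strings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class ListReplace {

    // Replaces every occurrence of oldStr with newStr in each element, in place.
    static void replaceInAll(List<String> lines, String oldStr, String newStr) {
        // String.replace leaves elements without a match unchanged
        lines.replaceAll(s -> s.replace(oldStr, newStr));
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>(Arrays.asList("if (x) {", "}", "return;"));
        replaceInAll(lines, "}", "!!!!!");
        System.out.println(lines); // prints [if (x) {, !!!!!, return;]
    }
}
```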
For the search and replace you are better off using a dictionary, if you know in advance that you will replace, say, Hi with Hello. The first method is a simple search that returns the matching string and its index in an Object[2], so you will have to cast the result. It returns the first match, since you were not clear on this.
public static Object[] findStringMatchingCharacter(List<String> list,
char character) {
if (list == null)
return null;
Object[] ret = new Object[2];
for (int i = 0; i < list.size(); i++) {
String s = list.get(i);
if (s.contains("" + character)) {
ret[0] = s;
ret[1] = i;
return ret;
}
}
return null;
}
public static void searchAndReplace(ArrayList<String> original,
Map<String, String> dictionary) {
if (original == null || dictionary == null)
return;
for (int i = 0; i < original.size(); i++) {
String s = original.get(i);
if (dictionary.get(s) != null)
original.set(i, dictionary.get(s));
}
}
You can try this, modify as needed:
public static ArrayList<String> findInString(String needle, List<String> haystack) {
ArrayList<String> found = new ArrayList<String>();
for(String s : haystack) {
if(s.contains(needle)) {
found.add(s);
}
}
return found;
}
(To search for a char, just use myChar + "" to get a String.)
Adding the find-and-replace functionality should now be fairly easy for you.
Here's a variant for searching String[]:
public static ArrayList<String[]> findInString(String needle, List<String[]> haystack) {
ArrayList<String[]> found = new ArrayList<String[]>();
for(String fileLines[] : haystack) {
for(String s : fileLines) {
if(s.contains(needle)) {
found.add(fileLines);
break;
}
}
}
return found;
}
You don't need to iterate over lines twice to do what you need. You can make replacement when iterating over file.
Java 8 solution
try (BufferedReader reader = Files.newBufferedReader(Paths.get("pom.xml"))) {
reader
.lines()
.filter(x -> x.length() > 0)
.map(x -> x.trim())
.map(x -> x.replace("a", "b"))
.forEach(System.out::println);
} catch (IOException e){
//handle exception
}
Another way by using iterator
public static void main(String[] args) {
ArrayList<String> list = new ArrayList<>();
list.add("Naman");
list.add("Aman");
list.add("Nikhil");
list.add("Adarsh");
list.add("Shiva");
list.add("Namit");
Iterator<String> iterator = list.iterator();
while (iterator.hasNext()) {
String next = iterator.next();
if (next.startsWith("Na")) {
System.out.println(next);
}
}
}

Java: Removing comments from string

I'd like to write a function that takes a string and, if it contains an inline comment, removes it. I know it sounds pretty simple, but I want to make sure I'm doing this right. For example:
private String filterString(String code) {
// lets say code = "some code //comment inside"
// return the string "some code" (without the comment)
}
I thought of two ways (feel free to advise otherwise):
Iterating over the string, finding the double slashes, and using the substring method.
Using a regex (I'm not so sure about this one).
Can you tell me which way is best and show me how it should be done? (Please don't suggest overly advanced solutions.)
Edit: can this be done somehow with a Scanner object? (I'm using this object anyway.)
If you want a more efficient regex to really match all types of comments, use this one :
replaceAll("(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)","");
source : http://ostermiller.org/findcomment.html
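To make the behaviour of that regex concrete, here is a small sketch that applies it to two inputs. The class name `CommentRegexDemo` and the method name `strip` are illustrative only. The second input shows a limitation: the regex knows nothing about string literals, so a `//` inside quotes is treated as a comment start.

```java
// Hypothetical demo class, not part of the original answer.
class CommentRegexDemo {

    // The regex quoted above: a block comment, or a line comment up to the end of the line.
    static final String COMMENT_REGEX = "(?:/\\*(?:[^*]|(?:\\*+[^*/]))*\\*+/)|(?://.*)";

    static String strip(String code) {
        return code.replaceAll(COMMENT_REGEX, "");
    }

    public static void main(String[] args) {
        // The line break after the line comment is kept; only the comment text goes away
        System.out.println(strip("int a = 1; // set a\n/* block */int b = 2;"));

        // Limitation: the URL inside the string literal is cut off at "//"
        System.out.println(strip("String s = \"http://x\";"));
    }
}
```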
EDIT:
Another solution, if you're not sure about using a regex, is to design a small automaton as follows:
public static String removeComments(String code){
final int outsideComment=0;
final int insideLineComment=1;
final int insideblockComment=2;
final int insideblockComment_noNewLineYet=3; // we want to have at least one new line in the result if the block is not inline.
int currentState=outsideComment;
String endResult="";
Scanner s= new Scanner(code);
s.useDelimiter("");
while(s.hasNext()){
String c=s.next();
switch(currentState){
case outsideComment:
if(c.equals("/") && s.hasNext()){
String c2=s.next();
if(c2.equals("/"))
currentState=insideLineComment;
else if(c2.equals("*")){
currentState=insideblockComment_noNewLineYet;
}
else
endResult+=c+c2;
}
else
endResult+=c;
break;
case insideLineComment:
if(c.equals("\n")){
currentState=outsideComment;
endResult+="\n";
}
break;
case insideblockComment_noNewLineYet:
if(c.equals("\n")){
endResult+="\n";
currentState=insideblockComment;
}
case insideblockComment:
while(c.equals("*") && s.hasNext()){
String c2=s.next();
if(c2.equals("/")){
currentState=outsideComment;
break;
}
}
}
}
s.close();
return endResult;
}
The best way to do this is to use regular expressions.
First find the /* */ comments, then remove all // comments. For example:
private String filterString(String code) {
// (?s) lets . match line breaks; the lazy .*? stops at the first */
String partialFiltered = code.replaceAll("(?s)/\\*.*?\\*/", "");
// .* stops at the end of the line, so the line break itself is kept
String fullFiltered = partialFiltered.replaceAll("//.*", "");
return fullFiltered;
}
Just use the replaceAll method from the String class, combined with a simple regular expression. Here's how to do it:
import java.util.*;
import java.lang.*;
class Main
{
public static void main (String[] args) throws java.lang.Exception
{
String s = "private String filterString(String code) {\n" +
" // lets say code = \"some code //comment inside\"\n" +
" // return the string \"some code\" (without the comment)\n}";
s = s.replaceAll("//.*?\n","\n");
System.out.println("s=" + s);
}
}
The key is the line:
s = s.replaceAll("//.*?\n","\n");
The regex //.*?\n matches strings starting with // until the end of the line.
And if you want to see this code in action, go here: http://www.ideone.com/e26Ve
Hope it helps!
To find the substring before a constant substring using a regular expression replacement is a bit much.
You can do it using indexOf() to check for the position of the comment start and substring() to get the first part, something like:
String code = "some code // comment";
int offset = code.indexOf("//");
if (-1 != offset) {
code = code.substring(0, offset);
}
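Wrapped into the `filterString` method from the question, the `indexOf` approach might look like the sketch below. The class name `InlineCommentFilter` is made up, and removing the whitespace left in front of the comment is an added assumption, made so that the result matches the "some code" output the question asks for. Like the snippet above, it does not recognise `//` inside string literals.

```java
// Hypothetical wrapper class, not part of the original answer.
class InlineCommentFilter {

    // Cuts the string at the first "//" and drops trailing whitespace.
    static String filterString(String code) {
        int offset = code.indexOf("//");
        String withoutComment = (offset == -1) ? code : code.substring(0, offset);
        // Assumption: whitespace before the comment is not wanted in the result
        return withoutComment.replaceAll("\\s+$", "");
    }

    public static void main(String[] args) {
        System.out.println(filterString("some code //comment inside")); // prints "some code"
    }
}
```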
@Christian Hujer has correctly pointed out that many or all of the solutions posted fail if the comments occur within a string.
@Loïc Gammaitoni suggests that his automaton approach could easily be extended to handle that case. Here is that extension.
enum State { outsideComment, insideLineComment, insideblockComment, insideblockComment_noNewLineYet, insideString };
public static String removeComments(String code) {
State state = State.outsideComment;
StringBuilder result = new StringBuilder();
Scanner s = new Scanner(code);
s.useDelimiter("");
while (s.hasNext()) {
String c = s.next();
switch (state) {
case outsideComment:
if (c.equals("/") && s.hasNext()) {
String c2 = s.next();
if (c2.equals("/"))
state = State.insideLineComment;
else if (c2.equals("*")) {
state = State.insideblockComment_noNewLineYet;
} else {
result.append(c).append(c2);
}
} else {
result.append(c);
if (c.equals("\"")) {
state = State.insideString;
}
}
break;
case insideString:
result.append(c);
if (c.equals("\"")) {
state = State.outsideComment;
} else if (c.equals("\\") && s.hasNext()) {
result.append(s.next());
}
break;
case insideLineComment:
if (c.equals("\n")) {
state = State.outsideComment;
result.append("\n");
}
break;
case insideblockComment_noNewLineYet:
if (c.equals("\n")) {
result.append("\n");
state = State.insideblockComment;
}
case insideblockComment:
while (c.equals("*") && s.hasNext()) {
String c2 = s.next();
if (c2.equals("/")) {
state = State.outsideComment;
break;
}
}
}
}
s.close();
return result.toString();
}
I made an open-source library (on GitHub) for this purpose. It is called CommentRemover, and it can remove single-line and multi-line Java comments.
It supports removing or keeping TODOs.
It also supports JavaScript, HTML, CSS, Properties, JSP and XML comments.
Here is a short code snippet showing how to use it (there are two types of usage):
First way: InternalPath
public static void main(String[] args) throws CommentRemoverException {
// root dir is: /Users/user/Projects/MyProject
// example for startInternalPath
CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
.removeJava(true) // Remove Java file Comments....
.removeJavaScript(true) // Remove JavaScript file Comments....
.removeJSP(true) // etc.. goes like that
.removeTodos(false) // Do Not Touch Todos (leave them alone)
.removeSingleLines(true) // Remove single line type comments
.removeMultiLines(true) // Remove multiple type comments
.startInternalPath("src.main.app") // Starts from {rootDir}/src/main/app , leave it empty string when you want to start from root dir
.setExcludePackages(new String[]{"src.main.java.app.pattern"}) // Refers to {rootDir}/src/main/java/app/pattern and skips this directory
.build();
CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
commentProcessor.start();
}
Second way ExternalPath
public static void main(String[] args) throws CommentRemoverException {
// example for externalPath
CommentRemover commentRemover = new CommentRemover.CommentRemoverBuilder()
.removeJava(true) // Remove Java file Comments....
.removeJavaScript(true) // Remove JavaScript file Comments....
.removeJSP(true) // etc..
.removeTodos(true) // Remove todos
.removeSingleLines(false) // Do not remove single line type comments
.removeMultiLines(true) // Remove multiple type comments
.startExternalPath("/Users/user/Projects/MyOtherProject")// Give it full path for external directories
.setExcludePackages(new String[]{"src.main.java.model"}) // Refers to /Users/user/Projects/MyOtherProject/src/main/java/model and skips this directory.
.build();
CommentProcessor commentProcessor = new CommentProcessor(commentRemover);
commentProcessor.start();
}
For Scanner, use a delimiter.
Delimiter example:
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;
public class MainClass {
public static void main(String args[]) throws IOException {
FileWriter fout = new FileWriter("test.txt");
fout.write("2, 3.4, 5,6, 7.4, 9.1, 10.5, done");
fout.close();
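// Open the same file that was just written (file names are case-sensitive on many systems).
FileReader fin = new FileReader("test.txt");
Scanner src = new Scanner(fin);
// Set the delimiter to a comma followed by zero or more spaces.
// ", *" tells Scanner to match a comma and zero or more spaces as
// delimiters.
src.useDelimiter(", *");
// Read and print numbers until a non-numeric token ("done") is reached.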
while (src.hasNext()) {
if (src.hasNextDouble()) {
System.out.println(src.nextDouble());
} else {
break;
}
}
fin.close();
}
}
For a plain string, use a tokenizer.
Tokenizer example:
// start with a String of space-separated words
String tags = "pizza pepperoni food cheese";
// convert each tag to a token
StringTokenizer st = new StringTokenizer(tags," ");
while ( st.hasMoreTokens() )
{
String token = st.nextToken(); // nextToken() already returns a String, so no cast is needed
System.out.println(token);
}
http://www.devdaily.com/blog/post/java/java-faq-stringtokenizer-example
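The StringTokenizer Javadoc describes the class as a legacy one kept for compatibility and recommends String.split instead. A minimal sketch of the same tokenizing with split follows; the class name SplitExample and the helper tokens are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SplitExample {
    // Hypothetical helper: splits on runs of whitespace.
    // The input is trimmed first so that leading whitespace
    // does not produce an empty first token.
    public static List<String> tokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        // Same input as the StringTokenizer example above.
        for (String token : tokens("pizza pepperoni food cheese")) {
            System.out.println(token);
        }
    }
}
```

Unlike StringTokenizer, split takes a regular expression, so "\\s+" also copes with tabs and repeated spaces.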
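It would be better if the code handled single-line comments and multi-line comments separately. Any suggestions?
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
public class RemovingCommentsFromFile {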
public static void main(String[] args) throws IOException {
BufferedReader fin = new BufferedReader(new FileReader("/home/pathtofilewithcomments/File"));
BufferedWriter fout = new BufferedWriter(new FileWriter("/home/result/File1"));
boolean multilinecomment = false;
boolean singlelinecomment = false;
int len,j;
String s = null;
while ((s = fin.readLine()) != null) {
StringBuilder obj = new StringBuilder(s);
len = obj.length();
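for (int i = 0; i < len; i++) {
// Advance j from i to the next character that lies outside any comment.
for (j = i; j < len; j++) {
// Guard every charAt(j + 1) so a trailing '/' or '*' cannot run past the end of the line.
boolean hasNext = j + 1 < len;
if (!multilinecomment && hasNext && obj.charAt(j) == '/' && obj.charAt(j + 1) == '*') {
j++; // skip the '*'; the loop increment then moves past the opener
multilinecomment = true;
continue;
} else if (!multilinecomment && hasNext && obj.charAt(j) == '/' && obj.charAt(j + 1) == '/') {
singlelinecomment = true;
j = len; // the rest of the line is a comment
break;
} else if (multilinecomment && hasNext && obj.charAt(j) == '*' && obj.charAt(j + 1) == '/') {
j++; // skip the '/'; keep scanning in case another comment follows directly
multilinecomment = false;
continue;
} else if (multilinecomment)
continue;
else
break;
}
if (j >= len)
{
singlelinecomment=false;
break;
}
else
i = j;
System.out.print((char)obj.charAt(i));
fout.write((char)obj.charAt(i));
}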
System.out.println();
fout.write((char)10);
}
fin.close();
fout.close();
}
}
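One way to handle the two comment kinds separately is a small state machine with one state per kind. The sketch below is only an illustration under stated assumptions: the names CommentStripper and strip are made up, it works on the whole file content as one string instead of line by line, and it does not handle character literals such as '"' or Java text blocks.

```java
public class CommentStripper {
    // One state per context, so each comment kind has its own branch.
    private enum State { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING }

    // Hypothetical helper: returns the source with // and /* */ comments removed.
    public static String strip(String source) {
        StringBuilder out = new StringBuilder();
        State state = State.CODE;
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            // '\0' stands for "no next character" at the end of the input.
            char next = i + 1 < source.length() ? source.charAt(i + 1) : '\0';
            switch (state) {
                case CODE:
                    if (c == '/' && next == '/') {
                        state = State.LINE_COMMENT;
                        i++; // skip the second '/'
                    } else if (c == '/' && next == '*') {
                        state = State.BLOCK_COMMENT;
                        i++; // skip the '*'
                    } else {
                        if (c == '"') {
                            state = State.STRING;
                        }
                        out.append(c);
                    }
                    break;
                case LINE_COMMENT:
                    // A single-line comment ends at the newline, which is kept.
                    if (c == '\n') {
                        state = State.CODE;
                        out.append(c);
                    }
                    break;
                case BLOCK_COMMENT:
                    // A multi-line comment ends at the first "*/".
                    if (c == '*' && next == '/') {
                        state = State.CODE;
                        i++; // skip the '/'
                    }
                    break;
                case STRING:
                    // Inside a string literal nothing is treated as a comment.
                    out.append(c);
                    if (c == '\\' && next != '\0') {
                        out.append(next); // copy the escaped character as-is
                        i++;
                    } else if (c == '"') {
                        state = State.CODE;
                    }
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "int a = 1; // one\n"
                + "String u = \"http://x\"; /* two\n"
                + "lines */ int b = 2;\n";
        System.out.print(strip(src));
    }
}
```

Tracking the STRING state is what keeps a "//" inside a string literal (a URL, for example) from being mistaken for a comment.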
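An easy solution that does not remove extra parts of the code (unlike those above):
// works for any reader; you can also iterate over a list of strings instead
// caveat: a "//" inside a string literal (a URL, for example) is still treated as a comment
StringBuilder sb = new StringBuilder();
String s;
while ((s = reader.readLine()) != null)
{
// drop a trailing // comment, then restore the newline that readLine() removed
sb.append(s.replaceAll("//.*", "")).append("\n");
}
// (?s) lets '.' match newlines; the reluctant .*? stops at the first closing */
String str = sb.toString().replaceAll("(?s)/\\*.*?\\*/", " ");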

Treemap Problem

I am trying to count the frequency of words in a text file, but reversed pairs must be counted together. For example, if the file contains BRAIN-ISCHEMIA and ISCHEMIA-BRAIN, I need to count BRAIN-ISCHEMIA twice (and leave ISCHEMIA-BRAIN out), or vice versa. Here is my code:
// Mapping of String->Integer (word -> frequency)
HashMap<String, Integer> frequencyMap = new HashMap<String, Integer>();
// Iterate through each line of the file
String[] temp;
String currentLine;
String currentLine2;
while ((currentLine = in.readLine()) != null) {
// Remove this line if you want words to be case sensitive
currentLine = currentLine.toLowerCase();
temp=currentLine.split("-");
currentLine2=temp[1]+"-"+temp[0];
// Iterate through each word of the current line
// Delimit words based on whitespace, punctuation, and quotes
StringTokenizer parser = new StringTokenizer(currentLine);
while (parser.hasMoreTokens()) {
String currentWord = parser.nextToken();
Integer frequency = frequencyMap.get(currentWord);
// Add the word if it doesn't already exist, otherwise increment the
// frequency counter.
if (frequency == null) {
frequency = 0;
}
frequencyMap.put(currentWord, frequency + 1);
}
StringTokenizer parser2 = new StringTokenizer(currentLine2);
while (parser2.hasMoreTokens()) {
String currentWord2 = parser2.nextToken();
Integer frequency = frequencyMap.get(currentWord2);
// Add the word if it doesn't already exist, otherwise increment the
// frequency counter.
if (frequency == null) {
frequency = 0;
}
frequencyMap.put(currentWord2, frequency + 1);
}
}
// Display our nice little Map
System.out.println(frequencyMap);
But for the following file:
ISCHEMIA-GLUTAMATE
ISCHEMIA-BRAIN
GLUTAMATE-BRAIN
BRAIN-TOLERATE
BRAIN-TOLERATE
TOLERATE-BRAIN
GLUTAMATE-ISCHEMIA
ISCHEMIA-GLUTAMATE
I am getting the following output:
{glutamate-brain=1, ischemia-glutamate=3, ischemia-brain=1, glutamate-ischemia=3, brain-tolerate=3, brain-ischemia=1, tolerate-brain=3, brain-glutamate=1}
I think the problem is in the second while block. Any light on this problem would be highly appreciated.
From an algorithmic perspective, you may want to consider the following approach:
For each string, split it, sort the parts, then re-combine them (i.e. DEF-ABC becomes ABC-DEF, and ABC-DEF stays ABC-DEF). Then use the result as the key for your frequency count.
If you need to hold onto the exact original item, include it alongside the key, so each entry holds both the normalized (re-combined) string and the original.
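The split, sort, re-combine idea can be sketched with the standard library alone. This is a minimal illustration, not the asker's or any answerer's code: the class name PairFrequency and the helpers canonical and count are made up, and it assumes Java 8 or later (for Map.merge and String.join).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PairFrequency {
    // Hypothetical helper: split on '-', sort the parts, join them again,
    // so that "ISCHEMIA-BRAIN" and "BRAIN-ISCHEMIA" map to the same key.
    public static String canonical(String pair) {
        String[] parts = pair.trim().toLowerCase().split("-");
        Arrays.sort(parts);
        return String.join("-", parts);
    }

    // Counts each line under its canonical key; blank lines are skipped.
    public static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> frequency = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            frequency.merge(canonical(line), 1, Integer::sum);
        }
        return frequency;
    }

    public static void main(String[] args) {
        // The sample file from the question, one entry per line.
        List<String> lines = Arrays.asList(
                "ISCHEMIA-GLUTAMATE", "ISCHEMIA-BRAIN", "GLUTAMATE-BRAIN",
                "BRAIN-TOLERATE", "BRAIN-TOLERATE", "TOLERATE-BRAIN",
                "GLUTAMATE-ISCHEMIA", "ISCHEMIA-GLUTAMATE");
        System.out.println(count(lines));
    }
}
```

For the sample file in the question this prints {brain-glutamate=1, brain-ischemia=1, brain-tolerate=3, glutamate-ischemia=3}: each pair is counted once per occurrence, regardless of the order of its two words.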
Disclaimer: I stole the sweet trick suggested by Kevin Day for my implementation.
I still want to post this to show that using the right data structure (Multiset/Bag) and the right library (Google Guava) not only simplifies the code but also makes it efficient.
Code
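import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import com.google.common.base.Charsets;
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;
import com.google.common.io.Files;
import com.google.common.io.LineProcessor;
public class BasicFrequencyCalculator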
{
public static void main(final String[] args) throws IOException
{
// Typing the LineProcessor makes readLines return Multiset<Word> without an unchecked cast.
Multiset<Word> frequency = Files.readLines(new File("c:/2.txt"), Charsets.ISO_8859_1, new LineProcessor<Multiset<Word>>() {
private final Multiset<Word> result = HashMultiset.create();
@Override
public Multiset<Word> getResult()
{
return result;
}
@Override
public boolean processLine(final String line) throws IOException
{
result.add(new Word(line));
return true;
}
});
for (Word w : frequency.elementSet())
{
System.out.println(w.getOriginal() + " = " + frequency.count(w));
}
}
}
public class Word
{
private final String key;
private final String original;
public Word(final String orig)
{
this.original = orig.trim();
String[] temp = original.toLowerCase().split("-");
Arrays.sort(temp);
key = temp[0] + "-"+temp[1];
}
@Override
public int hashCode()
{
final int prime = 31;
int result = 1;
result = prime * result + ((getKey() == null) ? 0 : getKey().hashCode());
return result;
}
@Override
public boolean equals(final Object obj)
{
if (this == obj)
{
return true;
}
if (obj == null)
{
return false;
}
if (!(obj instanceof Word))
{
return false;
}
Word other = (Word) obj;
if (getKey() == null)
{
if (other.getKey() != null)
{
return false;
}
}
else if (!getKey().equals(other.getKey()))
{
return false;
}
return true;
}
@Override
public String toString()
{
return getOriginal();
}
public String getKey()
{
return key;
}
public String getOriginal()
{
return original;
}
}
Output
BRAIN-TOLERATE = 3
ISCHEMIA-GLUTAMATE = 3
GLUTAMATE-BRAIN = 1
ISCHEMIA-BRAIN = 1
Thanks, everyone, for your help. Here is how I solved it:
// Mapping of String->Integer (word -> frequency)
TreeMap<String, Integer> frequencyMap = new TreeMap<String, Integer>();
// Iterate through each line of the file
String[] temp;
String currentLine;
String currentLine2;
while ((currentLine = in.readLine()) != null) {
temp=currentLine.split("-");
currentLine2=temp[1]+"-"+temp[0];
// Iterate through each word of the current line
StringTokenizer parser = new StringTokenizer(currentLine);
while (parser.hasMoreTokens()) {
String currentWord = parser.nextToken();
Integer frequency = frequencyMap.get(currentWord);
Integer frequency2 = frequencyMap.get(currentLine2);
// Add the word if it doesn't already exist, otherwise increment the
// frequency counter.
if (frequency == null) {
if (frequency2 == null)
frequency = 0;
else {
frequencyMap.put(currentLine2, frequency2 + 1);
break;
}//else
} //if (frequency == null)
frequencyMap.put(currentWord, frequency + 1);
}//while (parser.hasMoreTokens())
}//while ((currentLine = in.readLine()) != null)
// Display our nice little Map
System.out.println(frequencyMap);
