When I read a Java file as tokens and print its content using BufferedReader and StringTokenizer, how can I print only the content, without the comment statements that begin with "//" or "/* */"?
I want to print the content of the file without these statements that are used to clarify the code.
You can do that very easily using JavaParser: just parse the code, specifying that you want to ignore comments, and then dump the AST:
CompilationUnit cu = JavaParser.parse(reader, false /*considerComments*/);
String codeWithoutComments = cu.toString();
When dumping, it will reformat the code.
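For completeness, a minimal sketch of the whole round trip, assuming the older JavaParser 2.x API used in the snippet above (newer versions configure comment handling differently); the file name is just an example:

import java.io.FileReader;

import com.github.javaparser.JavaParser;
import com.github.javaparser.ast.CompilationUnit;

public class StripComments {
    public static void main(String[] args) throws Exception {
        try (FileReader reader = new FileReader("Input.java")) {
            // false => comments are never attached to the AST
            CompilationUnit cu = JavaParser.parse(reader, false);
            // pretty-printing the AST reproduces the code, comment-free
            System.out.println(cu.toString());
        }
    }
}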
1 If you want to remove comments yourself, you can:
remove // => see this similar question, no need for a regex: Find single line comments in byte array (a minimal sketch follows below)
remove /* */ => this is more difficult. A regex could work, but it could cause you a lot of pain; I don't recommend it.
2 Use a Java parser: Java : parse java source code, extract methods
javaparser for example: https://github.com/javaparser/javaparser
then iterate over the code, remove comments, etc.
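For the // case, here is a minimal sketch of the indexOf-based approach from the linked question (naive: it also cuts a "//" that occurs inside a string literal):

// returns the line without its trailing "//" comment, if any;
// naive: does not check whether "//" sits inside a string literal
static String stripLineComment(String line) {
    int idx = line.indexOf("//");
    return idx >= 0 ? line.substring(0, idx) : line;
}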
This code will remove the comments inside a text file. It will not remove the comment symbols themselves; if you need to remove them, you can do so by editing the three functions I have written below. Here is the test case I tested with:
// helloworld
/* comment */
a /* comment */
b
/*
comment
*/
c
d
e
// xxxx
f // xxxx
The output will be:
//
/* */
a /* */
b
/*
*/
c
d
e
//
f //
In this program I didn't remove the comment symbols because I was writing a lexical analyzer. You can remove them by editing the program statements where I have put the comments.
import java.io.BufferedReader;
import java.io.FileReader;

public class testSpace {

    public static void main(String[] args) {
        try {
            String filePath = "C:\\Users\\Sibil\\eclipse-workspace\\Assignment1\\src\\Input.txt";
            FileReader fr = new FileReader(filePath);
            BufferedReader br = new BufferedReader(fr);
            String line;
            int lineNumber = 0;
            while ((line = br.readLine()) != null) {
                lineNumber++;
                if ((line.contains("/*") && line.contains("*/")) || line.contains("//")) {
                    line = findreplacement(line);
                    System.out.println(line); // single-line comment, or /* ... */ on one line
                } else if (line.contains("/*")) {
                    line = getStartString(line);
                    System.out.println(line); // beginning of a multi-line comment
                    while ((line = br.readLine()) != null) {
                        lineNumber++;
                        if (line.contains("*/")) {
                            line = getEndString(line);
                            System.out.println(line); // print the end of a multi-line comment
                            break;
                        } else {
                            line = " ";
                            System.out.println(line); // blank space for a line inside a multi-line comment
                        }
                    }
                } else {
                    System.out.println(line); // line without a comment
                }
            }
            br.close();
        } catch (Exception e) {
            System.out.println(e);
        }
    }

    private static String getEndString(String s) {
        int end = s.indexOf("*/");
        // edit here if you don't want the comment symbol: add 2 to 'end' to drop "*/"
        String lineEnd = s.substring(end, s.length());
        return lineEnd;
    }

    private static String getStartString(String s) {
        int start = s.indexOf("/*");
        // edit here if you don't want the comment symbol: drop the "+ 2" to cut "/*" as well
        String lineStart = s.substring(0, start + 2);
        return lineStart;
    }

    private static String findreplacement(String s) {
        String line = "";
        if (s.contains("//")) {
            int start = s.indexOf("//");
            // edit here if you don't want the comment symbol: drop the "+ 2" to cut "//" as well
            line = s.substring(0, start + 2);
        } else if (s.contains("/*") && s.contains("*/")) {
            int start = s.indexOf("/*");
            int end = s.indexOf("*/");
            // edit these two lines if you don't want the comment symbols
            String lineStart = s.substring(0, start + 2);
            String lineEnd = s.substring(end, s.length());
            line = lineStart + " " + lineEnd;
        }
        return line;
    }
}
If your file has a line like this,
System.out.println("Hello World/*Do Something */");
it will fail: the /* ... */ inside the string literal is treated as a comment, and the output will be:
System.out.println("Hello World/* */");
I want to find names from a huge list of about 1 million names in a collection of text documents. I'm first building a Pattern from the names in the list:
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine(); // skip first line (header)
String combined = "";
while (dataRow != null) {
    String[] dataArray = dataRow.split("\t");
    String name = dataArray[1];
    combined += name.replace("\"", "") + "|";
    dataRow = TSVFile.readLine(); // read next line of data
}
TSVFile.close();
Pattern all = Pattern.compile(combined);
After doing so I got a PatternSyntaxException, because some names contain a '+' or other regex metacharacters. I tried solving this by ignoring the few offending names:
if (name.contains("\"")) {
    // ignore this name
}
That didn't work properly, and it is also messy, because you have to escape everything manually, run it many times, and waste your time.
Then I tried using the quote method:
Pattern all = Pattern.compile(Pattern.quote(combined));
However, now I don't find any matches in the text documents anymore, even when I also use quote on them. How can I solve this issue?
I agree with the comment of @dragon66: you should not quote the pipe "|". So your code would look like the code below, using Pattern.quote():
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine(); // skip first line (header)
String combined = "";
while (dataRow != null) {
    String[] dataArray = dataRow.split("\t");
    String name = dataArray[1];
    combined += Pattern.quote(name.replace("\"", "")) + "|"; // line changed
    dataRow = TSVFile.readLine(); // read next line of data
}
TSVFile.close();
Pattern all = Pattern.compile(combined);
I also suggest checking whether your problem domain needs further optimization: replace the String combined = "" concatenation with a mutable StringBuilder, to avoid creating unnecessary new strings inside the loop, as sketched below.
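A sketch of that change, which also trims the trailing "|" (left in place, it adds an empty alternative to the pattern that matches at every position):

BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
StringBuilder combined = new StringBuilder();
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine(); // skip first line (header)
while (dataRow != null) {
    String[] dataArray = dataRow.split("\t");
    String name = dataArray[1];
    // quote each name individually; the "|" separator stays unquoted
    combined.append(Pattern.quote(name.replace("\"", ""))).append('|');
    dataRow = TSVFile.readLine();
}
TSVFile.close();
// drop the trailing "|"
if (combined.length() > 0) {
    combined.setLength(combined.length() - 1);
}
Pattern all = Pattern.compile(combined.toString());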
guilhermerama presented the bugfix to your code.
I will add some performance improvements. As I pointed out, the regex library of Java does not scale well and is even slower when used for searching.
But one can do better with multi-string-search algorithms, for example by using StringsAndChars String Search:
// setting up a test file
Iterable<String> lines = createLines();
Files.write(Paths.get("names.tsv"), lines, CREATE, WRITE, TRUNCATE_EXISTING);

// read the pattern from the file
BufferedReader TSVFile = new BufferedReader(new FileReader("names.tsv"));
Set<String> combined = new LinkedHashSet<>();
String dataRow = TSVFile.readLine();
dataRow = TSVFile.readLine(); // skip first line (header)
while (dataRow != null) {
    String[] dataArray = dataRow.split("\t");
    String name = dataArray[1];
    combined.add(name);
    dataRow = TSVFile.readLine(); // read next line of data
}
TSVFile.close();

// search the pattern in a small text
StringSearchAlgorithm stringSearch = new AhoCorasick(new ArrayList<>(combined));
StringFinder finder = stringSearch.createFinder(
        new StringCharProvider("test " + name(38) + "\n or " + name(799) + " : " + name(99999), 0));
System.out.println(finder.findAll());
The result will be
[5:10(00038), 15:20(00799), 23:28(99999)]
The search (finder.findAll()) takes (on my computer) less than 1 millisecond. Doing the same with java.util.regex took around 20 milliseconds.
You may tune this performance by using other algorithms provided by RexLex.
Setting up the test file needs the following code:
private static Iterable<String> createLines() {
    List<String> list = new ArrayList<>();
    for (int i = 0; i < 100000; i++) {
        list.add(i + "\t" + name(i));
    }
    return list;
}

private static String name(int i) {
    // zero-pad the number to a width of 5, e.g. 38 -> "00038"
    String s = String.valueOf(i);
    while (s.length() < 5) {
        s = '0' + s;
    }
    return s;
}
I am trying to duplicate the data below 1 million times and write it to a file.
row1,Test,2.0,1305033.0,3.0,sdfgfsg,2452345,sfgfsdg,asdfgsdfg,Gasdfgfsdgh,sdgh,sdhd sdgh,sdgh,sdgh,,sdhg,,sdgh,,,,,,,sdgh,,,,,,,,,05/12/1954,,,,,,sdghdgsh,sdfhgd,,12/25/1981,,,,12/25/1981,,,,,,,,,,,,,sdgh, dsghgh; sdgh,,,,,1.0,sdfsdf,sfgggf,34f
Each time I want to update the first column with the record number, so my second row will be
row2,Test,2.0,1305033.0,3.0,sdfgfsg,2452345,sfgfsdg,asdfgsdfg,Gasdfgfsdgh,sdgh,sdhd sdgh,sdgh,sdgh,,sdhg,,sdgh,,,,,,,sdgh,,,,,,,,,05/12/1954,,,,,,sdghdgsh,sdfhgd,,12/25/1981,,,,12/25/1981,,,,,,,,,,,,,sdgh, dsghgh; sdgh,,,,,1.0,asrg,awrgtwag,245sfgsfg
I tried using StringBuilder, but I am not able to append more than 10,000 rows; the program becomes very slow.
Any suggestions?
I'm fine trying to write code in other languages.
Below is the code snippet which prepares the data to write to the file; in my app I'll get the data as Object[]:
private static void writecsv(Map<String, Object[]> data) throws Exception {
    Set<String> keyset = data.keySet();
    StringBuilder sb = new StringBuilder();
    for (int count = 0; count < OUTPUT_RECORD_COUNT; count++) {
        for (String key : keyset) {
            Object[] objArr = data.get(key);
            for (Object obj : objArr) {
                if (obj == null)
                    obj = BLANK;
                sb.append(obj.toString() + COMMA);
            }
            // drop the trailing comma and end the line
            sb.setLength(sb.length() - 1);
            sb.append(NEW_LINE);
        }
    }
    System.out.print(sb.toString());
}
If you print to System.out directly in your inner for-loop, you won't have to buffer everything in memory in the StringBuilder.
You want to write to a file, but I don't see any OutputStream or FileWriter in your code.
Don't use a StringBuilder as a buffer.
private static final int OUTPUT_RECORD_COUNT = 1000000;
private static final String BLANK = "";
private static final String COMMA = ",";
private static final String FILE_ENCODING = "Cp1252"; // Windows-ANSI

/*
 * Creates a String for the fields in the array by joining
 * the String values with a COMMA separator.
 * The first character is also a COMMA because later we will put one field
 * in front of the resulting string.
 */
private static String createLine(Object[] fields) {
    StringBuilder sb = new StringBuilder();
    for (Object field : fields) {
        sb.append(COMMA).append(field == null ? BLANK : field.toString());
    }
    return sb.toString();
}

/*
 * Added the fileName parameter.
 */
private static void writecsv(Map<String, Object[]> data, String fileName) throws Exception {
    Set<String> keyset = data.keySet();
    // Use a
    // - FileOutputStream to write bytes to the file
    // - OutputStreamWriter to convert text strings to bytes according to a character encoding
    // - BufferedWriter to use an in-memory buffer for writing to the file
    // - PrintWriter for convenience methods like println()
    PrintWriter out = new PrintWriter(new BufferedWriter(
            new OutputStreamWriter(new FileOutputStream(fileName), FILE_ENCODING)));
    try {
        // It seems each key represents one original line
        for (String key : keyset) {
            // Create each line - at least the part after the "rowX" - only once.
            String line = createLine(data.get(key));
            // And you want every line duplicated OUTPUT_RECORD_COUNT times.
            for (int count = 0; count < OUTPUT_RECORD_COUNT; count++) {
                // Put "rowX" in front of every line, where X is the value of count.
                out.print("row");
                out.print(count);
                out.println(line);
            }
        }
    } finally {
        // Close the Writer even in case of an exception.
        out.flush();
        out.close();
    }
}
Ummm, have you tried using bash?
#!/bin/bash
var=1
while [ $var -le 1000000 ]
do
    echo "$var" >> temp
    var=$(( $var + 1 ))
done
I tried running the program and it took around a couple of minutes to finish appending 1 million lines.
Your code is keeping all the data in memory, which is why it cannot scale. Instead, you should open the file beforehand and then write to it line by line.
See, e.g., this answer for a simple example on how to do this.
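In short, the pattern looks like this (a minimal sketch; the file name and record content are placeholders):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class WriteLineByLine {
    public static void main(String[] args) throws IOException {
        String line = "Test,2.0,1305033.0,3.0"; // stands in for the real record
        // open the file once; each line goes to the (buffered) file immediately,
        // so memory use stays constant no matter how many rows are written
        try (BufferedWriter out = new BufferedWriter(new FileWriter("out.csv"))) {
            for (int i = 1; i <= 1_000_000; i++) {
                out.write("row" + i + "," + line);
                out.newLine();
            }
        }
    }
}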
Also note that when you are serious about writing proper CSV, you should consider using a library for that, such as opencsv. Then things like proper quoting will be handled for you.
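A minimal opencsv sketch, assuming its CSVWriter class (the column values here are made up):

import java.io.FileWriter;
import java.io.IOException;
import com.opencsv.CSVWriter;

public class OpencsvExample {
    public static void main(String[] args) throws IOException {
        try (CSVWriter writer = new CSVWriter(new FileWriter("out.csv"))) {
            // opencsv takes care of separators and of quoting fields
            // that contain commas, quotes or line breaks
            writer.writeNext(new String[] { "row1", "Test", "2.0", "a,b" });
        }
    }
}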
I implemented a translation application and ran into a problem: incorrect output.
For example:
Input:
"Three predominant stories interweave: a dynastic war among several
families for control of Westeros; the rising threat of the dormant
cold supernatural Others dwelling beyond an immense wall of ice on
Westeros' northern border; and the am"
Output:
"%0D%0A%0D%0AThe+история+ -
+A+Песня+из+Лед+и+Fire+принимает++++вымышленный+континентах+Вестероса+и+Essos%2C+with+a+история++тысяч++лет.++Точка+++++главе+в+в+история+
- +a+ограниченной+перспектива+++ассортимент++символы+,+растет+from+девяти+в+в+первое++тридцать
один+++пятый+of+the+романов.+Три+преобладающим+рассказы+переплетаются%3A+a+династические+war+среди+несколько+семей+for+control++Вестероса%3B++рост+угрозу+of+the+спящие+cold+сверхъестественное+Другие+жилье+за+an+огромный+wall++лед+on+Вестероса%27+сев.
границы%3B+и++am"
I know that URLEncoder is the reason for the wrong output (all these "+" and "%"), but I don't know how to fix it.
Here is some code:
// This method should take the original text to be translated
// and encode it for use as a URL parameter.
private String encodeText(String text) throws IOException {
    return URLEncoder.encode(text, "UTF-8");
}

// It should "extract" the translated text from the Yandex Translator response.
// More details about the response format can be found at
// http://api.yandex.ru/translate/doc/dg/reference/translate.xml;
// we need to use the XML interface.
private String parseContent(String content)
        throws UnsupportedEncodingException {
    String begin = "<text>";
    String end = "</text>";
    String result = "";
    int i, j;
    i = content.indexOf(begin);
    j = content.indexOf(end);
    if ((i != -1) && (j != -1)) {
        result = content.substring((i + begin.length()), j);
    }
    return new String(result.getBytes(), "UTF-8");
}

// translate() should return the translation of the original text;
// urlSourceProvider loads the translated text.
public String translate(String original) throws IOException {
    return parseContent(urlSourceProvider
            .load(prepareURL(encodeText(original))));
}
Try:
String result = URLDecoder.decode(variable, "UTF-8");
It should decode your text.
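One way to fold that into the asker's parseContent (a sketch; whether the decode belongs exactly here depends on where the encoded text actually surfaces in the response):

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

private String parseContent(String content)
        throws UnsupportedEncodingException {
    String begin = "<text>";
    String end = "</text>";
    String result = "";
    int i = content.indexOf(begin);
    int j = content.indexOf(end);
    if ((i != -1) && (j != -1)) {
        result = content.substring(i + begin.length(), j);
    }
    // turn '+' back into spaces and %XX escapes back into their characters
    return URLDecoder.decode(result, "UTF-8");
}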
To speed up a lookup search in a multi-record file, I want to store its elements in a String array of arrays, so that I can search for a string like "AF" among similar strings only ("AA", "AB", ..., "AZ") and not in the whole file.
The original file is like this:
AA
ABC
AF
(...)
AP
BE
BEND
(...)
BZ
(...)
SHORT
VERYLONGRECORD
ZX
which I want to translate into
AA ABC AF (...) AP
BE BEND (...) BZ
(...)
SHORT
VERYLONGRECORD
ZX
I don't know how many records there are or how many "elements" each "row" will have, as the source file can change over time (even though, after being read into memory, the array is only read).
I tried this solution:
in a class I defined the string array of (string) arrays, without defining its dimensions
public static String[][] tldTabData;
then, in another class, I read the file:
public static void tldLoadTable() {
    String rec = null;
    int previdx = 0;
    int rowidx = 0;
    // this will hold each row
    ArrayList<String> mVector = new ArrayList<String>();
    FileInputStream fStream;
    BufferedReader bufRead = null;
    try {
        fStream = new FileInputStream(eVal.appPath + eVal.tldTabDataFilename);
        bufRead = new BufferedReader(new InputStreamReader(fStream));
    } catch (Exception er1) {
        /* if we fail the 1st try, maybe we're working inside some "package"
         * (e.g. debugging), so we'll try a second time with a modified path
         * (e.g. adding "bin\") instead of raising an error and exiting.
         */
        try {
            fStream = new FileInputStream(eVal.appPath +
                    "bin" + File.separatorChar + eVal.tldTabDataFilename);
            bufRead = new BufferedReader(new InputStreamReader(fStream));
        } catch (FileNotFoundException er2) {
            System.err.println("Error: " + er2.getMessage());
            er2.printStackTrace();
            System.exit(1);
        }
    }
    try {
        while ((rec = bufRead.readLine()) != null) {
            // strip comments and short (empty) rows
            if (!rec.startsWith("#") && rec.length() > 1) {
                // work with uppercase only (maybe not useful)
                //rec.toUpperCase();
                // use the 1st char as a row index
                rowidx = rec.charAt(0);
                // if the row changes (e.g. A->B) and it is not the 1st line we read
                if (previdx != rowidx && previdx != 0) {
                    // store the (completed) collection into the array
                    eVal.tldTabData[previdx] = mVector.toArray(new String[mVector.size()]);
                    // clear the collection itself
                    mVector.clear();
                    // and restart filling it from scratch
                    mVector.add(rec);
                } else {
                    // continue filling the collection
                    mVector.add(rec);
                }
                // and sync the indexes
                previdx = rowidx;
            }
        }
        bufRead.close();
        // globally flag the table as loaded
        eVal.tldTabLoaded = true;
    } catch (Exception er2) {
        System.err.println("Error: " + er2.getMessage());
        er2.printStackTrace();
        System.exit(1);
    }
}
When executing the program, it correctly accumulates the strings into mVector but, when trying to copy them into eVal.tldTabData, I get a NullPointerException.
I bet I have to create/initialize the array at some point, but I'm having trouble figuring out where and how.
This is the first time I'm coding in Java... hello world apart. :-)
You can use a Map to store your strings per row;
here is something like what you'll need:
// Assuming that mVector already holds all your input strings
Map<String, List<String>> map = new HashMap<String, List<String>>();
for (String str : mVector) {
    List<String> storedList;
    if (map.containsKey(str.substring(0, 1))) {
        storedList = map.get(str.substring(0, 1));
    } else {
        storedList = new ArrayList<String>();
        map.put(str.substring(0, 1), storedList);
    }
    storedList.add(str);
}

Set<String> unOrdered = map.keySet();
List<String> orderedIndexes = new ArrayList<String>(unOrdered);
Collections.sort(orderedIndexes);
for (String key : orderedIndexes) { // get the strings for every row
    List<String> values = map.get(key);
    for (String value : values) { // write the strings on the same row
        System.out.print(value + "\t"); // change this to write to some file
    }
    System.out.println(); // add a new line at the end of the row
}