I'm trying to write a small program that detects comments in a code file and wraps each one in an indexed tag, meaning a tag whose number increases with each comment.
For example this input:
method int foo (int y) {
int temp; // FIRST COMMENT
temp = 63; // SECOND COMMENT
// THIRD COMMENT
}
should be changed to:
method int foo (int y) {
int temp; <TAG_0>// FIRST COMMENT</TAG>
temp = 63; <TAG_1>// SECOND COMMENT</TAG>
<TAG_2>// THIRD COMMENT</TAG>
}
I tried the following code:
String prefix, suffix;
String pattern = "(//.*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(fileText);
int i = 0;
suffix = "</TAG>";
while (m.find()) {
prefix = "<TAG_" + i + ">";
System.out.println(m.replaceAll(prefix + m.group() + suffix));
i++;
}
The output for the above code is:
method int foo (int y) {
int temp; <TAG_0>// FIRST COMMENT</TAG>
temp = 63; <TAG_0>// SECOND COMMENT</TAG>
<TAG_0>// THIRD COMMENT</TAG>
}
To replace occurrences of detected patterns, you should use the Matcher#appendReplacement method which fills a StringBuffer:
StringBuffer sb = new StringBuffer();
while (m.find()) {
prefix = "<TAG_" + i + ">";
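// quoteReplacement keeps any $ or \ inside the comment from being read as a group reference
m.appendReplacement(sb, Matcher.quoteReplacement(prefix + m.group() + suffix));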
i++;
}
m.appendTail(sb); // append the rest of the contents
The reason replaceAll does the wrong replacement is that it resets the Matcher and scans the whole string, replacing every matched pattern with <TAG_0>...</TAG> in a single call. In effect, the loop only executes once.
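For reference, here is a self-contained sketch of the appendReplacement approach. The class name CommentTagger and the method tagComments are made up for illustration. Matcher.quoteReplacement is added on the assumption that a comment may contain $ or \, which appendReplacement would otherwise interpret as a group reference or an escape. Note that the pattern is naive: it would also tag a // that appears inside a string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentTagger {
    // "." does not match line terminators, so each match stops at the end of its line.
    private static final Pattern COMMENT = Pattern.compile("//.*");

    // Hypothetical helper: wraps every line comment in <TAG_n>...</TAG>, n = 0, 1, 2, ...
    static String tagComments(String text) {
        Matcher m = COMMENT.matcher(text);
        StringBuffer sb = new StringBuffer();
        int i = 0;
        while (m.find()) {
            String replacement = "<TAG_" + i + ">" + m.group() + "</TAG>";
            // Copies the text before this match, then the (literal) replacement.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
            i++;
        }
        m.appendTail(sb); // copies whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "int temp; // FIRST COMMENT\ntemp = 63; // SECOND COMMENT\n}\n";
        System.out.print(tagComments(input));
    }
}
```

Because each call to find() advances past the previous match, the counter is incremented exactly once per comment.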
Have you tried reading the file per line, like:
String suffix = "</TAG>";
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
int i = 0;
for (String line; (line = br.readLine()) != null;) {
int pos = line.indexOf("//");
if (pos >= 0) {
// wrap everything from the first "//" to the end of the line
line = line.substring(0, pos) + "<TAG_" + i + ">" + line.substring(pos) + suffix;
i++;
}
System.out.println(line); // print every line, tagged or not
}
} catch (IOException e) {
e.printStackTrace();
}
fichiertexte.txt :
method int foo (int y) {
int temp; // FIRST COMMENT
temp = 63; // SECOND COMMENT
// THIRD COMMENT
}
App.java :
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class App {
public static void main(String[] args) {
String fileText = "";
String fichier = "fichiertexte.txt";
// read the text file
try {
InputStream ips = new FileInputStream(fichier);
InputStreamReader ipsr = new InputStreamReader(ips);
BufferedReader br = new BufferedReader(ipsr);
String ligne;
while ((ligne = br.readLine()) != null) {
//System.out.println(ligne);
fileText += ligne + "\n";
}
br.close();
} catch (Exception e) {
System.err.println(e.toString());
}
String prefix, suffix;
String pattern = "(//.*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(fileText);
int i = 0;
suffix = "</TAG>";
StringBuffer sb = new StringBuffer();
while (m.find()) {
prefix = "<TAG_" + i + ">";
m.appendReplacement(sb, prefix + m.group() + suffix);
i++;
}
m.appendTail(sb); // append the text after the last match, otherwise the closing lines are lost
System.out.println(sb.toString());
}
}
System.out :
method int foo (int y) {
int temp; <TAG_0>// FIRST COMMENT</TAG>
temp = 63; <TAG_1>// SECOND COMMENT</TAG>
<TAG_2>// THIRD COMMENT</TAG>
}
Related
This is what I have for now. I want to know how many times a given word occurs in a .txt document. I am trying to use BufferedReader but haven't managed to get it working. I guess there is an easier way to solve this, but I don't know it.
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
public class TekstiAnalüsaator {
public static void main(String[] args) throws Exception {
InputStream baidid = new FileInputStream("test.txt");
InputStreamReader tekst = new InputStreamReader(baidid, "UTF-8");
BufferedReader puhverdab = new BufferedReader(tekst);
String rida = puhverdab.readLine();
while (rida != null){
System.out.println("Reading: " + rida);
rida = puhverdab.readLine();
}
puhverdab.close();
}
}
I want to search for words using the structure below: give the file, then the word to find, and get back how many times that word occurs in the file.
TekstiAnalüsaator analüsaator = new TekstiAnalüsaator("kiri.txt");
int esinemisteArv = analüsaator.sõneEsinemisteArv("kala");
Please see the code example below. Note that it counts the lines that contain the word, not every occurrence of the word.
import java.io.*;
public class CountWords {
public static void main(String args[]) throws IOException {
System.out.println(count("Test.java", "static"));
}
public static int count(String filename, String wordToSearch) throws IOException {
int lineCount = 0; // number of lines that contain wordToSearch
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
String s;
while ((s = br.readLine()) != null) {
if (s.contains(wordToSearch))
lineCount++;
}
}
return lineCount;
}
}
This is a slightly tricky question, because counting words in a string is not a simple task. Your approach to reading the file line by line is fine, so the remaining problem is how to count the word matches.
For example you can do the simple check for matches like that:
public static int getCountOFWordsInLine(String line, String test){
int count=0;
int index=0;
while(line.indexOf(test,index ) != -1) {
count++;
index=line.indexOf(test,index)+1;
}
return count;
}
The problem with that approach is that if your word is "test" and your string is "Next word matches asdfatestsdf" it will count it as a match. So you can try using some more advanced regex:
public static int getCountOFWordsInLine(String line, String word) {
int count = 0;
Pattern pattern = Pattern.compile("\\b"+word+"\\b");
Matcher matcher = pattern.matcher(line);
while (matcher.find())
count++;
return count;
}
It checks for the word surrounded by \b, which matches a word boundary.
It still won't find the word if it starts with an uppercase letter, though. To make the search case insensitive, you can lowercase both the line and the word before searching, or compile the pattern with Pattern.CASE_INSENSITIVE. What counts as a match depends on your definition of a word.
The whole program will become:
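import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class MainClass {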
public static void main(String[] args) throws InterruptedException {
try {
InputStream baidid = new FileInputStream("c:\\test.txt");
InputStreamReader tekst = new InputStreamReader(baidid, "UTF-8");
BufferedReader puhverdab = new BufferedReader(tekst);
String rida = puhverdab.readLine();
String word="test";
int count=0;
while (rida != null){
System.out.println("Reading: " + rida);
count+=getCountOFWordsInLine(rida,word );
rida = puhverdab.readLine();
}
System.out.println("count:"+count);
puhverdab.close();
}catch(Exception e) {
e.printStackTrace();
}
}
public static int getCountOFWordsInLine(String line, String test) {
int count = 0;
Pattern pattern = Pattern.compile("\\b"+test+"\\b");
Matcher matcher = pattern.matcher(line);
while (matcher.find())
count++;
return count;
}
}
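The case-insensitive variant mentioned above could look like the following sketch. The names WordCounter and countWholeWord are illustrative only. Pattern.quote is added on the assumption that the search word may contain regex metacharacters such as . or +. Keep in mind that \b only behaves as expected when the word starts and ends with a letter, digit, or underscore.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCounter {
    // Hypothetical helper: counts whole-word, case-insensitive occurrences of word in line.
    static int countWholeWord(String line, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(line);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // "Test" and "test," count; "asdfatestsdf" does not.
        System.out.println(countWholeWord("Test the test, not asdfatestsdf", "test"));
    }
}
```

If the method is called for every line of a large file, compiling the pattern once outside the loop avoids repeated work.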
import java.io.*;
import java.util.regex.*;
public class TA
{
public static void main(String[] args) throws Exception
{
InputStream baidid = new FileInputStream("test.txt");
InputStreamReader tekst = new InputStreamReader(baidid, "UTF-8");
BufferedReader puhverdab = new BufferedReader(tekst);
String rida;
String word = args[0]; // search word passed via command line
int count1=0, count2=0, count3=0, count4=0;
Pattern P1 = Pattern.compile("\\b" + word + "\\b");
Pattern P2 = Pattern.compile("\\b" + word + "\\b", Pattern.CASE_INSENSITIVE);
while ((rida = puhverdab.readLine()) != null)
{
System.out.println("Reading: " + rida);
// Version 1: counts lines containing [word]
if (rida.contains(word)) count1++;
// Version 2: counts every instance of [word], including inside longer words
int pos = 0;
while ((pos = rida.indexOf(word, pos)) != -1) { count2++; pos++; }
// Version 3: counts [word] only as a whole word (\b matches a word boundary)
Matcher m1 = P1.matcher(rida);
while (m1.find()) count3++;
// Version 4: same as version 3, but case insensitive
Matcher m2 = P2.matcher(rida);
while (m2.find()) count4++;
}
System.out.println("Found exactly " + count1 + " line(s) containing word: \"" + word + "\"");
System.out.println("Found word \"" + word + "\" exactly " + count2 + " time(s)");
System.out.println("Found word \"" + word + "\" as a whole word " + count3 + " time(s).");
System.out.println("Found, case insensitive search, word \"" + word + "\" as a whole word " + count4 + " time(s).");
puhverdab.close();
}
}
This reads line-by-line as you've already done, splits a line by whitespace to obtain individual words, and checks each word for a match.
int countWords(String filename, String word) throws Exception {
InputStream inputStream = new FileInputStream(filename);
InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");
BufferedReader reader = new BufferedReader(inputStreamReader);
int count = 0;
String line = reader.readLine();
while (line != null) {
String[] words = line.split("\\s+");
for (String w : words)
if (w.equals(word))
count++;
line = reader.readLine();
}
reader.close();
return count;
}
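The question asked for an object that is constructed with a file name and then queried for a word. A rough sketch of that shape is below, using the same whitespace-splitting idea. TextAnalyzer and countOccurrences are English stand-ins for the asker's TekstiAnalüsaator and sõneEsinemisteArv, and the second constructor taking a list of lines is an addition so the counting can be tried without a file. Words with punctuation attached, such as "kala,", are not matched by this approach.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class TextAnalyzer {
    private final List<String> lines;

    // Reads the whole file up front, decoding it as UTF-8.
    public TextAnalyzer(String filename) throws IOException {
        this(Files.readAllLines(Paths.get(filename), StandardCharsets.UTF_8));
    }

    // Illustrative extra constructor: works on lines that are already in memory.
    public TextAnalyzer(List<String> lines) {
        this.lines = lines;
    }

    // Counts whitespace-separated tokens that are exactly equal to word.
    public int countOccurrences(String word) {
        int count = 0;
        for (String line : lines) {
            for (String w : line.trim().split("\\s+")) {
                if (w.equals(word)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        TextAnalyzer analyzer = new TextAnalyzer(Arrays.asList("kala ja kala", "kalad"));
        System.out.println(analyzer.countOccurrences("kala")); // prints 2
    }
}
```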
So I have pretty much completed (I think) my wc program in Java, that takes a filename from a user input (even multiple), and counts the lines, words, bytes (number of characters) from the file. There were 2 files provided for testing purposes, and they are in a .dat format, being readable from dos/linux command lines. Everything is working properly except for the count when there are \n or \r\n characters at the end of line. It will not count these. Please help?
import java.io.*;
import java.util.regex.Pattern;
public class Prog03 {
private static int totalWords = 0, currentWords = 0;
private static int totalLines =0, currentLines = 0;
private static int totalBytes = 0, currentBytes = 0;
public static void main(String[] args) {
System.out.println("This program determines the quantity of lines, words, and bytes\n" +
"in a file or files that you specify.\n" +
"\nPlease enter one or more file names, comma-separated: ");
getFileName();
System.out.println();
} // End of main method.
public static void countSingle (String fileName, BufferedReader in) {
try {
String line;
String[] words;
//int totalWords = 0;
int totalWords1 = 0;
int lines = 0;
int chars = 0;
while ((line = in.readLine()) != null) {
lines++;
currentLines = lines;
chars += line.length();
currentBytes = chars;
words = line.split(" ");
totalWords1 += countWords(line);
currentWords = totalWords1;
} // End of while loop.
System.out.println(currentLines + "\t\t" + currentWords + "\t\t" + currentBytes + "\t\t"
+ fileName);
} catch (Exception ex) {
ex.printStackTrace();
}
}
public static void countMultiple(String fileName, BufferedReader in) {
try {
String line;
String[] words;
int totalWords1 = 0;
int lines = 0;
int chars = 0;
while ((line = in.readLine()) != null) {
lines++;
currentLines = lines;
chars += line.length();
currentBytes = chars;
words = line.split(" ");
totalWords1 += countWords(line);
currentWords = totalWords1;
} // End of while loop.
totalLines += currentLines;
totalBytes += currentBytes;
totalWords += totalWords1;
} catch (Exception ex) {
ex.printStackTrace();
}
} // End of method count().
private static long countWords(String line) {
long numWords = 0;
int index = 0;
boolean prevWhitespace = true;
while (index < line.length()) {
char c = line.charAt(index++);
boolean currWhitespace = Character.isWhitespace(c);
if (prevWhitespace && !currWhitespace) {
numWords++;
}
prevWhitespace = currWhitespace;
}
return numWords;
} // End of method countWords().
private static void getFileName() {
BufferedReader in ;
try {
in = new BufferedReader(new InputStreamReader(System.in));
String fileName = in.readLine();
String [] files = fileName.split(", ");
System.out.println("Lines\t\tWords\t\tBytes" +
"\n--------\t--------\t--------");
for (int i = 0; i < files.length; i++) {
FileReader fileReader = new FileReader(files[i]);
in = new BufferedReader(fileReader);
if (files.length == 1) {
countSingle(files[0], in);
in.close();
}
else {
countMultiple(files[i], in);
System.out.println(currentLines + "\t\t" +
currentWords + "\t\t" + currentBytes + "\t\t"
+ files[i]);
in.close();
}
}
if (files.length > 1) {
System.out.println("----------------------------------------" +
"\n" + totalLines + "\t\t" + totalWords + "\t\t" + totalBytes + "\t\tTotals");
}
}
catch (FileNotFoundException ioe) {
System.out.println("The specified file was not found. Please recheck "
+ "the spelling and try again.");
ioe.printStackTrace();
}
catch (IOException ioe) {
ioe.printStackTrace();
}
}
} // End of class
That is the entire program, in case anyone helping needs to see it. However, this is where I count the length of each line (I assumed that the EOL characters would be part of this count, but they aren't):
public static void countMultiple(String fileName, BufferedReader in) {
try {
String line;
String[] words;
int totalWords1 = 0;
int lines = 0;
int chars = 0;
while ((line = in.readLine()) != null) {
lines++;
currentLines = lines;
chars += line.length(); // <-- the count in question
currentBytes = chars;
words = line.split(" ");
totalWords1 += countWords(line);
currentWords = totalWords1;
} // End of while loop.
totalLines += currentLines;
totalBytes += currentBytes;
totalWords += totalWords1;
} catch (Exception ex) {
ex.printStackTrace();
}
}
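BufferedReader.readLine() always strips the line terminator (\n, \r, or \r\n) from the string it returns, so there is no way to count those characters from the output of readLine() alone.
You can use the read() method instead, but in that case you have to process the characters yourself, either one at a time or through a char[] buffer.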
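A minimal sketch of counting with read(), so that line terminators are included, might look like this. CharCounter and countChars are hypothetical names, and wrapping the IOException in an UncheckedIOException is just a convenience for the example. This counts characters, not bytes; for a true byte count like wc, the same loop over an InputStream with a byte[] buffer would be needed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class CharCounter {
    // Hypothetical helper: counts every character the reader delivers, including \r and \n.
    static long countChars(Reader in) {
        long count = 0;
        char[] buf = new char[8192];
        try {
            int n;
            // read() returns the number of chars placed in buf, or -1 at end of stream.
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                count += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // 3 + 2 + 3 + 1 characters
        System.out.println(countChars(new StringReader("one\r\ntwo\n"))); // prints 9
    }
}
```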
Just a comment: to split a line into words, splitting on a single space with line.split(" ") is not enough, because it misbehaves when there are multiple spaces or tabs between words. It is better to split on any run of whitespace with line.split("\\s+").
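The difference between the two splits can be seen in a small sketch. SplitDemo and wordCount are illustrative names. The trim() call is there because split("\\s+") returns a leading empty string when the line starts with whitespace.

```java
public class SplitDemo {
    // Hypothetical helper: number of whitespace-separated words in a line.
    static int wordCount(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String line = "a   b\tc"; // three spaces, then a tab
        // Single-space split yields "a", "", "", "b\tc".
        System.out.println(line.split(" ").length); // prints 4
        // Whitespace-run split yields "a", "b", "c".
        System.out.println(wordCount(line));        // prints 3
    }
}
```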
Sorry for my English. I read the file character by character. I need to output each line that matches our expression, in this case ([a-zA-Z0-9]){0,}. This is how I did it: whenever a line break occurs, the accumulated line is checked against the regular expression. Why does it not work?
UPD code:
public static void main(String[] args) throws FileNotFoundException, IOException {
int endWord = wordToFind.length();
int startWord = 0;
String myWord = "";
int numbLine = 1;
char[] barray = new char[1024];
StringBuilder stringB = new StringBuilder();
Pattern p = Pattern.compile("([a-zA-Z0-9]){0,}");
try(BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream("text.txt"))))
{
startTime = System.currentTimeMillis();
int value;
while((value = reader.read(barray, 0, barray.length)) != -1) {
for(int i = 0; i < value; i++) {
stringB.append(barray[i]);
if( Character.toString(barray[i]).equals("\n") ) {
Matcher m = p.matcher(stringB.toString());
if(m.matches()) {
finwWordLine = true;
}
if(finwWordLine) {
System.out.println(numbLine + ": " +stringB.toString());
}
stringB.delete(0, stringB.length());
finwWordLine = false;
numbLine++;
}
}
}
reader.close();
}catch(Exception e){
System.out.println("Error : "+e);
}
}
UPD text.txt
one line
two line
three line
asd asdas das dasd asd
asdasdasd
five line
one
asdasd asd asd asd
Try this regex instead:
[a-zA-Z0-9]*
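One possible reason the check never succeeds, offered here as an assumption rather than a confirmed diagnosis: Matcher.matches() requires the entire input to match, and the buffered line still ends with the \n that triggered the check (and most lines in text.txt also contain spaces). The sketch below, with the made-up names MatchesDemo and isAlnumLine, strips the line terminator before matching.

```java
import java.util.regex.Pattern;

public class MatchesDemo {
    private static final Pattern ALNUM = Pattern.compile("[a-zA-Z0-9]*");

    // Hypothetical helper: true if the line, without its terminator, is purely alphanumeric.
    static boolean isAlnumLine(String rawLine) {
        String line = rawLine.replaceAll("[\\r\\n]+$", ""); // drop trailing \r and \n
        return ALNUM.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(ALNUM.matcher("one\n").matches()); // false: "\n" is not in the class
        System.out.println(isAlnumLine("one\n"));             // true
        System.out.println(isAlnumLine("one line\n"));        // false: contains a space
    }
}
```

Because the quantifier is *, an empty line also counts as a match; using + instead would exclude it.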
I am having some trouble reading in a file and removing all of the punctuation from the file.
Below is what I currently have and I can not figure out why "----" and "*****" would still occur.
Can anyone point me in a direction to figure out how I need to adjust my replaceAll() in order to make sure repeated occurrences of punctuation can be removed?
public void analyzeFile(File filepath) {
try {
FileInputStream fStream = new FileInputStream(filepath);
DataInputStream in = new DataInputStream(fStream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String textFile = "";
String regex = "[a-zA-Z0-9\\s]";
String putString = "";
wordCount = 0;
while ((textFile = br.readLine()) != null) {
if (!textFile.equals("") && textFile.length() > 0) {
String[] words = textFile.split(" ");
wordCount += words.length;
for (int i = 0; i < words.length; i++) {
putString = cleanString(regex, words[i]);
if(putString.length() > 0){
mapInterface.put(putString, 1);
}
}
putString = "";
}
}
in.close();
} catch (Exception e) {
System.out.println("Error while attempting to read file: "
+ filepath + " " + e.getMessage());
}
}
private String cleanString(String regex, String str){
String newString = "";
Pattern regexChecker = Pattern.compile(regex);
Matcher regexMatcher = regexChecker.matcher(str);
while(regexMatcher.find()){
if(regexMatcher.group().length() != 0){
newString += regexMatcher.group().toString();
}
}
return newString;
}
Surely you can use the \w escape for word characters? It matches letters, digits, and the underscore, but not punctuation. In a Java string literal the backslash has to be doubled:
putString = words[i].replaceAll("[^\\w]+", "");
This replaces every run of non-word characters with an empty string.
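As a quick check of that idea, here is a small sketch with the hypothetical names PunctuationStripper, stripNonWord, and stripNonAlnum. The second method is included on the assumption that underscores should be removed too, since \w keeps them.

```java
public class PunctuationStripper {
    // Removes everything except letters, digits, and underscore.
    static String stripNonWord(String s) {
        return s.replaceAll("[^\\w]+", "");
    }

    // Stricter variant: removes everything except ASCII letters and digits.
    static String stripNonAlnum(String s) {
        return s.replaceAll("[^a-zA-Z0-9]+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripNonWord("----hello*****")); // prints hello
        System.out.println(stripNonWord("snake_case!"));    // prints snake_case
        System.out.println(stripNonAlnum("snake_case!"));   // prints snakecase
    }
}
```

A token made up only of punctuation, such as "----", becomes an empty string, so the existing length check before mapInterface.put still applies.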
I have written Java code to read the contents of a file. It works only for small files, not for files of more than 1000 lines. Please tell me what error I have made in the program below.
Program:
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class aaru
{
public static void main(String args[]) throws FileNotFoundException
{
File sourceFile = new File("E:\\parser\\parse3.txt");
File destinationFile = new File("E:\\parser\\new.txt");
FileInputStream fileIn = new FileInputStream(sourceFile);
FileOutputStream fileOut = new FileOutputStream(destinationFile);
DataInputStream dataIn = new DataInputStream(fileIn);
DataOutputStream dataOut = new DataOutputStream(fileOut);
String str = "";
String[] st;
String sub[] = null;
String word = "";
String contents = "";
String total = "";
String stri = "";
try
{
while ((contents = dataIn.readLine()) != null)
{
total = contents.replaceAll(",", "");
String str1 = total.replaceAll("--", "");
String str2 = str1.replaceAll(";", "");
String str3 = str2.replaceAll("&", "");
String str4 = str3.replaceAll("^", "");
String str5 = str4.replaceAll("#", "");
String str6 = str5.replaceAll("!", "");
String str7 = str6.replaceAll("/", "");
String str8 = str7.replaceAll(":", "");
String str9 = str8.replaceAll("]", "");
String str10 = str9.replaceAll("\\?", "");
String str11 = str10.replaceAll("\\*", "");
String str12 = str11.replaceAll("\\'", "");
Pattern pattern =
Pattern.compile("\\s+", Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE);
Matcher matcher = pattern.matcher(str12);
//boolean check = matcher.find();
String result = str12;
Pattern p = Pattern.compile("^www\\.|\\#");
Matcher m = p.matcher(result);
stri = m.replaceAll(" ");
int i;
int j;
st = stri.split("\\.");
for (i = 0; i < st.length; i++)
{
st[i] = st[i].trim();
/*if(st[i].startsWith(" "))
st[i]=st[i].substring(1,st[i].length);*/
sub = st[i].split(" ");
if (sub.length > 1)
{
for (j = 0; j < sub.length - 1; j++)
{
word = word + sub[j] + "," + sub[j + 1] + "\r\n";
}
}
else
{
word = word + st[i] + "\r\n";
}
}
}
System.out.println(word);
dataOut.writeBytes(word + "\r\n");
fileIn.close();
fileOut.close();
dataIn.close();
dataOut.close();
} catch (Exception e)
{
System.out.print(e);
}
}
}
It's not immediately obvious why your code doesn't read full files, but here are two hints:
First: don't use a DataInputStream for reading full lines. Instead, wrap your FileInputStream in an InputStreamReader (ideally specifying an encoding) and a BufferedReader, as recommended by the JavaDoc of DataInputStream.readLine().
Like this:
BufferedReader reader = new BufferedReader(new InputStreamReader(fileIn, "UTF-8"));
Second: when you don't know how to handle an exception at least print its stack trace like this:
catch(Exception e)
{
e.printStackTrace();
}
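Putting the first hint together, a line-reading helper could be sketched as follows. ReadLinesDemo and readLines are made-up names, the UTF-8 encoding is an assumption about the input file, and rethrowing as UncheckedIOException is only one way to avoid swallowing the error.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadLinesDemo {
    // Hypothetical helper: reads all lines from the stream, decoding it as UTF-8.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the wrapped stream) in every case.
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // keep the failure visible to the caller
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) { // path supplied on the command line
            for (String line : readLines(new FileInputStream(args[0]))) {
                System.out.println(line);
            }
        }
    }
}
```

Collecting processed lines in a list or a StringBuilder, rather than concatenating onto one String inside the loop, also keeps the work manageable for files with thousands of lines.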