Implementing Jaccard distance with ANTLR to find the similarity of Java code


After a while, I was able to generate a unique ID for a .java file using ANTLR (the concatenated token types from the lexer), and I then split that ID into 4-grams. This is my code:
public void runAlgoritma(File mainFile, List<String> fileJlist) {
BufferedReader in = null;
try {
// open the file passed in as the mainFile parameter
in = new BufferedReader(new FileReader(mainFile.getAbsolutePath()));
} catch (FileNotFoundException e1) {
e1.printStackTrace();
}
final Antlr3JavaLexer lexer = new Antlr3JavaLexer();
lexer.preserveWhitespacesAndComments = false;
try {
lexer.setCharStream(new ANTLRReaderStream(in));
} catch (IOException e) {
e.printStackTrace();
}
final CommonTokenStream tokens = new CommonTokenStream();
tokens.setTokenSource(lexer);
tokens.LT(10); // force the token stream to load
Antlr3JavaParser parser = new Antlr3JavaParser(tokens);
// build the ID by concatenating the type of every token
StringBuilder sbr = new StringBuilder();
List tokenList = tokens.getTokens();
for (int i = 0; i < tokenList.size(); i++) {
org.antlr.runtime.Token token = (org.antlr.runtime.Token) tokenList.get(i);
int type = token.getType();
sbr.append(type);
}
String tokenIds = sbr.toString();
// split the ID into 4-grams, one per output line
StringBuilder ngrams = new StringBuilder();
for (String term : new NgramAnalyzer(4).analyzer(tokenIds)) {
ngrams.append(term).append("\n");
}
System.out.println(ngrams);
}
How can I compare two Java source files using Jaccard similarity on the n-grams I have generated?
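One possible approach, sketched here under assumptions: treat the 4-grams of each file as a set and compute the Jaccard similarity as the size of the intersection divided by the size of the union (the Jaccard distance is then 1 minus that value). The class name `JaccardSketch` and the helper `ngrams` are hypothetical; `ngrams` builds character n-grams and only stands in for the question's `NgramAnalyzer`, whose API is not shown. The input strings are made-up placeholders for the token-id strings produced by the lexer loop.

```java
import java.util.HashSet;
import java.util.Set;

public class JaccardSketch {

    // Hypothetical stand-in for NgramAnalyzer: collects the distinct character n-grams of s.
    static Set<String> ngrams(String s, int n) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    // Jaccard similarity = |A intersect B| / |A union B|.
    // Two empty sets are treated as identical here (an assumption of this sketch).
    static double jaccardSimilarity(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Made-up strings standing in for the token-id strings of two source files.
        Set<String> first = ngrams("abcdef", 4);
        Set<String> second = ngrams("abcdxy", 4);
        double similarity = jaccardSimilarity(first, second);
        System.out.println("Jaccard similarity: " + similarity);
        System.out.println("Jaccard distance: " + (1.0 - similarity));
    }
}
```

To apply this to the question's code, the loop over `NgramAnalyzer` could add each term to a `Set<String>` instead of appending it to a buffer, once per file, and the two sets could then be passed to `jaccardSimilarity`.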

Related

How to search string in a file and then search that string in another file

I'm trying to create a Java program that reads a file named file1.txt, stores its strings, and searches for those strings in another file named file2.txt. If no match is found, it should print that particular string from file1.txt.
public static void main(String[] args)
{
try
{
BufferedReader word_list = new BufferedReader(new FileReader("file1.txt"));
BufferedReader eng_dict = new BufferedReader(new FileReader("file2.txt"));
String spelling_word = word_list.readLine();
String eng_dict_word = eng_dict.readLine();
while (spelling_word != null)
{
System.out.println(spelling_word);
spelling_word = word_list.readLine();
if(eng_dict_word.contains(spelling_word))
{
System.out.println("Word found "+spelling_word);
}
else
{
System.out.println("Word not found "+spelling_word);
}
}
word_list.close();
}
catch (IOException e)
{
e.printStackTrace();
}
}
Right now I can read the data from file1.txt, but I can't search file2.txt for it, for example for the word "Home".
For example, if file1.txt contains Homee and file2.txt contains Home, then Homee should be printed.

First, read the first file, preferably into a Set, since that gets rid of duplicate strings. This gives you set1.
Then read the second file the same way to get set2.
Now call removeAll() on set1 with set2 as the parameter.
Whatever remains in set1 needs to be printed to the screen. You can do that with forEach and a lambda or method reference.
See the code below:
Set<String> set1 = new HashSet<>();
Set<String> set2 = new HashSet<>();
try (FileReader reader = new FileReader("file1.txt");
BufferedReader br = new BufferedReader(reader)) {
// read line by line
String line;
while ((line = br.readLine()) != null) {
set1.add(line);
}
} catch (IOException e) {
System.err.format("IOException: %s%n", e);
}
try (FileReader reader = new FileReader("file2.txt");
BufferedReader br = new BufferedReader(reader)) {
// read line by line
String line;
while ((line = br.readLine()) != null) {
set2.add(line);
}
} catch (IOException e) {
System.err.format("IOException: %s%n", e);
}
set1.removeAll(set2);
set1.forEach(System.out::println);

Use a regex for this specific need. Below is the refactored code for your problem; let me know if it works for you.
public static void main(String[] args) throws IOException
{
try
{
BufferedReader word_list = new BufferedReader(new FileReader("resources/file1.txt"));
BufferedReader eng_dict = new BufferedReader(new FileReader("resources/file2.txt"));
String spelling_word = word_list.readLine();
String eng_dict_word = eng_dict.readLine();
int matchFound = 0;
Matcher m = null;
while (spelling_word != null)
{
// creating the pattern for checking for the exact match on file2.txt
// Pattern.quote keeps regex metacharacters in the word from being interpreted
String spelling_word_pattern = "\\b" + Pattern.quote(spelling_word) + "\\b";
Pattern p = Pattern.compile(spelling_word_pattern);
while(eng_dict_word !=null) {
m = p.matcher(eng_dict_word);
if(m.find()) {
matchFound = 1;
break;
}
eng_dict_word = eng_dict.readLine();
}
if(matchFound == 1) {
System.out.println("Word found " + m.group());
}else {
System.out.println("Word not found "+ spelling_word);
}
spelling_word = word_list.readLine();
eng_dict.close(); // release the previous reader before reopening file2.txt
eng_dict = new BufferedReader(new FileReader("resources/file2.txt"));
eng_dict_word = eng_dict.readLine();
matchFound = 0;
}
word_list.close();
eng_dict.close();
}
catch (IOException e)
{
e.printStackTrace();
}
}
file1.txt content
Homeee
Ho
hello
hom
xx
Me
file2.txt content
Home
Me
H
Homey
Result
Word not found Homeee
Word not found Ho
Word not found hello
Word not found hom
Word not found xx
Word found Me

Split a file into chunks when a header record is found (Java 8)

I have a piece of code that splits a file into chunks whenever it finds a start record.
List<StringBuilder> list = new ArrayList<>();
StringBuilder jc = null;
try (BufferedReader br = Files.newBufferedReader(Paths.get(""))) {
for (String line = br.readLine(); line != null; line = br.readLine()) {
if (line.startsWith("REQ00")) {
jc = new StringBuilder();
list.add(jc);
}
jc.append(line);
}
} catch (IOException e) {
e.printStackTrace();
}
Is there any way to convert this code to use Java 8 Streams?

Use the right tool for the job. With Scanner, it’s as simple as
List<String> list = new ArrayList<>();
try(Scanner s = new Scanner(Paths.get(path))) {
s.useDelimiter(Pattern.compile("^(?=REQ00)", Pattern.MULTILINE));
while(s.hasNext()) list.add(s.next());
} catch (IOException e) {
e.printStackTrace();
}
Now your code has the special requirements of creating StringBuilders and not retaining the line breaks. So the extended version is:
List<StringBuilder> list = new ArrayList<>();
try(Scanner s = new Scanner(Paths.get(path))) {
s.useDelimiter(Pattern.compile("^(?=REQ00)", Pattern.MULTILINE));
while(s.hasNext()) list.add(new StringBuilder(s.next().replaceAll("\\R", "")));
} catch (IOException e) {
e.printStackTrace();
}
A more efficient variant is
List<StringBuilder> list = new ArrayList<>();
try(Scanner s = new Scanner(Paths.get(path))) {
s.useDelimiter(Pattern.compile("^(?=REQ00)", Pattern.MULTILINE));
while(s.hasNext()) list.add(toStringBuilderWithoutLinebreaks(s.next()));
} catch (IOException e) {
e.printStackTrace();
}
…
static final Pattern LINE_BREAK = Pattern.compile("\\R");
static StringBuilder toStringBuilderWithoutLinebreaks(String s) {
Matcher m = LINE_BREAK.matcher(s);
if(!m.find()) return new StringBuilder(s);
StringBuilder sb = new StringBuilder(s.length());
int last = 0;
do { sb.append(s, last, m.start()); last = m.end(); } while(m.find());
return sb.append(s, last, s.length());
}
Starting with Java 9, you can also use a Stream operation for it:
List<StringBuilder> list;
try(Scanner s = new Scanner(Paths.get(path))) {
list = s.useDelimiter(Pattern.compile("^(?=REQ00)", Pattern.MULTILINE))
.tokens()
.map(string -> toStringBuilderWithoutLinebreaks(string))
.collect(Collectors.toList());
} catch (IOException e) {
e.printStackTrace();
list = List.of();
}

Map<Integer, String> chunks = Files.lines(Paths.get("")).collect(
Collectors.groupingBy(
new Function<String, Integer>(){
Integer lastKey = 0;
public Integer apply(String s){
if(s.startsWith("REQ00")){
lastKey = lastKey+1;
}
return lastKey;
}
}, Collectors.joining()));
I just used joining, which creates a String instead of a StringBuilder. It could be replaced with a collector that uses a StringBuilder, or the strings could be converted to StringBuilders afterwards.
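As a rough sketch of the StringBuilder variant mentioned above: `Collector.of` can build a collector that appends each line to a `StringBuilder`, and that collector can take the place of `Collectors.joining()` in the `groupingBy` call. The class name `ChunkCollectorSketch` and the method `chunk` are hypothetical, and the input is an in-memory list of made-up lines rather than a file. As with the original, the stateful key function is only safe on a sequential stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class ChunkCollectorSketch {

    // Groups lines into chunks; a new chunk starts at every line beginning with "REQ00".
    // Lines that appear before the first "REQ00" line end up under key 0.
    static Map<Integer, StringBuilder> chunk(List<String> lines) {
        // Stateful key function: only valid for a sequential stream.
        Function<String, Integer> chunkKey = new Function<String, Integer>() {
            int lastKey = 0;

            @Override
            public Integer apply(String line) {
                if (line.startsWith("REQ00")) {
                    lastKey++;
                }
                return lastKey;
            }
        };
        // Collector that appends every line of a group to one StringBuilder.
        Collector<String, StringBuilder, StringBuilder> toBuilder =
                Collector.of(StringBuilder::new, StringBuilder::append, StringBuilder::append);
        return lines.stream().collect(Collectors.groupingBy(chunkKey, toBuilder));
    }

    public static void main(String[] args) {
        // Made-up input lines standing in for the file contents.
        List<String> lines = Arrays.asList("REQ00 first", "detail a", "REQ00 second", "detail b");
        chunk(lines).forEach((key, sb) -> System.out.println(key + " -> " + sb));
    }
}
```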

Why is there no output when I run my code?

I am trying to read a file called ecoli.txt, which contains the DNA sequence for E. coli, and store its contents in a string. I tried to print the string to test my code. However, when I run the program, there is no output. I am still new to Java, so I am sure there is an error in my code; I just need help finding it.
package codons;
import java.io.*;
public class codons
{
public static void main(String[] args)
{
try
{
FileReader codons = new FileReader("codons.txt");
FileReader filereader = new FileReader("ecoli.txt");
BufferedReader ecoli = new BufferedReader(filereader);
StringBuilder dna_string = new StringBuilder();
String line = ecoli.readLine();
while(line != null);
{
dna_string.append(line);
line = ecoli.readLine();
}
String string = new String(dna_string);
System.out.println(string);
ecoli.close();
}
catch (FileNotFoundException e)
{
e.printStackTrace();
}
catch (IOException e)
{
e.printStackTrace();
}
}
}
edit:
I was still having trouble getting the program to work the way I wanted, so I went ahead and wrote the rest of what I wanted in the program, and I am still not getting any output. This is where I am now:
package codons;
import java.io.*;
import java.util.*;
import java.lang.*;
import java.text.*;
public class codons
{
public static void main(String[] args)
{
try
{
FileReader filecodons = new FileReader("codons.txt");
FileReader filereader = new FileReader("ecoli.txt");
BufferedReader ecoli = new BufferedReader(filereader);
StringBuilder dna_sb = new StringBuilder();
String line = ecoli.readLine();
while(line != null)
{
dna_sb.append(line);
line = ecoli.readLine();
}
String dna_string = new String(dna_sb);
ecoli.close();
BufferedReader codons = new BufferedReader(filecodons);
StringBuilder codon_sb = new StringBuilder();
String codon = codons.readLine();
while(codon != null)
{
codon_sb.append(codon);
line = codons.readLine();
}
String codon_string = new String(codon_sb);
codons.close();
for(int x = 0; x <= codon_sb.length(); x++)
{
int count = 0;
String codon_ss = new String(codon_string.substring(x, x+3));
for(int i = 0; i <= dna_sb.length(); i++)
{
String dna_ss = new String(dna_string.substring(i, i+3));
int result = codon_ss.compareTo(dna_ss);
if(result == 0)
{
count += 1;
}
}
System.out.print("The codon '");
System.out.print(codon_ss);
System.out.print("'is in the dna sequence");
System.out.print(count);
System.out.println("times.");
}
}
catch (FileNotFoundException e)
{
e.printStackTrace();
}
catch (IOException e)
{
e.printStackTrace();
}
}
}

Remove the ; after while(line != null). The semicolon makes the loop body an empty statement, so the loop runs forever instead of executing the block that follows it.
The reason is explained here: Effect of semicolon after 'for' loop (the question is about the C language, but it is equivalent in Java).
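A minimal sketch of the corrected loop, assuming the intent is simply to concatenate every line. The class name `EmptyLoopBodySketch` and the method `readAll` are hypothetical, and a `StringReader` with made-up content stands in for ecoli.txt so the example runs without any file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class EmptyLoopBodySketch {

    // Reads every line from the reader and concatenates them.
    static String readAll(BufferedReader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line = reader.readLine();
        // With "while (line != null);" the ';' alone would be the loop body,
        // so line would never change and the loop would never end.
        while (line != null) {
            sb.append(line);
            line = reader.readLine();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up DNA lines standing in for the contents of ecoli.txt.
        try (BufferedReader reader = new BufferedReader(new StringReader("ATG\nCCA\nTAA"))) {
            System.out.println(readAll(reader));
        }
    }
}
```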

Regular Expression..Splitting a string array twice

I have a text file with state-city values.
These are the contents of my file:
Madhya Pradesh-Bhopal
Goa-Bicholim
Andhra Pradesh-Guntur
I want to split the state and the city. Here is my code:
FileInputStream fis= new FileInputStream("StateCityDetails.txt");
BufferedInputStream bis = new BufferedInputStream(fis);
int h=0;
String s;
String[] str=null;
byte[] b= new byte[1024];
while((h=bis.read(b))!=-1){
s= new String(b,0,h);
str= s.split("-");
}
for(int i=0; i<str.length;i++){
System.out.println(str[1]); // the value at index 1 is "Bhopal Goa"
}
}
Also, there is a space in "Madhya Pradesh".
So I want to remove the spaces within the state names, split each line into state and city, and obtain this result:
str[0] ----> MadhyaPradesh
str[1] ----> Bhopal
str[2] ----> Goa
str[3] ----> Bicholim
Please help. Thank you in advance :)

I would use a BufferedReader here, rather than the way you are doing it. The code snippet below reads each line, splits it on the hyphen (-), and removes all spaces from each part. Each component is added to a list, in left-to-right (and top-to-bottom) order. The list is converted to an array at the end in case you need one.
List<String> names = new ArrayList<String>();
BufferedReader br = null;
try {
String currLine;
br = new BufferedReader(new FileReader("StateCityDetails.txt"));
while ((currLine = br.readLine()) != null) {
String[] parts = currLine.split("-");
for (int i=0; i < parts.length; ++i) {
names.add(parts[i].replaceAll(" ", ""));
}
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (br != null) br.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
// convert the List to an array of String (if you require it)
String[] nameArr = new String[names.size()];
nameArr = names.toArray(nameArr);
// print out result
for (String val : nameArr) {
System.out.println(val);
}

splitting of csv file based on column value in java

I want to split a CSV file into multiple CSV files depending on a column value.
Structure of the CSV file: Name,Id,Dept,Course
abc,1,CSE,Btech
fgj,2,EE,Btech
(Rows are not terminated by ; at the end.)
If the value of Dept is CSE or ME, write the row to file1.csv; if the value is ECE or EE, write it to file2.csv; and so on.
Can I use Drools for this purpose? I don't know much about Drools.
How can this be done?
This is what I have done so far:
public void run() {
String csvFile = "C:/csvFiles/file1.csv";
BufferedReader br = null;
BufferedWriter writer=null,writer2=null;
String line = "";
String cvsSplitBy = ",";
String FileName = "C:/csvFiles/file3.csv";
String FileName2 = "C:/csvFiles/file4.csv";
try {
writer = new BufferedWriter(new FileWriter(FileName));
writer2 = new BufferedWriter(new FileWriter(FileName2));
br = new BufferedReader(new FileReader(csvFile));
while ((line = br.readLine()) != null) {
String[] values=line.split(cvsSplitBy);
if(values[2].equals("CSE"))
{
writer.write(line);
}
else if(values[2].equals("ECE"))
{
writer2.write(line);
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (br != null) {
try {
br.close();
writer.flush();
writer.close();
writer2.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}

1) First, find the column index, either from the header row or, if no header is present, from the known column position.
2) Then follow the algorithm below, which produces a map whose keys are the values of the column the split is performed on and whose values are the matching rows.
global resultMap;
Method add(key,row) {
data = (resultMap.containsKey(key))? resultMap.get(key):new ArrayList<String>();
data.add(row);
resultMap.put(key, data );
}
Method getSplittedMap(List rows) {
for (String currentRow : rows) {
add(key, currentRow);
}
return resultMap;
}
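The pseudocode above could be written in Java roughly as follows. This is only a sketch under assumptions: the class name `CsvSplitSketch` and the method `splitByColumn` are hypothetical, rows are split on plain commas with no handling of quoted fields, and the sample rows are the ones from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CsvSplitSketch {

    // Groups rows by the value found in the given column (0-based index).
    static Map<String, List<String>> splitByColumn(List<String> rows, int columnIndex) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String row : rows) {
            String[] values = row.split(",");
            if (columnIndex >= values.length) {
                continue; // skip blank or short rows
            }
            String key = values[columnIndex].trim();
            // Create the list for this key on first use, then add the row to it.
            result.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample rows taken from the question; Dept is the third column (index 2).
        List<String> rows = Arrays.asList("abc,1,CSE,Btech", "fgj,2,EE,Btech");
        splitByColumn(rows, 2).forEach((dept, group) -> System.out.println(dept + " -> " + group));
    }
}
```

Each map entry can then be written to its own file. To send several departments to the same file (CSE and ME, for instance), the key could first be mapped to a target file name before grouping.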
hope this helps.

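// Output files: one for ECE/EE rows, one for CSE/ME rows
FileOutputStream f_ECE = new FileOutputStream("provideloaction&filenamehere");
FileOutputStream f_CSE_ME = new FileOutputStream("provideloaction&filenamehere");
FileInputStream fin = new FileInputStream("provideloaction&filenamehere");
int size = fin.available(); // estimate of the file length; adequate for small local files
byte b[] = new byte[size];
fin.read(b);
fin.close();
String s = new String(b); // file copied into string
String s1[] = s.split("\n");
for (int i = 0; i < s1.length; i++) {
String s3[] = s1[i].split(",");
if (s3.length < 3) continue; // skip blank or short lines
String dept = s3[2].trim();
String row = s1[i].trim() + "\n"; // restore the line break removed by split
if (dept.equals("ECE") || dept.equals("EE"))
f_ECE.write(row.getBytes());
if (dept.equals("CSE") || dept.equals("ME"))
f_CSE_ME.write(row.getBytes());
}
f_ECE.close();
f_CSE_ME.close();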
