regex comment matching code in java not working properly


I have this code for identifying comments and printing them in Java:
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Solution {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("(\\/\\*((.|\n)*)\\*\\/)|\\/\\/.*");
String code = "";
Scanner scan = new Scanner(System.in);
while(scan.hasNext())
{
code+=(scan.nextLine()+"\n");
}
Matcher matcher = pattern.matcher(code);
int nxtBrk=code.indexOf("\n");
while(matcher.find())
{
int i=matcher.start(),j=matcher.end();
if(nxtBrk<i)
{
System.out.print("\n");
}
System.out.print(code.substring(i,j));
nxtBrk = code.indexOf("\n",j);
}
scan.close();
}
}
Now when I try the code against this input
/*This is a program to calculate area of a circle after getting the radius as input from the user*/
#include<stdio.h>
int main()
{ //something
It correctly outputs only the comments. But when I give the input
/*This is a program to calculate area of a circle after getting the radius as input from the user*/
#include<stdio.h>
int main()
{//ok
}
/*A test run for the program was carried out and following output was observed
If 50 is the radius of the circle whose area is to be calculated
The area of the circle is 7857.1429*/
The program outputs the whole code instead of just the comments. I don't know why adding those last lines breaks it.
EDIT: a parser is not an option because I am solving a challenge and have to use a programming language. Link: https://www.hackerrank.com/challenges/ide-identifying-comments

Parsing source code with regular expressions is very unreliable. I'd suggest you use a specialized parser. Creating one is pretty simple using ANTLR. And, since you seem to be parsing C source files, you can use the C grammar.

Your pattern, shorn of its Java quoting (and some unnecessary backslashes), is this:
(/\*((.|
)*)\*/)|//.*
That's fine enough, except that it uses only greedy quantifiers, which means that it will match from the first /* to the last */. You want a non-greedy (reluctant) quantifier instead, which gives this pattern:
(/\*((.|
)*?)\*/)|//.*
Small change, big consequence: it now matches up to the first */ after the /*. Re-encoded as Java code:
Pattern pattern = Pattern.compile("(/\\*((.|\n)*?)\\*/)|//.*");
(Be aware that you are very close to the limit of what it is sensible to match with regular expressions. Indeed, it's actually incorrect since you might have strings with /* or // in. But you'll probably get away with it…)
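To see the effect of the reluctant quantifier on input like the second sample in the question, here is a minimal sketch. The class name CommentFinder and the method findComments are made up for illustration. It also swaps the (.|\n) alternation for Pattern.DOTALL, which is an assumption of this sketch and not part of the fix above; because of DOTALL, the line-comment branch uses [^\n]* so that it stops at the end of the line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class CommentFinder {
    // "/\*.*?\*/" : block comment; the reluctant *? stops at the FIRST closing */.
    // "//[^\n]*"  : line comment up to (not including) the end of the line.
    // DOTALL lets '.' match newlines, so block comments may span several lines.
    private static final Pattern COMMENT =
            Pattern.compile("/\\*.*?\\*/|//[^\\n]*", Pattern.DOTALL);

    static List<String> findComments(String code) {
        List<String> comments = new ArrayList<>();
        Matcher matcher = COMMENT.matcher(code);
        while (matcher.find()) {
            comments.add(matcher.group());
        }
        return comments;
    }

    public static void main(String[] args) {
        String code = "/*first*/\n#include<stdio.h>\nint main()\n{//ok\n}\n/*second\nline*/\n";
        for (String comment : findComments(code)) {
            System.out.println(comment);
        }
    }
}
```

With a greedy quantifier the same input would yield a single match running from the first /* to the last */, which is why the whole program was printed.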

Related

Display elements of a String which are not found on the other

We were told to write a program on strings, and I wasn't able to attend class because I was sick. I am asking for your help with this task that was given to us.
Create a java program that will ask the user to input two Strings. Compare the two strings and display the letters that are found on the first string but are not found on the second string.
Here is what I have at the moment https://pastebin.com/7a4dHecR
I really have no idea what to do, so any help would be appreciated!
import java.util.*;
public class filename{
public static void main(String[] args){
Scanner sc =new Scanner(System.in);
System.out.print("Input first string: ");
String one=sc.next();
System.out.println();
System.out.print("Input second string: ");
String two=sc.next();
}
}

There are many ways to do this. I'm going to give you some parts you can put together. They are not the shortest or simplest way to solve this particular problem, but they will be useful for other small programs you write.
Here are some hints:
First, figure out how to step through your code with a debugger.
Second, figure out how to find the Javadoc for Java library classes and their methods.
You need to do something for each character in a string. Use a for loop for that:
for (int i = 0; i < one.length(); i++) {
// your code here
}
You need to get a particular character of a String.
String c = one.substring(i, i+1);
Read the Javadoc for String.substring to understand what the i and i+1 parameters do.
Now you need to find a way to check whether a String contains another String. Look at the Javadoc for the String class.
Then you can put all this together.
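One possible way to put these hints together is sketched below. The class name StringDifference and the method lettersOnlyInFirst are hypothetical, and the sketch assumes that a letter occurring several times in the first string should be reported each time it occurs.

```java
// Hypothetical helper class, for illustration only.
class StringDifference {
    // Returns the characters of 'one' that do not occur anywhere in 'two',
    // in their original order (repeated letters are kept).
    static String lettersOnlyInFirst(String one, String two) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < one.length(); i++) {
            // substring(i, i + 1) is the one-character String at index i
            String c = one.substring(i, i + 1);
            if (!two.contains(c)) {
                result.append(c);
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersOnlyInFirst("hello", "world")); // prints "he"
    }
}
```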

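You could try StringUtils.difference from Apache Commons Lang (a third-party library). Note that it returns the remainder of the second string starting from where it differs from the first, which is not the same as the letters of the first string that are missing from the second:
String diff = StringUtils.difference(one, two);
System.out.println(diff);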

Do I scan twice if I call scanner.hasNext and then scanner.next

Do I scan twice if I call scanner.hasNext(pattern) and then scanner.next(pattern) with the same pattern on java.util.Scanner
Let's say I have this code with lots of cases (I'm trying to make a lexer):
import java.util.*;
import java.util.regex.Pattern;
public class MainClass {
public static void main(String[] args) {
Scanner scanner = new Scanner("Hello World! 3 + 3.0 = 6 ");
Pattern a = Pattern.compile("..rld!");
Pattern b = Pattern.compile("...llo");
while(scanner.hasNext()) {
if (scanner.hasNext(a)) {
scanner.next(a);
/*Do something meaningful with it like create a token*/
}
else if(scanner.hasNext(b)) {
scanner.next(b);
}
/*...*/
}
// close the scanner
scanner.close();
}
}
My questions are:
Does hasNext(pattern) somehow cache the result of the search, so that it doesn't search for the same pattern twice?
Is this slower or faster than using try { scanner.next(pattern) } catch { ... }?
Or is there an easier way (without third-party libraries) to tokenize based on regex patterns?

OK, so I think that the answer is:
The documentation doesn't say anything about caching, so it may happen, but it probably does not.
Also, I was primarily asking because I wanted to use it for parsing more complex things like string literals, not just whitespace-separated tokens.
I found out that Scanner still takes the next delimiter-separated token first and only then checks whether it matches the pattern, so it is useless for my use case.
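For tokens that may contain whitespace, such as string literals, one alternative without third-party libraries is to drive a java.util.regex.Matcher over the input directly instead of going through Scanner. The following is only a sketch of that idea: the class name SimpleLexer, the token categories, and the "KIND:text" output format are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical lexer sketch, for illustration only.
class SimpleLexer {
    // Leading whitespace is skipped, then exactly one named alternative matches.
    private static final Pattern TOKEN = Pattern.compile(
            "\\s*(?:(?<number>\\d+(?:\\.\\d+)?)"
            + "|(?<word>[A-Za-z]+)"
            + "|(?<string>\"[^\"]*\")"
            + "|(?<symbol>\\S))");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                break; // only trailing whitespace is left
            }
            if (m.group("number") != null) {
                tokens.add("NUMBER:" + m.group("number"));
            } else if (m.group("word") != null) {
                tokens.add("WORD:" + m.group("word"));
            } else if (m.group("string") != null) {
                tokens.add("STRING:" + m.group("string"));
            } else {
                tokens.add("SYMBOL:" + m.group("symbol"));
            }
            pos = m.end(); // continue right after the token just consumed
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("Hello World! 3 + 3.0 = 6 ")) {
            System.out.println(token);
        }
    }
}
```

Because lookingAt() anchors the match at the start of the current region, each token is matched once and no hasNext/next pair is needed.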

Java - Counting words, lines, and characters from a file

I'm trying to read in words from a file. I need to count the words, lines, and characters in the text file. The word count should only include words (containing only alphabetic letters, no punctuation, spaces, or non-alphabetic characters). The character count should only include the characters inside those words.
This is what I have so far. I'm unsure of how to count the characters. Every time I run the program, it jumps to the catch mechanism as soon as I enter the file name (and it should have no issues with the file path, as I've tried using it before). I tried to create the program without the try/catch to see what the error was, but it wouldn't work without it.
Why is it jumping to the catch function when I enter the file name? How can I fix this program to properly count words, lines, and characters in the text file?

I don't get any exception with your code if I give a proper file name. As for counting the characters, you should modify the logic a little. Instead of only adding the token count to the word count, create a StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+"); for each line, iterate through all the tokens, and sum the length of each token. This should give you the number of characters. Something like the following:
while (fileScan.hasNextLine()) {
lineC++;
tempo = fileScan.nextLine();
StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+");
wordC += st.countTokens();
while(st.hasMoreTokens()) {
String stt = st.nextToken();
System.out.println(stt); // Display the token to confirm that the line is split as expected
charC += stt.length();
}
System.out.println("Lines: " + lineC + "\nWords: " + wordC+" \nChars: "+charC);
}
Note: Escape sequences do not work with StringTokenizer, because its delimiter argument is a plain set of characters, not a regular expression. For example, you might expect \\s to delimit on any whitespace character, but it delimits on the literal characters \ and s instead. If you need regex-style matching, I suggest using java.util.regex.Pattern and java.util.regex.Matcher, calling matcher.find() to identify words and characters.
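A minimal sketch of the Pattern and Matcher approach suggested in the note might look like this. The class name WordStats and the method count are hypothetical, and the sketch works on a String instead of a file to stay self-contained. It treats a word as a run of ASCII letters, which is one reading of the requirement in the question.

```java
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class WordStats {
    // A word is assumed to be a maximal run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Returns {lines, words, characters inside words}.
    static long[] count(String text) {
        long lines = 0;
        long words = 0;
        long chars = 0;
        try (Scanner scan = new Scanner(text)) {
            while (scan.hasNextLine()) {
                lines++;
                Matcher m = WORD.matcher(scan.nextLine());
                while (m.find()) {
                    words++;
                    chars += m.group().length();
                }
            }
        }
        return new long[] {lines, words, chars};
    }

    public static void main(String[] args) {
        long[] c = count("Hello, world!\nTwo lines here.\n");
        System.out.println("Lines: " + c[0] + "\nWords: " + c[1] + "\nChars: " + c[2]);
    }
}
```

For the sample text in main this counts 2 lines, 5 words, and 22 characters, since punctuation and spaces are excluded from the character count.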

I tried your code but I didn't receive any exception here. However, I suspect that when you input the file name, maybe you forgot the extension of the file.

You probably forgot the file extension while giving input, but there is a much simpler way of doing this. You also mention you don't know how to count the characters. You can try something like this:
import java.util.Scanner;
import java.io.*;
import java.nio.file.*; // Files and Path
import java.util.stream.*;
public class WordCount
{
public static void main(String[] args)
{
Scanner userInput = new Scanner(System.in);
try {
// Input file
System.out.println("Please enter the name of the file.");
String content = Files.readString(Path.of("C:/Users/garre/OneDrive/Desktop/" + userInput.next()));
System.out.printf("Lines: %d\nWords: %d\nCharacters: %d",content.split("\n").length,Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(),content.length());
}
catch (IOException ex1) {
System.out.println("Error.");
System.exit(0);
}
}
}
Going through the code
import java.util.stream.*;
Note we use the streams package, for filtering out empty strings while finding words. Now let's skip forward a bit.
String content = Files.readString(Path.of("C:/Users/garre/OneDrive/Desktop/" + userInput.next()));
The above part gets all of the text in the file and stores it as a string.
System.out.printf("Lines: %d\nWords: %d\nCharacters: %d",content.split("\n").length,Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(),content.length());
Okay, this is a long line. Let's break it down.
"Lines: %d\nWords: %d\nCharacters: %d" is a format string, where each %d is replaced with the corresponding argument in the printf function. The first %d will be replaced by content.split("\n").length, which is the number of lines. We get the number of lines by splitting the string.
The second %d is replaced by Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(). Stream.of creates a stream from an array, and the array here holds the strings you get by splitting on anything that is non-alphabetic (you said words contain only alphabetic letters). Next, we filter out all the empty values, since String.split can leave empty strings between adjacent delimiters. The .count() call returns the number of words left after filtering.
The third and last %d is the simplest. It is replaced by the length of the string, content.length(). Note that this counts every character in the file, including spaces and punctuation, not only the characters inside words.
I left your catch block intact, but I feel like the System.exit(0) is a bit redundant.

Debugging variable error with Java

I am trying to do the following. If a user enters "C:\Windows\system32\foo.txt", the program should convert it to "C:\\Windows\\system32\\foo.txt". A backslash needs to be added before every existing backslash. Here's what I have coded so far (only the relevant section):
import javax.swing.*;
public class test {
public static void main(String[] args){
String path = JOptionPane.showInputDialog(null, "Enter the File path", "Word counter", JOptionPane.INFORMATION_MESSAGE);
for (int z=0;z<=path.length()-1;z++)
{
if (path.charAt(z) == '\\')
{
path.charAt(z) = "\\\\";
}
}
System.out.println(path); // For knowing what's going on
}
}
Unfortunately it's not compiling, and I don't have a clue of what to do. Any possible help welcomed. Thank you!

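You are trying to modify a String. Remember that strings are immutable.
You can try something like
path = path.replace(oldChar, newChar); if you want to replace some chars. Because strings are immutable, replace returns a new String that you have to assign.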

This: path.charAt(z) cannot be on the left side of an assignment statement. Instead concatenate your String or use a StringBuilder.
Or just use String's replace(...) method.
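As a sketch of the replace(...) suggestion, the following doubles every backslash in a path. The class name PathEscaper and the method doubleBackslashes are made up for the example. It uses String.replace(CharSequence, CharSequence), which treats both arguments literally, so no regex escaping is involved; only the Java string-literal escaping remains.

```java
// Hypothetical helper class, for illustration only.
class PathEscaper {
    static String doubleBackslashes(String path) {
        // "\\" is a single backslash in a Java literal, "\\\\" is two.
        // replace returns a new String; the original is left unchanged.
        return path.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String path = "C:\\Windows\\system32\\foo.txt"; // C:\Windows\system32\foo.txt
        System.out.println(doubleBackslashes(path));
    }
}
```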

java regex line

In Java I would like to read a file line by line and print each line to the output.
I want to solve this with regular expressions.
while (...)
{
private static java.util.regex.Pattern line = java.util.regex.Pattern.compile(".*\\n");
System.out.print(scanner.next(line));
}
The regex in the code is not correct, as I get an InputMismatchException.
I have been working on this regex for 2 hours. Please help with it.
With regex powertoy I see that ".*\n" is correct, but my program runs incorrectly.
The whole source is:
/**
* Extracts the points in the standard input in off file format to the standard output in ascii points format.
*/
import java.util.regex.Pattern;
import java.util.Scanner;
class off_to_ascii_points
{
private static Scanner scanner = new Scanner(System.in);
private static Pattern fat_word_pattern = Pattern.compile("\\s*\\S*\\s*");
private static Pattern line = Pattern.compile(".*\\n", Pattern.MULTILINE);
public static void main(String[] args)
{
try
{
scanner.useLocale(java.util.Locale.US);
/* skip to the number of points */
scanner.skip(fat_word_pattern);
int n_points = scanner.nextInt();
/* skip the rest of the 2. line */
scanner.skip(fat_word_pattern); scanner.skip(fat_word_pattern);
for (int i = 0; i < n_points; ++i)
{
System.out.print(scanner.next(line));
/*
Here my mistake is.
next() reads only until the delimiter,
which is by default any white-space-sequence.
That is next() does not read till the end of the line
what i wanted.
Changing "next(line)" to "nextLine()" solves the problem.
Also, setting the delimiter to line_separator
right before the loop solves the problem too.
*/
}
}
catch(java.lang.Exception e)
{
System.err.println("exception");
e.printStackTrace();
}
}
}
The beginning of an example input is:
OFF
4999996 10000000 0
-28.6663 -11.3788 -58.8252
-28.5917 -11.329 -58.8287
-28.5103 -11.4786 -58.8651
-28.8888 -11.7784 -58.9071
-29.6105 -11.2297 -58.6101
-29.1189 -11.429 -58.7828
-29.4967 -11.7289 -58.787
-29.1581 -11.8285 -58.8766
-30.0735 -11.6798 -58.5941
-29.9395 -11.2302 -58.4986
-29.7318 -11.5794 -58.6753
-29.0862 -11.1293 -58.7048
-30.2359 -11.6801 -58.5331
-30.2021 -11.3805 -58.4527
-30.3594 -11.3808 -58.3798
I first skip to the number 4999996, which is the number of lines containing point coordinates. These are the lines that I am trying to write to the output.

I suggest using
private static Pattern line = Pattern.compile(".*");
scanner.useDelimiter("[\\r\\n]+"); // Insert right before the for-loop
System.out.println(scanner.next(line)); //Replace print with println
Why your code doesn't work as expected:
This has to do with the Scanner class you use and how that class works.
The javadoc states:
A Scanner breaks its input into tokens
using a delimiter pattern, which by
default matches whitespace.
That means that when you call one of the Scanner's next* methods, the scanner reads the input until the next delimiter is encountered.
So your first call to scanner.next(line) starts reading the following line
-28.6663 -11.3788 -58.8252
And stops at the space after -28.6663. Then it checks whether the token (-28.6663) matches your provided pattern (.*\n). It does not, because the token contains no newline, and that is why you get the InputMismatchException.
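The token behaviour described above can be checked with a small sketch. The class name ScannerTokens and its methods are hypothetical; the sample input is taken from the question.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
class ScannerTokens {
    // Default delimiter (whitespace): the first token ends at the first space.
    static String firstToken(String input) {
        try (Scanner s = new Scanner(input)) {
            return s.next();
        }
    }

    // With line breaks as the delimiter, one token is one whole line.
    static String firstLine(String input) {
        try (Scanner s = new Scanner(input)) {
            s.useDelimiter("[\\r\\n]+");
            return s.next();
        }
    }

    // With the default delimiter, asks whether the next token matches ".*\n".
    static boolean nextTokenMatchesLinePattern(String input) {
        try (Scanner s = new Scanner(input)) {
            return s.hasNext(Pattern.compile(".*\\n"));
        }
    }

    public static void main(String[] args) {
        String input = "-28.6663 -11.3788 -58.8252\n-28.5917 -11.329 -58.8287\n";
        System.out.println(firstToken(input));
        System.out.println(firstLine(input));
        System.out.println(nextTokenMatchesLinePattern(input));
    }
}
```

The third method uses hasNext(pattern) so that the mismatch shows up as false instead of an exception.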

If you only want to print the file to standard out, why do you want to use regexps? If you know that you always want to skip the first two lines, there are simpler ways to accomplish it.
import java.util.Scanner;
import java.io.File;
public class TestClass {
public static void main(String[] args) throws Exception {
Scanner in=new Scanner(new File("test.txt"));
in.useDelimiter("\n"); // Or whatever line delimiter is appropriate
in.next(); in.next(); // Skip first two lines
while(in.hasNext())
System.out.println(in.next());
}
}

You have to switch the Pattern into multiline mode.
line = Pattern.compile("^.*$", Pattern.MULTILINE);
System.out.println(scanner.next(line));

By default the scanner uses whitespace as its delimiter. You must change the delimiter to the newline before you read the lines after the first skips. To do that, insert the following line before the for loop:
scanner.useDelimiter(Pattern.compile(System.getProperty("line.separator")));
and update the Pattern variable line as follows:
private static Pattern line = Pattern.compile(".*", Pattern.MULTILINE);

Thanks, everybody, for the help.
Now I understand my mistake:
The API documentation states that every nextT() method of the Scanner class first skips the delimiter pattern and then tries to read a T value. However, it does not say that each next...() method reads only up to the first occurrence of the delimiter!

