Parse Java code between braces using Regex and groups - java

I am writing some Java code that takes correctly written .java files as input, and I want to extract the text between braces using a regular expression. I want to use the Pattern and Matcher classes rather than for loops.
I believe it's best to create one regex that captures the text of the whole class, and then a second regex, applied to that output, that captures the text inside each method.
I got close to getting the class text using the following regex in online regex testers:
\w\sclass.*\{((.*\s*)*)\}
but I'm pretty sure I am doing it wrong by using two groups instead of just one. Furthermore, when I use this expression in Java I get no match at all.
Here is an example file that I am using for debugging:
package foo.bar;
import java.io.File;
public class Handy {
{
// static block, dont care!
}
/**
* Check if a string is Null and Empty
* @param str
* @return
*/
public static boolean isNullOrEmpty(String str) {
Boolean result = (str == null || str.isEmpty());
return result;
}
/**
* Mimics the String.format method with a smaller name
* @param format
* @param args
* @return
*/
public static String f(String format, Object... args)
{
return String.format(format, args);
}
}
With the example code above, I expect to get:
entire class text
{
// static block, dont care!
}
/**
* Check if a string is Null and Empty
* @param str
* @return
*/
public static boolean isNullOrEmpty(String str) {
Boolean result = (str == null || str.isEmpty());
return result;
}
/**
* Mimics the String.format method with a smaller name
* @param format
* @param args
* @return
*/
public static String f(String format, Object... args)
{
return String.format(format, args);
}
individual method text
Boolean result = (str == null || str.isEmpty());
return result;
return String.format(format, args);
I already know how to use the Pattern and Matcher classes; I just need the right regexes...

After some confusion in the comments section, I would like to share my solution for what I asked, even if the question was not very clear.
This is not thoroughly tested code, but it works for my purpose. Some adjustments or improvements are very likely possible. I took some inspiration from the comments I read on this post and on other, similar posts.
I feed each of the following methods the entire plain text found in a .java file, and from there I use Pattern and Matcher to extract what I want.
private static String patternMatcher(String content, String patternText, int groupIndex) {
Pattern pattern = Pattern.compile(patternText);
Matcher matcher = pattern.matcher(content);
if (matcher.find()) {
return matcher.group(groupIndex);
} else {
return "";
}
}
public static String getPackageName(String content) {
return patternMatcher(content, ".*package\\s+(.*)\\s*\\;", 1);
}
public static String getClassName(String content) {
return patternMatcher(content, ".*class\\s+(\\w+)[\\w\\s]+\\{", 1);
}
public static String getClassCode(String content) {
return patternMatcher(content, ".*class.*\\{((.*\\s*)*)\\}", 1);
}
public static String getMethodName(String code) {
String uncommentedCode = removeComments(code).trim();
return patternMatcher(uncommentedCode,
"(public|private|static|protected|abstract|native|synchronized) *([\\w<>.?, \\[\\]]*)\\s+(\\w+)\\s*\\([\\w<>\\[\\]._?, \\n]*\\)\\s*([\\w ,\\n]*)\\s*\\{",
3);
}
public static String removeComments(String content) {
return content.replaceAll("\\/\\*[\\s\\S]*?\\*\\/|([^:]|^)\\/\\/.*$", "$1 ").trim();
}
I double-checked, but I hope I didn't forget any escape characters; be careful with those.
Lots of people recommended that I use an actual code-parsing library, like ANTLR, but I assumed it would take me much longer to learn how to work with it than it would to do this with regex. Furthermore, I wanted to improve my regex skills, and this exercise definitely taught me some things.
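For the class body specifically, a simpler pattern than the nested (.*\s*)* group may be enough, since nested quantifiers like that can backtrack heavily on larger files. The sketch below is only an assumption about what works for simple inputs: the names ClassBodyExtractor and classBody are made up for illustration, and it expects a single top-level class with no stray "class" keyword before the declaration. It relies on Pattern.DOTALL so that one greedy group runs from the first { after the class name to the last } in the file.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original solution.
final class ClassBodyExtractor {
    // DOTALL makes '.' match line terminators too, so the greedy group
    // spans everything between the first '{' after the class header
    // and the last '}' in the input.
    private static final Pattern CLASS_BODY =
            Pattern.compile("\\bclass\\s+\\w+[^{]*\\{(.*)\\}", Pattern.DOTALL);

    // Returns the text between the outermost braces, or "" if nothing matches.
    static String classBody(String source) {
        Matcher matcher = CLASS_BODY.matcher(source);
        return matcher.find() ? matcher.group(1) : "";
    }
}
```

For the example file above this should return everything between the outer braces of Handy. Nested classes, or several top-level classes in one file, would still need a real parser.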

Related

How to recommend parameters in java?

Is there any way to suggest valid parameter values when I call a method (something like annotations)?
I declared a lot of static final String constants for URL connections, and while coding I want to know whether I am passing a valid input parameter.
Below is my example.
public void myRequest(String inputParameter) {
String requestURL = "";
static final String ex1 = "http://example.com/xml1";
static final String ex2 = "http://example.com/xml2";
static final String ex3 = "http://example.com/xml3";
static final String ex4 = "http://example.com/xml4";
static final String ex5 = "http://example.com/xml5";
static final String ex6 = "http://example.com/xml6";
static final String ex7 = "http://example.com/xml7";
// too many..
// ....
static final String ex125 = "http://example.com/xm125";
if(inputParameter.equals("ex1")) {
requestURL = ex1;
} else if(inputParameter.equals("ex2")) {
requestURL = ex2;
}
// just like this..
// .....
else if(inputParameter.equals("ex125")) {
requestURL = ex125;
}
String requestURL = inputParameter;
URL url = new URL(requestURL);
URLConnection urlConnection = url.openConnection();
// below codes are unnecessary.
}
and I will use that method in the "main" method
public static void main(String[] args) {
myRequest("ex1"); // this!!
}
The point is, when I type a call to myRequest(), the IDE shows me information about the parameter (when the cursor is between '(' and ')'). The only hint I get is "You should pass a String object," not "You can use ex1 which means loading ex1.xml, ex2 meaning ex2.xml or ... ex125 meaning xm125.xml".
If my wish came true, the workflow would look like this.
(Writing the methods)
myRequest(|); // there is cursor between ( and )
"What xml should I request? Umm.. there is ex1 to ex125. Ok.. ex1 means.. my hometown history.. ex2 means my school history.. and Umm.. I should use ex4 ok!"
(And then I write the call below)
myRequest("ex4");
I want the IDE to tell me which String parameter values I can use.
Is there any way to do this?
Given your updates to the question, the answer is built around enums and maps.
In other words: never write down a list of constants like you do there and then "map" them to incoming strings by hard-wiring the mapping in your code, as your example does.
Instead, you could use an enum to hold those constants, and you could add a method to that enum that knows how to map incoming strings to the available enum constants.
But the main point is: you want to abandon "raw" strings. The IDE (or rather the compiler) can't help you when you pass an "ex1" string as a method parameter.
But when you have:
public enum ExUrls {
EX1("http://example.com/xml1"), EX2("...
... a private constructor that takes that url string)
then you do:
void someMethod(ExUrls ex) {
and all of a sudden, the IDE will be able to suggest all the potential ExUrls constants to you!
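To make that concrete, here is a minimal sketch of such an enum. The names ExUrl, url() and fromKey() are my own placeholders, only three of the 125 constants are shown, and the lookup assumes the incoming strings are simply the constant names in lower case ("ex1", "ex2", ...).

```java
import java.util.Locale;

// Hypothetical enum replacing the long list of String constants.
enum ExUrl {
    EX1("http://example.com/xml1"),
    EX2("http://example.com/xml2"),
    EX3("http://example.com/xml3");

    private final String url;

    // Enum constructors are implicitly private.
    ExUrl(String url) {
        this.url = url;
    }

    String url() {
        return url;
    }

    // Maps an incoming key such as "ex1" to its constant (assumed naming scheme).
    // Enum.valueOf throws IllegalArgumentException for unknown keys.
    static ExUrl fromKey(String key) {
        return valueOf(key.toUpperCase(Locale.ROOT));
    }
}
```

A method declared as void myRequest(ExUrl ex) can then be called as myRequest(ExUrl.EX1), and the IDE can list the constants when completing the argument. fromKey("ex1") covers the case where the key still arrives as a string at runtime.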
I'm not sure if I got your question right but you can use javadocs.
Here is an example from oracle.
/**
* Returns an Image object that can then be painted on the screen.
* The url argument must specify an absolute {@link URL}. The name
* argument is a specifier that is relative to the url argument.
* <p>
* This method always returns immediately, whether or not the
* image exists. When this applet attempts to draw the image on
* the screen, the data will be loaded. The graphics primitives
* that draw the image will incrementally paint on the screen.
*
* @param url an absolute URL giving the base location of the image
* @param name the location of the image, relative to the url argument
* @return the image at the specified URL
* @see Image
*/
public Image getImage(URL url, String name) {
try {
return getImage(new URL(url, name));
} catch (MalformedURLException e) {
return null;
}
}
All you need to do is type /** and press Enter above your method; then you can add details. These will show up when you hover over a call to the method.
In your case, maybe something like this:
/**
* This method is very amazing it will cure cancer
* (ex1 - hometown history),
* (ex2 - school history),
* (ex3 - blah blah),
* (ex4 - what is the meaning of life),
* ...
* (ex125 - choose this)
*
* @param inputParameter - (String) You can input ex1 up to ex125
*/
public void myRequest(String inputParameter) {
But if you want to restrict the input to ex1 through ex125 only, you need something else.

Recursive Reverse Word method not able to get past first letter

When I run the tester class, it can't get past the first letter. It outputs "!" instead of printing the rest of the word reversed, and I can't figure out why it keeps getting hung up. The recursive method seems to be stuck and unable to continue past the first character. Instead of printing "!olleH" I only get "!". Thanks!
/**
Class reverses text of word
*/
public class Sentence
{
private String text="";
/**
Constructs a sentence
@param textIN the text of the sentence
*/
public Sentence(String textIN)
{
text=textIN;
}
/**
gets text
@return text
*/
public String getText()
{
return text;
}
/**
Reverse word
*/
public String reverse()
{
if (text.length() <= 1)
{
return text;
}
else
{
char val = text.charAt(0);
text=text.substring(1);
return reverse() + val;
}
}
}
/**
A tester class for reversing a sentence.
*/
public class SentenceTester
{
public static void main(String[] args)
{
Sentence greeting = new Sentence("Hello!");
greeting.reverse();
System.out.println(greeting.getText());
System.out.println("Expected: !olleH");
}
}
From a read-through of your code, your reverse method seems to work - it returns a reverse of the original text. However, it does this by altering the value of text, and it never puts the final value into text.
What actually happens is that text gets shorter by one character (removed from the front) until there is only one character left - the !.
So, you could solve the problem in your main method:
public static void main(String[] args)
{
Sentence greeting = new Sentence("Hello!");
String result = greeting.reverse();
System.out.println(result);
System.out.println("Expected: !olleH");
}
I hope that works for you. Just for information, you could also use the following:
String reversedString = new StringBuilder("Original String").reverse().toString();
One more thing, your code will not work if you pass null :)
The problem with your code is that you are modifying text in the else block:
text = text.substring(1);
So at the end it holds only "!".
Here is the final program (only the return statement has changed):
/**
Class reverses text of word
*/
public class Sentence
{
private String text="";
/**
Constructs a sentence
@param textIN the text of the sentence
*/
public Sentence(String textIN)
{
text=textIN;
}
/**
gets text
@return text
*/
public String getText()
{
return text;
}
/**
Reverse word
*/
public String reverse()
{
if (text.length() <= 1)
{
return text;
}
else
{
char val = text.charAt(0);
text=text.substring(1);
return (text=reverse() + val);
}
}
}
The reason is simple: String is immutable, so every time you want it to change you have to assign the new value back.
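Another option, if you would rather not touch the field at all, is to recurse on a parameter instead of on text. This is just a sketch under that assumption; Reverser and its static reverse(String) are hypothetical names, not part of the original class.

```java
// Hypothetical helper: recursion on a parameter, no shared state is modified.
final class Reverser {
    static String reverse(String s) {
        // Base case: null, empty and single-character strings are their own reverse.
        if (s == null || s.length() <= 1) {
            return s;
        }
        // Reverse the tail, then append the first character at the end.
        return reverse(s.substring(1)) + s.charAt(0);
    }
}
```

Sentence.reverse() could then simply return Reverser.reverse(text), and getText() would keep returning the original string.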

alternate method for using substring on a String

I have a string which contains an underscore as shown below:
123445_Lisick
I want to remove all the characters from the String after the underscore. The code below works, but is there any other way to do this? I need to put this logic inside a for loop to extract elements from an ArrayList.
public class Test {
public static void main(String args[]) throws Exception {
String str = "123445_Lisick";
int a = str.indexOf("_");
String modfiedstr = str.substring(0, a);
System.out.println(modfiedstr);
}
}
Another way is to use the split method.
String str = "123445_Lisick";
String[] parts = str.split("_");
String modfiedstr = parts[0];
I don't think that really buys you anything though. There's really nothing wrong with the method you're using.
Your method is fine. Though it is not explicitly stated in the API documentation, I feel it's safe to assume that indexOf(char) will run in O(n) time. Since your string is unordered and you don't know the location of the underscore a priori, you cannot avoid this linear search. Once you have completed the search, you need to extract the substring for further processing. It's generally safe to assume that, for simple operations like this in a reasonably mature language, the library functions have already been optimized.
Note however, that you are making an implicit assumption that
an underscore will exist within the String
if there is more than one underscore in the string, everything from the first underscore onward should be dropped
If either of these assumptions will not always hold, you will need to make adjustments to handle those situations. In either case, you should at least defensively check for a -1 returned from indexOf(char), indicating that '_' is not in the string. Assuming the entire String is desired in this situation, you could use something like this:
public static String stringBefore(String source, char delim) {
if(source == null) return null;
int index = source.indexOf(delim);
// keep only the part before the first delimiter; return the whole string if it is absent
return (index >= 0) ? source.substring(0, index) : source;
}
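Since the question mentions running this inside a loop over an ArrayList, here is a small sketch of how that could look. PrefixExtractor, beforeDelimiter and prefixes are illustrative names only, and it assumes strings without the delimiter should be kept unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for applying the indexOf/substring approach to a list.
final class PrefixExtractor {
    // Returns the part of source before the first delim, or source itself
    // when delim does not occur (indexOf returns -1 in that case).
    static String beforeDelimiter(String source, char delim) {
        if (source == null) {
            return null;
        }
        int index = source.indexOf(delim);
        return index >= 0 ? source.substring(0, index) : source;
    }

    // Applies beforeDelimiter to every element and collects the results in order.
    static List<String> prefixes(List<String> items, char delim) {
        List<String> result = new ArrayList<>(items.size());
        for (String item : items) {
            result.add(beforeDelimiter(item, delim));
        }
        return result;
    }
}
```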
You could also use something like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
String str = "123445_Lisick";
Pattern pattern = Pattern.compile("^([^_]*).*");
Matcher matcher = pattern.matcher(str);
String modfiedstr = null;
if (matcher.find()) {
modfiedstr = matcher.group(1);
}
System.out.println(modfiedstr);
}
}
The regex captures everything from the start of the input string up to (but not including) the first _ character.
However, as @Bill the Lizard wrote, I don't think there is anything wrong with the way you do it now. I would do it the same way you did.

Removing accents from String

Recently I found a very helpful method in the StringUtils library:
StringUtils.stripAccents(String s)
I found it really helpful for removing special characters and converting them to an ASCII "equivalent", for instance ç=c, etc.
Now I am working for a German customer who needs exactly this, but only for non-German characters. Any umlauts should stay untouched. I realised that stripAccents won't be useful in that case.
Does anyone have experience with this?
Are there any useful tools/libraries/classes, or maybe regular expressions?
I tried to write a class that parses and replaces such characters, but it can be very difficult to build such a map for all languages...
Any suggestions appreciated...
It is best to build a custom function. It can look like the following. If you want to avoid converting a particular character, remove it and its replacement from the two constant strings.
private static final String UNICODE =
"ÀàÈèÌìÒòÙùÁáÉéÍíÓóÚúÝýÂâÊêÎîÔôÛûŶŷÃãÕõÑñÄäËëÏïÖöÜüŸÿÅåÇçŐőŰű";
private static final String PLAIN_ASCII =
"AaEeIiOoUuAaEeIiOoUuYyAaEeIiOoUuYyAaOoNnAaEeIiOoUuYyAaCcOoUu";
public static String toAsciiString(String str) {
if (str == null) {
return null;
}
StringBuilder sb = new StringBuilder();
for (int index = 0; index < str.length(); index++) {
char c = str.charAt(index);
int pos = UNICODE.indexOf(c);
if (pos > -1)
sb.append(PLAIN_ASCII.charAt(pos));
else {
sb.append(c);
}
}
return sb.toString();
}
public static void main(String[] args) {
System.out.println(toAsciiString("Höchstalemannisch"));
}
My gut feeling tells me the easiest way to do this would be to just list allowed characters and strip accents from everything else. This would be something like
import java.util.regex.*;
import java.text.*;
public class Replacement {
public static void main(String args[]) {
String from = "aoeåöäìé";
String result = stripAccentsFromNonGermanCharacters(from);
System.out.println("Result: " + result);
}
private static String patternContainingAllValidGermanCharacters =
"a-zA-Z0-9äÄöÖéÉüÜß";
private static Pattern nonGermanCharactersPattern =
Pattern.compile("([^" + patternContainingAllValidGermanCharacters + "])");
public static String stripAccentsFromNonGermanCharacters(
String from) {
return stripAccentsFromCharactersMatching(
from, nonGermanCharactersPattern);
}
public static String stripAccentsFromCharactersMatching(
String target, Pattern myPattern) {
StringBuffer myStringBuffer = new StringBuffer();
Matcher myMatcher = myPattern.matcher(target);
while (myMatcher.find()) {
myMatcher.appendReplacement(myStringBuffer,
stripAccents(myMatcher.group(1)));
}
myMatcher.appendTail(myStringBuffer);
return myStringBuffer.toString();
}
// pretty much the same thing as StringUtils.stripAccents(String s)
// used here so I can demonstrate the code without StringUtils dependency
public static String stripAccents(String text) {
return Normalizer.normalize(text,
Normalizer.Form.NFD)
.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
}
}
(I realize the pattern probably doesn't contain all the characters needed, but add whatever is missing.)
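A more compact variant of the same allow-list idea, without the character-class regex, could look like the sketch below. It assumes the characters to protect are ÄäÖöÜüß (written as Unicode escapes so the source compiles regardless of file encoding), and GermanSafeAccentStripper is a made-up name. Every other character is normalized to NFD on its own and its combining marks are dropped.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper: strips accents except from an assumed keep-list.
final class GermanSafeAccentStripper {
    // Assumed keep-list: the German umlauts and sharp s, as Unicode escapes.
    private static final String KEEP = "\u00C4\u00E4\u00D6\u00F6\u00DC\u00FC\u00DF";
    private static final Pattern MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (KEEP.indexOf(c) >= 0) {
                out.append(c); // protected character, copy as-is
            } else {
                // Decompose this character and drop its combining marks.
                String decomposed =
                        Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFD);
                out.append(MARKS.matcher(decomposed).replaceAll(""));
            }
        }
        return out.toString();
    }
}
```

Note that this only protects precomposed umlauts. Input that is already in decomposed form (a base letter followed by a combining diaeresis) would need to be normalized to NFC first.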
This might give you a workaround. Here you can detect the language and keep only the text in that language.
EDIT:
You can pass the raw string as input and set the language detection to German; it will then detect the German characters and discard the rest.

Read a file using Scanner: why am I getting an error when using Scanner to read files in Java?

This example demonstrates using Scanner to read a file line by line (it does not perform a write operation). I don't know why I get an error when I try to compile it. Could somebody explain the reason to me? I'm using JCreator LE and JDK 1.6 to run my program:
import java.io.*;
import java.util.Scanner;
public final class File_read {
public static void main(String... aArgs) throws FileNotFoundException {
ReadWithScanner parser = new ReadWithScanner("C:\\Temp\\test.txt");
parser.processLineByLine();
log("Done.");
}
/**
* @param aFileName full name of an existing, readable file.
*/
public ReadWithScanner(String aFileName){
fFile = new File(aFileName);
}
/** Template method that calls {@link #processLine(String)}. */
public final void processLineByLine() throws FileNotFoundException {
Scanner scanner = new Scanner(fFile);
try {
//first use a Scanner to get each line
while ( scanner.hasNextLine() ){
processLine( scanner.nextLine() );
}
}
finally {
//ensure the underlying stream is always closed
scanner.close();
}
}
/**
* Overridable method for processing lines in different ways.
*
* <P>This simple default implementation expects simple name-value pairs, separated by an
* '=' sign. Examples of valid input :
* <tt>height = 167cm</tt>
* <tt>mass = 65kg</tt>
* <tt>disposition = "grumpy"</tt>
* <tt>this is the name = this is the value</tt>
*/
protected void processLine(String aLine){
//use a second Scanner to parse the content of each line
Scanner scanner = new Scanner(aLine);
scanner.useDelimiter("=");
if ( scanner.hasNext() ){
String name = scanner.next();
String value = scanner.next();
log("Name is : " + quote(name.trim()) + ", and Value is : " + quote(value.trim()) );
}
else {
log("Empty or invalid line. Unable to process.");
}
//(no need for finally here, since String is source)
scanner.close();
}
// PRIVATE //
private final File fFile;
private static void log(Object aObject){
System.out.println(String.valueOf(aObject));
}
private String quote(String aText){
String QUOTE = "'";
return QUOTE + aText + QUOTE;
}
}
This is the result from running it:
--------------------Configuration: <Default>--------------------
C:\Users\administrador\Documents\File_read.java:15: invalid method declaration; return type required
public ReadWithScanner(String aFileName){
^
1 error
Process completed.
When you lifted that code from here :-), you renamed the class but not the constructor. Only constructors are allowed to not have return types.
I suggest you either rename the class back or rename the constructor.
I hope this isn't homework. As it stands, your educator would have an easy time proving plagiarism. You'll need to at least change the variable names as well as class names, you might want to also reformat it a bit including changing the order of methods in the class.
That's if it's homework. Which it's not, right? :-)
Your "ReadWithScanner" constructor needs to match the name of the class ("File_read")
public File_read(String aFileName){
fFile = new File(aFileName);
}
Your class is named File_read and your constructor is named ReadWithScanner. The error means that your constructor needs to have the same name as the class.
The class is named File_read, so the constructor should also be named File_read, but you named it ReadWithScanner; that is why the compiler complains. It treats the declaration as a method and therefore expects a return type.
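A stripped-down sketch of the rule, with FileRead as a placeholder class name: a declaration with no return type only compiles when its name matches the enclosing class, because only then is it a constructor.

```java
import java.io.File;

// Hypothetical minimal class; only the naming rule matters here.
final class FileRead {
    private final File file;

    // Constructor: same name as the class, no return type.
    FileRead(String fileName) {
        this.file = new File(fileName);
    }

    // Ordinary method: any other name must declare a return type.
    String path() {
        return file.getPath();
    }
}
```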
