Regex matching imports of a class - java

I've been trying to write a regex to match the imports of a class. Let the class be
import static org.junit.Assert.*;
import org.
package.
Test;
import mypackage.mystuff;
The output should be [org.junit.Assert.*, org.package.Test, mypackage.mystuff]. I've been struggling with the line breaks and with regular expressions in general since I'm not that experienced with them. This is my current attempt:
((?<=\bimport\s)\s*([^\s]+ )*([a-z.A-Z0-9]+.(?=;)))

This (almost) suits your needs:
(?<=import (?:static )?+)[^;]+
I say "almost" because the matches include any line breaks (e.g. in your org.package.Test declaration), so those have to be removed afterwards:
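import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern pattern = Pattern.compile("(?<=import (?:static )?+)[^;]+");
Matcher matcher = pattern.matcher(s); // s holds the class source
while (matcher.find()) {
    // strip the line breaks and any other whitespace inside the match
    String match = matcher.group().replaceAll("\\s+", "");
    // do something with match
}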
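In Java, \s matches [ \t\n\x0B\f\r]. Have a look at possessive quantifiers as well to understand the ?+ quantifier: because ?+ never gives back a "static " it has already matched, the lookbehind cannot succeed directly after "import " in a static import, so the match starts at org.junit.Assert.* rather than at static.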
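To try the pattern end to end, here is a minimal runnable sketch on the question's three imports (the class and method names are made up for illustration). It applies the same whitespace clean-up as the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImportLookbehindDemo {
    // The pattern from this answer; ?+ makes the optional "static " possessive
    static final Pattern IMPORT = Pattern.compile("(?<=import (?:static )?+)[^;]+");

    static List<String> imports(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = IMPORT.matcher(source);
        while (m.find()) {
            // Remove the line breaks (and any other whitespace) inside a match
            result.add(m.group().replaceAll("\\s+", ""));
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "import static org.junit.Assert.*;\n"
                + "import org.\npackage.\nTest;\n"
                + "import mypackage.mystuff;";
        // prints [org.junit.Assert.*, org.package.Test, mypackage.mystuff]
        System.out.println(imports(source));
    }
}
```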

This regex should work for all kinds of import statements and should not match invalid statements:
import\p{javaIdentifierIgnorable}*\p{javaWhitespace}+(?:static\p{javaIdentifierIgnorable}*\p{javaWhitespace}+)?(\p{javaJavaIdentifierStart}[\p{javaJavaIdentifierPart}\p{javaIdentifierIgnorable}]*(?:\p{javaWhitespace}*\.\p{javaWhitespace}*\*|(?:\p{javaWhitespace}*\.\p{javaWhitespace}*\p{javaJavaIdentifierStart}[\p{javaJavaIdentifierPart}\p{javaIdentifierIgnorable}]*)+(?:\p{javaWhitespace}*\.\p{javaWhitespace}*\*)?))\p{javaWhitespace}*;
It relies heavily on Java's character categories; for example, \p{javaWhitespace} calls Character.isWhitespace. From the Pattern documentation:
Categories that behave like the java.lang.Character boolean ismethodname methods (except for the deprecated ones) are available through the same \p{prop} syntax where the specified property has the name javamethodname.
Still not readable? I guessed so. That's why I also expressed it in Java code (the REGEX constant):
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class ImportMatching {
    static final String IMPORTS = "import\n" +
            "java.io.IOException;\n" +
            "import java.nio.file.Files;\n" +
            "import java . nio . file. Path;\n" +
            "import java.nio.file.Paths\n" +
            ";import java.util.ArrayList;\n" +
            "import static java.util. List.*;\n" +
            "import java.util.List. *;\n" +
            "import java.\n" +
            " util.\n" +
            " List;\n" +
            " import java.util.regex.Matcher;import java.util.regex.Pattern\n" +
            " ;\n" +
            "import mypackage.mystuff;\n" +
            "import mypackage.*;";
    // Building blocks: whitespace, ignorable characters, identifier, dot, wildcard
    static final String WS = "\\p{javaWhitespace}";
    static final String IG = "\\p{javaIdentifierIgnorable}";
    static final String ID = "\\p{javaJavaIdentifierStart}" + multiple(charClass("\\p{javaJavaIdentifierPart}" + IG));
    static final String DOT = multiple(WS) + "\\." + multiple(WS);
    static final String WC = "\\*";
    static final String REGEX = "import" + multiple(IG) + atLeastOnce(WS) +
            optional(nonCapturingGroup("static" + multiple(IG) + atLeastOnce(WS))) +
            group(
                    ID +
                    nonCapturingGroup(
                            or(
                                    DOT + WC,
                                    atLeastOnce(nonCapturingGroup(DOT + ID)) + optional(nonCapturingGroup(DOT + WC))
                            )
                    )
            ) +
            multiple(WS) + ';';
    public static void main(String[] args) {
        final List<String> imports = getImports(IMPORTS);
        System.out.printf("Matches: %d%n", imports.size());
        imports.forEach(System.out::println);
    }
    static List<String> getImports(String javaSource) {
        Pattern pattern = Pattern.compile(REGEX);
        Matcher matcher = pattern.matcher(javaSource);
        List<String> imports = new ArrayList<>();
        while (matcher.find()) {
            // group 1 is the package.Class part; drop whitespace and ignorable characters
            imports.add(matcher.group(1).replaceAll(charClass(WS + IG), ""));
        }
        return imports;
    }
    static String nonCapturingGroup(String regex) {
        return group("?:" + regex);
    }
    static String or(String option1, String option2) {
        return option1 + '|' + option2;
    }
    static String atLeastOnce(String regex) {
        return regex + '+';
    }
    static String optional(String regex) {
        return regex + '?';
    }
    static String multiple(String regex) {
        return regex + '*';
    }
    static String group(String regex) {
        return '(' + regex + ')';
    }
    static String charClass(String regex) {
        return '[' + regex + ']';
    }
}
I use one capturing group for the package.Class part and then strip any noise (whitespace and ignorable characters) from each match.
The test input is the following string (IMPORTS):
import
java.io.IOException;
import java.nio.file.Files;
import java . nio . file. Path;
import java.nio.file.Paths
;import java.util.ArrayList;
import static java.util. List.*;
import java.util.List. *;
import java.
util.
List;
import java.util.regex.Matcher;import java.util.regex.Pattern
;
import mypackage.mystuff;
import mypackage.*;
The output:
Matches: 12
java.io.IOException
java.nio.file.Files
java.nio.file.Path
java.nio.file.Paths
java.util.ArrayList
java.util.List.*
java.util.List.*
java.util.List
java.util.regex.Matcher
java.util.regex.Pattern
mypackage.mystuff
mypackage.*
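The reason for \p{javaWhitespace} over \s: by default \s only covers the six ASCII whitespace characters, while \p{javaWhitespace} defers to Character.isWhitespace. A small sketch that checks this on an EM SPACE (U+2003) and compares the category against Character.isWhitespace for every BMP character (the class name is hypothetical):

```java
import java.util.regex.Pattern;

class JavaCategoryDemo {
    static final Pattern JAVA_WS = Pattern.compile("\\p{javaWhitespace}");
    static final Pattern PLAIN_WS = Pattern.compile("\\s");

    // Counts BMP characters (surrogates skipped) where the javaWhitespace
    // category disagrees with Character.isWhitespace
    static int disagreements() {
        int count = 0;
        for (int c = 0; c <= 0xFFFF; c++) {
            if (Character.isSurrogate((char) c)) continue;
            String s = String.valueOf((char) c);
            if (JAVA_WS.matcher(s).matches() != Character.isWhitespace(c)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String emSpace = "\u2003"; // EM SPACE, a Unicode space separator
        System.out.println(JAVA_WS.matcher(emSpace).matches());  // true
        System.out.println(PLAIN_WS.matcher(emSpace).matches()); // false: plain \s is ASCII-only
        System.out.println(disagreements());                     // 0
    }
}
```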

You can use this regex:
(\w+\.\n*\s*)+([\w\*]+)(?=\;)
Escaped For Java:
(\\w+\\.\\n*\\s*)+([\\w\\*]+)(?=\\;)
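A quick way to see what this regex returns is the sketch below (class and method names are hypothetical). Two things are worth knowing: the full match can span line breaks, so the sketch strips whitespace afterwards (the \n* is redundant, since \s already matches \n), and the regex is not tied to the import keyword, so any dotted name directly followed by ; matches as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DottedNameDemo {
    // The Java-escaped regex from this answer
    static final Pattern DOTTED = Pattern.compile("(\\w+\\.\\n*\\s*)+([\\w\\*]+)(?=\\;)");

    static List<String> names(String source) {
        List<String> result = new ArrayList<>();
        Matcher m = DOTTED.matcher(source);
        while (m.find()) {
            // The full match may contain line breaks, so remove all whitespace
            result.add(m.group().replaceAll("\\s+", ""));
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "import static org.junit.Assert.*;\n"
                + "import org.\npackage.\nTest;\n"
                + "import mypackage.mystuff;";
        System.out.println(names(source));         // [org.junit.Assert.*, org.package.Test, mypackage.mystuff]
        System.out.println(names("int x = a.b;")); // [a.b]: not restricted to import lines
    }
}
```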

Maybe this is what you are looking for?
(?<=\bimport)(\s*\R*\s*(?:[a-z0-9A-Z]+(\R|\s)+)*)((([a-zA-Z0-9]+\.)+)[a-zA-Z0-9]*\*?);

Try this regexp:
import (static )*([^;])*

This works well for me:
import\s*((?:\w+[/./])+)

Related

Regex to grab and validate an email address in a complete XML string or a normal string

I need to grab the text of an email value from a big XML string or a normal string.
I've been working on a regex for it, and so far the regex below works correctly for a normal string:
Regex : ^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*#(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{1,6}$
Text : paris#france.c
But when the above text is enclosed in an XML tag, it returns nothing:
<email>paris#france.c</email>
I am trying to change this regex so that it works for both scenarios.
You have put ^ at the beginning, which means "start of the string", and $ at the end, which means "end of the string". Now look at your string:
<email>paris#france.c</email>
Does it start and end with an email address?
I have removed them and also escaped the - in your regex. Here is the auto-generated Java code with the updated regex:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Example {
    public static void main(String[] args) {
        // No ^ or $ anchors, so the address can be found anywhere in the string
        final String regex = "[\\w!#$%&'*+/=?`\\{|\\}~^\\-]+(?:\\.[\\w!#$%&'*+/=?`\\{|\\}~^\\-]+)*#(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{1,6}";
        final String string = "paris#france.c\n"
                + "<email>paris#france.c</email>";
        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
Output:
Full match: paris#france.c
Full match: paris#france.c
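The effect of the anchors is easy to check in isolation. A minimal sketch (class and constant names are made up) using the question's regex, with the # separator exactly as written there, once with ^...$ and once without:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorDemo {
    // Local part, '#', domain: the question's regex without ^ and $
    static final String BODY = "[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*#(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{1,6}";
    static final Pattern ANCHORED = Pattern.compile("^" + BODY + "$");
    static final Pattern UNANCHORED = Pattern.compile(BODY);

    // First match of p in text, or null if there is none
    static String firstMatch(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String plain = "paris#france.c";
        String xml = "<email>paris#france.c</email>";
        System.out.println(firstMatch(ANCHORED, plain));  // paris#france.c
        System.out.println(firstMatch(ANCHORED, xml));    // null: the string starts with '<'
        System.out.println(firstMatch(UNANCHORED, xml));  // paris#france.c
    }
}
```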

Regex to split a string using java

I am trying to parse a string because I need to pass a map to the UI.
Here is my input string:
"2020-02-01T00:00:00Z",1,
"2020-04-01T00:00:00Z",4,
"2020-05-01T00:00:00Z",2,
"2020-06-01T00:00:00Z",31,
"2020-07-01T00:00:00Z",60,
"2020-08-01T00:00:00Z",19,
"2020-09-01T00:00:00Z",10,
"2020-10-01T00:00:00Z",33,
"2020-11-01T00:00:00Z",280,
"2020-12-01T00:00:00Z",61,
"2021-01-01T00:00:00Z",122,
"2021-12-01T00:00:00Z",1
I need to split the string like this:
"2020-02-01T00:00:00Z",1 : split[0]
"2020-04-01T00:00:00Z",4 : split[1]
The issue is that I can't just split on "," because the comma appears twice in each entry.
I need a regex that gives 2020-02-01T00:00:00Z,1 as one token to process further.
I am new to regex. Can someone please provide a regular expression for this?
If you want the pairs of date-time and ID, you can use the regex (\"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z\",\d+)(?=,|$) to get them as match results.
The pattern (?=,|$) is a lookahead assertion for a comma or the end of the input.
Demo:
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
public class Main {
    public static void main(String[] args) {
        String s = "\"2020-02-01T00:00:00Z\",1,\n"
                + " \"2020-04-01T00:00:00Z\",4,\n"
                + " \"2020-05-01T00:00:00Z\",2,\n"
                + " \"2020-06-01T00:00:00Z\",31,\n"
                + " \"2020-07-01T00:00:00Z\",60,\n"
                + " \"2020-08-01T00:00:00Z\",19,\n"
                + " \"2020-09-01T00:00:00Z\",10,\n"
                + " \"2020-10-01T00:00:00Z\",33,\n"
                + " \"2020-11-01T00:00:00Z\",280,\n"
                + " \"2020-12-01T00:00:00Z\",61,\n"
                + " \"2021-01-01T00:00:00Z\",122,\n"
                + " \"2021-12-01T00:00:00Z\",1";
        // Matcher.results() requires Java 9 or later
        List<String> list = Pattern.compile("(\\\"\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z\\\",\\d+)(?=,|$)")
                .matcher(s)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
        list.forEach(System.out::println);
    }
}
Output:
"2020-02-01T00:00:00Z",1
"2020-04-01T00:00:00Z",4
"2020-05-01T00:00:00Z",2
"2020-06-01T00:00:00Z",31
"2020-07-01T00:00:00Z",60
"2020-08-01T00:00:00Z",19
"2020-09-01T00:00:00Z",10
"2020-10-01T00:00:00Z",33
"2020-11-01T00:00:00Z",280
"2020-12-01T00:00:00Z",61
"2021-01-01T00:00:00Z",122
"2021-12-01T00:00:00Z",1
Why can't you just split on , and ignore the last value?
Here's your pattern:
final Pattern pattern = Pattern.compile("(\\S+),(\\d+)");
final Matcher matcher = pattern.matcher("Input....");
Here's how to use it:
while (matcher.find()) {
    final String date = matcher.group(1);   // the quoted date-time, quotes included
    final String number = matcher.group(2); // the number after the comma
}
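Since the question ultimately needs a map for the UI, here is a minimal sketch that feeds this pattern's two groups into a LinkedHashMap (the class and method names are hypothetical). It assumes, as in the question's input, that the pairs are separated by whitespace; on a single line with no spaces, \S+ would swallow several pairs at once.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateCountParser {
    // The pattern from this answer: a non-whitespace run, a comma, then digits
    private static final Pattern PAIR = Pattern.compile("(\\S+),(\\d+)");

    // Collects date -> count, keeping the input order
    static Map<String, Integer> parse(String input) {
        Map<String, Integer> result = new LinkedHashMap<>();
        Matcher matcher = PAIR.matcher(input);
        while (matcher.find()) {
            String date = matcher.group(1).replace("\"", ""); // drop the surrounding quotes
            int number = Integer.parseInt(matcher.group(2));
            result.put(date, number);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "\"2020-02-01T00:00:00Z\",1,\n"
                + " \"2020-04-01T00:00:00Z\",4,\n"
                + " \"2021-12-01T00:00:00Z\",1";
        // prints {2020-02-01T00:00:00Z=1, 2020-04-01T00:00:00Z=4, 2021-12-01T00:00:00Z=1}
        System.out.println(parse(input));
    }
}
```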

Find words matching a specific REGEX within a sentence using JAVA

I am trying to generate a dynamic message that can be processed using Java and regular expressions. My incoming value can be just "$bdate$" or be embedded within a sentence like "Your Birthdate : $bdate$". I want to replace these $aaa$ values dynamically at run time, but I am not able to isolate the regex-matched values within a sentence. Here is what I have so far:
package com.test;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
public class TestRegex {
    public static String REGEX = "\\$((?:[a-zA-Z0-9_ ]*))\\$";
    public static String testString = "Summary : $summary$"
            + "Age : $age$"
            + "Location : $location$";
    public static void main(String[] args) {
        System.out.println("Matcher : " + Pattern.matches(REGEX, "$ABX_ 11$"));
        String[] splitStrings = testString.split("\\W+"); //also tried "\\b+"
        List<String> stringList = Arrays.asList(splitStrings);
        for (String test : stringList) {
            System.out.println("Split Word : " + test);
        }
    }
}
The output is below; it drops the leading and trailing $ symbols:
Matcher : true
Split Word : Summary
Split Word : summary
Split Word : Age
Split Word : age
Split Word : Location
Split Word : location
I know I am very close, but I can't figure out the issue. Can anyone please help?
You can use the following:
// \w+ matches plain words; \$\w+\$ matches $placeholder$ tokens including their $ signs
String pattern = "\\w+|\\$\\w+\\$";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(testString);
while (m.find()) {
    System.out.println("Found value: " + m.group(0));
}
To extend Karthik's answer and complete the thread, the code snippet below looks only for words that match the pattern within the sentence and collects them; this should make it easier to replace them dynamically at run time.
package com.test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestRegex {
    public static String testString = "Summary : $summary$"
            + "Age : $age$"
            + "Location : $location$";
    public static void main(String[] args) {
        String pattern = "\\$\\w+\\$";
        Pattern r = Pattern.compile(pattern);
        Matcher m = r.matcher(testString);
        while (m.find()) {
            System.out.println("Found value: " + m.group(0));
        }
    }
}
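To go the last step the question asks for, replacing each $name$ at run time, one option is Matcher.appendReplacement together with appendTail. A minimal sketch: it adds a capturing group around the name, and the class name, the values map and the keep-unknown-placeholders fallback are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderFiller {
    // $ + word characters + $, with the name captured in group 1
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\w+)\\$");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Assumed fallback: unknown placeholders are left as they are
            String replacement = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement stops '$' and '\' in the value being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("bdate", "1990-01-01"); // hypothetical value
        System.out.println(fill("Your Birthdate : $bdate$", values)); // Your Birthdate : 1990-01-01
    }
}
```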

Regular Expression : No match found

I just started learning about regular expressions. I am trying to get the attribute values within "mytag" tags using the following code, but it throws a "No match found" exception.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class dummy {
    public static void testRegEx()
    {
        // String pattern_termName = "(?i)\\[.*\\]()\\[.*\\]";
        Pattern patternTag;
        Matcher matcherTag;
        String mypattern= "\\[mytag attr1="(.*?)" attr2="(.*?)" attr3="(.*?)"](.+?)\\[/mytag]";
        String term="[mytag attr1=\"20258044753052856\" attr2=\"A security \" attr3=\"cvvc\" ]TagTitle[/mytag]";
        patternTag = Pattern.compile(mypattern);
        matcherTag = patternTag.matcher(term);
        System.out.println(matcherTag.group(1)+"*********"+matcherTag.group(2)+"$$$$$$$$$$$$");
    }
    public static void main(String args[])
    {
        testRegEx();
    }
}
I have used \" in place of " but it still throws the same exception.
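You forgot to call find() on the matcher before reading the groups, and you also need to use \" instead of " inside the string literal. The find method scans the input sequence looking for the next subsequence that matches the pattern; calling group() before a successful match throws IllegalStateException ("No match found"). The pattern also needs \s* before the closing ], because your input has a space there.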
Pattern patternTag;
Matcher matcherTag;
// the \\s* before ] allows the space in front of the closing bracket
String mypattern= "\\[mytag attr1=\"(.*?)\" attr2=\"(.*?)\" attr3=\"(.*?)\"\\s*](.+?)\\[/mytag]";
String term="[mytag attr1=\"20258044753052856\" attr2=\"A security \" attr3=\"cvvc\" ]TagTitle[/mytag]";
patternTag = Pattern.compile(mypattern);
matcherTag = patternTag.matcher(term);
while (matcherTag.find()) {
    System.out.println(matcherTag.group(1)+"*********"+matcherTag.group(2)+"$$$$$$$$$$$$");
}
Output:
20258044753052856*********A security $$$$$$$$$$$$
Your pattern is missing \s+ or \s* wherever whitespace can occur, for example before the closing ].
Code:
final String pattern = "\\[\\s*mytag\\s+attr1\\s*=\\s*\"(.*?)\"\\s+attr2\\s*=\\s*\"(.*?)\"\\s+attr3\\s*=\\s*\"(.*?)\"\\s*\\](.+?)\\[/mytag\\]";
final String input = "[mytag attr1=\"20258044753052856\" attr2=\"A security \" attr3=\"cvvc\" ]TagTitle[/mytag]";
final Pattern p = Pattern.compile(pattern);
final Matcher m = p.matcher(input);
if (m.matches()) {
    System.out.println(
            m.group(1) + '\t' + m.group(2) + '\t' + m.group(3) + '\t' + m.group(4));
}
Output:
20258044753052856 A security cvvc TagTitle
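The "No match found" message comes from calling group() on a Matcher that has not matched yet. A minimal sketch that shows both behaviors (the class name and the simplified pattern, which only captures attr1 and the tag body, are illustrative assumptions):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherStateDemo {
    // Simplified from the answers above: capture attr1 and the tag body
    static final Pattern TAG = Pattern.compile("\\[mytag attr1=\"(.*?)\".*?\\](.+?)\\[/mytag\\]");

    // Reads group(1) only after a successful find(); null when nothing matches
    static String firstAttr(String input) {
        Matcher m = TAG.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Calls group(1) without find() or matches() first
    static boolean groupWithoutFindThrows(String input) {
        Matcher m = TAG.matcher(input);
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true; // message: "No match found"
        }
    }

    public static void main(String[] args) {
        String term = "[mytag attr1=\"20258044753052856\" attr2=\"A security \" attr3=\"cvvc\" ]TagTitle[/mytag]";
        System.out.println(groupWithoutFindThrows(term)); // true
        System.out.println(firstAttr(term));              // 20258044753052856
    }
}
```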

Matcher can't match

I have the following code. I need to check whether the text contains any word from a list of banned words. But even when such a word is in the text, the matcher doesn't find it. Here is the code:
final ArrayList<String> regexps = config.getProperty(property);
for (String regexp : regexps) {
    Pattern pt = Pattern.compile("(" + regexp + ")", Pattern.CASE_INSENSITIVE);
    Matcher mt = pt.matcher(plainText);
    if (mt.find()) {
        result = result + "message can't be processed because it doesn't satisfy the rule " + property;
        reason = false;
        System.out.println("reason" + mt.group() + regexp);
    }
}
What is wrong? This code can't find the regexp в[ыy][шs]лит[еe], which does occur in plainText = "Вышлите пожалуйста новый счет на оплату на Санг, пока согласовывали, уже прошли его сроки. Лиценз..." (roughly: "Please send a new invoice for payment for Санг; while it was being approved, its due dates passed. Licens..."). I also tried other variants of the regexp, but nothing helped.
The trouble is elsewhere.
import java.util.regex.*;
public class HelloWorld {
    public static void main(String[] args) {
        Pattern pt = Pattern.compile("(qwer)");
        Matcher mt = pt.matcher("asdf qwer zxcv");
        System.out.println(mt.find());
    }
}
This prints true. You may want to use word boundaries as delimiters, though:
import java.util.regex.*;
public class HelloWorld {
    public static void main(String[] args) {
        Pattern pt = Pattern.compile("\\bqwer\\b");
        Matcher mt = pt.matcher("asdf qwer zxcv");
        System.out.println(mt.find()); // true: qwer is a separate word
        mt = pt.matcher("asdfqwer zxcv");
        System.out.println(mt.find()); // false: no word boundary before qwer
    }
}
The parentheses are unnecessary unless you need to capture the keyword in a group, and you already have the keyword to begin with.
Use ArrayList's built-in methods contains(Object o) and indexOf(Object o) to check whether a String exists in the list and where.
e.g.
ArrayList<String> keywords = new ArrayList<String>();
keywords.add("hello");
System.out.println(keywords.contains("hello"));
System.out.println(keywords.indexOf("hello"));
outputs:
true
0
Try this to filter out messages that contain banned words, using a single regex that joins the keywords with the OR operator (|).
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
private static void findBannedWords() {
    final List<String> keywords = new ArrayList<>();
    keywords.add("f$%k");
    keywords.add("s!#t");
    keywords.add("a$s");
    String input = "what the f$%k";
    // Pattern.quote is needed: '$' is a regex metacharacter (end of input),
    // so an unquoted keyword such as "f$%k" could never match
    StringJoiner bannedRegex = new StringJoiner("|");
    for (String keyword : keywords) {
        bannedRegex.add(".*" + Pattern.quote(keyword) + ".*");
    }
    Pattern pt = Pattern.compile(bannedRegex.toString());
    Matcher mt = pt.matcher(input);
    if (mt.matches()) {
        System.out.println("message can't be processed because it doesn't satisfy the rule ");
    }
}
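One more thing worth checking in the original code: in Java, Pattern.CASE_INSENSITIVE on its own folds case only for US-ASCII letters, and the text starts with a capital Cyrillic В while the regex uses a lowercase в. Adding Pattern.UNICODE_CASE enables Unicode-aware case folding. A minimal sketch, assuming only the first word "Вышлите" of the text matters (the class name is hypothetical; the Cyrillic is written as \u escapes so the file compiles under any source encoding):

```java
import java.util.regex.Pattern;

class UnicodeCaseDemo {
    // The question's regex with Unicode escapes: lowercase Cyrillic ve, [yeru|y], [sha|s], el, i, te, [ie|e]
    static final String REGEX = "\u0432[\u044By][\u0448s]\u043B\u0438\u0442[\u0435e]";
    // First word of the question's text: the same word with a capital Cyrillic Ve
    static final String TEXT = "\u0412\u044B\u0448\u043B\u0438\u0442\u0435";

    static boolean found(int flags) {
        return Pattern.compile(REGEX, flags).matcher(TEXT).find();
    }

    public static void main(String[] args) {
        // CASE_INSENSITIVE alone folds only US-ASCII letters
        System.out.println(found(Pattern.CASE_INSENSITIVE));                        // false
        // UNICODE_CASE extends case folding to the Cyrillic capital letter
        System.out.println(found(Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)); // true
    }
}
```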
