Java regex: trouble extracting word phrase - java

I'm trying to extract a word phrase from a Java source file. For example I have a simple source class
class TestClass implements TestInterface implements TestInterface2 {
}
class TestClass2 {
}
I want to extract "class TestClass" and "class TestClass2". I have tried different regex patterns but couldn't find a solution.
My test code snippet:
public static void wordPhraser(String sourceText) {
Pattern p = Pattern.compile("class(\\s+)([a-zA-Z]*)");
Matcher m = p.matcher(sourceText);
while (m.find()) {
System.out.println("output " + m.group());
}
}
Also tried:-
"class\\s*([a-zA-Z])"
"class\\s*[a-zA-Z]"
"^class\\s+[a-zA-Z]$"
None of these are working.
Thanks.

I'm afraid to say that they work, but there is room for improvement:
\bclass(\s+)([a-zA-Z_]\w*)\b
is a better regex: your pattern did not allow digits or underscores in the class name.
In Java source the backslashes must be escaped, so this is how you should write it:
String regex = "\\bclass(\\s+)([a-zA-Z_]\\w*)\\b";
To match more:
\b((public|private|protected|static|abstract|final)\s*)*class(\s+)([a-zA-Z_]\w*)\b
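To see the extended pattern in use, here is a minimal sketch. The class name ClassNameExtractor and the method extract are made up for illustration, and the pattern differs slightly from the one above: it uses non-capturing groups and requires whitespace after each modifier, which is an assumption on my part. Note that a regex like this will also match the word "class" inside comments or string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original question or answer.
class ClassNameExtractor {
    // Zero or more modifiers, then "class", whitespace, and a Java-style identifier.
    // Group 1 captures only the class name.
    private static final Pattern CLASS_DECL = Pattern.compile(
        "\\b(?:(?:public|private|protected|static|abstract|final)\\s+)*class\\s+([a-zA-Z_]\\w*)\\b");

    static List<String> extract(String sourceText) {
        List<String> names = new ArrayList<>();
        Matcher m = CLASS_DECL.matcher(sourceText);
        while (m.find()) {
            names.add("class " + m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String source = "class TestClass implements TestInterface implements TestInterface2 {\n}\n"
                      + "class TestClass2 {\n}\n";
        System.out.println(extract(source)); // prints [class TestClass, class TestClass2]
    }
}
```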

Here is the regex I use:
(final|abstract|\n|^) {0,}class {1,}.{1,} {0,}\\{
That will match the text including the implements/extends clauses too, though. Here's the code I use to strip them out and get just the class name:
String match = m.group();//m is my matcher for the regex
String s = match.substring(match.indexOf("class ") + "class ".length(), match.lastIndexOf("{")).trim();
if(s.contains("extends"))
s=s.substring(0, s.indexOf("extends"));
if(s.contains("implements"))
s=s.substring(0, s.indexOf("implements"));
s=s.trim();
strings.add(s);
NOTE: This won't work with public or private classes, only those with simply final/abstract modifiers

Related

RegExpr output incorrect

I am trying to get every match of a pattern from a string using Matcher, but I am not sure whether my string or my pattern is wrong. I expect "Server: Switch" as the first match, followed by one match for each remaining line, but I am only getting the last three matches. My output is shown below, followed by the code:
found_m: Message: Mess
found_m: Token: null
found_m: Response: OK
Here is my code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
public static void main( String args[] ) {
// String to be scanned to find the pattern.
String line = "Server: Switch\nMessage: Mess\nToken: null\nResponse: OK";
String pattern = "([\\w]+): ([^\\n]+)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find( )) {
while(m.find()) {
System.out.println("found_m: " + m.group());
}
}else {
System.out.println("NO MATCH");
}
}
}
Is my input string incorrect, or is my pattern wrong?
Thanks in advance.
Your regex is almost correct.
The problem is that you call find() twice: once in the if condition and again in the while condition, so the first match is consumed by the if and never printed.
You can use do-while loop instead:
if (m.find( )) {
do {
System.out.println("found_m: " + m.group());
} while(m.find());
} else {
System.out.println("NO MATCH");
}
For regex part you can use this with minor correction:
final String pattern = "(\\w+): ([^\\n]+)";
or if you don't need 2 capturing groups then use:
final String pattern = "\\w+: [^\\n]+";
There is no need for a character class around \\w.
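If you also want the key and the value separately, a plain while loop over find() is enough. The following is only a sketch: the class name HeaderLines and the method parse are hypothetical, and it assumes each key appears once.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class HeaderLines {
    // Group 1 is the key, group 2 is everything up to the end of the line.
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+): ([^\\n]+)");

    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps insertion order
        Matcher m = KEY_VALUE.matcher(text);
        // Each call to find() advances to the next match, so call it exactly once per iteration.
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Server: Switch\nMessage: Mess\nToken: null\nResponse: OK";
        Map<String, String> parsed = parse(line);
        if (parsed.isEmpty()) {
            System.out.println("NO MATCH");
        } else {
            parsed.forEach((k, v) -> System.out.println("found_m: " + k + ": " + v));
        }
    }
}
```

With the sample string this prints four lines, starting with "found_m: Server: Switch".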
I'm not familiar with Java, but this regex pattern should work to capture every group and match.
([\w]+): (\w+)(?:(?:[\\][n])|$)
It basically states capture the word followed by the colon and space, then capture the next word before either the \n or the end of string.
Good luck.

Deal with apostrophe in java regex in replaceALL

I am trying to replace only exact, whole-word occurrences of a pattern using the following code. Apparently the "you" in "you'll" is being replaced too, giving "###'ll", but I want only the standalone word "you" to be replaced.
Please suggest.
import java.util.*;
import java.io.*;
public class Fielreadingtest{
public static void main(String[] args) throws IOException {
String MyText = "I knew about you long before I met you. I also know that you’re an awesome person. By the way you’ll be missed. ";
String newLine = System.getProperty("line.separator");
System.out.println("Before:" + newLine + MyText);
String pattern = "\\byou\\b";
MyText = MyText.replaceAll(pattern, "###");
System.out.println("After:" + newLine +MyText);
}
}
/*
Before:
I knew about you long before I met you. I also know that you’re an awesome person. By the way you’ll be missed.
After:
I knew about ### long before I met ###. I also know that ###’re an awesome person. By the way ###’ll be missed.
*/
This being said I have an input file which contains a list of words that I want to skip which looks like this:
Now, as per @Anubhav, I have to use (^|\\s)you([\\s.]|$) to replace exactly "you" and nothing else. Is my best bet to use a tool like Notepad++ to add that prefix and suffix to all my input words, or should I change something in the code itself? The code I'm using is this:
for (String pattern : patternsToSkip) {
line = line.replaceAll(pattern, "");
}
source: https://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_wordcount2_source.html?scroll=topic_7_1
You can instead use this regex:
String pattern = "(^|\\s)you([\\s.,;:-]|$)";
This will match "you" only at:
start or preceded by a space
end or followed by a space or one of the listed punctuation characters
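One thing to watch with this pattern: the two groups consume the surrounding characters, so replacing the whole match with "###" would also swallow the space or punctuation. A minimal sketch that puts them back with $1 and $2 (the class and method names are made up for illustration):

```java
// Hypothetical helper for illustration only.
class WholeWordReplace {
    static String replaceYou(String text) {
        // $1 restores the leading whitespace (or nothing at the start of the input),
        // $2 restores the trailing whitespace/punctuation (or nothing at the end).
        return text.replaceAll("(^|\\s)you([\\s.,;:-]|$)", "$1###$2");
    }

    public static void main(String[] args) {
        String text = "I knew about you long before I met you. By the way you\u2019ll be missed.";
        // \u2019 is the typographic apostrophe used in the question's input.
        System.out.println(replaceYou(text));
    }
}
```

Because the trailing space is consumed by one match, a second "you" that follows immediately after a single space is not matched by the same pass. Lookarounds avoid that, since they do not consume characters.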
You can use a negative lookahead:
\b(you)(?!['’])
Escaped for a Java string:
"\\b(you)(?!['’])"
Your demo input contains a different apostrophe than on my keyboard. I've put both in the negative lookahead.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
/**
<P>{@code java ReplaceYouWholeWordWithAtAtAt}</P>
**/
public class ReplaceYouWholeWordWithAtAtAt {
public static final void main(String[] ignored) {
String sRegex = "\\byou(?!['’])";
String sToSearch = "I knew about you long before I met you. I also know that you’re an awesome person. By the way you’ll be missed.";
String sRplcWith = "###";
Matcher m = Pattern.compile(sRegex).matcher(sToSearch);
StringBuffer sb = new StringBuffer();
while(m.find()) {
m.appendReplacement(sb, sRplcWith);
}
m.appendTail(sb);
System.out.println(sb);
}
}
Output:
[C:\java_code\]java ReplaceYouWholeWordWithAtAtAt
I knew about ### long before I met ###. I also know that you’re an awesome person. By the way you’ll be missed.
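For the list of words to skip, you do not have to edit the input file. Here is a rough sketch of building the pattern around each word in code. SkipWords and removeWords are hypothetical names, and the extra \b after the word is my own addition so that "you" does not match inside "your". It assumes every word starts and ends with a letter, digit, or underscore; otherwise \b behaves differently.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class SkipWords {
    static String removeWords(String line, List<String> wordsToSkip) {
        for (String word : wordsToSkip) {
            // Pattern.quote makes the word literal; the lookahead rejects a following
            // ASCII or typographic apostrophe (written as \u2019 to avoid encoding issues).
            String regex = "\\b" + Pattern.quote(word) + "\\b(?!['\u2019])";
            line = line.replaceAll(regex, "");
        }
        return line;
    }

    public static void main(String[] args) {
        String line = "I know you and you\u2019ll know your way";
        System.out.println(removeWords(line, List.of("you")));
    }
}
```

As in the original loop, removing a word leaves the surrounding spaces in place, so you may end up with double spaces.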

Java - What is the regex for the following String

I am writing a program that takes in a java file and checks each line for a string containing assertEquals and then replaces the string belonging in the second argument of assertEquals (that would be expectedVar and expectedVar2).
Say these lines are read from a file and placed on a string variable:
String myString1 = "Assert.assertEquals(outputMessage, expectedVar, actualVar);"
String myString2 = "Assert.assertEquals(/"Hello World, /" + "Hello!", expectedVar2, actualVar);"
I would like to use a single regex from the Pattern Library along with 'group' and replace expectedVar and expectedVar2 or basically any string that lies in the second argument of assertEquals.
I was thinking to take in anything after the first comma and before the second comma but the myString2 could also contain multiple commas (eg. /"Hello World, /" + "Hello!").
I am not sure on how to approach this. I am willing to implement this differently if you have another idea.
Thank you in advance.
You need to use regex atomic grouping (?>...|...):
Java code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main
{
public static void main (String[] args) throws java.lang.Exception
{
String text = "Assert.assertEquals(/\"Hello World, /\" + \"Hello!\", expectedVar2, actualVar)";
String pattern = "Assert\\.assertEquals\\((?>(?:[^,]*\"(?:[^,]*,)+[^,]*\")+|[^,]+),\\s*([^,]+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(text);
if (m.find()) {
System.out.println("MATCH FOUND: " + m.group(1));
} else {
System.out.println("NO MATCH");
}
}
}
Output:
MATCH FOUND: expectedVar2
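Since the question asks about replacing the second argument rather than just finding it, here is a sketch that reuses the same pattern and swaps in new text using the group's start and end offsets. The names SecondArgReplacer and replaceSecondArg are made up. It only handles the shapes shown in the question: nested calls or commas inside the second argument are not covered.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class SecondArgReplacer {
    // Same pattern as above: group 1 is the second argument of assertEquals.
    private static final Pattern ASSERT_EQUALS = Pattern.compile(
        "Assert\\.assertEquals\\((?>(?:[^,]*\"(?:[^,]*,)+[^,]*\")+|[^,]+),\\s*([^,]+)");

    static String replaceSecondArg(String line, String replacement) {
        Matcher m = ASSERT_EQUALS.matcher(line);
        if (!m.find()) {
            return line; // not an assertEquals line, leave it untouched
        }
        // Replace only the span covered by group 1, keeping everything else as-is.
        return new StringBuilder(line)
            .replace(m.start(1), m.end(1), replacement)
            .toString();
    }

    public static void main(String[] args) {
        String line = "Assert.assertEquals(outputMessage, expectedVar, actualVar);";
        System.out.println(replaceSecondArg(line, "newExpected"));
        // prints Assert.assertEquals(outputMessage, newExpected, actualVar);
    }
}
```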

Regex and String Manipulation Techniques

Given that an input String may be specified as follows:
read(xpath('...')) or
xpath('...') or
...
Where ... just holds some xpath expression, for example, /comment/text
All I really want is the xpath expression. What would be an efficient, general way to extract this value, given the three possible valid patterns that could be specified?
Also, I am implementing this in Java.
Here is the basic example, it matches xpath part in both of your string examples:
import java.util.regex.*;
class Untitled {
public static void main(String[] args) {
String input = "read(xpath('...'))";
String result = null;
Pattern regex = Pattern.compile("xpath\\(\'(.*?)\'\\)");
Matcher matcher = regex.matcher(input);
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
}
}
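To cover the third form as well, where the input is just the bare expression, one option is to fall back to the input itself when the xpath('...') wrapper is absent. This is only a sketch with hypothetical names (XPathExtractor, extract), and it assumes the expression itself never contains a single quote immediately followed by a closing parenthesis.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class XPathExtractor {
    private static final Pattern XPATH_CALL = Pattern.compile("xpath\\('(.*?)'\\)");

    static String extract(String input) {
        Matcher m = XPATH_CALL.matcher(input);
        // Forms 1 and 2 contain xpath('...'); form 3 is the bare expression.
        return m.find() ? m.group(1) : input.trim();
    }

    public static void main(String[] args) {
        System.out.println(extract("read(xpath('/comment/text'))"));
        System.out.println(extract("xpath('/comment/text')"));
        System.out.println(extract("/comment/text"));
        // each line prints /comment/text
    }
}
```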
It would have helped if you had posted some code, but you can use String.split. Please refer to http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/String.html#split%28java.lang.String%29
If you are sure to have xpath('') , then you can use xpath(' as your regex and strip the string and gather the data inside it until it hits another ' (apostrophe).
I hope this gives you an idea.

Parse a string in Java

I have strings formatted similar to the one below in a Java program. I need to get the number out.
Host is up (0.0020s latency).
I need the number between the '(' and the 's' characters. E.g., I would need the 0.0020 in this example.
If you are sure it will always be the first number, you could use the regular expression \d+\.\d+ (but note that the backslashes need to be escaped in Java string literals).
Try this code:
String input = "Host is up (0.0020s latency).";
Pattern pattern = Pattern.compile("\\d+\\.\\d+");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
System.out.println(matcher.group());
}
See it working online: ideone
You could also include some of the surrounding characters in the regular expression to reduce the risk of matching the wrong number. To do exactly as you requested in the question (i.e. matching between ( and s) use this regular expression:
\((\d+\.\d+)s
See it working online: ideone
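A minimal sketch of the stricter pattern in Java, using a capturing group so that only the digits are returned. The class name LatencyParser and the method parseLatency are made up for illustration.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class LatencyParser {
    // Group 1 is the number between '(' and 's'.
    private static final Pattern LATENCY = Pattern.compile("\\((\\d+\\.\\d+)s");

    static OptionalDouble parseLatency(String line) {
        Matcher m = LATENCY.matcher(line);
        if (m.find()) {
            return OptionalDouble.of(Double.parseDouble(m.group(1)));
        }
        return OptionalDouble.empty(); // no "(<number>s" in this line
    }

    public static void main(String[] args) {
        System.out.println(parseLatency("Host is up (0.0020s latency)."));
    }
}
```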
Sounds like a case for regular expressions.
You'll want to match for the decimal figure and then parse that match:
Float matchedValue;
Pattern pattern = Pattern.compile("\\d*\\.\\d+");
Matcher matcher = pattern.matcher(yourString);
boolean isfound = matcher.find();
if (isfound) {
matchedValue = Float.valueOf(matcher.group(0));
}
It depends on how "similar" you mean. You could potentially use a regular expression:
import java.math.BigDecimal;
import java.util.regex.*;
public class Test {
public static void main(String args[]) throws Exception {
Pattern pattern = Pattern.compile("[^(]*\\(([0-9]*\\.[0-9]*)s");
String text = "Host is up (0.0020s latency).";
Matcher match = pattern.matcher(text);
if (match.lookingAt())
{
String group = match.group(1);
System.out.println("Before parsing: " + group);
BigDecimal value = new BigDecimal(group);
System.out.println("Parsed: " + value);
}
else
{
System.out.println("No match");
}
}
}
Quite how specific you want to make your pattern is up to you, of course. This only checks for digits, a dot, then digits after an opening bracket and before an s. You may need to refine it to make the dot optional etc.
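As one example of such a refinement, the fractional part can be made optional with a non-capturing group. This is just a sketch under that assumption. The names FlexibleLatency and parse are hypothetical, and it uses find() instead of lookingAt(), so the pattern no longer needs the leading [^(]* part.

```java
import java.math.BigDecimal;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
class FlexibleLatency {
    // Digits, optionally followed by a dot and more digits, between '(' and 's'.
    private static final Pattern LATENCY = Pattern.compile("\\((\\d+(?:\\.\\d+)?)s");

    static Optional<BigDecimal> parse(String text) {
        Matcher m = LATENCY.matcher(text);
        return m.find() ? Optional.of(new BigDecimal(m.group(1))) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("Host is up (0.0020s latency).")); // prints Optional[0.0020]
        System.out.println(parse("Host is up (2s latency)."));      // prints Optional[2]
    }
}
```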
This is a great site for building regular expressions from simple to very complex. You choose the language and boom.
http://txt2re.com/
Here's a way without regex
String str = "Host is up (0.0020s latency).";
str = str.substring(str.indexOf('(')+1, str.indexOf("s l"));
System.out.println(str);
Of course, a regular expression is the best solution in this case, but in many simple cases you can also use something like:
String value = myString.substring(myString.indexOf("(") + 1, myString.lastIndexOf("s"));
double numericValue = Double.parseDouble(value);
This is not recommended, because the text in myString can change.
