Java - What is the regex for the following String - java

I am writing a program that takes in a java file and checks each line for a string containing assertEquals and then replaces the string belonging in the second argument of assertEquals (that would be expectedVar and expectedVar2).
Say these lines are read from a file and placed on a string variable:
String myString1 = "Assert.assertEquals(outputMessage, expectedVar, actualVar);"
String myString2 = "Assert.assertEquals(/"Hello World, /" + "Hello!", expectedVar2, actualVar);"
I would like to use a single regex from the Pattern Library along with 'group' and replace expectedVar and expectedVar2 or basically any string that lies in the second argument of assertEquals.
I was thinking to take in anything after the first comma and before the second comma but the myString2 could also contain multiple commas (eg. /"Hello World, /" + "Hello!").
I am not sure on how to approach this. I am willing to implement this differently if you have another idea.
Thank you in advance.

You need to use a regex atomic group, (?>...|...).
Java code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main
{
public static void main (String[] args) throws java.lang.Exception
{
String text = "Assert.assertEquals(/\"Hello World, /\" + \"Hello!\", expectedVar2, actualVar)";
String pattern = "Assert\\.assertEquals\\((?>(?:[^,]*\"(?:[^,]*,)+[^,]*\")+|[^,]+),\\s*([^,]+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(text);
if (m.find()) {
System.out.println("MATCH FOUND: " + m.group(1));
} else {
System.out.println("NO MATCH");
}
}
}
Output:
MATCH FOUND: expectedVar2
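To go from finding the second argument to replacing it, as the question asks, one option is to splice the new text in at the position of group 1. The following is a minimal sketch that reuses the pattern above unchanged; the class name SecondArgReplacer, the method replaceSecondArg, and the replacement value newExpected are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecondArgReplacer {
    // Same pattern as in the answer above; group 1 is the second argument.
    private static final Pattern SECOND_ARG = Pattern.compile(
        "Assert\\.assertEquals\\((?>(?:[^,]*\"(?:[^,]*,)+[^,]*\")+|[^,]+),\\s*([^,]+)");

    // Hypothetical helper: returns the line with the second argument swapped out,
    // or the line unchanged when the pattern does not match.
    static String replaceSecondArg(String line, String replacement) {
        Matcher m = SECOND_ARG.matcher(line);
        if (!m.find()) {
            return line;
        }
        // Splice the replacement in between the start and end of group 1.
        return new StringBuilder(line)
            .replace(m.start(1), m.end(1), replacement)
            .toString();
    }

    public static void main(String[] args) {
        String line = "Assert.assertEquals(outputMessage, expectedVar, actualVar);";
        // prints: Assert.assertEquals(outputMessage, newExpected, actualVar);
        System.out.println(replaceSecondArg(line, "newExpected"));
    }
}
```

Like the original pattern, this sketch assumes the second argument itself contains no comma, because group 1 is defined as [^,]+.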

Related

RegExpr output incorrect

I am trying to get every match of a pattern from a string using Matcher, but I am not sure whether the string or my pattern is incorrect. I expect (Server: Switch) as the first match, and so on for each following line, but I only get the last three matches, as my output shows. My output is shown below, followed by the code:
found_m: Message: Mess
found_m: Token: null
found_m: Response: OK
Here is my code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches {
public static void main( String args[] ) {
// String to be scanned to find the pattern.
String line = "Server: Switch\nMessage: Mess\nToken: null\nResponse: OK";
String pattern = "([\\w]+): ([^\\n]+)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find( )) {
while(m.find()) {
System.out.println("found_m: " + m.group());
}
}else {
System.out.println("NO MATCH");
}
}
}
Is my input string incorrect, or is my pattern wrong?
Thanks in advance.
Your regex is almost correct.
Problem is that you're calling find twice: 1st time in if condition and then again in while.
You can use do-while loop instead:
if (m.find( )) {
do {
System.out.println("found_m: " + m.group());
} while(m.find());
} else {
System.out.println("NO MATCH");
}
For regex part you can use this with minor correction:
final String pattern = "(\\w+): ([^\\n]+)";
or if you don't need 2 capturing groups then use:
final String pattern = "\\w+: [^\\n]+";
As there is no need to use character class around \\w+
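Since find() simply returns false once nothing is left, a plain while loop is enough whenever the NO MATCH branch is not needed. A minimal sketch along those lines, collecting each key and value from the two capturing groups into a map; the class name HeaderParser and the method parse are illustrative names, not from the original post.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderParser {
    // Group 1 is the key before the colon, group 2 the rest of the line.
    private static final Pattern LINE = Pattern.compile("(\\w+): ([^\\n]+)");

    static Map<String, String> parse(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = LINE.matcher(text);
        // Each call to find() advances to the next match, so call it once per iteration.
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Server: Switch\nMessage: Mess\nToken: null\nResponse: OK";
        parse(line).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

An empty map then plays the role of the NO MATCH case.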
I'm not familiar with Java, but this regex pattern should work to capture every group and match.
([\w]+): (\w+)(?:(?:[\\][n])|$)
It basically states capture the word followed by the colon and space, then capture the next word before either the \n or the end of string.
Good luck.

Split String using multiple delimiters in one step

My question is about splitting a string first on one criterion and then splitting the remaining part of the string on another criterion. I want to split the email address below into 3 parts in Java:
String email = "blah.blah_blah#mail.com";
// After splitting i want 3 separate strings (can be array or accessed via an Iterable)
string1.equals("blah.blah_blah");
string2.equals("mail");
string3.equals("com");
I know I can first split it into two based on # and then later split the second string based on ., but is there anyway of doing this in one step? I don't mind either the String#split method or regex method using Pattern and Matcher.
Use this regex in your split:
#|[.](?!.*[#.])
It will split at an # or at the very last . after the # (the one before "com").
Use it like this:
String[] emailParts = email.split("#|[.](?!.*[#.])");
Then emailParts will be an array of the 3 strings that you want, in order.
As a bonus, if you want it to split at every dot after the # (including the ones between subdomains), then remove the . from the character class at the end of the regex. It will become #|[.](?!.*#)
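A short sketch comparing the two variants on an address with a subdomain may make the difference easier to see. The class and method names here are invented for the example.

```java
import java.util.Arrays;

public class EmailSplitDemo {
    // Splits at # and at the last dot only.
    static String[] splitLastDot(String email) {
        return email.split("#|[.](?!.*[#.])");
    }

    // Splits at # and at every dot that comes after it.
    static String[] splitEveryDot(String email) {
        return email.split("#|[.](?!.*#)");
    }

    public static void main(String[] args) {
        // prints: [blah.blah_blah, mail, com]
        System.out.println(Arrays.toString(splitLastDot("blah.blah_blah#mail.com")));
        // prints: [blah.blah_blah, mail.mail2, com]
        System.out.println(Arrays.toString(splitLastDot("blah.blah_blah#mail.mail2.com")));
        // prints: [blah.blah_blah, mail, mail2, com]
        System.out.println(Arrays.toString(splitEveryDot("blah.blah_blah#mail.mail2.com")));
    }
}
```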
You can use this regex:
([^#]*)#([^#]*)\.([^#\.]*)
Here is the example Java code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class JavaRegex
{
public static void main(String args[])
{
// String to be scanned to find the pattern.
String line = "blah.blah_blah#mail.mail2.com";
String pattern = "([^#]*)#([^#]*)\\.([^#\\.]*)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find())
{
System.out.println("Found value: " + m.group(1));
System.out.println("Found value: " + m.group(2));
System.out.println("Found value: " + m.group(3));
} else
{
System.out.println("NO MATCH");
}
}
}
Thanks to Pshemo for pointing out that look-aheads were unnecessary.
You seem to want to split on
- #
or
- any dot that is after # (in other words has # somewhere before it).
If that is the case you can use email.split("#|(?<=#.{0,1000})[.]"); which will return String[] array containing separated tokens.
I used .{0,1000} instead of .* because look-behind needs to have obvious max length in Java which excludes * quantifier. But assuming that # and . will not be separated by more than 1000 characters we can use {0,1000} instead.
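As a quick check of the look-behind version, here is a small sketch that wraps the split call in a method and runs it on an address with a subdomain. The class name LookBehindSplitDemo and the method name split are illustrative only.

```java
import java.util.Arrays;

public class LookBehindSplitDemo {
    // Splits at # and at any dot that has a # at most 1000 characters before it.
    static String[] split(String email) {
        return email.split("#|(?<=#.{0,1000})[.]");
    }

    public static void main(String[] args) {
        // The dot in "blah.blah_blah" has no # before it, so it is left alone.
        // prints: [blah.blah_blah, mail, mail2, com]
        System.out.println(Arrays.toString(split("blah.blah_blah#mail.mail2.com")));
    }
}
```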
String str = "blah.blah_blah#mail.com";
String[] tempMailSplitted;
String[] tempHostSplitted;
String delimiter = "#";
tempMailSplitted = str.split(delimiter);
System.out.println(tempMailSplitted[1]); // mail.com
// split() takes a regex, so the dot has to be escaped
String hostMailDelimiter = "\\.";
tempHostSplitted = tempMailSplitted[1].split(hostMailDelimiter);
You can also do this with a single regex if you prefer; just ask. :)

Deal with apostrophe in java regex in replaceALL

I am trying to replace only exact, whole-word occurrences of a pattern using the following code. However, the you in you'll is also being replaced, giving ###'ll. What I want is for only the standalone you to be replaced.
Please suggest.
import java.util.*;
import java.io.*;
public class Fielreadingtest{
public static void main(String[] args) throws IOException {
String MyText = "I knew about you long before I met you. I also know that you’re an awesome person. By the way you’ll be missed. ";
String newLine = System.getProperty("line.separator");
System.out.println("Before:" + newLine + MyText);
String pattern = "\\byou\\b";
MyText = MyText.replaceAll(pattern, "###");
System.out.println("After:" + newLine +MyText);
}
}
/*
Before:
I knew about you long before I met you. I also know that you’re an awesome person. By the way you’ll be missed.
After:
I knew about ### long before I met ###. I also know that ###’re an awesome person. By the way ###’ll be missed.
*/
This being said I have an input file which contains a list of words that I want to skip which looks like this:
Now, as per @Anubhav, I have to use (^|\\s)you([\\s.]|$) to replace exactly you and nothing else. Is my best bet to use a tool like Notepad++ to prefix and suffix all my input words as above, or to change something in the code itself? The code I'm using is this:
for (String pattern : patternsToSkip) {
line = line.replaceAll(pattern, "");
}
source: https://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_wordcount2_source.html?scroll=topic_7_1
You can instead use this regex:
String pattern = "(^|\\s)you([\\s.,;:-]|$)";
This will match "you" only at:
the start of the input, or preceded by whitespace
the end of the input, or followed by whitespace or one of the listed punctuation characters
You can use a negative lookahead:
\b(you)(?!['’])
Escaped for a Java string:
"\\b(you)(?!['’])"
Your demo input contains a different apostrophe than on my keyboard. I've put both in the negative lookahead.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
/**
<P>{@code java ReplaceYouWholeWordWithAtAtAt}</P>
**/
public class ReplaceYouWholeWordWithAtAtAt {
public static final void main(String[] ignored) {
String sRegex = "\\byou(?!['’])";
String sToSearch = "I knew about you long before I met you. I also know that you’re an awesome person. By the way you’ll be missed.";
String sRplcWith = "###";
Matcher m = Pattern.compile(sRegex).matcher(sToSearch);
StringBuffer sb = new StringBuffer();
while(m.find()) {
m.appendReplacement(sb, sRplcWith);
}
m.appendTail(sb);
System.out.println(sb);
}
}
Output:
[C:\java_code\]java ReplaceYouWholeWordWithAtAtAt
I knew about ### long before I met ###. I also know that youÆre an awesome person. By the way youÆll be missed.
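For the list of words to skip mentioned in the question, the word boundaries and the lookahead can be added in code rather than by editing the input file. The following is a rough sketch under that assumption; SkipWords, wholeWord, and replaceWords are hypothetical names, and the words are assumed to be plain words that start and end with a letter or digit so that \b behaves as expected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkipWords {
    // Wraps a literal word so it matches only as a whole word that is not
    // followed by a straight or curly apostrophe.
    static Pattern wholeWord(String word) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b(?!['\u2019])");
    }

    // Replaces every whole-word occurrence of each listed word.
    static String replaceWords(String line, List<String> words, String replacement) {
        for (String word : words) {
            line = wholeWord(word).matcher(line)
                    .replaceAll(Matcher.quoteReplacement(replacement));
        }
        return line;
    }

    public static void main(String[] args) {
        String text = "I met you. I know you\u2019re here and you'll stay, your call.";
        // Only the standalone "you" is replaced; the contractions and "your" stay intact.
        System.out.println(replaceWords(text, Arrays.asList("you"), "###"));
    }
}
```

Pattern.quote keeps any regex metacharacters in the word list from being interpreted, and passing an empty replacement removes the words as in the original loop.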

Java regex: trouble extracting word phrase

I'm trying to extract a word phrase from a Java source file. For example I have a simple source class
class TestClass implements TestInterface implements TestInterface2 {
}
class TestClass2 {
}
I want to extract the "class TestClass" and "class TestClass2". I have tried different regex patterns but couldn't find a solution
My testing code snippet:-
public static void wordPhraser(String sourceText) {
Pattern p = Pattern.compile("class(\\s+)([a-zA-Z]*)");
Matcher m = p.matcher(sourceText);
while (m.find()) {
System.out.println("output " + m.group());
}
}
Also tried:-
"class\\s*([a-zA-Z])"
"class\\s*[a-zA-Z]"
"^class\\s+[a-zA-Z]$"
None of these are working.
Thanks.
I'm afraid to say that they work, but there is room for improvement:
\bclass(\s+)([a-zA-Z_]\w*)\b
Is a better regex. You weren't matching numbers.
For sure, this is how you should use it in Java:
String regex = "\\bclass(\\s+)([a-zA-Z_]\\w*)\\b";
To match more:
\b((public|private|protected|static|abstract|final)\s*)*class(\s+)([a-zA-Z_]\w*)\b
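To get just the class names out of a match, group 2 of the first regex above can be read on each find(). A minimal sketch, with ClassNameExtractor and classNames as made-up names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ClassNameExtractor {
    // Group 1 is the whitespace after "class", group 2 is the class name.
    private static final Pattern CLASS_DECL =
        Pattern.compile("\\bclass(\\s+)([a-zA-Z_]\\w*)\\b");

    static List<String> classNames(String sourceText) {
        List<String> names = new ArrayList<>();
        Matcher m = CLASS_DECL.matcher(sourceText);
        while (m.find()) {
            names.add(m.group(2));
        }
        return names;
    }

    public static void main(String[] args) {
        String source = "class TestClass implements TestInterface implements TestInterface2 {\n}\n"
                      + "class TestClass2 {\n}\n";
        // prints: [TestClass, TestClass2]
        System.out.println(classNames(source));
    }
}
```

Note that a regex like this also matches the word class inside comments or string literals, so it is only a rough approximation of real parsing.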
Here is the regex I use:
(final|abstract|\n|^) {0,}class {1,}.{1,} {0,}\\{
That will also match the text including the implements/interfaces part, though. Here's the code I use to parse that out and get just the class name:
String match = m.group();//m is my matcher for the regex
String s = match.substring(match.indexOf("class ") + "class ".length(), match.lastIndexOf("{")).trim();
if(s.contains("extends"))
s=s.substring(0, s.indexOf("extends"));
if(s.contains("implements"))
s=s.substring(0, s.indexOf("implements"));
s=s.trim();
strings.add(s);
NOTE: This won't work with public or private classes, only those with simply final/abstract modifiers

getting a string with quotes

I have a string "Hello" hello (including the quotes), and I just want to get the Hello that is in quotes, but without the quotes.
I tried using a regular expression, but I am guessing it never finds the quotes.
String s = new String("string");
Pattern p = Pattern.compile("\"([^\"])\"");
Matcher m = p.matcher(n);
while (m.find()) {
s = m.group(1);
}
the while loop never gets executed, suggestions?
-- Moved the star inside the parenthesis for proper grouping ---
"\"([^\"]*)\""
Tested successfully with the code
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String s = new String("\"Hello\" hello");
Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(1));
}
}
}
which produced the expected output
Hello
-- Original post follows --
You don't match anything because your regex is written to only match quoted one character strings.
"\"([^\"])*\""
is closer to what you need. Note the star; it means zero or more of the preceding expression. In this case the preceding expression is "anything that lacks a double quote".
I suggest you try a String which has quotes in it if you want to find any. ;)
Try
String s = "start \"string\" end";
or
String s = "\"Hello\" hello";
You can simply use indexOf("\"") in this case.
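A minimal sketch of the indexOf approach, assuming only the first quoted section is wanted; the class name QuotedText and the method firstQuoted are invented for this example.

```java
public class QuotedText {
    // Returns the text between the first pair of double quotes,
    // or null when the string does not contain such a pair.
    static String firstQuoted(String s) {
        int start = s.indexOf('"');
        if (start < 0) {
            return null;
        }
        int end = s.indexOf('"', start + 1);
        if (end < 0) {
            return null;
        }
        return s.substring(start + 1, end);
    }

    public static void main(String[] args) {
        // prints: Hello
        System.out.println(firstQuoted("\"Hello\" hello"));
    }
}
```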
