How can I replace a named group's value - java

I have the regex
private static final String COPY_NUMBER_REGEX = ".*copy(?<copy_num>\\d+)";
And I need to replace the named group as follows:
private void setCopyNum(){
Pattern pa = Pattern.compile(COPY_NUMBER_REGEX);
Matcher ma = pa.matcher(template.getName());
if(ma.find()){
Integer numToReplace = Integer.valueOf(ma.group("copy_num")) + 1;
//How to replace the value of the original captured group with numToReplace?
}
}
The question is in the comment in fact. Is there something in Java Regex that allows us to replace named groups value? I mean to get a new String with a replaced value, of course. For instance:
input: Happy birthday template - copy(1)
output: Happy birthday template - copy(2)

Here's a quick and dirty solution:
// (?<=copy)        lookbehind: preceded by "copy"
// (?<copynum>\d+)  named group "copynum": one or more digits
final String COPY_NUMBER_REGEX = "(?<=copy)(?<copynum>\\d+)";
// using String instead of your custom Template object
String template = "blah copy41";
Pattern pa = Pattern.compile(COPY_NUMBER_REGEX);
Matcher ma = pa.matcher(template);
StringBuffer sb = new StringBuffer();
// iterating match
while (ma.find()) {
Integer numToReplace = Integer.valueOf(ma.group("copynum")) + 1;
ma.appendReplacement(sb, String.valueOf(numToReplace));
}
ma.appendTail(sb);
System.out.println(sb.toString());
Output
blah copy42
Notes
copy_num is an invalid group name: Java group names may contain only ASCII letters and digits and must start with a letter, so underscores are not allowed
The example is self-contained (would work in a main method). You'll need to adapt slightly to your context.
You might need to add escaped parentheses around your named group if you actually need to match them: "(?<=copy\\()(?<copynum>\\d+)(?=\\))"
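If only the captured group should change and the rest of the match must stay untouched, the group's offsets can be used directly. The following is a minimal sketch, not part of the answer above: the class name CopyNumberBumper and the method bump are made up for illustration, and it assumes the name contains "copy(N)" as in the question's example. It relies on Matcher.start(String) and Matcher.end(String), which exist since Java 8.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
class CopyNumberBumper {
    // "copy(" + digits captured as "copynum" + ")"
    private static final Pattern COPY_NUMBER =
            Pattern.compile("copy\\((?<copynum>\\d+)\\)");

    // Returns the name with the first copy number incremented,
    // or the name unchanged if no "copy(N)" is present.
    static String bump(String name) {
        Matcher m = COPY_NUMBER.matcher(name);
        if (!m.find()) {
            return name;
        }
        int next = Integer.parseInt(m.group("copynum")) + 1;
        // Keep everything before and after the named group as it was.
        return name.substring(0, m.start("copynum"))
                + next
                + name.substring(m.end("copynum"));
    }

    public static void main(String[] args) {
        System.out.println(bump("Happy birthday template - copy(1)"));
    }
}
```

Because only the text between start("copynum") and end("copynum") is swapped out, no lookbehind or lookahead is needed to protect the surrounding "copy(" and ")".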

Related

Replacing all regex matches with masking characters in Java

Java 11 here. I have a huge String that will contain 0+ instances of the following "fizz token":
the substring "fizz"
followed by any integer 0+
followed by an equals sign ("=")
followed by another string of any kind, a.k.a. the "fizz value"
terminated by the first whitespace (including tabs, newlines, etc.)
So some examples of a valid fizz token:
fizz0=fj49jc49fj59
fizz39=f44kk5k59
fizz101023=jjj
Some examples of invalid fizz tokens:
fizz=9d94dj49j4 <-- missing an integer after "fizz" and before "="
fizz2= <-- missing a fizz value after "="
I am trying to write a Java method that will:
Find all instances of matching fizz tokens inside my huge input String
Obtain each fizz token's value
Replace each character of the token value with an upper-case X ("X")
So for example:
| Fizz Token | Token Value | Final Result |
|--------------------|--------------|--------------------|
| fizz0=fj49jc49fj59 | fj49jc49fj59 | fizz0=XXXXXXXXXXXX |
| fizz39=f44kk5k59 | f44kk5k59 | fizz39=XXXXXXXXX |
| fizz101023=jjj | jjj | fizz101023=XXX |
I need the method to do this replacement with the token values for all fizz tokens found in the input string, hence:
String input = "Some initial text fizz0=fj49jc49fj59 then some more fizz101023=jjj";
String masked = mask(input);
// Outputs: Some initial text fizz0=XXXXXXXXXXXX then some more fizz101023=XXX
System.out.println(masked);
My best attempt thus far is a massive WIP:
public class Masker {
private Pattern fizzTokenPattern = Pattern.compile("fizz{d*}=*");
public String mask(String input) {
Matcher matcher = fizzTokenPattern.matcher(input);
int numMatches = matcher.groupCount();
for (int i = 0; i < numMatches; i++) {
// how to get the token value from the group?
String tokenValue = matcher.group(i); // ex: fj49jc49fj59
// how to replace each character with an X?
// ex: fj49jc49fj59 ==> XXXXXXXXXXXX
String masked = tokenValue.replaceAll("*", "X");
// how to grab the original (matched) token and replace it with the new
// 'masked' string?
String entireTokenWithValue = input.substring(matcher.group(i));
}
}
}
I feel like I'm in the ballpark but missing some core concepts. Anybody have any ideas?
According to the requirements:
the substring "fizz"
followed by any integer 0+
followed by an equals sign ("=")
followed by another string of any kind, a.k.a. the "fizz value"
terminated by the first whitespace (including tabs, newlines, etc.)
a regex that fulfills them can be built from these parts:
fizz
\d+
=
\S+ - one or more of any NON-whitespace characters (this covers the last two requirements).
This gives us "fizz\\d+=\\S+".
But since you want to modify only part of that match and reuse the rest, we can wrap those parts in groups, like "(fizz\\d+=)(\\S+)". This way our replacement will need to:
put back what was found by "(fizz\\d+=)"
modify what was found by "(\\S+)"
The modification simply means writing X repeated n times, where n is the length of what was matched by group "(\\S+)".
In other words your code can look like
class Masker {
private static Pattern p = Pattern.compile("(fizz\\d+=)(\\S+)");
public static String mask(String input) {
return p.matcher(input)
.replaceAll(match -> match.group(1)+"X".repeat(match.group(2).length()));
}
//DEMO
public static void main(String[] args) throws Exception {
String input = "Some initial text fizz0=fj49jc49fj59 then some more fizz101023=jjj";
String masked = Masker.mask(input);
System.out.println(input);
System.out.println(masked);
}
}
Output:
Some initial text fizz0=fj49jc49fj59 then some more fizz101023=jjj
Some initial text fizz0=XXXXXXXXXXXX then some more fizz101023=XXX
Version 2 - with named-groups so more readable/easier to maintain
class Masker {
private static Pattern p = Pattern.compile("(?<token>fizz\\d+=)(?<value>\\S+)");
public static String mask(String input) {
StringBuilder sb = new StringBuilder();
Matcher m = p.matcher(input);
while(m.find()){
String token = m.group("token");
String value = m.group("value");
String maskedValue = "X".repeat(value.length());
m.appendReplacement(sb, token+maskedValue);
}
m.appendTail(sb);
return sb.toString();
}
//DEMO
public static void main(String[] args) throws Exception {
String input = "Some initial text fizz0=fj49jc49fj59 then some more fizz101023=jjj";
String masked = Masker.mask(input);
System.out.println(input);
System.out.println(masked);
}
}
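To see how the pattern treats the invalid tokens from the question, a small check like the one below may help. It is only a sketch: the class name FizzMaskCheck is invented here, and it deliberately uses StringBuffer and Arrays.fill instead of StringBuilder and String.repeat so that it should also run on Java 8, which is an assumption beyond the Java 11 setup in the question.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
class FizzMaskCheck {
    private static final Pattern FIZZ =
            Pattern.compile("(?<token>fizz\\d+=)(?<value>\\S+)");

    static String mask(String input) {
        Matcher m = FIZZ.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Build a run of 'X' as long as the captured value.
            char[] xs = new char[m.group("value").length()];
            Arrays.fill(xs, 'X');
            // quoteReplacement guards against '$' or '\' in the replacement;
            // the token cannot contain them here, so this is purely defensive.
            m.appendReplacement(sb,
                    Matcher.quoteReplacement(m.group("token") + new String(xs)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // "fizz=..." lacks the integer and "fizz2=" lacks a value,
        // so only the fizz39 token should be masked.
        System.out.println(mask("fizz=9d94dj49j4 fizz2= fizz39=f44kk5k59\tend"));
    }
}
```

The two invalid tokens pass through unchanged because \d+ and \S+ each require at least one character.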

JAVA regular expression to replace all occurrences of a particular word "working weird"

Does anyone see something wrong with this regex I have? All I want is for it to find every occurrence of "the" and replace it with whatever word the user chooses. This expression only changes some occurrences, and when it does, it removes the preceding whitespace and, I guess, concatenates the replacement with the word before.
Also, it should not replace then, there, their, they, etc.
private final String MY_REGEX = (" the | THE | thE | The | tHe | ThE ");
userInput = JTxtInput.getText();
String usersChoice = JTxtUserChoice.getText();
String usersChoiceOut = (usersChoice + " ");
Pattern pattern = Pattern.compile(MY_REGEX, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(userInput);
while (matcher.find())
{
userInput = userInput.replaceAll(MY_REGEX, usersChoiceOut);
JTxtOutput.setText(userInput);
System.out.println(userInput);
}
Ok this new code seems to replace all desired words and nothing else, also doing it without the spacing issues.
private final String MY_REGEX = ("the |THE |thE |The |tHe |ThE |THe ");
String usersChoiceOut = (usersChoice + " ");
The problem is because of the spaces in MY_REGEX. Check the following demo:
public class Main {
public static void main(String[] args) {
String str="This is the eighth wonder of THE world! How about a new style of writing The as tHe";
// Correct way
String MY_REGEX = ("the|THE|thE|The|tHe|ThE");
System.out.println(str.replaceAll(MY_REGEX, "###"));
}
}
Outputs:
This is ### eighth wonder of ### world! How about a new style of writing ### as ###
whereas
public class Main {
public static void main(String[] args) {
String str="This is the eighth wonder of THE world! How about a new style of writing The as tHe";
// Incorrect way
String MY_REGEX = ("the | THE | thE | The | tHe | ThE");
System.out.println(str.replaceAll(MY_REGEX, "###"));
}
}
Outputs:
This is ###eighth wonder of###world! How about a new style of writing###as tHe
The spaces in the alternation are significant: the regex tries to match them literally on both sides of the word.
As you are already using Pattern.CASE_INSENSITIVE, the alternation is redundant. You could simply match "the" followed by a single space, as you mention in your updated question, and use the inline modifier (?i) to make the pattern case insensitive.
userInput = userInput.replaceAll("(?i)the ", usersChoiceOut);
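If "the" should not be part of a larger word, add a word boundary \b before it. Keep the trailing space from the pattern above; it is what stops words such as "then" from matching.
"(?i)\\bthe "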
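As a variation on the pattern above, a word boundary on both sides avoids consuming the trailing space at all, so the replacement does not need a space appended. This is a sketch under a few assumptions: the class name TheReplacer and the method replaceThe are invented for illustration, and the user's word is passed through Matcher.quoteReplacement because text typed by a user might contain "$" or "\", which replaceAll would otherwise interpret.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
class TheReplacer {
    // \b on both sides: "the" as a whole word, in any letter case.
    private static final Pattern THE =
            Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

    static String replaceThe(String text, String usersChoice) {
        // quoteReplacement makes '$' and '\' in the user's word literal.
        return THE.matcher(text).replaceAll(Matcher.quoteReplacement(usersChoice));
    }

    public static void main(String[] args) {
        System.out.println(replaceThe("The cat saw the other cats, then left there.", "a"));
    }
}
```

Words such as "other", "then" and "there" are left alone because a letter sits directly next to "the" on at least one side, so no word boundary exists there.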

Java regex, find a specific key : value with digits, with or without comma separated

I need to retrieve a particular value for a key named 'Result' found in text files.
A text could look like this:
This is text file :
value 1: abc
value 2: def
value 3: xyz
value 3: Constant : Result:482,9 abc²:88,55 r:0
x = abc in Pa , y = eee in yyy
I need to get the value of the key : Result, which is in this case 482,9
This is my pseudo code:
private static final String resultRegex = "(K:(?<Result>\\d+,\\d+).*)";
private final Pattern RESULT_PATTERN = Pattern.compile(resultRegex);
Matcher resultMatcher = RESULT_PATTERN.matcher(fileContent);
String resultValue = "";
if (resultMatcher.find()) {
resultValue = resultMatcher.group(RESULT_REGEX_VALUE);
} else {
throw new RuntimeException(EXCEPTION_FILE_RESULT_NOT_VALID);
}
This works and gives me 482,9 as the result.
But it doesn't work when the value has no comma and is an integer.
Any suggestions on how to change:
private static final String resultRegex = "(Result:(?<Result>\\d+,\\d+).*)";
I think you want to extract a floating number after Result:, and perhaps, got confused with the named capture group usage. Actually, the name here does not make any difference, you might as well use a numbered group, but what is important is the K: in the pattern: it requires the next pattern to match after a sequence of chars K:. And that is not the case here.
You may match the number after Result::
private static final String resultRegex = "Result:\\s*(?<Result>\\d+[,.]\\d+)";
private static final Pattern RESULT_PATTERN = Pattern.compile(resultRegex);
Here, Result:\s*(?<Result>\d+[,.]\d+) pattern matches Result:, 0+ whitespaces (\s*), and then captures into group "Result" 1 or more digits, a comma or period, and again 1 or more digits.
See a Java demo and a regex demo.
String fileContent = "value 1: abc\n\nvalue 2: def\n\nvalue 3: xyz\n\nvalue 3: Constant : Result:482,9 abc²:88,55 r:0\n\n x = abc in Pa , y = eee in yyy";
Matcher resultMatcher = RESULT_PATTERN.matcher(fileContent);
String resultValue = "";
if (resultMatcher.find()) {
resultValue = resultMatcher.group("Result");
} else {
throw new RuntimeException(EXCEPTION_FILE_RESULT_NOT_VALID);
}
System.out.println(resultValue); // => 482,9
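The pattern above still requires a separator followed by more digits, so a plain integer such as Result:482 would not match. One possible adjustment, offered as a sketch rather than as part of the answer, is to make the fractional part optional with a non-capturing group. The class name ResultExtractor and the choice to return an Optional are assumptions made for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
class ResultExtractor {
    // Digits, optionally followed by a comma or period and more digits.
    private static final Pattern RESULT_PATTERN =
            Pattern.compile("Result:\\s*(?<Result>\\d+(?:[,.]\\d+)?)");

    static Optional<String> extract(String fileContent) {
        Matcher m = RESULT_PATTERN.matcher(fileContent);
        return m.find() ? Optional.of(m.group("Result")) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extract("Constant : Result:482,9 r:0").orElse("not found"));
        System.out.println(extract("Constant : Result:482 r:0").orElse("not found"));
    }
}
```

The caller can still throw its own exception when the Optional is empty, which keeps the error handling from the question intact.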

Java String- How to get a part of package name in android?

It's basically about getting the string value between two characters. SO has many questions related to this, like:
How to get a part of a string in java?
How to get a string between two characters?
Extract string between two strings in java
and more.
But I found it quite confusing when dealing with multiple dots in the string and getting the value between two particular dots.
I have got the package name as :
au.com.newline.myact
I need to get the value between "com." and the next "dot(.)". In this case "newline". I tried
Pattern pattern = Pattern.compile("com.(.*).");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
int ct = matcher.group();
I tried using substring and indexOf as well, but couldn't get the intended answer. Because package names in Android vary in the number of dots and characters, I cannot use a fixed index. Please suggest any ideas.
As you probably know (based on .* part in your regex) dot . is special character in regular expressions representing any character (except line separators). So to actually make dot represent only dot you need to escape it. To do so you can place \ before it, or place it inside character class [.].
Also to get only part from parenthesis (.*) you need to select it with proper group index which in your case is 1.
So try with
String beforeTask = "au.com.newline.myact";
Pattern pattern = Pattern.compile("com[.](.*)[.]");
Matcher matcher = pattern.matcher(beforeTask);
while (matcher.find()) {
String ct = matcher.group(1);//remember that regex finds Strings, not int
System.out.println(ct);
}
Output: newline
If you want to get only one element before next . then you need to change greedy behaviour of * quantifier in .* to reluctant by adding ? after it like
Pattern pattern = Pattern.compile("com[.](.*?)[.]");
// ^
Another approach is instead of .* accepting only non-dot characters. They can be represented by negated character class: [^.]*
Pattern pattern = Pattern.compile("com[.]([^.]*)[.]");
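The difference between the three patterns only shows up when more than one dot follows "com.". The sketch below runs them against the longer package name au.com.newline.myact.modelact; the class name PackagePartDemo and the helper firstGroup are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, named for illustration only.
class PackagePartDemo {
    // Returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String pkg = "au.com.newline.myact.modelact";
        // Greedy: runs up to the LAST dot.
        System.out.println(firstGroup("com[.](.*)[.]", pkg));    // newline.myact
        // Reluctant: stops at the FIRST dot after "com.".
        System.out.println(firstGroup("com[.](.*?)[.]", pkg));   // newline
        // Negated class: cannot cross a dot at all.
        System.out.println(firstGroup("com[.]([^.]*)[.]", pkg)); // newline
    }
}
```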
If you don't want to use regex you can simply use indexOf method to locate positions of com. and next . after it. Then you can simply substring what you want.
String beforeTask = "au.com.newline.myact.modelact";
int start = beforeTask.indexOf("com.") + 4; // +4 since we also want to skip 'com.' part
int end = beforeTask.indexOf(".", start); //find next `.` after start index
String result = beforeTask.substring(start, end);
System.out.println(result);
You can also get the name from the class itself. For example:
If I have a class Runner in the package com.some.pkg, I can run
Runner.class.getName() // string is "com.some.pkg.Runner"
to get the fully qualified name of the class, which contains the package name. (Runner.class.toString() would prefix it with "class ".)
To get the part after 'com' you can use Runner.class.getName().split("\\.") (the dot must be escaped because split takes a regex) and then iterate over the returned array with a boolean flag.
All you have to do is split the strings by "." and then iterate through them until you find one that equals "com". The next string in the array will be what you want.
So your code would look something like:
String[] parts = packageName.split("\\.");
int i = 0;
for (String part : parts) {
    if (part.equals("com")) {
        break;
    }
    ++i;
}
// Assumes "com" is present and is not the last segment.
String result = parts[i + 1];
private String getStringAfterComDot(String packageName) {
    String[] strArr = packageName.split("\\.");
    // Stop one short of the end so strArr[i + 1] always exists.
    for (int i = 0; i < strArr.length - 1; i++) {
        if (strArr[i].equals("com"))
            return strArr[i + 1];
    }
    return "";
}
I have done heaps of projects dealing with website scraping, and I
just had to create my own functions/utils to get the job done. Regex might
be overkill sometimes if you just want to extract a substring from
a given string like the one you have. Below is the function I normally
use for this kind of task.
private String GetValueFromText(String sText, String sBefore, String sAfter)
{
String sRetValue = "";
int nPos = sText.indexOf(sBefore);
if ( nPos > -1 )
{
int nLast = sText.indexOf(sAfter, nPos + sBefore.length()); // search right after sBefore so an empty value is still found
if ( nLast > -1)
{
sRetValue = sText.substring(nPos+sBefore.length(),nLast);
}
}
return sRetValue;
}
To use it just do the following:
String sValue = GetValueFromText("au.com.newline.myact", ".com.", ".");

Trouble with regular expressions in java

I want to check given logical formulas with a regular expression.
The logical connectives are & (and), | (or) and ! (negation; multiple negations are allowed), and the variables are normal character sequences followed by the cardinalities [0], [1] or [0..1].
The variable names can also be something like "F.G.H." or "F:G:H:" or simply "F", etc.
The square brackets belong to the cardinalities. Also, constants (TRUE, FALSE) are allowed.
With this pattern it is not working:
Pattern.compile("([!]*[a-zA-Z][\\.])?([!]*[a-zA-Z][\\.]?)*((\\[0\\])?|(\\[1\\])?|(\\[0\\.\\.1\\])?)|(TRUE)|(FALSE)|(&)|(|)|(!)");
My current problem is that a variable like !!F[0] is not accepted, but I want it to be accepted.
Here is an example of the formulas I want to allow:
!!F[0] & !F1.G[0..1] | (F1[1] | F2[0]) & F:G[0..1]
Also, whitespace shall be allowed between elements, but not between a variable and its cardinality.
This one is quite awful but should suit your needs:
[!(]*([A-Z]+[0-9]*([.:][A-Z]+[0-9]*)*\[([01]|0[.]{2}1)\]|TRUE|FALSE)[)]*( *[&|] *[!(]*([A-Z]+[0-9]*([.:][A-Z]+[0-9]*)*\[([01]|0[.]{2}1)\]|TRUE|FALSE)[)]*)*
Demo
Please note that it simply allows parentheses without counting them, i.e. inputs such as !((!(F[0]) will match while only !((!(F[0]))) should.
If you want something cleaner, you could build your regex step by step:
String atomVarPref = "[!(]*";
String atomVar = "[A-Z]+[0-9]*";
String atomSep = "[.:]";
String atomVarCard = "\\[([01]|0[.]{2}1)\\]";
String atomVarSuff = "[)]*";
String sep = " *[&|] *";
String varTemplate = "%s(%s(%s%s)*%s|TRUE|FALSE)%s";
String var = String.format(varTemplate, atomVarPref, atomVar, atomSep, atomVar, atomVarCard, atomVarSuff);
String regexTemplate = "%s(%s%s)*";
String regex = String.format(regexTemplate, var, sep, var);
Calling:
String input = "!!F[0] & !F1.G[0..1] | (F1[1] | F2[0]) & F:G[0..1]";
System.out.println(input.matches(regex)); // prints true
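Since the regex does not count parentheses, one way to reject unbalanced input is to pair it with a simple depth counter. The following is only a sketch of that idea, not something the answer proposes: the class name FormulaChecker and its methods are invented here, and the regex is rebuilt from the same pieces as above.

```java
// Hypothetical helper, named for illustration only.
class FormulaChecker {
    // Same building blocks as in the step-by-step construction above.
    private static final String VAR = String.format(
            "%s(%s(%s%s)*%s|TRUE|FALSE)%s",
            "[!(]*", "[A-Z]+[0-9]*", "[.:]", "[A-Z]+[0-9]*",
            "\\[([01]|0[.]{2}1)\\]", "[)]*");
    private static final String REGEX =
            String.format("%s(%s%s)*", VAR, " *[&|] *", VAR);

    // True if every ')' closes an earlier '(' and none stay open.
    static boolean parenthesesBalanced(String s) {
        int depth = 0;
        for (char c : s.toCharArray()) {
            if (c == '(') {
                depth++;
            } else if (c == ')' && --depth < 0) {
                return false; // closing parenthesis without an opener
            }
        }
        return depth == 0;
    }

    static boolean isValid(String formula) {
        return formula.matches(REGEX) && parenthesesBalanced(formula);
    }

    public static void main(String[] args) {
        System.out.println(isValid("!((!(F[0])))")); // true
        System.out.println(isValid("!((!(F[0])"));   // false
    }
}
```

The square brackets of the cardinalities are ignored by the counter, so they do not interfere with the parenthesis check.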
