Java Regex: replace any B NOT between A and Z - java

I'm looking for a regular expression that replaces any B in a string that is not surrounded by A and Z.
Note that there may be many Bs inside and outside of sequences starting with A and ending Z, but I only want to replace those that are outside.
In other words: what Regex is required to make the following JUnit test pass?
@Test
public void testReplaceBnotBetweenAandZ() throws Exception {
String str = "U-B-V-B-A-B-C-B-Z-W-A-B-Z-B-U";
String repl = str.replaceAll(**#REGEX#**, "x");
Assert.assertEquals("U-x-V-x-A-B-C-B-Z-W-A-B-Z-x-U", repl);
}
The real use case is to replace any & characters of an (X)HTML string that are not contained in a CDATA section. (B = &, A = <![CDATA[ and Z = ]]>).
Thanks!

You can use a negative lookbehind combined with a negative lookahead:
String repl = str.replaceAll("(?<!A[^AZ]{0,999})B(?![^AZ]*Z)", "x");
//=> U-x-V-x-A-B-C-B-Z-W-A-B-Z-x-U

The fastest approach, which needs no length bound, is to match both the A...Z spans and B,
then replace appropriately within a callback.
Find: (A[^Z]*Z)|B
Replace Callback: Group 1 matched ? Group 1 : "x"
( A [^Z]* Z ) # (1)
| B
Sample code:
Pattern p = Pattern.compile("(A[^Z]*Z)|B");
Matcher m = p.matcher(inputString);
StringBuffer sb = new StringBuffer();
while (m.find()) {
if (m.start(1) < 0) {
m.appendReplacement(sb, "x");
} else {
m.appendReplacement(sb, "$1");
}
}
m.appendTail(sb);
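On Java 9 or later, the same callback idea can be sketched more compactly with Matcher.replaceAll and a lambda. The class and method names below are made up for illustration; Matcher.quoteReplacement is used so that any $ or \ inside a protected A...Z span is copied literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceOutsideSpans {
    // Group 1 captures a whole A...Z span; a lone B matches with group 1 unset.
    private static final Pattern SPAN_OR_B = Pattern.compile("(A[^Z]*Z)|B");

    // Hypothetical helper: replaces every B that is not inside an A...Z span.
    static String replaceOutside(String input) {
        return SPAN_OR_B.matcher(input).replaceAll(match ->
                match.group(1) != null
                        ? Matcher.quoteReplacement(match.group(1)) // keep the span as-is
                        : "x");                                    // B outside any span
    }

    public static void main(String[] args) {
        System.out.println(replaceOutside("U-B-V-B-A-B-C-B-Z-W-A-B-Z-B-U"));
    }
}
```

Because the A...Z spans are consumed as whole matches, the Bs inside them are never visited on their own, which is what makes the lookarounds unnecessary.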
For your actual use case:
Pattern p = Pattern.compile("(\\Q<![CDATA[\\E(?:(?!\\Q]]>\\E).)*\\Q]]>\\E)|&");
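As a rough sketch of how that pattern might be wired into the loop above: the class name, the method name and the "&amp;" replacement text are assumptions for illustration, and Pattern.DOTALL is an addition so that CDATA sections spanning several lines are still treated as one span.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CdataAmpersand {
    // Group 1 captures a complete CDATA section; a bare & matches with group 1 unset.
    // DOTALL (an assumption, not in the original pattern) lets '.' cross line breaks.
    private static final Pattern CDATA_OR_AMP = Pattern.compile(
            "(\\Q<![CDATA[\\E(?:(?!\\Q]]>\\E).)*\\Q]]>\\E)|&", Pattern.DOTALL);

    // Hypothetical helper: escapes every & that is outside a CDATA section.
    static String escapeAmpersands(String input) {
        Matcher m = CDATA_OR_AMP.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            if (m.group(1) != null) {
                // Copy the CDATA section unchanged; quote it so $ and \ stay literal.
                m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1)));
            } else {
                m.appendReplacement(sb, "&amp;");
            }
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAmpersands("a & b <![CDATA[c & d]]> e & f"));
    }
}
```

Note that this sketch escapes every & outside CDATA, including one that already starts an entity such as &amp;, so it suits input that is known to contain only raw ampersands.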

/(?<!A-)B(?!-Z)/ passes the test.
@Test
public void testReplaceBnotBetweenAandZ() throws Exception {
String str = "U-B-V-B-A-B-C-B-Z-W-A-B-Z-B-U";
String repl = str.replaceAll("(?<!A-)B(?!-Z)", "x");
Assert.assertEquals("U-x-V-x-A-B-C-B-Z-W-A-B-Z-x-U", repl);
}
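I used a negative lookahead (?!-Z) and a negative lookbehind (?<!A-). Note that this only protects a B that directly follows A- or directly precedes -Z, so it passes this test but would still replace a B in the middle of a longer sequence such as A-B-B-B-Z.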

Related

Is there a regex where if first expression is valid then check for next [duplicate]

I have several strings in the rough form:
[some text] [some number] [some more text]
I want to extract the text in [some number] using the Java Regex classes.
I know roughly what regular expression I want to use (though all suggestions are welcome). What I'm really interested in are the Java calls to take the regex string and use it on the source data to produce the value of [some number].
EDIT: I should add that I'm only interested in a single [some number] (basically, the first instance). The source strings are short and I'm not going to be looking for multiple occurrences of [some number].
Full example:
private static final Pattern p = Pattern.compile("^([a-zA-Z]+)([0-9]+)(.*)");
public static void main(String[] args) {
// create matcher for pattern p and given string
Matcher m = p.matcher("Testing123Testing");
// if an occurrence of the pattern was found in the given string...
if (m.find()) {
// ...then you can use group() methods.
System.out.println(m.group(0)); // whole matched expression
System.out.println(m.group(1)); // first expression from round brackets (Testing)
System.out.println(m.group(2)); // second one (123)
System.out.println(m.group(3)); // third one (Testing)
}
}
Since you're looking for the first number, you can use such regexp:
^\D+(\d+).*
and m.group(1) will return the first number. Note that signed numbers can contain a minus sign:
^\D+(-?\d+).*
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex1 {
public static void main(String[]args) {
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("hello1234goodboy789very2345");
while(m.find()) {
System.out.println(m.group());
}
}
}
Output:
1234
789
2345
Allain basically has the java code, so you can use that. However, his expression only matches if your numbers are only preceded by a stream of word characters.
"(\\d+)"
should be able to find the first string of digits. You don't need to specify what's before it, if you're sure that it's going to be the first string of digits. Likewise, there is no use to specify what's after it, unless you want that. If you just want the number, and are sure that it will be the first string of one or more digits then that's all you need.
If you expect it to be set off by spaces, it is even more specific to use
"\\s+(\\d+)\\s+"
If you need all three parts, this will do:
"(\\D+)(\\d+)(.*)"
EDIT The Expressions given by Allain and Jack suggest that you need to specify some subset of non-digits in order to capture digits. If you tell the regex engine you're looking for \d then it's going to ignore everything before the digits. If J or A's expression fits your pattern, then the whole match equals the input string. And there's no reason to specify it. It probably slows a clean match down, if it isn't totally ignored.
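A minimal sketch of that point, assuming only the first run of digits is wanted: the pattern names nothing before or after the digits and find() does the scanning. The class and method names are illustrative, and returning an Optional is a design choice of this sketch rather than something from the answers.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstNumber {
    // No prefix or suffix in the pattern: find() scans forward to the first match.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Hypothetical helper: first run of digits, or empty if there is none.
    static Optional<String> firstNumber(String input) {
        Matcher m = NUMBER.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Testing123Testing456").orElse("none"));
        System.out.println(firstNumber("no digits here").orElse("none"));
    }
}
```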
In addition to Pattern, the Java String class also has several methods that can work with regular expressions, in your case the code will be:
"ab123abc".replaceFirst("\\D*(\\d*).*", "$1")
where \\D is a non-digit character.
In Java 1.4 and up:
String input = "...";
Matcher matcher = Pattern.compile("[^0-9]+([0-9]+)[^0-9]+").matcher(input);
if (matcher.find()) {
String someNumberStr = matcher.group(1);
// if you need this to be an int:
int someNumberInt = Integer.parseInt(someNumberStr);
}
This function collects all matching sequences from a string. In this example it extracts all email addresses from a string.
static final String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
+ "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
public List<String> getAllEmails(String message) {
List<String> result = null;
Matcher matcher = Pattern.compile(EMAIL_PATTERN).matcher(message);
if (matcher.find()) {
result = new ArrayList<String>();
result.add(matcher.group());
while (matcher.find()) {
result.add(matcher.group());
}
}
return result;
}
For message = "adf@gmail.com, <another@osiem.osiem>>>> lalala@aaa.pl" it will create a List of 3 elements.
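On Java 9 or later, the same "collect every match" idea can be sketched with Matcher.results(), which streams the matches. The helper below is hypothetical and pattern-agnostic; unlike the method above, it returns an empty list rather than null when nothing matches.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class FindAll {
    // Hypothetical helper: every match of the pattern, in order of appearance.
    static List<String> findAll(Pattern pattern, String input) {
        return pattern.matcher(input)
                .results()                 // Stream<MatchResult>, available since Java 9
                .map(MatchResult::group)   // the text of each whole match
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(findAll(Pattern.compile("\\d+"), "hello1234goodboy789very2345"));
    }
}
```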
Try doing something like this:
Pattern p = Pattern.compile("^\\D+(\\d+).+"); // \\D+ instead of .+ so the greedy prefix cannot swallow the leading digits
Matcher m = p.matcher("Testing123Testing");
if (m.find()) {
System.out.println(m.group(1));
}
Simple Solution
// Regexplanation:
// ^ beginning of line
// \\D+ 1+ non-digit characters
// (\\d+) 1+ digit characters in a capture group
// .* 0+ any character
String regexStr = "^\\D+(\\d+).*";
// Compile the regex String into a Pattern
Pattern p = Pattern.compile(regexStr);
// Create a matcher with the input String
Matcher m = p.matcher(inputStr);
// If we find a match
if (m.find()) {
// Get the String from the first capture group
String someDigits = m.group(1);
// ...do something with someDigits
}
Solution in a Util Class
public class MyUtil {
private static Pattern pattern = Pattern.compile("^\\D+(\\d+).*");
private static Matcher matcher = pattern.matcher("");
// Assumptions: inputStr is a non-null String
public static String extractFirstNumber(String inputStr){
// Reset the matcher with a new input String
matcher.reset(inputStr);
// Check if there's a match
if(matcher.find()){
// Return the number (in the first capture group)
return matcher.group(1);
}else{
// Return some default value, if there is no match
return null;
}
}
}
...
// Use the util function and print out the result
String firstNum = MyUtil.extractFirstNumber("Testing4234Things");
System.out.println(firstNum);
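One caveat with the util class above: a Pattern is immutable and safe to share, but a Matcher is not safe for concurrent use, so a single static Matcher can misbehave if several threads call the method. A sketch of a variant that creates a matcher per call (the class name is illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberUtil {
    // The compiled Pattern is shared; it is immutable and thread-safe.
    private static final Pattern FIRST_NUMBER = Pattern.compile("^\\D+(\\d+).*");

    private NumberUtil() {
    }

    // Assumption: inputStr is non-null. Returns null when there is no match.
    static String extractFirstNumber(String inputStr) {
        // A fresh Matcher per call keeps the method safe under concurrent use.
        Matcher matcher = FIRST_NUMBER.matcher(inputStr);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractFirstNumber("Testing4234Things"));
    }
}
```

Because the pattern starts with ^\\D+, an input that begins with a digit (for example "1234abc") yields null; drop the prefix if such inputs should match.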
You can also do it using StringTokenizer:
String str = "as:"+123+"as:"+234+"as:"+345;
StringTokenizer st = new StringTokenizer(str,"as:");
while(st.hasMoreTokens())
{
String k = st.nextToken(); // you will get first numeric data i.e 123
int kk = Integer.parseInt(k);
System.out.println("k string token in integer " + kk);
String k1 = st.nextToken(); // you will get second numeric data i.e 234
int kk1 = Integer.parseInt(k1);
System.out.println("new string k1 token in integer :" + kk1);
String k2 = st.nextToken(); // you will get third numeric data i.e 345
int kk2 = Integer.parseInt(k2);
System.out.println("k2 string token is in integer : " + kk2);
}
Since we are taking these numeric data into three different variables we can use this data anywhere in the code (for further use)
How about [^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).* as the pattern? I think it would take care of numbers with a fractional part.
I included whitespace and allowed , as a possible separator.
I'm trying to get the numbers out of a string including floats and taking into account that the user might make a mistake and include white spaces while typing the number.
Sometimes you can use the simple .split("REGEXP") method available in java.lang.String. For example:
String input = "first,second,third";
//To retrieve 'first'
input.split(",")[0]
//second
input.split(",")[1]
//third
input.split(",")[2]
If you are reading from a file, this can help:
try{
InputStream inputStream = (InputStream) mnpMainBean.getUploadedBulk().getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
String line;
//Ref:03
while ((line = br.readLine()) != null) {
if (line.matches("[A-Z],\\d,(\\d*,){2}(\\s*\\d*\\|\\d*:)+")) {
String[] splitRecord = line.split(",");
//do something
}
else{
br.close();
//error
return;
}
}
br.close();
}
catch (IOException ioException){
logger.logDebug("Exception " + ioException.getMessage());
}
Pattern p = Pattern.compile("(\\D+)(\\d+)(.*)");
Matcher m = p.matcher("this is your number:1234 thank you");
if (m.find()) {
String someNumberStr = m.group(2);
int someNumberInt = Integer.parseInt(someNumberStr);
}

In java pattern matcher(regex) how to iterate and replace each text with different text

I want to check for pattern matching, and if the pattern matches, then I wanted to replace those text matches with the element in the test array at the given index.
public class test {
public static void main(String[] args) {
String[] test={"one","two","three","four"};
Pattern pattern = Pattern.compile("\\$(\\d)+");
String text="{\"test1\":\"$1\",\"test2\":\"$5\",\"test3\":\"$3\",\"test4\":\"$4\"}";
Matcher matcher = pattern.matcher(text);
while(matcher.find()) {
System.out.println(matcher.groupCount());
System.out.println(matcher.replaceAll("test"));
}
System.out.println(text);
}
}
I want the end result text string to be in this format:
{\"test1\":\"one\",\"test2\":\"$two\",\"test3\":\"three\",\"test4\":\"four\"}
but the while loop is exiting after one match and "test" is replaced everywhere like this:
{"test1":"test","test2":"test","test3":"test","test4":"test"}
Using the below code I got the result:
public class test {
public static void main(String[] args) {
String[] test={"one","two","three","four"};
Pattern pattern = Pattern.compile("\\$(\\d)+");
String text="{\"test1\":\"$1\",\"test2\":\"$2\",\"test3\":\"$3\",\"test4\":\"$4\"}";
Matcher m = pattern.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, test[Integer.parseInt(m.group(1)) - 1]);
}
m.appendTail(sb);
System.out.println(sb.toString());
}
}
But, if I have a replacement text array like this,
String[] test={"$$one","two","three","four"};
then, because of the $$, I am getting an exception in thread "main":
java.lang.IllegalArgumentException: Illegal group reference
at java.util.regex.Matcher.appendReplacement(Matcher.java:857)
The following line is your problem:
System.out.println(matcher.replaceAll("test"));
If you remove it the loop will walk through all matches.
As a solution for your problem, you could replace the loop with something like this:
For Java 8:
StringBuffer out = new StringBuffer();
while (matcher.find()) {
String r = test[Integer.parseInt(matcher.group(1)) - 1];
matcher.appendReplacement(out, r);
}
matcher.appendTail(out);
System.out.println(out.toString());
For Java 9 and above:
String x = matcher.replaceAll(match -> test[Integer.parseInt(match.group(1)) - 1]);
System.out.println(x);
This only works if you replace the $5 with $2, which I assume is your goal.
Concerning the $ signs in the replacement string, the documentation states:
A dollar sign ($) may be included as a literal in the replacement string by preceding it with a backslash (\$).
In other words, you must write your replacement array as String[] test = { "\\$\\$one", "two", "three", "four" };
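If the replacement values cannot be edited by hand, Matcher.quoteReplacement does the same escaping at run time. The sketch below is one way to combine it with the loop; the class and method names are illustrative, the pattern is changed to \\$(\\d+) so that multi-digit indices are captured whole, and leaving out-of-range placeholders untouched is an assumption about the desired behaviour.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderSubstitution {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\d+)");

    // Hypothetical helper: replaces $1, $2, ... with values[0], values[1], ...
    static String substitute(String text, String[] values) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1)) - 1;
            // Assumption: a placeholder without a matching value is left as it is.
            String value = (index >= 0 && index < values.length) ? values[index] : m.group();
            // quoteReplacement makes $ and \ in the value literal.
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] test = {"$$one", "two", "three", "four"};
        System.out.println(substitute("{\"test1\":\"$1\",\"test2\":\"$5\"}", test));
    }
}
```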
I can do a regex solution if you like, but this is much easier (assuming this is the desired output).
int count = 1;
for (String s : test) {
text = text.replace("$" + count++, s);
}
System.out.println(text);
It prints:
{"test1":"one","test2":"two","test3":"three","test4":"four"}

Java Regex expression not working

I have a problem with a regex that is not working. I don't know what I am doing wrong. My code:
String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
Pattern p = Pattern.compile("\\btimetable:(.*);");
//also tried "timetable:(.*);" and "(\\btimetable:)(.*)(;)"
Matcher m = p.matcher(test);
while(m.find()) {
System.out.println("S:" + m.start() + ", E:" + m.end());
System.out.println("x: "+ test.substring(m.start(), m.end()));
}
Expected result:
(1) "timetable:xxxxxtimetable:"
(2) "timetable: fullihhghtO"
Thanks for any help.
A non-capturing group could be handy in our case:
String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
Pattern p = Pattern.compile("(?:\\btimetable:(.*?);)+"); // <-- here
Matcher m = p.matcher(test);
int i = 1;
while (m.find()) {
System.out.println(i + ") "+ m.group(1));
i++;
}
OUTPUT
1) xxxxxtimetable:
2) fullihhghtO
Regex explained:
(?:\\btimetable:(.*?);)+ By using the non-capturing group (?:...) we consume "timetable:" and the trailing ; without capturing them, while the single capturing group (.*?) captures what we want (everything between \btimetable: and ;). Pay special attention to the non-greedy quantifier .*?, which consumes the minimum possible number of characters up to the next ;. Without this lazy form, the regex would use the default greedy mode and consume all the characters up to the last ; in the string!
Now, all that is relevant if you wanted to catch only the unique part, but if you wanted to catch the whole thing:
1) timetable:xxxxxtimetable:;
2) timetable: fullihhghtO;
It can be done easily by modifying the line with the regex to:
Pattern p = Pattern.compile("\\b(timetable:.*?;)+");
which is even simpler: only one capturing group (see that we still have to use the non-greedy mode!).
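To see the greedy/lazy difference on the question's input, a small sketch like the following can help; the helper name is made up for illustration, and it simply lists every whole match for a given regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: collects the text of every match of regex in input.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
        // Greedy .* runs to the last ';', so the whole string is a single match.
        System.out.println(allMatches("\\btimetable:.*;", test));
        // Lazy .*? stops at the first ';', giving one match per entry.
        System.out.println(allMatches("\\btimetable:.*?;", test));
    }
}
```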
You don't need a regex; a simple split would do it:
public static void main(String[] args) throws IOException {
String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
String[] array = test.split(";");
String str1 = array[0].trim();
String str2 = array[1].trim();
System.out.println(str1 + "\n" + str2); //timetable:xxxxxtimetable:
//timetable: fullihhghtO
}

How to return the first chunk of either numerics or letters from a string?

For example, if I had (-> means return):
aBc123afa5 -> aBc
168dgFF9g -> 168
1GGGGG -> 1
How can I do this in Java? I assume it's something regex related but I'm not great with regex and so not too sure how to implement it (I could with some thought but I have a feeling it would be 5-10 lines long, and I think this could be done in a one-liner).
Thanks
String myString = "aBc123afa5";
String extracted = myString.replaceAll("^([A-Za-z]+|\\d+).*$", "$1");
View the regex demo and the live code demonstration!
To use Matcher.group() and reuse a Pattern for efficiency:
// Class
private static final Pattern pattern = Pattern.compile("^([A-Za-z]+|\\d+).*$");
// Your method
{
String myString = "aBc123afa5";
Matcher matcher = pattern.matcher(myString);
if(matcher.matches())
System.out.println(matcher.group(1));
}
Note: /^([A-Za-z]+|\d+).*$/ and /^([A-Za-z]+|\d+)/ perform with similar efficiency. On regex101 you can compare the matcher debug logs to confirm this.
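A sketch of the shorter pattern used with find() instead of matches(), so the rest of the string does not have to be consumed. The class and method names are illustrative, and returning an empty string when the input starts with neither a letter nor a digit is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstChunk {
    // Anchored at the start: a run of ASCII letters or a run of digits.
    private static final Pattern CHUNK = Pattern.compile("^(?:[A-Za-z]+|\\d+)");

    // Hypothetical helper: leading run of letters or digits, or "" if neither.
    static String firstChunk(String s) {
        Matcher m = CHUNK.matcher(s);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(firstChunk("aBc123afa5"));
        System.out.println(firstChunk("168dgFF9g"));
        System.out.println(firstChunk("1GGGGG"));
    }
}
```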
Without using regex, you can do this:
String string = "168dgFF9g";
String chunk = "" + string.charAt(0);
boolean searchDigit = Character.isDigit(string.charAt(0));
for (int i = 1; i < string.length(); i++) {
boolean isDigit = Character.isDigit(string.charAt(i));
if (isDigit == searchDigit) {
chunk += string.charAt(i);
} else {
break;
}
}
System.out.println(chunk);
public static String prefix(String s) {
return s.replaceFirst("^(\\d+|\\pL+|).*$", "$1");
}
where
\\d = digit
\\pL = letter
postfix + = one or more
| = or
^ = begin of string
$ = end of string
$1 = first group `( ... )`
An empty alternative (the last |) ensures that (...) always matches, so a replacement always happens. Otherwise the original string would be returned.

Replacing regex with the same amount of "." as its length

See this for my current attempt: http://regexr.com?374vg
I have a regex that captures what I want it to capture, the thing is that the String().replaceAll("regex", ".") replaces everything with just one ., which is fine if it's at the end of the line, but otherwise it doesn't work.
How can I replace every character of the match with a dot, so I get the same amount of . symbols as its length?
Here's a one line solution:
str = str.replaceAll("(?<=COG-\\d{0,99})\\d", ".").replaceAll("COG-(?=\\.+)", "....");
Here's some test code:
String str = "foo bar COG-2134 baz";
str = str.replaceAll("(?<=COG-\\d{0,99})\\d", ".").replaceAll("COG-(?=\\.+)", "....");
System.out.println(str);
Output:
foo bar ........ baz
This is not possible using String#replaceAll. You might be able to use Pattern.compile(regexp) and iterate over the matches like so:
StringBuilder result = new StringBuilder();
Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(inputString);
int previous = 0;
while (matcher.find()) {
result.append(inputString.substring(previous, matcher.start()));
result.append(buildStringWithDots(matcher.end() - matcher.start()));
previous = matcher.end();
}
result.append(inputString.substring(previous, inputString.length()));
To use this you have to define buildStringWithDots(int length) to build a String containing length dots.
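One possible way to fill in that helper and wrap the loop in a method is sketched below. The class name and the mask method are illustrative; buildStringWithDots is the name used above.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotMask {
    // Builds a String consisting of `length` dots.
    static String buildStringWithDots(int length) {
        char[] dots = new char[length];
        Arrays.fill(dots, '.');
        return new String(dots);
    }

    // Hypothetical wrapper: replaces each match with as many dots as it is long.
    static String mask(String regexp, String inputString) {
        StringBuilder result = new StringBuilder();
        Matcher matcher = Pattern.compile(regexp).matcher(inputString);
        int previous = 0;
        while (matcher.find()) {
            result.append(inputString, previous, matcher.start()); // text before the match
            result.append(buildStringWithDots(matcher.end() - matcher.start()));
            previous = matcher.end();
        }
        result.append(inputString, previous, inputString.length()); // remaining tail
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(mask("COG-\\d+", "foo bar COG-2134 baz"));
    }
}
```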
Consider this code:
Pattern p = Pattern.compile("COG-([0-9]+)");
Matcher mt = p.matcher("Fixed. Added ''Show annualized values' chackbox in EF Comp Report. Also fixed the problem with the missing dots for the positions and the problem, described in COG-18613");
if (mt.find()) {
char[] array = new char[mt.group().length()];
Arrays.fill(array, '.');
System.out.println( " <=> " + mt.replaceAll(new String(array)));
}
OUTPUT:
Fixed. Added ''Show annualized values' chackbox in EF Comp Report. Also fixed the problem with the missing dots for the positions and the problem, described in .........
Personally, I'd simplify your life and just do something like this (for starters). I'll let you finish.
public class Test {
public static void main(String[] args) {
String cog = "COG-19708";
for (int i = cog.indexOf("COG-"); i < cog.length(); i++) {
System.out.println(cog.substring(i,i+1));
// build new string
}
}
}
Can you put your regex in a group and then replace the match with a string whose length equals that of the matched group? Something like:
regex = (_what_i_want_to_match)
String().replaceAll(regex, create string that has that many '.' as length of $1)
?
note: $1 is what you matched in your search
see also: http://www.regular-expressions.info/brackets.html
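The pseudocode above maps fairly directly onto newer Java versions. A sketch assuming Java 11 or later (Matcher.replaceAll with a function is Java 9+, String.repeat is Java 11+); the class and method names are illustrative:

```java
import java.util.regex.Pattern;

class DotMaskLambda {
    // Hypothetical helper: each match becomes a run of dots of the same length.
    static String mask(Pattern pattern, String input) {
        // The replacement contains only dots, so no $ or \ escaping is needed.
        return pattern.matcher(input)
                .replaceAll(match -> ".".repeat(match.group().length()));
    }

    public static void main(String[] args) {
        System.out.println(mask(Pattern.compile("COG-\\d+"), "foo bar COG-2134 baz"));
    }
}
```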
