Replace a string without certain prefix and suffix in Java

I'm trying to replace all occurrences of a given string, but I have to be sure that it isn't surrounded by letters or digits.
For example:
// Directive's block
BIT EQU $1111
BIT0 EQU $0000
// Instruction's block
ADD BIT, (BIT0)+
When my parser finds an EQU in the first line, it reads the instruction's block trying to find the given label ("BIT", in this case) and replace it with its value. The result is (incorrectly):
ADD $1111, ($11110)+
This overwrites part of the other label's name, because BIT is a substring of BIT0. So I have to be sure that the surrounding characters are not letters or digits; then I can be sure that the replacement doesn't overwrite another label ID.
My code for now is:
output += operand.replace(label, value)+" ";
operand: a string containing the whole operand
label: the label to be found for replacement
value: the value that replaces the label
Now I'm trying to use replaceAll() with a regex:
String regex = "(?<![a-zA-Z_])"+label+"[^a-zA-Z_]";
output+= operand.replaceAll(regex, value)+" ";
But it throws the following exception:
IndexOutOfBoundsException: No group 1 (java.util.regex.Matcher.start)
Even if I leave only the suffix, it throws the same error.
Does anybody know what it means?
Thank you, guys.

If you're using replaceAll() and you're trying to replace something with $1111, that won't work, because $ has a special meaning in the replacement string of replaceAll(). Use Matcher.quoteReplacement(value) instead of value in the replaceAll() call; quoteReplacement makes sure that any special characters are "quoted" so that they no longer have special meanings. (The replacement is interpreting $1 as "replace with the contents of group 1"; since your regex has no group 1, you get the error.)
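The answer above explains the exception. Two further issues with the regex in the question may also matter: [^a-zA-Z_] consumes the character after the label (the comma in "BIT," would be replaced too) and cannot match when the label ends the operand, and since neither character class lists digits, the BIT in BIT0 still qualifies. A minimal sketch, assuming a label counts as delimited when neither neighbour is a letter, digit or underscore, uses lookarounds on both sides, Pattern.quote for the label and Matcher.quoteReplacement for the value. LabelReplace and replaceLabel are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelReplace {
    // Replaces every occurrence of label that is not directly preceded or
    // followed by a letter, digit or underscore (assumed label characters).
    static String replaceLabel(String operand, String label, String value) {
        // Lookbehind/lookahead test the neighbours without consuming them;
        // Pattern.quote makes the label match literally.
        String regex = "(?<![A-Za-z0-9_])" + Pattern.quote(label) + "(?![A-Za-z0-9_])";
        // quoteReplacement stops "$1111" from being read as a group reference.
        return operand.replaceAll(regex, Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        System.out.println(replaceLabel("ADD BIT, (BIT0)+", "BIT", "$1111"));
        // prints: ADD $1111, (BIT0)+
    }
}
```

Because the lookarounds are zero-width, the comma after BIT and the parenthesis before BIT0 are left in place.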


Using regular expression, how to remove matching sequence at the beginning and ending of the text but keeping what's in the middle?

My problem is very simple, but I can't figure out the correct regular expression to use.
I have the following variable (Java):
String text = "\033[1mYO\033[0m"; // this is ANSI for bold text in the Terminal
My goal is to remove the ANSI codes with a single regular expression (I just want to keep the plain text in the middle). I cannot modify the text in any way and those ANSI codes will always be at the same place (so one at the beginning, one at the end, though sometimes it's possible that there is none).
With this regular expression, I will remove them using the replaceAll method:
String plainText = text.replaceAll(unknownRegex, "");
Any idea on what the unknown regex could be?
Well, you can use a single regex that has the ANSI codes as optional parts at the beginning and end, captures anything in between, and replaces the entire string with the value of that group: text.replaceAll("^(?:\\\\\\d+\\[1m)?(.*?)(?:\\\\\\d+\\[0m)?$", "$1") (this might not capture every ANSI code; adjust if needed).
Breaking the expression down (note that the example above escapes backslashes for Java strings so they are doubled):
^ is the start of the string
(?:\\\d+\[1m)? matches an optional \<at least 1 digit>[1m
(.*?) matches any text but as little as possible, and captures it into group 1
(?:\\\d+\[0m)? matches an optional \<at least 1 digit>[0m
$ is the end of the input
In the replacement $1 refers to the value of capturing group 1 which is (.*?) in the expression.
Found the answer thanks to a comment that disappeared.
Actually, I just need a capturing group for what's in the middle of the string, and then use it ($1) to replace the whole thing:
String plainText = text.replaceAll("\\033\\[.*m(.+)\\033\\[.*m", "$1");
Not sure if this will remove every ANSI code, but it is enough for what I want to do.
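Note that in the Java literal "\033[1mYO\033[0m", the octal escape \033 is a single ESC character (U+001B), not a backslash followed by digits, so a pattern for this string has to match ESC itself (Java regex accepts \x1B for it). A minimal sketch, assuming the codes are SGR sequences of the form ESC [ digits/semicolons m (as \033[1m and \033[0m are), removes them with one replaceAll and an empty replacement; strings without codes come back unchanged. StripAnsi and stripSgr are hypothetical names.

```java
public class StripAnsi {
    // \x1B is ESC; "[", then any digits/semicolons, then "m" is the SGR
    // (colour/style) form used by \033[1m and \033[0m.
    private static final String SGR = "\\x1B\\[[0-9;]*m";

    static String stripSgr(String text) {
        // Removes every SGR sequence; text without codes is returned unchanged.
        return text.replaceAll(SGR, "");
    }

    public static void main(String[] args) {
        String text = "\033[1mYO\033[0m"; // bold "YO"
        System.out.println(stripSgr(text)); // prints: YO
    }
}
```

This removes the codes wherever they occur rather than only at the ends, which fits the case where they may be absent.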

How to use regular expressions on an element of a String array in Java

I am basically trying to find a regular expression for the text "TC XX", where XX can be any two-digit number. My piece of code is:
boolean b = DocArray[RTArrayIndex].matches("/TC \\d{2}/");
where DocArray - an array of strings obtained by splitting another string on \t
RTArrayIndex - the current index into the DocArray array
Regular Expression - /TC \\d{2}/
The value of the string at the current index is "TC 10", but the value of "b" I get is still false.
Another element of the array contains the string "Refer Logs of TC 10", but again the value of "b" is false.
You have a few problems. First, your regex contains some "/" characters, which it is attempting to match. If you remove both of those, you will have a slightly better regex.
boolean b = DocArray[RTArrayIndex].matches("TC \\d{2}");
The regex above should match your first example, but not your second, because String.matches() only returns true if the entire string matches the pattern. You need to account for leading and trailing characters. You can do this with the "." symbol: "." matches any single character (except a line terminator, by default), and "*" means the preceding element can occur any number of times, including zero. If you add ".*" to the beginning and end of your pattern, any string that contains the substring "TC \d\d" will match your regex.
boolean b = DocArray[RTArrayIndex].matches(".*TC \\d{2}.*");
Remove the slashes at the beginning and the end of your regular expression, like this:
TC \\d{2}
This works for your first example. If you want all strings containing TC 10, you need to add something at the beginning and the end, like .* (which means 'anything').
The final regular expression should be :
.*TC \\d{2}.*
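An alternative to wrapping the pattern in .* is Matcher.find(), which succeeds when the pattern occurs anywhere in the string, while matches() requires the whole string to match. A minimal sketch comparing the two (TcMatch, its methods and the sample array are hypothetical; for single-line strings, containsTc gives the same result as matches(".*TC \\d{2}.*")):

```java
import java.util.regex.Pattern;

public class TcMatch {
    // "TC ", then exactly two digits; compiled once and reused.
    private static final Pattern TC = Pattern.compile("TC \\d{2}");

    // matches(): the whole string must be "TC " plus two digits.
    static boolean isExactlyTc(String s) {
        return TC.matcher(s).matches();
    }

    // find(): true if "TC " plus two digits occurs anywhere in s.
    static boolean containsTc(String s) {
        return TC.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] docArray = {"TC 10", "Refer Logs of TC 10", "TC 1"};
        for (String s : docArray) {
            System.out.println(s + ": " + isExactlyTc(s) + " " + containsTc(s));
        }
        // prints:
        // TC 10: true true
        // Refer Logs of TC 10: false true
        // TC 1: false false
    }
}
```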

Replace with empty string replaces newChar around all the characters in original string

I was just working on some Java code in which I use the String.replace method. While testing it, in one situation I planned to pass a junk value, as in String.replace("","");
so during testing I ended up replacing an empty value with some other value, i.e. String.replace("","p"), which inserted "p" everywhere around all the characters of the original String.
Example:
String strSample = "val";
strSample = strSample.replace("","p");
System.out.println(strSample);
Output:
pvpaplp
Can anyone please explain why it works like this?
replace looks for each position in the string where the target string starts. E.g. if you replace "a" in "banana", it finds "a" 3 times.
However, the empty string is found at every position, including before the first character and after the last one.
Below is the definition from the Java docs for the replace overload used in your case.
String java.lang.String.replace(CharSequence target, CharSequence replacement)
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
Parameters:
target The sequence of char values to be replaced
replacement The replacement sequence of char values
Now, since you define the target value as "" (empty), it matches at every position in the string and inserts the replacement value there.
A good thing to note is that if you use strSample = strSample.replace(" ","p");, i.e. a single white space character as the target value, then nothing will be replaced, because in this case the replace method searches for a white space character and "val" contains none.
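To make the "every position" explanation concrete, here is a minimal hand-written equivalent (insertEverywhere is a hypothetical helper, not a JDK method): it appends the replacement at each of the s.length() + 1 positions where the empty target is found, which gives the same result as s.replace("", r).

```java
public class EmptyTarget {
    // Inserts r before each char and once more at the end: s.length() + 1
    // insertions, one per position where "" is found.
    static String insertEverywhere(String s, String r) {
        StringBuilder sb = new StringBuilder(r);   // position 0
        for (int i = 0; i < s.length(); i++) {
            sb.append(s.charAt(i)).append(r);      // after each char
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "val";
        System.out.println(s.replace("", "p"));       // prints: pvpaplp
        System.out.println(insertEverywhere(s, "p")); // prints: pvpaplp
    }
}
```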
The native Java java.lang.String implementation (like Ruby and Python) considers the empty string "" a valid character sequence when performing string operations. Therefore the "" character sequence is effectively everywhere: between every two chars, as well as before the first and after the last character.
It works consistently across java.lang.String operations. See:
String abc = "abc";
System.out.println(abc.replace("", "a")); // aaabaca instead of "abc"
System.out.println(abc.indexOf("")); // 0 instead of -1
System.out.println(abc.contains("")); // true instead of false
As a side note:
This behavior might be misleading, because many other languages and implementations do not behave like this. For instance, SQL (MySQL, MSSQL, Oracle and PostgreSQL) and PHP do not consider "" a valid character sequence for string replacement. .NET goes further and throws System.ArgumentException ("String cannot be of zero length.") when calling, for instance, abc.Replace("", "a").
Even the popular Apache Commons Lang Java library works differently:
org.apache.commons.lang3.StringUtils.replace("abc", "", "a"); /* abc */
Take a look at this example:
"" + "abc" + ""
What is the result of this code?
Answer: it is still "abc". So, as you can see, we can say that every string has empty strings before and after it.
The same rule applies between characters, like
"a"+""+"b"+""+"c"
will still create "abc"
So empty strings also exist between characters.
In your code
"val".replace("","p")
all these empty strings were replaced with p, which results in pvpaplp.
In the case of ""+""+...+""+"", assume that Java is smart enough to see it as a single "".

regex certain character can exist or not but nothing after that

I'm new to regex and I'm trying to do a search on a couple of strings.
I want to check whether a certain character, in this case ":" (without the quotes), exists in the strings.
If : does not exist in the string, it should still match; but if : exists, nothing may come after it except spaces and newlines.
I have this pattern, but it does not seem to work the way I want:
(.*)(:?\s*\n*)
Thank you.
If I understand your question correctly, ^[^:]*(:\s*)?$
Let's break this down a bit:
^ Starting anchor; without this, the match can restart itself every time it sees another colon, or non-whitespace following a colon.
[^:]* Match any number of characters that AREN'T colon characters; this way, if the entire string is non-colon characters, the string is treated as a valid match.
(:\s*)? If at any point we do see a colon, all following characters must be white space until the end of the string; the grouping parens and following ? act to make this an all-or-nothing conditional statement.
$ Ending anchor; without this, the regex won't know that if it sees a colon the following whitespace MUST persist until the end of the string.
Here is a pattern which should work:
/^([^:]*|([^:]*:\s*))$/
You can use the pipe (|) to express alternatives.
Another way is:
^[^:]*(|:[\n]*)$
^[^:]* => starts with anything except :
(|:[\n]*)$ => ends either with exactly nothing OR ':' followed only by line breaks (note that this variant does not allow spaces after the colon)
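A minimal Java check of the first answer's pattern ^[^:]*(:\s*)?$ (ColonCheck, isValid and the sample strings are hypothetical). With Matcher.matches() the whole input must match, so the ^ and $ anchors are redundant there but harmless.

```java
import java.util.regex.Pattern;

public class ColonCheck {
    // No colon at all, or one colon followed only by whitespace until the end.
    // \s covers spaces, tabs and line breaks.
    private static final Pattern P = Pattern.compile("^[^:]*(:\\s*)?$");

    static boolean isValid(String s) {
        return P.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("no colon here")); // true
        System.out.println(isValid("key:  \n"));      // true
        System.out.println(isValid("key: value"));    // false
        System.out.println(isValid("a:b:"));          // false
    }
}
```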

Removing every other character in a string using Java regex

I have this homework problem where I need to use regex to remove every other character in a string.
In one part, I have to delete the characters at indices 1, 3, 5, and so on. I have done this as follows:
String s = "1a2b3c4d5";
System.out.println(s.replaceAll("(.).", "$1"));
This prints 12345, which is what I want. Essentially I match two characters at a time and replace them with the first one, using a capturing group.
The problem is that I'm having trouble with the second part of the homework, where I need to delete the characters at indices 0, 2, 4, and so on.
I have done the following:
String s = "1a2b3c4d5";
System.out.println(s.replaceAll(".(.)", "$1"));
This prints abcd5, but the correct answer must be abcd. My regex is only incorrect if the input string length is odd. If it's even, then my regex works fine.
I think I'm really close to the answer, but I'm not sure how to fix it.
You are indeed very close to the answer: just make matching the second char optional.
String s = "1a2b3c4d5";
System.out.println(s.replaceAll(".(.)?", "$1"));
// prints "abcd"
This works because:
The ? quantifier is greedy by default, so it will take the second character if it's there
When the input is of odd length, the second char won't be there at the last replacement, but you'd still match one char (i.e. last char in input)
You can still use backreferences in substitution even if the group fails to match
It will substitute in the empty string, not "null"
This is different from Matcher.group(int), which returns null for failed groups
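A minimal sketch of the last two points (FailedGroup and groupOne are hypothetical names): the same optional group is null when read through Matcher.group(1), but contributes an empty string when referenced as $1 in a replacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FailedGroup {
    // Returns group 1 of ".(.)?" matched against the whole input.
    static String groupOne(String input) {
        Matcher m = Pattern.compile(".(.)?").matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match: " + input);
        }
        return m.group(1); // null when the optional group did not participate
    }

    public static void main(String[] args) {
        System.out.println(groupOne("4d"));                  // prints: d
        System.out.println(groupOne("5"));                   // prints: null
        System.out.println("5".replaceAll(".(.)?", "[$1]")); // prints: []
    }
}
```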
References
regular-expressions.info/Optional
A closer look at the first part
Let's take a closer look at the first part of the homework:
String s = "1a2b3c4d5";
System.out.println(s.replaceAll("(.).", "$1"));
// prints "12345"
Here you didn't have to use ? for the second char, but it "works" because even though you didn't match the last char, you didn't have to! The last char can remain unmatched, unreplaced, due to the problem specification.
Now suppose that we want to delete the chars at indices 1, 3, 5, ... and put the chars at indices 0, 2, 4, ... in parentheses.
String s = "1a2b3c4d5";
System.out.println(s.replaceAll("(.).", "($1)"));
// prints "(1)(2)(3)(4)5"
A-ha!! Now you're experiencing the exact same problem with odd-length input! You couldn't match the last char with your regex, because your regex needs two chars, but there's only one char at the end for odd-length input!
The solution, again, is to make matching the second char optional:
String s = "1a2b3c4d5";
System.out.println(s.replaceAll("(.).?", "($1)"));
// prints "(1)(2)(3)(4)(5)"
My regex is only incorrect if the input string length is odd. If it's even, then my regex works fine.
Change your expression to .(.)? - the question mark makes the second character optional, which means it doesn't matter whether the input length is odd or even.
Your regex needs 2 chars to match, so fails on the final char.
This regex:
".(.{0,1})"
Will make the second char optional, so it will match with your final '5' as well
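Putting both parts of the homework side by side, a minimal sketch (EveryOther, keepEven and keepOdd are hypothetical names) wraps the optional-second-character regexes from the answers above in two helpers. It assumes the input contains no line terminators, since "." does not match them unless the (?s) flag is added.

```java
public class EveryOther {
    // Keeps indices 0, 2, 4, ...: capture the first char of each pair and make
    // the second optional so a lone last char still matches.
    static String keepEven(String s) {
        return s.replaceAll("(.).?", "$1");
    }

    // Keeps indices 1, 3, 5, ...: a lone last char matches ".", and the
    // non-participating group 1 contributes "" to the replacement.
    static String keepOdd(String s) {
        return s.replaceAll(".(.)?", "$1");
    }

    public static void main(String[] args) {
        String s = "1a2b3c4d5";
        System.out.println(keepEven(s)); // prints: 12345
        System.out.println(keepOdd(s));  // prints: abcd
    }
}
```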
