Why the second argument is not being taken as regex? - java

I came across an interesting question about Java regex:
Is there a regular expression way to replace a set of characters with another set (like shell tr command)?
So I tried the following:
String a = "abc";
a = a.replaceAll("[a-z]", "[A-Z]");
Now if I print a, the output is
[A-Z][A-Z][A-Z]
It looks like the method treats the first argument as a regex, but not the second argument.
Is there a problem with this code, or is something else the reason?

This is the way replaceAll works.
See API:
public String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular expression with the given replacement.

The answer to the linked question is a quite clear »No«, so this should come as no surprise.
As you can see from the documentation the second argument is indeed a regular string that is used as replacement:
Parameters:
regex – the regular expression to which this string is to be matched
replacement – the string to be substituted for each match

The second argument is a plain String that is substituted for each match, as the API says.

If you want to turn lower case to upper case, there is a toUpperCase function available in String class. For equivalent functionality to tr utility, I think there is no support in Java (up to Java 7).
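The replacement string is usually taken literally, except for the sequence $n, where n is the number of a capturing group in the regex; that sequence inserts the text captured by the group into the replacement. A backslash in the replacement escapes the following character, so \$ yields a literal dollar sign.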
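The behaviour described above can be sketched as follows. This is a minimal illustration that uses only the JDK; the class and method names are invented for the example. It shows a replacement taken literally, a $1 group reference, and one way to upper-case each match in code, since a replacement string cannot express that.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the JDK calls are real API.
final class ReplacementDemo {
    // The replacement is inserted as-is: "[A-Z]" is plain text here, not a character class.
    static String literalReplacement(String s) {
        return s.replaceAll("[a-z]", "[A-Z]");
    }

    // $1 in the replacement refers to the text captured by group 1.
    static String wrapDigits(String s) {
        return s.replaceAll("(\\d+)", "<$1>");
    }

    // Transforming each match (here: upper-casing) has to be done in code.
    static String upperCaseLetters(String s) {
        Matcher m = Pattern.compile("[a-z]+").matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String upper = m.group().toUpperCase(Locale.ROOT);
            // quoteReplacement keeps any '$' or '\' in the computed text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(upper));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(literalReplacement("abc"));        // [A-Z][A-Z][A-Z]
        System.out.println(wrapDigits("a12b3"));              // a<12>b<3>
        System.out.println(upperCaseLetters("abc 12 de"));    // ABC 12 DE
    }
}
```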

I consider a regex to be a way to express a condition (i.e. does a given string match this expression?). With that in mind, what you are asking would mean "please replace what matches in my string with ... another condition", which doesn't make much sense.
Trying to understand what you are looking for, it seems to me that you want some automatic mapping between classes of characters (e.g. [a-z] -> [A-Z]). As far as I know this does not exist, and you would have to write it yourself (except for the aforementioned toUpperCase()).
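If you do write the mapping yourself, it could look roughly like the sketch below. The helper tr and its parameters are hypothetical, not part of the JDK, and it assumes both character sets are spelled out in full and have equal length (no a-z range syntax as in the shell tr command).

```java
// Hypothetical helper; not a JDK API.
final class Tr {
    // Maps each character found in 'from' to the character at the same index in 'to'.
    // Characters that do not occur in 'from' are copied unchanged.
    static String tr(String input, String from, String to) {
        if (from.length() != to.length()) {
            throw new IllegalArgumentException("from and to must have the same length");
        }
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            int idx = from.indexOf(c);
            out.append(idx >= 0 ? to.charAt(idx) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(tr("abc", "abc", "ABC")); // ABC
        System.out.println(tr("hello", "el", "ip")); // hippo
    }
}
```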

public String replaceAll(String regex, String replacement)
The first argument is a regular expression; every substring that matches the pattern is replaced by the second argument. If you want to convert lower case to upper case, use the toUpperCase() method.

You should look into jtr. Example of usage:
String hello = "abccdefgdhcij";
CharacterReplacer characterReplacer;
try {
characterReplacer = new CharacterReplacer("a-j", "Helo, Wrd!");
hello = characterReplacer.doReplacement(hello);
} catch(CharacterParseException e) {
}
System.out.println(hello);
Output:
Hello, World!

Related

Why split method does not support $,* etc delimiter to split string

import java.util.StringTokenizer;
class MySplit
{
public static void main(String S[])
{
String settings = "12312$12121";
StringTokenizer splitedArray = new StringTokenizer(settings,"$");
String splitedArray1[] = settings.split("$");
System.out.println(splitedArray1[0]);
while(splitedArray.hasMoreElements())
System.out.println(splitedArray.nextToken().toString());
}
}
In the above example, splitting the string on $ does not work, while splitting on other symbols works fine.
Why is that? If split only supports regular expressions, why does it work for symbols such as :, , and ;?
$ has a special meaning in regex, and since String#split takes a regex as an argument, the $ is not interpreted as the string "$", but as the special metacharacter $ (end of input). One neat solution is:
settings.split(Pattern.quote("$"))
Pattern#quote:
Returns a literal pattern String for the specified String.
... The other solution would be escaping $, by adding \\:
settings.split("\\$")
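Important note: check that the resulting array actually contains elements before indexing into it.
If there is no $ in the input, split returns a one-element array holding the whole string, so splitedArray1[0] is safe. But if the input consists only of delimiters (for example "$"), the array is empty and splitedArray1[0] throws ArrayIndexOutOfBoundsException. I would add: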
if (splitedArray1.length == 0) {
// return or do whatever you want
// except accessing the array
}
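A small sketch of the three variants discussed above, using only the JDK (the class and method names are made up for the example). It also shows the one case where the result is empty: an input made up entirely of delimiters.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; String.split and Pattern.quote are the real API.
final class DollarSplit {
    // "$" is the end-of-input anchor, so nothing is actually split off.
    static String[] splitUnescaped(String s) {
        return s.split("$");
    }

    // Pattern.quote turns the delimiter into a literal pattern.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("$"));
    }

    // Escaping by hand: the regex \$ is written "\\$" in Java source.
    static String[] splitEscaped(String s) {
        return s.split("\\$");
    }

    public static void main(String[] args) {
        String settings = "12312$12121";
        System.out.println(Arrays.toString(splitUnescaped(settings))); // [12312$12121]
        System.out.println(Arrays.toString(splitQuoted(settings)));    // [12312, 12121]
        System.out.println(Arrays.toString(splitEscaped(settings)));   // [12312, 12121]
        System.out.println(splitEscaped("$").length);                  // 0
    }
}
```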
If you take a look at the Java docs you can see that the split method takes a regex as its parameter, so you have to write a regular expression, not a simple character.
In regex $ has a specific meaning, so you have to escape it this way:
settings.split("\\$");
The problem is that the split(String str) method expects str to be a valid regular expression. The characters you have mentioned are special characters in regular expression syntax and thus perform a special operation.
To make the regular expression engine take them literally, you would need to escape them like so:
.split("\\$")
Thus given this:
String str = "This is 1st string.$This is the second string";
for(String string : str.split("\\$"))
System.out.println(string);
You end up with this:
This is 1st string.
This is the second string
Dollar symbol $ is a special character in Java regex. You have to escape it so as to get it working like this:
settings.split("\\$");
From the String.split docs:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
On a side note:
Have a look at the Pattern class which will give you an idea as to which all characters you need to escape.
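The remark about trailing empty strings in the quoted documentation can be illustrated with the two-argument split. This is a sketch with invented class and method names; the limit values 0 and -1 behave as documented for String.split(String, int).

```java
import java.util.Arrays;

// Hypothetical demo class for the limit argument of String.split.
final class SplitLimit {
    // Same as split(regex, 0): trailing empty strings are dropped.
    static String[] dropTrailing(String s) {
        return s.split("\\$");
    }

    // A negative limit keeps trailing empty strings.
    static String[] keepTrailing(String s) {
        return s.split("\\$", -1);
    }

    public static void main(String[] args) {
        String input = "a$b$$";
        System.out.println(Arrays.toString(dropTrailing(input))); // [a, b]
        System.out.println(keepTrailing(input).length);           // 4
    }
}
```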
Because $ is a special character in regular expressions: it is an anchor that matches the end of the input (or of a line).
You should escape it using the escape sequence \$, which in a Java string literal is written "\\$".
Hope that helps.
Cheers

Java regular expression for number starts with code

I am not a Java developer but I am interfacing with a Java system.
Please help me with a regular expression that would detect all numbers starting with 25678 or 25677.
For example, in Rails it would be:
^(25677|25678)
Sample inputs are 256776582036 and 256782405036
^(25678|25677)
or
^2567[78]
If you use ^(25678|25677)[0-9]*$ it guarantees that everything after the prefix is a digit and not some other character.
That should do the trick for you: it looks for either prefix, followed by any number of digits.
In Java the regex would be the same, assuming that the number takes up the entire line. You could further simplify it to
^2567[78]
If you need to match a number anywhere in the string, use \b anchor (double the backslash if you are making a string literal in Java code).
\b2567[78]
how about if there is a possibility of a + at the beginning of a number
Add an optional +, like this [+]? or like this \+? (again, double the backslash for inclusion in a string literal).
Note that it is important to know what Java API is used with the regular expression, because some APIs will require the regex to cover the entire string in order to declare it a match.
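To make the difference between the APIs concrete, here is a sketch comparing three Matcher methods on the same unanchored pattern. The class and method names are invented; matches(), lookingAt() and find() are the real java.util.regex.Matcher methods, and String.matches behaves like Matcher.matches().

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing the matching APIs.
final class PrefixApis {
    private static final Pattern PREFIX = Pattern.compile("2567[78]");

    // True only if the entire string is exactly the prefix.
    static boolean viaMatches(String s) {
        return PREFIX.matcher(s).matches();
    }

    // True if the string starts with the prefix; the rest is ignored.
    static boolean viaLookingAt(String s) {
        return PREFIX.matcher(s).lookingAt();
    }

    // True if the prefix occurs anywhere in the string.
    static boolean viaFind(String s) {
        return PREFIX.matcher(s).find();
    }

    public static void main(String[] args) {
        String number = "256776582036";
        System.out.println(viaMatches(number));   // false
        System.out.println(viaLookingAt(number)); // true
        System.out.println(viaFind("99256776"));  // true
    }
}
```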
Try something like:
String number = ...;
if (number.matches("^2567[78].*$")) {
//yes it starts with your number
}
Regex ^2567[78].*$ Means:
Number starts with 2567 followed by either 7 or 8 and then followed by any character.
If only digits may follow the prefix, the regex should be ^2567[78]\\d*$ (written as a Java string literal), which allows zero or more digits after the matching prefix at the beginning.
The regex syntax of Java is pretty close to that of Rails, especially for something this simple. The trick is in using the correct API calls. If you need to do more than one search, it's worthwhile to compile the pattern once and reuse it. Something like this should work (strings stands for whatever collection of inputs you have):
Pattern p = Pattern.compile("^2567[78]");
for (String s : strings) {
if (p.matcher(s).find()) {
// string starts with 25677 or 25678
} else {
// string starts with something else
}
}
If it's a one-shot deal, then you can simplify all this by changing the pattern to cover the entire string:
if (someString.matches("2567[78].*")) {
// string starts with 25677 or 25678
}
The matches() method tests whether the entire string matches the pattern; hence the leading ^ anchor is unnecessary but the trailing .* is needed.
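If you need to account for an optional leading + (as you indicated in a comment to another answer), include \+? at the start of the pattern (or after the ^ if that's used). The plus sign must be escaped, because an unescaped + is a quantifier; in a Java string literal the backslash is doubled.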
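A sketch of the optional-plus variant, assuming that only digits may follow the prefix (an assumption taken from the other answers, not a requirement stated in the question). Names are invented for the example. It also checks that a pattern beginning with an unescaped + does not compile.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the pattern itself follows the discussion above.
final class PlusPrefix {
    // Optional literal '+', then 25677 or 25678, then digits only.
    private static final Pattern NUMBER = Pattern.compile("\\+?2567[78]\\d*");

    static boolean isTargetNumber(String s) {
        return NUMBER.matcher(s).matches();
    }

    // Reports whether a regex compiles at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isTargetNumber("+256776582036")); // true
        System.out.println(isTargetNumber("256782405036"));  // true
        System.out.println(isTargetNumber("256796582036"));  // false
        System.out.println(compiles("+?2567[78]"));          // false
    }
}
```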

Splitting a string in java on more than one symbol

I want to split a string whenever one of the following symbols is encountered: +, -, *, /, =
I am using the split function, but it takes only one argument. Moreover, it does not work on "+".
I am using the following code:
Stringname.split("Symbol");
Thanks.
String.split takes a regular expression as argument.
This means you can combine several symbols or pieces of text in a single pattern and split your String on any of them.
See documentation here.
Here's an example in your case:
String toSplit = "a+b-c*d/e=f";
String[] splitted = toSplit.split("[-+*/=]");
for (String split: splitted) {
System.out.println(split);
}
Output:
a
b
c
d
e
f
Notes:
Reserved characters in Patterns must be escaped with a backslash, written \\ in a Java string literal. Edit: not needed here, since these characters are literal inside a character class.
The [] brackets in the pattern indicate a character class.
More on Patterns here.
You can use a regular expression:
String[] tokens = input.split("[+*/=-]");
Note: - should be placed in first or last position to make sure it is not considered as a range separator.
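The note about the position of - can be checked with a short sketch (class and method names are invented). With the hyphen between + and =, the class becomes the range from '+' to '=', which also contains the digits and several punctuation characters.

```java
import java.util.Arrays;

// Hypothetical demo class for hyphen placement inside a character class.
final class OperatorSplit {
    // Hyphen last: a literal '-', so exactly + * / = - are delimiters.
    static String[] splitOperators(String s) {
        return s.split("[+*/=-]");
    }

    // Hyphen in the middle: the range '+'..'=' (includes , - . / 0-9 : ; < but not '*').
    static String[] splitWithAccidentalRange(String s) {
        return s.split("[+-=]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOperators("a+b-c*d/e=f")));     // [a, b, c, d, e, f]
        System.out.println(Arrays.toString(splitWithAccidentalRange("a5b*c"))); // [a, b*c]
    }
}
```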
You need a regular expression. Additionally, you need the regex OR operator:
String[]tokens = Stringname.split("\\+|\\-|\\*|\\/|\\=");
For that, you need to use an appropriate regex. Some of the symbols you listed (+ and *) are reserved in regex, so you'll have to escape them with \; escaping the others as well is harmless.
A very basic expression would be \+|\-|\*|\/|\= (in a Java string literal, double each backslash). It is relatively easy to understand: each symbol you want is escaped with \, and the symbols are separated by the | (or) operator. If, for example, you wanted to add ^ as well, all you would need to do is append |\^ to that expression.
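As an alternative to escaping each symbol by hand, the alternation can be assembled from literal delimiters with Pattern.quote. This is only a sketch; the helper names alternationOf and splitOn are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helpers; Pattern.quote and String.join are the real API.
final class DelimiterRegex {
    // Builds quoted1|quoted2|... so that every delimiter is matched literally.
    static String alternationOf(String... delimiters) {
        List<String> quoted = new ArrayList<>();
        for (String d : delimiters) {
            quoted.add(Pattern.quote(d));
        }
        return String.join("|", quoted);
    }

    static String[] splitOn(String input, String... delimiters) {
        return input.split(alternationOf(delimiters));
    }

    public static void main(String[] args) {
        String[] tokens = splitOn("a+b-c*d/e=f^g", "+", "-", "*", "/", "=", "^");
        System.out.println(Arrays.toString(tokens)); // [a, b, c, d, e, f, g]
    }
}
```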
For testing and quick expressions, I like to use www.regexpal.com

replaceAll type method that preserves special characters

Currently, the replaceAll method of the String class, along with Matcher.replaceAll methods evaluate their arguments as regular expressions.
The problem I am having is that the replacement string I am passing to either of these methods contains a dollar sign (which of course has special meaning in a regular expression). An easy work-around to this would be to pass my replacement string to 'Matcher.quoteReplacement' as this produces a string with literal characters, and then pass this sanitized string to replaceAll.
Unfortunately, I can't do the above as I need to preserve the special characters as the resultant string is later used in operations where a reg ex is expected, and if I have escaped all the special characters this will break that contract.
Can someone please suggest a way I might achieve what I want to do? Many thanks.
EDIT: For clearer explanation, please find code example below:
String key = "USD";
String value = "$";
String content = "The figure is in USD";
String contentAfterReplacement;
contentAfterReplacement = content.replaceAll(key, value); //will throw an exception as it will evaluate the $ in 'value' variable as special regex character
contentAfterReplacement = content.replaceAll(key, Matcher.quoteReplacement(value)); //Can't do this as contentAfterReplacement is passed on and later parsed as a regex (Ie, it can't have special characters escaped).
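Why not use the String#replace method instead of replaceAll? replaceAll treats its first argument as a regex and gives $ and \ a special meaning in the replacement string, whereas replace(CharSequence, CharSequence) treats both the target and the replacement literally.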
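A sketch using the values from the question (class and method names are invented). It compares replace with replaceAll, and also shows that Matcher.quoteReplacement only protects the replacement while it is being processed: as far as I can tell, the resulting string contains a plain $, not an escaped one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class built around the question's key/value/content.
final class DollarReplacement {
    // Both arguments are literal text; no regex is involved.
    static String viaReplace(String content, String key, String value) {
        return content.replace(key, value);
    }

    // Regex-based, with the key and the replacement both quoted.
    static String viaQuoted(String content, String key, String value) {
        return content.replaceAll(Pattern.quote(key), Matcher.quoteReplacement(value));
    }

    // A bare "$" as replacement is an incomplete group reference and is rejected.
    static boolean replaceAllThrows(String content, String key, String value) {
        try {
            content.replaceAll(key, value);
            return false;
        } catch (IllegalArgumentException | IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String content = "The figure is in USD";
        System.out.println(viaReplace(content, "USD", "$")); // The figure is in $
        System.out.println(viaQuoted(content, "USD", "$"));  // The figure is in $
        System.out.println(replaceAllThrows(content, "USD", "$"));
    }
}
```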

Java Regexp clarification

I have a string like :
<RandomText>
executeRule(x, y, z)
<MoreRandomText>
What I would like to accomplish is the following: if this executeRule string exists in the bigger text block, I would like to get its second parameter.
How could I do this?
What do you mean the bigger text block?
If you want to extract the second param from that expression, it would be something like
executeRule\(\w+,\s*(\w+),\s*\w+\)
The second param is held in capture group 1 ($1).
Keep in mind that to use this expression in a Java string literal, you need to double each '\'. Also, I'm just assuming \w is good enough to match your params; that would depend on your particular rules.
If you need some help with actually using regexes in Java, there are many resources you can turn to, I found this tutorial to be fairly simple and it explains the basic usages:
http://www.vogella.de/articles/JavaRegularExpressions/article.html
import java.util.regex.Matcher;
import java.util.regex.Pattern;
...
Pattern p = Pattern.compile("executeRule\\(\\w+, (\\w+), \\w+\\)");
Matcher m = p.matcher(YOUR_TEXT_FROM_FILE);
while (m.find()) {
String secondArgument = m.group(1);
...process secondArgument...
}
Once this code executes secondArgument will contain the value of y. The above regular expression assumes that you expect the arguments to be composed of word characters (i.e. small and capital letters, digits and underscore).
Double backslashes are needed by Java string literal syntax, regexp engine will see single backslashes.
If you'd like to allow for whitespace in the string as it is allowed in most programming languages, you may use the following regexp:
Pattern p = Pattern.compile("executeRule\\(\\s*\\w+\\s*,\\s*(\\w+)\\s*,\\s*\\w+\\s*\\)");
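Put together, the whitespace-tolerant pattern could be used as in the sketch below. The class and method names are invented, and the sample text is the one from the question; it collects the second argument of every executeRule call it finds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the whitespace-tolerant pattern shown above.
final class ExecuteRuleParser {
    private static final Pattern CALL = Pattern.compile(
            "executeRule\\(\\s*\\w+\\s*,\\s*(\\w+)\\s*,\\s*\\w+\\s*\\)");

    // Returns the second argument of each executeRule(...) call in the text.
    static List<String> secondArguments(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = CALL.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "<RandomText>\nexecuteRule(x, y, z)\n<MoreRandomText>";
        System.out.println(secondArguments(text)); // [y]
    }
}
```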
