replacing string with regex in java

I think I have a decent handle wrt matching strings using Regex in Java, but now I am trying to replace strings using Regex and not having much success.
Simply put, I am trying to find where there is a digit immediately followed by a constant string "CMR", then adding a space between the digit and the "CMR" substring. "0CMR" should become "0 CMR", "5CMR" should become "5 CMR", etc. Any preceding non-digit should be left as it was.
So my source string is "theStringThat0CMRhas"
my command is:
replaceAll("[0-9]CMR", "[0-9] CMR");
I get the added space in the result, but the result becomes "theStringThat[0-9] CMRhas" which obviously isn't what I need. Somehow I need to tell Regex not to replace with "[0-9]", but with whatever it matched on in the first place.
I know I'm doing this wrong, but I don't know what's right.
Any help appreciated.
Thanks,
Tom

You want to use a capturing group:
replaceAll("([0-9])CMR", "$1 CMR")
$1 references the first group in the match, denoted by parentheses.
Also, [0-9] can be substituted with \d (written as "\\d" inside a Java string literal).
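As a minimal sketch of the capturing-group approach (the class and method names here are made up for illustration), the replacement can be wrapped in a small helper and tried against the string from the question:

```java
final class DigitCmrSpacer {
    // Inserts a space between a digit and an immediately following "CMR".
    static String addSpace(String input) {
        // ([0-9]) captures the digit; $1 puts the captured digit back.
        return input.replaceAll("([0-9])CMR", "$1 CMR");
    }

    public static void main(String[] args) {
        System.out.println(addSpace("theStringThat0CMRhas")); // theStringThat0 CMRhas
        System.out.println(addSpace("5CMR and xCMR"));        // 5 CMR and xCMR
    }
}
```

Anything that is not a digit directly followed by CMR is left untouched, which is why xCMR stays as it is.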

Try this:
replaceAll("(?<=\\d)(?=\\D)"," ")
It uses a lookbehind for a digit and a lookahead for a non-digit character, so it matches the empty position between the two and puts a space there.
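Here is a small sketch of how that lookaround replacement behaves (the helper name is hypothetical). It is worth noticing that it spaces every digit-to-non-digit boundary, not only the ones in front of CMR:

```java
final class LookaroundSpacer {
    // Inserts a space at every position that has a digit on its left
    // and a non-digit character on its right. Nothing is consumed,
    // so no capturing group is needed in the replacement.
    static String spaceAfterDigits(String input) {
        return input.replaceAll("(?<=\\d)(?=\\D)", " ");
    }

    public static void main(String[] args) {
        System.out.println(spaceAfterDigits("theStringThat0CMRhas")); // theStringThat0 CMRhas
        System.out.println(spaceAfterDigits("a1b22c"));               // a1 b22 c
        System.out.println(spaceAfterDigits("abc5"));                 // abc5
    }
}
```

The last call leaves the string unchanged because the lookahead needs an actual non-digit character after the digit, and the end of the string does not count as one.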
If you only want to do it where CMR follows the digit, use:
"(?<=\\d)(?=CMR)"

You should put the digit pattern in a capturing group and reference that group in the replacement argument. Your code becomes:
replaceAll("([0-9])CMR", "$1 CMR");
For more regex knowledge, please read this document
https://www.tutorialspoint.com/java/java_regular_expressions.htm
Good luck!

A good starting point for reading about regex in Java may be here: http://www.regular-expressions.info/java.html
On this site, the page on replacement strings is here: http://www.regular-expressions.info/replacetutorial.html
$ followed by a number refers to a captured group, and $0 refers to the whole regex match. You can use these to refer back to what was matched:
String testString = "theStringThat0CMRhas";
String resultString = testString.replaceAll("[0-9]CMR","$0");
System.out.println(resultString);
This would result in: theStringThat0CMRhas (the match is replaced with itself, so nothing changes).
You obviously didn't want this, so let's change it up a little:
String testString = "theStringThat0CMRhas";
String resultString = testString.replaceAll("([0-9])CMR","$0 CMR");
System.out.println(resultString);
Now the pattern contains parentheses, but the replacement does not reference them yet, so it is replacing what it found with the same thing, a space, and CMR.
Your result would now be: theStringThat0CMR CMRhas
So let's reference the group that captured the digit:
String testString = "theStringThat0CMRhas";
String resultString = testString.replaceAll("([0-9])CMR","$1 CMR");
System.out.println(resultString);
Now your answer will be: theStringThat0 CMRhas
It finds the digit followed by CMR and replaces the match with that digit, a space, and then CMR.
What you are trying to do is, I believe, called a backreference (or group reference), though I am unsure. Regex is still not my strong suit either.

Related

Java Regex - Finding specific string within a String

I am trying to match a string that start with the set word "hotel", then a hyphen, then a word of any length, then another hyphen and finally a number of any length.
Edit: Dima gave the solution I needed in the comments of this question! Thanks Dima.
Further edit: elaborating on Dima's answer, adding capturing groups making it easier to retrieve the information entered, and correcting the last bit to only accept digits:
^hotel-(.+)-(\d+)
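To show how the two capturing groups can be read back in Java, here is a hedged sketch. The pattern is the one above; the class name, method name, and the choice to return null on a non-match are assumptions made for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HotelCode {
    // Pattern from the edit above; group 1 is the word, group 2 the number.
    private static final Pattern HOTEL = Pattern.compile("^hotel-(.+)-(\\d+)");

    // Returns {word, number}, or null when the input does not match.
    static String[] parse(String input) {
        Matcher m = HOTEL.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = parse("hotel-paris-123");
        System.out.println(parts[0] + " / " + parts[1]);      // paris / 123
        System.out.println(parse("motel-paris-123") == null); // true
    }
}
```

Because the pattern has no $ at the end, find() accepts trailing text after the number. Appending $ (or calling matches()) would make it stricter.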

^hotel-(.)*$
(But hotel-something WILL work, according to your initial statement).
So, if you actually want something like:
hotel-XXXXXX-YYYYYYY
Then the regex is :
^hotel-(.)*-(.)*$
Try a regex online tester like http://www.regextester.com/.

If you want to match the start of the input, you use ^.
So if you have ^hotel-\b, that will force hotel to be at the start of the string.
As a note, you can use $ for the end of the string in a similar way.

\bhotel-[^\s-]+-[^\s-]+\b
\b means that it should be a word boundary
[^\s-] means anything but - or whitespace
https://regex101.com/r/mH3vY8/1

Replacing occurrences of generic string

I never was good enough with regex, and I assume this is a job for it.
I have a link like www.somelink/phoyto.jpg/?sz=50
I need to replace 50 with my value, let's say 100. The trouble is that I cannot be sure that this will always be sz=50 and not sz=150 or sz=10 or any other value.
What I need is to find an occurrence of a string consisting of 'sz=' + number and replace it with 'sz=100'.
Sure, I can do that "manually" in some for loop, but that would be neither smart nor efficient.

String str = "www.somelink/phoyto.jpg/?sz=50";
str = str.replaceAll("sz=\\d+", "sz=100");
\d is the Java pattern for a digit, and + stands for one or more of them. replaceAll replaces all occurrences of sz=<number> and returns a new string, so the result has to be assigned back.
Here is a handy online regex tester for java: http://www.regexplanet.com/advanced/java/index.html

This should work:
String link = "www.somelink/phoyto.jpg/?sz=50";
link = link.replaceFirst("sz=\\d+", "sz=100");
System.out.println(link);
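The same idea can be sketched as a small reusable helper. The class name, method name, and the size parameter are hypothetical:

```java
final class SizeParam {
    // Replaces the first "sz=<digits>" in the link with "sz=<size>".
    // Strings are immutable, so the new string is returned, not modified in place.
    static String withSize(String link, int size) {
        return link.replaceFirst("sz=\\d+", "sz=" + size);
    }

    public static void main(String[] args) {
        System.out.println(withSize("www.somelink/phoyto.jpg/?sz=50", 100));  // www.somelink/phoyto.jpg/?sz=100
        System.out.println(withSize("www.somelink/phoyto.jpg/?sz=150", 100)); // www.somelink/phoyto.jpg/?sz=100
    }
}
```

replaceFirst is enough when the parameter appears once; replaceAll would behave the same for such links and would also cover repeated occurrences.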

It's pretty simple, and this pattern should work:
(sz=\d+)
Code:
String result = searchText.replaceAll("(sz=\\d+)", "sz=100");
Example:
http://regex101.com/r/mB3xT9

Is this Regex incorrect? No matches found

I'm trying to parse through a string formatted like this, except with more values:
Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value
The Regex
((Key1)=(.*)),((Key2)=(.*)),((Key3)=(.*)),((Key4)=(.*)),((Key5)=(.*)),((Key6)=(.*)),((Key7)=(.*))
In the actual string, there are about double the amount of key/values, but I'm keeping it short for brevity. I have them in parentheses so I can call them in groups. The keys I have stored as Constants, and they will always be the same. The problem is, it never finds a match which doesn't make sense (unless the Regex is wrong)

Judging by your comment above, it sounds like you're creating the Pattern and Matcher objects and associating the Matcher with the target string, but you aren't actually applying the regex. That's a very common mistake. Here's the full sequence:
String regex = "Key1=(.*),Key2=(.*)"; // etc.
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(targetString);
// Now you have to apply the regex:
if (m.find())
{
String value1 = m.group(1);
String value2 = m.group(2);
// etc.
}
Not only do you have to call find() or matches() (or lookingAt(), but nobody ever uses that one), you should always call it in an if or while statement--that is, you should make sure the regex actually worked before you call any methods like group() that require the Matcher to be in a "matched" state.
Also notice the absence of most of your parentheses. They weren't necessary, and leaving them out makes it easier to (1) read the regex and (2) keep track of the group numbers.

Looks like you'd do better to do:
String[] pairs = data.split(",");
Then parse the key/value pairs one at a time

Your regex is working for me...
If you are always getting an IllegalStateException, I would say that you are trying to do something like:
matcher.group(1);
without having invoked the find() method.
You need to call that method before any attempt to fetch a group (or you will be in an illegal state to call the group() method)
Give this a try:
String test = "Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value";
Pattern pattern = Pattern.compile("((Key1)=(.*)),((Key2)=(.*)),((Key3)=(.*)),((Key4)=(.*)),((Key5)=(.*)),((Key6)=(.*)),((Key7)=(.*))");
Matcher matcher = pattern.matcher(test);
if (matcher.find()) {
System.out.println(matcher.group(1));
}

It's not wrong per se, but it requires a lot of backtracking which might cause the regular expression engine to bail. I would try a split as suggested elsewhere, but if you really need to use a regular expression, try making it non-greedy.
((Key1)=(.*?)),((Key2)=(.*?)),((Key3)=(.*?)),((Key4)=(.*?)),((Key5)=(.*?)),((Key6)=(.*?)),((Key7)=(.*?))
To understand why it requires so much backtracking, understand that for
Key1=(.*),Key2=(.*)
applied to
Key1=x,Key2=y
Java's regular expression engine matches the first (.*) to x,Key2=y and then tries stripping characters off the right until it can get a match for the rest of the regular expression: ,Key2=(.*). It effectively ends up asking,
Does "" match ,Key2=(.*), no so try
Does "y" match ,Key2=(.*), no so try
Does "=y" match ,Key2=(.*), no so try
Does "2=y" match ,Key2=(.*), no so try
Does "y2=y" match ,Key2=(.*), no so try
Does "ey2=y" match ,Key2=(.*), no so try
Does "Key2=y" match ,Key2=(.*), no so try
Does ",Key2=y" match ,Key2=(.*), yes so the first .* is "x" and the second is "y".
EDIT:
In Java, the non-greedy (reluctant) quantifier changes things so that it starts off matching nothing and then builds from there. Once Key1= has matched and the first (.*?) is still empty, it asks:
Does "x,Key2=y" match ,Key2=(.*?), no so try
Does ",Key2=y" match ,Key2=(.*?), yes.
So when you've got 7 keys it doesn't need to unmatch 6 of them, which involves unmatching 5, which involves unmatching 4, and so on. It can do its job in one forward pass over the input.
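The difference between the two quantifiers becomes visible in the captured text once the input is ambiguous. The following is only an illustrative sketch: the helper name and the input with a repeated Key2 are made up for the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyVsReluctant {
    // Returns group 1 of a full-string match, or null if the regex does not match.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "Key1=a,Key2=b,Key2=c";
        // Greedy: grabs everything, then backs off to the LAST ",Key2=".
        System.out.println(firstGroup("Key1=(.*),Key2=(.*)", input));   // a,Key2=b
        // Reluctant: grows from nothing and stops at the FIRST ",Key2=".
        System.out.println(firstGroup("Key1=(.*?),Key2=(.*?)", input)); // a
    }
}
```

On well-formed input where each key appears once, both variants capture the same values; only the amount of work the engine does differs.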

I'm not going to say that there's no regex that will work for this, but it's most likely more complicated to write (and more importantly, read, for the next person that has to deal with the code) than it's worth. The closest I'm able to get with a regex is if you append a terminal comma to the string you're matching, i.e, instead of:
"Key1=value1,Key2=value2"
you would append a comma so it's:
"Key1=value1,Key2=value2,"
Then, the regex that got me the closest is: "(?:(\\w+?)=(\\S+?),)?+"...but this doesn't quite work if the values have commas, though.
You can try to continue tweaking that regex from there, but the problem I found is that there's a conflict in the behavior between greedy and reluctant quantifiers. You'd have to specify a capturing group for the value that is greedy with respect to commas up to the last comma prior to a non-capturing group comprised of word characters followed by the equal sign (the next key)...and this last non-capturing group would have to be optional in case you're matching the last value in the sequence, and maybe itself reluctant. Complicated.
Instead, my advice is just to split the string on "=". You can get away with this because presumably the values aren't allowed to contain the equal sign character.
Now you'll have a bunch of substrings, each of which consists of the characters of a value, then the last comma in the substring, followed by the next key. You can easily find the last comma in each substring using String.lastIndexOf(',').
Treat the first and last substrings specially (because the first one does not have a prepended value and the last one has no appended key) and you should be in business.
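The split-on-"=" idea described above could be sketched roughly like this. The class and method names are hypothetical, and the sketch assumes, as stated above, that values never contain an equal sign (they may contain commas):

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class KeyValueSplitter {
    static Map<String, String> parse(String data) {
        Map<String, String> result = new LinkedHashMap<>();
        // Limit -1 keeps a trailing empty piece, so an empty last value survives.
        String[] parts = data.split("=", -1);
        if (parts.length < 2) {
            return result; // no '=' at all, nothing to parse
        }
        String key = parts[0]; // the first piece is just the first key
        for (int i = 1; i < parts.length - 1; i++) {
            // Middle pieces look like "<value>,<nextKey>"; cut at the last comma.
            int comma = parts[i].lastIndexOf(',');
            if (comma < 0) {
                throw new IllegalArgumentException("No ',' before next key in: " + parts[i]);
            }
            result.put(key, parts[i].substring(0, comma));
            key = parts[i].substring(comma + 1);
        }
        result.put(key, parts[parts.length - 1]); // the last piece is just the last value
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("Key1=a,b,Key2=c")); // {Key1=a,b, Key2=c}
    }
}
```

A LinkedHashMap is used here only so the pairs print in input order.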

If you know you always have 7, the hack-of-least resistance is
^Key1=(.+),Key2=(.+),Key3=(.+),Key4=(.+),Key5=(.+),Key6=(.+),Key7=(.+)$
Try it out at http://www.fileformat.info/tool/regex.htm
I'm pretty sure that there is a better way to parse this thing down that goes through .find() rather than .matches() which I think I would recommend as it allows you to move down the string one key=value pair at a time. It moves you into the whole "greedy" evaluation discussion.
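One way the find()-based approach might look is sketched below. The per-pair pattern is an assumption, not something from the question: it treats a key as a run of word characters and a value as anything up to the next comma, so it only fits inputs whose values contain no commas.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PairScanner {
    // Assumed shape of one pair: word characters, '=', then everything up to the next comma.
    private static final Pattern PAIR = Pattern.compile("(\\w+)=([^,]*)");

    static Map<String, String> scan(String data) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(data);
        // Each successful find() advances to the next key=value pair.
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(scan("Key1=a,Key2=b,Key3=")); // {Key1=a, Key2=b, Key3=}
    }
}
```

Since each match is short and cannot run past a comma, there is no long-range backtracking to worry about.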

Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems. - Jamie Zawinski
The simplest solution is the most robust.
final String data = "Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value";
final String[] pairs = data.split(",");
for (final String pair: pairs)
{
final String[] keyValue = pair.split("=");
final String key = keyValue[0];
final String value = keyValue[1];
}

Java - Unknown characters passing as [a-zA-z0-9]*?

I'm no expert in regex but I need to parse some input I have no control over, and make sure I filter away any strings that don't have A-z and/or 0-9.
When I run this,
Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); //fixed typo
if(!p.matcher(gottenData).matches())
System.out.println(someData); //someData contains gottenData
certain spaces + an unknown symbol somehow slip through the filter (gottenData is the red rectangle):
In case you're wondering, it DOES also display Text, it's not all like that.
For now, I don't mind the [?] as long as it also contains some string along with it.
Please help.
[EDIT] as far as I can tell from the (very large) input, the [?]'s are either white spaces either nothing at all; maybe there's some sort of encoding issue, also perhaps something to do with #text nodes (input is xml)

The * quantifier matches "zero or more", which means it will also match an empty string, one that does not contain any of the characters in your class. Try the + quantifier, which means "one or more": ^[a-zA-Z0-9]+$ will match strings made up of alphanumeric characters only. ^.*[a-zA-Z0-9]+.*$ will match any string containing one or more alphanumeric characters, although the leading .* will make it slower. If you use Matcher.find() instead of Matcher.matches(), it will not require a full string match and you can use the regex [a-zA-Z0-9]+ (Matcher.lookingAt() also drops the full-match requirement, but it still has to match at the start of the input).

You have an error in your regex: instead of [a-zA-z0-9]* it should be [a-zA-Z0-9]*.
You don't need ^ and $ around the regex.
Matcher.matches() always matches the complete string.
String gottenData = "a ";
Pattern p = Pattern.compile("[a-zA-Z0-9]*");
if (!p.matcher(gottenData).matches())
System.out.println("doesn't match.");
this prints "doesn't match."

The correct answer is a combination of the above answers. First, I imagine your intended character class is [a-zA-Z0-9]. Note that A-z isn't as bad as you might think: it includes all characters in the ASCII range between A and z, which is the letters plus a few extras (specifically [, \, ], ^, _ and `).
A second potential problem, as Martin mentioned, is that you may need to put in the start and end anchors if you want the string to consist only of letters and numbers (with matches() they are implied).
Finally, you use the * quantifier, which means 0 or more, so the pattern can match 0 characters and matches() will return true for an empty string. What you need is the + quantifier. So I will submit that the pattern you are most likely looking for is:
^[a-zA-Z0-9]+$
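A short sketch of how the points above play out in Java. The class and method names are invented for the example, and the anchors are left out because matches() already requires the whole string to match:

```java
import java.util.regex.Pattern;

final class AlnumCheck {
    private static final Pattern ALNUM = Pattern.compile("[a-zA-Z0-9]+");

    // True only for non-empty strings made up entirely of ASCII letters and digits.
    static boolean isAlphanumeric(String s) {
        return ALNUM.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAlphanumeric("riz99")); // true
        System.out.println(isAlphanumeric("a "));    // false (contains a space)
        System.out.println(isAlphanumeric(""));      // false (+ needs at least one character)
        // With *, the empty string passes:
        System.out.println(Pattern.matches("[a-zA-Z0-9]*", "")); // true
        // With the A-z typo, '_' sneaks in because it sits between 'Z' and 'a' in ASCII:
        System.out.println(Pattern.matches("[a-zA-z0-9]+", "a_b")); // true
    }
}
```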

You have to change the regexp to "^[a-zA-Z0-9]*$" to ensure that you are matching the entire string

Looks like it should be "a-zA-Z0-9", not "a-zA-z0-9", try correcting that...

Did anyone consider adding a space to the regex: [a-zA-Z0-9 ]*? This should match any normal text with letters, numbers and spaces. If you want quotes and other special chars, add them to the regex too.
You can quickly test your regex at http://www.regexplanet.com/simple/

You can check whether the input value contains only letters and numbers by using the regex ^[a-zA-Z0-9]*$
If your value contains only letters and digits, it shows "match", e.g. riz99, riz99z
Otherwise it shows "not match", e.g. 99z., riz99.z, riz99.9
Example code:
if(e.target.value.match('^[a-zA-Z0-9]*$')){
console.log('match')
}
else{
console.log('not match')
}

Problem replacing words using [^a-zA-Z] regex

Just could not get this one and googling did not help much either..
First something that I know: Given a string and a regex, how to replace all the occurrences of strings that matches this regular expression by a replacement string ? Use the replaceAll() method in the String class.
Now something that I am unable to do. The regex I have in my code now is [^a-zA-Z] and I know for sure that this regex is definitely going to have a range. Only some more characters might be added to the list. What I need as output in the code below is Worksheet+blah but what I get using replaceAll() is Worksheet++++blah
String homeworkTitle = "Worksheet%#5_blah";
String unwantedCharactersRegex = "[^a-zA-Z]";
String replacementString = "+";
homeworkTitle = homeworkTitle.replaceAll(unwantedCharactersRegex,replacementString);
System.out.println(homeworkTitle);
What is the way to achieve the output that I wish for? Are there any Java methods that I am missing here?

[^a-zA-Z]+
Will do it nicely.
You just need a greedy quantifier in order to match as many non-alphabetical characters as you can, and replace the whole match with a single '+' (the + quantifier is greedy by default).
Note: [^a-zA-Z]+? would make the '+' quantifier lazy, and would have given you the same result as [^a-zA-Z], since it would have matched only one non-alphabetical character at a time.

String unwantedCharactersRegex = "[^a-zA-Z]";
This matches a single non-letter. So each single non-letter is replaced by a +. You need to say "one or more", so try
String unwantedCharactersRegex = "[^a-zA-Z]+";
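A quick sketch comparing the two patterns on the title from the question (the class and method names are made up for the example):

```java
final class TitleCleaner {
    // Replaces every match of the given regex with a single "+".
    static String clean(String title, String unwantedRegex) {
        return title.replaceAll(unwantedRegex, "+");
    }

    public static void main(String[] args) {
        String title = "Worksheet%#5_blah";
        // One replacement per unwanted character:
        System.out.println(clean(title, "[^a-zA-Z]"));  // Worksheet++++blah
        // One replacement per run of unwanted characters:
        System.out.println(clean(title, "[^a-zA-Z]+")); // Worksheet+blah
    }
}
```

The "+" in the replacement needs no escaping, since only $ and \ have special meaning in a replaceAll replacement string.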
