Using Regexp in Java to remove some text - java

It may be a simple question, but I tried a lot of regexp combinations and it is still not working. My problem is: I have words like Test=move or Testing=move.
I would like to remove the text 'Test=' or 'Testing='. In other words, I need only the 'move' text after the '='. What is the best way to do that in Java? Thanks.

I think that for this problem, split(String regex) is better suited:
String str = "Test=move";
System.out.println(str.split("=")[1]);

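I would replace \w+= with "" to get rid of any word preceding an equals sign. In a Java string literal the backslash has to be doubled, and the result has to be assigned because strings are immutable:
myString = myString.replaceAll("\\w+=", "");
If the text before the equals sign has more than just word characters, you can list the extra characters in a character class:
myString = myString.replaceAll("[\\w.-]+=", "");
This will remove any run of letters, digits, underscores, hyphens and periods that is followed by an equals sign.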
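A minimal sketch comparing the approaches above, assuming the input has the shape key=value. The class and method names are made up for illustration and do not come from the original posts.

```java
// Hypothetical demo class; names are illustrative only.
final class AfterEqualsDemo {

    // split with a limit of 2 keeps any later '=' inside the value.
    static String viaSplit(String s) {
        String[] parts = s.split("=", 2);
        return parts.length == 2 ? parts[1] : s;
    }

    // replaceFirst drops the leading key; note the doubled backslash.
    static String viaRegex(String s) {
        return s.replaceFirst("^[\\w.-]+=", "");
    }

    // Plain indexOf/substring, no regex involved.
    static String viaIndexOf(String s) {
        int i = s.indexOf('=');
        return i < 0 ? s : s.substring(i + 1);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Test=move", "Testing=move"}) {
            // Each line prints: move move move
            System.out.println(viaSplit(s) + " " + viaRegex(s) + " " + viaIndexOf(s));
        }
    }
}
```

All three return the input unchanged when there is no '=' at all, which avoids the ArrayIndexOutOfBoundsException that split("=")[1] would throw in that case.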

Related

One expression to ignore special symbols, numbers, and spacing? Java

What I have so far is:
userSent = userSent.replaceAll("\\s+", ""); //replaces all spaces with no spacing
userSent = userSent.replaceAll("[^a-zA-Z0-9]", ""); //remove all special characters
userSent = userSent.replaceAll("[0-9]", ""); //remove all numbers from string
I would like to simplify this to one expression if possible using Java, thanks a bunch.
Those three replaces are equivalent to this:
userSent = userSent.replaceAll("[^a-zA-Z]", "");
(The set of characters that are "not a letter" includes numbers and spaces.)
However, I suspect that this is not what you actually want because it removes every character that is not in the Latin alphabet and mashes them all into a single "word". Is that really what you want? (To my mind, it doesn't match the problem description in your Question's title.)
My advice would be to make sure that the 3 replaces do what you actually want (in other words ... test them in combination) before you try to combine them into a single regex.
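One way to follow that advice is to run both versions on the same sample and compare. This is only a sketch: the class name, method names and sample text are invented for the example.

```java
// Hypothetical demo class; names and sample input are illustrative only.
final class StripNonLettersDemo {

    // The three replaces from the question, applied in order.
    static String chained(String s) {
        s = s.replaceAll("\\s+", "");          // remove whitespace
        s = s.replaceAll("[^a-zA-Z0-9]", "");  // remove special characters
        return s.replaceAll("[0-9]", "");      // remove digits
    }

    // The single replace suggested in the answer.
    static String single(String s) {
        return s.replaceAll("[^a-zA-Z]", "");
    }

    public static void main(String[] args) {
        String sample = "Hi there, user #42!";
        System.out.println(chained(sample)); // Hithereuser
        System.out.println(single(sample));  // Hithereuser
    }
}
```

The output also shows the "mashed into a single word" effect mentioned above, since the spaces are removed along with everything else.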

How to use string.replaceAll to change everything after a certain word

I have the following string: http://localhost:somePort/abc/soap/1.0
I want the string to just look like this: http://localhost:somePort/abc.
I want to use string.replaceAll but can't seem to get the regex right. My code looks like this: someString.replaceAll(".*\\babc\\b.*", "abc");
I'm wondering what I'm missing? I don't want to split the string or use .replaceFirst, as many solutions suggest.
It would seem to make more sense to use substring, but if you must use replaceAll, here's a way to do it.
You want to replace /abc and everything after it with just /abc.
string = string.replaceAll("/abc.*", "/abc");
If you want to be more discriminating, you can include a word boundary after abc, giving you
string = string.replaceAll("/abc\\b.*", "/abc");
Just to explain why the given regex won't work:
.*\\babc\\b.* - the word boundaries are not the problem here. Because .* appears at both ends, the pattern matches the whole string, so replacing the match with "abc" replaces the entire string with "abc". Hence you get the wrong answer. Instead, match only the part you want to change; whatever is matched is then replaced with the replacement string.
someString.replaceAll("/abc.*", "/abc");
/abc.* - Looks specifically for /abc followed by 0 or more characters
/abc - Replaces the above match with /abc
You should use replaceFirst, since the first match already removes everything after it:
text = text.replaceFirst("/abc.*", "/abc");
Or
you can use indexOf to get the index of a certain word and then take a substring.
String findWord = "abc";
text = text.substring(0, text.indexOf(findWord) + findWord.length());
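For comparison, here is a small sketch of the regex and the indexOf approaches side by side, using the URL from the question. The class and method names are hypothetical, and the -1 check is an addition: the indexOf snippet above would return the wrong prefix if the word were missing.

```java
// Hypothetical demo class; names are illustrative only.
final class TruncateAfterWordDemo {

    // Regex version: keep "/abc", drop everything after it.
    static String viaReplaceAll(String url) {
        return url.replaceAll("/abc\\b.*", "/abc");
    }

    // indexOf version with a guard for the "word not found" case.
    static String viaIndexOf(String url, String word) {
        int i = url.indexOf(word);
        return i < 0 ? url : url.substring(0, i + word.length());
    }

    public static void main(String[] args) {
        String url = "http://localhost:somePort/abc/soap/1.0";
        System.out.println(viaReplaceAll(url));      // http://localhost:somePort/abc
        System.out.println(viaIndexOf(url, "/abc")); // http://localhost:somePort/abc
    }
}
```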

replacing string with regex in java

I think I have a decent handle wrt matching strings using Regex in Java, but now I am trying to replace strings using Regex and not having much success.
Simply put, I am trying to find where there is a digit immediately followed by a constant string "CMR", then adding a space between the digit and the "CMR" substring. "0CMR" should become "0 CMR", "5CMR" should become "5 CMR", etc. Any preceding non-digit should be left as it was.
So my source string is "theStringThat0CMRhas"
my command is:
replaceAll("[0-9]CMR", "[0-9] CMR");
I get the added space in the result, but the result becomes "theStringThat[0-9] CMRhas" which obviously isn't what I need. Somehow I need to tell Regex not to replace with "[0-9]", but with whatever it matched on in the first place.
I know I'm doing this wrong, but I don't know what's right.
Any help appreciated.
Thanks,
Tom
You want to use a capturing group:
replaceAll("([0-9])CMR", "$1 CMR")
$1 references the first group in the match, denoted by parentheses.
Also, [0-9] can be substituted with \d (written as \\d inside a Java string literal).
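A self-contained sketch of the capturing-group approach, here with a precompiled Pattern. The class, constant and method names are made up for the example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class DigitCmrDemo {

    // Group 1 captures the single digit in front of "CMR".
    private static final Pattern DIGIT_CMR = Pattern.compile("(\\d)CMR");

    // $1 in the replacement puts the captured digit back.
    static String addSpace(String s) {
        return DIGIT_CMR.matcher(s).replaceAll("$1 CMR");
    }

    public static void main(String[] args) {
        System.out.println(addSpace("theStringThat0CMRhas")); // theStringThat0 CMRhas
        System.out.println(addSpace("5CMR"));                 // 5 CMR
        System.out.println(addSpace("xCMR"));                 // xCMR (no digit, unchanged)
    }
}
```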
Try this:
replaceAll("(?<=\\d)(?=\\D)"," ")
It uses a lookbehind for a digit and a lookahead for a non-digit character, so it matches the empty position between the two and inserts a space there.
If you only want to do this where CMR follows the digit, use:
"(?<=\\d)(?=CMR)"
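A short sketch of the two lookaround patterns, to show how they differ. Class and method names are invented for the example.

```java
// Hypothetical demo class; names are illustrative only.
final class LookaroundSpaceDemo {

    // Zero-width match between a digit and the literal "CMR".
    static String beforeCmr(String s) {
        return s.replaceAll("(?<=\\d)(?=CMR)", " ");
    }

    // Zero-width match between a digit and any non-digit character.
    static String beforeAnyNonDigit(String s) {
        return s.replaceAll("(?<=\\d)(?=\\D)", " ");
    }

    public static void main(String[] args) {
        System.out.println(beforeCmr("theStringThat0CMRhas")); // theStringThat0 CMRhas
        System.out.println(beforeAnyNonDigit("a1b22c"));       // a1 b22 c
        System.out.println(beforeCmr("a1b22c"));               // a1b22c
    }
}
```

Because both patterns match an empty position, nothing is consumed and no group reference is needed in the replacement.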
You should put the digit pattern in a capturing group and refer to that group in the replacement argument. Your code then becomes:
replaceAll("([0-9])CMR", "$1 CMR");
For more regex knowledge, please read this document
https://www.tutorialspoint.com/java/java_regular_expressions.htm
Good luck!
A good starting point for reading about regex may be here: http://www.regular-expressions.info/java.html
On this site, the page about replacement strings is here: http://www.regular-expressions.info/replacetutorial.html
In the replacement string, $0 represents the whole regex match and $1, $2, ... represent the capturing groups, and you can use these to refer to what was matched:
String testString = "theStringThat0CMRhas";
String resultString = testString.replaceAll("[0-9]CMR","$0");
System.out.println(resultString);
This would result in: theStringThat0CMRhas (unchanged, because each match is replaced with itself).
You obviously didn't want this, so let's change the replacement a little:
String testString = "theStringThat0CMRhas";
String resultString = testString.replaceAll("([0-9])CMR","$0 CMR");
System.out.println(resultString);
Now the pattern has parentheses, but $0 still refers to the whole match, so the group is not used yet: each match is replaced with itself, a space, and CMR.
Your result would now be: theStringThat0CMR CMRhas
So let's reference the group that captured the digit:
String testString = "theStringThat0CMRhas";
String resultString = testString.replaceAll("([0-9])CMR","$1 CMR");
System.out.println(resultString);
Now your answer will be: theStringThat0 CMRhas
It finds a digit followed by CMR and replaces the match with that digit, a space, and then CMR.
I believe what you are trying to do is called a backreference, though I am unsure. Regex is still not my strong suit either.

Java - Unknown characters passing as [a-zA-z0-9]*?

I'm no expert in regex but I need to parse some input I have no control over, and make sure I filter away any strings that don't have A-z and/or 0-9.
When I run this,
Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); //fixed typo
if (!p.matcher(gottenData).matches())
    System.out.println(someData); //someData contains gottenData
certain spaces + an unknown symbol somehow slip through the filter (gottenData is the red rectangle):
In case you're wondering, it DOES also display Text, it's not all like that.
For now, I don't mind the [?] as long as it also contains some string along with it.
Please help.
[EDIT] As far as I can tell from the (very large) input, the [?]'s are either white space or nothing at all; maybe there's some sort of encoding issue, or perhaps it has something to do with #text nodes (the input is XML).
The * quantifier matches "zero or more", which means it will also match a string that does not contain any of the characters in your class, namely the empty string. Try the + quantifier, which means "one or more": ^[a-zA-Z0-9]+$ will match strings made up of alphanumeric characters only. ^.*[a-zA-Z0-9]+.*$ will match any string containing one or more alphanumeric characters, although the leading .* will make it much slower. If you use Matcher.find() instead of Matcher.matches(), it will not require a full string match and you can use the regex [a-zA-Z0-9]+. (Matcher.lookingAt() does not require a full match either, but it still anchors the match at the start of the input.)
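The difference between the three Matcher methods is easy to check with one pattern. The sketch below uses invented class and method names; only matches(), find() and lookingAt() are real java.util.regex API.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class AlnumCheckDemo {

    private static final Pattern ALNUM = Pattern.compile("[a-zA-Z0-9]+");

    // matches(): the whole input must be alphanumeric.
    static boolean isAllAlnum(String s) {
        return ALNUM.matcher(s).matches();
    }

    // find(): an alphanumeric run may occur anywhere in the input.
    static boolean containsAlnum(String s) {
        return ALNUM.matcher(s).find();
    }

    // lookingAt(): the input must start with an alphanumeric run.
    static boolean startsWithAlnum(String s) {
        return ALNUM.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(isAllAlnum("abc123"));  // true
        System.out.println(isAllAlnum(""));        // false, + needs at least one character
        System.out.println(containsAlnum(" a "));  // true
        System.out.println(startsWithAlnum(" a")); // false
    }
}
```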
You have an error in your regex: instead of [a-zA-z0-9]* it should be [a-zA-Z0-9]*.
You don't need ^ and $ around the regex.
Matcher.matches() always matches the complete string.
String gottenData = "a ";
Pattern p = Pattern.compile("[a-zA-Z0-9]*");
if (!p.matcher(gottenData).matches())
    System.out.println("doesn't match.");
This prints "doesn't match."
The correct answer is a combination of the above answers. First, I imagine your intended character class is [a-zA-Z0-9]. Note that A-z isn't as bad as you might think: it includes all characters in the ASCII range between A and z, which is the letters plus a few extras (specifically [, \, ], ^, _ and `).
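To see the effect of the A-z range concretely, here is a small sketch that runs the mistyped and the intended class against the same inputs. The class and method names are made up for the example.

```java
// Hypothetical demo class; names are illustrative only.
final class RangeTypoDemo {

    // A-z also covers [ \ ] ^ _ ` which sit between 'Z' and 'a' in ASCII.
    static boolean typoClass(String s) {
        return s.matches("[a-zA-z0-9]+");
    }

    // The intended class: letters and digits only.
    static boolean fixedClass(String s) {
        return s.matches("[a-zA-Z0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(typoClass("foo_bar"));  // true
        System.out.println(fixedClass("foo_bar")); // false
        System.out.println(typoClass("a b"));      // false, space is outside both ranges
    }
}
```

So the typo lets a few punctuation characters through, but it does not explain whitespace passing the filter.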
A second potential problem, as Martin mentioned, is that you may need to add the start and end anchors if you want the string to consist only of letters and numbers.
Finally, you use the * quantifier, which means 0 or more, so the pattern can match 0 characters: matches() returns true for an empty string, and find() would succeed on any input. What you need is the + quantifier. So I will submit that the pattern you are most likely looking for is:
^[a-zA-Z0-9]+$
You have to change the regexp to "^[a-zA-Z0-9]*$" to ensure that you are matching the entire string
Looks like it should be "a-zA-Z0-9", not "a-zA-z0-9", try correcting that...
Did anyone consider adding a space to the regex: [a-zA-Z0-9 ]*? This should match any normal text made of letters, numbers and spaces. If you want quotes and other special characters, add them to the regex too.
You can quickly test your regex at http://www.regexplanet.com/simple/
You can check whether the input value contains only letters and digits by using the regex ^[a-zA-Z0-9]*$
If the value contains only letters and digits, it matches, e.g. riz99, riz99z
Otherwise it does not match, e.g. 99z., riz99.z, riz99.9
Example code (JavaScript):
if (e.target.value.match('^[a-zA-Z0-9]*$')) {
    console.log('match')
} else {
    console.log('not match')
}

Problem replacing words using [^a-zA-Z] regex

Just could not get this one and googling did not help much either..
First something that I know: Given a string and a regex, how to replace all the occurrences of strings that matches this regular expression by a replacement string ? Use the replaceAll() method in the String class.
Now something that I am unable to do. The regex I have in my code now is [^a-zA-Z] and I know for sure that this regex is definitely going to have a range. Only some more characters might be added to the list. What I need as output in the code below is Worksheet+blah but what I get using replaceAll() is Worksheet++++blah
String homeworkTitle = "Worksheet%#5_blah";
String unwantedCharactersRegex = "[^a-zA-Z]";
String replacementString = "+";
homeworkTitle = homeworkTitle.replaceAll(unwantedCharactersRegex,replacementString);
System.out.println(homeworkTitle);
What is the way to achieve the output that I wish for? Are there any Java methods that I am missing here?
[^a-zA-Z]+
will do it nicely.
You just need a greedy quantifier so that each match takes in as many consecutive non-alphabetical characters as it can, and the whole match is replaced by a single '+' (the + quantifier is greedy by default).
Note: [^a-zA-Z]+? would make the + quantifier lazy and would give you the same result as [^a-zA-Z], since it would match only one non-alphabetical character at a time.
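The three variants can be compared directly on the string from the question. This is a sketch with invented class and method names.

```java
// Hypothetical demo class; names are illustrative only.
final class CollapseRunsDemo {

    // Greedy +: a whole run of non-letters becomes one "+".
    static String collapse(String s) {
        return s.replaceAll("[^a-zA-Z]+", "+");
    }

    // No quantifier: every non-letter becomes its own "+".
    static String perChar(String s) {
        return s.replaceAll("[^a-zA-Z]", "+");
    }

    // Lazy +?: matches one character at a time, same result as perChar.
    static String lazy(String s) {
        return s.replaceAll("[^a-zA-Z]+?", "+");
    }

    public static void main(String[] args) {
        String title = "Worksheet%#5_blah";
        System.out.println(collapse(title)); // Worksheet+blah
        System.out.println(perChar(title));  // Worksheet++++blah
        System.out.println(lazy(title));     // Worksheet++++blah
    }
}
```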
String unwantedCharactersRegex = "[^a-zA-Z]";
This matches a single non-letter, so each single non-letter is replaced by a +. You need to say "one or more", so try
String unwantedCharactersRegex = "[^a-zA-Z]+";
