Problem replacing words using [^a-zA-Z] regex

Problem replacing words using [^a-zA-Z] regex - java

Just could not get this one and googling did not help much either..
First something that I know: Given a string and a regex, how to replace all the occurrences of strings that matches this regular expression by a replacement string ? Use the replaceAll() method in the String class.
Now something that I am unable to do. The regex I have in my code now is [^a-zA-Z] and I know for sure that this regex is definitely going to have a range. Only some more characters might be added to the list. What I need as output in the code below is Worksheet+blah but what I get using replaceAll() is Worksheet++++blah
String homeworkTitle = "Worksheet%#5_blah";
String unwantedCharactersRegex = "[^a-zA-Z]";
String replacementString = "+";
homeworkTitle = homeworkTitle.replaceAll(unwantedCharactersRegex,replacementString);
System.out.println(homeworkTitle);
What is the way to achieve the output that I wish for? Are there any Java methods that I am missing here?

[^a-zA-Z]+
Will do it nicely.
You just need a greedy quantifier in order to match as many non-alphabetical characters you can, and replace the all match by one '+' (a - by default - greedy quantifier)
Note: [^a-zA-Z]+? would make the '+' quantifier lazy, and would have give you the same result than [^a-zA-Z], since it would only have matched only one non-alphabetical character at a time.

String unwantedCharactersRegex = "[^a-zA-Z]"
This matches a single non-letter. So each single non-letter is replaced by a +. You need to say "one or more", so try
String unwantedCharactersRegex = "[^a-zA-Z]+"

Related

How to use string.replaceAll to change everything after a certain word

I have the following string: http://localhost:somePort/abc/soap/1.0
I want the string to just look like this: http://localhost:somePort/abc.
I want to use string.replaceAll but can't seem to get the regex right. My code looks like this: someString.replaceAll(".*\\babc\\b.*", "abc");
I'm wondering what I'm missing? I don't want to split the string or use .replaceFirst, as many solutions suggest.

It would seem to make more sense to use substring, but if you must use replaceAll, here's a way to do it.
You want to replace /abc and everything after it with just /abc.
string = string.replaceAll("/abc.*", "/abc")
If you want to be more discriminating you can include a word boundary after abc, giving you
string = string.replaceAll("/abc\\b.*", "/abc")

Just for explanation on the given regex, why it wont work:
\b \b - word boundaries are not required here and also as .* is added in the beginning it matches the whole string and when you try to replace it with "abc" it will replace the entire match with "abc". Hence you get the wrong answer. Instead, only try to match what is required and then whatever is matched that will be replaced with "abc" string.
someString.replaceAll("/abc.*", "/abc");
/abc.* - Looks specifically for /abc followed by 0 or more characters
/abc - Replaces the above match with /abc

You should use replaceFirst since after first match you are removing all after
text= text.replaceFirst("/abc.*", "/abc");
Or
You can use indexOf to get the index of certain word and then get substring.
String findWord = "abc";
text = text.substring(0, text.indexOf(findWord) + findWord.length());

Word that matches ^.(?=.\\d)(?=.[a-zA-Z])(?=.[!##$%^&]).*$

I am totally confused right now.
What is a word that matches: ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$
I tried at Regex 101 this 1Test#!. However that does not work.
I really appreciate your input!

What happens is that your regex seems to be in Java-flavor (Note the \\d)
that is why you have to convert it to work with regex101 which does not work with jave (only works with php, phyton, javascript)
see converted regex:
^.*(?=.*\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$
which will match your string 1Test#!. Demo here: http://regex101.com/r/gE3iQ9

You just want something that matches that regex?
Here:
a1a!

This pattern matches
\dTest#!
if u want a pattern which matches 1Test#! try this pattern
^.(?=.\d)(?=.[a-zA-Z])(?=.[!##$%^&]).*$

Your java string ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$ encodes the regexp expression ^.*(?=.*\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$.
This is because the \ is an escape sequence.
The latter matches the string you specified.
If your original string was a regexp, rather than a java string, it would match strings such as \dTest#!
Also you should consider removing the first .*, doing so would make the regexp more efficient. The reason is that regexp's by default are greedy. So it will start by matching the whole string to the initial .*, the lookahead will then fail. The regexp will backtrack, matchine the first .* to all but the last character, and will fail all but one of the loohaheads. This will proceed until it hits a point where the different lookaheads succeed. Dropping the first .*, putting the lookahead immidiately after the start of string anchor, will avoid this problem, and in this case the set of strings matched will be the same.

regex help in java

I'm trying to compare following strings with regex:
#[xyz="1","2"'"4"] ------- valid
#[xyz] ------------- valid
#[xyz="a5","4r"'"8dsa"] -- valid
#[xyz="asd"] -- invalid
#[xyz"asd"] --- invalid
#[xyz="8s"'"4"] - invalid
The valid pattern should be:
#[xyz then = sign then some chars then , then some chars then ' then some chars and finally ]. This means if there is characters after xyz then they must be in format ="XXX","XXX"'"XXX".
Or only #[xyz]. No character after xyz.
I have tried following regex, but it did not worked:
String regex = "#[xyz=\"[a-zA-z][0-9]\",\"[a-zA-z][0-9]\"'\"[a-zA-z][0-9]\"]";
Here the quotations (in part after xyz) are optional and number of characters between quotes are also not fixed and there could also be some characters before and after this pattern like asdadad #[xyz] adadad.

You can use the regex:
#\[xyz(?:="[a-zA-z0-9]+","[a-zA-z0-9]+"'"[a-zA-z0-9]+")?\]
See it
Expressed as Java string it'll be:
String regex = "#\\[xyz=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\"\\]";
What was wrong with your regex?
[...] defines a character class. When you want to match literal [ and ] you need to escape it by preceding with a \.
[a-zA-z][0-9] match a single letter followed by a single digit. But you want one or more alphanumeric characters. So you need [a-zA-Z0-9]+

Use this:
String regex = "#\\[xyz(=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")?\\]";
When you write [a-zA-z][0-9] it expects a letter character and a digit after it. And you also have to escape first and last square braces because square braces have special meaning in regexes.
Explanation:
[a-zA-z0-9]+ means alphanumeric character (but not an underline) one or more times.
(=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")? means that expression in parentheses can be one time or not at all.

Since square brackets have a special meaning in regex, you used it by yourself, they define character classes, you need to escape them if you want to match them literally.
String regex = "#\\[xyz=\"[a-zA-z][0-9]\",\"[a-zA-z][0-9]\"'\"[a-zA-z][0-9]\"\\]";
The next problem is with '"[a-zA-z][0-9]' you define "first a letter, second a digit", you need to join those classes and add a quantifier:
String regex = "#\\[xyz=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\"\\]";
See it here on Regexr

there could also be some characters before and after this pattern like
asdadad #[xyz] adadad.
Regex should be:
String regex = "(.)*#\\[xyz(=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")?\\](.)*";
The First and last (.)* will allow any string before the pattern as you have mentioned in your edit. As said by #ademiban this (=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")? will come one time or not at all. Other mistakes are also very well explained by Others +1 to all other.

regex to find substring between special characters

I am running into this problem in Java.
I have data strings that contain entities enclosed between & and ; For e.g.
&Text.ABC;, &Links.InsertSomething;
These entities can be anything from the ini file we have.
I need to find these string in the input string and remove them. There can be none, one or more occurrences of these entities in the input string.
I am trying to use regex to pattern match and failing.
Can anyone suggest the regex for this problem?
Thanks!

Here is the regex:
"&[A-Za-z]+(\\.[A-Za-z]+)*;"
It starts by matching the character &, followed by one or more letters (both uppercase and lower case) ([A-Za-z]+). Then it matches a dot followed by one or more letters (\\.[A-Za-z]+). There can be any number of this, including zero. Finally, it matches the ; character.
You can use this regex in java like this:
Pattern p = Pattern.compile("&[A-Za-z]+(\\.[A-Za-z]+)*;"); // java.util.regex.Pattern
String subject = "foo &Bar; baz\n";
String result = p.matcher(subject).replaceAll("");
Or just
"foo &Bar; baz\n".replaceAll("&[A-Za-z]+(\\.[A-Za-z]+)*;", "");
If you want to remove whitespaces after the matched tokens, you can use this re:
"&[A-Za-z]+(\\.[A-Za-z]+)*;\\s*" // the "\\s*" matches any number of whitespace

And there is a nice online regular expression tester which uses the java regexp library.
http://www.regexplanet.com/simple/index.html

You can try:
input=input.replaceAll("&[^.]+\\.[^;]+;(,\\s*&[^.]+\\.[^;]+;)*","");
See it

Java - Unknown characters passing as [a-zA-z0-9]*?

I'm no expert in regex but I need to parse some input I have no control over, and make sure I filter away any strings that don't have A-z and/or 0-9.
When I run this,
Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); //fixed typo
if(!p.matcher(gottenData).matches())
System.out.println(someData); //someData contains gottenData
certain spaces + an unknown symbol somehow slip through the filter (gottenData is the red rectangle):
In case you're wondering, it DOES also display Text, it's not all like that.
For now, I don't mind the [?] as long as it also contains some string along with it.
Please help.
[EDIT] as far as I can tell from the (very large) input, the [?]'s are either white spaces either nothing at all; maybe there's some sort of encoding issue, also perhaps something to do with #text nodes (input is xml)

The * quantifier matches "zero or more", which means it will match a string that does not contain any of the characters in your class. Try the + quantifier, which means "One or more": ^[a-zA-Z0-9]+$ will match strings made up of alphanumeric characters only. ^.*[a-zA-Z0-9]+.*$ will match any string containing one or more alphanumeric characters, although the leading .* will make it much slower. If you use Matcher.lookingAt() instead of Matcher.matches, it will not require a full string match and you can use the regex [a-zA-Z0-9]+.

You have an error in your regex: instead of [a-zA-z0-9]* it should be [a-zA-Z0-9]*.
You don't need ^ and $ around the regex.
Matcher.matches() always matches the complete string.
String gottenData = "a ";
Pattern p = Pattern.compile("[a-zA-z0-9]*");
if (!p.matcher(gottenData).matches())
System.out.println("doesn't match.");
this prints "doesn't match."

The correct answer is a combination of the above answers. First I imagine your intended character match is [a-zA-Z0-9]. Note that A-z isn't as bad as you might think it include all characters in the ASCII range between A and z, which is the letters plus a few extra (specifically [,\,],^,_,`).
A second potential problem as Martin mentioned is you may need to put in the start and end qualifiers, if you want the string to only consists of letters and numbers.
Finally you use the * operator which means 0 or more, therefore you can match 0 characters and matches will return true, so effectively your pattern will match any input. What you need is the + quantifier. So I will submit the pattern you are most likely looking for is:
^[a-zA-Z0-9]+$

You have to change the regexp to "^[a-zA-Z0-9]*$" to ensure that you are matching the entire string

Looks like it should be "a-zA-Z0-9", not "a-zA-z0-9", try correcting that...

Did anyone consider adding space to the regex [a-zA-Z0-9 ]*. this should match any normal text with chars, number and spaces. If you want quotes and other special chars add them to the regex too.
You can quickly test your regex at http://www.regexplanet.com/simple/

You can check input value is contained string and numbers? by using regex ^[a-zA-Z0-9]*$
if your value just contained numberString than its show match i.e, riz99, riz99z
else it will show not match i.e, 99z., riz99.z, riz99.9
Example code:
if(e.target.value.match('^[a-zA-Z0-9]*$')){
console.log('match')
}
else{
console.log('not match')
}
}
online working example

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Problem replacing words using [^a-zA-Z] regex - java

String unwantedCharactersRegex = "[^a-zA-Z]" This matches a single non-letter. So each single non-letter is replaced by a +. You need to say "one or more", so try String unwantedCharactersRegex = "[^a-zA-Z]+"

Related

How to use string.replaceAll to change everything after a certain word

Word that matches ^.(?=.\\d)(?=.[a-zA-Z])(?=.[!##$%^&]).*$

regex help in java

regex to find substring between special characters

Java - Unknown characters passing as [a-zA-z0-9]*?

Categories

Resources

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Problem replacing words using [^a-zA-Z] regex - java

String unwantedCharactersRegex = "[^a-zA-Z]" This matches a single non-letter. So each single non-letter is replaced by a +. You need to say "one or more", so try String unwantedCharactersRegex = "[^a-zA-Z]+"

Related

How to use string.replaceAll to change everything after a certain word

Word that matches ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$

regex help in java

regex to find substring between special characters

Java - Unknown characters passing as [a-zA-z0-9]*?

Categories

Resources

Word that matches ^.(?=.\\d)(?=.[a-zA-Z])(?=.[!##$%^&]).*$