Regex match one character one time in any combination - java

Read much questions and answers but got no idea how to solve my problem.
I have a String something like this:
23424(223)+32 -32
allowed in the full String is:
any number multiple times anywhere of the String
one time ) anywhere of the String
one time ( anywhere of the String
one time + anywhere of the String
multiple times spaces anywhere of the String
multiple times - anywhere of the String
For me the most problem is to find the one time character anywhere of the String. Hope you can help me.
This example String should not match. 23424(223)3+3)2 -32

You can use this regex with negative lookaheads:
^(?!.*\(.*\()(?!.*\).*\))(?!.*\+.*\+)[\d ()+-]+$
We are using 3 negative lookaheads:
(?!.*\(.*\() # Negative lookahead to disallow more than one (
(?!.*\).*\)) # Negative lookahead to disallow more than one )
(?!.*\+.*\+) # Negative lookahead to disallow more than one +
RegEx Demo
Reference: Lookarounds in regex
In Java use:
String regex = "^(?!.*\\(.*\\()(?!.*\\).*\\))(?!.*\\+.*\\+)[\\d ()+-]+$";

You can solve it without regex , just iterate over the string , and put each character in a map with this logic
if(map.get(str.charAt(i)) == null)
map.put(str.charAt(i),1)
else
map.put(str.charAt(i),map.get(str.charAt(i))+1)
when done with the loop, check each one time character in the map and see which one has more than one occurrence

Related

Java Regex with "Joker" characters

I try to have a regex validating an input field.
What i call "joker" chars are '?' and '*'.
Here is my java regex :
"^$|[^\\*\\s]{2,}|[^\\*\\s]{2,}[\\*\\?]|[^\\*\\s]{2,}[\\?]{1,}[^\\s\\*]*[\\*]{0,1}"
What I'm tying to match is :
Minimum 2 alpha-numeric characters (other than '?' and '*')
The '*' can only appears one time and at the end of the string
The '?' can appears multiple time
No WhiteSpace at all
So for example :
abcd = OK
?bcd = OK
ab?? = OK
ab*= OK
ab?* = OK
??cd = OK
*ab = NOT OK
??? = NOT OK
ab cd = NOT OK
abcd = Not OK (space at the begining)
I've made the regex a bit complicated and I'm lost can you help me?
^(?:\?*[a-zA-Z\d]\?*){2,}\*?$
Explanation:
The regex asserts that this pattern must appear twice or more:
\?*[a-zA-Z\d]\?*
which asserts that there must be one character in the class [a-zA-Z\d] with 0 to infinity questions marks on the left or right of it.
Then, the regex matches \*?, which means an 0 or 1 asterisk character, at the end of the string.
Demo
Here is an alternative regex that is faster, as revo suggested in the comments:
^(?:\?*[a-zA-Z\d]){2}[a-zA-Z\d?]*\*?$
Demo
Here you go:
^\?*\w{2,}\?*\*?(?<!\s)$
Both described at demonstrated at Regex101.
^ is a start of the String
\?* indicates any number of initial ? characters (must be escaped)
\w{2,} at least 2 alphanumeric characters
\?* continues with any number of and ? characters
\*? and optionally one last * character
(?<!\s) and the whole String must have not \s white character (using negative look-behind)
$ is an end of the String
Other way to solve this problem could be with look-ahead mechanism (?=subregex). It is zero-length (it resets regex cursor to position it was before executing subregex) so it lets regex engine do multiple tests on same text via construct
(?=condition1)
(?=condition2)
(?=...)
conditionN
Note: last condition (conditionN) is not placed in (?=...) to let regex engine move cursor after tested part (to "consume" it) and move on to testing other things after it. But to make it possible conditionN must match precisely that section which we want to "consume" (earlier conditions didn't have that limitation, they could match substrings of any length, like lets say few first characters).
So now we need to think about what are our conditions.
We want to match only alphanumeric characters, ?, * but * can appear (optionally) only at end. We can write it as ^[a-zA-Z0-9?]*[*]?$. This also handles non-whitespace characters because we didn't include them as potentially accepted characters.
Second requirement is to have "Minimum 2 alpha-numeric characters". It can be written as .*?[a-zA-Z0-9].*?[a-zA-Z0-9] or (?:.*?[a-zA-Z0-9]){2,} (if we like shorter regexes). Since that condition doesn't actually test whole text but only some part of it, we can place it in look-ahead mechanism.
Above conditions seem to cover all we wanted so we can combine them into regex which can look like:
^(?=(?:.*?[a-zA-Z0-9]){2,})[a-zA-Z0-9?]*[*]?$

Regex for multiple instances of character

In Java, using a regular expression, how would I check a string to see if it had a correct amount of instances of a character.
For example take the string hello.world.hello:world:. How could this string be checked to see if it contained two instances of a . or two instances of a :?
I have tried
Pattern p = Pattern.compile("[:]{2}");
Matcher m = p.matcher(hello.world.hello:world:);
m.find();
but that failed.
Edit
First I would like to say thank you for all the answers. I noticed a lot of the answers said something along the lines of "This means: zero or more non-colons, followed by a single colon, followed by zero or more non-colons - matched exactly twice". So if you were checking for 3 : in a string such as Hello::World: how would you do it?
Well, using matches you could use:
"([^:]*:[^:]*){2}"
This means: "zero or more non-colons, followed by a single colon, followed by zero or more non-colons - matched exactly twice".
Using find is not as good, as there may be additional : and it will just ignore them.
You can use this regex based on two lookaheads assertions:
^(?=(?:[^.]*\.){2}[^.]*$)(?=(?:[^:]*:){2}[^:]*$)
(?=(?:[^.]*\.){2}[^.]*$) makes sure there are exactly 2 DOTS and (?=(?:[^:]*:){2}[^:]*$) asserts that there are exactly 2 colons in input string.
RegEx Demo
You can determine whether the string has exectly the given number of a certain character, say ':', by attempting to match it against a pattern of this form:
^(?:[^:]*[:]){2}[^:]*$
That says exactly two non-capturing groups consisting of any number (including zero) of characters other than ':' followed by one colon, with the second group followed by any number of additional characters other than ':'.

How to remove duplicate characters in a string using regex?

I need to replace the duplicate characters in a string. I tried using
outputString = str.replaceAll("(.)(?=.*\\1)", "");
This replaces the duplicate characters but the position of the characters changes as shown below.
input
haih
output
aih
But I need to get an output hai. That is the order of the characters that appear in the string should not change. Given below are the expected outputs for some inputs.
input
aaaassssddddd
output
asd
input
cdddddggggeeccc
output
cdge
How can this be achieved?
It seems like your code is leaving the last character, so how about this?
outputString = new StringBuilder(str).reverse().toString();
// outputString is now hiah
outputString = outputString.replaceAll("(.)(?=.*\\1)", "");
// outputString is now iah
outputString = new StringBuilder(outputString).reverse().toString();
// outputString is now hai
Overview
It's possible with Oracle's implementation, but I wouldn't recommend this answer for many reasons:
It relies on a bug in the implementation, which interprets *, + or {n,} as {0, 0x7FFFFFFF}, {1, 0x7FFFFFFF}, {n, 0x7FFFFFFF} respectively, which allows the look-behind to contains such quantifiers. Since it relies on a bug, there is no guarantee that it will work similarly in the future.
It is unmaintainable mess. Writing normal code and any people who have some basic Java knowledge can read it, but using the regex in this answer limits the number of people who can understand the code at a glance to people who understand the in and out of regex implementation.
Therefore, this answer is for educational purpose, rather than something to be used in production code.
Solution
Here is the one-liner replaceAll regex solution:
String output = input.replaceAll("(.)(?=(.*))(?<=(?=\\1.*?\\1\\2$).+)","")
Printing out the regex:
(.)(?=(.*))(?<=(?=\1.*?\1\2$).+)
What we want to do is to look-behind to see whether the same character has appeared before or not. The capturing group (.) at the beginning captures the current character, and the look-behind group is there to check whether the character has appeared before. So far, so good.
However, since backreferences \1 doesn't have obvious length, it can't appear in the look-behind directly.
This is where we make use of the bug to look-behind up to the beginning of the string, then use a look-ahead inside the look-behind to include the backreference, as you can see (?<=(?=...).+).
This is not the end of the problem, though. While the non-assertion pattern inside look-behind .+ can't advance past the position after the character in (.), the look-ahead inside can. As a simple test:
"haaaaaaaaa".replaceAll("h(?<=(?=(.*)).*)","$1")
> "aaaaaaaaaaaaaaaaaa"
To make sure that the search doesn't spill beyond the current character, I capture the rest of the string in a look-ahead (?=(.*)) and use it to "mark" the current position (?=\\1.*?\\1\\2$).
Can this be done in one replacement without using look-behind?
I think it is impossible. We need to differentiate the first appearance of a character with subsequent appearance of the same character. While we can do this for one fixed character (e.g. a), the problem requires us to do so for all characters in the string.
For your information, this is for removing all subsequent appearance of a fixed character (h is used here):
.replaceAll("^([^h]*h[^h]*)|(?!^)\\Gh+([^h]*)","$1$2")
To do this for multiple characters, we must keep track of whether the character has appeared before or not, across matches and for all characters. The regex above shows the across matches part, but the other condition kinda makes this impossible.
We obviously can't do this in a single match, since subsequent occurrences can be non-contiguous and arbitrary in number.

Regular Expression to match one or more digits 1-9, one '|', one or more '*" and zero or more ','

I'm new to regular expressions and I need to find a regular expression that matches one or more digits [1-9] only ONE '|' sign, one or more '*' sign and zero or more ',' sign.
The string should not contain any other characters.
This is what I have:
if(this.ruleString.matches("^[1-9|*,]*$"))
{
return true;
}
Is it correct?
Thanks,
Vinay
I think you should test separately for every type of symbols rather than write complex expression.
First, test that you don't have invalid symbols - "^[0-9|*,]$"
Then test for digits "[1-9]", it should match at least one.
Then test for "\\|", "\\*" and "\\," and check the number of matches.
If all test are passed then your string is valid.
Nope, try this:
"^[1-9]+\\|\\*+,*$"
Please give us at least 10 possible matching strings of what you are looking to accept, and 10 of what you want to reject, and tell us if either this have to keep some sequence or its order doesn't matter. So we can make a reliable regex.
By now, all I can offer is:
^[1-9]+\|{1}\*+,*$
This RegEx was tested against these sample strings, accepting them:
56421|*****,,,
2|*********,,,
1|*
7|*,
18|****
123456789|*
12|********,,
1516332|**,,,
111111|*
6|*****,,,,
And it was tested against these sample strings, rejecting them:
10|*,
2***525*|*****,,,
123456,15,22*66*****4|,,,*167
1|2*3,4,5,6*
,*|173,
|*,
||12211
12
1|,*
1233|54|***,,,,
I assume your given order is strict and all conditions apply at the same time.
It looks like the pattern you need is
n-n, one or more times seperated by commas
then a bar (|)
then n*n, one or more times seperated by commas.
Here is a regular expression for that.
([1-9]{1}[0-9]*\-[0-9]+){1}
(,[1-9]{1}[0-9]*\-[0-9]+)*
\|
([1-9]{1}[0-9]*\*[0-9]+){1}
(,[1-9]{1}[0-9]*\*[0-9]+)*
But it is so complex, and does not take into account the details, such as
for the case of n-m, you want
n less than m
(I guess).
And you likely want the same number of n-m before the bar, and x*y after the bar.
Depends whether you want to check the syntax completely or not.
(I hope you do want to.)
Since this is so complex, it should be done with a set of code instead of a single regular expression.
this regex should work
"^[1-9\\|\\*,-]*$"
Assert position at the beginning of the string «^»
Match a single character present in the list below «[1-9\|*,-]»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «»
A character in the range between “1” and “9” «1-9»
A | character «\|»
A * character «*»
The character “,” «,»
The character “-” «-»
Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

Java - Unknown characters passing as [a-zA-z0-9]*?

I'm no expert in regex but I need to parse some input I have no control over, and make sure I filter away any strings that don't have A-z and/or 0-9.
When I run this,
Pattern p = Pattern.compile("^[a-zA-Z0-9]*$"); //fixed typo
if(!p.matcher(gottenData).matches())
System.out.println(someData); //someData contains gottenData
certain spaces + an unknown symbol somehow slip through the filter (gottenData is the red rectangle):
In case you're wondering, it DOES also display Text, it's not all like that.
For now, I don't mind the [?] as long as it also contains some string along with it.
Please help.
[EDIT] as far as I can tell from the (very large) input, the [?]'s are either white spaces either nothing at all; maybe there's some sort of encoding issue, also perhaps something to do with #text nodes (input is xml)
The * quantifier matches "zero or more", which means it will match a string that does not contain any of the characters in your class. Try the + quantifier, which means "One or more": ^[a-zA-Z0-9]+$ will match strings made up of alphanumeric characters only. ^.*[a-zA-Z0-9]+.*$ will match any string containing one or more alphanumeric characters, although the leading .* will make it much slower. If you use Matcher.lookingAt() instead of Matcher.matches, it will not require a full string match and you can use the regex [a-zA-Z0-9]+.
You have an error in your regex: instead of [a-zA-z0-9]* it should be [a-zA-Z0-9]*.
You don't need ^ and $ around the regex.
Matcher.matches() always matches the complete string.
String gottenData = "a ";
Pattern p = Pattern.compile("[a-zA-z0-9]*");
if (!p.matcher(gottenData).matches())
System.out.println("doesn't match.");
this prints "doesn't match."
The correct answer is a combination of the above answers. First I imagine your intended character match is [a-zA-Z0-9]. Note that A-z isn't as bad as you might think it include all characters in the ASCII range between A and z, which is the letters plus a few extra (specifically [,\,],^,_,`).
A second potential problem as Martin mentioned is you may need to put in the start and end qualifiers, if you want the string to only consists of letters and numbers.
Finally you use the * operator which means 0 or more, therefore you can match 0 characters and matches will return true, so effectively your pattern will match any input. What you need is the + quantifier. So I will submit the pattern you are most likely looking for is:
^[a-zA-Z0-9]+$
You have to change the regexp to "^[a-zA-Z0-9]*$" to ensure that you are matching the entire string
Looks like it should be "a-zA-Z0-9", not "a-zA-z0-9", try correcting that...
Did anyone consider adding space to the regex [a-zA-Z0-9 ]*. this should match any normal text with chars, number and spaces. If you want quotes and other special chars add them to the regex too.
You can quickly test your regex at http://www.regexplanet.com/simple/
You can check input value is contained string and numbers? by using regex ^[a-zA-Z0-9]*$
if your value just contained numberString than its show match i.e, riz99, riz99z
else it will show not match i.e, 99z., riz99.z, riz99.9
Example code:
if(e.target.value.match('^[a-zA-Z0-9]*$')){
console.log('match')
}
else{
console.log('not match')
}
}
online working example

Categories