regular expression to remove a string conditionally - java

I want to remove dd/, /dd/ or /dd, but if it's /dd/ I want to replace it with / so that the result looks like MM/YYYY.
dd/MM/YYYY
MM/dd/YYYY
MM/YYYY/dd
[^\p{Alpha}]*d+[^\p{Alpha}]*
The above is my current regex.
What I want to achieve is either,
MM/YYYY
YYYY/MM
Because right now, if I replace /dd/, the result is
MMYYYY
or
YYYYMM

The most readable approach is to spell out the three cases as alternatives:
/dd(?=/)|^dd/|/dd$
That is:
/dd(?=/) the string "/dd" anywhere in the text followed by (positive lookahead) a "/"
or ^dd/ the string "dd/" at the beginning of the text
or /dd$ the string "/dd" at the end of the text
The first alternative is written with a lookahead for the ending slash after "/dd" so that this slash is not consumed, and left in the string so that "MM/dd/YYYY" keeps one slash in the middle.
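Here is a minimal Java sketch of that alternation. It assumes the day placeholder is the literal text dd (not digits), and the class and method names are only illustrative. String.replaceAll replaces every match with the empty string, and because of the lookahead the slash between the remaining fields survives.

```java
class StripDayField {
    // Three alternatives: "/dd" followed by "/" (that slash is kept via lookahead),
    // "dd/" at the start, or "/dd" at the end.
    static final String DAY_REGEX = "/dd(?=/)|^dd/|/dd$";

    static String stripDay(String format) {
        return format.replaceAll(DAY_REGEX, "");
    }

    public static void main(String[] args) {
        String[] formats = {"dd/MM/YYYY", "MM/dd/YYYY", "MM/YYYY/dd", "YYYY/dd/MM"};
        for (String f : formats) {
            // dd/MM/YYYY -> MM/YYYY, MM/dd/YYYY -> MM/YYYY,
            // MM/YYYY/dd -> MM/YYYY, YYYY/dd/MM -> YYYY/MM
            System.out.println(f + " -> " + stripDay(f));
        }
    }
}
```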

I think what you're looking for are the Pattern and Matcher classes in Java. Compile your regular expression into a Pattern, get a Matcher for your input, and call replaceAll() on it (or replaceFirst() if you only want the first match replaced).
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
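As a rough sketch of the Pattern/Matcher route, the following reuses the lookahead alternation from the first answer and assumes dd is literal text. The class and method names are hypothetical. Matcher provides replaceAll() and replaceFirst(), and a single replaceAll() call already handles every match, so you don't need a manual loop. find() and start() are only used here to show where the day field was found.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DayFieldMatcher {
    // Compile once and reuse; Pattern objects are immutable and thread-safe.
    static final Pattern DAY = Pattern.compile("/dd(?=/)|^dd/|/dd$");

    // Index where the day field starts, or -1 if there is no day field.
    static int dayPosition(String format) {
        Matcher m = DAY.matcher(format);
        return m.find() ? m.start() : -1;
    }

    // One replaceAll call removes every match.
    static String stripDay(String format) {
        return DAY.matcher(format).replaceAll("");
    }

    public static void main(String[] args) {
        String format = "MM/dd/YYYY";
        System.out.println(dayPosition(format)); // 2
        System.out.println(stripDay(format));    // MM/YYYY
    }
}
```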

You can achieve this using the regex: dd/|/dd.
I'm not sure, though, whether by d you meant a digit or a literal d.
The regex you have there is more general and matches much more than required.
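The sketch below checks the dd/|/dd alternation from this answer against the literal placeholders. It also shows a caution, assuming "d" means "any digit": the same alternation written as \d{2}/|/\d{2} cannot tell a day from a month, so on a real date it strips the month as well. The names are illustrative.

```java
class SimpleDayStrip {
    // Works when "dd" is the literal placeholder text.
    static String stripLiteral(String format) {
        return format.replaceAll("dd/|/dd", "");
    }

    // Same shape with digits: "25/" and then "12/" both match,
    // so both day and month are removed.
    static String stripDigits(String date) {
        return date.replaceAll("\\d{2}/|/\\d{2}", "");
    }

    public static void main(String[] args) {
        System.out.println(stripLiteral("MM/dd/YYYY")); // MM/YYYY
        System.out.println(stripDigits("25/12/2020"));  // 2020
    }
}
```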

Related

Regex Pattern in Java

I have a regular expression as defined
AAA_BBB_CCCC_(.*)_[0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9]T[0-2][0-9][0-5][0-9][0-5][0-9].
There is a string defined as --> **AAA_BBB_CCCC_DDD_EEEE_19710101T123456**, and in the code we have matcher.group(1), which extracts the desired part (DDD_EEEE). Now I have a new string coming in as --> **AAA_BBB_ATCCCC_DDD_EEEE_19710101T123456**. Is there a way to change the regex so that it matches both the old and the new string? I tried a few solutions from similar Stack Overflow questions, but none of them quite worked for me.
You just need to add an optional group, (?:AT)?, before CCCC:
AAA_BBB_(?:AT)?CCCC_(.*)_[0-9]{4}[0-1][0-9][0-3][0-9]T[0-2][0-9][0-5][0-9][0-5][0-9]
See the regex demo
I also contracted the four [0-9] to [0-9]{4} to make the pattern shorter.
The (?:AT)? is a non-capturing group with a ? quantifier applied to it. The ? makes the whole letter sequence match once or not at all, which makes it optional.
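A minimal Java sketch of the optional group. It assumes the whole name should match (so it uses Matcher.matches()) and treats the trailing period after the question's regex as sentence punctuation. OptionalPrefix and extract are illustrative names. Because (?:AT)? does not capture, (.*) is still group 1 for both inputs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalPrefix {
    // (?:AT)? is optional and non-capturing, so (.*) stays group 1.
    static final Pattern P = Pattern.compile(
        "AAA_BBB_(?:AT)?CCCC_(.*)_[0-9]{4}[0-1][0-9][0-3][0-9]T[0-2][0-9][0-5][0-9][0-5][0-9]");

    // Returns the captured middle part, or null if the name does not match.
    static String extract(String name) {
        Matcher m = P.matcher(name);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("AAA_BBB_CCCC_DDD_EEEE_19710101T123456"));   // DDD_EEEE
        System.out.println(extract("AAA_BBB_ATCCCC_DDD_EEEE_19710101T123456")); // DDD_EEEE
    }
}
```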
Please give the following regex a try.
AAA_BBB_(ATCCCC|CCCC)_(.*)_[0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9]T[0-2][0-9][0-5][0-9][0-5][0-9].
It matches only ATCCCC or CCCC; it won't support other, arbitrary characters before CCCC. You would need a wildcard for that.
Also, you would need to change your matcher.group(1) call to matcher.group(2), because (ATCCCC|CCCC) is now the first capturing group.
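To see the group shift concretely, here is a short sketch. It again assumes a full-string match and no trailing period in the pattern, and AlternationGroups and group are hypothetical names. The capturing alternation becomes group 1, so the DDD_EEEE payload moves to group 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationGroups {
    // (ATCCCC|CCCC) is a capturing group, so it becomes group 1
    // and the (.*) payload moves to group 2.
    static final Pattern P = Pattern.compile(
        "AAA_BBB_(ATCCCC|CCCC)_(.*)_[0-9][0-9][0-9][0-9][0-1][0-9][0-3][0-9]T[0-2][0-9][0-5][0-9][0-5][0-9]");

    // Returns the text of the given group, or null if the whole name does not match.
    static String group(String name, int index) {
        Matcher m = P.matcher(name);
        return m.matches() ? m.group(index) : null;
    }

    public static void main(String[] args) {
        String name = "AAA_BBB_ATCCCC_DDD_EEEE_19710101T123456";
        System.out.println(group(name, 1)); // ATCCCC
        System.out.println(group(name, 2)); // DDD_EEEE
    }
}
```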

Any suggestions to match and extract the pattern?

I want to match something like this
$(string).not(string).not(string)
The not(string) can repeat zero or more times, after $(string).
Note that the string can be whatever things, except nested not(string).
I used the regular expression (\\$\\((.*)\\))((\\.not\\((.*?)\\))*?)(?!(\\.not)). My idea is that *? non-greedily matches any number of not(string) sequences, and the lookahead stops the match where the next part is not .not(string), so that I can extract only the part I want.
However, when I tested on the input like
$(string).not(string).not(string).append(string)
group(0) returns the whole string, but I only need $(string).not(string).not(string).
Obviously I'm still missing or misusing something. Any suggestions?
Try this one (escaped for Java):
(\\$\\(string\\)(?:(?:\\.not\\(.*?\\))+))
It should capture just the part you are after. You can test it in an online regex tester, but use the unescaped form there rather than the Java string.
If we assume that parentheses are not nested, you can write something like this:
String p = "\\$\\([^)]*\\)(?:\\.not\\([^)]*\\))*";
There is no need for a lookahead, since the non-capturing group has a greedy quantifier (so the group is repeated as many times as possible).
If what you called string in your question may be a quoted string with parentheses inside, as in Pshemo's example $(string).not(".not(foo)").not(string), you can replace each [^)]* with (?:\\s*\"[^\"]*\"\\s*|[^)]*) to ignore characters inside quoted parts.
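A minimal sketch of the non-nested version in use. It assumes the arguments contain no ')' characters, and NotChain and leadingChain are illustrative names. Because the repeated group is greedy, find() stops on its own at the first part that is not .not(...).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NotChain {
    // $( ... ) followed by zero or more .not( ... ); [^)]* assumes the
    // arguments contain no ')' (no nested parentheses).
    static final Pattern P = Pattern.compile("\\$\\([^)]*\\)(?:\\.not\\([^)]*\\))*");

    // Returns the leading $(...).not(...)... chain, or null if there is none.
    static String leadingChain(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // prints $(string).not(string).not(string)
        System.out.println(leadingChain("$(string).not(string).not(string).append(string)"));
    }
}
```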
According to the Java Matcher documentation, "group zero denotes the entire pattern", so use group(1).
(\$\([\w ]+\))(\.not\([\w ]+\))*
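This will also work. It gives you two groups: the first holds the word with the $ sign, and the second holds a ".not" string. Note that a repeated capturing group keeps only its last repetition, so group(2) returns only the final .not(...) part; the whole $(...).not(...) chain is group(0).
Please note: you may have to add escape characters for Java.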
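The sketch below shows what the three groups hold for that pattern, escaped for Java. The input $(a).not(b).not(c).append(d) is a made-up example, and RepeatedGroup is an illustrative name. Group 2 reports only the last repetition of (\.not(...))*, and it is null when there is no .not part at all.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroup {
    static final Pattern P = Pattern.compile("(\\$\\([\\w ]+\\))(\\.not\\([\\w ]+\\))*");

    // Returns {group 0, group 1, group 2} of the first match, or null.
    static String[] firstMatch(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(), m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] g = firstMatch("$(a).not(b).not(c).append(d)");
        System.out.println(g[0]); // $(a).not(b).not(c)
        System.out.println(g[1]); // $(a)
        System.out.println(g[2]); // .not(c)  (only the last repetition)
    }
}
```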

Word that matches ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$

I am totally confused right now.
What is a word that matches: ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$
I tried 1Test#! on regex101, but it does not match.
I really appreciate your input!
What happens is that your regex is written as a Java string (note the \\d).
That is why you have to convert it before using it on regex101, which at the time did not support Java (only PHP, Python and JavaScript).
see converted regex:
^.*(?=.*\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$
which will match your string 1Test#!. Demo here: http://regex101.com/r/gE3iQ9
You just want something that matches that regex?
Here:
a1a!
This pattern matches
\dTest#!
If you want a pattern that matches 1Test#!, try this pattern:
^.*(?=.*\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$
Your Java string ^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$ encodes the regular expression ^.*(?=.*\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$.
This is because \ starts an escape sequence in a Java string literal, so \\ stands for a single backslash.
The latter matches the string you specified.
If your original string was a regexp, rather than a java string, it would match strings such as \dTest#!
Also consider removing the first .*, which makes the regexp more efficient. Regular expressions are greedy by default, so the initial .* first matches the whole string, and the lookaheads then fail. The engine backtracks, matching the first .* against all but the last character, where all but one of the lookaheads still fail. This continues until it reaches a point where all the lookaheads succeed. Dropping the first .* and putting the lookaheads immediately after the start-of-string anchor avoids this backtracking, and in this case the set of matched strings stays the same.
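Here is a small Java check of the points above. The class and constant names are illustrative. In Java source, "\\d" becomes the regex \d (a digit), while "\\\\d" becomes a regex for a literal backslash followed by d, which is the reading under which \dTest#! matches. The check also compares the pattern with the leading .* removed. String.matches() requires the whole input to match.

```java
class LookaheadCheck {
    // In Java source "\\d" is the two-character regex \d (any digit).
    static final String ORIGINAL = "^.*(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$";
    // Same check with the leading .* dropped: lookaheads sit right after ^.
    static final String ANCHORED = "^(?=.*\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$";
    // "\\\\d" is the regex \\d: a literal backslash followed by 'd'.
    static final String LITERAL_BACKSLASH = "^.*(?=.*\\\\d)(?=.*[a-zA-Z])(?=.*[!##$%^&]).*$";

    public static void main(String[] args) {
        System.out.println("1Test#!".matches(ORIGINAL));            // true
        System.out.println("1Test#!".matches(ANCHORED));            // true
        System.out.println("Test#!".matches(ORIGINAL));             // false: no digit
        System.out.println("\\dTest#!".matches(LITERAL_BACKSLASH)); // true
    }
}
```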

Java: regex - how do I get the first quoted text

As a beginner with regex, I suspect I'm about to ask something too simple, but I'll ask anyway; I hope you don't mind helping me.
Let's say I have a text like "hello 'cool1' word! 'cool2'"
and I want to get the first quote's text (which is cool1, without the ').
What should my pattern be? And when using Matcher, how do I guarantee it will return the first quote and not the second?
(Please suggest a regex-only solution.)
Use this regular expression:
'([^']*)'
Use it as follows:
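// needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
String s = "hello 'cool1' word! 'cool2'";
Pattern pattern = Pattern.compile("'([^']*)'");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1)); // prints cool1
}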
Or this if you know that there are no new-line characters in your quoted string:
'(.*?)'
when using matcher, how do i guarantee it will remain the first quote and not the second?
It finds the first quoted string first because it searches from left to right. If you ask it for the next match, it gives you the second quoted string.
If you want to find the first quote's text without the ', you can use the lookahead and lookbehind mechanism, like this:
(?<=').*?(?=')
for example
System.out.println("hello 'cool1' word! 'cool2'".replaceFirst("(?<=').*?(?=')", "ABC"));
//out -> hello 'ABC' word! 'cool2'
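A short sketch of the lookaround pattern used with find() instead of replaceFirst(); LookaroundQuote and allMatches are illustrative names. The first find() returns cool1 as expected. Keep in mind that later find() calls also return the text between the closing quote and the next opening quote, because that text is surrounded by quotes as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundQuote {
    // Lazily matches text that has a ' just before it and a ' just after it.
    static final Pattern P = Pattern.compile("(?<=').*?(?=')");

    // Collects every successive find() result.
    static List<String> allMatches(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [cool1,  word! , cool2]
        System.out.println(allMatches("hello 'cool1' word! 'cool2'"));
    }
}
```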
You could just split the string on quotes and get the second piece (which will be between the first and second quotes).
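A minimal sketch of the split approach; SplitQuote and firstQuoted are illustrative names. Passing a limit of 3 to String.split stops after the first two quotes, and requiring three pieces means an unclosed quote yields null instead of a partial string.

```java
class SplitQuote {
    // Returns the text between the first pair of single quotes, or null if there is none.
    static String firstQuoted(String s) {
        String[] parts = s.split("'", 3); // limit 3: only the first two quotes matter
        return parts.length >= 3 ? parts[1] : null;
    }

    public static void main(String[] args) {
        System.out.println(firstQuoted("hello 'cool1' word! 'cool2'")); // cool1
    }
}
```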
If you insist on regex, try this:
/^.*?'(.*?)'/
Make sure it's set to multiline, unless you know you'll never have newlines in your input. Then, get the subpattern from the result and that will be your string.
To support double quotes too:
/^.*?(['"])(.*?)\1/
Then get subpattern 2.
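In Java the surrounding slashes are not used; the pattern goes straight into a string literal. Below is a sketch of the single-or-double-quote version with the backreference; FirstQuoted and firstQuoted are illustrative names. By default . does not match line terminators in Java, so Pattern.DOTALL would be needed if the quoted text itself can span lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstQuoted {
    // (['"]) captures the opening quote; \1 requires the same quote to close it.
    static final Pattern P = Pattern.compile("^.*?(['\"])(.*?)\\1");

    // Returns subpattern 2 of the first match, or null if there is no quoted text.
    static String firstQuoted(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstQuoted("hello 'cool1' word! 'cool2'")); // cool1
        System.out.println(firstQuoted("say \"hi\" and 'bye'"));        // hi
    }
}
```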

Regular expression ([A-Za-z]*) matches digits and special characters?

I need to validate a text field in my application. It can't contain digits or special characters, so I tried this regex:
[A-Za-z]*
The problem is that this regex doesn't work when i put a digit or a special char in the middle or at the end of the String.
You should use it like this:
^[A-Za-z]+$
to match text (1 or more in length) containing ASCII letters only.
Go ahead and try ^[A-Za-z]*$ instead.
You could use the following Regex:
Pattern p = Pattern.compile("^[A-Za-z]*$");
See a list of regex-specs on http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
The pattern you describe will never work. Without the start and end anchors, the pattern looks for a matching substring. And since an empty string is also allowed (* means 0 or more characters), an empty match can be found anywhere.
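The sketch below contrasts the two Java methods; LettersOnly and its method names are illustrative. With Matcher.find(), the unanchored [A-Za-z]* succeeds on any input, because it can match a substring or even the empty string. By contrast, String.matches() and Matcher.matches() already require the entire input to match, so the anchors mainly matter when you use find().

```java
import java.util.regex.Pattern;

class LettersOnly {
    // Unanchored: find() succeeds on any substring, including an empty one.
    static boolean findUnanchored(String s) {
        return Pattern.compile("[A-Za-z]*").matcher(s).find();
    }

    // Anchored: the whole input must consist of one or more ASCII letters.
    static boolean findAnchored(String s) {
        return Pattern.compile("^[A-Za-z]+$").matcher(s).find();
    }

    // String.matches() requires the entire input to match.
    static boolean matchesWhole(String s) {
        return s.matches("[A-Za-z]+");
    }

    public static void main(String[] args) {
        System.out.println(findUnanchored("abc1")); // true
        System.out.println(findAnchored("abc1"));   // false
        System.out.println(matchesWhole("abc1"));   // false
    }
}
```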
Can you help me check this regex for validating a date-time of the form YYYY-MM-DDThh:mm:ss.sssZ or YYYY-MM-DD?
^(((19|20)[0-9][0-9])-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01]))|(((19|20)[0-9][0-9])-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01])T([01]?[0-9])|([2][0123]):([012345]?[0-9]):([012345]?[0-9])\.([0-9][0-9][0-9][Z]))$
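The posted regex has two precedence problems with |. At the top level, ^ ends up attached only to the date-only branch and $ only to the other branch. Inside the date-time branch, T([01]?[0-9])|([2][0123]) splits the whole branch in two rather than just the hour. As a result, with Matcher.matches() a full timestamp is rejected while a fragment like 2020-01-31T12 is accepted. The following sketch compares it with one possible corrected pattern. FIXED is a hypothetical rewrite, not from the source, and like the posted regex it assumes exactly three millisecond digits and a mandatory Z.

```java
import java.util.regex.Pattern;

class DateTimeCheck {
    // The regex as posted (escaped for Java).
    static final Pattern POSTED = Pattern.compile(
        "^(((19|20)[0-9][0-9])-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01]))|(((19|20)[0-9][0-9])-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01])T([01]?[0-9])|([2][0123]):([012345]?[0-9]):([012345]?[0-9])\\.([0-9][0-9][0-9][Z]))$");

    // Hypothetical fix: one date part, then an optional time part,
    // with the hour alternation wrapped in its own group.
    static final Pattern FIXED = Pattern.compile(
        "^(19|20)[0-9]{2}-(0?[1-9]|1[012])-(0?[1-9]|[12][0-9]|3[01])"
        + "(T([01]?[0-9]|2[0-3]):[0-5]?[0-9]:[0-5]?[0-9]\\.[0-9]{3}Z)?$");

    static boolean postedAccepts(String s) {
        return POSTED.matcher(s).matches();
    }

    static boolean fixedAccepts(String s) {
        return FIXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String full = "2020-01-31T23:59:59.123Z";
        System.out.println(postedAccepts(full)); // false
        System.out.println(fixedAccepts(full));  // true
    }
}
```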
