I want to remove dd/ or /dd/ or /dd but then if it's /dd/ I want to replace it with / so that it looks like MM/YYYY.
dd/MM/YYYY
MM/dd/YYYY
MM/YYYY/dd
[^\p{Alpha}]*d+[^\p{Alpha}]*
The above is my current regex.
What I want to achieve is either,
MM/YYYY
YYYY/MM
Cause right now, if I replace /dd/, it results in
MMYYYY
or
YYYYMM
The best one in terms of understandability is to spell out the three options as alternatives:
/dd(?=/)|^dd/|/dd$
That is:
/dd(?=/) the string "/dd" anywhere in the text followed by (positive lookahead) a "/"
or ^dd/ the string "dd/" at the beginning of the text
or /dd$ the string "/dd" at the end of the text
The first alternative is written with a lookahead for the ending slash after "/dd" so that this slash is not consumed, and left in the string so that "MM/dd/YYYY" keeps one slash in the middle.
I think what you're looking for are the Pattern and Matcher classes in Java. Attempt to find a match for your regular expression, and for each match call the replaceAll() or replaceOne() function.
https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
You can achieve this using the regex: dd/|/dd.
Not sure though by d you meant digits or just d.
The regex you have there is more general and matches much more than required.
I'm trying to extract a string from a String in Regex Java
Pattern pattern = Pattern.compile("((.|\\n)*).{4}InsurerId>\\S*.{5}InsurerId>((.|\\n)*)");
Matcher matcher = pattern.matcher(abc);
I'm trying to extract the value between
<_1:InsurerId>F2021633_V1</_1:InsurerId>
I'm not sure where am I going wrong but I don't get output for
if (matcher.find())
{
System.out.println(matcher.group(1));
}
You can use:
Pattern pattern = Pattern.compile("<([^:]+:InsurerId)>([^<]*)</\\1>");
Matcher matcher = pattern.matcher(abc);
if (matcher.find()) {
System.out.println(matcher.group(2));
}
RegEx Demo
You may want to use the totally awesome page http://regex101.com/ to test your regular expressions. As you can see at https://regex101.com/r/rV8uM3/1, you only have empty capturing groups, but let me explain to you what you did. :D
((.|\n)*) This matches any character, or a new line, unimportant how often. It is capturing, so your first matching group will always be everything before <_1:InsurerId>, or an empty string. You can match any character instead, it will include new lines: .*. You can even leave it away as it isn't actually part of the String you want to match - using anything here will actually be a problem if you have multiple InsurerIds in your file and want to get them all.
.{4}InsurerId> This matches "InsurerId>" with any four characters in front of it and is exactly what you want. As the first character is probably always an opening angle bracket (and you don't want stuff like "<ExampleInsurerId>"), I'd suggest using <.{3}InsurerId> instead. This still could have some problems (<Test id="<" xInsurerId>), so if you know exactly that it's "_<a digit>:", why not use <_\d:InsurerId>?
\S* matches everything except for whitespaces - probably not the best idea as XML and similar files can be written to not contain any space at all. You want to have everything to the next tag, so use [^<]* - this matches everything except for an opening angle bracket. You also want to get this value later, so you have to use a capturing group: ([^<]*)
.{5}InsurerId> The same thing here: use <\/.{3}InsurerId> or <\/_\d:InsurerId> (forward slashes are actually characters interpreted by other RegEx implementations, so I suggest escaping them)
((.|\n)*) Again the same thing, just leave it away
The resulting Regular Expression would then be the following:
<_\d:InsurerId>([^<]*)<\/_\d:InsurerId>
And as you can see at https://regex101.com/r/mU6zZ3/1 - you have exactly one match, and it's even "F2021633_V1" :D
For Java, you have to escape the backslashes, so the resulting code would look like this:
Pattern pattern = Pattern.compile("<_\\d:InsurerId>([^<]*)<\\/_\\d:InsurerId>");
If you are using Java 7 and above, you can use naming groups to make the Regex a little bit more readable (also see the backreference group \k for close tag to match the openning tag):
Pattern pattern = Pattern.compile("(?:<(?<InsurancePrefix>.+)InsurerId>)(?<id>[A-Z0-9_]+)</\\k<InsurancePrefix>InsurerId>");
Matcher matcher = pattern.matcher("<_1:InsurerId>F2021633_V1</_1:InsurerId>");
if (matcher.matches()) {
System.out.println(matcher.group("id"));
}
Using back reference the matches() fails, for example, on this text
<_1:InsurerId>F2021633_V1</_2:InsurerId>
which is correct
Javadoc has a good explanation: https://docs.oracle.com/javase/8/docs/api/
Also you might consider using a different tool (XML parser) instead of Regex, as well, as other people have to support your code, and complex Regex is usually difficult to understand.
I need a regex to match a particular string, say 1.4.5 in the below string . My string will be like
absdfsdfsdfc1.4.5kdecsdfsdff
I have a regex which is giving [c1.4.5k] as an output. But I want to match only 1.4.5. I have tried this pattern:
[^\\W](\\d\\.\\d\\.\\d)[^\\d]
But no luck. I am using Java.
Please let me know the pattern.
When I read your expression [^\\W](\\d\\.\\d\\.\\d)[^\\d] correctly, then you want a word character before and not a digit ahead. Is that correct?
For that you can use lookbehind and lookahead assertions. Those assertions do only check their condition, but they do not match, therefore that stuff is not included in the result.
(?<=\\w)(\\d\\.\\d\\.\\d)(?!\\d)
Because of that, you can remove the capturing group. You are also repeating yourself in the pattern, you can simplify that, too:
(?<=\\w)\\d(?:\\.\\d){2}(?!\\d)
Would be my pattern for that. (The ?: is a non capturing group)
Your requirements are vague. Do you need to match a series of exactly 3 numbers with exactly two dots?
[0-9]+\.[0-9]+\.[0-9]+
Which could be written as
([0-9]+\.){2}[0-9]+
Do you need to match x many cases of a number, seperated by x-1 dots in between?
([0-9]+\.)+[0-9]+
Use look ahead and look behind.
(?<=c)[\d\.]+(?=k)
Where c is the character that would be immediately before the 1.4.5 and k is the character immediately after 1.4.5. You can replace c and k with any regular expression that would suit your purposes
I think this one should do it : ([0-9]+\\.?)+
Regular Expression
((?<!\d)\d(?:\.\d(?!\d))+)
As a Java string:
"((?<!\\d)\\d(?:\\.\\d(?!\\d))+)"
String str= "absdfsdfsdfc**1.4.5**kdec456456.567sdfsdff22.33.55ffkidhfuh122.33.44";
String regex ="[0-9]{1}\\.[0-9]{1}\\.[0-9]{1}";
Matcher matcher = Pattern.compile( regex ).matcher( str);
if (matcher.find())
{
String year = matcher.group(0);
System.out.println(year);
}
else
{
System.out.println("no match found");
}
using only regular expression methods, the method String.replaceAll and ArrayList
how can i split a String into tokens, but ignore delimiters that exist inside quotes?
the delimiter is any character that is not alphanumeric or quoted text
for example:
The string :
hello^world'this*has two tokens'
should output:
hello
worldthis*has two tokens
I know there is a damn good and accepted answer already present but I would like to add another regex based (and may I say simpler) approach to split the given text using any non-alphanumeric delimiter which not inside the single quotes using
Regex:
/(?=(([^']+'){2})*[^']*$)[^a-zA-Z\\d]+/
Which basically means match a non-alphanumeric text if it is followed by even number of single quotes in other words match a non-alphanumeric text if it is outside single quotes.
Code:
String string = "hello^world'this*has two tokens'#2ndToken";
System.out.println(Arrays.toString(
string.split("(?=(([^']+'){2})*[^']*$)[^a-zA-Z\\d]+"))
);
Output:
[hello, world'this*has two tokens', 2ndToken]
Demo:
Here is a live working Demo of the above code.
Use a Matcher to identify the parts you want to keep, rather than the parts you want to split on:
String s = "hello^world'this*has two tokens'";
Pattern pattern = Pattern.compile("([a-zA-Z0-9]+|'[^']*')+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
See it working online: ideone
You cannot in any reasonable way. You are posing a problem that regular expressions aren't good at.
Do not use a regular expression for this. It won't work. Use / write a parser instead.
You should use the right tool for the right task.
I'm trying to parse through a string formatted like this, except with more values:
Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value
The Regex
((Key1)=(.*)),((Key2)=(.*)),((Key3)=(.*)),((Key4)=(.*)),((Key5)=(.*)),((Key6)=(.*)),((Key7)=(.*))
In the actual string, there are about double the amount of key/values, but I'm keeping it short for brevity. I have them in parentheses so I can call them in groups. The keys I have stored as Constants, and they will always be the same. The problem is, it never finds a match which doesn't make sense (unless the Regex is wrong)
Judging by your comment above, it sounds like you're creating the Pattern and Matcher objects and associating the Matcher with the target string, but you aren't actually applying the regex. That's a very common mistake. Here's the full sequence:
String regex = "Key1=(.*),Key2=(.*)"; // etc.
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(targetString);
// Now you have to apply the regex:
if (m.find())
{
String value1 = m.group(1);
String value2 = m.group(2);
// etc.
}
Not only do you have to call find() or matches() (or lookingAt(), but nobody ever uses that one), you should always call it in an if or while statement--that is, you should make sure the regex actually worked before you call any methods like group() that require the Matcher to be in a "matched" state.
Also notice the absence of most of your parentheses. They weren't necessary, and leaving them out makes it easier to (1) read the regex and (2) keep track of the group numbers.
Looks like you'd do better to do:
String[] pairs = data.split(",");
Then parse the key/value pairs one at a time
Your regex is working for me...
If you are always getting an IllegalStateException, I would say that you are trying to do something like:
matcher.group(1);
without having invoked the find() method.
You need to call that method before any attempt to fetch a group (or you will be in an illegal state to call the group() method)
Give this a try:
String test = "Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value";
Pattern pattern = Pattern.compile("((Key1)=(.*)),((Key2)=(.*)),((Key3)=(.*)),((Key4)=(.*)),((Key5)=(.*)),((Key6)=(.*)),((Key7)=(.*))");
Matcher matcher = pattern.matcher(test);
matcher.find();
System.out.println(matcher.group(1));
It's not wrong per se, but it requires a lot of backtracking which might cause the regular expression engine to bail. I would try a split as suggested elsewhere, but if you really need to use a regular expression, try making it non-greedy.
((Key1)=(.*?)),((Key2)=(.*?)),((Key3)=(.*?)),((Key4)=(.*?)),((Key5)=(.*?)),((Key6)=(.*?)),((Key7)=(.*?))
To understand why it requires so much backtracking, understand that for
Key1=(.*),Key2=(.*)
applied to
Key1=x,Key2=y
Java's regular expression engine matches the first (.*) to x,Key2=y and then tries stripping characters off the right until it can get a match for the rest of the regular expression: ,Key2=(.*). It effectively ends up asking,
Does "" match ,Key2=(.*), no so try
Does "y" match ,Key2=(.*), no so try
Does "=y" match ,Key2=(.*), no so try
Does "2=y" match ,Key2=(.*), no so try
Does "y2=y" match ,Key2=(.*), no so try
Does "ey2=y" match ,Key2=(.*), no so try
Does "Key2=y" match ,Key2=(.*), no so try
Does ",Key2=y" match ,Key2=(.*), yes so the first .* is "x" and the second is "y".
EDIT:
In Java, the non-greedy qualifier changes things so that it starts off trying to match nothing and then building from there.
Does "x,Key2=(.*)" match ,Key2=(.*), no so try
Does ",Key2=(.*)" match ,Key2=(.*), yes.
So when you've got 7 keys it doesn't need to unmatch 6 of them which involves unmatching 5 which involves unmatching 4, .... It can do it's job in one forward pass over the input.
I'm not going to say that there's no regex that will work for this, but it's most likely more complicated to write (and more importantly, read, for the next person that has to deal with the code) than it's worth. The closest I'm able to get with a regex is if you append a terminal comma to the string you're matching, i.e, instead of:
"Key1=value1,Key2=value2"
you would append a comma so it's:
"Key1=value1,Key2=value2,"
Then, the regex that got me the closest is: "(?:(\\w+?)=(\\S+?),)?+"...but this doesn't quite work if the values have commas, though.
You can try to continue tweaking that regex from there, but the problem I found is that there's a conflict in the behavior between greedy and reluctant quantifiers. You'd have to specify a capturing group for the value that is greedy with respect to commas up to the last comma prior to an non-capturing group comprised of word characters followed by the equal sign (the next value)...and this last non-capturing group would have to be optional in case you're matching the last value in the sequence, and maybe itself reluctant. Complicated.
Instead, my advice is just to split the string on "=". You can get away with this because presumably the values aren't allowed to contain the equal sign character.
Now you'll have a bunch of substrings, each of which that is a bunch of characters that comprise a value, the last comma in the string, followed by a key. You can easily find the last comma in each substring using String.lastIndexOf(',').
Treat the first and last substrings specially (because the first one does not have a prepended value and the last one has no appended key) and you should be in business.
If you know you always have 7, the hack-of-least resistance is
^Key1=(.+),Key2=(.+),Key3=(.+),Key4=(.+),Key5=(.+),Key6=(.+),Key7=(.+)$
Try it out at http://www.fileformat.info/tool/regex.htm
I'm pretty sure that there is a better way to parse this thing down that goes through .find() rather than .matches() which I think I would recommend as it allows you to move down the string one key=value pair at a time. It moves you into the whole "greedy" evaluation discussion.
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems. - Jamie Zawinski
The simplest solution is the most robust.
final String data = "Key1=value,Key2=value,Key3=value,Key4=value,Key5=value,Key6=value,Key7=value";
final String[] pairs = data.split(",");
for (final String pair: pairs)
{
final String[] keyValue = pair.split("=");
final String key = keyValue[0];
final String value = keyValue[1];
}