java pattern filter - java

I need to create a Java pattern to filter data like 13.6Gb, 12MB, 15.5Kb.
I use this code:
Pattern p = Pattern.compile("(\\d+)(\\w+)");
Matcher m = p.matcher(content);
String num_letter = m.group(1);
String union = m.group(2);
But it can't detect decimal numbers. How should I modify this pattern?

Try adding an optional match for the decimal part:
Pattern.compile("(\\d+(?:[.]\\d+)?)(\\w+)");
Note the use of a non-capturing group for the decimal part.
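As a rough sketch of how that pattern might be used end to end: the class and method names below are made up for illustration, and the unit is restricted to letters ([A-Za-z]+ instead of \w+), which is an assumption on my part. Note that group() can only be called after a successful find() or matches(), which the code in the question skips.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class SizeParser {
    // Digits with an optional fractional part, followed by a letters-only unit
    // (assumption: units never contain digits or underscores).
    private static final Pattern SIZE =
            Pattern.compile("(\\d+(?:\\.\\d+)?)([A-Za-z]+)");

    /** Returns every match as "number unit". */
    static List<String> parse(String content) {
        List<String> result = new ArrayList<>();
        Matcher m = SIZE.matcher(content);
        // group() is only valid after find() (or matches()) has succeeded.
        while (m.find()) {
            result.add(m.group(1) + " " + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [13.6 Gb, 12 MB, 15.5 Kb]
        System.out.println(parse("13.6Gb, 12MB,15.5Kb"));
    }
}
```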

Here is a variation on the optional decimal match (note that, as written, it needs at least two digits, so an input like 5Kb will not match):
Pattern.compile("(\\d+\\.?\\d+?)+(\\w+)");

If you are using Eclipse, I'd recommend a tool like http://myregexp.com/eclipsePlugin.html - it makes this waaaaay easier.
Eyeballing yours, I would say something like (\\d+(\\.?(\\d+))?). Then you could see how many match groups you have before pulling out the ones you want. Alternatively, using named capture groups would be more readable.
-Ryan

Related

Regex extract string in java

I'm trying to extract a substring from a String using a regex in Java.
Pattern pattern = Pattern.compile("((.|\\n)*).{4}InsurerId>\\S*.{5}InsurerId>((.|\\n)*)");
Matcher matcher = pattern.matcher(abc);
I'm trying to extract the value between
<_1:InsurerId>F2021633_V1</_1:InsurerId>
I'm not sure where I am going wrong, but I don't get any output for
if (matcher.find())
{
System.out.println(matcher.group(1));
}
You can use:
Pattern pattern = Pattern.compile("<([^:]+:InsurerId)>([^<]*)</\\1>");
Matcher matcher = pattern.matcher(abc);
if (matcher.find()) {
System.out.println(matcher.group(2));
}
You may want to use the totally awesome page http://regex101.com/ to test your regular expressions. As you can see at https://regex101.com/r/rV8uM3/1, you only have empty capturing groups, but let me explain to you what you did. :D
((.|\n)*) This matches any character or a newline, any number of times. It is capturing, so your first matching group will always be everything before <_1:InsurerId>, or an empty string. You could use .* instead, but note that in Java . only matches newlines when Pattern.DOTALL is set. You can even leave it out, as it isn't actually part of the String you want to match - using anything here will actually be a problem if you have multiple InsurerIds in your file and want to get them all.
.{4}InsurerId> This matches "InsurerId>" with any four characters in front of it and is exactly what you want. As the first character is probably always an opening angle bracket (and you don't want stuff like "<ExampleInsurerId>"), I'd suggest using <.{3}InsurerId> instead. This still could have some problems (<Test id="<" xInsurerId>), so if you know exactly that it's "_<a digit>:", why not use <_\d:InsurerId>?
\S* matches everything except whitespace - probably not the best idea, as XML and similar files can be written without any whitespace at all. You want everything up to the next tag, so use [^<]* - this matches everything except an opening angle bracket. You also want to get this value later, so you have to use a capturing group: ([^<]*)
.{5}InsurerId> The same thing here: use <\/.{3}InsurerId> or <\/_\d:InsurerId> (forward slashes are delimiters in some other RegEx implementations, so I suggest escaping them; in Java the escape is unnecessary but harmless)
((.|\n)*) Again the same thing, just leave it out
The resulting Regular Expression would then be the following:
<_\d:InsurerId>([^<]*)<\/_\d:InsurerId>
And as you can see at https://regex101.com/r/mU6zZ3/1 - you have exactly one match, and it's even "F2021633_V1" :D
For Java, you have to escape the backslashes, so the resulting code would look like this:
Pattern pattern = Pattern.compile("<_\\d:InsurerId>([^<]*)<\\/_\\d:InsurerId>");
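Because this pattern no longer swallows the text before and after the tag, it can be applied repeatedly with find() to collect every InsurerId in a document. A minimal sketch, assuming a two-element input that I made up (the second id, A1, is hypothetical) and a hypothetical class name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class InsurerIds {
    // Same pattern as above; the forward slash needs no escaping in Java.
    private static final Pattern ID =
            Pattern.compile("<_\\d:InsurerId>([^<]*)</_\\d:InsurerId>");

    /** Collects the text content of every matching InsurerId element. */
    static List<String> extract(String xml) {
        List<String> ids = new ArrayList<>();
        Matcher m = ID.matcher(xml);
        while (m.find()) {          // each find() continues after the previous match
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up sample input with two elements.
        String xml = "<_1:InsurerId>F2021633_V1</_1:InsurerId>\n"
                   + "<_2:InsurerId>A1</_2:InsurerId>";
        // prints [F2021633_V1, A1]
        System.out.println(extract(xml));
    }
}
```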
If you are using Java 7 or above, you can use named groups to make the Regex a little more readable (also note the \k backreference, which makes the closing tag match the opening tag):
Pattern pattern = Pattern.compile("(?:<(?<InsurancePrefix>.+)InsurerId>)(?<id>[A-Z0-9_]+)</\\k<InsurancePrefix>InsurerId>");
Matcher matcher = pattern.matcher("<_1:InsurerId>F2021633_V1</_1:InsurerId>");
if (matcher.matches()) {
System.out.println(matcher.group("id"));
}
With the backreference, matches() fails on text whose closing tag differs from the opening tag, for example:
<_1:InsurerId>F2021633_V1</_2:InsurerId>
which is the correct behavior.
Javadoc has a good explanation: https://docs.oracle.com/javase/8/docs/api/
Also, you might consider using a different tool (an XML parser) instead of a Regex, since other people have to maintain your code and a complex Regex is usually difficult to understand.

Java replaceAll() to pull numbers and periods

I have a line of output similar to "Spec-Version: 11.3.0". I'm struggling to pull only the version out, with periods, using replaceAll(). Right now I have something like this:
version = line.replaceAll("[\\D+\\.]" , "");
With this I'm getting a version of:
1130
No matter what combination of syntax I use I'm either losing the periods or pulling the entire line.
Any help is appreciated.
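The regex below captures the version number in the first group. Note that it only matches from the colon onward, so to replace the whole line with the first group you would need to prefix it with ^.* (or read group(1) from a Matcher instead).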
:\s*(.*$)
Your java string would be ":\\s*(.*$)"
What you're doing there is removing everything that is either a period, or is not a number (which includes periods).
Try "[^\\d\\.]"
Your replace all is getting rid of anything not a digit or a period.
version = line.replaceAll("[^\\.\\d]" , "");
should replace anything not a digit and not a period
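To see the difference between the two character classes side by side, here is a small sketch using the line from the question. The class and method names are hypothetical. Inside [...] the + is a literal plus sign rather than a quantifier, and \D already covers the period, so the original class removes every non-digit.

```java
// Hypothetical demo class for illustration only.
class VersionStrip {
    /** The class from the question: removes every non-digit, periods included. */
    static String original(String line) {
        return line.replaceAll("[\\D+\\.]", "");
    }

    /** Negated class: removes everything that is neither a digit nor a period. */
    static String fixed(String line) {
        return line.replaceAll("[^\\d.]", "");
    }

    public static void main(String[] args) {
        String line = "Spec-Version: 11.3.0";
        System.out.println(original(line)); // prints 1130
        System.out.println(fixed(line));    // prints 11.3.0
    }
}
```

This only works cleanly when the rest of the line contains no other digits or periods.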
So many answers! Try this pattern to match your request:
([\d.]+)$
For a replace version, use this pattern:
^.*?(?=[\d.]+$)
/[^\d.]/g
Replaces anything that isn't a digit or ".".
Here is a demo at Regexr
As others mentioned, this is a perfect use case for a capture group. In Java you can write the following:
String regex = ".*Spec-Version: ([\\d\\.]+).*";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher("as asdgf sdf as Spec-Version: 12.3.1 asda sd]");
if (matcher.matches()) {
System.out.println("Match found");
System.out.println(matcher.group(1));
}
You will need to try the match on each line of text. I recommend this instead of replaceAll because it will most certainly be more flexible in the future. In fact, you will be able to match strings like:
pattern
.matcher("124442 1 2.23.4.12 as asdgf sdf as Spec-Version: 12.3.1 asda sd] 12.12314.15421");
while in the case above, replaceAll will give '12444212.23.4.1212.3.112.12314.15421', which is not what you want.

Java fix regex in code

I need to print #OPOK, but in the following code:
String s = "\"MSG1\":\"00\",\"MSG2\":\"#OPOK\",\"MSG3\":\"XXXXXX\"}";
Pattern pattern = Pattern.compile(".*\"MSG2\":\"(.+)\".*");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
System.out.println(matcher.group(1));
} else {
System.out.println("Match not found");
}
I get #OPOK","MSG3":"XXXXXX instead. How do I fix my pattern?
You want to make your .+ part reluctant. By default it's greedy - it'll match as much as it can without preventing the pattern from matching. You want it to match as little as it can, like this:
Pattern pattern = Pattern.compile(".*\"MSG2\":\"(.+?)\".*");
The ? is what makes it reluctant. See the Pattern documentation for more details.
Or of course you could just match against "any character other than a double quote" which is what Brian's approach will do. Both will work equally well as far as I'm aware; there may well be performance differences between them (I'd expect Brian's to perform better to be honest) but if performance is important to you you should test both approaches.
You probably want the following:
Pattern pattern = Pattern.compile("\"MSG2\":\"([^\"]+)\"");
For the capture group you are interested in, this will match any character except a double quote. Since the group is surrounded by double quotes, this should prevent it from going "too far" in the match.
Edited to add: As @bmorris591 suggested in the comments, you can add an extra + (as shown below) to make the quantifier possessive. This may help improve performance in cases where the matcher fails to find a match.
Pattern pattern = Pattern.compile("\"MSG2\":\"([^\"]++)\"");
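For comparison, the greedy, reluctant, negated-class and possessive variants can be run against the string from the question. This is only a sketch: the class and helper method are hypothetical, and the leading and trailing .* are dropped because find() does not need them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
class QuantifierDemo {
    /** Returns group 1 of the first match, or null if there is no match. */
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "\"MSG1\":\"00\",\"MSG2\":\"#OPOK\",\"MSG3\":\"XXXXXX\"}";

        // Greedy: .+ runs to the last double quote in the input.
        System.out.println(firstGroup("\"MSG2\":\"(.+)\"", s));      // prints #OPOK","MSG3":"XXXXXX
        // Reluctant: .+? stops at the first double quote it can.
        System.out.println(firstGroup("\"MSG2\":\"(.+?)\"", s));     // prints #OPOK
        // Negated class: cannot cross a double quote at all.
        System.out.println(firstGroup("\"MSG2\":\"([^\"]+)\"", s));  // prints #OPOK
        // Possessive: same result, but never backtracks into the group.
        System.out.println(firstGroup("\"MSG2\":\"([^\"]++)\"", s)); // prints #OPOK
    }
}
```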

Extract substring after a certain pattern

I have the following string:
http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true
How can I extract the part after 30/? In this case, it's 32531a5d-b0b1-4a8b-9029-b48f0eb40a34. I have other strings that share the same part up to 30/, each followed by a different id (up to the next /) that I want to extract.
You can do it like this (note that this prints everything after 30/, including the file name):
String s = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
System.out.println(s.substring(s.indexOf("30/")+3, s.length()));
The split function of the String class won't help you in this case, because it discards the delimiter and that's not what we want here. You need a pattern that looks behind. The lookbehind syntax is:
(?<=X)Y
which identifies any Y that is preceded by an X.
So in your case you need this pattern:
(?<=30/).*
Compile the pattern, match it against your input, find the match, and read it:
String input = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
Matcher matcher = Pattern.compile("(?<=30/).*").matcher(input);
matcher.find();
System.out.println(matcher.group());
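The pattern above returns everything after 30/, including the file name. If only the id up to the next slash is wanted, one possible variation is to replace .* with [^/]+. In the sketch below the class name is hypothetical, and the lookbehind is widened to /30/ so that a path segment such as 130/ is not picked up; that widening is my own assumption, not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class IdAfterPrefix {
    // Lookbehind for "/30/", then everything up to (not including) the next slash.
    private static final Pattern ID = Pattern.compile("(?<=/30/)[^/]+");

    /** Returns the path segment following "/30/", or null if there is none. */
    static String extract(String url) {
        Matcher m = ID.matcher(url);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true";
        // prints 32531a5d-b0b1-4a8b-9029-b48f0eb40a34
        System.out.println(extract(input));
    }
}
```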
Just for this one, or do you want a generic way to do it?
String[] out = mystring.split("/");
return out[out.length - 2];
I think the / is definitely the delimiter you are searching for.
I can't see the problem you are talking about Alex
EDIT : Ok, Python got me with indexes.
A regular expression is the answer, I think. However, how the expression is written depends on the format of the data (URL) you want to process. For example:
Pattern pat = Pattern.compile("/Content/SiteFiles/30/([a-z0-9\\-]+)/.*");
Matcher m = pat.matcher("http://xxx/Content/SiteFiles/30/32531a5d-b0b1-4a8b-9029-b48f0eb40a34/05%20%20LEISURE.mp3?&mydownloads=true");
if (m.find()) {
System.out.println(m.group(1));
}

How to match a set of string in regexp

How do I combine ch..+ and ch..- in regexp effectively without having to scan separately?
And are we using matcher in the pattern?
My output code is like this:
ch01+
ch01-
ch02+
ch02-
...
How do I combine ch..+ and ch..- in regexp effectively without having to scan separately?
Use | (pipe) for alternation:
ch..(\+|-)
And are we using matcher in the pattern?
Depends on how you're using the regexp and the pattern. To get a concrete answer, you'll have to show some actual code, or ask a much more specific question.
N.B. If you want to restrict the two characters after ch to 0-9, you can use \d, which is a shorthand character class for [0-9]:
ch\d{2}(\+|-)
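On the Matcher question, one way this pattern could be used in Java is sketched below. The class name and the sample input are made up; the pattern is compiled once and a Matcher then scans the text for both the + and - variants in a single pass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class ChannelMatcher {
    // "ch", exactly two digits, then either a plus or a minus sign.
    private static final Pattern CHANNEL = Pattern.compile("ch\\d{2}(\\+|-)");

    /** Returns every channel token found in the text, in order of appearance. */
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = CHANNEL.matcher(text);
        while (m.find()) {
            found.add(m.group());   // group(1) would give just the sign
        }
        return found;
    }

    public static void main(String[] args) {
        // "chxx+" is skipped because \d{2} requires two digits.
        // prints [ch01+, ch01-, ch02+, ch02-]
        System.out.println(findAll("ch01+ ch01- ch02+ chxx+ ch02-"));
    }
}
```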
You can use a character class containing just "+" and "-" like so "[+-]".
Pattern p = Pattern.compile("ch..[+-]");
Matcher m = p.matcher("ch01+");
if (m.find()) {
// found it...
}
