Regex only matches once - java

I have the following regex that matches only once:
Matcher m = Pattern.compile("POLYGON\\s\\(\\((([0-9]*\\.[0-9]+)\\s([0-9]*\\.[0-9]+),?)+\\)\\)")
.matcher("POLYGON ((12.789754538957263 36.12443963532555,12.778550292768816 36.089875458584984,12.77760353347314 36.12427601168043))");
while (m.find()) {
System.out.println("-> " + m.group(2) + " - " + m.group(3));
}
But it only prints the first match:
-> 12.789754538957263 - 36.12443963532555
Why does it not match the other coordinates?
I want to print a new line for each pair of coordinates, e.g.
12.789754538957263 - 36.12443963532555
12.778550292768816 - 36.089875458584984
12.77760353347314 - 36.12427601168043

Your regex should look like this: ([0-9]*\.[0-9]+)\s([0-9]*\.[0-9]+)
String input = ...
Matcher m = Pattern.compile("([0-9]*\\.[0-9]+)\\s([0-9]*\\.[0-9]+)").matcher(input);
while (m.find()) {
System.out.println("-> " + m.group(1) + " - " + m.group(2));
}
Outputs
-> 12.789754538957263 - 36.12443963532555
-> 12.778550292768816 - 36.089875458584984
-> 12.77760353347314 - 36.12427601168043
If you want to make sure that the input is wrapped in POLYGON (( .. )), you can use replaceAll to extract the part between the parentheses:
12.789754538957263 36.12443963532555,12.778550292768816 36.089875458584984,12.77760353347314 36.12427601168043
Your code should be :
.matcher(input.replaceAll("POLYGON \\(\\((.*?)\\)\\)", "$1"));
Instead of :
.matcher(input);
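For context on why the original pattern yields only one result: it matches the whole POLYGON string in a single find(), and a capturing group inside a repeated group keeps only the text of its last iteration. The sketch below combines the two steps from this answer (extract the part between the parentheses, then find one pair per find() call). The class and method names are made up for illustration, and it assumes the input contains at most one POLYGON.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PolygonCoordinates {
    // Captures everything between "POLYGON ((" and "))".
    private static final Pattern WRAPPER = Pattern.compile("POLYGON\\s*\\(\\((.*?)\\)\\)");
    // One "x y" pair; each find() call advances to the next pair.
    private static final Pattern PAIR = Pattern.compile("([0-9]*\\.[0-9]+)\\s([0-9]*\\.[0-9]+)");

    // Hypothetical helper: returns "x - y" for every coordinate pair in the polygon.
    static List<String> pairs(String input) {
        List<String> result = new ArrayList<>();
        Matcher wrapper = WRAPPER.matcher(input);
        if (!wrapper.find()) {
            return result; // not a POLYGON string: nothing to report
        }
        Matcher pair = PAIR.matcher(wrapper.group(1));
        while (pair.find()) {
            result.add(pair.group(1) + " - " + pair.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "POLYGON ((12.789754538957263 36.12443963532555,"
                + "12.778550292768816 36.089875458584984,"
                + "12.77760353347314 36.12427601168043))";
        pairs(input).forEach(System.out::println);
    }
}
```

Returning an empty list for a non-POLYGON input is a design choice of this sketch; throwing an exception would work just as well.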
Solution 2
After analysing your problem, I think you need just this :
Stream.of(input.replaceAll("POLYGON \\(\\((.*?)\\)\\)", "$1").split(","))
.forEach(System.out::println);

You could still check whether your input begins with a certain string, as shown below.
I'd use the following regex to find the pairs: ([\d.]+)\s([\d.]+)
It searches for two sequences of digits or dots separated by a whitespace character.
String input = ...
if (input.startsWith("POLYGON")) {
Matcher m = Pattern.compile("([\\d.]+)\\s([\\d.]+)").matcher(input);
while (m.find()) {
System.out.println("-> " + m.group(1) + " - " + m.group(2));
}
}

Related

Regex: separate 2 numbers by comma

I'm trying to write a regex that allows only a number, then "," and another number, or the same pattern repeated and separated by ";", like
57,1000
57,1000;6393,1000
So far I have this: Pattern.compile("\\b[0-9;,]{1,5}?\\d+;([0-9]{1,5},?)+").matcher("57,1000").find();
which works for 57,1000;6393,1000, but it also allows letters and doesn't work for 57,1000
Try the regex "(\d+,\d+(;\d+,\d+)?)"
@Test
void regex() {
Pattern p = Pattern.compile("(\\d+,\\d+)(;\\d+,\\d+)?");
Assertions.assertTrue(p.matcher("57,1000").matches());
Assertions.assertTrue(p.matcher("57,1000;6393,1000").matches());
}
How about like this. Just look for two numbers separated by a comma and capture them.
String[] data = {"57,1000",
"57,1000;6393,1000"};
Pattern p = Pattern.compile("(\\d+),(\\d+)");
for (String str : data) {
System.out.println("For String : " + str);
Matcher m = p.matcher(str);
while (m.find()) {
System.out.println(m.group(1) + " " + m.group(2));
}
System.out.println();
}
prints
For String : 57,1000
57 1000
For String : 57,1000;6393,1000
57 1000
6393 1000
If you just want to match those, you can do the following: It matches a single instance of the string followed by an optional one preceded by a semi-colon.
String regex = "(\\d+,\\d+)(;(\\d+,\\d+))?";
for (String str : data) {
System.out.println("Testing String " + str + " : " +str.matches(regex));
}
prints
Testing String 57,1000 : true
Testing String 57,1000;6393,1000 : true
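The patterns above accept one pair or two. If any number of pairs should be allowed (an assumption, the question only shows up to two), the optional group can be repeated with * instead of ?. The following is a minimal sketch along those lines; the class name and the choice to return int arrays are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class PairListParser {
    // One or more "number,number" pairs separated by ';' and nothing else.
    private static final Pattern VALID = Pattern.compile("\\d+,\\d+(?:;\\d+,\\d+)*");

    // Hypothetical helper: validates the whole string, then splits it into pairs.
    static List<int[]> parse(String s) {
        if (!VALID.matcher(s).matches()) {
            throw new IllegalArgumentException("Not a pair list: " + s);
        }
        List<int[]> pairs = new ArrayList<>();
        for (String pair : s.split(";")) {
            String[] parts = pair.split(",");
            // parseInt throws NumberFormatException for values that do not fit in an int.
            pairs.add(new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])});
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (int[] p : parse("57,1000;6393,1000")) {
            System.out.println(p[0] + " " + p[1]);
        }
    }
}
```

Because matches() requires the entire input to fit the pattern, letters or stray separators are rejected before any splitting happens.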

Java Regex OR operator not working properly

I have these strings:
String test1=":test:block1:%a1%a2%a3%a4:block2:BL";
and
String test2=":test:block2:BL:block1:%a1%a2%a3%a4";
I've created a regex pattern to isolate this piece of the string
block1:%a1%a2%a3%a4:
from the rest of the string, leaving results like this:
in the case of test1="block1:%a1%a2%a3%a4:"; (with ':' at the end)
in the case of test2=":block1:%a1%a2%a3%a4"; (with ':' at the beginning)
The regex I've created is:
"(block1:(.*?):|:block1:(.*))";
It works with test1, but with test2 it returns this:
block1:%a1%a2%a3%a4:block2:BL";
Can someone give me a hand with this?
Cheers!
You may use
block1:([^:]*)
It matches block1: text and then captures into Group 1 any 0 or more chars other than :.
See Java demo:
String patternString = "block1:([^:]*)";
String[] tests = {":test:block1:%a1%a2%a3%a4:block2:BL",
":test:block2:BL:block1:%a1%a2%a3%a4"};
for (int i=0; i<tests.length; i++)
{
Pattern p = Pattern.compile(patternString, Pattern.DOTALL);
Matcher m = p.matcher(tests[i]);
if(m.find())
{
System.out.println(tests[i] + " matched. Match: " +
m.group(0) + ", Group 1: " + m.group(1));
}
}
Output:
:test:block1:%a1%a2%a3%a4:block2:BL matched. Match: block1:%a1%a2%a3%a4, Group 1: %a1%a2%a3%a4
:test:block2:BL:block1:%a1%a2%a3%a4 matched. Match: block1:%a1%a2%a3%a4, Group 1: %a1%a2%a3%a4
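The same idea can be wrapped in a small helper that works for any block name. This is only a sketch: the class and method names are invented here, and it assumes that a block's value never contains a ':' character, as in the two test strings.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockExtractor {
    // Hypothetical helper: returns the text after "<name>:" up to the next ':' or the end of input.
    static Optional<String> valueOf(String name, String input) {
        // Pattern.quote keeps regex metacharacters in the name from being interpreted.
        Pattern p = Pattern.compile(Pattern.quote(name) + ":([^:]*)");
        Matcher m = p.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String test1 = ":test:block1:%a1%a2%a3%a4:block2:BL";
        String test2 = ":test:block2:BL:block1:%a1%a2%a3%a4";
        System.out.println(valueOf("block1", test1).orElse("<none>"));
        System.out.println(valueOf("block1", test2).orElse("<none>"));
    }
}
```

The negated character class [^:]* stops at the next colon by itself, so no alternation is needed for the "middle of the string" and "end of the string" cases.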

How to replace all domains with pattern in a XML string in Java?

I have an XML output like this (<xml> element or xlink:href attribute are just fiction and you cannot rely on them to create regex pattern.)
<xml>http://localhost:8080/def/abc/xyx</xml>
<element xlink:href="http://localhostABCDEF/def/ABC/XYZ">Some Text</element>
...
What I want to do is use a Java regex to replace everything matching the domain pattern (I don't know the existing domains in advance):
http(s)?://.*/def/.*
with an input domain (e.g. http://google.com/def), so that the result will be:
<xml>http://google.com/def/abc/xyx</xml>
<element xlink:href="http://google.com/def/ABC/XYZ">Some Text</element>
...
How can I do it? I think it can be done with a Java regex or with String.replaceAll (though the latter does not seem to work).
Regex: http[s]?:\/{2}.+\/def
Substitution: http://google.com/def
Details:
? Matches between zero and one times
[] Match a single character present in the list
. Matches any character
+ Matches between one and unlimited times
Java code:
String domain = "http://google.com/def";
String html = "<xml>http://localhost:8080/def/abc/xyx</xml>\r\n<element xlink:href=\"http://localhostABCDEF/def/ABC/XYZ\">Some Text</element>";
html = html.replaceAll("http[s]?:\\/{2}.+\\/def", domain);
System.out.print(html);
Output:
<xml>http://google.com/def/abc/xyx</xml>
<element xlink:href="http://google.com/def/ABC/XYZ">Some Text</element>
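One thing to keep in mind with .+ is that it is greedy: if two URLs ever appear on the same line, it can run from the first "http" to the last "/def" on that line. A more conservative variant is sketched below. It assumes that "/def" follows the host part directly (no extra path segments in between), and the class and method names are made up for the example. Matcher.quoteReplacement is used so that a '$' or '\' in the new prefix is taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DomainRewriter {
    // Scheme, then a host[:port] without '/', whitespace, quotes or angle brackets,
    // then "/def" only when it is followed by another '/'.
    private static final Pattern URL_PREFIX =
            Pattern.compile("https?://[^/\\s\"<>]+/def(?=/)");

    // Hypothetical helper: swaps every matching URL prefix for newPrefix.
    static String rewrite(String text, String newPrefix) {
        return URL_PREFIX.matcher(text).replaceAll(Matcher.quoteReplacement(newPrefix));
    }

    public static void main(String[] args) {
        String xml = "<xml>http://localhost:8080/def/abc/xyx</xml>\r\n"
                + "<element xlink:href=\"http://localhostABCDEF/def/ABC/XYZ\">Some Text</element>";
        System.out.println(rewrite(xml, "http://google.com/def"));
    }
}
```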
Actually, this can be done with a regex, and it is simpler than parsing the XML document. Here is the answer:
String text = "<epsg:CommonMetaData>\n"
+ " <epsg:type>geographic 2D</epsg:type>\n"
+ " <epsg:informationSource>EPSG. See 3D CRS for original information source.</epsg:informationSource>\n"
+ " <epsg:revisionDate>2007-08-27</epsg:revisionDate>\n"
+ " <epsg:changes>\n"
+ " <epsg:changeID xlink:href=\"http://www.opengis.net/def/change-request/EPSG/0/2002.151\"/>\n"
+ " <epsg:changeID xlink:href=\"http://www.opengis.net/def/change-request/EPSG/0/2003.370\"/>\n"
+ " <epsg:changeID xlink:href=\"http://www.opengis.net/def/change-request/EPSG/0/2006.810\"/>\n"
+ " <epsg:changeID xlink:href=\"http://www.opengis.net/def/change-request/EPSG/0/2007.079\"/>\n"
+ " </epsg:changes>\n"
+ " <epsg:show>true</epsg:show>\n"
+ " <epsg:isDeprecated>false</epsg:isDeprecated>\n"
+ " </epsg:CommonMetaData>\n"
+ " </gml:metaDataProperty>\n"
+ " <gml:metaDataProperty>\n"
+ " <epsg:CRSMetaData>\n"
+ " <epsg:projectionConversion xlink:href=\"http://www.opengis.net/def/coordinateOperation/EPSG/0/15593\"/>\n"
+ " <epsg:sourceGeographicCRS xlink:href=\"http://www.opengis.net/def/crs/EPSG/0/4979\"/>\n"
+ " </epsg:CRSMetaData>\n"
+ " </gml:metaDataProperty>"
+ "<gml:identifier codeSpace=\"OGP\">http://www.opengis.net/def/area/EPSG/0/1262</gml:identifier>";
String patternString1 = "(http(s)?://.*/def/.*)";
Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
String prefixDomain = "http://localhost:8080/def";
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
String url = prefixDomain + matcher.group(1).split("def")[1];
matcher.appendReplacement(sb, url);
System.out.println(url);
}
matcher.appendTail(sb);
System.out.println(sb.toString());
which returns output https://www.diffchecker.com/CyJ8fY8p

Nested/Repeated Group in Regex

I have to parse a multi line string and retrieve the email addresses in a specific location.
And I have done it using the below code:
String input = "Content-Type: application/ms-tnef; name=\"winmail.dat\"\r\n"
+ "Content-Transfer-Encoding: binary\r\n" + "From: ABC aa DDD <aaaa.b#abc.com>\r\n"
+ "To: DDDDD dd <sssss.r#abc.com>\r\n" + "CC: Rrrrr rrede <sssss.rv#abc.com>, Dsssssf V R\r\n"
+ " <dsdsdsds.vr#abc.com>, Psssss A <pssss.a#abc.com>, Logistics\r\n"
+ " <LOGISTICS#abc.com>, Gssss Bsss P <gdfddd.p#abc.com>\r\n"
+ "Subject: RE: [MyApps] (PRO-34604) PR for Additional Monitor allocation [CITS\r\n"
+ " Ticket:258849]\r\n" + "Thread-Topic: [MyApps] (PRO-34604) PR for Additional Monitor allocation\r\n"
+ " [CITS Ticket:258849]\r\n" + "Thread-Index: AQHRXMJHE6KqCFxKBEieNqGhdNy7Pp8XHc0A\r\n"
+ "Date: Mon, 1 Feb 2016 17:56:17 +0530\r\n"
+ "Message-ID: <B7F84439E634A44AB586E3FF2EA0033A29E27E47#JETWINSRVRPS01.abc.com>\r\n"
+ "References: <JA.101.1453963700000#myapps.abc.com>\r\n"
+ " <JA.101.1453963700000.978.1454311765375#myapps.abc.com>\r\n"
+ "In-Reply-To: <JIRA.450101.1453963700000.978.1454311765375#myapps.abc.com>\r\n"
+ "Accept-Language: en-US\r\n" + "Content-Language: en-US\r\n" + "X-MS-Has-Attach:\r\n"
+ "X-MS-Exchange-Organization-SCL: -1\r\n"
+ "X-MS-TNEF-Correlator: <B7F84439E634A44AB586E3FF2EA0033A29E27E47#JETWINSRVRPS01.abc.com>\r\n"
+ "MIME-Version: 1.0\r\n" + "X-MS-Exchange-Organization-AuthSource: TURWINSRVRPS01.abc.com\r\n"
+ "X-MS-Exchange-Organization-AuthAs: Internal\r\n" + "X-MS-Exchange-Organization-AuthMechanism: 04\r\n"
+ "X-Originating-IP: [1.1.1.7]";
Pattern pattern = Pattern.compile("To:(.*<([^>]*)>).*Message-ID", Pattern.DOTALL);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
Pattern innerPattern = Pattern.compile("<([^>]*)>");
Matcher innerMatcher = innerPattern.matcher(matcher.group(1));
while (innerMatcher.find()) {
System.out.println("-->:" + innerMatcher.group(1));
}
}
Here it works fine. I first capture the part from To: up to Message-ID, which is the required part, and then use a second pattern to extract the email addresses from it.
Is there any better way to do this? Can we do it with one pattern matcher set?
Update:
This is the expected output:
-->:sssss.r#abc.com
-->:sssss.rv#abc.com
-->:dsdsdsds.vr#abc.com
-->:pssss.a#abc.com
-->:LOGISTICS#abc.com
-->:gdfddd.p#abc.com
Ideally, you could have used lookarounds:
(?<=To:.*)<([^>]+)>(?=.*Message-ID)
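Unfortunately, Java requires lookbehinds to have a bounded maximum length, so .* cannot be used there. A workaround is to give the lookbehind an explicit upper bound (compile it with Pattern.DOTALL so that . also matches line breaks):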
(?<=To:.{0,1000})<([^>]+)>(?=.*Message-ID)
I think you are looking for all the emails inside <...> that come after To: and before Message-ID. So, you may use a \G based regex for one pass:
Pattern pt = Pattern.compile("(?:\\bTo:|(?!^)\\G).*?<([^>]*)>(?=.*Message-ID)", Pattern.DOTALL);
Matcher m = pt.matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
The regex matches:
(?:\\bTo:|(?!^)\\G) - a leading boundary, either To: as a whole word or the location after the previous successful match
.*? - any characters, as few as possible, up to the first <
<([^>]*)> - substring starting with < followed with zero or more characters other than > (Group 1) and followed with a closing >
(?=.*Message-ID) - a positive lookahead that makes sure there is Message-ID somewhere ahead of the current match.
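To see the \G chaining on something smaller than the full header dump, here is a self-contained sketch that uses the same pattern on a shortened, invented set of headers (the addresses and the class name are placeholders, not taken from the question).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecipientAddresses {
    // "To:" starts the chain; \G continues it from the end of the previous match.
    // (?!^) stops \G from matching at the very start of the input on the first find().
    private static final Pattern ADDRESS = Pattern.compile(
            "(?:\\bTo:|(?!^)\\G).*?<([^>]*)>(?=.*Message-ID)", Pattern.DOTALL);

    // Hypothetical helper: collects every <...> value between "To:" and "Message-ID".
    static List<String> extract(String headers) {
        List<String> result = new ArrayList<>();
        Matcher m = ADDRESS.matcher(headers);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String headers = "From: A <a@example.com>\r\n"
                + "To: B <b@example.com>\r\n"
                + "CC: C <c@example.com>, D\r\n"
                + " <d@example.com>\r\n"
                + "Message-ID: <id@example.com>\r\n"
                + "In-Reply-To: <reply@example.com>";
        extract(headers).forEach(System.out::println);
    }
}
```

The From: address is skipped because nothing before it matches To:, and the Message-ID and In-Reply-To values are skipped because no further Message-ID follows them, so the lookahead fails.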

Java how to setup regex for this string

So I'm trying to pull two strings via a matcher object from one string that is stored in my online databases.
Each string appears after s:64: and is in quotations
Example s:64:"stringhere"
I'm currently trying to get them as so but any regex that I've tried has failed,
Pattern p = Pattern.compile("I don't know what to put as the regex");
Matcher m = p.matcher(data);
So with that said, all I need is the regex that will return the two strings in the matcher so that m.group(1) is my first string and m.group(2) is my second string.
Try this regex:
s:64:\"(.*?)\"
Code:
Pattern pattern = Pattern.compile("s:64:\"(.*?)\"");
Matcher matcher = pattern.matcher(YourStringVar);
// Check all occurrences (at most two)
int count = 0;
while (matcher.find() && count++ < 2) {
System.out.println("Group : " + matcher.group(1));
}
Here group(1) returns the captured string of each match.
OUTPUT:
Group : First Match
Group : Second Match
String data = "s:64:\"first string\" random stuff here s:64:\"second string\"";
Pattern p = Pattern.compile("s:64:\"([^\"]*)\".*s:64:\"([^\"]*)\"");
Matcher m = p.matcher(data);
if (m.find()) {
System.out.println("First string: '" + m.group(1) + "'");
System.out.println("Second string: '" + m.group(2) + "'");
}
prints:
First string: 'first string'
Second string: 'second string'
The regex you need is: Pattern.compile("s:64:\"(.*?)\".*s:64:\"(.*?)\"")
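If the number of s:64: entries is not fixed at two, the single-group pattern can be applied repeatedly and the results collected into a list. This is a sketch under the assumption that the quoted strings never contain an embedded double quote; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SerializedStrings {
    // [^"]* stops at the closing quote, so no lazy quantifier is needed.
    private static final Pattern ENTRY = Pattern.compile("s:64:\"([^\"]*)\"");

    // Hypothetical helper: returns the quoted value of every s:64: entry, in order.
    static List<String> extract(String data) {
        List<String> values = new ArrayList<>();
        Matcher m = ENTRY.matcher(data);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String data = "s:64:\"first string\" random stuff here s:64:\"second string\"";
        List<String> values = extract(data);
        System.out.println("First string: '" + values.get(0) + "'");
        System.out.println("Second string: '" + values.get(1) + "'");
    }
}
```

With this shape, values.get(0) and values.get(1) play the role of m.group(1) and m.group(2) from the two-group pattern.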
