I need to extract the region from an S3 virtual-hosted-style URI like this:
https://<bucket_name>.s3.<region_name>.amazonaws.com/
My intention is to get the region name from the S3 URI and pass it when creating an S3Client with the AWS SDK for Java 2.x.
Can anyone help me with the regex for this?
Try this:
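import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        String url = "https://mybucket.s3.us-west-1.amazonaws.com/";
        // group 1 = the region segment between ".s3." and ".amazonaws.com"
        Pattern p = Pattern.compile("https://.*?\\.s3\\.([^.]+)\\.amazonaws\\.com");
        Matcher matcher = p.matcher(url);
        if (matcher.find()) {
            System.out.println(matcher.group(1)); // us-west-1
        }
    }
}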
matcher.group(0) gives you the entire match (everything from https:// up to and including .com). matcher.group(1) gives you the first capturing group, i.e. the part of the regex enclosed in the first pair of parentheses.
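If the region is only needed for the client builder, it may be safer to let java.net.URI pull out the host first and match only the host name, so the regex never sees the object key. The sketch below assumes the virtual-hosted-style host <bucket>.s3.<region>.amazonaws.com and also accepts the older dash form <bucket>.s3-<region>.amazonaws.com; the class and method names are made up for this example. It returns Optional.empty() for hosts without a region segment (such as bucket.s3.amazonaws.com), leaving the fallback to the caller.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the region from an S3 virtual-hosted-style URI.
class S3RegionFromUri {
    // Matches the host only. Group 1 = bucket (may contain dots), group 2 = region.
    // Accepts "<bucket>.s3.<region>.amazonaws.com" and "<bucket>.s3-<region>.amazonaws.com".
    private static final Pattern S3_HOST =
            Pattern.compile("^(.+)\\.s3[.-]([a-z0-9-]+)\\.amazonaws\\.com$");

    static Optional<String> region(String uri) {
        String host = URI.create(uri).getHost(); // throws IllegalArgumentException on malformed input
        if (host == null) {
            return Optional.empty();
        }
        Matcher m = S3_HOST.matcher(host);
        return m.matches() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(region("https://mybucket.s3.us-west-1.amazonaws.com/"));          // Optional[us-west-1]
        System.out.println(region("https://my.bucket.s3-eu-west-1.amazonaws.com/key.txt"));  // Optional[eu-west-1]
        System.out.println(region("https://mybucket.s3.amazonaws.com/"));                    // Optional.empty
    }
}
```

With the AWS SDK for Java 2.x, the extracted value could then be passed to the builder, for example S3Client.builder().region(Region.of(region)).build().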
I have a log as a string, and I am trying to capture the error message from it. The regex I tried did not work.
String unescapedStr = "Retrying for error: [[\"billing\",\\{u'non_field_errors': [u'Invalid payment email provided']}\"]]";
I need to extract the error message, which is
Invalid payment email provided
How can I extract this using a regex?
I tried the pattern Retrying for error: (\\.+), but it doesn't work:
String pattern = "Retrying for error: (\\.+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(unescapedStr);
if (m.find()) {
error = m.group(1);
}
Expected: Invalid payment email provided, actual: null
How can I get the expected result?
You don't have to use a regex. How about this:
String x = "Retrying for error: [[\"billing\",\\{u'non_field_errors': [u'Invalid payment email provided']}\"]]";
String c = x.replace("Retrying for error: ","");
String g = c.substring(c.lastIndexOf('[')+1);
String v = g.substring(0, g.indexOf(']'));
System.out.println(v);
This prints:
u'Invalid payment email provided'
Now, do your logs have multiple instances of Retrying for error:? Also, does this
"Retrying for error: [[\"billing\",\\{u'non_field_errors': [u'Invalid payment email provided']}\"]]";
represent a single line in your logs?
The main idea is this:
If each line in your log file has just one instance of Retrying for error:, then you can easily parse the log one line at a time and iteratively strip away the parts you don't need.
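The line-by-line idea could look roughly like this. It is a sketch that assumes every error line has the shape shown in the question: it takes the text inside the last [...] on the line and also strips the Python-style u'...' wrapper that the snippet above leaves in place. The class and method names, and the second sample line, are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: pulls one message out of each "Retrying for error:" line.
class LogErrorParser {
    private static final String MARKER = "Retrying for error: ";

    static List<String> errorMessages(String log) {
        List<String> messages = new ArrayList<>();
        for (String line : log.split("\\R")) {          // \R = any line break
            int start = line.indexOf(MARKER);
            if (start < 0) {
                continue;                               // not an error line
            }
            String rest = line.substring(start + MARKER.length());
            int open = rest.lastIndexOf('[');           // innermost list: [u'...']
            int close = rest.indexOf(']', open + 1);
            if (open < 0 || close < 0) {
                continue;                               // no bracketed message
            }
            String msg = rest.substring(open + 1, close).trim();
            // Drop a Python unicode-literal wrapper such as u'...'
            if (msg.length() >= 3 && msg.startsWith("u'") && msg.endsWith("'")) {
                msg = msg.substring(2, msg.length() - 1);
            }
            messages.add(msg);
        }
        return messages;
    }

    public static void main(String[] args) {
        String log = "INFO starting\n"
                + "Retrying for error: [[\"billing\",\\{u'non_field_errors': [u'Invalid payment email provided']}\"]]\n"
                + "Retrying for error: [[\"billing\",\\{u'non_field_errors': [u'Card declined']}\"]]";
        System.out.println(errorMessages(log)); // [Invalid payment email provided, Card declined]
    }
}
```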
You may use the following regex:
Retrying for error:.*\[u'([^']+)
See the regex demo.
Details
Retrying for error: - a literal substring
.* - any 0+ chars other than line break chars, as many as possible
\[u' - a [u' substring
([^']+) - Capturing group #1 (matcher.group(1) value): 1+ chars other than '.
See the Java demo:
String unescapedStr = "Retrying for error: [[\"billing\",\\{u'non_field_errors': [u'Invalid payment email provided']}\"]]";
String pattern = "Retrying for error:.*\\[u'([^']+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(unescapedStr);
if (m.find()) {
System.out.println(m.group(1));
}
// => Invalid payment email provided
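Two follow-ups, sketched under the same assumptions about the log format. First, the question's pattern returns null because \\. in the Java string is an escaped, literal dot, so (\\.+) needs one or more '.' characters right after "error: ", while the input has '[' there. Second, since .* does not cross line breaks unless Pattern.DOTALL is set, calling find() in a loop yields one message per log line. The class name and the second sample line are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RetryErrorRegex {
    // Pattern from the answer above; without DOTALL, ".*" stays on one line.
    static final Pattern ERROR = Pattern.compile("Retrying for error:.*\\[u'([^']+)");

    static List<String> allErrors(String log) {
        List<String> out = new ArrayList<>();
        Matcher m = ERROR.matcher(log);
        while (m.find()) {           // one match per "Retrying for error:" line
            out.add(m.group(1));
        }
        return out;
    }

    // The question's pattern: "\\.+" means one or more literal dots.
    static boolean questionPatternFinds(String log) {
        return Pattern.compile("Retrying for error: (\\.+)").matcher(log).find();
    }

    public static void main(String[] args) {
        String log = "Retrying for error: [[\"billing\",\\{u'non_field_errors': [u'Invalid payment email provided']}\"]]\n"
                + "Retrying for error: [[\"billing\",\\{u'non_field_errors': [u'Card declined']}\"]]";
        System.out.println(allErrors(log));            // [Invalid payment email provided, Card declined]
        System.out.println(questionPatternFinds(log)); // false
    }
}
```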
I want to parse all Google Maps links inside a String. The formats are as follows:
1st example:
https://www.google.com/maps/place/white+house/#38.8976763,-77.0387185,17z/data=!3m1!4b1!4m5!3m4!1s0x89b7b7bcdecbb1df:0x715969d86d0b76bf!8m2!3d38.8976763!4d-77.0365298
https://www.google.com/maps/place/white+house/#38.8976763,-77.0387185,17z
https://www.google.com/maps/place//#38.8976763,-77.0387185,17z
https://maps.google.com/maps/place//#38.8976763,-77.0387185,17z
https://www.google.com/maps/place/#38.8976763,-77.0387185,17z
https://google.com/maps/place/#38.8976763,-77.0387185,17z
http://google.com/maps/place/#38.8976763,-77.0387185,17z
https://www.google.com.tw/maps/place/#38.8976763,-77.0387185,17z
These are all valid Google Maps URLs (linking to the White House).
Here is what I tried:
String gmapLinkRegex = "(http|https)://(www\\.)?google\\.com(\\.\\w*)?/maps/(place/.*)?#(.*z)[^ ]*";
Pattern patternGmapLink = Pattern.compile(gmapLinkRegex , Pattern.CASE_INSENSITIVE);
Matcher m = patternGmapLink.matcher(s);
while (m.find()) {
logger.info("group0 = {}" , m.group(0));
String place = m.group(4);
place = StringUtils.stripEnd(place , "/"); // remove trailing '/'
place = StringUtils.removeStart(place , "place/"); // remove leading 'place/' (stripStart would strip any of the chars p, l, a, c, e, /)
logger.info("place = '{}'" , place);
String latLngZ = m.group(5);
logger.info("latLngZ = '{}'" , latLngZ);
}
It works in simple situations, but it is still buggy.
For example:
It needs post-processing to extract the optional place information.
And it cannot extract two URLs from one line, such as:
s = "https://www.google.com/maps/place//#38.8976763,-77.0387185,17z " +
" and http://google.com/maps/place/#38.8976763,-77.0387185,17z";
That should give two URLs, but the regex matches the whole line.
My requirements:
The whole URL should be matched in group(0), including the trailing data part in the 1st example.
In the 1st example, if the zoom level 17z is removed, it is still a valid Google Maps URL, but my regex cannot match it.
The optional place information should be easier to extract.
Latitude/longitude extraction is required; the zoom level is optional.
It should be able to parse multiple URLs in one line.
It should be able to handle maps.google.com(.xx)/maps; I tried (www|maps\.)?, but it still seems buggy.
Any suggestions for improving this regex? Thanks a lot!
The dot-asterisk .* is greedy, so it will always extend the match as far as it can: in (place/.*)?# it runs on to the last # on the line, i.e. into the last URL.
You need "tighter" regexes, which match a single URL but not several with anything in between.
The "[^ ]*" might include the next URL if the two are separated by something other than a plain space, such as a line break, a tab or another kind of space.
I propose (sorry, not tested in Java) using "anything but #", "digit, minus, comma or dot", and "an optional special string followed by a tailored character set, repeated".
"(http|https)://(www\.)?google\.com(\.\w*)?/maps/(place/[^#]*)?#([0123456789\.,-]*z)(\/data=[\!:\.\-0123456789abcdefmsx]+)?"
I tested the one above on a Perl-compatible regex engine (Notepad++).
Please adapt it yourself if I guessed anything wrong. The explicit list of digits can probably be replaced by "\d"; I tried to minimise assumptions about the regex flavor.
In order to match "URL" or "URL and URL", store the regex in a variable, then build "(URL and )*URL", replacing "URL" with that variable. (Assuming this is possible in Java.) If the question is how to then retrieve the multiple matches: that is Java, and I cannot help there. Let me know and I will delete this answer, so as not to provoke deserved downvotes ;-)
(Edited to capture the data part of the first line of the first example, which I had not seen before, and to handle multiple URLs on one line.)
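In Java, multiple matches come from calling Matcher.find() in a loop, so the "(URL and )*URL" construction should not be needed once each match is tight enough to stop at the end of one URL. The sketch below is a variation on the pattern above, not a tested drop-in: it keeps the "anything but #" place part (also excluding whitespace and '/'), spells out latitude and longitude, makes the zoom optional, limits the /data= tail to a small character set, and accepts maps.google.com and country domains such as google.com.tw. The named groups (place, lat, lng, zoom) and the class name are illustrative. The URLs in this thread put '#' before the coordinates, while real Google Maps place links usually use '@' there (for example /place/White+House/@38.8976763,-77.0365298,17z), so the sketch accepts either.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GoogleMapsLinks {
    // Sketch of a "tighter" Google Maps URL pattern; each match stops at the end of one URL.
    static final Pattern GMAP = Pattern.compile(
            "https?://(?:(?:www|maps)\\.)?google\\.com(?:\\.\\w+)?/maps/"
            + "(?:place/(?<place>[^@#\\s/]*)/*)?"                          // optional place name
            + "[@#](?<lat>-?\\d+(?:\\.\\d+)?),(?<lng>-?\\d+(?:\\.\\d+)?)"  // required lat,lng
            + "(?:,(?<zoom>\\d+(?:\\.\\d+)?z))?"                           // optional zoom, e.g. 17z
            + "(?:/data=[!:.\\-0-9a-zA-Z]+)?",                             // optional /data=... tail
            Pattern.CASE_INSENSITIVE);

    // Returns every Google Maps URL found in the text, in order.
    static List<String> findUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = GMAP.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }

    public static void main(String[] args) {
        String s = "https://www.google.com/maps/place//#38.8976763,-77.0387185,17z "
                + " and http://google.com/maps/place/#38.8976763,-77.0387185,17z";
        Matcher m = GMAP.matcher(s);
        while (m.find()) {
            // group("place") is null when the URL has no "place/" segment; group("zoom") is null without zoom
            System.out.println(m.group() + " place='" + m.group("place") + "' lat=" + m.group("lat")
                    + " lng=" + m.group("lng") + " zoom=" + m.group("zoom"));
        }
    }
}
```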
I wrote this regex to validate Google Maps links:
"(http:|https:)?\\/\\/(www\\.)?(maps.)?google\\.[a-z.]+\\/maps/?([\\?]|place/*[^#]*)?/*#?(ll=)?(q=)?(([\\?=]?[a-zA-Z]*[+]?)*/?#{0,1})?([0-9]{1,3}\\.[0-9]+(,|&[a-zA-Z]+=)-?[0-9]{1,3}\\.[0-9]+(,?[0-9]+(z|m))?)?(\\/?data=[\\!:\\.\\-0123456789abcdefmsx]+)?"
I tested it with the following list of Google Maps links:
String location1 = "http://www.google.com/maps/place/21.01196755,105.86306012";
String location2 = "https://www.google.com.tw/maps/place/#38.8976763,-77.0387185,17z";
String location3 = "http://www.google.com/maps/place/21.01196755,105.86306012";
String location4 = "https://www.google.com/maps/place/white+house/#38.8976763,-77.0387185,17z/data=!3m1!4b1!4m5!3m4!1s0x89b7b7bcdecbb1df:0x715969d86d0b76bf!8m2!3d38.8976763!4d-77.0365298";
String location5 = "https://www.google.com/maps/place/white+house/#38.8976763,-77.0387185,17z";
String location6 = "https://www.google.com/maps/place//#38.8976763,-77.0387185,17z";
String location7 = "https://maps.google.com/maps/place//#38.8976763,-77.0387185,17z";
String location8 = "https://www.google.com/maps/place/#38.8976763,-77.0387185,17z";
String location9 = "https://google.com/maps/place/#38.8976763,-77.0387185,17z";
String location10 = "http://google.com/maps/place/#38.8976763,-77.0387185,17z";
String location11 = "https://www.google.com/maps/place/#/data=!4m2!3m1!1s0x3135abf74b040853:0x6ff9dfeb960ec979";
String location12 = "https://maps.google.com/maps?q=New+York,+NY,+USA&hl=no&sll=19.808054,-63.720703&sspn=54.337928,93.076172&oq=n&hnear=New+York&t=m&z=10";
String location13 = "https://www.google.com/maps";
String location14 = "https://www.google.fr/maps";
String location15 = "https://google.fr/maps";
String location16 = "http://google.fr/maps";
String location17 = "https://www.google.de/maps";
String location18 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location19 = "https://www.google.de/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location20 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4&layer=t&lci=com.panoramio.all,com.google.webcams,weather";
String location21 = "https://www.google.com/maps?ll=37.370157,0.615234&spn=45.047033,93.076172&t=m&z=4&layer=t";
String location22 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location23 = "https://www.google.de/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4";
String location24 = "https://www.google.com/maps?ll=37.0625,-95.677068&spn=45.197878,93.076172&t=h&z=4&layer=t&lci=com.panoramio.all,com.google.webcams,weather";
String location25 = "https://www.google.com/maps?ll=37.370157,0.615234&spn=45.047033,93.076172&t=m&z=4&layer=t";
String location26 = "http://www.google.com/maps/place/21.01196755,105.86306012";
String location27 = "http://google.com/maps/bylatlng?lat=21.01196022&lng=105.86298748";
String location28 = "https://www.google.com/maps/place/C%C3%B4ng+vi%C3%AAn+Th%E1%BB%91ng+Nh%E1%BA%A5t,+354A+%C4%90%C6%B0%E1%BB%9Dng+L%C3%AA+Du%E1%BA%A9n,+L%C3%AA+%C4%90%E1%BA%A1i+H%C3%A0nh,+%C4%90%E1%BB%91ng+%C4%90a,+H%C3%A0+N%E1%BB%99i+100000,+Vi%E1%BB%87t+Nam/#21.0121535,105.8443773,13z/data=!4m2!3m1!1s0x3135ab8ee6df247f:0xe6183d662696d2e9";
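Whether a list like this "validates" may depend on how the pattern is applied. Because every group after /maps is optional, Matcher.find() reports a hit for any string that merely contains something like //google.com/maps, whereas Matcher.matches() requires the whole string to fit. The harness below copies the pattern above unchanged and checks three of the listed links both ways; it is a sketch, and the class and method names are made up. Note also that (maps.)? contains an unescaped dot, so it matches "maps" followed by any character.

```java
import java.util.regex.Pattern;

class GoogleMapsLinkCheck {
    // Pattern copied verbatim from the answer above.
    static final Pattern GMAP = Pattern.compile(
            "(http:|https:)?\\/\\/(www\\.)?(maps.)?google\\.[a-z.]+\\/maps/?([\\?]|place/*[^#]*)?/*#?(ll=)?(q=)?(([\\?=]?[a-zA-Z]*[+]?)*/?#{0,1})?([0-9]{1,3}\\.[0-9]+(,|&[a-zA-Z]+=)-?[0-9]{1,3}\\.[0-9]+(,?[0-9]+(z|m))?)?(\\/?data=[\\!:\\.\\-0123456789abcdefmsx]+)?");

    static boolean wholeStringMatches(String url) {
        return GMAP.matcher(url).matches();   // the entire input must fit the pattern
    }

    static boolean containsMatch(String url) {
        return GMAP.matcher(url).find();      // a matching substring anywhere is enough
    }

    public static void main(String[] args) {
        String[] samples = {
            "https://www.google.com/maps",                                              // location13
            "https://www.google.com/maps/place/white+house/#38.8976763,-77.0387185,17z", // location5
            "https://maps.google.com/maps?q=New+York,+NY,+USA&hl=no&sll=19.808054,-63.720703&sspn=54.337928,93.076172&oq=n&hnear=New+York&t=m&z=10" // location12
        };
        for (String s : samples) {
            System.out.println(wholeStringMatches(s) + " / " + containsMatch(s) + "  " + s);
        }
    }
}
```

For location12, matches() fails at the comma after "New+York", while find() still succeeds on the https://maps.google.com/maps prefix.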
This is my tree grammar:
grammar t;
options{
output = AST;
}
type
:
'NVARCHAR' -> "VARCHAR"
;
ANTLR 3.1.3 says:
syntax error: antlr: t.g:12:5: unexpected token: 'NVARCHAR'
What's wrong here? I took it from this article.
P.S. I'm using this grammar later to get an AST out of it. Once I have the AST, I walk through it and append every token's text to a string buffer. The point of the rewrite above is to replace certain tokens. I'm doing language-to-language mapping (SQL to an SQL dialect, to be more specific).
Note the first sentence Terence starts with: "just had some cool ideas about a semantic rule specification language...". That's what the first example is: an idea. It's not valid syntax.
There are (at least) two options for you:
1. Rewrite the token's text immediately, in the lexer rule:
grammar T;
options{
output=AST;
}
@parser::members {
public static void main(String[] args) throws Exception {
TLexer lexer = new TLexer(new ANTLRStringStream("NVARCHAR"));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.type();
}
}
type
: NVARCHAR {System.out.println("token=" + $NVARCHAR.text);}
;
NVARCHAR
: 'NVARCHAR' {setText("VARCHAR");}
;
But this only changes the text, not the type of the token, which remains NVARCHAR.
2. Use an imaginary token:
grammar T;
options{
output=AST;
}
tokens {
VARCHAR='VARCHAR';
}
@parser::members {
public static void main(String[] args) throws Exception {
TLexer lexer = new TLexer(new ANTLRStringStream("NVARCHAR"));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.type();
}
}
type
: NVARCHAR -> VARCHAR
;
NVARCHAR
: 'NVARCHAR'
;
which changes the text and type of the token.
As you can see, with both demos, token=VARCHAR is being printed to the console:
bart@hades:~/Programming/ANTLR/Demos/T$ java -cp antlr-3.3.jar org.antlr.Tool T.g
bart@hades:~/Programming/ANTLR/Demos/T$ javac -cp antlr-3.3.jar *.java
bart@hades:~/Programming/ANTLR/Demos/T$ java -cp .:antlr-3.3.jar TParser
token=VARCHAR
In ANTLR 4, you can replace both the text and the type in a lexer rule, using an action for the text and the type(...) command for the type:
OldTokenType:
('Token1' | 'Token2' | 'Token3' ) {setText("New Token");}
-> type(NewTokenType);
I am trying to replace URLs inside a String with browser-compatible HTML links.
My initial String looks like this:
"hello, i'm some text with an url like http://www.the-url.com/ and I need to have an hypertext link !"
What I want to get is a String like this:
"hello, i'm some text with an url like <a href="http://www.the-url.com/">http://www.the-url.com/</a> and I need to have an hypertext link !"
I can catch the URL with this line:
String withUrlString = myString.replaceAll(".*://[^<>[:space:]]+[[:alnum:]/]", "HereWasAnURL");
Maybe the regex needs some correction; it seems to work, but I still need to test it further.
So the question is: how do I keep the text caught by the regex and just add what's needed to create the link, i.e. <a href="caught string">caught string</a>?
Thanks in advance for your interest and responses!
Try to use:
myString.replaceAll("(.*://[^<>[:space:]]+[[:alnum:]/])", "<a href=\"$1\">HereWasAnURL</a>");
I didn't check your regex.
Parentheses create capturing groups, and $1 in the replacement string refers to group 1,
so $1 is replaced by the matched URL.
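For reference, a sketch of the same group-reference idea written in Java regex syntax, with two points worth checking against your data. Java's Pattern does not support POSIX bracket expressions such as [:space:] or [:alnum:]; the Java equivalents are \s (or \p{Space}) and \p{Alnum}. And a leading .* lets the match start at the beginning of the line, so everything before "://" (here "hello, i'm some text with an url like http") would be replaced as well; spelling out the scheme avoids that. $0 in the replacement refers to the whole match. The class name and the exact character set are illustrative.

```java
import java.util.regex.Pattern;

class Linkify {
    // Scheme spelled out instead of ".*"; the URL may not contain whitespace, '<', '>' or '"',
    // and must end in a letter, digit or '/', so trailing punctuation such as '.' stays outside.
    private static final Pattern URL =
            Pattern.compile("\\b(?:https?|ftp)://[^\\s<>\"]*[\\p{Alnum}/]");

    static String linkify(String text) {
        // $0 is the whole match; the matched text is inserted literally
        return URL.matcher(text).replaceAll("<a href=\"$0\">$0</a>");
    }

    public static void main(String[] args) {
        System.out.println(linkify(
                "hello, i'm some text with an url like http://www.the-url.com/ and I need to have an hypertext link !"));
    }
}
```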
I asked a similar question: my question
Some examples: Capturing Text in a Group in a regular expression
public static String textToHtmlConvertingURLsToLinks(String text) {
if (text == null) {
return text;
}
String escapedText = HtmlUtils.htmlEscape(text);
return escapedText.replaceAll("(\\A|\\s)((http|https|ftp|mailto):\\S+)(\\s|\\z)",
"$1<a href='$2'>$2</a>$4");
}
There may be better regexes out there, but this does the trick as long as there is whitespace after the end of the URL or the URL is at the end of the text. This particular implementation also uses org.springframework.web.util.HtmlUtils to escape any other HTML that may have been entered.
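One consequence of consuming the surrounding whitespace in groups 1 and 4: when two URLs are separated by a single space, the first match uses up that space, so the second URL has no \s left in front of it and is not linked. A sketch of one way around this, assuming the same scheme list, replaces the leading group with the zero-width lookbehind (?<!\S), which checks the previous character without consuming it; the greedy \S+ already stops at the next whitespace. The escapeHtml helper is a simplified, hypothetical stand-in for HtmlUtils.htmlEscape so the example runs without Spring; it only handles &, <, > and ".

```java
import java.util.regex.Pattern;

class UrlLinker {
    // The URL must start at the beginning of the text or right after whitespace,
    // but that whitespace is not part of the match; \S+ runs up to the next whitespace.
    private static final Pattern URL =
            Pattern.compile("(?<!\\S)((?:http|https|ftp|mailto):\\S+)");

    // Simplified stand-in for HtmlUtils.htmlEscape (hypothetical; covers only & < > ").
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    static String textToHtmlConvertingURLsToLinks(String text) {
        if (text == null) {
            return null;
        }
        return URL.matcher(escapeHtml(text)).replaceAll("<a href=\"$1\">$1</a>");
    }

    public static void main(String[] args) {
        System.out.println(textToHtmlConvertingURLsToLinks("see http://a.com http://b.com <b>"));
    }
}
```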
For anybody looking for a more robust solution, I can suggest the Twitter Text libraries.
Replacing the URLs with this library works like this:
new Autolink().autolink(plainText)
The code below replaces links starting with "http" or "https", links starting with just "www.", and finally email addresses.
Pattern httpLinkPattern = Pattern.compile("(http[s]?)://(www\\.)?([\\S&&[^.@]]+)(\\.[\\S&&[^@]]+)");
Pattern wwwLinkPattern = Pattern.compile("(?<!http[s]?://)(www\\.+)([\\S&&[^.@]]+)(\\.[\\S&&[^@]]+)");
Pattern mailAddressPattern = Pattern.compile("[\\S&&[^@]]+@([\\S&&[^.@]]+)(\\.[\\S&&[^@]]+)");
String textWithHttpLinksEnabled =
"ajdhkas www.dasda.pl/asdsad?asd=sd www.absda.pl maiandrze@asdsa.pl klajdld http://dsds.pl httpsda http://www.onet.pl https://www.onsdas.plad/dasda";
if (Objects.nonNull(textWithHttpLinksEnabled)) {
Matcher httpLinksMatcher = httpLinkPattern.matcher(textWithHttpLinksEnabled);
textWithHttpLinksEnabled = httpLinksMatcher.replaceAll("<a href='$0'>$0</a>");
final Matcher wwwLinksMatcher = wwwLinkPattern.matcher(textWithHttpLinksEnabled);
textWithHttpLinksEnabled = wwwLinksMatcher.replaceAll("<a href='http://$0'>$0</a>");
final Matcher mailLinksMatcher = mailAddressPattern.matcher(textWithHttpLinksEnabled);
textWithHttpLinksEnabled = mailLinksMatcher.replaceAll("<a href='mailto:$0'>$0</a>");
System.out.println(textWithHttpLinksEnabled);
}
Prints:
ajdhkas <a href='http://www.dasda.pl/asdsad?asd=sd'>www.dasda.pl/asdsad?asd=sd</a> <a href='http://www.absda.pl'>www.absda.pl</a> <a href='mailto:maiandrze@asdsa.pl'>maiandrze@asdsa.pl</a> klajdld <a href='http://dsds.pl'>http://dsds.pl</a> httpsda <a href='http://www.onet.pl'>http://www.onet.pl</a> <a href='https://www.onsdas.plad/dasda'>https://www.onsdas.plad/dasda</a>
Assuming your regex works to capture the correct info, you can use backreferences in your substitution. See the Java regexp tutorial.
In that case, you'd do the following (in a Java replacement string, group 1 is written $1, not \1):
myString.replaceAll(....., "$1")
In case of multiline text you can use this:
text.replaceAll("(?m)(\\s|^|\\A)((http|https|ftp|mailto):\\S+)(\\s|$|\\z)",
"$1<a href='$2'>$2</a>$4");
And here is a full example from my code, where I need to show users' posts that contain URLs:
private static final Pattern urlPattern = Pattern.compile(
"(?m)(\\s|^|\\A)((http|https|ftp|mailto):\\S+)(\\s|$|\\z)");
String userText = ""; // user content from db
String replacedValue = HtmlUtils.htmlEscape(userText);
replacedValue = urlPattern.matcher(replacedValue).replaceAll("$1<a href='$2'>$2</a>$4");
replacedValue = StringUtils.replace(replacedValue, "\n", "<br>");
System.out.println(replacedValue);