Get link from url and get email by regex

I'm looking for a good regex in Java to extract the URL of every link, and every email address, from a string. So far I have this regex for links:
String linkRegex = "http[s]*://(\\w+\\.)*(\\w+)";
Pattern pattern = Pattern.compile(linkRegex);
Matcher matcher = pattern.matcher(stringAddres);
while (matcher.find()) {
    String currentLink = matcher.group();
}
This gives me links like http://twitter.com, but it also returns incomplete ones like https://google. Is there any way to exclude links like https://google?
I also need a regex that extracts an email address from a string. For example, from this:
href="mailto:contact@example.com">contact@example.com</a></span>
I should get only contact@example.com

Many existing answers offer simple regex patterns that handle the most common email addresses. Still, I would suggest this regex, based on the RFC 5322 standard:
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
Copied from this site.
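For everyday input, a much shorter pattern is often enough. The following is only a sketch under stated assumptions, not the RFC 5322 pattern: the names LinkAndEmailSketch, SIMPLE_EMAIL, LINK and findAll are made up for illustration, the email pattern deliberately ignores quoted local parts and IP-literal domains, and the link pattern is the one from the question with (\\w+\\.)* changed to (\\w+\\.)+ so that a host needs at least one dot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkAndEmailSketch {
    // Simplified email pattern (assumption: no quoted local parts, no IP-literal domains).
    static final Pattern SIMPLE_EMAIL = Pattern.compile(
            "[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,}", Pattern.CASE_INSENSITIVE);

    // Link pattern from the question, with "+" instead of "*" so the host must contain a dot.
    static final Pattern LINK = Pattern.compile("https?://(\\w+\\.)+\\w+");

    // Returns every match of the pattern in the text, in order of appearance.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> results = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            results.add(matcher.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "href=\"mailto:contact@example.com\">contact@example.com</a></span>";
        System.out.println(findAll(SIMPLE_EMAIL, html));
        System.out.println(findAll(LINK, "http://twitter.com and https://google"));
    }
}
```

The address appears twice in the sample markup (once in the href, once as link text), so the email list contains it twice; collecting into a LinkedHashSet instead would remove the duplicate.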

I would just use look-behind to lock onto the interesting attributes in the text, and then just capture everything in the "...".
Like this
((?<=href="mailto:)|(?<=src="))[^"]+
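As a rough illustration of how that look-behind pattern could be used from Java, here is a minimal sketch. The class name AttributeValueSketch, the method extract and the sample markup are assumptions made up for the example; only the pattern itself comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttributeValueSketch {
    // The look-behind anchors the match right after href="mailto: or src=",
    // then [^"]+ consumes everything up to (not including) the closing quote.
    static final Pattern ATTRIBUTE_VALUE =
            Pattern.compile("((?<=href=\"mailto:)|(?<=src=\"))[^\"]+");

    // Returns every attribute value found, in order of appearance.
    static List<String> extract(String html) {
        List<String> values = new ArrayList<>();
        Matcher matcher = ATTRIBUTE_VALUE.matcher(html);
        while (matcher.find()) {
            values.add(matcher.group()); // group 0 is the value; look-behinds consume nothing
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        String html = "<a href=\"mailto:contact@example.com\">mail</a> "
                + "<img src=\"http://twitter.com/logo.png\">";
        System.out.println(extract(html));
    }
}
```

Because the look-behind is zero-width, matcher.group() already holds just the value, so no capture-group bookkeeping is needed.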

Related

Java regex only finding one match

I'm using the following regex:
(?<=<((Pswrd>)|([^/]{1,2147483646}?:Pswrd>)))((?s).+?)(?=</(\\1))
And I have the following text to match:
<abc:Pswrd>PASSWORD_ONE</abc:Pswrd>
<Pswrd>PASSWORD_TWO</Pswrd>
I need to match the content of both XML tags, but it is only working for the second one.
The output is:
PASSWORD_TWO
And it should be:
PASSWORD_ONE
PASSWORD_TWO
It seems the OR is not working for some reason?
String message = " <abc:Pswrd>PASSWORD_ONE</abc:Pswrd>\n" +
        " <Pswrd>PASSWORD_TWO</Pswrd>";
String regex = "(?<=<((Pswrd>)|([^/]{1,2147483646}?:Pswrd>)))((?s).+?)(?=</(\\1))";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(message);
while (matcher.find()) {
    String group = matcher.group();
    System.out.println(group);
}
Thanks
Update: It needs to be the matching group 0.

To match any of <Pswrd>, <abc:Pswrd>, or <something:Pswrd>, the regex would need to look something like <\w*:*Pswrd>. The problem, however, is that look-behind does not cope well with quantifiers of unbounded width, so you can't write a look-behind that caters for a "dynamic" prefix of arbitrary length.
Instead, I would suggest something simple, such as:
(?<=Pswrd>)(.*)(?=<\/)
Essentially, you look only for the last part of the opening tag (namely "Pswrd>"), then match anything between that and the start of the closing tag.
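A minimal sketch of that simpler pattern applied to the input from the question follows. The class name PswrdSketch and the method passwords are assumptions for illustration. Note that the sketch relies on each element sitting on its own line: without the DOTALL flag, . does not cross a newline, which is what stops the greedy .* from running into the next element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PswrdSketch {
    // Fixed-width look-behind on the tail of the opening tag, look-ahead on "</".
    // The forward slash needs no escaping in a Java regex.
    static final Pattern PSWRD = Pattern.compile("(?<=Pswrd>)(.*)(?=</)");

    // Returns the text content of every Pswrd element, assuming one element per line.
    static List<String> passwords(String xml) {
        List<String> found = new ArrayList<>();
        Matcher matcher = PSWRD.matcher(xml);
        while (matcher.find()) {
            found.add(matcher.group()); // group 0, as the question requires
        }
        return found;
    }

    public static void main(String[] args) {
        String message = " <abc:Pswrd>PASSWORD_ONE</abc:Pswrd>\n"
                + " <Pswrd>PASSWORD_TWO</Pswrd>";
        System.out.println(passwords(message));
    }
}
```

If several elements can share a line, this pattern is no longer sufficient, and an XML parser is likely the safer tool.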

Regular Expression to find entire link in string

I have a regular expression in apex that is only grabbing part of the link I need in a string. I need it to grab the entire link.
Here is what I'm working with:
String myvar = 'this is an example http://test.com/testing/123654123%0A%0A%0A%';
String myvar1 = '(?:(?:(?:[a-z0-9]{3,9}:(?://)?)(?:[-;:&=+$,w]+#)?[a-z0-9.-]+|(?:www.|[-;:&=+$??,w]+#)[a-z0-9.-]+)((?:/[+~%/.w-]*)?\\??(?:[-+=&;%#.w]*)#?w*)?)';
Pattern MyPattern = Pattern.compile(myvar1);
Matcher MyMatcher = MyPattern.matcher(myvar);
while (MyMatcher.find()) {
    System.debug(MyMatcher.group());
    Location = MyMatcher.group();
}
This is only returning http://test.com/
I need http://test.com/testing/123654123
How can I modify the regular expression to provide the complete link?
I just need to modify my existing regex to accomplish this. How can I keep as much of the regular expression I'm using as possible?
(?:(?:(?:[a-z0-9]{3,9}:(?://)?)(?:[-;:&=+$,w]+#)?[a-z0-9.-]+|(?:www.|[-;:&=+$??,w]+#)[a-z0-9.-]+)((?:/[+~%/.w-]*)?\\??(?:[-+=&;%#.w]*)#?w*)?)

Use this regex:
https?:\/\/[a-zA-Z0-9.\/-]*
Online demo http://regexr.com/3d7j7
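Here is a small sketch of that pattern in use. It is written in Java rather than Apex, on the assumption that the two Pattern/Matcher APIs behave alike for this case; the class name FullLinkSketch and the method firstLink are made up for the example. The escaped slashes are dropped because / has no special meaning in a Java regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FullLinkSketch {
    // Scheme, then any run of letters, digits, dots, slashes and hyphens.
    // '%' is not in the character class, so the match stops before "%0A".
    static final Pattern LINK = Pattern.compile("https?://[a-zA-Z0-9./-]*");

    // Returns the first link in the text, or null if there is none.
    static String firstLink(String text) {
        Matcher matcher = LINK.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String myvar = "this is an example http://test.com/testing/123654123%0A%0A%0A%";
        System.out.println(firstLink(myvar));
    }
}
```

The same character class also excludes ?, = and &, so a link with a query string would be cut at the question mark; widen the class if query strings matter.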

Java Regex pattern that matches in any online tester but doesn't in Eclipse

I have a piece of code that I can't get working in Eclipse with Java 1.7 installed.
I want to use a regex to match and extract 2 strings from every match, so I am using 2 groups for this.
I have tested my expression in many online regex testers and it works there, but it isn't working in my Java project in Eclipse.
The source string looks like any one of these:
Formal Language: isNatural
Annotation Tool: isHuman%Human Annotator: isHuman
Hybrid Annotation: conceptType%Hybrid Annotation Tool: conceptType%Hybrid Tagset: conceptType
... and so on.
I want to extract the first words before the ":" and the word after for every match.
The regex I'm using is this:
(\w*\s*\w+):(\s+\w+)%{0,1}
And the snippet of code:
String attribute = parts[0];
Pattern pattern = Pattern.compile("(\\w*\\s*\\w+):(\\s+\\w+)%{0,1}");
Matcher matcher = pattern.matcher(attribute);
OWLDataProperty dataProp = null;
if (matcher.matches()) {
    while (matcher.find()) {
        String name = null, domain = null;
        domain = matcher.group(1);
        name = matcher.group(2);
        dataProp = factory.getOWLDataProperty(":" + Introspector.decapitalize(name), pm);
        OWLClass domainClass = factory.getOWLClass(":" + domain.replaceAll(" ", ""), pm);
        OWLDataPropertyDomainAxiom domainAxiom = factory.getOWLDataPropertyDomainAxiom(dataProp, domainClass);
        manager.applyChange(new AddAxiom(ontology, domainAxiom));
    }
}
Does anybody know why it's not working?
Many thanks.

When you use matches(), you are asking whether the entire string you provided matches your regex. It is as if you added ^ at the beginning of your regex and $ at the end.
Your regex is otherwise fine, and returns what you expect. I recommend testing it on regexplanet.com in Java mode. You will see when matches() is true, when it is false, and what each find() will return.
To solve your problem, I think you only need to remove the if (matcher.matches()) condition.
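The difference can be seen in a short sketch that leaves out the OWL calls and only collects the two groups. The class name MatchesVersusFind and the method extractPairs are assumptions for the example. One further detail worth knowing: after a successful matches(), the matcher sits at the end of the input, so a find() on the same matcher comes back empty-handed unless reset() is called first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVersusFind {
    static final Pattern PAIR = Pattern.compile("(\\w*\\s*\\w+):(\\s+\\w+)%{0,1}");

    // Collects "domain -> name" for every pair, using find() only.
    static List<String> extractPairs(String attribute) {
        List<String> pairs = new ArrayList<>();
        Matcher matcher = PAIR.matcher(attribute);
        while (matcher.find()) {
            String domain = matcher.group(1);
            String name = matcher.group(2).trim(); // group 2 includes the leading whitespace
            pairs.add(domain + " -> " + name);
        }
        return pairs;
    }

    public static void main(String[] args) {
        String attribute = "Annotation Tool: isHuman%Human Annotator: isHuman";
        // matches() must cover the whole input, which holds two pairs, so it fails here.
        System.out.println(PAIR.matcher(attribute).matches());
        System.out.println(extractPairs(attribute));

        // With a single pair, matches() succeeds but leaves nothing for find().
        Matcher single = PAIR.matcher("Formal Language: isNatural");
        System.out.println(single.matches());
        System.out.println(single.find());
    }
}
```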

extracting a specific link pattern using java pattern matcher

Let's say I have a link like below along with a bunch of other links
http://testttt.com/met?tag1=x&tag2=y&tag3=z%20a
I would like to extract the entire link if it starts with http://testttt.com/met
I tried doing the following but it didn't work
Pattern pattern = Pattern.compile("http://testttt.com/met?[a-zA-Z][0-9]");
Matcher match = pattern.matcher("http://testttt.com/met?tag1=x&tag2=y&tag3=z%20a");
if (match.find()) {
    System.out.println("match found");
}

Why not just use
if (str.startsWith("http://testttt.com/met")) {
...
}

If your string only contains the url, use the answer proposed by Reimeus. If you're trying to extract the url from a bunch of text, you can use this pattern:
Pattern pattern = Pattern.compile("http://testttt\\.com/met\\??[^\\s]*");
It contains all the necessary escapes and matches everything up to the next whitespace.
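To show the pattern picking one link out of surrounding text, here is a minimal sketch. The class name MetLinkSketch, the method extractLinks and the sample sentence (including http://other.com/page) are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetLinkSketch {
    // Escaped dot, optional literal '?', then everything up to the next whitespace.
    static final Pattern MET_LINK =
            Pattern.compile("http://testttt\\.com/met\\??[^\\s]*");

    // Returns every link in the text that starts with http://testttt.com/met
    static List<String> extractLinks(String text) {
        List<String> links = new ArrayList<>();
        Matcher matcher = MET_LINK.matcher(text);
        while (matcher.find()) {
            links.add(matcher.group());
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical surrounding text with one matching and one unrelated link.
        String text = "see http://testttt.com/met?tag1=x&tag2=y&tag3=z%20a "
                + "and http://other.com/page";
        System.out.println(extractLinks(text));
    }
}
```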

Use RegEx in Java to extract parameters in between parentheses

I'm writing a utility to extract the names of header files from JSPs. I have no problem reading the JSPs line by line and finding the lines I need. I am having a problem extracting the specific text needed using regex. After looking at many similar questions I'm hitting a brick wall.
An example of the String I'll be matching from within is:
<jsp:include page="<%=Pages.getString(\"MY_HEADER\")%>" flush="true"></jsp:include>
All I need is MY_HEADER for this example. Any time I have this tag:
<%=Pages.getString
I need what comes between this:
<%=Pages.getString(\" and this: )%>
Here is what I have currently (which is not working, I might add) :
String currentLine;
while ((currentLine = fileReader.readLine()) != null)
{
    Pattern pattern = Pattern.compile("<%=Pages\\.getString\\(\\\\\"([^\\\\]*)");
    Matcher matcher = pattern.matcher(currentLine);
    while (matcher.find()) {
        System.out.println(matcher.group(1).toString());
    }
}
I need to be able to use the Java RegEx API and regex to extract those header names.
Any help on this issue is greatly appreciated. Thanks!
EDIT:
Resolved this issue, thankfully. The tricky part was that, even with the right regex, I had to take into account that the String I was feeding to the regex would always contain two \" sequences, as in (\"MY_HEADER\"), and these needed to be escaped in the pattern.
Here is what worked (thanks to the help ;-)):
Pattern pattern = Pattern.compile("<%=Pages\\.getString\\(\\\\\"([^\\\\\"]*)");

This should do the trick:
<%=Pages\\.getString\\(\\\\\"([^\\\\]*)
Yeah that's a scary number of back slashes. matcher.group(1) should return MY_HEADER. It starts at the \" and matches everything until the next \ (which I assume here will be at \")%>.)
Of course, if your target text contains a backslash (\), this will not work. But you didn't give an indication that you'd ever be looking for something like <%=Pages.getString(\"Fun!\Yay!\")%> -- where this regex would only return Fun! and ignore the rest.
EDIT
The reason your test case was failing is because you were using this test string:
String currentLine = "<%=Pages.getString(\"MY_HEADER\")%>";
This is the equivalent of reading it in from a file and seeing:
<%=Pages.getString("MY_HEADER")%>
Note the lack of any \. You need to use this instead:
String currentLine = "<%=Pages.getString(\\\"MY_HEADER\\\")%>";
Which is the equivalent of what you want.
This is test code that works:
String currentLine = "<%=Pages.getString(\\\"MY_HEADER\\\")%>";
Pattern pattern = Pattern.compile("<%=Pages\\.getString\\(\\\\\"([^\\\\]*)");
Matcher matcher = pattern.matcher(currentLine);
while (matcher.find()) {
    System.out.println(matcher.group(1).toString());
}
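Since much of the confusion above comes from whether the line really contains a backslash before each quote, one option is a pattern that tolerates both forms. This is a variation, not the pattern from the answer: the backslash is made optional with ?, and the capture stops at either a quote or a backslash. The class name HeaderNameSketch and the method headerName are assumptions for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderNameSketch {
    // The optional backslash lets the pattern accept both
    // getString(\"NAME\") and getString("NAME").
    static final Pattern HEADER =
            Pattern.compile("<%=Pages\\.getString\\(\\\\?\"([^\"\\\\]*)");

    // Returns the header name, or null when the line has no Pages.getString call.
    static String headerName(String line) {
        Matcher matcher = HEADER.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Line as it appears in the JSP file, with literal backslashes.
        String escaped = "<jsp:include page=\"<%=Pages.getString(\\\"MY_HEADER\\\")%>\" flush=\"true\">";
        // Same call without the backslashes.
        String plain = "<%=Pages.getString(\"MY_HEADER\")%>";
        System.out.println(headerName(escaped));
        System.out.println(headerName(plain));
    }
}
```

Compiling the pattern once as a constant, rather than inside the read loop, also avoids recompiling it for every line.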
