Java/Android regex: test whether a string contains a link - java

Pattern.compile("((http\\://|https\\://|ftp\\://|sftp\\://)|(www.))+((\\S+):(\\S+)@)?+(([a-zA-Z0-9\\.-]+\\.[a-zA-Z]{2,4})|([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}))(/[a-zA-Z0-9%:/-_\\?\\.'~]*)?");
I have this pattern, and I'd like to test whether my string contains a link.
I'd like to linkify that text in a TextView.
The code does not work when the link contains a & character.
Full code:
Pattern httpMatcher = Pattern.compile("((http\\://|https\\://|ftp\\://|sftp\\://)|(www.))+((\\S+):(\\S+)@)?+(([a-zA-Z0-9\\.-]+\\.[a-zA-Z]{2,4})|([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}))(/[a-zA-Z0-9%:/-_\\?\\.'~]*)?");
String httpViewURL = "myhttp://";
Linkify.addLinks(label, httpMatcher, httpViewURL);

I think this is cleaner than using a regex:
boolean isLink(String s) {
    try {
        new URL(s); // throws if s has no protocol or an unknown one
        return true;
    } catch (MalformedURLException e) {
        return false;
    }
}
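The check above validates one whole string. If the goal is to find out whether a longer text contains a link somewhere, one option is to split the text on whitespace and test each token. This is only a minimal sketch in plain Java (no Android classes); the names LinkCheck, isLink and containsLink are made up for illustration. Note that a bare "www.example.com" has no protocol and is rejected by this approach.

```java
import java.net.MalformedURLException;
import java.net.URL;

class LinkCheck {
    // True if s parses as a URL with a protocol the JVM knows (for example http or https).
    static boolean isLink(String s) {
        try {
            new URL(s);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // True if any whitespace-separated token of text passes isLink.
    static boolean containsLink(String text) {
        for (String token : text.trim().split("\\s+")) {
            if (isLink(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsLink("see http://example.com/a?b=1&c=2 for details"));
        System.out.println(containsLink("no link here"));
    }
}
```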

You can use Patterns.WEB_URL:
public boolean isLink(String string) {
    return Patterns.WEB_URL.matcher(string).matches();
}
Note that the Patterns class is available only from API level 8 onward, but you can get its source code here: https://github.com/android/platform_frameworks_base/blob/master/core/java/android/util/Patterns.java

Pattern httpMatcher = Pattern.compile("((http\\://|https\\://)|(www.))+((\\S+):(\\S+)@)?+(([a-zA-Z0-9\\.-]+\\.[a-zA-Z]{2,4})|([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}))(/[a-zA-Z0-9%&#-:/-_\\?\\.'~]*)?");
This is working now, thanks.
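Why the & matters can be seen with a reduced version of the pattern. In the question's path class, "/-_" is read as a range from '/' to '_', which happens to cover '=' and '?' but not '&', so matching stops at the first '&'. The sketch below uses a simplified host part and hypothetical names (AmpersandDemo, WITHOUT_AMP, WITH_AMP, firstLink); it is not the exact pattern from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AmpersandDemo {
    // Simplified host part, followed by the path class from the question (no '&').
    static final Pattern WITHOUT_AMP = Pattern.compile(
            "https?://[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}(/[a-zA-Z0-9%:/-_\\?\\.'~]*)?");

    // Same host part; the path class lists '&', '=' and '#' explicitly and escapes '-'.
    static final Pattern WITH_AMP = Pattern.compile(
            "https?://[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,4}(/[a-zA-Z0-9%&=#:/\\-_?.'~]*)?");

    // Returns the first match of p in text, or null if there is none.
    static String firstLink(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "open http://example.com/search?q=a&page=2 now";
        System.out.println(firstLink(WITHOUT_AMP, text)); // link is cut off before '&'
        System.out.println(firstLink(WITH_AMP, text));    // full link including the query
    }
}
```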

Related

Replace Text using Apache POI in Header/Footer not working when a dot is inside the placeholder

I use templ4docx/Apache POI (2.0.3/3.17).
There you can set a VariablePattern like this:
private static final String START_PATTERN = "#{";
private static final String END_PATTERN = "}";
...
targetDocx.setVariablePattern(new VariablePattern(START_PATTERN, END_PATTERN));
When I use a placeholder containing a dot, it does not work inside the header/footer. In the body, placeholders with dots work, and images also work with dotted placeholders.
Example in the Word template:
#{Person.Name} // works in Body, NOT in Header/Footer!
#{PersonName} // works in Body and Header/Footer!
#{Person} // works in Body and Header/Footer!
#{Image.Name} // works in Body and Header/Footer! Here I use ImageVariable instead of TextVariable.
I debugged the code and saw that "run.setText()" is called with the right text, but the final document does not contain it.
@Override
public void insert(Insert insert, Variable variable) {
    if (!(insert instanceof TextInsert)) {
        return;
    }
    if (!(variable instanceof TextVariable)) {
        return;
    }
    TextInsert textInsert = (TextInsert) insert;
    TextVariable textVariable = (TextVariable) variable;
    for (XWPFRun run : textInsert.getParagraph().getRuns()) {
        String text = run.getText(0);
        if (StringUtils.contains(text, textInsert.getKey().getKey())) {
            text = StringUtils.replace(text, textVariable.getKey(), textVariable.getValue());
            if (text.contains("\n")) {
                String[] textLines = text.split("\n");
                run.setText(textLines[0], 0);
                for (int i = 1; i < textLines.length; i++) {
                    run.addBreak();
                    run.setText(textLines[i]);
                }
            } else {
                run.setText(text, 0);
            }
        }
    }
}
Any ideas why it doesn't work with the placeholder "#{Person.Name}" as a TextVariable in the header/footer, while it works with "#{PersonName}" and the ImageVariable "#{Images.Logo1}"?
It looks like Word sometimes splits a placeholder, so you'll only find parts of it in the individual runs.
In the class "TextInsertStrategy" I now check, in the loop over the runs, for split placeholders and treat them accordingly. With that I could solve the problem.
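The answer does not show its code, so the following is only a sketch of the general idea, not the actual templ4docx change. Plain strings stand in for the text of each XWPFRun, and the names RunMerger, mergeSplitPlaceholders and isOpen are hypothetical. A run that opens a placeholder without closing it absorbs the following runs until the end marker appears; absorbed runs are emptied rather than removed. The sketch assumes the start marker itself is never split across runs.

```java
import java.util.ArrayList;
import java.util.List;

class RunMerger {
    // Joins placeholders that were split across consecutive runs.
    static List<String> mergeSplitPlaceholders(List<String> runTexts, String start, String end) {
        List<String> merged = new ArrayList<>(runTexts);
        for (int i = 0; i < merged.size(); i++) {
            String text = merged.get(i);
            int j = i + 1;
            // While the last placeholder opened in this run is unclosed, pull in the next run.
            while (isOpen(text, start, end) && j < merged.size()) {
                text += merged.get(j);
                merged.set(j, ""); // keep the run count stable, just empty the absorbed run
                j++;
            }
            merged.set(i, text);
        }
        return merged;
    }

    // True if text contains a start marker with no end marker after it.
    static boolean isOpen(String text, String start, String end) {
        int open = text.lastIndexOf(start);
        return open >= 0 && text.indexOf(end, open + start.length()) < 0;
    }
}
```

With real runs, the merged text would be written back with run.setText(text, 0) before the normal replacement step runs.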

How to match string that ends with a number using XPath

The issue is that I'm looking to construct an XPath expression to get nodes having attributes XXX having values like TP* where the star is a number.
Suppose I have this XML file
<tagA attA="VAL1">text</tagA>
<tagB attB="VAL333">text</tagB>
<tagA attA="VAL2">text</tagA>
<tagA attA="V2">text</tagA>
So the XPath expression should get me all tagA elements having the attribute attrA with values matching the pattern VAL*
//tagA[@attrA[matches('VAL\d')]]: is not working
If you need an XPath 1.0 solution, try the following:
//tagA[boolean(number(substring-after(@attA, "VAL"))) or number(substring-after(@attA, "VAL")) = 0]
If @attA cannot be "VAL0", then just
//tagA[boolean(number(substring-after(@attA, "VAL")))]
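The XPath 1.0 expression can be tried with the javax.xml.xpath API that ships with the JDK. The sketch below wraps the question's sample elements in a root element (an assumption, since the question shows no root) and returns the matching attribute values; the class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPath10Demo {
    // Returns the attA values of all tagA elements whose attA is "VAL" followed by a number.
    static List<String> matchingValues(String xml) {
        String expr = "//tagA[boolean(number(substring-after(@attA, 'VAL')))"
                + " or number(substring-after(@attA, 'VAL')) = 0]/@attA";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><tagA attA=\"VAL1\">text</tagA><tagB attB=\"VAL333\">text</tagB>"
                + "<tagA attA=\"VAL2\">text</tagA><tagA attA=\"V2\">text</tagA></root>";
        System.out.println(matchingValues(xml));
    }
}
```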
matches() requires XPath 2.0, but javax.xml.xpath in Java 8 supports only XPath 1.0.
Furthermore, the first argument of matches() is the string to match. So, you'd want:
//tagA[@attrA[matches(., 'VAL\d')]]
This is looking for "VAL" plus a single digit anywhere in the attribute value of @attrA. See the regex in @jschnasse's answer if you wish to match the entire string with multiple/optional digit suffixes (XPath 2.0) or Andersson's answer for an XPath 1.0 solution.
Add a quantifier (*,+,...) to your \d. Try
'^VAL\d*$'
As @kjhughes has pointed out, this will not work with standard Java, because even the current version, Java 11, does not support XPath 2.0.
You can however use Saxon if you need XPath 2.0 support.
Saxon Example (It is a variant of this answer using javax.xml)
Processor processor = new Processor(false);
@Test
public void xpathWithSaxon() {
    String xml = "<root><tagA attA=\"VAL1\">text</tagA>\n" + "<tagB attB=\"VAL333\">text</tagB>\n"
            + "<tagA attA=\"VAL2\">text</tagA>\n" + "<tagA attA=\"V2\">text</tagA>\n" + "</root>";
    try (InputStream in = new ByteArrayInputStream(xml.getBytes("utf-8"));) {
        processFilteredXmlWith(in, "//root/tagA[matches(@attA,'^VAL\\d*$')]", (node) -> {
            printItem(node, System.out);
        });
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
private void printItem(XdmItem node, PrintStream out) {
    out.println(node);
}
public void processFilteredXmlWith(InputStream in, String xpath, Consumer<XdmItem> process) {
    XdmNode doc = readXmlWith(in);
    XdmValue list = filterNodesByXPathWith(doc, xpath);
    list.forEach((node) -> {
        process.accept(node);
    });
}
private XdmNode readXmlWith(InputStream xmlin) {
    try {
        return processor.newDocumentBuilder().build(new StreamSource(xmlin));
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
private XdmValue filterNodesByXPathWith(XdmNode doc, String xpathExpr) {
    try {
        return processor.newXPathCompiler().evaluate(xpathExpr, doc);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
Prints
<tagA attA="VAL1">text</tagA>
<tagA attA="VAL2">text</tagA>

Converting String to java.net.URI

First apologies, I'm mainly a Perl person doing some Java. I've read some literature but can't get this to give me the signature that I need:
    logger.debug("Entered addRelationships");
    boolean rval = true;
    for(int i=0;i<relationships.length;i++)
    {
        URI converted_uri ;
        try {
            converted_uri = new URI("relationships[i].datatype") ;
        } catch (URISyntaxException e) {
            logger.error("Error converting datatype", e);
            return rval = false ;
        }
        boolean r = addRelationship(context, relationships[i].subject,
                relationships[i].predicate, relationships[i].object,
                relationships[i].isLiteral, converted_uri);
        if(r==false)
        {
            rval = false;
        }
    }
    return rval;
}
The resulting error is:
addRelationship(org.fcrepo.server.Context,java.lang.String,java.lang.String,java.lang.String,boolean,java.lang.String) in org.fcrepo.server.management.DefaultManagement cannot be applied to (org.fcrepo.server.Context,java.lang.String,java.lang.String,java.lang.String,boolean,java.net.URI)
It seems to me that converted_uri is a URI at the end of this? datatype was a String in the previous release, so no gymnastics were required!
Just remove the quotes:
converted_uri = new URI(relationships[i].datatype) ;
When you use quotes, exactly as in Perl, you are dealing with a string literal. If you want to refer to a variable, you have to name it in the code directly, without quotes.
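The difference between the literal and the variable can be seen in isolation. A small sketch using only java.net.URI; the helper parseOrNull and the sample datatype value are made up for illustration and do not come from the question's code base.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriDemo {
    // Returns the parsed URI, or null if text is not a syntactically valid URI.
    static URI parseOrNull(String text) {
        try {
            return new URI(text);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String datatype = "http://www.w3.org/2001/XMLSchema#string"; // example value only

        URI fromVariable = parseOrNull(datatype);                   // parses the variable's content
        URI fromLiteral = parseOrNull("relationships[i].datatype"); // parses the quoted text itself

        System.out.println(fromVariable);
        System.out.println(fromLiteral); // '[' is not allowed in a URI path, so parsing fails

        // A URI can be turned back into a String when a method expects one.
        System.out.println(fromVariable.toString().equals(datatype));
    }
}
```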
While @AlexR pointed out another problem in your code, it's not the cause of the problem you identified in your question.
You had a compile error, whereas an error in the syntax of the URI, like the one that @AlexR identified, will only show up at runtime.
The problem you have is that you are trying to pass a URI as the last argument, but the method addRelationship expects a String as the last argument. That's what the error says.
(The first part of the error says what the signature of the method is in reality, as you see it ends in java.lang.String, while the second part of the error says what type of data you are trying to give to the method, and as you see that one ends in java.net.URI)
So it seems that the parameter type was not changed to URI as you expected; the method still takes a String.
Solution is to change your code to:
boolean rval = true;
for(int i = 0; i < relationships.length; i++)
{
    boolean r = addRelationship(context, relationships[i].subject,
            relationships[i].predicate, relationships[i].object,
            relationships[i].isLiteral, relationships[i].datatype);
    if (!r)
    {
        rval = false;
    }
}
return rval;

Android / Java: Check if url is valid youtube url

I want to check whether a URL is a valid YouTube URL so that I can show it in a view; otherwise I will hide the view.
Is there a regular expression in Java that can help me check whether the URL is valid? Currently I am using this regex, but I guess it's not the one I want:
String youTubeURl = "https://www.youtube.com/watch?v=Btr8uOU0BkI";
String pattern = "https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*";
if (!youTubeURl.isEmpty() && youTubeURl.matches(pattern)) {
    // Valid youtube URL
}
else{
    // Not Valid youtube URL
}
This works for me.
public static boolean isYoutubeUrl(String youTubeURl)
{
    boolean success;
    String pattern = "^(http(s)?:\\/\\/)?((w){3}.)?youtu(be|.be)?(\\.com)?\\/.+";
    if (!youTubeURl.isEmpty() && youTubeURl.matches(pattern))
    {
        success = true;
    }
    else
    {
        // Not Valid youtube URL
        success = false;
    }
    return success;
}
If you want to retrieve the Youtube videoId you can use the following function.
public static String getVideoIdFromYoutubeUrl(String youtubeUrl)
{
    /*
       Possible YouTube URLs.
       http://www.youtube.com/watch?v=WK0YhfKqdaI
       http://www.youtube.com/embed/WK0YhfKqdaI
       http://www.youtube.com/v/WK0YhfKqdaI
       http://www.youtube-nocookie.com/v/WK0YhfKqdaI?version=3&hl=en_US&rel=0
       http://www.youtube.com/watch?v=WK0YhfKqdaI
       http://www.youtube.com/watch?feature=player_embedded&v=WK0YhfKqdaI
       http://www.youtube.com/e/WK0YhfKqdaI
       http://youtu.be/WK0YhfKqdaI
    */
    String pattern = "(?<=watch\\?v=|/videos/|embed\\/|youtu.be\\/|\\/v\\/|\\/e\\/|watch\\?v%3D|watch\\?feature=player_embedded&v=|%2Fvideos%2F|embed%2F|youtu.be%2F|%2Fv%2F)[^#\\&\\?\\n]*";
    Pattern compiledPattern = Pattern.compile(pattern);
    // youtubeUrl is the YouTube URL from which you want to extract the id.
    Matcher matcher = compiledPattern.matcher(youtubeUrl);
    if (matcher.find()) {
        return matcher.group();
    }
    return null;
}
You should use
Patterns.WEB_URL.matcher(youTubeURl).matches()
It will return true if the URL is valid and false if it is invalid.
Use android.webkit.URLUtil.isValidUrl(java.lang.String) to check whether the URL is valid. Then you can check whether the URL's host is YouTube.
For example:
private boolean isValidUrl(String url) {
    if (url == null) {
        return false;
    }
    if (URLUtil.isValidUrl(url)) {
        // Check whether the host of the URL is YouTube
        Uri uri = Uri.parse(url);
        if ("www.youtube.com".equals(uri.getHost())) {
            return true;
        }
        // Alternatively, you can check the URL prefix:
        //if (url.startsWith("https://www.youtube.com/")) {
        //return true;
        //}
    }
    // In any other case
    return false;
}
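The same host-based idea can be sketched without Android classes, using java.net.URI. The host list below is an assumption and may be incomplete, and the class name YoutubeHostCheck is made up for illustration. Strings without a scheme, such as "youtube.com/iwGFalTRHDA", have no host in URI terms and are rejected by this version.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class YoutubeHostCheck {
    // Hosts treated as YouTube here; this list is an assumption and may be incomplete.
    private static final List<String> HOSTS =
            Arrays.asList("youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be");

    // True if url is a syntactically valid URI whose host is one of HOSTS.
    static boolean isYoutubeUrl(String url) {
        if (url == null) {
            return false;
        }
        try {
            String host = new URI(url.trim()).getHost();
            return host != null && HOSTS.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isYoutubeUrl("https://www.youtube.com/watch?v=Btr8uOU0BkI"));
        System.out.println(isYoutubeUrl("https://example.com/?next=www.youtube.com"));
    }
}
```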
In order to achieve what you want you should use this Regex like so:
private static final Pattern youtubePattern = Pattern.compile("^(http(s)?:\\/\\/)?((w){3}.)?youtu(be|.be)?(\\.com)?\\/.+");
private boolean isValid = youtubePattern.matcher(youtubeUrl).matches();
where youtubeUrl can be any URL of the following list:
http://youtu.be/t-ZRX8984sc
http://youtube.com/watch?v=iwGFalTRHDA
http://www.youtube.com/watch?v=t-ZRX8984sc
http://www.youtube.com/watch?v=iwGFalTRHDA&feature=related
http://www.youtube.com/embed/watch?feature=player_embedded&v=r5nB9u4jjy4
https://www.youtube.com/channel/UCDZkgJZDyUnqwB070OyP72g
youtube.com/n17B_uFF4cA
youtube.com/iwGFalTRHDA
http://youtu.be/n17B_uFF4cA
https://youtube.com/iwGFalTRHDA
https://youtube.com/channel/UCDZkgJZDyUnqwB070OyP72g
This regex matches any type of URL related to youtube.com.
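One thing to be aware of: the dots in "((w){3}.)" and "(be|.be)" are unescaped, so they match any character, and the pattern accepts some strings that are not YouTube addresses. The sketch below compares it with a stricter variant. STRICT is a hypothetical alternative, not part of the answer, and it only covers the youtube.com and youtu.be hosts with an optional "www." prefix.

```java
import java.util.regex.Pattern;

class YoutubePatternDemo {
    // The pattern from the answer, written as a Java string literal.
    static final Pattern LOOSE = Pattern.compile(
            "^(http(s)?:\\/\\/)?((w){3}.)?youtu(be|.be)?(\\.com)?\\/.+");

    // Hypothetical stricter variant: literal dots and an explicit host list.
    static final Pattern STRICT = Pattern.compile(
            "^(https?://)?(www\\.)?(youtube\\.com|youtu\\.be)/.+$");

    public static void main(String[] args) {
        String[] samples = {
            "http://youtu.be/t-ZRX8984sc",
            "youtube.com/iwGFalTRHDA",
            "wwwXyoutube.com/x" // not a YouTube address
        };
        for (String s : samples) {
            System.out.println(s + " loose=" + LOOSE.matcher(s).matches()
                    + " strict=" + STRICT.matcher(s).matches());
        }
    }
}
```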
This doesn't check whether the URL is valid. It only checks for the strings "youtube" and "youtu.be" in your URL. The regex is as follows:
String yourUrl = "https://youtu.be/Xh0-x1RFEOY";
yourUrl.matches(".*(youtube|youtu.be).*")
The ".*" at the beginning and end means that anything may appear to the left and right of the expression (youtube or youtu.be) you are checking for.
NOTE: This has nothing to do with the validity of the URL.

MongoDB regex, I get a different answer from the Java API compared with the console

I must be doing my regex wrong.
In the console I do
db.triples.find({sub_uri: /.*pdf.*/ }); and get the desired result.
My Java class looks like this, (I have set input="pdf"):
public static List<Triple> search(String input){
    DB db=null;
    try {
        db = Dao.getDB();
    }
    catch (UnknownHostException e1) { e1.printStackTrace(); }
    catch (MongoException e1) { e1.printStackTrace(); }
    String pattern = "/.*"+input+".*/";
    System.out.println(input);
    List<Triple> triples = new ArrayList<Triple>();
    DBCollection triplesColl = null;
    try {
        triplesColl = db.getCollection("triples"); } catch (MongoException e) { e.printStackTrace();}
    {
        Pattern match = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
        BasicDBObject query = new BasicDBObject("sub_uri", match);
        // finds all people with "name" matching /joh?n/i
        DBCursor cursor = triplesColl.find(query);
        if(cursor.hasNext()){
            DBObject tripleAsBSON = cursor.next();
            Triple t = new Triple();
            t.setSubject(new Resource((String)tripleAsBSON.get("sub_uri")));
            System.out.println(t.getSubject().getUri());
            triples.add(t);
        }
    }
    return triples;
}
From the console I get 12 results as I should, from the Java code I get no results.
Java doesn't need/understand regex delimiters (/ around the regex). You need to remove them:
String pattern = ".*"+input+".*";
I'm also not sure if that regex is really what you want. At least you should anchor it:
String pattern = "^.*"+input+".*$";
and compile it using the Pattern.MULTILINE option. This avoids a severe performance penalty if a line doesn't contain your sub-regex input. You are aware that input is a regex, not a verbatim string, right?
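Both points, the stray delimiters and input being treated as a regex, can be checked with java.util.regex alone, without a database. In the sketch below the slashes become literal characters that the text must contain, and Pattern.quote turns user input into a verbatim match. The class and method names are hypothetical, and whether quoting is wanted depends on whether callers are meant to pass regex syntax.

```java
import java.util.regex.Pattern;

class RegexInputDemo {
    // True if the regex matches somewhere in text.
    static boolean containsRaw(String regex, String text) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(text).find();
    }

    // True if input occurs verbatim (ignoring case) somewhere in text.
    static boolean containsLiteral(String input, String text) {
        return Pattern.compile(Pattern.quote(input), Pattern.CASE_INSENSITIVE)
                .matcher(text).find();
    }

    public static void main(String[] args) {
        String uri = "http://example.org/doc.pdf";
        System.out.println(containsRaw("/.*pdf.*/", uri)); // slashes are matched literally
        System.out.println(containsRaw(".*pdf.*", uri));
        System.out.println(containsLiteral("pdf", uri));
        System.out.println(containsRaw("a.b", "axb"));     // '.' acts as a wildcard
        System.out.println(containsLiteral("a.b", "axb")); // '.' is taken verbatim
    }
}
```

A Pattern built this way can be passed to the query in place of the one compiled from the delimited string.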
