How can a Java regex match everything before a given substring? - java

Example:
String url = "http://www.google.com/abcd.jpg";
String url2 = "http://www.google.com/abcd.jpg_xxx.jpg";
I want to match "http://www.google.com/abcd" in both url and url2.
I wrote this regex:
(http://.*\.com/[^(.jpg)]*).jpg
But [^(.jpg)]* doesn't look correct. What should the regex be?

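The dot before jpg needs to be escaped. Escaping the forward slashes is optional in Java (it is harmless, but / is not a regex metacharacter). Use this regex: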
^(http:\/\/.+?\.com\/[^.]+)\.jpg

The reluctant quantifier .*? stops at the first ".jpg":
(http:\/\/.*\.com\/.*?)\.jpg.*
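A minimal Java sketch of the reluctant-quantifier answer above, with the forward slashes left unescaped. The class name JpgPrefixDemo and the helper prefixBeforeJpg are made up for illustration; only the pattern comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JpgPrefixDemo {
    // Reluctant .*? stops at the first ".jpg"; the trailing .* absorbs the rest.
    private static final Pattern PREFIX =
            Pattern.compile("(http://.*\\.com/.*?)\\.jpg.*");

    // Hypothetical helper: returns the part before the first ".jpg",
    // or null if the URL does not match the pattern at all.
    static String prefixBeforeJpg(String url) {
        Matcher m = PREFIX.matcher(url);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(prefixBeforeJpg("http://www.google.com/abcd.jpg"));
        System.out.println(prefixBeforeJpg("http://www.google.com/abcd.jpg_xxx.jpg"));
    }
}
```

Both calls should print http://www.google.com/abcd, because group 1 ends as soon as the first ".jpg" can be matched.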

Related

Remove backslash before forward slash

Context: Google Books API returning unexpected thumbnail URL
OK, so I found the reason for the problem I had in that question.
The URL returned by the Google Books API was something like this:
http:/\/books.google.com\/books\/content?id=0DwKEBD5ZBUC&printsec=frontcover&img=1&zoom=5&source=gbs_api
Going to that URL returns an error, but if I replace each "\/" with "/" it returns the proper page.
Is there something like a Java/Kotlin regex that would change http:/\/books.google.com\/ to http://books.google.com/?
(I know a bit of regex in Python, but I'm clueless in Java/Kotlin.)
Thank you.
You can use triple-quoted string literals (that act as raw string literals where backslashes are treated as literal chars and not part of string escape sequences) + kotlin.text.replace:
val text = """http:/\/books.google.com\/books\/content?id=0DwKEBD5ZBUC&printsec=frontcover&img=1&zoom=5&source=gbs_api"""
print(text.replace("""\/""", "/"))
Output:
http://books.google.com/books/content?id=0DwKEBD5ZBUC&printsec=frontcover&img=1&zoom=5&source=gbs_api
NOTE: you will need to double the backslashes in the regular string literal:
print(text.replace("\\/", "/"))
If you need to use this "backslash + slash" pattern in a regex you will need 2 backslashes in the triple-quoted string literal and 4 backslashes in a regular string literal:
print(text.replace("""\\/""".toRegex(), "/"))
print(text.replace("\\\\/".toRegex(), "/"))
NOTE: There is no need to escape / forward slash in a Kotlin regex declaration as it is not a special regex metacharacter and Kotlin regexps are defined with string literals, not regex literals, and thus do not need regex delimiters (/ is often used as a regex delimiter char in environments that support this notation).
You could first check that the string starts with the protocol followed by optionally escaped slashes, and then replace each backslash followed by a forward slash with a forward slash only:
https?:\\?/\\?/\S+
For example in Java:
String regex = "https?:\\\\?/\\\\?/\\S+";
String string = "http:/\\/books.google.com\\/books\\/content?id=0DwKEBD5ZBUC&printsec=frontcover&img=1&zoom=5&source=gbs_api";
if (string.matches(regex)) {
    // String.replace works on literal text, so "\\/" is one backslash plus a slash
    System.out.println(string.replace("\\/", "/"));
}
Output
http://books.google.com/books/content?id=0DwKEBD5ZBUC&printsec=frontcover&img=1&zoom=5&source=gbs_api
I had the same problem. My URL was:
String url="https:\\/\\/www.dailymotion.com\\/cdn\\/H264-320x240\\/video\\/x83iqpl.mp4?sec=zaJEh8Q2ahOorzbKJTOI7b5FX3QT8OXSbnjpCAnNyUWNHl1kqXq0D9F8iLMFJ0ocg120B-dMbEE5kDQJN4hYIA";
I solved it with this code:
url = url.replace("\\/", "/");
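The answers above distinguish a literal replacement from a regex replacement, which changes how many backslashes the Java source needs. The following is a small sketch of that difference; the class and method names are invented for illustration, and the sample URL is shortened from the question.

```java
public class UnescapeSlashesDemo {
    // Literal replacement: the Java literal "\\/" is the two characters \ and /.
    static String unescapeLiteral(String s) {
        return s.replace("\\/", "/");
    }

    // Regex replacement: the regex needs \\ to match one backslash,
    // so the Java literal needs four backslashes before the slash.
    static String unescapeRegex(String s) {
        return s.replaceAll("\\\\/", "/");
    }

    public static void main(String[] args) {
        // Actual value: http:/\/books.google.com\/books\/content?id=0DwKEBD5ZBUC
        String raw = "http:/\\/books.google.com\\/books\\/content?id=0DwKEBD5ZBUC";
        System.out.println(unescapeLiteral(raw));
        System.out.println(unescapeRegex(raw));
    }
}
```

Both methods should produce http://books.google.com/books/content?id=0DwKEBD5ZBUC. The literal version is usually the simpler choice here, since no pattern features are needed.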

Validating numeric sequence

I am using Java and I need to validate a numeric sequence like this: 9999/9999.
I tried using this regex \\d{4}\\\\d{4}, but I am getting false for matches().
My code:
Pattern regex = Pattern.compile("\\d{4}\\\\d{4}");
if (!regex.matcher(mySequence).matches()) {
    System.out.println("invalid");
} else {
    System.out.println("valid");
}
Can anyone help me, please?
The regex pattern is attempting to match a literal backslash (followed by four d characters) rather than a forward slash followed by four digits. You need to use:
Pattern regex = Pattern.compile("\\d{4}/\\d{4}");
Pattern regex = Pattern.compile("\\d{4}\\\\d{4}");
should be
Pattern regex = Pattern.compile("\\d{4}/\\d{4}");
Change your pattern to:
Pattern regex = Pattern.compile("\\d{4}\\\\\\d{4}");
to match "9999\\9999" (actual value: 9999\9999); in Java you need to escape the backslash when declaring a String.
If you want to match "9999/9999" instead, the solutions above work fine:
Pattern regex = Pattern.compile("\\d{4}/\\d{4}");
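To make the backslash counting in the answers above concrete, here is a small sketch that compiles all three patterns side by side. The class and field names are hypothetical; the patterns are the ones quoted in the question and answers.

```java
import java.util.regex.Pattern;

public class SlashPatternDemo {
    // Regex \d{4}/\d{4}: four digits, a forward slash, four digits.
    static final Pattern FORWARD = Pattern.compile("\\d{4}/\\d{4}");
    // Regex \d{4}\\d{4} (the question's pattern): four digits,
    // one literal backslash, then the letter d four times.
    static final Pattern ORIGINAL = Pattern.compile("\\d{4}\\\\d{4}");
    // Regex \d{4}\\\d{4}: four digits, one literal backslash, four digits.
    static final Pattern BACKSLASH = Pattern.compile("\\d{4}\\\\\\d{4}");

    public static void main(String[] args) {
        System.out.println(FORWARD.matcher("9999/9999").matches());    // forward slash input
        System.out.println(ORIGINAL.matcher("9999/9999").matches());   // the question's failing case
        System.out.println(ORIGINAL.matcher("9999\\dddd").matches());  // what the question's pattern really accepts
        System.out.println(BACKSLASH.matcher("9999\\9999").matches()); // backslash input
    }
}
```

The second check is the only one expected to fail, which mirrors the "invalid" result reported in the question.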

Regex java from javascript

Hello, I have this regex in JavaScript:
var urlregex = new RegExp("((www.)(([a-zA-Z0-9-]){2,}\.){1,4}([a-zA-Z]){2,6}(\/([a-zA-Z-_\/\.0-9#:?=&;,]*)?)?)");
When I try to put it in a Java String, I get this error:
Description Resource Path Location Type
Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ ) CreateActivity.java /SupLink/src/com/supinfo/suplink/activities line 43 Java Problem
So I just want to know what I have to change to make it work in Java, so that I can use it with this function (which runs fine):
private boolean IsMatch(String s, String pattern)
{
    try {
        Pattern patt = Pattern.compile(pattern);
        Matcher matcher = patt.matcher(s);
        return matcher.matches();
    } catch (PatternSyntaxException e) {
        return false;
    }
}
EDIT:
Thank you for your help, now I have this:
private String regex = "((www.)(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)";
But it doesn't match what I really want (regexes are horrible ^^). I would like to match these types of URLs:
www.something.com
something.com
something.com/anotherthing/anything
Can you help me again?
Thanks a lot.
When you create the Java string, you need to escape the backslashes a second time so that Java understands that they are literal backslashes. You can replace all existing backslashes with \\. You also need to escape any Java characters that normally need to be escaped.
Currently your regex requires www at the start. To make it optional, add ? after (www.). You probably also want to escape the . after www so that it only matches a literal dot. Your regex should probably look like this:
"((www\\.)?(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)"
You should escape each \ with a second backslash, something like this:
"((www.)(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)"

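As a rough check of the advice in the answers to the previous question, the sketch below compiles both the regex from the EDIT and the variant with an optional, escaped "www." and runs them against the three sample URLs. The class and field names are made up for illustration.

```java
import java.util.regex.Pattern;

public class UrlRegexDemo {
    // Regex from the question's EDIT: "www" plus any one character is mandatory.
    static final Pattern REQUIRED_WWW = Pattern.compile(
            "((www.)(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)");
    // Variant from the answer: "www." is optional and its dot is escaped.
    static final Pattern OPTIONAL_WWW = Pattern.compile(
            "((www\\.)?(([a-zA-Z0-9-]){2,}\\.){1,4}([a-zA-Z]){2,6}(\\/([a-zA-Z-_\\/\\.0-9#:?=&;,]*)?)?)");

    public static void main(String[] args) {
        String[] samples = {
            "www.something.com",
            "something.com",
            "something.com/anotherthing/anything"
        };
        for (String s : samples) {
            // matches() requires the whole input to match, as in IsMatch above.
            System.out.println(s
                    + " required=" + REQUIRED_WWW.matcher(s).matches()
                    + " optional=" + OPTIONAL_WWW.matcher(s).matches());
        }
    }
}
```

With the optional variant all three samples should match, while the original only accepts the one that starts with www.
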
Java Regex - Extract link from HTML anchor

I have the following code
private String anchorRegex = "\\<\\s*?a\\s+.*?href\\s*?=\\s*?([^\\s]*?).*?\\>";
private Pattern anchorPattern = Pattern.compile(anchorRegex, Pattern.CASE_INSENSITIVE);
String content = getContentAsString();
Matcher matcher = anchorPattern.matcher(content);
while(matcher.find()) {
System.out.println(matcher.group(1));
}
The call to getContentAsString() returns the HTML content from a web page. The problem I'm having is that the only thing that gets printed in my System.out is a space. Can anyone see what's wrong with my regex?
Regex drives me crazy sometimes.
You need to delimit your capturing group from the following .*?. There are probably double quotes " around the href, so use those:
<\s*a\s+.*?href\s*=\s*"(\S*?)".*?>
Your regex contains:
([^\s]*?).*?
The ([^\s]*?) says to reluctantly match non-whitespace characters and save them in a group. But a reluctant *? matches as little as the rest of the pattern allows, and the next part is .*?, which accepts any character. So the group stops at the first possible chance (after zero characters), and it is the .*? that matches the rest of the URL.
The regex you should be using is this:
String anchorRegex = "(?s)<\\s*a\\s+.*?href\\s*=\\s*['\"]([^\\s>]*)['\"]";
This should be able to pull out the href without too much trouble.
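A short sketch of how that regex might be used, next to the pattern from the question for comparison. The class name, the helper extractHrefs and the sample HTML are assumptions made for illustration; regexes like this only cope with simple, well-formed anchors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorHrefDemo {
    // Regex from the answer: the quotes delimit the captured href value.
    private static final Pattern ANCHOR = Pattern.compile(
            "(?s)<\\s*a\\s+.*?href\\s*=\\s*['\"]([^\\s>]*)['\"]",
            Pattern.CASE_INSENSITIVE);
    // Regex from the question: the reluctant group is allowed to stay empty.
    private static final Pattern ORIGINAL = Pattern.compile(
            "\\<\\s*?a\\s+.*?href\\s*?=\\s*?([^\\s]*?).*?\\>",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: collects group 1 of every match of the given pattern.
    static List<String> extract(Pattern pattern, String html) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    static List<String> extractHrefs(String html) {
        return extract(ANCHOR, html);
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://example.com/a\">A</a> and "
                + "<A class='x' HREF='/b'>B</A></p>";
        System.out.println(extractHrefs(html));       // the two href values
        System.out.println(extract(ORIGINAL, html));  // empty strings, as in the question
    }
}
```

For anything beyond simple cases, an HTML parser is usually a more robust choice than a regex.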
The link is in capture group 2. The first version is expanded (free-spacing) and assumes dot-all mode.
Add Java string escaping as necessary.
(?s)
<a
(?=\s)
(?:[^>"']|"[^"]*"|'[^']*')*? (?<=\s) href \s*=\s* (['"]) (.*?) \1
(?:".*?"|'.*?'|[^>]*?)+
>
or not expanded, not dot-all.
<a(?=\s)(?:[^>"']|"[^"]*"|'[^']*')*?(?<=\s)href\s*=\s*(['"])([\s\S]*?)\1(?:"[\s\S]*?"|'[\s\S]*?'|[^>]*?)+>

Simple regex extract folders

What would be the most efficient way to cover all cases when retrieving folder1/folder22
from:
http://localhost:8080/folder1/folder22/file.jpg
or
http://domain.com/folder1/folder22/file.jpg
or
http://127.0.0.0.1:8080/folder1/folder22/file.jpg
So there may be one or more folders/sub-folders. Basically I would like to strip the domain name, the port if present, and the file name at the end.
Thanks for your time.
What about the URL class and getPath()?
Maybe it's not the most efficient way, but one of the simplest I think:
String[] urls = {
    "http://localhost:8080/folder1/folder22/file.jpg",
    "http://domain.com/folder1/folder22/file.jpg",
    "http://127.0.0.0.1:8080/folder1/folder22/file.jpg" };
for (String url : urls)
    System.out.println(new File(new URL(url).getPath()).getParent());
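A variation on the same idea, sketched with java.net.URI and plain string operations instead of File, so the result does not depend on the platform's file separator. The class name and the helper folderPath are hypothetical, and returning the path without a leading slash is a choice made here to match the folder1/folder22 form in the question.

```java
import java.net.URI;

public class FolderPathDemo {
    // Hypothetical helper: returns the folders between the host and the
    // file name, without a leading slash, or "" if there are none.
    static String folderPath(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return "";
        }
        int lastSlash = path.lastIndexOf('/');
        if (lastSlash <= 0) {
            return ""; // no folder component, e.g. "/file.jpg"
        }
        return path.substring(1, lastSlash);
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://localhost:8080/folder1/folder22/file.jpg",
            "http://domain.com/folder1/folder22/file.jpg",
            "http://127.0.0.0.1:8080/folder1/folder22/file.jpg"
        };
        for (String url : urls) {
            System.out.println(folderPath(url));
        }
    }
}
```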
You should probably use Java's URL parser for this, but if it has to be a regex:
\b(?=/).*(?=/[^/\r\n]*)
will match /folder1/folder22 in all your examples.
try {
    Pattern regex = Pattern.compile("\\b(?=/).*(?=/[^/\r\n]*)");
    Matcher regexMatcher = regex.matcher(subjectString);
    if (regexMatcher.find()) {
        ResultString = regexMatcher.group();
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}
Explanation:
\b: Assert position at a word boundary (this will work before a single slash, but not between slashes or after a :)
(?=/): Assert that the next character is a slash.
.*: Match anything until...
(?=/[^/\r\n]*): ...exactly one last / (and anything else except slashes or newlines) follows.
^.+/([^/]+/[^/]+)/[^/]+$
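The two regexes above behave differently once there are more than two folders, which the sketch below tries to show. The class, field and method names are invented for illustration; the lookahead pattern is written with \\r\\n regex escapes, which should be equivalent to the raw characters used in the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FolderRegexDemo {
    // Lookahead version: everything from the first single slash up to the last slash.
    static final Pattern ALL_FOLDERS = Pattern.compile("\\b(?=/).*(?=/[^/\\r\\n]*)");
    // Capturing version: exactly the last two folders before the file name.
    static final Pattern LAST_TWO = Pattern.compile("^.+/([^/]+/[^/]+)/[^/]+$");

    static String allFolders(String url) {
        Matcher m = ALL_FOLDERS.matcher(url);
        return m.find() ? m.group() : null;
    }

    static String lastTwoFolders(String url) {
        Matcher m = LAST_TWO.matcher(url);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String two = "http://localhost:8080/folder1/folder22/file.jpg";
        String three = "http://domain.com/a/b/c/file.jpg";
        System.out.println(allFolders(two));       // keeps the leading slash
        System.out.println(lastTwoFolders(two));   // no leading slash
        System.out.println(allFolders(three));     // all three folders
        System.out.println(lastTwoFolders(three)); // only the last two
    }
}
```

Which one fits depends on whether "one or more folders" means the whole folder path or only the last two levels.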
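One way to get the last two directories from a URL in PHP is the following:
preg_match("/\/([^\/]+\/[^\/]+)\/[^\/]+$/", $path, $matches);
If it matches, $matches[1] contains the two directories, whether $path is a bare path or a full URL. (A repeated capturing group such as ([^\/]+\/){2} would keep only its last repetition, so both directories are captured in a single group here.)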
