Extract YouTube ID with or without RegEx


How can I get the YouTube ID without using a regular expression?
With the method below, the following URLs didn't work:
http://www.youtube.com/e/dQw4w9WgXcQ
http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ
public static String extractYTId(String youtubeUrl) {
String video_id = "";
try {
if(youtubeUrl != null && youtubeUrl.trim().length() > 0 && youtubeUrl.startsWith("http")) {
String expression = "^.*((youtu.be" + "\\/)" + "|(v\\/)|(\\/u\\/w\\/)|(embed\\/)|(watch\\?))\\??v?=?([^#\\&\\?]*).*"; // var regExp = /^.*((youtu.be\/)|(v\/)|(\/u\/\w\/)|(embed\/)|(watch\?))\??v?=?([^#\&\?]*).*/;
//String expression = "^.*(?:youtu.be\\/|v\\/|e\\/|u\\/\\w+\\/|embed\\/|v=)([^#\\&\\?]*).*";
CharSequence input = youtubeUrl;
Pattern pattern = Pattern.compile(expression, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(input);
if(matcher.matches()) {
String groupIndex1 = matcher.group(7);
if(groupIndex1 != null && groupIndex1.length() == 11)
video_id = groupIndex1;
}
}
} catch(Exception e) {
Log.e("YoutubeActivity", "extractYTId " + e.getMessage());
}
return video_id;
}
Other links work fine:
http://www.youtube.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0
http://www.youtube.com/embed/0zM3nApSvMg?rel=0
http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index
http://www.youtube.com/watch?v=0zM3nApSvMg
http://youtu.be/0zM3nApSvMg
http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s
http://youtu.be/dQw4w9WgXcQ
http://www.youtube.com/embed/dQw4w9WgXcQ
http://www.youtube.com/v/dQw4w9WgXcQ
http://www.youtube.com/watch?v=dQw4w9WgXcQ
http://www.youtube-nocookie.com/v/6L3ZvIMwZFM?version=3&hl=en_US&rel=0

You can use the following regex:
^(?:(?:https?:\/\/)?(?:www\.)?)?(youtube(?:-nocookie)?\.com|youtu\.be)\/.*?(?:embed|e|v|watch\?.*?v=)?\/?([a-z0-9]+)
RegEx Breakup:
^: Start of the line anchor
(?:(?:https?:\/\/)?(?:www\.)?)?:
(?:https?:\/\/)?: Match http:// or https:// optionally
(?:www\.)?: Match www. zero or one time
(youtube(?:-nocookie)?\.com|youtu\.be)\/: Match either youtube.com or youtube-nocookie.com or youtu.be followed by /
.*?: Lazy match. Match until the next pattern satisfies.
(?:embed|e|v|watch\?.*?v=)?\/?:
(?:embed|e|v|watch\?.*?v=)?: Match embed or e or v or from watch? to v= or nothing
\/?: Match / zero or one time
([a-z0-9]+): Match one or more alphanumeric characters and add that in the captured group.
Live demo using JavaScript:
var regex = /^(?:(?:https?:\/\/)?(?:www\.)?)?(youtube(?:-nocookie)?\.com|youtu\.be)\/.*?(?:embed|e|v|watch\?.*?v=)?\/?([a-z0-9]+)/i;
// An array of all the youtube URLs
var youtubeLinks = [
'http://www.youtube.com/e/dQw4w9WgXcQ',
'http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ',
'http://www.youtube.com/v/0zM3nApSvMg?fs=1&hl=en_US&rel=0',
'http://www.youtube.com/embed/0zM3nApSvMg?rel=0',
'http://www.youtube.com/watch?v=0zM3nApSvMg&feature=feedrec_grec_index',
'http://www.youtube.com/watch?v=0zM3nApSvMg',
'http://youtu.be/0zM3nApSvMg',
'http://www.youtube.com/watch?v=0zM3nApSvMg#t=0m10s',
'http://youtu.be/dQw4w9WgXcQ',
'http://www.youtube.com/embed/dQw4w9WgXcQ',
'http://www.youtube.com/v/dQw4w9WgXcQ',
'http://www.youtube.com/watch?v=dQw4w9WgXcQ',
'http://www.youtube-nocookie.com/v/6L3ZvIMwZFM?version=3&hl=en_US&rel=0'
];
// An object to store the results
var youtubeIds = {};
// Iterate over the youtube URLs
youtubeLinks.forEach(function(url) {
// Get the value of second captured group to extract youtube ID
var id = "<span class='youtubeId'>" + (url.match(regex) || [0, 0, 'No ID present'])[2] + "</span>";
// Add the URL and the extracted ID in the result object
youtubeIds[url] = id;
});
// Log the object in the browser console
console.log(youtubeIds);
// To show the result on the page
document.getElementById('output').innerHTML = JSON.stringify(youtubeIds, 0, 4);
.youtubeId {
color: green;
font-weight: bold;
}
<pre id="output"></pre>

Your regex is designed for the youtu.be domain, so of course it doesn't work with youtube.com URLs.
Construct a java.net.URL (https://docs.oracle.com/javase/7/docs/api/java/net/URL.html) from your URL string.
Use URL#getQuery() to get the query part.
See "Parse a URI String into Name-Value Collection" for ways to decode the query part into a name-value map, and get the value for the name 'v'.
If there is no query part (as in http://www.youtube.com/e/dQw4w9WgXcQ), use URL#getPath() (which gives you /e/dQw4w9WgXcQ) and parse the video ID from it, e.g., by skipping the first 3 characters: url.getPath().substring(3)
Update: Why not regex? Because the standard JDK URL parser is much more robust. It is tested by the whole Java community, while a regex-based reinvented wheel is tested only by your own code.
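The steps above can be sketched roughly as follows. This is only an illustration under a few assumptions, not the answerer's own code: it uses java.net.URI rather than java.net.URL (a swap made here for convenience; both expose the query and path), the class and method names are made up, it takes the last path segment instead of substring(3) so that youtu.be links are covered too, and it accepts a candidate only if it is 11 characters from letters, digits, '-' and '_' (an assumption that mirrors the length check in the question).

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; the class and method names are illustrative only.
final class YouTubeIdExtractor {

    /** Returns the video ID, or null when no plausible ID is found. */
    static String extractId(String youtubeUrl) {
        if (youtubeUrl == null) {
            return null;
        }
        URI uri;
        try {
            uri = new URI(youtubeUrl.trim());
        } catch (URISyntaxException e) {
            return null; // not a parseable URL
        }

        // Step 1: look for a "v" parameter in the query part.
        String query = uri.getRawQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                if (pair.startsWith("v=") && isPlausibleId(pair.substring(2))) {
                    return pair.substring(2);
                }
            }
        }

        // Step 2: no "v" parameter, so fall back to the last path segment,
        // e.g. "/e/dQw4w9WgXcQ" or "/dQw4w9WgXcQ".
        String path = uri.getPath();
        if (path == null) {
            return null;
        }
        String last = path.substring(path.lastIndexOf('/') + 1);
        return isPlausibleId(last) ? last : null;
    }

    // Assumption: IDs are 11 characters from [A-Za-z0-9_-], matching the
    // length check in the question's code.
    private static boolean isPlausibleId(String s) {
        if (s.length() != 11) {
            return false;
        }
        for (char c : s.toCharArray()) {
            boolean ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '-' || c == '_';
            if (!ok) {
                return false;
            }
        }
        return true;
    }
}
```

Calling YouTubeIdExtractor.extractId("http://www.youtube.com/e/dQw4w9WgXcQ") goes through step 2, while a watch?v=... URL is handled by step 1. URLs that carry the ID only in the fragment (after #) are not covered by this sketch.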

I like to use this function for all YouTube video ids. I pass through the url and return only the id. Check the fiddle below.
var ytSrc = function( url ){
var regExp = /^.*((youtu.be\/)|(v\/)|(\/u\/\w\/)|(embed\/)|(watch\?))\??v?=?([^#\&\?]*).*/;
var match = url.match(regExp);
if (match&&match[7].length==11){
return match[7];
}else{
alert("Invalid URL");
}
}
https://jsfiddle.net/keinchy/tL4thwd7/1/

Related

Two separate patterns and matchers (java)

I'm working on a simple bot for Discord. The first pattern (reading) works fine and gives me the results I'm looking for, but the second one doesn't seem to work and I can't figure out why.
Any help would be appreciated.
public void onMessageReceived(MessageReceivedEvent event) {
if (event.getMessage().getContent().startsWith("!")) {
String output, newUrl;
String word, strippedWord;
String url = "http://jisho.org/api/v1/search/words?keyword=";
Pattern reading;
Matcher matcher;
word = event.getMessage().getContent();
strippedWord = word.replace("!", "");
newUrl = url + strippedWord;
//Output contains the raw text from jisho
output = getUrlContents(newUrl);
//Searching through the raw text to pull out the first "reading: "
reading = Pattern.compile("\"reading\":\"(.*?)\"");
matcher = reading.matcher(output);
//Searching through the raw text to pull out the first "english_definitions: "
Pattern def = Pattern.compile("\"english_definitions\":[\"(.*?)]");
Matcher matcher2 = def.matcher(output);
event.getTextChannel().sendMessage(matcher2.toString());
if (matcher.find() && matcher2.find()) {
event.getTextChannel().sendMessage("Reading: "+matcher.group(1)).queue();
event.getTextChannel().sendMessage("Definition: "+matcher2.group(1)).queue();
}
else {
event.getTextChannel().sendMessage("Word not found").queue();
}
}
}

You have to escape the [ character as \\[ (once for the Java string and once for the regex). You also forgot the closing \".
The correct pattern looks like this:
Pattern def = Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");
In the output, you might want to re-add the \" at the start and end:
event.getTextChannel().sendMessage("Definition: \""+matcher2.group(1) + "\"").queue();
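To see the corrected pattern in isolation, here is a minimal sketch that runs it against a small sample string. The sample text is a made-up fragment shaped like the fields the question searches for, not real output from the jisho API, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
final class DefinitionPatternDemo {

    // Corrected pattern: "[" is escaped and the closing quote is matched.
    static final Pattern DEF =
            Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");

    /** Returns the text captured by group 1, or null when nothing matches. */
    static String firstDefinition(String json) {
        Matcher m = DEF.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up sample input (an assumption, not a real API response).
        String sample = "{\"reading\":\"neko\",\"english_definitions\":[\"cat\"]}";
        System.out.println(firstDefinition(sample)); // prints: cat
    }
}
```

Note that when the array holds several definitions, the lazy group runs up to the quote that directly precedes the closing bracket, so the captured text still contains the inner quotes and commas between the entries.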

How to get youtube video id from URL with java?

I want to get the v=id from youtube's URL with java
Example Youtube URL formats:
http://www.youtube.com/watch?v=u8nQa1cJyX8&a=GxdCwVVULXctT2lYDEPllDR0LRTutYfW
http://www.youtube.com/watch?v=u8nQa1cJyX8
http://youtu.be/0zM3nApSvMg
http://www.youtube.com/user/IngridMichaelsonVEVO#p/a/u/1/KdwsulMb8EQ
http://youtu.be/dQw4w9WgXcQ
http://www.youtube.com/embed/dQw4w9WgXcQ
http://www.youtube.com/v/dQw4w9WgXcQ
http://www.youtube.com/e/dQw4w9WgXcQ
http://www.youtube.com/watch?v=dQw4w9WgXcQ
http://www.youtube.com/?v=dQw4w9WgXcQ
http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ
http://www.youtube.com/?feature=player_embedded&v=dQw4w9WgXcQ
http://www.youtube.com/user/IngridMichaelsonVEVO#p/u/11/KdwsulMb8EQ
http://www.youtube-nocookie.com/v/6L3ZvIMwZFM?version=3&hl=en_US&rel=0
or any other YouTube URL format that contains a video ID.
I am trying this:
Pattern compiledPattern = Pattern.compile("(?<=v=).*?(?=&|$)",Pattern.CASE_INSENSITIVE);
Matcher matcher = compiledPattern.matcher(sourceUrl);
if(matcher.find()){
setVideoId(matcher.group());
}
It fails for only one URL:
http://youtu.be/6UW3xuJinEg

The code below extracts the video IDs from the following types of URLs:
http://www.youtube.com/watch?v=dQw4w9WgXcQ&a=GxdCwVVULXctT2lYDEPllDR0LRTutYfW
http://www.youtube.com/watch?v=dQw4w9WgXcQ
http://youtu.be/dQw4w9WgXcQ
http://www.youtube.com/embed/dQw4w9WgXcQ
http://www.youtube.com/v/dQw4w9WgXcQ
http://www.youtube.com/e/dQw4w9WgXcQ
http://www.youtube.com/watch?v=dQw4w9WgXcQ
http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ
http://www.youtube-nocookie.com/v/6L3ZvIMwZFM?version=3&hl=en_US&rel=0
String pattern = "(?<=watch\\?v=|/videos/|embed\\/|youtu.be\\/|\\/v\\/|\\/e\\/|watch\\?v%3D|watch\\?feature=player_embedded&v=|%2Fvideos%2F|embed%2F|youtu.be%2F|%2Fv%2F)[^#\\&\\?\\n]*";
Pattern compiledPattern = Pattern.compile(pattern);
Matcher matcher = compiledPattern.matcher(url); //url is youtube url for which you want to extract the id.
if (matcher.find()) {
return matcher.group();
}

6UW3xuJinEg (I mean the string after youtu.be/) is the ID most of the time. To be more certain, you can send an HTTP GET request to that URL; it will respond with an HTTP 302 redirect whose Location header contains the actual destination URL. You can then parse that URL with your previous code.
To send the request and receive the response you can use a library like jsoup, but because it's just a simple GET request you can also use plain Java sockets.
Connect to youtu.be on port 80 and write this to the output stream:
GET /6UW3xuJinEg HTTP/1.1
Host: youtu.be
# Don't forget the blank line that ends the request

I found a solution for this: I expand the shortened URL first, and it works.
public static String expandUrl(String shortenedUrl) {
URL url;
String expandedURL = "";
try {
url = new URL(shortenedUrl);
// open connection
HttpURLConnection httpURLConnection = (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
// stop following browser redirect
httpURLConnection.setInstanceFollowRedirects(false);
// extract location header containing the actual destination URL
expandedURL = httpURLConnection.getHeaderField("Location");
httpURLConnection.disconnect();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return expandedURL;
}

private String getYouTubeId(String youTubeUrl) {
String pattern = "https?://(?:[0-9A-Z-]+\\.)?(?:youtu\\.be/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|</a>))[?=&+%\\w]*";
Pattern compiledPattern = Pattern.compile(pattern,
Pattern.CASE_INSENSITIVE);
Matcher matcher = compiledPattern.matcher(youTubeUrl);
if (matcher.find()) {
return matcher.group(1);
}
return null;
}
Use this method; it works in most of the cases where the answers above return null.
Cases tested:
https://m.youtube.com/watch?feature=youtu.be&v=ROkXM3csNWY
https://www.youtube.com/watch?v=rie69P0W668
https://m.youtube.com/watch?feature=youtu.be&v=JqyzwbpYYqc
https://www.youtube.com/watch?v=YPln3JP_gKs&feature=youtu.be

You can use regex I've created:
public static String YOUTUBE_PATTERN_ID = "^(?:(?:\\w*.?://)?\\w*.?\\w*-?.?\\w*/(?:embed|e|v|watch|.*/)?\\??(?:feature=\\w*\\.?\\w*)?&?(?:v=)?/?)([\\w\\d_-]+).*";
Matcher matcher = Pattern.compile(YOUTUBE_PATTERN_ID).matcher(url);
if (matcher.find()) {
return matcher.group(1);
}
https://regex101.com/r/b0yMMd/1
The snippet below is based on this answer: https://stackoverflow.com/a/35436389/7138308
var regex = /^(?:(?:\w*.?:\/\/)?\w*.?\w*\-?.?\w*\/(?:embed|e|v|watch|.*\/)?\??(?:feature=\w*\.?\w*)?\&?(?:v=)?\/?)([\w\d_-]+).*/i;
// An array of all the youtube URLs
var youtubeLinks = [
'http://www.youtube.com/watch?v=u8nQa1cJyX8&a=GxdCwVVULXctT2lYDEPllDR0LRTutYfW ',
'http://www.youtube.com/watch?v=u8nQa1cJyX-8 ',
'http://youtu.be/0zM3nApSvMg ',
'http://www.youtube.com/user/IngridMichaelsonVEVO#p/a/u/1/KdwsulMb8EQ ',
'http://youtu.be/dQw4w9WgXcQ ',
'http://www.youtube.com/embed/dQw4w9WgXcQ ',
'http://www.youtube.com/v/dQw4w9WgXcQ ',
'http://www.youtube.com/e/dQw4w9WgXcQ ',
'http://www.youtube.com/watch?v=dQw4w9WgXcQ ',
'http://www.youtube.com/?v=dQw4w9WgXcQ ',
'http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ ',
'http://www.youtube.com/?feature=player_embedded&v=dQw4w9WgXcQ ',
'http://www.youtube.com/user/IngridMichaelsonVEVO#p/u/11/KdwsulMb8EQ ',
'http://www.youtube-nocookie.com/v/6L3ZvIMwZFM?version=3&hl=en_US&rel=0 ',
'https://m.youtube.com/watch?feature=youtu.be&v=ROkXM3csNWY ',
'https://www.youtube.com/watch?v=rie69P0W668 ',
'https://m.youtube.com/watch?feature=youtu.be&v=JqyzwbpYYqc ',
'https://www.youtube.com/watch?v=YPln3JP_gKs&feature=youtu.be ',
'https://www.youtube.com/watch?v=l-kX8Z4u0Kw&list=PLhml-dmiPOedRDLV8n1ro_OTdzKjOdlyp'
];
// An object to store the results
var youtubeIds = {};
// Iterate over the youtube URLs
youtubeLinks.forEach(function(url) {
// Get the value of the first captured group to extract the YouTube ID
var id = "<span class='youtubeId'>" + (url.match(regex) || [0, 'No ID present'])[1] + "</span>";
// Add the URL and the extracted ID in the result object
youtubeIds[url] = id;
});
// Log the object in the browser console
console.log(youtubeIds);
// To show the result on the page
document.getElementById('output').innerHTML = JSON.stringify(youtubeIds, 0, 4);
.youtubeId {
color: green;
font-weight: bold;
}
<pre id="output"></pre>

Try this code here.
// (?:youtube(?:-nocookie)?\.com\/(?:[^\/\n\s]+\/\S+\/|(?:v|e(?:mbed)?)\/|\S*?[?&]v=)|youtu\.be\/)([a-zA-Z0-9_-]{11})
final static String reg = "(?:youtube(?:-nocookie)?\\.com\\/(?:[^\\/\\n\\s]+\\/\\S+\\/|(?:v|e(?:mbed)?)\\/|\\S*?[?&]v=)|youtu\\.be\\/)([a-zA-Z0-9_-]{11})";
public static String getVideoId(String videoUrl) {
if (videoUrl == null || videoUrl.trim().length() <= 0)
return null;
Pattern pattern = Pattern.compile(reg, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(videoUrl);
if (matcher.find())
return matcher.group(1);
return null;
}
You can find my complete parser code here:
https://github.com/TheFinestArtist/YouTubePlayerActivity/blob/master/library/src/main/java/com/thefinestartist/ytpa/utils/YoutubeUrlParser.java
This is an open-source library I made for playing YouTube videos:
https://github.com/TheFinestArtist/YouTubePlayerActivity

GWT RegExp - multiple matches

I want to find all the "code" matches in my input string (With GWT RegExp). When I call the "regExp.exec(inputStr)" method it only returns the first match, even when I call it multiple times:
String input = "ff <code>myCode</code> ff <code>myCode2</code> dd <code>myCode3</code>";
String patternStr = "<code[^>]*>(.+?)</code\\s*>";
// Compile and use regular expression
RegExp regExp = RegExp.compile(patternStr);
MatchResult matcher = regExp.exec(inputStr);
boolean matchFound = (matcher != null); // equivalent to regExp.test(inputStr);
if (matchFound) {
// Get all groups for this match
for (int i=0; i<matcher.getGroupCount(); i++) {
String groupStr = matcher.getGroup(i);
System.out.println(groupStr);
}
}
How can I get all the matches?
Edit: As greedybuddha noted, a regex is not really suited to parsing (X)HTML. I gave jsoup a try and it is much more convenient than a regex. My code with jsoup now looks like this. I rename all code tags to pre and add a CSS class to them:
String input = "ff<code>myCode</code>ff<code>myCode2</code>";
Document doc = Jsoup.parse(input);
Elements links = doc.select("code"); // select all <code> elements
for(Element link : links){
System.out.println(link.html());
link.tagName("pre");
link.addClass("prettify");
}
System.out.println(doc);

Compile the regular expression with the "g" flag, for global matching.
RegExp regExp = RegExp.compile(patternStr,"g");
I think you will also want "m" for multiline matching, "gm".
That being said, for HTML/XML parsing you should consider using JSoup or another alternative.
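For comparison, the same "find every match" idea looks like this in plain java.util.regex, where repeated find() calls play the role that repeated exec() calls play on a global GWT RegExp. This is a sketch for shared or server-side code only, not the GWT API itself (java.util.regex is not the class used in the question), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class CodeTagMatches {

    // Same pattern string as in the question.
    private static final Pattern CODE =
            Pattern.compile("<code[^>]*>(.+?)</code\\s*>");

    /** Collects the content of every <code>...</code> pair in the input. */
    static List<String> findAll(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = CODE.matcher(input);
        // Each find() call resumes after the end of the previous match.
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }
}
```

With the input string from the question, findAll returns the three values myCode, myCode2 and myCode3.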

Using Regex to get jsessionid

I need to get the jsessionid value from a URL, not the string "jsessionid" itself. Is it possible to match something but exclude it from the result?
https://esgf-data.dkrz.de/esgf-idp/idp/login.htm;jsessionid=436100313FAFBBB9B4DC8BA3C2EC267B
Result = 436100313FAFBBB9B4DC8BA3C2EC267B
Code added from comment:
Pattern pattern = Pattern.compile("/jsessionid=([a-z0-9]+)/i");
Matcher matcher=pattern.matcher(connection.getURL().toExternalForm());

/=([A-Z0-9]+)/ will get all uppercase and numbers after the equals sign = and move them to backreference #1
$subject = 'https://esgf-data.dkrz.de/esgf-idp/idp/login.htm;jsessionid=436100313FAFBBB9B4DC8BA3C2EC267B';
if (preg_match('/=([A-Z0-9]+)/', $subject, $regs)) {
$result = $regs[1];
} else {
$result = "";
}

Try this out :
String data = "https://esgf-data.dkrz.de/esgf-idp/idp/login.htm;jsessionid=436100313FAFBBB9B4DC8BA3C2EC267B";
Pattern pattern = Pattern.compile("jsessionid=(\\w+)");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
System.out.println("Result is : " + matcher.group(1));
}

Just as a caveat: if you have a routing identifier on the end of your JSESSIONID, the previous regular expressions might fail. I found that this will pick up the routing identifier as well (Regex101):
^([A-F0-9]+)((\.[A-Za-z0-9]+)*)$

substring between two delimiters

I have a string as : "This is a URL http://www.google.com/MyDoc.pdf which should be used"
I just need to extract the URL that is starting from http and ending at pdf :
http://www.google.com/MyDoc.pdf
String sLeftDelimiter = "http://";
String[] tempURL = sValueFromAddAtt.split(sLeftDelimiter );
String sRequiredURL = sLeftDelimiter + tempURL[1];
This gives me the output as "http://www.google.com/MyDoc.pdf which should be used"
Need help on this.

This kind of problem is what regular expressions were made for:
Pattern findUrl = Pattern.compile("\\bhttp.*?\\.pdf\\b");
Matcher matcher = findUrl.matcher("This is a URL http://www.google.com/MyDoc.pdf which should be used");
while (matcher.find()) {
System.out.println(matcher.group());
}
The regular expression explained:
\b before the "http" there is a word boundary (i.e. xhttp does not match)
http the string "http" (be aware that this also matches "https" and "httpsomething")
.*? any character (.) any number of times (*), but try to use the least amount of characters (?)
\.pdf the literal string ".pdf"
\b after the ".pdf" there is a word boundary (i.e. .pdfoo does not match)
If you would like to match only http and https, try to use this instead of http in your string:
https?\: - this matches the string http, then an optional "s" (indicated by the ? after the s) and then a colon.
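Putting that refinement into the pattern might look like the sketch below. The class and method names are made up for illustration; the pattern is the one from this answer with http replaced by https?: (the colon needs no escaping in Java).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class PdfUrlFinder {

    // "http", an optional "s", a colon, then as few characters as possible
    // up to ".pdf" followed by a word boundary.
    private static final Pattern PDF_URL =
            Pattern.compile("\\bhttps?:.*?\\.pdf\\b");

    /** Returns every http/https URL ending in .pdf found in the text. */
    static List<String> findPdfUrls(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = PDF_URL.matcher(text);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }
}
```

For the sentence in the question this yields a single entry, http://www.google.com/MyDoc.pdf, and a word such as "httpsomething" no longer starts a match.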

Why don't you use the startsWith("http://") and endsWith(".pdf") methods of the String class?
Both methods return a boolean; if both return true, your condition is met, otherwise it is not.

Try this
String StringName="This is a URL http://www.google.com/MyDoc.pdf which should be used";
StringName=StringName.substring(StringName.indexOf("http:"),StringName.indexOf("which"));

You can use the power of regular expressions here.
First find the URL in the original string, then remove the other parts.
The following code shows my suggestion:
String regex = "\\b(http|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
String str = "This is a URL http://www.google.com/MyDoc.pdf which should be used";
String[] splited = str.split(regex);
for(String current_part : splited)
{
str = str.replace(current_part, "");
}
System.out.println(str);
This snippet can retrieve any URL that matches the pattern from any string.
You can add custom protocols such as https to the protocol part of the regular expression above.
I hope my answer helps you ;)

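public static String getStringBetweenStrings(String aString, String aPattern1, String aPattern2) {
// Find the first delimiter; give up if it is missing.
int start = aString.indexOf(aPattern1);
if (start < 0) {
return null;
}
// The content begins right after the first delimiter.
int pos1 = start + aPattern1.length();
// Search for the second delimiter only after the first one.
int pos2 = aString.indexOf(aPattern2, pos1);
if (pos2 < 0) {
return null;
}
return aString.substring(pos1, pos2);
}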

You can use String.replaceAll with a capturing group and back reference for a very concise solution:
String input = "This is a URL http://www.google.com/MyDoc.pdf which should be used";
System.out.println(input.replaceAll(".*(http.*?\\.pdf).*", "$1"));
Here's a breakdown for the regex: https://regexr.com/3qmus
