Removing link from Text in Java?


I need to change something like this -> Hello, go here http://www.google.com for your ...
I want to grab the link, change it in a method I made, and put it back into the string like this
-> Hello, go here http://www.yahoo.com for your...
Here is what I have so far:
if(Text.toLowerCase().contains("http://"))
{
// Do stuff
}
else if(Text.toLowerCase().contains("https://"))
{
// Do stuff
}
All I need to do is change the URL in the String to something different. The URL in the String will not always be http://www.google.com, so I cannot just say replace("http://www.google.com","")

Use regex:
String oldUrl = text.replaceAll(".*(https?://)www((\\.\\w+)+).*", "www$2");
text = text.replaceAll("(https?://)www(\\.\\w+)+", "$1" + traslateUrl(oldUrl));
Note: code changed to meet extra requirements in comments below.
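A minimal sketch of the same idea using a Matcher loop, so that every URL in the text is passed through a conversion method and written back. The class name, the `https?://\S+` pattern, and the body of `translateUrl` are assumptions made for illustration; the asker's own method would go in its place.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlRewriter {
    // Matches http:// or https:// followed by a run of non-whitespace characters.
    private static final Pattern URL =
            Pattern.compile("https?://\\S+", Pattern.CASE_INSENSITIVE);

    // Hypothetical stand-in for the asker's own conversion method.
    static String translateUrl(String url) {
        return url.replace("google", "yahoo");
    }

    static String rewriteUrls(String text) {
        Matcher m = URL.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the new URL from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(translateUrl(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Hello, go here http://www.yahoo.com for your ...
        System.out.println(rewriteUrls("Hello, go here http://www.google.com for your ..."));
    }
}
```

Because the pattern is case-insensitive and covers both schemes, the separate contains("http://") and contains("https://") checks from the question are not needed in this sketch.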

You can grab the link from the string using the code below. I assumed the string will contain only .com domains.
String input = "Hello, go here http://www.google.com";
Pattern pattern = Pattern.compile("https?://www\\.[a-z-]*\\.com");
Matcher m = pattern.matcher(input);
while (m.find()) {
String str = m.group();
}

Have you tried something like:
s = s.replaceFirst("https?://\\S+", newLink);
This will find the first word beginning with http:// or https://, up to the next white space, and replace it with whatever you want (newLink is a String holding the replacement).
If you want to keep the old link, you can do:
String oldURL = null;
if (s.contains("http")) {
String[] words = s.split(" ");
for (String word: words) {
if (word.contains("http")) {
oldURL = word;
break;
}
}
//then replace the url or whatever
}

You can try this
private String removeUrl(String commentstr)
{
String urlPattern = "((https?|ftp|gopher|telnet|file|Unsure|http):((//)|(\\\\))+[\\w\\d:##%/;$()~_?\\+-=\\\\\\.&]*)";
Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(commentstr);
// Matcher.replaceAll removes every match in one pass; there is no need to
// loop over group indexes or to feed the matched URL back in as a regex
return m.replaceAll("").trim();
}

Related

method to take string inside curly braces using split or tokenizer

String s = "author= {insert text here},";
I'm trying to get the text inside the braces. I've looked around but couldn't find a solution with just split or tokenizer...
So far I'm doing this
arraySplitBracket = s.trim().split("\\{", 0);
which gives me insert text here},
at array[1], but I'd like a way to not have } attached.
I also tried
StringTokenizer st = new StringTokenizer(s, "\\{,\\},");
But it gave me author= as output.

public static void main(String[] args) {
String input="{a c df sdf TDUS^&%^7 }";
String regEx="(.*[{]{1})(.*)([}]{1})";
Matcher matcher = Pattern.compile(regEx).matcher(input);
if(matcher.matches()) {
System.out.println(matcher.group(2));
}
}

You can use the regex \\{([^}]*)\\} to get the string between curly braces.
Code snippet:
String str = "{insert text here}";
Pattern p = Pattern.compile("\\{([^}]*)\\}");
Matcher m = p.matcher(str);
while (m.find()) {
System.out.println(m.group(1));
}
Output :
insert text here

String s = "author ={some text here},";
s = s.substring(s.indexOf("{") + 1); //some text here},
s = s.substring(0, s.indexOf("}"));//some text here
System.out.println(s);

How about taking a substring that stops at the closing brace? Dropping only the last character is not enough here, because the string ends with "}," rather than "}".
Something like
arraySplitBracket[1] = arraySplitBracket[1].substring(0, arraySplitBracket[1].indexOf('}'));
Or use the String class's replace method to remove the } ?
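Since the question asks for a split-only approach, here is a minimal sketch that splits on either brace with a character class. The class and method names are made up for illustration, and it assumes at most one pair of braces matters.

```java
class BraceExtractor {
    // Splits on '{' or '}' and returns the piece after the first brace,
    // or null when the input contains no braces at all.
    static String insideBraces(String s) {
        String[] parts = s.split("[{}]");
        return parts.length >= 2 ? parts[1] : null;
    }

    public static void main(String[] args) {
        // prints: insert text here
        System.out.println(insideBraces("author= {insert text here},"));
    }
}
```

With the sample string, split produces "author= ", "insert text here" and ",", so index 1 holds the wanted text without the closing brace.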

How to extract id from url ? Google sheet

I have the following URLs.
https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258
https://docs.google.com/a/example.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY/edit#gid=1842172258
https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY
For each URL, I need to extract the sheet ID (for example 1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY) into a Java String.
I am thinking of using split, but it doesn't work with all test cases:
String string = "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258";
String[] parts = string.split("/");
String res = parts[parts.length-2];
Log.d("hello res",res );
How can I do that?

You can use the regex \/d\/(.*?)(\/|$) to solve your problem. If you look closely, you can see that the ID always sits between d/ and either the next / or the end of the string, so you can capture everything in between. Check this code:
String[] urls = new String[]{
"https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258",
"https://docs.google.com/a/example.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY/edit#gid=1842172258",
"https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY"
};
String regex = "\\/d\\/(.*?)(\\/|$)";
Pattern pattern = Pattern.compile(regex);
for (String url : urls) {
Matcher matcher = pattern.matcher(url);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
}
Outputs
1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY
1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY
1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY

It looks like the ID you are looking for always follows "/spreadsheets/d/". If that is the case, you can update your code to this:
String string = "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258";
String[] parts = string.split("spreadsheets/d/");
String result;
if(parts[1].contains("/")){
String[] parts2 = parts[1].split("/");
result = parts2[0];
}
else{
result=parts[1];
}
System.out.println("hello "+ result);

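Using regex
Pattern pattern = Pattern.compile("(?<=/d/)[^/]*");
Matcher matcher = pattern.matcher(url);
// find() must be called before group(); the pattern has no capturing group, so use group()
if (matcher.find()) {
System.out.println(matcher.group());
}
Using plain String methods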
String result = url.substring(url.indexOf("/d/") + 3);
int slash = result.indexOf("/");
result = slash == -1 ? result
: result.substring(0, slash);
System.out.println(result);
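The approaches above can be combined into one small helper. This is a sketch under the assumption that the ID always follows "/spreadsheets/d/" and consists of letters, digits, "-" and "_"; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SheetIdExtractor {
    // Captures the run of ID characters right after "/spreadsheets/d/".
    private static final Pattern SHEET_ID =
            Pattern.compile("/spreadsheets/d/([A-Za-z0-9_-]+)");

    // Returns the sheet ID, or null if the URL does not have the expected shape.
    static String extractId(String url) {
        Matcher m = SHEET_ID.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY/edit#gid=1842172258",
            "https://docs.google.com/a/example.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6PTKTzY0xOM5c6TXY/edit#gid=1842172258",
            "https://docs.google.com/spreadsheets/d/1mrsetjgfZI2BIypz7SGHMOfHGv6kTKTzY0xOM5c6TXY"
        };
        for (String url : urls) {
            System.out.println(extractId(url));
        }
    }
}
```

Returning null for a URL that does not match avoids the ArrayIndexOutOfBoundsException or wrong segment that index-based split code can produce on unexpected input.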

Google appears to use fixed-length IDs; in your case they are 44 characters long, and the characters Google uses are alphanumerics, - and _, so you can use this regex (shown here in Python):
regex = r"[\w-]{44}"
match = re.search(regex, url)

Two separate patterns and matchers (java)

I'm working on a simple bot for Discord. The first pattern (the reading) works fine and I get the results I'm looking for, but the second one doesn't seem to work and I can't figure out why.
Any help would be appreciated.
public void onMessageReceived(MessageReceivedEvent event) {
if (event.getMessage().getContent().startsWith("!")) {
String output, newUrl;
String word, strippedWord;
String url = "http://jisho.org/api/v1/search/words?keyword=";
Pattern reading;
Matcher matcher;
word = event.getMessage().getContent();
strippedWord = word.replace("!", "");
newUrl = url + strippedWord;
//Output contains the raw text from jisho
output = getUrlContents(newUrl);
//Searching through the raw text to pull out the first "reading: "
reading = Pattern.compile("\"reading\":\"(.*?)\"");
matcher = reading.matcher(output);
//Searching through the raw text to pull out the first "english_definitions: "
Pattern def = Pattern.compile("\"english_definitions\":[\"(.*?)]");
Matcher matcher2 = def.matcher(output);
event.getTextChannel().sendMessage(matcher2.toString());
if (matcher.find() && matcher2.find()) {
event.getTextChannel().sendMessage("Reading: "+matcher.group(1)).queue();
event.getTextChannel().sendMessage("Definition: "+matcher2.group(1)).queue();
}
else {
event.getTextChannel().sendMessage("Word not found").queue();
}
}
}

You have to escape the [ character as \\[ (once for the Java String and once for the regex). You also forgot the closing \".
The correct pattern looks like this:
Pattern def = Pattern.compile("\"english_definitions\":\\[\"(.*?)\"]");
In the output, you might want to re-add the \" at the start and end:
event.getTextChannel().sendMessage("Definition: \""+matcher2.group(1) + "\"").queue();
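To see what the corrected pattern captures, here is a small self-contained sketch. The JSON string is a made-up, simplified sample rather than real output from the jisho API, and the helper name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DefinitionRegexDemo {
    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up sample input, only shaped like the fields the question reads.
        String json = "{\"reading\":\"neko\",\"english_definitions\":[\"cat\",\"feline\"]}";

        // prints: neko
        System.out.println(firstGroup("\"reading\":\"(.*?)\"", json));
        // prints: cat","feline
        System.out.println(firstGroup("\"english_definitions\":\\[\"(.*?)\"]", json));
    }
}
```

The second capture runs up to the quote before the closing bracket, so it contains the inner quotes and commas of the whole array. That is why re-adding a quote at each end produces a readable list.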

How to remove text between <script></script> tags

I want to remove the content between <script></script> tags. I'm manually checking for the pattern and iterating using a while loop. But I'm getting a StringIndexOutOfBoundsException at this line:
String script = source.substring(startIndex,endIndex-startIndex);
Below is the complete method:
public static String getHtmlWithoutScript(String source) {
String START_PATTERN = "<script>";
String END_PATTERN = " </script>";
while (source.contains(START_PATTERN)) {
int startIndex=source.lastIndexOf(START_PATTERN);
int endIndex=source.indexOf(END_PATTERN,startIndex);
String script=source.substring(startIndex,endIndex);
source.replace(script,"");
}
return source;
}
Am I doing anything wrong here? I'm also getting endIndex=-1. Can anyone help me identify why my code is breaking?

String text = "<script>This is dummy text to remove </script> dont remove this";
StringBuilder sb = new StringBuilder(text);
String startTag = "<script>";
String endTag = "</script>";
//removing the text between script
sb.replace(text.indexOf(startTag) + startTag.length(), text.indexOf(endTag), "");
System.out.println(sb.toString());
If you want to remove the script tags too, add the following line:
String result = sb.toString().replace(startTag, "").replace(endTag, "");
UPDATE:
If you don't want to use StringBuilder you can do this:
String text = "<script>This is dummy text to remove </script> dont remove this";
String startTag = "<script>";
String endTag = "</script>";
//removing the text between script
String textToRemove = text.substring(text.indexOf(startTag) + startTag.length(), text.indexOf(endTag));
text = text.replace(textToRemove, "");
System.out.println(text);

You can use a regex to remove the script tag content:
public String removeScriptContent(String html) {
if (html == null) {
return null;
}
// DOTALL lets . match line breaks; .*? is lazy, so each script block is matched on its own
Pattern pattern = Pattern.compile("(<script>).*?(</script>)", Pattern.DOTALL);
Matcher matcher = pattern.matcher(html);
// keep the tags, drop what is between them; input without a match is returned unchanged
return matcher.replaceAll("$1$2");
}
You have to add these two imports:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

I know I'm probably late to the party, but I would like to give you a regex solution (really tested).
What you have to note here is that regular expression quantifiers are greedy by default. So a pattern such as <script>(.*)</script> will match from the first <script> up to the last </script> it can reach (on the same line, or in the whole input, depending on the regex options used), swallowing everything in between.
To match each script block separately, you can use "lazy" matching.
Search with lazy matching
<script>(.*?)<\/script>
With that, each match stops at the nearest closing tag.
You can read more about lazy and greedy regex matching in this answer.
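The difference between the greedy and the lazy pattern can be seen on a string with two script blocks. This is a small sketch; the input string and the helper name are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<script>a</script>keep<script>b</script>";

        // Greedy: runs to the last closing tag.
        // prints: a</script>keep<script>b
        System.out.println(firstGroup("<script>(.*)</script>", html));

        // Lazy: stops at the nearest closing tag.
        // prints: a
        System.out.println(firstGroup("<script>(.*?)</script>", html));
    }
}
```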

This worked for me:
private static String removeScriptTags(String message) {
String scriptRegex = "<(/)?[ ]*script[^>]*>";
Pattern pattern2 = Pattern.compile(scriptRegex);
if(message != null) {
Matcher matcher2 = pattern2.matcher(message);
StringBuffer str = new StringBuffer(message.length());
while(matcher2.find()) {
matcher2.appendReplacement(str, Matcher.quoteReplacement(" "));
}
matcher2.appendTail(str);
message = str.toString();
}
return message;
}
Credit goes to nealvs: https://nealvs.wordpress.com/2010/06/01/removing-tags-from-a-string-in-java/
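For reference, the loop from the question can also be made to work without regex. Three things break the original: END_PATTERN starts with a space (" </script>"), so indexOf returns -1 whenever no space precedes the closing tag; the result of source.replace(...) is thrown away because Strings are immutable; and the removed span stops before the closing tag. The following is a sketch with those points changed, assuming plain <script> tags without attributes; the class name is hypothetical.

```java
class ScriptStripper {
    static String getHtmlWithoutScript(String source) {
        final String start = "<script>";
        final String end = "</script>";   // no leading space
        StringBuilder sb = new StringBuilder(source);
        int startIndex;
        while ((startIndex = sb.indexOf(start)) >= 0) {
            int endIndex = sb.indexOf(end, startIndex);
            if (endIndex < 0) {
                break;                     // unclosed tag: leave the rest untouched
            }
            // Remove the opening tag, the content, and the closing tag.
            sb.delete(startIndex, endIndex + end.length());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(getHtmlWithoutScript(
                "<script>a</script> keep <script>b</script> this"));
    }
}
```

Working on a StringBuilder means each deletion actually changes the text, so the loop condition eventually becomes false instead of spinning forever on an unchanged String.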

Deleting everything except last part of a String?

What kind of method would I use to make this:
http://www.site.net/files/file1.zip
To
file1.zip?

String yourString = "http://www.site.net/files/file1.zip";
int index = yourString.lastIndexOf('/');
String targetString = yourString.substring(index + 1);
System.out.println(targetString);// file1.zip

String str = "http://www.site.net/files/file1.zip";
str = str.substring(str.lastIndexOf("/")+1);

You could use regex to extract the last part:
@Test
public void extractFileNameFromUrl() {
final Matcher matcher = Pattern.compile("[\\w+.]*$").matcher("http://www.site.net/files/file1.zip");
Assert.assertEquals("file1.zip", matcher.find() ? matcher.group(0) : null);
}
It'll return only "file1.zip". Included here as a test as I used it to validate the code.

Use split:
String[] arr = "http://www.site.net/files/file1.zip".split("/");
Then:
String lastPart = arr[arr.length-1];
Update: Another simpler way to get this:
File file = new File("http://www.site.net/files/file1.zip");
System.out.printf("Path: [%s]%n", file.getName()); // file1.zip
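If the URL may carry a query string or fragment, one option is to let java.net.URI isolate the path before taking the last segment. This is a sketch that assumes a well-formed hierarchical URL; the class and method names are made up for illustration.

```java
import java.net.URI;

class LastSegment {
    // Returns the text after the last '/' of the URL's path component.
    static String fileName(String url) {
        String path = URI.create(url).getPath();   // e.g. "/files/file1.zip"
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // prints: file1.zip
        System.out.println(fileName("http://www.site.net/files/file1.zip"));
        // prints: file1.zip
        System.out.println(fileName("http://www.site.net/files/file1.zip?download=1#top"));
    }
}
```

URI.create throws IllegalArgumentException for strings that are not valid URIs, so input from users may need a try/catch around the call.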

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
