How can I set link inside a text in Android? - java

I am using Jsoup for web scraping. I can scrape the data from the web, but the problem is that I get the links and the text separately. I want each link to be set inside its text. I am using SpannableStringBuilder, and there are a lot of links and a lot of texts, so I can't figure out how to deal with the problem, as I am new to Android development.
private void getWebsite() {
new Thread(new Runnable() {
@Override
public void run() {
final SpannableStringBuilder
builder = new SpannableStringBuilder();
try {
Document doc = Jsoup.
connect("https://www.wikipedia.org/").get();
String title = doc.title();
Elements links = doc.select("a[href]");
builder.append(title).append("\n");
for (Element link : links) {
final String url = link.attr("href");
builder.append("\n")
.append("Link: ")
.append(url, new URLSpan(url),
Spannable.SPAN_EXCLUSIVE_EXCLUSIVE)
.append("\n")
.append("Text: ")
.append(link.text());
}
} catch (IOException e) {
builder.append("Error : ")
.append(e.getMessage()).append("\n");
}
runOnUiThread(new Runnable() {
@Override
public void run() {
textView.setText(builder.toString());
textView.setMovementMethod
(LinkMovementMethod.getInstance());
}
});
}
}).start();}
I am getting output in this format:
Link : //en.wikipedia.org/
Text : English 5 678 000+ articles
Link : //ja.wikipedia.org/
Text : 日本語 1 112 000+ 記事
Link : //es.wikipedia.org/
Text : Español 1 430 000+ artículos
......
......
I want output in this format instead: the text line, for example
**Text: English 5 678 000+ articles**,
with the link
**Link: //en.wikipedia.org/**
attached to it as a hyperlink, so that I can click the text and go directly to the web page, like in MS Word.

You are looking for a way to set the text from HTML, which Html.fromHtml does. Here is some sample code:
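// The href values need the full scheme (http://) and should be quoted.
String str = "Do you want to search on " + "<a href=\"http://www.google.com\">" +
"Google" + "</a>" + " or " + "<a href=\"http://www.yahoo.com\">" +
"Yahoo" + "</a>" + "?";
if(Build.VERSION.SDK_INT >= 24) {
viewToSet.setText(Html.fromHtml(str, Html.FROM_HTML_MODE_LEGACY));
} else {
viewToSet.setText(Html.fromHtml(str));
}
// Without a movement method the links are styled but not clickable.
viewToSet.setMovementMethod(LinkMovementMethod.getInstance());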
This way you can build the text with HTML. You can also apply colors, bold, italics, and so on, as long as you use HTML tags that Html.fromHtml supports.
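Applied to the scraped links from the question, the same idea could look roughly like the sketch below. It is plain Java that only builds the HTML string, so it runs outside Android. The class name LinkHtml and its helper methods are hypothetical, and https is an assumption for protocol-relative hrefs such as //en.wikipedia.org/.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns (link text, href) pairs into HTML anchors.
final class LinkHtml {

    // Escapes the characters that would otherwise break the generated markup.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Protocol-relative hrefs (//host/...) need a scheme; https is assumed here.
    static String absolute(String href) {
        return href.startsWith("//") ? "https:" + href : href;
    }

    // Builds one anchor whose visible part is the link text.
    static String anchor(String text, String href) {
        return "<a href=\"" + escape(absolute(href)) + "\">" + escape(text) + "</a>";
    }

    // Joins all anchors with <br> so each link sits on its own line.
    static String toHtml(Map<String, String> textToHref) {
        StringBuilder html = new StringBuilder();
        for (Map.Entry<String, String> e : textToHref.entrySet()) {
            if (html.length() > 0) {
                html.append("<br>");
            }
            html.append(anchor(e.getKey(), e.getValue()));
        }
        return html.toString();
    }

    public static void main(String[] args) {
        Map<String, String> links = new LinkedHashMap<>();
        links.put("English 5 678 000+ articles", "//en.wikipedia.org/");
        System.out.println(toHtml(links));
    }
}
```

In the Jsoup loop, link.text() and link.attr("href") would supply the two arguments of anchor, and the resulting string can be passed to Html.fromHtml as in the snippet above.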

Related

Split pdf into sections based on titles/bookmarks using regex

I am reading a PDF and extracting its text into an ArrayList. I also collect all the bookmarks from the PDF, which in this case are the titles of each section, and add them to a list. I want to extract each section with a regex based on the titles/bookmarks. Below is what I have tried so far.
for (String text1 : texts) {
for (int title = 0; title < titles.size(); title++) {
try {
//Regex Pattern to find the text between two titles
Pattern p = Pattern.compile(titles.get(title) +
".*(\\n.*)+?(?=" + titles.get(title + 1) + ')');
// the issue here is that the title+1 goes over the size of the titles
Matcher matcherTitle = p.matcher(text1);
// System.out.println(p.matcher(text));
for (int i = 1; i <= matcherTitle.groupCount(); i++) {
System.out.println(matcherTitle.group(i));
}
} catch (Exception e) {
Pattern p = Pattern.compile(titles.get(title) +
"(.|\\n)\\.*");
Matcher matcherTitle = p.matcher(text1);
// System.out.println(p.matcher(text).results().toString());
// System.out.println(titles.get(title) + " This title is the last Title");
}
}
}
Let's say that titles is the list of titles and texts is the list of texts. Unfortunately, nothing is printed out. In the end, I would like to write each section to a txt file named after the title of that section.
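One likely reason nothing is printed is that group() is called before find(), which throws an IllegalStateException that the catch block silently swallows. The sketch below shows one possible way to cut the text between consecutive titles. It is only an illustration: the class name SectionSplitter and the sample text are made up, and it assumes every title appears verbatim in the text.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps each title to the text that follows it.
final class SectionSplitter {

    static Map<String, String> split(String text, List<String> titles) {
        Map<String, String> sections = new LinkedHashMap<>();
        for (int i = 0; i < titles.size(); i++) {
            // Pattern.quote keeps characters such as '.' or '(' in a title literal.
            String start = Pattern.quote(titles.get(i));
            // The last title has no successor, so its section runs to the end (\z).
            String end = (i + 1 < titles.size())
                    ? Pattern.quote(titles.get(i + 1))
                    : "\\z";
            // DOTALL lets '.' cross line breaks; '*?' stops at the first end marker.
            Pattern p = Pattern.compile(start + "(.*?)(?=" + end + ")", Pattern.DOTALL);
            Matcher m = p.matcher(text);
            // Groups are only available after a successful find() or matches().
            if (m.find()) {
                sections.put(titles.get(i), m.group(1).trim());
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        // Made-up sample input for illustration.
        String text = "1. Intro\nHello there.\n2. Methods (v1.0)\nStep one.\nStep two.\n";
        List<String> titles = Arrays.asList("1. Intro", "2. Methods (v1.0)");
        split(text, titles).forEach((t, body) -> System.out.println(t + " -> " + body));
    }
}
```

Each map entry could then be written to its own file, for example with java.nio.file.Files.write, after replacing any characters in the title that are not valid in a file name.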

How to get scrape specific URL from multiple URL in Webpage Java

I am doing data scraping for the first time. My assignment is to get a specific URL from a web page that contains multiple links (help, click here, etc.). How can I get the specific URL and ignore the other links? In this link I only want to get "The SEC adopted changes to the exempt offering framework" and ignore the other links. How do I do that in Java? I was able to extract all the URLs, but I am not sure how to get a specific one. Below is my code:
while (rs.next()) {
String Content = rs.getString("Content");
doc = Jsoup.parse(Content);
//email extract
Pattern p = Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+");
Matcher matcher = p.matcher(doc.text());
Set<String> emails = new HashSet<String>();
while (matcher.find()) {
emails.add(matcher.group());
}
System.out.println(emails);
//title extract
String title = doc.title();
System.out.println("Title: " + title);
}
Elements links = doc.select("a");
for(Element link: links) {
String url = link.attr("href");
System.out.println("\nlink :"+ url);
System.out.println("text: " + link.text());
}
System.out.println("Getting all the images");
Elements image = doc.getElementsByTag("img");
for(Element src:image) {
System.out.println("src "+ src.attr("abs:src"));
}

[JAVA]Get html link from webpage

I want to get a specific link from a web page using Java. There are a few more links on that page. I found this code on Stack Overflow, but I don't understand how to use it.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class weber{
public static void main(String[] args)throws Exception{
String url = "http://www.skyovnis.com/category/ufology/";
Document doc = Jsoup.connect(url).get();
/*String question = doc.select("#site-inner").text();
System.out.println("Question: " + question);*/
Elements anser = doc.select("#container .entry-title a");
for (Element anse : anser){
System.out.println("Answer: " + anse.text());
}
}
}
The code is edited from the original I found, though. Please help.
For your URL, the following code works fine.
public static void main(String[] args) {
Document doc;
try {
// need http protocol
doc = Jsoup.connect("http://www.skyovnis.com/category/ufology/").userAgent("Mozilla").get();
// get page title
String title = doc.title();
System.out.println("title : " + title);
// get all links (this is what you want)
Elements links = doc.select("a[href]");
for (Element link : links) {
// get the value from href attribute
System.out.println("\nlink : " + link.attr("href"));
System.out.println("text : " + link.text());
}
} catch (IOException e) {
e.printStackTrace();
}
}
The output was:
title : Ufology
link : http://www.shop.skyovnis.com/
text : Shop
link : http://www.shop.skyovnis.com/product-category/books/
text : Books
The following code filters the links by their text.
for (Element link : links) {
if(link.text().contains("Arecibo Message"))//find the link with some texts
{
System.out.println("here is the element you need");
System.out.println("\nlink : " + link.attr("href"));
System.out.println("text : " + link.text());
}
}
It’s recommended to specify a “userAgent” in Jsoup, to avoid HTTP 403 error messages.
Document doc = Jsoup.connect("http://anyurl.com").userAgent("Mozilla").get();
"Onna malli mage yuthukama kala."
Reference:
https://www.mkyong.com/java/jsoup-html-parser-hello-world-examples/

How to get text from specific href with jsoup?

I get text from http://m.wol.jw.org/en/wol/dt/r1/lp-e/2014/6/26 via jsoup in my android app.
It looks like:
public static void refreshFromNetwork(Context context) {
Document document;
Elements dateElement;
Elements textElement;
Elements commentElement;
try {
Calendar calendar = Calendar.getInstance();
int year = calendar.get(Calendar.YEAR);
int month = calendar.get(Calendar.MONTH) + 1;
int day = calendar.get(Calendar.DAY_OF_MONTH);
sDayURL = sURL + "/" + year + "/" + month + "/" + day;
document = Jsoup.connect(sDayURL).get();
if (document.hasText()) {
dateElement = document.select(".ss");
textElement = document.select(".sa");
commentElement = document.select(".sb");
sDate = dateElement.text();
sText = textElement.text();
sComment = commentElement.html();
sSavedForCheckingDate = sLocalDate;
savePrefs(context);
sDayURL = null;
} else {
Toast.makeText(mContext,
mContext.getString(R.string.warning_unstable_connection),
Toast.LENGTH_SHORT).show();
}
} catch (IOException e) {
System.out.println("error");
e.printStackTrace();
}
}
But there are some hrefs in the text. When the cursor hovers over one of them, a frame with text pops up.
I can't post images, so see it here: http://habrastorage.org/files/45e/b09/17f/45eb0917f3644bbd9e5ea2b79d98363d.png
But when I try to get the text from that href (I take it from sComment as HTML), I get all of the text (the text displayed when I click the href), not just the part shown in the pop-up. I'm not a web developer, so I don't understand how to get only the desired text. How can I do it?
Follow these steps to get only the pop-up text:
Click the pop-up href.
The pop-up text also appears on the page that opens, so you can extract just that text from there.
When you click the href, a new page opens in which the same text is shown in a red font. That is the pop-up text you need, so you just have to use:
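// commentElement is the Elements object selected with ".sb" in the question;
// "abs:href" resolves the first link to an absolute URL.
String href = commentElement.select("a[href]").attr("abs:href");
Document doc = Jsoup.connect(href).get();
Element element = doc.getElementById("p101");
String dialogText = element.text();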
This is the solution to your question.
I hope it helps.
Use sComment = commentElement.text(); instead.

Java script add element without reload

I have a div to which I add more divs with JavaScript.
However, every time I add a new div to it, the whole div refreshes, so, for example, a YouTube video in one of those divs stops playing.
Can I put these divs into the div without reloading it?
My current code for adding things is
m += "new thing <a> and other stuff </a>"
I NEED it to add the new thing without a reload.
I currently put them in using href="javascript: addMessage('current time', 'user', 'message')"
The addMessage code:
function addMessage(time, user, msg) {
if (msg == "") {
return false;
}
var m = document.getElementById('message-panel');
m.innerHTML += "<div class='sentMessage'><span class='time'>" + time + "</span><span class='name'><a>" + user + "</a></span><span class='message'>" + msg + "</span></div>";
pageScroll();
if (user != "SERVER") {
if (user != "ERROR") {
playAudio('new-message-sound');
}
}
return false;
}
My only way of adding new messages is with href="javascript:addMessage()". I CANNOT USE ONCLICK="" because I'm using Java to control the JavaScript!
My Java code for adding the messages:
public void addMessage(String user, String msg) {
try {
getAppletContext().showDocument(new URL("javascript:addMessage(\"" + Time.now("HH:mm") + "\", \"" + user + "\", \"" + msg + "\")"));
}
catch (MalformedURLException me) {}
}
Thanks in advance, enji
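Appending to innerHTML makes the browser re-parse and rebuild every child of the panel, which is why existing content such as a playing video restarts. Instead, create the element with document.createElement('tag name here');, then insert it with m.appendChild(newelement);. This leaves any elements before it unaffected.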
