JSoup storing text in a variable - java

I'm new to Java / Android development.
I made an app to extract text from elements with a given HTML class:
protected List<String> doInBackground(String... url) {
try {
Document doc = Jsoup.connect(
"http://example/test.html").get();
Elements st1 = doc.select("a[class*=subject_rating_details");
for (Element element : st1) {
sgrade[0] = st1.get(0).text();
sgrade[1] = st1.get(0).text();
sgrade[2] = st1.get(0).text();
sgrade[3] = st1.get(0).text();
sgrade[4] = st1.get(0).text();
}
} catch (IOException e) {
e.printStackTrace();
}
List<String> pinfo = null;
return pinfo;
}
@Override
protected void onPostExecute(List<String> pinfo) {
prog.dismiss();
}
}
List<ListData> varlist = new ArrayList<ListData>();
String sgrade[] = new String[] {};
I used JSoup to extract the different texts on my webpage from the elements with class="subject_rating_details".
But the app force closes with the code above.
I can successfully extract the text into a single String, for example:
for (Element element : st1) {
stringname = st1.get(0).text();
stringname = st1.get(1).text();
stringname = st1.get(2).text();
stringname = st1.get(3).text();
stringname = st1.get(4).text();
}
But that only keeps the last one ( stringname = st1.get(4).text(); ).
I've also tried:
for (Element element : st1) {
stringname1 = st1.get(0).text();
stringname2 = st1.get(1).text();
stringname3 = st1.get(2).text();
stringname4 = st1.get(3).text();
stringname5 = st1.get(4).text();
}
But I need the text from st1 in a single variable.
What can I do?
Thanks
EDIT
I want something like this:
String sgrade[] = new String[] {};
for (Element element : st1) {
sgrade[0] = st1.get(0).text();
sgrade[1] = st1.get(0).text();
sgrade[2] = st1.get(0).text();
sgrade[3] = st1.get(0).text();
sgrade[4] = st1.get(0).text();
}
Later I could then read each text from it and display it in a TextView:
textview1.setText(sgrade[0]); <--/// This would display "Ford"
textview2.setText(sgrade[1]); <--/// This would display "Mustang"
textview3.setText(sgrade[2]); <--/// This would display "2013"
/// HTML ///
...
<p class="subject_rating_details">Ford</p>
<p class="subject_rating_details">Mustang</p>
<p class="subject_rating_details">2013</p>
...
/// HTML ///

Please try it this way. With this you will get all the values of st1 in a single list named stringname.
List<String> stringname = new ArrayList<String>();
for (Element element : st1) {
// add the text of each matched element exactly once
stringname.add(element.text());
}
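As for the force close: the most likely cause is an ArrayIndexOutOfBoundsException, because new String[] {} creates an array of length 0, so even sgrade[0] is out of bounds. Here is a minimal sketch of the difference, with plain strings standing in for the element texts. The class name GradeTexts and the method toArray are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class GradeTexts {
    // Copies the collected texts into an array sized to fit them.
    static String[] toArray(List<String> texts) {
        return texts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Stand-in for the texts of the selected elements.
        List<String> texts = new ArrayList<>(Arrays.asList("Ford", "Mustang", "2013"));
        String[] sgrade = toArray(texts);
        System.out.println(sgrade[0] + " " + sgrade[1] + " " + sgrade[2]);

        String[] empty = new String[] {};
        try {
            empty[0] = "Ford"; // length is 0, so index 0 does not exist
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("an empty array rejects index 0");
        }
    }
}
```

If you really need an array for the TextViews, build the list first and convert it at the end, so the size always matches the number of elements found.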

Related

How to get Ticker symbol from table using jsoup?

I'm trying to get the symbols from the table at Yahoo Finance, but can't figure out why my code doesn't detect the table.
This is what I tried:
public String[] getTrendingTickers() {
String[] trendingTickers = new String[30];
int numTickers = 0;
String url = "https://finance.yahoo.com/trending-tickers/";
try {
Document document = Jsoup.connect(url).get();
for (Element row : document.select("table.W(100%) tr")) {
String ticker = row.select(
".Fz\\(s\\).Ta\\(start\\)\\!.Bgc\\(\\$lv2BgColor\\).Z\\(1\\).Bgc\\(\\$lv3BgColor\\).Pos\\(st\\).simpTblRow\\:h_Bgc\\(\\$hoverBgColor\\).Pend\\(10px\\).Start\\(0\\).Pend\\(15px\\).Pstart\\(6px\\).Ta\\(start\\).Va\\(m\\)")
.text();
System.out.println(ticker);
trendingTickers[numTickers] = ticker;
numTickers++;
}
} catch (Exception e) {
System.out.println(e);
}
return trendingTickers;
}
It fails with the error org.jsoup.select.Selector$SelectorParseException: Could not parse query 'table.W(100%).tr': unexpected token at '(100%).tr'
Here is some sample code that creates a list of all the symbols in the table of the page you reference:
Document document = Jsoup.connect("https://finance.yahoo.com/trending-tickers/").get();
Element table = document.select("table tbody").first();
List<String> symbols = new ArrayList<>();
for (Element row: table.select("tr")) {
symbols.add(row.select("td").first().text());
}
System.out.println(symbols);
See https://jsoup.org/apidocs/org/jsoup/select/Selector.html for details on the selector syntax.
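The parse error itself comes from the unescaped parentheses in the class name W(100%). In CSS, characters like ( ) % have to be backslash-escaped inside a class selector. The sketch below only builds such a selector string; escapeClassName is a hypothetical helper, not part of jsoup, and whether your jsoup version accepts escaped class names is worth checking against the Selector documentation linked above.

```java
final class CssEscape {
    // Hypothetical helper: backslash-escapes every character that is not
    // a letter, digit, '-' or '_'. Identifiers that start with a digit need
    // a hex escape in CSS, which this sketch does not handle.
    static String escapeClassName(String name) {
        StringBuilder out = new StringBuilder();
        for (char c : name.toCharArray()) {
            boolean plain = Character.isLetterOrDigit(c) || c == '-' || c == '_';
            if (!plain) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Builds the selector string only; nothing is fetched or parsed here.
        String selector = "table." + escapeClassName("W(100%)") + " tr";
        System.out.println(selector);
    }
}
```

Generated utility classes like these tend to change between site releases, so selecting by structure (table tbody, tr, td) as in the code above is usually the more robust choice.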

Combine 2 array lists of objects that have null values

I'm trying to combine 2 array lists of objects into one, but I can't figure out how to do it. I've tried addAll and add, but those methods don't do what I want.
Basically, I have one array list with values like this:
SearchResult1 [title=null, url=null, price=19 690 EUR]
And another one with values like this:
SearchResult2 [title=Ford Car, url=http://www.something.com, price=null]
How can I combine those 2 lists into one with values like this:
SearchResult3 [title=Ford Car, url=http://www.something.com, price=19 690 EUR]
This is the code so far:
public List searchMethod() {
try {
final String query = "ford";
final Document page = Jsoup.connect("link" + URLEncoder.encode(query, "UTF-8")).userAgent(USER_AGENT).get();
List<SearchResult> resultList1 = new ArrayList<SearchResult>();
List<SearchResult> resultList2 = new ArrayList<SearchResult>();
List<SearchResult> resultList3 = new ArrayList<SearchResult>();
for(Element searchResult : page.select(".offer-price")) {
String price = searchResult.text();
resultList1.add(new SearchResult(price));
}
for(Element searchResult : page.select(".offer-title__link")) {
String title = searchResult.text();
String url = searchResult.attr("href");
resultList2.add(new SearchResult(title, url));
}
resultList3.addAll(resultList1);
resultList3.addAll(resultList2);
return resultList3;
}catch(Exception e) {
e.printStackTrace();
}
return Collections.emptyList();
}
The values that I put in those lists are extracted from a web page.
Thanks for helping!
From the comment, you have said that you just want to correlate/merge the objects from both lists by each index.
You can simply loop through the list, constructing a new SearchResult (assuming you have getters for the fields)
for(int i = 0; i < resultList1.size(); i++) {
resultList3.add(new SearchResult(resultList1.get(i).getPrice(),
resultList2.get(i).getTitle(),
resultList2.get(i).getUrl()));
}
You may have to change the order of the passed arguments to the SearchResult constructor taking price, title and url as you haven't shown it.
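One caveat with the loop above: if the page yields fewer titles than prices (or the other way round), get(i) throws an IndexOutOfBoundsException. Here is a minimal sketch that stops at the shorter list. The SearchResult class here is only a stand-in, since the real one is not shown in the question, and the name MergeByIndex is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class MergeByIndex {
    // Minimal stand-in for the SearchResult class, which the question does not show.
    static final class SearchResult {
        final String title;
        final String url;
        final String price;

        SearchResult(String title, String url, String price) {
            this.title = title;
            this.url = url;
            this.price = price;
        }

        @Override
        public String toString() {
            return "SearchResult [title=" + title + ", url=" + url + ", price=" + price + "]";
        }
    }

    // Pairs entries by position and stops at the shorter list, so a missing
    // price or title cannot cause an IndexOutOfBoundsException.
    static List<SearchResult> merge(List<SearchResult> prices, List<SearchResult> titles) {
        int n = Math.min(prices.size(), titles.size());
        List<SearchResult> merged = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            SearchResult t = titles.get(i);
            merged.add(new SearchResult(t.title, t.url, prices.get(i).price));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<SearchResult> prices = Arrays.asList(new SearchResult(null, null, "19 690 EUR"));
        List<SearchResult> titles = Arrays.asList(
                new SearchResult("Ford Car", "http://www.something.com", null));
        System.out.println(merge(prices, titles));
    }
}
```

Pairing by position still assumes both selectors return their matches in the same order with no gaps, so it is worth checking the sizes before trusting the result.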
Why don't you do it in one shot?
List<SearchResult> resultList1 = new ArrayList<SearchResult>();
for(Element searchResult : page.select(".offer-title__link")) {
String title = searchResult.text();
String url = searchResult.attr("href");
resultList1.add(new SearchResult(title, url));
}
int index = 0;
for(Element searchResult : page.select(".offer-price")) {
String price = searchResult.text();
// since you have already assumed that the prices come
// in the same order as the titles and urls
resultList1.get(index++).setPrice(price);
}
return resultList1;

How to properly print nested HTML lists using iText? [duplicate]

I have XHTML content, and I have to create from this content a PDF file on the fly. I use iText pdf converter.
I tried the simple way, but I always get bad result after calling the XMLWorkerHelper parser.
XHTML:
<ul>
<li>First
<ol>
<li>Second</li>
<li>Second</li>
</ol>
</li>
<li>First</li>
</ul>
The expected value:
First
    Second
    Second
First
PDF result:
First Second Second
First
In the result there is no nested list. I need a solution for calling the parser, and not creating an iText Document instance.
Please take a look at the example NestedListHtml
In this example, I take your code snippet list.html:
<ul>
<li>First
<ol>
<li>Second</li>
<li>Second</li>
</ol>
</li>
<li>First</li>
</ul>
And I parse it into an ElementList:
// CSS
CSSResolver cssResolver =
XMLWorkerHelper.getInstance().getDefaultCssResolver(true);
// HTML
HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
htmlContext.autoBookmark(false);
// Pipelines
ElementList elements = new ElementList();
ElementHandlerPipeline end = new ElementHandlerPipeline(elements, null);
HtmlPipeline html = new HtmlPipeline(htmlContext, end);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
p.parse(new FileInputStream(HTML));
Now I can add this list to the Document:
for (Element e : elements) {
document.add(e);
}
Or I can add this list to a Paragraph:
Paragraph para = new Paragraph();
for (Element e : elements) {
para.add(e);
}
document.add(para);
You will get the desired result as shown in nested_list.pdf
You cannot add nested lists to a PdfPCell or to a ColumnText. For instance, this will not work:
PdfPTable table = new PdfPTable(2);
table.addCell("Nested lists don't work in a cell");
PdfPCell cell = new PdfPCell();
for (Element e : elements) {
cell.addElement(e);
}
table.addCell(cell);
document.add(table);
This is due to a limitation in the ColumnText class that has been there for many years. We have evaluated the problem, and the only way to fix it would be to rewrite ColumnText entirely. This is not an item on our current technical road map.
Here's a workaround for nested ordered and un-ordered lists.
The rich text editor I am using puts a class attribute such as "ql-indent-1", "ql-indent-2" and so on onto li tags; based on that attribute, the code below adds the starting and ending ul/ol tags.
public String replaceIndentSubList(String htmlContent) {
org.jsoup.nodes.Document document = Jsoup.parseBodyFragment(htmlContent);
Elements element_UL = document.select("ul");
Elements element_OL = document.select("ol");
if (!element_UL.isEmpty()) {
htmlContent = replaceIndents(htmlContent, element_UL, "ul");
}
if (!element_OL.isEmpty()) {
htmlContent = replaceIndents(htmlContent, element_OL, "ol");
}
return htmlContent;
}
public String replaceIndents(String htmlContent, Elements element, String tagType) {
String attributeKey = "class";
String startingULTgas = "<" + tagType + ">";
String endingULTags = "</" + tagType + ">";
int lengthOfQLIndenet = new String("ql-indent-").length();
HashMap<String, String> startingLiTagMap = new HashMap<String, String>();
HashMap<String, String> lastLiTagMap = new HashMap<String, String>();
Pattern regex = Pattern.compile("ql-indent-\\d");
HashSet<String> hash_Set = new HashSet<String>();
Elements element_Tag = element.select("li");
for (org.jsoup.nodes.Element element2 : element_Tag) {
org.jsoup.nodes.Attributes att = element2.attributes();
if (att.hasKey(attributeKey)) {
String attributeValue = att.get(attributeKey);
Matcher matcher = regex.matcher(attributeValue);
if (matcher.find()) {
if (!startingLiTagMap.containsKey(attributeValue)) {
startingLiTagMap.put(attributeValue, element2.toString());
}
hash_Set.add(matcher.group(0));
if (!startingLiTagMap.get(attributeValue)
.equalsIgnoreCase(element2.toString())) {
lastLiTagMap.put(attributeValue, element2.toString());
}
}
}
}
System.out.println(htmlContent);
Iterator value = hash_Set.iterator();
while (value.hasNext()) {
String liAttributeKey = (String) value.next();
int noOfIndentes = Integer
.parseInt(liAttributeKey.substring(lengthOfQLIndenet));
if (noOfIndentes > 1)
for (int i = 1; i < noOfIndentes; i++) {
startingULTgas = startingULTgas + "<" + tagType + ">";
endingULTags = endingULTags + "</" + tagType + ">";
}
htmlContent = htmlContent.replace(startingLiTagMap.get(liAttributeKey),
startingULTgas + startingLiTagMap.get(liAttributeKey));
if (lastLiTagMap.get(liAttributeKey) != null) {
System.out.println("Inside last Li Map");
htmlContent = htmlContent.replace(lastLiTagMap.get(liAttributeKey),
lastLiTagMap.get(liAttributeKey) + endingULTags);
}
else {
htmlContent = htmlContent.replace(startingLiTagMap.get(liAttributeKey),
startingLiTagMap.get(liAttributeKey) + endingULTags);
}
startingULTgas = "<" + tagType + ">";
endingULTags = "</" + tagType + ">";
}
System.out.println(htmlContent);
return htmlContent;
}
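The workaround above reads the indent level by cutting off everything after "ql-indent-". A slightly more defensive way to do that one step is a regex capture group, which also copes with multi-digit levels and with additional class names in the attribute. This is a sketch only; IndentLevel is a hypothetical helper, not part of the code above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IndentLevel {
    private static final Pattern INDENT = Pattern.compile("ql-indent-(\\d+)");

    // Returns the indent level encoded in a class attribute, or 0 if none is present.
    static int of(String classAttribute) {
        Matcher m = INDENT.matcher(classAttribute);
        return m.find() ? Integer.parseInt(m.group(1)) : 0;
    }

    public static void main(String[] args) {
        System.out.println(of("ql-indent-2"));
    }
}
```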

Java jsoup link extracting

I am trying to extract the links within a given element in jsoup. Here is what I have done, but it's not working:
Document doc = Jsoup.connect(url).get();
Elements element = doc.select("section.row");
Element s = element.first();
Elements se = s.getElementsByTag("article");
for(Element link : se){
System.out.println("link :" + link.select("href"));
}
Here is the html:
What I am trying to do is get all the links within the article elements. I thought that I might first have to select the section with class="row" and then somehow derive the links from the article elements, but I could not make it work.
Try this.
Document doc = Jsoup.connect(url).get();
Elements section = doc.select("#main"); //select section with the id = main
Elements allArtTags = section.select("article"); // select all article tags in that section
for (Element artTag : allArtTags ){
Elements atags = artTag.select("a"); //select all a tags in each article tag
for(Element atag : atags){
System.out.println(atag.text()); //print the link text or
System.out.println(atag.attr("href"));//print link
}
}
I'm using this in one of my projects:
final Elements elements = doc.select("div.item_list_section.item_description");
You'll first have to select the elements you want to extract links from.
private static ... inspectElement(Element e) {
try {
final String name = getAttr(e, "a[href]");
final String link = e.select("a").first().attr("href");
//final String price = getAttr(e, "span.item_price");
//final String category = getAttr(e, "span.item_category");
//final String spec = getAttr(e, "span.item_specs");
//final String datetime = e.select("time").attr("datetime");
...
}
catch (Exception ex) { return null; }
}
private static String getAttr(Element e, String what) {
try {
return e.select(what).first().text();
}
catch (Exception ex) { return ""; }
}

JSoup parsing data from within a tag

I am managing to parse most of the data I need, except for one value: it is contained in the href of an a tag, and I need the number that appears after "mmsi=".
Sunsail 4013
My current parser, shown below, fetches all the other data I need. I tried a few things, but the commented-out code occasionally returns "unspecified" for an entry. Is there any way I can extend the code below so that, when the data is returned, the number "235083844" is printed before the name "Sunsail 4013"?
try {
File input = new File("shipMove.txt");
Document doc = Jsoup.parse(input, null);
Elements tables = doc.select("table.shipInfo");
for( Element element : tables )
{
Elements tdTags = element.select("td");
//Elements mmsi = element.select("a[href*=/showship.php?mmsi=]");
// Iterate over all 'td' tags found
for( Element td : tdTags ){
// Print its text if not empty
final String text = td.text();
if( text.isEmpty() == false )
{
System.out.println(td.text());
}
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Example of data parsed and html file here
You can use attr on an Element object to retrieve a particular attribute's value
Use substring to get the required value if the String pattern is consistent
Code
// Using just your anchor html tag
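// href reconstructed from the selector in the question and the output shown further down
String html = "<a href=\"/showship.php?mmsi=235083844\">Sunsail 4013</a>";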
Document doc = Jsoup.parse(html);
// Just selecting the anchor tag, for your implementation use a generic one
Element link = doc.select("a").first();
// Get the attribute value
String url = link.attr("href");
// Check for nulls here and take the substring from '=' onwards
String id = url.substring(url.indexOf('=') + 1);
System.out.println(id + " "+ link.text());
Gives,
235083844 Sunsail 4013
Modified condition in your for loop from your code:
...
for (Element td : tdTags) {
// Print its text if not empty
final String text = td.text();
if (text.isEmpty() == false) {
if (td.getElementsByTag("a").first() != null) {
// Get the attribute value
String url = td.getElementsByTag("a").first().attr("href");
// Check for nulls here and take the substring from '=' onwards
String id = url.substring(url.indexOf('=') + 1);
System.out.println(id + " "+ td.text());
}
else {
System.out.println(td.text());
}
}
}
...
The above code would print the desired output.
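Taking everything after the first '=' works as long as mmsi is the only query parameter. If the href may carry other parameters as well, a regular expression is a bit safer. Here is a minimal sketch of that variant, assuming the value is always numeric; MmsiParser and fromHref are names made up for this example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MmsiParser {
    // Matches "mmsi=<digits>" as the first or a later query parameter.
    private static final Pattern MMSI = Pattern.compile("[?&]mmsi=(\\d+)");

    // Returns the digits after "mmsi=", or an empty Optional if there are none.
    static Optional<String> fromHref(String href) {
        if (href == null) {
            return Optional.empty();
        }
        Matcher m = MMSI.matcher(href);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(fromHref("/showship.php?mmsi=235083844").orElse("unknown"));
    }
}
```

In the loop above you would pass the result of attr("href") to fromHref and fall back to printing only td.text() when the Optional is empty.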
If you need the value of an attribute, you should use the attr() method.
for (Element td : tdTags) {
Elements aList = td.select("a");
for (Element a : aList) {
String val = a.attr("href");
// StringUtils is from Apache Commons Lang
if (StringUtils.isNotBlank(val)) {
String yourId = val.substring(val.indexOf("=") + 1);
}
}
}
