JSoup - Select all comments - java

I want to select all comments from a document using JSoup. I would like to do something like this:
for(Element e : doc.select("comment")) {
System.out.println(e);
}
I have tried this:
for (Element e : doc.getAllElements()) {
if (e instanceof Comment) {
}
}
But Eclipse reports the following error: "Incompatible conditional operand types Element and Comment".
Cheers,
Pete

Since Comment extends Node you need to apply instanceof to the node objects, not the elements, like this:
for(Element e : doc.getAllElements()){
for(Node n: e.childNodes()){
if(n instanceof Comment){
System.out.println(n);
}
}
}

In Kotlin, you can use Jsoup to collect every Comment in the whole Document, or in a specific Element, with an extension function:
fun Element.getAllComments(): List<Comment> {
return this.allElements.flatMap { element ->
element.childNodes().filterIsInstance<Comment>()
}
}

Related

java 8 stream a nested list to find a value is present

I need to find out whether a value is present for any characteristic object. Right now I iterate through all the nested loops to check whether a value is present. The code is also not null-safe.
boolean flag;
for (ServiceGroup serviceGroup : serviceGroups) {
List<Service> services = serviceGroup.getServices();
for (Service service : services) {
List<Subscription> subscriptions = service.getSubscriptions();
for (Subscription subscription : subscriptions) {
List<ComOrderCFS> prss = subscription.getPrss();
for (ComOrderCFS prs : prss) {
List<ComCFSCharacteristsics> characteristsics = prs.getCharacteristics();
for (ComCFSCharacteristsics characteristsic : characteristsics) {
if ("SerialNumber".equals(characteristsic.getName()) && characteristsic.getValue() != null) {
flag = true;
} else {
flag = false;
}
}
}
}
}
}
I tried using streams, but I am not getting the desired output:
List<ComOrderCFS> prss= serviceGroups.stream().map(ServiceGroup::getServices).flatMap(Collection::stream).map
(Service::getSubscriptions).flatMap(Collection::stream).map(Subscription::getPrss).
findFirst().get();
Is there any way to do it in Java 8 and return a boolean if a value is present?
I would appreciate it a lot if the solution had comments (I am new to Java 8).
Thanks.
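return serviceGroups.stream()
// ServiceGroup -> List<Service>, flattened into one stream of Service
.map(ServiceGroup::getServices)
.flatMap(Collection::stream)
// Service -> List<Subscription>, flattened into one stream of Subscription
.map(Service::getSubscriptions)
.flatMap(Collection::stream)
// Subscription -> List<ComOrderCFS>, flattened into one stream of ComOrderCFS
.map(Subscription::getPrss)
.flatMap(Collection::stream)
// ComOrderCFS -> List<ComCFSCharacteristsics>, flattened into one stream
.map(ComOrderCFS::getCharacteristics)
.flatMap(Collection::stream)
// true as soon as one characteristic named "SerialNumber" has a non-null value
.anyMatch(characteristsic -> "SerialNumber".equals(characteristsic.getName()) && characteristsic.getValue() != null);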
Edit: I assumed you don't want to overwrite flag on every iteration but rather want to verify if any one entry in that list fulfills the condition.
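The question also mentions that the original loops are not null-safe. A minimal sketch of one way to handle that is shown below, assuming a simplified two-level model: the classes Order and Characteristic, the helper safe, and the method hasSerialNumber are hypothetical stand-ins for the question's ServiceGroup hierarchy, not part of the original code. The idea is to turn a possibly-null list into an empty stream before flattening it.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.stream.Stream;

// Hypothetical, simplified model standing in for the question's nested hierarchy.
class SerialNumberCheck {

    static class Characteristic {
        final String name;
        final String value; // may be null
        Characteristic(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    static class Order {
        final List<Characteristic> characteristics; // may be null
        Order(List<Characteristic> characteristics) {
            this.characteristics = characteristics;
        }
    }

    // Treats a null collection like an empty one so the pipeline never dereferences null.
    static <T> Stream<T> safe(Collection<T> items) {
        return items == null ? Stream.empty() : items.stream();
    }

    // Returns true if any characteristic is named "SerialNumber" and has a non-null value.
    static boolean hasSerialNumber(List<Order> orders) {
        return safe(orders)
                .filter(Objects::nonNull)                       // skip null orders
                .flatMap(order -> safe(order.characteristics))  // flatten, tolerating null lists
                .filter(Objects::nonNull)                       // skip null characteristics
                .anyMatch(c -> "SerialNumber".equals(c.name) && c.value != null);
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(
                new Order(null),
                new Order(Arrays.asList(
                        new Characteristic("Color", "red"),
                        new Characteristic("SerialNumber", "A-123"))));
        System.out.println(hasSerialNumber(orders));
        System.out.println(hasSerialNumber(Collections.emptyList()));
    }
}
```

The same safe helper could be applied at each level of the real hierarchy by replacing every map(...).flatMap(Collection::stream) pair with a single flatMap over the wrapped getter.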

Get attribute values from all elements

Code:
Document doc = Jsoup.connect("things.com").get();
Elements jpgs = doc.select("img[src$=.jpg]");
String links = jpgs.attr("src");
System.out.print("all: " + jpgs);
System.out.print("src: " + links);
Output:
all:
<img alt="Apple" src="apple.jpg">
<img alt="Cat" src="cat.jpg">
<img alt="Boat" src="boat.jpg">
src: apple.jpg
Jsoup returned the attribute value for the first element only. How can I get the others (cat.jpg and boat.jpg)?
Thank you.
You loop through the matched elements and read the attribute from each one via Element#attr, since the documentation for Elements#attr (note the s) says:
Get an attribute value from the first matched element that has the attribute.
(My emphasis.)
So for instance:
for (Element e : jpgs) {
// use e.attr("src") here
}
Using Java 8 streams, you can collect them into a List<String> if you like:
List<String> links = jpgs.stream()
.map(element -> element.attr("src"))
.collect(Collectors.toList());
This works because Elements is a List<Element>, so stream() can be called on it directly.
The boring old-fashioned way is:
List<String> srcs = new ArrayList<String>(jpgs.size());
for (Element e : jpgs) {
srcs.add(e.attr("src"));
}
Elements#attr will only return the first match.
Elements#attr Source Code
public String attr(String attributeKey) {
for (Element element : this) {
if (element.hasAttr(attributeKey))
return element.attr(attributeKey);
}
return "";
}
Solution
To obtain the result you want, you should loop over your Elements
for (Element e : jpgs) {
System.out.println(e.attr("src"));
}

Java Code Optimization(jsoup)

Is there an efficient way to optimize this code? Most of it looks identical. I just started learning jsoup and don't really know how to do that.
Document doc = Jsoup.connect("http://www.blocket.se/hela_sverige/bilar?ca=11&cg=1020&w=3&md=th").get();
Elements partOne = doc.select("a[title=Flera bilder]");
for (Element element : partOne) {
String myElementOne = element.attr("abs:href");
System.out.println(myElementOne);
}
Elements partTwo = doc.select("a[title=\"\"]");
for (Element element : partTwo) {
String myElementTwo = element.attr("abs:href");
System.out.println(myElementTwo);
}
Elements partThree = doc.select("a[title=Bild]");
for (Element element : partThree) {
String myElementThree = element.attr("abs:href");
System.out.println(myElementThree);
}
The partOne, partTwo and partThree blocks are basically identical; just replace all of the parameter differences with variables and extract to a method:
void someMethodName(Document doc, String selector) {
Elements partOne = doc.select(selector);
for (Element element : partOne) {
String myElementOne = element.attr("abs:href");
System.out.println(myElementOne);
}
}
Example invocation:
someMethodName(doc, "a[title=Flera bilder]");
Alternatively, if you have access to Guava:
Iterable<Element> it = Iterables.concat(
doc.select("a[title=Flera bilder]"),
doc.select("a[title=\"\"]"),
doc.select("a[title=Bild]"));
for (Element element : it) {
String myElement = element.attr("abs:href");
System.out.println(myElement);
}
Andy's solution certainly does the job. However, since you asked specifically about optimizing the JSoup calls, I would suggest learning more about CSS selectors and regular expressions. For example, this will work in your case:
Elements allParts = doc.select("a[title~=^Flera bilder$|^$|^Bild$]");
for (Element element : allParts) {
String elStr = element.attr("abs:href");
System.out.println(elStr);
}
Here, I use the ~= operator, which matches an attribute value against a regular expression. It lets me combine all three of your select statements into one.
An alternative is to use the , combinator to join all three selectors into one:
Elements allParts2 = doc.select("a[title=Flera bilder],a[title=\"\"],a[title=Bild]");
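To see which title values the regular expression in the ~= selector accepts, it can be tried on plain strings with java.util.regex. This is only a sketch: it assumes jsoup applies the pattern with "find" semantics, and the class name TitleRegexDemo and the sample titles are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class TitleRegexDemo {
    // Same alternation as in the selector: exactly "Flera bilder", empty, or exactly "Bild".
    static final Pattern TITLE = Pattern.compile("^Flera bilder$|^$|^Bild$");

    // Keeps only the titles in which the pattern is found (find semantics, assumed here).
    static List<String> matching(List<String> titles) {
        return titles.stream()
                .filter(t -> TITLE.matcher(t).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("Flera bilder", "", "Bild", "Bilder", "Flera bilder!");
        System.out.println(matching(titles));
    }
}
```

The ^ and $ anchors in each alternative are what keep near misses such as "Bilder" from matching.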

how to extract email id using jsoup?

Elements elements = doc.select("span.st");
for (Element e : elements) {
out.println("<p>Text : " + e.text()+"</p>");
}
Element e contains text with an email address in it. How can I extract the email address from it? I have seen the Jsoup API doc, which provides :matches(regex), but I didn't understand how to use it. I'm trying to use
^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$
which I found while googling.
Thanks in advance for your help.
:matches(regex) is useful if you want to find something based on a specified regex (e.g. find all nodes that contain email).
I think this is not what you want. Instead, you need to extract the email from e.text() using regex. In your case:
Elements elements = doc.select("span.st");
for (Element e : elements) {
out.println("<p>Text : " + e.text()+"</p>");
out.println(extractEmail(e.text()));
}
// ...
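// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
public static String extractEmail(String str) {
Matcher m = Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+").matcher(str);
// Return the first match, or null if the text contains no email address
if (m.find()) {
return m.group();
}
return null;
}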
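The helper above returns only the first match. If a span may contain several addresses, a variant that collects every match could look like the following sketch. The class name EmailExtractor, the method extractEmails, and the sample text are hypothetical, and the pattern is a deliberately simple variant rather than a full email validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailExtractor {
    // Simple pattern: local part, "@", then dot-separated domain labels.
    // Compiled once and reused instead of recompiling on every call.
    private static final Pattern EMAIL =
            Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+(\\.[a-zA-Z0-9-]+)+");

    // Returns every address found in the text, in order; empty list if none or if text is null.
    static List<String> extractEmails(String text) {
        List<String> found = new ArrayList<>();
        if (text == null) {
            return found;
        }
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Contact alice@example.com or bob.smith@mail.example.org today.";
        System.out.println(extractEmails(text));
    }
}
```

In the loop from the answer, this would be called as extractEmails(e.text()) for each span.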

Element id in the loop (JSOUP)

Here is my code:
Element current = doc.select("tr[class=row]").get(5);
for (Element td : current.children()) {
System.out.println(td.text());
}
How can I get an Element id in the loop?
Thanks!
In HTML id is a normal attribute, so you can simply call td.attr("id"):
Element current = doc.select("tr.row").get(5);
for (Element td : current.children()) {
System.out.println(td.attr("id"));
}
Note that there is also a selector for classes: tr.row.
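JSoup supports many CSS selectors, so this could be rewritten with a single selector. Note that :nth-of-type(6) counts tr siblings within their parent rather than positions in the result list, so it matches get(5) only when all rows share one parent and every row before the target also has the class row: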
Elements elements = doc.select("tr.row:nth-of-type(6) > td");
for (Element element : elements) {
System.out.println(element.id());
}
