Jsoup is not working correctly

Hello guys, I have a problem with Jsoup: it is not working and I have no idea how to figure it out. Here is the code:
private void getWebsite() {
    Document doc = null;
    try {
        doc = Jsoup.connect("http://www.jean-clermont-schule.de/seite/90384/vertretungsplan.html").get();
        Elements newsHeadlines = doc.select("content");
    } catch (IOException e) {
        e.printStackTrace();
    }
}
And here is a picture:

The code you provided is valid and compiles, so the error is likely outside of what you've shown us. Judging by your picture, I'd guess you've imported the wrong Document class. Check your imports: Jsoup needs org.jsoup.nodes.Document.

I am unable to comment on the answer above.
I think the line Elements newsHeadlines = doc.select("content"); is wrong, because content is not a tag on that page.
select("") takes a CSS selector. A bare word such as content is matched as a tag name; an attribute, with or without a value, is optional.
You could try Elements newsHeadlines = doc.select("div[id=content]");

Related

Trying to iterate over very similar Elements in an XML file. NOTE XML file is attribute less

Hello, I have a question about XML and Java. I have a weird XML file with no attributes, only elements. I'm trying to zero in on a specific element block and then iterate over all of the similar element blocks.
<InstrumentData>
    <Action>Entire Plot</Action>
    <AppStamp>Vectorworks</AppStamp>
    <VWVersion>2502</VWVersion>
    <VWBuild>523565</VWBuild>
    <AutoRot2D>false</AutoRot2D>
    <!-- This is the part I care about: there are 1000+ of these, and they all vary slightly after the "UID_" -->
    <UID_1505_1_1_0_0>
        <Action>Update</Action>
        <TimeStamp>20200427192323</TimeStamp>
        <AppStamp>Vectorworks</AppStamp>
        <UID>1505.1.1.0.0</UID>
    </UID_1505_1_1_0_0>
I am using dom4j as the XML parser and I don't have any issues printing all of the data; I just want to zero in on the right XML path.
This is the code so far:
import java.io.File;
import java.util.Iterator;
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;
public class Unmarshal {
    public Unmarshal() {
        File file = new File("/Users/michaelaboah/Desktop/LIHN 1.11.18 v2020.xml");
        SAXReader reader = new SAXReader();
        try {
            Document doc = reader.read(file);
            Element ele = doc.getRootElement();
            Iterator<Element> it = ele.elementIterator();
            Iterator<Node> nodeIt = ele.nodeIterator();
            while (it.hasNext()) {
                Element test2 = (Element) it.next();
                List<Element> eleList = ele.elements();
                for (Element elementsIt : eleList) {
                    System.out.println(elementsIt.selectSingleNode("/SLData/InstrumentData").getStringValue());
                    // This spits out everything under the InstrumentData branch
                    // All of that data is very large
                    System.out.println(elementsIt.selectSingleNode("/SLData/InstrumentData/UID_1505_1_1_0_0").getStringValue());
                    // This spits out everything under the UID branch
                }
            }
        } catch (DocumentException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
Also, I know there are some unused types and variables; there was a lot of testing.

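I think the XPath you are looking for is:
elementsIt.selectSingleNode("/SLData/InstrumentData/*[starts-with(local-name(), 'UID_')]").getStringValue()
I found this XPath in another post, and it works with the few XML lines you gave.
Note that selectSingleNode returns only the first match. To iterate over every UID_ element, pass the same expression to selectNodes, which returns a list of nodes.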
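To show the expression matching more than one element, here is a minimal sketch that uses the JDK's built-in javax.xml.xpath API instead of dom4j, so it runs without extra dependencies. The class name UidElementFinder, the trimmed sample XML and the second UID_ element are made up for illustration; the SLData root is assumed from the paths in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UidElementFinder {

    // Every direct child of InstrumentData whose element name starts with "UID_".
    static final String UID_XPATH =
            "/SLData/InstrumentData/*[starts-with(local-name(), 'UID_')]";

    // Trimmed, partly invented sample; the second UID_ element is hypothetical.
    static final String SAMPLE_XML =
            "<SLData><InstrumentData>"
            + "<Action>Entire Plot</Action>"
            + "<UID_1505_1_1_0_0><Action>Update</Action><UID>1505.1.1.0.0</UID></UID_1505_1_1_0_0>"
            + "<UID_1506_1_1_0_0><Action>Update</Action><UID>1506.1.1.0.0</UID></UID_1506_1_1_0_0>"
            + "</InstrumentData></SLData>";

    /** Returns "elementName=UID text" for every UID_* element, in document order. */
    static List<String> findUids(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList matches = (NodeList) xpath.evaluate(
                    UID_XPATH, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < matches.getLength(); i++) {
                Node uidElement = matches.item(i);
                // Relative path, evaluated against the current UID_* element.
                String uid = xpath.evaluate("UID", uidElement);
                result.add(uidElement.getNodeName() + "=" + uid);
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Should print one line per UID_* element found in the sample.
        findUids(SAMPLE_XML).forEach(System.out::println);
    }
}
```

The relative "UID" lookup inside the loop is the part that avoids hard-coding each element name; the same idea should carry over to dom4j by calling selectNodes with the expression and then reading child elements from each returned node.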

How do I resolve this error using Paths.of?

I have tried many of the examples provided and have yet to be successful. Here is the code I am currently trying, but I am getting an error in Eclipse on Paths.of (the of is underlined in red) that says: "rename in file".
String content;
try {
    content = Files.readAllLines(Paths.of("C:", "Calcs.txt"));
} catch (IOException e1) {
    e1.printStackTrace();
}
System.out.println(content);

First, Files.readAllLines returns a list, which cannot be assigned to a String. So you must write:
List<String> content;
Second, according to the Java 8 documentation, the Paths class has no method named of. You can use the method get instead:
List<String> content = Files.readAllLines(Paths.get("C:", "Calcs.txt"));
Alternatively, the Path interface has had a method of since Java 11, so you can write:
List<String> content = Files.readAllLines(Path.of("C:", "Calcs.txt"));
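To make both points concrete, here is a minimal sketch that writes a small stand-in file and reads it back with Paths.get and Files.readAllLines. The class name ReadLinesDemo, the file name Calcs-demo.txt and the sample lines are invented for illustration; the code avoids Path.of on purpose, so it should also compile on Java 8.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

class ReadLinesDemo {

    /** Reads the file as UTF-8 and joins its lines into a single String. */
    static String readContent(Path file) {
        try {
            // readAllLines returns a List<String>, one entry per line.
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            return String.join("\n", lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the lines to a stand-in file in the temp directory, reads them back, deletes the file. */
    static String writeThenRead(List<String> lines) {
        // Hypothetical file name. Paths.get exists since Java 7; Path.of needs Java 11.
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "Calcs-demo.txt");
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
            String content = readContent(file);
            Files.delete(file);
            return content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead(Arrays.asList("1 + 1 = 2", "2 * 3 = 6")));
    }
}
```

Keeping the result as a List<String> is usually the better choice when you process the file line by line; the String.join call is only needed if you really want one String.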

You're probably looking for Paths.get:
String content;
try {
    content = String.join("\n", Files.readAllLines(Paths.get("/home/hassan", "Foo.java")));
} catch (IOException e1) {
    e1.printStackTrace();
}

Parse data from webpage to android app using Jsoup

My Android app has a part where I need to parse data from Wikipedia and use it in the application. When I go to https://en.wikipedia.org/wiki/Template:COVID-19_pandemic_data I can see the COVID-19 cases. I want to retrieve the numbers from the table.
I am using Jsoup. I am able to get the HTML data by using this URL: https://en.wikipedia.org/w/api.php?format=xml&action=parse&page=Template:COVID-19_pandemic_data . Can you guide me on how to extract the India cases and deaths from the HTML? The document is huge and the tr elements have no attributes. There's not much information about this on the internet. What I have tried so far:
private void getWebsite() {
    new Thread(new Runnable() {
        @Override
        public void run() {
            final StringBuilder builder = new StringBuilder();
            String web_link = "https://en.wikipedia.org/w/api.php?format=xml&action=parse&page=Template:COVID-19_pandemic_data";
            try {
                Document doc = Jsoup.connect(web_link).get();
                String title = doc.title();
                Elements links = doc.select("tr");
                builder.append(title).append("\n");
                for (Element link : links) {
                    builder.append(link);
                }
            } catch (IOException e) {
                builder.append("Error : ").append(e.getMessage()).append("\n");
            }
            runOnUiThread(new Runnable() {
                @Override
                public void run() {
                    textView.setText(builder.toString());
                }
            });
        }
    }).start();
}

The problem is related to the format of the data (XML). When you navigate down the XML elements, what is displayed when you view the document in your browser is:
<someTag>...</someTag>
But what's actually present is the XML-encoded version of that string:
&lt;someTag&gt;...&lt;/someTag&gt;
Jsoup won't work well with this, and I'd imagine you'll need further processing to turn that encoded text back into markup before you can query it. You can test this yourself by viewing the result of:
doc.getElementsByTag("text")
And you'd need to replace all &lt; and &gt; tokens with < and > respectively.
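One way to avoid replacing the tokens by hand is to let an XML parser decode them. The sketch below uses only the JDK's DOM parser; the class name EscapedHtmlExtractor and the tiny sample response are invented stand-ins, not the real API output. It reads the text content of the text element, which comes back with the entities already decoded, so the resulting string could then be handed to an HTML parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class EscapedHtmlExtractor {

    // Made-up, heavily shortened stand-in for the API response:
    // the HTML inside <text> is entity-encoded, as described above.
    static final String SAMPLE_RESPONSE =
            "<api><parse title=\"Template:COVID-19 pandemic data\">"
            + "<text>&lt;table&gt;&lt;tr&gt;&lt;td&gt;India&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</text>"
            + "</parse></api>";

    /** Returns the decoded character data of the first <text> element. */
    static String extractHtml(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The parser has already turned &lt; and &gt; back into < and >.
            return doc.getElementsByTagName("text").item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the response", e);
        }
    }

    public static void main(String[] args) {
        // Prints the decoded HTML fragment contained in the sample.
        System.out.println(extractHtml(SAMPLE_RESPONSE));
    }
}
```

This assumes the response contains at least one text element; a real response is far larger, so in practice you would still need selectors or XPath on the decoded HTML to find the India row.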
Here's what I tried, plus some minor edits after failing to pull tbody/thead/th. I then started pulling from the top-level tag, starting with api and moving deeper into the DOM.
final StringBuilder builder = new StringBuilder();
String url = "https://en.wikipedia.org/w/api.php?format=xml&action=parse&page=Template:COVID-19_pandemic_data";
try {
    Document doc = Jsoup.connect(url).get();
    // In the API's XML response the page title is an attribute of the parse element.
    String title = doc.getElementsByTag("parse").attr("title");
    builder.append(title).append("\n");
} catch (IOException e) {
    builder.append("Error : ").append(e.getMessage()).append("\n");
}
Also worth mentioning: there are some really good examples in the documentation here: https://jsoup.org/cookbook/extracting-data/dom-navigation
And finally, for what it's worth, I'd change the URL to https://en.wikipedia.org/wiki/Template:COVID-19_pandemic_data to make life easier with Jsoup, so you can pull the relevant bits of data from HTML rather than XML.
In my view, if you have the choice, HtmlUnit would be a better tool for this, since you can simply specify an XPath for the HTML element you want to extract without chaining multiple method calls. The more concise format leaves less room for errors to hide.

ghost4j class cast exception during joining two PostScripts

I am trying to join two PostScript files to one with ghost4j 0.5.0 as follows:
final PSDocument[] psDocuments = new PSDocument[2];
psDocuments[0] = new PSDocument();
psDocuments[0].load("1.ps");
psDocuments[1] = new PSDocument();
psDocuments[1].load("2.ps");
psDocuments[0].append(psDocuments[1]);
psDocuments[0].write("3.ps");
During this simplified process I got the following exception message for the above "append" line:
org.ghost4j.document.DocumentException: java.lang.ClassCastException:
org.apache.xmlgraphics.ps.dsc.events.UnparsedDSCComment cannot be cast to
org.apache.xmlgraphics.ps.dsc.events.DSCCommentPage
So far I have not managed to find out what the problem is here. Maybe there is some kind of problem within one of the PostScript files?
Any help would be appreciated.
EDIT:
I tested with the Ghostscript command-line tool:
gswin32.exe -dQUIET -dBATCH -dNOPAUSE -sDEVICE=pswrite -sOutputFile="test.ps" --filename "1.ps" "2.ps"
which results in a document where 1.ps and 2.ps are merged into one(!) page (i.e. overlay).
When removing the --filename the resulting document will be a PostScript with two pages as expected.

The exception occurs because one of the 2 documents does not follow the Adobe Document Structuring Convention (DSC), which is mandatory if you want to use the Document append method.
Use the SafeAppenderModifier instead. There is an example here: http://www.ghost4j.org/highlevelapisamples.html (Append a PDF document to a PostScript document)

I think something is wrong in the document or in the XMLGraphics library, as it seems it cannot parse part of it.
Here is the code in ghost4j that I think is failing:
DSCParser parser = new DSCParser(bais);
Object tP = parser.nextDSCComment(DSCConstants.PAGES);
while (tP instanceof DSCAtend)
    tP = parser.nextDSCComment(DSCConstants.PAGES);
DSCCommentPages pages = (DSCCommentPages) tP;
And here you can see why XMLGraphics may be responsible:
private DSCComment parseDSCComment(String name, String value) {
    DSCComment parsed = DSCCommentFactory.createDSCCommentFor(name);
    if (parsed != null) {
        try {
            parsed.parseValue(value);
            return parsed;
        } catch (Exception e) {
            //ignore and fall back to unparsed DSC comment
        }
    }
    UnparsedDSCComment unparsed = new UnparsedDSCComment(name);
    unparsed.parseValue(value);
    return unparsed;
}
It seems parsed.parseValue(value) threw an exception that was swallowed by the catch block, so the method returned an unparsed comment that ghost4j didn't expect.
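The failure mode can be reproduced in isolation. The sketch below uses made-up stand-in classes (Comment, PagesComment, UnparsedComment), not the real XMLGraphics or ghost4j types: a parser swallows the parse error and returns a fallback type, and the caller checks the type before casting instead of casting blindly.

```java
class DscFallbackDemo {

    // Hypothetical stand-ins for the comment types; not the real library classes.
    interface Comment { }

    static final class PagesComment implements Comment {
        final int pageCount;
        PagesComment(String value) {
            // Throws NumberFormatException for a malformed value.
            this.pageCount = Integer.parseInt(value.trim());
        }
    }

    static final class UnparsedComment implements Comment {
        final String rawValue;
        UnparsedComment(String rawValue) {
            this.rawValue = rawValue;
        }
    }

    /** Like the quoted parser: a parse failure is swallowed and a fallback type is returned. */
    static Comment parsePages(String value) {
        try {
            return new PagesComment(value);
        } catch (NumberFormatException e) {
            // ignore and fall back to the unparsed comment
            return new UnparsedComment(value);
        }
    }

    /** Checks the type before casting, so a malformed value gives a clear error. */
    static int pageCount(String value) {
        Comment comment = parsePages(value);
        if (!(comment instanceof PagesComment)) {
            throw new IllegalArgumentException("Malformed %%Pages value: " + value);
        }
        return ((PagesComment) comment).pageCount;
    }

    public static void main(String[] args) {
        System.out.println(pageCount("2"));
        Comment bad = parsePages("two");
        System.out.println(bad.getClass().getSimpleName());
        // A blind cast would fail here, much like the reported exception:
        // PagesComment pages = (PagesComment) bad; // ClassCastException
    }
}
```

This only illustrates why the cast fails; it does not fix the PostScript files themselves, so the non-conforming document still has to be handled some other way.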

JSVGCanvas.getSVGDocument() is returning null?

I seem to have a problem working with Batik for manipulating SVG in Java. I can display the SVG just fine on the JSVGCanvas, but when I try to get the canvas's SVGDocument using getSVGDocument, it seems to return null. Why is that, and how can I get the actual document?
jSVGCanvas1.setURI(new File("circle.svg").toURI().toString());
jSVGCanvas1.setDocumentState(JSVGCanvas.ALWAYS_DYNAMIC);
SVGDocument doc = jSVGCanvas1.getSVGDocument();
if(doc==null)System.out.println("null");
The last line tests whether doc is null, and it always prints null. Please help!

You'll need to wait for the document to load, which happens asynchronously. Register a listener before calling setURI, something like this:
jSVGCanvas1.addSVGDocumentLoaderListener(new SVGDocumentLoaderAdapter() {
    @Override
    public void documentLoadingCompleted(SVGDocumentLoaderEvent e) {
        SVGDocument doc = jSVGCanvas1.getSVGDocument();
        if (doc == null) System.out.println("null");
    }
});
