org.jsoup.select.Selector$SelectorParseException: Could not parse query - java

I'm using Xsoup.
This is the code:
private void updateSeed(Document document) {
    mappingParser
        .setSeed(Xsoup.compile("//div[@class='pgCell'][last()]/a/@href")
        .evaluate(document).get());
}
When I execute the function above, I get the following exception:
Exception in thread "main" org.jsoup.select.Selector$SelectorParseException: Could not parse query 'div[@class='pgCell'][last()]': unexpected token at 'last()'
at us.codecraft.xsoup.xevaluator.XPathParser.byFunction(XPathParser.java:225)
at us.codecraft.xsoup.xevaluator.XPathParser.consumePredicates(XPathParser.java:202)
at us.codecraft.xsoup.xevaluator.XPathParser.findElements(XPathParser.java:138)
at us.codecraft.xsoup.xevaluator.XPathParser.parse(XPathParser.java:51)
at us.codecraft.xsoup.xevaluator.XPathParser.parse(XPathParser.java:375)
at us.codecraft.xsoup.xevaluator.XPathParser.combinator(XPathParser.java:85)
at us.codecraft.xsoup.xevaluator.XPathParser.parse(XPathParser.java:49)
at us.codecraft.xsoup.xevaluator.XPathParser.parse(XPathParser.java:375)
at us.codecraft.xsoup.Xsoup.compile(Xsoup.java:27)
at com.qannoufit.test.CrawlerController.updateSeed(CrawlerController.java:102)
at com.qannoufit.test.CrawlerController.populateShouldParse(CrawlerController.java:91)
at com.qannoufit.test.CrawlerController.startCrawling(CrawlerController.java:60)
at com.qannoufit.test.Main.main(Main.java:12)

Try using a CSS selector instead. The original XPath query could be translated roughly as follows:
div.pgCell:last-of-type > a
Once you have the anchor, get its href.
private void updateSeed(Document document) {
    Element anchor = document.select("div.pgCell:last-of-type > a").first();
    if (anchor == null) {
        // Anchor not found, handle error here...
        return; // without this, anchor.absUrl below throws a NullPointerException
    }
    mappingParser.setSeed(anchor.absUrl("href"));
}

Related

Extract values from xml file using Java

Here is my response, which contains XML. I want to retrieve bEntityID="328" from it:
<?xml version="1.0" encoding="UTF-8"?>
<ns2:aResponse xmlns:ns2="http://www.***.com/F1/F2/F3/2011-09-11">
<createBEntityResponse bEntityID="328" />
</ns2:aResponse>
I am trying this, but I get null:
System.out.println("bEntitytID="+XmlPath.with(response.asString())
.getInt("aResponse.createBEntityResponse.bEntityID"));
Any suggestions for getting bEntityID from this response?
I don't recommend string splitting with regexes to get element values, but if you are desperate, try the code below:
public class xmlValue {
    public static void main(String[] args) {
        String xml = "<ns2:aResponse xmlns:ns2=\"http://www.***.com/F1/F2/F3/2011-09-11\">\n" +
                "   <createBEntityResponse bEntityID=\"328\" />\n" +
                "</ns2:aResponse>";
        System.out.println(getTagValue(xml, "createBEntityResponse bEntityID"));
    }
    // Returns the first double-quoted value that follows the given marker text.
    public static String getTagValue(String xml, String tagName) {
        // Pattern.quote keeps the marker from being interpreted as a regex
        String[] s = xml.split(java.util.regex.Pattern.quote(tagName));
        String[] valuesBetweenQuotes = s[1].split("\"");
        return valuesBetweenQuotes[1];
    }
}
Output: 328
Note: a better solution is to use an XML parser.
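As a rough illustration of the parser-based route, here is a minimal sketch using only the DOM parser that ships with the JDK. The class name AttributeFromDom and the helper attributeOf are made up for this example; it assumes the element and attribute names from the response in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeFromDom {
    // Hypothetical helper: returns the named attribute of the first element
    // with the given tag name, or null if no such element exists.
    static String attributeOf(String xml, String tagName, String attrName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element el = (Element) doc.getElementsByTagName(tagName).item(0);
            return el == null ? null : el.getAttribute(attrName);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ns2:aResponse xmlns:ns2=\"http://www.***.com/F1/F2/F3/2011-09-11\">\n"
                + "   <createBEntityResponse bEntityID=\"328\" />\n"
                + "</ns2:aResponse>";
        // prints 328
        System.out.println(attributeOf(xml, "createBEntityResponse", "bEntityID"));
    }
}
```

Unlike the split-based version, this does not depend on attribute order or whitespace inside the tag.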
This will fetch the first tag value:
public static String getTagValue(String xml, String tagName){
return xml.split("<"+tagName+">")[1].split("</"+tagName+">")[0];
}
Another way is to use Jsoup:
Document doc = Jsoup.parse(xml, "", Parser.xmlParser()); //parse the whole xml doc
for (Element e : doc.select("tagName")) {
System.out.println(e); //select the specific tag and prints
}
I think the best way is to deserialize the XML into a POJO and then read the value:
entityResponse.getEntityId();
I tried with the same XML file and was able to get the value of bEntityID with the following code. Hope it helps.
@Test
public void xmlPathTests() {
    try {
        File xmlExample = new File(System.getProperty("user.dir"), "src/test/resources/Data1.xml");
        String xmlContent = FileUtils.readFileToString(xmlExample);
        XmlPath xmlPath = new XmlPath(xmlContent).setRoot("aResponse");
        System.out.println(" Entity ::" + xmlPath.getInt("createBEntityResponse.@bEntityID"));
        assertEquals(328, xmlPath.getInt("createBEntityResponse.@bEntityID"));
    } catch (Exception e) {
        e.printStackTrace();
    }
}

Namespace error while validation schema with StAXSource

I'm trying to validate an XML document using StAX and the javax Validator, but I'm getting the following cast error:
org.xml.sax.SAXException: java.lang.ClassCastException: org.codehaus.stax2.ri.evt.NamespaceEventImpl cannot be cast to java.lang.String
javax.xml.transform.TransformerException: java.lang.ClassCastException: org.codehaus.stax2.ri.evt.NamespaceEventImpl cannot be cast to java.lang.String
The basic idea is that I need to parse an XML document using StAX, and I'm trying to reuse the event reader I'll be using for parsing by wrapping it in a StAXSource to perform the validation.
I was able to debug the error and trace the cast exception to the class com.sun.org.apache.xalan.internal.xsltc.trax.StAXEvent2SAX, line 341, where a loop over an iterator casts each item to a String when the items are in fact of type NamespaceEventImpl (the relevant portion of the code is below).
// end namespace bindings
for( Iterator i = event.getNamespaces(); i.hasNext();) {
String prefix = (String)i.next();
if( prefix == null ) { // true for default namespace
prefix = "";
}
_sax.endPrefixMapping(prefix);
}
The content of the iterator "i" while this logic runs was shown in a screenshot (image: iterator content).
Below is a snippet of code describing how I'm doing it.
public void validateRequest(RequestMessage message) {
try {
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLEventReader eventReader = factory.createXMLEventReader(new ByteArrayInputStream(message.getMessage().getBytes()));
this.validateSchema(eventReader);
if(this.isSchemaValid()) {
// parse through XML
}
} catch(Exception e) {
LOGGER.error(e.getMessage(), e);
}
}
private void validateSchema(XMLEventReader eventReader) {
try {
StAXErrorHandler errorHandler = new StAXErrorHandler();
this.validator.setErrorHandler(errorHandler);
this.validator.validate(new StAXSource(eventReader));
} catch (SAXException | IOException | XMLStreamException e) {
LOGGER.error(e.getMessage(), e);
}
}
Has anyone faced this issue before, and is it a limitation of using StAXSource with the Validator itself?
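One possible way to sidestep the cast, sketched here under assumptions rather than verified against this exact setup, is to validate the raw message with a StreamSource first and only then create the XMLEventReader for parsing. That avoids the StAXSource path (and so StAXEvent2SAX) entirely, at the cost of reading the message twice. The class name, the isValid helper and the SAMPLE_XSD schema below are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class StreamSourceValidation {
    // Hypothetical schema for the example: a single <a> element holding an int.
    static final String SAMPLE_XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"a\" type=\"xs:int\"/>"
          + "</xs:schema>";

    // Returns true if the XML validates against the XSD, false otherwise.
    // With no ErrorHandler set, the Validator throws on the first error.
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // StreamSource instead of StAXSource: no StAX-to-SAX bridge involved
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(SAMPLE_XSD, "<a>1</a>")); // prints true
        System.out.println(isValid(SAMPLE_XSD, "<a>x</a>")); // prints false
    }
}
```

In validateRequest this would mean validating message.getMessage() before calling createXMLEventReader, so the event reader is used for parsing only.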

Get value from xml tags which are of same name using xpaths

I publish a CSV input file to a server, and it returns an XML file that looks like this:
<ns0:TransportationEvent xmlns:ns0="http://www.server.com/schemas/TransportationEvent.xsd">
<ns0:creationDateTime>2017-04-06</ns0:creationDateTime>
.....
.....
</ns0:TransportationEvent>
<ns0:TransportationEvent xmlns:ns0="http://www.fedex.com/schemas/TransportationEvent.xsd">
<ns0:creationDateTime>2017-04-25</ns0:creationDateTime>
.....
.....
</ns0:TransportationEvent>
A new TransportationEvent tag is appended each time, with the updated date in it.
I am retrieving data from this XML using the XPathFactory and NamespaceContext classes, as shown below:
NamespaceContext ctx = new NamespaceContext() {
public String getNamespaceURI(String prefix) {
String uri;
if (prefix.equals("ns0"))
uri = "http://www.server.com/schemas/TransportationEvent.xsd";
else
uri = null;
return uri;
}
public Iterator getPrefixes(String val) {
return null;
}
// Dummy implementation - not used!
public String getPrefix(String uri) {
return null;
}
};
XPathFactory xpathFact = XPathFactory.newInstance();
XPath xpath = xpathFact.newXPath();
xpath.setNamespaceContext(ctx);
String strXpath = "//ns0:TransportationEvent/ns0:creationDateTime/text()";
String creationDateTime = xpath.evaluate(strXpath, doc);
The code above gives the value of creationDateTime as 2017-04-06. In other words, it always takes the value from the first TransportationEvent tag.
I need to pick data from the TransportationEvent tag whose creationDateTime is equal to today's date.
I could do this by iterating through all the TransportationEvent tags with a NodeList, but then I would not be able to use the XPath or NamespaceContext implementation. I cannot find any connection between the NodeList class and the NamespaceContext or XPath classes.
I want ctx to refer to the context of the latest TransportationEvent tag.
I know I am missing something. Could somebody help, please?
Use the last() function in a predicate to select only the last TransportationEvent:
String strXpath = "//ns0:TransportationEvent[last()]/ns0:creationDateTime/text()";
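To pick the event for a given date rather than the last one, a predicate on creationDateTime can be used as well. Below is a minimal sketch with the JDK's javax.xml.xpath API. It assumes the events sit under a wrapper root element (called events here), that both events use the same namespace URI, and that each event has an ns0:eventId child; those three details are made up for the example. Note also that //x[last()] selects the last x under each parent, so the sketch uses (//x)[last()] to get the last one in the whole document.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EventByDate {
    static final String NS = "http://www.server.com/schemas/TransportationEvent.xsd";

    // Hypothetical sample: wrapper root and ns0:eventId are assumptions.
    static final String XML =
            "<events>"
          + "<ns0:TransportationEvent xmlns:ns0=\"" + NS + "\">"
          + "<ns0:creationDateTime>2017-04-06</ns0:creationDateTime>"
          + "<ns0:eventId>E1</ns0:eventId>"
          + "</ns0:TransportationEvent>"
          + "<ns0:TransportationEvent xmlns:ns0=\"" + NS + "\">"
          + "<ns0:creationDateTime>2017-04-25</ns0:creationDateTime>"
          + "<ns0:eventId>E2</ns0:eventId>"
          + "</ns0:TransportationEvent>"
          + "</events>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefixed XPath steps
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    static String evaluate(String expression, Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "ns0".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) { return null; }
                public Iterator<String> getPrefixes(String uri) { return null; }
            });
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    // The date is concatenated into the expression, so it must be a trusted value.
    static String eventIdOn(Document doc, String date) {
        return evaluate("//ns0:TransportationEvent[ns0:creationDateTime='" + date
                + "']/ns0:eventId", doc);
    }

    static String lastEventId(Document doc) {
        return evaluate("(//ns0:TransportationEvent)[last()]/ns0:eventId", doc);
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(eventIdOn(doc, "2017-04-25")); // prints E2
        System.out.println(lastEventId(doc));             // prints E2
    }
}
```

For today's date, java.time.LocalDate.now().toString() yields the same yyyy-MM-dd form and could be passed as the date argument.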

JSON response Issue for Jira Rest Client

When I use the following method:
public String getProjectList() {
    projNames = new ArrayList<>();
    projNames.add("Project1");
    projNames.add("Project2");
    projNames.add("Project3");
    return new Gson().toJson(projNames);
}
in the following code :
$(document).ready(function() {
$.getJSON('DBDropDown', function(resp) { // on success
var $select = $('#someselect');
$select.find('option').remove();
$select.prepend("<option value='Select Project'></option>").val('');
$.each(resp, function(key, value) { // Iterate over the JSON object.
$('<option>').val(key).text(value).appendTo($select); // Create HTML <option> element, set its value with currently iterated key and its text content with currently iterated item and finally append it to the <select>.
});
}).fail(function() { // on failure
alert("Request failed.");
});
});
and my JSP call is:
response.getWriter().write(MusicDatabase
    .getInstance()
    .getProjectList());
I am able to get the dropdown menu. But when I use the method below in place of getProjectList, I don't get a response when I check in Chrome developer tools and debug.
public String Names() throws URISyntaxException{
names = new ArrayList<>();
URI uri = new URI("https://jira.xxxxx.com");
JiraRestClientFactory jrcf = new AsynchronousJiraRestClientFactory();
JiraRestClient jrc = jrcf.createWithBasicHttpAuthentication(uri, "xxx", "xxxx");
Iterable<BasicProject> allproject = jrc.getProjectClient().getAllProjects().claim();
for(BasicProject project : allproject){
names.add(project.getName());
}
return new Gson().toJson(names);
}
I am not getting any response, and the console throws a ClassDefNotFound exception even though I already have all the classes needed. Please help if you have run into this kind of issue.
Thanks

How to get the content from a website using Jsoup

I am trying to get the data from a website with this code:
@WebServlet(description = "get content from teamforge", urlPatterns = { "/JsoupEx" })
public class JsoupEx extends HttpServlet {
private static final long serialVersionUID = 1L;
private static final String URL = "http://www.moving.com/real-estate/city-profile/results.asp?Zip=60505";
public JsoupEx() {
super();
}
protected void doGet(HttpServletRequest request,
HttpServletResponse response) throws ServletException, IOException {
Document doc = Jsoup.connect(URL).get();
for (Element table : doc.select("table.DataTbl")) {
for (Element row : table.select("tr")) {
Elements tds = row.select("td");
if (tds.size() > 2) { // get(2) below needs at least three cells
System.out.println(tds.get(0).text() + ":"
+ tds.get(2).text());
}
}
}
}
}
I am using the jsoup parser. When run, I do not get any errors, just no output.
Please help on this.
With the following code
public class Tester {
private static final String URL = "http://www.moving.com/real-estate/city-profile/results.asp?Zip=60505";
public static void main(String[] args) throws IOException {
Document doc = Jsoup.connect(URL).get();
System.out.println(doc);
}
}
I get a java.net.SocketTimeoutException: Read timed out. I think the particular URL you are trying to crawl is too slow for Jsoup's default timeout. Being in Europe, my connection might be slower than yours. However, you might want to check for this exception in the log of your application server.
By setting the timeout to 10 seconds, I was able to download and parse the document:
Connection connection = Jsoup.connect(URL);
connection.timeout(10000);
Document doc = connection.get();
System.out.println(doc);
With the rest of your code I get:
Population:78,413
Population Change Since 1990:53.00%
Population Density:6,897
Male:41,137
Female:37,278
.....
Thanks Julien. I tried the following code and I am getting a SocketTimeoutException. The code is:
Connection connection = Jsoup.connect("http://www.moving.com/real-estate/city-profile/results.asp?Zip=60505");
connection.timeout(10000);
Document doc = connection.get();
System.out.println(doc);
