My previous question was closed as a duplicate, but the suggested answer does not solve my problem, so, as suggested, I'm asking a new question.
Let's work with the suggested answer.
Here's the code:
String strMsg = "<soap:Envelope xmlns:soap=\"http://www.w3.org/2003/05/soap-envelope\">" +
" <soap:Header>" +
" <context xmlns=\"urn:zimbra\"/>" +
" </soap:Header>" +
" <soap:Body>" +
" <soap:Fault>" +
" <soap:Code>" +
" <soap:Value>soap:Sender</soap:Value>" +
" </soap:Code>" +
" <soap:Reason>" +
" <soap:Text>no valid authtoken present</soap:Text>" +
" </soap:Reason>" +
" <soap:Detail>" +
" <Error xmlns=\"urn:zimbra\">" +
" <Code>service.AUTH_REQUIRED</Code>" +
" <Trace>qtp1027591600-6073:1588614639199:4eacbd0257a457b6</Trace>" +
" </Error>" +
" </soap:Detail>" +
" </soap:Fault>" +
" </soap:Body>" +
"</soap:Envelope>";
InputStream is = new ByteArrayInputStream(strMsg.getBytes());
XMLInputFactory xif = XMLInputFactory.newFactory();
XMLStreamReader xsr = xif.createXMLStreamReader(is);
//Envelope
xsr.nextTag();
QName name = xsr.getName();
//Header
xsr.nextTag();
name = xsr.getName();
//Context
xsr.nextTag();
name = xsr.getName();
//Context again
xsr.nextTag();
name = xsr.getName();
//Header again
xsr.nextTag();
name = xsr.getName();
//Body
xsr.nextTag();
name = xsr.getName();
//Fault
xsr.nextTag();
name = xsr.getName();
/* I'm commenting out the following code because I'm interested in the Fault
 * content, and even if I try to go deeper I can't get past the "Value" node.
 *
//Code
xsr.nextTag();
name = xsr.getName();
//Value
xsr.nextTag();
name = xsr.getName();
//throws exception, no more elements for some reason
xsr.nextTag();
name = xsr.getName();
*/
Transformer transformer = TransformerFactory.newInstance().newTransformer();
StringWriter stringWriter = new StringWriter();
transformer.transform(new StAXSource(xsr), new StreamResult(stringWriter));
StringReader sr = new StringReader(stringWriter.toString());
JAXBContext jaxbContext = JAXBContext.newInstance(Fault.class);
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
Fault fault = (Fault) unmarshaller.unmarshal(sr); // THROWS EXCEPTION:
// "unexpected element (URI:"http://www.w3.org/2003/05/soap-envelope", local:"Fault"). Expected elements are <{}Fault>"
My Fault class:
@XmlRootElement(name = "Fault")
@XmlAccessorType(XmlAccessType.FIELD)
public static class Fault {
    @XmlElement(name = "Code")
    private String code;
    @XmlElement(name = "Reason")
    private String reason;
    public String getCode() {
        return code;
    }
    public void setCode(String code) {
        this.code = code;
    }
    public String getReason() {
        return reason;
    }
    public void setReason(String reason) {
        this.reason = reason;
    }
}
I suspected it wasn't going to work: the elements directly inside "Fault" don't have text values themselves, they contain more elements, and all elements are prefixed with "soap", so my envelope isn't structured exactly like the one in the suggested answer.
But still, I couldn't fetch the "Fault" node, as an exception was thrown:
unexpected element (URI:"http://www.w3.org/2003/05/soap-envelope", local:"Fault"). Expected elements are <{}Fault>
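From what I can tell, the "<{}Fault>" part of the message means JAXB is looking for a Fault element in no namespace, so I assume the class would need to be bound to the SOAP 1.2 namespace for the names to match, something like:
@XmlRootElement(name = "Fault", namespace = "http://www.w3.org/2003/05/soap-envelope")
(That's just my reading of the error; the nested Code and Reason elements would presumably need the same namespace treatment, which is part of why I moved away from JAXB here.)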
I'm interested in getting the value of:
<soap:Text>no valid authtoken present</soap:Text>
Also, this is only one type of error; there might be other errors, and when the response is positive I get a whole different body.
What I'm really interested in is finding a way to explore the envelope in the following way:
//pseudo code
if (envelope->body->fault->reason->text != null) { reasonText = envelope->body->fault->reason->text }
But whatever way lets me reach Reason->Text will do; I can then adapt the script to other bodies.
Thank you in advance.
A friend of mine, who didn't want to answer here, found a solution, and even better, he found it using the approach I wanted:
//pseudo code
if (envelope->body->fault->reason->text != null) { reasonText = envelope->body->fault->reason->text }
For anyone else in the future who stumbles upon this problem, here is a solution:
String strMsg = "<soap:Envelope xmlns:soap=\"http://www.w3.org/2003/05/soap-envelope\">" +
" <soap:Header>" +
" <context xmlns=\"urn:zimbra\"/>" +
" </soap:Header>" +
" <soap:Body>" +
" <soap:Fault>" +
" <soap:Code>" +
" <soap:Value>soap:Sender</soap:Value>" +
" </soap:Code>" +
" <soap:Reason>" +
" <soap:Text>no valid authtoken present</soap:Text>" +
" </soap:Reason>" +
" <soap:Detail>" +
" <Error xmlns=\"urn:zimbra\">" +
" <Code>service.AUTH_REQUIRED</Code>" +
" <Trace>qtp1027591600-6073:1588614639199:4eacbd0257a457b6</Trace>" +
" </Error>" +
" </soap:Detail>" +
" </soap:Fault>" +
" </soap:Body>" +
"</soap:Envelope>";
strMsg = strMsg.replaceAll("soap:", ""); // Had to strip the soap: prefix; not fancy, but it works (note it also rewrites the soap:Sender value to Sender).
InputStream is = new ByteArrayInputStream(strMsg.getBytes());
InputSource xml = new InputSource(is);
XPath xPath = XPathFactory.newInstance().newXPath();
Object exprEval = xPath.compile("/Envelope/Body/Fault/Reason/Text/text()").evaluate(xml, XPathConstants.STRING);
if (exprEval != null) {
    System.out.println("Fault reason text: " + exprEval);
    // This prints what's expected:
    // Fault reason text: no valid authtoken present
}
There you go.
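If you'd rather not strip the prefixes, a namespace-aware variant should work too. This is only a sketch I put together (imports assumed: javax.xml.XMLConstants, javax.xml.namespace.NamespaceContext, java.util.Iterator); it maps the soap prefix to the SOAP 1.2 namespace instead of rewriting the message:
// note: use the original strMsg here, before the replaceAll above
XPath nsXPath = XPathFactory.newInstance().newXPath();
nsXPath.setNamespaceContext(new NamespaceContext() {
    @Override
    public String getNamespaceURI(String prefix) {
        return "soap".equals(prefix)
                ? "http://www.w3.org/2003/05/soap-envelope"
                : XMLConstants.NULL_NS_URI;
    }
    @Override
    public String getPrefix(String namespaceURI) { return null; } // not needed for lookups
    @Override
    public Iterator<String> getPrefixes(String namespaceURI) { return null; } // not needed for lookups
});
InputSource nsXml = new InputSource(new ByteArrayInputStream(strMsg.getBytes()));
String reasonText = (String) nsXPath
        .compile("/soap:Envelope/soap:Body/soap:Fault/soap:Reason/soap:Text/text()")
        .evaluate(nsXml, XPathConstants.STRING);
System.out.println("Fault reason text: " + reasonText);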
Related
I am trying to make the crawler "abort" searching a certain subdomain every time it doesn't find a relevant page after 3 consecutive tries. After extracting the title and the text of the page, I start looking for the correct pages to submit to my Solr collection (I do not want to add pages that don't match this query).
public void visit(Page page)
{
    int docid = page.getWebURL().getDocid();
    String url = page.getWebURL().getURL();
    String domain = page.getWebURL().getDomain();
    String path = page.getWebURL().getPath();
    String subDomain = page.getWebURL().getSubDomain();
    String parentUrl = page.getWebURL().getParentUrl();
    String anchor = page.getWebURL().getAnchor();
    System.out.println("Docid: " + docid);
    System.out.println("URL: " + url);
    System.out.println("Domain: '" + domain + "'");
    System.out.println("Sub-domain: '" + subDomain + "'");
    System.out.println("Path: '" + path + "'");
    System.out.println("Parent page: " + parentUrl);
    System.out.println("Anchor text: " + anchor);
    System.out.println("ContentType: " + page.getContentType());
    if (page.getParseData() instanceof HtmlParseData) {
        String title, text;
        HtmlParseData theHtmlParseData = (HtmlParseData) page.getParseData();
        title = theHtmlParseData.getTitle();
        text = theHtmlParseData.getText();
        if ((title.toLowerCase().contains(" word1 ") && title.toLowerCase().contains(" word2 "))
                || (text.toLowerCase().contains(" word1 ") && text.toLowerCase().contains(" word2 "))) {
            //
            // submit to SOLR server
            //
            submit(page);
            Header[] responseHeaders = page.getFetchResponseHeaders();
            if (responseHeaders != null) {
                System.out.println("Response headers:");
                for (Header header : responseHeaders) {
                    System.out.println("\t" + header.getName() + ": " + header.getValue());
                }
            }
            failedcounter = 0; // we start counting for 3 consecutive pages
        } else {
            failedcounter++;
        }
        if (failedcounter == 3) {
            failedcounter = 0; // we start counting for 3 consecutive pages
            int parent = page.getWebURL().getParentDocid();
            parent....HtmlParseData.setOutgoingUrls(null);
        }
    }
}
My question is: how do I edit the last line of this code so that I can retrieve the parent "page object" and delete its outgoing URLs, so that the crawl moves on to the rest of the subdomains?
Currently I cannot find a function that can get me from the parent docid to the page data in order to delete the URLs.
The visit(...) method is called as one of the last statements of processPage(...) (line 523 in WebCrawler).
The outgoing links are already added to the crawler's frontier (and might be processed by other crawler processes as soon as they are added).
You could implement the behaviour you describe by adjusting shouldVisit(...) or (depending on the exact use case) shouldFollowLinksIn(...) in your crawler.
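For example, a rough, untested sketch assuming crawler4j 4.x's WebCrawler API; blockedSubDomains is a hypothetical shared set that your visit(...) logic would fill once a subdomain fails three times in a row (imports: java.util.Set, java.util.concurrent.ConcurrentHashMap):
private static final Set<String> blockedSubDomains = ConcurrentHashMap.newKeySet();

@Override
public boolean shouldVisit(Page referringPage, WebURL url) {
    // refuse URLs whose subdomain has already failed three consecutive times
    if (blockedSubDomains.contains(url.getSubDomain())) {
        return false;
    }
    return super.shouldVisit(referringPage, url);
}

// and in visit(...), instead of trying to reach the parent page:
if (failedcounter == 3) {
    failedcounter = 0;
    blockedSubDomains.add(page.getWebURL().getSubDomain());
}
This sidesteps the parent-page lookup entirely: the links are already in the frontier, but they get filtered out when the crawler asks whether to visit them.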
I am trying to parse information from a particular website using JSOUP.
So far I can only parse and display a single row. The website has a lot of HTML and I am quite new to this, so I was wondering: is there a way to parse all table rows on the page whose id contains the word "fixturerow"?
Here is my parser code:
Document doc = Jsoup.connect("http://www.irishrugby.ie/club/ulsterbankleagueandcup/fixtures.php").get();
Elements kelime = doc.select("tr#fixturerow0");
for (Element sectd : kelime) {
    Elements tds = sectd.select("td");
    String result = tds.get(0).text();
    String result1 = tds.get(1).text();
    String result2 = tds.get(2).text();
    String result3 = tds.get(3).text();
    String result4 = tds.get(4).text();
    String result5 = tds.get(5).text();
    String result6 = tds.get(6).text();
    String result7 = tds.get(7).text();
    System.out.println("Date: " + result);
    System.out.println("Time: " + result1);
    System.out.println("League: " + result2);
    System.out.println("Home Team: " + result3);
    System.out.println("Score: " + result4);
    System.out.println("Away Team: " + result5);
    System.out.println("Venue: " + result6);
    System.out.println("Ref: " + result7);
}
Thanks for your time!
You can use the ^= (starts-with) selector:
Elements kelime = doc.select("tr[id^=fixturerow]");
This will return all elements with an id that starts with fixturerow.
You may have better luck if you use a selector that looks for ids that start with the text of interest. So try changing
Elements kelime = doc.select("tr#fixturerow0");
to
Elements kelime = doc.select("tr[id^=fixturerow]");
Where ^= means that the text of interest starts with the text that follows.
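Putting it together with the loop from the question, a quick sketch (the size guard is my addition, in case some rows have fewer than eight cells):
Document doc = Jsoup.connect("http://www.irishrugby.ie/club/ulsterbankleagueandcup/fixtures.php").get();
Elements rows = doc.select("tr[id^=fixturerow]");
for (Element row : rows) {
    Elements tds = row.select("td");
    if (tds.size() < 8) {
        continue; // skip rows that don't have the expected eight columns
    }
    System.out.println("Date: " + tds.get(0).text());
    System.out.println("Time: " + tds.get(1).text());
    System.out.println("Home Team: " + tds.get(3).text());
    System.out.println("Away Team: " + tds.get(5).text());
}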
I need to POST data and at the same time redirect to that URL in a REST environment. I can do this for normal strings, but the requirement is to POST a specific object.
The way I do it for a normal string is:
public Response homePage(@FormParam("username") String username,
                         @FormParam("passwordhash") String password) {
    return Response.ok(PreparePOSTForm(username)).build();
}

private static String PreparePOSTForm(String username)
{
    // Set a name for the form
    String formID = "PostForm";
    String url = "home";
    // Build the form using the specified data to be posted.
    StringBuilder strForm = new StringBuilder();
    strForm.append("<form id=\"" + formID + "\" name=\"" +
            formID + "\" action=\"" + url +
            "\" method=\"POST\">");
    strForm.append("<input type=\"hidden\" name=\"" + "username" +
            "\" value=\"" + username + "\">");
    strForm.append("</form>");
    // Build the JavaScript which will do the Posting operation.
    StringBuilder strScript = new StringBuilder();
    strScript.append("<script language=\"javascript\">");
    strScript.append("var v" + formID + " = document." +
            formID + ";");
    strScript.append("v" + formID + ".submit();");
    strScript.append("</script>");
    // Return the form and the script concatenated.
    // (The order is important, Form then JavaScript)
    return strForm.toString() + strScript.toString();
}
But this method cannot send objects. I need a workaround to send complex objects. Please help me with this issue.
Thanks in advance.
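One workaround I have been considering, though only as a sketch (it assumes Jackson is on the classpath, and "payload" is a hypothetical field name), is to serialize the object to JSON and push it through the same hidden-field form:
import com.fasterxml.jackson.databind.ObjectMapper;

private static String preparePostForm(Object payload) throws Exception {
    // serialize the complex object to a JSON string
    String json = new ObjectMapper().writeValueAsString(payload);
    // escape double quotes so the JSON survives inside the value attribute
    String escaped = json.replace("\"", "&quot;");
    StringBuilder form = new StringBuilder();
    form.append("<form id=\"PostForm\" name=\"PostForm\" action=\"home\" method=\"POST\">");
    form.append("<input type=\"hidden\" name=\"payload\" value=\"").append(escaped).append("\">");
    form.append("</form>");
    form.append("<script>document.PostForm.submit();</script>");
    return form.toString();
}
The receiving endpoint would then take @FormParam("payload") and deserialize it back into the object with the same ObjectMapper.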
I am using the Java library metadata-extractor and cannot extract the tag description correctly using the getUserCommentDescription method in the code below, although tag.getDescription() does work:
String exif = "File: " + file;
File jpgFile = new File(file);
Metadata metadata = ImageMetadataReader.readMetadata(jpgFile);
for (Directory directory : metadata.getDirectories()) {
    String directoryName = directory.getName();
    for (Tag tag : directory.getTags()) {
        String tagName = tag.getTagName();
        String description = tag.getDescription();
        if (tagName.toLowerCase().contains("comment")) {
            Log.d("DEBUG", description);
        }
        exif += "\n " + tagName + ": " + description; // Returns the correct values.
        Log.d("DEBUG", directoryName + " " + tagName + " " + description);
    }
    if (directoryName.equals("Exif IFD0")) {
        // create a descriptor
        ExifSubIFDDirectory exifDirectory = metadata.getDirectory(ExifSubIFDDirectory.class);
        ExifSubIFDDescriptor descriptor = new ExifSubIFDDescriptor(exifDirectory);
        Log.d("DEBUG", "Comments: " + descriptor.getUserCommentDescription()); // Always null.
    }
}
Am I missing something here?
You are checking for the directory name Exif IFD0 but then accessing the ExifSubIFDDirectory, which is a different directory.
Try this code outside the loop:
Metadata metadata = ImageMetadataReader.readMetadata(jpgFile);
ExifSubIFDDirectory exifDirectory = metadata.getDirectory(ExifSubIFDDirectory.class);
ExifSubIFDDescriptor descriptor = new ExifSubIFDDescriptor(exifDirectory);
String comment = descriptor.getUserCommentDescription();
If this returns null then it may be an encoding issue or bug. If you run this code:
byte[] commentBytes =
        exifDirectory.getByteArray(ExifSubIFDDirectory.TAG_USER_COMMENT);
Do you have bytes in the array?
If so then please open an issue in the issue tracker and include a sample image that can be used to reproduce the problem. You must authorise any image you provide for use in the public domain.
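In the meantime, here is a small sketch for inspecting the raw value yourself. The charset handling is my assumption based on the EXIF convention (the text is preceded by an 8-byte marker such as "ASCII\0\0\0" or "UNICODE\0"), not part of metadata-extractor's API (import java.nio.charset.StandardCharsets):
if (commentBytes != null && commentBytes.length > 8) {
    // the first 8 bytes name the character set; trim() drops the NUL padding
    String charset = new String(commentBytes, 0, 8, StandardCharsets.US_ASCII).trim();
    // UTF-16 is a guess for "UNICODE": EXIF stores UCS-2 and the endianness varies by camera
    String raw = new String(commentBytes, 8, commentBytes.length - 8,
            "UNICODE".equals(charset) ? StandardCharsets.UTF_16 : StandardCharsets.US_ASCII);
    Log.d("DEBUG", "Raw user comment (" + charset + "): " + raw);
}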
I want to read an XML file with multi-level tags and CDATA using Java.
The sample XML is:
<?xml version="1.0" encoding="UTF-8"?>
<Result>
<ResultDetails>
<SearchFilmResult ItemType="film">
<FilmDetails>
<FilmDetail>
<Film Code="INCEPTION"><![CDATA[INCEPTION 2010]]></Film>
<Imdb>8.8</Imdb>
<FilmInformation>
<Director><![CDATA[Christopher Nolan]]></Director>
<Actors>
<Actor1><![CDATA[Leonardo DiCaprio]]></Actor1>
<Actor2><![CDATA[Joseph Gordon-Levitt]]></Actor2>
<Actor3><![CDATA[Ellen Page]]></Actor3>
</Actors>
</FilmInformation>
</FilmDetail>
</FilmDetails>
</SearchFilmResult>
</ResultDetails>
</Result>
The expected result is:
Film Code = INCEPTION
Film Name = INCEPTION 2010
IMDB = 8.8
Director = Christopher Nolan
Actors = Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen Page
Can anyone guide me on how to do this? Many thanks.
Have you looked at XPath?
Here's a very simple example that will parse this sample XML, but I think it would be up to you to explore the possibilities out there and determine what will work well for you.
Give this a try:
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class Test {
    public static void main(String[] args) throws Exception {
        // sample xml
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
                "<Result>\n" +
                " <ResultDetails>\n" +
                " <SearchFilmResult ItemType=\"film\">\n" +
                " <FilmDetails>\n" +
                " <FilmDetail>\n" +
                " <Film Code=\"INCEPTION\"><![CDATA[INCEPTION 2010]]></Film>\n" +
                " <Imdb>8.8</Imdb>\n" +
                " <FilmInformation>\n" +
                " <Director><![CDATA[Christopher Nolan]]></Director> \n" +
                " <Actors>\n" +
                " <Actor1><![CDATA[Leonardo DiCaprio]]></Actor1>\n" +
                " <Actor2><![CDATA[Joseph Gordon-Levitt]]></Actor2>\n" +
                " <Actor3><![CDATA[Ellen Page]]></Actor3>\n" +
                " </Actors> \n" +
                " </FilmInformation>\n" +
                " </FilmDetail>\n" +
                " </FilmDetails>\n" +
                " </SearchFilmResult>\n" +
                " </ResultDetails>\n" +
                "</Result>";
        // read the xml
        InputSource source = new InputSource(new StringReader(xml));
        // build a document model
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document document = db.parse(source);
        // create an xpath interpreter
        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();
        // evaluate nodes
        String filmCode = xpath.evaluate("Result/ResultDetails/SearchFilmResult/FilmDetails/FilmDetail/Film/@Code", document);
        String filmName = xpath.evaluate("Result/ResultDetails/SearchFilmResult/FilmDetails/FilmDetail/Film", document);
        String imdb = xpath.evaluate("Result/ResultDetails/SearchFilmResult/FilmDetails/FilmDetail/Imdb", document);
        String director = xpath.evaluate("Result/ResultDetails/SearchFilmResult/FilmDetails/FilmDetail/FilmInformation/Director", document);
        // get actor data
        XPathExpression expr = xpath.compile("Result/ResultDetails/SearchFilmResult/FilmDetails/FilmDetail/FilmInformation/Actors/child::*");
        NodeList actors = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
        StringBuilder sb = new StringBuilder();
        // compile actor list
        for (int i = 0; i < actors.getLength(); i++) {
            String actorName = actors.item(i).getFirstChild().getNodeValue();
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(actorName);
        }
        // print output
        System.out.println("Film Code = " + filmCode);
        System.out.println("Film Name = " + filmName);
        System.out.println("IMDB = " + imdb);
        System.out.println("Director = " + director);
        System.out.println("Actors = " + sb.toString());
    }
}
Output:
Film Code = INCEPTION
Film Name = INCEPTION 2010
IMDB = 8.8
Director = Christopher Nolan
Actors = Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen Page