JDOM's annoying text nodes and addContent(index, Element): schema solutions? - java

I have some already-generated XML files, and the application that is causing problems now needs to add elements to them. These elements must be at a specific position to be valid against the applied schemas.
Now there are two problems. The first one is that I have to hardcode the positions, which is not that nice but "ok".
But the much bigger one is JDOM... I printed the content list and it looks like:
element1
text
element2
element4
text
element5
The text nodes are just whitespace, and every element I add makes it even more unpredictable how many text nodes there are (sometimes some get added, sometimes not). They are counted as if they were elements, but I want to ignore them: when I add element3 at index 2, it does not end up between element2 and element4; it comes after this annoying text node.
Any suggestions? The best solution, IMHO, would be something that automatically puts the element where it has to be according to the schema, but I think that's not possible?
Thanks for any advice :)

The JDOM model of the XML is very literal... it has to be. On the other hand, JDOM offers ways to filter and process the XML that should make your task easier.
In your case, you want to add Element content to the document, and all the text content is whitespace... so just ignore the text content and worry about the Element content only.
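For example, getChildren() returns a live list of the Element children only, so its indexes ignore the text nodes. If you want to insert a new element nemt before the 3rd Element (zero-based element index 2), you can:
rootemt.getChildren().add(2, new Element("nemt"));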
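JDOM's getChildren() does the element-only bookkeeping for you. For readers on the JDK's built-in org.w3c.dom API instead (a swapped-in stand-in, not JDOM), the same idea can be sketched by hand: count only element children, skip whitespace text nodes, and insert before the element currently at the target index. The class and method names and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ElementIndexInsert {
    // Parses an XML string, wrapping checked exceptions so callers stay simple.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Inserts newChild so that it becomes the element child at zero-based
    // elementIndex. Non-element nodes (whitespace text, comments) are skipped.
    static void insertAtElementIndex(Element parent, Node newChild, int elementIndex) {
        int count = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // not an element: does not count as a position
            }
            if (count == elementIndex) {
                parent.insertBefore(newChild, n);
                return;
            }
            count++;
        }
        if (count == elementIndex) {
            parent.appendChild(newChild); // index equals element count: append at the end
        } else {
            throw new IndexOutOfBoundsException("element index " + elementIndex + ", size " + count);
        }
    }

    // Names of the element children only, in document order.
    static List<String> childElementNames(Element parent) {
        List<String> names = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                names.add(n.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Document doc = parse("<root>\n  <element1/>\n  <element2/>\n  <element4/>\n  <element5/>\n</root>");
        Element root = doc.getDocumentElement();
        // element3 belongs at element index 2, between element2 and element4
        insertAtElementIndex(root, doc.createElement("element3"), 2);
        System.out.println(childElementNames(root)); // [element1, element2, element3, element4, element5]
    }
}
```

Because positions are counted over elements only, the result no longer depends on how many whitespace text nodes happen to sit between the elements.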
The elements are now sorted out... what about the text?
A really simple solution is to just pretty-print the output:
XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
xout.output(System.out, mydoc);
That way all the whitespace will be reformatted to make the XML 'pretty'.
EDIT: and no, there is no way with JDOM to automatically insert the element in the right place according to the schema.
Rolf

Related

Jsoup eq selector returns no value

I'm trying to fetch data using Jsoup 1.10.3, but the eq selector does not seem to work correctly.
I tried nth-child, but it does not seem to get the second table (table:nth-child(2)).
Is my selector correct?
html > body > table:nth-child(2) > tbody > tr:nth-child(2) > td:nth-child(2)
In the example below, I am trying to extract the value 232323.
Here is the try-it sample.
There are several issues that you may be struggling with. First, I don't think that you want to use the :nth-child(an+b) selector. Here is the explanation of that selector from the jsoup docs:
:nth-child(an+b) elements that have an+b-1 siblings before it in the document tree, for any positive integer or zero value of n, and has a parent element. For values of a and b greater than zero, this effectively divides the element's children into groups of a elements (the last group taking the remainder), and selecting the bth element of each group. For example, this allows the selectors to address every other row in a table, and could be used to alternate the color of paragraph text in a cycle of four. The a and b values must be integers (positive, negative, or zero). The index of the first child of an element is 1.
I guess you want to use the table:nth-of-type(n) selector.
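The difference between the two selectors can be sketched with a simplified model over a list of sibling tag names. This is not jsoup's implementation: it only covers the plain n form (a = 0) and a single parent, and the class and the sample sibling list are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class NthSelectorSketch {
    // Simplified "tag:nth-child(n)": the sibling at 1-based position n among ALL
    // siblings matches only if it has the given tag. Returns its 0-based index, or -1.
    static int nthChild(List<String> siblings, String tag, int n) {
        int i = n - 1;
        return (i >= 0 && i < siblings.size() && siblings.get(i).equals(tag)) ? i : -1;
    }

    // Simplified "tag:nth-of-type(n)": the n-th sibling carrying the given tag,
    // counting only siblings with that tag. Returns its 0-based index, or -1.
    static int nthOfType(List<String> siblings, String tag, int n) {
        int seen = 0;
        for (int i = 0; i < siblings.size(); i++) {
            if (siblings.get(i).equals(tag) && ++seen == n) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical children of <body>
        List<String> body = Arrays.asList("h1", "table", "p", "table");
        System.out.println(nthChild(body, "table", 2));  // 1: the FIRST table, which is the 2nd child
        System.out.println(nthOfType(body, "table", 2)); // 3: the second table
    }
}
```

With hypothetical children [div, p, table, table], table:nth-child(2) matches nothing at all, because the second child is a p; that fits the symptom of not getting the second table.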
Second, your selector only selects elements, but you want the visible content 232323, which is only an inner node of the element you select. So what is missing is the part where you get to the content. There are several ways of doing this. Again, I recommend that you read the docs; the cookbook in particular is very helpful for beginners. I guess you could use something like this:
String content = element.text();
Third, with CSS selectors you do not really need to go through every level of the DOM hierarchy. Since tables always contain tbody, tr and td elements, you can do something like this:
String content = document.select("table:nth-of-type(2) tr:nth-of-type(2) td:last-of-type").text();
Note, I do not have a java compiler at hand. Please use my code with care.

Jsoup attribute removal on html tags

I have the problem that I want to filter certain texts which may contain HTML.
I use jsoup to whitelist and clean the tags, which works pretty nicely.
My only problem is that some of the tags can contain attributes, mostly style or class, but there could also be other attributes (name, target, etc.). When cleaning, this is no problem because they get stripped nicely, but when whitelisting, some tags that would be allowed get blocked because of their attributes. The basic whitelist does not seem to cover style or class attributes, plus I cannot be sure what else I will encounter.
Since I want to allow quite a wide range of tags but remove most of them during cleaning, I don't want to add all attributes for all the tags I'm allowing. The simplest approach would be to strip all attributes from all tags, since I'm not interested in them anyway, and then check whether the stripped text with the plain tags is valid.
Is there a function that removes all attributes, or some simple loop? Another option would be to tell the whitelister to ignore all attributes and simply whitelist on the tags.
The solution that finally worked for me is quite simple. I iterate through all elements, then through all their attributes, and remove each attribute from the element. That leaves me with a cleaned version where I just have to validate the HTML tags themselves. I think this is not the neatest way to solve the problem, but it does what I wanted.
** EDIT **
I got upvoted many times for the old code, while it actually contained an absolute beginner's bug: you must not remove items from a collection while iterating over that same collection.
This bug only triggered when more than one attribute was removed, however.
Updated code with the bug fix:
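import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Whitelist; // renamed Safelist in newer jsoup releases

// Strips all attributes from all elements, then validates only the tags.
// aText and theLegalWhitelist are supplied by the caller.
static boolean isValidIgnoringAttributes(String aText, Whitelist theLegalWhitelist) {
    Document doc = Jsoup.parseBodyFragment(aText);
    for (Element e : doc.getAllElements()) {
        // Transfer the keys into a separate list first, to be sure ALL attributes
        // get removed; removing inside the loop over e.attributes() was the bug.
        List<String> attToRemove = new ArrayList<>();
        for (Attribute a : e.attributes()) {
            attToRemove.add(a.getKey());
        }
        for (String att : attToRemove) {
            e.removeAttr(att);
        }
    }
    return Jsoup.isValid(doc.body().html(), theLegalWhitelist);
}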
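Why the old version failed only when several attributes were present can be reproduced with plain JDK collections. The sketch below uses a LinkedHashMap as a stand-in for an element's attributes (it is not jsoup's Attributes class), and the class name is hypothetical. The JDK documents fail-fast ConcurrentModificationException as best-effort, so treat the buggy branch as illustrative.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RemoveWhileIterating {
    // Buggy pattern: remove from the map inside a for-each over its own keys.
    // Returns false if the fail-fast iterator threw ConcurrentModificationException.
    static boolean buggyRemoveAll(Map<String, String> attrs) {
        try {
            for (String key : attrs.keySet()) {
                attrs.remove(key);
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    // Fixed pattern: copy the keys into a separate list, then remove.
    static void safeRemoveAll(Map<String, String> attrs) {
        List<String> keys = new ArrayList<>(attrs.keySet());
        for (String key : keys) {
            attrs.remove(key);
        }
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("style", "color:red");
        attrs.put("class", "note");
        System.out.println(buggyRemoveAll(new LinkedHashMap<>(attrs))); // false
        safeRemoveAll(attrs);
        System.out.println(attrs.isEmpty()); // true
    }
}
```

With a single entry the buggy loop happens to finish, because the iterator has no next entry to visit after the removal; that matches the observation that the bug only showed up when more than one attribute was removed.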

Store XML data in DOM parser [duplicate]

I am new to Java and the XML DOM parser. I have a requirement to read XML data and store it in a column-and-row format.
Example: sample.xml file
<staff>
<firstname>Swetha</firstname>
<lastname>EUnis</lastname>
<nickname>Swetha</nickname>
<salary>10000</salary>
</staff>
<staff>
<firstname>John</firstname>
<lastname>MAdiv</lastname>
<nickname>Jo</nickname>
<salary>200000</salary>
</staff>
I need to read this XML file and store it in the following format:
firstName,lastName,nickName,Salary
swetha,Eunis,swetha,10000
john,MAdiv,Jo,200000
Java Code:
NodeList nl = doc.getElementsByTagName("*");
for (int i = 0; i < nl.getLength(); i++) {
    Element section = (Element) nl.item(i);
    Node title = section.getFirstChild();
    while (title != null && title.getNodeType() != Node.ELEMENT_NODE) {
        title = title.getNextSibling();
        if (title != null) {
            String first = title.getFirstChild().getNodeValue().trim();
            if (first != null) {
                title = title.getNextSibling();
            }
            System.out.print(first + ",");
        }
    }
    System.out.println("");
} // for
I wrote the above code, but I cannot find a way to get the data into the above column-and-row format. Can anyone please help me solve this? I have been looking into it for many days.
Since this looks like homework, I'm going to give you some hints:
The chances are that your lecturer has given you some lecture notes and/or examples on processing an XML DOM. Read them all again.
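The getElementsByTagName method takes an element name as a parameter. "*" is not a pattern; it is a special value that matches every element, so the call returns all the elements in the document (the root, each staff element and every field) in one flat list, which is not what your loop expects.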
Your code needs to mirror the structure of the XML. The XML structure in this case consists of N staff elements, each of which contains elements named firstname, lastname, nickname and salary.
It is also possible that your lecturer expects you to use something like XSLT or an XML binding mechanism to simplify this. (Or maybe this was intended to be XMI rather than XML ... in which case there are other ways to handle this ...)
I kept "*" as the getElementsByTagName parameter so that I can read the data dynamically.
Well, it doesn't work the way you want! The DOM getElementsByTagName method does NOT accept a pattern of any kind; "*" simply means "every element", so you get all the elements of the document in one flat list, with no record structure.
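For reference, in the W3C DOM "*" is a special value matching all elements rather than a pattern, and the result is a single flat list in document order. A small sketch with the JDK parser shows what a loop over that list actually visits. The class name is hypothetical, and so is the <company> root: the sample in the question has two top-level staff elements, which is not well-formed as a single XML document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WildcardTagNames {
    // Returns the names of all elements matched by getElementsByTagName("*").
    static List<String> allElementNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" matches every element; the list is flat and in document order
            NodeList nl = doc.getElementsByTagName("*");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nl.getLength(); i++) {
                names.add(nl.item(i).getNodeName());
            }
            return names;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<company><staff><firstname>Swetha</firstname>"
                + "<salary>10000</salary></staff></company>";
        System.out.println(allElementNames(xml)); // [company, staff, firstname, salary]
    }
}
```

Container elements (company, staff) and field elements (firstname, salary) arrive mixed in one sequence, which is why the code has to follow the structure instead.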
If you want to make your code generic, you can't use getElementsByTagName. You will need to walk the tree from the top, starting with the DOM's root node.
Can you please provide me with sample code?
No. Your lecturer would not approve of me giving you code to copy from. However, I will point out that there are lots of XML DOM tutorials on the web which should help you figure out what you need to do. The best thing is for you to do the work yourself. You will learn more that way ... and that is the whole point of your homework!
1. The DOM parser parses the entire XML file to create the DOM object.
2. You always need to be aware of the type of output and the structure of the XML returned when a request is sent to a web service.
3. It is not the XML structure of a web-service reply that is dynamic; the values and attributes of the child elements can be dynamic.
4. You will need to handle this dynamic behavior with a try/catch block...
For further details on the DOM parser, see this site:
http://tutorials.jenkov.com/java-xml/dom.html

Selenium webdriver: finding all elements with similar id

I have this XPath: //*[@id="someId::button"]
Pressing it shows a dropdown list of values.
Now, I know all the elements in the list have an id like this:
//*[@id="someId--popup::popupItemINDEX"]
where INDEX is a number from 1 to the number of options.
I also know the value which I must click.
One question would be: since I will always know the id of the button which generates the dropdown, can I get all the elements in the dropdown with a reusable method? (I need to interact with more than one dropdown)
The way I thought about it is:
get the root of the initial ID, as in:
//*[@id="someId
then add the rest: --popup::popupItem. I also need to add the index, and I thought I could use a try block (to get through the exceptions when I give a larger-than-expected index) like this:
for (int index = 1; index < someBiggerThanExpectedNumber; index++) {
    try {
        WebElement aux = driver.findElement(By.xpath(builtString + index + "\"]"));
        if (aux.getText().equals(myDesiredValue))
            aux.click();
    } catch (Exception e) {}
}
Note that I am using the webdriver api and java.
I would like to know if this would work and if there is an easier way of doing this, given the initial information I have.
EDIT: The way I suggested works, but for an easier solution, see the accepted answer.
As a rule of thumb, try to select more elements by one query, if possible. Searching for many elements one-by-one will get seriously slow.
If I understand your needs well, a good way to do this would be using
driver.findElement(By.id("someId::button")).click();
driver.findElement(By.xpath("//*[contains(@id, 'someId--popup::popupItem') " +
        "and text()='" + myDesiredValue + "']"))
    .click();
For more information about XPath, see the spec. It's surprisingly a very good read if you can skip the crap!
That finds and clicks the element whose text equals your desired value and whose ID contains "someId--popup::popupItem".
List<WebElement> list = driver.findElements(By.xpath("//*[contains(@id, 'someId--popup::popupItem')]"));
That finds all elements whose ID contains "someId--popup::popupItem". You can then traverse the list and look for your desired element.
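The two expressions above can be tried outside the browser with the JDK's javax.xml.xpath API. This is a stand-in, not Selenium: the markup is hypothetical XML modeled on the ids from the question, the class name is hypothetical, and on a real page the browser's own XPath engine evaluates the expression.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContainsIdXPath {
    // Evaluates an XPath expression against an XML string and returns the
    // text content of every matched node, in document order.
    static List<String> texts(String xml, String expr) {
        try {
            NodeList nl = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nl.getLength(); i++) {
                out.add(nl.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical dropdown markup, modeled on the ids from the question
        String xml = "<ul>"
                + "<li id='someId--popup::popupItem1'>Apple</li>"
                + "<li id='someId--popup::popupItem2'>Banana</li>"
                + "<li id='otherId--popup::popupItem1'>Cherry</li>"
                + "</ul>";
        System.out.println(texts(xml, "//*[contains(@id, 'someId--popup::popupItem')]"));
        System.out.println(texts(xml, "//*[contains(@id, 'someId--popup::popupItem') and text()='Banana']"));
    }
}
```

The first query returns both someId items and skips the otherId one; the second narrows the match to the single item whose text is Banana, so one query replaces the index loop.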
Did you know you can call findElement() on a WebElement to search just its descendants?
- driver.findElement(By.id("someId")).findElements(By.className("clickable"))
Without a peek at the underlying HTML, I can't offer the best approach, but I have some ideas.
Have you tried using JavascriptExecutor?
If you are willing to write a little JavaScript, this would be more straightforward than in Java (I think).
All you need to do is have some JavaScript crawl through the DOM subtree and return a list of DOM elements matching your criteria. WebDriver will then happily marshal this as a List<WebElement> in the Java world.
The safer method to use here is:
int size = driver.findElements(By.xpath("//*[@id='someId::button']")).size();
Then iterate using the index:
String builtString = "(//*[@id='someId::button'])[";
for (int index = 1; index <= size; index++) {
    try {
        WebElement aux = driver.findElement(By.xpath(builtString + index + "]"));
        if (aux.getText().equals(myDesiredValue))
            aux.click();
    } catch (Exception e) {}
}
Please let me know whether the above approach works.
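One XPath detail matters when indexing a node set in a loop like this: //x[n] selects the n-th x child of each parent, while (//x)[n] selects the n-th x in the whole document. The sketch below checks this with the JDK's javax.xml.xpath on hypothetical markup; it is a stand-in for the browser's XPath engine that Selenium uses, and the class name is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PositionalXPath {
    // Returns the text content of every node matched by expr, in document order.
    static List<String> texts(String xml, String expr) {
        try {
            NodeList nl = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nl.getLength(); i++) {
                out.add(nl.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical markup: two lists with two items each
        String xml = "<root><ul><li>a</li><li>b</li></ul><ul><li>c</li><li>d</li></ul></root>";
        System.out.println(texts(xml, "//li[2]"));   // [b, d]: second li within EACH parent
        System.out.println(texts(xml, "(//li)[2]")); // [b]: second li in the whole document
    }
}
```

When the goal is "the n-th match anywhere on the page", the parenthesized form is the one to build the index into.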

How to add string content to XML nodes (empty node / append to existing content) using an XPath query

I have the following XML structure, with the child nodes title, desc, symp, diag, treat, addinfo, etc. I want to check whether addinfo contains anything and also append a string to it, like "This is additional info". There are many diseases, and I need to check the addinfo tag of each disease according to its title tag.
<chapter>
<disease type="Name">
<title>Name</title>
<desc>--------------</desc>
<symp>--------------</symp>
<diag>-------------</diag>
<treat>-------------</treat>
<addinfo></addinfo>
</disease>
</chapter>
I am using an XPath query to search the content of the tags according to the disease name.
Thanks
XPath can only retrieve information from your document, it cannot modify it.
XQuery can produce a modified copy of your document, but it's not particularly easy, because you need to explicitly copy all the parts that you don't want to change.
XSLT is a better bet for producing a modified copy of the document; or XQuery Update Facility if you prefer.
You can use an XPath like this to see if addinfo is empty or not: //addinfo[not(node())]
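The same predicate can be extended to return the titles of the diseases whose addinfo is empty. A minimal sketch using the JDK's javax.xml.xpath, assuming the chapter/disease/title/addinfo layout from the question; the class name is hypothetical. Note that not(node()) treats an addinfo containing only whitespace as non-empty; //disease[not(normalize-space(addinfo))]/title would also catch those (and diseases with no addinfo at all).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EmptyAddinfo {
    // Titles of all diseases whose <addinfo> has no child nodes at all.
    static List<String> titlesWithEmptyAddinfo(String xml) {
        try {
            NodeList nl = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//disease[addinfo[not(node())]]/title",
                            new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nl.getLength(); i++) {
                titles.add(nl.item(i).getTextContent());
            }
            return titles;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<chapter>"
                + "<disease type='Flu'><title>Flu</title><addinfo></addinfo></disease>"
                + "<disease type='Cold'><title>Cold</title><addinfo>Rest</addinfo></disease>"
                + "</chapter>";
        System.out.println(titlesWithEmptyAddinfo(xml)); // [Flu]
    }
}
```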
UPDATE:
OK, now I'm totally lost as to what you want. It sounds like a really straightforward problem. You would create some sort of getText() method that, given an XML node, gets the text associated with it (or returns null or an empty string if there is none). Then you would traverse all the diseases and, for each disease, perform your logic: if( string.isNullOrEmpty(getText(disease, "addinfo")) ) { // do something }. That works whether you want to check if addinfo is empty, add content to it, etc.
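The traversal described above can be sketched with the JDK's org.w3c.dom API. getText is the hypothetical helper the answer describes (here returning an empty string when the child is missing), and AddinfoUpdater and appendInfo are hypothetical names. Appending a text node keeps any existing content, so the same call covers both the empty and the non-empty case.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AddinfoUpdater {
    // Parses an XML string, wrapping checked exceptions so callers stay simple.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical helper: trimmed text of the first descendant with this name, or "".
    static String getText(Element parent, String childName) {
        NodeList nl = parent.getElementsByTagName(childName);
        return nl.getLength() == 0 ? "" : nl.item(0).getTextContent().trim();
    }

    // Appends info to <addinfo> of every disease whose <title> equals title.
    // Returns how many diseases were updated.
    static int appendInfo(Document doc, String title, String info) {
        int updated = 0;
        NodeList diseases = doc.getElementsByTagName("disease");
        for (int i = 0; i < diseases.getLength(); i++) {
            Element disease = (Element) diseases.item(i);
            if (!getText(disease, "title").equals(title)) {
                continue; // not the disease we are looking for
            }
            NodeList addinfo = disease.getElementsByTagName("addinfo");
            if (addinfo.getLength() == 0) {
                continue; // no addinfo element to update
            }
            // appendChild keeps existing content and adds the new text after it
            addinfo.item(0).appendChild(doc.createTextNode(info));
            updated++;
        }
        return updated;
    }

    public static void main(String[] args) {
        Document doc = parse("<chapter><disease type='Flu'><title>Flu</title>"
                + "<addinfo></addinfo></disease></chapter>");
        Element flu = (Element) doc.getElementsByTagName("disease").item(0);
        System.out.println(getText(flu, "addinfo").isEmpty()); // true
        appendInfo(doc, "Flu", "This is additional info");
        System.out.println(getText(flu, "addinfo")); // This is additional info
    }
}
```

To write the modified document back to a file, you would still need to serialize it, for example with javax.xml.transform.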
