Conditional Java Parsing - Getting Child Node Contents

I am having some problems with what should be simple DOM parsing. I have checked over numerous questions and so far nothing has helped my situation. The problem is that I have some conditional nodes that may appear in an XML or may not appear. The tool that I have created must save the contents of these values into ArrayLists to be used later. Here is the XML in question:
<Dbtr>
  <PstlAdr>
    <AdrLine>111 Arlington Ave</AdrLine>
    <AdrLine>Apartment A</AdrLine>
    <AdrLine>Augusta, AZ 11100</AdrLine>
  </PstlAdr>
</Dbtr>
Specifically, the Dbtr tag may appear any number of times in an XML. Each Dbtr tag may have between 1 and 4 AdrLine children. I need to save the value of each AdrLine and, where an AdrLine is missing, save a blank "" value into the ArrayList in its place.
To do this I wrote the following code:
NodeList Dbtr = doc.getElementsByTagName("Dbtr");
for(int i = 0; i < Dbtr.getLength(); i++){
    NodeList DbtrChildren = Dbtr.item(i).getChildNodes();
    if(DbtrChildren.getLength()==1){
        //Add the first child.
    }else if(DbtrChildren.getLength()==2){
        //Add the first & second child.
    }else if(DbtrChildren.getLength()==3){
        System.out.println("Test Flag");
        System.out.println(DbtrChildren.item(0).getNodeValue()+"Node Value");
        System.out.println(DbtrChildren.item(0).getAttributes()+"Text Attributes");
        System.out.println(DbtrChildren.item(0).getTextContent()+"Text Content");
    }else if(DbtrChildren.getLength()==4){
        //Add all 4 children.
    }
}
So depending on the number of children AdrLine nodes the values will either be saved into an array list or else a blank value will be saved.
The problem is that no matter what I do I get blank values for the children. I can clearly see during testing that the Dbtr tag does in fact have 3 children. As you can see I tried to do some debugging to figure out some way to get the values. See results below:
Test Flag
Node Value
nullText Attributes
Text Content
So I'm getting a large amount of whitespace but no value. Of course I considered that perhaps it was actually picking up "PstlAdr" but then why would it successfully detect 3 child nodes?
Any help is GREATLY appreciated.

I figured out the solution to my problem.
In the above example I was getting back "3" for the length of DbtrChildren, which led me to believe that the three child nodes were the AdrLine tags.
In reality the three children were #text, PstlAdr and #text (each #text node holds the whitespace, i.e. the \n line break, between the tags).
So I was trying to get values from nodes that had no values. Once I tried
for(int i = 0; i < Dbtr.getLength(); i++){
    NodeList Dbtr2 = Dbtr.item(i).getChildNodes();
    Node Dbtr3 = Dbtr2.item(1);
    System.out.println(Dbtr3.getNodeName());
}
I got back "PstlAdr" as the result, so now I know I only need to go one level further, into PstlAdr, to fix my problem.
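For anyone landing here later, a minimal sketch of one way to finish the job. It assumes the 4-line maximum stated in the question; the class and method names (AdrLineReader, readAddressLines, parse) are made up for illustration. It looks up AdrLine by tag name inside each Dbtr instead of counting child nodes, so the whitespace #text nodes never get in the way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AdrLineReader {
    // Assumption taken from the question: a Dbtr holds at most four AdrLine elements.
    static final int MAX_LINES = 4;

    /** Parses an XML string into a DOM document (helper for this sketch only). */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Returns one list per Dbtr, each padded with "" to exactly MAX_LINES entries. */
    public static List<List<String>> readAddressLines(Document doc) {
        List<List<String>> result = new ArrayList<>();
        NodeList debtors = doc.getElementsByTagName("Dbtr");
        for (int i = 0; i < debtors.getLength(); i++) {
            Element debtor = (Element) debtors.item(i);
            // Searching descendants by tag name skips the whitespace #text nodes.
            NodeList lines = debtor.getElementsByTagName("AdrLine");
            List<String> values = new ArrayList<>();
            for (int j = 0; j < MAX_LINES; j++) {
                // Missing lines become ""; any lines beyond MAX_LINES are ignored.
                values.add(j < lines.getLength() ? lines.item(j).getTextContent() : "");
            }
            result.add(values);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<Root><Dbtr>\n<PstlAdr>\n"
                + "<AdrLine>111 Arlington Ave</AdrLine>\n"
                + "<AdrLine>Apartment A</AdrLine>\n"
                + "<AdrLine>Augusta, AZ 11100</AdrLine>\n"
                + "</PstlAdr>\n</Dbtr></Root>";
        System.out.println(readAddressLines(parse(xml)));
    }
}
```

The fixed-size padding replaces the if/else chain on the child count: every Dbtr always yields four entries, with empty strings where an AdrLine is absent.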

Related

How to count HTML Tags if I have dynamic changes in these tags?

I want to count some tags in this page Link
I am trying to count the tags for the open positions, so I tried this Java code, but my count always comes out as 0:
public By cardsNumOfPositions = By.xpath("//div[@class='card']");
List<WebElement> element = driver.findElements(cardsNumOfPositions);
int countelements = element.size();
And I wrote this function to print the count:
public void printCountElements() {
    System.out.println(countelements);
}
Every time, the count is 0. I searched for an iframe but I didn't find any.
So how can I get the size of this element list?

Also, you can make countelements a static variable, like this:
static int countelements;
Since it's static, its value will persist. You can assign and print it like this:
public void printCountElements() {
    countelements = element.size();
    System.out.println(countelements);
}

Looking at the code, it doesn't look like there's anything wrong with it if what you're trying to achieve is to count the number of elements on the page matching that xpath.
The xpath also looks good, and I've tested it and it works. I can see that iframes exist on the page, but it doesn't look like these elements are within those iframes, so it doesn't look like stepping into an iframe would be required.
Are there sufficient waits in place, to make sure these elements are fully loaded before we're trying to find them? I'm worried that we're trying to put these elements into a list before they've loaded.
Failing that, what I'd check next is to run the same test using the parent (or the parent's parent) of this element, and follow it up the chain to see whether we get any hits whilst running the code.
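To illustrate the waiting idea without tying it to a particular Selenium version, here is a minimal, library-agnostic polling sketch. The names WaitForElements and waitUntilNotEmpty are made up for this example; in a real Selenium test you would more likely reach for its built-in WebDriverWait.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Supplier;

public class WaitForElements {
    /**
     * Calls the finder repeatedly until it returns a non-empty list or the
     * timeout elapses. Returns the last result, which is empty if nothing
     * showed up in time.
     */
    public static <T> List<T> waitUntilNotEmpty(Supplier<List<T>> finder,
                                                Duration timeout,
                                                Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        List<T> found = finder.get();
        while (found.isEmpty() && System.nanoTime() < deadline) {
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
            found = finder.get();
        }
        return found;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for a page whose cards appear after roughly 200 ms.
        Supplier<List<String>> finder = () ->
                System.currentTimeMillis() - start < 200 ? List.of() : List.of("card");
        List<String> cards = waitUntilNotEmpty(finder, Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println(cards.size());
    }
}
```

With Selenium this could be called along the lines of waitUntilNotEmpty(() -> driver.findElements(cardsNumOfPositions), Duration.ofSeconds(10), Duration.ofMillis(250)), assuming driver is already initialised and the page has been requested.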

Trying to use an assert command in conjunction with an ArrayList using Selenium, but it breaks if the elements are not in the proper order

I have an array of strings that needs to be compared to the names shown on the website. The first thing I do is get the number of rows (because some lists have more or fewer than 2 people in them) and put that size into an int. The String[] names array holds the names that Selenium is supposed to find when it goes to the website and executes this statement: assertTrue(assertion.getText().contains(names[i-1]));
The problem is that the test breaks if the names do not appear on the page in the same order as in the array. In other words, if Mick Jagger is in li[1] and Keith Richards is in li[2], everything runs as expected, but if Keith Richards appears in li[1] it breaks. Furthermore, I am supposed to use the assertTrue command to do this. I have tried sorting and pushing what's on the web into a new ArrayList, and I keep getting errors. Does anyone know a good way to ensure the order isn't important and still use the assertTrue command?
Thanks,
Scott
WebElement assertion = null;
List<WebElement> assignees = driver.findElements(By.xpath(".//*[@id='assignee']/li"));
int count = assignees.size();
String[] names = {"Mick", "Keith"};
for (int i = 1; i < count; i++)
{
    assertion = driver.findElement(By.xpath(".//*[@id='assignee']/li["+i+"]"));
    assertTrue(assertion.getText().contains(names[i-1]));
}

If names represents the full string, you can just flip it. Make sure the text in your assertion (probably should be named something like assignee instead of assertion) is contained in your collection:
assertTrue(Arrays.asList(names).contains(assertion.getText()));
Let me know if this won't work because a name is actually a subset of the text in assertion and I'll adjust the answer.
If they don't exactly match (which you have indicated they don't), you could use LINQ in C# to match this. Since you're using Java, you can use an additional loop. There may be a more efficient way to do this in Java that I'm not aware of.
String assigneeText = assertion.getText();
boolean nameFound = false;
for(String name: names)
{
    nameFound = assigneeText.contains(name);
    if(nameFound)
    {
        break;
    }
}
assertTrue(nameFound, "None of the expected names were found in the following assignee text: " + assigneeText);
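The loop above checks one assignee at a time. A complementary sketch, working on plain strings so it stays independent of Selenium, checks that every expected name shows up somewhere in the list regardless of position. AssigneeMatcher and containsEveryName are hypothetical names, and the texts are assumed to have been collected beforehand by calling getText() on each element in assignees.

```java
import java.util.Arrays;
import java.util.List;

public class AssigneeMatcher {
    /**
     * Returns true when each expected name is a substring of at least one
     * of the given texts, in any order. Two names may match the same text.
     */
    public static boolean containsEveryName(List<String> texts, List<String> expectedNames) {
        return expectedNames.stream()
                .allMatch(name -> texts.stream().anyMatch(text -> text.contains(name)));
    }

    public static void main(String[] args) {
        String[] names = {"Mick", "Keith"};
        // Page order differs from array order; the check still passes.
        List<String> texts = Arrays.asList("Keith Richards", "Mick Jagger");
        System.out.println(containsEveryName(texts, Arrays.asList(names)));
    }
}
```

In the test itself the result can go straight into assertTrue, e.g. assertTrue(AssigneeMatcher.containsEveryName(texts, Arrays.asList(names))), which keeps the assertTrue requirement while making the order irrelevant.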

Jsoup, cannot get an element out of a table

I've been messing around with Jsoup lately. My friend loves to buy gold for Diablo, so I thought I'd make him a little program that will grab the prices from various websites and present them to him, so he can spend as little money as possible. Usually, I can grab the price like this:
Document Fasteve;
try {
    Fasteve = Jsoup.connect("http://www.fasteve.com/diablo-3/Gold/?st=US(Normal)").get();
    Elements Price = Fasteve.select("table[class=table_2] tr:eq(5) td:eq(1)");
    System.out.println("http://www.fasteve.com/diablo-3/Gold/?st=US(Normal)");
    System.out.println("1000M Gold = " + Price.text());
} catch (IOException e) {
    e.printStackTrace();
}
However, I can't use that method here. Nor can I use the approach where you name the tr and td you are grabbing from, because on this site all the tr's have the same class, so I can't call
Elements Price = Fasteve.select("table[class=table] tr[class=row] td:[class=column]");
Any thoughts as to how I can grab that value? (64.37)
Thanks once again, Stack Overflow.

Consider:
Creating a class that holds the td1 String and the td2 (price) String; let's call it DiabloGoldRow or some such.
Creating a collection of this class, say an ArrayList<DiabloGoldRow>, or, if you want to be able to look up information quickly based on the td1 String, a HashMap<String, DiabloGoldRow>.
Then using Jsoup to isolate the information in the table and iterating through it in a for loop, creating DiabloGoldRow instances and putting them into the ArrayList or other collection (e.g. the HashMap).
I'll leave the details of the code as an exercise for the student.
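As a rough starting point for that exercise, here is a sketch of the holder class and the map around it. Only the name DiabloGoldRow comes from this answer; DiabloGoldTable, add and priceFor are invented for the example, and the Jsoup part that would feed it is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

public class DiabloGoldTable {
    /** One table row: the gold amount label and its price, kept as scraped text. */
    public static final class DiabloGoldRow {
        private final String gold;
        private final String price;

        public DiabloGoldRow(String gold, String price) {
            this.gold = gold;
            this.price = price;
        }

        public String getGold() { return gold; }
        public String getPrice() { return price; }
    }

    // Keyed by the gold amount so a price can be looked up directly.
    private final Map<String, DiabloGoldRow> rowsByGold = new HashMap<>();

    /** Stores a row, replacing any earlier row with the same gold amount. */
    public void add(String gold, String price) {
        rowsByGold.put(gold, new DiabloGoldRow(gold, price));
    }

    /** Returns the price for the given gold amount, or "" if no such row exists. */
    public String priceFor(String gold) {
        DiabloGoldRow row = rowsByGold.get(gold);
        return row == null ? "" : row.getPrice();
    }

    public static void main(String[] args) {
        DiabloGoldTable table = new DiabloGoldTable();
        // Values taken from the question: 1000M gold priced at 64.37.
        table.add("1000M", "64.37");
        System.out.println("1000M Gold = " + table.priceFor("1000M"));
    }
}
```

Inside the row loop, each scraped pair would be passed to add(...), after which priceFor("1000M") gives the value the question is after.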
Edit
You ask,
Why do I need to create a separate class to hold the variables?
Because you need to hold the two pieces of information held on each row close together and may need to search on one to obtain the other. It's a lot cleaner to do it this way than to use 2D arrays or parallel arrays. What is your objection towards doing this?
Edit 2
You state,
I am not opposed to anything. I'm simply wondering how that will help me grab the values I need. My question was using the methods I normally do, I cannot grab the data I want to. I was simply looking for a different syntax to grab the specified data.
Again, one way you can do this is with a for loop. Simply loop through the rows of the table:
Elements eles = doc.select("table tr");
for (int i = 0; i < eles.size(); i++) {
    Elements rowEles = eles.get(i).select("form");
    Elements goldEles = rowEles.select("[name=gold]");
    String goldValue = goldEles.attr("value");
    Elements priceEles = rowEles.select("[name=price]");
    String priceValue = priceEles.attr("value");
    System.out.printf("%-7s: %-5s%n", goldValue, priceValue);
}

Handling Empty Nodes Using Java DOM

I have a question concerning XML, Java's use of DOM, and empty nodes. I am currently working on a project wherein I take an XML descriptor file of abstract machines (for text parsing) and parse a series of input strings with them. The actual building and interpretation of these abstract machines is all done and working fine, but I have come across a rather interesting XML requirement. Specifically, I need to be able to turn an empty InputString node into an empty string ("") and still execute my parsing routines. The problem, however, occurs when I attempt to extract this blank node from my XML tree. This causes a null pointer exception and then generally bad things start happening. Here is the offending snippet of XML (Note the first element is empty):
<InputStringList>
<InputString></InputString>
<InputString>000</InputString>
<InputString>111</InputString>
<InputString>01001</InputString>
<InputString>1011011</InputString>
<InputString>1011000</InputString>
<InputString>01010</InputString>
<InputString>1010101110</InputString>
</InputStringList>
I extract my strings from the list using:
//Get input strings to be validated
xmlElement = (Element)xmlMachine.getElementsByTagName(XML_INPUT_STRING_LIST).item(0);
xmlNodeList = xmlElement.getElementsByTagName(XML_INPUT_STRING);
for (int j = 0; j < xmlNodeList.getLength(); j++) {
    //Add input string to list
    if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {
        arrInputStrings.add(xmlNodeList.item(j).getFirstChild().getNodeValue());
    } else {
        arrInputStrings.add("");
    }
}
How should I handle this empty case? I have found a lot of information on removing blank text nodes, but I still actually have to parse the blank nodes as empty strings. Ideally, I would like to avoid using a special character to denote a blank string.
Thank you in advance for your time.

if (xmlNodeList.item(j).getFirstChild().getNodeValue() != null) {
nodeValue shouldn't be null; it would be firstChild itself that might be null and should be checked for:
Node firstChild = xmlNodeList.item(j).getFirstChild();
arrInputStrings.add(firstChild == null ? "" : firstChild.getNodeValue());
However note that this is still sensitive to the content being only one text node. If you had an element with another element in, or some text and a CDATA section, just getting the value of the first child isn't enough to read the whole text.
What you really want is the textContent property from DOM Level 3 Core, which will give you all the text inside the element, however contained.
arrInputStrings.add(xmlNodeList.item(j).getTextContent());
This is available in Java 1.5 onwards.
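A small self-contained sketch of the getTextContent approach, using a shortened version of the XML from the question. The class name InputStringReader and its method are made up for the example, and the tag name is written out literally where the question uses the XML_INPUT_STRING constant.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InputStringReader {
    /** Returns the text of every InputString element; empty elements yield "". */
    public static List<String> readInputStrings(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        NodeList nodes = doc.getElementsByTagName("InputString");
        List<String> result = new ArrayList<>();
        for (int j = 0; j < nodes.getLength(); j++) {
            // An element without children has text content "", so no
            // null check on getFirstChild() is needed here.
            result.add(nodes.item(j).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<InputStringList>"
                + "<InputString></InputString>"
                + "<InputString>000</InputString>"
                + "<InputString>111</InputString>"
                + "</InputStringList>";
        System.out.println(readInputStrings(xml));
    }
}
```

The empty first element comes back as an empty string in the list, so no special marker character is needed.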

You could use a library like jOOX to generally simplify standard DOM manipulation. With jOOX, you'd get the list of strings as such:
List<String> strings = $(xmlMachine).find(XML_INPUT_STRING_LIST)
                                    .find(XML_INPUT_STRING)
                                    .texts();

How to get some xml that comes before and a little from after a DOM Node

I am using Java and I am pretty open to using w3c DOM or DOM4J at this point.
So let's say I have a Node, such as a text node, in which I have found something interesting, like an occurrence of a substring in the node's text. If I want to get a string with a number of characters preceding that occurrence and a few characters after it, how may I do that? Basically, I need to be able to display a snippet of the original XML around the occurrence of that string.
The problem I have with, for example, getting the parent node and then calling asXML is that I no longer know the exact location of the substring in the text node. If I search again for that string value in the parent's XML, I may find 2 or more occurrences if the parent has other children that contain that string.
Much appreciated if anyone can answer this question.

I haven't done anything with the DOM from Java in ages, so treat this as an untested sketch rather than finished Java.
Basically, it boils down to something like this:
Node parent = node.getParentNode();
NodeList children = parent.getChildNodes();
for (int i = 0; i < children.getLength(); i++) {
    Node child = children.item(i);
    if (child == node) {
        // Do something different with the matched node
    } else {
        // Do something with child
    }
}
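One possible way to build a snippet from the siblings, offered as a sketch only. It is a simplification: it collects the text content of neighbouring nodes, so the result is plain text around the match, not the original markup. The names NodeContext and snippet are made up, and the match offset is assumed to be known from the earlier search inside the text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeContext {
    /** Parses an XML string into a DOM document (helper for this sketch only). */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /**
     * Returns the match plus up to `before` characters of text to its left and
     * up to `after` characters to its right, borrowing text from sibling nodes
     * when the text node itself is too short.
     */
    public static String snippet(Node textNode, int matchStart, int matchLength,
                                 int before, int after) {
        String own = textNode.getNodeValue();
        StringBuilder left = new StringBuilder(own.substring(0, matchStart));
        for (Node n = textNode.getPreviousSibling();
             n != null && left.length() < before; n = n.getPreviousSibling()) {
            left.insert(0, n.getTextContent());
        }
        StringBuilder right = new StringBuilder(own.substring(matchStart + matchLength));
        for (Node n = textNode.getNextSibling();
             n != null && right.length() < after; n = n.getNextSibling()) {
            right.append(n.getTextContent());
        }
        String leftPart = left.substring(Math.max(0, left.length() - before));
        String rightPart = right.substring(0, Math.min(after, right.length()));
        return leftPart + own.substring(matchStart, matchStart + matchLength) + rightPart;
    }

    public static void main(String[] args) {
        Document doc = parse("<p>abc<b>DEF</b>ghi needle jkl<i>MNO</i>pqr</p>");
        // Third child of <p> is the text node containing the match.
        Node text = doc.getDocumentElement().getChildNodes().item(2);
        int start = text.getNodeValue().indexOf("needle");
        System.out.println(snippet(text, start, "needle".length(), 6, 6));
    }
}
```

Because the search starts from the node that was actually matched, a second occurrence of the same string elsewhere under the parent cannot be confused with it. Getting real markup instead of text would need each sibling serialised, which this sketch does not attempt.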
