Processing XML elements inline with text - java

I have a program which reads an XML file using Java DOM and processes certain elements. For example, here is part of the document I am looking at:
<Flow>
<Id>306</Id>
<Type>Simple</Type>
<FlowContent Width="0.2000000000000000111">
<P Id="523"><T xml:space="preserve" Id="652">A spouse’s pension would be paid equal to <O Id="351"/>% of your Core pension at date of death.</T>
</P>
</FlowContent>
(Note: this is exported from a program called GMC Inspire Designer, so I have no control over its format.)
I can process most elements fine, but have issues with text content which also contains elements. In the example above, another layout object <O Id="351"/> (referencing another piece of text or a variable) occurs in the body of the text.
I can look up this element and retrieve it using the ID number. This is the element linked in the above snippet:
<Variable>
<Id>351</Id>
<Name>CAMT44</Name>
What I would then like to do is output information from the linked node. For example, I could look up the node with ID 351, retrieve its name etc., and then display this information where the element appears within the string.
I currently look up children and store the ID in a string array like so:
NodeList nl = e.getElementsByTagName("O");
sa = new String[nl.getLength()]; // Set up new array to hold child ids
for (int i = 0; i < nl.getLength(); i++) {
    sa[i] = nodeToElement(nl.item(i)).getAttribute("Id");
}
I'm very much a Java beginner, so I've been wondering whether DOM was the right choice for this project. Perhaps I should have used SAX instead, but as I don't have much XML experience, I'm not sure which best suits my needs. As I mentioned, I have managed to do most of the things I need; it's just this last tricky bit that I'm stuck on.
Currently my output looks like this:
IF CR.SCHEME == "EXCT" PRINT:
"A spouse’s pension would be paid equal to % of your Core pension at
date of death, ignoring the fact that you may have chosen to convert
part of your pension into a lump sum at retirement."
Child flow: 351
It would be great if there is some way to do this using DOM. Apologies if anything is unclear, I'm new to most of this.

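You should be able to do something like this, looping over the child nodes of the <T> element (called t here) rather than over getElementsByTagName("O"), which returns only the <O> elements and none of the surrounding text:
StringBuilder output = new StringBuilder();
NodeList nl = t.getChildNodes();
for (int i = 0; i < nl.getLength(); i++) {
    Node n = nl.item(i);
    if (n.getNodeType() == Node.TEXT_NODE) {
        output.append(n.getTextContent());
    } else if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals("O")) {
        // Attribute names are case-sensitive: the export uses "Id", not "id"
        output.append(lookup(doc, ((Element) n).getAttribute("Id")));
    }
}
System.out.println(output);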
The lookup method is something you would need to implement yourself, but it would look something like this:
private static String lookup(Document doc, String id) {
return "<IMPLEMENT_LOOKUP_HERE>";
}
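Putting the two pieces together, here is a minimal, self-contained sketch. It assumes the linked objects are exported as <Variable> elements with <Id> and <Name> children, as in the snippet in the question, and fills in lookup (a hypothetical implementation) with an XPath query that returns the variable's name in square brackets. Attribute names are case-sensitive, so the <O> element's attribute is read as "Id".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class InlineFlow {

    // Hypothetical lookup: assumes each linked object is exported as
    // <Variable><Id>...</Id><Name>...</Name></Variable> and that ids are plain numbers
    // (they are pasted into the XPath string without escaping).
    static String lookup(Document doc, String id) {
        try {
            String name = XPathFactory.newInstance().newXPath()
                    .evaluate("//Variable[Id='" + id + "']/Name", doc);
            // evaluate() returns "" when nothing matches
            return name.isEmpty() ? "[unknown " + id + "]" : "[" + name + "]";
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Copies the text nodes of a <T> element in document order and replaces
    // each <O Id="..."/> child with whatever lookup() returns for it.
    static String render(Document doc, Element t) {
        StringBuilder out = new StringBuilder();
        NodeList children = t.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.TEXT_NODE) {
                out.append(n.getNodeValue());
            } else if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals("O")) {
                out.append(lookup(doc, ((Element) n).getAttribute("Id")));
            }
        }
        return out.toString();
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Trimmed stand-in for the export; the <Root> wrapper is an assumption.
        Document doc = parse("<Root>"
                + "<Flow><Id>306</Id><FlowContent><P Id=\"523\"><T Id=\"652\">"
                + "A spouse's pension would be paid equal to <O Id=\"351\"/>% of your Core pension."
                + "</T></P></FlowContent></Flow>"
                + "<Variable><Id>351</Id><Name>CAMT44</Name></Variable>"
                + "</Root>");
        Element t = (Element) doc.getElementsByTagName("T").item(0);
        System.out.println(render(doc, t));
        // prints: A spouse's pension would be paid equal to [CAMT44]% of your Core pension.
    }
}
```

Whether to show the name, the id, or something else in place of <O> is up to you; only lookup() would need to change.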


How do I select a specific element from a set with similar XPath paths?

There are two drop-down lists in different modules, and each contains the same value, for example, "Jorge". (It is the values in the drop-down lists that match, not the lists themselves.) When I need to fill in, say, the list that is lower in the tree, the XPath expression takes the first match, which is in a list that is not expanded.
I wanted to implement it in Java this way:
Example:
if (findElement(By.xpath("(//example//example)")).isDisplayed()) {
    findElement(By.xpath("(//example//example)")).click();
}
But in this case, the element is not displayed.
How can I search all the elements that match the XPath expression and get the one that is displayed?
I tried indexing the matches like this: (//example//example)[1], (//example//example)[2], (//example//example)[3]
In my case, [1] does not exist, [2] exists but is not displayed (isDisplayed() = false), and [3] exists and is displayed (isDisplayed() = true).
Iterating over [n] in a loop does not work because, for example, [1] does not exist.
Sorry, I've described this in the most convoluted way possible :D.
If someone understands what I mean, please help me. How can I implement this?
UPD:
The problem was solved (for me) by substituting the first index, [1], into the expression straight away.
Now I'm interested in why I get an exception after the first iteration:
Method threw 'org.openqa.selenium.ElementNotInteractableException' exception.
Code:
int number = 1;
String option = "(//ul[contains(@style, 'display: block')]//li//span[contains(text(),'" + valueField + "') or strong[contains(text(),'" + valueField.toUpperCase() + "')]])";
findElement(By.xpath(option+"["+number+"]"));
String[] words = valueField.split(" ");
StringBuilder builder = new StringBuilder();
for (int i = 0; i < words.length; i++) {
    builder.append(words[i]);
    setFieldByLabel(nameModule, nameLabel, builder.toString());
    fastWaitLoading();
    for (int y = 0; y < 10; y++) {
        if (findElement(By.xpath(option+"["+number+"]")).isDisplayed()) {
            new Actions(browser.getWebDriver())
                    .moveToElement(findElement(option))
                    .click()
                    .build()
                    .perform();
            break;
        }
        number++;
    }
}
I'm not sure I fully understand your question, but what I would recommend in a situation like this is to iterate through all the matching elements by collecting them into a list with findElements(By.xpath(...)).
This way you get a list of WebElements that you can iterate through. In a for-each loop, check whether each element is displayed (it is guaranteed to exist, since findElements found it), and you should be able to interact with the one that is.
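A minimal sketch of that idea: the only logic needed is "take the first element of the findElements(...) list that is displayed". To keep the example runnable without a browser, that step is written as a small generic helper (firstMatching is a hypothetical name), with plain strings standing in for WebElements; the commented line shows how it could be wired to Selenium's findElements and WebElement::isDisplayed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class FirstDisplayed {

    // Returns the first item, in list order, that satisfies the predicate.
    static <T> Optional<T> firstMatching(List<T> items, Predicate<? super T> test) {
        for (T item : items) {
            if (test.test(item)) {
                return Optional.of(item);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for three matches of the same XPath: only the third is "displayed".
        List<String> matches = Arrays.asList("hidden-1", "hidden-2", "visible-3");
        System.out.println(firstMatching(matches, s -> s.startsWith("visible")).orElse("none"));

        // With Selenium (option being the unindexed expression from the question):
        // firstMatching(browser.getWebDriver().findElements(By.xpath(option)), WebElement::isDisplayed)
        //         .ifPresent(WebElement::click);
    }
}
```

Unlike findElement, findElements returns an empty list instead of throwing when nothing matches, so a missing [1] is no longer a problem.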
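Yeah, the mistake is right in plain sight :) You missed it here:
new Actions(browser.getWebDriver()).moveToElement(findElement(option)).click().build().perform();
break;
The loop checks whether option + "[" + number + "]" is displayed, but then moves to findElement(option), which resolves to the first match rather than the one whose visibility was just checked, hence the ElementNotInteractableException. Use the indexed expression instead:
new Actions(browser.getWebDriver())
        .moveToElement(findElement(option + "[" + number + "]"))
        .click()
        .build()
        .perform();
break;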

finding the value pair that has the highest affinity in Java?

Hi, I am currently working on an algorithm problem set.
Given the following contents of file.txt:
yahoo,ap42
google,ap42
twitter,thl76
google,aa314
google,aa314
google,thl76
twitter,aa314
twitter,ap42
yahoo,aa314
A web server logs page views in a log file. The log file consists of one line per page view. A page view consists of page id and a user id, separated by a comma. The affinity of a pair of pages is the number of distinct users who viewed both pages. For example in the quoted log file, the affinity of yahoo and google is 2 (because ap42 viewed both and aa314 viewed both).
My requirement is to create an algorithm which will return the pair of pages with highest affinity.
I have written the code below, but it does not return the pair of pages with the highest affinity. Any suggestions on how to modify it to make it work? Thanks:
Scanner in = new Scanner(new File("./file.txt"));
ArrayList<String[]> logList = new ArrayList<String[]>();
while (in.hasNextLine()) {
    logList.add(in.nextLine().split(","));
}
String currentPage;
String currentUser;
int highestCount = 0;
for (int i = 0; i < logList.size() - 1; i++) {
    int affinityCount = 0;
    currentPage = logList.get(i)[0];
    currentUser = logList.get(i)[1];
    for (int j = logList.size() - 1; j > 0; j--) {
        if (i != j) {
            if (!currentPage.equals(logList.get(j)[0])
                    && currentUser.equals(logList.get(j)[1])) {
                affinityCount++;
                System.out.println("currentPage: " + currentPage + " currentUser: " + currentUser);
                System.out.println("logList.get(j)[0]: " + logList.get(j)[0] + " logList.get(j)[1]): " + logList.get(j)[1]);
                System.out.println(affinityCount);
            }
        }
    }
}
I'm going to describe the algorithm here; you can convert it into code.
Traverse the file and build a HashMap of <user, pages viewed by that user>. Use a set for the pages, so that repeat views (such as google,aa314 appearing twice) are counted only once.
After this traversal, you will have the pages viewed by each user.
Now traverse this map. For each user, take the pages they viewed. Form every possible pair of pages and put it in a max heap with its value set to 1. If the pair already exists in the heap, increment its value.
Make sure you treat yahoo,google the same as google,yahoo when comparing.
At the end of this, the element at top of the heap is your output.
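Here is one way to turn the steps above into Java, as a sketch rather than a definitive solution. It keeps a set of pages per user, and it swaps the max heap for a HashMap of pair counts plus a final scan for the largest value, because java.util.PriorityQueue cannot increase an entry's priority in place. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PageAffinity {

    // Pair key "a,b" (pages in alphabetical order) -> number of distinct users who viewed both.
    static Map<String, Integer> affinities(List<String> lines) {
        // Step 1: user -> distinct pages viewed; the Set ignores repeat views.
        Map<String, Set<String>> pagesByUser = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split(",");
            if (parts.length != 2) {
                continue; // skip blank or malformed lines
            }
            pagesByUser.computeIfAbsent(parts[1], k -> new TreeSet<>()).add(parts[0]);
        }
        // Step 2: every unordered pair of a user's pages gets +1. Sorting the pages
        // means yahoo,google and google,yahoo produce the same key.
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> pages : pagesByUser.values()) {
            List<String> sorted = new ArrayList<>(pages); // TreeSet iterates in sorted order
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    counts.merge(sorted.get(i) + "," + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Step 3: pick the pair with the largest count (ties are resolved arbitrarily).
    static String highestAffinity(List<String> lines) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : affinities(lines).entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "yahoo,ap42", "google,ap42", "twitter,thl76", "google,aa314", "google,aa314",
                "google,thl76", "twitter,aa314", "twitter,ap42", "yahoo,aa314");
        System.out.println(affinities(log));
        System.out.println(highestAffinity(log)); // google,twitter
    }
}
```

For the sample log, google and twitter come out on top with an affinity of 3 (ap42, aa314 and thl76 viewed both), ahead of google/yahoo and twitter/yahoo at 2. To read the real file instead, Files.readAllLines(Paths.get("file.txt")) returns its lines as a List<String> that can be passed straight to highestAffinity.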

Getting the text content of an XML element without getting the text content of its child nodes

I am trying to import an image from an SVG format into the software I am working with.
For example, I have an SVG like this:
<svg height="100" width="100">
<text>I Love SVG!
<tspan> NOT! </tspan>
</text>
</svg>
When processing the 'text' element, I use TextString = element.getTextContent(). This makes TextString = "I Love SVG! NOT!" when all I want is "I Love SVG!". The getTextContent method returns the text of the element and of all its descendants, and I don't want to include the child elements' text.
Is there a simple way to grab only the text content of an element without getting the child nodes text as well? Thanks
You can iterate through the element's child nodes using element.getChildNodes(). Any of them that have a node type of Node.TEXT_NODE are text nodes (there will often be more than one).
String textContent = "";
NodeList childNodes = element.getChildNodes();
// NodeList exposes getLength(), not length()
for (int i = 0; i < childNodes.getLength(); i++) {
    Node n = childNodes.item(i);
    if (n.getNodeType() == Node.TEXT_NODE) {
        textContent += n.getNodeValue();
    }
}
Found a solution to my problem; thanks for the suggestions, JLRishe and kulatamicuda.
Instead of TextString = element.getTextContent() (which got the text from all child nodes as well as the element's own text node),
I used TextString = element.getFirstChild().getTextContent().
When using this I made sure the first child was actually a text node and not something like this:
<text><tspan>tspan text</tspan></text>
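For comparison with the getFirstChild() shortcut, here is a self-contained sketch of the child-node loop from the first answer, applied to the SVG from the question. It also accepts CDATA sections (an addition beyond the original answer) and, unlike getFirstChild().getTextContent(), returns an empty string rather than the <tspan> text when the first child is an element. The helper names ownText and firstByTag are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class OwnText {

    // Concatenates only the element's direct text (and CDATA) children,
    // skipping the text inside child elements such as <tspan>.
    static String ownText(Element element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.TEXT_NODE || n.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Parses the XML string and returns the first element with the given tag name.
    static Element firstByTag(String xml, String tag) {
        try {
            return (Element) DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName(tag).item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Element text = firstByTag(
                "<svg height=\"100\" width=\"100\"><text>I Love SVG!<tspan> NOT! </tspan></text></svg>", "text");
        System.out.println("[" + ownText(text).trim() + "]"); // prints [I Love SVG!]

        // First child is an element: ownText() yields "", whereas
        // getFirstChild().getTextContent() would return "tspan text".
        Element onlyTspan = firstByTag("<text><tspan>tspan text</tspan></text>", "text");
        System.out.println("[" + ownText(onlyTspan) + "]"); // prints []
    }
}
```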

What is the easiest way to check that one XML file is a "subset" of the second XML?

I have two XML files. The first XML has a bunch of nodes that should be present in the second XML as well. The second XML might also have a few extra nodes. I need a Java-based program that can automate this check, i.e. given two XML files, it should tell me whether all the nodes of the first file are present in the second.
I am looking at Java + XMLUnit. However, XMLUnit does not have an exact solution for this. Help, please.
Thanks.
Here is some sample code from XMLUnit.
This first method compares two XMLs while ignoring text and attribute values, so only their structure has to match:
public void testCompareToSkeletonXML() throws Exception {
    String myControlXML = "<location><street-address>22 any street</street-address><postcode>XY00 99Z</postcode></location>";
    String myTestXML = "<location><street-address>20 east cheap</street-address><postcode>EC3M 1EB</postcode></location>";
    DifferenceListener myDifferenceListener = new IgnoreTextAndAttributeValuesDifferenceListener();
    Diff myDiff = new Diff(myControlXML, myTestXML);
    myDiff.overrideDifferenceListener(myDifferenceListener);
    assertTrue("test XML matches control skeleton XML " + myDiff, myDiff.similar());
}
You can compare one XML against the other (keeping one as a skeleton XML) to find out whether one is a subset of the other.
If that isn't satisfactory, there is another method that finds all the differences between two given XMLs:
public void testAllDifferences() throws Exception {
    String myControlXML = "<news><item id=\"1\">War</item>"
            + "<item id=\"2\">Plague</item><item id=\"3\">Famine</item></news>";
    String myTestXML = "<news><item id=\"1\">Peace</item>"
            + "<item id=\"2\">Health</item><item id=\"3\">Plenty</item></news>";
    DetailedDiff myDiff = new DetailedDiff(compareXML(myControlXML, myTestXML));
    List allDifferences = myDiff.getAllDifferences();
    assertEquals(myDiff.toString(), 0, allDifferences.size());
}
See the docs of XMLUnit for more.
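Since XMLUnit has no built-in subset check, one alternative is to write the check directly against the JDK's DOM API. The sketch below is a minimal version under an assumed definition of "subset" (spelled out in the comments); the class and method names are hypothetical, and element order, namespaces, and whitespace handling would likely need adjusting for real files.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlSubset {

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    static List<Element> childElements(Element e) {
        List<Element> result = new ArrayList<>();
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) kids.item(i));
            }
        }
        return result;
    }

    // Assumed definition of "subset": same tag name, every attribute of small present in
    // big with the same value, equal trimmed text when small has no child elements, and
    // every child element of small matched (recursively, in any order) by some child of big.
    // Matching is not one-to-one: two identical children in small may match the same child in big.
    static boolean isSubset(Element small, Element big) {
        if (!small.getTagName().equals(big.getTagName())) {
            return false;
        }
        NamedNodeMap attrs = small.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            if (!big.hasAttribute(a.getName()) || !big.getAttribute(a.getName()).equals(a.getValue())) {
                return false;
            }
        }
        List<Element> smallKids = childElements(small);
        if (smallKids.isEmpty()) {
            return small.getTextContent().trim().equals(big.getTextContent().trim());
        }
        List<Element> bigKids = childElements(big);
        for (Element kid : smallKids) {
            boolean found = false;
            for (Element candidate : bigKids) {
                if (isSubset(kid, candidate)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Element first = parseRoot("<location><postcode>XY00 99Z</postcode></location>");
        Element second = parseRoot("<location><street-address>22 any street</street-address>"
                + "<postcode>XY00 99Z</postcode></location>");
        System.out.println(isSubset(first, second)); // true
        System.out.println(isSubset(second, first)); // false
    }
}
```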
First things first. Let me go on record and say that XMLUnit is a gem. I loved it. If you are looking at unit testing of XML values / attributes / structure etc., chances are that you will find a ready-made solution with XMLUnit. This is a good place to start from.
It is quite extensible. It already comes with an identity check (as in the XMLs have the same elements and attributes in the same order) or similarity check (as in the XMLs have the same elements and attributes regardless of the order).
However, in my case I was looking for a slightly different usage. I had a big-ish XML (a few hundred nodes) and a bunch of XML files (around 350,000 of them). I needed to skip comparing particular nodes that I could identify with XPath. They were not always in the same position in the XML, but there was some generic way of identifying them with XPath. Sometimes nodes had to be ignored based on the values of other nodes. Just to give some idea:
The logic here is on the node that I want to ignore, i.e. price.
/bookstore/book[price>35]/price
The logic here is on a node that is at a relative position. I want to ignore author based on the value of price. And these two are related by position.
/bookstore/book[price=30]/./author
After much tinkering, I settled on a low-tech solution. Before using XMLUnit to compare the files, I used XPath to mask the values of the nodes that were to be ignored.
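import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Set;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.filter.Filters;
import org.jdom2.input.SAXBuilder;
import org.jdom2.xpath.XPathExpression;
import org.jdom2.xpath.XPathFactory;

// Replaces the text of every element matched by one of the XPath expressions with mask
// and returns how many elements were masked (logger is whatever logging framework you use).
public static int massageData(File xmlFile, Set<String> xpaths, String mask)
        throws JDOMException, IOException {
    logger.debug("Data massaging started for " + xmlFile.getAbsolutePath());
    int counter = 0;
    Document doc = new SAXBuilder().build(xmlFile.getAbsolutePath());
    for (String xpath : xpaths) {
        logger.debug(xpath);
        XPathExpression<Element> xpathInstance = XPathFactory.instance()
                .compile(xpath, Filters.element());
        List<Element> elements = xpathInstance.evaluate(doc);
        if (elements != null) {
            if (elements.size() > 1) {
                logger.warn("Multiple matches were found for " + xpath
                        + " in " + xmlFile.getAbsolutePath()
                        + ". This could be a *potential* error.");
            }
            for (Element element : elements) {
                logger.debug(element.getText());
                element.setText(mask);
                counter++;
            }
        }
    }
    // doc is only changed in memory: write it out (e.g. with org.jdom2.output.XMLOutputter)
    // or pass it on to the comparison step before returning.
    return counter;
}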
Hope this helps.

jsoup multi element output

Hello guys, I am trying to print the data from two element selections simultaneously:
Document document2 = Jsoup.parse(webPage2);
Document document22 = Jsoup.parse(webPage2);
Elements links2 = document2.select("a.yschttl");
Elements links22 = document22.select("div.abstr");
Can we include both a.yschttl and div.abstr in one selection, or...
for (Element link2 : links2) {
    out.println(link2);
}
Can we include both, say, links2 and links22, in the same for loop?
Or how else can I achieve it?
You can do something like:
for (int i = 0; i < links2.size(); i++) {
    out.println(links2.get(i));
    out.println(links22.get(i));
}
But in this case you will get an IndexOutOfBoundsException if links22 has fewer elements than links2 (and any extra elements in links22 are silently skipped).
What do you want to achieve?
If you are just trying to select both at the same time, you can do something like this:
for (Element link : document.select("a.yschttl, div.abstr")) {
    out.println(link);
}
If you are trying to make two selections and output those values in tandem, you will have to do something like what @vacuum suggests, but be careful about the lengths of the lists.
A side note, you don't have to parse the document twice to make two selections. You can parse once and select twice.
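For the in-tandem case, one way to avoid the IndexOutOfBoundsException is to stop at the shorter of the two lists. Below is a minimal generic sketch (forEachPair is a hypothetical helper name); since jsoup's Elements class extends ArrayList<Element>, links2 and links22 could be passed to it directly. Elements left over in the longer list are simply ignored, which may or may not be what you want.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

public class Tandem {

    // Calls action for indexes 0..min(size)-1 only, so neither list is read past its end.
    static <A, B> void forEachPair(List<A> first, List<B> second, BiConsumer<A, B> action) {
        int n = Math.min(first.size(), second.size());
        for (int i = 0; i < n; i++) {
            action.accept(first.get(i), second.get(i));
        }
    }

    public static void main(String[] args) {
        // Plain strings stand in for the jsoup Elements; "title 3" has no partner and is skipped.
        List<String> titles = Arrays.asList("title 1", "title 2", "title 3");
        List<String> abstracts = Arrays.asList("abstract 1", "abstract 2");
        forEachPair(titles, abstracts, (t, a) -> System.out.println(t + " | " + a));
    }
}
```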
