How to read xml file with attributes in java? - java

I am aware of the SO question "Failing to get element values using Element.getAttribute()", but because I am a Java beginner, I have additional questions. What I am trying to build is a simple application that reads an XML file and then compares it against a "golden master." My problem is:
I have lots of different XML files, which differ in their attributes
The XML files are relatively big (810 lines in a file, hard to check by eye)
Example of file:
<DocumentIdentification v="Unique_ID"/>
<DocumentVersion v="1"/>
<DocumentType v="P81"/>
<SenderIdentification v="TEST-001--123456" codingScheme="A01"/>
<CreationDateTime v="2012-10-15T13:00:00Z"/>
<InArea v="10STS-TST------W" codingScheme="A01"/>
<OutArea v="10YWT-AYXOP01--8" codingScheme="A01"/>
<TimeSeries>
<Period>
<TimeInterval v="2012-10-14T22:00Z/2012-10-15T22:00Z"/>
<Resolution v="PT15M"/>
<Interval>
<Pos v="1"/>
<Qty v="500"/>
</Interval>
<Interval>
<Pos v="2"/>
<Qty v="500"/>
</Interval>
<Interval>
<Pos v="3"/>
<Qty v="452"/>
</Interval>
...
...
<Interval>
<Pos v="96"/>
<Qty v="891"/>
</Interval>
</Period>
</TimeSeries>
Applying the solution from the question mentioned above does not get me much further. I realised that I can get the attributes as a NamedNodeMap, but I don't know how to iterate through it programmatically.
Yes, I know it sounds a lot like "do my homework", but what I really need is at least a small kick in the butt to move me in the right direction. Thanks for the help.

The methods getLength() and item(int index) let you iterate through the attributes:
NamedNodeMap map = getItFromSomeWhere();
for (int i = 0; i < map.getLength(); i++) {
    Node node = map.item(i);
    // node is the i-th node in the named map
}
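For a fuller picture, here is a minimal sketch of walking every element of a document and listing its attributes with the standard JDK DOM parser. The class name AttributeLister, the "Element@attribute=value" output format and the <Doc> wrapper element are my own assumptions for illustration (the sample in the question shows no root element). Note that DOM does not guarantee any particular attribute order within an element, so sort the lines before comparing against a golden master if order matters.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper; the name and the output format are illustrative only.
class AttributeLister {

    // Returns one "Element@attribute=value" line per attribute, elements in document order.
    static List<String> listAttributes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> lines = new ArrayList<>();
            NodeList elements = doc.getElementsByTagName("*"); // "*" matches every element
            for (int i = 0; i < elements.getLength(); i++) {
                Element element = (Element) elements.item(i);
                NamedNodeMap attributes = element.getAttributes();
                for (int j = 0; j < attributes.getLength(); j++) {
                    Node attribute = attributes.item(j);
                    lines.add(element.getTagName() + "@" + attribute.getNodeName()
                            + "=" + attribute.getNodeValue());
                }
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // <Doc> is an assumed wrapper so that the fragment is well-formed.
        String xml = "<Doc><SenderIdentification v=\"TEST-001--123456\" codingScheme=\"A01\"/>"
                + "<Interval><Pos v=\"1\"/><Qty v=\"500\"/></Interval></Doc>";
        listAttributes(xml).forEach(System.out::println);
    }
}
```

Running listAttributes on both the golden master and the file under test, then comparing the two lists, would be one simple way to spot differing attributes.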

Related

Read and change zoom level of named destinations in pdf

I'd like to read and change the zoom level of named destinations in a PDF file using iText 7. I've come up with the following code:
Map<String, PdfObject> names =
document.getCatalog().getNameTree(PdfName.Dests).getNames();
for (Map.Entry<String, PdfObject> dest : names.entrySet()) {
if (dest.getValue().isArray()) {
PdfArray arr = (PdfArray) dest.getValue();
PdfName asName = arr.getAsName(1); // /Fit
arr.set(1, FitR);
//System.out.println();
arr.setModified();
}
}
However, this code fails to work against my example file and has other flaws as well.
Most importantly, it tries to deal with only one type of zoom (/Fit), but other types (/XYZ and so on) should also be handled. Second, I don't know how to get the page number of a named destination, as the key/value pair of destination name and zoom value doesn't seem to carry this information. Please see a screenshot of debug session below:
Note that there is already a question on SO dealing with exactly the same topic. The thing is, the answer to that question gives too little information to deal with this problem.

How am I supposed to extract properties from a node in Apache Jackrabbit from xml?

I have been playing around with the example number three in here http://jackrabbit.apache.org/jcr/first-hops.html , however to me it remains unclear how to get access to the properties of a node.
In the first screenshot
I used the debugger from my IDE and I evaluated this expression
session.getNode("/importxml/xhtml:html/xhtml:body/mathml:math/mathml:apply/mathml:apply[2]/mathml:apply[2]/mathml:cn").getProperty("jcr:xmltext/jcr:xmlcharacters").getString().trim();
You can see how I can get access to "jcr:xmltext/jcr:xmlcharacters" and have 2 as a result.
However, when I try to get this property out of the node by iterating over its properties, I am unable to do so, as shown in this screenshot.
This is the code fragment in the above screenshot:
var node = session.getNode("/importxml/xhtml:html/xhtml:body/mathml:math/mathml:apply/mathml:apply[2]/mathml:apply[2]/mathml:cn");
var properties = node.getProperties();
List<String> result = new ArrayList<>();
while(properties.hasNext()) {
Property property = properties.nextProperty();
result.add(property.getString().trim());
}
return result;
You can see how I get as a response only a value containing "nt:unstructured".
Unfortunately I couldn't find many code examples online (on GitHub, etc.); many are outdated, and there are no books as there are for Scrapy or other libraries/frameworks.
Thank you in advance.
Have a nice day!
Davide
In the first case, you are looking at the properties of:
/importxml/xhtml:html/xhtml:body/mathml:math/mathml:apply/mathml:apply[2]/mathml:apply[2]/mathml:cn/jcr:xmltext
In the second case:
/importxml/xhtml:html/xhtml:body/mathml:math/mathml:apply/mathml:apply[2]/mathml:apply[2]/mathml:cn
Note the different paths.

How to retrieve the xml value using the xpath?

My scenario: I may get either of the outputs shown below. I want to retrieve the value of the "Units" tag depending on the Code tag.
Output1 :
<Riders>
<Rider>
<Name>ALSP</Name>
<Code>ALSP</Code>
<Units>3</Units>
</Rider>
<Rider>
<Name>Individual</Name>
<Code>Select Type of Coverage</Code>
<OptionCode>Individual</OptionCode>
<IsFeature>true</IsFeature>
</Rider>
</Riders>
Output2 :
<Riders>
<Rider>
<Name>AFO</Name>
<Code>AFO</Code>
<Units>6</Units>
</Rider>
<Rider>
<Name>Individual</Name>
<Code>Select Type of Coverage</Code>
<OptionCode>Individual</OptionCode>
<IsFeature>true</IsFeature>
</Rider>
</Riders>
I have tried the XPath below, but it didn't work. Could anyone suggest a fix?
/Riders/Rider/Code[text()[contains(.,'AFO')] or text() [contains(.,'ALSP')]]/Units
Depending on if you want to get the node with regards to the root or just find any matching node, you could use something like...
*/Rider[Code[contains(.,'ALSP')]]/Units
Which will return you all the Units nodes which belong to any Rider node, which have a Code node, whose text contains ALSP
Of course you could also use...
*/Rider[Code[text() = 'ALSP']]/Units
If the Code must match exactly.
The above will find the Units nodes of Rider nodes sitting under any top-level element, relative to the context node; it does not search the whole document (that would be //Rider). If the position is important, you would need to replace */ with /Riders/ instead.
Now, if you want to find both ALSP and AFO, you could use something like...
*/Rider[Code[text() = 'ALSP' or text() = 'AFO']]/Units
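To try the last expression from Java, here is a minimal sketch using the JDK's built-in javax.xml.xpath API, with the */ prefix replaced by /Riders/ as described above. The class name RiderUnits and the method findUnits are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration.
class RiderUnits {

    // Units of every Rider whose Code is exactly ALSP or AFO.
    static final String UNITS_XPATH =
            "/Riders/Rider[Code[text() = 'ALSP' or text() = 'AFO']]/Units";

    // Evaluates the expression against the given XML text and returns the Units values.
    static List<String> findUnits(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    UNITS_XPATH, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> units = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                units.add(nodes.item(i).getTextContent());
            }
            return units;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String output1 = "<Riders>"
                + "<Rider><Name>ALSP</Name><Code>ALSP</Code><Units>3</Units></Rider>"
                + "<Rider><Name>Individual</Name><Code>Select Type of Coverage</Code>"
                + "<OptionCode>Individual</OptionCode><IsFeature>true</IsFeature></Rider>"
                + "</Riders>";
        System.out.println(findUnits(output1));
    }
}
```

The second Rider has no Units child and its Code matches neither value, so it contributes nothing to the result.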

JAXB Multiple Content for Element

I'm currently working on a txt-to-xml project. Basically what I'm doing is creating different XmlElements for some of the content.
I got a DTD up and running and for now I'm creating a default xml, just to make sure every xml created is a valid xml (for the DTD given).
I'm mainly creating new Classes for every Element, which doesn't have a #PCDATA structure and it's working pretty fine so far.
Now I'm struggling with a problem:
I got the following in my DTD:
<!ELEMENT REACTION(#PCDATA | ACTOR*)>
What I'm looking for in my Text is something like:
Prof. X clapped!
and I want to extract this into my XML as:
<REACTION>
<ACTOR>Prof. X</ACTOR> clapped!
</REACTION>
So what I basically want is a String attribute within the ReactionClass which is declared as an XML element but holds an Actor attribute plus the rest of the text. I thought of something like:
String m_sText;
String m_sActor;
public ReactionClass(){
    this.m_sActor = "Prof. X";
    this.m_sText = this.m_sActor + " clapped!";
}
@XmlElement(name = "TEXT")
public String getM_sText(){ return this.m_sText; }
@XmlElement(name = "ACTOR")
public String getM_sActor(){ return this.m_sActor; }
For all other Nodes, such as the RootNode I created a RootNodeClass which holds different attributes, such as m_nLocation, m_nTime, m_nYear which are declared as XML-Elements, so the JAXB-Marshaller just builds up the XML on basis of these elements:
<ROOT>
<TIME>09:00</TIME>
<LOCATION>New York</LOCATION>
<YEAR>1992</YEAR>
</ROOT>
I wanted to do the same with the REACTION node (as mentioned above), but when creating a new class REACTION I'm getting something like:
<REACTION>
<TEXT>Prof. X clapped!</TEXT>
<ACTOR>Prof. X</ACTOR>
</REACTION>
How would I put them into one Element but still keep the Tags such as above?
If anybody got an idea how to manage this I would be very thankful!
Thanks Max
First, what you most probably need is @XmlMixed. You'll probably have a structure like:
@XmlMixed
@XmlElementRefs({
    @XmlElementRef(name="ACTOR", type=JAXBElement.class),
    ...})
List<Object> content;
With this you could put there Strings and JAXBElement<Actor> to achieve so-called mixed content.
Next, you might consider turning your DTD into XML Schema first and compiling it - or compiling the DTD with XJC.
Finally, what you have is so-called "semi-structured data" which I think is not quite suitable for JAXB. JAXB works great for strong and clear structures, but if you have mixed stuff you get weird models that are hard to work with. I can't suggest an alternative though.

How do I get the RDF resource that represents the objects which have a certain [property,object] pair?

Okay, to clarify, I have an XML/RDF file that describes data with a natural categorical tree structure (like folders and files). The data is not structured as a tree; rather, there is information that explains how to rebuild the tree (namely the nested set values of each node). I am starting with no knowledge other than the assumption that some statement in the file has a rootTree property whose object is the URI of the statement describing the root node of the tree.
Obtaining that object is easy, I simply use:
// Obtain the node describing the root of the Pearltree.
mRootProp = mModel.createProperty(Pearltree.RDF.PearlTreeNS, "rootTree");
NodeIterator roots = mModel.listObjectsOfProperty(mRootProp);
Now, I am further able to list all statements which have the property pt:parentTree and the object roots.nextNode():
StmtIterator sit = mModel.listStatements(null, RDF.ParentTree, rootNode);
This gives me a list of all such statements. These statements are part of elements that look like such in the RDF/XML file (note these have a different parentTree value but appear in the same context):
<pt:RootPearl rdf:about="http://www.pearltrees.com/dcow/pearltrees-videos/id5296268#rootPearl">
<dcterms:title><![CDATA[Pearltrees videos]]></dcterms:title>
<pt:parentTree rdf:resource="http://www.pearltrees.com/dcow/pearltrees-videos/id5296268" />
<pt:inTreeSinceDate>2012-06-11T20:25:55</pt:inTreeSinceDate>
<pt:leftPos>1</pt:leftPos>
<pt:rightPos>8</pt:rightPos>
</pt:RootPearl>
<pt:PagePearl rdf:about="http://www.pearltrees.com/dcow/pearltrees-videos/id5296268#pearl46838293">
<dcterms:title><![CDATA[why Pearltrees?]]></dcterms:title>
<dcterms:identifier>http://www.youtube.com/watch?v%3di4rDqMMFx8g</dcterms:identifier>
<pt:parentTree rdf:resource="http://www.pearltrees.com/dcow/pearltrees-videos/id5296268" />
<pt:inTreeSinceDate>2012-06-11T20:25:55</pt:inTreeSinceDate>
<pt:leftPos>2</pt:leftPos>
<pt:rightPos>3</pt:rightPos>
</pt:PagePearl>
...
Now, what I would like to do is obtain a reference to all statements with subject sit.nextStatement()'s subject. In this example:
"http://www.pearltrees.com/dcow/pearltrees-videos/id5296268#rootPearl"
and
"http://www.pearltrees.com/dcow/pearltrees-videos/id5296268#pearl46838293"
My goal is to obtain the content of each element including its rightPos and leftPos so I can reconstruct the tree.
You can simplify your code somewhat as follows:
mRootProp = mModel.createProperty(Pearltree.RDF.PearlTreeNS, "rootTree");
Resource root = mModel.listResourcesWithProperty( mRootProp ).next();
This assumes you know you have exactly one root per model. If that might not be true, modify the code accordingly.
The method:
getSubject()
of a Statement will return the Subject as a Resource. You can then use the
getProperty(Property p)
method of the returned Resource to obtain the Statements that include the property in question.
So, in my case, I use:
Resource r;
Statement title, id, lpos, rpos;
while(sit.hasNext()) {
r = sit.nextStatement().getSubject();
title = r.getProperty(DCTerms.title);
id = r.getProperty(DCTerms.identifier);
lpos = r.getProperty(PearlTree.RDF.leftPos);
rpos = r.getProperty(PearlTree.RDF.rightPos);
...
}
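Once leftPos and rightPos have been read for each resource, rebuilding the tree needs no RDF library at all. Below is a minimal sketch in plain Java, assuming the usual nested set convention that a node is a descendant of another when its interval lies inside the other's. The class NestedSetTree, its Node type and the sample entries other than the two pearls shown in the question are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: rebuilds a tree from nested set (leftPos/rightPos) values.
class NestedSetTree {

    static final class Node {
        final String title;
        final int left;
        final int right;
        final List<Node> children = new ArrayList<>();

        Node(String title, int left, int right) {
            this.title = title;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the root (the node with the smallest left value), or null for an empty list.
    static Node build(List<Node> nodes) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingInt((Node n) -> n.left));
        Deque<Node> open = new ArrayDeque<>(); // ancestors whose interval is still open
        Node root = null;
        for (Node node : sorted) {
            // Close every ancestor whose interval ends before this node starts.
            while (!open.isEmpty() && open.peek().right < node.left) {
                open.pop();
            }
            if (open.isEmpty()) {
                if (root == null) {
                    root = node;
                }
            } else {
                open.peek().children.add(node);
            }
            open.push(node);
        }
        return root;
    }

    static void print(Node node, String indent) {
        System.out.println(indent + node.title);
        for (Node child : node.children) {
            print(child, indent + "  ");
        }
    }

    public static void main(String[] args) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("Pearltrees videos", 1, 8)); // values from the question
        nodes.add(new Node("why Pearltrees?", 2, 3));   // values from the question
        nodes.add(new Node("made-up folder", 4, 7));    // hypothetical entry
        nodes.add(new Node("made-up page", 5, 6));      // hypothetical entry
        print(build(nodes), "");
    }
}
```

In the loop from the answer, each Resource would supply one Node, with the positions parsed from the lpos and rpos statements.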
