I'm currently working on a txt-to-xml project. Basically what I'm doing is creating different XmlElements for some of the content.
I got a DTD up and running and for now I'm creating a default xml, just to make sure every xml created is a valid xml (for the DTD given).
I'm mainly creating a new class for every element that doesn't have a #PCDATA structure, and it's working pretty well so far.
Now I'm struggling with a problem:
I got the following in my DTD:
<!ELEMENT REACTION (#PCDATA | ACTOR)*>
What I'm looking for in my Text is something like:
Prof. X clapped!
and I want to extract this into my XML as:
<REACTION>
<ACTOR>Prof. X</ACTOR> clapped!
</REACTION>
So what I basically want is a String attribute within the ReactionClass which is declared as an XML element but holds an Actor attribute plus the rest of the text. I thought of something like:
String m_sText;
String m_sActor;
public ReactionClass(){
    this.m_sActor = "Prof. X";
    this.m_sText = this.m_sActor + " clapped!";
}
@XmlElement(name = "TEXT")
public String getM_sText(){ return this.m_sText; }
@XmlElement(name = "ACTOR")
public String getM_sActor(){ return this.m_sActor; }
For all other Nodes, such as the RootNode I created a RootNodeClass which holds different attributes, such as m_nLocation, m_nTime, m_nYear which are declared as XML-Elements, so the JAXB-Marshaller just builds up the XML on basis of these elements:
<ROOT>
<TIME>09:00</TIME>
<LOCATION>New York</LOCATION>
<YEAR>1992</YEAR>
</ROOT>
I wanted to do the same with the REACTION node (as mentioned above), but when creating a new class REACTION I get something like:
<REACTION>
<TEXT>Prof. X clapped!</TEXT>
<ACTOR>Prof. X</ACTOR>
</REACTION>
How would I put them into one Element but still keep the Tags such as above?
If anybody got an idea how to manage this I would be very thankful!
Thanks Max
First, what you most probably need is @XmlMixed. You'll probably have a structure like:
@XmlMixed
@XmlElementRefs({
    @XmlElementRef(name="ACTOR", type=JAXBElement.class),
    ...})
List<Object> content;
With this you can put both Strings and JAXBElement<Actor> instances into the list to achieve so-called mixed content.
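To make the term "mixed content" concrete, here is a minimal sketch that builds the target REACTION element with the JDK's plain DOM API. This is a swapped-in technique for illustration, not the @XmlMixed mapping itself; the class and method names (MixedContentSketch, buildReaction) are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class MixedContentSketch {

    // Builds a REACTION element whose children are an ACTOR element followed by plain text.
    static String buildReaction(String actor, String rest) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element reaction = doc.createElement("REACTION");
            doc.appendChild(reaction);

            // Child element: <ACTOR>actor</ACTOR>
            Element actorElement = doc.createElement("ACTOR");
            actorElement.setTextContent(actor);
            reaction.appendChild(actorElement);

            // Sibling text node: this is what makes the content "mixed"
            reaction.appendChild(doc.createTextNode(rest));

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildReaction("Prof. X", " clapped!"));
    }
}
```

In the JAXB model the same idea applies: the content list holds the ACTOR element and the trailing String side by side, in document order.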
Next, you might consider turning your DTD into XML Schema first and compiling it - or compiling the DTD with XJC.
Finally, what you have is so-called "semi-structured data" which I think is not quite suitable for JAXB. JAXB works great for strong and clear structures, but if you have mixed stuff you get weird models that are hard to work with. I can't suggest an alternative though.
Related
I make REST calls to a web service and always receive XML as the response. I then parse that XML and fill Java objects with the information.
The problem is that the element tags can have different namespaces, like this:
<ns:title>....</ns:title>
or
<ns2:title>....</ns2:title>
or
<title>...</title>
EDIT:
And the namespace URIs look like this:
<ns2:feed xmlns="http://www.example.com/routing1/routing2"
xmlns:ns2="http://www.w3.org/../Atom"
xmlns:ns3="http://www.example.com/routing1/routing2"
xmlns:ns4="http://purl.org/routing1/routing2/1.0">
So I changed the call from element.getElementsByTagNameNS("specifiedNamespace", "title") to element.getElementsByTagNameNS("*", "title").
Is it okay to match all namespaces like this? I also have the case where the element tag has no prefix at all, as in the third example <title>..</title>.
Is there a better way to solve this problem, or is my approach fine?
Thanks.
EDIT: 2 response examples
1.
<ns2:feed xmlns="http://www.example.com/routing1/routing2" xmlns:ns2="http://www.w3.org/../Atom" xmlns:ns3="http://www.example.com/routing1/routing2" xmlns:ns4="http://purl.org/routing1/routing2/1.0">
...
<ns2:someTag1>..</ns2:someTag1>
<ns2:title>title</ns2:title>
<entry>...</entry>
....
</ns2:feed>
2
<ns2:feed xmlns="http://www.w3.org/../Atom" xmlns:ns2="http://www.example.com/routing1/routing2" xmlns:ns3="http://www.example.com/routing1/routing2" xmlns:ns4="http://purl.org/routing1/routing2/1.0">
...
<someTag1>..</someTag1>
<title>title</title>
<ns2:entry>...</ns2:entry>
....
</ns2:feed>
Your title elements have the same namespace in both of your examples.
In the first example, you have:
xmlns:ns2="http://www.w3.org/../Atom"
and
<ns2:title>title</ns2:title>
so this means that title is in the http://www.w3.org/../Atom namespace.
In the second example, you have:
xmlns="http://www.w3.org/../Atom"
and
<title>title</title>
so here again title is in the http://www.w3.org/../Atom namespace.
The prefixes are different (the second example isn't using a prefix for title), but the namespace is the same.
This means that you should be able to use:
element.getElementsByTagNameNS("http://www.w3.org/../Atom", "title")
and it should successfully select the title element, even if the prefixes change.
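As a rough sketch of that lookup with the JDK's DOM parser: the factory has to be made namespace-aware, otherwise the NS methods will not match anything. The namespace URI is copied from the question as-is (it is abbreviated there), and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceLookupSketch {

    // Namespace URI exactly as written in the question (abbreviated there).
    static final String ATOM_NS = "http://www.w3.org/../Atom";

    // Returns the text of the first title element in the Atom namespace, or null if none exists.
    static String firstAtomTitle(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, namespace lookups find nothing
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList titles = doc.getDocumentElement().getElementsByTagNameNS(ATOM_NS, "title");
            return titles.getLength() > 0 ? titles.item(0).getTextContent() : null;
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same namespace, once bound to the prefix ns2 and once as the default namespace
        String prefixed = "<ns2:feed xmlns=\"http://www.example.com/routing1/routing2\""
                + " xmlns:ns2=\"" + ATOM_NS + "\"><ns2:title>title</ns2:title></ns2:feed>";
        String unprefixed = "<ns2:feed xmlns=\"" + ATOM_NS + "\""
                + " xmlns:ns2=\"http://www.example.com/routing1/routing2\"><title>title</title></ns2:feed>";
        System.out.println(firstAtomTitle(prefixed));
        System.out.println(firstAtomTitle(unprefixed));
    }
}
```

Both documents yield the same title because the lookup is keyed on the namespace URI, not on the prefix.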
I built XML files on Android from objects by appending multiple objects to the same file using Simple:
<listOfBtDevices>
<devices class="java.util.ArrayList">
<BTDevice>
<address>00:27:13:A3:2D:14</address>
<bondState>NONE</bondState>
<deviceType>LAPTOP</deviceType>
<name>LTPH</name>
<services>AUDIO CAPTURE NETWORKING OBJECT_TRANSFER RENDERING TELEPHONY</services>
<rssi>-95</rssi>
</BTDevice>
<BTDevice>
<address>00:27:13:A3:2D:14</address>
<bondState>NONE</bondState>
<deviceType>LAPTOP</deviceType>
<name>LTPH</name>
<services>AUDIO CAPTURE NETWORKING OBJECT_TRANSFER RENDERING TELEPHONY</services>
<rssi>-95</rssi>
</BTDevice>
<BTDevice>
<address>00:27:13:A3:2D:14</address>
<bondState>NONE</bondState>
<deviceType>LAPTOP</deviceType>
<name>LTPH</name>
<services>AUDIO CAPTURE NETWORKING OBJECT_TRANSFER RENDERING TELEPHONY</services>
<rssi>-95</rssi>
</BTDevice>
</devices>
<timestamp>22.11.2013_10.56.44</timestamp>
</listOfBtDevices>
<listOfBtDevices>
<devices class="java.util.ArrayList">
<BTDevice>
<address>00:27:13:A3:2D:14</address>
<bondState>NONE</bondState>
<deviceType>LAPTOP</deviceType>
<name>LTPH</name>
<services>AUDIO CAPTURE NETWORKING OBJECT_TRANSFER RENDERING TELEPHONY</services>
<rssi>-95</rssi>
</BTDevice>
</devices>
<timestamp>22.11.2013_10.56.50</timestamp>
</listOfBtDevices>
In the example above the object is ListOfBtDevices, which is composed of a String timestamp and an ArrayList of BTDevice. The question is: how can I deserialize it into multiple ListOfBtDevices objects, using Simple or another framework, on a desktop computer?
Thank you, and sorry if I made mistakes; I am a beginner in Java.
First of all, to be able to deserialize the XML file, it needs to have one single root node, not several as in your case.
You have:
<listOfBtDevices>
...
</listOfBtDevices>
<listOfBtDevices>
...
</listOfBtDevices>
What it should look like:
<root>
<listOfBtDevices>
...
</listOfBtDevices>
<listOfBtDevices>
...
</listOfBtDevices>
</root>
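If the file cannot be rewritten on the device, one way to add the wrapper at read time is to concatenate streams. This is only a sketch, and it assumes the file does not start with an XML declaration (a declaration after the added <root> tag would make the document invalid). The class and method names are hypothetical; the DOM parser is used here just to show that the wrapped stream is well-formed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class RootWrapperSketch {

    // Surrounds the fragment stream with <root> ... </root> without loading it into memory.
    static InputStream wrapInRoot(InputStream fragments) {
        InputStream open = new ByteArrayInputStream("<root>".getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream("</root>".getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(
                Collections.enumeration(Arrays.asList(open, fragments, close)));
    }

    // Parses the wrapped fragments and counts the listOfBtDevices elements found.
    static int countLists(String fragments) {
        try {
            InputStream in = new ByteArrayInputStream(fragments.getBytes(StandardCharsets.UTF_8));
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(wrapInRoot(in));
            return doc.getElementsByTagName("listOfBtDevices").getLength();
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fragments =
                "<listOfBtDevices><timestamp>22.11.2013_10.56.44</timestamp></listOfBtDevices>\n"
              + "<listOfBtDevices><timestamp>22.11.2013_10.56.50</timestamp></listOfBtDevices>";
        System.out.println(countLists(fragments));
    }
}
```

The stream returned by wrapInRoot should be usable with any parser or serializer that accepts an InputStream, so the file on disk can stay as it is.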
After that you need to deserialize using some framework like Simple or XStream. I would recommend Simple for your purposes and I will give you a rudimentary example using Simple:
To deserialize the XML into objects using Simple you have to create model classes which have the same structure as your XML. You describe how the objects are mapped to XML using annotations like @Element or @Attribute. In your case it would look something like this:
First you need a class which corresponds to your root node:
// The name you enter here is the name of the root node in your XML
@Root(name = "root")
public class BluetoothDeviceListContainer {
    // It contains a List; inline is set to true because the list items sit directly inside the root node, and each entry is called "listOfBtDevices"
    @ElementList(entry = "listOfBtDevices", inline = true)
    List<BluetoothDeviceList> bluetoothDeviceList;
}
You continue like this until you have fully described your XML with classes. Next you would create a class BluetoothDeviceList which corresponds to the "listOfBtDevices" nodes and declare which elements and attributes it contains, and so on.
When you are finished you can deserialize your xml like this:
Serializer serializer = new Persister();
BluetoothDeviceListContainer container = serializer.read(BluetoothDeviceListContainer.class, xmlAsString);
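After adding getters (and setters, if you need them) to your model classes, you can access all deserialized properties like this:
// assumes you added a getter named getBluetoothDeviceList() to the container class
BluetoothDeviceList list = container.getBluetoothDeviceList().get(0);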
...
Here you can find a complete tutorial for Simple and here some additional examples.
Given XML like this:
...
<Sport SportId="1">
<Name language="en">Soccer</Name>
<Name language="fi">Jalkapallo</Name>
...
</Sport>
...
How can I, using the Simple XML Framework, read the two values into fields in a Java class? (The <Sport> element is already correctly mapped to the corresponding class.)
public class Sport {
...
String nameEn;
String nameFi;
...
}
I've tried approaches like:
@Element(name = "Name")
@Path("Name[@language='en']")
String nameEn;
But the parsing fails with:
Exception in thread "main" org.simpleframework.xml.core.PathException:
Invalid index for path '[@language='en']' in field 'nameEn'
Also, omitting @Element like this:
@Path("Name[@language='en']")
String nameEn;
...parsing doesn't crash, but nameEn value stays null.
I'd like the matching to be based on the language attribute (instead of ordering), but I'm wondering if that's possible (maybe XPath support in Simple Framework is limited?).
Have you tried getting the text of the element explicitly? i.e. Name[@language='en']/text()
Your XPath is selecting the element, not the text of the element, which can cause some XML engines to choke.
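Whether Simple's @Path accepts such an expression is not confirmed here. As a sketch of the same XPath evaluated outside Simple, with the JDK's javax.xml.xpath package, something like the following could be used. The class name, method name and the $lang variable are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class SportNameXPathSketch {

    // Returns the text of the Name element with the given language, or "" if there is none.
    static String nameFor(String sportXml, String language) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $lang through a resolver instead of concatenating it into the expression
        xpath.setXPathVariableResolver(
                name -> "lang".equals(name.getLocalPart()) ? language : null);
        try {
            return xpath.evaluate("/Sport/Name[@language=$lang]/text()",
                    new InputSource(new StringReader(sportXml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Sport SportId=\"1\">"
                + "<Name language=\"en\">Soccer</Name>"
                + "<Name language=\"fi\">Jalkapallo</Name>"
                + "</Sport>";
        System.out.println(nameFor(xml, "en"));
        System.out.println(nameFor(xml, "fi"));
    }
}
```

The match depends only on the language attribute, so the order of the Name elements does not matter.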
I am aware of the SO question "Failing to get element values using Element.getAttribute()", but because I am a Java beginner I have additional questions. What I am trying to build is a simple application which reads an XML file and then compares it against a "golden master". My problem is:
I have lots of different XML files, which differ in their attributes.
The XML files are relatively big (about 810 lines per file, hard to check by eye).
Example of file:
<DocumentIdentification v="Unique_ID"/>
<DocumentVersion v="1"/>
<DocumentType v="P81"/>
<SenderIdentification v="TEST-001--123456" codingScheme="A01"/>
<CreationDateTime v="2012-10-15T13:00:00Z"/>
<InArea v="10STS-TST------W" codingScheme="A01"/>
<OutArea v="10YWT-AYXOP01--8" codingScheme="A01"/>
<TimeSeries>
<Period>
<TimeInterval v="2012-10-14T22:00Z/2012-10-15T22:00Z"/>
<Resolution v="PT15M"/>
<Interval>
<Pos v="1"/>
<Qty v="500"/>
</Interval>
<Interval>
<Pos v="2"/>
<Qty v="500"/>
</Interval>
<Interval>
<Pos v="3"/>
<Qty v="452"/>
</Interval>
...
...
<Interval>
<Pos v="96"/>
<Qty v="891"/>
</Interval>
</Period>
</TimeSeries>
Applying the solution from the question mentioned above does not get me much further... I realised that I can get the attributes as a NamedNodeMap, but I don't know how to iterate through it programmatically.
Yes, I know it sounds a lot like "do my homework", but what I really need is at least a small kick in the right direction. Thanks for your help.
The methods getLength() and item(int index) should help with iterating through the attributes:
NamedNodeMap map = getItFromSomeWhere();
for (int i = 0; i < map.getLength(); i++) {
    Node node = map.item(i);
    // node is the i-th node in the named map
}
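Put together, a minimal self-contained sketch might look like this. The Doc wrapper element in the sample and the class and method names are made up for illustration; the sample attributes are taken from the question's file.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeDumpSketch {

    // Collects name/value pairs of all attributes on the first element with the given tag name.
    static Map<String, String> attributesOf(String xml, String tagName) {
        Map<String, String> result = new TreeMap<>(); // sorted, since NamedNodeMap order is not guaranteed
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Node element = doc.getElementsByTagName(tagName).item(0);
            if (element == null) {
                return result; // no such element
            }
            NamedNodeMap map = element.getAttributes();
            for (int i = 0; i < map.getLength(); i++) {
                Node attribute = map.item(i);
                result.put(attribute.getNodeName(), attribute.getNodeValue());
            }
            return result;
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Doc><SenderIdentification v=\"TEST-001--123456\" codingScheme=\"A01\"/></Doc>";
        System.out.println(attributesOf(xml, "SenderIdentification"));
    }
}
```

Comparing two such maps with equals() is one simple way to check an element of a file against the golden master.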
I am trying to parse the Stack Overflow data dump. One of the tables is called posts.xml, which has around 10 million entries in it. Sample XML:
<?xml version="1.0" encoding="utf-8"?>
<posts>
<row Id="1" PostTypeId="1" AcceptedAnswerId="26" CreationDate="2010-07-07T19:06:25.043" Score="10" ViewCount="1192" Body="&lt;p&gt;Now that the Engineer update has come, there will be lots of Engineers building up everywhere. How should this best be handled?&lt;/p&gt;
" OwnerUserId="11" LastEditorUserId="56" LastEditorDisplayName="" LastEditDate="2010-08-27T22:38:43.840" LastActivityDate="2010-08-27T22:38:43.840" Title="In Team Fortress 2, what is a good strategy to deal with lots of engineers turtling on the other team?" Tags="&lt;strategy&gt;&lt;team-fortress-2&gt;&lt;tactics&gt;" AnswerCount="5" CommentCount="7" />
<row Id="2" PostTypeId="1" AcceptedAnswerId="184" CreationDate="2010-07-07T19:07:58.427" Score="5" ViewCount="469" Body="&lt;p&gt;I know I can create a Warp Gate and teleport to Pylons, but I have no idea how to make Warp Prisms or know if there's any other unit capable of transporting.&lt;/p&gt;
&lt;p&gt;I would in particular like this to built remote bases in 1v1&lt;/p&gt;
" OwnerUserId="10" LastEditorUserId="68" LastEditorDisplayName="" LastEditDate="2010-07-08T00:16:46.013" LastActivityDate="2010-07-08T00:21:13.163" Title="What protoss unit can transport others?" Tags="&lt;starcraft-2&gt;&lt;how-to&gt;&lt;protoss&gt;" AnswerCount="3" CommentCount="2" />
<row Id="3" PostTypeId="1" AcceptedAnswerId="56" CreationDate="2010-07-07T19:09:46.317" Score="7" ViewCount="356" Body="&lt;p&gt;Steam won't let me have two instances running with the same user logged in.&lt;/p&gt;
&lt;p&gt;Does that mean I cannot run a dedicated server on a PC (for example, for Left 4 Dead 2) &lt;em&gt;and&lt;/em&gt; play from another machine?&lt;/p&gt;
&lt;p&gt;Is there a way to run the dedicated server without running steam? Is there a configuration option I'm missing?&lt;/p&gt;
" OwnerUserId="14" LastActivityDate="2010-07-07T19:27:04.777" Title="How can I run a dedicated server from steam?" Tags="&lt;steam&gt;&lt;left-4-dead-2&gt;&lt;dedicated-server&gt;&lt;account&gt;" AnswerCount="1" />
<row Id="4" PostTypeId="1" AcceptedAnswerId="14" CreationDate="2010-07-07T19:11:05.640" Score="10" ViewCount="201" Body="&lt;p&gt;When I get to the insult sword-fighting stage of The Secret of Monkey Island, do I have to learn every single insult and comeback in order to beat the Sword Master?&lt;/p&gt;
" OwnerUserId="17" LastEditorUserId="17" LastEditorDisplayName="" LastEditDate="2010-07-08T21:25:04.787" LastActivityDate="2010-07-08T21:25:04.787" Title="Do I have to learn all of the insults and comebacks to be able to advance in The Secret of Monkey Island?" Tags="&lt;monkey-island&gt;&lt;adventure&gt;" AnswerCount="3" CommentCount="2" />
I would like to parse this XML but load only certain attributes, namely Id, PostTypeId, AcceptedAnswerId and two others. Is there a way to make SAX load only these attributes? If so, how? I am pretty new to SAX, so some guidance would help.
Otherwise, loading the whole thing would be very slow, and some of the attributes won't be used anyway, so loading them is pointless.
Another question: would it be possible to jump to a particular row that has row Id X? If so, how do I do this?
The SAX startElement event lets you process one XML element at a time.
In Java code you implement this method:
public void startElement(String uri, String localName,
        String qName, Attributes attributes)
        throws SAXException {
    // qName is used because localName is empty unless the parser is namespace-aware
    if ("row".equals(qName)) {
        // this code is executed for every "row" element
        String id = attributes.getValue("Id"); // attribute names are case-sensitive
        String postTypeId = attributes.getValue("PostTypeId");
        String acceptedAnswerId = attributes.getValue("AcceptedAnswerId");
        // the other two attributes
        // you now have the attribute values for this "row" element
    }
}
For every element, you can access:
Namespace URI
XML QName
XML LocalName
Map of attributes, here you can extract your two attributes...
See a ContentHandler implementation for specific details.
bye
UPDATED: improved the previous snippet.
It is pretty much the same approach as I've answered here already.
Scroll down to the org.xml.sax Implementation part. You'll only need a custom handler.
Yes, you can override methods that process only the elements you want:
http://www.javacommerce.com/displaypage.jsp?name=saxparser1.sql&id=18232
http://www.java2s.com/Code/Java/XML/SAXDemo.htm
SAX doesn't "load" elements. It informs your application of the start and end of each element, and it's entirely up to your application to decide which elements it takes any notice of.
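A minimal sketch of that idea, using the JDK's SAX parser and a handler that keeps only three attributes per row. The class names are hypothetical, and the sample input is a shortened stand-in for posts.xml. Note that the parser still reads every attribute from the input; the saving is that the application never stores the ones it does not need.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PostsSaxSketch {

    // Keeps "Id:PostTypeId:AcceptedAnswerId" for each row; every other attribute is ignored.
    static class RowHandler extends DefaultHandler {
        final List<String> rows = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // qName is used because the default parser is not namespace-aware
            if ("row".equals(qName)) {
                // getValue returns null for a missing attribute, which shows up as "null" here
                rows.add(attributes.getValue("Id") + ":"
                        + attributes.getValue("PostTypeId") + ":"
                        + attributes.getValue("AcceptedAnswerId"));
            }
        }
    }

    // Parses the XML text and returns one summary string per row element.
    static List<String> readRows(String xml) {
        try {
            RowHandler handler = new RowHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.rows;
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<posts>"
                + "<row Id=\"1\" PostTypeId=\"1\" AcceptedAnswerId=\"26\" Body=\"&lt;p&gt;text&lt;/p&gt;\"/>"
                + "<row Id=\"2\" PostTypeId=\"1\" AcceptedAnswerId=\"184\"/>"
                + "</posts>";
        System.out.println(readRows(xml));
    }
}
```

For the real dump, the same handler could be passed to the parse overload that takes a java.io.File, so the 10 million rows are streamed rather than held in memory.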