How to parse xml and get the data from xml string? - java

I receive an XML string that I want to parse so I can extract data from it. I tried converting it to JSON, but the result is just empty braces.
public class ResultsActivity extends Activity {
String outputPath;
TextView tv;
public static int PRETTY_PRINT_INDENT_FACTOR = 4;
public static String TEST_XML_STRING;
DocumentBuilder builder;
InputStream is;
Document dom;
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
tv = new TextView(this);
setContentView(tv);
String imageUrl = "unknown";
Bundle extras = getIntent().getExtras();
if( extras != null) {
imageUrl = extras.getString("IMAGE_PATH" );
outputPath = extras.getString( "RESULT_PATH" );
}
// Starting recognition process
new AsyncProcessTask(this).execute(imageUrl, outputPath);
}
public void updateResults(Boolean success) {
if (!success)
return;
try {
StringBuffer contents = new StringBuffer();
FileInputStream fis = openFileInput(outputPath);
try {
Reader reader = new InputStreamReader(fis, "UTF-8");
BufferedReader bufReader = new BufferedReader(reader);
String text = null;
while ((text = bufReader.readLine()) != null) {
contents.append(text).append(System.getProperty("line.separator"));
}
} finally {
fis.close();
}
XmlToJson xmlToJson = new XmlToJson.Builder(contents.toString()).build();
// convert to a JSONObject
JSONObject jsonObject = xmlToJson.toJson();
// OR convert to a Json String
String jsonString = xmlToJson.toString();
// OR convert to a formatted Json String (with indent & line breaks)
String formatted = xmlToJson.toFormattedString();
Log.e("xml",contents.toString());
Log.e("json",jsonObject.toString());
} catch (Exception e) {
displayMessage("Error: " + e.getMessage());
}
}
public void displayMessage( String text )
{
tv.post( new MessagePoster( text ) );
}
@Override
public boolean onCreateOptionsMenu(Menu menu) {
// Inflate the menu; this adds items to the action bar if it is present.
getMenuInflater().inflate(R.menu.activity_results, menu);
return true;
}
class MessagePoster implements Runnable {
public MessagePoster( String message )
{
_message = message;
}
public void run() {
tv.append( _message + "\n" );
setContentView( tv );
}
private final String _message;
}
}
I followed this link : https://github.com/smart-fun/XmlToJson
Can I only parse xml? How can I get the data out of xml string?
Following is the xml string:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ocrsdk.com/schema/recognizedBusinessCard-1.0.xsd http://ocrsdk.com/schema/recognizedBusinessCard-1.0.xsd" xmlns="http://ocrsdk.com/schema/recognizedBusinessCard-1.0.xsd">
<businessCard imageRotation="noRotation">
<field type="Mobile">
<value>•32147976</value>
</field>
<field type="Address">
<value>Timing: 11:00 a.m. to 5.00 p.m</value>
</field>
<field type="Address">
<value>MULTOWECIALITY HOSPITAL Havnmg Hotel MwyantfwfMf), TOL: 1814 7»7» / 0454 7575 fax: 2514 MSS MtoMte t wvHwJaMtur0Mapttal.com</value>
</field>
<field type="Name">
<value>M. S. (Surgery), Fais, Fics</value>
</field>
<field type="Company">
<value>KASTURI MEDICARE PVT. LTD.</value>
</field>
<field type="Job">
<value>Consulting General Surgeon Special Interest: Medical Administrator: KsturiSecretary: IMA - Mira</value>
</field>
<field type="Text">
<value>Mob.: •32114976
Dr. Rakhi R
M. S. (Surgery), Surgeon
Special Interest: Medical
President: Bhayander Medical Association
Scientific Secretary: IMA - Mira Bhayander
Timing: 11:00 a.m. to 5.00 p.m
%
*
KASTURI MEDICARE PVT. LTD.
ISO 9001:2008 Certified, ASNH Cliniq 21 Certified,
MtoMte t wvHwJaMtur0Mapttal.com
mkhLkasturi0gmoiH.com</value>
</field>
</businessCard>
I checked this link to parse the xml: http://androidexample.com/XML_Parsing_-_Android_Example/index.php?view=article_discription&aid=69
But this string does not contain a list, and I can't work out how to parse it. Can anyone help, please? Thank you.

JSON is easier to parse than XML,
so I suggest working with JSON:
first convert the XML to JSON, then parse the resulting JSONObject.
Here is a step-by-step reference for converting XML to JSON:
https://stackoverflow.com/a/18339178/6676466

For XML parsing you can use either the XML pull parser or the XML DOM parser.
Both approaches are fairly lengthy and involve a lot of code, because you parse the XML by hand.
Another way is to use the gson-xml library (com.stanfy:gson-xml-java) in your project, which does most of the work for you. It parses your XML much as GSON parses JSON.
All you need to do is create an instance of the parser and use it like this:
XmlParserCreator parserCreator = new XmlParserCreator() {
@Override
public XmlPullParser createParser() {
try {
return XmlPullParserFactory.newInstance().newPullParser();
} catch (Exception e) {
throw new RuntimeException(e);
}
}
};
GsonXml gsonXml = new GsonXmlBuilder()
.setXmlParserCreator(parserCreator)
.create();
String xml = "<model><name>my name</name><description>my description</description></model>";
SimpleModel model = gsonXml.fromXml(xml, SimpleModel.class);
Remember that you need to create a POJO class for your response just like you do for GSON.
Include the library in your gradle using:
compile 'com.stanfy:gson-xml-java:0.1.+'
Please read the github link for library carefully to know the usage and limitations.
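For comparison, the DOM route mentioned at the start of this answer could look roughly like the sketch below. It uses only the standard javax.xml.parsers and org.w3c.dom APIs. The class name BusinessCardDom and the method readFields are made up for illustration, and the sample XML is a shortened stand-in for the document in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
final class BusinessCardDom {

    // Groups the text of each <value> by the "type" attribute of its parent <field>.
    // A type such as "Address" can occur several times, hence the list per type.
    static Map<String, List<String>> readFields(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            Map<String, List<String>> fields = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("field");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element field = (Element) nodes.item(i);
                NodeList values = field.getElementsByTagName("value");
                if (values.getLength() == 0) {
                    continue; // a <field> without a <value> is skipped
                }
                String type = field.getAttribute("type");
                List<String> bucket = fields.get(type);
                if (bucket == null) {
                    bucket = new ArrayList<>();
                    fields.put(type, bucket);
                }
                bucket.add(values.item(0).getTextContent().trim());
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<document xmlns=\"http://ocrsdk.com/schema/recognizedBusinessCard-1.0.xsd\">"
                + "<businessCard imageRotation=\"noRotation\">"
                + "<field type=\"Company\"><value>KASTURI MEDICARE PVT. LTD.</value></field>"
                + "<field type=\"Address\"><value>Timing: 11:00 a.m. to 5.00 p.m</value></field>"
                + "</businessCard></document>";
        System.out.println(readFields(xml));
    }
}
```

Whether this is less work than a POJO-mapping library depends on how many fields you need; for a handful of values the loop above may be enough.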

From your question, I don't see a reason to convert the XML to JSON; you just need a way to fetch some fields out of the XML directly.
If you don't need to process JSON data in a later step, I recommend XPath. With XPath you can get data out of your XML with a simple path query such as "/document/businessCard/field[@type='Mobile']/value":
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(URI_TO_YOUR_DOCUMENT);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("/document/businessCard/field[@type='Mobile']/value");
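The snippet above stops after compiling the expression and reads from a URI, whereas the question starts from a String. A minimal end-to-end sketch under those assumptions follows; the class name XPathOnString and the method evaluate are illustrative only. It relies on the default DocumentBuilderFactory not being namespace-aware, which is why the unprefixed path matches even though the document declares a default xmlns.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
final class XPathOnString {

    // Parses the XML held in a String and returns the string value of the first match.
    static String evaluate(String xml, String expression) {
        try {
            // Left at its default (not namespace-aware) on purpose: with a
            // namespace-aware parser the unprefixed names in the path would
            // not match elements in the document's default namespace.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // evaluate(String, Object) returns the result converted to a String;
            // it is empty when nothing matches.
            return xpath.evaluate(expression, doc).trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<document xmlns=\"http://ocrsdk.com/schema/recognizedBusinessCard-1.0.xsd\">"
                + "<businessCard imageRotation=\"noRotation\">"
                + "<field type=\"Mobile\"><value>32147976</value></field>"
                + "<field type=\"Company\"><value>KASTURI MEDICARE PVT. LTD.</value></field>"
                + "</businessCard></document>";
        System.out.println(evaluate(xml, "/document/businessCard/field[@type='Mobile']/value"));
    }
}
```

If you prefer a namespace-aware parser, the path would need prefixes bound through a NamespaceContext instead.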


Extract values from xml file using Java

My response contains the following XML, and I want to retrieve bEntityID="328" from it:
<?xml version="1.0" encoding="UTF-8"?>
<ns2:aResponse xmlns:ns2="http://www.***.com/F1/F2/F3/2011-09-11">
<createBEntityResponse bEntityID="328" />
</ns2:aResponse>
I tried the following, but I get null:
System.out.println("bEntitytID="+XmlPath.with(response.asString())
.getInt("aResponse.createBEntityResponse.bEntityID"));
Any suggestion for getting BEntityID from this response?
Though I don't recommend using regex to extract element values, if you really need to, try the code below:
public class xmlValue {
public static void main(String[] args) {
String xml = "<ns2:aResponse xmlns:ns2=\"http://www.***.com/F1/F2/F3/2011-09-11\">\n" +
" <createBEntityResponse bEntityID=\"328\" />\n" +
"</ns2:aResponse>";
System.out.println(getTagValue(xml,"createBEntityResponse bEntityID"));
}
public static String getTagValue(String xml, String tagName){
String [] s;
s = xml.split("createBEntityResponse bEntityID");
String [] valuesBetweenQuotes = s[1].split("\"");
return valuesBetweenQuotes[1];
}
}
Output: 328
Note: Better solution is to use XML parsers
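As a sketch of the parser-based route the note recommends, the attribute can be read with the JDK's DOM API instead of string splitting. The class name EntityIdReader and its method are invented for this example; the XML is the response from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
final class EntityIdReader {

    // Returns the bEntityID attribute of the first <createBEntityResponse>,
    // or null when the element is not present.
    static String readEntityId(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("createBEntityResponse");
            if (nodes.getLength() == 0) {
                return null;
            }
            return ((Element) nodes.item(0)).getAttribute("bEntityID");
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ns2:aResponse xmlns:ns2=\"http://www.***.com/F1/F2/F3/2011-09-11\">\n"
                + "  <createBEntityResponse bEntityID=\"328\" />\n"
                + "</ns2:aResponse>";
        System.out.println("bEntityID=" + readEntityId(xml));
    }
}
```

Unlike the split-based version, this does not depend on the exact spacing or attribute order inside the tag.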
This will fetch the first tag value:
public static String getTagValue(String xml, String tagName){
return xml.split("<"+tagName+">")[1].split("</"+tagName+">")[0];
}
Another option is to use jsoup:
Document doc = Jsoup.parse(xml, "", Parser.xmlParser()); //parse the whole xml doc
for (Element e : doc.select("tagName")) {
System.out.println(e); //select the specific tag and prints
}
I think the best way is to deserialize the XML into a POJO and then read the value from it:
entityResponse.getEntityId();
I tried with the same XML file and was able to get the value of bEntityId with the following code. Hope it helps.
@Test
public void xmlPathTests() {
try {
File xmlExample = new File(System.getProperty("user.dir"), "src/test/resources/Data1.xml");
String xmlContent = FileUtils.readFileToString(xmlExample);
XmlPath xmlPath = new XmlPath(xmlContent).setRoot("aResponse");
System.out.println(" Entity ::"+xmlPath.getInt(("createBEntityResponse.@bEntityID")));
assertEquals(328, xmlPath.getInt(("createBEntityResponse.@bEntityID")));
} catch (Exception e) {
e.printStackTrace();
}
}

ROME API Parse Image URL in CDATA from RSS Feed

Rome API does not parse the image URL if the URL is given within the CDATA section.
For example, http://www.espn.com/espn/rss/espnu/news this feed has
<image>
<![CDATA[
URL of the image
]]>
</image>
Within the SyndFeed resulting from SyndFeedInput, I have checked the foreignMarkups, enclosures, and DCModules, but the image URL is not in any of them.
The values of other elements, such as Description and Title, are also given within CDATA, and the Rome API is able to parse those.
Code snippet:
XmlReader xmlReader = null;
try {
xmlReader = new XmlReader(new URL("http://www.espn.com/espn/rss/espnu/news"));
SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(xmlReader);
} catch (Exception e) {
e.printStackTrace();
}
I looked into the API in more detail. It provides a plugin mechanism for overriding the parsing:
https://rometools.github.io/rome/RssAndAtOMUtilitiEsROMEV0.5AndAboveTutorialsAndArticles/RssAndAtOMUtilitiEsROMEPluginsMechanism.html
I wrote a class that extends RSS20Parser, implements WireFeedParser, and overrides the parseItem method:
@Override
public Item parseItem(Element rssRoot, Element eItem, Locale locale) {
Item item = super.parseItem(rssRoot, eItem, locale);
Element imageElement = eItem.getChild("image", getRSSNamespace());
if (imageElement != null) {
String imageUrl = imageElement.getText();
Element urlElement = imageElement.getChild("url");
if(urlElement != null)
{
imageUrl = urlElement.getText();
}
Enclosure e = new Enclosure();
e.setType("image");
e.setUrl(imageUrl);
item.getEnclosures().add(e);
}
return item;
}
Now in SyndFeed, access the enclosures list and you will be able to find the image URL
List<SyndEntry> entries = feed.getEntries();
for (SyndEntry entry : entries) {
...
...
List<SyndEnclosure> enclosures = entry.getEnclosures();
if(enclosures!=null) {
for(SyndEnclosure enclosure : enclosures) {
if(enclosure.getType()!=null && enclosure.getType().equals("image")){
System.out.println("image URL : "+enclosure.getUrl());
}
}
}
}
and create a rome.properties file which is accessible in classpath with following entry
WireFeedParser.classes=your.package.name.CustomRomeRssParser

How to add <![CDATA[ and ]]> in XML prepared by Jaxb

How do I prepare XML with CDATA?
I am preparing this response via JAXB:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:tem="http://tempuri.org/">
<SOAP-ENV:Header/>
<soapenv:Body>
<tem:RequestData>
<tem:requestDocument>
<![CDATA[
<Request>
<Authentication CMId="68" Function="1" Guid="5594FB83-F4D4-431F-B3C5-EA6D7A8BA795" Password="poihg321TR"/>
<Establishment Id="4297867"/>
</Request>
]]>
</tem:requestDocument>
</tem:RequestData>
</soapenv:Body>
</soapenv:Envelope>
But JAXB does not give me the CDATA. How can I put CDATA inside the <tem:requestDocument> element?
Here is my Java code:
public static String test1() {
try {
initJB();
String response = null;
StringBuffer xmlStr = null;
String strTimeStamp = null;
com.cultagent4.travel_republic.gm.Envelope envelope = null;
com.cultagent4.travel_republic.gm.Header header = null;
com.cultagent4.travel_republic.gm.Body body = null;
com.cultagent4.travel_republic.gm.RequestData requestData = null;
com.cultagent4.travel_republic.gm.RequestDocument requestDocument = null;
com.cultagent4.travel_republic.gm.RequestDocument.Request request = null;
com.cultagent4.travel_republic.gm.RequestDocument.Request.Authentication authentication = null;
com.cultagent4.travel_republic.gm.RequestDocument.Request.Establishment establishment = null;
ObjectFactory objFact = new ObjectFactory();
envelope = objFact.createEnvelope();
header = objFact.createHeader();
envelope.setHeader(header);
body = objFact.createBody();
requestData = objFact.createRequestData();
requestDocument = objFact.createRequestDocument();
request = new RequestDocument.Request();
authentication = new RequestDocument.Request.Authentication();
authentication.setCMId("68");
authentication.setGuid("5594FB83-F4D4-431F-B3C5-EA6D7A8BA795");
authentication.setPassword("poihg321TR");
authentication.setFunction("1");
request.setAuthentication(authentication);
establishment = new RequestDocument.Request.Establishment();
establishment.setId("4297867");
request.setEstablishment(establishment);
requestDocument.setRequest(request);
requestData.setRequestDocument(requestDocument);
body.setRequestData(requestData);
envelope.setBody(body);
jaxbMarshallerForBase = jaxbContextForBase.createMarshaller();
OutputStream os = new ByteArrayOutputStream();
System.out.println();
// output pretty printed
// jaxbMarshallerForBase.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
// jaxbMarshallerForBase.marshal(envelope, System.out);
// jaxbMarshallerForBase.marshal(envelope, os);
jaxbMarshallerForBase.setProperty(Marshaller.JAXB_ENCODING, "UTF-8");
jaxbMarshallerForBase.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
// jaxbMarshallerForBase.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, false);
// get an Apache XMLSerializer configured to generate CDATA
XMLSerializer serializer = getXMLSerializer();
// marshal using the Apache XMLSerializer
SAXResult result = new SAXResult(serializer.asContentHandler());
System.out.println("*************");
jaxbMarshallerForBase.marshal(envelope, result);
System.out.println("--------------");
return null;
} catch (JAXBException ex) {
Logger.getLogger(GM_TravelRepublic.class.getName()).log(Level.SEVERE, null, ex);
} finally {
return null;
}
}
private static XMLSerializer getXMLSerializer() {
// configure an OutputFormat to handle CDATA
OutputFormat of = new OutputFormat();
// specify which of your elements you want to be handled as CDATA.
// The use of the ; '^' between the namespaceURI and the localname
// seems to be an implementation detail of the xerces code.
// When processing xml that doesn't use namespaces, simply omit the
// namespace prefix as shown in the third CDataElement below.
of.setCDataElements(new String[]{"^Request","^Authentication","^Establishment"});
// set any other options you'd like
of.setPreserveSpace(true);
of.setIndenting(true);
StringWriter writer = new StringWriter();
// create the serializer
XMLSerializer serializer = new XMLSerializer(of);
serializer.setOutputByteStream(System.out);
return serializer;
}
Here I get the same XML, but without CDATA. My server does not accept the request without CDATA. Please help.
Can you build the logic from this?
imports
import org.dom4j.CDATA;
import org.dom4j.DocumentHelper;
sample code
public static String appendCdata(String input) {
CDATA cdata = DocumentHelper.createCDATA(input);
return cdata.asXML();
}
You need to create a custom adapter class that extends the XmlAdapter class.
import javax.xml.bind.annotation.adapters.XmlAdapter;
public class CDATAAdapter extends XmlAdapter<String, String> {
@Override
public String marshal(String inStr) throws Exception {
// Wrap the value so it is written out as a CDATA section.
return "<![CDATA[" + inStr + "]]>";
}
@Override
public String unmarshal(String v) throws Exception {
// The parser has already unwrapped any CDATA section, so return the text as-is.
return v;
}
}
Inside your Java bean or POJO, apply @XmlJavaTypeAdapter to the String field that must be emitted as CDATA:
@XmlJavaTypeAdapter(value=CDATAAdapter.class)
private String message;
By default, the marshaller implementation of the JAXB RI tries to escape characters. To change this behaviour we write a class that
implements the CharacterEscapeHandler.
This interface has an escape method that needs to be overridden.
import com.sun.xml.internal.bind.marshaller.CharacterEscapeHandler;
m.setProperty("com.sun.xml.internal.bind.characterEscapeHandler",
new CharacterEscapeHandler() {
@Override
public void escape(char[] ch, int start, int length,
boolean isAttVal, Writer writer)
throws IOException {
writer.write(ch, start, length);
}
});
Secondly, it can also be done with the EclipseLink MOXy implementation.
CDATA is character data. It looks like your server wants the part of the XML starting with Request to come in as text. It may be enough to create an XmlAdapter that converts the instance of Request to a String. The resulting characters will be escaped rather than wrapped in CDATA, but this may fit your use case.
If you really need CDATA, then in addition to the XmlAdapter you can apply one of the strategies described in the question linked below:
How to generate CDATA block using JAXB?
From the setCDataElements method description in the Apache docs :
Sets the list of elements for which text node children should be output as CDATA.
What I think that means is, the children of the tem:requestDocument element should all be part of one single text chunk (and not xml elements by themselves) in order for this to work. Once you've done that, probably a simple
of.setCDataElements(new String[]{"tem^requestDocument"});
should do the trick.
Try it and let me know :)
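To illustrate the point that the CDATA element must hold a single text node, here is a sketch that uses the JDK's own Transformer rather than the Xerces XMLSerializer from the question. Its OutputKeys.CDATA_SECTION_ELEMENTS property plays a comparable role to setCDataElements. The class name CdataTransformerSketch is invented, and the elements are created without a namespace to keep the example short; for namespaced elements the property expects names in "{uri}local" form.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of JAXB or Xerces.
final class CdataTransformerSketch {

    // Builds <RequestData><requestDocument>payload</requestDocument></RequestData>
    // and serializes it so that the text inside requestDocument becomes CDATA.
    static String wrapInCdata(String payload) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("RequestData");
            Element holder = doc.createElement("requestDocument");
            // The payload is one text node, not child elements.
            holder.appendChild(doc.createTextNode(payload));
            root.appendChild(holder);
            doc.appendChild(root);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Text children of the listed elements are written as CDATA sections.
            transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "requestDocument");

            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(wrapInCdata("<Request><Establishment Id=\"4297867\"/></Request>"));
    }
}
```

With JAXB, one possible way to reuse this would be to marshal into a DOM node first and then run the result through such a Transformer, but that wiring is not shown here.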
I think that in your private static XMLSerializer getXMLSerializer() method you are configuring the wrong CDATA elements: the element whose content should be CDATA is <tem:requestDocument>, not Request, Authentication, and Establishment, which are the content itself. Try:
of.setCDataElements(new String[]{"tem^requestDocument","http://tempuri.org/^requestDocument","requestDocument"});
instead of:
of.setCDataElements(new String[]{"^Request","^Authentication","^Establishment"});
Hope this helps,
Your server is expecting <tem:requestDocument> to contain text, and not
a <Request> element. CDATA is really just helpful for creating hand-written
XML so you don't have to worry about escaping embedded XML. The thing is,
JAXB handles escaping just fine and if your server is a good XML citizen it should
treat properly escaped XML the same as XML in a CDATA block.
So, instead of adding a request element inside your requestDocument like
you do in:
requestDocument = objFact.createRequestDocument();
request = new RequestDocument.Request();
...
requestDocument.setRequest(request);
You should first use JAXB to marshal request into a properly escaped String
and set that as the requestDocument value:
requestDocument = objFact.createRequestDocument();
request = new RequestDocument.Request();
...
String escapedRequest = marshal(request);
requestDocument.setRequest(escapedRequest);
Implementing marshal(request) is left as an exercise. ;)
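The claim that a conforming parser treats escaped text and a CDATA block alike can be checked with a few lines of plain JDK code. This is only a demonstration of that equivalence, not an implementation of marshal(request); the class name EscapedVersusCdata and the shortened payload are assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of JAXB.
final class EscapedVersusCdata {

    // Parses the XML and returns the text content of the root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Same payload, once inside CDATA and once with the markup characters escaped.
        String withCdata = "<requestDocument><![CDATA[<Request Id=\"1\"/>]]></requestDocument>";
        String escaped = "<requestDocument>&lt;Request Id=\"1\"/&gt;</requestDocument>";

        String fromCdata = textOf(withCdata);
        String fromEscaped = textOf(escaped);

        System.out.println(fromCdata);
        System.out.println(fromEscaped);
        System.out.println("identical: " + fromCdata.equals(fromEscaped));
    }
}
```

If the receiving server really rejects the escaped form, that suggests it is not parsing the envelope with a standard XML parser, and one of the CDATA strategies above would be needed.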

Get schema location from XML file (noNamespaceSchemaLocation)

We are parsing an XML file with the SAX parser. Is it possible to get the schema location from the XML?
<view id="..." title="..."
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="{schema}">
I want to retrieve the {schema} value from the XML. Is this possible? And how do I access the value of noNamespaceSchemaLocation? I'm using the default SAX parser.
@Override
public void startElement(String uri, String localName,
String name, Attributes attributes)
{ .... }
Thank you.
It all depends on which tool or library you are working with (a basic SAXParser? Xerces? JDom? ...), but what you want is the value of the attribute "noNamespaceSchemaLocation" in the namespace defined by the URI "http://www.w3.org/2001/XMLSchema-instance".
in JDom, it would be something like:
Element view = ...; // get the view element
String value = view.getAttributeValue("noNamespaceSchemaLocation", Namespace.getNamespace("http://www.w3.org/2001/XMLSchema-instance"));
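With the plain SAX parser from the question, the same lookup might be sketched as follows. It assumes the factory is switched to namespace-aware mode so that Attributes.getValue(uri, localName) can find the attribute regardless of the prefix used. The class name SchemaLocationSax and the file name view.xsd are placeholders.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
final class SchemaLocationSax {

    // Returns xsi:noNamespaceSchemaLocation of the root element, or null if absent.
    static String findSchemaLocation(String xml) {
        final String[] found = new String[1];
        final boolean[] rootSeen = {false};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Needed so attributes can be looked up by namespace URI and local name.
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes attributes) {
                    if (!rootSeen[0]) {
                        rootSeen[0] = true; // only the root element is inspected
                        found[0] = attributes.getValue(
                                XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI,
                                "noNamespaceSchemaLocation");
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return found[0];
    }

    public static void main(String[] args) {
        String xml = "<view id=\"v1\" title=\"Demo\""
                + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
                + " xsi:noNamespaceSchemaLocation=\"view.xsd\"/>";
        System.out.println(findSchemaLocation(xml));
    }
}
```

Note that this sketch still parses the whole document; it merely ignores everything after the root element's start tag.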
Here is how I get the XSD's name using XMLStreamReader:
public static String extractXsdValueOrNull(@NonNull final InputStream xmlInput)
{
final XMLInputFactory f = XMLInputFactory.newInstance();
try
{
final XMLStreamReader r = f.createXMLStreamReader(xmlInput);
while (r.hasNext())
{
final int eventType = r.next();
if (XMLStreamReader.START_ELEMENT == eventType)
{
for (int i = 0; i < r.getAttributeCount(); i++) // valid attribute indexes are 0..count-1
{
final boolean foundSchemaNameSpace = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI.equals(r.getAttributeNamespace(i));
final boolean foundLocationAttributeName = SCHEMA_LOCATION.equals(r.getAttributeLocalName(i));
if (foundSchemaNameSpace && foundLocationAttributeName)
{
return r.getAttributeValue(i);
}
}
return null; // only checked the first element
}
}
return null;
}
catch (final XMLStreamException e)
{
throw new RuntimeException(e);
}
}
Actually XMLStreamReader does all the magic, namely:
only parses the XML's beginning (not the whole XML)
does not assume a particular namespace alias (i.e. xsi)

Convert XML to JSON format

I have to convert docx file format (which is in openXML format) into JSON format. I need some guidelines to do it. Thanks in advance.
You may take a look at the Json-lib Java library, that provides XML-to-JSON conversion.
String xml = "<hello><test>1.2</test><test2>123</test2></hello>";
XMLSerializer xmlSerializer = new XMLSerializer();
JSON json = xmlSerializer.read( xml );
If you need the root tag too, simply add an outer dummy tag:
String xml = "<hello><test>1.2</test><test2>123</test2></hello>";
XMLSerializer xmlSerializer = new XMLSerializer();
JSON json = xmlSerializer.read("<x>" + xml + "</x>");
There is no direct mapping between XML and JSON; XML carries with it type information (each element has a name) as well as namespacing. Therefore, unless each JSON object has type information embedded, the conversion is going to be lossy.
But that doesn't necessarily matter. What does matter is that the consumer of the JSON knows the data contract. For example, given this XML:
<books>
<book author="Jimbo Jones" title="Bar Baz">
<summary>Foo</summary>
</book>
<book title="Don't Care" author="Fake Person">
<summary>Dummy Data</summary>
</book>
</books>
You could convert it to this:
{
"books": [
{ "author": "Jimbo Jones", "title": "Bar Baz", "summary": "Foo" },
{ "author": "Fake Person", "title": "Don't Care", "summary": "Dummy Data" }
]
}
And the consumer wouldn't need to know that each object in the books collection was a book object.
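A hand-rolled version of that mapping, where attributes and child elements of each book are flattened into one record, could be sketched like this. It stops at a list of maps, which any JSON library can then serialize; the class name BooksToRecords is invented, and the flattening rule (attributes and children share one key space) is an assumption taken from the example above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
final class BooksToRecords {

    // Turns every <book> into a flat map: attributes first, then child elements.
    static List<Map<String, String>> toRecords(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<Map<String, String>> records = new ArrayList<>();
            NodeList books = doc.getElementsByTagName("book");
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                Map<String, String> record = new LinkedHashMap<>();

                NamedNodeMap attributes = book.getAttributes();
                for (int a = 0; a < attributes.getLength(); a++) {
                    Node attribute = attributes.item(a);
                    record.put(attribute.getNodeName(), attribute.getNodeValue());
                }

                NodeList children = book.getChildNodes();
                for (int c = 0; c < children.getLength(); c++) {
                    Node child = children.item(c);
                    if (child.getNodeType() == Node.ELEMENT_NODE) { // skip whitespace text nodes
                        record.put(child.getNodeName(), child.getTextContent().trim());
                    }
                }
                records.add(record);
            }
            return records;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books>"
                + "<book author=\"Jimbo Jones\" title=\"Bar Baz\"><summary>Foo</summary></book>"
                + "<book title=\"Don't Care\" author=\"Fake Person\"><summary>Dummy Data</summary></book>"
                + "</books>";
        System.out.println(toRecords(xml));
    }
}
```

The conversion is lossy in exactly the way described: the record no longer says which values were attributes and which were elements, so a child element named like an attribute would overwrite it.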
Edit:
If you have an XML Schema for the XML and are using .NET, you can generate classes from the schema using xsd.exe. Then, you could parse the source XML into objects of these classes, then use a DataContractJsonSerializer to serialize the classes as JSON.
If you don't have a schema, it will be hard getting around manually defining your JSON format yourself.
The XML class in the org.json namespace provides you with this functionality.
You have to call the static toJSONObject method
Converts a well-formed (but not necessarily valid) XML string into a JSONObject. Some information may be lost in this transformation because JSON is a data format and XML is a document format. XML uses elements, attributes, and content text, while JSON uses unordered collections of name/value pairs and arrays of values. JSON does not like to distinguish between elements and attributes. Sequences of similar elements are represented as JSONArrays. Content text may be placed in a "content" member. Comments, prologs, DTDs, and <[ [ ]]> are ignored.
If you are dissatisfied with the various implementations, try rolling your own. Here is some code I wrote this afternoon to get you started. It works with net.sf.json and apache common-lang:
static public JSONObject readToJSON(InputStream stream) throws Exception {
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
SAXParser parser = factory.newSAXParser();
SAXJsonParser handler = new SAXJsonParser();
parser.parse(stream, handler);
return handler.getJson();
}
And the SAXJsonParser implementation:
package xml2json;
import net.sf.json.*;
import org.apache.commons.lang.StringUtils;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import java.util.ArrayList;
import java.util.List;
public class SAXJsonParser extends DefaultHandler {
static final String TEXTKEY = "_text";
JSONObject result;
List<JSONObject> stack;
public SAXJsonParser(){}
public JSONObject getJson(){return result;}
public String attributeName(String name){return "@"+name;}
public void startDocument () throws SAXException {
stack = new ArrayList<JSONObject>();
stack.add(0,new JSONObject());
}
public void endDocument () throws SAXException {result = stack.remove(0);}
public void startElement (String uri, String localName,String qName, Attributes attributes) throws SAXException {
JSONObject work = new JSONObject();
for (int ix=0;ix<attributes.getLength();ix++)
work.put( attributeName( attributes.getLocalName(ix) ), attributes.getValue(ix) );
stack.add(0,work);
}
public void endElement (String uri, String localName, String qName) throws SAXException {
JSONObject pop = stack.remove(0); // examine stack
Object stashable = pop;
if (pop.containsKey(TEXTKEY)) {
String value = pop.getString(TEXTKEY).trim();
if (pop.keySet().size()==1) stashable = value; // single value
else if (StringUtils.isBlank(value)) pop.remove(TEXTKEY);
}
JSONObject parent = stack.get(0);
if (!parent.containsKey(localName)) { // add new object
parent.put( localName, stashable );
}
else { // aggregate into arrays
Object work = parent.get(localName);
if (work instanceof JSONArray) {
((JSONArray)work).add(stashable);
}
else {
parent.put(localName,new JSONArray());
parent.getJSONArray(localName).add(work);
parent.getJSONArray(localName).add(stashable);
}
}
}
public void characters (char ch[], int start, int length) throws SAXException {
JSONObject work = stack.get(0); // aggregate characters
String value = (work.containsKey(TEXTKEY) ? work.getString(TEXTKEY) : "" );
work.put(TEXTKEY, value+new String(ch,start,length) );
}
public void warning (SAXParseException e) throws SAXException {
System.out.println("warning e=" + e.getMessage());
}
public void error (SAXParseException e) throws SAXException {
System.err.println("error e=" + e.getMessage());
}
public void fatalError (SAXParseException e) throws SAXException {
System.err.println("fatalError e=" + e.getMessage());
throw e;
}
}
Converting complete docx files into JSON does not look like a good idea, because docx is a document-centric XML format and JSON is a data-centric format. XML in general is designed to be both document- and data-centric. Though it is technically possible to convert document-centric XML into JSON, handling the generated data might be overly complex. Try to focus on the data you actually need and convert only that part.
If you need to be able to manipulate your XML before it gets converted to JSON, or want fine-grained control of your representation, go with XStream. It's really easy to convert between: xml-to-object, json-to-object, object-to-xml, and object-to-json. Here's an example from XStream's docs:
XML
<person>
<firstname>Joe</firstname>
<lastname>Walnes</lastname>
<phone>
<code>123</code>
<number>1234-456</number>
</phone>
<fax>
<code>123</code>
<number>9999-999</number>
</fax>
</person>
POJO (DTO)
public class Person {
private String firstname;
private String lastname;
private PhoneNumber phone;
private PhoneNumber fax;
// ... constructors and methods
}
Convert from XML to POJO:
String xml = "<person>...</person>";
XStream xstream = new XStream();
Person person = (Person)xstream.fromXML(xml);
And then from POJO to JSON:
XStream xstream = new XStream(new JettisonMappedXmlDriver());
String json = xstream.toXML(person);
Note: although the method reads toXML() XStream will produce JSON, since the Jettison driver is used.
If you have a valid DTD file for the XML snippet, you can easily convert XML to JSON and JSON to XML using the open-source EclipseLink jar. A detailed sample Java project can be found here: http://www.cubicrace.com/2015/06/How-to-convert-XML-to-JSON-format.html
I have come across a tutorial, hope it helps you.
http://www.techrecite.com/xml-to-json-data-parser-converter
Use
xmlSerializer.setForceTopLevelObject(true)
to include root element in resulting JSON.
Your code would be like this
String xml = "<hello><test>1.2</test><test2>123</test2></hello>";
XMLSerializer xmlSerializer = new XMLSerializer();
xmlSerializer.setForceTopLevelObject(true);
JSON json = xmlSerializer.read(xml);
Docx4j
I've used docx4j before, and it's worth taking a look at.
unXml
You could also check out my open source unXml-library that is available on Maven Central.
It is lightweight, and has a simple syntax to pick out XPaths from your xml, and get them returned as Json attributes in a Jackson ObjectNode.
