I am using a SAX parser to read data from an XML file. Now I want to modify the data using the SAX parser as well. How can I do that?
My XML file is:
<ProductCatalog>
<item id="w1" type="SmartWatches">
<name>Apple Watch</name>
<price>400</price>
<image>abc.jpg</image>
<manufacturer>Apple</manufacturer>
<condition>New</condition>
<discount>10</discount>
<accessories>
<accessory>charger</accessory>
<accessory>belt</accessory>
</accessories>
</item>
<item id="w2" type="SmartWatches">
<name>Apple Watch</name>
<price>400</price>
<image>abc.jpg</image>
<manufacturer>Apple</manufacturer>
<condition>New</condition>
<discount>10</discount>
<accessories>
<accessory>charger</accessory>
<accessory>belt</accessory>
</accessories>
</item>
</ProductCatalog>
And this is my SAX parser implementation in Java:
public class UserHandler extends DefaultHandler {
private Item item;
private String value;
private String filePath;
private Map<String, Item> map = new HashMap<String, Item>();
UserHandler(String filePath) {
this.filePath = filePath;
parseDocument();
printValue();
}
private void printValue() {
Set keys = map.keySet();
for (Iterator i = keys.iterator(); i.hasNext();) {
String key = (String) i.next();
System.out.println("Key : " + key);
Item itemDemo = map.get(key);
System.out.println("Name : " + itemDemo.getName());
System.out.println("Price : " + itemDemo.getPrice());
System.out.println("Image : " + itemDemo.getImage());
System.out.println("Manufacturer : " + itemDemo.getManufacturer());
System.out.println("Condition : " + itemDemo.getCondition());
System.out.println("Discount : " + itemDemo.getDiscount());
System.out.println("Accessories : " + itemDemo.getAccessories().toString());
}
}
private void parseDocument() {
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
SAXParser parser = factory.newSAXParser();
parser.parse(filePath, this);
} catch (ParserConfigurationException e) {
System.out.println("ParserConfig error");
} catch (SAXException e) {
System.out.println("SAXException : xml not well formed");
} catch (IOException e) {
e.printStackTrace();
}
}
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("item")) {
item = new Item();
item.setId(attributes.getValue("id"));
item.setType(attributes.getValue("type"));
}
}
public void endElement(String uri, String localName, String qName) throws SAXException {
if (qName.equals("item")) {
map.put(item.getId(), item);
return;
}
if (qName.equalsIgnoreCase("name")) {
item.setName(value);
return;
}
if (qName.equalsIgnoreCase("price")) {
item.setPrice(value);
return;
}
if (qName.equalsIgnoreCase("image")) {
item.setImage(value);
return;
}
if(qName.equalsIgnoreCase("manufacturer")){
item.setManufacturer(value);
return;
}
if(qName.equalsIgnoreCase("condition")){
item.setCondition(value);
return;
}
if(qName.equalsIgnoreCase("discount")){
item.setDiscount(value);
return;
}
if(qName.equalsIgnoreCase("accessory")){
item.getAccessories().add(value);
return;
}
}
public void characters(char[] ch, int start, int length) throws SAXException {
value = new String(ch, start, length);
}
}
How can I modify a particular attribute in XML file? The changes should be replicated in file also. I cannot use DOM parser.
First of all, you seem to be doing some XML-object binding manually through a SAX parser. It would make a lot more sense to use JAXB, which takes care of this for you. Since you already have an Item class with the necessary properties (getters/setters), it would probably take a minimum of JAXB annotations and a ProductCatalog class (with a list of Items) to make it possible to fully unmarshal XML to objects, and marshal objects to XML.
Depending on how you want to alter things, there are a couple of possible approaches. If you want to alter the content of the XML, but not change the names of elements or attributes or change namespaces, then using the above-mentioned JAXB to turn the XML into Java objects, manipulate those, and turn them back into XML would suffice. The code for the JAXB marshalling/unmarshalling is minimal, and all the rest is handling POJOs through familiar Java code.
If you want to drastically alter the structure of your XML, or do things like rename elements/attributes, then you might wish to look into XSLT, which is an XML-based transformation language. XSLT processing in Java is fast and efficient. It takes a bit of getting used to, but with the correct mindset XSLT becomes a very powerful tool in manipulating XML.
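To make the XSLT route a bit more concrete, here is a minimal sketch using only the `javax.xml.transform` API that ships with the JDK. It applies an identity transform with one override that replaces the `<price>` of a single item. The class name `PriceXslt`, the method `setPrice` and the parameter names `itemId` and `newPrice` are made up for this illustration, and the shortened catalog in `main` is only sample input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PriceXslt {
    // Identity transform plus one override: <price> of the item whose id equals $itemId.
    // XSLT 1.0 does not allow variables in match patterns, hence the xsl:choose.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:param name='itemId'/>"
      + "<xsl:param name='newPrice'/>"
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match='item/price'>"
      + "<xsl:copy><xsl:choose>"
      + "<xsl:when test='../@id = $itemId'><xsl:value-of select='$newPrice'/></xsl:when>"
      + "<xsl:otherwise><xsl:value-of select='.'/></xsl:otherwise>"
      + "</xsl:choose></xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Returns the transformed document as a string; checked exceptions are wrapped
    // only to keep the sketch short to call.
    static String setPrice(String xml, String itemId, String newPrice) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            t.setParameter("itemId", itemId);
            t.setParameter("newPrice", newPrice);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ProductCatalog><item id=\"w1\"><price>400</price></item>"
                   + "<item id=\"w2\"><price>400</price></item></ProductCatalog>";
        System.out.println(setPrice(xml, "w1", "350"));
    }
}
```

To change a file on disk, the same transformer can read from a `StreamSource` wrapping the input file and write to a `StreamResult` wrapping a different (temporary) file, which then replaces the original. Writing to the file that is still being read is best avoided.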
Both the JAXB and XML transformation APIs have been around for a long time. Java SE 6 through 10 bundled default implementations of JAXB 2 and XSLT 1; from Java 11 onward JAXB has been removed from the JDK and must be added as a separate dependency, while the XSLT 1 implementation is still included. Implementations are pluggable, in case you want additional functionality. Examples are MOXy, a JAXB implementation with more advanced binding capabilities, and Saxon, which implements XSLT 2 and later.
You can even combine these technologies. You can unmarshal Java objects using JAXB from an XML file while going through an XSLT transformation, and the other way around: send a Java object through XSLT into XML.
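If the requirement really is "SAX only, no DOM", one further option that is not part of the answer above is to put a SAX filter between the parser and an identity `Transformer`. The filter changes events as they stream past and the transformer serializes them again. The sketch below assumes the catalog layout from the question. The class name `AttributeRewriter` and the `rewrite` helper are hypothetical names chosen for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

class AttributeRewriter extends XMLFilterImpl {
    private final String itemId;
    private final String newType;

    AttributeRewriter(XMLReader parent, String itemId, String newType) {
        super(parent);
        this.itemId = itemId;
        this.newType = newType;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Only the <item> with the requested id gets a modified copy of its attributes.
        if ("item".equals(qName) && itemId.equals(atts.getValue("id"))) {
            AttributesImpl copy = new AttributesImpl(atts);
            int index = copy.getIndex("type");
            if (index >= 0) {
                copy.setValue(index, newType);
            }
            atts = copy;
        }
        super.startElement(uri, localName, qName, atts); // pass the event downstream
    }

    // Parses xml, rewrites the type attribute of one item, returns the serialized result.
    static String rewrite(String xml, String itemId, String newType) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            AttributeRewriter filter = new AttributeRewriter(reader, itemId, newType);
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                               new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("rewrite failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<ProductCatalog><item id=\"w1\" type=\"SmartWatches\">"
                   + "<name>Apple Watch</name></item></ProductCatalog>";
        System.out.println(rewrite(xml, "w1", "Wearables"));
    }
}
```

Because nothing is buffered beyond the current event, this keeps the memory profile of SAX. Element text can be changed the same way by overriding `characters` in the filter.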
I have a large XML file containing many key value pairs. The file contains both multi-line comments and actual data. Within the comments section, there are examples of how the data/key-value pairs should be arranged. The SAX parser that I have made successfully retrieves the keys and values from the file, but it also reads the example keys/values contained within the comments, which I do not want to happen. How can I make it so my SAX parser ignores everything within the comments section? I am not allowed to edit the file and I must use java.
Below is an example of the file that I am working with. Notice how there are data tags within the comment section. I do not want to read the sample data within these tags, but my parser records them anyways.
<?xml version="1.0" encoding="utf-8"?>
<root>
<!--
Microsoft ResX Schema
Version 2.0
The primary goals of this format is to allow a simple XML format
that is mostly human readable. The generation and parsing of the
various data types are done through the TypeConverter classes
associated with the data types.
Example:
... ado.net/XML headers & schema ...
<resheader name="resmimetype">text/microsoft-resx</resheader>
<resheader name="version">2.0</resheader>
<resheader name="reader">System.Resources.ResXResourceReader, System.Windows.Forms, ...</resheader>
<resheader name="writer">System.Resources.ResXResourceWriter, System.Windows.Forms, ...</resheader>
<data name="Name1"><value>this is my long string</value><comment>this is a comment</comment></data>
<data name="Color1" type="System.Drawing.Color, System.Drawing">Blue</data> **I DO NOT WANT TO READ THIS**
<data name="Bitmap1" mimetype="application/x-microsoft.net.object.binary.base64">
<value>[base64 mime encoded serialized .NET Framework object]</value>
</data>
<data name="Icon1" type="System.Drawing.Icon, System.Drawing" mimetype="application/x-microsoft.net.object.bytearray.base64">
<value>[base64 mime encoded string representing a byte array form of the .NET Framework object]</value>
<comment>This is a comment</comment>
</data>
-->
<resheader name="reader">
<value>System.Resources.ResXResourceReader, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader>
<data name="AmountUnits" xml:space="preserve">
<value>Amount/Units</value>
</data>
</root>
Here is the code that I am using:
public class xmlPropertiesBuilder extends DefaultHandler {
private boolean valueFound;
public void readXMLFile(File xmlFile) throws SAXException, IOException, ParserConfigurationException {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
parser.parse(xmlFile, this);
valueFound = false;
}
@Override
public void startDocument() throws SAXException {
System.out.println("Start Document");
}
@Override
public void endDocument() throws SAXException {
System.out.println("End Document");
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if(qName.equals("data")){
System.out.println("Start Element: " + qName);
System.out.println("Key: " + attributes.getValue("name"));
} else if(qName.equals("value")){
valueFound = true;
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if(qName.equals("data")){
System.out.println("End Element: " + qName + "\n");
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if(valueFound){
System.out.println("Value: " + new String(ch, start, length));
valueFound = false;
}
}
}
Well, it looks like the JAXP SAX parser does in fact ignore data contained within comments. I was just misinterpreting my tests. In my example XML file I left out some tags, one of which is called <resheader>. These resheader tags also contain a <value> tag, which my parser was picking up (I assumed it came from the comments, but it turned out to come from the resheaders).
I was able to fix my problem by adding a boolean variable called 'dataFound' that is only set to true when a <data> tag is found. Then, within my characters method, I simply changed the if condition from if(valueFound){...} to if(dataFound && valueFound){...}. Finally, within the endElement() method, I set the 'dataFound' variable to false whenever a </data> tag is found.
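The fix described above could look roughly like the following sketch. It differs from the original handler in two small ways that are my own choices: values are collected into a map instead of printed, and text is accumulated in a `StringBuilder` because a SAX parser may deliver one text node in several `characters` calls. The class name `ResxDataHandler` and the `parse` helper are illustrative.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ResxDataHandler extends DefaultHandler {
    final Map<String, String> entries = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private boolean dataFound;   // true while inside <data>...</data>
    private boolean valueFound;  // true while inside <value>...</value>
    private String currentKey;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("data")) {
            dataFound = true;
            currentKey = attributes.getValue("name");
        } else if (qName.equals("value")) {
            valueFound = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // <value> inside <resheader> is skipped because dataFound is false there.
        if (dataFound && valueFound) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("value")) {
            if (dataFound) {
                entries.put(currentKey, text.toString());
            }
            valueFound = false;
        } else if (qName.equals("data")) {
            dataFound = false;
        }
    }

    // Convenience wrapper for the example; wraps checked exceptions.
    static Map<String, String> parse(String xml) {
        try {
            ResxDataHandler handler = new ResxDataHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.entries;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><!-- <data name=\"Color1\"><value>Blue</value></data> -->"
                   + "<resheader name=\"reader\"><value>ignored</value></resheader>"
                   + "<data name=\"AmountUnits\"><value>Amount/Units</value></data></root>";
        System.out.println(parse(xml));
    }
}
```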
I am querying XML files of around 1 MB (20k+ lines) each. I am using XPath to describe what I want to get and the VTD-XML library to get it. I think I have some performance problems.
The problem is that I am making 5k+ queries against the XML file, and it takes approximately 16-17 seconds to retrieve all the values. Is this normal performance for such a task? How can I improve it?
I am using the VTD-XML library with the AutoPilot navigation approach, which lets me use XPath. The implementation is as follows:
private VTDGen vg = new VTDGen();
private VTDNav vn;
private AutoPilot ap = new AutoPilot();
public void init(String xml) {
log.info("Creating document");
xml = xml.replace("<?xml version=\"1.0\"?>", "<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
vg.setDoc(bytes);
try {
vg.parse(true);
vn = vg.getNav();
} catch (ParseException e) {
e.printStackTrace();
}
log.info("Document created");
}
public String parseXmlOrReturnNull(String query) {
String xPathStringVal = null;
try {
ap.selectXPath(query);
ap.bind(vn);
int i = -1;
while ((i = ap.evalXPath()) != -1) {
xPathStringVal = vn.getXPathStringVal();
}
}catch (XPathEvalException e) {
e.printStackTrace();
} catch (NavException e) {
e.printStackTrace();
} catch (XPathParseException e) {
e.printStackTrace();
}
return xPathStringVal;
}
My XML files have a specific format: they are divided into a lot of parts (segments), and my queries are the same for all segments (I run them in a loop). For example, part of the XML:
<segment>
<a>
<b>value1</b>
<c>
<d>value2</d>
<e>value3</e>
</c>
</a>
</segment>
<segment>
<a>
<b>value4</b>
<c>
<d>value5</d>
<e>value6</e>
<f>value6</f>
</c>
</a>
</segment>
...
If I want to get value1 in first segment I am using query:
//segment[1]/a/b
for value 4 in second segment
//segment[2]/a/b
etc.
Intuition tells me a few things: in my approach every query is independent (it doesn't know anything about the other queries), which means that AutoPilot, my iterator, always starts at the beginning of the file when I query it.
My question is: is there any way to position AutoPilot at the beginning of the segment being processed, and to move it to the next segment once I have finished querying? I think that if my method started searching from a specified point rather than from the beginning, it would be much faster.
Another way is to divide the XML file into small XML files (one XML file = one segment) and query those small files.
What do you think? Thanks in advance.
Minor: the replace is not needed, as UTF-8 is the default encoding; only when the declaration names a different encoding would you need to patch it to UTF-8.
The XPath should only be evaluated once, rather than restarting from the beginning of the document for every segment index.
If you need a List representation, you could use JAXB with annotations.
Event-based, primitive parsing without a DOM object (SAXParser) is probably best.
DefaultHandler handler = new DefaultHandler() {
    @Override
    public void startElement(String uri,
            String localName, String qName, Attributes attributes) throws SAXException {
        // called for every opening tag, e.g. <segment> or <b>
    }
    @Override
    public void endElement(String uri,
            String localName, String qName) throws SAXException {
        // called for every closing tag
    }
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // called with the text between tags, possibly in several chunks
    }
};
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
InputStream in = new ByteArrayInputStream(bytes);
parser.parse(in, handler);
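To fill in the empty callbacks, here is one possible sketch that reads every segment in a single pass and stores each leaf value under its path relative to `<segment>` (for example `a/b`). It assumes the segments sit under a single root element, which I have called `<root>` because the question does not show it. The class name `SegmentHandler` and the `parse` helper are likewise illustrative.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SegmentHandler extends DefaultHandler {
    final List<Map<String, String>> segments = new ArrayList<>();
    private final Deque<String> path = new ArrayDeque<>();  // element names below <segment>
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> current;                     // null while outside a segment

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("segment")) {
            current = new LinkedHashMap<>();
            path.clear();
        } else if (current != null) {
            path.addLast(qName);
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("segment")) {
            segments.add(current);
            current = null;
        } else if (current != null) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {               // container elements only hold whitespace
                current.put(String.join("/", path), value);
            }
            path.removeLast();
            text.setLength(0);
        }
    }

    // Convenience wrapper for the example; wraps checked exceptions.
    static List<Map<String, String>> parse(String xml) {
        try {
            SegmentHandler handler = new SegmentHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.segments;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><segment><a><b>value1</b><c><d>value2</d></c></a></segment>"
                   + "<segment><a><b>value4</b></a></segment></root>";
        List<Map<String, String>> result = parse(xml);
        System.out.println(result.get(1).get("a/b"));
    }
}
```

After the single parse, what used to be `//segment[2]/a/b` becomes a map lookup (`result.get(1).get("a/b")`), so the document is not re-scanned per query.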
We are parsing an XML file with the SAX parser. Is it possible to get the schema location from the XML?
<view id="..." title="..."
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="{schema}">
I want to retrieve the {schema} value from the XML. Is this possible? And how do I access the value of noNamespaceSchemaLocation? I'm using the default SAX parser.
@Override
public void startElement(String uri, String localName,
String name, Attributes attributes)
{ .... }
Thank you.
It all depends on what kind of tool/library you are working with (a basic SAXParser? Xerces? JDOM? ...). But what you want is the value of the attribute "noNamespaceSchemaLocation" in the namespace defined by the URI "http://www.w3.org/2001/XMLSchema-instance".
In JDOM, it would be something like:
Element view = ...; // get the view element
String value = view.getAttributeValue("noNamespaceSchemaLocation", Namespace.getNamespace("http://www.w3.org/2001/XMLSchema-instance"));
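With the plain JAXP SAX parser the same lookup can be done in `startElement`, as long as the factory is made namespace-aware so that `Attributes.getValue(uri, localName)` works. The sketch below only inspects the root element. The class name `SchemaLocationHandler` and the `find` helper are illustrative names, not part of any library.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SchemaLocationHandler extends DefaultHandler {
    String schemaLocation;      // stays null if the attribute is absent
    private boolean rootSeen;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (!rootSeen) {
            rootSeen = true;
            // Lookup by namespace URI + local name, so the prefix need not be "xsi".
            schemaLocation = attributes.getValue(
                    XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI, "noNamespaceSchemaLocation");
        }
    }

    // Convenience wrapper for the example; wraps checked exceptions.
    static String find(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);   // required for the (uri, localName) lookup
            SchemaLocationHandler handler = new SchemaLocationHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.schemaLocation;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<view id=\"v\" title=\"t\""
                   + " xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\""
                   + " xsi:noNamespaceSchemaLocation=\"view.xsd\"/>";
        System.out.println(find(xml));
    }
}
```

Unlike the streaming-reader approach, this handler still lets the parser run through the whole document; it simply ignores everything after the root element's start tag.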
Here is how I get the XSD's name using XMLStreamReader:
public static String extractXsdValueOrNull(@NonNull final InputStream xmlInput)
{
final XMLInputFactory f = XMLInputFactory.newInstance();
try
{
final XMLStreamReader r = f.createXMLStreamReader(xmlInput);
while (r.hasNext())
{
final int eventType = r.next();
if (XMLStreamReader.START_ELEMENT == eventType)
{
for (int i = 0; i < r.getAttributeCount(); i++) // valid indexes are 0..count-1
{
final boolean foundSchemaNameSpace = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI.equals(r.getAttributeNamespace(i));
// SCHEMA_LOCATION is a constant defined elsewhere in the class (not shown here)
final boolean foundLocationAttributeName = SCHEMA_LOCATION.equals(r.getAttributeLocalName(i));
if (foundSchemaNameSpace && foundLocationAttributeName)
{
return r.getAttributeValue(i);
}
}
return null; // only checked the first element
}
}
return null;
}
catch (final XMLStreamException e)
{
throw new RuntimeException(e);
}
}
Actually XMLStreamReader does all the magic, namely:
only parses the XML's beginning (not the whole XML)
does not assume a particular namespace alias (i.e. xsi)
I am working on an application that uses SAX parsing. To get the city and state name from a latitude and longitude, I'm using the Google API.
I want to get the long_name, short_name and type values of the address_component element.
I am getting all of this information from the XML successfully, but the problem is the type tag value. There are two type tags in this element, and I always get the value of the second one.
Sample XML:
<address_component>
<long_name>Gujarat</long_name>
<short_name>Gujarat</short_name>
<type>administrative_area_level_1</type>
<type>political</type>
</address_component>
How can I get both type tag values, administrative_area_level_1 as well as political?
I came across the following link, which makes it really easy to get started:
http://javarevisited.blogspot.com/2011/12/parse-read-xml-file-java-sax-parser.html
I added the data to a file named location.xml (if you fetch it from the web, use your own logic to download the data, convert it into an InputStream and pass it to the following code). I wrote a method that reads it:
public void ReadAndWriteXMLFileUsingSAXParser(){
try
{
DefaultHandler handler = new MyHandler();
// parseXmlFile("infilename.xml", handler, true);
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
InputStream rStream = null;
rStream = getClass().getResourceAsStream("location.xml");
saxParser.parse(rStream, handler);
}catch (Exception e)
{
System.out.println(e.getMessage());
}
}
This is the MyHandler class. Your final data is stored in a Vector called "data".
class MyHandler extends DefaultHandler {
String rootname;Attributes atr;
private boolean flag=false;private Vector data;
public void startElement(String namespaceURI, String localName,
String qName, Attributes atts) {
rootname=localName;
atr=atts;
if(rootname.equalsIgnoreCase("address_component")){
data=new Vector();
flag=true;
}
}
public void characters(char[] ch, int start, int length){
String value=new String(ch,start,length);
if(flag)
{
if(rootname.equalsIgnoreCase("type")){
data.addElement(value) ;
System.out.println("++++++++++++++"+value);
}
if(rootname.equalsIgnoreCase("long_name")){
data.addElement(value) ;
System.out.println("++++++++++++++"+value);
}
if(rootname.equalsIgnoreCase("short_name")){
data.addElement(value) ;
System.out.println("++++++++++++++"+value);
}
}
}
public void endElement(String uri, String localName, String qName){
rootname=localName;
if(rootname.equalsIgnoreCase("address_component")){
flag=false;
}
}
}
You can find all the data in the data vector, and it is also printed to the console as:
++++++++++++++Gujarat
++++++++++++++Gujarat
++++++++++++++administrative_area_level_1
++++++++++++++political
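As a variation on the handler above, here is a sketch that keeps every `<type>` value in a list per `address_component`, so the first value is no longer overwritten by the second. It matches on `qName` (which the default, non-namespace-aware JAXP parser always supplies) and buffers text until the closing tag. The class names `AddressComponentHandler` and `AddressComponent` and the `parse` helper are my own, chosen for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AddressComponentHandler extends DefaultHandler {
    static class AddressComponent {
        String longName;
        String shortName;
        final List<String> types = new ArrayList<>();  // one entry per <type> element
    }

    final List<AddressComponent> components = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private AddressComponent current;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("address_component")) {
            current = new AddressComponent();
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return;  // outside any address_component
        }
        String value = text.toString().trim();
        switch (qName) {
            case "long_name":  current.longName = value; break;
            case "short_name": current.shortName = value; break;
            case "type":       current.types.add(value); break;  // append, do not overwrite
            case "address_component":
                components.add(current);
                current = null;
                break;
            default:
                break;
        }
    }

    // Convenience wrapper for the example; wraps checked exceptions.
    static List<AddressComponent> parse(String xml) {
        try {
            AddressComponentHandler handler = new AddressComponentHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.components;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<address_component><long_name>Gujarat</long_name>"
                   + "<short_name>Gujarat</short_name>"
                   + "<type>administrative_area_level_1</type><type>political</type>"
                   + "</address_component>";
        System.out.println(parse(xml).get(0).types);
    }
}
```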
Read this tutorial. This will help you to parse xml file using sax parser.
I have to convert a docx file (which is in Open XML format) into JSON. I need some guidelines on how to do it. Thanks in advance.
You may take a look at the Json-lib Java library, that provides XML-to-JSON conversion.
String xml = "<hello><test>1.2</test><test2>123</test2></hello>";
XMLSerializer xmlSerializer = new XMLSerializer();
JSON json = xmlSerializer.read( xml );
If you need the root tag too, simply add an outer dummy tag:
String xml = "<hello><test>1.2</test><test2>123</test2></hello>";
XMLSerializer xmlSerializer = new XMLSerializer();
JSON json = xmlSerializer.read("<x>" + xml + "</x>");
There is no direct mapping between XML and JSON; XML carries with it type information (each element has a name) as well as namespacing. Therefore, unless each JSON object has type information embedded, the conversion is going to be lossy.
But that doesn't necessarily matter. What does matter is that the consumer of the JSON knows the data contract. For example, given this XML:
<books>
<book author="Jimbo Jones" title="Bar Baz">
<summary>Foo</summary>
</book>
<book title="Don't Care" author="Fake Person">
<summary>Dummy Data</summary>
</book>
</books>
You could convert it to this:
{
"books": [
{ "author": "Jimbo Jones", "title": "Bar Baz", "summary": "Foo" },
{ "author": "Fake Person", "title": "Don't Care", "summary": "Dummy Data" }
]
}
And the consumer wouldn't need to know that each object in the books collection was a book object.
Edit:
If you have an XML Schema for the XML and are using .NET, you can generate classes from the schema using xsd.exe. Then, you could parse the source XML into objects of these classes, then use a DataContractJsonSerializer to serialize the classes as JSON.
If you don't have a schema, it will be hard getting around manually defining your JSON format yourself.
The XML class in the org.json namespace provides you with this functionality.
You have to call the static toJSONObject method
Converts a well-formed (but not necessarily valid) XML string into a JSONObject. Some information may be lost in this transformation because JSON is a data format and XML is a document format. XML uses elements, attributes, and content text, while JSON uses unordered collections of name/value pairs and arrays of values. JSON does not like to distinguish between elements and attributes. Sequences of similar elements are represented as JSONArrays. Content text may be placed in a "content" member. Comments, prologs, DTDs, and <[ [ ]]> are ignored.
If you are dissatisfied with the various implementations, try rolling your own. Here is some code I wrote this afternoon to get you started. It works with net.sf.json and apache common-lang:
static public JSONObject readToJSON(InputStream stream) throws Exception {
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true);
SAXParser parser = factory.newSAXParser();
SAXJsonParser handler = new SAXJsonParser();
parser.parse(stream, handler);
return handler.getJson();
}
And the SAXJsonParser implementation:
package xml2json;
import net.sf.json.*;
import org.apache.commons.lang.StringUtils;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import java.util.ArrayList;
import java.util.List;
public class SAXJsonParser extends DefaultHandler {
static final String TEXTKEY = "_text";
JSONObject result;
List<JSONObject> stack;
public SAXJsonParser(){}
public JSONObject getJson(){return result;}
public String attributeName(String name){return "@"+name;}
public void startDocument () throws SAXException {
stack = new ArrayList<JSONObject>();
stack.add(0,new JSONObject());
}
public void endDocument () throws SAXException {result = stack.remove(0);}
public void startElement (String uri, String localName,String qName, Attributes attributes) throws SAXException {
JSONObject work = new JSONObject();
for (int ix=0;ix<attributes.getLength();ix++)
work.put( attributeName( attributes.getLocalName(ix) ), attributes.getValue(ix) );
stack.add(0,work);
}
public void endElement (String uri, String localName, String qName) throws SAXException {
JSONObject pop = stack.remove(0); // examine stack
Object stashable = pop;
if (pop.containsKey(TEXTKEY)) {
String value = pop.getString(TEXTKEY).trim();
if (pop.keySet().size()==1) stashable = value; // single value
else if (StringUtils.isBlank(value)) pop.remove(TEXTKEY);
}
JSONObject parent = stack.get(0);
if (!parent.containsKey(localName)) { // add new object
parent.put( localName, stashable );
}
else { // aggregate into arrays
Object work = parent.get(localName);
if (work instanceof JSONArray) {
((JSONArray)work).add(stashable);
}
else {
parent.put(localName,new JSONArray());
parent.getJSONArray(localName).add(work);
parent.getJSONArray(localName).add(stashable);
}
}
}
public void characters (char ch[], int start, int length) throws SAXException {
JSONObject work = stack.get(0); // aggregate characters
String value = (work.containsKey(TEXTKEY) ? work.getString(TEXTKEY) : "" );
work.put(TEXTKEY, value+new String(ch,start,length) );
}
public void warning (SAXParseException e) throws SAXException {
System.out.println("warning e=" + e.getMessage());
}
public void error (SAXParseException e) throws SAXException {
System.err.println("error e=" + e.getMessage());
}
public void fatalError (SAXParseException e) throws SAXException {
System.err.println("fatalError e=" + e.getMessage());
throw e;
}
}
Converting complete docx files into JSON does not look like a good idea, because docx is a document-centric XML format and JSON is a data-centric format. XML in general is designed to be both document- and data-centric. Though it is technically possible to convert document-centric XML into JSON, handling the generated data might be overly complex. Try to focus on the data you actually need and convert only that part.
If you need to be able to manipulate your XML before it gets converted to JSON, or want fine-grained control of your representation, go with XStream. It's really easy to convert between: xml-to-object, json-to-object, object-to-xml, and object-to-json. Here's an example from XStream's docs:
XML
<person>
<firstname>Joe</firstname>
<lastname>Walnes</lastname>
<phone>
<code>123</code>
<number>1234-456</number>
</phone>
<fax>
<code>123</code>
<number>9999-999</number>
</fax>
</person>
POJO (DTO)
public class Person {
private String firstname;
private String lastname;
private PhoneNumber phone;
private PhoneNumber fax;
// ... constructors and methods
}
Convert from XML to POJO:
String xml = "<person>...</person>";
XStream xstream = new XStream();
Person person = (Person)xstream.fromXML(xml);
And then from POJO to JSON:
XStream xstream = new XStream(new JettisonMappedXmlDriver());
String json = xstream.toXML(person);
Note: although the method reads toXML() XStream will produce JSON, since the Jettison driver is used.
If you have a valid DTD file for the XML snippet, then you can easily convert XML to JSON and JSON to XML using the open source EclipseLink jar. A detailed sample Java project can be found here: http://www.cubicrace.com/2015/06/How-to-convert-XML-to-JSON-format.html
I have come across a tutorial, hope it helps you.
http://www.techrecite.com/xml-to-json-data-parser-converter
Use
xmlSerializer.setForceTopLevelObject(true)
to include root element in resulting JSON.
Your code would be like this
String xml = "<hello><test>1.2</test><test2>123</test2></hello>";
XMLSerializer xmlSerializer = new XMLSerializer();
xmlSerializer.setForceTopLevelObject(true);
JSON json = xmlSerializer.read(xml);
Docx4j
I've used docx4j before, and it's worth taking a look at.
unXml
You could also check out my open source unXml-library that is available on Maven Central.
It is lightweight, and has a simple syntax to pick out XPaths from your xml, and get them returned as Json attributes in a Jackson ObjectNode.