How to read XML declaration with Java SAX - java

I want to read the XML declaration from an XML file with Java SAX. For example
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
I tried using DefaultHandler, but characters and startElement don't get called for the XML declaration. This is my code:
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class SAXStuff {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
SAXParser sp = SAXParserFactory.newInstance().newSAXParser();
sp.parse("test.xml", new DefaultHandler() {
public void characters(char[] ch, int start, int length) throws SAXException {
for(int i = start; i < start + length; i++) {
System.out.print(ch[i]);
}
}
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
System.out.println(qName);
}
});
}
}
How can I get the XML declaration using SAX in Java?

Since Java 14, org.xml.sax.ContentHandler has a declaration method for this purpose. DefaultHandler implements ContentHandler, so this method can be overridden to provide a custom action.
This is the method signature:
void declaration(String version, String encoding, String standalone) throws SAXException
version - the version string as in the input document, null if not specified
encoding - the encoding string as in the input document, null if not specified
standalone - the standalone string as in the input document, null if not specified
Example:
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler(){
@Override
public void declaration(String version, String encoding, String standalone) {
String declaration = "<?xml "
+ (version != null ? "version=\"" + version + "\"": "")
+ (encoding != null ? " encoding=\"" + encoding + "\"": "")
+ (standalone != null ? " standalone=\"" + standalone + "\"": "")
+ "?>";
System.out.println(declaration);
}
};
parser.parse(new File("file.xml"), handler);

Related

Not able to Catch Element using SAX Parser

I'm reading XML file using SAX parser utility.
Here is my sample XML
<?xml version="1.0"?><company><Account AccountNumber="100"><staff><firstname>yong</firstname><firstname>jin</firstname></staff></Account></company>
Here is the code
import java.util.Arrays;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
public class ReadXML {
public static void main(String argv[]) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean bAccount = false;
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equalsIgnoreCase("ACCOUNT")) {
bAccount = true;
}
}
public void endElement(String uri, String localName, String qName) throws SAXException {
System.out.println("End Element :" + qName);
}
public void characters(char[] ch, int start, int length) throws SAXException {
System.out.println("Im here:" + bAccount);
if (bAccount) {
System.out.println("Account First Name : " + new String(ch, start, length));
bAccount = false;
StringBuilder Account = new StringBuilder();
for (int i = start; i < ch.length - 1; i--) {
if (String.valueOf(ch[i]).equals("<")) {
System.out.println("Account:" +Account);
break;
} else {
Account.append(ch[i]);
}
}
}
}
};
saxParser.parse("C:\\Lenny\\Work\\XML\\Out_SaxParsing_01.xml", handler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
As the XML shows, the Account tag looks like Account AccountNumber="100". I want to capture the tag itself, including that attribute.
To do that, in the characters method I tried reading the array from right to left, so that I could recover Account AccountNumber="100" when the Account event fires.
But I can't get there. The start event is generated, but the characters method is not called at that point. I expected characters to be called as soon as the Account tag is encountered, but it isn't.
What am I missing or doing wrong?
Any help would be appreciated.
AccountNumber="100" is an attribute of the Account element, so you can read it from the attributes parameter of your startElement handler. The characters method is only called for text content between tags, never for the tags themselves.
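A minimal sketch of reading the attribute in startElement, using the sample XML from the question. The class name AccountAttributes and the helper accountNumbers are made up for illustration; Attributes.getValue(String) looks an attribute up by its qualified name and returns null when it is absent.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, for illustration only.
class AccountAttributes {

    // Collects the AccountNumber attribute of every <Account> element.
    static List<String> accountNumbers(String xml) {
        final List<String> numbers = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if ("Account".equals(qName)) {
                    // Attributes arrive with the start tag, not through characters().
                    numbers.add(attributes.getValue("AccountNumber"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
        return numbers;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><company><Account AccountNumber=\"100\"><staff>"
                + "<firstname>yong</firstname><firstname>jin</firstname></staff></Account></company>";
        System.out.println("AccountNumber values: " + accountNumbers(xml));
    }
}
```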

Java Replace words within xml

I have the following xml
<some tag>
<some_nested_tag attr="Hello"> Text </some_nested_tag>
Hello world Hello Programming
</some tag>
From the above XML, I want to replace the occurrences of the word "Hello" which are part of the tag content but not part of a tag attribute.
I want the following output (Replacing Hello by HI):
<some tag>
<some_nested_tag attr="Hello"> Text </some_nested_tag>
HI world HI Programming
</some tag>
I tried java regex and also some of the DOM parser tutorials, but without any luck. I am posting here for help as I have limited time available to fix this in my project. Help would be appreciated.
That can be done by using a negative lookbehind.
Try this regex:
(?<!attr=")Hello
It will match Hello that is not preceded by attr=".
So you could try this:
str = str.replaceAll("(?<!attr=\")Hello", "HI");
It can also be done by negative lookahead:
Hello(?!([^<]+)?>)
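As a rough sketch, the lookahead variant can be tried like this. The class and method names are hypothetical, and the pattern is written as Hello(?![^<]*>), which should behave the same as the optional group above. It only holds up on simple input: a literal > in text content, a comment, or a CDATA section would confuse it, so treat it as a quick fix rather than a real XML-aware solution.

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only.
class ReplaceOutsideTags {

    // "Hello" that is not followed by a '>' before the next '<', i.e. not inside a tag.
    private static final Pattern HELLO_IN_TEXT = Pattern.compile("Hello(?![^<]*>)");

    static String replaceInText(String xml) {
        return HELLO_IN_TEXT.matcher(xml).replaceAll("HI");
    }

    public static void main(String[] args) {
        String xml = "<root><nested attr=\"Hello\"> Text </nested>Hello world Hello Programming</root>";
        // The attribute value is left alone; only the text content changes.
        System.out.println(replaceInText(xml));
    }
}
```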
string.replaceAll("(?i)\\shello\\s", " HI ");
Regex Explanation:
\sHello\s
Options: Case insensitive
Match a single character that is a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s»
Match the character string “Hello” literally (case insensitive) «Hello»
Match a single character that is a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s»
hi
Insert the character string “ HI ” literally « HI »
XSLT is a language for transforming XML documents into other XML documents. You can match all the text nodes containing 'Hello' and replace the content of those particular nodes.
A small example of using XSLT in Java:
import javax.xml.transform.*;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
public class TestMain {
public static void main(String[] args) throws IOException, URISyntaxException, TransformerException {
TransformerFactory factory = TransformerFactory.newInstance();
Source xslt = new StreamSource(new File("transform.xslt"));
Transformer transformer = factory.newTransformer(xslt);
Source text = new StreamSource(new File("input.xml"));
transformer.transform(text, new StreamResult(new File("output.xml")));
}
}
There was a good question on replacing string using XSLT - you can find an example of XSLT template there:
XSLT string replace
Here is a fully functional example using a SAX parser. It is adapted to your case with minimal changes from this example.
The actual replacement takes place in MyCopyHandler#endElement() and MyCopyHandler#startElement(), and the XML element text content is collected in MyCopyHandler#characters(). Note the buffer maintenance too: it is important for handling mixed element content (text and child elements).
I know an XSLT solution is also possible, but it is not that portable.
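import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
public class XMLReplace {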
/**
* @param args
* @throws SAXException
* @throws ParserConfigurationException
*/
public static void main(String[] args) throws Exception {
final String str = "<root> Hello <nested attr='Hello'> Text </nested> Hello world Hello Programming </root>";
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setErrorHandler(new MyErrorHandler());
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PrintWriter out = new PrintWriter(baos);
MyCopyHandler duper = new MyCopyHandler(out);
reader.setContentHandler(duper);
InputSource is = new InputSource(new StringReader(str));
reader.parse(is);
out.close();
System.out.println(baos);
}
}
class MyCopyHandler implements ContentHandler {
private boolean namespaceBegin = false;
private String currentNamespace;
private String currentNamespaceUri;
private Locator locator;
private final PrintWriter out;
private final StringBuilder buffer = new StringBuilder();
public MyCopyHandler(PrintWriter out) {
this.out = out;
}
public void setDocumentLocator(Locator locator) {
this.locator = locator;
}
public void startDocument() {
}
public void endDocument() {
}
public void startPrefixMapping(String prefix, String uri) {
namespaceBegin = true;
currentNamespace = prefix;
currentNamespaceUri = uri;
}
public void endPrefixMapping(String prefix) {
}
public void startElement(String namespaceURI, String localName, String qName, Attributes atts) {
// Flush buffer - needed in case of mixed content (text + elements)
out.print(buffer.toString().replaceAll("Hello", "HI"));
// Prepare to collect element text content
this.buffer.setLength(0);
out.print("<" + qName);
if (namespaceBegin) {
out.print(" xmlns:" + currentNamespace + "=\"" + currentNamespaceUri + "\"");
namespaceBegin = false;
}
for (int i = 0; i < atts.getLength(); i++) {
out.print(" " + atts.getQName(i) + "=\"" + atts.getValue(i) + "\"");
}
out.print(">");
}
public void endElement(String namespaceURI, String localName, String qName) {
// Process text content
out.print(buffer.toString().replaceAll("Hello", "HI"));
out.print("</" + qName + ">");
// Reset buffer
buffer.setLength(0);
}
public void characters(char[] ch, int start, int length) {
// Store chunk of text - parser is allowed to provide text content in chunks for performance reasons
buffer.append(Arrays.copyOfRange(ch, start, start + length));
}
public void ignorableWhitespace(char[] ch, int start, int length) {
for (int i = start; i < start + length; i++)
out.print(ch[i]);
}
public void processingInstruction(String target, String data) {
out.print("<?" + target + " " + data + "?>");
}
public void skippedEntity(String name) {
out.print("&" + name + ";");
}
}
class MyErrorHandler implements ErrorHandler {
public void warning(SAXParseException e) throws SAXException {
show("Warning", e);
throw (e);
}
public void error(SAXParseException e) throws SAXException {
show("Error", e);
throw (e);
}
public void fatalError(SAXParseException e) throws SAXException {
show("Fatal Error", e);
throw (e);
}
private void show(String type, SAXParseException e) {
System.out.println(type + ": " + e.getMessage());
System.out.println("Line " + e.getLineNumber() + " Column " + e.getColumnNumber());
System.out.println("System ID: " + e.getSystemId());
}
}

Reading XML for getting entities

I'm using SAX (Simple API for XML) to parse an XML document. My goal is to extract the entities from the XML and create an ER diagram from them (which I will draw manually once I have all the entities the file contains).
I'm at a very early stage of coding everything described above, but I'm stuck on this particular problem right now.
here is my code:
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class Parser extends DefaultHandler {
public void getXml() {
try {
SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
SAXParser saxParser = saxParserFactory.newSAXParser();
final MySet openingTagList = new MySet();
final MySet closingTagList = new MySet();
DefaultHandler defaultHandler = new DefaultHandler() {
public void startDocument() throws SAXException {
System.out.println("Starting Parsing...\n");
}
public void endDocument() throws SAXException {
System.out.print("\n\nDone Parsing!");
}
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if (!openingTagList.contains(qName)) {
openingTagList.add(qName);
System.out.print("<" + qName + ">");
}
}
public void characters(char ch[], int start, int length)
throws SAXException {
for (int i = start; i < (start + length); i++) {
System.out.print(ch[i]);
}
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (!closingTagList.contains(qName)) {
closingTagList.add(qName);
System.out.print("</" + qName + ">");
}
}
};
saxParser.parse("student.xml", defaultHandler);
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String args[]) {
Parser readXml = new Parser();
readXml.getXml();
}
}
What I'm trying to achieve is this: when the startElement method detects that a tag has already been traversed, it should skip that tag as well as all the other entities inside it, but I'm confused about how to implement that part.
Note: The purpose is to read the tags; I don't care about the records in between them. MySet is just an abstraction with methods like contains (whether the set holds the passed data), nothing more.
Any help would be appreciated. Thanks
Due to the nature of XML, it's not possible to know which tags will appear later in the file, so there is no 'skip the next x bytes' trick.
If you can, ask for reasonably sized files; maybe the data can be split.
In my opinion, reading an XML file of more than 1 GB is no fun, regardless of the library used.
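The parser still has to read every byte, but the handler itself can ignore the events inside a subtree it has already seen. A minimal sketch of that idea, using a depth counter: the names DistinctTags and distinctTags are hypothetical, and a LinkedHashSet stands in for the question's MySet. Note the consequence: a tag that only ever appears inside a skipped subtree is never recorded.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, for illustration only.
class DistinctTags {

    // Returns element names in order of first appearance, ignoring
    // everything nested inside an element whose name was already seen.
    static Set<String> distinctTags(String xml) {
        final Set<String> seen = new LinkedHashSet<>();
        DefaultHandler handler = new DefaultHandler() {
            // 0 = not skipping; otherwise the nesting depth inside the skipped subtree.
            private int skipDepth = 0;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (skipDepth > 0) {
                    skipDepth++;            // still inside a skipped subtree
                } else if (!seen.add(qName)) {
                    skipDepth = 1;          // already-known tag: start skipping
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (skipDepth > 0) {
                    skipDepth--;            // reaches 0 when the skipped element closes
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String xml = "<students><student><name>A</name></student>"
                + "<student><name>B</name><age>1</age></student></students>";
        // <age> sits inside the second, skipped <student>, so it is not reported.
        System.out.println(distinctTags(xml));
    }
}
```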

Parsing and updating xml using SAX parser in java

I have an xml file with similar tags ->
<properties>
<definition>
<name>IP</name>
<description></description>
<defaultValue>10.1.1.1</defaultValue>
</definition>
<definition>
<name>Name</name>
<description></description>
<defaultValue>MyName</defaultValue>
</definition>
<definition>
<name>Environment</name>
<description></description>
<defaultValue>Production</defaultValue>
</definition>
</properties>
I want to update the default value of the definition with name : Environment.
Is it possible to do that using SAX parser?
Can you please point me to proper documentation?
So far I have parsed the document, but when I update defaultValue, it updates every defaultValue element. I don't know how to target the exact defaultValue tag.
Anything is possible with SAX, it's just waaaaay harder than it has to be. It's pretty old school, and there are many easier ways to do this (JAXB, XQuery, XPath, DOM, etc.).
That said, let's do it with SAX.
It sounds like the problem you are having is that you are not tracking your progress through the document. SAX simply makes a callback whenever it comes across an event in the document; it keeps no state for you.
This is a fairly crude way of parsing the doc and updating the relevant node using SAX. Basically, I check when we hit a name element with the value you want to update (Environment) and set a flag, so that when we get to the contents of the defaultValue node, the characters callback lets me remove the existing value and replace it with the new value.
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
public class Q26897496 extends DefaultHandler {
public static String xmlDoc = "<?xml version='1.0'?>"
+ "<properties>"
+ " <definition>"
+ " <name>IP</name>"
+ " <description></description>"
+ " <defaultValue>10.1.1.1</defaultValue>"
+ " </definition>"
+ " <definition>"
+ " <name>Name</name>"
+ " <description></description>"
+ " <defaultValue>MyName</defaultValue>"
+ " </definition>"
+ " <definition>"
+ " <name>Environment</name>"
+ " <description></description>"
+ " <defaultValue>Production</defaultValue>"
+ " </definition>"
+ "</properties>";
String elementName;
boolean mark = false;
char[] updatedDoc;
public static void main(String[] args) {
Q26897496 q = new Q26897496();
try {
q.parse();
} catch (Exception e) {
e.printStackTrace();
}
}
public Q26897496() {
}
public void parse() throws Exception {
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
SAXParser saxParser = spf.newSAXParser();
XMLReader xml = saxParser.getXMLReader();
xml.setContentHandler(this);
xml.parse(new InputSource(new StringReader(xmlDoc)));
System.out.println("new xml: \n" + new String(updatedDoc));
}
@Override
public void startDocument() throws SAXException {
System.out.println("starting");
}
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
this.elementName = localName;
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
String value = new String(ch, start, length);
if (elementName.equals("name")) {
if (value.equals("Environment")) {
this.mark = true;
}
}
if (elementName.equals("defaultValue") && mark == true) {
// Crude update: this assumes ch holds the whole document, which SAX does not guarantee
String tmpDoc = new String(ch);
String leading = tmpDoc.substring(0, start);
String trailing = tmpDoc.substring(start + length);
this.updatedDoc = (leading + "NewValueForDefaultValue" + trailing).toCharArray();
mark = false;
}
}
}
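A less fragile variant of the same flag-tracking idea, offered only as a sketch and not as the approach above: wrap the parser in an XMLFilterImpl that swaps the text events of the matching defaultValue, and let an identity Transformer serialize the filtered event stream. The class name DefaultValueFilter, the update helper, and the replacement value "Staging" are all assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: rewrites <defaultValue> in the <definition> whose <name> matches.
class DefaultValueFilter extends XMLFilterImpl {
    private final String targetName;
    private final String newValue;
    private final StringBuilder nameText = new StringBuilder(); // text of the current <name>
    private boolean inName;     // currently inside <name>
    private boolean replacing;  // currently inside the <defaultValue> to rewrite

    DefaultValueFilter(XMLReader parent, String targetName, String newValue) {
        super(parent);
        this.targetName = targetName;
        this.newValue = newValue;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if ("name".equals(qName)) {
            inName = true;
            nameText.setLength(0);
        } else if ("defaultValue".equals(qName) && targetName.contentEquals(nameText)) {
            replacing = true;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (inName) {
            nameText.append(ch, start, length); // text may arrive in several chunks
        }
        if (!replacing) {
            super.characters(ch, start, length); // pass through; the old value is dropped
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if ("name".equals(qName)) {
            inName = false;
        } else if ("defaultValue".equals(qName) && replacing) {
            super.characters(newValue.toCharArray(), 0, newValue.length()); // emit the new value
            replacing = false;
        } else if ("definition".equals(qName)) {
            nameText.setLength(0); // forget the name when its definition ends
        }
        super.endElement(uri, localName, qName);
    }

    static String update(String xml, String targetName, String newValue) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(
                    new SAXSource(new DefaultValueFilter(reader, targetName, newValue),
                            new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | TransformerException e) {
            throw new IllegalStateException("Could not rewrite the XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<properties><definition><name>IP</name><defaultValue>10.1.1.1</defaultValue></definition>"
                + "<definition><name>Environment</name><defaultValue>Production</defaultValue></definition></properties>";
        System.out.println(update(xml, "Environment", "Staging"));
    }
}
```

Because the filter never looks outside the start/length window of characters, it does not depend on how the parser buffers its input.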

How do I parse my simple XML file with Java and SAX?

I am trying to parse the file below. I want to print the id and name of each passenger. Can you give me code to parse it ?
<?xml version="1.0" encoding="utf-8"?>
<root xmlns:android="www.google.com">
<passenger id = "001">
<name>Tom Cruise</name>
</passenger>
<passenger id = "002">
<name>Tom Hanks</name>
</passenger>
</root>
UPDATE
This is what I tried. The code, problems, etc. are described here:
Error in output of a simple SAX parser
Here is a working example to start with, though I suggest you use StAX instead; you will see that SAX is not very convenient.
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class SAX2 {
public static void main(String[] args) throws Exception {
SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
parser.parse(new File("test.xml"), new DefaultHandler() {
@Override
public void startElement(String uri, String localName,
String qName, Attributes atts) throws SAXException {
if (qName.equals("passenger")) {
System.out.println("id = " + atts.getValue("id"));
}
}
@Override
public void endElement(String uri, String localName, String qName)
throws SAXException {
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
String text = new String(ch, start, length);
if (!text.trim().isEmpty()) {
System.out.println("name " + text);
}
}
});
}
}
output
id = 001
name Tom Cruise
id = 002
name Tom Hanks
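For comparison, here is roughly what the StAX suggestion could look like, as a sketch rather than a drop-in replacement. The class name StaxPassengers and the read helper are hypothetical, and the sample XML omits the xmlns:android declaration for brevity. With StAX the code pulls events from the reader instead of receiving callbacks, so the id and the name can be paired in one place.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class name, for illustration only.
class StaxPassengers {

    // Returns one "id = ..., name = ..." line per passenger.
    static List<String> read(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            String id = null;
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // only start tags matter here
                }
                if ("passenger".equals(reader.getLocalName())) {
                    id = reader.getAttributeValue(null, "id");
                } else if ("name".equals(reader.getLocalName())) {
                    // getElementText() reads the text and moves to the end tag.
                    lines.add("id = " + id + ", name = " + reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><root>"
                + "<passenger id = \"001\"><name>Tom Cruise</name></passenger>"
                + "<passenger id = \"002\"><name>Tom Hanks</name></passenger></root>";
        for (String line : read(xml)) {
            System.out.println(line);
        }
    }
}
```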
Create a DocumentBuilderFactory.
Obtain a DocumentBuilder from the factory.
Use one of the parse() methods of the builder to create a Document.
Once you have a Document, you can get the passenger Elements with Document's getElementsByTagName() method.
I'm sure you'll be able to work out the rest.
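The steps above can be sketched as follows. This is one possible way to fill in "the rest", with hypothetical names (DomPassengers, read) and a sample XML that omits the xmlns:android declaration for brevity; it assumes every passenger has exactly one name child.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class name, for illustration only.
class DomPassengers {

    // Returns one "id = ..., name = ..." line per <passenger> element.
    static List<String> read(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            NodeList passengers = document.getElementsByTagName("passenger");
            for (int i = 0; i < passengers.getLength(); i++) {
                Element passenger = (Element) passengers.item(i);
                String id = passenger.getAttribute("id");
                // Assumes a single <name> child per passenger.
                String name = passenger.getElementsByTagName("name").item(0).getTextContent();
                lines.add("id = " + id + ", name = " + name);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<root><passenger id = \"001\"><name>Tom Cruise</name></passenger>"
                + "<passenger id = \"002\"><name>Tom Hanks</name></passenger></root>";
        for (String line : read(xml)) {
            System.out.println(line);
        }
    }
}
```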
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
InputStream xmlInput = new FileInputStream("theFile.xml");
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new SaxHandler();
saxParser.parse(xmlInput, handler);
} catch (Throwable err) {
err.printStackTrace ();
}
String str = "<?xml version=\"1.0\" encoding=\"utf-8\"?> " +
"<root xmlns:android=\"www.google.com\">" +
"<passenger id = \"001\">" +
"<name>Tom Cruise</name>" +
"</passenger>" +
"<passenger id = \"002\">" +
"<name>Tom Hanks</name>" +
"</passenger>" +
"</root>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(str));
final Document document = db.parse(is);
System.out.println("node Name " + document.getChildNodes().item(0).getChildNodes().item(1).getNodeName());
