I'm writing a client which needs to read multiple consecutive small XML documents over a socket. I can assume that the encoding is always UTF-8 and that there is optionally delimiting whitespace between documents. The documents should ultimately go into DOM objects. What is the best way to accomplish this?
The essense of the problem is that the parsers expect a single document in the stream and consider the rest of the content junk. I thought that I could artificially end the document by tracking the element depth, and creating a new reader using the existing input stream. E.g. something like:
// Broken
public void parseInputStream(InputStream inputStream) throws Exception
{
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLOutputFactory xof = XMLOutputFactory.newInstance();
XMLEventFactory eventFactory = XMLEventFactory.newInstance();
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document doc = documentBuilder.newDocument();
XMLEventWriter domWriter = xof.createXMLEventWriter(new DOMResult(doc));
XMLStreamReader xmlStreamReader = factory.createXMLStreamReader(inputStream);
XMLEventReader reader = factory.createXMLEventReader(xmlStreamReader);
int depth = 0;
while (reader.hasNext()) {
XMLEvent evt = reader.nextEvent();
domWriter.add(evt);
switch (evt.getEventType()) {
case XMLEvent.START_ELEMENT:
depth++;
break;
case XMLEvent.END_ELEMENT:
depth--;
if (depth == 0)
{
domWriter.add(eventFactory.createEndDocument());
System.out.println(doc);
reader.close();
xmlStreamReader.close();
xmlStreamReader = factory.createXMLStreamReader(inputStream);
reader = factory.createXMLEventReader(xmlStreamReader);
doc = documentBuilder.newDocument();
domWriter = xof.createXMLEventWriter(new DOMResult(doc));
domWriter.add(eventFactory.createStartDocument());
}
break;
}
}
}
However running this on input such as <a></a><b></b><c></c> prints the first document and throws an XMLStreamException. Whats the right way to do this?
Clarification: Unfortunately the protocol is fixed by the server and cannot be changed, so prepending a length or wrapping the contents would not work.
Length-prefix each document (in bytes).
Read the length of the first document from the socket
Read that much data from the socket, dumping it into a ByteArrayOutputStream
Create a ByteArrayInputStream from the results
Parse that ByteArrayInputStream to get the first document
Repeat for the second document etc
IIRC, XML documents can have comments and processing-instructions at the end, so there's no real way of telling exactly when you have come to the end of the file.
A couple of ways of handling the situation have already been mentioned. Another alternative is to put in an illegal character or byte into the stream, such as NUL or zero. This has the advantage that you don't need to alter the documents and you never need to buffer an entire file.
just change to whatever stream
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
public class LogParser {
private XMLInputFactory inputFactory = null;
private XMLStreamReader xmlReader = null;
InputStream is;
private int depth;
private QName rootElement;
private static class XMLStream extends InputStream
{
InputStream delegate;
StringReader startroot = new StringReader("<root>");
StringReader endroot = new StringReader("</root>");
XMLStream(InputStream delegate)
{
this.delegate = delegate;
}
public int read() throws IOException {
int c = startroot.read();
if(c==-1)
{
c = delegate.read();
}
if(c==-1)
{
c = endroot.read();
}
return c;
}
}
public LogParser() {
inputFactory = XMLInputFactory.newInstance();
}
public void read() throws Exception {
is = new XMLStream(new FileInputStream(new File(
"./myfile.log")));
xmlReader = inputFactory.createXMLStreamReader(is);
while (xmlReader.hasNext()) {
printEvent(xmlReader);
xmlReader.next();
}
xmlReader.close();
}
public void printEvent(XMLStreamReader xmlr) throws Exception {
switch (xmlr.getEventType()) {
case XMLStreamConstants.END_DOCUMENT:
System.out.println("finished");
break;
case XMLStreamConstants.START_ELEMENT:
System.out.print("<");
printName(xmlr);
printNamespaces(xmlr);
printAttributes(xmlr);
System.out.print(">");
if(rootElement==null && depth==1)
{
rootElement = xmlr.getName();
}
depth++;
break;
case XMLStreamConstants.END_ELEMENT:
System.out.print("</");
printName(xmlr);
System.out.print(">");
depth--;
if(depth==1 && rootElement.equals(xmlr.getName()))
{
rootElement=null;
System.out.println("finished element");
}
break;
case XMLStreamConstants.SPACE:
case XMLStreamConstants.CHARACTERS:
int start = xmlr.getTextStart();
int length = xmlr.getTextLength();
System.out
.print(new String(xmlr.getTextCharacters(), start, length));
break;
case XMLStreamConstants.PROCESSING_INSTRUCTION:
System.out.print("<?");
if (xmlr.hasText())
System.out.print(xmlr.getText());
System.out.print("?>");
break;
case XMLStreamConstants.CDATA:
System.out.print("<![CDATA[");
start = xmlr.getTextStart();
length = xmlr.getTextLength();
System.out
.print(new String(xmlr.getTextCharacters(), start, length));
System.out.print("]]>");
break;
case XMLStreamConstants.COMMENT:
System.out.print("<!--");
if (xmlr.hasText())
System.out.print(xmlr.getText());
System.out.print("-->");
break;
case XMLStreamConstants.ENTITY_REFERENCE:
System.out.print(xmlr.getLocalName() + "=");
if (xmlr.hasText())
System.out.print("[" + xmlr.getText() + "]");
break;
case XMLStreamConstants.START_DOCUMENT:
System.out.print("<?xml");
System.out.print(" version='" + xmlr.getVersion() + "'");
System.out.print(" encoding='" + xmlr.getCharacterEncodingScheme()
+ "'");
if (xmlr.isStandalone())
System.out.print(" standalone='yes'");
else
System.out.print(" standalone='no'");
System.out.print("?>");
break;
}
}
/**
* #param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
try {
new LogParser().read();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
private static void printName(XMLStreamReader xmlr) {
if (xmlr.hasName()) {
System.out.print(getName(xmlr));
}
}
private static String getName(XMLStreamReader xmlr) {
if (xmlr.hasName()) {
String prefix = xmlr.getPrefix();
String uri = xmlr.getNamespaceURI();
String localName = xmlr.getLocalName();
return getName(prefix, uri, localName);
}
return null;
}
private static String getName(String prefix, String uri, String localName) {
String name = "";
if (uri != null && !("".equals(uri)))
name += "['" + uri + "']:";
if (prefix != null)
name += prefix + ":";
if (localName != null)
name += localName;
return name;
}
private static void printAttributes(XMLStreamReader xmlr) {
for (int i = 0; i < xmlr.getAttributeCount(); i++) {
printAttribute(xmlr, i);
}
}
private static void printAttribute(XMLStreamReader xmlr, int index) {
String prefix = xmlr.getAttributePrefix(index);
String namespace = xmlr.getAttributeNamespace(index);
String localName = xmlr.getAttributeLocalName(index);
String value = xmlr.getAttributeValue(index);
System.out.print(" ");
System.out.print(getName(prefix, namespace, localName));
System.out.print("='" + value + "'");
}
private static void printNamespaces(XMLStreamReader xmlr) {
for (int i = 0; i < xmlr.getNamespaceCount(); i++) {
printNamespace(xmlr, i);
}
}
private static void printNamespace(XMLStreamReader xmlr, int index) {
String prefix = xmlr.getNamespacePrefix(index);
String uri = xmlr.getNamespaceURI(index);
System.out.print(" ");
if (prefix == null)
System.out.print("xmlns='" + uri + "'");
else
System.out.print("xmlns:" + prefix + "='" + uri + "'");
}
}
A simple solution is to wrap the documents on the sending side in a new root element:
<?xml version="1.0"?>
<documents>
... document 1 ...
... document 2 ...
</documents>
You must make sure that you don't include the XML header (<?xml ...?>), though. If all documents use the same encoding, this can be accomplished with a simple filter which just ignores the first line of each document if it starts with <?xml
Found this forum message (which you probably already saw), which has a solution by wrapping the input stream and testing for one of two ascii characters (see post).
You could try an adaptation on this by first converting to use a reader (for proper character encoding) and then doing element counting until you reach the closing element, at which point you trigger the EOM.
Hi
I also had this problem at work (so won't post resulting the code). The most elegant solution that I could think of, and which works pretty nicely imo, is as follows
Create a class for example DocumentSplittingInputStream which extends InputStream and takes the underlying inputstream in its constructor (or gets set after construction...).
Add a field with a byte array closeTag containing the bytes of the closing root node you are looking for.
Add a field int called matchCount or something, initialised to zero.
Add a field boolean called underlyingInputStreamNotFinished, initialised to true
On the read() implementation:
Check if matchCount == closeTag.length, if it does, set matchCount to -1, return -1
If matchCount == -1, set matchCount = 0, call read() on the underlying inputstream until you get -1 or '<' (the xml declaration of the next document on the stream) and return it. Note that for all I know the xml spec allows comments after the document element, but I knew I was not going to get that from the source so did not bother handling it - if you can not be sure you'll need to change the "gobble" slightly.
Otherwise read an int from the underlying inputstream (if it equals closeTag[matchCount] then increment matchCount, if it doesn't then reset matchCount to zero) and return the newly read byte
Add a method which returns the boolean on whether the underlying stream has closed.
All reads on the underlying input stream should go through a separate method where it checks if the value read is -1 and if so, sets the field "underlyingInputStreamNotFinished" to false.
I may have missed some minor points but i'm sure you get the picture.
Then in the using code you do something like, if you are using xstream:
DocumentSplittingInputStream dsis = new DocumentSplittingInputStream(underlyingInputStream);
while (dsis.underlyingInputStreamNotFinished()) {
MyObject mo = xstream.fromXML(dsis);
mo.doSomething(); // or something.doSomething(mo);
}
David
I had to do something like this and during my research on how to approach it, I found this thread that even though it is quite old, I just replied (to myself) here wrapping everything in its own Reader for simpler use
I was faced with a similar problem. A web service I'm consuming will (in some cases) return multiple xml documents in response to a single HTTP GET request. I could read the entire response into a String and split it, but instead I implemented a splitting input stream based on user467257's post above. Here is the code:
public class AnotherSplittingInputStream extends InputStream {
private final InputStream realStream;
private final byte[] closeTag;
private int matchCount;
private boolean realStreamFinished;
private boolean reachedCloseTag;
public AnotherSplittingInputStream(InputStream realStream, String closeTag) {
this.realStream = realStream;
this.closeTag = closeTag.getBytes();
}
#Override
public int read() throws IOException {
if (reachedCloseTag) {
return -1;
}
if (matchCount == closeTag.length) {
matchCount = 0;
reachedCloseTag = true;
return -1;
}
int ch = realStream.read();
if (ch == -1) {
realStreamFinished = true;
}
else if (ch == closeTag[matchCount]) {
matchCount++;
} else {
matchCount = 0;
}
return ch;
}
public boolean hasMoreData() {
if (realStreamFinished == true) {
return false;
} else {
reachedCloseTag = false;
return true;
}
}
}
And to use it:
String xml =
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
"<root>first root</root>" +
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
"<root>second root</root>";
ByteArrayInputStream is = new ByteArrayInputStream(xml.getBytes());
SplittingInputStream splitter = new SplittingInputStream(is, "</root>");
BufferedReader reader = new BufferedReader(new InputStreamReader(splitter));
while (splitter.hasMoreData()) {
System.out.println("Starting next stream");
String line = null;
while ((line = reader.readLine()) != null) {
System.out.println("line ["+line+"]");
}
}
I use JAXB approach to unmarshall messages from multiply stream:
MultiInputStream.java
public class MultiInputStream extends InputStream {
private final Reader source;
private final StringReader startRoot = new StringReader("<root>");
private final StringReader endRoot = new StringReader("</root>");
public MultiInputStream(Reader source) {
this.source = source;
}
#Override
public int read() throws IOException {
int count = startRoot.read();
if (count == -1) {
count = source.read();
}
if (count == -1) {
count = endRoot.read();
}
return count;
}
}
MultiEventReader.java
public class MultiEventReader implements XMLEventReader {
private final XMLEventReader reader;
private boolean isXMLEvent = false;
private int level = 0;
public MultiEventReader(XMLEventReader reader) throws XMLStreamException {
this.reader = reader;
startXML();
}
private void startXML() throws XMLStreamException {
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isStartElement()) {
return;
}
}
}
public boolean hasNextXML() {
return reader.hasNext();
}
public void nextXML() throws XMLStreamException {
while (reader.hasNext()) {
XMLEvent event = reader.peek();
if (event.isStartElement()) {
isXMLEvent = true;
return;
}
reader.nextEvent();
}
}
#Override
public XMLEvent nextEvent() throws XMLStreamException {
XMLEvent event = reader.nextEvent();
if (event.isStartElement()) {
level++;
}
if (event.isEndElement()) {
level--;
if (level == 0) {
isXMLEvent = false;
}
}
return event;
}
#Override
public boolean hasNext() {
return isXMLEvent;
}
#Override
public XMLEvent peek() throws XMLStreamException {
XMLEvent event = reader.peek();
if (level == 0) {
while (event != null && !event.isStartElement() && reader.hasNext()) {
reader.nextEvent();
event = reader.peek();
}
}
return event;
}
#Override
public String getElementText() throws XMLStreamException {
throw new NotImplementedException();
}
#Override
public XMLEvent nextTag() throws XMLStreamException {
throw new NotImplementedException();
}
#Override
public Object getProperty(String name) throws IllegalArgumentException {
throw new NotImplementedException();
}
#Override
public void close() throws XMLStreamException {
throw new NotImplementedException();
}
#Override
public Object next() {
throw new NotImplementedException();
}
#Override
public void remove() {
throw new NotImplementedException();
}
}
Message.java
#XmlAccessorType(XmlAccessType.FIELD)
#XmlRootElement(name = "Message")
public class Message {
public Message() {
}
#XmlAttribute(name = "ID", required = true)
protected long id;
public long getId() {
return id;
}
public void setId(long id) {
this.id = id;
}
#Override
public String toString() {
return "Message{id=" + id + '}';
}
}
Read multiply messages:
public static void main(String[] args) throws Exception{
StringReader stringReader = new StringReader(
"<Message ID=\"123\" />\n" +
"<Message ID=\"321\" />"
);
JAXBContext context = JAXBContext.newInstance(Message.class);
Unmarshaller unmarshaller = context.createUnmarshaller();
XMLInputFactory inputFactory = XMLInputFactory.newFactory();
MultiInputStream multiInputStream = new MultiInputStream(stringReader);
XMLEventReader xmlEventReader = inputFactory.createXMLEventReader(multiInputStream);
MultiEventReader multiEventReader = new MultiEventReader(xmlEventReader);
while (multiEventReader.hasNextXML()) {
Object message = unmarshaller.unmarshal(multiEventReader);
System.out.println(message);
multiEventReader.nextXML();
}
}
results:
Message{id=123}
Message{id=321}
Related
I was given exercise that I need to refactor several java projects.
Only those 2 left which I truly don't have an idea how to refactor.
csv.writer
public class CsvWriter {
public CsvWriter() {
}
public void write(String[][] lines) {
for (int i = 0; i < lines.length; i++)
writeLine(lines[i]);
}
private void writeLine(String[] fields) {
if (fields.length == 0)
System.out.println();
else {
writeField(fields[0]);
for (int i = 1; i < fields.length; i++) {
System.out.print(",");
writeField(fields[i]);
}
System.out.println();
}
}
private void writeField(String field) {
if (field.indexOf(',') != -1 || field.indexOf('\"') != -1)
writeQuoted(field);
else
System.out.print(field);
}
private void writeQuoted(String field) {
System.out.print('\"');
for (int i = 0; i < field.length(); i++) {
char c = field.charAt(i);
if (c == '\"')
System.out.print("\"\"");
else
System.out.print(c);
}
System.out.print('\"');
}
}
csv.writertest
public class CsvWriterTest {
#Test
public void testWriter() {
CsvWriter writer = new CsvWriter();
String[][] lines = new String[][] {
new String[] {},
new String[] { "only one field" },
new String[] { "two", "fields" },
new String[] { "", "contents", "several words included" },
new String[] { ",", "embedded , commas, included",
"trailing comma," },
new String[] { "\"", "embedded \" quotes",
"multiple \"\"\" quotes\"\"" },
new String[] { "mixed commas, and \"quotes\"", "simple field" } };
// Expected:
// -- (empty line)
// only one field
// two,fields
// ,contents,several words included
// ",","embedded , commas, included","trailing comma,"
// """","embedded "" quotes","multiple """""" quotes"""""
// "mixed commas, and ""quotes""",simple field
writer.write(lines);
}
}
test
public class Configuration {
public int interval;
public int duration;
public int departure;
public void load(Properties props) throws ConfigurationException {
String valueString;
int value;
valueString = props.getProperty("interval");
if (valueString == null) {
throw new ConfigurationException("monitor interval");
}
value = Integer.parseInt(valueString);
if (value <= 0) {
throw new ConfigurationException("monitor interval > 0");
}
interval = value;
valueString = props.getProperty("duration");
if (valueString == null) {
throw new ConfigurationException("duration");
}
value = Integer.parseInt(valueString);
if (value <= 0) {
throw new ConfigurationException("duration > 0");
}
if ((value % interval) != 0) {
throw new ConfigurationException("duration % interval");
}
duration = value;
valueString = props.getProperty("departure");
if (valueString == null) {
throw new ConfigurationException("departure offset");
}
value = Integer.parseInt(valueString);
if (value <= 0) {
throw new ConfigurationException("departure > 0");
}
if ((value % interval) != 0) {
throw new ConfigurationException("departure % interval");
}
departure = value;
}
}
public class ConfigurationException extends Exception {
private static final long serialVersionUID = 1L;
public ConfigurationException() {
super();
}
public ConfigurationException(String arg0) {
super(arg0);
}
public ConfigurationException(String arg0, Throwable arg1) {
super(arg0, arg1);
}
public ConfigurationException(Throwable arg0) {
super(arg0);
}
}
configuration.test
public class ConfigurationTest{
#Test
public void testGoodInput() throws IOException {
String data = "interval = 10\nduration = 100\ndeparture = 200\n";
Properties input = loadInput(data);
Configuration props = new Configuration();
try {
props.load(input);
} catch (ConfigurationException e) {
assertTrue(false);
return;
}
assertEquals(props.interval, 10);
assertEquals(props.duration, 100);
assertEquals(props.departure, 200);
}
#Test
public void testNegativeValues() throws IOException {
processBadInput("interval = -10\nduration = 100\ndeparture = 200\n");
processBadInput("interval = 10\nduration = -100\ndeparture = 200\n");
processBadInput("interval = 10\nduration = 100\ndeparture = -200\n");
}
#Test
public void testInvalidDuration() throws IOException {
processBadInput("interval = 10\nduration = 99\ndeparture = 200\n");
}
#Test
public void testInvalidDeparture() throws IOException {
processBadInput("interval = 10\nduration = 100\ndeparture = 199\n");
}
#Test
private void processBadInput(String data) throws IOException {
Properties input = loadInput(data);
boolean failed = false;
Configuration props = new Configuration();
try {
props.load(input);
} catch (ConfigurationException e) {
failed = true;
}
assertTrue(failed);
}
#Test
private Properties loadInput(String data) throws IOException {
InputStream is = new StringBufferInputStream(data);
Properties input = new Properties();
input.load(is);
is.close();
return input;
}
}
Ok, here some advice regarding the code.
CsvWriter
The bad thing is that you print everything to System.out. It will be hard to test without mocks. Instead I suggest you to add field PrintStream which defines where all data will go.
import java.io.PrintStream;
public class CsvWriter {
private final PrintStream printStream;
public CsvWriter() {
this.printStream = System.out;
}
public CsvWriter(PrintStream printStream) {
this.printStream = printStream;
}
...
You then write everything to this stream. This refactoring easy since you use replace function(Ctrl+R in IDEA). Here is the example how you do it.
private void writeField(String field) {
if (field.indexOf(',') != -1 || field.indexOf('\"') != -1)
writeQuoted(field);
else
printStream.print(field);
}
Others stuff seems ok in this class.
CsvWriterTest
First thing first you don't check all logic in a single method. Make small methods with different kind of tests. It's ok to keep your current test though. Sometimes it's useful to check most of the logic in a complex scenario.
Also pay attention to the names of the methods. Check this
Obviously you test doesn't check the results. That's why we need this functionality with PrintStream. We can build a PrintStream on top of the instance of ByteArrayOutputStream. We then construct a string and check if the content is valid. Here is how you can easily check what was written
public class CsvWriterTest {
private ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
private PrintStream printStream = new PrintStream(byteArrayOutputStream);
#Test
public void testWriter() {
CsvWriter writer = new CsvWriter(printStream);
... old logic here ...
writer.write(lines);
String result = new String(byteArrayOutputStream.toByteArray());
Assert.assertTrue(result.contains("two,fields"));
Configuration
Make fields private
Make messages more concise
ConfigurationException
Seems good about serialVersionUID. This thing is needed for serialization/deserialization.
ConfigurationTest
Do not use assertTrue(false/failed); Use Assert.fail(String) with some message which is understandable.
Tip: if you don't have much experience and need to refactor code like this, you may want to read some chapters of Effective Java 2nd edition by Joshua Bloch. The book is not so big so you can read it in a week and it has some rules how to write clean and understandable code.
I want to get element path while parsing XML using java StAX2 parser.
How to get information about the current element path?
<root>
<a><b>x</b></a>
</root>
In this example the path is /root/a/b.
Keep a stack. Push the element name on START_ELEMENT and pop it on END_ELEMENT.
Here's a short example. It does nothing other than print the path of the element being processed.
public static void main(String[] args) throws IOException, XMLStreamException {
try (FileInputStream in = new FileInputStream("test.xml")) {
XMLInputFactory factory = XMLInputFactory.newFactory();
XMLStreamReader reader = factory.createXMLStreamReader(in);
LinkedList<String> path = new LinkedList<>();
int next;
while ((next = reader.next()) != XMLStreamConstants.END_DOCUMENT) {
switch (next) {
case XMLStreamConstants.START_ELEMENT:
// push the name of the current element onto the stack
path.addLast(reader.getLocalName());
// print the path with '/' delimiters
System.out.println("Reading /" + String.join("/", path));
break;
case XMLStreamConstants.END_ELEMENT:
// pop the name of the element being closed
path.removeLast();
break;
}
}
}
}
"The chronicler's duty"
Method 1: dedicated stack, #teppic suggestion
try (InputStream in = new ByteArrayInputStream(xml.getBytes())) {
final XMLInputFactory2 factory = (XMLInputFactory2) XMLInputFactory.newInstance();
final XMLStreamReader2 reader = (XMLStreamReader2) factory.createXMLStreamReader(in);
Stack<String> pathStack = new Stack<>();
while (reader.hasNext()) {
reader.next();
if (reader.isStartElement()) {
pathStack.push(reader.getLocalName());
processPath('/' + String.join("/", pathStack));
} else if (reader.isEndElement()) {
pathStack.pop();
}
}
}
Method 2 (ugly): hacking Woodstox's InputElementStack
Implementing adapter to access InputElementStack, its protected mCurrElement and interate parents (this slows down algoritm).
package com.ctc.wstx.sr;
import java.util.LinkedList;
public class StackUglyAdapter {
public static String PATH_SEPARATOR = "/";
private InputElementStack stack;
public StackUglyAdapter(InputElementStack stack) {
this.stack = stack;
}
public String getCurrElementLocalName() {
return this.stack.mCurrElement.mLocalName;
}
public String getCurrElementPath() {
LinkedList<String> list = new LinkedList<String>();
Element el = this.stack.mCurrElement;
while (el != null) {
list.addFirst(el.mLocalName);
el = el.mParent;
}
return PATH_SEPARATOR+String.join(PATH_SEPARATOR,list);
}
}
example of use:
try (final InputStream in = new ByteArrayInputStream(xml.getBytes())) {
final XMLInputFactory2 factory =
(XMLInputFactory2) XMLInputFactory.newInstance();
final XMLStreamReader2 reader =
(XMLStreamReader2) factory.createXMLStreamReader(in);
final StackUglyAdapter stackAdapter =
new StackUglyAdapter(((StreamReaderImpl) reader).getInputElementStack());
while (reader.hasNext()) {
reader.next();
if (reader.isStartElement()) {
processPath(stackAdapter.getCurrElementPath());
}
}
}
Method 1 with dedicated stack is better, because is API implementation-independent and is just as fast as the Method 2.
Is there a simple Java method way of "moving" all XML namespace declarations of an XML document to the root element? Due to a bug in parser implementation of an unnamed huge company, I need to programmatically rewrite our well formed and valid RPC requests in a way that the root element declares all used namespaces.
Not OK:
<document-element xmlns="uri:ns1">
<foo>
<bar xmlns="uri:ns2" xmlns:ns3="uri:ns3">
<ns3:foobar/>
<ns1:sigh xmlns:ns1="uri:ns1"/>
</bar>
</foo>
</document-element>
OK:
<document-element xmlns="uri:ns1" xmlns:ns1="uri:ns1" xmlns:ns2="uri:ns2" xmlns:ns3="uri:ns3">
<foo>
<ns2:bar>
<ns3:foobar/>
<ns1:sigh/>
</ns2:bar>
</foo>
</document-element>
Generic names for missing prefixes are acceptable. Default namespace may stay or be replaced/added as long as it is defined on the root element. I don't really mind which specific XML technology is used to achieve this (I would prefer to avoid DOM though).
To clarify, this answer refers to what I'd like to achieve as redeclaring namespace declarations within root element scope (entire document) on the root element. Essentially the related question is asking why oh why would anyone implement what I now need to work around.
Wrote a two-pass StAX reader/writer, which is simple enough.
import java.io.*;
import java.util.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
public class NamespacesToRoot {
private static final String GENERATED_PREFIX = "pfx";
private final XMLInputFactory inputFact;
private final XMLOutputFactory outputFact;
private final XMLEventFactory eventFactory;
private NamespacesToRoot() {
inputFact = XMLInputFactory.newInstance();
outputFact = XMLOutputFactory.newInstance();
eventFactory = XMLEventFactory.newInstance();
}
public String transform(String xmlString) throws XMLStreamException {
Map<String, String> pfxToNs = new HashMap<String, String>();
XMLEventReader reader = null;
// first pass - analyze
try {
if (xmlString == null || xmlString.isEmpty()) {
throw new IllegalArgumentException("xmlString is null or empty");
}
StringReader stringReader = new StringReader(xmlString);
XMLStreamReader streamReader = inputFact.createXMLStreamReader(stringReader);
reader = inputFact.createXMLEventReader(streamReader);
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isStartElement()) {
buildNamespaces(event, pfxToNs);
}
}
System.out.println(pfxToNs);
} finally {
try {
if (reader != null) {
reader.close();
}
} catch (XMLStreamException ex) {
}
}
// reverse mapping, also gets rid of duplicates
Map<String, String> nsToPfx = new HashMap<String, String>();
for (Map.Entry<String, String> entry : pfxToNs.entrySet()) {
nsToPfx.put(entry.getValue(), entry.getKey());
}
List<Namespace> namespaces = new ArrayList<Namespace>(nsToPfx.size());
for (Map.Entry<String, String> entry : nsToPfx.entrySet()) {
namespaces.add(eventFactory.createNamespace(entry.getValue(), entry.getKey()));
}
// second pass - rewrite
XMLEventWriter writer = null;
try {
StringWriter stringWriter = new StringWriter();
writer = outputFact.createXMLEventWriter(stringWriter);
StringReader stringReader = new StringReader(xmlString);
XMLStreamReader streamReader = inputFact.createXMLStreamReader(stringReader);
reader = inputFact.createXMLEventReader(streamReader);
boolean rootElement = true;
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isStartElement()) {
StartElement origStartElement = event.asStartElement();
String prefix = nsToPfx.get(origStartElement.getName().getNamespaceURI());
String namespace = origStartElement.getName().getNamespaceURI();
String localName = origStartElement.getName().getLocalPart();
Iterator attributes = origStartElement.getAttributes();
Iterator namespaces_;
if (rootElement) {
namespaces_ = namespaces.iterator();
rootElement = false;
} else {
namespaces_ = null;
}
writer.add(eventFactory.createStartElement(
prefix, namespace, localName, attributes, namespaces_));
} else {
writer.add(event);
}
}
writer.flush();
return stringWriter.toString();
} finally {
try {
if (reader != null) {
reader.close();
}
} catch (XMLStreamException ex) {
}
try {
if (writer != null) {
writer.close();
}
} catch (XMLStreamException ex) {
}
}
}
private void buildNamespaces(XMLEvent event, Map<String, String> pfxToNs) {
System.out.println("el: " + event);
StartElement startElement = event.asStartElement();
Iterator nsIternator = startElement.getNamespaces();
while (nsIternator.hasNext()) {
Namespace nsAttr = (Namespace) nsIternator.next();
if (nsAttr.isDefaultNamespaceDeclaration()) {
System.out.println("need to generate a prefix for " + nsAttr.getNamespaceURI());
generatePrefix(nsAttr.getNamespaceURI(), pfxToNs);
} else {
System.out.println("add prefix binding for " + nsAttr.getPrefix() + " --> " + nsAttr.getNamespaceURI());
addPrefix(nsAttr.getPrefix(), nsAttr.getNamespaceURI(), pfxToNs);
}
}
}
private void generatePrefix(String namespace, Map<String, String> pfxToNs) {
int i = 1;
String prefix = GENERATED_PREFIX + i;
while (pfxToNs.keySet().contains(prefix)) {
i++;
prefix = GENERATED_PREFIX + i;
}
pfxToNs.put(prefix, namespace);
}
private void addPrefix(String prefix, String namespace, Map<String, String> pfxToNs) {
String existingNs = pfxToNs.get(prefix);
if (existingNs != null) {
if (existingNs.equals(namespace)) {
// nothing to do
} else {
// prefix clash, need to rename this prefix or reuse an existing
// one
if (pfxToNs.values().contains(namespace)) {
// reuse matching prefix
} else {
// rename
generatePrefix(namespace, pfxToNs);
}
}
} else {
// need to add this prefix
pfxToNs.put(prefix, namespace);
}
}
public static void main(String[] args) throws XMLStreamException {
String xmlString = "" +
"<document-element xmlns=\"uri:ns1\" attr=\"1\">\n" +
" <foo>\n" +
" <bar xmlns=\"uri:ns2\" xmlns:ns3=\"uri:ns3\">\n" +
" <ns3:foobar ns3:attr1=\"meh\" />\n" +
" <ns1:sigh xmlns:ns1=\"uri:ns1\"/>\n" +
" </bar>\n" +
" </foo>\n" +
"</document-element>";
System.out.println(xmlString);
NamespacesToRoot transformer = new NamespacesToRoot();
System.out.println(transformer.transform(xmlString));
}
}
Note that this is just fast example code which could use some tweaks, but is also a good start for anyone with a similar problem.
Below is a simple app does the namespace re-declaration... based on XPath and VTD-XML.
import com.ximpleware.*;
import java.io.*;
public class moveNSDeclaration {
public static void main(String[] args) throws IOException, VTDException{
// TODO Auto-generated method stub
VTDGen vg = new VTDGen();
String xml="<document-element xmlns=\"uri:ns1\">\n"+
"<foo>\n"+
"<bar xmlns=\"uri:ns2\" xmlns:ns3=\"uri:ns3\">\n"+
"<ns3:foobar/>\n"+
"<ns1:sigh xmlns:ns1=\"uri:ns1\"/>\n"+
"</bar>\n"+
"</foo>\n"+
"</document-element>\n";
vg.setDoc(xml.getBytes());
vg.parse(false); // namespace unaware to all name space nodes addressable using xpath #*
VTDNav vn = vg.getNav();
XMLModifier xm = new XMLModifier(vn);
FastIntBuffer fib = new FastIntBuffer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// get the index value of xmlns declaration of root element
AutoPilot ap =new AutoPilot (vn);
ap.selectXPath("//#*");
int i=0;
//remove all ns node under root element
//save those nodes to be re-inserted into the root element up on verification of uniqueness
while((i=ap.evalXPath())!=-1){
if (vn.getTokenType(i)==VTDNav.TOKEN_ATTR_NS){
xm.remove(); //remove all ns node
fib.append(i);
}
}
//remove redundant ns nodes
for (int j=0;j<fib.size();j++){
if (fib.intAt(j)!=-1){
for (i=j+1;i<fib.size();i++){
if (fib.intAt(i)!=-1)
if (vn.compareTokens(fib.intAt(j), vn, fib.intAt(i))==0){
fib.modifyEntry(i, -1);
}
}
}
}
// compose a string to insert back into the root element containing all subordinate ns nodes
for (int j=0;j<fib.size();j++){
if (fib.intAt(j)!=-1){
int os = vn.getTokenOffset(fib.intAt(j));
int len = vn.getTokenOffset(fib.intAt(j)+1)+vn.getTokenLength(fib.intAt(j)+1)+1-os;
//System.out.println(" os len "+ os + " "+len);
//System.out.println(vn.toString(os,len));
baos.write(" ".getBytes());
baos.write(vn.getXML().getBytes(),os,len);
}
}
byte[] attrBytes = baos.toByteArray();
vn.toElement(VTDNav.ROOT);
xm.insertAttribute(attrBytes);
//System.out.println(baos.toString());
baos.reset();
xm.output(baos);
System.out.println(baos.toString());
}
}
Output looks like
<document-element xmlns="uri:ns2" xmlns:ns3="uri:ns3" xmlns:ns1="uri:ns1" >
<foo>
<bar >
<ns3:foobar/>
<ns1:sigh />
</bar>
</foo>
</document-element>
I'm running a small Android project which could read RSS/Atom Feed documents, using SAX library. Everything works well for default RSS sources, but with minimized sources (without spaces or new line tokens), it produces nothing but a list of blank items. My logs in Log cat also display nothing. I double check this problems with variant RSS sites, but problems still there. Below is my inheritance class of DefaultHandler which I use to handle Rss sources
public class RssContentHandler extends DefaultHandler {
private static final int UNKNOWN_STATE = -1;
private static final int ELEMENT_START = 0;
private static final int TITLE_END = 1;
private static final int DESCRIPTION_END = 2;
private static final int LINK_END = 3;
private static final int PUBDATE_END = 4;
private static final int CHANNEL_END = 5;
private int iState = UNKNOWN_STATE;
private String fullCharacters;
private boolean itemFound = false;
private RssItem rssItem;
private RssFeed rssFeed;
public RssContentHandler() {
}
public RssFeed getFeed() {
return this.rssFeed;
}
#Override
public void startDocument() {
rssItem = new RssItem();
rssFeed = new RssFeed();
Log.i("startDocument", "startDocument");
}
#Override
public void endDocument() {
}
#Override
public void startElement(String _uri, String _localName, String _qName, Attributes _attributes) {
if (_localName.equalsIgnoreCase("item")) {
itemFound = true;
rssItem = new RssItem();
this.iState = UNKNOWN_STATE;
} else
this.iState = ELEMENT_START;
fullCharacters = "";
}
#Override
public void endElement(String _uri, String _localName, String _qName) {
if (_localName.equalsIgnoreCase("item"))
this.rssFeed.addItem(this.rssItem);
else if (_localName.equalsIgnoreCase("title"))
this.iState = TITLE_END;
else if (_localName.equalsIgnoreCase("description"))
this.iState = DESCRIPTION_END;
else if (_localName.equalsIgnoreCase("link"))
this.iState = LINK_END;
else if (_localName.equalsIgnoreCase("pubDate"))
this.iState = PUBDATE_END;
else if (_localName.equalsIgnoreCase("channel"))
this.iState = CHANNEL_END;
else
this.iState = UNKNOWN_STATE;
}
#Override
public void characters(char[] _ch, int _start, int _length) {
String strCharacters = new String(_ch, _start, _length);
if (this.iState == ELEMENT_START)
fullCharacters += strCharacters;
else {
if (!itemFound) {
switch (this.iState) {
case TITLE_END:
this.rssFeed.setTitle(fullCharacters);
break;
case DESCRIPTION_END:
this.rssFeed.setDescription(fullCharacters);
break;
case LINK_END:
this.rssFeed.setLink(fullCharacters);
break;
case PUBDATE_END:
this.rssFeed.setPubDate(fullCharacters);
break;
}
} else {
switch (this.iState) {
case TITLE_END:
this.rssItem.setTitle(fullCharacters);
Log.i("characters", fullCharacters);
break;
case DESCRIPTION_END:
this.rssItem.setDescription(fullCharacters);
break;
case LINK_END:
this.rssItem.setLink(fullCharacters);
break;
case PUBDATE_END:
this.rssItem.setPubDate(fullCharacters);
break;
}
}
this.iState = UNKNOWN_STATE;
}
}
}
and snippet to setup the parser:
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet();
try {
request.setURI(new URI(_strUrl));
} catch (URISyntaxException e) {
e.printStackTrace();
}
HttpResponse response = client.execute(request);
Reader inputStream = new InputStreamReader(response.getEntity().getContent());
RssContentHandler rssContentHandler = new RssContentHandler();
InputSource inputSource = new InputSource();
inputSource.setCharacterStream(inputStream);
SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
SAXParser saxParser = saxParserFactory.newSAXParser();
saxParser.parse(inputSource, rssContentHandler);
this.rssFeed = rssContentHandler.getFeed();
P/s: i'm using Android 2.3 x86 installed on VirtualBox for Debugging, and these sources work fine with the built-in RSS Reader app come with the x86 version. So what's wrong here?
Try with _qName instead of _localName.
Your xml contains CDATA so You cann't parse the XML response with your current parser. You have to use LexicalHandler for parsing Raw HTML.
public class MyHandler implements LexicalHandler {
public void startDTD(String name, String publicId, String systemId)
throws SAXException {}
public void endDTD() throws SAXException {}
public void startEntity(String name) throws SAXException {}
public void endEntity(String name) throws SAXException {}
public void startCDATA() throws SAXException {}
public void endCDATA() throws SAXException {}
public void comment (char[] text, int start, int length)
throws SAXException {
String comment = new String(text, start, length);
System.out.println(comment);
}
You can also parse your XML with DOM if memory is not the issue. For more help visit Handling Lexical Events
I'm trying to get only elements that have text, ex xml :
<root>
<Item>
<ItemID>4504216603</ItemID>
<ListingDetails>
<StartTime>10:00:10.000Z</StartTime>
<EndTime>10:00:30.000Z</EndTime>
<ViewItemURL>http://url</ViewItemURL>
....
</item>
It should print
Element Local Name:ItemID
Text:4504216603
Element Local Name:StartTime
Text:10:00:10.000Z
Element Local Name:EndTime
Text:10:00:30.000Z
Element Local Name:ViewItemURL
Text:http://url
This code prints also root, item etc. Is it even possible, it must be I just can't google it.
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
InputStream input = new FileInputStream(new File("src/main/resources/file.xml"));
XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(input);
while (xmlStreamReader.hasNext()) {
int event = xmlStreamReader.next();
if (event == XMLStreamConstants.START_ELEMENT) {
System.out.println("Element Local Name:" + xmlStreamReader.getLocalName());
}
if (event == XMLStreamConstants.CHARACTERS) {
if(!xmlStreamReader.getText().trim().equals("")){
System.out.println("Text:"+xmlStreamReader.getText().trim());
}
}
}
Edit incorrect behaviour :
Element Local Name:root
Element Local Name:item
Element Local Name:ItemID
Text:4504216603
Element Local Name:ListingDetails
Element Local Name:StartTime
Text:10:00:10.000Z
Element Local Name:EndTime
Text:10:00:30.000Z
Element Local Name:ViewItemURL
Text:http://url
I don't want that root and other nodes which don't have text to be printed, just the output which I wrote above. thank you
Try this:
while (xmlStreamReader.hasNext()) {
int event = xmlStreamReader.next();
if (event == XMLStreamConstants.START_ELEMENT) {
try {
String text = xmlStreamReader.getElementText();
System.out.println("Element Local Name:" + xmlStreamReader.getLocalName());
System.out.println("Text:" + text);
} catch (XMLStreamException e) {
}
}
}
SAX based solution (works):
public class Test extends DefaultHandler {
public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException, XPathExpressionException, XMLStreamException {
SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
parser.parse(new File("src/file.xml"), new Test());
}
private String currentName;
#Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
currentName = qName;
}
#Override
public void characters(char[] ch, int start, int length) throws SAXException {
String string = new String(ch, start, length);
if (hasText(string)) {
System.out.println(currentName);
System.out.println(string);
}
}
private boolean hasText(String string) {
string = string.trim();
return string.length() > 0;
}
}
Stax solution :
Parse document
public void parseXML(InputStream xml) {
try {
DOMResult result = new DOMResult();
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
XMLEventReader reader = xmlInputFactory.createXMLEventReader(new StreamSource(xml));
TransformerFactory transFactory = TransformerFactory.newInstance();
Transformer transformer = transFactory.newTransformer();
transformer.transform(new StAXSource(reader), result);
Document document = (Document) result.getNode();
NodeList startlist = document.getChildNodes();
processNodeList(startlist);
} catch (Exception e) {
System.err.println("Something went wrong, this might help :\n" + e.getMessage());
}
}
Now all nodes from the document are in a NodeList so do this next :
private void processNodeList(NodeList nodelist) {
for (int i = 0; i < nodelist.getLength(); i++) {
if (nodelist.item(i).getNodeType() == Node.ELEMENT_NODE && (hasValidAttributes(nodelist.item(i)) || hasValidText(nodelist.item(i)))) {
getNodeNamesAndValues(nodelist.item(i));
}
processNodeList(nodelist.item(i).getChildNodes());
}
}
Then for each element node with valid text get name and value
public void getNodeNamesAndValues(Node n) {
String nodeValue = null;
String nodeName = null;
if (hasValidText(n)) {
while (n != null && isWhiteSpace(n.getTextContent()) == true && StringUtils.isWhitespace(n.getTextContent()) && n.getNodeType() != Node.ELEMENT_NODE) {
n = n.getFirstChild();
}
nodeValue = StringUtils.strip(n.getTextContent());
nodeName = n.getLocalName();
System.out.println(nodeName + " " + nodeValue);
}
}
Bunch of useful methods to check nodes :
private static boolean hasValidAttributes(Node node) {
return (node.getAttributes().getLength() > 0);
}
private boolean hasValidText(Node node) {
String textValue = node.getTextContent();
return (textValue != null && textValue != "" && isWhiteSpace(textValue) == false && !StringUtils.isWhitespace(textValue) && node.hasChildNodes());
}
private boolean isWhiteSpace(String nodeText) {
if (nodeText.startsWith("\r") || nodeText.startsWith("\t") || nodeText.startsWith("\n") || nodeText.startsWith(" "))
return true;
else
return false;
}
I also used StringUtils, you can get that by including this in your pom.xml if you're using maven :
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.5</version>
</dependency>
This is inefficient if you're reading huge files, but not so much if you split them first. This is what I've come with(with google). There are more better solutions this is mine, I'm an amateur(for now).