Handling XML escape characters (e.g. quotes) using JAXB Marshaller

I need to serialize a Java object to an XML file using the JAXB Marshaller (JAXB version 2.2). The object has an element whose String value itself contains markup:
<tagA>
<YYYYY>done</YYYYY>
</tagA>
As you can see, the string value contains tags of its own.
I want it to be written to the XML file exactly as it is.
But the JAXB Marshaller escapes these values, like this:
&lt;YYYYY&gt;&#xD;done ... and so on
I am not able to handle these escape characters separately using JAXB 2.2.
Is it possible at all?
Any help in this regard would be great.
Thanks in advance,
Abhinav Mishra

I solved it by setting the following property on the JAXB Marshaller:
marshaller.setProperty("jaxb.encoding", "Unicode");

There is a simpler way. First, install a custom escape handler that writes the characters through unchanged:
m.setProperty(CharacterEscapeHandler.class.getName(), new CharacterEscapeHandler() {
@Override
public void escape(char[] ch, int start, int length, boolean isAttVal, Writer out) throws IOException {
out.write( ch, start, length );
}
});
Then marshal the object to a String, as shown below:
StringWriter writer = new StringWriter();
m.marshal(marshalObject, writer);
and then create a Document object from the writer's contents, as shown below:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource is = new InputSource( new StringReader( writer.toString() ) );
Document doc = builder.parse( is );
This resolves the escape character issue.

With the JAXB marshaller, if you want full control over which characters are escaped (e.g. "\'"), you have to set this property:
Marshaller marshaller = jc.createMarshaller();
marshaller.setProperty(CharacterEscapeHandler.class.getName(), new CustomCharacterEscapeHandler());
and create a new CustomCharacterEscapeHandler class:
import com.sun.xml.bind.marshaller.CharacterEscapeHandler;
import java.io.IOException;
import java.io.Writer;
public class CustomCharacterEscapeHandler implements CharacterEscapeHandler {
public CustomCharacterEscapeHandler() {
super();
}
@Override
public void escape(char[] ch, int start, int length, boolean isAttVal, Writer out) throws IOException {
// avoid calling the Writer#write method too much by assuming
// that the escaping occurs rarely.
// profiling revealed that this is faster than the naive code.
int limit = start+length;
for (int i = start; i < limit; i++) {
char c = ch[i];
if(c == '&' || c == '<' || c == '>' || c == '\'' || (c == '\"' && isAttVal) ) {
if(i!=start)
out.write(ch,start,i-start);
start = i+1;
switch (ch[i]) {
case '&':
out.write("&amp;");
break;
case '<':
out.write("&lt;");
break;
case '>':
out.write("&gt;");
break;
case '\"':
out.write("&quot;");
break;
case '\'':
out.write("&apos;");
break;
}
}
}
if( start!=limit )
out.write(ch,start,limit-start);
}
}
Hope that helps.

You can leverage the CDATA structure. Standard JAXB does not cover this structure. There is an extension in EclipseLink JAXB (MOXy) for this (I'm the tech lead). Check out my answer to a related question:
How to generate CDATA block using JAXB?
It describes the @XmlCDATA annotation in MOXy:
import javax.xml.bind.annotation.XmlRootElement;
import org.eclipse.persistence.oxm.annotations.XmlCDATA;
@XmlRootElement(name="c")
public class Customer {
private String bio;
@XmlCDATA
public void setBio(String bio) {
this.bio = bio;
}
public String getBio() {
return bio;
}
}
For more information see:
http://bdoughan.blogspot.com/2010/07/cdata-cdata-run-run-data-run.html

Depending on what exactly you are looking for, you can either:
disable character escaping, or
use a CDATA section, support for which can be added to JAXB with a bit of configuration.
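To illustrate what the CDATA option produces, here is a minimal sketch that uses the JDK's StAX XMLStreamWriter rather than JAXB (a swapped-in technique, chosen only because it needs no extra libraries). The class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class CdataSketch {

    // Writes <elementName><![CDATA[rawMarkup]]></elementName> and returns it as a String.
    // Assumption: rawMarkup does not contain "]]>", which would end the CDATA section early.
    static String wrapInCdata(String elementName, String rawMarkup) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
        writer.writeStartElement(elementName);
        writer.writeCData(rawMarkup); // written as a CDATA section, so < and > are not escaped
        writer.writeEndElement();
        writer.close();
        return out.toString();
    }
}
```

Calling wrapInCdata("tagA", "<YYYYY>done</YYYYY>") keeps the inner tags readable in the file, while a parser still treats them as plain text rather than as child elements.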

Related

Java Modify XML

I want to read an XML file in Java and then update certain elements in that file with new values. My file is > 200mb and performance is important, so the DOM model cannot be used.
I feel that a StaX Parser is the solution, but there is no decent literature on using Java StaX to read and then write XML back to the same file.
(For reference I have been using the java tutorial and this helpful tutorial to get what I have so far)
I am using Java 7, but there doesn't seem to be any updates to the XML parsing API since...a long time ago. So this probably isn't relevant.
Currently I have this:
public static String readValueFromXML(final File xmlFile, final String value) throws FileNotFoundException, XMLStreamException
{
XMLEventReader reader = XMLInputFactory.newFactory().createXMLEventReader(new FileReader(xmlFile));
String found = "";
boolean read = false;
while (reader.hasNext())
{
XMLEvent event = reader.nextEvent();
if (event.isStartElement() &&
event.asStartElement().getName().getLocalPart().equals(value))
{
read = true;
}
if (event.isCharacters() && read)
{
found = event.asCharacters().getData();
break;
}
}
return found;
}
which will read the XMLFile and return the value of the selected element. However, I have another method updateXMLFile(final File xmlFile, final String value) which I want to use in conjunction with this.
So my question is threefold:
Is there a StAX implementation for editing XML?
Will XPath be any help? Can that be used without converting my file to a Document?
(More Generally) Why doesn't Java have a better XML API?

There are two things you may want to look at. The first is to use JAXB to bind the XML to POJOs which you can then have your way with and serialize the structure back to XML when needed.
The second is a JDBC driver for XML, there are several available for a fee, not sure if there are any open source ones or not. In my experience JAXB is the better choice. If the XML file is too large to handle efficiently with JAXB I think you need to look at using a database as a replacement for the XML file.

This is my approach, which reads events from the file using StAX and writes them to another file. The values are updated as the loop passes over the correctly named elements.
public void read(String key, String value)
{
try (FileReader fReader = new FileReader(inputFile); FileWriter fWriter = new FileWriter(outputFile))
{
XMLEventFactory factory = XMLEventFactory.newInstance();
XMLEventReader reader = XMLInputFactory.newFactory().createXMLEventReader(fReader);
XMLEventWriter writer = XMLOutputFactory.newFactory().createXMLEventWriter(fWriter);
// declared outside the loop so the flag survives from the start tag to its text event
boolean update = false;
while (reader.hasNext())
{
XMLEvent event = reader.nextEvent();
if (event.isStartElement() && event.asStartElement().getName().getLocalPart().equals(key))
{
update = true;
}
else if (event.isCharacters() && update)
{
Characters characters = factory.createCharacters(value);
event = characters;
update = false;
}
writer.add(event);
}
}
catch (XMLStreamException | FactoryConfigurationError | IOException e)
{
e.printStackTrace();
}
}
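Since the question asks about writing back to the same file, one possible way to combine the approach above with an in-place update is to stream into a temporary file and then move it over the original. The following is only a sketch under that assumption; the class name StaxInPlaceUpdate and the method updateElementText are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class StaxInPlaceUpdate {

    // Replaces the text of every element named `key` with `value`.
    // Assumption: the file is UTF-8 encoded.
    static void updateElementText(Path xmlFile, String key, String value)
            throws IOException, XMLStreamException {
        // The temp file lives next to the original so the final move stays on one file system.
        Path tmp = Files.createTempFile(xmlFile.toAbsolutePath().getParent(), "stax-update", ".xml");
        XMLEventFactory events = XMLEventFactory.newInstance();
        try (Reader in = Files.newBufferedReader(xmlFile, StandardCharsets.UTF_8);
             Writer out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            XMLEventReader reader = XMLInputFactory.newFactory().createXMLEventReader(in);
            XMLEventWriter writer = XMLOutputFactory.newFactory().createXMLEventWriter(out);
            boolean update = false; // true between a matching start tag and its text
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    update = event.asStartElement().getName().getLocalPart().equals(key);
                } else if (event.isEndElement()) {
                    update = false; // the element had no text to replace
                } else if (event.isCharacters() && update) {
                    event = events.createCharacters(value);
                    update = false;
                }
                writer.add(event);
            }
            writer.flush();
            writer.close();
            reader.close();
        }
        // Only replace the original once the new content has been written completely.
        Files.move(tmp, xmlFile, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Only one event is held in memory at a time, so the file size does not matter much; the cost is that the disk briefly holds two copies of the file.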

remove whitespaces inside XML tag with java

I am receiving XML with the following tags. I read the XML file in Java with a SAX parser and save the values to a database, but there seems to be whitespace after the p tag, as shown below.
<Inclusions><![CDATA[<p> </p><ul> <li>Small group walking tour</li> <li>Entrance fees</li> <li>Professional guide </li> <li>Guaranteed to skip the long lines</li> <li>Headsets to hear the guide clearly</li> </ul>
<p></p>]]></Inclusions>
But when we insert the string into the database (PostgreSQL 8), those spaces show up as bad characters like this:
\011\011\011\011\011\011\011\011\011\011\011\011 Small
group walking tour Entrance fees Professional guide
Guaranteed to skip the long lines Headsets to hear
the guide clearly \012\011\011\011\011\011
I want to know why it prints bad characters (\011\011) like that.
What is the best way to remove the whitespace inside XML tags with Java? (Or how can I prevent those bad characters?)
I have checked some samples, but most of them are in Python.
This is how the XML is read with SAX in my program:
Method 1
// ResultHandler is the class that used to read the XML.
ResultHandler handler = new ResultHandler();
// Use the default parser
SAXParserFactory factory = SAXParserFactory.newInstance();
// Retrieve the XML file
FileInputStream in = new FileInputStream(new File(inputFile)); // input file is XML.
// Parse the XML input
SAXParser saxParser = factory.newSAXParser();
saxParser.parse( in , handler);
This is the ResultHandler class used to read the XML with the SAX parser in Method 1:
import org.apache.log4j.Logger;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
// other imports
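class ResultHandler extends DefaultHandler {
private static final Logger logger = Logger.getLogger(ResultHandler.class);
// text collected for the element currently being read
private String strValue = "";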
public void startDocument ()
{
logger.debug("Start document");
}
public void endDocument ()
{
logger.debug("End document");
}
public void startElement(String namespaceURI, String localName, String qName, Attributes attribs)
throws SAXException {
strValue = "";
// add logic with start of tag.
}
public void characters(char[] ch, int start, int length)
throws SAXException {
//logger.debug("characters");
strValue += new String(ch, start, length);
//logger.debug("strValue-->"+strValue);
}
public void endElement(String namespaceURI, String localName, String qName)
throws SAXException {
// add logic to end of tag.
}
}
So I need to know how to set setIgnoringElementContentWhitespace(true), or something similar, with the SAX parser.

You can try setting this on your DocumentBuilderFactory:
setIgnoringElementContentWhitespace(true)
The documentation notes:
Due to reliance on the content model this setting requires the parser
to be in validating mode
so you also need to set
setValidating(true)
Alternatively, str = str.replaceAll("\\s+", ""); might work as well, although it removes every whitespace character, including the spaces between words.

I'm also looking for an exact answer, but I think this will help you.
The octal escapes (C/Modula-3 notation) and their meanings are listed at this link.
It says:
\011 is for Horizontal tab (ASCII HT)
\012 is for Line feed (ASCII NL, newline)
You can replace runs of whitespace with a single space as follows:
str = str.replaceAll("\\s+", " ");
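The whitespace normalization can also be applied inside the SAX handler itself, before the value reaches the database. Below is a minimal sketch of that idea; the class name WhitespaceNormalizingHandler and its helper methods are hypothetical and not part of the original program.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceNormalizingHandler extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver the text of one element in several chunks, so append them all
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String normalized = normalize(text.toString());
        if (!normalized.isEmpty()) {
            values.add(normalized);
        }
        text.setLength(0);
    }

    // Collapses tabs, newlines and repeated spaces into single spaces and trims both ends.
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    // Convenience helper for trying the handler on an in-memory document.
    static List<String> parse(String xml) throws Exception {
        WhitespaceNormalizingHandler handler = new WhitespaceNormalizingHandler();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }
}
```

Collecting into a StringBuilder also avoids the repeated string concatenation in characters(). Note that the markup inside the CDATA section is kept; only the whitespace between and around it is collapsed.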

Reading Java Properties file without escaping values

My application needs to use a .properties file for configuration.
In the properties files, users are allowed to specify paths.
Problem
Properties files need values to be escaped, eg
dir = c:\\mydir
Needed
I need some way to accept a properties file where the values are not escaped, so that the users can specify:
dir = c:\mydir

Why not simply extend the Properties class so that it doubles the backslashes before loading? A nice feature of this is that the rest of your program can still use the original Properties class.
public class PropertiesEx extends Properties {
public void load(FileInputStream fis) throws IOException {
Scanner in = new Scanner(fis);
ByteArrayOutputStream out = new ByteArrayOutputStream();
while(in.hasNext()) {
out.write(in.nextLine().replace("\\","\\\\").getBytes());
out.write("\n".getBytes());
}
InputStream is = new ByteArrayInputStream(out.toByteArray());
super.load(is);
}
}
Using the new class is as simple as:
PropertiesEx p = new PropertiesEx();
p.load(new FileInputStream("C:\\temp\\demo.properties"));
p.list(System.out);
The preprocessing code could be improved, but the general principle is there.

Two options:
use the XML properties format instead
Write your own parser for a modified .properties format without escapes
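As a rough illustration of the first option: in the XML properties format a backslash has no special meaning, so a Windows path can be written as-is. The sketch below loads such a document from a string; the class name XmlPropertiesSketch and the SAMPLE constant are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class XmlPropertiesSketch {

    // A properties document in the XML format; the value holds a single, unescaped backslash.
    static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
        + "<properties>\n"
        + "  <entry key=\"dir\">c:\\mydir</entry>\n"
        + "</properties>\n";

    static Properties loadFromXmlString(String xml) throws IOException {
        Properties props = new Properties();
        // loadFromXML reads the whole document and closes the stream afterwards
        props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return props;
    }
}
```

The trade-off is that users now have to escape XML special characters such as & and < instead of backslashes, which is rarely an issue for file paths.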

You can "preprocess" the file before loading the properties, for example:
public InputStream preprocessPropertiesFile(String myFile) throws IOException{
Scanner in = new Scanner(new FileReader(myFile));
ByteArrayOutputStream out = new ByteArrayOutputStream();
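while(in.hasNext()) {
out.write(in.nextLine().replace("\\","\\\\").getBytes());
// keep the line breaks, otherwise all properties end up on one line
out.write("\n".getBytes());
}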
return new ByteArrayInputStream(out.toByteArray());
}
And your code could look like this:
Properties properties = new Properties();
properties.load(preprocessPropertiesFile("path/myfile.properties"));
With this, your .properties file can look the way you need, and the property values will still be ready to use.
*I know there are probably better ways to manipulate files, but I hope this helps.

The right way would be to provide your users with a property file editor (or a plugin for their favorite text editor) which lets them enter the text as plain text and saves the file in the property file format.
If you don't want this, you are effectively defining a new format for the same (or a subset of the) content model as the property files have.
Go the whole way and actually specify your format, and then think about a way to either
transform the format to the canonical one, and then use this for loading the files, or
parse this format and populate a Properties object from it.
Both of these approaches will only work directly if you actually can control your property object's creation, otherwise you will have to store the transformed format with your application.
So, let's see how we can define this. The content model of normal property files is simple:
A map of string keys to string values, both allowing arbitrary Java strings.
The escaping which you want to avoid serves just to allow arbitrary Java strings, and not just a subset of these.
An often sufficient subset would be:
A map of string keys (not containing any whitespace, : or =) to string values (not containing any leading or trailing white space or line breaks).
In your example dir = c:\mydir, the key would be dir and the value c:\mydir.
If we want our keys and values to contain any Unicode character (other than the forbidden ones mentioned), we should use UTF-8 (or UTF-16) as the storage encoding - since we have no way to escape characters outside of the storage encoding. Otherwise, US-ASCII or ISO-8859-1 (as normal property files) or any other encoding supported by Java would be enough, but make sure to include this in your specification of the content model (and make sure to read it this way).
Since we restricted our content model so that all "dangerous" characters are out of the way, we can now define the file format simply as this:
<simplepropertyfile> ::= (<line> <line break> )*
<line> ::= <comment> | <empty> | <key-value>
<comment> ::= <space>* "#" < any text excluding line breaks >
<key-value> ::= <space>* <key> <space>* "=" <space>* <value> <space>*
<empty> ::= <space>*
<key> ::= < any text excluding ':', '=' and whitespace >
<value> ::= < any text starting and ending not with whitespace,
not including line breaks >
<space> ::= < any whitespace, but not a line break >
<line break> ::= < one of "\n", "\r", and "\r\n" >
Every \ occurring in either key or value now is a real backslash, not anything which escapes something else.
Thus, for transforming it into the original format, we simply need to double it, like Grekz proposed, for example in a filtering reader:
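public class DoubleBackslashFilter extends FilterReader {
private boolean bufferedBackslash = false;
public DoubleBackslashFilter(Reader org) {
super(org);
}
@Override
public int read() throws IOException {
if(bufferedBackslash) {
bufferedBackslash = false;
return '\\';
}
int c = super.read();
if(c == '\\')
bufferedBackslash = true;
return c;
}
@Override
public int read(char[] buf, int off, int len) throws IOException {
int read = 0;
if(bufferedBackslash) {
buf[off] = '\\';
read++;
off++;
len --;
bufferedBackslash = false;
}
if(len > 1) {
// read at most half of the space, so every backslash can be doubled in place
int step = super.read(buf, off, len/2);
if(step < 0) {
// end of stream: return what was buffered, or -1 if there was nothing
return read > 0 ? read : -1;
}
for(int i = 0; i < step; i++) {
if(buf[off+i] == '\\') {
// shift everything from here on one char to the right.
System.arraycopy(buf, off+i, buf, off+i+1, step - i);
// adjust parameters
step++; i++;
}
}
read += step;
}
else if(len == 1 && read == 0) {
// too little room to double a backslash in place: fall back to the single-char read
int c = read();
if(c < 0) return -1;
buf[off] = (char) c;
read = 1;
}
return read;
}
}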
Then we would pass this Reader to our Properties object (or save the contents to a new file).
Alternatively, we could simply parse this format ourselves:
public Properties parse(Reader in) throws IOException {
BufferedReader r = new BufferedReader(in);
Properties prop = new Properties();
Pattern keyValPattern = Pattern.compile("\\s*=\\s*");
String line;
while((line = r.readLine()) != null) {
line = line.trim(); // remove leading and trailing space
if(line.equals("") || line.startsWith("#")) {
continue; // ignore empty and comment lines
}
String[] kv = keyValPattern.split(line, 2);
// the pattern also grabs space around the separator.
if(kv.length < 2) {
// no key-value separator. TODO: Throw exception or simply ignore this line?
continue;
}
prop.setProperty(kv[0], kv[1]);
}
r.close();
return prop;
}
Again, using Properties.store() after this, we can export it in the original format.
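To make the round trip concrete, the sketch below parses the simplified format and then exports it with Properties.store(), which adds the escaping back. It is a self-contained variant of the parser above, not a definitive implementation; the class name SimplePropertiesRoundTrip and its method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringWriter;
import java.util.Properties;
import java.util.regex.Pattern;

public class SimplePropertiesRoundTrip {

    // key and value are separated by '=' with optional whitespace around it
    private static final Pattern KEY_VALUE = Pattern.compile("\\s*=\\s*");

    // Reads the simplified format, in which a backslash is just a backslash.
    static Properties parseSimple(Reader in) throws IOException {
        Properties prop = new Properties();
        try (BufferedReader r = new BufferedReader(in)) {
            String line;
            while ((line = r.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // skip empty lines and comments
                }
                String[] kv = KEY_VALUE.split(line, 2);
                if (kv.length == 2) {
                    prop.setProperty(kv[0], kv[1]);
                }
            }
        }
        return prop;
    }

    // Exports the properties in the standard, escaped .properties format.
    static String toStandardFormat(Properties prop) throws IOException {
        StringWriter out = new StringWriter();
        prop.store(out, null);
        return out.toString();
    }
}
```

For the input line dir = c:\mydir, the exported text contains the backslash doubled, and loading that text with a plain java.util.Properties yields c:\mydir again.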

Based on @Ian Harrigan's answer, here is a complete solution for reading and writing NetBeans properties files (and other properties files that need escaping) from and to ASCII text files:
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
/**
* This class handles NetBeans properties files.
* It is based on the work at: http://stackoverflow.com/questions/6233532/reading-java-properties-file-without-escaping-values.
* It overrides both load methods in order to load a NetBeans properties file, doubling each \ so that
* it is not consumed as an escape character by the original java.util.Properties load methods.
* @author stephane
*/
public class NetbeansProperties extends Properties {
@Override
public synchronized void load(Reader reader) throws IOException {
BufferedReader bfr = new BufferedReader( reader );
ByteArrayOutputStream out = new ByteArrayOutputStream();
String readLine = null;
while( (readLine = bfr.readLine()) != null ) {
out.write(readLine.replace("\\","\\\\").getBytes());
out.write("\n".getBytes());
}//while
InputStream is = new ByteArrayInputStream(out.toByteArray());
super.load(is);
}//met
@Override
public void load(InputStream is) throws IOException {
load( new InputStreamReader( is ) );
}//met
@Override
public void store(Writer writer, String comments) throws IOException {
PrintWriter out = new PrintWriter( writer );
if( comments != null ) {
out.print( '#' );
out.println( comments );
}//if
List<String> listOrderedKey = new ArrayList<String>();
listOrderedKey.addAll( this.stringPropertyNames() );
Collections.sort(listOrderedKey );
for( String key : listOrderedKey ) {
String newValue = this.getProperty(key);
out.println( key+"="+newValue );
}//for
// flush so the output actually reaches the underlying writer
out.flush();
}//met
@Override
public void store(OutputStream out, String comments) throws IOException {
store( new OutputStreamWriter(out), comments );
}//met
}//class

You could try using guava's Splitter: split on '=' and build a map from resulting Iterable.
The disadvantage of this solution is that it does not support comments.

@pdeva: one more solution
//Reads entire file in a String
//available in java1.5
Scanner scan = new Scanner(new File("C:/workspace/Test/src/myfile.properties"));
scan.useDelimiter("\\Z");
String content = scan.next();
//Use apache StringEscapeUtils.escapeJava() method to escape java characters
ByteArrayInputStream bi=new ByteArrayInputStream(StringEscapeUtils.escapeJava(content).getBytes());
//load properties file
Properties properties = new Properties();
properties.load(bi);

It's not an exact answer to your question, but a different solution that may suit your needs. In Java, you can use / as a path separator and it will work on Windows, Linux, and OS X alike. This is especially useful for relative paths.
In your example, you could use:
dir = c:/mydir
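A small sketch of how such a value can be consumed without any escaping, assuming the property is named dir as in the example; the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ForwardSlashPaths {

    // Loads the properties text and turns the "dir" entry into a File.
    static File readDir(String propertiesText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        // forward slashes need no escaping in a .properties file
        return new File(props.getProperty("dir"));
    }
}
```

On Windows, java.io.File converts the forward slashes to the platform separator itself, so the rest of the program does not need to care which form the user typed.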

stax - get xml node as string

xml looks like so:
<statements>
<statement account="123">
...stuff...
</statement>
<statement account="456">
...stuff...
</statement>
</statements>
I'm using stax to process one "<statement>" at a time and I got that working. I need to get that entire statement node as a string so I can create "123.xml" and "456.xml" or maybe even load it into a database table indexed by account.
using this approach: http://www.devx.com/Java/Article/30298/1954
I'm looking to do something like this:
String statementXml = staxXmlReader.getNodeByName("statement");
//load statementXml into database

I had a similar task and, although the original question is more than a year old, I couldn't find a satisfying answer. The most interesting answer up to now was Blaise Doughan's answer, but I couldn't get it running on the XML I am expecting (maybe some parameters for the underlying parser could change that?). Here is the XML, very simplified:
<many-many-tags>
<description>
...
<p>Lorem ipsum...</p>
Devils inside...
...
</description>
</many-many-tags>
My solution:
public static String readElementBody(XMLEventReader eventReader)
throws XMLStreamException {
StringWriter buf = new StringWriter(1024);
int depth = 0;
while (eventReader.hasNext()) {
// peek event
XMLEvent xmlEvent = eventReader.peek();
if (xmlEvent.isStartElement()) {
++depth;
}
else if (xmlEvent.isEndElement()) {
--depth;
// reached END_ELEMENT tag?
// break loop, leave event in stream
if (depth < 0)
break;
}
// consume event
xmlEvent = eventReader.nextEvent();
// print out event
xmlEvent.writeAsEncodedUnicode(buf);
}
return buf.getBuffer().toString();
}
Usage example:
XMLEventReader eventReader = ...;
while (eventReader.hasNext()) {
XMLEvent xmlEvent = eventReader.nextEvent();
if (xmlEvent.isStartElement()) {
StartElement elem = xmlEvent.asStartElement();
String name = elem.getName().getLocalPart();
if ("description".equals(name)) {
String xmlFragment = readElementBody(eventReader);
// do something with it...
System.out.println("'" + xmlFragment + "'");
}
}
else if (xmlEvent.isEndElement()) {
// ...
}
}
Note that the extracted XML fragment will contain the complete extracted body content, including white space and comments. Filtering those on demand, or making the buffer size parametrizable have been left out for code brevity:
'
<description>
...
<p>Lorem ipsum...</p>
Devils inside...
...
</description>
'

You can use StAX for this. You just need to advance the XMLStreamReader to the start element for statement. Check the account attribute to get the file name. Then use the javax.xml.transform APIs to transform the StAXSource to a StreamResult wrapping a File. This will advance the XMLStreamReader and then just repeat this process.
import java.io.File;
import java.io.FileReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
public class Demo {
public static void main(String[] args) throws Exception {
XMLInputFactory xif = XMLInputFactory.newInstance();
XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
xsr.nextTag(); // Advance to statements element
while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
File file = new File("out" + xsr.getAttributeValue(null, "account") + ".xml");
t.transform(new StAXSource(xsr), new StreamResult(file));
}
}
}
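If the statements are needed as strings rather than files (for example to load them into a database table), the same transform can target a StringWriter. The sketch below is one way this might look; the class name StatementSplitter is hypothetical. It checks the reader's current event after each transform instead of calling nextTag() unconditionally, on the assumption that some transformer implementations leave the reader positioned just past the end tag.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class StatementSplitter {

    // Returns each <statement> subtree as an XML string, keyed by its account attribute.
    static Map<String, String> splitStatements(String xml) throws Exception {
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        Map<String, String> byAccount = new LinkedHashMap<>();
        while (xsr.hasNext()) {
            if (xsr.getEventType() == XMLStreamConstants.START_ELEMENT
                    && "statement".equals(xsr.getLocalName())) {
                String account = xsr.getAttributeValue(null, "account");
                StringWriter out = new StringWriter();
                // copies the current subtree and advances the reader past it
                t.transform(new StAXSource(xsr), new StreamResult(out));
                byAccount.put(account, out.toString());
                // no next() here: the reader may already be on the following element
            } else {
                xsr.next();
            }
        }
        xsr.close();
        return byAccount;
    }
}
```

For a large input file, the StringReader would be replaced by a FileReader and each fragment would be stored as soon as it is produced instead of being collected in a map.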

StAX is a low-level access API, and it has neither lookups nor methods that access content recursively. But what are you actually trying to do? And why are you considering StAX?
Beyond using a tree model (DOM, XOM, JDOM, Dom4j), which would work well with XPath, the best choice when dealing with data is usually a data binding library like JAXB. With it you can pass in a StAX or SAX reader and ask it to bind the XML data into Java beans, so that you process Java objects instead of messing with XML. This is often more convenient, and it usually performs quite well.
The only trick with larger files is that you do not want to bind the whole thing at once, but rather bind each sub-tree (in your case, one 'statement' at a time).
This is most easily done by iterating with a StAX XMLStreamReader, then using JAXB to bind.

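I've been googling and this seems painfully difficult.
Given my XML, I think it might just be simpler to do this:
// inputFile and STMT_END_TAG (e.g. "</statement>") are placeholders
StringBuilder buffer = new StringBuilder();
try (BufferedReader reader = new BufferedReader(new FileReader(inputFile))) {
String line;
while ((line = reader.readLine()) != null) {
buffer.append(line).append('\n');
if (line.trim().equals(STMT_END_TAG)) {
parse(buffer.toString());
buffer.setLength(0);
}
}
}
private void parse(String statement) {
//saxParser.parse(new InputSource(new StringReader(statement)), handler);
// do stuff
// save string
}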

Why not just use xpath for this?
You could have a fairly simple xpath to get all 'statement' nodes.
Like so:
//statement
EDIT #1: If possible, take a look at dom4j. You could read the String and get all 'statement' nodes fairly simply.
EDIT #2: Using dom4j, this is how you would do it:
(from their cookbook)
String text = "your xml here";
Document document = DocumentHelper.parseText(text);
public void bar(Document document) {
List list = document.selectNodes( "//statement" );
// loop through node data
}
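The same //statement expression can also be evaluated with the XPath API that ships with the JDK, without dom4j. This is a minimal sketch with a hypothetical class name; like the dom4j version it loads the whole document into memory, so it only suits inputs that fit there.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathStatements {

    // Returns the account attribute of every <statement> element, in document order.
    static List<String> accountNumbers(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate("//statement", doc, XPathConstants.NODESET);
        List<String> accounts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            accounts.add(((Element) nodes.item(i)).getAttribute("account"));
        }
        return accounts;
    }
}
```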

I had a similar problem and found a solution.
I used the solution proposed by @t0r0X, but it does not work well with the current implementation in Java 11: the method xmlEvent.writeAsEncodedUnicode creates an invalid string representation of the start element (in the StartElementEvent class) in the resulting XML fragment. I had to modify it, and after that it seems to work well, which I could verify immediately by parsing the fragment with DOM and JAXB into specific data containers.
In my case I had a huge structure
<Orders>
<ns2:SyncOrder xmlns:ns2="..." xmlns:ns3="....." ....>
.....
</ns2:SyncOrder>
<ns2:SyncOrder xmlns:ns2="..." xmlns:ns3="....." ....>
.....
</ns2:SyncOrder>
...
</Orders>
in a file of several hundred megabytes (many repeating "SyncOrder" structures), so using DOM would lead to large memory consumption and slow evaluation. Therefore I used StAX to split the huge XML into smaller XML pieces, which I analyzed with DOM, and used the JAXB elements generated from the XSD definition of the SyncOrder element (I had this infrastructure from the web service, which uses the same structure, but that is not important).
The code below shows where the XML fragment is created and could be used; I passed it directly on to further processing.
private static <T> List<T> unmarshallMultipleSyncOrderXmlData(
InputStream aOrdersXmlContainingSyncOrderItems,
Function<SyncOrderType, T> aConversionFunction) throws XMLStreamException, ParserConfigurationException, IOException, SAXException {
DocumentBuilderFactory locDocumentBuilderFactory = DocumentBuilderFactory.newInstance();
locDocumentBuilderFactory.setNamespaceAware(true);
DocumentBuilder locDocBuilder = locDocumentBuilderFactory.newDocumentBuilder();
List<T> locResult = new ArrayList<>();
XMLInputFactory locFactory = XMLInputFactory.newFactory();
XMLEventReader locReader = locFactory.createXMLEventReader(aOrdersXmlContainingSyncOrderItems);
boolean locIsInSyncOrder = false;
QName locSyncOrderElementQName = null;
StringWriter locXmlTextBuffer = new StringWriter();
int locDepth = 0;
while (locReader.hasNext()) {
XMLEvent locEvent = locReader.nextEvent();
if (locEvent.isStartElement()) {
if (locDepth == 0 && Objects.equals(locEvent.asStartElement().getName().getLocalPart(), "Orders")) {
locDepth++;
} else {
if (locDepth <= 0)
throw new IllegalStateException("An invalid XML stream was passed into the function. "
+ "Expected the element 'Orders' as the root element of the document, but found '"
+ locEvent.asStartElement().getName().getLocalPart() + "'.");
locDepth++;
if (locSyncOrderElementQName == null) {
/* First element after the "Orders" has passed, so we retrieve
* the name of the element with the namespace prefix: */
locSyncOrderElementQName = locEvent.asStartElement().getName();
}
if(Objects.equals(locEvent.asStartElement().getName(), locSyncOrderElementQName)) {
locIsInSyncOrder = true;
}
}
} else if (locEvent.isEndElement()) {
locDepth--;
if(locDepth == 1 && Objects.equals(locEvent.asEndElement().getName(), locSyncOrderElementQName)) {
locEvent.writeAsEncodedUnicode(locXmlTextBuffer);
/* at this point locXmlTextBuffer.toString() returns the complete XML fragment
* containing the valid SyncOrder element; I continue with further processing,
* which immediately checks that the produced XML fragment is valid and passes the values
* to the communication object: */
Document locDocument = locDocBuilder.parse(new ByteArrayInputStream(locXmlTextBuffer.toString().getBytes()));
SyncOrderType locItem = unmarshallSyncOrderDomNodeToCo(locDocument);
locResult.add(aConversionFunction.apply(locItem));
locXmlTextBuffer = new StringWriter();
locIsInSyncOrder = false;
}
}
if (locIsInSyncOrder) {
if (locEvent.isStartElement()) {
/* here replaced the standard implementation of startElement's method writeAsEncodedUnicode: */
locXmlTextBuffer.write(startElementToString(locEvent.asStartElement()));
} else {
locEvent.writeAsEncodedUnicode(locXmlTextBuffer);
}
}
}
return locResult;
}
private static String startElementToString(StartElement aStartElement) {
StringBuilder locStartElementBuffer = new StringBuilder();
// open element
locStartElementBuffer.append("<");
String locNameAsString = null;
if ("".equals(aStartElement.getName().getNamespaceURI())) {
locNameAsString = aStartElement.getName().getLocalPart();
} else if (aStartElement.getName().getPrefix() != null
&& !"".equals(aStartElement.getName().getPrefix())) {
locNameAsString = aStartElement.getName().getPrefix()
+ ":" + aStartElement.getName().getLocalPart();
} else {
locNameAsString = aStartElement.getName().getLocalPart();
}
locStartElementBuffer.append(locNameAsString);
// add any attributes
Iterator<Attribute> locAttributeIterator = aStartElement.getAttributes();
Attribute attr;
while (locAttributeIterator.hasNext()) {
attr = locAttributeIterator.next();
locStartElementBuffer.append(" ");
locStartElementBuffer.append(attributeToString(attr));
}
// add any namespaces
Iterator<Namespace> locNamespaceIterator = aStartElement.getNamespaces();
Namespace locNamespace;
while (locNamespaceIterator.hasNext()) {
locNamespace = locNamespaceIterator.next();
locStartElementBuffer.append(" ");
locStartElementBuffer.append(attributeToString(locNamespace));
}
// close start tag
locStartElementBuffer.append(">");
// return StartElement as a String
return locStartElementBuffer.toString();
}
private static String attributeToString(Attribute aAttr) {
if( aAttr.getName().getPrefix() != null && aAttr.getName().getPrefix().length() > 0 )
return aAttr.getName().getPrefix() + ":" + aAttr.getName().getLocalPart() + "='" + aAttr.getValue() + "'";
else
return aAttr.getName().getLocalPart() + "='" + aAttr.getValue() + "'";
}
public static SyncOrderType unmarshallSyncOrderDomNodeToCo(
Node aSyncOrderItemNode) {
Source locSource = new DOMSource(aSyncOrderItemNode);
Object locUnmarshalledObject = getMarshallerAndUnmarshaller().unmarshal(locSource);
SyncOrderType locCo = ((JAXBElement<SyncOrderType>) locUnmarshalledObject).getValue();
return locCo;
}

Java Plist XML Parsing

I'm parsing a (not well formed) Apple Plist File with java.
My Code looks like this:
InputStream in = new FileInputStream( "foo" );
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLEventReader parser = factory.createXMLEventReader( in );
while (parser.hasNext()){
XMLEvent event = parser.nextEvent();
//code to navigate the nodes
}
The parts I'm parsing look like this:
<dict>
<key>foo</key><integer>123</integer>
<key>bar</key><string>Boom &amp; Shroom</string>
</dict>
My problem is that nodes containing an ampersand are not parsed the way they should be, because the ampersand represents an entity.
What can I do to get the value of the node as one complete String instead of broken parts?
Thank you in advance.

You should be able to solve your problem by setting the IS_COALESCING property on the XMLInputFactory (I also prefer XMLStreamReader over XMLEventReader, but ymmv):
XMLInputFactory factory = XMLInputFactory.newInstance();
factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
InputStream in = // ...
XMLStreamReader xmlReader = factory.createXMLStreamReader(in, "UTF-8");
Incidentally, to the best of my knowledge none of the JDK parsers will handle "not well formed" XML without choking. Your XML is, in fact, well-formed: it uses an entity rather than a raw ampersand.
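To see the effect of IS_COALESCING in isolation, the following sketch reads the first text event of a small document. The class name CoalescingSketch and the method firstText are invented for this example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CoalescingSketch {

    // Returns the text of the first CHARACTERS event, or null if the document has none.
    static String firstText(String xml, boolean coalesce) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // with coalescing on, adjacent character data and entity text arrive as one event
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalesce);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    return reader.getText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }
}
```

With coalescing enabled, firstText("<string>Boom &amp; Shroom</string>", true) returns the whole value in one piece. Without it, a parser is free to split the text at the entity, which is the behavior described in the question.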

There is a predefined method getElementText(), which is buggy in jdk1.6.0_15, but works ok with jdk1.6.0_19. A complete program to easily parse the plist file is this:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
public class Parser {
public static void main(String[] args) throws XMLStreamException, IOException {
InputStream in = new FileInputStream("foo.xml");
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLEventReader parser = factory.createXMLEventReader(in);
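// call the parser outside of assert: with assertions disabled the call would be skipped
XMLEvent first = parser.nextEvent();
assert first.isStartDocument();
XMLEvent event = parser.nextTag();
assert event.isStartElement();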
final String name1 = event.asStartElement().getName().getLocalPart();
if (name1.equals("dict")) {
while ((event = parser.nextTag()).isStartElement()) {
final String name2 = event.asStartElement().getName().getLocalPart();
if (name2.equals("key")) {
String key = parser.getElementText();
System.out.println("key: " + key);
} else if (name2.equals("integer")) {
String number = parser.getElementText();
System.out.println("integer: " + number);
} else if (name2.equals("string")) {
String str = parser.getElementText();
System.out.println("string: " + str);
}
}
}
XMLEvent last = parser.nextEvent();
assert last.isEndDocument();
}
}

The dd-plist library enables your Java application to handle property lists of various formats.
Read / write property lists from / to files, streams or byte arrays
Convert between property list formats
Property list contents are provided as objects from the NeXTSTEP environment (NSDictionary, NSArray, NSString, etc.)
Serialize native java data structures to property list objects
Deserialize from property list objects to native java data structures
<dependency>
<groupId>com.googlecode.plist</groupId>
<artifactId>dd-plist</artifactId>
<version>1.26</version>
</dependency>
dd-plist
