Escape sequences in XML - java

I'm getting an exception caused by special characters when my XML is accessed by a client.
Can anyone help me?

You need to set the correct encoding, and make sure the XML document is created with the same encoding.
<?xml version="1.0" encoding="INSERT ENCODING HERE"?>
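As a rough illustration of keeping the declared encoding and the actual byte encoding in sync, here is a minimal sketch using the JDK's StAX writer. The class name EncodingDemo and the element name note are made up for the example, and UTF-8 is only an assumption about what you want to emit.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of any library.
final class EncodingDemo {
    // Serializes a single <note> element (illustrative name) as UTF-8 bytes.
    static byte[] writeNote(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // The encoding given here controls how characters become bytes.
            XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            // The same encoding name is written into the XML declaration.
            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("note");
            // writeCharacters escapes markup characters such as & and <.
            w.writeCharacters(text);
            w.writeEndElement();
            w.writeEndDocument();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return out.toByteArray();
    }
}
```

Because one writer produces both the declaration and the bytes, the two cannot drift apart the way they can when the declaration is typed by hand.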

You will need to ensure the special characters are enclosed within CDATA sections:
<![CDATA[
some special characters here
]]>
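If you build the CDATA section from a Java string, one detail to watch is that the text itself must not contain the terminator ]]>. A small sketch, with CdataWrapper being a hypothetical helper name:

```java
// Hypothetical helper, not part of any library.
final class CdataWrapper {
    // Wraps text in a CDATA section. Any "]]>" inside the text is split
    // across two adjacent sections so it cannot end the section early.
    static String wrap(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }
}
```

Note that CDATA only stops characters like < and & from being read as markup. It does nothing for a wrong encoding declaration.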

I found my mistake. In my case, opening or reading the XML failed because of three symbols, so the characters <, > and & need to be replaced with the entity references &lt;, &gt; and &amp;. Only then does the parsing error go away.
In other scenarios too, the symbols need to be replaced with their entity names; only then will the parsing exception disappear.
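The replacement described above could be sketched like this in Java. XmlEscaper is a hypothetical name, and the handling of the two quote characters is an addition for attribute values that the answer itself does not mention.

```java
// Hypothetical helper, not part of any library.
final class XmlEscaper {
    // Replaces XML's special characters with the predefined entity references.
    // Apply this to text content or attribute values only, never to whole
    // documents, or the tags themselves will be escaped too.
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break; // for attribute values
                case '\'': sb.append("&apos;"); break; // for attribute values
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, XmlEscaper.escape("Test & Test") gives Test &amp;amp; Test in the raw output, which a parser reads back as Test & Test.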

You can include XML's special characters in XPL. XPL is structured exactly like XML but allows the special characters. The XPL to XML conversion utilities will take care of all the details for you. http://hll.nu

Related

how to escape special characters in xml without escaping xml tags(<>) in java

I want to escape special characters in xml input.
I tried StringEscapeUtils.escapeXml10(xmlString) but it ends up escaping xml tags also(<>).
For example:
<Company>Test & Test</Company>
should be normalized to
<Company>Test&amp;Test</Company>
not
&lt;Company&gt;Test&amp;Test&lt;/Company&gt;
You're basically asking how to automatically convert invalid XML to valid XML. That's not a tractable problem, in the general case (imagine for example that you had an embedded < in the actual data).
The correct solution to this problem is to identify why you're starting with invalid XML, and fix that issue.

ignoring & in DOM XML parser

I am trying to parse an XML file using a SAX parser. But when it finds an & it gives me the error "The entity name must immediately follow the '&' in the entity reference." How can I make the parser ignore '&' while parsing or, if possible, convert it into &amp; from the DTD itself?
Your input is not valid XML, since it seems to contain & characters which are not followed by an entity name or character reference.
The cleanest way to solve this is to make sure that the input is valid XML before you parse it, i.e. replace the offending & characters with &amp;.
I don't think you can convince any decent XML parser to silently ignore XML syntax errors.
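If fixing the producer is not possible, a pre-processing pass along these lines might do as a stopgap. This is a heuristic sketch, not a general repair: AmpersandFixer is a made-up name, and the pattern assumes that any & followed by something shaped like name; or #123; or #x1F; is an intentional reference.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class AmpersandFixer {
    // Matches an '&' that does not start a named entity (&name;),
    // a decimal reference (&#38;) or a hex reference (&#x26;).
    private static final Pattern STRAY_AMP = Pattern.compile(
        "&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Rewrites stray ampersands as &amp; and leaves existing references alone.
    static String fix(String xml) {
        return STRAY_AMP.matcher(xml).replaceAll("&amp;");
    }
}
```

Run it on the raw text before handing it to the parser. It will guess wrong on data such as R&D; that happens to look like an entity reference, and it does nothing about stray < characters.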
Find the person/entity responsible for producing the invalid XML input
Make sure that person/entity never again in his/her/its life will ever be capable of producing invalid XML again
Repeat for any new offender
Use of unnecessary violence in the apprehension of the XML villains HAS been approved
Or, you can just resign and use TagSoup or something similar.

Escape special characters/Symbols in XML?

While creating an XML file from a table in my DB, I got many special characters like registered trademark, trademark, degree, different punctuation, etc. (these are present in symbol form, hexadecimal, name code, number code), and some others like °, ...
Also, some characters are shown as x99, xEA, etc. in my XML.
Is there a library/API to handle all these while creating XML using Java code?
I am using "UTF-8" character encoding for my XML.
Also, I can't clean my DB to have consistent data since it's production data.
A potential option is to enclose your data in CDATA tags, which marks the data as character data that may include markup, but should not be processed as such.
There is a free command line tool for transforming files with special characters in text to valid XML. It also assures that the file encoding matches what is specified in the declaration.
There is also a Java developer suite that allows you to use the parser to parse such files (called XPL) as an alternative to XML or a pre-process into XML. It uses a StAX-like process called StAX-PL.

Escaping an xml string in java

I read elements with CDATA sections from a rss-feed which I need to convert to valid xml. The content in the CDATA section is mostly valid xhtml, but some times characters like ampersand appear in attributes (url's).
I can use .replaceAll("&", "&amp;") to solve this, but thinking a bit ahead, other invalid characters may show up in attributes or text.
The CMS to which I'm importing the element, won't accept CDATA sections without setting up another configuration for the content, so my question is: is there any simple way to escape the string, only for attributes and text?
I'm using the jdom library to manipulate the xml after the import.
Edit: I've checked out apache's StringEscapeUtils, but this is escaping the whole string. I need something that will only escape attribute values and text inside elements.
Apache Commons provides handy functions for this: StringEscapeUtils
When you use JDOM it will automatically and correctly escape any content that needs it. Is your CMS loaded with the output of JDOM, or are you using some other library to populate the CMS...?
In essence, if you have valid XML input, and you use JDOM (something from org.jdom2.output.*) to output the data, then you will always have good output.... so, what are you doing to have broken output?
Rolf

Handling special characters in XML when transforming with Saxon

I'm attempting to apply a stylesheet to an XML document using Saxon. Given an XML file that was generated in Microsoft Word and that has Microsoft Word-style quotes, such as around FOO in the following document
<?xml version="1.0" encoding="UTF-8"?>
<doc>
<act>
<performer typeCode=“FOO“ />
<performer typeCode="BAR" />
</act>
</doc>
Saxon throws the following error:
SXXP0003: Error reported by XML parser: Invalid byte 1 of 1-byte UTF-8 sequence.
What is the best way to handle these type of "special" characters in XML that were intended to be valid but break in actual parsing/transformation?
Since the above is not valid XML, you will have to do some preprocessing of the input (say with a FilterReader), as just about any XML parser will indicate an error (and typically a fatal error, so you cannot handle the error and continue).
If the special quotes are only in the xml you can do a simple replace of the special quotes with plain quotes (a little more work if you have to check the preamble for the encoding type). If you want to keep special quotes elsewhere in the document you will have to do something a bit more complicated (mostly keep track as to whether you are in a tag or not).
The trouble is that those 'special' quotes are not valid XML. Saxon or any other XML parser is going to reject them and not parse the document.
The only thing I can suggest is a search and replace that swaps them for the expected quotes.
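A search and replace of that kind could look roughly like the sketch below. QuoteNormalizer is a hypothetical name. The second method rests on an assumption: the "Invalid byte 1 of 1-byte UTF-8 sequence" message suggests the file is not really UTF-8, and windows-1252 is a guess at what Word saved it as.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of any library.
final class QuoteNormalizer {
    // Replaces typographic quotes with plain ASCII quotes everywhere in the
    // string, including text content (not only attribute delimiters).
    static String normalize(String xml) {
        return xml
            .replace('\u201C', '"')   // left double quotation mark
            .replace('\u201D', '"')   // right double quotation mark
            .replace('\u2018', '\'')  // left single quotation mark
            .replace('\u2019', '\''); // right single quotation mark
    }

    // Assumption: the bytes are windows-1252, not UTF-8 as declared.
    static String decodeWindows1252(byte[] raw) {
        return normalize(new String(raw, Charset.forName("windows-1252")));
    }
}
```

The result still has to be handed to the parser as a string, or re-encoded to match the encoding in the XML declaration.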
