Escaping Special Characters with JiBX (Un) Marshalling - java

I want special characters to be escaped during marshalling.
Is there any way to do this?
alt="<i><b> image alt</b></i>"
This is saved as
&lt;b&gt;&lt;i&gt;image alt&lt;/b&gt;&lt;/i&gt;
I want to save the value as it is.

If you store something as XML, you have to escape those characters. Otherwise your XML will become invalid:
<xml>text</xml>
If text == </xml>, the XML will clearly be invalid:
<xml></xml></xml>
This must be:
<xml>&lt;/xml&gt;</xml>
When you unmarshal it, it should become the correct value again.
You may also use CDATA.
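To see the round trip described above, here is a minimal sketch using only the JDK's DOM and Transformer APIs (not JiBX). The class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class EscapeRoundTrip {
    /** Wraps the text in an <xml> element; the serializer escapes markup characters. */
    static String marshal(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("xml");
            root.setTextContent(text);
            doc.appendChild(root);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses the document and returns the unescaped text of the root element. */
    static String unmarshal(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = marshal("</xml>");
        System.out.println(xml);            // the escaped form that is stored
        System.out.println(unmarshal(xml)); // the original value again
    }
}
```

The stored form looks different from the input, but nothing is lost: reading it back yields the original string.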

I thought I'd share my experience, because the answers I found weren't quite comprehensive (and I'm still not sure whether this is the most professional solution out there).
In our project we use maven-jibx-plugin to generate POJOs from XSDs (in two runs, as usual: 1. *.xsd -> binding.xml, then 2. binding.xml -> *.java).
Based on the documentation of the value node and Dennis Sosnoski's answer on the JiBX mailing list, I added xml-maven-plugin to our project's build process. I use it to apply an XSL file to the generated binding.xml before POJO generation. The point is to change the style attribute on the appropriate value node from text to cdata.
So far it seems to have solved my encoding issue, and now I can return XML like this to the client:
<Description><![CDATA[<strong>Valuable content goes here</strong>...<br />]]></Description>
Hope this makes someone's life easier. :)
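For anyone wondering what such a stylesheet might look like, here is a rough sketch: an identity transform with one override that flips style from text to cdata. The binding fragment, the structure name "Description", and the class and field names are assumptions for illustration, not taken from a real generated binding.xml. The Java wrapper is only there so the stylesheet can be tried outside Maven; in the build described above, the XSL would live in its own file and be applied by xml-maven-plugin.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class BindingStyleRewriter {
    // Copies everything unchanged, except style="text" on the value node
    // inside the (assumed) Description structure, which becomes style="cdata".
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"structure[@name='Description']/value/@style[.='text']\">"
        + "<xsl:attribute name='style'>cdata</xsl:attribute>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String rewrite(String bindingXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(bindingXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical binding fragment; a generated binding.xml will differ.
        String binding =
              "<binding><mapping class='com.example.Item' name='Item'>"
            + "<structure name='Description'><value style='text' field='description'/></structure>"
            + "<value style='element' name='Title' field='title'/>"
            + "</mapping></binding>";
        System.out.println(rewrite(binding));
    }
}
```

The match pattern is the part to adapt: it has to select exactly the value nodes whose content should be written as CDATA.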

Related

Camel xpath toLowerCase and contains

We are moving to Camel in our application. I need to process some XML messages (get values, compare statuses). To solve these problems we have a bunch of custom processors written in pure Java, but I was asked to replace them with Camel features.
Example of code:
.choice()
.when().xpath("/Response/Header/Status = 'OK' ")......
This is working fine.
Now I need to compare the hint with some other hint. To do this I need to convert the value of
/Response/Header/Hint
to lower case and check whether it contains a given string.
If the /Response/Header/Hint value, for example
<Hint>MyHint</Hint>
converted to lower case contains "hint", then route to ..., otherwise to ....
I am not an XPath expert, and Camel seems to handle this a little differently, so can you please help me with this?
One more thing I am interested in: how do I remove the whole <Hint>MyHint</Hint> element before passing the message forward (that is, remove some tags)?
And can you recommend a tutorial for getting into XPath for Camel quickly?
You could use fn:lower-case(string) to compare the hint, as explained in "How can I convert a string to upper- or lower-case with XSLT?".
For the removal of the <Hint> tag you have multiple possibilities, such as:
Use XSLT to filter the content as shown in remove xml tags with XSLT
Call a Bean that does the filtering
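For the bean option, the filtering itself can be plain Java on top of the JDK's DOM API. The sketch below is only an illustration: HintFilter and removeHint are made-up names, and the wiring into a route (Camel can invoke such a method through its bean support) is left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class HintFilter {
    /** Returns the message with every <Hint> element removed. */
    static String removeHint(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The NodeList is live, so keep removing the first entry until none are left.
            NodeList hints = doc.getElementsByTagName("Hint");
            while (hints.getLength() > 0) {
                Node hint = hints.item(0);
                hint.getParentNode().removeChild(hint);
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(removeHint(
                "<Response><Header><Status>OK</Status><Hint>MyHint</Hint></Header></Response>"));
    }
}
```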
The answer is this:
.choice()
.when().xpath("/Response/Header/Status/text() = 'OK'")
.to("xslt:xsl/RemoveTag.xsl")
.choice().when().xpath("//Response/Header/Hint[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'hint')]").to
RemoveTag.xsl is a slightly modified version of the stylesheet from "remove xml tags with XSLT".
Many thanks to olivier roger!
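A note on why translate() shows up here: fn:lower-case is an XPath 2.0 function, while the JDK's built-in javax.xml.xpath engine implements XPath 1.0, where translate() is the usual workaround. The expression can be checked outside Camel with a small sketch like this (HintMatcher is a made-up name):

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class HintMatcher {
    // Lower-cases the Hint text with translate() and checks that it contains "hint".
    static final String EXPR =
        "/Response/Header/Hint[contains(translate(., "
        + "'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'hint')]";

    static boolean hintMatches(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // A node-set converts to true when it is non-empty.
            return (Boolean) xpath.evaluate(
                    EXPR, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Response><Header><Status>OK</Status><Hint>MyHint</Hint></Header></Response>";
        System.out.println(hintMatches(xml));
    }
}
```

Keep in mind that translate() with these two alphabets only covers ASCII letters.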

Java fixed field file format

I would like to know whether something like JAXB exists for fixed-field ASCII text formats.
It would be very useful to marshal Java objects to a fixed-field ASCII text file, the way JAXB does with XML.
Thank you.
I found this, and I think it is exactly what I needed. Note that it needs commons-lang, but not the latest version: I had to use the legacy v2.6, otherwise StringUtils was not found.
http://fixedformat4j.ancientprogramming.com/usage/index.html
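To make "fixed field" concrete, here is a hand-rolled sketch of the idea in plain Java. It deliberately does not use fixedformat4j, and the record layout (a 10-character name followed by a 3-digit age) and the class name are invented for illustration. A library replaces this kind of manual padding and slicing with declared field offsets and lengths.

```java
import java.util.Locale;

class FixedWidthPerson {
    final String name; // columns 1-10, left-aligned, space-padded
    final int age;     // columns 11-13, zero-padded (assumed to be 0..999)

    FixedWidthPerson(String name, int age) {
        this.name = name;
        this.age = age;
    }

    /** Marshals the object to one fixed-width line; long names are truncated to 10 chars. */
    String export() {
        return String.format(Locale.ROOT, "%-10.10s%03d", name, age);
    }

    /** Unmarshals a line produced by export(). */
    static FixedWidthPerson load(String line) {
        String name = line.substring(0, 10).trim();
        int age = Integer.parseInt(line.substring(10, 13));
        return new FixedWidthPerson(name, age);
    }

    public static void main(String[] args) {
        String line = new FixedWidthPerson("Ann", 7).export();
        System.out.println("[" + line + "]");
        FixedWidthPerson back = load(line);
        System.out.println(back.name + " " + back.age);
    }
}
```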

Java library to escape/clean XML?

I get some malformed xml text input like:
"<Tag>something</Tag> 8 > 3, 2 < 3, ... <Tag>something</Tag>"
I want to clean the input so as to get:
"<Tag>something</Tag> 8 &gt; 3, 2 &lt; 3, ... <Tag>something</Tag>"
That is, escape special symbols like < and > and yet keep the valid tags (<Tag>something</Tag>; note, with the same case).
Do you know of any Java library to do this? Probably an XML/HTML parser? (Though I don't really need a parser, simply a "clean" procedure.)
JTidy is "HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML"
But it can also be used with XML. Check the documentation. It's incredibly smart; it will probably work for you.
I don't know of any library that would do that. Your input is malformed XML, and no proper XML parser would accept it. More important, it is not always possible to distinguish an actual tag from something that looks-like-a-tag-but-is-really-text. Therefore any heuristic-based attempt that you make to solve the problem will be fragile; i.e. it could occasionally produce malformed XML.
The best approach is to address the problem before you assemble the XML.
If you generate the XML by (for example) unparsing a DOM, then the unparser will take care of the escaping for you.
If you are generating the XML by templating or string bashing, then you need to call something like StringEscapeUtils.escapeXml on the relevant text chunks ... before the XML tags get incorporated.
If you leave the problem until after the "XML" has been assembled, it cannot be properly fixed.
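As an illustration of letting the writer do the escaping, here is a small sketch that uses the JDK's StAX XMLStreamWriter instead of a DOM unparser or StringEscapeUtils. The class and method names are made up.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class EscapeWhileWriting {
    /** Writes one element; writeCharacters escapes the text content as needed. */
    static String element(String tag, String text) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement(tag);
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(element("Tag", "2 < 3 & 4"));
    }
}
```

Because the tags and the text go through separate calls, there is never any doubt about which "<" is markup and which is data.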
The best solution is to fix the program generating your text input. The easiest such fix would involve an escape utility like the other answers suggested. If that's not an option, I'd use a regular expression like
</?[a-zA-Z]+ */?>
to match the expected tags, and then split the string up into tags (which you want to pass through unchanged) and text between tags (against which you want to apply an escape method.)
I wouldn't count on an XML parser to be able to do it for you because what you're dealing with isn't valid XML. It is possible for the existing lack of escaping to produce ambiguities, so you might not be able to do a perfect job either.
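A minimal sketch of that split-and-escape idea, using the regular expression above. The class and method names are made up, and the escaping is a simple hand-written replacement rather than a library call. As noted, it is a heuristic: text that happens to look like a tag is passed through, and input that is already partly escaped would get its ampersands escaped a second time.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagAwareEscaper {
    // Matches simple opening, closing, and self-closing tags without attributes.
    private static final Pattern TAG = Pattern.compile("</?[a-zA-Z]+ */?>");

    /** Escapes the three characters that matter in text content; '&' must go first. */
    static String escapeText(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Passes recognized tags through unchanged and escapes everything between them. */
    static String clean(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(input);
        int last = 0;
        while (m.find()) {
            out.append(escapeText(input.substring(last, m.start())));
            out.append(m.group());
            last = m.end();
        }
        out.append(escapeText(input.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("<Tag>something</Tag> 8 > 3, 2 < 3, ... <Tag>something</Tag>"));
    }
}
```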
Check out Guava's XmlEscaper. It is in pre-release for version 11 but the code is available.
Apache Commons Lang contains a class named StringEscapeUtils which does exactly what you want! The method you'd want to use is escapeXml, I presume.

Expressing markup in Java XML property files: CDATA vs. escaped tags

I am reading and writing Java Properties files in XML format. Many of the property values have HTML embedded, which developers wrap in CDATA sections like so:
<entry key="foo"><![CDATA[
<b>bar</b>
]]></entry>
However, when I use the Java API to load these properties and later write them back to XML, it doesn't wrap these entries in CDATA sections, but rather escapes the tags, like so:
<entry key="foo">&lt;b&gt;bar&lt;/b&gt;</entry>
Are these two formats equivalent? Am I introducing any potential problems by replacing CDATA with escaped tags?
Not equivalent, but the text value you get by calling getText() is the same.
However, I would suggest you abandon Properties in favor of real XML parsed by JAXB. It's awesome; you'll like it.
I didn't find a really nice tutorial, so at least these:
Object -> XML: here
Sun's verbose tutorial: http://java.sun.com/webservices/docs/2.0/tutorial/doc/JAXBUsing.html
When the files are loaded into memory in a Properties object, there is no difference between the two formats you have shown, as Ondra Žižka's answer says. CDATA sections are a way to escape a block of text instead of escaping every character in it.
I would consider the non-XML property file format myself: you will still see the tags in the raw files, but newline characters would need to be escaped.
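This is easy to check with java.util.Properties itself. The sketch below loads the same entry once as a CDATA section and once with escaped tags, then writes it back out; the class and helper names are made up. One caveat: in the example from the question the CDATA section starts and ends on its own line, so those line breaks would be part of the loaded value. Here the CDATA section is kept on one line so that both values are identical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class CdataVsEscaped {
    private static final String HEADER =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n";

    /** Loads a one-entry XML properties document whose "foo" entry has the given body. */
    static Properties load(String entryBody) {
        String xml = HEADER + "<properties><entry key=\"foo\">" + entryBody + "</entry></properties>";
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    /** Writes the properties back out in XML format. */
    static String store(Properties props) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            props.storeToXML(out, null);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Properties fromCdata = load("<![CDATA[<b>bar</b>]]>");
        Properties fromEscaped = load("&lt;b&gt;bar&lt;/b&gt;");
        System.out.println(fromCdata.getProperty("foo").equals(fromEscaped.getProperty("foo")));
        System.out.println(store(fromCdata));
    }
}
```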
Yes, you could be introducing some problems, depending on how the data is used.
For example, if you use it in an HTML page, A<br>B will print as
A
B
But A&lt;br&gt;B will show as
A<br>B

How to generate an *exact* copy of an XML document with resolved entities

Given an XML document like this:
<!DOCTYPE doc SYSTEM 'http://www.blabla.com/mydoc.dtd'>
<author>john</author>
<doc>
<title>&title;</title>
</doc>
I want to parse the above XML document and generate a copy of it with all of its entities already resolved. So given the above XML document, the parser should output:
<!DOCTYPE doc SYSTEM 'http://www.blabla.com/mydoc.dtd'>
<author>john</author>
<doc>
<title>Stack Overflow Madness</title>
</doc>
I know that you could implement an org.xml.sax.EntityResolver to resolve entities, but what I don't know is how to properly generate a copy of the XML document with everything still intact (except its entities). By everything, I mean the whitespaces, the dtd at the top of the document, the comments, and any other things except the entities that should have been resolved previously. If this is not possible, please suggest a way that at least can preserve most of the things (e.g. all but no comments).
Note also that I am restricted to the pure Java API provided by Sun, so no third party libraries can be used here.
Thanks very much!
EDIT: The above XML document is a much simplified version of the original document. The original one involves very complex entity resolution using EntityResolver, whose significance I have greatly reduced in this question. What I am really interested in is how to produce an exact copy of the XML document with an XML parser that uses EntityResolver to resolve the entities.
You almost certainly cannot do this using any XML parser I've heard of, and certainly the Sun XML parsers cannot do it. They will happily discard details that have no significance as far as the meaning of the XML is concerned. For example,
<title>Stack Overflow Madness</title>
and
<title >Stack Overflow Madness</title >
are indistinguishable from the perspective of the XML syntax, and the Sun parsers (rightly) treat them as identical.
I think your choices are to do the replacement treating the XML as text (as @Wololo suggests) or to relax your requirements.
By the way, you can probably use an XmlEntityResolver independently of the XML parser. Or create a class that does the same thing. This may mean that String.replace... is not the answer, but you should be able to implement an ad-hoc expander that iterates over the characters in a character buffer, expanding them into a second one.
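Such an ad-hoc expander could look roughly like the sketch below. It copies the input character by character into a second buffer and swaps in replacement text for known entity references. The class name and the entity map are assumptions for illustration; in the real case the replacement text would come from whatever your EntityResolver logic produces. It is deliberately naive: it does not treat CDATA sections, comments, or the DTD specially, and references it does not know (including predefined ones such as &amp;) are copied through unchanged.

```java
import java.util.HashMap;
import java.util.Map;

class EntityExpander {
    /** Copies xml to a new buffer, replacing &name; for every name found in entities. */
    static String expand(String xml, Map<String, String> entities) {
        StringBuilder out = new StringBuilder(xml.length());
        int i = 0;
        while (i < xml.length()) {
            char c = xml.charAt(i);
            if (c == '&') {
                int end = xml.indexOf(';', i + 1);
                if (end != -1) {
                    String value = entities.get(xml.substring(i + 1, end));
                    if (value != null) {
                        out.append(value);
                        i = end + 1; // skip past the whole reference
                        continue;
                    }
                }
            }
            out.append(c); // everything else is preserved byte for byte
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> entities = new HashMap<>();
        entities.put("title", "Stack Overflow Madness");
        System.out.println(expand("<title >&title;</title >", entities));
    }
}
```

Because nothing is parsed, whitespace, comments, and the DOCTYPE line all survive untouched, which is the property the question asks for.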
Is it possible for you to read in the xml template as a string?
And with the string do something like
String s = "<title>&title;</title>";
s = s.replace("&title;", "Stack Overflow Madness");
saveXml(s); // saveXml is a placeholder for whatever writes the result back out
