How do I include &, <, > etc in XML attribute values - java

I want to create an XML file that stores the structure of a Java program. I can parse the Java program and create the tags as required. The problem arises when I try to include the source code inside my tags, since Java source code may contain many characters that are reserved in XML, such as &, < and >. I am not able to create valid XML.
My XML should go like this:
<?xml version="1.0"?>
<prg name="prg_name">
<class name="class_name">
<parent>parent class</parent>
<interface>Interface name</interface>
.
.
.
<method name= "method_name">
<statement>the ordinary java statement</statement>
<if condition="Conditional Expression">
<statement> true statements </statement>
</if>
<else>
<statement> false statements </statement>
</else>
<statement> usual control statements </statement>
.
.
.
</method>
</class>
.
.
.
</prg>
The problem is that the conditional expressions of if and other statements contain many & and other reserved symbols, which prevents the XML from being validated. Since all this data (source code) is supplied by the user, I have little control over it. Escaping the characters will be very costly in terms of time.
I can use CDATA to escape the element text, but it cannot be used for attribute values containing conditional expressions. I am using the ANTLR Java grammar to parse the Java program and to get the attributes and content for the tags. Is there any other workaround?

You will have to escape
" to &quot;
' to &apos;
< to &lt;
> to &gt;
& to &amp;
for XML.

In XML attributes you must escape
" with &quot;
< with &lt;
& with &amp;
if you wrap attribute values in double quotes ("), e.g.
<MyTag attr="If a&lt;b &amp; b&lt;c then a&lt;c, it's obvious"/>
meaning tag MyTag with attribute attr with text If a<b & b<c then a<c, it's obvious - note: no need to use &apos; to escape ' character.
If you wrap attribute values in single quotes (') then you should escape these characters:
' with &apos;
< with &lt;
& with &amp;
and you can write " as is.
Escaping of > with &gt; in attribute text is not required, e.g. <a b=">"/> is well-formed XML.
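The double-quote rules above can be sketched in Java as a single pass over the string, which also suggests the cost is linear in the length of the source text. This is a minimal sketch, not a library API: the class name XmlAttributeEscaper and the method escapeForDoubleQuotedAttribute are made up for illustration, and it assumes the input contains no characters that are illegal in XML 1.0 (such as most control characters).

```java
// Hypothetical helper, not part of any library: escapes text for use inside
// a double-quoted XML attribute value.
final class XmlAttributeEscaper {
    private XmlAttributeEscaper() {}

    static String escapeForDoubleQuotedAttribute(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;  // must come out as an entity
                case '<': out.append("&lt;"); break;   // never allowed raw in attributes
                case '"': out.append("&quot;"); break; // would end the attribute value
                default:  out.append(c);               // '>' and '\'' may stay as they are
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String condition = "a < b && name.equals(\"x\")";
        // prints: <if condition="a &lt; b &amp;&amp; name.equals(&quot;x&quot;)"/>
        System.out.println("<if condition=\""
                + escapeForDoubleQuotedAttribute(condition) + "\"/>");
    }
}
```

If the XML is produced through a DOM or StAX writer instead of string concatenation, the API normally performs this escaping itself, so a helper like this is only needed when the markup is assembled by hand.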

Related

How to ignore ' ' in Xpath?

I am trying to figure out how to ignore ' ' in the locator while writing xpath.
For example:
//affiliation-summary[@ng-if='oaProfileStatus.oa.workflowState === 'PENDING_CONFIRMATION'']
What I am trying to do is ignore the ' before P and the ' after N in PENDING_CONFIRMATION, so that XPath doesn't treat them as the start and end of the string.
In case what I am trying to achieve is still not clear, this is what we do in Java
System.out.println("\"Hello");
to keep the " from terminating the String, so that "Hello is printed.
How can I achieve this in Xpath?
In XPath 1.0 you can put ' in a string by using " as the string delimiter, or vice versa. When the XPath is nested in an XML document (e.g. XSLT, XSD, or XProc), you may also need to escape this as an XML character reference, and when the XPath is nested in a Java/C# string you may need to escape it as \' or \".
In XPath 2.0 there is an additional option: for a string delimited by single quotes you can include a doubled single quote ('Don''t do it!'),
and for a string delimited by double quotes you can include a doubled double quote ("Don't say ""I can't""").
A useful trick for XPath-within-XSLT, especially with 1.0, is to use variables, for example
<xsl:variable name="quot">"</xsl:variable>
<xsl:variable name="apos">'</xsl:variable>
<xsl:variable name="sentence"
select="concat('Don', $apos, 't say ', $quot, 'I can', $apos, 't', $quot)"/>
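When the XPath is built in Java code rather than in XSLT, the same ideas (switch the delimiter, or fall back to concat()) can be wrapped in a small helper. The sketch below uses the JDK's javax.xml.xpath API, which implements XPath 1.0. The class XPathQuoting and its methods toXPathLiteral and evaluate are hypothetical names introduced here, and the sample document is invented for the demonstration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class XPathQuoting {
    private XPathQuoting() {}

    // Turns arbitrary text into an XPath 1.0 string expression.
    static String toXPathLiteral(String text) {
        if (!text.contains("'")) return "'" + text + "'";    // no ': single quotes are safe
        if (!text.contains("\"")) return "\"" + text + "\""; // no ": double quotes are safe
        // Both quote kinds occur: split on ' and glue the pieces with concat().
        StringBuilder sb = new StringBuilder("concat(");
        String[] parts = text.split("'", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(", \"'\", ");
            sb.append('\'').append(parts[i]).append('\'');
        }
        return sb.append(')').toString();
    }

    // Evaluates the expression against the XML text and returns its string value.
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String value = "oaProfileStatus.oa.workflowState === 'PENDING_CONFIRMATION'";
        String xml = "<r><affiliation-summary ng-if=\"" + value
                + "\">ok</affiliation-summary></r>";
        String expr = "//affiliation-summary[@ng-if=" + toXPathLiteral(value) + "]";
        System.out.println(expr);
        System.out.println(evaluate(xml, expr));
    }
}
```

For the locator in the question, the value contains only single quotes, so the helper simply wraps it in double quotes; the concat() branch is needed only when both kinds of quote appear in the same string.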

Skipping Html Content in Tag attributes

I am using a SAX parser to parse the following piece of data, in which the "Description" attribute contains HTML content. But I am getting the error "The value of attribute "Description" associated with an element type "null" must not contain the '<' character".
How can I make the SAX parser ignore this tag during XML processing?
<Thread ThreadID="22" Title="google"
Description="http://google.com/"
DisplayName="Sam" LoginID="hjaja" UserEmailID="abx#ers"
UserSapCode="12345"
IsAnonymous="Yes" CreatedDate="2015-04-29T21:56:04.943" ReplyCount="0"
ViewCount="0" PopularityPoints="0" LastUpdatedBy="" LastPostDate="" />
Thanks in advance.
I really think that you should take a look at this post (HTML code inside XML) to see how other people recommend tackling this problem.
No XML parser can parse this data, because the data does not comply with the XML format. Please refer to the XML specification.
There are two ways you can solve this:
Change the source format
Change the source so that it creates proper XML. You can include HTML by escaping the characters as follows:
" &quot;
' &apos;
< &lt;
> &gt;
& &amp;
Change the target algorithm
The second way is to write your own parsing algorithm for your case.
Usually the answer is the first one.

xml mapping error

I am working on a project that contains an XML file (IDE: Eclipse Indigo).
I am facing a problem with a single line:
<?xml version="1.0" encoding="UTF-8"?>
<BookingConfirmRQ xmlns="http://www.expediaconnect.com/EQC/BC/2007/09"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Authentication username="yyyyyyyy" password="xxxxxxxx" />
<Hotel id="<hotelId/>" />
<BookingConfirmNumbers>
<BookingConfirmNumber bookingID="<bookindId/>"
bookingType="<bookingType/>" confirmNumber="<confirmNumber/>"
confirmTime="<confirmTime/>" />
</BookingConfirmNumbers>
</BookingConfirmRQ>
Near <Hotel id="<hotelId/>" /> I am getting this error:
The value of attribute "id" associated with an element type "Hotel" must not contain the '<' character.
I searched for it, checked the JARs and reformatted the file, but I still get the error. Can somebody help me?
Thank you.
You can turn off XML validation in Eclipse under Window > Preferences > Validation; that way you can avoid this error if you don't want to change the file.
Attribute values should only contain literal text:
<Hotel id="134" />
You need to escape the angle brackets in the value of the attribute like this:
<Hotel id="&lt;hotelId/&gt;" />
The same goes for all the other attributes. The angle brackets are on the list of reserved characters that have to be escaped in XML.
Unless you do that, the XML is not well-formed and nothing will process it. Turning off validation - i.e. validation against a DTD or schema - will not help here. The XML has to be well-formed before it can be parsed.
That said, the XML looks very odd, as if you're including whole XML-elements as the value of attributes which is just wrong. So even if you fix the escaping problem this XML may not say what you meant.
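The point that well-formedness is checked regardless of validation can be tried out with the JDK's own parser. The following is only a sketch: WellFormednessCheck and isWellFormed are names invented for this example, and the two Hotel elements are reduced versions of the line from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, for illustration only.
final class WellFormednessCheck {
    private WellFormednessCheck() {}

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // already the default: no DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the document
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Raw '<' inside an attribute value: rejected even with validation off.
        System.out.println(isWellFormed("<Hotel id=\"<hotelId/>\" />"));       // false
        // Escaped value: accepted.
        System.out.println(isWellFormed("<Hotel id=\"&lt;hotelId/&gt;\" />")); // true
    }
}
```

Validation against a DTD or schema is a separate, optional step that runs on top of parsing, which is why switching it off in the IDE cannot make a raw < in an attribute acceptable to a parser.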

how to use <,> tags in java when create xml file

I want to put an HTML tag in XML. I'm using CDATA, which works in XML, but when I create the XML file with Java, the < and > characters come out as "&lt;" and "&gt;". I don't understand this situation.
String returnUrl="<![CDATA[ac=S<br/>DNbZCQOijAl6HrAAyyGV]]>";
Node returnUrlNode = doc.createElement("returnurl");
returnUrlNode.setTextContent(returnUrl);
userNode.appendChild(returnUrlNode);
If for whatever reason you want the text to be in a CDATA section and not a simple text node, you'll need to create the CDATA yourself. I'm assuming you're using the DOM and not some API that looks similar, so it would be:
Node returnUrlNode = doc.createElement("returnurl");
returnUrlNode.appendChild(
doc.createCDATASection(
"Whatever text you wanted to go in here, including unescaped < and >."));
Note that like SLaks pointed out, when the DOM is serialized, all escaping will happen automatically. (In this case, that means that the <![CDATA[ and ]]> will be added automatically.) This is just how you'd create an actual CDATA section if you need the output to be a CDATA section and not a normal text node.
The Java XML APIs will automatically escape your content.
You can just write .setTextContent("ac=S<br/>DNbZCQOijAl6HrAAyyGV"), and Java will escape the < and > for you.
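The automatic escaping can be observed by building a small DOM and serializing it with the JDK's Transformer. This is a rough sketch under stated assumptions: the class DomEscapingDemo, the method build, and the attribute name "condition" are made up for the example, and the exact serialized form (for instance whether > is also written as an entity) may vary between serializer implementations.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, for illustration only.
final class DomEscapingDemo {
    private DomEscapingDemo() {}

    // Builds <user><returnurl condition="...">...</returnurl></user> from raw,
    // unescaped strings and returns the serialized XML.
    static String build(String attrValue, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element user = doc.createElement("user");
            doc.appendChild(user);

            Element returnUrl = doc.createElement("returnurl");
            returnUrl.setAttribute("condition", attrValue); // raw value, no manual escaping
            returnUrl.setTextContent(text);                 // raw text, no manual escaping
            user.appendChild(returnUrl);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("a < b && c", "ac=S<br/>DNbZCQOijAl6HrAAyyGV"));
    }
}
```

The same mechanism covers attribute values, so code that builds the document through the DOM does not need to escape conditional expressions by hand.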
You need to use the XML escape characters:
& &amp;
< &lt;
> &gt;
" &quot;
' &apos;

Java, UnmarshallingException caused by XML attribute with special chars: ;ìè+òàù-<^èç°§_>!£$%&/()=?~`'#;

my xml file has a tag with an attribute "containsValue" which contains the "special" characters you can see in the subject:
<original_msg_body id="msgBodySpecialCharsRule" containsValue=";ìè+òàù-<^èç°§_>!£$%&/()=?~`'#;" />
in my xml schema the attribute has xs:string:
<xs:attribute name="containsValue" type="xs:string" />
I use this value inside a Java program which checks whether it is contained in another String.
but I always obtain this Exception:
javax.xml.bind.UnmarshalException
- with linked exception:
[org.xml.sax.SAXParseException: The value of attribute "containsValue" associated with an element type "original_msg_body" must not contain the '<' character.]
How can I solve it? I've tried changing the attribute type to xs:NMTOKEN, but I get the same exception. Is there any other type?
I think I could change the character encoding, for example by using the HTML representation, like &lt;, but that could be tricky for the string comparison...
Use entity references: replace < with &lt;, > with &gt;, etc. in your XML document. Your XML parser will then handle the conversion between the actual character and its entity reference. That is, in your code you get the actual < or > character.
You need to escape special XML characters like <, >, " with &lt;, &gt;, &quot;.
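That the parser hands back the real characters, so the string comparison is unaffected, can be checked with a few lines of DOM code. This is a sketch only: AttributeRoundTrip and readAttribute are hypothetical names, plain DOM parsing stands in for the JAXB unmarshalling used in the question, and the attribute value is an ASCII-only subset of the one in the question to keep the example independent of source file encoding.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, for illustration only.
final class AttributeRoundTrip {
    private AttributeRoundTrip() {}

    // Parses the XML text and returns the named attribute of the root element.
    static String readAttribute(String xml, String attributeName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute(attributeName);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // In the file, < > and & are written as &lt; &gt; and &amp;
        String xml = "<original_msg_body id=\"msgBodySpecialCharsRule\" "
                + "containsValue=\";-&lt;^_&gt;!$%&amp;/()=?~`'#;\" />";
        String value = readAttribute(xml, "containsValue");
        // The parser has already turned the entities back into real characters.
        System.out.println(value);
        System.out.println("text ;-<^_>!$%&/()=?~`'#; more".contains(value));
    }
}
```

Only the file on disk carries the entity references; once parsed, the value holds the actual characters and can be compared directly with other Java strings.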
