I'm confused about how to approach this requirement.
I receive XML as a string from the database and need to find the values of particular elements inside it. My plan was:
1 - Convert the String to an XML document.
2 - Either loop over the XML using DocumentBuilder and NodeList, or use JAXB. Which one is the better option?
I'd definitely recommend JAXB over doing it by hand, but if you're a bit masochistic, it's doable by hand :3
One more option is to use regular expressions, or Groovy :)
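For the DOM route from the question, a minimal sketch using only the JDK (javax.xml.parsers plus javax.xml.xpath) could look like this: parse the string through an InputSource wrapping a StringReader, then let an XPath expression pick out the element instead of looping over NodeLists by hand. The class name, method names and sample XML are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XmlValueLookup {
    // Step 1: turn the XML string from the database into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Step 2: ask XPath for the text value of one element instead of walking NodeLists.
    static String valueOf(Document doc, String xpathExpression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(xpathExpression, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath: " + xpathExpression, e);
        }
    }
}
```

For example, valueOf(parse("<order><id>42</id></order>"), "/order/id") should return "42"; a path that matches nothing yields an empty string rather than an exception.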
What is the most efficient way to search for an element?
Would I have to traverse the complete dom4j document? Should I use XPath here?
I am actually comparing two XML documents: I will iterate through the first XML one element at a time and search for each one in the second XML document.
It is not a straightforward comparison. I would be comparing the name attribute values of the first XML with the elements of the second XML. If the first XML has a name such as name="xx.yy", then I need to look for <xx><yy></yy></xx> in the second XML.
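One way to do this lookup without walking the whole second document by hand is to turn each dotted name into an XPath expression. The sketch below uses the JDK's DOM and XPath APIs rather than dom4j, and assumes that every segment of the name is a plain element name and that <xx> may appear anywhere in the second document (hence the leading //). The helper names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DottedNameMatcher {
    // Parse the second XML document once and reuse it for every lookup.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // "xx.yy" becomes "//xx/yy": an <xx> element anywhere, with a direct <yy> child.
    static String toXPath(String dottedName) {
        return "//" + dottedName.replace('.', '/');
    }

    // True if the second document contains the nested elements named by dottedName.
    static boolean containsPath(Document second, String dottedName) {
        try {
            Object found = XPathFactory.newInstance().newXPath()
                    .evaluate(toXPath(dottedName), second, XPathConstants.BOOLEAN);
            return (Boolean) found;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate " + dottedName, e);
        }
    }
}
```

If many names are checked against the same document, compiling each expression once with XPath.compile() would avoid re-parsing it on every call.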
Maybe you could use Jsoup for this? I don't know what kind of comparison you're after, but with Jsoup you could simply select all nodes from both XMLs and iterate over both collections in one loop.
Jsoup is very effective and easy to use if you need to select arbitrary nodes by their attribute (any attribute), tag name, or content.
The two approaches I usually follow are:
Convert the HTML to a string, and then test it against a target string. The problem with this approach is that it is too brittle, and there will be very frequent false negatives due to, say, extra whitespace somewhere.
Convert the HTML to a string, parse it back as XML, and then use XPath queries to assert on specific nodes. This approach works well, but not all HTML has closing tags, and parsing it as XML fails in such cases.
Both these approaches have serious flaws. I imagine there must be a well-established approach (or approaches) for this sort of tests. What is it?
You could use jsoup or JTidy instead of XML parsing and use your second strategy.
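If the markup is already well-formed (for real-world HTML, the jsoup or JTidy step would take care of that), a whitespace-tolerant variant of the second strategy can be sketched with the JDK alone: parse both strings, drop whitespace-only text nodes, and compare the trees with Node.isEqualNode(), which per the DOM Level 3 spec does not care about attribute order. The class and method names are hypothetical, and this compares whole documents rather than asserting on selected nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class MarkupComparison {
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed markup", e);
        }
    }

    // Recursively drop text nodes that hold only indentation or line breaks.
    static void stripWhitespaceText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceText(child);
            }
            child = next;
        }
    }

    // Same elements, attributes (in any order) and non-whitespace text.
    static boolean sameMarkup(String expected, String actual) {
        Document a = parse(expected);
        Document b = parse(actual);
        stripWhitespaceText(a);
        stripWhitespaceText(b);
        return a.isEqualNode(b);
    }
}
```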
We are moving our application to Camel. I need to process some XML messages (get values, compare statuses). At the moment this is solved by a bunch of custom processors written in plain Java, but I've been asked to replace them with Camel features.
Example of code:
.choice()
.when().xpath("/Response/Header/Status = 'OK' ")......
This is working fine.
Now I need to compare the hint with another value. To do this, I need to convert the value of /Response/Header/Hint to lower case and check whether it contains a given string.
For example, if the lower-cased value of /Response/Header/Hint (e.g. <Hint>MyHint</Hint>) contains "hint", then route to ..., otherwise route to ....
I am not an XPath expert, and Camel seems to have some differences here, so could you please help me with this?
One more thing I'm interested in: how do I remove the whole <Hint>MyHint</Hint> element before passing the message on (i.e. remove some tags)?
And could you recommend a tutorial for getting into XPath with Camel quickly?
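You could use fn:lower-case(string) to compare the hint, as explained in How can I convert a string to upper- or lower-case with XSLT?. Keep in mind that lower-case() is an XPath 2.0 function, so it needs an XPath 2.0 engine; with XPath 1.0, translate() is the usual workaround.
For removing the <Hint> tag you have multiple possibilities, such as: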
Use XSLT to filter the content as shown in remove xml tags with XSLT
Call a Bean that does the filtering
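As a sketch of the XSLT option: the stylesheet below is the usual identity transform plus an empty template for Hint, applied here with the JDK's javax.xml.transform API so it can be tried outside Camel. The class name is hypothetical, and the stylesheet is an assumption about what a filtering stylesheet such as the RemoveTag.xsl used later in this thread might contain, not a copy of it; in a Camel route, a file with this content would be referenced from an xslt: endpoint.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class RemoveHint {
    // Identity transform copies every node and attribute; the empty template swallows <Hint>.
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:template match=\"@*|node()\">"
            + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
            + "</xsl:template>"
            + "<xsl:template match=\"Hint\"/>"
            + "</xsl:stylesheet>";

    static String removeHint(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```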
The answer is this:
.choice()
.when().xpath("/Response/Header/Status/text() = 'OK'")
.to("xslt:xsl/RemoveTag.xsl")
.choice().when().xpath("//Response/Header/Hint[contains(translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'hint')]").to
RemoveTag.xsl is a slightly modified version of the stylesheet from remove xml tags with XSLT.
Many thanks to Olivier Roger!
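The case-insensitive predicate from the route above can be checked outside Camel with the JDK's javax.xml.xpath, which implements XPath 1.0 only; that is why translate() folds A-Z to a-z instead of lower-case(). This is a minimal sketch: the class and method names are hypothetical, and only ASCII letters are folded.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class HintCheck {
    // XPath 1.0 has no lower-case(), so translate() maps each A-Z character to a-z.
    static final String HINT_CONTAINS_HINT =
            "contains(translate(/Response/Header/Hint, "
            + "'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), 'hint')";

    // Would the route take the "hint" branch for this message?
    static boolean hintMatches(String xml) {
        try {
            Object result = XPathFactory.newInstance().newXPath().evaluate(
                    HINT_CONTAINS_HINT, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
            return (Boolean) result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate hint check", e);
        }
    }
}
```

A message without a Hint element evaluates to false, because translate() of an empty node-set yields an empty string.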
I need some help reading an XML document.
I have a Person class and I want to create a List from that XML.
The XML is something like:
<root>
  <field1></field1>
  <field2></field2>
  <field3></field3>
  <Persons>
    <id></id>
    <List>
      <Person>
        <Name>...</Name>
        <LastName>...</LastName>
      </Person>
      <Person>
        <Name>...</Name>
        <LastName>...</LastName>
      </Person>
      <Person>
        <Name>...</Name>
        <LastName>...</LastName>
      </Person>
    </List>
  </Persons>
  <field4></field4>
  <field5></field5>
  <field6></field6>
</root>
I'm using the DOM parser (org.w3c.dom).
Can anyone please show me the best way to get the Persons information?
Thanks
If you only want to read the information, you'd be better off using XPath on it (after loading the DOM). XPath is part of the J2SE API (javax.xml.xpath). Write if you need specific examples.
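A minimal sketch of that XPath approach with the JDK classes, assuming the corrected XML from the question (in particular a properly closed <List>) and a hypothetical Person class with name and lastName fields:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PersonXPathReader {
    // Hypothetical stand-in for the asker's Person class.
    static final class Person {
        final String name;
        final String lastName;

        Person(String name, String lastName) {
            this.name = name;
            this.lastName = lastName;
        }
    }

    static List<Person> readPersons(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate("/root/Persons/List/Person", doc, XPathConstants.NODESET);
            List<Person> persons = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node person = nodes.item(i);
                // Relative paths are evaluated against the current <Person> node.
                persons.add(new Person(xpath.evaluate("Name", person), xpath.evaluate("LastName", person)));
            }
            return persons;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read persons", e);
        }
    }
}
```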
You can use the Simple API for XML (SAX). You may also use the Streaming API for XML (StAX) (tutorial).
I prefer JAXB. It's also part of the J2SE API (bundled with Java SE 6 through 10; since Java 11 it has to be added as a separate dependency).
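For comparison, a StAX-based sketch (javax.xml.stream, also in the JDK) that builds the same list in a single forward pass. It assumes Name and LastName only matter inside a Person element; the Person class here is again a hypothetical stand-in.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

final class PersonStaxReader {
    // Hypothetical stand-in for the asker's Person class.
    static final class Person {
        String name;
        String lastName;
    }

    static List<Person> readPersons(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            List<Person> persons = new ArrayList<>();
            Person current = null; // non-null only while inside a <Person> element
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.END_ELEMENT && "Person".equals(reader.getLocalName())) {
                    current = null;
                } else if (event == XMLStreamConstants.START_ELEMENT) {
                    String tag = reader.getLocalName();
                    if ("Person".equals(tag)) {
                        current = new Person();
                        persons.add(current);
                    } else if (current != null && "Name".equals(tag)) {
                        // getElementText() reads the text and leaves the reader on </Name>.
                        current.name = reader.getElementText();
                    } else if (current != null && "LastName".equals(tag)) {
                        current.lastName = reader.getElementText();
                    }
                }
            }
            reader.close();
            return persons;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read persons", e);
        }
    }
}
```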
Write if you need help.
I hate to just leave this here, but I answered a similar question here.
In Java you have quite a few options for actually parsing the XML. XPath will be the slowest, but gives you a nice expression language to query the content with. DOM will be the second slowest, but gives you an in-memory tree model of your document to walk. SAX will be faster, but requires you to build the list on the fly as it parses through the document. Lastly, StAX will be the fastest, but requires you to write some code specific to your format to build your list.
Lastly, I would recommend a library I wrote called SJXP that gives you the performance of STAX with the ease of XPath... it is the perfect blend of the two.
You write rules like "/root/Persons/List/Person/Name", give it your document, and it will fire every time it hits a Name, calling a user-provided callback and handing you the name it found.
You create a few rules for all the values you want and voilà: you can create a START_TAG rule for the "/root/Persons/List/Person" open tag and create a new Person p = new Person() in your code; then, as each sub-element is hit, you just set the appropriate value on that person, something like this (as an example):
// personList is assumed to be a List<Person> that the START_TAG rule appends to.
IRule nameRule = new DefaultRule(Type.CHARACTER, "/root/Persons/List/Person/Name") {
    @Override
    public void handleParsedCharacters(XMLParser parser, String text, Object userObject) {
        // Get the last person we added on the open tag.
        Person p = personList.get(personList.size() - 1);
        // <Name> tag was parsed, 'text' is our parsed Name. Set it.
        p.setName(text);
    }
};
The nice thing about SJXP is that its memory overhead is lower and its performance higher than the other parser approaches (SAX will parse the elements on a match, and StAX-based parsing doesn't parse the elements out of the stream until they are requested).
With plain DOM, you will end up writing code that is at least as involved just to traverse the document and all its Node elements to build your list.
Lastly, if you are comfortable with XML-to-object mapping, you could do what another person suggested and use JAXB. You can write a schema for your XML files and have JAXB generate Java classes that map perfectly to them. Then you can unmarshal your XML file directly into Java objects and call something like persons.getList(), or whatever JAXB generates for you.
The memory overhead and performance will be on par with DOM parsing in that case (roughly).
XPath is one solution.
If you do not want to use another library, then try defining a DTD that declares an ID attribute; most parsers have a getElementById(id) function.
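A small sketch of the ID approach with the JDK's DOM parser, assuming you are free to add an internal DTD subset and a pid attribute to each Person (both hypothetical additions to the question's XML). getElementById() only finds attributes declared with type ID, which is what the ATTLIST declaration does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class IdLookup {
    // The internal DTD subset declares Person/@pid as type ID; without such a
    // declaration getElementById() has nothing to index and returns null.
    static final String SAMPLE =
            "<!DOCTYPE root [ <!ATTLIST Person pid ID #IMPLIED> ]>"
            + "<root><Persons><List>"
            + "<Person pid=\"p1\"><Name>Ada</Name><LastName>Lovelace</LastName></Person>"
            + "<Person pid=\"p2\"><Name>Alan</Name><LastName>Turing</LastName></Person>"
            + "</List></Persons></root>";

    static Element findById(String xml, String id) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementById(id);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }
}
```

For instance, findById(SAMPLE, "p2") should return the second Person element, and an unknown id returns null.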
Another easy way is to use regular expressions:
Pattern pattern = Pattern.compile("<Person>.*?<Name>(.*?)</Name>.*?<LastName>(.*?)</LastName>.*?</Person>", Pattern.MULTILINE | Pattern.DOTALL);
Matcher matcher = pattern.matcher(xml);
while (matcher.find())
{
String name = matcher.group(1);
String lastName = matcher.group(2);
}
Store name and lastName in your own Person data structure.
Define the compiled Pattern as a constant outside your method, because compiling it takes time.
Please see
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
I'm using a DocumentBuilder to parse XML files. However, the project specification requires that, within text nodes, strings like &quot; and &lt; be returned literally, and not decoded into the characters they represent (" and <).
A previous similar question, Read escaped quote as escaped quote from xml, received one answer that seems to be specific to Apache, and another that appears to simply not do what it says it does. I'd love to be proven wrong on either count, however :)
For reference, here is some code:
File file = new File(fileName);
DocumentBuilderFactory DocBderFac = DocumentBuilderFactory.newInstance();
DocumentBuilder DocBder = DocBderFac.newDocumentBuilder();
Document doc = DocBder.parse(file);
// 'text' holds the name of the tag to look up
NodeList textElmntLst = doc.getElementsByTagName(text);
Element textElmnt = (Element) textElmntLst.item(0);
NodeList txts = textElmnt.getChildNodes();
String txt = txts.item(0).getNodeValue();
System.out.println(txt);
I would like that println() to produce things like
"3&gt;2"
instead of
"3>2"
which is what currently happens.
Thanks!
You can turn them back into xml-encoded form by
StringEscapeUtils.escapeXml(str);
(javadoc, commons-lang)
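If pulling in commons-lang is not an option, a hand-rolled escape for the five predefined XML entities is short. This is a sketch, not a drop-in copy of StringEscapeUtils.escapeXml (whose exact behaviour differs between commons-lang versions), and the class name is hypothetical. Note also that it re-encodes every such character, so it cannot tell whether the original file contained 3&gt;2 or a literal 3>2, which the parser also accepts.

```java
final class XmlEscaper {
    // Re-encode the five characters that have predefined XML entities.
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```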
I'm using a DocumentBuilder to parse XML files. However, the specification for the project requires that within text nodes, strings like &quot; and &lt; be returned literally, and not decoded as characters (" and <).
Bad requirement. Don't do that.
Or at least consider carefully why you think you want or need it.
CDATA sections and escapes are a tactic for allowing you to pass text like quotes and '<' characters through XML and not have XML confuse them with markup. They have no meaning in themselves and when you pull them out of the XML, you should accept them as the quotes and '<' characters they were intended to represent.
One approach might be to try dom4j, and to use the Node.asXML() method. It might return a deep structure, so it might need cloning to get just the node or text you want without any of its children.
Both are good answers, but both are a little too heavyweight for this very small-scale application. I ended up going with the total hack of just stripping out all &s (I do this later anyway to &s that aren't part of escapes). It's ugly, but it works.
Edit: I understand there's all kinds of things wrong with this, and that the requirement is stupid. It's for a school project, all that matters is that it work in one case, and the requirement is not my fault :)