I need to parse an XML document stored in the database and search it for a given expression. If the expression is present in the XML, I must return a String value; otherwise I need to evaluate the next expression and return another String value, and so on.
I achieved this by using the following code:
// An xml document is passed as a Node when getEntryType() method is called
public static class XMLTextFields {
public static String getEntryType(Node target) throws XPathExpressionException {
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String entryType = null;
String [] expression = new String [] {"./libx:package", "./libx:libapp", "./libx:module"};
String [] type = new String [] {"Package", "Libapp", "Module" };
for (int i = 0; i < 3; i ++) {
XPathExpression expr = xpath.compile(expression[i]);
Object result = expr.evaluate(target, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
if (nodes.getLength() == 0)
continue;
entryType = type[i];
break; // stop at the first expression that matches
}
return entryType;
}
}
Is there a simpler way to do this? That is, can I use the expression itself like a function that returns a string when the expression matches the XML?
I am guessing I should be able to do something like this, but I am not exactly sure:
String [] Expression = new String [] {"[./libx:package]\"Package\"", ....}
That is, return "Package" if a libx:package node exists in the given XML.
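A minimal sketch of the first-match loop from the question, made runnable. As far as I know, the JDK's XPath implementation refuses to compile an expression with an unbound prefix such as libx:, so the XPath object needs a NamespaceContext. The URI urn:libex is an assumption borrowed from the sample file later in this thread (use your real one), and EntryTypeLookup, TYPES and parse are illustrative names, not an existing API.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class EntryTypeLookup {
    // ASSUMPTION: namespace URI bound to the libx prefix (from the sample file in this thread).
    static final String LIBX_NS = "urn:libex";

    // Insertion-ordered map of expression -> type; expressions are tried in this order.
    static final Map<String, String> TYPES = new LinkedHashMap<String, String>();
    static {
        TYPES.put("./libx:package", "Package");
        TYPES.put("./libx:libapp", "Libapp");
        TYPES.put("./libx:module", "Module");
    }

    static String getEntryType(Node target) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the libx prefix so the expressions can be compiled.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "libx".equals(prefix) ? LIBX_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return LIBX_NS.equals(uri) ? "libx" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return LIBX_NS.equals(uri)
                        ? Collections.singletonList("libx").iterator()
                        : Collections.<String>emptyList().iterator();
            }
        });
        for (Map.Entry<String, String> entry : TYPES.entrySet()) {
            // A node-set converted to boolean is true when it is non-empty.
            Boolean present = (Boolean) xpath.evaluate(entry.getKey(), target, XPathConstants.BOOLEAN);
            if (present) {
                return entry.getValue(); // first match wins
            }
        }
        return null; // none of the expressions matched
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // prefixed steps only match namespaced elements
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }
}
```

Using an ordered map keeps the expression and its label together, so adding a fourth type is a single put call.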
If your XPath processor supports XPath 2.0, you can use if expressions (see http://www.w3.org/TR/xpath20/#id-conditionals).
You can use XSLT here. In XSLT you can check the node name by using
<xsl:value-of select="*[starts-with(name(),'libx:package')]" />
Or you can check it with:
<xsl:if test="name()='libx:package'">
<!-- Your custom elements here... -->
</xsl:if>
You can check for the existence of an element or attribute the same way to suit your specific needs.
Hope this helps.
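To call an XSLT check like this from Java without extra libraries, the JDK's javax.xml.transform API can run a stylesheet directly. A minimal sketch, assuming the libx prefix maps to urn:libex (taken from the sample file in this thread) and that the first matching xsl:when should decide; the stylesheet and the XsltEntryType name are illustrative:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltEntryType {
    // Hypothetical stylesheet: writes the label of the first matching child of the root.
    // ASSUMPTION: libx is bound to urn:libex.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:libx='urn:libex'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/*'>"
        + "<xsl:choose>"
        + "<xsl:when test='libx:package'>Package</xsl:when>"
        + "<xsl:when test='libx:libapp'>Libapp</xsl:when>"
        + "<xsl:when test='libx:module'>Module</xsl:when>"
        + "</xsl:choose>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Returns the label, or "" when no xsl:when matched.
    static String entryType(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

The order of the xsl:when branches, not document order, decides which label wins when several children are present.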
Yes, there is: just use XPath functions in your expression:
XPathExpression exp = xpath.compile("local-name(*[local-name() = 'package'])");
// will return "package" for any matching elements
exp.evaluate(target, XPathConstants.STRING);
But this returns "package" rather than "Package" (note the capital P), so you still need to map the result to the label you want.
Below is the Test code:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import java.util.Map;
import java.util.HashMap;
public class Test {
private static Map<String, String> mappings = new HashMap<String, String>();
static {
mappings.put("package", "Package");
mappings.put("libapp", "Application");
mappings.put("module", "Module");
}
public static void main(String[] args) throws Throwable {
XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();
String entryType = null;
XPathExpression [] expressions = new XPathExpression[] {
xpath.compile("local-name(*[local-name() = 'package'])"),
xpath.compile("local-name(*[local-name() = 'libapp'])"),
xpath.compile("local-name(*[local-name() = 'module'])")
};
DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = fac.newDocumentBuilder();
Document doc = parser.parse(args[0]);
for(int i = 0; i < expressions.length; i++) {
String found = (String) expressions[i].evaluate(doc.getDocumentElement(),
XPathConstants.STRING);
entryType = mappings.get(found);
if(entryType != null && !entryType.trim().isEmpty()) {
break;
}
}
System.out.println(entryType);
}
}
Contents of the XML file passed as args[0]:
<?xml version="1.0"?>
<root xmlns:libx="urn:libex">
<libx:package>mypack</libx:package>
<libx:libapp>myapp</libx:libapp>
<libx:module>mymod</libx:module>
</root>
In XPath 1.0 you can use:
concat(translate(substring(local-name(libx:package|libx:libapp|libx:module),
1,
1),
'plm',
'PLM'),
substring(local-name(libx:package|libx:libapp|libx:module),2))
EDIT: It was difficult to work out the path because no input sample was provided.
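A minimal sketch of evaluating the XPath 1.0 expression above from Java. It assumes the libx prefix is bound to urn:libex (as in the sample file in this thread) and that the expression is evaluated with the root element as the context node; CapitalizedEntryType is an illustrative name. Because local-name() of a node-set uses the first node in document order, document order (not a fixed priority list) decides the result here.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CapitalizedEntryType {
    // The expression from the answer, joined onto two lines.
    static final String EXPR =
        "concat(translate(substring(local-name(libx:package|libx:libapp|libx:module), 1, 1), 'plm', 'PLM'),"
        + " substring(local-name(libx:package|libx:libapp|libx:module), 2))";

    static String entryType(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // ASSUMPTION: libx is bound to urn:libex, as in the sample file.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "libx".equals(prefix) ? "urn:libex" : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return "urn:libex".equals(uri) ? "libx" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return "urn:libex".equals(uri)
                        ? Collections.singletonList("libx").iterator()
                        : Collections.<String>emptyList().iterator();
            }
        });
        // Returns "" when none of the three elements is a child of the root.
        return xpath.evaluate(EXPR, doc.getDocumentElement());
    }
}
```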
@ALL: Thanks!
I used:
XPathExpression expr = xpath.compile("concat(substring('Package',0,100*boolean(//libx:package))," + "substring('Libapp',0,100*boolean(//libx:libapp)),substring('Module',0,100*boolean(//libx:module)))");
expr.evaluate(target, XPathConstants.STRING);
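The expression above works because, in XPath 1.0, multiplying a boolean converts it to a number (true is 1, false is 0), and substring(s, 0, 100) keeps the characters at positions 1 to 99 while substring(s, 0, 0) is empty. A minimal sketch that checks this, simplified to unprefixed element names so no NamespaceContext is needed (SubstringSwitch is an illustrative name). Note that, unlike a first-match loop, this concatenates the labels when several elements are present.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SubstringSwitch {
    // 100 * boolean(...) is 100 when the node exists and 0 otherwise, so each
    // substring() yields either the whole literal or the empty string.
    static final String EXPR =
        "concat(substring('Package', 0, 100 * boolean(//package)),"
        + " substring('Libapp', 0, 100 * boolean(//libapp)),"
        + " substring('Module', 0, 100 * boolean(//module)))";

    static String entryType(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate(EXPR, doc);
    }
}
```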
I have the following situation: I have an XML-like string, and I would like to search it for the second car_id (<car_id>12345678</car_id>) and get its value;
if that is not available, I want to search for car_type (<car_type id_1="2" id_2="32">55555</car_type>) instead.
I tried the code below, but it is not working properly. Is there a better way to search these strings? Thanks, and thanks to Stack Overflow.
Strings:
<car_dealer><car_id>2</car_id></car_dealer><car><car_id>12345678</car_id></car>
<car_dealer><car_id>2</car_id></car_dealer><car><car_type id_1="2" id_2="32">55555</car_type></car>
Code:
String carId = input.substring(input.lastIndexOf("<car_id>")+9, input.lastIndexOf("</car_id>"))
String carType= input.substring(input.indexOf("<car><car_type>")+84, input.indexOf("</car>"))
I would strongly discourage parsing those strings with regex or indexOf trickery; such code is bound to break at some point.
If your strings, which really, really look like XML, actually are XML, you could parse the values using XPath, something like this:
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class A {
public static void main(String[] args) throws Exception {
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
String xml1 = "<xml><car_dealer><car_id>2</car_id></car_dealer><car><car_id>12345678</car_id></car></xml>";
String xml2 = "<xml><car_dealer><car_id>2</car_id></car_dealer><car><car_type id_1=\"2\" id_2=\"32\">55555</car_type></car></xml>";
Document doc1 = stringToDom(xml1);
Document doc2 = stringToDom(xml2);
XPathExpression expr1 = xpath.compile("//car/car_id/text()");
String carId = (String) expr1.evaluate(doc1, XPathConstants.STRING);
XPathExpression expr2 = xpath.compile("//car/car_type/text()");
String carType = (String) expr2.evaluate(doc2, XPathConstants.STRING);
System.out.println("***");
System.out.println("carId: " + carId);
System.out.println("carType: " + carType);
System.out.println("***");
/* prints
***
carId: 12345678
carType: 55555
***
*/
}
public static Document stringToDom(String xmlSource) throws SAXException,
ParserConfigurationException, IOException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse(new InputSource(new StringReader(xmlSource)));
}
}
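The question also asks for a fallback: use the car_id inside <car> if present, otherwise the car_type. A minimal sketch of that with the same JAXP classes, assuming (as the answer does) that each string can be wrapped in an extra root element, and treating an empty car_id the same as a missing one; CarLookup is an illustrative name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CarLookup {
    // Returns the text of <car>/<car_id>, or of <car>/<car_type> when car_id is missing or empty.
    static String carIdOrType(String fragment) throws Exception {
        // The strings in the question have two top-level elements, so wrap them in a root.
        String xml = "<xml>" + fragment + "</xml>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // evaluate(String, Object) returns "" when nothing matches.
        String carId = xpath.evaluate("/xml/car/car_id", doc);
        if (!carId.isEmpty()) {
            return carId;
        }
        return xpath.evaluate("/xml/car/car_type", doc);
    }
}
```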
You can find the car id using pattern matching, as below. Hope this helps.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String xmlString = "<car_id>12345678</car_id>afhkjasd<car_id>123456789</car_id>";
Pattern pattern = Pattern.compile("(<car_id>)([0-9]{0,})(</car_id>)");
Matcher matcher = pattern.matcher(xmlString);
while (matcher.find()) {
System.out.println(matcher.group(2));
}
indexOf will do the job:
Original:
<car_dealer><car_id>2</car_id></car_dealer><car><car_id>12345678</car_id></car>
<car_dealer><car_id>2</car_id></car_dealer><car><car_type id_1="2" id_2="32">55555</car_type></car>
indexOf solution:
String car_id = input.substring(input.indexOf("<car><car_id>") + "<car><car_id>".length(), input.indexOf("</car_id></car>"));
Do the same for the other tags.
Good luck!
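If you do stay with indexOf, it helps to check for -1 (the second string has no car_id, so the one-liner would throw) and to skip attributes in the start tag, since car_type carries id_1 and id_2. A minimal sketch under those assumptions; textAfter is a hypothetical helper, and it is still fragile: "<" + tag also matches longer tag names with the same prefix.

```java
class IndexOfLookup {
    // Returns the text of the first <tag ...>...</tag> that starts after the first
    // <parent> start tag, or null if either is missing. The start tag may carry attributes.
    static String textAfter(String input, String parent, String tag) {
        int parentStart = input.indexOf("<" + parent + ">");
        if (parentStart < 0) {
            return null;
        }
        int open = input.indexOf("<" + tag, parentStart);
        if (open < 0) {
            return null;
        }
        int gt = input.indexOf('>', open); // end of the start tag, after any attributes
        if (gt < 0) {
            return null;
        }
        int close = input.indexOf("</" + tag + ">", gt);
        if (close < 0) {
            return null;
        }
        return input.substring(gt + 1, close);
    }
}
```

For the question's fallback, call textAfter(input, "car", "car_id") first and use textAfter(input, "car", "car_type") when it returns null.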
I have to extract some integers from a tag in some HTML code.
For example, if I have:
<tag blabla="title"><a href="/test/tt123"> TEST 1 <tag>
I did that by removing all non-digit characters, and it worked until the title also contained a digit, so I got "1231".
str.replaceAll("[^\\d.]", "");
How can I extract only the integer "123"? Thanks for your help!
Jsoup is a good API for working with HTML. Using it, you could do:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
String html = "<tag blabla=\"title\"><a href=\"/test/tt123\"> TEST 1 <tag>";
Document doc = Jsoup.parseBodyFragment(html);
String value = doc.select("a").get(0).attr("href").replaceAll("[^\\d.]", "");
System.out.println(value);
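If adding Jsoup is not an option, a regex restricted to the href value avoids picking up digits from the link text. A minimal sketch, assuming the number you want is the run of digits at the end of the href (as in /test/tt123); HrefDigits is an illustrative name, and a regex is only reasonable for simple, predictable markup like this, not for HTML in general.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefDigits {
    // Captures the digits that end an href value, e.g. href="/test/tt123" -> "123".
    // The reluctant [^"]*? makes the digit group start as early as possible, so all
    // trailing digits are captured (a greedy [^"]* would leave only the last digit).
    private static final Pattern HREF_DIGITS = Pattern.compile("href=\"[^\"]*?(\\d+)\"");

    static String digitsInHref(String html) {
        Matcher m = HREF_DIGITS.matcher(html);
        return m.find() ? m.group(1) : null; // null when no href ends in digits
    }
}
```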
You could do this (a method that removes duplicate digits from any number):
import java.util.LinkedHashSet;
import java.util.Set;
// str is the digits-only string, e.g. "1231"
int[] foo = new int[str.length()];
for (int i = 0; i < str.length(); i++) {
foo[i] = Character.getNumericValue(str.charAt(i));
}
// LinkedHashSet drops duplicates but keeps the order in which digits first appear
Set<Integer> set = new LinkedHashSet<Integer>();
for (int i = 0; i < foo.length; i++) {
set.add(foo[i]);
}
Now you have a set in which all duplicate digits are removed. I only saw your last comment now, so this answer might not be very useful to you. Alternatively, you could take the first three digits in the foo array, which will give you 123.
First use XPath to extract only the href value, then apply your replaceAll to that value.
And you don't have to download any additional frameworks or libraries for this to work.
Here's a quick demo class on how this works:
package com.example.test;
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;
public class Test {
public static void main(String[]args){
String xml = "<tag blabla=\"title\"><a href=\"/test/tt123\"> TEST 1 </a></tag>";
XPath xPath = XPathFactory.newInstance().newXPath();
InputSource source = new InputSource(new StringReader(xml));
String hrefValue = null;
try {
hrefValue = (String) xPath.evaluate("//@href", source, XPathConstants.STRING);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
String numbers = hrefValue.replaceAll("[^\\d.]", "");
System.out.println(numbers);
}
}
I have this XML file:
<root>
<node1>
<name>A</name>
<node2>
<name>B</name>
<node3>
<name>C</name>
<number>001</number>
</node3>
</node2>
</node1>
</root>
I am parsing the file to get the name of each node, and the corresponding number if it exists.
I use:
String number = eElement.getElementsByTagName("number").item(0).getTextContent();
This should give me something like:
Name | Number
A |
B |
C | 001
But I get:
Name | Number
A | 001
B | 001
C | 001
So I think getElementsByTagName("number") searches for number elements among all descendants of a node, not just its direct children. I don't want that. Does anybody know a workaround?
I thought of using XPath instead of the above method, but I would really like to know whether there's an existing method for this. Thanks
You can use the javax.xml.xpath APIs in the JDK/JRE to get much more control over which nodes are returned than getElementsByTagName gives you.
import java.io.File;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
public class Demo {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document document = docBuilder.parse(new File("filename.xml"));
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
Element element = (Element) xpath.evaluate("//node3/name", document, XPathConstants.NODE);
}
}
Hope this helps.
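Applied to the original name/number table, the key point is that a relative path such as number, evaluated with an element as the context node, only looks at that element's direct children. A minimal sketch, assuming every node of interest has a direct name child; NameNumberTable and rows are illustrative names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NameNumberTable {
    // Builds "name | number" rows, reading <number> only from each element's direct children.
    static List<String> rows(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Every element that has a <name> child, in document order.
        NodeList owners = (NodeList) xpath.evaluate("//*[name]", doc, XPathConstants.NODESET);
        List<String> rows = new ArrayList<String>();
        for (int i = 0; i < owners.getLength(); i++) {
            Node owner = owners.item(i);
            // Relative paths select child elements only; evaluate returns "" if absent.
            String name = xpath.evaluate("name", owner);
            String number = xpath.evaluate("number", owner);
            rows.add(name + " | " + number);
        }
        return rows;
    }
}
```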
import java.io.File;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
public class XML {
public static void main(String[] args) throws IOException {
File input = new File("D:\\sample.xml");
Document doc = Jsoup.parse(input, "UTF-8");
Elements allElements = doc.select("root");
for(Element value : allElements){
System.out.println(value.text());
}
// child selector: only <number> elements directly under <node3>
String node3Num = doc.select("node3 > number").text();
System.out.println(node3Num);
}
}
Output:
A B C 001
001
I used the jsoup-1.7.2 jar (you can download it from jsoup.org).
Assuming your eElement variable is always one of the <node1/>, <node2/>, … elements in question, the following code should work as a replacement for your snippet above:
String number = null;
NodeList childNodes = eElement.getChildNodes();
for (int i = 0; i < childNodes.getLength(); i++) {
Node node = childNodes.item(i);
if (node.getNodeType() == Node.ELEMENT_NODE
&& node.getNodeName().equals("number")) {
number = node.getTextContent();
break;
}
}
The number variable will be null when there is no <number/> child; it will contain the number you need otherwise.
I need a method like Document.getElementsByTagName(), but one that searches only for tags at a certain level (i.e., not nested tags with the same name).
Example file:
<script>
<something>
<findme></findme><!-- DO NOT FIND THIS TAG -->
</something>
<findme></findme><!-- FIND THIS TAG -->
</script>
Document.getElementsByTagName() simply returns all findme tags in the document.
Here is an example with XPath:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class TestXPath {
private static final String FILE = "a.xml" ;
private static final String XPATH = "/script/findme";
public static void main(String[] args) {
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
docFactory.setNamespaceAware(true);
DocumentBuilder builder;
try {
builder = docFactory.newDocumentBuilder();
Document doc = builder.parse(FILE);
XPathExpression expr = XPathFactory.newInstance().newXPath().compile(XPATH);
Object hits = expr.evaluate(doc, XPathConstants.NODESET ) ;
if ( hits instanceof NodeList ) {
NodeList list = (NodeList) hits ;
for (int i = 0; i < list.getLength(); i++ ) {
System.out.println( list.item(i).getTextContent() );
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
With this input:
<script>
<something>
<findme>1</findme><!-- DO NOT FIND THIS TAG -->
</something>
<findme>Find this</findme><!-- FIND THIS TAG -->
<findme>More of this</findme><!-- FIND THIS TAG AS WELL -->
</script>
It yields:
Find this
More of this
If you're using DOM, the only way I can think of would be a recursive function that looks at the children of each element.
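For a single level, plain DOM does not actually need recursion: walking the parent's direct children is enough. A minimal sketch; childElementsByTagName and parseRoot are hypothetical helper names, not part of the DOM API.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DirectChildren {
    // Like getElementsByTagName, but only looks at the direct children of parent.
    static List<Element> childElementsByTagName(Element parent, String tagName) {
        List<Element> result = new ArrayList<Element>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            // Skip text and comment nodes; keep only elements with the wanted name.
            if (n.getNodeType() == Node.ELEMENT_NODE && tagName.equals(n.getNodeName())) {
                result.add((Element) n);
            }
        }
        return result;
    }

    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
    }
}
```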
Use:
/*/findme
This XPath expression selects all findme elements that are children of the top element of the XML document.
This expression:
//findme[count(ancestor::*) = 5]
selects all findme elements in the XML document that have exactly five ancestor elements; that is, they are all at "level 6".
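In the sample from the question, the wanted findme elements have exactly one element ancestor (script), so //findme[count(ancestor::*) = 1] should select the same nodes as /*/findme. A minimal sketch for checking such expressions against a document; FindmeByLevel and countMatches are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FindmeByLevel {
    // Returns how many nodes the given XPath expression selects in the document.
    static int countMatches(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        return hits.getLength();
    }
}
```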
I can't fetch the text value with Node.getNodeValue(), Node.getFirstChild().getNodeValue() or Node.getTextContent().
My XML is like
<add job="351">
<tag>foobar</tag>
<tag>foobar2</tag>
</add>
I'm trying to get the tag values (fetching non-text elements works fine). My Java code looks like this:
Document doc = db.parse(new File(args[0]));
Node n = doc.getFirstChild();
NodeList nl = n.getChildNodes();
Node an,an2;
for (int i=0; i < nl.getLength(); i++) {
an = nl.item(i);
if(an.getNodeType()==Node.ELEMENT_NODE) {
NodeList nl2 = an.getChildNodes();
for(int i2=0; i2<nl2.getLength(); i2++) {
an2 = nl2.item(i2);
// DEBUG PRINTS
System.out.println(an2.getNodeName() + ": type (" + an2.getNodeType() + "):");
if(an2.hasChildNodes())
System.out.println(an2.getFirstChild().getTextContent());
if(an2.hasChildNodes())
System.out.println(an2.getFirstChild().getNodeValue());
System.out.println(an2.getTextContent());
System.out.println(an2.getNodeValue());
}
}
}
It prints:
tag type (1):
tag1
tag1
tag1
null
#text type (3):
_blank line_
_blank line_
...
Thanks for the help.
I'd print out the result of an2.getNodeName() as well for debugging purposes. My guess is that your tree crawling code isn't crawling to the nodes that you think it is. That suspicion is enhanced by the lack of checking for node names in your code.
Other than that, the javadoc for Node defines "getNodeValue()" to return null for Nodes of type Element. Therefore, you really should be using getTextContent(). I'm not sure why that wouldn't give you the text that you want.
Perhaps iterate the children of your tag node and see what types are there?
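In the code as posted, the inner loop walks the children of each <tag>, which are #text nodes, so an2 is already the text node. A minimal sketch that skips the nested loop and asks each <tag> element for its text directly; TagValues is an illustrative name, and getTextContent() requires a DOM Level 3 implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TagValues {
    // Collects the text of every <tag> element below the document element.
    static List<String> tagValues(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList tags = doc.getDocumentElement().getElementsByTagName("tag");
        List<String> values = new ArrayList<String>();
        for (int i = 0; i < tags.getLength(); i++) {
            // On an Element, getTextContent() returns the text beneath it,
            // whereas getNodeValue() returns null.
            values.add(tags.item(i).getTextContent());
        }
        return values;
    }
}
```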
Tried this code and it works for me:
String xml = "<add job=\"351\">\n" +
" <tag>foobar</tag>\n" +
" <tag>foobar2</tag>\n" +
"</add>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
ByteArrayInputStream bis = new ByteArrayInputStream(xml.getBytes());
Document doc = db.parse(bis);
Node n = doc.getFirstChild();
NodeList nl = n.getChildNodes();
Node an,an2;
for (int i=0; i < nl.getLength(); i++) {
an = nl.item(i);
if(an.getNodeType()==Node.ELEMENT_NODE) {
NodeList nl2 = an.getChildNodes();
for(int i2=0; i2<nl2.getLength(); i2++) {
an2 = nl2.item(i2);
// DEBUG PRINTS
System.out.println(an2.getNodeName() + ": type (" + an2.getNodeType() + "):");
if(an2.hasChildNodes()) System.out.println(an2.getFirstChild().getTextContent());
if(an2.hasChildNodes()) System.out.println(an2.getFirstChild().getNodeValue());
System.out.println(an2.getTextContent());
System.out.println(an2.getNodeValue());
}
}
}
Output was:
#text: type (3):
foobar
foobar
#text: type (3):
foobar2
foobar2
If your XML goes quite deep, you might want to consider using XPath, which comes with your JRE, so you can access the contents far more easily using:
String text = xp.evaluate("//add[@job='351']/tag[position()=1]/text()",
document.getDocumentElement());
Full example:
import static org.junit.Assert.assertEquals;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.junit.Before;
import org.junit.Test;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
public class XPathTest {
private Document document;
@Before
public void setup() throws Exception {
String xml = "<add job=\"351\"><tag>foobar</tag><tag>foobar2</tag></add>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
document = db.parse(new InputSource(new StringReader(xml)));
}
@Test
public void testXPath() throws Exception {
XPathFactory xpf = XPathFactory.newInstance();
XPath xp = xpf.newXPath();
String text = xp.evaluate("//add[@job='351']/tag[position()=1]/text()",
document.getDocumentElement());
assertEquals("foobar", text);
}
}
I use a very old Java (JDK 1.4.08) and had the same issue: the Node class there does not have the getTextContent() method, which was only added in DOM Level 3. I had to use Node.getFirstChild().getNodeValue() instead of Node.getNodeValue() to get the value of the node. That fixed it for me.
If you are open to using vtd-xml, which excels at both performance and memory efficiency, below is code that does what you are looking for, using both manual navigation and XPath. The overall code is much more concise and easier to understand.
import com.ximpleware.*;
public class queryText {
public static void main(String[] s) throws VTDException{
VTDGen vg = new VTDGen();
if (!vg.parseFile("input.xml", true))
return;
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
// first manually navigate
if(vn.toElement(VTDNav.FC,"tag")){
int i= vn.getText();
if (i!=-1){
System.out.println("text ===>"+vn.toString(i));
}
if (vn.toElement(VTDNav.NS,"tag")){
i=vn.getText();
System.out.println("text ===>"+vn.toString(i));
}
}
// second version use XPath
ap.selectXPath("/add/tag/text()");
int i=0;
while((i=ap.evalXPath())!= -1){
System.out.println("text node ====>"+vn.toString(i));
}
}
}