XPath produces garbled output instead of Unicode characters

I am parsing this XML file:
<?xml version="1.0" encoding="UTF-8"?>
<tests>
<test category="Русский"/>
<test category="ελληνικά"/>
<test category="中文"/>
<test category="English"/>
</tests>
Main class is:
import java.io.File;
import java.io.FileInputStream;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
public class TestUnicode {
public static void main(String[] args) throws Exception {
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression lolwhy = xpath.compile("//test");
final InputSource inputSource =
new InputSource(
new FileInputStream(
new File("sample.xml")));
NodeList parent = (NodeList) lolwhy.evaluate(
inputSource,
XPathConstants.NODESET);
System.out.println(parent.getLength());
for (int i = 0; i < parent.getLength(); i++) {
System.out.println(parent.item(i).getAttributes().
getNamedItem("category").getNodeValue());
}
}
}
And the output is:
4
???????
????????
??
English
What am I doing wrong here?
EDIT: OK, this issue turned out to be related to the question "Hebrew appears as question marks in NetBeans", and the solution is described in "Setting the default Java character encoding?".

It could be that the parsing is fine but the output is wrong.
If you used a font that doesn't contain those characters, or if you wrote the values to HTML but declared the wrong encoding, this can be the result.
The font issue is the more likely one.

System.out.println is the culprit.
See if this helps
http://hints.macworld.com/article.php?story=20050208053951714
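As a minimal sketch of the point above: System.out encodes text with the JVM's default charset, so characters that charset cannot represent come out as question marks. Assuming the console or IDE output pane can display UTF-8, wrapping System.out in a PrintStream with an explicit encoding is one way around it. The class and method names here are made up for illustration, and the strings use Unicode escapes so the source file's own encoding does not matter.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, not part of the original question.
class Utf8Console {
    // Wrap any stream in a PrintStream that encodes text as UTF-8
    // instead of the platform default charset.
    static PrintStream utf8(OutputStream target) throws UnsupportedEncodingException {
        return new PrintStream(target, true, "UTF-8");
    }

    // Return the bytes that are written for the given text.
    static byte[] encode(String text) throws UnsupportedEncodingException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = utf8(buffer);
        out.print(text);
        out.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        PrintStream out = utf8(System.out);
        // "\u4e2d\u6587" is the Chinese attribute value from the sample XML.
        out.println("\u4e2d\u6587");
        out.println("English");
    }
}
```

Starting the JVM with -Dfile.encoding=UTF-8, as the question's edit suggests, addresses the same problem without code changes.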

Related

Using Xpath to add a new value to attributes

I need to use XPath to navigate to the analysis/analysisParameters attributes and add a new value.
As a first step I have tried retrieving a value from the analysis attribute, but I cannot get this to work (no sample value is retrieved and nothing is printed to the console). Can anyone see where I am going wrong, and then show how I can add a new value?
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class XPathTestReports {
public static void main(String[] args) {
try {
String xpath = "/UserDocument/report-plan-catalog/collection/collection/collection/report-config/report-plan/settings[@analysis]";
FileInputStream file = new FileInputStream(new File("c:/workspace/savedreportscatalog.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression xPathExpression = xPath.compile(xpath);
String attributeValue = "" + xPathExpression.evaluate(xmlDocument, XPathConstants.STRING);
System.out.println(attributeValue);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (XPathExpressionException e) {
e.printStackTrace();
}
}
}
XML Sample
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<UserDocument>
<report-plan-catalog>
<collection timestamp="" title="report plans" uid="">
<collection container-id="" timestamp="2015-04-29" title="*" uid="">
<collection container-id="94533" timestamp="2015-04-29" title="*" uid="5cfc">
<report-config container-id="5cfc" timestamp="2015-04-29" title="Asset" type="Risk" uid="4718">
<configuration>
<reportType>Live</reportType>
</configuration>
<report-plan name="Asset">
<columns>
<column name="Nom" subtotal-function="Sum" total-function="Sum"/>
<column name="Id"/>
<column name="Ref"/>
</columns>
<settings analysis="someValue" analysisParameters="" filtering-enabled="true" object-actions="false" show-object-actions="true" sorting-enabled="true"/>
<viewpoint kind="simple">
<slices/>
</report-plan>
</report-config>
</collection>
</collection>
</collection>
</report-plan-catalog>
</UserDocument>

It's still not very clear what your end goal is, but
/UserDocument/report-plan-catalog/collection/collection/collection/report-config/report-plan/settings[@analysis]
selects a settings element, provided that it has an analysis attribute; that is what the predicate (inside square brackets) means. If that is not what you want, use
/UserDocument/report-plan-catalog/collection/collection/collection/report-config/report-plan/settings/@analysis
to select the analysis attribute of the settings element.
Not very familiar with Java, but I'll make two further guesses:
is there perhaps a namespace in your input document that you have not shown?
perhaps this is not the right way to deal with attribute nodes. Try
string(/UserDocument/report-plan-catalog/collection/collection/collection/report-config/report-plan/settings/@analysis)
EDIT: Now tested the Java code, and
String xpath = "/UserDocument/report-plan-catalog/collection/collection/collection/report-config/report-plan/settings/@analysis";
definitely works, after correcting the input XML, which is currently not well-formed: the viewpoint element is not closed.
I get as output:
$ java XPathTestReports
someValue
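The question also asks how to add a new value. A minimal sketch, assuming a cut-down stand-in for the catalog file (the shortened XML, the class name, and the value "newValue" are made up for illustration): select the attribute with /@analysis to read it, and select the settings element itself to change one of its attributes through the DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical sketch class; the XML is a shortened stand-in for the real file.
class SettingsAttributeSketch {
    static final String XML =
        "<UserDocument><report-plan>"
      + "<settings analysis=\"someValue\" analysisParameters=\"\"/>"
      + "</report-plan></UserDocument>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Read the attribute value: the path ends in /@analysis.
    static String readAnalysis(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/UserDocument/report-plan/settings/@analysis", doc);
    }

    // Select the element (predicate form) and set an attribute on it.
    static String setAnalysisParameters(Document doc, String value) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Element settings = (Element) xpath.evaluate(
                "/UserDocument/report-plan/settings[@analysis]", doc, XPathConstants.NODE);
        settings.setAttribute("analysisParameters", value);
        return settings.getAttribute("analysisParameters");
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(readAnalysis(doc));
        System.out.println(setAnalysisParameters(doc, "newValue"));
    }
}
```

The change only affects the in-memory document; writing it back to disk would need a separate serialization step, for example with javax.xml.transform.Transformer.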

Getting too many child nodes and cant get attributes

I have a simple XML file and I want to get the attributes. There are a few examples on the web, but I still don't understand why I get 17 child nodes when I see only 4. I even tried counting the places where I think text could be, but I still don't arrive at that number, unless it is the length of the output. As a result, I don't know how to get the Name attribute of all the tag3 elements.
<?xml version="1.0" encoding="UTF-8"?>
<tag1 xmlns="something">
<xxxxxx-Set>
<tag3 Name="a"/>
<tag3 Name="b"/>
<tag3 Name="c"/>
<tag3 Name="d"/>
</xxxxxx-Set>
<tagB>
<tag3 Name="a"/>
<tag3 Name="b"/>
<tag3 Name="c"/>
<tag3 Name="d"/>
</tagB>
</tag1>
This is my java code:
import java.io.File;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class ParseXML {
public static void main(String[] args) {
try {
File test= new File("test.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(test);
NodeList tagAs= doc.getElementsByTagName("xxxxxx-Set").item(0).getChildNodes(); //should be all the tag3 elements?
for(int i = 0; i < tagAs.getLength(); i++) {
System.out.println(tagAs);
System.out.println(i);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
Note: adding .getAttributes().getNamedItem("Name").getNodeValue() to the print statement gives me a NullPointerException.
And the output is:
[xxxxxx-Set: null]
0
[xxxxxx-Set: null]
1
...
[xxxxxx-Set: null]
16

If you want to get all your Name attributes (it's better to name them in lower case), use the following approach:
Element xSet = (Element) doc.getElementsByTagName("xxxxxx-Set").item(0);
NodeList xSetTags = xSet.getElementsByTagName("tag3");
for(int i = 0; i < xSetTags.getLength(); i++) {
Element tag3 = (Element) xSetTags.item(i);
System.out.println(tag3.getAttribute("Name"));
}
I did it using the org.w3c.dom.Element class. It's not the best idea to work with org.w3c.dom.Node directly, because that interface represents not only XML elements but also attributes, comments, text and other node types. See the documentation for the difference between Node and Element.
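To illustrate why getChildNodes() returns more nodes than there are visible tags, here is a minimal sketch (the two-element XML and the class name are made up for illustration). The whitespace between elements is parsed as text nodes, so the child list has to be filtered by node type before casting to Element; calling getAttributes() on a text node returns null, which would explain the exception in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch class with a shortened stand-in for test.xml.
class ChildElementsSketch {
    static final String XML = "<set>\n  <tag3 Name=\"a\"/>\n  <tag3 Name=\"b\"/>\n</set>";

    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Collect the Name attribute of every child that is an element,
    // skipping whitespace text nodes and any other node types.
    static List<String> childNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(((Element) child).getAttribute("Name"));
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot(XML);
        // Counts text nodes as well as the two tag3 elements.
        System.out.println("child nodes: " + root.getChildNodes().getLength());
        System.out.println("Name values: " + childNames(root));
    }
}
```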

Parse Last.Fm XML from API in Java

URL: http://ws.audioscrobbler.com/2.0/?method=chart.gethypedtracks&api_key=1732077d6772048ccc671c754061cb18&limit=10
From the above URL I need to somehow extract the artist name and the track name for each song in the XML that is returned, but I have no idea how to work with an XML file structured in this way.
Any help or pointers would be very much appreciated!
Thanks,
Ross

Here's a fully working class that loads the URL you have indicated and parses the Track and artist names.
Basically it reads the xml into a Document, and runs 2 xpath queries in loops to get the data you want.
The document itself is simple XML; reformatted, it looks like this:
<?xml version="1.0" encoding="utf-8"?>
<lfm status="ok">
<tracks page="1" perPage="10" totalPages="50" total="500">
<track>
<name>Hysterical</name>
<duration>231</duration>
<percentagechange>3626</percentagechange>
<mbid/>
<url>http://www.last.fm/music/Clap+Your+Hands+Say+Yeah/_/Hysterical</url>
<streamable fulltrack="0">0</streamable>
<artist>
<name>Clap Your Hands Say Yeah</name>
...
All I did to clean it up was run it through a re-formatter like xmlstarlet, as I mentioned in my comment. Note: you don't have to reformat it for Java to read it, as long as it's well formed. Reformatting only makes it human-readable.
The first xpath query gets the track name using a path lfm/tracks/track/name. You can use something like this xpath tester to try out your xpath queries (you can paste your xml in and it will reformat it too). If you don't understand xpath, there are many sources on the net.
The second xpath works relative to the current track name node, and looks for a following-sibling node of type artist with a name sub-node, and then displays the text of the node.
Here's the code
package net.fish;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class ParseXML {
private static final DocumentBuilderFactory DOCUMENT_BUILDER_FACTORY = DocumentBuilderFactory.newInstance();
private static final XPathFactory XPATH_FACTORY = XPathFactory.newInstance();
public static void main(String[] args) throws Exception {
new ParseXML().parseXml("http://ws.audioscrobbler.com/2.0/?method=chart.gethypedtracks&api_key=1732077d6772048ccc671c754061cb18&limit=10");
}
private void parseXml(String urlPath) throws Exception {
URL url = new URL(urlPath);
URLConnection connection = url.openConnection();
DocumentBuilder db = DOCUMENT_BUILDER_FACTORY.newDocumentBuilder();
final Document document = db.parse(connection.getInputStream());
XPath xPathEvaluator = XPATH_FACTORY.newXPath();
XPathExpression nameExpr = xPathEvaluator.compile("lfm/tracks/track/name");
NodeList trackNameNodes = (NodeList) nameExpr.evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < trackNameNodes.getLength(); i++) {
Node trackNameNode = trackNameNodes.item(i);
System.out.println(String.format("Track Name: %s" , trackNameNode.getTextContent()));
XPathExpression artistNameExpr = xPathEvaluator.compile("following-sibling::artist/name");
NodeList artistNameNodes = (NodeList) artistNameExpr.evaluate(trackNameNode, XPathConstants.NODESET);
for (int j=0; j < artistNameNodes.getLength(); j++) {
System.out.println(String.format(" - Artist Name: %s", artistNameNodes.item(j).getTextContent()));
}
}
}
}
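For experimenting without the network or an API key, here is an offline variant as a sketch. It is not the code above: it iterates over the track elements and evaluates two relative paths (name and artist/name) against each one, instead of starting from the name node and using following-sibling. The inline XML is a trimmed copy of the response excerpt shown earlier, and the class name is made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch class using a trimmed, hard-coded response.
class TrackListSketch {
    static final String XML =
        "<lfm status=\"ok\"><tracks>"
      + "<track><name>Hysterical</name>"
      + "<artist><name>Clap Your Hands Say Yeah</name></artist></track>"
      + "</tracks></lfm>";

    static List<String> describeTracks(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Compile each expression once, outside the loop.
        XPathExpression trackExpr = xpath.compile("/lfm/tracks/track");
        XPathExpression nameExpr = xpath.compile("name");
        XPathExpression artistExpr = xpath.compile("artist/name");

        List<String> lines = new ArrayList<>();
        NodeList tracks = (NodeList) trackExpr.evaluate(doc, XPathConstants.NODESET);
        for (int i = 0; i < tracks.getLength(); i++) {
            Node track = tracks.item(i);
            // Relative paths are resolved against the current track element.
            lines.add(nameExpr.evaluate(track) + " - " + artistExpr.evaluate(track));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        for (String line : describeTracks(XML)) {
            System.out.println(line);
        }
    }
}
```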

Why doesn't XPath namespace-uri() work out of the box with Java?

I am trying to use the namespace-uri() function in XPath to retrieve nodes based on their fully qualified name. The query //*[local-name() = 'customerName' and namespace-uri() = 'http://example.com/officeN'] in this online XPath tester, among others, correctly returns the relevant nodes. Yet the following self-contained Java class does not retrieve anything. What am I doing wrong with namespace-uri()?
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
public class Test{
public static void main(String[] args)throws Exception {
XPathExpression expr = XPathFactory.newInstance().newXPath().compile(
"//*[local-name() = 'customerName' and namespace-uri() = 'http://example.com/officeN']");
String xml=
"<Agents xmlns:n=\"http://example.com/officeN\">\n"+
"\t<n:Agent>\n\t\t<n:customerName>Joe Shmo</n:customerName>\n\t</n:Agent>\n"+
"\t<n:Agent>\n\t\t<n:customerName>Mary Brown</n:customerName>\n\t</n:Agent>\n</Agents>";
System.out.println(xml);
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
System.err.println("\n\nNodes:");
for (int i = 0; i < nodes.getLength(); i++) {
System.err.println(nodes.item(i));
}
}
}

The query looks fine. You also need to declare your DocumentBuilderFactory to be "namespace-aware".
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
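A minimal sketch that puts the fix together with the query from the question, so the effect of setNamespaceAware can be tried in isolation. The class name and the countMatches helper are made up for illustration; the XML is the question's document without the whitespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch class comparing the two parser settings.
class NamespaceAwareSketch {
    static final String XML =
        "<Agents xmlns:n=\"http://example.com/officeN\">"
      + "<n:Agent><n:customerName>Joe Shmo</n:customerName></n:Agent>"
      + "<n:Agent><n:customerName>Mary Brown</n:customerName></n:Agent>"
      + "</Agents>";

    static final String QUERY =
        "//*[local-name() = 'customerName' and namespace-uri() = 'http://example.com/officeN']";

    // Parse with the given namespace setting and count the nodes the query selects.
    static int countMatches(boolean namespaceAware) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(namespaceAware);
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(QUERY, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespace-aware:     " + countMatches(true));
        System.out.println("not namespace-aware: " + countMatches(false));
    }
}
```

The factory is not namespace-aware by default, which matches the behaviour described in the question: the parsed nodes then carry no namespace URI for namespace-uri() to compare against.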

how to disable dtd at runtime in java's xpath?

My file references a DTD and I can't remove the reference. When I try to parse the file in Java I get "Caused by: java.net.SocketException: Network is unreachable: connect", because it is a remote DTD. Can I somehow disable DTD checking?

You should be able to specify your own EntityResolver, or use specific features of your parser? See here for some approaches.
A more complete example:
<?xml version="1.0"?>
<!DOCTYPE foo PUBLIC "//FOO//" "foo.dtd">
<foo>
<bar>Value</bar>
</foo>
And xpath usage:
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class Main {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
builder.setEntityResolver(new EntityResolver() {
@Override
public InputSource resolveEntity(String publicId, String systemId)
throws SAXException, IOException {
System.out.println("Ignoring " + publicId + ", " + systemId);
return new InputSource(new StringReader(""));
}
});
Document document = builder.parse(new File("src/foo.xml"));
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
String content = xpath.evaluate("/foo/bar/text()", document
.getDocumentElement());
System.out.println(content);
}
}
Hope this helps...

This worked for me:
SAXParserFactory saxfac = SAXParserFactory.newInstance();
saxfac.setValidating(false);
try {
saxfac.setFeature("http://xml.org/sax/features/validation", false);
saxfac.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
saxfac.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
saxfac.setFeature("http://xml.org/sax/features/external-general-entities", false);
saxfac.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
}
catch (Exception e1) {
e1.printStackTrace();
}
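Since XPath is usually run against a DOM, the same idea can be applied to DocumentBuilderFactory. This is a sketch under one assumption: the load-external-dtd feature is specific to Xerces, which includes the parser bundled with the JDK, so another parser implementation may reject it with a ParserConfigurationException. The class name and the unreachable DTD URL are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch class; the DOCTYPE points at a host that cannot resolve.
class SkipExternalDtdSketch {
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE foo SYSTEM \"http://unreachable.invalid/foo.dtd\">\n"
      + "<foo><bar>Value</bar></foo>";

    static String readBar(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        // Xerces-specific feature (assumption: a Xerces-based parser is in use).
        // It tells a non-validating parser not to fetch the external DTD.
        factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate("/foo/bar/text()", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readBar(XML));
    }
}
```

Note that skipping the DTD also skips anything it defines, such as entity declarations and default attribute values, so this only suits documents that do not rely on them.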

I had this problem before. I solved it by downloading and storing a local copy of the DTD and then validating against the local copy. You need to edit the XML file to point to the local copy.
<!DOCTYPE root-element SYSTEM "filename">
Little more info here: http://www.w3schools.com/dtd/dtd_intro.asp
I think you can also manually set some sort of validateOnParse property to "false" in your parser. Depends on what library you're using to parse the XML.
More info here: http://www.w3schools.com/dtd/dtd_validation.asp
