A question on how to solve the string problem in Java

I've created a simple xml file here:
http://roberthan.host56.com/productsNew.xml
which is quite simple, the root node is [products] while all other element nodes are [product]. Under each [product] node, there are two child nodes, [code] and [name], so it basically looks like:
[product]
[code]ddd[/code]
[name]ssss[/name]
[/product]
I've also written the following Java code to parse this XML file, take the text content of the [name] node under each [product] node, and add it to a JComboBox.
docBuilder = docFactory.newDocumentBuilder();
doc = docBuilder.parse("http://roberthan.host56.com/productsNew.xml");
NodeList productNodes = doc.getElementsByTagName("product");
productlist.clear();
for (i = 0; i < productNodes.getLength(); i++)
{
Node childNode = productNodes.item(i);
if (childNode.hasChildNodes()) {
NodeList nl = childNode.getChildNodes();
Node nameNode = nl.item(2);
productlist.add(nameNode.getTextContent());
}
}
final JComboBox productComboB = new JComboBox();
Iterator iterator = productlist.iterator();
while(iterator.hasNext())
{
productComboB.addItem(iterator.next().toString());
}
The code is quite straightforward, I firstly parse the xml and get all the product nodes and put them into a nodelist, and the productList is an arrayList. I loop through the all the [product] nodes, for each of them, if it has child nodes, then I take the second child node (which is the [name] node) and put the text content of it in the array list, and finally, I loop through the arrayList and add each item to the combo box.
The problem I got is, if I select the [code] child node, which means "Node nameNode = nl.item(1)", it will work perfectly; however, if I change that item(1) to item(2) to extract all the [name] nodes, the combo box will have a drop down list, but all the items are blank, like I have inserted 10 empty strings.
Also, if I try to add a "Hello World" string into the combo box after the above code, the "Hello World" item will appear after the 10 empty items.
I have spent the whole afternoon debugging this but still no breakthrough, the XML is actually quite simple and the Java is straightforward too. Could anyone share some thoughts with me on this please. Thanks a lot!

This happens because the child node list also contains text nodes (the whitespace between the elements).
If you add the following snippet to your code, you will see them:
for(int j = 0;j<nl.getLength();j++){
System.out.println(nl.item(j).getNodeName());
}
It prints the following for each product:
#text
code
#text
name
#text
This means the name node is at index 3 (the fourth item in the list), not index 2.
Node nameNode = nl.item(3);
However, I'd suggest using XPath to solve this problem.
NodeList nodelist = XPathAPI.selectNodeList(doc, "//products/product/name");
for (int i = 0; i < nodelist.getLength(); i++) {
productlist.add(nodelist.item(i).getTextContent());
}
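If you would rather stay with plain DOM and not depend on the index at all, a minimal sketch could check the node type and name of each child instead. The class name ProductNames and the method extractNames are made up for illustration, and the XML is passed in as a string only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the original question.
class ProductNames {

    // Collects the text of every <name> element that is a direct child of a <product>.
    static List<String> extractNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList products = doc.getElementsByTagName("product");
            for (int i = 0; i < products.getLength(); i++) {
                NodeList children = products.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    // Skip #text nodes and match on the element name, not on the position.
                    if (child.getNodeType() == Node.ELEMENT_NODE
                            && "name".equals(child.getNodeName())) {
                        names.add(child.getTextContent());
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse product XML", e);
        }
        return names;
    }
}
```

This keeps working whether or not the file is pretty-printed, because whitespace text nodes are never counted.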

XPath using this expression will easily solve your problem:
String XPATH_EXPRESSION1 = "//name/text()";
e.g.,
public static final String PRODUCTS_NEW = "http://roberthan.host56.com/productsNew.xml";
public static final String XPATH_EXPRESSION1 = "//name/text()";
public XmlFun() {
URL productsUrl;
try {
productsUrl = new URL(PRODUCTS_NEW);
List<String> nameList = xPathExtract(productsUrl.openStream());
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (XPathExpressionException e) {
e.printStackTrace();
}
}
private List<String> xPathExtract(InputStream inStream) throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document domDoc = builder.parse(inStream);
XPathFactory xFactory = XPathFactory.newInstance();
XPath xpath = xFactory.newXPath();
XPathExpression xExpr = xpath.compile(XPATH_EXPRESSION1);
NodeList nodes = (NodeList)xExpr.evaluate(domDoc, XPathConstants.NODESET);
List<String> resultList = new ArrayList<String>();
for (int i = 0; i < nodes.getLength(); i++) {
String node = nodes.item(i).getNodeValue();
resultList.add(node);
}
return resultList;
}

Related

Problem with finding XML files in IntelliJ even though they are there

First post here, so bear with me, and tell me if I'm doing something wrong :)
My problem is that in my IDE the application works just fine and loads all the XML files correctly with all the data.
But when I use "Build artifact" to make a release, the released application.jar does NOT show all of my XML data.
After a lot of googling, I think it has to do with where I place my XML files and folders, because when I tried to recreate the error in my IDE it gave me a NullPointerException on the file path.
This application is to be used by other people, so hardcoding the absolute path is not an option.
Also good to know is that I have two functions.
--> One function for reading only one XML file located in its own package inside src.
--> Another function used to read several XML files from a separate package inside src.
I will paste the code below, as well as a picture showing my package structure in the IntelliJ IDE.
▼ Picture of folder structure here ▼
https://i.stack.imgur.com/M9xap.png
I have tried marking ItemsXML and MonsterXML as resource in project structure but no change.
▼ Reading of one XML file below ▼
public void ReadItemXMLfile(){
try{
String fileName = "src\\ItemsXML\\items.xml";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(fileName);
XPathFactory xpathfactory = XPathFactory.newInstance();
XPath xpath = xpathfactory.newXPath();
XPathExpression expr = xpath.compile("/items/item"); // LOOT ID NUMBER
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
Node testNode = nodes.item(i);
if(testNode.getNodeType() == Node.ELEMENT_NODE){
Element element = (Element) testNode;
String idFromItemXml = "";
String itemNameFromItemXml = "";
idFromItemXml = element.getAttribute("id");
itemNameFromItemXml = element.getAttribute("name");
for(MonsterXML monster : monstersArrayList){
for(MonsterLootXML loot : monster.getLootableItems()){
if(loot.getId().equals(idFromItemXml)){
loot.setName(itemNameFromItemXml.substring(0, 1).toUpperCase() + itemNameFromItemXml.substring(1));
}
}
}
}
}
} catch (ParserConfigurationException parserConfigurationException) {
parserConfigurationException.printStackTrace();
} catch (IOException ioException) {
ioException.printStackTrace();
} catch (XPathExpressionException xPathExpressionException) {
xPathExpressionException.printStackTrace();
} catch (SAXException saxException) {
saxException.printStackTrace();
}
}
▼Reading of several XML files in a folder below▼
public void ReadMonsterXMLfiles(){
try{
File dir = new File("src\\MonsterXML");
if (dir.exists() && dir.isDirectory()) {
File [] files = dir.listFiles((d, name) -> name.endsWith(".xml"));
if (files != null) {
for (File file: files) {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true); // never forget this!
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(file.getPath());
XPathFactory xpathfactory = XPathFactory.newInstance();
XPath xpath = xpathfactory.newXPath();
XPathExpression expr = xpath.compile("/monster/@name | /monster/@experience | /monster/@manacost | /monster/health/@now");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
MonsterXML monsterXML = new MonsterXML();
monsterXML.setName(nodes.item(2).getTextContent());
monsterXML.setHealth(nodes.item(3).getTextContent());
monsterXML.setExperience(nodes.item(0).getTextContent());
monsterXML.setManaToSummon(nodes.item(1).getTextContent());
monsterXML.setName(monsterXML.getName().substring(0, 1).toUpperCase() + monsterXML.getName().substring(1));
// MONSTER LOOT (ID) AND MONSTER LOOT (DROPCHANCE%)
expr = xpath.compile("/monster/loot//item"); // LOOT ID NUMBER
result = expr.evaluate(doc, XPathConstants.NODESET);
nodes = (NodeList) result;
MonsterLootXML monsterLootXML = null;
for (int i = 0; i < nodes.getLength(); i++) {
Node testNode = nodes.item(i);
if(testNode.getNodeType() == Node.ELEMENT_NODE){
Element element = (Element) testNode;
monsterLootXML = new MonsterLootXML();
monsterLootXML.setId(element.getAttribute("id"));
monsterLootXML.setLootChance(element.getAttribute("chance"));
monsterLootXML.setLootChance(Calculations.correctDropChanceNumber(monsterLootXML.getLootChance()));
if(element.hasAttribute("countmax")){
monsterLootXML.setAmount(element.getAttribute("countmax"));
}
else{
monsterLootXML.setAmount("1");
}
monsterXML.addLootableItems(monsterLootXML);
}
}
monstersArrayList.add(monsterXML);
}
}
}
}
catch (Exception e) {
e.printStackTrace();
}
}
If anyone knows this well I would love to get some tutoring on discord if possible :)
Thank you all!

One answer is to ship the XML files in a folder next to your .jar (outside of it) and refer to those files when starting the application.
If you want to keep the XML files inside your .jar and load them from there, you need to look up "getResourceAsStream".
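As a rough sketch of the second option: the snippet below loads one XML file from the classpath instead of from a file path. It assumes the file ends up inside the jar at /ItemsXML/items.xml (that depends on how the artifact is built), and the class name ClasspathXml is made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original question.
class ClasspathXml {

    // Loads an XML document from the classpath, e.g. load("/ItemsXML/items.xml").
    // The leading slash makes the path absolute from the classpath root.
    static Document load(String resourcePath) {
        try (InputStream in = ClasspathXml.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                // getResourceAsStream returns null when the resource is missing.
                throw new IllegalArgumentException("Resource not found on classpath: " + resourcePath);
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(in);
        } catch (IOException | SAXException | ParserConfigurationException e) {
            throw new IllegalStateException("Could not parse " + resourcePath, e);
        }
    }
}
```

Note that File.listFiles() cannot list a folder that lives inside a jar, so for the MonsterXML folder you would probably need a known list of file names (or an index file) and load each one by name this way.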

How do I parse answers from AWS Mechanical Turk in Java?

The answer() method for an Assignment returns a String like this:
<?xml version="1.0" encoding="ASCII"?><QuestionFormAnswers xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionFormAnswers.xsd"><Answer><QuestionIdentifier>blah</QuestionIdentifier><FreeText>toplevel</FreeText></Answer></QuestionFormAnswers>
How am I supposed to parse this to get the actual answers? I see in older versions of the API there's a QuestionFormAnswers type. This is also referenced in the documentation, which states:
public String getAnswer()
The Worker's answers submitted for the HIT contained in a QuestionFormAnswers document, if the Worker provides an answer. If the Worker does not provide any answers, Answer may contain a QuestionFormAnswers document, or Answer may be empty.
Returns:
The Worker's answers submitted for the HIT contained in a QuestionFormAnswers document, if the Worker provides an answer. If the Worker does not provide any answers, Answer may contain a QuestionFormAnswers document, or Answer may be empty.
But it actually returns a String and not a QuestionFormAnswers. How do I parse this string XML result? Can I just use any standard method of parsing XML documents?

The answer appears to be yes, you can use any standard XML parsing technique.
Here is what worked for me:
private static Map<String, String> parseXML(String answerXML) {
try {
List<String> identifierList = new ArrayList<>();
List<String> answerList = new ArrayList<>();
InputSource is = new InputSource(new StringReader(answerXML));
Document document = null;
document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(is);
XPath xpath = XPathFactory.newInstance().newXPath();
NodeList identifiers = null;
try {
identifiers = (NodeList) xpath.evaluate("//Answer/QuestionIdentifier", document,
XPathConstants.NODESET);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
for (int i = 0; i < identifiers.getLength(); i++) {
Node identifier = identifiers.item(i);
String relation = identifier.getTextContent();
identifierList.add(relation);
}
NodeList texts = (NodeList) xpath.evaluate("//Answer/FreeText", document, XPathConstants.NODESET);
for (int i = 0; i < texts.getLength(); i++) {
Node text = texts.item(i);
String answer = text.getTextContent();
answerList.add(answer);
}
Map<String, String> result = new HashMap<>();
for (int k = 0; k < identifierList.size(); k++) {
result.put(identifierList.get(k), answerList.get(k));
}
return result;
} catch (Exception e) {
log.error("Failed to parse XML " + answerXML, e);
}
return null;
}
This creates a map from the input ids to the answers.
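One thing to watch with two parallel lists: if some Answer element has no FreeText child, the indexes can drift apart. A possible variation, sketched below, reads both values from the same Answer element. The class name AnswerParser is made up, and skipping answers without FreeText is an assumption about what you want, not something the API prescribes.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class AnswerParser {

    // Maps each QuestionIdentifier to the FreeText found in the same <Answer> element.
    static Map<String, String> parse(String answerXml) {
        Map<String, String> result = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(answerXml)));
            NodeList answers = doc.getElementsByTagName("Answer");
            for (int i = 0; i < answers.getLength(); i++) {
                Element answer = (Element) answers.item(i);
                NodeList ids = answer.getElementsByTagName("QuestionIdentifier");
                NodeList texts = answer.getElementsByTagName("FreeText");
                // Assumption: answers without a FreeText child are simply skipped.
                if (ids.getLength() > 0 && texts.getLength() > 0) {
                    result.put(ids.item(0).getTextContent(), texts.item(0).getTextContent());
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Failed to parse answer XML", e);
        }
        return result;
    }
}
```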

Retrieve XML Element names with Java from unknown message format

I am parsing XML from lots of JMS messaging topics, so the structure of each message varies a lot and I'd like to make one general tool to parse them all.
To start, all I want to do is get the element names:
<gui-action>
<action>some action</action>
<params>
<param1>blue</param1>
<param2>tall</param2>
</params>
</gui-action>
I just want to retrieve the strings "gui-action", "action", "params", "param1", and "param2." Duplicates are just fine.
I've tried using org.w3c.dom.Node, Element, NodeLists and I'm not having much luck. I keep getting the element values, not the names.
private Element root;
private Document doc;
private NodeList nl;
//messageStr is passed in elsewhere in the code
//but is a string of the full XML message.
doc = xmlParse( messageStr );
root = doc.getDocumentElement();
nl = root.getChildNodes();
int size = nl.getLength();
for (int i=0; i<size; i++) {
log.info( nl.item(i).getNodeName() );
}
public Document xmlParse( String xml ){
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db;
InputSource is;
try {
//Using factory get an instance of document builder
db = dbf.newDocumentBuilder();
is = new InputSource(new StringReader( xml ) );
doc = db.parse( is );
} catch(ParserConfigurationException pce) {
pce.printStackTrace();
} catch(SAXException se) {
se.printStackTrace();
} catch(IOException ioe) {
ioe.printStackTrace();
}
return doc;
//parse using builder to get DOM representation of the XML file
}
My logged "parsed" XML looks like this:
#text
action
#text
params
#text

Figured it out. I was iterating over only the child nodes, and not including the parent. So now I just filter out the #texts, and include the parent. Derp.
log.info( root.getNodeName() );
for (int i=0; i<size; i++) {
String nodeName = nl.item(i).getNodeName();
// Compare strings with equals(); != compares object references.
if( !"#text".equals(nodeName) ) {
log.info( nodeName );
}
}
Now if anyone knows a way to get a NodeList of the entire document, that would be awesome.
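For what it's worth, one way that should cover this is Document.getElementsByTagName("*"), where "*" matches every element, so the result holds all elements of the document in document order, root included, and no #text nodes. A small sketch (the class name ElementNames is made up, and the XML is parsed from a string just to keep it self-contained):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original post.
class ElementNames {

    // Returns the names of all elements in the document, in document order.
    static List<String> allElementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" is the wildcard tag name: it matches every element, at any depth.
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                names.add(all.item(i).getNodeName());
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return names;
    }
}
```

For the gui-action message from the question (with the params element closed properly) this yields gui-action, action, params, param1, param2.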

Xpath approach in case of large files

The class you're gonna see right now is the classic approach to parse an XML document via XPath in Java:
public class Main {
private Document createXMLDocument(String fileName) throws Exception {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document doc = builder.parse(fileName);
return doc;
}
private NodeList readXMLNodes(Document doc, String xpathExpression) throws Exception {
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile(xpathExpression);
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
return nodes;
}
public static void main(String[] args) throws Exception {
Main m = new Main();
Document doc = m.createXMLDocument("tv.xml");
NodeList nodes = m.readXMLNodes(doc, "//serie/eason/@id");
int n = nodes.getLength();
Map<Integer, List<String>> series = new HashMap<Integer, List<String>>();
for (int i = 1; i <= n; i++) {
nodes = m.readXMLNodes(doc, "//serie/eason[@id='" + i + "']/episode/text()");
List<String> episodes = new ArrayList<String>();
for (int j = 0; j < nodes.getLength(); j++) {
episodes.add(nodes.item(j).getNodeValue());
}
series.put(i, episodes);
}
for (Map.Entry<Integer, List<String>> entry : series.entrySet()) {
System.out.println("Season: " + entry.getKey());
for (String ep : entry.getValue()) {
System.out.println("Episodio: " + ep);
}
System.out.println("+------------------------------------+");
}
}
}
In there I find some methods to be worrying in case of a huge xml file. Like the use of
Document doc = builder.parse(fileName);
return doc;
or
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
return nodes;
I'm worried because the xml document I need to handle is created by the customer and inside you can basically have an indefinite number of records describing emails and their contents (every user has its own personal email, so lots of html in there). I know it's not the smartest approach but it's one of the possibilities and it was already up and running before I arrived here.
My question is: how can I parse and evaluate huge xml files using xpath?

You could use the StAX parser. It will take less memory than the DOM options. A good introduction to StAX is at http://tutorials.jenkov.com/java-xml/stax.html
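To give an idea of what that could look like, here is a minimal StAX sketch that collects episode names per season in a single pass, without building a tree. The element names season and episode and the id attribute are assumptions about the shape of tv.xml, and the class name StaxEpisodes is made up.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; element and attribute names are assumed.
class StaxEpisodes {

    // Streams through the XML once and groups <episode> text by the enclosing <season id="...">.
    static Map<String, List<String>> episodesBySeason(Reader xml) {
        Map<String, List<String>> series = new LinkedHashMap<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            List<String> current = null;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if ("season".equals(name)) {
                        current = new ArrayList<>();
                        series.put(reader.getAttributeValue(null, "id"), current);
                    } else if ("episode".equals(name) && current != null) {
                        // getElementText() reads the text and leaves the cursor on the end tag.
                        current.add(reader.getElementText());
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML stream", e);
        }
        return series;
    }
}
```

For a real file you would pass a reader or use the createXMLStreamReader(InputStream) overload; only the current event is held by the parser, so what grows with the file is just the data you choose to keep.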

First of all, XPath doesn't parse XML. Your createXMLDocument() method does that, producing as output a tree representation of the parsed XML. The XPath is then used to search the tree representation.
What you are really looking for is something that searches the XML on the fly, while it is being parsed.
One way to do this is with an XQuery system that implements "document projection" (for example, Saxon-EE). This will analyze your query to see what parts of the document are needed, and when you parse your document, it will build a tree containing only those parts of the document that are actually needed.
If the query is as simple as the one in your example, however, then it isn't too hard to code it as a SAX application, where events such as startElement and endElement are notified by the XML parser to the application, without building a tree in memory.
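A minimal sketch of such a SAX application, under the same caveats: the element names season and episode and the id attribute are assumptions about the document, and SaxEpisodes is a made-up class name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler; element and attribute names are assumed.
class SaxEpisodes extends DefaultHandler {
    private final Map<String, List<String>> series = new LinkedHashMap<>();
    private List<String> current;   // episodes of the season being read
    private StringBuilder text;     // non-null only while inside an <episode>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("season".equals(qName)) {
            current = new ArrayList<>();
            series.put(atts.getValue("id"), current);
        } else if ("episode".equals(qName) && current != null) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text run, so append rather than assign.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("episode".equals(qName) && text != null) {
            current.add(text.toString());
            text = null;
        } else if ("season".equals(qName)) {
            current = null;
        }
    }

    // Runs the parser over the source and returns the collected episodes per season id.
    static Map<String, List<String>> parse(InputSource source) {
        SaxEpisodes handler = new SaxEpisodes();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.series;
    }
}
```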

Create XML document using nodeList

I need to create an XML Document object from a NodeList. Can someone please help me do this? This is my Java code:
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.*;
import org.w3c.dom.*;
public class ReadFile {
public static void main(String[] args) {
String exp = "/configs/markets";
String path = "testConfig.xml";
try {
Document xmlDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(path);
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression xPathExpression = xPath.compile(exp);
NodeList nodes = (NodeList)
xPathExpression.evaluate(xmlDocument,
XPathConstants.NODESET);
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
I want to have an XML file like this:
<configs>
<markets>
<market>
<name>Real</name>
</market>
<market>
<name>play</name>
</market>
</markets>
</configs>
Thanks in advance.

You should do it like this:
you create a new org.w3c.dom.Document newXmlDoc where you store the nodes in your NodeList,
you create a new root element, and append it to newXmlDoc
then, for each node n in your NodeList, you import n in newXmlDoc, and then you append n as a child of root
Here is the code:
public static void main(String[] args) {
String exp = "/configs/markets/market";
String path = "src/a/testConfig.xml";
try {
Document xmlDocument = DocumentBuilderFactory.newInstance()
.newDocumentBuilder().parse(path);
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression xPathExpression = xPath.compile(exp);
NodeList nodes = (NodeList) xPathExpression.
evaluate(xmlDocument, XPathConstants.NODESET);
Document newXmlDocument = DocumentBuilderFactory.newInstance()
.newDocumentBuilder().newDocument();
Element root = newXmlDocument.createElement("root");
newXmlDocument.appendChild(root);
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
Node copyNode = newXmlDocument.importNode(node, true);
root.appendChild(copyNode);
}
printXmlDocument(newXmlDocument);
} catch (Exception ex) {
ex.printStackTrace();
}
}
public static void printXmlDocument(Document document) {
DOMImplementationLS domImplementationLS =
(DOMImplementationLS) document.getImplementation();
LSSerializer lsSerializer =
domImplementationLS.createLSSerializer();
String string = lsSerializer.writeToString(document);
System.out.println(string);
}
The output is:
<?xml version="1.0" encoding="UTF-16"?>
<root><market>
<name>Real</name>
</market><market>
<name>play</name>
</market></root>
Some notes:
I've changed exp to /configs/markets/market, because I suspect you want to copy the market elements, rather than the single markets element
for the printXmlDocument, I've used the interesting code in this answer
I hope this helps.
If you don't want to create a new root element, then you may use your original XPath expression, which returns a NodeList consisting of a single node (keep in mind that your XML must have a single root element) that you can directly add to your new XML document.
See following code, where I commented lines from the code above:
public static void main(String[] args) {
//String exp = "/configs/markets/market/";
String exp = "/configs/markets";
String path = "src/a/testConfig.xml";
try {
Document xmlDocument = DocumentBuilderFactory.newInstance()
.newDocumentBuilder().parse(path);
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression xPathExpression = xPath.compile(exp);
NodeList nodes = (NodeList) xPathExpression.
evaluate(xmlDocument,XPathConstants.NODESET);
Document newXmlDocument = DocumentBuilderFactory.newInstance()
.newDocumentBuilder().newDocument();
//Element root = newXmlDocument.createElement("root");
//newXmlDocument.appendChild(root);
for (int i = 0; i < nodes.getLength(); i++) {
Node node = nodes.item(i);
Node copyNode = newXmlDocument.importNode(node, true);
newXmlDocument.appendChild(copyNode);
//root.appendChild(copyNode);
}
printXmlDocument(newXmlDocument);
} catch (Exception ex) {
ex.printStackTrace();
}
}
This will give you the following output:
<?xml version="1.0" encoding="UTF-16"?>
<markets>
<market>
<name>Real</name>
</market>
<market>
<name>play</name>
</market>
</markets>

You can try the adoptNode() method of Document. You will probably need to iterate over your NodeList; you can access the individual nodes with nodeList.item(i). If you want to wrap your search results in an Element, you can use createElement() from the Document and appendChild() on the newly created Element.
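A rough sketch of that idea follows. Keep in mind that adoptNode() moves each node out of its original document (unlike importNode(), which copies). The class name AdoptExample and both method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class AdoptExample {

    // Parses an XML string into a DOM document (helper for trying the sketch out).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Moves every node of the list into a new document, under a new root element.
    static Document wrapInNewDocument(NodeList nodes, String rootName) {
        try {
            Document target = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = target.createElement(rootName);
            target.appendChild(root);
            // Copy the references first: a live NodeList shrinks as nodes are moved away.
            List<Node> snapshot = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                snapshot.add(nodes.item(i));
            }
            for (Node node : snapshot) {
                Node adopted = target.adoptNode(node);
                if (adopted == null) {
                    // adoptNode may return null if unsupported; fall back to a deep copy.
                    adopted = target.importNode(node, true);
                }
                root.appendChild(adopted);
            }
            return target;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build new document", e);
        }
    }
}
```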
