I have a large XML file that I want to parse.
The XML file has over 300 cases and other tags.
I'm only interested in the cases. What I want to do is take all the cases, with everything inside each case tag, and save them in a new DOM document that holds only the cases. Once I have this new DOM, I want to send it to another class that will take the information and format it into a Word document (but I'll tackle that once I get there).
An example of my XML is:
<suite>
<cases>
<case>
<id/>
<title/>
<type/>
<priority/>
<estimate/>
<references/>
<custom>
<functional_area/>
<technology_dependence/>
<reviewed/>
<steps_completed>
</steps_completed>
<preconds> </preconds>
<steps_seperated>
<step>
<index/>
<content>
</content>
<expected>
</expected>
</step>
<step>
<index/>
<content>
</content>
<expected>
</expected>
</step>
</steps_seperated>
</custom>
</case>
</cases>
</suite>
There are about 400 of these case nodes
My Java code:
Setting up the initial document:
private void setXMLdoc(String path){
xmlDoc = getDocument(path) ;
}
Getting the XML file:
private Document getDocument(String path) {
try{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setIgnoringComments(true);
factory.setIgnoringElementContentWhitespace(true);
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse(path);
} catch (ParserConfigurationException ex) {
Logger.getLogger(ImportXML.class.getName()).log(Level.SEVERE, null, ex);
} catch (SAXException ex) {
Logger.getLogger(ImportXML.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(ImportXML.class.getName()).log(Level.SEVERE, null, ex);
}
return null;
}
Would this create a new document that only contains the cases?
NodeList fList = xmlDoc.getElementsByTagName("case");
Also, how would I print out all the elements of a single case, or of all the cases?
Thanks in advance. I'm still pretty new, so sorry if this question doesn't make sense or seems a bit basic.
The approximate code will be
DOMParser parser=new DOMParser();
InputSource source=new InputSource(<the XML file/network stream>);
parser.parse(source);
Element docElement=parser.getDocument().getDocumentElement();
XPath xPath=XPathFactory.newInstance().newXPath();
XPathExpression expression_=xPath.compile("//case");
NodeList list_=(NodeList)expression_.evaluate(docElement,XPathConstants.NODESET);
DocumentBuilder documentBuilder=DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document newDocument=documentBuilder.newDocument();
Element newElement=newDocument.createElement("SOME_NAME");
newDocument.appendChild(newElement);
// importNode(..., true) makes a deep copy that is owned by the new document
for(int i=0;i<list_.getLength();i++){
Node n=newDocument.importNode(list_.item(i),true);
newElement.appendChild(n);
}
then send the 'newDocument' to the other class
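To the question of whether getElementsByTagName("case") creates a new document: it does not. It only returns a NodeList that points into the existing document. Here is a minimal sketch of the copy step using only the JDK's DOM classes, which also shows one way to print the result. The class name CaseExtractor, its method names, and the root element name "cases" are assumptions made for illustration, and checked exceptions are wrapped to keep it short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: copies every <case> element into a fresh document.
final class CaseExtractor {

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Returns a new document whose root (assumed name "cases") holds deep copies of all <case> nodes. */
    static Document extractCases(Document source) {
        try {
            Document target = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = target.createElement("cases");
            target.appendChild(root);
            NodeList cases = source.getElementsByTagName("case");
            for (int i = 0; i < cases.getLength(); i++) {
                // importNode copies the node and its subtree; the source document is left untouched
                root.appendChild(target.importNode(cases.item(i), true));
            }
            return target;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build target document", e);
        }
    }

    /** Serializes a document to a string, e.g. for printing all cases. */
    static String toXml(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }
}
```

Calling System.out.println(CaseExtractor.toXml(CaseExtractor.extractCases(xmlDoc))) would then print every case with all of its child elements.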
Related
I am trying to parse a string representation of an XML document with JDOM2. I expect the XML string
<?xml version="1.0" encoding="UTF-8"?>
to be a valid XML document. But when I run this simple code snippet:
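import java.io.IOException;
import java.io.StringReader;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jdom2.*;
import org.jdom2.input.SAXBuilder;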
public class Main {
public static void main(String []args){
SAXBuilder parser = new SAXBuilder();
String data = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
try {
Document doc = parser.build(new StringReader(data));
} catch (JDOMException ex) {
Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
}
}
}
I receive the error:
org.jdom2.input.JDOMParseException: Error on line 1: Premature end of file.
at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:232)
at org.jdom2.input.sax.SAXBuilderEngine.build(SAXBuilderEngine.java:303)
at org.jdom2.input.SAXBuilder.build(SAXBuilder.java:1196)
at testpack.Main.main(Main.java:32)
Does the XML specification not allow an XML payload without a root element?
If not, how should I check whether the XML document is empty?
Edit: I also noticed that the JDOM2 documentation for the Document() constructor states that
A document must have a root element, so this document will not be well-formed and accessor methods will throw an IllegalStateException if this document is accessed before a root element is added.
It might just be that JDOM2 doesn't support empty XML documents?
After further investigation I noted that the W3C specification for XML says a well-formed XML document must satisfy the following:
It contains one or more elements.
Meaning zero elements is not an option. The input string is not well-formed XML.
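For the "how should I check" part, one option is to treat a parse failure as "empty or not well-formed". The sketch below uses the JDK's built-in DOM parser rather than JDOM2, so that it runs without extra dependencies; with JDOM2 the same idea would mean catching the JDOMParseException shown in the stack trace above. The class and method names (XmlChecks, isWellFormed) are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether a string parses as a well-formed XML document.
final class XmlChecks {

    static boolean isWellFormed(String xml) {
        if (xml == null || xml.trim().isEmpty()) {
            return false; // nothing to parse at all
        }
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Covers "Premature end of file" for a declaration with no root element
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("Parser could not be set up or input could not be read", e);
        }
    }
}
```

This does not distinguish "declaration only" from other well-formedness errors; if that distinction matters, the exception message would have to be inspected, which is parser specific.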
I'm a total Java newbie, and I've been stumbling slowly but surely through developing an IRC bot for my friends. So far, I've gotten nearly all of the features in working order. But I'm really racking my brain over this problem: my bot can reply with a link, but every week I have to change the link in the Java file manually and recompile the whole thing. So I want it to be able to parse the pertinent values from an XML file in the same directory as the bot's Java files, and to be able to update those same values through an IRC client.
import java.io.*;
import org.jibble.pircbot.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import javax.xml.xpath.*;
import org.xml.sax.*;
import org.w3c.dom.*;
public class ModBot extends PircBot {
static String inputFile = "./botdata.xml";
static String outputFile = "./botdata.xml";
public ModBot() {
setLogin("ModBot");
this.setName("ModBot");
setVersion(" ");
}
public void onMessage(String channel, String sender, String login, String hostname, String message) {
if (message.equalsIgnoreCase("!lolcat")) {
sendMessage(channel, sender + "http://i.imgur.com/4IX4cUL.jpg");
}
if (message.startsWith("!updatelolcat ")) {
// Only the user logged in as "Mainmod" may change the link
if (login.equals("Mainmod")) {
String changelolcat = message.substring(14);
}
}
}
}
And the XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE botdata [
<!ELEMENT botdata (lolcat,partytime)>
<!ELEMENT lolcat (#PCDATA)>
<!ELEMENT partytime (start,end)>
<!ELEMENT start (#PCDATA)>
<!ELEMENT end (#PCDATA)>
]>
<botdata>
<lolcat>http://i.imgur.com/4IX4cUL.jpg</lolcat>
<partytime>
<start>8:45:30</start>
<end>11:00:00</end>
</partytime>
</botdata>
What I want to do is take whatever "changelolcat" is and overwrite the current link in the XML, and then have a way to read from the same XML to send what's in "lolcat" to anyone who says "!lolcat". I've been going through XPath and JDOM and so on, and I just can't make sense of it. What I've read about methods using XPath looks promising, and I'd prefer to use it because it's prettier to read.
EDIT:
It worked. I put in
try {
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.parse("botdata.xml");
Node botdata = document.getElementsByTagName("botdata").item(0);
NodeList nodes = botdata.getChildNodes();
for (int i = 0; i < nodes.getLength(); i++) {
Node element = nodes.item(i);
if ("lolcat".equals(element.getNodeName())) {
element.setTextContent(changelolcat);
}
}
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource domSource = new DOMSource(document);
StreamResult streamResult = new StreamResult(new File("botdata.xml"));
transformer.transform(domSource, streamResult);
} catch (ParserConfigurationException pce) {
pce.printStackTrace();
} catch (TransformerException tfe) {
tfe.printStackTrace();
} catch (IOException ioe) {
ioe.printStackTrace();
} catch (SAXException sae) {
sae.printStackTrace();
}
after String changelolcat = message.substring(14);
EDIT: I figured out how to read from my XML and send what's in a node as a message. Does this look right? I feel like I'm not supposed to keep copying the document builder setup into different methods.
if (message.equalsIgnoreCase("!lolcat")) {
try {
DocumentBuilderFactory documentBuilderFactory =
DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder =
documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.parse(botxml);
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
String lolcat = xpath.evaluate("//lolcat", document);
sendMessage(channel, sender + lolcat);
} catch (Exception e) {
e.printStackTrace();
}
}
One way to do this would be to use javax.xml.parsers.DocumentBuilder to parse your XML file into an org.w3c.dom.Document, which provides an interface for getting and modifying individual elements and their values. You can then write the modified document back to the original file, overwriting it with the new version.
Here are some links to the relevant javadocs:
Document
DocumentBuilder
You might also want to look at the javadocs for DocumentBuilderFactory, which is the standard way to obtain a DocumentBuilder.
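On the worry about copying the document builder setup into every method: one possible arrangement is to move the load, read, and write steps into a small helper and call it from onMessage. The sketch below is only an illustration under that assumption; BotData and its method names are hypothetical and not part of PircBot.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper that keeps all XML handling for the bot in one place.
final class BotData {
    private final File file;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    BotData(File file) {
        this.file = file;
    }

    // Re-parses the file on every call so edits made on disk are picked up.
    private Document load() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not read " + file, e);
        }
    }

    /** Returns the text of the first node matching the expression, e.g. "/botdata/lolcat". */
    String read(String expression) {
        try {
            return xpath.evaluate(expression, load());
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    /** Replaces the text of the first matching node and writes the document back to the file. */
    void write(String expression, String value) {
        try {
            Document doc = load();
            Node node = (Node) xpath.evaluate(expression, doc, XPathConstants.NODE);
            if (node == null) {
                throw new IllegalArgumentException("No node matches " + expression);
            }
            node.setTextContent(value);
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(file));
        } catch (XPathExpressionException | TransformerException e) {
            throw new IllegalStateException("Could not update " + file, e);
        }
    }
}
```

With a field such as new BotData(new File("botdata.xml")), the "!lolcat" branch reduces to read("/botdata/lolcat") and the update branch to write("/botdata/lolcat", changelolcat). One thing to be aware of: a plain Transformer does not write the internal DTD subset back out, so the DOCTYPE block in the file is lost after the first save.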
I am a newbie in the XML parsing world. I found a solution for reading an XML file with namespaces, given below. I referred to the Stack Overflow question "Default XML namespace, JDOM, and XPath", and reading an element works. But I am not able to modify the element. Here is my problem statement.
My XML file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<ProofSpecification xmlns="http://www.zurich.ibm.com/security/idemix"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.zurich.ibm.com/security/idemix ProofSpecification.xsd">
<Declaration>
<AttributeId name="id1" proofMode="unrevealed" type="string" />
<AttributeId name="id2" proofMode="unrevealed" type="string" />
</Declaration>
<Specification>
<Credentials>
<Credential issuerPublicKey="file:/Users/ipk.xml"
credStruct="file:/Users/CredStruct_ResUAC.xml" name="SpecResUAC">
<Attribute name="FirstName">id1</Attribute>
<Attribute name="LastName">id2</Attribute>
</Credential>
</Credentials>
<Pseudonyms>
<Pseudonym name="pseudonym"></Pseudonym>
<DomainPseudonym>1331859289489</DomainPseudonym>
</Pseudonyms>
<Messages />
</Specification>
</ProofSpecification>
And my Java code snippet looks like this:
public class ModifyXMLFileJDom {
public static void main(String[] args) throws JDOMException, IOException {
try {
SAXBuilder builder = new SAXBuilder();
File xmlFile = new File("/blabla/ProofSpecResUAC_old.xml");
Document doc = (Document) builder.build(xmlFile);
XPath xpath = XPath.newInstance("x:ProofSpecification/x:Specification/x:Pseudonyms/x:DomainPseudonym");
xpath.addNamespace("x", doc.getRootElement().getNamespaceURI());
System.out.println("domainPseudonym: "+xpath.valueOf(doc));
xpath.setVariable("555555", doc);
XMLOutputter xmlOutput = new XMLOutputter();
xmlOutput.setFormat(Format.getPrettyFormat());
xmlOutput.output(doc, new FileWriter("/blabla/proofSpecOut.xml"));
System.out.println("File updated!");
} catch (IOException io) {
io.printStackTrace();
} catch (JDOMException e) {
e.printStackTrace();
}
}
}
This piece of code works fine, and reading the element DomainPseudonym works.
But I want to modify this element from
<DomainPseudonym>1331859289489</DomainPseudonym> to
<DomainPseudonym>555555</DomainPseudonym>
I tried modifying it with xpath.setVariable("555555", doc), but that does not work and does not raise any error. The end result is that the same content is copied to the new XML file "proofSpecOut.xml".
XPath is not used for modifying the XML, only for finding parts of it. XPath.setVariable() only sets variables that are referenced inside your XPath expression. Use XPath.selectSingleNode() to retrieve the Element from your document, and then modify that Element directly with Element.setText().
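The same "select with XPath, then modify the node itself" pattern can be sketched with the JDK's own javax.xml.xpath and DOM classes, which is what the example below uses so that it runs without JDOM on the classpath. In JDOM 1.x the corresponding calls would be the selectSingleNode() and setText() named above. PseudonymEditor and its method names are invented for this illustration.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: finds DomainPseudonym via XPath, then edits the node directly.
final class PseudonymEditor {
    static final String NS = "http://www.zurich.ibm.com/security/idemix";
    static final String PATH =
            "/x:ProofSpecification/x:Specification/x:Pseudonyms/x:DomainPseudonym";

    // Binds the prefix "x" to the default namespace used by the document.
    private static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "x".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return NS.equals(uri) ? "x" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            String prefix = getPrefix(uri);
            return prefix == null
                    ? Collections.<String>emptyIterator()
                    : Collections.singleton(prefix).iterator();
        }
    };

    private static XPath newXPath() {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(CONTEXT);
        return xpath;
    }

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required, otherwise the x: steps match nothing
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    static String read(Document doc) {
        try {
            return newXPath().evaluate(PATH, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    static void write(Document doc, String value) {
        try {
            // XPath only locates the node; the change is made through the DOM API
            Node node = (Node) newXPath().evaluate(PATH, doc, XPathConstants.NODE);
            if (node == null) {
                throw new IllegalArgumentException("DomainPseudonym not found");
            }
            node.setTextContent(value);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After write(doc, "555555") the document can be serialized as before, and the output file will contain the new value.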
Hi, I need to parse this XML:
<WhoisRecord xmlns="http://adam.kahtava.com/services/whois" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<DomainName>68.140.1.1</DomainName>
<RegistryData><AbuseContact><Email>abuse-mail@verizonbusiness.com</Email><Name>abuse</Name><Phone>+1-800-900-0241</Phone></AbuseContact><AdministrativeContact><Email>stephen.r.middleton@verizon.com</Email><Name>Verizon Internet Services</Name><Phone>800-243-6994</Phone></AdministrativeContact><BillingContact i:nil="true"/><CreatedDate>2002-05-13T00:00:00-04:00</CreatedDate><RawText i:nil="true"/><Registrant><Address>22001 Loudoun County Parkway</Address><City>Ashburn</City><Country>US</Country><Name>UUNET Technologies, Inc.</Name><PostalCode>20147</PostalCode><StateProv>VA</StateProv></Registrant><TechnicalContact><Email>swipper@verizonbusiness.com</Email><Name>swipper</Name><Phone>+1-800-900-0241</Phone></TechnicalContact><UpdatedDate>2004-03-16T00:00:00-05:00</UpdatedDate><ZoneContact i:nil="true"/></RegistryData></WhoisRecord>
My code looks like this:
public class XMLParser
{
String streamTitle = "";
/** Called when the activity is first created.
* @throws IOException
* @throws SAXException */
public String startparse(String xml) throws SAXException, IOException
{
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
try
{
DocumentBuilder builder = builderFactory.newDocumentBuilder();
builder.parse(xml);
}
catch (ParserConfigurationException e)
{
e.printStackTrace();
}
return builderFactory.getAttribute("WhoisRecord").toString();
}
}
When I try to return something from startparse, I simply get nothing.
XMLParser xmlpar = new XMLParser();
Log.v("Faruk TEST ", "udss:"+xmlpar.startparse(temp));
Does anyone know a simple solution for this problem?
Try this, I hope it is what you are looking for:
Android: parse XML from string problems
I think that your call to builderFactory.getAttribute is wrong.
DocumentBuilder.parse() returns a Document object, and this will contain the DOM that you've just parsed. You can use this to access the elements of the XML.
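Two details are worth spelling out here. DocumentBuilder.parse(String) treats its argument as a URI, so XML that is already held in a String has to be wrapped in an InputSource. And because WhoisRecord declares a default namespace, lookups are more predictable with a namespace-aware factory. A minimal sketch under those assumptions follows; WhoisParser and domainName are names made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: pulls the DomainName out of a WhoisRecord held in a String.
final class WhoisParser {
    static final String NS = "http://adam.kahtava.com/services/whois";

    /** Returns the DomainName text, or null if the element is not present. */
    static String domainName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // parse(String) would treat the text as a URI, so wrap it in an InputSource
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList names = doc.getElementsByTagNameNS(NS, "DomainName");
            return names.getLength() == 0 ? null : names.item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse whois XML", e);
        }
    }
}
```

The other fields (Registrant, AbuseContact and so on) can be read the same way from the returned Document.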
Try this Working with XML-Android
When I parse my xml file (variable f) in this method, I get an error
C:\Documents and Settings\joe\Desktop\aicpcudev\OnlineModule\map.dtd (The system cannot find the path specified)
I know I do not have the dtd, nor do I need it. How can I parse this File object into a Document object while ignoring DTD reference errors?
private static Document getDoc(File f, String docId) throws Exception{
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(f);
return doc;
}
Try setting features on the DocumentBuilderFactory:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(false);
dbf.setNamespaceAware(true);
dbf.setFeature("http://xml.org/sax/features/namespaces", false);
dbf.setFeature("http://xml.org/sax/features/validation", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder db = dbf.newDocumentBuilder();
...
Ultimately, I think the options are specific to the parser implementation. Here is some documentation for Xerces2 if that helps.
A similar approach to the one suggested by @anjanb:
builder.setEntityResolver(new EntityResolver() {
@Override
public InputSource resolveEntity(String publicId, String systemId)
throws SAXException, IOException {
if (systemId.contains("foo.dtd")) {
return new InputSource(new StringReader(""));
} else {
return null;
}
}
});
I found that simply returning an empty InputSource worked just as well.
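A compact, self-contained version of the "always return an empty InputSource" variant might look like the sketch below. It assumes Java 8 or later for the lambda, and the class name DtdSkippingParser is invented for the example. Note that it will still fail for documents that use entities defined in the skipped DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: parses XML while ignoring any external DTD it references.
final class DtdSkippingParser {

    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Answer every external entity request (including the DTD) with empty content,
            // so the parser never tries to open the missing file.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }
}
```

For a File instead of a String, the same resolver can be set on the builder before calling parse(f).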
I ran into a case where the DTD file was in the jar file along with the XML. I solved it based on the examples here, as follows:
DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new EntityResolver() {
public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
if (systemId.contains("doc.dtd")) {
InputStream dtdStream = MyClass.class
.getResourceAsStream("/my/package/doc.dtd");
return new InputSource(dtdStream);
} else {
return null;
}
}
});
Source XML (With DTD)
<!DOCTYPE MYSERVICE SYSTEM "./MYSERVICE.DTD">
<MYACCSERVICE>
<REQ_PAYLOAD>
<ACCOUNT>1234567890</ACCOUNT>
<BRANCH>001</BRANCH>
<CURRENCY>USD</CURRENCY>
<TRANS_REFERENCE>201611100000777</TRANS_REFERENCE>
</REQ_PAYLOAD>
</MYACCSERVICE>
Java DOM implementation for accepting above XML as String and removing DTD declaration
public Document removeDTDFromXML(String payload) throws Exception {
System.out.println("### Payload received in XMlDTDRemover: " + payload);
Document doc = null;
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
dbf.setValidating(false);
dbf.setNamespaceAware(true);
dbf.setFeature("http://xml.org/sax/features/namespaces", false);
dbf.setFeature("http://xml.org/sax/features/validation", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource();
is.setCharacterStream(new StringReader(payload));
doc = db.parse(is);
} catch (ParserConfigurationException e) {
System.out.println("Parse Error: " + e.getMessage());
return null;
} catch (SAXException e) {
System.out.println("SAX Error: " + e.getMessage());
return null;
} catch (IOException e) {
System.out.println("IO Error: " + e.getMessage());
return null;
}
return doc;
}
Destination XML (Without DTD)
<MYACCSERVICE>
<REQ_PAYLOAD>
<ACCOUNT>1234567890</ACCOUNT>
<BRANCH>001</BRANCH>
<CURRENCY>USD</CURRENCY>
<TRANS_REFERENCE>201611100000777</TRANS_REFERENCE>
</REQ_PAYLOAD>
</MYACCSERVICE>
I know I do not have the dtd, nor do I need it.
I am suspicious of this statement; does your document contain any entity references? If so, you definitely need the DTD.
Anyway, the usual way of preventing this from happening is using an XML catalog to define a local path for "map.dtd".
Here's another user who had the same issue: http://forums.sun.com/thread.jspa?threadID=284209&forumID=34
User ddssot on that thread says:
myDocumentBuilder.setEntityResolver(new EntityResolver() {
public InputSource resolveEntity(java.lang.String publicId, java.lang.String systemId)
throws SAXException, java.io.IOException
{
if ("--myDTDpublicID--".equals(publicId)) // null-safe: publicId may be null
// this deactivates the open office DTD
return new InputSource(new ByteArrayInputStream("<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
else return null;
}
});
The user further mentions "As you can see, when the parser hits the DTD, the entity resolver is called. I recognize my DTD with its specific ID and return an empty XML doc instead of the real DTD, stopping all validation..."
Hope this helps.
I'm working with SonarQube, and SonarLint for Eclipse showed me the warning "Untrusted XML should be parsed without resolving external data" (squid:S2755).
I managed to solve it using:
factory = DocumentBuilderFactory.newInstance();
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
// If you can't completely disable DTDs, then at least do the following:
// Xerces 1 - http://xerces.apache.org/xerces-j/features.html#external-general-entities
// Xerces 2 - http://xerces.apache.org/xerces2-j/features.html#external-general-entities
// JDK7+ - http://xml.org/sax/features/external-general-entities
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
// Xerces 1 - http://xerces.apache.org/xerces-j/features.html#external-parameter-entities
// Xerces 2 - http://xerces.apache.org/xerces2-j/features.html#external-parameter-entities
// JDK7+ - http://xml.org/sax/features/external-parameter-entities
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
// Disable external DTDs as well
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
// and these as well, per Timothy Morgan's 2014 paper: "XML Schema, DTD, and Entity Attacks"
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);
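One consequence of the first setting is that any document carrying a DOCTYPE declaration is rejected outright instead of being parsed without its DTD. The sketch below demonstrates that behavior with the JDK's built-in parser; HardenedParser and accepts are names chosen for the illustration, and the feature URI is the Xerces one used above, so other parser implementations may not recognize it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper showing the effect of disallowing DOCTYPE declarations.
final class HardenedParser {

    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    /** Returns true if the hardened parser accepts the input, false if it rejects it. */
    static boolean accepts(String xml) {
        try {
            DocumentBuilder builder = newBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // includes the "DOCTYPE is disallowed" fatal error
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

So this configuration suits untrusted input, while files that legitimately carry a DOCTYPE need one of the DTD-skipping approaches instead.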