I have a number of pre-generated, static XML files containing SOAP requests. I can read them, send the request, and get back an answer from the server. I would like some advice on how to make this process dynamic:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<getProject xmlns="http://myserver/">
<atr1>string</atr1>
<atr2>string</atr2>
</getProject>
</soap:Body>
</soap:Envelope>
So, I want to be able to read these XML files and change the values of the nodes, etc., to real values gathered from user input at run time. What would be the best way to go? Should I read the XML file line by line and use a regex to replace the values, make a temp copy of the XML file and use SAX to replace the node values before sending the new XML, or completely discard the pre-generated XML files and create them on the fly instead? Any suggestions would be appreciated.
Using regexes would be fragile, because the formatting of the XML could change in ways you're not expecting, and still be well-formed and valid XML, but not fit your regexes. In general it's not recommended to use regexes to parse XML.
Use SAX to read in the XML file (why make a temp copy?) and copy all nodes to the output, modifying certain ones to insert the user-supplied values. That sounds like a good, workable solution.
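A minimal sketch of the SAX approach using only the JDK: an XMLFilterImpl sits between the parser and an identity Transformer, passes every event through unchanged, and swaps the text of selected elements. The class name SoapTemplateFilter and the map of replacement values are illustrative assumptions, and the filter assumes the replaced elements are leaf elements that contain only text (as atr1 and atr2 do in the example).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

public class SoapTemplateFilter extends XMLFilterImpl {
    private final Map<String, String> replacements; // local element name -> new text
    private String replacing; // element whose text is currently being replaced, or null

    public SoapTemplateFilter(Map<String, String> replacements) {
        this.replacements = replacements;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (replacements.containsKey(localName)) {
            replacing = localName;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // Drop the template's placeholder text inside elements being replaced
        if (replacing == null) {
            super.characters(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (localName.equals(replacing)) {
            // Emit the new value once, just before the closing tag
            char[] text = replacements.get(localName).toCharArray();
            super.characters(text, 0, text.length);
            replacing = null;
        }
        super.endElement(uri, localName, qName);
    }

    /** Copies the template XML, substituting the given element values. */
    public static String fill(String templateXml, Map<String, String> values) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // so localName is "atr1" and namespaces are preserved
        SoapTemplateFilter filter = new SoapTemplateFilter(values);
        filter.setParent(spf.newSAXParser().getXMLReader());
        StringWriter out = new StringWriter();
        // An identity Transformer serializes the filtered SAX events back to XML text
        TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(filter, new InputSource(new StringReader(templateXml))),
                new StreamResult(out));
        return out.toString();
    }
}
```

For example, fill(templateXml, Map.of("atr1", "PRJ-42")) would return the same envelope with <atr1>PRJ-42</atr1>. Because the new values go out through the serializer, characters such as < and & are escaped, which a regex substitution would not do for you.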
Create the XML from scratch: that does sound simpler if you know its structure in advance and it's not too big. One way to do this would be to use an XSLT stylesheet and pass in the user-supplied values as parameters.
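A sketch of the XSLT idea, assuming the envelope shape from the question. The stylesheet is embedded as a string here only to keep the example self-contained (in practice it would live in its own .xsl file); it writes the envelope literally and fills atr1 and atr2 from parameters set with Transformer.setParameter. The class and parameter names are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SoapFromXslt {
    // Hypothetical stylesheet: literal envelope, values supplied as xsl:param
    private static final String XSL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:param name=\"atr1\"/><xsl:param name=\"atr2\"/>"
        + "<xsl:template match=\"/\">"
        + "<soap:Envelope xmlns:soap=\"http://schemas.xmlsoap.org/soap/envelope/\"><soap:Body>"
        + "<getProject xmlns=\"http://myserver/\">"
        + "<atr1><xsl:value-of select=\"$atr1\"/></atr1>"
        + "<atr2><xsl:value-of select=\"$atr2\"/></atr2>"
        + "</getProject></soap:Body></soap:Envelope>"
        + "</xsl:template></xsl:stylesheet>";

    public static String build(String atr1, String atr2) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        t.setParameter("atr1", atr1);
        t.setParameter("atr2", atr2);
        StringWriter out = new StringWriter();
        // The template ignores its input, so any well-formed dummy document will do
        t.transform(new StreamSource(new StringReader("<dummy/>")), new StreamResult(out));
        return out.toString();
    }
}
```

Because xsl:value-of writes through the serializer, a user-supplied value such as a<b comes out escaped as a&lt;b.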
You could use Castor to create objects from the XML, and XML from the objects.
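import java.util.List;
import javax.xml.soap.SOAPBody; // SAAJ: removed from the JDK in Java 11, add it as a dependency there
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Sets the text of every element whose tag name is in tagNameList
private void changeTagData(List<String> tagNameList, SOAPBody body) {
    for (String tagName : tagNameList) {
        NodeList nodeList = body.getElementsByTagName(tagName);
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node node = nodeList.item(i);
            // Placeholder: substitute the user-supplied value here
            node.setTextContent("change tag data");
        }
    }
}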
XStream can also be used for this; I am doing something similar. If you like, you can try XStream too.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet href="Sample.xsl" type="text/xsl"?>
<MyDoc>.....</MyDoc>
I want to change the value of the href attribute to 'MyDoc.xsl'. I have tried using XPath, but it returns nothing:
//xml-stylesheet[contains(text(), 'Sample.xsl')]/@href
Also, using the Document only gives me elements starting at MyDoc:
NodeList list = taggedC32Doc.getElementsByTagName("*");
Is there any way I can do this?
The line you want to change is a Processing Instruction, not an Element, so neither of your attempts to find it as an element will work. Try:
/processing-instruction('xml-stylesheet')
You can then get that node's data, which will be href="Sample.xsl" type="text/xsl". Use ordinary string manipulation to find and change the href pseudo-attribute in that string, then set the new data back into the ProcessingInstruction node. Sorry, most XML APIs don't help with this step: as far as XML is concerned, a PI's data is an unformatted string, even though it is usually structured to resemble attributes.
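A sketch of those steps with the JDK's DOM and XPath APIs. The class name is illustrative, and the regular expression is an assumption: it only handles a double-quoted href, which covers the example but not every legal way of writing the pseudo-attribute.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.regex.Matcher;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

public class StylesheetPi {
    /** Points the xml-stylesheet PI at newHref and returns the re-serialized document. */
    public static String changeHref(String xml, String newHref) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // The PI is a child of the document node, before the root element
        ProcessingInstruction pi = (ProcessingInstruction) XPathFactory.newInstance().newXPath()
                .evaluate("/processing-instruction('xml-stylesheet')", doc, XPathConstants.NODE);
        if (pi != null) {
            // PI data is a plain string such as: href="Sample.xsl" type="text/xsl"
            String data = pi.getData().replaceFirst("href=\"[^\"]*\"",
                    Matcher.quoteReplacement("href=\"" + newHref + "\""));
            pi.setData(data);
        }
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```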
I need to parse an XML string with MATLAB (caution: without file I/O, so I don't want to write the string to a file and then read it back). I'm receiving the strings from an HTTP connection, and the parsing should be very fast. I'm mostly concerned with reading the values of certain tags anywhere in the string.
The net is full of death threats about parsing XML with regexps, so I didn't want to get into that just yet. I know MATLAB has seamless Java integration, but I'm not very Java-savvy. Is there a quick way to get certain values out of XML very rapidly?
For example, I want to get the 'volume' information from the string below and write it to a variable.
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<root>
<volume>256</volume>
<length>0</length>
<time>0</time>
<state>stop</state>
....
For what it's worth, below is MATLAB code that calls Java to perform the required task without writing to an intermediate file:
%An XML formatted string
strXml = [...
'<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>' char(10)...
'<root>' char(10) ...
' <volume>256</volume>' char(10) ...
' <length>0</length>' char(10) ...
' <time>0</time>' char(10) ...
' <state>stop</state>' char(10) ...
'</root>' ];
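%"simple" java code to create a document from said string
%(an InputSource around a StringReader avoids the deprecated StringBufferInputStream)
xmlDocument = javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder.parse(org.xml.sax.InputSource(java.io.StringReader(strXml)));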
%"intuitive" methods to explore the xmlDocument
nodeList = xmlDocument.getElementsByTagName('volume');
numberOfNodes = nodeList.getLength();
firstNode = nodeList.item(0);
firstNodeContent = firstNode.getTextContent;
disp(firstNodeContent); %Returns '256'
As an alternative, if your application allows it, consider passing the URL directly to your XML parser. The Java code below is untested; this approach probably also lets you use MATLAB's built-in xslt function.
xmlDocument = javax.xml.parsers.DocumentBuilderFactory.newInstance().newDocumentBuilder.parse('URL_AS_A_STRING_HERE');
See the Java API documentation for details; start at the "javax.xml.parsers" package.
MATLAB also has a whole family of functions for dealing with XML, including xmlread and xmlwrite. Those should be pretty useful for your problem.
I am not familiar with MATLAB's APIs at all, but I would point out that the DOM method outlined by Pursuit will take the most time and memory if you only want specific values out of the XML stream you are getting back over the HTTP connection.
While StAX gives you the fastest parsing approach in Java, its API can be unwieldy, especially if you are not that familiar with Java. You could use SJXP, an extremely thin abstraction on top of StAX parsing in Java (disclaimer: I am the author). It lets you define paths to the elements you want; you then give the parser a stream (your HTTP stream in this case) and it pulls out all the values for you.
As an example, say you wanted the /root/state and /root/volume values from the example XML you posted; the Java would look something like this:
// Create /root/state rule
IRule stateRule = new DefaultRule(Type.CHARACTER, "/root/state") {
    @Override
    public void handleParsedCharacters(XMLParser parser, String text, Object userObject) {
        System.out.println("State is: " + text);
    }
};
// Create /root/volume rule
IRule volRule = new DefaultRule(Type.CHARACTER, "/root/volume") {
    @Override
    public void handleParsedCharacters(XMLParser parser, String text, Object userObject) {
        System.out.println("Volume is: " + text);
    }
};
// Create the parser with the given rules
XMLParser parser = new XMLParser(stateRule, volRule);
You can do all of that initialization on program start then at some point later when you are processing the stream from your HTTP connection, you would do something like:
parser.parse(httpConnection.getInputStream());
or similar; all of the handler code defined in your rules will then be called as the parser runs through the stream of characters from the HTTP connection.
As I mentioned, I am not familiar with MATLAB and don't know the proper way to "MATLAB-i-fy" this code, but from the first example it looks like you can more or less use the Java APIs directly. In that case, this solution will be faster than the DOM approach and use significantly less memory for parsing, if that is important to you.
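For comparison, a sketch of plain StAX (javax.xml.stream, part of the JDK) without SJXP: it pulls events from the stream and stops as soon as it reaches the wanted element, so the rest of the document is never held in memory. Unlike an SJXP path rule, this hypothetical firstValue helper matches the element's local name anywhere in the document rather than a full /root/volume path.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxValue {
    /** Returns the text of the first element with the given local name, or null if absent. */
    public static String firstValue(InputStream in, String elementName) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && r.getLocalName().equals(elementName)) {
                    return r.getElementText(); // reads text up to the matching end tag
                }
            }
            return null;
        } finally {
            r.close(); // frees the reader; does not close the underlying stream
        }
    }
}
```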
I’m trying to parse text from a file that comes in a pseudo-XML format. I can get a DOM document out of it when it has the following structure:
<product>
<product_id>234567</product_id>
<description>abc</description>
</product>
The problem I’m running into happens when the structure is similar to the following:
<product>
<product_id>234567</product_id>
<description>abc</description>
<quantity 1:2>
<version>1.1</version>
</quantity 1:2>
<version>1.2</version>
<quantity 2:2>
</quantity 2:2>
</product>
It generates the following exception due to the space in <quantity 1:2>:
org.xml.sax.SAXParseException:[Fatal Error] :1:167: Element type " quantity " must be followed by either attribute specifications, ">" or "/>"
I can get around this by replacing the space with an underscore. The problem is that the structure can vary in size and include several child nodes with the same format (<node 1:x>), and the file can contain hundreds of structures to parse. Is there a class available that will parse text like this and return a tree-like object?
Your file is not XML at all, and SAX is for XML (Simple API for XML). You should rethink your structure so you can do something like:
<quantity myAttr="1.2">
<version>1.2</version>
</quantity>
<quantity myAttr="1.x">
<version>1.1</version>
</quantity>
<version>1.0</version>
Or something like that.
Preprocess the file: change opening tags of the form <element x:y> to <element value="x:y"> and closing tags of the form </element x:y> to </element>. Then your DOM/SAX parsers will not choke.
I would suggest using a regular expression to help, but that way leads to madness.
It generates the following exception due to the space in <quantity 1:2>
This is not the root cause of the error. The root cause is, as people have already mentioned, that your file format is not valid XML. A valid XML tag would look like <quantity attr1="val1" attr2="val2">.
It sounds like you have no control over the file format. In that case, I think the easiest way is to preprocess your file into valid XML and then have a DOM/SAX parser parse it:
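import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Read the whole pseudo-XML file (assumed to be UTF-8)
String xml = new String(Files.readAllBytes(Paths.get("pseudo.pxml")), StandardCharsets.UTF_8);
// Closing tags first: </quantity 1:2> becomes </quantity>
xml = xml.replaceAll("</(\\w+)\\s+[^\\s>]+:[^\\s>]+>", "</$1>");
// Then opening tags: <quantity 1:2> becomes <quantity value="1:2">
xml = xml.replaceAll("<(\\w+)\\s+([^\\s>]+:[^\\s>]+)>", "<$1 value=\"$2\">");
InputStream xmlIn = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
/* use xmlIn for your XML parsers */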
Note that I did not test this code, nor is it optimized; I just wanted to give you an idea.
I have an XML document that I'm trying to parse and read, but I don't know how many nodes it may contain. How do I read each node and its value?
For example:
<company>
<personName>John</personName>
<emailId>abc@test.com</emailId>
<department>Products</department>
(may have additional nodes and values)
</company>
Sorry, I forgot to add my code, which uses DOM:
Document document = getDocumentBuilder().parse(new ByteArrayInputStream(myXML.getBytes("UTF-8")));
String xPathExp = "//company";
XPath xPath = getXPath();
NodeList nodeList = (NodeList) xPath.evaluate(xPathExp, document, XPathConstants.NODESET);
nodeListSize = nodeList.getLength();
System.out.println("#####nodeListSize" + nodeListSize);
for (int i = 0; i < nodeListSize; i++) {
    element = (Element) nodeList.item(i);
    m1XMLOutputResponse = element.getTextContent();
    System.out.println("#####" + element.getTagName() + " " + element.getTextContent());
}
Consider using the JAXB library. It's a really painless way of mapping your XML to Java classes and back. The basic principle is that JAXB takes your XML Schemas (XSD) and generates corresponding Java classes for you. Then you just call the marshal or unmarshal methods, which generate XML from your Java objects or populate your Java objects with the contents of the XML.
The only drawback is, of course, that you'd need to know how to write the XML Schemas :)
Learn how to use the XML DOM. Walking an element's child nodes lets you fetch each node's name and value without knowing in advance how many there are.
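A minimal DOM sketch for the <company> example in the question: it walks the root element's child nodes, skips the whitespace text nodes between them, and records each child element's name and text, so the number of children does not need to be known in advance. The class name is illustrative, and a repeated element name would overwrite the earlier entry in the map.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildValues {
    /** Maps each child element name of the root to its text content, in document order. */
    public static Map<String, String> read(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> values = new LinkedHashMap<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            // Only element nodes; whitespace between tags shows up as text nodes
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                values.put(((Element) n).getTagName(), n.getTextContent());
            }
        }
        return values;
    }
}
```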
We serialize/deserialize XML using XStream... and just got an OutOfMemoryError.
First, I don't understand why we're getting the error, as we have 500MB allocated to the server.
The question is: what changes should we make to stay out of trouble? We want to ensure this implementation scales.
Currently we have ~60K objects, each ~50 bytes. We load the 60K POJOs into memory and serialize them to a String, which we send to a web service using HttpClient. When receiving, we get the entire String, then convert it to POJOs. The XML/object hierarchy looks like this:
<root>
<meta>
<date>10/10/2009</date>
<type>abc</type>
</meta>
<data>
<field>x</field>
</data>
[thousands of <data>]
</root>
I gather the best approach is to not hold the POJOs in memory and not write the contents to a single String. Instead, we should write the individual <data> POJOs to a stream. XStream supports this, but it seems the <meta> element wouldn't be supported. The data would need to be in this form:
<root>
<data>
<field>x</field>
</data>
[thousands of <data>]
</root>
So what approach is easiest to stream the entire tree?
You definitely want to avoid serializing your POJOs into a humongous String and then writing that String out. Use the XStream APIs to serialize the POJOs directly to your OutputStream. I ran into the same situation earlier this year when I found that I was generating 200-300 MB XML documents and getting OutOfMemoryErrors. It was very easy to make the switch.
And ditto of course for the reading side. Don't read the XML into a String and ask XStream to deserialize from that String: deserialize directly from the InputStream.
You mention a second issue: not being able to serialize both the <meta> element and the <data> elements. I don't think this is an XStream problem or limitation, as I routinely serialize much more complex structures along the lines of:
<myobject>
<item>foo</item>
<anotheritem>foo</anotheritem>
<alist>
<alistitem>
<value1>v1</value1>
<value2>v2</value2>
<value3>v3</value3>
...
</alistitem>
...
<alistitem>
<value1>v1</value1>
<value2>v2</value2>
<value3>v3</value3>
...
</alistitem>
</alist>
<anotherlist>
<anotherlistitem>
<valA>A</valA>
<valB>B</valB>
<valC>C</valC>
...
</anotherlistitem>
...
</anotherlist>
</myobject>
I've successfully serialized and deserialized nested lists too.
I'm not sure what the problem is here; you've already found your answer on that web page.
The example code on the link you provided suggests:
Writer someWriter = new FileWriter("filename.xml");
ObjectOutputStream out = xstream.createObjectOutputStream(someWriter, "root");
out.writeObject(dataObject);
// iterate over your objects...
out.close();
and for reading, nearly identical code, but with Reader instead of Writer and Input instead of Output:
Reader someReader = new FileReader("filename.xml");
ObjectInputStream in = xstream.createObjectInputStream(someReader);
DataObject foo = (DataObject)in.readObject();
// do some stuff here while there's more objects...
in.close();
I'd suggest using tools like VisualVM or Eclipse Memory Analyzer to make sure you don't have a memory leak or some other memory problem.
Also, how do you know each object is 50 bytes? That doesn't sound likely.
Use XMLStreamWriter (or XStream) to serialize it; that way you can write whatever you want to the stream. If you have the option of getting the input stream instead of the entire String, use a SAXParser. It is event-based and, although the API may be a little clumsy, you will be able to read any XML that is thrown at you, even if it is huge (I have parsed XML files of more than 2 GB with SAXParser).
Just as a side note, you should pass the binary data, not a String, to an XML parser. XML parsers determine the encoding of the bytes that follow from the XML declaration at the beginning of the document:
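A sketch of the XMLStreamWriter route, which also covers the <meta> element: the header is written once and each <data> element is written as it is produced, straight to the OutputStream, so the whole document never exists as one String. The method signature (a date, a type, and an Iterable of field values) is a hypothetical stand-in for the real POJOs.

```java
import java.io.OutputStream;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StreamingExport {
    /** Writes <root><meta>...</meta><data>...</data>...</root> directly to the stream. */
    public static void write(OutputStream out, String date, String type, Iterable<String> fields)
            throws XMLStreamException {
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        w.writeStartDocument("UTF-8", "1.0");
        w.writeStartElement("root");
        w.writeStartElement("meta");
        writeLeaf(w, "date", date);
        writeLeaf(w, "type", type);
        w.writeEndElement(); // </meta>
        // Each item is written and forgotten; the source could be a lazy iterator
        for (String field : fields) {
            w.writeStartElement("data");
            writeLeaf(w, "field", field);
            w.writeEndElement(); // </data>
        }
        w.writeEndElement(); // </root>
        w.writeEndDocument();
        w.flush();
        w.close(); // does not close 'out' itself
    }

    private static void writeLeaf(XMLStreamWriter w, String name, String text)
            throws XMLStreamException {
        w.writeStartElement(name);
        w.writeCharacters(text); // escapes &, < and > as needed
        w.writeEndElement();
    }
}
```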
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
A Java String has already been decoded using some charset, which may not match the one the document declares. It's better practice to let the XML parser read the original byte stream before you create a String from it.
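A small sketch of the difference, assuming a hypothetical Latin-1 payload that declares its own encoding: the DocumentBuilder is given the raw InputStream and picks the charset from the declaration, whereas decoding the same bytes yourself with the wrong charset corrupts the non-ASCII character before the parser ever sees it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ParseBytes {
    /** Parses raw bytes, letting the parser pick the charset from the XML declaration. */
    public static Document parse(InputStream rawBytes) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(rawBytes);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical payload: Latin-1 bytes that declare their own encoding
        byte[] body = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>"
                .getBytes(StandardCharsets.ISO_8859_1);
        // Decoding the bytes yourself with the wrong charset (UTF-8 here) mangles the final character
        System.out.println(new String(body, StandardCharsets.UTF_8));
        // The parser reads encoding="ISO-8859-1" from the declaration and decodes correctly
        System.out.println(parse(new ByteArrayInputStream(body)).getDocumentElement().getTextContent());
    }
}
```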