The API I need to work with does not support XPath, which is a bit of a headache! :-( lol
The XML I want to parse is held in a String. My questions:
Is there a Java equivalent of "simplexml_load_string", which turns the string into an XML document for parsing?
Which is better for parsing, SAX or DOM? I need to get a couple of values out of the XML, and the structure isn't that deep (3 levels).
Thanks!
Maybe this will help you
//http://developer.android.com/intl/de/reference/android/content/res/XmlResourceParser.html
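import java.io.IOException;
import android.content.res.XmlResourceParser;
import android.util.Log;
import org.xmlpull.v1.XmlPullParserException;
// ctx is an android.content.Context; R.xml.rules is the app's own XML resource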
try {
XmlResourceParser xrp = ctx.getResources().getXml(R.xml.rules);
while (xrp.getEventType() != XmlResourceParser.END_DOCUMENT) {
if (xrp.getEventType() == XmlResourceParser.START_TAG) {
String s = xrp.getName();
if (s.equals("category")) {
String catname = xrp.getAttributeValue(null, "name");
String rule = xrp.getAttributeValue(null, "rule");
}
} else if (xrp.getEventType() == XmlResourceParser.END_TAG) {
;
} else if (xrp.getEventType() == XmlResourceParser.TEXT) {
;
}
xrp.next();
}
xrp.close();
} catch (XmlPullParserException xppe) {
// Hand the exception to the logger; calling xppe.toString() on its own discards the result
Log.e(TAG(), "Failure of .getEventType or .next, probably bad file format", xppe);
} catch (IOException ioe) {
Log.e(TAG(), "Unable to read resource file", ioe);
}
Not sure.
If the XML file/string is small, DOM is a good choice as it provides more capability. SAX should be used for larger XML files where memory usage and performance are a concern.
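For the "simplexml_load_string" part of the question, a minimal DOM sketch might look like the following. It assumes the XML is small and already held in a String; the class name XmlStringDemo and the helper firstText are made up for illustration, while DocumentBuilder, InputSource and StringReader are the standard JDK classes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of any library.
final class XmlStringDemo {

    // Parses XML held in a String into a DOM Document.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // InputSource + StringReader lets the parser read from memory instead of a file
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse the XML string", e);
        }
    }

    // Returns the text of the first element with the given tag name, or null if there is none.
    static String firstText(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }
}
```

With this, something like XmlStringDemo.firstText(XmlStringDemo.parse(xmlString), "name") pulls out a single value without needing XPath.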
Can anyone help me obtain text from the "Font Patterns (FNG)" field of an AFP file? Is there any library (preferably Java) which can be used for this task?
Thank you,
You can try afplib. It has some sample code that dumps all structured fields (org.afplib.samples.DumpAFP). It produces output like this:
...
FNG number:47,offset:49787,id:13889161,length:8201,rawData:null,charset:null,PatData:[B@4e3958e7,
FNG number:48,offset:57988,id:13889161,length:8201,rawData:null,charset:null,PatData:[B@77f80c04,
FNG number:49,offset:66189,id:13889161,length:8201,rawData:null,charset:null,PatData:[B@1dac5ef,
FNG number:50,offset:74390,id:13889161,length:6991,rawData:null,charset:null,PatData:[B@5c90e579,
EFN number:51,offset:81381,id:13871497,length:17,rawData:null,charset:null,RSName:C0EX0480,
You could use the binary array PatData to extract the font pattern like this:
try (AfpInputStream in = new AfpInputStream(
new BufferedInputStream(new FileInputStream(args[0])))) {
SF sf;
while((sf = in.readStructuredField()) != null) {
if(sf instanceof FNG) {
byte[] pattern = ((FNG)sf).getPatData();
}
}
} catch (IOException e) {
e.printStackTrace();
}
I want to read an XML file in Java and then update certain elements in that file with new values. My file is larger than 200 MB and performance is important, so the DOM model cannot be used.
I feel that a StAX parser is the solution, but there is no decent literature on using Java StAX to read and then write XML back to the same file.
(For reference, I have been using the Java tutorial and this helpful tutorial to get what I have so far.)
I am using Java 7, but there don't seem to have been any updates to the XML parsing API in a long time, so this probably isn't relevant.
Currently I have this:
public static String readValueFromXML(final File xmlFile, final String value) throws FileNotFoundException, XMLStreamException
{
// newFactory() is a static factory method, so it must not be preceded by "new"
XMLEventReader reader = XMLInputFactory.newFactory().createXMLEventReader(new FileReader(xmlFile));
String found = "";
boolean read = false;
while (reader.hasNext())
{
XMLEvent event = reader.nextEvent();
if (event.isStartElement() &&
event.asStartElement().getName().getLocalPart().equals(value))
{
read = true;
}
if (event.isCharacters() && read)
{
found = event.asCharacters().getData();
break;
}
}
return found;
}
which will read the XMLFile and return the value of the selected element. However, I have another method updateXMLFile(final File xmlFile, final String value) which I want to use in conjunction with this.
So my question is threefold:
Is there a StAX implementation for editing XML?
Will XPath be any help? Can that be used without converting my file to a Document?
(More Generally) Why doesn't Java have a better XML API?
There are two things you may want to look at. The first is to use JAXB to bind the XML to POJOs which you can then have your way with and serialize the structure back to XML when needed.
The second is a JDBC driver for XML. Several are available for a fee; I'm not sure whether there are any open source ones. In my experience JAXB is the better choice. If the XML file is too large to handle efficiently with JAXB, I think you need to look at using a database as a replacement for the XML file.
This is my approach, which reads events from the file using StAX and writes them to another file. The values are updated as the loop passes over the correctly named elements.
public void read(String key, String value)
{
try (FileReader fReader = new FileReader(inputFile); FileWriter fWriter = new FileWriter(outputFile))
{
XMLEventFactory factory = XMLEventFactory.newInstance();
XMLEventReader reader = XMLInputFactory.newFactory().createXMLEventReader(fReader);
XMLEventWriter writer = XMLOutputFactory.newFactory().createXMLEventWriter(fWriter);
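// The flag must be declared outside the loop; otherwise it is reset on every event
// and the characters branch is never reached.
boolean update = false;
while (reader.hasNext())
{
XMLEvent event = reader.nextEvent();
if (event.isStartElement() && event.asStartElement().getName().getLocalPart().equals(key))
{
update = true;
}
else if (event.isCharacters() && update)
{
// Swap the old text for the new value
event = factory.createCharacters(value);
update = false;
}
else if (event.isEndElement())
{
// An empty target element has no text; do not overwrite text that follows it
update = false;
}
writer.add(event);
}
// Push buffered events to the FileWriter before try-with-resources closes it
writer.flush();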
}
catch (XMLStreamException | FactoryConfigurationError | IOException e)
{
e.printStackTrace();
}
}
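The same read-modify-write idea can be tried out on small in-memory strings before pointing it at a 200 MB file. The following is only a sketch under a few assumptions: the class name StaxUpdateSketch and its methods are hypothetical, elements are matched by local name only, and every element with that name gets the new text. Coalescing is switched on so that the text of an element arrives as a single Characters event.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical sketch class, not part of any library.
final class StaxUpdateSketch {

    // Copies every event from in to out, replacing the text of elements named key.
    static void copyWithUpdate(Reader in, Writer out, String key, String value)
            throws XMLStreamException {
        XMLInputFactory inputFactory = XMLInputFactory.newFactory();
        // Deliver adjacent text chunks as one Characters event
        inputFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = inputFactory.createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newFactory().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();

        boolean inTarget = false; // true only directly after the start tag of a target element
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                inTarget = event.asStartElement().getName().getLocalPart().equals(key);
            } else if (event.isEndElement()) {
                inTarget = false;
            } else if (event.isCharacters() && inTarget) {
                event = events.createCharacters(value);
                inTarget = false;
            }
            writer.add(event);
        }
        writer.flush();
        writer.close();
        reader.close();
    }

    // Convenience wrapper for trying the sketch on a String.
    static String update(String xml, String key, String value) {
        StringWriter out = new StringWriter();
        try {
            copyWithUpdate(new StringReader(xml), out, key, value);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not rewrite the XML", e);
        }
        return out.toString();
    }
}
```

For a real file, copyWithUpdate can be given a FileReader and a FileWriter on a temporary file, which is then renamed over the original once the copy has finished; writing to the same file while reading it is not safe.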
So I've been working on this project of mine for the past two weeks and I've not made any headway with this. My issue isn't with parsing the XML file to begin with, but rather what to do with it afterwards. So I've made programs with SAX, StAX and DOM parsers in which I take a very large XML file and then print out the elements and their values in order. However, the XML I'm dealing with is large so using DOM is inefficient of course. Another problem I have however is that the xml file has 40,000 entries of information and its structure is complicated. This is a little excerpt of it:
<metabolite>
<version>3.5</version>
<creation_date>2005-11-16 08:48:42 -0700</creation_date>
<update_date>2013-02-08 17:07:44 -0700</update_date>
<accession>HMDB00002</accession>
<secondary_accessions>
</secondary_accessions>
<name>1,3-Diaminopropane</name>
<description>1,3-Diaminopropane is a stable, flammable and highly hydroscopic fluid. It is a polyamine that is normally quite toxic if swallowed, inhaled or absorbed through the skin. It is a catabolic byproduct of spermidine. It is also a precursor in the enzymatic synthesis of beta-alanine. 1, 3-Diaminopropane is involved in the arginine/proline metabolic pathways and the beta-alanine metabolic pathway.</description>
<synonyms>
<synonym>1,3-Diamino-N-propane</synonym>
<synonym>1,3-Propanediamine</synonym>
<synonym>1,3-Propylenediamine</synonym>
<synonym>1,3-Trimethylenediamine</synonym>
<synonym>3-Aminopropylamine</synonym>
<synonym>a,w-Propanediamine</synonym>
<synonym>Propane-1,3-diamine</synonym>
<synonym>Trimethylenediamine</synonym>
</synonyms>
<chemical_formula>C3H10N2</chemical_formula>
So this is one of 40,000 entries, and it contains many more elements etc. in it. What I need my program to do is let the user select what information he wants from the 40,000 entries, and then return that information in the form of an Excel sheet. So if I only wanted, say, the version number and name for all 40,000 entries, it would return just those values into Excel. Currently I've made a program that loops through using StAX and prints all the elements and values to the console. How would I go about creating a data structure, such as a tree or something, that would then allow me to do what I want (i.e. traverse that data and return only the data I'm seeking)?
This is what I've done so far in terms of looping through my document and returning the information in order for the 40,000 entries:
public class xmlRead {
private static XMLStreamReader reader;
public xmlRead(){
try{
InputStream file = new FileInputStream("/Users/Kevlar/Dropbox/PhD/Java/HMDB/testOutput.xml");
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
reader = inputFactory.createXMLStreamReader(file);
assert(reader.getEventType() == XMLEvent.START_DOCUMENT);
} catch (XMLStreamException e){
System.err.println("XMLStreamException : " + e.getMessage());
} catch (FactoryConfigurationError e){
System.err.println("FactoryConfigurationError : " + e.getMessage());
} catch (FileNotFoundException e){
System.err.println("FileNotFoundException : " + e.getMessage());
}
}
public void metaboliteInfo() throws XMLStreamException{
while(reader.hasNext()){
int event = reader.getEventType();
// Compare strings with equals(); == only checks whether both are the same object
if(event == XMLStreamConstants.START_ELEMENT && "metabolite".equals(reader.getLocalName())){
System.out.println("New " + reader.getLocalName());
mainElements(reader);
}
else if(event == XMLStreamConstants.END_DOCUMENT){
System.out.println("end of document");
break;
}
else{
reader.next();
}
}
reader.close();
}
public void mainElements(XMLStreamReader reader) throws XMLStreamException{
int level = 1;
do{
int event = reader.next();
if(event == XMLStreamConstants.START_ELEMENT){
System.out.println("Element :" + reader.getLocalName());
level++;
if(level == 2){
subElements(reader);
level--;
}
}
else if(event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()){
System.out.println(reader.getText());
}
else if(event == XMLStreamConstants.END_ELEMENT){
level--;
}
}while(level > 0);
// Do not close the reader here: metaboliteInfo() is still using it and closes it at the end
}
private void subElements(XMLStreamReader reader) throws XMLStreamException {
int level = 1;
do{
int event = reader.next();
if(event == XMLStreamConstants.START_ELEMENT){
System.out.println("Sub element :" + reader.getLocalName());
level++;
if(level == 2){
subElements(reader);
level--;
}
}
else if(event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()){
System.out.println(reader.getText());
}
else if(event == XMLStreamConstants.END_ELEMENT){
level--;
}
}while(level > 0);
// Do not close the reader here: the caller is still reading from it
}
public void findElements(XMLStreamReader reader, String element) throws XMLStreamException{
int level = 1;
do{
int event = reader.next();
if(event == XMLStreamConstants.START_ELEMENT){
if(element.equals(reader.getLocalName())){
System.out.println(reader.getLocalName());
}
level++;
if(level == 2){
subElements(reader);
level--;
}
}
else if(event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()){
System.out.println(reader.getText());
}
else if(event == XMLStreamConstants.END_ELEMENT){
level--;
}
}while(level > 0);
// Do not close the reader here: the caller is still reading from it
}
public static void main(String[] args) throws XMLStreamException{
xmlRead test = new xmlRead();
test.metaboliteInfo();
}
}
I should probably note here too that I'm not actually a programmer. I just have to deal with these XML files for my research and don't have anyone else to do it for me, so my knowledge of Java is limited, I'm afraid (i.e. explaining things in layman's terms would be great).
Look up JAXB. This is a framework for converting XML to Java objects and vice versa. If you use JAXB to auto-generate your Java classes for you, you don't need to worry about hand-rolling your own data structure.
You'll need to start off with an XML schema, which defines what your XML file is allowed to look like. If you don't have one already, you can create an XML Schema Definition (XSD) file from the XML file by using a tool such as XMLSpy.
JAXB provides a tool called xjc. This can be used to generate Java classes automatically from an XML schema. Where your XML has repeating tags, these Java classes contain collections that can be iterated over.
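If generating classes feels like too much for the selection task in the question, a lighter alternative is to keep streaming with StAX and remember only the requested fields of each metabolite. The sketch below is a rough illustration, not a finished tool: the class name MetaboliteCsvSketch is made up, it only picks up direct, text-only children of <metabolite> (so version or name, but not the nested synonym list), and it produces CSV lines, which Excel can open, rather than a native Excel file.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch class, not part of any library.
final class MetaboliteCsvSketch {

    // Returns a header line plus one CSV line per <metabolite>, holding only the wanted fields.
    static List<String> extract(Reader xml, List<String> wanted) {
        List<String> rows = new ArrayList<>();
        rows.add(toCsvLine(wanted));
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(xml);
            Map<String, String> current = null; // fields of the metabolite being read
            int depth = 0;                      // nesting depth below <metabolite>
            String capture = null;              // wanted element currently open, if any
            StringBuilder text = new StringBuilder();
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (current == null) {
                        if ("metabolite".equals(name)) {
                            current = new HashMap<>();
                            depth = 0;
                        }
                    } else {
                        depth++;
                        capture = (depth == 1 && wanted.contains(name)) ? name : null;
                        text.setLength(0);
                    }
                } else if (event == XMLStreamConstants.CHARACTERS && capture != null) {
                    text.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT && current != null) {
                    if (depth == 0) {
                        // </metabolite>: emit one row, leaving missing fields empty
                        List<String> values = new ArrayList<>();
                        for (String field : wanted) {
                            values.add(current.getOrDefault(field, ""));
                        }
                        rows.add(toCsvLine(values));
                        current = null;
                    } else {
                        if (capture != null) {
                            current.put(capture, text.toString().trim());
                            capture = null;
                        }
                        depth--;
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the XML", e);
        }
        return rows;
    }

    // Quotes every value so that commas inside names such as "1,3-Diaminopropane" are safe.
    static String toCsvLine(List<String> values) {
        List<String> quoted = new ArrayList<>();
        for (String v : values) {
            quoted.add("\"" + v.replace("\"", "\"\"") + "\"");
        }
        return String.join(",", quoted);
    }
}
```

Only one metabolite is held in memory at a time, so the size of the file matters little. Passing a FileReader and a list such as "version", "name" gives the rows to write into a .csv file. For all 40,000 entries it would be better to write each row out as it is produced rather than collect them in a list.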
XQuery solution. Using this expression you can filter the input XML document:
declare function local:rewrite($node as node()) as node()?
{
typeswitch ($node)
case element() return
if (matches(local-name($node), "(version|name|synonym)")) then
element {node-name($node)}
{
$node/@*,
for $child in $node/node() return local:rewrite($child)
}
else
()
default return
$node
};
for $m in //metabolite
return <metabolite>{for $c in $m/node() return local:rewrite($c)}</metabolite>
Replace (version|name|synonym) with a regular expression that matches the names of the XML elements you want to keep.
Java 7 code that evaluates the XQuery expression:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import net.sf.saxon.Configuration;
import net.sf.saxon.om.DocumentInfo;
import net.sf.saxon.query.DynamicQueryContext;
import net.sf.saxon.query.StaticQueryContext;
import net.sf.saxon.query.XQueryExpression;
import org.xml.sax.InputSource;
// inside a method
Configuration config = new Configuration();
StaticQueryContext sqc = config.newStaticQueryContext();
DynamicQueryContext dqc = new DynamicQueryContext(config);
String xq = "XQUERY_EXPRESSION";
try (InputStream xmlFileInput = new FileInputStream("data.xml");
OutputStream xmlFileOutput = new FileOutputStream("data-filtered.xml")) {
XQueryExpression expression = sqc.compileQuery(xq);
SAXSource source = new SAXSource(new InputSource(xmlFileInput));
DocumentInfo di = config.buildDocument(source);
dqc.setContextItem(di);
expression.run(dqc, new StreamResult(xmlFileOutput), null);
} catch (Exception e) {
System.err.println(e.getMessage());
}
The Saxon library (e.g. saxon9he.jar) must be on the classpath in order to compile and run this code.
I'm trying to write the output Strings into a Word document using the following code:
try {
out = new PrintWriter(new BufferedWriter(new FileWriter("report.doc", true)));
out.println("<html><style>"+string1+"</style><table cellspacing = 0 cellpadding = 0><tr>" + string2 + "</html>" + (char)12);
} catch (IOException e) {
JOptionPane.showMessageDialog(null, "File error " + e.getMessage());
} finally {
if (out != null) {
try {
out.close();
} catch (Exception ignore) {
}
}
}
If this method is called (let's say) 10 times, it only writes the information of the first String. However, when I replace 'report.doc' with 'report.html', the created html file contains all the information of the 10 Strings.
How can I alter my code so that it can generate a word document with all the information as is the created html document?
You need to use a Java library that can process .doc files. Without the help of a word processor library, you need to know the format of .doc documents, just as you know the structure of .html documents.
Apache POI is a good example of such a library.
Another approach is to access the MS Office libraries from Java using a COM bridge. I have been using a commercial tool for that purpose. JACOB seems to be an open source example of a Java-COM bridge, though I have not tested it.
I'm making an Android app where I need weather information. I've found this feed from Yahoo Weather. It's XML, and I want information such as "day", "low" and "high".
Refer: http://weather.yahooapis.com/forecastrss?w=12718298&u=c
<yweather:forecast day="Sun" date="19 Feb 2012" low="-2" high="3" text="Clear" code="31"/>
(Line can be found in the bottom of the link)
I have no idea how to do this - please help. Source codes, examples and clues will be appreciated.
Here's the solution for future users:
InputStream inputXml = null;
try
{
inputXml = new URL("http://weather.yahooapis.com/forecastrss?w=12718298&u=c").openConnection().getInputStream();
DocumentBuilderFactory factory = DocumentBuilderFactory.
newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(inputXml);
NodeList nodi = doc.getElementsByTagName("yweather:forecast");
if (nodi.getLength() > 0)
{
// Both attributes live on the same element, so one cast is enough
Element nodo = (Element)nodi.item(0);
String strLow = nodo.getAttribute("low");
String strHigh = nodo.getAttribute("high");
System.out.println("Temperature low: " + strLow);
System.out.println("Temperature high: " + strHigh);
}
}
catch (Exception ex)
{
System.out.println(ex.getMessage());
}
finally
{
try
{
if (inputXml != null)
inputXml.close();
}
catch (IOException ex)
{
System.out.println(ex.getMessage());
}
}
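The question also asks for "day", and the feed carries more than one forecast line. A small variation that walks over every yweather:forecast element could look like this. It is only a sketch: the class name ForecastSketch is made up, the XML is passed in as a String so it can be tried without a network call, and it relies on the default DocumentBuilderFactory not being namespace-aware, so the element is matched by its prefixed name just as in the code above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch class, not part of any library.
final class ForecastSketch {

    // Returns one "day: low to high" line per yweather:forecast element in the feed.
    static List<String> forecasts(String feedXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(feedXml)));
            // The factory is not namespace-aware by default, so match the prefixed name
            NodeList nodes = doc.getElementsByTagName("yweather:forecast");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element forecast = (Element) nodes.item(i);
                result.add(forecast.getAttribute("day") + ": "
                        + forecast.getAttribute("low") + " to "
                        + forecast.getAttribute("high"));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse the feed", e);
        }
    }
}
```

On Android the download has to happen off the main thread; the parsing itself stays the same.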
It's been a couple of years since I used XML in Android, but this was quite helpful to me when I started out: anddev.org
The link seems to be a feed (which is XML, obviously). There are many feed-reader APIs in Java. So, here you go:
Read feed documentation, http://developer.yahoo.com/weather/
Read how to parse/read the feed, Rome Library to read feeds Java
Pull out your desired fields.
I guess this is already done. (easily found on Google) http://www.javahouse.altervista.org/gambino/Articolo_lettura_feed_da_java_en.html