I've been working on this project for the past two weeks and haven't made any headway. My issue isn't with parsing the XML file, but with what to do with it afterwards. I've written programs with SAX, StAX and DOM parsers that take a very large XML file and print out the elements and their values in order. However, the XML I'm dealing with is large, so using DOM is inefficient. Another problem is that the XML file has 40,000 entries and its structure is complicated. This is a little excerpt of it:
<metabolite>
<version>3.5</version>
<creation_date>2005-11-16 08:48:42 -0700</creation_date>
<update_date>2013-02-08 17:07:44 -0700</update_date>
<accession>HMDB00002</accession>
<secondary_accessions>
</secondary_accessions>
<name>1,3-Diaminopropane</name>
<description>1,3-Diaminopropane is a stable, flammable and highly hydroscopic fluid. It is a polyamine that is normally quite toxic if swallowed, inhaled or absorbed through the skin. It is a catabolic byproduct of spermidine. It is also a precursor in the enzymatic synthesis of beta-alanine. 1, 3-Diaminopropane is involved in the arginine/proline metabolic pathways and the beta-alanine metabolic pathway.</description>
<synonyms>
<synonym>1,3-Diamino-N-propane</synonym>
<synonym>1,3-Propanediamine</synonym>
<synonym>1,3-Propylenediamine</synonym>
<synonym>1,3-Trimethylenediamine</synonym>
<synonym>3-Aminopropylamine</synonym>
<synonym>a,w-Propanediamine</synonym>
<synonym>Propane-1,3-diamine</synonym>
<synonym>Trimethylenediamine</synonym>
</synonyms>
<chemical_formula>C3H10N2</chemical_formula>
So this is one of 40,000 entries, and it contains many more elements. What I need my program to do is let the user select what information they want from the 40,000 entries, and then return that information in the form of an Excel sheet. So if I only wanted, say, the version number and name for all 40,000 entries, it would return just those values into Excel. Currently I've made a program that loops through the file using StAX and prints all the elements and values to the console. How would I go about creating a data structure, such as a tree, that would then allow me to do what I want (i.e. traverse the data and return only the data I'm seeking)?
This is what I've done so far in terms of looping through my document and returning the information in order for the 40,000 entries:
public class xmlRead {
private static XMLStreamReader reader;
public xmlRead(){
try{
InputStream file = new FileInputStream("/Users/Kevlar/Dropbox/PhD/Java/HMDB/testOutput.xml");
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
reader = inputFactory.createXMLStreamReader(file);
assert(reader.getEventType() == XMLEvent.START_DOCUMENT);
} catch (XMLStreamException e){
System.err.println("XMLStreamException : " + e.getMessage());
} catch (FactoryConfigurationError e){
System.err.println("FactoryConfigurationError : " + e.getMessage());
} catch (FileNotFoundException e){
System.err.println("FileNotFoundException : " + e.getMessage());
}
}
public void metaboliteInfo() throws XMLStreamException{
while(reader.hasNext()){
int event = reader.getEventType();
if(event == XMLStreamConstants.START_ELEMENT && "metabolite".equals(reader.getLocalName())){ // compare strings with equals(), not ==
System.out.println("New " + reader.getLocalName());
mainElements(reader);
}
else if(event == XMLStreamConstants.END_DOCUMENT){
System.out.println("end of document");
break;
}
else{
reader.next();
}
}
reader.close();
}
public void mainElements(XMLStreamReader reader) throws XMLStreamException{
int level = 1;
do{
int event = reader.next();
if(event == XMLStreamConstants.START_ELEMENT){
System.out.println("Element :" + reader.getLocalName());
level++;
if(level == 2){
subElements(reader);
level--;
}
}
else if(event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()){
System.out.println(reader.getText());
}
else if(event == XMLStreamConstants.END_ELEMENT){
level--;
}
}while(level > 0);
// do not close the reader here: metaboliteInfo() still needs it and closes it when parsing is done
}
private void subElements(XMLStreamReader reader) throws XMLStreamException {
int level = 1;
do{
int event = reader.next();
if(event == XMLStreamConstants.START_ELEMENT){
System.out.println("Sub element :" + reader.getLocalName());
level++;
if(level == 2){
subElements(reader);
level--;
}
}
else if(event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()){
System.out.println(reader.getText());
}
else if(event == XMLStreamConstants.END_ELEMENT){
level--;
}
}while(level > 0);
// do not close the reader here: this method is called recursively while parsing is still in progress
}
public void findElements(XMLStreamReader reader, String element) throws XMLStreamException{
int level = 1;
do{
int event = reader.next();
if(event == XMLStreamConstants.START_ELEMENT){
if(reader.getLocalName().equals(element)){ // compare strings with equals(), not ==
System.out.println(reader.getLocalName());
}
level++;
if(level == 2){
subElements(reader);
level--;
}
}
else if(event == XMLStreamConstants.CHARACTERS && !reader.isWhiteSpace()){
System.out.println(reader.getText());
}
else if(event == XMLStreamConstants.END_ELEMENT){
level--;
}
}while(level > 0);
// do not close the reader here: the caller owns it
}
public static void main(String[] args) throws XMLStreamException{
xmlRead test = new xmlRead();
test.metaboliteInfo();
}
}
I should probably note here too that I'm not actually a programmer. I just have to deal with these XML files for the purpose of my research but don't have anyone else to do it for me so my knowledge about java is limited I'm afraid (i.e. explaining things in layman terms would be great).
Look up JAXB. This is a framework for converting XML to Java objects and vice versa. If you use JAXB to auto-generate your Java classes for you, you don't need to worry about hand-rolling your own data structure.
You'll need to start off with an XML schema, which defines what your XML file is allowed to look like. If you don't have one already, you can create an XML Schema Definition (XSD) file from the XML file, by using a tool such as XMLSpy.
JAXB provides a tool called xjc. This can be used to generate Java classes automatically from an XML schema. Where your XML has repeating tags, these java classes contain collections that can be iterated over.
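If generating classes from a schema is more than you need, the selection itself can also be done in a single StAX pass. The following is only a sketch using the JDK's own StAX API, not JAXB. It assumes the wanted fields are text-only elements directly under each metabolite (so it would not work for synonyms as written), and the root element name hmdb and the class name MetaboliteFieldExtractor are made up for the example. It produces CSV text, which Excel can open.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class MetaboliteFieldExtractor {

    /** Returns one row per metabolite, holding the text of each wanted direct child element. */
    static List<Map<String, String>> extract(Reader xml, List<String> wanted)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        List<Map<String, String>> rows = new ArrayList<>();
        Map<String, String> row = null; // non-null while inside a metabolite
        int depth = 0;                  // current element nesting depth
        int entryDepth = -1;            // depth of the metabolite being read
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                String name = reader.getLocalName();
                if (row == null && "metabolite".equals(name)) {
                    row = new LinkedHashMap<>();
                    entryDepth = depth;
                } else if (row != null && depth == entryDepth + 1 && wanted.contains(name)) {
                    // Assumption: the wanted element contains text only.
                    row.put(name, reader.getElementText());
                    depth--; // getElementText() already consumed the end tag
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (row != null && depth == entryDepth) {
                    rows.add(row); // end of this metabolite
                    row = null;
                }
                depth--;
            }
        }
        reader.close();
        return rows;
    }

    /** Renders the rows as CSV, quoting every value so commas inside names are safe. */
    static String toCsv(List<Map<String, String>> rows, List<String> columns) {
        StringBuilder out = new StringBuilder(String.join(",", columns)).append('\n');
        for (Map<String, String> r : rows) {
            List<String> cells = new ArrayList<>();
            for (String column : columns) {
                String value = r.getOrDefault(column, "");
                cells.add("\"" + value.replace("\"", "\"\"") + "\"");
            }
            out.append(String.join(",", cells)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<hmdb><metabolite><version>3.5</version>"
                + "<accession>HMDB00002</accession><name>1,3-Diaminopropane</name>"
                + "<synonyms><synonym>1,3-Propanediamine</synonym></synonyms>"
                + "</metabolite></hmdb>";
        List<String> columns = Arrays.asList("version", "name");
        System.out.print(toCsv(extract(new StringReader(xml), columns), columns));
    }
}
```

For the real file you would pass a FileReader (or better, an InputStream-based reader) instead of the StringReader and write the CSV text to a file. Only one entry's worth of selected values is held per row, so memory use depends on how many fields you select rather than on the size of the XML.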
XQuery solution. Using this expression you can filter the input XML document:
declare function local:rewrite($node as node()) as node()?
{
typeswitch ($node)
case element() return
if (matches(local-name($node), "(version|name|synonym)")) then
element {node-name($node)}
{
$node/@*,
for $child in $node/node() return local:rewrite($child)
}
else
()
default return
$node
};
for $m in //metabolite
return <metabolite>{for $c in $m/node() return local:rewrite($c)}</metabolite>
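Replace (version|name|synonym) with a regular expression that matches the names of the XML elements you want to keep. Note that matches() is not anchored, so this pattern also keeps elements whose names merely contain one of these words, such as synonyms.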
Java 7 code that evaluates XQuery expression:
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import net.sf.saxon.Configuration;
import net.sf.saxon.om.DocumentInfo;
import net.sf.saxon.query.DynamicQueryContext;
import net.sf.saxon.query.StaticQueryContext;
import net.sf.saxon.query.XQueryExpression;
import org.xml.sax.InputSource;
// inside a method
Configuration config = new Configuration();
StaticQueryContext sqc = config.newStaticQueryContext();
DynamicQueryContext dqc = new DynamicQueryContext(config);
String xq = "XQUERY_EXPRESSION";
try (InputStream xmlFileInput = new FileInputStream("data.xml");
OutputStream xmlFileOutput = new FileOutputStream("data-filtered.xml")) {
XQueryExpression expression = sqc.compileQuery(xq);
SAXSource source = new SAXSource(new InputSource(xmlFileInput));
DocumentInfo di = config.buildDocument(source);
dqc.setContextItem(di);
expression.run(dqc, new StreamResult(xmlFileOutput), null);
} catch (Exception e) {
System.err.println(e.getMessage());
}
Saxon (e.g. saxon9he.jar) library must be present in classpath in order to compile and run this code.
Related
I want to read an XML file in Java and then update certain elements in that file with new values. My file is > 200mb and performance is important, so the DOM model cannot be used.
I feel that a StaX Parser is the solution, but there is no decent literature on using Java StaX to read and then write XML back to the same file.
(For reference I have been using the java tutorial and this helpful tutorial to get what I have so far)
I am using Java 7, but there doesn't seem to be any updates to the XML parsing API since...a long time ago. So this probably isn't relevant.
Currently I have this:
public static String readValueFromXML(final File xmlFile, final String value) throws FileNotFoundException, XMLStreamException
{
XMLEventReader reader = XMLInputFactory.newFactory().createXMLEventReader(new FileReader(xmlFile)); // newFactory() is a static method, so no "new"
String found = "";
boolean read = false;
while (reader.hasNext())
{
XMLEvent event = reader.nextEvent();
if (event.isStartElement() &&
event.asStartElement().getName().getLocalPart().equals(value))
{
read = true;
}
if (event.isCharacters() && read)
{
found = event.asCharacters().getData();
break;
}
}
return found;
}
which will read the XMLFile and return the value of the selected element. However, I have another method updateXMLFile(final File xmlFile, final String value) which I want to use in conjunction with this.
So my question is threefold:
Is there a StaX implementation for editing XML
Will XPath be any help? Can that be used without converting my file to a Document?
(More Generally) Why doesn't Java have a better XML API?
There are two things you may want to look at. The first is to use JAXB to bind the XML to POJOs which you can then have your way with and serialize the structure back to XML when needed.
The second is a JDBC driver for XML, there are several available for a fee, not sure if there are any open source ones or not. In my experience JAXB is the better choice. If the XML file is too large to handle efficiently with JAXB I think you need to look at using a database as a replacement for the XML file.
This is my approach, which reads events from the file using StaX and writes them to another file. The values are updated as the loop passes over the correctly named elements.
public void read(String key, String value)
{
try (FileReader fReader = new FileReader(inputFile); FileWriter fWriter = new FileWriter(outputFile))
{
XMLEventFactory factory = XMLEventFactory.newInstance();
XMLEventReader reader = XMLInputFactory.newFactory().createXMLEventReader(fReader);
XMLEventWriter writer = XMLOutputFactory.newFactory().createXMLEventWriter(fWriter);
boolean update = false; // declared outside the loop so it survives until the following Characters event
while (reader.hasNext())
{
XMLEvent event = reader.nextEvent();
if (event.isStartElement() && event.asStartElement().getName().getLocalPart().equals(key))
{
update = true;
}
else if (event.isCharacters() && update)
{
Characters characters = factory.createCharacters(value);
event = characters;
update = false;
}
writer.add(event);
}
writer.close(); // flush any buffered events before the FileWriter is closed
}
catch (XMLStreamException | FactoryConfigurationError | IOException e)
{
e.printStackTrace();
}
}
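The same read-and-rewrite loop can be tried out on in-memory strings. This is a sketch rather than a drop-in replacement: the class name StaxValueUpdater and the sample XML are made up, and it assumes the element to update holds plain text. Coalescing is switched on so that the text of an element arrives as a single Characters event.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StaxValueUpdater {

    /** Copies the XML from in to out, replacing the text of every element named key. */
    static void update(Reader in, Writer out, String key, String value)
            throws XMLStreamException {
        XMLInputFactory inputFactory = XMLInputFactory.newFactory();
        // Deliver adjacent text as one Characters event instead of several chunks.
        inputFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = inputFactory.createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newFactory().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();

        boolean replaceNext = false; // true right after the start tag of a matching element
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                replaceNext = event.asStartElement().getName().getLocalPart().equals(key);
            } else if (event.isCharacters() && replaceNext) {
                event = events.createCharacters(value); // swap in the new text
                replaceNext = false;
            } else if (event.isEndElement()) {
                replaceNext = false;
            }
            writer.add(event);
        }
        writer.close(); // flushes the output; the caller still owns in and out
        reader.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter out = new StringWriter();
        update(new StringReader("<config><host>old</host><port>80</port></config>"),
                out, "host", "new");
        System.out.println(out);
    }
}
```

StAX cannot edit a file in place, so for a real file you would write to a temporary file and then move it over the original once the copy has finished.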
I have an object arraylist, can someone please help me by telling me the most efficient way to write AND retrieve an object from file?
Thanks.
My attempt
public static void LOLadd(String ab, String cd, int ef) throws IOException {
MyShelf newS = new MyShelf();
newS.Fbooks = ab;
newS.Bbooks = cd;
newS.Cbooks = ef;
InfoList.add(newS);
FileWriter fw;
fw = new FileWriter("UserInfo.out.txt");
PrintWriter output = new PrintWriter(fw);
for (int i = 0; i <InfoList.size(); i++)
{
String ax = InfoList.get(i).Fbooks;
String ay = InfoList.get(i).Bbooks;
int az = InfoList.get(i).Cbooks;
output.print(ax + " " + ay + " " + az); //Output all the words to file // on the same line
output.println(""); //Make space
}
output.close(); // closing the PrintWriter flushes it and also closes the underlying FileWriter
}
My attempt to retrieve the file is below. Also, after retrieving the file, how can I read each column of objects? For example, if I have the columns Fictions, Dramas, Plays, how can I read, get, replace, delete, and add values in the Dramas column?
public Object findUsername(String a) throws FileNotFoundException, IOException, ClassNotFoundException
{
ObjectInputStream sc = new ObjectInputStream(new FileInputStream("myShelf.out.txt"));
//ArrayList<Object> List = new ArrayList<Object>();
InfoList = null;
Object obj = (Object) sc.readObject();
InfoList.add((UserInfo) obj);
sc.close();
for (int i=0; i <InfoList.size(); i++) {
if (InfoList.get(i).user.equals(a)){
return "something" + InfoList.get(i);
}
}
return "doesn't exist";
}
public static String lbooksMatching(String b) {
//Retrieve data from file
//...
for (int i=0; i<myShelf.size(); i++) {
if (myShelf.get(i).user.equals (b))
{
return b;
}
else
{
return "dfbgregd";
}
}
return "dfgdfge";
}
public static String matching(String qp) {
for (int i=0; i<myShelf.size(); i++) {
if (myShelf.get(i).pass.equals (qp))
{
return c;
}
else
{
return "Begegne";
}
}
return "Bdfge";
}
Thanks!!!
It seems like you want to serialize an object and persist that serialized form to some kind of storage (in this case a file).
2 important remarks here :
Serialization
Internal java serialization
Java provides automatic serialization which requires that the object be marked by implementing the java.io.Serializable interface. Implementing the interface marks the class as "okay to serialize," and Java then handles serialization internally.
See this post for a code sample on how to serialize/deserialize an object to/from bytes.
This might not always be the ideal way to persist an object, as you have no control over the format (it is handled by Java), it's not human readable, and you can run into versioning issues if your objects change.
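As a rough sketch of what built-in serialization looks like for a list of objects: the class names ShelfStore and Shelf and the field names below are made up for the example and only stand in for the question's MyShelf. The whole list is written with one writeObject call and read back with one readObject call.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShelfStore {

    /** Hypothetical stand-in for the question's MyShelf class. */
    static class Shelf implements Serializable {
        private static final long serialVersionUID = 1L;
        final String fiction;
        final String drama;
        final int count;

        Shelf(String fiction, String drama, int count) {
            this.fiction = fiction;
            this.drama = drama;
            this.count = count;
        }
    }

    /** Writes the whole list to the file in Java's binary serialization format. */
    static void save(List<Shelf> shelves, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(new ArrayList<>(shelves)); // ArrayList is itself Serializable
        }
    }

    /** Reads back a list previously written by save(). */
    @SuppressWarnings("unchecked")
    static List<Shelf> load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (List<Shelf>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("shelves", ".ser");
        save(Arrays.asList(new Shelf("Dune", "Hamlet", 3)), file);
        for (Shelf s : load(file)) {
            System.out.println(s.fiction + " " + s.drama + " " + s.count);
        }
        Files.delete(file);
    }
}
```

Once the list is loaded, a "column" is just a field: loop over the list and read or compare s.drama, add or remove Shelf objects, then call save() again to persist the change.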
Marshalling to JSON or XML
A better way to serialize an object to disk is to use another data format like XML or JSON.
A sample on how to convert an object to/from a JSON structure can be found here.
Important : I would not do the kind of serialization in code like you're doing unless there is a very good reason (that I don't see here). It quickly becomes messy and is subject to change when your objects change. I would opt for a more automated way of serializing. Also, when using a format like JSON / XML, you know that there are tons of APIs available to read/write to that format, so all of that serialization / deserialization logic doesn't need to be implemented by you anymore.
Persistence
Writing your serialized object to a file isn't always a good idea for various reasons (no versioning / concurrency issues / .....).
A better approach is to use a database. If it's a relational database, take a look at Hibernate or JPA to persist your objects with very little code.
If it's a document database like MongoDB, you can persist your JSON serialized representation.
There are tons of resources available on persisting objects to databases in Java. I would suggest checking out JPA, the standard API for persistence and object/relational mapping.
Here is another basic example, which will give you insight into ArrayList, constructors and writing output to a file:
After running this, if you are using an IDE, go to the project folder; there you will find the *.txt file.
import java.io.*;
import java.util.List;
import java.util.ArrayList;
import java.util.logging.Level;
import java.util.logging.Logger;
public class ListOfNumbers {
private List<Integer> list;
private static final int SIZE = 10;//whatever size you wish
public ListOfNumbers () {
list = new ArrayList<Integer>(SIZE);
for (int i = 0; i < SIZE; i++) {
list.add(i); // autoboxing replaces the deprecated new Integer(i)
}
}
public void writeList() {
// try-with-resources closes the writer exactly once, even if an exception is thrown
try (PrintWriter out = new PrintWriter(new FileWriter("ManOutFile.txt"))) {
for (int i = 0; i < SIZE; i++) {
out.println("Value at: " + i + " = " + list.get(i));
}
} catch (IOException ex) {
Logger.getLogger(ListOfNumbers.class.getName()).log(Level.SEVERE, null, ex);
}
}
public static void main(String[] args)
{
ListOfNumbers lnum=new ListOfNumbers();
lnum.writeList();
}
}
xml looks like so:
<statements>
<statement account="123">
...stuff...
</statement>
<statement account="456">
...stuff...
</statement>
</statements>
I'm using stax to process one "<statement>" at a time and I got that working. I need to get that entire statement node as a string so I can create "123.xml" and "456.xml" or maybe even load it into a database table indexed by account.
using this approach: http://www.devx.com/Java/Article/30298/1954
I'm looking to do something like this:
String statementXml = staxXmlReader.getNodeByName("statement");
//load statementXml into database
I had a similar task and, although the original question is more than a year old, I couldn't find a satisfying answer. The most interesting answer up to now was Blaise Doughan's answer, but I couldn't get it running on the XML I am expecting (maybe some parameters for the underlying parser could change that?). Here is the XML, very simplified:
<many-many-tags>
<description>
...
<p>Lorem ipsum...</p>
Devils inside...
...
</description>
</many-many-tags>
My solution:
public static String readElementBody(XMLEventReader eventReader)
throws XMLStreamException {
StringWriter buf = new StringWriter(1024);
int depth = 0;
while (eventReader.hasNext()) {
// peek event
XMLEvent xmlEvent = eventReader.peek();
if (xmlEvent.isStartElement()) {
++depth;
}
else if (xmlEvent.isEndElement()) {
--depth;
// reached END_ELEMENT tag?
// break loop, leave event in stream
if (depth < 0)
break;
}
// consume event
xmlEvent = eventReader.nextEvent();
// print out event
xmlEvent.writeAsEncodedUnicode(buf);
}
return buf.getBuffer().toString();
}
Usage example:
XMLEventReader eventReader = ...;
while (eventReader.hasNext()) {
XMLEvent xmlEvent = eventReader.nextEvent();
if (xmlEvent.isStartElement()) {
StartElement elem = xmlEvent.asStartElement();
String name = elem.getName().getLocalPart();
if ("description".equals(name)) {
String xmlFragment = readElementBody(eventReader);
// do something with it...
System.out.println("'" + xmlFragment + "'");
}
}
else if (xmlEvent.isEndElement()) {
// ...
}
}
Note that the extracted XML fragment will contain the complete extracted body content, including white space and comments. Filtering those on demand, or making the buffer size parametrizable have been left out for code brevity:
'
...
<p>Lorem ipsum...</p>
Devils inside...
...
'
You can use StAX for this. You just need to advance the XMLStreamReader to the start element for statement. Check the account attribute to get the file name. Then use the javax.xml.transform APIs to transform the StAXSource to a StreamResult wrapping a File. This will advance the XMLStreamReader and then just repeat this process.
import java.io.File;
import java.io.FileReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
public class Demo {
public static void main(String[] args) throws Exception {
XMLInputFactory xif = XMLInputFactory.newInstance();
XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("input.xml"));
xsr.nextTag(); // Advance to statements element
while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
File file = new File("out" + xsr.getAttributeValue(null, "account") + ".xml");
t.transform(new StAXSource(xsr), new StreamResult(file));
}
}
}
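To get each statement as a String rather than a file, the same transform can be pointed at a StringWriter. This is a sketch under a few assumptions: the class name StatementSplitter is made up, the child element amount in the sample is invented, and the loop is written defensively because, as far as I can tell, some JDK versions leave the reader one event past the closing tag after the transform. For that reason it re-checks the current event instead of calling nextTag() straight away, which should also make it work on XML without whitespace between the statements.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

class StatementSplitter {

    /** Maps each statement's account attribute to that statement serialized as XML text. */
    static Map<String, String> split(Reader xml)
            throws XMLStreamException, TransformerException {
        XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        // An identity transformer simply copies its input to its output.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        Map<String, String> byAccount = new LinkedHashMap<>();
        while (xsr.hasNext()) {
            if (xsr.getEventType() == XMLStreamConstants.START_ELEMENT
                    && "statement".equals(xsr.getLocalName())) {
                String account = xsr.getAttributeValue(null, "account");
                StringWriter buffer = new StringWriter();
                // Consumes the whole statement element from the stream.
                identity.transform(new StAXSource(xsr), new StreamResult(buffer));
                byAccount.put(account, buffer.toString());
                // No next() here: the reader has already moved, so re-check where it is.
            } else {
                xsr.next();
            }
        }
        xsr.close();
        return byAccount;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<statements>"
                + "<statement account=\"123\"><amount>10</amount></statement>"
                + "<statement account=\"456\"><amount>20</amount></statement>"
                + "</statements>";
        split(new StringReader(xml)).forEach((account, text) ->
                System.out.println(account + " -> " + text));
    }
}
```

Each map value can then be written to a file named after the account or stored in a database column. For a very large input you would handle each statement inside the loop instead of collecting them all in a map.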
StAX is a low-level access API, and it has neither lookups nor methods that access content recursively. But what are you actually trying to do? And why are you considering StAX?
Beyond using a tree model (DOM, XOM, JDOM, Dom4j), which would work well with XPath, the best choice when dealing with data is usually a data binding library like JAXB. With it you can pass a StAX or SAX reader and ask it to bind the XML data into Java beans, so that you process Java objects instead of messing with XML. This is often more convenient, and it is usually quite performant.
The only trick with larger files is that you do not want to bind the whole thing at once, but rather bind each sub-tree (in your case, one 'statement' at a time).
This is most easily done by iterating with a StAX XMLStreamReader, then using JAXB to bind.
I've been googling and this seems painfully difficult.
given my xml I think it might just be simpler to:
StringBuilder buffer = new StringBuilder();
for each line in file {
buffer.append(line)
if(line.equals(STMT_END_TAG)){
parse(buffer.toString())
buffer.delete(0,buffer.length)
}
}
private void parse(String statement){
//saxParser.parse( new InputSource( new StringReader( xmlText ) );
// do stuff
// save string
}
Why not just use xpath for this?
You could have a fairly simple xpath to get all 'statement' nodes.
Like so:
//statement
EDIT #1: If possible, take a look at dom4j. You could read the String and get all 'statement' nodes fairly simply.
EDIT #2: Using dom4j, this is how you would do it:
(from their cookbook)
String text = "your xml here";
Document document = DocumentHelper.parseText(text);
public void bar(Document document) {
List list = document.selectNodes( "//statement" );
// loop through node data
}
I had a similar problem and found a solution.
I used the solution proposed by @t0r0X, but it does not work well with the current implementation in Java 11: the method xmlEvent.writeAsEncodedUnicode creates an invalid string representation of the start element (in the StartElementEvent class) in the resulting XML fragment. I had to modify it, and after that it seems to work well, which I could immediately verify by parsing the fragment with DOM and unmarshalling it with JAXB into specific data containers.
In my case I had the huge structure
<Orders>
<ns2:SyncOrder xmlns:ns2="..." xmlns:ns3="....." ....>
.....
</ns2:SyncOrder>
<ns2:SyncOrder xmlns:ns2="..." xmlns:ns3="....." ....>
.....
</ns2:SyncOrder>
...
</Orders>
in the file of multiple hundred megabytes (a lot of repeating "SyncOrder" structures), so the usage of DOM would lead to a large memory consumption and slow evaluation. Therefore I used the StAX to split the huge XML to smaller XML pieces, which I have analyzed with DOM and used the JaxbElements generated from the xsd definition of the element SyncOrder (This infrastructure I had from the webservice, which uses the same structure, but it is not important).
The code below shows where the XML fragment is created and could be used; I passed it directly on to further processing.
private static <T> List<T> unmarshallMultipleSyncOrderXmlData(
InputStream aOrdersXmlContainingSyncOrderItems,
Function<SyncOrderType, T> aConversionFunction) throws XMLStreamException, ParserConfigurationException, IOException, SAXException {
DocumentBuilderFactory locDocumentBuilderFactory = DocumentBuilderFactory.newInstance();
locDocumentBuilderFactory.setNamespaceAware(true);
DocumentBuilder locDocBuilder = locDocumentBuilderFactory.newDocumentBuilder();
List<T> locResult = new ArrayList<>();
XMLInputFactory locFactory = XMLInputFactory.newFactory();
XMLEventReader locReader = locFactory.createXMLEventReader(aOrdersXmlContainingSyncOrderItems);
boolean locIsInSyncOrder = false;
QName locSyncOrderElementQName = null;
StringWriter locXmlTextBuffer = new StringWriter();
int locDepth = 0;
while (locReader.hasNext()) {
XMLEvent locEvent = locReader.nextEvent();
if (locEvent.isStartElement()) {
if (locDepth == 0 && Objects.equals(locEvent.asStartElement().getName().getLocalPart(), "Orders")) {
locDepth++;
} else {
if (locDepth <= 0)
throw new IllegalStateException("An invalid XML stream has been passed into the function. "
+ "Expecting the element 'Orders' as the root element of the document, but found '"
+ locEvent.asStartElement().getName().getLocalPart() + "'.");
locDepth++;
if (locSyncOrderElementQName == null) {
/* First element after the "Orders" has passed, so we retrieve
* the name of the element with the namespace prefix: */
locSyncOrderElementQName = locEvent.asStartElement().getName();
}
if(Objects.equals(locEvent.asStartElement().getName(), locSyncOrderElementQName)) {
locIsInSyncOrder = true;
}
}
} else if (locEvent.isEndElement()) {
locDepth--;
if(locDepth == 1 && Objects.equals(locEvent.asEndElement().getName(), locSyncOrderElementQName)) {
locEvent.writeAsEncodedUnicode(locXmlTextBuffer);
/* at this moment the call of locXmlTextBuffer.toString() gets the complete fragment
* of XML containing the valid SyncOrder element, but I have continued to other processing,
* which immediatelly validates the produced XML fragment is valid and passes the values
* to communication object: */
Document locDocument = locDocBuilder.parse(new ByteArrayInputStream(locXmlTextBuffer.toString().getBytes()));
SyncOrderType locItem = unmarshallSyncOrderDomNodeToCo(locDocument);
locResult.add(aConversionFunction.apply(locItem));
locXmlTextBuffer = new StringWriter();
locIsInSyncOrder = false;
}
}
if (locIsInSyncOrder) {
if (locEvent.isStartElement()) {
/* here replaced the standard implementation of startElement's method writeAsEncodedUnicode: */
locXmlTextBuffer.write(startElementToString(locEvent.asStartElement()));
} else {
locEvent.writeAsEncodedUnicode(locXmlTextBuffer);
}
}
}
return locResult;
}
private static String startElementToString(StartElement aStartElement) {
StringBuilder locStartElementBuffer = new StringBuilder();
// open element
locStartElementBuffer.append("<");
String locNameAsString = null;
if ("".equals(aStartElement.getName().getNamespaceURI())) {
locNameAsString = aStartElement.getName().getLocalPart();
} else if (aStartElement.getName().getPrefix() != null
&& !"".equals(aStartElement.getName().getPrefix())) {
locNameAsString = aStartElement.getName().getPrefix()
+ ":" + aStartElement.getName().getLocalPart();
} else {
locNameAsString = aStartElement.getName().getLocalPart();
}
locStartElementBuffer.append(locNameAsString);
// add any attributes
Iterator<Attribute> locAttributeIterator = aStartElement.getAttributes();
Attribute attr;
while (locAttributeIterator.hasNext()) {
attr = locAttributeIterator.next();
locStartElementBuffer.append(" ");
locStartElementBuffer.append(attributeToString(attr));
}
// add any namespaces
Iterator<Namespace> locNamespaceIterator = aStartElement.getNamespaces();
Namespace locNamespace;
while (locNamespaceIterator.hasNext()) {
locNamespace = locNamespaceIterator.next();
locStartElementBuffer.append(" ");
locStartElementBuffer.append(attributeToString(locNamespace));
}
// close start tag
locStartElementBuffer.append(">");
// return StartElement as a String
return locStartElementBuffer.toString();
}
private static String attributeToString(Attribute aAttr) {
if( aAttr.getName().getPrefix() != null && aAttr.getName().getPrefix().length() > 0 )
return aAttr.getName().getPrefix() + ":" + aAttr.getName().getLocalPart() + "='" + aAttr.getValue() + "'";
else
return aAttr.getName().getLocalPart() + "='" + aAttr.getValue() + "'";
}
public static SyncOrderType unmarshallSyncOrderDomNodeToCo(
Node aSyncOrderItemNode) {
Source locSource = new DOMSource(aSyncOrderItemNode);
Object locUnmarshalledObject = getMarshallerAndUnmarshaller().unmarshal(locSource);
SyncOrderType locCo = ((JAXBElement<SyncOrderType>) locUnmarshalledObject).getValue();
return locCo;
}
The API I need to work with does not support xpath, which is a bit of a headache! :-( lol
The xml I want to parse is as a String. My questions:
Is there a Java equivalent of "simplexml_load_string", where it makes the string into an xml document for parsing?
Which is better for parsing, SAX or DOM? I need to get a couple of values out of the XML and the structure isn't that deep. [3 levels]
Thanks!
Maybe this will help you
//http://developer.android.com/intl/de/reference/android/content/res/XmlResourceParser.html
import org.xmlpull.v1.XmlPullParserException;
try {
XmlResourceParser xrp = ctx.getResources().getXml(R.xml.rules);
while (xrp.getEventType() != XmlResourceParser.END_DOCUMENT) {
if (xrp.getEventType() == XmlResourceParser.START_TAG) {
String s = xrp.getName();
if (s.equals("category")) {
String catname = xrp.getAttributeValue(null, "name");
String rule = xrp.getAttributeValue(null, "rule");
}
} else if (xrp.getEventType() == XmlResourceParser.END_TAG) {
;
} else if (xrp.getEventType() == XmlResourceParser.TEXT) {
;
}
xrp.next();
}
xrp.close();
} catch (XmlPullParserException xppe) {
Log.e(TAG(), "Failure of .getEventType or .next, probably bad file format");
xppe.toString();
} catch (IOException ioe) {
Log.e(TAG(), "Unable to read resource file");
ioe.printStackTrace();
}
Not sure.
If the XML file/string is small, DOM is a good choice as it provides more capability. SAX should be used for larger XML files where memory usage and performance is a concern.
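For the first question, one option that does not need XPath is the DOM API in javax.xml.parsers, which can build a document straight from a String. This is just a sketch: the class name XmlStringParser, the helper firstText and the sample XML are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XmlStringParser {

    /** Parses XML held in a String into a DOM Document. */
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // InputSource + StringReader lets the parser read from memory instead of a file.
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the text of the first element with the given tag name, or null if absent. */
    static String firstText(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<response><user><name>Ann</name><id>7</id></user></response>");
        System.out.println(firstText(doc, "name"));
        System.out.println(firstText(doc, "id"));
    }
}
```

For a short document that is only three levels deep and from which you need a couple of values, this is usually less code than writing a SAX handler.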
For this project I'm working on, I want to take multiple excel sheets and then merge them into one, manipulating the data as I please to make everything a little more readable.
What would be the best way to open files, read their contents, store that content, create a new file (.csv), then paste the information in the organization of my choosing?
I definitely need to stick to java, as this will be part of a pre-existing automated process and I don't want to have to change everything to another language.
Is there a useful package out there that I should know about?
Many thanks
Justian
I think any serious work in Excel should consider Joel's solution of letting Office do it for you on a Windows machine you call remotely if necessary. If, however, your needs are simple enough or you really need a pure Java solution, Apache's POI library does a good enough job.
As far as I know, csv is not excel-specific, but rather just a "comma-separated values"-file.
So this might help you.
Writing CSV files is usually very simple, for obvious reasons. You can write your own helper class to do it. The caveat is to ensure that your delimiter never appears unescaped in any of the output values.
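A helper of that kind could look roughly like this. It is a sketch that follows the common convention of wrapping a field in double quotes when it contains a comma, a quote or a line break, and doubling any quotes inside it; the class name CsvLine is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CsvLine {

    /** Quotes a field only if it contains a comma, a double quote or a line break. */
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\""); // double any embedded quotes
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    /** Joins the fields into one CSV line (without a trailing line break). */
    static String join(List<String> fields) {
        return fields.stream().map(CsvLine::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // prints: a,"b,c","say ""hi"""
        System.out.println(join(Arrays.asList("a", "b,c", "say \"hi\"")));
    }
}
```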
Reading CSV is trickier. There isn't a standard library like there is in Python (a much better language, IMHO, for doing CSV processing), but if you search for it there are a lot of decent free implementations around.
The biggest question is the internal representation in your program: Depending on the size of your inputs and outputs, keeping everything in memory may be out of the question. Can you do everything in one pass? (I mean, read some, write some, etc.)
You may also want to use sparse representations rather than just represent all the spreadsheets in an array.
Maybe you should try this one:
Jxcell is a Java spreadsheet component that can read, write and edit xls, xlsx and csv files.
Try this code
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import org.apache.poi.ss.usermodel.*; // Workbook, WorkbookFactory, Sheet, Row, Cell, CellType
public class App {
public void convertExcelToCSV(Sheet sheet, String sheetName) {
StringBuffer data = new StringBuffer();
try {
FileOutputStream fos = new FileOutputStream("C:\\Users\\" + sheetName + ".csv");
Cell cell;
Row row;
Iterator<Row> rowIterator = sheet.iterator();
while (rowIterator.hasNext()) {
row = rowIterator.next();
Iterator<Cell> cellIterator = row.cellIterator();
while (cellIterator.hasNext()) {
cell = cellIterator.next();
CellType type = cell.getCellTypeEnum();
if (type == CellType.BOOLEAN) {
data.append(cell.getBooleanCellValue() + ",");
} else if (type == CellType.NUMERIC) {
data.append(cell.getNumericCellValue() + ",");
} else if (type == CellType.STRING) {
data.append(cell.getStringCellValue() + ",");
} else if (type == CellType.BLANK) {
data.append("" + ",");
} else {
data.append(cell + ",");
}
}
data.append('\n');
}
fos.write(data.toString().getBytes());
fos.close();
}
catch (FileNotFoundException e)
{
e.printStackTrace();
}
catch (IOException e)
{
e.printStackTrace();
}
}
public static void main(String [] args)
{
App app = new App();
String path = "C:\\Users\\myFile.xlsx";
InputStream inp = null;
try {
inp = new FileInputStream(path);
Workbook wb = WorkbookFactory.create(inp);
for(int i=0;i<wb.getNumberOfSheets();i++) {
System.out.println(wb.getSheetAt(i).getSheetName());
app.convertExcelToCSV(wb.getSheetAt(i),wb.getSheetAt(i).getSheetName());
}
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
finally {
try {
inp.close();
} catch (Exception ex) {
System.out.println(ex.getMessage());
}
}
}
}