I'm using org.apache.xml.security.c14n.Canonicalizer, which was recommended to me here: Sort xml attributes for pretty print using javax.xml.transform.Transformer. I need it to run on Java 5, though, and it doesn't seem to work.
Are there any other options?
XOM has a Canonicalizer which will do this.
In addition to being a very good general-purpose XML DOM library, it's a much more lightweight solution to canonicalization than your XSLT-based solution.
Custom StAX Parser for XML using javax wrappers
How do you do this; or at least good suggestions on the right documentation / examples / tutorials?
I've been using the javax.xml.stream package to process XML files, but the application is begging for some "non-standard XML" (easy to understand what that means if you're not picky). I can write the parser, but I want this to be configurable, so that the app continues to use the same XML processing code and only the parser changes as needed.
The hard part at this point is finding concrete info on how this is done. Documentation speaks of, for example, configuring the parameters of SAXParserFactory and such, but I haven't found specific documentation or examples. I've even looked into some existing StAX source code. Need some good hints / guidance on how this is done in order to move forward.
According to the documentation, you can't. You can use one of three approved parsers. Anything else will result in an error.
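Whether or not the parser itself can be swapped, the processing code can at least be kept unchanged by putting a wrapper around the reader. The following is only a minimal sketch, assuming the input is still well-formed XML and that the "non-standard" part is something a wrapper can paper over (here, inconsistent element-name casing). It uses javax.xml.stream.util.StreamReaderDelegate from the JDK; the class names StaxWrapperDemo and LowerCaseNameReader are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.util.StreamReaderDelegate;

class StaxWrapperDemo {

    /** Hypothetical wrapper: reports every element name in lower case. */
    static class LowerCaseNameReader extends StreamReaderDelegate {
        LowerCaseNameReader(XMLStreamReader parent) {
            super(parent);
        }

        @Override
        public String getLocalName() {
            return super.getLocalName().toLowerCase(Locale.ROOT);
        }
    }

    /** Processing code that only knows about the XMLStreamReader interface. */
    static List<String> elementNames(XMLStreamReader reader) throws XMLStreamException {
        List<String> names = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                names.add(reader.getLocalName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Root><Item/><ITEM/></Root>";
        XMLStreamReader raw =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        // The same elementNames() code works with the raw or the wrapped reader.
        System.out.println(elementNames(new LowerCaseNameReader(raw)));
    }
}
```

This does not make the underlying parser accept malformed input; it only changes what the calling code sees.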
I just started with XBRL.
What Java lib(s) do you use for creating XBRL documents?
I find it hard to find "opensource" java libs for XBRL creation/manipulation.
@edbras: Check out Arelle.org. It's not pure Java, but it's free, and it is written in Python with some Java, which should be close enough. There are other commercial options if you are interested.
regards,
Per Solli
I use JAXB, and it seems to work fine. I generate the Java code with the JAXB reference implementation and use it through the factories...
I need to parse a large (>800MB) XML file from Jython. The XML is not deeply nested, containing about a million relevant elements. I need to convert these elements into real objects.
I've used nu.xom.* successfully before, but now that I've switched from Java to Jython, the library fails with the following message:
The parser has encountered more than "64,000" entity expansions in this document; this is the limit imposed by the application.
I have not found a way to fix this, so I probably have to look for another XML library. It could be either Java or Jython-compatible Python, and it should be efficient. Pythonic would be great; nu.xom.* is simple but not very Pythonic. Do you have any suggestions?
SAX is the best way to parse large documents.
Sounds like you're hitting the default expansion limit.
See this note:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4843787
You need to set the system property "entityExpansionLimit" to change the default.
(added) see also the answer to this question.
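As a rough sketch of what setting that property looks like from Java code: the property name "entityExpansionLimit" comes from the note above, while "jdk.xml.entityExpansionLimit" is the name documented for newer JDKs, so both are set here. The limit value and the class name EntityLimitDemo are illustrative assumptions, and the property is set before the parser is created to be on the safe side.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityLimitDemo {

    /** Sets the legacy and the newer JDK property name to the same (assumed) limit. */
    static void raiseEntityExpansionLimit(int limit) {
        System.setProperty("entityExpansionLimit", String.valueOf(limit));
        System.setProperty("jdk.xml.entityExpansionLimit", String.valueOf(limit));
    }

    /** Parses the document and returns the text of the root element, entities expanded. */
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Illustrative value only; pick one that fits the documents being parsed.
        raiseEntityExpansionLimit(500000);
        String xml = "<!DOCTYPE r [<!ENTITY e \"x\">]><r>&e;&e;&e;</r>";
        System.out.println(rootText(xml));
    }
}
```

The same property can also be passed on the command line with -DentityExpansionLimit=..., which avoids touching the code at all.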
Try using the SAX parser, it is great for streaming large XML files.
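A minimal sketch of that approach, assuming a flat document of `<item id="...">text</item>` elements (the element name, the attribute, and the Item class are all hypothetical stand-ins for the real data). Only the element currently being read is held by the handler, so the document itself never has to fit in memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxItemReader {

    /** Hypothetical value object built from each <item> element. */
    static final class Item {
        final String id;
        final String text;

        Item(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    static List<Item> readItems(InputSource source) throws Exception {
        final List<Item> items = new ArrayList<Item>();
        DefaultHandler handler = new DefaultHandler() {
            private String currentId;
            private StringBuilder text; // non-null only while inside an <item>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("item".equals(qName)) {
                    currentId = atts.getValue("id");
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so append rather than assign.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("item".equals(qName)) {
                    items.add(new Item(currentId, text.toString().trim()));
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return items;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<items><item id=\"1\">first</item><item id=\"2\">second</item></items>";
        for (Item item : readItems(new InputSource(new StringReader(xml)))) {
            System.out.println(item.id + " = " + item.text);
        }
    }
}
```

For a file with a million elements it may be better to process each Item inside endElement instead of collecting them all in a list. Since these are plain JDK classes, the same handler can be used from Jython.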
Does Jython support xml.etree.ElementTree? If so, use the iterparse method to keep your memory size down. Read this and use elem.clear() as described.
There is also the lxml Python library, which can parse large files without loading all the data into memory,
but I don't know if it is Jython-compatible.
I need to convert XML data to Java objects. What would be best practice to convert this XML data to object?
Idea is to fetch data via a web service (it doesn't use WSDL, just HTTP GET queries, so I cannot use any framework) and answers are in XML. What would be best practice to handle this situation?
JAXB is a standard API for doing this: http://java.sun.com/developer/technicalArticles/WebServices/jaxb/
Have a look at XStream. It might not be the quickest, but it is one of the most user friendly and straightforward converters in Java, especially if your model is not complex.
For a JMS project we were marshalling and unmarshalling (going from Java to XML and XML to Java) XML embedded in TextMessages (string property). We tried JAXB, JiBX, and XMLBeans. We found that XMLBeans worked best for us: fast, easily configurable, good documentation, and easy Maven integration.
I have used and will continue to use JDOM -> www.jdom.org
Another option is a SAX parser. It is procedural (i.e. a visitor pattern), but if the XML is fairly lightweight (or even medium weight), I have found it to be very useful for this.
The JAXB API, which comes built into Java.
I have used JiBX in an MQ module. It works very well, and the Ant configuration is simple. I used the Xsd2Jibx converter to generate the binding files and Java beans from an XML schema. Marshalling and unmarshalling let you specify a character-set parameter, which was useful in my project for handling a custom character set. I did find one issue in the binding compiler, though: if a Java bean has a long path name, it generates a class file with a correspondingly long file name, which causes problems on Windows XP (it has a maximum file name length limit).
I haven't used other APIs, so I am not trying to compare JiBX with them. If you decide to use JiBX, I hope this will be helpful.
For more details, please refer to the JiBX website.
I've used XStream as well. It is easy to use and customizable. You can add your own custom converters, and that was very handy for me...
I'm surprised more people have not mentioned JiBX. It's an amazing lib and, I think, a lot simpler to use than JAXB. Performance is also fab!
For this you can also consider Apache's Betwixt and the Simple framework for XML.
Does anyone know of a method, or library, to convert SGML into XML?
EDIT: For clarification, I have to do the conversion in Java, and I cannot use the SP parser or the related SX tool.
It seems that the general consensus is that there are no existing libraries for doing SGML work in Java. Certainly after several days of fruitlessly searching Google, and asking this question here, I have found no resources on this subject.
The answer is not always that simple, as it depends on the SGML DTD. I haven't actually found a general SGML parser in Java at all, but this article uses SP, which includes a converter.
See http://jclark.com/sp/sx.htm for the SX converter from SGML to XML in the SP package.
There is the mlParser, but I'm having a hard time trying to locate it: http://www.balisage.net/Proceedings/vol1/html/Smith01/BalisageVol1-Smith01.html
There is no API for parsing SGML using Java at this time. There also isn't any API or library for converting SGML to XML and then parsing it using Java. Since SGML has been supplanted by XML on every project I've worked on so far, I don't think there will ever be any work done in this area, but that is only a guess.
Here is some open source code from a university that does it; however, I haven't tried it, and you would have to search to find the other dependent classes. I believe the only viable solution in Java would require regular expressions.
Also, here is a link for public SGML/XML software.