Are there faster XML parsers in Java than Xalan/Xerces [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I haven't found many ways to increase the performance of a Java application that does intensive XML processing other than to leverage hardware such as Tarari or Datapower. Does anyone know of any open source ways to accelerate XML parsing?

Take a look at StAX (streaming) parsers; see the Sun reference manual. One of the implementations is the Woodstox project.
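To give a feel for the streaming style, here is a minimal sketch using the standard javax.xml.stream API that ships with the JDK. The class name StaxTitles, the <title> element, and the sample document are made up for illustration. If Woodstox is on the classpath it would normally be picked up by XMLInputFactory.newInstance(), but that is an assumption about your setup, so benchmark with your own data.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of any library.
final class StaxTitles {

    // Streams through the XML and collects the text of every <title> element
    // without building a tree of the whole document.
    static List<String> titles(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Assumption: input may be untrusted, so DTDs and external entities are off.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // Pull the next event; only start tags named "title" are of interest.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        // Reads the text content and leaves the cursor on the end tag.
                        result.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>A</title></book>"
                + "<book><title>B</title></book></books>";
        System.out.println(titles(xml)); // prints [A, B]
    }
}
```

Because the parser only hands you one event at a time, memory use stays flat no matter how large the document is, which is usually where the gain over a DOM-style parse comes from.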

Since it hasn't been directly mentioned, I'll throw in Aalto, which is the fastest Java XML parser according to some measurements, such as:
JVM-serializers (which compares XML, JSON, protobuf, Thrift, etc.)
Alternative serialization methods for WSTest (Java web services)
Neither of these was written by the Aalto developers.

VTD-XML is very fast.
It has a DOM-like API and even XPath queries.

Piccolo claims to be pretty fast. Can't say I've used it myself though. You might also try JDOM. As ever, benchmark with representative data of your real load.
It partly depends on what you're trying to do. Do you need to pull the whole document into memory, or can you operate in a streaming manner? Different approaches have different trade-offs and are better for different situations.

Depending on the complexity of your XML messages, you might find a custom parser can be 10x faster (though it is more work to write). If performance is critical, I wouldn't suggest using a generic parser. (Also, I wouldn't suggest using XML, as it's not designed for performance, but that's another story ;)

Check out Javolution as well.

Related

Equivalent of FileReader for Java resources [closed]

Is there a utility class (e.g. in commons-io or Guava, if not in core Java) that's the equivalent of FileReader, but for resources? I mean, yes, I can write
Reader myReader = new InputStreamReader(getClass().getResourceAsStream("myResource"));
but it would be nice to do it with less boilerplate noise.

There is no more concise way of writing this using core Java classes.
Guava has some helpers for dealing with resources, but nothing that wraps a resource as a Reader.
There is nothing relevant in Apache commons.
And in fact, what you've written is arguably wrong. It depends on the platform character encoding being the same as the encoding of your embedded resources. IMO, that is a more important issue than the amount of boilerplate code you need to write.
You can address the boilerplate "problem" by writing your own utility methods.
Re this "reason" for not writing your own utility method:
"Because if somebody's already done it I don't want to maintain it myself."
Assuming that you write the method correctly, the maintenance effort will be almost zero. And since (with the hint above) you now know how to write it correctly, the implementation effort will be almost zero too. You've probably expended more effort looking for an existing helper (and asking here) than you would have saved ... if you'd found one.
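For what it's worth, such a utility method might look roughly like the sketch below. The class name ResourceReaders and the method name open are made up for illustration; the point is only that the charset is passed explicitly instead of relying on the platform default, and that a missing resource fails with a clear message rather than a NullPointerException.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the JDK, Guava, or Commons.
final class ResourceReaders {

    private ResourceReaders() {
    }

    // Opens a classpath resource as a Reader using an explicit charset.
    // The caller is responsible for closing the returned Reader.
    static Reader open(Class<?> owner, String name, Charset charset) {
        InputStream in = owner.getResourceAsStream(name);
        if (in == null) {
            // getResourceAsStream returns null when the resource does not exist.
            throw new IllegalArgumentException("Resource not found: " + name);
        }
        return new BufferedReader(new InputStreamReader(in, charset));
    }
}
```

The call site then shrinks to something like Reader myReader = ResourceReaders.open(getClass(), "myResource", StandardCharsets.UTF_8); assuming the resource really is UTF-8 encoded.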

Good localization framework for Java [closed]

I'm looking for the best localization framework for Java. The default localization framework is not enough for me.
Ideally I'm looking for a localization framework with the following features:
Style support - like the Apache Wicket localization framework has
Formatting support - date and number format support
Plurals support - like the GWT localization framework has
Message hierarchy - fallback lookup through the hierarchy, resolved at the message level. Typically the lookup path is name[style][locale] -> name[locale] -> name[style] -> name, but it is resolved at the file level: if a message is missing from the selected file, it is not translated. I would like the fallback to be resolved per message.
If you know of a good localization framework, please let me know. If you think my requirements are somehow wrong, please let me know that as well.

Are you looking for an l10n framework or a web framework with l10n features?
If it's the latter, Apache Tapestry is a wonderful choice, except for the plurals support. :)
http://tapestry.apache.org

Take a look at ICU4J.
In particular, look at MessageFormat, PluralFormat, and SelectFormat for dealing with messages; they can also take care of date/time/number/currency formatting. Or use the various formatters directly (DateFormat, NumberFormat, etc.).
ChoiceFormat in Java is also worth a look.
But in general the resource bundles mechanism coupled with MessageFormat is almost always enough (take a look at UResourceBundle in ICU4J, more powerful than the Java ResourceBundle).
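As a rough illustration of the plain-JDK route, here is a sketch using java.text.MessageFormat with an embedded choice format. The class name PluralMessages is made up, and the pattern follows the style of the example in the MessageFormat Javadoc. Note that this is not ICU4J: a choice format only switches on numeric ranges, so it works for English-style plurals but not for languages with more complex plural rules, which is where ICU4J's PluralFormat comes in.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical example class showing JDK-only plural handling.
final class PluralMessages {

    // In a real application this pattern would normally live in a resource bundle.
    private static final String PATTERN =
            "There {0,choice,0#are no files|1#is one file|1<are {0,number,integer} files}.";

    // Formats the message for the given count using English number formatting.
    static String describe(int count) {
        return new MessageFormat(PATTERN, Locale.ENGLISH).format(new Object[] { count });
    }

    public static void main(String[] args) {
        System.out.println(describe(0)); // There are no files.
        System.out.println(describe(1)); // There is one file.
        System.out.println(describe(5)); // There are 5 files.
    }
}
```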

Lightweight Java socket library [closed]

I've used Mina and Netty, but now I'm in the market for a lightweight library that may also be used in Android. I prefer Nio or AsyncIo over standard io implementations.
Update 1
The lack of responses really makes me think I should write my own library. Right now I'm using raw NIO and it's not a lot of fun.

You might try using some pieces from Jetty as suggested in this email. I really like Jetty because it's small, self contained, and you can use some or all of it flexibly.

Since this seems to be dead on arrival, I'll answer it by saying my custom IO library will be the best.

To answer your question, there is no one-size-fits-all async library. Netty and Mina might be the closest to such a thing, but most projects may still need some customized pure NIO/async I/O code.
I maintain you are on the right track. The more experience you have with low-level NIO/ASYNCIO the more you will appreciate and be able to get the most out of the somewhat-less-low-level Netty.

Looking for a simple Java spider [closed]

I need to supply a base URL (such as http://www.wired.com) and need to spider through the entire site outputting an array of pages (off the base URL). Is there any library that would do the trick?
Thanks.

I have used Web Harvest a couple of times, and it is quite good for web scraping.
Web-Harvest is Open Source Web Data Extraction tool written in Java. It offers a way to collect desired Web pages and extract useful data from them. In order to do that, it leverages well established techniques and technologies for text/xml manipulation such as XSLT, XQuery and Regular Expressions. Web-Harvest mainly focuses on HTML/XML based web sites which still make vast majority of the Web content. On the other hand, it could be easily supplemented by custom Java libraries in order to augment its extraction capabilities.
Alternatively, you can roll your own web scraper using tools such as JTidy to first convert an HTML document to XHTML, and then extract the information you need with XPath. For example, a very naïve XPath expression to extract all hyperlinks from http://www.wired.com would be something like //a[contains(@href,'wired')]/@href. You can find some sample code for this approach in this answer to a similar question.
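A minimal sketch of the XPath half of that approach, using only the JDK's javax.xml.xpath API. It assumes the page has already been converted to well-formed XHTML (for example by JTidy, which is not shown here) and contains no DOCTYPE or namespace declaration. The class name LinkExtractor and the sample markup are made up, and the expression uses @href, the standard XPath syntax for selecting an attribute.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; assumes the input is already well-formed XHTML.
final class LinkExtractor {

    // Returns the href values of all <a> elements whose href contains "wired".
    static List<String> links(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//a[contains(@href,'wired')]/@href", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // Each node is an attribute node; its value is the link target.
                result.add(nodes.item(i).getNodeValue());
            }
            return result;
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not extract links", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<a href=\"http://www.wired.com/a\">A</a>"
                + "<a href=\"http://example.com/\">B</a>"
                + "</body></html>";
        System.out.println(links(page)); // prints [http://www.wired.com/a]
    }
}
```

A real spider would then queue each extracted link, fetch it, and repeat, while keeping a set of already visited URLs so it does not loop.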

'Simple' is perhaps not a relevant concept here. It's a complex task. I recommend Nutch.

Is there a library similar to pyparsing in Java? [closed]

I need to quickly build a parser for a very simplified version of an HTML-like markup language in Java. In Python, I would use the pyparsing library to do this. Is there something similar for Java? Please don't suggest existing HTML parsing libraries: my application is a school assignment that will demonstrate walking a tree of objects and serializing it to text using the visitor pattern, so I'm not thinking in real-world terms here. Basically all I need is tags, attributes and text nodes.

Another good parser generator is ANTLR, which might be what you're looking for.

It may be overkill for your use, but JavaCC is an excellent industrial-strength parser generator. I've used it several times; it's reliable and worth learning, particularly if you are going to work with languages and compilers. Here's the description of the program from the website listed above:
Java Compiler Compiler [tm] (JavaCC [tm]) is the most popular parser generator for use with Java [tm] applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar. In addition to the parser generator itself, JavaCC provides other standard capabilities related to parser generation such as tree building (via a tool called JJTree included with JavaCC), actions, debugging, etc.

A quick search for parser generators in Java yields JParsec. I've never used it - but it's inspired by a Haskell library, so by definition it must be good:-)

I like JParsec (which I just discovered thanks to Torsten) because it doesn't generate code... :-) Perhaps less efficient, but enough for small tasks.
I found a similar library, JTopas.
There is a good list of parser (generators or not) at Java Source.

There are quite a number of choices for string handling in Java.
Maybe the very basic java.util.Scanner and java.util.StringTokenizer classes would be helpful for you?
Another good choice might be the org.apache.commons.lang.text package.
http://commons.apache.org/lang/apidocs/org/apache/commons/lang/text/package-summary.html
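For a markup as simple as the one described in the question, a first tokenizing pass with java.util.StringTokenizer could look like the sketch below. The class name MarkupTokenizer is made up, and the code assumes that '<' and '>' never appear inside text or attribute values. It only splits the input into tag and text tokens; building the object tree from those tokens is left to the assignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical example class; handles only a very simplified markup.
final class MarkupTokenizer {

    // Splits the input into tag tokens (such as "<b>" or "</b>") and text tokens.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        // The third argument makes the tokenizer return the delimiters as tokens too.
        StringTokenizer st = new StringTokenizer(input, "<>", true);
        boolean inTag = false;
        while (st.hasMoreTokens()) {
            String t = st.nextToken();
            if (t.equals("<")) {
                inTag = true;
            } else if (t.equals(">")) {
                inTag = false;
            } else if (inTag) {
                // Re-wrap the tag body so tags and text can be told apart later.
                tokens.add("<" + t + ">");
            } else {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("<b>hi</b>")); // prints [<b>, hi, </b>]
    }
}
```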
