Is there a library similar to pyparsing in Java? [closed] - java

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
I need to quickly build a parser for a very simplified version of an HTML-like markup language in Java. In Python, I would use the pyparsing library for this. Is there something similar for Java? Please don't suggest existing HTML parsing libraries: my application is a school assignment that demonstrates walking a tree of objects and serializing it to text using the visitor pattern, so I'm not thinking in real-world terms here. Basically, all I need is tags, attributes and text nodes.

Another good parser generator is ANTLR; that might be what you're looking for.

It may be overkill for your use, but JavaCC is an excellent industrial-strength parser generator. I've used it several times; it's reliable and worth learning, particularly if you are going to work with languages and compilers. Here's the description of the program from the JavaCC website:
Java Compiler Compiler [tm] (JavaCC [tm]) is the most popular parser generator for use with Java [tm] applications. A parser generator is a tool that reads a grammar specification and converts it to a Java program that can recognize matches to the grammar. In addition to the parser generator itself, JavaCC provides other standard capabilities related to parser generation such as tree building (via a tool called JJTree included with JavaCC), actions, debugging, etc.

A quick search for parser generators in Java yields JParsec. I've never used it - but it's inspired by a Haskell library, so by definition it must be good:-)

I like JParsec (which I just discovered thanks to Torsten) because it doesn't generate code... :-) Perhaps less efficient, but enough for small tasks.
I found a similar library, JTopas.
There is a good list of parsers (generators or not) at Java Source.

There are quite a number of choices for string handling in Java.
Maybe the very basic java.util.Scanner and java.util.StringTokenizer classes are helpful for you?
Another good choice may be the org.apache.commons.lang.text library.
http://commons.apache.org/lang/apidocs/org/apache/commons/lang/text/package-summary.html
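As a rough illustration of the java.util.StringTokenizer suggestion, here is a minimal sketch that splits markup into angle brackets and the text between them. The class and method names are made up for this example, and it only tokenizes; turning the tokens into tags, attributes and text nodes is left to the parser built on top.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper for illustration; not part of any library.
final class MarkupTokens {

    // Splits markup on '<' and '>' and keeps the delimiters as tokens,
    // so the caller can tell tag content from plain text.
    static List<String> tokenize(String markup) {
        StringTokenizer tokenizer = new StringTokenizer(markup, "<>", true);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }
}
```

For the input `<b>hi</b>` this yields the tokens `<`, `b`, `>`, `hi`, `<`, `/b`, `>`. A token that follows `<` is tag content (name and attributes); anything else is a text node.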

Related

Equivalent of FileReader for Java resources [closed]

Closed 8 years ago.
Is there a utility class (e.g. in commons-io or Guava, if not in core Java) that's the equivalent of FileReader, but for resources? I mean, yes, I can write
Reader myReader = new InputStreamReader(getClass().getResourceAsStream("myResource"));
but it would be nice to do it with less boilerplate noise.
There is no more concise way of writing this using core Java classes.
Guava has some helpers for dealing with resources, but nothing that wraps a resource as a Reader.
There is nothing relevant in Apache commons.
And in fact, what you've written is arguably wrong. It depends on the platform character encoding being the same as the encoding of your embedded resources. IMO, that is a more important issue than the amount of boilerplate code you need to write.
You can address the boilerplate "problem" by writing your own utility methods.
Re: this "reason" for not writing your own utility method.
Because if somebody's already done it I don't want to maintain it myself.
Assuming that you write the method correctly, the maintenance effort will be almost zero. And since (with the hint above) you now know how to write it correctly, the implementation effort will be almost zero too. You've probably expended more effort looking for an existing helper (and asking here) than you would have saved ... if you'd found one.
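A minimal sketch of such a utility method might look like the following. The class name `ResourceReaders` and the method name `open` are invented for this example, and UTF-8 is an assumption; use whatever encoding your resources are actually saved in.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical utility class for illustration; not from any library.
final class ResourceReaders {

    // Opens a classpath resource as a Reader with an explicit charset,
    // so the result does not depend on the platform default encoding.
    static Reader open(Class<?> anchor, String name) throws FileNotFoundException {
        InputStream in = anchor.getResourceAsStream(name);
        if (in == null) {
            // getResourceAsStream returns null when the resource is missing.
            throw new FileNotFoundException("Resource not found: " + name);
        }
        // Assumption: the embedded resources are encoded as UTF-8.
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }
}
```

Usage would then be `Reader myReader = ResourceReaders.open(getClass(), "myResource");`, ideally inside a try-with-resources block so the reader is closed.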

Good localization framework for Java [closed]

Closed 8 years ago.
I'm looking for the best localization framework for Java. The default localization framework is not enough for me.
Ideally, I'm looking for a localization framework with the following features:
Style support - like the Apache Wicket localization framework does
Formatting support - date and number format support
Plurals support - like the GWT localization framework does
Message hierarchy - fallback lookup resolved at the message level. The typical lookup path is name[style][locale] -> name[locale] -> name[style] -> name, but it is usually resolved at the file level: if a message is missing from the matched file, it is not translated. I would like the fallback to be resolved per message.
If you know of a good localization framework, please let me know. If you think my requirements are somehow wrong, please let me know as well.
Are you looking for a l10n framework or a web-framework with l10n features?
If your case is the second one, then except for the plurals support, Apache Tapestry is a wonderful choice. :)
http://tapestry.apache.org
Take a look at ICU4J.
Especially things like MessageFormat, PluralFormat, SelectFormat to deal with messages, and that can also take care of date/time/number/currency formatting. Or use various formatters directly (DateFormat, NumberFormat, etc.)
ChoiceFormat in Java is also worth a look.
But in general the resource bundles mechanism coupled with MessageFormat is almost always enough (take a look at UResourceBundle in ICU4J, more powerful than the Java ResourceBundle).
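To give a feel for the MessageFormat plus ChoiceFormat combination, here is a small sketch using only the core java.text classes (not ICU4J). The class name and the pattern text are invented for this example; in a real application the pattern would come from a resource bundle. Note that a choice pattern only selects on numeric ranges, which is why ICU4J's PluralFormat is the better fit for languages with more complex plural rules.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical example class; the pattern would normally live in a resource bundle.
final class PluralMessages {

    // A "choice" sub-format picks a variant by numeric range:
    // 0 -> "are no files", 1 -> "is one file", greater than 1 -> "are N files".
    private static final String PATTERN =
            "There {0,choice,0#are no files|1#is one file|1<are {0,number,integer} files}.";

    static String describe(int count) {
        // Fixed locale so the number formatting is predictable in this sketch.
        MessageFormat format = new MessageFormat(PATTERN, Locale.ENGLISH);
        return format.format(new Object[] { count });
    }
}
```

With this pattern, `describe(0)` gives "There are no files.", `describe(1)` gives "There is one file.", and `describe(5)` gives "There are 5 files.".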

Is there a dynamic word/tag cloud Java API somewhere? [closed]

Closed 8 years ago.
There are loads of great word and tag clouds available, the most prominent being wordle.net. But I am looking to display something akin to what some folks did for a twitter replay of the 2010 world cup, just not using flash. I'm not too familiar with R, but it seems to be the best tool for generating some statistical decay of font size over time. Is there a Java API (or combination of APIs) that might make this capability easier from the start?
I'm not aware of a good R package for that. There are some functions, like cloud in the snippets package, and maybe others, but nothing comparable to http://wordle.net, http://tagcrowd.com/, or Many Eyes. Drew Conway has done some nice stuff with tm + ggplot2; I also played with it a while ago, but that was more about playing with a 3D tag cloud (with rgl) than with a Wordle-style one.
In Python or Processing, there are some ongoing projects detailed on this related question. To my knowledge, Tagxedo looks great but it has no API and it relies on Silverlight.
Pierre Lindenbaum also has some Java code, see his blog post Playing with the Wordle algorithm: a tag cloud of Mesh Terms.
It's not great, but there is an open-source project (alas, in PHP) that does word clouds over time. The example uses presidential speeches.
http://chir.ag/projects/preztags/
Here is one that I created in Java as part of a larger project for deriving information from unstructured data: https://github.com/regunathb/Sift. The "tagcloud" project has all the required classes for generating a tag cloud and writing it to multiple output image formats.

Looking for a simple Java spider [closed]

Closed 9 years ago.
I need to supply a base URL (such as http://www.wired.com) and need to spider through the entire site outputting an array of pages (off the base URL). Is there any library that would do the trick?
Thanks.
I have used Web Harvest a couple of times, and it is quite good for web scraping.
Web-Harvest is Open Source Web Data Extraction tool written in Java. It offers a way to collect desired Web pages and extract useful data from them. In order to do that, it leverages well established techniques and technologies for text/xml manipulation such as XSLT, XQuery and Regular Expressions. Web-Harvest mainly focuses on HTML/XML based web sites which still make vast majority of the Web content. On the other hand, it could be easily supplemented by custom Java libraries in order to augment its extraction capabilities.
Alternatively, you can roll your own web scraper using a tool such as JTidy to first convert an HTML document to XHTML, and then extract the information you need with XPath. For example, a very naïve XPath expression to extract all hyperlinks from http://www.wired.com would be something like //a[contains(@href,'wired')]/@href. You can find some sample code for this approach in this answer to a similar question.
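The XPath step can be sketched with the javax.xml.xpath classes that ship with Java, using the standard attribute axis syntax `@href`. This sketch assumes the page has already been converted to well-formed XHTML (for example by JTidy, which is not shown here) and that the elements carry no XML namespace; with a namespace declared, a plain `//a` would not match. The class and method names are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; input must already be well-formed XML.
final class LinkExtractor {

    // Naive expression: every href attribute whose value contains "wired".
    private static final String LINKS = "//a[contains(@href,'wired')]/@href";

    static List<String> extractLinks(String xhtml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        InputSource source = new InputSource(new StringReader(xhtml));
        NodeList nodes = (NodeList) xpath.evaluate(LINKS, source, XPathConstants.NODESET);

        List<String> links = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // Each node is an attribute node; its value is the link target.
            links.add(nodes.item(i).getNodeValue());
        }
        return links;
    }
}
```

To spider a whole site, the returned links would be queued, fetched and fed back through the same extraction, with a set of visited URLs to avoid loops.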
'Simple' is perhaps not a relevant concept here. It's a complex task. I recommend Nutch.

Are there faster XML parsers in Java than Xalan/Xerces [closed]

Closed 9 years ago.
I haven't found many ways to increase the performance of a Java application that does intensive XML processing other than to leverage hardware such as Tarari or Datapower. Does anyone know of any open source ways to accelerate XML parsing?
Take a look at StAX (streaming) parsers. See the Sun reference manual. One of the implementations is the Woodstox project.
Since it hasn't been directly mentioned, I'll throw in Aalto, which is the fastest Java XML parser according to some measurements, like:
JVM-serializers (which compares XML, JSON, protobuf, Thrift, etc.)
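For reference, a minimal StAX sketch using the javax.xml.stream API bundled with Java looks roughly like this. It pulls events one at a time instead of building a tree, which is where the memory and speed benefits come from. The class and method names are made up for this example; whichever StAX implementation is on the classpath is picked up by XMLInputFactory.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class for illustration.
final class StaxElementCounter {

    // Streams through the document and counts start tags with the given local name,
    // without ever holding the whole document in memory as a tree.
    static int countElements(String xml, String localName) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }
}
```

In real use the reader would wrap an InputStream or file rather than a String, so that large documents are never fully loaded.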
Alternative serialization methods for WSTest (Java web services)
which are not written by Aalto developers.
VTD-XML is very fast.
It has a DOM-like API and even XPath queries.
Piccolo claims to be pretty fast. Can't say I've used it myself though. You might also try JDOM. As ever, benchmark with representative data of your real load.
It partly depends on what you're trying to do. Do you need to pull the whole document into memory, or can you operate in a streaming manner? Different approaches have different trade-offs and are better for different situations.
Depending on the complexity of your XML messages, you might find a custom parser can be 10x faster (though more work to write). However, if performance is critical, I wouldn't suggest using a generic parser. (Also, I wouldn't suggest using XML, as it's not designed for performance, but that's another story ;)
Check Javolution as well
