I would like to use commons-compress to work with various compression/archive formats.
However, at first glance it seems commons-compress can detect only some file types, and only from the first few bytes.
Is there a way I can use commons-compress to automatically detect file types based on file extension? I can certainly build this myself, but it would be nice to have it provided by the compression library itself.
After some more digging, I found that there are a few classes that help here, namely FileNameUtil, BZip2Utils, GzipUtils, ... So for each supported format there is a *Utils class that can detect this type by extension.
See e.g. http://commons.apache.org/proper/commons-compress/javadocs/api-1.10/org/apache/commons/compress/compressors/bzip2/BZip2Utils.html
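For comparison, here is a minimal sketch of the "build it myself" route using only the JDK. The class name `ExtensionDetector`, the suffix table and the format labels are all made up for illustration; this is not the commons-compress API, just a plain suffix lookup.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper (not part of commons-compress): maps file-name suffixes to format labels. */
final class ExtensionDetector {
    // Insertion order matters: longer suffixes come first so ".tar.gz" wins over ".gz".
    private static final Map<String, String> SUFFIXES = new LinkedHashMap<>();
    static {
        SUFFIXES.put(".tar.gz", "tar+gzip");
        SUFFIXES.put(".tar.bz2", "tar+bzip2");
        SUFFIXES.put(".tgz", "tar+gzip");
        SUFFIXES.put(".gz", "gzip");
        SUFFIXES.put(".bz2", "bzip2");
        SUFFIXES.put(".zip", "zip");
        SUFFIXES.put(".tar", "tar");
    }

    /** Returns the format label for a file name, or empty if no known suffix matches. */
    static Optional<String> detect(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : SUFFIXES.entrySet()) {
            if (lower.endsWith(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(detect("backup.tar.gz"));
        System.out.println(detect("notes.txt"));
    }
}
```

A name-based check like this says nothing about the actual content, so it is probably best combined with the byte-based detection the library already offers.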
Related
Right now I am working on a project where I have to read a series of Strings from a text file in Java.
I know how to do it in the general way (using a FileReader, BufferedReader, etc.). My issue arises because I am not allowed to use external libraries at all. Are these considered external libraries?
To put it more simply, what's a good definition of an external library? Is it anything that I would have to import?
As a follow-up question, how would I be able to read from a text file without using any of those libraries, if they're not allowed?
External libraries are ones you have to download because they do not come with the Java API. An example would be the Apache Commons libraries. If you can import a class without downloading anything or adding it to the classpath, it is not an external library.
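As a sketch of the standard-library route: `FileReader` and `BufferedReader` live in `java.io`, which ships with the JDK, so nothing external is involved. The class name `LineReader` and the method `readLines` are just illustrative names.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper that uses only classes shipped with the JDK. */
final class LineReader {
    /** Reads every line of the file at the given path into a list. */
    static List<String> readLines(String path) {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch short
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Expects the path of a text file as the first argument
        for (String line : readLines(args[0])) {
            System.out.println(line);
        }
    }
}
```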
I have to write a very large XLS file. I have tried Apache POI, but it simply takes up too much memory for me to use.
I had a quick look through StackOverflow and noticed some references to the Cocoon project and, specifically, the HSSFSerializer. It seems that this is a more memory-efficient way to write XLS files to disk (from what I've read, please correct me if I'm wrong!).
I'm interested in the use case described here: http://cocoon.apache.org/2.1/userdocs/xls-serializer.html . I've already written the code to write out the file in the Gnumeric format, but I can't seem to find how to invoke the HSSFSerializer to convert it to XLS.
On further reading it seems like the Cocoon project is a web framework of sorts. I may very well be barking up the wrong tree, but:
Could you provide an example of reading in a file, running the HSSFSerializer on it and writing that output to another file? It's not clear how to do so from the documentation.
My friend, the HSSF serializer is part of POI. You are just setting certain attributes in the XML to be serialized (but you need a whole process to create it). Also, setting up a whole pipeline with this framework just to create an XLS file seems odd, as it changes the app's architecture. Is that your decision?
From the docs:
An alternate way of generating a spreadsheet is via the Cocoon
serializer (yet you'll still be using HSSF indirectly). With Cocoon
you can serialize any XML datasource (which might be a ESQL page
outputting in SQL for instance) by simply applying the stylesheet and
designating the serializer.
If memory is an issue, try SXSSF in POI. It is the streaming variant of XSSF and keeps only a window of rows in memory, but note that it produces .xlsx rather than .xls files.
I don't know if by "XLS" you mean the specific pre-Office 2007 version of this "Horrible SpreadSheet Format" (which is what HSSF stands for), or just anything you can open with a recent version of MS Office, OpenOffice, ...
So depending on your client requirements (i.e. those who will open your Excel file), another option might be available: generating an .XLSX file.
It comes down to producing an XML file in the proper grammar, which seems to fit your situation, as you have already done that with the Gnumeric XML-based file format without technical trouble and without hitting memory-efficiency issues.
Please note that other XML-based spreadsheet formats exist that Excel and other clients are able to use. You might want to dig into the OpenDocument file formats.
As to whether to use Apache Cocoon or something else:
Cocoon can certainly host the XSL processing. Batch processing through the Cocoon CLI is available if you need Cocoon but don't want to run it as a webapp (though as far as I remember, the CLI feature was broken in the latest builds of the 2.1 series). Cocoon also comes with a load of features and technologies that could address further requirements.
Cocoon might be overkill if it just comes down to running an XSL transformation, for which there are a number of well-known, lighter tools you can pick from.
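One of those lighter options is the XSLT support that ships with the JDK (`javax.xml.transform`). The following is only a sketch of running a stylesheet outside Cocoon; `XslRunner` is an illustrative name, and it does not produce XLS by itself, it just applies whatever stylesheet you hand it.

```java
import java.io.Reader;
import java.io.Writer;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Illustrative wrapper around the JDK's built-in XSLT processor. */
final class XslRunner {
    /** Applies the stylesheet read from xsl to the document read from xml, writing to out. */
    static void transform(Reader xml, Reader xsl, Writer out) {
        try {
            // Compile the stylesheet into a Transformer
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(xsl));
            // Run the transformation; the result is written to the given Writer
            transformer.transform(new StreamSource(xml), new StreamResult(out));
        } catch (TransformerException e) {
            // Rethrown unchecked to keep the sketch short
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }
}
```

For file-to-file use you could pass a `FileReader` for each input and a `FileWriter` for the output. Whether this is light enough on memory for a very large input depends on the stylesheet and the XSLT processor, so treat that as something to measure.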
I would like to write a toy IDE for Java, so I am asking about one particular thing that I hope can help me get started.
I have an editor implemented on top of Swing and I have some text in there. For example:
import java.util.List;
Now I need a way to send the "java.util.List" string to a method that returns all the information I may need, including the JavaDoc documentation.
So is there any tool that can set up classpath with libraries, that would parse every string I send and try to find if there is any Class/Interface with documentation to return?
AFAIK, no. There is no such free-standing tool or library. You will need to implement it yourself. (Don't expect that writing a Java IDE is simple ... even a "toy" one.)
Libraries will have class files, which do not contain Javadoc, so it is not clear what you want to do.
There are many bytecode engineering tools to analyse and extract information from class files, for example asm or bcel. Javassist can process both source and bytecode, so it may be close to what you need.
You could use an HTML parser to get the Javadoc and other info from the web, using the full path to the class (including package names) to construct the correct URL per class. This will of course depend on the version of Java you are using.
You can also use the javadoc tool from within Java to generate the desired documentation from Java source files (which can be downloaded from the web). The source code of the tool could also help you out. See http://java.sun.com/j2se/javadoc/faq/#developingwithjavadoc
Lastly, if you need information based on runtime types in your program, you might want to check reflection capabilities.
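To give an idea of what reflection can and cannot tell you, here is a small sketch that resolves a fully qualified name such as "java.util.List" and lists its public method names. `TypeInfo` and `publicMethodNames` are illustrative names. Note that reflection works on loaded classes, so it yields signatures but no Javadoc text.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Illustrative lookup of type information by fully qualified class name. */
final class TypeInfo {
    /** Returns the sorted, distinct public method names of the type, or an empty list if it is not on the classpath. */
    static List<String> publicMethodNames(String className) {
        try {
            // 'false' avoids running the class's static initializers during lookup
            Class<?> type = Class.forName(className, false, TypeInfo.class.getClassLoader());
            TreeSet<String> names = new TreeSet<>();
            for (Method method : type.getMethods()) {
                names.add(method.getName());
            }
            return new ArrayList<>(names);
        } catch (ClassNotFoundException e) {
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        System.out.println(publicMethodNames("java.util.List"));
    }
}
```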
First you need to find out how to print the imported Java libraries. Then download the Java API documentation. Once you know which libraries are imported, open an InputStream to read the appropriate HTML file.
Beware! This technique will only work when importing from the JDK.
In Android applications, resources are specified in XML documents, which are automatically built into the R class and are readily accessible from the source code in a strongly typed way.
Is there any way I could use a similar approach for a regular Java desktop application?
What I'd like to accomplish is both the removal of strings from the code (as a separation of "layers", more or less) and easy support for localization, by simply telling the program to choose the XML file corresponding to the desired language.
I've googled around a bit, but the things I'm looking for seem to be drowning in results about parsing or outputting xml, rather than tools utilizing xml to generate code.
Eclipse's message bundle implementation (used by plugins for example) integrates with the Externalize Strings feature and generates both a static class and a resource properties file for your strings:
http://www.eclipse.org/eclipse/platform-core/documents/3.1/message_bundles.html
For this integration to work Eclipse needs to see org.eclipse.osgi.util.NLS on the class path. From memory, the dependencies of the libraries it was available in were a little tricky for the project I used this approach in, so I just got the source and have it as a stand-alone class in my core module (see the comments for more on that).
It provides the type safety you're looking for and the IDE features save a lot of time. I've found no downsides to the approach so far.
Edit: this is actually what ghostbust555 mentioned in the comments, but it is not clear from that article that this isn't limited to Eclipse plugins and that you refer to your resources via static members of a messages class.
I haven't seen any mention of others using this approach with their own applications, but to me it makes complete sense given the IDE integration and type safety.
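To illustrate the general idea of a typed messages class without any Eclipse dependency, here is a hand-written sketch built on `java.util.Properties`. It is not the Eclipse NLS mechanism and nothing here is generated; the class name `Messages` and the keys `greeting` and `farewell` are made up for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.MissingResourceException;
import java.util.Properties;

/** Hand-written, illustrative typed accessor over a properties source. */
final class Messages {
    private final Properties props;

    private Messages(Properties props) {
        this.props = props;
    }

    /** Loads key=value pairs, e.g. from the properties file of the chosen language. */
    static Messages load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Messages(props);
    }

    // One accessor per key, so a typo in a key name becomes a compile error at the call site.
    String greeting() { return get("greeting"); }
    String farewell() { return get("farewell"); }

    private String get(String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new MissingResourceException("Missing message", Messages.class.getName(), key);
        }
        return value;
    }
}
```

Switching language then comes down to which file you pass to `load`; the calling code keeps using `messages.greeting()` unchanged.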
I'm not sure if this is what you mean but check out internationalization- http://netbeans.org/kb/docs/java/gui-automatic-i18n.html
Are you looking for something that parses XML files and generates Java instances of similar "struct-like" objects, like JAXP and JAXB?
I came across ResGen which, given resource bundle XML files, generates Java files that can be used to access the resources in a type-safe way.
http://eigenbase.sourceforge.net/resgen/
I'm interested in dealing with archive contents in a similar way to dealing with images through the javax.imageio API: just get them as a file and see if you know how to decode them.
Obviously, there are the jar APIs, but I believe they only work with zip formats.
End use is Clojure code.
I think you can handle some of these types with the Apache Commons Compress library. You could also take a close look at the Apache Tika library, which extracts text and metadata from different file types. As I remember, they want to extend Commons Compress with more archive types (look at the patches in the JIRA).
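The "see if you know how to decode it" step can be sketched with plain JDK code by checking the well-known leading bytes of a few formats. This is only an illustration with a hypothetical class name `MagicSniffer`; it covers three signatures, while the libraries mentioned above handle many more.

```java
/** Illustrative format sniffer based on well-known leading bytes. */
final class MagicSniffer {
    /** Returns a format label for the given leading bytes, or "unknown". */
    static String sniff(byte[] head) {
        if (startsWith(head, 0x1f, 0x8b)) {
            return "gzip";
        }
        if (startsWith(head, 'B', 'Z', 'h')) {
            return "bzip2";
        }
        // Local file header signature; an empty zip archive starts differently
        if (startsWith(head, 'P', 'K', 3, 4)) {
            return "zip";
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, int... magic) {
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            // Mask to compare as unsigned byte values
            if ((data[i] & 0xff) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

From Clojure this would be an ordinary static method call via interop once the class is on the classpath.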