non-java files in package structure - java

We have a developer who is in the habit of committing non-java files (xsd, dtd etc) in the java packages under the src/java folder in our repository. Admittedly, these are relevant files to that package, but I just hate to see non-java files in the src folder.
Is this a common practice that I should get used to, or are we doing something strange by maintaining these files like this?

The problem with putting non-Java files (or files in other languages) that are closely tied to the code in a different place than the code is knowing where to find them. It is possible to standardize the locations, and then theoretically everyone will know where to go and what to do. But I find that in practice this does not happen.
Imagine your app still being maintained 5 or 10 years down the road by a team of junior - intermediate developers that do not work at the company now and will never talk to anyone who works on your project now. Putting files closely linked to the source in the source package structure could make their lives easier.
I am a big proponent of eliminating as many ambiguities as possible within reason.

It's very common and even recommended as long as it's justifiable. Generally it's justifiable when it's a static resource (DTD+XSLT for proprietary formats, premade scripts etc.), but not when the file is something that's likely to be updated by a third party, like an IP/geographic location database dump.

I think it gets easier if you think of 'src' as not specifically meaning 'source code'. Think of it as the source of resources that are things needed by your program at compile time and/or runtime.
Things that are a product of compile or build activities should not go here.
Admittedly, like most things, exceptions may apply :)
Update:
Personally, I like to break down src further with subdirectories for each resource type underneath it. Others may like that division at a higher level.

There are a lot of jar libraries that use the same practice.
I think it is acceptable and comfortable.

In Eclipse it works well for us to have a src folder containing java classes, and a configuration folder (which is blessed as a source folder) containing property files etc. Then they all go in the output folder together and can be found in the classpath while still being in separate folders inside Eclipse.
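Whichever folder the files live in, once they land in the output folder they can be read back through the classpath. A minimal sketch of that lookup, assuming the resource sits in the same package as the class that uses it (the class name PackageResources and the method readText are made up for illustration):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of any library.
final class PackageResources {
    private PackageResources() {}

    /**
     * Reads a classpath resource that sits in the same package as owner,
     * e.g. an XSD next to the class that parses it.
     * Returns an empty Optional if no such resource is on the classpath.
     */
    static Optional<String> readText(Class<?> owner, String fileName) {
        // A name without a leading '/' is resolved relative to owner's package.
        try (InputStream in = owner.getResourceAsStream(fileName)) {
            if (in == null) {
                return Optional.empty();
            }
            return Optional.of(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The lookup only cares about where the file ends up in the output folder or jar, not which source folder it came from, so a separate configuration folder works just as well as long as the package path matches.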

One of the advantages of keeping all the auxiliary files next to the source is that version consistency is maintained between these 3rd party libraries and your source code. If you ever need to go back and debug a specific version, you can pull the entire set of source+config and have it all be the same version.
That being said I'd put them in a $project/config/ directory, or some such, rather than in $project/src/java itself. They're not source, nor java, really, so it's misleading having them in that directory.
When you really get down to it, though, this is an issue of personal style. There's no "Right" answer and you should be talking with those team members and understanding why they made this decision. Using this thread as evidence to support a unilateral decision probably won't go over well. ;)

It's pretty common; you can find it in really popular frameworks, e.g. the xsd files for Spring's various schemas. Also, people usually place Hibernate mapping files in the same package as the model classes.

I think this is common as long as the files are necessary. The problems arise when people start committing files that are not needed with the source, such as design specs or random text files.

It is surely common, but incredibly lazy and sloppy. My skin crawls when I see it.
Using a tool such as Maven to build your products enables you to easily and clearly separate code from resources.
Eclipse bundles can be similarly separated.

Related

Should I bundle source and class files in the same JAR?

Separate Jars
When creating JAR files, I've always kept the source separate and offered it as an optional extra.
eg:
Foo.jar
Foo-source.jar
It seems to be the obvious way to do things and is very common. Advantages being:
Keeps binary jar small
Source may not be open / public
Faster for classloader? (I've no idea, just guessing)
Single Jar
I've started to doubt whether these advantages are always worth it. I'm working on a tiny component that is open-source. None of the advantages I've listed above were problems in this project anyway:
Classes + source still trivially small (and will remain that way)
Source is open
Class loading speed of this jar is irrelevant
Keeping the source with the classes does however bring new advantages:
Single dependency
No issues of version mismatch between source and classes
Developers using this jar will always have the source to hand (to debug or inspect)
Those new advantages are really attractive to me. Yes, I could just zip source, classes and even javadoc into a zip file and let clients of my component decide which they want to use (like Google do with the guava libraries) but is it really worth it?
I know it goes against conventional software engineering logic a little, but I think the advantages of a single jar file out-weigh the alternatives.
Am I wrong? Is there a better way?
Yes, I could just zip source, classes and even javadoc into a zip file and let clients of my component decide which they want to use (like Google do with the guava libraries) but is it really worth it?
Of course it is worth it! It takes about 2 seconds to do it, or just a few minutes to change your build scripts.
This is the way that most people who distribute sources and binaries handle this problem.
EDIT
It is not your perspective you need to consider. You have to think of this from the perspective of the people deploying / using your software.
They aren't going to use the source code on the deployment platform.
Therefore putting the source code in the binary JAR is a waste of disc space, slows down deployment and slows down application startup.
If they want to do something about it, they've got a problem. How do they rebuild the JAR file to get rid of the source code? How do they know what is safe to leave out?
From the deployer / user's perspectives, there are no positives, only negatives.
Finally, your point about people not being able to track source versus binary versions doesn't really hold water. Most people who would be interested in the source code are perfectly capable of doing this. Besides, there are some simple things you can do to address the issue, like using JAR filenames that include your software's version number, or putting the version number into the manifest.
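To illustrate the manifest suggestion, here is a small sketch using the standard java.util.jar classes. It writes an otherwise empty jar with an Implementation-Version attribute and reads the attribute back. The class and method names are invented for the example, and a real build would normally let Ant or Maven write the manifest instead:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical demo class.
final class ManifestVersion {
    private ManifestVersion() {}

    /** Builds an in-memory jar whose manifest carries the given version. */
    static byte[] emptyJarWithVersion(String version) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version has to be present or the other attributes are not written.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.IMPLEMENTATION_VERSION, version);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            // No class entries needed for this demo; the manifest is enough.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns the Implementation-Version of a jar, or null if it has none. */
    static String readVersion(byte[] jarBytes) {
        try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            Manifest manifest = jar.getManifest();
            return manifest == null
                    ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.IMPLEMENTATION_VERSION);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the same version stamped into Foo.jar and Foo-source.jar, matching a binary to its source is a matter of comparing one attribute.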
I have just come across a potential pitfall for the java+classes in a single jar.
If you have java files in a jar and that jar is included in the classpath of a subsequent javac execution, you MUST make sure that the timestamp of each java file is older than the timestamp of the corresponding class file.
This scenario can happen when you copy/move the java or class files prior to packaging as a jar.
If the java file is newer than the class, then even though the java file is found on the classpath (rather than an argument to javac), javac will attempt to compile that java file and then potentially end up with duplicate class errors during the compilation stage.
For this reason I would recommend keeping the source in a separate jar to the class files.
Note that relevant flags in javac will not allow you to prefer class over source: http://docs.oracle.com/javase/7/docs/technotes/tools/windows/javac.html#searching
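If you do ship source and classes in one jar anyway, a pre-packaging check can catch the timestamp problem described above. This is only a sketch: StaleSourceCheck and its methods are hypothetical names, and it assumes each Foo.java sits next to its Foo.class inside the jar.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration.
final class StaleSourceCheck {
    private StaleSourceCheck() {}

    /** Collects entry name -> last-modified time (millis) for every entry in a jar. */
    static Map<String, Long> entryTimes(Path jar) {
        Map<String, Long> times = new TreeMap<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            zip.stream().forEach(entry -> times.put(entry.getName(), entry.getTime()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return times;
    }

    /** Returns the .java entries that are newer than the .class entry beside them. */
    static List<String> sourcesNewerThanClasses(Map<String, Long> entryTimes) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> entry : new TreeMap<>(entryTimes).entrySet()) {
            String name = entry.getKey();
            if (!name.endsWith(".java")) {
                continue;
            }
            String classEntry = name.substring(0, name.length() - ".java".length()) + ".class";
            Long classTime = entryTimes.get(classEntry);
            if (classTime != null && entry.getValue() > classTime) {
                stale.add(name);
            }
        }
        return stale;
    }
}
```

A build could fail if sourcesNewerThanClasses(entryTimes(jarPath)) comes back non-empty.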
I prefer 'Separate Jars'.
Because the binary class jar is for running on the JVM, but the source is not. Source should be carefully maintained by your source control system (e.g. SVN). If the source needs to be released, zip it in a separate jar. Many open source projects separate the class jar from the source jar.
If you want others to test and inspect/improve your code then you can have your source with the binaries. If not, keep the source away from the jar.
How small is small and why should your jar act differently from others?
Unless you have a very good reason why your jar should have the sources (not simply debugging, but something specific to this one jar), then I'd say no, choice is best.
I say this because if your jar should not be different from others, then you have to work on the assumption that others should do the same as you. If so, the size of the jar is no longer unimportant, because it's duplicated over all "small" jars. Then my WAR is much bigger than needed which, admittedly, is not a massive issue, but is not something I would choose for production when I can download sources in DEV so easily.

What is best practice (and implications) for packaging projects into JAR's?

What is considered best practice deciding how to define the set of JAR's for a project (for example a Swing GUI)? There are many possible groupings:
JAR per layer (presentation, business, data)
JAR per (significant?) GUI panel. For significant system, this results in a large number of JAR's, but the JAR's are (should be) more re-usable - fine-grained granularity
JAR per "project" (in the sense of an IDE project); "common.jar", "resources.jar", "gui.jar", etc
I am an experienced developer; I know the mechanics of creating JAR's, I'm just looking for wisdom on best-practice.
Personally, I like the idea of a JAR per component (e.g. a panel), as I am mad-keen on encapsulation, and the holy-grail of re-use across projects. I am concerned, however, that on a practical, performance level, the JVM would struggle class loading over dozens, maybe hundreds of small JAR's. Each JAR would contain the GUI panel code and its necessary resources (i.e. not centralised), so each panel can stand alone.
When I say "holy grail of reuse", I say this more because it demonstrates a cleanly decoupled, encapsulated design, rather than necessarily expecting its re-use elsewhere. I consider myself a "normally intelligent" person; I consider that the spaghetti of intertwined nonsense I've had to deal with during my career slows me down 10 to 100-fold. A cleanly decoupled design allows me to deal with one concept at a time, one layer, one class.
Does anyone have wisdom to share?
I would recommend as few JARs as possible.
The logic behind it: disk storage is the cheapest commodity available, but time spent tracing down complex dependencies is costly.
Hence the emergence of the .war files where all dependencies of the web application are put into a single file.
BTW, Eclipse has a JAR exporter plugin which puts all dependent jars into a super jar and exposes the entry-level main method, so you can start your app with the java -jar file.jar command. Although the resultant jar may be large, the upside is not having to maintain very complex class paths for your application.
So, in your case I would go with one jar per project. If you determine that you indeed need to reuse some code in another project, just refactor it into the base project and make it a dependency in your existent project and another project.
You can actually use both approaches. Spring, for example, offers a big monolithic jar file, which contains the most common functionality. If you want, however, you can also download independent jar files. It is then left to the user to select what is best. Big jar files are easier to deploy, but they are harder to upgrade. Also, you may need to add a big jar when you only need a single class. I find that it is easier to spot dependencies with small jar files. I also think that updating/upgrading is easier.
Java provides encapsulation and re-use at the class layer - the jar file format doesn't really provide it. I don't see much advantage in putting a significant component in its own jar, unless you think lots of people will be downloading it.
I read somewhere (and I was trying to find it when I found this) that project per layer is the best. It's what I've been doing. Struts, Spring MVC, Swing, whatever in one layer, EJBs in another, business services in another and DAOs in another. I put all of the DTOs in its own project as well, even though they don't represent a layer, but are instead passed through the layers.
The main benefit I remember reading about was being able to version each jar separately.
Oh, and BTW, each layer actually has two jars, one for the interfaces that the layer above uses, and another for the implementation(s).

CPD / PMD between projects?

I am rephrasing this question to make it a little more straightforward and easy to understand, hopefully.
I have roughly 30 components (internal) that go into a single web application. That means 30 different projects with their own separate POM. I use inheritance quite a bit in my POMs so one of the things they inherit is a PMD/CPD configuration to prevent code duplication.
Even though I have CPD/PMD running, it only detects duplicate code within the same project. I would like it to detect in any of my projects if there is code shared among the projects that can be refactored out. Moreover, I was looking for something that could (using the same concept/pattern) verify that no code is shared between other open source dependencies.
It would be CPD/PMD, except it would operate on the source jars. This task would consume a large amount of memory if you scan all projects and their dependencies for duplication. Right now, I would just like to apply that to internal projects. If it works, then it would be relatively easy/straightforward to scale that out.
Walter
I'm not sure I got everything but...
I'd create an aggregating module with all projects as dependencies, use the maven-dependency-plugin and its unpack-dependencies mojo to get the sources jar of every dependency (the mojo can take a classifier as a parameter) and unpack them (maybe in target/generated-sources/java; the maven build helper plugin may help here), and finally run pmd:cpd on the whole source base.
This may need some tweaking, I didn't test this at all.
It sounds like you want to find duplicate code anywhere in your 30 projects. I can't speak for PMD; I assume you tell it to make one giant project containing all the source files from the union of the projects. But yes, this would take a lot of RAM and CPU.
Another tool that does this is the Java CloneDR. The CloneDR finds duplicate code whether it is exactly the same or close (e.g., a few edits), regardless of source code layout or intervening comments. It is pretty easy to set it up to process all the files in your set of projects.
Just run PMD:CPD as a stand-alone program. All it needs is a directory, and it will recurse. At least, it did for me. I moved all my source to one directory and ran the CPD gui from the batch file distributed with PMD-4.2.5 .
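To make the "pool all the source and look for repeats" idea concrete, here is a toy sketch. It is NOT how CPD works internally (CPD compares token sequences). This version just slides a window of N trimmed, non-blank lines over every file and reports windows that turn up in more than one project. All names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy line-based detector, only meant to show the cross-project idea.
final class CrossProjectDuplicates {
    private CrossProjectDuplicates() {}

    /**
     * sources maps project name -> (file name -> file content).
     * Returns every window of windowSize consecutive non-blank lines that
     * occurs in at least two projects, with the projects containing it.
     */
    static Map<String, Set<String>> find(Map<String, Map<String, String>> sources, int windowSize) {
        Map<String, Set<String>> projectsByWindow = new TreeMap<>();
        sources.forEach((project, files) -> files.forEach((file, content) -> {
            // Normalize: trim each line and drop blank ones so layout does not matter.
            List<String> lines = new ArrayList<>();
            for (String line : content.split("\n")) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    lines.add(trimmed);
                }
            }
            for (int i = 0; i + windowSize <= lines.size(); i++) {
                String window = String.join("\n", lines.subList(i, i + windowSize));
                projectsByWindow.computeIfAbsent(window, key -> new TreeSet<>()).add(project);
            }
        }));
        // Keep only the windows shared between different projects.
        projectsByWindow.values().removeIf(projects -> projects.size() < 2);
        return projectsByWindow;
    }
}
```

Memory grows with the total number of windows, which is the same scaling concern raised in the question about scanning all projects at once.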
You can perhaps take a look at sonar :
Sonar-CPD engine that is much more scalable and can detect cross-projects duplications.
You can try Lizard for Python.
It doesn't work on source jars, though.
"Code Duplicate Detector
lizard -Eduplicate {path to your code}"
https://pypi.org/project/lizard/
PMD/CPD provides more granularity since it allows the user to specify the number of tokens before a block of code is flagged as duplicate.
https://pmd.github.io/latest/pmd_userdocs_cpd.html#cli-options-reference

How to automate a build of a Java class and all the classes it depends on?

I guess this is kind of a follow-on to question 1522329.
That question talked about getting a list of all classes used at runtime via the java -verbose:class option.
What I'm interested in is automating the build of a JAR file which contains my class(es), and all other classes they rely on. Typically, this would be where I am using code from some third party open source product's "client logic" but they haven't provided a clean set of client API objects. Their complete set of code goes server-side, but I only need the necessary client bits.
This would seem a common issue but I haven't seen anything (e.g. in Eclipse) which helps with this. Am I missing something?
Of course I can still do it manually by: biting the bullet and including all the third-party code in a massive JAR (offending my purist sensibilities) / source walkthrough / trial and error / -verbose:class type stuff (but the latter wouldn't work where, say, my code runs as part of a J2EE servlet, and thus I only want to see this for a given Tomcat webapp and, ideally, only for classes related to my classes therein).
I would recommend using a build system such as Ant or Maven. Maven is designed with Java in mind, and is what I use pretty much exclusively. You can even have Maven assemble (using the assembly plugin) all of the dependent classes into one large jar file, so you don't have to worry about dependencies.
http://maven.apache.org/
Edit:
Regarding the servlet, you can also define which dependencies you want packaged up with your jar, and if you are making a stand alone application you can have the jar tool make an executable jar.
note: yes, I am a bit of a Maven advocate, as it has made the project I work on much easier. No I do not work on the project personally. :)
Take a look at ProGuard.
ProGuard is a free Java class file shrinker, optimizer, obfuscator, and preverifier. It detects and removes unused classes, fields, methods, and attributes. It optimizes bytecode and removes unused instructions. It renames the remaining classes, fields, and methods using short meaningless names. Finally, it preverifies the processed code for Java 6 or for Java Micro Edition.
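Conceptually, the "detects and removes unused classes" part is a reachability walk: start from the classes you call directly and keep following dependencies until nothing new turns up. The sketch below is my own illustration of that idea, not ProGuard's code. It assumes you already have a map of direct dependencies per class, and the class names in the usage note are invented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustration only; real shrinkers analyse bytecode to build the dependency map.
final class ReachableClasses {
    private ReachableClasses() {}

    /** Returns the entry points plus everything they depend on, directly or indirectly. */
    static Set<String> from(Set<String> entryPoints, Map<String, Set<String>> directDeps) {
        Set<String> reached = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>(entryPoints);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (!reached.add(current)) {
                continue; // already visited, which also guards against cycles
            }
            pending.addAll(directDeps.getOrDefault(current, Set.of()));
        }
        return reached;
    }
}
```

For example, with app.Main depending on lib.Client and lib.Client depending on lib.Codec, starting from app.Main yields those three classes and leaves out a lib.Server that nothing on the client side refers to.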
What you want is not only to include the classes you rely on, but also the classes that those classes rely on. And so on, and so forth.
So that's not really a build problem, but more a dependency one. To answer your question, you can either solve this with Maven (apparently) or Ant + Ivy.
I work with Ivy and I sometimes build "ueber-jar" using the zipgroupfileset functionality of the Ant Jar task. Not very elegant would say some, but it's done in 10 seconds :-)
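For anyone curious what such an "ueber-jar" amounts to, here is a rough plain-Java sketch of merging several jars into one. It is not what the Ant task does internally. In particular, keeping the first copy of a duplicate entry is my own assumption, and the manifest and any signature files are copied like ordinary entries, which a real build tool handles more carefully. The names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Rough sketch, works on in-memory jars to stay self-contained.
final class UberJar {
    private UberJar() {}

    /** Copies the entries of all jars into one archive; the first copy of a name wins. */
    static byte[] merge(List<byte[]> jars) {
        Set<String> seen = new HashSet<>();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(out)) {
            for (byte[] jar : jars) {
                try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(jar))) {
                    for (ZipEntry entry; (entry = in.getNextEntry()) != null; ) {
                        if (!seen.add(entry.getName())) {
                            continue; // duplicate name, keep the earlier one
                        }
                        zipOut.putNextEntry(new ZipEntry(entry.getName()));
                        in.transferTo(zipOut); // reads to the end of the current entry
                        zipOut.closeEntry();
                    }
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    /** Test helper: builds a small archive from entry name -> text content. */
    static byte[] zip(Map<String, String> entries) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(out)) {
            for (Map.Entry<String, String> entry : entries.entrySet()) {
                zipOut.putNextEntry(new ZipEntry(entry.getKey()));
                zipOut.write(entry.getValue().getBytes(StandardCharsets.UTF_8));
                zipOut.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return out.toByteArray();
    }

    /** Lists the entry names of an archive in stored order. */
    static List<String> entryNames(byte[] jar) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(jar))) {
            for (ZipEntry entry; (entry = in.getNextEntry()) != null; ) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }
}
```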

Organize small utilities functions

After years of programming, we all have a set of small functions used as helper utilities that we wish came built in, so we could use them in any project and have them taken care of by more people (tested and optimized).
I have quite a collection of these functions. I wonder how you guys organize them? Do you have any tips?
This is how I do it. I put them in a separate project (an Eclipse project), let's say "MyUtils", which is referred to by other projects. This works, but because the utils collection is getting bigger and bigger, it is sometimes kind of weird that the utils are bigger than the project code (for small projects). And to ship them in a JAR, you have to select them all by hand (or include them all). Is there a better way?
Also, as Java requires all functions to be in a class, I have a ton of static functions (those that do not fit in OOP), for example a function to read a text file given a file name. Like this:
package nawaman.myutil;
public class UText {
    public static String ReadTextFile(String pFileName) {
        ...
    }
    public static String[] ReadLines_fromFile(String pFileName) {
        ...
    }
    public static String ReadLine_fromFile(String pFileName, int pLineNumber) {
        ...
    }
    ...
}
So when I need one of them, all the functions get included, even though they are not used.
Is there a better way to do this?
I use Eclipse on Linux, in case there is a special technique for it, but feel free to share techniques for other tools too.
I treat such utility classes just like other components external to the software that I develop:
For each component I create a Eclipse project and build it to a jar.
Classes are grouped logically in packages, e.g. [domain].util.net, [domain].util.text etc.
In a project I include the dependencies I need. Maven can help you here.
You write that utility classes have a lot of static methods. That's something I don't use a lot. For example the text functions you show can be refactored to a class or set of classes that extend or implement classes and interfaces from the collections framework. That makes it easier to integrate my code with other libraries.
This works, but because the utils collection is getting bigger and bigger, it is sometimes kind of weird that the utils are bigger than the project code (for small projects). And to ship them in a JAR, you have to select them all by hand (or include them all). Is there a better way?
For my projects I use javac to select all the classes from my util libraries. For this I compile all classes from my project to an empty output directory. javac automatically resolves the dependencies on the util libraries because I added the util library paths as source paths. Now I can create a jar that contains all classes of my project and only the needed classes of the util libraries.
Also, as Java requires all functions to be in a class, I have a ton of static functions (those that do not fit in OOP), for example a function to read a text file given a file name.
I do it the same way. But I try to have a lot of small util classes instead of a few big ones, so that I don't have to include tons of unneeded methods in my jars.
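As a sketch of what one such small util class might look like, here is a narrow class that only deals with reading text files. TextFiles and its method names are hypothetical, it is not the asker's UText, and assuming UTF-8 is my own choice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// One narrow utility class: reading text files, nothing else.
final class TextFiles {
    private TextFiles() {} // static helpers only, no instances

    /** Reads the whole file as one string, assuming UTF-8. */
    static String readText(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the file as a list of lines, assuming UTF-8. */
    static List<String> readLines(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the class has no dependencies on other utilities, pulling it into a jar through the source path brings in this one class and nothing else.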
My "utilities" have their own package namespace and SVN repository. They are, in essence my own libraries: distinct projects which may be pulled in, shared, tagged, updated, whatever.
The organization used within each of these "libraries" depends on the scope and function in question.
Because I disagree with the structure being a slave to some potential class/JAR output:
If you are concerned about "method bloat" in the classes and/or JARs, please use an automated tool to combat this. ProGuard is just one example and, while it can obfuscate, it can work equally well at just "dead code elimination".
Split your utils module into smaller subprojects. Use Maven or another build system to track versions of all your util modules. They are crucial to your systems, because I think they are used in almost all your projects. Use tools like Findbugs or PMD to measure the quality of your code.
Every project needs to know which version of the utils module it is using. In my opinion it is unacceptable to add some loosely coupled util classes to the binaries/sources of one of your 'non-utils' projects.
Please compare your classes with common libraries like Apache Commons. I assume that a lot of your utility code is similar. Consider rewriting your static methods, because they obstruct testing (I'm sure that Findbugs will be complaining a lot too).
To sum up: creating a utils library is hard and a big responsibility, so the requirements for code quality are very high. I hope that my advice will help.
You should be very careful with removing classes after compilation: you may end up in a class-not-found situation at runtime. If you never use reflection or Class.forName() you should be safe, but those introduce runtime dependencies which the compiler cannot help you with (like it can with "new").
Remember: classes that are not used do not use memory in the running program, they only use bytes on disk.
Personally I've ended up saying that disk space is cheap, and the risk of accidentally removing a class definition that is used, causing a runtime break, is not worth it to me. So I say: all code used for compilation must be shipped.
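A tiny illustration of why reflection is the risky case: the class name below is just a string, so neither the compiler nor a dependency walk over "new" expressions can see it. ReflectiveLoading and the missing class name are invented for the example.

```java
// Demonstrates a dependency that only exists at runtime.
final class ReflectiveLoading {
    private ReflectiveLoading() {}

    /** Returns true if the named class can be found and loaded at runtime. */
    static boolean isLoadable(String className) {
        try {
            // The compiler only sees a String here, not a reference to the class.
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLoadable("java.util.ArrayList"));
        // Hypothetical class that a shrinker removed from the jar:
        System.out.println(isLoadable("com.example.util.RemovedByShrinker"));
    }
}
```

If the second name had been written as an ordinary "new" expression, the missing class would have been a compile error instead of a surprise at runtime.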
I don't use Eclipse, but in Visual Studio you can add a reference to file without it being physically moved or copied. This allows you to define a file in the root of your source control that all of your projects can reference without it being included in every project or having to deal with the copying problem. With this kind of solution you can intelligently split your util methods into different files and selectively include them based on what individual projects need. Also you can get rid of the extra .jar.
That said, I have no idea if Eclipse supports this kind of file referencing, but it might be worthwhile to look.
