JQAssistant with multiple projects and builds


I have JQAssistant scanning my project(s) and can query each of the projects. The documentation cites a Team Server ability where all projects/builds are stored in a central Neo4j db.
I cannot find any documentation, though, on how this would be handled or what happens with multiple builds. Nodes do not seem to be tagged with a build number, nor with the project name, so it appears to be one big lump.
Is there an easy way to tag everything on the way in with projectName and BuildNumber, or am I missing something? I assume I could tag everything once JQAssistant has run, tagging every node that is missing these tags, but then I lose parallelism, and it seems too hacky.
This would also help with pruning data from old builds to avoid too much build-up.
Any help much appreciated,

There's a little misunderstanding: the idea behind the team instance is to have a Neo4j instance per project with a single (i.e. latest) snapshot of the graph (usually filled by a nightly CI run). So there's currently no build (identifiable by a date, number, etc.) in the data, but it could be an interesting feature.
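For anyone who still wants the tag-after-scan workaround from the question, here is a minimal sketch. It only builds two Cypher statements: one stamps every node that has no projectName yet with the project name and build number, and one deletes a project's nodes from builds older than a cutoff. The property names projectName and buildNumber and the class BuildTagging are hypothetical, not part of JQAssistant. Executing the statements (for example with the Neo4j Java driver's session.run(cypher, parameters)) is not shown. As the question notes, this only works if scans against the same database do not run in parallel, and nodes shared by several scans keep whichever tag they received first.

```java
import java.util.Map;

// Hypothetical helper for the tag-after-scan workaround; projectName and buildNumber
// are assumed property names, not JQAssistant conventions.
final class BuildTagging {

    /** A Cypher statement plus its named parameters, ready to hand to a Neo4j session. */
    static final class Statement {
        final String cypher;
        final Map<String, Object> parameters;

        Statement(String cypher, Map<String, Object> parameters) {
            this.cypher = cypher;
            this.parameters = parameters;
        }
    }

    /** Stamps every node that has no projectName yet, i.e. the nodes added by the latest scan. */
    static Statement tagUntaggedNodes(String projectName, long buildNumber) {
        return new Statement(
            "MATCH (n) WHERE n.projectName IS NULL "
                + "SET n.projectName = $projectName, n.buildNumber = $buildNumber",
            Map.of("projectName", projectName, "buildNumber", buildNumber));
    }

    /** Deletes the project's nodes (and their relationships) from builds older than the cutoff. */
    static Statement pruneOldBuilds(String projectName, long oldestBuildToKeep) {
        return new Statement(
            "MATCH (n) WHERE n.projectName = $projectName "
                + "AND n.buildNumber < $oldestBuildToKeep DETACH DELETE n",
            Map.of("projectName", projectName, "oldestBuildToKeep", oldestBuildToKeep));
    }

    public static void main(String[] args) {
        Statement tag = tagUntaggedNodes("my-project", 42L);
        System.out.println(tag.cypher);
        System.out.println(tag.parameters.get("buildNumber"));
    }
}
```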

Related

Nutch: What version of Nutch + Cassandra actually works?

I'm trying to do some crawling with Nutch and I'd like to test out Cassandra as a backend. However, with the latest version of Nutch and its dependencies, Cassandra throws a variety of errors as you move through the inject, generate, fetch, etc. process.
The errors are all related to actual problems in code, not out of memory or configuration. I've fixed some of them by modifying code within gora-cassandra, but it's still not functional.
My question is: does a working combination of these two projects exist? By working I mean you can run through inject, generate, fetch, parse, and updatedb on at least a small set of URLs without errors.
Here's an example of one of the classes giving an error during fetch:
java.lang.NullPointerException
at org.apache.gora.cassandra.query.CassandraSuperColumn.getUnionIndex
I have used HBase as the backend and that just works, although HBase itself is a monster to manage, which is why I'd like to test out Cassandra. However, I'm about to give up on this, as I don't think I should have to modify gora-cassandra code just to get a basic example to run.
Thanks

According to this thread, which is about 3 months old, it's just broken: http://lucene.472066.n3.nabble.com/Re-user-Digest-3-Jun-2017-19-27-20-0000-Issue-2758-td4339060.html
It's unclear why backends that do not work are even documented.
HBase is most widely used, followed by MongoDB... on the other end of the spectrum, Cassandra is least used and broken. It has not been maintained for quite some time... and yes, this is reflected by the use of Super Columns. We are currently rewriting the backend as part of a GSoC project.
I would agree with the person making the original statement: it's unclear why backends that do not work are even documented.
I'm really tired of this project and its lack of usable documentation.

Who is using my maven artifact?

I have a system consisting of multiple web applications (WAR) and libraries (JAR). All of them use Maven and are under my control (source code, built artifacts in Nexus, ...). Let's say that application A uses library L1 directly and L2 indirectly (L2 is used by L1). I can easily check the dependency tree top-down from the application, using Maven's dependency:tree or graph:project plugins. But how can I check who is using my library? In my example, I want to know whether A is the only application (or library) using L1, and that L2 is used by L1 and by some other application, let's say B. Is there any plugin for Maven or Nexus, or should I try to write a script for that? What are your suggestions?

If you wish to achieve this at the repository level, Apache Archiva has a "used by" feature listed under project information.
This is similar to what mvnrepository.com lists in the "used by" section of an artifact description.
Unfortunately, Nexus does not seem to provide an equivalent feature.
Now, I suppose it would be a hassle to maintain yet another repository just for that, but it would probably still be easier than some of the other answers' suggestions, such as writing a Nexus plugin. I believe Archiva can be configured to proxy other repositories.
Update
In fact, there's also a plugin for Nexus to achieve the "used by" feature.

As far as I know, nothing along these lines exists as an open source tool. You could write a Nexus plugin that traverses a repo and checks for usages of your component in all other components by iterating through all the POMs and analyzing them. This would be a rather heavy task to run, though, since it would have to look at all components and parse all the POMs.
In a similar fashion, you could do it on a local repository with some other tool. However, it probably makes more sense to parse the contents of a repository manager than a local repository.
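Once the direct dependencies have been extracted from the POMs, answering "who uses L2, directly or transitively?" is a graph inversion followed by a breadth-first walk. A minimal sketch, assuming the direct edges are already available as a map from each artifact to its direct dependencies (artifacts are abbreviated to the names from the question; in practice they would be groupId:artifactId coordinates). Pulling the edges out of a repository manager is left out, and ReverseDependencies is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: reverse ("used by") lookup over already extracted dependency edges.
final class ReverseDependencies {

    /** Inverts "artifact -> its direct dependencies" into "artifact -> its direct users". */
    static Map<String, Set<String>> invert(Map<String, List<String>> dependencies) {
        Map<String, Set<String>> usedBy = new HashMap<>();
        dependencies.forEach((artifact, deps) -> {
            for (String dep : deps) {
                usedBy.computeIfAbsent(dep, k -> new TreeSet<>()).add(artifact);
            }
        });
        return usedBy;
    }

    /** Breadth-first walk over the inverted graph: all direct and transitive users. */
    static Set<String> allUsersOf(String artifact, Map<String, Set<String>> usedBy) {
        Set<String> seen = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(artifact);
        while (!queue.isEmpty()) {
            for (String user : usedBy.getOrDefault(queue.poll(), Set.of())) {
                if (seen.add(user)) { // the seen set also protects against cycles
                    queue.add(user);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // The example from the question: A uses L1, L1 uses L2, B uses L2.
        Map<String, List<String>> deps = Map.of(
            "A", List.of("L1"),
            "L1", List.of("L2"),
            "B", List.of("L2"));
        Map<String, Set<String>> usedBy = invert(deps);
        System.out.println(usedBy.get("L2"));         // direct users: [B, L1]
        System.out.println(allUsersOf("L2", usedBy)); // all users: [A, B, L1]
    }
}
```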

I don't think there's a Maven way to do this. That being said, there are ways of doing this or similar things. Here are a handful of examples:
Open up your projects in your favorite IDE. For instance, Eclipse will help you with impact analysis at the class level, which most of the time might be good enough
Use a simple "grep" on your source directory. This sounds a bit brusque (as well as stating the obvious), perhaps, but we've used it a lot
Use dependency analysis tools such as Sonargraph or Lattix
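The "grep" suggestion needs no special tooling. As a hedged sketch in Java, the following walks a source tree and lists the .java files that mention a given string, such as the package of L1. SourceGrep and the package name com.example.l1 are placeholders, and the source files are assumed to be UTF-8 encoded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical grep-like scan over a source tree (assumes UTF-8 source files).
final class SourceGrep {

    /** Returns every .java file under root that mentions the given text, e.g. a package name. */
    static List<Path> filesMentioning(Path root, String needle) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.toString().endsWith(".java"))
                .filter(p -> contains(p, needle))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    private static boolean contains(Path file, String needle) {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.anyMatch(line -> line.contains(needle));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java SourceGrep ~/workspace com.example.l1
        filesMentioning(Path.of(args[0]), args[1]).forEach(System.out::println);
    }
}
```

Like grep, this only finds textual mentions, so usages through reflection or configuration files are missed.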

I am not aware of any public libraries for this job, so I wrote a customized app which does it for me.
I work with a distribution that bundles more than 70 artifacts together. Often, after modifying an artifact, I want to ensure the changes are backward compatible (i.e. no compilation errors are introduced in dependent artifacts). To achieve this, it was crucial to know all dependents of the modified artifact.
Hence, I wrote an app that scans all artifacts under a directory (and its subdirectories), extracts their pom.xml, and searches the dependency section of each POM for occurrences of the modified artifact.
(I did this in Java, although a shell or Windows script could do it even more compactly.)
I'll be happy to share the code on GitHub, if that could be of any help.
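An app like the one described above could look roughly like this minimal sketch (not the author's code): it walks a directory tree, parses every pom.xml with the JDK's DOM parser, and reports the POMs containing a dependency element whose groupId and artifactId match. PomDependents is a hypothetical name. Versions, ${...} property placeholders, and inherited or transitive dependencies are deliberately ignored, and dependency elements under dependencyManagement or plugins are counted as well.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch: which pom.xml files under a directory declare groupId:artifactId?
final class PomDependents {

    static List<Path> dependentsOf(Path root, String groupId, String artifactId) throws Exception {
        List<Path> poms;
        try (Stream<Path> paths = Files.walk(root)) {
            poms = paths.filter(p -> p.endsWith("pom.xml") && Files.isRegularFile(p))
                        .sorted()
                        .collect(Collectors.toList());
        }
        List<Path> dependents = new ArrayList<>();
        for (Path pom : poms) {
            if (declaresDependency(pom, groupId, artifactId)) {
                dependents.add(pom);
            }
        }
        return dependents;
    }

    private static boolean declaresDependency(Path pom, String groupId, String artifactId)
            throws Exception {
        // The factory is not namespace-aware by default, so plain tag names match
        // even though POMs declare the http://maven.apache.org/POM/4.0.0 namespace.
        Element project = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().parse(pom.toFile()).getDocumentElement();
        // Every <dependency> element, including those under dependencyManagement and plugins.
        NodeList dependencies = project.getElementsByTagName("dependency");
        for (int i = 0; i < dependencies.getLength(); i++) {
            Element dependency = (Element) dependencies.item(i);
            if (groupId.equals(childText(dependency, "groupId"))
                    && artifactId.equals(childText(dependency, "artifactId"))) {
                return true;
            }
        }
        return false;
    }

    /** Trimmed text of the first direct child element with the given name, or null. */
    private static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: java PomDependents /path/to/artifacts com.example l1
        dependentsOf(Path.of(args[0]), args[1], args[2]).forEach(System.out::println);
    }
}
```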

One approach that might suit your needs is to create a master POM that aggregates all your Maven projects. Then run the following command on the master POM:
mvn dependency:tree -DoutputType=graphml -DoutputFile=dependency.graphml
Open the generated file in yEd.
I used the instructions found here:
http://www.summa-tech.com/blog/2011/04/12/a-visual-maven-dependency-tree-view/

More interesting is probably: what would you do with this information? Inform the developers of A not to use library L1 or L2 anymore, because it has a critical bug?
In my opinion, you should be able to create a blacklist of dependencies/parents/plugins on your repository manager. Once a project tries to deploy/upload itself with a blacklisted artifact, the upload should fail. I'm saying uploading and not downloading, because blocking downloads might break a lot of projects. As far as I know, this is not yet available in any repository manager.
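Since no repository manager offers this, the check itself can only be sketched. Assuming the coordinates a project declares (dependencies, parent, plugins) are available as groupId:artifactId:version strings, a blacklist entry can name either one exact version or every version of an artifact (groupId:artifactId). DeployBlacklist and its methods are hypothetical, and wiring them into an upload hook is not shown.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical upload-time check; coordinates are assumed to be groupId:artifactId:version strings.
final class DeployBlacklist {

    /** An entry may name an exact version (g:a:v) or every version of an artifact (g:a). */
    static boolean isBlacklisted(String coordinate, Set<String> blacklist) {
        if (blacklist.contains(coordinate)) {
            return true;
        }
        String[] parts = coordinate.split(":");
        return parts.length >= 2 && blacklist.contains(parts[0] + ":" + parts[1]);
    }

    /** All declared coordinates (dependencies, parent, plugins) that hit the blacklist. */
    static List<String> violations(List<String> declared, Set<String> blacklist) {
        return declared.stream()
            .filter(c -> isBlacklisted(c, blacklist))
            .collect(Collectors.toList());
    }

    /** Rejects the upload of an artifact that still references blacklisted artifacts. */
    static void checkUpload(String artifact, List<String> declared, Set<String> blacklist) {
        List<String> bad = violations(declared, blacklist);
        if (!bad.isEmpty()) {
            throw new IllegalStateException(artifact + " references blacklisted artifacts: " + bad);
        }
    }
}
```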

One way to approach this problem is outside Java itself: write an OS-level monitoring script that tracks each fopen() of the jar file in question! Assuming this is a corporate environment, you might have to wait a few weeks (!) for all the using processes to access the library at least once!
On Windows, you might use Sysinternals Process Monitor to do this:
http://technet.microsoft.com/en-us/sysinternals/bb896645
On Unix variants, you would use DTrace or strace.

IMHO, and also from my experience, looking for a technical solution to such a problem is often overkill. If the reason you want to know who is using your artifact (library) is to ensure backward compatibility when you change it, or something similar, I think it is best done by communicating your changes through traditional channels and encouraging other teams that might be using your library to talk about it (project blogs, wiki, email, a well-known location where documentation is kept, jour fixe meetings, etc.).
In theory, you could write a script that crawls through each project in your repository, parses its Maven pom.xml (assuming they all use Maven), and checks whether it declares a dependency on your artifact. If all the projects in your organization follow the standard Maven structure, it should be easy to write such a script (though if any of those projects depend on your artifact via a transitive dependency, things get a bit more tricky).

Multi-component versioning/building best practices

I have a Java project, built with Maven, that aggregates several components, each one in its own Maven project. Any one of these components may evolve separately.
The structure of my project can be described as follows:
my-main-project that depends on:
my-component-1
my-component-2
etc.
Currently, all pom.xml files use "snapshot" versions, so they all use the "latest" version available in my repository.
But once I send a release version to my customer, I'm supposed to freeze the versions and make a TAG (or equivalent) in my source-control, so I can restore a previous state in case of maintenance.
So, my question is: should I change all pom.xml files before each release, give version numbers to the components, and tie everything together with these dependency versions? Also, if I have many components (my project currently has 30+ small subcomponents), would I have to renumber/reversion each one before each release? When a single component evolves (due to a bug fix or enhancement), I must increase its version so that the changes do not affect pre-existing releases, right?
How do people using Maven generally handle this many-component versioning case?
Of course, I could just rely on my version-control tags to restore to a previous point-in-time and just tag every component on each release, but I don't like this approach, since the dependency versioning (with maven) gives me much more control and visibility about what is packaged, and relations of (broken-)compatibility and many more.

General Considerations
You may consider the relations between your components.
Are they really independent of each other? Or is there some kind of relationship, such as a common lifecycle?
If you find some relationship between them, consider using Maven multi-modules: http://www.sonatype.com/books/mvnex-book/reference/multimodule.html. In a few words, you will have a parent with one version and some modules (some JARs, a bit like Spring and its submodules). This will help you reduce the version-management overhead.
You may consider using maven-release-plugin. It will help you automatically tag, build, and deploy your modules, making versioning and the links with your SCM and repository easier to handle.
Moreover, combined with multi-module builds, it would help you drastically!
There are a lot of topics on Stack Overflow dealing with this.
I don't know whether you already know all this. I could explain it a lot further if you want, but you may have enough elements to search by yourself if you don't.
Straight Answers
So, my question is: should I change all pom.xml files before each release, give version numbers to the components, and tie everything together with these dependency versions?
Yes, you should. In Application Lifecycle Management, tracking changes is REALLY important. So, as you can imagine, and as you point out, you really should build and tag each of your components. It can be painful, but with maven-release-plugin and multi-module builds (ideally with a continuous integration platform) it gets easier.
would I have to renumber/reversion each one before each release?
For exactly the same reasons : yes !
must I increase its version so that the changes do not affect pre-existing releases, right?
Yes, you should too. Assuming you choose a common versioning scheme such as MAJOR.minor.correction, the first number indicates compatibility breaks. A minor version could bring some breaks, but should not. Corrections should NEVER affect compatibility.
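As a small worked example of that convention, here is a hedged sketch of how each kind of change moves a MAJOR.minor.correction version, with the lower-order numbers reset to zero. The Versions class is hypothetical, and qualifiers such as -SNAPSHOT are not handled.

```java
import java.util.Arrays;

// Hypothetical helper for a MAJOR.minor.correction scheme (no qualifiers such as -SNAPSHOT).
final class Versions {

    enum Change { MAJOR, MINOR, CORRECTION }

    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected MAJOR.minor.correction: " + version);
        }
        return Arrays.stream(parts).mapToInt(Integer::parseInt).toArray();
    }

    /** Next version for a given kind of change; lower-order numbers reset to 0. */
    static String bump(String version, Change change) {
        int[] v = parse(version);
        switch (change) {
            case MAJOR:
                return (v[0] + 1) + ".0.0";
            case MINOR:
                return v[0] + "." + (v[1] + 1) + ".0";
            default: // CORRECTION
                return v[0] + "." + v[1] + "." + (v[2] + 1);
        }
    }

    /** Under this convention, only a higher MAJOR number announces a compatibility break. */
    static boolean announcesBreak(String from, String to) {
        return parse(to)[0] > parse(from)[0];
    }
}
```

So a bug fix to a component released as 2.3.1 becomes 2.3.2, leaving 2.3.1 untouched for existing releases, while an incompatible change becomes 3.0.0.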
How do people using Maven generally handle this many-component versioning case?
I cannot speak for everyone, but the release plugin and multi-module builds I mentioned above are considered best practices. If you want to go a little further, you could use a more powerful SCM (ClearCase, Perforce, ...), but Maven integration with those is weaker and less well documented, and the community provides fewer examples than for SVN or Git.

Maven Release Plugin
If you are using a multi-module pom.xml, you should be able to run mvn release:prepare release:perform -DautoVersionSubmodules=true and have it do a "release" build of all your modules, remove the -SNAPSHOT versions, and upload the artifacts to your repository. That is exactly what the release plugin and its workflow exist to do.
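For a simple numeric version, the versions this workflow ends up with can be illustrated as below: the release version drops -SNAPSHOT, and the next development version increments the last number and adds -SNAPSHOT again. This is only an approximation of what release:prepare proposes by default (the plugin handles many more version formats); with autoVersionSubmodules, every module receives the parent's pair of versions instead of prompting for each module. ReleaseVersions is a hypothetical name.

```java
// Hypothetical illustration of the default release / next-development version pair
// for plain dotted numeric versions; the real plugin supports many more formats.
final class ReleaseVersions {

    private static final String SNAPSHOT = "-SNAPSHOT";

    /** "1.4.2-SNAPSHOT" becomes "1.4.2". */
    static String releaseVersion(String developmentVersion) {
        if (!developmentVersion.endsWith(SNAPSHOT)) {
            throw new IllegalArgumentException("Not a snapshot version: " + developmentVersion);
        }
        return developmentVersion.substring(0, developmentVersion.length() - SNAPSHOT.length());
    }

    /** "1.4.2" becomes "1.4.3-SNAPSHOT": the last dot-separated number is incremented. */
    static String nextDevelopmentVersion(String releaseVersion) {
        int lastDot = releaseVersion.lastIndexOf('.'); // -1 for a single-number version
        String head = releaseVersion.substring(0, lastDot + 1);
        int last = Integer.parseInt(releaseVersion.substring(lastDot + 1));
        return head + (last + 1) + SNAPSHOT;
    }

    public static void main(String[] args) {
        String release = releaseVersion("1.4.2-SNAPSHOT");
        System.out.println(release + " then " + nextDevelopmentVersion(release));
    }
}
```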

question about application instance management

I am currently working on a rather large project with a team distributed across the United States. Developers regularly commit code to the source repository. We have the following application builds (all are managed by an application; no manual processes):
Continuous Integration: a monitor checks whether the code repository has been updated; if so, it does a build and runs our unit test suite. On errors, the team receives email notifications
Daily Build: Developers use this build to verify their bug fixes or new code on an actual application server, and if "things" succeed, the developer may resolve the task.
Weekly Build: Testers verify the resolved issue queue on this build. It is a more stable testing environment.
Current Release build: used for demoing and an open testing platform for potential new users.
Each build refreshes the database associated with it. This cleans the data and verifies that any database changes that go along with the new code are pulled in. One concern I hear from our testers is that we need to pre-populate the weekly build database with some expected testing data, as opposed to the more generic data that developers work with. This seems like a legitimate concern/need and is something we are working on.
I am putting what we are doing out there to see whether the SO community sees any gaps or has any concerns. Things seem to be working well, but it FEELS like it could be better. Your thoughts?
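The continuous integration monitor in the first bullet boils down to a polling step like the minimal sketch below: ask the repository for its head revision, build and run the unit tests only when it has changed, and email the team on failure. The three interfaces are hypothetical stand-ins for the SCM, the build tool, and the mail system; a scheduler would call pollOnce periodically.

```java
import java.util.Objects;

// Hypothetical sketch of one polling step of a CI monitor.
final class CiMonitor {

    interface Repository { String headRevision(); }                   // stand-in for the SCM
    interface BuildRunner { boolean buildAndTest(String revision); }  // true if build + unit tests pass
    interface Notifier { void email(String message); }                // stand-in for team e-mail

    private final Repository repository;
    private final BuildRunner runner;
    private final Notifier notifier;
    private String lastBuiltRevision; // null until the first build

    CiMonitor(Repository repository, BuildRunner runner, Notifier notifier) {
        this.repository = repository;
        this.runner = runner;
        this.notifier = notifier;
    }

    /** Builds only if something was committed since the last build; returns true if it built. */
    boolean pollOnce() {
        String head = repository.headRevision();
        if (Objects.equals(head, lastBuiltRevision)) {
            return false; // nothing new since the last build
        }
        lastBuiltRevision = head;
        if (!runner.buildAndTest(head)) {
            notifier.email("Build or unit tests failed at revision " + head);
        }
        return true;
    }
}
```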

An additional step that is followed is that once the release build passes tests (say smoke test) then it is qualified as a good build (say a golden build) and you use some kind of labeling mechanism to label all the artefacts (code, install scripts, makefiles, installable etc.) that went into the creation of the golden image. The golden build may become a release candidate later or not.
You are probably already doing this, but since you didn't mention it, I added what I have observed.

This is pretty much the way we do it.
The testers' own DB is only reset on demand. If we refreshed it automatically every week, then:
we would lose the references to bug symptoms; if a bug is found but a developer only looks at it a few weeks later (or simply after the weekend), then all evidence of that bug may have disappeared
testers might be in the middle of a big test case (taking more than 1 day, for instance)
we have tons of unit tests running against a DB that is refreshed (automatically, of course) each time an integration build is executed
regards,
Stijn

I think you have a good, comprehensive process, as long as it fits in with when your customers want to see updates. One possible gap I can see is that it looks like you wouldn't be able to get a critical customer bug fix into production in less than a week, since your test builds are weekly and then you'd need time for the testers to verify the fix.
If you fancy thinking about things a different way, have a look at this article on continuous deployment - it can be a bit hard to accept the concept at first, but it definitely has some potential.

Build management/ Continuous Integration best practices

How does your team handle Builds?
We use Cruise Control, but (due to lack of knowledge) we are facing some problems with:
- Code freeze in SVN
- Build management
Specifically, how do you make available a particular release when code is constantly being checked in?
Generally, can you discuss what best practices you use in release management?

I'm positively astonished that this isn't a duplicate, but I can't find another one.
Okay, here's the deal. These are two separate but related questions.
For build management, the essential point is that you should have an automatic, repeatable build that rebuilds the entire collection of software from scratch and goes all the way to your deliverable configuration. In other words, you should effectively build a release candidate every time. Many projects don't really do this, but I've seen it burn people (read "been burned by it") too many times.
Continuous integration says that this build process should be repeated every time there is a significant change event to the code (like a check-in), if at all possible. I've done several projects in which this turned into a build every night because the code was large enough that it took several hours to build, but the ideal is to set up your build process so that some automatic mechanism (like an Ant script or a makefile) only rebuilds the pieces affected by a change.
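Rebuilding "only the pieces affected by a change" means rebuilding the changed modules plus everything that transitively depends on them, in an order where dependencies come first. A minimal sketch, assuming the module graph is given as a map from each module to the modules it depends on: a worklist finds the affected modules, and Kahn's algorithm orders them. IncrementalBuild and the module names below are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: which modules must be rebuilt after a change, and in what order?
final class IncrementalBuild {

    /**
     * dependsOn maps each module to the modules it depends on. Returns the changed modules
     * plus everything that transitively depends on them, dependencies first.
     */
    static List<String> modulesToRebuild(Map<String, Set<String>> dependsOn, Set<String> changed) {
        // Invert the graph: module -> modules that depend on it directly.
        Map<String, Set<String>> dependents = new HashMap<>();
        dependsOn.forEach((module, deps) -> deps.forEach(
            dep -> dependents.computeIfAbsent(dep, k -> new TreeSet<>()).add(module)));

        // Worklist: everything reachable from the changed modules in the inverted graph.
        Set<String> affected = new TreeSet<>(changed);
        Deque<String> work = new ArrayDeque<>(changed);
        while (!work.isEmpty()) {
            for (String d : dependents.getOrDefault(work.poll(), Set.of())) {
                if (affected.add(d)) {
                    work.add(d);
                }
            }
        }

        // Kahn's algorithm restricted to the affected modules.
        Map<String, Integer> pending = new TreeMap<>();
        for (String m : affected) {
            int count = 0;
            for (String dep : dependsOn.getOrDefault(m, Set.of())) {
                if (affected.contains(dep)) {
                    count++;
                }
            }
            pending.put(m, count);
        }
        Deque<String> ready = new ArrayDeque<>();
        pending.forEach((m, count) -> {
            if (count == 0) {
                ready.add(m);
            }
        });
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String m = ready.poll();
            order.add(m);
            for (String d : dependents.getOrDefault(m, Set.of())) {
                if (affected.contains(d) && pending.merge(d, -1, Integer::sum) == 0) {
                    ready.add(d);
                }
            }
        }
        if (order.size() != affected.size()) {
            throw new IllegalStateException("Dependency cycle among " + affected);
        }
        return order;
    }
}
```

With core, util (depends on core), app (depends on util), tools (depends on core), and docs, a change to util rebuilds util then app, while a change to core rebuilds core, tools, util, and app; docs is never rebuilt.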
You handle the issue of providing a specific release by in some fashion preserving the exact configuration of all affected artifacts for each build, so you can apply your repeatable build process to the exact configuration you had. (That's why it's called "configuration management.") The usual version control tools, like git or subversion, provide ways to identify and name configurations so they can be recovered; in svn, for example, you might construct a tag for a particular build. You simply need to keep a little bit of metadata around so you know which configuration you used.
You might want to read one of the "Pragmatic Version Control" books, and of course the stuff on CI and Cruise Control on Martin Fowler's site is essential.

Look at continuous integration best practices, from Martin Fowler.
Well, I have managed to find a related thread, I participated in, a year ago. You might find it useful, as well.
And here is how we do it.
We are using Cruise Control as our integration tool. We deal only with the trunk, which is the main Subversion repository in our case. We seldom pull out a new branch for new story cards, only when there is a chance of complex conflicts. Normally, we pull out a branch for a version release, create the build from it, and deliver that to our test team. Meanwhile, we continue working in trunk and wait for feedback from the test team. Once everything is tested, we create a tag from the branch, which is logically immutable in our case. So we can release any version to any client at any time if needed. If there are bugs in the release, we don't create a tag; we fix them in the branch. After everything is fixed and approved by the test team, we merge the changes back to trunk and create a new tag from the branch specific to that release.
So the idea is that our branches and tags do not participate in continuous integration directly. Merging branch code back to the trunk automatically makes that code part of CI (Continuous Integration). We normally do only bug fixes for the specific release in branches, so, I believe, they don't really participate in the CI process. Conversely, if for some reason we start doing new story cards in a branch, we don't keep that branch apart for too long; we try to merge it back to trunk as soon as possible.
In short:
We create branches manually when we plan the next release
We create a branch for the release and fix bugs in that branch if needed
After everything is good, we make a tag from that branch, which is logically immutable
Finally, we merge the branch back to trunk if it has any fixes/modifications

Release Management goes well beyond continuous integration.
In your case, you should use Cruise Control to automatically make a tag, which allows developers to go on coding while your incremental build can take place.
If your build is incremental, you can trigger it every x minutes rather than on every commit (if commits are too frequent and your build is too long, a build may not have time to finish before the next one tries to start). The 'x' should be tailored to be longer than a compilation/unit test cycle.
Continuous integration should include automatic launch of unit tests as well.
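Triggering the incremental build every x minutes can be sketched with the JDK's ScheduledExecutorService. scheduleWithFixedDelay measures x from the end of one run to the start of the next, so runs never overlap and a slow build simply pushes the next run back instead of queuing runs back to back. The build step is a hypothetical Runnable standing in for the real build and unit tests. One caveat: if a periodic task throws, the executor suppresses all later runs, so the sketch catches and logs failures.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical periodic trigger for an incremental build.
final class PeriodicBuild {

    /**
     * Runs the build now and then again 'delay' after each run finishes, so runs never
     * overlap and a slow build pushes the next one back.
     */
    static ScheduledFuture<?> schedule(ScheduledExecutorService scheduler, Runnable build,
                                       long delay, TimeUnit unit) {
        Runnable guarded = () -> {
            try {
                build.run();
            } catch (RuntimeException e) {
                // An exception escaping a periodic task would cancel all later runs.
                System.err.println("Build failed: " + e);
            }
        };
        return scheduler.scheduleWithFixedDelay(guarded, 0, delay, unit);
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Hypothetical build step: replace with a call into your incremental build + unit tests.
        schedule(scheduler, () -> System.out.println("incremental build + unit tests"),
                 15, TimeUnit.MINUTES);
    }
}
```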
Beyond that, a full release management process will involve:
a series of deployments on homologation servers
a full cycle of homologation / UAT (User Acceptance Test)
non-regression tests
performance / stress tests
pre-production (and parallel run tests)
before finally releasing into production.
Again "release management" is much more complex than just "continuous integration" ;)

Long story short: Create a branch copied from trunk and checkout/build your release on that branch on the build server.
However, to get to that point in a completely automated fashion using cc.net is not an easy task. I could go into details about our build process if you like, but it's probably too fine grained for this discussion.
I agree with Charlie about having an automatic, repeatable build from scratch. But we don't do everything for the "Continuous" build, only for Nightly, Beta, Weekly or Omega (GA/RTM/Gold) release builds. Simply because some things, like generating documentation, can take a long time, and for the continuous build you want to provide developer with rapid feedback on a build result.
I totally agree with preserving exact configuration, which is why branching a release or tagging is a must. If you have to maintain a release, i.e. you can't just release another copy of trunk, then a branch on release approach is the way to go, but you will need to get comfortable with merging.

You can use Team Foundation Server 2008 and Microsoft Visual Studio Team System to handle your source control, branching, and releases.
