Moving a big legacy Java project to Git

I have a large, well-structured project that my team maintains in SVN. We now want to move to Git, which will later be integrated with a continuous integration server such as TeamCity or Jenkins (yet to be decided).
Looking at the SVN repository, I found that because SVN lets you create tags from anywhere, it has a lot of tags that each contain only a single project.
The codebase is so large that I would rather not import all of it into one Git repo. One approach would be to create a separate repo for each project, or to combine some of the projects into one repo (not sure).
What I want to achieve is to retain all the history and break up the repo so that it becomes easy to manage. (It will also help later when integrating with Jenkins for automated builds on every commit.)
How can I ensure that I retain the history, keep all the existing tags, and move to Git?

git-svn is the obvious solution to help with this migration but, in my opinion, this is better for a one-off migration from SVN to git.
If you want to keep both the SVN and git repositories alive and allow commits to both (keeping them both in sync), I would recommend SubGit.

When we moved from svn to git I found this stackoverflow post very helpful, specifically the second answer. I assume from your description that everything is under a single trunk. That does make it a little more complicated. I have not tried moving individual projects before, but you may be able to do it in one of two ways:
Migrate the entire svn repo, tags, branches, etc. to your local git repo (before you push to the remote). Then break up the projects by following the suggestions in (this) stackoverflow post. This should give you individual repos that you can then push to the remote repos.
You may be able to alter the steps when running the svn to git conversion and specify an individual project, but that seems dangerous and confusing since the tags/branches won't necessarily line up.
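The first approach above (import everything, then split per project) can be sketched as a small Python helper that only assembles the commands. This is an illustration under assumptions, not a tested migration script: the URL and directory names are hypothetical, `--stdlayout` assumes the usual trunk/branches/tags layout (a repository with tags created "from anywhere" may need the `--trunk`, `--branches` and `--tags` options instead), and `git filter-branch` is used here only because it ships with Git.

```python
import shlex
from typing import List


def import_command(svn_url: str, import_dir: str) -> List[str]:
    """Command that imports the whole SVN repository with its history.

    Assumes the standard trunk/branches/tags layout. Note that git-svn
    records SVN branches and tags as remote-tracking refs, so they still
    need to be turned into real Git branches and tags afterwards.
    """
    return ["git", "svn", "clone", "--stdlayout", svn_url, import_dir]


def split_command(repo_copy: str, project_dir: str) -> List[str]:
    """Command that reduces a repository to the history of one subdirectory.

    This rewrites the repository in place, so run it on a copy of the
    full import (one copy per project), never on the only import.
    """
    return ["git", "-C", repo_copy, "filter-branch",
            "--subdirectory-filter", project_dir, "--", "--all"]


if __name__ == "__main__":
    # Hypothetical URL and paths, shown only to make the commands readable.
    for cmd in (import_command("http://svn.example.com/repo", "full-import"),
                split_command("project-a-copy", "project-a")):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the commands have been reviewed against a throwaway copy of the repository.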

Related

Working with maven and multiple git repositories - reducing the pain

We recently migrated from SVN, with most code in a single repo, to git, with most projects in their own repos (about 70 of them). We build about a dozen different apps from this java source. The apps all run on *nix servers. We use maven and nexus to build. Many of us are struggling with developing features when that feature touches more than one repo. Here are a few of the challenges:
The developer has to branch each repo separately - we use the same name for all branches for one feature to make tracking less difficult.
One must update the poms of all repos to point to the updated versions of each repo's artifact. If multiple people are working on the same branch, there can be a lot of merging of others' pom changes. When I commit a change to a repo, the artifact is renamed to "-SNAPSHOT", which means more pom updates.
Changes need to be pushed in the right order or our automated builds will fail, e.g: repo A depends on a change to repo B; if repo A is pushed before repo B is built and deployed, then repo A won't build.
The person reviewing the feature has to look at changes in multiple repos.
When the feature is merged from its branch to, say, master, one has to remember all the repos that were touched.
It looks like switching to a mostly monorepo approach might be best, though there are some drawbacks there:
Building the entire codebase with maven takes a looong time. (Why can't maven be more like make, only building things that have changed or whose dependencies have changed?)
Each push kicks off a big set of builds and many unit tests rather than just one repo's artifact build and test.
The developers who generally work in one or two repos prefer this new multi-repo world and will resist a change back.
I've looked into git submodules and subtrees, which don't seem to solve many of our issues (not sure about Google Repo). Some of us use tools like "mu" to help. It would be sweet if there were a toolkit that would help developers maintain versions in poms and track changes across repos.
Let me know if you have a set of procedures or tools you use to ease development in this kind of environment.
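The push-order problem described in the question (repo A must not be pushed before repo B is built and deployed) is a topological ordering of the repos by dependency. A minimal sketch, assuming Python 3.9+ for `graphlib` and a hand-maintained dependency map; the repo names are hypothetical:

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def push_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return the repos ordered so that every repo comes after its dependencies.

    depends_on maps a repo name to the set of repos it depends on.
    graphlib raises CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Hypothetical repos: app-a needs lib-b, which needs lib-common.
    deps = {"app-a": {"lib-b"}, "lib-b": {"lib-common"}}
    print(push_order(deps))  # ['lib-common', 'lib-b', 'app-a']
```

The same map restricted to the repos touched by one feature branch gives the order in which to push and build them.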
with most projects in their own repos (about 70 of them).
For me this is where the problems start. My vote goes for minimising this number significantly.
If you really don't want a single repo (1 repo gets my vote) then you could separate the code base into n*change_often repos with 1*change_rarely repo. Keeping the n small is important. This way you would avoid rebuilding the bits that change rarely.
Also, even with a single repo you don't need to reference everything by source; you can use binaries for base libraries. When a base library changes, the person making the change could also update all the references in one go so that all projects are up to date.

Several projects sharing same database (Spring JAVA)

I am facing a problem and am not sure what the best way to go is.
I have two git repositories with Spring projects which have to share the same database. (Both are Spring projects with Hibernate.)
One of them is the main project, the so-called MASTER, which should modify all Hibernate entities; the other, which I will call SLAVE, is a secondary project that only needs to read from the same database.
Here is a small illustration of what I have.
The issue appeared when I realized that I need to keep a duplicate of the entities in both master and slave.
I found two ways to go with this issue.
Using git submodules. Where I can have my entities to be an independent submodule.
Building a JAR from entity classes and include it in both projects.
Neither solution meets my requirements:
The submodule solution is not good because whenever I commit anything from MASTER I want SLAVE to track those changes. Please note, I have 3 git branches for both projects: master, staging and production. So each branch should have its own corresponding version of the entities.
The JAR solution will work, but I do not find it nice and solid, as I would have to build the JAR all the time and add it as a dependency to every project.
The development of these projects is done independently from each other.
Please, could you share your opinion on this issue?
I am fairly sure that I am not the only one trying to achieve this.
You should consider publishing your jar to a Maven repository for easier exchange between the projects. You could even host your own, such as Sonatype Nexus: https://www.sonatype.com/nexus-repository-sonatype
Personally, I think that managing the versions can be very annoying when you have multiple projects, especially when you are testing something and have to create a new jar and publish it over and over again. However, your project will be rebuildable and you can control which project/module can use a newer version of your entity dependency.

Transitioning from SVN to Git (git-svn, multiple branches in workspace and no dependency management)!

We are currently trying to pilot the transition from SVN to Git to increase productivity and collaboration within our team.
However, we are facing some issues with transitioning and finding counterparts to what currently works for us. I've been reading up on Git and can't seem to find a specific answer.
Here are some issues:
Our project is composed of several subprojects each built as a project of its own. How do we manage these subprojects with Git? One of the main issues I've encountered is when switching branches, I have to individually switch branches among
I've read about Subproject support as mentioned in https://git.wiki.kernel.org/index.php/SubprojectSupport, but I've also read that this isn't supported by git-svn.
We have multiple SVN branches currently, each representing a release. Most of us have all relevant branches (usually 2-3) checked out in our workspace. Switching branches might be okay if it's fast, but another problem is the configuration of our build paths & etc (considering we don't have any dependency management system in place at our level of development and all are done manually). Is there a way to go around this in Git, either by allowing multiple branches active in a workspace, or through rapid switching?
I'm not sure if there will be any specific correct answer, but pointing me to relevant resources will be helpful as well. Thank you.
Your question is rather broad (or possibly contains multiple questions), but I'll try a general answer:
Our project is composed of several subprojects each built as a project
of its own. How do we manage these subprojects with Git?
Usually you would put these into one Git repo, each in its own subdirectory. You can use multiple repositories, but that only makes sense if the projects are versioned, branched and released independently. Branches are always per repository in Git (unlike in SVN, where you branch a single directory), so the rule of thumb is: what is branched together shares one repo, what is branched separately gets its own repo.
We have multiple SVN branches currently, each representing a release.
Most of us have all relevant branches (usually 2-3) checked out in our
workspace. Switching branches might be okay if it's fast, but another
problem is the configuration of our build paths & etc (considering we
don't have any dependency management system in place at our level of
development and all are done manually). Is there a way to go around
this in Git, either by allowing multiple branches active in a
workspace, or through rapid switching?
You cannot have multiple branches checked out in one working directory (how would that even work?). You can make multiple clones (each with its own working directory), then check out different branches. However, I'm not sure that is the best solution for you.
Switching branches in git is very fast - essentially just the time for the filesystem I/O required to change the files that need changing.
About the build paths: If you switch branches in git, the paths do not change, because the switch happens inside the working directory.
A final note: It looks like you should really look into some kind of dependency management and artifact management. Doing all this with source code only is rather error-prone and difficult.

Who is using my maven artifact?

I have a system consisting of multiple web applications (war) and libraries (jar). All of them are using maven and are under my control (source code, built artifacts in Nexus,...). Let say that application A is using library L1 directly and L2 indirectly (it is used from L1). I can easily check the dependency tree top-down from the application, using maven's dependency:tree or graph:project plugins. But how can I check, who's using my library? From my example, I want to know, whether A is the only application (or library) using L1 and that L2 is used from L1 and from some other application, let say B. Is there any plugin for maven or nexus or should I try to write some script for that? What are your suggestions?
If you wish to achieve this on a repository level, Apache Archiva has a "used by" feature listed under project information.
This is similar to what mvnrepository.com lists under its "used by" section of an artifact description.
Unfortunately, Nexus does not seem to provide an equivalent feature.
Now I suppose it would be a hassle to maintain yet another repository just for that, but it would probably be easier than what some other answers suggest, such as writing a Nexus plugin. I believe Archiva can be configured to proxy other repositories.
Update
In fact, there's also a plugin for Nexus to achieve the "used by" feature.
As far as I know nothing along these lines exists as an open source tool. You could write a Nexus plugin that traverses a repo and checks for usages of your component in all other components by iterating through all the pom's and analyzing them. This would be a rather heavy task to run though since it would have to look at all components and parse all the poms.
In a similar fashion you could do it on a local repository with some other tool. However it probably makes more sense to parse the contents of a repo manager rather than a local repository.
I don't think there's a Maven way to do this. That being said, there are ways of doing this or similar things. Here's a handful of examples:
Open up your projects in your favorite IDE. For instance, Eclipse will help you with impact analysis on a class level, which most of the time might be good enough.
Use a simple "grep" on your source directory. This sounds a bit brusque (as well as stating the obvious), perhaps, but we've used this a lot.
Use dependency analysis tools such as Sonargraph or Lattix
I am not aware of any public libraries for this job, so I wrote a customized app which does it for me.
I work with a distribution which involves more than 70 artifacts bundled together. Many times after modifying an artifact, I want to ensure the changes are backward compatible (i.e. no compilation errors are introduced in dependent artifacts). To achieve this, it was crucial to know all dependents of the modified artifact.
Hence, I wrote an app which scans through all artifacts under a directory (and its subdirectories), extracts their pom.xml and searches the dependency section of each pom for an occurrence of the modified artifact.
(I did this in java although shell/windows script can do this even more compactly.)
I'll be happy to share code on github, if that could be of any help.
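The scan described in this answer can be sketched in a few lines of Python. This is not the author's app, only an illustration of the idea, and it rests on some assumptions: every pom declares the standard Maven POM namespace, only directly declared dependencies matter, and entries under dependencyManagement or plugin dependencies are counted as well because the search matches any `dependencies/dependency` element.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List, Union

# Namespace declared by standard Maven poms (assumed to be present).
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def declares_dependency(pom_xml: Union[str, bytes],
                        group_id: str, artifact_id: str) -> bool:
    """Return True if the pom content lists group_id:artifact_id as a dependency."""
    try:
        root = ET.fromstring(pom_xml)
    except ET.ParseError:
        return False  # treat malformed poms as "no match"
    for dep in root.iterfind(".//m:dependencies/m:dependency", POM_NS):
        found_group = dep.findtext("m:groupId", default="", namespaces=POM_NS)
        found_artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS)
        if found_group.strip() == group_id and found_artifact.strip() == artifact_id:
            return True
    return False


def find_dependents(root_dir: str, group_id: str, artifact_id: str) -> List[Path]:
    """Return the pom.xml files under root_dir that depend on the artifact."""
    return sorted(
        pom for pom in Path(root_dir).rglob("pom.xml")
        if declares_dependency(pom.read_bytes(), group_id, artifact_id)
    )
```

For example, `find_dependents("/path/to/checkouts", "com.example", "l1")` (hypothetical path and coordinates) lists every pom that mentions that artifact. Dependencies inherited from a parent pom or pulled in transitively are not detected by this sketch.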
One way that might suit your needs is to create a master pom with all your Maven projects. Then you run the following command on the master pom:
mvn dependency:tree -DoutputType=graphml -DoutputFile=dependency.graphml
Open the generated file in yEd.
Used the instructions found here:
http://www.summa-tech.com/blog/2011/04/12/a-visual-maven-dependency-tree-view/
More interesting is probably: what would you do with this information? Inform the developers of A not to use library L1 or L2 anymore, because it has a critical bug?
In my opinion you should be able to create a blacklist of dependencies/parents/plugins on your repository manager. Once a project tries to deploy/upload itself with a blacklisted artifact, it should fail. I'm saying uploading and not downloading, because that might break a lot of projects. As far as I know, this is not yet available for any repository-manager.
One of the ways to approach this problem is outside Java itself: write an OS-level monitoring script that tracks each case of fopen() on the jar file in question! Assuming this is in a corporate environment, you might have to wait for a few weeks (!) to allow all using processes to access the library at least once!
On Windows, you might use Sysinternals Process Monitor to do this:
http://technet.microsoft.com/en-us/sysinternals/bb896645
On Unix variants, you would use DTrace or strace.
IMHO and also from my experience, looking for a technical solution for such a problem is often an overkill. If the reason why you want to know who is using your artifact(library) is because you want to ensure backward compatibility when you change an artifact or something similar, I think it is best done by communicating your changes using traditional channels and also encourage other teams who might be using your library to talk about it (project blogs, wiki, email, a well known location where documentations are put, Jour fixe etc.).
In theory, you could write a script that crawls through each project in your repository, parses its Maven pom.xml (assuming they all use Maven) and checks whether it defines a dependency on your artifact. If all the projects in your organization follow the standard Maven structure, it should be easy to write such a script (though if any of those projects depend on your artifact via a transitive dependency, things can get a bit more tricky).
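The transitive case mentioned above can be handled once the direct dependencies of every project are known: invert the map and walk it breadth-first. A minimal sketch, assuming the direct-dependency map has already been collected from the poms; the names A, B, L1 and L2 follow the example in the question:

```python
from collections import deque
from typing import Dict, Set


def transitive_dependents(depends_on: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return every artifact that uses target directly or indirectly.

    depends_on maps an artifact to the set of artifacts it depends on directly.
    """
    # Invert the map: for each artifact, who uses it directly?
    used_by: Dict[str, Set[str]] = {}
    for artifact, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(artifact)

    # Breadth-first walk up the "used by" edges starting from target.
    seen: Set[str] = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, ()):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


if __name__ == "__main__":
    # A uses L1, L1 uses L2, and B uses L2 directly.
    deps = {"A": {"L1"}, "L1": {"L2"}, "B": {"L2"}}
    print(sorted(transitive_dependents(deps, "L2")))  # ['A', 'B', 'L1']
```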

Old Programmer looking to use subversion assembla

Please help… Old programmer looking to use Subversion (Assembla) at my firm. I am doing a lot of Java in Eclipse and my issue is the following. I am going to make it very easy.
1) I build a web site in Eclipse with JSP. I check it in and commit. It is live.
2) I start working on version two of the site, but someone finds a bug in one of the prod JSPs. How do I check out that version of the JSP, update it, and then commit it to the project? Please tell me the right steps.
Here's the workflow that most organizations use:
When you make a production release, you tag it. In SVN, this is done with svn cp, copying trunk into a named directory under tags.
If you need to make bugfixes to a production release, you use svn cp to copy the tagged revision into a branch under branches. You then check out this named revision, make your changes, and check in.
If you're going to push the changes out to production, you can tag them from the branch, again using svn cp. Tags are cheap in Subversion.
If the fixes you made in the branch need to go back into trunk, you can merge them.
This is covered in the docs (this is a link to the chapter on branching and merging, but I recommend you read the introductory material if you're not familiar with SVN).
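The tag-then-branch steps above can be spelled out as the svn commands they correspond to. The Python helper below only assembles the command lines; it is a sketch that assumes the conventional trunk/tags/branches layout, and the repository URL, tag and branch names are hypothetical. Merging the fix back into trunk is not covered here.

```python
from typing import List


def tag_release(repo_url: str, tag: str) -> List[str]:
    """svn command that tags the current trunk as a release."""
    return ["svn", "copy", f"{repo_url}/trunk", f"{repo_url}/tags/{tag}",
            "-m", f"Tag release {tag}"]


def branch_from_tag(repo_url: str, tag: str, branch: str) -> List[str]:
    """svn command that creates a bugfix branch from a tagged release."""
    return ["svn", "copy", f"{repo_url}/tags/{tag}", f"{repo_url}/branches/{branch}",
            "-m", f"Create {branch} from {tag}"]


def checkout_branch(repo_url: str, branch: str, working_dir: str) -> List[str]:
    """svn command that checks the bugfix branch out into its own working copy."""
    return ["svn", "checkout", f"{repo_url}/branches/{branch}", working_dir]


if __name__ == "__main__":
    url = "http://svn.example.com/site"  # hypothetical repository URL
    for cmd in (tag_release(url, "1.0"),
                branch_from_tag(url, "1.0", "1.0-fixes"),
                checkout_branch(url, "1.0-fixes", "site-1.0-fixes")):
        print(" ".join(cmd))
```

After fixing the JSP in the checked-out branch, a normal `svn commit` records the fix on the branch while work on version two continues in trunk.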
http://svnbook.red-bean.com/en/1.5/index.html
Read chapters 2 and 3 and that will be enough to hit the ground running. The command you are likely looking for is svn update -rNNN; however, without some background on SVN, odds are excellent that you'll misuse it, as SVN is very similar to (yet in some ways different from) old-school systems like CVS, RCS and SCCS.
You might want to skim chapter 1 too, as the revisioning model SVN uses is a little different than tight locking models (if you've been using one of those).
