Say there's a legacy Java project A. For whatever reason, that project contains things that are confidential (e.g. passwords, encryption keys, emails) and/or environment-specific (e.g. hard-coded paths, server names, emails). Due to the complexities involved, it doesn't seem possible to change the project so that this information is no longer in the source code.
At some point, a new outsourcing team joins the development. Given the above situation, the outsourcing team cannot get access to the project source verbatim. They have a separate development environment, so it's possible to make a separate copy of the project in their VCS that has the problems addressed (i.e. all the things needed are cleaned / updated as necessary to work in their environment). Let's call that version A2.
The workflow would generally include two things related to A and A2:
The code can change on both sides (i.e. both A and A2 can change, A being changed by the original team and A2 by the outsourcing team), including source code changes that conflict
There's a need to keep the two projects in sync. It's not required to have them in sync all the time, but it's important to have a relatively painless way to do that. It's assumed this must be a manual process when there are conflicts to be resolved
This workflow can be achieved by manually keeping two projects and merging between them.
Related questions:
How would one go about managing the two versions with git, i.e. what are the options compared to manual merging?
Is this the best setup or is there a better option?
For new projects, what is the preferred way (in the sense - what do you do if you have similar situation?) to keep the confidential / environment-specific things out of source control? Is that a good thing anyway?
This approach is going to cause you pain. What you need to do is use git filter-branch to strip the server names, passwords, etc. out of your history and replace them with a non-working, generic form - i.e., it should not run anywhere!
Next, set up smudge/clean scripts that alter the files containing that information, populating the values with whatever they need to be for the application to run on that local system only. Your production environment will have different parameters than your development environment; the key is to have this information abstracted.
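A git filter can be any executable, so staying in Java, a smudge filter might look roughly like this (a minimal sketch; the placeholder format, file names, and wiring in the comments are assumptions, not anything from your setup):

// Hypothetical smudge filter: reads file content on stdin, replaces
// placeholder tokens like @@DB_PASSWORD@@ with values from a local,
// untracked file, and writes the result to stdout. Wired up with
// (assumed names):
//   .gitattributes:  config/*.properties filter=secrets
//   git config filter.secrets.smudge "java SmudgeFilter"
//   (the clean filter would do the reverse substitution)
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class SmudgeFilter {
    public static void main(String[] args) throws IOException {
        // Local, untracked file holding the real values for this machine.
        Properties secrets = new Properties();
        Path local = Paths.get(System.getProperty("user.home"), ".myproject-secrets.properties");
        try (Reader r = Files.newBufferedReader(local)) {
            secrets.load(r);
        }
        // Substitute placeholders as the file passes through the filter.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            for (String key : secrets.stringPropertyNames()) {
                line = line.replace("@@" + key + "@@", secrets.getProperty(key));
            }
            System.out.println(line);
        }
    }
}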
Now you should have no issue sharing the same repository with the outsourced team. Managing branches in one repo is way easier than scrubbing commits between two repos.
@icyrock.com: That seems like a recipe for disaster.
My suggestion is to separate the source code from the sensitive data.
Note that this is a more general suggestion: you probably want to keep that sensitive data safely stored and with limited access anyway.
Steps:
1. remove all sensitive data from the source code,
2. create a new git repository that contains that sensitive data,
3. reference the sensitive data from the original source code (this depends on the programming language; Java is not my field of expertise, but see the sketch below).
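For step 3, a minimal Java sketch (the property names and the system property are made up): the sensitive-data repo is checked out somewhere outside the source tree, and each team points the application at its own copy.

import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

public class SecretConfig {
    // e.g. the JVM is started with -Dsecrets.dir=/home/dev/secrets-repo,
    // so the original team and the outsourcing team each use their own repo.
    public static Properties load() throws Exception {
        String dir = System.getProperty("secrets.dir");
        Properties p = new Properties();
        try (Reader r = Files.newBufferedReader(Paths.get(dir, "app.properties"))) {
            p.load(r);
        }
        return p;
    }
}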
At this point the "cleaned" source code can be safely shared with the outsourcing team. They will not have access to the "sensitive data" repo, but they can have a similar repo with their own version of that sensitive data (i.e. "demo" or "trial" or "non-production" paths, server names, emails).
Of course, the above is only needed if the outsourcing team is to be put in a position to test their changes in a test environment, which I strongly assume is a MUST-have. They ARE doing tests, aren't they?
This will drastically reduce, if not eliminate entirely, the problems related to big, messy merges between two copies of the same code being actively developed in parallel.
Related
I used to manage versions with tags in Git, but that was a long time ago, for stand-alone applications. Now the problem is that I have a web application, and clients that expect to communicate with different versions of the application might connect to it at the same time.
So I added a path variable for the version to the input, like this:
@PathParam("version") String version
And the client can specify the version in the URL:
https://whatever.com/v.2/show
Then across the code I added conditions like this:
if (version.equals("v.2")) {
    // Do something
}
else if (version.equals("v.3")) {
    // Do something else
}
else {
    // Or something different
}
The problem is that my code is becoming very messy. So I decided to do it in a different way: I added this condition at only one point in the code, and from there I call different classes according to the version:
MyClassVersion2.java
MyClassVersion3.java
MyClassVersion4.java
The problem now is that I have a lot of duplication.
And I want to solve this problem as well. What can I do to have a web application that:
1) deals with multiple versions,
2) is not messy (i.e. no long chains of conditions),
3) doesn't have much duplication?
Normally, when we speak of an old version of an application, we mean that the behavior and appearance of that version are cast in stone and do not change. If you make even the slightest modification to the source files of that application, then its behavior and/or appearance may change (and according to Murphy's law, it will), which is unacceptable.
So, if I were you, I would lock all the source files of the old version in the source code repository, so that nobody can commit to them, ever. This approach solves the problem and dictates how you have to go about everything else: Every version would have to have its own set of source files which would be completely unrelated to the source files of all other versions.
Now, if the old versions of the application must have something in common with the newest version, and this thing changes (say, the database), then we are not exactly talking about different versions of the application; we have something more akin to different skins: the core of the application evolves, but users who picked a skin some time ago are allowed to stick with that skin. In this case, the polymorphism solution that has already been suggested by others might be a better approach.
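For illustration, a minimal sketch of what that polymorphism could look like (names here are hypothetical): the version string from the URL picks an implementation exactly once, instead of being re-checked all over the code.

// One handler interface, one implementation per supported version.
interface ShowHandler {
    String show();
}

class ShowHandlerV2 implements ShowHandler {
    public String show() { return "v2 behaviour"; }
}

class ShowHandlerV3 implements ShowHandler {
    public String show() { return "v3 behaviour"; }
}

class ShowDispatcher {
    private static final java.util.Map<String, ShowHandler> HANDLERS =
        java.util.Map.of("v.2", new ShowHandlerV2(),
                         "v.3", new ShowHandlerV3());

    // Called once with the @PathParam("version") value.
    static String show(String version) {
        ShowHandler handler = HANDLERS.get(version);
        if (handler == null) {
            throw new IllegalArgumentException("unknown version: " + version);
        }
        return handler.show();
    }
}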
Your version number is in a part of the URL known as the 'context root'.
You could release multiple WAR files, each configured to respond on a different context root.
So: one WAR for version 1, one WAR for version 2, etc.
This leaves you with code duplication.
So what you are really asking is, "how do I efficiently modularise Java web applications?".
This is a big question, and leads you into "Enterprise Java".
Essentially you need to solve it by abstracting your common code into a separate application. Usually this is called 'n-tier' design.
So you'd create an 'integration tier' application which your 'presentation'-layer WAR files speak to.
The Integration tier contains all the common code so that it isn't repeated.
Your integration tier could be EJB or webservices etc.
Or you could investigate using OSGi.
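As a rough sketch of that layering (all names hypothetical): each version-specific WAR stays thin and delegates to a service class that lives in the shared integration tier.

// Lives in the shared 'integration tier' module, packaged once and
// depended on by every version-specific WAR.
public class AccountService {
    public String lookup(String id) {
        return "account " + id; // the common business logic lives here
    }
}

// Thin JAX-RS resource inside the WAR deployed at context root /v.2;
// the /v.3 WAR has its own resource but reuses the same AccountService.
@javax.ws.rs.Path("/accounts")
public class AccountResourceV2 {
    private final AccountService service = new AccountService();

    @javax.ws.rs.GET
    @javax.ws.rs.Path("/{id}")
    public String get(@javax.ws.rs.PathParam("id") String id) {
        return service.lookup(id);
    }
}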
We have a large project consisting of the following:
A: C++ source code / libraries
B: Java and Python wrapping of the C++ libraries, using SWIG
C: a GUI written in Java, that depends on the Java API/wrapping.
People use the project in all the possible ways:
C++ projects using the C++ API
Java projects using the Java API
Python scripting
MATLAB scripting (using the Java API)
through the Java GUI
Currently, A, B and C are all in a single Subversion repository. We're moving to git/GitHub, so we have an opportunity to reorganize. We are thinking of splitting A, B, and C into their own repositories. This raises a few questions for us:
Does it make sense to split off the Java and Python SWIG wrapping (that is, the interface (*.i) files) into a separate repository?
Currently, SWIG-generated .java files are output in the source tree of the GUI and are tracked in SVN. Since we don't want to track machine-generated files in our git repositories, what is the best way of maintaining the dependency of the GUI on the .java/.jar files generated by SWIG? Consider this: if a new developer wants to build the Java GUI, they shouldn't need to build A and B from scratch; they should be able to get a copy of C from which they can immediately build the GUI.
Versioning: When everything is in one repository, A, B and C are necessarily consistent with each other. When we have different repositories, the GUI needs to work with a known version of the wrapping, and the wrapping needs to work with a known version of the C++ API. What is the best way to manage this?
We have thought deeply about each of these questions, but want to hear how the community would approach these issues. Perhaps git submodule/subtree is part of the solution to some of these? We haven't used either of these, and it seems submodules cause people some headache. Does anybody have stories of success with either of these?
OK, I ran into a similar problem (multiple interacting projects) and I tried the three possibilities: subtree, submodules, and a single plain repository with multiple folders containing the individual parts. If there are more or better solutions, I am not aware of them.
In short, I went for a single repository, but this might not be the best solution in your case; it depends...
The benefit of submodules is that they allow easy management, as every part is itself a repo. Thus individual parties can work on their repo alone, and the other parts can be added from predefined binary releases/... (however you like). You have to add an additional repo that ties the individual repos together.
This is both the advantage and the disadvantage: each commit in this repo defines a running configuration. Ideally, your developers will have to make each commit twice: one in the "working repo" (A through C) and one in the configuration repo.
So this method might be well suited if you intend your parts A-C to be mostly independent and not changing too often (that is, only for new releases).
I have to confess that I personally did not like the subtree method. For me the syntax seems clumsy and the benefit is not too large.
The benefit is that remote modifications are easily fetched and inserted, but you lose the remote history. Thus you should avoid interfering with the remote development.
This is the downside: if you intend to make modifications to the parts, you always have to worry about the history. You can of course just develop in a git remote and switch to the master branch for testing/merging/integrating. This is OK for repos you mainly read (if I am developing only on A but need B and C) but not for regular modifications (in the example, to A).
The last possibility is one plain repo with a folder for each part. The benefit is that no administration is directly needed to keep the parts in sync. However, you will not be able to guarantee that each commit is a running commit, and your developers will have to do that administration by hand.
You see that the choice depends on how closely the individual parts A-C are interconnected. Here I can only guess:
If you are at an early stage of development, where modifications throughout the whole source tree are common, one big repo is easier to handle than a split version. If your interfaces are mostly constant, the split allows smaller repos and a stricter separation of things.
The SWIG code and the C++ code seem quite close, so splitting those two seems less practical than splitting the GUI from the rest (that is my guess).
As for your other question, "How to handle new developers / (un)tracking machine-generated code?":
How many commits are made/required (only releases, or each individual commit)? If only releases are of interest, you could go with binary packages. If you intend to share every single commit, you would have to provide many different binary versions. Here I would suggest letting them compile the whole tree once, taking a few minutes; from then on, rebuilding is just a short make that should not take too long. This could even be automated using hooks.
I have a Java-based server transmitting data from many remote devices to one app via TCP/IP. I need to develop several versions of it. How can I develop and then maintain them without having to code two separate projects? I'm asking not only about that project, but about different approaches in general.
Where the behaviour differs, make the behaviour "data driven" - typically by externalizing the data that drives the behaviour into properties files that are read at runtime/startup.
The goal is to have a single binary whose behaviour varies depending on the properties files found in the runtime environment.
Java supports this pattern through the Properties class, which offers convenient ways of loading properties. In fact, most websites operate this way; for example, the production database user/password details are never (should never be) in the code. The sysadmins edit a properties file that is read at startup and protected by the operating system's file permissions.
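A minimal example of the pattern (the path and keys here are made up):

import java.io.FileReader;
import java.io.Reader;
import java.util.Properties;

public class AppConfig {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // The file lives outside the repo and differs per environment.
        try (Reader r = new FileReader("/etc/myapp/app.properties")) {
            props.load(r);
        }
        String dbUser = props.getProperty("db.user");
        String dbPass = props.getProperty("db.password");
        // dbUser/dbPass would be handed to the connection pool here,
        // never hard-coded in the source.
        System.out.println("Connecting as " + dbUser);
    }
}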
Other options are to use a database to store the data that drives behaviour.
It can be a very powerful pattern, but it can be abused too, so some discretion is advised.
I think you need to read up on Source Control Management (SCM) and Version Control Systems (VCS).
I would recommend setting up a git or Subversion repository, adding the code to trunk initially, and then branching it off into as many branches as there are versions you'll be working on.
The idea of different versions is this:
You're developing your code and have it in your SCM's trunk (otherwise known as HEAD). At some point you consider the code stable enough for a release, so you create a tag (let's call it version 1.0). You cannot (should not) make changes to tags - they're only there as a marker in time for you. If you have a client who has version 1.0 and reports bugs which you would like to fix, you create a branch based on a copy of your tag. The produced version would (normally) be 1.x (1.1, 1.2, etc.). When you're done with your fixes, you tag again and release the new version.
Usually, most of the development happens on your trunk.
When you are ready with certain fixes, or know that certain fixes have already been applied to your trunk, you can merge these changes to other branches, if necessary.
Base any new version on the previous one by reusing the code base, configurations, and any other assets. If several versions should be in place at one time, use configuration management practices. You should probably also consider some routing logic and client version checks on the server side; this is where 'backward compatibility' comes into play.
The main approach is first to find and extract the code that won't change from one version to another. It's best to maximize this part, in order to share as much of the code base as possible and to ease maintenance (fixing a bug in one version fixes it for all).
Then it depends on what really changes from one version to another. Ideally, in the main project you can define some abstract classes or interfaces that you will then be able to implement for each specific version.
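A skeleton of that idea (hypothetical names): the shared pipeline lives in one abstract class, and each version only fills in what actually differs.

// Shared core: fixing a bug here fixes it for every version.
abstract class ReportGenerator {
    public final String generate(String data) {
        return header() + data + footer(); // common pipeline
    }
    protected abstract String header();               // varies per version
    protected String footer() { return "-- end --"; } // sensible default
}

class ReportGeneratorV1 extends ReportGenerator {
    protected String header() { return "Report v1\n"; }
}

class ReportGeneratorV2 extends ReportGenerator {
    protected String header() { return "Report v2\n"; }
    protected String footer() { return "(c) v2\n"; } // v2 overrides the default
}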
First off, I'm coming (back) to Java from C#, so apologies if my terminology or philosophy doesn't quite line up.
Here's the background: we've got a growing collection of internal support tools written for the web. They use HTML5/AJAX/other buzzwords for the frontend and Java for the backend. These tools utilize a lightweight in-house framework so they can share an administrative interface for security and other configuration. Each tool has been written by a separate author and I expect that trend to continue, so I'd like to make it easy for future authors to stay "standardized" on the third-party libraries that we've already decided to use for things like DI, unit testing, ORM, etc.
Our package naming currently looks like this:
com.ourcompany.tools.framework
com.ourcompany.tools.apps.app1name
com.ourcompany.tools.apps.app2name
...and so on.
So here's my question: should each of these apps (and the framework) be treated as a separate project for purposes of Maven setup, Eclipse, etc?
We could have lots of apps appear here over time, so it seems like separation would keep dependencies cleaner and let someone jump in on a single tool more easily. On the other hand, (1) maybe "splitting" deeper portions of a package structure over multiple projects is a code smell and (2) keeping them combined would make tool writers more inclined to use third-party libraries already in place for the other tools.
FWIW, my initial instinct is to separate them.
What say you, Java gurus?
I would absolutely separate them. For the purposes of Maven, make sure each app/project has the appropriate dependencies to the framework/apps so you don't have to build everything when you just want to build a single app.
I keep my projects separated, but use a parent pom for including all of the dependencies and other common properties. Individual tools / projects have a name, a reference to the parent project, and any project-specific dependencies. This helps keep everyone on the common libraries and dependencies, since those are already configured, while letting me focus on the specific portion of the codebase that I need to work with.
I'd definitely separate these kind of things out into separate projects.
You should use Maven to handle the dependencies / build process automatically (both for your own internal shared libraries and third party dependencies). There won't be any issue having multiple applications reference the same shared libraries - you can even keep multiple versions around if you need to.
Couple of bonuses from this approach:
This forces you to think carefully about your API design for the shared projects which will be a good thing in the long run.
It will probably also give you about the right granularity for source code control - i.e. your developers can check out and work on specific applications or backend modules individually
If there is a section of a project that is likely to be used on more than one project it makes sense to pull that out. It will make it a little cleaner as well if you need to update the code in one of the commonly used projects.
If you keep them together you will have fewer obstacles developing, building and deploying your tools.
We had the opposite situation, having many separate projects. After merging them into one project tree we are much more productive and this is more important to us than whatever conventions happen to be trending.
We currently have an application which is essentially a fully-functional demo for potential clients. All the functionality is there. However, we use generic branding/logos, call our own web services (which would later be swapped out for calls to client web-services), etc.
Here is my question. If we have two different clients, we would prefer as little duplicate code as possible. I understand that this could be done -- from a Java perspective -- by simply including a shared JAR. However, we will need to change around resources. Also, one client may not want some functionality that another client does want. On top of this, when we make general bug fixes, we will normally want these fixes to be in both versions of the application.
We are using Git for version control and Maven for building the project.
One option we discussed is simply branching the project and maintaining separate versions. However, then we would have to manually merge changes that we want reflected in all versions of the app.
Another option we discussed is somehow swapping out resources, etc. using maven profiles. However, if we need to make any non-superficial changes to the code itself, this could be a problem. We might have to get into factories and different implementations.
Does anyone have recommendations on the best way to handle this?
We use a library project with git submodules to handle all of our similar projects. The master project is pretty hefty, but we use a configuration file to determine which features should be in the finished product.
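A simple version of that configuration-file approach (the file layout and keys are invented for the example):

import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

public class Features {
    private final Properties flags = new Properties();

    public Features(String path) throws Exception {
        try (Reader r = Files.newBufferedReader(Paths.get(path))) {
            flags.load(r);
        }
    }

    public boolean isEnabled(String feature) {
        return Boolean.parseBoolean(flags.getProperty(feature, "false"));
    }
}

// Each client build ships its own features.properties, e.g.
//   reporting.enabled=true
//   export.enabled=false
// and the code branches once per feature:
//   if (features.isEnabled("reporting.enabled")) { ... }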