I'm fairly new to git and this might be a question with an obvious answer:
I've got a project/framework I'm using to manage a bunch of automated test cases. The folder structure is below.
Java
|-Package1
|-Common code
|-tests
|- Client1 tests
|- Client2 tests
|- ...
|- ClientN tests
Is it recommended to maintain client-specific code in different branches? Or is it better to make a copy of the project and maintain a separate repo per client? Basically, each client has different tests written on top of the same core using Selenium/TestNG.
Both options are possible, but I think here it is more a question of maintainability. In this case I would go for the separate-repos approach, but with an extra: git submodules.
There are 2 reasons for choosing this approach:
the complexity of maintaining separate branches. Each client would need its own development workflow, which means you will probably have to branch off the existing client branch whenever you want to add a new feature for one of your clients, and things get messy. At some point you may start to confuse branches;
the second reason is keeping up with the common code: if you have the clients on separate branches, how do you pull in common-code updates? Regular merges / rebases ... ? That is a lot more overhead in the long term than using separate git repos.
With the separate repos/submodule approach you would get something like this:
Common Code Repo ( referred to as CCR )
Client 1 Repo has CCR as submodule
Client 2 Repo has CCR as submodule
...
Client N Repo has CCR as submodule
This way everything is managed independently and every project can have its own flow, without cross-dependencies or a messy branching structure.
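A minimal sketch of how the wiring could look (repo URLs, folder names and branch names are placeholders):

    # inside an existing client repo, pull in the common code as a submodule
    git submodule add git@example.com:org/common-code.git common
    git commit -m "Add common test code as a submodule"

    # when someone clones a client repo, they bring the common code along
    git clone --recurse-submodules git@example.com:org/client1-tests.git

    # when a client is ready for newer common code, bump the submodule pointer
    cd common && git pull origin master && cd ..
    git commit -am "Update common code to latest"

Each client repo pins the exact common-code commit it was tested against, so picking up changes to the shared core is an explicit, per-client decision.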
I'm not sure how to phrase this so I apologize if the title of the question does not make sense to you.
Due to various historical reasons, I have multiple teams contributing to the same code repo, which serves multiple service endpoints. Currently all teams' deployments and releases are done together, which creates a lot of churn.
I'm trying to get to this state: team A and team B can still share the same code base, but they can deploy separately using different Kubernetes namespaces. Like:
Team A's code is all under com/mycompany/team_a, team B's under com/mycompany/team_b
Somewhere in the repo there is a config that does the mapping:
com/mycompany/team_a/* => config_team_a.yaml, that has a Kubernetes config, maybe with namespace TeamA/ServiceA
com/mycompany/team_b/* => config_team_b.yaml with namespace TeamB/ServiceB
So that they can build their image separately and, of course, deploy separately.
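In other words, I picture each team's CI step doing something roughly like this (the image names, Dockerfile path, GIT_SHA variable and namespaces below are made up):

    # build and push an image from team A's code only
    docker build -t registry.example.com/team-a/service-a:$GIT_SHA -f com/mycompany/team_a/Dockerfile .
    docker push registry.example.com/team-a/service-a:$GIT_SHA

    # deploy team A's manifests into its own namespace, driven by config_team_a.yaml
    kubectl apply -f config_team_a.yaml --namespace team-a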
Correct me if I'm wrong, but from the description of your problem it looks like you actually have two problems:
The fact that you have the code of separate services in the same repo (team A and team B);
The fact that you have several environments (development/production, for example)
The second issue can be easily solved if you use Helm, for example. It allows you to template your builds and pass different configs to them.
The first one can also be partly solved by Helm, since you can separate your teams' builds using templating.
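A rough sketch of what I mean (the chart path, release names and namespaces are invented for illustration):

    # one shared chart, different values file and namespace per team
    helm upgrade --install service-a ./deploy/chart -f config_team_a.yaml --namespace team-a --create-namespace
    helm upgrade --install service-b ./deploy/chart -f config_team_b.yaml --namespace team-b --create-namespace

    # the same mechanism covers environments: layer an extra values file per environment
    helm upgrade --install service-a ./deploy/chart -f config_team_a.yaml -f values-production.yaml --namespace team-a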
However, a few years ago I was working on a .NET monorepo and faced yet another problem: every time a PR was merged into our git repo, a build was triggered in Jenkins for every service we had, even those that had no changes. From the description of your problem it is not clear to me whether you have a Jenkins pipeline configured and/or whether you are facing something similar, but if you are, you can have a look at what I did to work around the issue: repo. Feel free to have a look and I hope that helps.
We recently migrated from SVN, with most code in a single repo, to git, with most projects in their own repos (about 70 of them). We build about a dozen different apps from this Java source. The apps all run on *nix servers. We use Maven and Nexus to build. Many of us are struggling with developing features when that feature touches more than one repo. Here are a few of the challenges:
The developer has to branch each repo separately; we use the same name for all branches for one feature to make tracking less difficult (a rough script for this is shown after this list).
One must update the poms of all repos to point to the updated versions of each repo's artifact. If multiple people are working on the same branch, there can be a lot of merging of other people's pom changes. When I commit a change to a repo, the artifact version gets a "-SNAPSHOT" suffix, which means more pom updates.
Changes need to be pushed in the right order or our automated builds will fail, e.g.: repo A depends on a change to repo B; if repo A is pushed before repo B is built and deployed, then repo A won't build.
The person reviewing the feature has to look at changes in multiple repos.
When the feature is merged from its branch to, say, master, one has to remember all the repos that were touched.
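Roughly, starting one feature across repos looks like this (the repo list and branch name are placeholders):

    # create the same feature branch in every repo the feature touches
    for repo in repo-a repo-b repo-c; do
        ( cd "$repo" && git checkout -b feature/ABC-123 master && git push -u origin feature/ABC-123 )
    done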
It looks like switching to a mostly monorepo approach might be best, though there are some drawbacks there:
Building the entire codebase with Maven takes a long time. (Why can't Maven be more like make, only building things that have changed or whose dependencies have changed?)
Each push kicks off a big set of builds and many unit tests rather than just one repo's artifact build and test.
The developers who generally work in one or two repos prefer this new multi-repo world and will resist a change back.
I've looked into git submodules and subtrees, which don't seem to solve many of our issues (not sure about Google Repo). Some of us use tools like "mu" to help. It would be sweet if there were a toolkit that would help developers maintain versions in poms and track changes across repos.
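For the pom churn specifically, something like the versions-maven-plugin can at least be scripted across repos; a rough sketch (the group id, repo list and version are placeholders):

    # point every repo at the feature SNAPSHOT and pull in the latest SNAPSHOTs of sibling artifacts
    for repo in repo-a repo-b repo-c; do
        ( cd "$repo" \
          && mvn versions:set -DnewVersion=1.2.3-ABC-123-SNAPSHOT -DgenerateBackupPoms=false \
          && mvn versions:use-latest-snapshots -Dincludes='com.example:*' -DgenerateBackupPoms=false )
    done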
Let me know if you have a set of procedures or tools you use to ease development in this kind of environment.
"with most projects in their own repos (about 70 of them)."
For me this is where the problems start. My vote goes for minimising this number significantly.
If you really don't want a single repo (1 repo gets my vote) then you could separate the code base into n*change_often repos with 1*change_rarely repo. Keeping the n small is important. This way you would avoid rebuilding the bits that change rarely.
Also, even with a single repo you don't need to reference everything by source; you can use binaries for base libraries. When a base library changes, the person making the change can also update all the references in one go so that all projects are up to date.
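A sketch of that flow (the coordinates, repo names and Nexus setup are illustrative):

    # publish the changed base library as a binary to Nexus
    ( cd base-library && mvn clean deploy )

    # then bump the dependency in every consuming project in one go
    for repo in app-a app-b app-c; do
        ( cd "$repo" && mvn versions:use-latest-releases -Dincludes=com.example:base-library -DgenerateBackupPoms=false )
    done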
Background
I've been using the Serverless Framework successfully since version 0.5. The project was written in Python using Lambda and API Gateway, and we group all our APIs in the same git repo, separated by folders that mirror the structure of our services. In the end this is a nanoservice architecture, and it was integrated with Cognito, custom authorizers, stages, the entire deal. An example of the structure:
functions/V1
|- /users
|  |- /post
|  |  |- handler.py
|  |  |- s-function.json
|  |- /delete
|  |- /get
|- /groups
|  |- /get
s-project.json
s-resources-cf.json
Problem
Now I'm trying to do the same in Java, and since Java is not supported in 0.5, I went for v1. The first issue I found was how to use the same API Gateway for multiple resources with a nanoservice architecture. Assuming that this will be fixed soon, I want to include CodePipeline and CodeBuild in the process. Looking at all the Serverless examples on the internet, everyone makes one single Java package with several handlers for POST, GET, ... requests, one serverless.yml with the configuration, one buildspec.yml, and one git repo for all of it. This works great, but if I'm going to create a combination of micro and nano services, am I going to need N git repos so I can isolate deploys with CodePipeline? For me that means exponential support for repositories, CodePipeline builds, etc. On the other hand, if I want to edit one single function, push, and have CodePipeline build, deploy and test just that single Java handler and not the entire infrastructure, how can I achieve this?
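As far as I can tell, a single function can be redeployed on its own with something like the command below (the function name is made up), but that still leaves the question of how to organise the repos and pipelines around it:

    # redeploy only one handler, leaving the rest of the stack untouched
    serverless deploy function --function usersPost --stage dev --region us-east-1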
In the real world, does everyone have one git repo per micro/nano service? (We can easily have 100+ resources in a single API Gateway project.) Are all CI deployments isolated in this way? And how do you group an entire API to keep the resources organised in local development, recreating the same arrangement with folders, or is this approach wrong?
Hopefully someone else has solved this problem before and can give me some guidance.
We have a large project consisting of the following:
A: C++ source code / libraries
B: Java and Python wrapping of the C++ libraries, using SWIG
C: a GUI written in Java, that depends on the Java API/wrapping.
People use the project in all the possible ways:
C++ projects using the C++ API
Java projects using the Java API
Python scripting
MATLAB scripting (using the Java API)
through the Java GUI
Currently, A, B and C are all in a single Subversion repository. We're moving to git/GitHub, so we have an opportunity to reorganize. We are thinking of splitting A, B, and C into their own repositories. This raises a few questions for us:
Does it make sense to split off the Java and Python SWIG wrapping (that is, the interface (*.i) files) into a separate repository?
Currently, SWIG-generated .java files are output in the source tree of the GUI and are tracked in SVN. Since we don't want to track machine-generated files in our git repositories, what is the best way of maintaining the dependency of the GUI on the .java/.jar files generated by SWIG? Consider this: if a new developer wants to build the Java GUI, they shouldn't need to build A and B from scratch; they should be able to get a copy of C from which they can immediately build the GUI.
Versioning: When everything is in one repository, A, B and C are necessarily consistent with each other. When we have different repositories, the GUI needs to work with a known version of the wrapping, and the wrapping needs to work with a known version of the C++ API. What is the best way to manage this?
We have thought deeply about each of these questions, but want to hear how the community would approach these issues. Perhaps git submodule/subtree is part of the solution to some of these? We haven't used either of these, and it seems submodules cause people some headache. Does anybody have stories of success with either of these?
OK, I ran into a similar problem to yours (multiple interacting projects) and I tried three possibilities: subtrees, submodules, and a single plain repository with multiple folders containing the individual parts. If there are more/better solutions, I am not aware of them.
In short, I went for a single repository, but that might not be the best solution in your case; it depends...
The benefit of submodules is that they allow easy management, as every part is itself a repo. Thus individual parties can work only on their own repo, and the other parts can be added from predefined binary releases/... (however you like). You have to add an additional repo that ties the individual repos together.
This is both the advantage and the disadvantage: each commit in this repo defines a running configuration. Ideally your developers will have to make each commit twice: one for the "working repo" (A through C) and one for the configuration repo.
So this method might be well suited if you intend your parts A-C to be mostly independent and not change too often (that is, only for new releases).
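A sketch of what such a "configuration commit" looks like in practice (the repo, folder and tag names are invented):

    # in the configuration repo, move part A to a known good state and record it
    cd configuration-repo
    ( cd part-a && git fetch && git checkout v2.1.0 )
    git add part-a
    git commit -m "Pin part A to v2.1.0"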
I have to confess that I personally did not like the subtree method. For me the syntax seems clumsy and the benefit is not too large.
The benefit is that remote modifications are easily fetched and inserted, but you lose the remote history. Thus you should avoid interfering with the remote development.
This is the downside: if you intend to modify the parts, you always have to worry about the history. You can of course just develop in a git remote and change to the master branch for testing/merging/integrating. This is OK for repos you mostly read (if I am developing only on A but need B and C) but not for regular modifications (in the example, to A).
The last possibility is one plain repo with folders for each part. The benefit is that no administration is directly needed to keep the parts in sync. However, you will not be able to guarantee that each commit is a running commit. Also, your developers will have to do the administration by hand.
You can see that the choice depends on how closely the individual parts A-C are interconnected. Here I can only guess:
If you are in an early stage of development where modifications throughout the whole source tree are common, one big repo is easier to handle than a split version. If your interfaces are mostly constant, splitting allows smaller repos and a stricter separation of things.
The SWIG code and the C++ code seem quite close, so splitting those two seems less practical than splitting the GUI from the rest (that is my guess).
For your other question, "How to handle new developers / (un)tracking machine-generated code?":
How many commits are made/required (only releases or each individual commit)? If only releases are of interest, you could go with binary packages. If you intend to share every single commit, you would have to provide many different binary versions. Here I would suggest letting them compile the whole tree once, which costs a few minutes; from there on, rebuilding is just a short make that should not take too long. This could even be automated using hooks.
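For the hook idea, a minimal sketch (the build command is a placeholder for whatever your tree actually uses):

    # rebuild incrementally after every successful git pull
    printf '#!/bin/sh\nmake\n' > .git/hooks/post-merge
    chmod +x .git/hooks/post-merge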
Say there's a legacy Java project A. That project for whatever reason has some things in it that are confidential (e.g. passwords, encryption keys, emails) and / or environment-specific (e.g. hard-coded paths, server names, emails). Due to the complexities involved, it doesn't seem possible to change the project to not contain the information in the source code.
At some point, a new outsourcing team joins the development. Given the above situation, the outsourcing team cannot get access to the project source verbatim. They have a separate development environment, so it's possible to make a separate copy of the project in their VCS that has the problems addressed (i.e. all the things needed are cleaned / updated as necessary to work in their environment). Let's call that version A2.
The workflow would generally include two things related to A and A2:
The code can change on both sides (i.e. both A and A2 can change, A being changed by the original team and A2 by the outsourcing team), including having source code change conflicts
There's a need to keep the two projects in sync. It's not required to have them in sync all the time, but it's important to have a relatively painless way to do that. It's assumed this must be a manual process when there are conflicts to be resolved
This workflow can be achieved by manually keeping two projects and merging between them.
Related questions:
How would one go about managing the two versions with git, i.e. what are the options compared to manual merging?
Is this the best setup or is there a better option?
For new projects, what is the preferred way (in the sense: what do you do if you have a similar situation?) to keep the confidential / environment-specific things out of source control? Is that a good thing anyway?
This approach is going to cause you pain. What you need to do is use git filter-branch to strip server names and passwords out of the history and replace them with a non-working general form, i.e. it should not run anywhere!
Next, set up smudge/clean scripts to alter the files that contain that information so that they populate the values needed for your solution to run on that local system only. There will be different parameters in your production environment compared to your development environment. The key is to have this information abstracted.
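A minimal sketch of that wiring (the file path, property name and the inject-local-secrets.sh script are placeholders you would write yourself):

    # mark the files that carry environment-specific values
    echo 'app/config.properties filter=secrets' >> .gitattributes

    # clean: strip real values before content reaches the repository
    git config filter.secrets.clean 'sed -e "s/^db.password=.*/db.password=__PLACEHOLDER__/"'

    # smudge: re-insert the local values on checkout, from an untracked local source
    git config filter.secrets.smudge './inject-local-secrets.sh'

Each developer (and the build server) sets the filter config locally, so the real values never enter the repository.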
Now you should have no issue sharing the same repository with the outsourced team. Managing branches in one repo is way easier than scrubbing commits between two repos.
#icyrock.com: That seems like a recipe for disaster.
My suggestion is to separate the source code from the sensitive data.
Note that this is a more general suggestion: you probably want to keep that sensitive data safely stored and with limited access.
Steps:
1. remove all sensitive data from the source code,
2. create a new git repository that contains that sensitive data,
3. reference the sensitive data from the original source code (how depends on the programming language; Java is not my field of expertise); a rough sketch follows these steps.
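A rough sketch of how that can look from the outside (the repo names, the APP_CONFIG_DIR variable and run-tests.sh are all invented for illustration):

    # code and sensitive configuration live in separate repos, cloned side by side
    git clone git@example.com:org/project-a.git
    git clone git@internal.example.com:org/project-a-config.git    # original team only

    # the application resolves its configuration from a path supplied at runtime
    export APP_CONFIG_DIR=$PWD/project-a-config
    ( cd project-a && ./run-tests.sh )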
At this point the "cleaned" source code can be safely shared with the outsourcing team. They will not have access to the "sensitive data" repo, but they can have a similar repo with their own version of that sensitive data (i.e. "demo", "trial" or "non-production" paths, server names, emails).
Of course the above is needed if the outsourcing team is to be put in a position to test their changes in a test environment, which I strongly assume is a MUST have. They ARE doing tests, aren't they?
This will drastically reduce, if not eliminate entirely, any problem related to big messy merges between two copies of the same stuff being actively developed in parallel.