When do I commit when moving files in Git? (JGit)

I am implementing a bot that performs scheduled backups.
From a front end, a user will be able to change the names of the folders the backups are stored in.
According to "What's the purpose of git-mv?", the sequence
mv oldname newname
git add newname
git rm oldname
is what I want to do when a folder or file name is to be changed.
So I move the files using Java FileUtils,
then add the new file/folder and remove the old file/folder using:
git.add().addFilepattern(newName).call();
git.rm().addFilepattern(oldName).call();
git.commit().setAll(true).setMessage("Renamed group "+oldName+ " to " +newName).call();
The main goal is to preserve the history of the files being moved.
Should I commit after adding the 'new' file, before removing the 'old' one?
Or is my current order of operations fine, so that committing once after both operations preserves the change history?
I am still new to Git and how the logging works. TortoiseGit shows the files as added and removed; would it show up as a move in the log if the process worked?
Thank you for your time.

Git does not actually record history of individual files in the repository; it records the history of the entire repository as a single unit. There's nothing in a commit that explicitly says that the foo.txt in revision 2 is a continuation of the bar.txt in revision 1. Instead, renames are inferred by tools that examine the repository — after the changes have been committed — using the heuristic that if a commit removes a file and also creates another file with similar contents, the old file was renamed to the new one.
This heuristic only recognizes a rename if both changes occur in the same commit. If you remove a file, commit, then add the file back with a different name and commit again, Git will see that as separate deletion and addition of unrelated files.
Note that rename detection is optional and tools may not do it by default. With git log you may need to use the -M option, for example, or run git config --bool diff.renames true (Git 2.9 and later turn diff.renames on by default).
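To make the heuristic concrete, here is a minimal Java sketch of rename inference within a single commit. It is not Git's actual algorithm: the class name RenameSketch, the line-based similarity measure and the map-based inputs are all assumptions made for illustration. The 0.5 threshold used in the usage note mirrors the 50% default of -M.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy rename inference; only the idea behind the heuristic, not Git's code. */
final class RenameSketch {

    /** Fraction of distinct lines that both contents share (0.0 to 1.0). */
    static double similarity(String a, String b) {
        Set<String> shared = new HashSet<>(Arrays.asList(a.split("\n")));
        Set<String> linesB = new HashSet<>(Arrays.asList(b.split("\n")));
        Set<String> union = new HashSet<>(shared);
        union.addAll(linesB);
        shared.retainAll(linesB);
        return (double) shared.size() / union.size();
    }

    /**
     * Pairs each removed path with the most similar added path of the SAME
     * commit, provided the similarity reaches the threshold.
     * Both maps go from path to file content.
     */
    static Map<String, String> inferRenames(Map<String, String> removed,
                                            Map<String, String> added,
                                            double threshold) {
        Map<String, String> renames = new LinkedHashMap<>();
        Set<String> alreadyPaired = new HashSet<>();
        for (Map.Entry<String, String> oldFile : removed.entrySet()) {
            String best = null;
            double bestScore = threshold;
            for (Map.Entry<String, String> newFile : added.entrySet()) {
                if (alreadyPaired.contains(newFile.getKey())) {
                    continue;
                }
                double score = similarity(oldFile.getValue(), newFile.getValue());
                if (score >= bestScore) {
                    best = newFile.getKey();
                    bestScore = score;
                }
            }
            if (best != null) {
                renames.put(oldFile.getKey(), best);
                alreadyPaired.add(best);
            }
        }
        return renames;
    }
}
```

Calling RenameSketch.inferRenames(removed, added, 0.5) with the removals and additions of one commit pairs bar.txt with foo.txt when their contents match. If the removal and the addition sit in different commits, the two maps are never compared against each other, which is why splitting the rename across commits loses the link.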

I'm not familiar with JGit, but your Java code should probably mirror what Git is actually doing beneath the interface when you run your command. Since you are already doing this, I don't see any problem. I would make sure that the entire renaming operation appears in a single commit. There are several reasons for wanting to do this. You may want to revert the renaming at some point. If you have a single commit, it would be easy to do this via git revert.
With regard to preserving the history, renaming a file makes it harder to track the history, but not impossible, e.g.
git log --follow ./path/to/file

Related

What is actually the working directory in Git? [closed]

I have spent a lot of time trying to get a clear idea of the 'working directory' in Git.
Is it a specific folder or directory, or is it a version of a directory? Can anyone help me understand this concept?
What if I create a directory 'mydir' locally
and then run: git init?
Thanks.

In Git, the phrase working directory was once a synonym for working tree. It isn't any longer, because the phrase working directory may also be used by your OS (usually with a third word in front, as current working directory). Modern Git tries to use the phrase working tree as much as possible, though this is sometimes shortened to work-tree or worktree, as in git worktree add for instance.
In your OS, the phrase current working directory refers to the folder or directory [1] you are working in at the time. That may be within your working tree.
In Git, the phrase working tree refers to the OS-maintained directories-and-files that hold your copies of files. These are yours, to deal with as you wish: Git simply fills them in from committed files.
What if I create a directory 'mydir' locally then I [run]: git init
Let me rephrase this as the following series of shell commands:
$ mkdir mydir
$ cd mydir
$ git init
The mkdir creates a new, empty directory, within your current working directory. The cd then enters this empty directory, so that now what was ./mydir is your current working directory. The git init command runs with its own current working directory being this empty directory.
Since the directory mydir was empty at the time you ran git init, Git will create a hidden directory / folder named .git within this mydir directory. This hidden directory contains the repository proper. The repository consists of a number of files and directories that implement several databases:
One database is a simple key-value store that uses hash IDs to locate internal Git objects. This is the main (and usually largest) of the two primary databases that make up a Git repository.
One database is another simple key-value store that uses names as keys, to store hash IDs, which are then used in the first database. This is the secondary database that makes up a Git repository. This particular database's implementation in current versions of Git tends to be a bit dodgy: it relies too much on your operating system. On macOS and Windows, it tends to be a bit flawed. There is ongoing work in Git to replace this with a proper database implementation, which will eliminate this problem.
Apart from these two main databases, the repository contains many auxiliary files, including Git's index (aka staging area). The most important point here is that all of these entities live within the .git directory, though.
As there are no commits yet, both main databases are empty. At this point, so is Git's index.
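As a rough illustration of the two databases, the following Java sketch models them as two in-memory maps. The class TinyRepoSketch and its methods are hypothetical, and it assumes the default SHA-1 object format, where an object's ID is the SHA-1 of a small header followed by the content.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for the object database and the names database. */
final class TinyRepoSketch {
    final Map<String, byte[]> objects = new HashMap<>(); // hash ID -> object content
    final Map<String, String> refs = new HashMap<>();    // name -> hash ID

    /** SHA-1 over "<type> <size>\0<content>", returned as 40 hex digits. */
    static String hashId(String type, byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            String header = type + " " + content.length + "\0";
            sha1.update(header.getBytes(StandardCharsets.UTF_8));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Stores content under its hash ID; identical content maps to one key. */
    String put(String type, byte[] content) {
        String id = hashId(type, content);
        objects.put(id, content);
        return id;
    }
}
```

A freshly created TinyRepoSketch has two empty maps, matching the state right after git init. Storing the same content twice yields the same key, and a name such as refs/heads/main can only be added to refs once there is a hash ID for it to hold.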
Your work-tree consists of all files and directories inside your current working directory except the .git directory, which holds Git's files. Since your work-tree is yours, and is maintained by your OS (not by Git), you can now create any files you like here.
At some point, you will want to have Git create a new commit. This will be the very first commit in the repository. To create this commit, you will add the files you would like to go into this initial commit, into Git's index / staging-area, using git add. The git add program works by copying your work-tree files into Git's index. So, with your OS's current working directory being the mydir directory, you can now just create some file(s):
$ echo "repository for project X" > README
$ git add README
$ git commit
The echo command here creates a new file named README in your working tree. The git add command takes the working tree file, compresses and Git-ifies it to make it ready to be stored in a new commit, and writes the stored file into Git's index. [2] The final command, git commit, gathers some metadata from you (the person making the commit) and writes out Git's index and this metadata, storing the results in the main database, to create a new commit.
Once you've made this new, initial commit (the very first commit in the repository), it becomes possible for branch names to exist. They cannot exist until this point because each branch name must hold a valid, existing hash ID, and hash IDs for future commits are not predictable. [3] Now that there is one commit, that's the only hash ID that any branch name can hold. [4]
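The unpredictability of commit hash IDs can be sketched in Java as follows. The class CommitIdSketch is hypothetical, and it assumes SHA-1 and a simplified commit body with one identity used for both author and committer, a fixed +0000 time zone and no parent. The point is only that the timestamp is part of the hashed content.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Shows that the commit timestamp feeds into the commit's hash ID. */
final class CommitIdSketch {

    /** Builds a simplified root-commit body and returns its SHA-1 hash ID. */
    static String commitId(String treeId, String identity, long epochSeconds, String message) {
        String stamp = identity + " " + epochSeconds + " +0000";
        String body = "tree " + treeId + "\n"
                + "author " + stamp + "\n"
                + "committer " + stamp + "\n"
                + "\n"
                + message + "\n";
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Objects are hashed with a "<type> <size>\0" header in front.
            sha1.update(("commit " + bytes.length + "\0").getBytes(StandardCharsets.UTF_8));
            sha1.update(bytes);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

Two calls that differ only in epochSeconds return different IDs, so even with the files and the message fixed in advance, the ID is unknown until the moment of the commit.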
Over time, you will add more and more commits to the repository. (In general, it's pretty rare to ever drop a commit, except for, e.g., the way git rebase replaces commits with new-and-improved ones. It's not impossible, it is just difficult.) Each new commit therefore adds to the repository.
The repository itself, then, consists of:
the databases that hold commits and other objects, and the names that find them;
Git's index, used to hold your proposed next commit; and
other maintenance items that you and/or Git may find useful.
The commit objects, and in fact all objects in the big database, are strictly read-only. Nothing and no one can ever change them. They're in a form that is directly useful only to Git itself, though.
Cloning the repository consists of copying the two databases, although the names database is only partly copied, and gets changed during the cloning process.
Meanwhile, your working tree is where you have Git extract commits, turning stuff that's only directly useful to Git—and that is read-only—into stuff you can work with and modify. These are your files. This is how you do your work, in your working tree. You can use the results to update Git's index, and then use Git's index to create a new commit, that adds on to the repository without changing anything that already exists in the repository.
[1] At the OS level, the terms folder and directory are synonyms. Git itself does not store folders or directories: it just stores files whose names may contain embedded slashes, such as path/to/file.ext. That's all one single file name. Your OS may force you to first make a folder named path, then in that folder, make a folder named to, and only then use the combined path and to folders to make a file named file.ext within that path. The current working directory can be changed to path, so that you would use the name to/file.ext, instead of path/to/file.ext, or even to path/to so that you would use the name file.ext. In all cases, Git will internally work with a stored file named path/to/file.ext. So your current working directory is an OS concept, referring to how you move around within the folders that your OS maintains.
[2] Technically, the index doesn't actually hold the files directly. It holds instead a Git blob object hash ID for the file, which provides the key to the key-value object database so that Git can look up the file's content, plus the name of the file, complete with (forward) slashes, and some additional information. The blob object holds a compressed and de-duplicated copy of the file's content.
This de-duplication, and the fact that it is git add that readies the file for committing, means that git commit will go quite fast, as it need not prepare anything for committing: it just saves, permanently, the blob objects already stored in the index.
[3] The hash ID of a commit is a cryptographic checksum of the commit's complete content. The content includes not only the saved source files (as an internal Git tree object), but also the exact date-and-time-stamp. Since we don't even know what you'll commit in the future, much less exactly when you will commit it, we cannot compute what the future hash ID will be. You may know what you will commit, which gets you closer; but unless you know exactly when you will commit it, you won't know the hash ID either.
[4] Branch names in particular are constrained: they may only hold a commit hash ID. Tag names can hold the hash ID of any of Git's four internal object types. (Usually, though, a tag name either holds a commit hash ID, or the hash ID of a newly-created annotated tag object, which in turn holds a commit hash ID.) Other types of names may have their own constraints.

Git manage environment specific configuration

I have a requirement to keep a property configuration for different environments such as dev, uat and production. For example, config.properties has an entry like environment=dev, which I need to change to environment=uat for the staging branch and to environment=prd for the master branch.
I tried committing these files in each branch and then adding config.properties to .gitignore so that it would not be considered in subsequent commits.
But the ignore rule was not taking effect, so I ran these commands:
git rm -rf --cached src/config.properties
git add src/config.properties
git commit -m ".gitignore fix"
But these commands delete the file from the local repository itself, and the subsequent commits also delete it from the branches. I want to set up the branches so that Jenkins can do the deployment without anyone editing the config file manually. I am using Fork as my Git UI. Is there any way to handle this kind of situation?

You should not version config.properties (git rm is right), and you should indeed ignore it.
That way, it won't pose any issue during merge.
It is easier to have three separate files, one per environment:
config.properties.dev
config.properties.uat
config.properties.prd
In each branch, you would then generate config.properties, with the right value in it, from one of those files, depending on the current execution environment.
Since you have separate branches per environment, with the right file in it, you can have a generation script which will determine the name of the checked out branch with:
branch=$(git rev-parse --symbolic --abbrev-ref HEAD)
That means you could:
version only a template file config.properties.<env>
version value files named after the branches: config.properties.dev, config.properties.uat...: since they are different, there is no merge issue when merging or switching branches.
Finally, you would register (in a .gitattributes declaration) a content filter driver.
(image from "Customizing Git - Git Attributes", from "Pro Git book")
The smudge script, associated with the template file (package.json.tpl), would generate (automatically, on git checkout) the actual config.properties file by looking up values in the right config.properties.<env> value file.
The generated actual config.properties file remains ignored (by the .gitignore).
See a complete example at "git smudge/clean filter between branches".
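The generation step described above, picking the value file that matches the checked-out branch, might look like this in Java. The class ConfigGeneratorSketch is hypothetical, and the branch-to-environment mapping (master to prd, staging to uat, anything else to dev) is an assumption taken from the question. The branch name would come from the git rev-parse command shown earlier.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Generates the ignored config.properties from a versioned value file. */
final class ConfigGeneratorSketch {

    /** Assumed mapping from branch name to environment suffix. */
    static String environmentFor(String branch) {
        switch (branch) {
            case "master":
                return "prd";
            case "staging":
                return "uat";
            default:
                return "dev";
        }
    }

    /** Name of the versioned value file for the given branch. */
    static String valueFileFor(String branch) {
        return "config.properties." + environmentFor(branch);
    }

    /** Copies the value file over config.properties inside dir. */
    static Path generate(Path dir, String branch) {
        Path source = dir.resolve(valueFileFor(branch));
        Path target = dir.resolve("config.properties");
        try {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A build step in Jenkins could call generate with the source directory and the branch name before packaging. A smudge filter achieves the same result automatically on checkout, at the cost of more setup.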

Intellij - Git status shows files have been changed when they have not

I am working on a Java project in Intellij that uses git. Quite a few files are blue (to show that changes have been made), however when I right click them and click on "Git -> Compare with Latest Repository Version" it says that the contents are identical. Anyone know why this happens? It only seems to happen to files that I've opened to look at but haven't changed. Could it happen if I accidentally added extra white space and then deleted it or something? Or just extra whitespace in general?

This is how Git differs from SVN. Git's change detection does not depend only on the content of the file but also on its metadata (last-modified timestamp, etc.). So even if you just add one space and remove it later, saving the file modifies its metadata.
For more details, you can have a look at: What algorithm does git use to detect changes on your working tree?
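The difference between a metadata check and a content check can be sketched as below. This is a simplified illustration, not Git's or IntelliJ's implementation: the class ChangeCheckSketch and its Snapshot type are hypothetical, and real tools track more fields than size and modification time.

```java
import java.util.Arrays;

/** Contrasts a cheap metadata comparison with a full content comparison. */
final class ChangeCheckSketch {

    /** What a tool might remember about a file the last time it looked. */
    static final class Snapshot {
        final long size;
        final long modifiedMillis;
        final byte[] content;

        Snapshot(long modifiedMillis, byte[] content) {
            this.size = content.length;
            this.modifiedMillis = modifiedMillis;
            this.content = content.clone();
        }
    }

    /** Cheap check: does the recorded metadata still match? */
    static boolean metadataDiffers(Snapshot cached, long size, long modifiedMillis) {
        return cached.size != size || cached.modifiedMillis != modifiedMillis;
    }

    /** Expensive check: do the bytes actually differ? */
    static boolean contentDiffers(Snapshot cached, byte[] current) {
        return !Arrays.equals(cached.content, current);
    }
}
```

A file that was saved again with the same bytes gives metadataDiffers == true and contentDiffers == false, which matches the symptom in the question: marked as changed, yet identical when compared.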

Eclipse can not commit all files to Git

I have a project in Eclipse that I have been developing without problems. I use Git to commit changes.
But at some point I noticed that not all files in the project are being committed to the Git repository.
When I commit, Git simply does not show them in the list of available files. I have tried committing each file individually, with no result, and tried "Add to Index", also with no result.
Does anybody know what the reason could be? This is the first time I have had this problem.
Also, no tracking symbol ">" appears.

In order to see if there is an issue with a .gitignore, switch back to the command line, and type:
git check-ignore -v -- yourFile
You will immediately see if one of the .gitignore rules applies to it or not.

Can Eclipse's Refactor > Move be integrated with Git?

One of the great things about using an IDE for Java is the automated refactorings you get. The problem I'm having is that after using Refactor > Move to move a class into a different package (which moves the file itself in the filesystem), git status shows that the file in the old location has been deleted, and the one in the new location has been added.
The workaround I've found is clunky:
mv src/com/example/newpackage/Foo.java src/com/example/oldpackage/Foo.java
git mv src/com/example/oldpackage/Foo.java src/com/example/newpackage/Foo.java
Is there any way (when using the Git plugin for Eclipse) to have the refactoring do a git mv instead of a naive filesystem move?

That is how Git works with renames/moves (the old file is deleted and the new file is added). It then compares the contents of the files and recognizes a rename based on a similarity heuristic. So even if it shows you a delete and an add, if you commit and then run "git log --follow movedfilename", it should show you the whole history, including the history before the rename.
