Find out the SHA1 of a file from the index

Short story:
Like the title says, how to get the SHA1 or hash from the index of a checked out file using JGit?
Long story:
I am writing a GUI application and using JGit to version just one file. The user can open a window that lists all revisions of this file in a table.
The user can make changes and commit them. The user can also go back in time and choose an older revision from the table to work on.
This workflow is very simple. Internally, I use only one branch (the master branch). HEAD always points to this branch, and the tip of this branch is always the newest commit. When the user chooses an older revision, I instantiate the CheckoutCommand class, set the path to the file using addPath(), and select the master branch using setName().
The above results in HEAD pointing to the master branch, which in turn points to the newest revision (not the revision chosen by the user). But the index and the working directory are now at the revision chosen by the user.
So, finally, I want to show the user which of the revisions in the table is currently checked out (or activated, or whatever you want to call it). That revision would then be highlighted, as in the screenshot below. But I cannot use the tip of the master branch for this purpose. I need to somehow get the SHA1 from the index.
There is an existing question that asks exactly what I want, but its author uses command-line git and I need the same thing in JGit.
EDIT: After a little more analysis I found that I can use JGit's DirCache to access the contents of the index. Using the DirCache class I am able to get the SHA1 of the file in the index, just like in that question. But now I see that this hash is not the same as the hash of the revision I checked out from. That means I cannot use this method to determine which revision in the table is checked out.
So, is there any other way using my workflow described as is to determine which of the revisions is user chosen to work on? Or even, maybe someone can propose a different approach.
My current approach for this problem is to use JGit AddNoteCommand. When user checks out the revision I will simply add a note to this revision with some "key: value". This key will indicate if the revision is checked out or not. Anyone with a better suggestion?

So first of all, sorry to say this, but I think what you are doing is dangerous and unintuitive. Git is built around branches. I think what you are doing is called detached-HEAD manipulation, and it's not recommended, even though JGit allows you to do many things.
But if you are very careful, you can carry on.
Second, the DirCache (previously Index) object has been very mysterious to me, and I think (I am not too sure though) the JGit team is still working on it.
Finally, to actually answer the question: I think you should use the LogCommand with its addPath(...) method. You will get a list of RevCommit objects, from which you can determine the SHA1. If I remember correctly, you call getName() on the RevCommit itself (a RevCommit is an ObjectId, and getName() returns the SHA1 as a hex string); on a Ref object, getName() would give you the ref name instead.
However, I would recommend using branches (depending on what operation you want to perform on your commit), based on the SHA1 you got: create a branch from the SHA1 you just found and you can safely perform any operation you want. Then either delete the branch if you don't want to commit anything, or merge it later.
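A note on the mismatch described in the question's EDIT: the hash stored in the index for a file is a blob id (a hash of the file content), while the hashes in the revision table are commit ids, so the two are not expected to be equal. One possible way around this is to compare the blob id from the index with the blob id the file has in each commit. The following is only a sketch using the plain JDK, not JGit: the class name GitBlobHash, both method names, and the map of commit id to blob id are assumptions made for illustration, and filling that map from the repository is left to your JGit code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch: how Git derives a blob id, and how to match it against revisions. */
class GitBlobHash {

    /** Returns the hex SHA-1 that Git assigns to a blob with the given content. */
    static String blobId(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Git hashes the header "blob <size>\0" followed by the raw file bytes.
            sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    /**
     * Returns the first revision whose blob id equals the blob id found in the index.
     * blobIdByRevision is a hypothetical input: commit id -> blob id of the file
     * in that commit, in the order the revisions should be tried.
     */
    static Optional<String> checkedOutRevision(String indexBlobId,
                                               Map<String, String> blobIdByRevision) {
        return blobIdByRevision.entrySet().stream()
                .filter(entry -> entry.getValue().equals(indexBlobId))
                .map(Map.Entry::getKey)
                .findFirst();
    }

    public static void main(String[] args) {
        String indexBlob = blobId("hello\n".getBytes(StandardCharsets.UTF_8));
        Map<String, String> revisions = new LinkedHashMap<>();
        // The keys below are placeholders, not real commit ids.
        revisions.put("newest-commit", blobId("world\n".getBytes(StandardCharsets.UTF_8)));
        revisions.put("older-commit", indexBlob);
        System.out.println(checkedOutRevision(indexBlob, revisions).orElse("no match"));
    }
}
```

Keep in mind that two revisions with identical file content share the same blob id, so a match can be ambiguous; that is one reason an explicit marker such as the note you mention may still be the more robust choice.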

Related

How to detect if a merge (from a pull request) will result in no files being modified?

I have created a repository hook on Bitbucket that automatically creates a pull request from one branch to another (and merge it) when some specific conditions are fulfilled. I would like to detect the case where the result of the merge is that no files will be modified (empty merge).
What I tried so far:
To check whether there are commits on the source branch that are not on the target branch:
    CommitsBetweenRequest commitsBetweenRequest = new CommitsBetweenRequest
            .Builder(repository)
            .include(sourceBranch)
            .exclude(targetBranch)
            .build();
    // One commit is enough to know that the source branch is ahead of the target.
    boolean anyChanges = commitService
            .getCommitsBetween(commitsBetweenRequest, new PageRequestImpl(0, 1))
            .stream()
            .findAny()
            .isPresent();
This works great for simple cases. However, some "empty merges" are still undetected. This occurs when files modified on source branch have also been modified on target branch the same way, but in different commits.
        FB1 -- FB2 (source)
       /
      /
---- A ---- B ---- C (target)
getCommitsBetween() will return FB1 and FB2.
However, if files modified in FB1 and FB2 have been modified in C as well (the same way) the result of the merge is no changes at all.
Checking the difference between branches in terms of file changes :
    // I have tried swapping sourceBranch and targetBranch, without success
    ChangesetsRequest changesRequest = new ChangesetsRequest
            .Builder(request.getRepository(), sourceBranch)
            .sinceId(targetBranch)
            .build();
    commitService.getChangesets(changesRequest, new PageRequestImpl(0, 1));
It does not give the expected results. It gives me the list of files modified in both the source and target branches (not the files modified as a result of the merge).

This question can be even more general:
How can you determine what changes will occur as a result of a merge?
Perhaps the quickest and easiest method, is to simply perform the merge and see what changed:
git switch target-branch --detach
git merge source-branch --no-ff
git diff @~1 @
Here we are checking out the target branch using --detach so you move off of the branch and don't actually modify it, and we merge with --no-ff to make sure the diff command works even in the case where the source branch could be fast-forwarded. Commits are cheap in Git and get cleaned up with garbage collection, so it shouldn't ever be a problem to make a commit just for testing. Behind the scenes this is probably what Git SCM tools are doing to let you view a Pull/Merge Request before it's completed.
Note, in the case of your specific question, if the above diff command has no output, then you have confirmed your query regarding "no changes".
But do we really have to perform the merge, or can the result be determined without it?
I don't know for sure, but for now my answer is that you probably have to do the merge. Usually you can get pretty darned close by checking the diff of just the source branch from the target branch, for example:
git diff target-branch...source-branch
# note the 3 dots is shorthand for starting from the merge-base, so this is like:
git diff $(git merge-base source-branch target-branch) source-branch
but, as you point out in the question, that's not always perfect if some of the same changes were made on both the source and target branches. You could inspect the changes the other way too and compare them piece by piece, but it's likely that the logic to determine it is complicated enough that your best bet is to just do the merge and look at it. I wouldn't bother trying to re-create the merge logic if you don't have to. Besides, there are different merge strategies that can be used, and even the default merge strategy is subject to change; in fact it did recently.
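To make the "compare them piece by piece" idea concrete, here is a rough file-level sketch in Java (the language of the question). It is not what Git or Bitbucket does internally, and the class name EmptyMergeCheck, the method name, and the path-to-blob-id maps are all assumptions for illustration. It assumes a plain three-way merge without rename detection and only looks at whole files: a path is flagged when the source branch changed it relative to the merge base and the target does not already have the same content.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: file-level approximation of "would this merge change the target?". */
class EmptyMergeCheck {

    /**
     * Each map is path -> blob id for one tree (merge base, source tip, target tip).
     * A missing path means the file does not exist in that tree.
     * Returns the paths that merging source into target might alter.
     */
    static Set<String> pathsPossiblyChanged(Map<String, String> base,
                                            Map<String, String> source,
                                            Map<String, String> target) {
        Set<String> allPaths = new HashSet<>(base.keySet());
        allPaths.addAll(source.keySet());
        allPaths.addAll(target.keySet());

        Set<String> flagged = new TreeSet<>();
        for (String path : allPaths) {
            String inBase = base.get(path);
            String inSource = source.get(path);
            String inTarget = target.get(path);
            boolean sourceTouchedIt = !Objects.equals(inSource, inBase);
            boolean targetAlreadyHasIt = Objects.equals(inSource, inTarget);
            if (sourceTouchedIt && !targetAlreadyHasIt) {
                flagged.add(path);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Blob ids are shortened placeholders; both branches made the same change to a.txt.
        Map<String, String> base = Map.of("a.txt", "1111");
        Map<String, String> source = Map.of("a.txt", "2222");
        Map<String, String> target = Map.of("a.txt", "2222");
        System.out.println(pathsPossiblyChanged(base, source, target).isEmpty());
    }
}
```

An empty result means no file can change under those assumptions. A non-empty result only means "maybe": when both sides edited the same file differently, the line-level merge can still end up identical to the target, which is exactly why doing the real merge remains the more reliable check.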

Work out Analyzer, Version, etc. from Lucene index files?

Just double-checking on this: I assume this is not possible and that if you want to keep such info somehow bundled up with the index files in your index directory you have to work out a way to do it yourself.
Obviously you might be using different Analyzers for different directories, and 99% of the time it is pretty important to use the right one when constructing a QueryParser: if your QP has a different one all sorts of inaccuracies might crop up in the results.
Equally, getting the wrong Version of the index files might, for all I know, not result in a complete failure: again, you might instead get inaccurate results.
I wonder whether the Lucene people have ever considered bundling up this sort of info with the index files? Equally I wonder if anyone knows whether any of the Lucene derivative apps, like Elasticsearch, maybe do incorporate such a mechanism?
Actually, just looking inside the "_0" files (_0.cfe, _0.cfs and _0.si) of an index, all 3 do actually contain the word "Lucene" seemingly followed by version info. Hmmm...
PS other related thoughts which occur: say you are indexing a text document of some kind (or 1000 documents)... and you want to keep your index up-to-date each time it is opened. One obvious way to do this would be to compare the last-modified date of individual files with the last time the index was updated: any documents which are now out-of-date would need to have info pertaining to them removed from the index, and then have to be re-indexed.
This need must occur all the time in connection with Lucene indices. How is it generally tackled in the absence of helpful "meta info" included in with the index files proper?

Anyone interested in this issue:
It does appear from what I said that the Version is contained in the index files. I looked at the CheckIndex class and the various info you can get from that, e.g. CheckIndex.Status.SegmentInfoStatus, without finding a way to obtain the Version. I'm starting to assume this is deliberate, and that the idea is just to let Lucene handle the updating of the index as required. Not an entirely satisfactory state of affairs if so...
As for getting other things, such as the Analyzer class, it appears you have to implement this sort of "metadata" stuff yourself if you want to... this could be done by just including a text file in with the other files, or alternately it appears you can use the IndexData class. Of course your Version could also be stored this way.
For writing such info, see IndexWriter.setCommitData().
For retrieving such info, you have to use one of several (?) subclasses of IndexReader, such as DirectoryReader.
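For the "text file in with the other files" option, a minimal sketch using only the JDK could look like the following. The class name IndexMetadataFile, the file name index-metadata.properties and the two property keys are all made up for illustration, and the sketch assumes Lucene leaves files it does not recognise in the index directory alone; if you are unsure about that for your version, keep the file next to the directory instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Sketch: keep analyzer and version info in a small properties file. */
class IndexMetadataFile {

    // Hypothetical file name, chosen so it does not look like a Lucene segment file.
    static final String FILE_NAME = "index-metadata.properties";

    /** Writes the analyzer class name and Lucene version used to build the index. */
    static void write(Path dir, String analyzerClass, String luceneVersion) {
        Properties props = new Properties();
        props.setProperty("analyzer", analyzerClass);
        props.setProperty("version", luceneVersion);
        try (OutputStream out = Files.newOutputStream(dir.resolve(FILE_NAME))) {
            props.store(out, "Settings used to build this index");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the metadata back; fails if the file is missing or unreadable. */
    static Properties read(Path dir) {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(dir.resolve(FILE_NAME))) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Uses the system temp directory as a stand-in for the index directory.
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"));
        write(dir, "org.apache.lucene.analysis.standard.StandardAnalyzer", "your-lucene-version");
        System.out.println(read(dir).getProperty("analyzer"));
    }
}
```

Before constructing a QueryParser you would read the file and instantiate the analyzer class it names, or refuse to open the index if the file is absent.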

AEM: After deleting user groups, rep:policy nodes remain intact

I'm quite stunned at what I found while tinkering with AEM (I don't think it matters, but for accuracy: I'm using 6.1) while trying to automate the creation of my group permissions. I have a group called aem-tools-readonly that has a specific set of permissions on it. No problem there. What surprises me is this: if I delete that group, the rep:policy nodes that correspond to it are not deleted. So if I re-create aem-tools-readonly, it picks up the same configuration. I am wondering a couple of things.
Should I be concerned, security-wise, about creating holes in my permission scheme if groups get deleted as my projects move along?
Why aren't these rep:policy nodes deleted? Is there a valid reason?
How can I easily delete all rep:policy nodes of, for example, my aem-tools-readonly group?
Any information/thoughts are welcomed ...
Thanks

As far as I know, it has always been this way.
This is how the ACL implementation works in CRX.
To fix that, you could clear all of the group's access entries before deleting it, probably by deleting the matching entries under each rep:policy node.
There is no easy (automatic) way to do that; you have to write code. It should be quite easy, though, to find every descendant of a rep:policy node that has your group name in it.

Find and delete duplicates in a Lotus Notes database

I am very new to Lotus Notes. Recently my teammates ran into a problem with duplicates in Lotus Notes, as shown below in Case A and Case B.
So we bought an app named scanEZ. Using this tool we can remove either the first occurrence or the second occurrence. In Case A and Case B the second items are considered redundant because they have no children, so we can remove all the second items as shown below and thereby remove the duplicates.
But in Case C the order is reversed: the child item comes first and the parent item comes second, so I am unable to use the scanEZ app.
Is there a better way, or other software or a script, to accomplish my task? As I am new to this field I have no idea. Kindly help me.
Thanks in advance.

Probably the easiest way to approach this would be to force the view to always display documents with children first. That way the tool you have purchased will behave consistently for you. You would do this by adding a hidden sorted column to the right of the column that you have circled. The formula in this column would be @DocChildren, and the sort option for the column would be set to 'Descending'. (Note that if you are uncomfortable making changes in this view, you can make a copy of it, make your changes in the copy, and run ScanEZ against the copy as well. You can also do all of this in a local replica of the database, and only replicate it back to the server when you are satisfied that you have the right results.)
The other way would be to write your own code in LotusScript or Java, using the Notes classes. There are many different ways that you could write that code,

I agree with Richard's answer. If you want more detail on how to go through the document collection: you could isolate the documents into a view that shows only the duplicates. Then write an agent that looks at the UNID of each document, the date modified and other such data elements to ensure that you are keeping the last updated document. I would add a field to the document, as in FLAG='keep'. Then, with a second agent, delete the documents that don't have the flag. If you take this approach you can often reuse the same agents in other databases.
Since you are new to Notes keep in mind that Notes is a document database. There are several different conflicts like save conflicts or replication conflicts. Also you need to look at database settings on how duplicates can be handled. I would read up on these topics just so you can explain it to your co-workers/project manager.
Eventually in your heavily travelled databases you might be able to automate this process after you work down the source of the duplicates.
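The selection rule behind the two agents (flag what to keep, then delete the rest) can be sketched in plain Java as follows. This is not Notes API code: the Doc class, its fields and the grouping key are hypothetical stand-ins for whatever your agent reads from the real documents. The rule assumed here is: within each set of duplicates, never delete a document that has children; if no document in the set has children, keep the most recently modified one.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: decide which documents in each duplicate set may be deleted. */
class DuplicateResolver {

    /** Hypothetical stand-in for a Notes document; not a Notes API class. */
    static final class Doc {
        final String unid;       // document UNID
        final String key;        // whatever identifies a "duplicate", e.g. subject + date
        final int childCount;    // number of response documents
        final Instant modified;  // last modified time

        Doc(String unid, String key, int childCount, Instant modified) {
            this.unid = unid;
            this.key = key;
            this.childCount = childCount;
            this.modified = modified;
        }
    }

    /** Returns the UNIDs of the documents that are safe to delete under the rule above. */
    static List<String> unidsToDelete(List<Doc> docs) {
        Map<String, List<Doc>> groups = new LinkedHashMap<>();
        for (Doc doc : docs) {
            groups.computeIfAbsent(doc.key, k -> new ArrayList<>()).add(doc);
        }
        List<String> toDelete = new ArrayList<>();
        for (List<Doc> group : groups.values()) {
            boolean anyWithChildren = group.stream().anyMatch(d -> d.childCount > 0);
            // Only used when nobody in the set has children: keep the newest document.
            Doc newest = Collections.max(group, Comparator.comparing((Doc d) -> d.modified));
            for (Doc doc : group) {
                boolean keep = anyWithChildren ? doc.childCount > 0 : doc == newest;
                if (!keep) {
                    toDelete.add(doc.unid);
                }
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc("U1", "Budget 2012", 0, Instant.ofEpochSecond(200)));
        docs.add(new Doc("U2", "Budget 2012", 3, Instant.ofEpochSecond(100)));
        System.out.println(unidsToDelete(docs));
    }
}
```

Because the rule does not depend on the order of the documents in the view, it would treat the case where the child-bearing document comes second the same as the other cases.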

These are clearly not duplicates.
The definition of duplicate is that they are identical and so it does not matter which one is kept and which one is removed. To you, the fact that one has children makes it more important, which means that they are not pure duplicates.
What you have not stated is what you want to do if multiple documents with similar dates/subjects have children (a case D if you will).
To me this appears as three separate problems.
The first problem is to sort out the cases where more than one document in a set has children.
Then sort out the cases where only one document in a set has children.
Then sort out the cases where none of the documents in a set has children.
The approach in each case will be different. The article from Ytria only really covers the last of these cases.

ROLLBACK undo redo

I'm building a database using a BST (binary search tree) and I want the user to be able to roll back the last 5 commands. Any Suggestions? I'm using Java.

Have you considered using Berkeley DB? It's free and supports nested transactions (which would allow you to have any number of levels of rollback):
http://download.oracle.com/docs/cd/E17076_02/html/gsg_txn/JAVA/nestedtxn.html
Even if you decide to implement your own DB, it might be useful as a reference.

It sounds like you want the Memento pattern. Essentially, you create an object that has all of the information required to:
From the state of the tree before the operation, repeat the operation. (Redo)
From the state of the tree after the operation, revert the operation. (Undo)
You'd keep the last five of these around. When the user asks for an undo, take the latest, ask it to revert the operation, then indicate somehow (some index variable, for example) where you are in the list of mementos. You should then be able to move through the list in either direction, undoing and redoing as much as you want.
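One possible shape for this, sketched under a few assumptions: java.util.TreeMap stands in for your own BST, each memento stores just the key plus its value before and after the operation, and the names UndoableTree and Change are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.TreeMap;

/** Sketch: a tree with a bounded undo/redo history of the last five changes. */
class UndoableTree {

    private static final int MAX_HISTORY = 5;

    /** One memento: enough data to revert or repeat a change to a single key. */
    private static final class Change {
        final int key;
        final String before; // null means the key was absent before the change
        final String after;  // null means the key is absent after the change

        Change(int key, String before, String after) {
            this.key = key;
            this.before = before;
            this.after = after;
        }
    }

    private final TreeMap<Integer, String> tree = new TreeMap<>(); // stand-in for the BST
    private final List<Change> history = new ArrayList<>();
    private int position = 0; // history[0..position) is applied, the rest can be redone

    void put(int key, String value) {
        Objects.requireNonNull(value, "null is reserved for 'absent'");
        record(new Change(key, tree.get(key), value));
    }

    void remove(int key) {
        if (tree.containsKey(key)) {
            record(new Change(key, tree.get(key), null));
        }
    }

    String get(int key) {
        return tree.get(key);
    }

    /** Reverts the latest applied change; returns false if there is nothing to undo. */
    boolean undo() {
        if (position == 0) {
            return false;
        }
        Change change = history.get(--position);
        apply(change.key, change.before);
        return true;
    }

    /** Repeats the most recently undone change; returns false if there is nothing to redo. */
    boolean redo() {
        if (position == history.size()) {
            return false;
        }
        Change change = history.get(position++);
        apply(change.key, change.after);
        return true;
    }

    private void record(Change change) {
        // A new command invalidates anything that was undone but not redone.
        history.subList(position, history.size()).clear();
        history.add(change);
        if (history.size() > MAX_HISTORY) {
            history.remove(0); // forget the oldest change
        }
        position = history.size();
        apply(change.key, change.after);
    }

    private void apply(int key, String value) {
        if (value == null) {
            tree.remove(key);
        } else {
            tree.put(key, value);
        }
    }

    public static void main(String[] args) {
        UndoableTree tree = new UndoableTree();
        tree.put(10, "ten");
        tree.put(10, "TEN");
        tree.undo();
        System.out.println(tree.get(10));
    }
}
```

Storing only the affected key keeps each memento small; if an operation touches many nodes (a bulk load, say), the memento would need to hold all of them, or a snapshot of the tree.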
