Git Diff at method level

I want to get information on all the methods/functions added, deleted, and modified between any two commits.
Notes -
The code base is in Java and hosted on GitHub.
Ultimate goal - I must be able to get all the deleted, modified (both source code modifications and method renames), and newly added methods between any two commits, spanning sub-packages and classes.
Even better if the full method signature is returned along with the fully qualified method name.
Things I Tried
git diff (link) - but the diff output is huge, and I'm really only interested in which methods were added, deleted, or modified (i.e., for Java it lists the class but not the function).
git log -L :function:path/to/file - prints the change history of that one function, which isn't what I want: it watches a specific function rather than the whole git repo. Another limitation is that it does not give a diff between two commits.
Desired Results
Diff between any two commits should return
Methods Added ->
myMethod12 - path/to/class
myMethod34 - path/to/class
Methods Deleted ->
myMethod3 - path/to/class
myMethod11 - path/to/class
Methods Renamed ->
(Previous Name) (Revised Name) (Path)
myMethod6 yourMethod32 path/to/class
Methods Modified (source code modifs) ->
myMethod44 - path/to/class
or ideally the fully qualified method name
ie
Methods Added ->
com.example.subp.subp2.nestedpack.addMessages(Message[] msgs)
...

Git is a general tool. It does not understand your source language (in this case Java, but what if your source language were instead Swift or Python or C++ or TypeScript or, well, whatever else you can think of?). It just understands "lines of text" and has simple (or sometimes, not-very-simple) regular expressions to recognize function / method / class / other such definitions, to annotate diffs. [1]
To get the kind of output you want, you need a tool that does understand the language in question.
Given such a tool, you will give it:
an older version (a commit or a file from that commit), and
a newer version (another commit, or "the same" file from that commit).
It should then read those two commits' files, figure out what methods you have, and produce whatever analysis you like.
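For what it's worth, the "produce whatever analysis you like" step can be sketched independently of the parsing step. The sketch below assumes that some Java-aware parser (not shown, and not part of Git) has already produced, for each version, a map from fully qualified method signature to method body text. The class name MethodDiff and the map layout are hypothetical. A rename is only recognized when the body text is unchanged; a method that was renamed and edited at the same time shows up as one deletion plus one addition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: classifies methods between an old and a new version. */
final class MethodDiff {
    final List<String> added = new ArrayList<>();
    final List<String> deleted = new ArrayList<>();
    final List<String> modified = new ArrayList<>();
    /** Old signature -> new signature, for methods whose body text is identical. */
    final Map<String, String> renamed = new TreeMap<>();

    /**
     * Both maps go from a fully qualified method signature to that method's body text.
     * Building them needs a real Java parser; that step is assumed here.
     */
    static MethodDiff compare(Map<String, String> oldMethods, Map<String, String> newMethods) {
        MethodDiff diff = new MethodDiff();
        Set<String> gone = new TreeSet<>();
        Map<String, String> goneByBody = new HashMap<>();
        // Collect signatures that no longer exist, indexed by body for rename matching.
        for (String sig : new TreeSet<>(oldMethods.keySet())) {
            if (!newMethods.containsKey(sig)) {
                gone.add(sig);
                goneByBody.putIfAbsent(oldMethods.get(sig), sig);
            }
        }
        for (String sig : new TreeSet<>(newMethods.keySet())) {
            String body = newMethods.get(sig);
            if (oldMethods.containsKey(sig)) {
                if (!oldMethods.get(sig).equals(body)) {
                    diff.modified.add(sig);
                }
            } else if (goneByBody.containsKey(body)) {
                // Same body under a new signature: treat it as a rename.
                String oldSig = goneByBody.remove(body);
                diff.renamed.put(oldSig, sig);
                gone.remove(oldSig);
            } else {
                diff.added.add(sig);
            }
        }
        diff.deleted.addAll(gone);
        return diff;
    }

    public static void main(String[] args) {
        Map<String, String> before = new HashMap<>();
        before.put("com.example.A.load()", "{ return read(); }");
        before.put("com.example.A.save()", "{ write(); }");
        Map<String, String> after = new HashMap<>();
        after.put("com.example.A.load()", "{ return readAll(); }");
        after.put("com.example.A.store()", "{ write(); }");
        MethodDiff d = compare(before, after);
        System.out.println("Added: " + d.added);
        System.out.println("Deleted: " + d.deleted);
        System.out.println("Renamed: " + d.renamed);
        System.out.println("Modified: " + d.modified);
    }
}
```

Comparing raw body text is the simplest possible choice; a real tool would more likely compare normalized syntax trees so that whitespace-only edits do not count as modifications.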
What this tool needs from Git is two versions. When and whether it can handle just getting two files, or needs two entire snapshots, depends on that tool.
The git difftool command may, or may not, be helpful for invoking this other tool. What git difftool does is compare two entire commits, then, for each differing file, feed the old and new versions of those files to another tool. You choose that second tool, from any tool you have on your computer, anywhere. Git merely invokes that tool, on the pair of files extracted from the pair of commits. If this does what you need, you're now done. If not, you may need some more steps: for instance, you might want to run git diff --raw <commit1> <commit2> and parse its output, or just git checkout each of the two commits into some temporary locations (using a temporary index for each) and work from there.
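If you go the git diff --raw route, the parsing part is small. Here is a minimal sketch, assuming the default (non -z) output where each line looks like ":100644 100644 bcd1234 0123456 M", then a tab, then the path (renames and copies carry a second tab-separated path). The class and field names are made up for illustration, and paths containing unusual characters may be quoted by Git, which this sketch does not undo.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for the line format printed by `git diff --raw <commit1> <commit2>`. */
final class RawDiffParser {
    /** One changed file. */
    static final class Change {
        final char status;    // A, D, M, R, C, T, ...
        final String oldBlob; // abbreviated object name on the old side
        final String newBlob; // abbreviated object name on the new side
        final String oldPath;
        final String newPath; // same as oldPath unless the file was renamed or copied

        Change(char status, String oldBlob, String newBlob, String oldPath, String newPath) {
            this.status = status;
            this.oldBlob = oldBlob;
            this.newBlob = newBlob;
            this.oldPath = oldPath;
            this.newPath = newPath;
        }
    }

    static List<Change> parse(List<String> lines) {
        List<Change> changes = new ArrayList<>();
        for (String line : lines) {
            // Skip anything that is not a plain two-sided raw line (e.g. "::" merge lines).
            if (!line.startsWith(":") || line.startsWith("::")) {
                continue;
            }
            String[] tabParts = line.split("\t");
            String[] meta = tabParts[0].substring(1).split(" ");
            if (meta.length < 5 || tabParts.length < 2 || meta[4].isEmpty()) {
                continue;
            }
            String oldPath = tabParts[1];
            String newPath = tabParts.length > 2 ? tabParts[2] : oldPath;
            // meta = [oldMode, newMode, oldBlob, newBlob, status(+score)]
            changes.add(new Change(meta[4].charAt(0), meta[2], meta[3], oldPath, newPath));
        }
        return changes;
    }

    public static void main(String[] args) {
        List<String> sample = new ArrayList<>();
        sample.add(":100644 100644 bcd1234 0123456 M\tsrc/A.java");
        sample.add(":100644 100644 abcd123 1234567 R86\tsrc/B.java\tsrc/C.java");
        for (Change c : parse(sample)) {
            System.out.println(c.status + " " + c.oldPath + " -> " + c.newPath);
        }
    }
}
```

The two blob names on each line identify the old and new file contents, so they are what you would hand to the language-aware tool (for instance after extracting each one with git cat-file -p).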
[1] Note that regular expressions are not capable of proper parsing; most real languages require a grammar. See, e.g., Regular Expression Vs. String Parsing. A proper CS-theoretic discussion will get into Finite State Automata but is generally off topic on StackOverflow.

First of all, git works with text; it isn't responsible for indexing your sources, searching for method definitions, and so on.
So the best solution is probably diff. Here is a description of how to use diff between specific commits.
Personally, I would encourage you to diff a specific file (git diff specific-file) and to use grep if the diff is huge:
git diff | grep -e method-name -e public -e private -e void -e etc
I hope you can come up with a command better suited to your goals. Good luck!

I don’t know of a tool that does exactly that.
In order to do something like that you need a Java-aware diff tool. difftastic supports Java. You get output like this (diff on some random BSD-3 licensed code):

Related

How to get git -log of all branches in Java?

I have a task to implement a program in Java (pure Java without 3rd party libraries) that reads the history of any git repository and puts the commits into a tree data structure.
Could you give me any hints? How to read git log in Java without 3rd party libraries?

You might want to take a look at Processes and Threads and at how to execute a command at runtime. This involves some details and requires a fundamental understanding of java.lang.Runtime, java.io, and a few related topics, so I'll refrain from writing a whole method here. Instead, I recommend searching for a good tutorial and getting a first idea from other questions here, such as → getting output from executing a command line program
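To give a rough idea of the shape of such a program, here is a minimal sketch using only the JDK. It assumes that a git executable is on the PATH and that the program runs inside the repository. It uses ProcessBuilder rather than Runtime.exec, and the class and method names are made up. Note that commit history is really a graph (merge commits have several parents), so the sketch just records each commit's parents and leaves the choice of tree structure to you.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical reader for `git log --all`, using only the standard library. */
final class GitHistory {
    /** Runs git in the current directory and returns the raw output lines. */
    static List<String> readLog() throws IOException, InterruptedException {
        // %H is the commit hash, %P the space-separated parent hashes.
        Process process = new ProcessBuilder("git", "log", "--all", "--pretty=format:%H %P")
                .redirectErrorStream(true)
                .start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        if (process.waitFor() != 0) {
            throw new IOException("git log failed: " + lines);
        }
        return lines;
    }

    /** Maps each commit hash to the list of its parent hashes (empty for a root commit). */
    static Map<String, List<String>> parseParents(List<String> lines) {
        Map<String, List<String>> parents = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split(" ");
            if (parts[0].isEmpty()) {
                continue; // blank line
            }
            parents.put(parts[0], new ArrayList<>(Arrays.asList(parts).subList(1, parts.length)));
        }
        return parents;
    }

    public static void main(String[] args) throws Exception {
        parseParents(readLog()).forEach(
                (commit, ps) -> System.out.println(commit + " <- " + ps));
    }
}
```

Keeping the parsing separate from the process handling makes it easy to test the parser on canned lines without a repository at hand.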

Symbol to signify the root of a project

Is there a well accepted symbol in the programming world for the root of a project?
For example, the tilde ~ is the user's home directory, but this is not just convention; it is part of UNIX.
I am looking for a symbol that is merely convention.

If you are looking for a convention for use in communicating with a team, I'd suggest the project name followed by a /. This makes it clear as to what project you are referring to. If the project name is already implied by the context, it seems to be the convention to simply use a subdirectory name, with or without a trailing slash. See here and here for examples from Linux-kernel related documentation.

I'm not aware of any such convention. In Autoconf, the variables top_srcdir and abs_top_srcdir point to the root of a project. In git, this does the job:
git rev-parse --show-toplevel
However, if you are looking for a single character symbol, I suggest borrowing the tee character: ⊤ (U+22A4, &#8868;). I don't think it has ever been used for that, but it captures the idea of top.

the root of a project
What exactly does the root of the project mean? In which context? For which types of projects? Are you talking about a deployed web project? The source tree of a web project? A command-line utility written in C? Or in Java? Or Go?
Each language and framework provides its own set of predefined structures to follow. The root of the project is then either the root of the VCS, which may store many assets not strictly related to the business of the software, or the root according to the framework / language you are working with. In the latter case, I assume it is safe to say it can be anything, because there are so many different frameworks for so many different concerns.

Windows vs. POSIX
POSIX (the Portable Operating System Interface) covers UNIX-like systems.
Windows uses C:\ or another drive letter as the root, while POSIX systems use / as the root.
To check whether a path is absolute (that is, starts from a root), use path.isAbsolute('PATH_HERE'); it returns true if the path is absolute.
To find out whether Node is running on Windows or a POSIX platform, use process.platform.
To check whether you are running on Windows:
var isWin = /^win/.test(process.platform);
nodeJS Docs: https://nodejs.org/dist/latest-v6.x/docs/api/path.html#path_path_isabsolute_path

I think people usually use a label rather than a symbol for the root, e.g., /server for the root of a Node app.

The Be-all, End-all
After doing the bare minimum of research and reading about 1/4 of a wikipedia article on Root Directory I have come to the almighty, forever-binding conclusion that:
No, there is no standardized way of indicating you are in the root directory of an arbitrary project. (Apart from reading the path itself)
Here is another link pertaining to inodes farther down to make it seem like I did more research.
In that case, making a standard seems like fun doesn't it?
The standard you come up with doesn't have to be global, it can just apply to your dev team if you want it to. In that case, let's make 3 right off the top of our (my) head.
How about |->foo/bar/a.java? The | indicates a flat level, with nothing before it.
We could always try a boring (but useful... I guess): (foo)/bar/a.java
Or to spice things up a little bit, we could do...
I am gROOT
|foo|/bar/a.java
Whatever standard you choose (which is kinda funny, because the usage of standard implies that there's only one) you're now going to have to...
Implement it!
This is going to be the hard part. You're going to have to find some way to indicate to the OS that you're not only in an arbitrary directory, but that you're in a directory that holds slightly more significance than others. Maybe you add another section to the INODE (in *nix at least) that specifies that it's important. Maybe you don't fuss around with all the OS level stuff, and instead patch git to recognize the root of all git projects... which now that I think about it, kind of already happens.
Possible Implementation
Let's use git as an example. Git projects are denoted by a .git directory in the root directory. So let's take that a step farther and put a .base file in every directory that is the root of a project (or what have you). The .base doesn't even need to have anything in it, it just needs to be there. Now, patch up whatever terminal you're using to recognize the .base file as the root of an arbitrary project, and display it however you like! EZ-PZ
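The "recognize the .base file" part could look roughly like the sketch below: start in some directory and walk upward until a directory containing the marker turns up. This is only an illustration of the idea, not something any terminal actually does; the class name ProjectRoot is made up, and .base is the marker name proposed above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

/** Hypothetical lookup of a project root via a marker file such as .base (or .git). */
final class ProjectRoot {
    /** Walks upward from start and returns the nearest directory that contains the marker. */
    static Optional<Path> find(Path start, String marker) {
        for (Path dir = start.toAbsolutePath().normalize(); dir != null; dir = dir.getParent()) {
            if (Files.exists(dir.resolve(marker))) {
                return Optional.of(dir);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Path cwd = Paths.get(""); // resolves to the current working directory
        System.out.println(find(cwd, ".base")
                .map(Path::toString)
                .orElse("no .base marker found above " + cwd.toAbsolutePath()));
    }
}
```

Passing ".git" as the marker gives roughly the same answer as git rev-parse --show-toplevel for an ordinary checkout.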
Possible additions?
Some other thoughts here, maybe you could add some configuration to the .base file, like so:
proj_name=WorldTraveller
lang=java
other=stuff
can=go
here=whatever
which then drives how it's displayed in the terminal. The above configuration using my first suggested standard would be
|->WorldTraveller/Countries/France/a.java
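Since the key=value lines above happen to be valid java.util.Properties syntax, producing that display string is short. A minimal sketch, assuming the .base file sits in the project root and that proj_name is the only key that matters for display; the class name BaseLabel and the fallback to the directory name are my own additions.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical formatter for the "|->" root notation suggested above. */
final class BaseLabel {
    /** Renders file relative to root, prefixed with the configured project name. */
    static String label(Properties base, Path root, Path file) {
        String name = base.getProperty("proj_name");
        if (name == null) {
            // Assumption: fall back to the root directory's own name.
            name = String.valueOf(root.getFileName());
        }
        String relative = root.relativize(file).toString().replace('\\', '/');
        return "|->" + name + "/" + relative;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the current directory is the project root and contains a .base file.
        Path root = Paths.get("").toAbsolutePath();
        Properties base = new Properties();
        try (Reader reader = Files.newBufferedReader(root.resolve(".base"))) {
            base.load(reader);
        }
        String file = args.length > 0 ? args[0] : "a.java";
        System.out.println(label(base, root, root.resolve(file)));
    }
}
```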
Note
I'm not trying to come off as a sarcastic D.i.a.B, so if I came off as one it wasn't my intention. I like to have fun answering questions sometimes.

How to accommodate multiple coding styles? (git vs. IDE)

I am collaborating on a git-sourced, maven-managed Java project whose contributors have differing code style preferences and use multiple IDEs (note 1).
Is there a tool or IDE configuration that will allow code to be viewed and edited using style-1, but committed to SCM using style-2?
My research points me to 'no', but a solution combining git hooks and Checkstyle/jrefactory might be possible.
So if 'no' to above, is there a tool/process that will perform the TBD process actions below?
The checkout process flow for User1 would be:
git pull
TBD process formats code to User1 style-1
User1 works in their preferred IDE with style-1 settings
The commit workflow for User1 would be:
User1 is ready to commit/push code
TBD process formats code to standard format style-standard
git push
Note 1: multiple IDE's = Eclipse, IntelliJ, Netbeans.
Note 2: My question differs from this question in that I'd like to focus on an IDE-related solution, since forcing the minority of standards-divergent users is probably a more efficient solution.
Note 3: I acknowledge that, for best-practice reasons, this shouldn't be done. However, if you grant that it's time to expect more flexibility from our IDEs and SCMs, this question is intended to explore those solutions.

First of all, you really shouldn't do that. Codestyle wars are bad for any project, and it is best to decide upon one codestyle that everybody must use. It is simple to configure IDEs to automatically apply the specified codestyle at every filesave, so the developers don't have to write code in the target codestyle themselves, they can let the IDE do that for them. True, this doesn't solve the fact that they'll have to read code in a codestyle they don't yet like, but it's a lot safer than having invisible automatic code changes; that's a major source of bugs.
Maybe you can use Eclipse's code formatter from the command line to apply a different codestyle. You'd have to set up git hooks, make sure everybody has Eclipse available, and provide the proper configuration files for their preferred codestyle. You'd need hooks both for post-checkout and pre-commit, one to set up the user's codestyle, the other to commit in the central codestyle. To go one step further, you can play with the index to add the formatted code so that it doesn't include style differences in git diff (although they will show up in git diff --staged).
Again, you shouldn't do that.

I agree with Sergiu Dumitriu that this is not a very good idea. Still, git provides exactly what you are looking for, although it will only work if your central coding style is very well defined and strictly followed. Here’s how it works:
Git provides smudge/clean filters. They allow you to pass all code through a so-called “smudge” filter on checkout and reverse that with a “clean” filter when code is added to the staging area. These filters are set in .gitattributes, and there is a repository-local version of that file available in .git/info/attributes.
So you set your smudge filter to a tool that will change the code to your personal coding style on checkout:
And your clean filter will convert the code back to the central coding style on checkin (more precisely: when files are staged):
It is very important that smudge -> clean is a no-op, i.e. that it generates the original file again. Otherwise you will still check in format changes every time you change a file.
Using smudge and clean filters will retain all the functionality of git (including git diff etc). You can find the full documentation in git help attributes
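To illustrate the round-trip requirement, here is a toy filter pair. It is a sketch under simplifying assumptions, not a real formatter: the central style is taken to be "indent with 4 spaces", the personal style "indent with tabs", and only leading whitespace is touched. A filter is just a program that reads the file on standard input and writes the converted file to standard output; the class name IndentFilter is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

/** Toy smudge/clean pair: 4-space indentation (central) versus tabs (personal). */
final class IndentFilter {
    /** Checkout direction: each leading group of 4 spaces becomes one tab. */
    static String smudge(String text) {
        return perLine(text, line -> {
            StringBuilder out = new StringBuilder();
            int i = 0;
            while (line.startsWith("    ", i)) {
                out.append('\t');
                i += 4;
            }
            return out.append(line, i, line.length()).toString();
        });
    }

    /** Staging direction: each leading tab becomes 4 spaces again. */
    static String clean(String text) {
        return perLine(text, line -> {
            StringBuilder out = new StringBuilder();
            int i = 0;
            while (i < line.length() && line.charAt(i) == '\t') {
                out.append("    ");
                i++;
            }
            return out.append(line, i, line.length()).toString();
        });
    }

    // Splitting with limit -1 keeps trailing empty lines, so the final newline survives.
    private static String perLine(String text, UnaryOperator<String> fn) {
        return Arrays.stream(text.split("\n", -1)).map(fn).collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) throws IOException {
        String input = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        boolean toPersonal = args.length > 0 && args[0].equals("smudge");
        String output = toPersonal ? smudge(input) : clean(input);
        System.out.write(output.getBytes(StandardCharsets.UTF_8));
        System.out.flush();
    }
}
```

For input that already follows the central style, clean(smudge(x)) returns x unchanged. A file that was committed with a leading tab would come back with spaces, which is exactly the case where format changes get checked in, hence the requirement that the central style be strictly followed. Wiring such a program up would go through the filter.<name>.smudge and filter.<name>.clean config keys plus a filter=<name> attribute for the relevant paths, where <name> is whatever you choose to call the filter.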

Merging two local files in subversion

I destroyed my subversion tree. (I attempted to ignore a few files, broke something, and now the svn says it can't find my root directory, even though it correctly notes the differences between files in said directory.) So, now I have about twenty files from my current project that I'd like to commit but can't.
I ended up checking out a new tree entirely, but now I don't know how to intelligently merge my files from the broken tree to the new tree I just checked out. I don't want to simply copy the files, as this will wipe changes others have done since I've updated. (The broken tree doesn't let me update.) Using 'svn merge' isn't meant to be used on two local copies, right? What tools can I use?

Use kdiff3 and manually merge your changes into the repository. Then commit.

I don't think you can find a better merging tool than WinMerge.
BTW - I didn't like kdiff3 :)

Patching Java software

I'm trying to create a process to patch our current java application so users only need to download the diffs rather than the entire application. I don't think I need to go as low level as a binary diff since most of the jar files are small, so replacing an entire jar file wouldn't be that big of a deal (maybe 5MB at most).
Are there standard tools for determining which files changed and generating a patch for them? I've seen tools like xdelta and vpatch, but I think they work at a binary level.
I basically want to figure out - which files need to be added, replaced or removed. When I run the patch, it will check the current version of the software (from a registry setting) and ensure the patch is for the correct version. If it is, it will then make the necessary changes. It doesn't sound like this would be too difficult to implement on my own, but I was wondering if other people had already done this. I'm using NSIS as my installer if that makes any difference.
Thanks,
Jeff

Be careful when doing this--I recommend not doing it at all.
The biggest problem is public static final constants. Their values are compiled into the classes that use them, not referenced. This means that even if a java file doesn't change, its class must be recompiled when a constant it uses changes, or it will still refer to the old value.
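A small illustration of the constant problem, with made-up class names. In a real patch scenario Limits and Client would live in separate class files, and only Limits.class would be replaced; in a single file like this everything is recompiled together, so the program simply prints the current values. The comments describe what the compiler does with each field.

```java
/** Entry point for the illustration; all names here are hypothetical. */
final class InliningDemo {
    public static void main(String[] args) {
        System.out.println("users=" + Client.users() + " sessions=" + Client.sessions());
    }
}

final class Limits {
    // A compile-time constant: javac copies the literal 10 into every class that reads it.
    // Shipping a new Limits.class with a different value does not affect already compiled callers.
    public static final int MAX_USERS = 10;

    // Not a compile-time constant, because the initializer is a method call.
    // Callers read this field at run time, so a patched Limits.class takes effect.
    public static final int MAX_SESSIONS = Integer.parseInt("20");
}

final class Client {
    static int users() {
        return Limits.MAX_USERS;    // compiled as the literal 10
    }

    static int sessions() {
        return Limits.MAX_SESSIONS; // compiled as a field read
    }
}
```

Exposing such values through a method, or initializing them with a non-constant expression as above, is one way to keep a constant patchable without recompiling every caller.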
You also want to be very careful of changing method signatures--you will get some very subtle bugs if you change a method signature and do not recompile all files that call that method--even if the calling java files don't actually need to change (for instance, change a parameter from an int to a long).
If you decide to go down this path, be ready for some really hard to debug errors (generally no traces or significant indications, just strange behavior like the number received not matching the one sent) on customer site that you cannot duplicate and a lot of pissed off customers.
Edit (too long for comment):
A binary diff of the class files might work, but I'd assume that some kind of version number or date gets compiled in and that they'd change a little on every compile for no reason; that could easily be tested, though.
You could take on some strict development practices, such as not using public final statics (make them private) and never changing method signatures (deprecate instead), but I'm not convinced that I know all the possible problems; I just know the ones we encountered.
Also binary diffs of the Jar files would be useless, you'd have to diff the classes and re-integrate them into the jars (doesn't sound easy to track)
Can you package your resources separately then minimize your code a bit? Pull out strings (Good for i18n)--I guess I'm just wondering if you could trim the class files enough to always do a full build/ship.
On the other hand, Sun seems to do an okay job of making class files that are completely compatible with the previous JRE release, so they must have guidelines somewhere.

You may want to see if Java WebStart can help you as it is designed to do exactly those things you want to do.
I know that the documentation describes how to create and do incremental updates, but we deploy the whole application as it changes very rarely. It is then an issue of updating the JNLP when ready.

How is it deployed?
On a local network I just leave everything as .class files in a folder. The startup script uses robocopy or rsync to copy from network share to local. If any .class file is different it is synced down. If not, it doesn't sync.
For a non-local network I created my own updater. It downloads a text file of md5sums and compares them to the local files. If a file differs, it pulls that file down over HTTP.
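The comparison step of such an updater might look like the sketch below, which also covers the add / replace / remove split asked about in the question. It is not the updater described above, just an illustration: a manifest is assumed to be a map from relative file path to MD5 hex string, and the class name PatchPlan is made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical patch planner: which files to add, replace, or remove. */
final class PatchPlan {
    final Set<String> toAdd = new TreeSet<>();
    final Set<String> toReplace = new TreeSet<>();
    final Set<String> toRemove = new TreeSet<>();

    /** Compares two manifests that map a relative file path to its checksum. */
    static PatchPlan compare(Map<String, String> installed, Map<String, String> release) {
        PatchPlan plan = new PatchPlan();
        for (Map.Entry<String, String> entry : release.entrySet()) {
            String have = installed.get(entry.getKey());
            if (have == null) {
                plan.toAdd.add(entry.getKey());
            } else if (!have.equals(entry.getValue())) {
                plan.toReplace.add(entry.getKey());
            }
        }
        for (String path : installed.keySet()) {
            if (!release.containsKey(path)) {
                plan.toRemove.add(path);
            }
        }
        return plan;
    }

    /** Builds a manifest for every regular file under dir, keyed by '/'-separated relative path. */
    static Map<String, String> manifest(Path dir) throws IOException, NoSuchAlgorithmException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, String> result = new TreeMap<>();
        for (Path file : files) {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest(Files.readAllBytes(file))) {
                hex.append(String.format("%02x", b));
            }
            result.put(dir.relativize(file).toString().replace('\\', '/'), hex.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: PatchPlan <installed-dir> <release-dir>");
            return;
        }
        PatchPlan plan = compare(manifest(Paths.get(args[0])), manifest(Paths.get(args[1])));
        System.out.println("add: " + plan.toAdd);
        System.out.println("replace: " + plan.toReplace);
        System.out.println("remove: " + plan.toRemove);
    }
}
```

MD5 is used here only because the answer mentions md5sums; it is fine for spotting changed files but should not be relied on to detect tampering.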

A long time ago, the way we solved this was to use the classpath and jar files. Our application was built as a jar file, and it had a launcher jar file. The launcher classpath had a patch.jar that was read into the classpath before the main application.jar. This meant that we could update the patch.jar to supersede any classes in the main application.
However, this was a long time ago. You may be better using something like the Java Web Start type of approach, which offers more seamless application updating.
