# Repo Diff Trees

repo_diff_trees.py compares two repo source trees and outputs reports on the
findings.

The output is in CSV format and is easily consumable in a spreadsheet.

In addition to importing to a spreadsheet, you can also create your own
Data Studio dashboard like [this one](https://datastudio.google.com/open/0Bz6OwjyDcWYDbDJoQWtmRl8telU).

If you wish to create your own dashboard, follow the instructions below:

1. Sync the two repo workspaces you wish to compare. Example:

```
mkdir android-8.0.0_r1
cd android-8.0.0_r1
repo init \
  --manifest-url=https://android.googlesource.com/platform/manifest \
  --manifest-branch=android-8.0.0_r1
# Adjust the number of parallel jobs to your needs
repo sync --current-branch --no-clone-bundle --no-tags --jobs=8
cd ..
mkdir android-8.0.0_r11
cd android-8.0.0_r11
repo init \
  --manifest-url=https://android.googlesource.com/platform/manifest \
  --manifest-branch=android-8.0.0_r11
# Adjust the number of parallel jobs to your needs
repo sync --current-branch --no-clone-bundle --no-tags --jobs=8
cd ..
```

2. Run repo_diff_trees.py. Example:

```
python repo_diff_trees.py --exclusions_file=android_exclusions.txt \
  android-8.0.0_r1 android-8.0.0_r11
```

3. Create a [new Google spreadsheet](https://docs.google.com/spreadsheets/create).
4. Import projects.csv to a new sheet.
5. Create a [new data source in Data Studio](https://datastudio.google.com/datasources/create).
6. Connect your new data source to the projects.csv sheet in the Google spreadsheet.
7. Add a "Count Diff Status" field by selecting the menu next to the "Diff
   Status" field and selecting "Count".
8. Copy the [Data Studio dashboard sample](https://datastudio.google.com/open/0Bz6OwjyDcWYDbDJoQWtmRl8telU).
   Make sure you are logged into your Google account and have agreed to Data Studio's terms of service. Once
   this is done you should get a link to "Make a copy of this report".
9. Select your own data source for your copy of the dashboard when prompted.
10. You may see a "Configuration Incomplete" message under
    the "Modified Projects" pie chart. To address this, select the pie chart,
    then replace the "Invalid Metric" field with "Count Diff Status".

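If you want a quick look at the same tally without Data Studio, the "Count Diff Status" metric from step 7 can be approximated with a short script. This is only a minimal sketch: it assumes projects.csv has a header row containing a column named "Diff Status" (the field name used in step 7), and the helper name `count_diff_status` is hypothetical, not part of repo_diff_trees.py.

```python
import csv
from collections import Counter


def count_diff_status(csv_path, column="Diff Status"):
    """Count how many rows carry each value of the given column.

    Assumption: the CSV file has a header row that contains `column`.
    """
    with open(csv_path, newline="") as csv_file:
        reader = csv.DictReader(csv_file)
        return Counter(row[column] for row in reader)
```

Calling `count_diff_status("projects.csv")` returns a `Counter` keyed by status value, which is roughly what the "Modified Projects" pie chart displays.
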
## Analysis method

repo_diff_trees.py goes through several stages when comparing two repo
source trees:

1. Match projects in source tree A with projects in source tree B.
2. Diff projects that have a match.
3. Find commits in source tree B that are not in source tree A.

The first two steps are self-explanatory. The method
of finding commits only in B is explained below.

## Finding commits not upstream

After matching up projects in both source trees
and diffing them, the last stage is to iterate
through each matching pair of projects and find
the commits that exist in the downstream project (B) but not in the
upstream project (A).

'git cherry' is a useful tool that finds changes
which exist in one branch but not another. It does so
not only by finding which commits were merged
to both branches, but also by matching cherry-picked
commits.

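As an illustration of how the 'git cherry' output can be consumed, here is a minimal sketch. 'git cherry' prefixes each commit with `+` when it found no equivalent change upstream and with `-` when it did. The function names and the `repo_dir`, `upstream` and `head` parameters are hypothetical and do not necessarily match what repo_diff_trees.py does internally.

```python
import subprocess


def parse_git_cherry(output):
    """Return the commit IDs that 'git cherry' marks with '+'.

    A '+' prefix means no equivalent change was found upstream;
    a '-' prefix means an equivalent change exists.
    """
    commits = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "+":
            commits.append(fields[1])
    return commits


def commits_without_equivalent(repo_dir, upstream, head):
    """Run 'git cherry <upstream> <head>' in repo_dir and keep '+' commits."""
    result = subprocess.run(
        ["git", "cherry", upstream, head],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_git_cherry(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible to check the parser against captured output without needing a git checkout.
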
However, there are many instances where a change in one branch
can have an equivalent in another branch without being a merge
or a cherry pick. Some examples are:

* Commits that were squashed with other commits
* Commits that were reauthored

'git cherry' will not recognize these commits as having an equivalent,
yet they clearly do.

This is addressed in three steps:

1. First we list the "git cherry" commits, which gives us the
   list of changes for which "git cherry" could not find an equivalent.
2. Then we "git blame" the entire project's source tree and compile
   a list of changes that actually have lines of code in the tree.
3. Finally we find the intersection: 'git cherry' changes
   that have lines of code in the final source tree.

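The blame and intersection steps above could look roughly like the sketch below. It is not the script's actual implementation: the function names are hypothetical, and it assumes 40-character SHA-1 commit IDs and that `git blame --porcelain` is used to attribute lines. `cherry_commits` stands for the list of `+` commits reported by 'git cherry'.

```python
import re
import subprocess

# In porcelain output, each blamed group starts with
# "<commit id> <original line> <final line> [<line count>]".
# File content lines are prefixed with a tab, so they never match.
_BLAME_HEADER = re.compile(r"^([0-9a-f]{40}) \d+ \d+")


def parse_blame_commits(porcelain_output):
    """Collect the commit IDs that own lines in 'git blame --porcelain' output."""
    commits = set()
    for line in porcelain_output.splitlines():
        match = _BLAME_HEADER.match(line)
        if match:
            commits.add(match.group(1))
    return commits


def blamed_commits(repo_dir, rev="HEAD"):
    """Blame every file tracked at `rev` and return the commits that own lines."""
    listing = subprocess.run(
        ["git", "ls-tree", "-r", "--name-only", "-z", rev],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    commits = set()
    for path in filter(None, listing.split("\0")):
        result = subprocess.run(
            ["git", "blame", "--porcelain", rev, "--", path],
            cwd=repo_dir, capture_output=True, text=True, errors="replace",
        )
        # Entries that cannot be blamed (for example submodules) are skipped.
        if result.returncode == 0:
            commits |= parse_blame_commits(result.stdout)
    return commits


def commits_with_surviving_lines(cherry_commits, blame_commits):
    """Keep the 'git cherry' commits that still own lines, in original order."""
    return [commit for commit in cherry_commits if commit in blame_commits]
```

Blaming every file is slow on large projects, so a real run would likely want to parallelize or cache this step.
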
## Caveats

The method described above has proven effective on Android
source trees. It does have shortcomings.

* It does not find commits that only delete lines of code.
* It does not take into account merge conflict resolutions.