cloc --git-diff-rel does not respond consistently when dealing with copy/rename #782

Open
EnricoPicci opened this issue Nov 1, 2023 · 3 comments


Describe the bug
This is the same issue reported in #780. I am not able to reopen that issue, so I am creating a new one.

cloc; OS; OS version

  • cloc version: 1.98
  • OS: Linux (Ubuntu)
  • OS version: 22.04

To Reproduce
Clone the repo https://github.com/EnricoPicci/git-metrics.
Then run the following command repeatedly from inside the cloned repo:
cloc --git-diff-rel --csv --by-file --timeout=10 --quiet 6fb8624bad8d62ee14da5c7a527c786b301f7529^1 6fb8624bad8d62ee14da5c7a527c786b301f7529

What I get on my machine (running Ubuntu 22.04.2) is either a list of 11 differences or (less often) a list of 16 differences.

Expected result
The result of cloc --git-diff-rel should always be the same for the same pair of commits.

Additional context
I have written this Node.js script to run the command repeatedly. The script exits as soon as the result of the latest run differs from the result of the previous run.

var child_process = require("child_process");
var fs = require("fs");

var previousLength = 0;
var i = 0;
var maxNumberOfIterations = 100;

// Every 100 ms, run the cloc command and compare the number of output lines with the previous run.
setInterval(function () {
    console.log('iteration ' + i);
    console.log('previousLength ' + previousLength);
    child_process.execSync('cloc --git-diff-rel --csv --by-file --timeout=10 --quiet 6fb8624bad8d62ee14da5c7a527c786b301f7529^1 6fb8624bad8d62ee14da5c7a527c786b301f7529 > out.csv');
    var lastLength = fs.readFileSync('out.csv', 'utf8').split('\n').length;
    // Exit with an error as soon as two consecutive runs differ.
    if (previousLength && lastLength !== previousLength) {
        console.log('the command has not returned the same number of lines: once it was ' + previousLength + ' and now it is ' + lastLength);
        process.exit(1);
    }
    i++;
    // Exit successfully once the command has run maxNumberOfIterations times.
    if (i === maxNumberOfIterations) {
        console.log('the command has returned the same number of lines every time after ' + i + ' iterations');
        process.exit(0);
    }
    previousLength = lastLength;
}, 100);

Attached below are both results I get: the one that returns 11 records and the one that returns 16 records.

out_11.csv
out_16.csv
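
To put a number on how often each outcome occurs, rather than stopping at the first mismatch, the runs can be tallied. The following is a minimal sketch and not part of the original script: `tallyOutputs` and `runCloc` are hypothetical helper names, and the line count is used as the key only because that is what the script above compares.

```javascript
const childProcess = require("child_process");

// Hypothetical helper: call `run` `times` times and count how often each
// outcome occurs. By default an outcome is identified by its number of lines,
// the same measure the script above uses; pass `keyOf` to compare differently.
function tallyOutputs(run, times, keyOf) {
    const toKey = keyOf || function (output) { return output.split("\n").length; };
    const tally = new Map();
    for (let i = 0; i < times; i++) {
        const key = toKey(run());
        tally.set(key, (tally.get(key) || 0) + 1);
    }
    return tally;
}

// Hypothetical helper: run the cloc command from this report and return its
// CSV output as a string. Must be called from inside the cloned repo.
function runCloc() {
    return childProcess.execSync(
        "cloc --git-diff-rel --csv --by-file --timeout=10 --quiet " +
        "6fb8624bad8d62ee14da5c7a527c786b301f7529^1 6fb8624bad8d62ee14da5c7a527c786b301f7529",
        { encoding: "utf8" }
    );
}

// Usage, from inside the cloned repo:
//   for (const [lines, count] of tallyOutputs(runCloc, 100)) {
//       console.log(count + " run(s) returned " + lines + " lines");
//   }
```

Because `run` is passed in as a function, the tallying logic can be tried with a stand-in before pointing it at cloc.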

Owner

AlDanial commented Nov 3, 2023

Thanks for the test repo. I confirmed the inconsistent behavior. The 16-record case happens rarely, in fewer than 5% of runs, so this will be a challenge to debug.

AlDanial added a commit that referenced this issue Jan 5, 2024
Inconsistent removal of // in get_leading_dirs() from
paths created by File::Temp::tempdir() caused about 3%
of invocations to cause faulty file pair alignments.
As a result diff computations would vary between runs
and/or show duplicate files in the output.
Owner

AlDanial commented Jan 5, 2024

This bug took many, many hours to resolve! Thanks for submitting this issue and especially for the repo to reproduce it. The root cause was related to stripping temporary directory names from the locations where the git contents were expanded. In a small percentage of cases the leading directory was retained, which messed up the file pair alignment and led to incorrect added/modified/deleted findings.

I ran 1,000 cases with the fix and all results were consistent. Give 8368d3f a try.
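
For readers unfamiliar with this class of bug, here is a rough illustration of how a stray `//` in a path can defeat prefix stripping and break file pairing. It is a sketch in JavaScript under stated assumptions, not cloc's code: cloc is written in Perl, the helper names and example paths below are hypothetical, and the actual logic in get_leading_dirs() may differ.

```javascript
// Hypothetical helper: remove the temporary directory prefix from a file path
// with a plain string comparison. Returns the path unchanged if the prefix
// does not match exactly.
function stripPrefixNaive(tempDir, filePath) {
    const prefix = tempDir + "/";
    return filePath.startsWith(prefix) ? filePath.slice(prefix.length) : filePath;
}

// Collapse any run of consecutive slashes into a single slash.
function collapseSlashes(path) {
    return path.replace(/\/{2,}/g, "/");
}

// Same stripping, but with both inputs normalized first.
function stripPrefixNormalized(tempDir, filePath) {
    return stripPrefixNaive(collapseSlashes(tempDir), collapseSlashes(filePath));
}

// Made-up example paths: the same relative file expanded into two temp dirs,
// where one path happens to contain a doubled slash.
const leftKey = stripPrefixNaive("/tmp/AbC123", "/tmp/AbC123/src/index.ts");
const rightKeyNaive = stripPrefixNaive("/tmp/XyZ789", "/tmp//XyZ789/src/index.ts");
const rightKeyNormalized = stripPrefixNormalized("/tmp/XyZ789", "/tmp//XyZ789/src/index.ts");

console.log(leftKey === rightKeyNaive);      // keys differ, so the pair is not aligned
console.log(leftKey === rightKeyNormalized); // keys match after normalization
```

In the naive case the right-hand key keeps its leading directory, so a diff keyed on these strings would treat one file as two unrelated files, which is consistent with the duplicate entries described in the commit message.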

@EnricoPicci
Author

Thanks for fixing.
My tests now show no problems at all.
And thanks a lot for the great tool you have built.
Have a nice evening (in my time zone it is late evening).
Ciao
Enrico
