Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect Co-authored-by trailers in the contributors graph #30925

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

kemzeb
Copy link
Contributor

@kemzeb kemzeb commented May 9, 2024

Resolves #29183.

Added an automated test (and refactored the test code around it) and also manually tested the Contributors, Commit Frequency, and Recent Commits pages in a few small repos and the output appears to be fine.

@GiteaBot GiteaBot added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label May 9, 2024
@pull-request-size pull-request-size bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 9, 2024
@github-actions github-actions bot added the modifies/go Pull requests that update Go code label May 9, 2024
@kemzeb
Copy link
Contributor Author

kemzeb commented May 9, 2024

Now that I am looking at into it further, I think Pulse may need to account for co-authors too. I think this function is the one that needs updating

}

email = strings.TrimRight(parts[1], ">")
if _, err := mail.ParseAddress(email); err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since users can add anything between the <> brackets I believed that we could try to throw away lines that contain invalid email addresses.

Though while testing by creating blank repos and merging the commit history of existing projects into these repos I observed that some emails associated to bots will be rejected (e.g. the pre-commit bot on GitHub). I guess this happens because these emails don't follow RFC 5322. Not sure if I should do this validation check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on how we parse other trailer lines I suppose. I would make it consistent with those.

@kemzeb
Copy link
Contributor Author

kemzeb commented May 9, 2024

Hmm I'm not sure how common it is, but should I account for duplicate Co-authored-by entries i.e. entries that have the same email in a single commit?

@lunny lunny added this to the 1.23.0 milestone May 10, 2024
@@ -125,8 +127,7 @@ func getExtendedCommitStats(repo *git.Repository, revision string /*, limit int
_ = stdoutWriter.Close()
}()

gitCmd := git.NewCommand(repo.Ctx, "log", "--shortstat", "--no-merges", "--pretty=format:---%n%aN%n%aE%n%as", "--reverse")
// AddOptionFormat("--max-count=%d", limit)
gitCmd := git.NewCommand(repo.Ctx, "log", "--shortstat", "--no-merges", "--pretty=format:---%n%aN%n%aE%n%as%n%(trailers:key=Co-authored-by,valueonly=true)", "--reverse")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked compat on this trailers option and it does appear to be present in git 2.27 at least which is the oldest version where git's site has docs for, so it's likely fine to use:

https://git-scm.com/docs/git-log/2.27.0

@kemzeb
Copy link
Contributor Author

kemzeb commented May 11, 2024

Recent change made the co-author parsing logic more robust to inputs where the co-author's name may include a '<'.

I manually tested all the code paths for parseCoAuthorTrailerValue() and they appear fine. Even though it is unexported, should I just include unit tests for it since the other test I added doesn't cover all its paths?

@silverwind
Copy link
Member

Hmm I'm not sure how common it is, but should I account for duplicate Co-authored-by entries i.e. entries that have the same email in a single commit?

I guess duplicates (by email) should count as one.

@silverwind
Copy link
Member

silverwind commented May 13, 2024

Recent change made the co-author parsing logic more robust to inputs where the co-author's name may include a '<'.

I manually tested all the code paths for parseCoAuthorTrailerValue() and they appear fine. Even though it is unexported, should I just include unit tests for it since the other test I added doesn't cover all its paths?

I'm reminded of a recent vulnerability GitHub had in their commit message parser surrounding these meta lines. Hope we don't replicate it, maybe you can double-check and add a test.

@silverwind
Copy link
Member

BTW is it feasible to move all commit message parsing into a shared function? As I see it, parsing these Co-Authored-By lines is not specific to Commit graph and could be useful elsewhere too.

@pull-request-size pull-request-size bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 14, 2024
@kemzeb
Copy link
Contributor Author

kemzeb commented May 14, 2024

Recent changes exports the trailer parsing logic and adds adds parameterized tests for it. We now also handle duplicated Co-authored-by entries (i.e. entries that have the same email).

@kemzeb
Copy link
Contributor Author

kemzeb commented May 14, 2024

I'm reminded of a recent vulnerability GitHub had in their commit message parser surrounding these meta lines. Hope we don't replicate it, maybe you can double-check and add a test.

Nice read! It appears it is caused by their parser's use of regex, where it iterates a commit's metadata lines looking for its first match with a line that starts with author. From what I understand the main problem is that you can author a commit without a name (while working in a Codespaces environment) where the parser would mistakenly not catch; it would then continue iterating over the commit metadata and eventually find a false author line that was added by the attacker.

I have included some tests for my parser and I'm open to include more possible inputs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. modifies/go Pull requests that update Go code size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

contributors graph respect "Co-authored-by"
4 participants