Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: DO NOT MERGE: patching: rewrite: try to stabilize patch index stanzas as well as From lines #6455

Conversation

rpardini
Copy link
Member

@rpardini rpardini commented Apr 1, 2024

RFC: DO NOT MERGE: patching: rewrite: try to stabilize patch index stanzas as well as From lines

  • RFC: DO NOT MERGE: patching: rewrite: try to stabilize patch index stanzas as well as From lines
    • git format-patch --zero-commit doesn't affect index xxx...yyy lines, only From:
      • so use the classy "use a regex with a callback" solution as git format-patch doesn't offer one
    • this will make all patches change when rewritten, but hopefully for the last time !
    • not sure what other impacts this might have though - git am might not work any more?
    • we need to preserve index 000000000000..xxx as zeros, which indicate new file creation, thus:
      • new file creations are rewritten as index 000000000000..111111111111
      • non-creations are rewritten as index 111111111111..222222222222
  • DO NOT MERGE: Example rewrite of a patch with both file change & file creation

@github-actions github-actions bot added size/small PR with less then 50 lines Hardware Hardware related - kernel, u-boot, patches labels Apr 1, 2024
@rpardini
Copy link
Member Author

rpardini commented Apr 1, 2024

For awareness, people I know have an interest in this: @SteeManMI (suffered from rewriting) @amazingfate (introduced fixes 12-char index) @ColorfulRhino (suffered), possibly more?

Check the diff on the patch in this PR -- yes, all our patch indexes would change, but it would be "hopefully" for the last time. 🤔

Either way please don't merge -- if discussions prove positive I'll send the Python change separately and then we can do a batch of rewrites to equalize all.

@ColorfulRhino
Copy link
Collaborator

ColorfulRhino commented Apr 2, 2024

Interesting script!
But what advantage do we have in removing the index hashes?

From https://git-scm.com/docs/git-diff#generate_patch_text_with_p :

index <hash>..<hash> <mode>
[...]
The index line includes the blob object names before and after the change. The is included if the file mode does not change; otherwise, separate lines indicate the old and the new mode.

Which means the index hashes are hashes from the actual file names, as in not the file name of the patch, but the file name of the file which is being patched. They should stay constant unless the file name changes or a new Git version adds more stuff to the hash as seen here:

-index 0000000000..4899e23fa3
+index 000000000000..4899e23fa3aa

@SteeManMI
Copy link
Contributor

I can't comment on the implementation here, but I will comment on the need for addressing this issue. As the code currently stands, these patch rewrites generate a lot of noise in the diffs that I feel potentially obscures any real changes. When scrolling through all of the changes, it is easy to pass over something small during a review.

Also, even if this causes a one-time big hit on every patch, it will be worth it in the long run by having more readable/reviewable diffs.

Now I leave it to the rest of you to figure out the best implementation.

@ColorfulRhino
Copy link
Collaborator

I can't comment on the implementation here, but I will comment on the need for addressing this issue. As the code currently stands, these patch rewrites generate a lot of noise in the diffs that I feel potentially obscures any real changes. When scrolling through all of the changes, it is easy to pass over something small during a review.

Also, even if this causes a one-time big hit on every patch, it will be worth it in the long run by having more readable/reviewable diffs.

I agree.
But the issue are also new patches coming in. I think we need to automate this, e.g. have a GH Actions script check on each PR if a patch is included/added/modified and automatically format it in the correct "Armbian" way.

@rpardini
Copy link
Member Author

rpardini commented Apr 2, 2024

Thanks for the feedback.
I realize this is incomplete, since there is (at least) a 3rd case I need to handle: file deletions.
Patches can delete files (not very common) and I dunno which index xxx..zzzz invocation is required, but I'm pretty sure it would be messed up with this.

also new patches coming in. I think we need to automate this

true, but a separate problem. Also, I'd like to allow even the simplest of patches to come in -- lower barrier to entry -- and then be rewritten, eg in a dependabot-style, instead of blocking.

@rpardini rpardini force-pushed the pr/RFC-DO-NOT-MERGE-patching-rewrite-try-to-stabilize-patch-index-stanzas-as-well-as-From-lines branch from ce437c4 to c2742ec Compare April 2, 2024 18:09
@ColorfulRhino
Copy link
Collaborator

For now I honestly still don't see how "removing" the file hashes would benefit us 😅

For example

diff --git a/sound/soc/codecs/rk3308_codec.h b/sound/soc/codecs/rk3308_codec.h
index 6cfa69157785..93e089dae081 100644

The has 6cfa69157785 is simply hashed from the text a/sound/soc/codecs/rk3308_codec.h and 93e089dae081 is hashes from b/sound/soc/codecs/rk3308_codec.h as far as I can understand. As they say The index line includes the blob object names before and after the change.

true, but a separate problem. Also, I'd like to allow even the simplest of patches to come in -- lower barrier to entry -- and then be rewritten, eg in a dependabot-style, instead of blocking.

Yes, that's what I meant :) I'm not sure if Actions can modify a commit in a PR (e.g. modify a commit with a patch into the correct Armbian format) or if there needs to be a second commit afterwards (creating noise, but making it more understandable)

@rpardini
Copy link
Member Author

rpardini commented Apr 2, 2024

simply hashed from the text

No -- it comes from the git revisions's SHA1 in git from where we export the patches.

Thus they change all the time, creating (for us) "noise".

That "noise" would be very useful for a kernel developer (working on the mailing list) for example.

@rpardini
Copy link
Member Author

rpardini commented Apr 2, 2024

I'm not sure if Actions can modify a commit in a PR (e.g. modify a commit with a patch into the correct Armbian format)

It depends if the sender of the PR checked the "allow edits by maintainer & secrets" checkbox.

@ColorfulRhino
Copy link
Collaborator

No -- it comes from the git revisions's SHA1 in git from where we export the patches.

Thus they change all the time, creating (for us) "noise".

Are you sure? I just created 3 test files test1.txt (content: one), test2.txt test3.txt (content of both: two_and_three).

git diff D:\Documents\test1.txt D:\Documents\test2.txt
diff --git "a/D:\\Documents\\test1.txt" "b/D:\\Documents\\test2.txt"
index 43dd47e..4c59102 100644
--- "a/D:\\Documents\\test1.txt"
+++ "b/D:\\Documents\\test2.txt"
@@ -1 +1 @@
-one
\ No newline at end of file
+two_and_three
\ No newline at end of file
git diff D:\Documents\test1.txt D:\Documents\test3.txt
diff --git "a/D:\\Documents\\test1.txt" "b/D:\\Documents\\test3.txt"
index 43dd47e..4c59102 100644
--- "a/D:\\Documents\\test1.txt"
+++ "b/D:\\Documents\\test3.txt"
@@ -1 +1 @@
-one
\ No newline at end of file
+two_and_three
\ No newline at end of file

Change the content of test2.txt to only_two:

git diff D:\Documents\test1.txt D:\Documents\test2.txt
diff --git "a/D:\\Documents\\test1.txt" "b/D:\\Documents\\test2.txt"
index 43dd47e..6ab791f 100644
--- "a/D:\\Documents\\test1.txt"
+++ "b/D:\\Documents\\test2.txt"
@@ -1 +1 @@
-one
\ No newline at end of file
+only_two
\ No newline at end of file
git diff D:\Documents\test1.txt D:\Documents\test3.txt
diff --git "a/D:\\Documents\\test1.txt" "b/D:\\Documents\\test3.txt"
index 43dd47e..4c59102 100644
--- "a/D:\\Documents\\test1.txt"
+++ "b/D:\\Documents\\test3.txt"
@@ -1 +1 @@
-one
\ No newline at end of file
+two_and_three
\ No newline at end of file

By looking at the two numbers after index, it looks to me like it's a hash of the file's content (not its name).
Maybe that's what you mwant and I midunderstood 😅

I found the implementation here: https://github.com/git/git/blob/c2cbfbd2e28cbe27c194d62183b42f27a6a5bb87/diff.c#L4504C3-L4506C44

strbuf_addf(msg, "%s%sindex %s..%s", line_prefix, set,
			    diff_abbrev_oid(&one->oid, abbrev),
			    diff_abbrev_oid(&two->oid, abbrev));

[...]

static const char *diff_abbrev_oid(const struct object_id *oid, int abbrev)
{
	if (startup_info->have_repository)
		return repo_find_unique_abbrev(the_repository, oid, abbrev);
	else {
		char *hex = oid_to_hex(oid);
		if (abbrev < 0)
			abbrev = FALLBACK_DEFAULT_ABBREV;
		if (abbrev > the_hash_algo->hexsz)
			BUG("oid abbreviation out of range: %d", abbrev);
		if (abbrev)
			hex[abbrev] = '\0';
		return hex;
	}
}

@rpardini
Copy link
Member Author

rpardini commented Apr 3, 2024

if (startup_info->have_repository)

I guess you're not hitting this case.

@rpardini rpardini force-pushed the pr/RFC-DO-NOT-MERGE-patching-rewrite-try-to-stabilize-patch-index-stanzas-as-well-as-From-lines branch from c2742ec to 929bf04 Compare April 3, 2024 16:28
@rpardini
Copy link
Member Author

rpardini commented Apr 3, 2024

Sorry to pester, @amazingfate -- what do you think about this? (Either way I need to test/fix file deletions done in patches, but I won't waste time if there's no general agreement on the "idea".)

@ColorfulRhino
Copy link
Collaborator

Hmm I think if it does create noise, this PR is good. I just haven't seen the index line change much in the patches, which is why I thought it was the file hash. I'll have a closer look on past patches soon, to see if the index is changing all the time.

@ColorfulRhino
Copy link
Collaborator

ColorfulRhino commented Apr 4, 2024

I saw it changing now. But it looks like the index only changes when the base file changes as well, when the hunks also change. If that's the case, then I don't think this is "bad" noise. It would be bad if the hunks don't change, but the index does.

Edit: I found one occasion where index got changed even though hunks were not changed.

@rpardini rpardini force-pushed the pr/RFC-DO-NOT-MERGE-patching-rewrite-try-to-stabilize-patch-index-stanzas-as-well-as-From-lines branch from 929bf04 to a2795f7 Compare April 5, 2024 23:29
…stanzas as well as From lines

- `git format-patch --zero-commit` doesn't affect `index xxx...yyy` lines, only `From: `
  - so use the _classy_ "use a regex with a callback" solution as git format-patch doesn't offer one
- this will make _all_ patches change when rewritten, but hopefully _for the last time_ !
- not sure what other impacts this might have though - `git am` might not work any more?
- we need to preserve `index 000000000000..xxx` as zeros, which indicate new file creation, thus:
  - new file creations are rewritten as `index 000000000000..111111111111`
  - non-creations are rewritten as `index 111111111111..222222222222`
@rpardini rpardini force-pushed the pr/RFC-DO-NOT-MERGE-patching-rewrite-try-to-stabilize-patch-index-stanzas-as-well-as-From-lines branch from a2795f7 to 3aa4733 Compare April 27, 2024 09:52
@rpardini
Copy link
Member Author

I realize this is incomplete, since there is (at least) a 3rd case I need to handle: file deletions.
Patches can delete files (not very common) and I dunno which index xxx..zzzz invocation is required, but I'm pretty sure it would be messed up with this.

This turned out untrue. File deletions don't have an index stanza at all. Thus this seems to actually complete.

rpardini added a commit to armsurvivors/armbian-build that referenced this pull request May 18, 2024
…rom lines

- `git format-patch --zero-commit` doesn't affect `index xxx...yyy` lines, only `From: `
  - so use the _classy_ "use a regex with a callback" solution as git format-patch doesn't offer one
- this will make _all_ patches change when rewritten, but hopefully _for the last time_ !
- we need to preserve `index 000000000000..xxx` as zeros, which indicate new file creation, thus:
  - new file creations are rewritten as `index 000000000000..111111111111`
  - non-creations are rewritten as `index 111111111111..222222222222`
- this is the final version of armbian#6455
rpardini added a commit to rpardini/armbian-build that referenced this pull request May 18, 2024
…rom lines

- `git format-patch --zero-commit` doesn't affect `index xxx...yyy` lines, only `From: `
  - so use the _classy_ "use a regex with a callback" solution as git format-patch doesn't offer one
- this will make _all_ patches change when rewritten, but hopefully _for the last time_ !
- we need to preserve `index 000000000000..xxx` as zeros, which indicate new file creation, thus:
  - new file creations are rewritten as `index 000000000000..111111111111`
  - non-creations are rewritten as `index 111111111111..222222222222`
- this is the final version of armbian#6455
@rpardini
Copy link
Member Author

Sent the can-merge version of this in #6623
Thanks all for their input

@rpardini rpardini closed this May 18, 2024
rpardini added a commit to armsurvivors/armbian-build that referenced this pull request May 18, 2024
…rom lines

- `git format-patch --zero-commit` doesn't affect `index xxx...yyy` lines, only `From: `
  - so use the _classy_ "use a regex with a callback" solution as git format-patch doesn't offer one
- this will make _all_ patches change when rewritten, but hopefully _for the last time_ !
- we need to preserve `index 000000000000..xxx` as zeros, which indicate new file creation, thus:
  - new file creations are rewritten as `index 000000000000..111111111111`
  - non-creations are rewritten as `index 111111111111..222222222222`
- this is the final version of armbian#6455
rpardini added a commit to armsurvivors/armbian-build that referenced this pull request May 19, 2024
…rom lines

- `git format-patch --zero-commit` doesn't affect `index xxx...yyy` lines, only `From: `
  - so use the _classy_ "use a regex with a callback" solution as git format-patch doesn't offer one
- this will make _all_ patches change when rewritten, but hopefully _for the last time_ !
- we need to preserve `index 000000000000..xxx` as zeros, which indicate new file creation, thus:
  - new file creations are rewritten as `index 000000000000..111111111111`
  - non-creations are rewritten as `index 111111111111..222222222222`
- this is the final version of armbian#6455
rpardini added a commit to armsurvivors/armbian-build that referenced this pull request May 20, 2024
…rom lines

- `git format-patch --zero-commit` doesn't affect `index xxx...yyy` lines, only `From: `
  - so use the _classy_ "use a regex with a callback" solution as git format-patch doesn't offer one
- this will make _all_ patches change when rewritten, but hopefully _for the last time_ !
- we need to preserve `index 000000000000..xxx` as zeros, which indicate new file creation, thus:
  - new file creations are rewritten as `index 000000000000..111111111111`
  - non-creations are rewritten as `index 111111111111..222222222222`
- this is the final version of armbian#6455
rpardini added a commit to rpardini/armbian-build that referenced this pull request May 21, 2024
…rom lines

- `git format-patch --zero-commit` doesn't affect `index xxx...yyy` lines, only `From: `
  - so use the _classy_ "use a regex with a callback" solution as git format-patch doesn't offer one
- this will make _all_ patches change when rewritten, but hopefully _for the last time_ !
- we need to preserve `index 000000000000..xxx` as zeros, which indicate new file creation, thus:
  - new file creations are rewritten as `index 000000000000..111111111111`
  - non-creations are rewritten as `index 111111111111..222222222222`
- this is the final version of armbian#6455
rpardini added a commit to armsurvivors/armbian-build that referenced this pull request May 21, 2024
…rom lines

- `git format-patch --zero-commit` doesn't affect `index xxx...yyy` lines, only `From: `
  - so use the _classy_ "use a regex with a callback" solution as git format-patch doesn't offer one
- this will make _all_ patches change when rewritten, but hopefully _for the last time_ !
- we need to preserve `index 000000000000..xxx` as zeros, which indicate new file creation, thus:
  - new file creations are rewritten as `index 000000000000..111111111111`
  - non-creations are rewritten as `index 111111111111..222222222222`
- this is the final version of armbian#6455
rpardini added a commit to armsurvivors/armbian-build that referenced this pull request May 21, 2024
…rom lines

- `git format-patch --zero-commit` doesn't affect `index xxx...yyy` lines, only `From: `
  - so use the _classy_ "use a regex with a callback" solution as git format-patch doesn't offer one
- this will make _all_ patches change when rewritten, but hopefully _for the last time_ !
- we need to preserve `index 000000000000..xxx` as zeros, which indicate new file creation, thus:
  - new file creations are rewritten as `index 000000000000..111111111111`
  - non-creations are rewritten as `index 111111111111..222222222222`
- this is the final version of armbian#6455
rpardini added a commit to armsurvivors/armbian-build that referenced this pull request May 22, 2024
…rom lines

- `git format-patch --zero-commit` doesn't affect `index xxx...yyy` lines, only `From: `
  - so use the _classy_ "use a regex with a callback" solution as git format-patch doesn't offer one
- this will make _all_ patches change when rewritten, but hopefully _for the last time_ !
- we need to preserve `index 000000000000..xxx` as zeros, which indicate new file creation, thus:
  - new file creations are rewritten as `index 000000000000..111111111111`
  - non-creations are rewritten as `index 111111111111..222222222222`
- this is the final version of armbian#6455
rpardini added a commit to armsurvivors/armbian-build that referenced this pull request May 27, 2024
…rom lines

- `git format-patch --zero-commit` doesn't affect `index xxx...yyy` lines, only `From: `
  - so use the _classy_ "use a regex with a callback" solution as git format-patch doesn't offer one
- this will make _all_ patches change when rewritten, but hopefully _for the last time_ !
- we need to preserve `index 000000000000..xxx` as zeros, which indicate new file creation, thus:
  - new file creations are rewritten as `index 000000000000..111111111111`
  - non-creations are rewritten as `index 111111111111..222222222222`
- this is the final version of armbian#6455
rpardini added a commit to rpardini/armbian-build that referenced this pull request May 28, 2024
…rom lines

- `git format-patch --zero-commit` doesn't affect `index xxx...yyy` lines, only `From: `
  - so use the _classy_ "use a regex with a callback" solution as git format-patch doesn't offer one
- this will make _all_ patches change when rewritten, but hopefully _for the last time_ !
- we need to preserve `index 000000000000..xxx` as zeros, which indicate new file creation, thus:
  - new file creations are rewritten as `index 000000000000..111111111111`
  - non-creations are rewritten as `index 111111111111..222222222222`
- this is the final version of armbian#6455
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Hardware Hardware related - kernel, u-boot, patches size/small PR with less then 50 lines
Development

Successfully merging this pull request may close these issues.

None yet

3 participants