Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend sys/bitarray API with set-bit counting, xor, nth-bit-set #72901

Merged
merged 3 commits into from
May 21, 2024

Conversation

luqasn
Copy link
Contributor

@luqasn luqasn commented May 16, 2024

Hello there!

I am currently working on integrating a memory-optimized forward error correction algorithm for LoRaWAN "natively" in Zephyr instead of relying on the external module that is currently being used.

For that, I need some more bitarray methods than are currently available.
In order to make review more enjoyable, I have added the three additional methods I need in this first PR and will then follow up with a second PR adding the new forward error correction algorithm that uses them.

This is my first Zephyr contribution and I am very much looking forward to your feedback.
Thanks for having a look!

This is part one of several changes to add more methods to the bitarray api
so that it can be used for broader usecases, specifically LoRaWAN forward
error correction.

Signed-off-by: Lucas Romero <luqasn@gmail.com>
Copy link

Hello @luqasn, and thank you very much for your first pull request to the Zephyr project!
Our Continuous Integration pipeline will execute a series of checks on your Pull Request commit messages and code, and you are expected to address any failures by updating the PR. Please take a look at our commit message guidelines to find out how to format your commit messages, and at our contribution workflow to understand how to update your Pull Request. If you haven't already, please make sure to review the project's Contributor Expectations and update (by amending and force-pushing the commits) your pull request if necessary.
If you are stuck or need help please join us on Discord and ask your question there. Additionally, you can escalate the review when applicable. 😊

Copy link
Member

@cfriedt cfriedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there are still some failing tests

andyross
andyross previously approved these changes May 17, 2024
Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems clean, though sort of weirdly non-orthogonal. The original use case for this code was a synchronized slab-ish allocator. Not sure what xor or popcount are for. But they fit and don't make anything worse, so no reason not to merge.

@luqasn
Copy link
Contributor Author

luqasn commented May 17, 2024

Thanks for the reviews and sorry for the oversight in the tests.

I added the fix as a new commit as to not have you re-review everything, but please let me know if you want me to fix up the original commit instead.

@andyross I agree that it felt only half-fitting, but still better than having two bitarray things that are slightly different in usage.

@luqasn
Copy link
Contributor Author

luqasn commented May 17, 2024

I fixed the test that was failing because of a real implementation gap, but now I got a target dependent failure that I don't understand.

Here is my interpretation:

The tests seem to fail only on mobile cpus due to a check when unlocking a spinlock.
It tries to check if the unlocking cpu is the same as the cpu holding the lock.
But while trying to do that it fails on an assert because apparently getting the current cpu is not supported if z_smp_cpu_mobile?
What baffles me is that all other tests pass happily even though they also use spinlocks in the same way.
The only difference I can see is that the method under test for the failure uses two spinlocks instead of one, but that does not seem to be the root of the problem.

Can you help shed some light on this?

@andyross
Copy link
Contributor

But while trying to do that it fails on an assert because apparently getting the current cpu is not supported if z_smp_cpu_mobile?

Something got borked. The assert there is to detect common race conditions with the use of the "current CPU" in contexts where the current thread might migrate to another CPU. Basically it wants to see that you're in an ISR or that interrupts are masked locally.

If you're holding a spinlock (because you're trying to unlock one) then interrupts really should be masked. That they aren't implies that something went wrong. Most likely you have a mismatched lock somehow, but if you're sure the locking is correct then something very weird is going on. What platform is this failing on?

@luqasn
Copy link
Contributor Author

luqasn commented May 17, 2024

Thanks for your help!
It is failing on some platforms but not on others. One that I looked at the GHA run for was Cortex A53.

I think I will start by isolating the problematic commit/test into a new PR to rule out that one of the test that might run before it are the problem.

And then I'll have to stare long and hard at the code to see if my locking is weird somewhere.

Already did that some and my first idea would be to change the order of the double unlock to unlock in reverse order of the locking to see if that helps. Feels more "correct" anyways.

Comment on lines +221 to +224
key = k_spin_lock(&bitarray->lock);

__ASSERT_NO_MSG(bitarray != NULL);
__ASSERT_NO_MSG(bitarray->num_bits > 0);
Copy link
Collaborator

@TaiJuWu TaiJuWu May 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a logical issue in my mind.
You should check bitarray is not empty before bitarray->lock.

I know other function is written as you, but I infer it is a mistake.
Maybe we need others to confirm.

There is another reason I think it is a mistake.
Consider follow scenario:
If a thread take a lock and then assert, the lock will be held forever.

Copy link
Contributor Author

@luqasn luqasn May 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review!
Yes, I just copy/pasted the structure from the existing methods.

I see what you mean and agree that it is wrong in theory, but I am not sure how important it is.
I think the process will quit/kernel will panic anyways, either with a SEGFAULT or with an assertion error.
And if that's the case, it doesn't matter that the lock isn't released.

But if the others agree, I can change all locking in this file to be more theoretically correct.

@luqasn
Copy link
Contributor Author

luqasn commented May 18, 2024

Glad to report that changing the unlock order to be in reverse lock order indeed fixed the issue. 🥳

@cfriedt
Copy link
Member

cfriedt commented May 20, 2024

@luqasn - CI is passing \o/

The only remaining thing to do is convert your last two commits to fixups and squash as Andy mentioned.

something like this should do it (please check my commit math though)

git reset HEAD~2
# add just the find-nth changes
git commit --fixup d85be1c
# add just the xor changes
git commit --fixup 52b7daa
git rebase -i --autosquash HEAD~5

This is part one of several changes to add more methods to the bitarray api
so that it can be used for broader usecases, specifically LoRaWAN forward
error correction.

Signed-off-by: Lucas Romero <luqasn@gmail.com>
This is part one of several changes to add more methods to the bitarray api
so that it can be used for broader usecases, specifically LoRaWAN forward
error correction.

Signed-off-by: Lucas Romero <luqasn@gmail.com>
@luqasn
Copy link
Contributor Author

luqasn commented May 21, 2024

Thanks for all the feedback and help, I've squashed and rebased the fixes into their respective commits.

Ready to merge from my side, but what do the others think about @TaiJuWu's remarks?
Should I take care of the erroneous "struct access before NULL check" in the whole file or not?
If yes: in this PR or in a new one?

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm out of complaints, thanks

Copy link
Member

@cfriedt cfriedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - kind of foolish of me to miss this, but we will definitely need tests for each new function as well.

Ideally they would exercise a meaningful permutation of input parameters for each function, even invalid inputs like NULL, etc, but generally we want to exercise as many code paths as possible for coverage.

Usually the order of commits is important.

  1. feature A
  2. test A
  3. feature B
  4. test B

@martinjaeger
Copy link
Member

@cfriedt There are tests for each new function, just not added as extra commits, but as part of the commit with the added function itself. I think that's fine.

Copy link
Member

@cfriedt cfriedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry - now I see the tests. Normally we prefer tests in separate commits, but this should be ok

@nashif nashif merged commit 3f50197 into zephyrproject-rtos:main May 21, 2024
29 checks passed
Copy link

Hi @luqasn!
Congratulations on getting your very first Zephyr pull request merged 🎉🥳. This is a fantastic achievement, and we're thrilled to have you as part of our community!

To celebrate this milestone and showcase your contribution, we'd love to award you the Zephyr Technical Contributor badge. If you're interested, please claim your badge by filling out this form: Claim Your Zephyr Badge.

Thank you for your valuable input, and we look forward to seeing more of your contributions in the future! 🪁

Comment on lines +248 to +249
*count += POPCOUNT(bitarray->bundles[bd.sidx] & bd.smask);
*count += POPCOUNT(bitarray->bundles[bd.eidx] & bd.emask);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, would this work on all supported toolchains? It looks like the POPCOUNT macro is only defined in toolchain/gcc.h, and not in toolchain/llvm.h, etc.?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants