Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect memory alignment of components. #478

Open
RobertBiehl opened this issue Aug 31, 2021 · 10 comments
Open

Incorrect memory alignment of components. #478

RobertBiehl opened this issue Aug 31, 2021 · 10 comments
Labels
bug Something isn't working has a workaround It's a bug, but there's a workaround

Comments

@RobertBiehl
Copy link

RobertBiehl commented Aug 31, 2021

Describe the bug
Component structs passed to system do not adhere to the defined memory alignment correctly.

To Reproduce

  1. enable avx

  2. Create component with aligned struct, e.g.

typedef struct EcsTransform3 {
    __attribute((aligned(32))) mat4 value;
} EcsTransform3;
  1. Inside system check address of pointer to struct
void TestSystem(ecs_iter_t* it) {
    EcsTransform3 *transform = ecs_term(it, EcsTransform3, 1);
    printf("(EcsTransform3 *) addr = %#010x", transform);
}

Sometimes prints addresses not aligned to 32 bytes as expected from CGLM_ALIGN_MAT.
(In my case this was 0x000000011af54ef0, which is only divisible by 16, not by 32.)
This causes crashes in calls to AVX intrinsics.

Expected behavior
Address of component should be aligned correctly, in this case to 32 bytes.

Additional context
Add any other context about the problem here (operating system, hardware, ...).

@RobertBiehl RobertBiehl added the bug Something isn't working label Aug 31, 2021
@SanderMertens
Copy link
Owner

Hm, flecs should be able to (and has been verified to) handle alignment correctly. It does rely on alignof/ECS_ALIGNOF reporting the correct alignment, could you verify that if you do ECS_ALIGNOF(EcsTransform3) it returns 32?

Additionally you could run the following code to check if the correct alignment has been registered with flecs:

ECS_COMPONENT(world, EcsTransform3);

EcsComponent *ptr = ecs_get(world, ecs_id(EcsTransform3), EcsComponent);
printf("%d\n", ptr->alignment);

@RobertBiehl
Copy link
Author

RobertBiehl commented Sep 1, 2021

Hi @SanderMertens, thanks for the reply and letting me know about how to inspect alignment.

printf("%d\n", ptr->alignment); does indeed print 32 and ECS_ALIGNOF(EcsTransform3) is also 32.

I guess I'll now need to find out how I could have gotten a pointer to an unaligned address in the system.

Currently the only thing that solves is is to locally copy the mat4 to aligned memory on the stack before continuing.
CGLM_ALIGN_MAT mat4 box_to_world; glm_mat4_copy(transform[i].value, box_to_world);

@chengts95
Copy link

chengts95 commented Sep 8, 2021

Yes, I have the same problem, I was thinking it was my fault, but the alignment of components is always 16byte even I changed the malloc function to use default 32byte alignment.

@SanderMertens
Copy link
Owner

@RobertBiehl @chengts95 if you're interested in debugging this further, the part of the code that should take care of the alignment is ecs_vector_t.

A vector is a contiguous block of memory with a header in front of it (containing the number of elements + allocated size). At an offset from the header the actual array begins, and it's important that this offset is aligned to the alignment of the component type.

The alignment is computed by macro's and passed into the vector functions (to allow for compile-time optimization where the vector type is known). If anything's wrong with how the component alignment is used, I would expect that to be in this code.

I'd like to look into this myself as well but it may take a bit of time before I can get to it.

@chengts95
Copy link

chengts95 commented Sep 8, 2021

If I am right, the alignment attribute can only determine the padding of the struct to fulfill the address alignment requirement, like expanding a 14byte struct to 16byte so its array is aligned.
However, the heap memory address is determined by the malloc function, the malloc function is default to 16byte and I didn't see the vector is created by aligned_alloc anywhere, so it will be 16byte aligned. So @RobertBiehl's struct cannot ensure a 32byte aligned address.

My purpose was to make components with double float-point numbers aligned to 32byte addresses so aligned AVX can be used, which means the first double number's address should be aligned to 32byte. The key problem is to have the starting address aligned to 32byte even the struct itself doesn't have an alignment attribute.

Example:

    int *p1 = (int*)malloc(16*sizeof(int) );
 //16*4=64 bytes can be aligned to 32byte address, however the alignment is 16byte
    free(p1);
 
    int *p2 =  (int*)aligned_alloc(32, 16*sizeof (int)); //same amount of data but aligned to 32byte address
    free(p2);

@chengts95
Copy link

chengts95 commented Sep 9, 2021

@RobertBiehl @chengts95 if you're interested in debugging this further, the part of the code that should take care of the alignment is ecs_vector_t.

A vector is a contiguous block of memory with a header in front of it (containing the number of elements + allocated size). At an offset from the header the actual array begins, and it's important that this offset is aligned to the alignment of the component type.

The alignment is computed by macro's and passed into the vector functions (to allow for compile-time optimization where the vector type is known). If anything's wrong with how the component alignment is used, I would expect that to be in this code.

I'd like to look into this myself as well but it may take a bit of time before I can get to it.

Where is the code to compute the offset? The offset should be different under different malloc situations.

@SanderMertens
Copy link
Owner

@chengts95 this is the code that computes the offset: https://github.com/SanderMertens/flecs/blob/master/include/flecs/private/vector.h#L50

Now that I'm looking at the code I think I also see the issue 💥 . The offset calculation simply does max(header_size, alignment) which works as long as the alignment is not larger than the alignment of the malloc, but in this case (32 bits alignment vs. 16 bits malloc alignment) this won't work.

If malloc returns address 16 then the offset calculation would add 32 which ends up being 48 which is unaligned. The code must be changed to take the max of header_size and the starting address aligned to the component alignment.

@chengts95
Copy link

chengts95 commented Sep 10, 2021

@chengts95 this is the code that computes the offset: https://github.com/SanderMertens/flecs/blob/master/include/flecs/private/vector.h#L50

Now that I'm looking at the code I think I also see the issue 💥 . The offset calculation simply does max(header_size, alignment) which works as long as the alignment is not larger than the alignment of the malloc, but in this case (32 bits alignment vs. 16 bits malloc alignment) this won't work.

If malloc returns address 16 then the offset calculation would add 32 which ends up being 48 which is unaligned. The code must be changed to take the max of header_size and the starting address aligned to the component alignment.

That's it! I also want to have a 32byte aligned double array, if we determine the alignment only by the struct size, you cannot get an AVX array with a plain double number component, if we can manually choose the alignment just like malloc and aligned_alloc, it will be better! Normally people can manually peel the loop to fit the AVX requirement, but it is more convenient if the starting address is aligned.

@SanderMertens SanderMertens added this to To do in v3 via automation Oct 1, 2021
@SanderMertens
Copy link
Owner

Sorry for the late reply! I've been looking at ways to address this, but there isn't a straightforward way to solve this with the current vector without changing lots of code.

I'll soon start working again on more storage related stuff, and one thing I've been wanting to do is change the component storage from a regular vector to a paged vector. I'll make sure that the new storage is able to deal with >16bits alignment.

@SanderMertens
Copy link
Owner

SanderMertens commented May 27, 2022

I just modified the component storage to no longer use ecs_vector_t. Component arrays now use a regular malloc. This doesn't give you correct memory aligned addresses yet, but you can now register a malloc with the OS API that only returns 128bit aligned (or more) addresses.

Not a full solution, but at least there's a workaround :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working has a workaround It's a bug, but there's a workaround
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants