Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework and parallelize abi hashing #1225

Open
wants to merge 66 commits into
base: main
Choose a base branch
from

Conversation

Thomas1664
Copy link
Contributor

@Thomas1664 Thomas1664 commented Oct 8, 2023

Changes in behavior

  • sbom files now contain relative paths to port files (as they're supposed to anyway looking at variable names)
  • abi and sbom files are saved directly to packages after build (abi file was first saved to buildtrees and copied to packages after build)

Parallelized

  • get_cmake_script_hashes()
  • get_private_abi()

Additionally, those 2 parallelized functions are executed asynchronously while initialize_abi_info() is running. This function initializes AbiInfo which includes getting toolset, compiler, and triplet info.

The former populate_abi_tag() is split up in order to separate concerns and allow parallelization.

@autoantwort
Copy link
Contributor

Do you have numbers how much time was spend there? :)

@Thomas1664
Copy link
Contributor Author

Thomas1664 commented Oct 9, 2023

Do you have numbers how much time was spend there? :)

It's important to note that the time vcpkg does something is very small. Calculating abis takes around 75 ms or 10% of the total time vcpkg runs (excluding sub processes) when installing one port. I'm not sure if it's worth parallelizing all of it on a per-port level or parallelizing certain tasks like parsing cmake and finding script helpers. The goal is to find out what's most efficient and the code is split up to allow easy combination of the required steps.

@Neumann-A
Copy link
Contributor

Isn't the main bottleneck just I/O? Also the parallelization is not allow to break existing abi hashes.

@Thomas1664
Copy link
Contributor Author

Thomas1664 commented Oct 9, 2023

Isn't the main bottleneck just I/O?

Yes! That's why I pointed at script hashes which is something I'll definitely consider parallelizing. I'd also take a look at saving the abi file (and maybe also that weird sbom file) on disk.

My profiling also suggests that parsing the cmake contents of a port for the use of helper functions takes some time - we loop through the cmake contents >100 times!

Also the parallelization is not allow to break existing abi hashes.

I'm not sure what you mean by that. ABI can't be broken by calculating hashes of one port in different order because the vector is sorted in the end. I'm also aware that we can't parallelize everything without iterating multiple times over install plan actions.

@Thomas1664
Copy link
Contributor Author

Thomas1664 commented Oct 9, 2023

The top 3 hottest paths are get_compiler_info, get_cmake_script_hashes, and get_toolset. get_cmake_script_hashes is executed exactly once and the other two functions only a few times at most. These are 2 different operations (get_compiler_info needs get_toolset). Therefore, we could do them asynchronously.

If we could get the compiler info for host and target triplet in parallel we could be even faster, but I don't think this is possible?

src/vcpkg/abi.cpp Outdated Show resolved Hide resolved
Copy link
Contributor Author

@Thomas1664 Thomas1664 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BillyONeal I have a few questions about what is set in AbiInfo in certain cases.

return;
}

abi_info.compiler_info = paths.get_compiler_info(*abi_info.pre_build_info, toolset);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we not need compiler info if either --head or --editable were used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't have a abi_info object you don't need the fill the compiler_info entry in it.

src/vcpkg/commands.build.cpp Show resolved Hide resolved
src/vcpkg/commands.build.cpp Show resolved Hide resolved
@autoantwort
Copy link
Contributor

#802 adds a new PortAbiCache to speed up port abi creation. But this is only helpful if you have multiple ABIs for the same port, which you normally only have if you have host-triplet != target-triplet. But #802 builds a port multiple times with different abis and needed this (resulted in a multiple minutes speedup when using --all).
https://github.com/microsoft/vcpkg-tool/pull/802/files#diff-a93efc9f4d36565c420ffe731eb4aafb23b3f5f300db52302f898c4a64884a94

@Thomas1664
Copy link
Contributor Author

#802 adds a new PortAbiCache to speed up port abi creation. But this is only helpful if you have multiple ABIs for the same port, which you normally only have if you have host-triplet != target-triplet. But #802 builds a port multiple times with different abis and needed this (resulted in a multiple minutes speedup when using --all). https://github.com/microsoft/vcpkg-tool/pull/802/files#diff-a93efc9f4d36565c420ffe731eb4aafb23b3f5f300db52302f898c4a64884a94

I don't think it's worth optimizing for host-triplet != target-triplet because it applies only to few ports. However, I think I might end up parallelizing make_abi_tag().

@Thomas1664 Thomas1664 marked this pull request as ready for review November 2, 2023 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants