Include provider versions in plan json output #35096

Open
melinath opened this issue Apr 29, 2024 · 6 comments
Labels
enhancement, new (new issue not yet triaged)

Comments

@melinath

melinath commented Apr 29, 2024

Terraform Version

All versions - tested with 1.8.2

Use Cases

Some tools that parse plans may want to control which resources they process based on the provider version used for the resource. For example, a policy validation tool might only work with version 4.X of a provider, but not version 5.X, due to breaking changes in the major release (or vice versa).

Attempted Solutions

version_constraint (or similar) is not sufficient because it only records the configured constraint, not the specific version. Third-party tools that operate on plan json files may have different requirements than configuration authors.
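
To illustrate the gap, here is a minimal Python sketch that lists what a plan JSON file records today for each provider configuration. It assumes the `configuration.provider_config` layout from Terraform's JSON output format, with a `version_constraint` key per entry; the function name and the sample data are made up for illustration.

```python
import json


def configured_constraints(plan: dict) -> dict:
    """Map each provider config key to its configured version constraint.

    The value is None when the configuration declares no constraint.
    """
    provider_config = plan.get("configuration", {}).get("provider_config", {})
    return {
        key: cfg.get("version_constraint")
        for key, cfg in provider_config.items()
    }


# Made-up sample shaped like `terraform show -json` output.
sample_plan = json.loads("""
{
  "terraform_version": "1.8.2",
  "configuration": {
    "provider_config": {
      "google": {"name": "google", "version_constraint": ">= 4.0.0, < 6.0.0"}
    }
  }
}
""")

print(configured_constraints(sample_plan))
```

A constraint like the one in the sample admits both 4.X and 5.X, so a tool reading only this field cannot tell which major version was actually selected.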

Proposal

This could be stored at the global level (for example, provider_versions as a top-level sibling to terraform_version). That would probably meet most of the requirement, but it would not allow "partially" valid plans: for example, if two versions of the same provider are installed under different aliases, you wouldn't be able to tell which resources were provisioned with which version.

This could be resolved by tracking the alias used for each installed provider version and for provisioning each resource, and/or each resource could individually track which provider version will be used to provision it.
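
As a rough sketch of how a consuming tool might use the global variant of this proposal, the Python below checks a top-level `provider_versions` map against the major versions the tool supports. Note that `provider_versions` is the hypothetical field proposed here and does not exist in Terraform's plan JSON; its shape (provider source address mapped to a version string), the function name, and the sample data are all assumptions.

```python
def unsupported_providers(plan: dict, supported_majors: dict) -> list:
    """Return provider addresses whose selected major version is unsupported.

    ASSUMPTION: `provider_versions` is the hypothetical top-level field
    proposed in this issue; it is not part of Terraform's plan JSON today.
    """
    problems = []
    for address, version in plan.get("provider_versions", {}).items():
        allowed = supported_majors.get(address)
        if allowed is None:
            continue  # the tool has no opinion about this provider
        major = int(version.split(".")[0])
        if major not in allowed:
            problems.append(address)
    return problems


hypothetical_plan = {
    "terraform_version": "1.8.2",
    "provider_versions": {  # hypothetical field, made-up value
        "registry.terraform.io/hashicorp/google": "5.26.0",
    },
}

# The tool in this example only understands 4.X of the google provider.
supported = {"registry.terraform.io/hashicorp/google": {4}}
print(unsupported_providers(hypothetical_plan, supported))
```

A single global map like this cannot express the per-resource case described above; that would need the version (or the alias) recorded on each resource.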

References

No response

@melinath added the enhancement and new (new issue not yet triaged) labels Apr 29, 2024
@apparentlymart
Member

Hi @melinath! Thanks for this feature request.

There are already some features in Terraform's JSON output that are intended to deal with a problem similar to the one you've described here.

Firstly, Terraform generally assumes that any consumer that will be making deep use of the resource-type-specific parts of the JSON plan output will also obtain the JSON schema information about the providers used by the configuration that the plan was created for, and so this sort of "global metadata" tends to live there instead of directly in the plan JSON.

The JSON schema information includes "Schema Representation" which is the general container used to describe the schema of an individual resource type or other schema-based object, and that includes version as a resource-type-specific version number, decided by the provider.

The intent of this design is to represent that a breaking change to a provider's schema doesn't tend to break the entire schema all at once, and instead only tends to change a subset of the available resource types. Therefore using the whole provider version number is too coarse.

If instead a consuming program checks that version matches whatever version number was current at the time the tool was written, it can raise an error if it receives input for an unsupported version. In a future release it could then be updated to use these resource-type-specific version numbers to distinguish between multiple different supported versions.
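
A minimal Python sketch of that check, assuming the layout of `terraform providers schema -json` output (`provider_schemas`, then the provider address, then `resource_schemas`, then the resource type, then `version`). The function name, the provider address, the resource type, and the version numbers are made up for illustration.

```python
def check_schema_versions(schemas: dict, known: dict) -> list:
    """Return (provider, resource_type, found_version) for each mismatch.

    `known` maps (provider address, resource type) to the schema version
    the tool was written against. A missing resource type is reported
    with a found_version of None.
    """
    mismatches = []
    for (provider, rtype), expected in known.items():
        resource_schemas = (
            schemas.get("provider_schemas", {})
            .get(provider, {})
            .get("resource_schemas", {})
        )
        found = resource_schemas.get(rtype, {}).get("version")
        if found != expected:
            mismatches.append((provider, rtype, found))
    return mismatches


PROVIDER = "registry.terraform.io/example/example"  # hypothetical provider

sample_schemas = {
    "format_version": "1.0",
    "provider_schemas": {
        PROVIDER: {
            "resource_schemas": {
                "example_thing": {"version": 2, "block": {}},
            }
        }
    },
}

# The tool was written when example_thing was at schema version 1.
print(check_schema_versions(sample_schemas, {(PROVIDER, "example_thing"): 1}))
```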

I realize this is not exactly what you've asked for in this feature request, but I'm curious to learn whether this existing mechanism would be a potential alternative for what you're trying to achieve. If not, it would be helpful to discuss why so that we can better understand how your use-case differs from what we were considering when we created the current design.

Thanks!

@melinath
Author

melinath commented Apr 29, 2024

In my experience working on the google provider, we don't generally upgrade the version of a resource's schema unless we need to do a Schema Migration (for example, to backfill a default value for a new field to prevent diffs and/or recreation). I would not expect people to be able to rely on that to indicate whether a resource is compatible with a particular arbitrary third-party tool.

Being able to pin to a particular provider version (or set of provider versions) also feels more reliable / easier to understand.

Currently, TF configurations can pin to a provider version constraint; so can modules. However, third-party tools that operate on plan data have no way to do so.

@apparentlymart
Member

Thanks for that extra context, @melinath.

An assumption in the existing design is that schema migrations are used when there are breaking changes, and thus the schema version would naturally increase as part of making a breaking change. However, you're right that this isn't necessarily guaranteed -- a provider could potentially make a breaking schema change without providing any upgrade path across that breaking change.

@melinath
Author

Yeah, it's true that a provider can definitely make a breaking schema change without providing an automated upgrade path. This is what the google provider does - we only ensure that manual upgrades are possible, since schema migrations can only change the state, not the config.

But even changes that are non-breaking for Terraform users may be breaking for tools trying to read Terraform plans. One example is adding a new optional field with no default value. If a tool relies on that field being present (for whatever reason), it will not be compatible with versions of the provider that are earlier than the one that introduced the field. While a tool author could check for the presence of the field in a (separately-provided) schema export, it would be much simpler / more generic to be able to check the provider version directly in the plan.
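
For comparison, the schema-export check mentioned above might look roughly like the Python below. It assumes the `terraform providers schema -json` layout, where a resource's top-level attributes live under `block.attributes`; the function name, provider address, resource type, and field names are made up, and nested blocks (under `block_types`) are not searched.

```python
def has_attribute(schemas: dict, provider: str, resource_type: str,
                  attribute: str) -> bool:
    """Report whether a resource type declares a top-level attribute."""
    block = (
        schemas.get("provider_schemas", {})
        .get(provider, {})
        .get("resource_schemas", {})
        .get(resource_type, {})
        .get("block", {})
    )
    return attribute in block.get("attributes", {})


PROVIDER = "registry.terraform.io/example/example"  # hypothetical provider

sample_schemas = {
    "provider_schemas": {
        PROVIDER: {
            "resource_schemas": {
                "example_thing": {
                    "version": 0,
                    "block": {
                        "attributes": {
                            "name": {"type": "string", "required": True},
                            "new_field": {"type": "string", "optional": True},
                        }
                    },
                }
            }
        }
    }
}

print(has_attribute(sample_schemas, PROVIDER, "example_thing", "new_field"))
```

The check works, but it needs one such lookup per field the tool depends on, plus the schema export itself, whereas a version comparison would be a single test against the plan.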

@apparentlymart
Member

Some further reflection on this:

Wrapping automations tend to want to store plan JSON indefinitely and so are sensitive to overheads in the format. The existing format is already pretty challenging in this regard, which is a big part of why the new model for Terraform Stacks (still experimental and under development at the time I'm writing this) uses a streaming event model where the wrapping automation is expected to build its own artifact containing only the subset of data it cares about, rather than trying to build a single artifact for all needs.

For traditional Terraform the plan JSON won't be going anywhere (it's protected by compatibility promises) but we do need to be pretty cautious about extending it with new information that might bloat it even further and upset those who have already built automations around it.

Therefore a compromise that I'm thinking about is to put the provider version numbers in the provider schema JSON instead of the plan JSON. Information about providers themselves is more thematically connected with provider schemas than it is with the plan. The provider schema JSON is also more readily cacheable between runs, e.g. using a checksum of the dependency lock file as the cache key and exploiting the fact that the version selections are unlikely to be changing on every new plan. Therefore there's a more plausible story for how consumers can avoid paying the cost of storing the same information repeatedly for every plan/apply round.

For a consumer that doesn't perform such caching they'd still pay the same cost of storing the same data each time, but they'd be storing it in the schema JSON instead of the plan JSON. For automations that can cache the provider schema JSON between runs, they can request and store a new copy only when the provider version selections have changed.
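
The caching idea could be sketched in Python roughly as follows. This is an illustration under assumptions, not an existing Terraform feature: the function name, the cache file naming, and the `export` callback are made up, and the lock file is taken to be the usual `.terraform.lock.hcl`.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_schema(lock_file: Path, cache_dir: Path,
                  export: Callable[[], dict]) -> dict:
    """Return the provider schema JSON, re-exporting only on lock changes.

    The cache key is a SHA-256 checksum of the dependency lock file, so a
    new export happens only when the provider selections change.
    """
    key = hashlib.sha256(lock_file.read_bytes()).hexdigest()
    cache_path = cache_dir / f"schema-{key}.json"
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    schema = export()  # the expensive step, skipped on a cache hit
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(schema))
    return schema
```

In practice `export` might wrap a call to `terraform providers schema -json` and parse its output; it is passed in as a callback here so the caching logic stays independent of how the schema is obtained.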

@melinath
Author

melinath commented Apr 30, 2024

Providing version information in the schema export definitely seems like a reasonable thing to do regardless of whether the information is in the plan as well! :-)

I hadn't considered the literal storage cost of the additional per-resource data. It seems like including the version information at the top-level of the plan would not have the same ballooning effect, so could also still be an option?

The things I'm trying to account for are essentially number of user actions, user error, and non-local processing.

If the version information is only available through the exported schema, then a user needs to export both the plan and the schema to pass into a tool - even if the only thing being used from the schema is version information. This is usually only a one-time cost but one action would still be easier than two.

Additionally, having the schema and version separate means that there's no way to verify that the schema provided is actually a schema used for the plan (unless the version information is also encoded in the plan file somehow.) A user could accidentally provide the wrong schema and trigger unexpected behavior.
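
To make the limitation concrete: with the fields that exist today, about the best a tool can do is a coarse consistency check like the Python sketch below, which only confirms that every managed resource in the plan has some schema entry. It assumes the `resource_changes` entries carry `address`, `mode`, `type`, and `provider_name`, and that the schema export uses the `provider_schemas` layout; the function name and sample data are made up.

```python
def resources_missing_from_schema(plan: dict, schemas: dict) -> list:
    """Return addresses of managed resources with no matching schema entry."""
    missing = []
    provider_schemas = schemas.get("provider_schemas", {})
    for change in plan.get("resource_changes", []):
        if change.get("mode") != "managed":
            continue  # data sources live under data_source_schemas
        resource_schemas = provider_schemas.get(
            change.get("provider_name"), {}
        ).get("resource_schemas", {})
        if change.get("type") not in resource_schemas:
            missing.append(change.get("address"))
    return missing


PROVIDER = "registry.terraform.io/example/example"  # hypothetical provider

sample_plan = {
    "resource_changes": [
        {"address": "example_thing.a", "mode": "managed",
         "type": "example_thing", "provider_name": PROVIDER},
        {"address": "example_other.b", "mode": "managed",
         "type": "example_other", "provider_name": PROVIDER},
    ]
}
sample_schemas = {
    "provider_schemas": {
        PROVIDER: {"resource_schemas": {"example_thing": {"version": 0}}}
    }
}

print(resources_missing_from_schema(sample_plan, sample_schemas))
```

A schema exported from a different version of the same provider would usually still define the same resource types and pass this check, which is exactly the mismatch that cannot be detected without version information in the plan.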

And finally, if the user is submitting a file to an API for further processing, there could be additional costs to sending the entire provider schema alongside the plan. In terms of caching, ideally the API would be able to read the provider version(s) from the plan, then pull down those provider versions and export & cache the schema if needed (rather than relying on the schema to be provided by the user each time).
