Terragrunt IAC Engine Plugin System #3103

Open
yhakbar opened this issue Apr 26, 2024 · 2 comments

yhakbar commented Apr 26, 2024

Summary

Introduce the ability to integrate with plugins to drive custom behavior in the underlying IAC tool orchestrated by Terragrunt (like OpenTofu or Terraform).

Motivation

Users have been lacking two significant capabilities that are addressed by this RFC:

  1. The ability to customize the usage of tofu and terraform when called by Terragrunt.

    Users have relied either on ensuring that a particular version of each tool is installed before executing terragrunt, or on a shim that alters the execution of the underlying IAC tool.

  2. The ability to alter the context of the IAC execution separate from the Terragrunt execution.

    So far, there has been no way to isolate the IAM access that the underlying IAC tool has from the access that Terragrunt has. The IAC tool has also had to run in the same compute environment and on the same filesystem as Terragrunt.

    Terragrunt super users would like to be able to isolate the compute resources allocated to Terragrunt from the compute allocated to the underlying IAC tool so that they can fan out IAC updates across multiple instances/containers/pods.

Proposal

Allow users to optionally specify an IAC engine, which will control how the underlying IAC operations (plans, applies, etc.) are carried out, instead of Terragrunt directly calling the tofu or terraform binaries.

Users will be able to use a configuration block that looks like the following to configure their engine in the relevant terragrunt.hcl:

engine {
   source  = "github.com/acme/terragrunt-plugin-custom-opentofu"
   version = "v0.0.1" # Optionally specify version
   type    = "rpc" # Optionally specify the type of plugin
}

The source field would be either the path to a local binary (signified by starting the value with . or /) or a URL pointing to a GitHub repository with a releases page containing an asset that can be used (the appropriate architecture and platform would be guessed based on detected values of the host machine, and can be explicitly set via environment variables).

The optional version field would indicate the git tag associated with the release to download when the source is not a local binary. Setting it for a local path would produce an error, and for remote sources it would default to the latest release.

The optional type field would indicate the type of plugin used by the engine. The default rpc value would indicate that the plugin uses HashiCorp's go-plugin, with Terragrunt communicating with the plugin in a client/server relationship via RPC. For simplicity in authoring plugins, this will be the first type of plugin supported by this RFC. It's possible that, in the future, a second type of plugin leveraging the Golang plugin package would be supported with type shared.
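The defaulting and validation rules for the three fields could be sketched in Go roughly as follows. The EngineConfig type, the normalizeEngine function, and the "latest" sentinel value are hypothetical names chosen for illustration, and rejecting any type other than rpc reflects only the RFC's statement that rpc is the first supported type.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// EngineConfig mirrors the fields of the proposed engine block.
// The type itself is hypothetical.
type EngineConfig struct {
	Source  string
	Version string
	Type    string
}

var errVersionWithLocalSource = errors.New("version must not be set when source is a local path")

// normalizeEngine applies the defaults described in the RFC and rejects
// combinations it says should fail.
func normalizeEngine(cfg EngineConfig) (EngineConfig, error) {
	local := strings.HasPrefix(cfg.Source, ".") || strings.HasPrefix(cfg.Source, "/")
	if local && cfg.Version != "" {
		return cfg, errVersionWithLocalSource
	}
	if !local && cfg.Version == "" {
		// "latest" is an assumed sentinel meaning "use the newest release".
		cfg.Version = "latest"
	}
	if cfg.Type == "" {
		cfg.Type = "rpc"
	}
	if cfg.Type != "rpc" {
		// Assumption: only rpc is accepted until another type is implemented.
		return cfg, fmt.Errorf("unsupported engine type %q", cfg.Type)
	}
	return cfg, nil
}
```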

Technical Details

This proposal impacts how and if tofu and terraform get called by Terragrunt.

To support this change, the following will have to be done:

  • Ensure that all calls to tofu/terraform are mediated by some logic that checks if an engine block has been configured, and only directly execute one of the binaries if none has been.
  • Ensure that all those calls have public interfaces that plugin authors can use to verify that their usage in engine blocks will work.
  • Create the architecture for downloading assets from GitHub releases using a tool like go-gh, or something equivalent.
  • The assets should have their integrity verified by computing their checksums.
  • Concurrent access to the same plugin should be thought through from an early stage due to the nature of Terragrunt. A locking mechanism should ensure that concurrent attempts to download the same plugin won't result in race conditions.
  • Plugins should be centrally cached, with the location of that cache configurable by users via an environment variable.
  • HCL config parsing will need to be updated to respect the new engine block.
  • When an engine block is configured in the terragrunt.hcl that is used for a terragrunt command, and the type is rpc, dynamically fetch, verify, then load the plugin, then execute IAC updates using it.
  • Logging should be introduced to signal to users that control has shifted from Terragrunt to the plugin.
  • Documentation on the engine system should be written up, along with guidance on how to author new plugins.

To ensure that this functionality can be developed smoothly with minimal risk of regression, it should be introduced under a feature flag that is enabled by setting the environment variable TG_EXPERIMENTAL_ENGINE=1. While the functionality is being battle tested, additional warning logs should make users aware that leveraging it in production is risky.

Documentation will need to be authored that demonstrates how to write an IAC Engine plugin and guidance on testing it.

In addition, Gruntwork will host two plugins that will demonstrate how to author plugins following best practices:

  • terragrunt-iac-engine-opentofu
  • terragrunt-iac-engine-terraform

They will execute tofu and terraform in the same way Terragrunt currently does. Users will be able to use the repositories as springboards for their custom implementation of the same functionality.

Press Release

A new engine configuration block has been released, allowing you to customize and configure how your IAC updates are carried out when orchestrated by Terragrunt!

To try it out, all you need to do is include the following in your terragrunt.hcl:

engine {
   source = "github.com/gruntwork-io/terragrunt-iac-engine-opentofu"
}

Because this functionality is still experimental and not recommended for general production usage, set the following environment variable to opt in:

export TG_EXPERIMENTAL_ENGINE=1

The next time you call Terragrunt, it will dynamically fetch and load the Gruntwork OpenTofu IAC Engine plugin for Terragrunt to use instead of calling OpenTofu directly.

You can find the plugin here. <-- This link is intentionally broken as this is a mock press release.

If you'd like to customize how OpenTofu is used when orchestrated by Terragrunt, feel free to fork the repository and call your own version of the plugin!

Drawbacks

  • This will complicate, and potentially introduce regressions into, the core Terragrunt functionality for invoking IAC tools.
  • Additional maintenance burden will be imposed on the maintainers in that IAC Engine plugins will have to remain compatible with default direct invocations of tofu and terraform.
  • Troubleshooting issues for users can become more complicated if errors exist in the implementation of their IAC Engines, rather than anything Terragrunt ships.
  • Ensuring that the plugin system works well introduces an entirely new source of burden for how Terragrunt is maintained.

Alternatives

  • Avoiding introducing any plugin system at all. This was not chosen due to the scaling limits customers are reaching with our current architecture. They have a need for the ability to have more control over how IAC execution works, so it's deemed worth it to explore this avenue.

Migration Strategy

This shouldn't require customers to make any adjustments for their existing code bases to remain compatible.

IAC Engines should remain an optional feature of Terragrunt for the foreseeable future.

Unresolved Questions

  • Would the majority of users prefer that we start with the rpc plugin type instead of the shared plugin type? If so, please make your voice heard in the comments on this RFC.
  • How much time would it take to build out this system? We can incrementally release functionality behind the TG_EXPERIMENTAL_ENGINE feature flag to release incomplete functionality early. There may be a long waiting period before we consider this functionality ready for general audiences, however.
  • What appetite do users have for building their own IAC Engine plugins? Is this something that would be used by only a small minority of users, and how much would it impact their workflows?
  • What best practices in Golang plugin architectures can we be sure to adopt out of the gate to ensure that this feature is successful?

References

Proof of Concept Pull Request

Changes

  • The default plugin type has changed to the RPC type, as initial experimentation by @denis256 has proven out the go-plugin library as more likely to be useful for community members.
    Problems with the shared type included complicated compile configurations for plugins and a requirement that plugin authors have a strong understanding of the Terragrunt build system.
    To err on the side of making things easier for the community, we've decided to adjust the plan to start with the RPC type plugin, and add the shared type later if beneficial.
@yhakbar yhakbar added rfc Request For Comments pending-decision Pending decision from maintainers labels Apr 26, 2024

yhakbar commented May 9, 2024

Some feedback shared offline raised concerns about the performance implications of introducing this plugin system. Those concerns stemmed from a lack of clarity in this RFC regarding the difference between the shared and rpc plugin types.

Shared

The shared type of plugin leverages the built-in Golang plugin package. This kind of plugin is a shared library (typically having the extension .so) that would be dynamically loaded by the running Terragrunt process, and have its exported functions called directly from the Terragrunt process.

There would be no Inter-Process Communication (IPC) between Terragrunt and a second process running along with Terragrunt, and it would be largely equivalent to calling the functions from directly within the Terragrunt binary from a performance and usage perspective.

The downsides of this approach are that the plugin must be written in Golang and must be compatible with the version of Golang used to compile Terragrunt (see the warnings here). It also prevents most fault isolation, for example of panics in the plugin, as the plugin would be running in the same process as Terragrunt.
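For reference, loading a shared plugin with Go's standard plugin package looks roughly like the sketch below. The exported symbol name Run and its signature are hypothetical, as the RFC does not define the interface a shared engine would expose. The plugin package is only supported on some platforms, and on others plugin.Open returns an error.

```go
package main

import (
	"fmt"
	"plugin"
)

// loadSharedEngine opens a shared-library plugin (typically a .so file)
// and looks up an exported function named "Run".
// The symbol name and signature are assumptions made for this sketch.
func loadSharedEngine(path string) (func(args []string) error, error) {
	p, err := plugin.Open(path)
	if err != nil {
		return nil, fmt.Errorf("open plugin %s: %w", path, err)
	}
	sym, err := p.Lookup("Run")
	if err != nil {
		return nil, fmt.Errorf("lookup Run in %s: %w", path, err)
	}
	// The looked-up symbol must have exactly the expected function type.
	run, ok := sym.(func(args []string) error)
	if !ok {
		return nil, fmt.Errorf("symbol Run in %s has unexpected type %T", path, sym)
	}
	return run, nil
}
```

The returned function runs inside the calling process, which is why a panic in the plugin would take down the host process as well.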

RPC

The rpc type of plugin leverages the HashiCorp go-plugin package. It is how provider plugins work in OpenTofu and Terraform. This type of plugin is spun up as a secondary process, and Terragrunt would establish a client-server connection with the plugin.

The advantages of this approach are that the plugin can be written in languages other than Golang, as long as they have good support for the protocol used by the plugin system (e.g. gRPC), and it allows for panics to happen in a secondary process, making it easier to prevent blow-ups in the engine from impacting the Terragrunt process. You can see a number of other advantages here.

The downside of this approach is a detectable impact on performance. There can be significant overhead in spinning up one or more Engine plugins, which then spin up one or more Provider plugins, with IPC happening at every layer.
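The client-server shape of an RPC engine can be illustrated with Go's standard net/rpc package. This sketch does not use go-plugin itself, and it runs the server in the same process over an in-memory pipe purely for brevity; with go-plugin the engine would be a separate process. The Engine service, its Run method, and the RunArgs and RunReply types are hypothetical.

```go
package main

import (
	"net"
	"net/rpc"
	"strings"
)

// RunArgs and RunReply are hypothetical request/response types.
type RunArgs struct {
	Command string
	Args    []string
}

type RunReply struct {
	Output string
}

// Engine is a stand-in for an IAC engine exposed over RPC.
type Engine struct{}

// Run reports what the engine would execute instead of running a binary.
func (e *Engine) Run(args RunArgs, reply *RunReply) error {
	parts := append([]string{args.Command}, args.Args...)
	reply.Output = "would run: " + strings.Join(parts, " ")
	return nil
}

// startEngine registers the Engine service and returns a client
// connected to it over an in-memory pipe.
func startEngine() (*rpc.Client, error) {
	srv := rpc.NewServer()
	if err := srv.Register(&Engine{}); err != nil {
		return nil, err
	}
	clientConn, serverConn := net.Pipe()
	go srv.ServeConn(serverConn) // serves until the connection is closed
	return rpc.NewClient(clientConn), nil
}
```

A caller would invoke the engine with client.Call("Engine.Run", RunArgs{...}, &reply). Every such call crosses a connection boundary, which is the source of the overhead described above.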


yhakbar commented May 30, 2024

Please take note that the default plugin type we will be exploring as part of this RFC is the RPC type.

This is due to the exploratory work done by @denis256 to make sure that the plugin system we build here will be maximally adoptable by the community. If you would prefer that we adjust this direction, please make your voice heard!

@yhakbar yhakbar added accepted Accepted RFC and removed pending-decision Pending decision from maintainers labels May 30, 2024
@yhakbar yhakbar self-assigned this May 31, 2024