Reload Configuration on Signal #191

Open
blaggacao opened this issue Oct 7, 2020 · 8 comments

Comments

@blaggacao
Contributor

blaggacao commented Oct 7, 2020

Is your feature request related to a problem? Please describe.
For implementing #184, I need a way to instruct trow to gracefully reload its certificates and accept new connections on a new SSL context.

Describe the solution you'd like
Since rollover is done well ahead of certificate TTL (at roughly 80%), there is no urgency, which is why current connections can be phased out normally.

It is not necessary for trow to detect those changes itself, although this is a possibility, should it be easier to implement. Probably not, since safeguards would have to be put in place to avoid race conditions while reloading the different files.

Rather, the most portable solution appears to be exposing a configuration API stub that triggers a reload of the TLS context from disk (e.g. on POST). Maybe something simple and similar to the healthcheck endpoints can be conceived, or alternatively a special configuration port that would not be exposed outside of the pod context for some basic shielding.
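
A minimal sketch of what such a stub could decide per request, assuming a hypothetical `/reload` path on an internal-only port. The function name, the path, and the status codes are illustrative assumptions, not trow's API; the handler only records the request and leaves the actual reload to the server loop.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Decide how to answer one request on a (hypothetical) internal admin port.
/// `request_line` is the first line of an HTTP request,
/// e.g. "POST /reload HTTP/1.1". Returns the HTTP status code to send back.
pub fn handle_admin_request(request_line: &str, reload_requested: &AtomicBool) -> u16 {
    let mut parts = request_line.split_whitespace();
    let method = parts.next().unwrap_or("");
    let path = parts.next().unwrap_or("");

    match (method, path) {
        // Only record the request; the server loop performs the reload later.
        ("POST", "/reload") => {
            reload_requested.store(true, Ordering::SeqCst);
            202 // Accepted: the reload happens asynchronously
        }
        // The path exists, but only POST may trigger a reload.
        (_, "/reload") => 405,
        _ => 404,
    }
}
```

Binding the listener for this port to the loopback address would keep it reachable only from inside the pod, which is the "basic shielding" mentioned above.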

Describe alternatives you've considered
Kill & restart. This implies service interruptions.

Additional context

@amouat
Contributor

amouat commented Oct 7, 2020

Yeah, reloading makes sense. I think the Unix way is to respond to a SIGHUP, right?
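
For illustration, a common shape for SIGHUP handling is a flag that the signal handler sets and the main loop polls. The sketch below uses only the standard library, so the signal registration itself is left as a comment; the `Server` type and its methods are hypothetical, and the `signal-hook` crate mentioned in the comment is an assumption about tooling, not something trow is known to use.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the server's reloadable state.
pub struct Server {
    reload_flag: Arc<AtomicBool>,
    reloads: AtomicUsize,
}

impl Server {
    pub fn new() -> Self {
        Server {
            reload_flag: Arc::new(AtomicBool::new(false)),
            reloads: AtomicUsize::new(0),
        }
    }

    /// Handle that a signal handler can set to request a reload.
    /// With the signal-hook crate (an assumption, not used in this sketch),
    /// registration would look like:
    ///   signal_hook::flag::register(signal_hook::consts::SIGHUP, server.reload_flag())?;
    pub fn reload_flag(&self) -> Arc<AtomicBool> {
        Arc::clone(&self.reload_flag)
    }

    /// Called once per iteration of the main loop.
    /// Returns true if a reload was performed.
    pub fn poll_reload(&self) -> bool {
        // swap() reads and clears the flag in one atomic step, so a signal
        // arriving during a reload is kept and handled on the next poll.
        if self.reload_flag.swap(false, Ordering::SeqCst) {
            // A real server would re-read its configuration here.
            self.reloads.fetch_add(1, Ordering::SeqCst);
            true
        } else {
            false
        }
    }

    /// Number of reloads performed so far.
    pub fn reload_count(&self) -> usize {
        self.reloads.load(Ordering::SeqCst)
    }
}
```

Keeping the handler down to a single flag store avoids doing real work in signal context; the reload itself runs on a normal thread.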

@blaggacao
Contributor Author

> Yeah, reloading makes sense. I think the Unix way is to respond to a SIGHUP, right?

Indeed, if that were implemented it would play very well with https://github.com/spiffe/spiffe-helper#readme as its process wrapper / reload manager. If somebody could lend a hand with this in the coming days it would be a perfect coordination of efforts. 😉

@amouat
Contributor

amouat commented Oct 9, 2020

How often does certificate rotation occur? The way that occurs to me to implement it would probably incur some downtime:

  • Set the k8s readiness endpoint to fail to stop new traffic
  • Wait for current connections to finish (could be a while if large upload/download, which is what worries me)
  • Reload config, effectively restart server
  • Set readiness to ok
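
The four steps above can be sketched as follows. `ServerState`, `drain_and_reload`, and the timeout are hypothetical; the timeout in particular is an added assumption, meant to bound the wait when a large upload or download keeps a connection open.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical shared state: what the readiness endpoint reports and how
/// many connections are currently being served.
pub struct ServerState {
    pub ready: AtomicBool,
    pub active_connections: AtomicUsize,
}

/// Returns true if the server drained and `reload` ran, false if the
/// timeout elapsed first (readiness is restored either way).
pub fn drain_and_reload<F: FnOnce()>(state: &ServerState, timeout: Duration, reload: F) -> bool {
    // 1. Fail the readiness endpoint so no new traffic is routed here.
    state.ready.store(false, Ordering::SeqCst);

    // 2. Wait for current connections to finish, up to the timeout.
    let deadline = Instant::now() + timeout;
    let mut drained = true;
    while state.active_connections.load(Ordering::SeqCst) > 0 {
        if Instant::now() >= deadline {
            drained = false;
            break;
        }
        thread::sleep(Duration::from_millis(10));
    }

    // 3. Reload the configuration only once nothing is in flight.
    if drained {
        reload();
    }

    // 4. Report ready again.
    state.ready.store(true, Ordering::SeqCst);
    drained
}
```

The downtime in this approach is the time spent in step 2, which is why long-running transfers are the worrying case.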

We're actually in the middle of a refactoring that will replace the underlying framework, so it might be an idea to complete that before moving to this task.

@amouat
Contributor

amouat commented Oct 9, 2020

I'm going to give this a more generic title, as I think we should be able to handle reloading all config.

@amouat amouat changed the title from "Graceful Certificate Realoading" to "Reload Configuration on Signal" Oct 9, 2020
@blaggacao
Contributor Author

> How often does certificate rotation occur?

The SPIRE default is every 5 minutes, but it's configurable and users are expected to tweak it to strike a good balance between their security requirements and service performance.

I think reloading the whole server with downtime is relatively straightforward. I've done exactly this for other solutions that do not support certificate reloading. Though, I don't think it is a good fit for a canonical implementation.

There has even been a discussion on the OpenSSL mailing list about the topic.

https://www.mail-archive.com/openssl-users@openssl.org/msg88596.html

The conclusion was more or less:

  • keep running contexts around
  • only use new certs for new connections
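
A rough sketch of how those two points could look in Rust, using only the standard library. `TlsContext` is a hypothetical stand-in for a real TLS configuration, and `ContextStore` is an illustrative name; nothing here is taken from Rocket, trow, or OpenSSL.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a real TLS context
/// (e.g. a parsed certificate chain plus private key).
pub struct TlsContext {
    pub cert_pem: String,
}

/// Holds the context that newly accepted connections should use.
pub struct ContextStore {
    current: RwLock<Arc<TlsContext>>,
}

impl ContextStore {
    pub fn new(initial: TlsContext) -> Self {
        ContextStore {
            current: RwLock::new(Arc::new(initial)),
        }
    }

    /// Called when a connection is accepted. The connection keeps the
    /// returned Arc for its whole lifetime, so it is unaffected by reloads.
    pub fn current(&self) -> Arc<TlsContext> {
        self.current.read().unwrap().clone()
    }

    /// Called on reload. Only connections accepted afterwards see `next`;
    /// the old context is freed once its last connection drops its Arc.
    pub fn replace(&self, next: TlsContext) {
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

The reference count does the "phasing out": no connection is interrupted, and the old certificate disappears from memory when the last connection that started with it closes.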

@amouat
Contributor

amouat commented Oct 12, 2020

Thanks. The trouble is that's pretty low-level stuff, and I'm not sure how much I can control it with the current frameworks.

It does also bring up an alternative solution - monitoring the cert file and automatically reloading if it changes. If it is easily possible to "keep running contexts around" that may be a better solution, but I'm still leaning towards using signals (which implies a restart and complete config reload).
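
As an illustration of the file-monitoring alternative, here is a small polling sketch using only the standard library. `CertWatcher` is a hypothetical name, and comparing file contents rather than timestamps is a choice made for this sketch. It watches a single file, so it does not address the race between a certificate and its key being rewritten at different moments, which was raised earlier in the thread.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical watcher that detects changes to one certificate file
/// by comparing its contents against the last version it saw.
pub struct CertWatcher {
    path: PathBuf,
    last_seen: Vec<u8>,
}

impl CertWatcher {
    /// Reads the file once so that later polls have a baseline.
    pub fn new(path: impl Into<PathBuf>) -> io::Result<Self> {
        let path = path.into();
        let last_seen = fs::read(&path)?;
        Ok(CertWatcher { path, last_seen })
    }

    /// Re-reads the file. Returns Ok(true) if the contents changed since
    /// the previous poll, in which case the caller should reload.
    pub fn poll(&mut self) -> io::Result<bool> {
        let now = fs::read(&self.path)?;
        if now != self.last_seen {
            self.last_seen = now;
            Ok(true)
        } else {
            Ok(false)
        }
    }

    /// The most recently read contents of the file.
    pub fn contents(&self) -> &[u8] {
        &self.last_seen
    }
}
```

Calling `poll` on a timer and swapping the TLS context when it returns `true` would give automatic reloads without any signal or endpoint.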

@blaggacao
Contributor Author

blaggacao commented Oct 12, 2020

I came to the conclusion that if time and budget are to be spent on this issue, the result should ultimately be made available upstream: rwf2/Rocket#1448

It looks as if this is the Rocket framework's implementation of TLS. I kind of get it, though this is my first time reading Rust code.

@blaggacao
Contributor Author

blaggacao commented Oct 13, 2020

I've set up a testbed here: #193
→ TTL can be set to something like 30 seconds here: https://github.com/ContainerSolutions/trow/pull/193/files#diff-ac309bd9e52a2419f8aaff3203228458fbaec4f7336192cf4f4ec269ec7befd3R7

@amouat amouat added this to To do in Trow Oct 15, 2020