-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Small improvements for preparing PostgreSQL HA #1138
Conversation
0f7d0d1
to
bf4ed7f
Compare
okay, also queuing this up for tomorrow. I like that you broke this out, in spite of being so little line churn, most are thought provoking for someone without deep reflexes in your code, so I will have to read them when fresh. |
bf4ed7f
to
2cdbcfa
Compare
8b86158
to
1f8eb5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor comments I think, not really worthy blockers per se, but if you want to rectify them I think it'd be a small improvement.
1f8eb5f
to
4f2b191
Compare
Well really I want to see |
7b8c3c6
to
ae85bad
Compare
686c15b
to
587b233
Compare
587b233
to
e28f271
Compare
Threw in a slightly streamlined way of using rescue's built in matching functionality. I think it does what you want (confirm to be sure). I'm not sure if you've used git's After doing so (or rejecting it and removing the commit, or editing it a bit more), consider the entire series approved. |
I used to autosquash, not anymore. Call me weird, but I enjoy moving commits around in the interactive rebase view. 😏 |
ffac0be
to
3f81ce2
Compare
Alright, you be weird. Hey, you asked. |
We forgot to decrement some of the semaphores. With this commit we are fixing that. While doing so I moved the place of some semaphore updates for having a cleaner prog.
Having group of sections like associations, includes, semaphores, methods etc. makes the model files a bit more readable compared with mixing includes and methods. Also redacted_columns would not need to be read frequently, so moving them to less prominent space made sense.
Previously, hostname was set based on whether Config.postgres_service_hostname is set or not. However, checking the existence of DnsZone is safer. After all it is possible to set Config.postgres_service_hostname but not configure the DnsZone. In fact, Config.postgres_service_hostname is useful in non DnsZone related contexes (such as certificate generation), thus it is not unusual to set the config but not the DnsZone.
The label itself is already designed to be runnable after different occassions, such as during provisioning or at the end of PITR. This commit simply adds a semaphore to trigger the label when needed.
Previously, if the initialization fails midway, it might not be repeatable depending the the failure point. This commit cleans up the data and config directories before attempting to retry.
This will be useful when we setup HA standbys. Standbys would want to connect to the primary nodes with verify-full option. At that time they will need to know root CAs to be able to complete verification of the certificate chain.
During HA standby setup, standbys would need to connect to the primary nodes and while doing so they will use this certificate to authenticate. However, to be able to use this certificate to authenticate, the certificate needs to have clientAuth capability.
In usual case, certificate creation would be completed very quickly as we sign certificates ourselves. However if certificate creation delays, potentially due to an error in previous step, and we continue with PostgresServer provisioning, server can put empty certs, which would prevent Postgres from starting. We add a check in PostgresServer::refresh_certificates to not progress if the certs are not ready yet.
Pulse check can throw an exception due to various reason. If that happens the thread responsible for performing pulse check for that particular resource would exit and we cannot perform any more health checks. We already catch the exceptions while processing SSH sessions for the same purpose. Adding a similar exception handling logic for pulse checks prevents thread exits.
Resources might want to store some, intended to be minimal, data in pulses to be able to make better unavailability decisions. Normally, a way to achieve this would be overriding the aggregate_readings method, but this seems like a common need so adding it to the default aggregate_readings method makes sense. However, it is important to note that the intention here is keeping such data to be small, so that it would not put extra load on monitor process that is designed to have high frequency.
3f81ce2
to
de22015
Compare
This PR consist of small improvements and refactors that I did while developing HA for PostgreSQL. I'm introducing them as separate PR for making the already complex HA PR a bit simpler/cleaner.
Clean up PostgreSQL semaphore updates
We forgot to decrement some of the semaphores. With this commit we are fixing
that. While doing so I moved the place of some semaphore updates for having a
cleaner prog.
Move redacted_columns in PostgresResource to end of file
This is purely estetic. I will add more class methods and wanted to put all of
them to the end of the file because it is OK for them to take less mind and eye
space.
Set PostgresResource's hostname based on existence of dns_zone
Previously, hostname was set based on whether Config.postgres_service_hostname
is set or not. However, checking the existence of DnsZone is safer. After all
it is possible to set Config.postgres_service_hostname but not configure the
DnsZone. In fact, Config.postgres_service_hostname is useful in non DnsZone
related contexes (such as certificate generation), thus it is not unusual to
set the config but not the DnsZone.
Make PostgresServer's configure label triggerable via semaphore
The label itself is already designed to be runnable after different occassions,
such as during provisioning or at the end of PITR. This commit simply adds a
semaphore to trigger the label when needed.
Make initializing database from backup idempotent
Previously, if the initialization fails midway, it might not be repeatable
depending the the failure point. This commit cleans up the data and config
directories before attempting to retry.
Put root CA bundle to PostgresServer's VMs
This will be useful when we setup HA standbys. Standbys would want to connect
to the primary nodes with verify-full option. At that time they will need to
know root CAs to be able to complete verification of the certificate chain.
Add clientAuth capability to Postgres certificates
This will be useful when we setup HA standbys. Standbys would want to connect
to the primary nodes and they will use this certificate to authenticate. To be
able to use this certificate to authenticate, the certificates needs to have
clientAuth capability.
Wait for certificate cration by PostgresResource
In usual case, certificate creation would be completed very quickly as we sign
certificates ourselves. However if certificate creation delays, potentially due
to an error in previous step, and we continue with PostgresServer provisioning,
server can put empty certs, which would prevent Postgres from starting. We add
a check in PostgresServer::refresh_certificates to not progress if the certs
are not ready yet.
Rescue errors during pulse check
Pulse check can throw an exception due to various reason. If that happens the
thread responsible for performing pulse check for that particular resource
would exit and we cannot perform any more health checks. We already catch the
exceptions while processing SSH sessions for the same purpose. Adding a similar
exception handling logic for pulse checks prevents thread exits.
Allow passing and storing extra data in pulses
Resources might want to store some, intended to be minimal, data in pulses to
be able to make better unavailability decisions. Normally, a way to achieve
this would be overriding the aggregate_readings method, but this seems like a
common need so adding it to the default aggregate_readings method makes sense.
However, it is important to note that the intention here is keeping such data
to be small, so that it would not put extra load on monitor process that is
designed to have high frequency.