Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small improvements for preparing PostgreSQL HA #1138

Merged
merged 11 commits into from Feb 12, 2024
Merged

Small improvements for preparing PostgreSQL HA #1138

merged 11 commits into from Feb 12, 2024

Conversation

byucesoy
Copy link
Member

@byucesoy byucesoy commented Jan 23, 2024

This PR consist of small improvements and refactors that I did while developing HA for PostgreSQL. I'm introducing them as separate PR for making the already complex HA PR a bit simpler/cleaner.

Clean up PostgreSQL semaphore updates
We forgot to decrement some of the semaphores. With this commit we are fixing
that. While doing so I moved the place of some semaphore updates for having a
cleaner prog.

Move redacted_columns in PostgresResource to end of file
This is purely estetic. I will add more class methods and wanted to put all of
them to the end of the file because it is OK for them to take less mind and eye
space.

Set PostgresResource's hostname based on existence of dns_zone
Previously, hostname was set based on whether Config.postgres_service_hostname
is set or not. However, checking the existence of DnsZone is safer. After all
it is possible to set Config.postgres_service_hostname but not configure the
DnsZone. In fact, Config.postgres_service_hostname is useful in non DnsZone
related contexes (such as certificate generation), thus it is not unusual to
set the config but not the DnsZone.

Make PostgresServer's configure label triggerable via semaphore
The label itself is already designed to be runnable after different occassions,
such as during provisioning or at the end of PITR. This commit simply adds a
semaphore to trigger the label when needed.

Make initializing database from backup idempotent
Previously, if the initialization fails midway, it might not be repeatable
depending the the failure point. This commit cleans up the data and config
directories before attempting to retry.

Put root CA bundle to PostgresServer's VMs
This will be useful when we setup HA standbys. Standbys would want to connect
to the primary nodes with verify-full option. At that time they will need to
know root CAs to be able to complete verification of the certificate chain.

Add clientAuth capability to Postgres certificates
This will be useful when we setup HA standbys. Standbys would want to connect
to the primary nodes and they will use this certificate to authenticate. To be
able to use this certificate to authenticate, the certificates needs to have
clientAuth capability.

Wait for certificate cration by PostgresResource
In usual case, certificate creation would be completed very quickly as we sign
certificates ourselves. However if certificate creation delays, potentially due
to an error in previous step, and we continue with PostgresServer provisioning,
server can put empty certs, which would prevent Postgres from starting. We add
a check in PostgresServer::refresh_certificates to not progress if the certs
are not ready yet.

Rescue errors during pulse check
Pulse check can throw an exception due to various reason. If that happens the
thread responsible for performing pulse check for that particular resource
would exit and we cannot perform any more health checks. We already catch the
exceptions while processing SSH sessions for the same purpose. Adding a similar
exception handling logic for pulse checks prevents thread exits.

Allow passing and storing extra data in pulses
Resources might want to store some, intended to be minimal, data in pulses to
be able to make better unavailability decisions. Normally, a way to achieve
this would be overriding the aggregate_readings method, but this seems like a
common need so adding it to the default aggregate_readings method makes sense.

However, it is important to note that the intention here is keeping such data
to be small, so that it would not put extra load on monitor process that is
designed to have high frequency.

prog/postgres/postgres_server_nexus.rb Outdated Show resolved Hide resolved
model/postgres/postgres_resource.rb Show resolved Hide resolved
config.rb Outdated Show resolved Hide resolved
@fdr
Copy link
Collaborator

fdr commented Jan 31, 2024

okay, also queuing this up for tomorrow. I like that you broke this out, in spite of being so little line churn, most are thought provoking for someone without deep reflexes in your code, so I will have to read them when fresh.

Copy link
Collaborator

@fdr fdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very minor comments I think, not really worthy blockers per se, but if you want to rectify them I think it'd be a small improvement.

bin/monitor Show resolved Hide resolved
@fdr
Copy link
Collaborator

fdr commented Feb 6, 2024

Well really I want to see rescue => ex justified but once you've done that, let's put it in (or you can send them out in whatever batches if you want to hedge your bets)

@byucesoy byucesoy force-pushed the pg-refactors branch 2 times, most recently from 7b8c3c6 to ae85bad Compare February 6, 2024 12:01
@fdr
Copy link
Collaborator

fdr commented Feb 9, 2024

Threw in a slightly streamlined way of using rescue's built in matching functionality. I think it does what you want (confirm to be sure). I'm not sure if you've used git's --autosquash, but it would auto...squash it with the correct commit if you accept the change.

After doing so (or rejecting it and removing the commit, or editing it a bit more), consider the entire series approved.

@byucesoy
Copy link
Member Author

byucesoy commented Feb 9, 2024

I used to autosquash, not anymore. Call me weird, but I enjoy moving commits around in the interactive rebase view. 😏

@fdr
Copy link
Collaborator

fdr commented Feb 10, 2024

I used to autosquash, not anymore. Call me weird, but I enjoy moving commits around in the interactive rebase view. 😏

Alright, you be weird. Hey, you asked.

We forgot to decrement some of the semaphores. With this commit we are fixing
that. While doing so I moved the place of some semaphore updates for having a
cleaner prog.
Having group of sections like associations, includes, semaphores, methods etc.
makes the model files a bit more readable compared with mixing includes and
methods.

Also redacted_columns would not need to be read frequently, so moving them to
less prominent space made sense.
Previously, hostname was set based on whether Config.postgres_service_hostname
is set or not. However, checking the existence of DnsZone is safer. After all
it is possible to set Config.postgres_service_hostname but not configure the
DnsZone. In fact, Config.postgres_service_hostname is useful in non DnsZone
related contexes (such as certificate generation), thus it is not unusual to
set the config but not the DnsZone.
The label itself is already designed to be runnable after different occassions,
such as during provisioning or at the end of PITR. This commit simply adds a
semaphore to trigger the label when needed.
Previously, if the initialization fails midway, it might not be repeatable
depending the the failure point. This commit cleans up the data and config
directories before attempting to retry.
This will be useful when we setup HA standbys. Standbys would want to connect
to the primary nodes with verify-full option. At that time they will need to
know root CAs to be able to complete verification of the certificate chain.
During HA standby setup, standbys would need to connect to the primary nodes
and while doing so they will use this certificate to authenticate. However,
to be able to use this certificate to authenticate, the certificate needs to
have clientAuth capability.
In usual case, certificate creation would be completed very quickly as we sign
certificates ourselves. However if certificate creation delays, potentially due
to an error in previous step, and we continue with PostgresServer provisioning,
server can put empty certs, which would prevent Postgres from starting. We add
a check in PostgresServer::refresh_certificates to not progress if the certs
are not ready yet.
Pulse check can throw an exception due to various reason. If that happens the
thread responsible for performing pulse check for that particular resource
would exit and we cannot perform any more health checks. We already catch the
exceptions while processing SSH sessions for the same purpose. Adding a similar
exception handling logic for pulse checks prevents thread exits.
Resources might want to store some, intended to be minimal, data in pulses to
be able to make better unavailability decisions. Normally, a way to achieve
this would be overriding the aggregate_readings method, but this seems like a
common need so adding it to the default aggregate_readings method makes sense.

However, it is important to note that the intention here is keeping such data
to be small, so that it would not put extra load on monitor process that is
designed to have high frequency.
@byucesoy byucesoy merged commit e3a724a into main Feb 12, 2024
5 checks passed
@byucesoy byucesoy deleted the pg-refactors branch February 12, 2024 07:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants