Small improvements for preparing PostgreSQL HA #1138

byucesoy · 2024-01-23T02:18:58Z

This PR consist of small improvements and refactors that I did while developing HA for PostgreSQL. I'm introducing them as separate PR for making the already complex HA PR a bit simpler/cleaner.

Clean up PostgreSQL semaphore updates
We forgot to decrement some of the semaphores. With this commit we are fixing
that. While doing so I moved the place of some semaphore updates for having a
cleaner prog.

Move redacted_columns in PostgresResource to end of file
This is purely estetic. I will add more class methods and wanted to put all of
them to the end of the file because it is OK for them to take less mind and eye
space.

Set PostgresResource's hostname based on existence of dns_zone
Previously, hostname was set based on whether Config.postgres_service_hostname
is set or not. However, checking the existence of DnsZone is safer. After all
it is possible to set Config.postgres_service_hostname but not configure the
DnsZone. In fact, Config.postgres_service_hostname is useful in non DnsZone
related contexes (such as certificate generation), thus it is not unusual to
set the config but not the DnsZone.

Make PostgresServer's configure label triggerable via semaphore
The label itself is already designed to be runnable after different occassions,
such as during provisioning or at the end of PITR. This commit simply adds a
semaphore to trigger the label when needed.

Make initializing database from backup idempotent
Previously, if the initialization fails midway, it might not be repeatable
depending the the failure point. This commit cleans up the data and config
directories before attempting to retry.

Put root CA bundle to PostgresServer's VMs
This will be useful when we setup HA standbys. Standbys would want to connect
to the primary nodes with verify-full option. At that time they will need to
know root CAs to be able to complete verification of the certificate chain.

Add clientAuth capability to Postgres certificates
This will be useful when we setup HA standbys. Standbys would want to connect
to the primary nodes and they will use this certificate to authenticate. To be
able to use this certificate to authenticate, the certificates needs to have
clientAuth capability.

Wait for certificate cration by PostgresResource
In usual case, certificate creation would be completed very quickly as we sign
certificates ourselves. However if certificate creation delays, potentially due
to an error in previous step, and we continue with PostgresServer provisioning,
server can put empty certs, which would prevent Postgres from starting. We add
a check in PostgresServer::refresh_certificates to not progress if the certs
are not ready yet.

Rescue errors during pulse check
Pulse check can throw an exception due to various reason. If that happens the
thread responsible for performing pulse check for that particular resource
would exit and we cannot perform any more health checks. We already catch the
exceptions while processing SSH sessions for the same purpose. Adding a similar
exception handling logic for pulse checks prevents thread exits.

Allow passing and storing extra data in pulses
Resources might want to store some, intended to be minimal, data in pulses to
be able to make better unavailability decisions. Normally, a way to achieve
this would be overriding the aggregate_readings method, but this seems like a
common need so adding it to the default aggregate_readings method makes sense.

However, it is important to note that the intention here is keeping such data
to be small, so that it would not put extra load on monitor process that is
designed to have high frequency.

prog/postgres/postgres_server_nexus.rb

model/postgres/postgres_resource.rb

config.rb

fdr · 2024-01-31T06:41:07Z

okay, also queuing this up for tomorrow. I like that you broke this out, in spite of being so little line churn, most are thought provoking for someone without deep reflexes in your code, so I will have to read them when fresh.

model/postgres/postgres_resource.rb

prog/postgres/postgres_resource_nexus.rb

fdr

very minor comments I think, not really worthy blockers per se, but if you want to rectify them I think it'd be a small improvement.

model/postgres/postgres_resource.rb

prog/postgres/postgres_server_nexus.rb

bin/monitor

fdr · 2024-02-06T06:50:37Z

Well really I want to see rescue => ex justified but once you've done that, let's put it in (or you can send them out in whatever batches if you want to hedge your bets)

prog/postgres/postgres_resource_nexus.rb

fdr · 2024-02-09T01:49:20Z

Threw in a slightly streamlined way of using rescue's built in matching functionality. I think it does what you want (confirm to be sure). I'm not sure if you've used git's --autosquash, but it would auto...squash it with the correct commit if you accept the change.

After doing so (or rejecting it and removing the commit, or editing it a bit more), consider the entire series approved.

byucesoy · 2024-02-09T11:50:18Z

I used to autosquash, not anymore. Call me weird, but I enjoy moving commits around in the interactive rebase view. 😏

fdr · 2024-02-10T18:03:56Z

I used to autosquash, not anymore. Call me weird, but I enjoy moving commits around in the interactive rebase view. 😏

Alright, you be weird. Hey, you asked.

We forgot to decrement some of the semaphores. With this commit we are fixing that. While doing so I moved the place of some semaphore updates for having a cleaner prog.

Having group of sections like associations, includes, semaphores, methods etc. makes the model files a bit more readable compared with mixing includes and methods. Also redacted_columns would not need to be read frequently, so moving them to less prominent space made sense.

Previously, hostname was set based on whether Config.postgres_service_hostname is set or not. However, checking the existence of DnsZone is safer. After all it is possible to set Config.postgres_service_hostname but not configure the DnsZone. In fact, Config.postgres_service_hostname is useful in non DnsZone related contexes (such as certificate generation), thus it is not unusual to set the config but not the DnsZone.

The label itself is already designed to be runnable after different occassions, such as during provisioning or at the end of PITR. This commit simply adds a semaphore to trigger the label when needed.

Previously, if the initialization fails midway, it might not be repeatable depending the the failure point. This commit cleans up the data and config directories before attempting to retry.

This will be useful when we setup HA standbys. Standbys would want to connect to the primary nodes with verify-full option. At that time they will need to know root CAs to be able to complete verification of the certificate chain.

During HA standby setup, standbys would need to connect to the primary nodes and while doing so they will use this certificate to authenticate. However, to be able to use this certificate to authenticate, the certificate needs to have clientAuth capability.

In usual case, certificate creation would be completed very quickly as we sign certificates ourselves. However if certificate creation delays, potentially due to an error in previous step, and we continue with PostgresServer provisioning, server can put empty certs, which would prevent Postgres from starting. We add a check in PostgresServer::refresh_certificates to not progress if the certs are not ready yet.

Pulse check can throw an exception due to various reason. If that happens the thread responsible for performing pulse check for that particular resource would exit and we cannot perform any more health checks. We already catch the exceptions while processing SSH sessions for the same purpose. Adding a similar exception handling logic for pulse checks prevents thread exits.

Resources might want to store some, intended to be minimal, data in pulses to be able to make better unavailability decisions. Normally, a way to achieve this would be overriding the aggregate_readings method, but this seems like a common need so adding it to the default aggregate_readings method makes sense. However, it is important to note that the intention here is keeping such data to be small, so that it would not put extra load on monitor process that is designed to have high frequency.

byucesoy requested review from fdr, pykello, enescakir, furkansahin and velioglu January 23, 2024 02:18

byucesoy self-assigned this Jan 23, 2024

enescakir reviewed Jan 23, 2024

View reviewed changes

prog/postgres/postgres_server_nexus.rb Outdated Show resolved Hide resolved

model/postgres/postgres_resource.rb Show resolved Hide resolved

config.rb Outdated Show resolved Hide resolved

byucesoy force-pushed the pg-refactors branch from 0f7d0d1 to bf4ed7f Compare January 29, 2024 05:08

byucesoy force-pushed the pg-refactors branch from bf4ed7f to 2cdbcfa Compare January 31, 2024 07:42

byucesoy requested a review from enescakir January 31, 2024 07:44

byucesoy mentioned this pull request Jan 31, 2024

Fix PG Server certificate refresh infinite loop #1162

Closed

byucesoy force-pushed the pg-refactors branch 2 times, most recently from 8b86158 to 1f8eb5f Compare February 1, 2024 15:47

fdr reviewed Feb 5, 2024

View reviewed changes

model/postgres/postgres_resource.rb Outdated Show resolved Hide resolved

fdr reviewed Feb 5, 2024

View reviewed changes

prog/postgres/postgres_resource_nexus.rb Show resolved Hide resolved

fdr approved these changes Feb 5, 2024

View reviewed changes

fdr reviewed Feb 5, 2024

View reviewed changes

model/postgres/postgres_resource.rb Outdated Show resolved Hide resolved

byucesoy force-pushed the pg-refactors branch from 1f8eb5f to 4f2b191 Compare February 6, 2024 04:11

fdr reviewed Feb 6, 2024

View reviewed changes

prog/postgres/postgres_server_nexus.rb Show resolved Hide resolved

fdr reviewed Feb 6, 2024

View reviewed changes

bin/monitor Show resolved Hide resolved

fdr approved these changes Feb 6, 2024

View reviewed changes

byucesoy force-pushed the pg-refactors branch 2 times, most recently from 7b8c3c6 to ae85bad Compare February 6, 2024 12:01

furkansahin reviewed Feb 6, 2024

View reviewed changes

prog/postgres/postgres_resource_nexus.rb Show resolved Hide resolved

byucesoy force-pushed the pg-refactors branch 2 times, most recently from 686c15b to 587b233 Compare February 8, 2024 13:46

byucesoy added the waiting_deployment label Feb 8, 2024

byucesoy force-pushed the pg-refactors branch from 587b233 to e28f271 Compare February 8, 2024 13:54

fdr force-pushed the pg-refactors branch from ff1aa4b to ffac0be Compare February 9, 2024 01:47

byucesoy force-pushed the pg-refactors branch from ffac0be to 3f81ce2 Compare February 9, 2024 11:50

fdr approved these changes Feb 10, 2024

View reviewed changes

byucesoy added 11 commits February 12, 2024 07:52

Clean up PostgreSQL semaphore updates

c8a671f

We forgot to decrement some of the semaphores. With this commit we are fixing that. While doing so I moved the place of some semaphore updates for having a cleaner prog.

Make PostgresServer's configure label triggerable via semaphore

30fb065

The label itself is already designed to be runnable after different occassions, such as during provisioning or at the end of PITR. This commit simply adds a semaphore to trigger the label when needed.

Make initializing database idempotent

84d1fe8

Previously, if the initialization fails midway, it might not be repeatable depending the the failure point. This commit cleans up the data and config directories before attempting to retry.

Put root CA bundle to PostgresServer's VMs

9451ef5

This will be useful when we setup HA standbys. Standbys would want to connect to the primary nodes with verify-full option. At that time they will need to know root CAs to be able to complete verification of the certificate chain.

Add clear:true for blob storage secret key

ed6f044

byucesoy force-pushed the pg-refactors branch from 3f81ce2 to de22015 Compare February 12, 2024 06:53

byucesoy merged commit e3a724a into main Feb 12, 2024
5 checks passed

byucesoy deleted the pg-refactors branch February 12, 2024 07:51

byucesoy removed the waiting_deployment label Feb 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Small improvements for preparing PostgreSQL HA #1138

Small improvements for preparing PostgreSQL HA #1138

byucesoy commented Jan 23, 2024 •

edited

fdr commented Jan 31, 2024

fdr left a comment

fdr commented Feb 6, 2024

fdr commented Feb 9, 2024 •

edited

byucesoy commented Feb 9, 2024

fdr commented Feb 10, 2024

Small improvements for preparing PostgreSQL HA #1138

Small improvements for preparing PostgreSQL HA #1138

Conversation

byucesoy commented Jan 23, 2024 • edited

fdr commented Jan 31, 2024

fdr left a comment

Choose a reason for hiding this comment

fdr commented Feb 6, 2024

fdr commented Feb 9, 2024 • edited

byucesoy commented Feb 9, 2024

fdr commented Feb 10, 2024

byucesoy commented Jan 23, 2024 •

edited

fdr commented Feb 9, 2024 •

edited