allow more than 1 PgBouncer replicas #622

Open
2 tasks done
low-on-mana opened this issue Jul 4, 2022 · 5 comments
Labels
kind/enhancement kind - new features or changes status/needs-discussion status - this needs discussion

Comments

@low-on-mana

Checks

Chart Version

latest

Kubernetes Version

NA

Helm Version

NA

Description

We are using the latest version of this chart in production with Airflow 2.3.0 (we did this migration a few days ago).

One of the issues we faced is related to PgBouncer.
Kubernetes rescheduled the PgBouncer pod to another node. Since only 1 pod was running, we had one task failure, which we had to retry manually later.

We could set safe_to_evict to false or add a pod disruption budget as a workaround, but the best solution would be to make PgBouncer HA by running multiple pods.
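
For reference, the pod disruption budget workaround could look roughly like the sketch below, which builds the manifest as a Python dict and prints it as JSON (a format `kubectl apply -f` accepts). The resource name and the label selector are hypothetical placeholders, not names taken from the chart. With a single replica, `minAvailable: 1` only blocks voluntary evictions such as node drains; it does not make PgBouncer redundant.

```python
import json

# Hypothetical labels: check the labels on your own PgBouncer pods.
PGBOUNCER_LABELS = {"app": "airflow", "component": "pgbouncer"}

pdb_manifest = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "airflow-pgbouncer"},  # hypothetical name
    "spec": {
        # Refuse voluntary evictions that would leave fewer than 1 pod.
        "minAvailable": 1,
        "selector": {"matchLabels": PGBOUNCER_LABELS},
    },
}

# Pod annotation read by the Kubernetes cluster autoscaler. It goes on the
# pod's metadata, not on the PodDisruptionBudget.
SAFE_TO_EVICT_ANNOTATION = {
    "cluster-autoscaler.kubernetes.io/safe-to-evict": "false",
}

rendered = json.dumps(pdb_manifest, indent=2)
print(rendered)
```

Neither option helps with involuntary disruptions such as a node crash, which is why the request here is for multiple replicas.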

Can we have 2 pods for an HA PgBouncer?

spec:
  replicas: 1
  strategy:
    rollingUpdate:
      ## multiple pgbouncer pods can safely run concurrently

Relevant Logs

No response

Custom Helm Values

No response

@low-on-mana low-on-mana added the kind/bug kind - things not working properly label Jul 4, 2022
@jurovee

jurovee commented Jul 4, 2022

@low-on-mana is this really safe to use in an Airflow environment? I was actually wondering about the same thing: having some kind of backup if one PgBouncer replica fails (during k8s node patching or whatever). The official chart also uses a hardcoded replicas: 1.

I've tried to understand how multiple PgBouncer replicas can affect the deployment (connections to the DB, etc.) but didn't find any suitable links or tutorials explaining this multi-replica PgBouncer setup.

Would it also require customizing values such as maxClientConnections and poolSize? E.g. if you set replicas to 3, would you need to adjust these values accordingly (divide by 3?).
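
The "divide by 3" idea can be sketched as simple arithmetic. This assumes that each PgBouncer replica keeps its own independent pool of server connections, so the database sees roughly `replicas * poolSize` connections in total, and that clients are spread about evenly across replicas. The function name and the example numbers are hypothetical; only `maxClientConnections` and `poolSize` come from the question above.

```python
import math


def per_replica_settings(total_client_conns: int, total_db_conns: int, replicas: int) -> dict:
    """Split cluster-wide connection budgets across PgBouncer replicas."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return {
        # Round up so the replicas together can still accept every client.
        "maxClientConnections": math.ceil(total_client_conns / replicas),
        # Round down so the replicas together never exceed the DB budget.
        "poolSize": max(1, total_db_conns // replicas),
    }


settings = per_replica_settings(total_client_conns=1000, total_db_conns=60, replicas=3)
print(settings)
```

In practice some headroom on `maxClientConnections` is probably wise, since load balancing across replicas is rarely perfectly even.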

Anyone who has any experience in this?

@stale

stale bot commented Sep 5, 2022

This issue has been automatically marked as stale because it has not had activity in 60 days.
It will be closed in 7 days if no further activity occurs.

Thank you for your contributions.


Issues never become stale if any of the following is true:

  1. they are added to a Project
  2. they are added to a Milestone
  3. they have the lifecycle/frozen label

@stale stale bot added the lifecycle/stale lifecycle - this is stale label Sep 5, 2022
@thesuperzapper
Member

@low-on-mana @jurovee I agree that having multiple PgBouncer replicas would (in theory) be great for redundancy, especially during node outages/upgrades. The problem is that any disruption to the database connection during a transaction will result in Airflow raising an error, which I doubt Airflow will gracefully recover from.

(NOTE: airflow uses SQLAlchemy in "pessimistic" pooling mode with the pre-ping approach, which can't handle mid-transaction failures)
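
To illustrate the limitation in the note, here is a toy model of a pre-ping pool. It is not SQLAlchemy's implementation; every class and name below is hypothetical. It only shows why a liveness check at checkout time catches connections that died while idle, but not ones that die mid-transaction.

```python
class DeadConnectionError(Exception):
    """Raised when a statement is sent over a connection that has gone away."""


class ToyConnection:
    """Stand-in for a DB connection routed through one PgBouncer pod."""

    def __init__(self):
        self.alive = True

    def execute(self, statement):
        if not self.alive:
            raise DeadConnectionError(statement)
        return "ok: " + statement


class PrePingPool:
    """Toy pool that tests a connection only when it is checked out."""

    def __init__(self):
        self._idle = [ToyConnection()]
        self.replaced = 0

    def checkout(self):
        conn = self._idle.pop() if self._idle else ToyConnection()
        try:
            conn.execute("SELECT 1")  # the "pre-ping"
        except DeadConnectionError:
            conn = ToyConnection()  # transparently reconnect
            self.replaced += 1
        return conn

    def checkin(self, conn):
        self._idle.append(conn)


pool = PrePingPool()

# Case 1: the pod dies while the connection sits idle in the pool.
conn = pool.checkout()
pool.checkin(conn)
conn.alive = False
fresh = pool.checkout()  # the pre-ping notices and reconnects
idle_result = fresh.execute("UPDATE t SET x = 1")

# Case 2: the pod dies after checkout, in the middle of a transaction.
fresh.execute("BEGIN")
fresh.alive = False
try:
    fresh.execute("UPDATE t SET x = 2")
    mid_transaction_failed = False
except DeadConnectionError:
    mid_transaction_failed = True  # the pre-ping already ran, so it cannot help
```

In the second case the error reaches the caller, which matches the failure mode described here.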

That is to say, running more PgBouncer replicas actually increases the possibility of Airflow trying to use a connection to a PgBouncer Pod that is no longer active (and crashing as a result).
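
One way to read this claim is as a trade-off, sketched below under strong simplifying assumptions: each PgBouncer pod is disrupted independently with the same probability in a given window, and clients are spread evenly across pods. The function name and the 10% figure are illustrative only, not measurements from any cluster.

```python
from typing import Tuple


def disruption_profile(p_pod: float, replicas: int) -> Tuple[float, float]:
    """Return (chance that at least one pod is disrupted,
    share of connections hit by a single pod's disruption)."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    p_any = 1 - (1 - p_pod) ** replicas
    share_per_incident = 1 / replicas
    return p_any, share_per_incident


for n in (1, 2, 3):
    p_any, share = disruption_profile(p_pod=0.10, replicas=n)
    print(f"replicas={n}: P(any pod disrupted)={p_any:.3f}, share hit per incident={share:.2f}")
```

Under these assumptions, adding replicas makes it more likely that some pod is disrupted in the window, while each disruption touches a smaller share of the open connections.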

We would need to investigate getting airflow to use a different SQLAlchemy pooling mode (to allow mid-transaction failures to be resolved gracefully) before we can increase PgBouncer replicas.

@stale stale bot removed the lifecycle/stale lifecycle - this is stale label Sep 13, 2022
@thesuperzapper thesuperzapper changed the title from "Pgbouncer num replicas is not modifable" to "allow more than 1 PgBouncer replicas" Sep 13, 2022
@thesuperzapper thesuperzapper added kind/enhancement kind - new features or changes status/needs-discussion status - this needs discussion and removed kind/bug kind - things not working properly labels Sep 13, 2022
@waldoppper

@thesuperzapper Forgive me but why do you say higher "PgBouncer replicas actually increases the possibility of airflow trying to use a[n inactive] connection?"

I'm chasing HA on this particular component also, and want to understand the risk you're describing.

@stale

stale bot commented Nov 26, 2022

This issue has been automatically marked as stale because it has not had activity in 60 days.
It will be closed in 7 days if no further activity occurs.

Thank you for your contributions.


Issues never become stale if any of the following is true:

  1. they are added to a Project
  2. they are added to a Milestone
  3. they have the lifecycle/frozen label

@stale stale bot added the lifecycle/stale lifecycle - this is stale label Nov 26, 2022
@stale stale bot closed this as completed Dec 3, 2022
@thesuperzapper thesuperzapper removed the lifecycle/stale lifecycle - this is stale label Dec 21, 2022
@thesuperzapper thesuperzapper added this to Unsorted in Issue Triage and PR Tracking via automation Dec 21, 2022
@thesuperzapper thesuperzapper moved this from Unsorted to Triage | Needs Discussion in Issue Triage and PR Tracking Dec 21, 2022
Projects
Issue Triage and PR Tracking
Triage | Needs Discussion
Development

No branches or pull requests

4 participants