Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for git-sync with PVC (central sync deployment) #575

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

karakanb
Copy link
Contributor

@karakanb karakanb commented May 4, 2022

What issues does your PR fix?

What does your PR do?

This PR is a quick attempt to bring a single git-sync pod that would update the dags on a PVC, and the rest of the pods can just mount this diirectory instead of cloning the repos over and over again. I am not sure if this is the right way to go btw, I am open to feedback, and I haven really tested this yet.

There is a very good chance that this is a completely wrong way of implementing this, in which case I'd be totally okay to close the PR.

Checklist

For all Pull Requests

For releasing ONLY

@karakanb
Copy link
Contributor Author

karakanb commented May 12, 2022

The more that I think about this, the more it doesn't make sense to try to keep three strategies to copy the files tbh.

Currently, if this PR or sth along the way goes forward, we'll have 3 ways of doing this:

  1. git sync, running on every worker pod every time the task is launched, a.k.a status quo with gitSync.enabled: true
  2. persistent volume: no git-sync, just some dag data in the volume
  3. git sync + persistent volume: what this PR seems to propose

for the people that are currently using the first option, replacing it with number 3 doesn't really make any difference, with a small change though: if those people have DAGs that are creating files on the local volumes and whatever, that means that if we mount the shared volume directly on the pods, they might write stuff that are propagated to other pods as well, whereas in the status quo with git sync this never happens; therefore, rolling out option 3 would be a breaking change.

here's what I propose instead:

  • keep the current values file, and majority of the changes in this PR.
    • I am assuming this is kinda the right way of doing things, but regardless, the idea is to make it work with a separate deployment that populates a PV.
  • instead of running git-sync as an init container in every pod, do the following:
    • mount the PVC to a random location
    • run an barebones init container that copies the files from the PVC into the dags path in the values file

if we manage to do it this way, it'd mean that:

  • the chart could keep the backwards compatibility
  • we'd reduce the chance of unknown failures by a lot after an upgrade
    • it might very well be the case that the cluster admin is not on top of every DAG: in case someone had a DAG that does sth with the DAG directory, the end result of mounting the directory directly might cause a lot of trouble without no one noticing.

@thesuperzapper what do you think? I can try to advance this PR towards this way if you'd be up guiding me a bit.

Signed-off-by: Burak Karakan <burak.karakan@gmail.com>
Signed-off-by: Burak Karakan <burak.karakan@gmail.com>
…tsync

Signed-off-by: Burak Karakan <burak.karakan@gmail.com>
…enabled

Signed-off-by: Burak Karakan <burak.karakan@gmail.com>
Signed-off-by: Burak Karakan <burak.karakan@gmail.com>
Signed-off-by: Burak Karakan <burak.karakan@gmail.com>
@karakanb karakanb force-pushed the feature/try-adding-deployment-for-pvc-gitsync branch from feb6191 to 3512922 Compare May 12, 2022 23:25
…loyment

Signed-off-by: Burak Karakan <burak.karakan@gmail.com>
@karakanb
Copy link
Contributor Author

I went ahead and attempted doing the change with the copy trick, but there are a couple of things not working still:

  • I changed the securityContext because I couldn't make the volume writeable otherwise, not sure if that's the right way though
  • for some reason the git secret is not being mounted with the correct permissions, and git sync is not accepting those permissions, hence failing to pull the images

I'll try to work on this soon again, but leaving in case you can tackle it.

Signed-off-by: Burak Karakan <burak.karakan@gmail.com>
Signed-off-by: Burak Karakan <burak.karakan@gmail.com>
Signed-off-by: Burak Karakan <burak.karakan@gmail.com>
@karakanb
Copy link
Contributor Author

I think I have managed to get it working: the contents are being properly mounted, scheduler is able to see and schedule tasks as well. My local setup is not that good, so I am not able to test the whole flow, but as far as these changes are concerned, I think it is in an okay shape.

the things missing are:

  • security contexts are a bit of a mess, I don't know what I am doing there, so they need to be double-checked for sure.
  • maybe there needs to be some sort of a signal that the contents are fetched before the worker pods start?
    • it might be the case that the first time this is deployed, maybe the scheduler runs the copy before the dags are fetched?
    • we might need to add another init container to check if the repo is fetched already.
  • I haven't really tested regular flows.

any feedback is appreciated.

@thesuperzapper thesuperzapper added this to PR | Needs Review in Issue Triage and PR Tracking Jun 6, 2022
@karakanb
Copy link
Contributor Author

karakanb commented Jul 3, 2022

Another usecase for sth like this:

  • currently if you have gitSync enabled, it means every pod will have a gitSync sidecar running.
  • The problem with this is that if GitHub is down for example, gitSync sidecars will bring all of the airflow pods down as well instead of gracefully handling the failure.
  • In that case, all the gitSync containers are failing to fetch the repo, which is, for some reason, causing them to restart all the pods continuously.
  • This means in the case of the Git repo being unaccessible, everything about Airflow is being down as well.

For processes like gitSync where maybe 90% of the fetch operations do not change anything, I'd like to be able to set some sort of a failure strategy for the gitSync operation so that it doesn't bring Airflow down for everyone, and work with the stale copy of the DAGs where necessary. Having a separate deployment allows doing that.

@ohayak
Copy link

ohayak commented Feb 3, 2023

Any news on this PR ?

@thesuperzapper thesuperzapper added this to the airflow-8.7.0 milestone Feb 6, 2023
@thesuperzapper
Copy link
Member

@ohayak @karakanb sorry for the delay in reviewing this (was moving countries, lol), I will take a look soon.

@karakanb I see that you still have WIP on this PR, is this ready for review (so I can target it for 8.7.0 of the chart)?

@karakanb
Copy link
Contributor Author

karakanb commented Feb 8, 2023

I think it is ready for review.

@karakanb karakanb changed the title WIP: try adding deployment for pvc gitsync try adding deployment for pvc gitsync Feb 8, 2023
@thesuperzapper thesuperzapper changed the title try adding deployment for pvc gitsync feat: add support for git-sync with PVC (central sync deployment) May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

support using git-sync with PVC (allowing there to only be one git-sync container)
3 participants