Cannot update non-DAGs packages imported from DAGs using git-sync #39203
Comments
See also #118 (and especially this reply).
After further analysis, we think it can be a race condition as per the
Here is the status of our cache:
... where
I would be surprised, but maybe that's a side effect of https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#parsing-pre-import-modules. This configuration (True by default starting from 2.6.0) pre-imports the "airflow" module in the parent process before the fork happens, and it might well be that your local settings, imported together with it, pull in other imports when the "airflow" module is imported. An easy way to check is to set the flag to false; I'd recommend you do that. But then, if your hypothesis is right, you should also be able to test it by adding
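For reference, a sketch of how the flag mentioned above could be disabled via the environment (the variable name follows the linked configuration reference, which lists the option under the scheduler section; verify it against your Airflow version):

```shell
# Disable pre-importing of the "airflow" module in the DAG-processor
# parent process before the parse fork (option added in Airflow 2.6.0).
export AIRFLOW__SCHEDULER__PARSING_PRE_IMPORT_MODULES=False
```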
Apache Airflow version
Other Airflow 2 version (please specify below)
If "Other Airflow 2 version" selected, which one?
2.8.3
What happened?
We use a structure of git submodules synchronized by git-sync sidecars, as per the Typical Structure of Packages documentation.
Some git submodules contain DAG files; other submodules contain utility libraries imported by the DAGs. The directory containing the utility libraries is added to the PYTHONPATH so it is available to the DAGs. At startup everything works fine.
But we hit an issue when a new module is added and a DAG is modified to import it (both modifications pushed to git, with submodule updates, and the git-sync sidecar synchronizing the files). The DAG Processor component of Airflow then reprocesses the DAG bag, but it seems the importlib cache is not invalidated, and the new module is not found.
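The suspected caching behavior can be sketched locally with plain importlib (the module name and temporary directory below are hypothetical, not our actual repo layout):

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Simulate the git-sync'd utility-library directory on the PYTHONPATH.
libs_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(libs_dir))

# A first import attempt fails and, as a side effect, populates the
# FileFinder directory cache for libs_dir.
try:
    import my_new_util  # noqa: F401
except ModuleNotFoundError:
    pass

# Simulate git-sync adding a new module after the cache was populated.
(libs_dir / "my_new_util.py").write_text("VALUE = 42\n")

# Depending on filesystem mtime granularity, the stale finder cache may
# still hide the new file; invalidate_caches() forces a fresh scan.
importlib.invalidate_caches()

import my_new_util
print(my_new_util.VALUE)  # → 42
```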
The corresponding import errors show up in the DAG Processor logs, in the import errors DB table, and thus in the UI.
Note that this is not an issue with the PYTHONPATH: restarting the dag-processor and worker containers fixes the error.
What you think should happen instead?
The documentation of importlib mentions that invalidate_caches() might be used. It seems like the Airflow processes should call invalidate_caches() when the git ref behind the repo symlink changes (meaning the content of the code might have changed and should be reprocessed with fresh imports).

How to reproduce
Operating System
Kubernetes
Versions of Apache Airflow Providers
We use the constraints file from Apache Airflow for Python 3.10 and version 2.8.3.
Deployment
Official Apache Airflow Helm Chart
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct