Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When $TMPDIR is within $PWD, creating wheel dies on infinite recursion #992

Closed
1 task done
hroncok opened this issue Feb 8, 2024 · 10 comments · Fixed by #1014
Closed
1 task done

When $TMPDIR is within $PWD, creating wheel dies on infinite recursion #992

hroncok opened this issue Feb 8, 2024 · 10 comments · Fixed by #1014

Comments

@hroncok
Copy link
Contributor

hroncok commented Feb 8, 2024

Describe the bug

Due to the logic in

@contextmanager
def _in_temporary_directory(src_dir: Path) -> t.Iterator[None]:
with TemporaryDirectory(prefix='.tmp-yarl-pep517-') as tmp_dir:
with chdir_cm(tmp_dir):
tmp_src_dir = Path(tmp_dir) / 'src'
copytree(src_dir, tmp_src_dir, symlinks=True)
os.chdir(tmp_src_dir)
yield
the build backend dies with File name too long as the function recursively tries to copy $TMPDIR to itself when it is inside $PWD.

In Fedora, when we build wheels during building RPMs, we set $TMPDIR to be in $PWD -- it is a requirement to let RPM know about all the sources that have been built (e.g. to successfully create debugsource packages).

To Reproduce

$ git clone git@github.com:aio-libs/yarl.git
$ cd yarl
[yarl (master)]$ mkdir .tmp-in-tree
[yarl (master %)]$ export TMPDIR=$PWD/.tmp-in-tree
[yarl (master %)]$ pip wheel . --no-deps

Expected behavior

The folder needs to be excluded from copying. See my fix for the same problem in pip: pypa/pip#7873

Logs/tracebacks

$ pip wheel . --no-deps
Processing .../yarl
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: yarl
  Building wheel for yarl (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for yarl (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [33 lines of output]
      **********************
      * Accelerated build *
      **********************
      Traceback (most recent call last):
        File "/usr/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/usr/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib64/python3.12/contextlib.py", line 81, in inner
          return func(*args, **kwds)
                 ^^^^^^^^^^^^^^^^^^^
        File ".../yarl/packaging/pep517_backend/_backend.py", line 287, in build_wheel
          with maybe_prebuild_c_extensions(
        File "/usr/lib64/python3.12/contextlib.py", line 137, in __enter__
          return next(self.gen)
                 ^^^^^^^^^^^^^^
        File ".../yarl/packaging/pep517_backend/_backend.py", line 261, in maybe_prebuild_c_extensions
          with build_dir_ctx:
        File "/usr/lib64/python3.12/contextlib.py", line 137, in __enter__
          return next(self.gen)
                 ^^^^^^^^^^^^^^
        File ".../yarl/packaging/pep517_backend/_backend.py", line 202, in _in_temporary_directory
          copytree(src_dir, tmp_src_dir, symlinks=True)
        File "/usr/lib64/python3.12/shutil.py", line 588, in copytree
          return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/usr/lib64/python3.12/shutil.py", line 542, in _copytree
          raise Error(errors)
      shutil.Error: [('.../yarl/.tmp-in-tree/.tmp-yarl-pep517-pspb0f32/src/.tmp-in-tree/.tmp-yarl-pep517-pspb0f32/src/.tmp-in-tree/.tmp-yarl-pep517-pspb0f32/src/.tmp-in-tree/....[Errno 36] File name too long: ...tmp-yarl-pep517-pspb0f32'")]
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for yarl
Failed to build yarl
ERROR: Failed to build one or more wheels

Python Version

$ python --version
Python 3.12.1

But happens with any.

multidict Version

$ python -m pip show multidict
N/A (happens at build time)

yarl Version

$ python -m pip show yarl
N/A (happens at build time)

OS

Fedora Linux

Additional context

This was reported to us as a problem in one of our tools: befeleme/pyp2spec#44

The same issue existed in pip 4 years ago: pypa/pip#7872

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
@webknjaz
Copy link
Member

webknjaz commented Feb 8, 2024

Brilliant! Thanks for the details! I saw this problem in ansible/pylibssh (where I have RPM smoke tests and Packit integrated) but haven't had time to figure it our myself. I'll try to patch this across the @aio-libs projects that share the same code and there. Thanks for solving this mystery!

@hroncok
Copy link
Contributor Author

hroncok commented Feb 8, 2024

I am writing the ignore function and will test it.

@webknjaz
Copy link
Member

webknjaz commented Feb 8, 2024

@hroncok what do you think the correct behavior should be? I was thinking maybe erroring out early asking the caller to use a different tmp directory? I was also thinking (earlier, before I discovered the bug) of having a config setting for explicitly requesting in and out-of-tree builds passed to the backend that would give the caller a way of overriding the defaults.

@hroncok
Copy link
Contributor Author

hroncok commented Feb 8, 2024

This implements the behavior as I think is the correct behavior:

@contextmanager
def _in_temporary_directory(src_dir: Path) -> t.Iterator[None]:
    with TemporaryDirectory(prefix='.tmp-yarl-pep517-') as tmp_dir:
        with chdir_cm(tmp_dir):
            tmp_src_dir = Path(tmp_dir) / 'src'
            def _ignore(d, _):
                """
                Prevent temporary directory to be copied into self
                recursively forever.
                Fixes https://github.com/aio-libs/yarl/issues/992
                """
                if Path(d) == tmp_src_dir.parent:
                    return [tmp_src_dir.name]
                return []
            copytree(src_dir, tmp_src_dir, symlinks=True, ignore=_ignore)
            os.chdir(tmp_src_dir)
            yield

@webknjaz
Copy link
Member

webknjaz commented Feb 9, 2024

@hroncok I suppose this could work. Though I'm not sure how safe it's to put tmp dir inside a project dir in general. It seems like similar problems are bound to happen with other tools. I like how Gentoo organizes the build dir layout — they have a top-level package tmp dir and inside, there's separate work, temp and build. So they don't intersect, being isolated from each other, while still being scoped on disk.

With that, I don't know if it's reasonable to allow such a silent behavior. Wouldn't it be safer to have the downstreams opt-in to it with an error by default? More globally, I can imagine various build backends picking up garbage files from tmp just because it's laying around in the workdir. Should this be solved in a systemic way in the distro macros? Can it be changed to something like TMPDIR="%{_buildrootdir}/tmp"?

@mgorny to get another downstream perspective, could you share any insights why Gentoo doesn't put tmp in the source checkout dir?

@brianjmurrell
Copy link

This implements the behavior as I think is the correct behavior:

@contextmanager
def _in_temporary_directory(src_dir: Path) -> t.Iterator[None]:
    with TemporaryDirectory(prefix='.tmp-yarl-pep517-') as tmp_dir:
        with chdir_cm(tmp_dir):
            tmp_src_dir = Path(tmp_dir) / 'src'
            def _ignore(d, _):
                """
                Prevent temporary directory to be copied into self
                recursively forever.
                Fixes https://github.com/aio-libs/yarl/issues/992
                """
                if Path(d) == tmp_src_dir.parent:
                    return [tmp_src_dir.name]
                return []
            copytree(src_dir, tmp_src_dir, symlinks=True, ignore=_ignore)
            os.chdir(tmp_src_dir)
            yield

Confirming that this does seem to fix the build issue I was having in befeleme/pyp2spec#44.

@hroncok
Copy link
Contributor Author

hroncok commented Feb 9, 2024

I suppose this could work.

It works for pip ;)

Though I'm not sure how safe it's to put tmp dir inside a project dir in general.

Generally, it is quite safe. It took 4 years to find a project that was negatively impacted.

In my head, taking the whole pwd and starting copying it somewhere is unsafe by design. What if the working tree is dirty and contains very large files?

It seems like similar problems are bound to happen with other tools.

You mean other tools that put tmpdir in pwd will also be impacted by the current implementation of _in_temporary_directory? Yes.

Or do you mean other build backends will have the same problem when we put tmpdir in pwd? Because so far there hasn't been much.

I like how Gentoo organizes the build dir layout — they have a top-level package tmp dir and inside, there's separate work, temp and build. So they don't intersect, being isolated from each other, while still being scoped on disk.

That looks nice, but I am afraid it would be too hard for us to do it that way now. Traditionally, rpms are built directly in unpacked source tree -- we would need to adjust all the Python packages to create subdirectories in rpm's %prep.

@webknjaz
Copy link
Member

webknjaz commented Feb 20, 2024

I suppose this could work.

It works for pip ;)

Though I'm not sure how safe it's to put tmp dir inside a project dir in general.

Generally, it is quite safe. It took 4 years to find a project that was negatively impacted.

Yeah, but it still seems like a can of worms. Especially as the resulting dists end up in the same tree by default.

In my head, taking the whole pwd and starting copying it somewhere is unsafe by design. What if the working tree is dirty and contains very large files?

Yes, I was considering using git worktree to get the clean source out of the Git tree. Though, while implementing this, I ended up forgetting about that idea and simply used a cp -r.

My idea was to essentially prevent accidental local modifications from influencing the dists that are going to be uploaded for public consumption. And default to in-tree builds for editable installs since this is what's needed during development.
I recognize that this approach may not be ideal for those applying patches while repackaging, but I wanted to minimize the probability of auto-including junk.

It seems like similar problems are bound to happen with other tools.

You mean other tools that put tmpdir in pwd will also be impacted by the current implementation of _in_temporary_directory? Yes.

Or do you mean other build backends will have the same problem when we put tmpdir in pwd? Because so far there hasn't been much.

Both, I suppose. PEP 517 does not demand building in a certain directory while giving the backends the freedom to do whatever for as long as the communication interface and frontend/backend boundary is respected. PEP 660 kinda implies an in-tree build, though, due to the nature of editable installs.

I guess the point is that the layout that pyproject-rpm-macros works most of the time is an accident because it relies on something that is not guaranteed by the standards currently in place. And so, I argue that there should be a mechanism to change that.

I like how Gentoo organizes the build dir layout — they have a top-level package tmp dir and inside, there's separate work, temp and build. So they don't intersect, being isolated from each other, while still being scoped on disk.

That looks nice, but I am afraid it would be too hard for us to do it that way now. Traditionally, rpms are built directly in unpacked source tree -- we would need to adjust all the Python packages to create subdirectories in rpm's %prep.

Well, I had to point it out anyway.. I suppose the current way out is to implement that config setting configuration I was thinking about to explicitly request building in-tree. I imagine this might be useful to other downstreams too.

I'll probably integrate your patch with some conditionals and the PEP 517 config setting on top to address this. And then, you'll need to adjust the RPM spec to pass the in-tree setting (I know that it's doable).

@webknjaz
Copy link
Member

I got to looking into this and one observation is that using python -m build performs copying and chdir first, while python -m pip wheel doesn't. That's why this wasn't reproducible easily.

@hroncok
Copy link
Contributor Author

hroncok commented May 31, 2024

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants