Docker Image for running distilabel CLI #611

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
Open

Conversation

ignacioct (Contributor) commented May 7, 2024

Closes #608

I've implemented two images for running distilabel: one built from runpod/pytorch:2.1.1-py3.10-cuda12.1.1-devel-ubuntu22.04, which can use CUDA, and a more lightweight one built from python:3.11-slim. To try them out:

```bash
docker build --tag distilabel_cuda --file docker/CUDA.Dockerfile . --no-cache
docker run --rm distilabel_cuda distilabel pipeline run --config "https://huggingface.co/datasets/distilabel-internal-testing/test-dockerfile-2/raw/main/pipeline.yaml"
docker build --tag distilabel_local --file docker/local.Dockerfile . --no-cache
docker run --rm distilabel_local distilabel pipeline run --config "https://huggingface.co/datasets/distilabel-internal-testing/test-dockerfile-2/raw/main/pipeline.yaml"
```

I'm unsure which dependencies to include in the local image. Do you have any ideas, @plaguss @gabrielmbmb?
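One way to sidestep the dependency question is to choose the dependencies at build time. The following is a hypothetical sketch of a docker/local.Dockerfile, not the file in this PR; the build argument name DISTILABEL_SPEC is an assumption, and it simply forwards a pip requirement specifier (optionally with extras or a version pin) to pip install.

```dockerfile
# Hypothetical sketch of docker/local.Dockerfile (not the file from this PR).
FROM python:3.11-slim

# Hypothetical build arg: a pip requirement specifier for distilabel, so that
# extras and version pins are chosen at build time instead of hard-coded here.
ARG DISTILABEL_SPEC=distilabel

# Upgrade pip, then install distilabel; --no-cache-dir keeps the layer small.
RUN python -m pip install --no-cache-dir --upgrade pip && \
    python -m pip install --no-cache-dir "${DISTILABEL_SPEC}"

# Nothing GPU-related is installed; pipelines that run local models would need
# the CUDA image or extra packages passed through DISTILABEL_SPEC.
CMD ["distilabel", "--help"]
```

For example, `docker build --build-arg DISTILABEL_SPEC="distilabel[openai]" --tag distilabel_local --file docker/local.Dockerfile .` would also pull in the OpenAI client; extras names depend on the distilabel release, so check its pyproject.toml first. The `docker run ... distilabel pipeline run --config ...` commands above still work unchanged, because the command given after the image name overrides CMD.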

@ignacioct ignacioct self-assigned this May 7, 2024
@ignacioct ignacioct changed the base branch from main to develop May 7, 2024 11:21
@ignacioct ignacioct marked this pull request as ready for review May 14, 2024 10:31
plaguss (Contributor) commented May 14, 2024

As suggested by @alvarobartt, we should work with the NVIDIA base images. I'll copy the comments here:

  • This should be enough to get started:
```dockerfile
FROM nvidia/cuda:12.3.0-base-ubuntu22.04 AS build

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
    apt-get install python3 python3-pip -y

RUN ln -s /usr/bin/python3 /usr/bin/python
ENV PYTHON=/usr/bin/python

ARG TORCH="2.2.0"

RUN python -m pip install --no-cache-dir --upgrade pip && \
    python -m pip install --no-cache-dir torch==${TORCH}
```
  • Additionally, you could add build arguments (ARG) for the CUDA and Ubuntu versions, as well as for distilabel itself, so that the same Dockerfile can build images for multiple CUDA versions (ideally only 12.3 and 11.8 should be needed).

  • We also need to consider that the image may be used from Linux distributions or from Windows. I'm not sure whether Windows requires some flags to be set for the container to properly detect the GPU, but it's worth double-checking.
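The build-arguments suggestion could look roughly like the following sketch, which extends the snippet above. The argument names (CUDA_VERSION, UBUNTU_VERSION, TORCH_VERSION, TORCH_INDEX_URL, DISTILABEL_SPEC) are hypothetical, not the ones used in this PR. Two details matter: an ARG declared before FROM can parameterize the base image tag, and torch is installed from the PyTorch wheel index that matches the chosen CUDA version, since the default PyPI wheels target a single CUDA release.

```dockerfile
# Sketch only: all build-arg names below are hypothetical.
# ARGs declared before FROM are usable only in the FROM line itself.
ARG CUDA_VERSION=12.3.0
ARG UBUNTU_VERSION=22.04
FROM nvidia/cuda:${CUDA_VERSION}-base-ubuntu${UBUNTU_VERSION}

ARG DEBIAN_FRONTEND=noninteractive
ARG TORCH_VERSION=2.2.0
# PyTorch wheel index matching the CUDA version (cu121 for 12.x, cu118 for 11.8).
ARG TORCH_INDEX_URL=https://download.pytorch.org/whl/cu121
# pip requirement specifier for distilabel, e.g. with extras or a version pin.
ARG DISTILABEL_SPEC=distilabel

# Install Python, clean the apt lists to keep the layer small, add a `python` alias.
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3 python3-pip && \
    rm -rf /var/lib/apt/lists/* && \
    ln -s /usr/bin/python3 /usr/bin/python

# Install torch from the CUDA-specific index first, then distilabel from PyPI.
RUN python -m pip install --no-cache-dir --upgrade pip && \
    python -m pip install --no-cache-dir --index-url "${TORCH_INDEX_URL}" "torch==${TORCH_VERSION}" && \
    python -m pip install --no-cache-dir "${DISTILABEL_SPEC}"
```

Building the 11.8 variant would then be `docker build --build-arg CUDA_VERSION=11.8.0 --build-arg TORCH_INDEX_URL=https://download.pytorch.org/whl/cu118 --tag distilabel_cuda:cu118 --file docker/CUDA.Dockerfile .`. On the GPU question, the container only sees the GPU when started with `docker run --gpus all ...`; on Linux this requires the NVIDIA Container Toolkit, and on Windows Docker Desktop supports `--gpus` through its WSL 2 backend.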

ignacioct (Contributor, Author)

@plaguss this should be good to go. As discussed with @alvarobartt yesterday, we're sticking with the runpod image for now. I've added the build arguments, and from the research I've done there should be no problems on Windows. If any come up, I'm happy to adapt the image based on further feedback.

Base automatically changed from develop to main May 20, 2024 13:46
Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues:
[FEATURE] Docker image
2 participants