[Feature Request] IPEX-LLM + Axolotl Docker Image #10821

Open
kwaa opened this issue Apr 21, 2024 · 25 comments

@kwaa

kwaa commented Apr 21, 2024

As the title suggests.

I tried to make my own Dockerfile, but it always fails to build.

https://github.com/moeru-ai/Moeru-Llama-3-8B/blob/main/Dockerfile

(It would be great to have IPEX-LLM + ollama/llama.cpp images too)

@qiyuangong
Contributor

Hi @kwaa
Thank you for submitting this issue. We will consider adding Docker images for IPEX-LLM + Axolotl and other examples. However, it usually takes some time to go through internal reviews (especially for Docker images).

Back to your Dockerfile: can you share the error messages from the Docker build? Maybe we can fix that problem first. :)

@kwaa
Author

kwaa commented Apr 22, 2024

can you share the error messages from the Docker build? Maybe we can fix that problem first. :)

Okay. The problem is that dependency installation fails:

# Install requirements (from the Dockerfile)
RUN pip install -e . && \
    pip install transformers==4.36.0

podman -v # podman version 5.0.2
sudo podman compose build
Downloading smmap-5.0.1-py3-none-any.whl (24 kB)
Downloading svgwrite-1.4.3-py3-none-any.whl (67 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67.1/67.1 kB 7.1 MB/s eta 0:00:00
Building wheels for collected packages: optimum, rouge-score, fire, ffmpy, wavedrom
  Building wheel for optimum (pyproject.toml): started
  Building wheel for optimum (pyproject.toml): finished with status 'done'
  Created wheel for optimum: filename=optimum-1.13.2-py3-none-any.whl size=395599 sha256=38896e176613a1c92028a9c2383cfe66ab8aef2f86d9540096930717ef731afb
  Stored in directory: /root/.cache/pip/wheels/c7/36/5c/712f2d963d6d312afee816293b58610a3442d1a1de2182e651
  Building wheel for rouge-score (setup.py): started
  Building wheel for rouge-score (setup.py): finished with status 'done'
  Created wheel for rouge-score: filename=rouge_score-0.1.2-py3-none-any.whl size=24955 sha256=2f55fb6e5a68b745b71c26dd809038bc0d897f941d83d80cbaa0d0f031a4ec11
  Stored in directory: /root/.cache/pip/wheels/1e/19/43/8a442dc83660ca25e163e1bd1f89919284ab0d0c1475475148
  Building wheel for fire (setup.py): started
  Building wheel for fire (setup.py): finished with status 'done'
  Created wheel for fire: filename=fire-0.6.0-py2.py3-none-any.whl size=117047 sha256=f2fc59bf03786a40ae6b203d36fd6cd26335e2987e9598f6b8a1a1f4cc368d49
  Stored in directory: /root/.cache/pip/wheels/6a/f3/0c/fa347dfa663f573462c6533d259c2c859e97e103d1ce21538f
  Building wheel for ffmpy (setup.py): started
  Building wheel for ffmpy (setup.py): finished with status 'done'
  Created wheel for ffmpy: filename=ffmpy-0.3.2-py3-none-any.whl size=5600 sha256=9f04bfa5e3cc1ac776f4482a61ec953af63db0bb0ded8f0dd7bc7457e2301df6
  Stored in directory: /root/.cache/pip/wheels/55/3c/f2/f6e34046bac0d57c13c7d08123b85872423b89c8f59bafda51
  Building wheel for wavedrom (setup.py): started
  Building wheel for wavedrom (setup.py): finished with status 'done'
  Created wheel for wavedrom: filename=wavedrom-2.0.3.post3-py2.py3-none-any.whl size=30071 sha256=dadea55437d57655f9f7d85627c29ddc43c7629a4bb5eb9330129dd0f36b71df
  Stored in directory: /root/.cache/pip/wheels/23/cf/3b/4dcf6b22fa41c5ece715fa5f4e05afd683e7b0ce0f2fcc7bb6
Successfully built optimum rouge-score fire ffmpy wavedrom
Installing collected packages: wcwidth, pytz, pydub, nh3, ffmpy, appdirs, aniso8601, addict, xxhash, wrapt, werkzeug, websockets, tzdata, toolz, tomlkit, threadpoolctl, termcolor, tensorboard-data-server, svgwrite, sqlparse, smmap, shtab, shortuuid, shellingham, setproctitle, sentry-sdk, semantic-version, scipy, ruff, rpds-py, querystring-parser, python-multipart, python-dateutil, pynvml, pygments, pyasn1, pyarrow-hotfix, pyarrow, protobuf, prompt-toolkit, packaging, orjson, multidict, mdurl, markdown2, markdown, Mako, llvmlite, kiwisolver, joblib, jmespath, itsdangerous, importlib-resources, humanfriendly, httpcore, hf_transfer, grpcio, greenlet, graphql-core, google-crc32c, frozenlist, fonttools, entrypoints, docstring-parser, docker-pycreds, dill, decorator, cycler, contourpy, cloudpickle, cachetools, blinker, attrs, art, aioitertools, aiofiles, absl-py, yarl, wavedrom, tensorboard, sqlalchemy, scikit-learn, rsa, responses, requests-oauthlib, referencing, pyasn1-modules, proto-plus, pandas, numba, nltk, multiprocess, matplotlib, markdown-it-py, httpx, gunicorn, graphql-relay, googleapis-common-protos, google-resumable-media, gitdb, Flask, fire, docker, coloredlogs, botocore, aiosignal, rouge-score, rich, jsonschema-specifications, graphene, gradio-client, google-auth, gitpython, bitsandbytes, alembic, aiohttp, accelerate, wandb, tyro, typer, mlflow, jsonschema, google-auth-oauthlib, google-api-core, fschat, aiobotocore, s3fs, peft, google-cloud-core, datasets, bert-score, altair, trl, optimum, gradio, google-cloud-storage, evaluate, gcsfs, axolotl
  Attempting uninstall: websockets
    Found existing installation: websockets 12.0
    Uninstalling websockets-12.0:
      Successfully uninstalled websockets-12.0
  Attempting uninstall: protobuf
    Found existing installation: protobuf 5.27.0rc1
    Uninstalling protobuf-5.27.0rc1:
      Successfully uninstalled protobuf-5.27.0rc1
  Attempting uninstall: packaging
    Found existing installation: packaging 24.0
    Uninstalling packaging-24.0:
      Successfully uninstalled packaging-24.0
  Attempting uninstall: blinker
    Found existing installation: blinker 1.4
ERROR: Cannot uninstall 'blinker'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
Error: building at STEP "RUN pip install -e . &&     pip install transformers==4.36.0": while running runtime: exit status 1

@qiyuangong
Contributor

This error is caused by pip trying to uninstall Python libraries that were installed by the OS package manager (e.g., apt). In your case, it's blinker.

A simple workaround is to add --ignore-installed to the pip command:

pip install transformers==4.36.0 --ignore-installed blinker

If that doesn't work, uninstall the OS-installed package first:

apt remove python3-blinker
pip install transformers==4.36.0

https://stackoverflow.com/questions/53807511/pip-cannot-uninstall-package-it-is-a-distutils-installed-project
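
For the Dockerfile itself, a minimal sketch of that second workaround (assuming a Debian/Ubuntu-based image where blinker came from apt):

# Remove the distutils-installed blinker before pip needs to replace it
RUN apt remove -y python3-blinker && \
    pip install -e . && \
    pip install transformers==4.36.0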

@qiyuangong qiyuangong self-assigned this Apr 22, 2024
@kwaa
Author

kwaa commented Apr 22, 2024

Update: the image builds successfully after adding apt remove -y python3-blinker, but it looks like I still need to set up the accelerate config as described here.
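
For reference, a quick sketch of that step inside the container (the config path below is the one the warnings later in this thread point at):

accelerate config   # interactive prompts; writes /root/.cache/huggingface/accelerate/default_config.yaml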

@kwaa
Author

kwaa commented Apr 22, 2024

If I try to enable DeepSpeed via accelerate config, it tells me that DeepSpeed is not installed.

Do you want to enable dynamic shape tracing? [yes/NO]:    
Do you want to use DeepSpeed? [yes/NO]: yes               
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/config/config.py", line 67, in config_command
    config = get_user_input()
             ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/config/config.py", line 40, in get_user_input
    config = get_cluster_input()
             ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/config/cluster.py", line 192, in get_cluster_input
    is_deepspeed_available()
AssertionError: DeepSpeed is not installed => run `pip3 install deepspeed` or build it from source
exit code: 1

I skipped DeepSpeed and generated this default_config.yaml:

compute_environment: LOCAL_MACHINE
debug: false
distributed_type: 'NO'
downcast_bf16: 'no'
dynamo_config:
  dynamo_backend: IPEX
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
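
For reference, this is the file accelerate launch picks up by default, e.g. with the invocation seen in the tracebacks below:

accelerate launch finetune.py lora.yml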

@qiyuangong
Contributor

If I try to enable DeepSpeed via accelerate config, it tells me that DeepSpeed is not installed. […]

Yes. We omit DeepSpeed from the requirements (so does axolotl).

You can answer no for DeepSpeed. If you need DeepSpeed at a later stage (tensor parallelism, etc.), you can install it with pip and re-run accelerate config.
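
For example, a minimal sketch (DeepSpeed version left unpinned here; check the ipex-llm docs for a tested version):

pip install deepspeed
accelerate config   # re-run and answer yes at the DeepSpeed prompt this time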

@kwaa
Author

kwaa commented Apr 22, 2024

Currently axolotl reports an error when saving the prepared dataset:

/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-04-22 16:47:55,676 - INFO - intel_extension_for_pytorch auto imported
2024-04-22 16:47:55,683 - WARNING - The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[2024-04-22 16:47:56,906] [INFO] [datasets.<module>:58] [PID:46] PyTorch version 2.1.0a0+cxx11.abi available.
                                 dP            dP   dP 
                                 88            88   88 
      .d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88 
      88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88 
      88.  .88  .d88b.  88.  .88 88 88.  .88   88   88 
      `88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP 
                                                       
                                                       

[2024-04-22 16:47:57,920] [WARNING] [axolotl.scripts.finetune.do_cli:60] [PID:46] [RANK:0] scripts/finetune.py will be replaced with calling axolotl.cli.train
[2024-04-22 16:47:57,922] [WARNING] [axolotl.validate_config:263] [PID:46] [RANK:0] We recommend setting `load_in_8bit: true` for LORA finetuning
[2024-04-22 16:47:57,923] [INFO] [axolotl.normalize_config:169] [PID:46] [RANK:0] GPU memory usage baseline: 0.000GB ()
[2024-04-22 16:47:57,924] [WARNING] [axolotl.scripts.check_accelerate_default_config:363] [PID:46] [RANK:0] accelerate config file found at /root/.cache/huggingface/accelerate/default_config.yaml. This can lead to unexpected errors
[2024-04-22 16:47:57,924] [INFO] [axolotl.scripts.check_user_token:371] [PID:46] [RANK:0] Skipping HuggingFace token verification because HF_HUB_OFFLINE is set to True. Only local files will be used.
[2024-04-22 16:47:58,135] [DEBUG] [axolotl.load_tokenizer:216] [PID:46] [RANK:0] EOS: 128256 / <|im_end|>
[2024-04-22 16:47:58,135] [DEBUG] [axolotl.load_tokenizer:217] [PID:46] [RANK:0] BOS: 128000 / <|begin_of_text|>
[2024-04-22 16:47:58,135] [DEBUG] [axolotl.load_tokenizer:218] [PID:46] [RANK:0] PAD: 128001 / <|end_of_text|>
[2024-04-22 16:47:58,135] [DEBUG] [axolotl.load_tokenizer:219] [PID:46] [RANK:0] UNK: None / None
[2024-04-22 16:47:58,135] [INFO] [axolotl.load_tokenized_prepared_datasets:181] [PID:46] [RANK:0] Unable to find prepared dataset in /workspace/last_run_prepared/0468ae86c6bad72780d77c7d538dd375
[2024-04-22 16:47:58,135] [INFO] [axolotl.load_tokenized_prepared_datasets:182] [PID:46] [RANK:0] Loading raw datasets...
[2024-04-22 16:47:58,135] [WARNING] [axolotl.load_tokenized_prepared_datasets:184] [PID:46] [RANK:0] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset.
[2024-04-22 16:47:58,135] [INFO] [axolotl.load_tokenized_prepared_datasets:191] [PID:46] [RANK:0] No seed provided, using default seed of 42
[2024-04-22 16:48:18,281] [INFO] [axolotl.load_tokenized_prepared_datasets:394] [PID:46] [RANK:0] merging datasets
[2024-04-22 16:48:18,310] [INFO] [axolotl.load_tokenized_prepared_datasets:404] [PID:46] [RANK:0] Saving merged prepared dataset to disk... /workspace/last_run_prepared/0468ae86c6bad72780d77c7d538dd375
Traceback (most recent call last):
  File "/workspace/axolotl/finetune.py", line 86, in <module>
    fire.Fire(do_cli)
  File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/finetune.py", line 81, in do_cli
    dataset_meta = load_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/cli/__init__.py", line 315, in load_datasets
    train_dataset, eval_dataset, total_num_steps, prompters = prepare_dataset(
                                                              ^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/data.py", line 78, in prepare_dataset
    train_dataset, eval_dataset, prompters = load_prepare_datasets(
                                             ^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/data.py", line 441, in load_prepare_datasets
    dataset, prompters = load_tokenized_prepared_datasets(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/axolotl/src/axolotl/utils/data.py", line 405, in load_tokenized_prepared_datasets
    dataset.save_to_disk(prepared_ds_path)
  File "/usr/local/lib/python3.11/dist-packages/datasets/arrow_dataset.py", line 1515, in save_to_disk
    fs, _ = url_to_fs(dataset_path, **(storage_options or {}))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/fsspec/core.py", line 383, in url_to_fs
    chain = _un_chain(url, kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/fsspec/core.py", line 323, in _un_chain
    if "::" in path
       ^^^^^^^^^^^^
TypeError: argument of type 'PosixPath' is not iterable
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'finetune.py', 'lora.yml']' returned non-zero exit status 1.

@kwaa
Author

kwaa commented Apr 22, 2024

Currently axolotl reports an error when saving the prepared dataset:

This looks to be related to OpenAccess-AI-Collective/axolotl#1544.
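
From the traceback, the failing call is dataset.save_to_disk(prepared_ds_path) with a PosixPath argument. A hedged sketch of a one-line local workaround (casting the path to str; the actual patch is in the next comment):

# Hypothetical in-place edit against the axolotl checkout in the image
sed -i 's/dataset.save_to_disk(prepared_ds_path)/dataset.save_to_disk(str(prepared_ds_path))/' \
    /workspace/axolotl/src/axolotl/utils/data.py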

@kwaa
Author

kwaa commented Apr 22, 2024

I fixed this by patching the code (moeru-ai/Moeru-Llama-3-8B@c65e3b1#diff-b135d17426f077f767e0ec29114d24b182dcaa3f6dadaee03d8ff424adcdff0bR407); the problem now is that it segfaults during the Starting trainer phase:

/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-04-22 18:13:19,839 - INFO - intel_extension_for_pytorch auto imported
2024-04-22 18:13:19,861 - WARNING - The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[2024-04-22 18:13:21,510] [INFO] [datasets.<module>:58] [PID:46] PyTorch version 2.1.0a0+cxx11.abi available.
                                 dP            dP   dP 
                                 88            88   88 
      .d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88 
      88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88 
      88.  .88  .d88b.  88.  .88 88 88.  .88   88   88 
      `88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP 
                                                       
                                                       

[2024-04-22 18:13:23,047] [WARNING] [axolotl.scripts.finetune.do_cli:60] [PID:46] [RANK:0] scripts/finetune.py will be replaced with calling axolotl.cli.train
[2024-04-22 18:13:23,050] [WARNING] [axolotl.validate_config:263] [PID:46] [RANK:0] We recommend setting `load_in_8bit: true` for LORA finetuning
[2024-04-22 18:13:23,051] [INFO] [axolotl.normalize_config:169] [PID:46] [RANK:0] GPU memory usage baseline: 0.000GB ()
[2024-04-22 18:13:23,051] [WARNING] [axolotl.scripts.check_accelerate_default_config:363] [PID:46] [RANK:0] accelerate config file found at /root/.cache/huggingface/accelerate/default_config.yaml. This can lead to unexpected errors
[2024-04-22 18:13:23,051] [INFO] [axolotl.scripts.check_user_token:371] [PID:46] [RANK:0] Skipping HuggingFace token verification because HF_HUB_OFFLINE is set to True. Only local files will be used.
[2024-04-22 18:13:23,379] [DEBUG] [axolotl.load_tokenizer:216] [PID:46] [RANK:0] EOS: 128256 / <|im_end|>
[2024-04-22 18:13:23,379] [DEBUG] [axolotl.load_tokenizer:217] [PID:46] [RANK:0] BOS: 128000 / <|begin_of_text|>
[2024-04-22 18:13:23,379] [DEBUG] [axolotl.load_tokenizer:218] [PID:46] [RANK:0] PAD: 128001 / <|end_of_text|>
[2024-04-22 18:13:23,379] [DEBUG] [axolotl.load_tokenizer:219] [PID:46] [RANK:0] UNK: None / None
[2024-04-22 18:13:23,380] [INFO] [axolotl.load_tokenized_prepared_datasets:179] [PID:46] [RANK:0] Loading prepared dataset from disk at /workspace/last_run_prepared/0468ae86c6bad72780d77c7d538dd375...
[2024-04-22 18:13:23,383] [INFO] [axolotl.load_tokenized_prepared_datasets:181] [PID:46] [RANK:0] Prepared dataset loaded from disk...
[2024-04-22 18:13:23,387] [DEBUG] [axolotl.log:60] [PID:46] [RANK:0] total_num_tokens: 18727
[2024-04-22 18:13:23,388] [DEBUG] [axolotl.log:60] [PID:46] [RANK:0] `total_supervised_tokens: 14240`
[2024-04-22 18:13:26,569] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:46] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 18727
[2024-04-22 18:13:26,569] [DEBUG] [axolotl.log:60] [PID:46] [RANK:0] data_loader_len: 3
[2024-04-22 18:13:26,569] [INFO] [axolotl.log:60] [PID:46] [RANK:0] sample_packing_eff_est across ranks: [0.914404296875]
[2024-04-22 18:13:26,569] [DEBUG] [axolotl.log:60] [PID:46] [RANK:0] sample_packing_eff_est: None
[2024-04-22 18:13:26,569] [DEBUG] [axolotl.log:60] [PID:46] [RANK:0] total_num_steps: 12
[2024-04-22 18:13:26,571] [DEBUG] [axolotl.log:60] [PID:46] [RANK:0] total_num_tokens: 338809
[2024-04-22 18:13:26,580] [DEBUG] [axolotl.log:60] [PID:46] [RANK:0] `total_supervised_tokens: 249975`
[2024-04-22 18:13:26,582] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:46] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 338809
[2024-04-22 18:13:26,583] [DEBUG] [axolotl.log:60] [PID:46] [RANK:0] data_loader_len: 80
[2024-04-22 18:13:26,583] [INFO] [axolotl.log:60] [PID:46] [RANK:0] sample_packing_eff_est across ranks: [0.9618260583212209]
[2024-04-22 18:13:26,583] [DEBUG] [axolotl.log:60] [PID:46] [RANK:0] sample_packing_eff_est: 0.97
[2024-04-22 18:13:26,583] [DEBUG] [axolotl.log:60] [PID:46] [RANK:0] total_num_steps: 320
[2024-04-22 18:13:26,598] [DEBUG] [axolotl.train.log:60] [PID:46] [RANK:0] loading tokenizer... /workspace/models/llama-3-8b
[2024-04-22 18:13:26,815] [DEBUG] [axolotl.load_tokenizer:216] [PID:46] [RANK:0] EOS: 128256 / <|im_end|>
[2024-04-22 18:13:26,815] [DEBUG] [axolotl.load_tokenizer:217] [PID:46] [RANK:0] BOS: 128000 / <|begin_of_text|>
[2024-04-22 18:13:26,815] [DEBUG] [axolotl.load_tokenizer:218] [PID:46] [RANK:0] PAD: 128001 / <|end_of_text|>
[2024-04-22 18:13:26,815] [DEBUG] [axolotl.load_tokenizer:219] [PID:46] [RANK:0] UNK: None / None
[2024-04-22 18:13:26,815] [DEBUG] [axolotl.train.log:60] [PID:46] [RANK:0] loading model and peft_config...
[2024-04-22 18:13:26,816] [INFO] [axolotl.load_model:366] [PID:46] [RANK:0] patching _expand_mask
Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00,  3.79it/s]
[2024-04-22 18:14:33,304] [INFO] [axolotl.load_model:677] [PID:46] [RANK:0] converting modules to torch.bfloat16 for flash attention
[2024-04-22 18:14:34,232] [INFO] [axolotl.load_lora:789] [PID:46] [RANK:0] found linear modules: ['up_proj', 'k_proj', 'gate_proj', 'down_proj', 'v_proj', 'o_proj', 'q_proj']
trainable params: 2,143,322,112 || all params: 5,851,353,088 || trainable%: 36.629512520711515
[2024-04-22 18:15:12,793] [INFO] [axolotl.load_model:714] [PID:46] [RANK:0] GPU memory usage after adapters: 0.000GB ()
[2024-04-22 18:15:12,838] [INFO] [axolotl.train.log:60] [PID:46] [RANK:0] Pre-saving adapter config to /workspace/out
[2024-04-22 18:15:12,947] [INFO] [axolotl.train.log:60] [PID:46] [RANK:0] Starting trainer...
[2024-04-22 18:15:13,108] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:46] [RANK:0] packing_efficiency_estimate: 0.97 total_num_tokens per device: 338809
[2024-04-22 18:15:13,109] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:46] [RANK:0] packing_efficiency_estimate: 0.97 total_num_tokens per device: 338809
[2024-04-22 18:15:13,197] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:46] [RANK:0] packing_efficiency_estimate: 0.97 total_num_tokens per device: 338809
  0%|          | 0/80 [00:00<?, ?it/s]
LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [12th Gen Intel(R) Core(TM) i3-12100]
Registry and code: 13 MB
Command: /usr/bin/python3 finetune.py lora.yml
Uptime: 115.136140 s
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.11/dist-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'finetune.py', 'lora.yml']' died with <Signals.SIGSEGV: 11>.

@qiyuangong
Contributor

qiyuangong commented Apr 22, 2024

LIBXSMM_TARGET: adl [12th Gen Intel(R) Core(TM) i3-12100]

Hi @kwaa

I will add the prepare-data failure to the troubleshooting docs. Thank you for your solution! :)

This segfault may be related to intel-extension-for-pytorch and oneAPI.

Can you provide your oneAPI version, GPU driver version, and pip list? We will try to reproduce this issue.

@kwaa
Author

kwaa commented Apr 22, 2024

Can you provide your oneAPI version, GPU driver version, and pip list? We will try to reproduce this issue.

I installed mesa 24.0.5, intel-compute-runtime 24.09.28717.12, and level-zero 1.16.14 on the host; the rest is in the Dockerfile. (intelanalytics/ipex-llm-xpu:2.1.0-SNAPSHOT)

requirements.txt was copied from the axolotl example and has not been modified.

My GPUs are currently an Arc A770 16GB and a UHD 730.

sudo intel_gpu_top -L
# card2                    Intel Dg2 (Gen12)                 pci:vendor=8086,device=56A0,card=0
# └─renderD129            
# card1                    Intel Alderlake_s (Gen12)         pci:vendor=8086,device=4692,card=0
# └─renderD128

@qiyuangong
Contributor

sudo intel_gpu_top -L

Thank you for providing the detailed environment info! :)

Your Dockerfile seems fine. It is based on our XPU inference image, which means you are using intel/oneapi-basekit:2024.0.1 with the level-zero 1.16.14 driver.

According to the previous error message and your environment, I think the main problem is that the finetune program is running on card 1, i.e., the iGPU. That can lead to segfaults and OOM.

Please refer to this doc and select the Arc A770 as the main GPU. In most cases, you can choose the GPU with an environment variable:

export ONEAPI_DEVICE_SELECTOR=level_zero:1
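
After exporting, you can sanity-check which devices are exposed, e.g.:

sycl-ls   # with the selector set, only the chosen level-zero device should appear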

@qiyuangong
Contributor

qiyuangong commented Apr 23, 2024

BTW, instead of patching Axolotl, we can downgrade datasets to 2.15.0 to avoid the earlier prepare-data failure. Will add this change to the doc and quick start. #10849

pip install datasets==2.15.0

qiyuangong added a commit that referenced this issue Apr 23, 2024
* Downgrade datasets to 2.15.0 to address axolotl prepare issue OpenAccess-AI-Collective/axolotl#1544

Tks to @kwaa for providing the solution in #10821 (comment)
@kwaa
Author

kwaa commented Apr 23, 2024

export ONEAPI_DEVICE_SELECTOR=level_zero:1

Hmm... something went wrong. I tried to run sycl-ls inside the container and it only shows the CPU:

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i3-12100 OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]

@kwaa
Author

kwaa commented Apr 23, 2024

Hmm... something went wrong. I tried to run sycl-ls inside the container and it only shows the CPU:

This is probably a NixOS bug; after I rolled back the system, sycl-ls output looks normal:

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, 12th Gen Intel(R) Core(TM) i3-12100 OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics OpenCL 3.0 NEO  [23.35.27191.42]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) UHD Graphics 730 OpenCL 3.0 NEO  [23.35.27191.42]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.27191]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) UHD Graphics 730 1.3 [1.3.27191]

This is because older versions of intel-compute-runtime are not compatible with the 6.8 kernel; see intel/compute-runtime#710.

@qiyuangong
Contributor

This is probably a NixOS bug; after I rolled back the system, sycl-ls output looks normal: […]

[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.27191]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) UHD Graphics 730 1.3 [1.3.27191]

The Arc A770 is at index 0. Please change the env:

export ONEAPI_DEVICE_SELECTOR=level_zero:0

@kwaa
Author

kwaa commented Apr 23, 2024

I think it might be useful to provide an accelerate/default_config.yaml reference file to avoid misconfiguration.

Also, I fixed this (#10821 (comment)) by setting environment variables (intel/compute-runtime#710 (comment)), but the trainer doesn't seem to run correctly now: it runs for a second and then never logs again, and I don't get any usable files besides the JSON in the output_dir.

Maybe there is something wrong with my axolotl config?

/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
/usr/local/lib/python3.11/dist-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-04-23 18:33:37,631 - INFO - intel_extension_for_pytorch auto imported
2024-04-23 18:33:37,650 - WARNING - The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[2024-04-23 18:33:38,905] [INFO] [datasets.<module>:58] [PID:53] PyTorch version 2.1.0a0+cxx11.abi available.
                                 dP            dP   dP 
                                 88            88   88 
      .d8888b. dP.  .dP .d8888b. 88 .d8888b. d8888P 88 
      88'  `88  `8bd8'  88'  `88 88 88'  `88   88   88 
      88.  .88  .d88b.  88.  .88 88 88.  .88   88   88 
      `88888P8 dP'  `dP `88888P' dP `88888P'   dP   dP 
                                                       
                                                       

[2024-04-23 18:33:39,850] [WARNING] [axolotl.scripts.finetune.do_cli:60] [PID:53] [RANK:0] scripts/finetune.py will be replaced with calling axolotl.cli.train
[2024-04-23 18:33:39,853] [WARNING] [axolotl.validate_config:263] [PID:53] [RANK:0] We recommend setting `load_in_8bit: true` for LORA finetuning
[2024-04-23 18:33:39,854] [INFO] [axolotl.normalize_config:169] [PID:53] [RANK:0] GPU memory usage baseline: 0.000GB ()
[2024-04-23 18:33:39,854] [WARNING] [axolotl.scripts.check_accelerate_default_config:363] [PID:53] [RANK:0] accelerate config file found at /root/.cache/huggingface/accelerate/default_config.yaml. This can lead to unexpected errors
[2024-04-23 18:33:39,854] [INFO] [axolotl.scripts.check_user_token:371] [PID:53] [RANK:0] Skipping HuggingFace token verification because HF_HUB_OFFLINE is set to True. Only local files will be used.
[2024-04-23 18:33:40,124] [DEBUG] [axolotl.load_tokenizer:216] [PID:53] [RANK:0] EOS: 128256 / <|im_end|>
[2024-04-23 18:33:40,124] [DEBUG] [axolotl.load_tokenizer:217] [PID:53] [RANK:0] BOS: 128000 / <|begin_of_text|>
[2024-04-23 18:33:40,124] [DEBUG] [axolotl.load_tokenizer:218] [PID:53] [RANK:0] PAD: 128001 / <|end_of_text|>
[2024-04-23 18:33:40,124] [DEBUG] [axolotl.load_tokenizer:219] [PID:53] [RANK:0] UNK: None / None
[2024-04-23 18:33:40,126] [INFO] [axolotl.load_tokenized_prepared_datasets:179] [PID:53] [RANK:0] Loading prepared dataset from disk at /workspace/last_run_prepared/0468ae86c6bad72780d77c7d538dd375...
[2024-04-23 18:33:40,134] [INFO] [axolotl.load_tokenized_prepared_datasets:181] [PID:53] [RANK:0] Prepared dataset loaded from disk...
[2024-04-23 18:33:40,143] [DEBUG] [axolotl.log:60] [PID:53] [RANK:0] total_num_tokens: 18727
[2024-04-23 18:33:40,146] [DEBUG] [axolotl.log:60] [PID:53] [RANK:0] `total_supervised_tokens: 14240`
[2024-04-23 18:33:43,050] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:53] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 18727
[2024-04-23 18:33:43,051] [DEBUG] [axolotl.log:60] [PID:53] [RANK:0] data_loader_len: 3
[2024-04-23 18:33:43,051] [INFO] [axolotl.log:60] [PID:53] [RANK:0] sample_packing_eff_est across ranks: [0.914404296875]
[2024-04-23 18:33:43,051] [DEBUG] [axolotl.log:60] [PID:53] [RANK:0] sample_packing_eff_est: None
[2024-04-23 18:33:43,051] [DEBUG] [axolotl.log:60] [PID:53] [RANK:0] total_num_steps: 12
[2024-04-23 18:33:43,062] [DEBUG] [axolotl.log:60] [PID:53] [RANK:0] total_num_tokens: 338809
[2024-04-23 18:33:43,071] [DEBUG] [axolotl.log:60] [PID:53] [RANK:0] `total_supervised_tokens: 249975`
[2024-04-23 18:33:43,072] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:53] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 338809
[2024-04-23 18:33:43,073] [DEBUG] [axolotl.log:60] [PID:53] [RANK:0] data_loader_len: 80
[2024-04-23 18:33:43,073] [INFO] [axolotl.log:60] [PID:53] [RANK:0] sample_packing_eff_est across ranks: [0.9618260583212209]
[2024-04-23 18:33:43,073] [DEBUG] [axolotl.log:60] [PID:53] [RANK:0] sample_packing_eff_est: 0.97
[2024-04-23 18:33:43,073] [DEBUG] [axolotl.log:60] [PID:53] [RANK:0] total_num_steps: 320
[2024-04-23 18:33:43,085] [DEBUG] [axolotl.train.log:60] [PID:53] [RANK:0] loading tokenizer... /workspace/models/llama-3-8b
[2024-04-23 18:33:43,286] [DEBUG] [axolotl.load_tokenizer:216] [PID:53] [RANK:0] EOS: 128256 / <|im_end|>
[2024-04-23 18:33:43,286] [DEBUG] [axolotl.load_tokenizer:217] [PID:53] [RANK:0] BOS: 128000 / <|begin_of_text|>
[2024-04-23 18:33:43,286] [DEBUG] [axolotl.load_tokenizer:218] [PID:53] [RANK:0] PAD: 128001 / <|end_of_text|>
[2024-04-23 18:33:43,286] [DEBUG] [axolotl.load_tokenizer:219] [PID:53] [RANK:0] UNK: None / None
[2024-04-23 18:33:43,286] [DEBUG] [axolotl.train.log:60] [PID:53] [RANK:0] loading model and peft_config...
[2024-04-23 18:33:43,287] [INFO] [axolotl.load_model:366] [PID:53] [RANK:0] patching _expand_mask
Loading checkpoint shards: 100%|██████████| 4/4 [00:01<00:00,  2.76it/s]
[2024-04-23 18:34:38,518] [INFO] [axolotl.load_model:677] [PID:53] [RANK:0] converting modules to torch.bfloat16 for flash attention
[2024-04-23 18:34:39,488] [INFO] [axolotl.load_lora:789] [PID:53] [RANK:0] found linear modules: ['k_proj', 'v_proj', 'gate_proj', 'up_proj', 'down_proj', 'o_proj', 'q_proj']
trainable params: 2,143,322,112 || all params: 5,851,353,088 || trainable%: 36.629512520711515
[2024-04-23 18:35:15,919] [INFO] [axolotl.load_model:714] [PID:53] [RANK:0] GPU memory usage after adapters: 0.000GB ()
[2024-04-23 18:35:17,927] [INFO] [axolotl.train.log:60] [PID:53] [RANK:0] Pre-saving adapter config to /workspace/out
[2024-04-23 18:35:18,034] [INFO] [axolotl.train.log:60] [PID:53] [RANK:0] Starting trainer...
[2024-04-23 18:35:18,204] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:53] [RANK:0] packing_efficiency_estimate: 0.97 total_num_tokens per device: 338809
[2024-04-23 18:35:18,205] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:53] [RANK:0] packing_efficiency_estimate: 0.97 total_num_tokens per device: 338809
[2024-04-23 18:35:18,273] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:53] [RANK:0] packing_efficiency_estimate: 0.97 total_num_tokens per device: 338809

@qiyuangong
Contributor

qiyuangong commented Apr 24, 2024

I think it might be useful to provide an accelerate/default_config.yaml reference file to avoid misconfiguration. […] Maybe there is something wrong with my axolotl config? […]

We will consider providing a default accelerate/default_config.yaml. But different hardware leads to different configs, so it may be hard to provide one that suits everyone.

Just checked your axolotl config. There are several problems:

  1. Llama 3 is not supported by axolotl v0.4.0 (ipex-llm only supports axolotl v0.4.0 right now). It is only supported on the main branch, via Adding Llama-3 qlora (OpenAccess-AI-Collective/axolotl#1536) and feat: Add LLaMA-3 instruct prompt strategies for fine-tuning (OpenAccess-AI-Collective/axolotl#1553). Llama 3 also requires a different YAML, especially for the tokens: https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/examples/llama-3/lora-8b.yml#L67
  2. The dataset path (path: /workspace/datasets/alpaca_2k_test) should be fine.

To support Llama 3 finetuning on Arc, we need to switch to the main branch and upgrade several key libs (e.g., peft, transformers). We also need to change some source code in ipex-llm.

The good news is that we are already working on the peft upgrade and Llama 3 axolotl support. Will let you know when it's ready. :)

@kwaa
Author

kwaa commented Apr 24, 2024

It looks like I need to download llama-2-7b and try the example lora.yml to confirm that the current version works.

@kwaa
Author

kwaa commented Apr 24, 2024

I tried unsloth/llama-2-7b and the behavior was the same as before.

[2024-04-24 17:20:25,931] [DEBUG] [axolotl.train.log:60] [PID:53] [RANK:0] loading tokenizer... /workspace/models/llama-2-7b
[2024-04-24 17:20:25,986] [DEBUG] [axolotl.load_tokenizer:216] [PID:53] [RANK:0] EOS: 2 / </s>
[2024-04-24 17:20:25,987] [DEBUG] [axolotl.load_tokenizer:217] [PID:53] [RANK:0] BOS: 1 / <s>
[2024-04-24 17:20:25,987] [DEBUG] [axolotl.load_tokenizer:218] [PID:53] [RANK:0] PAD: 0 / <unk>
[2024-04-24 17:20:25,987] [DEBUG] [axolotl.load_tokenizer:219] [PID:53] [RANK:0] UNK: 0 / <unk>
[2024-04-24 17:20:25,987] [INFO] [axolotl.load_tokenizer:224] [PID:53] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-04-24 17:20:25,987] [DEBUG] [axolotl.train.log:60] [PID:53] [RANK:0] loading model and peft_config...
[2024-04-24 17:20:25,990] [INFO] [axolotl.load_model:366] [PID:53] [RANK:0] patching _expand_mask
Loading checkpoint shards: 100%|██████████| 3/3 [00:11<00:00,  3.88s/it]
[2024-04-24 17:20:56,250] [INFO] [axolotl.load_model:677] [PID:53] [RANK:0] converting modules to torch.bfloat16 for flash attention
[2024-04-24 17:20:56,501] [INFO] [axolotl.load_lora:789] [PID:53] [RANK:0] found linear modules: ['q_proj', 'up_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'down_proj']
trainable params: 39,976,960 || all params: 3,742,765,056 || trainable%: 1.0681129967245264
[2024-04-24 17:21:33,106] [INFO] [axolotl.load_model:714] [PID:53] [RANK:0] GPU memory usage after adapters: 0.000GB ()
[2024-04-24 17:21:35,274] [INFO] [axolotl.train.log:60] [PID:53] [RANK:0] Pre-saving adapter config to /workspace/out
[2024-04-24 17:21:35,278] [INFO] [axolotl.train.log:60] [PID:53] [RANK:0] Starting trainer...
[2024-04-24 17:21:35,516] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:53] [RANK:0] packing_efficiency_estimate: 0.97 total_num_tokens per device: 414041
[2024-04-24 17:21:35,517] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:53] [RANK:0] packing_efficiency_estimate: 0.97 total_num_tokens per device: 414041
[2024-04-24 17:21:35,612] [INFO] [axolotl.utils.samplers.multipack._len_est:178] [PID:53] [RANK:0] packing_efficiency_estimate: 0.97 total_num_tokens per device: 414041

@qiyuangong
Contributor

I tried unsloth/llama-2-7b and the behavior was the same as before. […]

It seems axolotl loaded the checkpoint but didn't start training. Can you dump GPU usage with the following command?

sudo xpu-smi dump -d 0 -m 0,1,2,5,18

-d 0 means device 0; change it if you are using another device.

@qiyuangong
Contributor

Hi @kwaa

We built our example with Meta's llama-2-7b. Not sure if it works with other models.

Can you give us the model version & link you are using for finetuning? We can try to reproduce this error.

@kwaa
Author

kwaa commented Apr 26, 2024

Oh, sorry, I missed the message earlier.

I'm using unsloth/llama-2-7b.

So now I suspect it may have something to do with the container running in the background.

If I run the container in the foreground, the display freezes, so yesterday I started trying to fix my system to drive the display from the iGPU...

I'll keep posting updates as things change.

@kwaa
Author

kwaa commented Apr 29, 2024

Update: I fixed the iGPU display issue with i915.enable_psr=1, but the trainer still doesn't progress when the container runs in the foreground either.

I also tried running xpu-smi inside the container, but it doesn't seem to exist:

crun: executable file `xpu-smi` not found in $PATH: No such file or directory: OCI runtime attempted to invoke a command that was not found

@qiyuangong
Contributor


Hi @kwaa

If xpu-smi is not found in PATH, it may be caused by a wrong environment or forgetting to source the oneAPI environment script (e.g., /opt/intel/oneapi/setvars.sh).

If you are using our image as the base image, you can follow this command: https://github.com/intel-analytics/ipex-llm/blob/main/docker/llm/finetune/qlora/xpu/docker/start-qlora-finetuning-on-xpu.sh#L5
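
A hedged sketch of what that line likely amounts to inside the container (assuming the standard oneAPI install location):

source /opt/intel/oneapi/setvars.sh   # load the oneAPI environment
xpu-smi dump -d 0 -m 0,1,2,5,18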
