
Unable to build TensorFlow with CUDA support using Bazel inside a Docker container #67145

Closed
PriyajeetGoswami opened this issue May 8, 2024 · 6 comments
Labels
stale This label marks the issue/pr stale - to be closed automatically if no activity stat:awaiting response Status - Awaiting response from author subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues TF 2.7 Issues related to TF 2.7.0 type:build/install Build and install issues

@PriyajeetGoswami

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.7.0

Custom code

No

OS platform and distribution

Linux Ubuntu 20.04

Mobile device

No response

Python version

3.8.10

Bazel version

6.2.1

GCC/compiler version

9.3.0

CUDA/cuDNN version

11.2

GPU model and memory

No response

Current behavior?

https://www.tensorflow.org/install/source
I followed the instructions at the link above to create a TensorFlow Docker container with GPU support, but I am unable to build TensorFlow with Bazel using the following command:

bazel build //tensorflow/tools/pip_package:wheel --repo_env=WHEEL_NAME=tensorflow --config=cuda

Standalone code to reproduce the issue

I want to create a Docker container in which I can build a TFLite model in C++ that can access the NVIDIA GPU.

Relevant log output

root@b7bdee9743e8:/tensorflow/tensorflow# bazel build //tensorflow/tools/pip_package:wheel --repo_env=WHEEL_NAME=tensorflow --config=cuda
Starting local Bazel server and connecting to it...
WARNING: Option 'java_toolchain' is deprecated
WARNING: Option 'host_java_toolchain' is deprecated
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=1 --terminal_columns=191
INFO: Reading rc options for 'build' from /tensorflow/tensorflow/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'build' from /tensorflow/tensorflow/.bazelrc:
  'build' options: --define framework_shared_object=true --java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --host_java_toolchain=@tf_toolchains//toolchains/java:tf_java_toolchain --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/common,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/fallback,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file /tensorflow/tensorflow/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file /tensorflow/tensorflow/.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:cuda in file /tensorflow/tensorflow/.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:linux in file /tensorflow/tensorflow/.bazelrc: --copt=-w --host_copt=-w --define=PREFIX=/usr --define=LIBDIR=$(PREFIX)/lib --define=INCLUDEDIR=$(PREFIX)/include --define=PROTOBUF_INCLUDE_PATH=$(PREFIX)/include --cxxopt=-std=c++14 --host_cxxopt=-std=c++14 --config=dynamic_kernels --distinct_host_configuration=false --experimental_guard_against_concurrent_changes
INFO: Found applicable config definition build:dynamic_kernels in file /tensorflow/tensorflow/.bazelrc: --define=dynamic_loaded_kernels=true --copt=-DAUTOLOAD_DYNAMIC_KERNELS
WARNING: Option 'java_toolchain' is deprecated
WARNING: Option 'host_java_toolchain' is deprecated
INFO: Repository local_config_cuda instantiated at:
  /tensorflow/tensorflow/WORKSPACE:15:14: in <toplevel>
  /tensorflow/tensorflow/tensorflow/workspace2.bzl:1079:19: in workspace
  /tensorflow/tensorflow/tensorflow/workspace2.bzl:94:19: in _tf_toolchains
Repository rule cuda_configure defined at:
  /tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl:1448:33: in <toplevel>
ERROR: An error occurred during the fetch of repository 'local_config_cuda':
   Traceback (most recent call last):
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 1401, column 38, in _cuda_autoconf_impl
                _create_local_cuda_repository(repository_ctx)
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 978, column 35, in _create_local_cuda_repository
                cuda_config = _get_cuda_config(repository_ctx, find_cuda_config_script)
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 666, column 30, in _get_cuda_config
                config = find_cuda_config(repository_ctx, find_cuda_config_script, ["cuda", "cudnn"])
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 643, column 41, in find_cuda_config
                exec_result = _exec_find_cuda_config(repository_ctx, script_path, cuda_libraries)
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 637, column 19, in _exec_find_cuda_config
                return execute(repository_ctx, [python_bin, "-c", decompress_and_execute_cmd])
        File "/tensorflow/tensorflow/third_party/remote_config/common.bzl", line 230, column 13, in execute
                fail(
Error in fail: Repository command failed
Could not find any cublas_api.h matching version '' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
of:
        '/usr'
        '/usr/lib/x86_64-linux-gnu'
        '/usr/local/cuda'
        '/usr/local/cuda/lib64/stubs'
        '/usr/local/cuda/targets/x86_64-linux/lib'
ERROR: /tensorflow/tensorflow/WORKSPACE:15:14: fetching cuda_configure rule //external:local_config_cuda: Traceback (most recent call last):
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 1401, column 38, in _cuda_autoconf_impl
                _create_local_cuda_repository(repository_ctx)
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 978, column 35, in _create_local_cuda_repository
                cuda_config = _get_cuda_config(repository_ctx, find_cuda_config_script)
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 666, column 30, in _get_cuda_config
                config = find_cuda_config(repository_ctx, find_cuda_config_script, ["cuda", "cudnn"])
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 643, column 41, in find_cuda_config
                exec_result = _exec_find_cuda_config(repository_ctx, script_path, cuda_libraries)
        File "/tensorflow/tensorflow/third_party/gpus/cuda_configure.bzl", line 637, column 19, in _exec_find_cuda_config
                return execute(repository_ctx, [python_bin, "-c", decompress_and_execute_cmd])
        File "/tensorflow/tensorflow/third_party/remote_config/common.bzl", line 230, column 13, in execute
                fail(
Error in fail: Repository command failed
Could not find any cublas_api.h matching version '' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
of:
        '/usr'
        '/usr/lib/x86_64-linux-gnu'
        '/usr/local/cuda'
        '/usr/local/cuda/lib64/stubs'
        '/usr/local/cuda/targets/x86_64-linux/lib'
ERROR: @local_config_cuda//:enable_cuda :: Error loading option @local_config_cuda//:enable_cuda: Repository command failed
Could not find any cublas_api.h matching version '' in any subdirectory:
        ''
        'include'
        'include/cuda'
        'include/*-linux-gnu'
        'extras/CUPTI/include'
        'include/cuda/CUPTI'
of:
        '/usr'
        '/usr/lib/x86_64-linux-gnu'
        '/usr/local/cuda'
        '/usr/local/cuda/lib64/stubs'
        '/usr/local/cuda/targets/x86_64-linux/lib'
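The error says the CUDA autoconfiguration searched the subdirectories and base paths listed above for cublas_api.h and found nothing. To check which of those locations actually contain the header inside the container, a small Python sketch like the following can help. It is a simplified, hypothetical re-creation of that search built only from the paths printed in the log (`find_header` is an illustrative name, not part of TensorFlow); it is not TensorFlow's own find_cuda_config logic.

```python
import glob
import os

# Base directories and subdirectory patterns copied from the error message.
# Hypothetical simplification, NOT TensorFlow's find_cuda_config implementation.
BASE_DIRS = [
    "/usr",
    "/usr/lib/x86_64-linux-gnu",
    "/usr/local/cuda",
    "/usr/local/cuda/lib64/stubs",
    "/usr/local/cuda/targets/x86_64-linux/lib",
]
SUBDIRS = [
    "",
    "include",
    "include/cuda",
    "include/*-linux-gnu",
    "extras/CUPTI/include",
    "include/cuda/CUPTI",
]


def find_header(name, base_dirs=BASE_DIRS, subdirs=SUBDIRS):
    """Return every path where `name` exists under base_dir/subdir."""
    matches = []
    for base in base_dirs:
        for sub in subdirs:
            # An empty subdir is dropped by os.path.join, giving base/name.
            pattern = os.path.join(base, sub, name)
            # glob expands wildcard subdirectories such as include/*-linux-gnu.
            matches.extend(sorted(glob.glob(pattern)))
    return matches


if __name__ == "__main__":
    hits = find_header("cublas_api.h")
    if hits:
        for path in hits:
            print("found:", path)
    else:
        print("cublas_api.h not found in any searched location")
```

If nothing is found, the cuBLAS development headers are likely either not installed in the container or installed outside these paths.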
@google-ml-butler google-ml-butler bot added the type:build/install Build and install issues label May 8, 2024
@tilakrayal tilakrayal added TF 2.7 Issues related to TF 2.7.0 subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues labels May 8, 2024
@tilakrayal
Contributor

@PriyajeetGoswami,

I was able to clone the TensorFlow repository without any problems on Ubuntu. I noticed that you are using Bazel 6.2 and GCC 9.3, which are incompatible with TF v2.7.0. Also, TF v2.7.0 is a fairly old version; please try installing the latest stable version.

Could you please create a virtual environment and try installing TensorFlow as described in the official documentation, and also check the tested build configurations? Please see the attached screenshot for reference.

[Screenshot: 2023-07-21 3:47:25 PM]
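The compatibility point above can be made concrete with a small Python sketch that compares the environment reported in this issue against the tested configuration for TF 2.7.0. The `TESTED_TF_2_7` values are an assumption, taken to match the tested build configurations table (GCC 7.3.1, Bazel 3.7.2, CUDA 11.2, cuDNN 8.1, Python 3.7 to 3.9); verify them against the official page. The helper names `matches` and `mismatches` are hypothetical.

```python
# ASSUMED values for tensorflow-2.7.0 (GPU) from the tested build
# configurations table; verify against the official documentation.
TESTED_TF_2_7 = {
    "python": ("3.7", "3.8", "3.9"),
    "gcc": ("7.3.1",),
    "bazel": ("3.7.2",),
    "cuda": ("11.2",),
    "cudnn": ("8.1",),
}

# Environment reported in this issue (cuDNN version was not reported).
REPORTED = {
    "python": "3.8.10",
    "gcc": "9.3.0",
    "bazel": "6.2.1",
    "cuda": "11.2",
}


def matches(reported, tested):
    """True if `reported` equals, or is a patch release of, a tested version."""
    return any(reported == t or reported.startswith(t + ".") for t in tested)


def mismatches(env, tested=TESTED_TF_2_7):
    """Return the components whose reported version is not in the tested set."""
    return sorted(
        name
        for name, version in env.items()
        if name in tested and not matches(version, tested[name])
    )


if __name__ == "__main__":
    print(mismatches(REPORTED))
```

With the versions reported in this issue, the sketch flags Bazel and GCC, which is consistent with the comment above.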

Is there a specific reason you need TensorFlow v2.7? As mentioned above, v2.7 is a fairly old version, and it is unlikely to receive any fixes other than security patches. There is a high possibility that this issue has been fixed in later TF versions. Thank you!

@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label May 9, 2024
@PriyajeetGoswami
Author

I was trying TF 2.7 because of a project requirement, but I have since moved on and used the latest version of TF, which worked. Thanks!

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label May 15, 2024
@tilakrayal
Contributor

@PriyajeetGoswami,
Glad the issue is resolved. Please feel free to move this issue to closed status. Thank you!

@tilakrayal tilakrayal added the stat:awaiting response Status - Awaiting response from author label May 16, 2024

This issue is stale because it has been open for 7 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label May 24, 2024

This issue was closed because it has been inactive for 7 days since being marked as stale. Please reopen if you'd like to work on this further.
