CANN Backend support #1606


@3manifold 3manifold commented Jan 26, 2024

CANN Backend support

Introduction

CANN (Compute Architecture for Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI scenarios.
It provides multi-layer programming interfaces that help users quickly build AI applications and services on the Ascend platform.

The CANN backend in CTranslate2 enables running AI models on the Ascend NPU, extending the existing CPU and CUDA workflows. More information on the Ascend NPU and the CANN library is available here.

Examples of projects that support CANN include ONNX Runtime & OpenCV.

resolves #1609

Notes

Implementation

The CANN backend introduces Device::CANN, alongside the existing CPU and CUDA devices.
The CANN workflow is enabled by passing -DWITH_CANN=ON in the cmake configuration (see examples/cann). As with CUDA, the CANN workflow can coexist with the CPU workflow.

The CANN workflow is accessible through the C++ example (examples/cann/main.cc), the CLI, or the Python module.
Operators and primitives were implemented for CANN so that the end-to-end example in the CTranslate2 documentation runs successfully.

Tests

The existing tests were extended to cover Device::CANN and the respective DataType values. Additional tests were implemented for extra and edge cases. Gtest output: gtest_cann.log

Environment Setup

  • Download the CANN drivers by selecting the AArch64.run category (the current implementation uses CANN 7.0.RC1.alpha001).
  • Build the image and run the container as described in docker/cann.

For details on setting up the development and operating environments, see Development and Operating Environment Setup
and CANN Software Installation Guide.

Build CANN Python module

The CANN Python module is expected to be built using the respective Docker files. The script below is a quicker alternative, suited to testing and benchmarking.

#!/bin/bash

# Run from the project root.

# Configure and build the C++ library with the CANN backend enabled.
rm -rf build-release/
mkdir build-release && cd build-release || exit

cmake -DWITH_CANN=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_CLI=OFF -DWITH_MKL=OFF -DOPENMP_RUNTIME=COMP -DCMAKE_PREFIX_PATH="/opt/OpenBLAS" -DWITH_OPENBLAS=ON -DWITH_RUY=ON ..

VERBOSE=1 make -j"$(nproc)" install && cd ..

# Build and install the Python wheel, replacing any previously installed one.
export CIBW_ARCHS=aarch64
pip3 uninstall --yes ctranslate2

pip3 install -r python/install_requirements.txt

cd python && python3 setup.py bdist_wheel && cd ..

python3 -m pip install python/dist/ctranslate2*.whl

# Make the installed shared library discoverable at runtime.
export LD_LIBRARY_PATH=/usr/local/lib:${LD_LIBRARY_PATH}
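
Once the wheel is installed, a quick import check can confirm that the module and its shared library load. The following is a minimal sketch and not part of the PR: the helper name `try_import` is hypothetical, and only the Python standard library is used.

```python
import importlib


def try_import(module_name):
    """Return (module, None) on success or (None, error) if the import fails."""
    try:
        return importlib.import_module(module_name), None
    except ImportError as exc:
        # ImportError covers both a missing package and a shared library
        # that cannot be loaded (for example when LD_LIBRARY_PATH is unset).
        return None, exc


if __name__ == "__main__":
    module, error = try_import("ctranslate2")
    if error is None:
        print("ctranslate2 imported from", module.__file__)
    else:
        print("ctranslate2 could not be imported:", error)
```

If the import fails with a message about a missing shared object, the LD_LIBRARY_PATH export at the end of the build script is the first thing to check.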

Build CANN C++ example

#!/bin/bash

# execute from project root

# first build ct2lib
rm -rf build-release/
mkdir build-release && cd build-release || exit

cmake -DWITH_CANN=ON -DCMAKE_BUILD_TYPE=Release -DBUILD_CLI=OFF -DWITH_MKL=OFF -DOPENMP_RUNTIME=COMP -DCMAKE_PREFIX_PATH="/opt/OpenBLAS" -DWITH_OPENBLAS=ON -DWITH_RUY=ON ..

make -j"$(nproc)"

rm CMakeCache.txt

# then build cann_run
cmake -DCMAKE_BUILD_TYPE=Release ../examples/cann/

make -j"$(nproc)"
# ./cann_run <ende_ctranslate2_path>

Samples

Python

import ctranslate2 

print("get_supported_compute_types for cann: ", ctranslate2.get_supported_compute_types("cann")) 
print("get_cann_device_count: ", ctranslate2.get_cann_device_count())
 
translator = ctranslate2.Translator("/ctranslate2_docs/ende_ctranslate2/", device="auto") 
    
results = translator.translate_batch([["▁H", "ello", "▁world", "!"]])  
output_tokens = results[0].hypotheses[0]
print(output_tokens)

> python3 ct2python_example.py
get_supported_compute_types for cann:  {'int8_float16', 'int8_float32', 'int8', 'float32', 'bfloat16', 'int8_bfloat16', 'float16'}
get_cann_device_count:  8 
['▁Hallo', '▁Welt', '!'] 
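
The sample above uses device="auto". When the device and compute type need to be chosen explicitly, the two query functions shown in the sample can drive that choice. The following is a minimal sketch under stated assumptions: `choose_device` and `choose_compute_type` are hypothetical helpers, the preference order is arbitrary, and the demo values are the ones printed in the sample output rather than live queries.

```python
def choose_device(cann_device_count):
    """Use the NPU when at least one CANN device is reported, else the CPU."""
    return "cann" if cann_device_count > 0 else "cpu"


def choose_compute_type(supported, preferences=("float16", "float32")):
    """Return the first preferred compute type that the device supports."""
    for compute_type in preferences:
        if compute_type in supported:
            return compute_type
    raise ValueError(f"none of {preferences} is supported: {sorted(supported)}")


if __name__ == "__main__":
    # Values copied from the sample output above, not queried at runtime.
    reported_count = 8
    reported_types = {"int8_float16", "int8_float32", "int8", "float32",
                      "bfloat16", "int8_bfloat16", "float16"}
    print(choose_device(reported_count), choose_compute_type(reported_types))
```

With a CANN-enabled build, the inputs would presumably come from ctranslate2.get_cann_device_count() and ctranslate2.get_supported_compute_types("cann"), and the results would be passed to the Translator as device and compute_type. That device="cann" is accepted there is an assumption based on this PR, not on a released API.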

C++

Execution example in C++ can be found in examples/cann.

CLI

echo "▁H ello ▁world !" | ./ct2-translator --model "./ende_ctranslate2/"

root@90b230f7e68f /t/t/c/cli# echo  "▁H ello ▁world !" | ./ct2-translator --model "./ende_ctranslate2/"
▁Hallo ▁Welt !
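
The translator returns subword pieces rather than plain text. Assuming the pieces follow the SentencePiece convention, in which "▁" marks the start of a word, they can be joined back into a sentence as sketched below. The helper name `detokenize` is hypothetical, and a real pipeline would normally use the tokenizer's own decode function.

```python
def detokenize(pieces):
    """Join SentencePiece-style pieces into plain text."""
    # Concatenate the pieces, turn each word-boundary marker into a space,
    # then drop the leading space produced by the first marker.
    return "".join(pieces).replace("▁", " ").strip()


if __name__ == "__main__":
    # Python sample output: a list of pieces.
    print(detokenize(["▁Hallo", "▁Welt", "!"]))
    # CLI output: pieces separated by spaces.
    print(detokenize("▁Hallo ▁Welt !".split()))
```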

Benchmark

We measured translation latency for a single batch on the CPU (using all 192 cores) and on one NPU device.
The results below cover 4 consecutive runs for each of two inputs, one of 4 tokens and one of 306 tokens. The NPU was
faster in every run.
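
The PR does not show the timing harness itself. A minimal sketch of how such per-run latencies could be collected is given below; the helper names `time_call` and `benchmark` are hypothetical, and the workload is passed in as a plain callable so the sketch does not depend on ctranslate2.

```python
import time
from datetime import timedelta


def time_call(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed time as a timedelta)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, timedelta(seconds=elapsed)


def benchmark(fn, runs=4):
    """Call fn `runs` times in a row and return the latency of each call."""
    return [time_call(fn)[1] for _ in range(runs)]


if __name__ == "__main__":
    # Stand-in workload; a real measurement would wrap the translation call.
    for run, latency in enumerate(benchmark(lambda: sum(range(100_000))), start=1):
        print(run, latency)
```

For a real measurement, the callable would wrap the translation call, for example `lambda: translator.translate_batch(batch)`, with `translator` and `batch` constructed as in the Python sample.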

Input tokens

4 tokens
{{"▁H", "ello", "▁world", "!"}}
306 tokens
{{"▁In", "▁this", "▁paper", ",", "▁we", "▁speed", "▁up", "▁the", "▁context", "▁extension", "▁of", "▁L", "LM", "s", ",", "▁in", "▁two", "▁aspects", ".", "▁Particularly", ",", "▁it", "▁can", "▁be", "▁implemented", "▁with", "▁only", "▁two", "▁lines", "▁of", "▁code", "▁in", "▁training", ",", "▁while", "▁being", "▁optional", "▁in", "▁in", "fer", "ence", ".", "▁Typical", "ly" , "▁training", "▁L", "LM", "s", "▁with", "▁long", "▁context", "▁sizes", "▁is" ,"▁comp", "ut", "ation", "ally", "▁expensive", "▁requiring", "▁extensive", "▁training", "▁hours", "▁and", "▁G", "PU", "▁resources", ".", "▁On", "▁the", "▁one", "▁hand", ",", "▁although", "▁den", "se", "▁global", "▁attention", "▁is", "▁needed", "▁during", "▁in", "fer", "ence", ",", "▁fine", "-", "tun", "ing", "▁the", "▁model", "▁can" ,"▁be", "▁effectively", "▁and", "▁efficiently", "▁done", "▁by", "▁spar", "se", "▁local", "▁attention", ".", "▁In", "▁this", "▁paper", ",", "▁we", "▁speed", "▁up", "▁the", "▁context", "▁extension", "▁of", "▁L", "LM", "s", ",", "▁in", "▁two", "▁aspects", ".", "▁Particularly", ",", "▁it", "▁can", "▁be", "▁implemented", "▁with", "▁only", "▁two", "▁lines", "▁of", "▁code", "▁in", "▁training", ",", "▁while", "▁being", "▁optional", "▁in", "▁in", "fer", "ence", ".", "▁Typical", "ly" , "▁training", "▁L", "LM", "s", "▁with", "▁long", "▁context", "▁sizes", "▁is" ,"▁comp", "ut", "ation", "ally", "▁expensive", "▁requiring", "▁extensive", "▁training", "▁hours", "▁and", "▁G", "PU", "▁resources", ".", "▁On", "▁the", "▁one", "▁hand", ",", "▁although", "▁den", "se", "▁global", "▁attention", "▁is", "▁needed", "▁during", "▁in", "fer", "ence", ",", "▁fine", "-", "tun", "ing", "▁the", "▁model", "▁can" ,"▁be", "▁effectively", "▁and", "▁efficiently", "▁done", "▁by", "▁spar", "se", "▁local", "▁attention", ".", "▁In", "▁this", "▁paper", ",", "▁we", "▁speed", "▁up", "▁the", "▁context", "▁extension", "▁of", "▁L", "LM", "s", ",", "▁in", "▁two", "▁aspects", ".", "▁Particularly", ",", "▁it", "▁can", "▁be", "▁implemented", 
"▁with", "▁only", "▁two", "▁lines", "▁of", "▁code", "▁in", "▁training", ",", "▁while", "▁being", "▁optional", "▁in", "▁in", "fer", "ence", ".", "▁Typical", "ly" , "▁training", "▁L", "LM", "s", "▁with", "▁long", "▁context", "▁sizes", "▁is" ,"▁comp", "ut", "ation", "ally", "▁expensive", "▁requiring", "▁extensive", "▁training", "▁hours", "▁and", "▁G", "PU", "▁resources", ".", "▁On", "▁the", "▁one", "▁hand", ",", "▁although", "▁den", "se", "▁global", "▁attention", "▁is", "▁needed", "▁during", "▁in", "fer", "ence", ",", "▁fine", "-", "tun", "ing", "▁the", "▁model", "▁can" ,"▁be", "▁effectively", "▁and", "▁efficiently", "▁done", "▁by", "▁spar", "se", "▁local", "▁attention", "."}}

Hardware

CPU: arm64 Kunpeng 920 Series @2.6GHz (192 cores - utilized all)
NPU: Ascend 910A AI Processor (8 devices - utilized 1)

Experiments


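4 tokens (latency, h:mm:ss.ffffff)

| Run | CPU            | CANN           |
|-----|----------------|----------------|
| 1   | 0:00:00.098600 | 0:00:00.093737 |
| 2   | 0:00:00.098584 | 0:00:00.092929 |
| 3   | 0:00:00.131760 | 0:00:00.093115 |
| 4   | 0:00:00.109684 | 0:00:00.093026 |

306 tokens (latency, h:mm:ss.ffffff)

| Run | CPU            | CANN           |
|-----|----------------|----------------|
| 1   | 0:00:02.437300 | 0:00:02.283184 |
| 2   | 0:00:02.468804 | 0:00:02.018239 |
| 3   | 0:00:02.469789 | 0:00:01.877654 |
| 4   | 0:00:02.744319 | 0:00:02.080763 |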
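
To put the latency figures above into a single number per input size, the mean latency and the CPU-to-CANN ratio can be computed directly from the reported values. The following is a minimal sketch; the helper names are hypothetical and the data is copied verbatim from the results above.

```python
from statistics import mean


def to_seconds(text):
    """Convert an 'h:mm:ss.ffffff' string into seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def mean_speedup(cpu_runs, cann_runs):
    """Ratio of mean CPU latency to mean CANN latency (above 1 favors CANN)."""
    cpu = mean(to_seconds(t) for t in cpu_runs)
    cann = mean(to_seconds(t) for t in cann_runs)
    return cpu / cann


# Latencies as reported in the Experiments section.
RESULTS = {
    "4 tokens": (
        ["0:00:00.098600", "0:00:00.098584", "0:00:00.131760", "0:00:00.109684"],
        ["0:00:00.093737", "0:00:00.092929", "0:00:00.093115", "0:00:00.093026"],
    ),
    "306 tokens": (
        ["0:00:02.437300", "0:00:02.468804", "0:00:02.469789", "0:00:02.744319"],
        ["0:00:02.283184", "0:00:02.018239", "0:00:01.877654", "0:00:02.080763"],
    ),
}

if __name__ == "__main__":
    for label, (cpu_runs, cann_runs) in RESULTS.items():
        print(f"{label}: {mean_speedup(cpu_runs, cann_runs):.2f}x")
```

With only four runs per configuration, these ratios are indicative rather than statistically robust.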

@3manifold 3manifold marked this pull request as draft January 26, 2024 14:23
@3manifold 3manifold force-pushed the ct2-cann branch 2 times, most recently from e7c01a1 to 8ce20f6 Compare January 29, 2024 09:46
@3manifold 3manifold marked this pull request as ready for review January 29, 2024 11:12
Co-authored-by: kandrio <konstantinosand@gmail.com>