Downloading and loading ONNX models on GPU devices crashes (at least on a T4 on Colab).
Current Behavior
Crashes with:
An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel.
: ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory
at ai.onnxruntime.providers.OrtCUDAProviderOptions.add(Native Method)
at ai.onnxruntime.providers.OrtCUDAProviderOptions.<init>(OrtCUDAProviderOptions.java:44)
at com.johnsnowlabs.ml.onnx.OnnxWrapper$.mapToCUDASessionConfig(OnnxWrapper.scala:152)
at com.johnsnowlabs.ml.onnx.OnnxWrapper$.mapToSessionOptionsObject(OnnxWrapper.scala:136)
at com.johnsnowlabs.ml.onnx.OnnxWrapper$.com$johnsnowlabs$ml$onnx$OnnxWrapper$$withSafeOnnxModelLoader(OnnxWrapper.scala:90)
at com.johnsnowlabs.ml.onnx.OnnxWrapper$.read(OnnxWrapper.scala:122)
at com.johnsnowlabs.ml.onnx.ReadOnnxModel.readOnnxModel(OnnxSerializeModel.scala:98)
at com.johnsnowlabs.ml.onnx.ReadOnnxModel.readOnnxModel$(OnnxSerializeModel.scala:75)
at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings$.readOnnxModel(MPNetEmbeddings.scala:471)
at com.johnsnowlabs.nlp.embeddings.ReadMPNetDLModel.readModel(MPNetEmbeddings.scala:416)
at com.johnsnowlabs.nlp.embeddings.ReadMPNetDLModel.readModel$(MPNetEmbeddings.scala:407)
at com.johnsnowlabs.nlp.embeddings.MPNetEmbeddings$.readModel(MPNetEmbeddings.scala:471)
at com.johnsnowlabs.nlp.embeddings.ReadMPNetDLModel.$anonfun$$init$$1(MPNetEmbeddings.scala:424)
at com.johnsnowlabs.nlp.embeddings.ReadMPNetDLModel.$anonfun$$init$$1$adapted(MPNetEmbeddings.scala:424)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1(ParamsAndFeaturesReadable.scala:50)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$onRead$1$adapted(ParamsAndFeaturesReadable.scala:49)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.onRead(ParamsAndFeaturesReadable.scala:49)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1(ParamsAndFeaturesReadable.scala:61)
at com.johnsnowlabs.nlp.ParamsAndFeaturesReadable.$anonfun$read$1$adapted(ParamsAndFeaturesReadable.scala:61)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:38)
at com.johnsnowlabs.nlp.FeaturesReader.load(ParamsAndFeaturesReadable.scala:24)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:513)
at com.johnsnowlabs.nlp.pretrained.ResourceDownloader$.downloadModel(ResourceDownloader.scala:505)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.downloadModel(ResourceDownloader.scala:705)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.downloadModel(ResourceDownloader.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
Expected Behavior
Should work as it did before upgrading to the newer version of Spark NLP.
I haven't been able to replicate the error. I tried in Google Colab with a T4 and it works with spark-nlp 5.2.0. Can you take a look at this notebook, reproduce the error, and let me know?
You forgot to load the ONNX GPU build in the start function: spark = sparknlp.start(gpu=True). Once the session is started with the GPU builds of ONNX and TF, the ONNX models fail with that error.
Some extra information: I can use A100 GPUs without any issue, so this must be something with Colab itself. It is either missing a library or has it in a different version (usually an older one, so for GPU we usually add something to the Colab setup script to fix that).
@danilojsl Let's find out what's missing and how to fix it, then we can modify the GPU installation for Colab accordingly.
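One way to narrow down what is missing is to ask the dynamic loader directly, from a notebook cell on the affected runtime, whether it can open the libraries the CUDA provider depends on. The sketch below is only a diagnostic aid, not part of Spark NLP. Of the names listed, only libcublasLt.so.11 comes from the error message above; the others are assumptions about typical CUDA 11 dependencies and may need adjusting.

```python
import ctypes

# Only libcublasLt.so.11 is taken from the reported error message.
# The remaining names are assumptions for illustration and may differ
# depending on the ONNX Runtime build in use.
CANDIDATE_LIBS = [
    "libcublasLt.so.11",
    "libcublas.so.11",
    "libcudart.so.11.0",
    "libcudnn.so.8",
]


def check_shared_libs(names):
    """Return a dict mapping each library name to True if the dynamic
    loader can open it, and False otherwise."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the library cannot be loaded
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_shared_libs(CANDIDATE_LIBS).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Any library reported as MISSING on the T4 runtime but found on a working machine (for example the A100 setup mentioned above) would be a candidate for the Colab GPU setup script to install or link.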
maziyarpanahi
changed the title
ONNX crashes on GPU in latest Spark NLP 5.2
ONNX crashes on Colab's T4 GPU runtime
Dec 28, 2023
maziyarpanahi
changed the title
ONNX crashes on Colab's T4 GPU runtime
ONNX models crashe on Colab's T4 GPU runtime
Dec 28, 2023
maziyarpanahi
changed the title
ONNX models crashe on Colab's T4 GPU runtime
ONNX models crash when they are used in Colab's T4 GPU runtime
Dec 28, 2023
Is there an existing issue for this?
Who can help?
@danilojsl
What are you working on?
Downloading and loading ONNX models on GPU devices crashes (at least on a T4 on Colab).
Steps To Reproduce
!pip install spark-nlp pyspark
Spark NLP version and Apache Spark
Spark NLP version 5.2.0
Apache Spark version: 3.5.0
Type of Spark Application
Python Application
Java Version
11
Java Home Directory
No response
Setup and installation
No response
Operating System and Version
No response
Link to your project (if available)
No response
Additional Information
No response