[Bug]: ov::cache_dir mechanism is sensitive to the currently set LC_NUMERIC locale #24370

Closed
3 tasks done
RyanMetcalfeInt8 opened this issue May 5, 2024 · 5 comments
Labels: bug, support_request

@RyanMetcalfeInt8

OpenVINO Version

2024.1

Operating System

Windows System

Device used for inference

CPU

Framework

None

Model used

Whisper Encoder Base (https://huggingface.co/Intel/whisper.cpp-openvino-models/blob/main/ggml-base-models.zip)

Issue description

I've noticed that the OpenVINO caching mechanism is sensitive to the currently set LC_NUMERIC locale. This problem has revealed itself in applications that use OpenVINO and also support multiple languages.

For example, in our set of AI Plugins for Audacity, many users have reported that the AI features work the very first time they are run, but then produce bad results afterward. Here is a good example: intel/openvino-plugins-ai-audacity#1

It took a while to figure out that the problem can easily be reproduced by explicitly setting other locales, and so I have attached a small reproducer that clearly illustrates the problem:

Step-by-step reproduction

Here is the source code to reproduce:
ov_cache_locale_test.zip

  1. Compile it in a similar way to the OpenVINO samples, i.e.:
"C:\Path\To\w_openvino_toolkit_windows_2024.1.0.15008.f4afc983258_x86_64\w_openvino_toolkit_windows_2024.1.0.15008.f4afc983258_x86_64\setupvars.bat"
cd ov_cache_locale_test
build.bat -b build
  2. This test uses the ggml-base-encoder-openvino.xml/.bin IRs (but note that the problem is reproducible for many models). You can grab these IRs from this package on HF: https://huggingface.co/Intel/whisper.cpp-openvino-models/blob/main/ggml-base-models.zip. Copy ggml-base-encoder-openvino.xml/.bin to the build/Release folder.

  3. Run the compiled main.exe:

:: go to build dir
cd build\Release

:: run it
main.exe
  4. You should get output that looks like this:
The default numeric locale is: C
Set LC_NUMERIC locale to C
locale C Passed: output between first run (caching the blob), and second run (using the blob) matches!
Set LC_NUMERIC locale to de_DE.utf8
locale de_DE.utf8 Failed: Mismatch in output between first run (caching the blob), and second run (using the blob).

This illustrates the problem.

For the default locale ("C"), the model produces the same output for the first and second compile + run sequences: the first run creates and caches the blob, and the second run loads it.

For the 'de_DE.utf8' locale, there is a mismatch.
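
As a rough illustration of the locale switching step shown in the output above, here is a minimal standard-library sketch. It is not the attached reproducer and does not use OpenVINO; the helper names `try_set_numeric_locale`, `current_numeric_locale`, and `format_value` are hypothetical, and whether `de_DE.utf8` is available depends on the system.

```cpp
#include <clocale>
#include <cstdio>
#include <string>

// Hypothetical helper: try to switch LC_NUMERIC to the named locale.
// Returns false if that locale is not available on this system.
bool try_set_numeric_locale(const char* name) {
    return std::setlocale(LC_NUMERIC, name) != nullptr;
}

// Returns the name of the currently active LC_NUMERIC locale.
std::string current_numeric_locale() {
    const char* name = std::setlocale(LC_NUMERIC, nullptr);
    return name ? name : "";
}

// Formats a value with printf-style formatting, which uses the decimal
// separator of the active LC_NUMERIC locale.
std::string format_value(double v) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%.2f", v);
    return buf;
}
```

Under the "C" locale, `format_value(0.5)` yields `0.50`. Under a German locale (if installed), the decimal separator is a comma, so any text that is written under one locale and read back under another can change meaning.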

Relevant log output

No response

Issue submission checklist

  • I'm reporting an issue. It's not a question.
  • I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
  • There is reproducer code and related data files such as images, videos, models, etc.
@riverlijunjie
Contributor

@RyanMetcalfeInt8 The cache blob can be exported and imported successfully, but the inference result is incorrect after calling setlocale(LC_NUMERIC, "de_DE.utf8"). Is my understanding correct?

@RyanMetcalfeInt8
Author

Hi @riverlijunjie,

Yes, I think your understanding is correct. However, the reproducer code that I posted does not use the explicit export / import APIs; it uses the cache directory mechanism (though I believe export / import happens 'under the hood' in this case).

@zhaixuejun1993
Contributor

zhaixuejun1993 commented May 9, 2024

This issue seems to be caused by a bug in the pugixml library used in OpenVINO.
https://pugixml.org/docs/manual.html#access.nodedata
When I forced the "C" locale during export/import, the issue was fixed.
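
The pugixml manual section linked above notes that its number conversion functions depend on the current C locale. The sketch below does not use pugixml; it uses `std::strtod` from the standard library to show the same class of behavior. The helper name `parse_under_locale` is hypothetical, and the `de_DE.utf8` locale is assumed to be installed for the second case to apply.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>

// Hypothetical helper: parse `text` with std::strtod while `locale_name` is
// the active LC_NUMERIC locale, then restore the previous locale.
// Returns false (leaving `out` untouched) if the locale is unavailable.
bool parse_under_locale(const char* locale_name, const char* text, double& out) {
    // Copy the current locale name, since later setlocale calls may
    // overwrite the buffer that the returned pointer refers to.
    const char* cur = std::setlocale(LC_NUMERIC, nullptr);
    const std::string saved = cur ? cur : "C";

    if (std::setlocale(LC_NUMERIC, locale_name) == nullptr) {
        return false;
    }
    out = std::strtod(text, nullptr);
    std::setlocale(LC_NUMERIC, saved.c_str());
    return true;
}
```

With the "C" locale, parsing `"1.5"` gives 1.5. With a locale whose decimal separator is a comma, parsing stops at the `.` and the fractional part is lost, which is one way a value serialized under one locale can be read back incorrectly under another.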

@riverlijunjie
Contributor

Agree with @zhaixuejun1993. We have root-caused this issue, and we observed that 1 byte was removed when exporting the model cache with the locale set to "de_DE.utf8", compared to "C".
The simplest fix is to call setlocale(LC_NUMERIC, "C") before exporting/importing the model cache and to restore the original locale afterward, but I'm not sure whether it will bring any other side effects. @ilya-lavrenov, any comments?
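
The set-then-restore approach described above could be sketched as an RAII guard. This is an illustration under stated assumptions, not the code that was merged into OpenVINO; the class name `ScopedCNumericLocale` is hypothetical.

```cpp
#include <clocale>
#include <string>

// Hypothetical RAII guard: forces LC_NUMERIC to "C" for the lifetime of the
// object and restores the previous setting on destruction.
class ScopedCNumericLocale {
public:
    ScopedCNumericLocale() {
        const char* cur = std::setlocale(LC_NUMERIC, nullptr);
        if (cur != nullptr) {
            saved_ = cur;  // copy: the returned buffer may be overwritten later
        }
        if (saved_ != "C") {
            std::setlocale(LC_NUMERIC, "C");
        }
    }

    ~ScopedCNumericLocale() {
        // Restore only if a different locale was recorded.
        if (!saved_.empty() && saved_ != "C") {
            std::setlocale(LC_NUMERIC, saved_.c_str());
        }
    }

    ScopedCNumericLocale(const ScopedCNumericLocale&) = delete;
    ScopedCNumericLocale& operator=(const ScopedCNumericLocale&) = delete;

private:
    std::string saved_;
};
```

A caller would create the guard in the scope that performs the export or import. Regarding side effects: `setlocale` changes process-wide state, so other threads that format or parse numbers while the guard is alive would also see the "C" locale. This sketch does not address that.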

github-merge-queue bot pushed a commit that referenced this issue May 17, 2024
### Details:
- Check the current locale before export/import. If it differs from "C", set it to "C" and record the original value; once export/import is done, restore the original.
- *Fix the error caused by the pugixml library: the setlocale function installs the specified system locale, or a portion of it, as the new C locale, and pugixml may return unexpected results when a locale other than "C" has been installed with setlocale().*

### Tickets:
 - #24370

---------

Signed-off-by: Zhai, Xuejun <xuejun.zhai@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
Co-authored-by: River Li <river.li@intel.com>
@peterchen-intel
Contributor

The verification message has been provided in the PR, so I am closing this.
