
Error processing pdf, jpg/png files #874

Closed
3 tasks
eosho opened this issue May 10, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@eosho

eosho commented May 10, 2024

Describe the bug

When a PDF, JPG, or PNG file is uploaded via the admin portal, the batch_push_results function app fails with a "The file is corrupted or format is unsupported" error.

Expected behavior

Uploaded PDF, PNG, and JPG files should be supported and processed via Form Recognizer.

How does this bug make you feel?

Share a gif from giphy to tell us how you'd feel

:)
lol

Debugging information

Steps to reproduce

Steps to reproduce the behavior:

  1. Upload a PDF, PNG, or JPG file.
  2. Process the files and check the batch_push_results function logs for errors.
  3. The following error is generated: "The file is corrupted or format is unsupported. Refer to documentation for the list of supported formats."
  4. The files are then only available for deletion; they are not processed any further.
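The service's message lumps together "corrupted" and "unsupported format", so a quick local check of each file's leading bytes can rule out genuinely damaged uploads before looking elsewhere. This is a minimal sketch, assuming the original files are available locally; `sniff_format` and `matches_extension` are hypothetical helpers, not part of this project or the Azure SDK, and only the standard PDF, PNG, and JPEG signatures are checked.

```python
from pathlib import Path
from typing import Optional, Union

# Leading "magic" bytes for the three formats named in this issue.
SIGNATURES = {
    "pdf": b"%PDF-",
    "png": b"\x89PNG\r\n\x1a\n",
    "jpg": b"\xff\xd8\xff",
}


def sniff_format(header: bytes) -> Optional[str]:
    """Return 'pdf', 'png' or 'jpg' if `header` starts with a known signature, else None."""
    for fmt, signature in SIGNATURES.items():
        if header.startswith(signature):
            return fmt
    return None


def matches_extension(path: Union[str, Path]) -> bool:
    """True if the file's leading bytes agree with its extension (.jpeg counts as jpg)."""
    ext = Path(path).suffix.lower().lstrip(".")
    ext = "jpg" if ext == "jpeg" else ext
    with open(path, "rb") as f:
        # 8 bytes cover the longest signature above (PNG).
        return sniff_format(f.read(8)) == ext
```

For example, if `matches_extension("Frequently Asked Questions.pdf")` is `True`, the PDF at least starts like a valid PDF, which points the investigation away from the file and toward how the service reaches it.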

Screenshots

If applicable, add screenshots to help explain your problem.

Logs

If applicable, add logs to help the engineer debug the problem.

Executing 'Functions.batch_push_results' (Reason='New queue message detected on 'doc-processing'.', Id=b9384d92--xxxx)
Python queue trigger function processed a queue item: {"filename": "Frequently Asked Questions.pdf"}
Result: Failure Exception: ValueError: Error: Traceback (most recent call last):
  File "/home/site/wwwroot/utilities/helpers/AzureFormRecognizerHelper.py", line 78, in begin_analyze_document_from_url
    poller = self.document_analysis_client.begin_analyze_document_from_url(
  File "/usr/local/lib/python3.11/site-packages/azure/core/tracing/decorator.py", line 89, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/azure/ai/formrecognizer/_document_analysis_client.py", line 198, in begin_analyze_document_from_url
    return _client_op_path.begin_analyze_document( # type: ignore
  File "/usr/local/lib/python3.11/site-packages/azure/core/tracing/decorator.py", line 89, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/azure/ai/formrecognizer/_generated/v2023_07_31/operations/_document_models_operations.py", line 518, in begin_analyze_document
    raw_result = self._analyze_document_initial( # type: ignore
  File "/usr/local/lib/python3.11/site-packages/azure/ai/formrecognizer/_generated/v2023_07_31/operations/_document_models_operations.py", line 443, in _analyze_document_initial
    raise HttpResponseError(response=response)
azure.core.exceptions.HttpResponseError: (InvalidRequest) Invalid request.
Code: InvalidRequest
Message: Invalid request.
Inner error: {
    "code": "InvalidContent",
    "message": "The file is corrupted or format is unsupported. Refer to documentation for the list of supported formats."
}
. Error: (InvalidRequest) Invalid request.
Code: InvalidRequest
Message: Invalid request.
Inner error: {
    "code": "InvalidContent",
    "message": "The file is corrupted or format is unsupported. Refer to documentation for the list of supported formats."
}
Stack:
  File "/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/dispatcher.py", line 545, in _handle__invocation_request
    call_result = await self._loop.run_in_executor(
  File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/dispatcher.py", line 826, in _run_sync_func
    return ExtensionManager.get_sync_invocation_wrapper(context,
  File "/azure-functions-host/workers/python/3.11/LINUX/X64/azure_functions_worker/extension.py", line 215, in _raw_invocation_wrapper
    result = function(**args)
  File "/home/site/wwwroot/BatchPushResults.py", line 30, in batch_push_results
    do_batch_push_results(msg)
  File "/home/site/wwwroot/BatchPushResults.py", line 47, in do_batch_push_results
    embedder.embed_file(file_sas, file_name)
  File "/home/site/wwwroot/utilities/helpers/embedders/PushEmbedder.py", line 37, in embed_file
    self.__embed(source_url=source_url, embedding_config=embedding_config)
  File "/home/site/wwwroot/utilities/helpers/embedders/PushEmbedder.py", line 46, in __embed
    documents: List[SourceDocument] = self.document_loading.load(
  File "/home/site/wwwroot/utilities/helpers/DocumentLoadingHelper.py", line 17, in load
    return loader.load(document_url)
  File "/home/site/wwwroot/utilities/document_loading/Layout.py", line 13, in load
    pages_content = azure_form_recognizer_client.begin_analyze_document_from_url(
  File "/home/site/wwwroot/utilities/helpers/AzureFormRecognizerHelper.py", line 147, in begin_analyze_document_from_url
    raise ValueError(f"Error: {traceback.format_exc()}. Error: {e}")
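In this log, `do_batch_push_results` passes what appears to be a SAS URL (`file_sas`) down to `begin_analyze_document_from_url`, so it is the Form Recognizer service, not the function app, that downloads the blob. Since the author later traced the failure to RBAC rather than to the files, a cheap first step is to sanity-check the SAS query string itself. This is a minimal sketch: `sas_problems` is a hypothetical helper, not part of this project or the Azure SDK, and it only inspects the standard `sig`, `sp`, `se`, and `si` SAS parameters. It cannot see role assignments, so an empty result does not rule out an RBAC problem.

```python
from datetime import datetime, timezone
from typing import List, Optional
from urllib.parse import parse_qs, urlparse


def sas_problems(url: str, now: Optional[datetime] = None) -> List[str]:
    """Return human-readable problems found in a blob SAS URL's query string."""
    now = now or datetime.now(timezone.utc)
    params = parse_qs(urlparse(url).query)  # also percent-decodes values
    problems: List[str] = []

    if "sig" not in params:
        problems.append("no 'sig' parameter: URL is not a SAS URL")
        return problems

    perms = params.get("sp", [None])[0]
    expiry = params.get("se", [None])[0]
    uses_policy = "si" in params  # a stored access policy can supply sp/se server-side

    if perms is None:
        if not uses_policy:
            problems.append("no permissions ('sp') parameter")
    elif "r" not in perms:
        problems.append(f"permissions {perms!r} do not include read ('r')")

    if expiry is None:
        if not uses_policy:
            problems.append("no expiry ('se') parameter")
    else:
        # Older Pythons' fromisoformat() rejects a trailing 'Z'.
        exp = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if exp.tzinfo is None:  # SAS times are UTC; treat date-only values as UTC
            exp = exp.replace(tzinfo=timezone.utc)
        if exp <= now:
            problems.append(f"SAS expired at {expiry}")

    return problems
```

A non-empty list points at the token itself; an empty list shifts attention to role assignments on the storage account and on the identity the service uses to read it.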

Tasks

To be filled in by the engineer picking up the issue

  • Task 1
  • Task 2
  • ...
@eosho added the bug (Something isn't working) label on May 10, 2024
@eosho
Author

eosho commented May 16, 2024

Closing this. I identified the issue as RBAC-related.

@eosho closed this as completed on May 16, 2024