[Bug]: Error: /typecheck in --runpdf-- #1162

Open
muramasatheninja opened this issue Oct 8, 2023 · 3 comments
Labels
third party issue Problem with a third party dependency

Comments

@muramasatheninja

Describe the bug

Some background: I'm using paperless-ngx. I tried to add the document ECMA-395 ("Ultra-Speed
Compact Disc ReWritable System Description"), which can be found at the following website:
https://www.ecma-international.org/publications-and-standards/standards/ecma-395/

When I try to add it to paperless-ngx, it calls ocrmypdf and fails at the end. I then tried processing the file with ocrmypdf directly, and it fails with the following:

Error: /typecheck in --runpdf--
Operand stack:
--nostringval-- --nostringval-- --nostringval-- --nostringval--
Execution stack:
%interp_exit .runexec2 --nostringval-- runpdf --nostringval-- 2 %stopped_push --nostringval-- runpdf runpdf false 1 %stopped_push 1974 1 3 %oparray_pop
1973 1 3 %oparray_pop 1961 1 3 %oparray_pop 1962 1 3 %oparray_pop runpdf runpdf runpdf runpdf
Dictionary stack:
--dict:778/1123(ro)(G)-- --dict:0/20(G)-- --dict:87/200(L)-- --dict:18/20(L)--
Current allocation mode is local
GPL Ghostscript 10.01.2: Unrecoverable error, exit code 1

SubprocessOutputError: Ghostscript PDF/A rendering failed

I found a similar bug report from paperless-ngx:
paperless-ngx/paperless-ngx#3933

They seem to think this is an ocrmypdf issue. It looks like Ghostscript is the component that fails, but I'm not familiar with how ocrmypdf calls it, so I figured I would start here.

Steps to reproduce

1. ocrmypdf --force-ocr ECMA-395_1st_edition_december_2010.pdf ecma-395_ocr.pdf
2. ocrmypdf scans the contents.
3. It reaches postprocessing.
4. It fails with the error above.
https://www.ecma-international.org/publications-and-standards/standards/ecma-395/
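
To check whether Ghostscript fails on its own, outside ocrmypdf, the Ghostscript call can be replayed by hand. The following is a minimal sketch, not ocrmypdf's own code: the argument list is copied from the log output in this report, and the function names `build_gs_pdfa_command` and `run_gs` are hypothetical. It assumes you still have the intermediate PDF and the `pdfa.ps` stub that ocrmypdf wrote to its temporary directory (for example from a run with `--keep-temporary-files`).

```python
import subprocess


def build_gs_pdfa_command(input_pdf: str, pdfa_ps: str) -> list[str]:
    """Rebuild the Ghostscript argv seen in the log, with caller-supplied paths.

    The flags are copied from the log in this report; only the two file
    paths are parameters.
    """
    return [
        "gs", "-dBATCH", "-dNOPAUSE", "-dSAFER",
        "-dCompatibilityLevel=1.6", "-sDEVICE=pdfwrite",
        "-dAutoRotatePages=/None",
        "-sColorConversionStrategy=LeaveColorUnchanged",
        "-dPDFSTOPONERROR",
        "-dAutoFilterColorImages=true", "-dAutoFilterGrayImages=true",
        "-dJPEGQ=95", "-dPDFA=2", "-dPDFACompatibilityPolicy=1",
        # '-o -' sends the PDF to stdout; '-sstdout=%stderr' moves
        # Ghostscript's own messages to stderr so the two do not mix.
        "-o", "-", "-sstdout=%stderr",
        input_pdf, pdfa_ps,
    ]


def run_gs(argv: list[str]) -> tuple[int, str]:
    """Run Ghostscript and return (exit code, stderr text)."""
    proc = subprocess.run(argv, capture_output=True, check=False)
    return proc.returncode, proc.stderr.decode(errors="replace")
```

If `run_gs(build_gs_pdfa_command("fix_docinfo.pdf", "pdfa.ps"))` returns exit code 1 with the same `/typecheck in --runpdf--` text, the failure reproduces without ocrmypdf in the loop.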

Files

Input is copyrighted but is freely available. No output was created.
ocrmypdf_ecma_395_log.txt

How did you download and install the software?

No response

OCRmyPDF version

14.4.0

Relevant log output

ocrmypdf 14.4.0                                                                                                                                   __main__.py:57
Running: ['tesseract', '--version']                                                                                                              __init__.py:133
Found tesseract 5.3.0                                                                                                                            __init__.py:350
Running: ['tesseract', '--version']                                                                                                              __init__.py:133
Running: ['gs', '--version']                                                                                                                     __init__.py:133
Found gs 10.01.2                                                                                                                                 __init__.py:350
Running: ['gs', '--version']                                                                                                                     __init__.py:133
Running: ['tesseract', '--list-langs']                                                                                                           __init__.py:133
stdout/stderr = List of available languages in "/usr/share/tesseract/tessdata/" (1):                                                              __init__.py:73
eng                                                                                                                                                             
                                                                                                                                                                
os.symlink(ECMA-395_1st_edition_december_2010.pdf, /tmp/ocrmypdf.io.o4dxioqt/origin)                                                              helpers.py:160
os.symlink(/tmp/ocrmypdf.io.o4dxioqt/origin, /tmp/ocrmypdf.io.o4dxioqt/origin.pdf)                                                                helpers.py:160
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 130/130 0:00:00
Using Tesseract OpenMP thread limit 1                                                                                                       tesseract_ocr.py:176
Start processing 32 pages concurrently                                                                                                              

--------snipped because too long for github full log attached--------------

OCR                   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 130/130 0:00:00
Postprocessing...                                                                                                                                   _sync.py:307
os.symlink(/tmp/ocrmypdf.io.o4dxioqt/graft_layers.pdf, /tmp/ocrmypdf.io.o4dxioqt/fix_docinfo.pdf)                                                 helpers.py:160
Running: ['gs', '--version']                                                                                                                     __init__.py:133
Running: ['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None',                   __init__.py:133
'-sColorConversionStrategy=LeaveColorUnchanged', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true',                             
'-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '-', '-sstdout=%stderr', '/tmp/ocrmypdf.io.o4dxioqt/fix_docinfo.pdf',                            
'/tmp/ocrmypdf.io.o4dxioqt/pdfa.ps']                                                                                                                            
GPL Ghostscript 10.01.2 (2023-06-21)                                                                                                             __init__.py:108
Copyright (C) 2023 Artifex Software, Inc.  All rights reserved.                                                                                  __init__.py:108
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:                                                                       __init__.py:108
see the file COPYING for details.                                                                                                                __init__.py:108
Error: /typecheck in --runpdf--                                                                                                                  __init__.py:108
Operand stack:                                                                                                                                   __init__.py:108
--nostringval--   --nostringval--   --nostringval--   --nostringval--                                                                            __init__.py:108
Execution stack:                                                                                                                                 __init__.py:108
%interp_exit   .runexec2   --nostringval--   runpdf   --nostringval--   2   %stopped_push   --nostringval--   runpdf   runpdf   false   1        __init__.py:108
%stopped_push   1974   1   3   %oparray_pop   1973   1   3   %oparray_pop   1961   1   3   %oparray_pop   1962   1   3   %oparray_pop   runpdf                  
runpdf   runpdf   runpdf                                                                                                                                        
Dictionary stack:                                                                                                                                __init__.py:108
--dict:778/1123(ro)(G)--   --dict:0/20(G)--   --dict:87/200(L)--   --dict:18/20(L)--                                                             __init__.py:108
Current allocation mode is local                                                                                                                 __init__.py:108
GPL Ghostscript 10.01.2: Unrecoverable error, exit code 1                                                                                        __init__.py:108
GPL Ghostscript 10.01.2 (2023-06-21)                                                                                                          ghostscript.py:245
Copyright (C) 2023 Artifex Software, Inc.  All rights reserved.                                                                                                 
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:                                                                                      
see the file COPYING for details.                                                                                                                               
Error: /typecheck in --runpdf--                                                                                                                                 
Operand stack:                                                                                                                                                  
   --nostringval--   --nostringval--   --nostringval--   --nostringval--                                                                                        
Execution stack:                                                                                                                                                
   %interp_exit   .runexec2   --nostringval--   runpdf   --nostringval--   2   %stopped_push   --nostringval--   runpdf   runpdf   false   1                    
%stopped_push   1974   1   3   %oparray_pop   1973   1   3   %oparray_pop   1961   1   3   %oparray_pop   1962   1   3   %oparray_pop                           
runpdf   runpdf   runpdf   runpdf                                                                                                                               
Dictionary stack:                                                                                                                                               
   --dict:778/1123(ro)(G)--   --dict:0/20(G)--   --dict:87/200(L)--   --dict:18/20(L)--                                                                         
Current allocation mode is local                                                                                                                                
GPL Ghostscript 10.01.2: Unrecoverable error, exit code 1                                                                                                       
                                                                                                                                                                
ExitCodeException                                                                                                                                   _sync.py:430
Traceback (most recent call last):                                                                                                                              
  File "/usr/lib/python3.11/site-packages/ocrmypdf/_exec/ghostscript.py", line 232, in generate_pdfa                                                            
    p = run_polling_stderr(                                                                                                                                     
        ^^^^^^^^^^^^^^^^^^^                                                                                                                                     
  File "/usr/lib/python3.11/site-packages/ocrmypdf/subprocess/__init__.py", line 114, in run_polling_stderr                                                     
    raise CalledProcessError(proc.returncode, args, output=None, stderr=stderr)                                                                                 
subprocess.CalledProcessError: Command '['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite',                              
'-dAutoRotatePages=/None', '-sColorConversionStrategy=LeaveColorUnchanged', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true',                                 
'-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '-', '-sstdout=%stderr',                                          
'/tmp/ocrmypdf.io.o4dxioqt/fix_docinfo.pdf', '/tmp/ocrmypdf.io.o4dxioqt/pdfa.ps']' returned non-zero exit status 1.                                             
                                                                                                                                                                
The above exception was the direct cause of the following exception:                                                                                            
                                                                                                                                                                
Traceback (most recent call last):                                                                                                                              
  File "/usr/lib/python3.11/site-packages/ocrmypdf/_sync.py", line 391, in run_pipeline                                                                         
    optimize_messages = exec_concurrent(context, executor)                                                                                                      
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                      
  File "/usr/lib/python3.11/site-packages/ocrmypdf/_sync.py", line 308, in exec_concurrent                                                                      
    pdf, messages = post_process(pdf, context, executor)                                                                                                        
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                        
  File "/usr/lib/python3.11/site-packages/ocrmypdf/_sync.py", line 239, in post_process                                                                         
    pdf_out = convert_to_pdfa(pdf_out, ps_stub_out, context)                                                                                                    
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                    
  File "/usr/lib/python3.11/site-packages/ocrmypdf/_pipeline.py", line 790, in convert_to_pdfa                                                                  
    context.plugin_manager.hook.generate_pdfa(                                                                                                                  
  File "/usr/lib/python3.11/site-packages/pluggy/_hooks.py", line 265, in __call__                                                                              
    return self._hookexec(self.name, self.get_hookimpls(), kwargs, firstresult)                                                                                 
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                 
  File "/usr/lib/python3.11/site-packages/pluggy/_manager.py", line 80, in _hookexec                                                                            
    return self._inner_hookexec(hook_name, methods, kwargs, firstresult)                                                                                        
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                        
  File "/usr/lib/python3.11/site-packages/pluggy/_callers.py", line 60, in _multicall                                                                           
    return outcome.get_result()                                                                                                                                 
           ^^^^^^^^^^^^^^^^^^^^                                                                                                                                 
  File "/usr/lib/python3.11/site-packages/pluggy/_result.py", line 60, in get_result                                                                            
    raise ex[1].with_traceback(ex[2])                                                                                                                           
  File "/usr/lib/python3.11/site-packages/pluggy/_callers.py", line 39, in _multicall                                                                           
    res = hook_impl.function(*args)                                                                                                                             
          ^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                             
  File "/usr/lib/python3.11/site-packages/ocrmypdf/builtin_plugins/ghostscript.py", line 77, in generate_pdfa                                                   
    ghostscript.generate_pdfa(                                                                                                                                  
  File "/usr/lib/python3.11/site-packages/ocrmypdf/_exec/ghostscript.py", line 246, in generate_pdfa                                                            
    raise SubprocessOutputError('Ghostscript PDF/A rendering failed') from e                                                                                    
ocrmypdf.exceptions.SubprocessOutputError: Ghostscript PDF/A rendering failed
@jbarlow83
Collaborator

jbarlow83 commented Oct 9, 2023

--continue-on-soft-render-error will work around the issue.
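
As a rough illustration of the workaround, the command from the steps to reproduce can be rerun with the extra flag. This is a sketch only: the helper name `build_ocrmypdf_command` is hypothetical, and it just assembles a command line from the two flags named in this thread (`--force-ocr` and `--continue-on-soft-render-error`).

```python
import subprocess


def build_ocrmypdf_command(
    input_pdf: str,
    output_pdf: str,
    force_ocr: bool = True,
    continue_on_soft_render_error: bool = True,
) -> list[str]:
    """Assemble an ocrmypdf command line with the workaround flag."""
    argv = ["ocrmypdf"]
    if force_ocr:
        argv.append("--force-ocr")
    if continue_on_soft_render_error:
        # Workaround suggested in this thread for the Ghostscript failure.
        argv.append("--continue-on-soft-render-error")
    argv += [input_pdf, output_pdf]
    return argv


def run_ocrmypdf(argv: list[str]) -> int:
    """Run the command and return its exit code (requires ocrmypdf on PATH)."""
    return subprocess.run(argv, check=False).returncode
```

For the file in this report, `build_ocrmypdf_command("ECMA-395_1st_edition_december_2010.pdf", "ecma-395_ocr.pdf")` yields the original command plus the workaround flag.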

This is likely a Ghostscript issue. Earlier versions of Ghostscript (9.55) don't produce this error. Some time around 10.x, it appeared. I cannot make sense of the error message from Ghostscript.

I could not identify any issue with the input PDF (as seen right before handing off to Ghostscript). iText RUPS, qpdf, Foxit report no syntax issues or other issues. I also tried deleting all of the metadata before processing, and the struct tree, and this does not fix the issue.
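
Since the error seems tied to the Ghostscript version, a quick check of the installed version can help with triage. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and the cutoff of major version 10 is only a guess based on the observation above that 9.55 works and the error appeared "some time around 10.x". The exact release that introduced it is not known here.

```python
import re


def parse_gs_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number from text such as
    'GPL Ghostscript 10.01.2 (2023-06-21)' or the output of 'gs --version'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def may_be_affected(version: tuple[int, ...]) -> bool:
    """Assumed heuristic: treat any 10.x or later release as possibly affected."""
    return version[0] >= 10
```

With the version from this report, `parse_gs_version("10.01.2")` gives `(10, 1, 2)` and `may_be_affected` returns `True`; for `"9.55"` it returns `False`.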

@jbarlow83 jbarlow83 added third party issue Problem with a third party dependency and removed bug labels Oct 9, 2023
@stumpylog
Contributor

From the paperless-ngx side, this type of error is described in our troubleshooting documentation, along with the workaround.
