When the file name contains single quotes, an error occurs in Document Set Syncing. #1431

Open
slovx2 opened this issue May 8, 2024 · 0 comments

slovx2 commented May 8, 2024

I used the File Connector to upload files, one of which is named: Error 6000- Characters Aren't Positive Integers.pdf. The indexing process completed normally.

However, when I create a Document Set that includes this document, the UI shows the Document Set as syncing indefinitely and the sync never completes.

The background error log shows:

05/08/2024 07:29:23 AM             index.py 175 : Error occurred getting chunk by Document ID FILE_CONNECTOR__98f48aa9-b9de-4852-b321-f4f66fbef794/Error 6000- Characters Aren't Positive Integers .pdf:
Headers: {'User-Agent': 'python-requests/2.31.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '257', 'Content-Type': 'application/json'}
Payload: {'yql': "select documentid from danswer_chunk_intfloat_multilingual_e5_small where document_id contains 'FILE_CONNECTOR__98f48aa9-b9de-4852-b321-f4f66fbef794/Error 6000- Characters Aren't Positive Integers .pdf'", 'timeout': '10s', 'offset': 0, 'hits': 128}
Status Code: 400
Response Content: {"root":{"id":"toplevel","relevance":1.0,"fields":{"totalCount":0},"errors":[{"code":4,"summary":"Invalid query parameter","message":"Could not create query from YQL: query:L1:177 mismatched input 't' expecting {<EOF>, 'select', ';'}","stackTrace":"com.yahoo.processing.IllegalInputException: com.yahoo.search.yql.ProgramCompileException: query:L1:177 mismatched input 't' expecting {<EOF>, 'select', ';'}\n\tat com.yahoo.search.yql.YqlParser.parseYqlProgram(YqlParser.java:888)\n\tat com.yahoo.search.yql.YqlParser.parse(YqlParser.java:275)\n\tat com.yahoo.search.yql.MinimalQueryInserter.insertQuery(MinimalQueryInserter.java:95)\n\tat com.yahoo.search.yql.MinimalQueryInserter.search(MinimalQueryInserter.java:80)\n\tat com.yahoo.search.Searcher.process(Searcher.java:134)\n\tat com.yahoo.processing.execution.Execution.process(Execution.java:112)\n\tat com.yahoo.search.searchchain.Execution.search(Execution.java:499)\n\tat com.yahoo.prelude.searcher.FieldCollapsingSearcher.search(FieldCollapsingSearcher.java:90)\n\tat com.yahoo.search.Searcher.process(Searcher.java:134)\n\tat com.yahoo.processing.execution.Execution.process(Execution.java:112)\n\tat com.yahoo.search.searchchain.Execution.search(Execution.java:499)\n\tat com.yahoo.prelude.querytransform.PhrasingSearcher.search(PhrasingSearcher.java:60)\n\tat com.yahoo.search.Searcher.process(Searcher.java:134)\n\tat com.yahoo.processing.execution.Execution.process(Execution.java:112)\n\tat com.yahoo.search.searchchain.Execution.search(Execution.java:499)\n\tat com.yahoo.prelude.statistics.StatisticsSearcher.search(StatisticsSearcher.java:235)\n\tat com.yahoo.search.Searcher.process(Searcher.java:134)\n\tat com.yahoo.processing.execution.Execution.process(Execution.java:112)\n\tat com.yahoo.search.searchchain.Execution.search(Execution.java:499)\n\tat com.yahoo.search.handler.SearchHandler.searchAndFill(SearchHandler.java:348)\n\tat com.yahoo.search.handler.SearchHandler.search(SearchHandler.java:393)\n\tat 
com.yahoo.search.handler.SearchHandler.handleBody(SearchHandler.java:269)\n\tat com.yahoo.search.handler.SearchHandler.handle(SearchHandler.java:178)\n\tat com.yahoo.container.jdisc.ThreadedHttpRequestHandler.handle(ThreadedHttpRequestHandler.java:77)\n\tat com.yahoo.container.jdisc.ThreadedHttpRequestHandler.handleRequest(ThreadedHttpRequestHandler.java:87)\n\tat com.yahoo.container.jdisc.ThreadedRequestHandler$RequestTask.processRequest(ThreadedRequestHandler.java:191)\n\tat com.yahoo.container.jdisc.ThreadedRequestHandler$RequestTask.run(ThreadedRequestHandler.java:185)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)\n\tat java.base/java.lang.Thread.run(Thread.java:840)\nCaused by: com.yahoo.search.yql.ProgramCompileException: query:L1:177 mismatched input 't' expecting {<EOF>, 'select', ';'}\n\tat com.yahoo.search.yql.ProgramParser$ErrorListener.syntaxError(ProgramParser.java:91)\n\tat org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)\n\tat org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)\n\tat org.antlr.v4.runtime.DefaultErrorStrategy.reportInputMismatch(DefaultErrorStrategy.java:327)\n\tat org.antlr.v4.runtime.DefaultErrorStrategy.reportError(DefaultErrorStrategy.java:139)\n\tat com.yahoo.search.yql.yqlplusParser.program(yqlplusParser.java:358)\n\tat com.yahoo.search.yql.ProgramParser.parseProgram(ProgramParser.java:111)\n\tat com.yahoo.search.yql.ProgramParser.parse(ProgramParser.java:122)\n\tat com.yahoo.search.yql.YqlParser.parseYqlProgram(YqlParser.java:886)\n\t... 29 more\n"}]}}
Exception: 400 Client Error: Bad Request for url: http://index:8081/search/
05/08/2024 07:29:23 AM            celery.py 170 : Failed to sync document set 4
Traceback (most recent call last):
  File "/app/danswer/document_index/vespa/index.py", line 168, in _get_vespa_chunk_ids_by_document_id
    res.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: http://index:8081/search/

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/danswer/background/celery/celery.py", line 143, in sync_document_set_task
    _sync_document_batch(
  File "/app/danswer/background/celery/celery.py", line 128, in _sync_document_batch
    document_index.update(update_requests=update_requests)
  File "/app/danswer/document_index/vespa/index.py", line 839, in update
    for doc_chunk_id in _get_vespa_chunk_ids_by_document_id(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/danswer/document_index/vespa/index.py", line 181, in _get_vespa_chunk_ids_by_document_id
    raise requests.HTTPError(error_base) from e
requests.exceptions.HTTPError: Error occurred getting chunk by Document ID FILE_CONNECTOR__98f48aa9-b9de-4852-b321-f4f66fbef794/Error 6000- Characters Aren't Positive Integers .pdf
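
Judging from the payload above, the document ID seems to be interpolated into the YQL string unescaped, so the apostrophe in `Aren't` closes the string literal early. The parse error position (`L1:177`) appears to point at the `t` right after that apostrophe. Below is a minimal sketch of one possible fix. It assumes Vespa's YQL accepts a backslash-escaped quote inside a single-quoted literal, which should be checked against the Vespa documentation. `escape_yql_string` and `build_chunk_id_query` are hypothetical helpers for illustration, not functions from the Danswer codebase.

```python
def escape_yql_string(value: str) -> str:
    """Escape backslashes and single quotes for a single-quoted YQL literal.

    Assumption: YQL treats a backslash-escaped quote as a literal quote.
    Backslashes are escaped first so the quote escapes are not doubled.
    """
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_chunk_id_query(index_name: str, document_id: str) -> str:
    """Build the chunk lookup query with the document ID escaped (hypothetical helper)."""
    return (
        f"select documentid from {index_name} "
        f"where document_id contains '{escape_yql_string(document_id)}'"
    )


# Document ID and index name taken from the log above.
doc_id = (
    "FILE_CONNECTOR__98f48aa9-b9de-4852-b321-f4f66fbef794/"
    "Error 6000- Characters Aren't Positive Integers .pdf"
)
query = build_chunk_id_query("danswer_chunk_intfloat_multilingual_e5_small", doc_id)
print(query)
```

An alternative would be to replace or strip unsupported characters when the document ID is first written to the index, but that changes the stored IDs, whereas escaping at query time leaves existing documents untouched.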