[BUG] TypeError: '_io.BufferedRandom' #808

Open
dedalo944 opened this issue May 6, 2024 · 2 comments
Labels
bug Something isn't working

@dedalo944
Contributor

Describe the bug
Some PDFs can't be inserted into the rabbit hole. The problem is in upload.py, line 50: deepcopy(file).
After the error, the frontend crashes and has to be refreshed.

To Reproduce

  1. Open the Cat
  2. Insert a file
  3. The file appears to be inserted, but nothing happens
  4. The console shows:
    INFO: 172.19.0.1:49624 - "POST /rabbithole/ HTTP/1.1" 500 Internal Server Error
    cheshire_cat_core | ERROR: Exception in ASGI application
    cheshire_cat_core | Traceback (most recent call last):
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    cheshire_cat_core |     result = await app(  # type: ignore[func-returns-value]
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    cheshire_cat_core |     return await self.app(scope, receive, send)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/fastapi/applications.py", line 292, in __call__
    cheshire_cat_core |     await super().__call__(scope, receive, send)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    cheshire_cat_core |     await self.middleware_stack(scope, receive, send)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    cheshire_cat_core |     raise exc
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    cheshire_cat_core |     await self.app(scope, receive, _send)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/cors.py", line 91, in __call__
    cheshire_cat_core |     await self.simple_response(scope, receive, send, request_headers=headers)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/cors.py", line 146, in simple_response
    cheshire_cat_core |     await self.app(scope, receive, send)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    cheshire_cat_core |     raise exc
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    cheshire_cat_core |     await self.app(scope, receive, sender)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    cheshire_cat_core |     raise e
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    cheshire_cat_core |     await self.app(scope, receive, send)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    cheshire_cat_core |     await route.handle(scope, receive, send)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    cheshire_cat_core |     await self.app(scope, receive, send)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    cheshire_cat_core |     response = await func(request)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 273, in app
    cheshire_cat_core |     raw_response = await run_endpoint_function(
    cheshire_cat_core |   File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 190, in run_endpoint_function
    cheshire_cat_core |     return await dependant.call(**values)
    cheshire_cat_core |   File "/app/cat/routes/upload.py", line 50, in upload_file
    cheshire_cat_core |     stray.rabbit_hole.ingest_file, stray, deepcopy(file), chunk_size, chunk_overlap
    cheshire_cat_core |   File "/usr/local/lib/python3.10/copy.py", line 172, in deepcopy
    cheshire_cat_core |     y = _reconstruct(x, memo, *rv)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/copy.py", line 271, in _reconstruct
    cheshire_cat_core |     state = deepcopy(state, memo)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/copy.py", line 146, in deepcopy
    cheshire_cat_core |     y = copier(x, memo)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    cheshire_cat_core |     y[deepcopy(key, memo)] = deepcopy(value, memo)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/copy.py", line 172, in deepcopy
    cheshire_cat_core |     y = _reconstruct(x, memo, *rv)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/copy.py", line 271, in _reconstruct
    cheshire_cat_core |     state = deepcopy(state, memo)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/copy.py", line 146, in deepcopy
    cheshire_cat_core |     y = copier(x, memo)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    cheshire_cat_core |     y[deepcopy(key, memo)] = deepcopy(value, memo)
    cheshire_cat_core |   File "/usr/local/lib/python3.10/copy.py", line 161, in deepcopy
    cheshire_cat_core |     rv = reductor(4)
    cheshire_cat_core | TypeError: cannot pickle '_io.BufferedRandom' object
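The last copy.py frames show deepcopy falling back to the pickle protocol (rv = reductor(4)) on an open file object, and OS-level file objects such as _io.BufferedRandom refuse to be pickled. A plausible reason only some PDFs fail, assuming the upload is buffered in a tempfile.SpooledTemporaryFile (as Starlette does for multipart uploads): that object keeps small contents in memory as an io.BytesIO, which can be deep-copied, and rolls larger ones over to a real temporary file on disk, which cannot. The stdlib-only sketch below reproduces both cases; deepcopy_outcome is a hypothetical helper, and the max_size used here is arbitrary, not Starlette's actual threshold.

```python
import copy
import io
import tempfile


def deepcopy_outcome(obj) -> str:
    """Hypothetical helper: report whether copy.deepcopy() succeeds on obj."""
    try:
        copy.deepcopy(obj)
        return "ok"
    except TypeError as exc:
        # open OS file objects cannot be pickled, so deepcopy raises TypeError
        return f"TypeError: {exc}"


# An in-memory buffer is picklable, so it can be deep-copied.
in_memory = io.BytesIO(b"%PDF-1.4 small document")
print(deepcopy_outcome(in_memory))  # ok

# Writing past max_size makes SpooledTemporaryFile roll over to a real
# temporary file on disk; copying it now means copying an OS file handle.
spooled = tempfile.SpooledTemporaryFile(max_size=16)
spooled.write(b"%PDF-1.4 " + b"x" * 64)
print(deepcopy_outcome(spooled))

spooled.close()
```

Reading the bytes into a fresh io.BytesIO, as the workaround later in this thread does, avoids copying the OS file handle altogether.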

Expected behavior
The file should be parsed and its text inserted into the vector memory.

Additional context
Passing file directly instead of deepcopy(file) works, but it's not the correct approach.
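Why passing file directly is fragile, as a stdlib-only sketch: the comment in upload.py (and the linked FastAPI discussion) says the uploaded file is not kept after the response returns, while background tasks run after that point. Whether the original handle is already closed when ingest_file runs may depend on the FastAPI version, so the model below simply assumes it is; ingest and respond_then_run are hypothetical stand-ins for rabbit_hole.ingest_file and BackgroundTasks.

```python
import io
from typing import BinaryIO, Callable, List, Union


def ingest(file_obj: BinaryIO) -> int:
    """Hypothetical stand-in for rabbit_hole.ingest_file: return the number of bytes read."""
    return len(file_obj.read())


def respond_then_run(upload: BinaryIO, tasks: List[Callable[[], int]]) -> List[Union[int, ValueError]]:
    """Hypothetical request lifecycle: the upload is closed once the response
    has been sent, and only then do the queued background tasks run."""
    upload.close()  # assumption: the framework closes the upload before the tasks run
    results: List[Union[int, ValueError]] = []
    for task in tasks:
        try:
            results.append(task())
        except ValueError as exc:  # reading a closed file raises ValueError
            results.append(exc)
    return results


upload = io.BytesIO(b"%PDF-1.4 example")
snapshot = io.BytesIO(upload.read())  # copy the bytes while the upload is still open

results = respond_then_run(
    upload,
    [
        lambda: ingest(upload),    # original handle: closed by the time the task runs
        lambda: ingest(snapshot),  # in-memory copy: still readable
    ],
)
print(results[1])  # 16
```

Copying the bytes inside the handler, before the task is queued, is what makes the task independent of the request's lifetime.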

dedalo944 added the bug label on May 6, 2024
@AndreaValenti01

I have the same issue.
@dedalo944, have you found a solution? The latest version works well on Windows but not on a Linux distro (I use Debian).

@dedalo944
Contributor Author

Replying to @AndreaValenti01: for now, I'm using this workaround in upload.py:

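import io

from fastapi import UploadFile  # UploadFile is also needed by the route below


def deep_copy_upload_file(upload_file: UploadFile) -> UploadFile:
    # deepcopy() fails when the upload is backed by an on-disk temporary file
    # (_io.BufferedRandom cannot be pickled), so copy the raw bytes instead
    upload_file.file.seek(0)  # rewind in case something already read from the upload
    file_bytes = upload_file.file.read()

    # wrap the bytes in an in-memory buffer that stays usable after the request ends;
    # note: the original content type/headers are not carried over
    new_upload_file = UploadFile(
        filename=upload_file.filename,
        file=io.BytesIO(file_bytes)
    )

    return new_upload_file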

# receive files via http endpoint
@router.post("/")
async def upload_file(
    request: Request,
    file: UploadFile,
    background_tasks: BackgroundTasks,
    chunk_size: int | None = Body(
        default=None,
        description="Maximum length of each chunk after the document is split (in tokens)",
    ),
    chunk_overlap: int | None = Body(default=None, description="Chunk overlap (in tokens)"),
    stray = Depends(session),
) -> Dict:
    """Upload a file containing text (.txt, .md, .pdf, etc.). File content will be extracted and segmented into chunks.
    Chunks will be then vectorized and stored into documents memory.
    """

    # Check the file format is supported
    admitted_types = stray.rabbit_hole.file_handlers.keys()

    # Get file mime type
    content_type = mimetypes.guess_type(file.filename)[0]
    log.info(f"Uploaded {content_type} down the rabbit hole")

    # check if MIME type of uploaded file is supported
    if content_type not in admitted_types:
        raise HTTPException(
            status_code=400,
            detail={
                "error": f'MIME type {content_type} not supported. Admitted types: {" - ".join(admitted_types)}'}
        )

    # upload file to long term memory, in the background
    background_tasks.add_task(
        # we copy the file into memory because FastAPI does not keep the uploaded file available after the response returns to the client
        # https://github.com/tiangolo/fastapi/discussions/10936
        stray.rabbit_hole.ingest_file, stray, deep_copy_upload_file(file), chunk_size, chunk_overlap
    )

    # reply to client
    return {
        "filename": file.filename,
        "content_type": file.content_type,
        "info": "File is being ingested asynchronously",
    }
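The route decides whether a file is accepted from mimetypes.guess_type(file.filename), that is, from the filename extension alone, so a PDF that passes this check can still fail later (for example in deepcopy). A minimal sketch of the same check; ADMITTED_TYPES, detect_content_type and is_admitted are hypothetical names, with ADMITTED_TYPES standing in for stray.rabbit_hole.file_handlers.keys().

```python
import mimetypes
from typing import Collection, Optional

# Hypothetical stand-in for stray.rabbit_hole.file_handlers.keys()
ADMITTED_TYPES = ("text/plain", "application/pdf")


def detect_content_type(filename: str) -> Optional[str]:
    # guess_type() looks only at the filename; it never inspects the file's bytes
    return mimetypes.guess_type(filename)[0]


def is_admitted(filename: str, admitted: Collection[str] = ADMITTED_TYPES) -> bool:
    # mirrors the route's check: reject anything whose guessed MIME type is not handled
    return detect_content_type(filename) in admitted


for name in ("report.pdf", "notes.txt", "README"):
    print(name, detect_content_type(name), is_admitted(name))
```

A file with no extension (like README above) gets None as its guessed type and is rejected with the 400 error before any copying happens.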
