Issues Encountered and Resolutions Discovered #93

Open
CamDuffy1 opened this issue Feb 2, 2024 · 0 comments


I wanted to share the following issues I ran into while deploying this app, along with some resolutions discovered along the way:

Company names not appearing in the search bar on the landing page

  • When hosting the backend on Render and the frontend on Vercel, the backend environment variable BACKEND_CORS_ORIGINS must include the following:
    • The address of the frontend webpage, beginning with https:// and without a trailing slash (ex: https://xxx.vercel.app).
    • The API endpoint to get documents from the backend (ex: https://xxx.onrender.com/api/document), without a trailing slash.
    • See: Deploying backend to Cloud #11 (comment).
  • When running the app in a GitHub Codespace, port 8000 must be set to public visibility. Port visibility is private by default.
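
The two formatting rules above (https:// prefix, no trailing slash) are easy to get wrong, so here is a minimal sketch of a check for each entry before it goes into BACKEND_CORS_ORIGINS. The function name check_cors_origin is hypothetical and not part of this repo, and the URLs are the placeholder values from the examples above.

```python
from urllib.parse import urlparse


def check_cors_origin(origin: str) -> list[str]:
    """Return a list of problems found with one BACKEND_CORS_ORIGINS entry."""
    problems = []
    # The entry must use HTTPS.
    if urlparse(origin).scheme != "https":
        problems.append("does not begin with https://")
    # The entry must not end with a slash.
    if origin.endswith("/"):
        problems.append("has a trailing slash")
    return problems


# Placeholder values taken from the examples above.
origins = [
    "https://xxx.vercel.app",
    "https://xxx.onrender.com/api/document",
]
for origin in origins:
    print(origin, check_cors_origin(origin) or "ok")
```

An empty list means the entry follows both rules.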

'Failed to load PDF file' on the conversation page

Upon inspecting the webpage using my browser's dev tools, I noticed there are two different causes for this issue:

  1. Mixed content error
    • When the frontend is deployed on Vercel, the value for the CDN_BASE_URL backend environment variable must begin with https://. This is because every deployment on Vercel is served over an HTTPS connection, so any requests for content served over HTTP will be blocked.
    • The seed_db.py script sets the URL of the documents to be fetched by the frontend web page. The CDN_BASE_URL environment variable must be set correctly before running this script. Otherwise, the web page will try to load PDF documents from incorrect URLs.
    • In my case, I had mistakenly set this variable to the public URL of the S3 bucket beginning with http://, resulting in the mixed content error.
  2. Blocked by CORS Policy
    • CORS must be configured on the S3 bucket used to store the website assets (i.e., PDF documents). Otherwise, the webpage cannot retrieve this content. This is the S3 bucket specified by the S3_ASSET_BUCKET_NAME backend environment variable.
    • This CORS configuration is not done using the BACKEND_CORS_ORIGINS backend environment variable. Rather, it is a setting on the S3 bucket itself.
    • CORS must be configured on the S3 bucket even when using CloudFront as a content delivery network for the S3 bucket.
    • If using CloudFront, set the CDN_BASE_URL backend environment variable to the CloudFront distribution domain name, prefixed with https://.
    • To configure CORS from the AWS console, go to the S3 bucket > Permissions > Cross-origin resource sharing (CORS) > Edit, and define the CORS configuration as JSON (see AWS documentation - CORS configuration).
    • The CORS configuration must allow "GET" methods. The address of the frontend webpage deployed on Vercel must be listed as an allowed origin (ex: "AllowedOrigins": ["https://xxx.vercel.app"]).
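
As a rough sketch of both causes above: the first helper checks that a CDN_BASE_URL value uses HTTPS, and the second builds a CORS configuration in the JSON shape the S3 console editor accepts. The function names are hypothetical, the bucket URL is a made-up example, and the "AllowedHeaders" and "ExposeHeaders" values are assumptions, since this issue only specifies the allowed method and origin.

```python
import json
from urllib.parse import urlparse


def is_https_url(url: str) -> bool:
    """True if the URL uses HTTPS, which avoids the mixed content error."""
    return urlparse(url).scheme == "https"


def build_s3_cors_rules(frontend_origin: str) -> list[dict]:
    """Build a list of CORS rules for the S3 bucket that stores the PDFs."""
    return [
        {
            "AllowedHeaders": ["*"],  # assumption: not specified in this issue
            "AllowedMethods": ["GET"],
            "AllowedOrigins": [frontend_origin],
            "ExposeHeaders": [],  # assumption: nothing extra exposed
        }
    ]


# A value like the one that caused the mixed content error (example URL).
print(is_https_url("http://example-bucket.s3.amazonaws.com"))  # False

# JSON that can be pasted into the CORS editor of the S3 bucket.
print(json.dumps(build_s3_cors_rules("https://xxx.vercel.app"), indent=4))
```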

Permission denied error while running the seed_db.py script with remote S3 buckets (i.e., not localstack)

  • The seed_db.py script runs as a result of any of the following commands from the backend directory:
    • make seed_db
    • make seed_db_preview
    • make seed_db_based_on_env
  • This script is also run by the Cron Job when triggered in Render.
  • The permission denied error I observed occurred when the script tried to save the docstore, graph_store, and index_store JSON files to the S3 bucket specified by the backend environment variable S3_BUCKET_NAME.
  • Making the S3 bucket publicly available resolved this issue. This can be done by:
    • Unchecking all 'Block public access' settings in the permissions tab of the S3 bucket via the AWS Console.
    • Creating a bucket policy that allows all S3 actions ("Action":"s3:*") against the bucket and the objects within it ("Resource": ["<Bucket ARN>", "<Bucket ARN>/*"]).
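
For reference, a bucket policy of the kind described above might look like the output of this sketch. The "Action" and "Resource" values come from the description above; the "Principal": "*" value and the "Sid" are my assumptions about what "publicly available" means here, and the function and bucket names are hypothetical.

```python
import json


def build_public_bucket_policy(bucket_name: str) -> dict:
    """Build a policy allowing all S3 actions on a bucket and its objects."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAllS3Actions",  # hypothetical label
                "Effect": "Allow",
                "Principal": "*",  # assumption: applies to any caller
                "Action": "s3:*",
                # The bucket itself, then every object within it.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


# "my-storage-bucket" is a placeholder for the S3_BUCKET_NAME value.
print(json.dumps(build_public_bucket_policy("my-storage-bucket"), indent=4))
```

With "Principal": "*", a policy this broad lets any caller read and write the bucket.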

The Web Service and Cron Job deployed on Render must use, at minimum, the Standard instance type

  • If the Web Service or Cron Job is created with the Free or Starter instance type on Render, it runs out of memory and fails with an error while deploying the backend or running the cron job.

Limitations

One limitation I noticed is that it doesn't seem possible to host the application locally (from a Codespace or your desktop) while also using remote S3 buckets to store the StorageContext or app assets.

  • The RENDER backend environment variable is used to indicate whether the backend is hosted on Render.com. It can only take on the values of True or False.
  • It is automatically set to True when the backend is hosted and running on Render, and False otherwise.
  • The issue arises in config.py (backend/app/core/config.py), which is used to set the S3 endpoint URL.
    • If the backend is not running on Render, the S3 endpoint URL is set to "http://localhost:4566". This is the localstack endpoint.
    • So, from what I've observed, the S3 endpoint URL cannot be configured to point at anything other than the localstack endpoint when the backend is not hosted on Render.
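
To illustrate the limitation, here is a simplified sketch of the behavior as I understand it. This is not the actual code in config.py: the function name is hypothetical, and returning None on Render (meaning "use the default AWS endpoint") is an assumption on my part.

```python
from typing import Optional

LOCALSTACK_ENDPOINT = "http://localhost:4566"


def s3_endpoint_url(env: dict) -> Optional[str]:
    """Pick the S3 endpoint URL based only on the RENDER variable."""
    # RENDER is "True" on Render.com and treated as "False" otherwise.
    on_render = env.get("RENDER", "False") == "True"
    if on_render:
        return None  # assumption: None means the default AWS endpoint
    # Off Render, no other setting influences the result.
    return LOCALSTACK_ENDPOINT


print(s3_endpoint_url({"RENDER": "False"}))  # http://localhost:4566
print(s3_endpoint_url({"RENDER": "True"}))   # None
```

Because nothing other than RENDER is consulted, a local run always ends up on the localstack endpoint.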