Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Set ByteStream's mime_type attribute for web based resources #7681

Merged
merged 2 commits into from May 13, 2024

Conversation

vblagoje
Copy link
Member

@vblagoje vblagoje commented May 10, 2024

Why:

Enhances consistency and clarity in handling MIME types in ByteStream across Haystack. ByteStream's mime_type attribute gets set properly for web originating ByteStreams. In web originating ByteStream's content_type is still left intact in meta dictionary for backward compatibility.

What:

  • Sets mime_type attribute of ByteStream instead of indirect storage of MIME types within a meta dictionary. This change allows direct and uniform access to MIME types, simplifying handling for both web resources and local files.

How can it be used:

  • Fetching and Routing Based on MIME Types: Users can now straightforwardly fetch and route documents based on their MIME types, applying specific processing or handling routines directly:
    if stream.mime_type == "application/pdf":
        # Process PDF stream
  • Enhanced Clarity in Code: The management of MIME types is simplified, enhancing code readability and reducing potential for errors by avoiding nested data structures.

How did you test it:

  • Unit tests in test_file_router.py were adjusted to reflect the direct assignment and checking of mime_type.
  • Tests confirm correct assignment, access, and utilization of the mime_type attribute across components, ensuring accurate management and application for routing and processing.
  • Integration testing may be conducted to verify consistent utilization across processing pathways and to confirm no impact on existing functionalities.

Notes for the reviewer:

  • Please ensure the mime_type attribute implementation covers all necessary cases where MIME type information is accessed or modified.
  • Special attention might be required to ensure backward compatibility or migration from the previous approach using the meta dictionary.

@vblagoje vblagoje added the ignore-for-release-notes PRs with this flag won't be included in the release notes. label May 10, 2024
@coveralls
Copy link
Collaborator

coveralls commented May 10, 2024

Pull Request Test Coverage Report for Build 9058615189

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 5 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.01%) to 90.402%

Files with Coverage Reduction New Missed Lines %
components/fetchers/link_content.py 5 78.31%
Totals Coverage Status
Change from base Build 9030307905: -0.01%
Covered Lines: 6527
Relevant Lines: 7220

💛 - Coveralls

@vblagoje vblagoje marked this pull request as ready for review May 13, 2024 07:01
@vblagoje vblagoje requested review from a team as code owners May 13, 2024 07:01
@vblagoje vblagoje requested review from dfokina and shadeMe and removed request for a team May 13, 2024 07:01
@vblagoje vblagoje removed the ignore-for-release-notes PRs with this flag won't be included in the release notes. label May 13, 2024
@vblagoje
Copy link
Member Author

@shadeMe @wochinge has all the context already and understands the nuances of the changes compared to the previous one we dropped. Assigning this review to him 🙏

@vblagoje vblagoje requested review from wochinge and removed request for shadeMe May 13, 2024 07:04
@vblagoje vblagoje changed the title feat: Set ByteStream mime_type in LinkContentFetcher, use mime_type exclusive for mime/content type feat: Use ByteStream's mime_type attribute for web based resources May 13, 2024
@vblagoje vblagoje changed the title feat: Use ByteStream's mime_type attribute for web based resources feat: Set ByteStream's mime_type attribute for web based resources May 13, 2024
Copy link
Contributor

@wochinge wochinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, clean solution! 💯

@vblagoje vblagoje merged commit 811b93d into main May 13, 2024
42 checks passed
@vblagoje vblagoje deleted the mime_type branch May 13, 2024 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FileTypeRouter should get mime type from ByteStream mime type attribute instead of `meta
3 participants