Add namedtuple, pyspark, ibis, lc to SDK coverage #895

Merged
merged 8 commits into from May 14, 2024

@skrawcz (Collaborator) commented May 8, 2024

This is an SDK update to capture more data types.

These are all captured as a dict; we could create specific UI experiences for them, and will do so in the future.

Changes

  • Fix so the SDK no longer breaks on lazily imported modules

The SDK can now capture:

  • ibis Tables: schema
  • PySpark DataFrames: schema & plans
  • LangChain documents: content
  • LangChain messages: content
  • namedtuples: converted to a dict

How I tested this

locally

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@skrawcz force-pushed the add_nametuple_support_to_sdk branch 2 times, most recently from 1e5a616 to 6758624 on May 8, 2024 01:00
@skrawcz marked this pull request as ready for review on May 8, 2024 01:01

This was missed. Adds a matrix to run over different Python versions.

We don't run 3.12 because Ray doesn't support it yet. That should be fine.

Namedtuples can easily be converted to dictionaries, so we do that. Otherwise, we special-case 'secret_key', since that's a legacy way we told people to wrap API keys.
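A minimal sketch of the namedtuple-to-dict conversion, written as a standalone helper rather than the SDK's actual code. The names `is_namedtuple` and `namedtuple_to_dict` are hypothetical. The 'secret_key' special case is deliberately left out, because its exact behavior isn't described here.

```python
from collections import namedtuple
from typing import Any, Dict


def is_namedtuple(obj: Any) -> bool:
    """Heuristic: a namedtuple is a tuple that also exposes `_fields` and `_asdict`."""
    return isinstance(obj, tuple) and hasattr(obj, "_fields") and hasattr(obj, "_asdict")


def namedtuple_to_dict(obj: Any) -> Dict[str, Any]:
    """Hypothetical helper: convert a namedtuple into a plain dict for tracking."""
    if not is_namedtuple(obj):
        raise TypeError(f"expected a namedtuple, got {type(obj).__name__}")
    # `_asdict()` maps field names to values; wrap in dict() to get a plain
    # dict (older Pythons returned an OrderedDict).
    return dict(obj._asdict())


Point = namedtuple("Point", ["x", "y"])
print(namedtuple_to_dict(Point(x=1, y=2)))  # {'x': 1, 'y': 2}
print(is_namedtuple((1, 2)))  # False
```

The same check also accepts `typing.NamedTuple` subclasses, since they are built on `collections.namedtuple`.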

Fixes some unit tests that weren't updated.

Ibis Tables and PySpark DataFrames are evaluated lazily, so the best we can do is capture the schema and, in the case of PySpark, maybe a query plan.
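Because both libraries evaluate lazily, the schema can be read without executing the query. A minimal, duck-typed sketch, assuming only public attributes: a PySpark `DataFrame.schema` is a `StructType` whose `fields` carry `name`, `dataType` and `nullable`, and an ibis `Table.schema()` returns a schema exposing parallel `names` and `types`. The helper names are hypothetical, query-plan capture is left out, and the `SimpleNamespace` stand-ins let the sketch run without either library installed.

```python
from types import SimpleNamespace
from typing import Any, Dict


def spark_schema_to_dict(df: Any) -> Dict[str, Any]:
    """Hypothetical helper: summarize a PySpark DataFrame's schema without executing it."""
    # `df.schema` is a StructType; each StructField has name, dataType, nullable.
    return {
        "columns": [
            {"name": f.name, "type": f.dataType.simpleString(), "nullable": f.nullable}
            for f in df.schema.fields
        ]
    }


def ibis_schema_to_dict(table: Any) -> Dict[str, Any]:
    """Hypothetical helper: summarize an ibis Table's schema without executing it."""
    # `table.schema()` returns an ibis Schema with parallel `names` and `types`.
    schema = table.schema()
    return {
        "columns": [
            {"name": name, "type": str(dtype)}
            for name, dtype in zip(schema.names, schema.types)
        ]
    }


# Stand-ins that mimic only the attributes used above (not real library objects).
fake_spark_df = SimpleNamespace(
    schema=SimpleNamespace(
        fields=[
            SimpleNamespace(name="id", dataType=SimpleNamespace(simpleString=lambda: "bigint"), nullable=False),
            SimpleNamespace(name="label", dataType=SimpleNamespace(simpleString=lambda: "string"), nullable=True),
        ]
    )
)
fake_ibis_table = SimpleNamespace(
    schema=lambda: SimpleNamespace(names=("id", "label"), types=("int64", "string"))
)

print(spark_schema_to_dict(fake_spark_df))
print(ibis_schema_to_dict(fake_ibis_table))
```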

Some dependencies might not exist because they are lazily loaded. This guards against that.
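A minimal sketch of guarding optional, lazily loaded dependencies, assuming the per-library stats code lives in separate modules that are imported only when available. The helper `load_optional_modules` and the module names in `OPTIONAL_MODULES` are hypothetical placeholders (one always present, one deliberately fake), not the SDK's actual modules.

```python
import importlib
from types import ModuleType
from typing import Dict, Iterable


def load_optional_modules(names: Iterable[str]) -> Dict[str, ModuleType]:
    """Import each module if it is installed; skip the ones that are missing."""
    loaded: Dict[str, ModuleType] = {}
    for name in names:
        try:
            loaded[name] = importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError, so a missing
            # optional dependency is skipped instead of crashing tracking.
            continue
    return loaded


# Placeholder names: `json` always exists; the second one is deliberately fake.
OPTIONAL_MODULES = ["json", "definitely_not_installed_pkg_12345"]
available = load_optional_modules(OPTIONAL_MODULES)
print(sorted(available))  # ['json']
```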

This covers LangChain documents and messages. It is basic and not meant to enable deserialization, but we do have the components here to tell which is which.

Rather than introducing a special type, we use a dict for now. We can always change it and coordinate with the UI.
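A minimal, duck-typed sketch of turning LangChain objects into dicts, assuming only the public attributes they expose (`page_content` and `metadata` on a `Document`; `type` and `content` on a message). The function name `langchain_to_dict` and the `kind` key are hypothetical, and the stand-in dataclasses let it run without LangChain installed. In line with the commit message, it captures content only and is not meant to support deserialization.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


def langchain_to_dict(obj: Any) -> Dict[str, Any]:
    """Hypothetical helper: capture the content of a LangChain document or message."""
    if hasattr(obj, "page_content"):
        # Documents carry their text in `page_content` plus a metadata dict.
        return {"kind": "document", "content": obj.page_content, "metadata": dict(obj.metadata)}
    if hasattr(obj, "content") and hasattr(obj, "type"):
        # Messages carry `content` and a `type` such as "human" or "ai".
        return {"kind": "message", "type": obj.type, "content": obj.content}
    raise TypeError(f"unsupported object: {type(obj).__name__}")


# Stand-ins mimicking only the attributes used above (not LangChain classes).
@dataclass
class FakeDocument:
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeMessage:
    content: str
    type: str = "human"


print(langchain_to_dict(FakeDocument("hello", {"source": "a.txt"})))
print(langchain_to_dict(FakeMessage("hi there")))
```

Checking `page_content` first matters because a document should never be mistaken for a message.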

This also adds missing tests for the other new SDK additions.

Note: we use `_stats` as the suffix for these files (e.g. `ibis_stats.py`) to indicate that there could be missing information. If we wanted to properly cache etc., we could, but I think we'd want to do it via another means.

@skrawcz force-pushed the add_nametuple_support_to_sdk branch from 6758624 to 1968b43 on May 13, 2024 20:26
@skrawcz mentioned this pull request on May 13, 2024
@skrawcz changed the title from "Add nametuple support to sdk" to "Add namedtuple, pyspark, ibis, lc to SDK coverage" on May 13, 2024

@elijahbenizzy (Collaborator) left a comment:

Looks good

Resolved review threads:

  • ui/sdk/requirements-test.txt
  • ui/sdk/src/hamilton_sdk/driver.py
  • ui/sdk/src/hamilton_sdk/tracking/ibis_stats.py (outdated)
  • ui/sdk/tests/tracking/test_stats.py

To clarify and provide more color in the documentation.

@skrawcz merged commit 683e657 into main on May 14, 2024
24 of 26 checks passed
@skrawcz deleted the add_nametuple_support_to_sdk branch on May 14, 2024 22:45