-
Notifications
You must be signed in to change notification settings - Fork 4
Change how read/write args are read from metadata #15
Comments
Following up on And we then access the args as follows: assert context.metadata is not None
io_args = context.metadata.get(self.metadata_args_key, {}) |
I like your ideas, I'll probably implement them. However, I'd like to wait a little to see if dagster-io/dagster#15125 would get any traction. |
Please, think about keeping the IOManager-related config in the configuration of IOManager itself. There is already the |
Hey @ei-grad, thanks for your interest in the lib! I don't think all parameters should go into the config. I see two reasons for including them there:
Now, you are saying some parameters can't be expressed in terms of metadata, but can be expressed in terms of configuration. This is a bit surprising to me, because as I understand similar constraints (for example, being json-like) are applied in both cases. Would you be able to explain how exactly does metadata "not play well" in the cases you brought up above, please? By the way, in terms of partitioning I was thinking to reuse the Thanks again! |
The current implementation is unnecessarily susceptible to breaking changes. Say a new version of Polars (or
pyarrow.dataset
) removes/renames a givenwrite_parquet
argument. Because the arguments are currently hard-coded in thePolarsParquetIOManager
, this would break user scripts if they upgrade to the newer version of Polars.I think a good solution would be switching to a dictionary (called something along the lines of
polars_io_args
-- I'm sure there's a better name) as follows:And the
dump_df_to_path
andscan_df_from_path
method then forward these args to Polars, without hard-coding their names:This
Finally, this might also enable (in the future) 'hybrid' IO managers which save the same asset in two ways, because you can separate each IO manager's args in different keys, even if the names of the children-args clash/overlap.
The only problem with the above solution is that I can't think of a "clean" solution that also allows you to pass
allow_pyarrow_filer
values becauseio_args
is unpacked at a different place.The text was updated successfully, but these errors were encountered: