Make ExecutionVariables available in Sagemaker Jobs #4676

lorenzwalthert · 2024-05-13T14:00:33Z

Describe the feature you'd like
Make ExecutionVariables available in Training and Processing Jobs.

How would this feature be used? Please describe.

Let's assume I want to create an S3 URI that includes the current pipeline execution ID. I can do that with Join() and ExecutionVariables.PIPELINE_EXECUTION_ID from sagemaker.workflow.execution_variables. However, to my knowledge Join() is the only supported operation; I can't do arbitrary transformations involving pipeline variables (e.g. taking a substring, performing arithmetic with float or int parameters, evaluating an if condition involving a PipelineVariable, etc.). A workaround is to defer such logic to a Lambda step or a processing step. However, sagemaker.workflow.execution_variables.ExecutionVariables.PIPELINE_EXECUTION_ID does not seem to be available in a processing job, with or without .to_string():

TypeError: Pipeline variables do not support __str__ operation. Please use `.to_string()` to convert it to string type in execution time or use `.expr` to translate it to Json for display purpose in Python SDK.

Not sure this behaviour applies to ExecutionVariables only or to all PipelineVariables. In the latter case, the problem seems to have a bigger scope.

Describe alternatives you've considered

Resolve the parameter before entering the container context and pass it as an argument.
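One way to picture this alternative: the execution ID reaches the job as an already-resolved command-line string, and the script inside the container treats it as an ordinary Python str. The sketch below covers only the container-side script. The `--execution-id` flag, the `derive_prefix` helper, and the sample ID are hypothetical names chosen for illustration, and wiring the argument into the step definition is not shown.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments the job was started with (argv=None reads sys.argv)."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag: assumed to carry the already-resolved pipeline execution ID.
    parser.add_argument("--execution-id", required=True)
    return parser.parse_args(argv)


def derive_prefix(execution_id, length=2):
    """Plain slicing works because execution_id is an ordinary str inside the job."""
    return execution_id[:length]


# Simulate the argument list the container would receive (hypothetical ID).
args = parse_args(["--execution-id", "abc123"])
prefix = derive_prefix(args.execution_id)
```

In a real job script, `parse_args()` would be called without arguments so that it reads the container's actual command line.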

qidewenwhen · 2024-05-13T20:30:16Z

Hi @lorenzwalthert, thanks for reaching out!

Not sure this behaviour applies to ExecutionVariables only or to all PipelineVariables.

For this, yes, the behavior applies to all PipelineVariables. This is because PipelineVariables are placeholders at compile time and are only resolved at pipeline execution time. Thus, we cannot do the following in the SDK when defining a pipeline:

do arbitrary transformations involving Pipeline variables (e.g. taking a substring, performing arithmetic with float or int parameters, evaluating an if condition involving a PipelineVariable, etc.)

Currently we only provide the Join and JsonGet functions in the SDK to perform operations on PipelineVariables at execution time. We may not add more such functions in the near future.
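To make the compile-time versus execution-time distinction concrete, here is a toy model in plain Python. It is not the SDK's implementation: the `Placeholder` and `JoinExpr` classes, the bucket name, and the sample ID are all hypothetical. It only illustrates why a placeholder cannot be turned into a string while the pipeline is being defined, and why a deferred join can still work.

```python
class Placeholder:
    """Toy stand-in for a pipeline variable: a name looked up at execution time."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        # No concrete value exists while the pipeline is only being defined.
        raise TypeError("placeholder has no value until the pipeline executes")

    def resolve(self, context):
        return context[self.name]


class JoinExpr:
    """Toy deferred join: records its operands now, concatenates them later."""

    def __init__(self, on, values):
        self.on = on
        self.values = values

    def resolve(self, context):
        # Resolve placeholders against the context; pass literals through as strings.
        parts = [v.resolve(context) if hasattr(v, "resolve") else str(v) for v in self.values]
        return self.on.join(parts)


# Definition ("compile") time: only the structure of the expression is recorded.
uri = JoinExpr(on="/", values=["s3://my-bucket", Placeholder("execution_id"), "output"])

# Execution time: the real value is supplied (hypothetical ID) and the join is evaluated.
resolved_uri = uri.resolve({"execution_id": "abc123"})
```

Slicing or arithmetic on `Placeholder` at definition time would fail for the same reason `str()` does: there is nothing to operate on yet.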

Hence, for other operations, leveraging a LambdaStep can be one solution.
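As a rough illustration of the Lambda route, the handler below assumes the pipeline passes the resolved execution ID to the function as an input. Inside the handler the value is a plain string, so any Python operation is available. The `execution_id` input key and the `id_prefix` and `id_length` output names are hypothetical; the LambdaStep definition that would supply the input and declare the outputs is not shown.

```python
def lambda_handler(event, context):
    """Derive values from the resolved execution ID passed in by the pipeline."""
    # Hypothetical input key: assumed to hold the already-resolved execution ID.
    execution_id = event["execution_id"]
    return {
        # Hypothetical output names that later steps could consume.
        "id_prefix": execution_id[:2],
        "id_length": len(execution_id),
    }


# Local invocation with a made-up ID, just to exercise the handler.
result = lambda_handler({"execution_id": "abc123"}, None)
```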

Besides LambdaStep, since you're using training and processing steps, can you try out our recently launched feature, @step, and see whether it resolves this issue?
https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-step-decorator.html

In your case, the code can be similar to the following.

Note: because custom_func runs at pipeline execution time, when ExecutionVariables.PIPELINE_EXECUTION_ID (passed in as exe_var) has already been resolved, we can apply any native Python string operation to it.

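    from sagemaker.workflow.execution_variables import ExecutionVariables
    from sagemaker.workflow.function_step import step
    from sagemaker.workflow.pipeline import Pipeline

    @step(
        name="...",
        keep_alive_period_in_seconds=600,
        # ...
    )
    def custom_func(exe_var):
        # Add your ML logic here; it runs in a training job at pipeline execution time
        return exe_var[0:2]  # plain slicing works because exe_var is already a resolved string

    # Calling the decorated function only records the step; nothing runs yet
    custom_func_output = custom_func(
        exe_var=ExecutionVariables.PIPELINE_EXECUTION_ID,
    )

    # pipeline_name, sagemaker_session and role are assumed to be defined elsewhere
    pipeline = Pipeline(
        name=pipeline_name,
        steps=[custom_func_output],
        sagemaker_session=sagemaker_session,
    )

    pipeline.create(role)

    execution = pipeline.start()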

qidewenwhen · 2024-06-03T21:16:52Z

Closing this issue as we have not received a response in the last 3 weeks. Feel free to reopen if you have further questions. Thanks!

qidewenwhen added the component: pipelines label (Relates to the SageMaker Pipeline Platform) May 13, 2024

qidewenwhen added the type: question label May 13, 2024

liujiaorr assigned liujiaorr and qidewenwhen and unassigned qidewenwhen and liujiaorr May 28, 2024

qidewenwhen closed this as completed Jun 3, 2024
