Make ExecutionVariables available in Sagemaker Jobs #4676

Closed
lorenzwalthert opened this issue May 13, 2024 · 2 comments
Labels
component: pipelines Relates to the SageMaker Pipeline Platform type: question


lorenzwalthert commented May 13, 2024

Describe the feature you'd like
Make ExecutionVariables available in Training and Processing Jobs.

How would this feature be used? Please describe.

Let's assume I want to create an S3 URI that involves the current pipeline execution ID. I can do that with Join() and ExecutionVariables.PIPELINE_EXECUTION_ID from sagemaker.workflow.execution_variables. However, to my knowledge Join() is the only supported operation; I can't do arbitrary transformations involving pipeline variables (e.g. taking a substring, performing arithmetic with float or int parameters, or evaluating an if condition involving a PipelineVariable). A workaround is to defer such logic to a Lambda step or a processing step. However, sagemaker.workflow.execution_variables.ExecutionVariables.PIPELINE_EXECUTION_ID does not seem to be available in a processing job, with or without .to_string():

TypeError: Pipeline variables do not support __str__ operation. Please use `.to_string()` to convert it to string type in execution time or use `.expr` to translate it to Json for display purpose in Python SDK.

Not sure this behaviour applies to ExecutionVariables only or to all PipelineVariables. In the latter case, the problem seems to have a bigger scope.

Describe alternatives you've considered

Resolve the parameter before entering the container context and pass it as an argument.
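A minimal sketch of what the script side of this alternative could look like, assuming the execution ID reaches the container as a command-line argument. The flag name `--pipeline-execution-id` and the helper `short_id` are hypothetical, and how the pipeline definition passes the value is not shown here.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments the job was started with."""
    parser = argparse.ArgumentParser()
    # Hypothetical flag name; the pipeline definition would have to supply it.
    parser.add_argument("--pipeline-execution-id", required=True)
    return parser.parse_args(argv)


def short_id(execution_id: str, length: int = 8) -> str:
    """Inside the container the value is a plain str, so slicing just works."""
    return execution_id[:length]


# Example invocation with an explicit argument list (made-up ID).
args = parse_args(["--pipeline-execution-id", "abc123def456"])
prefix = short_id(args.pipeline_execution_id)
```

Because the value arrives already resolved, any ordinary Python string or numeric operation can be applied to it inside the script.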

@qidewenwhen qidewenwhen added the component: pipelines Relates to the SageMaker Pipeline Platform label May 13, 2024
qidewenwhen (Member) commented

Hi @lorenzwalthert, thanks for reaching out!

Not sure this behaviour applies to ExecutionVariables only or to all PipelineVariables.

Yes, the behavior applies to all PipelineVariables. This is because PipelineVariables are placeholders at compile time and are only resolved at pipeline execution time. Thus, we cannot do the following in the SDK when building a pipeline definition:

do arbitrary transformations involving Pipeline variables (e.g. taking a substring, performing arithmetic with float or int parameters, evaluating an if condition involving a PipelineVariable etc)

Currently the SDK only provides the Join and JsonGet functions to operate on PipelineVariables at execution time. We may not add more such functions in the near future.
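To illustrate the compile-time versus execution-time distinction, here is a toy model in plain Python. It is not the SDK's implementation: `resolve` is a hypothetical helper, the bucket name is made up, and the dictionary shapes are only assumed to resemble the expressions the SDK emits for an execution variable and a Join.

```python
from typing import Any, Dict


def resolve(expr: Any, context: Dict[str, str]) -> Any:
    """Toy resolver: replace placeholders with values known only at execution time."""
    if isinstance(expr, dict):
        if "Get" in expr:
            # Look up the concrete value for a placeholder.
            return context[expr["Get"]]
        if "Std:Join" in expr:
            join = expr["Std:Join"]
            parts = (str(resolve(value, context)) for value in join["Values"])
            return join["On"].join(parts)
    # Literals pass through unchanged.
    return expr


# At definition time the execution ID is only a placeholder, not a string,
# so there is nothing to slice or compare yet.
execution_id = {"Get": "Execution.PipelineExecutionId"}
s3_uri = {
    "Std:Join": {"On": "/", "Values": ["s3://my-bucket", execution_id, "output"]}
}

# At execution time the service knows the real value and fills it in.
resolved = resolve(s3_uri, {"Execution.PipelineExecutionId": "abc123"})
```

In this model, an operation such as a substring would need its own entry in the resolver, which is why only the operations the service understands can appear in a pipeline definition.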

Hence, for other operations, leveraging a LambdaStep can be one solution.
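A rough sketch of the Lambda side of that approach, assuming the pipeline passes the execution ID to the function under a hypothetical input key `execution_id` and reads back a hypothetical output named `short_id`. The LambdaStep wiring in the pipeline definition is not shown.

```python
def lambda_handler(event, context):
    """Derive a value from the execution ID, which is already a plain string here."""
    # "execution_id" is an assumed input key chosen for this example.
    execution_id = event["execution_id"]
    # Any ordinary Python operation is available at this point, e.g. a substring.
    return {"short_id": execution_id[:8]}
```

The returned dictionary keys would need to match the output names declared on the step so that downstream steps can reference them.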

Besides LambdaStep, since you're using training and processing steps, could you try our recently launched @step decorator and see whether it resolves this issue?
https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-step-decorator.html.

In your case, the code can be similar to the following.

Note: custom_func runs at pipeline execution time, when ExecutionVariables.PIPELINE_EXECUTION_ID (exe_var) has already been resolved, so we can apply any primitive Python string operation to it.

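    from sagemaker.workflow.execution_variables import ExecutionVariables
    from sagemaker.workflow.function_step import step
    from sagemaker.workflow.pipeline import Pipeline

    @step(
        name="...",
        keep_alive_period_in_seconds=600,
        # ...
    )
    def custom_func(exe_var):
        # Add your ML logic here; it runs in a training job at pipeline execution time,
        # so exe_var is already a plain string.
        return exe_var[0:2]  # <<<<<<<<<<<<<<<<<<<<<<<<<<<<

    # Calling the decorated function only builds the step; nothing executes yet.
    custom_func_output = custom_func(
        exe_var=ExecutionVariables.PIPELINE_EXECUTION_ID,
    )

    # pipeline_name, sagemaker_session, and role are assumed to be defined elsewhere.
    pipeline = Pipeline(
        name=pipeline_name,
        steps=[custom_func_output],
        sagemaker_session=sagemaker_session,
    )

    pipeline.create(role)

    execution = pipeline.start()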

qidewenwhen (Member) commented

Closing this issue as we did not get a response in the last 3 weeks. Feel free to reopen if you have further questions. Thanks!
