Describe the feature you'd like
Make ExecutionVariables available in Training and Processing Jobs.
How would this feature be used? Please describe.
Let's assume I want to create an S3 URI that involves the current pipeline execution ID. I can do that with Join() and ExecutionVariables.PIPELINE_EXECUTION_ID from sagemaker.workflow.execution_variables. However, Join() is the only operation supported to my knowledge; I can't do arbitrary transformations involving pipeline variables (e.g. taking a substring, performing arithmetic with float or int parameters, evaluating an if condition involving a PipelineVariable, etc.). A workaround is to defer such logic to a Lambda step or processing step. However, sagemaker.workflow.execution_variables.ExecutionVariables.PIPELINE_EXECUTION_ID does not seem to be available in a processing job, with or without .to_string().
I'm not sure whether this behaviour applies to ExecutionVariables only or to all PipelineVariables.
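For context, the Join() usage that does work can be sketched conceptually. At execution time SageMaker resolves the placeholder and concatenates the parts, roughly equivalent to the following (the join function, bucket name, and execution ID here are made-up stand-ins, not the real SDK):

```python
# Conceptual stand-in for sagemaker.workflow.functions.Join: at execution time
# the placeholder is replaced by the real execution ID and the parts are joined.
def join(values, on=""):
    return on.join(str(v) for v in values)

execution_id = "a1b2c3d4e5f6"  # stand-in for ExecutionVariables.PIPELINE_EXECUTION_ID
s3_uri = join(["s3://my-bucket/outputs", execution_id], on="/")
print(s3_uri)  # -> s3://my-bucket/outputs/a1b2c3d4e5f6
```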
For this, yes, the behavior applies to all PipelineVariables. This is because PipelineVariables are placeholders at compile time and are only parsed at pipeline execution time. Thus, we cannot do the following in the SDK when defining a pipeline.
do arbitrary transformations involving pipeline variables (e.g. taking a substring, performing arithmetic with float or int parameters, evaluating an if condition involving a PipelineVariable, etc.)
Currently we only provide the Join and JsonGet functions in the SDK to perform operations on PipelineVariables at execution time. We do not plan to add more such functions in the near future.
Hence, for other operations, leveraging a LambdaStep can be one solution.
In your case, the code can be similar to the following.
Note: because the custom_func runs at pipeline execution time, when ExecutionVariables.PIPELINE_EXECUTION_ID (i.e. exe_var) has already been parsed, we can apply any primitive Python string operations to it.
from sagemaker.workflow.execution_variables import ExecutionVariables
from sagemaker.workflow.function_step import step
from sagemaker.workflow.pipeline import Pipeline

@step(
    name="...",
    keep_alive_period_in_seconds=600,
    ...
)
def custom_func(exe_var):
    # Add your ML logic here; it will run in a training job at pipeline execution time
    return exe_var[0:2]  # <<< plain string slicing works on the resolved value

custom_func_output = custom_func(
    exe_var=ExecutionVariables.PIPELINE_EXECUTION_ID,
)

pipeline = Pipeline(
    name=pipeline_name,
    steps=[custom_func_output],
    sagemaker_session=sagemaker_session,
)
pipeline.create(role)
execution = pipeline.start()
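At execution time, the step body receives the resolved ID as an ordinary string, so slicing behaves like plain Python. The value below is a made-up stand-in for a real pipeline execution ID, used only to illustrate what custom_func effectively does:

```python
# Once the placeholder is resolved, exe_var is just a str.
exe_var = "w9yp3s2c6oda"  # stand-in for a resolved PIPELINE_EXECUTION_ID
prefix = exe_var[0:2]     # ordinary string slicing now works
print(prefix)             # -> w9
```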
Describe alternatives you've considered
Resolve the parameter before entering the container context and pass it as an argument.
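A sketch of that alternative, assuming the execution ID is passed to the job as a command-line argument (for example via a processor's arguments list, where SageMaker resolves pipeline placeholders before the container starts). The container-side script then only ever sees a plain string; the argument name and value below are illustrative assumptions:

```python
import argparse

# Inside the container: read the already-resolved execution ID from the args.
parser = argparse.ArgumentParser()
parser.add_argument("--execution-id", required=True)

# In a real job, sys.argv would carry the value SageMaker substituted for
# ExecutionVariables.PIPELINE_EXECUTION_ID; here we simulate it explicitly.
args = parser.parse_args(["--execution-id", "a1b2c3d4e5f6"])
print(args.execution_id)  # -> a1b2c3d4e5f6
```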