
feat: onnx expose session options #346

Conversation

russellbrooks
Adds an optional passthrough of onnxruntime.SessionOptions to the underlying ONNX InferenceSession – re: slack thread

When inference runs in a virtualized environment, e.g. a Docker container, the ONNX inference session infers properties of the host machine's hardware, such as the number of CPU cores, rather than what the container actually has access to (similar issues arise in other Python multiprocessing tools). This can oversaturate the CPU cores and cause noisy-neighbor issues across containers sharing the same host (e.g. a large EC2 instance with 96 cores).

These passthrough options can be specified in the existing JSON file pointed to by the env variable UNSTRUCTURED_DEFAULT_MODEL_INITIALIZE_PARAMS_JSON_PATH. As an example, here's JSON that would limit model inference to 4 CPU cores:

{
    "model_name": "detectron2_onnx",
    "session_options_dict": {
        "intra_op_num_threads": 4,
        "inter_op_num_threads": 4
    }
}

Param reference:

  • Intra-Op Parallelism: Controls the number of threads used for parallel execution within a single operator. For example, if a matrix multiplication can be parallelized, this setting determines how many threads work on that operation. Default is 0, which lets onnxruntime choose.
  • Inter-Op Parallelism: Controls the number of threads that can run different operators in parallel. For instance, if the model architecture allows multiple layers or operations to execute simultaneously (i.e., they do not depend on each other's output), this setting manages how many such operations can run at the same time. Default is 0, which lets onnxruntime choose.
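For illustration, here's a minimal sketch of how a session_options_dict like the one above could be applied: each key is set as an attribute on a SessionOptions-style object before it is handed to the inference session. A stand-in FakeSessionOptions class is used here (an assumption, so the sketch runs without onnxruntime installed); the real onnxruntime.SessionOptions exposes intra_op_num_threads and inter_op_num_threads as plain attributes in the same way.

```python
class FakeSessionOptions:
    """Stand-in for onnxruntime.SessionOptions (assumption, for illustration).

    Both thread counts default to 0, which means "let onnxruntime choose".
    """
    def __init__(self):
        self.intra_op_num_threads = 0
        self.inter_op_num_threads = 0


def apply_session_options(options, session_options_dict):
    """Copy each entry of the dict onto the options object as an attribute."""
    for key, value in session_options_dict.items():
        setattr(options, key, value)
    return options


# The same shape of config shown above in the JSON example.
config = {
    "model_name": "detectron2_onnx",
    "session_options_dict": {
        "intra_op_num_threads": 4,
        "inter_op_num_threads": 4,
    },
}

opts = apply_session_options(FakeSessionOptions(), config["session_options_dict"])
print(opts.intra_op_num_threads, opts.inter_op_num_threads)  # -> 4 4
```

With real onnxruntime, the resulting options object would then be passed to the session constructor via its sess_options argument, e.g. onnxruntime.InferenceSession(model_path, sess_options=opts).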
