
Mlserver example #1110

Open · robertgshaw2-neuralmagic wants to merge 2 commits into main

Conversation

robertgshaw2-neuralmagic (Contributor) commented:

No description provided.


```python
NUM_THREADS = 2
URL = "http://localhost:8080/v2/models/text-classification-model/infer"
sentences = ["I hate using GPUs for inference", "I love using DeepSparse on CPUs"] * 100
```

Member commented:

Should `* 100` be `* NUM_THREADS` if we are only taking `sentences[:NUM_THREADS]` elements?

Member commented:

@rsnm2 see suggestion below

```diff
@@ -0,0 +1,75 @@
# **Step 1: Installation**
```

Contributor commented:

Best to add an intro paragraph to give users a heads-up on what this example does.
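For example, something along these lines (just a sketch; adjust to the example's actual scope):

> This example shows how to serve a DeepSparse text-classification `Pipeline` with MLServer, and how to send requests to the V2 `/infer` endpoint from a multi-threaded Python client.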

Comment on lines +23 to +27:

```python
threads = [threading.Thread(target=tfunc, args=(sentence,)) for sentence in sentences[:NUM_THREADS]]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Member commented:

It looks like this creates only `NUM_THREADS` threads in total to make requests; is that intended? It might make more sense to create a task per sentence (`len(sentences)` of them) and execute `NUM_THREADS` at a time. You can do this out of the box with `ThreadPoolExecutor`, with something like:

Suggested change (replacing the five lines above):

```python
from concurrent.futures.thread import ThreadPoolExecutor

threadpool = ThreadPoolExecutor(max_workers=NUM_THREADS)
results = threadpool.map(tfunc, sentences)
```
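This suggestion also moots the `* 100` vs `* NUM_THREADS` question above, since `map` runs over every sentence. One caveat: `Executor.map` returns a lazy iterator, so worker exceptions only surface once it is iterated. A minimal sketch of consuming the results:

```python
from concurrent.futures import ThreadPoolExecutor

# The context manager joins all worker threads on exit; list(...) drains the
# lazy iterator returned by map(), re-raising any exception from a worker.
with ThreadPoolExecutor(max_workers=NUM_THREADS) as threadpool:
    results = list(threadpool.map(tfunc, sentences))
```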

```python
URL = "http://localhost:8080/v2/models/text-classification-model/infer"
sentences = ["I hate using GPUs for inference", "I love using DeepSparse on CPUs"] * 100

def tfunc(text):
```

Member commented:

Would rename to something more descriptive, like `inference_request`.

Comment on lines +19 to +20:

```python
for output in resp["outputs"]:
    print(output["data"])
```

Member commented:

Printing the outputs from inside the worker while multithreaded may cause a race condition; any reason not to return the value and print everything in sequence at the end? (Consider that if thread 1 and thread 2 happen to execute at exactly the same time, they will print their lines interleaved, and you might not be able to tell which output belongs to which request.)
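A sketch of that shape, folding in the rename and `ThreadPoolExecutor` suggestions from above (the request payload here is an assumption based on the V2 inference protocol, including the `"text"` input name; the PR's actual request body isn't shown in this thread):

```python
from concurrent.futures import ThreadPoolExecutor
import requests

NUM_THREADS = 2
URL = "http://localhost:8080/v2/models/text-classification-model/infer"
sentences = ["I hate using GPUs for inference", "I love using DeepSparse on CPUs"] * 100

def inference_request(text):
    # V2 inference protocol request body; the input name and BYTES/[1] shape
    # are assumptions, since the PR's actual payload isn't shown here.
    payload = {"inputs": [{"name": "text", "shape": [1], "datatype": "BYTES", "data": [text]}]}
    resp = requests.post(URL, json=payload).json()
    # Return the outputs instead of printing from the worker thread.
    return [output["data"] for output in resp["outputs"]]

with ThreadPoolExecutor(max_workers=NUM_THREADS) as threadpool:
    results = list(threadpool.map(inference_request, sentences))

# Print after all threads have finished, in submission order.
for sentence, result in zip(sentences, results):
    print(f"{sentence!r} -> {result}")
```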



```diff
@@ -0,0 +1,27 @@
import requests, threading
```

Member commented:

Would suggest a few inline comments for self-documentation.

```python
task = self._settings.parameters.task,
model_path = self._settings.parameters.model_path,
batch_size = self._settings.parameters.batch_size,
sequence_length = self._settings.parameters.sequence_length,
```

Member commented:

Is there a place for generic kwargs in the settings? Would be cool if we could use that instead to dump extra pipeline args, so we get full generic pipeline support out of the box.
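Roughly what I have in mind (just a sketch; the class name is made up, and `parameters.extra` assumes the free-form dict that recent MLServer versions expose on `ModelParameters`, so worth verifying against the pinned version):

```python
from deepsparse import Pipeline
from mlserver import MLModel


class DeepSparsePipelineModel(MLModel):  # hypothetical name; the PR's class isn't shown here
    async def load(self) -> bool:
        params = self._settings.parameters
        # Forward any extra, pipeline-specific settings straight through to
        # Pipeline.create, so new pipeline kwargs need no runtime changes.
        extra_kwargs = params.extra or {}
        self._pipeline = Pipeline.create(
            task=params.task,
            model_path=params.model_path,
            **extra_kwargs,
        )
        return True
```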

```diff
@@ -0,0 +1,19 @@
from mlserver import MLModel
```

Contributor commented:

This is great, love that it works out of the box. Let's throw the serving command into a comment just for convenience.
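For example, at the top of the file (assuming the standard MLServer CLI and that `model-settings.json` sits alongside it):

```python
# Serve this model with:
#   mlserver start .
# (run from the directory containing model-settings.json)
from mlserver import MLModel
```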
