
Add support for llm in spacy init config #12820

Draft · wants to merge 8 commits into base: v4

Conversation

@rmitsch (Contributor) commented on Jul 13, 2023

Description

Add support for the llm component in spacy init config.

Types of change

Enhancement.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@rmitsch added labels enhancement (Feature requests and improvements) and feat / cli (Feature: Command-line interface) on Jul 13, 2023
@rmitsch rmitsch self-assigned this Jul 13, 2023
@rmitsch (Contributor, Author) commented on Jul 14, 2023

As of now spacy-llm is required if llm is added to the pipeline. This is because we need its registry entries to resolve --llm.task ner to e.g. spacy.NER.v2 (note that in the current state we don't select the highest version; this is still a TODO). This means we'd have to install spacy-llm for the CI tests though. Are we ok with that?

Alternatively, we could also require the registry handle to be set explicitly, e.g. --llm.task spacy.NER.v2. In this case there's no need to require spacy-llm, but it's somewhat less consistent with the way --pipeline is configured.

Opinions?
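
(To make the dependency concrete, here is a minimal sketch of the kind of registry lookup described above. It is only an illustration, not code from this PR, and it assumes spacy-llm is installed so that its task registry is importable as spacy_llm.registry.registry and queryable via llm_tasks.get_all(); the helper name is made up.)

from typing import List

# NOTE: spacy_llm is only importable when spacy-llm is installed.
from spacy_llm.registry import registry


def matching_task_handles(user_value: str) -> List[str]:
    """Collect registered spacy-llm task handles whose name matches a short alias like "ner"."""
    handles: List[str] = []
    for reg_name in registry.llm_tasks.get_all():  # e.g. "spacy.NER.v1", "spacy.NER.v2", ...
        parts = reg_name.split(".")  # "spacy.NER.v2" -> ["spacy", "NER", "v2"]
        if len(parts) == 3 and parts[1].lower() == user_value.lower():
            handles.append(reg_name)
    return handles

For "ner" this would collect both spacy.NER.v1 and spacy.NER.v2, which is where the version-selection question discussed further down comes in.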

@@ -3,7 +3,7 @@ the docs and the init config command. It encodes various best practices and
can help generate the best possible configuration, given a user's requirements. #}
{%- set use_transformer = hardware != "cpu" and transformer_data -%}
{%- set transformer = transformer_data[optimize] if use_transformer else {} -%}
- {%- set listener_components = ["tagger", "morphologizer", "parser", "ner", "textcat", "textcat_multilabel", "entity_linker", "span_finder", "spancat", "spancat_singlelabel", "trainable_lemmatizer"] -%}
+ {%- set listener_components = ["tagger", "morphologizer", "parser", "ner", "textcat", "textcat_multilabel", "entity_linker", "span_finder", "spancat", "spancat_singlelabel", "trainable_lemmatizer", "llm"] -%}
Contributor:

llm shouldn't be in this list of components with listeners, which will cause tok2vec or transformer to be added (depending on the rest of the template, of course).

Contributor (Author):

Makes sense. I vaguely remember that the LLM block was not output properly if "llm" was not in listener_components. I'll check.

Comment on lines +600 to +610
{% if "llm" in components -%}
[components.llm]
factory = "llm"

[components.llm.model]
@llm_models = "{{ llm_spec['model'] }}"

[components.llm.task]
@llm_tasks = "{{ llm_spec['task'] }}"
{% endif -%}

Contributor:

I think (depending on exactly how this gets designed in the end), it should be possible to only have this block once because it does not depend on the listener setup.

@pytest.mark.parametrize("pipeline", [["llm"]])
@pytest.mark.parametrize("llm_model", ["noop"])
@pytest.mark.parametrize("llm_task", ["ner", "sentiment"])
def test_init_config_llm(pipeline, llm_model, llm_task):
Contributor:

I think this test should probably be in spacy-llm unless it's installed by default (similar to spacy-transformers, where I'm not sure this is tested at all, but where it could be tested).

@svlandeg (Member) commented on Jul 17, 2023:

IMO we should consider installing spacy-llm by default going forward. At some point we want to move it to the core codebase, but for now, having it installed by default + kept as separate repo has the advantages that we can still release updates more quickly with spacy-llm while not troubling users with an additional install command. The requirements are minimal.

@svlandeg (Member) left a comment:

This is currently not working for most open-source HF models, as they don't have a default name the way e.g. "spacy.GPT-3-5.v1" does.

For instance, spacy init config myconfig.cfg -p "llm" --llm.model "Falcon" --llm.task 'ner'

will fail with

✘ Config validation error
llm.model -> name       field required
{'@llm_models': 'spacy.Falcon.v1'}
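
(For HF models the generated [components.llm.model] block would additionally need an explicit name on top of the registered handle, roughly like the snippet below; the concrete model name here is only illustrative and is not something this PR currently fills in.)

[components.llm.model]
@llm_models = "spacy.Falcon.v1"
name = "falcon-7b-instruct"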

valid_values.add(reg_name)
if reg_name.lower() == user_value:
    spec["matched_reg_handle"] = reg_handle
    break
Member:

If there are 2 versions of a model/task, is this code guaranteed to give you the latest version?

Member:

It doesn't - I just checked - it produces spacy.NER.v1 instead of spacy.NER.v2 when given "ner"

Contributor (Author):

No, it just grabs the first occurrence as of now. I left that as TBD since it wasn't clear how we want to go forward.
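
(For reference, a sketch of how the highest version could be preferred over the first match, assuming handles follow the spacy.<Name>.v<N> pattern; this helper is hypothetical and not part of the PR.)

import re
from typing import Iterable, Optional


def latest_handle(handles: Iterable[str], user_value: str) -> Optional[str]:
    """Return the matching handle with the highest version, e.g. spacy.NER.v2 over spacy.NER.v1."""
    pattern = re.compile(r"^spacy\.(?P<name>[^.]+)\.v(?P<version>\d+)$")
    best, best_version = None, -1
    for handle in handles:
        match = pattern.match(handle)
        if match and match.group("name").lower() == user_value.lower():
            version = int(match.group("version"))
            if version > best_version:
                best, best_version = handle, version
    return best

With this, latest_handle(["spacy.NER.v1", "spacy.NER.v2"], "ner") returns "spacy.NER.v2".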

@svlandeg svlandeg changed the base branch from master to main January 29, 2024 09:53