
LangSmith "Criteria()" and "LabeledCriteria()" do not accept custom criteria in TypeScript, despite examples #5188

Closed
5 tasks done
w-biggs opened this issue Apr 23, 2024 · 5 comments · Fixed by #5403
Labels
auto:bug Related to a bug, vulnerability, unexpected error with an existing feature

Comments


w-biggs commented Apr 23, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain.js documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain.js rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

Neither of these works. The first is what the inline comments imply should work, and the second is what I would expect to work based on the equivalent Python code.

const evalResult = await runOnDataset(
  llmFunction,
  datasetName,
  {
    evaluators: [{
      evaluatorType: "criteria",
      criteria: { "isCompliant": "Does the submission comply with the requirements of XYZ" }
    }],
  },
);
const compliantCriteria = Criteria({
  evaluatorType: "criteria",
  criteria: { "isCompliant": "Does the submission comply with the requirements of XYZ" }
});

const evalResult = await runOnDataset(
  llmFunction,
  datasetName,
  {
    evaluators: [compliantCriteria],
  },
);

Error Message and Stack Trace (if applicable)

No response

Description

There seems to be no way to use custom criteria in a criteria evaluator, which differs from the Python implementation.

The first method above does not work: the object is not compatible with the expected type of the entries in the evaluators array. The second method does not work either: Criteria()'s argument is typed as Criteria instead of CriteriaLike, and if that line is suppressed with @ts-ignore, the call fails with a 422 "expected a string" error when sending feedback to the LangSmith server.

Filing this issue here because the code lives in this repo rather than in LangSmith proper (yet?).

System Info

langchain@0.1.35
Node v21.5.0

dosubot bot commented Apr 23, 2024

It looks like the issue you're encountering with Criteria() and LabeledCriteria() in TypeScript stems from the way custom criteria are expected to be defined and used within the LangChain JS library. Unlike the Python implementation, where you might directly pass custom criteria as objects, LangChain JS, particularly with LangSmith, requires a more structured approach.

For custom criteria in TypeScript, you should leverage the CriteriaEvalChain class for defining and evaluating custom criteria. Here's a streamlined example on how to define and use a single custom criterion:

import { CriteriaEvalChain } from "langchain/evaluation";
import { BaseLanguageModelInterface } from '@langchain/core/language_models/base';

declare const llm: BaseLanguageModelInterface; // your language model instance

const customCriterion = {
  creativity: "Does the submission demonstrate novelty or unique ideas?",
};

const criteriaEvalChain = await CriteriaEvalChain.fromLLM(llm, customCriterion);

const input = "Tell me a joke";
const prediction = "I ate some square pie but I don't know the square of pi.";

const res = await criteriaEvalChain.evaluateStrings({ input, prediction });

console.log(res);

This example demonstrates defining a creativity criterion and evaluating a prediction against it. For multiple criteria, simply expand the customCriterion object with additional key-value pairs representing each criterion.
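To illustrate the multi-criteria case mentioned above, here is a minimal sketch, assuming CriteriaEvalChain.fromLLM accepts a Record<string, string> mapping criterion names to evaluation questions (as the single-criterion example suggests); the criterion names here are made up for illustration:

```typescript
// Sketch only: a record with several custom criteria. Each key is a
// criterion name; each value is the question the evaluator LLM answers.
const customCriteria: Record<string, string> = {
  creativity: "Does the submission demonstrate novelty or unique ideas?",
  conciseness: "Is the submission concise and to the point?",
};

// Assumed usage, mirroring the single-criterion example:
// const chain = await CriteriaEvalChain.fromLLM(llm, customCriteria);
```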

Remember, the CriteriaEvalChain class is key to using custom criteria effectively in TypeScript with LangChain JS. This approach provides a structured and flexible way to evaluate predictions against custom-defined criteria, offering detailed reasoning, values, and scores for each criterion.

If you're still facing issues or have more questions, feel free to ask!

About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.


@dosubot dosubot bot added the auto:bug Related to a bug, vulnerability, unexpected error with an existing feature label Apr 23, 2024

w-biggs commented Apr 24, 2024

@dosu This doesn't apply in this scenario: I don't want to create an eval chain; I need to create an EvalConfig, since that is what runOnDataset() accepts as an argument.


dosubot bot commented Apr 24, 2024

To address the issue with using custom criteria in an EvalConfig for runOnDataset() in TypeScript with LangChain.js, it's important to ensure that your custom criteria are correctly defined and formatted to be compatible with the expectations of the LangSmith server. Here's a concise guide to help you set up your custom criteria correctly:

  1. Define Custom Criteria Using the Criteria Function: When creating a custom criteria configuration, use the Criteria function. Make sure the criteria passed are stringifiable since the LangSmith server expects string values for evaluation.

  2. Stringify Values: All values passed to the criteria property must be stringifiable to avoid the "422 expected a string" error. This includes ensuring objects passed as criteria are correctly converted to strings.

  3. Format Evaluator Inputs Correctly: Use a custom function to format the evaluation data (input, prediction, reference) so that it's correctly stringified and formatted for the LangSmith server. This step is crucial for ensuring compatibility and avoiding errors.

  4. Example Configuration:

const customCriteriaConfig = Criteria("helpfulness", {
  formatEvaluatorInputs: ({ rawInput, rawPrediction }) => ({
    prediction: getSingleStringifiedValue(rawPrediction),
    input: getSingleStringifiedValue(rawInput),
  }),
  llm: /* your language model instance */,
  feedbackKey: "customFeedbackKey" // Optional: specify if needed
});

In this setup, Criteria is utilized to define a custom criteria evaluator named "helpfulness". The formatEvaluatorInputs function ensures that inputs and predictions are stringified, addressing the server's requirements.

  5. Incorporate Custom Criteria into EvalConfig:
const evalConfig = {
  evaluators: [customCriteriaConfig],
  // Include other necessary configuration options here
};

This configuration can then be used directly with runOnDataset() or any other function in LangChain.js that accepts an EvalConfig.

By following these steps, you should be able to use custom criteria within your EvalConfig successfully, ensuring compatibility with the LangSmith server and avoiding the "422 expected a string" error.
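The getSingleStringifiedValue helper used in the example above is not defined anywhere in this thread. A hypothetical implementation (an assumption, not the library's actual helper) might collapse a run's input/output record to a single string, since the LangSmith feedback endpoint reportedly rejects non-string values with a 422:

```typescript
// Hypothetical helper: reduce a run's raw input/output to one string.
// If the value is already a string, return it; if it is a record with a
// single entry, unwrap that entry; otherwise fall back to JSON.
const getSingleStringifiedValue = (raw: unknown): string => {
  if (typeof raw === "string") return raw;
  if (raw !== null && typeof raw === "object") {
    const values = Object.values(raw as Record<string, unknown>);
    if (values.length === 1) return getSingleStringifiedValue(values[0]);
  }
  return JSON.stringify(raw);
};
```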


jacoblee93 (Collaborator) commented
Sorry for the delay!

@dqbd could you have a look at this one?

dqbd (Collaborator) commented May 15, 2024

Hello @w-biggs!

The issue stems from the invalid type, which will be fixed in #5403. In the meantime, you can either:

  1. Ignore the error with a @ts-expect-error directive:
{
  evaluators: [
    // @ts-expect-error Invalid type
    Criteria({
      isCompliant: "Does the submission comply with the requirements of XYZ",
    }),
  ],
}
  2. Pass formatEvaluatorInputs to obtain prediction, input, and reference.
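The second workaround is not shown in code in this thread. A sketch of what it might look like, assuming the option shape from the earlier example in this thread (the field names rawInput, rawPrediction, and rawReferenceOutput are assumptions):

```typescript
// Sketch of workaround 2: supply formatEvaluatorInputs so the evaluator
// receives plain strings instead of raw run records, avoiding the
// 422 "expected a string" error described above.
const formatEvaluatorInputs = (run: {
  rawInput: Record<string, unknown>;
  rawPrediction: Record<string, unknown>;
  rawReferenceOutput?: Record<string, unknown>;
}) => ({
  // Take the first value of each record and coerce it to a string.
  input: String(Object.values(run.rawInput)[0] ?? ""),
  prediction: String(Object.values(run.rawPrediction)[0] ?? ""),
  reference: run.rawReferenceOutput
    ? String(Object.values(run.rawReferenceOutput)[0] ?? "")
    : undefined,
});
```

This could then be passed alongside the criteria in the evaluator configuration, per the earlier dosubot example.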
