Examining how large language models (LLMs) perform across various synthetic regression tasks when given (input, output) examples in their context, without any parameter updates.
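As a rough illustration (not code from the repository itself), such an evaluation typically serializes (input, output) pairs into the prompt and asks the model to complete the output for a held-out query, relying purely on in-context learning. The linear task `y = 2x + 1` and the prompt format below are hypothetical choices for this sketch:

```python
# Minimal sketch of building a few-shot prompt for in-context regression.
# The task function and "Input:/Output:" format are illustrative assumptions.
import random

def make_regression_prompt(n_examples: int = 20, seed: int = 0) -> tuple[str, float]:
    """Build a few-shot prompt for a synthetic linear task y = 2x + 1."""
    rng = random.Random(seed)
    lines = []
    for _ in range(n_examples):
        x = round(rng.uniform(-10, 10), 2)
        y = round(2 * x + 1, 2)  # ground-truth function the LLM must infer from context
        lines.append(f"Input: {x}\nOutput: {y}")
    # Query point: the model predicts its output from the in-context examples
    # alone, with no parameter updates.
    x_query = round(rng.uniform(-10, 10), 2)
    lines.append(f"Input: {x_query}\nOutput:")
    return "\n".join(lines), 2 * x_query + 1

prompt, expected = make_regression_prompt()
print(prompt)
print(f"# expected completion ~ {expected}")
```

The model's completion can then be parsed as a number and scored against the ground truth, e.g. with mean squared error over many query points.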
A benchmark for prompt injection detection systems.
LLM-KG-Bench is a framework and task collection for the automated benchmarking of Large Language Models (LLMs) on Knowledge Graph (KG)-related tasks.
FM-Leaderboard-er lets you create a leaderboard to find the best LLM/prompt for your own business use case, based on your own data, tasks, and prompts.