What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
How good are LLMs at chemistry?
The data and implementation for the experiments in the paper "Flows: Building Blocks of Reasoning and Collaborating AI".
Restore safety in fine-tuned language models through task arithmetic
Code and data for ACL ARR 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
Join 15k builders reading the Real-World ML Newsletter ⬇️⬇️⬇️
A framework for evaluating the effectiveness of chain-of-thought reasoning in language models.
Fine-Tuning and Evaluating a Falcon 7B Model for generating HTML code from input prompts.
TypeScript SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
This repository contains a list of benchmarks used by big orgs to evaluate their LLMs.
Needle-in-a-haystack testing for LLMs
Source code for the accepted paper in ICSE-NIER'24: Re(gEx|DoS)Eval: Evaluating Generated Regular Expressions and their Proneness to DoS Attacks.
Code and data for the paper: "Are Large Language Models Aligned with People's Social Intuitions for Human–Robot Interactions?"
LLM benchmarks play a crucial role in assessing the performance of Large Language Models (LLMs), but these benchmarks have their own limitations. This interactive tool engages users in a quiz game based on popular LLM benchmarks, offering an insightful way to explore and understand them.
Evaluation of Language Models in Non-English Languages
Part of our final year project work involving complex NLP tasks along with experimentation on various datasets and different LLMs
Evaluating and enhancing Large Language Models (LLMs) on mathematical datasets through an innovative Multi-Agent Debate architecture, without traditional fine-tuning or Retrieval-Augmented Generation. This project explores advanced strategies for boosting LLM capabilities in mathematical reasoning.
Evaluate open-source language models on agent use, formatted output, instruction following, long text, multilingual, coding, and custom-task capabilities.