EleutherAI / lm-evaluation-harness Public

Issues: EleutherAI/lm-evaluation-harness

[Discussion] Add Major Code Benchmarks

#1157 opened Dec 18, 2023 by haileyschoelkopf

Open 2

195 Open 653 Closed

Issues list

ImportError: cannot import name 'HfApi' from 'huggingface_hub'

#1889 opened May 25, 2024 by baberabb

Multiple issues Encountered During Tasks Verification

#1885 opened May 25, 2024 by zhabuye

can we add C4 and PTB tasks for PpL?

Labels: feature request

#1884 opened May 25, 2024 by 123wujiao

Add Regression Testing

Labels: feature request, good first issue, help wanted

#1883 opened May 24, 2024 by haileyschoelkopf

eval with Alpaca template

#1882 opened May 24, 2024 by oneonlee

Evaluation MC Questions

#1875 opened May 23, 2024 by kangqi-ni

chat model evaluation

#1870 opened May 22, 2024 by jordane95

Add more math evaluation tasks

#1869 opened May 22, 2024 by jordane95

Llama2-hf-q40.gguf model got very poor results on lambada_openai tasks, but was fine on other tasks.

#1866 opened May 21, 2024 by intellinjun

Is there something wrong with 'google/gemma-1.1-2b-it' ?

#1854 opened May 18, 2024 by rangehow

--device cuda:3 not honored when using --model vllm

Labels: bug, documentation

#1846 opened May 15, 2024 by LGLG42

How to use Zeno

#1842 opened May 14, 2024 by DavidAdamczyk

Inconsistent evaluation results with Chat Template

#1841 opened May 14, 2024 by shiweijiezero

AssertionError: aggregation named 'mean' conflicts with existing registered aggregation!

#1839 opened May 14, 2024 by hunter2009pf

Bug: wrong until default value for chat based model

#1837 opened May 14, 2024 by YilunZhou

sha256 for datasets or samples

#1836 opened May 13, 2024 by artemorloff

Evaluation results of llama2 with lm-evaluation-harness using wikitext-2

#1833 opened May 13, 2024 by l2002924700

Using Language Models as Evaluators

Labels: feature request

#1831 opened May 13, 2024 by lintangsutawika

Errors when loading exact_match.py

#1830 opened May 13, 2024 by twxin

eval gsm8k from local dataset folder with the bug info "ValueError: BuilderConfig 'main' not found."

#1829 opened May 12, 2024 by Jp-17

Add More Tests

Labels: feature request

#1827 opened May 12, 2024 by haileyschoelkopf

I get this error whenever I try to run an eval: ImportError: cannot import name 'HfApi' from 'huggingface_hub'

#1826 opened May 12, 2024 by menhguin

Avoid slow testing due to network issues.

#1824 opened May 11, 2024 by pixeli99

The input format for XNLI seems wired?

#1822 opened May 10, 2024 by SefaZeng

TypeError: 'NoneType' object is not iterable when using cache and loglikelihood_rolling

#1821 opened May 10, 2024 by mdocekal
