RuSentNE-LLM-Benchmark

This repository assesses the reasoning capabilities of LLMs (✨ Mistral / LLaMA-3 / Phi-3 / Gemma / Flan-T5 / GPT-4o ✨) in Targeted Sentiment Analysis on the RuSentNE dataset, proposed as part of the competition of the same name, covering Russian mass-media texts and their English translations 📊

In particular, we apply pre-trained LLMs to the following dataset splits:

  1. 🔓 Development
  2. 🔒 Final

We use [quick-cot] to experiment with the following reasoning techniques (a prompt sketch follows the list):

  • Instruction Prompts
  • Chain-of-Thought (THoR)
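
For illustration, here is a minimal sketch of a zero-shot instruction prompt for targeted sentiment analysis. The wording, sample sentence, and entity below are assumptions for demonstration only, not the repository's verbatim prompts:

```python
# Hypothetical zero-shot instruction prompt for targeted sentiment
# analysis; illustrative wording, not the exact experiment template.
PROMPT = (
    "What is the attitude of the sentence '{sentence}' "
    "towards '{entity}'? "
    "Select one: positive, negative, neutral."
)

print(PROMPT.format(
    sentence="The company reported record profits this quarter.",
    entity="the company",
))
```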

πŸ” Accessing the results

All the results are stored in SQLite files; the answers reside in the `contents` table.

Option 1. Use sqlitebrowser to access the results and export them into CSV.

Option 2. Use the sqlite2csv.py script implemented in this repository.
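
Alternatively, a plain export can be done with Python's standard library. A minimal sketch, assuming the answers sit in the `contents` table mentioned above; the file names are placeholders:

```python
import csv
import sqlite3

# Dump the `contents` table of one result file into CSV.
# "answers.sqlite" is a placeholder; point it at any result file
# from this repository.
with sqlite3.connect("answers.sqlite") as conn:
    cursor = conn.execute("SELECT * FROM contents")
    header = [col[0] for col in cursor.description]
    with open("answers.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)   # column names first
        writer.writerows(cursor)  # then every stored answer row
```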

🔓 Development Results


This is an open-access dataset split (sentiment labels are available) that was utilized during the development stage; anyone can use it for evaluation checks.

Dataset: valiation_data_labeled.csv

| Model | Lang | Mode | F1(P,N) | F1(P,N,0) | N/A % | Answers |
|-------|------|------|---------|-----------|-------|---------|
| GPT-3.5-0613 | 🇺🇸 | CoT THoR | 43.41 | 46.14 | - | answers |
| GPT-3.5-1106 | 🇺🇸 | CoT THoR | 40.85 | 40.04 | - | answers |
| mistral-7b | 🇺🇸 | CoT THoR | 42.74 | 51.77 | 0.04 | answers |

| Model | Lang | Mode | F1(P,N) | F1(P,N,0) | N/A % | Answers |
|-------|------|------|---------|-----------|-------|---------|
| GPT-4-turbo-2024-04-09 | 🇺🇸 | zero-shot | 50.83 | 61.25 | 0.0 | answers |
| GPT-3.5-0613 | 🇺🇸 | zero-shot | 47.39 | 57.99 | 0.0 | answers |
| GPT-3.5-1106 | 🇺🇸 | zero-shot | 45.73 | 52.54 | 0.0 | answers |
| mistral-large-latest | 🇺🇸 | zero-shot | 45.24 | 58.29 | 0.0 | answers |
| llama-3-70b-instruct | 🇺🇸 | zero-shot | 48.96 | 60.71 | 0.0 | answers |
| mixtral-8x22b | 🇺🇸 | zero-shot | 45.94 | 58.34 | 0.0 | answers |
| Phi-3-small-8k-instruct | 🇺🇸 | zero-shot | 46.87 | 57.08 | 0.07 | answers |
| mixtral-8x7b | 🇺🇸 | zero-shot | 46.31 | 56.1 | 0.07 | answers |
| Mistral-7B-Instruct-v0.3 | 🇺🇸 | zero-shot | 45.58 | 56.08 | 0.0 | answers |
| llama-3-8b-instruct | 🇺🇸 | zero-shot | 45.61 | 54.88 | 0.0 | answers |
| Phi-3-mini-4k-instruct | 🇺🇸 | zero-shot | 44.86 | 55.52 | 0.0 | answers |
| mistral-7b | 🇺🇸 | zero-shot | 42.87 | 53.69 | 0.11 | answers |
| gpt-4o | 🇺🇸 | zero-shot | 42.23 | 55.88 | 0.0 | answers |
| llama-2-70b-chat | 🇺🇸 | zero-shot | 41.97 | 53.98 | 13.44 | answers |

| Model | Lang | Mode | F1(P,N) | F1(P,N,0) | N/A % | Answers |
|-------|------|------|---------|-----------|-------|---------|
| GPT-3.5-0613 | 🇷🇺 | zero-shot | 44.52 | 54.67 | 1.51 | answers |
| gpt-4o | 🇷🇺 | zero-shot | 43.93 | 57.38 | 0.0 | answers |
| GPT-3.5-1106 | 🇷🇺 | zero-shot | 41.46 | 47.17 | 0.46 | answers |
| GPT-4-turbo-2024-04-09 | 🇷🇺 | zero-shot | 41.28 | 55.7 | 0.0 | answers |
| mistral-large-latest | 🇷🇺 | zero-shot | 22.35 | 43.09 | 0.04 | answers |
| llama-3-70b-instruct | 🇷🇺 | zero-shot | 45.21 | 58.32 | 0.0 | answers |
| mixtral-8x22b | 🇷🇺 | zero-shot | 41.49 | 54.55 | 0.0 | answers |
| mixtral-8x7b | 🇷🇺 | zero-shot | 39.96 | 53.56 | 0.18 | answers |
| mistral-7b | 🇷🇺 | zero-shot | 41.71 | 47.57 | 0.18 | answers |
| Mistral-7B-Instruct-v0.3 | 🇷🇺 | zero-shot | 41.59 | 44.28 | 0.18 | answers |
| Phi-3-small-8k-instruct | 🇷🇺 | zero-shot | 40.77 | 49.78 | 0.14 | answers |
| llama-3-8b-instruct | 🇷🇺 | zero-shot | 40.23 | 48.02 | 0.35 | answers |
| Phi-3-mini-4k-instruct | 🇷🇺 | zero-shot | 35.4 | 32.7 | 0.04 | answers |
| llama-2-70b-chat | 🇷🇺 | zero-shot | 16.68 | 36.77 | 1.48 | answers |
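
Regarding the two F1 columns in the leaderboards: F1(P,N) is macro-averaged F1 over the positive and negative classes only, while F1(P,N,0) also averages in the neutral class. A minimal sketch of how such scores could be computed; the string label encoding here is an assumption:

```python
from sklearn.metrics import f1_score

# Toy predictions, assuming labels "positive" / "negative" / "neutral".
y_true = ["positive", "negative", "neutral", "negative"]
y_pred = ["positive", "neutral",  "neutral", "negative"]

# F1(P,N): macro F1 restricted to the positive and negative classes.
f1_pn = f1_score(y_true, y_pred,
                 labels=["positive", "negative"], average="macro")

# F1(P,N,0): macro F1 over all three classes, neutral included.
f1_pn0 = f1_score(y_true, y_pred,
                  labels=["positive", "negative", "neutral"], average="macro")

print(f"F1(P,N) = {f1_pn:.2f}, F1(P,N,0) = {f1_pn0:.2f}")
```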

🔒 Final Results


This leaderboard and the obtained LLM answers are part of the experiments in the paper: Large Language Models in Targeted Sentiment Analysis in Russian.

Dataset: final_data.csv

| Model | Lang | Mode | F1(P,N) | F1(P,N,0) | N/A % | Answers |
|-------|------|------|---------|-----------|-------|---------|
| GPT-4-1106-preview | 🇺🇸 | CoT THoR | 50.13 | 55.93 | - | answers |
| GPT-3.5-0613 | 🇺🇸 | CoT THoR | 44.50 | 48.17 | - | answers |
| GPT-3.5-1106 | 🇺🇸 | CoT THoR | 42.58 | 42.18 | - | answers |
| GPT-4-1106-preview | 🇺🇸 | zero-shot (short) | 54.59 | 64.32 | - | answers |
| GPT-3.5-0613 | 🇺🇸 | zero-shot (short) | 51.79 | 61.38 | - | answers |
| GPT-3.5-1106 | 🇺🇸 | zero-shot (short) | 47.04 | 53.19 | - | answers |
| Mistral-7B-instruct-v0.1 | 🇺🇸 | zero-shot | 49.46 | 58.51 | - | answers |
| Mistral-7B-instruct-v0.2 | 🇺🇸 | zero-shot | 44.82 | 56.04 | - | answers |
| DeciLM | 🇺🇸 | zero-shot | 43.85 | 53.65 | 1.44 | answers |
| Microsoft-Phi-2 | 🇺🇸 | zero-shot | 40.95 | 42.77 | 3.13 | answers |
| Gemma-7B-IT | 🇺🇸 | zero-shot | 40.96 | 44.63 | - | answers |
| Gemma-2B-IT | 🇺🇸 | zero-shot | 31.75 | 45.96 | 2.62 | answers |
| Flan-T5-xxl | 🇺🇸 | zero-shot | 36.46 | 42.63 | 1.90 | answers |

| Model | Lang | Mode | F1(P,N) | F1(P,N,0) | N/A % | Answers |
|-------|------|------|---------|-----------|-------|---------|
| GPT-4-1106-preview | 🇷🇺 | zero-shot (short) | 48.04 | 60.55 | 0.0 | answers |
| GPT-3.5-0613 | 🇷🇺 | zero-shot (short) | 45.85 | 57.36 | 0.0 | answers |
| GPT-3.5-1106 | 🇷🇺 | zero-shot (short) | 35.07 | 48.53 | 0.0 | answers |
| Mistral-7B-Instruct-v0.2 | 🇷🇺 | zero-shot | 42.60 | 48.05 | 0.0 | answers |

References

If you find the results and findings in the Final Results section valuable 💎, feel free to cite the related work as follows:

@misc{rusnachenko2024large,
      title={Large Language Models in Targeted Sentiment Analysis}, 
      author={Nicolay Rusnachenko and Anton Golubev and Natalia Loukachevitch},
      year={2024},
      eprint={2404.12342},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
