This repository assesses the reasoning capabilities of LLMs in Targeted Sentiment Analysis on the RuSentNE dataset, proposed as part of the self-titled competition.
In particular, we use pre-trained LLMs on the following dataset splits:
- Development
- Final
We use [quick-cot] to experiment with the following reasoning setups:
- Instruction prompts
- Chain-of-Thought (THoR)
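To illustrate the THoR setup, the sketch below chains three reasoning hops (aspect, opinion, polarity) for a given sentence and target entity. The prompt wording and the helper `build_thor_prompts` are illustrative assumptions, not the exact templates used by quick-cot.

```python
# Illustrative THoR-style three-hop prompt chain for targeted sentiment
# analysis. The hop wording here is hypothetical, not quick-cot's templates.

def build_thor_prompts(sentence: str, target: str) -> list[str]:
    """Return the three reasoning hops; each hop extends the previous one."""
    hop1 = (f'Given the sentence "{sentence}", '
            f'which specific aspect of "{target}" is mentioned?')
    hop2 = hop1 + ' Based on that aspect, what is the underlying opinion?'
    hop3 = hop2 + (' Based on that opinion, what is the sentiment polarity '
                   f'towards "{target}": positive, negative, or neutral?')
    return [hop1, hop2, hop3]
```

In practice, each hop is sent to the model together with the model's answer to the previous hop, so the final polarity decision is conditioned on the intermediate reasoning.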
All results are stored in the `contents` table of an SQLite database.
Option 1. Use sqlitebrowser to browse the results and export them into CSV.
Option 2. Use the `sqlite2csv.py` script implemented in this repository.
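For reference, a minimal sketch of what such an export looks like with the standard library; the actual `sqlite2csv.py` in this repository may differ in options and column handling.

```python
# Minimal sketch: dump a table from an SQLite results database into CSV.
# The table name "contents" matches the repository's storage layout.
import csv
import sqlite3


def sqlite_to_csv(db_path: str, csv_path: str, table: str = "contents") -> None:
    with sqlite3.connect(db_path) as conn:
        cursor = conn.execute(f"SELECT * FROM {table}")
        header = [col[0] for col in cursor.description]
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)   # column names first
            writer.writerows(cursor)  # then all result rows
```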
This is an open-access dataset split (sentiment labels available) used for the development stage; anyone can use it for evaluation checks.
Dataset: `valiation_data_labeled.csv`
Model | lang | Mode | F1(P,N) | F1(P,N,0) | N/A % | Answers |
---|---|---|---|---|---|---|
GPT-3.5-0613 | 🇺🇸 | CoT THoR | 43.41 | 46.14 | - | answers |
GPT-3.5-1106 | 🇺🇸 | CoT THoR | 40.85 | 40.04 | - | answers |
mistral-7b | 🇺🇸 | CoT THoR | 42.74 | 51.77 | 0.04 | answers |
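In the tables, F1(P,N) denotes the macro-averaged F1 over the positive and negative classes only, while F1(P,N,0) also includes the neutral class. A minimal sketch of these metrics in pure Python follows; the 1/-1/0 label encoding is an assumption, and the official competition evaluator may differ in details.

```python
# Sketch of the leaderboard metrics, assuming labels 1 (positive),
# -1 (negative) and 0 (neutral).

def f1_for_class(y_true, y_pred, label):
    """Per-class F1 from true/false positives and false negatives."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_pn(y_true, y_pred):
    # Macro-average over positive (1) and negative (-1) classes only.
    return (f1_for_class(y_true, y_pred, 1)
            + f1_for_class(y_true, y_pred, -1)) / 2


def f1_pn0(y_true, y_pred):
    # Macro-average over positive, negative and neutral classes.
    return sum(f1_for_class(y_true, y_pred, c) for c in (1, -1, 0)) / 3
```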
Model | lang | Mode | F1(P,N) | F1(P,N,0) | N/A % | Answers |
---|---|---|---|---|---|---|
GPT-4-turbo-2024-04-09 | 🇺🇸 | zero-shot | 50.83 | 61.25 | 0.0 | answers |
GPT-3.5-0613 | 🇺🇸 | zero-shot | 47.39 | 57.99 | 0.0 | answers |
GPT-3.5-1106 | 🇺🇸 | zero-shot | 45.73 | 52.54 | 0.0 | answers |
mistral-large-latest | 🇺🇸 | zero-shot | 45.24 | 58.29 | 0.0 | answers |
llama-3-70b-instruct | 🇺🇸 | zero-shot | 48.96 | 60.71 | 0.0 | answers |
mixtral-8x22b | 🇺🇸 | zero-shot | 45.94 | 58.34 | 0.0 | answers |
Phi-3-small-8k-instruct | 🇺🇸 | zero-shot | 46.87 | 57.08 | 0.07 | answers |
mixtral-8x7b | 🇺🇸 | zero-shot | 46.31 | 56.1 | 0.07 | answers |
Mistral-7B-Instruct-v0.3 | 🇺🇸 | zero-shot | 45.58 | 56.08 | 0.0 | answers |
llama-3-8b-instruct | 🇺🇸 | zero-shot | 45.61 | 54.88 | 0.0 | answers |
Phi-3-mini-4k-instruct | 🇺🇸 | zero-shot | 44.86 | 55.52 | 0.0 | answers |
mistral-7b | 🇺🇸 | zero-shot | 42.87 | 53.69 | 0.11 | answers |
gpt-4o | 🇺🇸 | zero-shot | 42.23 | 55.88 | 0.0 | answers |
llama-2-70b-chat | 🇺🇸 | zero-shot | 41.97 | 53.98 | 13.44 | answers |
Model | lang | Mode | F1(P,N) | F1(P,N,0) | N/A % | Answers |
---|---|---|---|---|---|---|
GPT-3.5-0613 | 🇷🇺 | zero-shot | 44.52 | 54.67 | 1.51 | answers |
gpt-4o | 🇷🇺 | zero-shot | 43.93 | 57.38 | 0.0 | answers |
GPT-3.5-1106 | 🇷🇺 | zero-shot | 41.46 | 47.17 | 0.46 | answers |
GPT-4-turbo-2024-04-09 | 🇷🇺 | zero-shot | 41.28 | 55.7 | 0.0 | answers |
mistral-large-latest | 🇷🇺 | zero-shot | 22.35 | 43.09 | 0.04 | answers |
llama-3-70b-instruct | 🇷🇺 | zero-shot | 45.21 | 58.32 | 0.0 | answers |
mixtral-8x22b | 🇷🇺 | zero-shot | 41.49 | 54.55 | 0.0 | answers |
mixtral-8x7b | 🇷🇺 | zero-shot | 39.96 | 53.56 | 0.18 | answers |
mistral-7b | 🇷🇺 | zero-shot | 41.71 | 47.57 | 0.18 | answers |
Mistral-7B-Instruct-v0.3 | 🇷🇺 | zero-shot | 41.59 | 44.28 | 0.18 | answers |
Phi-3-small-8k-instruct | 🇷🇺 | zero-shot | 40.77 | 49.78 | 0.14 | answers |
llama-3-8b-instruct | 🇷🇺 | zero-shot | 40.23 | 48.02 | 0.35 | answers |
Phi-3-mini-4k-instruct | 🇷🇺 | zero-shot | 35.4 | 32.7 | 0.04 | answers |
llama-2-70b-chat | 🇷🇺 | zero-shot | 16.68 | 36.77 | 1.48 | answers |
This leaderboard and the obtained LLM answers are part of the experiments in the paper: Large Language Models in Targeted Sentiment Analysis in Russian.
Dataset: `final_data.csv`
Model | lang | Mode | F1(P,N) | F1(P,N,0) | N/A % | Answers |
---|---|---|---|---|---|---|
GPT-4-1106-preview | 🇺🇸 | CoT THoR | 50.13 | 55.93 | - | answers |
GPT-3.5-0613 | 🇺🇸 | CoT THoR | 44.50 | 48.17 | - | answers |
GPT-3.5-1106 | 🇺🇸 | CoT THoR | 42.58 | 42.18 | - | answers |
GPT-4-1106-preview | 🇺🇸 | zero-shot (short) | 54.59 | 64.32 | - | answers |
GPT-3.5-0613 | 🇺🇸 | zero-shot (short) | 51.79 | 61.38 | - | answers |
GPT-3.5-1106 | 🇺🇸 | zero-shot (short) | 47.04 | 53.19 | - | answers |
Mistral-7B-instruct-v0.1 | 🇺🇸 | zero-shot | 49.46 | 58.51 | - | answers |
Mistral-7B-instruct-v0.2 | 🇺🇸 | zero-shot | 44.82 | 56.04 | - | answers |
DeciLM | 🇺🇸 | zero-shot | 43.85 | 53.65 | 1.44 | answers |
Microsoft-Phi-2 | 🇺🇸 | zero-shot | 40.95 | 42.77 | 3.13 | answers |
Gemma-7B-IT | 🇺🇸 | zero-shot | 40.96 | 44.63 | - | answers |
Gemma-2B-IT | 🇺🇸 | zero-shot | 31.75 | 45.96 | 2.62 | answers |
Flan-T5-xxl | 🇺🇸 | zero-shot | 36.46 | 42.63 | 1.90 | answers |
Model | lang | Mode | F1(P,N) | F1(P,N,0) | N/A % | Answers |
---|---|---|---|---|---|---|
GPT-4-1106-preview | 🇷🇺 | zero-shot (short) | 48.04 | 60.55 | 0.0 | answers |
GPT-3.5-0613 | 🇷🇺 | zero-shot (short) | 45.85 | 57.36 | 0.0 | answers |
GPT-3.5-1106 | 🇷🇺 | zero-shot (short) | 35.07 | 48.53 | 0.0 | answers |
Mistral-7B-Instruct-v0.2 | 🇷🇺 | zero-shot | 42.60 | 48.05 | 0.0 | answers |
If you find the results and findings in the Final Results section valuable, feel free to cite the related work as follows:
```bibtex
@misc{rusnachenko2024large,
  title={Large Language Models in Targeted Sentiment Analysis},
  author={Nicolay Rusnachenko and Anton Golubev and Natalia Loukachevitch},
  year={2024},
  eprint={2404.12342},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```