[Bug]: UTC multi-label evaluation metrics are incorrect #8381

Open
1 task done
JoshonSmith opened this issue May 7, 2024 · 2 comments

@JoshonSmith

Software environment

- paddlepaddle:
- paddlepaddle-gpu: 2.6
- paddlenlp: 2.7.2

Duplicate issues

  • I have searched the existing issues

Bug description

Code location: https://github.com/PaddlePaddle/PaddleNLP/blob/develop/applications/zero_shot_text_classification/run_eval.py
    def compute_metrics(eval_preds):
        labels = paddle.to_tensor(eval_preds.label_ids, dtype="int64")
        preds = paddle.to_tensor(eval_preds.predictions)

        preds = paddle.nn.functional.sigmoid(preds)
        preds = preds[labels != -100].numpy()
        labels = labels[labels != -100].numpy()
        preds = preds > data_args.threshold
        micro_f1 = f1_score(y_pred=preds, y_true=labels, average="micro")
        macro_f1 = f1_score(y_pred=preds, y_true=labels, average="macro")

        return {"micro_f1": micro_f1, "macro_f1": macro_f1}
Problem:
        preds = preds[labels != -100].numpy()
        labels = labels[labels != -100].numpy()
These two lines make the metrics computed afterwards wrong. With them in place, the result is:
              precision    recall  f1-score   support

           0     0.9895    0.9879    0.9887      5793
           1     0.9227    0.9320    0.9273       897

    accuracy                         0.9804      6690
   macro avg     0.9561    0.9600    0.9580      6690
weighted avg     0.9805    0.9804    0.9805      6690
Without these two lines, the metrics are:
              precision    recall  f1-score   support

           0     0.9527    0.9699    0.9612       166
           1     0.8333    0.8730    0.8527        63
           2     0.9250    0.9737    0.9487        38
           3     0.9742    0.9437    0.9587       160
           4     0.8696    0.9524    0.9091        42
           5     0.9620    0.9620    0.9620       184
           6     1.0000    0.7619    0.8649        21
           7     0.8955    0.9524    0.9231        63
           8     0.9596    0.9694    0.9645        98
           9     0.6875    0.7097    0.6984        62

   micro avg     0.9227    0.9320    0.9273       897
   macro avg     0.9059    0.9068    0.9043       897
weighted avg     0.9245    0.9320    0.9276       897
 samples avg     0.9312    0.9377    0.9294       897

My data is multi-label with 10 classes, so the result without these two lines is clearly the correct one.
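
The two reports are consistent with what boolean-mask indexing does to the array shape: `preds[labels != -100]` always yields a 1-D array, so a 2-D (samples × classes) label matrix is flattened into one long 0/1 vector. The first report covers 6690 label slots with 897 positives, the same 897 as the total support of the ten-class report, and its row for class 1 is identical to the `micro avg` row of the second report. So `micro_f1` appears unaffected, while `macro_f1` becomes an average over the values 0 and 1 instead of over the ten classes. A minimal NumPy sketch with made-up toy data follows; the hand-written `binary_f1` helper is only a stand-in for sklearn's `f1_score`, used to keep the snippet dependency-free.

```python
import numpy as np


def binary_f1(y_true, y_pred):
    """F1 of the positive class for 1-D 0/1 arrays (0.0 when undefined)."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (made up): 4 samples, 3 classes, no -100 padding.
labels = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 1],
                   [0, 0, 1]])
preds = np.array([[1, 0, 0],
                  [0, 0, 0],
                  [1, 0, 1],
                  [0, 1, 1]])

# Boolean indexing returns a 1-D array, even when nothing is masked out.
mask = labels != -100
flat_labels = labels[mask]
flat_preds = preds[mask]

# 2-D view: one F1 per class (column), averaged over the classes.
per_class_f1 = [binary_f1(labels[:, c], preds[:, c])
                for c in range(labels.shape[1])]
macro_over_classes = float(np.mean(per_class_f1))

# 1-D view: the only "classes" left are the values 1 and 0.
f1_of_ones = binary_f1(flat_labels, flat_preds)
f1_of_zeros = binary_f1(1 - flat_labels, 1 - flat_preds)
macro_over_zero_one = (f1_of_ones + f1_of_zeros) / 2

print(flat_labels.shape, macro_over_classes, f1_of_ones, macro_over_zero_one)
```

On this toy data the per-class macro F1 is about 0.667, while the macro F1 over the flattened 0/1 values is about 0.829. The F1 of the positive class on the flattened vector (0.8) is the micro F1, which is why only the macro figure changes.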

Steps to reproduce & code

https://github.com/PaddlePaddle/PaddleNLP/blob/develop/applications/zero_shot_text_classification/run_eval.py
def compute_metrics(eval_preds):
Comment out these two lines inside it:
preds = preds[labels != -100].numpy()
labels = labels[labels != -100].numpy()

@JoshonSmith added the bug (Something isn't working) label May 7, 2024
@w5688414
Contributor

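During preprocessing, labels are padded with -100 (the padding field) so that those positions do not take part in the loss computation during training. Please refer to: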

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/nn/CrossEntropyLoss_cn.html#crossentropyloss
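
For readers unfamiliar with the ignore-index convention, here is a rough NumPy sketch of the idea: positions whose label equals -100 are dropped before the per-element loss is averaged. This is not PaddleNLP's or Paddle's actual implementation; the function name `masked_bce` and the choice of a sigmoid cross-entropy are assumptions made purely for illustration.

```python
import numpy as np

IGNORE_INDEX = -100  # padding value mentioned in the comment above


def masked_bce(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean sigmoid cross-entropy over positions whose label != ignore_index.

    Hypothetical helper for illustration only.
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    keep = labels != ignore_index
    if not keep.any():
        return 0.0
    z = logits[keep]
    y = labels[keep].astype(np.float64)
    # Numerically stable form of -[y*log(s(z)) + (1-y)*log(1-s(z))].
    per_elem = np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))
    return float(per_elem.mean())


# The logit at the padded position has no influence on the loss.
loss_a = masked_bce([[0.0, 0.0, 5.0]], [[1, 0, -100]])
loss_b = masked_bce([[0.0, 0.0, -50.0]], [[1, 0, -100]])
print(loss_a, loss_b)
```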

@JoshonSmith
Author

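> During preprocessing, labels are padded with -100 (the padding field) so that those positions do not take part in the loss computation during training. Please refer to:
>
> https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/nn/CrossEntropyLoss_cn.html#crossentropyloss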

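Thanks for explaining the loss details at training time,
but this code lives in zero_shot_text_classification/run_eval.py, and run_eval.py is the evaluation script, not the training code.

According to https://github.com/PaddlePaddle/PaddleNLP/blob/develop/applications/zero_shot_text_classification/README.md, python run_eval.py performs model evaluation and prediction, while run_train.py is the training script.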
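
One possible way to reconcile both points, sketched under explicit assumptions: skip the -100 entries but keep the column structure, so that macro F1 is still averaged over classes. The sketch assumes every sample shares the same fixed, ordered candidate label set, so that column `c` means the same class in every row; that may not hold for all UTC datasets. `masked_macro_f1` is a hypothetical helper, not part of PaddleNLP, and it expects 0/1 (or boolean) predictions after thresholding.

```python
import numpy as np

IGNORE = -100  # padding value discussed in this thread


def masked_macro_f1(labels, preds, ignore=IGNORE):
    """Macro F1 over columns, skipping entries whose label equals `ignore`.

    Hypothetical helper; assumes column c is the same class in every row.
    """
    labels = np.asarray(labels)
    preds = np.asarray(preds)
    scores = []
    for c in range(labels.shape[1]):
        valid = labels[:, c] != ignore
        if not valid.any():
            continue  # column is pure padding, so it is not a real class
        y_true = labels[valid, c]
        y_pred = preds[valid, c]
        tp = int(np.sum((y_true == 1) & (y_pred == 1)))
        fp = int(np.sum((y_true == 0) & (y_pred == 1)))
        fn = int(np.sum((y_true == 1) & (y_pred == 0)))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores)) if scores else 0.0


# Toy data (made up): 3 samples, 2 real classes, 1 padded column.
toy_labels = np.array([[1, 0, -100],
                       [0, 1, -100],
                       [1, 1, -100]])
toy_preds = np.array([[1, 0, 0],
                      [0, 0, 1],
                      [1, 1, 0]])
toy_macro = masked_macro_f1(toy_labels, toy_preds)
print(toy_macro)
```

Here the padded third column is ignored entirely, and the score is the mean of the per-class F1 values of the two real classes (1.0 and 2/3).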
