Multi-Class Span Classification Support #1034

Open
wants to merge 56 commits into
base: master


Steven-Yiran
Contributor

Summary

This pull request adds support for the multi-class span classification datasets proposed in #385. It extends the single-class functionality added in #982 and remains backward compatible with single-class usage. The issue display format of display_issues() is modified to suit span classification datasets.

An example of a multi-class span classification task:

import numpy as np
from cleanlab.experimental.span_classification import find_label_issues, display_issues

# 3 span classes with labels 1, 2, 3
tokens = [
    ['a', 'b', 'c', 'd'],
    ['e', 'f', 'g'],
]
labels = [
    [[0], [1, 2], [1, 3], [0]],
    [[1], [2, 3], [3]],
]
pred_probs = [
    np.array([[0.9, 0.2, 0.3], [0.9, 0.9, 0.2], [0.9, 0.1, 0.7], [0.1, 0.1, 0.1]]),
    np.array([[0.1, 0.9, 0.1], [0.1, 0.9, 0.9], [0.1, 0.9, 0.9]]),
]

issues = find_label_issues(labels, pred_probs)
# {1: [(0, 0), (1, 0)], 2: [(1, 0), (1, 2)], 3: []}

display_issues(issues, tokens, labels=labels, pred_probs=pred_probs)
# Span Class: 1
# Sentence index: 0, Token index: 0
# Token: a
# According to provided labels/pred_probs, token marked as outside span but predicted inside span with probability: 0.9
# ----
# a b c d
#
# ...
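
To make the input and output format above easier to follow, here is a simplified, standalone sketch of how multi-class span labels can be split into one binary problem per class. It is not the implementation in this PR: the helper names `split_labels_by_class` and `naive_disagreements` are hypothetical, the fixed 0.5 threshold is a stand-in for the actual label-issue detection, and the mapping of column `k - 1` in `pred_probs` to class `k` is an assumption inferred from the example.

```python
import numpy as np


def split_labels_by_class(labels, num_classes):
    """Turn per-token lists of class ids into one binary label array per class.

    Hypothetical helper for illustration; class ids are assumed to be 1..num_classes.
    """
    return {
        k: [
            np.array([int(k in token_labels) for token_labels in sentence])
            for sentence in labels
        ]
        for k in range(1, num_classes + 1)
    }


def naive_disagreements(labels, pred_probs, threshold=0.5):
    """Flag (sentence, token) pairs where the given label and a thresholded
    prediction disagree, separately for each class.

    The threshold rule is an assumption made for this sketch only.
    """
    num_classes = pred_probs[0].shape[1]
    binary = split_labels_by_class(labels, num_classes)
    issues = {}
    for k in range(1, num_classes + 1):
        flagged = []
        for i, (given, probs) in enumerate(zip(binary[k], pred_probs)):
            # Assumed layout: column k - 1 holds the probability of class k.
            predicted = (probs[:, k - 1] > threshold).astype(int)
            flagged.extend((i, int(j)) for j in np.flatnonzero(given != predicted))
        issues[k] = flagged
    return issues


labels = [
    [[0], [1, 2], [1, 3], [0]],
    [[1], [2, 3], [3]],
]
pred_probs = [
    np.array([[0.9, 0.2, 0.3], [0.9, 0.9, 0.2], [0.9, 0.1, 0.7], [0.1, 0.1, 0.1]]),
    np.array([[0.1, 0.9, 0.1], [0.1, 0.9, 0.9], [0.1, 0.9, 0.9]]),
]

result = naive_disagreements(labels, pred_probs)
print(result)
# {1: [(0, 0), (1, 0)], 2: [(1, 0), (1, 2)], 3: []}
```

On this small example the naive rule yields the same per-class dictionary as shown above, which illustrates the output shape: one list of (sentence index, token index) pairs per span class.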

Steven-Yiran and others added 30 commits September 3, 2023 15:23
Co-authored-by: Jonas Mueller <1390638+jwmueller@users.noreply.github.com>
Updated related functions to add class_names optional arg and added related tests.
Added class_accuracy functions and plotting functions; modified doc string
- modified class_name and class_to_show behavior
- added division by zero prevention;
- addressed comments

codecov bot commented Mar 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.25%. Comparing base (09245a9) to head (dbd9b06).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1034      +/-   ##
==========================================
+ Coverage   96.19%   96.25%   +0.05%     
==========================================
  Files          74       74              
  Lines        5841     5841              
  Branches     1043     1043              
==========================================
+ Hits         5619     5622       +3     
+ Misses        132      130       -2     
+ Partials       90       89       -1     


3 participants