This repository has been archived by the owner on Nov 8, 2022. It is now read-only.

bug: ECB Alignment issues with raw ECB files #158

Open
samjtozer opened this issue Apr 29, 2020 · 0 comments
Labels
bug Something isn't working

Comments


samjtozer commented Apr 29, 2020

I've been looking through your processed ECB data (thanks for sharing a processed version) and comparing it with the original files.

I've noticed what seems to be an alignment issue. Take this entry from your data file at https://raw.githubusercontent.com/NervanaSystems/nlp-architect/master/datasets/ecb/ecb_all_event_mentions.json

{
    "coref_chain": "ACT15731460277214564",
    "doc_id": "1_21ecbplus.xml",
    "is_continuous": true,
    "is_singleton": false,
    "mention_head": "agreed",
    "mention_head_lemma": "agree",
    "mention_head_pos": "VERB",
    "mention_id": "1_21ecbplus.xml_6_15",
    "mention_ner": null,
    "mention_type": "ACT",
    "predicted_coref_chain": null,
    "score": -1.0,
    "sent_id": 6,
    "tokens_number": [
        15
    ],
    "tokens_str": "agreed",
    "topic_id": "1_ecbplus"
},

If I then go back to the raw ECB XML files and look at sentence 6 in file 1_21ecbplus, the corresponding tokens are:

<token t_id="106" sentence="6" number="0">Nothing</token>
<token t_id="107" sentence="6" number="1">bad</token>
<token t_id="108" sentence="6" number="2">is</token>
<token t_id="109" sentence="6" number="3">going</token>
<token t_id="110" sentence="6" number="4">to</token>
<token t_id="111" sentence="6" number="5">happen</token>
<token t_id="112" sentence="6" number="6">.</token>
<token t_id="113" sentence="6" number="7">"</token>

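The reason I want to check this is that I'd like to pull out the full token list and attach it to this payload, but the alignment is off: the mention points to token number 15 with the text "agreed", while the tokens listed above for sentence 6 only run from 0 to 7 and none of them is "agreed".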
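For reference, the check can be sketched roughly as below. This is a minimal illustration, not code from nlp-architect: the helper names `sentence_tokens` and `check_mention` are hypothetical, the `<Document>` wrapper element is an assumption about the XML layout, and the sample data is just the excerpt quoted above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Excerpt of sentence 6 from the raw XML, wrapped in an assumed root element.
SAMPLE_XML = '''<Document>
<token t_id="106" sentence="6" number="0">Nothing</token>
<token t_id="107" sentence="6" number="1">bad</token>
<token t_id="108" sentence="6" number="2">is</token>
<token t_id="109" sentence="6" number="3">going</token>
<token t_id="110" sentence="6" number="4">to</token>
<token t_id="111" sentence="6" number="5">happen</token>
<token t_id="112" sentence="6" number="6">.</token>
<token t_id="113" sentence="6" number="7">"</token>
</Document>'''


def sentence_tokens(xml_text):
    """Map sentence id -> {token number: token text} for every <token> element."""
    root = ET.fromstring(xml_text)
    sentences = defaultdict(dict)
    for tok in root.iter("token"):
        sent_id = int(tok.get("sentence"))
        number = int(tok.get("number"))
        sentences[sent_id][number] = tok.text or ""
    return sentences


def check_mention(mention, sentences):
    """Return (ok, detail) after comparing a mention with the raw tokens."""
    sent = sentences.get(mention["sent_id"], {})
    missing = [n for n in mention["tokens_number"] if n not in sent]
    if missing:
        return False, f"token numbers {missing} not in sentence {mention['sent_id']}"
    found = " ".join(sent[n] for n in mention["tokens_number"])
    if found != mention["tokens_str"]:
        return False, f"expected {mention['tokens_str']!r}, found {found!r}"
    return True, found


# Only the fields needed for the check, copied from the JSON entry above.
mention = {"sent_id": 6, "tokens_number": [15], "tokens_str": "agreed"}
sentences = sentence_tokens(SAMPLE_XML)
ok, detail = check_mention(mention, sentences)
print(ok, detail)
```

With the excerpt above, the check fails because sentence 6 has no token number 15. Running the same comparison over every mention in the JSON file would show whether the offset is systematic or limited to a few documents.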

Is this a bug, or am I looking at this incorrectly?

@samjtozer samjtozer added the bug Something isn't working label Apr 29, 2020