Website agent refuses to create event if extraction is empty #3356

Open
radry opened this issue Feb 19, 2024 · 1 comment

radry commented Feb 19, 2024

Problem:
I have an XML document that I scrape via XPath for a certain key. Sometimes that key/value doesn't exist, so the Website Agent returns [] according to the log. The Website Agent is triggered by another event and is set to merge, so it will pass the incoming values on to its own event.

Expected behaviour:
Create an event when "merge" is set, even if the extraction is empty.

What I unsuccessfully tried so far to circumvent this behaviour:

  • scrape an additional (throwaway) value that is always present in the XML
    Result: No event was triggered because one of the extractions returned []

  • scrape as text with regexp instead of xml/xpath. Use a template with {{ value | default: 'none' }} to return the extracted value.
    Result: No event was triggered because regexp found nothing (as expected because the value doesn't exist). Default value also wasn't used.

{
  "expected_update_period_in_days": "1",
  "url": "fooredacted",
  "type": "xml",
  "mode": "merge",
  "extract": {
    "torrenturl": {
      "xpath": "/rss/channel/item[1]/link",
      "value": "string(.)",
      "hidden": "true"
    }
  },
  "template": {
    "thumb": "{{thumb}}",
    "pv_url": "{{pv_url}}",
    "name": "{{name}}",
    "altimage": "{{altimage}}",
    "torrenturl": "{{torrenturl | default: 'none' }}"
  }
}

The template is optional; I tried with and without it. An event is triggered only when the value is present in the XML.


radry commented Feb 19, 2024

After struggling for hours I finally managed to find a workaround via xpath:

Having failed to use if/else in the XPath (which exists according to the documentation), I used this trick, but it wasn't easy to figure out how to use it in Huginn. I hope this helps someone else with the same problem:

{
  "expected_update_period_in_days": "1",
  "url": "foo",
  "type": "xml",
  "mode": "merge",
  "extract": {
    "torrenturl": {
      "xpath": "/rss/channel",
      "value": "concat(substring(./item[1]/link, 1, number(contains(./item[1]/link,'h')) * string-length(./item[1]/link)), substring('none', 1, number(not(contains(./item[1]/link,'h'))) * string-length('none')))",
      "hidden": "true"
    }
  },
  "template": {
    "thumb": "{{thumb}}",
    "pv_url": "{{pv_url}}",
    "name": "{{name}}",
    "altimage": "{{altimage}}",
    "torrenturl": "{{torrenturl}}"
  }
}

Take note of the "xpath" and "value" fields. The key to success is to use an "xpath" that always matches, even if the element further down the tree doesn't exist. This makes Huginn evaluate the "value", which is also an XPath expression.
In the "value" we can then apply the if/else workaround.

Where exactly you "split" the XPath is not important; you could probably use / as the "xpath" too.
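
For anyone wondering why the concat/substring expression behaves like an if/else, here is a minimal Python sketch that emulates the XPath 1.0 string functions involved. It is not Huginn code and does not parse any XML. The helper names (xpath_substring, xpath_if_else, torrenturl) are made up for illustration, and NaN/infinity arguments are not handled.

```python
import math


def xpath_round(x: float) -> int:
    """XPath 1.0 round(): nearest integer, ties go toward positive infinity."""
    return math.floor(x + 0.5)


def xpath_substring(s: str, start: float, length: float) -> str:
    """Emulate XPath 1.0 substring(s, start, length); positions are 1-based.

    A character at position p is kept when
    round(start) <= p < round(start) + round(length).
    """
    first = xpath_round(start)
    end = first + xpath_round(length)  # exclusive upper bound
    return "".join(ch for pos, ch in enumerate(s, start=1) if first <= pos < end)


def xpath_if_else(condition: bool, if_true: str, if_false: str) -> str:
    """Emulate concat(substring(a, 1, number(c) * string-length(a)),
                      substring(b, 1, number(not(c)) * string-length(b)))."""
    keep_true = int(condition)       # number(true()) is 1, number(false()) is 0
    keep_false = int(not condition)
    return (
        xpath_substring(if_true, 1, keep_true * len(if_true))
        + xpath_substring(if_false, 1, keep_false * len(if_false))
    )


def torrenturl(link: str) -> str:
    """Mirror the "value" expression from the config above.

    `link` stands for the string value of ./item[1]/link, which is the
    empty string when the element does not exist.
    """
    return xpath_if_else("h" in link, link, "none")


if __name__ == "__main__":
    print(torrenturl("http://example.com/file.torrent"))
    print(torrenturl(""))
```

One of the two substring calls always gets a length of 0 and contributes an empty string, so concat returns only the other branch. Note that the condition is just "the link contains the letter h", so a non-empty link without an "h" in it would also come out as none.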

Regardless of my success, this is only a workaround. I still think the website agent should have an option to always create an event, even when empty.
