Add support for all tags defined by the sitemap protocol #2313

Open
simonbrunel opened this issue Jan 31, 2024 · 0 comments
Labels
feature Issues that represent new features or improvements to existing features. t-tooling Issues with this label are in the ownership of the tooling team.


@simonbrunel

Which package is the feature request for? If unsure which one to select, leave blank

@crawlee/utils

Feature

Loading sitemaps with Sitemap.load() should give access to all the tags defined by the sitemap protocol (loc, lastmod, changefreq and priority), not only the URL taken from loc.
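To illustrate what exposing these tags involves, here is a minimal sketch that extracts them from the <url> elements of a sitemap. It is not Crawlee's implementation: the parseUrlEntries, readTag and decodeEntities helpers are hypothetical, and the regular-expression approach assumes a simple, well-formed urlset with no attributes on the tags and no CDATA sections. A real loader would rely on a proper XML parser.

```typescript
// Hypothetical sketch, not Crawlee's implementation: pulls the four
// sitemap-protocol tags out of each <url> element using regular expressions.
// Assumes a simple, well-formed <urlset> (no tag attributes, no CDATA).

type ChangeFreq = 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';

interface SitemapEntry {
  url: string; // from <loc>
  lastmod?: string; // from <lastmod>, kept as the raw W3C Datetime string
  changefreq?: ChangeFreq; // from <changefreq>
  priority?: number; // from <priority>, 0.0 to 1.0
}

const CHANGE_FREQS: string[] = ['always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never'];

// Decodes the five predefined XML entities; &amp; goes last so that
// "&amp;lt;" becomes "&lt;" rather than "<".
function decodeEntities(text: string): string {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&');
}

// Returns the trimmed text of the first <tag>...</tag> in `block`, if present.
function readTag(block: string, tag: string): string | undefined {
  const match = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(block);
  return match ? match[1].trim() : undefined;
}

function parseUrlEntries(xml: string): SitemapEntry[] {
  const entries: SitemapEntry[] = [];
  const urlPattern = /<url>([\s\S]*?)<\/url>/g;
  let match: RegExpExecArray | null;
  while ((match = urlPattern.exec(xml)) !== null) {
    const block = match[1];
    const loc = readTag(block, 'loc');
    if (!loc) continue; // <loc> is required, so skip entries without it

    const entry: SitemapEntry = { url: decodeEntities(loc) };
    const lastmod = readTag(block, 'lastmod');
    if (lastmod) entry.lastmod = lastmod;

    const changefreq = readTag(block, 'changefreq');
    if (changefreq && CHANGE_FREQS.indexOf(changefreq) !== -1) {
      entry.changefreq = changefreq as ChangeFreq;
    }

    const priority = readTag(block, 'priority');
    if (priority) {
      const value = Number(priority);
      if (value >= 0 && value <= 1) entry.priority = value; // NaN fails both checks
    }
    entries.push(entry);
  }
  return entries;
}
```

Invalid changefreq values and priorities outside 0.0 to 1.0 are silently dropped here, which is only one possible design choice; a stricter loader could report them instead.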

Motivation

Sitemaps give information about each page, such as when it was last modified and its priority. While I'm sure other libraries can load sitemaps, it's easier to rely on the Crawlee utils instead (consistency and fewer dependencies). Since Crawlee already provides a Sitemap util, it should be relatively easy to expose tags other than the URL.

Ideal solution or implementation, and any additional constraints

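export type SitemapEntry = {
  changefreq?: 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';
  lastmod?: string; // ISO 8601 (W3C Datetime) date, e.g. '2024-01-31'
  priority?: number; // 0.0 to 1.0; the protocol default is 0.5
  url: string; // value of the required <loc> tag
};

class Sitemap {
  /** @deprecated (maybe) Use `entries` instead. */
  readonly urls: string[];

  constructor(readonly entries: SitemapEntry[]) {
    // Keep the existing `urls` view for backward compatibility.
    this.urls = entries.map((entry) => entry.url);
  }
}

// Usage
const sitemap = await Sitemap.load(...); // arguments elided (the sitemap URL or URLs)
for (const entry of sitemap.entries) {
  console.log(entry.url, entry.lastmod, entry.changefreq, entry.priority);
}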
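As an example of what the proposed entries would enable, the sketch below keeps only pages modified since a given date and orders them so the highest-priority ones are crawled first. The selectEntriesToCrawl helper is hypothetical; it redeclares the proposed SitemapEntry type so the snippet runs on its own, and it treats a missing priority as the protocol default of 0.5.

```typescript
// Hypothetical usage sketch built on the SitemapEntry shape proposed above,
// redeclared here so the snippet is self-contained.

type SitemapEntry = {
  changefreq?: 'always' | 'hourly' | 'daily' | 'weekly' | 'monthly' | 'yearly' | 'never';
  lastmod?: string;
  priority?: number;
  url: string;
};

// The sitemap protocol treats a missing <priority> as 0.5.
const DEFAULT_PRIORITY = 0.5;

// Keeps entries whose lastmod is on or after `since`; entries without a
// lastmod, or with an unparsable one, are kept because their age is unknown.
// The result is sorted by descending priority.
function selectEntriesToCrawl(entries: SitemapEntry[], since: Date): SitemapEntry[] {
  return entries
    .filter((entry) => {
      if (!entry.lastmod) return true;
      const modified = Date.parse(entry.lastmod);
      return isNaN(modified) || modified >= since.getTime();
    })
    .sort((a, b) => (b.priority ?? DEFAULT_PRIORITY) - (a.priority ?? DEFAULT_PRIORITY));
}
```

Whether to keep or skip entries without a usable lastmod is a policy choice; keeping them avoids silently missing pages whose sitemap omits the tag.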

Alternative solutions or implementations

No response

Other context

No response

@simonbrunel simonbrunel added the feature Issues that represent new features or improvements to existing features. label Jan 31, 2024
@B4nan B4nan added the t-tooling Issues with this label are in the ownership of the tooling team. label Feb 12, 2024