Add MediaWiki and Wikipedia Connectors #1250
@yuhongsun96 How can I make this easier to review?
# Conflicts:
#   backend/danswer/configs/constants.py
#   backend/danswer/connectors/factory.py
#   web/src/components/icons/icons.tsx
#   web/src/lib/types.ts
Hi! Will try to get to it soon, apologies for the delay and thanks for your patience with us. Thanks also for the great work and contribution!
@yuhongsun96 Any update on this?
Taking a look now 🫡, thanks!
Looks good, a couple requests:
Thanks for the amazing work!
The connectors already inherit from class MediaWikiConnector(LoadConnector, PollConnector): I also swapped Is that what you're referring to? Also, would you like me to update the
Does this look like what you're asking?
I will open a PR doing so shortly. It may be a few days given the upcoming holiday weekend.
Ya, that's perfect, the bottom section is for "poll" connectors, the top for "load", that's the way most users think about it! Granted, the Web connector does update, but a lot of people already have it mentally associated the other way, so we never moved it :P I can change the poll frequency myself, that's trivial, a day seems reasonable! Thanks for the amazing work and looking forward to the docs!
Resolves #1141.
I am happy to iterate on this with inputs from the devs.
Summary
This PR adds a general MediaWiki Connector which will connect to most MediaWiki sites, including Wikipedia, fandom sites, and many others. There is also a Wikipedia Connector subclass, a light wrapper around the MediaWikiConnector that uses the special handling for Wikipedia.
The connector is based on `pywikibot`. It will optionally recurse over categories to obtain additional pages.
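To illustrate what optional category recursion could look like, here is a minimal sketch. It is not the code from this PR and makes no `pywikibot` calls: the wiki is replaced by a hypothetical in-memory `CATEGORY_TREE`, and the `collect_pages` function and its `recurse_depth` parameter are names invented for this example.

```python
from collections import deque

# Hypothetical stand-in for a wiki: category name -> (member pages, subcategories).
CATEGORY_TREE = {
    "Science": (["Physics", "Chemistry"], ["Biology"]),
    "Biology": (["Cell", "Physics"], ["Science"]),  # deliberately cycles back
}


def collect_pages(categories, recurse_depth=0):
    """Return page titles in the given categories, in discovery order.

    Subcategories are followed up to `recurse_depth` levels; 0 means no recursion.
    """
    seen_categories = set()
    seen_pages = set()
    pages = []
    queue = deque((name, 0) for name in categories)
    while queue:
        name, depth = queue.popleft()
        if name in seen_categories:
            continue  # guards against category cycles
        seen_categories.add(name)
        member_pages, subcategories = CATEGORY_TREE.get(name, ([], []))
        for title in member_pages:
            if title not in seen_pages:  # a page may sit in several categories
                seen_pages.add(title)
                pages.append(title)
        if depth < recurse_depth:
            queue.extend((sub, depth + 1) for sub in subcategories)
    return pages
```

Tracking visited categories matters because category graphs on real wikis can contain cycles, and deduplicating pages avoids indexing the same page twice when it belongs to more than one category.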
It supports both polling and loading.
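As a rough sketch of what supporting both modes means, the example below defines a connector that implements a full load and a time-windowed poll over the same page source. The base class names `LoadConnector` and `PollConnector` come from the discussion above, but their definitions here are simplified stand-ins, and the method names `load_from_state` and `poll_source`, the `Page` type, and `SketchWikiConnector` are assumptions made for this example rather than the PR's actual code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Page:
    title: str
    last_modified: float  # seconds since the epoch


class LoadConnector(ABC):  # simplified stand-in for the real interface
    @abstractmethod
    def load_from_state(self) -> Iterator[List[Page]]: ...


class PollConnector(ABC):  # simplified stand-in for the real interface
    @abstractmethod
    def poll_source(self, start: float, end: float) -> Iterator[List[Page]]: ...


class SketchWikiConnector(LoadConnector, PollConnector):
    def __init__(self, pages: List[Page], batch_size: int = 2) -> None:
        self._pages = pages
        self._batch_size = batch_size

    def _batches(
        self, start: Optional[float] = None, end: Optional[float] = None
    ) -> Iterator[List[Page]]:
        batch: List[Page] = []
        for page in self._pages:
            # Skip pages outside the requested window (inclusive bounds).
            if start is not None and page.last_modified < start:
                continue
            if end is not None and page.last_modified > end:
                continue
            batch.append(page)
            if len(batch) == self._batch_size:
                yield batch
                batch = []
        if batch:
            yield batch  # final partial batch

    def load_from_state(self) -> Iterator[List[Page]]:
        # Loading: every page, regardless of modification time.
        return self._batches()

    def poll_source(self, start: float, end: float) -> Iterator[List[Page]]:
        # Polling: only pages modified within [start, end].
        return self._batches(start, end)
```

Sharing one batching helper between the two entry points keeps loading and polling consistent: polling is just loading with a time filter applied.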
Possible Future improvements
There is a solution for handling general MediaWiki sites which generates a `Family` class automatically by querying a given site using several heuristics (built into `pywikibot`). This will not handle any special cases, however. Wikipedia, for example, has some extra language sites that wouldn't otherwise be found by the generic technique. This special `Family` class is built into `pywikibot`, and is used here. There are many more special `Family` classes to deal with various sites built into `pywikibot`. None of these other special cases are included, because it's not clear to me which ones would be useful.

Additionally, there is no special handling for other types of pages, such as talk pages; just regular pages and categories.