-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SharePoint Server NTLM and Host Named Site Collections support #2341
base: main
Are you sure you want to change the base?
SharePoint Server NTLM and Host Named Site Collections support #2341
Conversation
…connector supports Host Named Site Collections. Added support for NTLM by switching from aiohttp to httpx.
💚 CLA has been signed |
buildkite test this |
@anghelnicolae thanks for submitting a PR! First, in order for any contributions from you to be accepted, you'll need to sign Elastic's Contributor Agreement. You can access that here: https://www.elastic.co/contributor-agreement Next, we'll need to make sure all the tests pass. You can run tests locally with Before you invest in changing all the tests though, I'd suggest a different approach. First, for NTLM support, is it possible to use one client (httpx) to handle the authentication, and the existing client to handle everything else? This would limit the blast radius of your change. Second, handling non-path-based site collections is something we've had reported for Sharepoint Online as well. See: #2112 I'd like to see us address both connectors in a consistent manner. And I think the right way to do that is probably with a new configuration field to indicate what type of site collection identifier you're providing. The change as you've made it would break this connector for all its current users when they upgraded, and that's not going to be acceptable. |
@seanstory What happens now? Is this PR closed and I submit a new one after I update my code? |
That's fine! The CLA check is passing now. ✅
Maybe it's not. I don't know a lot about NTLM, so maybe it doesn't make sense. But we've had instances before where you have two instance variables, like (psudocode):
so that you don't have to implement a complex auth flow, but can use an existing library to facilitate getting a lower-level authnetication token that results from that flow. This was the kind of pattern I was suggesting.
thanks!
I'll leave that up to you. If you want to iterate on this PR, it's fine to leave this open. You can move it to a "draft" state and then let us know when you feel it's ready for a review. Alternatively, if you'd rather start over from scratch, we can close this. And it's not a 1-way door, we can always re-open a closed PR later. |
Re: NTLM - is it a setting in Sharepoint Server, that you can use only one or the other? Or is it still possible to access Sharepoint Server with basic auth even if NTLM is enabled? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've also left a couple questions to better understand your change :)
@@ -35,13 +35,13 @@ | |||
SELECTED_FIELDS = "WikiField, Modified,Id,GUID,File,Folder" | |||
|
|||
URLS = { | |||
PING: "{host_url}{parent_site_url}_api/web/webs", | |||
SITES: "{host_url}{parent_site_url}/_api/web/webs?$skip={skip}&$top={top}", | |||
LISTS: "{host_url}{parent_site_url}/_api/web/lists?$skip={skip}&$top={top}&$expand=RootFolder&$filter=(Hidden eq false)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between Hidden and NoCrawl? Do we want both to be eq false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the field that SharePoint Search uses to decide, so I thought it would be a better approach. The results are very similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the connector is also used by other customers, it should not break their flows - if it's not identical, we need to be very very careful.
I'd leave it as is if it works for you, if not we can see and make this a configurable option via configurable fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed back to Hidden, but will add this option in a different pull request.
@@ -483,6 +478,7 @@ def get_default_configuration(cls): | |||
"label": "Comma-separated list of SharePoint site collections to index", | |||
"order": 4, | |||
"type": "list", | |||
"required": False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make it false? Empty value will be treated like "index only root site"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is my mistake, I am planning to add a configuration checkbox that allows the connector to try to get the list of sites from SharePoint Search, making this field optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see - it's possible, but I'd advice to make it in a separate PR so that we could review it better. Large PRs with multiple features tend to take a lot of time to review, fix and merge
for collection in self.sharepoint_client.site_collections_path: | ||
server_relative_url.append(collection) | ||
for collection in self.sharepoint_client.site_collections: | ||
server_relative_url.append(f"{collection}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this change? Collection is a string, why stringify again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be fixed.
): | ||
server_relative_url.append(site_data["ServerRelativeUrl"]) | ||
server_relative_url.append(site_data["Url"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rename server_relative_url
to server_url
cause it's not relative any more
@@ -844,3 +840,13 @@ async def download_func(self, response_data): | |||
it instead contains a key with bytes in its response. | |||
""" | |||
yield bytes(response_data, "utf-8") | |||
|
|||
async def chunked_download_func(self, download_func): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have a different function implementation here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the one in source.py assumes it's receiving a aiohttp response object (response.content.iter_chunked), which is not the case when using httpx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah makes sense, thanks for the explanation!
In IIS, under the site hosting the SharePoint web application that you're targeting, there is an "Authentication" setting. This setting allows multiple authentication methods to be available for the site: Anonymous, ASP.NET Impersonation, Basic, Forms, Windows. Basic Authentication is by default disabled and rightfully so. To answer your question, yes, it's possible to access SharePoint with both Basic and NTLM(Windows) at the same time. |
I see. From what I understood NTLM is relatively popular, so we can have it here. If you don't severely need it, I'd skip it TBH. If you need it, we can make it configurable via configurable field - user chooses type of auth and in the connector we instantiate the client based on chosen auth method. I checked We'll look into |
I'm going to be indexing farms that I have no control over, so they won't have Basic Authentication enabled. So I really need it. |
Makes sense. In this case we will need to: |
Added option to select between Basic and NTLM authentication Other changes suggested by the Elasticsearch team
I've addressed most of the recommendations, but I haven't modified the tests yet. Can you take a look and let me know if it's OK now? |
Hi @anghelnicolae, I've skimmed through the changes and they look good to me. I will give some manual testing this week and get back to you as soon as possible! |
buildkite test this |
☝️ checking to make sure the unit tests still pass. I'd also love to see some unit tests added for these changes. |
@anghelnicolae I'm not sure if you can see our buildkite CI. In case you can't, the tests are not currently passing. The current test output
You can run the tests locally with You will also need to get the linter to pass. Usually this can be done automagically with Eventually we'll also need to get the integration tests (ftests) to pass. You can run this with |
Hi @seanstory, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at it and testing a bit it looks like everything is good - you're using a great approach, now fixing tests/linters is next step, after that we'll do some testing on a real instance and will be happy to merge this change!
@@ -462,54 +489,65 @@ def get_default_configuration(cls): | |||
dictionary: Default configuration. | |||
""" | |||
return { | |||
"authentication":{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need "display": "dropdown" here to make it an actual dropdown - now it looks like a textfield
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in next commit.
On it. I'll let you know when I finish. |
I've finished updating the tests. I also ran the autoformatter. |
Buildkite test this |
Hey @anghelnicolae! I ran the build and see that linter is not happy about Additionally, our functional tests broke with the change - likely due to the changes in configuration that you've added. You can run functional tests with Let me know if any of these steps are not clear, and I'll be happy to guide you :) |
Buildkite test this |
@anghelnicolae I see that CI is still red, have the functional test ran successfully for you? |
No, but they don't really try to run, so I'm probably missing something in my environment. I fixed the obvious errors from the previous run, but I have no clue what to do next. All this is pretty difficult for someone who's never worked with Python or Linux before. It would be a very good idea for someone to write some documentation about how to setup a development environment. These are the errors I'm getting when trying to run the functional tests: None
Any chance you guys can pick up the process from here? |
Sure I'll give it a look, should be a straightforward fix |
PING: "{site_url}/_api/web/webs", | ||
SITE: "{parent_site_url}/_api/web", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity - why did SITE url changed from _api/web/webs
to _api/web
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It didn't change, it's an addition similar to SITES. I need to get the whole SPWeb object for the current site and all it's subsites in order to access the relative and full URLs of the sites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay, can you send an example jsob object that it returns if possible for our functional tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So a full response looks like this:
{ "d":{ "AllowRssFeeds":true, "AlternateCssUrl":"", "AppInstanceId":"00000000-0000-0000-0000-000000000000", "Configuration":0, "Created":"2023-10-24T00:24:36", "CurrentChangeToken":{ "__metadata":{ "type":"SP.ChangeToken" }, "StringValue":"1;2;cc594a12-3ded-451a-b868-40ccec07d354;638495628235200000;40440" }, "CustomMasterUrl":"/_catalogs/masterpage/seattle.master", "Description":"Portal Home", "DesignPackageId":"00000000-0000-0000-0000-000000000000", "DocumentLibraryCalloutOfficeWebAppPreviewersDisabled":false, "EnableMinimalDownload":true, "HeaderEmphasis":0, "HorizontalQuickLaunch":false, "Id":"cc594a12-3ded-451a-b868-40ccec07d354", "IsMultilingual":false, "Language":1033, "LastItemModifiedDate":"2024-04-24T13:40:24Z", "LastItemUserModifiedDate":"2024-04-24T11:33:33Z", "MasterUrl":"/_catalogs/masterpage/seattle.master", "NoCrawl":false, "OverwriteTranslationsOnChange":false, "ResourcePath":{ "__metadata":{ "type":"SP.ResourcePath" }, "DecodedUrl":"https://nsp1" }, "QuickLaunchEnabled":true, "RecycleBinEnabled":true, "ServerRelativeUrl":"/", "SiteLogoUrl":null, "SyndicationEnabled":true, "Title":"Portal Home", "TreeViewEnabled":false, "UIVersion":15, "UIVersionConfigurationEnabled":false, "Url":"https://nsp1", "WebTemplate":"STS", "WelcomePage":"SitePages/Home.aspx" } }
But you only care about a few of these, plus some other values that will be useful in the future:
{ "d":{ "Created":"2023-10-24T00:24:36", "Description":"Portal Home", "Id":"cc594a12-3ded-451a-b868-40ccec07d354", "Language":1033, "LastItemModifiedDate":"2024-04-24T13:40:24Z", "NoCrawl":false, "ServerRelativeUrl":"/", "Title":"Portal Home", "Url":"https://nsp1", "WebTemplate":"STS", "WelcomePage":"SitePages/Home.aspx", "HasUniqueRoleAssignments":false } }
@anghelnicolae I was able to fix the functional tests and pushed here: #2475 However, I see a merge conflict too - do you think you can fix the merge conflicts in this PR and I fix the functional tests further after? |
I see 3 merge conflicts, I will try to do it tomorrow. Thanks for fixing the functional tests, can you tell me what I need to do to run the tests? |
@anghelnicolae I will need your full run log to tell you what went wrong (ones you sent before did not include data from the start of the run). My theory was that you did not have docker daemon launched so elasticsearch failed to start. Can you check that docker is running locally? If it's not the case, can you share the logs from the moment you started running the test? |
Unfortunately my company has got me working on something else so I can't finish this process. Sorry for that. |
Thanks for the heads up, @anghelnicolae! I'll raise it with the team and we'll see what to do with this PR next (a lot of good work, just need to rebase it on top of the current main) |
Closes #2340
The existing connector only supports host header site collections and only the "sites" managed path, meaning site collections that derive their URL from the web application's URL. SharePoint supports host named site collections which can have any URL, so it doesn't make sense to force the user to only index a particular subset of sites.
For NTLM, since aiohttp doesn't support NTLM authentication the only option with async support is "httpx", so this is what I used.
Checklists
Pre-Review Checklist
config.yml.example
)v7.13.2
,v7.14.0
,v8.0.0
)Changes Requiring Extra Attention
Release Note