Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add glob to match filetype #10960

Closed
wants to merge 5 commits into from

Conversation

guibes
Copy link

@guibes guibes commented Apr 25, 2024

Release Notes:

  • Improved the match of file_types using Globs (#10765).

Now on settings.json file we accept Glob, for example:

"file_types": {
    "Dockerfile": [
      "Dockerfile",
      "Dockerfile.*",
    ]
  }

Release Notes:

N/A

Copy link

cla-bot bot commented Apr 25, 2024

We require contributors to sign our Contributor License Agreement, and we don't have @guibes on file. You can sign our CLA at https://zed.dev/cla. Once you've signed, post a comment here that says '@cla-bot check'.

@guibes
Copy link
Author

guibes commented Apr 25, 2024

@cla-bot check

@cla-bot cla-bot bot added the cla-signed The user has signed the Contributor License Agreement label Apr 25, 2024
Copy link

cla-bot bot commented Apr 25, 2024

The cla-bot has been summoned, and re-checked this pull request!

@@ -485,7 +486,16 @@ impl LanguageRegistry {
.and_then(|types| types.get(language_name))
.unwrap_or(&empty)
.iter()
.any(|suffix| path_suffixes.contains(&Some(suffix.as_str())));
.any(|suffix| {
path_suffixes.iter().any(|path_suffix| {
Copy link
Contributor

@osiewicz osiewicz Apr 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we'd be parsing globs on each invokation of language_for_file_internal, which is a pretty hot function; I think a better option would be to change type of user_file_types to Option<&HashMap<Arc<str>, Vec<Pattern>>> (and the corresponding type in settings: https://github.com/zed-industries/zed/blob/main/crates/language/src/language_settings.rs#L57) and do the parsing in settings deserialization (https://github.com/zed-industries/zed/blob/main/crates/language/src/language_settings.rs#L566). That way, invalid globs will be rejected at settings load time, which should be better.

Also, mea culpa but we already have a dependency for glob parsing (globset) - we can use that instead of glob as it should also be presumably more efficient for our use case (of multiple globs) with GlobSet. This is my fault, as I've suggested using glob initially. Sorry!
Thus, user_file_types will be HashMap<Arc<str>, GlobSet> in the end.

Overall though this looks good to me; once we move glob parsing to Settings-load time, we should be good to merge :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started refactor the code to accept Option<&HashMap<Arc<str>, Vec<Pattern>>> on AllLanguageSettings but I faced some issues to add the Lifetime parameter (need a lot of refactors to add it), so I add the type of file_types as Option<HashMap<Arc<str>, GlobSet>>, treated it on impl settings::Settings for AllLanguageSettings method like this:

            for (language, suffixes) in &user_settings.file_types {
                let mut builder = GlobSetBuilder::new();
                for suffix in suffixes {
                    builder.add(Glob::new(suffix)?);
                }
                file_types
                    .as_mut()
                    .unwrap()
                    .insert(language.clone(), builder.build()?);
            }

With this I need get the reference on

pub fn language_for_file(
using user_file_types.file_types.as_ref() and refactor the path_matches_custom_suffix to be something like:

            let path_matches_custom_suffix = user_file_types
                .and_then(|types| types.get(language_name))
                .map_or(false, |globset| {
                    path_suffixes
                        .iter()
                        .any(|&suffix| suffix.map_or(false, |suffix| globset.is_match(suffix)))
                });

I'll push the changes, if have some appointments, how I said, I'm a newcomer and I little bit afraid to make some big changes on the code.

Copy link
Contributor

@osiewicz osiewicz Apr 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, what you have done is pretty much fine; I didn't mean to use a reference type on the AllLanguageSettings itself, since - as you've pointed out - we'd have to carry that lifetime around everywhere, and that would be a pain in the bum.

@@ -54,7 +54,7 @@ pub struct AllLanguageSettings {
pub copilot: CopilotSettings,
defaults: LanguageSettings,
languages: HashMap<Arc<str>, LanguageSettings>,
pub(crate) file_types: HashMap<Arc<str>, Vec<String>>,
pub(crate) file_types: Option<HashMap<Arc<str>, GlobSet>>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub(crate) file_types: Option<HashMap<Arc<str>, GlobSet>>,
pub(crate) file_types: HashMap<Arc<str>, GlobSet>,

Comment on lines +602 to +604
.as_mut()
.unwrap()
.insert(language.clone(), builder.build()?);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.as_mut()
.unwrap()
.insert(language.clone(), builder.build()?);
.insert(language.clone(), builder.build()?);

@@ -563,7 +563,7 @@ impl settings::Settings for AllLanguageSettings {
.and_then(|c| c.disabled_globs.as_ref())
.ok_or_else(Self::missing_default)?;

let mut file_types: HashMap<Arc<str>, Vec<String>> = HashMap::default();
let mut file_types: Option<HashMap<Arc<str>, GlobSet>> = Some(HashMap::default());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut file_types: Option<HashMap<Arc<str>, GlobSet>> = Some(HashMap::default());
let mut file_types: HashMap<Arc<str>, GlobSet> = HashMap::default();

I'm pretty sure that we may be able to omit the type altogether, though I can't check that rn:

Suggested change
let mut file_types: Option<HashMap<Arc<str>, GlobSet>> = Some(HashMap::default());
let mut file_types = HashMap::default();

@@ -453,7 +454,7 @@ impl LanguageRegistry {
self.language_for_file_internal(
&file.full_path(cx),
content,
Some(&user_file_types.file_types),
user_file_types.file_types.as_ref(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
user_file_types.file_types.as_ref(),
Some(&user_file_types.file_types),

@mikayla-maki
Copy link
Contributor

mikayla-maki commented May 18, 2024

As this PR hasn't been updated in 3 weeks, I'm going to close this. We're still interested in the feature though, and if you'd like to incorporate @osiewicz's feedback then open a new PR, we'd be happy to re-evaluate it for merging :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed The user has signed the Contributor License Agreement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants