[WIP] Optimal intercept initialization for simple objectives #10298

Open: wants to merge 7 commits into master

david-cortes (Contributor) commented May 18, 2024

ref #9899

This PR modifies the intercept initialization for simple objectives (logistic, Poisson, gamma, Tweedie) to use their closed-form optimal solutions (i.e., the value that minimizes the objective function) instead of a single, non-optimal Newton step.
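
To make the difference concrete, the following sketch compares two values for the logistic objective. The first is a single Newton step taken from a zero margin (probability 0.5). The second is the closed-form optimum logit(mean(y)). The zero starting point and the helper names (`logistic_grad_hess`, `one_step_newton`, `closed_form_logistic`) are assumptions for illustration. XGBoost's actual starting point and implementation may differ.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logistic_grad_hess(margin: float, labels: list[float]) -> tuple[float, float]:
    """Summed gradient and Hessian of the logistic loss at a constant margin."""
    p = sigmoid(margin)
    grad = sum(p - y for y in labels)
    hess = len(labels) * p * (1.0 - p)
    return grad, hess


def one_step_newton(labels: list[float], start: float = 0.0) -> float:
    """A single Newton step from `start` (assumed to be a zero margin here)."""
    grad, hess = logistic_grad_hess(start, labels)
    return start - grad / hess


def closed_form_logistic(labels: list[float]) -> float:
    """Exact minimizer: logit(mean(y)); requires 0 < mean(y) < 1."""
    mean = sum(labels) / len(labels)
    return math.log(mean / (1.0 - mean))


labels = [1.0] * 9 + [0.0]  # mean(y) = 0.9
print(one_step_newton(labels))       # 1.6
print(closed_form_logistic(labels))  # approximately 2.197, i.e. log(9)
```

The single step lands at a margin of 1.6, which corresponds to a probability of about 0.83. That falls short of the 0.9 implied by the labels.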

For these objectives, the optimal intercept is simply the link function applied to the mean of the response variable. Since `base_score` already undergoes this transformation, this PR only changes the calculation to use the mean of the response variable in those cases.
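
Here is why the link of the mean is optimal. With a constant margin f, the summed gradient of each of these losses vanishes exactly when the inverse link of f equals mean(y). For logistic, sum(sigmoid(f) - y_i) = 0 gives sigmoid(f) = mean(y). For Poisson, gamma and Tweedie with a log link, the summed gradient is zero when exp(f) = mean(y). The sketch below checks this numerically. It assumes the standard log-link forms of the Poisson, gamma and Tweedie gradients, with a Tweedie variance power `rho` of 1.5. `optimal_intercept` and `gradient_sum` are hypothetical helpers, not XGBoost functions.

```python
import math


def optimal_intercept(objective: str, labels: list[float]) -> float:
    """Margin-space intercept: the link function applied to mean(y)."""
    mean = sum(labels) / len(labels)
    if objective == "logistic":
        return math.log(mean / (1.0 - mean))  # logit link, needs 0 < mean < 1
    if objective in ("poisson", "gamma", "tweedie"):
        return math.log(mean)  # log link, needs mean > 0
    raise ValueError(f"unsupported objective: {objective}")


def gradient_sum(objective: str, margin: float, labels: list[float], rho: float = 1.5) -> float:
    """Summed per-sample gradient at a constant margin (assumed standard loss forms)."""
    if objective == "logistic":
        p = 1.0 / (1.0 + math.exp(-margin))
        return sum(p - y for y in labels)
    if objective == "poisson":
        return sum(math.exp(margin) - y for y in labels)
    if objective == "gamma":
        return sum(1.0 - y * math.exp(-margin) for y in labels)
    if objective == "tweedie":
        return sum(
            -y * math.exp((1.0 - rho) * margin) + math.exp((2.0 - rho) * margin)
            for y in labels
        )
    raise ValueError(f"unsupported objective: {objective}")


binary_labels = [0.0, 1.0, 1.0, 0.0, 1.0]
positive_labels = [0.5, 2.0, 3.5, 1.0]
cases = [("logistic", binary_labels), ("poisson", positive_labels),
         ("gamma", positive_labels), ("tweedie", positive_labels)]
for name, y in cases:
    f = optimal_intercept(name, y)
    print(name, f, gradient_sum(name, f, y))  # summed gradient is ~0 up to rounding
```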

For the multi-target versions of these objectives, the intercept is set to zero instead, since a single intercept shared across all targets might not make much sense for the given problem.

Note that there is still room for improvement:

  • Custom (user-defined) objectives would most likely be better served by a default score of zero or by a 1D Newton estimation. However, I wasn't sure where in the code a user-defined objective can be detected.
  • Other objectives would likely benefit from using more than one Newton step for intercept estimation.
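
Both suggestions above amount to minimizing a one-dimensional objective over a constant margin using only summed gradients and Hessians. That is also the information a custom objective provides. The following is a minimal sketch, assuming a hypothetical `newton_intercept` helper. It takes a callback that returns the summed gradient and Hessian at a given margin, and it repeats Newton steps until the step size falls below a tolerance. With the logistic loss, it converges to the closed-form logit(mean(y)).

```python
import math
from typing import Callable

# Callback returning (sum of gradients, sum of Hessians) at a given constant margin.
GradHess = Callable[[float], tuple[float, float]]


def newton_intercept(grad_hess: GradHess, start: float = 0.0,
                     max_steps: int = 20, tol: float = 1e-10) -> float:
    """Hypothetical helper: repeated 1-D Newton steps on a constant margin."""
    margin = start
    for _ in range(max_steps):
        grad, hess = grad_hess(margin)
        if hess <= 0.0:
            break  # no usable curvature; stop instead of dividing by zero
        step = grad / hess
        margin -= step
        if abs(step) < tol:
            break
    return margin


def logistic_grad_hess(labels: list[float]) -> GradHess:
    """Build the summed logistic gradient/Hessian callback for fixed labels."""
    def grad_hess(margin: float) -> tuple[float, float]:
        p = 1.0 / (1.0 + math.exp(-margin))
        return sum(p - y for y in labels), len(labels) * p * (1.0 - p)
    return grad_hess


labels = [1.0] * 9 + [0.0]  # mean(y) = 0.9, so the optimum is log(9)
est = newton_intercept(logistic_grad_hess(labels))
print(est)
```

Stopping when the Hessian is non-positive avoids a division by zero. A version applied to arbitrary custom objectives would probably also need a step-size safeguard, such as a line search, for non-convex losses.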

Note 1: I wasn't sure how to calculate a weighted sample mean here (I'm not familiar with GPU computing or the device logic). It would be helpful to have a WeightedMean function under stats, if possible, to use when there are sample weights.

Note 2: The compiler checks don't allow turning a linalg::Tensor<T, 2> into a linalg::Tensor<T, 1> via reinterpret_cast. I'm also not sure what the right way to do this without copying the data would be.

Note 3: I wasn't sure where to add tests for these changes. For example, it would be ideal to test that binary:logistic and binary:logitraw produce the same raw scores, but I'm not sure where the right place for such a test is.

david-cortes changed the title from "Optimal intercept initialization for simple objectives" to "[WIP] Optimal intercept initialization for simple objectives" on May 18, 2024
trivialfis (Member) commented May 20, 2024

Thank you for working on this! I will look into the changes.
