AutoTS multiple variables #224

Open
chenxurich opened this issue Jan 11, 2024 · 5 comments

@chenxurich

Hello, Happy New Year to you all.

I read in some articles that AutoTS supports time series analysis with multiple independent and dependent variables. If that is true, could you help me learn more about the details?

a. how to know which independent variables are selected for the final ensemble model and why;
b. whether the final model can output reports with the model's equations, errors, upper and lower forecasts, etc;
c. how to determine the optimal number of iterations.

Since this is a really good methodology, I would like to be able to learn it better, thanks!

@winedarksea
Owner

It would probably be best for you to start by running a few examples.

A) there are dozens of different models here. How they generate forecasts varies by model, so I can't give a single answer. If you have a final model selected from a forecasting task, you can post a question and I can explain that specific model.

B) model parameters, errors, and forecasts (with bounds) are available for all models. An 'equation' is only available from two of the models, Cassandra probably being the best to start with if you need more breakdown.

C) it depends on a lot of factors. Generally the search space is so large that some improvement can always be found with more searching. The constraint will be based on your available compute time.
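For points B and C, the usual workflow can be sketched roughly as follows. This is a minimal sketch, not a definitive recipe: the wrapper name `fit_and_report` and the column names `datetime`, `series_id`, and `value` are assumptions made for illustration, and `max_generations` is the knob that trades search quality against compute time.

```python
def fit_and_report(df_long, forecast_length=15, max_generations=5):
    """Fit AutoTS on long-format data and collect forecasts with bounds.

    Assumes df_long has the hypothetical columns 'datetime', 'series_id'
    and 'value' (one row per date and series).
    """
    # Imported lazily so the sketch can be loaded without autots installed.
    from autots import AutoTS

    model = AutoTS(
        forecast_length=forecast_length,
        frequency="D",
        ensemble="auto",
        max_generations=max_generations,  # more generations = longer search
    )
    model = model.fit(
        df_long, date_col="datetime", value_col="value", id_col="series_id"
    )
    prediction = model.predict()
    return {
        "best_model_name": model.best_model_name,
        "best_model_params": model.best_model_params,
        "forecast": prediction.forecast,        # point forecasts
        "upper": prediction.upper_forecast,     # upper bound
        "lower": prediction.lower_forecast,     # lower bound
        "all_results": model.results(),         # table of models tried, with metrics
    }
```

Raising `max_generations` and comparing the scores in `all_results` between runs is one practical way to judge whether more searching is still paying off within the available compute time.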

@chenxurich
Author

At first I input all the data, specifying only the time and the dependent variable.

Later I input only the A, B, and C variables, and the C variable fluctuates a lot. The best model is a BestN model with 'SeasonalNaive', 'ETS', 'Ensemble', 'DatepartRegression', 'UnobservedComponents'. I'm really confused because I don't know if such a model is reusable, or how changes in the independent variables affect the predictions.

The following code may be helpful:
from autots import AutoTS
model = AutoTS(forecast_length=15, ensemble="auto", frequency="D", max_generations=5)

@winedarksea
Owner

None of the models there are multivariate, so A, B, and C won't be influencing each other. If you want a simple automl solution where you specify an X to Y, try flaml. Time series forecasting, though, is rather different from other data science. Most predictions come from the history of the variable itself and its relation to the calendar/seasonality. For example, the DatepartRegression there is a standard regression model, such as one from sklearn, where X is various calendar/date part features and Y is the outcome of that variable.

Maybe running production_example.py would help, since it pulls in real-world data.
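The date-part regression idea described above can be illustrated with a small standard-library sketch. It is not AutoTS's implementation: it uses a single calendar feature (the weekday) and relies on the fact that a least-squares fit on one-hot weekday features, without an intercept, reduces to the mean of the target for each weekday. All function names and the synthetic data are made up for the example.

```python
from collections import defaultdict
from datetime import date, timedelta


def datepart_features(day):
    """Turn a date into calendar features (here only the weekday, 0 = Monday)."""
    return {"weekday": day.weekday()}


def fit_weekday_means(dates, values):
    """Least squares on one-hot weekday features, i.e. one mean per weekday."""
    sums, counts = defaultdict(float), defaultdict(int)
    for day, value in zip(dates, values):
        key = datepart_features(day)["weekday"]
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def predict(model, future_dates):
    """Look up the fitted value for each future date's weekday."""
    return [model[datepart_features(day)["weekday"]] for day in future_dates]


# Four weeks of synthetic daily data: weekends are higher than weekdays.
start = date(2024, 1, 1)  # a Monday
history_dates = [start + timedelta(days=i) for i in range(28)]
history_values = [20.0 if d.weekday() >= 5 else 10.0 for d in history_dates]

model = fit_weekday_means(history_dates, history_values)
future_dates = [history_dates[-1] + timedelta(days=i) for i in range(1, 8)]
forecast = predict(model, future_dates)
```

Note that X here is built only from the dates, so the other columns (A, B, C) never enter the model for a given series.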

@chenxurich
Author

chenxurich commented Jan 12, 2024

Thanks!

If I specify model_list = ['VAR'] or some other multivariate models, how do I specify the independent variables? Or could AutoTS help me to choose? Please feel free to correct me if I'm wrong.

@winedarksea
Owner

winedarksea commented Jan 12, 2024

The entire idea behind AutoTS is that it does everything for you. If you want to hand build a specific model, the statsmodels package is probably the best place to start.
For most time series, most of the predictive power comes from

  1. the history of the target to forecast
  2. the target's relationship to calendar seasonality

It might be worth just passing your target into AutoTS, with no other information, and looking at the forecast to see how it does with that.
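To make those two sources of predictive power concrete, here is a minimal seasonal naive forecaster in plain Python: it uses nothing but the target's own history and an assumed season length. It is a simplified illustration, and AutoTS's own SeasonalNaive model may differ in its details; the function name and sample data are invented for the example.

```python
def seasonal_naive(history, season_length, forecast_length):
    """Repeat the last full season of the history over the forecast horizon."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(forecast_length)]


# Three weeks of synthetic daily values with a weekend peak.
history = [10, 12, 11, 13, 15, 30, 28] * 3
forecast = seasonal_naive(history, season_length=7, forecast_length=10)
```

A baseline like this is a useful yardstick: if extra variables do not beat it, they are probably not adding much.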

There are some models that utilize, in various ways, other information.
The future_regressor allows passing in information that is known for the future.
There are also 'covariates', which are other information not known for the future. Examples include weather data, economic market indicators, and so on. These usually are not that helpful because they are too noisy or lack a relationship direct enough to be found easily.
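As a rough sketch of the future_regressor route: the regressor must have values for the training dates and for every date in the forecast window, so it is split into two aligned pieces. The helper names, the `promo` column, and the wide daily `df_wide` input are all assumptions made for illustration, and whether the regressor actually gets used depends on which models the search selects.

```python
from datetime import date, timedelta


def split_known_regressor(regressor, history_end, forecast_length):
    """Split a {date: value} mapping into training rows and the forecast window."""
    train = {d: v for d, v in sorted(regressor.items()) if d <= history_end}
    future_days = [
        history_end + timedelta(days=i) for i in range(1, forecast_length + 1)
    ]
    missing = [d for d in future_days if d not in regressor]
    if missing:
        raise ValueError(f"regressor has no value for {missing[0]}")
    return train, {d: regressor[d] for d in future_days}


def forecast_with_regressor(df_wide, train, future, forecast_length):
    """Hypothetical wrapper; df_wide is a wide DataFrame with a daily DatetimeIndex
    covering the same dates as `train`."""
    # Imported lazily so the split helper works without pandas or autots installed.
    import pandas as pd
    from autots import AutoTS

    def to_frame(mapping):
        index = pd.to_datetime(list(mapping))
        return pd.DataFrame({"promo": list(mapping.values())}, index=index)

    model = AutoTS(forecast_length=forecast_length, frequency="D", max_generations=5)
    model = model.fit(df_wide, future_regressor=to_frame(train))
    return model.predict(future_regressor=to_frame(future))


# A made-up promotion flag that is known in advance for 40 days.
promo = {date(2024, 1, 1) + timedelta(days=i): int(i % 7 == 4) for i in range(40)}
train, future = split_known_regressor(
    promo, history_end=date(2024, 1, 30), forecast_length=5
)
```

Covariates that are not known for the future cannot be passed this way, because there would be no values for the forecast window.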

It is also worth asking what your bigger goal is with independent variables. Are you trying to provide explainability in some way, or trying to explain relationships among variables? Or are you just trying to get a good forecast?
