[Question] Multi covariates setting and scaling problem using own data #2350

ALH84007 · 2024-04-25T02:54:13Z

I imported the data, converted it into time series, set the target and covariates, split the data into training, validation, and test sets, and normalized each split separately. What is not clear to me is how to split and scale the covariates after stacking or concatenating more than two covariate series.
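If I combine the covariates with either

```python
past_covariates = concatenate([A, B, C, D], axis=1)
```

or

```python
# TimeSeries.stack takes a single other series, so the calls are chained
stacked_covariates = past_covariates1.stack(past_covariates2).stack(past_covariates3)  # ... up to past_covariatesN
```

how should I split and scale the past covariates?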

Below is my code. Splitting and scaling the training, validation, and test sets this way also feels cumbersome, so any suggestions are welcome.

```python
import matplotlib.pyplot as plt
import pandas as pd
from darts import TimeSeries, concatenate
from darts.dataprocessing.transformers import Scaler

train_ratio = 0.8
val_ratio = 0.1
test_ratio = 0.1  # not used directly: the test split is whatever remains after train and validation

file_path = 'D:/XXX.csv'
df = pd.read_csv(file_path)

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df.set_index('Timestamp', inplace=True)

# TODO: multivariate time series
# target_column_names = ['algal', 'chlorophyll']
target_column_names = ['chlorophyll']
covariate_columns_names = ['TEM', 'PH', 'DO', 'conductivity', 'turbidity', 'PV', 'AN', 'TP', 'TN']

target_series = TimeSeries.from_dataframe(df, value_cols=target_column_names, freq='H')
covariate_series = TimeSeries.from_dataframe(df, value_cols=covariate_columns_names, freq='H')

# time series split: first 80% for training, then the remaining 20% is halved
# (0.1 / 0.2 = 0.5) into validation and test
train_target, temp_target = target_series.split_before(train_ratio)
val_target, test_target = temp_target.split_before(val_ratio / (1 - train_ratio))
train_covariates, temp_covariates = covariate_series.split_before(train_ratio)
val_covariates, test_covariates = temp_covariates.split_before(val_ratio / (1 - train_ratio))

# time series scaled: fit each Scaler on the training split only (fitting on the
# full series would leak validation/test statistics), then reuse it on the other splits
scaler_target = Scaler()
scaler_covariates = Scaler()

train_target_scaled = scaler_target.fit_transform(train_target)
val_target_scaled = scaler_target.transform(val_target)
model_target_scaled = concatenate([train_target_scaled, val_target_scaled])
test_target_scaled = scaler_target.transform(test_target)

train_covariates_scaled = scaler_covariates.fit_transform(train_covariates)
val_covariates_scaled = scaler_covariates.transform(val_covariates)
model_covariates_scaled = concatenate([train_covariates_scaled, val_covariates_scaled])
test_covariates_scaled = scaler_covariates.transform(test_covariates)
all_covariates_scaled = concatenate([model_covariates_scaled, test_covariates_scaled])

# plot
train_target_scaled.plot(label="training")
val_target_scaled.plot(label="validation")
test_target_scaled.plot(label="test")
plt.show()
```


madtoinou · 2024-04-25T07:15:24Z

Hi @ALH84007,

Your code looks great: you fit the Scaler on the training split of the target and then apply it to the validation and test sets before concatenating them.

Having multivariate covariates does not change anything: the Scaler processes each component individually (independently of the other components' ranges), so you can keep your code as it is. I'm not sure I understand what your problem is here?

ALH84007 · 2024-04-25T08:28:19Z

Thank you for your reply. I was wondering whether I need to stack or concatenate the covariates, and whether, after stacking, I can still split and scale them with my existing code. Your reply answers that, thanks!

madtoinou · 2024-04-25T08:34:50Z

If the new covariates can be considered as new components, and not as a "temporal continuation" of existing components, you indeed need to stack them.

The code will continue to work as long as the new covariates (components) are added before fitting the scaler for the first time (otherwise, it will complain about the dimensions of the series).

madtoinou added the "question" (Further information is requested) label on Apr 25, 2024

dennisbader closed this as completed May 25, 2024
