Subset Models for Multivariate Time Series Forecast
1st MulTiSA, ICDE 2024
Raphael Saldanha, Victor Ribeiro, Eduardo Pena, Marcel Pedroso, Reza Akbarinia, Patrick Valduriez, Fabio Porto
Inria (FR), LNCC (BR)
2024-05-13
Introduction
Multivariate time series are abundant, a good opportunity for machine learning forecasting methods
Data may be intrinsically heterogeneous across samples, so a model’s performance can vary across different parts of the input
Global models: use all available time series
Local models: use only time series pertaining to each sample
Data subset models: our proposal
Case example
Dengue is transmitted by mosquitoes and is a public health concern. Brazil saw a record number of cases in 2024, with a tendency to increase with global warming
A typical forecasting model aims to predict the number of cases from climate indicators (rainfall and temperature)
A global model would use data from all municipalities, facing difficulties related to distinct temporal and spatial disease transmission patterns
Objective
Propose a subset modeling framework
Accommodate regional variations across diverse units (e.g. municipalities)
Cost-effective training with robust predictions compared with global models
Subset modeling framework
Identify subsets of the dataset that share similar patterns
Train one model for each subset
For prediction, use the model trained on the corresponding subset's data
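The three steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the subset assignment (`subset_ids`) has already been produced by some clustering step, uses scikit-learn's `RandomForestRegressor` as the per-subset model, and runs on synthetic data. The function names `train_subset_models` and `predict_with_subset_models` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def train_subset_models(X, y, subset_ids, **rf_params):
    """Fit one random forest per subset; returns {subset_id: fitted model}."""
    models = {}
    for sid in np.unique(subset_ids):
        mask = subset_ids == sid
        model = RandomForestRegressor(**rf_params)
        model.fit(X[mask], y[mask])
        models[sid] = model
    return models


def predict_with_subset_models(models, X, subset_ids):
    """Route each row to the model trained on its own subset."""
    preds = np.full(len(X), np.nan)  # rows from unseen subsets stay NaN
    for sid, model in models.items():
        mask = subset_ids == sid
        if mask.any():
            preds[mask] = model.predict(X[mask])
    return preds


# Synthetic data (assumption): two subsets with opposite input-output relations,
# a case where a single global model without subset information would struggle.
rng = np.random.default_rng(0)
n = 400
X = rng.uniform(0, 1, size=(n, 1))
subset_ids = rng.integers(0, 2, size=n)
slope = np.where(subset_ids == 0, 2.0, -2.0)
y = slope * X[:, 0] + rng.normal(0, 0.05, size=n)

train = np.arange(n) < 300
test = ~train

models = train_subset_models(
    X[train], y[train], subset_ids[train], n_estimators=50, random_state=0
)
preds = predict_with_subset_models(models, X[test], subset_ids[test])
mae = np.mean(np.abs(preds - y[test]))
print(f"Subset models MAE on held-out rows: {mae:.3f}")
```

Keeping the models in a dictionary keyed by subset id makes the routing step at prediction time a simple lookup.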
Datasets
Dengue dataset. Weekly case counts, from 2011 to 2020, for 333 municipalities.
Climate dataset. Average maximum and minimum temperature, total precipitation. Same temporal and spatial units and coverage.
All indicators were standardized (zero mean, unit standard deviation)
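As an illustration of the standardization step, a minimal z-score sketch in Python is shown below. The source does not state whether standardization was applied per municipality or over the whole dataset, so this example simply standardizes one series; the function name `standardize` and the input values are made up for the example.

```python
import numpy as np


def standardize(series):
    """Return the z-scored series: zero mean and unit standard deviation."""
    values = np.asarray(series, dtype=float)
    std = values.std()
    if std == 0:
        # A constant series has no spread; map it to all zeros.
        return np.zeros_like(values)
    return (values - values.mean()) / std


weekly_cases = [3, 5, 8, 13, 21, 13, 8]  # made-up values
z = standardize(weekly_cases)
```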
Experimental setup
Identify data subsets based on patterns in dengue cases and covariates across municipalities, using dynamic time warping (DTW) distance.
Select the optimal number of subsets (\(k\)) using the silhouette score
Train a random forest global model with and without the subset ID as a feature
Train random forest subset models
Evaluate the forecasting models’ performance on test data
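The subset identification and the choice of \(k\) can be sketched in Python as below. Several choices here are assumptions for illustration only: the source does not name the clustering algorithm, so average-linkage hierarchical clustering from SciPy is used as a stand-in; the series are univariate and synthetic, whereas the study considers cases together with covariates; and the helper names `dtw_distance`, `pairwise_dtw`, and `choose_k` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.metrics import silhouette_score


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i, j] = step + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1]
            )
    return cost[n, m]


def pairwise_dtw(series):
    """Symmetric matrix of DTW distances between all pairs of series."""
    n = len(series)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dtw_distance(series[i], series[j])
    return dist


def choose_k(dist, k_values):
    """Cluster for each candidate k and keep the best silhouette score."""
    tree = linkage(squareform(dist, checks=False), method="average")
    scores, labelings = {}, {}
    for k in k_values:
        labels = fcluster(tree, t=k, criterion="maxclust")
        if len(set(labels)) < 2:
            continue  # the silhouette score needs at least two clusters
        scores[k] = silhouette_score(dist, labels, metric="precomputed")
        labelings[k] = labels
    best_k = max(scores, key=scores.get)
    return best_k, labelings[best_k], scores


# Synthetic series (assumption): two groups at clearly different levels.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 30)
low = [np.sin(t) + rng.normal(0, 0.1, t.size) for _ in range(4)]
high = [5 + np.sin(t) + rng.normal(0, 0.1, t.size) for _ in range(4)]

dist = pairwise_dtw(low + high)
best_k, labels, scores = choose_k(dist, range(2, 5))
print("silhouette by k:", scores, "selected k:", best_k)
```

The resulting labels could then serve as the subset ID, either as an extra feature for a global model or to split the data for per-subset training. The quadratic DTW loop is fine for a small sketch; for hundreds of long series a dedicated time series library would likely be preferable.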