A Domain Partitioning Strategy for Data-efficient Machine Learning
Inria
Complex data exhibit internal diversity
ML systems may achieve good overall performance
But that performance is not uniform across all parts of the input domain
Given a dataset \(D\), train a global ML model \(G\)
Identify a number of subsets \(S_k\) of \(D\)
Train one ML model on each \(S_k\)
For inference, assign each incoming sample to its corresponding \(S_k\) and use the model trained on that subset
Compare the performance observed with the \(G\) and \(S_k\) models for each unit
flowchart LR
    A[Domain data] --> G[Global Model]
    A --> C(Clustering)
    C --> K1[Subset model 1]
    C --> K2[Subset model 2]
    C --> Kn[Subset model k]
    G --> P1[Inference]
    K1 --> P2[Inference]
    K2 --> P2
    Kn --> P2
    P1 --> AC[Performance comparison]
    P2 --> AC
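The procedure above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the method's actual implementation: the data are synthetic 1-D points drawn from two regimes, the "ML model" is a least-squares line, and the clustering is a minimal hand-written k-means. All names (`kmeans_1d`, `assign`, `fit`, `mse`, `routed`) are hypothetical.

```python
import random
from statistics import fmean

def assign(x, centres):
    """Index of the nearest centre, i.e. the subset a sample is routed to."""
    return min(range(len(centres)), key=lambda j: abs(x - centres[j]))

def kmeans_1d(xs, k, iters=20):
    """Minimal 1-D k-means with deterministic quantile initialisation."""
    ordered = sorted(xs)
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[assign(x, centres)].append(x)
        # Keep the previous centre if a cluster ends up empty.
        centres = [fmean(g) if g else c for g, c in zip(groups, centres)]
    return centres

def fit(points):
    """Least-squares line through (x, y) pairs; returns a predictor."""
    mx = fmean(x for x, _ in points)
    my = fmean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return lambda x: slope * x + (my - slope * mx)

def mse(model, points):
    """Mean squared error of a predictor on (x, y) pairs."""
    return fmean((model(x) - y) ** 2 for x, y in points)

rng = random.Random(0)

def sample(n):
    """Synthetic dataset D with two regimes (an assumption for illustration)."""
    points = []
    for _ in range(n):
        if rng.random() < 0.5:
            x = rng.uniform(0, 4)
            y = 2 * x + rng.gauss(0, 0.5)
        else:
            x = rng.uniform(6, 10)
            y = 40 - 3 * x + rng.gauss(0, 0.5)
        points.append((x, y))
    return points

train, held_out = sample(200), sample(100)

# Step 1: global model G trained on all of D.
global_model = fit(train)

# Step 2: identify subsets S_k of D (here by clustering on x).
centres = kmeans_1d([x for x, _ in train], k=2)
subsets = [[p for p in train if assign(p[0], centres) == j]
           for j in range(len(centres))]

# Step 3: one model per subset.
subset_models = [fit(s) for s in subsets]

# Step 4: at inference, route each sample to its subset's model.
def routed(x):
    return subset_models[assign(x, centres)](x)

# Step 5: compare G against the subset models, subset by subset.
for j in range(len(centres)):
    unit = [p for p in held_out if assign(p[0], centres) == j]
    print(f"subset {j}: global MSE = {mse(global_model, unit):.2f}, "
          f"subset MSE = {mse(subset_models[j], unit):.2f}")
```

In this toy setting the two regimes follow different linear trends, so a single global line cannot fit both. Whether partitioning helps on real data is exactly what the final comparison step is meant to reveal.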
The subsets may be defined a priori
Or identified with data-driven methods, such as clustering techniques
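Both ways of defining subsets can be treated as the same operation: a function mapping a sample to a subset identifier. The sketch below illustrates this with hypothetical records. The `region` field, the `value` feature, and the two cluster centres are assumptions made for the example; in practice the centres would come from a clustering step.

```python
from collections import defaultdict

def partition(samples, assign):
    """Group samples by the subset identifier returned by `assign`."""
    subsets = defaultdict(list)
    for s in samples:
        subsets[assign(s)].append(s)
    return dict(subsets)

# Hypothetical records: one categorical field and one numeric feature.
samples = [
    {"region": "north", "value": 1.0},
    {"region": "north", "value": 1.4},
    {"region": "south", "value": 8.9},
    {"region": "south", "value": 9.3},
    {"region": "north", "value": 9.1},
]

# A priori definition: the subset is given by an existing field.
by_region = partition(samples, lambda s: s["region"])

# Data-driven definition: nearest of two centres
# (assumed here; normally the output of a clustering algorithm).
centres = [1.2, 9.1]

def nearest_centre(s):
    return min(range(len(centres)), key=lambda j: abs(s["value"] - centres[j]))

by_cluster = partition(samples, nearest_centre)

print({k: len(v) for k, v in by_region.items()})
print({k: len(v) for k, v in by_cluster.items()})
```

The two partitions need not agree: the last record is in the `north` subset a priori but falls in the second cluster by value.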