Training Set
Category: science
The subset of data used to teach the machine learning model.
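The training set is the model's "textbook." If you want to build a model that predicts home values, the training set must contain thousands of examples of past sales, each pairing a home's features with its sale price. A "clean" textbook gives the model accurate patterns to learn; a "messy" one, full of errors or unrepresentative examples, tends to produce an unreliable, biased system.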
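As a rough illustration of what "the subset of data used to teach the model" means in practice, the sketch below carves a training set out of a list of past sales and holds the rest back for later evaluation. The function name `split_training_set`, the 80/20 ratio, and the synthetic `sales` records are assumptions made for this example, not a prescribed method.

```python
import random

def split_training_set(records, train_fraction=0.8, seed=0):
    """Shuffle the records and return (training_set, held_out).

    The name and the 80/20 default are illustrative assumptions.
    """
    shuffled = list(records)               # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Synthetic past-sale records, invented purely for the example.
sales = [{"sqft": 900 + 10 * i, "price": 150_000 + 1_500 * i} for i in range(100)]

training_set, held_out = split_training_set(sales)
print(len(training_set), len(held_out))
```

Only `training_set` would be shown to the model during training; `held_out` stays unseen so that it can later indicate how the model behaves on data it was not taught from.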
Common Examples
- Before training, we clean the training set to remove outliers that would otherwise skew the model’s baseline pricing predictions.
- The quality and diversity of the training set are among the main predictors of how well the AI will handle real-world deployment challenges.
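The outlier cleaning mentioned in the first example can be sketched with a simple interquartile-range (IQR) filter. This is one common rule of thumb rather than the only way to clean a training set; the function name `remove_price_outliers`, the multiplier `k=1.5`, and the sample prices are assumptions made for illustration.

```python
import statistics

def remove_price_outliers(prices, k=1.5):
    """Keep prices within k * IQR of the first and third quartiles.

    The IQR rule and k=1.5 are conventional choices, assumed here.
    """
    q1, _, q3 = statistics.quantiles(prices, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p for p in prices if low <= p <= high]

# Invented sale prices; the last one is far outside the typical range.
prices = [210_000, 225_000, 198_000, 240_000, 215_000, 232_000, 4_900_000]

cleaned = remove_price_outliers(prices)
print(cleaned)
```

In this sample the 4,900,000 sale falls outside the computed bounds and is dropped, while the six typical prices are kept. Whether such a record is a data error or a genuine sale the model should learn from is a judgment call that depends on the dataset.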