Training Set

Category: science

The subset of data used to fit a machine learning model, as opposed to the data held back to evaluate it.

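The training set is the model's "textbook." If you want to build a model that predicts home values, the training set should contain many examples of past sales (typically thousands), each pairing a home's features with its sale price. A "clean," representative textbook tends to produce a model that generalizes well; a "messy" or skewed one tends to produce an unreliable, biased system.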
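To make the "textbook" idea concrete, here is a minimal sketch of separating a training set from held-out data and fitting a one-feature price model on the training set only. The sales figures are made up for illustration, and the helper names (split_train_test, fit_line) and the 20% test fraction are assumptions, not part of any particular library.

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(training_set):
    """Ordinary least squares for price = slope * sqft + intercept."""
    n = len(training_set)
    mean_x = sum(x for x, _ in training_set) / n
    mean_y = sum(y for _, y in training_set) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in training_set)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in training_set)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical past sales: (square feet, sale price).
# Prices are generated from 200 * sqft + 50_000, so the fit is easy to check.
sales = [(sqft, 200 * sqft + 50_000) for sqft in range(800, 2800, 100)]

training_set, test_set = split_train_test(sales)

# The model only ever "reads" the training set; the test set stays unseen.
slope, intercept = fit_line(training_set)

# Check the fitted line against the held-out sales.
worst_error = max(abs(slope * x + intercept - y) for x, y in test_set)
```

Because the model never sees the test set while fitting, the error measured on it gives a rough idea of how the model would behave on sales it has not encountered.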

Common Examples

  • Before training, we clean the training set to remove outliers that would otherwise skew the model’s baseline pricing predictions.
  • The quality and diversity of the training set are among the strongest predictors of how well the AI will handle real-world deployment challenges.
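The outlier cleaning mentioned in the first example could look something like the sketch below. It uses the interquartile range (IQR) rule, which is one common heuristic and not necessarily the method any given team uses; the prices, the function name remove_price_outliers, and the 1.5 multiplier are assumptions for illustration.

```python
import statistics

def remove_price_outliers(prices, k=1.5):
    """Keep prices within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(prices, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p for p in prices if low <= p <= high]

# Hypothetical sale prices in thousands of dollars; 2500 is a likely data-entry
# error or an unusual property that would pull the model's baseline upward.
raw_prices = [210, 225, 240, 250, 260, 275, 290, 305, 2500]

clean_prices = remove_price_outliers(raw_prices)

# Compare the average price before and after cleaning.
mean_before = statistics.mean(raw_prices)
mean_after = statistics.mean(clean_prices)
```

Whether a dropped value is truly an error or a legitimate rare case is a judgment call, so a filter like this is usually reviewed rather than applied blindly.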

AvoCoLab – Community, News & Market Intelligence