Training data

From Computer Science Wiki

Training data is a set of data used to train a machine learning model. In supervised learning, it consists of input data and corresponding labels or target values, and is used to fit or optimize the parameters of the model.

The input data, also called the features or predictors, are the variables or characteristics of the data that are used to make predictions or decisions. For example, in a machine learning model that predicts the price of a house based on its size, location, and number of bedrooms, the input data would be the size, location, and number of bedrooms of the house.

The labels or target values, also called the responses or outcomes, are the values that the model is trying to predict or classify. For example, in a machine learning model that predicts the price of a house, the label or target value would be the price of the house.
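As a minimal sketch of how the house example might be laid out in code, the snippet below separates each record into features and a label. The records, the field names (size_m2, location, bedrooms, price), and the helper split_features_and_labels are hypothetical and chosen only for illustration.

```python
# Hypothetical toy dataset: each dictionary describes one house.
houses = [
    {"size_m2": 70.0, "location": "suburb", "bedrooms": 2, "price": 210000.0},
    {"size_m2": 120.0, "location": "city", "bedrooms": 3, "price": 450000.0},
    {"size_m2": 95.0, "location": "rural", "bedrooms": 3, "price": 180000.0},
]

# Assumed column names: the inputs (features) and the value to predict (label).
FEATURE_NAMES = ["size_m2", "location", "bedrooms"]
LABEL_NAME = "price"


def split_features_and_labels(rows):
    """Return (X, y): one feature list per row, and the matching labels."""
    X = [[row[name] for name in FEATURE_NAMES] for row in rows]
    y = [row[LABEL_NAME] for row in rows]
    return X, y


X, y = split_features_and_labels(houses)
print(X[0], "->", y[0])
```

Each row of X lines up with the entry of y at the same position, which is the pairing a supervised model learns from. A categorical feature such as location would usually need to be encoded as numbers before most models can use it.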

The goal of training a machine learning model is to find the values of the model's parameters that minimize the error, or loss, between the predicted labels and the true labels of the training data. This is done using an optimization algorithm, such as gradient descent or stochastic gradient descent, which repeatedly adjusts the parameters in the direction that reduces the loss on the training data.
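The optimization step can be illustrated with a small sketch of batch gradient descent for a one-feature linear model, prediction = w * x + b, using mean squared error as the loss. The data, the learning rate, the step count, and the function names here are assumptions made for the example, not values taken from any particular library or model.

```python
def mse_loss(w, b, xs, ys):
    """Mean squared error of the linear model w * x + b on the data."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def gradient_descent(xs, ys, learning_rate=0.1, steps=2000):
    """Fit w and b by repeatedly stepping against the gradient of the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every training example.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        # Move the parameters a small step in the direction that lowers the loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical training data generated from y = 2 * x + 1
# (for instance, size in units of 100 square metres, price in units of 100,000).
sizes = [0.5, 1.0, 1.5, 2.0]
prices = [2.0, 3.0, 4.0, 5.0]

w, b = gradient_descent(sizes, prices)
print(w, b, mse_loss(w, b, sizes, prices))
```

Stochastic gradient descent follows the same update rule but estimates the gradient from one example or a small batch at a time instead of the whole training set.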

Once the model has been trained on the training data, it is evaluated on separate data that was not used for training. Validation data is typically used during development to tune hyperparameters and compare models, while test data is held back for a final estimate of how well the model generalizes to unseen data.
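Holding out part of a dataset for evaluation can be sketched as follows. The helper holdout_split, the 25% test fraction, and the fixed seed are illustrative assumptions; libraries such as scikit-learn provide their own splitting utilities.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of eight example identifiers.
examples = list(range(8))
train_rows, test_rows = holdout_split(examples)
print(len(train_rows), len(test_rows))
```

Because the two lists share no examples, a model fitted on train_rows is scored on test_rows using data it has not seen during training.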