K-nearest neighbour (k-NN) algorithm

From Computer Science Wiki

In machine learning, the k-nearest neighbor (k-NN) algorithm is a method used for classification and regression. It finds the k training data points that are closest in distance to a new data point and predicts from their labels: a majority vote for classification, or an average for regression. In the simplest case, k = 1, the prediction is the label of the single closest training point, which is the nearest neighbor algorithm.

The nearest neighbor algorithm is a simple, intuitive method that can be used for a variety of tasks. Because each prediction depends only on nearby training points, it is particularly useful for tasks where the relationships between the features and the label are not linear. It also has no separate training phase: it stores the training data and does all of its work at prediction time, which keeps it practical when the training data is small.

To implement the nearest neighbor algorithm, you need to calculate the distance between the new data point and all of the training data points. The distance can be calculated using a variety of distance measures, such as the Euclidean distance or the Manhattan distance. Once you have calculated the distances, you can find the nearest neighbor by selecting the training point with the smallest distance. You can then use the label of that training point to make a prediction for the new data point.
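The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names (euclidean, manhattan, knn_predict) and the toy data are assumptions made for this example. The k parameter defaults to 1, which gives the nearest neighbor rule described above; larger values take a majority vote among the k closest points.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_points, train_labels, query, k=1, distance=euclidean):
    """Predict a label for query by majority vote among its k nearest points.

    Hypothetical helper for illustration; with k=1 it returns the label
    of the single closest training point.
    """
    # Rank every training point by its distance to the query.
    order = sorted(
        range(len(train_points)),
        key=lambda i: distance(train_points[i], query),
    )
    # Count the labels of the k closest points and return the most common.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data, made up for this example: two well-separated groups.
train_points = [(1.0, 1.0), (2.0, 1.5), (8.0, 9.0), (9.0, 8.5)]
train_labels = ["a", "a", "b", "b"]

print(knn_predict(train_points, train_labels, (1.5, 1.2)))
print(knn_predict(train_points, train_labels, (8.5, 9.0), k=3, distance=manhattan))
```

Sorting all distances keeps the sketch short; it compares the query against every training point, which is the cost that grows with the size of the training data.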

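One disadvantage of the nearest neighbor algorithm is that it can be computationally expensive at prediction time, because each prediction requires a distance calculation against every training point; the cost grows with the size of the training data. It is also sensitive to the scale of the features: a feature with a large numeric range dominates the distance, so it is important to normalize the data before using the algorithm.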
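The effect of feature scale can be seen in a small example. The sketch below uses min-max scaling, which is only one possible way to normalize (standardizing to zero mean and unit variance is another), and the helper names (min_max_scale, nearest_index) and the age/income data are invented for illustration.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_index(points, query):
    """Index of the point closest to query (hypothetical helper)."""
    return min(range(len(points)), key=lambda i: euclidean(points[i], query))


def min_max_scale(points):
    """Rescale each feature to [0, 1] using the ranges seen in points.

    Returns the scaled points and a function that applies the same
    scaling to a new point, so queries are treated consistently.
    """
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(point):
        # A constant feature has zero range; map it to 0.0 to avoid dividing by zero.
        return tuple(
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(point, lows, highs)
        )

    return [scale(p) for p in points], scale


# Made-up data: (age in years, income in currency units).
raw = [(25, 50000), (30, 52000), (60, 50500)]
query = (27, 50600)

# Unscaled, income differences swamp age differences.
print(nearest_index(raw, query))

# After scaling, both features contribute on a comparable footing.
scaled, scale = min_max_scale(raw)
print(nearest_index(scaled, scale(query)))
```

On the raw values the nearest neighbor is the 60-year-old, because a difference of 100 in income outweighs a difference of 33 years in age; after scaling, the 25-year-old becomes the nearest neighbor.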