
KNeighborsClassifier

The KNeighborsClassifier is a classic machine learning algorithm used for classification tasks. This model leverages the concept of "proximity" by identifying the k closest data points to a given input and assigning the most frequent label among these neighbors. It's especially effective for problems where similar data points tend to belong to the same class.
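
To make the decision rule concrete, here is a minimal plain-Python sketch of what a k-nearest-neighbors classifier computes. This is an illustration of the algorithm's semantics only, not bolt's implementation; the names minkowski_distance and knn_predict are ours.

from collections import Counter

def minkowski_distance(a, b, p):
    # Minkowski metric: p = 1 is Manhattan (L1), p = 2 is Euclidean (L2).
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)

def knn_predict(X, y, query, k, p):
    # Rank training points by distance to the query, take the labels
    # of the k closest, and return the most frequent label among them.
    order = sorted(range(len(X)), key=lambda i: minkowski_distance(X[i], query, p))
    return Counter(y[i] for i in order[:k]).most_common(1)[0][0]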

Parameters

Knn/fit(X: List[List[f24]], y: List[f24], n_neighbors: u24, p: f24)

  • X: Training data, a list of feature vectors, where each feature vector is a list of f24 values.
  • y: Target values, a list of f24 labels, one per feature vector in X.
  • n_neighbors: The number of neighbors to consider for classification (the k value), type u24.
  • p: The power parameter for the Minkowski metric, type f24. When p = 1 this is the Manhattan distance (L1); when p = 2 it is the Euclidean distance (L2). The general formula is given after this parameter list.

Returns: A fitted KNeighborsClassifier instance.
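
For reference, the Minkowski distance between two feature vectors a and b is d(a, b) = (Σ_i |a_i − b_i|^p)^(1/p). For a = [5.0, 3.0] and b = [3.0, 2.0], p = 2.0 gives sqrt((5 − 3)^2 + (3 − 2)^2) = sqrt(5) ≈ 2.24, while p = 1.0 gives |5 − 3| + |3 − 2| = 3.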

Knn/predict(model: KNeighborsClassifier, X_test: List[List[f24]])

  • model: A fitted KNeighborsClassifier instance.
  • X_test: Test data, a list of feature vectors in the same format as X (each a list of f24 values).

Returns: Predicted class labels for each feature vector in X_test, as a list of f24 labels of length len(X_test).

Example Usage

X = [[5.0, 3.0], [3.0, 2.0], [1.5, 9.0], [7.0, 2.0]]
y = [0.0, 1.0, 0.0, 1.0]
# k must not exceed the number of training samples (here, 4)
k = 3
p = 2.0

# Fit the KNeighborsClassifier model
model = Knn/fit(X, y, k, p)

# Define test data
X_test = [[2.0, 7.0], [5.0, 3.0]]

# Predict class labels for test data
y_pred = Knn/predict(model, X_test)
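
Working this example through the standard majority-vote rule by hand (with k = 3 and p = 2.0): the three nearest training points to [2.0, 7.0] are [1.5, 9.0], [5.0, 3.0], and [3.0, 2.0], whose labels 0.0, 0.0, 1.0 vote for 0.0; for [5.0, 3.0] they are [5.0, 3.0] itself, [3.0, 2.0], and [7.0, 2.0], whose labels 0.0, 1.0, 1.0 vote for 1.0. Assuming the usual majority-vote behavior, y_pred should therefore be [0.0, 1.0].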

Tuning Tips

  • Choosing k: Larger k values reduce sensitivity to noise but can over-smooth the decision boundary. Typical values fall between 3 and 10; an odd k avoids ties in binary problems, and k can never exceed the number of training samples. A simple way to search for k empirically is sketched after this list.
  • Choosing p: Try p = 1.0 (Manhattan) and p = 2.0 (Euclidean) first and keep whichever validates better. Higher values increasingly emphasize the single largest per-feature difference (approaching the Chebyshev distance as p grows), which is usually harder to interpret.
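
To make the advice on choosing k actionable, here is a minimal leave-one-out search in plain Python. It repeats the illustrative knn_predict sketch from above so that it runs on its own; it is a sketch of the procedure, not part of the bolt API.

from collections import Counter

def minkowski_distance(a, b, p):
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)

def knn_predict(X, y, query, k, p):
    order = sorted(range(len(X)), key=lambda i: minkowski_distance(X[i], query, p))
    return Counter(y[i] for i in order[:k]).most_common(1)[0][0]

def leave_one_out_accuracy(X, y, k, p):
    # Classify each point using all *other* points as training data.
    hits = 0
    for i in range(len(X)):
        X_rest, y_rest = X[:i] + X[i+1:], y[:i] + y[i+1:]
        if knn_predict(X_rest, y_rest, X[i], k, p) == y[i]:
            hits += 1
    return hits / len(X)

# Score every feasible k and keep the best (k cannot exceed the
# number of remaining training samples, here len(X) - 1).
X = [[5.0, 3.0], [3.0, 2.0], [1.5, 9.0], [7.0, 2.0]]
y = [0.0, 1.0, 0.0, 1.0]
best_k = max(range(1, len(X)), key=lambda k: leave_one_out_accuracy(X, y, k, 2.0))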