class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=1, **kwargs)
Parameters:
- n_neighbors : int, optional (default = 5)
- weights : str or callable, optional (default = 'uniform')
- algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, optional
- leaf_size : int, optional (default = 30)
- p : integer, optional (default = 2)
- metric : string or callable, default 'minkowski'
- metric_params : dict, optional (default = None)
- n_jobs : int, optional (default = 1)
Key parameter:
- n_jobs : the number of parallel jobs used for the neighbor search; -1 uses all available CPU cores.
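As a quick, hedged illustration of how these constructor arguments combine (the toy dataset here is invented purely for demonstration):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data, invented for illustration only.
X = [[0.0], [0.5], [1.0], [1.5]]
y = [0, 0, 1, 1]

# weights='distance' makes closer neighbors count more than distant ones;
# p=1 switches the Minkowski metric to Manhattan distance;
# n_jobs=-1 runs the neighbor search on all available CPU cores.
clf = KNeighborsClassifier(n_neighbors=3, weights='distance', p=1, n_jobs=-1)
clf.fit(X, y)
print(clf.predict([[0.8]]))
```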
Methods:
- fit(X, y) : Fit the model using X as training data and y as target values.
- get_params([deep]) : Get parameters for this estimator.
- kneighbors([X, n_neighbors, return_distance]) : Find the K-neighbors of a point.
- kneighbors_graph([X, n_neighbors, mode]) : Compute the (weighted) graph of k-Neighbors for points in X.
- predict(X) : Predict the class labels for the provided data.
- predict_proba(X) : Return probability estimates for the test data X.
- score(X, y[, sample_weight]) : Returns the mean accuracy on the given test data and labels.
- set_params(**params) : Set the parameters of this estimator.
Key methods:
predict(X)
Predict the class labels for the provided data.
- Parameters: X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == 'precomputed'
- Returns: y : array of shape [n_samples] or [n_samples, n_outputs]
predict_proba(X)
Return probability estimates for the test data X.
- Parameters: X : array-like, shape (n_query, n_features), or (n_query, n_indexed) if metric == 'precomputed'
- Returns: p : array of shape = [n_samples, n_classes], or a list of n_outputs such arrays
score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires every label set of each sample to be predicted exactly.
- Parameters:
  - X : array-like, shape = (n_samples, n_features)
  - y : array-like, shape = (n_samples) or (n_samples, n_outputs)
  - sample_weight : array-like, shape = [n_samples], optional
- Returns: score : float
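The optional sample_weight argument weights each test sample's contribution to the accuracy. A minimal sketch (data invented for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# The first test sample is misclassified, the second is not.
print(clf.score([[1.2], [2.9]], [1, 1]))                            # 0.5
# Weighting the correctly classified sample more raises the score to 0.8.
print(clf.score([[1.2], [2.9]], [1, 1], sample_weight=[0.2, 0.8]))  # 0.8
```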
Example 1
```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)
print(neigh.predict([[1.1]]))        # predicted class label for the query point
print(neigh.predict_proba([[0.9]]))  # predicted class probabilities
'''
[0]
[[0.66666667 0.33333333]]  # probabilities of labels 0 and 1 respectively
'''
```
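A hedged variation on Example 1: with weights='distance', each neighbor votes with weight 1/distance, so the same query's probabilities shift toward its closest neighbors (0.9 is only 0.1 away from the class-0 sample at 1):

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
neigh = KNeighborsClassifier(n_neighbors=3, weights='distance')
neigh.fit(X, y)
# The two very close class-0 samples now dominate the vote, pushing the
# class-0 probability well above the uniform-weight value of 2/3.
print(neigh.predict_proba([[0.9]]))
```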
Example 2
```python
from sklearn.neighbors import NearestNeighbors

samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
neigh = NearestNeighbors(n_neighbors=1)
neigh.fit(samples)
print(neigh.kneighbors([[1., 1., 1.]]))
'''
result:
(array([[0.5]]), array([[2]], dtype=int64))
'''
# The first array holds the distances to the nearest neighbors and the second
# their indices; with return_distance=False only the indices are returned.
```
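kneighbors_graph, listed in the methods table above but not exercised in these examples, returns the same neighbor relations as a sparse matrix. A minimal sketch reusing the three sample points:

```python
from sklearn.neighbors import NearestNeighbors

samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]]
neigh = NearestNeighbors(n_neighbors=2)
neigh.fit(samples)

# mode='connectivity' (the default) yields a 0/1 adjacency matrix;
# mode='distance' stores the actual edge lengths instead.
A = neigh.kneighbors_graph(samples, mode='connectivity')
print(A.toarray())
```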
Example 3
```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0], [1], [2], [3]]
y = [0, 0, 1, 1]
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)
print("Distances to each sample's neighbors and their indices:")
# With no argument, kneighbors() queries every training sample against the
# others and returns the distances along with the neighbor indices.
print(neigh.kneighbors())
print('Accuracy:', neigh.score([[0.8], [1.5]], [1, 0]))
'''
Distances to each sample's neighbors and their indices:
(array([[1., 2., 3.],   # distances from sample 0 to samples 1, 2, 3
        [1., 1., 2.],
        [1., 1., 2.],
        [1., 2., 3.]]),
 array([[1, 2, 3],      # indices of the neighbors of sample 0
        [0, 2, 3],
        [3, 1, 0],
        [2, 1, 0]], dtype=int64))
Accuracy: 0.5
'''
```
A small classification exercise:
```python
from sklearn import datasets, neighbors, linear_model

digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target

n_samples = len(X_digits)
X_train = X_digits[:int(.9 * n_samples)]
y_train = y_digits[:int(.9 * n_samples)]
X_test = X_digits[int(.9 * n_samples):]
y_test = y_digits[int(.9 * n_samples):]

knn = neighbors.KNeighborsClassifier()
logistic = linear_model.LogisticRegression()

print('KNN score: %f' % knn.fit(X_train, y_train).score(X_test, y_test))
print('LogisticRegression score: %f'
      % logistic.fit(X_train, y_train).score(X_test, y_test))
```
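The 90/10 split above is done by hand and keeps the original sample order. As a hedged alternative (not part of the original exercise), sklearn's train_test_split does the same split with shuffling:

```python
from sklearn import datasets, neighbors
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()
# Shuffled 90/10 split; random_state pins the shuffle for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.1, random_state=0)

knn = neighbors.KNeighborsClassifier()
print('KNN score: %f' % knn.fit(X_train, y_train).score(X_test, y_test))
```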
Comparing several classification algorithms:
```python
from itertools import product

import numpy as np
import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier

# Loading some example data
iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
y = iris.target

# Training classifiers
clf1 = DecisionTreeClassifier(max_depth=4)
clf2 = KNeighborsClassifier(n_neighbors=7)
clf3 = SVC(kernel='rbf', probability=True)
eclf = VotingClassifier(estimators=[('dt', clf1), ('knn', clf2), ('svc', clf3)],
                        voting='soft', weights=[2, 1, 2])

clf1.fit(X, y)
clf2.fit(X, y)
clf3.fit(X, y)
eclf.fit(X, y)

# Plotting decision regions
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1),
                     np.arange(y_min, y_max, 0.1))

f, axarr = plt.subplots(2, 2, sharex='col', sharey='row', figsize=(10, 8))

for idx, clf, tt in zip(product([0, 1], [0, 1]),
                        [clf1, clf2, clf3, eclf],
                        ['Decision Tree (depth=4)', 'KNN (k=7)',
                         'Kernel SVM', 'Soft Voting']):
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    print(type(Z))  # the grid predictions come back as a numpy ndarray
    Z = Z.reshape(xx.shape)
    axarr[idx[0], idx[1]].contourf(xx, yy, Z, alpha=0.4)
    axarr[idx[0], idx[1]].scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolor='k')
    axarr[idx[0], idx[1]].set_title(tt)
plt.show()
```
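The decision-region plots compare the four models visually. As a hedged follow-up, the same comparison can be made numerically with 5-fold cross-validation (the classifiers are rebuilt here so the sketch stands on its own):

```python
from sklearn import datasets
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
X = iris.data[:, [0, 2]]
y = iris.target

clf1 = DecisionTreeClassifier(max_depth=4)
clf2 = KNeighborsClassifier(n_neighbors=7)
clf3 = SVC(kernel='rbf', probability=True)
eclf = VotingClassifier(estimators=[('dt', clf1), ('knn', clf2), ('svc', clf3)],
                        voting='soft', weights=[2, 1, 2])

# Mean +/- std of 5-fold cross-validated accuracy for each classifier.
for label, clf in [('Decision Tree', clf1), ('KNN', clf2),
                   ('Kernel SVM', clf3), ('Soft Voting', eclf)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print('%s: %.3f (+/- %.3f)' % (label, scores.mean(), scores.std()))
```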