  • Learning resources for the scikit-learn package

    http://scikit-learn.org/stable/modules/clustering.html#k-means

    http://my.oschina.net/u/175377/blog/84420

    K-Means clustering parameter reference:

    http://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans

    class sklearn.cluster.KMeans(n_clusters=8, init='k-means++', n_init=10, max_iter=300, tol=0.0001, precompute_distances='auto', verbose=0, random_state=None, copy_x=True, n_jobs=1)
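
As a minimal sketch of how this constructor is typically used, the following fits KMeans on a tiny hand-made dataset. The data and variable names are illustrative assumptions. Only parameters still accepted by current scikit-learn releases are passed: precompute_distances and n_jobs were removed from KMeans in scikit-learn 1.0, so they are left out here.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative data: two well-separated groups of 2-D points.
X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

kmeans = KMeans(n_clusters=2, init="k-means++", n_init=10,
                max_iter=300, tol=1e-4, random_state=0)
kmeans.fit(X)

print(kmeans.labels_)           # cluster index assigned to each row of X
print(kmeans.cluster_centers_)  # one row per cluster
print(kmeans.inertia_)          # sum of squared distances to the closest center
```

Which of the two groups receives label 0 depends on the initialization, so code that consumes labels_ should not assume a particular numbering.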

    n_clusters : int, optional, default: 8

    The number of clusters to form, which is also the number of centroids to generate.

    max_iter : int, default: 300

    Maximum number of iterations of the k-means algorithm for a single run.
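
To see the cap in action, the sketch below fits the same data twice from the same seed, once with a very small max_iter and once with the default. The random data and the choice of max_iter=2 are assumptions for illustration; n_iter_ is the fitted attribute that reports how many iterations a run used.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.rand(300, 2)  # illustrative random points

# Same seed and a single initialization, so both runs start from the same centers.
capped = KMeans(n_clusters=5, n_init=1, max_iter=2, random_state=0).fit(X)
default = KMeans(n_clusters=5, n_init=1, max_iter=300, random_state=0).fit(X)

print(capped.n_iter_, capped.inertia_)
print(default.n_iter_, default.inertia_)
```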

    n_init : int, default: 10

    Number of times the k-means algorithm will be run with different centroid seeds. The final result is the best output of the n_init consecutive runs in terms of inertia.
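
The best-of-n_init selection described above can be sketched by hand: run several single-initialization fits with different seeds and keep the one with the lowest inertia. The synthetic data and the seed range are assumptions made for illustration, and this mimics the idea rather than reproducing scikit-learn's internal loop.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
# Illustrative data: three Gaussian blobs in 2-D.
X = np.vstack([rng.normal(loc=center, scale=0.5, size=(50, 2))
               for center in ((0, 0), (5, 5), (0, 5))])

# Ten independent runs, each with exactly one random initialization.
runs = [KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
        for seed in range(10)]

# Keep the run with the lowest inertia, as n_init=10 would.
best = min(runs, key=lambda model: model.inertia_)
print([round(model.inertia_, 2) for model in runs])
print(best.inertia_)
```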

    init : {‘k-means++’, ‘random’ or an ndarray}

    Method for initialization, defaults to ‘k-means++’:

    ‘k-means++’: selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See the Notes section in k_init for more details.

    ‘random’: choose k observations (rows) at random from the data for the initial centroids.

    If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
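
The ndarray form of init can be illustrated with a short sketch. The data and the starting centers below are made up for the example. n_init is set to 1 because repeating a run from fixed starting centers would give the same result each time.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.0, 4.0], [1.0, 0.0],
              [10.0, 2.0], [10.0, 4.0], [10.0, 0.0]])

# Explicit starting centers, shape (n_clusters, n_features) = (2, 2).
initial_centers = np.array([[0.0, 0.0], [11.0, 5.0]])

kmeans = KMeans(n_clusters=2, init=initial_centers, n_init=1).fit(X)

# The centers should move from the starting guesses to the two group means.
print(kmeans.cluster_centers_)
print(kmeans.labels_)
```

With explicit centers the cluster numbering follows the row order of the array passed to init, which makes the labels predictable.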

    precompute_distances : {‘auto’, True, False}

    Precompute distances (faster but takes more memory).

    ‘auto’: do not precompute distances if n_samples * n_clusters > 12 million. This corresponds to about 100MB overhead per job using double precision.

    True : always precompute distances

    False : never precompute distances
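
The "about 100MB" figure follows from simple arithmetic: 12 million distances at 8 bytes per double is 96 million bytes. The helper names below are hypothetical and only restate the rule as described in the text; they are not scikit-learn functions.

```python
BYTES_PER_DOUBLE = 8
THRESHOLD = 12_000_000  # n_samples * n_clusters limit quoted in the text


def distance_matrix_bytes(n_samples, n_clusters):
    """Memory for one n_samples x n_clusters matrix of float64 distances."""
    return n_samples * n_clusters * BYTES_PER_DOUBLE


def auto_would_precompute(n_samples, n_clusters):
    """Restate the 'auto' rule from the text (a sketch, not library code)."""
    return n_samples * n_clusters <= THRESHOLD


# 1,500,000 samples with 8 clusters sits exactly at the threshold.
print(distance_matrix_bytes(1_500_000, 8) / 1e6)  # 96.0 (megabytes)
print(auto_would_precompute(1_500_000, 8))        # True
print(auto_would_precompute(2_000_000, 8))        # False
```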

    tol : float, default: 1e-4

    Relative tolerance with regard to inertia to declare convergence.

    n_jobs : int

    The number of jobs to use for the computation. This works by computing each of the n_init runs in parallel.

    If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
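
The rule for negative values can be written out as a small function. effective_workers is a hypothetical helper that restates the formula from the text. Rejecting n_jobs = 0 and never returning fewer than one worker are assumptions, since the text does not cover those cases.

```python
def effective_workers(n_jobs, n_cpus):
    """Number of workers implied by n_jobs on a machine with n_cpus CPUs."""
    if n_jobs == 0:
        # Assumption: zero workers is treated as an invalid setting.
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # Formula from the text: n_cpus + 1 + n_jobs, floored at 1 (assumption).
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


print(effective_workers(-1, 8))  # 8: all CPUs
print(effective_workers(-2, 8))  # 7: all CPUs but one
print(effective_workers(1, 8))   # 1: no parallelism
```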

    random_state : integer or numpy.RandomState, optional

    The generator used to initialize the centers. If an integer is given, it fixes the seed. Defaults to the global numpy random number generator.
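
A quick way to check the effect of a fixed seed is to fit twice with the same integer random_state and compare the results. The random data and the seed values are arbitrary choices for this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(42)
X = rng.rand(200, 2)  # illustrative random points

# Two separate estimators configured with the same integer seed.
first = KMeans(n_clusters=4, n_init=5, random_state=7).fit(X)
second = KMeans(n_clusters=4, n_init=5, random_state=7).fit(X)

print(np.array_equal(first.labels_, second.labels_))
print(np.allclose(first.cluster_centers_, second.cluster_centers_))
```

Leaving random_state unset means each fit may start from different centers, so labels and centers can differ between runs.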

    verbose : int, default 0

    Verbosity mode.

    copy_x : boolean, default True

    When pre-computing distances it is more numerically accurate to center the data first. If copy_x is True, then the original data is not modified. If False, the original data is modified, and put back before the function returns, but small numerical differences may be introduced by subtracting and then adding the data mean.

    cluster_centers_ : array, [n_clusters, n_features]

    Coordinates of cluster centers

    labels_ : array, [n_samples]

    Labels of each point

    inertia_ : float

    Sum of squared distances of samples to their closest cluster center.
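
The three fitted attributes are related: looking up each sample's center through labels_ and summing the squared distances should reproduce inertia_ up to floating-point error. The random data below is an assumption for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
X = rng.rand(100, 3)  # illustrative data: 100 samples, 3 features

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Row i of `assigned` is the center of the cluster that sample i belongs to.
assigned = km.cluster_centers_[km.labels_]

# Sum of squared Euclidean distances from each sample to its assigned center.
manual_inertia = np.sum((X - assigned) ** 2)

print(km.cluster_centers_.shape, km.labels_.shape)
print(manual_inertia, km.inertia_)
```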

     

  • Original post: https://www.cnblogs.com/j6-2/p/4779455.html