  • pandas.factorize()

    pandas official documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.factorize.html

    pandas.factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None)

    Encode the object as an enumerated type or categorical variable.

    In other words, it converts an object-type variable into an enumerated (integer-coded) or categorical representation.
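    To make the encoding concrete, here is a minimal pure-Python sketch of what factorize computes. It is not pandas' implementation (pandas uses a compiled hashtable), and the helper name simple_factorize is hypothetical; treating None/NaN as missing mirrors the default behaviour described below.

```python
import math


def simple_factorize(values, na_sentinel=-1):
    """Hypothetical helper: return (codes, uniques) as plain lists.

    Codes follow the order of first appearance (like sort=False);
    None/NaN are treated as missing and get na_sentinel.
    """
    index_of = {}   # value -> its position in uniques
    uniques = []
    codes = []
    for v in values:
        # Missing values get the sentinel and never enter uniques
        if v is None or (isinstance(v, float) and math.isnan(v)):
            codes.append(na_sentinel)
            continue
        if v not in index_of:
            index_of[v] = len(uniques)
            uniques.append(v)
        codes.append(index_of[v])
    return codes, uniques


codes, uniques = simple_factorize(['b', 'b', 'a', 'c', 'b'])
print(codes)    # [0, 0, 1, 2, 0]
print(uniques)  # ['b', 'a', 'c']
```

    'b' gets code 0 simply because it is the first value seen, which is the same result as the first example below.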

    Parameters:

    values : sequence

    A 1-D sequence. Sequences that aren’t pandas objects are coerced to ndarrays before factorization.

    sort : bool, default False

    Sort uniques and shuffle labels to maintain the relationship.
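    What "shuffle labels to maintain the relationship" means can be illustrated by doing the sort by hand: sort the uniques, then send every label through the inverse permutation. This is an illustrative sketch assuming NumPy is available; pandas does this internally, and the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Wrapped in an object ndarray: recent pandas versions may warn on plain lists
values = np.array(['b', 'b', 'a', 'c', 'b'], dtype=object)

# Unsorted: uniques follow the order of first appearance
labels, uniques = pd.factorize(values)   # labels [0, 0, 1, 2, 0], uniques ['b', 'a', 'c']

# Sort the uniques and build the inverse permutation old_code -> new_code
order = np.argsort(uniques)              # [1, 0, 2]: 'a' sits at 1, 'b' at 0, 'c' at 2
sorted_uniques = uniques[order]          # ['a', 'b', 'c']
remap = np.empty_like(order)
remap[order] = np.arange(len(order))     # remap == [1, 0, 2]

# Relabel every code; keep the -1 sentinel for missing values untouched
sorted_labels = np.where(labels == -1, -1, remap[labels])   # [1, 1, 0, 2, 1]

# Cross-check against pandas' own sort=True
expected_labels, expected_uniques = pd.factorize(values, sort=True)
assert (sorted_labels == expected_labels).all()
assert (sorted_uniques == expected_uniques).all()
```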

    order

    Deprecated since version 0.23.0: This parameter has no effect and is deprecated.

    na_sentinel : int, default -1

    Value to mark “not found”.
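    A note for newer pandas: na_sentinel was deprecated in pandas 1.5 in favour of the boolean use_na_sentinel and removed in pandas 2.0. The sketch below assumes pandas 1.5 or later. With the default use_na_sentinel=True, missing values are still coded -1; with use_na_sentinel=False they receive an ordinary code instead.

```python
import numpy as np
import pandas as pd  # assumes pandas >= 1.5, where use_na_sentinel exists

# Wrapped in an object ndarray: recent pandas versions may warn on plain lists
values = np.array(['b', None, 'a', 'c', 'b'], dtype=object)

# Default use_na_sentinel=True: missing values are coded -1 and left out of uniques
codes, uniques = pd.factorize(values)
print(codes)     # [ 0 -1  1  2  0]
print(uniques)   # ['b' 'a' 'c']

# use_na_sentinel=False: the missing value gets an ordinary non-negative code
# and shows up (as a missing value) in uniques
codes_keep, uniques_keep = pd.factorize(values, use_na_sentinel=False)
print(codes_keep)
print(uniques_keep)
```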

    size_hint : int, optional

    Hint to the hashtable sizer.

    Returns:

    labels : ndarray

    An integer ndarray that’s an indexer into uniques. uniques.take(labels) will have the same values as values.

    uniques : ndarray, Index, or Categorical

    The unique valid values. When values is Categorical, uniques is a Categorical. When values is some other pandas object, an Index is returned. Otherwise, a 1-D ndarray is returned.

    Note: Even if there’s a missing value in values, uniques will not contain an entry for it.
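    The invariant that uniques.take(labels) reproduces values can be checked directly. The short sketch below also shows a pitfall that follows from the note above: the sentinel -1 is not a real position, and NumPy's take reads -1 as "the last element", so missing positions have to be masked (here with np.where) before the rebuilt values can be trusted.

```python
import numpy as np
import pandas as pd

# Wrapped in object ndarrays: recent pandas versions may warn on plain lists
values = np.array(['b', 'b', 'a', 'c', 'b'], dtype=object)
labels, uniques = pd.factorize(values)

# labels are positions into uniques, so take() rebuilds the input
print(list(uniques.take(labels)))   # ['b', 'b', 'a', 'c', 'b']

# Pitfall: -1 is not a real position. NumPy's take() reads -1 as
# "last element", so mask the sentinel before using the result.
values_na = np.array(['b', None, 'a', 'c', 'b'], dtype=object)
labels_na, uniques_na = pd.factorize(values_na)
print(list(uniques_na.take(labels_na)))   # ['b', 'c', 'a', 'c', 'b'] (wrong at index 1)
rebuilt_na = np.where(labels_na == -1, None, uniques_na.take(labels_na))
print(list(rebuilt_na))                   # ['b', None, 'a', 'c', 'b']
```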

    Example

    1. pd.factorize(values)

    >>> import pandas as pd
    >>> labels, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'])
    >>> labels
    array([0, 0, 1, 2, 0])
    >>> uniques
    array(['b', 'a', 'c'], dtype=object)

    2. pd.factorize(values, sort=True)

    >>> labels, uniques = pd.factorize(['b', 'b', 'a', 'c', 'b'], sort=True)
    >>> labels
    array([1, 1, 0, 2, 1])
    >>> uniques
    array(['a', 'b', 'c'], dtype=object)

    3. Missing values are indicated in labels with na_sentinel (-1 by default). Note that missing values are never included in uniques.

    >>> labels, uniques = pd.factorize(['b', None, 'a', 'c', 'b'])
    >>> labels
    array([ 0, -1,  1,  2,  0])
    >>> uniques
    array(['b', 'a', 'c'], dtype=object)

    4. Thus far, we’ve only factorized lists (which are internally coerced to NumPy arrays). When factorizing pandas objects, the type of uniques will differ. For Categoricals, a Categorical is returned.

    >>> cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
    >>> labels, uniques = pd.factorize(cat)
    >>> labels
    array([0, 0, 1])
    >>> uniques
    [a, c]
    Categories (3, object): [a, b, c]

    Notice that 'b' is in uniques.categories, despite not being present in cat.values.
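    One way to see what this means is to compare the labels returned by factorize with the Categorical's own codes. A short sketch that rebuilds the same cat:

```python
import pandas as pd

cat = pd.Categorical(['a', 'a', 'c'], categories=['a', 'b', 'c'])
labels, uniques = pd.factorize(cat)

print(labels.tolist())              # [0, 0, 1]  ('c' is the 2nd value present)
print(cat.codes.tolist())           # [0, 0, 2]  ('c' is the 3rd declared category)
print(list(uniques))                # ['a', 'c']
print(uniques.categories.tolist())  # ['a', 'b', 'c']
```

    factorize numbers only the values that actually occur, whereas cat.codes index into the full category list; that is why 'c' is 1 in one and 2 in the other, while the unused 'b' survives only in uniques.categories.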

    5. For all other pandas objects, an Index of the appropriate type is returned.

    >>> cat = pd.Series(['a', 'a', 'c'])
    >>> labels, uniques = pd.factorize(cat)
    >>> labels
    array([0, 0, 1])
    >>> uniques
    Index(['a', 'c'], dtype='object')
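    In practice factorize is often used to turn a string column into integer codes. The sketch below uses the Series.factorize method, the method form of pd.factorize; the DataFrame and its city column are hypothetical example data.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({'city': ['Beijing', 'Shanghai', 'Beijing', 'Shenzhen']})

# Series.factorize() is the method form of pd.factorize(); for a Series
# the uniques come back as an Index
codes, uniques = df['city'].factorize()
df['city_code'] = codes

print(df)
print(codes.tolist())    # [0, 1, 0, 2]
print(uniques.tolist())  # ['Beijing', 'Shanghai', 'Shenzhen']
```

    Keeping uniques around is what lets the integer column be mapped back to the original strings later, for example with uniques.take(codes).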
  • Original post: https://www.cnblogs.com/xiaodongsuibi/p/9098504.html