zoukankan      html  css  js  c++  java
  • Categorical Data

    This is an introduction to pandas categorical data type, including a short comparison with R’s factor.
    Categoricals are a pandas data type, which correspond to categorical variables in statistics: a variable, which can take on only a limited, and usually fixed, number of possible values (categories; levels in R). Examples are gender, social class, blood types, country affiliations, observation time or ratings via Likert scales.

    In contrast to statistical categorical variables, categorical data might have an order (e.g. ‘strongly agree’ vs ‘agree’ or ‘first observation’ vs. ‘second observation’), but numerical operations (additions, divisions, …) are not possible.

    All values of categorical data are either in categories or np.nan. Order is defined by the order of categories, not lexical order of the values. Internally, the data structure consists of a categories array and an integer array of codes which point to the real value in the categories array.

    The categorical data type is useful in the following cases:

    • A string variable consisting of only a few different values. Converting such a string variable to a categorical variable will save some memory, see here.
    • The lexical order of a variable is not the same as the logical order (“one”, “two”, “three”). By converting to a categorical and specifying an order on the categories, sorting and min/max will use the logical order instead of the lexical order, see here.
    • As a signal to other python libraries that this column should be treated as a categorical variable (e.g. to use suitable statistical methods or plot types).

    概括:Categorical Data数据类型就类似“性别”、“血型”、“班级”等,只能是一些固定的“值“。Categorical Data可以有不同级别,但是不能用于数值计算。

  • 相关阅读:
    今天是元旦啊
    [待解决]python 函数加括号和不加括号的区别
    Jupyter Notebook的快捷键列表误操作发现的新大陆
    Series选择和过滤
    做鸢尾花切片练习中的'&'问题:(&,|)和(and,or)
    报错合集
    关于随机数种子seed的问题尽量使用numpy下的seed
    pandas创建Series序列/hashable
    在jupyter notebook中插入截图
    xml反序列化时,如何生成与之对应的类文件
  • 原文地址:https://www.cnblogs.com/yaos/p/14014469.html
Copyright © 2011-2022 走看看