  • Pandas: A First Look at Series / DataFrame

    import numpy as np
    import pandas as pd
    

    Pandas will be a major tool of interest throughout much of the rest of the book. It contains data structures and manipulation tools designed to make data cleaning and analysis fast and easy in Python. pandas is often used in tandem with numerical computing tools like NumPy and SciPy, analytical libraries like statsmodels and scikit-learn, and data visualization libraries like matplotlib. pandas adopts significant parts of NumPy's idiomatic style of array-based computing, especially array-based functions and a preference for data processing without for loops (array-oriented programming).

    While pandas adopts many coding idioms from NumPy, the biggest difference is that pandas is designed for working with tabular or heterogeneous data. NumPy, by contrast, is best suited for working with homogeneous numerical array data. -> pandas is one of the best options for processing tabular data (the author does like to talk it up).

    Since becoming an open source project in 2010, pandas has matured into a quite large library that is applicable in a broad set of real-world use cases (-> it is widely used). The developer community has grown to over 800 distinct contributors, who have been helping build the project as they have used it to solve their day-to-day data problems. -> It solves a great many everyday data processing problems.

    Throughout the rest of the book, I use the following import convention for pandas:

    import pandas as pd
    # from pandas import Series, DataFrame
    

    Thus, whenever you see pd in code, it is referring to pandas. You may also find it easier to import Series and DataFrame into the local namespace since they are frequently used:

    from pandas import Series, DataFrame

    To get started with pandas, you will need to get comfortable with its two workhorse data structures: Series and DataFrame. While they are not a universal solution for every problem, they provide a solid, easy-to-use basis for most applications.

    Series

    A Series is a one-dimensional array-like object containing a sequence of values (of similar types to NumPy types) and an associated array of data labels, called its index. The simplest Series is formed from only an array of data. -> A Series is like a one-dimensional NumPy array with an index.

    obj = pd.Series([4, 7, -5, 3])
    obj
    
    0    4
    1    7
    2   -5
    3    3
    dtype: int64
    

    The string representation of a Series displayed interactively shows the index on the left and the values on the right. Since we did not specify an index for the data, a default one consisting of the integers 0 through N-1 (where N is the length of the data) is created. You can get the array representation and index object of the Series via its values and index attributes, respectively: -> access them through the values and index attributes.

    obj.values
    
    array([ 4,  7, -5,  3], dtype=int64)
    
    obj.index  # like range(4)
    
    RangeIndex(start=0, stop=4, step=1)
    

    Often it will be desirable to create a Series with an index identifying each data point with a label:

    obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])
    obj2

    "Print the index"
    obj2.index

    d    4
    b    7
    a   -5
    c    3
    dtype: int64

    'Print the index'

    Index(['d', 'b', 'a', 'c'], dtype='object')
    

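    Compared with NumPy arrays, you can use labels in the index when selecting single values or a set of values. -> Use index labels to select one or several elements.

    "Select a single element: [label]"
    obj2['a']

    "Modify an element by direct assignment; the change is in place"
    obj2['d'] = 'cj'

    "Select several elements: [[labels]]. Here a missing label gives NaN, with a FutureWarning"
    obj2[['c', 'a', 'd', 'xx']]


    'Select a single element: [label]'

    -5

    'Modify an element by direct assignment; the change is in place'

    'Select several elements: [[labels]]. Here a missing label gives NaN, with a FutureWarning'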
    
    c:\python\python36\lib\site-packages\pandas\core\series.py:851: FutureWarning: 
    Passing list-likes to .loc or [] with any missing label will raise
    KeyError in the future, you can use .reindex() as an alternative.
    
    See the documentation here:
    https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
      return self.loc[key]
    
    c       3
    a      -5
    d      cj
    xx    NaN
    dtype: object

    "Assigning to an element modifies the Series in place by default"
    obj2

    'Assigning to an element modifies the Series in place by default'
    
    d    cj
    b     7
    a    -5
    c     3
    dtype: object
    

    Here ['c', 'a', 'd'] is interpreted as a list of indices, even though it contains strings instead of integers. -> To select by several labels, collect them in a list and pass that list as the single argument to the indexer.
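
    The FutureWarning above suggests .reindex() for label lists that may contain a missing label. A minimal sketch of that alternative, assuming a fresh obj2 with integer values (the name wanted is only for illustration):

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# Every label exists, so plain list selection works.
subset = obj2[['c', 'a', 'd']]

# 'xx' is not in the index: reindex fills it with NaN instead of raising.
wanted = ['c', 'a', 'd', 'xx']
padded = obj2.reindex(wanted)

print(subset)
print(padded)
```

    Because NaN is a float, the reindexed result is float64 even though the original values are integers.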

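    Using NumPy functions or NumPy-like operations, such as filtering with a boolean array, scalar multiplication, or applying math functions, will preserve the index-value link: -> A Series can be operated on much like a NumPy array: boolean arrays, scalar multiplication, math functions, and so on.

    "Filter the Series down to the elements greater than 0, keeping their labels"
    "Restore the data first: a string cannot be compared with a number"
    obj2['d'] = 4 

    obj2[obj2 > 0]

    "Scalar arithmetic"
    obj2 * 2

    "Call a NumPy function"
    "obj2 is still dtype object after the string assignment, so np.exp(obj2) fails; use the int64 values of obj instead"
    np.exp(obj.values)

    'Filter the Series down to the elements greater than 0, keeping their labels'

    'Restore the data first: a string cannot be compared with a number'

    d    4
    b    7
    c    3
    dtype: object

    'Scalar arithmetic'

    d      8
    b     14
    a    -10
    c      6
    dtype: object

    'Call a NumPy function'

    'obj2 is still dtype object after the string assignment, so np.exp(obj2) fails; use the int64 values of obj instead'

    array([5.45981500e+01, 1.09663316e+03, 6.73794700e-03, 2.00855369e+01])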
    
    
    "cj test"
    obj2 > 0
    
    np.exp(obj2)
    
    'cj test'
    
    
    d     True
    b     True
    a    False
    c     True
    dtype: bool
    
    
    ---------------------------------------------------------------------------
    
    AttributeError                            Traceback (most recent call last)
    
    <ipython-input-39-86002a981278> in <module>
          2 obj2 > 0
          3 
    ----> 4 np.exp(obj2)
    
    
    AttributeError: 'int' object has no attribute 'exp'
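
    The error above seems to come from the dtype rather than from the index: after the string assignment obj2 holds Python objects, and np.exp has no loop for those. A small sketch under that assumption, using a freshly built numeric obj2 (the exact exception type depends on the NumPy version, so both are caught):

```python
import numpy as np
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=['d', 'b', 'a', 'c'])

# On a numeric Series the ufunc works directly and keeps the labels.
exp_numeric = np.exp(obj2)

# The same values stored as dtype object reproduce the failure.
as_object = obj2.astype(object)
try:
    np.exp(as_object)
    object_failed = False
except (AttributeError, TypeError):
    object_failed = True

# Converting back to a numeric dtype makes the call work again.
exp_restored = np.exp(as_object.astype('int64'))

print(exp_numeric)
print(object_failed)
```

    So there is no need to strip the index with .values; restoring a numeric dtype is enough.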
    
    

    Another way to think about a Series is as a fixed-length, ordered dict, as it's a mapping of index values to data values. -> (A Series can be seen as an ordered dict: the keys are the index, the values are the data.) It can be used in many contexts where you might use a dict:

    "As with a dict, membership tests (in) and selection work on the index labels"

    'b' in obj2
    'xxx' in obj2



    'As with a dict, membership tests (in) and selection work on the index labels'
    
    
    True
    
    
    False
    
    

    Should you have data contained in a Python dict, you can create a Series from it by passing the dict: -> A Python dict converts directly to a Series; the keys become the index.

    sdata = {'Ohio':35000, 'Texas':71000, 'Oregon':16000, 'Utah':5000}

    "A dict converts directly to a Series"
    obj3 = pd.Series(sdata)
    obj3

    'A dict converts directly to a Series'
    
    
    Ohio      35000
    Texas     71000
    Oregon    16000
    Utah       5000
    dtype: int64
    
    
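    # cj test

    "Nested dicts are accepted too, but only the top-level keys become the index; each inner dict is stored as one object value"

    cj_data = {'Ohio':{'sex':1, 'age':18}, 'Texas':{'cj':123}}

    pd.Series(cj_data)

    'Nested dicts are accepted too, but only the top-level keys become the index; each inner dict is stored as one object value'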
    
    
    Ohio     {'sex': 1, 'age': 18}
    Texas              {'cj': 123}
    dtype: object
    
    

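    When you are only passing a dict, the index in the resulting Series will have the dict's keys in insertion order, as the output above shows (older pandas versions sorted them). You can override this by passing the dict keys in the order you want them to appear in the resulting Series: -> With a dict, the default index is its keys; passing index explicitly gives whatever layout we want:

    "Override the original index"

    states = ['California', 'Ohio', 'Oregon', 'Texas']

    "Matching labels keep their values; labels with no match show up as NA"
    obj4 = pd.Series(sdata, index=states)
    obj4

    'Override the original index'


    'Matching labels keep their values; labels with no match show up as NA'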
    
    
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    dtype: float64
    
    

    Here, three values found in sdata were placed in the appropriate locations (the labels match), but since no value for 'California' was found, it appears as NaN (not a number), which is considered in pandas to mark missing or NA values. Since 'Utah' was not included in states, it is excluded from the resulting object.

    I will use the terms 'missing' or 'NA' interchangeably to refer to missing data. The isnull and notnull functions in pandas should be used to detect missing data:

    "pd.isnull() and pd.notnull() detect missing values"
    pd.isnull(obj4)

    "The positive form"
    pd.notnull(obj4)

    "Series also has these as instance methods:"
    obj4.notnull()

    'pd.isnull() and pd.notnull() detect missing values'


    California     True
    Ohio          False
    Oregon        False
    Texas         False
    dtype: bool


    'The positive form'
    
    
    California    False
    Ohio           True
    Oregon         True
    Texas          True
    dtype: bool
    
    
    'Series also has these as instance methods:'
    
    
    California    False
    Ohio           True
    Oregon         True
    Texas          True
    dtype: bool
    
    

    I discuss working with missing data in more detail in Chapter 7.
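
    As a small follow-up, the boolean masks from isnull and notnull can be counted or used as filters. A sketch that rebuilds obj4 from the same sdata dict (the names n_missing, present and dropped are only for illustration):

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

# True counts as 1, so summing the mask counts the missing entries.
n_missing = int(obj4.isnull().sum())

# Keep only the entries that are present, in two equivalent ways.
present = obj4[obj4.notnull()]
dropped = obj4.dropna()

print(n_missing)
print(present)
```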

    A useful Series feature for many applications is that it automatically aligns by index label in arithmetic operations. -> In arithmetic, a Series aligns on the index automatically: values under the same label are paired up. This is the key point.

    obj3
    obj4

    "obj3 + obj4: values with matching labels are added, unmatched labels give NaN"
    obj3 + obj4

    Ohio      35000
    Texas     71000
    Oregon    16000
    Utah       5000
    dtype: int64


    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    dtype: float64


    'obj3 + obj4: values with matching labels are added, unmatched labels give NaN'
    
    
    California         NaN
    Ohio           70000.0
    Oregon         32000.0
    Texas         142000.0
    Utah               NaN
    dtype: float64
    
    

    Data alignment features will be addressed in more detail later. If you have experience with databases, you can think about this as being similar to a join operation. -> (Data alignment resembles a database join: inner join, left join, right join.)
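
    To make the join analogy concrete, here is a sketch comparing the + operator with Series.add and its fill_value argument, which treats a label missing on one side as 0. The Series are rebuilt from the same sdata dict; total and filled are illustrative names:

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 71000, 'Oregon': 16000, 'Utah': 5000}
obj3 = pd.Series(sdata)
obj4 = pd.Series(sdata, index=['California', 'Ohio', 'Oregon', 'Texas'])

# Like an outer join on the labels: the result covers the union of both
# indexes, and a label missing on either side yields NaN.
total = obj3 + obj4

# fill_value=0 substitutes 0 when only one side is missing, so Utah
# survives; California is missing on both sides and stays NaN.
filled = obj3.add(obj4, fill_value=0)

print(total)
print(filled)
```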

    Both the Series object itself and its index have a name attribute, which integrates with other key areas of pandas functionality: -> (the name attribute ties in with other key parts of pandas)

    "Set the Series name: obj4.name = 'xxx'"
    obj4.name = 'population'

    "Set the index name: obj4.index.name = 'xxx'"
    obj4.index.name = 'state'

    obj4

    "Set the Series name: obj4.name = 'xxx'"


    "Set the index name: obj4.index.name = 'xxx'"
    
    
    state
    California        NaN
    Ohio          35000.0
    Oregon        16000.0
    Texas         71000.0
    Name: population, dtype: float64
    
    

    A Series's index can be altered in place by assignment. -> The index can be changed in place by assigning to it.

    obj

    "obj.index = [...] changes the index in place; a length mismatch raises an error"

    obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']
    obj

    Bob      4
    Steve    7
    Jeff    -5
    Ryan     3
    dtype: int64


    'obj.index = [...] changes the index in place; a length mismatch raises an error'
    
    
    Bob      4
    Steve    7
    Jeff    -5
    Ryan     3
    dtype: int64
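
    The length-mismatch case mentioned in the note can be checked with a short sketch (the three-label list is made up for the example):

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# Four labels for four values: the index is replaced in place.
obj.index = ['Bob', 'Steve', 'Jeff', 'Ryan']

# Three labels for four values: pandas raises ValueError.
try:
    obj.index = ['only', 'three', 'labels']
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(mismatch_error)
print(obj.index)
```

    The failed assignment leaves the previous index untouched.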
    
    

    DataFrame

    A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.). The DataFrame has both a row and column index (the row index is index, the column index is columns); it can be thought of as a dict of Series all sharing the same index. Under the hood, the data is stored as one or more two-dimensional blocks rather than a list, dict, or some other collection of one-dimensional arrays. The exact details of DataFrame's internals are outside the scope of this book.

    While a DataFrame is physically two-dimensional, you can use it to represent higher dimensional data in a tabular format using hierarchical indexing, a subject we will discuss in Chapter 8 and an ingredient in some of the more advanced data-handling features in pandas. -> Hierarchical indexing handles higher-dimensional data and underlies some of the more advanced features in pandas.

    There are many ways to construct a DataFrame, though one of the most common is from a dict of equal-length lists or NumPy arrays. -> (The most common way to build a DataFrame is to pass a dict of equal-length lists or arrays.)

    data = {
        'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002, 2003],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]
    }
    
    frame = pd.DataFrame(data)
    

    The resulting DataFrame will have its index assigned automatically as with Series, and the columns are placed in the order of the dict's keys (older pandas versions sorted them):

    frame

        state  year  pop
    0    Ohio  2000  1.5
    1    Ohio  2001  1.7
    2    Ohio  2002  3.6
    3  Nevada  2001  2.4
    4  Nevada  2002  2.9
    5  Nevada  2003  3.2
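
    The "dict of Series sharing one index" picture can be checked directly. A minimal sketch using the same data dict; column_order, dtypes and shares_index are illustrative names:

```python
import pandas as pd

data = {
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2001, 2002, 2003],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2],
}
frame = pd.DataFrame(data)

# Columns follow the key order of the dict.
column_order = list(frame.columns)

# Each column carries its own dtype.
dtypes = {name: str(dtype) for name, dtype in frame.dtypes.items()}

# Every column is a Series that shares the frame's row index.
shares_index = all(frame[col].index.equals(frame.index) for col in frame.columns)

print(column_order)
print(dtypes)
print(shares_index)
```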

    If you are using the Jupyter notebook, pandas DataFrame objects will be displayed as a more browser-friendly HTML table.

    For large DataFrames, the head method selects only the first five rows: -> df.head() shows the first 5 rows by default

    frame.head()

        state  year  pop
    0    Ohio  2000  1.5
    1    Ohio  2001  1.7
    2    Ohio  2002  3.6
    3  Nevada  2001  2.4
    4  Nevada  2002  2.9

    If you specify a sequence of columns, the DataFrame's columns will be arranged in that order: -> specify the column order

    "Arrange the columns in the given order"
    pd.DataFrame(data, columns=['year', 'state', 'pop'])

    'Arrange the columns in the given order'


       year   state  pop
    0  2000    Ohio  1.5
    1  2001    Ohio  1.7
    2  2002    Ohio  3.6
    3  2001  Nevada  2.4
    4  2002  Nevada  2.9
    5  2003  Nevada  3.2

    If you pass a column that isn't contained in the dict, it will appear with missing values in the result:

    frame2 = pd.DataFrame(data, 
                         columns=['year', 'state', 'pop', 'debt'],
                         index=['one', 'two', 'three', 'four', 'five', 'six'])

    "A column that is not in the data is created and filled with NaN"
    frame2

    "An index whose length does not match raises an error; frame2.columns shows the column labels"
    frame2.columns

    'A column that is not in the data is created and filled with NaN'


           year   state  pop debt
    one    2000    Ohio  1.5  NaN
    two    2001    Ohio  1.7  NaN
    three  2002    Ohio  3.6  NaN
    four   2001  Nevada  2.4  NaN
    five   2002  Nevada  2.9  NaN
    six    2003  Nevada  3.2  NaN
    'An index whose length does not match raises an error; frame2.columns shows the column labels'
    
    
    Index(['year', 'state', 'pop', 'debt'], dtype='object')
    
    

    A column in a DataFrame can be retrieved as a Series either by dict-like notation or by attribute:
    -> (bracket indexing with the column name, or df.column_name)

    "Bracket indexing: [column name]"
    frame2['state']

    "Attribute access: df.column_name"
    frame2.state

    'Bracket indexing: [column name]'


    one        Ohio
    two        Ohio
    three      Ohio
    four     Nevada
    five     Nevada
    six      Nevada
    Name: state, dtype: object


    'Attribute access: df.column_name'
    
    
    one        Ohio
    two        Ohio
    three      Ohio
    four     Nevada
    five     Nevada
    six      Nevada
    Name: state, dtype: object
    
    

    Attribute-like access (e.g., frame2.year) and tab completion of column names in IPython are provided as a convenience. -> Selecting a column as an attribute is quite handy.
    frame2[column] works for any column name, but frame2.column only works when the column name is a valid Python variable name.
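
    A sketch of where attribute access falls short. The 'net debt' column is invented for the example; the 'pop' column from this post is a second trap, because DataFrame already has a method named pop:

```python
import pandas as pd

frame = pd.DataFrame({
    'state': ['Ohio', 'Nevada'],
    'pop': [1.5, 2.4],
    'net debt': [0.1, 0.2],   # hypothetical column with a space in its name
})

# Attribute access works for a plain identifier such as 'state'.
state = frame.state

# 'net debt' is not a valid identifier, so only brackets can reach it.
net_debt = frame['net debt']

# frame.pop resolves to the DataFrame.pop method, not to the column.
pop_attr_is_method = callable(frame.pop)
pop_column = frame['pop']

print(pop_attr_is_method)
print(pop_column)
```

    Bracket notation is the safe default whenever a column name might collide with an existing attribute or method.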

    Note that the returned Series have the same index as the DataFrame, and their name attribute has been appropriately set.

    Rows can also be retrieved by position or name with the special loc attribute (much more on this later): -> the loc attribute selects rows.

    "Select the row labeled three: loc[label]"
    frame2.loc['three']

    "Select the second and third rows: frame.loc[1:2] (label slices include both ends)"
    frame.loc[1:2]

    'Select the row labeled three: loc[label]'


    year     2002
    state    Ohio
    pop       3.6
    debt      NaN
    Name: three, dtype: object


    'Select the second and third rows: frame.loc[1:2] (label slices include both ends)'


      state  year  pop
    1  Ohio  2001  1.7
    2  Ohio  2002  3.6
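
    For comparison, a sketch of label-based loc next to position-based iloc on a small frame with string labels. iloc is not used in the notes above; it is brought in here only to show the contrast:

```python
import pandas as pd

frame2 = pd.DataFrame(
    {'year': [2000, 2001, 2002], 'state': ['Ohio', 'Ohio', 'Nevada']},
    index=['one', 'two', 'three'],
)

# loc selects by label, iloc by integer position.
by_label = frame2.loc['two']
by_position = frame2.iloc[1]

# A label slice includes both endpoints; a position slice excludes the end.
label_slice = frame2.loc['one':'two']
position_slice = frame2.iloc[0:2]

print(by_label)
print(label_slice)
```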

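    Columns can be modified by assignment. For example, the empty 'debt' column could be assigned a scalar value or an array of values: -> modified in place

    frame2['debet'] = 16.5

    "A scalar fills the whole column in place (the name 'debet' differs from 'debt', so a new column is created here)"
    frame2

    "A scalar fills the whole column in place (the name 'debet' differs from 'debt', so a new column is created here)"


           year   state  pop debt  debet
    one    2000    Ohio  1.5  NaN   16.5
    two    2001    Ohio  1.7  NaN   16.5
    three  2002    Ohio  3.6  NaN   16.5
    four   2001  Nevada  2.4  NaN   16.5
    five   2002  Nevada  2.9  NaN   16.5
    six    2003  Nevada  3.2  NaN   16.5
    "Assign an array in place; values are matched to rows by position"
    frame2['debet'] = np.arange(6)

    "Drop the debt column: axis=1 means columns, inplace=True drops it in place"
    frame2.drop(labels='debt', axis=1, inplace=True)

    frame2

    'Assign an array in place; values are matched to rows by position'


    'Drop the debt column: axis=1 means columns, inplace=True drops it in place'


           year   state  pop  debet
    one    2000    Ohio  1.5      0
    two    2001    Ohio  1.7      1
    three  2002    Ohio  3.6      2
    four   2001  Nevada  2.4      3
    five   2002  Nevada  2.9      4
    six    2003  Nevada  3.2      5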
    frame2.columns
    
    Index(['year', 'state', 'pop', 'debet'], dtype='object')
    
    
    frame2['debet']

    one      0
    two      1
    three    2
    four     3
    five     4
    six      5
    Name: debet, dtype: int32
    
    

    When you are assigning lists or arrays to a column, the value's length must match the length of the DataFrame (a length mismatch raises a ValueError). If you assign a Series, its labels will be realigned exactly to the DataFrame's index, inserting missing values in any holes:

    val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])

    "Aligned automatically on the index"
    frame2['debet'] = val

    frame2

    'Aligned automatically on the index'


           year   state  pop  debet
    one    2000    Ohio  1.5    NaN
    two    2001    Ohio  1.7   -1.2
    three  2002    Ohio  3.6    NaN
    four   2001  Nevada  2.4   -1.5
    five   2002  Nevada  2.9   -1.7
    six    2003  Nevada  3.2    NaN
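
    The three assignment cases (array, wrong-length list, Series) can be put side by side. A sketch on a smaller three-row frame, with made-up values:

```python
import numpy as np
import pandas as pd

frame2 = pd.DataFrame({'year': [2000, 2001, 2002]}, index=['one', 'two', 'three'])

# An array is matched to the rows by position, so its length must be 3.
frame2['debt'] = np.arange(3.0)

# A list of the wrong length is rejected with ValueError.
try:
    frame2['debt'] = [1.0, 2.0]
    length_error = None
except ValueError as exc:
    length_error = exc

# A Series is aligned on labels: 'two' matches, 'four' has no row and is
# ignored, and rows without a matching label become NaN.
val = pd.Series([-1.2, -1.5], index=['two', 'four'])
frame2['debt'] = val

print(length_error)
print(frame2)
```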

    Assigning a column that doesn't exist will create a new column. The del keyword will delete columns as with a dict. -> del removes a column

    As an example of del, I first add a new column of boolean values where the state column equals 'Ohio':

    frame2['eastern'] = frame2.state == 'Ohio'

    "First add a new column, eastern"
    frame2

    "Then delete that column with the del keyword"
    del frame2['eastern']

    "Show the column names: eastern is gone. The drop() method works as well"
    frame2.columns

    'First add a new column, eastern'


           year   state  pop  debet  eastern
    one    2000    Ohio  1.5    NaN     True
    two    2001    Ohio  1.7   -1.2     True
    three  2002    Ohio  3.6    NaN     True
    four   2001  Nevada  2.4   -1.5    False
    five   2002  Nevada  2.9   -1.7    False
    six    2003  Nevada  3.2    NaN    False
    'Then delete that column with the del keyword'


    'Show the column names: eastern is gone. The drop() method works as well'


    Index(['year', 'state', 'pop', 'debet'], dtype='object')
    
    

    The column returned from indexing a DataFrame is a view on the underlying data, not a copy. Thus, any in-place modifications to the Series will be reflected in the DataFrame (this no longer holds when pandas runs with Copy-on-Write enabled). The column can be explicitly copied with the Series's copy method. -> Copy the column explicitly when you need an independent object; otherwise you are working on a view.
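
    A minimal sketch of the explicit copy, on a one-column frame invented for the example. It only shows the copy side, since whether an uncopied column writes through depends on the pandas version and its Copy-on-Write setting:

```python
import pandas as pd

frame = pd.DataFrame({'pop': [1.5, 1.7, 3.6]})

# An explicit copy is independent of the DataFrame.
pop_copy = frame['pop'].copy()
pop_copy.iloc[0] = 99.0

print(pop_copy)
print(frame)
```

    The frame still holds 1.5 in its first row after the copy is modified.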

    Another common form of data is a nested dict of dicts:

    pop = {
        'Nevada': {2001:2.4, 2002:2.9},
        'Ohio': {2000:1.5, 2001:1.7, 2002:3.6}
    }
    

    If the nested dict is passed to the DataFrame, pandas will interpret the outer dict keys as the columns and the inner keys as the row indices: -> (with one level of nesting, the outer keys become the columns and the inner keys become the index)

    frame3 = pd.DataFrame(pop)
    "Outer dict keys become the columns, inner keys become the index"
    frame3

    'Outer dict keys become the columns, inner keys become the index'


          Nevada  Ohio
    2000     NaN   1.5
    2001     2.4   1.7
    2002     2.9   3.6

    You can transpose the DataFrame (swap rows and columns) with similar syntax to a NumPy array:

    "Transpose"
    frame3.T

    'Transpose'


            2000  2001  2002
    Nevada   NaN   2.4   2.9
    Ohio     1.5   1.7   3.6

    The keys in the inner dicts (which form the index) are combined and sorted to form the index in the result. This isn't true if an explicit index is specified:

    # pd.DataFrame(pop, index=('a', 'b','c'))
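
    The commented-out call above can be tried with year labels instead of letters. A sketch assuming the same pop dict, with an explicit index that adds 2003 and leaves out 2000 (the name explicit is illustrative):

```python
import pandas as pd

pop = {
    'Nevada': {2001: 2.4, 2002: 2.9},
    'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6},
}

# With an explicit index, the result has exactly these rows, in this order.
explicit = pd.DataFrame(pop, index=[2001, 2002, 2003])

print(explicit)
```

    Rows for labels that no inner dict contains (2003 here) are all NaN, and inner keys not listed in the index (2000) are dropped.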
    
    

    Dicts of Series are treated in much the same way.

    pdata = {
        'Ohio': frame3['Ohio'][:-1],
        'Nevada': frame3['Nevada'][:2]
    }
    
    pd.DataFrame(pdata)
    
          Ohio  Nevada
    2000   1.5     NaN
    2001   1.7     2.4

    For a complete list of things you can pass to the DataFrame constructor, see Table 5-1.
    If a DataFrame's index and columns have their name attributes set, these will also be displayed: -> set the name attribute of the row and column indexes

    frame3.index.name = 'year'
    frame3.columns.name = 'state'
    
    frame3
    
    state  Nevada  Ohio
    year
    2000      NaN   1.5
    2001      2.4   1.7
    2002      2.9   3.6

    As with Series, the values attribute returns the data contained in the DataFrame as a two-dimensional ndarray: -> values returns a two-dimensional array

    frame3.values
    
    array([[nan, 1.5],
           [2.4, 1.7],
           [2.9, 3.6]])
    
    

    If the DataFrame's columns are different dtypes, the dtype of the values array will be chosen to accommodate all of the columns.

    "A dtype that can hold every column's data is chosen automatically"
    frame2.values

    'A dtype that can hold every column's data is chosen automatically'
    
    
    array([[2000, 'Ohio', 1.5, nan],
           [2001, 'Ohio', 1.7, nan],
           [2002, 'Ohio', 3.6, nan],
           [2001, 'Nevada', 2.4, nan],
           [2002, 'Nevada', 2.9, nan],
           [2003, 'Nevada', 3.2, nan]], dtype=object)
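
    How the common dtype is chosen can be seen on two tiny frames made up for the example: integers with floats, and integers with strings.

```python
import pandas as pd

# int64 and float64 columns fit together in a float64 array.
numeric = pd.DataFrame({'a': [1, 2], 'b': [1.5, 2.5]})
numeric_dtype = numeric.values.dtype

# Numbers and strings have no common numeric type, so the array is object.
mixed = pd.DataFrame({'a': [1, 2], 's': ['x', 'y']})
mixed_dtype = mixed.values.dtype

print(numeric_dtype)
print(mixed_dtype)
```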
    
    

    Table 5-1. Possible data inputs to the DataFrame constructor

    • 2D ndarray A matrix of data, passing optional row and column labels
    • ... (the rest can wait until they are needed)

    Index Objects

    pandas's Index objects are responsible for holding the axis labels and other metadata (like the axis name or names). Any array or other sequence of labels you use when constructing a Series or DataFrame is internally converted to an Index:

    obj = pd.Series(range(3), index=['a', 'b', 'c'])
    
    index = obj.index
    index
    
    index[1:]
    obj
    
    Index(['a', 'b', 'c'], dtype='object')
    
    
    Index(['b', 'c'], dtype='object')
    
    
    a    0
    b    1
    c    2
    dtype: int64
    
    

    Index objects are immutable and thus can't be modified by the user:

    index[1] = 'd'
    
    ---------------------------------------------------------------------------
    
    TypeError                                 Traceback (most recent call last)
    
    <ipython-input-14-a452e55ce13b> in <module>
    ----> 1 index[1] = 'd'
    
    
    c:\python\python36\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
       2063 
       2064     def __setitem__(self, key, value):
    -> 2065         raise TypeError("Index does not support mutable operations")
       2066 
       2067     def __getitem__(self, key):
    
    
    TypeError: Index does not support mutable operations


    "The index is immutable"
    index

    'The index is immutable'
    
    
    Index(['a', 'b', 'c'], dtype='object')
    
    
    labels = pd.Index(np.arange(3))
    labels
    
    Int64Index([0, 1, 2], dtype='int64')
    
    
    obj2 = pd.Series([1.5, -2.5, 0], index=labels)
    obj2
    
    0    1.5
    1   -2.5
    2    0.0
    dtype: float64
    
    
    obj2.index is labels
    
    True
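
    Since an Index cannot be changed in place, a modified index has to be built as a new object. A sketch using the delete and insert methods (new_index is an illustrative name):

```python
import pandas as pd

index = pd.Index(['a', 'b', 'c'])

# Item assignment is rejected with TypeError.
try:
    index[1] = 'd'
    assignment_error = None
except TypeError as exc:
    assignment_error = exc

# delete and insert each return a new Index; the original is unchanged.
new_index = index.delete(1).insert(1, 'd')

print(assignment_error)
print(new_index)
print(index)
```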
    
    

    Unlike Python sets, a pandas Index can contain duplicate labels.

    Selections with duplicate labels will select all occurrences of that label.
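
    A short sketch of selection with a repeated label, on a Series made up for the example:

```python
import pandas as pd

dup = pd.Series([1, 2, 3], index=['a', 'b', 'a'])

# The index holds 'a' twice, so it is not unique.
unique_flag = dup.index.is_unique

# A repeated label returns every matching row as a Series,
# while a label that occurs once still returns a scalar.
all_a = dup['a']
single_b = dup['b']

print(unique_flag)
print(all_a)
print(single_b)
```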

    Each Index has a number of methods and properties for set logic, which answer other common questions about the data it contains. Some useful ones are summarized in Table 5-2:

    • append Concatenate with additional Index objects, producing a new index
    • difference Compute set difference as Index
    • intersection Compute set intersection
    • union Compute set union
    • isin Check whether each value is contained in the passed collection
    • delete Compute new index with element at index i deleted
    • drop Compute new index by deleting passed values
    • insert Compute new index by inserting element at index i
    • is_unique Return True if the index has no duplicate values
    • unique Compute the array of unique values in the index.
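
    A few of the methods listed above, tried on two small indexes made up for the example. Results are sorted before printing where ordering is not the point:

```python
import pandas as pd

a = pd.Index(['Ohio', 'Texas', 'Oregon'])
b = pd.Index(['Texas', 'Utah'])

union_labels = sorted(a.union(b))            # labels in either index
common_labels = list(a.intersection(b))      # labels in both
only_in_a = sorted(a.difference(b))          # labels in a but not in b
membership = a.isin(['Texas', 'Utah']).tolist()  # one boolean per label of a
appended = list(a.append(b))                 # concatenation, duplicates kept

print(union_labels)
print(common_labels)
print(only_in_a)
print(membership)
print(appended)
```

    Note that append keeps the duplicate 'Texas', whereas union does not.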
  • Original post: https://www.cnblogs.com/chenjieyouge/p/11869423.html