  • Usage of and differences between np.where, pd.Series.where, and pd.DataFrame.where

    np.where works differently from pd.Series.where and pd.DataFrame.where. The notes below go through each in turn and summarize the differences:

    import numpy as np
    import pandas as pd
    
    help(np.where)
    
    Help on built-in function where in module numpy.core.multiarray:
    
    where(...)
        where(condition, [x, y])
        
        Return elements, either from `x` or `y`, depending on `condition`.
        
        If only `condition` is given, return ``condition.nonzero()``.
        
        Parameters
        ----------
        condition : array_like, bool
            When True, yield `x`, otherwise yield `y`.
        x, y : array_like, optional
            Values from which to choose. `x`, `y` and `condition` need to be
            broadcastable to some shape.
        
        Returns
        -------
        out : ndarray or tuple of ndarrays
            If both `x` and `y` are specified, the output array contains
            elements of `x` where `condition` is True, and elements from
            `y` elsewhere.
        
            If only `condition` is given, return the tuple
            ``condition.nonzero()``, the indices where `condition` is True.
        
        See Also
        --------
        nonzero, choose
        
        Notes
        -----
        If `x` and `y` are given and input arrays are 1-D, `where` is
        equivalent to::
        
            [xv if c else yv for (c,xv,yv) in zip(condition,x,y)]
        
        Examples
        --------
        >>> np.where([[True, False], [True, True]],
        ...          [[1, 2], [3, 4]],
        ...          [[9, 8], [7, 6]])
        array([[1, 8],
               [3, 4]])
        
        >>> np.where([[0, 1], [1, 0]])
        (array([0, 1]), array([1, 0]))
        
        >>> x = np.arange(9.).reshape(3, 3)
        >>> np.where( x > 5 )
        (array([2, 2, 2]), array([0, 1, 2]))
        >>> x[np.where( x > 3.0 )]               # Note: result is 1D.
        array([ 4.,  5.,  6.,  7.,  8.])
        >>> np.where(x < 5, x, -1)               # Note: broadcasting.
        array([[ 0.,  1.,  2.],
               [ 3.,  4., -1.],
               [-1., -1., -1.]])
        
        Find the indices of elements of `x` that are in `goodvalues`.
        
        >>> goodvalues = [3, 4, 7]
        >>> ix = np.isin(x, goodvalues)
        >>> ix
        array([[False, False, False],
               [ True,  True, False],
               [False,  True, False]])
        >>> np.where(ix)
        (array([1, 1, 2]), array([0, 1, 1]))
    

    • Using np.where

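    The help text above shows that np.where takes a condition plus two optional arguments, x and y.
    Whether x and y are supplied, and what shapes they have, determines the result. Without x and y, np.where behaves like np.nonzero: it returns a tuple of index arrays, one per dimension, locating the elements of condition that are True or nonzero. With x and y, the three inputs condition, x, and y are broadcast to a common shape if their shapes differ, and each output element is taken from x where condition is True and from y otherwise.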

    (1) Without the optional arguments x and y

    a=np.random.randint(0,high=2,size=(3,3));a
    
    array([[0, 1, 1],
           [1, 1, 0],
           [1, 1, 0]])
    
    np.where(a)
    
    (array([0, 0, 1, 1, 2, 2], dtype=int64),
     array([1, 2, 0, 1, 0, 1], dtype=int64))
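
The equivalence with np.nonzero described above can be checked directly. This is a minimal sketch that hard-codes the array printed above instead of drawing random values; the variable names are illustrative only.

```python
import numpy as np

# Fixed copy of the array shown above (the original was generated randomly).
a = np.array([[0, 1, 1],
              [1, 1, 0],
              [1, 1, 0]])

rows, cols = np.where(a)          # one index array per dimension
nz_rows, nz_cols = np.nonzero(a)  # the same indices via np.nonzero

same = np.array_equal(rows, nz_rows) and np.array_equal(cols, nz_cols)

# Pair the indices into (row, col) coordinates, one row per nonzero element.
coords = np.argwhere(a)

# The index tuple can be used directly to pull out the selected elements.
selected = a[rows, cols]
```

Indexing with the returned tuple always gives a 1-D result, regardless of the shape of `a`.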
    

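    (2) With x and y: the output has the shape obtained by broadcasting condition, x, and y together, and each element is then picked from x or y according to condition.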

    cond=np.array([True,False])
    
    x=np.arange(6).reshape(3,2);x
    
    array([[0, 1],
           [2, 3],
           [4, 5]])
    
    y=np.array([[100,200]])
    
    cond.shape
    
    (2,)
    
    x.shape
    
    (3, 2)
    
    y.shape
    
    (1, 2)
    

    So the broadcast shape should be (3, 2):

    result=np.where(cond,x,y);result
    
    array([[  0, 200],
           [  2, 200],
           [  4, 200]])
    
    result.shape
    
    (3, 2)
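
One way to confirm the broadcast shape before calling np.where, and to see the element-wise selection spelled out, is sketched below. It reuses cond, x, and y from above; `manual` is only an illustration of the selection rule, not how NumPy implements it.

```python
import numpy as np

cond = np.array([True, False])   # shape (2,)
x = np.arange(6).reshape(3, 2)   # shape (3, 2)
y = np.array([[100, 200]])       # shape (1, 2)

# np.broadcast reports the common shape without computing any values.
common_shape = np.broadcast(cond, x, y).shape

# Expand each input to the common shape, then select element by element.
cond_b, x_b, y_b = np.broadcast_arrays(cond, x, y)
manual = np.array([[xv if c else yv for c, xv, yv in zip(cr, xr, yr)]
                   for cr, xr, yr in zip(cond_b, x_b, y_b)])

result = np.where(cond, x, y)
matches = np.array_equal(manual, result)
```

Because cond is broadcast along the rows, the first column always comes from x and the second column always comes from y.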
    
    • where in pandas
    help(pd.DataFrame.where)
    
    Help on function where in module pandas.core.generic:
    
    where(self, cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False, raise_on_error=None)
        Return an object of same shape as self and whose corresponding
        entries are from self where `cond` is True and otherwise are from
        `other`.
        
        Parameters
        ----------
        cond : boolean NDFrame, array-like, or callable
            Where `cond` is True, keep the original value. Where
            False, replace with corresponding value from `other`.
            If `cond` is callable, it is computed on the NDFrame and
            should return boolean NDFrame or array. The callable must
            not change input NDFrame (though pandas doesn't check it).
        
            .. versionadded:: 0.18.1
                A callable can be used as cond.
        
        other : scalar, NDFrame, or callable
            Entries where `cond` is False are replaced with
            corresponding value from `other`.
            If other is callable, it is computed on the NDFrame and
            should return scalar or NDFrame. The callable must not
            change input NDFrame (though pandas doesn't check it).
        
            .. versionadded:: 0.18.1
                A callable can be used as other.
        
        inplace : boolean, default False
            Whether to perform the operation in place on the data
        axis : alignment axis if needed, default None
        level : alignment level if needed, default None
        errors : str, {'raise', 'ignore'}, default 'raise'
            - ``raise`` : allow exceptions to be raised
            - ``ignore`` : suppress exceptions. On error return original object
        
            Note that currently this parameter won't affect
            the results and will always coerce to a suitable dtype.
        
        try_cast : boolean, default False
            try to cast the result back to the input type (if possible),
        raise_on_error : boolean, default True
            Whether to raise on invalid data types (e.g. trying to where on
            strings)
        
            .. deprecated:: 0.21.0
        
        Returns
        -------
        wh : same type as caller
        
        Notes
        -----
        The where method is an application of the if-then idiom. For each
        element in the calling DataFrame, if ``cond`` is ``True`` the
        element is used; otherwise the corresponding element from the DataFrame
        ``other`` is used.
        
        The signature for :func:`DataFrame.where` differs from
        :func:`numpy.where`. Roughly ``df1.where(m, df2)`` is equivalent to
        ``np.where(m, df1, df2)``.
        
        For further details and examples see the ``where`` documentation in
        :ref:`indexing <indexing.where_mask>`.
        
        Examples
        --------
        >>> s = pd.Series(range(5))
        >>> s.where(s > 0)
        0    NaN
        1    1.0
        2    2.0
        3    3.0
        4    4.0
        
        >>> s.mask(s > 0)
        0    0.0
        1    NaN
        2    NaN
        3    NaN
        4    NaN
        
        >>> s.where(s > 1, 10)
        0    10.0
        1    10.0
        2    2.0
        3    3.0
        4    4.0
        
        >>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
        >>> m = df % 3 == 0
        >>> df.where(m, -df)
           A  B
        0  0 -1
        1 -2  3
        2 -4 -5
        3  6 -7
        4 -8  9
        >>> df.where(m, -df) == np.where(m, df, -df)
              A     B
        0  True  True
        1  True  True
        2  True  True
        3  True  True
        4  True  True
        >>> df.where(m, -df) == df.mask(~m, -df)
              A     B
        0  True  True
        1  True  True
        2  True  True
        3  True  True
        4  True  True
        
        See Also
        --------
        :func:`DataFrame.mask`
    

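    The help text above shows that the where method of DataFrame and Series follows the if-then idiom: elements of the caller (a DataFrame or Series) are kept where cond is True and replaced with other (NaN by default) where cond is False. inplace defaults to False, in which case a new DataFrame or Series with the same shape as the caller is returned; with inplace=True the caller is modified in place. The mask method is the exact opposite: it replaces values where cond is True.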
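
The behaviour summarized above can be tried on a small Series. This is a rough sketch; the exact dtype of the results may vary between pandas versions (the help output above comes from an older release), so only the values are compared.

```python
import pandas as pd

s = pd.Series(range(5))

kept = s.where(s > 1)                        # False positions become NaN
filled = s.where(s > 1, 10)                  # False positions take the value 10
masked = s.mask(s > 1, 10)                   # mask replaces where cond is True
via_callable = s.where(lambda v: v > 1, 10)  # cond passed as a callable

# inplace=True modifies the caller instead of returning a new object.
s2 = s.copy()
s2.where(s2 > 1, 10, inplace=True)
```

Here `s` itself is left unchanged by every call except the inplace one, which operates on the copy `s2`.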

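    • Differences between np.where and the where method of DataFrame or Series:

    (1) In NumPy, where is a module-level function; ndarray objects have no where method. pandas has no module-level where function, so where can only be called on a DataFrame or Series object.

    (2) The condition of np.where can be an array or a boolean value. The cond of DataFrame.where and Series.where can be an array or a boolean value, and can also be a callable.
    (3) np.where takes x, the values to choose where condition is True. The pandas method follows the if-then idiom: the caller itself supplies the values for the True case, and only the replacement for the False case (other) is passed in.
    (4) The shape of the np.where result depends on condition, x, and y: it is the shape obtained by broadcasting the three. The pandas result always has the same shape as the caller.
    (5) The pandas method has an inplace parameter that decides whether a new object is returned or the caller is modified in place. np.where always assembles a new array, so it has no inplace parameter.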
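
Several of these points can be checked side by side. The sketch below reuses the DataFrame from the help text; the `hasattr` checks and the `grown` array are illustrative additions, not part of the original post.

```python
import numpy as np
import pandas as pd

# Point (1): where lives on the module in NumPy and on the objects in pandas.
ndarray_has_where = hasattr(np.ndarray, "where")
pandas_has_where = hasattr(pd, "where")

df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=["A", "B"])
m = df % 3 == 0

# Point (3): pandas keeps the caller's values where m is True,
# while np.where needs both branches passed explicitly.
pd_result = df.where(m, -df)      # DataFrame, keeps index and column labels
np_result = np.where(m, df, -df)  # plain ndarray, labels are dropped
same_values = np.array_equal(pd_result.to_numpy(), np_result)

# Point (4): np.where can produce a shape none of its inputs has,
# here (3, 2) from shapes (2,), (3, 1) and a scalar.
grown = np.where(np.array([True, False]), np.arange(3).reshape(3, 1), -1)
```

The values agree, but only the pandas result carries the labels and is guaranteed to have the caller's shape.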

  • Original article: https://www.cnblogs.com/johnyang/p/14456719.html