  • Understanding ToTensor after loading images in PyTorch (with a comparison of reading images via PIL and OpenCV)

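    Overview

    In a typical deep-learning image task, PyTorch reads images through a Dataset class and a DataLoader class and applies a transform as each image is read. That transform almost always includes ToTensor(), which converts the PIL or OpenCV image loaded inside the Dataset's __getitem__() method into a torch.FloatTensor. The details are as follows: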

    PIL and OpenCV data formats

    1. PIL (RGB)
      PIL (Python Imaging Library) is the most basic image-processing library in Python. Typical usage:
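    from PIL import Image
    import numpy as np
    image = Image.open('test.jpg') # the image is 400x300 (width x height)
    print(type(image)) # out: <class 'PIL.JpegImagePlugin.JpegImageFile'>
    print(image.size)  # out: (400, 300)
    print(image.mode) # out: RGB
    print(image.getpixel((0,0))) # out: (143, 198, 201)
    # resize takes (w, h)
    image = image.resize((200,100),Image.NEAREST)
    print(image.size) # out: (200, 100)
    '''
    Code notes
    **Note that image is a :class:`~PIL.Image.Image` object.** It has many attributes: for example, its size is (w, h) and its mode is RGB. It also has many methods, such as getpixel((x, y)), which returns the three channel values of the pixel at a given position; x can be at most w-1 and y at most h-1.
    Another example is the resize method, which scales the image. Its parameters are: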
    resize(self, size, resample=0) method of PIL.Image.Image instance
        Returns a resized copy of this image.
    
        :param size: The requested size in pixels, as a 2-tuple:
           (width, height). 
        Note: size is (w, h), the same order as image.size
        :param resample: An optional resampling filter.  This can be
           one of :py:attr:`PIL.Image.NEAREST`, :py:attr:`PIL.Image.BOX`,
           :py:attr:`PIL.Image.BILINEAR`, :py:attr:`PIL.Image.HAMMING`,
           :py:attr:`PIL.Image.BICUBIC` or :py:attr:`PIL.Image.LANCZOS`.
           If omitted, or if the image has mode "1" or "P", it is
           set :py:attr:`PIL.Image.NEAREST`.
           See: :ref:`concept-filters`.
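        Note on these filters: NEAREST (nearest neighbor) is the default in the Pillow version quoted here and is commonly used for segmentation; BILINEAR and BICUBIC are commonly used for classification. Newer Pillow releases default to BICUBIC for resize().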
        :returns: An :py:class:`~PIL.Image.Image` object.
    
    '''
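    image = np.array(image,dtype=np.float32) # plain np.array(image) would give uint8
    print(image.shape) # out: (100, 200, 3)
    # Note that w and h have swapped: the array layout is (h, w, c)
    # An ndarray is indexed as row x column x channel, so the number of rows is the height and the number of columns is the width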
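
    The relationship between PIL's (w, h) convention and the ndarray's (h, w, c) layout can be checked with a small sketch. It assumes a synthetic 4x3 image built in memory rather than test.jpg, and the pixel position used is arbitrary.

```python
import numpy as np
from PIL import Image

# Synthetic 4x3 (width x height) RGB image, so no file on disk is needed.
image = Image.new("RGB", (4, 3), color=(0, 0, 0))
image.putpixel((3, 1), (143, 198, 201))  # x=3 (column), y=1 (row)

array = np.array(image)  # uint8 by default

print(image.size)   # PIL reports (width, height)
print(array.shape)  # NumPy reports (height, width, channels)

# PIL addresses a pixel as (x, y); the ndarray addresses it as [row, col] = [y, x].
x, y = 3, 1
pil_pixel = image.getpixel((x, y))
array_pixel = tuple(int(v) for v in array[y, x])
print(pil_pixel, array_pixel)
```

    The same pixel is therefore image.getpixel((x, y)) on the PIL side and array[y, x] on the NumPy side.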
    
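    2. OpenCV (Python bindings) (BGR)
      OpenCV is a powerful image-processing library with broader applicability. It appears in all kinds of settings, performs well, and has plenty of example code available. Typical usage:
    import cv2
    import numpy as np
    image = cv2.imread('test.jpg')
    print(type(image)) # out: <class 'numpy.ndarray'>
    print(image.dtype) # out: uint8
    print(image.shape) # out: (300, 400, 3) (h,w,c), similar to skimage
    print(image) # BGR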
    '''
    array([
            [ [201, 198, 143 (dim=3)],[201, 198, 143],... (w=400)],
            [ [201, 198, 143],[201, 198, 143],... ],
            ...(h=300)
          ], dtype=uint8)
    
    '''
    # dsize is (w, h)
    image = cv2.resize(image,(100,200),interpolation=cv2.INTER_LINEAR)
    print(image.dtype) # out: uint8
    print(image.shape) # out: (200, 100, 3)
    '''
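    Note: this differs from skimage
    resize(src, dsize[, dst[, fx[, fy[, interpolation]]]])
    The keyword arguments are dst, fx, fy, and interpolation
    dst is the resized output image
    dsize is (w, h), whereas the image array is (h, w, c)
    fx and fy are the scale factors along the x and y axes
    interpolation is the interpolation method used when resizing; common choices include:
    cv2.INTER_AREA: resampling using pixel area relation. When shrinking an image it avoids moiré artifacts; when enlarging, it behaves like cv2.INTER_NEAREST
    cv2.INTER_CUBIC: bicubic interpolation
    cv2.INTER_LINEAR: bilinear interpolation
    cv2.INTER_NEAREST: nearest-neighbor interpolation
    [See this blog post for details](http://www.tuicool.com/articles/rq6fIn)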
    '''
    '''
    cv2.imread(filename, flags=None):
    flag:
    cv2.IMREAD_COLOR 1: Loads a color image. Any transparency of image will be neglected. It is the default flag. (a regular 3-channel image)
    cv2.IMREAD_GRAYSCALE 0: Loads image in grayscale mode (a single-channel grayscale image)
    cv2.IMREAD_UNCHANGED -1: Loads image as such including alpha channel (for example, a 4-channel image)
    Note: the default is cv2.IMREAD_COLOR, so cv2.imread('gray.png') on a grayscale file returns a 3-channel image whose three channels hold identical values
    
    '''
    

    In summary, a PIL image converted to numpy.ndarray has layout (h,w,c) with channels in RGB order.
    An image returned by cv2.imread() is a numpy.ndarray with layout (h,w,c) and channels in BGR order.
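
    The RGB versus BGR difference can be illustrated without OpenCV installed. The sketch below is an assumption-laden stand-in: it builds a synthetic RGB image with PIL and simulates the BGR array that an OpenCV-style reader would return by reversing the channel axis.

```python
import numpy as np
from PIL import Image

# Synthetic stand-in for an image file: every pixel is RGB (143, 198, 201).
pil_image = Image.new("RGB", (4, 3), color=(143, 198, 201))
rgb = np.array(pil_image)  # (h, w, c), channels in RGB order

# Simulate what a BGR array of the same picture would hold.
bgr = rgb[:, :, ::-1]
print(rgb[0, 0].tolist(), bgr[0, 0].tolist())

# Reversing the last axis again restores RGB order. ascontiguousarray makes
# a copy with positive strides, which is safer to hand to other libraries.
rgb_again = np.ascontiguousarray(bgr[:, :, ::-1])
print(np.array_equal(rgb_again, rgb))
```

    With OpenCV available, cv2.cvtColor(image, cv2.COLOR_BGR2RGB) performs the same reordering. Since to_tensor() does not reorder channels, a pipeline that mixes both readers needs to settle on one channel order before ToTensor() is applied.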

    torchvision.transforms.ToTensor()

    torchvision.transforms.transforms.py:61

    class ToTensor(object):
        """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.
    
        Converts a PIL Image or numpy.ndarray (H x W x C) in the range
        [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
        """
    
        def __call__(self, pic):
            """
            Args:
                pic (PIL Image or numpy.ndarray): Image to be converted to tensor.
    
            Returns:
                Tensor: Converted image.
            """
            return F.to_tensor(pic)
    
        def __repr__(self):
            return self.__class__.__name__ + '()'
    

    torchvision.transforms.functional.py:32

    def to_tensor(pic):
        """Convert a ``PIL Image`` or ``numpy.ndarray`` to tensor.
    
        See ``ToTensor`` for more details.
    
        Args:
            pic (PIL Image or numpy.ndarray): Image to be converted to tensor.
    
        Returns:
            Tensor: Converted image.
        """
        if not(_is_pil_image(pic) or _is_numpy_image(pic)):
            raise TypeError('pic should be PIL Image or ndarray. Got {}'.format(type(pic)))
    
        if isinstance(pic, np.ndarray):
            # handle numpy array
            img = torch.from_numpy(pic.transpose((2, 0, 1)))
            # backward compatibility
            if isinstance(img, torch.ByteTensor):
                return img.float().div(255)
            else:
                return img
    
        if accimage is not None and isinstance(pic, accimage.Image):
            nppic = np.zeros([pic.channels, pic.height, pic.width], dtype=np.float32)
            pic.copyto(nppic)
            return torch.from_numpy(nppic)
    
        # handle PIL Image
        if pic.mode == 'I':
            img = torch.from_numpy(np.array(pic, np.int32, copy=False))
        elif pic.mode == 'I;16':
            img = torch.from_numpy(np.array(pic, np.int16, copy=False))
        elif pic.mode == 'F':
            img = torch.from_numpy(np.array(pic, np.float32, copy=False))
        elif pic.mode == '1':
            img = 255 * torch.from_numpy(np.array(pic, np.uint8, copy=False))
        else:
            img = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes()))
        # PIL image mode: L, P, I, F, RGB, YCbCr, RGBA, CMYK
        if pic.mode == 'YCbCr':
            nchannel = 3
        elif pic.mode == 'I;16':
            nchannel = 1
        else:
            nchannel = len(pic.mode)
        img = img.view(pic.size[1], pic.size[0], nchannel)
        # put it from HWC to CHW format
        # yikes, this transpose takes 80% of the loading time/CPU
        img = img.transpose(0, 1).transpose(0, 2).contiguous()
        if isinstance(img, torch.ByteTensor):
            return img.float().div(255)
        else:
            return img
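
    The generic PIL branch above (raw bytes, then a (h, w, c) view, then a transpose to (c, h, w), then division by 255) can be re-created in NumPy as a rough sketch. The function name pil_to_chw_float is hypothetical, and the sketch only covers 8-bit modes such as 'L', 'RGB', and 'RGBA', where len(pic.mode) equals the channel count.

```python
import numpy as np
from PIL import Image


def pil_to_chw_float(pic):
    """Hypothetical NumPy re-creation of the generic PIL branch of to_tensor()."""
    nchannel = len(pic.mode)  # 'RGB' -> 3, 'L' -> 1
    flat = np.frombuffer(pic.tobytes(), dtype=np.uint8)
    # pic.size is (w, h), so the view is built as (h, w, c).
    hwc = flat.reshape(pic.size[1], pic.size[0], nchannel)
    # HWC -> CHW, made contiguous like the .contiguous() call in the original.
    chw = np.ascontiguousarray(hwc.transpose(2, 0, 1))
    # uint8 data is converted to float32 and scaled to [0.0, 1.0].
    return chw.astype(np.float32) / 255.0


pic = Image.new("RGB", (4, 3), color=(143, 198, 201))
out = pil_to_chw_float(pic)
print(out.shape, out.dtype)
```

    The single transpose(2, 0, 1) here yields the same layout as the two chained transposes in the torchvision code.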
    

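    As the to_tensor() function shows, it accepts a PIL Image or a numpy.ndarray and first transposes it from HWC to CHW layout. If the data is uint8 (a ByteTensor), it is then converted to float and every pixel is divided by 255. Data of any other dtype, such as a float32 ndarray, is returned after the transpose without scaling.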
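
    The dtype dependence of the ndarray branch in to_tensor() can be sketched in plain NumPy. The helper name ndarray_to_chw is hypothetical, and the uint8 check stands in for the ByteTensor check in the torchvision code.

```python
import numpy as np


def ndarray_to_chw(pic):
    """Hypothetical NumPy mirror of the ndarray branch of to_tensor()."""
    chw = pic.transpose((2, 0, 1))  # HWC -> CHW
    if chw.dtype == np.uint8:       # plays the role of the ByteTensor check
        return chw.astype(np.float32) / 255.0
    return chw                      # other dtypes pass through unscaled


hwc_uint8 = np.full((3, 4, 3), 255, dtype=np.uint8)
hwc_float = hwc_uint8.astype(np.float32)  # like np.array(image, dtype=np.float32)

scaled = ndarray_to_chw(hwc_uint8)
unscaled = ndarray_to_chw(hwc_float)
print(scaled.max(), unscaled.max())
```

    Under this logic, an array created with dtype=np.float32 keeps its [0, 255] range, so it would need an explicit division by 255 if values in [0.0, 1.0] are wanted.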

  • Original article: https://www.cnblogs.com/ocean1100/p/9494640.html