A guide to convolution arithmetic for deep learning: Transposed Convolution
Transposed convolutions – also called fractionally strided convolutions or deconvolutions – work by swapping the forward and backward passes of a convolution. One way to put it is to note that the kernel defines a convolution, but whether it’s a direct convolution or a transposed convolution is determined by how the forward and backward passes are computed.
For instance, let $C$ be the sparse matrix that expresses the convolution as a matrix product on the flattened input. Although the kernel $w$ defines a convolution whose forward and backward passes are computed by multiplying with $C$ and $C^T$ respectively, it also defines a transposed convolution whose forward and backward passes are computed by multiplying with $C^T$ and $(C^T)^T = C$ respectively.
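The matrix view can be made concrete with a small sketch. It assumes a 3x3 kernel, a 4x4 input, unit stride and no padding; the kernel values and the helper names `conv_matrix`, `matvec` and `transpose` are made up for illustration. As is usual in deep learning, "convolution" here means a sliding-window product without flipping the kernel.

```python
def conv_matrix(kernel, in_size):
    """Build the matrix C of a unit-stride, unpadded 2D convolution.

    Each row of C computes one output pixel from the flattened input.
    """
    k = len(kernel)
    out_size = in_size - k + 1
    C = [[0.0] * (in_size * in_size) for _ in range(out_size * out_size)]
    for oi in range(out_size):
        for oj in range(out_size):
            row = oi * out_size + oj
            for ki in range(k):
                for kj in range(k):
                    col = (oi + ki) * in_size + (oj + kj)
                    C[row][col] = kernel[ki][kj]
    return C


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Return the transpose of M as a list of rows."""
    return [list(col) for col in zip(*M)]


# Illustrative kernel values, not taken from any trained model.
kernel = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
C = conv_matrix(kernel, in_size=4)       # 4 rows, 16 columns

x = [float(i) for i in range(16)]        # flattened 4x4 input
y = matvec(C, x)                         # direct convolution: 16 values -> 4
z = matvec(transpose(C), y)              # transposed convolution: 4 values -> 16

print(len(C), len(C[0]))  # 4 16
print(len(y), len(z))     # 4 16
```

The same matrix `C` serves both operations: multiplying by `C` maps the 4x4 input to a 2x2 output, and multiplying by its transpose maps a 2x2 input back to a 4x4 shape. Only the shape is recovered, not the original values.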
Finally, note that it is always possible to emulate a transposed convolution with a direct convolution. The disadvantage is that it usually involves adding many columns and rows of zeros to the input, resulting in a much less efficient implementation.
Notably, in the equivalent direct convolution the kernel size and stride remain the same, but the input of the transposed convolution is now zero padded. One way to understand the logic behind zero padding is to consider the connectivity pattern of the transposed convolution and use it to guide the design of the equivalent convolution. For example, the top left pixel of the input of the direct convolution only contributes to the top left pixel of the output, the top right pixel is only connected to the top right output pixel, and so on.
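The emulation can be checked numerically with a minimal sketch, again assuming a 3x3 kernel, unit stride and no padding in the original convolution. All function names and values below are hypothetical. One detail the sketch makes explicit: when "convolution" is implemented as a sliding-window product without kernel flipping, the equivalent direct convolution uses the kernel rotated by 180 degrees, and the input is padded with k - 1 zeros on every side.

```python
def correlate2d(x, w):
    """Unit-stride, unpadded sliding-window product of square x with square w."""
    n, k = len(x), len(w)
    out = n - k + 1
    return [[sum(w[a][b] * x[i + a][j + b] for a in range(k) for b in range(k))
             for j in range(out)] for i in range(out)]


def zero_pad(x, p):
    """Surround square x with a border of p zeros on every side."""
    n = len(x)
    size = n + 2 * p
    padded = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            padded[i + p][j + p] = x[i][j]
    return padded


def transposed_conv_scatter(y, w):
    """Reference: every input pixel adds a scaled copy of the kernel to the output."""
    n, k = len(y), len(w)
    size = n + k - 1
    out = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            for a in range(k):
                for b in range(k):
                    out[i + a][j + b] += y[i][j] * w[a][b]
    return out


def transposed_conv_emulated(y, w):
    """Emulation: zero pad the input, then apply a direct convolution."""
    k = len(w)
    flipped = [row[::-1] for row in w[::-1]]  # kernel rotated by 180 degrees
    return correlate2d(zero_pad(y, k - 1), flipped)


# Illustrative values only.
w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
y = [[1.0, 2.0], [3.0, 4.0]]

reference = transposed_conv_scatter(y, w)
emulated = transposed_conv_emulated(y, w)
print(reference == emulated)  # True
```

The corners of the result reflect the connectivity pattern: `reference[0][0]` depends only on `y[0][0]`, and `reference[0][3]` only on `y[0][1]`. The padded input is 6x6 although only 4 of its 36 entries are nonzero, which is where the inefficiency of the emulation comes from.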