  • Reading notes on "The challenge of realistic music generation: modelling raw audio at scale"

    The challenge of realistic music generation: modelling raw audio at scale

    Authors: three DeepMind researchers

    Venue: NIPS 2018

    • Abstract

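    The abstract first points out that music generation based on high-level representations (such as scores or MIDI) has problems of its own: after such heavy abstraction, some of the fine detail in the music is lost, and with it some of the perception of musicality and realism. The paper therefore generates music directly in the raw audio domain. Autoregressive models do well on speech waveforms, but on music "we find them biased towards capturing local signal structure at the expense of modelling long-range correlations". The paper therefore proposes autoregressive discrete autoencoders (ADAs) to help autoregressive models capture long-range correlations in waveforms.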

    • Introduction

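    The introduction stresses that music exhibits structure at many different timescales, and lists the limitations of representations such as MIDI, chiefly the loss of details tied to musicality and to the particular instrument.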

    1.1 raw audio signal

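    This section talks up the advantages of working with the waveform signal, compares it with the MIDI-style representations mentioned above, and points out that modelling at the waveform level is harder and more challenging.

    1.2 Related generative models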

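    Compared with symbolic representations, generative models of audio waveforms have a short research history. The reason given: "This was long thought to be infeasible due to the scale of the problem, as audio signals are often sampled at rates of 16 kHz or higher" (I do not fully understand why; presumably the sampling cost is high). Recent autoregressive models generate audio one step at a time, e.g. WaveNet, VRNN, WaveRNN and SampleRNN, which gets around the sampling-cost problem. The paper also mentions using GANs to generate waveforms.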
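    One way to read the "scale of the problem" remark is simply to count samples: every sample is one step for an autoregressive model. The sketch below is only arithmetic on the 16 kHz rate quoted above; the function name and the clip lengths are illustrative and do not come from the paper.

```python
SAMPLE_RATE_HZ = 16_000  # the rate quoted in the notes


def num_samples(duration_seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of waveform samples (one autoregressive step each) in a clip."""
    return int(duration_seconds * sample_rate_hz)


# Illustrative clip lengths (assumed, not taken from the paper).
for label, seconds in [("1 second", 1), ("10 seconds", 10), ("1 minute", 60)]:
    print(f"{label}: {num_samples(seconds):,} samples")
```

    Even a one-minute clip is close to a million timesteps, so structure that spans tens of seconds lies very far apart in the sequence the model sees.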

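    Contributions: 1. Address generative modelling in the raw audio domain, which has received little attention in the literature and can serve as a benchmark for the "ability of a model to capture long-range structure in data"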

               2. We investigate the capabilities of autoregressive models for this task, and demonstrate a computationally efficient method to enlarge their receptive fields using autoregressive discrete autoencoders (ADAs)

               3. Introduce the argmax autoencoder (AMAE) as an alternative to vector quantisation variational autoencoders (VQ-VAE)
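    These notes do not describe how the AMAE works, so the following is only a rough sketch of how the two quantisation steps differ, assuming the names say what they suggest: VQ looks up the nearest vector in a codebook, while an argmax quantiser takes the index of the largest component of the encoder output. The function names, the toy codebook and the inputs are hypothetical, and everything to do with training (losses, gradient estimation) is left out.

```python
def vq_quantise(query, codebook):
    """VQ-style step: index of the codebook vector closest to the query
    (squared Euclidean distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sqdist(query, codebook[i]))


def argmax_quantise(query):
    """Argmax-style step: index of the largest component of the query."""
    return max(range(len(query)), key=lambda i: query[i])


# Toy values, chosen only for illustration.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
print(vq_quantise([0.9, 1.2], codebook))   # nearest entry is [1.0, 1.0]
print(argmax_quantise([0.1, 2.0, -1.0]))   # largest component is at index 1
```

    In both cases the continuous encoder output is mapped to a discrete code index; the difference in this sketch is that the argmax variant needs no separate codebook for the lookup.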

    • Scaling up autoregressive models for music

    Modelling long-range structure requires enlarging the receptive field. WaveNet and SampleRNN each propose their own way of doing so, but memory limits are quickly reached.

    (To be continued)
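    To see why the receptive field is the bottleneck, it helps to compute it for a WaveNet-style stack of dilated causal convolutions, where each layer with kernel size k and dilation d adds (k - 1) * d samples. The configuration below (kernel size 2, three blocks with dilations 1 to 512) is an assumed example, not the setup used in the paper.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field, in samples, of a stack of dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed configuration: 3 blocks of 10 layers with dilations 1, 2, 4, ..., 512.
dilations = [2 ** i for i in range(10)] * 3
rf = receptive_field(kernel_size=2, dilations=dilations)

print(rf)            # 3070 samples
print(rf / 16_000)   # about 0.19 seconds at 16 kHz
```

    Thirty layers cover only about a fifth of a second of audio at 16 kHz. Stacking more blocks grows the receptive field only linearly while each extra layer adds activations to store, which is one way to picture the memory ceiling mentioned above.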

    Key references:

    A recurrent latent variable model for sequential data

    Experiments in musical intelligence

    Synthesizing audio with generative adversarial networks

    Samplernn: An unconditional end-to-end neural audio generation model

     

  • Original post: https://www.cnblogs.com/punkcure/p/9277681.html