  • Reading notes: "The challenge of realistic music generation: modelling raw audio at scale"

    The challenge of realistic music generation: modelling raw audio at scale

    Authors: three researchers at DeepMind (Sander Dieleman, Aäron van den Oord, Karen Simonyan)

    Venue: NIPS 2018

    • Abstract

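    The abstract first notes that music generation based on high-level representations such as scores or MIDI has its own problems: that degree of abstraction discards some of the fine detail in music, which costs the result its perceived musicality and realism. This paper instead generates music directly in the raw audio domain. Autoregressive models perform well on speech waveforms, but when applied to music, "we find them biased towards capturing local signal structure at the expense of modelling long-range correlations". The paper therefore proposes autoregressive discrete autoencoders (ADAs) to help autoregressive models capture long-range correlations in waveforms.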

    • Introduction

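    The introduction stresses that music exhibits structure at many different timescales, and lists the limitations of representations such as MIDI, chiefly the loss of detail related to musicality and to the individual instruments.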

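    1.1 Raw audio signal

    This part makes the case for raw waveforms, setting out their advantages over the MIDI-style representations mentioned above, and points out that modelling at the waveform level is considerably more challenging.

    1.2 Related generative models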

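    Compared with symbolic representations, generative models of audio waveforms have a short research history. The paper gives the reason: "This was long thought to be infeasible due to the scale of the problem, as audio signals are often sampled at rates of 16 kHz or higher." In other words, the difficulty is sequence length: at 16 kHz every second of audio is 16,000 timesteps, so even a short clip is a very long sequence. Recent autoregressive models that generate the waveform one sample at a time, such as WaveNet, VRNN, WaveRNN and SampleRNN, showed that modelling at this scale is feasible. The paper also mentions the use of GANs to generate waveforms.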
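
    To make "the scale of the problem" concrete, here is a small arithmetic sketch. The clip durations are arbitrary illustrations and the helper name `num_samples` is made up for this note; only the 16 kHz rate comes from the paper.

```python
# Illustrative arithmetic only: a sample-by-sample autoregressive model makes
# one prediction per audio sample, so clip length in samples is the sequence
# length the model has to handle.

def num_samples(duration_seconds: float, sample_rate_hz: int = 16_000) -> int:
    """Number of samples (timesteps) in a mono clip of the given duration."""
    return int(round(duration_seconds * sample_rate_hz))


if __name__ == "__main__":
    # Durations chosen arbitrarily for illustration.
    for label, seconds in [("1 second", 1), ("10 seconds", 10),
                           ("1 minute", 60), ("4 minutes", 240)]:
        print(f"{label:>10}: {num_samples(seconds):>9,} timesteps")
```

    A four-minute piece at 16 kHz is already several million timesteps, which is why structure spanning tens of seconds is hard to capture when modelling one sample at a time.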

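    Contributions: 1. Addresses generative modelling of music in the raw audio domain, a problem that has received little attention in the literature and that can serve as a benchmark for the ability of a model to capture long-range structure in data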

               2. We investigate the capabilities of autoregressive models for this task, and demonstrate a computationally efficient method to enlarge their receptive fields using autoregressive discrete autoencoders (ADAs)

               3. We introduce the argmax autoencoder (AMAE) as an alternative to vector quantisation variational autoencoders (VQ-VAE)

    • Scaling up autoregressive models for music

    To model long-range structure, the receptive field has to be enlarged. WaveNet and SampleRNN each propose their own way of doing this, but memory limits are reached quickly.
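
    As a rough illustration of how short such receptive fields are in musical terms, the sketch below computes the receptive field of a stack of dilated causal convolutions, the mechanism WaveNet uses. The configuration (kernel size 2, dilations doubling from 1 to 512, three repeated blocks) is an assumption chosen for illustration, not a setting taken from the paper, and the function names are made up for this note.

```python
from typing import Sequence


def receptive_field(dilations: Sequence[int], kernel_size: int = 2) -> int:
    """Receptive field, in samples, of stacked dilated causal convolutions.

    Each layer with dilation d and kernel size k extends the receptive
    field by (k - 1) * d samples; the extra 1 is the current sample.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def to_milliseconds(samples: int, sample_rate_hz: int = 16_000) -> float:
    """Convert a length in samples to milliseconds at the given rate."""
    return 1000.0 * samples / sample_rate_hz


if __name__ == "__main__":
    # Hypothetical WaveNet-like stack: dilations 1, 2, 4, ..., 512, three times.
    one_block = [2 ** i for i in range(10)]
    dilations = one_block * 3
    rf = receptive_field(dilations)
    print(f"{len(dilations)} layers, {rf} samples, "
          f"{to_milliseconds(rf):.1f} ms at 16 kHz")
```

    Under these assumptions, thirty layers cover only a fraction of a second of audio. Every added layer also adds activations that have to be kept in memory during training, so stacking more layers alone does not stretch the receptive field to musically meaningful timescales.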

    (To be continued)

    Key references:

    A recurrent latent variable model for sequential data

    Experiments in musical intelligence

    Synthesizing audio with generative adversarial networks

    SampleRNN: An unconditional end-to-end neural audio generation model

     

  • Original post: https://www.cnblogs.com/punkcure/p/9277681.html