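  • Multimedia file formats: FLV

    [Date: 2016-07] [Status: Open]

    FLV is a relatively simple multimedia format that supports only a single program: each FLV file can contain at most one audio stream and at most one video stream. FLV (Flash Video) is a free, open audio/video format from Adobe. Because it is widely used in streaming media, it is worth a brief look.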

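    0. Goals when studying a multimedia container format


    1. How is the data in the container organized?
    2. Which codecs can the container carry, and how is that data stored?
    3. What metadata does the container include? What program information?
    4. For containers that support multiple programs, how do you locate the corresponding audio, video, and subtitle streams?
    5. How do you determine the playback duration of a program in the container?
    6. How do you extract audio, video, and subtitle data from the container and hand it to a decoder? Are timestamps available?
    7. Does the container support seeking? What auxiliary information is there for it?
    8. Does it support direct streaming?
    9. Where can the most authoritative documentation for the format be found?
    10. What tools are available for analyzing anomalies or errors in the container?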

    1. Overall FLV structure


       prev tag size
         tag 0
       prev tag size
         tag N
       prev tag size   


    2. FLV header

    The FLV header is 9 bytes long. Its fields are listed in the table below:

    Field              Type   Comment
    Signature          UI8    Signature byte, always 'F' (0x46)
    Signature          UI8    Signature byte, always 'L' (0x4C)
    Signature          UI8    Signature byte, always 'V' (0x56)
    Version            UI8    File version (for example, 0x01 for FLV version 1)
    TypeFlagsReserved  UB[5]  Must be 0
    TypeFlagsAudio     UB[1]  Audio tags are present
    TypeFlagsReserved  UB[1]  Must be 0
    TypeFlagsVideo     UB[1]  Video tags are present
    DataOffset         UI32   Offset in bytes from start of file to start of body (that is, size of header)

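    The first three bytes are the file signature, which is always "FLV" (bytes 0x46, 0x4C, 0x56); it can be used to identify an FLV file.
    Note that UB in the table above denotes a bit field: UB[n] is a field n bits long.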

    Here are the bytes of a fairly typical FLV header:
    46 4c 56 01 05 00 00 00 09
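
    As a minimal sketch of how the table maps onto these nine bytes, the Python snippet below decodes the sample header. The function name parse_flv_header and the dictionary keys are illustrative choices, not part of the format; multi-byte fields are read as big-endian, and the two flag bits are taken from the low end of the fifth byte as laid out in the table.

```python
import struct


def parse_flv_header(data: bytes) -> dict:
    """Decode the 9-byte FLV header (hypothetical helper for illustration)."""
    if len(data) < 9:
        raise ValueError("FLV header needs at least 9 bytes")
    if data[0:3] != b"FLV":
        raise ValueError("missing 'FLV' signature")
    flags = data[4]
    # DataOffset is a big-endian UI32 stored in bytes 5..8.
    (data_offset,) = struct.unpack(">I", data[5:9])
    return {
        "version": data[3],
        "has_audio": bool(flags & 0x04),  # TypeFlagsAudio: bit 2
        "has_video": bool(flags & 0x01),  # TypeFlagsVideo: bit 0
        "data_offset": data_offset,
    }


header = parse_flv_header(bytes.fromhex("46 4c 56 01 05 00 00 00 09"))
print(header)
```

    For the sample bytes, the flags byte 0x05 has both the audio and the video bit set, and DataOffset is 9, the size of the header itself.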

    3. FLV body

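    The body follows immediately after the FLV header. The specification recommends using the DataOffset field to locate the start of the FLV body. The body holds all of the audio, video, and script data.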

    Field               Type    Comment
    PreviousTagSize0    UI32    Always 0
    Tag1                FLVTAG  First tag
    PreviousTagSize1    UI32    Size of previous tag, including its header, in bytes. For FLV version 1, this value is 11 plus the DataSize of the previous tag.
    Tag2                FLVTAG  Second tag
    ...                 ...     ...
    PreviousTagSizeN-1  UI32    Size of second-to-last tag, including its header, in bytes.
    TagN                FLVTAG  Last tag
    PreviousTagSizeN    UI32    Size of last tag, including its header, in bytes.
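
    The alternation of PreviousTagSize fields and tags can be walked with a simple loop. The sketch below is one possible way to do it, assuming an unencrypted FLV version 1 body already loaded into memory; iter_tags and make_tag are hypothetical helpers, and the demo body is synthetic rather than taken from a real file. It only uses the tag type and DataSize from the 11-byte tag header that precedes each tag's data.

```python
import struct


def iter_tags(body: bytes):
    """Yield (tag_type, payload) for each tag in an FLV body.

    `body` must start at PreviousTagSize0, i.e. at DataOffset in the file.
    """
    pos = 0
    expected_prev = 0  # PreviousTagSize0 is always 0
    while pos + 4 <= len(body):
        (prev_size,) = struct.unpack(">I", body[pos:pos + 4])
        if prev_size != expected_prev:
            raise ValueError(f"PreviousTagSize mismatch at body offset {pos}")
        pos += 4
        if pos == len(body):
            return  # that was the final PreviousTagSizeN
        if pos + 11 > len(body):
            raise ValueError("truncated tag header")
        tag_type = body[pos] & 0x1F  # low 5 bits of the first byte
        data_size = int.from_bytes(body[pos + 1:pos + 4], "big")
        end = pos + 11 + data_size
        if end > len(body):
            raise ValueError("truncated tag data")
        yield tag_type, body[pos + 11:end]
        expected_prev = 11 + data_size  # 11-byte tag header plus DataSize
        pos = end


def make_tag(tag_type: int, payload: bytes) -> bytes:
    """Build a synthetic tag for the demo; timestamp and StreamID are zero."""
    return bytes([tag_type]) + len(payload).to_bytes(3, "big") + bytes(7) + payload


script, video = make_tag(18, b"meta"), make_tag(9, b"\x17\x00")
body = (struct.pack(">I", 0) + script
        + struct.pack(">I", len(script)) + video
        + struct.pack(">I", len(video)))
tags = list(iter_tags(body))
print(tags)
```

    Checking each PreviousTagSize against 11 plus the previous DataSize is a cheap consistency test when analyzing damaged files. The same field also lets a reader step backwards from the end of the file one tag at a time.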


    An FLV tag is one of three kinds: VideoTag, AudioTag, or ScriptTag. They are distinguished by the TagType field. The tag structure is defined as follows:

    Field               Type                                  Comment
    Reserved            UB[2]                                 Reserved for FMS, should be 0
    Filter              UB[1]                                 Indicates if packets are filtered.
                                                              0 = No pre-processing required.
                                                              1 = Pre-processing (such as decryption) of the packet is required before it can be rendered.
                                                              Shall be 0 in unencrypted files, and 1 for encrypted tags.
    TagType             UB[5]                                 Type of contents in this tag. The following types are defined:
                                                              8 = audio
                                                              9 = video
                                                              18 = script data
    DataSize            UI24                                  Length of the message. Number of bytes after StreamID to end of tag (equal to the length of the tag minus 11)
    Timestamp           UI24                                  Time in milliseconds at which the data in this tag applies. This value is relative to the first tag in the FLV file, which always has a timestamp of 0.
    TimestampExtended   UI8                                   Extension of the Timestamp field to form a SI32 value. This field represents the upper 8 bits, while the previous Timestamp field represents the lower 24 bits of the time in milliseconds.
    StreamID            UI24                                  Always 0.
    AudioTagHeader      IF TagType == 8: AudioTagHeader       AudioTagHeader element
    VideoTagHeader      IF TagType == 9: VideoTagHeader       VideoTagHeader element
    EncryptionHeader    IF Filter == 1: EncryptionTagHeader   Encryption header shall be included for each protected sample
    FilterParams        IF Filter == 1: FilterParams          FilterParams shall be included for each protected sample
    Data                IF TagType == 8: AUDIODATA            Data specific for each media type.
                        IF TagType == 9: VIDEODATA
                        IF TagType == 18: SCRIPTDATA

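    Byte 1: the low 5 bits are TagType, which identifies the type of the current tag: audio (0x08), video (0x09), or script data (0x12). Any other value is invalid.
    Bytes 2-4: an unsigned 24-bit integer giving the size of the current tag's data.
    Bytes 5-7: UI24, the lower 24 bits of the timestamp in milliseconds.
    Byte 8: UI8, TimestampExtended, the upper 8 bits of the timestamp.
    Bytes 9-11: UI24, the StreamID, always 0.

    What follows these 11 bytes is the type-specific tag header and the actual payload data.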
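
    The byte layout above can be decoded as in the following sketch. The function name parse_tag_header and the sample bytes are made up for illustration; the part worth noting is how TimestampExtended supplies the upper 8 bits of a signed 32-bit timestamp.

```python
def parse_tag_header(buf: bytes) -> dict:
    """Decode the fixed 11-byte part of an FLV tag (illustrative helper)."""
    if len(buf) < 11:
        raise ValueError("an FLV tag header is 11 bytes")
    first = buf[0]
    ts_low = int.from_bytes(buf[4:7], "big")   # Timestamp: lower 24 bits
    ts_high = buf[7]                           # TimestampExtended: upper 8 bits
    timestamp = (ts_high << 24) | ts_low
    if timestamp & 0x80000000:                 # the combined value is an SI32
        timestamp -= 1 << 32
    return {
        "filter": (first >> 5) & 0x01,
        "tag_type": first & 0x1F,
        "data_size": int.from_bytes(buf[1:4], "big"),
        "timestamp_ms": timestamp,
        "stream_id": int.from_bytes(buf[8:11], "big"),
    }


# A made-up video tag header: DataSize 42, Timestamp 0x0003E8, extension 0x01.
sample = bytes.fromhex("09 00 00 2a 00 03 e8 01 00 00 00")
tag = parse_tag_header(sample)
print(tag)
```

    With these sample bytes the timestamp is (1 << 24) + 1000 = 16778216 ms, which shows why a reader that ignores TimestampExtended would see timestamps wrap after roughly 4.66 hours.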

    Audio Tag


    Field          Type                  Comment
    SoundFormat    UB[4]                 Format of SoundData. The following values are defined:
                                         0 = Linear PCM, platform endian
                                         2 = MP3
                                         10 = AAC
                                         AAC is supported in Flash Player 9,0,115,0 and higher.
    SoundRate      UB[2]                 Sampling rate. The following values are defined:
                                         0 = 5.5 kHz
                                         1 = 11 kHz
                                         2 = 22 kHz
                                         3 = 44 kHz
    SoundSize      UB[1]                 Size of each audio sample. This parameter only pertains to uncompressed formats. Compressed formats always decode to 16 bits internally.
                                         0 = 8-bit samples
                                         1 = 16-bit samples
    SoundType      UB[1]                 Mono or stereo sound
                                         0 = Mono sound
                                         1 = Stereo sound
    AACPacketType  IF SoundFormat == 10  The following values are defined:
                                         0 = AAC sequence header
                                         1 = AAC raw

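    Clearly, this header records the audio codec, the sample rate, and the sample size. For AAC, it also indicates whether the packet carries an additional sequence header.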
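
    A sketch of decoding the audio tag header from the first bytes of an audio tag's data follows. It assumes, following the Adobe specification rather than anything stated in the table as shown here, that AACPacketType occupies one byte directly after the first byte. The name parse_audio_tag_header and the lookup tables are illustrative, and only the SoundFormat values listed above are named.

```python
SOUND_FORMATS = {0: "Linear PCM, platform endian", 2: "MP3", 10: "AAC"}
SOUND_RATES_KHZ = (5.5, 11, 22, 44)


def parse_audio_tag_header(payload: bytes) -> dict:
    """Decode the AudioTagHeader at the start of an audio tag's data."""
    if not payload:
        raise ValueError("empty audio tag")
    b = payload[0]
    sound_format = b >> 4                          # SoundFormat: top 4 bits
    info = {
        "sound_format": sound_format,
        "format_name": SOUND_FORMATS.get(sound_format, "other"),
        "rate_khz": SOUND_RATES_KHZ[(b >> 2) & 0x03],
        "bits_per_sample": 16 if (b >> 1) & 0x01 else 8,
        "channels": 2 if b & 0x01 else 1,
        "header_size": 1,
    }
    if sound_format == 10:
        # Assumption: AACPacketType is a single byte after the first byte.
        if len(payload) < 2:
            raise ValueError("AAC tag is missing AACPacketType")
        info["aac_packet_type"] = payload[1]       # 0 = sequence header, 1 = raw
        info["header_size"] = 2
    return info


audio = parse_audio_tag_header(bytes.fromhex("af 00"))
print(audio)
```

    The byte 0xAF decodes to AAC, 44 kHz, 16-bit, stereo. The header_size entry tells the caller where the codec data starts inside the tag.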

    Video Tag


    Field            Type             Comment
    Frame Type       UB[4]            Type of video frame. The following values are defined:
                                      1 = key frame (for AVC, a seekable frame)
                                      2 = inter frame (for AVC, a non-seekable frame)
                                      3 = disposable inter frame (H.263 only)
                                      4 = generated key frame (reserved for server use only)
                                      5 = video info/command frame
    CodecID          UB[4]            Codec Identifier. The following values are defined:
                                      2 = Sorenson H.263
                                      3 = Screen video
                                      4 = On2 VP6
                                      5 = On2 VP6 with alpha channel
                                      6 = Screen video version 2
                                      7 = AVC
    AVCPacketType    IF CodecID == 7  The following values are defined:
                                      0 = AVC sequence header
                                      1 = AVC NALU
                                      2 = AVC end of sequence (lower level NALU sequence ender is not required or supported)
    CompositionTime  IF CodecID == 7  IF AVCPacketType == 1: composition time offset
                                      See ISO 14496-12, 8.15.3 for an explanation of composition times. The offset in an FLV file is always in milliseconds.

    The video payload follows the header:
    IF FrameType == 5, the payload is a single value:
        0 = Start of client-side seeking video frame sequence
        1 = End of client-side seeking video frame sequence
    ELSE the payload format is selected by CodecID (2, 3, 4, 5, 6, or 7).
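
    The video tag header can be decoded in the same style. In this sketch, the field widths for AVC are assumptions taken from the Adobe specification rather than from the table as shown here: AVCPacketType is read as one byte and CompositionTime as a signed 24-bit big-endian integer. The helper name parse_video_tag_header and the sample bytes are illustrative.

```python
def parse_video_tag_header(payload: bytes) -> dict:
    """Decode the VideoTagHeader at the start of a video tag's data."""
    if not payload:
        raise ValueError("empty video tag")
    b = payload[0]
    frame_type = b >> 4      # upper 4 bits
    codec_id = b & 0x0F      # lower 4 bits
    info = {
        "frame_type": frame_type,
        "codec_id": codec_id,
        "is_keyframe": frame_type == 1,
        "header_size": 1,
    }
    if codec_id == 7:
        # Assumption: 1 byte AVCPacketType, then 3 bytes signed CompositionTime.
        if len(payload) < 5:
            raise ValueError("AVC tag is missing AVCPacketType/CompositionTime")
        info["avc_packet_type"] = payload[1]
        info["composition_time_ms"] = int.from_bytes(payload[2:5], "big", signed=True)
        info["header_size"] = 5
    return info


key = parse_video_tag_header(bytes.fromhex("17 01 00 00 28"))
inter = parse_video_tag_header(bytes.fromhex("27 01 ff ff d8"))
print(key)
print(inter)
```

    Here 0x17 is an AVC key frame and 0x27 an AVC inter frame. Adding composition_time_ms to the tag timestamp gives the presentation time of the frame, so a negative offset such as the -40 ms in the second sample must be read as signed.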


    Script Tag

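    The payload of a Script Tag is of type ScriptTagBody, and the SCRIPTDATA inside it is encoded as AMF (Action Message Format). ScriptTagBody consists of two fields, Name and Value, both of type SCRIPTDATAVALUE. SCRIPTDATAVALUE is defined in the table below: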

    Field            Type           Comment
    Type             UI8            Type of the ScriptDataValue. The following types are defined:
                                    0 = Number
                                    1 = Boolean
                                    2 = String
                                    3 = Object
                                    4 = MovieClip (reserved, not supported)
                                    5 = Null
                                    6 = Undefined
                                    7 = Reference
                                    8 = ECMA array
                                    9 = Object end marker
                                    10 = Strict array
                                    11 = Date
                                    12 = Long string
    ScriptDataValue  IF Type == 0   Script data value.
                     IF Type == 1   The Boolean value is (ScriptDataValue ≠ 0).
                     IF Type == 2
                     IF Type == 3
                     IF Type == 7
                     IF Type == 8
                     IF Type == 10
                     IF Type == 11
                     IF Type == 12



    Property Name    Type     Comment
    audiocodecid     Number   Audio codec ID used in the file (see AudioTagHeader for available SoundFormat values)
    audiodatarate    Number   Audio bit rate in kilobits per second
    audiodelay       Number   Delay introduced by the audio codec in seconds
    audiosamplerate  Number   Frequency at which the audio stream is replayed
    audiosamplesize  Number   Resolution of a single audio sample
    canSeekToEnd     Boolean  Indicating the last video frame is a key frame
    creationdate     String   Creation date and time
    duration         Number   Total duration of the file in seconds
    filesize         Number   Total size of the file in bytes
    framerate        Number   Number of frames per second
    height           Number   Height of the video in pixels
    stereo           Boolean  Indicating stereo audio
    videocodecid     Number   Video codec ID used in the file (see VideoTagHeader for available CodecID values)
    videodatarate    Number   Video bit rate in kilobits per second
    width            Number   Width of the video in pixels
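
    To show how properties such as duration could be read out of a Script Tag, here is a deliberately partial SCRIPTDATAVALUE decoder. It handles only types 0, 1, 2, and 8, and it relies on AMF0 encoding details that the tables above do not spell out, so treat them as assumptions: a Number is an 8-byte big-endian IEEE 754 double, a String carries a UI16 length prefix, and an ECMA array starts with a UI32 element count and ends with the bytes 00 00 09. The helper names are hypothetical, the script body is synthetic, and "onMetaData" is the name conventionally used for the metadata tag.

```python
import struct


def read_string(buf: bytes, pos: int):
    """Read a UI16 length-prefixed UTF-8 string; return (text, next_pos)."""
    length = int.from_bytes(buf[pos:pos + 2], "big")
    end = pos + 2 + length
    return buf[pos + 2:end].decode("utf-8"), end


def read_value(buf: bytes, pos: int):
    """Decode one SCRIPTDATAVALUE at buf[pos]; return (value, next_pos)."""
    type_id = buf[pos]
    pos += 1
    if type_id == 0:   # Number
        return struct.unpack(">d", buf[pos:pos + 8])[0], pos + 8
    if type_id == 1:   # Boolean: true when the byte is non-zero
        return buf[pos] != 0, pos + 1
    if type_id == 2:   # String
        return read_string(buf, pos)
    if type_id == 8:   # ECMA array: skip the count, read until the end marker
        pos += 4
        result = {}
        while buf[pos:pos + 3] != b"\x00\x00\x09":
            key, pos = read_string(buf, pos)
            value, pos = read_value(buf, pos)
            result[key] = value
        return result, pos + 3
    raise NotImplementedError(f"type {type_id} is not handled in this sketch")


def amf_string(text: str) -> bytes:
    """Encode a UI16 length-prefixed string (demo helper)."""
    raw = text.encode("utf-8")
    return len(raw).to_bytes(2, "big") + raw


# Synthetic script tag body: Name = "onMetaData", Value = ECMA array.
script_body = (b"\x02" + amf_string("onMetaData")
               + b"\x08" + (2).to_bytes(4, "big")
               + amf_string("duration") + b"\x00" + struct.pack(">d", 12.5)
               + amf_string("stereo") + b"\x01\x01"
               + b"\x00\x00\x09")
name, offset = read_value(script_body, 0)
metadata, offset = read_value(script_body, offset)
print(name, metadata)
```

    When the metadata is present, reading duration this way is the quickest answer to the playback-duration question. A file without it would need its tag timestamps scanned instead.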

    4. Other questions











