1. Getting to know Google AudioSet (quoted from a tutorial on using the Google AudioSet dataset)
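The dataset is distributed in two forms:
1: Text (CSV) files that describe each segment, including the YouTube video ID, start time, end time, and one or more labels.
2: TensorFlow Record files, referred to as the feature dataset. Frame-level features are stored as tensorflow.SequenceExample protocol buffers.
The text-format representation of this proto is: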
context: {
  feature: {
    key  : "video_id"
    value: {
      bytes_list: {
        value: [YouTube video id string]
      }
    }
  }
  feature: {
    key  : "start_time_seconds"
    value: {
      float_list: {
        value: 6.0
      }
    }
  }
  feature: {
    key  : "end_time_seconds"
    value: {
      float_list: {
        value: 16.0
      }
    }
  }
  feature: {
    key  : "labels"
    value: {
      int64_list: {
        value: [1, 522, 11, 172] # The meaning of the labels can be found here.
      }
    }
  }
}
feature_lists: {
  feature_list: {
    key  : "audio_embedding"
    value: {
      feature: {
        bytes_list: {
          value: [128 8bit quantized features]
        }
      }
      feature: {
        bytes_list: {
          value: [128 8bit quantized features]
        }
      }
    }
    ... # Repeated for every second of the segment
  }
}
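
To make the layout of one record more concrete, the structure above can be mirrored in plain Python. This is only an illustrative sketch, not the official reader: it assumes that each `audio_embedding` frame arrives as a 128-byte string (one byte per quantized feature) and that the context fields have already been extracted into a dictionary. The helper names `decode_frame` and `decode_segment`, and the sample video ID, are hypothetical. Real TFRecord files would first have to be parsed with TensorFlow.

```python
from typing import Dict, List

EMBEDDING_SIZE = 128  # quantized features per frame, as stated in the record layout


def decode_frame(raw: bytes) -> List[int]:
    """Convert one frame's byte string into 128 unsigned 8-bit values (0..255)."""
    if len(raw) != EMBEDDING_SIZE:
        raise ValueError(f"expected {EMBEDDING_SIZE} bytes, got {len(raw)}")
    return list(raw)  # iterating over bytes yields ints in the range 0..255


def decode_segment(context: Dict[str, object], frames: List[bytes]) -> Dict[str, object]:
    """Combine the context fields and the per-second frames into one dictionary.

    `context` is assumed to hold the four keys shown in the proto above.
    """
    video_id = context["video_id"]
    if isinstance(video_id, bytes):  # bytes_list values come back as bytes
        video_id = video_id.decode("utf-8")
    return {
        "video_id": video_id,
        "start_time_seconds": float(context["start_time_seconds"]),
        "end_time_seconds": float(context["end_time_seconds"]),
        "labels": list(context["labels"]),
        "embeddings": [decode_frame(f) for f in frames],
    }


if __name__ == "__main__":
    # Made-up sample values that follow the shape of the proto above.
    sample_context = {
        "video_id": b"hypothetical_id",
        "start_time_seconds": 6.0,
        "end_time_seconds": 16.0,
        "labels": [1, 522, 11, 172],
    }
    sample_frames = [bytes(range(EMBEDDING_SIZE)) for _ in range(10)]  # one frame per second
    segment = decode_segment(sample_context, sample_frames)
    print(segment["video_id"], len(segment["embeddings"]), len(segment["embeddings"][0]))
```

Keeping the embeddings as lists of integers leaves the dequantization step (mapping 0..255 back to floating-point values) to the caller, since the source does not specify the scaling.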