Music Transformer Data Flow: A Worked Example


Let's follow one concrete piece of music through the entire Music Transformer data pipeline.

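Worked example: processing a short piano melody

Raw input: a simple melody in C major

Note 1: C4 (pitch 60), duration 0.5 s, velocity 80
Note 2: E4 (pitch 64), duration 0.5 s, velocity 80
Note 3: G4 (pitch 67), duration 0.5 s, velocity 80
Note 4: C5 (pitch 72), duration 1.0 s, velocity 90
Rest: 0.5 s
Note 5: G4 (pitch 67), duration 0.5 s, velocity 75
Note 6: E4 (pitch 64), duration 0.5 s, velocity 75
Note 7: C4 (pitch 60), duration 1.0 s, velocity 85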
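Start times are not listed above, but they follow from laying the notes and the rest end to end. A minimal plain-Python sketch (the `MELODY` list and the `build_timeline` helper are illustrative names, not part of any library):

```python
# Each entry: (pitch, duration in seconds, velocity); pitch None marks a rest.
MELODY = [
    (60, 0.5, 80),   # C4
    (64, 0.5, 80),   # E4
    (67, 0.5, 80),   # G4
    (72, 1.0, 90),   # C5
    (None, 0.5, 0),  # rest
    (67, 0.5, 75),   # G4
    (64, 0.5, 75),   # E4
    (60, 1.0, 85),   # C4
]


def build_timeline(melody):
    """Lay notes and rests end to end; return (notes, total_time)."""
    notes = []
    t = 0.0
    for pitch, duration, velocity in melody:
        if pitch is not None:  # a rest only advances the clock
            notes.append({"pitch": pitch, "start_time": t,
                          "end_time": t + duration, "velocity": velocity})
        t += duration
    return notes, t


notes, total_time = build_timeline(MELODY)
for note in notes:
    print(note)
print("total_time:", total_time)
```

This yields 7 notes and a total length of 5.0 s, since the final C4 starts at 4.0 s and lasts 1.0 s.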

Stage 1: From raw MIDI to NoteSequence

1.1 Reading the MIDI file

# Input MIDI file: simple_melody.mid
# (e.g. loaded with note_seq.midi_file_to_note_sequence("simple_melody.mid"))
# The resulting NoteSequence object, in protobuf text format:
note_sequence {
  id: "simple_melody_001"
  filename: "simple_melody.mid"
  # Note information
  notes {
    pitch: 60  # C4
    start_time: 0.0
    end_time: 0.5
    velocity: 80
    instrument: 0
    program: 0
  }
  notes {
    pitch: 64  # E4
    start_time: 0.5
    end_time: 1.0
    velocity: 80
    instrument: 0
    program: 0
  }
  notes {
    pitch: 67  # G4
    start_time: 1.0
    end_time: 1.5
    velocity: 80
    instrument: 0
    program: 0
  }
  notes {
    pitch: 72  # C5
    start_time: 1.5
    end_time: 2.5
    velocity: 90
    instrument: 0
    program: 0
  }
  # ... remaining notes
  # Metadata
  total_time: 5.0  # the last note (C4) starts at 4.0 s and lasts 1.0 s
  tempos { qpm: 120.0 }
  time_signatures { numerator: 4 denominator: 4 }
}

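Stage 2: Preprocessing and cleaning

2.1 Applying the sustain pedal

# apply_sustain_control_changes
# Assume the MIDI file contains sustain-pedal control changes (CC 64).
# While the pedal is down, a released note keeps sounding until the pedal is
# lifted (or until the same pitch is struck again), so some end times move later.
# Before:                                      note.pitch=60, start=0.0, end=0.5
# After (pedal held from 0.0 s until 0.7 s):   note.pitch=60, start=0.0, end=0.7  # end time extended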
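The pedal rule can be sketched in plain Python. `apply_sustain` below is a simplified, hypothetical stand-in for note_seq's `apply_sustain_control_changes`, not its actual code: it assumes a single pedal-down interval given directly in seconds rather than as CC 64 events.

```python
def apply_sustain(notes, pedal_down, pedal_up):
    """Simplified sustain pedal for one pedal-down interval.

    notes: (pitch, start, end) tuples sorted by start time.
    A note released while the pedal is down is held until the pedal comes up,
    unless the same pitch is struck again first.
    """
    result = []
    for i, (pitch, start, end) in enumerate(notes):
        new_end = end
        if pedal_down <= end < pedal_up:
            new_end = pedal_up
            # A later onset of the same pitch cuts the held note short.
            for other_pitch, other_start, _ in notes[i + 1:]:
                if other_pitch == pitch and end <= other_start < new_end:
                    new_end = other_start
                    break
        result.append((pitch, start, new_end))
    return result


print(apply_sustain([(60, 0.0, 0.5), (64, 0.5, 1.0)], pedal_down=0.0, pedal_up=0.7))
# C4 is held to 0.7; E4 is released after the pedal comes up, so it is unchanged.
```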

2.2 Clearing control changes

# del note_sequence.control_changes[:]
# Remove all control changes to simplify the data. This must run after
# apply_sustain_control_changes, which reads the pedal events.
# Before:
note_sequence {
  notes { ... }
  control_changes {
    control_number: 64   # sustain pedal
    control_value: 127   # pressed
    time: 0.0
  }
  control_changes {
    control_number: 64
    control_value: 0     # released
    time: 2.0
  }
}
# After:
note_sequence {
  notes {
    # end times already extended by the pedal
  }
  # control_changes is now empty
}

Stage 3: Data augmentation

3.1 Time stretching (factor 1.05)

# stretch_note_sequence(sequence, 1.05)
# Every time value is multiplied by 1.05.
Before:
  note.pitch=60, start=0.0,   end=0.5
  note.pitch=64, start=0.5,   end=1.0
After:
  note.pitch=60, start=0.0,   end=0.525   # 0.5 * 1.05
  note.pitch=64, start=0.525, end=1.05    # 1.0 * 1.05
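The same arithmetic on plain (pitch, start, end) tuples; `stretch` is a hypothetical helper for illustration, whereas `stretch_note_sequence` itself operates on NoteSequence protos.

```python
def stretch(notes, factor):
    """Multiply every start/end time by `factor` (>1 slows down, <1 speeds up)."""
    return [(pitch, start * factor, end * factor) for pitch, start, end in notes]


stretched = stretch([(60, 0.0, 0.5), (64, 0.5, 1.0)], 1.05)
print(stretched)
```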

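3.2 Pitch transposition (+2 semitones)

# Both augmentations are parameterized by properties of
# Score2PerfMaestroLanguageUncroppedAug:
@property
def stretch_factors(self):
  return [0.95, 0.975, 1.0, 1.025, 1.05]  # five stretch factors (used in 3.1)

# Stretching multiplies all times, so each factor gives a different tempo:
#   0.95  -> durations 5% shorter (faster, more urgent)
#   0.975 -> durations 2.5% shorter (slightly faster)
#   1.0   -> original tempo
#   1.025 -> durations 2.5% longer (slightly slower)
#   1.05  -> durations 5% longer (more relaxed)

@property
def transpose_amounts(self):
  return [-3, -2, -1, 0, 1, 2, 3]  # at most a minor third up or down

# If every (stretch, transpose) combination is applied, each training piece
# yields 5 x 7 = 35 augmented versions.

# transpose_note_sequence(sequence, 2)   # add 2 to every pitch
Before:
  note.pitch=60  # C4
  note.pitch=64  # E4
  note.pitch=67  # G4
After:
  note.pitch=62  # D4
  note.pitch=66  # F#4
  note.pitch=69  # A4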
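Transposition, and the way stretch and transpose settings combine, can be sketched as follows. `transpose` is a hypothetical helper; the pitch limits 21 to 108 (an 88-key piano) and raising an error on out-of-range results are assumptions made for this sketch.

```python
import itertools

MIN_PITCH, MAX_PITCH = 21, 108  # assumed 88-key piano range
STRETCH_FACTORS = [0.95, 0.975, 1.0, 1.025, 1.05]
TRANSPOSE_AMOUNTS = [-3, -2, -1, 0, 1, 2, 3]


def transpose(pitches, amount):
    """Shift every pitch by `amount` semitones; reject results outside the piano range."""
    shifted = [p + amount for p in pitches]
    if any(p < MIN_PITCH or p > MAX_PITCH for p in shifted):
        raise ValueError("transposition leaves the piano range")
    return shifted


print(transpose([60, 64, 67], 2))  # C4, E4, G4 -> D4, F#4, A4
# Every (stretch_factor, transpose_amount) pair.
augment_params = list(itertools.product(STRETCH_FACTORS, TRANSPOSE_AMOUNTS))
print(len(augment_params))
```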

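Stage 4: Converting to a performance event sequence

4.1 Quantization

# quantized_ns = sequences_lib.quantize_note_sequence_absolute(ns, steps_per_second=100)
# 100 time steps per second
Original times:
  note.start_time=0.0, end_time=0.525
Quantized:
  note.quantized_start_step=0
  note.quantized_end_step=53   # 0.525 * 100 = 52.5, rounded half up to 53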
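Quantization is rounding to the nearest step. This sketch assumes ties round up (our reading of note_seq's default quantize cutoff of 0.5); `quantize_to_step` here is written from scratch for illustration. Applied to the stretched D4 and F#4, it shows that F#4 spans steps 53 to 105, i.e. 52 steps rather than 53.

```python
import math

STEPS_PER_SECOND = 100


def quantize_to_step(seconds, steps_per_second=STEPS_PER_SECOND, cutoff=0.5):
    """Round seconds to the nearest step; with cutoff=0.5, ties round up."""
    return int(math.floor(seconds * steps_per_second + (1 - cutoff)))


# Stretched D4 and F#4: (pitch, start, end) in seconds.
for pitch, start, end in [(62, 0.0, 0.525), (66, 0.525, 1.05)]:
    s, e = quantize_to_step(start), quantize_to_step(end)
    print(pitch, s, e, e - s)
# 62 0 53 53
# 66 53 105 52
```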

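4.2 Building the Performance object

from note_seq.performance_lib import Performance

# Build the Performance object from the quantized NoteSequence (4.1)
performance = Performance(
    quantized_sequence=quantized_ns,
    num_velocity_bins=32,   # velocities are quantized into 32 bins
    max_shift_steps=100,    # longer gaps are split into several TIME_SHIFT events
)
# Resulting Performance events. With num_velocity_bins=32, a VELOCITY event is
# also emitted before a NOTE_ON whenever the velocity bin changes; those events
# are omitted here for brevity.
# [
#   PerformanceEvent(event_type=NOTE_ON, event_value=62),     # D4 starts
#   PerformanceEvent(event_type=TIME_SHIFT, event_value=53),  # advance 53 steps (0 -> 53)
#   PerformanceEvent(event_type=NOTE_OFF, event_value=62),    # D4 ends
#   PerformanceEvent(event_type=NOTE_ON, event_value=66),     # F#4 starts
#   PerformanceEvent(event_type=TIME_SHIFT, event_value=52),  # advance 52 steps (53 -> 105)
#   PerformanceEvent(event_type=NOTE_OFF, event_value=66),    # F#4 ends
#   # ... remaining events
# ]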
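A simplified sketch of how quantized notes become performance events: onsets and offsets are merged in time order, and the gaps between them become TIME_SHIFT events, split when longer than `max_shift_steps`. `to_performance_events` is a hypothetical helper written for this example; it skips VELOCITY events and is not note_seq's implementation.

```python
def to_performance_events(notes, max_shift_steps=100):
    """Convert quantized notes into (event_type, value) pairs.

    notes: (pitch, start_step, end_step) tuples sorted by start step.
    """
    # Every onset and offset, ordered by (step, note index, is_offset).
    boundaries = sorted(
        [(start, i, False) for i, (_, start, _) in enumerate(notes)]
        + [(end, i, True) for i, (_, _, end) in enumerate(notes)]
    )
    events = []
    current = 0
    for step, i, is_offset in boundaries:
        while step - current > max_shift_steps:  # gap too long for one shift
            events.append(("TIME_SHIFT", max_shift_steps))
            current += max_shift_steps
        if step > current:
            events.append(("TIME_SHIFT", step - current))
            current = step
        events.append(("NOTE_OFF" if is_offset else "NOTE_ON", notes[i][0]))
    return events


print(to_performance_events([(62, 0, 53), (66, 53, 105)]))
```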

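Stage 5: Encoding as integers

5.1 Event encoding

# Encode with MidiPerformanceEncoder (the IDs below are illustrative placeholders)
Encoding map:
  NOTE_ON (62)    -> 105
  TIME_SHIFT (53) -> 23
  NOTE_OFF (62)   -> 156
  NOTE_ON (66)    -> 78
  TIME_SHIFT (52) -> 201   # F#4 lasts 52 steps (53 -> 105); each distinct event has exactly one ID
  NOTE_OFF (66)   -> 45
Encoded integer sequence:
  [105, 23, 156, 78, 201, 45, 178, 92, 67, 201, 134, 78, 23, 156]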
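The IDs in the map above are placeholders. The sketch below follows our understanding of note_seq's `PerformanceOneHotEncoding` as wrapped by score2perf's `MidiPerformanceEncoder`: each event type occupies a consecutive range of IDs, and the whole table is shifted by 2 reserved IDs. The parameter values (pitches 21 to 108, 100 shift steps, 32 velocity bins) are assumptions matching the settings used in this walkthrough; `encode_event` is a hypothetical helper.

```python
NUM_RESERVED_IDS = 2  # assumed: 0 = PAD, 1 = EOS (tensor2tensor convention)
MIN_PITCH, MAX_PITCH = 21, 108
MAX_SHIFT_STEPS = 100
NUM_VELOCITY_BINS = 32

# (event_type, min_value, max_value), in the order the ID ranges are laid out.
EVENT_RANGES = [
    ("NOTE_ON", MIN_PITCH, MAX_PITCH),
    ("NOTE_OFF", MIN_PITCH, MAX_PITCH),
    ("TIME_SHIFT", 1, MAX_SHIFT_STEPS),
    ("VELOCITY", 1, NUM_VELOCITY_BINS),
]


def encode_event(event_type, value):
    """ID = offset of the event type's range + position in the range + reserved IDs."""
    offset = 0
    for name, lo, hi in EVENT_RANGES:
        if name == event_type:
            if not lo <= value <= hi:
                raise ValueError(f"{event_type} value {value} out of range")
            return offset + (value - lo) + NUM_RESERVED_IDS
        offset += hi - lo + 1
    raise ValueError(f"unknown event type {event_type}")


VOCAB_SIZE = sum(hi - lo + 1 for _, lo, hi in EVENT_RANGES) + NUM_RESERVED_IDS

events = [("NOTE_ON", 62), ("TIME_SHIFT", 53), ("NOTE_OFF", 62),
          ("NOTE_ON", 66), ("TIME_SHIFT", 52), ("NOTE_OFF", 66)]
print([encode_event(t, v) for t, v in events])  # [43, 230, 131, 47, 229, 135]
print(VOCAB_SIZE)  # 310
```

Under these assumptions the unigram vocabulary has 310 IDs, so any n-gram IDs would be numbered from 310 upward.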

5.2 n-gram compression (optional)
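n-gram substitution can be done greedily: at each position, the longest registered n-gram starting there is replaced by its ID, and any other token is kept as is. `encode_ngrams` is a hypothetical helper, and the table `{(105, 23, 156): 1001}` uses this walkthrough's illustrative IDs.

```python
def encode_ngrams(ids, ngrams):
    """Greedy longest-match n-gram substitution.

    ngrams: dict mapping a tuple of event IDs (length >= 2) to its n-gram ID.
    """
    max_len = max((len(k) for k in ngrams), default=1)
    out = []
    j = 0
    while j < len(ids):
        for n in range(min(max_len, len(ids) - j), 1, -1):  # longest first
            key = tuple(ids[j:j + n])
            if key in ngrams:
                out.append(ngrams[key])
                j += n
                break
        else:  # no n-gram of length >= 2 starts here
            out.append(ids[j])
            j += 1
    return out


sequence = [105, 23, 156, 78, 201, 45, 178, 92, 67, 201, 134, 78, 23, 156]
print(encode_ngrams(sequence, {(105, 23, 156): 1001}))
```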

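# Suppose a frequent n-gram such as "105, 23, 156" (NOTE_ON, TIME_SHIFT, NOTE_OFF)
# is registered and assigned the ID 1001 (illustrative; a real MidiPerformanceEncoder
# numbers n-grams starting right after its unigram vocabulary)
Before: [105, 23, 156, 78, 201, 45, 178, 92, 67, 201, 134, 78, 23, 156]
After:  [1001, 78, 201, 45, 178, 92, 67, 201, 134, 78, 23, 156]
# The trailing "78, 23, 156" does not match the registered n-gram, so it stays unchanged.

Stage 6: Saving as TFRecord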

6.1 TFExample format

# Produced by generator_utils.to_example()
tf_example {
  features {
    feature {
      key: "targets"
      value {
        int64_list {
          value: 1001  # n-gram token
          value: 78    # NOTE_ON F#4
          value: 201   # TIME_SHIFT
          value: 45    # NOTE_OFF F#4
          value: 178   # NOTE_ON A4
          # ... remaining values
        }
      }
    }
  }
}
# Written to:
# data/processed/score2perf_maestro_language_uncropped_aug-train-00000-of-00001.tfrecord

Stage 7: Using the data in training

7.1 Batching

# During training, examples are read from the TFRecord files and batched
# Batch shape: [batch_size=32, sequence_length=2048]
batch_targets = [
    [1001, 78, 201, 45, 178, 92, ...],  # sequence 1
    [156, 23, 78, 201, 45, 178, ...],   # sequence 2
    [201, 45, 178, 92, 67, 201, ...],   # sequence 3
    # ... the other 29 sequences
]
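Pieces have different lengths, so each one has to be brought to the fixed sequence length before stacking. A minimal sketch, assuming long pieces are randomly cropped (our understanding of this problem's training setup) and short ones are right-padded with ID 0; `crop_or_pad` and `make_batch` are hypothetical helpers, and the sizes are shrunk for readability.

```python
import random

PAD_ID = 0  # assumed padding ID (tensor2tensor convention)


def crop_or_pad(tokens, length, rng):
    """Randomly crop sequences longer than `length`; right-pad shorter ones with PAD_ID."""
    if len(tokens) > length:
        start = rng.randrange(len(tokens) - length + 1)
        return tokens[start:start + length]
    return tokens + [PAD_ID] * (length - len(tokens))


def make_batch(sequences, length, rng):
    """Stack sequences into a [len(sequences), length] batch."""
    return [crop_or_pad(seq, length, rng) for seq in sequences]


rng = random.Random(0)
batch = make_batch([[1001, 78, 201, 45], list(range(2, 12))], length=6, rng=rng)
print(batch)
```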

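7.2 Model input processing

# Pseudocode: embedding_layer, transformer_decoder and output_projection are placeholder names.
# For this problem Music Transformer is a decoder-only language model: there is no
# encoder, and every layer uses masked (causal) self-attention.
# 1. Embedding
embedded_targets = embedding_layer(batch_targets)  # [32, 2048, 384]
# 2. Position information: Music Transformer uses relative self-attention, which
#    brings in the distance between positions inside each attention layer
# 3. Decoder stack with teacher forcing
decoder_output = transformer_decoder(
    embedded_targets[:, :-1, :],  # inputs: every position except the last
    causal=True,                  # position i attends only to positions <= i
)
# 4. Output projection to the vocabulary
logits = output_projection(decoder_output)  # [32, 2047, vocab_size]
# 5. Loss: cross-entropy of logits against batch_targets[:, 1:] (the next tokens)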
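The input/target offset and the causal mask that turn the decoder into a next-token predictor can be shown without any framework. Both helpers are illustrative, not part of tensor2tensor.

```python
def shift_for_teacher_forcing(tokens):
    """Inputs are tokens[:-1]; position i must predict tokens[i + 1]."""
    return tokens[:-1], tokens[1:]


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


inputs, targets = shift_for_teacher_forcing([1001, 78, 201, 45, 178])
print(inputs)   # [1001, 78, 201, 45]
print(targets)  # [78, 201, 45, 178]
for row in causal_mask(len(inputs)):
    print("".join("x" if ok else "." for ok in row))
```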

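Stage 8: Music generation

8.1 Autoregressive generation

# Generate a new music sequence
# (model.predict and sample_with_temperature are placeholder names)
# Seed (primer) sequence: the beginning of our processed example
seed = [1001, 78, 201]

# Generate subsequent events one at a time
generated_sequence = seed.copy()
for step in range(100):  # generate up to 100 events
    # Model prediction: logits for every position, shape [len(generated_sequence), vocab_size]
    logits = model.predict(generated_sequence)
    # Sample the next event from the last position's distribution (temperature=0.9)
    next_event = sample_with_temperature(logits[-1], temperature=0.9)
    # Append it to the sequence
    generated_sequence.append(next_event)
    # Stop at end-of-sequence (EOS_ID is 1 in tensor2tensor; 0 is padding)
    if next_event == 1:
        break

# The final sequence might look like:
# [1001, 78, 201, 45, 178, 92, 67, 201, 134, 78, 23, 156, 105, 23, ...]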
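`sample_with_temperature` in the loop above is a placeholder. One minimal way to implement it (our own sketch): divide the logits by the temperature, apply a numerically stable softmax, and draw one index.

```python
import math
import random


def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from softmax(logits / temperature).

    temperature < 1 sharpens the distribution; temperature > 1 flattens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max before exp for numerical stability
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


rng = random.Random(42)
print(sample_with_temperature([2.0, 1.0, 0.1], temperature=0.9, rng=rng))
```

With temperature 0.9 the distribution is slightly sharper than the model's raw output, trading a little variety for more consistent choices.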

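Stage 9: Decoding back to MIDI

9.1 Decoding the integer sequence

# Decode the generated integer sequence.
# NGRAM_TABLE (n-gram ID -> original event IDs) is a hypothetical lookup matching 5.2.
NGRAM_TABLE = {1001: [105, 23, 156]}

encoded_sequence = [1001, 78, 201, 45, 178, 92, 67, 201, 134, 78, 23, 156]

# Expand n-gram tokens back into their event IDs
decoded_events = []
for token in encoded_sequence:
    if token in NGRAM_TABLE:  # n-gram token
        decoded_events.extend(NGRAM_TABLE[token])  # 1001 -> [105, 23, 156]
    else:
        decoded_events.append(token)
# Decoded:
# [105, 23, 156, 78, 201, 45, 178, 92, 67, 201, 134, 78, 23, 156]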

9.2 Converting to Performance events

# Decode the IDs into Performance events
performance_events = []
for event_id in decoded_events:
    event = decode_event(event_id)  # placeholder: look up the ID in the encoding map
    performance_events.append(event)
# performance_events:
# [
#   PerformanceEvent(NOTE_ON, 62),     # D4 starts
#   PerformanceEvent(TIME_SHIFT, 53),  # ID 23 decodes to TIME_SHIFT(53): advance 53 steps
#   PerformanceEvent(NOTE_OFF, 62),    # D4 ends
#   PerformanceEvent(NOTE_ON, 66),     # F#4 starts
#   ...
# ]
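Decoding inverts the ID layout: find which event-type range an ID falls into, then recover the value from its position in that range. An ID is a vocabulary index, not the event value, so it cannot be read off directly. This sketch assumes 2 reserved IDs followed by NOTE_ON and NOTE_OFF for pitches 21 to 108, TIME_SHIFT 1 to 100 and VELOCITY 1 to 32 (assumed settings); this `decode_event` is our own helper, not a library function.

```python
NUM_RESERVED_IDS = 2  # assumed: 0 = PAD, 1 = EOS
EVENT_RANGES = [      # assumed layout: (event_type, min_value, max_value)
    ("NOTE_ON", 21, 108),
    ("NOTE_OFF", 21, 108),
    ("TIME_SHIFT", 1, 100),
    ("VELOCITY", 1, 32),
]


def decode_event(event_id):
    """Map an integer ID back to (event_type, value)."""
    index = event_id - NUM_RESERVED_IDS
    if index < 0:
        raise ValueError(f"{event_id} is a reserved ID (PAD/EOS)")
    for name, lo, hi in EVENT_RANGES:
        size = hi - lo + 1
        if index < size:
            return name, lo + index
        index -= size
    raise ValueError(f"{event_id} is outside the event vocabulary")


print([decode_event(i) for i in [43, 230, 131, 47, 229, 135]])
```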

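9.3 Converting to a NoteSequence

# Convert the Performance into a NoteSequence
note_sequence = performance.to_sequence()  # rebuilds note times and pairs NOTE_ON/NOTE_OFF
Resulting NoteSequence:
note_sequence {
  notes {
    pitch: 62        # D4
    start_time: 0.0
    end_time: 0.53   # 53 steps / 100 steps per second
    velocity: 80     # illustrative; actual values come from the generated VELOCITY events
  }
  notes {
    pitch: 66        # F#4
    start_time: 0.53
    end_time: 1.05   # (53 + 52) steps / 100
    velocity: 80
  }
  # ... remaining notes
  total_time: 4.5    # end time of the last note
  tempos { qpm: 120 }
}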
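What `to_sequence` has to do can be sketched by walking the events with a step counter and pairing each NOTE_OFF with the open NOTE_ON of the same pitch. `events_to_notes` is a hypothetical, simplified helper: it ignores VELOCITY events (using a fixed velocity) and drops notes that are never closed.

```python
STEPS_PER_SECOND = 100


def events_to_notes(events, velocity=80):
    """Return (pitch, start_seconds, end_seconds, velocity) tuples sorted by start."""
    step = 0
    open_notes = {}  # pitch -> start step of the currently sounding note
    notes = []
    for event_type, value in events:
        if event_type == "TIME_SHIFT":
            step += value
        elif event_type == "NOTE_ON":
            open_notes[value] = step
        elif event_type == "NOTE_OFF" and value in open_notes:
            start = open_notes.pop(value)
            notes.append((value, start / STEPS_PER_SECOND, step / STEPS_PER_SECOND, velocity))
        # Other event types (e.g. VELOCITY) are ignored in this sketch.
    return sorted(notes, key=lambda n: n[1])


events = [("NOTE_ON", 62), ("TIME_SHIFT", 53), ("NOTE_OFF", 62),
          ("NOTE_ON", 66), ("TIME_SHIFT", 52), ("NOTE_OFF", 66)]
print(events_to_notes(events))  # [(62, 0.0, 0.53, 80), (66, 0.53, 1.05, 80)]
```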

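Stage 10: Exporting a MIDI file

10.1 Saving the MIDI file

# Final output
import note_seq

output_file = "./generated/generated_melody_20251210_1516.mid"
note_seq.sequence_proto_to_midi_file(note_sequence, output_file)
# The generated MIDI file:
# - Size: well under 1 KB (a short melody needs only a few bytes per note event)
# - Notes: 7
# - Duration: 4.5 s
# - Includes the full time-signature and tempo information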
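`note_seq` writes the file for you. To see why such a file stays tiny, here is a self-contained sketch that writes a minimal format-0 Standard MIDI File by hand. It is our own illustration, assuming 120 qpm, 4/4 time and 480 ticks per quarter note; it is not what note_seq does internally.

```python
import struct

TICKS_PER_QUARTER = 480
QPM = 120
TICKS_PER_SECOND = TICKS_PER_QUARTER * QPM / 60  # 960 ticks per second at 120 qpm


def vlq(n):
    """Encode a non-negative int as a MIDI variable-length quantity."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))


def notes_to_midi_bytes(notes):
    """Build a format-0 MIDI file from (pitch, start_s, end_s, velocity) tuples."""
    events = []  # (tick, order, message); note-offs sort before note-ons at the same tick
    for pitch, start, end, velocity in notes:
        events.append((round(start * TICKS_PER_SECOND), 1, bytes([0x90, pitch, velocity])))
        events.append((round(end * TICKS_PER_SECOND), 0, bytes([0x80, pitch, 0])))
    events.sort(key=lambda e: (e[0], e[1]))
    tempo_us = 60_000_000 // QPM  # microseconds per quarter note
    track = b"\x00\xff\x51\x03" + tempo_us.to_bytes(3, "big")  # tempo meta event
    track += b"\x00\xff\x58\x04\x04\x02\x18\x08"  # 4/4 time signature meta event
    last = 0
    for tick, _, message in events:
        track += vlq(tick - last) + message  # delta time, then the event
        last = tick
    track += b"\x00\xff\x2f\x00"  # end of track
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, TICKS_PER_QUARTER)
    return header + b"MTrk" + struct.pack(">I", len(track)) + track


data = notes_to_midi_bytes([(62, 0.0, 0.53, 80), (66, 0.53, 1.05, 80)])
print(len(data), "bytes")
```

For the two notes from 9.3 this produces a file of a few dozen bytes; each additional note adds only about eight to ten bytes.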

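Summary of the full data flow

Original idea: a simple C-major melody
  ↓
MIDI file: simple_melody.mid
  ↓
NoteSequence: a protobuf object containing 7 notes
  ↓
Cleaning: apply the sustain pedal, then clear control changes
  ↓
Augmentation: time stretch ×1.05, transpose +2 semitones
  ↓
Quantization: discrete time at 100 steps per second
  ↓
Performance: event sequence [NOTE_ON, TIME_SHIFT, NOTE_OFF, ...]
  ↓
Integer encoding: [105, 23, 156, 78, 201, 45, ...]
  ↓
n-gram compression: [1001, 78, 201, 45, ...]
  ↓
TFRecord: saved as training data
  ↓
Training: batches of 32 sequences, each 2048 tokens long
  ↓
Autoregressive generation: extend a seed sequence one token at a time
  ↓
Decoding: integers → events → Performance → NoteSequence
  ↓
MIDI export: generated_melody_20251210_1516.mid
  ↓
Final product: a playable music file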

This example traces how a simple musical idea passes through each stage of the Music Transformer pipeline and ends up as a playable MIDI file.
