MIDI Files as Training Data

Original article: towardsdatascience.com/midi-files-as-training-data-b67852c8b291?source=collection_archive---------3-----------------------#2024-09-13

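MIDI performances

A MIDI performance contains four kinds of information:

When the note starts: note onset
When the note ends: note offset (or note duration, computed as offset - onset)
Which note was played: note pitch
How "strongly" the key was pressed: note velocity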

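**Note onset and offset (and duration)** are expressed in seconds, corresponding to the moments at which the performer pressed and released the key.

Note pitch is encoded as an integer from 0 (lowest) to 127 (highest). This range is wider than what a piano can play: the piano range corresponds to 21–108.

Note velocity is also encoded as an integer, from 0 (silence) to 127 (maximum intensity).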
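The four fields described above fit naturally into a small record type. The following is a minimal sketch and not part of any library: the class `PerformedNote`, its field names, and the helper `is_on_piano` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


# Hypothetical container for one performed note (illustration only).
@dataclass
class PerformedNote:
    onset_sec: float   # when the key was pressed, in seconds
    offset_sec: float  # when the key was released, in seconds
    pitch: int         # MIDI pitch, 0 (lowest) to 127 (highest)
    velocity: int      # key-press intensity, 0 (silence) to 127 (maximum)

    def __post_init__(self):
        # Reject values outside the ranges described in the text.
        if not 0 <= self.pitch <= 127:
            raise ValueError("pitch must be in 0..127")
        if not 0 <= self.velocity <= 127:
            raise ValueError("velocity must be in 0..127")
        if self.offset_sec < self.onset_sec:
            raise ValueError("offset must not precede onset")

    @property
    def duration_sec(self) -> float:
        # Duration is derived: offset - onset.
        return self.offset_sec - self.onset_sec

    def is_on_piano(self) -> bool:
        # The piano range corresponds to MIDI pitches 21..108.
        return 21 <= self.pitch <= 108


note = PerformedNote(onset_sec=1.0, offset_sec=1.25, pitch=60, velocity=64)
print(note.duration_sec)   # 0.25
print(note.is_on_piano())  # True
```

Storing onset and offset and deriving the duration keeps the two representations consistent; a note array that stores onset and duration instead, as in the hands-on example later in this article, carries the same information.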

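The vast majority of MIDI performances are piano performances, because most MIDI instruments are MIDI keyboards. Other MIDI instruments (for example MIDI saxophones, MIDI drums, and MIDI sensors for guitar) exist, but they are less common.

The largest dataset of human MIDI performances (classical piano music) is the Maestro dataset by Google Magenta.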

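Main properties of MIDI performances

A fundamental property of MIDI performances is that notes never have exactly the same onset or duration (this is possible in theory, but extremely unlikely in practice).

Even if they really tried, performers could not press two (or more) keys at exactly the same time, because human precision is limited. The same holds for note durations. Moreover, exact simultaneity is not a priority for most musicians, since timing deviations help produce a more expressive or groovy feel. Finally, consecutive notes are either separated by some silence or partially overlap.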

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/8683281fa2d944677d7994e066df482c.png

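For this reason, MIDI performances are sometimes also called unquantized MIDI. Temporal positions are spread over a continuous time scale instead of being quantized to discrete positions (because of the digital encoding, the scale is technically discrete, but it is so fine that we can treat it as continuous).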

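Hands-on example

Let's look at a MIDI performance. We will use the ASAP dataset, which is hosted on GitHub.

In your favorite terminal (I am using PowerShell on Windows), go to a convenient location and clone the repository.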

git clone https://github.com/fosfrancesco/asap-dataset

We will also use the Python library Partitura to open the MIDI files, so install it in your Python environment.

pip install partitura

Now that everything is set up, let's open a MIDI file and print the first 10 notes. Since this is a MIDI performance, we will use the load_performance_midi function.

    from pathlib import Path
    import partitura as pt

    # set the path to the asap dataset (change it to your local path!)
    asap_basepath = Path('../asap-dataset/')
    # select a performance, here we use Bach Prelude BWV 848 in C#
    performance_path = Path("Bach/Prelude/bwv_848/Denisova06M.mid")
    print("Loading midi file: ", asap_basepath / performance_path)
    # load the performance
    performance = pt.load_performance_midi(asap_basepath / performance_path)
    # extract the note array
    note_array = performance.note_array()
    # print the dtype of the note array (helpful to know how to interpret it)
    print("Numpy dtype:")
    print(note_array.dtype)
    # print the first 10 notes in the note array
    print("First 10 notes:")
    print(note_array[:10])

The output of this Python program should look like this:

    Numpy dtype:
    [('onset_sec', '<f4'), ('duration_sec', '<f4'), ('onset_tick', '<i4'), ('duration_tick', '<i4'), ('pitch', '<i4'), ('velocity', '<i4'), ('track', '<i4'), ('channel', '<i4'), ('id', '<U256')]
    First 10 notes:
    [(1.0286459, 0.21354167,  790, 164, 49, 53, 0, 0, 'n0')
     (1.03125  , 0.09765625,  792,  75, 77, 69, 0, 0, 'n1')
     (1.1302084, 0.046875  ,  868,  36, 73, 64, 0, 0, 'n2')
     (1.21875  , 0.07942709,  936,  61, 68, 66, 0, 0, 'n3')
     (1.3541666, 0.04166667, 1040,  32, 73, 34, 0, 0, 'n4')
     (1.4361979, 0.0390625 , 1103,  30, 61, 62, 0, 0, 'n5')
     (1.4361979, 0.04296875, 1103,  33, 77, 48, 0, 0, 'n6')
     (1.5143229, 0.07421875, 1163,  57, 73, 69, 0, 0, 'n7')
     (1.6380209, 0.06380209, 1258,  49, 78, 75, 0, 0, 'n8')
     (1.6393229, 0.21484375, 1259, 165, 51, 54, 0, 0, 'n9')]

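You can see that we have the onset and duration of each note (in seconds), its pitch, and its velocity. The other fields are less relevant for MIDI performances.

Onsets and durations are also given in ticks. This is closer to how the information is actually encoded in a MIDI file: a very short unit of time (= 1 tick) is chosen, and all temporal information is encoded as a multiple of that unit. When working with music performances, you can usually ignore the tick values and use the values in seconds directly.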
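The relation between ticks and seconds can be sketched as follows. In a standard MIDI file, the header gives the number of ticks per quarter note, and tempo events give the length of a quarter note in microseconds. The two default values below (384 ticks per quarter note, 500,000 microseconds per quarter note) are assumptions: they were chosen because they reproduce the first row of the printed note array, and the actual file may use a different combination. `ticks_to_seconds` is a hypothetical helper, not a Partitura function, and it assumes a constant tempo.

```python
def ticks_to_seconds(ticks, ticks_per_quarter=384, tempo_us_per_quarter=500_000):
    """Convert a tick count to seconds, assuming a constant tempo.

    Both default values are assumptions made for this illustration.
    """
    return ticks * tempo_us_per_quarter / (ticks_per_quarter * 1_000_000)


# Tick values of the first note (n0) in the printed note array.
onset_sec = ticks_to_seconds(790)
duration_sec = ticks_to_seconds(164)
print(round(onset_sec, 5), round(duration_sec, 5))  # 1.02865 0.21354
```

Under these assumptions one tick lasts 1/768 of a second, which illustrates why the tick scale is fine enough to be treated as continuous.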

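You can check that notes practically never share exactly the same onset or duration. In the ten notes printed above, all durations differ, and only n5 and n6 fall on the same onset tick (1103).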
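One way to run this check is to count repeated values. The sketch below hard-codes the tick values of the ten printed notes so that it runs without the dataset; with the real data you could pass `note_array['onset_tick'].tolist()` and `note_array['duration_tick'].tolist()` instead. `count_shared` is a hypothetical helper name.

```python
from collections import Counter

# Tick values copied from the ten notes printed above (n0 to n9).
onset_tick = [790, 792, 868, 936, 1040, 1103, 1103, 1163, 1258, 1259]
duration_tick = [164, 75, 36, 61, 32, 30, 33, 57, 49, 165]


def count_shared(values):
    """Return how many entries share their value with at least one other entry."""
    counts = Counter(values)
    return sum(c for c in counts.values() if c > 1)


print(count_shared(onset_tick))     # 2
print(count_shared(duration_tick))  # 0
```

In this short excerpt the two notes n5 and n6 land on the same tick, so exact coincidences do occur occasionally, while no two durations match.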

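MIDI scores

MIDI scores use a richer set of MIDI messages to encode information such as time signature, key signature, and bar and beat positions.

For this reason they resemble musical scores (sheet music), although they still lack some important information, for example pitch spelling, ties, dots, rests, beams, etc.

Temporal information is not encoded in seconds but in more musically abstract units, such as quarter notes.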

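Main properties of MIDI scores

A fundamental characteristic of MIDI scores is that all note onsets are aligned to a quantized grid. The grid is defined first by the bar positions and then by recursive integer divisions within each bar: mainly by 2 and 3, while other divisions such as 5, 7, or 11 are used for tuplets.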
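To make the idea of a grid built by recursive integer division concrete, here is a minimal sketch. It works with exact fractions so that positions such as 1/3 are not rounded. The function name `grid_positions` and the choice of expressing positions as fractions of one bar are assumptions made for illustration; this is not how any particular library builds its grid.

```python
from fractions import Fraction


def grid_positions(divisions):
    """Return grid positions as fractions of one bar.

    The bar is split into divisions[0] equal parts, each part into
    divisions[1] equal parts, and so on.
    """
    positions = [Fraction(0)]
    span = Fraction(1)  # length of the segment currently being split
    for d in divisions:
        span /= d
        positions = [p + k * span for p in positions for k in range(d)]
    return sorted(positions)


# Split the bar in two, then each half in two: positions 0, 1/4, 1/2, 3/4.
print(grid_positions([2, 2]))
# Split the bar in three: positions 0, 1/3, 2/3.
print(grid_positions([3]))
```

A quantized onset is one that coincides with a position of such a grid, whereas a performed onset generally falls between grid positions.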

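Hands-on example

We will now look at the score of Bach's Prelude BWV 848 in C# major, which is the score of the performance we loaded earlier. Partitura has a dedicated load_score_midi function.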

    from pathlib import Path
    import partitura as pt

    # set the path to the asap dataset (change it to your local path!)
    asap_basepath = Path('../asap-dataset/')
    # select a score, here we use Bach Prelude BWV 848 in C#
    score_path = Path("Bach/Prelude/bwv_848/midi_score.mid")
    print("Loading midi file: ", asap_basepath / score_path)
    # load the score
    score = pt.load_score_midi(asap_basepath / score_path)
    # extract the note array
    note_array = score.note_array()
    # print the dtype of the note array (helpful to know how to interpret it)
    print("Numpy dtype:")
    print(note_array.dtype)
    # print the first 10 notes in the note array
    print("First 10 notes:")
    print(note_array[:10])

The output of this Python program should look like this:

    Numpy dtype:
    [('onset_beat', '<f4'), ('duration_beat', '<f4'), ('onset_quarter', '<f4'), ('duration_quarter', '<f4'), ('onset_div', '<i4'), ('duration_div', '<i4'), ('pitch', '<i4'), ('voice', '<i4'), ('id', '<U256'), ('divs_pq', '<i4')]
    First 10 notes:
    [(0. , 1.9958333 , 0.  , 0.99791664,   0, 479, 49, 1, 'P01_n425', 480)
     (0. , 0.49583334, 0.  , 0.24791667,   0, 119, 77, 1, 'P00_n0', 480)
     (0.5, 0.49583334, 0.25, 0.24791667, 120, 119, 73, 1, 'P00_n1', 480)
     (1. , 0.49583334, 0.5 , 0.24791667, 240, 119, 68, 1, 'P00_n2', 480)
     (1.5, 0.49583334, 0.75, 0.24791667, 360, 119, 73, 1, 'P00_n3', 480)
     (2. , 0.99583334, 1.  , 0.49791667, 480, 239, 61, 1, 'P01_n426', 480)
     (2. , 0.49583334, 1.  , 0.24791667, 480, 119, 77, 1, 'P00_n4', 480)
     (2.5, 0.49583334, 1.25, 0.24791667, 600, 119, 73, 1, 'P00_n5', 480)
     (3. , 1.9958333 , 1.5 , 0.99791664, 720, 479, 51, 1, 'P01_n427', 480)
     (3. , 0.49583334, 1.5 , 0.24791667, 720, 119, 78, 1, 'P00_n6', 480)]
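The time columns of the score note array are related by simple arithmetic, which the following sketch checks on values copied from the printed output. `divs_pq` is the number of divisions per quarter note, so dividing an `onset_div` value by it gives the onset in quarter notes. The factor of 2 between beats and quarter notes is read off the printed numbers (it suggests that the beat in this piece is an eighth note); it is an assumption about this score and not a general rule.

```python
# Values copied from the printed score note array.
divs_pq = 480
onset_div = [0, 120, 240, 360, 480]

# Divisions -> quarter notes.
onset_quarter = [d / divs_pq for d in onset_div]
print(onset_quarter)  # [0.0, 0.25, 0.5, 0.75, 1.0]

# Quarter notes -> beats (assumption: two beats per quarter note here).
beats_per_quarter = 2
onset_beat = [q * beats_per_quarter for q in onset_quarter]
print(onset_beat)  # [0.0, 0.5, 1.0, 1.5, 2.0]
```

Unlike the performance, every onset here is an exact multiple of 120 divisions, which is the quantized grid described earlier.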