A collection of contrived questions intended to address some common misconceptions and uncertainties regarding MIDI.
A FAQ has to start somewhere, and this seems as good a starting question as any.
MIDI is an acronym for Musical Instrument Digital Interface. It is both a hardware and software specification.
It allows electronic musical instruments (synthesizers, samplers, drum machines, etc) and MIDI-capable computers to talk to one another, and for musical performance related data to be communicated between them. In that respect, MIDI is the language with which they communicate.
Using a system of MIDI channels, a single MIDI connection allows independent control of up to 16 different instruments (analogous to 16 musicians).
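As a sketch of how this channel addressing works at the byte level (the helper function name here is mine, not from any library): channel-voice status bytes carry the channel number in their low nibble, which is how one cable carries 16 independent streams.

```python
# Channel-voice status bytes encode the channel (0-15) in the low
# nibble, so e.g. NoteOn on channel 3 has status byte 0x92.

def note_on_status(channel):
    """Return the NoteOn status byte for a MIDI channel (1-16)."""
    if not 1 <= channel <= 16:
        raise ValueError("MIDI channels run from 1 to 16")
    return 0x90 | (channel - 1)  # status bytes 0x90 through 0x9F

print(hex(note_on_status(1)))   # 0x90
print(hex(note_on_status(16)))  # 0x9f
```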
For slightly more detail about the hardware and software aspects of the MIDI specification, see my Introductory Guide to MIDI. For even more detail regarding the software aspects, see my Guide to the MIDI Software Specification.
These terms are used to specify the number of different notes / instruments that a sound generator is capable of playing simultaneously.
• Monophonic is used to describe an instrument that can play only a single note at any time. A flute is an example of a monophonic instrument.
• Polyphonic is used to describe instruments that are capable of playing more than one note at a time. A piano is an example of a polyphonic instrument. An acoustic guitar would be described as a 6 note polyphonic instrument.

• Multi-timbral (many timbres) describes a sound generator that is capable of simultaneously producing the sounds of two or more musical instruments. The individual timbres are often referred to as parts, and these can each be monophonic or polyphonic. Each part is assigned its own MIDI channel so that each timbre can be addressed independently.
Early multi-timbral sound generators were rather limited, e.g. offering 8 note polyphony and 3 part multi-timbrality, with fixed allocation (two monophonic parts and one 6 note polyphonic part).
These days, most sound generators are at least 32 note polyphonic and 16 part multitimbral, with dynamic voice allocation (i.e. the number of notes used for each part varies as needed, within the overall limitations).
Due to the use of MIDI channels to address each part, and the limit of 16 different MIDI channels, any sound generator with just a single MIDI input can be no more than 16 part multi-timbral.
MIDI files are the standard means of transferring MIDI data amongst users – it is a common format across all computing platforms, with numerous example files being available on the internet.
The content of a MIDI file is structured as a series of blocks of data referred to as chunks. A valid MIDI file will contain at least two chunks - the first always being a header chunk, followed by one or more track chunks.
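The header chunk has a fixed layout: a 4-byte identifier ("MThd"), a 32-bit length (always 6), then three 16-bit big-endian values giving the format, the number of track chunks, and the timing division. A minimal parsing sketch (the function name is mine):

```python
import struct

def read_header_chunk(data):
    """Parse the MThd header chunk at the start of a MIDI file."""
    ident, length = struct.unpack(">4sI", data[:8])
    if ident != b"MThd" or length != 6:
        raise ValueError("not a valid MIDI file header")
    fmt, ntracks, division = struct.unpack(">HHH", data[8:14])
    return fmt, ntracks, division

# A minimal format-0 header: one track, 96 ticks per quarter note.
header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, 96)
print(read_header_chunk(header))  # (0, 1, 96)
```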
Track chunks contain a sequence of events, each of which has a delta time value associated with it - i.e. the number of ticks (amount of time) since the previous event. There are three types of event : MIDI, SysEx and Meta.
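Delta times are stored as variable-length quantities: 7 bits per byte, with the high bit set on every byte except the last. Decoding one can be sketched like this (the function name is mine):

```python
def read_delta_time(data, pos):
    """Decode a variable-length delta time; return (value, new_pos)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)  # accumulate 7 bits per byte
        if not byte & 0x80:  # high bit clear marks the final byte
            return value, pos

print(read_delta_time(bytes([0x00]), 0))        # (0, 1)
print(read_delta_time(bytes([0x81, 0x48]), 0))  # (200, 2)
```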
It is possible to mix MIDI messages having different MIDI channels within the same track chunk, i.e. multi-channel tracks are possible.
Meta events are specific to MIDI files, and have no direct equivalent within the standard MIDI specification. They are used for sequencer related information (track names, tempo info, etc).
For full details, see my MIDI Files document.
Compared with sound data files (e.g. Wave or even MP3), MIDI files are extremely compact.
Any decent MIDI sequencer should allow MIDI files to be loaded and saved, in addition to the use of any proprietary file format.
This is an often asked question that reveals a common misconception regarding MIDI files. The question doesn't actually make sense in that MIDI files do not contain sound data. MIDI is concerned with performance actions rather than the sound produced by these actions.
Performance actions are described by the various MIDI events (MIDI messages), covering such actions as : select an instrument, start playing a note, bend any notes currently sounding, stop playing a note, etc.
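The raw bytes behind a few of the performance actions named above can be sketched as follows, assuming channel 1 (status low nibble 0); the specific data values are illustrative, not taken from any particular file:

```python
# Each message is a status byte followed by one or two data bytes.
program_change = bytes([0xC0, 24])         # select an instrument (voice 25)
note_on        = bytes([0x90, 60, 100])    # start playing middle C, velocity 100
pitch_bend     = bytes([0xE0, 0x00, 0x50]) # bend any notes currently sounding
note_off       = bytes([0x80, 60, 0])      # stop playing middle C

print(note_on.hex())  # 903c64
```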
A MIDI Tone Generator (e.g. sampler, synthesizer, drum machine) interprets the MIDI data it receives (representing the performance actions) and produces the appropriate noises.
So, coming back to the original question, the sound quality you get from your MIDI files is entirely dependent on the sound quality of your Tone Generator(s).
Musical quality, however, is a different matter altogether.
There are three different MIDI file formats (as specified by the format info in the file's header chunk).
Format 0 : These files comprise a single track chunk containing both tempo and (possibly multi-channel) MIDI message data.
Format 1 : These files comprise two or more track chunks, that are intended to be played simultaneously. The first track is a tempo track that contains only tempo related Meta events (i.e. no actual MIDI message data). The remaining tracks are each capable of multiple MIDI channel data (though often just one channel per track, as it's more versatile and less messy when it comes to editing). This format is analogous to a multitrack tape recorder paradigm.
Format 2 : These files comprise one or more track chunks that are completely independent of one another, i.e. they are not intended to be played simultaneously. The individual tracks can be complete songs, or they can contain patterns (musical phrases) that represent the elements of a composition. There is no separate tempo track as each track can contain its own (independent) tempo information. This format is analogous to a drum machine paradigm. Format 2 files are not very common on the web, as they either represent work in progress or a bundled collection of songs.
Note : Tempo information includes the Tempo, TimeSig, KeySig, Cueing, etc. Meta events.
The tickdiv value (in a MIDI file's header chunk) specifies two aspects - the timing resolution used within the file, and whether the timing is metrical (bars and beats) or timecode (hours, minutes and seconds) based. Most publicly distributed MIDI files use metrical time.
For metrical time, tickdiv specifies the number of ticks that make up a quarter note (i.e. the number of sub-divisions of a quarter note).
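The two aspects are packed into a single 16-bit word: if the top bit is clear the value is a metrical ticks-per-quarter-note count, and if it is set the high byte holds a negative SMPTE frame rate with ticks per frame in the low byte. A decoding sketch (the function name is mine):

```python
def decode_tickdiv(division):
    """Interpret the 16-bit tickdiv word from a MIDI file header."""
    if division & 0x8000:  # top bit set: timecode based
        fps = 256 - (division >> 8)  # frame rate is stored negated
        ticks_per_frame = division & 0xFF
        return ("timecode", fps, ticks_per_frame)
    return ("metrical", division)    # ticks per quarter note

print(decode_tickdiv(96))      # ('metrical', 96)
print(decode_tickdiv(0xE728))  # ('timecode', 25, 40)
```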
A common value used in many MIDI files is 96 ppqn (pulses per quarter note), i.e. 384 ticks per 4/4 bar.
You will notice that 96 is a nice number for dividing by 2 or 3 (with further repeated halving), so when using this value for tickdiv, triplets and dotted notes are fine right down to hemi-demi-semiquavers (see timing resolution).
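Tying tickdiv to real time is simple arithmetic: a quarter note lasts 60/bpm seconds, so one tick lasts that divided by tickdiv. A sketch (the function name is mine):

```python
def tick_duration_ms(bpm, tickdiv=96):
    """Duration of one tick in milliseconds at a given tempo."""
    return (60_000 / bpm) / tickdiv

# At 120 bpm a quarter note lasts 500 ms, so with tickdiv 96:
print(round(tick_duration_ms(120), 3))  # 5.208 ms per tick
```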
There is no simple straightforward answer to this question, as a number of factors are involved.
With serial transmission there's no such thing as notes playing at exactly the same time. If you play a three note chord, no matter how tightly, those three NoteOn messages will be sent one after the other.
MIDI uses a baud rate of 31250, which (at 10 bits per byte : start bit, 8 data bits, stop bit) equates to 3125 bytes/sec. NoteOn/Off messages (without running status) are 3 bytes each, hence the note resolution of a MIDI data stream is ~1 ms. This is independent of tempo, though at higher tempos you are more likely to encounter this limitation. With running status in effect, this value drops to ~2/3 of a ms.
So the timing resolution of MIDI note data flowing along a MIDI cable is either 2/3 or 1 ms.
This assumes that the transmitting and receiving devices aren't busy doing other things (see latency).
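The arithmetic behind those figures can be checked directly (constant and function names are mine):

```python
BAUD = 31250
BITS_PER_BYTE = 10  # start bit + 8 data bits + stop bit

def message_time_ms(n_bytes):
    """Time to transmit n_bytes over a MIDI cable, in milliseconds."""
    return n_bytes * BITS_PER_BYTE / BAUD * 1000

print(round(message_time_ms(3), 2))  # 0.96 - a full 3-byte NoteOn
print(round(message_time_ms(2), 2))  # 0.64 - a NoteOn under running status
```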
If timecode based time is specified, then the timing resolution is one of 24, 25, 29.97 or 30 frames per second, and is independent of tempo.
If however metrical time is in use (i.e. the tickdiv value specifies divisions of a quarter note), then the timing resolution varies with the tempo, until it hits the limit imposed by the serial transmission method, mentioned above.
The commonly used tickdiv value of 96 (i.e. 384 ticks per 4/4 bar) gives us (at the bottom end) :
 1 tick  = 1/256 note triplet
 2 ticks = 1/128 note triplet
 3 ticks = 1/128 note
 4 ticks = 1/64 note triplet
 6 ticks = 1/64 note (hemidemisemiquaver)
 8 ticks = 1/32 note triplet
 9 ticks = dotted 1/64 note
12 ticks = 1/32 note (demisemiquaver)
This level of resolution may seem to be more than enough for most purposes, and indeed it is if you don't mind your MIDI files sounding a little too perfect. However, when humans play music they introduce subtle timing variations that are perceived as expression rather than inaccuracy. Thus the finer resolution permitted by using a higher tickdiv value enables a human feel to be captured (or introduced) as notes can be positioned marginally off their precise position.
Latency is a timing related issue that is implementation dependent (rather than being a limit imposed by the MIDI specifications). It refers to the delay between the request for an event and the occurrence of that event (e.g. the delay between pressing a key on a MIDI keyboard and hearing the sound produced).
With dedicated hardware synthesizers, this is rarely an issue – even with budget devices. When using a desktop computer however, latency can become a problem, as desktop computers are constantly busy even when apparently idling.
As a general rule, a process implemented in hardware will be quicker than one implemented in software. A soft synth or plug-in software effect (both of which can involve heavy processing) can introduce quite high latencies. A computer with a fast processor and fast RAM can keep software latencies at bay.
A latency of 10 ms or less is considered tolerable for real-time playing. A good quality soundcard with (e.g.) ASIO drivers could have a latency as low as 2 ms.
Quantisation is a process that imposes a regularity on the timing of a musical performance, thus it can be used to tidy up sloppy playing. Sequencers often offer a variety of different quantisation types - tight, groove, human, etc.
This is the process of forcing a strictly mechanical accuracy. It essentially applies a grid to the timeline. So if you sloppily play 4 quarter notes, tight quantisation will move the notes to be in the exact 1/4 bar positions (assuming a time signature of 4/4 and quarter note quantisation).
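Tight quantisation is simply snapping each note's tick position to the nearest grid line. A minimal sketch, with the function name being mine:

```python
def tight_quantise(tick, grid):
    """Snap a tick position to the nearest grid line."""
    return round(tick / grid) * grid

# Sloppily played quarter notes at tickdiv 96 (grid = 96 ticks per beat):
played = [3, 101, 189, 290]
print([tight_quantise(t, 96) for t in played])  # [0, 96, 192, 288]
```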
This is intended to tidy up sloppy playing, but not as mechanically as standard quantisation. There are two main approaches : either notes can be partially quantised (i.e. moved towards but not actually to their exact position) or they can be tightly quantised and then have a random displacement introduced. It can also be used to remove the perfect regularity from a tightly quantised section of music.
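The first of those two approaches (partial movement towards the grid, plus optional random displacement) can be sketched as follows; the function name and parameters are mine, and real sequencers differ in the details:

```python
import random

def human_quantise(tick, grid, strength=0.5, jitter=0):
    """Move a tick part-way towards its grid position, with optional
    random displacement to break up mechanical regularity."""
    target = round(tick / grid) * grid
    moved = tick + (target - tick) * strength
    return round(moved + random.uniform(-jitter, jitter))

# 50% strength pulls a note at tick 101 half-way towards the grid at 96:
print(human_quantise(101, 96, strength=0.5, jitter=0))  # 98
```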
This is slightly different in that it enables a specific groove to be imparted on a section of music. This groove can be one that you specify (e.g. you can do things like advance/retard, or change the emphasis of notes which fall on or near a particular position in the bar), or it can be taken from another track.
The very first sequencers were analogue hardware devices where quantisation wasn't an issue because they only had 16, 24 or maybe 32 steps to a bar. Thus note placement was inherently quantised. Also these sequencers used step-time entry - there was no concept of capturing a real-time performance. It was only with the arrival of software sequencers that greater timing resolution and real-time capture became possible.
However, the increased timing resolution presented a problem for people whose playing was technically sloppy with regard to timing. To combat this, quantisation features were added to sequencers. Initially this was just tight quantisation, which was well received by sloppy players, though hated (and thus not used) by more proficient musicians because it removed all their intentional subtle timing nuances.
Less rigid forms of quantisation (e.g. Human or Groove quantisation) aim to satisfy both camps by helping sloppy players to sound not so sloppy yet without making them sound mechanically (and thus boringly) perfect, whilst adding useful compositional aids that can also be of benefit to more advanced users.
There are three approaches to ensuring a particular instrumentation within a MIDI file :
This is an extension to the original MIDI standard. It specifies the voice mapping for the 128 voices available through the ProgChange MIDI message. On a device supporting the BankSelect command (i.e. one that provides multiple banks of up to 128 voices in each), the GM voices will be in bank zero. A set of 47 drum sounds is also specified.
This approach is best for MIDI files that are intended for public distribution, as most people with MIDI capability of some kind will have a GM mode available to them.
Most hardware synthesizers provide System Exclusive commands allowing voice definition data to be transferred to and from the synthesizer. Thus it is possible to grab voice definitions from the synthesizer and, using a MIDI sequencer, embed this system exclusive data within a MIDI file (at the beginning, with the song data itself starting a couple of bars later). Thereafter, whenever this MIDI file is played, the embedded system exclusive messages will be sent to the synthesizer, setting up the required voices.
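The framing of such a message can be sketched as follows. This is a hedged illustration only: the manufacturer ID used here (0x7D, the non-commercial/educational ID) and the payload bytes are placeholders, since real voice dumps are entirely device specific.

```python
# A System Exclusive message is framed by 0xF0 ... 0xF7, with a
# manufacturer ID immediately after the opening byte.
payload = bytes([0x01, 0x02, 0x03])  # hypothetical voice data
sysex = bytes([0xF0, 0x7D]) + payload + bytes([0xF7])
print(sysex.hex())  # f07d010203f7
```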
This is common practice amongst people who like to define/edit their own voices, though is only really suitable for personal use, as the System Exclusive voice data is device specific.
This is a more recent addition to the MIDI standard that allows wavetable voice definitions (or samples if you prefer) to be embedded within a MIDI file. Thus anyone playing the file via a DLS capable device will hear the exact sounds that the file's composer intended.
Although as yet not many devices support DLS, it will almost certainly become more common. Its main drawback is that sample data tends to be rather large, and MIDI isn't the fastest of communications mediums.
Summarising these three approaches :
• GM - Pros : not device specific; no file size penalty; many devices have GM capability. Cons : variations in the timbre and volume envelope of voices between different GM devices.
• SysEx - Pros : voices are exact. Cons : device specific; files will be slightly larger due to the extra data.
• DLS - Pros : not device specific; voices are exact. Cons : although it has been around for a while, its use is not common; files will be significantly larger due to the embedded sample data.