When you set a tuning fork into motion (a), it vibrates the air molecules around it (b).
Zones of greater or lesser density (c) produce differences in air pressure.
We experience and perceive these differences in pressure as sound.
Look closely at the graph of density (c)...
It describes a back-and-forth motion between high pressure and low pressure zones.
This basic motion is predictable and periodic, and is known as Simple Harmonic Motion.
Imagine: we tie a pen to the vibrating tuning fork, and trace its position on a scrolling paper
(like a seismograph used to record ground motion during an earthquake)...
If we zoom in using a magnifying glass, we'll notice the same back-and-forth motion
oscillating up and down on the paper.
This time, however, the motion is measuring the displacement of the tuning fork in space.
This regular back-and-forth movement between 2 points is an oscillation
and can be described as sinusoidal, meaning it takes the shape of a sine wave.
A sine wave is a mathematical abstraction of simple harmonic motion.
We can describe the physical process of simple harmonic motion using a mass-spring system.
In (a) there is no motion: the mass is still, and it rests at a point (B) of equilibrium.
When we set the spring into motion (b), the mass oscillates up (C) and down (A).
If we slow down a video of a violin string several hundred times,
we can observe simple harmonic motion in the string’s vibration.
Displacement measures the physical movement of something in space.
If we graph the displacement of the mass in space, as we did the tuning fork (a),
we can describe (b) its speed (velocity) and its change in speed (acceleration).
When the mass crosses through its “rest” position (equilibrium), it is moving its fastest.
But when the mass reaches its highest and lowest positions, its change in speed
(acceleration) has the greatest magnitude (positive + or negative -).
All 3 properties (Displacement, Velocity, and Acceleration) can be described
using sine waves that reveal simple harmonic motion.
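As a small illustration (not part of the original slides), here is a sketch in Python of those three sinusoidal curves, assuming an arbitrary frequency and amplitude for the oscillating mass:

```python
# Sketch: displacement, velocity, and acceleration of simple harmonic motion.
# The frequency and amplitude values are arbitrary assumptions.
import numpy as np

f = 2.0                      # assumed oscillation frequency in Hz
A = 1.0                      # assumed maximum displacement
w = 2 * np.pi * f            # angular frequency in radians per second
t = np.linspace(0, 1, 1000)  # one second of time points

displacement = A * np.sin(w * t)          # position of the mass
velocity = A * w * np.cos(w * t)          # fastest when displacement crosses zero
acceleration = -A * w**2 * np.sin(w * t)  # greatest magnitude at the extremes
```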
A sound wave is very similar...
Here, displacement corresponds to amplitude (loudness)
on the Y-axis, and its change over time on the X-axis.
This sound wave is a single note on a plucked string instrument:
It begins with high amplitude, when the string vibrates its loudest, with lots of energy,
but then gets quieter as time goes on, until silence arrives: when the string stops vibrating.
The string loses energy. Excited by just 1 pluck, the string’s energy dissipates
over time, as it returns to a state of equilibrium.
We must remember that this waveform, with its smooth curves, exists only in nature.
When we produce sound through physical means,
we obtain “ideal” smooth and continuous analogue waveforms.
We can measure the instantaneous amplitude (loudness) of a signal at any point in time.
Yet this analogue waveform remains “ideally” smooth and continuous,
with an infinite number of points along the X-axis we could measure!
Here, an audio engineer works with magnetic tape and a vinyl record lathe.
These machines record analogue sound waves from electrical signals
which capture continuous physical vibrations picked up by microphones.
In digital audio, there is no smooth curve but only discrete points that represent curves.
We obtain these points by taking periodic measurements of a waveform’s instantaneous amplitude.
Each point represents the value of a signal’s amplitude at an exact point in time.
These periodic measurements of a waveform are called samples.
The original waveform above was sampled every 5 milliseconds (msec).
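As an illustration, here is a minimal Python sketch of this kind of periodic sampling; the 30 Hz sine wave standing in for the analogue waveform is an assumption, not the signal from the figure:

```python
# Sketch: sample a continuous waveform every 5 milliseconds.
import numpy as np

def analogue_signal(t):
    """Stand-in for a continuous analogue waveform (hypothetical 30 Hz sine)."""
    return np.sin(2 * np.pi * 30 * t)

sample_period = 0.005                            # 5 ms between measurements
sample_times = np.arange(0, 0.1, sample_period)  # sample instants over 0.1 s
samples = analogue_signal(sample_times)          # one amplitude value per instant
```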
To reconstruct the analogue signal, a computer can insert values between the sampled values.
This process of inserting new data between points is called interpolation.
Each black square point represents a sampled value from a signal,
and each red line segment represents additional points
inserted between the samples to improve the signal’s smoothness.
The red line segments look and sound much like the original analogue signal...
...but crucially, this reconstruction will never be precise...
A digital signal can only approximate an original analogue sound wave.
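One simple form of interpolation is linear interpolation, which connects neighbouring samples with straight line segments. A minimal sketch, using hypothetical sample values:

```python
# Sketch: linear interpolation between sampled amplitude values.
import numpy as np

sample_times = np.array([0.000, 0.005, 0.010, 0.015, 0.020])  # seconds
sample_values = np.array([0.00, 0.81, 0.95, 0.31, -0.59])     # hypothetical amplitudes

# Insert 10 new points into every sample interval to smooth the curve.
dense_times = np.linspace(sample_times[0], sample_times[-1], 41)
dense_values = np.interp(dense_times, sample_times, sample_values)
```

Real converters use more elaborate reconstruction filters, but the principle of filling in values between the samples is the same.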
To obtain a digital representation of an audio signal, for example, on your computer,
you will need something to convert between analogue and digital formats.
This is accomplished using an Audio Interface, which connects microphones,
speakers, and other analogue equipment producing a physical vibration
to a computer and to other digital signal processing equipment.
Input signals are converted: analogue-to-digital conversion (ADC),
in order to record, edit, synthesize, and process sound on a computer.
If we disassemble a microphone, we will find a diaphragm...
...which physically vibrates in response to sound,
much like the diaphragms in our ears.
Microphones convert physical energy into electrical signals,
and your ADC converts this into a digital signal.
Output signals are also converted: digital-to-analogue conversion (DAC),
in order to amplify, broadcast, and send sound and audio data from a computer.
If we slow down a video of a speaker several hundred times,
we will see the speaker cone’s physical vibration.
Your DAC converts digital signals into electrical signals,
and your speakers convert this into physical energy.
In digital audio, the sound quality largely depends
on how often we take samples of an input signal.
How often we sample an audio signal is called
the sampling rate — or sampling frequency.
As a rate, the sampling rate measures how many samples are taken per second.
In the example above, A. utilizes a very low sampling rate.
More time passes between each sample value.
B. and C. utilize higher sampling rates, which encode increasingly
higher frequencies from the original signal into digital format.
Lower sampling rates decrease the quality of the audio
because they retain fewer components from the original sound.
However, they take up less memory in our computers.
Higher sampling rates increase the quality of the audio
because they retain more components from the original sound.
However, they take up more memory in our computers.
So, in choosing a sampling rate, there is a tradeoff
between audio quality and computational memory.
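To make the memory side of this tradeoff concrete, here is a rough sketch (assuming uncompressed stereo PCM audio) of how file size grows with the sampling rate:

```python
# Sketch: uncompressed audio size grows in direct proportion to the sampling rate.
def audio_size_bytes(seconds, sample_rate, bit_depth=16, channels=2):
    """Approximate size of uncompressed PCM audio in bytes."""
    return int(seconds * sample_rate * channels * bit_depth / 8)

one_minute_low = audio_size_bytes(60, 22_050)    # lower rate:  ~5.3 MB
one_minute_cd = audio_size_bytes(60, 44_100)     # CD rate:    ~10.6 MB
one_minute_high = audio_size_bytes(60, 96_000)   # higher rate: ~23 MB
```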
In digital audio, the standard sampling rate is 44.1 kHz (kilohertz),
meaning amplitudes are sampled 44,100 times per second!
44,100 is the sampling rate you will find on compact discs (CDs)
and is appropriate for listening, playback, and consumer audio applications.
But for audio recording, editing, and production,
a higher sampling rate is necessary in order to hear detail.
Production sampling rates of 48 kHz and 96 kHz are common.
In our MHL Digitale Kreation classes, 48 kHz is our standard
sampling rate for production and final playback formats.
Another measure of audio quality is its bit depth
which measures the dynamic range of an audio signal
(its total loudness and its resolution of loudness).
We want our musical recordings to have a wide dynamic range.
The louder and softer our recordings are,
the more expressive and dynamic our music will be!
Notice the difference in the number of steps, along the Y-axis,
on the left and right sides of the image above...
The left side has a lower bit depth, while the right side has a higher bit depth,
meaning the right side has more points of resolution
to represent the original amplitude of an audio signal.
Here, the analogue input signal (from a microphone) is in blue,
and its digital representation is in orange.
The sample values are quantized at 3 different bit depths:
2 bits, 4 bits, and 8 bits (left to right).
Notice that the digital signal follows the shape of the original signal more closely as bit depth increases...
Also, notice the quantization error / noise graphs below (in blue)
representing the difference between the digital signal and the analogue input signal.
This difference is the error between the original and digital signals.
Notice there is less noise as bit depth increases.
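A minimal sketch of this quantization step, assuming a simple uniform quantizer and a hypothetical 440 Hz input signal:

```python
# Sketch: quantize a signal at the three bit depths from the figure
# and measure the quantization error.
import numpy as np

def quantize(signal, bits):
    """Map amplitudes in [-1, 1] onto 2**bits evenly spaced levels."""
    levels = 2 ** bits
    index = np.round((signal + 1) / 2 * (levels - 1))  # nearest available level
    return index / (levels - 1) * 2 - 1                # back to the [-1, 1] range

t = np.linspace(0, 0.01, 500)
analogue = np.sin(2 * np.pi * 440 * t)   # hypothetical analogue input

for bits in (2, 4, 8):
    digital = quantize(analogue, bits)
    error = analogue - digital           # quantization error / noise
    print(bits, "bits -> max error", round(float(np.max(np.abs(error))), 4))
```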
Dynamic range is the difference between the loudest and softest sounds in our signal.
When we record and edit audio using higher bit depths,
we have more digital numbers available
to represent the full dynamic range of our music.
At lower bit depths, quieter sounds are masked by error noise,
and louder sounds clip and distort.
In this image, the green bars represent the dynamic range.
The dynamic range is wider for higher bit depths (like 24 instead of 16):
an increased range of numbers to represent the loud and quiet parts of our music.
Notice the dark green bars too: these represent the noise floor,
meaning the quietest sounds which become digital error noise.
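A common rule of thumb, shown in the sketch below, is that each bit adds roughly 6 dB of dynamic range:

```python
# Sketch: each additional bit adds about 6 dB of dynamic range,
# since 20 * log10(2) is roughly 6.02 dB.
import math

def dynamic_range_db(bit_depth):
    """Approximate dynamic range of a signal quantized at the given bit depth."""
    return 20 * math.log10(2 ** bit_depth)

print(round(dynamic_range_db(16), 1))  # ~96.3 dB
print(round(dynamic_range_db(24), 1))  # ~144.5 dB
```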
Here are 2 waveforms.
The upper waveform is softer or quieter:
It only reaches an amplitude of +0.5 / -0.5
The lower waveform is louder:
It reaches a maximum amplitude of +1 / -1
Sounds that are too loud for an audio system to handle are clipped.
Notice how the top and bottom of this waveform become flat
at +1 and -1, creating hard edges instead of smooth curves.
Above +1 and below -1, the waveform should continue making smooth, rounded curves.
But digital audio can only represent sampled amplitude values between +1 and -1.
So, sounds that are louder than this will clip and sound distorted.
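A minimal sketch of this hard clipping, using a hypothetical sine wave whose peaks exceed the +1 / -1 limits:

```python
# Sketch: amplitudes beyond +1 / -1 are flattened, producing hard edges.
import numpy as np

t = np.linspace(0, 0.01, 500)
too_loud = 1.5 * np.sin(2 * np.pi * 440 * t)  # hypothetical signal peaking at +/-1.5
clipped = np.clip(too_loud, -1.0, 1.0)        # everything beyond the limits is flattened
```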
Here are some examples of the same sound recorded at different bit depths.
At lower bit depths, listen for distorted audio signals.
CD quality audio utilizes a bit depth of 16 bits.
16 bits is a reasonable value for listening
and for consumer audio,
but not for audio editing and recording.
In our MHL Digitale Kreation classes, 24 bits is our standard
bit depth for production and final playback formats.
This table illustrates the difference between sample rate and bit depth.
Sample rate corresponds to resolution on the X-axis (time),
while bit depth corresponds to resolution on the Y-axis (amplitude).
Higher values for both result in a better reproduction of a waveform.
The sample rate (or sampling frequency) measures the
number of times per second that an analogue input signal is sampled.
When a signal is sampled, its instantaneous amplitude is measured.
Computers can interpolate between samples by inserting new values
between sample points, making the waveform smoother.
Higher sample rates correspond to more high frequency sounds
represented from the original signal.
Lower sample rates correspond to fewer high frequency sounds,
and a less realistic reproduction of an original signal.
Bit Depth measures the dynamic range of recorded sound.
Higher bit depths correspond to a wider dynamic range,
representing the louder and softer components of a signal.
Lower bit depths mask quieter sounds with error noise,
and louder sounds clip and distort.
CDs are issued with a sample rate of 44.1 kHz and a bit depth of 16 bits.
Remember: these values are for listening and consumer audio,
but they are not suitable for audio production.
In our MHL Digitale Kreation classes,
a sampling frequency of 48 kHz and a bit depth of 24 bits
is our standard for production and final playback formats.
In your projects and assignments,
export audio files using these values.
In digital audio, we can easily describe
a sound’s basic physical measures:
Frequency is a physical measure of a sound’s
perceived quality of highness or lowness.
Frequency is measured in units of hertz (Hz), or cycles per second,
which is comparable to units like RPM (revolutions per minute),
used to describe the rotation of wheels on a vehicle.
Frequency corresponds to vibration,
and describes the compression and rarefaction
of vibrating air molecules, which carry sound waves.
We humans experience, perceive, and describe frequency as pitch.
Frequency corresponds to vibration,
and is related to the distance between
the crests or troughs of a waveform.
This distance is called the wavelength (λ)
for regular and repeating (periodic) waveforms.
As frequency increases and is described using a higher value in hertz,
that is, when there are more cycles per second,
the distance between crests in the waveform decreases.
Higher frequencies therefore correspond to shorter wavelengths.
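A small sketch of this inverse relationship, assuming the speed of sound in air is roughly 343 metres per second:

```python
# Sketch: wavelength = speed of sound / frequency.
SPEED_OF_SOUND = 343.0  # metres per second, approximate value in air

def wavelength_m(frequency_hz):
    """Wavelength (lambda) in metres for a given frequency in hertz."""
    return SPEED_OF_SOUND / frequency_hz

print(round(wavelength_m(110), 2))  # 110 Hz -> ~3.12 m
print(round(wavelength_m(880), 2))  # 880 Hz -> ~0.39 m
```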
It turns out that all of the frequencies we hear and that our ears
are sensitive to occupy just a small range of vibration.
Everything we can hear is located in the zone VLF (“very low frequencies”).
Above this range are radio transmission frequencies,
the infrared, visible light, and ultraviolet light spectra,
and the high frequencies of radiation.
We humans experience, perceive, and describe
this increase in frequency as a rise in pitch.
Frequency and pitch are related, but different concepts...
Frequency is the physical measure of vibration.
Pitch is a perceptual measure of our experience of frequency.
Pitch describes how our ears respond to changes in frequency.
The graph above illustrates our logarithmic perception of pitch:
Each time we double a frequency value (Hz), pitch is raised by an octave.
We perceive that the distance between each octave is the same.
We therefore experience frequency on a linear scale.
But this does not correspond to the physical reality of vibration:
The distance between each octave, in frequency, changes depending on the register...
For example, the distance between A0 and A1 is only 27.5 Hz,
but the distance between A5 and A6 is 880 Hz.
This means that changes in higher frequencies
require greater changes in the number of cycles per second,
while changes in lower frequencies
require smaller changes in cycles per second.
In other words, changes in higher frequencies
require more physical energy
than changes in lower frequencies,
which require less physical energy.
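A minimal sketch of the octave arithmetic behind this example, assuming A0 at 27.5 Hz:

```python
# Sketch: each octave doubles the frequency, so the gap in hertz
# between neighbouring octaves grows with register.
def a_frequency(octave, a0_hz=27.5):
    """Frequency of the pitch A in a given octave (A0 assumed at 27.5 Hz)."""
    return a0_hz * 2 ** octave

print(a_frequency(1) - a_frequency(0))  # A0 to A1: 27.5 Hz apart
print(a_frequency(6) - a_frequency(5))  # A5 to A6: 880.0 Hz apart
```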
Different sound sources, including acoustic instruments,
occupy different frequency ranges.
Can you imagine what this graph would look like
if it were plotted in frequency rather than pitch?
We have already mentioned amplitude...
Amplitude is a measure of the intensity of sound:
its perceived quality of loudness or softness.
When a waveform is louder, the crests do not occur more often, or more frequently...
Instead, the intensity of sound corresponds to the height of each wave;
that is, the maximum of each crest,
and the minimum of each trough.
In this image, the upper waveform is softer or quieter than the lower waveform,
which is louder or more intense than the upper waveform.
Let’s consider how this results in the analogue reproduction of a digital sound wave;
that is, when we send a waveform to our computer’s DAC to be amplified over loudspeakers...
A speaker vibrating faster or slower corresponds to a change in frequency,
but the force of its vibration, that is, whether the speaker cone is displaced
or moves over a greater distance in the same amount of time,
corresponds to the loudness or intensity of sound.
If the cone moves fast and is displaced by a small distance or force,
the result will be a high and quiet sound.
If it moves slowly and is displaced by a larger distance or force,
the result will be a low and loud sound.
Linear Amplitude | dBSPL |
---|---|
1 | 0 |
0.5 | -6 |
0.25 | -12 |
0.125 | -18 |
0.1 | -20 |
0.01 | -40 |
0.001 | -60 |
0.0001 | -80 |
0 | -inf |
Like frequency, amplitude is represented on a number of possible scales.
And like frequency, amplitude is perceived on a linear scale
while its physical properties are measured on a logarithmic scale.
In music, we use a relative scale with dynamic markings
to express loudness, using symbols like ff and ppp.
Linear amplitude is one kind of digital scale.
Audio sample values are encoded into a sound file using linear amplitude.
0 is equivalent to silence and 1 is equivalent to the loudest possible sound.
On this scale, all possible levels of loudness are represented between 0 and 1.
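The table above follows from a standard conversion, sketched below: decibels equal 20 times the base-10 logarithm of the linear amplitude.

```python
# Sketch: convert linear amplitude (0..1) to decibels relative to full scale.
import math

def linear_to_db(amplitude):
    """20 * log10(amplitude); silence (0) maps to negative infinity."""
    return 20 * math.log10(amplitude) if amplitude > 0 else float("-inf")

for a in (1, 0.5, 0.25, 0.125, 0.1, 0.01, 0.001, 0.0001, 0):
    print(a, "->", round(linear_to_db(a), 1), "dB")
```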
dBSPL means sound pressure level (SPL) measured in decibels (dB)
and is a measure of pressure within the air carrying sound waves.
It is mapped onto a scale (dB) whose values represent a ratio
between a sound and a reference value, like the pressure value of silence.
In the table above, the reference value 0 represents the
loudest possible sound a digital audio system can handle.
All other values are increasingly negative as they become softer.
Silence is represented as infinitely quiet (-inf).
You will find a related scale dBFS printed on the faders of mixing consoles
(“decibels relative to full scale”).
0 is full scale and represents no change to the incoming signal.
Values above 0 represent amplification of the signal,
while values below 0 represent attenuation of the signal.
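A minimal sketch of applying a fader setting expressed in decibels to a signal (the sample values here are hypothetical):

```python
# Sketch: 0 dB leaves a signal unchanged; positive values amplify, negative attenuate.
def apply_fader(sample, gain_db):
    """Scale a linear sample value by a gain expressed in decibels."""
    return sample * 10 ** (gain_db / 20)

print(apply_fader(0.5, 0))   # 0 dB: unchanged        -> 0.5
print(apply_fader(0.5, 6))   # +6 dB: roughly doubled -> ~1.0
print(apply_fader(0.5, -6))  # -6 dB: roughly halved  -> ~0.25
```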
dBSPL | Common Sounds |
---|---|
140 | threshold of pain |
130 | jet taking off |
120 | rock concert |
110 | symphony orchestra fortissimo |
100 | truck engine |
90 | heavy traffic |
80 | retail store |
70 | office |
60 | normal conversation |
50 | silent house |
30 | leaves rustling |
20 | wind |
0 | weakest perceptible sound |
Here is a decibel scale that reverses the reference value,
where 0 is now the quietest perceptible sound.
All other common sounds are described as pressure levels above silence.
from Sound Synthesis and Sampling by Martin Russ
Synthesis is defined in the Chambers 21st Century Dictionary as:
“building up; putting together; making a whole out of parts.”
The process of synthesis is thus a bringing together.
The word synthesis is frequently used in just two major contexts:
the creation of chemical compounds
and production of electronic sounds.
from Sound Synthesis and Sampling by Martin Russ
Sound synthesis is the process of producing sound.
It can reuse existing sounds by processing (or transforming) them,
or it can generate sound electronically or mechanically.
In our class, sound synthesis is used to designate the latter, meaning:
Sound synthesis generates sound electronically or mechanically.
Found sounds will be our term for reusing existing sounds.
For example, when we used a tape machine
to sample and transform existing sound
(that is, sound recorded with a microphone),
that process was separate from the generation of new electronic sound (synthesis).
In synthesis, basic waveforms are often used:
For example, sine, sawtooth, triangle, and square waves.
These waveforms are “built up” and “put together” to form a “whole.”
Often, this “whole” is more complex than its individual, “pure” waveforms.
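As an illustration (not part of the original slides), here is a sketch of these basic waveforms, generated at an assumed frequency of 220 Hz and a sample rate of 48 kHz:

```python
# Sketch: one second each of a sine, sawtooth, triangle, and square wave.
import numpy as np

sr = 48_000                   # samples per second (assumed)
t = np.arange(sr) / sr        # one second of sample times
f = 220.0                     # assumed frequency in Hz
phase = (f * t) % 1.0         # position within each cycle, from 0 to 1

sine = np.sin(2 * np.pi * f * t)
sawtooth = 2.0 * phase - 1.0             # ramps from -1 up to +1 each cycle
triangle = 2.0 * np.abs(sawtooth) - 1.0  # folds the ramp into a triangle
square = np.sign(sine)                   # +1 for half the cycle, -1 for the other half
```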
Synthesis is produced using a synthesizer:
a machine that is used to electronically or mechanically produce sound.
For example, this Buchla synthesizer passes electronic signals through many colored patch cables,
and is controlled by many knobs, buttons, and sliders that Suzanne Ciani performs with.
We hear the waveforms produced by the synthesizer through our speakers.
Or this Yamaha DX-7 synthesizer, which uses a familiar keyboard as its user interface,
and is controlled in a much more conventional and musically-intuitive way.
Synthesizers consist of many parts that produce or modify electronic signals,
such as oscillators and filters. An oscillator generates periodic waveforms,
such as sine waves. A filter allows some frequencies to pass through it,
while blocking others. Synths may use additional low frequency oscillators (LFOs)
to modulate their sounds; that is, to produce “change over time.”
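A minimal sketch of an LFO producing “change over time”: here it slowly modulates the amplitude of an audible oscillator (a simple tremolo); all frequencies and the modulation depth are assumed values.

```python
# Sketch: a 5 Hz LFO modulating the amplitude of a 440 Hz oscillator.
import numpy as np

sr = 48_000
t = np.arange(sr * 2) / sr             # two seconds of sample times

carrier = np.sin(2 * np.pi * 440 * t)  # audible oscillator
lfo = np.sin(2 * np.pi * 5 * t)        # low frequency oscillator
depth = 0.5                            # how strongly the LFO affects the amplitude

tremolo = carrier * (1.0 - depth * (0.5 + 0.5 * lfo))  # amplitude changes over time
```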
Click through the embedded slide presentation above...
These slides and especially the images herein are from a number of well-known texts on acoustics, music, and musical technology below: