Home Page Author: theover.tripod.com
To make clear what the basics and some more advanced ideas in sound synthesis are, this page contains an overview of the theory of sampling and Fourier analysis, sound generation, and various synthesizer-related signal processing building blocks.
Copyright is reserved for all materials on this and affiliated pages. Permission to use and reuse the materials for non-commercial purposes, with the source mentioned, can be granted by the author in writing.
First come the ideas needed to understand what sound samples are, what they look like, and what they probably sound like. As this page grows, more will be added on specific synthesis elements I have experimented with.
The Java applet on this page takes on a central role, because it allows viewers to try out for themselves what certain signals look like, what they are made of, and even what their frequency spectrum is.
Software is available for doing string simulations (the program sources are in the zip file, example sound file 1, page with older version, cygwin library and tcl link explanation), sample processing (A) (the program source version A (C file), see page A for more sources and examples, example sound file A, example sound file B), and even for the program source, which all work with CD quality (currently mono) .wav files (see the various pages for examples), which can be played with Netscape, Windows Media Player and the like.
I can't use these myself at the moment, but the sources and Windows 95, 386+/Pentium+ executable programs are available. Beware of the need for the cygwin library: the right version and where it can be obtained are indicated on the various pages where the programs are described. There is a zipped version somewhere for easier download. Just install the version going with the .prg file in the same directory as the program file to make things work, or put it in the 'windows' directory.
These programs are fully capable of making high quality synthesized sounds and processing them in various ways, but they don't contain synthesizer building blocks such as analog (equivalent) filtering, envelope generators and controlled amplifiers. They also don't have the interpolated Fourier spectrum synthesis and sample looping facilities that are available in the (8 bit) example software (the program sources are in the zip file, older source as text file see page) for driving my current synthesizer hardware. That software doesn't generate output wave files; it communicates over the printer port with my hardware, though it can be used to do graphical interactive waveform generation and display on a 286+ machine (VGA or so).
As I have time I'll add information about wav players, PC audio processing software, and mpeg coding.
The amplitude of the pressure waves is a measure of how loud a sound appears, whereas the repeat rate of the pressure changes is a measure of the frequency of the sound: fast pressure changes are high frequencies (or high notes), slow pressure changes are low frequencies.
When a microphone is used, the (small) electrical signal from it is similar to the patterns of changing pressure in the air that it detects. After electronic amplification, such a signal can be made visible, for instance on an oscilloscope, where it can be seen as a changing, wavy line on the screen. If you had a seismograph writer (the machine with pens that wiggle when an earthquake occurs) that was fast enough, the pressure changes picked up at a certain listening position in the air, by an ear or a microphone, could be recorded on paper as wiggly patterns in a non-volatile form.
A computer with a sound card and a microphone can also be used to make a sound recording, but in a different way, and can display that recording in graphical form. The main idea, as many will know, is that the electrical signal the sound produces in the microphone is amplified and fed to an Analog to Digital Converter on the sound card. Such a converter periodically takes a sample of the signal, that is, it measures the voltage of the microphone signal at those times, and the measurements are then recorded as a long list of what are called sample values.
The AD converter takes the electrical wire from the input on one side, and produces digital samples on its outputs on the other side, making digital signal patterns according to the voltages it measures all the time. The computer puts these values in its memory, one every short time interval, storing them in a part of its memory where they can later be used or processed. The DA converter can be given these values from that memory in the same order, and converts them back to an electrical signal that sounds the same as the signal that was put into the input of the AD converter. In general terms this is called sampling and sample playback.
The idea of sampling is that both the number of possible values for each sample and the number of samples recorded per second are limited. On a CD player, exactly 44100 samples per second (44 thousand) are taken or played, and 65536 levels are distinguished. That means a number is recorded about every 23 microseconds (roughly a 44th of a millisecond), and the number is about 5 digits accurate, which is not that easy electronically, though currently it is not considered all too hard: there are many chips that have circuits to do this job.
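The arithmetic behind these CD figures can be checked in a few lines. This is only a small sketch; the variable names are my own and not taken from any of the programs on this site.

```python
SAMPLE_RATE = 44100      # samples per second, as on a CD
BITS = 16                # bits per sample

sample_period_us = 1_000_000 / SAMPLE_RATE   # time between samples, in microseconds
levels = 2 ** BITS                           # number of distinct sample values
decimal_digits = len(str(levels))            # digits needed to write that count

print(f"one sample every {sample_period_us:.1f} microseconds")
print(f"{levels} levels, about {decimal_digits} decimal digits")
```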
The term sampling refers to the idea of taking a sample of the signal at some point in time, and there is a lot of theory indicating how many samples we need to take to be able to reconstruct the original signal from the samples. The fact that this is possible is not at all trivial, and it depends on the signal having limited bandwidth, which means that the highest frequency in it is limited. As an idea, when the wiggles made by the seismograph are slow, it would suffice to make a little mark every centimeter or so to get a good idea of the signal, whereas when it changes rapidly all the time, it might not even be good enough to do so every millimeter.
The remarkable part of the theory is that it can be mathematically proven that samples taken at regular intervals are enough to reconstruct the exact original signal, with mathematically perfect accuracy, when the highest frequency in the signal is less than half the sample frequency.
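What goes wrong above half the sample frequency can be shown with a small sketch. The function name and the chosen frequencies are only assumptions for illustration: a tone of 43100 Hz sampled at 44100 Hz gives exactly the sample values of an inverted 1000 Hz tone, so from the samples alone the two cannot be told apart.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, count):
    """Return `count` samples of a unit sine of the given frequency."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]

SAMPLE_RATE = 44100
low = sample_sine(1000, SAMPLE_RATE, 64)                 # well below half the sample rate
high = sample_sine(SAMPLE_RATE - 1000, SAMPLE_RATE, 64)  # above half the sample rate

# The too-high tone yields the same sample values as the 1000 Hz tone with
# its sign flipped, so adding the two lists gives (nearly) zero everywhere.
worst = max(abs(a + b) for a, b in zip(low, high))
print(f"largest value of high + low: {worst:.2e}")
```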
The less well known term quantization refers to the idea that when we measure the height of the signal, for instance with a ruler on the seismograph paper, we measure maybe up to a millimeter, but not more accurately, so wiggles smaller than a millimeter are lost and not recorded. Doing so introduces a quantisation error of, for instance, plus or minus half a millimeter. For an audio signal of let's say .65 volt, the quantisation step of a 16 bit signal would be about 1/65000 of .65 volt, which is 10 microvolt, a very small voltage difference. But when a small signal is present, for instance of .0065 volt (turn the volume knob a quarter or so), that same 10 microvolt at the CD player's output connectors is about 0.15 % of the signal, a hundred times larger in relative terms, and for still quieter passages it becomes audible and unacceptable for hifi audio.
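The size of one quantisation step, and how it compares to a large and a small signal, can be worked out as follows. This is a sketch under the assumption that the 16 bits are spread evenly over a full scale of .65 volt; the function names are mine, and the actual error is at most half a step either way.

```python
def quantisation_step(full_scale_volts, bits):
    """Size of one quantisation step when `bits` bits span the full scale."""
    return full_scale_volts / (2 ** bits)

def relative_step(signal_volts, full_scale_volts, bits):
    """Quantisation step as a fraction of a given signal amplitude."""
    return quantisation_step(full_scale_volts, bits) / signal_volts

step = quantisation_step(0.65, 16)   # assumed full scale of 0.65 volt
print(f"step size: {step * 1e6:.1f} microvolt")
print(f"relative to 0.65 V:   {relative_step(0.65, 0.65, 16):.4%}")
print(f"relative to 0.0065 V: {relative_step(0.0065, 0.65, 16):.4%}")
```

The same functions can be called with 20 or 24 bits to see how much smaller the step becomes with the more accurate formats.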
The sampling accuracy is pretty good when 16 bits are used, though there are newer recording studio facilities and media (such as dvd) where even more accurate signals of for instance 20 (1 in a million) or 24 bits are used, to prevent distortion with smaller signals.
The fun starts when a sample is put in a computer memory and it can be modified or stored, for anything from phone answering machines to making a hit record. For normal CD quality, the lists of samples take 44100x2x2 = ca. 176 kilobytes, or one sixth of a megabyte, per second of recorded sound, because every sample takes two bytes, times two channels for stereo. So on a 128 megabyte computer system, about 10 minutes of CD quality sound samples fit in memory, without the need to store them on disc. For dealing with sound synthesis, this is a generous amount that is normally not needed, except when for instance every note of a grand piano is sampled separately, for various loudness levels; then it may well not suffice.
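The storage figures can be verified with a short calculation. It is only a sketch: it assumes a megabyte of 1024 x 1024 bytes and that all of the memory is free for samples, so it gives an upper bound; memory taken by the operating system and programs brings the practical figure down.

```python
SAMPLE_RATE = 44100       # samples per second per channel
BYTES_PER_SAMPLE = 2      # 16 bits
CHANNELS = 2              # stereo

bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS

def seconds_that_fit(memory_bytes):
    """Seconds of CD quality sound that fit in the given number of bytes."""
    return memory_bytes / bytes_per_second

memory = 128 * 1024 * 1024   # assumes 1 megabyte = 1024 * 1024 bytes
minutes = seconds_that_fit(memory) / 60
print(f"{bytes_per_second} bytes per second, {minutes:.1f} minutes in 128 megabytes")
```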
As we know from commercials and such, when we play a voice back at a different pitch, we get growly or Mickey Mouse voices, and when we repeat a little piece of a recording a few times, funny stuttering effects occur. The former is an example of changing the pitch of the recorded sample, which can be done by changing how fast we feed the samples to the digital to analog converter: if we do this faster, we reach the end of the sample sooner and the sound will be higher in pitch. The other effect is called 'looping' a sample: repeating a section of it, for instance one word or a syllable, in a loop, many times.
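Both effects are easy to try on a list of sample values. The sketch below is not taken from my sample processing programs; the function names are made up for illustration. Instead of changing the speed of the converter, it steps through the sample list faster or slower (with linear interpolation between neighbouring values), which gives the same pitch change when the result is played at the normal rate.

```python
import math

def change_pitch(samples, ratio):
    """Read through `samples` `ratio` times as fast, using linear interpolation.

    ratio 2.0 gives half as many samples (an octave up at a fixed playback
    rate); ratio 0.5 gives twice as many (an octave down).
    """
    out = []
    position = 0.0
    while position < len(samples) - 1:
        i = int(position)
        frac = position - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
        position += ratio
    return out

def loop(samples, start, end, repeats):
    """Play up to `end`, then repeat samples[start:end] `repeats` more times."""
    return samples[:end] + samples[start:end] * repeats + samples[end:]

tone = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(44100)]
higher = change_pitch(tone, 2.0)       # an octave up, half as long
stutter = loop(tone, 1000, 2000, 3)    # one section repeated three extra times
print(len(tone), len(higher), len(stutter))
```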
Clearly there is no reason to limit the type of signals which we can record as a sample, so we can plug a guitar, an electronic organ, an effects record, or whatever into the AD converter of the sound card, and make digital recordings in the form of short or long 'samples', which is then the terminology for referring to the list of bytes in the computer containing the audio signal, like the seismograph paper.
The same then holds as with the voices sounding like Mickey Mouse when we vary the playback pitch: the sound becomes unnatural when it is detuned. A middle piano key is not a bass note played at 78 rpm from an LP; the sound structure is different, even though the instrument is the same.
The main point is that the way a recording sounds depends on the signal's appearance: certain signals sound certain ways. The main point of the seismograph approach is that it visualizes the signal as a graph, which can at least be inspected visually. Suppose we put all the samples in a row as a wiggly line on the computer screen; we should then in principle be able to tell which signal graph corresponds to which audio impression. When the line is completely constant, there is no signal; when there are wild and rapid excursions, there are probably wild noises with high frequencies going on.
The rest is hard, really quite hard, for two reasons: there are a lot of samples to put on a graph, 44 thousand for a short, one second sound, and in general the shape of the wiggle is not at all easy to relate to a specific sound. The former can be solved by showing graphs which are 'zoomed down' from many samples, by taking a lot of them together for each little line segment of the graph. That works, for instance, to see the pauses between the words in a sample of a spoken sentence, which show up as little flat, 0 amplitude pieces of sound absence in the graph.
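One common way to 'zoom down' is to keep only the lowest and highest value of each block of samples, one block per column of the graph. The sketch below shows that idea under my own naming; it is not necessarily how the applet on this page does it.

```python
def zoom_down(samples, columns):
    """Reduce `samples` to (lowest, highest) pairs, one per graph column."""
    block = max(1, len(samples) // columns)   # samples taken together per column
    pairs = []
    for start in range(0, block * columns, block):
        chunk = samples[start:start + block]
        if chunk:                             # skip columns beyond a short input
            pairs.append((min(chunk), max(chunk)))
    return pairs

# A silent stretch, a burst of sound, and silence again.
recording = [0, 0, 0, 0, 3, -2, 1, -1, 0, 0, 0, 0]
print(zoom_down(recording, 3))
```

A pause shows up as a column where the lowest and highest value are both 0, which is the flat piece in the graph.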
It doesn't work that well for more signal information, which can be understood
Certain menus will simply overwrite the formulas with preset values, and some also press the plot buttons, while others only change the formula, so you can plot yourself when desired.
These links also change the expression field when clicked.
A square wave's first 8 harmonics
's first 8 harmonics (with reversed time, to make it an
A square wave's first 32 harmonics
The idea of the non-perfect waveforms is to show that we can do the reverse of the analysis of the perfect waves: if we make the fundamental components, the sine waves of the spectrum, ourselves, in the right ratio, the same type of waveform can be made, and the more components of the infinite sequence we take into account, the better the 'perfect' wave is approximated.
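For a square wave this can be tried out numerically. The sketch below uses the standard Fourier series of a square wave (odd harmonics only, each with amplitude 1/n); the function names are mine, and I count only the non-zero, odd components, which is an assumption about what 'first 8 harmonics' means.

```python
import math

def square_partial_sum(phase, harmonics):
    """Sum the first `harmonics` non-zero (odd) sine components of a square wave.

    `phase` runs from 0 to 1 over one period; the ideal wave is +1 in the
    first half of the period and -1 in the second half.
    """
    total = 0.0
    for k in range(harmonics):
        n = 2 * k + 1                      # odd harmonic numbers 1, 3, 5, ...
        total += math.sin(2 * math.pi * n * phase) / n
    return 4 / math.pi * total

def rms_error(harmonics, points=1000):
    """Root-mean-square distance between the partial sum and the ideal square."""
    err = 0.0
    for i in range(points):
        phase = (i + 0.5) / points         # midpoints avoid the jump positions
        ideal = 1.0 if phase < 0.5 else -1.0
        err += (square_partial_sum(phase, harmonics) - ideal) ** 2
    return math.sqrt(err / points)

for count in (1, 8, 32):
    print(count, round(rms_error(count), 3))
```

The printed error shrinks as more components are added, while leaving out the highest ones gives the rounder, duller wave.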
Also, we see the same effect as takes place in filtering: if we take harmonics away on the high side, the sound gets duller, as if it has its higher frequency components filtered out by a low-pass filter, and the graph gets rounder as the fast changes are smoothed.
The idea of the time reversal and shifting is to show that the same sinusoidal components can all be shifted at the same time without the spectrum changing. It should be clearly noted that the phases in the spectrum are important, and should be in the graph, but just shifting the whole wave and leaving its shape intact doesn't change the sound impression spectrum-wise.
When the frequency of the waveform is increased a bit, by ten percent in the example, it doesn't fit the window anymore, and there will be a discontinuity where the wave 'wraps around' at the boundaries of the analysis interval. The sound becomes sharp at those points, which is reflected in the spectrum by a range of harmonics. The harmonic analysis of the waveform as part of a longer sample then fails to give a good impression in easy musical terms.
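The effect can be reproduced with a plain discrete Fourier transform. This is a minimal sketch with names and window length of my own choosing, not the applet's code: one sine fits the analysis window exactly, the other is ten percent higher in frequency.

```python
import math

def harmonic_magnitudes(samples, count):
    """Magnitudes of DFT bins 1..count of `samples` (plain, slow DFT)."""
    n = len(samples)
    mags = []
    for k in range(1, count + 1):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        mags.append(2 * math.hypot(re, im) / n)
    return mags

N = 256                                                          # assumed window length
fits = [math.sin(2 * math.pi * 1.0 * i / N) for i in range(N)]   # exactly one period
off = [math.sin(2 * math.pi * 1.1 * i / N) for i in range(N)]    # ten percent higher

fit_mags = harmonic_magnitudes(fits, 6)
off_mags = harmonic_magnitudes(off, 6)
print([round(m, 3) for m in fit_mags])
print([round(m, 3) for m in off_mags])
```

The fitting sine shows up in the first harmonic only, while the detuned one spreads over a whole range of harmonics.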
example filtered (no phase correction)
spectrum change as generating harmonics
The basis is the modulation of oscillators in the frequency sense, that is, the frequency of one oscillator is changed by another's output waveform. The result is that if we use sine oscillators, for the simplest starting spectrum
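A minimal sketch of two sine oscillators, one modulating the other, is given here. It is written in the phase modulation form that is commonly used for FM synthesis, which for a sine modulator amounts to frequency modulation by a sinusoid; the function name and parameters are assumptions for illustration, not code from my synthesizer software.

```python
import math

def fm_tone(carrier_hz, modulator_hz, index, sample_rate, count):
    """Sine carrier whose phase is swung by a sine modulator.

    `index` sets how strongly the modulator acts; 0 leaves a pure sine.
    """
    out = []
    for n in range(count):
        t = n / sample_rate
        modulation = index * math.sin(2 * math.pi * modulator_hz * t)
        out.append(math.sin(2 * math.pi * carrier_hz * t + modulation))
    return out

pure = fm_tone(440, 110, 0.0, 44100, 100)      # no modulation: plain sine
bright = fm_tone(440, 110, 2.0, 44100, 100)    # modulated: extra harmonics appear
print(pure[:3], bright[:3])
```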
digital update rate (also control signals)
frequency accuracy, detuning, phase stability (also oscillator syncing)
dynamical properties (filter types, resonant behaviour, transients, non-linearities)
Attack portion and signal phases
mutual influences of parts
That system is built completely by hand from dozens of chips, many electronic parts, and easily a thousand wires to connect it all up. It has a red LED display with 16 characters and a little calculator keyboard with 40 keys, and can be played by a little baby synthesizer keyboard with a few octaves. It drives an audio amplifier and quality speakers for monitoring the sounds that come from its custom built, high speed 8 or 9 bit (but accurately so) DA converter.
Retro page about my old synth (20 years ago) and proposals for new ones, in Dutch.
Diary page list, with various pages describing in depth some of the synth experiments and buildup
Pictures of most synths I have owned.
example waveguide C code
A basic sampler hardware design, electronics and digital part, that is, with simulation results
Highlights Page, lots of stuff, also about the string modeling software
Page of the university where I previously worked (with picture), and project page.
Piano recording mpeg file.