Audio forensics on CEDAR Cambridge
Cleaning up audio signals for forensic investigation imposes very different requirements from those encountered when restoring audio for re-release or broadcast. In forensic work, intelligibility is the overriding criterion; in restoration, listenability is paramount. There are, of course, overlaps between the disciplines. For example, transcription experts can work more effectively and for longer if the material is more listenable, although they will extract more words if the material is more intelligible. Some commentators suggest that enhancing listenability is a simple procedure, satisfied by applying EQ, compression, and gain. This is false: any of the full arsenal of CEDAR restoration tools may be relevant to a given job, depending upon the types of noise that are obscuring the wanted audio. In addition, enhancing intelligibility often requires a further class of algorithms known as adaptive filters.
Types of noise encountered in audio forensic investigation
Covert and other surveillance recordings are usually made under less-than-ideal circumstances, so one or more forms of noise will almost always be present. A non-exhaustive list of such noises includes:
- air conditioning noise
- excessive reverberation and echoes
- wind and rain
- road vehicles and aircraft
- engine noise
- domestic appliances such as refrigerators
- live music
- other conversations
- radio and TV
- induced hum and/or buzz from lighting and other sources
- interference from nearby transmitters such as mobile telephones
- faulty microphones and recording equipment
- induced noise on a recording medium
Some of these noises are best removed using processes such as debuzz or declickle, and many are reduced by the judicious use of band-limiting EQ and FNR. Others will respond best to the adaptive filters described here, or to a combination of adaptive and other filtering techniques.
What are Adaptive Filters?
The principal use of filters in a forensic setting is to separate the wanted part of the signal (generally speech) from the undesired. If the characteristics of the noise are statistically constant (for example hiss from a tape or a constant hum or whistle) it is possible to design a static filter that, in some mathematical sense, optimally separates the speech from the noise. This is what we instinctively do when setting up a notch filter to remove a simple tone. There are, however, circumstances when this job is difficult or impossible to do manually. For example:
- when the noise has a complicated spectrum, such that the filter needs a large number of parameters to be adjusted
- when the noise characteristics are varying rapidly, such that manual setting and adjustment of the parameters is impossible
- when the noise exhibits both of these characteristics (e.g. the sound of a TV broadcast mixed in with the signal)
In these cases we require a filter which has a large number of internal parameters that it can automatically adjust on our behalf (perhaps with some minimal degree of guidance), in accordance with the varying signal characteristics. Such filters are called adaptive filters and CEDAR Cambridge offers two types, both implemented in single- and cross-channel forms.
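To make the static case concrete, the following Python sketch removes a constant tone with a fixed notch filter. It is an illustration only and not CEDAR's implementation: it uses the standard second-order notch from the widely published Audio EQ Cookbook, and the 50 Hz hum, 8 kHz sample rate, Q value and all function names are assumptions chosen for the example.

```python
import math

def notch_coefficients(freq_hz, sample_rate_hz, q=10.0):
    """Second-order notch (Audio EQ Cookbook form), normalised so a[0] == 1."""
    w0 = 2.0 * math.pi * freq_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a

def biquad(samples, b, a):
    """Apply the filter sample by sample (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def tone(freq_hz, sample_rate_hz, n):
    """Unit-amplitude sine wave used as a test signal."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Hypothetical example values: a 50 Hz hum sampled at 8 kHz.
SAMPLE_RATE_HZ = 8000
b, a = notch_coefficients(50.0, SAMPLE_RATE_HZ)
hum_after = biquad(tone(50.0, SAMPLE_RATE_HZ, 8000), b, a)
voice_band_after = biquad(tone(1000.0, SAMPLE_RATE_HZ, 8000), b, a)
```

Once the filter has settled, the level of hum_after falls towards zero while voice_band_after is left almost untouched. The filter has only two settings (centre frequency and Q), which is why it can be tuned by hand; a noise with a complicated or changing spectrum would need many such settings updated continuously, which is the job an adaptive filter takes over.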
Single-channel filters seek to separate the wanted sound (speech) from unwanted sounds (background noise) by identifying the rate of change of components within the overall signal. Because speech changes rapidly, all rapidly varying signal components are deemed to be wanted and are retained. Less rapidly varying components are rejected and are not passed to the output.
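One textbook way to act on this idea is an adaptive linear predictor, sketched below in Python. This is not a description of CEDAR's algorithm; it is a minimal illustration that assumes a normalised LMS (NLMS) update. The filter learns to predict the steady, slowly varying part of the signal from its own recent past and subtracts that prediction, so what remains is the part that could not be predicted. The steady 200 Hz drone, the random signal standing in for speech, and every name and parameter value here are assumptions made for the example.

```python
import math
import random

def adaptive_predictor(samples, taps=16, delay=1, mu=0.05, eps=1e-8):
    """Return the prediction error of an NLMS linear predictor.

    Steady components are predictable from the past and are removed;
    rapidly changing, unpredictable components stay in the output.
    """
    weights = [0.0] * taps
    history = [0.0] * (taps + delay - 1)  # history[0] holds the previous sample
    residual = []
    for x in samples:
        past = history[delay - 1:delay - 1 + taps]
        prediction = sum(w * p for w, p in zip(weights, past))
        error = x - prediction
        # Normalised LMS step: scale the update by the energy in the window.
        step = mu * error / (eps + sum(p * p for p in past))
        weights = [w + step * p for w, p in zip(weights, past)]
        history = [x] + history[:-1]
        residual.append(error)
    return residual

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def correlation(a, b):
    """Pearson correlation between two equal-length sample lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def demo(seed=0, n=8000, sample_rate_hz=8000):
    """Mix a steady drone with an unpredictable 'wanted' signal and filter it."""
    rng = random.Random(seed)
    drone = [math.sin(2.0 * math.pi * 200.0 * i / sample_rate_hz) for i in range(n)]
    wanted = [rng.uniform(-0.3, 0.3) for _ in range(n)]  # stand-in for speech
    mixture = [d + w for d, w in zip(drone, wanted)]
    return mixture, adaptive_predictor(mixture), wanted
```

Calling demo() returns the mixture, the filtered output and the wanted component, so the output level and its similarity to the wanted component can be compared using rms and correlation. Real speech is correlated over short time spans, so a practical predictor of this kind would typically use a longer delay than the single sample used here.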
Cross-channel filters separate the desired signal from an obscuring signal such as a radio or TV broadcast by comparing the recording with a reference track containing the interference. The adaptive filter determines which elements of the recording are due to the content of the reference (even if this has been changed significantly by, for example, transmission or replay) and which elements do not match the reference and are therefore deemed to be the required signal. To obtain optimum results, the surveillance recording and the reference must be time-aligned, so CEDAR Cambridge provides an additional module to ensure that this is the case.
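The two steps described here, time alignment followed by reference-based cancellation, can be sketched with standard textbook methods. The Python below is an assumption-laden illustration and not CEDAR's implementation: it estimates the offset from the peak of the cross-correlation and then uses a normalised LMS (NLMS) filter to model how the reference appears in the recording. The three-tap path, the 37-sample offset, the test signals and all function names are hypothetical.

```python
import math
import random

def estimate_lag(recording, reference, max_lag):
    """Delay (0..max_lag samples) of `reference` within `recording`,
    taken as the largest-magnitude peak of the cross-correlation."""
    n = min(len(recording), len(reference)) - max_lag
    best_lag, best_score = 0, -1.0
    for lag in range(max_lag + 1):
        score = abs(sum(recording[i + lag] * reference[i] for i in range(n)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def delay_signal(signal, lag):
    """Shift a signal later by `lag` samples, padding the start with zeros."""
    return [0.0] * lag + signal[:len(signal) - lag]

def nlms_cancel(recording, reference, taps=8, mu=0.1, eps=1e-8):
    """Subtract the part of `recording` that can be modelled from `reference`."""
    weights = [0.0] * taps
    window = [0.0] * taps  # window[k] holds reference[n - k]
    cleaned = []
    for d, r in zip(recording, reference):
        window = [r] + window[:-1]
        estimate = sum(w * x for w, x in zip(weights, window))
        error = d - estimate  # whatever does not match the reference
        step = mu * error / (eps + sum(x * x for x in window))
        weights = [w + step * x for w, x in zip(weights, window)]
        cleaned.append(error)
    return cleaned

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def demo(seed=0, n=4000, true_lag=37):
    """Build a synthetic recording, align the reference, and cancel it."""
    rng = random.Random(seed)
    reference = [rng.uniform(-1.0, 1.0) for _ in range(n)]  # stand-in broadcast
    wanted = [0.3 * math.sin(2.0 * math.pi * 300.0 * i / 8000.0) for i in range(n)]
    path = [0.6, 0.3, -0.2]  # hypothetical loudspeaker-to-microphone response
    recording = []
    for i in range(n):
        leak = sum(h * reference[i - true_lag - k]
                   for k, h in enumerate(path) if i - true_lag - k >= 0)
        recording.append(wanted[i] + leak)
    lag = estimate_lag(recording, reference, max_lag=50)
    cleaned = nlms_cancel(recording, delay_signal(reference, lag))
    return lag, recording, cleaned, wanted
```

Calling demo() returns the estimated offset together with the recording, the cleaned output and the wanted component for comparison. The alignment step matters because the canceller only has a short window of taps: if the reference is offset by more than that window, the filter cannot model the interference at all. A real room response is far longer than three taps, so a practical filter of this kind would need many more coefficients.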
FNR embodies our most sophisticated algorithm to date, and provides a remarkable amount of noise reduction without destroying the wanted speech. It's also very simple to use. Whether you want to reduce the noise by a few dB to aid transcription or dig the speech out of a particularly noisy signal or recording to aid intelligibility, FNR will prove to be invaluable.
Derived from the CEDAR Trinity surveillance system supplied to law enforcement and security agencies, Trinity Enhance has been designed specifically for audio forensic use. It provides four related tools that, using just four sliders, allow users to suppress background noise, reveal voices and increase the intelligibility of speech. It also allows you to enhance the background if that contains wanted detail.