Sound Quantization Analysis
Table of Contents: Sound Quantization Analysis; Sound Pulse Code Modulation; Discrete Fourier Transform Algorithm; Window Functions; Acoustic Fingerprint

Pure tones do not exist naturally, but every sound in the world is the sum of several pure tones at different amplitudes. A musical song is played by several instruments and singers. All of these instruments produce a combination of sine waves at multiple frequencies, and the whole is an even larger combination of sine waves. A spectrogram is a highly detailed, precise image of audio, displayed in 2D or 3D. The audio is plotted against time and frequency, with brightness (or height, in 3D) indicating amplitude. While a waveform shows how the amplitude of a signal changes over time, the spectrogram shows this change for each frequency component of the signal. As an example, the figure shows the impact of droplets consistently forming large surface bubbles and the characteristic "bloop" noise (Fig. 4); color represents amplitude in dB. In this spectrogram, some frequencies are more important than others, and this is what makes it possible to build a fingerprint algorithm. Analog signals are continuous signals: if you take one second of an analog signal, you can divide that second into parts that last a fraction of a second. In the digital world, you cannot afford to store an infinite amount of information. You must have a minimum unit of time, for example 1 millisecond. During this unit of time the sound cannot change, so the unit must be short enough that the digital song sounds like the analog song, yet long enough to limit the space needed to store the music. Nyquist's sampling theorem provides a prescription for the nominal sampling interval required to avoid aliasing. It can be stated simply as follows: the sampling frequency must be at least twice the highest frequency contained in the signal.
Or in mathematical terms: fs ≥ 2·fc, where fs is the sampling frequency (the number of samples taken per unit of time or space) and fc is the highest frequency contained in the signal. The Nyquist–Shannon theorem states that if you want to digitize a signal spanning 0 Hz to 20 kHz, you need more than 40,000 samples per second. The standard sampling rate for digital music in the music industry is 44.1 kHz, and each sample is assigned 16 bits. Some statements of the theorem describe this process as a perfect reconstruction of the signal. The main idea is that a sinusoidal signal at frequency F needs at least 2 points per cycle to be identified. If your sampling frequency is at least twice the frequency of your signal, you will get at least 2 points per cycle of the original signal. Sampling, the process of converting a signal into a digital sequence, is also called analog-to-digital conversion. Quantization is the companion conversion process, which determines the precision with which each sample is measured. Analog-to-digital and digital-to-analog converters encode and decode these signals to record our voices, display images on screen, or play audio clips through speakers. Because we can digitize media, we can manipulate, recreate, modify, produce, and store text, images, and sounds, and use digital media to our advantage in many ways. The theorem, though it may seem simple, has changed the way our modern digital world works. The limitations it imposes can be managed with filters and by adjusting sample rates or frequencies. Although the sampled signal does not have the same shape or amplitude, its frequency remains the same. Analog-to-digital converters perform this function to create a series of digital values from a given analog signal. The following figure represents an analog signal.
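To make the sampling condition concrete, here is a minimal, illustrative Python sketch (the tone frequencies and sample counts are arbitrary choices, not values from the text) showing that a tone sampled below twice its frequency becomes indistinguishable from a lower-frequency alias:

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Sample a unit-amplitude sine wave at the given rate."""
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate_hz)
            for k in range(n_samples)]

# A 3 kHz tone sampled at 44.1 kHz (well above 2 * 3 kHz) is captured faithfully.
ok = sample_sine(3000, 44100, 8)

# The same tone sampled at 4 kHz (below the required 6 kHz) aliases: it yields
# exactly the same samples as a -1 kHz tone (3000 - 4000 = -1000 Hz), so the
# two signals cannot be told apart after sampling.
aliased = sample_sine(3000, 4000, 8)
mirror = sample_sine(-1000, 4000, 8)
print(all(abs(a - b) < 1e-9 for a, b in zip(aliased, mirror)))  # True
```

This is exactly why the 44.1 kHz standard comfortably covers the 0-20 kHz range of human hearing.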
This signal, to be converted to digital form, must undergo sampling and quantization.

Sound Quantization Analysis

Quantization is the process of mapping input values from a large set (often a continuous set) to output values in a smaller (countable) set. Rounding and truncation are typical examples of quantization processes. Quantization is involved to some extent in almost all digital signal processing, because representing a signal in digital form usually involves rounding. Quantization also forms the core of virtually all lossy compression algorithms. Quantization makes the range of a signal discrete, so that the quantized signal takes only a discrete, usually finite, set of values. Unlike sampling, quantization is generally irreversible and results in loss of information. It therefore introduces a distortion into the quantized signal that cannot be eliminated. One of the fundamental choices in quantization is the number of discrete quantization levels to use. The fundamental tradeoff in this choice is the quality of the resulting signal versus the amount of data needed to represent each sample. Figure 6 shows an analog signal and quantized versions for several different numbers of quantization levels. With L levels we need N = log2(L) bits to represent the different levels; conversely, with N bits we can represent L = 2^N levels.

Sound Pulse Code Modulation

Pulse-code modulation (PCM) is a system used to translate analog signals into digital data. It is used by compact discs and most electronic devices. For example, when you listen to an mp3 file on your computer/phone/tablet, the mp3 is automatically transformed into a PCM signal and then sent to your headphones. A PCM stream is a stream of organized bits. It can be composed of several channels; for example, stereo music has 2 channels. In a stream, the signal amplitude is divided into samples. The number of samples per second corresponds to the sampling rate of the music.
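A minimal sketch tying quantization to the PCM layout just described (the amplitude values are illustrative): with 16 bits, each sample is an integer level between -32,768 and 32,767, and a stereo frame packs one such level per channel.

```python
import struct

def quantize_16bit(x):
    """Map an analog amplitude in [-1.0, 1.0] onto one of 2**16 integer levels."""
    return max(-32768, min(32767, round(x * 32767)))

# One stereo PCM "frame": a left and a right sample, 2 bytes each
# (little-endian signed 16-bit integers, as in linear PCM).
left, right = quantize_16bit(0.5), quantize_16bit(-0.25)  # illustrative amplitudes
frame = struct.pack('<hh', left, right)

print(len(frame))                   # 4 bytes per stereo sample
print(struct.unpack('<hh', frame))  # (16384, -8192)
```

At 44.1 kHz stereo this works out to 4 bytes x 44,100 = 176,400 bytes per second of uncompressed audio.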
For example, music sampled at 44.1 kHz will have 44,100 samples per second. Each sample gives the (quantized) amplitude of the sound for the corresponding fraction of a second. There are several PCM formats, but the most used in audio is the stereo (linear) PCM 44.1 kHz, 16-bit depth format. This format contains 44,100 samples for every second of music. Each sample takes 4 bytes (Fig. 7): 2 bytes (16 bits) for the intensity (from -32,768 to 32,767) of the left speaker, and 2 bytes (16 bits) for the intensity (from -32,768 to 32,767) of the right speaker. In a 44.1 kHz, 16-bit depth PCM stereo format, you therefore have 44,100 such samples for every second of music.

Discrete Fourier Transform Algorithm

The DFT (Discrete Fourier Transform) applies to discrete signals and gives a discrete spectrum (the frequencies inside the signal). The discrete Fourier transform is a method for converting a sequence of N complex numbers x(0), x(1), ..., x(N-1) into a new sequence of N complex numbers:

X(n) = Σ_{k=0}^{N-1} x(k) · e^(-i·2πkn/N), for n = 0, 1, ..., N-1

In this formula: N is the size of the window, i.e. the number of samples that compose the signal; X(n) represents the nth frequency bin; x(k) is the kth sample of the audio signal. The DFT is useful in many applications, including simple spectral analysis of a signal. Knowing how a signal can be expressed as a combination of waves allows you to manipulate the signal and compare different signals: digital files (jpg, mp3, etc.) can be shrunk by eliminating the contributions of the least important waves in the combination; sound files can be compared by comparing their DFT coefficients; radio waves can be filtered to avoid "noise" and listen to the important components of the signal. Other applications of the DFT arise because it can be computed very efficiently by the fast Fourier transform (FFT) algorithm.
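A direct, unoptimized implementation of the DFT just described (pure Python, for illustration only; real applications use an FFT library):

```python
import math
import cmath

def dft(x):
    """Direct O(N^2) evaluation of X(n) = sum_k x(k) * e^(-i*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * n / N) for k in range(N))
            for n in range(N)]

# A pure tone with exactly 2 cycles in an 8-sample window puts all of its
# energy in bin n = 2 (and in the mirror bin N - 2, since the signal is real).
N = 8
x = [math.cos(2 * math.pi * 2 * k / N) for k in range(N)]
mags = [abs(X) for X in dft(x)]
print([round(m, 6) for m in mags])  # [0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 4.0, 0.0]
```

The peak magnitude N/2 = 4 at bin 2 is how a fingerprinting pipeline identifies which frequencies dominate each slice of audio.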
For example, the DFT is used in state-of-the-art algorithms to multiply polynomials and large integers together: instead of working directly with polynomial multiplication, it turns out to be faster to compute the DFTs of the polynomials and convert the problem of multiplying polynomials into an analogous problem involving their DFTs.

Window Functions

In signal processing, a window function is a mathematical function that is zero-valued outside a chosen interval. For example, a function that is constant within the interval and zero elsewhere is called a rectangular window, a name that describes the shape of its graphical representation. When another function or waveform/sequence of data is multiplied by a window function, the product is also zero-valued outside the interval: all that remains is the part where they overlap, the "view through the window". In typical applications, the window functions used are non-negative, smooth, "bell"-shaped curves. Rectangular, triangular, and other functions can also be used. A more general definition of window functions does not require them to be identically zero outside an interval, as long as the product of the window multiplied by its argument is square integrable, that is, the function tends toward zero sufficiently quickly. The Fourier transform of the function cos(ωt) is zero except at the frequencies ±ω. However, many other functions and waveforms do not have convenient closed-form transforms. Alternatively, one might be interested in their spectral content only during a certain period of time. In either case, the Fourier transform (or a similar transform) can be applied to one or more finite intervals of the waveform. Typically, the transform is applied to the product of the waveform and a window function. Any window (including the rectangular one) affects the spectral estimate computed by this method.
Windowing a simple waveform like cos(ωt) causes its Fourier transform to develop non-zero values (commonly called spectral leakage) at frequencies other than ω. The leakage tends to be worst (highest) near ω and least at frequencies farthest from ω. If the analyzed waveform includes two sinusoids of different frequencies, leakage can interfere with the ability to distinguish them spectrally. If their frequencies are dissimilar and one component is weaker, leakage from the stronger component can mask the presence of the weaker one. But if the frequencies are similar, leakage can make them unresolvable even when the sinusoids are of equal strength. The rectangular window has excellent resolution characteristics for sinusoids of comparable strength, but it is a poor choice for sinusoids of disparate amplitudes. This characteristic is sometimes described as low dynamic range. At the other extreme of dynamic range are the windows with the poorest resolution and sensitivity, where sensitivity means the ability to reveal relatively weak sinusoids in the presence of additive random noise. That is because noise produces a stronger response with high-dynamic-range windows than with high-resolution windows. Therefore, high-dynamic-range windows are most often justified in broadband applications, where the spectrum being analyzed is expected to contain many different components of varying amplitudes. Between the extremes are moderate windows, such as the Hamming and Hann windows. They are commonly used in narrowband applications, such as a telephone channel spectrum. In summary, spectral analysis involves a tradeoff between resolving comparable-strength components with similar frequencies and resolving disparate-strength components with dissimilar frequencies. That trade-off occurs when the window function is chosen. When the input waveform is sampled in time, instead of being continuous, the analysis is typically performed by applying a window function and then a discrete Fourier transform (DFT).
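The leakage tradeoff can be seen numerically. In this minimal Python sketch (the window length, tone frequency, and inspected bin are all illustrative choices), an off-bin sinusoid is analyzed with a rectangular window (i.e., no window) and with a Hann window, and the leakage far from the tone is compared:

```python
import math
import cmath

def dft_mags(x):
    """Magnitudes of the direct DFT of a sequence."""
    N = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * k * n / N) for k in range(N)))
            for n in range(N)]

N = 64
# A sinusoid at 10.5 cycles per window: its frequency falls exactly between
# two DFT bins, the worst case for spectral leakage.
x = [math.sin(2 * math.pi * 10.5 * k / N) for k in range(N)]

hann = [0.5 - 0.5 * math.cos(2 * math.pi * k / N) for k in range(N)]
rect_spec = dft_mags(x)                                 # rectangular window
hann_spec = dft_mags([s * w for s, w in zip(x, hann)])  # Hann window

# Far from the tone (bin 25, about 14 bins away), the rectangular window
# leaks far more energy than the Hann window does.
print(rect_spec[25] > 10 * hann_spec[25])  # True
```

The price of the Hann window's lower leakage is a wider main lobe around the tone itself, which is the resolution side of the tradeoff described above.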
But the DFT provides only a sparse sampling of the actual discrete-time Fourier transform (DTFT) spectrum. Figure 8 shows part of the DTFT for a sinusoid with a rectangular window. The actual frequency of the sinusoid is indicated as "0" on the horizontal axis. Everything else is leakage, exaggerated by the use of a logarithmic presentation. The frequency unit is "DFT bins"; that is, the integer values on the frequency axis correspond to the frequencies sampled by the DFT. So the figure depicts a case where the actual frequency of the sinusoid coincides with a DFT sample, and the maximum value of the spectrum is accurately measured by that sample. When the actual frequency misses a DFT sample (by up to ½ bin), the maximum value is measured with an error called scalloping loss (a name inspired by the shape of the peak). For a known frequency, such as a musical note or a sinusoidal test signal, matching the frequency to a DFT bin can be arranged in advance by choosing a sampling rate and a window length that result in an integer number of cycles within the window. In signal processing, operations are chosen to improve some aspect of a signal's quality by exploiting the differences between the signal and the corrupting influences. When the signal is a sinusoid corrupted by additive random noise, spectral analysis distributes the signal and noise components differently, often making it easier to detect the signal's presence or measure certain characteristics, such as amplitude and frequency. In effect, the signal-to-noise ratio (SNR) is improved by distributing the noise uniformly while concentrating most of the sinusoid's energy around one frequency. Processing gain is a term often used to describe this improvement in SNR. The processing gain of spectral analysis depends on the window function, both on its noise bandwidth and on its potential scalloping loss. These effects partially offset each other, because the windows with the least scalloping naturally have the most leakage.
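Scalloping loss can be demonstrated with the same kind of direct DFT. In this minimal sketch (the window length and frequencies are illustrative), an on-bin tone is compared against a tone that falls half a bin off, the worst case:

```python
import math
import cmath

def peak_dft_mag(cycles_per_window, N=64):
    """Peak DFT magnitude of a cosine with the given number of cycles per window."""
    x = [math.cos(2 * math.pi * cycles_per_window * k / N) for k in range(N)]
    return max(abs(sum(x[k] * cmath.exp(-2j * math.pi * k * n / N)
                       for k in range(N)))
               for n in range(N))

on_bin = peak_dft_mag(10.0)    # exactly 10 cycles: frequency lands on a DFT bin
off_bin = peak_dft_mag(10.5)   # half a bin off: worst-case scalloping

print(round(on_bin, 2))        # 32.0 (= N/2, the full peak)
print(off_bin < 0.7 * on_bin)  # True: the measured peak drops by several dB
```

This is the numerical counterpart of the prescription above: choosing a window length that fits an integer number of cycles removes the scalloping error entirely.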
The frequencies of the sinusoids are chosen such that one encounters no scalloping and the other encounters maximum scalloping. Both sinusoids suffer less SNR loss under the Hann window than under the Blackman–Harris window. In general (as mentioned previously), this is a deterrent to using high-dynamic-range windows in low-dynamic-range applications. The human ear automatically and involuntarily performs a calculation that would require years of mathematical education to accomplish deliberately. The ear formulates a transform by converting sound (pressure waves traveling through time and through the atmosphere) into a spectrum, a description of the sound as a series of volumes at distinct pitches. The brain then turns this information into perceived sound. A similar conversion can be performed using mathematical methods on the same sound waves, or on virtually any other fluctuating signal that varies over time. The Fourier transform is the mathematical tool used to perform this conversion. Simply put, the Fourier transform converts waveform data from the time domain into the frequency domain. It does this by decomposing the original time waveform into a series of sinusoidal terms, each with a unique amplitude, frequency, and phase. This process in effect converts a time-domain waveform that is difficult to describe mathematically into a more manageable series of sinusoidal functions which, when added together, exactly reproduce the original waveform. Plotting the amplitude of each sinusoidal term against its frequency creates a power spectrum, which is the response of the original waveform in the frequency domain. Figure 10 illustrates this conversion from the time domain to the frequency domain. The Fourier transform has become a powerful analytical tool in diverse scientific fields. In some cases, the Fourier transform provides a way to solve complex equations describing dynamic responses to electricity, heat, or light.
In other cases, it can identify the regular contributions to a fluctuating signal, helping to make sense of observations in astronomy, medicine, and chemistry. Perhaps because of its usefulness, the Fourier transform has been adapted for use on the personal computer. Algorithms have been developed to connect the personal computer, with its ability to evaluate large quantities of numbers, to the Fourier transform, providing a PC-based way of representing waveform data in the frequency domain. The fast Fourier transform (FFT) is a computationally efficient method for generating a Fourier transform. The main advantage of an FFT is speed, which it achieves by reducing the number of calculations needed to analyze a waveform. A disadvantage of the FFT is the limited range of waveform data that can be transformed and the need to apply a window weighting function to the waveform to compensate for spectral leakage. The FFT is simply a faster implementation of the DFT. The FFT algorithm reduces an n-point Fourier transform to approximately (n/2) log2(n) complex multiplications. For example, calculated directly, a DFT on 1,024 (i.e. 2^10) data points would require n^2 = 1,024 × 1,024 = 2^20 = 1,048,576 multiplications. The FFT algorithm reduces this to approximately (n/2) log2(n) = 512 × 10 = 5,120 multiplications, an improvement by a factor of about 200. But the increase in speed comes at the cost of versatility. The FFT function automatically imposes certain restrictions on the time series to be evaluated in order to generate a meaningful and accurate frequency response. Because the FFT function by definition uses a base-2 logarithm, it requires that the range or length of the time series to be evaluated contain a number of data points equal to a power of two.
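The operation counts above can be reproduced directly. A small Python sketch (the power-of-two padding helper reflects common radix-2 FFT practice, not any specific library's API):

```python
import math

def direct_dft_mults(n):
    """Complex multiplications for a direct n-point DFT."""
    return n * n

def fft_mults(n):
    """Approximate complex multiplications for a radix-2 FFT: (n/2) * log2(n)."""
    return (n // 2) * int(math.log2(n))

n = 1024  # 2**10 data points, as in the text
print(direct_dft_mults(n))                   # 1048576
print(fft_mults(n))                          # 5120
print(direct_dft_mults(n) // fft_mults(n))   # 204, i.e. roughly a factor of 200

# A radix-2 FFT needs a power-of-two length; shorter series are commonly
# zero-padded up to the next power of two.
def next_pow2(n):
    return 1 << (n - 1).bit_length()

print(next_pow2(44100))  # 65536
```

The gap widens quickly with n, which is why the FFT, not the direct DFT, underpins practical spectral analysis.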