------------------------------- Speech Coding Before 1994 -------------------------------
Speech quality is classified into four general categories:
1) Broadcast - above 64 kbits/s
2) Toll or network (200-3200 Hz) - above 16 kbits/s
3) Communication - above 4.0 kbits/s
4) Synthetic - below 4.0 kbits/s
Objective measurement:
1) Signal-to-noise ratio (SNR)
2) Segmental SNR (SEGSNR)
3) Articulation index
4) Log spectral distance
5) The Euclidean distance
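The first two objective measures can be sketched as follows. This is a minimal illustration, not a standardized procedure: the function names, the 160-sample frame length, and the rule of skipping silent or error-free frames are assumptions made for the example.

```python
import math

def snr_db(clean, coded):
    """Overall SNR in dB between a clean signal and its coded version."""
    signal_energy = sum(s * s for s in clean)
    noise_energy = sum((s - c) ** 2 for s, c in zip(clean, coded))
    if noise_energy == 0:
        return math.inf
    if signal_energy == 0:
        return -math.inf
    return 10.0 * math.log10(signal_energy / noise_energy)

def segmental_snr_db(clean, coded, frame_len=160):
    """Average of per-frame SNRs in dB (assumed rule: skip silent or error-free frames)."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        d = coded[start:start + frame_len]
        signal_energy = sum(s * s for s in c)
        noise_energy = sum((s - q) ** 2 for s, q in zip(c, d))
        if signal_energy > 0 and noise_energy > 0:
            values.append(10.0 * math.log10(signal_energy / noise_energy))
    return sum(values) / len(values) if values else math.inf
```

Because the segmental measure averages in the dB domain, quiet frames count as much as loud ones, whereas the overall SNR is dominated by the high-energy parts of the signal.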
Subjective measurement:
Diagnostic Rhyme Test (DRT) - an intelligibility measure where the subject's task is to recognize one of two possible words in a set of rhyming pairs.
Diagnostic Acceptability Measure (DAM) - based on results of test methods evaluating the quality of a communication system based on the acceptability of speech as perceived by a trained normal listener.
Mean Opinion Score (MOS) - involves 12 to 24 listeners who are instructed to rate phonetically balanced records according to a five-level quality scale.
Waveform coders:
A. Scalar and vector quantization
1) Scalar quantization
Pulse-Code Modulation (PCM) - a memoryless process that quantizes amplitudes by rounding off each sample to one of a set of discrete values.
Adaptive PCM (APCM) - uniform quantizer; the step size is estimated from past coded speech samples. (A 7-bit log quantizer for speech achieves the performance of a 12-bit uniform quantizer.)
Differential PCM (DPCM) - utilizes the redundancy in the speech waveform by exploiting the correlation between adjacent samples (better than PCM for rates ...).
Adaptive DPCM (ADPCM) - the step size in DPCM is adaptive.
Delta Modulation (DM) - a sub-class of DPCM where the difference is encoded with only 1 bit.
Adaptive DM (ADM) - the step size in DM is adaptive.
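Delta modulation is the simplest of the scalar schemes listed above and can be sketched in a few lines. This is an illustrative sketch only: the fixed step of 0.1, the zero initial estimate, and the function names are assumptions, and an adaptive DM coder would vary the step instead of keeping it constant.

```python
def dm_encode(samples, step=0.1):
    """1-bit delta modulation: emit 1 if the sample is at or above the running estimate, else 0."""
    bits = []
    estimate = 0.0
    for x in samples:
        bit = 1 if x >= estimate else 0
        estimate += step if bit else -step  # the encoder tracks what the decoder will rebuild
        bits.append(bit)
    return bits

def dm_decode(bits, step=0.1):
    """Rebuild the staircase approximation by accumulating +step or -step per bit."""
    out = []
    estimate = 0.0
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out
```

For example, `dm_encode([0.05, 0.15, 0.25, 0.2, 0.1])` yields the bits `[1, 1, 1, 0, 0]`. A step that is too small cannot follow steep slopes, and a step that is too large adds granular noise on flat segments, which is the motivation for making the step adaptive.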
Standards:
G.721 - CCITT standard (1988), ADPCM at 32 kbits/s
G.723 - ADPCM at 24 and 40 kbits/s
(The performance of ADPCM degrades quickly for rates below 24 kbits/s.)
2) Vector quantization - consists of an N-dimensional quantizer and a codebook. The incoming data are formed into an N-dimensional vector, which the quantizer then maps to an entry in the codebook.
Full-search VQ (F-VQ) - the codebook is fully searched for each incoming vector.
Tree-structured VQ - the codebook is searched in a "tree" fashion (a degradation of 1 dB in the SNR compared with F-VQ).
Multistep VQ - consists of a cascade of two or more quantizers, each one encoding the error or residual of the previous quantizer (1 dB better in the SNR compared to F-VQ).
LBG - an iterative codebook design algorithm: an initial guess for the codebook, then iterative improvement using a large number of training vectors.
Gain/Shape VQ (GS-VQ) - normalizes the vectors of the codebook and encodes the gain separately (0.7 dB improvement compared to F-VQ).
Adaptive codebooks (A-VQ) - the codebook is adapted forward or backward.
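Full-search quantization and LBG-style codebook refinement can be sketched as below. This is a simplified illustration under stated assumptions: squared Euclidean distance is used as the distortion, the loop runs a fixed number of iterations instead of testing a distortion threshold, and the codebook-splitting initialization often used with LBG is left out. All function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_index(vector, codebook):
    """Full-search VQ: compare against every codeword and return the index of the closest one."""
    return min(range(len(codebook)), key=lambda i: squared_distance(vector, codebook[i]))

def lbg(training, codebook, iterations=20):
    """Refine an initial codebook by alternating nearest-codeword partition and centroid update."""
    codebook = [list(c) for c in codebook]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in training:
            cells[nearest_index(v, codebook)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # an empty cell keeps its previous codeword
                dim = len(cell[0])
                codebook[i] = [sum(v[d] for v in cell) / len(cell) for d in range(dim)]
    return codebook
```

Only the index returned by `nearest_index` would be transmitted; the decoder looks up the same codebook. The cost of the full search grows linearly with the codebook size, which is what the tree-structured and multistep variants trade SNR against.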
B. Sub-band and transform coders
1) Sub-band coders (SBC) - the signal band is divided into frequency sub-bands using a bank of bandpass filters.
Standards:
AT&T voice store-and-forward standard - used for voice storage at 16 or 24 kbits/s; consists of a five-band nonuniform tree-structured QMF bank in conjunction with APCM coders. A silence compression algorithm is also part of the standard.
CCITT G.722 - for 7-kHz audio at 64 kbits/s for ISDN teleconferencing, based on a two-band sub-band/ADPCM coder. The low-frequency sub-band is quantized at 48 kbits/s while the high-frequency sub-band is coded at 16 kbits/s.
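A two-band split of the kind used in sub-band coders can be sketched with the shortest possible filter pair. This is a toy illustration: the two-tap Haar filters are an assumption chosen for brevity, whereas the standards above use longer QMF filters, and the function names are hypothetical.

```python
import math

def analyze_two_band(signal):
    """Split an even-length signal into low and high sub-bands, each at half the sample rate."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return low, high

def synthesize_two_band(low, high):
    """Recombine the two sub-bands; without quantization the input is recovered exactly."""
    scale = 1.0 / math.sqrt(2.0)
    out = []
    for l, h in zip(low, high):
        out.append((l + h) * scale)
        out.append((l - h) * scale)
    return out
```

In a coder, each sub-band would be quantized separately between analysis and synthesis, with more bits given to the band that carries more energy or matters more perceptually, as in the 48 + 16 kbits/s split mentioned above.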
2) Transform coders (TC) - the transform components of a unitary transform are quantized at the transmitter, then decoded and inverse-transformed at the receiver. The bit-rate reduction comes from the fact that unitary transforms tend to generate near-uncorrelated transform components, which can be coded independently.
Several discrete transforms:
Discrete Cosine Transform (DCT) (near optimal)
Discrete Fourier Transform (DFT)
Walsh-Hadamard Transform (WHT)
Karhunen-Loeve Transform (KLT) (optimal)
Adaptive transform coder (ATC) - encodes the transform components using adaptive quantization and bit assignment rules.
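The transmit/receive chain of a transform coder can be sketched with an orthonormal DCT. This is a toy example under stated assumptions: keeping the `keep` largest coefficients and quantizing them with one uniform step is a crude stand-in for real adaptive bit assignment, and `code_frame` is a hypothetical name.

```python
import math

def dct(frame):
    """Orthonormal DCT-II of a frame (a unitary transform)."""
    n = len(frame)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(frame)))
    return out

def idct(coeffs):
    """Inverse of dct(), using the same scaling."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

def code_frame(frame, keep, step):
    """Toy coder: uniformly quantize the `keep` largest coefficients, zero the rest, then invert."""
    coeffs = dct(frame)
    largest = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)[:keep]
    quantized = [0.0] * len(coeffs)
    for k in largest:
        quantized[k] = step * round(coeffs[k] / step)
    return idct(quantized)
```

Because the transform is unitary, the quantization error energy in the coefficient domain equals the error energy in the reconstructed frame, so bits can be assigned per coefficient without tracking cross-terms.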
// From here on, I omit many examples ....
Speech coding using sinusoidal analysis-synthesis models - relies on sinusoidal representations of the speech waveform.
A. Speech analysis-synthesis using the short-time Fourier transform - time-varying spectral analysis can be performed using the short-time Fourier transform (STFT).
B. Sinusoidal Transform Coding (STC) - using unitary sinusoidal transforms implies that the speech waveform is represented by a set of narrowband functions (based on the fact that voiced speech is typically highly periodic and hence can be represented by a constrained set of sinusoids).
C. The Multiband Excitation coder (MBE) - relies on a model that treats the short-time speech spectrum as the product of an excitation spectrum and a vocal tract envelope.
Improved Multiband Excitation coder (IMBE) - quantizes the MBE model parameters.
Standard: the Australian Mobile Satellite standard (AUSSAT) and the International Mobile Satellite standard (Inmarsat-M) employ IMBE operating at 6.4 kbits/s.
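The sinusoidal representation these coders rely on can be sketched as a sum of sinusoids whose frequencies, for a voiced frame, sit at multiples of the pitch. This is a minimal synthesis-side illustration only: the 8000 Hz sample rate, the zero phases, and the function names are assumptions, and the analysis step that would estimate the amplitudes, frequencies, and phases from speech is not shown.

```python
import math

def synthesize_sinusoids(components, num_samples, sample_rate=8000.0):
    """Sum of sinusoids; each component is (amplitude, frequency_hz, phase_rad)."""
    out = []
    for n in range(num_samples):
        t = n / sample_rate
        out.append(sum(a * math.cos(2.0 * math.pi * f * t + p) for a, f, p in components))
    return out

def harmonic_components(f0_hz, amplitudes):
    """Voiced-frame assumption: sinusoids at integer multiples of the pitch f0, with zero phase."""
    return [(a, f0_hz * (k + 1), 0.0) for k, a in enumerate(amplitudes)]
```

For instance, `synthesize_sinusoids(harmonic_components(100.0, [1.0, 0.5]), 160)` produces a frame that repeats every 80 samples, since a 100 Hz pitch at 8000 Hz sampling has a period of 80 samples.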
Vocoder methods - speech-specific coders that rely on speech-specific analysis-synthesis, which is mostly based on the source-system model. The performance of vocoders generally degrades for nonspeech signals.
A. The channel and the formant vocoders - rely on representing the speech spectrum as the product of vocal-tract and excitation spectra.
B. Homomorphic vocoders - the vocal tract and the excitation log-magnitude spectra can be combined additively to produce the speech log-magnitude spectrum.
C. Linear-predictive vocoders (LPC) - predict the current sample using a linear combination of past samples.
a) The classical two-state excitation model
Standard: LPC-10 - uses a 10th-order predictor to estimate the vocal-tract parameters.
b) Mixed excitation model - LPC combined with others ..?
c) Residual excited linear prediction (RELP) - encodes the LPC residual and allots more bits to the perceptually important components (the quality of RELP coders at rates above 4.8 kbits/s is higher than that of the analogous two-state LPC).
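The linear predictor behind these vocoders, and the residual that RELP encodes, can be sketched with the autocorrelation method and the Levinson-Durbin recursion. This is a minimal sketch under stated assumptions: no windowing or pre-emphasis is applied, the function names are hypothetical, and it is not the LPC-10 procedure itself.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(max_lag + 1)]

def lpc(frame, order=10):
    """Levinson-Durbin: return (a[1..order], error) with x[n] ~ sum_k a[k] * x[n-k]."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # nothing left to predict (e.g. an all-zero frame)
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error

def residual(frame, coeffs):
    """Prediction residual e[n] = x[n] - sum_k a[k] * x[n-k]; samples before the frame count as 0."""
    out = []
    for n in range(len(frame)):
        prediction = sum(c * frame[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(frame[n] - prediction)
    return out
```

A two-state LPC vocoder would discard the residual and replace it with a pulse train or noise, while a RELP coder would quantize part of the residual itself and send it alongside the predictor coefficients.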
Analysis-by-Synthesis Linear Predictive Coders
-- the system parameters are determined by linear prediction, and the excitation sequence is determined by closed-loop (analysis-by-synthesis) optimization.
A. Multipulse-Excited Linear Prediction (MPLP) - forms an excitation sequence that consists of multiple nonuniformly spaced pulses. Both the amplitudes and the locations of the pulses are determined one pulse at a time such that the weighted mean squared error is minimized (produces good-quality speech at rates as low as 10 kbits/s).
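The one-pulse-at-a-time search can be sketched as a greedy loop. This is a simplified illustration: it minimizes the plain squared error, so the perceptual weighting is assumed to be already folded into `target` and `impulse_response`; the function names are hypothetical, and amplitudes are not jointly re-optimized after each new pulse.

```python
def filter_pulse(position, amplitude, impulse_response, frame_len):
    """Response of the synthesis filter to a single pulse, truncated to the frame."""
    out = [0.0] * frame_len
    for i, h in enumerate(impulse_response):
        if position + i < frame_len:
            out[position + i] = amplitude * h
    return out

def multipulse_search(target, impulse_response, num_pulses):
    """Greedy search: add one pulse at a time, each chosen to remove the most remaining error."""
    frame_len = len(target)
    error = list(target)
    pulses = []
    for _ in range(num_pulses):
        best = None
        for pos in range(frame_len):
            h = impulse_response[:frame_len - pos]
            energy = sum(v * v for v in h)
            if energy == 0.0:
                continue
            corr = sum(error[pos + i] * v for i, v in enumerate(h))
            reduction = corr * corr / energy  # error removed by the best amplitude at pos
            if best is None or reduction > best[0]:
                best = (reduction, pos, corr / energy)
        if best is None:
            break
        _, pos, amp = best
        pulses.append((pos, amp))
        contribution = filter_pulse(pos, amp, impulse_response, frame_len)
        error = [e - c for e, c in zip(error, contribution)]
    return pulses, error
```

The returned `pulses` list holds (position, amplitude) pairs, which is what a coder of this kind would quantize and transmit for each frame.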
B. Regular Pulse Excitation coder (RPE) - the pulses are uniformly spaced, and therefore their positions are determined by specifying the location k of the first pulse within the frame and the spacing between nonzero pulses.
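The regular grid can be sketched as follows. This is a hedged illustration: choosing the offset k by maximum energy of the decimated residual is a simplifying assumption made here, whereas an analysis-by-synthesis coder would pick k and the amplitudes by minimizing the weighted error. The function names are hypothetical.

```python
def rpe_positions(first, spacing, frame_len):
    """Pulse positions on a regular grid: first, first + spacing, first + 2 * spacing, ..."""
    return list(range(first, frame_len, spacing))

def rpe_select(residual, spacing):
    """Pick the grid offset k whose samples carry the most energy (simplifying assumption)."""
    frame_len = len(residual)
    best_k = max(range(spacing),
                 key=lambda k: sum(residual[n] ** 2
                                   for n in rpe_positions(k, spacing, frame_len)))
    amplitudes = [residual[n] for n in rpe_positions(best_k, spacing, frame_len)]
    return best_k, amplitudes

def rpe_excitation(k, amplitudes, spacing, frame_len):
    """Rebuild the excitation: zeros everywhere except on the chosen grid."""
    excitation = [0.0] * frame_len
    for n, a in zip(rpe_positions(k, spacing, frame_len), amplitudes):
        excitation[n] = a
    return excitation
```

Only k and the amplitudes need to be sent per frame, because the spacing fixes every other pulse position. This is the saving over multipulse coding, where each pulse location is transmitted individually.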