~~NOTOC~~

====== Codec latency in voice communications over the Internet (VoIP) ======

Speech codecs transform the human voice into a digital signal. This article explains how much latency can be attributed to codec operations when planning the latency budget of a VoIP deployment.

===== Codec latency: The basic case =====

Speech is sampled into units called frames. A frame represents a certain amount of speech, encoded and compressed into a digital representation. To build a frame, the encoder must first collect a minimum amount of voice, so it buffers the incoming samples. For this reason, the encoding operation introduces a delay of at least the duration of one frame. Moreover, some encoders look into succeeding frames in order to compress the speech further. Encoders that use such look-ahead compression schemes must buffer several frames, which introduces a latency of several frames.

The time required to encode or decode a single frame is assumed to equal the duration of speech carried in the frame [ITU-G.114]. For example, a frame that carries 20ms of speech requires 20ms to encode and 20ms to decode. The total latency introduced by the encoding and decoding operations, //L_codec//, is therefore defined as follows:

//L_codec = 2 * FrameSize + LookAhead// (eq.1)

===== Codec latency in packet networks =====

Encoded frames transported over the Internet are carried in IP packets. Since an IP packet can be larger than an encoded frame, several frames may be carried in a single IP packet. To fill an IP packet with N frames, the encoder takes at least:

//L_codec = (N + 1) * FrameSize + LookAhead// (eq.2)

With N = 1, eq.2 reduces to eq.1. The figure below shows an example of the composition of codec latency when N = 2. In that case the minimum latency is //3 * FrameSize + LookAhead//.
|{{technotes:encoder-latency.jpg?600|}}|

The maximum codec latency exceeds the minimum expressed in eq.2 by up to //N * FrameSize//. This additional delay is the time the receiving side needs to process one IP packet: when the receiver finishes processing packet P, it must have packet P+1 ready, so the time between receiving packet P and packet P+1 must be less than //N * FrameSize//. A codec for transmission over packet networks such as IP therefore never adds more than //N * FrameSize// on top of the minimum, and the maximum latency attributable to codec operations is //(2N + 1) * FrameSize + LookAhead//.

The extra latency arises mainly when the link layer at the sending side is slow (e.g. a modem connection): slower codecs must then be used in order to clock frames out at a rate compatible with the link speed. In practice, however, when encoding time approaches this upper bound, speech quality will almost certainly degrade, since network latency and jitter cause the //N * FrameSize// limit to be exceeded.

===== Codec latency: an example =====

Delay values for coders in IP-based applications can be found in [ITU-G.114]. For example, //Pulse Code Modulation (PCM)// [ITU-G.711], popular as part of the ITU H.323 codec suite, introduces a minimum delay of 0.25ms and a maximum delay of 0.375ms in systems transmitting one frame per IP packet (N=1).

With PCM and a packet size of 10ms, each IP packet encapsulates 10ms worth of encoded frames. Since PCM frames are 0.125ms long, 80 frames (N=80) are transmitted in each IP packet. Applying eq.2 and the maximum derived above (LookAhead=0 in PCM), the latency attributable to codec operations is at minimum (80 + 1) * 0.125 = 10.125ms and at maximum ((2 * 80) + 1) * 0.125 = 20.125ms.
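
The arithmetic above can be reproduced with a short Python sketch. The function names are hypothetical and chosen only for illustration. The minimum follows eq.2, and the maximum uses the form applied in the PCM calculation above, //(2N + 1) * FrameSize + LookAhead//. All durations are in milliseconds, and the sketch assumes the packet duration is a whole multiple of the frame duration.

```python
def frames_per_packet(packet_ms, frame_ms):
    """Number of frames N carried in one IP packet.

    Assumes packet_ms is a whole multiple of frame_ms.
    """
    return round(packet_ms / frame_ms)


def codec_latency_min(frame_ms, n_frames, lookahead_ms=0.0):
    """Minimum codec latency per eq.2: (N + 1) * FrameSize + LookAhead."""
    return (n_frames + 1) * frame_ms + lookahead_ms


def codec_latency_max(frame_ms, n_frames, lookahead_ms=0.0):
    """Maximum codec latency: (2N + 1) * FrameSize + LookAhead."""
    return (2 * n_frames + 1) * frame_ms + lookahead_ms


if __name__ == "__main__":
    # PCM values from the example: 0.125ms frames, 10ms packets, no look-ahead.
    frame_ms = 0.125
    n = frames_per_packet(10.0, frame_ms)
    print("N =", n)
    print("min latency (ms):", codec_latency_min(frame_ms, n))
    print("max latency (ms):", codec_latency_max(frame_ms, n))
```

Calling the same functions with ''n_frames=1'' gives the one-frame-per-packet bounds quoted for PCM, and passing a non-zero ''lookahead_ms'' covers codecs that use look-ahead.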

===== Related articles =====

  * **[[Network communication properties and QoS of VoIP]]**

{{tag>performance services}}