Thursday 28 May 2009

XVID Video Encoding Process

You can refer to the code for this write-up here. First of all, we all know that video consists of a sequence of frames, displayed at approximately 30 frames per second.
With a frame size of 640*480 at 30 fps, and taking 2 bytes per pixel, the raw data rate becomes:
640x480x30x2x8 ~= 147 Mbits/sec
The target data rate for the compressed data stream is ~4 Mbits/sec, so the target compression ratio is about 37:1. A general internet connection offers only a few kbps to a few Mbps, so the video needs proper encoding rather than sending all 30 frames in full.
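The arithmetic above can be checked with a few lines of Python. This is only a sketch: it assumes the "x2" factor in the formula means 2 bytes per pixel, and the constant names are mine.

```python
# Raw (uncompressed) data rate for the figures quoted in the text.
WIDTH, HEIGHT = 640, 480
FPS = 30
BYTES_PER_PIXEL = 2   # assumption: the "x2" factor is 2 bytes per pixel
BITS_PER_BYTE = 8

raw_bits_per_sec = WIDTH * HEIGHT * FPS * BYTES_PER_PIXEL * BITS_PER_BYTE
target_bits_per_sec = 4_000_000   # ~4 Mbits/sec target from the text

compression_ratio = raw_bits_per_sec / target_bits_per_sec

print(raw_bits_per_sec / 1e6, "Mbits/sec raw")
print(round(compression_ratio, 1), ": 1 compression needed")
```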
We call the 1st frame INTRA and the fourth frame PREDICTED. The frames in between are called bi-directional frames.
Suppose we have already encoded the very 1st frame, INTRA. The next frame to encode is the PREDICTED one, and for this we take the help of the already encoded INTRA frame.

Now, how do we encode this PREDICTED ('P') frame? We divide the frame into 16*16 "macroblocks". For each macroblock of the PREDICTED frame we do a comprehensive search over the INTRA ('I') frame, and the candidate block with the minimum SAD (sum of absolute differences) is the one used to predict that macroblock. The SAD is calculated using the formula

$$\mathrm{SAD}(x, y) = \sum_{i=0}^{15} \sum_{j=0}^{15} \left| P(i, j) - I(i + x,\, j + y) \right|$$

where P and I are the pixel values of the PREDICTED and INTRA frames, x and y are translations in the x and y directions, and i, j are pixel positions within the macroblock. x and y can each be zero, positive or negative, so we get a particular (x, y) for each macroblock of the PREDICTED frame.
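The comprehensive search can be sketched in plain Python as below. This is a minimal illustration, not Xvid's actual search: the function names, the small search range of 4 pixels and the synthetic test frames are all assumptions made for the example.

```python
import random

def sad(cur, ref, bx, by, x, y, n=16):
    """Sum of absolute differences between the n*n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` translated by (x, y)."""
    total = 0
    for j in range(n):
        for i in range(n):
            total += abs(cur[by + j][bx + i] - ref[by + j + y][bx + i + x])
    return total

def full_search(cur, ref, bx, by, n=16, search_range=4):
    """Try every translation within +/- search_range and return
    (best_sad, x, y) for the macroblock at (bx, by)."""
    height, width = len(ref), len(ref[0])
    best = None
    for y in range(-search_range, search_range + 1):
        for x in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if bx + x < 0 or by + y < 0:
                continue
            if bx + x + n > width or by + y + n > height:
                continue
            cost = sad(cur, ref, bx, by, x, y, n)
            if best is None or cost < best[0]:
                best = (cost, x, y)
    return best

# Synthetic data: a random "I" frame, and a "P" frame that is the same
# picture moved so that the true translation is (x, y) = (3, -2).
rng = random.Random(0)
H = W = 32
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[(j - 2) % H][(i + 3) % W] for i in range(W)] for j in range(H)]

print(full_search(cur, ref, bx=8, by=8))   # (0, 3, -2)
```

The cost grows with the square of the search range, which is why real encoders usually replace the exhaustive loop with a faster search pattern.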
Each 16*16 macroblock is then divided into four equal 8*8 blocks, and quantization is used to generate the bitstream. We have quantization levels 1-16, and there are 2 points to consider before choosing the quantization level:
1> If we choose a lower quantization level we have fewer errors but the bitstream becomes longer.
2> If we choose a higher quantization level we have more errors but the bitstream becomes shorter.
We have BEC to select the quantization level, and it has to weigh the tradeoff between the two points above.
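The tradeoff between the two points can be seen with a toy quantizer. This is a sketch under assumptions: it uses a plain uniform quantizer (divide by the level and truncate), which is not necessarily what Xvid does, and the sample values are made up.

```python
def quantize(values, level):
    # Divide by the quantization level and truncate toward zero,
    # so small values collapse to 0.
    return [int(v / level) for v in values]

def dequantize(quantized, level):
    # The decoder can only multiply back; the truncated part is lost.
    return [q * level for q in quantized]

def total_error(values, level):
    restored = dequantize(quantize(values, level), level)
    return sum(abs(a - b) for a, b in zip(values, restored))

# Hypothetical values from one 8*8 block (only 8 shown for brevity).
block = [87, -42, 19, 7, -5, 3, 2, -1]

for level in (1, 4, 16):
    q = quantize(block, level)
    nonzero = sum(1 for v in q if v != 0)
    print("level", level, "->", q,
          "| non-zero values:", nonzero,
          "| error:", total_error(block, level))
```

A higher level leaves fewer non-zero values to transmit (shorter bitstream) but a larger reconstruction error.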
The figure shows one of the four 8*8 blocks into which a macroblock of the PREDICTED frame is divided.

We scan each pixel of the block for a zero or non-zero number. This gives a string of numbers, which is then encoded as runs of zeros. For example,

87300600004012 will be encoded as

(8,0), (7,0), (3,2), (6,4), (4,1), (1,0), (2,0), which means 8 is followed by zero zeroes, 3 is followed by 2 zeroes, and so on.
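The run-of-zeros encoding of this example can be reproduced with a short function. It is a minimal sketch: the function name is mine, and zeros that come before the first non-zero value are simply ignored, which the example in the text does not need.

```python
def run_length_pairs(values):
    """Return (value, zeros_following) pairs for a scanned sequence."""
    pairs = []
    for v in values:
        if v == 0 and pairs:
            # Extend the zero run of the most recent non-zero value.
            value, run = pairs[-1]
            pairs[-1] = (value, run + 1)
        elif v != 0:
            pairs.append((v, 0))
        # Zeros before the first non-zero value are dropped in this sketch.
    return pairs

scanned = [int(c) for c in "87300600004012"]
print(run_length_pairs(scanned))
# [(8, 0), (7, 0), (3, 2), (6, 4), (4, 1), (1, 0), (2, 0)]
```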

After this, we encode the pairs using Huffman coding.
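To show the idea of Huffman coding, here is a small sketch that builds a code table from the frequencies of the (value, zeros) pairs. The table is built from the data itself purely for illustration, and the sample pairs are made up; a real codec would more likely rely on predefined code tables.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix-free code {symbol: bitstring} from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, n, {s: ""}) for n, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)   # two least frequent subtrees
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical run-length pairs; (1, 0) is the most frequent one.
pairs = [(8, 0), (1, 0), (1, 0), (3, 2), (1, 0), (1, 0), (3, 2)]
table = huffman_code(pairs)
bitstream = "".join(table[p] for p in pairs)
print(table)
print(bitstream, len(bitstream), "bits")
```

Frequent pairs get short codes and rare pairs get long ones, so the total bit count drops compared with a fixed-length code.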

Until now we were encoding the PREDICTED frame. The process for encoding the INTRA frame is almost the same, except that we have no prior information from previous frames, so we don't have a difference to take the minimum from.

Now coming to encoding the B-frames (2, 3), we have the following 3 methods:

1> using only the INTRA frame
2> using only the PREDICTED frame
3> using both the INTRA as well as the PREDICTED frame
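One way to choose between the three methods is to try each and keep the one with the smallest SAD. The sketch below assumes exactly that; the text does not say how the choice is made. It also assumes that "both" means the average of the two reference blocks. Blocks are flattened to plain lists and all names are hypothetical.

```python
def sad(a, b):
    # Sum of absolute differences between two flattened blocks.
    return sum(abs(p - q) for p, q in zip(a, b))

def predict_b_block(b_block, i_block, p_block):
    """Return (method, sad) for the best of the three prediction methods."""
    candidates = {
        "intra_only": i_block,
        "predicted_only": p_block,
        # Assumption: "both" is the rounded average of the two references.
        "both": [(p + q + 1) // 2 for p, q in zip(i_block, p_block)],
    }
    costs = {name: sad(b_block, pred) for name, pred in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]

# Toy 4-pixel blocks: the B block lies halfway between its two references.
i_block = [10, 20, 30, 40]
p_block = [30, 40, 50, 60]
b_block = [20, 30, 40, 50]
print(predict_b_block(b_block, i_block, p_block))   # ('both', 0)
```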

The frame encoding sequence can be represented in terms of frame numbers as 1423756........
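The reordering follows from the dependencies: frame 4 must be encoded before frames 2 and 3 because they refer to it. A small sketch, assuming a reference frame every third frame as in the sequence above (the function and parameter names are mine):

```python
def encoding_order(num_frames, step=3):
    """Frames are numbered from 1 in display order. Reference frames sit at
    1, 1 + step, 1 + 2*step, ...; each one is encoded before the B-frames
    that lie between it and the previous reference frame."""
    order = [1]
    anchor = 1
    while anchor + step <= num_frames:
        nxt = anchor + step
        order.append(nxt)                      # next reference frame first
        order.extend(range(anchor + 1, nxt))   # then the B-frames in between
        anchor = nxt
    # Frames after the last reference frame are left out of this sketch.
    return order

print(encoding_order(7))   # [1, 4, 2, 3, 7, 5, 6]
```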

