  CliffNotes on MPEG-1 and MPEG-2 FAQs

 

Hello,

Before you ask, I have looked at many MPEG-2 FAQs and have come up short on how it compresses video, since I can't understand most of them. If someone can explain all the frame types (I-frame, P-frame, etc.), quantization, coded block patterns, 16x16, motion prediction, and whatever else you can in plain English, I would greatly appreciate it.

Thanks so much!

Cliff:

MPEG-1 and MPEG-2 are motion video compression standards created by the Moving Picture Experts Group. This group is a joint committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC).

The MPEG-1 Standard, completed as a draft in 1992, defines a bit stream of compressed audio and video data with a combined data rate of about 1.5 Mbit/s, which suits CD-ROM and Video CD applications. It is possible to generate MPEG-1 streams at other data rates. The MPEG-1 Standard is formally described in ISO/IEC 11172.

The MPEG-2 Standard was designed later, for digital transmission of broadcast-quality video at data rates from 2 to 10 Mbit/s. It was written to be more "generic," that is, to address a broader range of applications, and it is the compression standard used for DVD and various digital television systems. The MPEG-2 Standard is described in the ISO/IEC 13818 documents.

MPEG-1 and MPEG-2 motion video compression are based on the interframe method of compression: some frames are encoded by representing only their changes relative to one or more previously encoded frames. Since usually only a small portion of the picture changes from frame to frame, this greatly reduces the amount of data that must be stored.

An MPEG stream can have three types of frames:

- Intra (I)
- Predicted (P)
- Bi-directional interpolated (B)

Intra (I) frames are coded without reference to any other frames. Predicted (P) frames are coded with reference to a previously encoded I or P frame, and they provide significantly better compression than I frames. Bi-directional interpolated (B) frames reference both the previous I or P frame and the next I or P frame, and they provide the best compression of all.
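To make the ordering concrete, here is a minimal Python sketch (purely illustrative, not encoder code) of how a typical display-order Group of Pictures gets reordered for transmission. Because B frames reference the NEXT I or P frame, that anchor has to be sent before the B frames that depend on it. The GOP pattern shown is just a common example, not something the standard mandates.

```python
def coded_order(frames):
    """Reorder a display-order frame list so each I/P anchor precedes
    the B frames that reference it (the order an encoder emits)."""
    out, pending_b = [], []
    for f in frames:
        if f == "B":
            pending_b.append(f)      # hold B frames until their next anchor
        else:
            out.append(f)            # emit the anchor (I or P) first...
            out.extend(pending_b)    # ...then the held B frames
            pending_b = []
    return out + pending_b

display_order = ["I", "B", "B", "P", "B", "B", "P", "B", "B", "I"]
print(coded_order(display_order))
# -> ['I', 'P', 'B', 'B', 'P', 'B', 'B', 'I', 'B', 'B']
```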

The smallest independently encoded rectangle in a frame is called a macroblock; it is 16x16 pixels in size. When encoding P or B frames, the encoder searches, for each macroblock, for the most similar block of pixels in a previously encoded frame. This process is called Motion Estimation. The encoder then transmits a motion vector giving the displacement of that best-matching block, along with only the difference between the current macroblock and its prediction. To reduce spatial redundancy in the data, the Discrete Cosine Transform (DCT), quantization, and Huffman encoding are used.
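Here is a toy sketch of the Motion Estimation step in Python, assuming 8-bit grayscale frames stored as NumPy arrays. Real encoders use much faster search strategies and sub-pixel refinement; this only illustrates the basic full-search idea of minimizing the Sum of Absolute Differences (SAD).

```python
import numpy as np

def motion_estimate(ref, cur, top, left, search=7, size=16):
    """Full-search block matching: return the (dy, dx) motion vector that
    minimizes the SAD between the current macroblock and a displaced
    block in the reference frame, plus the SAD itself."""
    block = cur[top:top + size, left:left + size].astype(np.int32)
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
                continue                      # candidate falls outside the frame
            cand = ref[y:y + size, x:x + size].astype(np.int32)
            sad = int(np.abs(block - cand).sum())
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(ref, (2, -3), axis=(0, 1))      # simulate a global shift
print(motion_estimate(ref, cur, 16, 16))      # -> ((-2, 3), 0)
```

The encoder then transmits the motion vector plus the (DCT-coded) difference between the current macroblock and the block the vector points at.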

Motion Estimation is usually the most time-consuming part of MPEG encoding.

The Discrete Cosine Transform (DCT) represents the original data as a weighted sum of cosine basis functions at different frequencies. Each image delivers one brightness and two color signals per pixel; the DCT converts these signals into frequency coefficients that carry the same brightness and color information but can be compressed more easily.

Quantization then reduces the precision of those coefficients, so that the image components important to the human eye are represented precisely while less relevant detail is represented coarsely, or discarded entirely.

Finally, Huffman encoding reduces the amount of transferred data by exploiting the statistical distribution of the quantized DCT coefficients: values that occur rarely receive a long code, while values that occur often receive a short code.
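The following sketch runs one 8x8 block through an orthonormal DCT and a quantization matrix in Python. The matrix shown is the familiar JPEG example luminance matrix, borrowed here purely for illustration; MPEG defines its own default intra matrix, and encoders may substitute custom ones.

```python
import numpy as np

# Build an orthonormal 8x8 DCT-II basis matrix from scratch.
N = 8
k = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

def dct2(block):   # 2-D DCT: transform rows, then columns
    return C @ block @ C.T

def idct2(coeff):  # inverse 2-D DCT
    return C.T @ coeff @ C

# JPEG example luminance quantization matrix (illustration only;
# MPEG uses its own default intra matrix).
Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=float)

y, x = np.mgrid[0:8, 0:8]
block = (8.0 * y + 8.0 * x) - 64.0   # a smooth gradient block
coeff = dct2(block)
quant = np.round(coeff / Q)          # high-frequency terms mostly -> 0
recon = idct2(quant * Q)             # coarse but visually close
print(int((quant != 0).sum()), "of 64 coefficients survive quantization")
```

For smooth content like this, only a handful of low-frequency coefficients survive; the long runs of zeros are exactly what the Huffman stage compresses so well.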

An MPEG file consists of compressed video data, called the video stream, and compressed audio data, called the audio stream. It can also contain just one of the two streams.
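For a concrete look at that multiplex, here is a small Python sketch that scans a file for MPEG start codes: the byte sequence 0x00 0x00 0x01 followed by a stream ID, where IDs 0xE0-0xEF mark video streams and 0xC0-0xDF mark audio streams. The file name is hypothetical, and a real demultiplexer would also honor the packet lengths rather than just scanning.

```python
def scan_start_codes(data: bytes):
    """Yield (offset, kind, stream_id) for video/audio start codes."""
    pos = 0
    while True:
        pos = data.find(b"\x00\x00\x01", pos)
        if pos < 0 or pos + 3 >= len(data):
            return
        sid = data[pos + 3]                    # byte after the start code
        if 0xE0 <= sid <= 0xEF:
            yield pos, "video", sid
        elif 0xC0 <= sid <= 0xDF:
            yield pos, "audio", sid
        pos += 3                               # continue past this code

with open("movie.mpg", "rb") as f:             # hypothetical file name
    for offset, kind, sid in scan_start_codes(f.read()):
        print(f"{offset:10d}  {kind}  stream id 0x{sid:02X}")
```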

Well, that's my story and I'm sticking to it ;-))

Best Regards, Cliff

Cliff,

Thanks so much for your MPEG report! That was easily the most to-the-point article on MPEG that I have seen, and it was easy to understand! Amazing! Again, thanks for putting in the effort to make it easy to understand. I do have a few more questions, probably simple ones.

1.) I read about 8x8 blocks inside macroblocks. Is that true, and can they get smaller?

I'll give you a simple answer first, followed by the book answer.

Yes, that is true. The macroblock (16x16 pixels) is the working unit for MPEG-2: the instructions a decoder needs to reconstruct each macroblock are carried in its macroblock header. The 8x8 blocks inside the macroblock are the units on which the DCT is computed, while the 16x16 macroblock as a whole is the unit used for motion prediction between frames. Within MPEG-2 itself the DCT block size does not get any smaller than 8x8.
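A tiny NumPy sketch of that relationship (illustrative only): prediction treats the 16x16 macroblock as a whole, while the DCT consumes its four 8x8 luma sub-blocks one at a time.

```python
import numpy as np

macroblock = np.arange(256).reshape(16, 16)   # stand-in 16x16 luma macroblock

blocks = [macroblock[r:r + 8, c:c + 8]        # the four 8x8 DCT input blocks
          for r in (0, 8) for c in (0, 8)]

print(len(blocks), "luma blocks of shape", blocks[0].shape)
# -> 4 luma blocks of shape (8, 8)
```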

Why was the 8x8 DCT size chosen?

Experiments showed that little compaction gain could be achieved with larger transform sizes, especially in light of the increased implementation complexity. A fast DCT algorithm requires roughly double the number of arithmetic operations per sample when the linear transform point size is doubled. Naturally, the best compaction efficiency has been demonstrated with locally adaptive block sizes (e.g. 16x16, 16x8, 8x8, 8x4, and 4x4) [see Gary Sullivan and Rich Baker, "Efficient Quadtree Coding of Images and Video," ICASSP 91, pp. 2661-2664].

Inevitably, adaptive transform block sizes introduce additional side-information overhead while forcing the decoder to implement programmable or hardwired recursive DCT algorithms. If the DCT size becomes too large, more edges (local discontinuities) and the like get absorbed into the transform block, resulting in wider propagation of Gibbs (ringing) and other unpleasant phenomena. Finally, with larger transform sizes, the DC term becomes even more critically sensitive to quantization noise.

2.) Also, is 16 x 16 the maximum macroblock size?

Yes, except in the case of interlaced video where it is 32x16.

Why was the 16x16 prediction size chosen?

The 16x16 area corresponds to the least common multiple (LCM) of the 8x8 block sizes of the luma and chroma components, given the normative 4:2:0 chroma sampling ratio. Starting with medium-sized images, the 16x16 area provides a good balance between side-information overhead and complexity on one hand and motion-compensated prediction accuracy on the other. In short, experiments showed that 16x16 was a good trade-off between complexity and coding efficiency.
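The arithmetic fits in a few lines of Python (illustration only): with 4:2:0 sampling, chroma is halved in both dimensions, so the chroma covering a 16x16 luma area is exactly one 8x8 Cb block and one 8x8 Cr block, six 8x8 blocks per macroblock in all.

```python
mb = 16                                       # macroblock size in luma pixels
luma_blocks = (mb // 8) * (mb // 8)           # four 8x8 luma blocks
chroma_size = mb // 2                         # 4:2:0 halves each dimension -> 8
chroma_blocks = 2 * (chroma_size // 8) ** 2   # one Cb block + one Cr block

print(luma_blocks + chroma_blocks, "blocks per 4:2:0 macroblock")  # -> 6
```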

3.) What is a coded block pattern, or CBP as they call it? I'm guessing it's a dither-like pattern made in a single macroblock using smaller blocks?

No, that is not correct.

Coded Block Pattern (CBP, not to be confused with Constrained Parameters!): when the frame prediction is particularly good, the displaced frame difference (DFD, or temporal macroblock prediction error) tends to be small, often with the entire energy of a block being reduced to zero after quantization. This usually happens only at low bit rates. The coded block pattern removes the need to transmit End of Block (EOB) symbols for those all-zero blocks. Coded block patterns are transmitted in the macroblock header only if the macroblock_type flag indicates so.
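Here is a minimal Python sketch of reading a 6-bit coded block pattern for a 4:2:0 macroblock (blocks 0-3 are luma, 4 is Cb, 5 is Cr). The MSB-first bit assignment follows the usual convention; treat the sketch as illustrative rather than as decoder code.

```python
def decode_cbp(cbp: int):
    """Return six booleans: True where a block carries coded coefficients,
    False where the block is entirely zero and is skipped (no EOB symbol
    needs to be sent for it)."""
    return [bool(cbp & (1 << (5 - i))) for i in range(6)]

# Example: only the first luma block and the Cb block have coefficients.
print(decode_cbp(0b100010))
# -> [True, False, False, False, True, False]
```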

4.) Here is why I want to know this: when looking at a still DVD full-screen or zoomed widescreen picture, it appears that the picture actually loses color compared to the analog versions. I can clearly make out the differences between colors on non-complicated elements of the picture. Would a dither-like effect ever be used to compensate for lost color? This is the area that confuses me the most.

I think I can clear up some of your confusion on this matter.

If you are referring to the image displayed on a computer monitor, remember this: a computer monitor is NOT the best way to view video from a DVD, for the following reasons.

PC video adapters and monitors have a limited number of gray levels and a non-linear (compressed) color scale. With 64 gray levels, the displayed intensity changes only once for every 4 steps of the 256-level (8-bit) source scale. This can group several distinct shades of a color into the same displayed intensity, making the image look as if it had been dithered. This dithering-like effect shows up only in low-detail (low data rate) areas. Also, some decoder chips can't decode the "blacker than black" level needed for a pure black, which gives the image a washed-out look.
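The banding arithmetic takes two lines of Python (illustrative only): collapsing a 256-level (8-bit) scale to 64 displayable levels maps every 4 adjacent shades onto the same intensity, which looks like flat bands or dithering in smooth areas.

```python
shades = list(range(100, 110))            # ten adjacent 8-bit shades
displayed = [s // 4 for s in shades]      # quantize to 64 display levels
print(displayed)                          # -> [25, 25, 25, 25, 26, 26, 26, 26, 27, 27]
```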

Best Regards, Cliff

 

 

 