Hello,
Before you ask, I have looked at many MPEG-2 FAQs and have come up short on how it
compresses video for the most part, since I can't understand most of it. If someone can
explain all the frame types (I-frame, P-frame, etc.), quantization, coded block patterns,
16x16, motion prediction, and whatever else you can in English, I would greatly appreciate
it.
Thanks so much!
Cliff:
MPEG-1 and MPEG-2 are motion video compression standards created by the Moving Picture
Experts Group. This group is a joint committee of the International Organization for
Standardization (ISO) and the International Electrotechnical Commission (IEC).
The MPEG-1 Standard, completed as a draft in 1992, defines a bit stream of compressed
audio and video data with a data rate of 1.5 Mbits/sec as being suitable for CD-ROMs and
Video-CD applications. It is possible to generate MPEG-1 streams with other data rates.
The MPEG-1 Standard is formally described in ISO/IEC 11172.
The MPEG-2 Standard was designed later for digital transmission of broadcast quality video
with data rates from 2 to 10 Mbits/sec. It was written to be more "generic", that
is, to address a broader range of applications, and is the compression standard for DVD and
various digital television systems. The MPEG-2 Standard is described in ISO/IEC 13818
documents.
MPEG-1 and MPEG-2 motion video compression standards are based on the interframe method of
compression. This means that some frames are encoded as changes relative to one or more
previously encoded frames. Since usually only a small portion of the frame changes, this
reduces the amount of data to be stored.
An MPEG stream can have three types of frames:
- Intra (I)
- Predicted (P)
- Bi-directional interpolated (B).
Intra (I) frames are coded without any references to any other frames. Predicted (P)
frames are coded with references to previously encoded P or I frames. Predicted frames
provide significantly better compression than Intra coded (I) frames. Bi-directional
interpolated (B) frames contain references to both previous P or I frames and the next P
or I frame. Bi-directional frames provide the best compression.
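Because a B frame refers to the next P or I frame, that later frame has to reach the decoder before the B frames that depend on it. Here is a minimal Python sketch of that reordering. The function name and the sample frame sequence are made up for illustration, and a real encoder is free to pick a different frame pattern.

```python
def coding_order(display_types):
    """Reorder frames from display order into a plausible transmission order.

    display_types is a sequence of "I", "P" and "B" in display order.
    Each B frame needs the next I or P frame decoded first, so that frame
    is moved ahead of the B frames that precede it in display order.
    """
    order = []      # frame indices in the order they would be sent
    pending_b = []  # B frames waiting for their next I or P frame
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            # I or P frame: send it, then the B frames that were waiting on it.
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    order.extend(pending_b)  # trailing B frames with no later frame in this list
    return order


display = list("IBBPBBP")
print([f"{display[i]}{i}" for i in coding_order(display)])
# prints ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```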
The smallest independently encoded rectangle in a frame is called a macroblock. It has a size
of 16x16 pixels. When encoding P or B frames, the encoder searches, for each macroblock, for
the most similar block of pixels in a previously encoded frame. This process is called
Motion Estimation. As a result, the encoder transmits motion vectors that give the
position of the matching block relative to the macroblock. Then the encoder transmits only the
difference between the current macroblock and the matching block. To reduce spatial redundancy
in this data, Discrete Cosine Transformation (DCT), Quantization, and Huffman encoding are used.
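To make the search concrete, here is a rough Python sketch of block matching. It assumes an exhaustive search and a sum-of-absolute-differences (SAD) cost, which is only one common choice; the standard leaves the search method to the encoder. The function names, the search range and the tiny test frames are all made up for illustration.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between a block in cur and a block in ref."""
    return sum(
        abs(cur[cy + y][cx + x] - ref[ry + y][rx + x])
        for y in range(size)
        for x in range(size)
    )


def find_motion_vector(cur, ref, cx, cy, size=16, search=7):
    """Try every offset within +/- search pixels and keep the cheapest match."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, cx, cy, cx, cy, size)  # start from "no motion"
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block would fall outside the reference frame
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def residual(cur, ref, cx, cy, mv, size=16):
    """Difference between the current block and the block the vector points at."""
    dx, dy = mv
    return [
        [cur[cy + y][cx + x] - ref[cy + dy + y][cx + dx + x] for x in range(size)]
        for y in range(size)
    ]


# Toy frames: the current frame is the reference shifted right 2 and down 1.
ref = [[7 * x + 13 * y for x in range(16)] for y in range(16)]
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(16)]
       for y in range(16)]
mv, cost = find_motion_vector(cur, ref, 6, 6, size=4, search=3)
print(mv, cost)  # prints (-2, -1) 0
```

When the match is perfect, as in this toy case, the residual is all zeros, so only the motion vector needs to be sent. The nested loops also show why this step is expensive.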
Motion Estimation is usually the most time-consuming part of MPEG encoding. Discrete
Cosine Transformation (DCT) is a process of representing original data as a linear sum of
basic cosine functions with different frequencies. Each image delivers one brightness and
two color signals per pixel. The DCT converts these signals into frequency coefficients
containing the color and brightness information. The signals can then be compressed more
easily. Quantization divides each coefficient by a step size and rounds the result, so that
image parts that are important to the human eye are represented precisely while less visible
information is represented with less precision. Huffman encoding reduces the amount of
transferred data using the statistical distribution of quantized DCT coefficients. This method
evaluates how often and with what probability certain values occur. Values that seldom
occur receive a long code, while values occurring often receive a short code.
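Here is a small Python sketch of the three steps on one 8x8 block, in case it helps to see them together. It rests on several assumptions: the DCT is written in its slow textbook form, the step sizes in STEPS are invented (they are not the MPEG default matrix), and the Huffman code is built from the data itself, whereas MPEG uses fixed variable-length code tables defined in the standard.

```python
import heapq
import math
from collections import Counter

N = 8  # the DCT is applied to 8x8 blocks


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an NxN block (direct, unoptimized form)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[v][u] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, steps):
    """Divide each coefficient by its step size and round to an integer."""
    return [[round(coeffs[v][u] / steps[v][u]) for u in range(N)] for v in range(N)]


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code of the symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (total count, tie-breaker, {symbol: depth so far}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, a = heapq.heappop(heap)
        n2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


# Hypothetical step sizes: coarser for higher frequencies, for illustration only.
STEPS = [[8 + 4 * (u + v) for u in range(N)] for v in range(N)]

ramp = [[16 * x for x in range(N)] for _ in range(N)]  # brightness rises left to right
levels = quantize(dct_2d(ramp), STEPS)
flat = [value for row in levels for value in row]
print(sum(1 for value in flat if value != 0), "of 64 coefficients are nonzero")
print(huffman_code_lengths(flat))
```

After quantization most coefficients of a smooth block are zero, so zero is by far the most frequent value and ends up with the shortest code.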
An MPEG file consists of compressed video data, called the
video stream, and compressed audio data, called the audio stream. It can also contain only
one of the streams mentioned above.
Well, that's my story and I'm sticking to it ;-))
Best Regards, Cliff
Cliff,
Thanks so much for your MPEG report! That was easily the most to-the-point article on MPEG
that I have seen, and it was easy to understand! Amazing! Again, thanks for putting effort
into making it easy to understand. I do have a few more questions, probably simple ones.
1.) I read about 8 x 8 blocks inside macroblocks. Is that true, and can they get smaller?
I'll give you a simple answer first, followed by the book
answer.
Yes, that is true. The macroblock (16x16) is the working level for MPEG-2: each macroblock is
encoded as a packet of data whose macroblock header contains full instructions for proper
decoding. The DCT is computed on the 8x8 blocks within the macroblock, while the 16x16
macroblocks are used for predictions between frames.
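As a quick illustration, this is how one 16x16 macroblock of brightness samples breaks into four 8x8 blocks for the DCT. The function name and the block ordering (top-left, top-right, bottom-left, bottom-right) are my own choices for this sketch.

```python
def split_macroblock(mb):
    """Split a 16x16 list of rows into four 8x8 blocks.

    Returned order: top-left, top-right, bottom-left, bottom-right.
    """
    assert len(mb) == 16 and all(len(row) == 16 for row in mb)
    return [
        [row[x0:x0 + 8] for row in mb[y0:y0 + 8]]
        for y0 in (0, 8)
        for x0 in (0, 8)
    ]


macroblock = [[y * 16 + x for x in range(16)] for y in range(16)]
blocks = split_macroblock(macroblock)
print(len(blocks), len(blocks[0]), len(blocks[0][0]))  # prints 4 8 8
```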
Why was the 8x8 DCT size chosen?
Experiments showed that little compaction gain could be achieved with larger transform sizes,
especially in light of the increased implementation complexity. A fast DCT algorithm will
require roughly double the number of arithmetic operations per sample when the linear
transform point size is doubled. Naturally, the best compaction efficiency has been
demonstrated using locally adaptive block sizes (e.g. 16x16, 16x8, 8x8, 8x4, and 4x4) [See
Gary Sullivan and Rich Baker "Efficient Quadtree Coding of Images and Video,"
ICASSP 91, pg. 2661-2664.].
Inevitably, adaptive block transformation sizes introduce additional side information
overhead while forcing the decoder to implement programmable or hardwired recursive DCT
algorithms. If the DCT size becomes too large, then more edges (local discontinuities) and
the like become absorbed into the transform block, resulting in wider propagation of Gibbs
(ringing) and other unpleasant phenomena. Finally, with larger transform sizes, the DC
term is even more critically sensitive to quantization noise.
2.) Also, is 16 x 16 the maximum macroblock size?
Yes, except in the case of interlaced video where it is 32x16.
Why was the 16x16-prediction size chosen?
The 16x16 area corresponds to the Least Common Multiple (LCM) of 8x8 blocks, given the
normative 4:2:0 chroma ratio. Starting with medium size images, the 16x16 area provides a
good balance between side information overhead and complexity on one hand and motion
compensated prediction accuracy on the other. In short, experiments showed that 16x16 was a
good trade-off between complexity and coding efficiency.
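The 4:2:0 arithmetic behind this can be checked in a few lines of Python. This sketch assumes 4:2:0 means each color plane has half the brightness resolution both horizontally and vertically; the function name and parameters are made up.

```python
def blocks_per_area(luma_size, block=8, chroma_div_x=2, chroma_div_y=2):
    """Count whole 8x8 blocks in a square luma area and in one chroma plane.

    chroma_div_x and chroma_div_y are the subsampling factors (2 and 2 for 4:2:0).
    """
    luma_blocks = (luma_size // block) ** 2
    chroma_width = luma_size // chroma_div_x
    chroma_height = luma_size // chroma_div_y
    chroma_blocks = (chroma_width // block) * (chroma_height // block)
    return luma_blocks, chroma_blocks


print(blocks_per_area(8))   # prints (1, 0): chroma is only 4x4, less than one block
print(blocks_per_area(16))  # prints (4, 1): four luma blocks, one block per color plane
```

So 16x16 is the smallest area that gives whole 8x8 blocks for brightness and for both color signals, six blocks in total.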
3.) What is a coded block pattern, or CBP as they call
it? I'm guessing it's a dither-like pattern made in a single macroblock using smaller
blocks?
No, that is not correct.
Coded Block Pattern (CBP, not to be confused with Constrained Parameters!): When the
frame prediction is particularly good, the displaced frame difference (DFD, or temporal
macroblock prediction error) tends to be small, often with entire block energy being
reduced to zero after quantization. This usually happens only at low bit rates. Coded
block patterns remove the need to transmit End of Block (EOB) symbols for those zero
coded blocks. Coded block patterns are transmitted in the macroblock header only if the
macroblock_type flag indicates so.
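In other words, the CBP is just a set of flags, one per block, saying which blocks in the macroblock still carry data. Here is a minimal Python sketch, assuming six blocks per macroblock (four brightness, two color, as in 4:2:0) and assuming the first block maps to the highest bit. The function name is made up, and the real bitstream sends the pattern as a variable-length code rather than a plain number.

```python
def coded_block_pattern(blocks):
    """Build a 6-bit pattern from six quantized blocks.

    blocks is assumed to be ordered Y0, Y1, Y2, Y3, Cb, Cr.
    A bit is 1 if that block has any nonzero coefficient, else 0.
    """
    assert len(blocks) == 6
    cbp = 0
    for block in blocks:
        has_data = any(value != 0 for row in block for value in row)
        cbp = (cbp << 1) | int(has_data)
    return cbp


empty = [[0] * 8 for _ in range(8)]
busy = [[0] * 8 for _ in range(8)]
busy[0][0] = 3  # a single surviving coefficient

cbp = coded_block_pattern([busy, empty, empty, busy, empty, empty])
print(format(cbp, "06b"))  # prints 100100
```

Only the two blocks flagged with a 1 would then be sent; the other four are skipped entirely.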
4.) Here is why I want to know this: When looking at a
still DVD full-screen or zoomed widescreen picture, it appears that the picture actually loses
color compared with the analog versions. I can clearly make out the differences between colors
on non-complicated elements of the picture. Would a dither-like effect ever be used to
compensate for lost color? This one area is confusing me the most.
I think I can clear up some of your confusion on this matter.
If you are referring to the displayed image on a computer monitor, remember this. A
computer monitor is NOT the best method of viewing video from DVD for the following
reasons:
PC video adapters/monitors have a limited number of gray levels and a nonlinear
(compressed) color scale. With 64 gray levels, the screen intensity (color) will change
only once for every 4 gray scale units. This can group several single color shades into
the same intensity, making the image look like it has been dithered. This dither-like
effect only happens in low detail (low data rate) areas. Also, some decoder chips can't
decode the "blacker than black" level needed for a pure black. This will give
the image a washed out look.
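The "once for every 4 units" point is easy to see with numbers. This Python sketch assumes 8-bit video levels (0 to 255) and a display that simply drops the two lowest bits to get 64 levels; a real adapter with a nonlinear scale would group the levels less evenly.

```python
def to_64_levels(level_8bit):
    """Map an 8-bit level (0-255) to one of 64 levels by dropping the two low bits."""
    return level_8bit >> 2


ramp = list(range(96, 108))  # a smooth ramp of 12 neighbouring shades
print([to_64_levels(v) for v in ramp])
# prints [24, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26]
```

Twelve distinct shades collapse into three, which shows up as visible steps on smooth areas of the picture.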
Best Regards, Cliff