The many flavors of H.264 video

H.264 is not a single video codec; it is a family of codecs with some shared shortcuts grouped into 17 sets of profiles and 16 levels of constraints. Video creators and playback software share a mutual understanding of these shortcuts, which are often accelerated by specialized chipsets. This post examines a few of the many flavors of H.264 video and their application in mobile, desktop, and Flash Player environments.

Ogg Theora 1.1 macroblocks example

A compressed video is a series of shortcuts shared between a video creator and a viewer. A series of pictures, 30 per second on most capture devices, is analyzed and compared, collapsing each group of pictures into a single full picture plus the differences between it and the pictures before or after it in the series. All lossy video codecs examine a series of pictures and look for pieces that can be thrown out and replaced with shortcuts that recreate the video at acceptable quality with less stored data. Decoders in our playback software, often assisted by chips built to execute these shortcuts quickly, decompress the video using these specialized instruction sets. Shortcuts can be patented, which leads to some of the intellectual property concerns around H.264, VP8, and Theora as video playback and encoding are increasingly integrated into web browsers implementing native HTML5 <video> support.
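The "one full picture plus differences" idea can be shown with a toy sketch. It is lossless frame differencing over flat lists of pixel values, not H.264 itself: real codecs work on macroblocks with motion compensation, transforms, and lossy quantization. The function names and sample frames are made up for illustration.

```python
def encode_group(frames):
    """Store the first frame whole, then only the pixels that change."""
    keyframe = list(frames[0])
    deltas = []
    previous = keyframe
    for frame in frames[1:]:
        # Record (index, new_value) for each pixel that differs
        # from the frame immediately before it.
        changes = [
            (i, new)
            for i, (old, new) in enumerate(zip(previous, frame))
            if old != new
        ]
        deltas.append(changes)
        previous = frame
    return keyframe, deltas


def decode_group(keyframe, deltas):
    """Rebuild every frame by replaying the stored changes in order."""
    frames = [list(keyframe)]
    for changes in deltas:
        frame = list(frames[-1])
        for i, value in changes:
            frame[i] = value
        frames.append(frame)
    return frames


# Three tiny six-pixel "pictures" in which one pixel changes per frame.
frames = [
    [10, 10, 10, 10, 10, 10],
    [10, 10, 99, 10, 10, 10],
    [10, 10, 99, 99, 10, 10],
]
keyframe, deltas = encode_group(frames)
print(deltas)  # [[(2, 99)], [(3, 99)]]
assert decode_group(keyframe, deltas) == frames
```

Eighteen stored pixel values shrink to six plus two small change records. A lossy codec goes further by also discarding detail the viewer is unlikely to miss.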

  1. H.264 flavors
  2. The Apple effect
  3. Flash Player for mobile
  4. WebM and VP8
  5. Summary

H.264 flavors

H.264 is not a single video codec; it is a family of codecs with some shared shortcuts grouped into 17 sets of profiles and 16 levels of constraints. Decoding software, often backed by chips specially wired for video tasks (such as NVIDIA’s PureVideo), fills a storage buffer and tries to compute video frames more quickly than the player requests them. High-complexity profiles and levels offer the highest quality video in the smallest file size but require a larger buffer and more computational horsepower to decompress a video quickly. High complexity works well in an overpowered desktop environment, but videos must be adjusted for simpler, battery-sipping use cases such as a mobile phone.
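The race between decoder and player can be pictured with a rough simulation. This is a deliberately simple model, not how any real player schedules work: decoding speed is a constant frames-per-second figure, the buffer holds a fixed number of decoded frames, and every name and number below is hypothetical.

```python
def simulate_playback(decode_fps, playback_fps, buffer_frames, total_frames):
    """Count display ticks on which no decoded frame was ready (stalls)."""
    buffered = 0   # decoded frames waiting to be shown
    decoded = 0    # frames decoded so far
    shown = 0      # frames displayed so far
    stalls = 0
    credit = 0.0   # fractional decode progress carried between ticks
    while shown < total_frames:
        # One tick lasts 1 / playback_fps seconds of wall-clock time.
        credit += decode_fps / playback_fps
        while credit >= 1.0 and decoded < total_frames and buffered < buffer_frames:
            credit -= 1.0
            decoded += 1
            buffered += 1
        # A decoder blocked by a full buffer cannot bank unlimited work.
        credit = min(credit, 1.0)
        if buffered:
            buffered -= 1
            shown += 1
        else:
            stalls += 1
    return stalls


# A decoder twice as fast as playback never starves the player.
print(simulate_playback(decode_fps=60, playback_fps=30, buffer_frames=8, total_frames=90))
# A decoder half as fast as playback stalls on every other tick.
print(simulate_playback(decode_fps=15, playback_fps=30, buffer_frames=8, total_frames=90))
```

Under this model a higher-complexity encode shows up as a lower `decode_fps` on the same hardware, which is why a profile that plays smoothly on a desktop can stutter on a phone.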

Feature                                 Baseline (iPod)   Main (iPad)   High (MacBook)
Flexible macroblock ordering (FMO)      Yes               No            No
Arbitrary slice ordering (ASO)          Yes               No            No
Redundant slices (RS)                   Yes               No            No
B slices                                No                Yes           Yes
Interlaced coding (PicAFF, MBAFF)       No                Yes           Yes
CABAC entropy coding                    No                Yes           Yes
8×8 vs. 4×4 transform adaptivity        No                No            Yes
Quantization scaling matrices           No                No            Yes
Separate Cb and Cr QP control           No                No            Yes
Monochrome (4:0:0)                      No                No            Yes

Videos are encoded with specific playback targets in mind, based on maximum compatibility. The iPhone 3GS supports H.264 Baseline Profile Level 3.0. The iPhone 4 and iPad support H.264 Main Profile Level 3.1. The latest netbooks with NVIDIA ION and PureVideo HD support H.264 High Profile Level 4.1. A video optimized for desktop, notebook, or netbook playback and encoded using H.264 High Profile Level 4.1 will not play back on an iPhone.
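A publisher could capture these ceilings in a small lookup and test an encode against them. The sketch below is a first-pass filter under two loud assumptions: it ranks profiles as a simple ladder (real profiles are not strict subsets of one another), and it trusts only the device limits quoted above. `H264Target`, `DEVICE_CEILINGS`, and `can_play` are hypothetical names.

```python
from dataclasses import dataclass

# Assumption: profiles form a ladder. In the real standard, Baseline
# includes tools (FMO, ASO, redundant slices) that Main and High lack.
PROFILE_RANK = {"Baseline": 0, "Main": 1, "High": 2}


@dataclass(frozen=True)
class H264Target:
    profile: str
    level_x10: int  # level 3.1 is stored as 31 to avoid float comparisons


# Decoder ceilings as quoted in this post; the keys are illustrative labels.
DEVICE_CEILINGS = {
    "iPhone 3GS": H264Target("Baseline", 30),
    "iPhone 4": H264Target("Main", 31),
    "iPad": H264Target("Main", 31),
    "NVIDIA ION netbook": H264Target("High", 41),
}


def can_play(device: str, video: H264Target) -> bool:
    """True when the video stays within the device's profile and level."""
    ceiling = DEVICE_CEILINGS[device]
    return (
        PROFILE_RANK[video.profile] <= PROFILE_RANK[ceiling.profile]
        and video.level_x10 <= ceiling.level_x10
    )


desktop_encode = H264Target("High", 41)
for device in DEVICE_CEILINGS:
    print(device, can_play(device, desktop_encode))
```

Only the ION netbook passes for the High Profile Level 4.1 encode. Published ceilings are conservative, so a file that fails this check may still play if it avoids the features a device cannot decode.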

The Apple effect

Adobe has repeatedly said that Apple mobile devices cannot access “the full web” because 75% of video on the web is in Flash. What they don’t say is that almost all this video is also available in a more modern format, H.264, and viewable on iPhones, iPods and iPads.

Steve Jobs, April 2010

Adobe’s Flash Player added support for H.264 video decoding in August 2007 with its Flash Player 9 Update 3 Beta 2 (9.0.115) release. Websites previously embedded a video file, a Flash video container (FLV) with an On2 VP6 or Sorenson video track, in a single Flash file for distribution and playback. The launch of H.264 support in Flash decoupled the video player from the video file, loading videos over the network when a viewer initiates playback (a much lighter payload for embeds such as YouTube). Video websites can directly expose MP4 downloads to iTunes, the QuickTime browser plugin, or search engines for download and indexing.

Decoupling the Flash video viewer from the underlying video provides direct access but does not necessarily deliver video “viewable on iPhones, iPods and iPads.” Video publishers need to dumb down their video for Apple’s low-power devices (and Flash mobile), or a video will be viewable but not playable.

YouTube exposes multiple video resolutions on its website. Each resolution uses a slightly different version of H.264, but none of the videos delivered to desktop web browsers is compatible with an iPhone 3GS and its Baseline profile requirement. Let’s take a look at the underlying videos exposed in the default Flash version of YouTube for the latest weekly address from the White House.

Exposed YouTube web formats

720p: MP4, High profile level 4.1
480p: FLV, Main profile level 3.1
360p: FLV, Main profile level 3.0

The H.264 videos YouTube uses for default playback in web browsers are not compatible with portable Apple devices that are not built on an A4 processor. YouTube is creating special video files for iOS and other mobile devices.

Flash Player for mobile

On June 22 Adobe released Flash Player 10.1 for mobile, its first full Flash Player written for ARM instruction set architectures. Flash for mobile does not solve the video playback problem. Flash can draw a player area and display a preview image of the video in place of a failed-plugin icon. Video playback ultimately depends on the hardware decoder horsepower behind the scenes: its ability to deliver video frames and synchronized audio to your mobile device’s screen faster than playback consumes them, within the constraints of the small buffer and memory available on mobile. Flash for mobile renders a player and its interaction elements; video for mobile still relies on simpler sets of shortcuts targeting hardware-accelerated features and the computing resources available on mobile.

WebM and VP8

Google introduced the WebM file format on May 19 with a container based on Matroska, a VP8 video track, and Vorbis audio. Google released any patent rights it may assert over VP8 and released the source code for libvpx, a reference encoder and decoder, with 17 test vectors for implementers. The popular FFmpeg project, used by many web publishers for encoding and by Google Chrome for decoding, quickly added native VP8 support in late June. FFmpeg’s VP8 implementation heavily leveraged video encoder and decoder shortcuts already used by H.264, opening VP8 to hardware-accelerated playback on chipsets optimized for H.264 shortcuts. If your encoder, decoder, and hardware already pay into the H.264 patent licensing pool run by MPEG LA, the shared, patent-asserted shortcuts present in VP8 can be a good thing. If you were hoping for a freedom-loving replacement for Theora, VP8 may not be clear of patent assertions (but Mozilla seems to like it).

Summary

Web developers are excited about H.264 video and the rise of browser-native playback through HTML5 <video> markup. H.264 is a family of standards, each with its own set of shortcuts shared between a video publishing tool and a video player. The excitement over mobile video has overlooked the intricacies of H.264 profiles and levels, which publishers can declare through the codecs parameter described in RFC 4281, and the changing landscape of hardware-accelerated video on mobile. Video publishers should be aware of the differences between playback devices and either choose a lowest common denominator or specifically target the quality and file size of an intended playback device.
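Profile and level are commonly declared in a codecs string of the form avc1.PPCCLL: three hexadecimal bytes carrying profile_idc, the constraint flags, and level_idc. The sketch below decodes such a string. It assumes that layout, maps only four well-known profile_idc values, and ignores the special case of level 1b; the function name and sample strings are illustrative and not taken from this post.

```python
# Only a handful of profile_idc values are mapped here.
PROFILE_NAMES = {66: "Baseline", 77: "Main", 88: "Extended", 100: "High"}


def parse_avc_codec(codec):
    """Split an 'avc1.PPCCLL' string into profile, constraint flags, and level."""
    sample_entry, _, hex_part = codec.partition(".")
    if sample_entry != "avc1" or len(hex_part) != 6:
        raise ValueError(f"not an avc1.PPCCLL string: {codec!r}")
    profile_idc = int(hex_part[0:2], 16)
    constraint_flags = int(hex_part[2:4], 16)
    level_idc = int(hex_part[4:6], 16)
    return {
        "profile": PROFILE_NAMES.get(profile_idc, f"profile_idc {profile_idc}"),
        "constraint_flags": constraint_flags,
        # level_idc is ten times the level number, so 31 means level 3.1.
        "level": f"{level_idc // 10}.{level_idc % 10}",
    }


for codec in ("avc1.42E01E", "avc1.4D401F", "avc1.640029"):
    info = parse_avc_codec(codec)
    print(codec, info["profile"], info["level"])
```

The three sample strings decode to Baseline 3.0, Main 3.1, and High 4.1, the same tiers discussed for the iPhone 3GS, the iPhone 4 and iPad, and ION netbooks.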

4 comments

Commentary on "The many flavors of H.264 video":

  1. Richie Walker wrote:

    Excellent article thanks. I wonder if you could shed some light on what such a lowest common denominator encode might look like?

    Am I correct in thinking that to leverage better quality encodes we would move up the profile tree, so to speak? I.e., dropping back to “baseline” would give us something that played on more devices and in more places, while using “main” or even “high” profile will play less widely but be better quality? Or am I trying to oversimplify and missing the point here? Cheers

  2. Lloyd Budd wrote:

    Fascinating post. It makes me appreciate, all the more, the expertise that you bring to your VideoPress work.

    It also leads me to believe that ubiquitous video that anyone and everyone can produce, distribute, and consume is a ways away.

  3. Carl Edman wrote:

    You may be mistaken regarding the technical capacity of recent iPhones.

    My DVD library is stored on a NAS encoded at h264 *high* profile level 3.0 (little reason to go to a higher level for de-interlaced DVD content in either NTSC or PAL).

    A year or two ago, iTunes would not even allow me to transfer such encodes to the then-current iPhones.

    But after getting an iPhone 4, I tried again. Not only would the files transfer quickly and without error (which rules out any re-coding done behind the scenes by iTunes), every single MP4 played just fine. In fact, going back to my old iPhone 3GS, I found that some, but not all, of these files played fine on it too.

    • Dags wrote:

      @Carl, Niall is correct. The iPhone 4 only supports up to Main Profile level 3.1. This is very clearly stated in the iPhone specs. Having said that, the feature of the High Profile that commonly causes problems is 8×8 transforms in I-frames. If your encoding settings disable 8×8 DCT (which is the default for a lot of programs), then it’s not surprising that they work on your iPhone. For example, encoding with the Main Profile preset in FFmpeg simply includes the flag ‘-dct8x8’.