How does video work on the web?
You probably interact with video online every day, but do you really understand
how it works? I thought I did, until recently, when I was implementing video messaging in a web app and quickly
realized how little I knew. A simple question, “What format should a video be in so any device can watch it?”, brought
me down the rabbit hole that is video on the web. In this article I give a high-level overview of video topics,
highlighting some vocabulary that confused me and priming your brain to go deeper on any of these topics
if you want.
We will cover:
- Anatomy of a video: Codecs and Container Formats
- Streaming vs Progressive Downloads
Anatomy of a video: Codecs and Container Formats
The layout of a multimedia container.
If a video is an MP4 that means the container format is MP4. The container format decides how the data inside the
file is organized. The container format does not indicate how the actual audio or video data is encoded or compressed.
Examples of container formats are WebM, MP4 and Matroska.
History: One of the very first multimedia file formats was
the Interchange File Format (IFF) developed by Electronic Arts
in 1985. The format’s design was partly inspired by the format Apple’s Macintoshes were using for
their clipboard. You can check out the original IFF spec which is actually
a pretty interesting read as far as technical documents go.
There are 3 things inside the container: metadata, video data and audio data. Metadata tells us a lot about what is
going on in the container. Here is the output of `mediainfo test.mkv` for a video on my computer:

```
General
Complete name            : test.mkv
Format                   : Matroska
Format version           : Version 4
File size                : 792 KiB
Writing application      : Chrome
Writing library          : Chrome
IsTruncated              : Yes
FileExtension_Invalid    : mkv mk3d mka mks

Video
ID                       : 2
Format                   : AVC
Format/Info              : Advanced Video Codec
Codec ID                 : V_MPEG4/ISO/AVC
Width                    : 640 pixels
Height                   : 480 pixels
Display aspect ratio     : 4:3
Frame rate mode          : Variable
Language                 : English
Default                  : Yes
Forced                   : No

Audio
ID                       : 1
Format                   : Opus
Codec ID                 : A_OPUS
Channel(s)               : 1 channel
Channel layout           : C
Sampling rate            : 48.0 kHz
Bit depth                : 32 bits
Compression mode         : Lossy
Delay relative to video  : 59 ms
Language                 : English
Default                  : Yes
Forced                   : No
```
We can see that the container format is Matroska, the video data is in Advanced Video Coding (AVC) format and the
audio data is in Opus format. These video and audio formats are known as codecs. A codec (an amalgam of the
words coder and decoder) is the algorithm used to encode and decode the media data. Examples of audio
codecs are AAC and Opus. Examples of video codecs are AVC/H.264, HEVC/H.265 and VP9.
There are many other codecs out there, however, unless you are doing very specific codec work (like trying to
improve Netflix’s encoding) then you can just stick with the widely used and supported ones.
I will not attempt to describe the details of any codecs here as that is very far out of my wheelhouse, but the main
things to understand are:
- Different container formats can hold different codecs
- Browsers can only play a subset of all codecs and formats
Streaming vs Progressive Downloads
When we use a simple container format like MP4 and point a video element at it, the browser will begin a progressive
download: it downloads the video into memory from start to finish. If a user tries to
seek to a different spot in the video, the browser will request that part of the file from the server and continue
downloading from there. This method is memory intensive on the client’s machine because the browser will attempt to hold
the entire video in memory, which makes progressive download unsuitable for longer videos. To avoid buffering the video
in memory, what you want is a stream.
A streaming protocol, like HTTP Live Streaming (HLS), utilizes the
same containers and codecs that you would find in a regular video file, but it will chop the data into bite size chunks.
So instead of a single file your video is represented as a directory with a manifest file and the chunks of data. To
play a stream the browser reads the manifest file to find the locations of the chunks, then begins requesting the data.
The browser will play the data as soon as it is received and does not keep already-played chunks in memory. Therefore,
the memory impact on the client is the same for a long video as a short video.
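To make the manifest-plus-chunks idea concrete, here is a minimal sketch in plain JavaScript of how a player could read an HLS media playlist to find the chunks. The playlist below is a made-up example, and real players (like hls.js) handle many more tags and edge cases than this.

```javascript
// A tiny, illustrative HLS media playlist. Segment names are placeholders.
const playlist = `#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXTINF:6.0,
seg_000.ts
#EXTINF:6.0,
seg_001.ts
#EXTINF:4.5,
seg_002.ts
#EXT-X-ENDLIST`;

// Pair each #EXTINF duration with the segment URI on the following line.
function parseSegments(m3u8) {
  const lines = m3u8.split('\n').map((line) => line.trim());
  const segments = [];
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].startsWith('#EXTINF:')) {
      const duration = parseFloat(lines[i].slice('#EXTINF:'.length));
      segments.push({ uri: lines[i + 1], duration });
    }
  }
  return segments;
}

console.log(parseSegments(playlist));
// → [{ uri: 'seg_000.ts', duration: 6 }, { uri: 'seg_001.ts', duration: 6 },
//    { uri: 'seg_002.ts', duration: 4.5 }]
```

Having the durations up front is what lets the player seek: it can compute which chunk contains any timestamp and request just that chunk.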
An optimization that streaming protocols support
is Adaptive Bitrate Streaming (ABR). With ABR, we create
multiple different versions of those data chunks, each encoded at a
different bitrate (lower bitrate means
lower quality). The browser will request data chunks at the highest bitrate its internet connection can handle without
making the video choppy. If the browser is experiencing choppy video, it will request data at a lower bitrate to smooth out
playback.
The process of taking a video from one format to another is known as transcoding. If you want to convert a user-uploaded
WebM video into an adaptive bitrate HLS stream, you will need to transcode. Transcoding works by decoding the
video to a raw (uncompressed) format and then encoding it in the desired format. The principle
of garbage in, garbage out applies here: you can never
transcode to a higher quality than what you started with. The standard tool for transcoding
is ffmpeg, which has a huge number of options and can run pretty much anywhere. However, if
you have a large number of videos, you may not want to deal with running ffmpeg as a service; instead you
could use a hosted 3rd party solution. AWS offers their MediaConvert service,
which hooks in nicely with S3 and CloudFront. There are also companies that do transcoding exclusively,
like Zencoder, which could also be a good option.
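As a rough sketch, a single-rendition WebM-to-HLS transcode with ffmpeg might look like the command below. The file names are placeholders, and you should check the ffmpeg documentation for the options your version supports.

```shell
# Decode input.webm, re-encode to H.264 video and AAC audio,
# then chop the result into ~6 second HLS segments plus a playlist.
ffmpeg -i input.webm \
  -c:v libx264 -c:a aac \
  -hls_time 6 \
  -hls_playlist_type vod \
  -hls_segment_filename 'seg_%03d.ts' \
  playlist.m3u8
```

For adaptive bitrate you would repeat the encode at several bitrates and resolutions and generate a master playlist that points at each rendition.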
Browsers are picky about which streaming protocols, container formats and codecs they are willing to work with. This is
really where you need to pay attention as a web developer. A useful resource is the Mozilla Developer Network (
MDN) media type and format guide, which has information on which browsers support which containers and codecs.
Serving a video the browser can’t play leaves a broken player on the page, and
this looks pretty bad to your users.
Note: Browsers implement the function canPlayType, which takes one
parameter, a MIME type string, and returns a string telling you whether the browser can play the video. Due to the
diverse nature of container formats and codecs, the browser will only give one of three responses: "" (empty string,
meaning no, the browser can’t play the video), "maybe" and "probably".
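Here is a small sketch of using canPlayType. The helper function and its codec strings (standard RFC 6381 identifiers for H.264 baseline and AAC-LC) are illustrative; the DOM call is guarded so the snippet also runs outside a browser.

```javascript
// Build a MIME type string of the shape canPlayType expects,
// e.g. 'video/mp4; codecs="avc1.42E01E, mp4a.40.2"'.
function mimeFor(container, codecs) {
  return `video/${container}; codecs="${codecs.join(', ')}"`;
}

const mp4WithAvcAac = mimeFor('mp4', ['avc1.42E01E', 'mp4a.40.2']);
console.log(mp4WithAvcAac); // → video/mp4; codecs="avc1.42E01E, mp4a.40.2"

// In a browser, ask a <video> element whether it can play that type.
if (typeof document !== 'undefined') {
  const video = document.createElement('video');
  console.log(video.canPlayType(mp4WithAvcAac)); // "", "maybe" or "probably"
}
```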
To answer the original question, “What format should a video be in so any device can watch it?”, the best answer
we can give is an MP4 container with Advanced Audio Coding (AAC) for audio and AVC/H.264 for video.
For streaming, the HLS protocol is supported by every major browser. If you want to go deeper down the rabbit hole of
formats and codecs, many of their specifications are open online. To learn more about best practices for using video
on the web, this article from Google’s Web Fundamentals
series is great. Hopefully this article gave you a better understanding of the basics of video on the web.