Background knowledge

What is streaming?

Simply put, the server sends a stream of video, audio, and optionally subtitles, over the internet, to the client.

The difference between streaming and downloading a video file is that streaming happens in real time: the user does not have to wait for the whole file to finish downloading. The server divides the video file into many small parts and serves them a bit at a time, and the client downloads each part and plays it. If the speed and latency of the connection allow, the server sends parts ahead of playback and the client saves them locally, which is called buffering, so that the video plays more smoothly.

A large video is like a lake, and streaming is like a stream or river: the quality of the experience largely depends on the width of the stream (that is, the quality of the internet connection).

See more in “What is streaming? | How video streaming works” by Cloudflare.

What is Jellyfin?

Jellyfin is a free (as in freedom, and also as in free beer) software media system that consists of both server and client software. It uses HLS (HTTP Live Streaming) to stream media from server to client, and it allows users to keep complete control of their media.

A Jellyfin server requires installing software packages on an appropriate OS/platform. After installation, the user is expected to create a library and add/import media into it. Many people also set up an NFS share to expose the storage over the network, as well as GPU hardware acceleration for real-time transcoding.

After properly setting up the server, the user can use a browser or other clients to connect to the server and stream media.

What could possibly go wrong?

This is a non-exhaustive list of what I have experienced:

  • The server cannot access the media files (permission denied; the file system permissions need to be set properly)

  • Green artifacts appear on the transcoded video when using AMD GPU (Radeon RX 6600XT) with VAAPI hardware acceleration

  • Messed up metadata

  • Unresponsive browser window (Chrome’s fault probably)

  • Stuttering and low performance when playing 4K H265 10-bit HDR video (had to manually transcode to H264 8-bit SDR)

  • When attempting to play anything at all, it displays a pop-up “Playback error - This client isn’t compatible with the media and the server isn’t sending a compatible media format.” (solved by restarting the browser, still happening a few times a week as of May 2023)

  • Chrome refuses to directly play 4K H265 10-bit HDR video even though it reported that hardware support is enabled

  • Stuck at loading (while I am writing this)
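For the first issue above (permission denied), a common fix on Linux is to make the media tree readable by the Jellyfin service account. A minimal sketch, assuming the server runs as the `jellyfin` user, the media is group-owned by an assumed `media` group, and the library lives under `/srv/media` (all three are assumptions; adjust to your setup):

```shell
# Add the jellyfin service user to the group that owns the media files
sudo usermod -aG media jellyfin

# Let the owning group traverse directories and read files
sudo chgrp -R media /srv/media
sudo find /srv/media -type d -exec chmod 750 {} +
sudo find /srv/media -type f -exec chmod 640 {} +

# Restart the server so the new group membership takes effect
sudo systemctl restart jellyfin
```

Keeping write permission away from the service account (750/640 rather than 770/660) is a reasonable default, since Jellyfin only needs to read the library.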

The issues can be roughly divided into these categories: network, server, and client. Network issues are very complicated, so let’s focus on the server side and the client side of the problem.

Formats

Why can some videos “Direct Play” or “Direct Stream” while others require “transcoding”? To answer this, one must understand codec support.

Many people are familiar with video files bearing extensions like mp4, avi, mov, mkv, m4v, webm, vob, 3gp, flv, and rmvb, and anyone who has used a computer long enough has probably seen the “unsupported video format” error that appears when a player cannot handle a file. Extensions like mp4 and mkv are merely names of containers that hold video, audio, and subtitle streams; how the video or audio inside is encoded is another story.

A raw video consists of frames of pictures, like analog film. The pictures become a movie when they are played at a certain speed, typically about 24 frames per second. In the digital age, video formats are specific ways to encode/compress video into something a machine can understand, in other words, 1s and 0s. Some containers can hold almost any codec, while others support only certain formats.

A codec is defined as software or hardware that encodes or decodes a data stream or signal. In this article, the term codec is sometimes used loosely to refer to media formats.

Video formats and codecs

Typically DVDs use MPEG-2 Part 2 as the main video codec, a.k.a. H.262, or simply MPEG-2. This is decided by the DVD standard.

On the other hand, Blu-rays use MPEG-4 Part 10, a.k.a. H.264 or AVC (Advanced Video Coding). This is by far the most popular video codec, and nearly all video players (software or dedicated Blu-ray playback hardware) support it.

To support 4K UHD on Blu-ray, a newer and more efficient codec was developed, named MPEG-H Part 2, a.k.a. H.265 or HEVC (High Efficiency Video Coding). It can maintain the same quality while reducing file size by 25% to 50% compared to H.264, and it supports HDR (High Dynamic Range) video. HDR covers a much larger color range; in other words, it can display far more colors, brightness, and contrast than the old SDR (Standard Dynamic Range), which was designed around CRT displays.

However, the MPEG standards are covered by patents held by the organizations behind MPEG, and using these technologies commercially requires paying royalties. That is why popular video streaming platforms like YouTube prefer to avoid them. Meanwhile, the open source community developed x264 and x265, free encoder implementations of H.264 and H.265, both licensed under the GNU GPL v2.0 (or later), one of the strongest copyleft licenses in the world.

The video formats above were developed with physical media like DVD and Blu-ray as primary targets. Microsoft published VC-1, an open but non-free format that was used on many Blu-ray discs in the early days.

On the other hand, VP8 and VP9, released by Google as open and royalty-free formats, are roughly analogous to H.264 and H.265 in efficiency. They are two of the three main formats used on YouTube, which moved away from H.264 after the Free Software Foundation called for free formats in an open letter. However, a Luxembourg-based company named Sisvel formed patent pools for VP9 and AV1, laying claim to patents allegedly used by both formats. Some people suspect that Sisvel is a patent troll.

The Alliance for Open Media (AOM), a non-profit organization, was formed to create royalty-free media standards to compete with MPEG; it released the AV1 standard and its libaom codec under the BSD 2-Clause License in 2018. Despite the patent claims and the patent licenses being sold, AV1 quickly gained popularity: Google soon enabled streaming of AV1-coded content on YouTube, various vendors released hardware decoders, and the desktop GPU vendors Nvidia, AMD, and Intel shipped discrete graphics cards with built-in AV1 hardware encode/decode acceleration, which is anticipated to give a huge push to the wide adoption of live streaming in AV1.

The members of AOM are mainly multi-national mega-corporations, presumably trying to maximize their profit by switching to open media formats, and their efforts are met with patent claims by companies presumably trying to maximize theirs. Ironic.

Ironic

This is a still picture from the movie Star Wars: Episode III – Revenge of the Sith (2005), where Chancellor Palpatine talks to Anakin Skywalker at the senate. I do not claim its copyright.

All the video formats above have complete or partial support from various clients. In terms of resource consumption, the newer formats are more demanding to decode and encode than the older ones. Decoding should not be a problem on modern computers or phones. Encoding, on the other hand, consumes much more CPU time in software, and less with GPU hardware acceleration, depending on the GPU vendor’s implementation. As an extreme case, encoding AV1 in software at its highest-efficiency setting can take as long as 7 hours for a 30-second clip, hundreds of times slower than H.265.

However, hardware encoding has its drawbacks. A GPU is fast at repetitive work that applies the same instructions to multiple data streams, but it trades picture quality and configuration flexibility for encoding speed. Software encoding on the CPU offers more control over the encoding process, higher visual quality, and fewer artifacts.
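To get a feel for the speed/efficiency trade-off in software AV1 encoding, here is how the effort knob looks with FFmpeg’s libaom-av1 encoder: `-cpu-used 0` is the slowest, highest-efficiency setting, while higher values are much faster at some quality cost. File names are placeholders, and this assumes an FFmpeg build with libaom enabled:

```shell
# Slowest, highest compression efficiency (can take hours for a short clip)
ffmpeg -i clip.mp4 -c:v libaom-av1 -crf 30 -cpu-used 0 clip_slow.mkv

# Much faster, somewhat less efficient compression
ffmpeg -i clip.mp4 -c:v libaom-av1 -crf 30 -cpu-used 6 clip_fast.mkv
```

Both commands target the same quality level (`-crf 30`); only the encoder effort differs.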

Audio formats and codecs

Audio, on the other hand, has a simpler story. Audio streams typically contain far less data than video streams.

Pulse-code modulation (PCM), invented in the days of telegraphy and telephony before the digital age, is the most widely used standard. Linear pulse-code modulation (LPCM) is the format used in audio files with the WAV (Waveform Audio File Format) extension. The compact disc (CD) stores stereo audio at a 44.1 kHz sampling rate with 16-bit resolution. LPCM is also part of the DVD and Blu-ray standards.
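The CD numbers above translate directly into bitrate, which shows why uncompressed audio adds up quickly:

```shell
# CD audio: 44,100 samples/s x 16 bits/sample x 2 channels
bits_per_second=$((44100 * 16 * 2))
echo "$bits_per_second bit/s"      # 1411200 bit/s, i.e. about 1.4 Mbit/s

# How many bytes one minute of uncompressed CD audio occupies
bytes_per_minute=$((bits_per_second / 8 * 60))
echo "$bytes_per_minute bytes/min" # 10584000 bytes, roughly 10 MB per minute
```

Compare that to a typical 128–320 kbit/s MP3 or Opus stream, which is an order of magnitude smaller.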

However, (L)PCM is uncompressed, so the data is large and uneconomical to transport over a network or store on physical media.

MP3 took over the world with its compact file size while maintaining reasonable audio fidelity in the internet boom of the 1990s and 2000s. Unfortunately, it was soon associated with music copyright infringement.

FLAC (Free Lossless Audio Codec), popular among enthusiasts, can compress audio losslessly. YouTube uses Opus on the platform to reduce the bitrate using lossy compression and maintain a certain level of audio quality.

DVD and Blu-ray audio mainly use Dolby Digital (AC3), DTS, or their successors, like EAC3, Dolby TrueHD, Dolby Atmos, DTS-HD MA, DTS:X.

Audio encoding is not very resource-intensive compared to video encoding.

Subtitles

For what matters in Jellyfin, the formats of subtitles can be roughly divided into two categories: text-based and picture-based.

Text-based subtitles include the popular SubRip Text (SRT) and ASS/SSA. For picture-based subtitles, VobSub is used on DVD, and PGSSUB is used on Blu-ray.

“Burned-in” subtitles, on the other hand, are permanently embedded into the video frames and cannot be removed. This is inconvenient for those who want to turn subtitles off, so separate subtitle tracks, whether text-based or picture-based, are more popular.

Why do formats matter

Transcoding is the process of converting one format to another, and it consumes more server resources than normal playback. The goal is to Direct Play all media. However, being able to play a file locally on one’s PC without transcoding does not mean it also streams well. When the container, video, audio, and subtitles are all compatible with the client, Direct Play happens. If only the audio, subtitles, or container is incompatible, Direct Stream occurs: the video is streamed untouched, while the audio may be transcoded to AAC, or the streams may be remuxed (remultiplexed) into a new container such as MPEG-TS, the transport stream container used by HLS. Transcoding happens when the video format itself is incompatible with the client, or when subtitles must be “burned in”.
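To predict whether a given file will Direct Play, inspect its container and per-stream codecs and compare them against what the client supports. A sketch using ffprobe, which ships with FFmpeg; `movie.mkv` is a placeholder:

```shell
# Print the container format and each stream's type and codec, one per line
ffprobe -v error \
        -show_entries format=format_name:stream=index,codec_type,codec_name \
        -of compact movie.mkv
```

If every reported codec (and the container) is supported by the client, Jellyfin should Direct Play; a mismatch in only the audio or container suggests Direct Stream, and a video mismatch means a full transcode.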

Jellyfin has a Codec Table that lists codec compatibility in detail. H.264 with 8-bit color depth is the most widely supported video format across platforms, and it is what media is transcoded to by default when transcoding happens. Newer formats like H.265 and AV1 are not yet well supported across clients.

As for audio, transcoding is not nearly as resource-intensive, so it is less of a concern. Although FLAC is supported on all clients listed, Jellyfin transcodes incompatible audio formats to AAC by default.
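What Direct Stream does on the fly can be approximated offline with FFmpeg: copy the video bitstream untouched and re-encode only the audio to AAC. A sketch; file names and the 192 kb/s bitrate are placeholders:

```shell
# Keep all streams, copy video and subtitles as-is, re-encode audio to AAC
ffmpeg -i input.mkv -map 0 -c:v copy -c:a aac -b:a 192k -c:s copy output.mkv
```

Because the video is only copied, this runs far faster than a full transcode and causes no loss of picture quality.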

Subtitles can trigger video transcoding too: if a picture-based subtitle track is selected, or the “force burn in all subtitles” option is enabled, the subtitle track is “burned in” to the video. This is the most resource-intensive scenario, because the video must be decoded, the subtitle layer composited onto it, and the result re-encoded. PGS and VobSub subtitles extracted from Blu-ray and DVD are typically burned in when streaming. This is the least desirable situation, so users should choose SRT whenever available. If the user prefers not to keep the SRT as an external file, the video, audio, and SRT subtitle should be muxed (multiplexed) beforehand into an mkv container (Matroska, a free media container, chosen for compatibility), using free (libre) tools like mkvmerge (CLI) or MKVToolNix (GUI).
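Muxing an external SRT into a Matroska container with mkvmerge can look like the following; file names, the language tag, and the track name are placeholders:

```shell
# Produce a new mkv with the SRT added as a selectable text subtitle track;
# options before movie.srt apply to track 0 of that file
mkvmerge -o movie.muxed.mkv movie.mkv \
         --language 0:eng --track-name 0:"English" movie.srt
```

The original video and audio streams are copied bit-for-bit, so this is fast and lossless.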

Jellyfin uses a custom-built FFmpeg to transcode and remux. FFmpeg is free software, a collection of libraries and tools for processing media. Users can supply their own FFmpeg binary by building from source or using a pre-built binary from other developers, but Jellyfin does not recommend this, because it requires certain build options and libraries to be enabled for hardware acceleration.

A note on hardware acceleration

Hardware acceleration is not very easy to set up, and the resulting video quality and performance may not be satisfactory. AMD GPUs are known to perform less well; in my case, I see massive chunks of green artifacts when watching 4K HDR HEVC content with HDR-to-SDR tone mapping enabled in Chrome, and transcoding maxes out at 27 fps on an AMD RX 6600XT. Intel iGPUs may fare better, but the latest Intel Arc dGPUs have many driver issues on Linux, as they do on Windows, as of mid 2023.

While Intel and AMD GPUs can use the open source VAAPI, Nvidia GPUs use a proprietary video codec API called NVENC/NVDEC, and Nvidia imposes an artificial limit of 3 or 5 simultaneous encoding sessions on its consumer-grade graphics cards, depending on the driver version. This restriction can be bypassed with an unofficial driver patch.
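As a rough sketch of what a VAAPI pipeline does in the worst case described earlier, decoding 10-bit HEVC HDR on the GPU, tone-mapping to SDR, and encoding 8-bit H.264: the command below is an illustration only, assuming a VAAPI-capable GPU at /dev/dri/renderD128 and an FFmpeg build with VAAPI support; Jellyfin constructs its own (more elaborate) filter chains, which vary by version:

```shell
ffmpeg -init_hw_device vaapi=va:/dev/dri/renderD128 \
       -hwaccel vaapi -hwaccel_output_format vaapi -i hdr_input.mkv \
       -vf 'tonemap_vaapi=format=nv12:matrix=bt709:primaries=bt709:transfer=bt709' \
       -c:v h264_vaapi -b:v 8M -c:a copy sdr_output.mkv
```

The `tonemap_vaapi` filter performs the HDR-to-SDR conversion on the GPU, so frames never have to round-trip through system memory between decode, filter, and encode.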

GPU transcoding does not require as much computational power as gaming, so a second-hand or mid-range graphics card with the proper codec support should be fine if it is used exclusively by Jellyfin.