So now that h.264, h.265, and AV1 seem to be the three major codecs with hardware support, I wonder what will be the next one?
Hopefully AV2.
That'd be h264 (associated patents expired in most of the world), vp9 and av1.
h265 aka HEVC is less common due to dodgy, abusive licensing. Some vendors even disable it with drivers despite hardware support because it is nothing but legal trouble.
IIRC AV1 decoding hardware started shipping within a year of the bitstream being finalized. (Encoding took quite a bit longer but that is pretty reasonable)
We already have some of the stepping stones for this. But honestly it's much better for upscaling poor-quality streams; on a better-quality stream it just gives things a weird feeling.
Yeah, that's... sparse uptake. A few smart TV SoCs have it, but aside from Intel it seems that none of the major computer or mobile vendors are bothering. AV2 next it is then!
(And yes, even for something like Netflix lots of people consume it with phones.)
Where did it say that?
> AV1 powers approximately 30% of all Netflix viewing
Is admittedly a bit non-specific; it could be interpreted as 30% of users or 30% of hours-of-video-streamed, which are very different metrics. If 5% of your users are using AV1, but that 5% watches far above the average, you can have a minority userbase with an outsized representation in hours viewed.
I'm not saying that's the case, just giving an example of how it doesn't necessarily translate to 30% of devices using Netflix supporting AV1.
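To put a rough number on that hypothetical (a back-of-the-envelope sketch only; the 5% and 30% figures are the ones from the example above, and the function names are made up for illustration):

```python
def required_watch_ratio(user_share: float, hours_share: float) -> float:
    """How many times more than everyone else (per user) a group must watch
    for `user_share` of users to account for `hours_share` of total hours."""
    return (hours_share * (1 - user_share)) / ((1 - hours_share) * user_share)


def share_of_hours(user_share: float, watch_ratio: float) -> float:
    """Fraction of total hours watched by a group that is `user_share` of
    users and watches `watch_ratio` times as much per user as the rest."""
    group_hours = user_share * watch_ratio
    rest_hours = (1 - user_share) * 1.0
    return group_hours / (group_hours + rest_hours)


ratio = required_watch_ratio(0.05, 0.30)
print(f"{ratio:.2f}x")                        # 8.14x
print(f"{share_of_hours(0.05, ratio):.2f}")   # 0.30
```

So in that made-up scenario the 5% would have to watch roughly eight times as much as everyone else, which shows how far apart the two readings of "30%" can be.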
Also, the blog post identifies that there is an effective/efficient software decoder, which allows people without hardware acceleration to still view AV1 media in some cases (the case they defined was Android-based phones). So that kinda complicates what "X% of devices support AV1 playback" means, as it doesn't necessarily mean they have hardware decoding.
If it was a stat about users they’d say “of users”, “of members”, “of active watchers”, or similar. If they wanted to be ambiguous they’d say “has reached 30% adoption” or something.
Also, either way, my point was and still stands: it doesn't say 30% of devices have hardware decoding.
2020 feels close, but that's 5 years.
They mentioned they delivered a software decoder on Android first, then they also targeted web browsers (presumably through wasm). So out of that 30%, a good chunk is software decoding, not hardware.
That being said, it's a pretty compelling argument for phone and tv manufacturers to get their act together, as Apple has already done.
Eventually people and companies will associate HEVC with "that thing that costs extra to work", and software developers will start targeting AV1/2 so their software's performance doesn't depend on whether the laptop manufacturer or user paid for the HEVC license.
[1] https://arstechnica.com/gadgets/2025/11/hp-and-dell-disable-...
Hopefully, we can just stay on AV1 for a long while. I don't feel any need to obsolete all the hardware that's now finally getting hardware decoding support for AV1.
This is a big victory for the patent system.
I am eagerly awaiting AV2 test results.
I'm running an LG initially released in 2013 and the only thing I'm not happy with is that about a year ago Netflix ended their app for that hardware generation (likely for phasing out whatever codec it used). Now I'm running that unit behind an Amazon Fire Stick and the user experience is so much worse.
(that LG was a "smart" TV from before they started enshittifying, such a delight - had to use and set up a recent LG once on a family visit and it was even worse than the fire stick, omg, so much worse!)
Imagine a criminal investigation. A witness happened to take a video as the perpetrator did the crime. In the video, you can clearly see a recognizable detail on the perpetrator's body in high quality; a birthmark perhaps. This rules out the main suspect -- but can we trust that the birthmark actually exists and isn't hallucinated? Would a non-AI codec have just shown a clearly compression-artifact-looking blob of pixels which can't be determined one way or the other? Or would a non-AI codec have contained actual image data of the birthmark in sufficient detail?
Using AI to introduce realistic-looking details where there were none before (which is what your proposed AI codec inherently does) should never happen automatically.
[0] https://github.com/RootMyTV/RootMyTV.github.io [1] https://github.com/throwaway96/downgr8
https://en.wikipedia.org/wiki/JBIG2#:~:text=Character%20subs...
AV1 was specifically designed to be friendly for a hardware decoder and that decision makes it friendly to software decoding. This happened because AOMedia got hardware manufacturers on the board pretty early on and took their feedback seriously.
VP8/9 took a long time to get decent hardware decoding and part of the reason for that was because the stream was more complex than the AV1 stream.
The material belief is that modern trained neural network methods that improve on ten generations of variations of the discrete cosine transform and wavelets can bring a codec from "1% of knowing" to "5% of knowing". This is broadly useful. The level of abstraction does not need to be "The AI told the decoder to put a finger here", it may be "The AI told the decoder how to terminate the wrinkle on a finger here". An AI detail overlay. As we go from 1080p to 4K to 8K and beyond we care less and less about individual small-scale details being 100% correct, and there are representative elements that existing techniques are just really bad at squeezing into higher compression ratios.
I don't claim that it's ideal, and the initial results left a lot to be desired in gaming (where latency and prediction is a Hard Problem), but AI upscaling is already routinely used for scene rips of older videos (from the VHS Age or the DVD Age), and it's clearly going to happen inside of a codec sooner or later.
I'm not sure what you mean by "patent system" having a victory here, but it's not that the goal of promoting innovation is happening.
I don't see anything in that comment implying such a thing. It's just about the uptake of decoders.
AI upscaling built in to video players isn't a problem, as long as you can view the source data by disabling AI upscaling. The human is in control.
AI upscaling and detail hallucination built in to video codecs is a problem.
AI compression doesn't have to be the level of compression that exists in image generation prompts, though. A SORA prompt might be 500 bits (~1 bit per character natural English), while a decompressed 4K frame that you're trying to bring to 16K level of simulated detail starts out at 199 million bits. It can be a much finer level of compression.
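For reference, the 199 million figure works out if you assume an uncompressed 4K UHD frame at 24 bits per pixel (8-bit RGB, no chroma subsampling; that pixel format is my assumption, the comment doesn't state it). A quick sketch of the arithmetic:

```python
WIDTH, HEIGHT = 3840, 2160   # 4K UHD resolution
BITS_PER_PIXEL = 24          # assumption: 8 bits each for R, G, B

frame_bits = WIDTH * HEIGHT * BITS_PER_PIXEL
prompt_bits = 500            # prompt size quoted in the comment above

print(frame_bits)                  # 199065600
print(frame_bits // prompt_bits)   # 398131
```

So a single raw frame carries several hundred thousand times as many bits as the prompt, which leaves a lot of room between "prompt-level" and "pixel-level" compression.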
This is very true, but we're talking about an entertainment provider's choice of codec for streaming to millions of subscribers.
A security recording device's choice of codec ought to be very different, perhaps even regulated to exclude codecs which could "hallucinate" high-definition detail not present in the raw camera data, and the limitations of the recording media need to be understood by law enforcement. We've had similar problems since the introduction of tape recorders, VHS and so on; they always need to be worked out. Even the Phantom of Heilbronn (https://en.wikipedia.org/wiki/Phantom_of_Heilbronn) turned out to be DNA contamination of swabs by someone who worked for the swab manufacturer.
I think they certainly go hand in hand, in that an algorithm that is easier for software than its predecessors tends to be easier for hardware than its predecessors too, and vice versa, but the two are good at different things.
Bit masking/shifting is certainly more expensive in software, but it's also about the cheapest software operation. In most cases it's a single cycle transform. In the best cases, it's something that can be done with some type of SIMD instruction. And in even better cases, it's a repeated operation which can be distributed across the array of GPU vector processors.
What kills both hardware and software performance is data dependency and conditional logic. That's the sort of thing that was limited in the AV1 stream.
The coding side of "codec" needs to know what the decoding side would add back in (the hypothetical AI upscaling), so it knows where it can skimp and get a good "AI" result anyway, versus where it has to be generous in allocating bits because the "AI" hallucinates too badly to meet the quality requirements. You'd also want it specified, so that any encoding displays the same on any decoder, and you'd want it in hardware as most devices that display video rely on dedicated decoders to play it at full frame rate and/or not drain their battery. If it's not in hardware, it's not going to be adopted. It is possible to have different encodings, so a "baseline" encoding could leave out the AI upscaler, at the cost of needing a higher bitrate to maintain quality, or switching to a lower quality if bitrate isn't there.
Separating out codec from upscaler, and having a deliberately low-resolution / low-bitrate stream be naively "AI upscaled" would, IMHO, look like shit. It's already a trend in computer games to render at lower resolution and have dedicated graphics card hardware "AI upscale" (DLSS, FSR, XeSS, PSSR), because 4k resolutions are just too much work to render modern graphics consistently at 60fps. But the result, IMHO, noticeably and distractingly glitches and errors all the time.
Where did you read that it was designed to make creating a hardware decoder easier?
Ok, I don't think I'll find it. I think I'm mostly just regurgitating what I remember watching at one of the research symposiums. IDK which one it was unfortunately [1]
[1] https://www.youtube.com/@allianceforopenmedia2446/videos
But this just indicates that HEVC etc. is a dead end anyway.
When I'm watching something on YouTube on my iPhone, they're usually shipping me something like VP9 video which requires a software decoder; on a sick day stuck in bed I can burn through ten percent of my battery in thirty minutes.
Meanwhile, if I'm streaming from Plex, all of my media is h264 or h265 and I can watch for hours on the same battery life.
Continuous improvement? CADT? What shall the next one bring? Free meals?
Will it, though?
Why create a SW spec and hope that the HW will support it? Why not design together with HW?
He's not talking about simple bit shifts. Imagine if you had to swap every other bit of a value. In hardware that's completely free; just change which wires you connect to. In software it takes several instructions. The 65-bit example is good too. In hardware it makes basically no difference to go from 64 bits to 65 bits. In software it is significantly more complex - it can more than double computation time.
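A rough sketch of what those two cases look like in software, assuming "swap every other bit" means exchanging each adjacent pair of bits in a 64-bit word, and modelling a 65-bit value as a 64-bit low word plus one extra high bit. The function names are made up for illustration, and Python's big integers hide the word-size limit, so the masking here only stands in for what fixed-width registers would force on you:

```python
MASK64 = (1 << 64) - 1
ODD_BITS = 0xAAAAAAAAAAAAAAAA    # bits 1, 3, 5, ...
EVEN_BITS = 0x5555555555555555   # bits 0, 2, 4, ...


def swap_adjacent_bits(x: int) -> int:
    """Exchange each pair of neighbouring bits in a 64-bit word.
    Pure rewiring in hardware; two masks, two shifts and an OR here."""
    return (((x & ODD_BITS) >> 1) | ((x & EVEN_BITS) << 1)) & MASK64


def add65(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> tuple[int, int]:
    """Add two 65-bit values stored as (64-bit low word, 1-bit high part),
    wrapping modulo 2**65. The carry has to be propagated by hand."""
    lo = a_lo + b_lo
    carry = lo >> 64
    hi = (a_hi + b_hi + carry) & 1
    return lo & MASK64, hi


print(bin(swap_adjacent_bits(0b0110)))   # 0b1001
print(add65(MASK64, 0, 1, 0))            # (0, 1)
```

Neither is expensive in absolute terms, but each is several operations where the hardware equivalent is a wiring change or one extra adder bit.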
I think where software has the advantage is sheer complexity. It's harder to design and verify complex algorithms in hardware than it is in software, so you need to keep things fairly simple. The design of even state-of-the-art CPUs is surprisingly simple; a cycle accurate model might only be a few tens of thousands of lines of code.
> I'm not claiming that software will be more efficient. I'm claiming that things that make it easy to go fast in hardware make it easy to go fast in software.
The actual constraints on what makes hardware or software slow are remarkably similar. It's not ultimately the transforms on the data which slow down software, it's when you inject conditional logic or data loads. The same is true for hardware.
The only added constraint software has is a limited number of registers to operate on. That can cause software to put more pressure on memory than hardware does. But otherwise, similar algorithms accomplishing the same task will have similar performance characteristics.
Your example of the bitshift is a good illustration of that. Yes, in hardware it's free. And in software it's 3 operations, which is pretty close to free. Both will spend far more time waiting on main memory to load up the data for the masking than they will spend doing the actual bit shuffling. The constraint on the software is you are burning maybe 3 extra registers. That might get worse if you have no registers to spare, forcing you to constantly load and store.
This is the reason SMT has become ubiquitous on x86 platforms: CPUs spend so much time waiting on data to arrive that we can make them do useful work while they wait for those cache lines to fill up.
Saying "hardware can do this for free" is an accurate statement, but you are missing the 80/20 of the performance. Yes, it can do something subcycle that costs software 3 cycles to perform. Both will wait for 1000 cycles while the data is loaded up from main memory. A fast video codec that is easy to decode with hardware gets there by limiting the number of data loads that need to happen to calculate a given frame. It does that by avoiding wonky frame transformations and by preferring compression which uses data points in close memory proximity.
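Plugging in the round numbers from above (a toy model, not a measurement: it treats the "subcycle" hardware cost as zero and assumes the memory wait and the bit shuffling don't overlap):

```python
def relative_overhead(memory_wait_cycles: int, extra_compute_cycles: int) -> float:
    """Extra time software spends compared with hardware when both stall
    for the same memory wait and only software pays the compute cycles."""
    hardware_total = memory_wait_cycles
    software_total = memory_wait_cycles + extra_compute_cycles
    return (software_total - hardware_total) / hardware_total


# 1000-cycle main-memory wait, 3 extra cycles of bit shuffling in software
print(f"{relative_overhead(1000, 3):.1%}")   # 0.3%
```

On those assumptions the "free in hardware" trick buys about a third of a percent; the memory wait is where the time goes.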