Introducing graphics offload

Some of us in the GTK team have spent the last month or so exploring the world of Linux kernel graphics APIs, in particular dmabufs. We are coming back from this adventure with some frustrations and some successes.

What is a dmabuf?

A dmabuf is a memory buffer in kernel space that is identified by a file descriptor. The idea is that you don’t have to copy lots of pixel data around, and instead just pass a file descriptor between kernel subsystems.

Reality is of course more complicated than this rosy picture: the memory may be device memory that is not accessible in the same way as ‘plain’ memory, and there may be more than one buffer (and more than one file descriptor), since graphics data is often split into planes (e.g. RGB and A may be separate, or Y and UV).
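
To make the idea of planes a bit more concrete, here is a minimal sketch in C of how a two-plane NV12 frame (a Y plane followed by an interleaved UV plane) might be described. The PlaneDesc and FrameDesc types and the describe_nv12 helper are hypothetical names made up for this illustration, and the tightly packed, single-buffer layout is an assumption. Real drivers choose their own strides, offsets and modifiers, and may hand out a separate file descriptor for each plane.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PLANES 4

/* Hypothetical description of one plane of a dmabuf-backed frame. */
typedef struct {
  int      fd;      /* file descriptor of the buffer holding this plane */
  uint32_t offset;  /* byte offset of the plane inside that buffer */
  uint32_t stride;  /* bytes per row */
} PlaneDesc;

/* Hypothetical description of a whole frame. */
typedef struct {
  uint32_t  width;
  uint32_t  height;
  uint32_t  fourcc;
  unsigned  n_planes;
  PlaneDesc planes[MAX_PLANES];
} FrameDesc;

/* Pack four characters into a DRM-style fourcc code (little-endian). */
static uint32_t
make_fourcc (char a, char b, char c, char d)
{
  return (uint32_t) a | ((uint32_t) b << 8) |
         ((uint32_t) c << 16) | ((uint32_t) d << 24);
}

/* Describe a tightly packed NV12 frame living in a single buffer.
 * Assumption: stride == width and the UV plane directly follows Y. */
FrameDesc
describe_nv12 (int fd, uint32_t width, uint32_t height)
{
  FrameDesc frame = { 0 };

  assert (width % 2 == 0 && height % 2 == 0);

  frame.width = width;
  frame.height = height;
  frame.fourcc = make_fourcc ('N', 'V', '1', '2');
  frame.n_planes = 2;

  /* Plane 0: Y, one byte per pixel. */
  frame.planes[0] = (PlaneDesc) { fd, 0, width };

  /* Plane 1: interleaved U and V at half resolution in both directions,
   * so each row holds width/2 samples of 2 bytes: width bytes again. */
  frame.planes[1] = (PlaneDesc) { fd, width * height, width };

  return frame;
}

/* Total bytes covered by an NV12 frame: Y plane plus half-height UV plane. */
uint32_t
nv12_frame_size (const FrameDesc *frame)
{
  return frame->planes[1].offset +
         frame->planes[1].stride * (frame->height / 2);
}
```

Under these assumptions, a 1920×1080 frame has its UV plane at byte offset 2073600 and covers 3110400 bytes in total. Both planes share one file descriptor here; with two buffers, each plane would carry its own fd and start at offset 0.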

Why are dmabufs useful?

I’ve already mentioned that we hope to avoid copying the pixel data and feeding it through the GTK compositing pipeline (and with 4K video, that can be quite a bit of data for each frame).
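
To put a rough number on "quite a bit of data", here is a small sketch of the arithmetic in C. The 3840×2160 resolution, 4 bytes per pixel and 60 frames per second are assumptions chosen for illustration, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed for one uncompressed frame. */
uint64_t
frame_bytes (uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
  assert (bytes_per_pixel > 0);
  return (uint64_t) width * height * bytes_per_pixel;
}

/* Bytes that have to be moved per second if every frame is copied once. */
uint64_t
stream_bytes_per_second (uint64_t bytes_per_frame, uint32_t fps)
{
  return bytes_per_frame * fps;
}
```

With those numbers, frame_bytes (3840, 2160, 4) is 33177600 bytes, a bit under 32 MiB per frame, and a single copy of each frame at 60 frames per second adds up to roughly 2 GB every second.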

The use cases where this kind of optimization matters are those where frequently changing content is displayed for a long time, such as

Video players
Virtual machines
Streaming
Screencasting
Games

In the best case, we may be able to avoid feeding the data through the compositing pipeline of the compositor as well, if the compositor supports direct scanout and the dmabuf is suitable for it. In particular on mobile systems, this may avoid using the GPU altogether, thereby reducing power consumption.

Details

GTK has already been using dmabufs since 4.0: when composing a frame, GTK translates all the render nodes (typically several for each widget) into GL commands and sends those to the GPU. Mesa then exports the resulting texture as a dmabuf and attaches it to our Wayland surface.

But if the only thing that is changing in your UI is the video content that is already in a dmabuf, it would be nice to avoid the detour through GL and just hand the data directly to the compositor, by giving it the file descriptor for the dmabuf.

Wayland has the concept of subsurfaces that let applications defer some of their compositing needs to the compositor: The application attaches a buffer to each (sub)surface, and it is the job of the compositor to combine them all together.

With what is now in git main, GTK will create subsurfaces as needed in order to pass dmabufs directly to the compositor. We can do this in two different ways. If nothing is drawn on top of the dmabuf (no rounded corners or overlaid controls), then we can stack the subsurface above the main surface without changing any of the visuals.

This is the ideal case, since it enables the compositor to set up direct scanout, which gives us a zero-copy path from the video decoder to the display.

If there is content that gets drawn on top of the video, we may not be able to get that, but we can still get the benefit of letting the compositor do the compositing, by placing the subsurface with the video below the main surface and poking a translucent hole in the main surface to let it peek through.

The round play button is what forces the subsurface to be placed below the main surface here.

GTK picks these modes automatically and transparently for each frame, without the application developer having to do anything. Once that play button appears in a frame, we place the subsurface below, and once the video is clipped by rounded corners, we stop offloading altogether. Of course, at that point the advantages of offloading disappear as well.

The graphics offload visualization in the GTK inspector shows these changes as they happen:

Initially, the camera stream is not offloaded because the rounded corners clip it. The magenta outline indicates that the stream is offloaded to a subsurface below the main surface (because the video controls are on top of it). The golden outline indicates that the subsurface is above the main surface.
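
The three states described above (no offload, subsurface below, subsurface above) can be summed up as a small decision function. This is only a sketch of the rules as stated in this post: the OffloadMode and OffloadCandidate types and pick_offload_mode are hypothetical names, and the real code in GTK works on the render node tree rather than on three booleans.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
  OFFLOAD_NONE,   /* composite through GL as usual */
  OFFLOAD_BELOW,  /* subsurface below the main surface, seen through a hole */
  OFFLOAD_ABOVE   /* subsurface above the main surface, direct scanout possible */
} OffloadMode;

/* Hypothetical summary of what GTK knows about the content in one frame. */
typedef struct {
  bool is_dmabuf;                   /* content is backed by a dmabuf */
  bool clipped_by_rounded_corners;  /* e.g. a rounded clip around the video */
  bool has_content_on_top;          /* e.g. overlaid playback controls */
} OffloadCandidate;

OffloadMode
pick_offload_mode (const OffloadCandidate *c)
{
  assert (c != NULL);

  /* Offloading needs a dmabuf, and a rounded clip rules it out entirely. */
  if (!c->is_dmabuf || c->clipped_by_rounded_corners)
    return OFFLOAD_NONE;

  /* Something is drawn over the video: let it show through from below. */
  if (c->has_content_on_top)
    return OFFLOAD_BELOW;

  /* Nothing in the way: stack the subsurface on top. */
  return OFFLOAD_ABOVE;
}
```

Since the decision is made again for every frame, the result can change as soon as controls fade in or out.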

How do you use this?

GTK 4.14 will introduce a GtkGraphicsOffload widget, whose only job is to give a hint that GTK should try to offload the content of its child widget by attaching it to a subsurface instead of letting GSK process it like it usually does.

To create suitable content for offloading, the new GdkDmabufTextureBuilder wraps dmabufs in GdkTexture objects. Typical sources for dmabufs are PipeWire, Video4Linux or GStreamer. The dmabuf support in GStreamer will be much more solid in the upcoming 1.24 release.
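
Putting the two pieces together, a minimal sketch could look like the following. It assumes a single-plane dmabuf whose file descriptor, fourcc, modifier, size and stride you already got from your source. The helper name create_offloaded_picture and its parameters are made up for this example, and using a GtkPicture as the child is just one option; it needs GTK 4.14 or the 4.13 snapshots to build.

```c
#include <gtk/gtk.h>

/* Hypothetical helper: wrap a single-plane dmabuf in a GdkTexture and
 * return a widget that hints GTK to offload it. Returns NULL and sets
 * @error if the texture cannot be created. */
GtkWidget *
create_offloaded_picture (GdkDisplay  *display,
                          int          dmabuf_fd,
                          guint32      fourcc,
                          guint64      modifier,
                          unsigned int width,
                          unsigned int height,
                          unsigned int stride,
                          GError     **error)
{
  GdkDmabufTextureBuilder *builder;
  GdkTexture *texture;
  GtkWidget *picture;

  builder = gdk_dmabuf_texture_builder_new ();
  gdk_dmabuf_texture_builder_set_display (builder, display);
  gdk_dmabuf_texture_builder_set_width (builder, width);
  gdk_dmabuf_texture_builder_set_height (builder, height);
  gdk_dmabuf_texture_builder_set_fourcc (builder, fourcc);
  gdk_dmabuf_texture_builder_set_modifier (builder, modifier);

  /* One plane only; multi-planar formats set fd, stride and offset
   * for each plane index. */
  gdk_dmabuf_texture_builder_set_n_planes (builder, 1);
  gdk_dmabuf_texture_builder_set_fd (builder, 0, dmabuf_fd);
  gdk_dmabuf_texture_builder_set_stride (builder, 0, stride);
  gdk_dmabuf_texture_builder_set_offset (builder, 0, 0);

  /* No destroy notify in this sketch: the caller keeps ownership of the
   * fd and has to keep it open for as long as the texture is in use. */
  texture = gdk_dmabuf_texture_builder_build (builder, NULL, NULL, error);
  g_object_unref (builder);
  if (texture == NULL)
    return NULL;

  picture = gtk_picture_new_for_paintable (GDK_PAINTABLE (texture));
  g_object_unref (texture);

  /* The offload widget is only a hint; GTK still decides per frame. */
  return gtk_graphics_offload_new (picture);
}
```

In a real video player you would not build a widget per frame, but rather have a GdkPaintable (such as a GtkMediaStream) hand out a new texture for each frame and keep the GtkGraphicsOffload wrapper around the widget that displays it.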

When testing this code, we used the GtkMediaStream implementation for PipeWire by Georges Basile Stavracas Neto, which can be found in pipewire-media-stream, and libmks by Christian Hergert and Bilal Elmoussaoui.

What are the limitations?

At the moment, graphics offload will only work with Wayland on Linux. There is some hope that we may be able to implement similar things on macOS, but for now, this is Wayland-only. It also depends on the content being in dmabufs.

Applications that want to take advantage of this need to play along and avoid doing things that interfere with the use of subsurfaces, such as rounding the corners of the video content. The GtkGraphicsOffload docs have more details for developers on constraints and how to debug problems with graphics offload.

Summary

The GTK 4.14 release will have some interesting new capabilities for media playback. You can try it now, with the just-released 4.13.3 snapshot.

Please try it and let us know what does and doesn’t work for you.