Playback Synchronization & Video Walls with GStreamer

Hello again, and I hope you’re having a pleasant end of the year (if you are, you might want to consider avoiding the news until next year).

In a previous post, I wrote about synchronized playback with GStreamer, and work on this has continued apace. Since I last wrote about it, a bunch of work has been completed:

  • Added support for sending a playlist to clients (instead of a single URI),
  • Added the ability to start/stop playback,
  • Cleaned up the API considerably to improve the potential for it to be included upstream,
  • Turned the control protocol implementation into an interface to remove the necessity to use the built-in TCP server (different use-cases might want different transports),
  • Improved overall robustness of code and documentation,
  • Introduced an API for clients to send the server information about themselves, and finally
  • Added an API for the server to send video transformations for each specific client to apply before rendering.

While the other bits are exciting in their own right, in this post I’m going to talk about the last two items.

Video Walls

For those of you who aren’t familiar with the term, a video wall is an array of displays that are aligned to make a larger display; these are often used in public installations. One way to set up a video wall is to have each display connected to a small computer (such as the Raspberry Pi), and have them play a part of a video that’s cropped and scaled for the display it’s connected to.

[Image: a video wall]

The tricky part, of course, is synchronization; this is where gst-sync-server comes in. Since it’s already possible to play a given stream in sync across devices on a network, the only missing piece was the ability to distribute a set of per-client transformations that the clients could apply; support for this is now complete.

In order to keep things clean from an API perspective, I took the following approach:

  • Clients now have the ability to send a client ID and configuration (which is just a dictionary) when they first connect to the server.
  • The server API emits a signal with the client ID and configuration; this makes it possible to know when a client connects, what kind of display it’s running, and where it’s positioned.
  • The server now has additional fields to send a map of client IDs to a set of video transformations.

This allows us to do fancy things, like having each client manage its own information with the server while dynamically adapting the set of transformations based on what’s connected. Of course, the simpler case of having a static configuration on the server also works.
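To make the flow above a bit more concrete, here is a minimal sketch in plain Python of a server that remembers each client's configuration and recomputes the map of client IDs to transformations whenever a client joins or leaves. This is not the gst-sync-server API: the `WallServer` and `Crop` names, the `column` configuration key, and the equal-width split are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Crop:
    """Pixels to remove from each edge of the source video."""
    left: int = 0
    right: int = 0
    top: int = 0
    bottom: int = 0


@dataclass
class WallServer:
    """Hypothetical stand-in for the server side (not the real API)."""
    video_width: int
    video_height: int
    clients: dict = field(default_factory=dict)  # client ID -> config dict

    def on_client_joined(self, client_id, config):
        # Plays the role of the "client connected" signal handler:
        # store what the client told us and recompute the map.
        self.clients[client_id] = config
        return self.transformations()

    def on_client_left(self, client_id):
        self.clients.pop(client_id, None)
        return self.transformations()

    def transformations(self):
        """Map each client ID to the crop it should apply before rendering."""
        # Order clients left to right using the assumed "column" config key.
        ordered = sorted(self.clients, key=lambda cid: self.clients[cid]["column"])
        if not ordered:
            return {}
        slice_width = self.video_width // len(ordered)
        result = {}
        for index, cid in enumerate(ordered):
            left = index * slice_width
            right = self.video_width - (left + slice_width)
            result[cid] = Crop(left=left, right=right)
        return result
```

With a single client connected, the map contains an empty crop for it; when a second client joins, both entries change so that each shows half of the picture. A static configuration is simply the case where this map is written by hand instead of being computed.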


Since seeing is believing, here’s a demo of the synchronised playback in action:

The setup is my laptop, which has an Intel GPU, and my desktop, which has an NVidia GPU. These are connected to two monitors.

The video resolution is 1920×800, and I’ve adjusted the crop parameters to account for the bezels, so the video looks continuous. I’ve uploaded the text configuration if you’re curious about what that looks like.

As I mention in the video, the synchronization is not as tight as I would like it to be. This is most likely because of the differing device configurations. I’ve been working with Nicolas Dufresne to address this shortcoming by using some timing extensions the Wayland protocol allows; more news on this as it breaks. More generally, I’ve also worked to quantify the degree of sync, but I’m going to leave that for another day.

P.S. The reason I used kmssink in the demo was that it was the quickest way I know of to get a full-screen video going. I’m happy to hear about alternatives, though.

Future Work

Make it Real

I implemented my demo quite quickly by having the example server code use a static configuration. What I would like is to have a proper application that people can easily package and deploy on the embedded systems used in real video walls. If you’re interested in taking this up, I’d be happy to help out. Bonus points if we can dynamically calculate transformations based on client configuration (position, display size, bezel size, etc.).
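As a rough idea of what the "bonus points" calculation could look like, the sketch below derives horizontal crop values for a single row of displays from their physical sizes. It assumes the video spans the whole wall including the bezels, so the pixels that fall behind a bezel are simply dropped. The `Display` fields and the `horizontal_crops` function are hypothetical names, not something gst-sync-server provides.

```python
from dataclasses import dataclass


@dataclass
class Display:
    """Physical description a client might report (all names hypothetical)."""
    client_id: str
    column: int        # position in the row, 0 = leftmost
    visible_mm: float  # width of the visible panel area
    bezel_mm: float    # bezel width on each side of the panel


def horizontal_crops(displays, video_width):
    """Return {client_id: (crop_left, crop_right)} in video pixels."""
    ordered = sorted(displays, key=lambda d: d.column)
    # Physical width of the wall, bezels included.
    total_mm = sum(d.visible_mm + 2 * d.bezel_mm for d in ordered)
    px_per_mm = video_width / total_mm

    crops = {}
    cursor_mm = 0.0  # left edge of the current display, bezel included
    for d in ordered:
        start_mm = cursor_mm + d.bezel_mm
        end_mm = start_mm + d.visible_mm
        crop_left = round(start_mm * px_per_mm)
        crop_right = video_width - round(end_mm * px_per_mm)
        crops[d.client_id] = (crop_left, crop_right)
        cursor_mm = end_mm + d.bezel_mm
    return crops
```

For two displays that are each 460 mm wide with 10 mm bezels and a 1920-pixel-wide video, this gives one display the columns 20 to 940 and the other the columns 980 to 1900, so the 40 pixels hidden behind the two inner bezels are skipped. The vertical direction would work the same way for walls with more than one row.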

Hardware Acceleration

One thing that’s bothering me is that the video transformations are applied using GStreamer software elements. This works fine(ish) for the hardware I’m developing on, but in real life OpenGL(ES) or platform-specific elements should be used to have hardware-accelerated transformations. My initial thoughts are for this to be either an API for playbin or a GstBin that takes a set of transformations as parameters and sets up the best method to do this internally based on whatever sink is available downstream (some sinks provide cropping and other transformations).
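For reference, a software crop-and-scale chain can be written as a gst-launch style description. The helper below builds one from a set of crop values. It assumes the stock `videocrop` and `videoscale` elements are the ones in use, which this post does not actually state, and the function name and the dictionary shape are made up for the example.

```python
def software_transform_description(crop, out_width, out_height):
    """Build a gst-launch style description for a software crop-and-scale chain.

    `crop` is a dict with left/right/top/bottom pixel counts (hypothetical shape).
    `out_width` and `out_height` are the size to scale the cropped video to.
    """
    return (
        "videocrop left={left} right={right} top={top} bottom={bottom} "
        "! videoscale "
        "! video/x-raw,width={w},height={h}"
    ).format(w=out_width, h=out_height, **crop)
```

A bin like the one proposed above could keep this chain as a fallback and skip it when the downstream sink is able to crop and scale by itself.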

Why not Audio?

I’ve only written about video transformations here, but we can do the same with audio transformations too. For example, multi-room audio systems allow you to configure the locations of wireless speakers — so you can set which one is left, right, center, etc. — and the speaker will automatically play the appropriate channel. It should be quite easy to implement this with the infrastructure that’s currently in place.
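A sketch of the audio equivalent, in plain Python: each speaker reports a location, and the transformation it applies is "keep only my channel". The channel layout, the location names, and the fallback to a simple average when a speaker has no dedicated channel are all assumptions for illustration; real streams carry their own channel layout.

```python
# Assumed channel order for a stereo stream.
STEREO_LAYOUT = ("left", "right")


def channel_for_speaker(location, layout=STEREO_LAYOUT):
    """Return the channel index a speaker should play, or None if there is none."""
    if location in layout:
        return layout.index(location)
    return None


def samples_for_speaker(interleaved, location, layout=STEREO_LAYOUT):
    """Pick one speaker's samples out of an interleaved sample list."""
    channels = len(layout)
    index = channel_for_speaker(location, layout)
    if index is None:
        # No dedicated channel (e.g. "center" on a stereo stream):
        # fall back to averaging all channels of each frame.
        return [
            sum(interleaved[i:i + channels]) / channels
            for i in range(0, len(interleaved), channels)
        ]
    return interleaved[index::channels]
```

The server side would look the same as for video: a map from client ID to the transformation, here a channel selection, sent alongside the playlist.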

Here’s to the New Year!

I hope you enjoyed reading this post. I’ve had great responses from a lot of people about how they might be able to use this work. If there’s something you’d like to see, leave a comment or file an issue.

Happy end of the year, and all the best for 2017!

Author: Arun Raghavan

Arun is a developer on the GStreamer multimedia framework and the PulseAudio multimedia server. He lives in Bangalore, India, and generally lurks online as `louiswu` or `Ford_Prefect`.