Planning Out the Future of Media on Linux at the Linux Kernel Summit Media Workshop – Seoul, South Korea

The Linux Kernel Summit Media Workshop was held on October 26, 2015 in Seoul, South Korea. Twenty-one attendees from companies around the world gathered to discuss the current state of Linux media and the challenges that need to be overcome to push these technologies into the future. This article covers the major topics that were discussed during the workshop and the decisions that were made about the direction of this community.

 

[Group photo: Kernel Summit 2015 Media Workshop attendees]

Codec API Improvements

Currently, Video4Linux version 2 (V4L2) memory-to-memory devices use the Stream API to handle coders/decoders (codecs), but it might be time for us to develop a Frame API to handle newer devices.

Stream API

The original V4L2 codec API was developed along with the Exynos codec driver. Since the device implements high-level operations in hardware, the resulting API was high-level as well, with drivers accepting unprocessed raw streams. This matches older ARM SoCs, where CPU power wasn’t deemed sufficient to implement stream parsing.

Drivers implement two V4L2 buffer queues, one on the uncompressed side and one on the compressed side. The two queues operate independently, without a 1:1 correspondence between consumed and produced buffers. The mem2mem V4L2 kernel framework can’t be used to implement these codec drivers because it hard-codes a 1:1 correspondence. This is a kernel framework issue, not a V4L2 userspace API issue.
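
To make the two-queue model concrete, here is a minimal user-space sketch, assuming a multi-planar mem2mem decoder node and trimming error handling; note that nothing in it ties a dequeued capture buffer to any particular queued output buffer:

    #include <sys/ioctl.h>
    #include <linux/videodev2.h>

    /* Allocate both queues of a stream-API decoder independently: the
     * compressed bitstream is fed to the OUTPUT queue and decoded frames
     * are collected from the CAPTURE queue, with no 1:1 correspondence
     * between the two. */
    static int setup_codec_queues(int fd)
    {
        struct v4l2_requestbuffers out = {
            .count  = 4,                /* compressed bitstream chunks */
            .type   = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE,
            .memory = V4L2_MEMORY_MMAP,
        };
        struct v4l2_requestbuffers cap = {
            .count  = 8,                /* decoded frames; count may differ */
            .type   = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE,
            .memory = V4L2_MEMORY_MMAP,
        };

        if (ioctl(fd, VIDIOC_REQBUFS, &out) < 0)
            return -1;
        return ioctl(fd, VIDIOC_REQBUFS, &cap);
    }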

Frame API (Slice API)

CPUs are getting faster in the ARM world, and the trend is to implement lower-level hardware codecs that require stream parsing on the CPU. CPU code needs to slice the stream, extract information, process it, and pass the result to a shader-like device. This is the model used on Intel platforms and implemented in the VA API library.

Drivers still implement two V4L2 buffer queues, but the encoded stream is split into frames of slices, and a large number of codec-specific controls need to be set from parsed stream information.

Stream parsing and parameter calculation are better handled in user space. User space is responsible for managing reference frames and for passing data to the codec in such a way that an input buffer will always produce an output buffer. The two queues operate together, with a 1:1 correspondence between buffers. The memory-to-memory (mem2mem) framework is usable in this case.

The source buffers contain only slice data (macroblocks plus coefficient data). The controls contain information extracted from stream parsing, a list of reference frames, and a DPB (Decoded Picture Buffer). The request API can be used to associate controls with source buffers.
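
A rough sketch of what a single frame-API decode step could look like; the control ID and the parameter struct below are purely hypothetical placeholders (the real codec-specific control set was still being designed), and the request API association is left out:

    /* Hypothetical frame-API decode step: user space parses the stream,
     * attaches the parsed parameters as a control, then queues only the
     * slice data. Neither the CID nor the struct below is real. */
    static int decode_one_slice(int fd,
                                struct hypothetical_slice_params *params,
                                struct v4l2_buffer *slice_buf)
    {
        struct v4l2_ext_control ctrl = {
            .id   = V4L2_CID_HYPOTHETICAL_SLICE_PARAMS, /* not a real CID */
            .size = sizeof(*params),
            .ptr  = params,
        };
        struct v4l2_ext_controls ctrls = {
            .count    = 1,
            .controls = &ctrl,
        };

        if (ioctl(fd, VIDIOC_S_EXT_CTRLS, &ctrls) < 0)
            return -1;
        /* slice_buf contains only macroblock + coefficient data. */
        return ioctl(fd, VIDIOC_QBUF, slice_buf);
    }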

Keeping references to reference frames is one of the remaining problems (especially with DMABUF, and possibly with MMAP in the future, once we have the ability to destroy MMAP buffers while streaming). This problem also exists for the Stream API, and more discussion is needed to design a solution.

For encoders, header generation should probably be done in the user space as well, because the code is complex and doesn’t require a kernel implementation.

The user space code should be implemented as libv4l2 plugins to interface between the frame API exposed by the kernel and a stream API exposed to applications.

References:

Intel libVA SDK: https://bugs.freedesktop.org/show_bug.cgi?id=92533
Request API: https://lwn.net/Articles/641204/
Chromium Code (user): https://code.google.com/p/chromium/codesearch#chromium/src/content/common/gpu/media/v4l2_slice_video_decode_accelerator.cc

kABI and API Documentation status

There are two types of documentation for the media subsystem inside the Kernel tree:

  • the Linux Media Infrastructure API, which describes the userspace/kernelspace Application Program Interface;
  • the media device drivers kABI, which describes the Kernel Application Binary Interface that should be used by the media Kernel drivers to integrate with the media core drivers.

In the case of the Linux Media Infrastructure API, the current status is:

  • The DVB API documentation for DVR/demux and CA is outdated and needs some work.
  • The level of documentation for V4L2 is superior to that provided for the other media APIs.

For the kABI, the current status is that the DVB Demux and CA documentation is OK, but the remaining media documentation is incomplete: it lacks documentation for several data structs and even some important functions used by the DVB, Video4Linux, Media Controller and Remote Controller drivers.

In the Video4Linux part of the documentation, non-core V4L2 headers are placed in the same directory as the core headers, making it harder to identify the gaps. It was agreed to move some documentation from Documentation/{video4linux,dvb} to the kAPI DocBook. During our discussion, it was pointed out that the kABI documentation needs more review by developers and sub-maintainers.

We decided that all new kernel kABI additions need to be documented, but it was agreed that adding new fields to undocumented structures doesn’t require the submitter to document the full struct (although that obviously would be very much appreciated). One suggestion was to use the kernel-doc script also for the ‘top-level’ documentation, as found in drivers/gpu/drm/drm_atomic_helper.c (look for “DOC: overview”) and Documentation/DocBook/drm.tmpl (look for “!Pdrivers/gpu/drm/drm_atomic_helper.c overview”), for the media documentation.
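
For reference, such a ‘top-level’ kernel-doc section looks like this in the source file:

    /**
     * DOC: overview
     *
     * Free-form prose describing how the functions in this file fit
     * together. kernel-doc extracts this section by its "overview"
     * label rather than by a function name.
     */

The DocBook template then pulls it in with a directive like !Pdrivers/gpu/drm/drm_atomic_helper.c overview, so the prose lives next to the code it describes.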

For more information about this, check out the slides that were presented on this subject.

uAPI Improvements for DVB Streaming

There is some demand for improving the DVB core to better support the needs of embedded TV hardware, both in videobuf2 and in the way SoC pipelines are represented.

videobuf2

Junghak Sung (Samsung) covered the current status of the videobuf2 Digital Video Broadcasting (DVB) patches:

  • Added DVB memory map (mmap) support using a set of ioctls similar to the ones on the V4L2 side (see the sketch after this list).
  • No direct access to DMA buffers (DMABUF) is supported at the moment; it was recommended to add support for this feature.
  • Support for DVB output (‘mem2mem demux’) is also recommended, although this is not required for the first version of the patches.
  • Streaming on/off is called automatically by the DVB API: should this be available as I/O control system calls (ioctls)? At first glance, no need was seen for this, so the proposal doesn’t add support for it.
  • The DVB framework needs to be extended to detect whether user space uses read() or streaming I/O system calls. As Video4Linux does this as well, it was proposed to use the same method for DVB too.
  • No need is foreseen to add (or remove) buffers at runtime (as CREATE_BUFS does in the V4L2 API).
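
For context, the V4L2 mmap pattern that those proposed DVB ioctls mirror looks roughly like this (V4L2 names shown; the DVB equivalents in the patch set were still under review):

    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/videodev2.h>

    /* The classic V4L2 mmap setup the DVB patches mimic: request
     * buffers, query each one's offset, then mmap() it into the
     * application's address space. */
    static int map_capture_buffers(int fd)
    {
        struct v4l2_requestbuffers req = {
            .count  = 4,
            .type   = V4L2_BUF_TYPE_VIDEO_CAPTURE,
            .memory = V4L2_MEMORY_MMAP,
        };

        if (ioctl(fd, VIDIOC_REQBUFS, &req) < 0)
            return -1;

        for (unsigned int i = 0; i < req.count; i++) {
            struct v4l2_buffer buf = {
                .index  = i,
                .type   = V4L2_BUF_TYPE_VIDEO_CAPTURE,
                .memory = V4L2_MEMORY_MMAP,
            };

            if (ioctl(fd, VIDIOC_QUERYBUF, &buf) < 0)
                return -1;
            void *mem = mmap(NULL, buf.length, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, buf.m.offset);
            if (mem == MAP_FAILED)
                return -1;
            /* ... enqueue with VIDIOC_QBUF, dequeue with VIDIOC_DQBUF ... */
        }
        return 0;
    }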

SoC pipelines

Samsung SoCs have full hardware pipelines from the tuner output to the screen that, once set, won’t require any intervention at runtime. This requires connecting a DVB device to a V4L2 and/or DRM device.

Hardware pipelines include hardware-configured scalers for seamless resolution changes, needed to achieve real-time operation when the input resolution changes: the resolution must change without causing any visible image glitches. It’s hard to make this work with the current DVB framework without the Media Controller.

v4l2_buffer & Y2038

Should we create a new structure and fix several other issues in one go? This could be an opportunity to start supporting sub-streams.

Arnd Bergmann (Linaro) pointed out that even if we do this, backwards-compatible code will still be needed to handle the difference in the size of the structure between 32-bit and 64-bit calls. That sounds reasonable; and since it’s time to create a v4l2_buffer_v2, we’ll need to increase the size of the struct anyway.
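
To make the Y2038 issue concrete: v4l2_buffer embeds a struct timeval, whose 32-bit time_t overflows in the year 2038 on 32-bit systems. A replacement would switch to a fixed-width 64-bit timestamp, as in the purely illustrative sketch below (the actual layout of a v4l2_buffer_v2 was not decided at the workshop):

    #include <linux/types.h>

    /* Illustrative only: the real v4l2_buffer_v2 layout was not settled.
     * Fixed-width fields keep the 32-bit and 64-bit layouts identical,
     * so the new struct itself needs no compat shim. */
    struct v4l2_buffer_v2 {
        __u32 index;
        __u32 type;
        __u32 bytesused;
        __u32 flags;
        __u64 timestamp_ns;   /* 64-bit nanoseconds: safe past 2038 */
        /* ... remaining fields from v4l2_buffer, sized explicitly ... */
    };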

There might be a cache-related problem due to the way DMABUF is handled. DMABUF buffers are only unmapped when applications queue a different DMABUF fd on the same V4L2 buffer index. By the time the unmap is carried out, the driver doesn’t have access to the buffer, but it still tries to clean the cache. The root cause of this needs to be investigated.

A related issue is how V4L2 keeps DMABUF mappings around for performance improvement. This prevents buffer eviction by the GPU, and it should be discussed with DRM developers and dma-buf maintainers.

We also discussed whether we should add support for sub-streams in a buffer. This is a potential option for 3D capture, but there’s not much interest for other cases, and concrete use cases would be needed before we could take action.

Formats are specified per buffer, which makes it difficult to transport meta-data in a plane along with the image planes. A redesign of v4l2_buffer and v4l2_plane should take such use cases into account.

poll() system call behavior for Output Streams

The current poll() behavior is not usable for codecs. Since mem2mem is considered to be the main use case for output devices, we need to optimize for it and implement write-emulation-specific code in poll() as a special case. We’ll revert to the old implementation and update the documentation accordingly.

poll() system call behavior for Capture Streams

The logic to wait for buffers in the videobuf2 queue (q->waiting_for_buffers) is V4L2-specific, but after the vb2 split it is now part of the vb2 core. We discussed whether this should be moved to V4L2 or kept in the vb2 core.
After our discussion, we came to the conclusion that we should make this V4L2-specific and allow DVB (and other subsystems) to use the standard behavior.
Proposals are welcome on ways to let user space select the ‘waiting_for_buffers’ behavior for V4L2, since it’s awkward as well (but needed for backwards compatibility).

Hardware Color Space Conversion (CSC) Status Update

For video output, hardware CSC already works: the V4L2 structs that contain the pixel format for uniplane and multiplane streams (v4l2_pix_format and v4l2_pix_format_mplane) contain all the information about the memory format, including color space information, and the driver can enable CSC hardware as needed. However, for video capture, the color space information (the colorspace, ycbcr_enc, quantization and xfer_func fields) is documented as being returned by the driver, meaning the application cannot set it to a desired color space.
The suggested solution is to add a new flag (V4L2_PIX_FMT_FLAG_REQUEST_CSC) to the v4l2_pix_format flags field. If the flag is set, the driver will use the application-provided color space information; otherwise, the color space fields are zeroed. The core will always clear the flag, and drivers have to set it in the ioctls that use it (VIDIOC_G_FMT, VIDIOC_TRY_FMT, VIDIOC_S_FMT) to signal that they support hardware color space conversion. For drivers, if all color space fields are 0, the color space of the capture stream will be returned; otherwise a CSC will be attempted.
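
Under that proposal, requesting a conversion on a capture stream would look roughly like the sketch below; V4L2_PIX_FMT_FLAG_REQUEST_CSC is the suggested flag name, not something that existed in the kernel at the time:

    /* Ask the driver to convert the capture stream to BT.709 instead
     * of just reporting the source color space. */
    static int request_csc(int fd)
    {
        struct v4l2_format fmt = {
            .type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
        };

        if (ioctl(fd, VIDIOC_G_FMT, &fmt) < 0)
            return -1;

        fmt.fmt.pix.colorspace = V4L2_COLORSPACE_REC709;
        fmt.fmt.pix.flags |= V4L2_PIX_FMT_FLAG_REQUEST_CSC; /* proposed flag */

        if (ioctl(fd, VIDIOC_S_FMT, &fmt) < 0)
            return -1;

        /* Drivers that support hardware CSC set the flag back in the
         * returned format; if it comes back cleared, no conversion
         * will be done. */
        return (fmt.fmt.pix.flags & V4L2_PIX_FMT_FLAG_REQUEST_CSC) ? 0 : -1;
    }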

HDMI-CEC Status Update

The last patch series for HDMI-CEC was v9. The primary remaining to-do items are to rewrite the documentation for the CEC framework internals and to address some cleanup requests from Russell King.
Hans Verkuil (Cisco) voiced displeasure with the cec_ready() v4l2_subdev ops callback, but considering this is an internal API, it can be changed later. This call is used when the CEC adapter is created.

VIDIOC_CREATE_BUFS ioctl behavior with regard to V4L2 Format Validation

We also discussed how to validate the V4L2 stream format, given by struct v4l2_format. It is documented as “Filled in by the application, preserved by the driver” and “Unsupported formats will result in an error.” However, most drivers only use the sizeimage field, and some do partial validation (usually the pixel format and resolution). Validation is done in the vb2 queue_setup() callback, which leads to a void pointer argument in the vb2 core. The original idea was for this format to be different from the current one, with the created buffers valid for the requested format. In practice it’s the different buffer size that is of interest. This results in inconsistent behavior, and the question is how to fix it.

It was proposed to handle it the same way as REQBUFS: expect user space to call TRY_FMT or similar to fill the format field, and only use the sizeimage field in drivers.
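
The proposed flow would then look something like this sketch (error handling trimmed):

    /* Proposed flow: let TRY_FMT compute a valid format (and thus a
     * valid sizeimage), then hand that format to CREATE_BUFS, where
     * the driver is only expected to honor sizeimage. */
    static int create_extra_buffers(int fd)
    {
        struct v4l2_format fmt = {
            .type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
        };
        fmt.fmt.pix.width       = 1920;
        fmt.fmt.pix.height      = 1080;
        fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_NV12;

        if (ioctl(fd, VIDIOC_TRY_FMT, &fmt) < 0)
            return -1;

        struct v4l2_create_buffers create = {
            .count  = 4,
            .memory = V4L2_MEMORY_MMAP,
            .format = fmt,
        };
        if (ioctl(fd, VIDIOC_CREATE_BUFS, &create) < 0)
            return -1;
        return create.index;   /* index of the first new buffer */
    }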

Media Controller Next Generation status update

We briefly discussed the MC next-gen status, focusing on how this will work with the Advanced Linux Sound Architecture (ALSA). Mark Brown (Linaro) and Vinod Koul (Intel) pointed out that the current ALSA patches barely touch what’s needed; an ASoC pipeline would be way more complex than what was done so far in the ALSA patchset. Vinod and Mark are looking for a way to set a topology using ‘cat file >/dev/mediaX‘. This is currently provided via request_firmware logic in ASoC. We need to find a way to keep core MC support consistent with the current topology setup in ASoC. Both ALSA and media developers need to be involved in such development.

Atomic configuration updates across subsystems (MC for links, V4L2/ALSA for other parameters, …) also need to be taken into account. Ordering matters for audio device routing changes unless the hardware can apply routing changes atomically.

VIDIOC_SUBDEV_QUERYCAP ioctl addition

QUERYCAP is desired for v4l-subdev devices because there are currently no ioctls that are always available for them. Having such an ioctl will make v4l2-compliance easier to write. It would also be nice to be able to find the media controller from a device node.

Two solutions were proposed:

  1. Create a struct v4l2_subdev_capability and add a new V4L2-specific ioctl.
  2. Add support for a MEDIA_IOC_DEVICE_INFO ioctl to all MC-aware interfaces.

It was suggested to check whether MEDIA_IOC_INTERFACE_INFO would be easy to add to the various subsystems (dvb, alsa, drm, iio).
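
Option 1 would amount to something like the sketch below; the field names and the ioctl number are illustrative only, as no layout was agreed on at the workshop:

    #include <linux/ioctl.h>
    #include <linux/types.h>

    /* Illustrative only: a minimal capability struct for option 1. */
    struct v4l2_subdev_capability {
        __u32 version;        /* driver version, as in v4l2_capability */
        __u32 capabilities;   /* capability flags, none defined yet */
        __u32 reserved[14];   /* room to grow without breaking the ABI */
    };

    /* Hypothetical ioctl number, for illustration only. */
    #define VIDIOC_SUBDEV_QUERYCAP \
            _IOR('V', 0, struct v4l2_subdev_capability)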

DELETE_BUFFERS ioctl addition

Laurent Pinchart (Ideas on Board) suggested we create a new ioctl able to dynamically delete some of the buffers created via CREATE_BUFS. Related to this issue is allowing more than 32 buffers (this has been requested in the past). One of the issues is whether the buffer numbers will have “holes” or whether the Kernel will dynamically re-allocate the bufs array in vb2_queue. The problem with having holes is the need for an efficient index-to-vb2_buffer lookup (see the sketch below). We agreed that we’d like to have this in addition to the ability to have more than 32 buffers.
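
One way to get an efficient lookup despite holes would be an IDR in the vb2 core, as in this sketch of one possible (not agreed-upon) design:

    #include <linux/gfp.h>
    #include <linux/idr.h>
    #include <media/videobuf2-core.h>

    /* Sketch: replace the fixed bufs[] array in vb2_queue with an IDR,
     * so indices may have holes after a DELETE_BUFFERS call while the
     * index-to-buffer lookup stays efficient. Not an agreed design. */
    static DEFINE_IDR(buf_idr);

    static int add_buffer(struct vb2_buffer *vb)
    {
        /* Returns the lowest free index; no fixed 32-buffer cap. */
        return idr_alloc(&buf_idr, vb, 0, 0, GFP_KERNEL);
    }

    static struct vb2_buffer *lookup_buffer(unsigned int index)
    {
        return idr_find(&buf_idr, index);
    }

    static void delete_buffer(unsigned int index)
    {
        idr_remove(&buf_idr, index);
    }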

Workshop Format

The Linux Kernel Summit Media Workshop is very technical in its current form, and it’s hard for non-core developers to follow. We discussed some alternatives to make it more accessible, and we’ve considered splitting it into two parts:

  • The first part would be less technical, focusing on status updates and presenting work in progress, and would invite presentations from a broader audience, such as application developers.
  • The second part would be technical content for core developers.

We considered moving to a two-day format, or holding two one-day workshops a year, one technical and one open for all. We are going to look at how other subsystems handle this in order to choose the best course of action. It was proposed to use the two-day approach for media workshops that have at least two days available. It was also suggested that one of next year’s media workshops be co-located with the ALSA workshop, in order to address the Media Controller related questions.

Looking Ahead

As always, this workshop was highly productive and allowed us to work out many issues with Linux media. South Korea was a great location to host this event, and we look forward to seeing some familiar faces at Linux Kernel media workshops in 2016!

Author: Mauro Carvalho Chehab

Mauro is the maintainer of the Linux kernel media and EDAC subsystems and of Tizen on Yocto. He's also a major contributor to the Reliability, Availability and Serviceability (RAS) subsystems.