Linux Kernel License Practices Revisited with SPDX®

The licensing text in the Linux kernel source files is inconsistent in verbiage and format. Typically, in each of its ~100k files there is a license text that describes which license applies to each specific file. While all licenses are GPLv2 compatible, properly identifying the licenses that are applicable to a specific file is very hard. To address this problem, a group of developers recently embarked on a mission to use SPDX® to research and map these inconsistencies in the licensing text. As a result of this 10 month long effort, the Linux 4.14 release includes changes to make the licensing text consistent across the kernel source files and modules.

Linux Kernel License

As stated on its COPYING file, the Linux kernel’s default license is GPLv2, with an exception that grants additional rights to the kernel users:

  • NOTE! This copyright does *not* cover user programs that use kernel
    services by normal system calls - this is merely considered normal use
    of the kernel, and does *not* fall under the heading of "derived work".
    Also note that the GPL below is copyrighted by the Free Software
    Foundation, but the instance of code that it refers to (the Linux
    kernel) is copyrighted by me and others who actually wrote it.
    
    Also note that the only valid version of the GPL as far as the kernel
    is concerned is _this_ particular version of the license (ie v2, not
    v2.2 or v3.x or whatever), unless explicitly otherwise stated.
    

The kernel’s COPYING file produces two practical effects:

  1. User-space applications can use non-GPL licenses by using the the above mentioned exception.
  2. It allows using different licenses in the kernel’s source files, when explicitly defined as such.

The Current Kernel License Model

A common practice is to add a comment at the beginning of each file with some sort of text like:

  • /*
     * Copyright (c) 2017 by foo <foo@bar>
     *
     * This program is free software; you can redistribute it and/or
     * modify it under the terms of the GNU General Public License
     * as published by the Free Software Foundation; either version 2
     * of the License, or (at your option) any later version.
     *
     * This program is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
     * GNU General Public License for more details.
     */
    

However, Philippe Ombredanne’s research showed that:

  • there are 64,742 distinct license statements
    • … in 114,597 blocks of text
    • … in 42,602 files
  • license statements represent 480,455 lines of text;
  • licenses are worded in 1,015 different ways;
  • there are about 85 distinct licenses, the bulk being the GPL.

Also, before kernel 4.14, there were about 11,000 files without any license at all. However, due to the COPYING file they’re defaulted to be GPLv2. This inconsistency makes it complex to determine which license applies to a particular kernel version.

Software Package Data Exchange® (SPDX®)

The Linux Foundation has sponsored the SPDX® project to solve the license identification challenges in open source software. The goal of the project is to provide license identifiers inside the source code that could be easily parsed by machines and would allow checking for license compliance of an open source project easier.

In practice, supporting SPDX® inside source code is as simple as adding an SPDX® tag (SPDX-License-Identifier) with the license that applies (usually, GPL-2.0). If you’re the copyright holder, you may also consider removing the now redundant licensing text.

An example commits that add SPDX® tags and cleanup redundant license from USB over IP driver:

commit 5fd54ace4721fc5ce2bb5aef6318fcf17f421460
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Fri Nov 3 11:28:30 2017 +0100

    USB: add SPDX identifiers to all remaining files in drivers/usb/
    
    It's good to have SPDX identifiers in all files to make it easier to
    audit the kernel tree for correct licenses.
    
    Update the drivers/usb/ and include/linux/usb* files with the correct
    SPDX license identifier based on the license text in the file itself.
    The SPDX identifier is a legally binding shorthand, which can be used
    instead of the full boiler plate text.
    
    This work is based on a script and data from Thomas Gleixner, Philippe
    Ombredanne, and Kate Stewart.
    
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Kate Stewart <kstewart@linuxfoundation.org>
    Cc: Philippe Ombredanne <pombredanne@nexb.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>
    Acked-by: Johan Hovold <johan@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c
index 11b9a22799cc..0e16a7dcfb38 100644
--- a/drivers/usb/usbip/vhci_hcd.c
+++ b/drivers/usb/usbip/vhci_hcd.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0+

...

Depending on the type of the source file, the tag will be either one of the tags below:

SPDX® licence identifier tags at the source code
Type of file SPDX License tag
C source: // SPDX-License-Identifier:
C header: /* SPDX-License-Identifier: */
ASM: /* SPDX-License-Identifier: */
scripts: # SPDX-License-Identifier:
.rst: .. SPDX-License-Identifier:
.dts{i}: // SPDX-License-Identifier:

Replacing the licenses inside each source file by a single SPDX license will likely reduce the kernel source files by ~400k lines, with is a nice cleanup, and solve all those issues with ~64k different license statements.

Future Work

Besides the license on each kernel source file, all Linux modules use MODULE_LICENSE() macro to specify their individual license. Right now, the following values for the macro are valid for a module within the official kernel sources:

Types of licenses for MODULE_LICENSE() macro
Macro argument License
“GPL” GNU Public License v2 or later
“GPL v2” GNU Public License v2 only
“GPL and additional rights” GNU Public License v2 rights and more
“Dual BSD/GPL” GNU Public License v2 or BSD license choice
“Dual MIT/GPL” GNU Public License v2 or MIT license choice
“Dual MPL/GPL” GNU Public License v2 or MPL license choice

Any other value “taints” the kernel when such module is loaded, in order to inform the user that a proprietary module was loaded.

With the addition of SPDX® at the kernel, the next step will be to use SPDX tags for Module License. The current plans seem to be to replace the MODULE_LICENSE() macro with MODULE_LICENSE_SPDX() that would take the same SPDX identifier as used in the source code and convert to the values above to keep backward compatibility. The method to achieve this is still under discussion at the Linux Kernel Mailing List.

Additional References:

LWN.net has an interesting article covering other aspects: SPDX identifiers in the kernel. The Free Software Foundation Europe has a set of best practices for License identification at the source code, called REUSE.

The SPDX Trademark is owned by the Linux Foundation.

Author: Mauro Carvalho Chehab

Mauro is the maintainer of the Linux kernel media and EDAC subsystems and Tizen on Yocto. He's also a major contributor to the Reliability Availability and Serviceability (RAS) subsystems.