The following points are covered in this blog:
- Importance of Kernel Testing
- Common Problems with Kernel & Updates
- Common Kernel Testing Methods
- Kernel Regressions Test Scenarios
- Kernel Concurrency & Data Race Bugs
- Proposed Kernel Test Methodology
- Kernel Stability, Stress, Regression Test Tools
Importance of Kernel Testing
Kernel unit testing, continuous integration testing, regression testing, and stress testing are important for finding and fixing bugs before a release, and for debugging customer-reported bugs after it.
Debugging and fixing a problem found by a customer costs more time and effort than catching it before release.
Booting and testing mainline and linux-next helps find and fix problems before the kernel is released. Testing the kernel and learning how to debug problems both bring immediate value to the kernel and to system uptime.
Common Problems with Kernel & Updates
A kernel worked for the user, but a more recent kernel no longer does.
Kernel bugs make it harder for users to upgrade the kernel and leave many people running old kernels with known security holes.
Typical real-world problems include: the system doesn’t boot, the system crashes under a specific workload, or mp3 playback stutters while files are copied in the background.
Automated tests can find some of these problems, but many that affect only specific hardware or the interactive feel of the computer cannot be found automatically.
Common Kernel Testing Methods
General kernel stress tests: these primarily stress the VFS subsystem and generate VM pressure to cause swapping. They help uncover locking issues on an SMP system and are targeted at finding incorrect or inadequate error handling.
Syscall fuzzing with random seeds helps simulate a lack of synchronization with timer interrupts, disk block layout, and disk access latencies (stressing head/platter speed changes and disk firmware operations).
Vary the number of CPUs and the amount of RAM incrementally between runs, and validate the results with fsck and dumpfs-like tools.
Tests based on mmap, fork bombs, pipes, pthreads, locks, memcache, mirrors, and snapshots; ZFS tests that use a file as a vdev and snapshot clones; parallel mount/umount; and zpool hang scenarios.
Kernel Regressions Test Scenarios
Install and boot the new kernel or patch and perform the following ‘basic testing’: test networking and ssh, start a web browser, rsync a large file, git clone and pull, download files with wget and ftp, play audio and video files, connect new external devices, etc.
Check for kernel regressions and errors by filtering the kernel log by severity: dmesg -t -l emerg,alert,crit,err,warn
Run ktest and kselftest for general functional and sanity testing of the kernel. ktest is an automated test suite that can test kernel builds, installs, and boots.
LAVA-Test is an automated testing framework that helps with the automated installation and execution of kernel tests (running LTP in the LAVA framework can be accomplished with a few commands).
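The dmesg check described above lends itself to a small script. The sketch below is one possible approach, not part of ktest or kselftest: it builds the dmesg command line and compares log lines saved from the old kernel against those from the new one. The function names and the sample log lines are made up for illustration.

```python
import subprocess

# Severity levels to report, most severe first (names accepted by dmesg -l).
LEVELS = ["emerg", "alert", "crit", "err", "warn"]


def dmesg_command(levels=LEVELS):
    """Build the dmesg invocation: -t drops timestamps, -l filters by level."""
    return ["dmesg", "-t", "-l", ",".join(levels)]


def read_kernel_log(levels=LEVELS):
    """Run dmesg and return its output lines (may need root on some systems)."""
    result = subprocess.run(dmesg_command(levels), capture_output=True,
                            text=True, check=True)
    return result.stdout.splitlines()


def new_messages(baseline, current):
    """Return lines in the current log that never appeared in the baseline."""
    seen = set(baseline)
    return [line for line in current if line not in seen]


# Hypothetical sample data standing in for logs saved from two kernels.
old_log = ["ACPI: button: Power Button",
           "usb 1-1: device descriptor read error"]
new_log = ["ACPI: button: Power Button",
           "nvme nvme0: I/O timeout, reset controller"]
regressions = new_messages(old_log, new_log)
print(regressions)
```

On a real system, the two lists would come from read_kernel_log() output saved under the old and the new kernel. Dropping timestamps with -t is what makes the line-by-line comparison meaningful.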
Kernel Concurrency & Data Race Bugs
A race condition is caused by unexpected dependencies on the relative timing of events. For example, a programmer incorrectly assumes that a particular event will always happen before another.
Common causes of race conditions include signals, access checks, and file opens. Operating-system kernels often use spinlocks: a spinlock causes a thread trying to acquire it to simply wait in a loop (“spin”) while repeatedly checking whether the lock is available. Because the thread remains active without performing useful work, CPU time is wasted, and long waits can end in timeouts.
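To make the spinning behaviour concrete, here is a toy user-space sketch in Python. It is only an illustration of the busy-wait loop: real kernel spinlocks are written in C on top of atomic instructions, and the SpinLock class and run_counter_test helper below are hypothetical names.

```python
import threading


class SpinLock:
    """Toy spinlock: acquire() polls in a loop instead of sleeping."""

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self):
        # Non-blocking attempt; on failure, try again immediately ("spin").
        while not self._lock.acquire(blocking=False):
            pass  # the thread stays runnable but does no useful work

    def release(self):
        self._lock.release()


def run_counter_test(n_threads=3, increments=500):
    """Have several threads bump a shared counter under the spinlock."""
    lock = SpinLock()
    state = {"count": 0}

    def worker():
        for _ in range(increments):
            lock.acquire()
            try:
                state["count"] += 1  # critical section
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["count"]


total = run_counter_test()
print(total)
```

The counter ends at n_threads * increments because the lock serialises the updates. The cost is the CPU time burned inside the while loop whenever the lock is contended.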
Kernel concurrency bugs are very difficult to find because they are triggered only under certain instruction interleavings.
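A small simulation shows why only some interleavings expose a bug. The sketch below assumes two threads, A and B, that each perform an unsynchronised read followed by a write of read value + 1 on a shared counter, and it enumerates every possible schedule. It is a toy model for illustration, not how any kernel tool works internally.

```python
from itertools import combinations


def interleavings(n_a, n_b):
    """Yield every schedule of n_a steps of thread A and n_b steps of thread B."""
    total = n_a + n_b
    for a_slots in combinations(range(total), n_a):
        slots = set(a_slots)
        yield ["A" if i in slots else "B" for i in range(total)]


def run(schedule):
    """Execute one schedule; each thread does a read step, then a write step."""
    shared = 0
    local = {"A": None, "B": None}
    step = {"A": 0, "B": 0}
    for thread in schedule:
        if step[thread] == 0:
            local[thread] = shared        # read the shared counter
        else:
            shared = local[thread] + 1    # write back the stale value + 1
        step[thread] += 1
    return shared


# Group schedules by the final counter value they produce.
outcomes = {}
for schedule in interleavings(2, 2):
    outcomes.setdefault(run(schedule), []).append("".join(schedule))

for value, schedules in sorted(outcomes.items()):
    print(value, schedules)
```

Only the two schedules in which one thread finishes before the other starts give the correct result of 2. The other four lose an update, and a stress test finds the bug only if it happens to hit one of them.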
Kernel developers find concurrency bugs mostly by manual code inspection and stress testing (applying intense workloads to increase the chances of triggering concurrency bugs).
Code inspection is labor-intensive and requires significant skill and experience. Stress testing, despite its low overhead and amenability to automation, offers no guarantees and can easily fail to uncover hard-to-find concurrency bugs. Kernel validation is related to quality factors such as availability, reliability, response time, and throughput.
Proposed Kernel Test Methodology
Stress individual system calls, stress the system while resources are low, and introduce resource-hogging test programs. Supply random inputs (number of invocations, startup delays, test program mix).
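Random inputs are most useful when a failing run can be replayed. The sketch below shows one way to derive the number of invocations, the startup delays, and the test program mix from a single seed. The program names, parameter ranges, and the make_test_plan function are hypothetical placeholders.

```python
import random

# Hypothetical test program names; substitute real stress programs.
TEST_PROGRAMS = ["mmap_stress", "pipe_stress", "fork_stress", "lock_stress"]


def make_test_plan(seed, n_entries=5, max_invocations=100, max_delay_s=10):
    """Generate a reproducible random plan: the same seed gives the same plan."""
    rng = random.Random(seed)
    plan = []
    for _ in range(n_entries):
        plan.append({
            "program": rng.choice(TEST_PROGRAMS),
            "invocations": rng.randint(1, max_invocations),
            "startup_delay_s": rng.randint(0, max_delay_s),
        })
    return plan


plan = make_test_plan(seed=42)
for entry in plan:
    print(entry)
```

Logging the seed with each run is enough to regenerate the exact same plan when a crash needs to be reproduced.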
Stress test around: kernel panics, livelocks, deadlocks, memory leaks, UFS2 snapshots, mangled ELF headers (analysing, transforming, and manipulating binary data based on ELF symbol tables), mangled file systems, etc.
Fuzz testing is a black-box testing technique for finding implementation bugs by injecting malformed or semi-malformed data in an automated fashion (memory leaks, race conditions, control flow integrity violations, buffer overflows, etc.).
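The idea can be shown with a very small mutation fuzzer. The sketch below assumes a toy record parser with a deliberately planted bug. The parser, its record format, and the function names are invented for the example and do not correspond to any kernel interface.

```python
import random


def parse_record(data):
    """Toy parser for [length][payload...][checksum]; contains a planted bug."""
    length = data[0]
    payload = data[1:1 + length]
    checksum = data[1 + length]  # BUG: length is never checked against len(data)
    if sum(payload) % 256 != checksum:
        raise ValueError("bad checksum")  # graceful rejection of bad input
    return payload


def mutate(data, rng):
    """Overwrite one randomly chosen byte with a random value."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seed_input, iterations=200, seed=1):
    """Feed mutated inputs to the parser and collect unexpected exceptions."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed_input, rng)
        try:
            parse_record(candidate)
        except ValueError:
            pass  # expected: the parser rejected the input cleanly
        except Exception as exc:  # anything else counts as a finding
            crashes.append((candidate, type(exc).__name__))
    return crashes


# A well-formed record: length 3, payload "abc", checksum of the payload.
valid = bytes([3]) + b"abc" + bytes([sum(b"abc") % 256])
crashes = fuzz(valid)
print(len(crashes), "crashing inputs found")
```

The fuzzer knows nothing about the record format. It only distinguishes a clean rejection from an unexpected failure, which is the black-box property described above.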
Kernel Locking Tests : Test Cases for kernel’s natural use of spinlocks, rwlocks, mutexes and rwsems.
Validation of Kernel locking should consider all possible “deadlock scenarios” such as: assuming arbitrary number of CPUs, arbitrary irq context and task context constellations, running arbitrary combinations of all the existing locking scenarios, multi-CPU, multi-context races etc.
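One classic deadlock scenario is inconsistent lock ordering: one context takes lock A then lock B while another takes B then A. A simplified way to reason about it is to record the order in which locks are nested and look for a cycle in the resulting graph. The sketch below illustrates that idea only. It is not the kernel's implementation, and the lock names and helper functions are hypothetical.

```python
from collections import defaultdict


def build_order_graph(acquisition_sequences):
    """Add an edge A -> B whenever B is taken while A is already held."""
    graph = defaultdict(set)
    for sequence in acquisition_sequences:
        for i, held in enumerate(sequence):
            for later in sequence[i + 1:]:
                graph[held].add(later)
    return graph


def find_cycle(graph):
    """Depth-first search; return one lock-order cycle as a list, or None."""
    visiting, done = set(), set()

    def visit(node, path):
        visiting.add(node)
        path.append(node)
        for nxt in sorted(graph.get(node, ())):
            if nxt in visiting:
                return path[path.index(nxt):] + [nxt]
            if nxt not in done:
                cycle = visit(nxt, path)
                if cycle:
                    return cycle
        visiting.discard(node)
        done.add(node)
        path.pop()
        return None

    for start in sorted(graph):
        if start not in done:
            cycle = visit(start, [])
            if cycle:
                return cycle
    return None


# Each list is the nesting order of locks observed in one context.
task_context = ["lock_a", "lock_b"]
irq_context = ["lock_b", "lock_a"]   # opposite order: potential deadlock
cycle = find_cycle(build_order_graph([task_context, irq_context]))
print(cycle)
```

The two contexts never have to deadlock during the test run. Observing both orderings once is enough to flag the scenario.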
Fault injection: inject slab allocation failures (kmalloc(), kmem_cache_alloc()); inject page allocation failures (alloc_pages(), get_free_pages()); inject disk I/O errors on devices permitted by setting /sys/block//make-it-fail (generic_make_request()). Boot options can be used to inject faults during early boot, before debugfs becomes available.
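The principle behind fault injection can be shown in user space. The sketch below is an analogy, loosely modelled on the idea of failing a call at a fixed interval and a limited number of times. The FaultInjector class, the stand-in allocator, and the code under test are all hypothetical and do not touch the kernel's interfaces.

```python
class FaultInjector:
    """Fail every `interval`-th call, at most `times` times in total."""

    def __init__(self, interval, times):
        self.interval = interval
        self.times = times
        self.calls = 0
        self.injected = 0

    def should_fail(self):
        self.calls += 1
        if self.injected < self.times and self.calls % self.interval == 0:
            self.injected += 1
            return True
        return False


def allocate(size, injector):
    """Stand-in allocator: returns None on an injected failure."""
    if injector.should_fail():
        return None
    return bytearray(size)


def fill_buffers(count, injector):
    """Code under test: must cope with allocation failure without crashing."""
    buffers, failures = [], 0
    for _ in range(count):
        buf = allocate(64, injector)
        if buf is None:
            failures += 1  # error path exercised by the injected fault
            continue
        buffers.append(buf)
    return len(buffers), failures


injector = FaultInjector(interval=3, times=2)
result = fill_buffers(10, injector)
print(result)
```

Error paths like the `buf is None` branch are rarely reached in normal runs, which is why forcing the failure is the practical way to test them.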
Kernel Stability, Stress, Regression Test Tools
Working smarter applies to testing
stress is a simple workload generator for POSIX systems. It imposes a configurable amount of CPU, memory, I/O, and disk stress on the system.
Important tests in stress-ng include: CPU compute, cache thrashing, drive stress, I/O syncs, VM stress, socket stressing, context switching, and process creation and termination.
It includes over 60 different stress tests, over 50 CPU-specific stress tests that exercise floating point, integer, bit manipulation, and control flow, and over 20 virtual memory stress tests.
 The Linux Test Project test suite http://ltp.sourceforge.net/
It provides conformance, functional, and stress testing. Its focus is on Threads, Clocks & Timers, Signals, Message Queues, and Semaphores.
 The Ballista Project http://www-2.cs.cmu.edu/afs/cs/project/edrc-ballista/www/index.html (Part of LTP Now)
Its test cases pass invalid arguments to system calls to characterize how exceptions are handled. Ballista testing can find ways to make operating systems crash and force software packages to suffer abnormal termination instead of gracefully returning error indications.
The Ballista Project implements stress testing of syscalls, but is not straightforward to port.
 http://valgrind.org/docs/manual/hg-manual.html : Helgrind is a thread safety checker that finds race conditions. It covers the main POSIX pthreads abstractions: a set of threads sharing a common address space, thread creation, thread joining, thread exit, mutexes (locks), condition variables (inter-thread event notifications), reader-writer locks, spinlocks, semaphores, and barriers.
 https://github.com/skivm/ski : SKI is an experimental virtual machine monitor, based on QEMU, that allows developers to test operating system kernels for concurrency bugs.
SKI: Exposing Kernel Concurrency Bugs through Systematic Schedule Exploration
 http://kerncomp.sourceforge.net/ : repository of scripts and other resources for the automated building and regression testing of the Linux kernel
 https://github.com/jmmv/kyua : Kyua is a testing framework for infrastructure software, originally designed to equip BSD-based operating systems with a test suite.
 https://github.com/markjdb/sysfuzz : sysfuzz is a program that attempts to perform fuzz testing of the FreeBSD kernel’s system call interface.
Hackbench is both a benchmark and a stress test for the kernel scheduler.
It creates a specified number of pairs of schedulable entities (either threads or traditional processes) that communicate via either sockets or pipes, and times how long each pair takes to send data back and forth.
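The measurement can be approximated in a few lines. The sketch below is not hackbench itself: it assumes one sender thread and one receiver thread per pair, connected by a socket pair, with data flowing in one direction, and it runs the pairs one after another rather than concurrently. All names and default sizes are illustrative.

```python
import socket
import threading
import time


def time_pair(n_messages=1000, message_size=100):
    """Time one sender/receiver pair exchanging data over a socket pair."""
    sender_sock, receiver_sock = socket.socketpair()
    expected = n_messages * message_size
    received = {"bytes": 0}

    def sender():
        payload = b"x" * message_size
        for _ in range(n_messages):
            sender_sock.sendall(payload)

    def receiver():
        while received["bytes"] < expected:
            chunk = receiver_sock.recv(65536)
            if not chunk:
                break  # peer closed early
            received["bytes"] += len(chunk)

    threads = [threading.Thread(target=receiver),
               threading.Thread(target=sender)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - start
    sender_sock.close()
    receiver_sock.close()
    return received["bytes"], elapsed


def run_pairs(n_pairs=4):
    """Run several pairs one after another and collect their timings."""
    return [time_pair() for _ in range(n_pairs)]


for byte_count, seconds in run_pairs():
    print(f"{byte_count} bytes in {seconds:.4f} s")
```

Because every message forces the scheduler to switch between sender and receiver, the elapsed time reflects scheduling cost more than raw data throughput.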
Kernel debugging tools: KDB, kprobes, OProfile, and kdump
https://sourceware.org/systemtap/ : SystemTap eliminates the need for the Testers to go through the tedious and disruptive instrument, recompile, install, and reboot sequence that may be otherwise required to collect data.
kmemcheck is a dynamic checking tool that detects and warns about some uses of uninitialized Kernel memory.
kmemleak can be used to detect possible kernel memory leaks in a way similar to a tracing garbage collector.
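The comparison with a tracing garbage collector can be made concrete. The sketch below models the mark phase only: starting from a set of roots, it follows stored pointer values and reports every tracked block that was never reached. The heap layout, addresses, and function name are invented for illustration and say nothing about kmemleak's actual data structures.

```python
def find_unreferenced(allocations, roots):
    """Mark every block reachable from the roots; return the rest, sorted.

    allocations maps a block address to the pointer values stored in it.
    """
    reachable = set()
    worklist = [addr for addr in roots if addr in allocations]
    while worklist:
        addr = worklist.pop()
        if addr in reachable:
            continue
        reachable.add(addr)
        for pointer in allocations[addr]:
            if pointer in allocations and pointer not in reachable:
                worklist.append(pointer)
    return sorted(set(allocations) - reachable)


# Hypothetical heap: 0x1000 -> 0x2000 -> 0x3000 is reachable; 0x4000 and
# 0x5000 point at each other but nothing reachable points at them.
heap = {
    0x1000: [0x2000],
    0x2000: [0x3000],
    0x3000: [],
    0x4000: [0x5000],
    0x5000: [0x4000],
}
leaks = find_unreferenced(heap, roots=[0x1000])
print([hex(addr) for addr in leaks])
```

Unlike a garbage collector, a leak detector built this way only reports the unreachable blocks and does not free them.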
Autotest is designed primarily to test the Linux kernel, though it is useful for many other purposes, such as qualifying new hardware, virtualization testing, and general user-space program testing on Linux platforms.
kernelci.org is a community-based, open source distributed test automation system focused on Linux kernel testing. It provides a single place to store, view, compare, and track test results.
It is very important to detect, bisect, report, and fix regressions on upstream kernel trees before they even reach mainline.
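Bisection is a binary search over the commit history. The sketch below shows the idea under two assumptions: the first commit in the list is known good and the last is known bad, and once the regression appears every later commit is bad as well. The commit ids and the is_bad test are hypothetical stand-ins for building, booting, and testing each kernel.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that once a commit
    is bad every later commit is bad too.
    """
    good, bad = 0, len(commits) - 1
    tests_run = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests_run += 1
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tests_run


# Hypothetical history: commit ids c0..c15, regression introduced in c11.
history = [f"c{i}" for i in range(16)]
culprit, tests_run = first_bad(history, lambda c: int(c[1:]) >= 11)
print(culprit, tests_run)
```

With 16 commits, four build-and-test cycles are enough to find the culprit, which is why bisection stays practical even across thousands of commits.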