semu Contribution: Create VirtIO Sound Device Playback

Introduction

semu ¹ is a minimalist RISC-V system emulator that runs a guest Linux kernel and the corresponding userland. It uses VirtIO to access I/O resources that reside on the host (an approach called para-virtualization).

VirtIO specifies how the guest interacts with I/O resources that reside on the host, and sound is no exception. The sound device specification, created by OpenSynergy ², lets a guest OS use sound resources in automotive settings, where applications usually run in isolated environments and full virtualization becomes the bottleneck of I/O transmission.

In this post, I present my contribution: the very first VirtIO sound device playback implementation for a RISC-V system emulator that attaches its VirtIO devices over MMIO.

Goal

This contribution has two goals:

Implement VirtIO sound device playback.
Support Linux and macOS hosts.

The whole contribution can be viewed here: https://github.com/sysprog21/semu/pull/53

Implementation

Preparing the Environment

As the guest OS is Linux, you have to enable ALSA ³ and the VirtIO sound driver in the Linux kernel configuration:

# ALSA requires System V IPC
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y

CONFIG_SOUND=y
CONFIG_SND=y
CONFIG_SND_VIRTIO=y

Furthermore, you have to install some ALSA utilities to test the playback. Taking a Buildroot configuration as an example:

BR2_PACKAGE_ALSA_UTILS=y
BR2_PACKAGE_ALSA_UTILS_APLAY=y
BR2_PACKAGE_ALSA_UTILS_SPEAKER_TEST=y

Initialization

The initialization setup is straightforward: follow what the specification tells you to do.
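
As a rough illustration of the first thing the specification asks for, the sketch below answers the identification registers of the VirtIO MMIO transport and the sound device configuration space. The register offsets, magic value, device ID, and configuration layout come from the VirtIO specification. The names `snd_mmio_read` and `snd_config` and the stream and jack counts are hypothetical and are not taken from semu's code.

```c
#include <stdint.h>

/* Register offsets of the VirtIO MMIO transport (from the VirtIO spec). */
#define VIRTIO_MMIO_MAGIC_VALUE 0x000
#define VIRTIO_MMIO_VERSION     0x004
#define VIRTIO_MMIO_DEVICE_ID   0x008
#define VIRTIO_MMIO_CONFIG      0x100

#define VIRTIO_ID_SOUND 25 /* device ID assigned to the sound device */

/* Device configuration layout of the sound device (from the VirtIO spec). */
struct snd_config {
    uint32_t jacks;
    uint32_t streams;
    uint32_t chmaps;
};

/* Hypothetical values: one playback stream, no jacks, no channel maps. */
static const struct snd_config snd_cfg = {.jacks = 0, .streams = 1, .chmaps = 0};

/* Return the 32-bit value the driver reads at `offset` (illustrative only). */
static uint32_t snd_mmio_read(uint32_t offset)
{
    switch (offset) {
    case VIRTIO_MMIO_MAGIC_VALUE:
        return 0x74726976; /* "virt" in little-endian */
    case VIRTIO_MMIO_VERSION:
        return 2; /* non-legacy interface */
    case VIRTIO_MMIO_DEVICE_ID:
        return VIRTIO_ID_SOUND;
    case VIRTIO_MMIO_CONFIG + 0:
        return snd_cfg.jacks;
    case VIRTIO_MMIO_CONFIG + 4:
        return snd_cfg.streams;
    case VIRTIO_MMIO_CONFIG + 8:
        return snd_cfg.chmaps;
    default:
        return 0; /* status, features, and queue registers omitted here */
    }
}
```

A real device also has to handle the status, feature negotiation, and virtqueue registers before the driver reports the sound card.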

If the initialization is set up correctly, you will see messages like these when booting:

[    4.011962] ALSA device list:
[    4.015962]   #0: VirtIO SoundCard at platform/f4400000.virtio/virtio2

Playing Sounds

How the Driver Sends PCM Frames to Device

Before the device can play sound, we need to understand how the driver sends PCM frames. From observing Linux kernel v6.7, its sound driver does the following:

Send the PREPARE command.
- In the meantime, the sound driver sends PCM frames for pre-buffering.
Send the START command to start playing.
Send the STOP command to stop playing.
- Meanwhile, the sound driver stops sending PCM frames.
Send the RELEASE command to release the stream.
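
The command sequence above can be sketched as a small state machine on the device side. The request and status codes are the ones defined in the VirtIO sound specification. The state names, `struct stream`, and `handle_pcm_cmd` are hypothetical, and only the transitions listed above are modeled (the specification also requires SET_PARAMS first and permits more transitions than this).

```c
#include <stdint.h>

/* PCM request codes and status codes from the VirtIO sound specification. */
enum {
    VIRTIO_SND_R_PCM_PREPARE = 0x0102,
    VIRTIO_SND_R_PCM_RELEASE = 0x0103,
    VIRTIO_SND_R_PCM_START = 0x0104,
    VIRTIO_SND_R_PCM_STOP = 0x0105,
    VIRTIO_SND_S_OK = 0x8000,
    VIRTIO_SND_S_BAD_MSG = 0x8001,
    VIRTIO_SND_S_NOT_SUPP = 0x8002,
};

/* Hypothetical per-stream state, named after the last accepted command. */
enum stream_state { ST_IDLE, ST_PREPARED, ST_RUNNING, ST_STOPPED };

struct stream {
    enum stream_state state;
    int accept_frames; /* nonzero while PCM frames are expected on TX */
};

/* Apply one control command and return the status for the response. */
static uint32_t handle_pcm_cmd(struct stream *s, uint32_t code)
{
    switch (code) {
    case VIRTIO_SND_R_PCM_PREPARE:
        if (s->state != ST_IDLE)
            return VIRTIO_SND_S_BAD_MSG;
        s->state = ST_PREPARED;
        s->accept_frames = 1; /* pre-buffering starts here */
        return VIRTIO_SND_S_OK;
    case VIRTIO_SND_R_PCM_START:
        if (s->state != ST_PREPARED && s->state != ST_STOPPED)
            return VIRTIO_SND_S_BAD_MSG;
        s->state = ST_RUNNING;
        s->accept_frames = 1;
        return VIRTIO_SND_S_OK;
    case VIRTIO_SND_R_PCM_STOP:
        if (s->state != ST_RUNNING)
            return VIRTIO_SND_S_BAD_MSG;
        s->state = ST_STOPPED;
        s->accept_frames = 0; /* the driver stops sending frames */
        return VIRTIO_SND_S_OK;
    case VIRTIO_SND_R_PCM_RELEASE:
        if (s->state != ST_PREPARED && s->state != ST_STOPPED)
            return VIRTIO_SND_S_BAD_MSG;
        s->state = ST_IDLE;
        s->accept_frames = 0;
        return VIRTIO_SND_S_OK;
    default:
        return VIRTIO_SND_S_NOT_SUPP;
    }
}
```

The point of the sketch is that frames are already expected in the PREPARED state, before START arrives, so control handling and frame handling overlap.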

Because PCM frames arrive while control commands are still being handled, the device needs the following:

A multi-threaded model to serve control and TX events at the same time.
A queue to store PCM frames.

Threading Model

My threading model is shown in the ASCII art below:

# originally generated by Google Nano Banana Pro with Gemini 3
# then edited by me

+-----------------------------------------------------------------------------+
|                             THREADING MODEL                                 |
+-----------------------------------------------------------------------------+
|                                                                             |
| [PRODUCER: TX THREAD]                        [CONSUMER: CALLBACK THREAD OF  |
|                                                         SOUND BACKEND]      |
|                                                                             |
|    +==========+                                                             |
|    | TX virtq |                                                             |
|    +==========+                                                             |
|        |                                                                    |
|        v (1) Fetch PCM Frames                                               |
|  +-------------+                                                            |
|  |   TX-THRD   |                                          +---------------+ |
|  |             |                                          | CALLBACK-THRD | |
|  | [Accumulate]|                                          |               | |
|  |      |      |                                          |   [WAITING]   | |
|  |   <Check>   |                                          |   (Blocked    | |
|  | Period Size |                                          |    on CV)     | |
|  |   Reached?  |                                          |               | |
|  +------+------+                                          +------+--------+ |
|         |                                                        ^          |
|         | (Yes: Batch Ready)                                     |          |
|         |                                                        |          |
|         | (2) Enqueue Batch                                      |          |
|         v                    +===============+                   |          |
|         +------------------->|     QUEUE     |                   |          |
|                              +===============+                   |          |
|                                      |                           |          |
|         | (3) SEND NOTIFICATION      | (4) Data Available        |          |
|         |     (CV Signal)            +-------------------------->|          |
|         v                                                        |          |
|       ( ! ) - - - - - - - - - - - - - - - - - - - - - - - - - > ( ! )       |
|                                                                  |          |
|                                                      (5) Wake Up & Read     |
|                                                                  v          |
|                                                           +-------------+   |
|                                                           |SOUND BACKEND|   |
|                                                           +-------------+   |
|                                                                             |
+-----------------------------------------------------------------------------+

A few remarks on this threading model:

A CV (condition variable) suffices for lightweight synchronization.
As the driver always sends PCM frames in whole periods (period_bytes, specifically), with the end of the stream as the only exception, the producer notifies the consumer once it has received a whole period.
As PCM frames are sent in both the PREPARE and START states, using multi-threading instead of a lock-free design such as the one in DPDK ⁴ reduces the complexity.
For the thread implementation, I chose PThreads because we need cross-platform compatibility.
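
The producer and consumer sides described above can be sketched with a byte ring buffer guarded by a PThreads mutex and condition variable. This is a minimal sketch and not semu's actual code: `struct pcm_queue`, its functions, and `QUEUE_CAP` are hypothetical names, `period_bytes` is assumed not to exceed `QUEUE_CAP`, and the partial period at the end of a stream is not handled.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_CAP 65536 /* hypothetical capacity in bytes */

struct pcm_queue {
    uint8_t buf[QUEUE_CAP];
    size_t head, tail, used; /* ring-buffer indices and fill level */
    size_t period_bytes;     /* size of one period, from the stream params */
    pthread_mutex_t lock;
    pthread_cond_t readable; /* signaled when a whole period is queued */
};

static void pcm_queue_init(struct pcm_queue *q, size_t period_bytes)
{
    memset(q, 0, sizeof(*q));
    q->period_bytes = period_bytes;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->readable, NULL);
}

/* Producer (TX thread): enqueue frames and notify the consumer once at
 * least one whole period is available. Returns the bytes accepted, which
 * is less than len when the queue is full. */
static size_t pcm_queue_write(struct pcm_queue *q, const uint8_t *src, size_t len)
{
    pthread_mutex_lock(&q->lock);
    size_t n = len;
    if (n > QUEUE_CAP - q->used)
        n = QUEUE_CAP - q->used;
    for (size_t i = 0; i < n; i++) {
        q->buf[q->tail] = src[i];
        q->tail = (q->tail + 1) % QUEUE_CAP;
    }
    q->used += n;
    if (q->used >= q->period_bytes)
        pthread_cond_signal(&q->readable);
    pthread_mutex_unlock(&q->lock);
    return n;
}

/* Consumer (backend callback thread): block until a whole period is
 * queued, then copy exactly period_bytes into dst. */
static void pcm_queue_read_period(struct pcm_queue *q, uint8_t *dst)
{
    pthread_mutex_lock(&q->lock);
    while (q->used < q->period_bytes) /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->readable, &q->lock);
    for (size_t i = 0; i < q->period_bytes; i++) {
        dst[i] = q->buf[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
    }
    q->used -= q->period_bytes;
    pthread_mutex_unlock(&q->lock);
}
```

In this sketch the TX thread would call `pcm_queue_write` for every buffer it pulls from the TX virtqueue, and the backend callback would call `pcm_queue_read_period` each time the host sound library asks for more data.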

Other Necessary Works

As semu exposes its VirtIO devices over MMIO, I added the following node to semu's dts:

snd0: virtio@4700000 {
    compatible = "virtio,mmio";
    reg = <0x4700000 0x200>;
    interrupts = <5>;
};
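
On the emulator side, the values in this node have to be mirrored by the MMIO dispatch and interrupt code. The sketch below only illustrates that correspondence. The helper names are hypothetical, the pending mask is a simplified stand-in for the interrupt controller, and the address is the one written in the node, which may be relative to a parent bus rather than the final guest-physical address.

```c
#include <stdbool.h>
#include <stdint.h>

#define SND_MMIO_BASE 0x4700000u /* from reg = <0x4700000 0x200> */
#define SND_MMIO_SIZE 0x200u
#define SND_IRQ 5u               /* from interrupts = <5> */

/* True when addr falls inside the sound device's register window. */
static bool snd_mmio_contains(uint32_t addr)
{
    return addr >= SND_MMIO_BASE && addr - SND_MMIO_BASE < SND_MMIO_SIZE;
}

/* Mark an interrupt line as pending in a simplified bitmask. */
static uint32_t irq_raise(uint32_t pending, uint32_t irq)
{
    return pending | (1u << irq);
}
```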

Limitation

ALSA relies on the system timer. However, semu currently has timer issues that cause ALSA in the guest OS to stop sending PCM frames after a while.

Still, there is a chance to play the entire sound by adjusting the buffer size. For instance, when I set the buffer size to eight times the period size, the sound played to the end (with some repeating artifacts, though).
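
To get a feel for what an eight-period buffer means, the arithmetic can be sketched as below. The helper names and the example numbers in the usage note (1024-frame periods of 16-bit stereo audio at 44100 Hz) are purely illustrative assumptions and are not the parameters used in my test.

```c
#include <stdint.h>

/* Bytes in one period: frames x channels x bytes per sample. */
static uint32_t period_bytes(uint32_t frames, uint32_t channels,
                             uint32_t sample_bytes)
{
    return frames * channels * sample_bytes;
}

/* Whole milliseconds of audio held by `bytes` at the given sample rate. */
static uint32_t bytes_to_ms(uint32_t bytes, uint32_t rate, uint32_t channels,
                            uint32_t sample_bytes)
{
    return (uint32_t) ((uint64_t) bytes * 1000 /
                       ((uint64_t) rate * channels * sample_bytes));
}
```

With the assumed numbers, `period_bytes(1024, 2, 2)` is 4096 bytes, so a buffer of eight periods is 32768 bytes and `bytes_to_ms(32768, 44100, 2, 2)` gives 185 ms of audio.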

Wrap Up

This post describes the implementation of VirtIO sound device playback. It is not only the very first such implementation on RISC-V, but also adds to the scarce resources available on VirtIO sound devices.

References

¹ https://github.com/sysprog21/semu
² https://www.opensynergy.com/
³ https://wiki.archlinux.org/title/Advanced_Linux_Sound_Architecture
⁴ https://doc.dpdk.org/guides/prog_guide/ring_lib.html