Tag Archives: Projects

Device Reservation Spec

2009-02-26 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/device-reservation.html

The JACK folks and I have agreed on a little specification for device
reservation that allows clean hand-over of audio device access from
PulseAudio to JACK and back. The specification is generic enough to allow
locking/hand-over of other device types as well, not just audio cards. So, in
case someone needs to implement a similar kind of locking/hand-over for any kind of resource, here’s some
prior art you can base your work on. Given that HAL is supposed to go away
pretty soon, this might be an option for replacing HAL’s current device
locking. The logic is as simple as it can get: whoever owns a certain service name on
the D-Bus session bus owns the device access. For further details, read the spec.
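To illustrate the ownership rule, here is a toy, in-memory sketch. It does not talk to a real D-Bus daemon and is not the reference implementation: the Bus class, the may_open_device() helper and the name in NAME are all made up for illustration.

```python
# Toy model of "whoever owns the service name owns the device".
# Bus, may_open_device and NAME are hypothetical; no real D-Bus is involved.

class Bus:
    """Minimal stand-in for the name registry of a session bus."""

    def __init__(self):
        self._owners = {}  # well-known name -> owning client

    def request_name(self, name, client):
        """Give `name` to `client` if it is free; return True on success."""
        if name in self._owners:
            return False
        self._owners[name] = client
        return True

    def release_name(self, name, client):
        """Drop `name`, but only if `client` really owns it."""
        if self._owners.get(name) == client:
            del self._owners[name]
            return True
        return False

    def owner(self, name):
        """Return the current owner of `name`, or None."""
        return self._owners.get(name)


def may_open_device(bus, name, client):
    """A client may touch the device only while it owns the name."""
    return bus.owner(name) == client


# Hypothetical name standing in for "the first audio card".
NAME = "org.example.ReserveDevice.Audio0"

bus = Bus()
bus.request_name(NAME, "pulseaudio")    # PulseAudio holds the device
print(bus.request_name(NAME, "jackd"))  # False: the name is taken
bus.release_name(NAME, "pulseaudio")    # PulseAudio closes the device, lets go
print(bus.request_name(NAME, "jackd"))  # True: JACK now owns the device
```

How the current owner is asked to let go is covered by the spec and is left out of this sketch.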

There’s even a reference
implementation available, which both JACK2 and PulseAudio have now
integrated.

Also known as PAX SOUND SERVERIS.

Having fun with bzr

2009-02-25 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/bizarre-fun.html

So I wanted to hack proper channel mapping query support into libsndfile, something I have had
on my TODO list for years. The first step was to find the source code
repository for it. That was easy. Alas the VCS used is bzr. There are some
very vocal folks on the Internet who claim that the bzr user interface is
stupendously easy to use in contrast to git, which apparently is the very
definition of complexity. And if it is stated on the Internet it must be true.
I think I mastered git quite well, so yeah, checking out the sources with bzr
can’t be that difficult for my limited brain capacity.

So let’s do what Erik suggests for checking out the sources:

$ bzr get http://www.mega-nerd.com/Bzr/libsndfile-pub/

Calling this I get a nice percentage counter that starts at 0% and ends at, … uh, 0%. That gives me a real feeling of progress. It takes a while, and then I get an error:

bzr: ERROR: Not a branch: "http://www.mega-nerd.com/Bzr/libsndfile-pub/".

Now that’s a useful error message. They even include an all-caps word! I guess that error message is right — it’s not a branch, it is a repository. Or is it not?

So what do we do about this? Maybe get is not actually the right verb. Let’s try to play around a bit. Let’s use the verb I use to get sources with in git:

$ bzr clone http://www.mega-nerd.com/Bzr/libsndfile-pub/

Hmm, this results in exactly the same 0% to 0% progress counter, and the same useless error message.

Now I remember that bzr is actually more inspired by Subversion’s UI than by git’s, so let’s try it the SVN way.

$ bzr checkout http://www.mega-nerd.com/Bzr/libsndfile-pub/

Hmm, and of course, I get exactly the same results again. A counter that counts from 0% to 0% and the same useless error message.

Ok, maybe that error is bzr’s standard reply? Let’s check this out:

$ bzr waldo http://www.mega-nerd.com/Bzr/libsndfile-pub/
bzr: ERROR: unknown command "waldo"

Apparently not. bzr actually knows more than one error message.

Ok, I admit doing this by trial-and-error is a rather lame approach. RTFM! So let’s try this.

$ man bzr-get
No manual entry for bzr-get

Ouch. No man page? How awesome. Ah, wait, maybe they have only a single unreadable mega man page for everything. Let’s try this:

$ man bzr

Wow, this actually worked. Seems to list all commands. Now let’s look for the help on bzr get:

/bzr get
Pattern not found  (press RETURN)

Hmm, no documentation for their most important command? That’s weird! Ok, let’s try it again with our git vocabulary:

/bzr clone
Pattern not found  (press RETURN)

Ok, this is not funny anymore. Apparently the verbs are listed in alphabetical order.
So let’s browse to the letter g as in get. However it doesn’t
exist. There’s bzr export, and then the next entry is bzr help (Oh, irony!) — but no get in-between.

Ok, enough of this shit. Maybe the message wants to tell us that the repo
actually doesn’t exist (even though it confusingly calls it a “branch”). Let’s
go back to the original page at Erik’s site and read things again. Aha, the
“main archive archive can be found at (yes, the directory looks empty, but
it isn’t): http://www.mega-nerd.com/Bzr/libsndfile-pub/”.
Hmm, indeed — that URL looks very empty when it is accessed. How weird though
that in bzr a repo is an empty directory!

And at this point I gave up and downloaded the tarball to make my patches
against. I have still not managed to check out the sources from the repo.
Somehow I get the feeling the actual repo really isn’t available anymore under that address.

So why am I blogging about this? Not so much to start another flamefest, to
nourish the fanboys, nor because it is so much fun to bash other people’s work or
simply to piss people off. It’s more for two reasons:

Firstly, simply to make
the point that folks can claim a thousand times that git’s UI sucks and bzr’s
UI is awesome. It’s simply not true. From what I experienced it is not the
tiniest bit better. The error messages useless, the documentation incomplete,
the interfaces surprising and exactly as redundant as git’s. The only
effective difference I noticed is that it takes a bit longer to show those
error messages with bzr: the Python tax. To summarize this more positively: git excels as much as bzr does. The documentation, error messages and user interfaces of both are the best in their class. And they have all the best chances for future improvement.

And the second reason of course is that I’d still like to know what the correct way to get the sources is. But for that I should probably ask Erik himself.

Generating Copyright Headers from git History

2009-02-21 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/copyright.html

Here’s a little tool I
wrote that automatically generates copyright headers for source files in a
git repository based on the git history.

Run it like this:

~/projects/pulseaudio$ copyright.py src/pulsecore/sink.c src/pulsecore/core-util.c

And it will give you this:

File: src/pulsecore/sink.c
	Copyright 2004, 2006-2009 Lennart Poettering
	Copyright 2006-2007 Pierre Ossman
	Copyright 2008-2009 Marc-Andre Lureau
File: src/pulsecore/core-util.c
	Copyright 2004, 2006-2009 Lennart Poettering
	Copyright 2006-2007 Pierre Ossman
	Copyright 2008 Stelian Ionescu
	Copyright 2009 Jared D. McNeill
	Copyright 2009 Marc-Andre Lureau

This little script could use love from a friendly soul to make it crawl entire source trees and patch in appropriate copyright headers. Anyone up for it?
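The year lists in the output above ("2004, 2006-2009") suggest that consecutive years get collapsed into ranges. The following is a rough sketch of how that step might look; it is not taken from copyright.py, and the function names and the (author, year) input format are assumptions. The pairs could, for example, be parsed from git log output.

```python
# Sketch only: collapse_years() and copyright_lines() are hypothetical names,
# not functions of the actual copyright.py.

def collapse_years(years):
    """Turn an iterable of years into a string like '2004, 2006-2009'."""
    ordered = sorted(set(years))
    if not ordered:
        return ""
    ranges = []
    start = prev = ordered[0]
    for year in ordered[1:]:
        if year == prev + 1:
            # Still inside a run of consecutive years.
            prev = year
            continue
        ranges.append((start, prev))
        start = prev = year
    ranges.append((start, prev))
    return ", ".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def copyright_lines(commits):
    """Build one copyright line per author from (author, year) pairs."""
    by_author = {}
    for author, year in commits:
        by_author.setdefault(author, set()).add(year)
    # Assumption: order by first year of contribution, then by name.
    ordered = sorted(by_author.items(), key=lambda kv: (min(kv[1]), kv[0]))
    return [f"Copyright {collapse_years(ys)} {name}" for name, ys in ordered]


for line in copyright_lines([
    ("Lennart Poettering", 2004),
    ("Lennart Poettering", 2006),
    ("Pierre Ossman", 2006),
    ("Pierre Ossman", 2007),
]):
    print(line)
```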

Tagging Audio Streams

2009-02-21 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/tagging-audio.html

So you are hacking an audio application and the audio data you are
generating might eventually end up in PulseAudio before it is played. If that’s the case then please make sure
to read this!

Here’s the quick summary for Gtk+ developers:

PulseAudio can enforce all kinds of policy on sounds. For example, starting
in 0.9.15, we will automatically pause your media player while a phone call is
going on. To implement this, however, we need to know what the stream you
are sending to PulseAudio should be categorized as: is it music? Is it a
movie? Is it game sounds? Is it a phone call stream?

Also, PulseAudio would like to show a nice icon and an application name next
to each stream in the volume control. That requires it to be able to deduce
this data from the stream.

And here’s where you come into the game: please add three lines like the
following near the beginning of the main() function of your Gtk+
application:

...
g_set_application_name(_("Totem Movie Player"));
gtk_window_set_default_icon_name("totem");
g_setenv("PULSE_PROP_media.role", "video", TRUE);
...

If you do this then the PulseAudio client libraries will be able to figure out the rest for you.

There is more meta information (aka “properties”) you can set for your application or for your streams that is useful to PulseAudio. In case you want to know more about them or you are looking for equivalent code to the above example for non-Gtk+ applications, make sure to read the mentioned page.

Thank you!

Oh, and even if your app doesn’t do audio, calling g_set_application_name() and gtk_window_set_default_icon_name() is always a good idea!
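For a launcher or wrapper written in Python, the media.role line from the Gtk+ example could be mirrored roughly as follows. This is only a sketch: pulse_env() is a made-up helper, "my-player" is a placeholder command, and it assumes the child process uses the PulseAudio client libraries, which read the PULSE_PROP_media.role environment variable as in the example above.

```python
import os

# pulse_env() is a hypothetical helper, not part of any PulseAudio binding.
def pulse_env(role, base=None):
    """Return a copy of the environment with PULSE_PROP_media.role set."""
    env = dict(os.environ if base is None else base)
    env["PULSE_PROP_media.role"] = role
    return env


if __name__ == "__main__":
    env = pulse_env("video")
    print(env["PULSE_PROP_media.role"])
    # A child process could then be started with this environment, e.g.:
    #   subprocess.run(["my-player"], env=env)   # "my-player" is a placeholder
```

The helper returns a copy rather than modifying os.environ, so only the launched player is tagged and the launcher's own streams are left alone.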

How to Version D-Bus Interfaces Properly and Why Using / as Service Entry Point Sucks

2009-02-11 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/versioning-dbus.html

So you are designing a D-Bus interface and want to make it future-proof. Of
course, you thought about versioning your stuff. But you wonder how to do that
best. Here are a few things I learned about versioning D-Bus APIs which might
be of general interest:

Version your interfaces! This one is pretty obvious. No explanation
needed. Simply include the interface version in the interface name as a suffix,
i.e. the initial release should use org.foobar.AwesomeStuff1, and if
you make changes you should introduce org.foobar.AwesomeStuff2, and so
on, possibly dropping the old interface.

When should you bump the interface version? Generally, I’d recommend only
bumping when doing incompatible changes, such as function call signature
changes. This of course requires clients to handle the
org.freedesktop.DBus.Error.UnknownMethod error properly for each function
you add to an existing interface. That said, in a few cases it might make sense
to bump the interface version even without breaking compatibility of the calls.
(e.g. in case you add something to an interface that is not directly visible in
the introspection data)

Version your services! This one is almost as obvious. When you
completely rework your D-Bus API introducing a new service name might be a
good idea. Best way to do this is by simply bumping the service name. Hence,
call your service org.foobar.AwesomeService1 right from the beginning
and then bump the version if you reinvent the wheel. And don’t forget that you
can acquire more than one well-known service name on the bus, so even if you
rework everything you can keep compatibility. (Example: BlueZ 3 to BlueZ 4 switch)

Version your ‘entry point’ object paths! This one is far from
obvious. The reasons why object paths should be versioned are purely technical,
not philosophical: for signals sent from a service, D-Bus overwrites the
originating service name with the unique name (e.g. :1.42) even if you
fill in a well-known name (e.g. org.foobar.AwesomeService1). Now,
let’s say your application registers two well-known service names, let’s say
two versions of the same service, versioned like mentioned above. And you have
two objects — one on each of the two service names — that implement a generic
interface and share the same object path: for the client there will be no way
to figure out to which service name the signals sent from this object path
belong. And that’s why you should make sure to use versioned and hence
different paths for both objects. i.e. start with
/org/foobar/AwesomeStuff1 and then bump to
/org/foobar/AwesomeStuff2 and so on. (Also see David’s comments about this.)

When should you bump the object path version? Probably only when you
bump the service name it belongs to. Important is to version the ‘entry point’
object path. Objects below that don’t need explicit versioning.

In summary: For good D-Bus API design you should version all three: D-Bus interfaces, service names and ‘entry point’ object paths.

And don’t forget: nobody gets API design right the first time. So even if
you think your D-Bus API is perfect: version things right from the beginning
because later on it might turn out you were not quite as bright as you thought
you were.

A corollary from the reasoning behind versioning object paths as described
above is that using / as entry point object path for your service is a
bad idea. It makes it very hard to implement more than one service or service
version on a single D-Bus connection. Again: Don’t use / as entry
point object path. Use something like /org/foobar/AwesomeStuff!
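The naming scheme above can be sketched in a few lines. This is a plain-Python illustration using the example names from this post; the helper functions are made up and do not call any D-Bus library. The last helper shows why versioned entry point paths help: since a signal only carries the unique name of the sender, the object path is what lets a client work out which service version it belongs to.

```python
# Hypothetical helpers illustrating the versioning scheme; no D-Bus involved.

def versioned_name(base, version):
    """'org.foobar.AwesomeStuff', 1 -> 'org.foobar.AwesomeStuff1'."""
    return f"{base}{version}"


def entry_point_path(base, version):
    """'org.foobar.AwesomeStuff', 1 -> '/org/foobar/AwesomeStuff1'."""
    return "/" + base.replace(".", "/") + str(version)


def service_for_signal(path, entry_points):
    """Map the object path of a received signal to a service name."""
    for prefix, service in entry_points.items():
        # Match the entry point itself or any object below it.
        if path == prefix or path.startswith(prefix + "/"):
            return service
    return None


ENTRY_POINTS = {
    entry_point_path("org.foobar.AwesomeStuff", v):
        versioned_name("org.foobar.AwesomeService", v)
    for v in (1, 2)
}

print(service_for_signal("/org/foobar/AwesomeStuff2/thing", ENTRY_POINTS))
```

With / as the entry point for both service versions, the dictionary above would have a single key and the lookup could not tell the two apart.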

Writing Volume Control UIs is Hard

2009-02-10 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/writing-volume-control-uis.html

Writing modern volume control UIs (i.e. ‘mixer tools’) is much harder to get
right than it might appear at first. For that reason I’ve put
together a rough
guide on what to keep in mind when writing them for PulseAudio. Although it was originally
intended just as a bit of help for the gnome-volume-control guys, I believe
it could be an interesting read for other people as well.

It touches a lot of topics: volumes in general, how to present them,
what to present, base volumes, flat volumes, what to do about multichannel
volumes, controlling clients, controlling cards, handling default devices,
saving/restoring volumes/devices, sound event sliders, how to monitor PCM and
more.

So make sure to give it at least a quick peek! If you plan to write a volume
control for ncurses or KDE (hint, hint!) even more so, it’s a must read.

Maybe this also helps illustrate why I think that abstracting volume
control interfaces inside of abstraction layers such as Phonon or GStreamer is
doomed to fail, and just not even worth the try.

And now, without further ado I give you ‘Writing Volume Control UIs’.

Oh Nine Fifteen

2009-02-10 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/oh-nine-fifteen.html

Last week I released a
test version for the upcoming 0.9.15 release of PulseAudio. It’s going to be a major one,
so here’s a little overview of what’s new from the user’s perspective.

Flat Volumes

Based on code originally contributed by Marc-André Lureau we now
support Flat Volumes. The idea behind flat volumes has been
inspired by how Windows Vista handles volume control: instead of
maintaining one volume control per application stream plus one device
volume we instead fix the device volume automatically to the “loudest”
application stream volume. Sounds confusing? Quite the contrary: it feels pretty natural and
easy to use, and it brings us a big step forward in reducing the
number of volume sliders in the entire audio pipeline from the application to what you hear.

The flat volumes logic only applies to devices where we know the
actual multiplication factor of the hardware volume slider. That’s
most devices supported by the ALSA kernel drivers except for a few
older devices and some cheap USB hardware that exports invalid
dB information.
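As a rough sketch of the idea (not PulseAudio's actual code): the device volume follows the loudest stream, and every other stream is attenuated relative to it. The function name, the linear 0.0 to 1.0 volume scale and the per-stream factor are assumptions made for this example.

```python
# Hypothetical illustration of flat volumes with linear volumes in [0.0, 1.0].

def flat_volumes(stream_volumes):
    """Return (device_volume, per-stream attenuation factors).

    stream_volumes maps a stream name to the volume the user asked for.
    """
    if not stream_volumes:
        # No streams: nothing to derive the device volume from.
        return None, {}
    device = max(stream_volumes.values())
    if device == 0:
        return 0.0, {name: 0.0 for name in stream_volumes}
    # Each stream is scaled so that device * factor equals its own volume.
    factors = {name: vol / device for name, vol in stream_volumes.items()}
    return device, factors


device, factors = flat_volumes({"music": 0.25, "phone": 0.5})
print(device, factors)
```

In this model the loudest stream always has a factor of 1.0, so its slider effectively is the hardware slider.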

On-the-fly Reconfiguration of Devices (aka “S/PDIF Support”)

PulseAudio will now automatically probe all possible combinations
of configurations for using your sound card for playback and
capturing and then allow on-the-fly switching of the
configuration. What does that mean? Basically you may now switch
between “Analog Stereo”, “Digital S/PDIF Stereo”, “Analog Surround
5.1” (… and so on) on-the-fly without having to reconfigure PA on
the configuration file level or even having to stop your streams. This
fixes a couple of issues PA had previously, including proper S/PDIF
support, and per-device configuration of the channel map of
devices.

Unfortunately there is no UI for this yet, and hence you need to
use pactl/pacmd on the command line to switch between the
profiles. Typing list-cards in pacmd will tell you
which profiles your card supports.

In a later PA version this functionality will be extended to also
allow input connector switching (e.g. microphone vs. line-in) and
output connector switching (e.g. internal speakers vs. line-out)
on-the-fly.

Native support for 24bit samples

PA now supports 24bit packed samples as well as 24bit stored in
the LSBs of 32bit integers natively. Previously these formats were
always converted into 32bit MSB samples.
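To make the three layouts concrete, here is a small sketch that encodes one signed 24bit sample in each of them, assuming little-endian byte order. The function names are made up, and leaving the unused top byte of the 32bit word at zero is an assumption of this example, not a statement about what PA does.

```python
# Hypothetical helpers; little-endian byte order is assumed throughout.

def _check_s24(sample):
    if not -(1 << 23) <= sample < (1 << 23):
        raise ValueError("sample out of signed 24-bit range")


def pack_s24(sample):
    """24bit packed: exactly three bytes per sample."""
    _check_s24(sample)
    return (sample & 0xFFFFFF).to_bytes(3, "little")


def pack_s24_in_32(sample):
    """24bit in the LSBs of a 32bit word (top byte left at zero here)."""
    _check_s24(sample)
    return (sample & 0xFFFFFF).to_bytes(4, "little")


def to_s32_msb(sample):
    """The old behaviour: scale the sample into the MSBs of a 32bit integer."""
    _check_s24(sample)
    return sample << 8


print(pack_s24(1).hex(), pack_s24_in_32(1).hex(), to_s32_msb(1))
```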

Airport Express Support

Colin Guthrie contributed native Airport Express support. This will
make the RAOP
audio output of ApEx routers appear like local sound devices
(unfortunately sound devices with a very long latency), i.e. any
application connecting to PulseAudio can output audio to ApEx devices
in a similar way to how iTunes can do it on MacOSX.

Before you ask: it is unlikely that we will ever make PulseAudio be
able to act as an ApEx compatible device that takes connections from
iTunes (i.e. becoming an RAOP server instead of just an RAOP client).
Apple has an unfriendly attitude of dongling their devices to their
applications: normally iTunes has to cryptographically authenticate
itself to the device and the device to iTunes. iTunes’ key has been
recovered by the infamous Jon Lech
Johansen, but the device key is still unknown. Without that key it
is not realistically possible to disguise PA as an ApEx.

Other stuff

There have been some extensive changes to natively support
Bluetooth audio devices well by directly accessing BlueZ. This code
was originally contributed by the GSoC student João Paulo Rechi
Vita. Initially, 0.9.15 was intended to become the version where BT audio
just works. Unfortunately the kernel is not really up to that yet, and
I am not sure everything will be in place so that 0.9.15 will ship
with well working BT support.

There have been a lot of internal changes and API additions. Most of
these however are not visible to the user.

Pascal,

2009-01-27 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/pascal-terjan.html

replacing integral parts of a system is always a bit of a dilemma. If we
replace it only after all the other software/drivers that interface with it are known
to work well with it then nobody will bother doing all that compatibility work
since they can say “Nobody uses it yet, so why should I bother?” — and hence
the change can never take place.

If we replace it before everything works perfectly well with it, then folks
will complain: “Oh my god, it doesn’t work with my software/drivers, you suck!” — like you just did (though in more polite words).

Hence regardless which way we do it we will do it the wrong way. Biting the
bullet and doing the change is however still the better, the only path to
improvement. With the limited amount of manpower we have pushing things out
knowing that there is some software/drivers that don’t work well with it is our only
option — especially if the software in question is unfixable by us since it is
closed source.

Hence, if we did as you wish and did not make the distributions adopt
PulseAudio right now we can forget about fixing audio on Linux entirely and it
will stagnate forever.

As mentioned by J5 this was the same story with D-Bus, HAL, with udev, and other stuff.

And again, folks may claim that PulseAudio is very buggy. While it certainly
has bugs, like all software has, most of the issues reported are not things
we can or should fix/work-around in PulseAudio, but that are in other layers of
the system. In ALSA, in the drivers, in the client applications. However only
PA makes them become visible since it depends on a lot more functionality to
work properly than any other program before. And quite frankly we use a lot of stuff exactly nobody has used
before and that of course was broken due to that (in ALSA as one example).

Having said all this, just pointing to other folks to blame doesn’t really
solve the problem. I did a lot of testing on different sound chips, making
sure PulseAudio works fine on them. Of course it’s a limited testing set (six
cards right now to be exact, a seventh model currently being sent to me by my
employer, Red Hat). The cards that are currently known to be
problematic are listed
in our Wiki.

I am not saying that the points you make are rubbish. However, please see the big picture before getting vocal about it.

Automatic Backtrace Generation

2008-10-30 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/automatic-backtrace.html

Ubuntu has Apport. Fedora has nothing. That sucks big time.

Here’s the result of a few minutes of hacking up something similar to Apport based on the awesome (and much underused) Frysk debugging tool kit. It doesn’t post any backtraces on any Internet servers and has no fancy UI — but it automatically dumps a stacktrace of every crashing process on the system to syslog and stores all kinds of data in /tmp/core.*/ for later inspection.

#!/bin/bash
# Arguments, in core_pattern order: pid, timestamp, uid, gid, signal, hostname.
set -e
export PATH=/sbin:/bin:/usr/sbin:/usr/bin
DIR="/tmp/core.$1.$2"
umask 077
mkdir "$DIR"
# The kernel pipes the core image to stdin; store it.
cat > "$DIR/core"
# From here on, send all output of this script to a log file.
exec &> "$DIR/dump.log"
set +e
echo "$1" > "$DIR/pid"
echo "$2" > "$DIR/timestamp"
echo "$3" > "$DIR/uid"
echo "$4" > "$DIR/gid"
echo "$5" > "$DIR/signal"
echo "$6" > "$DIR/hostname"
set -x
# Extract auxiliary vector, executable path and memory maps with Frysk.
fauxv "$DIR/core" > "$DIR/auxv"
fexe "$DIR/core" > "$DIR/exe"
fmaps "$DIR/core" > "$DIR/maps"
# Map the files reported by fdebuginfo to RPM packages and install their debuginfo.
PKGS=`/usr/bin/fdebuginfo "$DIR/core" | grep "\-\-\-" | cut -d ' ' -f 1 | sort | uniq | grep '^/'| xargs rpm -qf | sort | uniq`
[ "x$PKGS" != x ] && debuginfo-install -y $PKGS
fstack -rich "$DIR/core" > "$DIR/fstack"
set +x
# Send a summary of everything collected to syslog.
(
	echo "Application `cat "$DIR/exe"` (pid=$1,uid=$3,gid=$4) crashed with signal $5."
	echo "Stack trace follows:"
	cat "$DIR/fstack"
	echo "Auxiliary vector:"
	cat "$DIR/auxv"
	echo "Maps:"
	cat "$DIR/maps"
	echo "For details check $DIR"
) | logger -p local6.info -t "frysk-core-dump-$1"

Copy that into a file $SOMEWHERE/frysk-core-dump. Then do a chmod +x $SOMEWHERE/frysk-core-dump and a chown root:root $SOMEWHERE/frysk-core-dump. Now, tell the kernel that core dumps should be handed to this script:

# echo "|$SOMEWHERE/frysk-core-dump %p %t %u %g %s %h" > /proc/sys/kernel/core_pattern
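The order of the % specifiers in that line has to match the positional arguments the script reads ($1 to $6). As a small sketch of that correspondence, with made-up helper names and an example handler path:

```python
# Hypothetical helpers; the specifier order mirrors the core_pattern line above.
FIELDS = [
    ("%p", "pid"),
    ("%t", "timestamp"),
    ("%u", "uid"),
    ("%g", "gid"),
    ("%s", "signal"),
    ("%h", "hostname"),
]


def core_pattern(handler):
    """Build the core_pattern value that pipes core dumps to `handler`."""
    return "|" + " ".join([handler] + [spec for spec, _ in FIELDS])


def name_handler_args(argv):
    """Give names to the positional arguments the handler receives."""
    if len(argv) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} arguments, got {len(argv)}")
    return {name: value for (_, name), value in zip(FIELDS, argv)}


# The path below is only an example location for the script.
print(core_pattern("/usr/local/bin/frysk-core-dump"))
```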

Finally, increase RLIMIT_CORE to actually enable core dumps. ulimit -c unlimited is a good idea. This will enable them only for your shell and
everything it spawns. In /etc/security/limits.conf you can enable
them for all users. I haven’t found out yet how to enable them globally
in Fedora though, i.e. for every single process that is started after boot including system daemons.

You can test this by running sleep 4711 and then dumping core with C-\. The stacktrace should appear right away in /var/log/messages.

This script will automatically try to install the debugging symbols for the crashing application via yum. In some cases it hence might take a while until the backtrace appears in syslog.

Don’t forget to install Frysk before trying this script!

You can’t believe how useful this script is. Something crashed and the backtrace is already waiting for you! It’s a bugfixer’s wet dream.

I am a bit surprised though that no one else came up with this before me. Or maybe I am just too dumb to use Google properly?

People of the Free World [1]!

2008-10-22 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/free-sound-themes.html

GNOME 2.24 supports XDG
sound themes. Unfortunately however right now there is only a single sound
theme in existence: the sound-theme-freedesktop
— which is pretty basic.

Help us change this! There are many web sites like art.gnome.org which provide a large selection
of graphical themes for Gtk+, Metacity, icon sets and so on. We want to see a similarly large selection of sound themes available! And we’d like you to contribute to this!

How do you prepare sound themes? Read the XDG Sound Theming
and the XDG Sound
Naming specifications. Start with basing your work on the aforementioned sound-theme-freedesktop.
And then just go ahead!

Please note that only a subset of the sounds listed in the Sound Naming
Specification is currently hooked up properly — i.e. generated when “input
feedback” is enabled or triggered by applications. Nonetheless it makes sense
to include them in your theme, because eventually they will be hooked up.

When you put a theme together, make sure that you only select sounds that
have a sensible Free Software license or, if you have produced them yourself,
that you pick a good license for them. GPLv2+, LGPLv2+, CC-BY-SA 3.0 and CC-BY 3.0
are good choices.

Not everyone is as lucky as Richard Hughes and has a mom who is
practically an endless source of special effect sounds. If your mom sucks then
don’t despair! The OLPC team has compiled a huge set of Free sounds
that is waiting to be made an XDG sound theme. I am eagerly looking forward to your sound
themes that make use of “The Berklee Sampling
Archive – Volume 13 – synthesizer – fx (126 samples) spaceships, lasers,
explosions, machineguns, glisses” to start a war in space each time you
click a button on your screen!^[2]

Footnotes

[1] Free as in free desktops that is.

[2] OK, to be honest I am not actually that eagerly looking forward to that. Spacewar-at-your-fingertips is pretty lame in comparison to a theme called “Richard’s Mom”^[3].

[3] You have no idea what all those Hughsie’s-Mom-jokes are about? Then listen to the sound files that are shipped with gnome-power-manager!

Responses to my Audio API Guide

2008-09-26 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/guide-to-sound-apis-followup.html

My Audio API guide got quite a few responses.

The Good

Takashi
likes it. And so
does David. Which is great because both are key people in the Linux
multimedia community.

It made it to LWN. I sincerely
and humbly hope this is not going to stay the only news site picking this up.
😉

The safe ALSA part of the recommendations will most likely be added to the ALSA documentation soon. The GNOME-relevant part I will be adding to the GNOME platform overview.

The Bad

Aaron
basically likes it, although he appears disappointed that KDE’s and Qt’s
Phonon wasn’t mentioned more positively. Aaron is very fair in his criticism.
Nonetheless I don’t think it is valid. My guide is not a list of alternatives.
It’s a list of recommendations. My recommendations. I do believe that my
recommendations very much match the mainstream of the opinions of the key
people in Linux multimedia and desktop audio. Of course I don’t know nearly
every one of the key hackers in Linux multimedia. But I do know most of those
who are actively interested in collaboration, whose projects have a lot of
mindshare and who attend the conferences that matter for Linux desktop audio.

Also see Christian’s comments on Aaron’s post.

The Ugly

It wasn’t my intention to start another GNOME-vs.-KDE flamefest.
Unfortunately a lot of people took this as great opportunity to troll at the
various blog comment forums. I guess it is inevitable that some of those whose
favourite software is not listed on a recommendation guide like this start to
clamour about that. It’s a pity not everyone who thinks I am treating KDE
unfairly criticises that as fairly and reasonably as Aaron. Anyway, I humbly take this as a sign that
people do consider this guide to be relevant and much needed. 😉

Everybody Loves Pretty Graphics

2008-09-25 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/the-linux-audio-stack.html

As kind of a followup to my Guide to Linux
Sound APIs here’re some pretty graphics I just drew. (At least “pretty” to
the degree of my limited drawing abilities). It’s a block diagram depicting the
Linux audio stack. A lot of people already drew something similar, and often
enough the result was horribly complicated and, in its conclusion,
disappointing. So, here’s my try:

Linux Audio Stack

The components interface with each other across the horizontal lines. The
vertical lines separate unrelated components. The drawing only includes
modern, supported APIs and systems as described in the aforementioned blog
article. It (hopefully) shows that things in the Linux audio world are not
all that bad at all and we have workable answers for most questions without
too much complexity, although they might not entirely make everyone overly
happy.

In an outburst of bias I completely omitted KDE-specific technologies from
this drawing. I guess even if I had included them it’d be called biased
anyway, so why bother? Also, they would have distracted the reader and complicated the
drawing considerably due to KDE’s affection for pluggable backends. So: if you
care about KDE, please ignore this diagram.

A Guide Through The Linux Sound API Jungle

2008-09-24 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/guide-to-sound-apis.html

At the Audio MC at the Linux Plumbers Conference one
thing became very clear: it is very difficult for programmers to
figure out which audio API to use for which purpose and which API not
to use when doing audio programming on Linux. So here’s my try to
guide you through this jungle:

What do you want to do?

I want to write a media-player-like application!: Use GStreamer! (Unless your focus is only KDE in which case Phonon might be an alternative.)
I want to add event sounds to my application!: Use libcanberra, install your sound files according to the XDG Sound Theming/Naming Specifications! (Unless your focus is only KDE in which case KNotify might be an alternative although it has a different focus.)
I want to do professional audio programming, hard-disk recording, music synthesizing, MIDI interfacing!: Use JACK and/or the full ALSA interface.
I want to do basic PCM audio playback/capturing!: Use the safe ALSA subset.
I want to add sound to my game!: Use the audio API of SDL for full-screen games, libcanberra for simple games with standard UIs such as Gtk+.
I want to write a mixer application!: Use the layer you want to support directly: if you want to support enhanced desktop software mixers, use the PulseAudio volume control APIs. If you want to support hardware mixers, use the ALSA mixer APIs.
I want to write audio software for the plumbing layer!: Use the full ALSA stack.
I want to write audio software for embedded applications!: For technical appliances usually the safe ALSA subset is a good choice, this however depends highly on your use-case.

You want to know more about the different sound APIs?

GStreamer: GStreamer is the de-facto
standard media streaming system for Linux desktops. It supports decoding and
encoding of audio and video streams. You can use it for a wide range of
purposes from simple audio file playback to elaborate network
streaming setups. GStreamer supports a wide range of CODECs and audio
backends. GStreamer is not particularly suited for basic PCM playback
or low-latency/realtime applications. GStreamer is portable and not
limited in its use to Linux. Among the supported backends are ALSA, OSS, PulseAudio. [Programming Manuals and References]
libcanberra: libcanberra
is an abstract event sound API. It implements the XDG
Sound Theme and Naming Specifications. libcanberra is a blessed
GNOME dependency, but itself has no dependency on GNOME/Gtk/GLib and can be
used with other desktop environments as well. In addition to an easy
interface for playing sound files, libcanberra provides caching
(which is very useful for networked thin clients) and allows passing
of various meta data to the underlying audio system which then can be
used to enhance user experience (such as positional event sounds) and
for improving accessibility. libcanberra supports multiple backends
and is portable beyond Linux. Among the supported backends are ALSA, OSS, PulseAudio, GStreamer. [API Reference]
JACK: JACK is a sound system for
connecting professional audio production applications and hardware
output. Its focus is low-latency and application interconnection. It
is not useful for normal desktop or embedded use. It is not an API
that is particularly useful if all you want to do is simple PCM
playback. JACK supports multiple backends, although ALSA is best
supported. JACK is portable beyond Linux. Among the supported backends are ALSA, OSS. [API Reference]
Full ALSA: ALSA is the Linux API
for doing PCM playback and recording. ALSA is very focused on
hardware devices, although other backends are supported as well (to a
limited degree, see below). ALSA as a name is used both for the Linux
audio kernel drivers and a user-space library that wraps these. ALSA — the library — is
comprehensive, and portable (to a limited degree). The full ALSA API
can appear very complex and is large. However it supports almost
everything modern sound hardware can provide. Some of the
functionality of the ALSA API is limited in its use to actual hardware
devices supported by the Linux kernel (in contrast to software sound
servers and sound drivers implemented in user-space such as those for
Bluetooth and FireWire audio — among others) and Linux specific
drivers. [API
Reference]
Safe ALSA: Only a subset of the full ALSA API works on all backends ALSA
supports. It is highly recommended to stick to this safe subset
if you do ALSA programming to keep programs portable, future-proof and
compatible with sound servers, Bluetooth audio and FireWire audio. See
below for more details about which functions of ALSA are considered
safe. The safe ALSA API is a suitable abstraction for basic,
portable PCM playback and recording — not just for ALSA kernel driver
supported devices. Among the supported backends are ALSA kernel driver
devices, OSS, PulseAudio, JACK.
Phonon and KNotify: Phonon is a high-level
abstraction for media streaming systems such as GStreamer, but goes a
bit further than that. It supports multiple backends. KNotify is a
system for “notifications”, which goes beyond mere event
sounds. However it does not support the XDG Sound Theming/Naming
Specifications at this point, and also doesn’t support caching or
passing of event meta-data to an underlying sound system. KNotify
supports multiple backends for audio playback via Phonon. Both APIs
are KDE/Qt specific and should not be used outside of KDE/Qt
applications. [Phonon API Reference] [KNotify API Reference]
SDL: SDL is a portable API
primarily used for full-screen game development. Among other stuff it
includes a portable audio interface. Among others SDL supports OSS,
PulseAudio, ALSA as backends. [API Reference]
PulseAudio: PulseAudio is a sound system
for Linux desktops and embedded environments that runs in user-space
and (usually) on top of ALSA. PulseAudio supports network
transparency, per-application volumes, spatial event sounds, allows
switching of sound streams between devices on-the-fly, policy
decisions, and many other high-level operations. PulseAudio adds a glitch-free
audio playback model to the Linux audio stack. PulseAudio is not
useful in professional audio production environments. PulseAudio is
portable beyond Linux. PulseAudio has a native API and also supports
the safe subset of ALSA, in addition to limited,
LD_PRELOAD-based OSS compatibility. Among others PulseAudio supports
OSS and ALSA as backends and provides connectivity to JACK. [API
Reference]
OSS: The Open Sound System is a
low-level PCM API supported by a variety of Unixes including Linux. It
started out as the standard Linux audio system and is supported on
current Linux kernels in the API version 3 as OSS3. OSS3 is considered
obsolete and has been fully replaced by ALSA. A successor to OSS3
called OSS4 is available but plays virtually no role on Linux and is
not supported in standard kernels or by any of the relevant
distributions. The OSS API is very low-level, based around direct
kernel interfacing using ioctl()s. It is hence awkward to use and
can practically not be virtualized for usage on non-kernel audio
systems like sound servers (such as PulseAudio) or user-space sound
drivers (such as Bluetooth or FireWire audio). OSS3’s timing model
cannot properly be mapped to software sound servers at all, and is
also problematic on non-PCI hardware such as USB audio. Also, OSS does
not do sample type conversion, remapping or resampling if
necessary. This means that clients that properly want to support OSS
need to include a complete set of converters/remappers/resamplers for
the case when the hardware does not natively support the requested
sampling parameters. With modern sound cards it is very common to
support only S32LE samples at 48 kHz and nothing else. If an OSS client
assumes it can always play back S16LE samples at 44.1 kHz it will thus
fail. OSS3 is portable to other Unix-like systems, various differences
however apply. OSS also doesn’t support surround sound and other
functionality of modern sound systems properly. OSS should be
considered obsolete and not be used in new applications. ALSA and
PulseAudio have limited LD_PRELOAD-based compatibility with OSS. [Programming Guide]
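
To make the conversion burden mentioned for OSS a bit more concrete, here is a minimal sketch of the simplest converter such a client would have to carry itself: widening signed 16-bit samples to signed 32-bit. The helper name s16_to_s32() is made up for this illustration and is not part of OSS or ALSA, and the samples are assumed to be in native byte order. Resampling from 44.1 kHz to 48 kHz and channel remapping would need considerably more code than this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of any sound API: widen n signed 16-bit
 * samples to signed 32-bit by moving each value into the upper 16 bits.
 * Samples are assumed to be in native byte order. */
static void s16_to_s32(const int16_t *in, int32_t *out, size_t n) {
	size_t i;

	for (i = 0; i < n; i++)
		/* Multiply instead of shifting: left-shifting a negative
		 * value is undefined behaviour in C. */
		out[i] = (int32_t) in[i] * 65536;
}
```

With the safe ALSA subset and a plug: device this kind of code should not be needed in the application at all.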

All sound systems and APIs listed above are supported in all
relevant current distributions. For libcanberra support the newest
development release of your distribution might be necessary.

All sound systems and APIs listed above are suitable for
development for commercial (read: closed source) applications, since
they are licensed under LGPL or more liberal licenses or no client
library is involved.

You want to know why and when you should use a specific sound API?

GStreamer: GStreamer is best used for very high-level needs: i.e. you want to
play an audio file or video stream and do not care about all the tiny
details down to the PCM or codec level.
libcanberra: libcanberra is best used when adding sound feedback to user input
in UIs. It can also be used to play simple sound files for
notification purposes.
JACK: JACK is best used in professional audio production and where interconnecting applications is required.
Full ALSA: The full ALSA interface is best used for software on the “plumbing layer” or when you want to make use of very specific hardware features, which might be needed for audio production purposes.
Safe ALSA: The safe ALSA interface is best used for software that wants to output/record basic PCM data from hardware devices or software sound systems.
Phonon and KNotify: Phonon and KNotify should only be used in KDE/Qt applications and only for high-level media playback, resp. simple audio notifications.
SDL: SDL is best used in full-screen games.
PulseAudio: For now, the PulseAudio API should be used only for applications
that want to expose sound-server-specific functionality (such as
mixers) or when a PCM output abstraction layer is already available in
your application and it thus makes sense to add an additional backend
to it for PulseAudio to keep the stack of audio layers minimal.
OSS: OSS should not be used for new programs.

You want to know more about the safe ALSA subset?

Here’s a list of DOS and DONTS in the ALSA API if you care that
your application stays future-proof and works fine with
non-hardware backends or backends for user-space sound drivers such as
Bluetooth and FireWire audio. Some of these recommendations apply to
people using the full ALSA API as well, since some functionality
should be considered obsolete for all cases.

If your application’s code does not follow these rules, you must have
a very good reason for that. Otherwise your code should simply be considered
broken!

DONTS:

Do not use “async handlers”, e.g. via
snd_async_add_pcm_handler() and friends. Asynchronous
handlers are implemented using POSIX signals, which is a very
questionable use of them, especially from libraries and plugins. Even
when you don’t want to limit yourself to the safe ALSA subset
it is highly recommended not to use this functionality. Read
this for a longer explanation why signals for audio IO are
evil.
Do not parse the ALSA configuration file yourself or with
any of the ALSA functions such as snd_config_xxx(). If you
need to enumerate audio devices use snd_device_name_hint()
(and related functions). That
is the only API that also supports enumerating non-hardware audio
devices and audio devices with drivers implemented in userspace.
Do not parse any of the files from
/proc/asound/. Those files only include information about
kernel sound drivers — user-space plugins are not listed there. Also,
the set of kernel devices might differ from the way they are presented
in user-space (i.e. sub-devices are mapped in different ways to
actual user-space devices such as surround51 and suchlike).
Do not rely on stable device indexes from ALSA. Nowadays
they depend on the initialization order of the drivers during boot-up
time and are thus not stable.
Do not use the snd_card_xxx() APIs. For
enumerating use snd_device_name_hint() (and related
functions). snd_card_xxx() is obsolete. It will only list
kernel hardware devices. User-space devices such as sound servers or
Bluetooth audio are not included. snd_card_load() is
completely obsolete these days.
Do not hard-code device strings, especially not
hw:0 or plughw:0 or even dmix — these devices define no channel
mapping and are mapped to raw kernel devices. It is highly recommended
to use exclusively default as device string. If specific
channel mappings are required the correct device strings should be
front for stereo, surround40 for Surround 4.0,
surround41, surround51, and so on. Unfortunately at
this point ALSA does not define standard device names with channel
mappings for non-kernel devices. This means default may only
be used safely for mono and stereo streams. You should probably prefix
your device string with plug: to make sure ALSA transparently
reformats/remaps/resamples your PCM stream for you if the
hardware/backend does not support your sampling parameters
natively.
Do not assume that any particular sample type is supported
except the following ones: U8, S16_LE, S16_BE, S32_LE, S32_BE,
FLOAT_LE, FLOAT_BE, MU_LAW, A_LAW.
Do not use snd_pcm_avail_update() for
synchronization purposes. It should be used exclusively to query the
number of frames that may be written/read right now. Do not use
snd_pcm_delay() to query the fill level of your playback
buffer. It should be used exclusively for synchronisation
purposes. Make sure you fully understand the difference, and note that
the two functions return values that are not necessarily directly
connected!
Do not assume that the mixer controls always know dB information.
Do not assume that all devices support MMAP style buffer access.
Do not assume that the hardware pointer inside the (possibly mmaped) playback buffer is the actual position of the sample in the DAC. There might be an extra latency involved.
Do not try to recover with your own code from ALSA error conditions such as buffer under-runs. Use snd_pcm_recover() instead.
Do not touch buffering/period metrics unless you have
specific latency needs. Develop defensively, handling correctly the
case when the backend cannot fulfill your buffering metrics
requests. Be aware that the buffering metrics of the playback buffer
only indirectly influence the overall latency in many
cases, i.e. setting the buffer size to a fixed value might actually result in
practical latencies that are much higher.
Do not assume that snd_pcm_rewind() is available, that it works, or to which degree it works.
Do not assume that the time when a PCM stream can receive
new data is strictly dependent on the sampling and buffering
parameters and the resulting average throughput. Always make sure to
supply new audio data to the device when it asks for it by signalling
“writability” on the fd. (And similarly for capturing.)
Do not use the “simple” interface snd_spcm_xxx().
Do not use any of the functions marked as “obsolete”.
Do not use the timer, midi, rawmidi, hwdep subsystems.
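
The device string advice above can be sketched in a few lines of C. This is only an illustration under my own assumptions: the enum and both function names are hypothetical and not part of ALSA. Only the device names (default, surround40, surround41, surround51) and the plug: prefix come from the recommendations above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout type for this sketch; ALSA defines no such enum. */
enum layout {
	LAYOUT_MONO,
	LAYOUT_STEREO,
	LAYOUT_SURROUND40,
	LAYOUT_SURROUND41,
	LAYOUT_SURROUND51
};

/* Map a layout to a device name: "default" for mono and stereo, the
 * surroundXX names where a specific channel mapping is required. */
static const char *device_for_layout(enum layout l) {
	switch (l) {
	case LAYOUT_SURROUND40: return "surround40";
	case LAYOUT_SURROUND41: return "surround41";
	case LAYOUT_SURROUND51: return "surround51";
	default:                return "default";
	}
}

/* Write "plug:" followed by the device name into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int plug_device_string(enum layout l, char *buf, size_t size) {
	static const char prefix[] = "plug:";
	const char *name = device_for_layout(l);
	size_t plen = strlen(prefix), nlen = strlen(name);

	if (plen + nlen + 1 > size)
		return -1;
	memcpy(buf, prefix, plen);
	memcpy(buf + plen, name, nlen + 1); /* copies the trailing NUL too */
	return 0;
}
```

The resulting string is what you would pass as the device name to snd_pcm_open(), rather than a hard-coded hw:0.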

DOS:

Use snd_device_name_hint() for enumerating audio devices.
Use the simple mixer API (snd_mixer_selem_xxx()) instead of raw snd_ctl_xxx().
For synchronization purposes use snd_pcm_delay().
For checking buffer playback/capture fill level use snd_pcm_avail_update().
Use snd_pcm_recover() to recover from errors returned by any of the ALSA functions.
If possible use the largest buffer sizes the device supports to maximize power saving and drop-out safety. Use snd_pcm_rewind() if you need to react to user input quickly.
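
To illustrate the difference between the two queries, here is a small sketch of the arithmetic an application might do with their results. Both helper names are hypothetical. The sketch assumes playback, where the value returned by snd_pcm_avail_update() is the free space in the buffer, and it assumes you already know the buffer size and sample rate that were actually configured.

```c
#include <stdint.h>

/* Hypothetical helper: frames currently queued in a playback buffer,
 * given the buffer size and the free space reported by
 * snd_pcm_avail_update(). Returns 0 for out-of-range input, e.g. after
 * an under-run when more space than the buffer size is reported. */
static long playback_fill_frames(long buffer_frames, long avail_frames) {
	if (avail_frames < 0 || avail_frames > buffer_frames)
		return 0;
	return buffer_frames - avail_frames;
}

/* Hypothetical helper: convert a frame count, such as the delay stored
 * by snd_pcm_delay(), to microseconds. Returns 0 if the rate is 0. */
static int64_t frames_to_usec(long frames, unsigned rate_hz) {
	if (rate_hz == 0)
		return 0;
	return (int64_t) frames * 1000000 / (int64_t) rate_hz;
}
```

The fill level tells you how much you may still write. The delay converted to time tells you when a sample written now will actually be heard, which is the value you want for audio/video synchronization.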

FAQ

What about ESD and NAS?: ESD and NAS are obsolete, both as API and as sound daemon. Do not develop for them any further.
ALSA isn’t portable!: That’s not true! Actually the user-space library is relatively portable, it even includes a backend for OSS sound devices. There is no real reason that would disallow using the ALSA libraries on other Unixes as well.
Portability is key to me! What can I do?: Unfortunately no truly portable (i.e. to Win32) PCM API is
available right now that I could truly recommend. The systems shown
above are more or less portable at least to Unix-like operating
systems. That does not mean however that there are suitable backends
for all of them available. If you care about portability to Win32 and
MacOS you probably have to find a solution outside of the
recommendations above, or contribute the necessary
backends/portability fixes. None of the systems (with the exception of
OSS) is truly bound to Linux or Unix-like kernels.
What about PortAudio?: I don’t think that PortAudio is a very good API for Unix-like operating systems. I cannot recommend it, but it’s your choice.
Oh, why do you hate OSS4 so much?: I don’t hate anything or anyone. I just don’t think OSS4 is a
serious option, especially not on Linux. On Linux, it is also
completely redundant due to ALSA.
You idiot, you have no clue!: You are right, I totally don’t. But that doesn’t hinder me from recommending things. Ha!
Hey I wrote/know this tiny new project which is an awesome abstraction layer for audio/media!: Sorry, that’s not sufficient. I only list software here that is known to be sufficiently relevant and sufficiently well maintained.

Final Words

Of course these recommendations are very basic and are only intended to
lead into the right direction. For each use-case different necessities
apply and hence options that I did not consider here might become
viable. It’s up to you to decide how much of what I wrote here
actually applies to your application.

This summary only includes software systems that are considered
stable and universally available at the time of writing. In the
future I hope to introduce a more suitable and portable replacement
for the safe ALSA subset of functions. I plan to update this text
from time to time to keep things up-to-date.

If you feel that I forgot a use case or an important API, then
please contact me or leave a comment. However, I think the summary
above is sufficiently comprehensive and if an entry is missing I most
likely deliberately left it out.

(Also note that I am upstream for both PulseAudio and libcanberra and made some minor contributions to ALSA, GStreamer and some other of the systems listed above. Yes, I am biased.)

Oh, and please syndicate this, digg it. I’d like to see this guide to be well-known all around the Linux community. Thank you!

My take on the Plumbers Conference

2008-09-22 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/lpc-summary.html

I just came back from the Linux
Plumbers Conference. As some of you might know I was doing an MC about Audio
there. Don Marti attended the track and wrote up an interesting article over at LWN.
It’s a recommended read, including the immense number of comments it already
resulted in. (I will try to reply to all comments coming up, in case you have questions — just post them over at LWN)

I must really say that calling that article “It’s a mess” and
highlighting my critical comments on the situation this way makes me feel
slightly uncomfortable, though. Sure, we have some issues to fix and it’s the
words I chose at the conference — but it’s only part of the story. Things are
not really all that bad, and we have enough good stuff to focus on.

I enjoyed LPC, and especially the audio MC a lot. The discussions during the
MC were lively, focussed and very enlightening. Much better than at other
conferences I have been to, the information flow was two-way: instead of just
having a speaker who talks about stuff and attendees that listen to them, here
all talks were very interactive — a lot of people in the audience had
something to say, and the others did benefit from it.

LPC organization was flawless, Portland is awesome. The food was good, too.
To summarize: I am happy, very happy! I look forward to another iteration next year and hope we’ll be able to
have an audio MC then, too.

LPC organizers: rock on! Takashi, Jonathan: thank you very much for your
participation in the Audio MC!

(If you are not subscribed to LWN but want to read the article linked above, ping me, I can hand out a few free links. Alternatively, wait for Thursday and it will be available for free.)

Audio BoF

2008-09-18 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/audio-bod-lpc.html

To whom it may concern: there’ll be an Audio BoF tomorrow (Thu) at the Linux Plumbers Conference, starting at 4
pm. Don’t miss it.

New libcanberra backends

2008-08-28 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/canberra-oh-eight.html

I released libcanberra 0.8 a
few hours ago. Biggest changes are some portability fixes for Solaris/FreeBSD,
inclusion of an OSS backend (contributed by Joe Marcus Clarke) and a
GStreamer backend (contributed by Marc-André Lureau). This will hopefully make
certain doubts regarding libcanberra void.

Oh, and libcanberra now has a homepage.

PulseAudio on Transifex

2008-08-28 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/pa-on-tx.html

Thanks to Dimitris Glezos
PulseAudio and its auxiliary tools are
now available on Fedora’s Transifex for
translation. If you want to contribute translations, please submit them via
Transifex, which will then result in direct commits to our upstream source
code repositories — without further delay or workload on my side. Submission via
other ways (bug report, mail …) will no longer be accepted.

Submit your translations now for
PulseAudio, for
the volume control, and for
the preferences dialog. And while we are at it, Avahi’s waiting
for your translations, too.

Scott,

2008-08-12 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/apple-development-platform.html

in
contrast to what you say the Apple audio stack (CoreAudio) is far less
streamlined than it might appear on first sight. The different APIs that make
up the Apple audio stack are far more redundant than you might think. Also,
they are different in programming style, and you can list at least as many
separate components for different areas of audio with different API/naming
styles as you just did for the Linux audio stack.

Listing two components of the Linux audio stack that are considered
obsolete these days, and listing one item twice doesn’t really help making your
post unassailable.

Having said that, yes, our Linux audio stack is still chaotic,
redundant, badly documented and incomplete. You are very welcome to help fixing
this. But just doing a bit of PR and sticking a single name on the sum of it all doesn’t
even touch the real problems we have with the audio APIs on Linux.

Free software development is in its very essence distributed. The fact that
our APIs sometimes appear a bit higgledy-piggledy is probably just an
inevitable consequence of this.

String Pools

2008-07-26 Lennart Poettering

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/string-pools.html

In part 2.4.3 of Ulrich Drepper’s excellent How To Write Shared
Libraries (which unfortunately is a bit out-of-date these days) Ulrich
suggests replacing arrays of constant strings by a single concatenated string
plus an index lookup table, to avoid unnecessary relocations during startup of
ELF programs. Maintaining this string pool is however troublesome:
it is hard to read and difficult to edit. In appendix B Ulrich
lists an example C excerpt which contains some code for simplifying the
maintenance of such string pools, after an idea from Bruno Haible. In my
opinion however that suggestion is not that much simpler, and requires
splitting off the actual strings into a separate source file. Ugly!
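
For illustration, a completely hand-maintained string pool of the kind described above might look roughly like this. This is my own minimal sketch, not code from Ulrich’s paper, and the names pool, pool_index and pool_get() are made up. It shows why the approach is awkward: every offset has to be counted and kept in sync by hand.

```c
#include <stddef.h>

/* All strings concatenated into one array. Adjacent literals are joined
 * by the compiler, and each "\0" terminates one entry. */
static const char pool[] =
	"waldo\0"
	"uxknurz\0"
	"foobar\0"
	"fubar";

/* Offset of each entry in pool[]. Plain integers need no relocation,
 * unlike an array of pointers. Must be kept in sync by hand. */
static const unsigned char pool_index[] = { 0, 6, 14, 21 };

#define POOL_COUNT (sizeof(pool_index) / sizeof(pool_index[0]))

/* Return entry i, or NULL if i is out of range. */
static const char *pool_get(size_t i) {
	return i < POOL_COUNT ? pool + pool_index[i] : NULL;
}
```

Insert a string in the middle and every later offset has to be recalculated, which is exactly the kind of chore one would want a tool to do.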

Some Free Software uses string pools to speed up relocation, e.g. GTK+.
Some development tools like gperf
contain support for string pools.

All solutions for maintaining string pools I could find on the Internet were not exactly
beautiful. Either they were completely manual, manual plus a validity checking
tool, or very very cumbersome. Googling around I was unable to find a satisfactory tool for this purpose^[1].

After Diego Petteno complained about
my heavy use of arrays of constant strings in libatasmart I sat down to
change the situation, and wrote strpool.c,
a simple parser for a very, very minimal subset of C, written in plain ANSI C.
It looks for two special comment markers /* %STRINGPOOLSTART% */ and
/* %STRINGPOOLSTOP% */, moves all immediate strings between those
markers into a common string pool and rewrites the input with the strings
replaced by indexes. Code accessing those strings must use the
special _P() macro. With these minimal changes to a
source file, passing it through strpool.c will automatically rewrite
it to a string-poolized version. The nice thing about this is that the
necessary changes in the source are minimal, and the code stays compilable with
and without passing it through the strpool.c preprocessor.

Here’s an example. First the original non-string-poolized version:

#include <stdio.h>

static const char* const table[] = {
	"waldo",
	"uxknurz",
	"foobar",
	"fubar"
};

int main(int argc, char* argv[]) {
	printf("%s\n", table[2]);
	return 0;
}

For later use with strpool.c we change this like this:

#include <stdio.h>

#ifndef STRPOOL
#define _P(x) x
#endif

/* %STRINGPOOLSTART% */
static const char* const table[] = {
	"waldo",
	"uxknurz",
	"foobar",
	"fubar"
};
/* %STRINGPOOLSTOP% */

int main(int argc, char* argv[]) {
	printf("%s\n", _P(table[2]));
	return 0;
}

When passed through strpool.c this will be rewritten as:

/* Saved 3 relocations, saved 0 strings (0 b) due to suffix compression. */
static const char _strpool_[] =
	"waldo\0"
	"uxknurz\0"
	"foobar\0"
	"fubar\0";
#ifndef STRPOOL
#define STRPOOL
#endif
#ifndef _P
#define _P(x) (_strpool_ + ((x) - (const char*) 1))
#endif

#include <stdio.h>

#ifndef STRPOOL
#define _P(x) x
#endif

/* %STRINGPOOLSTART% */
static const char* const table[] = {
	((const char*) 1),
	((const char*) 7),
	((const char*) 15),
	((const char*) 22)
};
/* %STRINGPOOLSTOP% */

int main(int argc, char* argv[]) {
	printf("%s\n", _P(table[2]));
	return 0;
}

All three versions can be compiled directly with gcc. However, the version
that was passed through strpool.c compresses the number of
relocations for the table array from 4 to 1. Which isn’t much of a
difference, but the larger your tables are the more relevant the difference in
the number of necessary relocations gets.
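
The indexes in the rewritten table (1, 7, 15, 22) are simply the offsets of the strings in the pool plus one. The following sketch shows how such offsets can be computed. It is not taken from strpool.c, the function name build_pool() is hypothetical, and it does no suffix compression.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append each of the n strings, including its
 * terminating NUL, to pool and store its 1-based offset in index[].
 * Returns the number of bytes used, or 0 if pool is too small. */
static size_t build_pool(const char *const *strings, size_t n,
			 char *pool, size_t pool_size, size_t *index) {
	size_t used = 0, i;

	for (i = 0; i < n; i++) {
		size_t len = strlen(strings[i]) + 1; /* include the NUL */

		if (len > pool_size - used)
			return 0;
		memcpy(pool + used, strings[i], len);
		index[i] = used + 1; /* 1-based, as in the rewritten table */
		used += len;
	}
	return used;
}
```

Looking up an entry is then the same arithmetic the _P() macro does: pool + (index - 1).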

A more realistic example is atasmart.c which after being preprocessed with strpool.c looks like this. In this specific example the number of necessary startup relocations goes down from > 100 to 9.

I am not sure if the parser is 100% correct, but it works fine with all sources I tried. It even does suffix compression like gcc does for normal strings too.