Tag Archives: streaming

AMF0/3 encoding/decoding in Node.JS

Post Syndicated from Delian Delchev original http://deliantech.blogspot.com/2014/06/amf03-encodingdecoding-in-nodejs.html

I am writing my own RTMP restreamer (RTMP is Adobe’s dying streaming protocol, widely used with Flash) in Node.JS. Although there are quite a few RTMP modules, none is complete: none operates on Node.JS buffers, and none fully supports either AMF0 or AMF3 encoding and decoding. So I had to write one of my own.

The first module is the AMF0/AMF3 utils that allow me to encode or decode AMF data. AMF is a binary encoding used in Adobe’s protocols, very similar to BER (used in ITU’s protocols) but supporting complex objects. In general, the goal of AMF is to encode ActionScript objects into binary. As ActionScript is a language belonging to the JavaScript family, ActionScript objects are basically JavaScript objects (with the exception of some simplified arrays).

My module is named node-amfutils and is now available in the public NPM repository as well as here: https://github.com/delian/node-amfutils

It is not fully complete nor very well tested, as I have a very limited environment for testing. However, it works for me and provides the best AMF0 and AMF3 support currently available for Node.JS:

It can encode/decode all the objects defined in both AMF0 and AMF3 (the other AMF modules in the npm repository support only partial AMF0 or partial AMF3).

It uses Node.JS buffers (it is not necessary to do string-to-buffer-to-string conversions, as you have to do with the other modules).

It is easy to use this module. You just have to do something like this:

    var amfUtils = require('node-amfutils');
    var buffer = amfUtils.amf0Encode([{ a: "xxx" }, { b: null }]);

A Guide Through The Linux Sound API Jungle

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/guide-to-sound-apis.html

At the Audio MC at the Linux Plumbers Conference one
thing became very clear: it is very difficult for programmers to
figure out which audio API to use for which purpose and which API not
to use when doing audio programming on Linux. So here’s my attempt to
guide you through this jungle:

What do you want to do?

I want to write a media-player-like application!
Use GStreamer! (Unless your focus is only KDE, in which case Phonon might be an alternative.)

I want to add event sounds to my application!
Use libcanberra, and install your sound files according to the XDG Sound Theming/Naming Specifications! (Unless your focus is only KDE, in which case KNotify might be an alternative, although it has a different focus.)

I want to do professional audio programming, hard-disk recording, music synthesizing, MIDI interfacing!
Use JACK and/or the full ALSA interface.

I want to do basic PCM audio playback/capturing!
Use the safe ALSA subset.

I want to add sound to my game!
Use the audio API of SDL for full-screen games, libcanberra for simple games with standard UIs such as Gtk+.

I want to write a mixer application!
Use the layer you want to support directly: if you want to support enhanced desktop software mixers, use the PulseAudio volume control APIs. If you want to support hardware mixers, use the ALSA mixer APIs.

I want to write audio software for the plumbing layer!
Use the full ALSA stack.

I want to write audio software for embedded applications!
For technical appliances the safe ALSA subset is usually a good choice; this, however, depends highly on your use case.

You want to know more about the different sound APIs?

GStreamer
GStreamer is the de-facto
standard media streaming system for Linux desktops. It supports decoding and
encoding of audio and video streams. You can use it for a wide range of
purposes from simple audio file playback to elaborate network
streaming setups. GStreamer supports a wide range of CODECs and audio
backends. GStreamer is not particularly suited for basic PCM playback
or low-latency/realtime applications. GStreamer is portable and not
limited in its use to Linux. Among the supported backends are ALSA, OSS, PulseAudio. [Programming Manuals and References]
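
To give an idea how high-level GStreamer is, here’s a minimal playback sketch (a rough illustration, not canonical GStreamer code; the file URI and the pkg-config module name are placeholders to adapt to your setup):

    /* Minimal GStreamer playback sketch: plays a file until EOS or error.
     * Build: gcc play.c $(pkg-config --cflags --libs gstreamer-1.0) */
    #include <gst/gst.h>

    int main(int argc, char *argv[]) {
        gst_init(&argc, &argv);

        /* playbin assembles the whole decode/output pipeline for us */
        GstElement *play = gst_element_factory_make("playbin", "play");
        g_object_set(G_OBJECT(play), "uri", "file:///tmp/example.ogg", NULL);
        gst_element_set_state(play, GST_STATE_PLAYING);

        /* Block until the stream ends or an error is posted on the bus */
        GstBus *bus = gst_element_get_bus(play);
        GstMessage *msg = gst_bus_timed_pop_filtered(bus, GST_CLOCK_TIME_NONE,
                GST_MESSAGE_ERROR | GST_MESSAGE_EOS);

        gst_message_unref(msg);
        gst_object_unref(bus);
        gst_element_set_state(play, GST_STATE_NULL);
        gst_object_unref(play);
        return 0;
    }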

libcanberra
libcanberra
is an abstract event sound API. It implements the XDG Sound Theme and Naming Specifications. libcanberra is a blessed
GNOME dependency, but itself has no dependency on GNOME/Gtk/GLib and can be
used with other desktop environments as well. In addition to an easy
interface for playing sound files, libcanberra provides caching
(which is very useful for networked thin clients) and allows passing
of various meta data to the underlying audio system which then can be
used to enhance user experience (such as positional event sounds) and
for improving accessibility. libcanberra supports multiple backends
and is portable beyond Linux. Among the supported backends are ALSA, OSS, PulseAudio, GStreamer. [API Reference]
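
To show how simple the interface is, here’s a minimal sketch of playing one event sound (error handling omitted; the event id comes from the XDG sound naming spec):

    /* Minimal libcanberra sketch: fire one event sound and exit.
     * Build: gcc beep.c $(pkg-config --cflags --libs libcanberra) */
    #include <canberra.h>
    #include <unistd.h>

    int main(void) {
        ca_context *c = NULL;
        ca_context_create(&c);

        /* "button-pressed" is an XDG sound naming spec event id; the
         * active sound theme decides which sample is actually played */
        ca_context_play(c, 0,
                        CA_PROP_EVENT_ID, "button-pressed",
                        CA_PROP_EVENT_DESCRIPTION, "Button was pressed",
                        NULL);

        sleep(1); /* playback is asynchronous; a real app keeps the context */
        ca_context_destroy(c);
        return 0;
    }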

JACK

JACK is a sound system for
connecting professional audio production applications and hardware
output. Its focus is low-latency and application interconnection. It
is not useful for normal desktop or embedded use. It is not an API
that is particularly useful if all you want to do is simple PCM
playback. JACK supports multiple backends, although ALSA is best
supported. JACK is portable beyond Linux. Among the supported backends are ALSA, OSS. [API Reference]
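
For a feel of the programming model, here’s a minimal pass-through client sketch (assuming a running JACK server; connections to other ports are made externally, e.g. with a patchbay):

    /* Minimal JACK client sketch: copies its input port to its output port.
     * Build: gcc thru.c $(pkg-config --cflags --libs jack) */
    #include <jack/jack.h>
    #include <string.h>
    #include <unistd.h>

    static jack_port_t *in_port, *out_port;

    /* Runs in JACK's realtime thread: no locks, no malloc, no I/O here */
    static int process(jack_nframes_t nframes, void *arg) {
        jack_default_audio_sample_t *in  = jack_port_get_buffer(in_port, nframes);
        jack_default_audio_sample_t *out = jack_port_get_buffer(out_port, nframes);
        memcpy(out, in, sizeof(jack_default_audio_sample_t) * nframes);
        return 0;
    }

    int main(void) {
        jack_client_t *client = jack_client_open("thru", JackNullOption, NULL);
        in_port  = jack_port_register(client, "in",  JACK_DEFAULT_AUDIO_TYPE,
                                      JackPortIsInput, 0);
        out_port = jack_port_register(client, "out", JACK_DEFAULT_AUDIO_TYPE,
                                      JackPortIsOutput, 0);
        jack_set_process_callback(client, process, NULL);
        jack_activate(client);
        sleep(60); /* audio now flows through process() */
        jack_client_close(client);
        return 0;
    }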

Full ALSA

ALSA is the Linux API
for doing PCM playback and recording. ALSA is very focused on
hardware devices, although other backends are supported as well (to a
limited degree, see below). ALSA as a name is used both for the Linux
audio kernel drivers and a user-space library that wraps these. ALSA — the library — is
comprehensive, and portable (to a limited degree). The full ALSA API
can appear very complex and is large. However, it supports almost
everything modern sound hardware can provide. Some of the
functionality of the ALSA API is limited in its use to actual hardware
devices supported by the Linux kernel (in contrast to software sound
servers and sound drivers implemented in user-space such as those for
Bluetooth and FireWire audio — among others) and Linux-specific
drivers. [API Reference]

Safe ALSA

Only a subset of the full ALSA API works on all backends ALSA
supports. It is highly recommended to stick to this safe subset
if you do ALSA programming to keep programs portable, future-proof and
compatible with sound servers, Bluetooth audio and FireWire audio. See
below for more details about which functions of ALSA are considered
safe. The safe ALSA API is a suitable abstraction for basic,
portable PCM playback and recording — not just for ALSA kernel driver
supported devices. Among the supported backends are ALSA kernel driver
devices, OSS, PulseAudio, JACK.
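
As a rough illustration of what sticking to the safe subset looks like (device string "default", one of the always-available sample types, snd_pcm_recover() for error handling; see the DOs and DON’Ts below), here’s a playback sketch:

    /* Safe-ALSA-subset sketch: write one second of silence to "default".
     * Build: gcc play.c $(pkg-config --cflags --libs alsa) */
    #include <alsa/asoundlib.h>
    #include <string.h>

    int main(void) {
        snd_pcm_t *pcm;
        static short buf[44100 * 2];      /* 1 s of 16-bit stereo silence */
        memset(buf, 0, sizeof(buf));

        /* "default" instead of hw:0 -- also works on sound servers */
        if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0) < 0)
            return 1;

        /* S16_LE is among the safe sample types; the last two arguments
         * allow transparent resampling and ask for ~500 ms latency */
        snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                           SND_PCM_ACCESS_RW_INTERLEAVED, 2, 44100, 1, 500000);

        snd_pcm_sframes_t n = snd_pcm_writei(pcm, buf, 44100);
        if (n < 0)
            snd_pcm_recover(pcm, n, 0);   /* e.g. buffer underruns */

        snd_pcm_drain(pcm);
        snd_pcm_close(pcm);
        return 0;
    }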

Phonon and KNotify

Phonon is a high-level
abstraction for media streaming systems such as GStreamer, but goes a
bit further than that. It supports multiple backends. KNotify is a
system for “notifications”, which goes beyond mere event
sounds. However it does not support the XDG Sound Theming/Naming
Specifications at this point, and also doesn’t support caching or
passing of event meta-data to an underlying sound system. KNotify
supports multiple backends for audio playback via Phonon. Both APIs
are KDE/Qt specific and should not be used outside of KDE/Qt
applications. [Phonon API Reference] [KNotify API Reference]

SDL

SDL is a portable API
primarily used for full-screen game development. Among other stuff it
includes a portable audio interface. Among others, SDL supports OSS,
PulseAudio, ALSA as backends. [API Reference]
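
The SDL audio interface is callback-driven; here’s a minimal sketch (SDL 1.2-style API; error handling omitted):

    /* SDL audio sketch: SDL pulls audio from a callback we provide.
     * Build: gcc audio.c $(sdl-config --cflags --libs) */
    #include <SDL.h>
    #include <string.h>

    /* Called by SDL whenever the device needs more audio */
    static void fill_audio(void *userdata, Uint8 *stream, int len) {
        memset(stream, 0, len);  /* a game would mix its sounds in here */
    }

    int main(int argc, char *argv[]) {
        SDL_Init(SDL_INIT_AUDIO);

        SDL_AudioSpec want;
        memset(&want, 0, sizeof(want));
        want.freq = 44100;
        want.format = AUDIO_S16SYS;
        want.channels = 2;
        want.samples = 1024;      /* callback buffer size in sample frames */
        want.callback = fill_audio;

        SDL_OpenAudio(&want, NULL);
        SDL_PauseAudio(0);        /* unpause: the callback starts firing */
        SDL_Delay(1000);
        SDL_CloseAudio();
        SDL_Quit();
        return 0;
    }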

PulseAudio

PulseAudio is a sound system
for Linux desktops and embedded environments that runs in user-space
and (usually) on top of ALSA. PulseAudio supports network
transparency, per-application volumes, spatial event sounds, allows
switching of sound streams between devices on-the-fly, policy
decisions, and many other high-level operations. PulseAudio adds a glitch-free
audio playback model to the Linux audio stack. PulseAudio is not
useful in professional audio production environments. PulseAudio is
portable beyond Linux. PulseAudio has a native API and also supports
the safe subset of ALSA, in addition to limited,
LD_PRELOAD-based OSS compatibility. Among others PulseAudio supports
OSS and ALSA as backends and provides connectivity to JACK. [API Reference]
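
Besides the full asynchronous API, PulseAudio also ships a blocking “simple” API; here’s a sketch of its shape (for plain PCM playback the safe ALSA subset above is usually the recommended route; this is just to illustrate the native API):

    /* PulseAudio "simple" API sketch: blocking playback of silence.
     * Build: gcc pa.c $(pkg-config --cflags --libs libpulse-simple) */
    #include <pulse/simple.h>
    #include <string.h>

    int main(void) {
        static const pa_sample_spec ss = {
            .format = PA_SAMPLE_S16LE, .rate = 44100, .channels = 2
        };
        static short buf[44100 * 2];  /* 1 s of 16-bit stereo silence */
        memset(buf, 0, sizeof(buf));

        pa_simple *s = pa_simple_new(NULL,        /* default server */
                                     "demo",      /* application name */
                                     PA_STREAM_PLAYBACK,
                                     NULL,        /* default device */
                                     "playback",  /* stream description */
                                     &ss, NULL, NULL, NULL);
        if (!s)
            return 1;

        pa_simple_write(s, buf, sizeof(buf), NULL);
        pa_simple_drain(s, NULL);
        pa_simple_free(s);
        return 0;
    }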

OSS

The Open Sound System is a
low-level PCM API supported by a variety of Unixes including Linux. It
started out as the standard Linux audio system and is supported on
current Linux kernels in the API version 3 as OSS3. OSS3 is considered
obsolete and has been fully replaced by ALSA. A successor to OSS3
called OSS4 is available but plays virtually no role on Linux and is
not supported in standard kernels or by any of the relevant
distributions. The OSS API is very low-level, based around direct
kernel interfacing using ioctl()s. It is hence awkward to use and
can practically not be virtualized for usage on non-kernel audio
systems like sound servers (such as PulseAudio) or user-space sound
drivers (such as Bluetooth or FireWire audio). OSS3’s timing model
cannot properly be mapped to software sound servers at all, and is
also problematic on non-PCI hardware such as USB audio. Also, OSS does
not do sample type conversion, remapping or resampling if
necessary. This means that clients that properly want to support OSS
need to include a complete set of converters/remappers/resamplers for
the case when the hardware does not natively support the requested
sampling parameters. With modern sound cards it is very common to
support only S32LE samples at 48KHz and nothing else. If an OSS client
assumes it can always play back S16LE samples at 44.1KHz it will thus
fail. OSS3 is portable to other Unix-like systems, various differences
however apply. OSS also doesn’t support surround sound and other
functionality of modern sound systems properly. OSS should be
considered obsolete and not be used in new applications. ALSA and
PulseAudio have limited LD_PRELOAD-based compatibility with OSS. [Programming Guide]

All sound systems and APIs listed above are supported in all
relevant current distributions. For libcanberra support the newest
development release of your distribution might be necessary.

All sound systems and APIs listed above are suitable for
development for commercial (read: closed source) applications, since
they are licensed under LGPL or more liberal licenses or no client
library is involved.

You want to know why and when you should use a specific sound API?

GStreamer

GStreamer is best used for very high-level needs: i.e. you want to
play an audio file or video stream and do not care about all the tiny
details down to the PCM or codec level.

libcanberra

libcanberra is best used when adding sound feedback to user input
in UIs. It can also be used to play simple sound files for
notification purposes.

JACK

JACK is best used in professional audio production and where interconnecting applications is required.

Full ALSA

The full ALSA interface is best used for software in the “plumbing layer”, or when you want to make use of very specific hardware features, which might be needed for audio production purposes.

Safe ALSA

The safe ALSA interface is best used for software that wants to output/record basic PCM data from hardware devices or software sound systems.

Phonon and KNotify

Phonon and KNotify should only be used in KDE/Qt applications, and only for high-level media playback and simple audio notifications, respectively.

SDL

SDL is best used in full-screen games.

PulseAudio

For now, the PulseAudio API should be used only for applications
that want to expose sound-server-specific functionality (such as
mixers) or when a PCM output abstraction layer is already available in
your application and it thus makes sense to add an additional backend
to it for PulseAudio to keep the stack of audio layers minimal.

OSS

OSS should not be used for new programs.

You want to know more about the safe ALSA subset?

Here’s a list of DOs and DON’Ts for the ALSA API if you care
that your application stays future-proof and works fine with
non-hardware backends or backends for user-space sound drivers such as
Bluetooth and FireWire audio. Some of these recommendations apply to
people using the full ALSA API as well, since some functionality
should be considered obsolete in all cases.

If your application’s code does not follow these rules, you must have
a very good reason for that. Otherwise your code should simply be considered
broken!

DON’Ts:

Do not use “async handlers”, e.g. via
snd_async_add_pcm_handler() and friends. Asynchronous
handlers are implemented using POSIX signals, which is a very
questionable use of them, especially from libraries and plugins. Even
when you don’t want to limit yourself to the safe ALSA subset
it is highly recommended not to use this functionality. Read
this for a longer explanation why signals for audio IO are
evil.

Do not parse the ALSA configuration file yourself or with
any of the ALSA functions such as snd_config_xxx(). If you
need to enumerate audio devices use snd_device_name_hint()
(and related functions). That
is the only API that also supports enumerating non-hardware audio
devices and audio devices with drivers implemented in userspace.

Do not parse any of the files from
/proc/asound/. Those files only include information about
kernel sound drivers — user-space plugins are not listed there. Also,
the set of kernel devices might differ from the way they are presented
in user-space. (i.e. sub-devices are mapped in different ways to
actual user-space devices such as surround51 and suchlike.)

Do not rely on stable device indexes from ALSA. Nowadays
they depend on the initialization order of the drivers during boot-up
time and are thus not stable.

Do not use the snd_card_xxx() APIs. For
enumerating use snd_device_name_hint() (and related
functions). snd_card_xxx() is obsolete. It will only list
kernel hardware devices. User-space devices such as sound servers,
Bluetooth audio are not included. snd_card_load() is
completely obsolete these days.

Do not hard-code device strings, especially not
hw:0 or plughw:0 or even dmix — these devices define no channel
mapping and are mapped to raw kernel devices. It is highly recommended
to use exclusively default as device string. If specific
channel mappings are required the correct device strings should be
front for stereo, surround40 for Surround 4.0,
surround41, surround51, and so on. Unfortunately at
this point ALSA does not define standard device names with channel
mappings for non-kernel devices. This means default may only
be used safely for mono and stereo streams. You should probably prefix
your device string with plug: to make sure ALSA transparently
reformats/remaps/resamples your PCM stream for you if the
hardware/backend does not support your sampling parameters
natively.

Do not assume that any particular sample type is supported
except the following ones: U8, S16_LE, S16_BE, S32_LE, S32_BE,
FLOAT_LE, FLOAT_BE, MU_LAW, A_LAW.

Do not use snd_pcm_avail_update() for
synchronization purposes. It should be used exclusively to query the
amount of data that may be written/read right now. Do not use
snd_pcm_delay() to query the fill level of your playback
buffer. It should be used exclusively for synchronisation
purposes. Make sure you fully understand the difference, and note that
the two functions return values that are not necessarily directly
connected! (See the sketch after this list.)

Do not assume that the mixer controls always know dB information.

Do not assume that all devices support MMAP style buffer access.

Do not assume that the hardware pointer inside the (possibly mmaped) playback buffer is the actual position of the sample in the DAC. There might be an extra latency involved.

Do not try to recover with your own code from ALSA error conditions such as buffer under-runs. Use snd_pcm_recover() instead.

Do not touch buffering/period metrics unless you have
specific latency needs. Develop defensively, handling correctly the
case when the backend cannot fulfill your buffering metrics
requests. Be aware that the buffering metrics of the playback buffer
only indirectly influence the overall latency in many
cases. i.e. setting the buffer size to a fixed value might actually result in
practical latencies that are much higher.

Do not assume that snd_pcm_rewind() is available, that it works, or to what degree it does.

Do not assume that the time when a PCM stream can receive
new data is strictly dependent on the sampling and buffering
parameters and the resulting average throughput. Always make sure to
supply new audio data to the device when it asks for it by signalling
“writability” on the fd. (And similarly for capturing)

Do not use the “simple” interface snd_spcm_xxx().

Do not use any of the functions marked as “obsolete”.

Do not use the timer, midi, rawmidi, hwdep subsystems.
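
To make the snd_pcm_avail_update()/snd_pcm_delay() distinction above concrete, here’s a small sketch (a fragment, assuming pcm is a running playback stream and written counts the frames written so far):

    #include <alsa/asoundlib.h>
    #include <stdio.h>

    /* avail_update answers "how much can I write right now?";
     * delay answers "how far is what I write from being audible?" */
    static void report(snd_pcm_t *pcm, snd_pcm_uframes_t written) {
        snd_pcm_sframes_t avail = snd_pcm_avail_update(pcm); /* writable space */
        snd_pcm_sframes_t delay;
        snd_pcm_delay(pcm, &delay);                          /* for A/V sync */

        /* The frame hitting the speaker now is roughly written - delay.
         * Note: avail and delay are NOT guaranteed to add up to the
         * buffer size, e.g. if extra latency sits behind the buffer. */
        printf("can write %ld frames; playing near frame %ld\n",
               (long) avail, (long) written - (long) delay);
    }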

DOs:

Use snd_device_name_hint() for enumerating audio devices (see the sketch after this list).

Use snd_mixer_xxx() instead of raw snd_ctl_xxx().

For synchronization purposes use snd_pcm_delay().

For checking the buffer playback/capture fill level use snd_pcm_avail_update().

Use snd_pcm_recover() to recover from errors returned by any of the ALSA functions.

If possible use the largest buffer sizes the device supports to maximize power saving and drop-out safety. Use snd_pcm_rewind() if you need to react to user input quickly.
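
And here’s the device enumeration sketch referenced above, using snd_device_name_hint(); unlike the snd_card_xxx() calls it also lists user-space devices such as sound servers or Bluetooth audio:

    /* Enumerate PCM devices the recommended way.
     * Build: gcc list.c $(pkg-config --cflags --libs alsa) */
    #include <alsa/asoundlib.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        void **hints;
        if (snd_device_name_hint(-1, "pcm", &hints) < 0)  /* -1 = all cards */
            return 1;

        for (void **h = hints; *h; h++) {
            char *name = snd_device_name_get_hint(*h, "NAME");
            char *desc = snd_device_name_get_hint(*h, "DESC");
            if (name)
                printf("%s: %s\n", name, desc ? desc : "");
            free(name);  /* the returned strings are malloc()ed */
            free(desc);
        }

        snd_device_name_free_hint(hints);
        return 0;
    }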

FAQ

What about ESD and NAS?

ESD and NAS are obsolete, both as APIs and as sound daemons. Do not develop for them any further.

ALSA isn’t portable!

That’s not true! Actually, the user-space library is relatively portable; it even includes a backend for OSS sound devices. There is no real reason that would disallow using the ALSA libraries on other Unixes as well.

Portability is key to me! What can I do?

Unfortunately no truly portable (i.e. to Win32) PCM API is
available right now that I could truly recommend. The systems shown
above are more or less portable at least to Unix-like operating
systems. That does not mean however that there are suitable backends
for all of them available. If you care about portability to Win32 and
MacOS you probably have to find a solution outside of the
recommendations above, or contribute the necessary
backends/portability fixes. None of the systems (with the exception of
OSS) is truly bound to Linux or Unix-like kernels.

What about PortAudio?

I don’t think that PortAudio is a very good API for Unix-like operating systems. I cannot recommend it, but it’s your choice.

Oh, why do you hate OSS4 so much?

I don’t hate anything or anyone. I just don’t think OSS4 is a
serious option, especially not on Linux. On Linux, it is also
completely redundant due to ALSA.

You idiot, you have no clue!

You are right, I totally don’t. But that doesn’t hinder me from recommending things. Ha!

Hey I wrote/know this tiny new project which is an awesome abstraction layer for audio/media!

Sorry, that’s not sufficient. I only list software here that is known to be sufficiently relevant and sufficiently well maintained.

Final Words

Of course these recommendations are very basic and are only intended to
point you in the right direction. For each use case, different necessities
apply and hence options that I did not consider here might become
viable. It’s up to you to decide how much of what I wrote here
actually applies to your application.

This summary only includes software systems that are considered
stable and universally available at the time of writing. In the
future I hope to introduce a more suitable and portable replacement
for the safe ALSA subset of functions. I plan to update this text
from time to time to keep things up-to-date.

If you feel that I forgot a use case or an important API, then
please contact me or leave a comment. However, I think the summary
above is sufficiently comprehensive and if an entry is missing I most
likely deliberately left it out.

(Also note that I am upstream for both PulseAudio and libcanberra and did some minor contributions to ALSA, GStreamer and some other of the systems listed above. Yes, I am biased.)

Oh, and please syndicate this, digg it. I’d like to see this guide become well-known all around the Linux community. Thank you!

FOMS/LCA Recap

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/foms-lca-recap.html

Finally, here’s my linux.conf.au 2007 and FOMS 2007
recap. Maybe a little bit late, but better late than never.

FOMS was a very well organized conference with a packed schedule
and a lot of high-profile attendees. To my surprise PulseAudio has been accepted by the
attendees without any opposition (at least none was expressed
aloud). After a few “discussions” on a few mailing lists (including
GNOME MLs) and some personal emails I got, I had thought that more
people were opposed to the idea of having a userspace sound
daemon for the desktop. Apparently, I was overly pessimistic. Good
news, that!

During the FOMS conference we discussed the problems audio on Linux
currently has. One of the major issues still is that we’re lacking a
cross-platform PCM audio API everyone agrees on. ALSA is Linux-specific
and complicated to use. The only real contender is PortAudio. However,
PortAudio has its share of problems and hasn’t reached wide adoption
yet. Right now most larger software projects implement an audio
abstraction layer of some kind, and mostly in a very dirty, simplistic
and limited fashion. MPlayer does it, Xine does it, Flash does
it. Everyone does it, and it sucks. (Note: this is only a very short
overview why audio on Linux sucks right now. For a longer one, please
have a look at the first 15 minutes of my PulseAudio talk at LCA, linked
below.)

Several people asked why not make the PulseAudio API the
new “standard” PCM API for Linux. For several reasons, that would be a
bad idea. First of all, the PulseAudio API cannot be used on anything
else but PulseAudio. While PulseAudio has been ported to Win32, Vista
already has a userspace desktop sound server, hence running PulseAudio
on top of that doesn’t make much sense. Thus the API is not exactly
cross-platform. Secondly, I – as the guy who designed it – am not
happy with the current PulseAudio API. While it is very powerful it is
also very difficult to use and easy to misuse, mostly due to its fully
asynchronous nature. In addition, it is not exactly the smallest
API around.

So, what could be done about this? We agreed on a – maybe –
controversial solution: defining yet another abstracted PCM audio
API. Yes, fixing the problem that we have too many conflicting,
competing sound systems by defining yet another API sounds like a
paradox, but I do believe this is the right path to follow. Why?
Because none of the currently available solutions is suitable for all
application areas we have on Linux. Either the current APIs are not
portable, or they are horribly difficult to use properly, or have a
strange license, or are too simple in their functionality. MacOSX
managed to establish a single audio API (CoreAudio) that makes almost
everyone happy on that system – and we should be able to do the same for
Linux. Secondly, none of the current APIs has been designed with
network sound servers in mind. However, proper networking support
reflects back into the API, and in a non-trivial way. An API which
works fine in a networked environment needs to eliminate roundtrips
where possible, be open for time interpolation and have a flexible
buffering (besides other minor things). Thirdly, none of the current
APIs offers enough functionality to properly support all the needs of
modern desktop sound systems, such as per-stream volumes, stream names
and notifications about external state changes.

During FOMS and LCA, Mikko Leppanen (from Nokia), Jean-Marc Valin
(from Xiph) and I sat down and designed a draft API for the
functionality we would like to see in this API. For the time being we
dubbed it libsydney, after the city where we started this
project. I plan to make this the only supported audio API for
PulseAudio, eventually. Thus, if you will code against PulseAudio you
will get cross-platform support for free. In addition, because
PulseAudio is now being integrated into the major distributions (at
least Ubuntu and Fedora), this library will be made available on most
systems through the backdoor.

So, what will this new API offer? Firstly, the buffering model is
much more powerful than that of any current sound API. The buffering model
mostly follows PulseAudio’s internal buffering model which
(theoretically) can offer zero-latency streaming and has been
pioneered by Jim Gettys’ AF sound server. It allows you to seek around
in the playback buffer very flexibly. This is very useful to allow
very fast reaction to the user’s playback control commands while still
allowing large buffers, which are good to deal with high network
lag. In addition it is very handy for the programmer, such as when
implementing streaming clients where packets may arrive
out-of-order. The API will emulate this buffering model on top of
traditional audio devices, and when used on top of PulseAudio it will use
its native implementation. The API will also clearly define which
sound formats are guaranteed to be available, thus making it a lot
easier to code without thinking of different hardware supporting
different formats all the time. Of course, the API will be easier to
use than PulseAudio’s current API. It will be very portable, scaling
from FPU-less architectures to pro-audio machines with a massive
number of synchronised channels. There are several modes available to
deal with XRUNs semi-automatically, one of them guaranteeing that the
time axis stays linear and monotonic in all events.

The list of features of this new API is much longer, however,
enough of these grand plans! We didn’t write any real code for this
yet. To make sure that this project is not another one of those which
are announced grandiosely without ever producing any code I will stop
listing features here now. We will eventually publish a first draft of
our C API for public discussion. Stay tuned.

Side-by-side with libsydney I discussed an abstract API
for desktop event sounds with Mikko (i.e. those annoying “bing” sounds
when you click a button and the like). Dubbed libcanberra
(named after the city which one of the developers visited after
Sydney), this will hopefully be for the PulseAudio sample cache API
what libsydney is for the PulseAudio streaming API: a total
replacement.

As a by-product of the libsydney discussion Jean-Marc
coded a
fast C resampling library
supporting both floating point and fixed
point and being licensed under BSD. (In contrast to
libsamplerate which is GPL and floating-point-only, but which
probably has better quality). PulseAudio will make use of this new
library, as will libsydney. And I sincerely hope that ALSA,
GStreamer and other projects replace their crappy home-grown
resamplers with this one!

For PulseAudio I was looking for a CODEC which we could use to
encode audio if we have to transfer it over the network. Such a CODEC
would need to have low CPU requirements and allow low-latency
operation, while providing hifi audio. Compression ratio is not such a
high requirement. Unfortunately, as it seems no such CODEC exists,
especially not a “Free” one. However, the Xiph people recommended to
hack up a special version of FLAC for this task. FLAC is fast, has
(obviously) good quality and if hacked up could provide low-latency
encoding. However, FLAC doesn’t compress that well. Current PulseAudio
thin-client installations require about 170 kB/s of network bandwidth for each
client if hifi audio is used (uncompressed CD-quality stereo is
44100 Hz × 2 channels × 2 bytes ≈ 176 kB/s). Encoding this in FLAC could cut
that in half. Not perfect, but better than nothing.

So, that was FOMS! FOMS is definitely a highly recommended
conference. If you have the chance to attend next year, don’t miss it!
I’ve never been to a more productive, packed conference in my life!

At LCA I met fellow Avahi coder Trent Lloyd for the first time. Our
talk about Avahi went very well. During my flights to and back from
.au I hacked up avahi-ui
which I also announced during that talk. Also, in related news,
tedp started to work on an implementation of NAT-PMP
(aka “reverse firewall piercing”; both client and server) for
inclusion in Avahi. This will hopefully make the upcoming Wide-Area
DNS support in Avahi much more useful.

linux.conf.au was a very exciting conference. As a speaker
you’re treated like a rock star, with stuff like the speakers’ dinner,
the speakers’ adventure (climbing on top of Sydney’s AMP tower) and
the penguin dinner. Heck, the organizers even picked me up at the
airport, something I really didn’t expect when I landed in Sydney,
but which is quite nice after a 27h flight.

Two talks I particularly enjoyed at LCA:

nouveau – reverse engineered nvidia drivers (Ogg Theora)
burning cpu and battery on the gnome desktop (Ogg Theora)

And just for the sake of completeness, here are the links to my presentations:

The PulseAudio Sound Server (Ogg Theora; Slides)
Using Avahi the “Right Way” (Ogg Theora; Slides)

Ok, that’s it for now. Thanks go to Silvia Pfeiffer, the rest of
the FOMS team and the Seven Team for organizing these two amazing
conferences!

Polypaudio 0.8 Released

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/polypaudio-0.8.html

The reports of Polypaudio’s death are greatly exaggerated.

We are proud to announce the release of Polypaudio
0.8, our networked sound daemon for Linux, other Unix-like operating
systems, and Microsoft Windows. Since the last official release, 0.7,
more than a year has passed. In the meantime Polypaudio has seen
major improvements, with major contributions from both Pierre
Ossman and me. Pierre is being paid by Cendio AB to work on
Polypaudio. Cendio distributes Polypaudio along with their ThinLinc Terminal
Server.

Some of the major changes:

  • New playback buffer model that allows applications to freely seek in
    the server-side playback buffer (with both relative and absolute indexes) and to synchronize
    multiple streams together, in a way that the playback times are guaranteed to
    stay synchronized even in the case of a buffer underrun. (Lennart; see the sketch after this list)
  • Ported to Microsoft Windows and Sun Solaris. (Pierre)
  • Many inner loops (like sample type conversions) have been ported
    to liboil, which
    enables us to take advantage of modern SIMD instruction sets, like MMX or SSE/SSE2. (Lennart)
  • Support for channel maps which allow applications to assign
    specific speaker positions to logical channels. This enables support
    for “surround sound”. In addition we now support separate volumes for
    all channels. (Lennart)
  • Support for hardware volume control for drivers that support
    it. (Lennart, Pierre)
  • Local users may now be authenticated merely by membership in a
    UNIX group, without the need to exchange authentication cookies. (Lennart)
  • A new driver module, module-detect, which automatically detects
    what local output devices are available and loads the
    needed drivers. Supports ALSA, OSS, Solaris and Win32 devices. (Lennart, Pierre)
  • Two new modules implementing RTP/SDP/SAP-based multicast audio
    streaming. Useful for streaming music to multiple PCs with speakers
    simultaneously, for implementing a simple “always-on” conferencing
    solution for the LAN, or for sharing a single MIC/LINE-IN jack on the
    LAN. (Lennart)
  • Two new modules for connecting Polypaudio to a JACK audio server.
    (Lennart)
  • A new Zeroconf (mDNS/DNS-SD) publisher module. (Lennart)
  • A new module to control the volume of an output sink with a LIRC-supported infrared remote
    control, and another one for doing so with a multimedia keyboard. (Lennart)
  • Support for resolving remote host names asynchronously using libasyncns. (Lennart)
  • A simple proof-of-concept HTTP module, which dumps the current daemon status to HTML. (Lennart)
  • Added proper validity checking of passed parameters to every single
    API function. (Lennart)
  • Last but not least, the documentation has been beefed up a lot and
    is no longer just simple doxygen-based API documentation. (Pierre, Lennart)
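
To illustrate the new buffer model, here’s a minimal sketch of seeking in the server-side playback buffer. It uses pa_stream_write() as it appears in today’s PulseAudio, the direct descendant of this API, so consider the exact signature an assumption for the 0.8 version:

    #include <pulse/pulseaudio.h>

    /* Minimal sketch: write a block of samples at an absolute index in
     * the server-side playback buffer, then append another block relative
     * to the current write pointer. Passing NULL as the free callback
     * makes the library copy the data internally. */
    void write_with_seek(pa_stream *s, const void *data, size_t nbytes) {
        /* Overwrite whatever is at byte offset 0 of the playback buffer. */
        pa_stream_write(s, data, nbytes, NULL, 0, PA_SEEK_ABSOLUTE);

        /* Append at the current write pointer (offset 0, relative seek). */
        pa_stream_write(s, data, nbytes, NULL, 0, PA_SEEK_RELATIVE);
    }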

Sounds good, doesn’t it? But that’s not all!

We’re really excited about this new Polypaudio release. However,
there is more exciting, good news in the Polypaudio world. Pierre
implemented a Polypaudio plugin for alsa-libs. This means you
may now use any ALSA-aware application to access a Polypaudio sound
server! The patch has already been merged upstream, and will probably
appear in the next official release of alsa-plugins.

Due to the massive internal changes we had to make a lot of modifications to
the public API. Hence applications which currently make use of the Polypaudio
0.7 API need to be updated. The patches or packages I maintain will be updated
in the next weeks one-by-one. (That is: xmms-polyp, the MPlayer patch, the
libao patch, the GStreamer patch and the PortAudio patch)

A side note: I wonder what this new release means for Polypaudio in
Debian. I’ve never been informed by the Debian maintainers of
Polypaudio that it has been uploaded to Debian, and never of the
removal either. In fact, I never exchanged a single line with those who
maintained Polypaudio in Debian. Is this how the Debian
project intends its maintainers to communicate with
upstream? I doubt it!

How does Polypaudio compare to ESOUND?

Polypaudio does everything ESOUND does, and much more. It is a
fully compatible drop-in replacement. With a small script you can make
it command-line compatible (including autospawning). ESOUND clients
may connect to our daemon just like they did to the original ESOUND
daemon, since we implemented a compatibility module for the ESOUND
protocol.
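
As a sketch of how this fits together, a daemon configuration might load the ESOUND compatibility module alongside some of the modules from the list above. The module names and arguments below follow today’s PulseAudio (Polypaudio’s successor), and the monitor source name is made up, so treat the exact spellings as assumptions for 0.8:

    # Hypothetical default.pa-style configuration (module names as in
    # today's PulseAudio; exact 0.8 spellings may differ).
    load-module module-detect                                 # autodetect local output devices
    load-module module-native-protocol-unix auth-group=audio  # cookie-less auth via UNIX group
    load-module module-esound-protocol-unix                   # speak the ESOUND protocol to ESD clients
    load-module module-zeroconf-publish                       # announce the server via mDNS/DNS-SD
    load-module module-rtp-send source=output.monitor         # multicast the output via RTP/SDP/SAP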

Support for other well known networked audio protocols (such as
NAS) should be easy to add – if there is a need.

For a full list of the features that Polypaudio has over ESOUND,
see Polypaudio’s homepage.

How does Polypaudio compare to ALSA’s dmix?

Some people might ask whether there still is a need for a sound
server in times where ALSA’s dmix plugin is available. The
answer is: yes!

Firstly, Polypaudio is networked, which dmix is
not. However, there are many reasons why Polypaudio is useful on
non-networked systems as well. Polypaudio is portable; it is available
not just for Linux but for FreeBSD, Solaris and even Microsoft
Windows. Polypaudio is extensible; there is a broad range of additional
modules
available which allow the user to use Polypaudio in many
exciting ways ALSA doesn’t offer. In Polypaudio, streams, devices and
other server internals can be monitored and introspected freely. The
volumes of multiple streams may be manipulated independently of
each other, which allows exciting new applications like a work-alike
of the new per-application mixer tool featured in the upcoming Windows
Vista. In multi-user systems, Polypaudio offers a secure and safe way
to allow multiple users to access the sound device
simultaneously. Polypaudio may be accessed through the ESOUND and
ALSA APIs. In addition, ALSA dmix is still not supported properly by
many ALSA clients, and is difficult to set up.
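
As an illustration of the per-stream volume control, here’s a minimal sketch using the introspection API as it exists in today’s PulseAudio; the Polypaudio 0.8 names (then in polyplib) differed slightly, so treat the exact identifiers as assumptions:

    #include <pulse/pulseaudio.h>

    /* Minimal sketch: set the volume of one playback stream (a "sink
     * input") without touching any other stream. API names follow
     * today's PulseAudio; the Polypaudio 0.8 equivalents differed. */
    void set_stream_volume(pa_context *c, uint32_t sink_input_idx) {
        pa_cvolume v;
        pa_cvolume_set(&v, 2, PA_VOLUME_NORM / 2);  /* stereo, 50% volume */
        pa_context_set_sink_input_volume(c, sink_input_idx, &v, NULL, NULL);
    }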

A side note: dmix forks off its own simple sound daemon
anyway, hence there is no big difference to using Polypaudio with the
ALSA plugin in auto-spawning mode. (Though admittedly, those ALSA
clients that don’t work properly with dmix won’t do so with our ALSA
plugin either, since they actually use the ALSA API incorrectly.)

How does Polypaudio compare to JACK?

Every time people discuss sound servers on Unix/Linux and which way
is the right one for desktops, JACK gets mentioned and suggested by some as a
replacement for ESOUND on the desktop. However, this is not
practical. JACK is not intended to be a desktop sound server; instead
it is designed with professional audio in mind. Its semantics are
different from other sound servers: e.g. it uses exclusively floating-point
samples, doesn’t deal directly with interleaved channels and
maintains a server-global time-line which may be stopped and seeked
around. All that translates badly to desktop usage. JACK is really
nice software, but just not designed for the normal desktop user
who’s not working on professional audio production.

Since we think that JACK is really a nice piece of work, we added
two new modules to Polypaudio which can be used to hook it up to a
JACK server.

Get Polypaudio 0.8, while it is hot!

BTW: We’re looking for a logo for Polypaudio. Feel free to send us your suggestions!

Update: The Debian rant is unjust to Jeff Waugh. In fact, he had informed me that he prepared Debian packages of Polypaudio. I just never realized that he had actually uploaded them to Debian. What still stands, however, is that I’ve not been informed or asked about the removal.
