Avahi 0.6.22

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/avahi-0.6.22.html

A couple of minutes ago I released Avahi
0.6.22
into the wild, the newest iteration of everyone’s favourite zero
configuration networking suite.

Avahi Logo

You ask why this is something to blog about?

Firstly, new in this version is Sjoerd Simons' avahi-gobject
library, a GObject wrapper around the Avahi API. It allows full GObject-style
object-oriented programming of Zeroconf applications, with signals and
everything. To all you GNOME/Gtk+ hackers out there: now it is even more fun to
hack your own Zeroconf applications for GNOME/Gtk+!
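
Here is a rough sketch of what browsing for a service type looks like with it.
Caveat: this is written from memory of the avahi-gobject headers, so treat the
exact names and signatures as assumptions rather than documentation:

/* Hypothetical sketch of service browsing with avahi-gobject; type and
 * function names are from memory of the headers and may be slightly off. */
#include <avahi-gobject/ga-client.h>
#include <avahi-gobject/ga-service-browser.h>

static void new_service_cb(GaServiceBrowser *browser,
                           gint iface, gint protocol,
                           const gchar *name, const gchar *type,
                           const gchar *domain, guint flags,
                           gpointer user_data)
{
    g_print("Found '%s' (%s) in domain %s\n", name, type, domain);
}

int main(int argc, char *argv[])
{
    GError *error = NULL;

    g_type_init();
    GMainLoop *loop = g_main_loop_new(NULL, FALSE);

    GaClient *client = ga_client_new(GA_CLIENT_FLAG_NO_FLAGS);
    if (!ga_client_start(client, &error))
        g_error("Failed to start Avahi client: %s", error->message);

    GaServiceBrowser *browser = ga_service_browser_new("_http._tcp");
    g_signal_connect(browser, "new-service", G_CALLBACK(new_service_cb), NULL);
    if (!ga_service_browser_attach(browser, client, &error))
        g_error("Failed to attach service browser: %s", error->message);

    g_main_loop_run(loop);
    return 0;
}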

Secondly, this is the first release to ship i18n support. For those who
prefer to run their systems with non-English locales[1] this should
be good news. I've always been a little afraid of adding i18n support, since
it either meant that I would have had to constantly commit i18n patches, or that I
would have needed to move my code to GNOME SVN. However, we now have Fedora's Transifex,
which allows me to open up my SVN to translators without much organizational
work on my side. Translations are handled centrally and committed back to my
repository when needed. It's a bit like Canonical's Rosetta, but with a focus
on committing i18n changes upstream, and without being closed-source crap.

You like this release? Then give me a kudo on ohloh.net. My
ego still thirsts for gold, and I am still (or again) 25 positions away from
that. 😉

Footnotes

[1] Personally, I run my desktop with $LC_MESSAGES=C, but
LANG=de_DE, which are the settings I can recommend to everyone who is from Germany and wants to stay
sane. Unfortunately it is a PITA to configure this on
GNOME, though.

Back from India

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/photos/india.html

FOSS.in was one of the best conferences I have ever
been to, and a lot of fun. The organization was flawless, and I can
only heartily recommend that everyone send in a presentation proposal for next year's
iteration. I certainly hope the committee is going to accept my proposals again next year. The food especially was gorgeous.

I will spare you the usual conference photos, you can find a lot of those on flickr. However, what I will not spare you are a couple of photos I shot in Bangalore, Srirangapatna and Mysore.

[Photo gallery: Bangalore, Srirangapatna and Mysore]

[Panorama]

Lazyweb: POSIX Process Groups and Sessions

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/pgrp-vs-session.html

Dear Lazyweb,

I have trouble understanding what exactly POSIX process groups and sessions
are good for. The POSIX
docs
are very vague on this. What exactly is the effect of being in a process
group with some other process, and what does being in the same session with it
add on top? And what is the benefit of being a group/session leader as opposed to just being an ordinary process in the group/session?

The only thing I understood is that kill(2) with a negative first
parameter can be used to “multicast” signals to entire process groups, and that
SIGINT on C-c is delivered that way. But is that all? The POSIX docs say
"… for the purpose of signaling, placement in foreground or background,
and other job control actions", which is very vague. What are those
"other job control actions"? What does job control consist of besides
multicasting signals? And what is "placement in foreground or background" other
than delivering signals?
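
For what it's worth, the signal-multicast part is easy to demonstrate. Here is a
minimal sketch of my own (purely illustrative, not from the POSIX docs): the
parent puts two children into a new process group and later signals the whole
group with a single kill() on the negated group ID:

/* Minimal illustration of signal multicasting to a process group:
   two children are moved into a group led by the first child, then the
   whole group is terminated at once via a negative PID. */
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pgid = 0;

    for (int i = 0; i < 2; i++) {
        pid_t pid = fork();

        if (pid == 0) {            /* child: join the group and idle */
            setpgid(0, pgid);      /* pgid == 0 for the first child: it becomes group leader */
            for (;;)
                pause();
        }

        setpgid(pid, pgid);        /* also set it from the parent, to avoid a race */
        if (i == 0)
            pgid = pid;            /* remember the group leader's PID */
    }

    sleep(1);
    kill(-pgid, SIGTERM);          /* negative PID: signal every process in the group */

    while (wait(NULL) > 0)
        ;
    printf("terminated process group %d\n", (int) pgid);
    return 0;
}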

And I totally don’t get POSIX sessions and how they differ from POSIX process groups. Please enlighten me!

Puzzled,
    Lennart


stet and AGPLv3

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2007/11/21/stet-and-agplv3.html

Many people don't realize that the GPLv3 process actually began long
before the November 2005 announcement. For me and a few others, the GPLv3
process started much earlier. Also, in my view, it didn't actually end
until this week, when the FSF released the AGPLv3. Today, I'm particularly
proud that stet was the first software released under the terms of
that license.

The GPLv3 process focused on the idea of community, and a community is
built from bringing together many individual experiences. I am grateful
for all my personal experiences throughout this process. Indeed, I
would guess that other GPL fans like myself remember, as I do, the first
time they heard the phrase "GPLv3". For me, it was a bit
early — on Tuesday 8 January 2002 in a conference room at MIT. On
that day, Richard Stallman, Eben Moglen and I sat down to have an
all-day meeting that included discussions regarding updating GPL. A key
issue that we sought to address was (in those days) called the
“Application Service Provider (ASP) problem” — now
called “Software as a Service (SaaS)”.

A few days later, on the telephone with Moglen[2] one morning, as I stood in my
kitchen making oatmeal, we discussed this problem. I pointed out the
oft-forgotten section 2(c) of the GPL [version 2]. I argued that contrary
to popular belief, it does have restrictions on some minor
modifications. Namely, you have to maintain those print statements for
copyright and warranty disclaimer information. It’s reasonable, in other
words, to restrict some minor modifications to defend freedom.

We also talked about that old Computer Science problem of having a
program print its own source code. I proposed that maybe we needed a
section 2(d) requiring that, if a program prints its own source to
the user, you can't remove that feature, and that the feature must
always print the complete and corresponding source.
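
(For readers who don't know the term: a quine is that classic exercise, a
program whose output is its own source code. A tiny C example of the concept,
my own illustration rather than anything from stet or the license text:)

#include <stdio.h>
const char *s = "#include <stdio.h>%cconst char *s = %c%s%c;%cint main(void) { printf(s, 10, 34, s, 34, 10, 10); return 0; }%c";
int main(void) { printf(s, 10, 34, s, 34, 10, 10); return 0; }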

Within two months, Affero
GPLv1 was published
— an authorized fork of the GPL to test
the idea. From then until AGPLv3, that “Affero clause”
has had many changes, iterations and improvements, and I’m grateful
for all the excellent feedback, input and improvements that have gone
into it. The result, the Affero GPLv3 (AGPLv3) released on Monday, is an
excellent step forward for software freedom licensing. While the community process
indicated that the preference was for the Affero clause to be part of
a separate license, I’m nevertheless elated that the clause continues
to live on and be part of the licensing infrastructure defending
software freedom.

Other than coining the Affero clause, my other notable personal
contribution to the GPLv3 was management of a software development
project to create the online public commenting system. To do the
programming, we contracted with Orion Montoya, who has extensive
experience doing semantic markup of source texts from an academic
perspective. Orion gave me my first introduction to the whole
“Web 2.0” thing, and I was amazed at how useful the result was;
it helped the leaders of the process easily grok the public response.
For example, the intensity highlighting — which shows the hot
spots in the text that received the most comments — gives a very
quick picture of sections that are really of concern to the public. In
reviewing the drafts today, I was reminded that the big red area in
section 1 about “encryption and authorization codes”
is substantially changed and less intensely highlighted by draft 4. That
quick look gives a clear picture of how the community process operated to get a
better license for everyone.

Orion, a Classics scholar as an undergrad, named the
software stet for its original Latin definition: “let it
stand as it is”. It was his hope that stet (the software) would
help along the GPLv3 process so that our whole community, after filing
comments on each successive draft, could look at the final draft and
simply say: Stet!

Stet has a special place in software history, I believe, even if it’s
just a purely geeky one. It is the first software system in history to
be meta-licensed: it is software whose output was its own license. It's with
that exciting hacker concept that I put up today a Trac instance for stet,
licensed under the terms of the AGPLv3 [which is now on Gitorious][1].

Stet is by no means ready for drop-in production. Like most software
projects, we didn’t estimate perfectly how much work would be needed.
We got lazy about organization early on, which means it still requires a
by-hand install, and new texts must be carefully marked up by hand.
We’ve moved on to other projects, but hopefully SFLC will host the Trac
instance indefinitely so that other developers can make it better.
That’s what copylefted FOSS is all about — even when it’s
SaaS.

[1] Actually, it's
under AGPLv3 plus an exception to allow for combining with the
GPLv2-only Request Tracker, with which parts of stet combine.

[2] Update 2016-01-06: After writing this blog post, I found
evidence in my email archives from early 2002, wherein Henry Poole (who
originally suggested the need for Affero GPL to FSF), began cc’ing me anew
on an existing thread. In that thread, Poole quoted text from Moglen
proposing the original AGPLv1 idea to Poole. Moglen’s quoted text in
Poole’s email proposed the idea as if it were solely Moglen’s own. Based
on the timeline of the emails I have, Moglen seems to have written to Poole
within 36-48 hours of my original formulation of the idea.

While I do not accuse Moglen of plagiarism, I believe he does at least
misremember my idea as his own, which is particularly surprising, as Moglen
(at that time, in 2002) seemed unfamiliar with the Computer Science concept
of a quine; I had to explain that concept as part of my presentation of my
idea. Furthermore, Moglen and I discussed this matter in a personal
conversation in 2007 (around the time I made this blog post originally) and
Moglen said to me: "you certainly should take credit for the Affero
GPL". I therefore thought the matter was fully settled back in
2007, and Moglen's post-2007 claims of credit that write me out of
Affero GPL's history are simply baffling. To clear up the confusion his
ongoing claims create, I added this footnote to communicate unequivocally
that my memory of that phone call is solid, because it was the first time I
ever came up with a particularly interesting licensing idea, so the memory
became extremely precious to me immediately. I am therefore completely
sure I was the first to propose the original idea of mandating preservation
of a quine-like feature in AGPLv1§2(d) (as a fork/expansion of
GPLv2§2(c)) on the telephone to Moglen, as described above. Moglen
has never produced evidence to dispute my recollection, and even agreed
with the events as I told them back in 2007.

Nevertheless, unlike Moglen, I do admit that creation of the final text of
AGPLv1 was a collaborative process, which included contributions from
Moglen, Poole, RMS, and a lawyer (whose name I don’t recall) whom Poole
hired. AGPLv3§13’s drafting was similarly collaborative, and included
input from Richard Fontana, David Turner, and Brett Smith, too.

Finally, I note my surprise at this outcome. In my primary community
— the Free Software community — people are generally extremely
good at giving proper credit. Unlike the Free Software community, legal
communities apparently are cutthroat on the credit issue, so I’ve
learned.


Emulated atomic operations and real-time scheduling

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/atomic-rt.html

Unfortunately not all CPU architectures have native support for atomic
operations, or they only support a very limited subset. Most
prominently, ARMv5 (and older) doesn't have any support besides the most
basic atomic swap operation[1]. Now, more and more free code is starting to
use atomic operations and lock-free algorithms, one example being my own
project, PulseAudio. If you have ever done real-time programming
you probably know that you cannot really do it without
support for atomic operations. One question remains, however: what to
do on CPUs which support only the most basic atomic operations
natively?

On the kernel side atomic ops are very easy to emulate: just disable
interrupts temporarily, then do your operation non-atomically, and afterwards
enable them again. That’s relatively cheap and works fine (unless you are on SMP — which
fortunately you usually are not for those CPUs). The Linux
kernel does it this way and it is good. But what to do in user-space, where you cannot just go and disable interrupts?
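
To make that concrete, here is a sketch in the spirit of what the kernel does
on uniprocessor machines. It is not the actual Linux code, and
local_irq_save()/local_irq_restore() are kernel-internal macros, so this only
makes sense in kernel context:

/* Sketch of the kernel-side emulation described above (UP only; not the
 * actual Linux implementation). */
static inline int emulated_atomic_add_return(int i, volatile int *v)
{
        unsigned long flags;
        int ret;

        local_irq_save(flags);          /* no interrupt, hence no preemption, can happen now */
        ret = (*v += i);                /* plain, non-atomic read-modify-write */
        local_irq_restore(flags);       /* interrupts back on */

        return ret;
}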

Let's see how the different userspace libraries/frameworks do it
for ARMv5, a very relevant architecture that only knows an atomic swap (exchange)
but no CAS or even atomic arithmetic. Let's start with an excerpt from glibc's
atomic operations implementation for ARM:

/* Atomic compare and exchange.  These sequences are not actually atomic;
   there is a race if *MEM != OLDVAL and we are preempted between the two
   swaps.  However, they are very close to atomic, and are the best that a
   pre-ARMv6 implementation can do without operating system support.
   LinuxThreads has been using these sequences for many years.  */

This comment says it all. Not good. The more you make use of atomic
operations the more likely you're going to hit this race. Let's
hope glibc is not a heavy user of atomic operations. PulseAudio however is, and
PulseAudio happens to be my focus.
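
Just to illustrate where that race lives, here's a sketch of the two-swap
trick the comment describes. This is my own illustration, not the actual glibc
code; __sync_lock_test_and_set() stands in for ARM's SWP instruction:

/* Illustration (not glibc's code) of compare-and-exchange emulated with
 * nothing but an atomic swap. The window between the two swaps is the race
 * the glibc comment warns about. */
static inline int atomic_swap(volatile int *mem, int newval)
{
    return __sync_lock_test_and_set(mem, newval);   /* stand-in for ARM SWP */
}

/* Returns non-zero on success, i.e. when *mem was oldval and is now newval. */
int emulated_cas(volatile int *mem, int oldval, int newval)
{
    int prev = atomic_swap(mem, newval);   /* swap #1: unconditionally install newval */

    if (prev != oldval) {
        /* Mismatch: roll back by swapping the previous value in again.
         * Anything that reads or swaps *mem between swap #1 and this
         * line sees the wrong value -- that's the race. */
        atomic_swap(mem, prev);
        return 0;
    }

    return 1;
}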

Let's have a look at how Qt4
does it:

extern Q_CORE_EXPORT char q_atomic_lock;

inline char q_atomic_swp(volatile char *ptr, char newval)
{
    register int ret;
    asm volatile("swpb %0,%1,[%2]"
                 : "=&r"(ret)
                 : "r"(newval), "r"(ptr)
                 : "cc", "memory");
    return ret;
}

inline int q_atomic_test_and_set_int(volatile int *ptr, int expected, int newval)
{
    int ret = 0;
    while (q_atomic_swp(&q_atomic_lock, ~0) != 0);
    if (*ptr == expected) {
	*ptr = newval;
	ret = 1;
    }
    q_atomic_swp(&q_atomic_lock, 0);
    return ret;
}

So, what do we have here? A slightly better version. In standard
situations it actually works. But it sucks big time, too. Why? It
contains a spin lock: the variable q_atomic_lock is used for
locking the atomic operation. The code tries to set it to non-zero,
and if that fails it tries again, until it succeeds, in the hope that
the other thread — which currently holds the lock — gives it up. The
big problem here is: it might take a while until that happens, up to
1/HZ time on Linux. Usually you want to use atomic operations to
minimize the need for mutexes and thus speed things up. Now, here you've
got a lock, and it's the worst kind: the spinning lock. Not
good. Also, if used from a real-time thread, the machine simply locks
up when we enter the loop in a contended state, because preemption is
disabled for RT threads and thus the loop will spin forever. Evil. And
then, there’s another problem: it’s a big bottleneck, because all
atomic operations are synchronized via a single variable which is
q_atomic_lock. Not good either. And let’s not forget that
only code that has access to q_atomic_lock can actually
execute this code safely. If you want to use it for
lock-free IPC via shared memory this is going to break. And let’s not
forget that it is unusable from signal handlers (which probably
doesn’t matter much, though). So, in summary: this code sucks,
too.

Next, let's have a look at how glib
does it:

static volatile int atomic_spin = 0;

static int atomic_spin_trylock (void)
{
  int result;

  asm volatile (
    "swp %0, %1, [%2]\n"
    : "=&r,&r" (result)
    : "r,0" (1), "r,r" (&atomic_spin)
    : "memory");
  if (result == 0)
    return 0;
  else
    return -1;
}

static void atomic_spin_lock (void)
{
  while (atomic_spin_trylock())
    sched_yield();
}

static void atomic_spin_unlock (void)
{
  atomic_spin = 0;
}

gint
g_atomic_int_exchange_and_add (volatile gint *atomic,
			       gint           val)
{
  gint result;

  atomic_spin_lock();
  result = *atomic;
  *atomic += val;
  atomic_spin_unlock();

  return result;
}

Once again, a spin loop. However, this implementation makes use of
sched_yield() to ask the OS to reschedule. It's a bit
better than the Qt version, since it doesn't spin just burning CPU,
but instead tells the kernel to execute something else, increasing the
chance that the thread currently holding the lock is scheduled. It's a
bit friendlier, but it's not great either, because this might still delay
execution quite a bit. It is probably one of the very few legitimate
occasions where using sched_yield() is OK. It still doesn't work for RT —
because sched_yield() is in most cases a NOP for RT threads, so
you still get a machine lockup. And it still has the
one-lock-to-rule-them-all bottleneck. And it still is not compatible
with shared memory.

Then, there's libatomic_ops. It's
the most complex code, so I won't paste it here. Basically
it uses the same spin loop, with three differences (a rough sketch of the
approach follows the list below):

  1. 16 lock variables instead of a single one are used. The variable
    that is used is picked via simple hashing of the pointer to the atomic variable
    that shall be modified. This removes the one-lock-to-rule-them-all
    bottleneck.
  2. Instead of pthread_yield() it uses select() with
    a small timeval parameter to give the current holder of the lock some
    time to give it up. To make sure that the select() is not
    optimized away by the kernel, and the thread thus never preempted,
    the sleep time is increased on every loop iteration.
  3. It explicitly disables signals before doing the atomic operation.
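
Here's that rough sketch (my own reconstruction, not libatomic_ops' actual
code). The signal blocking from point 3 is left out, and
__sync_lock_test_and_set() again stands in for ARM's SWP:

#include <stddef.h>
#include <stdint.h>
#include <sys/select.h>

#define N_LOCKS 16
static volatile int lock_table[N_LOCKS];        /* 16 swap-based spin locks */

static inline int atomic_swap(volatile int *p, int v)
{
    return __sync_lock_test_and_set(p, v);      /* stand-in for ARM SWP */
}

static volatile int *lock_for(const volatile void *addr)
{
    /* Pick one of the locks by hashing the address of the atomic variable. */
    return &lock_table[((uintptr_t) addr >> 4) % N_LOCKS];
}

int emulated_fetch_and_add(volatile int *p, int val)
{
    volatile int *lock = lock_for(p);
    long usec = 1;
    int old;

    while (atomic_swap(lock, 1) != 0) {
        /* Contended: sleep a little instead of spinning, and sleep longer
         * on each iteration so the current holder eventually runs even
         * if we are an RT thread. */
        struct timeval tv = { 0, usec };
        select(0, NULL, NULL, NULL, &tv);
        if (usec < 1024)
            usec *= 2;
    }

    old = *p;                   /* the operation itself is not atomic... */
    *p += val;                  /* ...but it is protected by the bucket lock */

    atomic_swap(lock, 0);       /* release */
    return old;
}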

It's certainly the best implementation of the ones discussed here: it
doesn't suffer from the one-lock-to-rule-them-all bottleneck, and it's
(supposedly) signal-handler safe (which however comes at the cost of
doing two syscalls on every atomic operation — probably a very high
price). It actually works on RT, due to sleeping for an explicit
time. However, it still doesn't deal with priority
inversion
problems — which is a big issue for real-time
programming. Also, the time slept in the select() call might
be relatively long, since at least on Linux the time passed to
select() is rounded up to 1/HZ — not good for RT either. And
then, it still doesn't work for shared memory IPC.

So, what do we learn from this? At least one thing: better not to do
real-time programming on ARMv5[2]. But more practically, what would a
good emulation of atomic ops, based solely on atomic swap,
look like? Here are a few ideas:

  • Use an implementation inspired by libatomic_ops. Right
    now it’s the best available. It’s probably a good idea, though, to
    replace select() by a nanosleep(), since on recent
    kernels the latter doesn’t round up to 1/HZ anymore, at least when you
    have high-resolution timers[3]. Then, if you can live
    without signal handler safety, drop the signal mask changing.
  • If you use something based on libatomic_ops and want to
    use it for shared memory IPC, then you have the option to move the
    lock variables into shared memory too. Note however, that this allows
    evil applications to lock up your process by taking the locks and
    never giving them up. (Which however is always a problem if not all
    atomic operations you need are available in hardware) So if you do
    this, make sure that only trusted processes can attach to your memory
    segment.
  • Alternatively, spend some time and investigate if it is possible
    to use futexes to sleep on the lock variables. This is not trivial
    though, since futexes right now expect the availability of an atomic
    increment operation. But it might be possible to emulate this good
    enough with the swap operation. There’s now even a FUTEX_LOCK_PI
    operation which would allow priority inheritance.
  • Alternatively, find a way to allow user space to disable
    interrupts cheaply (requires kernel patching). Since enabling
    RT scheduling is already a privileged operation (since you may easily
    lock up your machine with it), it might not be too problematic to
    extend the ability to disable interrupts to user space: it’s just yet
    another way to lock up your machine.
  • For the libatomic_ops based algorithm: if you’re lucky
    and defined a struct type for your atomic integer types, like the
    kernel does, or like I do in PulseAudio with pa_atomic_t,
    then you can stick the lock variable directly into your
    structure. This makes shared memory support transparent, and removes
    the one-lock-to-rule-them-all bottleneck completely. Of course, OTOH it
    increases the memory consumption a bit and increases cache pressure
    (though I'd assume that this is negligible).
  • For the libatomic_ops based algorithm: start sleeping for
    the time returned by clock_getres()
    (cache the result!). You cannot sleep shorter than that anyway. A rough
    sketch combining this and the previous idea follows below.
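
And here is that sketch of the last two ideas combined: the lock lives next to
the value in the atomic struct itself (shared-memory friendly, no global lock
table), and contended waiters sleep via nanosleep() for the clock resolution.
Again my own illustration, with __sync_lock_test_and_set() standing in for SWP,
and caching of the clock_getres() result left out for brevity:

#include <time.h>

typedef struct emu_atomic {
    volatile int value;
    volatile int lock;          /* 0 = free, 1 = taken; toggled via atomic swap */
} emu_atomic_t;

static inline int atomic_swap(volatile int *p, int v)
{
    return __sync_lock_test_and_set(p, v);      /* stand-in for ARM SWP */
}

static void emu_lock(emu_atomic_t *a)
{
    struct timespec res;
    clock_getres(CLOCK_MONOTONIC, &res);        /* shortest sleep that makes sense;
                                                   should really be cached */
    while (atomic_swap(&a->lock, 1) != 0)
        nanosleep(&res, NULL);                  /* no 1/HZ rounding with hrtimers */
}

static void emu_unlock(emu_atomic_t *a)
{
    atomic_swap(&a->lock, 0);
}

int emu_atomic_add(emu_atomic_t *a, int val)
{
    emu_lock(a);
    int old = a->value;
    a->value += val;
    emu_unlock(a);
    return old;
}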

Yep, that's as good as it gets. Unfortunately I cannot serve you
the optimal solution on a silver platter. I never actually did
development for ARMv5; this blog story just sums up my thoughts on all
the code I saw which emulates atomic ops on ARMv5. But maybe someone
who actually cares about atomic operations on ARM finds this
interesting and maybe invests some time to prepare patches for Qt,
glib, glibc — and PulseAudio.

Update: I added two more ideas to the list above.

Update 2: Andrew Haley just posted something like the optimal solution for the problem. It would be great if people would start using this.

Footnotes

[1] The Nokia 770 has an ARMv5 chip, N800 has ARMv6. The OpenMoko phone apparently uses ARMv5.

[2] And let’s not even think about CPUs which don’t even have an atomic swap!

[3] Which however you probably won’t, given that they’re only available on x86 on stable Linux kernels for now — but still, it’s cleaner.

Rain in Montreal

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/photos/montreal-rain.html

Sometimes, rain can be quite beautiful.

Montreal 1 Montreal 2 Montreal 3

I took these during my stay in Montreal after OLS 2007. Which reminds me: don't miss my talks at foss.in 2007, linux.conf.au 2008 and FOMS 2008. I'll be speaking about Avahi, PulseAudio and practical real-time programming in userspace.

The next step

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/pa-097.html

A few minutes ago, I finally released PulseAudio 0.9.7. Changes are numerous,
especially internally, where the core is now threaded and mostly lock-free.
Check the rough list on the milestone page or the announcement email. As many of you
know, we are shipping a pre-release of 0.9.7 in Fedora 8, enabled by default. The final release
offers quite a few additions over that prerelease. To show off a couple of nice features, here's a screencast, showing hotplug, simultaneous playback (what Apple calls aggregation) and zeroconfish network support:

screencast

Please excuse the typos. Yes, I still use XMMS, don't ask [1]. Yes, you
need a bit of imagination to fully appreciate a screencast that lacks an audio track but demos audio software.

So, what's coming next? Earcandy, timer-based scheduling/"glitch-free" audio, scriptability through Lua; the todo list is huge. My unofficial, scratchy, partly German TODO list for PulseAudio is available online.

As it appears, all relevant distros will now move to PA by default. So,
hopefully, PA is coming to a desktop near you pretty soon. Oh, you are one
of those who still don't see the benefit of a desktop sound server? Then
please reread this too long email of mine, or maybe this ars.technica article.

OTOH, if you happen to like this release, then consider giving me a kudo on ohloh.net, my ego wants a golden 10. 😉

logo

Footnotes:

[1] Those music players which categorize audio by ID3 tags just don't
work for me, because most of my music files are very badly named. My
directory structure, however, is very well organized, but all those newer
players don't seem to care about directory structures. XMMS doesn't really
either, but xmms . does the job from the terminal.

Flameeyes, thanks for hosting this clip.


Yummy Mango Yummy Lassi Yummy

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/lassi-lassi-popassi.html

Zeeshan,
Mango Lassi tastes a lot different than a milk shake, believe me! Also,
even if Mango Lassi were actually a Western thing, do you know that just
recently I witnessed Sjoerd[1] ordering a Vindaloo Pizza
(or was it Korma?) at a Boston restaurant — Italian pizza with Indian-style
curry on top? Now, that's what some people might call "ignorant of Indian
cuisine". But actually I think that, like in music, mixing different styles
and combining things from different origins is a good thing, and is what
makes culture live.

Footnotes
[1] Who doesn’t have a blog. Can you believe it?

