The GPL is a Tool to Encourage Freedom, Not an End in Itself

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2008/04/10/gpl-not-end-in-itself.html

I was amazed to be involved in yet another discussion recently
regarding the old debate about the scope of the GPL under copyright law.
The debate itself isn’t amazing — these debates have happened
somewhere every six months, almost on cue, since around 1994 or so.
What amazed me this time is that some people in the debate believed that
the GPL proponents intend to sneakily pursue an increased scope for
copyright law. Those who think that have completely misunderstood the
fundamental idea behind the GPL.

I’m disturbed by the notion that some believe the goal of the GPL is to
expand copyrightability and the inclusiveness of derivative works. It
seems that so many forget (or maybe they never even knew) that copyleft
was invented to hack copyright — to turn its typical applications
to software inside out. The state of affairs that software is
controlled by draconian copyright rules is a lamentable reality;
copyleft is merely a tool that defuses the proprietary copyright
weaponry.

But, if it were possible to really consider reduction in copyright
control over software, then I don’t know of a single GPL proponent who
wouldn’t want to bilaterally reduce copyright’s scope for software. For
example, I’ve often proposed, since around 2001, that perhaps copyright
for software should only last three years, non-renewable, and that it
require all who wished to distribute non-public-domain software to
register the source with the Copyright Office. At the end of the three
years, the Copyright Office would automatically publish that now
public-domain source to the world.

If my hypothetical system were the actual (and only) legal regime for
software, and were equally applied to all software — from the
fully Free to the most proprietary — I’d have no sadness at all
that opportunities for GPL enforcement ended after three years, and that
all GPL’d software fell into the public domain on that tight schedule,
because proprietary software and FLOSS would have the same treatment.
Meanwhile, great benefit would be gained for the freedom of all software
users. In short, the GPL is not an end in itself, and I wouldn’t want to
ignore the actual goal — more freedom for software users —
merely to strengthen one tool in that battle.

In one of my favorite films, Kevin Smith’s Dogma, Chris
Rock’s character, Rufus, argues that it’s better to have ideas than
beliefs, because ideas can change when the situation does, but beliefs
become ingrained and are harder to shake. I’m not a belief-less person,
but I certainly hold the GPL and the notion of copyleft firmly in the
“idea” camp, not the “belief” one. It’s
unfortunate that the entrenched interests outside of software are (more
or less) inadvertently strengthening software copyright, too. Thus, in
the meantime, we must hold steadfast to the GPL going as far as is
legally permitted under this ridiculously expansive copyright system we
have. But, should a real policy dialogue open on the reduction of software
copyright’s scope, GPL proponents will be the first in line to encourage
such bilateral reduction.

What’s Cooking in PulseAudio’s glitch-free Branch

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/pulse-glitch-free.html

A while ago I started development of a special branch of PulseAudio which is called
glitch-free. In a few days I will merge it back to PulseAudio
trunk, and eventually release it as 0.9.11. I think it’s time to
explain a little what all this “glitch-freeness” is about, what made
it so tricky to implement, and why this is totally awesome
technology. So, here we go:

Traditional Playback Model

Traditionally on most operating systems audio is scheduled via
sound card interrupts (IRQs). When an application opens a sound card
for playback it configures it for a fixed size playback buffer. Then
it fills this buffer with digital PCM sample data. And after that it
tells the hardware to start playback. Then, the hardware reads the
samples from the buffer, one at a time, and passes them on to the DAC
so that eventually they reach the speakers.

After a certain number of samples has been played the sound hardware
generates an interrupt. This interrupt is forwarded to the
application. On Linux/Unix this is done via poll()/select(),
which the application uses to sleep on the sound card file
descriptor. When the application is notified via this interrupt it
overwrites the samples that were just played by the hardware with new
data and goes to sleep again. When the next interrupt arrives the next
block of samples is overwritten, and so on and so on. When the
hardware reaches the end of the hardware buffer it starts from its
beginning again, in a true ring buffer
fashion. This goes on and on and on.

The number of samples after which an interrupt is generated is
usually called a fragment (ALSA likes to call the same thing a
period for some reason). The number of fragments the entire
playback buffer is split into is usually integral and usually a power of
two, 2 and 4 being the most frequently used values.

Image 1: Schematic overview of the playback buffer in the traditional playback model, in the best way the author can visualize this with his limited drawing abilities.

If the application is not quick enough to fill up the hardware
buffer again after an interrupt we get a buffer underrun
(“drop-out”). An underrun is clearly hearable by the user as a
discontinuity in audio which is something we clearly don’t want. We
thus have to carefully make sure that the buffer and fragment sizes
are chosen in a way that the software has enough time to calculate the
data that needs to be played, and the OS has enough time to forward
the interrupt from the hardware to the userspace software and the
write request back to the hardware.

Depending on the requirements of the application the size of the
playback buffer is chosen. It can be as small as 4ms for low-latency
applications (such as music synthesizers), or as long as 2s for
applications where latency doesn’t matter (such as music players). The
hardware buffer size directly translates to the latency that the
playback adds to the system. The smaller the fragment sizes the
application configures, the more time the application has to fill up
the playback buffer again.

Let’s formalize this a bit: Let BUF_SIZE be the size of the
hardware playback buffer in samples, FRAG_SIZE the size of one
fragment in samples, and NFRAGS the number of fragments the buffer is
split into (equivalent to BUF_SIZE divided by FRAG_SIZE), RATE the sampling
rate in samples per second. Then, the overall latency is identical to
BUF_SIZE/RATE seconds. An interrupt is generated every FRAG_SIZE/RATE
seconds. Every time one of those interrupts is generated the application
should fill up one fragment again; if it missed an interrupt this might
become more than one. If it doesn’t miss any interrupt it has
(NFRAGS-1)*FRAG_SIZE/RATE time to fulfill the request. If it needs
more time than this we’ll get an underrun. The fill level of the
playback buffer should thus usually oscillate between BUF_SIZE and
(NFRAGS-1)*FRAG_SIZE. In case of missed interrupts it might however
fall considerably lower, in the worst case to 0 which is, again, an
underrun.
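
To make these relations concrete, here is a minimal Python sketch. The class name BufferMetrics and the example values (4 fragments of 1200 samples at 48 kHz) are assumptions chosen for illustration only; nothing here is taken from ALSA or PulseAudio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferMetrics:
    """Buffer metrics of the traditional model, named as in the text."""
    frag_size: int  # FRAG_SIZE: samples per fragment
    nfrags: int     # NFRAGS: fragments per playback buffer
    rate: int       # RATE: samples per second

    @property
    def buf_size(self) -> int:
        # BUF_SIZE = NFRAGS * FRAG_SIZE, in samples
        return self.nfrags * self.frag_size

    @property
    def latency(self) -> float:
        # Overall latency in seconds: BUF_SIZE / RATE
        return self.buf_size / self.rate

    @property
    def interrupt_interval(self) -> float:
        # Seconds between two interrupts: FRAG_SIZE / RATE
        return self.frag_size / self.rate

    @property
    def refill_deadline(self) -> float:
        # Time the application has to refill one fragment after an
        # interrupt before an underrun: (NFRAGS-1) * FRAG_SIZE / RATE
        return (self.nfrags - 1) * self.frag_size / self.rate


# Hypothetical example values, not taken from any real device.
metrics = BufferMetrics(frag_size=1200, nfrags=4, rate=48000)
print(metrics.buf_size, metrics.latency,
      metrics.interrupt_interval, metrics.refill_deadline)
```

With these example values the buffer holds 4800 samples, and the refill deadline is three quarters of the overall latency, since only one of the four fragments is free after each interrupt.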

It is difficult to choose the buffer and fragment sizes in an
optimal way for an application:

  • The buffer size should be as large as possible to minimize the
    risk of drop-outs.
  • The buffer size should be as small as possible to guarantee
    minimal latencies.
  • The fragment size should be as large as possible to minimize the
    number of interrupts, and thus the required CPU time used, to maximize
    the time the CPU can sleep for between interrupts and thus the battery
    lifetime (i.e. the fewer interrupts are generated the lower your audio
    app will show up in powertop, and that’s what it is all about,
    right?)
  • The fragment size should be as small as possible to give the
    application as much time as possible to fill up the playback buffer,
    to minimize drop-outs.

As you can easily see it is impossible to choose buffering metrics
in a way that they are optimal on all four requirements.

This traditional model has major drawbacks:

  • The buffering metrics are highly dependent on what the sound hardware
    can provide. Portable software needs to be able to deal with hardware
    that can only provide a very limited set of buffer and fragment
    sizes.
  • The buffer metrics are configured only once, when the device is
    opened, they usually cannot be reconfigured during playback without
    major discontinuities in audio. This is problematic if more than one
    application wants to output audio at the same time via a sound server
    (or dmix) and they have different requirements on
    latency. For these sound servers/dmix the fragment metrics are
    configured statically in a configuration file, and are the same during
    the whole lifetime. If a client connects that needs lower latencies,
    it has basically lost. If a client connects that doesn’t need as low
    latencies, we will continuously burn more CPU/battery than
    necessary.
  • It is practically impossible to choose the buffer metrics optimal
    for your application — there are too many variables in the equation:
    you can’t know anything about the IRQ/scheduling latencies of the
    OS/machine your software will be running on; you cannot know how much
    time it will actually take to produce the audio data that shall be
    pushed to the audio device (unless you start counting cycles, which is
    a good way to make your code unportable); the scheduling latencies are
    hugely dependent on the system load on most current OSes (unless you
    have an RT system, which we generally do not have). As said, for sound
    servers/dmix it is impossible to know in advance what the requirements
    on latency are that the applications that might eventually connect
    will have.
  • Since the number of fragments is integral and at least 2
    on almost all existing hardware we will generate at least two interrupts
    on each buffer iteration. If we fix the buffer size to 2s then we will
    generate an interrupt at least every 1s. We’d then have 1s to fill up
    the buffer again — on all modern systems this is far more than we’d
    ever need. It would be much better if we could fix the fragment size
    to 1.9s, which still gives us 100ms to fill up the playback buffer
    again, still more than necessary on most systems.

Due to the limitations of this model most current (Linux/Unix)
software uses buffer metrics that turned out to “work most of the
time”, very often they are chosen without much thinking, by copying
other people’s code, or totally at random.

PulseAudio <= 0.9.10 uses a fragment size of 25ms by default, with
four fragments. That means that right now, unless you reconfigure your
PulseAudio manually clients will not get latencies lower than 100ms
whatever you try, and as long as music is playing you will
get 40 interrupts/s. (The relevant configuration options for PulseAudio are
default-fragments= and default-fragment-size-msec=
in daemon.conf)

dmix uses 16 fragments by default with a size of 21 ms each (on my
system at least — this varies, depending on your hardware). You can’t
get less than 47 interrupts/s. (You can change the parameters in
.asoundrc)
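
As a quick sanity check of the figures quoted above: the interrupt rate is the inverse of the fragment duration, and the buffer latency is the fragment duration times the number of fragments. The helper names below are made up for this sketch.

```python
def interrupts_per_second(frag_ms: float) -> float:
    """One interrupt per fragment, so the rate is 1000 ms / fragment length."""
    return 1000.0 / frag_ms


def buffer_latency_ms(frag_ms: float, nfrags: int) -> float:
    """Latency of the whole playback buffer in milliseconds."""
    return frag_ms * nfrags


# Values quoted in the text: PulseAudio <= 0.9.10 defaults and the
# dmix defaults observed on the author's system.
for name, frag_ms, nfrags in [("PulseAudio", 25, 4), ("dmix", 21, 16)]:
    print(name,
          buffer_latency_ms(frag_ms, nfrags), "ms buffer,",
          interrupts_per_second(frag_ms), "interrupts/s")
```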

So much about the traditional model and its limitations. Now, we’ll
have a peek at how the new glitch-free branch of PulseAudio
does its things. The technology is not really new. It’s inspired
by what Vista does these days and what Apple CoreAudio has already
been doing for quite a while. However, on Linux this technology is
new, we have been lagging behind quite a bit. Also I claim that what
PA does now goes beyond what Vista/MacOS does in many ways, though of
course, they provide much more than we provide in many other ways. The
name glitch-free is inspired by the term Microsoft uses for
this model, however I must admit that I am not sure that my
definition of this term and theirs actually is the same.

Glitch-Free Playback Model

The first basic idea of the glitch-free playback model (a
better, less marketingy name is probably timer-based audio
scheduling, which is the term I internally use in the PA codebase)
is to no longer depend on sound card interrupts to schedule audio but
use system timers instead. System timers are far more flexible than
the fragment-based sound card timers. They can be reconfigured at any
time, and have a granularity that is independent from any buffer
metrics of the sound card. The second basic idea is to use playback
buffers that are as large as possible, up to a limit of 2s or 5s. The
third basic idea is to allow rewriting of the hardware buffer at any
time. This allows instant reaction to user input (e.g. pause/seek
requests in your music player, or instant event sounds) although the
huge latency imposed by the hardware playback buffer would suggest
otherwise.

PA configures the audio hardware to the largest playback buffer
size possible, up to 2s. The sound card interrupts are disabled as far
as possible (most of the time this means to simply lower NFRAGS to the
minimal value supported by the hardware. It would be great if ALSA
would allow us to disable sound card interrupts entirely). Then, PA
constantly determines what the minimal latency requirement of all
connected clients is. If no client specified any requirements we fill
up the whole buffer all the time, i.e. have an actual latency of
2s. However, if some applications specified requirements, we take the
lowest one and only use as much of the configured hardware buffer as
this value allows us. In practice, this means we only partially fill the
buffer each time we wake up. Then, we configure a system timer
to wake us up 10ms before the buffer would run empty and fill it up
again then. If the overall latency is configured to less than 10ms we
wake up after half the latency requested.
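
The scheduling decision described above might be sketched roughly as follows in Python. This is not PA's actual code: the function names are hypothetical, and treating a latency of exactly 10ms like the "less than 10ms" case is an assumption made here to avoid a zero-length sleep.

```python
from typing import Iterable, Optional

HW_BUFFER_MS = 2000.0     # largest hardware buffer PA asks for (2s)
WAKEUP_MARGIN_MS = 10.0   # wake up this long before the buffer runs empty


def target_latency_ms(client_requirements: Iterable[Optional[float]],
                      hw_buffer_ms: float = HW_BUFFER_MS) -> float:
    """Pick how much of the hardware buffer to fill.

    None means "this client stated no latency requirement".
    """
    requested = [r for r in client_requirements if r is not None]
    if not requested:
        # Nobody cares about latency: use the whole buffer.
        return hw_buffer_ms
    # Otherwise the most demanding client wins, capped by the hardware.
    return min(min(requested), hw_buffer_ms)


def sleep_time_ms(latency_ms: float,
                  margin_ms: float = WAKEUP_MARGIN_MS) -> float:
    """How long to sleep after filling the buffer up to latency_ms."""
    if latency_ms <= margin_ms:
        # Very low latency: wake up after half the requested latency.
        return latency_ms / 2
    # Normal case: wake up margin_ms before the buffer would run empty.
    return latency_ms - margin_ms


print(sleep_time_ms(target_latency_ms([])))                   # no requirements
print(sleep_time_ms(target_latency_ms([None, 50.0, 200.0])))  # 50ms wins
```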

If the sleep time turns out to be too long (i.e. it took more than
10ms to fill up the hardware buffer) we will get an underrun. If this
happens we can double the time we wake up before the buffer would run
empty, to 20ms, and so on. If we notice that we only used much less
than the time we estimated, we can halve this value again. This
adaptive scheme makes sure that in the unlikely event of a buffer
underrun it will happen most likely only once and never again.
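
A minimal sketch of this adaptive scheme, under stated assumptions: the text does not quantify "much less than the time we estimated", so the one-quarter threshold below is invented for illustration, as are the class name and the 10ms floor.

```python
class WakeupMargin:
    """Tracks how long before buffer exhaustion we wake up to refill."""

    def __init__(self, initial_ms: float = 10.0, minimum_ms: float = 10.0):
        self.margin_ms = initial_ms
        self.minimum_ms = minimum_ms  # assumed floor, not from the text

    def on_underrun(self) -> None:
        # We woke up too late: double the safety margin (10ms -> 20ms -> ...).
        self.margin_ms *= 2

    def on_refill(self, used_ms: float) -> None:
        # If refilling took much less time than the margin, halve it again.
        # "Much less" is taken to mean under a quarter (an assumption).
        halved = self.margin_ms / 2
        if used_ms < self.margin_ms / 4 and halved >= self.minimum_ms:
            self.margin_ms = halved


margin = WakeupMargin()
margin.on_underrun()       # margin grows to 20ms
margin.on_underrun()       # margin grows to 40ms
margin.on_refill(5.0)      # 5ms is well under 40ms/4, so shrink to 20ms
print(margin.margin_ms)
```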

When a new client connects or an existing client disconnects, or
when a client wants to rewrite what it already wrote, or the user
wants to change the volume of one of the streams, then PA will
resample its data passed by the client, convert it to the proper
hardware sample type, and remix it with the data of the other
clients. This of course makes it necessary to keep a “history” of data
of all clients around so that if one client requests a
rewrite we have the necessary data around to remix what already was
mixed before.

The benefits of this model are manifold:

  • We minimize the overall number of interrupts, down to what the
    latency requirements of the connected clients allow us, i.e. we save
    power and don’t show up in powertop anymore for normal music playback.
  • We maximize drop-out safety, because we buffer up to 2s in the
    usual cases. Only with operating systems which have scheduling
    latencies > 2s we can still get drop-outs. Thankfully no
    operating system is that bad.
  • In the event of an underrun we don’t get stuck in it, but instead
    are able to recover quickly and can make sure it doesn’t happen again.
  • We provide “zero-latency”. Each client can rewrite its playback
    buffer at any time, and this is forwarded to the hardware, even if
    this means that the sample currently being played needs to be
    rewritten. This means much quicker reaction to user input, a more
    responsive user experience.
  • We become much less dependent on what the sound hardware provides
    us with. We can configure wakeup times that are independent from the
    fragment settings that the hardware actually supports.
  • We can provide almost any latency a client might request,
    dynamically without reconfiguration, without discontinuities in
    audio.

Of course, this scheme also comes with major complications:

  • System timers and sound card timers deviate. On many sound cards
    by quite a bit. Also, not all sound cards allow the user to query the
    playback frame index at any time, but only shortly after each IRQ. To
    compensate for this deviation PA contains a non-trivial algorithm
    which tries to estimate and follow the deviation over time. If this
    doesn’t work properly it might happen that an underrun happens much
    earlier than we expected.
  • System timers on Unix are not very high precision. On traditional
    Linux with HZ=100 sleep times for timers are rounded up to multiples
    of 10ms. Only very recent Linux kernels with hrtimers can
    provide something better, but only on x86 and x86-64 until now. This
    makes the whole scheme unusable for low latency setups unless you run
    the very latest Linux. Also, hrtimers are not (yet) exposed in
    poll()/select(). It requires major jumping through hoops to
    work around this limitation.
  • We need to keep a history of sample data for each stream around,
    thus increasing the memory footprint and potentially the cache
    pressure. PA tries to counter both by doing zero-copy memory
    management.
  • We’re still dependent on the maximum playback buffer size the
    sound hardware supports. Many sound cards don’t even support 2s, but only
    300ms or suchlike.
  • The rewriting of the client buffers causing rewriting of the
    hardware buffer complicates the resampling/converting step
    immensely. In general the code to implement this model is more complex
    than for the traditional model. Also, ALSA has not really been
    designed with this model in mind, which makes some things very hard
    to get right and suboptimal.
  • Generally, this works reliably only on newest ALSA, newest kernel,
    newest everything. It has pretty steep requirements on software and
    sometimes even on hardware. To stay compatible with systems that don’t
    fulfill these requirements we need to carry around code for the
    traditional playback model as well, increasing the code base by far.
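
The timer granularity point in the list above can be illustrated with a small Python sketch. It models only the rounding described in the text (sleep times rounded up to whole ticks of 1/HZ seconds) and is a simplification, not a description of what any particular kernel does; the function name is made up.

```python
import math


def rounded_sleep_ms(requested_ms: float, hz: int = 100) -> float:
    """Round a requested sleep time up to a whole number of timer ticks."""
    tick_ms = 1000.0 / hz  # 10ms per tick with HZ=100
    return math.ceil(requested_ms / tick_ms) * tick_ms


# With HZ=100 even a 1ms request sleeps for a full 10ms tick, which is
# why low latency setups need something better than tick-based timers.
for requested in (1.0, 10.0, 12.0):
    print(requested, "->", rounded_sleep_ms(requested))
```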

The advantages of the scheme clearly outweigh the complexities it
causes. Especially the power-saving features of glitch-free PA should
be enough reason for the embedded Linux people to adopt it
quickly. Make PA disappear from powertop even if you play music!

The code in the glitch-free branch is still rough and sometimes
incomplete. I will merge it shortly into trunk and then
upload a snapshot to Rawhide.

I hope this text also explains to the few remaining PA haters a
little better why PA is a good thing, and why everyone should have it
on his Linux desktop. Of course these changes are not visible on the
surface, my hope with this blog story is to explain a bit better why
infrastructure matters, and counter misconceptions about what PA actually is
and what it gives you on top of ALSA.

BOSSA 2008

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/bossa-2008.html

Just three words: awesome awesome awesome.

And for those asking for it, here are my slides, in which I try to
explain the new “glitch-free” audio scheduling
core of PulseAudio that I recently committed to the glitch-free branch
in PA SVN. I also try to make clear why this functionality is practically a
*MUST* for all people who want to have low-latency audio, minimal power
consumption and maximum drop-out safety for their audio playback. And thus, why
all those fancy embedded Linux devices should adopt it sooner rather than
later. The slides might appear a bit terse if you don’t have that awesome guy
they usually come with presenting them to you.

Back from LCA

Post Syndicated from Lennart Poettering original https://0pointer.net/blog/projects/lca2008.html

After coming back from my somewhat extended linux.conf.au trip I spent the
whole day grepping through email. Only 263 unprocessed emails left in my inbox.
Yay.

PRTPILU

Thanks to the LCA guys, video footage is now available of all talks,
including my talk Practical Real-Time Programming in Linux Userspace
(Theora, Slides).
In my endless modesty I have to recommend: go, watch it, it contains some
really good stuff (including me not being able to divide 1 by 1000). Right now,
the real-time features of the Linux kernel are seldom used on the desktop due
to a couple of reasons, among them general difficulty and unsafety to use them
but predominantly it’s probably just unawareness. There are a couple of
situations however, where scheduling desktop processes as RT makes a lot of
sense (think of video playback, mouse cursor feedback, etc.), to decouple the
execution (scheduling) latency from the system load. This talk focussed mostly
on non-trivial technical stuff and all the limitations RT on Linux still has.
To fully grok what’s going on you thus need some insight into concurrent
programming and stuff.

My plan is to submit a related talk to GUADEC which will focus more on
actually building RT apps for the desktop, in the hope we will eventually be
able to ship a desktop with audio and video that never skips, and where user
feedback is still snappy and quick even if we do the most complicated IO
intensive processing in lots of different processes in the background on slow
hardware.

I didn’t have time to go through all my slides (which I intended that way
and is perfectly OK), so you might want to browse through my slides even if you
saw the whole clip. The slides, however, are not particularly verbose.

Rumors

Regarding all those rumors that have been spread while I, the maintainer
of PulseAudio, was in the
middle of the Australian outback, fist-fighting with kangaroos near Uluru: I
am not really asking anyone to port their apps to the native PulseAudio API right now. While I do think
the API is quite powerful and not redundant, I also acknowledge that it is
very difficult to use properly (and very easy to misuse), (mostly) due to its
fully asynchronous nature. The mysterious libsydney project is
supposed to fix this and a lot more. libsydney is mostly the Duke Nukem
Forever of audio APIs right now, but in contrast to DNF I didn’t really
announce it publicly yet, so it doesn’t really count. 😉 Suffice to
say, the current situation of audio APIs is a big big mess. We are working on
cleaning it up. For now: stick to the well established and least-broken APIs,
which boils down to ALSA. Stop using the OSS API now! Don’t program
against the ESD API (except for event sounds). But, most importantly: please
stop misusing the existing APIs. I am doing my best to allow all current APIs
to run without hassles on top of PA, but due to the sometimes blatant misuse,
or even brutal violations of those APIs it is very hard to get that working
for all applications (yes, that means you, Adobe, and you, Skype). Don’t
expect that mmap is available on all audio devices — it’s not, and especially
not on PA. Don’t use /proc/asound/pcm as an API for enumerating audio
devices. It’s totally unsuitable for that. Don’t hard code device strings. Use
default as device string. Don’t make assumptions that are not and
cannot be true for non-hardware devices. Don’t fiddle around with period
settings unless you fully grok them and know what you are doing. In short: be
a better citizen, write code you don’t need to be ashamed of. ALSA has its
limitations and my compatibility code certainly as well, but this is not an
excuse for working around them by writing code that makes little children cry.
If you have a good ALSA backend for your program then this will not only fix
your issues with PA, but also with Bluetooth, you will have less code to
maintain and also code that is much easier to maintain.

Or even shorter: Fix. Your. Broken. ALSA. Client. Code. Thank you.

Oh, if you have questions regarding PA, just ping me on IRC (if I am
around) or write me an email, like everyone else. Mysterious, blogged pseudo
invitations to rumored meetings are not the best way to contact me.

When your apt-mirror is always downloading

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2008/01/24/apt-mirror-2.html

When I started building our apt-mirror, I ran into a problem: the
machine was throttled against ubuntu.com’s servers, but I had completed
much of the download (which took weeks to get multiple distributions).
I really wanted to roll out the solution quickly, particularly because
the service from the remote servers was worse than ever due to the
throttling that the mirroring created. But, with the mirror incomplete,
I couldn’t easily make the incomplete repositories available.

The solution was to simply let Apache redirect users on to the real
servers if the mirror doesn’t have the file. The first order of
business for that is to rewrite and redirect URLs when files aren’t
found. This is a straightforward Apache configuration:

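           # Fall through to the upstream server for anything the mirror lacks.
           RewriteEngine on
           RewriteLogLevel 0
           # Leave local CGI scripts alone.
           RewriteCond %{REQUEST_FILENAME} !^/cgi/
           # Only act when the mirror has neither a file nor a directory here.
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-F
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-d
           # Never fetch Packages.bz2/Sources.bz2 or index files from upstream.
           RewriteCond %{REQUEST_URI} !(Packages|Sources)\.bz2$
           RewriteCond %{REQUEST_URI} !/index\.[^/]*$ [NC]
           # Proxy ([P]) everything else to the real server, addressed by IP.
           RewriteRule ^(http://%{HTTP_HOST})?/(.*) http://91.189.88.45/$2 [P]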
         

Note a few things there:

  • I have to hard-code an IP number, because as I mentioned in
    the last post on this subject, I’ve faked out DNS
    for archive.ubuntu.com and other sites I’m mirroring. (Note:
    this has the unfortunate side-effect that I can’t easily take advantage
    of round-robin DNS on the other side.)

  • I avoid taking Packages.bz2 from the other site, because
    apt-mirror actually doesn’t mirror the bz2 files (although I’ve
    submitted a patch to it so it will eventually).

  • I make sure that index files get built by my Apache and not
    redirected.

  • I am using Apache proxying, which gives me Yet Another type of
    cache temporarily while I’m still downloading the other packages. (I
    should actually work out a way to have these caches used by apt-mirror
    itself in case a user has already requested a new package while waiting
    for apt-mirror to get it.)
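
To make the chain of conditions easier to follow, here is a rough Python sketch of the decision they encode. It is only an approximation under stated assumptions: the function names `should_proxy` and `proxy_target` are hypothetical, and a plain filesystem check stands in for Apache's `-F` test, which really performs a subrequest. The mirror path and the upstream IP are copied from the configuration above.

```python
import os
import re

# Both values are copied from the Apache configuration above.
MIRROR_ROOT = "/var/spool/apt-mirror/mirror/archive.ubuntu.com"
UPSTREAM = "http://91.189.88.45"


def should_proxy(path, mirror_root=MIRROR_ROOT):
    """Return True when a request path would be handed to the upstream server.

    Each early return mirrors one RewriteCond: all conditions must hold
    before the RewriteRule proxies the request.
    """
    if path.startswith("/cgi/"):
        return False  # local CGI scripts are never proxied
    local = mirror_root + path
    if os.path.isfile(local) or os.path.isdir(local):
        return False  # the mirror already has this file or directory
    if re.search(r"(Packages|Sources)\.bz2$", path):
        return False  # bz2 package lists are not fetched from upstream
    if re.search(r"/index\.[^/]*$", path, re.IGNORECASE):
        return False  # index files are built locally
    return True


def proxy_target(path, upstream=UPSTREAM):
    """Build the upstream URL that the [P] flag would proxy to."""
    return upstream + path
```

Calling `should_proxy("/ubuntu/pool/some-package.deb")` on a machine where that file has not been mirrored yet would return True, and `proxy_target` would then give the URL on the hard-coded upstream address.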

Once I do a rewrite like this for each of the hosts I’m replacing with
a mirror, I’m almost done. The problem is that if for any reason my
site needs to give a 403 to the clients, I would actually like to
double-check to be sure that the URL doesn’t happen to work at the place
I’m mirroring from.

My hope was that I could write a RewriteRule based on what the
HTTP return code would be when the request completed. This seemed
really hard to do, and perhaps not doable at all. The quickest
solution I found was to write a CGI script to do the redirect. So, in
the Apache config I have:

        ErrorDocument 403 /cgi/redirect-forbidden.cgi
        

And, the CGI script looks like this:

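        #!/usr/bin/perl
        
        use strict;
        use CGI qw(:standard);
        
        # URL of the original request, as passed along to the error document.
        my $val = $ENV{REDIRECT_SCRIPT_URI};
        
        # Keep only the path; $1 holds the host label in front of .sflc.info.
        # The dots are escaped so that they match literally.
        $val =~ s%^http://(\S+)\.sflc\.info(/.*)$%$2%;
        if ($1 eq "ubuntu-security") {
           # Security updates come from a different upstream address.
           $val = "http://91.189.88.37$val";
        } else {
           $val = "http://91.189.88.45$val";
        }
        
        # Send the client an HTTP redirect to the upstream URL.
        print redirect($val);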
        

With these changes, the user will be redirected to the original when
the files aren’t available on the mirror, and as the mirror gets more
accurate, they’ll get more files from the mirror.
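
The host-to-upstream mapping that the CGI script performs can also be sketched in Python, which makes it easy to try against sample URLs. This is an illustration rather than the script I run: the function name `upstream_url` is hypothetical, and unlike the Perl version it returns None for a URL that does not look like one of the mirror's hostnames instead of falling through to the default address. The two IP addresses are the ones from the script.

```python
import re

# Upstream addresses taken from the Perl script above.
SECURITY_UPSTREAM = "http://91.189.88.37"
DEFAULT_UPSTREAM = "http://91.189.88.45"

# Captures the host label in front of .sflc.info and the request path.
_MIRROR_URL = re.compile(r"^http://([^/\s]+)\.sflc\.info(/.*)$")


def upstream_url(script_uri):
    """Map a URL on the mirror to the matching upstream URL.

    Returns None when the input is missing or is not a mirror URL.
    """
    match = _MIRROR_URL.match(script_uri or "")
    if match is None:
        return None
    host_label, path = match.groups()
    if host_label == "ubuntu-security":
        return SECURITY_UPSTREAM + path
    return DEFAULT_UPSTREAM + path
```

For example, a request for a path under the ubuntu-security host maps to the 91.189.88.37 address with the same path, and any other host label under sflc.info maps to 91.189.88.45.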

I still have problems if for any reason the user gets a Packages or
Sources file from the original site before the mirror is synchronized,
but this rarely happens since apt-mirror is pretty careful. The only
time it might happen is if the user did an apt-get update when
not connected to our VPN and only a short time later did one while
connected.

When your apt-mirror is always downloading

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2008/01/24/apt-mirror-2.html

When I started building our apt-mirror, I ran into a problem: the
machine was throttled against ubuntu.com’s servers, but I had completed
much of the download (which took weeks to get multiple distributions).
I really wanted to roll out the solution quickly, particularly because
the service from the remote servers was worse than ever due to the
throttling that the mirroring created. But, with the mirror incomplete,
I couldn’t so easily make available incomplete repositories.

The solution was to simply let apache redirect users on to the real
servers if the mirror doesn’t have the file. The first order of
business for that is to rewrite and redirect URLs when files aren’t
found. This is a straightforward Apache configuration:

           RewriteEngine on
           RewriteLogLevel 0
           RewriteCond %{REQUEST_FILENAME} !^/cgi/
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-F
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-d
           RewriteCond %{REQUEST_URI} !(Packages|Sources)\.bz2$
           RewriteCond %{REQUEST_URI} !/index\.[^/]*$ [NC]
           RewriteRule ^(http://%{HTTP_HOST})?/(.*) http://91.189.88.45/$2 [P]
         

Note a few things there:

  • I have to hard-code an IP number, because as I mentioned in
    the last
    post on this subject
    , I’ve faked out DNS
    for archive.ubuntu.com and other sites I’m mirroring. (Note:
    this has the unfortunate side-effect that I can’t easily take advantage
    of round-robin DNS on the other side.)

  • I avoid taking Packages.bz2 from the other site, because
    apt-mirror actually doesn’t mirror the bz2 files (although I’ve
    submitted a patch to it so it will eventually).

  • I make sure that index files get built by my Apache and not
    redirected.

  • I am using Apache proxying, which gives me Yet Another type of
    cache temporarily while I’m still downloading the other packages. (I
    should actually work out a way to have these caches used by apt-mirror
    itself in case a user has already requested a new package while waiting
    for apt-mirror to get it.)

Once I do a rewrite like this for each of the hosts I’m replacing with
a mirror, I’m almost done. The problem is that if for any reason my
site needs to give a 403 to the clients, I would actually like to
double-check to be sure that the URL doesn’t happen to work at the place
I’m mirroring from.

My hope was that I could write a RewriteRule based on what the
HTTP return code would be when the request completed. This was
really hard to do, it seemed, and perhaps undoable. The quickest
solution I found was to write a CGI script to do the redirect. So, in
the Apache config I have:

        ErrorDocument 403 /cgi/redirect-forbidden.cgi
        

And, the CGI script looks like this:

        #!/usr/bin/perl
        
        use strict;
        use CGI qw(:standard);
        
        my $val = $ENV{REDIRECT_SCRIPT_URI};
        
        $val =~ s%^http://(\S+).sflc.info(/.*)$%$2%;
        if ($1 eq "ubuntu-security") {
           $val = "http://91.189.88.37$val";
        } else {
           $val = "http://91.189.88.45$val";
        }
        
        print redirect($val);
        

With these changes, the user will be redirected to the original when
the files aren’t available on the mirror, and as the mirror gets more
accurate, they’ll get more files from the mirror.

I still have problems if for any reason the user gets a Packages or
Sources file from the original site before the mirror is synchronized,
but this rarely happens since apt-mirror is pretty careful. The only
time it might happen is if the user did an apt-get update when
not connected to our VPN and only a short time later did one while
connected.

When your apt-mirror is always downloading

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2008/01/24/apt-mirror-2.html

When I started building our apt-mirror, I ran into a problem: the
machine was throttled against ubuntu.com’s servers, but I had completed
much of the download (which took weeks to get multiple distributions).
I really wanted to roll out the solution quickly, particularly because
the service from the remote servers was worse than ever due to the
throttling that the mirroring created. But, with the mirror incomplete,
I couldn’t so easily make available incomplete repositories.

The solution was to simply let apache redirect users on to the real
servers if the mirror doesn’t have the file. The first order of
business for that is to rewrite and redirect URLs when files aren’t
found. This is a straightforward Apache configuration:

           RewriteEngine on
           RewriteLogLevel 0
           RewriteCond %{REQUEST_FILENAME} !^/cgi/
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-F
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-d
           RewriteCond %{REQUEST_URI} !(Packages|Sources)\.bz2$
           RewriteCond %{REQUEST_URI} !/index\.[^/]*$ [NC]
           RewriteRule ^(http://%{HTTP_HOST})?/(.*) http://91.189.88.45/$2 [P]
         

Note a few things there:

  • I have to hard-code an IP number, because as I mentioned in
    the last
    post on this subject
    , I’ve faked out DNS
    for archive.ubuntu.com and other sites I’m mirroring. (Note:
    this has the unfortunate side-effect that I can’t easily take advantage
    of round-robin DNS on the other side.)

  • I avoid taking Packages.bz2 from the other site, because
    apt-mirror actually doesn’t mirror the bz2 files (although I’ve
    submitted a patch to it so it will eventually).

  • I make sure that index files get built by my Apache and not
    redirected.

  • I am using Apache proxying, which gives me Yet Another type of
    cache temporarily while I’m still downloading the other packages. (I
    should actually work out a way to have these caches used by apt-mirror
    itself in case a user has already requested a new package while waiting
    for apt-mirror to get it.)

Once I do a rewrite like this for each of the hosts I’m replacing with
a mirror, I’m almost done. The problem is that if for any reason my
site needs to give a 403 to the clients, I would actually like to
double-check to be sure that the URL doesn’t happen to work at the place
I’m mirroring from.

My hope was that I could write a RewriteRule based on what the
HTTP return code would be when the request completed. This was
really hard to do, it seemed, and perhaps undoable. The quickest
solution I found was to write a CGI script to do the redirect. So, in
the Apache config I have:

        ErrorDocument 403 /cgi/redirect-forbidden.cgi
        

And, the CGI script looks like this:

        #!/usr/bin/perl
        
        use strict;
        use CGI qw(:standard);
        
        my $val = $ENV{REDIRECT_SCRIPT_URI};
        
        $val =~ s%^http://(\S+).sflc.info(/.*)$%$2%;
        if ($1 eq "ubuntu-security") {
           $val = "http://91.189.88.37$val";
        } else {
           $val = "http://91.189.88.45$val";
        }
        
        print redirect($val);
        

With these changes, the user will be redirected to the original when
the files aren’t available on the mirror, and as the mirror gets more
accurate, they’ll get more files from the mirror.

I still have problems if for any reason the user gets a Packages or
Sources file from the original site before the mirror is synchronized,
but this rarely happens since apt-mirror is pretty careful. The only
time it might happen is if the user did an apt-get update when
not connected to our VPN and only a short time later did one while
connected.

When your apt-mirror is always downloading

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2008/01/24/apt-mirror-2.html

When I started building our apt-mirror, I ran into a problem: the
machine was throttled against ubuntu.com’s servers, but I had completed
much of the download (which took weeks to get multiple distributions).
I really wanted to roll out the solution quickly, particularly because
the service from the remote servers was worse than ever due to the
throttling that the mirroring created. But, with the mirror incomplete,
I couldn’t so easily make available incomplete repositories.

The solution was to simply let apache redirect users on to the real
servers if the mirror doesn’t have the file. The first order of
business for that is to rewrite and redirect URLs when files aren’t
found. This is a straightforward Apache configuration:

           RewriteEngine on
           RewriteLogLevel 0
           RewriteCond %{REQUEST_FILENAME} !^/cgi/
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-F
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-d
           RewriteCond %{REQUEST_URI} !(Packages|Sources)\.bz2$
           RewriteCond %{REQUEST_URI} !/index\.[^/]*$ [NC]
           RewriteRule ^(http://%{HTTP_HOST})?/(.*) http://91.189.88.45/$2 [P]
         

Note a few things there:

  • I have to hard-code an IP number, because as I mentioned in
    the last
    post on this subject
    , I’ve faked out DNS
    for archive.ubuntu.com and other sites I’m mirroring. (Note:
    this has the unfortunate side-effect that I can’t easily take advantage
    of round-robin DNS on the other side.)

  • I avoid taking Packages.bz2 from the other site, because
    apt-mirror actually doesn’t mirror the bz2 files (although I’ve
    submitted a patch to it so it will eventually).

  • I make sure that index files get built by my Apache and not
    redirected.

  • I am using Apache proxying, which gives me Yet Another type of
    cache temporarily while I’m still downloading the other packages. (I
    should actually work out a way to have these caches used by apt-mirror
    itself in case a user has already requested a new package while waiting
    for apt-mirror to get it.)

Once I do a rewrite like this for each of the hosts I’m replacing with
a mirror, I’m almost done. The problem is that if for any reason my
site needs to give a 403 to the clients, I would actually like to
double-check to be sure that the URL doesn’t happen to work at the place
I’m mirroring from.

My hope was that I could write a RewriteRule based on what the
HTTP return code would be when the request completed. This was
really hard to do, it seemed, and perhaps undoable. The quickest
solution I found was to write a CGI script to do the redirect. So, in
the Apache config I have:

        ErrorDocument 403 /cgi/redirect-forbidden.cgi
        

And, the CGI script looks like this:

        #!/usr/bin/perl
        
        use strict;
        use CGI qw(:standard);
        
        my $val = $ENV{REDIRECT_SCRIPT_URI};
        
        $val =~ s%^http://(\S+).sflc.info(/.*)$%$2%;
        if ($1 eq "ubuntu-security") {
           $val = "http://91.189.88.37$val";
        } else {
           $val = "http://91.189.88.45$val";
        }
        
        print redirect($val);
        

With these changes, the user will be redirected to the original when
the files aren’t available on the mirror, and as the mirror gets more
accurate, they’ll get more files from the mirror.

I still have problems if for any reason the user gets a Packages or
Sources file from the original site before the mirror is synchronized,
but this rarely happens since apt-mirror is pretty careful. The only
time it might happen is if the user did an apt-get update when
not connected to our VPN and only a short time later did one while
connected.

When your apt-mirror is always downloading

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2008/01/24/apt-mirror-2.html

When I started building our apt-mirror, I ran into a problem: the
machine was throttled against ubuntu.com’s servers, but I had completed
much of the download (which took weeks to get multiple distributions).
I really wanted to roll out the solution quickly, particularly because
the service from the remote servers was worse than ever due to the
throttling that the mirroring created. But, with the mirror incomplete,
I couldn’t so easily make available incomplete repositories.

The solution was to simply let apache redirect users on to the real
servers if the mirror doesn’t have the file. The first order of
business for that is to rewrite and redirect URLs when files aren’t
found. This is a straightforward Apache configuration:

           RewriteEngine on
           RewriteLogLevel 0
           RewriteCond %{REQUEST_FILENAME} !^/cgi/
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-F
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-d
           RewriteCond %{REQUEST_URI} !(Packages|Sources)\.bz2$
           RewriteCond %{REQUEST_URI} !/index\.[^/]*$ [NC]
           RewriteRule ^(http://%{HTTP_HOST})?/(.*) http://91.189.88.45/$2 [P]
         

Note a few things there:

  • I have to hard-code an IP number, because as I mentioned in
    the last
    post on this subject
    , I’ve faked out DNS
    for archive.ubuntu.com and other sites I’m mirroring. (Note:
    this has the unfortunate side-effect that I can’t easily take advantage
    of round-robin DNS on the other side.)

  • I avoid taking Packages.bz2 from the other site, because
    apt-mirror actually doesn’t mirror the bz2 files (although I’ve
    submitted a patch to it so it will eventually).

  • I make sure that index files get built by my Apache and not
    redirected.

  • I am using Apache proxying, which gives me Yet Another type of
    cache temporarily while I’m still downloading the other packages. (I
    should actually work out a way to have these caches used by apt-mirror
    itself in case a user has already requested a new package while waiting
    for apt-mirror to get it.)

Once I do a rewrite like this for each of the hosts I’m replacing with
a mirror, I’m almost done. The problem is that if for any reason my
site needs to give a 403 to the clients, I would actually like to
double-check to be sure that the URL doesn’t happen to work at the place
I’m mirroring from.

My hope was that I could write a RewriteRule based on what the
HTTP return code would be when the request completed. This was
really hard to do, it seemed, and perhaps undoable. The quickest
solution I found was to write a CGI script to do the redirect. So, in
the Apache config I have:

        ErrorDocument 403 /cgi/redirect-forbidden.cgi
        

And, the CGI script looks like this:

        #!/usr/bin/perl
        
        use strict;
        use CGI qw(:standard);
        
        my $val = $ENV{REDIRECT_SCRIPT_URI};
        
        $val =~ s%^http://(\S+).sflc.info(/.*)$%$2%;
        if ($1 eq "ubuntu-security") {
           $val = "http://91.189.88.37$val";
        } else {
           $val = "http://91.189.88.45$val";
        }
        
        print redirect($val);
        

With these changes, the user will be redirected to the original when
the files aren’t available on the mirror, and as the mirror gets more
accurate, they’ll get more files from the mirror.

I still have problems if for any reason the user gets a Packages or
Sources file from the original site before the mirror is synchronized,
but this rarely happens since apt-mirror is pretty careful. The only
time it might happen is if the user did an apt-get update when
not connected to our VPN and only a short time later did one while
connected.

When your apt-mirror is always downloading

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2008/01/24/apt-mirror-2.html

When I started building our apt-mirror, I ran into a problem: the
machine was throttled against ubuntu.com’s servers, but I had completed
much of the download (which took weeks to get multiple distributions).
I really wanted to roll out the solution quickly, particularly because
the service from the remote servers was worse than ever due to the
throttling that the mirroring created. But, with the mirror incomplete,
I couldn’t so easily make available incomplete repositories.

The solution was to simply let apache redirect users on to the real
servers if the mirror doesn’t have the file. The first order of
business for that is to rewrite and redirect URLs when files aren’t
found. This is a straightforward Apache configuration:

           RewriteEngine on
           RewriteLogLevel 0
           RewriteCond %{REQUEST_FILENAME} !^/cgi/
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-F
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-d
           RewriteCond %{REQUEST_URI} !(Packages|Sources)\.bz2$
           RewriteCond %{REQUEST_URI} !/index\.[^/]*$ [NC]
           RewriteRule ^(http://%{HTTP_HOST})?/(.*) http://91.189.88.45/$2 [P]
         

Note a few things there:

  • I have to hard-code an IP number, because as I mentioned in
    the last
    post on this subject
    , I’ve faked out DNS
    for archive.ubuntu.com and other sites I’m mirroring. (Note:
    this has the unfortunate side-effect that I can’t easily take advantage
    of round-robin DNS on the other side.)

  • I avoid taking Packages.bz2 from the other site, because
    apt-mirror actually doesn’t mirror the bz2 files (although I’ve
    submitted a patch to it so it will eventually).

  • I make sure that index files get built by my Apache and not
    redirected.

  • I am using Apache proxying, which gives me Yet Another type of
    cache temporarily while I’m still downloading the other packages. (I
    should actually work out a way to have these caches used by apt-mirror
    itself in case a user has already requested a new package while waiting
    for apt-mirror to get it.)

Once I do a rewrite like this for each of the hosts I’m replacing with
a mirror, I’m almost done. The problem is that if for any reason my
site needs to give a 403 to the clients, I would actually like to
double-check to be sure that the URL doesn’t happen to work at the place
I’m mirroring from.

My hope was that I could write a RewriteRule based on what the
HTTP return code would be when the request completed. This was
really hard to do, it seemed, and perhaps undoable. The quickest
solution I found was to write a CGI script to do the redirect. So, in
the Apache config I have:

        ErrorDocument 403 /cgi/redirect-forbidden.cgi
        

And, the CGI script looks like this:

        #!/usr/bin/perl
        
        use strict;
        use CGI qw(:standard);
        
        my $val = $ENV{REDIRECT_SCRIPT_URI};
        
        $val =~ s%^http://(\S+).sflc.info(/.*)$%$2%;
        if ($1 eq "ubuntu-security") {
           $val = "http://91.189.88.37$val";
        } else {
           $val = "http://91.189.88.45$val";
        }
        
        print redirect($val);
        

With these changes, the user will be redirected to the original when
the files aren’t available on the mirror, and as the mirror gets more
accurate, they’ll get more files from the mirror.

I still have problems if for any reason the user gets a Packages or
Sources file from the original site before the mirror is synchronized,
but this rarely happens since apt-mirror is pretty careful. The only
time it might happen is if the user did an apt-get update when
not connected to our VPN and only a short time later did one while
connected.

When your apt-mirror is always downloading

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2008/01/24/apt-mirror-2.html

When I started building our apt-mirror, I ran into a problem: the
machine was throttled against ubuntu.com’s servers, but I had completed
much of the download (which took weeks to get multiple distributions).
I really wanted to roll out the solution quickly, particularly because
the service from the remote servers was worse than ever due to the
throttling that the mirroring created. But, with the mirror incomplete,
I couldn’t so easily make available incomplete repositories.

The solution was to simply let apache redirect users on to the real
servers if the mirror doesn’t have the file. The first order of
business for that is to rewrite and redirect URLs when files aren’t
found. This is a straightforward Apache configuration:

           RewriteEngine on
           RewriteLogLevel 0
           RewriteCond %{REQUEST_FILENAME} !^/cgi/
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-F
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-d
           RewriteCond %{REQUEST_URI} !(Packages|Sources)\.bz2$
           RewriteCond %{REQUEST_URI} !/index\.[^/]*$ [NC]
           RewriteRule ^(http://%{HTTP_HOST})?/(.*) http://91.189.88.45/$2 [P]
         

Note a few things there:

  • I have to hard-code an IP number, because as I mentioned in
    the last
    post on this subject
    , I’ve faked out DNS
    for archive.ubuntu.com and other sites I’m mirroring. (Note:
    this has the unfortunate side-effect that I can’t easily take advantage
    of round-robin DNS on the other side.)

  • I avoid taking Packages.bz2 from the other site, because
    apt-mirror actually doesn’t mirror the bz2 files (although I’ve
    submitted a patch to it so it will eventually).

  • I make sure that index files get built by my Apache and not
    redirected.

  • I am using Apache proxying, which gives me Yet Another type of
    cache temporarily while I’m still downloading the other packages. (I
    should actually work out a way to have these caches used by apt-mirror
    itself in case a user has already requested a new package while waiting
    for apt-mirror to get it.)

Once I do a rewrite like this for each of the hosts I’m replacing with
a mirror, I’m almost done. The problem is that if for any reason my
site needs to give a 403 to the clients, I would actually like to
double-check to be sure that the URL doesn’t happen to work at the place
I’m mirroring from.

My hope was that I could write a RewriteRule based on what the
HTTP return code would be when the request completed. This was
really hard to do, it seemed, and perhaps undoable. The quickest
solution I found was to write a CGI script to do the redirect. So, in
the Apache config I have:

        ErrorDocument 403 /cgi/redirect-forbidden.cgi
        

And, the CGI script looks like this:

        #!/usr/bin/perl
        
        use strict;
        use CGI qw(:standard);
        
        my $val = $ENV{REDIRECT_SCRIPT_URI};
        
        $val =~ s%^http://(\S+).sflc.info(/.*)$%$2%;
        if ($1 eq "ubuntu-security") {
           $val = "http://91.189.88.37$val";
        } else {
           $val = "http://91.189.88.45$val";
        }
        
        print redirect($val);
        

With these changes, the user will be redirected to the original when
the files aren’t available on the mirror, and as the mirror gets more
accurate, they’ll get more files from the mirror.

I still have problems if for any reason the user gets a Packages or
Sources file from the original site before the mirror is synchronized,
but this rarely happens since apt-mirror is pretty careful. The only
time it might happen is if the user did an apt-get update when
not connected to our VPN and only a short time later did one while
connected.

When your apt-mirror is always downloading

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2008/01/24/apt-mirror-2.html

When I started building our apt-mirror, I ran into a problem: the
machine was throttled against ubuntu.com’s servers, but I had completed
much of the download (which took weeks to get multiple distributions).
I really wanted to roll out the solution quickly, particularly because
the service from the remote servers was worse than ever due to the
throttling that the mirroring created. But, with the mirror incomplete,
I couldn’t so easily make available incomplete repositories.

The solution was to simply let apache redirect users on to the real
servers if the mirror doesn’t have the file. The first order of
business for that is to rewrite and redirect URLs when files aren’t
found. This is a straightforward Apache configuration:

           RewriteEngine on
           RewriteLogLevel 0
           RewriteCond %{REQUEST_FILENAME} !^/cgi/
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-F
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-d
           RewriteCond %{REQUEST_URI} !(Packages|Sources)\.bz2$
           RewriteCond %{REQUEST_URI} !/index\.[^/]*$ [NC]
           RewriteRule ^(http://%{HTTP_HOST})?/(.*) http://91.189.88.45/$2 [P]
         

Note a few things there:

  • I have to hard-code an IP number, because as I mentioned in
    the last
    post on this subject
    , I’ve faked out DNS
    for archive.ubuntu.com and other sites I’m mirroring. (Note:
    this has the unfortunate side-effect that I can’t easily take advantage
    of round-robin DNS on the other side.)

  • I avoid taking Packages.bz2 from the other site, because
    apt-mirror actually doesn’t mirror the bz2 files (although I’ve
    submitted a patch to it so it will eventually).

  • I make sure that index files get built by my Apache and not
    redirected.

  • I am using Apache proxying, which gives me Yet Another type of
    cache temporarily while I’m still downloading the other packages. (I
    should actually work out a way to have these caches used by apt-mirror
    itself in case a user has already requested a new package while waiting
    for apt-mirror to get it.)

Once I do a rewrite like this for each of the hosts I’m replacing with
a mirror, I’m almost done. The problem is that if for any reason my
site needs to give a 403 to the clients, I would actually like to
double-check to be sure that the URL doesn’t happen to work at the place
I’m mirroring from.

My hope was that I could write a RewriteRule based on what the
HTTP return code would be when the request completed. This was
really hard to do, it seemed, and perhaps undoable. The quickest
solution I found was to write a CGI script to do the redirect. So, in
the Apache config I have:

        ErrorDocument 403 /cgi/redirect-forbidden.cgi
        

And, the CGI script looks like this:

        #!/usr/bin/perl
        
        use strict;
        use CGI qw(:standard);
        
        my $val = $ENV{REDIRECT_SCRIPT_URI};
        
        $val =~ s%^http://(\S+).sflc.info(/.*)$%$2%;
        if ($1 eq "ubuntu-security") {
           $val = "http://91.189.88.37$val";
        } else {
           $val = "http://91.189.88.45$val";
        }
        
        print redirect($val);
        

With these changes, the user will be redirected to the original when
the files aren’t available on the mirror, and as the mirror gets more
accurate, they’ll get more files from the mirror.

I still have problems if for any reason the user gets a Packages or
Sources file from the original site before the mirror is synchronized,
but this rarely happens since apt-mirror is pretty careful. The only
time it might happen is if the user did an apt-get update when
not connected to our VPN and only a short time later did one while
connected.

When your apt-mirror is always downloading

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2008/01/24/apt-mirror-2.html

When I started building our apt-mirror, I ran into a problem: the
machine was throttled against ubuntu.com’s servers, but I had completed
much of the download (which took weeks to get multiple distributions).
I really wanted to roll out the solution quickly, particularly because
the service from the remote servers was worse than ever due to the
throttling that the mirroring created. But, with the mirror incomplete,
I couldn’t so easily make available incomplete repositories.

The solution was to simply let apache redirect users on to the real
servers if the mirror doesn’t have the file. The first order of
business for that is to rewrite and redirect URLs when files aren’t
found. This is a straightforward Apache configuration:

           RewriteEngine on
           RewriteLogLevel 0
           RewriteCond %{REQUEST_FILENAME} !^/cgi/
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-F
           RewriteCond /var/spool/apt-mirror/mirror/archive.ubuntu.com%{REQUEST_FILENAME} !-d
           RewriteCond %{REQUEST_URI} !(Packages|Sources)\.bz2$
           RewriteCond %{REQUEST_URI} !/index\.[^/]*$ [NC]
           RewriteRule ^(http://%{HTTP_HOST})?/(.*) http://91.189.88.45/$2 [P]
         

Note a few things there:

  • I have to hard-code an IP number because, as I mentioned in
    the last post on this subject, I’ve faked out DNS
    for archive.ubuntu.com and other sites I’m mirroring. (Note:
    this has the unfortunate side effect that I can’t easily take advantage
    of round-robin DNS on the other side.)

  • I avoid taking Packages.bz2 from the other site, because
    apt-mirror actually doesn’t mirror the bz2 files (although I’ve
    submitted a patch to it so it will eventually).

  • I make sure that index files get built by my Apache and not
    redirected.

  • I am using Apache proxying, which gives me Yet Another type of
    cache temporarily while I’m still downloading the other packages. (I
    should actually work out a way to have these caches used by apt-mirror
    itself in case a user has already requested a new package while waiting
    for apt-mirror to get it.)
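
The decision those rewrite conditions encode can be restated as a small
Python sketch. This is only an illustration of the logic, not part of
the Apache setup: the function name proxy_target and its mirror_root
parameter are made up for the example, the same URL path stands in for
both REQUEST_FILENAME and REQUEST_URI, and plain filesystem checks
replace Apache's subrequest-based -F test.

```python
import os
import re

# Upstream address hard-coded in the Apache configuration above.
UPSTREAM = "http://91.189.88.45"


def proxy_target(path, mirror_root):
    """Return the upstream URL to proxy to, or None to serve locally.

    `path` is the URL path of the request; `mirror_root` is the
    directory holding the mirrored tree (hypothetical parameter).
    """
    # Requests for the CGI scripts are never passed upstream.
    if path.startswith("/cgi/"):
        return None
    # Anything the mirror already holds (file or directory) is local.
    local = os.path.join(mirror_root, path.lstrip("/"))
    if os.path.isfile(local) or os.path.isdir(local):
        return None
    # Packages.bz2 and Sources.bz2 are never taken from the other site.
    if re.search(r"(Packages|Sources)\.bz2$", path):
        return None
    # Index files are built locally; the match is case-insensitive.
    if re.search(r"/index\.[^/]*$", path, re.IGNORECASE):
        return None
    return UPSTREAM + path
```

Every condition has to pass before a request goes upstream, which
mirrors the way consecutive RewriteCond lines are ANDed together.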

Once I do a rewrite like this for each of the hosts I’m replacing with
a mirror, I’m almost done. The problem is that if for any reason my
site needs to give a 403 to the clients, I would actually like to
double-check to be sure that the URL doesn’t happen to work at the place
I’m mirroring from.

My hope was that I could write a RewriteRule based on what the
HTTP return code would be when the request completed. That seemed
really hard to do, and perhaps impossible. The quickest
solution I found was to write a CGI script to do the redirect. So, in
the Apache config I have:

        ErrorDocument 403 /cgi/redirect-forbidden.cgi
        

And, the CGI script looks like this:

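        #!/usr/bin/perl
        
        use strict;
        use CGI qw(:standard);
        
        # URL of the original request that triggered the 403.
        my $val = $ENV{REDIRECT_SCRIPT_URI};
        
        # Reduce the URL to its path and remember which mirrored host was
        # requested. The dots are escaped so they match literally, and $1
        # is read only when the substitution succeeds.
        my $host = "";
        if ($val =~ s%^http://(\S+)\.sflc\.info(/.*)$%$2%) {
           $host = $1;
        }
        
        # Pick the upstream server that corresponds to the mirrored host.
        if ($host eq "ubuntu-security") {
           $val = "http://91.189.88.37$val";
        } else {
           $val = "http://91.189.88.45$val";
        }
        
        print redirect($val);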
        

With these changes, the user will be redirected to the original servers
when the files aren’t available on the mirror, and as the mirror gets
more complete, they’ll get more files from the mirror.
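
The host-to-upstream mapping that the CGI script performs can also be
sketched in Python, which makes it easy to try a few URLs by hand.
This is a rough equivalent under some assumptions rather than the
script actually deployed: the function name upstream_url is invented
for the example, the only mirror host name taken from the script is
ubuntu-security, and returning None for a URL that doesn't match is a
choice made here.

```python
import re

# Upstream addresses copied from the CGI script above.
SECURITY_UPSTREAM = "http://91.189.88.37"
DEFAULT_UPSTREAM = "http://91.189.88.45"

# Same shape as the Perl pattern: mirror host prefix, then the path.
MIRROR_URL = re.compile(r"^http://(\S+)\.sflc\.info(/.*)$")


def upstream_url(script_uri):
    """Map a URL on the mirror to the matching upstream URL.

    Returns None when the URL does not look like one of the mirror's
    own (an assumption made for this sketch).
    """
    match = MIRROR_URL.match(script_uri)
    if match is None:
        return None
    host, path = match.groups()
    if host == "ubuntu-security":
        return SECURITY_UPSTREAM + path
    return DEFAULT_UPSTREAM + path
```

For example, a request for
http://ubuntu-security.sflc.info/ubuntu/dists/ would map to the same
path on 91.189.88.37, while any other mirrored host name falls through
to 91.189.88.45.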

I still have problems if for any reason the user gets a Packages or
Sources file from the original site before the mirror is synchronized,
but this rarely happens since apt-mirror is pretty careful. The only
time it might happen is if the user did an apt-get update when
not connected to our VPN and only a short time later did one while
connected.
