More Xen Tricks

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2007/08/24/more-xen.html

In my previous post about Xen, I talked about how easy Xen is to
configure and set up, particularly on Ubuntu and Debian. I’m still
grateful that Xen remains easy; however, I’ve lately had a few
Xen-related challenges that needed attention. In particular, I’ve
needed to create some surprisingly messy solutions when using
vif-route to route multiple IP numbers on the same network through
the dom0 to a domU.

I tend to use vif-route rather than vif-bridge, as I like the control
it gives me in the dom0. The dom0 becomes a very traditional
packet-forwarding firewall that can decide whether or not to forward
packets to each domU host. However, I recently found some deep
weirdness in IP routing when I use this approach while needing
multiple Ethernet interfaces on the domU. Here’s an example:

Multiple IP numbers for Apache

Suppose the domU host, called webserv, hosts a number of
websites, each with a different IP number, so that I have Apache
doing something like [1]:

Listen 192.168.0.200:80
Listen 192.168.0.201:80
Listen 192.168.0.202:80

NameVirtualHost 192.168.0.200:80
<VirtualHost 192.168.0.200:80>

NameVirtualHost 192.168.0.201:80
<VirtualHost 192.168.0.201:80>

NameVirtualHost 192.168.0.202:80
<VirtualHost 192.168.0.202:80>
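
Each of those <VirtualHost> stanzas then carries the usual per-site
directives; purely as an illustration (the server name and path here
are made up):

<VirtualHost 192.168.0.200:80>
    ServerName www.example.org
    DocumentRoot /var/www/example.org
</VirtualHost>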

The Xen Configuration for the Interfaces

Since I’m serving all three of those sites from webserv, I
need all those IP numbers to be real, live IP numbers on the local
machine as far as the webserv is concerned. So, in
dom0:/etc/xen/webserv.cfg I list something like:

vif = [ 'mac=de:ad:be:ef:00:00, ip=192.168.0.200',
        'mac=de:ad:be:ef:00:01, ip=192.168.0.201',
        'mac=de:ad:be:ef:00:02, ip=192.168.0.202' ]

… And then make webserv:/etc/iftab look like:

eth0 mac de:ad:be:ef:00:00 arp 1
eth1 mac de:ad:be:ef:00:01 arp 1
eth2 mac de:ad:be:ef:00:02 arp 1

… And make webserv:/etc/network/interfaces (this is
probably Ubuntu/Debian-specific, BTW) look like:

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 192.168.0.200
    netmask 255.255.255.0

auto eth1
iface eth1 inet static
    address 192.168.0.201
    netmask 255.255.255.0

auto eth2
iface eth2 inet static
    address 192.168.0.202
    netmask 255.255.255.0

Packet Forwarding from the Dom0

But, this doesn’t get me the whole way there. My next step is to make
sure that the dom0 is routing the packets properly to
webserv. Since my dom0 is heavily locked down, all
packets are dropped by default, so I have to let through explicitly
anything I’d like webserv to be able to process. So, I
add some code to my firewall script on the dom0 that looks like [2]:

webIpAddresses="192.168.0.200 192.168.0.201 192.168.0.202"
UNPRIVPORTS="1024:65535"

for dport in 80 443;
do
  for sport in $UNPRIVPORTS 80 443 8080;
  do
    for ip in $webIpAddresses;
    do
      /sbin/iptables -A FORWARD -i eth0 -p tcp -d $ip \
         --syn -m state --state NEW \
         --sport $sport --dport $dport -j ACCEPT

      /sbin/iptables -A FORWARD -i eth0 -p tcp -d $ip \
         --sport $sport --dport $dport \
         -m state --state ESTABLISHED,RELATED -j ACCEPT

      /sbin/iptables -A FORWARD -o eth0 -s $ip \
         -p tcp --dport $sport --sport $dport \
         -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
    done
  done
done

Phew! So at this point, I thought I was done. The packets should find
their way, forwarded through the dom0, to the Apache instance running
on the domU, webserv. While that much was true, I now had the
additional problem that packets got lost in a bit of a black hole on
webserv. When I discovered the black hole, I quickly realized why.
It was somewhat atypical, from webserv’s point of view, to have three
“real” and different Ethernet devices with three different IP
numbers, all of which talk to the exact same network. Some more
intelligent routing was needed. [3]

Routing in the domU

While most non-sysadmins still use the route command to
set up local IP routes on a GNU/Linux host, iproute2
(available via the ip command) has been a standard part
of GNU/Linux distributions and supported by Linux for nearly ten
years. To properly support the situation of multiple (from
webserv’s point of view, at least) physical interfaces on
the same network, some special iproute2 code is needed.
Specifically, I set up separate route tables for each device. I first
encoded their names in /etc/iproute2/rt_tables (the
numbers 16-18 are arbitrary, BTW):

16 eth0-200
17 eth1-201
18 eth2-202

And here are the ip commands that I thought would work
(but didn’t, as you’ll see next):

/sbin/ip route del default via 192.168.0.1

for table in eth0-200 eth1-201 eth2-202;
do
  iface=`echo $table | perl -pe 's/^(\S+)-.*$/$1/;'`
  ipEnding=`echo $table | perl -pe 's/^.*-(\S+)$/$1/;'`
  ip=192.168.0.$ipEnding
  /sbin/ip route add 192.168.0.0/24 dev $iface table $table

  /sbin/ip route add default via 192.168.0.1 table $table
  /sbin/ip rule add from $ip table $table
  /sbin/ip rule add to 0.0.0.0 dev $iface table $table
done

/sbin/ip route add default via 192.168.0.1

The idea is that each table will use rules to force all traffic coming
in on the given IP number and/or interface to always go back out on
the same one, and vice versa. The key is these two lines:

/sbin/ip rule add from $ip table $table
/sbin/ip rule add to 0.0.0.0 dev $iface table $table

The first rule says that traffic coming from the given IP number,
$ip, should use the routing rules in table $table. The second says
that traffic to anywhere, when bound for interface $iface, should
also use table $table.

The tables themselves are set up to always make sure the local network
traffic goes through the proper associated interface, and that the
network router (in this case, 192.168.0.1) is always
used for foreign networks, but that it is reached via the correct
interface.

This is all well and good, but it doesn’t work. Certain instructions
fail with the message, RTNETLINK answers: Network is
unreachable, because the 192.168.0.0 network cannot be found
while the instructions are running. Perhaps there is an
elegant solution; I couldn’t find one. Instead, I temporarily set
up “dummy” global routes in the main route table and
deleted them once the table-specific ones were created. Here’s the
new bash script that does that (the new lines are the ones that add,
and later delete, the temporary routes with the src parameter):

/sbin/ip route del default via 192.168.0.1
for table in eth0-200 eth1-201 eth2-202;
do
  iface=`echo $table | perl -pe 's/^(\S+)-.*$/$1/;'`
  ipEnding=`echo $table | perl -pe 's/^.*-(\S+)$/$1/;'`
  ip=192.168.0.$ipEnding
  /sbin/ip route add 192.168.0.0/24 dev $iface table $table

  /sbin/ip route add 192.168.0.0/24 dev $iface src $ip

  /sbin/ip route add default via 192.168.0.1 table $table
  /sbin/ip rule add from $ip table $table

  /sbin/ip rule add to 0.0.0.0 dev $iface table $table

  /sbin/ip route del 192.168.0.0/24 dev $iface src $ip
done
/sbin/ip route add 192.168.0.0/24 dev eth0 src 192.168.0.200
/sbin/ip route add default via 192.168.0.1
/sbin/ip route del 192.168.0.0/24 dev eth0 src 192.168.0.200

I am pretty sure I’m missing something here — there must be a
better way to do this, but the above actually works, even if it’s
ugly.
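
One quick way to sanity-check the result on webserv once the script
has run (just a sketch; the exact output format varies between
iproute2 versions):

/sbin/ip rule show
# each interface should get a rule like "from 192.168.0.200 lookup eth0-200"
/sbin/ip route show table eth0-200
# expect: 192.168.0.0/24 dev eth0, plus default via 192.168.0.1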

Alas, Only Three

There was one additional confusion I put myself through while
implementing the solution. I was actually trying to route four
separate IP addresses into webserv, but discovered this error
message (via dmesg on the domU): netfront can’t alloc rx grant
refs. A quick google around turned up the XenFaq, which says that
Xen 3 cannot handle more than three network interfaces per domU.
Seems strangely arbitrary to me; I’d love to hear why it cuts off at
three. I can imagine limits at one and two, but it seems that once
you can do three, n should be possible (perhaps still with linear
slowdown or some such). I’ll have to ask the Xen developers (or
UTSL) some day to find out what makes it possible to have three work
but not four.

[1] Yes, I know I could rely on client-provided Host: headers and do
this with full name-based virtual hosting, but I don’t like to do
that for good reason (as outlined in the Apache docs).

[2] Note that the
above firewall code must run on dom0, which has one real
Ethernet device (its eth0) that is connected properly to
the wide 192.168.0.0/24 network, and should have some IP
number of its own there — say 192.168.0.100. And,
don’t forget that dom0 is configured for vif-route, not
vif-bridge. Finally, for brevity, I’ve left out some of the
firewall code that FORWARDs through key stuff like DNS. If you are
interested in it, email me or look it up in a firewall book.
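
(If you just want the flavor of those rules, here is a minimal
sketch, assuming webserv resolves names against a DNS server
somewhere out on the network:)

# let DNS queries from each webserv IP out, and the replies back in
for ip in $webIpAddresses;
do
  /sbin/iptables -A FORWARD -o eth0 -s $ip -p udp --dport 53 -j ACCEPT
  /sbin/iptables -A FORWARD -i eth0 -d $ip -p udp --sport 53 \
     -m state --state ESTABLISHED,RELATED -j ACCEPT
done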

[3] I was actually a
bit surprised at this, because I often have multiple IP numbers
serviced from the same computer and physical Ethernet interface.
However, in those cases, I use virtual interfaces
(eth0:0, eth0:1, etc.). On a normal system,
Linux does the work of properly routing the IP numbers when you attach
multiple IP numbers virtually to the same physical interface.
However, in Xen domUs, the physical interfaces are locked by Xen to
only permit specific IP numbers to come through, and while you can set
up all the virtual interfaces you want in the domU, it will only get
packets destined for the IP number specified in the vif
section of the configuration file. That’s why I added my three
different “actual” interfaces in the domU.
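
(For anyone who hasn’t seen that usage: on an ordinary, non-Xen
Debian/Ubuntu host, those virtual interfaces are just aliases
declared in /etc/network/interfaces, roughly like this, with the
second IP number on eth0:0:)

auto eth0 eth0:0
iface eth0 inet static
    address 192.168.0.200
    netmask 255.255.255.0
iface eth0:0 inet static
    address 192.168.0.201
    netmask 255.255.255.0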

I wonder …

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/send-file.html

… whether the guys behind this know about this?

It’s a pleasure to see as many projects as possible making use of Avahi.
OTOH I believe that all solutions should speak the same protocol. Using
Apple’s somewhat standardized link-local iChat/XMPP protocol (which is
what Telekinesis does) seems to be the best option to me, because you
get MacOSX interoperability for free and many IM clients (including
many on Windows) already contain support for it as well.

CUPS 1.3b1 gained Zeroconf support

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/cups-bonjour.html

Seems CUPS now comes with Zeroconf/Bonjour network printer browsing support included in the upstream tarball. I haven’t
tried this myself, but presumably CUPS should work on Avahi as well, since we ship a — these days nearly
perfect — Bonjour compatibility library.

In Fedora Rawhide this functionality seems to be enabled already.
Other distributions, please follow!

Seems at least one good thing came from the recent Apple buyout of CUPS/Easy
Software Products
: I can now remove one item from my TODO list which has been there for a long time already.

Slides for LRL and OLS

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/ols-lrl-slides.html

For those interested: here’re my slides for my presentations at LRL and OLS:

Ottawa Linux Symposium 2007: Cleaning up the Linux Desktop Audio Mess (Not too much new stuff here if you already read my LCA slides)

LugRadio Live 2007: Six Use Cases for Avahi

LWN linked a short summary of my OLS talk.

Im Zentrum der Macht

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/photos/im-zentrum-der-macht.html

The Government District in Berlin, with the Reichstag and the offices of the members of the Bundestag:

Im Zentrum der Macht

The Diana Temple in the Hofgarten in Munich:

Hofgarten

The Königsplatz in Munich:

Königsplatz

The Residenz in Munich:

Residenz

View from the tower of Old St. Peter in Munich:

St. Peter

Green pastures of Hamburg-Wohldorf:

Wohldorfer Feld

All my panoramic photos. (Warning! Page contains a lot of oversized, badly scaled images.)

Re: Avahi – what happened. on Solaris..?

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/project-indiana-part2.html

In response to Darren Kenny:

  • On Linux (and FreeBSD) nss-mdns has been
    providing decent low-level integration of mDNS at the nsswitch level for
    ages. In fact it even predates Avahi by a few months. Porting it to Solaris
    would have been almost trivial. And, Sun engineers even asked about nss-mdns,
    so I am quite sure that Sun knew about this.
  • You claim that our C API was internal? I wonder who told you that. I
    definitely did not. The API has been available on the Avahi web site for ages
    and is relatively well documented [1], I wonder how anyone could
    ever come to the opinion that it was “internal”. Regarding API stability: yes, I
    said that we make no guarantees about API stability — but I also said it was
    a top-priority for us to keep the API compatible. I think that is the best you
    can get from any project of the Free Software community. If there is
    something in an API that we later learn is irrecoverably broken or stupid by design, then
    we take the freedom to replace it or remove it entirely. Oh, and even Sun
    does things like that in Java; just think of the Java 1.x
    java.lang.Thread.stop() API.
  • nss-mdns does not make any use of D-Bus. It never did, it never will.
  • GNOME never formally made the decision to go Avahi AFAIK. It’s just what
    everyone uses because it is available on all distributions. Also, a lot of GNOME software
    can also be compiled against HOWL/Bonjour.
  • Implementing the Avahi API on top of the Bonjour API is just crack. For a
    crude comparison: this is like implementing a POSIX compatibility layer on top
    of the DOS API. Crack. Just crack. There is a lot of functionality you can
    *never* emulate in any reasonable way on top of the current Bonjour API:
    properly integrated IPv4+IPv6 support, AVAHI_BROWSER_ALL_FOR_NOW, the fact that the Avahi API is
    transaction-based, all the different flag definitions, and a lot more. From a
    technical perspective emulating Avahi on top of Bonjour is not feasible, while
    the other way round perfectly is.

Let’s also not forget that Avahi comes with a Bonjour compatibility layer,
which gets almost any Bonjour app working on top of Avahi. And in contrast to your
Avahi-on-top-of-Bonjour stuff, it is not inherently borked. Yes, our Bonjour compatibility layer is
not perfect, but it should be very easy to fix if there is still an
incompatibility left. And the API of that layer is of course as much set in
stone as the upstream Bonjour API. Oh, and you wouldn’t have to run two daemons instead of
just one. And you would only need to ship and maintain a single mDNS package.
Oh, and the compatibility layer would only be needed for the few remaining
applications that still use Bonjour exclusively, and not by the majority of
applications.

So, in effect you chose Bonjour because of its API and added some Avahi’ish
API on top, and this all is totally crackish. If you had done it the other way round
you would have gotten both APIs as well, but the overall solution would not
have been totally crackish. And let’s not forget that Avahi is much more
complete than Bonjour. (Maybe except wide-area support, Federico!)

Anyway, my original rant was not about the way Sun makes its decisions but
just about the fact that your Avahi-to-Bonjour bridge is … crack! And that
it remains.

Wow, six times crack in a single article.

Footnotes:

[1] For a Free Software API at least.

Project Indiana

Post Syndicated from Lennart Poettering original http://0pointer.net/blog/projects/project-indiana.html

Dear Sun Microsystems,

I wonder if the mythical “Project Indiana” consists of patches
like these
which, among other strange things, make the Avahi daemon just a frontend to the Apple
Bonjour
daemon. Given that Avahi is a superset of
Bonjour in both functionality and API, this is just so ridiculous —
I haven’t seen such a monstrous crack in quite a while.

Sun, you don’t get it, do you? That way you will only reach the
crappiness, bugginess and brokenness of Windows, not the power and
usability of Linux.

Oh, and please rename that “fork” of Avahi to something completely
different — because it actually is exactly that: something completely
different than Avahi.

Love,
     Lennart

Virtually Reluctant

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2007/06/12/virtually-reluctant.html

Way back when User
Mode Linux (UML)
was the “only way” the Free Software
world did anything like virtualization, I was already skeptical.
Those of us who lived through the coming of age of Internet security
— with a remote root exploit for every day of the week —
became obsessed with the chroot and its ultimate limitations. Each
possible upgrade to a better, more robust virtual environment was met
with suspicion on the security front. I joined the many who doubted
that you could truly secure a machine that offered disjoint services
provisioned on the same physical machine. I’ve recently revisited
this position. I won’t say that Xen has completely changed my mind,
but I am open-minded enough again to experiment.

For more than a decade, I have used chroots as a mechanism to segment a
service that needed to run on a given box. In the old days
of ancient BINDs and sendmails, this was often the best we could do
when living with a program we didn’t fully trust to be clean of
remotely exploitable bugs.

I suppose those days gave us all a rather strange sense of computer
security. I constantly have the sense that two services running on
the same box always endanger each other in some fundamental way. It
therefore took me a while before I was comfortable with the resurgence
of virtualization.

However, what ultimately drew me in was the simple fact that modern
hardware is just too darn fast. It’s tough to get a machine these
days that isn’t ridiculously overpowered for most tasks you put in
front of it. CPUs sit idle; RAM sits empty. We should make more
efficient use of the hardware we have.

Even with that reality, I might have given up if it wasn’t so easy. I
found a good link about Debian on Xen, a useful entry in the Xen Wiki,
and some good network and LVM examples. I also quickly learned how to
use RAID/LVM together for disk redundancy inside Xen instances. I even
got bonded ethernet working, with some help, to add additional network
redundancy.

So, one Saturday morning, I headed into the office, and left that
afternoon with two virtual servers running. It helped that Xen 3.0 is
packaged properly for recent Ubuntu versions, and a few obvious
apt-get installs get you what you need on edgy and
feisty. In fact, I only struggled (and only just a bit) with the
network, but quickly discovered two important facts:

VIF network routing in my opinion is a bit easier to configure and
more stable than VIF bridging, even if routing is a bit
slower.

sysctl -w net.ipv4.conf.DEVICE.proxy_arp=1 is needed to
make the network routing down into the instances work
properly.
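
(To make that second setting survive a reboot on the dom0, it can go
into /etc/sysctl.conf; the device name here is an assumption based on
my setup, where the dom0’s outward-facing interface is eth0:)

# /etc/sysctl.conf on the dom0: forward packets and answer ARP for the routed domUs
net.ipv4.ip_forward = 1
net.ipv4.conf.eth0.proxy_arp = 1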

I’m not completely comfortable yet with the security of virtualization.
Of course, locking down the Dom0 is absolutely essential, because
there lie the keys to your virtual kingdom. I lock it down with
iptables so that only SSH from a few trusted hosts comes
in, and even services as fundamental as DNS can only be had from a few
trusted places. But, I still find myself imagining ways people can
bust through the instance kernels and find their way to the
hypervisor.
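
(Just to give the flavor of that lock-down, a rough sketch with
made-up trusted-host addresses:)

# default-deny on the dom0 itself, then narrow exceptions
/sbin/iptables -P INPUT DROP
/sbin/iptables -A INPUT -i lo -j ACCEPT
/sbin/iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
# SSH only from a couple of trusted admin hosts (example addresses)
/sbin/iptables -A INPUT -p tcp -s 192.168.0.10 --dport 22 -j ACCEPT
/sbin/iptables -A INPUT -p tcp -s 192.168.0.11 --dport 22 -j ACCEPT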

I’d really love to see a strong line-by-line code audit of the
hypervisor and related utilities to be sure we’ve got something we can
trust. However, in the meantime, I certainly have been sold on the
value of this approach, and am glad it’s so easy to set up.
