More Xen Tricks

Post Syndicated from Bradley M. Kuhn original http://ebb.org/bkuhn/blog/2007/08/24/more-xen.html

In my previous post about Xen, I talked about how easy Xen is to
configure and
set up, particularly on Ubuntu and Debian. I’m still grateful that
Xen remains easy; however, I’ve lately had a few Xen-related
challenges that needed attention. In particular, I’ve needed to
create some surprisingly messy solutions when using vif-route to
route multiple IP numbers on the same network through the dom0 to a
domU.

I tend to use vif-route rather than vif-bridge, as I like the control
it gives me in the dom0. The dom0 becomes a very traditional
packet-forwarding firewall that can decide whether or not to forward
packets to each domU host. However, I recently found some deep
weirdness in IP routing when I use this approach while needing
multiple Ethernet interfaces on the domU. Here’s an example:

Multiple IP numbers for Apache

Suppose the domU host, called webserv, hosts a number of
websites, each with a different IP number, so that I have Apache
doing something like[1]:

        Listen 192.168.0.200:80
        Listen 192.168.0.201:80
        Listen 192.168.0.202:80
        ...
        NameVirtualHost 192.168.0.200:80
        <VirtualHost 192.168.0.200:80>
        ...
        NameVirtualHost 192.168.0.201:80
        <VirtualHost 192.168.0.201:80>
        ...
        NameVirtualHost 192.168.0.202:80
        <VirtualHost 192.168.0.202:80>
        ...
        

The Xen Configuration for the Interfaces

Since I’m serving all three of those sites from webserv, I
need all those IP numbers to be real, live IP numbers on the local
machine as far as webserv is concerned. So, in
dom0:/etc/xen/webserv.cfg I list something like:

        vif  = [ 'mac=de:ad:be:ef:00:00, ip=192.168.0.200',
                 'mac=de:ad:be:ef:00:01, ip=192.168.0.201',
                 'mac=de:ad:be:ef:00:02, ip=192.168.0.202' ]
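
If the list of IP numbers grows or changes, the vif stanza can be generated rather than typed by hand. This is only a minimal sketch: the function name make_vif_list is hypothetical, and it assumes the de:ad:be:ef:00:NN MAC pattern from the example above, with one interface per IP number.

```shell
#!/bin/sh
# Print a Xen vif list for the given IP numbers, one interface per IP.
# Assumption: MACs follow the de:ad:be:ef:00:NN pattern used in the example.
make_vif_list() {
  n=0
  total=$#
  for ip in "$@"; do
    mac=$(printf 'de:ad:be:ef:00:%02x' "$n")
    n=$((n + 1))
    # The first line opens the list; later lines are indented to match.
    if [ "$n" -eq 1 ]; then lead="vif  = [ "; else lead="         "; fi
    # The last line closes the list; earlier lines end with a comma.
    if [ "$n" -eq "$total" ]; then end=" ]"; else end=","; fi
    printf "%s'mac=%s, ip=%s'%s\n" "$lead" "$mac" "$ip" "$end"
  done
}

make_vif_list 192.168.0.200 192.168.0.201 192.168.0.202
```

The output is meant to be pasted into the domU configuration file; nothing here talks to Xen itself.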
        

… And then make webserv:/etc/iftab look like:

        eth0 mac de:ad:be:ef:00:00 arp 1
        eth1 mac de:ad:be:ef:00:01 arp 1
        eth2 mac de:ad:be:ef:00:02 arp 1
        

… And make webserv:/etc/network/interfaces (this is
probably Ubuntu/Debian-specific, BTW) look like:

        auto lo
        iface lo inet loopback
        auto eth0
        iface eth0 inet static
         address 192.168.0.200
         netmask 255.255.255.0
        auto eth1
        iface eth1 inet static
         address 192.168.0.201
         netmask 255.255.255.0
        auto eth2
        iface eth2 inet static
         address 192.168.0.202
         netmask 255.255.255.0
        

Packet Forwarding from the Dom0

But, this doesn’t get me the whole way there. My next step is to make
sure that the dom0 is routing the packets properly to
webserv. Since my dom0 is heavily locked down, all
packets are dropped by default, so I have to let through explicitly
anything I’d like webserv to be able to process. So, I
add some code to my firewall script on the dom0 that looks like:[2]

        webIpAddresses="192.168.0.200 192.168.0.201 192.168.0.202"
        UNPRIVPORTS="1024:65535"
        
        for dport in 80 443;
        do
          for sport in $UNPRIVPORTS 80 443 8080;
          do
            for ip in $webIpAddresses;
            do
              /sbin/iptables -A FORWARD -i eth0 -p tcp -d $ip \
                --syn -m state --state NEW \
                --sport $sport --dport $dport -j ACCEPT
        
              /sbin/iptables -A FORWARD -i eth0 -p tcp -d $ip \
                --sport $sport --dport $dport \
                -m state --state ESTABLISHED,RELATED -j ACCEPT
        
              /sbin/iptables -A FORWARD -o eth0 -s $ip \
                -p tcp --dport $sport --sport $dport \
                -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
            done  
          done
        done
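
Because the nested loops expand into quite a few rules, it can help to print them before installing anything. The following is a dry-run sketch of the same loops. The IPTABLES variable and the emit_forward_rules function are names introduced here for illustration, not part of the original script.

```shell
#!/bin/sh
# Dry run: print the FORWARD rules instead of installing them.
webIpAddresses="192.168.0.200 192.168.0.201 192.168.0.202"
UNPRIVPORTS="1024:65535"
# Assumption: set IPTABLES=/sbin/iptables to apply the rules for real.
IPTABLES="echo /sbin/iptables"

emit_forward_rules() {
  for dport in 80 443; do
    for sport in $UNPRIVPORTS 80 443 8080; do
      for ip in $webIpAddresses; do
        # New inbound connections to the web ports
        $IPTABLES -A FORWARD -i eth0 -p tcp -d "$ip" \
          --syn -m state --state NEW \
          --sport "$sport" --dport "$dport" -j ACCEPT
        # Inbound packets on existing connections
        $IPTABLES -A FORWARD -i eth0 -p tcp -d "$ip" \
          --sport "$sport" --dport "$dport" \
          -m state --state ESTABLISHED,RELATED -j ACCEPT
        # Replies from the domU back out through eth0
        $IPTABLES -A FORWARD -o eth0 -s "$ip" \
          -p tcp --dport "$sport" --sport "$dport" \
          -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
      done
    done
  done
}

# Count the generated rules before deciding to apply them.
emit_forward_rules | wc -l
```

With two destination ports, four source-port entries, three IP numbers, and three rules per combination, the loops produce 72 rules.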
        

Phew! So at this point, I thought I was done. The packets should find
their way forwarded through the dom0 to the Apache instance running on
the domU, webserv. While that much was true, I now had
the additional problem that packets got lost in a bit of a black hole
on webserv. When I discovered the black hole, I quickly
realized why. It was somewhat atypical, from webserv’s
point of view, to have three “real” and different Ethernet
devices with three different IP numbers, which all talk to the exact
same network. More intelligent routing was needed.[3]

Routing in the domU

While most non-sysadmins still use the route command to
set up local IP routes on a GNU/Linux host, iproute2
(available via the ip command) has been a standard part
of GNU/Linux distributions and supported by Linux for nearly ten
years. To properly support the situation of multiple (from
webserv’s point of view, at least) physical interfaces on
the same network, some special iproute2 code is needed.
Specifically, I set up separate route tables for each device. I first
encoded their names in /etc/iproute2/rt_tables (the
numbers 16-18 are arbitrary, BTW):

        16      eth0-200
        17      eth1-201
        18      eth2-202
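
If these entries are added from a setup script rather than by hand, it is worth making the step safe to run twice. This is a rough sketch: add_rt_table is a hypothetical helper, and it assumes each line of the file holds only a number and a name, with no trailing comment.

```shell
#!/bin/sh
# Append "<number> <name>" to an rt_tables file unless the name is already listed.
add_rt_table() {
  file=$1
  number=$2
  name=$3
  # Skip the append when a line already ends with this table name.
  if grep -Eq "^[0-9]+[[:space:]]+$name\$" "$file" 2>/dev/null; then
    return 0
  fi
  printf '%s\t%s\n' "$number" "$name" >> "$file"
}

# Usage (as root, against the real file):
#   add_rt_table /etc/iproute2/rt_tables 16 eth0-200
#   add_rt_table /etc/iproute2/rt_tables 17 eth1-201
#   add_rt_table /etc/iproute2/rt_tables 18 eth2-202
```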
        

And here are the ip commands that I thought would work
(but didn’t, as you’ll see next):

        /sbin/ip route del default via 192.168.0.1
        
        for table in eth0-200 eth1-201 eth2-202;
        do
           iface=`echo $table | perl -pe 's/^(\S+)\-.*$/$1/;'`
           ipEnding=`echo $table | perl -pe 's/^.*\-(\S+)$/$1/;'`
           ip=192.168.0.$ipEnding
           /sbin/ip route add 192.168.0.0/24 dev $iface table $table
        
           /sbin/ip route add default via 192.168.0.1 table $table
           /sbin/ip rule add from $ip table $table
           /sbin/ip rule add to 0.0.0.0 dev $iface table $table
        done
        
        /sbin/ip route add default via 192.168.0.1 
        

The idea is that each table will use rules to force all traffic coming
in on the given IP number and/or interface to always go back out on
the same, and vice versa. The key is these two lines:

           /sbin/ip rule add from $ip table $table
           /sbin/ip rule add to 0.0.0.0 dev $iface table $table
        

The first rule says that when traffic comes from the given IP number,
$ip, the routing rules in table $table should
be used. The second says that traffic to anywhere, when bound for
interface $iface, should use table
$table.

The tables themselves are set up to always make sure the local network
traffic goes through the proper associated interface, and that the
network router (in this case, 192.168.0.1) is always
used for foreign networks, but that it is reached via the correct
interface.

This is all well and good, but it doesn’t work. Certain instructions
fail with the message “RTNETLINK answers: Network is unreachable”,
because the 192.168.0.0 network cannot be found
while the instructions are running. Perhaps there is an
elegant solution; I couldn’t find one. Instead, I temporarily set
up “dummy” global routes in the main route table and
deleted them once the table-specific ones were created. Here’s the
new bash script that does that (the added lines are the ip route add
and ip route del commands that carry a src argument and no table
argument):

        /sbin/ip route del default via 192.168.0.1
        for table in eth0-200 eth1-201 eth2-202;
        do
           iface=`echo $table | perl -pe 's/^(\S+)\-.*$/$1/;'`
           ipEnding=`echo $table | perl -pe 's/^.*\-(\S+)$/$1/;'`
           ip=192.168.0.$ipEnding
           /sbin/ip route add 192.168.0.0/24 dev $iface table $table
        
           # Added: temporary "dummy" route in the main table
           /sbin/ip route add 192.168.0.0/24 dev $iface src $ip
        
           /sbin/ip route add default via 192.168.0.1 table $table
           /sbin/ip rule add from $ip table $table
        
           /sbin/ip rule add to 0.0.0.0 dev $iface table $table
        
           # Added: remove the temporary route again
           /sbin/ip route del 192.168.0.0/24 dev $iface src $ip
        done
        # Added: the same temporary route around the global default route
        /sbin/ip route add 192.168.0.0/24 dev eth0 src 192.168.0.200
        /sbin/ip route add default via 192.168.0.1
        /sbin/ip route del 192.168.0.0/24 dev eth0 src 192.168.0.200
        

I am pretty sure I’m missing something here — there must be a
better way to do this, but the above actually works, even if it’s
ugly.
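
One small cleanup that leaves the routing itself alone: the two perl calls that split a table name such as eth0-200 can be replaced with plain shell parameter expansion. This is a sketch only; split_table is a hypothetical helper, and it assumes table names keep the interface-dash-number form used above.

```shell
#!/bin/sh
# Split "eth0-200" into the interface name and the full IP number.
split_table() {
  table=$1
  iface=${table%-*}        # text before the last "-", e.g. eth0
  ipEnding=${table##*-}    # text after the last "-", e.g. 200
  echo "$iface 192.168.0.$ipEnding"
}

for table in eth0-200 eth1-201 eth2-202; do
  split_table "$table"
done
```

Each output line holds the interface and the IP number for one table, which is the same pair the perl expressions produce for these names.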

Alas, Only Three

There was one additional confusion I put myself through while
implementing the solution. I was actually trying to route four
separate IP addresses into webserv, but discovered that
I got this error message (found via dmesg on the
domU): netfront can't alloc rx grant refs. A quick Google search
turned up the XenFaq, which says that Xen 3 cannot handle more than
three network interfaces per domU. Seems strangely arbitrary to me;
I’d love to hear why it cuts off at three. I can imagine limits at one and
two, but it seems that once you can do three, n should be
possible (perhaps still with linear slowdown or some such). I’ll
have to ask the Xen developers (or UTSL) some day to find out what
makes it possible to have three work but not four.
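
A quick check before starting a domU can catch this early. The sketch below counts vif entries in a configuration file and complains when there are more than three. Both function names are hypothetical, and it assumes that every vif entry carries an ip= setting and that nothing else in the file contains that string.

```shell
#!/bin/sh
# Count the interfaces in a domU config by counting "ip=" entries.
# Assumption: "ip=" appears only inside the vif list.
count_vifs() {
  grep -o 'ip=' "$1" | wc -l | tr -d ' '
}

# Return non-zero when the config lists more than three interfaces,
# the limit the XenFaq gives for Xen 3.
check_vif_limit() {
  n=$(count_vifs "$1")
  if [ "$n" -gt 3 ]; then
    echo "$1: $n interfaces; more than three may fail on Xen 3" >&2
    return 1
  fi
}

# Usage on the dom0:
#   check_vif_limit /etc/xen/webserv.cfg && echo "vif count looks fine"
```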


[1] Yes, I know I
could rely on client-provided Host: headers and do this with full
name-based virtual hosting, but I don’t
like to do that for good reason (as outlined in the Apache
docs).

[2] Note that the
above firewall code must run on dom0, which has one real
Ethernet device (its eth0) that is connected properly to
the wider 192.168.0.0/24 network, and should have some IP
number of its own there (say, 192.168.0.100). And
don’t forget that dom0 is configured for vif-route, not
vif-bridge. Finally, for brevity, I’ve left out some of the
firewall code that FORWARDs through key stuff like DNS. If you are
interested in it, email me or look it up in a firewall book.

[3] I was actually a
bit surprised at this, because I often have multiple IP numbers
serviced from the same computer and physical Ethernet interface.
However, in those cases, I use virtual interfaces
(eth0:0, eth0:1, etc.). On a normal system,
Linux does the work of properly routing the IP numbers when you attach
multiple IP numbers virtually to the same physical interface.
However, in Xen domUs, the physical interfaces are locked by Xen to
only permit specific IP numbers to come through, and while you can set
up all the virtual interfaces you want in the domU, it will only get
packets destined for the IP number specified in the vif
section of the configuration file. That’s why I added my three
different “actual” interfaces in the domU.