Getting started

Verba docent, exempla trahunt. So before talking about the project itself, I'll spend a few words demonstrating how all this stuff can be used.

Imagine three Linux boxes (let's call them clu1, clu2 and clu3). Each box has its own IP address (and MAC address, of course).
Let's assume:

clu1: IP Addr: 1.2.3.11 - MAC (00:00:01:01:01:01)
clu2: IP Addr: 1.2.3.12 - MAC (00:00:02:02:02:02)
clu3: IP Addr: 1.2.3.13 - MAC (00:00:03:03:03:03)

Each box runs an Apache web server. All the boxes share the same data (e.g. via an NFS share, a SAN, ...) and host the same domains, so reaching http://clu1/~fred/, http://clu2/~fred/ or http://clu3/~fred/ produces the same result.

The goal is to group clu1, clu2 and clu3 into a cluster and let them reply on a common IP address... all without using a proxy, NAT, or strange DNS techniques.
All you have to do is download and install lnlb on each machine. Once that's done, log on to each box and type:

# modprobe lnlb && modprobe lnlb_mod_default
# lnlbctl addif 1.2.3.10 eth0

<Now, this is what we get from ifconfig>
# ifconfig
> eth0 Link encap:Ethernet HWaddr 00:02:A5:13:B9:48
>      inet addr:1.2.3.11 Bcast:1.2.3.255 Mask:255.255.255.0
>      ....
>
> nlb0 Link encap:UNKNOWN HWaddr 02:00:01:02:03:0A
>      inet addr:1.2.3.10 Mask: 255.255.255.255

Once you're done, just point your DNS server to the cluster IP address (1.2.3.10 in the example)... the driver will do the rest, and incoming connections will be distributed among clu1, clu2 and clu3.
Could it be any simpler? :)

Now, a few words about how the magic is done.
(Some knowledge of the IP protocol and Ethernet networks is required at this point.)

When a host on the network needs to send an IP datagram to the cluster IP address, it first sends an ARP request asking for the MAC address of that IP. (Note: routers are network hosts too, so all this works with external requests coming from the internet, no worries.) At this point the driver mangles the ARP replies, and all the nodes reply with the same (shared) "fake" MAC address.
The result is that traffic directed to the cluster is flooded to every port on the network, since the switch never learns which port is associated with that MAC (switches learn from source addresses, and the nodes send with their own MACs). Yep, here comes the bad news: the only price you pay for all this is that inbound traffic is flooded to all the hosts... see remarks.
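A small aside on that "fake" MAC. In the ifconfig output above, nlb0 has HWaddr 02:00:01:02:03:0A, which looks like the prefix 02:00 followed by the four bytes of the cluster IP 1.2.3.10. The sketch below assumes that this pattern is the rule; the text here does not confirm it, and the function name cluster_mac is made up for illustration. What is certain is that a first octet of 02 marks the address as unicast and locally administered, so it cannot collide with a vendor-assigned MAC.

```python
import ipaddress


def cluster_mac(cluster_ip, prefix=(0x02, 0x00)):
    """Build a shared MAC as <prefix> + <IPv4 bytes>.

    Assumption: this mirrors the nlb0 HWaddr seen in the example output;
    it is not a documented rule of the driver.
    """
    octets = bytes(prefix) + ipaddress.IPv4Address(cluster_ip).packed
    return ":".join(f"{b:02X}" for b in octets)


def is_locally_administered(mac):
    """True if the 'locally administered' bit (0x02 of the first octet) is set."""
    return bool(int(mac.split(":")[0], 16) & 0x02)
```

With the example address, cluster_mac("1.2.3.10") gives the same string that ifconfig shows for nlb0.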

Every 7 seconds (this value can be changed at module load time by adding the "heartbeat_interval=N" parameter to the "modprobe lnlb" command) each node sends a heartbeat to the rest of the cluster in an Ethernet frame [proto 0x8870].
So every 7 seconds the nodes synchronize and, during this convergence phase, every node learns the weight of the other nodes.
Once this is done, balancing of incoming traffic can start: each node decides whether to deliver a datagram (to the virtual interface) via a weighted hash table mechanism: the higher the weight of a node, the less incoming traffic it gets.
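To make the idea more concrete, here is a minimal sketch of one way a weighted hash table can work. It is not the driver's actual code: the table size (BUCKETS), the use of CRC32 over the source IP as the hash, and the function names are all assumptions made for illustration. The point is only that every node, knowing the same weights, builds the same table and can decide on its own whether a flooded datagram is "its" datagram, with heavier nodes owning fewer buckets.

```python
import ipaddress
import zlib
from fractions import Fraction

BUCKETS = 256  # hypothetical table size


def build_table(weights):
    """Assign buckets to nodes, each share proportional to 1/weight."""
    nodes = sorted(weights)  # identical ordering on every node
    inverse = {n: Fraction(1, weights[n]) for n in nodes}
    total = sum(inverse.values())
    exact = {n: inverse[n] * BUCKETS / total for n in nodes}
    counts = {n: int(exact[n]) for n in nodes}  # round down first
    leftover = BUCKETS - sum(counts.values())
    # Hand the remaining buckets to the largest fractional parts.
    order = sorted(nodes, key=lambda n: (-(exact[n] - counts[n]), n))
    for n in order[:leftover]:
        counts[n] += 1
    table = []
    for n in nodes:
        table.extend([n] * counts[n])
    return table


def owner(table, src_ip):
    """Node that owns the bucket this source IP hashes into."""
    key = zlib.crc32(ipaddress.ip_address(src_ip).packed)
    return table[key % len(table)]


def should_deliver(me, table, src_ip):
    """Each node runs this locally on every flooded datagram."""
    return owner(table, src_ip) == me
```

For example, with weights {"clu1": 1, "clu2": 1, "clu3": 2} the heavier clu3 ends up with about half as many buckets as each of the others, and for any source IP exactly one node answers True to should_deliver.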

At this point you're probably wondering: what's weight, and how is node weight estimated?
In the "driver world" node weight is strictly an unsigned int (1-65535 range accepted)... the whole balancing process uses this value to distribute incoming traffic.
Let's now see how this value is estimated.
By default the driver uses the system load average, but you can change the weight source to:

5 min Load Avg:  (# lnlbctl weight_mode nlb0 loadavg5)
15 min Load Avg: (# lnlbctl weight_mode nlb0 loadavg15)
Free memory %:   (# lnlbctl weight_mode nlb0 freemem)

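If these sources do not satisfy your needs (and here comes the good news), you can set the weight source to "manual" (# lnlbctl weight_mode nlb0 manual) and feed the driver yourself, e.g. through a script that every 7 seconds (or more) does the following (note that plain cron entries cannot fire more often than once a minute, so for shorter intervals use a small loop or daemon):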

# lnlbctl set_weight nlb0 X (where X is a value between 1 and 65535)

Alternatively, you can replace X with the word "stdin" (without quotes) and push the value through standard input instead of the command line.
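As an illustration of such a feeder, here is a rough sketch. The only thing taken from the text above is the "lnlbctl set_weight nlb0 X" command and the 1-65535 range; the choice of metric (the 1-minute load average from os.getloadavg), the scale factor and the function names are assumptions, so replace them with whatever measure makes sense for your nodes.

```python
import os
import subprocess
import time

WEIGHT_MIN, WEIGHT_MAX = 1, 65535  # range accepted by the driver


def to_weight(value, scale=100):
    """Scale a metric and clamp it into the accepted weight range."""
    return max(WEIGHT_MIN, min(WEIGHT_MAX, int(round(value * scale))))


def set_weight_command(iface, weight):
    """Argument list for the 'lnlbctl set_weight' call."""
    return ["lnlbctl", "set_weight", iface, str(weight)]


def feed_forever(iface="nlb0", interval=7, metric=lambda: os.getloadavg()[0]):
    """Push a fresh weight every `interval` seconds (never returns)."""
    while True:
        subprocess.run(set_weight_command(iface, to_weight(metric())), check=False)
        time.sleep(interval)
```

Calling feed_forever() on each node (after switching nlb0 to manual mode) keeps the weight updated; remember that a higher weight means less traffic, so the metric should grow as the node gets busier.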

Connection tracking
Since ... the driver ships with a connection tracking module (lnlb_mod_default... btw the driver will not work without this module).
Once a remote host (read "remote IP") sends a request to the cluster and its datagrams are delivered to a node, its session is tracked and all subsequent datagrams coming from that IP address are sent to the same node.
Sessions are marked as expired after 30 minutes of inactivity (no datagram received from the remote IP). This value can be changed when loading the lnlb_mod_default module by adding a "conntrack_idle_timeout=X" parameter to the modprobe command (value in seconds).
If you need a different session tracking method, you can reimplement the default handler or write a specific transport handler... see the developers sections for more info.
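The behaviour described above (stick to the same node per remote IP, forget the session after an idle timeout) can be sketched roughly as follows. This is a toy model, not the lnlb_mod_default code: the class name, the single in-memory dictionary and the pick_node callback are assumptions, and how the real module keeps this state on each node is not described here.

```python
IDLE_TIMEOUT = 30 * 60  # default idle timeout, in seconds


class ConnTracker:
    """Toy per-source-IP session table with an idle timeout."""

    def __init__(self, idle_timeout=IDLE_TIMEOUT):
        self.idle_timeout = idle_timeout
        self.sessions = {}  # src_ip -> (node, last_seen)

    def route(self, src_ip, now, pick_node):
        """Return the node for this datagram and refresh the session."""
        entry = self.sessions.get(src_ip)
        if entry is not None and now - entry[1] <= self.idle_timeout:
            node = entry[0]  # live session: stay on the same node
        else:
            node = pick_node(src_ip)  # new or expired: balance again
        self.sessions[src_ip] = (node, now)
        return node
```

Here pick_node stands for whatever balancing decision applies to a new session (for instance the weighted hash), and now is a timestamp in seconds supplied by the caller, which keeps the sketch easy to test.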

Remarks
As mentioned before, the main disadvantage is that inbound traffic is flooded over the network.
First of all, I suggest grouping all the nodes of a cluster in a separate VLAN in order to keep flooding limited to the cluster nodes (I'm working on a multicast mode that uses IGMP snooping to keep flooding limited to the cluster without having to configure a VLAN... stay tuned).
Second... it's clear that this project is (at least I hope so) mainly useful in contexts that need little inbound capacity but more outbound capacity (this should be perfectly fine when balancing web services, for instance).
Just to give a numeric example: if you have a cluster made up of 4 nodes, each with a 100 Mbit/s Ethernet connection, the inbound capacity of the cluster is still 100 Mbit/s in total but, since each node sends outbound traffic with its own MAC, the outbound capacity is 4 x 100 Mbit/s.
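The same arithmetic, written out as a tiny helper so you can plug in your own numbers. The function name is made up, and it assumes every node has the same link speed and that the figures are theoretical upper bounds.

```python
def cluster_capacity(nodes, link_mbit):
    """Return (inbound, outbound) capacity of the cluster in Mbit/s.

    Inbound traffic is flooded to every node, so it is bounded by a single
    link; outbound traffic leaves from each node's own link, so it adds up.
    """
    inbound = link_mbit
    outbound = nodes * link_mbit
    return inbound, outbound
```

For the 4-node, 100 Mbit/s example, cluster_capacity(4, 100) returns (100, 400).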

That's all