Sunday, May 22, 2005
Novell BrainShare 2005
For the first time this year, I had an opportunity to attend Novell BrainShare Africa in
Jo'burg. I must admit to being a little sceptical before I came up as I'm
not a traditional Novell person — I've been forced into using Novell
by way of a job rather than by choice. That said, we'd all been to a talk
in PE earlier in the year where we were introduced to Novell Linux Desktop and I was
impressed by the open-source tack Novell seemed to have taken.
posted by guy at: 09:20 SAST |
path: /general |
permanent link
Wednesday, May 18, 2005
That is one big penguin. oh & we're here

posted by at: 15:39 SAST |
path: /phone |
permanent link
Look an aeroplane

posted by at: 12:04 SAST |
path: /phone |
permanent link
Linuxworld (ok brainshare) the easy way
So we stopped for breakfast at Nanaga on the way to the airport. Hi russell.
posted by at: 10:27 SAST |
path: /phone |
permanent link
Friday, May 13, 2005
So now we can photo blog
D-arb posted an entry to his blog about how he had set up blogging of photos from his phone. It seemed like quite a useful idea and I've just got a new phone (a Sony Ericsson K700i) so I thought I'd give it a bash. This is the result: a blog entry from my phone, complete with obligatory photo of our cat. D-arb's shell-script-based method seemed overly complicated, so this is a short and simple Perl script based on MIME::Fast. Works for me :)

UPDATE 2005/08/25: source code
A couple of people have asked for the code I'm using to do this. YMMV and all that.
posted by at: 21:50 SAST |
path: /phone |
permanent link
Friday, May 06, 2005
FreeBSD DEVICE_POLLING, GigE and traffic
We have a FreeBSD machine on campus that is tasked with keeping 1300 students off our
academic network. It is a 2.8GHz machine built on an Intel Winter Park
motherboard, with an Intel PRO/1000 MF Gigabit fibre NIC in it.
About two months ago we started to experience problems with the machine
going into livelock during periods of high network activity. What was
happening was that the machine was spending its entire life processing
interrupts from the network card. This seemed an obvious candidate for polling(4)
so we duly compiled a kernel with DEVICE_POLLING and HZ=1000. This seemed
to solve our problems nicely.
Recently however, we've heard complaints from the students that transfer
speeds on their part of the network were somewhat slower than they used to
be. It is worth explaining at this point that our student network consists
of 40 separate subnets (one per residence) and that the machine in question is
the default gateway for each of these subnets. Thus it has about 40 VLAN
interfaces, all sitting on the same em interface provided by the fibre NIC.
Connectivity to the rest of our network is provided by the 100Mbps on-board
fxp, but experience has shown that the majority of the traffic stays within
the residence system. A few tests showed that their complaints were indeed
justified.
We initially thought the problem might be saturated 100Mbps links in some of
our regional switching centres, so we got MRTG to draw us some graphs. This quickly
showed that we were barking up the wrong tree.
Some research showed that the em device was generating large numbers of
errors (in the region of 600 per second) and that throughput appeared to be
capped at just over 20,000 packets a second (amounting to about 20MB/s of
traffic on a gigabit link).
This was clearly a problem as the left hand side of the two graphs below
shows:
(Note: These graphs are over 48 hours. The final 24 hours is
available in more detail: errors packets)
Google suggested a first take at a solution. It appears that what's been
happening is that packets have been arriving at the interface faster than
the polling loop has been able to remove them. This means that the NIC's
buffer will fill up and eventually packets will be discarded as there is no
space to store them. It seems that each time the buffer overflows, it
generates an error, so the 600 errors/sec that we were seeing at peak times
could probably be interpreted as 600 out of every 1000 polling cycles finding
that the buffer had overflowed. Not good.
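A quick back-of-the-envelope check of those figures, as a minimal Python sketch. It assumes 1MB means 10^6 bytes and, following the interpretation above, that at most one error is logged per polling cycle. The function names are made up for illustration.

```python
HZ = 1000  # polling cycles per second with the HZ=1000 kernel

def average_packet_bytes(mb_per_s, packets_per_s):
    """Mean packet size implied by a throughput and a packet rate."""
    return mb_per_s * 1_000_000 / packets_per_s

def packets_per_poll(packets_per_s, hz=HZ):
    """Average number of packets waiting each time the NIC is polled."""
    return packets_per_s / hz

def overflow_fraction(errors_per_s, hz=HZ):
    """Share of polling cycles that found an overflowed buffer
    (assumes at most one error is logged per cycle)."""
    return min(1.0, errors_per_s / hz)

print(average_packet_bytes(20, 20_000))  # bytes per packet
print(packets_per_poll(20_000))          # packets per polling cycle
print(overflow_fraction(600))            # share of cycles with an error
```

On these numbers the average packet is about 1000 bytes, and roughly 60% of polling cycles hit a full buffer at peak.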
The obvious thing to do was to increase the frequency at which we polled the
network card for packets so at about 8.30am (just to the right of middle on
the graphs) we installed a new kernel with the HZ value set to 1500. This
made a noticeable difference in the number of errors compared to the same
time the previous day and our packets/sec count appeared to stop
table-topping.
Unfortunately at about 1pm we noticed the error rate start to creep up again
and saw that the packets/sec count was again showing signs of reaching a
plateau. A new kernel was compiled and installed at about 3.30pm. Based on
our previous success, this time the HZ value was set at 2500. You can see
the impact of this as a slight drop in the errors and a slight increase in
the number of packets we could handle. Clearly we weren't getting bang for
buck any more out of the HZ value.
Some more research followed and this time we decided to experiment with the
kern.polling.user_frac value. This value controls how much time the scheduler
is prepared to allocate to handling the polling loop and how much it reserves
for user processes. The idea is that by reserving CPU time we can prevent
livelock at the expense of degraded performance at times of heavy load. By
default this
value is 50%, so we decided to reduce it to 30% (leaving 70% of the CPU to
handle our network traffic). This happened at 5pm and caused the big change
visible in both graphs, showing that our problems were now clearly CPU
related.
polling(4) suggested another knob to tweak, kern.polling.burst_max. Together
with the HZ kernel option, kern.polling.burst_max controls the total number
of packets the machine is prepared to handle each second. Its default value
is 150, so we decided to double it to 300. This happened at around 8pm and
the results were surprising: you'll notice a decrease in performance and an
increase in errors, which was completely counter to what was expected. My
suspicion is that we'd reached the CPU's limit and that we were increasing
the number of TCP retries. As a result, we backed off to the default
kern.polling.burst_max value of 150.
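To put rough numbers on how HZ and the burst limit combine, here is a small Python sketch. It assumes the relationship given in polling(4), namely that an interface can receive at most HZ times the burst limit packets per second (ignoring any extra polling done in idle time). The function name is hypothetical.

```python
def polling_ceiling(hz, burst_max):
    """Upper bound on packets/sec per interface under polling,
    assuming the HZ * burst_max relationship from polling(4)."""
    return hz * burst_max

# The combinations tried over the course of the day.
for hz, burst_max in [(1000, 150), (1500, 150), (2500, 150), (2500, 300)]:
    print(f"HZ={hz:<5} burst={burst_max:<4} "
          f"ceiling={polling_ceiling(hz, burst_max):,} packets/sec")
```

If that reading of the man page is right, the ceiling was already 150,000 packets/sec at HZ=1000, far above the 20,000 or so actually being handled. That would fit the suspicion that the CPU, rather than the burst limit, was the real constraint.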
So what effect did all of this have on network throughput? The answer to
that can be seen in our bandwidth graph for the same day:
Again, the last 24 hours is available in more detail.
You'll notice that throughput went from peaking at around 20MB/s to peaking
at around 35MB/s. A clear improvement. No doubt there are plenty of people
who're a little happier about how fast their crap is downloading.
We weren't satisfied there, however, so this morning we did even more
optimisations. Since we know we're CPU bound, and this machine's primary
use is to provide a firewall, we looked at the firewall rule-sets. We
generate graphs of the traffic generated by each residence and at the time
this was all happening, ipfw had four count rules for each of 40 residences.
This was 160-odd rules just dedicated to counting traffic. By judicious use
of ipfw skipto rules, we managed to reduce this by about two thirds, meaning
that for a particular packet, the firewall would only have to process about 50
rules. The effects of this aren't visible on the graphs above, but it did
give a slight increase (of about 5000/second) in the number of packets we
could process and showed our bandwidth use now peaks at over 40MB/s.
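The actual ruleset isn't shown here, so the following is only a toy Python model of why skipto helps. It assumes one invented /24 per residence and one possible layout: a dispatch section of skipto rules followed by a block of count rules per residence. It then counts how many rules a packet is tested against in each arrangement. None of the names or addresses come from the real firewall.

```python
from ipaddress import ip_address, ip_network

# Hypothetical addressing: one /24 per residence.
SUBNETS = [ip_network(f"10.0.{i}.0/24") for i in range(40)]
COUNTERS = 4  # count rules per residence, as described in the post

def build_flat(subnets, counters=COUNTERS):
    """Every residence's count rules in one linear list."""
    return [("count", net, None) for net in subnets for _ in range(counters)]

def build_skipto(subnets, counters=COUNTERS):
    """A dispatch section of skipto rules, then one block per residence."""
    n = len(subnets)
    block_len = counters + 1      # count rules plus a skipto to the end
    first_block = n + 1           # dispatch rules plus one catch-all
    end = first_block + n * block_len
    rules = [("skipto", net, first_block + i * block_len)
             for i, net in enumerate(subnets)]
    rules.append(("skipto", None, end))      # not a residence: skip all blocks
    for net in subnets:
        rules += [("count", net, None)] * counters
        rules.append(("skipto", None, end))  # done: skip the remaining blocks
    return rules

def rules_examined(rules, addr):
    """Walk the list in order, following matching skipto rules forward,
    and count how many rules are tested for a packet from addr."""
    addr = ip_address(addr)
    i = examined = 0
    while i < len(rules):
        action, net, target = rules[i]
        examined += 1
        matches = net is None or addr in net
        i = target if (action == "skipto" and matches) else i + 1
    return examined

print(rules_examined(build_flat(SUBNETS), "10.0.39.5"))
print(rules_examined(build_skipto(SUBNETS), "10.0.39.5"))
```

In this model a packet from the last residence is tested against all 160 rules in the flat list but only 45 in the skipto layout. The real ruleset will differ in detail, but the saving comes from the same place: a packet only visits the counters for its own residence.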
For good measure, just in case the 100Mbps fxp interface was slowing things
down, we changed it to another em interface running at 1Gbps (it is in fact
the second fibre port on the PRO/1000 MF NIC). It hasn't made any noticeable
difference though, which is
entirely what was expected. Ironically, the 100Mbps links in our regional
switching centres are now significantly closer to saturation than they were
(we're peaking at 65% utilisation rather than 30%). It's still not (yet) a
problem though.
All of this, however, means that we've effectively doubled the throughput of
our residence network in the last 24 hours. Students in residence probably
owe David and me pizza for two days of unpaid overtime ;-)
We're still not happy though. More optimisations can be made to the
firewall to reduce its CPU load and probably more tweaks can be made to the
polling knobs. A good start is perhaps to reduce HZ back down to the point
at which it stops improving things and then maybe to play with
kern.polling.burst_max again.
Another consideration is the fact that the fibre NIC is capable of operating
on a PCI-X bus (64-bit, 66MHz), but the Winter Park board it is in only has
a PCI bus. This shouldn't matter as the PCI bandwidth should exceed the
125MB/s that gigabit Ethernet is capable of, but it is a possible
bottleneck. We're considering moving the firewall (disk + NIC) to a Torrey
Pines-based machine to see if a PCI-X slot will improve things. We could
also throw more processor at the problem: the 2.8GHz P4 is now old
technology and 3.6GHz processors are easily available. We want to
exhaust the free solutions first though ;-)
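The bus arithmetic behind that, as a rough Python sketch. It assumes the Winter Park slot is conventional 32-bit/33MHz PCI (the post doesn't say), and it computes theoretical peaks only; real throughput on a shared bus will be lower.

```python
def bus_peak_mb_per_s(width_bits, clock_mhz):
    """Theoretical peak transfer rate of a parallel bus in MB/s."""
    return width_bits / 8 * clock_mhz

GIGE_MB_PER_S = 1000 / 8                # 125 MB/s in one direction

pci = bus_peak_mb_per_s(32, 33.33)      # assumed conventional PCI
pcix = bus_peak_mb_per_s(64, 66.66)     # PCI-X figures quoted in the post

print(f"PCI   ~{pci:.0f} MB/s, {pci / GIGE_MB_PER_S:.2f}x gigabit line rate")
print(f"PCI-X ~{pcix:.0f} MB/s, {pcix / GIGE_MB_PER_S:.2f}x gigabit line rate")
```

Under that assumption plain PCI clears gigabit line rate by only a few percent, and a router has to move each forwarded packet across the bus twice (once in, once out), so the margin is thinner than the headline figure suggests.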
Update: 2005/05/06 18:10
While I've been dealing with Telkom, David's spent a large part of this
afternoon looking at optimising the firewall ruleset that runs on this
machine. In particular we've changed the way stats are gathered, and we've
tried to reduce the number of rules that a single packet has to pass through
before it gets to the end of the ruleset. This has increased the network
throughput to a new peak of 55MB/s and, more importantly, has reduced the
amount of time the CPU spends processing network traffic from the
kern.polling.user_frac figure of 70% to about 45% — in other words we
now have capacity to spare. The graph of the last 48 hours shows it all:
We go from about 10MB/s two days ago to over 55MB/s at the moment. We're
now sitting at just under half the gigabit link's capacity, and we're
table-topping the graphs for our regional switching centre uplinks. Which
brings us full circle to where this all started :-)
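For the record, the peak figures quoted through the post work out as follows. This is a small Python sketch, taking 125MB/s as the nominal capacity of the gigabit link; the stage labels are just shorthand.

```python
GIGABIT_MB_PER_S = 125  # nominal one-way capacity of the fibre link

# Peak throughput figures quoted in the post, in MB/s.
PEAKS = [
    ("before tuning", 20),
    ("after polling tweaks", 35),
    ("after skipto rules", 40),
    ("after ruleset rework", 55),
]

def utilisation(mb_per_s, capacity=GIGABIT_MB_PER_S):
    """Fraction of the link's nominal capacity in use."""
    return mb_per_s / capacity

def speedup(mb_per_s, baseline=PEAKS[0][1]):
    """Improvement relative to the first quoted peak."""
    return mb_per_s / baseline

for stage, peak in PEAKS:
    print(f"{stage:<22} {peak:>3} MB/s  "
          f"{utilisation(peak):.0%} of link  {speedup(peak):.2f}x")
```

The last line comes out at 44% of the link, which matches "just under half".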
posted by guy at: 12:29 SAST |
path: /systems |
permanent link
