Monday, May 23, 2011

Upgrading Chef 0.9 to 0.10 - some gotchas

There are a couple of problems I experienced with the upgrade of Opscode Chef from 0.9 to 0.10.

No DEB packages yet

Impatient as I was I wanted to upgrade before the Ubuntu deb's were released. So I uninstalled the deb install and did the ruby gem install, keeping my existing config & DB.  Perhaps I'm a bit naïve to think this should work :p

RabbitMQ not properly setup by chef-solo

Once the chef-solo bootstrap was complete I started up all the services and tried things out. It didn't work. Couldn't do the last step in the upgrade docs: knife index rebuild.

Turned out this was related to rabbitmq, which was also filling up the disk quickly with error logs.  The solution was to update the rabbitmq password: rabbitmqctl change_password chef testing.

Internal Server Error 500

Another error I keep getting on some servers is
File with checksum a73b7f6222549364ab0d6c4ed2442abf not found in the  repository (this should not happen) -  (Merb::ControllerExceptions::InternalServerError)"
It looks like there's an inconsistency between CouchDB and the filesystem cache.

The tips on chef bug #1397 explains how this can be fixed with a combination of Shef and rake install:
# shef
chef > require 'chef/checksum'
  => false 
chef > r = Chef::REST.new('http://localhost:5984/chef/_design/checksums/_view/', false, false)
 => #<Chef::REST:0x7f831db382b8 @redirect_limit=10, @cookies={}, @redirects_followed=0, @auth_credentials=#, @url="http://localhost:5984/chef/_design/checksums/_view/", @sign_request=true, @default_headers={}, @sign_on_redirect=true>
chef > r.get_rest("all")["rows"].each do |c| c["value"].cdb_destroy end

After doing this I did the rake install. During this process I had some difficulties that resulted in this error:
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/openssl/buffering.rb:178:in `syswrite': Broken pipe (Errno::EPIPE)
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/openssl/buffering.rb:178:in `do_write'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/openssl/buffering.rb:192:in `write'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:177:in `write0'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:153:in `write'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:168:in `writing'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:152:in `write'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:1557:in `send_request_with_body_stream'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:1527:in `exec'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:1048:in `__request__'
        from /Library/Ruby/Gems/1.8/gems/rest-client-1.6.1/lib/restclient/net_http_ext.rb:15:in `request'
        from /Library/Ruby/Gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:167:in `transmit'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:543:in `start'
        from /Library/Ruby/Gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:166:in `transmit'
        from /Library/Ruby/Gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:60:in `execute'
        from /Library/Ruby/Gems/1.8/gems/rest-client-1.6.1/lib/restclient/request.rb:31:in `execute'
        from /Library/Ruby/Gems/1.8/gems/rest-client-1.6.1/lib/restclient/resource.rb:72:in `put'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/cookbook_uploader.rb:134:in `uploader_function_for'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/cookbook_uploader.rb:25:in `call'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/cookbook_uploader.rb:25:in `setup_worker_threads'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/cookbook_uploader.rb:24:in `loop'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/cookbook_uploader.rb:24:in `setup_worker_threads'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/cookbook_uploader.rb:23:in `initialize'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/cookbook_uploader.rb:23:in `new'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/cookbook_uploader.rb:23:in `setup_worker_threads'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:135:in `map'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/cookbook_uploader.rb:22:in `each'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/cookbook_uploader.rb:22:in `map'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/cookbook_uploader.rb:22:in `setup_worker_threads'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/cookbook_uploader.rb:69:in `upload_cookbook'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/knife/cookbook_upload.rb:138:in `upload'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/knife/cookbook_upload.rb:74:in `run'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/cookbook_loader.rb:89:in `each'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/cookbook_loader.rb:88:in `each'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/knife/cookbook_upload.rb:72:in `run'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/knife.rb:391:in `run_with_pretty_exceptions'
        from /Library/Ruby/Gems/1.8/gems/chef-0.10.0/lib/chef/knife.rb:166:in `run'

To resolve this problem I removed the nginx proxy from the communications chain and that resolved the problem. The problem occurred in just a single recipe, and I believe the culprit was a 2.2MB file that was failing to upload through the proxy. Yes - I know chef should not be used with 2.2MB files.

After removing the proxy from the chain it was possible to do the rake install and all worked well.

Tuesday, April 12, 2011

LXC, interface bonding, vlans, macvlan and communication with host

I've been using LXC for several months now and have found it to be excellent. There are a few caveats, however these are being dealt with as LXC matures.

One of the difficulties with setting up a container system can be how the network is configured. In most cases if communication is required between the host and the container the advice is to use linux bridges on your host system. This works well in most cases, except with some network drivers configured for interface bonding with bridges and sub-interfaces on other vlans.

To summarize what didn't work with the tigon3 driver on Ubuntu 10.04.2, with a PPA 2.6.38-8-server kernel. This may or may not be the case with other drivers and distros.

bond0
  slaves eth0 eth1
br0
  bond0
  vethLXC0
  vethLXC1
bond0.100

So it turned out the tigon3 driver didn't support vlans when the base port was part of a bridge. This meant bond0.100 was behaving erratically, even with various bridge options configured, such as bridge_fd 0, etc.

The result was that using veth mode for the lxc containers was out of the question. So I turned to macvlan. Unfortunately the containers can't communicate with the host system when using the macvlan interface type, so another hurdle here.

Enter the newer macvlan bridge mode. The lxc.conf man page indicates that that when using macvlan bridge mode the guests can communicate with each other. Unfortunately they still can't communicate with the host. Easily solved - give the host a macvlan interface in bridge mode too. Well - not so easily solved since the Ubuntu 10.04 version of iproute2 doesn't support this.

So hack around this and install the Debian squeeze version of iproute into Ubuntu Lucid, then do the config and it works. Note that I'm using a PPA 2.6.38 kernel which has a more recent implementation of the macvlan code.

Once you have the right kernel and the right iproute2 version you need the following config in your /etc/network/interfaces to make this work:
auto bond0
iface bond0 inet static
    bond-slaves eth0 eth1
    bond-mode 1
    bond-miimon 100
    # address & netmask are required to satisfy the startup scripts
    address 0.0.0.0
    netmask 0.0.0.0

auto mv0
iface mv0 inet static
    pre-up ip link add link bond0 name mv0 type macvlan mode bridge
    address 10.0.0.3
    netmask 255.255.255.0

auto bond0.100
iface bond0.100 inet static
    address 192.168.0.3
    netmask 255.255.255.0

Now the host and guests can all communicate with each other and the guests will appear on the network with their own MAC address.

This approach also works on Debian Squeeze 6.0 with the FAI kernels.

Thursday, January 20, 2011

Determine the latency to a SixXS Point of Presence

I've recently moved from NL to the UK and was looking at my SixXS IPv6 link. I wanted to make sure I'm getting the best connection possible, so went about checking the latency between my endpoint and the PoPs.

First - I did a simple mtr to my existing PoP in NL, then to the only UK based PoP. This initial test showed the latency to my existing endpoint was around 23ms, while it was around 30ms to the local UK based PoP. This doesn't seem right, so dig further.

mtr --psize=1496 nl-endpoint.sixxs.net

mtr uses ICMP echo requests to determine latency. By default these are 56 bytes, which is an unrealistic packet size for determining latency practical between two points. Once the packet size was set to something more reasonable, such as --psize=1496, then I saw less latency to the UK PoP.

mtr --psize=1496 uk-endpoint.sixxs.net

Aside from using mtr, there is another way to get a broader idea of latency between your endpoint and the available PoPs. The PoPs page at SixXS has a list of all currently active pops. Using this, here's a one-liner to fping all of them and quickly determine your best link:

wget -q --no-check-certificate -O - https://www.sixxs.net/pops/ | grep -Eo '[a-z]{5}[0-9]{2}' | grep -v xhtml | sort -u | while read PoP; do echo -n "$PoP.sixxs.net "; done | xargs fping -i 100 -p 50 -b 1208 -c 50 -n -s -a

Sunday, January 2, 2011

LIRC in the 2.6.36 linux kernel

I've just upgraded to the 2.6.36 kernel and have found that lirc stopped working. lircd would load with the lirc-0.8.6 lirc_dev and lirc_sir modules in my kernel, but when irexec starts I'd get the following type of kernel oops.

BUG: unable to handle kernel NULL pointer dereference at 0000005c
IP: [] lirc_get_pdata+0x2e9/0x841 [lirc_dev]
*pde = 00000000 
Oops: 0000 [#6] 
last sysfs file: /sys/devices/virtual/block/md3/dev
Modules linked in: lirc_sir lirc_dev cls_route cls_u32 cls_fw sch_sfq sch_htb ipt_addrtype xt_DSCP xt_dscp xt_NFQUEUE xt_iprange xt_hashlimit xt_connmark nf_conntrack_sip w83697hf_wdt [last unloaded: lirc_dev]

Pid: 22422, comm: lircd Tainted: G      D     2.6.36.2_01 #3 CN700-8237R/ 
EIP: 0060:[] EFLAGS: 00010246 CPU: 0
EIP is at lirc_get_pdata+0x2e9/0x841 [lirc_dev]
EAX: f63fdec0 EBX: 80046900 ECX: 08062348 EDX: 80046900
ESI: f63fdec0 EDI: 00000000 EBP: 00000007 ESP: dc67bf14
 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process lircd (pid: 22422, ti=dc67a000 task=f3dddc00 task.ti=dc67a000)
Stack:
 08062348 f63fdec0 fd6f82bd 00000007 00000007 c1079b76 c1deb3a0 f576a7ac
<0> b78287c0 00000004 00000028 c105d5f3 f5d17b78 000000c7 00000000 00000000
<0> 00000000 f5dbd900 b78287c0 f5d17b78 000000a0 00000246 f63fdec0 00000020
Call Trace:
 [] ? lirc_get_pdata+0x2bd/0x841 [lirc_dev]
 [] ? do_vfs_ioctl+0x456/0x4a1
 [] ? handle_mm_fault+0x2e0/0x672
 [] ? sys_ioctl+0x44/0x64
 [] ? sysenter_do_call+0x12/0x26
Code: 57 56 89 c6 53 89 d3 83 ec 04 83 3d 78 98 6f fd 00 89 0c 24 8b 78 68 74 12 52 ff 77 28 57 68 37 8f 6f fd e8 5a 0d ce c3 83 c4 10 <8b> 47 5c 85 c0 74 1d 8b 68 20 85 ed 74 16 8b 0c 24 89 f0 89 da 
EIP: [] lirc_get_pdata+0x2e9/0x841 [lirc_dev] SS:ESP 0068:dc67bf14
CR2: 000000000000005c
---[ end trace 1bec319525f4926b ]---

In my digging I found that parts of lirc are now in the linux kernel as of 2.6.36. But it's not obvious how to make use of this yet.

First - the lirc_dev module is selected from the "Drivers > Staging" section. But it won't appear there until you enable "Drivers > Multimedia > Infrared remove control adapters". This is where the IR_CORE is selected.

Next - lirc-0.9 is going to be needed to make proper use of these in-kernel modules. As of writing this lirc-0.9 is still pre-release, so I have taken the shortest path to get a working remote.

On my gentoo system, after building my new kernel and installing it I used to emerge lirc again to make sure the modules are in place. This is no longer necessary since the modules I need are fully in kernel. I've found that simply replacing the lirc built modules with the in-kernel ones, and continuing to use lirc-0.8.7 appears to work now. The lirc modules loaded are lirc_dev and lirc_sir.

See the new maintainers blog post about this at http://wilsonet.com/?p=85