The greatest NRPE debugging trick yet!

While debugging a randomly failing check on some of our hypervisors, we came across a tip that makes debugging failed NRPE checks much easier. Simply add ‘2>&1’ to the end of the check definition in your nrpe.cfg and restart the service. This sends everything the plugin writes to stderr back to NRPE as well.

command[check_kvm_memory]=/usr/lib64/nagios/plugins/check_kvm_memstats -c 95 -w 90


command[check_kvm_memory]=/usr/lib64/nagios/plugins/check_kvm_memstats -c 95 -w 90 2>&1

It took the output of the command from…

NRPE: Unable to read output

to

error: failed to connect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/nrpe/.cache/libvirt/libvirt-sock': No such file or directory

Thanks to ufk at Stack Exchange for the tip.

Translating a NFS file handle to inode the long-ish way

There might be a better tool or easier way to do this, but the method works for me 🙂

I was looking into a large number of getfattr requests on one of our CentOS 6 NFS Servers, and was curious what files all the requests were for and where they were coming from. The where part just requires the always helpful tcpdump…


# tcpdump -i bond0 'tcp port nfs'
....
08:58:18.971667 IP xxxxxx.3434118295 > xxxxx.sdsc.edu.nfs: 128 getattr fh Unknown/0100060188D45B4900CC684F00000000000000000...

This alone gets us most of the information we are looking for with the exception of which file was being acted on. It does give us the NFS file handle though, and that is easy enough to translate. Using the chart on page 5 of the following pdf, we see the file handle has the format…

Length  Bytes  Field Name      Meaning                                Typical Values
1       1      fb_version      NFS version                            Always 1
1       2      fb_auth_type    Authentication method                  Always 0
1       3      fb_fsid_type    File system ID encoding method         Always 0
1       4      fb_fileid_type  File ID encoding method                Always either 0, 1, or 2
4       5-8    xdev            Major/minor number of exported device  Major number 3 (IDE), 8 (SCSI)
4       9-12   xino            Export inode number                    Almost always 2
4       13-16  ino             Inode number                           2 for /, 19 for /home/foo
4       17-20  gen_no          Generation number                      0xFF16DDF1, 0x3F6AE3C0
4       21-24  par_ino_no      Parent's inode number                  2 for /, 19 for /home
8       25-32  (padding)       Padding for NFSv2                      Always 0
32      33-64  (unused)        Unused by Linux


…so now we just need to split up our file handle to get the inode.

0 fb_version
1 fb_auth_type
0 fb_fsid_type
0 fb_fileid_type
0601 xdev
88D4 xino
5B49 ino (what we want)
00CC gen_no
684F par_ino_no
00000000 Padding
000000000… ??


Then we just need to convert from hex to decimal and pass it to find…


$ echo $((0x5b49))
23369

find /path/to/export -inum 23369

And we get to take a bit of a shortcut here as this system only had a single export, so no need to figure out what filesystem it was coming from.
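The split-and-convert steps above can be scripted. Here is a minimal Python sketch; the function name is made up, and the field widths (in hex characters) are taken from the worked example above rather than from any standard tool, so verify them against your own handles:

```python
def fh_to_inode(fh_hex):
    # Strip any "Unknown/" prefix that tcpdump prints before the handle
    fh_hex = fh_hex.split('/')[-1]
    fields = {}
    pos = 0
    # Field widths in hex characters, per the worked example above
    for name, width in [('fb_version', 1), ('fb_auth_type', 1),
                        ('fb_fsid_type', 1), ('fb_fileid_type', 1),
                        ('xdev', 4), ('xino', 4), ('ino', 4),
                        ('gen_no', 4), ('par_ino_no', 4)]:
        fields[name] = fh_hex[pos:pos + width]
        pos += width
    # The inode field, converted from hex to decimal for find -inum
    return int(fields['ino'], 16)

print(fh_to_inode('0100060188D45B4900CC684F00000000'))  # 23369
```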

nfsd io bytes read/written counters lower than they were two seconds ago??

While looking at a busy CentOS 6 NFS server I decided to write a quick script to generate a diff of the NFS v3/v4 statistics every five seconds to see what calls were being made. Easy enough to do: just read the values from the lines starting with proc(2|3|4|4ops) in /proc/net/rpc/nfsd and you are good to go. Several sites provide the mapping, but the fields appear in the same order as the output of ‘nfsstat -s’ if you do not want to look around.

I decided to include a diff of the output from the line ‘io … …’ as well, as that gives the bytes read and written. Easy enough too, until you start seeing negative values. This is a decent system that, when running full speed, can do better than 10Gb/s of streaming data, but the byte counters were down in the GBs.

Searched around a bit and all the sites confirmed that this is just a cumulative counter; it does not report ‘the amount changed since the last time you asked’. Decided to check the source code and caught it.


struct nfsd_stats {
...
unsigned int fh_nocache_nondir; /* filehandle not found in dcache */
unsigned int io_read; /* bytes returned to read requests */
unsigned int io_write; /* bytes passed in write requests */
unsigned int th_cnt; /* number of available threads */
...

};

Looks like it is probably a 32-bit unsigned int, which maxes out at 4GB (2^32 bytes) and then wraps back to zero. That explains everything. So the counter is accurate, it just wraps; as long as you are not doing more than 4GB of io during your interval you can still find the diff…


def counter_diff(old_value, new_value):
    # Counters wrap at 2**32; treat a drop as one wraparound.
    # Note the >=, so an unchanged counter yields 0, not 2**32.
    if new_value >= old_value:
        return new_value - old_value
    return 4294967296 - old_value + new_value

Handy trick for working with OpenStack Neutron network namespaces

While attempting to examine the plumbing of OpenStack Neutron networking you can find yourself with horribly long commands along the lines of…

# ip netns exec qdhcp-e4ea15c9-8597-4f0b-a3eb-81a74a402a24 ip addr
# ip netns exec qdhcp-e4ea15c9-8597-4f0b-a3eb-81a74a402a24 ip route

Save yourself a little sanity and use the trick recommended by Open Cloud Blog. Simply launch a shell in that namespace, and the results of subsequent commands will be scoped to that namespace.

# ip netns exec qdhcp-e4ea15c9-8597-4f0b-a3eb-81a74a402a24 bash
# ip addr
# ip route
# exit

Just type ‘exit’ to back out.

KVM Backups – an exercise in screwing up your ext4 filesystem

We needed a way to provide VM level image backups purely for disaster recovery. Our KVM hosts have local disk, so offloading a qcow2 file seemed the best way. I found a script online [1] and started digging.

A few roadblocks at first. CentOS 6 ships with qemu 0.12, where ‘qemu-img create’ doesn’t accept a snapshot name and ‘qemu-img convert’ can’t pull it back out. A little bit of compilation later, I had a shiny 2.1.2 qemu-img (the only thing the script needed differently to run).

Looked great at first, ramped up with 3-4 VM guests. Nightly runs, blah blah.

Then a few days later the badness happened.

 Mar  5 12:15:49 host.sdsc.edu kernel: EXT4-fs error (device vda2): ext4_lookup: deleted inode referenced: 1178554
 Mar  5 12:15:49 host.sdsc.edu kernel: EXT4-fs error (device vda2): ext4_lookup: deleted inode referenced: 1177551
 Mar  5 12:15:49 host.sdsc.edu sshd[31024]: pam_env(sshd:setcred): Unable to open config file: /etc/security/pam_env.conf: Input/output error
 Mar  5 12:17:32 host.sdsc.edu kernel: EXT4-fs error (device vda2): ext4_lookup: deleted inode referenced: 1177463

Only one of the guests was having a problem, but on top of that the ‘qemu-img convert’ snapshot images were bad too. No amount of fsck’ing could fix those suckers.

The other guinea pig KVM guests were relatively idle, but this system was a postgres server. Firing up bonnie++ against the local disk on a “working” guest caused it to break similarly.

I poked around for a solution that didn’t require installing non-standard packages on the host and found

virsh snapshot-create-as --quiesce

Whipped up a script revolving around that and…

It requires a few extra tweaks. qemu-ga, the QEMU Guest Agent, must be installed and running on the guest, and the guest XML definition needs a new device:

<channel type='pty'>
 <target type='virtio' name='org.qemu.guest_agent.0'/>
 <address type='virtio-serial' controller='0' bus='0' port='1'/>
</channel>

So far I’ve given this a proper run with multiple bonnie++ instances going and have not had any problems with the running base, a reboot on a rebased image, or a copied-off “old” base. It even appears that leaving off the --quiesce doesn’t cause a break either, which is handy when you haven’t installed qemu-ga and/or can’t restart the guest with a modified XML definition.

I find this a much better solution so far, but it’s not without limitations.

You must have sufficient disk space to hold two working copies of a guest (a limitation of libvirt-0.10): the new rebased running image and the old base. Depending on how you copy out the old one, that may be the only extra space you need. With later versions of libvirt you should be able to use ‘virsh blockcommit’ to shove the snapshot delta back into the old base; we’re stuck with ‘virsh blockcopy’, which makes a new base.

When we get our CentOS 7 KVM guests up and going, I will see how things are different with a ‘modern’ libvirt/qemu pair.

References

[1] : http://www.sleepdontexist.com/2014/03/28/kvm_manage-sh-a-script-to-manage-your-kvm-machines/

GMond Python Module Notes

https://github.com/ganglia/monitor-core/wiki/Ganglia-GMond-Python-Modules

Creating a custom GMond Python module requires a config file and a module file with three functions: metric_init, metric_cleanup, and get_value*. In your config file (/etc/ganglia/conf.d/MODULE_NAME.pyconf) follow this structure:

modules {
  module {
    name = 'MODULE-NAME'
    language = 'python'

    param KEY {
      value = 'SOMETHING'
    }
  }
}

collection_group {
  collect_every = 30
  time_threshold = 30

  metric {
    name_match = "MODULE-NAME_(.+)"
  }
}

In your python module file (/usr/ganglia/lib64/ganglia/python_modules/MODULE_NAME.py) you need three functions:

  • metric_init : Called once when gmond starts, then not called again. It is passed a hash of params from the “param” block in the conf file and must return a description of your checks (more about that below).
  • metric_cleanup : You probably do not need it, but you have to include it or gmond complains. Called when gmond is shutting down, in case you have any cleanup to do; a body of just ‘pass’ is fine.
  • get_value : Does not need to be named get_value; it is whatever you pass as the ‘call_back’ value in the descriptors hash. gmond calls this function for each metric every time it runs. After the initial run it is called directly, without going through metric_init. It gets passed the ‘name’ value from the descriptors hash and must return the metric value.

Desc_Skel = {
    'name'        : 'METRIC_NAME', 
    'call_back'   : get_value, 
    'time_max'    : 10, # Does not do anything??
    'value_type'  : 'float', 
    'format'      : '%f', # https://docs.python.org/2/library/stdtypes.html#string-formatting
    'units'       : '', 
    'slope'       : 'both', # zero|positive|negative|both, but probably 'both'
    'description' : 'METRIC DESCRIPTION', # Only used when you run 'gmond -m'
    'groups'      : 'MODULE_NAME', # Not used the way we use gmond
}

If you want to have multiple metrics, pass them as an array in your return from the metric_init block.
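Put together, a minimal multi-metric skeleton along those lines could look like this (the metric names and the constant return value are made up for illustration):

```python
import copy

def get_value(name):
    # name is the 'name' field from the descriptor that triggered the call
    return 42.0  # placeholder: replace with real data collection

def metric_init(params):
    # Build one descriptor per metric from a shared skeleton and
    # return them all as a list
    descriptors = []
    skel = {
        'name'        : '',
        'call_back'   : get_value,
        'time_max'    : 10,
        'value_type'  : 'float',
        'format'      : '%f',
        'units'       : '',
        'slope'       : 'both',
        'description' : '',
        'groups'      : 'MODULE_NAME',
    }
    for metric in ('MODULE-NAME_foo', 'MODULE-NAME_bar'):
        d = copy.copy(skel)
        d['name'] = metric
        d['description'] = 'Example metric %s' % metric
        descriptors.append(d)
    return descriptors

def metric_cleanup():
    pass
```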

Beyond that you can do whatever you want that is valid python. For debugging, include the following block at the bottom of your code:

# the following code is for debugging and testing
if __name__ == '__main__':
    descriptors = metric_init(PARAMS)
    for d in descriptors:
        print (('%s = %s') % (d['name'], d['format'])) % (d['call_back'](d['name']))

Random Notes:

  • Global variables are maintained between calls
  • To control frequency of calls I find it best to set a global variable with a timestamp, and another with values. Then in my get_value function, have it only refresh the data if it has been longer than some time frame since the last refresh.
  • Use ‘gmond -m|grep MODULE_NAME’ to make sure gmond is finding your module.
  • View the system logs for gmond startup output that might tell you why your module is not working.
  • For some reason the first pass of checks seems to run as ‘root’ and not the user specified in the gmond.conf file. This seems like a bug, but can be useful if you just need elevated privileges for your metric_init and not for follow-up calls. It can also be why your check initially works and then later fails.
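The time-based caching approach mentioned above can be sketched like this; the timeout, the refresh function, and the metric name are all placeholders:

```python
import time

CACHE_TIMEOUT = 10   # seconds; tune per module
_last_update = 0     # globals persist between gmond callbacks
_cached = {}

def _refresh():
    # Hypothetical data collection; replace with your real source
    return {'MODULE-NAME_foo': 1.0}

def get_value(name):
    global _last_update, _cached
    # Only refresh when the cache is older than CACHE_TIMEOUT,
    # no matter how often gmond calls us
    if time.time() - _last_update > CACHE_TIMEOUT:
        _cached = _refresh()
        _last_update = time.time()
    return _cached.get(name, 0.0)
```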