I attended SCaLE 11x, my first technical conference, and had an amazing time. My favorite talk was Michael Day’s “Advancements with Open Virtualization & KVM” (link to slides). Michael’s presentation inspired me to continue my work on virt-back.
During my trip home I used the in-flight wifi to push this commit into the cloud from the clouds! This particular commit refactored the dom object list generation into a simple-to-use class called Domfetcher. Domfetcher abstracts the libvirt API and provides the following helper methods:
Return a list of all dom objects
get_doms_by_names( guest_names= )
Accept a list of guest_names, return a list of related dom objects
Return a list of running dom objects
Return a list of shutoff but defined dom objects
This is an example of how to use Domfetcher:
import virtback

# optionally supply hypervisor uri
domfetcher = virtback.Domfetcher()
doms = domfetcher.get_running_doms()
for dom in doms:
    # the original loop body was lost; printing the guest name is a placeholder
    print dom.name()
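To make the idea concrete, here is a rough sketch of how a class like Domfetcher could sit on top of the libvirt Python bindings. This is not the virt-back source: the class name DomfetcherSketch and the method names get_all_doms and get_shutoff_doms are my own assumptions, and the connection object is passed in rather than opened from a URI. The libvirt calls used (listDomainsID, listDefinedDomains, lookupByID, lookupByName) are the ones I would expect such a wrapper to lean on.

```python
class DomfetcherSketch:
    """Hypothetical stand-in for virtback.Domfetcher, not the real class."""

    def __init__(self, conn):
        # conn is an open libvirt connection, e.g. libvirt.open("qemu:///system")
        self.conn = conn

    def get_running_doms(self):
        # listDomainsID() returns the numeric IDs of active domains
        return [self.conn.lookupByID(i) for i in self.conn.listDomainsID()]

    def get_shutoff_doms(self):
        # listDefinedDomains() returns the names of defined but inactive domains
        return [self.conn.lookupByName(n) for n in self.conn.listDefinedDomains()]

    def get_all_doms(self):
        return self.get_running_doms() + self.get_shutoff_doms()

    def get_doms_by_names(self, guest_names=()):
        return [self.conn.lookupByName(n) for n in guest_names]
```

Taking the connection as an argument keeps the sketch easy to exercise with a fake object, which is handy when no hypervisor is around.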
As always thanks for reading!
I came up with this script to kill certain programs after they run for too long. It works much like a timeout. Warning: this script is pretty harsh and kills the program outright.
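# PROGRAM and PIDSFILE must be set first (their original values were lost)
for pid in `pidof $PROGRAM`; do
    # seen on the previous run too, so it has run too long (kill -9 is assumed)
    if grep -qx "$pid" "$PIDSFILE"; then kill -9 $pid; fi
done
: > "$PIDSFILE"
for pid in `pidof $PROGRAM`; do echo $pid >> "$PIDSFILE"; done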
Then I wrote a cronjob to kill hung programs:
* * * * * /usr/local/sbin/killprogs.sh
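The logic behind the script, as I read it, is: any PID that shows up on two consecutive runs has outlived its welcome. Below is a minimal Python sketch of that idea. The function names and the snapshot file format (one PID per line) are my own choices for illustration, and the caller is expected to supply the current PID list, for example from pidof.

```python
import os
import signal


def pids_to_kill(current_pids, previous_pids):
    """Return PIDs present in both the current and the previous snapshot."""
    previous = set(previous_pids)
    return [pid for pid in current_pids if pid in previous]


def read_snapshot(path):
    """Read one PID per line; a missing file means no previous snapshot."""
    try:
        with open(path) as handle:
            return [int(line) for line in handle if line.strip()]
    except FileNotFoundError:
        return []


def write_snapshot(path, pids):
    with open(path, "w") as handle:
        handle.writelines("%d\n" % pid for pid in pids)


def sweep(current_pids, snapshot_path):
    """Kill survivors of the previous run, then record what is left."""
    doomed = pids_to_kill(current_pids, read_snapshot(snapshot_path))
    for pid in doomed:
        os.kill(pid, signal.SIGKILL)  # as harsh as kill -9
    write_snapshot(snapshot_path, [p for p in current_pids if p not in doomed])
    return doomed
```

Run once a minute, this gives a program somewhere between one and two minutes before it is killed, depending on when it started relative to the sweep.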
I spent the weekend fretting because one of my servers was basically being DOS’d by paying customers. During the outage I started thinking about the best way to scale and how I could make the code-base more efficient.
Linux top reported high load, in the 20s. Eventually I figured out that the server was having IO performance issues.
I wasted a bunch of time attempting to fight fires. After about an hour of that I decided to scale my VPS vertically by giving it an extra 256 MB of memory and a larger swap file (256 MB to 1024 MB).
These two changes were surprisingly effective and the IO issues resolved. Apparently the server was starving for memory.
Crisis averted for the moment. Now I am free to think clearly and engineer a proper solution instead of attempting to put out fires.
If you ever encounter a similar situation, attempt the simplest fix.
In this case, an extra $10.00 a month relieved the performance issues and bought me some time.
High load on web server after updating from Ubuntu 10.04 to Ubuntu 12.04 LTS
Check out the charts, which line up with when I upgraded:
I couldn’t determine the cause of the load average increase…
Update: The issue might be memory bound. Check out this graph that shows much higher swap usage.
After much research, this appears to be a load calculation and display problem with newer Linux kernels. The community has identified Commit-ID: c308b56b5398779cd3da0f62ab26b0453494c3d4 as the cause. With that commit, incorrectly high load averages can be reported under conditions of light load and high enter/exit idle frequency (greater than 25 hertz).
A nice fellow at http://www.smythies.com/~doug/network/load_average/new.html researched how tick and tickless Linux kernels differ and the effect they have on load averages. You should check it out.
Today our production Citrix NetScaler broke. The box wouldn’t boot and our only backup copy of the config was on the NetScaler itself.
Being the only Unix guy around I attempted to help out the admins working the outage. I SSH’d into the development NetScaler and noticed it runs on FreeBSD.
I suggested pulling the hard drive and mounting it on a Linux computer. The NetScaler has one SATA (not SAS) disk, so my desktop was compatible.
I installed the disk in the Linux tower and mounted the filesystem using the following command:
mount --read-only --types ufs --options ufstype=44bsd /dev/sda5 /mnt
Once mounted I was able to SCP interesting files to a safe location.
Warning: this procedure might void your warranty. If in doubt, call support first.
Just found this out the hard way…
It looks like the attachment of /KVMROOT/guest-dev-app.img on guest-dev did not persist when the KVM host rebooted for patching.
As it appears the
virsh attach-disk command works a lot like the
In order to have a disk attachment persist after a reboot, I think we still need to do a virsh edit <dom>.
The virsh attach-disk command is useful because it allows us to attach disk images to guests without restarting.
virsh attach-disk is to
virsh edit is to
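For anyone scripting this instead of using virsh, the libvirt Python bindings let you ask for both behaviours at once: change the running guest and its saved definition. The sketch below is an assumption-heavy illustration, not something I ran against the host above. The raw image format, the virtio bus, and the function names are all placeholders of mine, and dom is expected to be a libvirt domain object.

```python
from xml.sax.saxutils import quoteattr


def disk_xml(image_path, target_dev):
    """Build a minimal <disk> element for a raw file-backed image."""
    return (
        "<disk type='file' device='disk'>"
        "<driver name='qemu' type='raw'/>"
        "<source file=%s/>"
        "<target dev=%s bus='virtio'/>"
        "</disk>" % (quoteattr(image_path), quoteattr(target_dev))
    )


def attach_persistently(dom, image_path, target_dev):
    """Attach to the running guest and to its saved definition."""
    import libvirt  # imported here so disk_xml works without libvirt installed

    flags = libvirt.VIR_DOMAIN_AFFECT_LIVE | libvirt.VIR_DOMAIN_AFFECT_CONFIG
    dom.attachDeviceFlags(disk_xml(image_path, target_dev), flags)
```

The VIR_DOMAIN_AFFECT_CONFIG flag is what makes the attachment survive a restart; leaving it out changes only the running guest, which matches the behaviour described above.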
I recently needed to monitor an HTTPS API for response time and availability. At first I planned to just use the Nagios check_http command.
After gathering more requirements I learned that the API was protected by client certificate authentication. After some research I quickly found that no solution existed to monitor HTTP protected by client certs. I needed to write my own plugin.
This is the python plugin I came up with: check_http_client_cert.py
"""Nagios/Zenoss client cert https checker"""
import httplib
from optparse import OptionParser
from time import time
from sys import exit

def request( hostname, port, cert_file, path ):
    """request a resource and return response object"""
    c = httplib.HTTPSConnection( hostname, port, cert_file=cert_file )
    c.request( "GET", path )
    return c.getresponse()

if __name__ == '__main__':
    parser = OptionParser()
    parser.add_option('-H', '--hostname', dest='hostname')
    parser.add_option('-p', '--port', dest='port', type='int')
    parser.add_option('-c', '--cert_file', dest='cert_file')
    parser.add_option('-P', '--path', dest='path',
                      help="Path relative to root, like /image/search")
    o, args = parser.parse_args()
    # time the whole request so the elapsed value can be graphed as perfdata
    start = time()
    r = request( o.hostname, o.port, o.cert_file, o.path )
    elapse = time() - start
    # 2xx and 3xx responses count as healthy; anything else is critical
    if r.status >= 200 and r.status < 400:
        print "HTTP OK:", r.status, r.reason, "|time=" + str(elapse) + "s;;;"
        exit( 0 )
    print "HTTP Critical:", r.status, r.reason
    exit( 2 )
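The plugin above targets Python 2, where the module is called httplib. For anyone on Python 3, a rough equivalent might look like the sketch below. The function names nagios_result and check are mine, the timeout default is arbitrary, and I am assuming the certificate and private key live in a single PEM file. Note that ssl.create_default_context() also verifies the server certificate, which is stricter than the original.

```python
import http.client
import ssl
import time


def nagios_result(status, reason, elapsed):
    """Map an HTTP status to an (exit_code, message) pair."""
    if 200 <= status < 400:
        return 0, "HTTP OK: %d %s|time=%.3fs;;;" % (status, reason, elapsed)
    return 2, "HTTP Critical: %d %s" % (status, reason)


def check(hostname, port, cert_file, path, timeout=10):
    """Request path with a client certificate and return a Nagios result."""
    context = ssl.create_default_context()
    context.load_cert_chain(cert_file)  # PEM file holding certificate and key
    start = time.time()
    conn = http.client.HTTPSConnection(hostname, port, context=context,
                                       timeout=timeout)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
    except (OSError, http.client.HTTPException) as error:
        # connection and TLS failures are reported as critical too
        return 2, "HTTP Critical: %s" % error
    finally:
        conn.close()
    return nagios_result(response.status, response.reason, time.time() - start)
```

Keeping the status-to-exit-code mapping in its own function makes it easy to test without a server, and the caller only has to print the message and exit with the code.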
This graph could happen to you if you ever forget to configure munin email alerting:
It only took approximately 1 hour to diagnose and resolve this issue; however, most of my web applications hosted on this server were down for about 11 hours. I was lucky that this outage fell on a weekend, otherwise I would not have known about the problem till around 6:30pm!
Two of my pylons apps had session files that slowly ran away on me. The session files don’t consume much capacity; however, the sheer quantity of them caused my inode usage to hit 100%.
Had I properly configured Munin’s email alerting this issue would have been identified well before it was a problem.
Want to know what alerted me to the problem? Google Webmaster Tools claimed it could not read my robots.txt on a few of my sites… After investigating I learned the sites were down. Checking the Apache error logs pointed me to disk space issues.
df -ha reported everything was fine, however df -hi reported 100% inode usage! At this point I started looking at cache and log locations to find lots of files, which led me to my pylons web applications’ data/sessions directories.
Delete the session cache tree directories and allow the applications to rebuild them.
Todo: move /www off the root disk partition. This issue could have been much worse if I had been unable to boot or log in to remedy it. Moving /www off root should prevent the web server from affecting the system’s ability to boot.
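If you would rather catch this from a script than from df -hi, the same numbers are available through os.statvfs. The sketch below is one way to check inode usage and to find the directories holding the most entries. The function names and the default limit are mine, and walking a large tree this way can be slow.

```python
import os


def inode_usage_percent(path):
    """Percentage of inodes in use on the filesystem holding path."""
    stats = os.statvfs(path)
    if stats.f_files == 0:  # some filesystems do not report inode counts
        return 0.0
    return 100.0 * (stats.f_files - stats.f_ffree) / stats.f_files


def busiest_directories(root, limit=5):
    """Return (entry_count, directory) pairs, largest first."""
    counts = []
    for dirpath, dirnames, filenames in os.walk(root):
        counts.append((len(dirnames) + len(filenames), dirpath))
    return sorted(counts, reverse=True)[:limit]
```

Pointing busiest_directories at the web root would have surfaced the data/sessions directories right away.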
Some operating systems depend on a specific version of Python to function properly. For example, Yum on Red Hat Enterprise Linux 5 (RHEL5) depends on Python 2.4.3. Many utilities and 3rd party libraries no longer support this version of Python. This guide will cover installing an alternative Python instance while leaving the system’s Python alone.
This guide supports the following operating systems: Red Hat, CentOS, and Fedora. As of this publication the latest Python version was 2.7.2; you might want to determine if a newer version exists.
- Gather the dependencies:
yum install gcc zlib-devel python-setuptools readline-devel
gcc is the compiler used to build python.
zlib-devel allows the python zlib module to be built.
python-setuptools provides the easy_install application.
readline-devel allows readline and history handling in the python shell.
- Download and untar the python sourcecode:
tar -xzvf Python-2.7.2.tgz
- Compile the sourcecode:
- Test new alternative python:
- Now we can install third party libraries into our alternative python.
python2.7 -m easy_install <package name or egg path>
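One quick way to test the new interpreter is to confirm that the optional modules tied to the -devel packages above were actually built. The small script below is a sketch of such a check. The function name missing_modules is mine, and the default module list simply mirrors the zlib and readline dependencies from this guide.

```python
import importlib
import sys


def missing_modules(names=("zlib", "readline")):
    """Return the names from `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("python %d.%d.%d" % sys.version_info[:3])
    print("missing modules: %s" % (", ".join(missing_modules()) or "none"))
```

Saved to a file and run with the alternative interpreter, it should report no missing modules. If zlib or readline is listed, the matching -devel package was probably not installed before the build.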
Optionally we can create a virtualenv (for development) based on the python 2.7 install. Virtual environments appear useful for testing packages and libraries without installing them to the system owned python site-packages directory.
- Install virtualenv using easy_install:
- Create a new virtual python environment named virtpy:
virtualenv --no-site-packages -p /usr/local/bin/python2.7 virtpy
This will create a virtual python 2.7.2 environment named virtpy in your present working directory.
To invoke this environment run source virtpy/bin/activate and your prompt should change to reflect the active virtualenv.
Now you can run easy_install to install packages into virtpy/lib/python2.7/site-packages.
Thanks for reading, that’s all for now.
In a perfect world we should create backups but never need them. Although this statement holds truth, creating guest backups provides many more benefits.
The most common reasons system administrators restore from a virt-back guest backup:
- recovering from data corruption
- recovering deleted files
- recovering from a virus infection
- recovering from a compromised server
- backing out a failed change
- rolling back to a previous state
- testing disaster recovery plans
- cloning a server
- building test environments
In this article we will cover how to restore a system from a virt-back guest backup. This article will not cover how to restore a VM host server.
Virt-back guest restore procedure
In this guide our guest mbison has failed with a major corruption and we would like to restore from our backups. We have our running production guest images in /KVMROOT and our virt-back guest backups in /KVMBACK.
- Ensure the guest is shut off.
- Move the bad image file out of the way.
- Untar the virt-back backup into place.
- Power up the guest.
- Verify the guest is shut off by running:
- We noticed that mbison was still running so we invoked:
virt-back --verbose --shutdown mbison
- Move the corrupted image file out of the way:
mv /KVMROOT/mbison.img /KVMROOT/mbison.img.NFG
- Unzip and unarchive the backup using the following command:
sudo tar -xzvf /KVMBACK/mbison.tar.gz -C /KVMROOT --strip 1
- When the untar completes, start the guest:
virt-back --verbose --create mbison
- Connect to the guest over SSH and verify that all required services and applications start. Determine if the restore was successful.
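The move and untar steps above can also be sketched in Python with the standard tarfile module. This is only an illustration of those two steps: it assumes the backup holds the image under a single leading directory (which is what stripping one path component implies), the function names are mine, and shutting down and starting the guest is still left to virt-back.

```python
import os
import tarfile


def strip_first_component(members):
    """Yield tar members with their leading path component removed."""
    for member in members:
        parts = member.name.split("/", 1)
        if len(parts) == 2 and parts[1]:
            member.name = parts[1]
            yield member


def restore_image(backup_tar, image_path):
    """Set the bad image aside, then unpack the backup next to it."""
    if os.path.exists(image_path):
        os.rename(image_path, image_path + ".NFG")
    with tarfile.open(backup_tar, "r:gz") as archive:
        archive.extractall(os.path.dirname(image_path),
                           members=strip_first_component(archive.getmembers()))
```

For the example in this guide the call would be restore_image("/KVMBACK/mbison.tar.gz", "/KVMROOT/mbison.img"), run with enough privileges to write to /KVMROOT.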