Archive for June, 2012

Happy Password Change Day

Friday, June 22nd, 2012

If you work at a place like my company, your passwords expire regularly. If you have a job similar to mine, you have a whole mess of systems that you have to change your password on. If you have a personality like mine, this is a really boring task that you’d rather not deal with, but you have to.

Well, I have a solution for you. It’s based on my last post, that magical Python pexpect script. It’s stripped down a little more, but I’m sure you’ll find most of it very familiar if you read through the last one. So, without further ado, I introduce to you my magically delicious password change script:

#!/usr/bin/python
#
# Change a user's password on multiple systems, ensuring that the given user
# has valid sudo access. (2 birds, 1 stone)
#
# It requires that you have sudo available on the target(s) and that you
# can run the given command under sudo. It does not require SSH keys be set
# up, since it handles the password dialogs for both SSH login and sudo
# access.
#
# Any vulgarities in this code are the result of being lazy about
# case sensitivity checks, and are not deliberate. If you decide to be
# offended, you need to get over it.

import pexpect
from optparse import OptionParser
import os
import getpass
import signal
import sys
from datetime import datetime

DEBUG = 0
jams = []
misses = []
hits = []

def getTargets(hostspec):
  global DEBUG
  if os.path.isfile(hostspec):
    if DEBUG:
      print "Reading hosts from file "+hostspec
    fh = open(hostspec, 'r')
    hosts = fh.read()
    fh.close()
  else:
    if DEBUG:
      print "Using hosts from command line."
    hosts=hostspec
  return hosts.split()

def pullTrigger(target, oldpass, newpass, username):
  global DEBUG, jams, misses, hits
  rangeHot = "\$ "
  # First, we launch the ssh process and get logged in to the target
  # Set a 5 minute timeout on commands, not 30 seconds
  proc = pexpect.spawn("ssh "+target, timeout=300)
  while True:
    index = proc.expect(["The authenticity of host", "assword:", "Permission denied", rangeHot, pexpect.EOF, pexpect.TIMEOUT])
    if index == 0:
      proc.sendline("yes")
    elif index == 1:
      proc.sendline(oldpass)
    elif index == 2:
      jams.append(target)
      if DEBUG:
        print "Dud cartridge. Clearing chamber, proceeding with firing plan..."
      proc.kill(signal.SIGKILL)
      return
    elif index == 3:
      break
    elif index == 4:
      jams.append(target)
      if DEBUG:
        print "Cartridge jammed, clearing chamber, proceeding with firing plan."
      proc.kill(signal.SIGKILL)
      return
    elif index == 5:
      jams.append(target)
      if DEBUG:
        print "Squib load. clearing chamber, proceeding with firing plan."
      proc.kill(signal.SIGKILL)
      return

  # Go root
  if DEBUG:
    print "Becoming root inside expect spawn."
  rangeHot = becomeRoot(proc, oldpass)
  if rangeHot == "EOF":
    misses.append(target)
    if DEBUG:
      print "Missed target low. Proceeding with firing plan."
    proc.kill(signal.SIGKILL)
    return
  if rangeHot == "TIMEOUT":
    misses.append(target)
    if DEBUG:
      print "Missed target high. Proceeding with firing plan."
    proc.kill(signal.SIGKILL)
    return

  # Change password
  proc.sendline("passwd "+username)
  proc.expect(":")
  proc.sendline(newpass)
  proc.expect(":")
  proc.sendline(newpass)

  index = proc.expect([rangeHot, pexpect.EOF, pexpect.TIMEOUT])
  if index != 0:
    misses.append(target)
    if DEBUG:
      print "Missed wide left. Proceeding with firing plan."
    proc.kill(signal.SIGKILL)
    return

  # A hit! A veritable hit! O frabjous day!
  hits.append(target)
  rangeHot = exitRoot(proc)
  proc.sendline("exit")

def exitRoot(proc):
  global DEBUG
  # Quick and dirty. This should really be nicer, but I'm lazy and it's
  # almost guaranteed to work if you actually got this far.
  if DEBUG:
    print "Leaving root shell."
  proc.sendline("exit")
  proc.expect("\$ ")
  return "\$ " 


def becomeRoot(proc, passwd):
  proc.sendline("uname -s")
  index = proc.expect(["SunOS", "Linux"])
  if index == 0:
    proc.sendline("super root-shell")
  elif index == 1:
    proc.sendline("sudo su -")
  while True:
    index = proc.expect(["assword", "\# ", pexpect.EOF, pexpect.TIMEOUT])
    if index == 0:
      proc.sendline(passwd)
    elif index == 1:
      return "\# "
    elif index == 2:
      return "EOF"
    elif index == 3:
      return "TIMEOUT"

def main():
  global DEBUG, jams, misses, hits

  # Set up command line options / arguments
  parser = OptionParser()
  parser.disable_interspersed_args()
  parser.set_defaults(saveResults=True)
  parser.add_option("-H", "--hosts", dest="hostspec", help="hosts to run the command(s) on", metavar="HOSTSPEC", default="pyexphosts")
  parser.add_option ("-d", "--debug", action="store_true", dest="debug", help="print debugging messages")
 
  (options, args) = parser.parse_args()

  if options.debug:
    DEBUG=1
  
  targets = getTargets(options.hostspec)

  username = raw_input("User name to change password for: ")

  oldpass = getpass.getpass("Old password: ")
  newpass = getpass.getpass("New password: ")

  for target in targets:
    if DEBUG:
      print "Launching commands at target "+target
    pullTrigger(target, oldpass, newpass, username)

  if (len(jams)):
    print "Jams noticed:"
    for target in jams:
      print "Target "+target
  if (len(misses)):
    print "Misses noticed:"
    for target in misses:
      print "Target: "+target
  print "Done changing passwords."

if __name__ == "__main__":
  main()

Run ALL the things – everywhere!

Wednesday, June 20th, 2012

Yes, that meme is a bit overused and trite. That’s okay, it’s still fun. At least, I think it is, and since I’m the author, my opinion is the one that counts.

So why am I using it? Well, I came across some information I needed to collect from all of our Linux systems the other day. We have an in-house routine called ‘rrun’ that will let us launch commands on a specified set of systems, as root, on demand. Simple solution, right? Well, not really – unfortunately, the thing I needed to run wouldn’t run properly inside of the ‘rrun’ tool. What’s a poor deprived sysadmin soul to do in this situation?

Hopefully not what I did. I basically reinvented the wheel – though I think I made it better.

I remembered using an expect-based script many years ago that would ssh out to various systems and run commands for you, and thinking it was a wonderful thing. Well, I didn’t have that script any longer, and since I didn’t really want to re-learn Tcl, I looked for alternatives. I found Python’s pexpect module, which is basically a reimplementation of expect in Python.

After a bit of thinking and a lot of coding, I came up with the code you see below. If you like it, feel free to use it, though do be warned that the version I’m posting has not been extensively tested or Fred-proofed. I’ve also got some work left on refining the debugging levels and such, but that’s for later.

And yes, I did have firearms on the brain when I was writing it.  🙂

#!/usr/bin/python
#
# Clone of 'rrun', an internal program that runs a command as root on
# multiple target systems.
#
# It requires that you have sudo available on the target(s) and that you
# can run the given command under sudo. It does not require SSH keys be set
# up, since it handles the password dialogs for both SSH login and sudo
# access.
#
# Any vulgarities in this code are the result of being lazy about
# case sensitivity checks, and are not deliberate. If you decide to be
# offended, you need to get over it.

import pexpect
from optparse import OptionParser
import os
import getpass
import signal
import sys
from datetime import datetime

DEBUG = 0
jams = []
misses = []
hits = []

def getTargets(hostspec):
  global DEBUG
  if os.path.isfile(hostspec):
    if DEBUG:
      print "Reading hosts from file "+hostspec
    fh = open(hostspec, 'r')
    hosts = fh.read()
    fh.close()
  else:
    if DEBUG:
      print "Using hosts from command line."
    hosts=hostspec
  return hosts.split()

def loadAmmunition(cmdspec):
  global DEBUG
  if os.path.isfile(cmdspec):
    if DEBUG:
      print "Reading commands from file "+cmdspec
    fh = open(cmdspec, 'r')
    commands = fh.read()
    fh.close()
  else:
    if DEBUG:
      print "Using commands from command line."
    commands = cmdspec
  return commands

def readPassword():
  return getpass.getpass("Use what password? ")

def pullTrigger(target, cmds, passwd):
  global DEBUG, jams, misses, hits
  rangeHot = "\$ "
  # First, we launch the ssh process and get logged in to the target
  # Set a 5 minute timeout on commands, not 30 seconds
  proc = pexpect.spawn("ssh "+target, timeout=300)
  while True:
    index = proc.expect(["The authenticity of host", "assword:", "Permission denied", rangeHot, pexpect.EOF, pexpect.TIMEOUT])
    if index == 0:
      proc.sendline("yes")
    elif index == 1:
      proc.sendline(passwd)
    elif index == 2:
      jams.append(target)
      if DEBUG:
        print "Dud cartridge. Clearing chamber, proceeding with firing plan..."
      proc.kill(signal.SIGKILL)
      return
    elif index == 3:
      break
    elif index == 4:
      jams.append(target)
      if DEBUG:
        print "Cartridge jammed, clearing chamber, proceeding with firing plan."
      proc.kill(signal.SIGKILL)
      return
    elif index == 5:
      jams.append(target)
      if DEBUG:
        print "Squib load. clearing chamber, proceeding with firing plan."
      proc.kill(signal.SIGKILL)
      return

  # We're logged in. Create the shell file with the commands.
  proc.sendline("echo "+cmds+" > /tmp/expectcmd.sh")
  index = proc.expect([rangeHot, pexpect.EOF, pexpect.TIMEOUT])
  if index != 0:
    misses.append(target)
    if DEBUG:
      print "Stop firing into the ceiling! Proceeding with firing plan."
    proc.kill(signal.SIGKILL)
    return

  # Go root (if indicated by sys.argv[0])
  if (sys.argv[0].endswith("rlaunch") ):
    if DEBUG:
      print "Becoming root inside expect spawn."
    rangeHot = becomeRoot(proc, passwd)
    if rangeHot == "EOF":
      misses.append(target)
      if DEBUG:
        print "Missed target low. Proceeding with firing plan."
      proc.kill(signal.SIGKILL)
      return
    if rangeHot == "TIMEOUT":
      misses.append(target)
      if DEBUG:
        print "Missed target high. Proceeding with firing plan."
      proc.kill(signal.SIGKILL)
      return

  # Execute the command, redirecting stdout/stderr
  proc.sendline("/bin/sh /tmp/expectcmd.sh > /tmp/expectcmd.out 2>/tmp/expectcmd.err")
  index = proc.expect([rangeHot, pexpect.EOF, pexpect.TIMEOUT])
  if index != 0:
    misses.append(target)
    if DEBUG:
      print "Missed wide left. Proceeding with firing plan."
    proc.kill(signal.SIGKILL)
    return

  # A hit! A veritable hit! O frabjous day!
  hits.append(target)
  if ( sys.argv[0].endswith("rlaunch") ):
    rangeHot = exitRoot(proc)
  proc.sendline("exit")

def exitRoot(proc):
  global DEBUG
  # Quick and dirty. This should really be nicer, but I'm lazy and it's
  # almost guaranteed to work if you actually got this far.
  if DEBUG:
    print "Leaving root shell."
  proc.sendline("exit")
  proc.expect("\$ ")
  return "\$ " 


def becomeRoot(proc, passwd):
  proc.sendline("sudo su -")
  while True:
    index = proc.expect(["assword", "\# ", pexpect.EOF, pexpect.TIMEOUT])
    if index == 0:
      proc.sendline(passwd)
    elif index == 1:
      return "\# "
    elif index == 2:
      return "EOF"
    elif index == 3:
      return "TIMEOUT"

def cleanBrass(target, passwd):
  global DEBUG
  rangeHot = "\$ "
  if DEBUG:
    print "Cleaning up spent brass for target "+target
  # First, we launch the ssh process and get logged in to the target
  # Set a 5 minute timeout on commands, not 30 seconds
  proc = pexpect.spawn("ssh "+target, timeout=300)
  # We don't handle certain types of things we do in pulling the trigger since
  # we already know we succeeded once so we will succeed again.
  while True:
    index = proc.expect(["The authenticity of host", "assword:", rangeHot])
    if index == 0:
      proc.sendline("yes")
    elif index == 1:
      proc.sendline(passwd)
    elif index == 2:
      break

  # Go root (if indicated by sys.argv[0])
  if (sys.argv[0].endswith("rlaunch") ):
    if DEBUG:
      print "Becoming root inside expect spawn."
    rangeHot = becomeRoot(proc, passwd)
    if rangeHot == "EOF":
      if DEBUG:
        print "Spent brass behind you, not on range."
      proc.kill(signal.SIGKILL)
      return
    if rangeHot == "TIMEOUT":
      if DEBUG:
        print "Can't find any spent brass.."
      proc.kill(signal.SIGKILL)
      return

  # Execute the command, redirecting stdout/stderr
  proc.sendline("/bin/rm -rf /tmp/expectcmd.sh /tmp/expectcmd.out /tmp/expectcmd.err")
  proc.expect(rangeHot)
  if (sys.argv[0].endswith("rlaunch") ):
    rangeHot = exitRoot(proc)
  proc.sendline("exit")


def collectTarget(target, passwd):
  global DEBUG
  if DEBUG:
    print "Collecting results from target "+target
  proc = pexpect.spawn("scp "+target+":/tmp/expectcmd.out "+target+".out")
  while True:
    index = proc.expect(["assword:", "\$ ", pexpect.EOF, pexpect.TIMEOUT])
    if index == 0:
      proc.sendline(passwd)
    elif index == 1:
      break
    elif index == 2:
      break
    elif index == 3:
      if DEBUG:
        print "Can't find target. Proceeding to next collection."
      break
  proc = pexpect.spawn("scp "+target+":/tmp/expectcmd.err "+target+".err")
  while True:
    index = proc.expect(["assword:", "\$ ", pexpect.EOF, pexpect.TIMEOUT])
    if index == 0:
      proc.sendline(passwd)
    elif index == 1:
      break
    elif index == 2:
      break
    elif index == 3:
      if DEBUG:
        print "Can't find target. Proceeding to next collection."
      break

def setupTargetFile(dirname):
  if os.path.isdir(dirname):
    d = datetime.now()
    os.rename(dirname, dirname+d.isoformat('@'))
  os.mkdir(dirname)

def main():
  global DEBUG, jams, misses, hits

  # Set up command line options / arguments
  parser = OptionParser()
  parser.disable_interspersed_args()
  parser.set_defaults(saveResults=True)
  parser.add_option("-c", "--commands", dest="cmdspec", help="one-line command or file with commands to run", metavar="CMDSPEC", default="pyexpcmds")
  parser.add_option("-H", "--hosts", dest="hostspec", help="hosts to run the command(s) on", metavar="HOSTSPEC", default="pyexphosts")
  parser.add_option("-r", "--results", dest="resdir", help="store results files in DIR", metavar="DIR", default="pyexpresults")
  parser.add_option("-R", "--no-results", dest="nolog", action="store_true")
  parser.add_option("-p", "--password", dest="passwd", help="optional password to use (if not specified, you will be prompted)", metavar="PASSWORD")
  parser.add_option ("-d", "--debug", action="store_true", dest="debug", help="print debugging messages")
  parser.add_option ("-n", "--no-clean", action="store_true", dest="nocleanup", help="Do not clean up the results files on the target systems")
 
  (options, args) = parser.parse_args()

  if options.debug:
    DEBUG=1
  
  targets = getTargets(options.hostspec)

  cmds = loadAmmunition(options.cmdspec)

  if not options.passwd:
    password = readPassword()
  else:
    password = options.passwd

  for target in targets:
    if DEBUG:
      print "Launching commands at target "+target
    pullTrigger(target, cmds, password)

  if options.nolog:
    if DEBUG:
      print "Discarding targets."
  else:
    if DEBUG:
      print "Collecting targets..."
    setupTargetFile(options.resdir)
    os.chdir(options.resdir)
    for target in hits:
      collectTarget(target, password)
    os.chdir('..')

  if not options.nocleanup:
    if DEBUG:
      print "Cleaning up spent brass from misses."
    for target in misses:
      cleanBrass(target, password)

    if DEBUG:
      print "Cleaning up spent brass from hits."
    for target in hits:
      cleanBrass(target, password)

  if DEBUG:
    if (len(jams)):
      print "Jams noticed:"
      for target in jams:
        print "Target "+target
    if (len(misses)):
      print "Misses noticed:"
      for target in misses:
        print "Target: "+target
    print "All ammuntion spent. Hope you had fun at the range!"

if __name__ == "__main__":
  main()

Making things go!

Wednesday, June 13th, 2012

When I started this job, it took me about a week – just under, really – to figure out some ways I could make some very quick and very effective improvements. I’ve gone over some of those in varying detail in previous posts; now it’s time to detail one of them in particular that I just accomplished.

One of the pain points I identified was the distribution of sysadmin tools. Before I arrived, it had been done by scp’ing a directory to newly deployed servers. I think we can all see the problems with that — data divergence, lack of updates, far too easy to forget to update one or more systems when a given script changes… fun times. I decided Something Had To Be Done. And Quickly.

So I did something. I started out by building a package of those scripts and getting it distributed by our RHN Satellite. That was fairly easy – once I had the package, I just had to sign it and push it. Then I started to tackle the business of actually creating the RPM package, which was going to be a wee bit more difficult.

I started out with an empty SVN repository. I couldn’t figure out a clean way of keeping the specfile for the package in with the source tree, so I created two main directories in the repo – packages and specs. The specs directory just has the specfile, nothing more. The packages directory has all the fun stuff. Since I didn’t want packages to bleed through to each other, I then created a new directory for the first package; let’s call it “adminscripts” (no, that’s not the actual name I used – I’m sanitizing things as I write).

Inside the adminscripts directory, I established the usual trunk-tags-branches structure so common to SVN projects. This turned out to make things much easier down the line, but I can’t claim any sort of prescience about it – I just did it out of habit and because that’s the way the smart people do things. I’ve got the usual src/ directory off the main project directory, and a Makefile at the top level, so no surprises there. Making commits to the project, updating the source tree, and all that jazz is now “industry-standard” – anyone can start contributing as long as they know how things are done in 90% of open-source projects.

Now comes the first challenge – how do I start with this SVN repository and extract a tar bundle of just the source code? Well, that’s sort of simple, just check out the code and get rid of the “.svn/” directories everywhere, then bundle it up – but I don’t necessarily want to build HEAD. Hmm. Okay, let’s use the tags/ directory and check out a specific tag. This also forces an extra step on the coders to tell the build system that a given revision is ready for packaging, not entirely a bad thing. So we tag it with the release and version we want the RPM package to be, and check out that tag.

Okay, so there’s at least one important detail – the checkout needs to be renamed after removing the .svn/ directories and before being bundled, since the rpmbuild process expects a directory named %{NAME}-%{VERSION}. That’s just an ‘mv’ command, though.
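To make that a little more concrete, here’s roughly what the tag-to-tarball step looks like sketched out in Python. The repository URL, function name, and work directory are placeholders I made up for the example – the real build glue isn’t this exact code – but the flow is the same: check out the tag, strip the .svn/ directories, rename to %{NAME}-%{VERSION}, and tar it up.

# Rough sketch of the tag-to-tarball step described above. The repo URL,
# work directory, and function name are made-up placeholders, not what the
# real build script uses.

import os
import shutil
import subprocess

REPO = "svn://svnhost/repo/packages/adminscripts"   # placeholder URL

def bundleTag(name, version, release, workdir="/tmp/buildroot"):
  tag = "%s-%s" % (version, release)
  checkout = os.path.join(workdir, tag)
  bundledir = os.path.join(workdir, "%s-%s" % (name, version))
  if not os.path.isdir(workdir):
    os.makedirs(workdir)

  # Check out only the tag that was marked as ready for packaging.
  subprocess.check_call(["svn", "checkout", "%s/tags/%s" % (REPO, tag), checkout])

  # Throw away the .svn/ administrative directories.
  for root, dirs, files in os.walk(checkout):
    if ".svn" in dirs:
      shutil.rmtree(os.path.join(root, ".svn"))
      dirs.remove(".svn")

  # rpmbuild expects the top-level directory to be named %{NAME}-%{VERSION}.
  os.rename(checkout, bundledir)

  # Bundle it up for the rpmbuild SOURCES directory.
  tarball = "%s-%s.tar.gz" % (name, version)
  subprocess.check_call(["tar", "-czf", tarball, "-C", workdir,
                         "%s-%s" % (name, version)])
  return tarball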

So now I have a way to get a specific version-release, how do I figure out *which* version-release? Turns out that’s remarkably simple – just parse the specfile with a little “awk”. I think I mentioned in a previous post just how much I love my little friend ‘awk’… anyway. Once I have the bundle, it’s a simple process to move the bundle and specfile into place and launch an rpmbuild job.
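For illustration, here’s the same parse done in Python instead of awk (the awk version really is a one-liner). It assumes the specfile carries plain Version: and Release: tags rather than macro expansions – an assumption for the example, not a statement about the real specfiles.

# Illustration only -- the real glue uses awk. Pulls the Version: and
# Release: tags out of a specfile, assuming they are plain values and
# not macros.

def specVersionRelease(specfile):
  version = release = None
  fh = open(specfile, 'r')
  for line in fh:
    if line.lower().startswith("version:"):
      version = line.split(":", 1)[1].strip()
    elif line.lower().startswith("release:"):
      release = line.split(":", 1)[1].strip()
  fh.close()
  return version, release

# e.g. specVersionRelease("specs/adminscripts.spec") might return ("1.0", "3")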

But wait… I don’t want to keep rebuilding the same thing every night if there’s no need to. Which means I need to track the builds I’ve done – or at least the ones that have succeeded. I chose to use a PostgreSQL database to do so, though I could have just as easily used any other database – or probably even flat files. I also want to know who to email on build errors – oh and on successes as well, that would be cool – so I throw that into the database.

Without going into too much detail about the database layout, I log which package-version-release combinations are built and when, and also log which emails go with errors and successes for which packages. Then I glue them together with a script that parses the specfile for the “current” version-release of all packages, checks to see if a build has been done, and if not launches the build script.
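A minimal sketch of that bookkeeping, assuming psycopg2 is available on the build host; the table and column names below are placeholders for the example, not the actual schema.

# Sketch of the "have we built this yet?" bookkeeping. The builds table
# and its columns are placeholders, not the actual schema.

import psycopg2

def alreadyBuilt(conn, package, version, release):
  cur = conn.cursor()
  cur.execute("SELECT 1 FROM builds WHERE package=%s AND version=%s AND release=%s",
              (package, version, release))
  found = cur.fetchone() is not None
  cur.close()
  return found

def recordBuild(conn, package, version, release):
  cur = conn.cursor()
  cur.execute("INSERT INTO builds (package, version, release, built_at) VALUES (%s, %s, %s, now())",
              (package, version, release))
  conn.commit()
  cur.close()

# The nightly glue boils down to: parse the specfile, call alreadyBuilt(),
# and only launch the build script (then recordBuild()) if it returns False.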

So basically, in my first month-and-change, I’ve created an end-to-end automated CI build process that goes from source code check-ins to a package ready for signing and distribution. Sure, it’s small scale and systems-oriented rather than application-oriented, but it is a major accomplishment. Plus, it can be easily extended to build applications for deployment – I designed it to be extensible that way. Does it have some limitations? Sure – but for a company this size (~300 employees) in the IT industry (our primary focus is providing web and other IT based services), it’s a pretty hefty addition to the arsenal.

VMware: Taking two steps back for every step forward

Friday, June 8th, 2012

Some of you may remember an earlier rant I went on about VMware support. I’m glad to say that it got resolved at the time, though not as smoothly as I could have hoped. Still, it got resolved, and I went on my happy way.

I’m sorry to report that VMware has failed me yet again – this time in a spectacularly embarrassing way for its engineering department. See, I’m at a new job, and the new job also uses VMware virtualization – fairly heavily. We’ve been running into some performance problems on the guest VMs related to either not having VMware Tools installed or having an out-of-date version. That is clearly our problem, and we started resolving it – by compiling VMware Tools manually, since for some reason the kernel we’re running (RHEL 5, 2.6.18-308.1.1.el5) doesn’t have precompiled modules. No biggie, it still works.

Well, compiling manually by running vmware-config-tools.pl on hundreds of boxes isn’t gonna fly, so I looked into ways of compiling once and pushing out packages from our RHN Satellite. This is when VMware started to impress me. They have a new method of distributing the vmware-tools package via YUM repositories. I found a treasure trove of RPMs at http://packages.vmware.com/tools, including a source RPM for the kmod package. Hallelujah! Oh frabjous day! My task has just been vastly simplified!

So I pulled down the source RPM for the kernel modules for the version of ESX we have and launched an “rpmbuild -bb”. The build failed. Wait, WHAT? Turns out that the source for the vmxnet and vmxnet3 modules has a conflicting definition of “struct napi_struct”. Some research led me to figure out that the kmod source for 4.0U2 was okay, since it had taken into account a backport of GRO that Red Hat had done. I created diffs of those two trees, added the patches to the 4.0U1 build, and the package built. Okay, a little annoying, but understandable.

Now that I have a kernel module package, it’s time to start pulling down all the other packages – for which, I should note, the source RPM is *not* available. So I pull them down one by one, starting with just the base “vmware-tools” package, doing a manual dependency resolution with wget and “rpm -qi --requires”. Well, I finally get to the “vmware-open-vm-tools-xorg-utilities” package, and it requires both xorg-x11-drv-vmware and xorg-x11-drv-vmmouse. Those are actually in the base RHEL channel.

This is where VMware failed utterly and completely. The binary RPM on their site, in the directory at http://packages.vmware.com/tools/esx/4.0u1/rhel5/x86_64, has version dependencies. Specifically, it depends on:

  • xorg-x11-drv-vmware >= 10.15.2.0
  • xorg-x11-drv-vmmouse >= 12.4.3.0

The versions of these packages available from Red Hat via RHN?

  • xorg-x11-drv-vmware    10.13.0-2
  • xorg-x11-drv-vmmouse   12.4.0-2

That’s right, the VMware packages for RHEL 5, as provided by VMware, require versions of Red Hat packages that don’t exist! What’s worse is that the xorg-x11-drv-vmmouse package doesn’t seem to exist for RHEL 6, so I can’t even try to back-port the RHEL 6 packages to RHEL 5. Which means that the past 3 hours of work in trying to generate local packages to install VMware Tools – and not have to do so manually – was wasted, because VMware’s build system for their vmware-tools package is fundamentally broken. Did nobody at VMware bother to do any quality checking to ensure these packages can be installed? Does anybody from VMware realize just how idiotic the entire VMware organization looks to me right now?

EDIT: I’ve now found the two packages that provide the required xorg-x11-drv-vmmouse and xorg-x11-drv-vmware versions. They’re in the VMware download folder, but they have “vmware-open-vm-tools-xorg”-style names. Not really too smart there, VMware… either use the same names as upstream, or require different package names, please. Don’t muddy things up like that. You look a lot less idiotic now, but you still look idiotic.

Vendor tools

Tuesday, June 5th, 2012

A number of the vendors that make the products I use seem to have a bit of a disconnect. They look at modern corporate computing and see Windows ruling the desktop. Which it does, no question about it – and with (for the most part) good reason. So, they focus their efforts on a Windows way of managing their products.

Don’t get me wrong here, I haven’t taken leave of my senses and embraced Microsoft. I still don’t like the Windows world, and I still think Windows does plenty of things wrong. But it does enough right that it makes sense, even to me, as the corporate desktop of choice.

Unfortunately, my desktop is not where I do most of my work. It’s on the server(s), which don’t have Windows GUIs; they have command-line tools. What vendors don’t seem to understand is that anything they put in the GUI should also be available in the CLI. Allow me to provide a specific example: we use Symantec NetBackup for our backups. I’ve just been given ownership of backups for all our UNIX / Linux servers, so I want to know what’s going on with it. To that end, I’m trying to write some scripts that give me the information I want on a routine basis. Thanks to another sysadmin who’s a friend, I found the “bpdbjobs” binary – and oh what a wonderful binary it is. Unfortunately, it will either give me verbose information about database entries, or it will give me headers for the very limited information in its default report. It will not give me headers for the verbose report, which is the combination I need in order for it to be useful.

Symantec, please give me a verbose (hint: -all_columns) report from bpdbjobs with header information. Yes, I can back up, yes I can restore, but if I don’t know – and can’t find out using your tools – what I’m backing up successfully, your product still isn’t doing what it should be.

The little things…

Friday, June 1st, 2012

It always seems that it’s all the little details that trip us up and cause the biggest problems. In life and in systems administration.

Well, it’s also those little things that have the biggest impact on others, and do the most towards solving problems and making life better. Yesterday, I was given “ownership” of our corporate backups. Ultimately, this just means that I take the blame for any major casters-up events, but it also means that I can make changes where I deem appropriate.

One of the things about the backup environment is that it sends out a report every morning regarding the previous day’s backup runs, which has to be checked for errors. The report is generated by doing nothing more than running a series of commands against the backup database, so it reports all the jobs in chronological order based on start time. While this makes perfect sense, it is rather frustrating to have to page through a 2000+ line email to try and determine if there were errors. It’s far too easy to overlook the one character that has the numeric job status when you have 1,998 zeros and two sixes in that column.

Well, computers are great at repetitive tasks like “check each line of data for a non-zero value in this column”, so I decided to do some slicing and dicing. I brought out an old and trusted friend, awk, and told it to find me all the non-zero values and report them to me, then to find the machines those values are associated with and report on all jobs on those machines. Then I had it put all that information at the beginning of the nightly report email so I don’t have to scroll through the huge 2000+ line report to find the anomalies. I went ahead and let the report script put the whole big thing at the end, just like it wanted to, to avoid making it jealous of awk and getting all whiny and constipated later, but I really do like the ‘awk-ed’ section better.
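For anyone curious, the filtering logic boils down to something like the sketch below. The real version is an awk script, and the column positions here are placeholders I picked for the example – the actual report doesn’t necessarily put the status or client name in these fields – so treat it as an illustration of the idea rather than the actual filter.

# Illustration of the idea -- the real filter is an awk script. Assumes a
# whitespace-separated report where the job status and client name live in
# fixed columns; the positions below are made-up placeholders.

STATUS_COL = 3   # hypothetical position of the numeric job status
CLIENT_COL = 1   # hypothetical position of the client (machine) name

def summarize(reportlines):
  badclients = set()
  # Pass 1: note every client that has a job with a non-zero status.
  for line in reportlines:
    fields = line.split()
    if len(fields) > STATUS_COL and fields[STATUS_COL].isdigit():
      if int(fields[STATUS_COL]) != 0:
        badclients.add(fields[CLIENT_COL])
  # Pass 2: pull out every job line belonging to one of those clients.
  summary = []
  for line in reportlines:
    fields = line.split()
    if len(fields) > CLIENT_COL and fields[CLIENT_COL] in badclients:
      summary.append(line)
  return summary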

This morning, the other person whose job it is to go through this email and find the misbehaving systems came by and thanked me for making his job easier and his life better, since he now only has to spend about 30 seconds reading this email versus 5 to 10 minutes previously.

The Devil may be in the details, but often so is Salvation.