Archive for October, 2009

Nagios check_by_ssh and NAT

October 22nd, 2009

At a remote location, I have a number of machines to monitor but only one IP (dynamic, on a residential connection). Most of my remote monitoring with Nagios uses check_by_ssh. Previously, Nagios would SSH to one host there, and a second, chained check_by_ssh would reach the remaining hosts from it. Unfortunately, this means that if that first host is down, nothing behind it can be monitored. All of the other hosts (everything is behind NAT) have SSH visible externally on different ports.

SSH itself doesn’t like one IP/hostname with SSH on different ports: host key verification will fail, as the SSH client only looks at the address that it’s connecting to, not the port number. The usual workaround is a .ssh/config file like:

Host foo1
        Hostname foo.example.com
        HostKeyAlias foo1
        CheckHostIP no
        Port 22
        User nagios
 
Host foo2
        Hostname foo.example.com
        HostKeyAlias foo2
        CheckHostIP no
        Port 222
        User nagios
 
Host foo3
        Hostname foo.example.com
        HostKeyAlias foo3
        CheckHostIP no
        Port 10022
        User nagios

And then you SSH using the “Host” named in the config file, not the actual hostname.
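
Since the stanzas differ only in the alias and the port, they can be generated rather than typed by hand. The following is a minimal Ruby sketch, assuming an alias-to-port map like the one in the example; the names `ssh_config_stanza`, `render_ssh_config` and `HOSTS` are made up for illustration and are not part of SSH or Nagios.

```ruby
# Hypothetical helper: render one ssh_config "Host" stanza per NATed machine.
# The alias/port pairs mirror the example config above.
def ssh_config_stanza(host_alias, port, hostname: "foo.example.com", user: "nagios")
  [
    "Host #{host_alias}",
    "        Hostname #{hostname}",
    "        HostKeyAlias #{host_alias}",
    "        CheckHostIP no",
    "        Port #{port}",
    "        User #{user}"
  ].join("\n")
end

# Assumed inventory: alias => externally visible SSH port.
HOSTS = { "foo1" => 22, "foo2" => 222, "foo3" => 10022 }

def render_ssh_config(hosts)
  hosts.map { |host_alias, port| ssh_config_stanza(host_alias, port) }.join("\n\n") + "\n"
end

config = render_ssh_config(HOSTS)
puts config
```

The output can be saved as the nagios user’s .ssh/config; the “Host” alias is still what gets passed to ssh.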

Unfortunately, the only way to get check_by_ssh to do this was a bit messy, and required defining a bunch of extra macros for each host:

./check_by_ssh -o Hostname=foo.example.com -o HostKeyAlias=foo2 -o CheckHostIP=no -o Port=222 -o User=nagios -H foo.example.com -C uptime

So, I made a quick little patch for check_by_ssh.c (against the released nagios-plugins-1.4.14):

--- check_by_ssh_ORIG.c 2009-10-22 14:12:15.000000000 -0400
+++ check_by_ssh.c      2009-10-22 14:32:26.000000000 -0400
@@ -181,6 +181,7 @@
                 {"skip", optional_argument, 0, 'S'}, /* backwards compatibility */
                 {"skip-stdout", optional_argument, 0, 'S'},
                 {"skip-stderr", optional_argument, 0, 'E'},
+               {"ssh-config", required_argument, 0, 'F'},
                 {"proto1", no_argument, 0, '1'},
                 {"proto2", no_argument, 0, '2'},
                 {"use-ipv4", no_argument, 0, '4'},
@@ -198,7 +199,7 @@
                         strcpy (argv[c], "-t");
 
         while (1) {
-               c = getopt_long (argc, argv, "Vvh1246fqt:H:O:p:i:u:l:C:S::E::n:s:o:", longopts,
+               c = getopt_long (argc, argv, "Vvh1246fqt:H:O:p:i:u:l:C:S::E::n:s:o:F:", longopts,
                                  &option);
 
                 if (c == -1 || c == EOF)
@@ -221,7 +222,7 @@
                                 timeout_interval = atoi (optarg);
                         break;
                 case 'H':                                                                       /* host */
-                       host_or_die(optarg);
+                 /* host_or_die(optarg); */     /* commented out 2009-10-22 by jantman for ssh config file use */
                         hostname = optarg;
                         break;
                 case 'p': /* port number */
@@ -299,6 +300,12 @@
                         else
                                 skip_stderr = atoi (optarg);
                         break;
+               /* added 2009-10-22 by jantman for ssh -F option (config file) */
+               case 'F':                                                                       /* ssh config file */
+                       comm_append("-F");
+                       comm_append(optarg);
+                       break;
+               /* END added 2009-10-22 by jantman */
                 case 'o':                                                                       /* Extra options for the ssh command */
                         comm_append("-o");
                         comm_append(optarg);
@@ -404,6 +411,8 @@
    printf ("    %s\n", _("Ignore all or (if specified) first n lines on STDERR [optional]"));
    printf (" %s\n", "-f");
    printf ("    %s\n", _("tells ssh to fork rather than create a tty [optional]. This will always return OK if ssh is executed"));
+  printf (" %s\n", "-F");
+  printf ("    %s\n", _("path to ssh config file [optional]"));
    printf (" %s\n","-C, --command='COMMAND STRING'");
    printf ("    %s\n", _("command to execute on the remote machine"));
    printf (" %s\n","-l, --logname=USERNAME");

It works fine. The only problem is that I disabled the check that the given hostname/IP is valid, so instead of getting a nice “Invalid hostname/address – foobar” error, you’ll get the usual “Remote command execution failed: ssh: foobar: Name or service not known” error (though it will still give an exit code of 3). I had to do this because check_by_ssh validates the hostname itself, whereas SSH needs to be passed the “Host” alias as defined in the config file.

With the patch, we now have something nice and clean like:

./check_by_ssh -H foo1 -F /home/nagios/.ssh/config -l nagios -i /home/nagios/.ssh/id_dsa -C uptime

Which only adds the “-F” flag to what I was already using, and is safe to use for all hosts.

When I get a chance, I’ll figure out a way to gracefully deal with the host aliases (“fake hostnames”) and submit a patch. Most likely, I’ll add another option so that you have to specify both the actual hostname (so it can check that it exists) and the alias used in the config file (perhaps “-a”?).


Puppet problems with hostname in autosign.conf – Invalid pattern

October 14th, 2009

While building a new host with Puppet today (0.24.8 on clients and server), I came across a strange error when I ran puppet on the client:

err: Could not request certificate: Certificate retrieval failed: Invalid pattern css-storemanager

The thing that was so strange is that “css-storemanager” is the name of a host at my site, controlled by Puppet, but it has nothing to do with the host I was building. They’re different boxes, on different subnets, in different rooms. One is a SunFire and the other is an HP desktop.

Google turned up nothing. Running puppetmasterd with --debug --trace yielded:

info: Listening on port 8140
notice: Starting Puppet server version 0.24.8
notice: Allowing unauthenticated client ccf-hill019-12.example.edu(172.x.x.x) access to puppetca.getcert
/usr/lib/ruby/site_ruby/1.8/puppet/network/authstore.rb:289:in `parse'
/usr/lib/ruby/site_ruby/1.8/puppet/network/authstore.rb:170:in `pattern='
/usr/lib/ruby/site_ruby/1.8/puppet/network/authstore.rb:151:in `initialize'
/usr/lib/ruby/site_ruby/1.8/puppet/network/authstore.rb:80:in `new'
/usr/lib/ruby/site_ruby/1.8/puppet/network/authstore.rb:80:in `store'
/usr/lib/ruby/site_ruby/1.8/puppet/network/authstore.rb:20:in `allow'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/ca.rb:54:in `autosign?'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/ca.rb:51:in `each'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/ca.rb:51:in `autosign?'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/ca.rb:50:in `open'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/ca.rb:50:in `autosign?'
/usr/lib/ruby/site_ruby/1.8/puppet/network/handler/ca.rb:112:in `getcert'
/usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `to_proc'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `call'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:52:in `protect_service'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:85:in `setup_processor'
/usr/lib/ruby/1.8/xmlrpc/server.rb:336:in `call'
/usr/lib/ruby/1.8/xmlrpc/server.rb:336:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:323:in `each'
/usr/lib/ruby/1.8/xmlrpc/server.rb:323:in `dispatch'
/usr/lib/ruby/1.8/xmlrpc/server.rb:366:in `call_method'
/usr/lib/ruby/1.8/xmlrpc/server.rb:378:in `handle'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/processor.rb:44:in `process'
/usr/lib/ruby/site_ruby/1.8/puppet/network/xmlrpc/webrick_servlet.rb:68:in `service'
/usr/lib/ruby/1.8/webrick/httpserver.rb:104:in `service'
/usr/lib/ruby/1.8/webrick/httpserver.rb:65:in `run'
/usr/lib/ruby/1.8/webrick/server.rb:173:in `start_thread'
/usr/lib/ruby/1.8/webrick/server.rb:162:in `start'
/usr/lib/ruby/1.8/webrick/server.rb:162:in `start_thread'
/usr/lib/ruby/1.8/webrick/server.rb:95:in `start'
/usr/lib/ruby/1.8/webrick/server.rb:92:in `each'
/usr/lib/ruby/1.8/webrick/server.rb:92:in `start'
/usr/lib/ruby/1.8/webrick/server.rb:23:in `start'
/usr/lib/ruby/1.8/webrick/server.rb:82:in `start'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:293:in `start'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:144:in `newthread'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:143:in `initialize'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:143:in `new'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:143:in `newthread'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:291:in `start'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:290:in `each'
/usr/lib/ruby/site_ruby/1.8/puppet.rb:290:in `start'
/usr/sbin/puppetmasterd:285
err: Invalid pattern css-storemanager

After a bit of investigation into that trace, I found the following code in /usr/lib/ruby/site_ruby/1.8/puppet/network/authstore.rb starting on line 242:

            # Parse our input pattern and figure out what kind of allowal
            # statement it is.  The output of this is used for later matching.
            def parse(value)
                case value
                when /^(\d+\.){1,3}\*$/: # an ip address with a '*' at the end
                    @name = :ip
                    match = $1
                    match.sub!(".", '')
                    ary = value.split(".")
 
                    mask = case ary.index(match)
                    when 0: 8
                    when 1: 16
                    when 2: 24
                    else
                        raise AuthStoreError, "Invalid IP pattern %s" % value
                    end
 
                    @length = mask
 
                    ary.pop
                    while ary.length < 4
                        ary.push("0")
                    end
 
                    begin
                        @pattern = IPAddr.new(ary.join(".") + "/" + mask.to_s)
                    rescue ArgumentError => detail
                        raise AuthStoreError, "Invalid IP address pattern %s" % value
                    end
                when /^([a-zA-Z][-\w]*\.)+[-\w]+$/: # a full hostname
                    @name = :domain
                    @pattern = munge_name(value)
                when /^\*(\.([a-zA-Z][-\w]*)){1,}$/: # *.domain.com
                    @name = :domain
                    @pattern = munge_name(value)
                    @pattern.pop # take off the '*'
                    @length = @pattern.length
                else
                    # Else, use the IPAddr class to determine if we've got a
                    # valid IP address.
                    if value =~ /\/(\d+)$/
                        @length = Integer($1)
                    end
                    begin
                        @pattern = IPAddr.new(value)
                    rescue ArgumentError => detail
                        raise AuthStoreError, "Invalid pattern %s" % value
                    end
                    @name = :ip
                end

Following the trace back, I took a look at /usr/lib/ruby/site_ruby/1.8/puppet/network/handler/ca.rb starting at line 50:

            auth = Puppet::Network::AuthStore.new
            File.open(autosign) { |f|
                f.each { |line|
                    next if line =~ /^\s*#/
                    next if line =~ /^\s*$/
                    auth.allow(line.chomp)
                }
            }

After looking at this, it clicked that this must be the code that evaluates autosign.conf. Taking a look at mine, one line stood out: a line containing only “css-storemanager”, not an FQDN like all the rest. The parse() function in authstore.rb only accepts IP addresses and FQDNs (or IP addresses ending in a wildcard, or wildcard FQDNs). A bare hostname contains no dot, so it matches none of the hostname patterns, falls through to the final else branch, and raises “Invalid pattern” when IPAddr.new rejects it. Interestingly, it also appears to evaluate the file in order and to stop once it finds a match. So, if you’re a Bad Person like me and left autosign turned on for all of your hosts, you wouldn’t notice this until you try to build a new box.
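
The failure can be reproduced outside Puppet. This is a minimal sketch in modern Ruby (not the 1.8 syntax of the excerpt) that applies the “full hostname” pattern quoted from authstore.rb and the IPAddr fallback to a few sample entries; the helper `valid_ip?` and the address 172.16.0.1 are assumptions for illustration.

```ruby
require "ipaddr"

# The "full hostname" pattern quoted from authstore.rb above: it needs at
# least one dot-terminated label, so an unqualified name cannot match it.
FULL_HOSTNAME = /^([a-zA-Z][-\w]*\.)+[-\w]+$/

# Hypothetical helper mirroring the else branch: does IPAddr accept the value?
def valid_ip?(value)
  IPAddr.new(value)
  true
rescue ArgumentError # IPAddr's own error classes descend from ArgumentError
  false
end

["css-storemanager", "css-storemanager.example.edu", "172.16.0.1"].each do |entry|
  hostname = !!(entry =~ FULL_HOSTNAME)
  puts "#{entry}: hostname=#{hostname} ip=#{valid_ip?(entry)}"
end
```

Only the bare “css-storemanager” fails both checks, which is the case that ends in “Invalid pattern”.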

To solve this, just remove any offending lines from autosign.conf.
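
Finding the offending lines can be scripted. The sketch below is an approximation written in modern Ruby, not Puppet’s own code: the three regular expressions are copied from the parse() excerpt above, the comment and blank-line skips follow ca.rb, and the helper names and sample file contents are made up for illustration.

```ruby
require "ipaddr"
require "stringio"

# Patterns copied from the parse() excerpt above (Puppet 0.24.8).
ACCEPTED = [
  /^(\d+\.){1,3}\*$/,              # an IP address with a '*' at the end
  /^([a-zA-Z][-\w]*\.)+[-\w]+$/,   # a full hostname
  /^\*(\.([a-zA-Z][-\w]*)){1,}$/   # *.domain.com
]

# Hypothetical helper: true if parse() would likely accept the entry.
def parseable?(entry)
  return true if ACCEPTED.any? { |re| entry =~ re }
  IPAddr.new(entry) # fallback branch: plain IP address or CIDR block
  true
rescue ArgumentError
  false
end

# Returns [line_number, text] pairs for entries that would be rejected.
def offending_lines(io)
  bad = []
  io.each_line.with_index(1) do |line, number|
    next if line =~ /^\s*#/ || line =~ /^\s*$/ # same skips as ca.rb
    entry = line.chomp
    bad << [number, entry] unless parseable?(entry)
  end
  bad
end

# Made-up sample contents standing in for a real autosign.conf.
sample = StringIO.new(<<~CONF)
  # autosign.conf
  *.example.edu
  css-storemanager
  ccf-hill019-12.example.edu
CONF

offending_lines(sample).each { |number, entry| puts "line #{number}: #{entry}" }
# prints: line 3: css-storemanager
```

To check a real file, pass an open File object for your autosign.conf instead of the StringIO sample.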

I’ve filed a bug report on the Puppet Trac: Issue 2723.
