Zero-Touch Provisioning – Automated CPE onboarding in NSO

So, these days I guess we’re all trying to achieve “Automation Nirvana” and dream of rainbows, unicorns and turn-key automation
solutions with telepathic capabilities (heh, “AI” seems to be the answer to all things nowadays).

IMO, getting there often requires a decent amount of sacrifice, occasionally admitting defeat and choosing a path
that you’ve always sworn never to take.

While working with (well, mostly studying) NSO, I thought that I’d give automated CPE onboarding a try. In the case of my
virtual lab, this device would be a Cisco CSR1000v. Being an XE-platform, it offers a neat little on-box Linux container with a Python
interpreter that could provide the scripting capabilities needed in order to register the device with NSO.

At first, this seemed totally awesome. But as I progressed, it turned out to be a lot harder than I initially expected.
Going through a lot of the documents for on-the-box Python, it seemed that ZTP was as easy as just providing a few options via
DHCP and, voilà!, it would run the script and the world would be a better place.

But apparently I was doing something wrong, because it didn’t…

My script was quite simple initially and consisted of several API calls to NSO’s REST API for:

  • Device registration/creation
  • Fetching device SSH keys
  • Syncing configuration from device
  • Changing authgroup upon fallback username/password removal

The Python modules pre-installed in the Guestshell did not include the requests module, but ‘PyCurl’ and ‘urllib2’ were present and could provide the functionality I needed from requests.

    Script:

    import json
    import urllib
    import urllib2
    import base64
    import time
    import cli

    username = "user"
    password = "pass"

    # HTTP Basic auth token; [:-1] strips the newline that encodestring appends.
    base64string = base64.encodestring('%s:%s' % (username, password))[:-1]

    # Read the DHCP-assigned address and the hostname from the router CLI.
    rtr_ip = str(cli.execute('show dhcp lease'))
    rtr_ip = rtr_ip.split(' ')[3].strip()
    rtr_hostname = str(cli.execute('show running-config | include ^hostname'))
    rtr_hostname = rtr_hostname.replace('hostname ', '').strip()

    rtr_json = {
        "tailf-ncs:device": {
            "name": str(rtr_hostname),
            "address": str(rtr_ip),
            "authgroup": "FALLBACK",
            "device-type": {
                "cli": {
                    "ned-id": "tailf-ned-cisco-ios-id:cisco-ios",
                    "protocol": "ssh"
                }
            },
            "state": {
                "admin-state": "unlocked"
            }
        }
    }

    BASE_URL = "http://10.255.192.6:8080/api"

    def create_rtr(data, hostname):

        req = urllib2.Request(url=BASE_URL + '/running/devices/device/' + hostname, data=bytes(data.encode("utf-8")))
        req.get_method = lambda: 'PUT'

        # Add the appropriate header.
        req.add_header("Content-type", "application/vnd.yang.data+json")
        req.add_header("Accept", "application/vnd.yang.data+json")
        req.add_header("Authorization", "Basic %s" % base64string)

        try:
            resp = urllib2.urlopen(req)
            print(resp.getcode())
        except urllib2.HTTPError as e:
            print(e.code)
            print(e.read())

    def get_ssh_keys(hostname):

        req = urllib2.Request(url=BASE_URL + '/running/devices/device/' + hostname + '/ssh/_operations/fetch-host-keys')
        req.get_method = lambda: 'POST'

        # Add the appropriate header.
        req.add_header("Accept", "application/vnd.yang.operation+json")
        req.add_header("Authorization", "Basic %s" % base64string)

        try:
            resp = urllib2.urlopen(req)
            print(resp.getcode())
        except urllib2.HTTPError as e:
            print(e.code)
            print(e.read())

    def sync_config(hostname):

        req = urllib2.Request(url=BASE_URL + '/running/devices/device/' + hostname + '/_operations/sync-from')
        req.get_method = lambda: 'POST'

        # Add the appropriate header.
        req.add_header("Accept", "application/vnd.yang.operation+json")
        req.add_header("Authorization", "Basic %s" % base64string)

        try:
            resp = urllib2.urlopen(req)
            print(resp.getcode())
        except urllib2.HTTPError as e:
            print(e.code)
            print(e.read())

    def nso_apply_template(hostname):

        JSON = {
            "tailf-ncs:apply-template": {
                  "template-name": "CSR1Kv_CPE",
                  "accept-empty-capabilities": None
            }
        }

        data = json.dumps(JSON)
        req = urllib2.Request(url=BASE_URL + '/running/devices/device/' + hostname + '/_operations/apply-template', data=bytes(data.encode("utf-8")))
        req.get_method = lambda: 'POST'

        # Add the appropriate header.
        req.add_header("Accept", "application/vnd.yang.operation+json")
        req.add_header("Content-Type", "application/vnd.yang.operation+json")
        req.add_header("Authorization", "Basic %s" % base64string)

        try:
            resp = urllib2.urlopen(req)
            print(resp.getcode())
        except urllib2.HTTPError as e:
            print(e.code)
            print(e.read())

    def change_authgroup(hostname):

        JSON = {
            "tailf-ncs:device": {
                "authgroup": "NETADMIN"
            }
        }

        data = json.dumps(JSON)
        req = urllib2.Request(url=BASE_URL + '/running/devices/device/' + hostname, data=bytes(data.encode("utf-8")))
        req.get_method = lambda: 'PATCH'

        # Add the appropriate header.
        req.add_header("Accept", "application/vnd.yang.data+json")
        req.add_header("Content-Type", "application/vnd.yang.data+json")
        req.add_header("Authorization", "Basic %s" % base64string)

        try:
            resp = urllib2.urlopen(req)
            print(resp.getcode())
        except urllib2.HTTPError as e:
            print(e.code)
            print(e.read())
     
    if __name__ == "__main__":
        # Register with NSO using the FALLBACK authgroup, then fetch keys and sync.
        create_rtr(json.dumps(rtr_json), rtr_hostname)
        time.sleep(5)
        get_ssh_keys(rtr_hostname)
        time.sleep(5)
        sync_config(rtr_hostname)
        time.sleep(5)
        # Replace the fallback credentials on the device itself.
        cli.configurep(['no username cisco', 'enable secret notcisco',
            'username notcisco privilege 15 secret notcisco', 'end'])
        time.sleep(5)
        # Point NSO at the new credentials, re-sync and apply the CPE template.
        change_authgroup(rtr_hostname)
        time.sleep(5)
        sync_config(rtr_hostname)
        time.sleep(5)
        nso_apply_template(rtr_hostname)
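
The five request functions in the script above differ only in path, HTTP method, media type and payload, so they could probably be collapsed into a single helper. Below is a minimal sketch of that idea, not the script I actually ran. The name `build_request` and its parameters are my own invention for illustration, and the import fallback is only there so the same code loads under both the Guestshell's Python 2 and a Python 3 interpreter. It only builds the request; sending it would still be a `urlopen(req)` call as in the script.

```python
import base64
import json

try:
    from urllib2 import Request  # Python 2, as found in the Guestshell
except ImportError:
    from urllib.request import Request  # Python 3

BASE_URL = "http://10.255.192.6:8080/api"
DATA_JSON = "application/vnd.yang.data+json"
OPERATION_JSON = "application/vnd.yang.operation+json"


def build_request(path, method, username, password, payload=None,
                  media_type=DATA_JSON):
    """Return an unsent Request for BASE_URL + path (hypothetical helper)."""
    body = None
    if payload is not None:
        body = json.dumps(payload).encode("utf-8")
    req = Request(BASE_URL + path, data=body)
    # urllib2 has no 'method' argument, so override get_method as the script does.
    req.get_method = lambda: method

    # HTTP Basic auth header, built the same way for every call.
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    req.add_header("Authorization", "Basic %s" % token)
    req.add_header("Accept", media_type)
    if body is not None:
        req.add_header("Content-Type", media_type)
    return req


# Example: the authgroup change from the script, expressed with the helper.
req = build_request("/running/devices/device/csr1kv-1", "PATCH",
                    "user", "pass",
                    payload={"tailf-ncs:device": {"authgroup": "NETADMIN"}})
print("%s %s" % (req.get_method(), req.get_full_url()))
```

The operations (fetch-host-keys, sync-from, apply-template) would pass `media_type=OPERATION_JSON` and `"POST"` instead.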

     

    Upon bootup, the device would request an IP address through DHCP. In addition to the IP address, the DHCP response supplied a few DHCP options: option 150 (TFTP servers), option 67 (filename) and option 12 (hostname).

     

    subnet 10.255.193.0 netmask 255.255.255.252 {
      option subnet-mask 255.255.255.252;
      option broadcast-address 10.255.193.3;
      option routers 10.255.193.1;
      option domain-name-servers 10.255.192.6;
      option domain-name "ciscotechie.lab";
    }
    host crs1kv-1 {
      hardware ethernet 0c:3d:1d:2b:48:00;
      fixed-address 10.255.193.2;
      option bootfile-name "xe_ztp.py";
      option ip-tftp-server 10.255.192.6;
      option host-name "csr1kv-1";
    }

     

    As can be seen from the script above, the built-in cli module is used to extract the hostname and the DHCP-assigned IP address from the router CLI. Both are used for device registration.
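
Picking the fourth space-separated token out of the `show dhcp lease` output works, but it breaks as soon as the spacing changes. A slightly more forgiving approach would be to match on a pattern instead. This is only a sketch: the function names are hypothetical, and the sample strings are my assumption of what the CLI returns (inferred from the token position the script relies on), so check them against real output before trusting it.

```python
import re

# Assumed sample output; the real text may differ between IOS XE releases.
SAMPLE_LEASE = "Temp IP addr: 10.255.193.2  for peer on Interface: GigabitEthernet1"
SAMPLE_HOSTNAME = "\nhostname csr1kv-1\n"

IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"


def parse_lease_ip(lease_output):
    """Return the first address following 'IP addr:', or None if absent."""
    match = re.search(r"IP addr:\s*(" + IPV4 + r")", lease_output)
    return match.group(1) if match else None


def parse_hostname(config_output):
    """Return the name from a 'hostname <name>' line, or None if absent."""
    match = re.search(r"^hostname\s+(\S+)", config_output, re.MULTILINE)
    return match.group(1) if match else None


print(parse_lease_ip(SAMPLE_LEASE))
print(parse_hostname(SAMPLE_HOSTNAME))
```

Returning `None` on a miss lets the script bail out early rather than registering a device in NSO with a garbage name or address.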
    Apparently, when booting up, the script halted at some point and did not provide any usable output on the console. Troubleshooting showed that the script itself seemed to work, but starting it in a specific way returned an error with a traceback pointing at a ‘log’ function. While digging through various posts across the internet, I stumbled upon one indicating that this may be due to the (plain!!) HTTP server not being enabled.

    Behind the scenes, the cli module seems to open sockets to the router from within the guestshell, and in its initial startup config the CSR1Kv only has the secure HTTP service enabled. I must admit that this annoyed me a bit, since you don’t have any immediate option to resolve this (upgrading to 16.12 had the same issue).

     

    So, this is the point where you start to get creative. How do I enable the insecure http service automatically so the script can run and collect the information from the cli? … – sigh – old-school DHCP-based Autoconfiguration

    Great, so how do I then get the Python script downloaded from the TFTP server, start the guestshell and run the script? … Of course! Embedded Event Manager! (This is the point where any network engineer starts to lose some of their credibility – if not all.)

     

    For this change, all there is to it is to provide a router config file instead of the Python script, i.e. changing DHCP option 67 to a config file name (in my case I named it xe_base.cfg).

     

    host crs1kv-1 {
      …
      option bootfile-name "xe_base.cfg";
      …
    }

     

    The goal of the base config was, of course:

  • To enable the HTTP server
  • To provision the EEM applets and run them with privilege level 15 access
  • To have the EEM applets enable the guestshell with the required config once the IOx service is ready
  • To have the EEM applets download the Python script from the TFTP server, without user interaction, once the guestshell is up
  • To have the EEM applets instruct the guestshell to run the Python script

    xe_base.cfg

    ip domain-name ciscotechie.lab
    aaa new-model
    aaa authentication login default local-case
    aaa authorization exec default local
    aaa authentication enable default enable
    username cisco privilege 15 secret cisco
    enable secret cisco
    access-list 100 permit tcp host 10.255.192.6 any eq 22
    access-list 100 permit tcp host 10.255.192.9 any eq 22
    ip http server
    line vty 0 4
     transport input ssh
    ip ssh version 2
    crypto key generate rsa general-keys modulus 2048
    !
    iox
    !
    ip http client source-interface GigabitEthernet1
    !
    ip nat inside source list NAT_ACL interface GigabitEthernet1 overload
    ip tftp source-interface GigabitEthernet1
    !
    ip access-list standard NAT_ACL
     10 permit 192.168.0.0 0.0.255.255

    !
    interface GigabitEthernet1
     ip nat outside
    !
    interface VirtualPortGroup31
     ip address 192.168.2.1 255.255.255.0
     ip nat inside
     no mop enabled
     no mop sysid

    !
    logging buffered debugging
    logging monitor debugging
    logging buffered 1000000
    logging host 10.255.192.6
    !
    file prompt quiet
    !
    event manager session cli username cisco privilege 15
    !
    event manager applet START_GUESTSHELL
     event syslog pattern ".*ioxman: IOX is ready.*" maxrun 360
     action 1.0 syslog msg "Starting EEM script START_GUESTSHELL"
     action 2.0 cli command "enable"
     action 2.1 cli command "debug event manager all"
     action 2.2 cli command "guestshell enable"
     action 3.0 wait 30
     action 4.0 cli command "conf t"
     action 4.1 cli command "app-hosting appid guestshell"
     action 4.2 cli command " app-vnic gateway0 virtualportgroup 31 guest-interface 1"
     action 4.3 cli command " guest-ipaddress 192.168.2.2 netmask 255.255.255.0"
     action 4.4 cli command " exit"
     action 4.5 cli command " app-vnic management guest-interface 0"
     action 4.6 cli command " app-default-gateway 192.168.2.1 guest-interface 1"
     action 4.7 cli command " start"
     action 4.8 cli command "end"
    !
    event manager applet ZTP
     event syslog pattern ".*guestshell.*RUNNING.*" maxrun 600
     action 1.0 cli command "enable"
     action 2.0 cli command "copy tftp://10.255.192.6/xe_ztp.py bootflash:"
     action 3.0 wait 60
     action 4.0 syslog msg "TFTP download of xe_ztp.py done"
     action 5.0 cli command "guestshell run python /bootflash/xe_ztp.py"
     action 6.0 syslog msg "ZTP applet finished"

    !

     

    There you have it! One way of doing automated CPE onboarding. I’m by no means stating that it is ‘the right way’, nor that all my assumptions along the way have been accurate. However, it was ‘one way’ of getting the job done, with needles, thread, duct tape and whatever else could be dug up from the toolbox.

    This experience kinda made me want to update the old image on the stages of Cisco certification:

    Automation Engineer