Cisco Model-Driven Telemetry tutorial with Telegraf, InfluxDB, and Grafana

June 30th 2020

(UPDATE: We now have a video demo to accompany this post! Check it out on YouTube.)

Network monitoring is an essential practice for maintaining a healthy and resilient network. Traditionally, SNMP has been the dominant protocol for gathering telemetry from network devices. In recent years, however, that has begun to change. Today, we'll discuss Model-Driven Telemetry backed by YANG models. We'll stream telemetry from a Cisco router and display the data on a beautiful Grafana dashboard.

IS SOMETHING WRONG WITH SNMP?

First off, let's pay our respects to SNMP and the designers behind it. The original RFC for the protocol was published in 1988, so it's had an impressively long run in the real world. There's no shortage of organizations still using SNMP today.

In case you don't know, SNMP is short for Simple Network Management Protocol. Although it can be used for configuration management, it has rarely been used that way in practice. Instead, SNMP became the dominant protocol for network monitoring; for the most part, it is used to gather statistics from network devices.

Let's now reflect on how well SNMP performs for network monitoring. It has undoubtedly served the industry well, but it does have some shortcomings. Let's go over a few.

  • Unreliable Transport - SNMP typically runs over UDP, which does not guarantee delivery. If a trap doesn't reach a data collector, the information is lost.
  • Polling is Inefficient - SNMP-based solutions poll network devices for data. Each poll consumes CPU on the device, and the load grows with every additional collector polling the same device.
  • Data Model Limitations - Data available through SNMP is described in MIBs written in SMIv2 syntax. A MIB is a simple tree of objects identified by OIDs, which makes complex, nested data awkward to represent.

SO, HOW DOES MODEL-DRIVEN TELEMETRY HELP?

To be pedantic, one could argue that Model-Driven Telemetry (MDT) isn't new, since SNMP MIBs are themselves simple data models. However, when people use the term MDT, they almost always mean solutions that stream data rather than poll for it, and that use YANG models rather than MIBs.

With that said, let's go over the main advantages of Model-Driven Telemetry.

  • Streaming - Telemetry is streamed from network devices, so polling is no longer required. Network devices continually transmit data, either at periodic intervals or when specific events occur, until the subscription is no longer active. This reduces CPU utilization and enables a scalable architecture.
  • Reliable Transport - Telemetry can be streamed using reliable transport protocols. A popular choice is gRPC, which runs over HTTP/2.
  • YANG Data Models - Telemetry subscriptions use YANG data models, which are much more expressive than SNMP MIBs. YANG models are also used by NETCONF and RESTCONF for network automation, so by migrating to YANG-based telemetry, operators can consolidate their solutions on a single data modelling language.

LET'S SET UP MDT ON A CISCO ROUTER

Figure 1: Cisco Telemetry with Telegraf, InfluxDB, and Grafana

Now that we understand the advantages of Model-Driven Telemetry, let's see how we can set it up in real life. In our lab, we'll use a Cisco CSR router running in GNS3.

We'll begin by configuring telemetry subscriptions on the router. Then, we'll use a Telegraf server as our data collector. Once Telegraf receives the data, we'll store it in InfluxDB, a time-series database. Finally, we'll use Grafana to present the data on a user-friendly dashboard.

HOW TO CHOOSE STATE DATA FROM YANG MODELS

To set up a telemetry subscription, we'll first need to choose metrics of interest from the YANG models of our router. We can find all of the YANG models for Cisco here.

We'll start by taking a look at the YANG model for CPU usage. By reading through the model, we can write an XPath expression for CPU utilization, choosing the 5-second average metric. Instead of writing the XPath expression by hand, we could also have used a tool like YANG Explorer.

/process-cpu-ios-xe-oper:cpu-usage/cpu-utilization/five-seconds

It's also useful to monitor statistics on an IP interface. We can find the relevant YANG model here. Let's write an XPath expression that will gather all statistics from the "GigabitEthernet1" interface.

/interfaces-ios-xe-oper:interfaces/interface[name='GigabitEthernet1']/statistics
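Both expressions follow the same shape: a module prefix on the top-level node, a path of containers and lists, and optional key predicates such as [name='GigabitEthernet1'] that select a single list entry. To make that structure explicit, here is a minimal Python sketch that splits an expression into those parts. The function name split_xpath is our own, and it deliberately handles only the simple /prefix:node/list[key='value']/leaf form used in this post, not full XPath.

```python
import re

# A path node: a name, optionally followed by one or more [key='value'] predicates
NODE_RE = re.compile(r"^([^\[\]]+)((?:\[[^\]]+\])*)$")
# A single key predicate, e.g. [name='GigabitEthernet1']
KEY_RE = re.compile(r"\[([^=\]]+)='([^']*)'\]")


def split_xpath(xpath: str) -> dict:
    """Split a simple YANG XPath filter into module prefix, nodes and list keys.

    Only the /prefix:node/list[key='value']/leaf form used in this post is
    supported; '/' characters inside key values are not handled.
    """
    if not xpath.startswith("/"):
        raise ValueError("expected an absolute XPath starting with '/'")
    parts = xpath[1:].split("/")
    if ":" not in parts[0]:
        raise ValueError("expected a module prefix on the first node")
    # The module prefix is attached to the top-level node only
    prefix, parts[0] = parts[0].split(":", 1)
    nodes, keys = [], {}
    for part in parts:
        match = NODE_RE.match(part)
        if match is None:
            raise ValueError(f"unsupported path node: {part!r}")
        name, predicates = match.groups()
        nodes.append(name)
        # Record each key as "list/key" -> value
        for key, value in KEY_RE.findall(predicates):
            keys[f"{name}/{key}"] = value
    return {"prefix": prefix, "nodes": nodes, "keys": keys}


if __name__ == "__main__":
    print(split_xpath("/process-cpu-ios-xe-oper:cpu-usage/cpu-utilization/five-seconds"))
    print(split_xpath(
        "/interfaces-ios-xe-oper:interfaces/interface[name='GigabitEthernet1']/statistics"
    ))
```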

HOW TO CONFIGURE DIAL-OUT SUBSCRIPTIONS

Now that we have our XPath expressions, we're ready to configure dial-out subscriptions on our router. In a dial-out subscription, the router itself initiates the connection to the collector. Once these subscriptions are configured, the router will begin to stream the metrics, and the stream will only end when the configuration is removed. In a future post, we'll explore another approach using gNMI, an OpenConfig protocol.

We'll start with the config for the CPU utilization subscription.

The integer in the first line is an ID for the subscription.

We then select an encoding type. We're using key-value Google Protocol Buffers (kvGPB), which is compatible with the Telegraf plugin we'll configure later.

We then add our XPath expression.

We specify a source IP address for our stream.

We set the stream type to "yang-push". Because this is a configured, dial-out subscription, the router opens the connection to the collector, so the stream works even when the router is behind NAT or a stateful firewall.

We set the update policy to periodic with a period of 500 centiseconds, which instructs the router to send an update every 5 seconds.

Finally, we set the address and port of our Telegraf collector and select gRPC as the transport.

telemetry ietf subscription 1
 encoding encode-kvgpb
 filter xpath /process-cpu-ios-xe-oper:cpu-usage/cpu-utilization/five-seconds
 source-address 192.168.1.110
 stream yang-push
 update-policy periodic 500
 receiver ip address 192.168.1.23 57000 protocol grpc-tcp
 exit
!

Similarly, let's add the subscription for our interface statistics.

telemetry ietf subscription 2
 encoding encode-kvgpb
 filter xpath /interfaces-ios-xe-oper:interfaces/interface[name='GigabitEthernet1']/statistics
 source-address 192.168.1.110
 stream yang-push
 update-policy periodic 500
 receiver ip address 192.168.1.23 57000 protocol grpc-tcp
 exit
!
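With more than a couple of subscriptions, generating these blocks from a script helps avoid copy-paste mistakes. The Python sketch below is a hypothetical helper (render_subscription is our own name, not a Cisco tool) that reproduces the two blocks above and converts a period given in seconds into the centiseconds that update-policy periodic expects. The default port 57000 and the encoding, stream, and transport values are simply the ones used in this post.

```python
def render_subscription(
    sub_id: int,
    xpath: str,
    source: str,
    receiver: str,
    port: int = 57000,
    period_seconds: float = 5.0,
) -> str:
    """Render an IOS-XE 'telemetry ietf subscription' block like the ones above."""
    # update-policy periodic is expressed in centiseconds (1 s = 100 cs)
    period_cs = round(period_seconds * 100)
    lines = [
        f"telemetry ietf subscription {sub_id}",
        " encoding encode-kvgpb",
        f" filter xpath {xpath}",
        f" source-address {source}",
        " stream yang-push",
        f" update-policy periodic {period_cs}",
        f" receiver ip address {receiver} {port} protocol grpc-tcp",
        " exit",
        "!",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    xpaths = [
        "/process-cpu-ios-xe-oper:cpu-usage/cpu-utilization/five-seconds",
        "/interfaces-ios-xe-oper:interfaces/interface[name='GigabitEthernet1']/statistics",
    ]
    # Subscription IDs 1 and 2, matching the configuration above
    for sub_id, xpath in enumerate(xpaths, start=1):
        print(render_subscription(sub_id, xpath, "192.168.1.110", "192.168.1.23"))
```

Keeping the seconds-to-centiseconds conversion in one place means a 10-second period can't accidentally be entered as 10, which the router would read as a tenth of a second.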

We'll also need to enable YANG management on our router with one line of configuration.

netconf-yang

We can verify the state of our YANG-related processes with an operational command. The key process is "pubd", which is responsible for telemetry.

Router# show platform software yang-management process
confd            : Running
nesd             : Running
syncfd           : Running
ncsshd           : Running
dmiauthd         : Running
nginx            : Running
ndbmand          : Running
pubd             : Running
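If you check several routers, it can be handy to automate this step. The Python sketch below parses the command output as text (how you collect it, for example over SSH, is left out) and lists any process that isn't reported as Running. It assumes the name : status layout shown above; the function names are hypothetical, and other IOS-XE releases may list a different set of processes.

```python
def parse_process_status(output: str) -> dict:
    """Map process name -> status from 'show platform software yang-management process'."""
    status = {}
    for line in output.splitlines():
        name, sep, state = line.partition(":")
        name = name.strip()
        # Keep only "name : status" rows; the prompt line and blank lines are skipped
        if sep and name and " " not in name:
            status[name] = state.strip()
    return status


def not_running(status: dict) -> list:
    """Return the names of processes whose status is anything other than Running."""
    return sorted(name for name, state in status.items() if state != "Running")


SAMPLE = """\
Router# show platform software yang-management process
confd            : Running
nesd             : Running
syncfd           : Running
ncsshd           : Running
dmiauthd         : Running
nginx            : Running
ndbmand          : Running
pubd             : Running
"""

if __name__ == "__main__":
    status = parse_process_status(SAMPLE)
    # pubd is the process responsible for telemetry
    print("pubd:", status.get("pubd", "missing"))
    print("not running:", not_running(status) or "none")
```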

With that done, we can verify that our subscriptions are valid. In the example below, we check our second subscription.

Router# show telemetry ietf subscription 2 detail
Telemetry subscription detail:

  Subscription ID: 2
  Type: Configured
  State: Valid
  Stream: yang-push
  Filter:
    Filter type: xpath
    XPath: /interfaces-ios-xe-oper:interfaces/interface[name='GigabitEthernet1']/statistics
  Update policy:
    Update Trigger: periodic
    Period: 500
  Encoding: encode-kvgpb
  Source VRF:
  Source Address: 192.168.1.110
  Notes:

  Receivers:
    Address                                    Port     Protocol         Protocol Profile
    -----------------------------------------------------------------------------------------
    192.168.1.23                               57000    grpc-tcp
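The same kind of check can be scripted for subscriptions. This Python sketch flattens the Key: Value lines of the output into a dictionary, so a script can assert that State is Valid. It's a simplified parser under our own naming (parse_fields is hypothetical): it ignores nesting and the receivers table, which is enough for spotting invalid subscriptions.

```python
import re

# "Key: Value" lines; keys must start with a letter, so receiver table rows are ignored
FIELD_RE = re.compile(r"^\s*([A-Za-z][A-Za-z ]*?)\s*:\s*(.*?)\s*$")


def parse_fields(output: str) -> dict:
    """Flatten 'Key: Value' lines from IOS-XE telemetry show output into a dict.

    Nesting is ignored and a repeated key keeps its last value, which is
    enough for checking fields such as State.
    """
    fields = {}
    for line in output.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


DETAIL = """\
Telemetry subscription detail:

  Subscription ID: 2
  Type: Configured
  State: Valid
  Stream: yang-push
  Filter:
    Filter type: xpath
    XPath: /interfaces-ios-xe-oper:interfaces/interface[name='GigabitEthernet1']/statistics
  Update policy:
    Update Trigger: periodic
    Period: 500
  Encoding: encode-kvgpb
"""

if __name__ == "__main__":
    fields = parse_fields(DETAIL)
    print("State:", fields.get("State"))
    print("Period:", fields.get("Period"))
```

The same function also works on the show telemetry ietf subscription 2 receiver output covered in the Telegraf section, where the State field should read Connected.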

HOW TO SET UP TELEGRAF

Now that our subscriptions are configured, we're ready to set up our Telegraf collector. You can download Telegraf here. We'll skip over the installation steps, as there are plenty of tutorials for that.

Once Telegraf is installed, open the "telegraf.conf" file and add the cisco_telemetry_mdt input plugin, which instructs Telegraf to listen on port 57000 for gRPC telemetry. For full details on the plugin, check out the documentation here.

[[inputs.cisco_telemetry_mdt]]
  transport = "grpc"
  service_address = ":57000"

With that done, let's add an output plugin for storing the telemetry in an InfluxDB database.

[[outputs.influxdb]]
  database = "telegraf"
  urls = [ "http://127.0.0.1:8086" ]
  username = "telegraf"
  password = "password123"

That's all we need to do for Telegraf. We can now save our config file and restart the Telegraf service to apply our changes.

The stream from our Cisco router should now be connected to our Telegraf service. We can verify this is the case with another Cisco command.

show telemetry ietf subscription 2 receiver

If all is well, we'll see that the state is now "Connected".

Router# show telemetry ietf subscription 2 receiver
Telemetry subscription receivers detail:

  Subscription ID: 2
  Address: 192.168.1.23
  Port: 57000
  Protocol: grpc-tcp
  Profile:
  State: Connected
  Explanation:

HOW TO SET UP INFLUXDB

Currently, our Telegraf server is configured to store data in an InfluxDB database that doesn't yet exist.

Let's fix that. Download InfluxDB from here. Again, there are many tutorials around you can follow for the installation.

Once installed, we can run the "influx" command to open a session with the database. We'll then create a new database and a user with a password. The database name, username, and password must match the values configured in our Telegraf output plugin.

create database telegraf
create user telegraf with password 'password123'

We can verify the objects were created with "show" commands.

show databases
show users

This will give an output like the one below.

> show databases
name: databases
name
----
_internal
telegraf
> show users
user     admin
----     -----
telegraf false
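If you'd rather script this step, InfluxDB 1.x also accepts these statements over its HTTP API as a POST to the /query endpoint, with the statement in the q parameter. The Python sketch below, assuming an InfluxDB 1.x server on the default port 8086 as in our Telegraf config, builds those requests from the same values used in the Telegraf output plugin so the names can't drift apart. The helper names are our own, and the sketch only builds the requests; sending them is left to urllib.request.urlopen.

```python
from urllib.parse import urlencode
from urllib.request import Request


def setup_statements(database: str, username: str, password: str) -> list:
    """InfluxQL statements matching the Telegraf [[outputs.influxdb]] settings."""
    # Keep the sketch simple: refuse passwords that would need escaping in InfluxQL
    if "'" in password or "\\" in password:
        raise ValueError("password contains characters this sketch does not escape")
    return [
        f'CREATE DATABASE "{database}"',
        f"CREATE USER \"{username}\" WITH PASSWORD '{password}'",
    ]


def query_request(base_url: str, statement: str) -> Request:
    """Build (but do not send) a POST to the InfluxDB 1.x /query endpoint."""
    body = urlencode({"q": statement}).encode("ascii")
    return Request(f"{base_url.rstrip('/')}/query", data=body, method="POST")


if __name__ == "__main__":
    # Values from the [[outputs.influxdb]] section of telegraf.conf
    for statement in setup_statements("telegraf", "telegraf", "password123"):
        request = query_request("http://127.0.0.1:8086", statement)
        print(request.get_method(), request.full_url, request.data)
        # urllib.request.urlopen(request) would send it to the server
```

With authentication disabled (the InfluxDB 1.x default), these credentials aren't actually checked. If you enable authentication, the non-admin telegraf user also needs write access, for example GRANT WRITE ON "telegraf" TO "telegraf".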

HOW TO SET UP GRAFANA

We now have our telemetry streamed to Telegraf and stored in InfluxDB.

To complete our goal, we just need to present our data in a user-friendly format. To do so, we'll begin by downloading Grafana from here.

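Once installed, we first add InfluxDB as a data source in Grafana, using the same database name and credentials as our Telegraf output plugin. We can then create a new dashboard and add panels for our telemetry. Here are two examples: one for CPU utilization and another for received packets per second.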

Figure 2: Grafana Query for Cisco CPU Utilization

Figure 3: Grafana Query for Cisco Interface RX-PPS
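Interface statistics include cumulative counters, such as the number of packets received, which only become a rate once they are differentiated over time. If you chart a counter rather than a ready-made rate field, InfluxQL's non_negative_derivative() function does this in the query. The Python sketch below shows the same arithmetic on counter samples taken every 5 seconds, matching our update policy; the sample values are made up, and the function name is our own.

```python
def rates_per_second(samples: list) -> list:
    """Turn (timestamp_s, cumulative_count) samples into per-second rates.

    Like a non-negative derivative, intervals where the counter goes
    backwards (e.g. after a reset) are dropped instead of yielding a
    negative rate.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        delta = c1 - c0
        if elapsed <= 0 or delta < 0:
            continue
        rates.append(delta / elapsed)
    return rates


if __name__ == "__main__":
    # Made-up counter samples taken every 5 seconds (our update-policy period);
    # the drop from 2500 to 100 simulates a counter reset
    samples = [(0, 1000), (5, 1500), (10, 2500), (15, 100), (20, 600)]
    print(rates_per_second(samples))
```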

Heading back to the dashboard, we'll now see a live feed of our telemetry!

Figure 4: Grafana Dashboard for Cisco Telemetry

There we go: that's everything we need to know to get started with Model-Driven Telemetry!

ULTRA CONFIG GENERATOR

Have you heard of Ultra Config Generator? If you haven't, I highly recommend you check it out.

You can download an Ultra Config template for telemetry using the link below. We've also included a screenshot of the template in action.

Download: telemetry-subscription-v2-2020-06-30.json

Figure 5: UCG Telemetry Subscription Template

We designed Ultra Config to allow network engineers to generate and automate network configuration in a highly flexible, efficient and elegant manner. Our users love the application and I hope that you will too.

Take care until next time!

Ultra Config

