Cisco Model-Driven Telemetry tutorial with Telegraf, InfluxDB, and Grafana
(UPDATE: We now have a video demo to accompany this post! Check it out on YouTube.)
Network monitoring is an essential practice for maintaining a healthy and resilient network. Traditionally, SNMP has been the dominant protocol for gathering telemetry from network devices. In recent years, however, that has begun to change. Today, we'll discuss Model-Driven Telemetry backed by YANG models. We'll stream telemetry from a Cisco router and display the data on a beautiful Grafana dashboard.
IS SOMETHING WRONG WITH SNMP?
First off, let's give our respects to SNMP and the designers behind it. The original RFC of the protocol was released in 1988, so it's had an impressively long run in the real world. There's no shortage of organizations still using SNMP today.
In case you don't know, SNMP is short for Simple Network Management Protocol. Although it can be used for configuration management, it has rarely been used that way in practice. Instead, SNMP became the dominant protocol for network monitoring. That is, SNMP is mostly used for gathering statistics from network devices.
Let's now reflect on the performance of SNMP for network monitoring. Without a doubt, it has worked fairly well. There are some shortcomings, however. Let's go over a few.
- Unreliable Transport - SNMP traps use UDP for transport. UDP is inherently unreliable. If a trap doesn't reach a data collector, the information will be lost.
- Polling is Inefficient - SNMP-based solutions poll network devices. Each poll costs CPU time on the device, and the load grows with every additional data collector, so polling doesn't scale well.
- Data Model Limitations - Data available through SNMP is described in MIBs that use syntax defined by SMIv2. A MIB is a very simple tree-like structure and comes with limitations.
SO, HOW DOES MODEL-DRIVEN TELEMETRY HELP?
To be pedantic, one could argue that Model-Driven Telemetry (MDT) isn't new, since SNMP MIBs are themselves data models, albeit simplistic ones. However, when people use the term MDT, you can be almost sure that they are referring to solutions that use streaming rather than polling, and YANG models rather than MIBs.
With that said, let's go over the main advantages of Model-Driven Telemetry.
- Streaming - Telemetry is streamed from network devices so polling is no longer required. That is, network devices will continually transmit data at periodic time intervals or when other events occur until the subscription is no longer active. This reduces CPU utilization and enables a scalable architecture.
- Reliable Transport - Telemetry can be streamed using reliable transport protocols. A popular choice is gRPC, which runs over HTTP/2.
- YANG Data Models - Telemetry subscriptions use YANG data models that are much more powerful than SNMP MIBs. Also, YANG models are used by NETCONF and RESTCONF for network automation. So, by migrating to YANG-based telemetry, operators can consolidate their solutions to a single data modelling language.
LET'S SET UP MDT ON A CISCO ROUTER
Figure 1: Cisco Telemetry with Telegraf, InfluxDB, and Grafana
Now that we understand the advantages of Model-Driven Telemetry, let's see how we can set it up in real life. In our lab, we'll use a Cisco CSR router on GNS3.
We'll begin by configuring telemetry subscriptions on the router. Then, we'll use a Telegraf server as our data collector. Once Telegraf receives the data, we'll use InfluxDB for storing it in a time-series database. And finally, we'll use Grafana for presenting the data on a user-friendly dashboard.
HOW TO CHOOSE STATE DATA FROM YANG MODELS
To set up a telemetry subscription, we'll first need to choose metrics of interest from the YANG models of our router. We can find all of the YANG models for Cisco here.
We'll start by taking a look at the YANG model for CPU usage. By reading through the model, we can write an XPath expression for CPU utilization. We'll choose the 5-second average metric. As an alternative to manually writing our XPath expression, we could have also used a tool like YANG Explorer.
It's also useful to monitor statistics on an IP interface. We can find the relevant YANG model here. Let's write an XPath expression that will gather all statistics from the "GigabitEthernet1" interface.
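To make the structure of these expressions concrete, here is a minimal Python sketch that assembles both XPath filters from their parts: a module prefix, a path of node names, and an optional list-key predicate. The helper name `build_xpath` and its parameters are our own invention for illustration; they are not part of any Cisco or YANG tooling.

```python
def build_xpath(module_prefix, path, keys=None, leaf=None):
    """Assemble a telemetry XPath filter string (hypothetical helper).

    module_prefix: YANG module prefix, e.g. "interfaces-ios-xe-oper"
    path:          node names from the top-level container downwards
    keys:          optional list-key predicates applied to the last node in path
    leaf:          optional node names that follow the keyed node
    """
    segments = list(path)
    if keys:
        # A key predicate such as [name='GigabitEthernet1'] selects one list entry.
        segments[-1] += "".join(
            "[{}='{}']".format(key, value) for key, value in keys.items()
        )
    segments += list(leaf or [])
    return "/{}:{}".format(module_prefix, "/".join(segments))


# 5-second CPU utilization average
cpu_xpath = build_xpath(
    "process-cpu-ios-xe-oper",
    ["cpu-usage", "cpu-utilization", "five-seconds"],
)

# All statistics for a single interface
intf_xpath = build_xpath(
    "interfaces-ios-xe-oper",
    ["interfaces", "interface"],
    keys={"name": "GigabitEthernet1"},
    leaf=["statistics"],
)

print(cpu_xpath)
print(intf_xpath)
```

The two printed strings are the filters we'll use in the subscriptions in the next section.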
HOW TO CONFIGURE DIAL-OUT SUBSCRIPTIONS
Now that we have our XPath expressions, we're ready to configure dial-out subscriptions on our router. Once these subscriptions are configured, the router will begin to stream the metrics. The stream will only end when the configuration is removed. In a future post, we'll explore another approach using gNMI, an OpenConfig protocol.
We'll start with the config for the CPU utilization subscription.
The integer in the first line is an ID for the subscription.
We then select an encoding type. We're using Key-Value Google Protocol Buffers, which is compatible with the Telegraf plugin we'll later configure.
We then add our XPath expression.
We specify a source IP address for our stream.
We set the stream type to "yang-push". Because the subscription is dial-out, the router initiates the connection to the collector, so it will work even when the device is behind a NAT or stateful firewall.
We set our update policy to a periodic 500 centiseconds. This will instruct the stream to send updates every 5 seconds.
Finally, we set the IP address and port of our Telegraf collector and select gRPC for transport.
telemetry ietf subscription 1
 encoding encode-kvgpb
 filter xpath /process-cpu-ios-xe-oper:cpu-usage/cpu-utilization/five-seconds
 source-address 192.168.1.110
 stream yang-push
 update-policy periodic 500
 receiver ip address 192.168.1.23 57000 protocol grpc-tcp
 exit
!
Similarly, let's add the subscription for our interface statistics.
telemetry ietf subscription 2
 encoding encode-kvgpb
 filter xpath /interfaces-ios-xe-oper:interfaces/interface[name='GigabitEthernet1']/statistics
 source-address 192.168.1.110
 stream yang-push
 update-policy periodic 500
 receiver ip address 192.168.1.23 57000 protocol grpc-tcp
 exit
!
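Since both subscriptions differ only in their ID and XPath, they lend themselves to templating. The Python sketch below renders the same configuration lines from a handful of parameters and converts the update interval from seconds to the centiseconds the router expects. The function names are hypothetical, and the closing "exit" and "!" lines are left out for brevity.

```python
def seconds_to_centiseconds(seconds):
    """The periodic update policy is expressed in centiseconds (1/100 s)."""
    return int(round(seconds * 100))


def render_subscription(sub_id, xpath, source_ip, receiver_ip,
                        receiver_port, period_seconds):
    """Render a dial-out subscription like the ones above (hypothetical helper)."""
    lines = [
        "telemetry ietf subscription {}".format(sub_id),
        " encoding encode-kvgpb",
        " filter xpath {}".format(xpath),
        " source-address {}".format(source_ip),
        " stream yang-push",
        " update-policy periodic {}".format(seconds_to_centiseconds(period_seconds)),
        " receiver ip address {} {} protocol grpc-tcp".format(receiver_ip, receiver_port),
    ]
    return "\n".join(lines)


config = render_subscription(
    sub_id=1,
    xpath="/process-cpu-ios-xe-oper:cpu-usage/cpu-utilization/five-seconds",
    source_ip="192.168.1.110",
    receiver_ip="192.168.1.23",
    receiver_port=57000,
    period_seconds=5,
)
print(config)
```

Calling the function a second time with ID 2 and the interface XPath produces the second subscription.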
We'll also need to enable YANG management on our router with one line of configuration. On IOS XE, this is typically the global configuration command "netconf-yang".
We can verify the state of our YANG-related processes with an operational command. The key process is "pubd", which is responsible for telemetry.
Router# show platform software yang-management process
confd            : Running
nesd             : Running
syncfd           : Running
ncsshd           : Running
dmiauthd         : Running
nginx            : Running
ndbmand          : Running
pubd             : Running
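If you collect this output from a script, a few lines of Python are enough to flag any process that isn't running. The sketch below is only an illustration: the function names are our own, and the sample text is the output shown above pasted in as a string rather than fetched from a device.

```python
SAMPLE_OUTPUT = """\
Router# show platform software yang-management process
confd            : Running
nesd             : Running
syncfd           : Running
ncsshd           : Running
dmiauthd         : Running
nginx            : Running
ndbmand          : Running
pubd             : Running
"""


def parse_process_states(output):
    """Map each process name to its reported state."""
    states = {}
    for line in output.splitlines():
        name, separator, state = line.partition(":")
        if separator:  # skip lines without "name : state", e.g. the prompt line
            states[name.strip()] = state.strip()
    return states


def not_running(states):
    """Return the names of processes whose state is anything but 'Running'."""
    return [name for name, state in states.items() if state != "Running"]


states = parse_process_states(SAMPLE_OUTPUT)
print("pubd state:", states.get("pubd"))
print("processes not running:", not_running(states))
```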
With that done, we can verify our subscriptions are valid. In the example below, we check the validity of our second subscription.
Router# show telemetry ietf subscription 2 detail
Telemetry subscription detail:

  Subscription ID: 2
  Type: Configured
  State: Valid
  Stream: yang-push
  Filter:
    Filter type: xpath
    XPath: /interfaces-ios-xe-oper:interfaces/interface[name='GigabitEthernet1']/statistics
  Update policy:
    Update Trigger: periodic
    Period: 500
  Encoding: encode-kvgpb
  Source VRF:
  Source Address: 192.168.1.110
  Notes:

  Receivers:
    Address          Port     Protocol         Protocol Profile
    -----------------------------------------------------------------------------------------
    192.168.1.23     57000    grpc-tcp
HOW TO SET UP TELEGRAF
Now that our subscriptions are configured, we're ready to set up our Telegraf collector. You can download Telegraf here. We'll skip over the installation steps as there are plenty of tutorials around for that.
Once Telegraf is installed, open up the "telegraf.conf" file. We'll then add our input plugin, which will instruct Telegraf to listen on port 57000 for gRPC telemetry. For full details on the plugin, check out the documentation here.
[[inputs.cisco_telemetry_mdt]]
  transport = "grpc"
  service_address = ":57000"
With that done, let's add an output plugin for storing the telemetry in an InfluxDB database.
[[outputs.influxdb]]
  database = "telegraf"
  urls = [ "http://127.0.0.1:8086" ]
  username = "telegraf"
  password = "password123"
That's all we need to do for Telegraf. We can now save our config file and restart the Telegraf service to reflect our changes.
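Before checking the router, it can help to confirm that Telegraf is actually listening on the gRPC port. Here is a small Python sketch that attempts a plain TCP connection; the helper name `is_tcp_port_open` is our own, and the address and port in the comment are simply the lab values used in this post. A successful TCP connection only tells us something is listening, not that it speaks gRPC.

```python
import socket


def is_tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call with the lab values from this post (run from a host that
# can reach the collector):
#   is_tcp_port_open("192.168.1.23", 57000)
```

If the function returns False, check that the Telegraf service restarted cleanly and that no firewall is blocking port 57000.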
The stream from our Cisco router should now be connected to our Telegraf service. We can verify this is the case with another Cisco command.
show telemetry ietf subscription 2 receiver
If all is well, we'll see that the state is now "Connected".
Router# show telemetry ietf subscription 2 receiver
Telemetry subscription receivers detail:

  Subscription ID: 2
  Address: 192.168.1.23
  Port: 57000
  Protocol: grpc-tcp
  Profile:
  State: Connected
  Explanation:
HOW TO SET UP INFLUXDB
Currently, our Telegraf server is configured to store data in an InfluxDB database that doesn't yet exist.
Let's fix that. Download InfluxDB from here. Again, there are many tutorials around you can follow for the installation.
Once installed, we can run the command "influx" to establish a session with the database. We'll then create a new database, user, and password. The database name, username, and password must match the values configured in our Telegraf output plugin.
create database telegraf
create user telegraf with password 'password123'
We can verify the objects were created with "show" commands.
show databases
show users
This will give an output like the one below.
> show databases
name: databases
name
----
_internal
telegraf
> show users
user     admin
----     -----
telegraf false
HOW TO SET UP GRAFANA
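Once Telegraf has been running for a little while, we can also check that measurements are arriving. InfluxDB 1.x exposes a "/query" HTTP endpoint that accepts the database, query, and credentials as URL parameters. The Python sketch below only builds such a URL using the values from this post; the helper name is our own, and sending the request (with a browser, curl, or an HTTP library) is left to you.

```python
from urllib.parse import urlencode


def influx_query_url(base_url, database, query, username=None, password=None):
    """Build an InfluxDB 1.x /query URL (hypothetical helper)."""
    params = {"db": database, "q": query}
    if username is not None:
        params["u"] = username
        params["p"] = password or ""
    return "{}/query?{}".format(base_url.rstrip("/"), urlencode(params))


url = influx_query_url(
    "http://127.0.0.1:8086",
    "telegraf",
    "SHOW MEASUREMENTS",
    username="telegraf",
    password="password123",
)
print(url)
```

An empty result from this query usually means Telegraf hasn't written anything yet, so it's worth rechecking the receiver state on the router.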
We now have our telemetry streamed to Telegraf and stored in InfluxDB.
To complete our goal, we just need to present our data in a user-friendly format. To do so, we'll begin by downloading Grafana from here.
Once installed, we can create a new dashboard and begin to add panels for our telemetry. Here are two examples: one for CPU utilization and another for received packets per second.
Figure 2: Grafana Query for Cisco CPU Utilization
Figure 3: Grafana Query for Cisco Interface RX-PPS
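For readers who prefer text to screenshots, here is a rough Python sketch of the kind of InfluxQL a Grafana panel might run for these two metrics. Treat every name in it as an assumption: the measurement names, the field names "five_seconds" and "rx_pps", and the "name" tag are our guesses at how Telegraf stores this telemetry, so check what your own database contains before copying them. `$timeFilter` and `$__interval` are Grafana's own query variables.

```python
def grafana_influxql(measurement, field, tag_filters=None):
    """Build a mean-over-interval InfluxQL query for a Grafana panel (hypothetical helper)."""
    conditions = [
        "\"{}\" = '{}'".format(tag, value)
        for tag, value in (tag_filters or {}).items()
    ]
    # Grafana substitutes the dashboard's time range for $timeFilter.
    conditions.append("$timeFilter")
    return (
        'SELECT mean("{}") FROM "{}" WHERE {} '
        "GROUP BY time($__interval) fill(null)"
    ).format(field, measurement, " AND ".join(conditions))


# Assumed measurement and field names; verify them in your own database.
cpu_query = grafana_influxql(
    "Cisco-IOS-XE-process-cpu-oper:cpu-usage/cpu-utilization",
    "five_seconds",
)
rx_pps_query = grafana_influxql(
    "Cisco-IOS-XE-interfaces-oper:interfaces/interface/statistics",
    "rx_pps",
    tag_filters={"name": "GigabitEthernet1"},
)

print(cpu_query)
print(rx_pps_query)
```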
Heading back to the dashboard, we'll now see a live feed of our telemetry!
Figure 4: Grafana Dashboard for Cisco Telemetry
There we go, that's everything essential we need to know to get started with Model-Driven Telemetry!
ULTRA CONFIG GENERATOR
Have you heard of Ultra Config Generator? If you haven't, I highly recommend you check it out.
You can download an Ultra Config template for telemetry using the link below. We've also shown a screenshot of the template in action.
Figure 5: UCG Telemetry Subscription Template
We designed Ultra Config to allow network engineers to generate and automate network configuration in a highly flexible, efficient and elegant manner. Our users love the application and I hope that you will too.
Take care until next time!