
Using Python to Process collectd Network Data

The collectd daemon is a pretty useful tool for system administrators to track metrics on their servers. Actually, it’s useful for anyone who wants a daemon that will collect arbitrary statistics and shovel them somewhere on a regular basis. I used to use it a lot more before I moved most of my workloads to Kubernetes and began using Prometheus for monitoring. But I still use collectd, combined with Graphite, to build long-term historical graphs of my infrastructure usage and performance and to project future needs, rather than to watch for immediate problems. (Prometheus does not work very well as a historical tool.)

When using collectd you configure some read plugins, like CPU or Memory or PostgreSQL. Then you configure at least one write plugin to forward whatever the read plugins collect to some permanent storage. I would venture to guess that most people configure the Graphite plugin, but there are others like AMQP, CSV, or even Prometheus.
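For reference, a minimal setup pairing a couple of read plugins with the Graphite write plugin looks roughly like this (the Graphite host name and the node name are just placeholders):

LoadPlugin cpu
LoadPlugin memory
LoadPlugin write_graphite

<Plugin "write_graphite">
  <Node "graphing">
    Host "graphite.example.com"
    Port "2003"
    Protocol "tcp"
    Prefix "collectd."
  </Node>
</Plugin>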

All that said, there is one write plugin that I think goes unnoticed: the Network plugin. It is usually advertised as a way to forward data from one collectd instance to another, since it functions as both a read and a write plugin. But you can also use it to forward traffic to a program you’ve written yourself, which can process the data in your own way, in real time.

First, let’s configure collectd to enable this plugin. We’re going to have it forward every single metric to localhost on port 25826. The network plugin uses UDP and according to the binary protocol documentation, port 25826 is the default port for this protocol. So add this to your collectd.conf file and restart collectd.

LoadPlugin network
<Plugin "network">
  Server "127.0.0.1" "25826"
</Plugin>

The next step is to write our decoder, which we will do in Python. This code is based on code created by Adrian Perez, updated by Rami Sayer and Grégory Starck, and licensed under the GPLv2. I updated it with some cosmetic changes and also to guarantee that plugin names and type names always appear, and with consistent formatting. Here’s the decoder.
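What follows is a minimal sketch rather than the full original: it covers only the standard, unencrypted parts of the binary protocol, with the part type codes and value encodings taken from the protocol documentation, and it produces dicts with the same field names as the output shown further down.

#!/usr/bin/env python3
"""Sketch of a decoder for the collectd network plugin's binary protocol."""
import struct

# Part type codes from the collectd binary protocol documentation.
TYPE_HOST            = 0x0000
TYPE_TIME            = 0x0001
TYPE_PLUGIN          = 0x0002
TYPE_PLUGIN_INSTANCE = 0x0003
TYPE_TYPE            = 0x0004
TYPE_TYPE_INSTANCE   = 0x0005
TYPE_VALUES          = 0x0006
TYPE_INTERVAL        = 0x0007
TYPE_TIME_HR         = 0x0008
TYPE_INTERVAL_HR     = 0x0009

# Data source types used inside a VALUES part.
DS_COUNTER, DS_GAUGE, DS_DERIVE, DS_ABSOLUTE = range(4)


def _decode_values(payload):
    """Decode the payload of a VALUES part into a list of numbers."""
    (count,) = struct.unpack("!H", payload[:2])
    ds_types = payload[2:2 + count]
    values, offset = [], 2 + count
    for ds_type in ds_types:
        chunk = payload[offset:offset + 8]
        if ds_type == DS_GAUGE:
            # Gauges are doubles in x86 (little-endian) byte order.
            values.append(struct.unpack("<d", chunk)[0])
        elif ds_type == DS_DERIVE:
            values.append(struct.unpack("!q", chunk)[0])
        else:
            # COUNTER and ABSOLUTE are unsigned 64-bit in network byte order.
            values.append(struct.unpack("!Q", chunk)[0])
        offset += 8
    return values


def decode_packet(data):
    """Yield one dict per value contained in a single UDP packet."""
    state = {"plugin_instance": "", "type_instance": ""}
    offset = 0
    while offset + 4 <= len(data):
        part_type, part_len = struct.unpack("!HH", data[offset:offset + 4])
        if part_len < 4:
            break  # malformed part; stop rather than loop forever
        payload = data[offset + 4:offset + part_len]
        offset += part_len

        if part_type == TYPE_HOST:
            state["host_name"] = payload.rstrip(b"\0").decode()
        elif part_type == TYPE_PLUGIN:
            state["plugin_name"] = payload.rstrip(b"\0").decode()
        elif part_type == TYPE_PLUGIN_INSTANCE:
            state["plugin_instance"] = payload.rstrip(b"\0").decode()
        elif part_type == TYPE_TYPE:
            state["type_name"] = payload.rstrip(b"\0").decode()
        elif part_type == TYPE_TYPE_INSTANCE:
            state["type_instance"] = payload.rstrip(b"\0").decode()
        elif part_type == TYPE_TIME:
            state["timestamp"] = float(struct.unpack("!Q", payload)[0])
        elif part_type == TYPE_TIME_HR:
            # High-resolution time is expressed in units of 2**-30 seconds.
            state["timestamp"] = struct.unpack("!Q", payload)[0] / 2**30
        elif part_type == TYPE_INTERVAL:
            state["interval"] = float(struct.unpack("!Q", payload)[0])
        elif part_type == TYPE_INTERVAL_HR:
            state["interval"] = struct.unpack("!Q", payload)[0] / 2**30
        elif part_type == TYPE_VALUES:
            for value in _decode_values(payload):
                yield dict(state, value=value)
        # Anything else (signatures, encryption, notifications) is ignored.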

So that just decodes what we pass to it. The next part is to write a listener that gets the data from the network plugin and decodes it into Python objects that you can actually use.
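A bare-bones version of that listener, assuming the decoder sketch above is saved as decoder.py (a module name picked here just for illustration):

#!/usr/bin/env python3
"""Listen for collectd network-plugin packets and print the decoded metrics."""
import pprint
import socket

from decoder import decode_packet  # hypothetical module holding the sketch above

# Must match the Server line in collectd.conf.
LISTEN_ADDR = ("127.0.0.1", 25826)


def main():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(LISTEN_ADDR)
    while True:
        # collectd packets are small (about 1452 bytes by default), so a
        # 4 KiB receive buffer is plenty.
        data, _addr = sock.recvfrom(4096)
        for metric in decode_packet(data):
            pprint.pprint(metric)


if __name__ == "__main__":
    main()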

The above is just some basic template code that you can use to get started. It will print all of the metrics it receives as it receives them, in a format that looks like this:

{'host_name': 'myhost.example.com',
 'interval': 10.0,
 'plugin_instance': '',
 'plugin_name': 'cpu',
 'timestamp': 1641153893.9170213,
 'type_instance': 'user',
 'type_name': 'percent',
 'value': 4.0439340988517225}
{'host_name': 'myhost.example.com',
 'interval': 10.0,
 'plugin_instance': '',
 'plugin_name': 'cpu',
 'timestamp': 1641153893.917231,
 'type_instance': 'system',
 'type_name': 'percent',
 'value': 2.9455816275586613}
{'host_name': 'myhost.example.com',
 'interval': 10.0,
 'plugin_instance': '',
 'plugin_name': 'cpu',
 'timestamp': 1641153893.9172502,
 'type_instance': 'wait',
 'type_name': 'percent',
 'value': 0.0}
{'host_name': 'myhost.example.com',
 'interval': 10.0,
 'plugin_instance': '',
 'plugin_name': 'cpu',
 'timestamp': 1641153893.9172595,
 'type_instance': 'nice',
 'type_name': 'percent',
 'value': 0.0}
{'host_name': 'myhost.example.com',
 'interval': 10.0,
 'plugin_instance': '',
 'plugin_name': 'cpu',
 'timestamp': 1641153893.9172673,
 'type_instance': 'interrupt',
 'type_name': 'percent',
 'value': 0.0}
{'host_name': 'myhost.example.com',
 'interval': 10.0,
 'plugin_instance': '',
 'plugin_name': 'cpu',
 'timestamp': 1641153893.9172761,
 'type_instance': 'softirq',
 'type_name': 'percent',
 'value': 0.9985022466300548}

Different plugins have different ways of consolidating their metrics, so you’ll have to experiment a little to see what you get. Note that even for plugins that read one thing, like CPU, the timestamp is different for every reading, so don’t try to use the timestamp to group events together. Also note that events can come in any order at any time. Finally, note that if two plugins generate the same values for plugin_instance, plugin_name, type_instance, and type_name, you’ll have to fix your configuration, because it will be hard to tell which metric came from which plugin.
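If, for example, you want to keep only the latest reading per series, one reasonable approach is to key on those identifying fields rather than on the timestamp. The helper below is a hypothetical sketch you could call from the listener loop in place of the pprint call:

# Keep only the most recent value for each unique series.
latest = {}

def series_key(metric):
    """Identify a series by its naming fields, ignoring the timestamp."""
    return (
        metric["host_name"],
        metric["plugin_name"],
        metric.get("plugin_instance", ""),
        metric["type_name"],
        metric.get("type_instance", ""),
    )

def handle(metric):
    latest[series_key(metric)] = metric["value"]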

Now you can do with this data whatever you want. It can be a pretty handy tool and message format if you need streamed data and want to do arbitrary things with it.