
Monitoring Linux networking state using netlink

Once, at work, I needed to monitor every change in the Linux networking subsystem: IP addresses and routes being added or deleted, and so on.
Probably the best way to do this is the socket-based Netlink interface. With Netlink, we can “subscribe” to network-related notifications from the kernel. It is also possible to send commands to the network stack to change the routing table, interface configuration, and packet filtering. For example, the popular “iproute2” utilities use Netlink to do their job.
The easiest way to access Netlink sockets from userspace is the libnetlink library, which provides many macros, defines, and functions.
The worst part of this library, and of Netlink as a whole, is the lack of good examples.
In that situation, a good approach is to read the iproute2 source code to discover the things you are interested in. This article may also serve as a good starting point.

Introduction to Netlink

Netlink is a socket-based Linux kernel interface used for inter-process communication (IPC), both between the kernel and userspace processes and between different userspace processes, in a way similar to Unix domain sockets.

Like Unix domain sockets, and unlike INET sockets, Netlink communication cannot traverse host boundaries.
However, while Unix domain sockets use the file system namespace, Netlink endpoints are addressed by process identifiers (PIDs).

Netlink communication uses a separate socket family, AF_NETLINK.
Every Netlink message starts with a header, represented by the nlmsghdr structure. The header may be followed by a payload: a protocol-specific structure or raw data.
Netlink can split big messages into multiple parts. In that case, every part is marked with the NLM_F_MULTI flag, and the sequence ends with a message of type NLMSG_DONE.

There are a lot of useful macros that help us parse Netlink messages.
They are defined in the <linux/netlink.h> and <linux/rtnetlink.h> header files.

Creating a Netlink socket is pretty standard.

socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE)

where:
AF_NETLINK — netlink domain
SOCK_RAW — raw socket
NETLINK_ROUTE — required protocol.

In particular, NETLINK_ROUTE is used for routing and link information.

All available protocols can be found in the documentation. Here is a list of the most interesting:

  • NETLINK_ROUTE — routing and link information, monitoring and configuration routines
  • NETLINK_FIREWALL — transfer packets to userspace from the firewall (removed in Linux 3.5)
  • NETLINK_INET_DIAG — information about sockets of various protocol families (named NETLINK_SOCK_DIAG since Linux 3.3)
  • NETLINK_NFLOG — Netfilter/iptables ULOG
  • NETLINK_SELINUX — SELinux event notifications
  • NETLINK_NETFILTER — communications with Netfilter subsystem
  • NETLINK_KOBJECT_UEVENT — kernel object events (uevents) sent to userspace
  • NETLINK_USERSOCK — reserved for user-defined protocols

Communication

All communication through a Netlink socket is done with two well-known structures: msghdr and iovec.

struct iovec
{
    void *iov_base; // data buff
    __kernel_size_t iov_len; // size of the data
};

This structure holds a pointer to the actual message buffer and the size of that buffer.

struct msghdr {
    void *msg_name; // client addr (socket name)
    int msg_namelen; // length of the client addr
    struct iovec *msg_iov; // pointer to the iovec structure with message data
    __kernel_size_t msg_iovlen; // count of the data blocks
    void *msg_control; // points to a buffer for other protocol control-related messages or miscellaneous ancillary data. 
    __kernel_size_t msg_controllen; // length of the msg_control
    unsigned  msg_flags; // flags on received message
};

struct msghdr is passed directly to the socket’s recvmsg and sendmsg calls and keeps the number of separately supplied arguments small.

This structure is defined in <sys/socket.h>.

See the recvmsg and sendmsg manual pages for details.

A Netlink message stored in an iovec typically contains a Netlink message header (struct nlmsghdr) with the payload attached. The payload can consist of arbitrary data but usually contains a fixed-size protocol-specific header followed by a stream of attributes.

struct nlmsghdr
{
    __u32 nlmsg_len; // message size, including this header
    __u16 nlmsg_type; // message type (see below)
    __u16 nlmsg_flags; // message flags (see below)
    __u32 nlmsg_seq; // sequence number
    __u32 nlmsg_pid; // sender identifier (typically - process id)
};

The following standard message types are defined:

  • NLMSG_NOOP – No operation, the message must be discarded
  • NLMSG_ERROR – Error message or ACK (an ACK is an error message with the error code set to 0)
  • NLMSG_DONE – End of a multipart sequence
  • NLMSG_OVERRUN – Overrun notification (error)

Every Netlink protocol is free to define its own message types. Note that message type values < NLMSG_MIN_TYPE (0x10) are reserved and may not be used.

The following standard flags are defined:

  • NLM_F_REQUEST — Request message
  • NLM_F_MULTI — Part of a multipart message
  • NLM_F_ACK — Acknowledge requested
  • NLM_F_ECHO — Request to echo this request; typical direction is from kernel to user
  • NLM_F_ROOT — Return based on the root of the tree
  • NLM_F_MATCH — Return all matching entries
  • NLM_F_ATOMIC — Request an atomic operation (now obsolete)
  • NLM_F_DUMP — Same as NLM_F_ROOT|NLM_F_MATCH

Clients on both sides (userspace and kernel) are identified with the sockaddr_nl structure.

struct sockaddr_nl
{
    sa_family_t nl_family; // always AF_NETLINK
    unsigned short nl_pad; // typically filled with zeros
    pid_t nl_pid; // client identifier (process id)
    __u32 nl_groups; // mask of sender/receiver groups
};

nl_pid is a unique socket identifier; for kernel sockets this value is always zero. In userspace, the current process id is typically used. This may cause problems in multithreaded applications if several threads create and use Netlink sockets.
To work around this, we can initialize every nl_pid with this construction:

pthread_self() << 16 | getpid()

nl_groups is a bitmask of Netlink multicast groups. The value passed to bind() on the Netlink socket “subscribes” the socket to events from the specified groups.
This is what we are going to use for our current task, network monitoring.

The definitions of all groups can be found in the rtnetlink header file (<linux/rtnetlink.h>).
Here are some of them that are useful for our task:

  • RTMGRP_LINK — notifications about changes in network interface (up/down/added/removed)
  • RTMGRP_IPV4_IFADDR — notifications about changes in IPv4 addresses (address was added or removed)
  • RTMGRP_IPV6_IFADDR — same for IPv6
  • RTMGRP_IPV4_ROUTE — notifications about changes in IPv4 routing table
  • RTMGRP_IPV6_ROUTE — same for IPv6

Netlink message payload

As I already said, the header is followed by a payload, and one buffer may hold several messages. The Netlink headers contain several macros that are extremely helpful for accessing and checking the message payload.

The most useful ones:

  • NLMSG_DATA — Get a pointer to the message payload
  • NLMSG_PAYLOAD — Get the size of the payload that follows a protocol header of the given length
  • NLMSG_ALIGN — Round a size up to the nearest aligned value (a multiple of NLMSG_ALIGNTO, 4 bytes)
  • NLMSG_LENGTH — Given the payload size, return the value to store in nlmsg_len (header plus payload)
  • NLMSG_SPACE — Given the payload size, return the number of bytes the message occupies in the buffer, including alignment padding
  • NLMSG_NEXT — Get the next message in the buffer and decrease the remaining length accordingly. When walking a multipart message, it’s important to check for the NLMSG_DONE message type to know where the sequence ends.
  • NLMSG_OK — Return true if the message is not truncated and is safe to parse

Practical usage of Netlink

Okay, I think that’s enough boring theory 🙂
Time to write some code and test the application.

Here is the full source code:

#include <errno.h>
#include <stdio.h>
#include <string.h>     // memset, strerror
#include <unistd.h>     // close, getpid, usleep
#include <net/if.h>
#include <arpa/inet.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>

// little helper for parsing attributes using netlink macros
void parseRtattr(struct rtattr *tb[], int max, struct rtattr *rta, int len)
{
    memset(tb, 0, sizeof(struct rtattr *) * (max + 1));

    while (RTA_OK(rta, len)) {  // while not end of the message
        if (rta->rta_type <= max) {
            tb[rta->rta_type] = rta; // read attr
        }
        rta = RTA_NEXT(rta,len);    // get next attr
    }
}

int main()
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);   // create netlink socket

    if (fd < 0) {
        printf("Failed to create netlink socket: %s\n", (char*)strerror(errno));
        return 1;
    }

    struct sockaddr_nl  local;  // local addr struct
    char buf[8192];             // message buffer
    struct iovec iov;           // message structure
    iov.iov_base = buf;         // set message buffer as io
    iov.iov_len = sizeof(buf);  // set size

    memset(&local, 0, sizeof(local));

    local.nl_family = AF_NETLINK;       // set protocol family
    local.nl_groups =   RTMGRP_LINK | RTMGRP_IPV4_IFADDR | RTMGRP_IPV4_ROUTE;   // set groups we are interested in
    local.nl_pid = getpid();    // set our id using current process id

    // initialize protocol message header
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));               // clear msg_control, msg_controllen and msg_flags
    {
        msg.msg_name = &local;                  // local address
        msg.msg_namelen = sizeof(local);        // address size
        msg.msg_iov = &iov;                     // io vector
        msg.msg_iovlen = 1;                     // io size
    }

    if (bind(fd, (struct sockaddr*)&local, sizeof(local)) < 0) {     // bind socket
        printf("Failed to bind netlink socket: %s\n", (char*)strerror(errno));
        close(fd);
        return 1;
    }   

    // read and parse all messages from the socket
    while (1) {
        ssize_t status = recvmsg(fd, &msg, MSG_DONTWAIT);

        //  check status
        if (status < 0) {
            if (errno == EINTR || errno == EAGAIN)
            {
                usleep(250000);
                continue;
            }

            printf("Failed to read netlink: %s\n", (char*)strerror(errno));
            continue;
        }

        if (msg.msg_namelen != sizeof(local)) { // check message length, just in case
            printf("Invalid length of the sender address struct\n");
            continue;
        }

        // message parser
        struct nlmsghdr *h;

        for (h = (struct nlmsghdr*)buf; status >= (ssize_t)sizeof(*h); ) {   // read all message headers
            int len = h->nlmsg_len;
            int l = len - sizeof(*h);

            if ((l < 0) || (len > status)) {
                printf("Invalid message length: %i\n", len);
                break;  // stop parsing: the rest of the buffer cannot be trusted
            }

            // the payload layout depends on the message type, so check the type first
            switch (h->nlmsg_type) {
                case RTM_NEWROUTE:
                case RTM_DELROUTE:  // some changes in routing table
                    printf("Routing table was changed\n");
                    break;

                case RTM_NEWLINK:
                case RTM_DELLINK: {
                    struct ifinfomsg *ifi = (struct ifinfomsg*)NLMSG_DATA(h);   // network interface info
                    struct rtattr *tb[IFLA_MAX + 1];
                    const char *ifName = "unknown";

                    // the length is the payload that follows struct ifinfomsg, not the whole message
                    parseRtattr(tb, IFLA_MAX, IFLA_RTA(ifi), IFLA_PAYLOAD(h));

                    if (tb[IFLA_IFNAME]) {  // validation
                        ifName = (const char*)RTA_DATA(tb[IFLA_IFNAME]); // get network interface name
                    }

                    if (h->nlmsg_type == RTM_DELLINK) {
                        printf("Network interface %s was removed\n", ifName);
                    } else {
                        const char *ifUpp = (ifi->ifi_flags & IFF_UP) ? "UP" : "DOWN";
                        const char *ifRunn = (ifi->ifi_flags & IFF_RUNNING) ? "RUNNING" : "NOT RUNNING";

                        printf("New network interface %s, state: %s %s\n", ifName, ifUpp, ifRunn);
                    }
                    break;
                }

                case RTM_NEWADDR:
                case RTM_DELADDR: {
                    struct ifaddrmsg *ifa = (struct ifaddrmsg*)NLMSG_DATA(h);   // address info
                    struct rtattr *tba[IFA_MAX + 1];
                    char ifNameBuf[IF_NAMESIZE];
                    const char *ifName = "unknown";
                    char ifAddress[INET_ADDRSTRLEN] = "unknown";    // network addr

                    parseRtattr(tba, IFA_MAX, IFA_RTA(ifa), IFA_PAYLOAD(h));

                    // address messages carry a label (IPv4) and the interface index, not IFLA_IFNAME
                    if (tba[IFA_LABEL]) {
                        ifName = (const char*)RTA_DATA(tba[IFA_LABEL]);
                    } else if (if_indextoname(ifa->ifa_index, ifNameBuf)) {
                        ifName = ifNameBuf;
                    }

                    if (tba[IFA_LOCAL]) {
                        inet_ntop(AF_INET, RTA_DATA(tba[IFA_LOCAL]), ifAddress, sizeof(ifAddress)); // get IP addr
                    }

                    if (h->nlmsg_type == RTM_DELADDR) {
                        printf("Interface %s: address was removed\n", ifName);
                    } else {
                        printf("Interface %s: new address was assigned: %s\n", ifName, ifAddress);
                    }
                    break;
                }
            }

            status -= NLMSG_ALIGN(len); // align offsets by the message length, this is important

            h = (struct nlmsghdr*)((char*)h + NLMSG_ALIGN(len));    // get next message
        }

        usleep(250000); // sleep for a while
    }

    close(fd);  // close socket

    return 0;
}

Compilation is straightforward; no extra libraries are needed:

gcc netmon.c -o netmon

And run:

./netmon

Now you can play with your network interfaces: unplug the Ethernet cable and plug it back in, reconnect to WiFi, and so on.

The program prints a line for each link, address, or route event it receives.

It’s alive! 🙂

Data processing

In this example, you can find some new structures:

struct ifinfomsg
{
    unsigned char  ifi_family;  // interface family
    unsigned short ifi_type;    // device type
    int            ifi_index;   // interface index
    unsigned int   ifi_flags;   // device flags
    unsigned int   ifi_change;  // reserved, currently always 0xFFFFFFFF
};

struct ifinfomsg represents a network device and contains some useful fields, like device flags and index.

struct ifaddrmsg
{
    unsigned char  ifa_family;    // Address type (AF_INET or AF_INET6)
    unsigned char  ifa_prefixlen; // Length of the network mask
    unsigned char  ifa_flags;     // Address flags
    unsigned char  ifa_scope;     // Address scope
    int            ifa_index;     // Interface index, same as in struct ifinfomsg
};

struct ifaddrmsg represents a network address assigned to the device.

struct rtattr
{
    unsigned short rta_len; // Length of the option
    unsigned short rta_type; // Type of the option
    /* data */ 
};

struct rtattr is a helper structure used to store a single parameter (attribute) of an address or network link.

After successfully creating the Netlink socket, we initialize the sockaddr_nl structure, setting the mask of the groups whose messages we want to receive:
RTMGRP_LINK, RTMGRP_IPV4_IFADDR and RTMGRP_IPV4_ROUTE.
At this point we also prepare the message structure and a data buffer of 8192 bytes.
After all of this, we call bind() on the socket, subscribing to the group events.
We receive new messages from the socket in an infinite loop and then parse each message using the Netlink macros.
By checking the nlmsg_type field, we detect the type of the received message. For interface and address events, we dig deeper and extract all the interesting data.
This information is stored as an array of attributes, each described by struct rtattr.
Using the little helper function parseRtattr, we can parse all attributes and extract readable information from this array.

struct ifinfomsg *ifi = (struct ifinfomsg*) NLMSG_DATA(h); // where h is netlink message header
parseRtattr(tb, IFLA_MAX, IFLA_RTA(ifi), IFLA_PAYLOAD(h)); // length of the attributes only, without the headers
char* ifName = (char*)RTA_DATA(tb[IFLA_IFNAME]); // readable interface name, eth0 for example

You can check the rtnetlink manual page for more information about rtattr arrays and the possible attribute indexes.

I believe the rest of the code in this example is pretty obvious and doesn’t require a detailed explanation.
But if you have any questions, please ask in the comments.

I hope this article will be helpful.

Additional materials:

  1. tools.ietf.org/html/rfc3549
  2. http://man7.org/linux/man-pages/man7/netlink.7.html
  3. http://man7.org/linux/man-pages/man7/rtnetlink.7.html
  4. http://linuxjournal.com/article/7356

10 thoughts on “Monitoring Linux networking state using netlink”

  1. Thanks for the article!
    There are a few weird undocumented macros taken directly from the Linux kernel. How did you figure out what they are supposed to do? It seems like the only way to work with netlink without libs like libnl is to debug and pull things from iproute2…

    1. Hello! Yep, I spent a lot of time trying to figure out how it is supposed to work.
      I dug into the kernel and iproute2 sources, debugging and experimenting.

  2. NLMSG_DATA(h) is first cast to ifinfomsg and then to ifaddrmsg in your code. Can you explain how it works? I thought we have ifinfomsg in the case of NEW_LINK, DEL_LINK and ifaddrmsg in the case of NEW_ADDR and DEL_ADDR.

  3. Hello Oleg,
    Another question, dear Oleg :). In the function parseRtattr(), “h->nlmsg_len” is passed as “len”, where it is the size of the whole netlink message (nlmsghdr+ifinfomsg/ifaddrmsg+rtattrs). Then this len is checked in the RTA_OK macro and updated in the RTA_NEXT macro. I think this size should be just the size of the rtattrs for RTA_OK to be valid.

  4. The RTM_DELLINK event is not triggered in any case. Do you know why? I thought it would be triggered when I remove my cable, but it is not.

  5. Hi Oleg,
    Thanks for such an informative blog post on netlink.
    Is there any specific post on how to access nested attributes such as IFLA_LINKINFO?
